By now you're probably aware of the huge amount of data generated and
collected by individuals, businesses, and the government on a daily
basis, and you've probably also become familiar with some of the ways
businesses comb through this data to extract information useful to
their endeavors. Huge data warehouses have sprung up, and
programmers have created platforms like Hadoop to market to
businesses looking to leverage big data to their advantage. In fact,
data mining has become one of the main tools a business uses to
gather information about customers, in some cases replacing
traditional means of customer interaction like surveys, e-mail chains
and focus groups. However, businesses often run into challenges when
attempting to utilize big data, and it's important for them to
understand the limitations of blindly relying on a platform or
service to make decisions.
A saying commonly heard in the business world (and beyond) goes
something like this, in one form or another: “you can't see the
forest for the trees.” The idea of this metaphor is that it
becomes difficult to see an overall trend or solution when faced with
a seemingly insurmountable litany of information. However, with the
advances in data mining and the use of big data, this metaphor would
perhaps be more accurately applied in its opposite form. As the
“forest” is exhaustively analyzed, large-scale data on trends are
created and used, sometimes at the expense of individual data points
(or at least much smaller trends) that go largely unnoticed. Many
important observations and ideas may come from only one source, and
strict adherence to big data analytics could cause them to be
overlooked. Not every outlier is unimportant,
particularly when making a decision on a smaller or more localized
scale.
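To make the "forest versus trees" point concrete, here is a minimal Python sketch. The store names and percentages are invented purely for illustration, and the modified z-score (based on the median absolute deviation) is just one common way to flag unusual values, not a method taken from any of the sources cited here. It shows how an overall average can look unremarkable while one location is behaving very differently.

```python
from statistics import mean, median

# Hypothetical week-over-week sales change (%) per store.
# These numbers are made up for illustration only.
sales_change = {
    "store_a": 1.8,
    "store_b": 2.1,
    "store_c": 2.4,
    "store_d": 1.9,
    "store_e": -14.0,  # one location behaving very differently
    "store_f": 2.2,
}


def flag_outliers(values, threshold=3.5):
    """Return the keys whose modified z-score exceeds the threshold.

    The modified z-score uses the median and the median absolute
    deviation (MAD), so a single extreme value does not distort the
    baseline it is being compared against.
    """
    med = median(values.values())
    mad = median(abs(v - med) for v in values.values())
    if mad == 0:
        return []  # all values (nearly) identical; nothing to flag
    return [
        key
        for key, v in values.items()
        if 0.6745 * abs(v - med) / mad > threshold
    ]


overall = mean(sales_change.values())   # the "forest": one summary number
outliers = flag_outliers(sales_change)  # the "trees" worth a closer look

print(f"Average change across all stores: {overall:.1f}%")
print(f"Stores that deserve individual attention: {outliers}")
```

The average alone suggests sales are roughly flat, while the per-store check singles out the one location a local manager would actually want to hear about.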
Another concept many are likely familiar with is the idea of
correlation versus causation. With the ability of advanced data
mining techniques to identify trends and even predict outcomes, there
is a tendency to simply act on these trends without looking for
the deeper reasons these trends are occurring. An
interesting example of this phenomenon occurred when Google attempted
to research flu outbreaks by analyzing keyword searches. Compiling
data on users who searched for “nearest drugstore” or “flu
symptoms” provided Google with a seemingly pertinent metric for
determining where and when outbreaks were occurring. However, does
the fact that someone searched for the nearest pharmacy necessarily
mean that they have the flu? Without going into too many details,
Google found that a useful pattern emerged only after combining
various search terms and comparing them with actual flu surveillance
data from the Centers for Disease Control.
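The idea of checking search terms against real surveillance data can be sketched in a few lines of Python. To be clear, this is not Google's actual method, and every number below is invented for illustration. The sketch simply computes a Pearson correlation between hypothetical weekly search counts and hypothetical reported case counts, to show why a plausible-sounding term still needs to be validated against reference data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical weekly reported flu cases (a stand-in for surveillance data).
reported_cases = [10, 12, 30, 55, 40, 15]

# Hypothetical weekly search counts for two terms over the same weeks.
searches = {
    "nearest drugstore": [200, 210, 190, 220, 205, 215],
    "flu symptoms": [50, 60, 140, 260, 190, 70],
}

# Score each term by how closely it tracks the reported cases.
scores = {
    term: pearson(counts, reported_cases)
    for term, counts in searches.items()
}

for term, r in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{term!r}: correlation with reported cases = {r:.2f}")
```

In this made-up data, "flu symptoms" tracks the reported cases far more closely than "nearest drugstore" does. Even so, a high correlation only says the two series move together; it does not explain why, which is exactly the gap between correlation and causation.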
Hadoop, mentioned briefly earlier, is a popular platform for analyzing
big data that many companies use. As this is being written, there is a
conference being held in New York City called the “Strata
Conference + Hadoop World” that bills itself as the place “where
big data's most influential decision makers, architects, developers,
and analysts gather to shape the future of their businesses and
technologies” (Strata 1). Many companies are represented at this
conference, including Facebook, whose analytics chief Ken Rudin
shared some interesting observations about how his company uses big
data.
“The problem is that Hadoop is a technology, and big data isn't about
technology. Big data is about business needs,” said Rudin
(Kanaracus 1). A lot of companies spend a great deal of money to
implement an expensive platform designed for the data mining of big
data, knowing that it is a technological trend and has the potential
to generate profits, without fully understanding it or how it can meet
their specific needs. Rudin argues that, at least in Facebook's
case, Hadoop is not a total solution to their extensive data mining
needs (almost all of their revenue comes from targeted
advertisements). A sophisticated relational database serves their
needs more effectively, particularly when “drilling down” to a
more detailed level. Knowledge of data analytics, and particularly
the practical application of big data, also serves an important role
in Facebook's hiring process. Instead of merely focusing on whether a
candidate knows “how … we calculate this metric,” Rudin
suggests that candidates be given a business case study and then be
asked what metrics would be best applied to the situation.
What types of businesses is big data really an effective tool for? Do the
potential drawbacks of using big data and data mining outweigh the
perceived benefits? The answer to these questions certainly could
vary from situation to situation, and indeed it appears the use of
big data must be considered extensively and tailored to a specific
business's needs.
Citations
“About Strata + Hadoop World.” Strata Conference + Hadoop World. (2013). O'Reilly Media Inc. Web. Date Accessed: 2013/10/30. http://strataconf.com/stratany2013/public/content/about
“Flu Trends.” Google.org. (2011). Web. Date Accessed: 2013/10/30. http://www.google.org/flutrends/about/how.html
Kanaracus, Chris. “Hadoop is not Enough for 'Big Data', says Facebook Analytics Chief.” IDG News Service. (October 29, 2013). ITWorld. Web. Date Accessed: 2013/10/30. http://www.itworld.com/software/380556/hadoop-not-enough-big-data-says-facebook-analytics-chief?page=0,0
Polsky, Matt and Sommer, Claire. “Dodging Big Data's Big Problems.” GreenBiz.com. (September 16, 2013). Web. Date Accessed: 2013/10/30. http://www.greenbiz.com/blog/2013/09/16/big-data-big-problems
I think what we may see is future enterprises popping up to serve as a filter for “Big Data”. If so much information is truly being overlooked, I could definitely see a company stepping up to take advantage of this new marketplace.
I definitely agree. There are already companies seeking to do this, such as Tresata, a company that uses the Hadoop platform mentioned in the blog. I think companies will continue to refine how they attempt to use big data, and data mining techniques will continue to get more advanced.
The issue I have with big data, from a business perspective, is whether taking all the available data into consideration is even necessary. It seems that at some point, with enough data, the information would begin to overrun itself. This would cause trends and other analyses to focus too much on a micro level, producing unreliable information for decision making.
I absolutely agree. I think that not only could data become unreliable, but certain analyses of data could also conflict with one another, yielding very little helpful information and perhaps leading to decision paralysis.
I think that as the technology of data mining continues to get better, there will be better ways to adapt it to specific businesses. It seems like one of the problems they are realizing is that, depending on the business and its objectives, certain data is irrelevant and even destructive in decision making. It will be interesting to see how this affects the business world in the future and how businesses target customers with their products. I have no doubt that there will be specialized companies that take full advantage of finding a "niche" market for big data.
Kristina, thanks for your comment. Totally! Data mining, in general, is changing the business world drastically. Currently, there are several software packages that can break down extremely large amounts of data and organize it in ways that are meaningful for many organizations. For example, there is software that I use in my marketing analytics class in which we organize data and compose charts and graphs that can be very relevant to high-level business decisions. Furthermore, I am in total agreement with you that in the near future there will be drastic technological advances that let companies transform this data into relevant information much more efficiently.
That's really interesting, Adam. My major is marketing analytics, so I will be getting into more of the analytics classes next semester. Do you know what the software program you use in your class is called? I really enjoyed reading your blog post; it had a ton of great information on big data. What types of data do you analyze in your class?