Wednesday, October 30, 2013

The Challenges of Using Big Data

By now you're probably aware of the huge amount of data generated and collected by individuals, businesses, and the government on a daily basis, and you've probably also become familiar with some of the ways businesses comb through this data to extract information useful to their endeavors. Huge data warehouses have sprung up, and programmers have created platforms like Hadoop to market to businesses looking to leverage big data to their advantage. In fact, data mining has become one of the main tools a business uses to gather information about customers, in some cases replacing traditional means of customer interaction like surveys, e-mail chains, and focus groups. However, businesses often run into challenges when attempting to use big data, and it's important for them to understand the limitations of blindly relying on a platform or service to make decisions.
A saying commonly heard in the business world (and beyond) goes something like this, in one form or another: “you can't see the forest for the trees.” The idea of this metaphor is that it becomes difficult to see an overall trend or solution when faced with an overwhelming amount of detail. However, with the advances of data mining and the use of big data, this metaphor would perhaps be more accurately applied in its opposite form. As the “forest” is exhaustively analyzed, large-scale data on trends are created and used, sometimes at the expense of individual data points (or at least much smaller trends) that go largely unnoticed. Many important observations and ideas may come from only one source, and strict adherence to big data analytics could cause these to be overlooked. Not every outlier is unimportant, particularly when making a decision on a smaller or more localized scale.
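To make the "forest hides the tree" point concrete, here is a minimal Python sketch. The store names, the sales figures, and the helper `flag_outliers` are all invented for illustration, and the median-based cutoff is just one reasonable way to flag unusual points, not a method taken from any of the sources cited here.

```python
from statistics import mean, median

# Hypothetical week-over-week sales change (%) per store; all values invented.
sales_change = {
    "store_a": 2.1,
    "store_b": 1.8,
    "store_c": 2.4,
    "store_d": -9.5,  # the single "tree" that the aggregate smooths over
    "store_e": 2.0,
    "store_f": 2.2,
}


def flag_outliers(values, threshold=5.0):
    """Return the keys whose value sits more than `threshold` median absolute
    deviations (MAD) away from the median of all values."""
    center = median(values.values())
    mad = median(abs(v - center) for v in values.values())
    if mad == 0:
        return []  # no spread to measure against
    return [k for k, v in values.items() if abs(v - center) / mad > threshold]


overall = mean(sales_change.values())
outliers = flag_outliers(sales_change)

print(f"Average change across all stores: {overall:.2f}%")
print(f"Stores worth a closer look: {outliers}")
```

The company-wide average comes out slightly positive, so a dashboard built only on the aggregate would report nothing alarming, while the per-store check singles out `store_d`. Whether that one store matters is a business judgment, which is exactly the kind of question the aggregate view never raises.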
Another concept many are likely familiar with is the idea of correlation versus causation. With the ability of advanced data mining techniques to identify trends and even predict outcomes, there is a tendency to simply act on these trends without looking for the deeper reasons why they are occurring. An interesting example of this phenomenon occurred when Google attempted to research flu outbreaks by analyzing keyword searches. Compiling data on users who searched for “nearest drugstore” or “flu symptoms” provided Google with a seemingly pertinent metric for determining where and when outbreaks were occurring. However, does the fact that someone searched for the nearest pharmacy necessarily mean that they have the flu? Without going into too many details, Google found that a useful pattern emerged only after combining various search terms and comparing them with actual flu surveillance data from the Centers for Disease Control.
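The idea of checking search terms against real surveillance data can be sketched in a few lines of Python. This is not Google's actual method, which is not described in detail here; the weekly counts, the third search term "fever", the 0.8 cutoff, and the helper `pearson` are all assumptions made up for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no trend to correlate
    return cov / sqrt(var_x * var_y)


# Hypothetical reported flu cases per week (stand-in for surveillance data).
cdc_cases = [10, 12, 20, 35, 50, 40, 22, 12]

# Hypothetical weekly search counts per term; every number is invented.
weekly_searches = {
    "flu symptoms": [100, 130, 190, 330, 520, 390, 240, 120],
    "fever": [80, 95, 150, 260, 370, 300, 170, 100],
    "nearest drugstore": [300, 280, 310, 290, 320, 300, 310, 295],
}

# Keep only the terms that track the reported cases closely.
selected = [
    term
    for term, counts in weekly_searches.items()
    if pearson(counts, cdc_cases) > 0.8
]

# Combine the surviving terms into one weekly signal.
combined = [sum(week) for week in zip(*(weekly_searches[t] for t in selected))]

for term, counts in weekly_searches.items():
    print(f"{term}: r = {pearson(counts, cdc_cases):.2f}")
print(f"Terms kept: {selected}")
print(f"Combined signal vs. reported cases: r = {pearson(combined, cdc_cases):.2f}")
```

In this toy data, "nearest drugstore" barely moves with the case counts and gets dropped, which mirrors the question raised above: people look for a pharmacy for plenty of reasons besides the flu. Even a strong correlation here only says the searches and the cases rise together, not that one explains the other.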
Briefly mentioned earlier, Hadoop is a popular platform for analyzing big data that many companies use. As this is being written, there is a conference being held in New York City called the “Strata Conference + Hadoop World” that bills itself as the place, “where big data's most influential decision makers, architects, developers, and analysts gather to shape the future of their businesses and technologies” (Strata 1). Many companies are represented at this conference, including Facebook, whose analytics chief Ken Rudin shared some interesting observations about how his company uses big data.
“The problem is that Hadoop is a technology, and big data isn't about technology. Big data is about business needs,” said Rudin (Kanaracus 1). A lot of companies spend a great deal of money to implement an expensive platform designed for the data mining of big data, knowing that it is a technological trend and has the potential to generate profits, without fully understanding it or how it can meet their specific needs. Rudin argues that, at least in Facebook's case, Hadoop is not a total solution to their extensive data mining needs (almost all of their revenue comes from targeted advertisements). A sophisticated relational database serves their needs more effectively, particularly when “drilling down” to a more detailed level. Knowledge of data analytics, and particularly the practical application of big data, also plays an important role in Facebook's hiring process. Instead of merely focusing on whether a candidate knows “how … we calculate this metric,” Rudin suggests that candidates be given a business case study and then be asked what metrics would be best applied to the situation.
What types of businesses is big data really an effective tool for? Do the potential drawbacks of using big data and data mining outweigh the perceived benefits? The answer to these questions certainly could vary from situation to situation, and indeed it appears the use of big data must be considered extensively and tailored to a specific business's needs.

Citations
“About Strata + Hadoop World.” Strata Conference + Hadoop World. (2013). O'Reilly Media Inc. Web. Date Accessed: 2013/10/30. http://strataconf.com/stratany2013/public/content/about

“Flu Trends.” Google.org. (2011). Web. Date Accessed: 2013/10/30. http://www.google.org/flutrends/about/how.html

Kanaracus, Chris. “Hadoop is not Enough for 'Big Data', says Facebook Analytics Chief.” IDG News Service. (October 29, 2013). ITWorld. Web. Date Accessed: 2013/10/30. http://www.itworld.com/software/380556/hadoop-not-enough-big-data-says-facebook-analytics-chief?page=0,0

Polsky, Matt and Sommer, Claire. “Dodging Big Data's Big Problems.” GreenBiz.com. (September 16, 2013). Web. Date Accessed: 2013/10/30. http://www.greenbiz.com/blog/2013/09/16/big-data-big-problems

7 comments:

  1. I think what we may see is future enterprises popping up to serve as a filter for “Big Data”. If so much information is truly being overlooked, I could definitely see a company stepping up to take advantage of this new marketplace.

    1. I definitely agree. There are already companies seeking to do this, such as Tresata, a company that uses the Hadoop platform mentioned in the blog. I think companies will continue to refine how they attempt to use big data, and data mining techniques will continue to get more advanced.

  2. The issue I have with big data, from a business perspective, is whether taking all the available data into consideration is even necessary. It seems that at some point, with enough data that is, the information would begin to overrun itself. This would cause trends and other analyses to focus too much on a micro level, producing unreliable information for decision making.

    1. I absolutely agree. I think that not only could data become unreliable, certain analyses of data could conflict with one another, yielding very little helpful information and perhaps leading to decision paralysis.

  3. I think as the technology of data mining continues to get better, there will be better ways to adapt it to specific businesses. It seems like one of the problems they are realizing is that, depending on the business and the objectives, certain data is irrelevant and even destructive in decision making. It will be interesting in the future to see how this affects the business world and how they target customers about their products. I have no doubt that there will be specialized companies who will take full advantage of finding a "niche" market for big data.

  4. Kristina, thanks for your comment. Totally! Data mining, in general, is changing the business world drastically. Currently, there are several different software packages that can break down extremely large amounts of data and organize it to be very meaningful for several organizations. For example, there is a software package that I use in my marketing analytics class in which we organize data and compose charts and graphs that can be very relevant in high-level business decisions. Furthermore, I am in total agreement with you that in the near future there will be drastic technological advances where companies will be able to transform this data into relevant information much more efficiently.

  5. That's really interesting, Adam. My major is marketing analytics, so I will be getting into more of the analytics classes next semester. Do you know what the software program you use in your class is called? I really enjoyed reading your blog posting; it had a ton of great information on Big Data. What types of data do you analyze in your class?
