The Case for Informatica Big Data Edition & Hadoop
Do you wonder what your customers are saying about your business? What disruptive products might the competition be launching? What issues are being reported in the news that could affect your product line? These are questions that every management team should be asking and acting upon every day. Recently, I helped answer these questions for a pharmaceutical use case: 1) what studies are being done that could be leveraged for future product development or that may impact my current product line; 2) what adverse events are occurring with my products, so I can improve quality and indications; and 3) what key nuggets of information in recent news and research articles are relevant and actionable.
Business Intelligence (BI) systems can help answer these questions and deliver this information to everyone who needs it, but most don’t have the data to do so. BI systems primarily take internal information as inputs. For example, most business intelligence implementations focus on extracting and preparing information from key transactional systems, such as ERP or CRM, for analysis. This gives organizations insight into what happened in those systems and maybe even the ability to build a forecast based on historical information. This is the world of relational databases, structured information, and traditional BI. Why don’t most organizations also bring external information into their business intelligence tools? Because it hasn’t been possible, or it has been cost prohibitive. External information typically comes in the form of Big Data. That means it is high in volume (size), velocity (rate of change), or variety (structured vs. unstructured).
Hadoop opens the gate to a whole new world of analytics with its ability to manage Big Data in a cost-effective manner. Hadoop gives us a data management system to collect our structured and unstructured information. In the pharmaceutical business case I mentioned, we can acquire clinical study information from the National Institutes of Health (NIH), adverse event reports for drugs and medicines from the Food and Drug Administration (FDA), and recent, relevant news and research articles from various websites. However, one problem persists: How are we going to acquire, persist, integrate, and transform this information into actionable insight? Hadoop’s ecosystem offers many different tools to make this happen: Flume, Sqoop, MapReduce, HBase, Hive, Pig, Mahout, and others. That’s a lot of new coding platforms for a BI or ETL (Extract-Transform-Load) professional to learn, and Java skills are often a prerequisite to mastering them. This means skills will be scarce and expensive, and the learning curve is steep. Instead, leverage an established ETL platform with GUI-based development and a large talent pool, like Informatica PowerCenter. This will accelerate your time to deliver a real solution, as it did in our case of delivering pharmaceutical big data analytics. The key to success is to keep your data in Hadoop as it is transformed so the full power of the cluster can be leveraged.
A key business driver for traditional BI and data warehouse implementations is often to save resources and enable business decision makers by acquiring, integrating, and transforming the data they need into usable information. This driver exists because, without these technologies, most time and resources are spent collecting and manipulating the data rather than understanding and acting upon it. That issue persists today with Big Data projects.
Informatica PowerCenter Big Data Edition (BDE) enabled us to quickly acquire, integrate, and transform the structured and unstructured data we needed to deliver pharmaceutical big data analytics. Although it is more complex to stand up because it integrates tightly with Hadoop, there are significant savings in time and gains in capability for data acquisition and preparation. We are able to use the traditional PowerCenter transformation components, such as the Expression, Aggregator, Lookup, and Filter transformations. Additionally, we’re able to leverage Data Quality transformations for cleansing and standardization. Informatica BDE’s ability to make social media, web service API, and HTTP calls, along with its PDF and text parsers, not only allows us to acquire relevant news and research articles but also lets us bring structure to this unstructured information and find the little nuggets of actionable information.
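For readers who have not worked with these transformation types, here is a minimal sketch of what a Filter, Lookup, Expression, and Aggregator chain does to a set of records. It is plain Python, not Informatica code, and it does not represent how PowerCenter or BDE is built or configured. The record fields, the reference table, and the `run_pipeline` function are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical adverse-event records; field names and values are illustrative only.
events = [
    {"drug_code": "D1", "outcome": "serious", "age": 64},
    {"drug_code": "D1", "outcome": "non-serious", "age": 41},
    {"drug_code": "D2", "outcome": "serious", "age": 72},
    {"drug_code": "D1", "outcome": "serious", "age": None},
]

# Hypothetical reference data used by the lookup step.
drug_names = {"D1": "Drug One", "D2": "Drug Two"}


def run_pipeline(rows, lookup):
    """Apply filter, lookup, expression, and aggregation steps in order."""
    # Filter: drop records that are missing the patient's age.
    kept = [r for r in rows if r["age"] is not None]

    # Lookup: enrich each record with a readable drug name.
    enriched = [
        dict(r, drug_name=lookup.get(r["drug_code"], "UNKNOWN")) for r in kept
    ]

    # Expression: derive a new boolean column from an existing one.
    flagged = [dict(r, is_serious=(r["outcome"] == "serious")) for r in enriched]

    # Aggregator: count serious events per drug name.
    counts = defaultdict(int)
    for r in flagged:
        if r["is_serious"]:
            counts[r["drug_name"]] += 1
    return dict(counts)


summary = run_pipeline(events, drug_names)
print(summary)
```

In a GUI-based ETL tool, each of these four steps would be a box on the mapping canvas rather than hand-written code, which is the productivity argument made above.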
One of the keys to success in any Big Data implementation is to allow Hadoop to manage and process the data at all times. Informatica BDE can execute all transformations in the Hadoop environment by running MapReduce and Hive code, which minimizes network traffic and makes full use of the cluster’s processing power.
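To make the idea of processing inside Hadoop concrete: a filter followed by an aggregation can be expressed as a single HiveQL statement that runs on the cluster, so only the small summary result has to travel over the network. The sketch below simply renders such a statement as a string. It is not the code Informatica BDE generates, and the `to_hive_query` helper, the `adverse_events` table, and the column names are hypothetical.

```python
def to_hive_query(table, filter_condition, group_column):
    """Render a filter-then-count step as one HiveQL statement (illustrative only)."""
    return (
        f"SELECT {group_column}, COUNT(*) AS event_count "
        f"FROM {table} "
        f"WHERE {filter_condition} "
        f"GROUP BY {group_column}"
    )


# Hypothetical table and columns, chosen to match the pharmaceutical use case.
query = to_hive_query("adverse_events", "outcome = 'serious'", "drug_code")
print(query)
```

The alternative, pulling every row out of Hadoop to filter and count it on a separate ETL server, would move the full data set across the network before discarding most of it.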
If you’d like to see Informatica Big Data Edition and Hadoop live and in person, visit us at booth 110 at the Bio-IT World Conference in Boston, April 21-23, 2015.