Next Generation Analytics: The Collision of Classic & Big Data Analytics
Classic analytics, traditionally supported by a data warehouse, delivers focus and insight by understanding what an organization has done. The examples are too numerous to cover in full, but one is measuring the supply chain flow of materials into a product or service brought to market, a theme that applies across industries from telecom and financial services to pharmaceuticals and retail. Others include measuring sales across product channels and customer demographics, or cash flow in, out of, and through an organization. Most organizations leverage some form of data warehouse and business intelligence tools to accomplish this against their own transactional data. Mature organizations conduct analytics in real time, and some even use predictive modeling to forecast and support decisions.
Big Data analytics shifts the focus from analyzing internal mechanisms to the events that happen outside an organization. It is now possible to leverage data to understand external events. Tapping into social media, news feeds, and product review data yields new insight into how customers view an organization's products & services. In some cases, tapping into machine logs yields insight into how an organization's key stakeholders (customers, employees, etc.) deliver or use the final products. For example, a telecom client I recently engaged with wanted deeper insight into customer activity on its data network. Other clients I've engaged with in the logistics space wanted deeper insight into the activity of the trucks used to make deliveries. Big Data management systems like Hadoop handle the storage of this information, but moving the information into a system like Hadoop is challenging, and turning this massive amount of structured and unstructured data into actionable intelligence can seem impossible.
Big Data analytics requires new approaches and techniques beyond classic data warehousing ETL to integrate data. Specifically, repurposing Data Quality techniques can help solve Big Data integration challenges. Recently I set up two of the leading discovery & visualization tools, Tableau and Qlik Sense, to access data in Hadoop. These tools require access through Hive, a native SQL-like interface to the data. To get the Hadoop data into Hive, we had to structure it, which meant parsing the text from the logs and articles into traditional, relational columns and rows. Processing massive amounts of text data from a nearly limitless pool of file formats is resource intensive and almost an irrational endeavor. The successful approach offered a systematic way to find the key actionable nuggets of information in the massive lake of Big Data. Unlike classic data warehousing ETL, which processes an entire data set, Big Data integration requires a system that can parse large strings, whole files, and multiple formats, and pump out only the important pieces of information. Data Quality techniques and routines address the text processing issues now found in Big Data, from file parsing to data classification, labeling, and probabilistic modeling. (Probabilistic modeling is a great technique for processing Big Data because it learns patterns from a sample data set and applies them to subsequent, larger data sets.) These key functional capabilities are all available in Informatica PowerCenter Big Data Edition, which now also includes Data Quality features; we used it to process and present the data and solve these Big Data challenges.
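To make the approach concrete, here is a minimal Python sketch of the two ideas above: parsing raw log text into relational rows that a Hive table could hold, then using a simple probabilistic model, trained on a small labeled sample, to keep only the actionable rows. The log format, field names, and keyword-frequency scoring are illustrative assumptions, not the Informatica PowerCenter implementation.

```python
import re
from collections import Counter

# Hypothetical log format, for illustration only:
#   2015-03-01T12:00:00 device=router42 level=ERROR msg="link down"
LOG_PATTERN = re.compile(
    r'(?P<ts>\S+)\s+device=(?P<device>\S+)\s+level=(?P<level>\S+)\s+msg="(?P<msg>[^"]*)"'
)

def parse_line(line):
    """Parse one raw log line into a relational row (dict), or None if it doesn't match."""
    m = LOG_PATTERN.match(line.strip())
    return m.groupdict() if m else None

def learn_keyword_weights(sample_rows, labels):
    """Learn from a small labeled sample which message words signal an actionable row."""
    actionable, total = Counter(), Counter()
    for row, label in zip(sample_rows, labels):
        for word in row["msg"].lower().split():
            total[word] += 1
            if label:
                actionable[word] += 1
    # Estimated probability that a row containing `word` is actionable.
    return {w: actionable[w] / total[w] for w in total}

def is_actionable(row, weights, threshold=0.5):
    """Apply the learned weights to a new row: average word score vs. threshold."""
    words = row["msg"].lower().split()
    scores = [weights.get(w, 0.0) for w in words]
    return bool(scores) and sum(scores) / len(scores) >= threshold

raw = [
    '2015-03-01T12:00:00 device=router42 level=ERROR msg="link down"',
    '2015-03-01T12:00:05 device=router42 level=INFO msg="heartbeat ok"',
]
rows = [r for r in (parse_line(line) for line in raw) if r]
weights = learn_keyword_weights(rows, labels=[True, False])
```

The key property is the one described above: the weights are learned once on a sample, then applied cheaply to arbitrarily large subsequent batches, emitting only the rows worth loading downstream.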
Organizations realize maximum insight and value when they merge Classic Analytics with Big Data Analytics. Key stakeholders and business leaders use SWOT (Strengths, Weaknesses, Opportunities, & Threats) analysis to guide an organization into the best ventures. By understanding the strengths and weaknesses portion of SWOT, an organization appreciates where it excels and where it underperforms. Classic Analytics, with traditional Business Intelligence tools and Data Warehouses focused on internal transaction systems, delivers the key information needed to understand strengths and weaknesses. For example, measuring product sales across geographic regions shows which regions perform best. By understanding the opportunities and threats portion of SWOT, an organization appreciates what is happening in the environment around it. Big Data Analytics, with its means to collect massive amounts of data from outside the organization and discover the relevant pieces of information, yields insight into that external environment. For example, discovering customer sentiment about functionality and quality can identify opportunities for product improvement and threats to future sales. In this example, customer sentiment should be cross-referenced with actual transactional sales to understand the financial impact of that sentiment. Analytics that yield insight across strengths vs. weaknesses and opportunities vs. threats can only be achieved when Classic and Big Data Analytics collide.