Query the Quarry: Much Can Be Learned from Data Mining
While it sometimes mirrors traditional BI, data mining differs in its ability to dig deep into a quarry of information. With the right tools, the process can yield valuable and intricate intelligence. As we continue to analyze the BI Maturity Curve with this blog, we delve into data mining.
The primary purpose of data mining is to explore large data sets for specific patterns and answers. For example, a toy retail company might ask: "What characteristics do our customers making over $100,000 have in common? How long do they tend to stay with us, based on place of residence and number of children?"
Data mining is used to answer these types of questions, and doing so requires more resources than operational reporting. However, the investment may pay off in higher ROI.
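To make the toy retailer's question concrete, here is a minimal sketch of how it could be expressed as a filter and a grouping. The customer records, field names (`income`, `residence`, `children`, `tenure_years`), and the helper `average_tenure_by_segment` are all hypothetical and invented for illustration; a real project would read from the company's own data sources.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical customer records; every field name and value is made up.
customers = [
    {"income": 120_000, "residence": "urban", "children": 2, "tenure_years": 6.0},
    {"income": 150_000, "residence": "urban", "children": 2, "tenure_years": 8.0},
    {"income": 110_000, "residence": "rural", "children": 1, "tenure_years": 3.0},
    {"income": 90_000, "residence": "urban", "children": 2, "tenure_years": 2.0},
    {"income": 130_000, "residence": "rural", "children": 1, "tenure_years": 5.0},
]


def average_tenure_by_segment(records, min_income=100_000):
    """Average tenure of customers earning more than min_income,
    grouped by (place of residence, number of children)."""
    groups = defaultdict(list)
    for record in records:
        if record["income"] > min_income:
            key = (record["residence"], record["children"])
            groups[key].append(record["tenure_years"])
    return {segment: mean(tenures) for segment, tenures in groups.items()}


segments = average_tenure_by_segment(customers)
for segment, tenure in sorted(segments.items()):
    print(segment, tenure)
```

Grouping on two variables at once is what separates this from a single-dimension report: the answer depends on residence and number of children together.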
Additionally, data mining often uses complex rule-based engines as it attempts to dig deeper and wider into the data, considering multiple variables. It requires a more specialized skillset in IT departments, often demanding trained statisticians and mathematicians. In some respects, the process resembles traditional BI: obtaining and preparing the data (transformation, reduction, cleansing), creating and testing data models, and interpreting the results. Some algorithms and techniques used in data mining are fairly advanced, such as neural networks, the k-nearest neighbor technique, rule induction, and decision trees. These techniques require a good understanding of advanced statistics in addition to domain knowledge and source data analysis.
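Of the techniques named above, k-nearest neighbor is the simplest to show in a few lines. The sketch below is a bare-bones illustration that assumes numeric features and Euclidean distance; the training points, the labels, and the function name `knn_predict` are hypothetical, and production work would normally rely on an established statistics or machine learning library.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+


def knn_predict(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `training` is a list of (features, label) pairs with numeric feature tuples.
    """
    nearest = sorted(training, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two made-up features per customer.
training = [
    ((1.0, 1.0), "low_spend"),
    ((1.5, 2.0), "low_spend"),
    ((2.0, 1.5), "low_spend"),
    ((6.0, 6.5), "high_spend"),
    ((7.0, 6.0), "high_spend"),
    ((6.5, 7.0), "high_spend"),
]

prediction = knn_predict(training, (6.2, 6.1))
print(prediction)
```

Even in this small form, the choice of `k` and of the distance measure shapes the result, which is one reason the statistical background mentioned above matters.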
To increase computing throughput, data mining solutions often run their calculations on parallel or distributed systems.
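The usual pattern behind such systems is to partition the data, process each partition independently, and combine the partial results. The following sketch shows that pattern on a single machine, under stated assumptions: it uses a thread pool only to stay self-contained, whereas a real deployment would more likely use separate processes or a cluster, and the basket data and function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_items(partition):
    """Count item occurrences within one partition of baskets."""
    counts = Counter()
    for basket in partition:
        counts.update(basket)
    return counts


def parallel_item_counts(baskets, n_partitions=2):
    """Split the baskets into partitions, count each one in a worker,
    then merge the partial counts into a single total."""
    partitions = [baskets[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(count_items, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical point-of-sale baskets.
baskets = [
    ["doll", "blocks"],
    ["blocks", "puzzle"],
    ["doll", "blocks", "puzzle"],
    ["kite"],
]

totals = parallel_item_counts(baskets)
print(totals)
```

Because each partition is counted without reference to the others, the same split-and-merge structure carries over to larger distributed setups.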
One disadvantage of data mining is the risk of bias. In most instances, users formulate questions and parameters based on the information they are seeking, which can bias the results and make the data modeling process labor-intensive. The modeling effort can also open a gap between a query and an action, which might make data mining unsuitable for systems that require immediate action, such as high-frequency stock trading systems.
The table below summarizes the activities of data mining:
|Primary Focus||Exploration, modeling, discovery, market basket analysis|
|Data Sources||Transactional databases (such as POS), historical transactional datasets, social networks|
|Business Skills||Data experimentation techniques and methodologies, collaboration with multiple stakeholders, segmentation analysis, management of decision support systems, business process management|
|IT Skills||Data visualization tools, statistical tools, distributed systems, clusters and parallel calculations|
|Business Processes||Data mining integration; data classification, sequence and association|
|IT Processes||High-performance systems and databases, machine learning and artificial intelligence, big data visualization|
|Technologies||Data mining tools and databases, e.g. Hadoop, R, NoSQL|
|Typical Users||IT or BI builds the systems and extracts data for business users|
|Sample Systems||Direct marketing mailings, scientific DNA discovery|
|Timeframe||Historical, real-time data sets|
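Market basket analysis, listed under Primary Focus above, asks which items tend to be bought together. A minimal sketch of its two basic measures, support and confidence, follows; the baskets and the rule being tested are hypothetical, and real tools add steps such as candidate generation and pruning that are left out here.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    wanted = set(items)
    return sum(wanted <= set(basket) for basket in baskets) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Of the baskets containing `antecedent`, the fraction that also
    contain `consequent`. Assumes the antecedent appears at least once."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)


# Hypothetical point-of-sale baskets.
baskets = [
    ["doll", "blocks"],
    ["blocks", "puzzle"],
    ["doll", "blocks", "puzzle"],
    ["kite"],
]

rule_support = support(baskets, ["doll", "blocks"])
rule_confidence = confidence(baskets, ["doll"], ["blocks"])
print(rule_support, rule_confidence)
```

In this made-up data, half of all baskets contain both a doll and blocks, and every basket with a doll also contains blocks, which is the kind of association a direct marketing mailing might act on.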