How to Turn Data into Knowledge
Part Two of a Two-Part Series

July 3, 2000 (SmartPros) Perhaps you have spent thousands on the development of your Web site. There is a return on that investment. Each visitor leaves a trail of data that can be captured and can add significant value to your marketing efforts.



In an organization with more than just a few people, it is likely there is some degree of fraud taking place. Individuals who commit fraud will likely leave some evidence of their act. This data (evidence) could help organizations predict, detect, and reduce the risk of future fraud by helping to strengthen internal controls.

Successful organizations know that you cannot pay bills with net income. In other words, "Cash is King." As organizations operate, they will probably experience bad debts from regular trade receivables or other financing arrangements. Most organizations will record a great deal of information from the customer. Perhaps the data they obtain contains hidden patterns that may help predict the default of consumer financing arrangements.

These are just a few examples of the opportunities data mining offers. Organizations collect and process staggering amounts of data every day, yet most do not use that data as well as they could. This is good news for competitors that do employ data mining techniques. They know that if they use these techniques properly, they will have a better understanding of consumer behavior.

Data mining tools and techniques are still in their infancy, and it takes considerable business and technical experience to become proficient with them and to identify meaningful results. With proper use and management, they can give organizations opportunities to improve operations and increase cash flow.

What is Data Mining?
Data mining is not just another fad that will eventually go away. It is a sophisticated process of discovering hidden relationships in data and using them to build predictive models. It is a continuous process that helps professionals discover answers to questions they may not have asked because they lacked the information. It also helps to answer questions that traditionally were too time-consuming to pursue.

Data mining is not a magic bullet, nor will it transform business operations and profitability overnight. However, over time these techniques can help an organization gain a greater understanding and result in a competitive advantage.

It is unlikely that any data mining tools will ever replace professionals in our lifetime since there is no analysis technique that can replace experience. Data mining will assist as a tool to find patterns and relationships. The professional must determine the value of these findings and actually verify them in the real world. Therefore, it is critical that the professional uses the correct data and has a thorough understanding of it.

Data Mining and OLAP
Probably the most frequently asked question regarding data mining is how the process differs from OLAP (Online Analytical Processing). First, I should point out that data mining is not a replacement for OLAP. Just as an auto mechanic uses several tools to help a vehicle run smoothly, the professional employs more than one tool to turn data into knowledge.

OLAP is commonly referred to as multi-dimensional reporting and includes a variety of techniques, which are user-driven. Although data mining and OLAP are two very different tools, they can complement each other.

OLAP is a way to validate a hypothesis (i.e., a suspected pattern or relationship) against data. OLAP tools belong to the family of decision support tools and allow users to run a series of queries against data to check whether a hypothesis is true. The problem with this deductive analysis is that finding valuable hypotheses is difficult and time-consuming.

Data mining, on the other hand, is less time-consuming and uses an inductive process that is data-driven. The professional does not feed the data mining tool a hypothesis to validate; instead, the tool uses the given data to uncover patterns and relationships.

OLAP tools are widely used and are an important element of decision-making. OLAP can complement data mining by identifying a good place to start. For example, if you wish to know what took place in the past, such as what the sales were by country, by month, etc., then start with an OLAP tool.
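To make the "sales by country, by month" question concrete, here is a minimal sketch of that kind of roll-up in Python. The records, the `rollup` helper, and the dimension positions are hypothetical and exist only for illustration; a real OLAP tool would run such summaries against much larger, multi-dimensional data.

```python
from collections import defaultdict

# Hypothetical sales records: (country, month, amount). Illustrative data only.
sales = [
    ("US", "2000-01", 1200.0),
    ("US", "2000-01", 800.0),
    ("US", "2000-02", 950.0),
    ("CA", "2000-01", 400.0),
    ("CA", "2000-02", 600.0),
]

def rollup(records, *dims):
    """Sum the amount (field 2) grouped by the chosen dimension positions.

    Position 0 is the country and position 1 is the month.
    """
    totals = defaultdict(float)
    for rec in records:
        key = tuple(rec[d] for d in dims)
        totals[key] += rec[2]
    return dict(totals)

by_country_month = rollup(sales, 0, 1)  # the detailed view
by_country = rollup(sales, 0)           # the same data rolled up one level
```

Each call answers a question the user already had in mind, which is the user-driven character of OLAP described above.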

Once what happened in the past is identified, many will wonder why those events happened, and what is likely to happen in the future. Knowledgeable and well-equipped professionals will use data mining tools and techniques for these types of questions.

Data Mining Process
It is important to note that a data warehouse or data mart as discussed in Part One of this article is not a requirement for data mining. However, it probably would prevent several headaches if the organization's data were scattered among multiple data sources.

Before beginning any type of data mining process, remember there are always two common ingredients of a successful project. First, know your objective. For example, what are you trying to solve? Second, use the appropriate data.

There are several phases in the data mining process, and it all begins with the data collected. Taken together, the phases form an iterative process, because you will hop back and forth between them until you reach a comfort level. The six distinct phases of data mining are:
  • Organization Understanding
  • Data Understanding
  • Data Preparation
  • Modeling
  • Evaluation
  • Deployment & Monitoring

As one iterates between the different phases, professionals will obtain new knowledge resulting in new questions and clearer focus. As more data mining occurs, in theory, their processes should provide some degree of benefit to future data miners.

Organization Understanding
As stressed earlier, it is important to understand the organization's data, transactions, and environment. Organizations need to make sure that the professionals leading this process possess this experience and knowledge. Otherwise, they will not be able to understand the problems and objectives or to interpret the data mining results.

Data Understanding
This phase will help one get familiar with the data and identify data quality problems. It may also lead to capturing additional data to enhance the data mining process.

Data Preparation
Often it is necessary to obtain data from multiple data sources, which may be internal or external. Nevertheless, this process will include collecting, cleansing, and preparing the data. For additional information, please refer to Part One of this article.
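As a rough illustration of collecting and cleansing, the sketch below combines records from two sources and removes duplicates. The field names, the sample records, and the "later source wins" rule are assumptions made for this example, not a prescribed procedure.

```python
def clean(record):
    """Normalize one raw record: trim whitespace, lowercase the email, parse the balance."""
    return {
        "email": record["email"].strip().lower(),
        "name": record["name"].strip(),
        "balance": float(record["balance"]),
    }

def prepare(*sources):
    """Combine sources, skip records without an email, and de-duplicate by email.

    When the same email appears more than once, the later record replaces the
    earlier one (an assumed rule for this sketch).
    """
    merged = {}
    for source in sources:
        for raw in source:
            if not raw.get("email", "").strip():
                continue  # unusable without a key to match on
            rec = clean(raw)
            merged[rec["email"]] = rec
    return list(merged.values())

# Hypothetical internal and external sources.
billing = [
    {"email": " Ann@Example.com ", "name": "Ann Lee", "balance": "120.50"},
    {"email": "", "name": "Unknown", "balance": "0"},
]
web = [
    {"email": "ann@example.com", "name": "Ann Lee ", "balance": "130.00"},
    {"email": "bob@example.com", "name": "Bob Ray", "balance": "75"},
]

customers = prepare(billing, web)
```

In practice, the matching key and the rule for resolving conflicts depend on how much each source is trusted.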

Modeling
As mentioned earlier, data mining is an iterative process. The first model may not be optimal, and improving it may require changes in the data or in the parameters of the model. There are several techniques that professionals can employ in their data mining models, and each technique has advantages and disadvantages.

All of the available techniques will provide an answer. However, the quality of the results depends directly on the technique applied. Although the details are outside the scope of this article, it is critical to understand the available techniques so that you can align them with your goals.

Some of the techniques used in data mining include the following:

  • Neural Networks - A complex nonlinear modeling technique that is particularly effective for predicting future events based on prior experiences.

  • Fuzzy Logic - A type of logic that recognizes and can deal with more than true and false values, such as manipulating various degrees of "maybe".

  • Decision Trees - A tree-like way of representing a collection of hierarchical rules that lead to a class or value.

  • Genetic Algorithms - A technique that generates and tests combinations of possible input parameters to find the optimal output. The process is based on natural evolution concepts such as genetic combination, mutation, and natural selection.

  • Nearest Neighbor Method - Predictive technique suitable for classification models.

  • Rule Induction - A technique that derives a non-hierarchical set of conditions (rules) from the data, which are then used to predict values for new data items.
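To give a feel for one of the techniques listed above, here is a minimal nearest neighbor sketch in Python. The customer data, the two input fields, and the choice of k = 3 are hypothetical. A real model would also scale its inputs so that one field does not dominate the distance, and it would be tested on data held out from training.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical customers: (income in $000s, years as a customer) -> outcome.
train = [
    ((30, 1), "default"),
    ((35, 2), "default"),
    ((40, 1), "default"),
    ((80, 6), "repaid"),
    ((90, 8), "repaid"),
    ((75, 5), "repaid"),
]

prediction = knn_predict(train, (85, 7))
```

The method simply assumes that a new customer will behave like the most similar customers already on record, which is why the quality of the recorded data matters so much.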

Evaluation
Before deployment, it is important to evaluate the model and try to determine whether there are any serious flaws in its logic. For example, the model may say that if you promote a new product via direct mail that includes a small product sample to Caucasian females with incomes between $50,000 and $100,000, there is a 10% probability that the individual will purchase the product. One way to determine whether the model provides a reasonable prediction is to conduct a pilot test. If the model fails, then it is back to the previous phases.
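One simple way to read the results of such a pilot test is to check whether the predicted response rate falls inside an approximate 95% interval around the rate actually observed. The sketch below uses a normal approximation, which is an assumption of this example rather than a method the article prescribes. The function name and the pilot figures are hypothetical.

```python
import math

def pilot_consistent(predicted_rate, mailed, purchased, z=1.96):
    """Compare a predicted response rate with a pilot mailing.

    Returns (observed, low, high, ok), where ok is True when the predicted
    rate lies inside the approximate 95% interval around the observed rate.
    Assumes mailed > 0 and a pilot large enough for the normal approximation.
    """
    observed = purchased / mailed
    std_err = math.sqrt(observed * (1 - observed) / mailed)
    low, high = observed - z * std_err, observed + z * std_err
    return observed, low, high, low <= predicted_rate <= high

# Hypothetical pilot: 1,000 mailings and 92 purchases, against a 10% prediction.
observed, low, high, ok = pilot_consistent(0.10, mailed=1000, purchased=92)
```

With these made-up figures the 10% prediction is consistent with the pilot. Had only 40 of the 1,000 recipients purchased, the check would fail and the model would go back to the earlier phases.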

Deployment and Monitoring
Deployment of the model will generally require some form of modification to data capturing procedures. It may also require that the organization incorporate the model into real time operations such as making a determination of whether a credit card transaction is likely to be fraudulent.

Since whatever is being modeled is likely to change over time, it is important to have internal controls that regularly revalidate the models. Just because a model was valid yesterday does not mean it will be valid in the future.
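A revalidation control can be as simple as tracking how often recent predictions matched what actually happened. The following sketch is one possible approach. The window size, the 80% threshold, and the function name are assumptions chosen for illustration.

```python
def needs_revalidation(outcomes, window=100, min_accuracy=0.8):
    """Flag a model whose recent accuracy has dropped below a threshold.

    `outcomes` is a list of (predicted, actual) pairs, oldest first.
    Only the most recent `window` pairs are considered.
    """
    recent = outcomes[-window:]
    if not recent:
        return False  # nothing to judge yet
    correct = sum(1 for predicted, actual in recent if predicted == actual)
    return correct / len(recent) < min_accuracy

# Hypothetical history: 90 correct calls followed by 10 misses (90% accurate).
history = [(True, True)] * 90 + [(True, False)] * 10
stable = needs_revalidation(history)

# Fifty more misses pull the accuracy of the latest 100 predictions down to 40%.
drifted = needs_revalidation(history + [(True, False)] * 50)
```

Here `stable` is False and `drifted` is True, so the second case would send the model back for review.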

Conclusion
Savvy organizations use data mining tools and techniques because they understand the potential and added value these tools offer. The data mining process allows professionals to analyze, summarize, and help interpret information from large and multiple data sources more rapidly and with a greater degree of confidence. One can then use this knowledge to make critical decisions.

Even though data mining is an exciting technology with significant payback possibilities, keep in mind that without adequate training and experience, the process could lead to deficient findings, invalid conclusions, and costly organizational decisions.

2000, Smartpros Ltd. All Rights Reserved.
