So much data is now collected that, with modern computing power and statistical methods, it has become relatively easy to search these data for relationships among variables that can be used to target efforts in many fields, including marketing, healthcare and government. While the term data mining is not entirely new, it has become much better known in the past few years. So what is data mining, and how is it used?
Defining Data Mining
The basic idea behind data mining is to search a large database – such as one on customers’ spending habits online or in a store – to find relationships among the many variables or categories of data. Powerful software has been developed to search ever-larger databases to:
- determine relationships among variables that may not otherwise be realized
- find patterns in the behavior of those whose information is in the database
- locate groups or clusters of information that are related in some way
- find new ways of organizing the data that may give better overall information
- provide forecasts for future developments or needs
Data that can later be mined can come from many sources. For example, when a person shops online, the retailer's computer system stores what and how much is purchased, when and how often it is purchased, where the person lives, which payment types the person prefers and which shipping method is chosen. Businesses store hundreds of thousands of pieces of information on personnel, production, sales, waste and many other categories. Hospitals and other medical facilities store information on such things as drugs administered to patients, length of hospital stay, services provided, personnel who provide care and types of equipment and supplies that are used. As more and more data are collected, data mining can help businesses, industries, government agencies and anyone else who uses the information to improve areas such as marketing, production of goods and, for government agencies, regulation.
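To make the retail case concrete, here is a minimal sketch of one simple way to look for relationships in purchase records: counting how often two items show up in the same order. The order data, the item names and the `pair_support` function are all invented for illustration and do not come from any real retailer or library.

```python
from collections import Counter
from itertools import combinations

# Hypothetical orders: each set lists the items in one purchase.
orders = [
    {"printer", "ink", "paper"},
    {"printer", "ink"},
    {"paper", "pens"},
    {"ink", "paper"},
    {"printer", "ink", "pens"},
]

def pair_support(orders):
    """Return the share of orders in which each pair of items appears together."""
    counts = Counter()
    for order in orders:
        # Sort so ("ink", "printer") and ("printer", "ink") count as one pair.
        for pair in combinations(sorted(order), 2):
            counts[pair] += 1
    return {pair: n / len(orders) for pair, n in counts.items()}

support = pair_support(orders)
top_pair, top_share = max(support.items(), key=lambda kv: kv[1])
print(top_pair, top_share)
```

In this toy data, ink and printers are bought together in three of the five orders, more often than any other pair. Real data mining software applies the same idea to far larger databases and uses more efficient algorithms than this brute-force count.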
As an example, in a telecommunications study, process mining was used on unstructured data from a company’s customer relationship management system to describe the “typical customer fulfillment business process” (Mahendrawathi, Astuti, & Nastiti, 2015). The database consisted of employee entries for 5,809 cases, such as calls for installation of telephone service and calls for repairs. The entries covered 673 different business processes, each running from a customer's first request for telephone service to successful completion of the service. The major component of the process was the handling of the work order.
From the data mining analysis, the authors were able to determine 18 “typical processes,” those that occurred in at least 50 cases. By understanding the typical cases, the company could identify the standard time it takes to fulfill a request and make better use of its personnel. For example, knowing that 40% of the cases require installation work, the company could give field supervisors the personnel and equipment needed to service these requests, which would minimize the time needed to complete the process.
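The idea of a "typical process" can be sketched in a few lines of code: treat each case as an ordered sequence of activities, count how many cases follow each distinct sequence, and keep the sequences that meet a minimum count. This is only an illustration of that counting step, not the authors' method or software. The case IDs, activity names and the `typical_variants` function are hypothetical, and the threshold is set to 2 rather than 50 because the toy log is tiny.

```python
from collections import Counter

def typical_variants(cases, min_cases):
    """Count how often each activity sequence occurs and keep the frequent ones."""
    counts = Counter(tuple(activities) for activities in cases.values())
    return {variant: n for variant, n in counts.items() if n >= min_cases}

# Hypothetical log: case ID -> ordered list of activities recorded for that case.
cases = {
    "c1": ["request", "work order", "installation", "close"],
    "c2": ["request", "work order", "installation", "close"],
    "c3": ["request", "work order", "repair", "close"],
    "c4": ["request", "work order", "installation", "close"],
    "c5": ["request", "cancel"],
}

typical = typical_variants(cases, min_cases=2)
for variant, n in typical.items():
    print(n, " -> ".join(variant))
```

Here only the installation sequence qualifies, since it is followed by three of the five cases. The other two sequences occur once each and are filtered out.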
Interpreting the Findings
While data mining can be a powerful tool, the user has to analyze the search findings carefully before taking action. The reason is that categories of data with no real connection can appear related purely by chance, especially when a search compares many variables at once. As with much of science, including data science, the information that is produced must be analyzed by people and employed properly. Training and experience make those who use data mining better at interpreting the findings of the computer programs. As data mining grows in popularity, more people will specialize in this area, and training will grow accordingly.
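A small experiment shows why this caution matters. The sketch below generates 40 columns of purely random numbers with 20 rows each, so that no column has any real connection to another, and then searches all 780 pairs for the strongest correlation. The sizes, the seed and the `pearson` helper are arbitrary choices made for this illustration.

```python
import random
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

random.seed(0)  # fixed seed so repeated runs give the same numbers
n_variables, n_rows = 40, 20

# Every column is independent random noise; no true relationship exists.
columns = [[random.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_variables)]

correlations = {
    (i, j): pearson(columns[i], columns[j])
    for i, j in combinations(range(n_variables), 2)
}
strongest_pair, strongest_r = max(correlations.items(), key=lambda kv: abs(kv[1]))
print(len(correlations), strongest_pair, round(strongest_r, 2))
```

With so many pairs to choose from and so few rows, the strongest pair will usually show a correlation far from zero even though the data are noise. A finding like this would need to be checked against new data or subject knowledge before anyone acted on it.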
Data mining also raises ethical and security considerations that go beyond the scope of this article. Experts in the field have presented differing views on the practice, and the debate over its usefulness and appropriateness will continue as it becomes a more popular technique in decision making. If data mining is of interest to you, you may want to consider pursuing a degree in data science.
Covell, D.G. (2015). Data mining approaches for genomic biomarker development: Applications using drug screening data from the Cancer Genome Project and the Cancer Cell Line Encyclopedia. PLoS ONE, 10(7), e0127433. doi: 10.1371/journal.
Mahendrawathi, E.R., Astuti, H.M., & Nastiti, A. (2015). Analysis of customer fulfilment with process mining: A case study in a telecommunication company. Procedia Computer Science, 72, 588-596.