Fraud and Fraud Detection - Gee Sunder
CHAPTER 2
Fraud Detection
DATA MINING VERSUS DATA ANALYSIS AND ANALYTICS
BusinessDictionary.com defines data mining as:
sifting through very large amounts of data for useful information. Data mining uses artificial intelligence techniques, neural networks, and advanced statistical tools (such as cluster analysis) to reveal trends, patterns, and relationships, which might otherwise have remained undetected.[10]
In other words, data mining searches large amounts of computerized data for trends, patterns, or relationships without testing a hypothesis. No specific result or outcome is anticipated in advance.
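As a minimal sketch of hypothesis-free pattern discovery, the Python below applies cluster analysis (one of the tools named in the definition above) in its simplest form: a one-dimensional k-means that groups transaction amounts without being told what to look for. The function `kmeans_1d`, its deterministic initialization, and the sample amounts are hypothetical illustrations, not a technique prescribed by the source.

```python
# Hypothetical illustration of data mining: cluster analysis with no prior hypothesis.

def kmeans_1d(values, k, iterations=100):
    """Group numbers into k clusters using Lloyd's k-means algorithm (1-D case)."""
    data = sorted(values)
    if k < 1 or k > len(data):
        raise ValueError("k must be between 1 and the number of values")
    # Deterministic start: spread the initial centroids evenly over the sorted data.
    if k == 1:
        centroids = [data[0]]
    else:
        centroids = [data[i * (len(data) - 1) // (k - 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in data:
            # Assign each value to its nearest centroid.
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    # Made-up transaction amounts; no fraud hypothesis is stated in advance.
    amounts = [12.5, 14.0, 13.2, 15.1, 250.0, 245.5, 260.0, 980.0, 1010.0, 995.5]
    centroids, clusters = kmeans_1d(amounts, k=3)
    for centre, members in zip(centroids, clusters):
        print(f"centre {centre:8.2f}: {members}")
```

With the sample data the amounts fall into three groups (small, mid-sized, and very large payments). Nothing about fraud was assumed beforehand; deciding whether any group deserves scrutiny is a later, hypothesis-driven step.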
BusinessDictionary.com defines data analysis as:
the process of evaluating data using analytical and logical reasoning to examine each component of the data provided. This form of analysis is just one of the many steps that must be completed when conducting a research experiment. Data from various sources is gathered, reviewed, and then analyzed to form some sort of finding or conclusion. There are a variety of specific data analysis method[s], some of which include data mining, text analytics, business intelligence, and data visualizations.[11]
Data mining is a subset of data analysis (or analytics). Unlike data mining, data analytics starts with a hypothesis that is to be confirmed or shown to be false; a conclusion is then drawn by inference from the findings.
Data analytics can be further divided into three types: exploratory data analysis (EDA), confirmatory data analysis (CDA), and qualitative data analysis (QDA).
EDA is the initial stage, in which the data is explored while little is known about its relationships. It is here that hypotheses are formed and new patterns or features of the data are discovered. Most EDA techniques are visual and graphical: they consist of plotting the data in various types of statistical graphs to gain insight into the data.
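EDA is usually done with a plotting library. To keep the example dependency-free, the sketch below, assuming Python's standard `statistics` module and made-up payment amounts, prints summary statistics and a crude text histogram instead of a chart. The helper names `summarize` and `text_histogram` are hypothetical.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


def text_histogram(values, bins=5, width=40):
    """Return one text line per equal-width bin, with a bar scaled to the largest count."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0  # fall back to width 1 if all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / step), bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    peak = max(counts)
    return [f"{lo + i * step:>10.2f} | {'#' * round(c / peak * width)} {c}"
            for i, c in enumerate(counts)]


if __name__ == "__main__":
    payments = [120, 95, 130, 110, 105, 98, 2500, 115, 102, 125]  # made-up data
    print(summarize(payments))
    for line in text_histogram(payments):
        print(line)
```

For the sample data the mean (350) is far above the median (112.5), and nine of the ten payments fall in the first bin while one sits alone in the last. An observation like that is the kind of starting point from which a hypothesis for CDA might be formed.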
CDA is the stage where testing takes place and hypotheses are confirmed or rejected. Results obtained from samples are generalized to the entire database, and causal (cause-and-effect) relationships are verified. In a cause-and-effect relationship, one variable is independent and the other dependent: the cause is the independent variable, which affects the dependent variable, the effect. For example, the amount of rainfall can be cited as a cause of grass growth. Care must be taken, however, because many events appear to be associated without actually having a cause-and-effect relationship.
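The text does not name a particular statistical test, so as one possible illustration (a swapped-in technique, not the author's), here is a minimal Python sketch of a two-sample permutation test. The hypothetical question is whether payments approved by one clerk are larger on average than everyone else's; the function name and data are made up. A small p-value only supports an association between the grouping and the amounts; as cautioned above, it does not by itself establish cause and effect.

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Null hypothesis: the group labels are interchangeable, i.e. both groups
    come from the same distribution. Returns (observed difference, p-value).
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = _mean(group_a) - _mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = _mean(pooled[:n_a]) - _mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Counting the observed labelling as one permutation keeps p above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    clerk_x = [410, 395, 420, 405, 430, 415]  # hypothetical sample
    others = [210, 190, 205, 220, 215, 200]   # hypothetical sample
    diff, p = permutation_test(clerk_x, others)
    print(f"observed difference {diff:.2f}, p = {p:.4f}")
```

The permutation approach was chosen because it needs no distributional assumptions and only the standard library; a t-test would be a common alternative.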
Online analytical processing (OLAP) tools are frequently used with the CDA process. They allow the user to extract data selectively and view the data from different perspectives or dimensions.
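Production OLAP tools work on precomputed multidimensional cubes over large databases. The Python below is only a toy, in-memory sketch of two of the operations they offer: rolling a measure up along chosen dimensions, and slicing the data by fixing a dimension's value. The field names (`region`, `month`, `vendor`, `amount`), the functions, and the rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact table: each row is one payment described by three dimensions.
PAYMENTS = [
    {"region": "East", "month": "2013-05", "vendor": "Acme", "amount": 1200.0},
    {"region": "East", "month": "2013-06", "vendor": "Bolt", "amount": 800.0},
    {"region": "West", "month": "2013-05", "vendor": "Acme", "amount": 450.0},
    {"region": "West", "month": "2013-06", "vendor": "Acme", "amount": 3100.0},
    {"region": "West", "month": "2013-06", "vendor": "Bolt", "amount": 150.0},
]


def roll_up(rows, dimensions, measure="amount"):
    """Total the measure for each combination of the chosen dimensions."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


def slice_rows(rows, **fixed):
    """Keep rows matching the fixed dimension values (one value: a slice; several: like a dice)."""
    return [r for r in rows if all(r[d] == v for d, v in fixed.items())]


if __name__ == "__main__":
    print(roll_up(PAYMENTS, ["region"]))           # totals by region
    print(roll_up(PAYMENTS, ["vendor", "month"]))  # same data, another perspective
    west = slice_rows(PAYMENTS, region="West")
    print(roll_up(west, ["vendor"]))               # West only, by vendor
```

Each call views the same five rows from a different dimension, which is the "different perspectives" idea in miniature.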
QDA is used to draw conclusions from non-numerical data such as images or text. Although it is typically employed in the social sciences, it can also be used in organizational audits of controls, procedures, and processes.
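Real QDA depends on the analyst's judgment in reading and interpreting the material; software mainly helps organize it. As a minimal sketch of that supporting role, assuming free-text audit interview notes and a hypothetical codebook of themes and indicative phrases, the Python below tags each note with matching codes and counts how often each theme appears. The codebook, phrases, and notes are invented for illustration.

```python
from collections import Counter

# Hypothetical codebook: each code (theme) is linked to indicative phrases.
CODEBOOK = {
    "override": ["override", "bypass"],
    "segregation_of_duties": ["same person", "prepared and approved"],
    "documentation_gap": ["no receipt", "missing invoice", "undocumented"],
}


def code_note(note, codebook=CODEBOOK):
    """Return the set of codes whose phrases appear in a free-text note."""
    text = note.lower()
    return {code for code, phrases in codebook.items()
            if any(p in text for p in phrases)}


def theme_counts(notes, codebook=CODEBOOK):
    """Count how many notes were tagged with each code."""
    counts = Counter()
    for note in notes:
        counts.update(code_note(note, codebook))
    return counts


if __name__ == "__main__":
    notes = [
        "Clerk said the manager asked her to bypass the second signature.",
        "Invoice 114 was paid with no receipt on file.",
        "The same person prepared and approved the cheque run.",
        "Missing invoice for courier charges; manager override noted.",
    ]
    for note in notes:
        print(sorted(code_note(note)), "-", note)
    print(theme_counts(notes))
```

Keyword matching is only a first pass; in practice the analyst would review and refine each assigned code.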
Data analytics provides insight into the dataset, uncovers underlying data relationships and structures, tests assumptions and hypotheses, identifies the variables involved in causal relationships, and detects anomalies.
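As one example of anomaly detection, here is a minimal Python sketch using the modified z-score (median and median absolute deviation, with the commonly cited 0.6745 scaling constant and 3.5 cutoff). The choice of method, the function name, and the sample amounts are assumptions for illustration. A median-based score is used because a single large outlier inflates an ordinary mean and standard deviation enough to partly hide itself.

```python
import statistics


def robust_outliers(values, threshold=3.5):
    """Return (index, value, score) for values whose modified z-score exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])  # median absolute deviation
    if mad == 0:
        # Most values sit exactly at the median; this simple sketch does not score them.
        return []
    flagged = []
    for i, v in enumerate(values):
        score = 0.6745 * (v - med) / mad
        if abs(score) > threshold:
            flagged.append((i, v, round(score, 2)))
    return flagged


if __name__ == "__main__":
    payments = [120, 95, 130, 110, 105, 98, 2500, 115, 102, 125]  # made-up data
    print(robust_outliers(payments))
```

For the sample data only the 2,500 payment (index 6) is flagged; the other nine stay well under the cutoff.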
10. BusinessDictionary.com, “Data Mining,” accessed July 1, 2013, www.businessdictionary.com/definition/data-mining.html.
11. BusinessDictionary.com, “Data Analysis,” accessed July 1, 2013, www.businessdictionary.com/definition/data-analysis.html.