Data protection for the prevention of algorithmic discrimination - Alba Soriano Arnanz

2. Data processing tools and technologies

2.1. Machine learning and data mining


To extract relevant knowledge from big data, a specific set of tools is needed, among them data mining and machine learning. Data mining is part of the knowledge discovery in databases process, which can be defined as “the overall process of discovering useful knowledge from data”.40 Within this process, data mining constitutes the step in which relevant relationships are extracted from the data. Data mining can thus be described as the part of the analysis in which algorithms are used to discover patterns in the data that would probably not be detected by human analysts.41

According to Hand, who defined data mining as early as 1998, it is “the process of secondary analysis of large databases aimed at finding unsuspected relationships which are of interest or value to the database owners”.42 Data mining is thus used to extract rules from the available data by obtaining implicit, previously unknown and potentially useful information.43
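As an illustration of what extracting rules from data can look like in practice, the following is a minimal sketch of association rule mining over a handful of shopping baskets. The data, the function name `mine_rules` and the two thresholds are hypothetical and chosen only for this example; real data mining tools work on far larger databases and use more efficient algorithms.

```python
from itertools import combinations

# Hypothetical toy data: each set is one customer's basket.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]


def mine_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    support    = share of baskets that contain both items
    confidence = share of baskets with the antecedent that also
                 contain the consequent
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        both = sum(1 for t in transactions if a in t and b in t)
        support = both / n
        if support < min_support:
            continue  # the pair is too rare to be of interest
        # Test the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            lhs_count = sum(1 for t in transactions if lhs in t)
            confidence = both / lhs_count
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


rules = mine_rules(transactions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With these toy baskets and thresholds, the only rules that survive link bread and butter in both directions. Nobody specified that relationship in advance; it was implicit in the data.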

Machine learning is a subfield of computer science that comprises a set of processes or methods which find existing correlations in datasets and use the discovered patterns to make predictions.44 Machine learning systems are able to constantly learn and improve over time through the use of techniques such as neural networks, which connect ideas in ways similar to human brains.45 These systems can develop as far as making ‘intelligent’ decisions similar to those that a human being in the same position would have made, which is why machine learning is generally considered a branch of Artificial Intelligence.46
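The idea of finding a correlation in a dataset and then using it to predict can be sketched with one of the simplest learning methods, a straight line fitted by least squares. This is only an assumed illustration, not one of the systems discussed in the text: the figures (hours of study and exam scores) are invented and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Learn slope and intercept from examples by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned pattern to a case the model has not seen."""
    slope, intercept = model
    return slope * x + intercept


# Invented training data: hours of study and the exam score obtained.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

model = fit_line(hours, scores)
predicted_score = predict(model, 6)
print(predicted_score)
```

The model is never told the rule linking hours and scores. It estimates that rule from past examples and applies it to a new case, which also means that any bias present in the past examples carries over into the prediction.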

Machine learning and data mining tend to work together,47 since the algorithms used to extract relevant relationships during the data mining phase are generally machine learning algorithms.48 It is quite difficult to establish a difference between data mining and machine learning, since both tools are used to extract relevant relationships and make predictions.49 One of the differences is that, while data mining mostly focuses on data collection and establishing relationships, machine learning is used to predict future outcomes and make decisions based on that information.50 Another key difference, one that becomes essential in dealing with the discriminatory and flawed outcomes of automated decision-making systems, is that the models produced in data mining are more interpretable but less accurate than machine learning models.51

One of the most relevant aspects of machine learning is that the algorithm will keep learning on its own as it receives more and more information, even when the model has already been deployed.52 An example of machine learning in use is music streaming platforms such as Spotify, in which the programme detects the user’s music tastes and offers her different playlists accordingly.53 Applications such as Spotify or Netflix, which offer recommendations, use a specific subset of machine learning tools known as deep learning,54 which, due to its effectiveness and efficiency, is increasingly used in different areas such as medicine55 or manufacturing.56
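The notion of a model that keeps learning after deployment can be illustrated with a very small sketch in which a listener's taste profile is updated every time she likes or skips a track. This is an assumed toy example and says nothing about how Spotify or Netflix actually work; the class `TasteProfile`, the genres and the learning rate are all hypothetical.

```python
class TasteProfile:
    """Toy online learner: one preference score per genre, between 0 and 1."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.scores = {}

    def update(self, genre, liked):
        """Move the genre's score towards 1 (liked) or 0 (skipped)."""
        target = 1.0 if liked else 0.0
        old = self.scores.get(genre, 0.5)  # unseen genres start out neutral
        self.scores[genre] = old + self.learning_rate * (target - old)

    def recommend(self):
        """Return the genre with the highest current score."""
        return max(self.scores, key=self.scores.get)


profile = TasteProfile()

# Feedback arriving while the system is already in use.
profile.update("jazz", liked=True)
profile.update("rock", liked=False)
profile.update("jazz", liked=True)
first_recommendation = profile.recommend()

# The user's behaviour changes, and the model follows without retraining.
for _ in range(3):
    profile.update("rock", liked=True)
second_recommendation = profile.recommend()

print(first_recommendation, second_recommendation)
```

Each new interaction changes the scores immediately, so the recommendation shifts from jazz to rock as the feedback changes. The same property means that the behaviour of a deployed system can drift away from what was tested before its release.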
