Data Science For Dummies - Lillian Pierson
Becoming familiar with machine learning terms
Before diving too deeply into a discussion of machine learning methods, you need to know about the (sometimes confusing) vocabulary associated with the field. Because machine learning is an offshoot of both traditional statistics and computer science, it has adopted terms from both fields and added a few of its own. Here is what you need to know:
Instance: The same as a row (in a data table), an observation (in statistics), and a data point. Machine learning practitioners are also known to call an instance a case.
Feature: The same as a column or field (in a data table) and a variable (in statistics). In regression methods, a feature is also called an independent variable (IV).
Target variable: The same as a predictand or dependent variable (DV) in statistics.
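To make this vocabulary concrete, here is a minimal Python sketch built on a small, made-up housing table. The column names and values are hypothetical and chosen only for illustration. Each dict is one instance, the `sqft` and `bedrooms` keys are features, and `price` is the target variable.

```python
# Hypothetical toy dataset: each dict is one instance (row / observation / case).
records = [
    {"sqft": 850, "bedrooms": 2, "price": 210_000},
    {"sqft": 1200, "bedrooms": 3, "price": 295_000},
    {"sqft": 1750, "bedrooms": 4, "price": 410_000},
]

# Features (columns / fields / independent variables) are the model's inputs.
feature_names = ["sqft", "bedrooms"]
# The target variable (dependent variable) is what the model learns to predict.
target_name = "price"

# Split every instance into its feature vector X[i] and its target value y[i].
X = [[row[name] for name in feature_names] for row in records]
y = [row[target_name] for row in records]

print(len(X), "instances,", len(feature_names), "features")  # 3 instances, 2 features
print(X[0], "->", y[0])                                       # [850, 2] -> 210000
```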
In machine learning, feature selection is a relatively straightforward process of choosing the most appropriate variables from those already in your dataset. Feature engineering is more demanding: you need substantial domain expertise and strong data science skills to manually design new input variables from the underlying dataset. You use feature engineering when your model needs a better representation of the problem being solved than the raw dataset provides.
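The contrast between the two can be sketched in a few lines of Python. Treat this as an illustration under stated assumptions, not a recipe: the dataset and column names are hypothetical, feature selection is shown with just one simple filter (dropping columns whose values never vary, similar in spirit to a variance threshold), and the engineered features `age` and `sqft_per_bedroom` stand in for the kind of variables a domain expert might design.

```python
import statistics

# Hypothetical raw dataset; each dict is one instance.
records = [
    {"sqft": 850,  "bedrooms": 2, "year_built": 1975, "has_roof": 1, "price": 210_000},
    {"sqft": 1200, "bedrooms": 3, "year_built": 1990, "has_roof": 1, "price": 295_000},
    {"sqft": 1750, "bedrooms": 4, "year_built": 2010, "has_roof": 1, "price": 410_000},
]
raw_features = ["sqft", "bedrooms", "year_built", "has_roof"]


def select_by_variance(rows, names, threshold=0.0):
    """Feature selection: keep existing columns whose variance exceeds threshold.

    A column that never varies (has_roof here) cannot help tell instances
    apart, so it is dropped. No new columns are created.
    """
    return [n for n in names
            if statistics.pvariance([r[n] for r in rows]) > threshold]


def engineer_features(row, reference_year=2024):
    """Feature engineering: derive new inputs using (assumed) domain knowledge.

    reference_year is an illustrative assumption, not a real data field.
    """
    return {
        "age": reference_year - row["year_built"],
        "sqft_per_bedroom": row["sqft"] / row["bedrooms"],
    }


selected = select_by_variance(records, raw_features)
engineered = [engineer_features(r) for r in records]
print(selected)       # ['sqft', 'bedrooms', 'year_built']
print(engineered[0])  # {'age': 49, 'sqft_per_bedroom': 425.0}
```

Selection only filters columns that already exist, which is why it can be automated with a simple rule; the engineering step depends on knowing that a house's age or its space per bedroom might matter, which is the domain expertise the text refers to.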
Although machine learning is often mentioned in the context of data science and artificial intelligence, these terms are separate and distinct. Machine learning is a practice within data science, but there is more to data science than just machine learning, as you will learn throughout this book. Artificial intelligence is a term that describes autonomously acting agents, and it often, but not always, involves data science and machine learning. In some cases AI agents are robots; in others they are software applications. If the agent’s actions are triggered by outputs from an embedded machine learning model, then the AI is powered by data science and machine learning. On the other hand, if the AI’s actions are governed by a rules-based decision mechanism, you can have AI that doesn’t involve machine learning or data science at all.
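The difference between AI that is powered by machine learning and AI that is not can be illustrated with a toy thermostat agent in Python. Everything here is hypothetical: the temperatures, the 19 °C rule, and the one-feature decision stump used as the embedded model, which is a deliberately tiny stand-in for a real machine learning model. Both agents act autonomously. The first one's decision comes from a hand-written rule, while the second one's comes from a threshold learned from labeled data.

```python
from typing import Callable

# A hypothetical agent observes a temperature reading and chooses an action.
Agent = Callable[[float], str]


def rules_based_agent(temperature_c: float) -> str:
    """AI without ML: a hand-written rule governs the action (19.0 is an assumed rule)."""
    return "heat_on" if temperature_c < 19.0 else "heat_off"


def train_stump(xs: list[float], labels: list[bool]) -> float:
    """Learn a one-feature decision threshold (a decision stump) from labeled data.

    Tries each observed value as a cutoff for the rule 'x < cutoff -> True'
    and keeps the first cutoff with the fewest training errors.
    """
    best_cut, best_errors = xs[0], len(xs) + 1
    for cut in sorted(set(xs)):
        errors = sum((x < cut) != y for x, y in zip(xs, labels))
        if errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut


def make_ml_agent(cutoff: float) -> Agent:
    """AI powered by ML: the action is triggered by the embedded model's output."""
    def agent(temperature_c: float) -> str:
        wants_heat = temperature_c < cutoff  # the model's prediction
        return "heat_on" if wants_heat else "heat_off"
    return agent


# Hypothetical training data: readings and whether the occupant wanted heat.
temps = [15.0, 17.0, 18.0, 20.0, 22.0, 24.0]
wanted_heat = [True, True, True, False, False, False]
ml_agent = make_ml_agent(train_stump(temps, wanted_heat))

print(rules_based_agent(19.5))  # heat_off
print(ml_agent(19.5))           # heat_on
```

At 19.5 °C the two agents disagree, because one follows a fixed rule and the other follows what it learned from the data.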