Data Science For Dummies - Lillian Pierson - Page 79

Logistic regression

Logistic regression is a machine learning method you can use to estimate values for a categorical target variable based on your selected features. Your target variable should be numeric and should contain values that describe the target’s class — or category. One cool aspect of logistic regression is that, in addition to predicting the class of observations in your target variable, it indicates the probability for each of its estimates. Though logistic regression is like linear regression, its requirements are simpler, in that:

There doesn’t need to be a linear relationship between the features and the target variable itself; the model assumes linearity only between the features and the log-odds of the target.

Residuals don’t have to be normally distributed.

Predictive features aren’t required to have a normal distribution.
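To make the idea of getting both a class and a probability more concrete, here is a minimal sketch of how a logistic model that has already been fitted turns features into an estimate. The function names, coefficients, and the 0.5 cutoff are assumptions chosen for illustration; they are not fitted to any real data and do not come from a particular library.

```python
import math

def predict_proba(features, weights, bias):
    """Return the estimated P(class = 1) for one observation."""
    # Linear score (the log-odds): bias + sum of weight * feature.
    score = bias + sum(w * x for w, x in zip(weights, features))
    # The logistic (sigmoid) function maps the score into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-score))

def predict_class(features, weights, bias, threshold=0.5):
    """Return 1 if the estimated probability reaches the threshold, else 0."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0

# Hypothetical, hand-picked coefficients (illustration only, not fitted).
weights = [0.8, -1.2]
bias = 0.1

observation = [2.0, 0.5]
probability = predict_proba(observation, weights, bias)
label = predict_class(observation, weights, bias)
print(f"P(class=1) = {probability:.3f}, predicted class = {label}")
```

In practice you would let a library estimate the weights and bias from your data; the point of the sketch is only that the predicted class is the probability compared against a cutoff.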

When deciding whether logistic regression is a good choice for you, consider the following limitations:

Missing values should be treated or removed.

Your target variable must be binary or ordinal. Binary classification, the most common case, codes the target as 1 for “yes” and 0 for “no.”

Predictive features should be independent of each other.

Logistic regression requires a greater number of observations than linear regression to produce a reliable result. The rule of thumb is that you should have at least 50 observations per predictive feature if you expect to generate reliable results.
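The limitations above can be turned into a rough pre-flight check before you fit a model. The sketch below is one possible way to do that using only the Python standard library. The function names, the use of None to mark missing values, and the 0.8 correlation cutoff are assumptions for illustration. It covers only the binary (0/1) case, and pairwise correlation is just a simple proxy for feature independence.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation is undefined, treated as 0 here
    return cov / math.sqrt(var_x * var_y)

def check_readiness(rows, target, min_obs_per_feature=50, max_abs_corr=0.8):
    """Return a list of warnings; an empty list means no issue was found."""
    problems = []

    # 1. Missing values (marked as None here) should be treated or removed.
    if any(v is None for row in rows for v in row) or any(t is None for t in target):
        problems.append("missing values present")
        return problems  # the remaining checks need complete data

    # 2. Binary case only: the target should be coded 0 ("no") or 1 ("yes").
    if not set(target) <= {0, 1}:
        problems.append("target is not coded as 0/1")

    # 3. Rule of thumb from the text: at least 50 observations per feature.
    n_obs = len(rows)
    n_features = len(rows[0]) if rows else 0
    if n_obs < min_obs_per_feature * n_features:
        problems.append(
            f"only {n_obs} observations for {n_features} features"
        )

    # 4. Flag strongly correlated feature pairs (cutoff is an assumption).
    columns = list(zip(*rows))
    for i in range(n_features):
        for j in range(i + 1, n_features):
            r = pearson(columns[i], columns[j])
            if abs(r) > max_abs_corr:
                problems.append(
                    f"features {i} and {j} are highly correlated (r = {r:.2f})"
                )
    return problems

# Tiny made-up example: too few rows, and the second feature tracks the first.
rows = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2]]
target = [0, 0, 1, 1]
for problem in check_readiness(rows, target):
    print(problem)
```

An empty result does not guarantee a good model; it only means none of these four simple checks raised a flag.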

Predicting survivors on the Titanic is the classic practice problem for newcomers learning logistic regression. You can practice it and see lots of worked examples of this problem on Kaggle (www.kaggle.com/c/titanic).
