Читать книгу Data Science For Dummies - Lillian Pierson - Страница 65

Probability distributions

Оглавление

Imagine that you’ve just rolled into Las Vegas and settled into your favorite roulette table over at the Bellagio. When the roulette wheel spins off, you intuitively understand that there is an equal chance that the ball will fall into any of the slots of the cylinder on the wheel. The slot where the ball lands is totally random, and the probability, or likelihood, of the ball landing in any one slot over another is the same. Because the ball can land in any slot, with equal probability, there is an equal probability distribution, or a uniform probability distribution — the ball has an equal probability of landing in any of the slots in the wheel.

But the slots of the roulette wheel aren’t all the same — the wheel has 18 black slots and 20 slots that are either red or green.


Because of this arrangement, the probability that your ball will land on a black slot is 47.4%.

Your net winnings here can be considered a random variable, which is a measure of a trait or value associated with an object, a person, or a place (something in the real world) that is unpredictable. Because this trait or value is unpredictable, however, doesn’t mean that you know nothing about it. What’s more, you can use what you do know about this thing to help you in your decision-making. Keep reading to find out how.

A weighted average is an average value of a measure over a very large number of data points. If you take a weighted average of your winnings (your random variable) across the probability distribution, this would yield an expectation value — an expected value for your net winnings over a successive number of bets. (An expectation can also be thought of as the best guess, if you had to guess.) To describe it more formally, an expectation is a weighted average of some measure associated with a random variable. If your goal is to model an unpredictable variable so that you can make data-informed decisions based on what you know about its probability in a population, you can use random variables and probability distributions to do this.

When considering the probability of an event, you must know what other events are possible. Always define the set of events as mutually exclusive — only one can occur at a time. (Think of the six possible results of rolling a die.) Probability has these two important characteristics:

 The probability of any single event never goes below 0.0 or exceeds 1.0.

 The probability of all events always sums to exactly 1.0.

Probability distribution is classified per these two types:

 Discrete: A random variable where values can be counted by groupings

 Continuous: A random variable that assigns probabilities to a range of value

To understand discrete and continuous distribution, think of two variables from a dataset describing cars. A “color” variable would have a discrete distribution because cars have only a limited range of colors (black, red, or blue, for example). The observations would be countable per the color grouping. A variable describing cars’ miles per gallon, or mpg, would have a continuous distribution because each car could have its own, separate value for miles per gallon (mpg) that it gets on average.

 Normal distributions (numeric continuous): Represented graphically by a symmetric bell-shaped curve, these distributions model phenomena that tend toward some most-likely observation (at the top of the bell in the bell curve); observations at the two extremes are less likely.

 Binomial distributions (numeric discrete): These distributions model the number of successes that can occur in a certain number of attempts when only two outcomes are possible (the old heads-or-tails coin flip scenario, for example). Binary variables — variables that assume only one of two values — have a binomial distribution.

 Categorical distributions (non-numeric): These represent either non-numeric categorical variables or ordinal variables (an ordered categorical variable, for example the level of service offered by most airlines is ordinal because they offer first class, business class, and economy class seats).

Data Science For Dummies

Подняться наверх