Probability

Probability is the mathematics of uncertainty.

Specifically, we use probability to measure the likelihood that a specific event will occur when performing an experiment.

We quantify probability as a number between {0} and {1}: a probability of {0} means the event cannot occur, and a probability of {1} means the event is certain to occur.

Here, we will introduce the mathematical definition of probability and all the terminology that’s associated with it.

This introduction is for students who have had one year of algebra education.

Experiment and probability

In statistics, an experiment is any process that produces a specific outcome.

Here are some examples of experiments: flipping a coin, rolling a pair of dice, drawing a card from a shuffled deck, measuring a patient’s blood pressure, or counting the cars that pass through an intersection in one hour.

As you can see, the term “experiment” has a much broader meaning in statistics than it does in ordinary English.

Probability is the measurement of the likelihood that an experiment will produce a particular outcome (or set of outcomes).

In the next eight sections, we’ll introduce the terminology that’s used in the study of probability.

As we present the terminology, we’ll apply it to an example experiment that consists of rolling a pair of {6}-sided dice, producing one of {36} possible outcomes. (The full list of outcomes appears in the Sample space section below.)

Random experiment

A random experiment is an experiment that’s designed to maximize the unpredictability of its outcome.

An experiment can be made random by performing an unpredictable physical process, or by making a selection based on a random number generator.

Randomization is useful because it helps the experiment produce unbiased outcomes. Unbiased outcomes are required if we want to obtain statistical results that are as accurate as possible.

In our dice-rolling example, we’ll make the experiment more random by rolling the dice in a way that causes them to tumble or collide many times.

Outcome and trial

An outcome is the elementary result of an experiment.

Each outcome must be distinct, and cannot overlap with any other outcome. An experiment cannot produce two different outcomes at the same time.

To obtain an outcome, we perform a trial:

A trial is one performance of an experiment, resulting in one outcome.

In our dice-rolling example, we’ll define {(a,b)} to represent the outcome, where {a} and {b} are the two numbers that appear on the two dice.

If {a \ne b} then {(a,b)} is considered to be a different outcome than {(b,a).} We’ll assume that the two dice have different colors, so we can tell which one is {\text{“}a\text{”}} and which one is {\text{“}b\text{”}.}

An example of a trial outcome would be {(a,b) = (4,6).}

Sample space

A sample space is the set of all possible outcomes for an experiment.

In our dice-rolling example, the sample space is:

{ S = \lbrace \, } {(1,1),} {(1,2),} {(1,3),} {(1,4),} {(1,5),} {(1,6),}
{(2,1),} {(2,2),} {(2,3),} {(2,4),} {(2,5),} {(2,6),}
{(3,1),} {(3,2),} {(3,3),} {(3,4),} {(3,5),} {(3,6),}
{(4,1),} {(4,2),} {(4,3),} {(4,4),} {(4,5),} {(4,6),}
{(5,1),} {(5,2),} {(5,3),} {(5,4),} {(5,5),} {(5,6),}
{(6,1),} {(6,2),} {(6,3),} {(6,4),} {(6,5),} {(6,6) \, \rbrace}

The number of outcomes in the sample space {S} is:

{|S| = 36}

We place vertical bars around the name of a set to indicate the number of outcomes it contains.
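
If you’d like to verify this count for yourself, here’s a small Python sketch (the variable names are just illustrative choices) that builds the sample space for a pair of {6}-sided dice and counts its outcomes:

```python
from itertools import product

# Sample space for rolling a pair of 6-sided dice:
# every ordered pair (a, b) with a and b ranging from 1 to 6.
S = set(product(range(1, 7), repeat=2))

print(len(S))  # |S| = 36
```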

Sample

A sample is a sequence of outcomes that are obtained by performing one or more trials of an experiment.

A random sample is a sample that’s obtained from a random experiment.

For example, let’s roll the dice {5} times, and suppose we get the following sample:

sample: { \left\lbrace \, \eqalign { {s_{\large 1} = (4,6)} \\[-1pt] {s_{\large 2} = (3,1)} \\[-1pt] {s_{\large 3} = (5,5)} \\[-1pt] {s_{\large 4} = (2,4)} \\[-1pt] {s_{\large 5} = (3,1)} \\[-1pt] } \right. }

Event

An event is a particular set of outcomes that we’re interested in obtaining, or that we consider to be desirable.

For example, let’s say that we want to get a dice roll that totals {10.} We can express this by defining the event {E} as:

{E = \lbrace \, (4,6), (5,5), (6,4) \, \rbrace }

There are {3} different ways to roll a total of {10,} which we can state like this:

{|E| = 3}

Each time we perform a trial, we check to see if the outcome is in the set {E} or not: if it is, we say that the event {E} occurred in that trial; if it isn’t, the event {E} did not occur.

For example, if we look at the sample from the previous section, we can say:

in trial {1{:}\ } {s_{\large 1} = (4,6), \ } so the event {E} occurred
in trial {2{:}\ } {s_{\large 2} = (3,1), \ } so the event {E} did not occur
in trial {3{:}\ } {s_{\large 3} = (5,5), \ } so the event {E} occurred
in trial {4{:}\ } {s_{\large 4} = (2,4), \ } so the event {E} did not occur
in trial {5{:}\ } {s_{\large 5} = (3,1), \ } so the event {E} did not occur

In general, an event {E} is a subset of the sample space {S}. Therefore, {0 \le |E| \le |S|.}

If {E} has no outcomes, then {|E| = 0,} and the event will never occur.

If {E} has all possible outcomes, then {|E| = |S|,} and the event will always occur.

If {0 \lt |E| \lt |S|,} then there is uncertainty about whether the event will occur.


A sample and an event are similar in that they both consist of outcomes. However, there’s an important difference between them: a sample is the sequence of outcomes we actually obtain by performing trials, while an event is a set of outcomes that we specify in advance, before any trials are performed.

Simple event

A simple event is an event that consists of exactly one outcome.

In our dice-rolling example, let’s define the simple event {F} as:

{F = \lbrace \, (6,6) \, \rbrace }

{F} is a simple event because:

{|F| = 1}

Event description

An event description is a statement that’s true for every outcome in the event, and is false for every outcome that’s not in the event.

In a previous section, we defined the dice-rolling event {E = \lbrace \, (4,6), (5,5), (6,4) \, \rbrace.} This event could be described as:

“the outcome {(a,b)} satisfies {a + b = 10}”

or, more informally:

“sum {= 10}”

Event descriptions are especially useful when the event contains a large number of outcomes.
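
One way to think of an event description is as a true/false test that we apply to each outcome. The sketch below (a rough illustration, not the only way to do it) constructs the event {E} directly from its description “sum {= 10}”:

```python
from itertools import product

# The sample space: all 36 ordered pairs (a, b).
S = set(product(range(1, 7), repeat=2))

# The event description "a + b = 10" is true for exactly the
# outcomes in E, so we can use it to build E from S.
E = {(a, b) for (a, b) in S if a + b == 10}

print(sorted(E))  # [(4, 6), (5, 5), (6, 4)]
print(len(E))     # |E| = 3
```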

Probability measure

A probability measure assigns a numerical value between {0} and {1} to each event.

We use the notation {P(E)} to represent the probability that the event {E} will occur when performing a trial.

If all of the outcomes in the sample space are equally likely, then we can say:

[[ P(E) = {{|E|} \over {|S|}} = { \small{\text{the number of outcomes in } E} \over \small{\text{the number of outcomes in } S} } ]]

In our dice-rolling example, we defined two events:

{E = \lbrace \, (4,6), (5,5), (6,4) \, \rbrace }

{F = \lbrace \, (6,6) \, \rbrace }

All {36} outcomes in {S} are equally likely, so we can say that the probability measures of the events {E} and {F} are:

[[ P(E) = {{|E|} \over {|S|}} = {{3} \over {36}} \approx 0.0833 ]]

[[ P(F) = {{|F|} \over {|S|}} = {{1} \over {36}} \approx 0.0278 ]]
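
Here’s a quick sketch that reproduces these two probability measures by counting outcomes, assuming (as above) that all {36} outcomes are equally likely:

```python
from itertools import product
from fractions import Fraction

S = set(product(range(1, 7), repeat=2))
E = {(a, b) for (a, b) in S if a + b == 10}  # sum = 10
F = {(6, 6)}                                 # double six

# P(event) = |event| / |S| when all outcomes are equally likely.
print(Fraction(len(E), len(S)))  # 1/12  (= 3/36, about 0.0833)
print(Fraction(len(F), len(S)))  # 1/36  (about 0.0278)
```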

Probability: the complement rule

Every event {E} has a complementary event called {\text{“}}not {E \, \text{”}} that contains all the outcomes that are not in {E.}

Here, we’ll use {\skew3\overline{E}} as shorthand for {\text{“}}not {E \, \text{”}.}

In our dice-rolling example, we can describe {E} and {\skew3\overline{E}} as:

{E }:  sum {= 10}
{\skew3\overline{E}}:  sum {\ne 10}

The numbers of outcomes in these events are:

{|E| = 3} {(}there are {3} ways to roll a sum {= 10)}
{|\skew3\overline{E}| = 33} {(}there are {33} ways to roll a sum {\ne 10)}

In general, every event {E} and its complementary event {\skew3\overline{E}} satisfy this equation:

{|E| + |\skew3\overline{E}| = |S|}

Dividing both sides of the equation by {|S|,} we get:

[[ {{|E|} \over {|S|}} + {{|\skew3\overline{E}|} \over {|S|}} = 1 ]]

which allows us to express it in terms of probability measure:

{ P(E) + P(\skew3\overline{E}) = 1 }

This result gives us the complement rule for probability:

Complement rule for probability

If {E} and {\skew3\overline{E}} are complementary events, then:

[[ P({\skew3\overline{E}}) \ = \ 1 - P(E)]]
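
The complement rule is easy to confirm by counting. A minimal sketch, using the event {E} (sum {= 10}) from our dice-rolling example:

```python
from itertools import product
from fractions import Fraction

S = set(product(range(1, 7), repeat=2))
E = {(a, b) for (a, b) in S if a + b == 10}
not_E = S - E  # the complementary event: every outcome not in E

P_E = Fraction(len(E), len(S))
P_not_E = Fraction(len(not_E), len(S))

print(P_not_E == 1 - P_E)  # True: P(not E) = 1 - P(E)
```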

Probability: the addition rule

In our dice-rolling example, we have two events {E} and {F,} which can be described as:

{E}:  sum {= 10}
{F}:  sum {= 12}

The numbers of outcomes in these events are:

{|E| = 3} {(}there are {3} ways to roll a sum {= 10)}
{|F| = 1} {(}there is {1} way to roll a sum {= 12)}

We can define a new event called { \text{“} E \text{ or } F \hspace+1pt \text{”} } that occurs if either {E} or {F} occurs. Specifically:

{E \text{ or } F \, = \, \lbrace \, (4,6), (5,5), (6,4), (6,6) \, \rbrace}  {(}sum {= 10} or sum {= 12)}
{|E \text{ or } F| \, = \, |E| + |F| \ = \ 4 }

Dividing by {|S|,} we get:

[[ { {|E \text{ or } F|} \over {|S|}} \ = \ {{|E|} \over {|S|}} + {{|F|} \over {|S|}} \ = \ {{4} \over {36}} ]]

which allows us to express it in terms of probability measure:

[[ P(E \text{ or } F)\ = \ P(E) + P(F) \ = \ {{4} \over {36}} ]]

This result gives us the addition rule for probability:

Addition rule for probability

If event {E} and event {F} are mutually exclusive events, then:

{ P(E \text{ or } F) \, = \, P(E) + P(F) }

where:

Two events are mutually exclusive if they have no outcomes in common.

If two events are mutually exclusive, then it’s not possible for both events to occur in the same trial. So, in order to use the addition rule, { \text{“} E \text{ or } F \hspace+1pt \text{”} } really needs to mean { \text{“} E \text{ or } F } but both can’t occur{\text{”}.} For example, you can roll a sum of {10,} or you can roll a sum of {12,} but you can’t roll both at the same time.

Recall that a simple event is an event that has only one outcome. Since all simple events are mutually exclusive with each other, we can always use the addition rule on them.
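
As a check, here’s a sketch that verifies the addition rule for the mutually exclusive events {E} (sum {= 10}) and {F} (sum {= 12}):

```python
from itertools import product
from fractions import Fraction

S = set(product(range(1, 7), repeat=2))
E = {(a, b) for (a, b) in S if a + b == 10}  # sum = 10
F = {(a, b) for (a, b) in S if a + b == 12}  # sum = 12

assert E & F == set()  # mutually exclusive: no outcomes in common

P_E_or_F = Fraction(len(E | F), len(S))
print(P_E_or_F)                                                         # 1/9 (= 4/36)
print(P_E_or_F == Fraction(len(E), len(S)) + Fraction(len(F), len(S)))  # True
```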

Probability: exercises (part 1)

Here’s a summary of the probability rules that we’ve seen so far:

[[ P({\skew3\overline{E}}) \ = \ 1 - P(E) ]] if {\skew3\overline{E}} is complementary to {E}
{ P(E \text{ or } F) \, = \, P(E) + P(F) } if {E} is mutually exclusive with {F}

In the following exercises, assume that we roll a pair of dice.

  1. Find the probability of getting a sum {\le 4.}

  2. Find the probability of not getting a sum of {6.}

Probability: the multiplication rule

Up until now, we’ve been rolling both dice together in the dice-rolling experiment.

In this section, we’re going to do it differently — we’ll just roll one cube at a time, and we’ll treat each individual cube-roll as a separate experiment.

As a result, the sample space {S} is smaller now, because only one cube is rolled in the experiment:

{S = \lbrace \, \text{1, 2, 3, 4, 5, 6} \, \rbrace }

{|S| = 6 }

Let’s now define two new events, {A} and {B}:

{A = \lbrace \, \text{2, 3, 4} \, \rbrace } {(}roll one cube, and get an outcome of {2,} {3,} or {4)}
{B = \lbrace \, \text{5, 6} \, \rbrace } {(}roll one cube, and get an outcome of {5} or {6)}

The numbers of outcomes in these events are:

{|A| = 3}
{|B| = 2}

so their probability measures are:

[[ P(A) = {{|A|} \over {|S|}} = {{3} \over {6}} ]]

[[ P(B) = {{|B|} \over {|S|}} = {{2} \over {6}} ]]

We can define a new event called { \text{“} A \text{ and } B \hspace+1pt \text{”} } that occurs if {A} and then {B} occur when performing two trials of the experiment.

The following diagram shows all possible outcomes for both trials, where the first number in each pair is the outcome of the first trial, and the second number is the outcome of the second trial.

The diagram also shows which outcomes are in event {A,} and which outcomes are in event {B,} with each event enclosed in a black rectangle:

Looking at the intersection of the black rectangles, we can see which outcomes are in both {A} and {B}:

{ A \text{ and } B = \lbrace \, (2,5), (2,6), (3,5), (3,6), (4,5), (4,6) \, \rbrace}

and the number of outcomes in the intersection is:

{|A \text{ and } B| \ = \ |A| \times |B| \ = \ 6}

The diagram also shows that when both trials are performed, the combined sample space has {|S|^{\large 2} = 36} outcomes, because each of the {|S| = 6} outcomes of the first trial can occur in combination with each of the {|S| = 6} outcomes of the second trial.

Dividing the above equation by {|S|^{\large 2},} we get:

[[ { {|A \text{ and } B|} \over {|S|^{\large 2}}} \ = \ {{|A|} \over {|S|}} \times {{|B|} \over {|S|}} \ = \ {{6} \over {36}} ]]

which allows us to express it in terms of probability measure:

[[ P(A \text{ and } B)\ = \ P(A) \times P(B) \ = \ {{6} \over {36}} ]]

This result gives us the multiplication rule for probability:

Multiplication rule for probability

If event {E} and event {F} are independent events, then:

{ P(E \text{ and } F) \, = \, P(E) \times P(F) }

where:

Two events are independent if the occurrence or non-occurrence of one event does not affect the probability of the other event occurring.

This requirement for independence explains why we broke up the dice roll into two separate experiments, one for each cube. There’s no way for the outcome of one cube to affect the outcome of the other cube, and that’s what makes them independent of each other.
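
To see the multiplication rule numerically, the sketch below enumerates the combined sample space for two independent single-cube rolls and compares the count with {P(A) \times P(B)}:

```python
from itertools import product
from fractions import Fraction

S = set(range(1, 7))  # one cube: outcomes 1 through 6
A = {2, 3, 4}         # first roll gives 2, 3, or 4
B = {5, 6}            # second roll gives 5 or 6

# Combined sample space for two trials: all ordered pairs of outcomes.
S2 = set(product(S, repeat=2))
A_and_B = {(a, b) for (a, b) in S2 if a in A and b in B}

P_A = Fraction(len(A), len(S))  # 3/6
P_B = Fraction(len(B), len(S))  # 2/6
print(Fraction(len(A_and_B), len(S2)) == P_A * P_B)  # True: 6/36 = 3/6 x 2/6
```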

Probability: dependent and independent events

Two events have a dependent relationship if the occurrence of one event can change the probability that the other event will occur.

Here’s an example of two events that are dependent: roll a single cube once, and let {E} be the event of getting a {1} and {F} be the event of getting a {2} (any two different faces would work just as well).

Separately, each event has a probability of { {\large {{1} \over {6}} }. } However, if one event occurs, then the probability that the other event occurs in the same trial drops to {0.} This shows that the two events have a dependent relationship.

There’s an easy test you can apply to see if two events are independent:

Event {E} and event {F} are independent if:

{ P(E \text{ and } F) \, = \, P(E) \times P(F) }

In our example, events {E} and {F} fail this test, because there are no outcomes in both {E} and {F,} so the left side of the equation is {0,} producing the false equation { 0 = {\large {{1} \over {36}} } .}

To break the dependency between events {E} and {F,} you would need to roll the cube twice, performing two different trials. The randomness of the experiment would then ensure that the outcome of the second trial is independent of the outcome of the first trial. In that case, the multiplication rule { P(E) \times P(F) = {\large {{1} \over {36}} } } would give the correct probability of getting {E} in the first trial, and then getting {F} in the second trial.

As you can see, when we ask if two events are independent, it’s important to understand if they would occur together in the same trial, or if they would occur separately in two different trials. In other words, make sure that you understand if {\text{“} E \text{ and } F \hspace+1pt \text{”}} means {\text{“} E} and {F} simultaneously{\text{”}} or {\text{“} E} and then {F \hspace+1pt \text{”}.}
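
Here’s a sketch of the same comparison, assuming (as in the example above) that {E} is getting a {1} and {F} is getting a {2} on a single cube:

```python
from itertools import product
from fractions import Fraction

S = set(range(1, 7))  # one roll of a single cube
E = {1}               # get a 1
F = {2}               # get a 2

# Same trial: E and F share no outcomes, so P(E and F) = 0,
# which is not equal to P(E) * P(F) = 1/36 -- the events are dependent.
print(Fraction(len(E & F), len(S)))                         # 0
print(Fraction(len(E), len(S)) * Fraction(len(F), len(S)))  # 1/36

# Two separate trials: count the pairs whose first roll is in E
# and whose second roll is in F. Now the multiplication rule holds.
S2 = set(product(S, repeat=2))
E_then_F = {(a, b) for (a, b) in S2 if a in E and b in F}
print(Fraction(len(E_then_F), len(S2)))                     # 1/36
```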

Probability: exercises (part 2)

Here’s a summary of the probability rules that we’ve seen so far:

[[ P({\skew3\overline{E}}) \ = \ 1 - P(E) ]] if {\skew3\overline{E}} is complementary to {E}
{ P(E \text{ or } F) \, = \, P(E) + P(F) } if {E} is mutually exclusive with {F}
{ P(E \text{ and } F) \, = \, P(E) \times P(F) } if {E} is independent of {F}

In the following exercises, assume that we roll a pair of dice.

  1. Find the probability of getting an even number on both cubes.

  2. Find the probability of getting an even number on one of the cubes, and an odd number on the other cube.

Probability: the addition rule (general case)

Continuing with our dice-rolling example, let’s find the probability of getting a {6} on either cube when rolling a pair of dice.

We’ll start by defining the events that occur when we roll a {6.} We’ll need two events, {C} and {D,} one for each cube:

{C = \lbrace \, (6,1), (6,2), (6,3), (6,4), (6,5), (6,6) \, \rbrace } {(}first cube {= 6)}
{D = \lbrace \, (1,6), (2,6), (3,6), (4,6), (5,6), (6,6) \, \rbrace } {(}second cube {= 6)}

We can determine their probability measures:

[[ P(C) = {{|C|} \over {|S|}} = {{6} \over {36}} ]]

[[ P(D) = {{|D|} \over {|S|}} = {{6} \over {36}} ]]

and we can show them in the following diagram:

We can define a new event called { \text{“} C \text{ or } D \hspace+1pt \text{”} } that occurs if either {C} or {D} or both occurs. Specifically:

{C \text{ or } D \, = \, \lbrace \, } { (6,1), (6,2), (6,3), (6,4), (6,5), (6,6), }
{ (1,6), (2,6), (3,6), (4,6), (5,6) \, \rbrace }
{|C \text{ or } D| \, = \, 11 }

Notice that the event {C} and the event {D} are not mutually exclusive, because they share a common outcome: {(6,6).}

Also notice that when we constructed the set of outcomes in { \text{“} C \text{ or } D \hspace+1pt \text{”}, } we only listed {(6,6)} once, because a set of outcomes cannot contain any duplicates.

Therefore, when we determine the number of outcomes in { \text{“} C \text{ or } D \hspace+1pt \text{”}, } we need to ensure that any duplicates are counted only once. This can be done by subtracting out the number of duplicates, like so:

{|C \text{ or } D| \ = \ |C| + |D| - |C \text{ and } D| \ = \ 6 + 6 - 1 \ = \ 11}

Dividing by {|S|,} we get:

[[ { {|C \text{ or } D|} \over {|S|}} \ = \ {{|C|} \over {|S|}} + {{|D|} \over {|S|}} - {{|C \text{ and } D|} \over {|S|}} \ = \ {{11} \over {36}} ]]

which allows us to express it in terms of probability measure:

[[ P(C \text{ or } D)\ = \ P(C) + P(D) - P(C \text{ and } D) \ = \ {{11} \over {36}} ]]

This result gives us the addition rule for probability, even in cases where the events are not mutually exclusive:

Addition rule for probability (general case)

Given event {E} and event {F}:

{ P(E \text{ or } F) \, = \, P(E) + P(F) - P(E \text{ and } F)}
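
A sketch that checks the general addition rule for the overlapping events {C} and {D} defined above:

```python
from itertools import product
from fractions import Fraction

S = set(product(range(1, 7), repeat=2))
C = {(a, b) for (a, b) in S if a == 6}  # first cube = 6
D = {(a, b) for (a, b) in S if b == 6}  # second cube = 6

lhs = Fraction(len(C | D), len(S))                         # P(C or D)
rhs = (Fraction(len(C), len(S)) + Fraction(len(D), len(S))
       - Fraction(len(C & D), len(S)))                     # P(C) + P(D) - P(C and D)

print(lhs, lhs == rhs)  # 11/36 True
```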

Probability: conditional probability

Continuing with our dice-rolling example, let’s define the events {G} and {H}:

{G}:  both cubes show the same number
{H}:  sum {\le 6}

and let’s ask the following question:

“If we perform a trial and {H} occurs, what’s the probability that {G} also occurs?”

Let’s figure it out by looking at this diagram:

The diagram shows all the outcomes in {G} circled, and all the outcomes in {H} highlighted in a black outline. By inspection, we can see that:

{G = \lbrace \, (1,1), (2,2), (3,3), (4,4), (5,5), (6,6) \, \rbrace }
{|G| = 6}
{H = \lbrace \, } { (1,1), (1,2), (1,3), (1,4), (1,5), \hspace+40pt }
{ (2,1), (2,2), (2,3), (2,4), }
{ (3,1), (3,2), (3,3), }
{ (4,1), (4,2), }
{ (5,1) \, \rbrace }
{|H| = 15}
{|S| = 36}
{G \text{ and } H = \lbrace \, (1,1), (2,2), (3,3) \, \rbrace }
{|G \text{ and } H| = 3}

We can determine the probability of the event {\text{“} G \text{ and } H \text{”}}:

[[ P(G \text{ and } H) \ = \ { {|G \text{ and } H|} \over {|S|}} \ = \ {{3} \over {36}} ]]

However, {P(G \text{ and } H)} is not the answer to our question above. What we want to know is the probability of {\text{“} G \text{ and } H \text{”}} assuming that we already know {H} has occurred. This additional knowledge means that all the outcomes in {\skew3\overline{H}} are no longer possible, so we can remove them from consideration. Therefore, we no longer need to consider all {36} possible outcomes, but, instead, we only need to consider the {15} outcomes in {H.} So to get our answer, we need to take the previous equation and divide by {|H|} instead of {|S|}:

[[ P(G|H) \ = \ { {|G \text{ and } H|} \over {|H|}} \ = \ {{3} \over {15}} ]]

The notation {\text{“} G|H \hspace+1pt \text{”}} is pronounced {\text{“} G} given {H \hspace+1pt \text{”}.}

{\text{“} G|H \hspace+1pt \text{”}} is not an event.  It’s only used inside the {P} notation.

Taking the above equation and dividing both the numerator and denominator by {|S|,} we get:

[[ P(G|H) \ = \ { \displaystyle { \left( {{|G \text{ and } H|} \over {|S|}} \right) } \over \displaystyle { \left( {{|H|} \over {|S|}} \right) } } \ = \ {{3} \over {15}} ]]

which allows us to express it entirely in terms of probability measure:

[[ P(G|H) \ = \ {{P(G \text{ and } H)} \over {P(H)}} \ = \ {{3} \over {15}} ]]

This result gives us the formula for conditional probability:

Conditional probability

The probability of event {E} occurring, given that event {F} has occurred is:

[[ P(E|F) \, = \, { { P(E \text{ and } F) } \over { P(F) } } ]]

This formula can be used in all cases, even if event {E} and event {F} are dependent.
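
Here’s a sketch that reproduces {P(G|H)} for our dice-rolling example, where {G} is rolling doubles and {H} is rolling a sum of {6} or less:

```python
from itertools import product
from fractions import Fraction

S = set(product(range(1, 7), repeat=2))
G = {(a, b) for (a, b) in S if a == b}      # both cubes show the same number
H = {(a, b) for (a, b) in S if a + b <= 6}  # sum <= 6

P_H = Fraction(len(H), len(S))            # 15/36
P_G_and_H = Fraction(len(G & H), len(S))  # 3/36

print(P_G_and_H / P_H)  # P(G|H) = (3/36) / (15/36) = 1/5
```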

Probability: the multiplication rule (general case)

In the previous section, we discussed the difference between the probabilities of {\text{“} G \text{ and } H \hspace+1pt \text{”}} versus {\text{“} G \text{ given } H \hspace+1pt \text{”}} (written as {\text{“} G|H \hspace+1pt \text{”}}):

[[ P(G \text{ and } H) \ = \ { {|G \text{ and } H|} \over {|S|}} ]] (assuming all the outcomes in {S} are possible)
[[ P(G|H) \ = \ { {|G \text{ and } H|} \over {|H|}} ]] (assuming only the outcomes in {H} are possible)

Looking at the right sides of the equations, we can see that one is a multiple of the other:

[[ { {|G \text{ and } H|} \over {|S|} } \, = \, { {|G \text{ and } H|} \over {|H|}} \times { {|H|} \over {|S|} } ]]

or, expressing it as a probability measure:

[[ P(G \text{ and } H) \ = \ P(G|H) \times P(H) ]]

This result gives us the multiplication rule for probability:

Multiplication rule for probability (general case)

Given event {E} and event {F}:

[[ P(E \text{ and } F) \, = \, P(E|F) \times P(F) ]]

Since {\textbf{and}} is commutative, you can swap {E} and {F} on the right side, if desired:

[[ P(E \text{ and } F) \, = \, P(E) \times P(F|E) ]]

This formula can be used in all cases, even if event {E} and event {F} are dependent.

We now have two versions of the multiplication rule: one that works only for independent events, and one that works in all cases. Here’s a comparison of the two rules:

{ P(E \text{ and } F) \, = \, P(E) \times P(F) } {(}if {E} is independent of {F)}
{ P(E \text{ and } F) \, = \, P(E|F) \times P(F) } {(}in all cases{)}

Comparing these two formulas, we find that there’s another test we can use to determine if two events are independent:

Event {E} and event {F} are independent if:

{ P(E|F) \, = \, P(E) }

In other words, the events {E} and {F} are independent if the probability of {E} occurring is the same regardless of whether you know if {F} has occurred or not.
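
For instance, the events {C} (first cube {= 6}) and {D} (second cube {= 6}) from the general addition rule section pass this test. A minimal sketch:

```python
from itertools import product
from fractions import Fraction

S = set(product(range(1, 7), repeat=2))
C = {(a, b) for (a, b) in S if a == 6}  # first cube = 6
D = {(a, b) for (a, b) in S if b == 6}  # second cube = 6

P_C = Fraction(len(C), len(S))              # 6/36 = 1/6
P_C_given_D = Fraction(len(C & D), len(D))  # restrict attention to outcomes in D

print(P_C_given_D == P_C)  # True: C and D are independent
```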

Probability: exercises (part 3)

Here’s a summary of the probability rules from the previous three sections.

{ P(E \text{ or } F) \, = \, P(E) + P(F) - P(E \text{ and } F) }
[[ P(E|F) \, = \, { { P(E \text{ and } F) } \over { P(F) } } ]]
{ P(E \text{ and } F) \, = \, P(E|F) \times P(F) }

For the following exercises, assume that we’re performing a single trial, the sample space is {S,} and we’re interested in two events {Q} and {R,} where:

{ |Q| = 15 \quad\quad |R| = 25 \quad\quad |S| = 100 \quad\quad {|Q \text{ and } R|} = 5 }

  1. Find the probability that either {Q} occurs or {R} occurs.
  2. Find the probability that {Q} occurs, given that we already know {R} occurred.
  3. Find the probability that {R} occurs, given that we already know {Q} occurred.
  4. If {P(A) = 0.8} and {P(B|A) = 0.5,} what is {P(A \text{ and } B)?}

Probability: Bayes' theorem

Bayes' theorem is a formula that relates the following two probabilities: a conditional probability {P(F|E)} and its converse {P(E|F)}.

Bayes' theorem can be derived as follows:

[[ P(E|F) \, = \, { { P(E \text{ and } F) } \over { P(F) } } ]] (the definition of conditional probability)
[[ P(E|F) \, = \, { { P(F \text{ and } E) } \over { P(F) } } ]] {(P(F \text{ and } E) } is the same as { P(E \text{ and } F))}
[[ P(E|F) \, = \, { { P(F|E) \times P(E) } \over { P(F) } } ]] (apply the multiplication rule to the numerator)
[[ P(E|F) \, = \, P(F|E) \times { { P(E) } \over { P(F) } } ]] (express it as a conversion factor)

The final equation is the formula for Bayes' theorem:

Bayes' theorem converts a conditional probability {P(F|E)} to its converse probability {P(E|F)}:

[[ P(E|F) \, = \, P(F|E) \times { { P(E) } \over { P(F) } } ]]

Probability: Bayes' theorem example

Here’s an example problem that can be solved using Bayes' theorem: suppose that {1\%} of all items have a defect, {4\%} of all items fail inspection, and an item that has a defect fails inspection {96\%} of the time. What percentage of the items that fail inspection actually have a defect?

We can solve this using Bayes' theorem:

[[ P(D) = 0.01 ]] ({D} means the item has a defect)
[[ P(F) = 0.04 ]] ({F} means the item fails inspection)
[[ P(F|D) = 0.96 ]] (failing inspection, given the item has a defect)
[[ P(D|F) = P(F|D) \times { { P(D) } \over { P(F) } } ]] (having a defect, given the item fails inspection)
[[ \phantom{P(D|F) } = 0.96 \times { { 0.01 } \over { 0.04 } } ]]
[[ \phantom{P(D|F) } = 0.24 ]]

and we find that {24\%} of all items that fail inspection actually have a defect.
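
Here’s a sketch of the same calculation, using exact fractions:

```python
from fractions import Fraction

P_D = Fraction(1, 100)           # P(D): the item has a defect
P_F = Fraction(4, 100)           # P(F): the item fails inspection
P_F_given_D = Fraction(96, 100)  # P(F|D): a defective item fails inspection

# Bayes' theorem: P(D|F) = P(F|D) * P(D) / P(F)
P_D_given_F = P_F_given_D * P_D / P_F
print(P_D_given_F)               # 6/25, i.e. 0.24
```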

Probability: summary

Here’s a summary of the terminology used in the study of probability:

Experiment:   any process that produces a specific outcome
Random:   designed to maximize the unpredictability of the outcome
Outcome:   an elementary result of an experiment; outcomes must be distinct and cannot overlap with another outcome
Trial:   one performance of an experiment, producing one outcome
Sample space:   the set of all possible outcomes for an experiment
Sample:   a sequence of outcomes obtained by one or more trials
Event:   a particular set of outcomes that we’re interested in obtaining
Event description:   a statement that’s true for exactly the outcomes in the event
Probability measure:   assigns a numerical value between {0} and {1} to each event

Here’s a summary of all the probability rules presented in this chapter:

[[ P({E}) \ = \ {{|E|} \over {|S|}} \ = \ { \small{\text{number of outcomes in } E} \over \small{\text{size of the sample space} }} ]] if all outcomes are equally likely
[[ P({\skew3\overline{E}}) \ = \ 1 - P(E) ]] if {\skew3\overline{E}} is complementary to {E}
{ P(E \text{ or } F) \, = \, P(E) + P(F) } if {E} is mutually exclusive with {F}
{ P(E \text{ or } F) \, = \, P(E) + P(F) - P(E \text{ and } F) } works in all cases
{ P(E \text{ and } F) \, = \, P(E) \times P(F) } if {E} is independent of {F}
{ P(E \text{ and } F) \, = \, P(E|F) \times P(F) } works in all cases
[[ P(E|F) \, = \, { { P(E \text{ and } F) } \over { P(F) } } ]] works in all cases
[[ P(E|F) \, = \, P(F|E) \times { { P(E) } \over { P(F) } } ]] Bayes' theorem, works in all cases

Here’s a summary of the types of relationships between two events, and their properties:

If {E} and {F} are complementary, then: {P(E) + P(F) = 1}
If {E} and {F} are mutually exclusive, then: { P(E \text{ or } F) \, = \, P(E) + P(F) }
If {E} and {F} are mutually exclusive, then: { P(E \text{ and } F) \, = \, 0 }
If {E} and {F} are independent, then: { P(E \text{ and } F) \, = \, P(E) \times P(F) }
If {E} and {F} are independent, then: { P(E|F) \, = \, P(E) }

Review exercises

  1. If you flip a coin {5} times, what’s the probability that it will land head-side-up every time?

  2. If you flip a coin {n} times, what’s the probability that it will create a perfectly alternating pattern of either {\underbrace{\text{HTHTHT...} \vphantom{{\lower 1pt 0}}} _ {\large \text{length $n$}}} or {\underbrace{\text{THTHTH...} \vphantom{{\lower 1pt 0}}} _ {\large \text{length $n$}}} ?

  3. If you roll a pair of {6}-sided dice, what’s the probability of getting a sum of {5} or greater?

  4. If a bag contains two red marbles and two white marbles, and you randomly select two marbles from it at the same time, what’s the probability that both marbles will be the same color?

  5. A number between {1} and {100} is randomly selected, with each number having the same probability of being selected. What’s the probability that the number will either be a multiple of {2} or a multiple of {11?}

  6. You roll two {6}-sided dice. One cube stops rolling first, and you see that it’s a {6.}
    What’s the probability that the other cube is also a {6?}
    To solve this, use the formula for conditional probability:
    [[ P(E|F) \, = \, { { P(E \text{ and } F) } \over { P(F) } } \vphantom{\large{0 \over 0}} ]]