Probability

Now that we have gathered some data and extracted features from it in the form of variables, it is time to do inference, that is, to make an educated guess from the data you have available. To do inference, we first need to understand probability.

Think of probability as a measure of how likely it is that some event will occur. $P(E)=0$ means that the event is impossible, and $P(E)=1$ means that the event is certain to occur (a 100% chance).

Probability can be determined experimentally or theoretically.

Theoretically

Consider a fair six-sided die. When all outcomes are equally likely, the probability of an event E is:

$$P(E)=\frac{\text{Possible ways of E}}{\text{Number of possible outcomes}}$$

Here the probability of tossing the six-sided fair die and getting the value 1 is

$$P(E=1)=\frac{1}{6}\approx 16.7\%$$

On each toss only one value is possible (the die only gives one value at a time) and there are 6 possible values.

To make theoretical probabilities easier to find, we may need to organize the possible outcomes in tables or trees.
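The counting formula above can be checked with a few lines of Python. This is only a minimal sketch: the helper name `theoretical_probability` is made up for illustration, and it assumes every outcome is equally likely, as with a fair die.

```python
from fractions import Fraction

def theoretical_probability(event, outcomes):
    """Return P(E) = (ways E can occur) / (number of possible outcomes).

    Assumes every outcome in `outcomes` is equally likely.
    """
    favorable = [o for o in outcomes if o in event]
    return Fraction(len(favorable), len(outcomes))

die = [1, 2, 3, 4, 5, 6]                  # outcomes of a fair six-sided die
p_one = theoretical_probability({1}, die)  # event: the die shows 1
print(p_one, float(p_one))
```

Using `Fraction` keeps the result exact, which makes it easy to compare with the hand calculation.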

Experimentally

$$P(E)=\frac{\text{number of times the event occurred}}{\text{number of trials}}$$

Consider that we rolled the die 12 times with the results: 6,3,4,1,2,2,1,3,1,5,3,5

The value 1 appears 3 times, so:

$$P(E=1)=\frac{3}{12}=25\%$$

Here the experimental value differs from the theoretical one (16.7%), but as we run more trials the experimental probability tends toward the theoretical value.
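To see the experimental probability approach the theoretical value, one option is to simulate the rolls. The sketch below is an illustration under stated assumptions: the function name `experimental_probability`, the seed, and the trial counts are arbitrary choices, and Python's `random` module stands in for a physical die.

```python
import random

def experimental_probability(event_value, rolls):
    """Return (number of times the event occurred) / (number of trials)."""
    return rolls.count(event_value) / len(rolls)

# The 12 rolls listed in the text.
observed = [6, 3, 4, 1, 2, 2, 1, 3, 1, 5, 3, 5]
p_observed = experimental_probability(1, observed)  # 3 / 12 = 0.25

# Simulated rolls of a fair die; the seed is arbitrary and only makes
# the run repeatable.
rng = random.Random(0)
estimates = {}
for n in (100, 10_000, 100_000):
    rolls = [rng.randint(1, 6) for _ in range(n)]
    estimates[n] = experimental_probability(1, rolls)

for n, p in estimates.items():
    print(f"{n:>7} rolls: P(E=1) is about {p:.4f}")
```

With more simulated rolls the estimate should settle near 1/6, although any single run can still deviate slightly.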

Some assumptions

Here we present some basic rules to guide you:

  • Probability of A or B: $P(\text{A or B})=P(A)+P(B)-P(\text{A and B})$

  • Probability of A and B, when A and B are independent: $P(\text{A and B})=P(A \cap B)=P(A) \cdot P(B)$

  • Probability of A not happening: $P(\text{Not A})=1-P(A)$

  • The probabilities of all possible outcomes always sum to 1

Sometimes "OR" is written with the union symbol $\cup$ and "AND" with the intersection symbol $\cap$.
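These rules can be verified on a small example by treating events as Python sets, where `|` plays the role of union and `&` the role of intersection. The events `A` and `B` below are illustrative choices on a fair die, and `prob` is a hypothetical helper that assumes equally likely outcomes.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}   # fair six-sided die

def prob(event):
    """Probability of an event (a set of outcomes) on a fair die."""
    return Fraction(len(event & outcomes), len(outcomes))

A = {1, 3, 5}   # odd value (illustrative event)
B = {1, 2, 3}   # value less than 4 (illustrative event)

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)

# Complement rule: P(not A) = 1 - P(A)
assert prob(outcomes - A) == 1 - prob(A)

# The probabilities of all single outcomes sum to 1
assert sum(prob({o}) for o in outcomes) == 1

print(prob(A | B))
```

Note that these two events are not independent, so the product rule $P(A) \cdot P(B)$ would not give $P(A \cap B)$ here.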

Conditional Probability

What is the probability that a rolled die shows a value less than 4 (B), given that the value is an odd number (A)? In other words:

$$P(B|A)$$

As the die has 6 possible values (1,2,3,4,5,6), the probability of a value less than 4 (1,2,3) is $P(B)=\frac{3}{6}=0.5$.

The probability of an odd number (1,3,5) is $P(A)=\frac{3}{6}=0.5$.

The conditional probability is defined as:

$$P(B|A)=\frac{P(A \cap B)}{P(A)}$$

In this case only the values 1 and 3 are both odd and less than 4, so:

$$P(A \cap B)=\frac{2}{6}\approx 0.333$$

$$P(A)=0.5$$

$$P(B|A)=\frac{P(A \cap B)}{P(A)}=\frac{0.333}{0.5}\approx 0.667$$
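The same calculation can be reproduced by enumerating the outcomes. This is a small sketch using exact fractions; the variable names are chosen here for illustration.

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]
A = {o for o in outcomes if o % 2 == 1}   # odd value
B = {o for o in outcomes if o < 4}        # value less than 4

p_a = Fraction(len(A), len(outcomes))            # P(A)
p_a_and_b = Fraction(len(A & B), len(outcomes))  # P(A and B)
p_b_given_a = p_a_and_b / p_a                    # P(B|A)

print(p_b_given_a, float(p_b_given_a))
```

Working with fractions avoids the rounding that appears when 0.333 is divided by 0.5 by hand.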

Dependence/Independence of events

It's important to determine whether the events are dependent or not, because that affects how you calculate the probabilities.

  • Independent events: One event does not affect the likelihood of the next event occurring.

  • Dependent events: One event does affect the likelihood of the next event occurring.

Dependent event example:

You have a deck of cards. You shuffle it, take one card and leave it out of the deck, then take another card. What is the probability that both are jokers?

First, a deck has 52 cards plus 2 jokers, 54 cards in total.

So the probability of drawing a joker the first time is (there are 2 jokers among 54 cards):

$$P(\text{joker})=\frac{2}{54}\approx 0.037$$

Now leave the joker out of the deck, shuffle, and take one card again. What is the probability of drawing another joker?

Now you need to consider the fact that one joker is already out, so 1 joker remains among 53 cards:

$$P(\text{joker}_\text{again})=\frac{1}{53}\approx 0.019$$

The complete probability will be:

$$P(\text{joker and joker})_\text{dependent}=P(\text{joker}) \cdot P(\text{joker}_\text{again})=\frac{2}{54} \cdot \frac{1}{53}=\frac{1}{1431}$$

Independent event example:

Now we repeat the same experiment, but after taking the card we put it back in the deck. Note that putting the card back makes the events independent.

So the probability of drawing a joker the first time is (there are 2 jokers among 54 cards):

$$P(\text{joker})=\frac{2}{54}\approx 0.037$$

Now we put the card back in the deck and shuffle. What is the probability of drawing a joker again?

$$P(\text{joker})=\frac{2}{54}\approx 0.037$$

The complete probability will be:

$$P(\text{joker and joker})_\text{independent}=P(\text{joker}) \cdot P(\text{joker})=\frac{2}{54} \cdot \frac{2}{54}=\frac{1}{729}$$
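The two card examples above differ only in what the second draw sees. A short sketch with exact fractions makes the contrast explicit; the constant and variable names are chosen here for illustration.

```python
from fractions import Fraction

JOKERS, CARDS = 2, 54   # 52 cards plus 2 jokers, as in the text

# Without replacement (dependent draws): the second draw sees one fewer
# joker and one fewer card.
p_dependent = Fraction(JOKERS, CARDS) * Fraction(JOKERS - 1, CARDS - 1)

# With replacement (independent draws): both draws see the full deck.
p_independent = Fraction(JOKERS, CARDS) ** 2

print(p_dependent, p_independent)
```

Drawing two jokers is less likely without replacement because the first joker is no longer available for the second draw.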

Random Variables

A random variable is any numerical result of a stochastic system (random process), for example how many heads occur in a series of 20 coin flips. The name "variable" is confusing: it is more like the output of a stochastic system. There are 2 types of random variables:

  • Discrete: Example $X = \text{number of heads in 20 coin flips}$

  • Continuous: Example $Y = \text{mass of a random animal at the zoo}$

Normally all the observations from a process with uncertainties are random variables.

Example: let X = number of heads after 3 flips of a fair coin. Calculate:

  • P(X=0)

  • P(X=1)

  • P(X=2)

  • P(X=3)

Listing all possible outcomes of a fair coin tossed 3 times:

HHH, THH, HHT, THT, HTH, TTH, HTT, TTT

Each of the 8 outcomes is equally likely, so in this case:

  • $P(X=0)=\frac{1}{8}$

  • $P(X=1)=\frac{3}{8}$

  • $P(X=2)=\frac{3}{8}$

  • $P(X=3)=\frac{1}{8}$
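The four probabilities above can be obtained by enumerating every sequence of 3 flips and counting heads. The following is a minimal sketch; the name `pmf` is chosen here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**3 = 8 equally likely sequences of heads (H) and tails (T).
flips = list(product("HT", repeat=3))

# Count how many sequences give each number of heads.
heads_count = Counter(seq.count("H") for seq in flips)

# Probability of each value of X = number of heads.
pmf = {x: Fraction(n, len(flips)) for x, n in sorted(heads_count.items())}

for x, p in pmf.items():
    print(f"P(X={x}) = {p}")
```

Changing `repeat=3` to another number of flips gives the distribution for longer series in the same way.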

So, as mentioned before, X is the output of a probabilistic process. Now let's also look at the probability distribution for X.

Probability Distribution

The probability distribution is a table, graph or function that describes the probabilities of all possible outcomes of a random process. Depending on the type of random variable, the probability distribution has different names:

  • Probability mass function: If variable is discrete

  • Probability density function: If variable is continuous.

All the values of a probability distribution are non-negative, and they sum to one (for a probability density function, they integrate to one).

The importance of the probability distribution is that you can easily infer information from it:

  • Mode: The most probable value, that is, the peak of the probability mass or density function.

  • Mean or expected value: The weighted average of the possible values, using their probabilities as weights.

  • Median: The value such that the set of values below it and the set of values above it each have a probability of one-half.

The probability distribution also tells you something about the random process. For example, let X be the outcome of a fair die.

In this case it's clear that all possible outcomes are equally probable, each with probability $\frac{1}{6}$.

Expected Value

The expected value is the average of a random variable: the sum of the products of each value of the random variable and its probability.

For a discrete random variable:

$$E[X]=\sum_{i=1}^{\infty} x_i \cdot p_i$$

For a continuous random variable with probability density function $f(x)$:

$$E[X]=\int_{-\infty}^{\infty} x f(x)\, dx$$

Example: Let X represent the outcome of a roll of a fair six-sided die. Calculate the expected value (or expectation) of X.

$$E[X]=1 \cdot \frac{1}{6}+2 \cdot \frac{1}{6}+3 \cdot \frac{1}{6}+4 \cdot \frac{1}{6}+5 \cdot \frac{1}{6}+6 \cdot \frac{1}{6}=3.5$$

Another example: consider the following probability distribution of a discrete random variable X:

x       0      1      2
p(x)    0.16   0.48   0.36
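The discrete formula can be applied to both the die example and the tabulated distribution above. The helper `expected_value` below is a hypothetical name, and the table probabilities are written as fractions of 100 so the arithmetic stays exact.

```python
from fractions import Fraction

def expected_value(pmf):
    """E[X] = sum of x_i * p_i over a discrete distribution {x_i: p_i}."""
    return sum(x * p for x, p in pmf.items())

# Fair six-sided die: every value has probability 1/6.
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Distribution from the table: p(0)=0.16, p(1)=0.48, p(2)=0.36.
table_pmf = {0: Fraction(16, 100), 1: Fraction(48, 100), 2: Fraction(36, 100)}

e_die = expected_value(die_pmf)
e_table = expected_value(table_pmf)

print(float(e_die), float(e_table))
```

For the table this works out to $0 \cdot 0.16 + 1 \cdot 0.48 + 2 \cdot 0.36 = 1.2$.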

Joint probability distribution

The joint probability distribution is the probability distribution of 2 or more random variables considered together: it gives the probability of every combination of their values.

Things that you can get from Joint probability distributions:

  • Check if the variables are independent and get the marginal probability function

  • Derive the joint distribution function

  • Derive the conditional probability function, conditional expectations, and conditional variance

  • Derive the joint expectation (expected value of the product of 2 random variables)
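As a small illustration of the items in this list, consider a hypothetical example that is not from the original text: two flips of a fair coin, with X = 1 if the first flip is heads (0 otherwise) and Y = total number of heads. The sketch below builds the joint distribution, then derives the marginals, an independence check, one conditional probability, and the joint expectation $E[XY]$.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# joint[(x, y)] = P(X=x, Y=y) for two flips of a fair coin (assumed example).
joint = defaultdict(Fraction)
for first, second in product("HT", repeat=2):
    x = int(first == "H")              # X: 1 if the first flip is heads
    y = (first, second).count("H")     # Y: total number of heads
    joint[(x, y)] += Fraction(1, 4)    # each sequence has probability 1/4

# Marginal distributions: sum the joint probabilities over the other variable.
p_x = defaultdict(Fraction)
p_y = defaultdict(Fraction)
for (x, y), p in joint.items():
    p_x[x] += p
    p_y[y] += p

# X and Y are independent only if P(X=x, Y=y) = P(X=x) * P(Y=y) for all pairs.
independent = all(
    joint.get((x, y), Fraction(0)) == p_x[x] * p_y[y]
    for x in p_x
    for y in p_y
)

# Conditional probability P(Y=2 | X=1) = P(X=1, Y=2) / P(X=1).
p_y2_given_x1 = joint[(1, 2)] / p_x[1]

# Joint expectation E[XY] = sum of x * y * P(X=x, Y=y).
e_xy = sum(x * y * p for (x, y), p in joint.items())

print(dict(p_x), dict(p_y), independent, p_y2_given_x1, e_xy)
```

Here X and Y are dependent, since knowing that the first flip is heads changes the likely total number of heads.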
