“Machine Learning, Deep Learning, Data Science, etc. are all in the same nexus, which is basically Probability plus Statistics.”

 

Probability - The Science of Uncertainty and Data

Let’s face it: life is uncertain. But one thing is certain: we need a way to make predictions and decisions under uncertainty.

 

What will be covered?

 

Basic probability

Sample space Ω

The elements of the sample space should be mutually exclusive and collectively exhaustive.

Mutually exclusive means that no two of the outcomes can happen simultaneously.

Collectively exhaustive means that, together, the elements of the set cover all the possibilities, so the probability of the whole sample space is $P(\Omega) = 1$.

Probability Axioms

An event $A$ is a subset of Ω and is assigned a probability $P(A)$.

The complement of $A$, written $A^c$, is the event that $A$ does not happen.
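
As a small illustration of these definitions, the sketch below models a finite sample space and checks that the probabilities of an event and its complement add up to 1. The fair six-sided die, the helper name `prob`, and the event `A` are assumptions chosen for illustration; they do not come from the notes themselves.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die.
# The six outcomes are mutually exclusive and collectively exhaustive.
omega = {1, 2, 3, 4, 5, 6}
law = {outcome: Fraction(1, 6) for outcome in omega}

def prob(event):
    """Probability of an event (a subset of omega) under the law above."""
    return sum((law[outcome] for outcome in event), Fraction(0))

A = {2, 4, 6}        # event: the roll is even
A_c = omega - A      # complement: the roll is odd

total = prob(omega)  # probability of the whole sample space
p_A = prob(A)
p_A_c = prob(A_c)
```

Using `Fraction` keeps the arithmetic exact, so the identities can be compared with `==` rather than with a floating-point tolerance.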

Probability Properties

Probability, at the minimum, gives us some rules for thinking systematically about uncertain situations.

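For an uncountable collection of disjoint events $A_x$, the probability of the union $\bigcup_x A_x$ is in general not equal to the sum of the $P(A_x)$; additivity is only guaranteed for countable collections.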

Loosely speaking, probabilities can be interpreted as “Frequency” or “Describing our beliefs”.

 

Conditional Probability

P(A|B) = probability of event A, given that B occurred.

$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$ (defined only when $P(B) > 0$)
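
To make the definition concrete, here is a minimal sketch that computes a conditional probability by counting outcomes. The two-dice setup, the events `A` and `B`, and the helper names `prob` and `cond_prob` are all assumptions introduced for this example.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: two fair dice, all 36 outcomes equally likely.
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event under the uniform law on omega."""
    return Fraction(len(event), len(omega))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B); undefined when P(B) = 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b) / prob(b)

A = {w for w in omega if w[0] + w[1] == 8}   # the sum is 8
B = {w for w in omega if w[0] == 3}          # the first die shows 3

p_a = prob(A)
p_a_given_b = cond_prob(A, B)
```

Here conditioning on `B` shrinks the sample space to the six outcomes whose first die is 3, and only one of them sums to 8.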

And if we have a partition of our sample space into a (possibly infinite) sequence of events $A_i$, we also have the total probability theorem, which expresses $P(B)$ as a weighted average:

$P(B) = \sum_i P(A_i)\, P(B \mid A_i)$

Example of total probability theorem

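Assume we have a collection of biased coins indexed by $i = 1, 2, \dots$, where coin $i$ is selected with probability $2^{-i}$ and, when flipped, results in Heads with probability $3^{-i}$. We want the probability that the result is Heads.

Let $A_i$ be the event that coin $i$ is selected, and $B$ the event that the result is Heads. Then $P(B) = \sum_{i=1}^{\infty} P(A_i)\, P(B \mid A_i) = \sum_{i=1}^{\infty} 2^{-i}\, 3^{-i} = \sum_{i=1}^{\infty} 6^{-i} = \frac{1}{5}$.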
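
The weighted average in this example can be checked numerically. The sketch below assumes the reading in which coin $i$ is selected with probability $2^{-i}$ and lands Heads with probability $3^{-i}$, for $i = 1, 2, \dots$; the function name `prob_heads` and the cutoff of 20 coins are hypothetical choices made for illustration.

```python
from fractions import Fraction

def prob_heads(num_coins):
    """Partial sum of P(A_i) * P(B | A_i) over coins 1..num_coins."""
    return sum(
        (Fraction(1, 2**i) * Fraction(1, 3**i) for i in range(1, num_coins + 1)),
        Fraction(0),
    )

# Truncating the infinite sum at 20 coins leaves only a tiny remainder.
partial = prob_heads(20)
gap = Fraction(1, 5) - partial
```

Under that assumption the terms form a geometric series with ratio 1/6, so the partial sums approach 1/5 from below.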

Probability Tools

De Morgan’s Laws

If we take the intersection of two sets and then take the complement of this intersection, what we obtain is the union of the complements of the two sets.

$S^c \cup T^c = (S \cap T)^c$

More generally, we have:

$\Big(\bigcap_n S_n\Big)^c = \bigcup_n S_n^c$
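
These identities are easy to sanity-check on small finite sets. The following is a minimal sketch; the universe `omega` and the sets `S`, `T`, `U` are hypothetical values chosen only to exercise the two forms of the law.

```python
# Hypothetical universe and sets, chosen only to exercise the identities.
omega = set(range(10))
S = {0, 1, 2, 3}
T = {2, 3, 4, 5}
U = {3, 5, 7, 9}

def complement(s):
    """Complement of s relative to the universe omega."""
    return omega - s

# Two-set form: the complement of an intersection is the union of complements.
two_sets_ok = complement(S & T) == complement(S) | complement(T)

# General form over a family of sets.
family = [S, T, U]
lhs = complement(set.intersection(*family))
rhs = set.union(*(complement(s) for s in family))
family_ok = lhs == rhs
```

A check like this does not prove the law, but it is a quick way to catch a swapped union or intersection when writing it down.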

Problem Solving Examples

$P\big((A \cap B^c) \cup (A^c \cap B)\big) = P(A) + P(B) - 2P(A \cap B)$

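The set of outcomes in $A$ but not in $B$ is $A \cap B^c$, which is disjoint from $A \cap B$,

giving $P(A) = P(A \cap B^c) + P(A \cap B)$, and similarly $P(B) = P(A^c \cap B) + P(A \cap B)$.

So the whole expression equals “union minus intersection”: $P(A \cup B) - P(A \cap B)$.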
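
The identity for "exactly one of the two events occurs" can be verified by counting on a small sample space. In this sketch the 12 equally likely outcomes and the events `A` and `B` are assumptions made for illustration.

```python
from fractions import Fraction

omega = set(range(1, 13))              # hypothetical: 12 equally likely outcomes
A = {w for w in omega if w % 2 == 0}   # multiples of 2
B = {w for w in omega if w % 3 == 0}   # multiples of 3

def prob(event):
    """Probability of an event under the uniform law on omega."""
    return Fraction(len(event), len(omega))

# Outcomes in exactly one of A and B (the symmetric difference).
exactly_one = (A - B) | (B - A)

lhs = prob(exactly_one)
rhs = prob(A) + prob(B) - 2 * prob(A & B)
union_minus_intersection = prob(A | B) - prob(A & B)
```

Python's `A ^ B` operator computes the same symmetric difference directly.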

Bonferroni's inequality

It can be derived from the union bound:

$P(A_1 \cup A_2) \le P(A_1) + P(A_2)$

Conversely, the intersection is bounded from below:

$P(A_1 \cap A_2) \ge P(A_1) + P(A_2) - 1$

(b) Generalize to the case of $n$ events $A_1, A_2, \dots, A_n$, by showing that

$P(A_1 \cap A_2 \cap \cdots \cap A_n) \ge P(A_1) + P(A_2) + \cdots + P(A_n) - (n - 1)$

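Proof of $P(A_1 \cap A_2) \ge P(A_1) + P(A_2) - 1$:

We have:

$P((A_1 \cap A_2)^c) = P(A_1^c \cup A_2^c) \le P(A_1^c) + P(A_2^c)$

$P((A_1 \cap A_2)^c) = 1 - P(A_1 \cap A_2)$

$P(A_1^c) + P(A_2^c) = 1 - P(A_1) + 1 - P(A_2)$

Combining the three gives $1 - P(A_1 \cap A_2) \le 2 - P(A_1) - P(A_2)$, which rearranges to the claimed inequality.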
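
The same complement argument appears to carry over to $n$ events. The following is a sketch of that derivation, assuming the union bound for $n$ events (which follows from the two-event bound by induction):

```latex
\begin{align*}
P\Big(\Big(\bigcap_{i=1}^{n} A_i\Big)^c\Big)
  &= P\Big(\bigcup_{i=1}^{n} A_i^c\Big)
   \le \sum_{i=1}^{n} P(A_i^c)
   = n - \sum_{i=1}^{n} P(A_i), \\
P\Big(\bigcap_{i=1}^{n} A_i\Big)
  &= 1 - P\Big(\Big(\bigcap_{i=1}^{n} A_i\Big)^c\Big)
   \ge \sum_{i=1}^{n} P(A_i) - (n - 1).
\end{align*}
```

The first line uses De Morgan's law and then the union bound; the second line is just the complement rule applied to the first.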

Bayes’ Rule

We start with initial beliefs $P(A_i)$ about the possible causes of an observed event $B$, and revise them to $P(A_i \mid B)$ once $B$ has been observed.
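
A minimal sketch of this belief update follows, combining the definition of conditional probability with the total probability theorem. The function name `bayes_posterior` and the numbers for the three causes are hypothetical; they are chosen only to show the mechanics.

```python
from fractions import Fraction

def bayes_posterior(priors, likelihoods):
    """Return P(A_i | B) for each i, given P(A_i) and P(B | A_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(A_i and B)
    p_b = sum(joint, Fraction(0))                          # total probability of B
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return [j / p_b for j in joint]

# Hypothetical numbers: three possible causes with prior beliefs P(A_i)
# and likelihoods P(B | A_i) of producing the observed event B.
priors = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
likelihoods = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)]

posterior = bayes_posterior(priors, likelihoods)
```

In this made-up case the first cause starts out as the most likely but becomes the least likely after observing `B`, because it rarely produces `B`.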

 

Probability Recitations

1. Romeo and Juliet

The geometric approach turns this from a probability problem into a geometry problem: calculate the area of the region of the sample space where the two meet.

Then you can ask: if he wants to have at least a 90% chance of meeting her, how long should he be willing to wait?
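
The problem statement itself is not reproduced in these notes, so the sketch below assumes the classic formulation: each person arrives with a delay that is independent and uniform on [0, 1] hour, and the first to arrive waits `wait` hours (15 minutes in the usual version). Under that assumption, they meet when the two delays differ by at most `wait`, which is a band around the diagonal of the unit square. The function names `meet_probability` and `simulate` are hypothetical.

```python
import math
import random

def meet_probability(wait):
    """Area of the band |x - y| <= wait inside the unit square."""
    return 1 - (1 - wait) ** 2

def simulate(wait, trials=200_000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = sum(abs(rng.random() - rng.random()) <= wait for _ in range(trials))
    return hits / trials

p_quarter_hour = meet_probability(0.25)   # the 15-minute wait gives 7/16
estimate = simulate(0.25)                 # should be close to 7/16

# Smallest wait (in hours) that gives at least a 90% chance of meeting:
# solve 1 - (1 - w)^2 = 0.9 for w.
required_wait = 1 - math.sqrt(1 - 0.9)
```

Under these assumptions the required wait comes out to roughly 0.68 hours, or about 41 minutes.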