# Probability theory
Much is based on [Rick Durrett - Probability: Theory and Examples. 5th Edition](https://sites.math.duke.edu/~rtd/PTE/pte.html)
## 1. Measure theoretic foundations
### Definitions and elementary facts from measure theory
#### Measures and probability measures
>[!info] Definition
>A *discrete probability space* is a finite (or countably infinite) set $\Omega$ (called the set of outcomes),
>equipped with a function $p:\Omega\to \mathbb{R}_{\geq0}$, such that $\sum_{\omega\in\Omega}p(\omega)=1$.
>The quantity $p(\omega)$ is called the *probability* of the outcome $\omega$.
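As a concrete sketch (a hypothetical fair six-sided die, with exact arithmetic via `fractions`), a discrete probability space is a finite set together with non-negative weights summing to 1, and the probability of an event is the sum of the weights of its outcomes:

```python
from fractions import Fraction

# Hypothetical discrete probability space: a fair six-sided die.
omega = [1, 2, 3, 4, 5, 6]
p = {w: Fraction(1, 6) for w in omega}

# Defining condition: the probabilities sum to 1.
total = sum(p.values())

# Probability of an event (a subset of omega): sum over its outcomes.
even = {2, 4, 6}
p_even = sum(p[w] for w in even)
```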
>[!info] Definition
>Suppose $\Omega$ is a set. We denote by $2^{\Omega}$ the collection of its subsets.
>1. A collection $\mathcal{F}\subset2^{\Omega}$ is called a *$\sigma$-algebra on $\Omega$* if the following conditions are satisfied:
> - $\emptyset\in\mathcal{F}$;
> - if $A\in\mathcal{F}$, then $A^{c}:=\Omega\setminus A\in\mathcal{F}$;
> - if $A_{1},A_{2},\dots$ is a sequence of subsets of $\Omega$ such that $A_{i}\in\mathcal{F}$ for all $i$, then $\cup_{i=1}^{\infty}A_{i}\in\mathcal{F}$.
> 2. If $\mathcal{A}\subset 2^{\Omega}$, then $\sigma(\mathcal{A})$ denotes the smallest $\sigma$-algebra containing $\mathcal{A}$, called the *$\sigma$-algebra generated by $\mathcal{A}$*.
> 3. If $\Omega$ is a topological space with open sets $\tau$ then the *Borel $\sigma$-algebra* is $\mathcal{B}(\Omega,\tau)=\sigma(\tau)$.
> 4. $(\Omega,\mathcal{F})$ is a *measurable space* if $\mathcal{F}$ is a $\sigma$-algebra on $\Omega$.
> 5. A function $\mu:\mathcal{F}\to \mathbb{R}_{\geq0}\cup\{+\infty\}$ is called a *measure* (on $\mathcal{F}$) if it satisfies the following properties:
> - $\mu(\emptyset)=0;$
> - ($\sigma$-additivity or countable additivity) If $A_{1},A_{2},\dots$ is a sequence of disjoint sets (that is, $A_{i}\cap A_{j}=\emptyset$ for $i\neq j$), such that $A_{i}\in \mathcal{F}$ for all $i$, then $
> \mu(\cup_{i=1}^{\infty}A_{i})=\sum_{i=1}^{\infty}\mu(A_{i}).$
> 6. A *measure space* is a triple $(\Omega, \mathcal{F}, \mu)$ such that $\Omega$ is a set, $\mathcal{F}$ is a $\sigma$-algebra on $\Omega$, and $\mu$ a measure on $\mathcal{F}$.
> 7. A measure space $(\Omega, \mathcal{F}, \mu)$ is called *$\sigma$-finite* if there is a sequence of sets $E_i \in \mathcal{F}$ such that $\mu\left(E_i\right)<\infty$ and $\Omega=\cup_{i=1}^{\infty} E_i$.
> 8. A measure $\mu$ is called *a probability measure* if $\mu(\Omega)=1$.
> 9. A *probability space* is a triple $(\Omega, \mathcal{F}, \mathbb{P})$, where $\Omega$ is a set, $\mathcal{F}$ is a $\sigma$-algebra on $\Omega$, and $\mathbb{P}$ is a probability measure on $\mathcal{F}$ ($\mathbb{P}(\Omega)=1$).
^d3fff1
>[!info] Definition
> Given a measure space $(\Omega, \mathcal{F}, \mu)$ , a set $A \subset \Omega$ is called a *null-set* (w. r. t. $\mu$ ) if it is contained in a measurable set of measure zero.
> The measure $\mu$ is called *complete* if all null-sets are $\mathcal{F}$-measurable.
> If $\mu$ is not complete, the collection
> $
> \overline{\mathcal{F}}:=\{A \cup B \subset \Omega: A \in \mathcal{F}, B \text { is a null-set }\}
> $
> is called the *completion* of $\mathcal{F}$.
>
>Note that $\overline{\mathcal{F}}$ is a $\sigma$-algebra, and $\mu$ can be extended to a measure $\bar{\mu}$ on $\overline{\mathcal{F}}$ by $\bar{\mu}(A):=\mu\left(A^{\prime}\right)$, where $A^{\prime} \subset A$ is any $\mathcal{F}$-measurable set such that $A \backslash A^{\prime}$ is a null-set.
> When extended to $\overline{\mathcal{F}}$, the measure $\mu$ is complete; we call this procedure a *completion of the measure $\mu$.*
>[!info] Definition - Measurable function
>A map $f: \Omega_1 \rightarrow \Omega_2$ between two measurable spaces $\left(\Omega_1, \mathcal{F}_1\right)$ and $\left(\Omega_2, \mathcal{F}_2\right)$ is called *measurable* if the preimage of any measurable set is measurable.
>[!info] Definition
>Let $\left(\Omega_1, \mathcal{F}_1\right)$ and $\left(\Omega_2, \mathcal{F}_2\right)$ be measurable spaces, $\mu$ a measure on $\left(\Omega_1, \mathcal{F}_1\right)$ and $f:\Omega_{1}\to \Omega_{2}$ a measurable function. Then the *push-forward* of $\mu$ by $f$, denoted $f_{*}(\mu)$, is the measure on $\left(\Omega_2, \mathcal{F}_2\right)$ defined by: $
> f_{*}(\mu)(B)= \mu(f^{-1}(B))
> $
^430066
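On a finite space the push-forward is a direct computation; the following sketch (the sets, masses, and parity map are all hypothetical choices) implements $f_{*}(\mu)(B)=\mu(f^{-1}(B))$:

```python
# Hypothetical measure on omega1 = {0, 1, 2, 3}, given by point masses.
omega1 = [0, 1, 2, 3]
mass = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}

def mu(subset):
    """Measure of a subset of omega1."""
    return sum(mass[w] for w in subset)

def f(w):
    """A measurable map omega1 -> omega2 = {0, 1} (parity)."""
    return w % 2

def pushforward(b):
    """(f_* mu)(B) = mu(f^{-1}(B))."""
    return mu({w for w in omega1 if f(w) in b})

p_even_image = pushforward({0})   # the mass landing on 0, i.e. mu({0, 2})
```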
#### Integration
>[!info] Definition - Lebesgue integral
> Let $(\Omega,\mathcal{F},\mu)$ be a measure space and $h:\Omega\to \mathbb{R}$ a measurable function. The *integral* $\int_{\Omega} h d \mu$ is defined in three steps.
> 1. If $h=\sum_{i=1}^n a_i \mathbb{I}_{A_i}$, where $A_1, \ldots, A_n$ are measurable sets of finite measure (such $h$ are called *simple functions*), put
> $
> \int_{\Omega} h d \mu:=\sum_{i=1}^n a_i \mu\left(A_i\right)
> $
> 2. If $h$ is a non-negative measurable function, put
> $
> \int_{\Omega} h d \mu:=\sup _{\substack{g \leq h \\ g \text { simple }}} \int_{\Omega} g d \mu
> $
> 3. For a general $h$, put $
> \int_{\Omega} h d \mu:=\int_{\Omega} h \mathbb{I}_{h \geq 0} d \mu-\int_{\Omega}\left(-h \mathbb{I}_{h<0}\right) d \mu
> $
> whenever at least one of the terms is finite; otherwise we say that the integral does not exist.
^842506
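Step 2 can be made concrete: for $h(x)=x$ on $(0,1)$ with Lebesgue measure, the simple functions $g_n=\sum_{i=0}^{n-1}\frac{i}{n}\mathbb{I}_{[i/n,(i+1)/n)}$ satisfy $g_n\leq h$, and their integrals increase towards $\int_{(0,1)}h\,d\lambda=\frac{1}{2}$. A minimal numeric sketch:

```python
def lower_simple_integral(n):
    """Integral of the simple function g_n = sum_i (i/n) * 1_{[i/n, (i+1)/n)},
    which lies below h(x) = x: the sum of a_i * lambda(A_i)."""
    return sum((i / n) * (1 / n) for i in range(n))

# The lower integrals increase towards the supremum, here 1/2.
approx = [lower_simple_integral(n) for n in (10, 100, 1000)]
```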
### Probability spaces
#### Random variables
>[!info] Definition
> Given a probability space ( $\Omega, \mathcal{F}, \mathbb{P}$ ), a measurable map from $\Omega$ to a measurable space ( $\Omega^{\prime}, \mathcal{F}^{\prime}$ ) is called *a random variable* (with values in $\Omega^{\prime}$ ).
> If $X$ is a random variable with values in $\Omega^{\prime}$, then the measure $\mu_{X}$ on $\mathcal{F}^{\prime}$ given by $ \mu_{X}(A):=\mathbb{P}\left(X^{-1}(A)\right) $
> is called the *distribution of the random variable $X$*.
> >[!note] Remark
> >This is precisely the push-forward of $\mathbb{P}$ by $X$: $\mu_{X}=X_{*}(\mathbb{P})$.
^b736ee
>[!note] Remark
> One can write $\mathbb{P}\left(X^{-1}(A)\right)=\mathbb{P}(\{\omega \in \Omega: X(\omega) \in A\})$.
> It is customary in Probability texts to use capital Latin letters for random variables (e. g., $X$ instead of $f$ ) and abbreviate the last formula to something like $\mathbb{P}(X \in A)$ or $\mathbb{P}(X\leq a)$ (when $a\in \mathbb{R}$).
>[!info] Definition
>A *scalar random variable* is a random variable with values in $\mathbb{R}$ (measurable with respect to the Borel $\sigma$-algebra).
>The *probability distribution function* (or *cumulative distribution*) of $X$ is the function $
> F_X(a):=\mathbb{P}(X \in(-\infty, a]) = \mathbb{P}(X\leq a)$
^dc0272
#### Conditional probability
>[!info] Definition
>Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, and $A \in \mathcal{F}$, then the *restriction* of $\mathcal{F}$ to $A$: $\mathcal{F}|_A:=\{A \cap B: B \in \mathcal{F}\}$ is a $\sigma$-algebra on $A$, and $\mu(B):=\mathbb{P}(A \cap B)$ is a measure on $\mathcal{F}|_A$.
>If $\mathbb{P}(A)=0$, this measure is identically zero; otherwise it can be normalized to a probability measure on $A$, called the *conditional probability*: $
> \mathbb{P}(B \mid A):=\frac{\mathbb{P}(B \cap A)}{\mathbb{P}(A)} . $
>[!note] Remark
> Exchanging $A$ and $B$ in the above definition, one arrives at *Bayes' formula*:
> $
> \mathbb{P}(B \mid A)=\frac{\mathbb{P}(A \mid B) \mathbb{P}(B)}{\mathbb{P}(A)} .
> $
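A worked numeric instance of Bayes' formula, with hypothetical numbers (think of $B$ as a rare condition and $A$ as a positive test):

```python
p_b = 0.01               # P(B): prior probability of the condition
p_a_given_b = 0.99       # P(A | B): true positive rate
p_a_given_not_b = 0.05   # P(A | B^c): false positive rate

# Law of total probability: P(A) = P(A|B)P(B) + P(A|B^c)P(B^c).
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)

# Bayes' formula: P(B | A) = P(A | B) P(B) / P(A).
p_b_given_a = p_a_given_b * p_b / p_a
```

Despite the accurate test, $\mathbb{P}(B\mid A)\approx 1/6$: the posterior is dominated by the small prior.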
### Dynkin's $\pi-\lambda$ theorem and uniqueness of measures
>[!info] Definition
> A collection $\mathcal{A}$ of subsets of a set $\Omega$ is called
> - A *$\pi$-system* if it is closed under intersections: $A, B \in \mathcal{A} \Rightarrow A \cap B \in \mathcal{A}$.
> - a *$\lambda$-system* if
> - $\Omega \in \mathcal{A}$;
> - if $A, B \in \mathcal{A}$ and $A \subset B$, then $B \backslash A \in \mathcal{A}$;
> - if $A_1 \subset A_2 \subset \ldots$ all belong to $\mathcal{A}$, then $\cup_{i=1}^{\infty} A_i \in \mathcal{A}$.
> >[!note] Remark
> >The last clause can be replaced by:
>>- if $A_j \in \mathcal{A}$ for $j \in \mathbb{N}$ and $A_i \cap A_j=\emptyset$ for $i \neq j$, then $\cup_{i=1}^{\infty} A_i \in \mathcal{A}$.
>[!example]
> Assume that $\mu$ and $\nu$ are two probability measures on the same $\sigma$-algebra $\mathcal{F}$. Then $\{A \in \mathcal{F}: \mu(A)=\nu(A)\}$ is a $\lambda$-system.
**Lemma.** If $\mathcal{A}$ is both a $\pi$-system and a $\lambda$-system, then it is a $\sigma$-algebra.
**Theorem ($\pi-\lambda$ theorem or Dynkin's lemma).** Let $\mathcal{A}$ be a $\pi$-system and $\Lambda$ be a $\lambda$-system. Then
$
\mathcal{A} \subset \Lambda \Rightarrow \sigma(\mathcal{A}) \subset \Lambda .
$
**Corollary (uniqueness of measures).**
1. If two measures $\mu_1$ and $\mu_2$ on the same $\sigma$-algebra $\mathcal{F}$ are such that $\mu_1(\Omega)=\mu_2(\Omega)< \infty$ and that they agree on a $\pi$-system $\mathcal{A}$, they agree also on $\sigma(\mathcal{A})$.
2. A probability measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is uniquely determined by its probability distribution function $F_{\mathbb{P}}(t):=\mathbb{P}((-\infty, t])$.
#### Usage example
>[!info] Definition
> Let $X,Y$ be scalar random variables. We say that *$X$ and $Y$ are independent* if $
> \mathbb{P}(X \in A, Y \in B)=\mathbb{P}(X \in A) \mathbb{P}(Y \in B), \quad \forall A, B \in \mathcal{B}(\mathbb{R}) . $
**Lemma.** Suppose that $
\mathbb{P}(X \leq s, Y \leq t)=\mathbb{P}(X \leq s) \mathbb{P}(Y \leq t), \quad \forall s, t \in \mathbb{R} .
$ Then $X$ and $Y$ are independent.
### Caratheodory extension and existence of measures
>[!info] Definition
> A collection $\mathcal{R} \subset 2^{\Omega}$ of subsets of a set $\Omega$ is called a *semi-ring* if the following conditions are satisfied:
> - $\emptyset \in \mathcal{R};$
> - it is a $\pi$-system, i.e. closed under intersections;
> - if $A, B \in \mathcal{R}$, then there exists a finite collection of disjoint sets $A_1, \ldots, A_n \in \mathcal{R}$ such that $ A \backslash B=\cup_{i=1}^n A_i $
> $\mathcal{R}$ is called a *ring* if, in addition, it is closed under finite union of disjoint sets.
>[!example]
>The following are semi-rings:
> - $\mathcal{I}:=\{[a , b): a, b \in \mathbb{R}\}$
> - $\mathcal{J}:=\{\emptyset\} \cup\{(a, b)$ : $a, b \in \mathbb{R}, a<b\} \cup\left\{\cup_{i=1}^n\left\{c_i\right\}: n \in \mathbb{N}, c_i \in \mathbb{R}\right\}$
**Lemma.** Let $\mathcal{R}_0$ be a semi-ring. Then $
\mathcal{R}:=\left\{\bigcup_{i=1}^n A_i: n \in \mathbb{N}, A_i \in \mathcal{R}_0,\left\{A_i : i=1,\dots, n \right\} \text{ disjoint }\right\}$ is a ring.
>[!info] Definition
> - We say that $\mu$ is a *finitely additive function* on a semi-ring $\mathcal{R}$ if whenever $A_1, \ldots, A_N \in \mathcal{R}$ are mutually disjoint and $A=\bigcup_{i=1}^N A_i \in \mathcal{R}$, then $\mu(A)=\sum_{i=1}^N \mu\left(A_i\right)$.
> - We say that $\mu$ is *countably subadditive* on $\mathcal{R}$ if for any $A=\bigcup_{i=1}^{\infty} A_i$ with $A, A_i \in \mathcal{R}$, we have $\mu(A) \leq \sum_{i=1}^{\infty} \mu\left(A_i\right)$.
> >[!note] Remark
> $\mu$ can be uniquely extended to the ring-extension of $\mathcal{R}$.
> - We say that a function $\mu: \mathcal{R} \rightarrow \mathbb{R}_{\geq 0}$, is a *pre-measure* on $\mathcal{R}$ if $\mu(\emptyset)=0, \mu$ is finitely additive and countably subadditive on $\mathcal{R}$.
^e50295
**Theorem (Caratheodory extension theorem).** Let $\mathcal{R}$ be a semi-ring on a set $\Omega$, and let $\mu: \mathcal{R} \rightarrow \mathbb{R}_{\geq 0}$ be a pre-measure. Then there exists a measure on $\sigma(\mathcal{R})$ that coincides with $\mu$ on $\mathcal{R}$.
Furthermore, if $\Omega$ is the countable union of members of $\mathcal{R}$, then the extension is unique.
*Proof idea.* Define for every $E \in 2^{\Omega}$ the *outer measure* $\mu^*(E):=\inf _{\substack{\cup_{i=1}^{\infty} A_i \supset E \\ A_i \in \mathcal{R}}} \sum_{i=1}^{\infty} \mu\left(A_i\right)$ and let $\mathcal{M}:=\left\{A \in 2^{\Omega}: \mu^*(E)=\mu^*(E \cap A)+\mu^*(E \backslash A), \forall E \in 2^{\Omega}\right\} .$
Show that $\mathcal{M}$ is a $\sigma$-algebra containing $\mathcal{R}$ and that $\mu^{\ast}$ is a measure on $\mathcal{M}$ extending $\mu$.
>[!example] Example + Definition (Lebesgue measure)
> There exists a unique measure $\lambda$ on $\mathbb{R}$ such that for any $a<b$, $\lambda([a, b))=b-a$.
> *Proof.* Apply the extension theorem to the semi-ring $\mathcal{J}$ above (finite unions of open intervals and singletons), with the pre-measure:
> $\operatorname{vol}\left(\cup_{j=1}^M (a_{j},b_{j}) \cup_{j=1}^N\left\{c_j\right\}\right)=\sum_{j=1}^M\left(b_j-a_j\right)$
**Lemma.** For any scalar random variable $X$, $F_X(a)$ is non-decreasing, right-continuous (i.e., $F_X\left(a_i\right) \rightarrow F_X(a)$ whenever $a_i \searrow a$), and has limits $\lim _{a \rightarrow+\infty} F_X(a)=1$ and $\lim _{a \rightarrow-\infty} F_X(a)=0$. Conversely, if $F$ is any function with these properties, then there exists a probability measure $\mu$ on $\mathcal{B}(\mathbb{R})$ such that $F(a)=\mu((-\infty, a])$.
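The converse direction is constructive: if $U$ is uniform on $(0,1)$, then $X:=F^{-1}(U)$ has distribution function $F$ (inverse-transform sampling). A sketch with the exponential distribution function $F(a)=1-e^{-a}$, $a\geq0$, as a hypothetical choice:

```python
import math
import random

def f_inv(u):
    """Inverse of F(a) = 1 - e^{-a}: maps uniform samples to exponential ones."""
    return -math.log(1.0 - u)

random.seed(0)
n = 200_000
samples = [f_inv(random.random()) for _ in range(n)]

# Empirical check at a = 1: the fraction of samples <= 1 approximates F(1).
emp = sum(1 for x in samples if x <= 1.0) / n
exact = 1 - math.exp(-1.0)
```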
### Expectation
>[!info] Definition
>The *expectation* of a real-valued random variable $X$ on $(\Omega,\mathcal{F},\mathbb{P})$ is defined as the [[Probability theory - notes#Integration|(Lebesgue) integral]] over the measure space:
> $
> \mathbb{E}(X)=\int_{\Omega} X d \mathbb{P}
> $
^89fd83
> [!tip] Intuition
> The expectation, or expected value, of a random variable represents a weighted average of the outcomes, where the weights are the respective probabilities. Thus, if $X$ has finitely many possible values $\{x_{1} ,\dots, x_{n}\}$, it reduces to:
> $ \mathbb{E}(X)= \sum_{i=1}^{n} x_{i} \mathbb{P}(X=x_{i})$
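The finite-support formula can be checked directly, again for a hypothetical fair die ($X$ = face value, each face with probability $1/6$):

```python
from fractions import Fraction

values = [1, 2, 3, 4, 5, 6]
prob = Fraction(1, 6)

# E(X) = sum_i x_i P(X = x_i); for the fair die this is 7/2.
expectation = sum(x * prob for x in values)
```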
>[!note] Terminology
> "Almost surely" = "except on a measure 0 set".
#### Properties of integral/expectation
##### Elementary properties
**Proposition**. The expectation satisfies the following properties:
- For a measurable set $A$, $\mathbb{E}(\mathbb{I}_{A})=\mathbb{P}(A)$;
- *Linearity*: if $\alpha, \beta \in \mathbb{R}$, and $\mathbb{E} X$ and $\mathbb{E} Y$ exist, then $\mathbb{E}(\alpha X+\beta Y)$ exists, and
$
\mathbb{E}(\alpha X+\beta Y)=\alpha \mathbb{E}(X)+\beta \mathbb{E}(Y) ;
$
- *Monotonicity*: if $X, Y$ are measurable such that $0 \leq X(\omega) \leq Y(\omega)$ for all $\omega \in \Omega$ and $\mathbb{E} Y$ exists, then $\mathbb{E}(X) \leq \mathbb{E}(Y)$.
Furthermore, if an operator on bounded measurable functions satisfies these properties, then it equals the expectation.
##### Convergence properties
**Theorem (monotone convergence theorem).** If $X_i \geq 0$ are measurable and $X_i \nearrow X$ almost surely, then $\mathbb{E}\left(X_i\right) \rightarrow \mathbb{E}(X)$.
>[!warning]+
>It is, in general, not true that $X_n \rightarrow X$ almost surely implies $\mathbb{E} X_n \rightarrow \mathbb{E} X$ :
**Example (Growing bump).** Let $\Omega=(0,1)$ with Lebesgue measure $\lambda$, and $X_n=n \mathbb{I}_{\left(0 ; \frac{1}{n}\right)}$. Then $X_n(\omega) \rightarrow 0$ for any $\omega \in(0,1)$, but $\mathbb{E} X_n=n \lambda\left(\left(0 ; \frac{1}{n}\right)\right) \equiv 1$.
**Lemma (Fatou).** If $X_n \geq 0$, then
$
\liminf \mathbb{E} X_n \geq \mathbb{E} \liminf X_n .
$
**Theorem (Lebesgue's dominated convergence theorem).** If $X_i$ are scalar random variables, $X_i \rightarrow X$ almost surely, and there exists a random variable $Y$ with $\mathbb{E}(|Y|)<\infty$ such that $\left|X_i\right| \leq|Y|$ almost surely for all $i$, then $\mathbb{E}\left(X_i\right) \rightarrow \mathbb{E}(X)$.
##### Inequalities
**Proposition.** The expectation satisfies the following useful inequalities:
- (*Cauchy-Schwarz inequality*)
$
\mathbb{E}(X Y) \leq \sqrt{\mathbb{E}\left(X^2\right)} \cdot \sqrt{\mathbb{E}\left(Y^2\right)} .
$
- (*Hölder's inequality*) for $p, q>1$ such that $\frac{1}{p}+\frac{1}{q}=1$,
$
\mathbb{E}(X Y) \leq\left(\mathbb{E}|X|^p\right)^{\frac{1}{p}}\left(\mathbb{E}|Y|^q\right)^{\frac{1}{q}} .
$
- (*Jensen's inequality*) if $f: \mathbb{R} \rightarrow \mathbb{R}$ is a convex function and $\mathbb{E}|X|<\infty, \mathbb{E}(|f(X)|)<\infty$, then
$
f(\mathbb{E}(X)) \leq \mathbb{E}(f(X))
$
Particularly useful cases are $|\mathbb{E}(X)| \leq \mathbb{E}(|X|)$ and $(\mathbb{E} X)^2 \leq \mathbb{E} X^2$.
- (*Chebyshev's inequality or Markov's inequality*) for a non-negative random variable $X$ and $a>0$,
$
\mathbb{P}(X \geq a) \leq \frac{\mathbb{E} X}{a} .
$
In particular, for a strictly increasing function $f:[0, \infty) \rightarrow[0, \infty)$,
$
\mathbb{P}(X \geq a)=\mathbb{P}(f(X) \geq f(a)) \leq \frac{\mathbb{E} f(X)}{f(a)} .
$
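A Monte Carlo sanity check of Markov's inequality, for the hypothetical choice $X=U^2$ with $U$ uniform on $(0,1)$ (so $\mathbb{E}X=1/3$):

```python
import random

random.seed(1)
n = 100_000
xs = [random.random() ** 2 for _ in range(n)]

mean = sum(xs) / n                         # approximates E(X) = 1/3
a = 0.5
tail = sum(1 for x in xs if x >= a) / n    # approximates P(X >= 1/2)
bound = mean / a                           # Markov bound E(X)/a
```

Here the bound is loose: $\mathbb{P}(X\geq 1/2)=1-1/\sqrt{2}\approx0.29$, against the bound $\approx 0.67$; Markov's inequality trades sharpness for generality.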
#### Change of variable
**Proposition (abstract change of variable theorem).** Let $(\Omega_1, \mathcal{F}_1, \mathbb{P})$ be a probability space, $(\Omega_2, \mathcal{F}_2)$ a measurable space, $X: \Omega_1 \rightarrow \Omega_2$ a random variable and $f: \Omega_2 \rightarrow \mathbb{R}$ a measurable function. Then $
\mathbb{E}(f \circ X)= \int_{\Omega_{1}} f(X) d\mathbb{P} =\int_{\Omega_2} f d \mu_X,
$ where $\mu_X$ denotes the [[Probability theory - notes#^b736ee|distribution]] of the random variable $X$. ^change-var
In particular, if $X:\Omega_{1} \to \mathbb{R}$, then: $
\mathbb{E}(X)=\int_{\mathbb{R}} x d \mu_X.
$
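On a finite probability space the identity is a matter of regrouping terms; the sketch below (outcomes, map, and $f$ all hypothetical) computes both sides exactly:

```python
from fractions import Fraction

# Four equally likely outcomes; X takes the values 0, 1, 1, 2 on them.
omega = ["a", "b", "c", "d"]
P = Fraction(1, 4)
X = {"a": 0, "b": 1, "c": 1, "d": 2}

def f(x):
    return x * x

# Left-hand side: integrate f(X) over the original space Omega.
lhs = sum(f(X[w]) * P for w in omega)

# Right-hand side: build the distribution mu_X, then integrate f against it.
mu_X = {}
for w in omega:
    mu_X[X[w]] = mu_X.get(X[w], Fraction(0)) + P
rhs = sum(f(x) * m for x, m in mu_X.items())
```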
#### Differentiating under the integral
>[!question]
>What conditions on a function $f:I \times \Omega \to \mathbb{R}$ are needed for the following equality to hold
>$\frac{\partial}{\partial x} \int_{\Omega} f(x, \omega) d \mu(\omega)=\int_{\Omega} \frac{\partial}{\partial x} f(x, \omega) d \mu(\omega) ?$
>[!example] Counterexample
>Let, for $y>0$ and $x \in \mathbb{R}$,
> $
> f(x, y)=\left\{\begin{array}{ll}
> e^{-\frac{y^2}{x^2}}, & x \neq 0 \\
> 0, & x=0
> \end{array} .\right.
> $
>Then, $\partial_x f(0, y)=0$ for all $y>0$ but by change of variable $w=y/x$:
> $
> \left.\partial_x \int_0^{\infty} f(x, y) d y\right|_{x=0}=\int_0^{\infty} e^{-w^2} d w \neq 0
> $
**Theorem (Differentiating an integral, real version).** Let $I \subset \mathbb{R}$ be an open interval, and ( $\Omega, \mathcal{F}, \mu$ ) a measure space. Assume that a function $f: I \times \Omega \rightarrow \mathbb{R}$ satisfies the following properties:
- for every $x \in I$, the function $\omega \mapsto f(x, \omega)$ is integrable;
- for almost every $\omega$ and every $x \in I$, the derivative $\partial_x f(x, \omega)$ of the function $x \mapsto f(x, \omega)$ exists;
- there is a measurable function $h: \Omega \rightarrow \mathbb{R}_{\geq 0}$ such that $\int_{\Omega} h d \mu<\infty$ and $\left|\partial_x f(x, \omega)\right| \leq h(\omega)$ for all $x \in I$ and almost all $\omega \in \Omega$.
Then, the derivative $\varphi^{\prime}(x)$ of the function $\varphi(x):=\int_{\Omega} f(x, \omega) d \mu(\omega)$ exists at all $x \in I$, and $\varphi^{\prime}(x)= \int_{\Omega} \partial_x f(x, \omega) d \mu(\omega)$.
**Theorem (Differentiating an integral, complex version).** Let $\Lambda \subset \mathbb{C}$ be an open set, $(\Omega, \mathcal{F}, \mu)$ a measure space, and assume that a function $f: \Lambda \times \Omega \rightarrow \mathbb{C}$ satisfies the following properties:
- for every $z \in \Lambda$, the function $\omega \mapsto f(z, \omega)$ is measurable;
- for almost every $\omega$, the function $z \mapsto f(z, \omega)$ is analytic in $\Lambda$;
- there is a measurable function $h: \Omega \rightarrow \mathbb{R}_{\geq 0}$ such that $\int_{\Omega} h \, d\mu<\infty$ and $|f(z, \omega)| \leq h(\omega)$ for all $z \in \Lambda$ and almost every $\omega \in \Omega$.
Then, the function $\varphi(z):=\int_{\Omega} f(z, \omega) d \mu(\omega)$ is analytic in $\Lambda$, and $\frac{\partial^n}{\partial z^n} \varphi(z)=\int_{\Omega} \frac{\partial^n}{\partial z^n} f(z, \omega) d \mu(\omega)$ for all $n=1,2, \ldots$ and for all $z \in \Lambda$.
### Density / Radon-Nikodym derivative
**Lemma.** Let $f \geq 0$ be a measurable function on a measure space $(\Omega, \mathcal{F}, \mu)$ (not necessarily of finite measure). Then $\mu^{\prime}$, defined on $\mathcal{F}$ by $
\mu^{\prime}(A):=\int_A f d \mu=\int_{\Omega}\left(f \cdot \mathbb{I}_A\right) d \mu,
$ is a measure on $\mathcal{F}$.
>[!info] Definition
> If $(\Omega, \mathcal{F})$ is a measurable space, $\mu,\mu^{\prime}$ are measures on $\mathcal{F}$, and there exists a measurable function $f:\Omega\to \mathbb{R}_{\geq 0}$ such that $\mu^{\prime}(A) \equiv \int_A f d \mu,$ then the function $f$ is called the *Radon-Nikodym derivative of $\mu^{\prime}$ with respect to $\mu$*.
>
> It is denoted $f=\frac{d \mu^{\prime}}{d \mu}$, or $d\mu'=fd\mu$.
>^radon-nikodym
**Theorem (Radon-Nikodym).** If $\mu,\mu'$ are $\sigma$-finite measures on the same measurable space such that $\mu(A)=0$ implies $\mu^{\prime}(A)=0$, then there exists a function $f$ such that $d \mu^{\prime}=f d \mu$ (i.e. the Radon-Nikodym derivative of $\mu'$ with respect to $\mu$ exists).
>[!info] Definition
>
> In the special case when $\mu$ is the Lebesgue measure (on $\mathbb{R}$ or on $\mathbb{R}^n$) and $\mu^{\prime}$ is a probability measure, the function $f$ above is called a *probability density*.
>
> If $\mu'=\mu_{X}$ is the probability distribution of a random variable $X$, then $f=f_{X}$ is the *probability density of $X$*, so that $d\mu_{X}=f_{X}d\lambda$, or in other words:
> $\mathbb{P}(X\in A)= \int_{A}f_{X}d\lambda.$
>^density
>[!example]
>A scalar random variable with probability density function
> $
> f(x)=\frac{1}{\sqrt{2 \pi}} e^{-\frac{x^2}{2}}
> $
> is called *standard Gaussian*. A scalar random variable $X$ is called *Gaussian* if $X=\sigma X^{\prime}+\mu$, where $\sigma \geq 0$, $\mu \in \mathbb{R}$, and $X^{\prime}$ is a standard Gaussian.
>[!note] Remark
>If $X$ is a random variable such that its distribution function $F_{X}$ has a continuous derivative, then $F'_{X}$ is the probability density of $X$, so that $\mathbb{P}(X \leq a) = F_{X}(a) = \int_{-\infty}^{a} F'_{X}(x) \, dx$.
>[!note] Remark
>By the [[Probability theory - notes#^change-var|change of variable theorem]], if $X$ has a [[Probability theory - notes#^density|density function]] $f_{X}$, so that $d\mu_{X}=f_{X}dx$:
> $
> \mathbb{E}(X)=\int_{\mathbb{R}} x d \mu_X=\int_{\mathbb{R}} xf_{X}(x) dx.
> $
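The identity $\mathbb{P}(X\in A)=\int_A f_X\,d\lambda$ can be verified numerically for the standard Gaussian density, comparing simple quadrature with the closed form $\mathbb{P}(|X|\leq 1)=\operatorname{erf}(1/\sqrt{2})$:

```python
import math

def density(x):
    """Standard Gaussian density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def trapezoid(f, a, b, n=10_000):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

p = trapezoid(density, -1.0, 1.0)    # P(X in [-1, 1]) via the density
exact = math.erf(1 / math.sqrt(2))   # closed form, about 0.6827
```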
### Direct products of measure spaces and Fubini’s theorem
**Theorem.** If $\left(\Omega_i, \mathcal{F}_i, \mu_i\right), i=1, \ldots, n$ are $\sigma$-finite measure spaces, then there is a unique measure $\mu_1 \otimes \cdots \otimes \mu_n$ on $\sigma\left(\mathcal{F}_1 \times \cdots \times \mathcal{F}_n\right)$, such that for any $A_1 \in \mathcal{F}_1, \ldots, A_n \in \mathcal{F}_n$, one has
$
\mu_1 \otimes \cdots \otimes \mu_n\left(A_1 \times \cdots \times A_n\right)=\prod_{i=1}^n \mu_i\left(A_i\right) .
$
>[!note] Notation
>Denote by $\mathcal{F}_{1}\otimes\cdots\otimes \mathcal{F}_{n}$ the completion of $\sigma(\mathcal{F}_{1}\times\cdots \times \mathcal{F}_{n})$.
**Theorem (Cavalieri principle).** Given $\sigma$-finite measure spaces $(\Omega_1, \mathcal{F}_1, \mu_1)$ and $(\Omega_2, \mathcal{F}_2, \mu_2)$, let $E \in \sigma\left(\mathcal{F}_1 \times \mathcal{F}_2\right)$. Then
- for all $\omega \in \Omega_1$, the set $E_\omega:=\left\{\omega^{\prime} \in \Omega_2:\left(\omega, \omega^{\prime}\right) \in E\right\}$ is $\mathcal{F}_2$ measurable;
- the function $f_E: \Omega_1 \rightarrow \mathbb{R}$, defined by $f_E(\omega):=\mu_2\left(E_\omega\right)$ is $\mathcal{F}_1$-to- $\mathcal{B}(\mathbb{R})$ measurable and
$
\mu_1 \otimes \mu_2(E)=\int_{\Omega_1} f_E d \mu_1
$
**Theorem (Cavalieri principle for completed spaces).** Let $\left(\Omega_1, \mathcal{F}_1, \mu_1\right)$ and $\left(\Omega_2, \mathcal{F}_2, \mu_2\right)$ be complete $\sigma$-finite measure spaces. Assume that $E \subset \Omega_1 \times \Omega_2$ is $\mathcal{F}_1 \otimes \mathcal{F}_2$ measurable. Then
- for $\mu_1$-almost every $\omega \in \Omega_1$, the set $E_\omega:=\left\{\omega^{\prime} \in \Omega_2:\left(\omega, \omega^{\prime}\right) \in E\right\}$ is $\mathcal{F}_2$ measurable;
- the function $f_E(\omega):=\mu_2\left(E_\omega\right)$ (defined almost everywhere on $\Omega_1$) is $\mathcal{F}_1$-to-$\mathcal{B}(\mathbb{R})$ measurable;
- one has the identity
$
\mu_1 \otimes \mu_2(E)=\int_{\Omega_1} f_E d \mu_1
$
**Theorem (Tonelli's theorem).** Let ( $\Omega_1, \mathcal{F}_1, \mu_1$ ) and ( $\Omega_2, \mathcal{F}_2, \mu_2$ ) be complete $\sigma$-finite measure spaces. If $f: \Omega_1 \times \Omega_2 \rightarrow \mathbb{R}_{\geq 0}$ is $\mathcal{F}_1 \otimes \mathcal{F}_2$ measurable, then
- For a. e. $\omega \in \Omega_1$, the function $f_\omega(\cdot):=f(\omega, \cdot): \Omega_2 \rightarrow \mathbb{R}$ is $\mathcal{F}_2$ measurable.
- The function $\omega \mapsto \int_{\Omega_2} f_\omega\left(\omega^{\prime}\right) d \mu_2\left(\omega^{\prime}\right)$, defined almost everywhere on $\Omega_1$, is $\mathcal{F}_1$-measurable.
- The following identity holds:
$
\int_{\Omega_1 \times \Omega_2} f d\left(\mu_1 \otimes \mu_2\right)=\int_{\Omega_1}\left(\int_{\Omega_2} f_\omega\left(\omega^{\prime}\right) d \mu_2\left(\omega^{\prime}\right)\right) d \mu_1(\omega)
$
**Theorem (Fubini's theorem).** Let ( $\Omega_1, \mathcal{F}_1, \mu_1$ ) and ( $\Omega_2, \mathcal{F}_2, \mu_2$ ) be complete $\sigma$-finite measure spaces. If $f: \Omega_1 \times \Omega_2 \rightarrow \mathbb{R}$ is $\mathcal{F}_1 \otimes \mathcal{F}_2$ measurable and such that
$
\int_{\Omega_1 \times \Omega_2}|f| d\left(\mu_1 \otimes \mu_2\right)<\infty
$
then the conclusion of Tonelli's theorem holds.
## 2. Laws of Large Numbers
### Independence
Fix a probability space $(\Omega,\mathcal{F},\mathbb{P})$.
>[!info] Definition
>Two events $A, B \in \mathcal{F}$ are called *independent* if
> $
> \mathbb{P}(A \cap B)=\mathbb{P}(A) \mathbb{P}(B).
> $
>
>A finite collection $A_1, \ldots, A_n$ of events is called independent if, for any indices $1 \leq i_1<\cdots<i_k \leq n$, one has
>$\mathbb{P}\left(A_{i_1} \cap \cdots \cap A_{i_k}\right)=\mathbb{P}\left(A_{i_1}\right) \cdots \mathbb{P}\left(A_{i_k}\right).$
>A finite collection $X_1, \ldots, X_n$ of random variables is independent if for any measurable sets $A_1, \ldots, A_n$, the events $\left\{X_i \in A_i\right\}, i=1, \ldots, n$, are independent, that is, the preimages of $A_i$ 's under $X_i$ 's are independent:
> $
> \mathbb{P}\left(X_1 \in A_1 ; \ldots ; X_n \in A_n\right)=\prod_{i=1}^n \mathbb{P}\left(X_i \in A_i\right) .
> $
> A countable collection of events (random variables) is called independent if all its finite sub-collections are independent.
>
>A collection of random variables is called *independent, identically distributed*, in short *i.i.d.*, if the variables are independent and all have the same distribution.
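For (hypothetically) independent uniforms, the defining factorization $\mathbb{P}(X\in A, Y\in B)=\mathbb{P}(X\in A)\,\mathbb{P}(Y\in B)$ can be checked by simulation:

```python
import random

random.seed(2)
n = 200_000
hits_a = hits_b = hits_both = 0
for _ in range(n):
    x, y = random.random(), random.random()
    in_a = x < 0.3        # A = (0, 0.3)
    in_b = y > 0.6        # B = (0.6, 1)
    hits_a += in_a
    hits_b += in_b
    hits_both += in_a and in_b

# Empirical probabilities: the joint one should factor into the marginals.
p_a, p_b, p_ab = hits_a / n, hits_b / n, hits_both / n
```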
**Theorem.** Scalar random variables $X_1, \ldots, X_N$ are independent if and only if, for any measurable functions $f_i: \mathbb{R} \rightarrow \mathbb{R}$ such that $\mathbb{E} f_i\left(X_i\right)$ exists, one has
$
\mathbb{E}\left(\prod_{i=1}^N f_i\left(X_i\right)\right)=\prod_{i=1}^N \mathbb{E} f_i\left(X_i\right) .
$
**Proposition.** If independent scalar random variables $X_1, \ldots, X_N$ have densities $f_1, \ldots, f_N$, then the random vector $X=\left(X_1, \ldots, X_N\right)$ has density
$
f\left(x_1, \ldots x_N\right)=f_1\left(x_1\right) \cdots f_N\left(x_N\right)
$
with respect to the $N$-dimensional Lebesgue measure $\lambda^N$.
Conversely, if the random vector $X=\left(X_1, \ldots, X_N\right)$ has a density $f$, and there exist integrable functions $f_i: \mathbb{R} \rightarrow \mathbb{R}_{\geq 0}$ such that the above equality holds almost everywhere, then $X_1, \ldots, X_N$ are independent.
### Weak law of large numbers
**Theorem.** Let $X_1, \ldots, X_n,\dots$ be i.i.d. scalar random variables such that $\mathbb{E}\left|X_1\right|<\infty$. Denote $S_n:=X_1+\cdots+X_n$ and $\mu:=\mathbb{E} X_1$. Then, for every $\varepsilon>0$,
$
\mathbb{P}\left(\left|\frac{S_n}{n}-\mu\right|>\varepsilon\right) \xrightarrow{n \rightarrow \infty} 0 .
$
**Proposition.** Let $X_1, \ldots, X_n$ be i.i.d. random variables with $\mu:=\mathbb{E} X_1$ and $\sigma^2:=\operatorname{Var} X_1< \infty$. Then
$
\mathbb{P}\left(\left|\frac{S_n}{n}-\mu\right|>\varepsilon\right) \leq \frac{\sigma^2}{\varepsilon^2 n} .
$
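A simulation of the weak law together with the Chebyshev-type bound above, for i.i.d. uniform $(0,1)$ variables (a hypothetical choice, with $\mu=1/2$ and $\sigma^2=1/12$):

```python
import random

random.seed(3)
n, trials, eps = 1_000, 2_000, 0.05
mu, var = 0.5, 1 / 12

deviations = 0
for _ in range(trials):
    sample_mean = sum(random.random() for _ in range(n)) / n
    deviations += abs(sample_mean - mu) > eps

freq = deviations / trials       # empirical P(|S_n/n - mu| > eps)
bound = var / (eps ** 2 * n)     # the bound sigma^2 / (eps^2 n), about 0.033
```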
### Strong law of large numbers
**Theorem. (Borel-Cantelli lemma)** Assume that $A_1, A_2, \ldots$ are events on the same probability space such that
$
\sum_{i=1}^{\infty} \mathbb{P}\left(A_i\right)<\infty .
$
Let $N(\omega):=\#\left\{i: \omega \in A_i\right\}$. Then $\mathbb{P}(N=\infty)=0$.
**Theorem (Strong law of large numbers).** Assume that $X_i$ are i.i.d. scalar random variables with expectation $\mu$ (such that $\mathbb{E} X_1^4<\infty$). Then, with probability 1,
$
\frac{1}{n} \sum_{i=1}^n X_i \rightarrow \mu .
$
### Kolmogorov’s zero-one law
### Convergence of random variables
>[!info] Definition
> Let $X, X_1, X_2, \ldots$ be scalar random variables defined on the same probability space $\Omega$. We say that
> - $X_i \rightarrow X$ *a.s.* ($X_i$ converges to $X$ *almost surely*) if there is an event $E$ of probability 1 such that $X_i(\omega) \rightarrow X(\omega)$ for each $\omega \in E$;
> - $X_i \xrightarrow{\mathcal{P}} X$ ( $X_i$ converges to $X$ *in probability*), if for any $\varepsilon>0, \mathbb{P}\left(\left|X_i-X\right|>\varepsilon\right) \rightarrow 0$;
> - $X_i \xrightarrow{L^p} X$, ( $X_i$ converges to $X$ *in $L^p$* ) where $p \geq 1$, if $\mathbb{E}\left|X_i-X\right|^p \rightarrow 0$.
> - The most common cases are $p=1$ (*convergence in mean*) and $p=2$ (*mean-square convergence*).
>
> Let $X, X_1, X_2, \ldots$ be random variables with values in the same metric space $M$ (but not necessarily defined on the same probability space). We say that
> - $X_i \xrightarrow{\mathcal{D}} X$ ( $X_i$ converges to $X$ *in distribution*) if for any bounded continuous function $f: M \rightarrow \mathbb{R}$, one has
> $
> \mathbb{E}\left(f\left(X_i\right)\right) \rightarrow \mathbb{E}\left(f(X)\right) .
> $
**Theorem.** A sequence $X_i$ of scalar random variables converges in distribution to $X$ if and only if
$
F_{X_i}(a) \rightarrow F_X(a)
$
for all $a \in \mathbb{R}$ such that $F_X$ is continuous at $a$.
#### Implications between notions of convergence
**Proposition.** There are the following implications between notions of convergence:
1) a. s. $\implies$ in probability;
2) in $L^p$ for any $p \geq 1$ $\implies$ in probability;
3) in probability $\implies$ in distribution;
4) in $L^p$ $\implies$ in $L^q$ if $q<p$.
**Remark.** No other implication holds in general.
#### Converging subsequences
**Proposition.** If $X_i \xrightarrow{\mathcal{P}} X$, then there is a subsequence $X_{i_k}$ such that $X_{i_k} \rightarrow X$ almost surely.
>[!info] Definition
> A sequence $X_i$ of scalar random variables is called *tight* if for any $\varepsilon>0$, there exists $R>0$ such that for any $i$,
> $
> \mathbb{P}\left(X_i \in[-R ; R]\right)>1-\varepsilon .
> $
**Theorem.** (Helly's selection theorem)
- If $X_i$ is any sequence of scalar random variables, then there is a subsequence $i_k$ and a right-continuous non-decreasing function $F: \mathbb{R} \rightarrow[0,1]$ such that $F_{X_{i_k}}(a) \rightarrow F(a)$ for all $a$ at which $F$ is continuous.
- If, in addition, $X_i$ is tight, then $F$ is a distribution function of a random variable $X$ (and thus $X_{i_k} \xrightarrow{\mathcal{D}} X$ ).
#### Convergence in distribution
**Theorem.** The following statements are equivalent:
(i) $X_n \xrightarrow{\mathcal{D}} X$
(ii) For all open sets $G, \liminf _{n \rightarrow \infty} P\left(X_n \in G\right) \geq P\left(X \in G\right)$.
(iii) For all closed sets $K, \limsup _{n \rightarrow \infty} P\left(X_n \in K\right) \leq P\left(X\in K\right)$.
(iv) For all Borel sets $A$ with $P\left(X \in \partial A\right)=0$, $\lim _{n \rightarrow \infty} P\left(X_n \in A\right)= P\left(X\in A\right)$.
### Characteristic functions
>[!info] Definition
> If $X$ is a scalar random variable, then the *characteristic function* of $X$ is defined as
> $
> \varphi_X(t)=\mathbb{E} e^{i t X}, \quad t \in \mathbb{R} .
> $
>**Remark.** Since $\left|e^{i t X}\right|=1$, the expectation always exists.
**Proposition.** If $X$ and $Y$ are independent, then
$
\varphi_{X+Y}(t) \equiv \varphi_X(t) \varphi_Y(t).
$
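The product formula can be checked by Monte Carlo, using the empirical characteristic function $\frac{1}{n}\sum_k e^{itx_k}$ for two (hypothetically) independent standard Gaussians, whose characteristic function $e^{-t^2/2}$ is classical:

```python
import cmath
import math
import random

random.seed(4)
n, t = 100_000, 0.7

def phi_hat(samples, t):
    """Empirical characteristic function: approximates E e^{itX}."""
    return sum(cmath.exp(1j * t * x) for x in samples) / len(samples)

xs = [random.gauss(0, 1) for _ in range(n)]
ys = [random.gauss(0, 1) for _ in range(n)]

lhs = phi_hat([x + y for x, y in zip(xs, ys)], t)   # phi of the sum
rhs = phi_hat(xs, t) * phi_hat(ys, t)               # product of the phis
exact = math.exp(-t * t)   # characteristic function of X + Y ~ N(0, sqrt(2))
```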
>[!info] Definition
>Given an integrable function $f: \mathbb{R} \rightarrow \mathbb{C}$, the *Fourier transform* $\mathfrak{F}(f)$ is defined by
> $
> \mathfrak{F} f(t)=\int_{\mathbb{R}} e^{i t x} f(x) d x
> $
**Proposition.** (Fourier inversion formula) If $f$ is integrable and continuously differentiable, then
$
\mathfrak{F} \mathfrak{F} f(t):=\lim _{R \rightarrow \infty} \int_{[-R ; R]} e^{-i t \theta}\left(\int_{\mathbb{R}} e^{i \theta x} f(x) d x\right) d \theta=2 \pi f(t)
$
**Corollary.** The distribution of a scalar random variable is uniquely determined by its characteristic function.
**Corollary.** If $X$ is a scalar random variable, and $X_1, X_2, \ldots$ is a tight sequence of scalar random variables such that $\varphi_{X_n}(t) \rightarrow \varphi_X(t)$ for all $t \in \mathbb{R}$, then $X_n \xrightarrow{\mathcal{D}} X$.
### The Central Limit Theorem
>[!info] Definition
>A scalar random variable with probability density function
>
> $
> f(x)=\frac{1}{\sqrt{2 \pi}} e^{-\frac{x^2}{2}}
> $
>
> is called *standard Gaussian*.
> A scalar random variable $X$ is called *Gaussian* if $X=\sigma X^{\prime}+\mu$, where $\sigma \geq 0$, $\mu \in \mathbb{R}$, and $X^{\prime}$ is a standard Gaussian.
>
> A Gaussian random variable with $\sigma>0$ has probability density
> $
> \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x-\mu)^2}{2 \sigma^2}} .
> $
>
> This distribution is denoted by $\mathcal{N}(\mu, \sigma)$.
**Theorem.** (The Central Limit Theorem) Let $X_1, \ldots, X_n,\dots$ be independent, identically distributed scalar random variables such that $\mathbb{E} X_1=0$ and $\mathbb{E} X_1^2=\sigma^2<\infty$. Then
$
\frac{S_n}{\sqrt{n}}:=\frac{\sum_{i=1}^n X_i}{\sqrt{n}} \xrightarrow{\mathcal{D}} \mathcal{N}(0, \sigma) .
$
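A simulation of the CLT for centered uniforms ($X_i = U_i - 1/2$, a hypothetical choice with $\mathbb{E}X_1=0$ and $\sigma^2=1/12$), comparing an empirical probability for $S_n/\sqrt{n}$ with the Gaussian limit:

```python
import math
import random

random.seed(5)
n, trials = 400, 5_000
sigma = math.sqrt(1 / 12)

normalized = []
for _ in range(trials):
    s = sum(random.random() - 0.5 for _ in range(n))
    normalized.append(s / math.sqrt(n))

# P(S_n / sqrt(n) <= sigma) should approach Phi(1) under the N(0, sigma) limit.
emp = sum(1 for z in normalized if z <= sigma) / trials
gauss = 0.5 * (1 + math.erf(1 / math.sqrt(2)))   # Phi(1), about 0.8413
```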
![[Probability theory - glossary]]