Three possible sources of uncertainty:
- Inherent stochasticity (randomness) in the system being modeled.
- Incomplete observability.
- Incomplete modeling. (For example, discretizing a continuous trajectory discards the exact position.)
Random Variables
A random variable is a variable that can take on different values randomly.
Random variables may be discrete or continuous.
- A discrete random variable is one that has a finite or countably infinite number of states.
- A continuous random variable is associated with a real value.
Probability Distributions
A probability distribution is a description of how likely a random variable or set of random variables is to take on each of its possible states.
Discrete Variables and Probability Mass Function (PMF)
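Single random variable: $\mathbf{x} \sim P(\mathbf{x})$ means that the random variable $\mathbf{x}$ follows the probability distribution $P(\mathbf{x})$; $P(\mathbf{x}=x)$ is the probability that $\mathbf{x}$ takes the value $x$.
Multiple random variables: the joint probability distribution $P(\mathbf{x} = x,\mathbf{y} = y)$, abbreviated as $P(x,y)$.
A PMF $P$ must satisfy the following properties: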
- The domain of P must be the set of all possible states of $\mathbf{x}$.
- $\forall x \in \mathbf{x},0 \le P(x) \le 1$
- $\sum_{x \in \mathbf{x}} P(x) = 1$
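The last two properties are easy to check mechanically for a finite table. The following is a minimal sketch, assuming the PMF is stored as a Python dict from state to probability; the helper name `is_valid_pmf` and the two example tables are hypothetical.

```python
import math

def is_valid_pmf(pmf, tol=1e-9):
    """Return True if `pmf` (dict: state -> probability) satisfies the PMF properties."""
    # Every probability must lie in [0, 1].
    in_range = all(0.0 <= p <= 1.0 for p in pmf.values())
    # The probabilities must sum to 1, up to floating-point tolerance.
    normalized = math.isclose(sum(pmf.values()), 1.0, abs_tol=tol)
    return in_range and normalized

# Illustrative tables: a fair six-sided die, and a table whose entries sum to 1.1.
fair_die = {face: 1 / 6 for face in range(1, 7)}
not_a_pmf = {"a": 0.5, "b": 0.6}

print(is_valid_pmf(fair_die))
print(is_valid_pmf(not_a_pmf))
```

The domain property is implicit here: the dict keys are taken to be the full set of states.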
Continuous Variables and Probability Density Functions (PDF)
A PDF $p$ must satisfy the following properties:
- The domain of p must be the set of all possible states of $\mathbf{x}$.
- $\forall x \in \mathbf{x},p(x) \ge 0$. Note that $p(x) \le 1$ is not required.
- $\int p(x)dx = 1$
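A density can exceed 1 and still integrate to 1. As a rough illustration, the sketch below uses the uniform distribution on $[0, 0.5]$, whose density is 2 on that interval, and approximates the integral with a midpoint rule. The interval, the function names, and the grid size are all illustrative choices.

```python
def uniform_pdf(x, a=0.0, b=0.5):
    """Density of the uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def integrate(f, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * width) for i in range(n)) * width

density_at_quarter = uniform_pdf(0.25)          # 2.0, which is greater than 1
total_mass = integrate(uniform_pdf, -1.0, 1.0)  # approximately 1

print(density_at_quarter, total_mass)
```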
Marginal Probability
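Given a probability distribution over a set of variables, the probability distribution over just a subset of them is known as the marginal probability distribution.
Given the joint probability distribution $P(\mathbf{x},\mathbf{y})$,
the marginal $P(\mathbf{x})$ is obtained by summing over $\mathbf{y}$: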
\begin{equation}
\forall x \in \mathbf{x},P(x) = \sum_y P(\mathbf{x} = x,\mathbf{y} = y)
\end{equation}
For continuous variables, the sum becomes an integral:
\begin{equation}
p(x) = \int p(x,y)dy
\end{equation}
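For a discrete joint table, marginalization is just a sum over the unwanted variable. A minimal sketch, assuming the joint distribution is stored as a dict keyed by `(x, y)` pairs; the weather/traffic numbers are made up for illustration.

```python
from collections import defaultdict

# Hypothetical joint distribution P(x, y): x is the weather, y is the traffic level.
joint = {
    ("sunny", "light"): 0.4,
    ("sunny", "heavy"): 0.1,
    ("rainy", "light"): 0.2,
    ("rainy", "heavy"): 0.3,
}

def marginal_x(joint):
    """Sum the joint P(x, y) over y to obtain the marginal P(x)."""
    p_x = defaultdict(float)
    for (x, _y), p in joint.items():
        p_x[x] += p
    return dict(p_x)

p_x = marginal_x(joint)
print(p_x)
```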
Conditional Probability
\begin{equation}
P(\mathbf{y} = y | \mathbf{x} = x) = \frac{P(\mathbf{y} = y,\mathbf{x} = x)}{P(\mathbf{x} = x)}
\end{equation}
The conditional probability is only defined when $P(\mathbf{x} = x) > 0$. (We cannot compute the conditional probability conditioned on an event that never happens.)
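The definition translates directly into code for a discrete joint table. The sketch below assumes the same dict-of-pairs representation as a hypothetical weather/traffic example, and raises an error when the conditioning event has zero probability.

```python
# Hypothetical joint distribution P(x, y): x is the weather, y is the traffic level.
joint = {
    ("sunny", "light"): 0.4,
    ("sunny", "heavy"): 0.1,
    ("rainy", "light"): 0.2,
    ("rainy", "heavy"): 0.3,
}

def conditional_y_given_x(joint, x):
    """Return P(y | x) as a dict, using P(y | x) = P(y, x) / P(x)."""
    p_x = sum(p for (xi, _y), p in joint.items() if xi == x)
    if p_x <= 0:
        # Conditioning on an event that never happens is undefined.
        raise ValueError(f"P(x = {x!r}) is zero; conditional probability is undefined")
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}

p_y_given_rainy = conditional_y_given_x(joint, "rainy")
print(p_y_given_rainy)
```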
The Chain Rule of Conditional Probabilities
(Omitted.)
Independence and Conditional Independence
Two random variables x and y are independent, written $\mathbf{x} \perp \mathbf{y}$, if
\begin{equation}
\forall x \in \mathbf{x}, y \in \mathbf{y}, p(x,y) = p(x)p(y)
\end{equation}
Two random variables x and y are conditionally independent given a random variable z, written $\mathbf{x} \perp \mathbf{y} \mid \mathbf{z}$, if
\begin{equation}
\forall x \in \mathbf{x}, y \in \mathbf{y}, z \in \mathbf{z} ,p(x,y|z) = p(x|z)p(y|z)
\end{equation}
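The independence condition can be tested numerically on a discrete joint table by comparing every entry with the product of its marginals. A minimal sketch; the function name `are_independent` and the two coin examples are illustrative. The conditional case would apply the same check to each slice $p(x,y|z)$.

```python
import math
from itertools import product

def are_independent(joint, tol=1e-9):
    """Check whether p(x, y) = p(x) p(y) holds for every pair of states."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    # Marginals obtained by summing the joint over the other variable.
    p_x = {x: sum(joint.get((x, y), 0.0) for y in ys) for x in xs}
    p_y = {y: sum(joint.get((x, y), 0.0) for x in xs) for y in ys}
    return all(
        math.isclose(joint.get((x, y), 0.0), p_x[x] * p_y[y], abs_tol=tol)
        for x, y in product(xs, ys)
    )

# Two separate fair coin flips are independent.
two_coins = {(a, b): 0.25 for a in "HT" for b in "HT"}
# A coin and an exact copy of it are not.
copied_coin = {("H", "H"): 0.5, ("T", "T"): 0.5}

print(are_independent(two_coins))
print(are_independent(copied_coin))
```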
Expectation, Variance and Covariance
expectation or expected value
For a discrete random variable $\mathbf{x}$ with probability distribution $P(\mathbf{x})$, the expected value is:
\begin{equation}
E[\mathbf{x}] = \sum_x xP(x)
\end{equation}
\begin{equation}
E[f(\mathbf{x})] = \sum_x f(x)P(x)
\end{equation}
For a continuous random variable $\mathbf{x}$ with density $p(\mathbf{x})$, the expected value is:
\begin{equation}
E[\mathbf{x}] = \int xp(x)dx
\end{equation}
\begin{equation}
E[f(\mathbf{x})] = \int f(x)p(x)dx
\end{equation}
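For the discrete case, both formulas reduce to a weighted sum. A small sketch using a fair six-sided die as an assumed example; the helper `expectation` is hypothetical.

```python
def expectation(pmf, f=lambda x: x):
    """E[f(x)] = sum over x of f(x) P(x), for a PMF stored as a dict."""
    return sum(f(x) * p for x, p in pmf.items())

fair_die = {face: 1 / 6 for face in range(1, 7)}

mean = expectation(fair_die)                         # E[x], about 3.5
mean_square = expectation(fair_die, lambda x: x**2)  # E[x^2], about 91/6

print(mean, mean_square)
```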
variance
\begin{equation}
Var(\mathbf{x}) = E[(\mathbf{x}-\mu)^2]=E[\mathbf{x}^2]-(E[\mathbf{x}])^2, \quad \mu = E[\mathbf{x}]
\end{equation}
\begin{equation}
Var(f(\mathbf{x})) = E[(f(\mathbf{x})-E[f(\mathbf{x})])^2]
\end{equation}
covariance
\begin{equation}
Cov(f(x),g(y)) = E[(f(x)-E[f(x)])(g(y)-E[g(y)])]
\end{equation}
\begin{equation}
Cov(X,Y) = E[(X-\mu_x)(Y-\mu_y)]
\end{equation}
\begin{equation}
Cov(X,Y) = E(XY)-E(X)E(Y)
\end{equation}
\begin{equation}
Cov(X,X) = Var(X)
\end{equation}
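The identities above can be checked numerically on a small joint table. The following sketch assumes two binary variables with a made-up joint distribution and compares the definition of covariance with the shortcut $E(XY)-E(X)E(Y)$.

```python
import math

# Hypothetical joint distribution over two binary variables (x, y).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def expect(joint, f):
    """E[f(x, y)] under the joint distribution."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

mu_x = expect(joint, lambda x, y: x)
mu_y = expect(joint, lambda x, y: y)

# Variance of x from the definition, and Cov(X, X) computed the same way.
var_x = expect(joint, lambda x, y: (x - mu_x) ** 2)
cov_xx = expect(joint, lambda x, y: (x - mu_x) * (x - mu_x))

# Covariance from the definition and from the shortcut formula.
cov_definition = expect(joint, lambda x, y: (x - mu_x) * (y - mu_y))
cov_shortcut = expect(joint, lambda x, y: x * y) - mu_x * mu_y

print(var_x, cov_definition, cov_shortcut)
print(math.isclose(cov_definition, cov_shortcut))
```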
Common Probability Distributions
Bernoulli distribution
Multinoulli Distribution
Gaussian distribution / normal distribution
Exponential and Laplace Distributions
The Dirac Distribution and Empirical Distribution
When all of the probability mass is concentrated at a single point, the PDF is defined using the Dirac delta function $\delta (x)$ (a kind of generalized function):
\begin{equation}
p(x) = \delta (x-\mu)
\end{equation}
The Dirac delta function is defined such that it is zero-valued everywhere except 0, yet integrates to 1.
The Dirac delta function can be viewed as the limit (in the sense of distributions) of a sequence of zero-centered normal distributions whose variance shrinks to zero.
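One way to see the limiting behaviour is to integrate a test function against ever narrower zero-centered normal densities: the result should approach the value of the function at 0. The sketch below uses $\cos$ as an arbitrary test function and a midpoint rule over $\pm 10\sigma$; both choices, and the name `smoothed_value`, are assumptions made for illustration.

```python
import math

def normal_pdf(x, sigma):
    """Density of a zero-centered normal distribution with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def smoothed_value(f, sigma, n=20_000):
    """Midpoint-rule approximation of the integral of f(x) N(x; 0, sigma^2) dx."""
    lo, hi = -10 * sigma, 10 * sigma  # the tails beyond 10 sigma are negligible
    width = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * width
        total += f(x) * normal_pdf(x, sigma)
    return total * width

# As sigma shrinks, the integral approaches cos(0) = 1.
approximations = {sigma: smoothed_value(math.cos, sigma) for sigma in (1.0, 0.1, 0.01)}
for sigma, value in approximations.items():
    print(sigma, value)
```

For this particular test function the integral has the closed form $e^{-\sigma^2/2}$, which tends to 1 as $\sigma \to 0$.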
Mixtures of Distributions
A latent variable c is a random variable that we cannot observe directly; x is the variable we can observe.
\begin{equation}
P(x,c) = P(x|c)P(c)
\end{equation}
Gaussian mixture model
- prior probability: $P(c)$, the model's beliefs about c before observing x.
- posterior probability: $P(c|x)$, the model's beliefs about c after observing x.
A Gaussian mixture model is a universal approximator of densities, in the sense that any smooth density can be approximated with any specific, non-zero amount of error by a Gaussian mixture model with enough components.
(In theory, a Gaussian mixture model (GMM) with enough components can approximate a smooth density of any shape.)
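The roles of the prior, the latent component c, and the posterior can be sketched with a one-dimensional mixture of two Gaussians. All parameter values below are made up for illustration. Sampling first draws c from $P(c)$ and then x from $P(x|c)$; the posterior follows from Bayes' rule.

```python
import math
import random

# Hypothetical two-component mixture: prior P(c), and per-component mean and std.
weights = [0.3, 0.7]
means = [-2.0, 3.0]
stds = [0.5, 1.0]

def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def sample(rng):
    """Draw c ~ P(c), then x ~ P(x | c). In practice only x would be observed."""
    c = rng.choices(range(len(weights)), weights=weights)[0]
    x = rng.gauss(means[c], stds[c])
    return x, c

def posterior(x):
    """P(c | x) = P(x | c) P(c) / sum over c' of P(x | c') P(c')."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    evidence = sum(joint)
    return [j / evidence for j in joint]

rng = random.Random(0)
x, c = sample(rng)
print(x, c, posterior(x))
```

An observation near -2 is attributed almost entirely to the first component, and one near 3 to the second, even though the prior favours the second component.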
Structured Probabilistic Models (graphical models)
A structured probabilistic model represents the factorization of a probability distribution with a graph, in which each node corresponds to a random variable, and an edge connecting two random variables means that the probability distribution is able to represent direct interactions between those two random variables.
- Directed models: use directed edges to factorize the distribution into conditional probability distributions. If the set of parent nodes of the random variable (node) $x_i$ is denoted $Pa(x_i)$, the probability distribution over $\mathbf{x}$ factorizes as
\begin{equation}
p(\mathbf{x}) = \prod_i p(x_i|Pa(x_i))
\end{equation}
- Undirected models: use undirected edges to factorize the distribution into a set of functions. A clique $C^i$ is any set of nodes that are all connected to each other. Each $\phi^i$ is a function associated with the clique $C^i$, and Z is a normalizing constant, defined as the sum or integral over all states of the product of the $\phi$ functions.
\begin{equation}
p(\mathbf{x}) = \frac{1}{Z}\prod_i \phi^i (C^i)
\end{equation}
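Both factorizations can be illustrated on a three-node chain of binary variables. The sketch below is only an example: the conditional tables for the directed chain a → b → c and the potential `phi` for the undirected chain a - b - c are invented numbers. The directed product needs no normalization, while the undirected product must be divided by Z.

```python
from itertools import product

states = (0, 1)

# Directed model a -> b -> c: p(a, b, c) = p(a) p(b | a) p(c | b).
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # indexed as [a][b]
p_c_given_b = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.5, 1: 0.5}}  # indexed as [b][c]

def directed_joint(a, b, c):
    """Product of each node's conditional distribution given its parents."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

directed_total = sum(directed_joint(*s) for s in product(states, repeat=3))

# Undirected model a - b - c with cliques {a, b} and {b, c}.
def phi(u, v):
    """Hypothetical potential that favours neighbouring nodes taking equal values."""
    return 2.0 if u == v else 1.0

def unnormalized(a, b, c):
    """Product of the clique potentials, before normalization."""
    return phi(a, b) * phi(b, c)

# Z sums the product of potentials over every joint state.
Z = sum(unnormalized(*s) for s in product(states, repeat=3))

def undirected_joint(a, b, c):
    return unnormalized(a, b, c) / Z

undirected_total = sum(undirected_joint(*s) for s in product(states, repeat=3))
print(directed_total, Z, undirected_total)
```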