Linear Algebra in Deep Learning

This covers only the parts of linear algebra needed to understand deep learning. For details, see "The Matrix Cookbook".

Scalars, Vectors, Matrices and Tensors

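Scalars: a single number. A scalar is its own transpose: $a = a^\top$
Vectors: an array of numbers (a matrix with only one column; unless stated otherwise, all vectors are column vectors).
Matrices: 2-D arrays of numbers. $\mathbf{A}_{i,:}$ is the i-th row of matrix A, and $\mathbf{A}_{:,j}$ is the j-th column of A.
Tensors: arrays with a variable number of axes, in practice usually more than two (e.g. a 3-D array).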

transpose

The mirror image of a matrix across its main diagonal: $(\mathbf{A}^\top)_{ij}=\mathbf{A}_{ji}$
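A minimal sketch of the indexing and transpose notation, assuming a matrix is stored as a plain Python list of rows (an illustrative choice; `transpose` is a hypothetical helper, not a library function). Python indexes from 0, so `A[1]` is the second row.

```python
# A 2 x 3 matrix stored as a list of rows.
A = [[1, 2, 3],
     [4, 5, 6]]

def transpose(M):
    """Return M^T, whose entry (i, j) is M[j][i]."""
    return [list(col) for col in zip(*M)]

At = transpose(A)              # 3 x 2
row_1 = A[1]                   # A_{1,:} (0-indexed): [4, 5, 6]
col_2 = [row[2] for row in A]  # A_{:,2} (0-indexed): [3, 6]

# Check the defining identity (A^T)_{ij} = A_{ji} entry by entry.
assert all(At[i][j] == A[j][i] for i in range(3) for j in range(2))
print(At)  # [[1, 4], [2, 5], [3, 6]]
```

Transposing twice returns the original matrix, which is a quick sanity check for any transpose implementation.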

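matrix multiplication

distributive: $A(B+C) = AB+AC$
associative: $A(BC) = (AB)C$
not commutative: $AB = BA$ does not always hold. The dot product of two vectors, however, is commutative: $x^\top y = y^\top x$
transpose of a matrix product: $(AB)^\top = B^\top A^\top$
matrix form of a system of linear equations: $Ax=b$, where A is a known matrix, x is a column vector of unknowns, and b is a known column vector.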
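The rules above can be checked numerically with a small sketch, assuming matrices are lists of rows; `matmul`, `add`, and `transpose` are hypothetical helpers written for illustration. The matrices are chosen so that $AB \ne BA$.

```python
def matmul(A, B):
    """Product of an m x n matrix A and an n x p matrix B (lists of rows)."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions must match")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def add(A, B):
    """Element-wise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def transpose(M):
    return [list(col) for col in zip(*M)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [0, 3]]

distributive = matmul(A, add(B, C)) == add(matmul(A, B), matmul(A, C))
associative = matmul(A, matmul(B, C)) == matmul(matmul(A, B), C)
commutative = matmul(A, B) == matmul(B, A)  # AB = [[2, 1], [4, 3]], BA = [[3, 4], [1, 2]]
transpose_rule = transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))

# Dot product of column vectors x and y, written as 1 x 1 matrix products.
x = [[1], [2], [3]]
y = [[4], [5], [6]]
xty = matmul(transpose(x), y)  # [[32]]
ytx = matmul(transpose(y), x)  # [[32]]

# Ax = b: multiplying A by the column vector x = [1, 1]^T gives b.
b = matmul(A, [[1], [1]])      # [[3], [7]]

print(distributive, associative, commutative, transpose_rule)  # True True False True
```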

Identity and Inverse Matrices

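identity matrix: $I_n$, the $n \times n$ matrix whose n main-diagonal entries are all 1 and whose other entries are all 0.
matrix inverse: $A^{-1}$ is the matrix satisfying $A^{-1}A = I_n$ (for a square matrix, also $AA^{-1} = I_n$).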
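A minimal sketch checking the definition for a 2×2 case, assuming integer input and using the standard closed-form 2×2 inverse $A^{-1} = \frac{1}{ad-bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$. `identity`, `matmul`, and `inverse_2x2` are hypothetical helpers; `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def identity(n):
    """I_n: ones on the main diagonal, zeros elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def inverse_2x2(A):
    """Closed-form inverse of an integer 2 x 2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular, no inverse exists")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

A = [[4, 7], [2, 6]]
A_inv = inverse_2x2(A)                  # [[3/5, -7/10], [-1/5, 2/5]]
print(matmul(A_inv, A) == identity(2))  # True
print(matmul(A, A_inv) == identity(2))  # True
```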

Linear Dependence and Span

space & subspace

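space: a (vector) space is closed under vector addition and scalar multiplication, i.e. adding any two of its elements, or multiplying an element by a scalar, gives a result that still belongs to the space.
subspace: if W is a non-empty subset of a vector space V and W is itself a vector space under the addition and scalar multiplication defined on V, then W is called a (linear) subspace of V.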

span & range

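Consider a set of column vectors $\{v^1, v^2, \dots, v^n\}$.
linear combination: $\sum_i{c_iv^i}$, where the $c_i$ are scalars
span: the set of all linear combinations of the vectors
column space / range of A: the span of the columns of A
null space: the set of all vectors x with $Ax = 0$, i.e. all coefficient choices for which the corresponding combination of the columns of A sums to zero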
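A small sketch of span and null space, assuming a hypothetical `linear_combination` helper and a matrix whose second column is twice its first, so the columns are linearly dependent and the null space contains nonzero vectors.

```python
def linear_combination(coeffs, vectors):
    """Return sum_i c_i * v^i for a list of equal-length vectors."""
    return [sum(c * v[k] for c, v in zip(coeffs, vectors))
            for k in range(len(vectors[0]))]

# Columns of A = [[1, 2], [2, 4]]: the second column is twice the first.
v1 = [1, 2]
v2 = [2, 4]

# Any linear combination of the columns lies in the column space (range) of A.
b = linear_combination([3, 1], [v1, v2])   # 3*v1 + 1*v2 = [5, 10]

# Because the columns are dependent, a nonzero x makes the combination vanish,
# so x = [2, -1] (and every multiple of it) is in the null space of A.
x = [2, -1]
Ax = linear_combination(x, [v1, v2])       # [0, 0]
print(b, Ax)  # [5, 10] [0, 0]
```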

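conditions for a linear system to have a solution

Let $A \in \mathbb{R}^{m \times n}$. Whether $Ax=b$ has a solution depends on whether b lies in the column space of A.
For $Ax=b$ to have a solution for every $b \in \mathbb{R}^m$, the column space of A must be all of $\mathbb{R}^m$. If some point in $\mathbb{R}^m$ lies outside the column space, then the b corresponding to that point has no solution.

necessary condition for a solution: A has at least m columns, $n \ge m$. This is not sufficient, because some of the columns may be redundant (this is called linear dependence).
necessary and sufficient condition: the matrix must contain at least one set of m linearly independent columns.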
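A sketch of the column-space test, assuming exact rational arithmetic via `fractions.Fraction`. It uses the standard rank criterion (Rouché–Capelli): b lies in the column space of A exactly when appending b as an extra column does not increase the rank. `rank` and `has_solution` are hypothetical helpers.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) by Gaussian elimination, in exact arithmetic."""
    R = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(R), len(R[0])
    r = 0                                  # index of the next pivot row
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if pivot is None:
            continue                       # no pivot in this column
        R[r], R[pivot] = R[pivot], R[r]
        for i in range(r + 1, rows):       # eliminate entries below the pivot
            f = R[i][c] / R[r][c]
            R[i] = [a - f * p for a, p in zip(R[i], R[r])]
        r += 1
        if r == rows:
            break
    return r

def has_solution(A, b):
    """Ax = b is solvable iff rank(A) == rank([A | b])."""
    augmented = [row + [bi] for row, bi in zip(A, b)]
    return rank(A) == rank(augmented)

# m = 2, n = 2, but the columns are dependent: rank 1 < m.
A_dep = [[1, 2], [2, 4]]
print(has_solution(A_dep, [5, 10]))   # True: b is in the column space
print(has_solution(A_dep, [1, 0]))    # False: b is outside it

# m = 2, n = 3 with 2 independent columns: solvable for every b.
A_full = [[1, 0, 1], [0, 1, 1]]
print(rank(A_full), has_solution(A_full, [7, -3]))  # 2 True
```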

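conditions for a matrix to have an inverse

For $Ax=b$ to have exactly one solution for every b (so that $A^{-1}$ exists), the matrix must be square ($n=m$) and its m columns must be linearly independent.
singular matrix: a square matrix with linearly dependent columns

If A is not square, or is square but singular, the equation may still be solvable, but matrix inversion cannot be used to find the solution.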
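The two failure cases above can be seen in a Gauss-Jordan inversion sketch, assuming exact arithmetic with `fractions.Fraction`; `inverse` is a hypothetical helper, not a library API. It row-reduces $[A \mid I]$ and stops when A is not square or when a column has no pivot (linearly dependent columns).

```python
from fractions import Fraction

def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination on [A | I].

    Raises ValueError if A is not square or is singular.
    """
    n = len(A)
    if any(len(row) != n for row in A):
        raise ValueError("matrix is not square")
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        pivot = next((i for i in range(c, n) if M[i][c] != 0), None)
        if pivot is None:
            # column c is a combination of the earlier columns
            raise ValueError("matrix is singular")
        M[c], M[pivot] = M[pivot], M[c]
        p = M[c][c]
        M[c] = [x / p for x in M[c]]            # scale so the pivot becomes 1
        for i in range(n):                      # clear column c in every other row
            if i != c and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * q for a, q in zip(M[i], M[c])]
    return [row[n:] for row in M]               # right half now holds A^{-1}

print(inverse([[2, 1], [1, 1]]) == [[1, -1], [-1, 2]])  # True

for bad in ([[1, 2], [2, 4]], [[1, 2, 3], [4, 5, 6]]):
    try:
        inverse(bad)
    except ValueError as err:
        print(err)  # "matrix is singular", then "matrix is not square"
```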

Special Kinds of Matrices and Vectors

Diagonal matrices

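A matrix whose entries off the main diagonal are all zero; the diagonal entries themselves may take any value, including zero.
A diagonal matrix whose diagonal entries are all equal is called a scalar matrix; a diagonal matrix whose diagonal entries are all 1 is the identity matrix.

Properties:

sum and difference: the sum or difference of two diagonal matrices of the same order is diagonal
scalar multiplication: a scalar times a diagonal matrix is diagonal
product: the product of two diagonal matrices of the same order is diagonal, and such products commute, i.e. AB=BA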

diag(v): a square diagonal matrix whose diagonal entries are given by the entries of the vector v.

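Multiplying by a diagonal matrix is very efficient: $diag(v)x=v \odot x$, where $\odot$ is the element-wise product.

The inverse of a diagonal matrix exists only when every entry on the main diagonal is nonzero: $diag(v)^{-1} = diag([1/v_1,\dots,1/v_n]^\top)$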
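A sketch comparing the dense product with the element-wise shortcut, assuming lists of numbers; `diag`, `matvec`, and `diag_inverse` are hypothetical helpers. The shortcut needs only n multiplications instead of the $n^2$ of a dense matrix-vector product.

```python
from fractions import Fraction

def diag(v):
    """Square matrix with the entries of v on the main diagonal, zeros elsewhere."""
    n = len(v)
    return [[v[i] if i == j else 0 for j in range(n)] for i in range(n)]

def matvec(A, x):
    """Dense matrix-vector product: n^2 multiplications for an n x n A."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def diag_inverse(v):
    """diag(v)^{-1} = diag([1/v_1, ..., 1/v_n]); only defined if every v_i != 0."""
    if any(vi == 0 for vi in v):
        raise ValueError("a zero on the diagonal: the matrix is singular")
    return diag([Fraction(1, vi) for vi in v])

v = [2, 3, 5]
x = [1, 4, -1]
dense = matvec(diag(v), x)                 # builds and multiplies the full matrix
fast = [vi * xi for vi, xi in zip(v, x)]   # v ⊙ x: only n multiplications
print(dense, fast)                         # [2, 12, -5] [2, 12, -5]
print(matvec(diag_inverse(v), fast) == x)  # True: diag(v)^{-1} undoes diag(v)
```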

symmetric matrix

\begin{equation}
A = A^\top
\end{equation}
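A sketch of the symmetry test, assuming lists of rows and hypothetical helpers `transpose`, `matmul`, and `is_symmetric`. It also illustrates a common source of symmetric matrices: by the product-transpose rule, $(B^\top B)^\top = B^\top B$ for any B, even a non-square one.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def is_symmetric(A):
    """A is symmetric iff it equals its transpose (which forces A to be square)."""
    return A == transpose(A)

B = [[1, 2, 3],
     [4, 5, 6]]                 # 2 x 3, not square
gram = matmul(transpose(B), B)  # B^T B is 3 x 3 and always symmetric
print(gram)                     # [[17, 22, 27], [22, 29, 36], [27, 36, 45]]
print(is_symmetric(gram), is_symmetric([[1, 2], [3, 4]]))  # True False
```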

unit vector

A vector with unit norm: $||x||_2 = 1$
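Any nonzero vector can be turned into a unit vector by dividing by its $L^2$ norm. A minimal sketch, assuming plain float lists; `l2_norm` and `normalize` are hypothetical helpers.

```python
import math

def l2_norm(x):
    """Euclidean norm ||x||_2 = sqrt(sum_i x_i^2)."""
    return math.sqrt(sum(xi * xi for xi in x))

def normalize(x):
    """Return x / ||x||_2, a unit vector pointing in the same direction as x."""
    n = l2_norm(x)
    if n == 0:
        raise ValueError("the zero vector cannot be normalized")
    return [xi / n for xi in x]

u = normalize([3.0, 4.0])
print(u)                              # [0.6, 0.8]
print(math.isclose(l2_norm(u), 1.0))  # True (up to floating-point rounding)
```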

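orthogonal matrix

Vectors x and y are orthogonal (perpendicular) when $x^\top y = 0$. If the vectors are not only orthogonal but also have unit norm, they are called orthonormal.

orthogonal matrix: a square matrix whose rows are mutually orthonormal and whose columns are mutually orthonormal
If A is an orthogonal matrix, then:

$A^\top$ is an orthogonal matrix
$A^\top A = A A^\top = I$ (I is the identity matrix)
the rows of A are unit vectors and pairwise orthogonal
the columns of A are unit vectors and pairwise orthogonal
$A^{-1} = A^\top$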
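A 2-D rotation matrix is a standard example of an orthogonal matrix. This sketch, assuming float lists of rows and hypothetical helpers `transpose`, `matmul`, and `is_identity`, checks $Q^\top Q = QQ^\top = I$ up to a rounding tolerance (the `1e-12` value is an arbitrary choice) and uses $Q^\top$ as the inverse.

```python
import math

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def is_identity(M, tol=1e-12):
    """True if M is within tol of the identity matrix, entry by entry."""
    n = len(M)
    return all(abs(M[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))

theta = 0.3
Q = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]  # rotation by theta radians

Qt = transpose(Q)
print(is_identity(matmul(Qt, Q)), is_identity(matmul(Q, Qt)))  # True True

# Q^{-1} = Q^T: rotating a vector and then applying Q^T recovers it.
x = [[1.0], [2.0]]
back = matmul(Qt, matmul(Q, x))
print(all(math.isclose(a[0], b[0]) for a, b in zip(back, x)))  # True
```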

The Trace Operator

The trace of a square matrix A is the sum of the entries on its main diagonal (it also equals the sum of all eigenvalues of A).
\begin{equation}
Tr(A) = \sum_i A_{ii}
\end{equation}
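A minimal sketch of the definition, assuming lists of rows; `trace` and `matvec` are hypothetical helpers. The example matrix has easily verified eigenvectors, so the eigenvalue remark can be checked by hand too.

```python
def trace(A):
    """Tr(A) = sum_i A_ii; defined only for square matrices."""
    if any(len(row) != len(A) for row in A):
        raise ValueError("trace is defined for square matrices only")
    return sum(A[i][i] for i in range(len(A)))

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[2, 1],
     [1, 2]]
print(trace(A))            # 4

# [1, 1] and [1, -1] are eigenvectors of A with eigenvalues 3 and 1,
# and 3 + 1 = 4 = Tr(A).
print(matvec(A, [1, 1]))   # [3, 3]  = 3 * [1, 1]
print(matvec(A, [1, -1]))  # [1, -1] = 1 * [1, -1]
```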

Trace form of the Frobenius norm:

\begin{equation}
||A||_F = \sqrt {Tr(AA^\top)}
\end{equation}
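A sketch comparing the trace form with the element-wise definition $||A||_F = \sqrt{\sum_{i,j} A_{ij}^2}$, assuming integer lists of rows; all helper names are hypothetical. $AA^\top$ is square even when A is not, so its trace is always defined.

```python
import math

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def frobenius(A):
    """||A||_F from the definition: square root of the sum of squared entries."""
    return math.sqrt(sum(a * a for row in A for a in row))

def frobenius_via_trace(A):
    """||A||_F = sqrt(Tr(A A^T))."""
    return math.sqrt(trace(matmul(A, transpose(A))))

A = [[1, 2], [3, 4], [5, 6]]  # 3 x 2
print(frobenius(A) == frobenius_via_trace(A) == math.sqrt(91))  # True
```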

invariance of the trace

$Tr(A)=Tr(A^\top)$ invariant to the transpose operator
$Tr(ABC) = Tr(CAB) = Tr(BCA)$ invariant to cyclic permutation of the factors, as long as every product is defined
$Tr(A_{m \times n}B_{n \times m}) = Tr(B_{n \times m}A_{m \times n})$
$a = Tr(a)$ a scalar is its own trace
$Tr(mA+nB)=m Tr(A)+n Tr(B)$
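A sketch checking these identities on small integer matrices, assuming lists of rows; `scale_add` and the other helpers are hypothetical. The shapes are chosen so that $AB$ and $BA$ differ in size yet share a trace, and so that all three cyclic products are defined.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def scale_add(m, A, n, B):
    """Return mA + nB for matrices of the same shape."""
    return [[m * a + n * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[1, 2, 0], [3, -1, 4]]    # 2 x 3
B = [[2, 1], [0, 5], [-3, 2]]  # 3 x 2
C = [[1, 2], [3, 4]]           # 2 x 2

# Tr(AB) = Tr(BA) although AB is 2 x 2 and BA is 3 x 3.
print(trace(matmul(A, B)) == trace(matmul(B, A)))  # True

# The cyclic permutations ABC, CAB, BCA share one trace.
t1 = trace(matmul(matmul(A, B), C))
t2 = trace(matmul(matmul(C, A), B))
t3 = trace(matmul(matmul(B, C), A))
print(t1 == t2 == t3)                              # True

# Transpose invariance and linearity on square matrices.
D = [[5, -2], [7, 1]]
print(trace(C) == trace(transpose(C)))                               # True
print(trace(scale_add(2, C, -3, D)) == 2 * trace(C) - 3 * trace(D))  # True
```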

The Determinant

det(A): the determinant of a square matrix A

The determinant is equal to the product of all the eigenvalues of the matrix.

Properties of the determinant:
https://en.wikipedia.org/wiki/Determinant
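A sketch computing the determinant by Gaussian elimination, assuming exact arithmetic with `fractions.Fraction`; `det` and the other helpers are hypothetical. It checks a few standard properties: the product-of-eigenvalues statement on matrices with known eigenvalues, $\det(AB) = \det(A)\det(B)$, $\det(A^\top) = \det(A)$, and $\det = 0$ for a singular matrix.

```python
from fractions import Fraction

def det(A):
    """Determinant of a square matrix via Gaussian elimination (exact arithmetic).

    Row swaps flip the sign; adding a multiple of one row to another leaves the
    determinant unchanged; the result is the signed product of the pivots.
    """
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    result = Fraction(1)
    for c in range(n):
        pivot = next((i for i in range(c, n) if M[i][c] != 0), None)
        if pivot is None:
            return Fraction(0)        # dependent columns: singular
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            result = -result
        result *= M[c][c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            M[i] = [a - f * p for a, p in zip(M[i], M[c])]
    return result

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

# Triangular matrix: its eigenvalues are its diagonal entries, so det = 2 * 3 * 4.
T = [[2, 5, 1], [0, 3, 7], [0, 0, 4]]
print(det(T))                                # 24

# [[2, 1], [1, 2]] has eigenvalues 3 and 1, so det = 3.
print(det([[2, 1], [1, 2]]))                 # 3

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(det(matmul(A, B)) == det(A) * det(B))  # True
print(det(transpose(A)) == det(A))           # True
print(det([[1, 2], [2, 4]]))                 # 0
```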