[1] What Does A Kalman Filter Do?
If a system is linear, with Gaussian process and measurement noise, the Kalman Filter (KF) produces the minimum-variance (i.e., minimum mean-squared-error) estimate of the system states, given the past control inputs and sensor readings.
\[\begin{gather*} \begin{aligned} \mathbf x_k &= \mathbf A\,\mathbf x_{k-1} + \mathbf B\,\mathbf u_k + \mathbf w_k, &\quad \mathbf w_k &\sim \mathcal N(\mathbf 0,\,\mathbf Q) \\ \mathbf z_k &= \mathbf C\,\mathbf x_k + \mathbf v_k, &\quad \mathbf v_k &\sim \mathcal N(\mathbf 0,\,\mathbf R) \end{aligned} \end{gather*}\]
- where $\mathbf w_k$ is process noise and $\mathbf v_k$ is measurement noise; both are assumed zero-mean, white, and mutually independent.
The Kalman Filter seeks to generate a sequence of estimates $\hat{\mathbf x}_k$ such that their error covariance w.r.t. the ground truth is minimized.
\[\begin{gather*} \begin{aligned} & \mathbf P_k \;=\; \operatorname{E}\big[(\hat{\mathbf x}_k-\mathbf x_k)(\hat{\mathbf x}_k-\mathbf x_k)^\top\big] \;\to\; \text{min.} \end{aligned} \end{gather*}\]
- With the currently known control command $\mathbf u_k$, we can make a prediction $\hat{\mathbf x}_k^-$:
\[\begin{gather*} \begin{aligned} \hat{\mathbf x}_k^- \;=\; \mathbf A\,\hat{\mathbf x}_{k-1} + \mathbf B\,\mathbf u_k \end{aligned} \end{gather*}\]
- Of course, this prediction comes with some uncertainty. The error covariance is amplified to:
\[\begin{gather*} \begin{aligned} \mathbf P_k^- \;=\; \mathbf A\,\mathbf P_{k-1}\,\mathbf A^\top + \mathbf Q \end{aligned} \end{gather*}\]
- By design, the Kalman Filter applies a correction to the prediction (a.k.a. the prior) $\hat{\mathbf x}_k^-$, using the measurement $\mathbf z_k$ with Kalman gain $\mathbf K_k$:
\[\begin{gather*} \begin{aligned} \hat{\mathbf x}_k \;=\; \hat{\mathbf x}_k^- + \mathbf K_k\,\bigl(\mathbf z_k - \mathbf C\,\hat{\mathbf x}_k^-\bigr) \end{aligned} \end{gather*}\]
- So now, define the error term $\mathbf e_k$, and transform it into a function of $\mathbf K_k$, the measurement noise $\mathbf v_k$, and the observation matrix $\mathbf C$:
\[\begin{gather*} \begin{aligned} \mathbf e_k \;=\; \mathbf x_k - \hat{\mathbf x}_k \;=\; (\mathbf I - \mathbf K_k\,\mathbf C)\,\bigl(\mathbf x_k - \hat{\mathbf x}_k^-\bigr) - \mathbf K_k\,\mathbf v_k \end{aligned} \end{gather*}\]
- Further, because the noise $\mathbf v_k$ is independent of the state $\mathbf x_k$, we can rewrite the error covariance as a function of $\mathbf K_k$ (skipping some steps here):
\[\begin{gather*} \begin{aligned} \mathbf P_k \;=\; (\mathbf I - \mathbf K_k\,\mathbf C)\,\mathbf P_k^-\,(\mathbf I - \mathbf K_k\,\mathbf C)^\top + \mathbf K_k\,\mathbf R\,\mathbf K_k^\top \qquad (\star) \end{aligned} \end{gather*}\]
- Now, we are in a good place to solve for $\mathbf K_k$ such that $\mathbf P_k$ is at a local minimum. This requires the partial derivative of each covariance term w.r.t. $\mathbf K_k$ to be zero:
\[\begin{gather*} \begin{aligned} \frac{\partial \operatorname{tr}(\mathbf P_k)}{\partial \mathbf K_k} \;=\; -2\,\bigl(\mathbf C\,\mathbf P_k^-\bigr)^\top + 2\,\mathbf K_k\,\bigl(\mathbf C\,\mathbf P_k^-\,\mathbf C^\top + \mathbf R\bigr) \;=\; \mathbf 0 \end{aligned} \end{gather*}\]
This way, $\mathbf K_k$ minimizes each covariance term, and hence the trace. $\mathbf K_k$ is now called the optimal gain:
\[\begin{gather*} \begin{aligned} \mathbf K_k \;=\; \mathbf P_k^-\,\mathbf C^\top\,\bigl(\mathbf C\,\mathbf P_k^-\,\mathbf C^\top + \mathbf R\bigr)^{-1} \end{aligned} \end{gather*}\]
Substituting the optimal gain back into (★) yields the familiar closed-form covariance update:
\[\begin{gather*} \begin{aligned} \mathbf P_k \;=\; (\mathbf I - \mathbf K_k\,\mathbf C)\,\mathbf P_k^- \end{aligned} \end{gather*}\]
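To make the cycle concrete, here is a minimal numpy sketch of one predict/correct step (the function name `kf_step` and the tiny constant-velocity example are illustrative assumptions, not from any particular library):

```python
import numpy as np

def kf_step(x_hat, P, u, z, A, B, C, Q, R):
    """One predict/correct cycle of the linear Kalman Filter."""
    # Predict: propagate the state with the motion model...
    x_pred = A @ x_hat + B @ u
    # ...and amplify the error covariance by the process noise.
    P_pred = A @ P @ A.T + Q
    # Optimal Kalman gain.
    S = C @ P_pred @ C.T + R                  # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)
    # Correct the prediction with the measurement residual.
    x_hat = x_pred + K @ (z - C @ x_pred)
    P = (np.eye(len(x_hat)) - K @ C) @ P_pred
    return x_hat, P

# Example: 1-D constant-velocity model, position measured directly.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])           # control input = acceleration
C = np.array([[1.0, 0.0]])
x, P = np.zeros(2), np.eye(2)
x, P = kf_step(x, P, u=np.array([0.1]), z=np.array([0.02]),
               A=A, B=B, C=C, Q=1e-3 * np.eye(2), R=np.array([[0.05]]))
```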
[2] What Does An Extended Kalman Filter Do?
A system is non-linear when its next state is a non-linear function of the current state and control signals, or when the observation is a non-linear transform of the current state. (Its noise can still be Gaussian and mutually independent.)
\[\begin{gather*} \begin{aligned} \mathbf x_k &= f\bigl(\mathbf x_{k-1},\,\mathbf u_k\bigr) \, + \, \mathbf w_k, &\quad \mathbf w_k &\sim \mathcal N(\mathbf 0,\,\mathbf Q_k) \\ \mathbf z_k &= h\bigl(\mathbf x_k\bigr) \, + \, \mathbf v_k, &\quad \mathbf v_k &\sim \mathcal N(\mathbf 0,\,\mathbf R_k) \end{aligned} \end{gather*}\]
However, if the non-linearity is mild, we can linearize the models around the current mean estimate. Then we can calculate the Kalman gain and use it to apply a correction to our prediction with an observation. Here is how:
- Make a prediction using the non-linear model:
\[\begin{gather*} \begin{aligned} \hat{\mathbf x}_k^- \;=\; f\bigl(\hat{\mathbf x}_{k-1},\,\mathbf u_k\bigr) \end{aligned} \end{gather*}\]
- The model, linearized around the current estimate, is:
\[\begin{gather*} \begin{aligned} \mathbf x_k &\approx f\bigl(\hat{\mathbf x}_{k-1},\,\mathbf u_k\bigr) + \mathbf F_k\,(\mathbf x_{k-1} - \hat{\mathbf x}_{k-1}) + \mathbf w_k \\ \mathbf z_k &\approx h\bigl(\hat{\mathbf x}_k^-\bigr) + \mathbf H_k\,(\mathbf x_k - \hat{\mathbf x}_k^-) + \mathbf v_k \end{aligned} \end{gather*}\]
- where $\mathbf F_k$ and $\mathbf H_k$ are Jacobian matrices:
\[\begin{gather*} \begin{aligned} \mathbf F_k = \left.\frac{\partial f}{\partial \mathbf x}\right|_{\hat{\mathbf x}_{k-1},\,\mathbf u_k} \qquad \mathbf H_k = \left.\frac{\partial h}{\partial \mathbf x}\right|_{\hat{\mathbf x}_k^-} \end{aligned} \end{gather*}\]
- After this linearization step the model looks linear-Gaussian, so we can reuse the Kalman machinery with $\mathbf F_k$ and $\mathbf H_k$ playing the roles of $\mathbf A$ and $\mathbf C$:
\[\begin{gather*} \begin{aligned} \mathbf P_k^- &= \mathbf F_k\,\mathbf P_{k-1}\,\mathbf F_k^\top + \mathbf Q_k \\ \mathbf K_k &= \mathbf P_k^-\,\mathbf H_k^\top\,\bigl(\mathbf H_k\,\mathbf P_k^-\,\mathbf H_k^\top + \mathbf R_k\bigr)^{-1} \\ \hat{\mathbf x}_k &= \hat{\mathbf x}_k^- + \mathbf K_k\,\bigl(\mathbf z_k - h(\hat{\mathbf x}_k^-)\bigr) \\ \mathbf P_k &= (\mathbf I - \mathbf K_k\,\mathbf H_k)\,\mathbf P_k^- \end{aligned} \end{gather*}\]
This setup is called the “extended Kalman Filter” (EKF).
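A sketch of the same cycle with the non-linear models swapped in, assuming the caller supplies `f`, `h` and the Jacobian callbacks `F_jac`, `H_jac` (analytic or auto-diff):

```python
import numpy as np

def ekf_step(x_hat, P, u, z, f, h, F_jac, H_jac, Q, R):
    """One predict/correct cycle of the Extended Kalman Filter."""
    # Predict with the full non-linear motion model...
    x_pred = f(x_hat, u)
    # ...but propagate the covariance with the local linearization F_k.
    F = F_jac(x_hat, u)
    P_pred = F @ P @ F.T + Q
    # H_k plays the role of C when computing the gain.
    H = H_jac(x_pred)
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    # Correct using the residual against the non-linear observation model.
    x_hat = x_pred + K @ (z - h(x_pred))
    P = (np.eye(len(x_hat)) - K @ H) @ P_pred
    return x_hat, P
```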
[3] Example: EKF For Robot Pose Estimation With GPS (Or USBL) and IMU Signal
Below is the motion model of a robot. $\mathbf{x}$ is the state vector, $\mathbf{p}$ is the Cartesian position $(x, y, z)$, $\mathbf{v}$ is the linear velocity, and $\mathbf{q}$ is the quaternion of the robot orientation. $\mathbf{b}_a$ and $\mathbf{b}_g$ are the biases of the IMU accelerometer and gyro:
\[\begin{gather*} \begin{aligned} \mathbf{x} = \begin{bmatrix} \mathbf{p} \\ \mathbf{v} \\ \mathbf{q} \\ \mathbf{b}_a \\ \mathbf{b}_g \end{bmatrix} \end{aligned} \end{gather*}\]
IMU measurements are the linear acceleration $\tilde{\mathbf{a}}$ in $\mathrm{m/s^2}$, and the angular velocity $\tilde{\boldsymbol{\omega}}$ in $\mathrm{rad/s}$.
Process noise vector:
\[\begin{gather*} \begin{aligned} \mathbf{w} = \begin{bmatrix} \mathbf{n}_a \\ \mathbf{n}_\omega \\ \mathbf{n}_{ba} \\ \mathbf{n}_{bg} \end{bmatrix} \sim \mathcal{N}(0, Q_c) \end{aligned} \end{gather*}\]
Continuous-time dynamics:
\[\begin{gather*} \begin{aligned} \dot{\mathbf{p}} &= \mathbf{v} \\ \dot{\mathbf{v}} &= R(\mathbf{q})\left(\tilde{\mathbf{a}} - \mathbf{b}_a - \mathbf{n}_a\right) + \mathbf{g} \\ \dot{\mathbf{q}} &= \tfrac{1}{2}\Gamma(\mathbf{q})\left(\tilde{\boldsymbol{\omega}} - \mathbf{b}_g - \mathbf{n}_\omega\right) \\ \dot{\mathbf{b}}_a &= \mathbf{n}_{ba} \\ \dot{\mathbf{b}}_g &= \mathbf{n}_{bg} \end{aligned} \end{gather*}\]
where $\Gamma(\mathbf{q})$ is a $4 \times 3$ matrix:
\[\begin{gather*} \begin{aligned} [\mathbf{x}]_\times = \begin{bmatrix} 0 & -x_3 & x_2 \\ x_3 & 0 & -x_1 \\ -x_2 & x_1 & 0 \end{bmatrix} \qquad \Gamma(\mathbf{q}) = \begin{bmatrix} -\mathbf{q}_v^\top \\ q_0 I_3 + [\mathbf{q}_v]_\times \end{bmatrix} \end{aligned} \end{gather*}\]
Now we can develop the linearized control and observation models $\mathbf F_k$ and $\mathbf H_k$, either using auto-diff or by deriving analytic forms that ignore higher-order terms.
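A numpy sketch of these continuous-time dynamics, propagating only the mean (the noise terms are dropped, since they enter the filter through $Q_c$); the Hamilton convention $\mathbf{q} = [q_0, \mathbf{q}_v]$ and the helper names are assumptions for illustration:

```python
import numpy as np

def skew(v):
    """[v]x, the skew-symmetric matrix with skew(v) @ a == np.cross(v, a)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def gamma(q):
    """The 4x3 matrix mapping angular rate to quaternion rate."""
    q0, qv = q[0], q[1:]
    return np.vstack([-qv.reshape(1, 3), q0 * np.eye(3) + skew(qv)])

def rot(q):
    """Body-to-world rotation matrix R(q) of a unit quaternion."""
    q0, qv = q[0], q[1:]
    return (q0**2 - qv @ qv) * np.eye(3) + 2.0 * np.outer(qv, qv) \
        + 2.0 * q0 * skew(qv)

def x_dot(x, a_meas, w_meas, g=np.array([0.0, 0.0, -9.81])):
    """Mean dynamics of x = [p, v, q, b_a, b_g] (16 states)."""
    v, q = x[3:6], x[6:10]
    b_a, b_g = x[10:13], x[13:16]
    return np.concatenate([
        v,                                   # p_dot = v
        rot(q) @ (a_meas - b_a) + g,         # v_dot
        0.5 * gamma(q) @ (w_meas - b_g),     # q_dot
        np.zeros(3), np.zeros(3),            # bias derivatives are pure noise
    ])
```

An auto-diff tool can then produce the Jacobians of a discretized version of `x_dot` directly, in place of the hand-derived analytic forms.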
[4] Why EKF Is Not The Best And ESKF Is Widely Used Instead
Reason 1 - Rotation Update Is Not Natively On the SO(3) Manifold
Traditional Extended Kalman Filters (EKF) keep the system orientation (expressed as a quaternion) directly in the state vector. The rotation update, however, is still done by converting the correction into a small quaternion and composing it with the prior estimate:
\[\begin{gather*} \begin{aligned} \boldsymbol{\delta\theta} &= K\,\mathbf{r} \\[6pt] \delta q &\approx \begin{bmatrix} 1 \\[4pt] \tfrac{1}{2} \boldsymbol{\delta\theta} \end{bmatrix} \\[6pt] q^{+} &= \delta q \otimes q^{-} \\[6pt] q^{+} &\leftarrow \dfrac{q^{+}}{\lVert q^{+} \rVert} \quad \text{(normalization)} \end{aligned} \end{gather*}\]
Because the update leaves the unit sphere and is then patched with this extra ad-hoc normalization step, errors can accumulate over time.
ESKF keeps “correction-to-current-estimate” (error-state) terms as its filter states, alongside a nominal quaternion. The rotation correction lives in $\mathbb{R}^3$, and there is an extra step: the nominal quaternion is updated multiplicatively, outside the Kalman machinery. That eliminates the need for ad-hoc normalization and maintains consistency.
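For contrast, a minimal sketch of that injection step, assuming the Hamilton convention $q = [q_0, \mathbf{q}_v]$; it uses the exact exponential map of the $\mathbb{R}^3$ correction (the small-angle $\delta q$ above is its first-order approximation), so the result is unit-norm by construction:

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of quaternions a and b, stored as [w, x, y, z]."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def inject_rotation_correction(q_nominal, delta_theta):
    """Fold the R^3 error-state correction delta_theta = K @ r into the
    nominal quaternion, multiplicatively and outside the Kalman update."""
    angle = np.linalg.norm(delta_theta)
    if angle < 1e-12:
        return q_nominal                      # nothing to inject
    axis = delta_theta / angle
    dq = np.concatenate([[np.cos(angle / 2.0)],
                         np.sin(angle / 2.0) * axis])
    return quat_mul(dq, q_nominal)            # unit-norm by construction
```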
Reason 2 - Loss of Significance
EKF uses the raw position and orientation as its state vector. These values can reach magnitudes of $10^2$ or more, so small corrections added to them can be swallowed by floating-point round-off. The quality of the linearization of the control and observation models can also vary drastically over this large range.
ESKF, on the other hand, does not suffer from this issue, because its states (the “correction-to-current-estimate” terms) stay close to zero.
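A tiny float32 demonstration of the point (the magnitudes here are made up for illustration): a sub-millimeter correction added to a large raw state simply vanishes, while the same correction survives in a near-zero error state.

```python
import numpy as np

# Raw-state update: a 0.1 mm correction to a ~100 km position in float32.
position = np.float32(1.0e5)
correction = np.float32(1.0e-4)
print(position + correction - position)   # 0.0 -- the correction is lost

# Error-state update: the same correction applied near zero survives.
error = np.float32(0.0) + correction
print(error)                              # ~1e-4, full precision retained
```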