Saturday, October 17, 2009

proportional–integral–derivative controller (PID controller)

A proportional–integral–derivative controller (PID controller) is a generic control loop feedback mechanism (controller) widely used in industrial control systems. A PID controller attempts to correct the error between a measured process variable and a desired setpoint by calculating and then outputting a corrective action that adjusts the process accordingly and rapidly, keeping the error minimal.

General

A block diagram of a PID controller
The PID controller calculation (algorithm) involves three separate parameters: the proportional, the integral, and the derivative values. The proportional value determines the reaction to the current error, the integral value determines the reaction based on the sum of recent errors, and the derivative value determines the reaction based on the rate at which the error has been changing. The weighted sum of these three actions is used to adjust the process via a control element such as the position of a control valve or the power supply of a heating element.
By tuning the three constants in the PID controller algorithm, the controller can provide control action designed for specific process requirements. The response of the controller can be described in terms of the responsiveness of the controller to an error, the degree to which the controller overshoots the setpoint and the degree of system oscillation. Note that the use of the PID algorithm for control does not guarantee optimal control of the system or system stability.
Some applications may require using only one or two modes to provide the appropriate system control. This is achieved by setting the gain of undesired control outputs to zero. A PID controller will be called a PI, PD, P or I controller in the absence of the respective control actions. PI controllers are particularly common, since derivative action is very sensitive to measurement noise, and the absence of an integral term may prevent the system from reaching its target value.
Note: Due to the diversity of the field of control theory and application, many naming conventions for the relevant variables are in common use.

Control loop basics

A familiar example of a control loop is the action taken to keep one's shower water at the ideal temperature, which typically involves the mixing of two process streams, cold and hot water. The person feels the water to estimate its temperature. Based on this measurement they perform a control action: use the hot water tap to adjust the process. The person would repeat this input-output control loop, adjusting the hot water flow until the process temperature stabilized at the desired value.
Feeling the water temperature is taking a measurement of the process value or process variable (PV). The desired temperature is called the setpoint (SP). The output from the controller and input to the process (the tap position) is called the manipulated variable (MV). The difference between the measurement and the setpoint is the error (e), too hot or too cold and by how much.
As a controller, one decides roughly how much to change the tap position (MV) after one determines the temperature (PV), and therefore the error. This first estimate is the equivalent of the proportional action of a PID controller. The integral action of a PID controller can be thought of as gradually adjusting the temperature when it is almost right. Derivative action can be thought of as noticing the water temperature is getting hotter or colder, and how fast, anticipating further change and tempering adjustments for a soft landing at the desired temperature (SP).
Making a change that is too large when the error is small is equivalent to a high gain controller and will lead to overshoot. If the controller were to repeatedly make changes that were too large and repeatedly overshoot the target, the output would oscillate around the setpoint in either a constant, growing, or decaying sinusoid. If the oscillations increase with time then the system is unstable, whereas if they decay the system is stable. If the oscillations remain at a constant magnitude the system is marginally stable. A human would not do this because we are adaptive controllers, learning from the process history, but PID controllers do not have the ability to learn and must be set up correctly. Selecting the correct gains for effective control is known as tuning the controller.
If a controller starts from a stable state at zero error (PV = SP), then further changes by the controller will be in response to changes in other measured or unmeasured inputs to the process that impact on the process, and hence on the PV. Variables that impact on the process other than the MV are known as disturbances. Generally controllers are used to reject disturbances and/or implement setpoint changes. Changes in feed water temperature constitute a disturbance to the shower process.
In theory, a controller can be used to control any process which has a measurable output (PV), a known ideal value for that output (SP) and an input to the process (MV) that will affect the relevant PV. Controllers are used in industry to regulate temperature, pressure, flow rate, chemical composition, speed and practically every other variable for which a measurement exists. Automobile cruise control is an example of a process which utilizes automated control.
Due to their long history, simplicity, well grounded theory and simple setup and maintenance requirements, PID controllers are the controllers of choice for many of these applications.

PID controller theory

This section describes the parallel or non-interacting form of the PID controller. For other forms please see the section "Alternative nomenclature and PID forms".
The PID control scheme is named after its three correcting terms, whose sum constitutes the manipulated variable (MV). Hence:
\mathrm{MV(t)}=\,P_{\mathrm{out}} + I_{\mathrm{out}} + D_{\mathrm{out}}
where
Pout, Iout, and Dout are the contributions to the output from the PID controller from each of the three terms, as defined below.

Proportional term

Plot of PV vs time, for three values of Kp (Ki and Kd held constant)
The proportional term (sometimes called gain) makes a change to the output that is proportional to the current error value. The proportional response can be adjusted by multiplying the error by a constant Kp, called the proportional gain.
The proportional term is given by:
P_{\mathrm{out}}=K_p\,{e(t)}
where
Pout: Proportional term of output
Kp: Proportional gain, a tuning parameter
e: Error = SP − PV
t: Time or instantaneous time (the present)
A high proportional gain results in a large change in the output for a given change in the error. If the proportional gain is too high, the system can become unstable (See the section on loop tuning). In contrast, a small gain results in a small output response to a large input error, and a less responsive (or sensitive) controller. If the proportional gain is too low, the control action may be too small when responding to system disturbances.
In the absence of disturbances, pure proportional control will not settle at its target value, but will retain a steady state error that is a function of the proportional gain and the process gain. Despite the steady-state offset, both tuning theory and industrial practice indicate that it is the proportional term that should contribute the bulk of the output change.

Integral term

Plot of PV vs time, for three values of Ki (Kp and Kd held constant)
The contribution from the integral term (sometimes called reset) is proportional to both the magnitude of the error and the duration of the error. Summing the instantaneous error over time (integrating the error) gives the accumulated offset that should have been corrected previously. The accumulated error is then multiplied by the integral gain and added to the controller output. The magnitude of the contribution of the integral term to the overall control action is determined by the integral gain, Ki.
The integral term is given by:
I_{\mathrm{out}}=K_{i}\int_{0}^{t}{e(\tau)}\,{d\tau}
where
Iout: Integral term of output
Ki: Integral gain, a tuning parameter
e: Error = SP − PV
t: Time or instantaneous time (the present)
τ: a dummy integration variable
The integral term (when added to the proportional term) accelerates the movement of the process towards setpoint and eliminates the residual steady-state error that occurs with a proportional only controller. However, since the integral term is responding to accumulated errors from the past, it can cause the present value to overshoot the setpoint value (cross over the setpoint and then create a deviation in the other direction). For further notes regarding integral gain tuning and controller stability, see the section on loop tuning.

Derivative term

Plot of PV vs time, for three values of Kd (Kp and Ki held constant)
The rate of change of the process error is calculated by determining the slope of the error over time (i.e., its first derivative with respect to time) and multiplying this rate of change by the derivative gain Kd. The magnitude of the contribution of the derivative term (sometimes called rate) to the overall control action is determined by the derivative gain, Kd.
The derivative term is given by:
D_{\mathrm{out}}=K_d\frac{d}{dt}e(t)
where
Dout: Derivative term of output
Kd: Derivative gain, a tuning parameter
e: Error = SP − PV
t: Time or instantaneous time (the present)
The derivative term slows the rate of change of the controller output and this effect is most noticeable close to the controller setpoint. Hence, derivative control is used to reduce the magnitude of the overshoot produced by the integral component and improve the combined controller-process stability. However, differentiation of a signal amplifies noise and thus this term in the controller is highly sensitive to noise in the error term, and can cause a process to become unstable if the noise and the derivative gain are sufficiently large.

Summary

The proportional, integral, and derivative terms are summed to calculate the output of the PID controller. Defining u(t) as the controller output, the final form of the PID algorithm is:
\mathrm{u(t)}=\mathrm{MV(t)}=K_p{e(t)} + K_{i}\int_{0}^{t}{e(\tau)}\,{d\tau} + K_{d}\frac{d}{dt}e(t)
where the tuning parameters are:
Proportional gain, Kp
larger values typically mean faster response since the larger the error, the larger the Proportional term compensation. An excessively large proportional gain will lead to process instability and oscillation.
Integral gain, Ki
larger values imply steady state errors are eliminated more quickly. The trade-off is larger overshoot: any negative error integrated during transient response must be integrated away by positive error before we reach steady state.
Derivative gain, Kd
larger values decrease overshoot, but slow down transient response and may lead to instability due to signal noise amplification in the differentiation of the error.
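
Putting the pieces together, here is a minimal Python sketch of the ideal parallel form above, using a rectangular-rule integral and a backward-difference derivative. The class name and sampling scheme are illustrative assumptions, not a production implementation.

    class ParallelPID:
        """Minimal sketch of the ideal parallel-form PID controller.

        dt is the fixed sampling interval; gains follow the summary
        formula u(t) = Kp*e + Ki*integral(e) + Kd*de/dt.
        """
        def __init__(self, kp, ki, kd, dt):
            self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
            self.integral = 0.0
            self.prev_error = 0.0

        def update(self, setpoint, pv):
            error = setpoint - pv                        # e = SP - PV
            self.integral += error * self.dt             # rectangular integral
            derivative = (error - self.prev_error) / self.dt
            self.prev_error = error
            return (self.kp * error                      # P_out
                    + self.ki * self.integral            # I_out
                    + self.kd * derivative)              # D_out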

Loop tuning

If the PID controller parameters (the gains of the proportional, integral and derivative terms) are chosen incorrectly, the controlled process input can be unstable, i.e. its output diverges, with or without oscillation, and is limited only by saturation or mechanical breakage. Tuning a control loop is the adjustment of its control parameters (gain/proportional band, integral gain/reset, derivative gain/rate) to the optimum values for the desired control response.
The optimum behavior on a process change or setpoint change varies depending on the application. Some processes must not allow an overshoot of the process variable beyond the setpoint if, for example, this would be unsafe. Other processes must minimize the energy expended in reaching a new setpoint. Generally, stability of response (the reverse of instability) is required and the process must not oscillate for any combination of process conditions and setpoints. Some processes have a degree of non-linearity and so parameters that work well at full-load conditions don't work when the process is starting up from no-load. This section describes some traditional manual methods for loop tuning.
There are several methods for tuning a PID loop. The most effective methods generally involve the development of some form of process model, then choosing P, I, and D based on the dynamic model parameters. Manual tuning methods can be relatively inefficient.
The choice of method will depend largely on whether or not the loop can be taken "offline" for tuning, and the response time of the system. If the system can be taken offline, the best tuning method often involves subjecting the system to a step change in input, measuring the output as a function of time, and using this response to determine the control parameters.
Choosing a Tuning Method
Method | Advantages | Disadvantages
Manual tuning | No math required; online method. | Requires experienced personnel.
Ziegler–Nichols | Proven method; online method. | Process upset; some trial-and-error; very aggressive tuning.
Software tools | Consistent tuning; online or offline method; may include valve and sensor analysis; allows simulation before downloading. | Some cost and training involved.
Cohen–Coon | Good process models. | Some math; offline method; only good for first-order processes.

Manual tuning

If the system must remain online, one tuning method is to first set the Ki and Kd values to zero. Increase Kp until the output of the loop oscillates, then set Kp to approximately half of that value for a "quarter amplitude decay" type response. Then increase Ki until any offset is corrected in sufficient time for the process; however, too much Ki will cause instability. Finally, increase Kd, if required, until the loop is acceptably quick to reach its reference after a load disturbance; however, too much Kd will cause excessive response and overshoot. A fast PID loop tuning usually overshoots slightly to reach the setpoint more quickly; however, some systems cannot accept overshoot, in which case an "over-damped" closed-loop system is required, which will require a Kp setting significantly less than half of the Kp setting that caused oscillation.
Effects of increasing parameters
Parameter | Rise time | Overshoot | Settling time | Error at equilibrium
Kp | Decrease | Increase | Small change | Decrease
Ki | Decrease | Increase | Increase | Eliminate
Kd | Indefinite (small decrease or increase)[1] | Decrease | Decrease | None

Ziegler–Nichols method

Another tuning method is formally known as the Ziegler–Nichols method, introduced by John G. Ziegler and Nathaniel B. Nichols. As in the method above, the Ki and Kd gains are first set to zero. The P gain is increased until it reaches the critical gain, Kc, at which the output of the loop starts to oscillate. Kc and the oscillation period Pc are used to set the gains as shown:
Ziegler–Nichols method
Control type | Kp | Ki | Kd
P | 0.50 Kc | – | –
PI | 0.45 Kc | 1.2 Kp / Pc | –
PID | 0.60 Kc | 2 Kp / Pc | Kp Pc / 8
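
Transcribed into a small Python helper (the function name is illustrative), the table reads:

    def ziegler_nichols(kc, pc, control_type="PID"):
        """Gains from the critical gain Kc and oscillation period Pc,
        per the Ziegler-Nichols table above."""
        if control_type == "P":
            return {"Kp": 0.50 * kc}
        if control_type == "PI":
            kp = 0.45 * kc
            return {"Kp": kp, "Ki": 1.2 * kp / pc}
        kp = 0.60 * kc                                   # PID
        return {"Kp": kp, "Ki": 2 * kp / pc, "Kd": kp * pc / 8}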

PID tuning software

Most modern industrial facilities no longer tune loops using the manual calculation methods shown above. Instead, PID tuning and loop optimization software are used to ensure consistent results. These software packages will gather the data, develop process models, and suggest optimal tuning. Some software packages can even develop tuning by gathering data from reference changes.
Mathematical PID loop tuning induces an impulse in the system, and then uses the controlled system's frequency response to design the PID loop values. In loops with response times of several minutes, mathematical loop tuning is recommended, because trial and error can take days just to find a stable set of loop values. Optimal values are harder still to find. Some digital loop controllers offer a self-tuning feature in which very small setpoint changes are sent to the process, allowing the controller itself to calculate optimal tuning values.
Other formulas are available to tune the loop according to different performance criteria.

Modifications to the PID algorithm

The basic PID algorithm presents some challenges in control applications that have been addressed by minor modifications to the PID form.
One common problem resulting from the ideal PID implementations is integral windup. This problem can be addressed by the measures below (a clamping sketch follows the list):
  • Initializing the controller integral to a desired value
  • Increasing the setpoint in a suitable ramp
  • Disabling the integral function until the PV has entered the controllable region
  • Limiting the time period over which the integral error is calculated
  • Preventing the integral term from accumulating above or below pre-determined bounds
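
As a sketch of the last measure, the step function below clamps the accumulated integral to pre-determined bounds; the bound values, function shape, and state layout are illustrative assumptions.

    def pid_step_clamped(state, sp, pv, kp, ki, kd, dt,
                         i_min=-100.0, i_max=100.0):
        """One PID step with anti-windup clamping of the integral.
        Initialize state = {"integral": 0.0, "prev_error": 0.0}."""
        error = sp - pv
        unclamped = state["integral"] + error * dt
        state["integral"] = min(max(unclamped, i_min), i_max)  # clamp
        derivative = (error - state["prev_error"]) / dt
        state["prev_error"] = error
        return kp * error + ki * state["integral"] + kd * derivative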
Freezing the integral function in case of disturbances
Suppose a PID loop is used to control the temperature of an electric resistance furnace and the system has stabilized. If the door is opened and something cold is put into the furnace, the temperature drops below the setpoint. The integral function of the controller tends to compensate for this error by introducing another error in the positive direction. This overshoot can be avoided by freezing the integral function after the opening of the door for the time the control loop typically needs to reheat the furnace.
Replacing the integral function by a model based part
Often the time response of the system is approximately known. Then it is advantageous to simulate this time response with a model and to calculate some unknown parameter from the actual response of the system. If, for instance, the system is an electrical furnace, the response of the difference between furnace temperature and ambient temperature to changes of the electrical power will be similar to that of a simple RC low-pass filter multiplied by an unknown proportional coefficient. The actual electrical power supplied to the furnace is delayed by a low-pass filter to simulate the response of the temperature of the furnace, and then the actual temperature minus the ambient temperature is divided by this low-pass-filtered electrical power. The result is then stabilized by another low-pass filter, leading to an estimate of the proportional coefficient. With this estimate, it is possible to calculate the required electrical power by dividing the setpoint of the temperature minus the ambient temperature by this coefficient. The result can then be used instead of the integral function. This also achieves a control error of zero in the steady state, but avoids integral windup and can give a significantly improved control action compared to an optimized PID controller. Unlike an integral function, this type of controller also works properly in open-loop situations, which would otherwise cause integral windup. This is an advantage if, for example, the heating of a furnace has to be reduced for some time because of the failure of a heating element, or if the controller is used as an advisory system to a human operator who may or may not switch it to closed-loop operation, or if the controller is used inside a branch of a complex control system where this branch may be temporarily inactive.
Many PID loops control a mechanical device (for example, a valve). Mechanical maintenance can be a major cost and wear leads to control degradation in the form of either stiction or a deadband in the mechanical response to an input signal. The rate of mechanical wear is mainly a function of how often a device is activated to make a change. Where wear is a significant concern, the PID loop may have an output deadband to reduce the frequency of activation of the output (valve). This is accomplished by modifying the controller to hold its output steady if the change would be small (within the defined deadband range). The calculated output must leave the deadband before the actual output will change.
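
An output deadband of this kind amounts to a small wrapper around the controller output; a sketch under assumed names:

    def apply_deadband(last_output, new_output, deadband):
        """Hold the output steady while the computed change is inside
        the deadband, so small moves never reach the valve."""
        if abs(new_output - last_output) < deadband:
            return last_output                  # suppress the small move
        return new_output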
The proportional and derivative terms can produce excessive movement in the output when a system is subjected to an instantaneous step increase in the error, such as a large setpoint change. In the case of the derivative term, this is due to taking the derivative of the error, which is very large in the case of an instantaneous step change. As a result, some PID algorithms incorporate the following modifications:
Derivative of output
In this case the PID controller measures the derivative of the output quantity, rather than the derivative of the error. The output is always continuous (i.e., never has a step change). For this to be effective, the derivative of the output must have the same sign as the derivative of the error (see the sketch after this list).
Setpoint ramping
In this modification, the setpoint is gradually moved from its old value to a newly specified value using a linear or first order differential ramp function. This avoids the discontinuity present in a simple step change.
Setpoint weighting
Setpoint weighting uses different multipliers for the error depending on which element of the controller it is used in. The error in the integral term must be the true control error to avoid steady-state control errors. This affects the controller's setpoint response. These parameters do not affect the response to load disturbances and measurement noise.
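
A sketch of the derivative-of-output modification referenced above: the derivative term acts on the measured PV (with a sign flip, since e = SP − PV) rather than on the error, so a setpoint step no longer kicks the derivative. Names and state layout are illustrative.

    def pid_step_deriv_on_pv(state, sp, pv, kp, ki, kd, dt):
        """PID step whose derivative uses -d(PV)/dt instead of de/dt.
        Initialize state = {"integral": 0.0, "prev_pv": pv}."""
        error = sp - pv
        state["integral"] += error * dt
        derivative = -(pv - state["prev_pv"]) / dt   # derivative of output
        state["prev_pv"] = pv
        return kp * error + ki * state["integral"] + kd * derivative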

Limitations of PID control

While PID controllers are applicable to many control problems, they can perform poorly in some applications.
PID controllers, when used alone, can give poor performance when the PID loop gains must be reduced so that the control system does not overshoot, oscillate or hunt about the control setpoint value. The control system performance can be improved by combining the feedback (or closed-loop) control of a PID controller with feed-forward (or open-loop) control. Knowledge about the system (such as the desired acceleration and inertia) can be fed forward and combined with the PID output to improve the overall system performance. The feed-forward value alone can often provide the major portion of the controller output. The PID controller can then be used primarily to respond to whatever difference or error remains between the setpoint (SP) and the actual value of the process variable (PV). Since the feed-forward output is not affected by the process feedback, it can never cause the control system to oscillate, thus improving the system response and stability.
For example, in most motion control systems, in order to accelerate a mechanical load under control, more force or torque is required from the prime mover, motor, or actuator. If a velocity loop PID controller is being used to control the speed of the load and command the force or torque being applied by the prime mover, then it is beneficial to take the instantaneous acceleration desired for the load, scale that value appropriately and add it to the output of the PID velocity loop controller. This means that whenever the load is being accelerated or decelerated, a proportional amount of force is commanded from the prime mover regardless of the feedback value. The PID loop in this situation uses the feedback information to effect any increase or decrease of the combined output in order to reduce the remaining difference between the process setpoint and the feedback value. Working together, the combined open-loop feed-forward controller and closed-loop PID controller can provide a more responsive, stable and reliable control system.
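In code, the combination is simply a sum of the two paths; the sketch below reuses the ParallelPID class from earlier, with `inertia` standing in for the application-specific scaling (an assumption):

    def motion_command(pid, velocity_sp, velocity_pv, desired_accel, inertia):
        """Feed-forward torque (inertia * desired acceleration) plus
        the feedback correction from a velocity-loop PID."""
        feedforward = inertia * desired_accel            # open-loop part
        feedback = pid.update(velocity_sp, velocity_pv)  # closed-loop part
        return feedforward + feedback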
Another problem faced with PID controllers is that they are linear. Thus, performance of PID controllers in non-linear systems (such as HVAC systems) is variable. Often PID controllers are enhanced through methods such as PID gain scheduling or fuzzy logic. Further practical application issues can arise from instrumentation connected to the controller. A high enough sampling rate, measurement precision, and measurement accuracy are required to achieve adequate control performance.
A problem with the Derivative term is that small amounts of measurement or process noise can cause large amounts of change in the output. It is often helpful to filter the measurements with a low-pass filter in order to remove higher-frequency noise components. However, low-pass filtering and derivative control can cancel each other out, so reducing noise by instrumentation means is a much better choice. Alternatively, the differential band can be turned off in many systems with little loss of control. This is equivalent to using the PID controller as a PI controller.

Cascade control

One distinctive advantage of PID controllers is that two PID controllers can be used together to yield better dynamic performance. This is called cascaded PID control. In cascade control there are two PIDs arranged with one PID controlling the setpoint of another. A PID controller acts as the outer loop controller, which controls the primary physical parameter, such as fluid level or velocity. The other controller acts as the inner loop controller, which reads the output of the outer loop controller as its setpoint, usually controlling a more rapidly changing parameter such as flow rate or acceleration. It can be shown that this arrangement increases the working frequency of the controller and reduces the time constant of the controlled object.
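Using the ParallelPID sketch from earlier, a cascade is two controllers chained so that the outer loop's output becomes the inner loop's setpoint; the gains below are purely illustrative.

    outer = ParallelPID(kp=1.0, ki=0.1, kd=0.0, dt=0.1)  # level (outer) loop
    inner = ParallelPID(kp=4.0, ki=2.0, kd=0.0, dt=0.1)  # flow (inner) loop

    def cascade_step(level_sp, level_pv, flow_pv):
        flow_sp = outer.update(level_sp, level_pv)   # outer output = inner SP
        return inner.update(flow_sp, flow_pv)        # command to the valve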

Physical implementation of PID control

In the early history of automatic process control the PID controller was implemented as a mechanical device. These mechanical controllers used a lever, spring and a mass and were often energized by compressed air. These pneumatic controllers were once the industry standard.
Electronic analog controllers can be made from a solid-state or tube amplifier, a capacitor and a resistance. Electronic analog PID control loops were often found within more complex electronic systems, for example, the head positioning of a disk drive, the power conditioning of a power supply, or even the movement-detection circuit of a modern seismometer. Nowadays, electronic controllers have largely been replaced by digital controllers implemented with microcontrollers or FPGAs.
Most modern PID controllers in industry are implemented in programmable logic controllers (PLCs) or as a panel-mounted digital controller. Software implementations have the advantages that they are relatively cheap and are flexible with respect to the implementation of the PID algorithm.

Alternative nomenclature and PID forms

Ideal versus standard PID form

The form of the PID controller most often encountered in industry, and the one most relevant to tuning algorithms, is the standard form. In this form the Kp gain is applied to the Iout and Dout terms as well, yielding:
\mathrm{MV(t)}=K_p\left(\,{e(t)} + \frac{1}{T_i}\int_{0}^{t}{e(\tau)}\,{d\tau} + T_d\frac{d}{dt}e(t)\right)
where
Ti is the integral time
Td is the derivative time
In the ideal parallel form, shown in the controller theory section
\mathrm{MV(t)}=K_p{e(t)} + K_i\int_{0}^{t}{e(\tau)}\,{d\tau} + K_d\frac{d}{dt}e(t)
the gain parameters are related to the parameters of the standard form through K_i = \frac{K_p}{T_i} and  K_d = K_p T_d \,. This parallel form, where the parameters are treated as simple gains, is the most general and flexible form. However, it is also the form where the parameters have the least physical interpretation and is generally reserved for theoretical treatment of the PID controller. The standard form, despite being slightly more complex mathematically, is more common in industry.
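The conversion between the two forms is a pair of one-liners; a hypothetical helper:

    def standard_to_parallel(kp, ti, td):
        """Standard form (Kp, Ti, Td) -> parallel gains, using
        Ki = Kp / Ti and Kd = Kp * Td from above."""
        return {"Kp": kp, "Ki": kp / ti, "Kd": kp * td}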

Laplace form of the PID controller

Sometimes it is useful to write the PID regulator in Laplace transform form:
G(s)=K_p + \frac{K_i}{s} + K_d{s}=\frac{K_d{s^2} + K_p{s} + K_i}{s}
Having the PID controller written in Laplace form and having the transfer function of the controlled system makes it easy to determine the closed-loop transfer function of the system.
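With numeric gains, G(s) can be evaluated directly on the imaginary axis to obtain frequency-response data; a short NumPy sketch with illustrative gains:

    import numpy as np

    Kp, Ki, Kd = 2.0, 1.0, 0.5                  # illustrative gains
    w = np.logspace(-2, 2, 400)                 # frequencies, rad/s
    s = 1j * w
    G = Kp + Ki / s + Kd * s                    # G(s) on s = j*omega
    magnitude_db = 20 * np.log10(np.abs(G))     # Bode magnitude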

Series/interacting form

Another representation of the PID controller is the series, or interacting form
G(s) = K_c \frac{(\tau_i{s}+1)}{\tau_i{s}} (\tau_d{s}+1)
where the parameters are related to the parameters of the standard form through
K_p = K_c \cdot \alpha, T_i = \tau_i \cdot \alpha, and
T_d = \frac{\tau_d}{\alpha}
with
\alpha = 1 + \frac{\tau_d}{\tau_i}.
This form essentially consists of a PD and PI controller in series, and it made early (analog) controllers easier to build. When the controllers later became digital, many kept using the interacting form.
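The α relations above transcribe directly; a hypothetical helper converting series parameters to the standard form:

    def series_to_standard(kc, tau_i, tau_d):
        """Series (interacting) parameters -> standard form via
        alpha = 1 + tau_d / tau_i."""
        alpha = 1 + tau_d / tau_i
        return {"Kp": kc * alpha, "Ti": tau_i * alpha, "Td": tau_d / alpha}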

Discrete implementation

The analysis for designing a digital implementation of a PID controller in a microcontroller (MCU) or FPGA device requires the standard form of the PID controller to be discretised [2]. Approximations for first-order derivatives are made by backward finite differences. The integral term is discretised, with a sampling time Δt, as follows:
\int_{0}^{t_k}{e(\tau)}\,{d\tau} = \sum_{i=1}^k e(t_i)\Delta t
The derivative term is approximated as,
\dfrac{de(t_k)}{dt}=\dfrac{e(t_k)-e(t_{k-1})}{\Delta t}
Thus, a velocity algorithm for implementation of the discretised PID controller in an MCU is obtained:
u(t_k)=u(t_{k-1})+K_p\left[\left(1+\dfrac{\Delta t}{T_i}+\dfrac{T_d}{\Delta t}\right)e(t_k)+\left(-1-\dfrac{2T_d}{\Delta t}\right)e(t_{k-1})+\dfrac{T_d}{\Delta t}e(t_{k-2})\right]
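
Transcribed into Python (the class name is an illustrative assumption), the velocity algorithm keeps only the previous output and the last two errors:

    class VelocityPID:
        """Sketch of the velocity (incremental) form above: standard
        parameters Kp, Ti, Td and fixed sampling time dt."""
        def __init__(self, kp, ti, td, dt, u0=0.0):
            self.kp, self.ti, self.td, self.dt = kp, ti, td, dt
            self.u = u0                       # u(t_{k-1})
            self.e1 = 0.0                     # e(t_{k-1})
            self.e2 = 0.0                     # e(t_{k-2})

        def update(self, sp, pv):
            e = sp - pv
            du = self.kp * ((1 + self.dt / self.ti + self.td / self.dt) * e
                            + (-1 - 2 * self.td / self.dt) * self.e1
                            + (self.td / self.dt) * self.e2)
            self.e2, self.e1 = self.e1, e     # shift the error history
            self.u += du                      # u(t_k) = u(t_{k-1}) + du
            return self.u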

SENSORS AND PRIMARY TRANSDUCERS

Bessel filter

In electronics and signal processing, a Bessel filter is a type of linear filter with a maximally flat group delay (maximally linear phase response). Bessel filters are often used in audio crossover systems. Analog Bessel filters are characterized by almost constant group delay across the entire passband, thus preserving the wave shape of filtered signals in the passband.
The filter's name is a reference to Friedrich Bessel, a German mathematician (1784–1846), who developed the mathematical theory on which the filter is based.

The transfer function

A plot of the gain and group delay for a fourth-order low pass Bessel filter. Note that the transition from the pass band to the stop band is much slower than for other filters, but the group delay is practically constant in the passband. The Bessel filter maximizes the flatness of the group delay curve at zero frequency.
A Bessel low-pass filter is characterized by its transfer function:[1]
H(s) = \frac{\theta_n(0)}{\theta_n(s/\omega_0)}\,
where θn(s) is a reverse Bessel polynomial, from which the filter gets its name, and ω0 is a frequency chosen to give the desired cut-off frequency. The filter has a low-frequency group delay of 1 / ω0.

Bessel polynomials

The roots of the third-order Bessel polynomial are the poles of the filter's transfer function in the s plane, here plotted as crosses.
The transfer function of the Bessel filter is a rational function whose denominator is a reverse Bessel polynomial, such as the following:
n=1; \quad s+1
n=2; \quad s^2+3s+3
n=3; \quad s^3+6s^2+15s+15
The reverse Bessel polynomials are given by:[1]
P(s)=\sum_{k=0}^N a_ks^k
where
a_k=\frac{(2N-k)!}{2^{N-k}k!(N-k)!} \quad k=0,1,...,N
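
The closed form is straightforward to evaluate; for N = 3 it reproduces the polynomial s^3 + 6s^2 + 15s + 15 listed above.

    from math import factorial

    def reverse_bessel_coeffs(N):
        """Coefficients a_0..a_N of the reverse Bessel polynomial,
        from the closed form above."""
        return [factorial(2 * N - k)
                // (2 ** (N - k) * factorial(k) * factorial(N - k))
                for k in range(N + 1)]

    # reverse_bessel_coeffs(3) -> [15, 15, 6, 1],
    # i.e. P(s) = s^3 + 6 s^2 + 15 s + 15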

Example

Gain plot of the third-order Bessel filter, versus normalized frequency
Group delay plot of the third-order Bessel filter, illustrating flat unit delay in the passband
The transfer function for a third-order (three-pole) Bessel low-pass filter, normalized to have unit group delay, is
H(s)=\frac{15}{s^3+6s^2+15s+15}\,
The roots of the denominator polynomial, the filter's poles, include a real pole at s = -2.3222, and a complex-conjugate pair of poles at s = -1.8389 \pm 1.7544i, plotted above. The numerator 15 is chosen to give unity gain at DC (at s = 0).
The gain is then
G(\omega) = |H(j\omega)| = \frac{15}{\sqrt{\omega^6+6\omega^4+45\omega^2+225}}
The phase is
\phi(\omega)=-\mathrm{arg}(H(j\omega))= -\mathrm{arctan}\left(\frac{15\omega-\omega^3}{15-6\omega^2}\right)
The group delay is
D(\omega)=-\frac{d\phi}{d\omega} = \frac{6 \omega^4+ 45 \omega^2+225}{\omega^6+6\omega^4+45\omega^2+225}
The Taylor series expansion of the group delay is
D(\omega) = 1-\frac{\omega^6}{225}+\frac{\omega^8}{1125}+\cdots
Note that the two terms in ω^2 and ω^4 are zero, resulting in a very flat group delay at ω = 0. This is the greatest number of terms that can be set to zero, since there are a total of four coefficients in the third-order Bessel polynomial, requiring four equations in order to be defined. One equation specifies that the gain be unity at ω = 0 and a second specifies that the gain be zero at ω = ∞, leaving two equations to specify two terms in the series expansion to be zero. This is a general property of the group delay for a Bessel filter of order n: the first n − 1 terms in the series expansion of the group delay will be zero, thus maximizing the flatness of the group delay at ω = 0.
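
A quick numerical cross-check of the closed-form gain and the group-delay series above, at an arbitrary in-band frequency (an illustrative sanity test, not part of the original derivation):

    import numpy as np

    w = 0.3                                              # rad/s, in-band
    H = 15 / ((1j * w)**3 + 6 * (1j * w)**2 + 15 * (1j * w) + 15)
    gain = 15 / np.sqrt(w**6 + 6 * w**4 + 45 * w**2 + 225)
    delay = (6 * w**4 + 45 * w**2 + 225) / (w**6 + 6 * w**4 + 45 * w**2 + 225)
    series = 1 - w**6 / 225 + w**8 / 1125                # truncated expansion
    assert np.isclose(abs(H), gain)                      # gain formula agrees
    assert np.isclose(delay, series, atol=1e-6)          # delay ~ its series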

Kalman filter

The Kalman filter is an efficient recursive filter that estimates the state of a linear dynamic system from a series of noisy measurements. It is used in a wide range of engineering applications from radar to computer vision, and is an important topic in control theory and control systems engineering. Together with the linear-quadratic regulator (LQR), the Kalman filter solves the linear-quadratic-Gaussian control problem (LQG). The Kalman filter, the linear-quadratic regulator and the linear-quadratic-Gaussian controller are solutions to what probably are the most fundamental problems in control theory.

Example applications

An example application would be providing accurate, continuously updated information about the position and velocity of an object given only a sequence of observations about its position, each of which includes some error. For example, in a radar application where one is interested in tracking a target, information about the location, speed, and acceleration of the target is measured at each time instant with a great deal of corruption by noise. The Kalman filter exploits the trusted model of the dynamics of the target, which describes the kind of movement possible by the target, to remove the effects of the noise and get a good estimate of the location of the target at the present time (filtering), at a future time (prediction), or at a time in the past (interpolation or smoothing).
Alternatively, consider an old slow car that is known to go from 0 to 60 miles per hour (mph) in no less than 10 seconds. The speedometer on this car however shows very noisy measurements that vary wildly within a 40 mph window around the actual speed of the car. From stop – which is measured with certainty because the wheels are not turning – the driver of the car pushes its gas pedal as far as possible. Five seconds later, the speedometer reads 70 mph. The driver concludes that the slow car cannot be traveling that quickly and uses information about the known speedometer noise to conclude that the car is likely traveling at 30 mph instead. Similarly, a Kalman filter uses information about noise and system dynamics to reduce uncertainty from noisy measurements.

Underlying dynamic system model

Kalman filters are based on linear dynamical systems discretized in the time domain. They are modelled on a Markov chain built on linear operators perturbed by Gaussian noise. The state of the system is represented as a vector of real numbers. At each discrete time increment, a linear operator is applied to the state to generate the new state, with some noise mixed in, and optionally some information from the controls on the system if they are known. Then, another linear operator mixed with more noise generates the visible outputs from the hidden state. The Kalman filter may be regarded as analogous to the hidden Markov model, with the key difference that the hidden state variables take values in a continuous space (as opposed to a discrete state space as in the hidden Markov model). Additionally, the hidden Markov model can represent an arbitrary distribution for the next value of the state variables, in contrast to the Gaussian noise model that is used for the Kalman filter. There is a strong duality between the equations of the Kalman Filter and those of the hidden Markov model. A review of this and other models is given in Roweis and Ghahramani (1999).[1]
In order to use the Kalman filter to estimate the internal state of a process given only a sequence of noisy observations, one must model the process in accordance with the framework of the Kalman filter. This means specifying the following matrices: Fk, the state-transition model; Hk, the observation model; Qk, the covariance of the process noise; Rk, the covariance of the observation noise; and sometimes Bk, the control-input model, for each time-step k, as described below.
Model underlying the Kalman filter. Squares represent matrices. Ellipses represent multivariate normal distributions (with the mean and covariance matrix enclosed). Unenclosed values are vectors.
The Kalman filter model assumes the true state at time k is evolved from the state at (k − 1) according to
 \textbf{x}_{k} = \textbf{F}_{k} \textbf{x}_{k-1} + \textbf{B}_{k} \textbf{u}_{k} + \textbf{w}_{k}
where
  • Fk is the state transition model which is applied to the previous state xk−1;
  • Bk is the control-input model which is applied to the control vector uk;
  • wk is the process noise which is assumed to be drawn from a zero mean multivariate normal distribution with covariance Qk.
\textbf{w}_{k} \sim N(0, \textbf{Q}_k)
At time k an observation (or measurement) zk of the true state xk is made according to
\textbf{z}_{k} = \textbf{H}_{k} \textbf{x}_{k} + \textbf{v}_{k}
where Hk is the observation model which maps the true state space into the observed space and vk is the observation noise which is assumed to be zero mean Gaussian white noise with covariance Rk.
\textbf{v}_{k} \sim N(0, \textbf{R}_k)
The initial state, and the noise vectors at each step {x0, w1, ..., wk, v1 ... vk} are all assumed to be mutually independent.
Many real dynamical systems do not exactly fit this model. In fact, unmodelled dynamics can seriously degrade the filter performance, even when it was supposed to work with unknown stochastic signals as inputs. The reason for this is that the effect of unmodelled dynamics depends on the input, and therefore can drive the estimation algorithm unstable (make it diverge). Independent white noise signals, on the other hand, will not make the algorithm diverge. The problem of distinguishing between measurement noise and unmodelled dynamics is difficult and is treated in control theory under the framework of robust control.

The Kalman filter

The Kalman filter is a recursive estimator. This means that only the estimated state from the previous time step and the current measurement are needed to compute the estimate for the current state. In contrast to batch estimation techniques, no history of observations and/or estimates is required. In what follows, the notation \hat{\textbf{x}}_{n|m} represents the estimate of \textbf{x} at time n given observations up to, and including time m.
The state of the filter is represented by two variables:
  • \hat{\textbf{x}}_{k|k}, the a posteriori state estimate at time k given observations up to and including time k;
  • \textbf{P}_{k|k}, the a posteriori error covariance matrix (a measure of the estimated accuracy of the state estimate).
The Kalman filter has two distinct phases: Predict and Update. The predict phase uses the state estimate from the previous timestep to produce an estimate of the state at the current timestep. This predicted state estimate is also known as the a priori state estimate because, although it is an estimate of the state at the current timestep, it does not include observation information from the current timestep. In the update phase, the current a priori prediction is combined with current observation information to refine the state estimate. This improved estimate is termed the a posteriori state estimate.

Predict

Predicted (a priori) state estimate \hat{\textbf{x}}_{k|k-1} = \textbf{F}_{k}\hat{\textbf{x}}_{k-1|k-1} + \textbf{B}_{k} \textbf{u}_{k}
Predicted (a priori) estimate covariance \textbf{P}_{k|k-1} = \textbf{F}_{k} \textbf{P}_{k-1|k-1} \textbf{F}_{k}^{\text{T}} + \textbf{Q}_{k}

Update

Innovation or measurement residual \tilde{\textbf{y}}_k = \textbf{z}_k - \textbf{H}_k\hat{\textbf{x}}_{k|k-1}
Innovation (or residual) covariance \textbf{S}_k = \textbf{H}_k \textbf{P}_{k|k-1} \textbf{H}_k^\text{T} + \textbf{R}_k
Optimal Kalman gain \textbf{K}_k = \textbf{P}_{k|k-1}\textbf{H}_k^\text{T}\textbf{S}_k^{-1}
Updated (a posteriori) state estimate \hat{\textbf{x}}_{k|k} = \hat{\textbf{x}}_{k|k-1} + \textbf{K}_k\tilde{\textbf{y}}_k
Updated (a posteriori) estimate covariance \textbf{P}_{k|k} = (I - \textbf{K}_k \textbf{H}_k) \textbf{P}_{k|k-1}
The formula for the updated estimate covariance above is only valid for the optimal Kalman gain. Usage of other gain values requires a more complex formula, found in the derivations section.
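
The predict and update equations transcribe almost line for line into NumPy; the sketch below is a direct transcription with no attention to numerical conditioning (for that, see the Joseph form in the derivations section).

    import numpy as np

    def kalman_step(x, P, z, F, B, u, Q, H, R):
        """One predict/update cycle of the equations above."""
        # Predict
        x_pred = F @ x + B @ u                       # a priori state
        P_pred = F @ P @ F.T + Q                     # a priori covariance
        # Update
        y = z - H @ x_pred                           # innovation
        S = H @ P_pred @ H.T + R                     # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)          # optimal Kalman gain
        x_new = x_pred + K @ y                       # a posteriori state
        P_new = (np.eye(len(x)) - K @ H) @ P_pred    # optimal-gain form only
        return x_new, P_new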

Invariants

If the model is accurate, and the values for \hat{\textbf{x}}_{0|0} and \textbf{P}_{0|0} accurately reflect the distribution of the initial state values, then the following invariants are preserved: all estimates have mean error zero
  • \textrm{E}[\textbf{x}_k - \hat{\textbf{x}}_{k|k}] = \textrm{E}[\textbf{x}_k - \hat{\textbf{x}}_{k|k-1}] = 0
  • \textrm{E}[\tilde{\textbf{y}}_k] = 0
where E[ξ] is the expected value of ξ, and covariance matrices accurately reflect the covariance of estimates
  • \textbf{P}_{k|k} = \textrm{cov}(\textbf{x}_k - \hat{\textbf{x}}_{k|k})
  • \textbf{P}_{k|k-1} = \textrm{cov}(\textbf{x}_k - \hat{\textbf{x}}_{k|k-1})
  • \textbf{S}_{k} = \textrm{cov}(\tilde{\textbf{y}}_k)

Examples

Consider a truck on perfectly frictionless, infinitely long straight rails. Initially the truck is stationary at position 0, but it is buffeted this way and that by random acceleration. We measure the position of the truck every Δt seconds, but these measurements are imprecise; we want to maintain a model of where the truck is and what its velocity is. We show here how we derive the model from which we create our Kalman filter.
There are no controls on the truck, so we ignore Bk and uk. Since F, H, R and Q are constant, their time indices are dropped.
The position and velocity of the truck are described by the linear state space
\textbf{x}_{k} = \begin{bmatrix} x \\ \dot{x} \end{bmatrix}
where \dot{x} is the velocity, that is, the derivative of position with respect to time.
We assume that between the (k − 1)th and kth timestep the truck undergoes a constant acceleration of ak that is normally distributed, with mean 0 and standard deviation σa. From Newton's laws of motion we conclude that
\textbf{x}_{k} = \textbf{F} \textbf{x}_{k-1} + \textbf{B}a_{k}
where
\textbf{F} = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix}
and
\textbf{B} = \begin{bmatrix} \frac{\Delta t^{2}}{2} \\ \Delta t \end{bmatrix}
We find that
 \textbf{Q} = \textrm{cov}(\textbf{B}a) = \textrm{E}[(\textbf{B}a)(\textbf{B}a)^{\text{T}}] = \textbf{B} \textrm{E}[a^2] \textbf{B}^{\text{T}} = \textbf{B}[\sigma_a^2]\textbf{B}^{\text{T}} = \sigma_a^2 \textbf{B}\textbf{B}^{\text{T}} (since σa is a scalar).
At each time step, a noisy measurement of the true position of the truck is made. Let us suppose the measurement noise vk is also normally distributed, with mean 0 and standard deviation σz.
\textbf{z}_{k} = \textbf{H x}_{k} + \textbf{v}_{k}
where
\textbf{H} = \begin{bmatrix} 1 & 0 \end{bmatrix}
and
\textbf{R} = \textrm{E}[\textbf{v}_k \textbf{v}_k^{\text{T}}] = \begin{bmatrix} \sigma_z^2 \end{bmatrix}
We know the initial starting state of the truck with perfect precision, so we initialize
\hat{\textbf{x}}_{0|0} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
and to tell the filter that we know the exact position, we give it a zero covariance matrix:
\textbf{P}_{0|0} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
If the initial position and velocity are not known perfectly the covariance matrix should be initialized with a suitably large number, say L, on its diagonal.
\textbf{P}_{0|0} = \begin{bmatrix} L & 0 \\ 0 & L \end{bmatrix}
The filter will then prefer the information from the first measurements over the information already in the model.
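
Putting the truck model together with the kalman_step sketch from above; the noise magnitudes, horizon, and seed are illustrative assumptions.

    import numpy as np

    dt, sigma_a, sigma_z = 1.0, 0.5, 3.0          # illustrative values
    F = np.array([[1.0, dt], [0.0, 1.0]])
    B = np.array([[0.5 * dt**2], [dt]])
    H = np.array([[1.0, 0.0]])
    Q = sigma_a**2 * (B @ B.T)
    R = np.array([[sigma_z**2]])

    rng = np.random.default_rng(0)
    x_true = np.zeros(2)
    x_est, P = np.zeros(2), 1e3 * np.eye(2)       # "large L" initialization
    for _ in range(50):
        a = rng.normal(0.0, sigma_a)              # random acceleration
        x_true = F @ x_true + (B * a).ravel()
        z = H @ x_true + rng.normal(0.0, sigma_z, size=1)
        x_est, P = kalman_step(x_est, P, z, F,
                               np.zeros((2, 1)), np.zeros(1),  # no controls
                               Q, H, R)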

Derivations

Deriving the posterior estimate covariance matrix

Starting with our invariant on the error covariance Pk|k as above
\textbf{P}_{k|k}  = \textrm{cov}(\textbf{x}_{k} - \hat{\textbf{x}}_{k|k})
substitute in the definition of \hat{\textbf{x}}_{k|k}
\textbf{P}_{k|k} = \textrm{cov}(\textbf{x}_{k} - (\hat{\textbf{x}}_{k|k-1} + \textbf{K}_k\tilde{\textbf{y}}_{k}))
and substitute \tilde{\textbf{y}}_k
\textbf{P}_{k|k} = \textrm{cov}(\textbf{x}_{k} - (\hat{\textbf{x}}_{k|k-1} + \textbf{K}_k(\textbf{z}_k - \textbf{H}_k\hat{\textbf{x}}_{k|k-1})))
and \textbf{z}_{k}
\textbf{P}_{k|k} = \textrm{cov}(\textbf{x}_{k} - (\hat{\textbf{x}}_{k|k-1} + \textbf{K}_k(\textbf{H}_k\textbf{x}_k + \textbf{v}_k - \textbf{H}_k\hat{\textbf{x}}_{k|k-1})))
and by collecting the error vectors we get
\textbf{P}_{k|k} = \textrm{cov}((I - \textbf{K}_k \textbf{H}_{k})(\textbf{x}_k - \hat{\textbf{x}}_{k|k-1}) - \textbf{K}_k \textbf{v}_k )
Since the measurement error vk is uncorrelated with the other terms, this becomes
\textbf{P}_{k|k} = \textrm{cov}((I - \textbf{K}_k \textbf{H}_{k})(\textbf{x}_k - \hat{\textbf{x}}_{k|k-1}))  + \textrm{cov}(\textbf{K}_k \textbf{v}_k )
by the properties of vector covariance this becomes
\textbf{P}_{k|k} = (I - \textbf{K}_k \textbf{H}_{k})\textrm{cov}(\textbf{x}_k - \hat{\textbf{x}}_{k|k-1})(I - \textbf{K}_k \textbf{H}_{k})^{\text{T}}  + \textbf{K}_k\textrm{cov}(\textbf{v}_k )\textbf{K}_k^{\text{T}}
which, using our invariant on Pk|k-1 and the definition of Rk becomes
\textbf{P}_{k|k} = 
(I - \textbf{K}_k \textbf{H}_{k}) \textbf{P}_{k|k-1} (I - \textbf{K}_k \textbf{H}_{k})^\text{T} +
\textbf{K}_k \textbf{R}_k \textbf{K}_k^\text{T}
This formula (sometimes known as the "Joseph form" of the covariance update equation) is valid no matter what the value of Kk. It turns out that if Kk is the optimal Kalman gain, this can be simplified further as shown below.

Kalman gain derivation

The Kalman filter is a minimum mean-square error estimator. The error in the posterior state estimation is
\textbf{x}_{k} - \hat{\textbf{x}}_{k|k}
We seek to minimize the expected value of the square of the magnitude of this vector, \textrm{E}[|\textbf{x}_{k} - \hat{\textbf{x}}_{k|k}|^2]. This is equivalent to minimizing the trace of the posterior estimate covariance matrix  \textbf{P}_{k|k} . By expanding out the terms in the equation above and collecting, we get:
\textbf{P}_{k|k} = \textbf{P}_{k|k-1} - \textbf{K}_k \textbf{H}_k \textbf{P}_{k|k-1} - \textbf{P}_{k|k-1} \textbf{H}_k^\text{T} \textbf{K}_k^\text{T} + \textbf{K}_k (\textbf{H}_k \textbf{P}_{k|k-1} \textbf{H}_k^\text{T} + \textbf{R}_k) \textbf{K}_k^\text{T} = \textbf{P}_{k|k-1} - \textbf{K}_k \textbf{H}_k \textbf{P}_{k|k-1} - \textbf{P}_{k|k-1} \textbf{H}_k^\text{T} \textbf{K}_k^\text{T} + \textbf{K}_k \textbf{S}_k\textbf{K}_k^\text{T}
The trace is minimized when the matrix derivative is zero:
\frac{\partial \; \textrm{tr}(\textbf{P}_{k|k})}{\partial \;\textbf{K}_k} = -2 (\textbf{H}_k \textbf{P}_{k|k-1})^\text{T} + 2 \textbf{K}_k \textbf{S}_k  = 0
Solving this for Kk yields the Kalman gain:
\textbf{K}_k \textbf{S}_k = (\textbf{H}_k \textbf{P}_{k|k-1})^\text{T} = \textbf{P}_{k|k-1} \textbf{H}_k^\text{T}
 \textbf{K}_{k} = \textbf{P}_{k|k-1} \textbf{H}_k^\text{T} \textbf{S}_k^{-1}
This gain, which is known as the optimal Kalman gain, is the one that yields MMSE estimates when used.

Simplification of the posterior error covariance formula

The formula used to calculate the posterior error covariance can be simplified when the Kalman gain equals the optimal value derived above. Multiplying both sides of our Kalman gain formula on the right by SkKkT, it follows that
\textbf{K}_k \textbf{S}_k \textbf{K}_k^T = \textbf{P}_{k|k-1} \textbf{H}_k^T \textbf{K}_k^T
Referring back to our expanded formula for the posterior error covariance,
 \textbf{P}_{k|k} = \textbf{P}_{k|k-1} - \textbf{K}_k \textbf{H}_k \textbf{P}_{k|k-1}  - \textbf{P}_{k|k-1} \textbf{H}_k^T \textbf{K}_k^T + \textbf{K}_k \textbf{S}_k \textbf{K}_k^T
we find the last two terms cancel out, giving
 \textbf{P}_{k|k} = \textbf{P}_{k|k-1} - \textbf{K}_k \textbf{H}_k \textbf{P}_{k|k-1} = (I - \textbf{K}_{k} \textbf{H}_{k}) \textbf{P}_{k|k-1}.
This formula is computationally cheaper and thus nearly always used in practice, but is only correct for the optimal gain. If arithmetic precision is unusually low causing problems with numerical stability, or if a non-optimal Kalman gain is deliberately used, this simplification cannot be applied; the posterior error covariance formula as derived above must be used.

Relationship to recursive Bayesian estimation

The true state is assumed to be an unobserved Markov process, and the measurements are the observed states of a hidden Markov model.
Hidden Markov model
Because of the Markov assumption, the true state is conditionally independent of all earlier states given the immediately previous state.
p(\textbf{x}_k|\textbf{x}_0,\dots,\textbf{x}_{k-1}) = p(\textbf{x}_k|\textbf{x}_{k-1})
Similarly the measurement at the k-th timestep is dependent only upon the current state and is conditionally independent of all other states given the current state.
p(\textbf{z}_k|\textbf{x}_0,\dots,\textbf{x}_{k}) = p(\textbf{z}_k|\textbf{x}_{k} )
Using these assumptions the probability distribution over all states of the hidden Markov model can be written simply as:
p(\textbf{x}_0,\dots,\textbf{x}_k,\textbf{z}_1,\dots,\textbf{z}_k) = p(\textbf{x}_0)\prod_{i=1}^k p(\textbf{z}_i|\textbf{x}_i)p(\textbf{x}_i|\textbf{x}_{i-1})
However, when the Kalman filter is used to estimate the state x, the probability distribution of interest is that associated with the current states conditioned on the measurements up to the current timestep. (This is achieved by marginalizing out the previous states and dividing by the probability of the measurement set.)
This leads to the predict and update steps of the Kalman filter written probabilistically. The probability distribution associated with the predicted state is the sum (integral) of the products of the probability distribution associated with the transition from the (k − 1)-th timestep to the k-th and the probability distribution associated with the previous state, over all possible x_{k-1}.
 p(\textbf{x}_k|\textbf{Z}_{k-1}) = \int p(\textbf{x}_k | \textbf{x}_{k-1}) p(\textbf{x}_{k-1} | \textbf{Z}_{k-1} )  \, d\textbf{x}_{k-1}
The measurement set up to time t is
 \textbf{Z}_{t} = \left \{ \textbf{z}_{1},\dots,\textbf{z}_{t} \right \}
The probability distribution of the update is proportional to the product of the measurement likelihood and the predicted state.
 p(\textbf{x}_k|\textbf{Z}_{k}) = \frac{p(\textbf{z}_k|\textbf{x}_k) p(\textbf{x}_k|\textbf{Z}_{k-1})}{p(\textbf{z}_k|\textbf{Z}_{k-1})}
The denominator
p(\textbf{z}_k|\textbf{Z}_{k-1}) = \int p(\textbf{z}_k|\textbf{x}_k) p(\textbf{x}_k|\textbf{Z}_{k-1}) d\textbf{x}_k
is a normalization term.
The remaining probability density functions are
 p(\textbf{x}_k | \textbf{x}_{k-1}) = N(\textbf{F}_k\textbf{x}_{k-1}, \textbf{Q}_k)
 p(\textbf{z}_k|\textbf{x}_k) = N(\textbf{H}_{k}\textbf{x}_k, \textbf{R}_k)
 p(\textbf{x}_{k-1}|\textbf{Z}_{k-1}) = N(\hat{\textbf{x}}_{k-1},\textbf{P}_{k-1} )
Note that the PDF at the previous timestep is inductively assumed to be the estimated state and covariance. This is justified because, as an optimal estimator, the Kalman filter makes best use of the measurements, therefore the PDF for \mathbf{x}_k given the measurements \mathbf{Z}_k is the Kalman filter estimate.

Information filter

In the information filter, or inverse covariance filter, the estimated covariance and estimated state are replaced by the information matrix and information vector respectively. These are defined as:
\textbf{Y}_{k|k} =  \textbf{P}_{k|k}^{-1}
\hat{\textbf{y}}_{k|k} =  \textbf{P}_{k|k}^{-1}\hat{\textbf{x}}_{k|k}
Similarly the predicted covariance and state have equivalent information forms, defined as:
\textbf{Y}_{k|k-1} =  \textbf{P}_{k|k-1}^{-1}
\hat{\textbf{y}}_{k|k-1} =  \textbf{P}_{k|k-1}^{-1}\hat{\textbf{x}}_{k|k-1}
as have the measurement covariance and measurement vector, which are defined as:
\textbf{I}_{k} = \textbf{H}_{k}^{\text{T}} \textbf{R}_{k}^{-1} \textbf{H}_{k}
\textbf{i}_{k} = \textbf{H}_{k}^{\text{T}} \textbf{R}_{k}^{-1} \textbf{z}_{k}
The information update now becomes a trivial sum.
\textbf{Y}_{k|k} = \textbf{Y}_{k|k-1} + \textbf{I}_{k}
\hat{\textbf{y}}_{k|k} = \hat{\textbf{y}}_{k|k-1} + \textbf{i}_{k}
The main advantage of the information filter is that N measurements can be filtered at each timestep simply by summing their information matrices and vectors.
\textbf{Y}_{k|k} = \textbf{Y}_{k|k-1} + \sum_{j=1}^N \textbf{I}_{k,j}
\hat{\textbf{y}}_{k|k} = \hat{\textbf{y}}_{k|k-1} + \sum_{j=1}^N \textbf{i}_{k,j}
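
A sketch of this multi-measurement fusion, allowing each of the N measurements its own observation model and noise covariance (argument names are assumptions):

    import numpy as np

    def information_update(Y_pred, y_pred, zs, Hs, Rs):
        """Fuse N measurements at one timestep by summing their
        information matrices I_k and vectors i_k, per the sums above."""
        Y, y = Y_pred.copy(), y_pred.copy()
        for z, H, R in zip(zs, Hs, Rs):
            R_inv = np.linalg.inv(R)
            Y += H.T @ R_inv @ H        # information matrix contribution
            y += H.T @ R_inv @ z        # information vector contribution
        return Y, y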
To predict the information filter the information matrix and vector can be converted back to their state space equivalents, or alternatively the information space prediction can be used.
\textbf{M}_{k} = [\textbf{F}_{k}^{-1}]^{\text{T}} \textbf{Y}_{k-1|k-1} \textbf{F}_{k}^{-1}
\textbf{C}_{k} = \textbf{M}_{k} [\textbf{M}_{k}+\textbf{Q}_{k}^{-1}]^{-1}
\textbf{L}_{k} = I - \textbf{C}_{k}
\textbf{Y}_{k|k-1} = \textbf{L}_{k} \textbf{M}_{k} \textbf{L}_{k}^{\text{T}} + \textbf{C}_{k} \textbf{Q}_{k}^{-1} \textbf{C}_{k}^{\text{T}}
\hat{\textbf{y}}_{k|k-1} = \textbf{L}_{k} [\textbf{F}_{k}^{-1}]^{\text{T}}\hat{\textbf{y}}_{k-1|k-1}
Note that if F and Q are time invariant these values can be cached. Note also that F and Q need to be invertible.

Fixed-lag smoother

The optimal fixed-lag smoother provides the optimal estimate of \hat{\textbf{x}}_{k - N | k} for a given fixed-lag N using the measurements from \textbf{z}_{1} to \textbf{z}_{k}. It can be derived using the previous theory via an augmented state, and the main equation of the filter is the following:
\begin{bmatrix}
  \hat{\textbf{x}}_{t|t} \\
  \hat{\textbf{x}}_{t-1|t} \\
  \vdots \\
  \hat{\textbf{x}}_{t-N+1|t} \\
 \end{bmatrix}
 =
 \begin{bmatrix}
  I \\
  0 \\
  \vdots \\
  0 \\
 \end{bmatrix}
 \hat{\textbf{x}}_{t|t-1}
 +
 \begin{bmatrix}
  0  & \ldots & 0 \\
  I  & 0  & \vdots \\
  \vdots  & \ddots & \vdots \\
  0  & \ldots & I \\
 \end{bmatrix}
 \begin{bmatrix}
  \hat{\textbf{x}}_{t-1|t-1} \\
  \hat{\textbf{x}}_{t-2|t-1} \\
  \vdots \\
  \hat{\textbf{x}}_{t-N|t-1} \\
 \end{bmatrix}
 +
 \begin{bmatrix}
  K^{(1)} \\
  K^{(2)} \\
  \vdots \\
  K^{(N)} \\
 \end{bmatrix}
 y_{t|t-1}
where:
1)  \hat{\textbf{x}}_{t|t-1} is estimated via a standard Kalman filter;
2)  y_{t|t-1} = z(t) - \textbf{H}\hat{\textbf{x}}_{t|t-1} is the innovation produced considering the estimate of the standard Kalman filter;
3) the various  \hat{\textbf{x}}_{t-i|t} with  i = 0,\ldots,N are new variables, i.e. they do not appear in the standard Kalman filter;
4) the gains are computed via the following scheme:
K^{(i)} = P^{(i)} H^{T} \left[ H P H^{T} + R \right]^{-1}
and
P^{(i)} = P \left[ \left[ F - K H \right]^{T} \right]^{i}
where P and K are the prediction error covariance and the gains of the standard Kalman filter.
Note that if we define the estimation error covariance
P_{i} := E \left[ \left( \textbf{x}_{t-i} - \hat{\textbf{x}}_{t-i|t} \right)^{*} \left( \textbf{x}_{t-i} - \hat{\textbf{x}}_{t-i|t} \right) | z_{1} \ldots z_{t} \right]
then we have that the improvement on the estimation of  \textbf{x}_{t-i} is given by:
P - P_{i} = \sum_{j = 0}^{i} \left[ P^{(j)} H^{T} \left[ H P H^{T} + R \right]^{-1} H \left( P^{(j)} \right)^{T} \right]

Fixed-interval filters

The optimal fixed-interval smoother provides the optimal estimate of \hat{\textbf{x}}_{k | n} (k \leq n) using the measurements from a fixed interval \textbf{z}_{1} to \textbf{z}_{n}. This is also called Kalman Smoothing.
There exists an efficient two-pass algorithm, the Rauch–Tung–Striebel algorithm, for achieving this. The main equations of the smoother are the following (assuming \textbf{B}_{k} = \textbf{0}):
  • forward pass: regular Kalman filter algorithm
  • backward pass:  \hat{\textbf{x}}_{k|n} = \tilde{F}_k \hat{\textbf{x}}_{k+1|n} + \tilde{K}_k \hat{\textbf{x}}_{k+1|k} , where
    •  \tilde{\textbf{F}}_k = \textbf{F}_k^{-1} (\textbf{I} -  \textbf{Q}_k \textbf{P}_{k+1|k}^{-1})
    •  \tilde{\textbf{K}}_k = \textbf{F}_k^{-1} \textbf{Q}_k \textbf{P}_{k+1|k}^{-1}
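
A sketch of the backward pass, transcribing the F-tilde/K-tilde recursion above. It assumes F is invertible and time-invariant, Bk = 0, and that the forward pass has stored x_filt[k] = x_{k|k}, x_pred[k] = x_{k+1|k}, and P_pred[k] = P_{k+1|k} (an assumed array layout).

    import numpy as np

    def rts_backward_pass(x_filt, x_pred, P_pred, F, Q):
        """Fixed-interval (Rauch-Tung-Striebel) smoothing, per the
        recursion above; returns the smoothed states x_{k|n}."""
        n = len(x_filt)
        x_smooth = list(x_filt)                   # last entry seeds the pass
        F_inv = np.linalg.inv(F)
        for k in range(n - 2, -1, -1):
            P_inv = np.linalg.inv(P_pred[k])      # P_{k+1|k}^{-1}
            F_tilde = F_inv @ (np.eye(len(F)) - Q @ P_inv)
            K_tilde = F_inv @ Q @ P_inv
            x_smooth[k] = F_tilde @ x_smooth[k + 1] + K_tilde @ x_pred[k]
        return x_smooth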

Non-linear filters

The basic Kalman filter is limited to a linear assumption. However, most non-trivial systems are non-linear. The non-linearity can be associated either with the process model or with the observation model or with both.

Extended Kalman filter

In the extended Kalman filter (EKF), the state transition and observation models need not be linear functions of the state but may instead be (differentiable) non-linear functions.
\textbf{x}_{k} = f(\textbf{x}_{k-1}, \textbf{u}_{k}) + \textbf{w}_{k}
\textbf{z}_{k} = h(\textbf{x}_{k}) + \textbf{v}_{k}
The function f can be used to compute the predicted state from the previous estimate and similarly the function h can be used to compute the predicted measurement from the predicted state. However, f and h cannot be applied to the covariance directly. Instead a matrix of partial derivatives (the Jacobian) is computed.
At each timestep the Jacobian is evaluated with the current predicted states. These matrices can be used in the Kalman filter equations. This process essentially linearizes the non-linear function around the current estimate.
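As a sketch of one EKF step in Python/NumPy: f and h are the models above, while F_jac and H_jac are hypothetical helpers returning the Jacobians of f and h evaluated at a given state; none of these names come from a real library.

import numpy as np

# One predict-update cycle of an extended Kalman filter.
def ekf_step(x, P, u, z, f, h, F_jac, H_jac, Q, R):
    # Predict: the state goes through f, the covariance through the Jacobian.
    x_pred = f(x, u)
    F = F_jac(x, u)                       # linearization at the previous estimate
    P_pred = F @ P @ F.T + Q
    # Update: linearize h around the predicted state.
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(P.shape[0]) - K @ H) @ P_pred
    return x_new, P_new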

Unscented Kalman filter

When the state transition and observation models – that is, the predict and update functions f and h (see above) – are highly non-linear, the extended Kalman filter can give particularly poor performance.[2] This is because the mean and covariance are propagated through linearization of the underlying non-linear model. The unscented Kalman filter (UKF) [2] uses a deterministic sampling technique known as the unscented transform to pick a minimal set of sample points (called sigma points) around the mean. These sigma points are then propagated through the non-linear functions, from which the mean and covariance of the estimate are then recovered. The result is a filter which more accurately captures the true mean and covariance. (This can be verified using Monte Carlo sampling or through a Taylor series expansion of the posterior statistics.) In addition, this technique removes the requirement to explicitly calculate Jacobians, which for complex functions can be a difficult task in itself (i.e., requiring complicated derivatives if done analytically or being computationally costly if done numerically).
Predict
As with the EKF, the UKF prediction can be used independently from the UKF update, in combination with a linear (or indeed EKF) update, or vice versa.
The estimated state and covariance are augmented with the mean and covariance of the process noise.
 \textbf{x}_{k-1|k-1}^{a} = [ \hat{\textbf{x}}_{k-1|k-1}^{T} \quad E[\textbf{w}_{k}^{T}] \ ]^{T}
 \textbf{P}_{k-1|k-1}^{a} = \begin{bmatrix} \textbf{P}_{k-1|k-1} & 0 \\ 0 & \textbf{Q}_{k} \end{bmatrix}
A set of 2L+1 sigma points is derived from the augmented state and covariance where L is the dimension of the augmented state.
\chi_{k-1|k-1}^{0} = \textbf{x}_{k-1|k-1}^{a}
\chi_{k-1|k-1}^{i} = \textbf{x}_{k-1|k-1}^{a} + \left( \sqrt{ (L + \lambda) \textbf{P}_{k-1|k-1}^{a} } \right)_{i}, \quad i = 1, \ldots, L
\chi_{k-1|k-1}^{i} = \textbf{x}_{k-1|k-1}^{a} - \left( \sqrt{ (L + \lambda) \textbf{P}_{k-1|k-1}^{a} } \right)_{i-L}, \quad i = L+1, \ldots, 2L
where
\left ( \sqrt{ (L + \lambda) \textbf{P}_{k-1|k-1}^{a} } \right )_{i}
is the ith column of the matrix square root of
(L + \lambda) \textbf{P}_{k-1|k-1}^{a}
using the definition: the matrix square root A of B satisfies
B \equiv A A^{T}.
The matrix square root should be calculated using numerically efficient and stable methods such as the Cholesky decomposition.
The sigma points are propagated through the transition function f.
\chi_{k|k-1}^{i} = f(\chi_{k-1|k-1}^{i}), \quad i = 0, \ldots, 2L
The weighted sigma points are recombined to produce the predicted state and covariance.
\hat{\textbf{x}}_{k|k-1} = \sum_{i=0}^{2L} W_{s}^{i} \chi_{k|k-1}^{i}
\textbf{P}_{k|k-1} = \sum_{i=0}^{2L} W_{c}^{i}\ [\chi_{k|k-1}^{i} - \hat{\textbf{x}}_{k|k-1}] [\chi_{k|k-1}^{i} - \hat{\textbf{x}}_{k|k-1}]^{T}
where the weights for the state and covariance are given by:
W_{s}^{0} = \frac{\lambda}{L+\lambda}
W_{c}^{0} = \frac{\lambda}{L+\lambda} + (1 - \alpha^2 + \beta)
W_{s}^{i} = W_{c}^{i} = \frac{1}{2(L+\lambda)}
\lambda = \alpha^2 (L+\kappa) - L \,\!
Typical values for \alpha, \beta, and \kappa are 10^{-3}, 2 and 0, respectively. (These values should suffice for most purposes.)
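The sigma-point generation and predict step can be sketched in Python/NumPy as follows. For clarity the sketch treats f as acting on the whole augmented sigma point, following the formulation above; the function names are illustrative, and the defaults are the typical α, β, κ just mentioned.

import numpy as np

def sigma_points(x_a, P_a, alpha=1e-3, beta=2.0, kappa=0.0):
    # Generate the 2L+1 sigma points and their weights from the augmented
    # mean x_a and covariance P_a, following the equations above.
    L = x_a.size
    lam = alpha**2 * (L + kappa) - L
    S = np.linalg.cholesky((L + lam) * P_a)       # stable matrix square root
    chi = np.vstack([x_a, x_a + S.T, x_a - S.T])  # rows of S.T are columns of S
    Ws = np.full(2 * L + 1, 0.5 / (L + lam))
    Wc = Ws.copy()
    Ws[0] = lam / (L + lam)
    Wc[0] = lam / (L + lam) + (1.0 - alpha**2 + beta)
    return chi, Ws, Wc

def ukf_predict(chi, Ws, Wc, f):
    # Propagate the sigma points through f and recombine them into the
    # predicted state and covariance.
    chi_pred = np.array([f(c) for c in chi])
    x_pred = Ws @ chi_pred
    d = chi_pred - x_pred
    P_pred = (Wc[:, None] * d).T @ d
    return x_pred, P_pred, chi_pred

(Alternatively, with purely additive process noise one would generate sigma points from the state alone and add Q to the predicted covariance instead of augmenting.)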
Update
The predicted state and covariance are augmented as before, except now with the mean and covariance of the measurement noise.
 \textbf{x}_{k|k-1}^{a} = [ \hat{\textbf{x}}_{k|k-1}^{T} \quad E[\textbf{v}_{k}^{T}] \ ]^{T}
 \textbf{P}_{k|k-1}^{a} = \begin{bmatrix} \textbf{P}_{k|k-1} & 0 \\ 0 & \textbf{R}_{k} \end{bmatrix}
As before, a set of 2L + 1 sigma points is derived from the augmented state and covariance where L is the dimension of the augmented state.
\chi_{k|k-1}^{0} = \textbf{x}_{k|k-1}^{a}
\chi_{k|k-1}^{i} = \textbf{x}_{k|k-1}^{a} + \left( \sqrt{ (L + \lambda) \textbf{P}_{k|k-1}^{a} } \right)_{i}, \quad i = 1, \ldots, L
\chi_{k|k-1}^{i} = \textbf{x}_{k|k-1}^{a} - \left( \sqrt{ (L + \lambda) \textbf{P}_{k|k-1}^{a} } \right)_{i-L}, \quad i = L+1, \ldots, 2L
Alternatively, if the UKF prediction has been used, the sigma points themselves can be augmented along the following lines:
 \chi_{k|k-1} := [ \chi_{k|k-1}^T \quad E[\textbf{v}_{k}^{T}] \ ]^{T} \pm \sqrt{ (L + \lambda) \textbf{R}_{k}^{a} }
where
 \textbf{R}_{k}^{a} = \begin{bmatrix} 0 & 0 \\ 0 & \textbf{R}_{k} \end{bmatrix}
The sigma points are projected through the observation function h.
\gamma_{k}^{i} = h(\chi_{k|k-1}^{i}), \quad i = 0, \ldots, 2L
The weighted sigma points are recombined to produce the predicted measurement and predicted measurement covariance.
\hat{\textbf{z}}_{k} = \sum_{i=0}^{2L} W_{s}^{i} \gamma_{k}^{i}
\textbf{P}_{z_{k}z_{k}} = \sum_{i=0}^{2L} W_{c}^{i}\ [\gamma_{k}^{i} - \hat{\textbf{z}}_{k}] [\gamma_{k}^{i} - \hat{\textbf{z}}_{k}]^{T}
The state-measurement cross-covariance matrix,
\textbf{P}_{x_{k}z_{k}} = \sum_{i=0}^{2L} W_{c}^{i}\ [\chi_{k|k-1}^{i} - \hat{\textbf{x}}_{k|k-1}] [\gamma_{k}^{i} - \hat{\textbf{z}}_{k}]^{T}
is used to compute the UKF Kalman gain.
K_{k} = \textbf{P}_{x_{k}z_{k}} \textbf{P}_{z_{k}z_{k}}^{-1}
As with the Kalman filter, the updated state is the predicted state plus the innovation weighted by the Kalman gain,
\hat{\textbf{x}}_{k|k} = \hat{\textbf{x}}_{k|k-1} + K_{k}( \textbf{z}_{k} - \hat{\textbf{z}}_{k} )
The updated covariance is the predicted covariance minus the predicted measurement covariance weighted by the Kalman gain.
\textbf{P}_{k|k} = \textbf{P}_{k|k-1} - K_{k} \textbf{P}_{z_{k}z_{k}} K_{k}^{T}
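A matching Python/NumPy sketch of the update step, under one common simplification: the measurement noise is treated as additive, so R_k is added directly to the predicted measurement covariance instead of re-augmenting the state; the names continue the illustrative convention of the predict sketch.

import numpy as np

def ukf_update(x_pred, P_pred, chi_pred, Ws, Wc, h, R, z):
    gamma = np.array([h(c) for c in chi_pred])   # sigma points through h
    z_pred = Ws @ gamma                          # predicted measurement
    dz = gamma - z_pred
    dx = chi_pred - x_pred
    P_zz = (Wc[:, None] * dz).T @ dz + R         # measurement covariance (additive noise)
    P_xz = (Wc[:, None] * dx).T @ dz             # state-measurement cross-covariance
    K = P_xz @ np.linalg.inv(P_zz)               # UKF Kalman gain
    x_new = x_pred + K @ (z - z_pred)
    P_new = P_pred - K @ P_zz @ K.T
    return x_new, P_new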

Kalman–Bucy filter

The Kalman–Bucy filter is a continuous time version of the Kalman filter.[3][4]
It is based on the state space model
\frac{d}{dt}\mathbf{x}(t) = \mathbf{F}(t)\mathbf{x}(t) + \mathbf{w}(t)
\mathbf{z}(t) = \mathbf{H}(t) \mathbf{x}(t) + \mathbf{v}(t)
where the covariances of the noise terms \mathbf{w}(t) and \mathbf{v}(t) are given by \mathbf{Q}(t) and \mathbf{R}(t), respectively.
The filter consists of two differential equations, one for the state estimate and one for the covariance:
\frac{d}{dt}\hat{\mathbf{x}}(t) = \mathbf{F}(t)\hat{\mathbf{x}}(t) + \mathbf{K}(t) (\mathbf{z}(t)-\mathbf{H}(t)\hat{\mathbf{x}}(t))
\frac{d}{dt}\mathbf{P}(t) = \mathbf{F}(t)\mathbf{P}(t) + \mathbf{P}(t)\mathbf{F}^{T}(t) + \mathbf{Q}(t) - \mathbf{K}(t)\mathbf{R}(t)\mathbf{K}^{T}(t)
where the Kalman gain is given by
\mathbf{K}(t)=\mathbf{P}(t)\mathbf{H}^{T}(t)\mathbf{R}^{-1}(t)
Note that in this expression for \mathbf{K}(t) the covariance of the observation noise \mathbf{R}(t) also serves as the covariance of the prediction error (or innovation) \tilde{\mathbf{y}}(t)=\mathbf{z}(t)-\mathbf{H}(t)\hat{\mathbf{x}}(t); these covariances are equal only in the continuous-time case.[5]
The distinction between the prediction and update steps of discrete-time Kalman filtering does not exist in continuous time.
The second differential equation, for the covariance, is an example of a Riccati equation.
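Since the filter is just a pair of coupled differential equations, it can be integrated numerically; the following Python/NumPy sketch takes a single forward-Euler step of size dt (a real implementation would likely use a higher-order ODE solver, and all names here are illustrative).

import numpy as np

# One forward-Euler step of the Kalman-Bucy filter ODEs.
def kalman_bucy_step(x_hat, P, z, F, H, Q, R, dt):
    K = P @ H.T @ np.linalg.inv(R)           # K(t) = P H^T R^{-1}
    dx = F @ x_hat + K @ (z - H @ x_hat)     # state-estimate ODE
    dP = F @ P + P @ F.T + Q - K @ R @ K.T   # matrix Riccati ODE
    return x_hat + dt * dx, P + dt * dP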

Naming and historical development

The filter is named after Rudolf E. Kalman, though Thorvald Nicolai Thiele[6][7] and Peter Swerling developed a similar algorithm earlier. Stanley F. Schmidt is generally credited with developing the first implementation of a Kalman filter. It was during a visit by Kalman to the NASA Ames Research Center that he saw the applicability of his ideas to the problem of trajectory estimation for the Apollo program, leading to its incorporation in the Apollo navigation computer. The Kalman filter was first described and partially developed in technical papers by Swerling (1958), Kalman (1960) and Kalman and Bucy (1961).
Kalman filters have been vital to the implementation of the navigation systems of U.S. Navy nuclear ballistic missile submarines, and to the guidance and navigation systems of cruise missiles such as the U.S. Navy's Tomahawk missile and the U.S. Air Force's Air Launched Cruise Missile. They are also used in the guidance and navigation systems of the NASA Space Shuttle and in the attitude control and navigation systems of the International Space Station.
This digital filter is sometimes called the Stratonovich–Kalman–Bucy filter because it is a special case of a more general, non-linear filter developed somewhat earlier by the Soviet mathematician Ruslan L. Stratonovich.[8][9] In fact, some of the equations of the special case linear filter appeared in these papers by Stratonovich that were published before summer 1960, when Kalman met with Stratonovich during a conference in Moscow.
In control theory, the Kalman filter is most commonly referred to as linear quadratic estimation (LQE).
A wide variety of Kalman filters have now been developed, from Kalman's original formulation, now called the simple Kalman filter, and the Kalman–Bucy filter, to Schmidt's extended filter, the information filter, and a variety of square-root filters developed by Bierman, Thornton and many others. Perhaps the most commonly used type of very simple Kalman filter is the phase-locked loop, which is now ubiquitous in radios, especially frequency modulation (FM) radios, television sets, satellite communications receivers, outer-space communications systems, and nearly any other electronic communications equipment.

Applications