A.21  Estimation Theory

A.21.1  Probabilistic estimation theory

Estimation theory concerns estimating the values of parameters 𝜃 based on measured data. An estimator attempts to approximate the unknown parameters using measurements that have a random component (e.g., due to noise). There are other approaches, but the one of interest to this text is probabilistic estimation theory, which aims at finding a set Θ = {𝜃_1, …, 𝜃_M} of M parameters. In this approach, the measured data set Ξ with N elements is random, with a probability distribution p(Ξ|Θ) that depends on Θ. The parameters 𝜃 themselves may have associated probability distributions. An estimator provides Θ̂ = {𝜃̂_i}, where the “hat” indicates that the element 𝜃̂_i is an estimate of the true parameter 𝜃_i. The value 𝜃̂_i is a statistic with its own distribution, variance, mean 𝔼[𝜃̂_i], etc.
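To make this notation concrete, the following minimal Python/NumPy sketch (an illustration added here, with an assumed Gaussian noise model and the sample mean as an assumed choice of estimator) estimates a single parameter 𝜃 from N noisy measurements.

    import numpy as np

    rng = np.random.default_rng(seed=0)

    theta_true = 3.0  # the unknown parameter theta
    N = 1000          # number of measurements in the data set Xi
    # measured data: the parameter observed with a random (noise) component
    y = theta_true + rng.normal(0.0, 1.0, N)

    theta_hat = np.mean(y)  # sample-mean estimator theta_hat
    print(f"true theta = {theta_true}, estimate = {theta_hat:.4f}")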

The bias of an estimator is 𝔼[𝜃̂_i] − 𝜃_i, and an estimator is called unbiased if the bias is zero for all M elements of Θ̂. The estimator of the sample variance that uses N as the normalization factor instead of N − 1 is a good example of a biased estimator [urlBMbia]. There is a well-known tradeoff between bias and variance: in many applications, it is possible to decrease the variance of an estimator at the expense of allowing its bias to increase.
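The bias of the N-normalized sample variance can be checked numerically. The Monte Carlo sketch below (an illustration under an assumed Gaussian model; it is not taken from [urlBMbia]) averages both normalizations over many trials: the N version concentrates around (N − 1)σ²/N, below the true variance σ².

    import numpy as np

    rng = np.random.default_rng(seed=0)

    true_var = 4.0        # variance of the underlying Gaussian samples
    N = 10                # a small N makes the bias easy to see
    num_trials = 100_000  # Monte Carlo repetitions

    biased = np.empty(num_trials)
    unbiased = np.empty(num_trials)
    for t in range(num_trials):
        x = rng.normal(0.0, np.sqrt(true_var), N)
        biased[t] = np.var(x)            # normalization by N: biased
        unbiased[t] = np.var(x, ddof=1)  # normalization by N - 1: unbiased

    print(f"mean of biased estimates:   {biased.mean():.3f} "
          f"(theory: {(N - 1) / N * true_var:.3f})")
    print(f"mean of unbiased estimates: {unbiased.mean():.3f} "
          f"(theory: {true_var:.3f})")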

Some specific topics of probabilistic estimation theory are discussed in the sequel, but the theory is broad and many of its concepts are out of the scope of this text.

A.21.2  Minimum mean square error (MMSE) estimators

A popular estimator is the minimum mean square error (MMSE) estimator, which utilizes the errors e_i = 𝜃_i − 𝜃̂_i, i = 1, …, M, between the estimated parameters and their actual values.

The measurements Ξ may be organized in several different forms. For example, in multivariate statistics, each element of Ξ can be an array. For simplicity, it is assumed here that Ξ and Θ are column vectors y and x with N and M complex-valued elements, respectively. The estimator x̂(y) is a function of the measurements y, and the estimation error is e = x̂ − x, with elements e_i, i = 1, …, M. Because the vectors are random, the MSE is given by

\mathrm{MSE} \overset{\mathrm{def}}{=} \sum_{i=1}^{M} \mathbb{E}\left[ |e_i|^2 \right],
(A.95)

which, because the trace and the expectation operators commute, can be written in matrix notation as

\mathrm{MSE} = \mathrm{tr}\left\{ \mathbb{E}\left[ e e^H \right] \right\},
(A.96)

where R_{ee} = 𝔼[ee^H] is the error autocorrelation matrix.
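The equivalence between Eq. (A.95) and Eq. (A.96) can be verified numerically. The sketch below (an illustration with an assumed circular complex Gaussian error model and arbitrary numbers) computes the MSE both as the sum of the 𝔼[|e_i|²] terms and as the trace of a sample estimate of R_{ee}.

    import numpy as np

    rng = np.random.default_rng(seed=0)

    M, num_trials = 3, 50_000
    # assumed error model: i.i.d. circular complex Gaussian error vectors e
    e = (rng.normal(0, 0.2, (num_trials, M))
         + 1j * rng.normal(0, 0.2, (num_trials, M)))

    # Eq. (A.95): sum over the M components of E[|e_i|^2]
    mse_sum = np.sum(np.mean(np.abs(e) ** 2, axis=0))

    # Eq. (A.96): trace of the error autocorrelation matrix R_ee = E[e e^H]
    R_ee = (e[:, :, None] * e[:, None, :].conj()).mean(axis=0)
    mse_trace = np.real(np.trace(R_ee))

    print(f"sum form   (A.95): {mse_sum:.4f}")
    print(f"trace form (A.96): {mse_trace:.4f}")  # the two values agree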

A.21.3  Orthogonality principle

A key result for MMSE estimation is the orthogonality principle. For zero-mean x and y, a linear estimator x̂ = Wy attains the minimum MSE if and only if the estimation error is orthogonal to the measurements, i.e., 𝔼[ey^H] = 0 (the M × N zero matrix). Intuitively, if the error were still correlated with y, part of it could be predicted from the data and subtracted, which would further reduce the MSE. Imposing 𝔼[(Wy − x)y^H] = 0 yields the optimal matrix W = R_{xy} R_{yy}^{−1}, where R_{xy} = 𝔼[xy^H] and R_{yy} = 𝔼[yy^H].
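As a numerical illustration (a sketch under an assumed linear Gaussian measurement model y = Hx + n, which is not part of the discussion above), the Python/NumPy code below builds the linear MMSE matrix W from sample correlation matrices and verifies that the resulting error is orthogonal to the measurements.

    import numpy as np

    rng = np.random.default_rng(seed=0)

    # assumed linear measurement model y = H x + n (real-valued for simplicity)
    M, N, num_trials = 2, 4, 200_000
    H = rng.normal(size=(N, M))

    x = rng.normal(size=(num_trials, M))        # zero-mean parameters, one row per trial
    n = 0.5 * rng.normal(size=(num_trials, N))  # measurement noise
    y = x @ H.T + n                             # measurements

    # sample correlation matrices (the vectors are zero mean)
    R_xy = (x.T @ y) / num_trials  # estimate of E[x y^H]
    R_yy = (y.T @ y) / num_trials  # estimate of E[y y^H]

    W = R_xy @ np.linalg.inv(R_yy)  # linear MMSE estimator matrix
    e = y @ W.T - x                 # error e = x_hat - x

    # orthogonality: E[e y^H] is the zero matrix (here, zero up to numerical
    # precision, because W was built from the same sample statistics)
    print(np.round((e.T @ y) / num_trials, 6))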