A.22 One-dimensional linear prediction over time
A.22.1 The innovations process
As indicated in Eq. (4.39), when white noise $w[n]$ with power $\sigma_w^2$ is passed through an LTI filter $H(z)$, the output PSD is

$$S_x(e^{j\omega}) = \sigma_w^2\, |H(e^{j\omega})|^2. \qquad \text{(A.97)}$$
Conversely, when a PSD $S_x(e^{j\omega})$ satisfies the corresponding conditions (notably the Paley-Wiener condition), it is possible to find a filter $H(z)$ that obeys Eq. (A.97).
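As a quick numerical check of Eq. (A.97), the sketch below passes unit-variance white noise through an arbitrarily chosen IIR filter (the coefficients B and A are illustrative, not taken from the text) and compares the output power measured in the time domain with the power obtained by averaging $\sigma_w^2 |H(e^{j\omega})|^2$ over a frequency grid.

% Numerical check of Eq. (A.97): power of filtered white noise
randn('seed', 0);              % reset number generator (could use rng)
N = 100000;                    % number of samples
sigma2 = 1;                    % white noise power
w = sqrt(sigma2)*randn(1, N);  % white Gaussian noise w[n]
B = [1 -0.5];                  % illustrative numerator coefficients
A = [1 -0.9];                  % illustrative denominator (pole at 0.9)
x = filter(B, A, w);           % colored output x[n]
H = freqz(B, A, 1024);         % frequency response on [0, pi)
powerViaPSD = sigma2*mean(abs(H).^2); % average of sigma2*|H|^2 over the grid
powerTimeDomain = mean(x.^2);  % power estimated from the samples
fprintf('Power via Eq. (A.97) = %g, time-domain power = %g\n', ...
    powerViaPSD, powerTimeDomain)

The two values should agree up to the statistical fluctuation of the finite-length realization.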
As discussed in [BLM04] (page 71), a real-valued WSS random process $x[n]$ with PSD $S_x(e^{j\omega})$ allows the following monic minimum-phase spectral factorization of the associated "transfer function" $S_x(z)$:

$$S_x(z) = \sigma_w^2\, H(z)\, H(z^{-1}), \qquad \text{(A.98)}$$

where $H(z)$ is a monic (i.e., $h[0] = 1$) loosely minimum-phase (zeros on or inside the unit circle) causal filter and $\sigma_w^2$ is the variance of the innovation process or, equivalently, the geometric mean of $S_x(e^{j\omega})$.
If $H(z)$ is strictly minimum phase, its inverse $1/H(z)$ is stable and the following filtering cascade

$$x[n] \;\xrightarrow{\;1/H(z)\;}\; w[n] \;\xrightarrow{\;H(z)\;}\; x[n]$$

illustrates how to generate a white WSS process $w[n]$ with power $\sigma_w^2$ from $x[n]$ (the "whitening" operation) and how to recover $x[n]$ from $w[n]$.

The filter $1/H(z)$ removes the correlated components in $x[n]$ and obtains $w[n]$ with the characteristics of white noise. Hence, it is called a whitening filter. And because $w[n]$ corresponds to the new and random part of the process, it is called the innovation sequence.
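The short sketch below illustrates the whitening and synthesis operations, assuming an illustrative minimum-phase $H(z) = B(z)/A(z)$ chosen only for this example: white noise is shaped by $H(z)$ into a correlated signal, which is then whitened by $1/H(z)$, recovering an approximately white sequence with the original power.

% Whitening and synthesis with an illustrative minimum-phase H(z) = B(z)/A(z)
randn('seed', 0);
N = 100000;
w = randn(1, N);              % innovation: unit-variance white noise
B = poly(0.5);                % monic numerator, zero inside the unit circle
A = poly([0.9 -0.8]);         % monic denominator, poles inside the unit circle
x = filter(B, A, w);          % synthesis: correlated x[n] from white noise
wHat = filter(A, B, x);       % whitening: apply 1/H(z) = A(z)/B(z)
fprintf('Power of x = %g, power of recovered innovation = %g\n', ...
    mean(x.^2), mean(wHat.^2))
[r, lags] = xcorr(wHat, 5, 'biased'); % autocorrelation of the whitened signal
disp([lags; r])               % approximately 1 at lag 0 and ~0 elsewhere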
Linear prediction for signal decorrelation
The whitening filter $1/H(z)$ is also called a prediction-error filter because its action can be split into two stages, assuming linear prediction:

1. a prediction $\hat{x}[n]$ of the current ($n$-th) sample of $x[n]$ is obtained from the past samples of $x[n]$ via the prediction filter

$$P(z) = 1 - \frac{1}{H(z)}, \qquad \text{(A.99)}$$

which is strictly causal because $1/H(z)$ is monic;

2. and then this prediction is subtracted from the current sample to produce the prediction error $e[n] = x[n] - \hat{x}[n]$. It can be shown that this scheme is an optimum minimum mean square error (MMSE) estimator [BLM04] (page 72).
Using the notation of "prediction", the whitening can be obtained with

$$e[n] = x[n] - \hat{x}[n] \;\Longleftrightarrow\; E(z) = \left[1 - P(z)\right] X(z) = \frac{X(z)}{H(z)}.$$
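To make the two-stage interpretation concrete, the sketch below (assuming an illustrative all-pole $H(z) = 1/A(z)$ of order 2, not taken from the text) compares the output of the whitening filter $A(z)$ with the explicit prediction-plus-subtraction scheme $e[n] = x[n] - \hat{x}[n]$, where $\hat{x}[n]$ is produced by the strictly causal predictor $P(z) = 1 - A(z)$.

% Whitening with A(z) versus explicit prediction followed by subtraction
randn('seed', 0);
N = 1000;
w = randn(1, N);                 % innovation sequence
A = poly([0.95*exp(1j*pi/5) 0.95*exp(-1j*pi/5)]); % illustrative A(z), order 2
x = filter(1, A, w);             % correlated signal: x[n] = w[n] through 1/A(z)
% Stage 1: predict x[n] from its past samples via P(z) = 1 - A(z)
Pz = [0, -A(2:end)];             % strictly causal predictor coefficients
xHat = filter(Pz, 1, x);         % predicted samples
% Stage 2: subtract the prediction to obtain the prediction error
e = x - xHat;
% Direct whitening with the prediction-error filter A(z)
eDirect = filter(A, 1, x);
fprintf('Maximum difference between the two schemes = %g\n', ...
    max(abs(e - eDirect)))       % should be at machine-precision level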
In summary, the filter $1/H(z)$ is often used to perform decorrelation of a signal $x[n]$, while $H(z)$ is called the synthesis filter because it can generate a signal with a given PSD from white noise $w[n]$.

When $x[n]$ is a correlated signal, the decorrelation procedure decreases its power and is typically beneficial for signal compression (bit rate reduction) purposes.
The power of the prediction error is $\sigma_w^2$, which is the geometric mean of the PSD $S_x(e^{j\omega})$, while the power $P_x$ of $x[n]$ is the (arithmetic) average of $S_x(e^{j\omega})$. Hence, linear prediction reduces the signal power by the factor

$$G_p = \frac{P_x}{\sigma_w^2} = \frac{\dfrac{1}{2\pi}\displaystyle\int_{-\pi}^{\pi} S_x(e^{j\omega})\, d\omega}{\exp\left(\dfrac{1}{2\pi}\displaystyle\int_{-\pi}^{\pi} \ln S_x(e^{j\omega})\, d\omega\right)}. \qquad \text{(A.100)}$$
The ratio $\mathrm{SFM} = \sigma_w^2 / P_x = 1/G_p$ (geometric mean over arithmetic mean) is called the spectral flatness measure. Assuming the PSD was estimated at $K$ discrete points $S_x[k]$, $k = 0, 1, \ldots, K-1$, the SFM can be alternatively obtained by

$$\mathrm{SFM} = \frac{\left(\displaystyle\prod_{k=0}^{K-1} S_x[k]\right)^{1/K}}{\dfrac{1}{K}\displaystyle\sum_{k=0}^{K-1} S_x[k]}. \qquad \text{(A.101)}$$
Because the geometric mean is never larger than the arithmetic mean, $0 < \mathrm{SFM} \le 1$, and $\mathrm{SFM} = 1$ only if $S_x(e^{j\omega})$ is constant.

The prediction gain $G_p$ (in linear scale) coincides with $1/\mathrm{SFM}$. Small values of SFM indicate a concentration of power in narrow portions of the PSD and, consequently, large prediction gains, while SFM values close to one indicate the PSD is approximately flat, which corresponds to small prediction gains.
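The sketch below computes the SFM of Eq. (A.101) and the corresponding prediction gain $1/\mathrm{SFM}$ for an arbitrary example PSD (an AR(2) shape chosen only for illustration); the geometric mean is obtained via the mean of logarithms to avoid numerical problems with long products.

% SFM and prediction gain from K discrete PSD samples, as in Eq. (A.101)
K = 512;                          % number of PSD samples
Az = poly([0.9*exp(1j*pi/4) 0.9*exp(-1j*pi/4)]); % illustrative AR(2) shape
H = freqz(1, Az, K);              % frequency response on [0, pi)
S = abs(H).^2;                    % PSD samples for a unit-variance innovation
geoMean  = exp(mean(log(S)));     % geometric mean computed via logarithms
aritMean = mean(S);               % arithmetic mean
SFM = geoMean/aritMean;           % spectral flatness measure, 0 < SFM <= 1
fprintf('SFM = %g, prediction gain = %g dB\n', SFM, 10*log10(1/SFM))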
Autoregressive (AR) linear prediction
This discussion is related to Section 4.8. There are computationally efficient methods to estimate $H(z)$ when it is restricted to be an autoregressive (all-pole) filter $H(z) = 1/A(z)$. For example, the Yule-Walker equations provide an efficient method to estimate $A(z)$ [Hay01]. In this case, it is assumed that $x[n]$ is a realization of an ergodic autoregressive random process of order $P$, denoted as AR($P$).
The problem can then be posed mathematically as finding the FIR filter $A(z)$ that minimizes the energy of the output (prediction error) signal $e[n]$ in the filtering process $E(z) = A(z)\,X(z)$.
In this case, the prediction-error filter can be written as $A(z) = 1 - P(z)$, where the predictor $P(z) = \sum_{k=1}^{P} p_k z^{-k}$ produces an estimate of $x[n]$ as

$$\hat{x}[n] = \sum_{k=1}^{P} p_k\, x[n-k], \qquad \text{(A.102)}$$

based on the $P$ past samples of $x[n]$. With this convention, $A(z) = 1 - \sum_{k=1}^{P} p_k z^{-k}$ has coefficient vector $[1, -p_1, \ldots, -p_P]$.
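As an illustration of the Yule-Walker approach mentioned above, the sketch below (assuming the Signal Processing Toolbox functions xcorr, levinson and aryule, and an illustrative AR(2) process) estimates $A(z)$ from the biased autocorrelation of $x[n]$ via the Levinson-Durbin recursion and compares the result with aryule, which solves the same equations.

% Estimating A(z) via the Yule-Walker equations (Levinson-Durbin recursion)
randn('seed', 0);
N = 100000;
w = randn(1, N);                  % unit-variance innovation
A0 = poly([0.95*exp(1j*pi/6) 0.95*exp(-1j*pi/6)]); % true AR(2) coefficients
x = filter(1, A0, w);             % realization of the AR(2) process
P = 2;                            % model order
r = xcorr(x, P, 'biased');        % biased autocorrelation, lags -P..P
r = r(P+1:end);                   % keep lags 0..P
[Ayw, Pinn] = levinson(r, P);     % solve Yule-Walker via Levinson-Durbin
[Aar, PinnAr] = aryule(x, P);     % same estimate using aryule
disp([A0; Ayw; Aar])              % true and estimated coefficients
fprintf('Innovation variance estimates: %g (levinson), %g (aryule)\n', ...
    Pinn, PinnAr)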
AR-based modeling using (A.102) is widely used in speech coding and, because $\hat{x}[n]$ in (A.102) is a linear combination of past samples, it is called linear predictive coding (LPC) (see Section 4.8). Listing A.14 provides two examples of linear prediction: one in which $x[n]$ is indeed an AR process and another in which it is an ARMA process.
randn('seed',1); % reset number generator. Could use rng('default')
useARProcess = 0; % 1 for AR or 0 for ARMA
N = 100000; % Number of samples
inn = randn(1, N); % innovation: WGN with zero mean and unit variance
if useARProcess == 1 % AR process
    fprintf('AR process example\n')
    Bz = 4; % H(z) = Bz/Az is an all-poles filter
    Az = poly([0.98*exp(1j*pi/4) 0.98*exp(-1j*pi/4)]);
    x = filter(Bz, Az, inn); % realization of an AR(2) process
    P = 2; % adopt the correct order of H(z)
else % ARMA process
    fprintf('ARMA process example\n')
    Bz = 6*poly([0.7*exp(1j*pi/3) 0.7*exp(-1j*pi/3)]);
    Az = poly([0.9*exp(1j*pi/4) 0.9*exp(-1j*pi/4)]);
    x = filter(Bz, Az, inn); % realization of an ARMA process
    P = 10; % it is not AR, so choose a relatively high value
end
[A, Pinn] = aryule(x, P); % prediction, P: order and Pinn: error variance
% Given the filter M(z) = 1/A(z), the inverse M^{-1}(z) = A(z)
% estimates the innovation sequence with x as input:
estimatedInnovation = filter(A, 1, x); % predict
Hw = freqz(Bz, Az); % frequency response
Spsd = abs(Hw).^2 + eps; % convert to scaled PSD and avoid zeros
Sgeo = geo_mean(Spsd); % geometric mean: power of innovation
Sarit = mean(Spsd); % arithmetic mean: power of input signal
predictionGain = 10*log10(Sarit/Sgeo); % prediction gain in dB
fprintf('prediction gain = %g dB \n', predictionGain)
fprintf('Signal power (time-domain) = %g \n', mean(x.^2))
fprintf('Signal power (via PSD arithmetic mean) = %g \n', Sarit)
fprintf('Innovation power (via PSD geometric mean) = %g \n', Sgeo)
fprintf('Innovation power (via AR modeling) = %g \n', Pinn)
fprintf('Innovation power (time-domain) = %g \n', ...
    mean(estimatedInnovation.^2))
% freqz(Bz,Az) % frequency response, if useful
The result of running Listing A.14 with useARProcess = 1 is the following:
AR process example
prediction gain = 14.0937 dB
Signal power (time-domain) = 422.558
Signal power (via PSD arithmetic mean) = 412.08
Innovation power (via PSD geometric mean) = 16.0552
Innovation power (via AR modeling) = 16.0843
Innovation power (time-domain) = 16.0819
Running Listing A.14 with useARProcess = 0 leads to the following:
ARMA process example
prediction gain = 2.99342 dB
Signal power (time-domain) = 72.056
Signal power (via PSD arithmetic mean) = 71.8227
Innovation power (via PSD geometric mean) = 36.0512
Innovation power (via AR modeling) = 36.1937
Innovation power (time-domain) = 36.1937
In both cases the results were consistent. In the second case, an order of 10 was adopted for the AR model. Note that moving the poles closer to the unit circle increases the prediction gain. This occurs because, as the poles approach the unit circle, the frequency response (and the power spectral density) tends to present a sharper peak with higher magnitude. This, in turn, increases the arithmetic mean of the PSD more significantly than its geometric mean, which implies an increased prediction gain, as inferred from Eq. (A.100) and illustrated by the sketch below.
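The following sketch recomputes the prediction gain of Eq. (A.100) for an AR(2) denominator with poles at angles $\pm\pi/4$ (as in Listing A.14) and two pole radii, 0.9 and 0.98, chosen only for illustration: as the radius grows, the PSD peak sharpens and the gain increases.

% Effect of the pole radius on the prediction gain of Eq. (A.100)
for radius = [0.9 0.98]
    Az = poly([radius*exp(1j*pi/4) radius*exp(-1j*pi/4)]); % AR(2) denominator
    H = freqz(1, Az, 512);            % frequency response on [0, pi)
    S = abs(H).^2;                    % PSD for a unit-variance innovation
    Gp = mean(S)/exp(mean(log(S)));   % arithmetic over geometric mean
    fprintf('pole radius = %.2f: prediction gain = %g dB\n', ...
        radius, 10*log10(Gp))
end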