A.22  Advanced: Vector Prediction Exploring Spatial Correlation

Instead of exploring correlation over time, this section discusses methods that explore the so-called spatial correlation: an element of a random vector is estimated based on the other elements of this vector. As the signal model, the Gaussian block or “packet” ISI channel of [CF97] (page 84, Eq. 4.6) is adopted here, which is given by

Y = HX + N,
(A.101)

where X is a zero-mean complex random input m-vector, Y is a zero-mean complex random output n-vector, N is a complex zero-mean Gaussian noise n-vector independent of X, and H is the complex n × m channel matrix [CF97]. For these random vectors, the correlation matrices coincide with the covariance matrices.

The output covariance matrix is given by

Ryy = HRxxHᴴ + Rnn,
(A.102)

where the superscript ᴴ denotes the Hermitian (conjugate transpose).
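
The relation in Eq. (A.102) can be checked numerically. The sketch below, with arbitrarily chosen dimensions, channel matrix and (diagonal) covariances, draws many realizations of X and N, forms Y according to Eq. (A.101), and compares the sample covariance of Y with HRxxHᴴ + Rnn:

m=3; n=4; numRealizations=1e5; %arbitrary dimensions for illustration
H = randn(n,m) + 1j*randn(n,m); %some complex channel matrix
Rxx = 2*eye(m); Rnn = 0.5*eye(n); %assumed input and noise covariances (diagonal)
X = sqrt(Rxx)*(randn(m,numRealizations)+1j*randn(m,numRealizations))/sqrt(2);
N = sqrt(Rnn)*(randn(n,numRealizations)+1j*randn(n,numRealizations))/sqrt(2);
Y = H*X + N; %Eq. (A.101)
RyyTheory = H*Rxx*H' + Rnn %Eq. (A.102)
RyyEstimated = (Y*Y')/numRealizations %sample covariance, approaches RyyTheory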

A characteristic representation of a random m-vector X is given by the linear combination of the columns of a matrix F, whose determinant is equal to one, weighted by a vector of uncorrelated random variables V, that is:

X = FV.
(A.103)

Hence, the covariance matrix of X is given by:

Rxx = FRvvFᴴ,
(A.104)

where Rvv is diagonal (the random variables in V are uncorrelated).
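
As a minimal sketch of Eqs. (A.103) and (A.104), assume an arbitrary monic matrix F and a diagonal Rvv; generating realizations of X = FV confirms that the sample covariance of X approaches FRvvFᴴ:

F = [1 0 0; 0.5 1 0; 0.25 0.5 1]; %arbitrary matrix with determinant equal to one
Rvv = diag([16 16 16]); %V has uncorrelated elements: diagonal covariance
numRealizations = 1e5;
V = sqrt(Rvv)*randn(3,numRealizations); %realizations of V
X = F*V; %Eq. (A.103)
RxxTheory = F*Rvv*F' %Eq. (A.104)
RxxEstimated = (X*X')/numRealizations %sample covariance, approaches RxxTheory

Incidentally, this particular choice of F and Rvv reproduces the matrix Ryy = [16 8 4; 8 20 10; 4 10 21] used later in Listing A.15.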

There are two alternatives of interest for representing a vector in its characteristic form: the modal and the innovations representation. The first is derived from the eigendecomposition of Rxx. Given the factorization Rxx = UΛx²Uᴴ = (UΛx)(UΛx)ᴴ, by comparison with Eq. (A.104), F corresponds to the unitary matrix U from the eigendecomposition, while the uncorrelated vector V from Eq. (A.103) corresponds to U−1X. Meanwhile, the latter (innovations representation) is derived from the Cholesky decomposition. In a similar manner, given the factorization Rxx = LDx²Lᴴ = (LDx)(LDx)ᴴ, F corresponds to the lower triangular matrix L, while V corresponds to L−1X.

The important conclusion yielded by these two representations is that a vector X whose random variables are correlated can be whitened by a forward section given by U−1, the inverse of the unitary matrix from the eigendecomposition of its covariance matrix, or L−1, the inverse of the lower triangular matrix from the Cholesky decomposition of its covariance matrix.

The innovations representation is a natural adaptation of linear prediction over time and is obtained with a Cholesky factorization of Ryy, while the modal representation can be obtained via eigenanalysis or SVD.
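
Both whitening alternatives can be illustrated with the short sketch below. It assumes an example (real-valued) covariance matrix and obtains the monic lower triangular factor from the built-in chol, instead of the ldl_dg helper used in Listing A.15; the outputs of the forward sections U−1 = Uᴴ and L−1 have approximately diagonal sample covariances:

Rxx = [16 8 4; 8 20 10; 4 10 21]; %assumed (example) covariance matrix
[U, Lambda2] = eig(Rxx); %modal: Rxx = U*Lambda2*U', with U unitary
R = chol(Rxx); L = R'/diag(diag(R')); D = diag(diag(R').^2); %innovations: Rxx = L*D*L'
numRealizations = 1e5;
X = sqrtm(Rxx)*randn(3,numRealizations); %correlated realizations of X
Vmodal = U'*X; %forward section U^{-1} = U' (unitary)
Vinnov = L\X;  %forward section L^{-1}
RvModal = (Vmodal*Vmodal')/numRealizations %approximately diagonal (Lambda2)
RvInnov = (Vinnov*Vinnov')/numRealizations %approximately diagonal (D)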

The optimum MMSE linear predictor in this scenario is

Y~ = PY,
(A.105)

where the predictor matrix P is given by

P = I − L−1,
(A.106)

with L obtained from the innovations representation Ryy = LDy²Lᴴ and I being the identity matrix. It was assumed that Ryy is nonsingular; otherwise, the pseudoinverse can be used.

Because L is lower triangular and monic, its inverse is also lower triangular and monic. The subtraction of L−1 from I makes P lower triangular with zeros along the main diagonal. This structure imposes a causal relation among the elements of Y, such that Y~ can be obtained recursively.
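
A minimal sketch of this recursion, assuming the same example covariance matrix as before and an arbitrary realization y, shows that each element of Y~ depends only on previously available elements of Y and coincides with the matrix product PY:

Ryy = [16 8 4; 8 20 10; 4 10 21]; %assumed covariance of Y
R = chol(Ryy); L = R'/diag(diag(R')); %monic lower triangular factor of Ryy
P = eye(3) - inv(L); %Eq. (A.106): strictly lower triangular predictor
y = [4; 2; 6]; %one arbitrary realization of Y
ytilde = zeros(3,1); %ytilde(1) is always zero
for i=2:3
    ytilde(i) = P(i,1:i-1)*y(1:i-1); %uses only already available elements of y
end
ytilde
P*y %matrix form yields the same result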

The error vector is

E = Y − Y~ = Y − PY = Y − (I − L−1)Y = L−1Y.
(A.107)

In general, the sum mean-squared prediction error is

𝔼[||E||²] = 𝔼[||Y − Y~||²] = trace{Ree},
(A.108)

where Ree = 𝔼[EEᴴ] is the autocorrelation matrix of E. It can be proved (see, e.g., [BLM04]) that when the optimum linear predictor of Eq. (A.106) is adopted, the error power trace{Ree} achieves its minimum value, given by trace{Dy²}. This avoids the step of estimating Ree to obtain the prediction gain, which is given by

prediction gain = 10 log10 (trace{Ryy}/trace{Ree}) = 10 log10 (trace{Ryy}/trace{Dy²}) dB.
(A.109)
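
In fact, since E = L−1Y, the error covariance is Ree = L−1Ryy(L−1)ᴴ = Dy², which is why the minimum error power and the prediction gain follow directly from the factorization. A quick numerical check, reusing the matrix Ryy adopted later in Listing A.15, is sketched below:

Ryy = [16 8 4; 8 20 10; 4 10 21]; %same matrix used in Listing A.15
R = chol(Ryy); L = R'/diag(diag(R')); %monic lower triangular factor of Ryy
Ree = inv(L)*Ryy*inv(L)' %equals the diagonal matrix Dy^2 (up to round-off)
predictionGain = 10*log10(trace(Ryy)/trace(Ree)) %0.7463 dB, as in Eq. (A.109)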

Hence, making an analogy with prediction over time, repeated here for convenience:

X[n] → Mx−1(z) → I[n] → Mx(z) → X[n]

the spatial prediction allows one to obtain

Y → L−1 → E → L → Y,

which is expressed in matrix notation as E = L−1Y and Y = LE.
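
A minimal numerical sketch of this analysis/synthesis pair, assuming the same example Ryy and one random realization of Y, is:

Ryy = [16 8 4; 8 20 10; 4 10 21]; %assumed covariance of Y
R = chol(Ryy); L = R'/diag(diag(R')); %monic lower triangular factor of Ryy
y = sqrtm(Ryy)*randn(3,1); %one realization of Y
e = L\y; %forward (whitening) section: E = L^{-1} Y
yRecovered = L*e %inverse section: Y = L E recovers y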

Listing A.15 illustrates an example discussed in [BLM04].

Listing A.15: MatlabOctaveCodeSnippets/snip_appprediction_spatialLinearPredictionExample.m
%Example 10-11 from Barry, 2004 (note a typo in matrix R in the book)
Ryy=[16 8 4; 8 20 10; 4 10 21] %noise autocorrelation, correlated
%Ryy=[10, 8, 2; 8 10 10; 2 10 10]; %another option, higher gain
[L D] = ldl_dg(Ryy) %own LDL, do not use chol(A) because it swaps rows
Ryy-L*D*L' %check: L*D*L' should equal Ryy, so this difference should be ~zero
P=eye(size(L))-inv(L) %optimum MMSE linear predictor
minMSE=trace(D) %minimum MSE is trace{Ree} = trace{D}
sumPowerX=trace(Ryy); %sum of the powers of all "users" (elements of Y)
predictionGain = 10*log10(sumPowerX/minMSE)

In Listing A.15, the original predictor matrix is

     P =     [0,         0,         0;
         0.5000,         0,         0;
              0,    0.5000,         0]

and the prediction gain is 0.7463 dB. Adopting a new correlation matrix Ryy=[10, 8, 2; 8 10 10; 2 10 10] leads to

     P =     [0,         0,         0;
         0.8000,         0,         0;
        -1.6667,    2.3333,         0]

and a prediction gain of 9.2082 dB.

Note that the first element y~1 of Y~ = [y~1,…,y~n]ᵀ in Y~ = PY is always zero due to the structure of P. Then, the second element y~2 is a scaled version of the first element y1 of Y, and so on.
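
For instance, taking the second predictor matrix above and an arbitrary realization y, this structure can be spelled out as:

P = [0 0 0; 0.8 0 0; -1.6667 2.3333 0]; %second predictor matrix above
y = [1; 2; 3]; %arbitrary realization of Y
ytilde = P*y %[0; 0.8*y(1); -1.6667*y(1)+2.3333*y(2)], i.e., approximately [0; 0.8; 3.0]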