It is often useful to infer whether or not random variables are related to each other via correlation. For example, height and weight are quantities that present positive correlation for the human population. However, happiness and height are uncorrelated. There are measures other than correlation to detect similarities among variables, such as the mutual information, but correlation is simple and yet very useful. Correlation is called a second-order statistics because it considers only pairwise or quadratic dependence. It does not detect non-linear relations nor causation, that is, correlation cannot deduce cause-and-effect relationships. But correlation helps to indicate trends such as one quantity increasing when another one decreases.

The correlation C for two complex-valued random variables X and Y is defined as

C = cor(X,Y) = 𝔼[XY],

where denotes the complex conjugate. The correlation coefficient, which is a normalized version of the covariance (not correlation), is defined as

ρ = cov(X,Y) σxσy ,

where the covariance is given by

cov (X,Y) = 𝔼[(X μx)(Y μy)].
(1.48)

The discussion in this chapter is restricted to real-valued random variables. The second-order characterization of complex random variables is more involved, as it must account for interactions between real and imaginary parts. When the random variables are real-valued, Eq. (1.48) simplifies to

cov (X,Y) = 𝔼[XY] μxμy.

Two real-valued random variables are called uncorrelated if, and only if, cov(X,Y) = 0, which is equivalent to

𝔼[XY] = μxμy.
(1.49)

As mentioned, ρ is calculated by extracting the means 𝔼[X] = μx and 𝔼[Y] = μy, i. e., using cov(X,Y) instead of cor(X,Y), and is restricted to 1 ρ 1 when the random variables are real-valued. Application 1.2 gives an example of using correlation to perform a simple data mining.

1.10.1  Autocorrelation functions

Autocorrelation functions are extension of the correlation concept to signals. Table 1.10 summarizes the definitions that will be explored in this text.

Table 1.10: Autocorrelation functions, their respective equation numbers and symbols.
Type of process or signal Equation Adopted symbol
General stochastic process X(t) (1.50) RX(s,t)
Wide-sense stationary stochastic process X(t) (1.51) RX(τ)
Deterministic continuous-time energy signal x(t) (1.52) RX(τ)
Deterministic continuous-time power signal x(t) (1.54) RX(τ)
Deterministic discrete-time power signal x[n] (1.56) RX[]
Deterministic discrete-time energy signal x[n] (unnormalized) (1.59) RX[]
Deterministic discrete-time energy signal x[n] (normalized) (1.60) RX[]

The existence of distinct autocorrelation definitions for power and energy signals illustrate the importance of distinguishing these two categories of signals, as discussed in Section 1.6.4. The reader that is not familiar with stochastic processes, may skip the next subsection and first focus on autocorrelation for deterministic signals.

Advanced: Definitions of autocorrelation for stochastic processes

An autocorrelation function (or ACF) RX(s,t) defined for stochastic processes (see Appendix B.4) is

RX(s,t) = 𝔼[X(s)X(t)],
(1.50)

which corresponds to the correlation between random variables X(s) and X(t) at two different time instants t and s of the same random process where X(t) is the complex conjugate of X(t). The purpose of the ACF is to determine the strength of relationship between amplitudes of the signal occurring at two different time instants.

For wide-sense stationary (WSS) processes, the ACF is given by

RX(τ) = 𝔼[X(t + τ)X(t)],
(1.51)

where the time difference τ = s t is called the lag (or delay) time.

The random process formalism is very powerful and allows for representing complicated processes. But in many practical cases only one realization x(t) (or x[n]) of the random process X(t) is available. The alternative to deal with the signal without departing from the random process formalism is to assume that X(t) is ergodic. In this case, the statistical (or ensemble) average represented by 𝔼[] is substituted by averages over time.

The definitions of autocorrelation given by Eq. (1.50) and Eq. (1.51) are the most common in signal processing, but others are adopted in fields such as statistics. Choosing one depends on the application and the model adopted for the signals.

When dealing with a simple signal, one can naturally adopt one of the definition of autocorrelation for deterministic signals discussed in the sequel.

Definitions of autocorrelation for deterministic signals

A definition of ACF RX(τ) tailored to a continuous-time deterministic (non-random) energy signal x(t), which does not use expected values as Eq. (1.50), is20

RX(τ) =x(t + τ)x(t)dt,
(1.52)

such that

RX(0) =|x(t)|2dt = E,
(1.53)

where E is the signal energy. It should be noted that this definition of autocorrelation cannot be used for power signals. Power signals have infinite energy and using Eq. (1.52) for τ = 0 leads to RX(0) = in the case of power signals, which is uninformative.

If x(t) is a deterministic power signal, an useful ACF definition is

RX(τ) = lim T 1 2TT T x(t + τ)x(t)dt,
(1.54)

where it is assumed that the limit exists for all τ. Then, the autocorrelation value at τ = 0 is

RX(0) = lim T 1 2TT T |x(t)|2dt = P,
(1.55)

where P is the signal average power.21

Similarly, there are distinct definitions of autocorrelation RX[] for deterministic discrete-time signals. When infinite duration power signals can be assumed, it is sensible to define

RX[] = lim N 1 2N n=NNx[n + ]x[n].
(1.56)

In this case, unless x[n] has infinite energy, RX[] would be always zero due to the division. Similar to τ in continuous-time RX(τ), is called the lag or delay.

For discrete-time energy signals x[n] that are defined for all values of n, one general definition is

RX[] = n=x[n + ]x[n] = nx[n + ]x[n].
(1.57)

When x[n] is a finite-duration signal with N non-zero samples, from n = 0 to N 1, the (unscaled or not normalized) autocorrelation RX[] can be written as22

R^ X [] = n=0N1||x[n + ||]x[n].
(1.58)

An alternative definition, which mirrors the autocorrelation expression used for infinite-length signals, is the unnormalized (non-scaled) autocorrelation

RX[] = nx[n + ]x[n], = (N 1),,1,0,1,,N 1
(1.59)

where it is assumed that x[n] = 0 whenever n lies outside the interval 0 n N 1. This corresponds to assuming the signal is extended with enough zeros (zero-padding) to the right and to the left. Under this zero-padding assumption, the summation can be written without explicitly adjusting the limits for each lag.

For example, assuming x[n] = δ[n] + 2δ[n 1] + 3δ[n 2], which can be represented by the vector [1,2,3] with N = 3 elements, the autocorrelation RX[] = 3δ[ + 2] + 8δ[ + 1] + 14δ[] + 8δ[ 1] + 3δ[ 2], which corresponds to the vector [3,8,14,8,3]. The calculations are detailed in Table 1.11, where it can be noted that, for a given lag , the subtraction of the indexes of all parcels in valid products is always (n + ||) n = ||. For instance, for the row corresponding to lag = 1 in Table 1.11, subtracting the indices of both parcels in the last column lead to || = | 1| = 1 (that is: 1 0 = 1; and 2 1 = 1).

Table 1.11: Example of autocorrelation using Eq. (1.58) for the real-valued signal x[n] = δ[n] + 2δ[n 1] + 3δ[n 2] with N = 3.
lag RX[] range of n

valid products x[n+||]x[n]

2 3 0

x[2]x[0]

1 8 0, 1

x[1]x[0]+x[2]x[1]

0 14 0, 1, 2

x[0]x[0]+x[1]x[1]+x[2]x[2]

1 8 0, 1

x[1]x[0]+x[2]x[1]

2 3 0

x[2]x[0]

Note in Table 1.11 that the number of valid products decreases as || increases. More specifically, when computing RX[] there are only N || “valid” products. To cope with that, there is an alternative to Eq. (1.58) called the normalized (or unbiased23) correlation, defined as

RX[] = 1 N || nx[n + ]x[n], = (N 1),,N 1.
(1.60)

The unnormalized definition of Eq. (1.59) is also called the biased autocorrelation. For deterministic and finite-duration signals, the biased version has the advantage of guaranteeing that the autocorrelation property |RX[]| RX[0] is obeyed.

Example 1.54. Calculating the correlation with Matlab/Octave. In Matlab/Octave, the unbiased estimate for the signal represented by the vector [1,2,3] can be obtained with:

1x=[1,2,3];xcorr(x,'unbiased')

which outputs [3,4,4.67,4,3]. The unscaled estimate of Table 1.11 can be obtained with xcorr(x,’none’) or simply xcorr(x) because it is the default.    

Table 1.12 provides another example of autocorrelation calculation, this time for the complex-valued signal x[n] = (1 + j)δ[n] + 2δ[n 1] + 3δ[n 2] with N = 3 non-zero values, corresponding to the vector [1 + j,2,3] (n = 0,1,2), where j = 1.

Table 1.12: Example of autocorrelation using Eq. (1.58) for the complex-valued signal x[n] = (1 + j)δ[n] + 2δ[n 1] + 3δ[n 2] + 4δ[n 3] with N = 4.
lag RX[] range of n

valid products x[n+||]x[n]

3 4 + 4j 0

x[3]x[0]

2 11 + 3j 0, 1

x[2]x[0]+x[3]x[1]

1 20 + 2j 0, 1, 2

x[1]x[0]+x[2]x[1]+x[3]x[2]

0 31 0, 1, 2, 3

x[0]x[0]+x[1]x[1]+x[2]x[2]+x[3]x[3]

1 20 2j 0, 1, 2

x[1]x[0]+x[2]x[1]+x[3]x[2]

2 11 3j 0, 1

x[2]x[0]+x[3]x[1]

3 4 4j 0

x[3]x[0]

Table 1.11 and Table 1.12 illustrate that autocorrelation presents Hermitian symmetry: RX[] = RX[]. For real-valued signals, this property simplifies to RX[] = RX[]. Hermitian symmetry RX(τ) = RX(τ) is also observed for autocorrelations of continuous-time signals.

It is important to follow the adopted definition when calculating autocorrelation and also note that there are alternatives that are mathematically equivalent. For instance, consider the following two alternatives of writing Eq. (1.52):

RX(τ) =x(t + τ)x(t)dt =x(t τ)x(t)dt,
(1.61)

The equivalence of the two alternatives in Eq. (1.61) can be proved as follows:

Proof: Let x L2() and define

RX(τ) =x(t + τ)x(t)dt.

Perform the change of variable u = t + τ. Then t = u τ and dt = du. The limits remain ] ,[, such that

RX(τ) =x(u)x(u τ)du.

Using commutativity of complex multiplication and renaming the dummy variable,

RX(τ) =x(t τ)x(t)dt.

This equivalence is also observed in autocorrelation definitions for discrete-time signals. For instance,

RX[] = nx[n + ]x[n] = nx[n ]x[n].

As pointed out in the next example, one should pay attention to indexing when choosing between the two mentioned alternatives.

Example 1.55. Simple software implementations of autocorrelation. Some previous definitions of RX[] used a generic summation n over n. To be more concrete, an example of Matlab/Octave code to calculate the unscaled autocorrelation RX[] is given in Listing 1.22 using two alternatives that lead to the same result. It can be seen that the property RX[] = RX[] is used to obtain the autocorrelation values for negative lags .

Listing 1.22: MatlabOctaveCodeSnippets/snip_signals_unscaled_autocorrelation.m. [ Python version]
1%Calculate the unscaled autocorrelation R(lag) of x 
2x=[1+1j 2 3 4]; %define some vector x to test the code 
3N=length(x); %number of non-zero signal values 
4%% Option 1) x(n+lag)*conj(x(n)) 
5R=zeros(1,N); %space for lag=0,1,...N-1 
6R(1)=sum(abs(x).^2); %R(0) is the energy 
7for lag=1:N-1 %for each positive lag 
8    temp = 0; %partial value of R 
9    for n=1:N-lag %vary n over valid products 
10        temp = temp + x(n+lag)*conj(x(n)); 
11    end 
12    R(lag+1)=temp; %store final value of R 
13end 
14R = [conj(fliplr(R(2:end))) R] %apply Hermitian symmetry 
15%% Option 2) conj(x(n-lag))*x(n) 
16R_alternative=zeros(1,N); %space for lag=0,1,...N-1 
17R_alternative(1)=sum(abs(x).^2); %R(0) is the energy 
18for lag=1:N-1 %for each positive lag 
19    temp = 0; %partial value of R 
20    for n=lag+1:N %vary n over valid products 
21        temp = temp + conj(x(n-lag))*x(n); 
22    end 
23    R_alternative(lag+1)=temp; %store final value of R 
24end 
25%apply Hermitian symmetry: 
26R_alternative = [conj(fliplr(R_alternative(2:end))) R_alternative] 
27error1 = R - xcorr(x) %compare with xcorr 
28error2 = R - R_alternative %compare the 2 alternatives  

Note the indexing for n=1:N-lag versus for n=lag+1:N that also distinguishes the two options.

The function xcorr in Matlab/Octave uses a much faster implementation based on the fast Fourier transform (FFT), to be discussed in Chapter 2. When comparing the results of the two methods, the discrepancy is around 1014, which is a typical order of magnitude for numerical errors when working with Matlab/Octave.    

Examples of some signals autocorrelations

Two examples of autocorrelation are discussed in the sequel.

Example 1.56. Autocorrelation of white signal. A signal is called “white” when it has an autocorrelation R[] = [] in discrete-time, or R(τ) = (τ) in continuous-time, where A is an arbitrary value. In other words, a white signal has an autocorrelation that is nonzero only at the origin, which corresponds to having neighbor samples that are uncorrelated.

This nomenclature will be clarified in Chapter 4 but it can be anticipated that such signals have their power uniformly distributed over frequency and the name is inspired by the property of white light, which is composed by a mixture of color wavelengths.

As mentioned, the samples of a white signal are independent and, if they are also identically distributed according to a Gaussian PDF, the signal is called white Gaussian noise (WGN). More strictly, a WGN signal can be modeled as a realization of a wide-sense stationary process. In this case, WGN denotes the stochastic process itself. Using Matlab/Octave, a discrete-time realization of WGN can be obtained with function randn, as illustrated in Section 1.6.2.    

Example 1.57. Autocorrelation of sinusoid. Using Eq. (1.54), the autocorrelation of a sinusoid x(t) = Asin (ωt + ϕ) can be calculated as follows:

R(τ) = lim T [ 1 2TT T Asin (ωt + ϕ)Asin (ωt + ωτ + ϕ)dt].

We now assume a = ωt + ϕ for simplicity and use Eq. (A.4) to expand sin (ωt + ϕ + ωτ):

R(τ) = lim T [ A2 2TT T sin (a) [sin (a)cos (ωτ) + sin (ωτ)cos (a)]dt] = A2 lim T [ cos (ωτ) 2T T T sin 2(a)dt + sin (ωτ) 2T T T sin (a)cos (a)dt] = A2 lim T [ cos (ωτ) 4T T T [1 cos (2a)]dt + sin (ωτ) 4T T T sin (2a)dt] = A2 lim T [ cos (ωτ) 2T 1 2T T dt] = A2 cos (ωτ) 2 . (1.62)

The third equality used Eq. (A.9) and Eq. (A.5), and it was simplified because the integrals of both cos (2a) and sin (2a) are zero given the integration interval is an integer number of their periods. The final result indicates that the autocorrelation of a sinusoid or cosine does not depend on the phase ϕ and is also periodic in τ, with the same period 2πω that the sinusoid has in t. Therefore, the frequencies contained in the realizations of a stationary random process can be investigated via the autocorrelation R(τ) of this process.

PIC

Figure 1.61: A sinusoid of period N=8 samples and its autocorrelation, which is also periodic each 8 lags. The cosine corresponding to Rx[l] has amplitude A22 = 422 = 8.

A simulation with Matlab/Octave can help understanding this result. Figure 1.61 was generated with Listing 1.23.

Listing 1.23: MatlabOctaveCodeSnippets/snip_signals_sinusoid_autocorrelation.m. [ Python version]
1numSamples = 48; %number of samples 
2n=0:numSamples-1; %indices 
3N = 8; %sinusoid period 
4x=4*sin(2*pi/N*n); %sinusoid (try varying the phase!) 
5[R,l]=xcorr(x,'unbiased'); %calculate autocorrelation 
6subplot(211); stem(n,x); xlabel('n');ylabel('x[n]'); 
7subplot(212); stem(l,R); xlabel('Lag l');ylabel('R_x[l]');  

It can be seen that the x and R are a sine and cosine, respectively, of the same frequency in their corresponding domains (n and l, respectively).

PIC

Figure 1.62: The a) unbiased and b) raw (unscaled) autcorrelations for the sinusoid of Figure 1.61 with a new period of N=15 samples.

Figure 1.62 was obtained by changing the sinusoid period from N=8 to N=15 and illustrates the effects of dealing with finite-duration signals. Note that both the unbiased and unscaled versions have diminishing values at the end points.   

Using Eq. (1.56) and a calculation similar to the one used in Eq. (1.62), one can prove that the autocorrelation of x[n] = sin (Ωn + ϕ) is RX[] = cos (Ω)2.

1.10.2  Cross-correlation function

The cross-correlation function (also called correlation) is very similar to the ACF but uses two distinct signals, being defined for deterministic energy signals as

Rxy(τ) x(t + τ)y(t)dt =x(t)y(t τ)dt.

Note the adopted convention with respect to the complex conjugate.

In discrete-time, the deterministic cross-correlation for energy signals is

Rxy[l] n=x[n + l]y[n] = n=x[n]y[n l].

Application 1.6 illustrates the use of cross-correlation to align two signals.

Before concluding this section, it is convenient to recall that the autocorrelation of a signal inherits any periodicity that is present in the signal. Sometimes this periodicity is more evident in the autocorrelation than in the signal waveform. The next example illustrates this point by discussing the autocorrelation of a sinusoid immersed in noise.

Example 1.58. The power of a sum of signals is the sum of their powers in case they are uncorrelated. Assume that a sinusoid x[n] is contaminated by noise z[n] such that the noisy version of the signal is y[n] = x[n] + z[n]. The signal z[n] is a WGN (see Section 1.10.1.0) that is added to the signal of interest x[n] and, therefore, called additive white Gaussian noise (AWGN). If x[n] and z[n] are uncorrelated, such that Rxz[l] = 𝔼[x[n + l]z[n]] = 0,l, the autocorrelation Ry[l] of y[n] is given by

Ry[l] = 𝔼[y[n + l]y[n]] = 𝔼[(x[n + l] + z[n + l])(x[n] + z[n])] = Rx[l] + Rzx[l] + Rxz[l] + Rz[l] = Rx[l] + Rz[l].

PIC

Figure 1.63: Sinusoid of amplitude 4 V immersed in AWGN of power 25 W. The bottom graph is a zoom showing the first 100 samples.

PIC

Figure 1.64: Bottom graph: autocorrelation of the sine plus noise in Figure 1.63, top: autocorrelation of the sine and middle: autocorrelation of the noise.

Listing 1.24 illustrates a practical use of this result. A sine with amplitude 4 V and power 422 = 8 W is contaminated by AWGN with power of 25 W. All signals are represented by 4,000 samples, such that the estimates are relatively accurate.

Listing 1.24: MatlabOctaveCodeSnippets/snip_signals_noisy_sinusoid.m. [ Python version]
1%Example of sinusoid plus noise 
2A=4; %sinusoid amplitude 
3noisePower=25; %noise power 
4f=2; %frequency in Hz 
5n=0:3999; %"time", using many samples to get good estimate 
6Fs=20; %sampling frequency in Hz 
7x=A*sin(2*pi*f/Fs*n); %generate discrete-time sine 
8randn('state', 0); %Set randn to its default initial state 
9z=sqrt(noisePower)*randn(size(x)); %generate noise 
10clf, subplot(211), plot(x+z); %plot noise 
11maxSample=100; %determine the zoom range 
12subplot(212),stem(x(1:maxSample)+z(1:maxSample)),pause; %zoom 
13maxLags = 20; %maximum lag for xcorr calculation 
14[Rx,lags]=xcorr(x,maxLags,'unbiased'); %signal only 
15[Rz,lags]=xcorr(z,maxLags,'unbiased'); %noise only 
16[Ry,lags]=xcorr(x+z,maxLags,'unbiased');%noisy signal 
17subplot(311), stem(lags,Rx); ylabel('R_x[l]'); 
18subplot(312),stem(lags,Rz);ylabel('R_z[l]'); 
19subplot(313),stem(lags,Ry);xlabel('Lag l');ylabel('R_y[l]');  

The signal-to-noise ratio (SNR) is a common metric consisting of the ratio between the signal and noise power values and often denoted in dB (seem Appendix B.5). Figure 1.63 illustrates the fact that it is hard to visualize the sinusoid because of the relatively low signal-to-noise ratio SNR dB = 10log10(825) 5 dB. The waveform does not seem to indicate periodicity.

Figure 1.64 shows the autocorrelations of y[n] and its two parcels. For the lag l = 0, the estimated values are Rx[0] = 8, Rz[0] = 24.92 and Ry[0] = 33.17. The theoretical values are 8 W (the sine power), 25 W (noise power) and 8 + 25 = 33 W (sum of the parcels), respectively. The bottom graph clearly exhibits periodicity and the noise disturbs only the value of Ry[l] at l = 0. In summary, two assumptions can simplify the analysis of random signals: that the ACF of the noise is approximately an impulse at the origin (l = 0) and that the signal and noise are uncorrelated.  

Example 1.59. Power of the sum of two uncorrelated signals. Assume a signal z[n] = x[n] + y[n] is generated by summing two real signals (similar result can be obtained for complex-valued signals) x[n] and y[n] with power Px and Py. The question is: What is the condition for having Pz = Px + Py?

Assuming the two signals are random and using expected values (a similar result would hold for deterministic signals):

P z = 𝔼[Z2] = 𝔼[(X + Y)2] = P x + Py + 2𝔼[XY].
(1.63)

If X and Y are uncorrelated, i. e., 𝔼[XY] = 𝔼[X]𝔼[Y] and at least one signal is zero-mean, Eq. (1.63) simplifies to

P z = Px + Py.
(1.64)

This is a useful result for analyzing communication channels that model the noise as additive. These models assume the noise is uncorrelated to the transmitted signal and Eq. (1.64) applies.    

Advanced: Random processes cross-correlation and properties

Some important properties of the cross-correlation are:

When considering random processes, the two random variables are obtained from distinct processes:

RXY (s,t) = 𝔼[X(s)Y (t)]
(1.65)

or, assuming stationarity (see Section B.4.0.0) with s = t + τ:

RXY (τ) = 𝔼[X(t + τ)Y (t)].
(1.66)

Example 1.60. The AWGN channel. Thermal noise is ubiquitous and WGN is a random process model often present in telecommunications and signal processing applications. WGN was briefly introduced in Examples 1.561.58 and Application 1.3. In telecommunications, distinct systems that share the property of having WGN added at the receiver are called AWGN channel models.

Figure 1.65 illustrates a continuous-time AWGN, where the received signal r(t) = s(t) + ν(t) is simply the sum of the transmitted signal s(t) and WGN ν(t).

PIC

Figure 1.65: Continuous-time version of the AWGN channel model.

Because s(t) and WGN ν(t) are often assumed to be uncorrelated, as discussed in Example 1.58, the power of r(t) is the sum of the powers of s(t) and ν(t).    

Having learned about correlation (cross-correlation, etc.), it is possible to study a linear model for quantization, which circumvents dealing with the quantizer non-linearity.