A.13 Rectangular Integration to Define Normalization Factors for Functions

In several situations a computer is used to obtain points that should represent a continuous function $f (x)$ , $x \in ℝ$ . Two examples of this situation are the estimation of probability density functions (PDF) via histograms and power spectral density (PSD) estimation via an FFT routine.

Instead of aiming at an analytical expression $\hat{f}(x)$ to represent $f(x)$, the task consists of obtaining a set of points $\{\hat{f}(x[n])\}$ calculated at the values $x[n]$, $n = 0, \dots, N - 1$, which are a uniformly sampled version of the abscissa $x$.

Often it is possible to first obtain a set of values $\{ĝ(x[n])\}$ in which the value $ĝ(x[n])$ is proportional to $f(x[n])$, i.e., $f(x[n]) \propto ĝ(x[n])$. In this case, a scaling factor $κ \in ℝ$ must later be determined such that the final set of values representing $f(x)$ is obtained via

\hat{f} (x [n]) = κĝ (x [n]) .

(A.29)

Note that the goal is not necessarily to have $\hat{f}(x[n]) \approx f(x[n])$. There are situations in which the set of points $\{\hat{f}(x[n])\}$ must instead satisfy a given property. For example, when histograms are used to estimate probability mass functions, one desired property is that $\sum_{n} \hat{f}(x[n]) = 1$. Alternatively, the goal may be to scale the histogram such that the two resulting curves (normalized histogram and probability density function) coincide. The values of $κ$ are different for these two cases of histogram normalization, as discussed after recalling the rectangle method.

The rectangle method [urlBMrec] is used for approximating a definite integral:

\int_{a}^{b} f (x) dx \approx h ∑_{n = 0}^{N - 1} f (x [n]),

(A.30)

where $h = (b - a) ∕ N$ is the rectangle width and $x [n] = a + nh$ .
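As a quick numerical check of Eq. (A.30), the following Octave/MATLAB sketch approximates an integral whose exact value is known. The integrand $x^{2}$, the limits and the number of rectangles are illustrative assumptions, not values taken from the text.

```matlab
% Rectangle-method sketch for Eq. (A.30); f, a, b and N are illustrative choices
f = @(x) x.^2;             % integrand, whose exact integral over [0,1] is 1/3
a = 0; b = 1; N = 1000;    % integration limits and number of rectangles
h = (b - a) / N;           % rectangle width
xn = a + (0:N-1) * h;      % abscissa samples x[n] = a + n*h
I_approx = h * sum(f(xn)); % right-hand side of Eq. (A.30)
I_exact = 1/3;             % analytical value used for comparison
fprintf('approx = %.6f, exact = %.6f\n', I_approx, I_exact);
```

Because the samples are taken at the left edge of each rectangle, the approximation error for a smooth integrand decreases roughly in proportion to $h$.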

The rectangle method can be used, for instance, to relate the continuous-time convolution in Eq. (3.2) with its discrete-time counterpart in Eq. (3.1). Assuming $T_{s}$ is the sampling interval used to obtain the discrete-time signals $x [n]$ and $h [n]$ from $x (t)$ and $h (t)$ , respectively, the factor $T_{s}$ is required to better approximate the samples of $y (t) = x (t) * h (t)$ when using a discrete-time convolution:

y (n T_{s}) \approx T_{s} (x [n] * h [n]) .

(A.31)
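The role of $T_{s}$ in Eq. (A.31) can be illustrated with a pair of signals whose continuous-time convolution is known in closed form. The sketch below assumes $x (t) = e^{-t} u (t)$ and $h (t) = e^{-2t} u (t)$, for which $y (t) = e^{-t} - e^{-2t}$ for $t \ge 0$; these signals, the value of $T_{s}$ and the variable names are illustrative choices.

```matlab
% Sketch of Eq. (A.31); the signals and Ts are illustrative assumptions
Ts = 0.001;                    % sampling interval in seconds
t = 0:Ts:5;                    % time axis
x = exp(-t);                   % samples of x(t) = exp(-t) u(t)
himp = exp(-2*t);              % samples of h(t) = exp(-2t) u(t)
y_conv = conv(x, himp);        % discrete-time convolution
y_conv = y_conv(1:length(t));  % keep only samples within the simulated interval
y_approx = Ts * y_conv;        % scale by Ts as in Eq. (A.31)
y_exact = exp(-t) - exp(-2*t); % analytical y(t) = x(t) * h(t)
max_error = max(abs(y_approx - y_exact))
```

Without the factor $T_{s}$, the discrete-time result would be larger than $y (t)$ by a factor of $1 ∕ T_{s}$, which is 1000 in this sketch.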

In addition, rectangle integration is useful for calculating the scaling factor $κ$ in the two cases discussed in the next section.

A.13.1 Two normalizations for the histogram

When the task is to estimate the PDF $f (x)$ of a continuous random variable, one can try using a discrete histogram $ĝ (x [n])$ , which is obtained by drawing $M$ values from $f (x)$ and counting the number of values occurring at each of $B$ bins. Intuitively, for large $M$ and $B$ , the curve (or the “envelope”) of the histogram resembles $f (x)$ but it is off by a normalization factor $κ$ .

If $κ = 1 ∕ M$ is chosen, which is the most commonly adopted option, one has $\hat{f}(x[n]) = (1 ∕ M) ĝ(x[n])$ and, consequently, $\sum_{n} \hat{f}(x[n]) = 1$. However, in this case, $\hat{f}(x[n])$ may differ from $f(x[n])$ by a large scaling factor. This can be observed in the curves generated by the following code:

M=1000; x=3*rand(1,M); %M random numbers from 0 to 3
B=100; [hatgx,N_x]=hist(x,B); %histogram with B bins
hatfx = hatgx/M; %normalize the histogram to sum up to 1
plot(N_x,hatfx,[-1,0,0,3,3,4],[0,0,1/3,1/3,0,0],'o-')
xlabel('random variable x'), ylabel('PDF f(x)')
legend('estimated','theoretical'); sum(hatfx)

The result of sum(hatfx) is equal to one, as specified, but the PDF of the simulated distribution $U (0, 3)$ is 1/3 over its support and the superimposed estimated and theoretical graphs do not match. This discrepancy between the curves should be expected, given that the normalized histogram hatfx is in fact an estimate of the probability mass function (PMF) of a discrete random variable, obtained by quantizing the original $x$. Another normalization factor $κ \neq 1 ∕ M$ must be used if the goal is to have $\hat{f}(x[n]) \approx f(x[n])$.

To obtain $κ$ such that $\hat{f}(x[n]) \approx f(x[n])$, one can use the property that the integral of a PDF is one. Based on the rectangle method, with the $B$ histogram bins playing the role of the $N$ rectangles in Eq. (A.30), one can write

\int_{a}^{b} f (x) dx \approx h ∑_{n = 0}^{B - 1} \hat{f} (x [n]) = hκ ∑_{n = 0}^{B - 1} ĝ (x [n]) = 1,

where $h = (b - a) ∕ B$ is the bin width. Because $\sum_{n = 0}^{B - 1} ĝ (x [n]) = M$, one obtains $κ = 1 ∕ (hM)$, which is the original factor $1 ∕ M$ divided by the bin width $h$. The function ak_normalize_histogram.m uses this approach. Using the same example as in the previous code, the following commands for obtaining hatfx lead to consistent theoretical and estimated curves:

M=1000; x=3*rand(1,M); %M random numbers from 0 to 3
B=100; [hatgx,N_x]=hist(x,B); %histogram with B bins
h=3/B; %h is the bin width assuming the support is 3
hatfx = hatgx/(M*h); %PDF values via normalized histogram
plot(N_x,hatfx,[-1,0,0,3,3,4],[0,0,1/3,1/3,0,0],'o-')
xlabel('random variable x'), ylabel('PDF f(x)')
legend('estimated','theoretical'); sum(hatfx)

As expected, in contrast to the sum equal to one in the first code, in this case sum(hatfx)=1/h=33.3. Both histogram normalization factors, $κ = 1 ∕ M$ and $κ = 1 ∕ (hM)$, are useful, and the choice depends on whether the application requires values from a PMF or a PDF, respectively.
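The two normalizations can also be compared side by side. The sketch below is an illustrative variation of the listings above, in which the bin width is measured from the bin centers returned by hist instead of being assumed equal to 3/B; the variable names pmf_est and pdf_est are hypothetical.

```matlab
% Sketch comparing the two histogram normalizations; M and B are illustrative
M = 1000; x = 3*rand(1, M);         % M samples from U(0,3)
B = 100; [hatgx, N_x] = hist(x, B); % counts and bin centers
h = N_x(2) - N_x(1);                % bin width measured from the bin centers
pmf_est = hatgx / M;                % kappa = 1/M gives PMF values
pdf_est = hatgx / (M * h);          % kappa = 1/(h*M) gives PDF values
sum_pmf = sum(pmf_est)              % the PMF values add up to 1
area_pdf = h * sum(pdf_est)         % rectangle-method area under the PDF is 1
```

Measuring $h$ from the bin centers accounts for the fact that hist spans the range from the smallest to the largest sample, which is slightly narrower than the theoretical support.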

A.13.2 Two normalizations for power distribution using FFT

Another application that can be related to Eq. (A.21) is the use of the FFT for estimating how the signal power is distributed over frequency. A finite-duration discrete-time signal $x [n]$ with $N$ non-zero samples is assumed here.

The squared FFT magnitude $|\text{FFT}\{x[n]\}|^{2}$ plays the role of the function $ĝ$ in Eq. (A.29). The choice $κ = 1 ∕ N^{2}$ leads to an estimate $\hat{f} (\cdot)$ of the mean-square spectrum (MSS) $S_{ms} [k]$ of Eq. (4.35), while $κ = 1 ∕ (N^{2} Δ f)$ corresponds to the PSD $S (f)$ in Eq. (4.22), where $Δ f = BW ∕ N$ and $BW$ is given in Hz. As indicated in Table A.1, the two options for $κ$ have similarities with the ones for histogram normalization.

Table A.1: Analogy between using the histogram and DFT for estimation, where $ĝ (x [n])$ is the estimated function and $\hat{f} (x [n]) = κĝ (x [n])$ its normalized version. The unit of $\hat{f} (x [n])$ is indicated within parentheses.

| | $ĝ (\cdot)$ is histogram | $ĝ (\cdot)$ is $\lvert \text{FFT}\{x[n]\} \rvert^{2}$ |
|---|---|---|
| Estimate a discrete function | $κ = 1 ∕ M$; $\hat{f} (\cdot)$ is PMF (probability) | $κ = 1 ∕ N^{2}$; $\hat{f} (\cdot)$ is MSS $Ŝ_{ms} [k]$ (watts) |
| Estimate a continuous function | $κ = \frac{1}{hM}$; $\hat{f} (\cdot)$ is PDF (likelihood) | $κ = \frac{1}{N^{2} Δ f}$; $\hat{f} (\cdot)$ is PSD $Ŝ (f)$ (watts/Hz) |

In both cases in Table A.1, when going from a discrete to a continuous function, the normalization factor of the discrete case is divided by the bin width ($h$ for the histogram and $Δ f$ for the FFT).
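The FFT column of Table A.1 can be checked numerically with a sinusoid of known power. The following sketch assumes that $BW$ equals the sampling frequency Fs, so that $Δ f$ = Fs/N; the values of Fs, N, A and f0, as well as the variable names, are illustrative choices.

```matlab
% Sketch of MSS and PSD scaling; Fs, N, A and f0 are illustrative assumptions
Fs = 8000; N = 1000;           % sampling frequency (Hz) and number of samples
A = 2; f0 = 800;               % sinusoid amplitude and frequency (Hz)
n = 0:N-1;                     % sample indices
x = A * cos(2*pi*f0/Fs*n);     % 100 full cycles, mean power A^2/2 = 2 watts
g = abs(fft(x)).^2;            % plays the role of hat{g} in Eq. (A.29)
delta_f = Fs / N;              % bin width in Hz, assuming BW = Fs
mss = g / N^2;                 % kappa = 1/N^2, values in watts
psd = g / (N^2 * delta_f);     % kappa = 1/(N^2*delta_f), values in watts/Hz
power_time = mean(abs(x).^2)   % mean power computed in the time domain
power_mss = sum(mss)           % summing the MSS recovers the power
power_psd = delta_f * sum(psd) % rectangle integration of the PSD
```

The three power values coincide: the MSS values add up to the signal power in the same way that PMF values add up to one, while the PSD must be integrated over frequency via the rectangle method, as is done with a PDF.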