4.9  Time-frequency Analysis using the Spectrogram

4.9.1  Definitions of STFT and spectrogram

The PSD is a powerful tool for analyzing signals in the frequency domain. However, a single PSD fails when the signal “changes” over time. More strictly, if the signal cannot be assumed to be wide-sense stationary (WSS), alternative tools are potentially needed to describe how its information varies jointly in frequency and time. One relatively simple technique is the short-time Fourier transform (STFT).

The concept behind STFT is to extract segments of the signal under analysis using windowing and calculate several Fourier transforms, one for each segment. Mathematically, the STFT of a continuous-time signal x(t) is

$$X(\tau,f) = \int_{-\infty}^{\infty} x(t)\,w(t-\tau)\,e^{-j2\pi ft}\,dt, \tag{4.66}$$

where τ is used to shift the window w(t), originally centered at t = 0. Eq. (4.66) can be interpreted by fixing τ = τ0 and observing that X(τ0,f) is the Fourier transform of the windowed signal x(t)w(t − τ0). The STFT is invertible and allows for recovering x(t). However, in the sequel it is assumed that the phase can be discarded, given that the main interest is to observe the distribution of power along frequency and time.

The spectrogram (for continuous-time)

$$S(\tau,f) = |X(\tau,f)|^2 \tag{4.67}$$

is defined as the squared magnitude of the STFT and is widely used to analyze nonstationary signals.
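
For discrete-time signals, the integral in Eq. (4.66) becomes a sum that can be evaluated with one FFT per segment. The code below is a minimal sketch of this procedure under stated assumptions, and it is not the implementation used by any specific spectrogram routine: the test signal, the window length M, the shift hop, the hand-built Hann window and all variable names are illustrative choices.

```octave
% Minimal discrete-time STFT/spectrogram sketch (all names are illustrative)
N = 1024;                           % samples in each half of the test signal
n = 0:N-1;
x = [cos(2*pi/32*n) cos(2*pi/8*n)]; % frequency changes at the midpoint
M = 128;                            % window (segment) length in samples
hop = 64;                           % shift between consecutive windows
w = 0.5 - 0.5*cos(2*pi*(0:M-1)/M);  % periodic Hann window, built by hand
numFrames = floor((length(x)-M)/hop) + 1;
S = zeros(M/2+1, numFrames);        % spectrogram, nonnegative frequencies only
for m = 1:numFrames
    first = (m-1)*hop + 1;              % first sample of the m-th segment
    segment = x(first:first+M-1) .* w;  % windowed signal for this shift
    X = fft(segment);                   % one Fourier transform per segment
    S(:,m) = (abs(X(1:M/2+1)).^2).';    % squared magnitude, phase discarded
end
[~, peakBin] = max(S);              % strongest bin of each frame (1-based)
peakFreq = (peakBin-1)*2*pi/M;      % corresponding frequency in rad
disp(peakFreq([1 end]))             % dominant frequency: first and last frames
```

Each column of S is the squared magnitude spectrum of one segment, so the dominant frequency of the first frames should be close to 2π/32 rad and that of the last frames close to 2π/8 rad, which is the information that a single spectrum of the whole signal cannot localize in time.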


Figure 4.41: PSD (top) and spectrogram (bottom) of a cosine that has its frequency increased from Ω = 2π/30 to 2π/7 rad and its power decreased by 40 dB (amplitude reduced from 100 to 1) at half of its duration.

The specgram function in Matlab/Octave can be used to estimate spectrograms S(τ,Ω) for discrete-time signals. Because S(τ,Ω) is real-valued (and nonnegative), a color scale over the time-frequency plane can be used instead of a 3-d graph. For example, Listing 4.27 was used to generate Figure 4.41.

Listing 4.27: MatlabOctaveCodeSnippets/snip_frequency_cosine_spectogram.m
N=3000; %number of samples of each cosine
n=0:N-1; %sample indices (abscissa)
x1=100*cos(2*pi/30*n); %first cosine (amplitude 100)
x2=1*cos(2*pi/7*n); %second cosine (amplitude 1)
x=[x1 x2]; %concatenation of the 2 cosines (2N samples)
subplot(211), pwelch(x) %PSD
subplot(212), specgram(x), colorbar %spectrogram

The code and Figure 4.41 illustrate that the PSD reveals only the existence of two cosines and cannot indicate where they are located in time. The spectrogram shows this location and also indicates, by color, that the first half of the signal is composed of a cosine x1 with power 40 dB greater than that of x2 (the amplitude ratio of 100 corresponds to 20 log10(100) = 40 dB). The burst of power spread over the whole bandwidth at the transition between the two cosines (approximately 1500 on the time axis of Figure 4.41) occurs because the segment windowed for that specific FFT contains incomplete pieces of both cosines joined by an abrupt change.
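
The fact that a single spectrum cannot locate the cosines in time can be checked numerically. The sketch below reuses the two cosines of Listing 4.27 and, as an assumption made only for simplicity, replaces pwelch with a periodogram obtained from a single FFT. Swapping the order of the cosines is a circular shift of the concatenated signal, which changes only the phase of the FFT, so both orderings lead to the same periodogram. The variable names are illustrative.

```octave
N = 3000; n = 0:N-1;
x1 = 100*cos(2*pi/30*n);             % same cosines as in Listing 4.27
x2 = 1*cos(2*pi/7*n);
ratiodB = 10*log10(mean(x1.^2)/mean(x2.^2)); % power ratio in dB (about 40)
xa = [x1 x2];                        % original order
xb = [x2 x1];                        % swapped order (circular shift by N)
Pa = abs(fft(xa)).^2/length(xa);     % periodogram: single FFT, no averaging
Pb = abs(fft(xb)).^2/length(xb);
maxDiff = max(abs(Pa-Pb))/max(Pa);   % relative difference between the two
fprintf('power ratio = %.1f dB, relative difference = %g\n', ratiodB, maxDiff);
```

The relative difference should be at the level of numerical rounding, while spectrograms of xa and xb would be mirror images of each other along the time axis.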

Matlab (but not Octave) has the spectrogram function, and the companion software has ak_specgram. These represent two alternative functions to specgram, with distinct input parameters.


Figure 4.42: All twelve DTMF symbols (1-9, *, 0, #), each one composed of the sum of a low frequency from [697, 770, 852, 941] Hz and a high frequency from [1209, 1336, 1477] Hz.

As another spectrogram example, Figure 4.42 shows a sequence of all twelve dual-tone multi-frequency (DTMF) tones generated by the script figs_spectral_dtmf.m. In this case, each symbol has a duration of 100 ms. It is possible to visually decode the signal. For example, the first (left-most) symbol is composed of a sum of sines of frequencies 697 and 1,209 Hz (representing “1”), the second is composed of frequencies 697 and 1,336 Hz (symbol “2”), and so on. Note again the bursts of power at the transitions between symbols.
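
A signal with this structure can be sketched as follows. This is not the script figs_spectral_dtmf.m, whose contents are not shown here: the unit amplitudes, the absence of silence between symbols and all variable names other than dtmfSignal are assumptions. The last lines perform a crude check by locating the two spectral peaks of the first symbol.

```octave
Fs = 8000;                            % sampling frequency in Hz
Ns = round(0.1*Fs);                   % 100 ms per symbol -> 800 samples
t = (0:Ns-1)/Fs;                      % time axis of one symbol in seconds
lowFreqs = [697 770 852 941];         % keypad rows (Hz)
highFreqs = [1209 1336 1477];         % keypad columns (Hz)
symbols = '123456789*0#';             % keypad read row by row
dtmfSignal = zeros(1, Ns*length(symbols));
for k = 1:length(symbols)
    row = ceil(k/3);                  % row index selects the low frequency
    col = mod(k-1,3) + 1;             % column index selects the high frequency
    tone = sin(2*pi*lowFreqs(row)*t) + sin(2*pi*highFreqs(col)*t);
    dtmfSignal((k-1)*Ns + (1:Ns)) = tone; % no silence between symbols (assumption)
end
% Crude check: locate the two spectral peaks of the first symbol ("1")
X = abs(fft(dtmfSignal(1:Ns)));       % magnitude spectrum, bins 10 Hz apart
f = (0:Ns-1)*Fs/Ns;                   % frequency of each bin in Hz
lowBand = find(f > 600 & f < 1000);   % search region for the low tone
highBand = find(f > 1100 & f < 1600); % search region for the high tone
[~, iLow] = max(X(lowBand));
[~, iHigh] = max(X(highBand));
fLow = f(lowBand(iLow));              % should be within 10 Hz of 697 Hz
fHigh = f(highBand(iHigh));           % should be within 10 Hz of 1209 Hz
fprintf('first symbol: peaks near %d and %d Hz\n', fLow, fHigh);
```

With 800 samples per symbol, the FFT bins are 10 Hz apart, so the peaks can only be located to within that spacing, which is still enough to distinguish the DTMF frequencies.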

After creating dtmfSignal with Fs = 8 kHz, the spectrogram of Figure 4.42 was generated with the commands below, and for a better visualization, the dynamic range was restricted to 40 dB via the parameter thresholdIndB:

filterBWInHz=40; %equivalent FFT bandwidth in Hz
samplingFrequency=8000; %sampling frequency in Hz
windowShiftInms=1; %window shift in milliseconds
thresholdIndB=40; %dynamic range in dB (lower power values are discarded)
ak_specgram(dtmfSignal,filterBWInHz,samplingFrequency,...
   windowShiftInms,thresholdIndB) %calculate spectrogram

4.9.2  Advanced: Wide and narrowband spectrograms

A fundamental restriction of the STFT and, consequently, of spectrograms, is the tradeoff between time and frequency resolution. When the window is made longer (its duration is increased), the frequency resolution improves but the time resolution gets worse. A spectrogram is called narrowband when the window is long and the FFT invoked by the spectrogram routine is equivalent to a bank of filters (see Section 4.3.2.0) with relatively narrow bandwidths. In contrast, a wideband spectrogram uses a short window and, consequently, the FFT corresponds to filters with relatively large bandwidths.

The two spectrograms are contrasted here via an example using a speech signal. Speech is highly nonstationary, given that the information regarding the phonemes is encoded in segments composed of distinct frequencies. The sentence “We were away” was recorded with Fs = 8000 Hz using the free software Audacity and stored as a (RIFF) wav file.


Figure 4.43: Example of a narrowband spectrogram of a speech signal.


Figure 4.44: Example of a wideband spectrogram of a speech signal.

Figure 4.43 and Figure 4.44 were generated with Listing 4.28.

Listing 4.28: MatlabOctaveCodeSnippets/snip_frequency_narrow_wide_spec.m
[s,Fs,wmode,fidx]=readwav('WeWereAway.wav','r'); %read wav file
numbits = fidx(7); %number of bits per sample (should be 16)
Nfft = 1024; %number of FFT points
figure(1), M=64; %window length in samples for wideband
specgram(s,Nfft,Fs,hann(M),round(3/4*M)); colorbar
figure(2), M=256; %window length in samples for narrowband
specgram(s,Nfft,Fs,hann(M),round(3/4*M)); colorbar

Figure 4.44 shows a wideband spectrogram (good time resolution but poor frequency resolution) calculated with frames of 64 samples extracted with a Hann window. Consecutive frames overlap by 3/4 of the frame size, and the spectrum of each windowed frame is calculated with a 1024-point FFT. Zero-padding was used (1024 instead of 64 points) in order to sample the DTFT of the windowed signal more densely. The reader is invited to try the command specgram(s,M,Fs,hann(M),0) with M=64, which uses neither zero-padding nor overlap, and to observe the improvements that these two strategies bring.
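
The effect of these two strategies can be quantified with a small sketch. It uses the values M = 64 and Nfft = 1024 of Listing 4.28, but the test frame, the hand-built Hann window, the signal length of 8000 samples and the variable names are illustrative assumptions. The code verifies that the zero-padded FFT contains the samples of the unpadded FFT plus intermediate points of the same DTFT, and counts how many frames are obtained with and without overlap.

```octave
M = 64; Nfft = 1024;                   % values used in Listing 4.28
n = 0:M-1;
w = 0.5 - 0.5*cos(2*pi*n/(M-1));       % symmetric Hann window, built by hand
frame = cos(2*pi*0.11*n) .* w;         % one windowed frame (illustrative tone)
Xshort = fft(frame);                   % 64 samples of the DTFT
Xpadded = fft(frame, Nfft);            % 1024 samples of the same DTFT
step = Nfft/M;                         % 16 padded bins per unpadded bin
maxErr = max(abs(Xpadded(1:step:end) - Xshort)); % coincident samples agree
L = 8000;                              % assumed signal length (1 s at 8 kHz)
hopOverlap = M - round(3/4*M);         % shift of 16 samples with 3/4 overlap
framesOverlap = floor((L-M)/hopOverlap) + 1; % frames along time with overlap
framesNoOverlap = floor((L-M)/M) + 1;        % frames along time, no overlap
fprintf('max error = %g, frames: %d vs %d\n', maxErr, framesOverlap, framesNoOverlap);
```

Neither strategy improves the resolution imposed by the window: zero-padding only interpolates along frequency and overlapping only interpolates along time, which makes the image smoother.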

Figure 4.43 simply increases the window length from 64 to 256 samples to create a narrowband spectrogram (poor time resolution and good frequency resolution). The narrowband version makes it possible to see the harmonic structure due to the pitch (see Application 1.12) as horizontal strips in the graph. In Figure 4.44, the pitch appears instead as vertical strips, because the short window resolves the individual pitch periods in time but not the harmonics in frequency.
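
The tradeoff can be put in numbers for the two window lengths of Listing 4.28. The sketch below is only a rough guide: it assumes that the null-to-null main lobe of a Hann window spans approximately four DFT bins of an M-point transform, and the variable names are illustrative.

```octave
Fs = 8000;                     % sampling frequency of the recording in Hz
M = [64 256];                  % wideband and narrowband window lengths
durationMs = 1000*M/Fs;        % window duration in milliseconds
binSpacingHz = Fs./M;          % bin spacing of an M-point DFT in Hz
mainLobeHz = 4*Fs./M;          % approximate Hann main lobe width in Hz
for k = 1:length(M)
    fprintf('M = %3d: %4.1f ms window, %6.2f Hz bin spacing, about %5.1f Hz main lobe\n', ...
        M(k), durationMs(k), binSpacingHz(k), mainLobeHz(k));
end
```

The 64-sample window lasts 8 ms and has a main lobe of roughly 500 Hz, which is wider than the spacing between pitch harmonics of typical speech, while the 256-sample window lasts 32 ms and has a main lobe of roughly 125 Hz.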