A spectrogram is a visual representation of a signal's frequency spectrum as it varies over time. When applied to an audio signal, spectrograms are sometimes called sonographs, voiceprints, or voicegrams. When the data are shown in a three-dimensional plot, they may be called waterfall displays.
Spectrograms find widespread application across diverse disciplines, including music, linguistics, sonar, radar, speech processing, seismology, and ornithology. In the context of audio analysis, spectrograms facilitate the phonetic identification of spoken words and enable the detailed examination of animal vocalizations.
Spectrograms can be generated through several methods: utilizing an optical spectrometer, employing a bank of band-pass filters, or by applying either a Fourier transform or a wavelet transform. When derived from a wavelet transform, the representation is also known as a scaleogram or scalogram.
Typically, a spectrogram is rendered as a heat map, which is an image where intensity variations are conveyed through differences in color or brightness.
Format
A prevalent format involves a two-dimensional graph where one axis denotes time and the other signifies frequency. A third dimension, representing the amplitude of a specific frequency at a given time, is illustrated by the intensity or color of individual points within the image.
Numerous format variations exist; for instance, the vertical and horizontal axes may be interchanged, causing time to progress vertically. Alternatively, a waterfall plot may be employed, where amplitude is depicted by the height of a three-dimensional surface rather than by color or intensity. Both frequency and amplitude axes can be configured as either linear or logarithmic, contingent upon the specific analytical objective. For audio representations, a logarithmic amplitude axis (typically in decibels, or dB) is common, while the frequency axis might be linear to highlight harmonic relationships or logarithmic to accentuate musical and tonal characteristics.
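The axis choices described above can be illustrated with a short sketch. It assumes NumPy is available; the function names `amplitude_to_db` and `log_spaced_frequencies`, the reference level, and the -120 dB floor are illustrative choices, not part of any standard.

```python
import numpy as np

def amplitude_to_db(magnitude, ref=1.0, floor_db=-120.0):
    """Convert linear amplitude to decibels relative to `ref`, clipped at `floor_db`."""
    magnitude = np.asarray(magnitude, dtype=float)
    # Clamp to the amplitude that corresponds to floor_db so log10(0) never occurs.
    min_amp = ref * 10.0 ** (floor_db / 20.0)
    return 20.0 * np.log10(np.maximum(magnitude, min_amp) / ref)

def log_spaced_frequencies(f_min, f_max, bins_per_octave=12):
    """Frequencies evenly spaced on a logarithmic axis (equal steps per octave)."""
    n_octaves = np.log2(f_max / f_min)
    # The small epsilon guards against rounding just below a whole number.
    n_bins = int(np.floor(n_octaves * bins_per_octave + 1e-9)) + 1
    return f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)

levels = amplitude_to_db([1.0, 0.1, 0.0])      # full scale, one tenth, silence
freqs = log_spaced_frequencies(110.0, 880.0)   # three octaves in semitone steps
```

With twelve bins per octave, each step on the frequency axis corresponds to one equal-tempered semitone, which is one reason a logarithmic axis suits musical material.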
Generation
Spectrograms of light can be produced directly over time with an optical spectrometer.
A spectrogram can be derived from a time-domain signal in one of two ways: approximated with a filter bank built from a series of band-pass filters (the only method before the advent of modern digital signal processing), or computed from the time signal using the Fourier transform. The two methods produce different time-frequency representations, but they are equivalent under certain conditions.
The band-pass filter method typically employs analog processing to segment the input signal into various frequency bands. The output magnitude from each filter subsequently governs a transducer, which then records the spectrogram as an image on a physical medium, such as paper.
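A digital stand-in for the analog filter bank can be sketched as follows. This is only an approximation of the idea, assuming NumPy: each band is shifted down to 0 Hz, low-pass filtered with a moving average, and its magnitude taken as the band's output level. The function name `filterbank_spectrogram`, the centre frequencies, and the smoothing length are hypothetical choices for illustration.

```python
import numpy as np

def filterbank_spectrogram(x, fs, center_freqs, smooth_len):
    """Per-band envelope: mix each band down to 0 Hz, low-pass it, take the magnitude."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x)) / fs
    kernel = np.ones(smooth_len) / smooth_len          # moving-average low-pass filter
    rows = []
    for fc in center_freqs:
        baseband = x * np.exp(-2j * np.pi * fc * t)    # shift the band at fc to 0 Hz
        envelope = np.abs(np.convolve(baseband, kernel, mode="same"))
        rows.append(envelope)
    return np.array(rows)                              # shape: (bands, samples)

fs = 8000
t = np.arange(fs) / fs                                 # one second of samples
tone = np.sin(2 * np.pi * 1000 * t)                    # 1 kHz test tone
centers = np.array([500.0, 1000.0, 2000.0])
bands = filterbank_spectrogram(tone, fs, centers, smooth_len=400)
strongest = centers[np.argmax(bands.mean(axis=1))]     # band with the most energy
```

Mixing down and averaging over a fixed number of samples is the same operation as evaluating one Fourier bin over a sliding rectangular window, which hints at why the filter-bank and Fourier methods can coincide under suitable conditions.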
Creating a spectrogram with the fast Fourier transform (FFT) is a digital process. The time-domain samples are split into chunks, which usually overlap, and each chunk is Fourier transformed to obtain the magnitude of its frequency spectrum. Each chunk then corresponds to one vertical line in the image: a measurement of magnitude versus frequency at a specific moment in time (the midpoint of the chunk). These spectra are laid side by side, or slightly overlapped using windowing, to form the image or a three-dimensional surface. The process amounts to computing the squared magnitude of the short-time Fourier transform (STFT) of the signal $s(t)$. That is, for a window width $\omega$, the spectrogram is defined as

$$\operatorname{spectrogram}(t, \omega) = \left| \operatorname{STFT}(t, \omega) \right|^{2}$$
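The FFT procedure described above can be sketched in a few lines. This is a minimal illustration assuming NumPy, a Hann window, and 50% overlap; the function name `spectrogram` and the parameters `win_len` and `hop` are choices made for this example rather than a reference implementation.

```python
import numpy as np

def spectrogram(x, fs, win_len=256, hop=128):
    """Squared-magnitude STFT. Returns (freqs, times, power[freq, frame])."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop           # assumes len(x) >= win_len
    power = np.empty((win_len // 2 + 1, n_frames))
    for i in range(n_frames):
        segment = x[i * hop : i * hop + win_len] * window
        power[:, i] = np.abs(np.fft.rfft(segment)) ** 2
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    # Each column is time-stamped at the midpoint of its segment.
    times = (np.arange(n_frames) * hop + win_len / 2) / fs
    return freqs, times, power

fs = 8000
t = np.arange(2 * fs) / fs
# Linear chirp whose instantaneous frequency rises from 500 Hz to 1500 Hz.
chirp = np.sin(2 * np.pi * (500 * t + 250 * t ** 2))
freqs, times, power = spectrogram(chirp, fs)
peak_track = freqs[np.argmax(power, axis=0)]           # strongest frequency per column
```

Each column of `power` is one vertical line of the image. Plotting it as a heat map, typically after conversion to decibels, gives the familiar picture of a rising diagonal for the chirp.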
Limitations and Resynthesis
Based on the aforementioned formula, a spectrogram seemingly lacks information regarding the precise, or even approximate, phase of the signal it depicts. Consequently, reversing this process to reconstruct an exact replica of the original signal from a spectrogram is generally infeasible. However, in contexts where the precise initial phase is not critical, a functional approximation of the original signal might be achievable. The Analysis & Resynthesis Sound Spectrograph exemplifies a computational tool designed for this purpose. An early speech synthesizer, known as the pattern playback, was developed at Haskins Laboratories in the late 1940s to convert visual representations of speech acoustic patterns (spectrograms) back into audible sound.
Nevertheless, spectrograms do contain certain phase information, which manifests as time delay (or group delay), representing the dual of instantaneous frequency.
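Approximate resynthesis from magnitudes alone can be illustrated with an iterative phase-estimation scheme. The sketch below uses the Griffin-Lim algorithm, which the text above does not name and which is only one of several possible approaches. It assumes NumPy and a Hann window, and the helper names `stft`, `istft`, `griffin_lim`, and `magnitude_error` are defined here for illustration.

```python
import numpy as np

def stft(x, win, hop):
    """Windowed frames transformed with a real FFT. Shape: (frame, freq)."""
    n = len(win)
    n_frames = 1 + (len(x) - n) // hop
    frames = np.stack([x[i * hop : i * hop + n] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X, win, hop, length):
    """Weighted overlap-add inverse, normalised by the summed squared window."""
    n = len(win)
    frames = np.fft.irfft(X, n=n, axis=1)
    x = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        x[i * hop : i * hop + n] += frame * win
        norm[i * hop : i * hop + n] += win ** 2
    return x / np.maximum(norm, 1e-8)                  # avoid dividing by zero at the edges

def griffin_lim(magnitude, win, hop, length, n_iter=50, seed=0):
    """Estimate a signal whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))   # random starting phase
    for _ in range(n_iter):
        x = istft(magnitude * phase, win, hop, length)
        phase = np.exp(1j * np.angle(stft(x, win, hop)))       # keep phase, drop magnitude
    return istft(magnitude * phase, win, hop, length)

def magnitude_error(x, magnitude, win, hop):
    """Relative distance between the STFT magnitude of x and the target."""
    diff = np.abs(stft(x, win, hop)) - magnitude
    return np.linalg.norm(diff) / np.linalg.norm(magnitude)

fs = 8000
t = np.arange(4096) / fs
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
win, hop = np.hanning(256), 64
target = np.abs(stft(signal, win, hop))                # square root of the spectrogram

rough = griffin_lim(target, win, hop, len(signal), n_iter=0)     # random phase only
refined = griffin_lim(target, win, hop, len(signal), n_iter=50)
err_rough = magnitude_error(rough, target, win, hop)
err_refined = magnitude_error(refined, target, win, hop)
```

The refined estimate matches the target magnitudes more closely than the random-phase starting point, but its waveform generally differs from the original, since many signals share nearly the same spectrogram.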
The size and shape of the analysis window can be varied. A shorter window gives more precise timing at the expense of frequency precision; a longer window gives more precise frequency at the expense of timing. This trade-off is an instance of the Heisenberg uncertainty principle: the product of the precisions of two conjugate variables is bounded below by a constant, written $B \cdot T \geq 1$ in the usual notation, where $B$ is the bandwidth and $T$ the duration.
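The window-length trade-off can be made concrete with a small experiment. The sketch assumes NumPy; the helpers `resolution` and `count_peaks`, the two test tones 40 Hz apart, and the half-of-maximum peak threshold are illustrative choices.

```python
import numpy as np

def resolution(win_len, fs):
    """Nominal time and frequency resolution for a window of win_len samples."""
    delta_t = win_len / fs          # window duration in seconds
    delta_f = fs / win_len          # FFT bin spacing in hertz
    return delta_t, delta_f

def count_peaks(frame, threshold=0.5):
    """Count spectral local maxima above threshold * max in one Hann-windowed frame."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    inner = spectrum[1:-1]
    is_peak = (
        (inner > spectrum[:-2])
        & (inner > spectrum[2:])
        & (inner > threshold * spectrum.max())
    )
    return int(np.count_nonzero(is_peak))

fs = 8000
t = np.arange(fs) / fs
two_tones = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 1040 * t)

short_dt, short_df = resolution(64, fs)      # 8 ms window, 125 Hz bins
long_dt, long_df = resolution(1024, fs)      # 128 ms window, about 7.8 Hz bins
short_peaks = count_peaks(two_tones[:64])    # tones merge into a single lobe
long_peaks = count_peaks(two_tones[:1024])   # tones appear as separate peaks
```

For either window length the product of `delta_t` and `delta_f` is 1, so any gain in one resolution is paid for in the other. The short frame locates events to within 8 ms but cannot separate the two tones, while the long frame separates them but smears timing over 128 ms.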
Applications
- Historically, analog spectrograms found extensive application across various fields, including the analysis of avian vocalizations, such as those of the great tit. Modern research continues this work utilizing digital equipment, extending its application to all animal sounds. The contemporary digital spectrogram proves particularly valuable for investigating frequency modulation (FM) within animal calls. Notably, the distinct features of FM chirps, broadband clicks, and social harmonizing are most effectively visualized using spectrograms.
- Spectrograms serve as a valuable aid in addressing speech deficits and facilitating speech training for individuals with profound deafness.
- Research in phonetics and speech synthesis frequently benefits from the application of spectrograms.
- In deep learning-based speech synthesis, a spectrogram (or its Mel-scale representation) is initially predicted by a sequence-to-sequence (seq2seq) model. Subsequently, this spectrogram is input into a neural vocoder to generate the synthesized raw waveform.
- Reversing the spectrogram generation process enables the creation of a signal whose spectrogram corresponds to an arbitrary image. This technique facilitates embedding a picture within an audio segment and has been utilized by various electronic music artists.
- Spectrograms serve as an intermediate medium in the creation of certain modern music, allowing for the manipulation of frequency intensities over time or the generation of novel frequencies through graphical representation and subsequent inverse transformation.
- Spectrograms are instrumental in analyzing the output generated when a test signal traverses a signal processor, such as a filter, thereby enabling performance evaluation.
- The development of radio frequency (RF) and microwave systems frequently incorporates high-definition spectrograms.
- Contemporary applications of spectrograms include the visualization of scattering parameters obtained from vector network analyzers.
- Both the US Geological Survey and the IRIS Consortium offer near real-time spectrogram displays for the continuous monitoring of seismic stations.
- In speech recognition applications, spectrograms can be effectively integrated with recurrent neural networks.
- The Chinese government collects individuals' spectrograms (voiceprints) as part of its mass surveillance programs.
- In the context of a vibration signal, a spectrogram's color scale delineates the frequencies corresponding to a waveform's amplitude peaks across time. Distinct from conventional time or frequency graphs, a spectrogram establishes a correlation between peak values, time, and frequency. Vibration test engineers employ spectrograms to scrutinize the frequency content of continuous waveforms, identify prominent signals, and ascertain temporal variations in vibration behavior.
- Spectrograms facilitate speech analysis in two distinct applications: the automated detection of speech deficits in cochlear implant recipients and phoneme class recognition for the extraction of phone-attribute features.
- To ascertain a speaker's pronunciation characteristics, certain researchers have advanced a bionics-inspired methodology. This approach leverages spectrogram statistics to generate a characteristic spectrogram, providing a stable representation of the speaker's pronunciation derived from a linear superposition of short-time spectrograms.
- Researchers are investigating a novel methodology for electrocardiogram (ECG) signal analysis through the application of spectrogram techniques, potentially enhancing visualization and comprehension. The incorporation of Mel-frequency cepstral coefficients (MFCC) for feature extraction indicates a cross-disciplinary utility, adapting audio processing methods to extract pertinent information from biomedical signals.
- Precise interpretation of temperature indicating paint (TIP) holds significant importance in aviation and various industrial contexts. Two-dimensional spectrograms of TIP can be utilized for temperature interpretation.
- Spectrograms can be employed to process signals related to the rate of change of the human thorax. By visually representing respiratory signals through spectrograms, researchers have proposed a neural network-based approach for classifying respiration states.
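The image-embedding application in the list above can be sketched with simple additive synthesis. This is a minimal illustration assuming NumPy: each image row drives one sinusoid and each column sets its amplitude for a short time slice. The function name `image_to_audio`, the frequency range, and the column duration are hypothetical choices, and practical tools typically use more refined inverse transforms.

```python
import numpy as np

def image_to_audio(image, fs=8000, col_duration=0.05, f_min=500.0, f_max=3500.0):
    """Additive synthesis: row r drives a sinusoid, column c sets its amplitude over time."""
    image = np.asarray(image, dtype=float)
    n_rows, n_cols = image.shape
    freqs = np.linspace(f_max, f_min, n_rows)           # top row = highest frequency
    samples_per_col = int(round(col_duration * fs))
    t = np.arange(n_cols * samples_per_col) / fs
    # Hold each column's brightness for samples_per_col samples.
    envelopes = np.repeat(image, samples_per_col, axis=1)
    audio = np.zeros(len(t))
    for f, env in zip(freqs, envelopes):
        audio += env * np.sin(2 * np.pi * f * t)
    peak = np.max(np.abs(audio))
    return audio / peak if peak > 0 else audio           # normalise to the range [-1, 1]

# A 4x4 diagonal "image": one bright pixel per column, moving from top to bottom.
picture = np.eye(4)
audio = image_to_audio(picture)

# Dominant frequency in the first and last time slices (400 samples each, 20 Hz bins).
first_peak_hz = np.argmax(np.abs(np.fft.rfft(audio[:400]))) * 20.0
last_peak_hz = np.argmax(np.abs(np.fft.rfft(audio[-400:]))) * 20.0
```

A spectrogram of `audio` shows a descending diagonal, mirroring the picture. Abrupt amplitude changes between columns add broadband clicks, so a smoother envelope would give a cleaner image.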
