Pitch is an auditory attribute enabling the categorization of sounds as "higher" or "lower" within the context of musical melodies. It is a perceptual characteristic facilitating the arrangement of sounds along a frequency-dependent continuum. Pitch stands as one of the primary auditory characteristics of musical tones, alongside duration, loudness, and timbre.
While pitch can be quantitatively expressed as a frequency, it is not solely an objective physical property, but rather a subjective psychoacoustical attribute of sound. Historically, the investigation of pitch and its perception has constituted a fundamental challenge within psychoacoustics, proving pivotal in the development and validation of theories concerning sound representation, processing, and perception within the auditory system.
Perception
Pitch and frequency
Pitch is an auditory sensation where individuals categorize musical tones into relative positions on a musical scale, primarily influenced by their perception of vibrational frequency (audio frequency). Pitch is intimately linked to frequency, yet these two concepts are not synonymous. Frequency represents an objective, scientifically quantifiable attribute. Conversely, pitch constitutes the subjective interpretation of a sound wave by an individual, rendering it not directly measurable. Nevertheless, this distinction does not preclude a general consensus among individuals regarding the relative 'highness' or 'lowness' of musical notes.
Sound wave oscillations are frequently characterized by their frequency. Consequently, pitches are typically correlated with, and thus measured as, frequencies (expressed in cycles per second, or hertz), through the comparison of evaluated sounds with pure tones (characterized by periodic, sinusoidal waveforms). This methodology also frequently enables the assignment of a pitch to complex and aperiodic sound waves.
According to the American National Standards Institute, pitch is the auditory attribute of sound that allows sounds to be ordered on a scale from low to high. Because pitch correlates so closely with frequency, it is determined almost entirely by how quickly the sound wave makes the air vibrate and has very little to do with the wave's intensity or amplitude. That is, a "high" pitch means very rapid oscillation, and a "low" pitch corresponds to slower oscillation. Even so, the convention of describing pitch in terms of vertical height is shared by many languages. In English, at least, it is one of several deep conceptual metaphors built on the up/down dimension. The precise historical origin of the musical sense of "high" and "low" pitch remains unclear. There is evidence that humans do perceive the source of a sound as slightly higher or lower in vertical space when its frequency is increased or decreased, respectively.
In most cases, the pitch of complex sounds such as speech and musical notes corresponds very nearly to the repetition rate of periodic or nearly periodic sounds, or to the reciprocal of the time interval between repeating similar events in the sound waveform.
The pitch of complex tones may exhibit ambiguity, implying that an observer can perceive two or more distinct pitches. Even when the actual fundamental frequency is precisely quantifiable via physical measurement, it may diverge from the perceived pitch due to the presence of overtones, also termed upper partials, whether harmonic or inharmonic. For instance, a complex tone comprising two sine waves at 1000 Hz and 1200 Hz might occasionally be perceived as possessing up to three pitches: two spectral pitches at 1000 Hz and 1200 Hz, originating from the physical frequencies of the constituent pure tones, and a combination tone at 200 Hz, which corresponds to the waveform's repetition rate. In such scenarios, the 200 Hz percept is frequently termed the "missing fundamental," often representing the greatest common divisor of the present frequencies.
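The 200 Hz figure in the example above can be checked with a short calculation. The following is a minimal sketch, assuming the partials are given as whole numbers of hertz; the function name repetition_rate_hz is illustrative only, and the greatest common divisor is merely a candidate for the perceived pitch, not a model of perception.

```python
from functools import reduce
from math import gcd


def repetition_rate_hz(partials_hz):
    """Greatest common divisor of integer-valued partial frequencies (Hz).

    For a sum of sine waves at these frequencies, this is the rate at
    which the combined waveform repeats.
    """
    return reduce(gcd, partials_hz)


# The two-tone example from the text: partials at 1000 Hz and 1200 Hz.
print(repetition_rate_hz([1000, 1200]))  # 200
```

The same calculation applied to partials at 400, 600, and 800 Hz also gives 200 Hz, although no component at 200 Hz is physically present.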
Pitch perception is also marginally affected by the sound pressure level (loudness or volume) of a tone, particularly at frequencies below 1,000 Hz and above 2,000 Hz. For lower tones, pitch decreases with increasing sound pressure. For instance, a 200 Hz tone at high intensity is perceived as approximately one semitone lower than when barely audible. Conversely, above 2,000 Hz, pitch increases with heightened sound intensity. These initial findings were established in pioneering research by S. Stevens and W. Snow. Subsequent investigations, such as those by A. Cohen, indicated that most observed pitch shifts were not statistically distinct from typical pitch-matching inaccuracies. Upon averaging, any residual shifts aligned with Stevens's established trends but remained minor, typically 2% or less of the frequency, equating to less than a semitone.
Theoretical Frameworks of Pitch Perception
Theories of pitch perception endeavor to elucidate the interplay between physical sound characteristics and the auditory system's physiology to produce the perception of pitch. Broadly, pitch perception theories are categorized into place coding and temporal coding. The place theory posits that pitch perception arises from the location of peak stimulation along the basilar membrane.
A place code, which leverages the auditory system's tonotopic organization, is essential for high-frequency pitch perception, given the physiological constraints on neuronal phase-locking of action potentials. However, a purely place-based theory fails to adequately explain the precision of pitch perception within low and mid-frequency spectra. Furthermore, evidence suggests that certain non-human primates exhibit an absence of auditory cortex responses to pitch, despite possessing distinct tonotopic maps in this region, indicating that tonotopic place codes alone are insufficient for generating pitch percepts.
Temporal theories propose an alternative explanation grounded in the temporal structure of action potentials, primarily the phase-locking of action potentials to stimulus frequencies. The exact mechanism by which this temporal structure encodes pitch at higher neural levels remains a subject of ongoing debate, though processing appears to involve an autocorrelation of auditory nerve action potentials. Nevertheless, a long-standing observation is the absence of an identified neural mechanism capable of executing the delay operation essential for true autocorrelation. Conversely, at least one model suggests that a temporal delay is not requisite for an autocorrelation-based pitch perception model, instead invoking phase shifts between cochlear filters. However, prior research has demonstrated that some sounds possessing a prominent peak in their autocorrelation function fail to evoke a corresponding pitch percept, while others lacking such a peak nonetheless produce a distinct pitch. Consequently, for a more comprehensive model, autocorrelation should be applied to signals representing cochlear output, such as those derived from auditory-nerve interspike-interval histograms. Certain theories of pitch perception propose that pitch inherently contains octave ambiguities, suggesting its optimal decomposition into a pitch chroma—a periodic value within an octave, analogous to Western musical note names—and a potentially ambiguous pitch height, which specifies the octave.
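The autocorrelation idea can be illustrated numerically. The sketch below is a deliberate simplification: it autocorrelates the raw waveform of a two-tone signal (1000 Hz and 1200 Hz) instead of a simulated cochlear or auditory-nerve output, so it shows the arithmetic of the method and is not a model of hearing. The sample rate, window length, and function names are assumptions chosen for the example.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)


def two_tone(n_samples, f1=1000.0, f2=1200.0, rate=SAMPLE_RATE):
    """Sum of two equal-amplitude sine waves, sampled at `rate`."""
    return [
        math.sin(2 * math.pi * f1 * n / rate)
        + math.sin(2 * math.pi * f2 * n / rate)
        for n in range(n_samples)
    ]


def autocorrelation_peak_lag(signal, window, max_lag):
    """Return the lag in 1..max_lag with the largest autocorrelation.

    Each lag is scored by summing signal[n] * signal[n + lag] over a
    fixed window, so every lag uses the same number of products.
    """
    best_lag, best_value = 1, float("-inf")
    for lag in range(1, max_lag + 1):
        value = sum(signal[n] * signal[n + lag] for n in range(window))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


window, max_lag = 400, 60
signal = two_tone(window + max_lag)
lag = autocorrelation_peak_lag(signal, window, max_lag)
print(lag, SAMPLE_RATE / lag)  # 40 200.0
```

The strongest peak falls at a lag of 40 samples, that is 5 ms, which corresponds to a repetition rate of 200 Hz even though the signal contains no energy at that frequency.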
Just-Noticeable Difference in Pitch
The just-noticeable difference (jnd), defined as the minimal perceptible change, varies according to the tonal frequency content. For frequencies below 500 Hz, the jnd approximates 3 Hz for sine waves and 1 Hz for complex tones. Above 1000 Hz, the jnd for sine waves is approximately 0.6%, or about 10 cents. Assessment of the jnd typically involves presenting two tones in rapid succession, with participants tasked to identify any perceived pitch discrepancy. The jnd diminishes when both tones are presented concurrently, as listeners can then detect beat frequencies. Within the human auditory spectrum, approximately 1,400 distinct pitch steps are perceptible. In contrast, the equal-tempered scale, spanning 16 to 16,000 Hz, comprises 120 notes.
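The figures in this paragraph can be related to one another with two short conversions. This is a minimal sketch, assuming the usual definition of 1200 cents per octave; the function names are illustrative.

```python
import math


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)


def semitone_span(low_hz, high_hz):
    """Number of equal-tempered semitones between two frequencies."""
    return 12 * math.log2(high_hz / low_hz)


# A 0.6% frequency change expressed in cents.
print(round(cents(1.006), 1))  # 10.4

# Semitones between 16 Hz and 16,000 Hz.
print(round(semitone_span(16, 16000)))  # 120
```

A 3 Hz difference at 500 Hz is also a 0.6% change, so the two sine-wave figures quoted above meet at roughly the same value of about 10 cents.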
Auditory Illusions
The relative perception of pitch is susceptible to deception, leading to auditory illusions. Examples include the tritone paradox, but the most prominent is the Shepard scale, in which a continuous or discrete series of specifically constructed tones creates the impression of perpetually ascending or descending pitch.
Definite and Indefinite Pitch Qualities
Certain musical instruments do not generate notes with a discernible pitch. Specifically, unpitched percussion instruments, a category within percussion, do not yield specific pitches. A sound or note is characterized as having definite pitch when a listener can readily or relatively easily perceive its pitch. Such sounds typically exhibit harmonic or near-harmonic frequency spectra.
When an instrument generates a sound, it simultaneously produces multiple modes of vibration. Consequently, a listener perceives numerous frequencies concurrently. The lowest frequency vibration is termed the fundamental frequency, while all other frequencies are designated as overtones. An essential subset of overtones comprises harmonics, which are frequencies that are integer multiples of the fundamental. Regardless of whether these higher frequencies are integer multiples, they are collectively referred to as partials, representing the constituent components of the overall sound spectrum.
Conversely, a sound or note possesses indefinite pitch if a listener finds it impossible or considerably challenging to ascertain its specific pitch. Sounds exhibiting indefinite pitch typically lack harmonic spectra or display altered harmonic spectra, a phenomenon known as inharmonicity.
Nevertheless, it remains possible to distinguish between two sounds of indefinite pitch as being clearly higher or lower in relation to each other. For example, a snare drum is perceived as higher-pitched than a bass drum, despite both having indefinite pitch, due to the presence of higher frequencies in its sound. Thus, while it is often feasible to approximate the relative pitches of two indefinite sounds, these sounds do not precisely align with any specific, absolute pitch.
Pitch Standards and Standardized Pitch
A pitch standard denotes the conventional reference frequency to which musical instruments within an ensemble are tuned for a given performance. This standard can differ across various ensembles and has undergone significant historical fluctuations.
Since 1939, the A above middle C has typically been standardized at 440 Hz (frequently denoted as A440 or "A = 440 Hz"), though other frequencies, such as 442 Hz, are also commonly used. A distinct standard, known as 'baroque pitch', is currently set at A = 415 Hz, approximately an equal-tempered semitone lower than A440; this allows period and modern instruments to be combined through transposition. 'Classical pitch', used for fortepianos and other instruments performing music from the Classical period, may be set to either 427 Hz (approximately midway between A415 and A440) or 430 Hz (also between A415 and A440, but slightly sharper than the quarter-tone point between them). Ensembles dedicated to historically informed performances of Romantic era repertoire typically set the A above middle C at 432 Hz, or adhere to the French standard of 435 Hz, which was prevalent until the 1930s.
Transposing instruments traditionally have their parts transposed and notated in keys different from those used for vocalists, non-transposing instruments, and even other transposing instruments. Consequently, musicians use the term "concert pitch" to avoid ambiguity. For instance, when the most common types of clarinet or trumpet play a note written as C in their parts, the resulting sound is the pitch called B♭ on a non-transposing instrument such as a violin. (This practice has led to the suggestion that these wind instruments were once pitched a whole tone lower than violin pitch.) To refer to this pitch unambiguously, a musician calls it concert B♭, meaning "the pitch that a performer on a non-transposing instrument, such as a violin, identifies as B♭."
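The relationships among these reference pitches are easier to compare in cents than in hertz. The following is a minimal sketch that measures each standard against A440; the function name cents_from_a440 is illustrative, and 100 cents is taken as one equal-tempered semitone.

```python
import math


def cents_from_a440(freq_hz, reference_hz=440.0):
    """Signed distance of freq_hz from the reference, in cents."""
    return 1200 * math.log2(freq_hz / reference_hz)


# Reference pitches mentioned in the text.
for a_hz in (415, 427, 430, 432, 435, 442):
    print(a_hz, round(cents_from_a440(a_hz), 1))
```

On this measure A415 lies about 101 cents below A440, which is why it is treated as a semitone lower, while 427 Hz sits close to the 50-cent quarter-tone point and 430 Hz lies somewhat above it.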
Pitch Labeling Conventions
Pitches are typically designated through the following methods:
- Alphabetic characters, as exemplified by Helmholtz pitch notation.
- A combination of letters and numerical values, as observed in scientific pitch notation, where notes are sequentially labeled upward from C0, which corresponds to the 16 Hz C.
- Numerical values representing the frequency in hertz (Hz), indicating the number of cycles per second.
The A above middle C can be written as a′, A4, or 440 Hz. In standard Western equal temperament, pitch does not depend on "spelling": "G4 double sharp" denotes the same pitch as A4. In other temperaments, these may be distinct pitches. Human perception of musical intervals is approximately logarithmic with respect to fundamental frequency: the perceived interval between "A220" and "A440" is the same as that between A440 and A880. Motivated by this logarithmic perception, music theorists sometimes represent pitches on a numerical scale based on the logarithm of fundamental frequency. For example, the widely used MIDI standard maps a fundamental frequency f to a real number p as follows:
p = 69 + 12 × log2(f / 440 Hz)
This methodology establishes a linear pitch continuum where octaves are assigned a size of 12 units, and semitones—representing the interval between adjacent keys on a piano keyboard—are assigned a size of 1 unit. Under this system, A440 is designated the numerical value 69. The distances within this pitch space directly correlate with musical intervals as conceptualized by musicians. Each equal-tempered semitone is further segmented into 100 cents. This system possesses sufficient adaptability to incorporate "microtones," which are pitches not typically present on conventional piano keyboards. For instance, a pitch situated precisely between C (60) and C♯ (61) can be precisely denoted as 60.5.
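The mapping described here, in which A440 receives the value 69 and each octave spans 12 units, is commonly written p = 69 + 12 × log2(f / 440 Hz). The following minimal sketch implements that formula and its inverse; the function names are illustrative and are not part of any MIDI library.

```python
import math


def frequency_to_midi(f_hz):
    """Map a fundamental frequency in Hz to a real-valued MIDI pitch number."""
    return 69 + 12 * math.log2(f_hz / 440.0)


def midi_to_frequency(p):
    """Inverse mapping: real-valued MIDI pitch number to frequency in Hz."""
    return 440.0 * 2 ** ((p - 69) / 12)


print(frequency_to_midi(440.0))            # 69.0
print(frequency_to_midi(880.0))            # 81.0
print(round(midi_to_frequency(60.5), 2))   # pitch halfway between C (60) and C♯ (61)
```

Because the scale is linear in semitones, fractional values such as 60.5 need no special handling, and a difference of 0.01 on this scale corresponds to one cent.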
The subsequent table presents frequencies, expressed in Hertz, for notes across different octaves, utilizing the "German method" for octave nomenclature:
Scales
The specific relative pitches of notes within a musical scale are established by various tuning systems. In Western music, the twelve-note chromatic scale represents the predominant organizational framework, with equal temperament currently serving as the most prevalent tuning method for this scale. Within equal temperament, the precise pitch ratio between any two consecutive notes is the twelfth root of two, approximately 1.05946. Historically, during periods such as that of Johann Sebastian Bach, alternative well-tempered systems employed distinct approaches to musical tuning.
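The equal-tempered ratio can be verified directly. This is a minimal sketch that builds one chromatic octave upward from A440 by repeated multiplication by the twelfth root of two; the names SEMITONE_RATIO and chromatic_scale are illustrative.

```python
SEMITONE_RATIO = 2 ** (1 / 12)  # frequency ratio between adjacent notes


def chromatic_scale(start_hz, steps=12):
    """Frequencies of `steps` successive equal-tempered semitones above start_hz.

    The returned list includes the starting note, so it has steps + 1 entries.
    """
    return [start_hz * SEMITONE_RATIO ** k for k in range(steps + 1)]


print(round(SEMITONE_RATIO, 5))  # 1.05946

for freq in chromatic_scale(440.0):
    print(round(freq, 2))
```

Twelve successive steps multiply the starting frequency by exactly two, so the last entry of the list is the octave at 880 Hz.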
Across nearly all tuning systems, the interval of an octave corresponds to a doubling of a note's fundamental frequency; for instance, an octave above A440 is 880 Hz. However, if the initial overtone exhibits sharpness due to inharmonicity, particularly at the extreme registers of a piano, tuners employ a technique known as octave stretching.
Alternative Musical Interpretations of Pitch
In atonal, twelve-tone, or musical set theory, a "pitch" is a specific frequency, while a "pitch class" is all the octaves of a frequency. In many analytic discussions of atonal and post-tonal music, pitches are named with integers because of octave and enharmonic equivalence. For example, in a serial system, C♯ and D♭ are considered the same pitch, while C4 and C5 are functionally the same, one octave apart.
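Integer naming of this kind can be sketched with arithmetic modulo 12. The example below assumes the common convention that C is pitch class 0 and that C4 has the MIDI number 60; the table and function names are illustrative, and both "♯"/"♭" and the ASCII stand-ins "#"/"b" are accepted.

```python
NOTE_TO_CLASS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
ACCIDENTALS = {"♯": 1, "#": 1, "♭": -1, "b": -1}


def pitch_class(name):
    """Pitch class (0-11) of a note name such as 'C#' or 'Db'."""
    letter, accidentals = name[0], name[1:]
    shift = sum(ACCIDENTALS[symbol] for symbol in accidentals)
    return (NOTE_TO_CLASS[letter] + shift) % 12


def midi_pitch_class(midi_number):
    """Pitch class of a MIDI note number; octaves collapse modulo 12."""
    return midi_number % 12


print(pitch_class("C#"), pitch_class("Db"))       # 1 1
print(midi_pitch_class(60), midi_pitch_class(72))  # 0 0
```

Enharmonic spellings map to the same integer, and notes an octave apart differ by 12 and therefore share a pitch class.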
The use of discrete pitches, as opposed to continuously variable ones, is a nearly universal characteristic in music, with notable exceptions such as "tumbling strains" and "indeterminate-pitch chants." While gliding pitches are employed in the majority of cultures, they typically relate to or embellish the underlying discrete pitches.
See also
- 3rd bridge (harmonic resonance based on equal string divisions)
- Absolute pitch
- Diplacusis
- Eight-foot pitch
- Harmonic pitch class profiles
- Just intonation
- Meantone temperament
- Music and mathematics
- Piano key frequencies
- Pitch circularity
- Pitch class
- Pitch detection algorithm
- Pitch of brass instruments
- Pitch shifter
- Pitch pipe
- Relative pitch
- Scale of vowels
- Vocal and instrumental pitch ranges
