Auditory roughness estimation of complex spectra
Pantelis N. Vassilakis
UCLA, Systematic Musicology, Music Cognition and Acoustics Laboratory
Box 951657, Los Angeles, CA 90095-1657
During the last forty years a number of models quantifying auditory roughness have been proposed and employed in a series of studies, where they have demonstrated relatively low predictive power. Correct estimation of the degree of roughness of a pair of sines or of an arbitrary spectrum is necessary before any claimed link between roughness and some acoustic, perceptual, or musical variable can be tested. It is also an important step towards the difficult task of quantifying inharmonicity.
Roughness is one of the perceptual attributes of amplitude fluctuation. Musical sounds are represented by vibration signals whose characteristics practically always change with time. Amplitude fluctuations (variations of a signal's amplitude around some reference value) constitute one such change and can be placed in three broad perceptual categories depending on the amplitude fluctuation rate. Slow amplitude fluctuations (up to roughly 20 per second) are perceived as loudness fluctuations referred to as beating. As the rate of amplitude fluctuation increases, the loudness appears to be constant and the fluctuations are perceived as 'fluttering' or roughness. The roughness sensation reaches a maximal strength and then gradually diminishes until it disappears (at roughly 75-150 amplitude fluctuations per second, depending on the actual vibration frequency). These distinct perceptual categories do not reflect any fundamental qualitative differences in the vibrational frame of reference and should be approached as alternative manifestations of a single physical phenomenon.
If we accept that the ear performs a frequency analysis on incoming signals, the above perceptual categories can be related directly to the bandwidth of the analysis filters. For example, in the simplest case of amplitude fluctuations resulting from the addition of two sine signals, the following statements represent the general consensus: If the filter bandwidth is much larger than the fluctuation rate then a single tone is perceived, either with fluctuating loudness (and sometimes pitch) or with roughness. If the filter bandwidth is much smaller than the fluctuation rate then a complex tone is perceived, to which one or more pitches can be assigned but which, in general (see note 1), exhibits no loudness fluctuation or roughness.
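For the two-sine case, the link between frequency separation and amplitude fluctuation rate follows from a standard trigonometric identity. The worked equation below is only an illustration and assumes equal amplitudes A and zero initial phases, a simplification that the argument above does not require:

```latex
A\sin(2\pi f_1 t) + A\sin(2\pi f_2 t)
  = 2A\,\cos\!\bigl(\pi (f_2 - f_1)\,t\bigr)\,\sin\!\bigl(\pi (f_1 + f_2)\,t\bigr)
```

The slowly varying factor acts as an envelope on a sine at the mean frequency (f1 + f2)/2. Its magnitude, |2A cos(π(f2 - f1)t)|, repeats f2 - f1 times per second, so the amplitude fluctuation rate of the sum equals the frequency separation of the two sines.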
In the first case, the degree, rate, and shape (sine / complex) of amplitude fluctuations are parameters that are manipulated by musicians of various cultures, exploring the beating and roughness sensations. Manipulating the degree and rate of amplitude fluctuation helps create a shimmering (e.g. Indonesian gamelan performances) or rattling (e.g. Bosnian ganga singing) sonic canvas that becomes the backdrop for further musical elaboration. It permits the creation of timbral (e.g. Middle Eastern mijwiz playing) or even rhythmic (e.g. ganga singing) contrasts through gradual or abrupt changes between fluctuation rates and degrees (see note 2). Whether those contrasts are explicitly sought (as in ganga singing, mijwiz playing, or even the use of 'modulation' wheels/pedals in modern popular music) or happen more subtly and gradually (as may be the case in the typical chord progressions/modulations of Western music), they form an important part of a musical tradition's expressive vocabulary.
Important clues regarding the ways various musical cultures approach roughness and other perceptual attributes of amplitude fluctuation may be found through an examination of musical instrument construction and performance practice. Additionally, the different choices among musical traditions with regard to vertical sonorities (e.g. harmonic intervals, chords, etc.) can reveal a variety of attitudes towards the sonic possibilities opened up by the manipulation of amplitude fluctuation in general and the sensation of roughness in particular.
Like the sensation of beats, the sensation of roughness has often been associated with the concepts of consonance/dissonance, whether those have been understood as aesthetically loaded (Rameau, Romieu, in Carlton, 1990; Kameoka & Kuriyagawa, 1969a; Terhardt, 1974a&b, 1984) or not (Helmholtz, 1885; Hindemith, 1945; von Békésy, 1960; Plomp & Levelt, 1965.) Some of the studies addressing the sensation of roughness (e.g. Stumpf, 1890, in von Békésy, 1960: 348; Vogel, 1993) have been too keen to find a definite and universally acceptable justification of the 'natural inevitability' and 'aesthetic superiority' of Western music theory. This has prevented them from seriously examining the physical and physiological correlates of the roughness sensation. In contrast, Helmholtz, the first researcher to tackle the issue theoretically and experimentally, concluded that:
Whether one combination [of tones] is rougher or smoother than another depends solely on the anatomical structure of the ear, and has nothing to do with psychological motives. But what degree of roughness a hearer is inclined to ... as a means of musical expression depends on taste and habit; hence the boundary between consonances and dissonances has frequently changed ... and will still further change... (Helmholtz, 1885: 234-235.)
The present study adopts this position and treats the sensation of roughness simply as a perceptual attribute that can be manipulated through controlling the degree and rate of amplitude fluctuation, providing means of sonic variation and musical expression.
ii Existing roughness estimation models and their application
The two principal studies that have systematically examined the sensation of roughness (von Békésy, 1960: 344-354; Terhardt, 1974a) have, to a large extent, been ignored by the majority of models quantifying roughness / smoothness. Numerous such models have been proposed (Plomp & Levelt, 1965; Kameoka & Kuriyagawa, 1969a&b; Hutchinson & Knopoff, 1978) and have been employed in later studies (Bigand et al., 1993; Vos, 1986; Dibben, 1999), demonstrating a low degree of agreement between predicted and experimental data. Dibben, for example, found no correlation between sensory consonance (smoothness), as predicted by the Hutchinson & Knopoff model, and the completeness / stability ratings of the final bars of selected musical pieces. She concluded that sensory consonance / dissonance is not a good measure of musical stability / tension, or completeness / incompleteness, interpreting her conclusion as supporting the need for an alternative model of consonance / dissonance. Her study is a good example of an attempt to load the concept of consonance with meanings that go far beyond the scope of the model employed. It basically demonstrates that the degree of smoothness of a vertical sonority is not a good measure of its sense of stability or completeness. This result could have been anticipated, since the 'sense of stability' of any given event may be highly related to the events that precede it, while roughness models calculate the roughness of isolated vertical sonorities. The surprising fact is not the results of Dibben's study but the implied expectations that: a) a measure of 'smoothness' could correlate with multidimensional, highly temporal, and context-dependent notions such as stability or completeness, and b) any model of consonance / dissonance should map to stability / tension responses.
It appears that the concept of consonance (even more than that of timbre - see Bregman, 1990: 93) has been a 'wastebasket' for all kinds of aesthetic and evaluative judgments in music, as well as a treasure box of arguments justifying general stylistic trends or specific compositional decisions.
There are, however, many reasons (other than those posited by Dibben) for the revision of the existing models quantifying roughness, some of which have already been pointed out by other researchers and some of which will be addressed by the present study.
Vos (1986) pointed out a number of inconsistencies in the Plomp & Levelt and the Kameoka & Kuriyagawa models (see note 3), with regard to the critical bandwidth model derived from loudness summation experiments (Zwicker et al., 1957). In his study, Vos suggested some adjustments that would bring the predictions of all three models to a better agreement. Hutchinson & Knopoff's model has been criticized (Bigand et al. 1996) for its relatively crude representation of the nonlinear relationship between the amplitude fluctuation rate corresponding to maximum roughness and the frequency of the lower of the interfering sines.
A recent model (Sethares, 1998) has the advantage of being based on a large number of direct smoothness / roughness experimental ratings of pairs of sines, fitting a function that accounts for the above-mentioned nonlinear relationship. Sethares' model offers the best theoretical fit to the observed relationship between roughness, frequency separation of the two interfering sines, and frequency of the lower sine. In this model, the experimentally derived roughness curves (i.e. graphs plotting the perceived roughness of a pair of sines with equal amplitudes as a function of their frequency separation) are essentially interpreted as positively skewed Gaussian distributions:
$$ y = e^{-b_1 x} - e^{-b_2 x} \qquad \text{Eq. (1)} $$
where x represents an arbitrary measure of the frequency separation (f2 - f1), while b1 & b2 are the rates at which the function rises and falls. Using a gradient minimization of the squared error between the experimental data (averaged over all frequencies) and the curve described by Eq. (1) gives: b1 = 3.5 and b2 = 5.75. For these values, the curve maximum occurs when x = x* = 0.24, a quantity interpreted as representing the point of maximum roughness. To account for the non-linearity in the relationship between the fluctuation rate corresponding to maximum roughness and the frequency of the lower sine, Sethares introduced the following modification, which includes the actual frequency spacing (f2 - f1) and the frequency of the lower component (f1) in the calculation of roughness (R):
$$ R = e^{-b_1 s (f_2 - f_1)} - e^{-b_2 s (f_2 - f_1)} \qquad \text{Eq. (2)} $$
where b1 = 3.5, b2 = 5.75, x* = 0.24, and s = x* / (s1 f1 + s2).
The parameters s1 and s2 allow the function to stretch / contract with changes in the frequency of the lower component so that the point of maximum roughness always agrees with the experimental data. A least-squares fit gave s1 = 0.0207 and s2 = 18.96.
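As a minimal sketch, the frequency-dependent roughness curve can be evaluated numerically. The code assumes the commonly cited form of Sethares' parameterization, R = exp(-b1 s (f2 - f1)) - exp(-b2 s (f2 - f1)) with s = x* / (s1 f1 + s2), together with the constants quoted above. The function name is illustrative, and the sketch covers only the equal-amplitude case.

```python
import math

# Constants quoted in the text for Sethares' fit.
B1, B2 = 3.5, 5.75          # rates at which the curve rises and falls
X_STAR = 0.24               # point of maximum roughness on the normalized axis
S1, S2 = 0.0207, 18.96      # least-squares stretch / contract parameters


def sethares_pair_roughness(f1: float, f2: float) -> float:
    """Roughness curve value for two equal-amplitude sines (assumed form).

    Assumption: s = X_STAR / (S1 * f_low + S2), where f_low is the
    frequency of the lower sine in Hz.
    """
    f_low, f_high = min(f1, f2), max(f1, f2)
    s = X_STAR / (S1 * f_low + S2)      # scaling tied to the lower frequency
    separation = f_high - f_low         # frequency separation in Hz
    return math.exp(-B1 * s * separation) - math.exp(-B2 * s * separation)


if __name__ == "__main__":
    # Print the curve above a 440 Hz sine for a few separations.
    for separation_hz in (0, 10, 30, 100, 300):
        value = sethares_pair_roughness(440.0, 440.0 + separation_hz)
        print(f"{separation_hz:4d} Hz -> {value:.4f}")
```

The function is symmetric in its two arguments and returns zero for a unison, so it can serve as the frequency-dependent factor in a pairwise roughness calculation.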
iii Drawbacks of existing models - Introducing a new roughness estimation model
An inaccuracy that Sethares' model shares with all earlier ones regards the expected contribution of the amplitudes of the interfering sine-pairs (and therefore of the degree of amplitude fluctuation of the resulting complex signal; see note 4) to the degree of roughness. The roughness function is usually multiplied by the product of the two amplitudes (A1 · A2), ensuring minimum roughness if either of the amplitudes approaches zero. At the same time, however, this weighting severely overestimates the increase in roughness with increasing amplitudes and, most importantly, fails to capture the relationship between the amplitude difference of two sines close in frequency and the salience of the resulting beats or roughness.
Terhardt (1974a) examined experimentally the influence of modulation depth (m) and sound pressure level (SPL) on the roughness of amplitude modulated tones, as well as the relationship between the roughness of amplitude modulated tones (modulation frequency = fmod, modulation depth = m = 1) and the roughness of tone pairs that result in amplitude fluctuations of the same rate (f2 - f1 = fmod) and degree (A1 = A2.) By manipulating modulation depth, Terhardt essentially attempted to link the degree of amplitude fluctuation to roughness. As shown in a different study (Vassilakis, in preparation), however, modulation depth and degree of amplitude fluctuation are not quantitatively equivalent. Therefore, the functions that describe the above relationships (as revealed by Terhardt) need to be adjusted accordingly:
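The mismatch between an amplitude-product weighting and the depth of the resulting fluctuation can be seen with a few numbers. The sketch below uses the definition of the degree of amplitude fluctuation given in note 4, (Amax - Amin) / Amax. The function names are illustrative, and taking the absolute value of the amplitude difference (so that the argument order does not matter) is a convenience assumed here.

```python
def degree_of_amplitude_fluctuation(a1: float, a2: float) -> float:
    """(Amax - Amin) / Amax for the sum of two sines with amplitudes a1, a2."""
    a_max = a1 + a2
    a_min = abs(a1 - a2)    # assumption: order-independent form of A1 - A2
    if a_max == 0:
        return 0.0          # two silent components: no fluctuation
    return (a_max - a_min) / a_max


def amplitude_product(a1: float, a2: float) -> float:
    """Weighting used by the product-based models discussed above."""
    return a1 * a2


if __name__ == "__main__":
    for a1, a2 in [(1.0, 1.0), (0.5, 0.5), (1.0, 0.1), (2.0, 0.1)]:
        daf = degree_of_amplitude_fluctuation(a1, a2)
        prod = amplitude_product(a1, a2)
        print(f"A1={a1}, A2={a2}: fluctuation degree={daf:.3f}, product={prod:.3f}")
```

Doubling the stronger amplitude from 1.0 to 2.0 while the weaker stays at 0.1 doubles the product, yet the fluctuation degree roughly halves, which is the kind of divergence the paragraph above describes.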
a) The power function that describes the relationship between degree of amplitude fluctuation (m) of an AM tone and perceived roughness (R) is adjusted from Terhardt's (c: constant) to:
b) The contribution of SPL to the sensation of roughness of AM-tones (Eq. (4)) is negligible, especially when compared to the contribution of the degree of amplitude fluctuation (Eq. (3)):
(c: constant) Eq. (4)
c) The roughness of a beating tone pair (f1, f2; A1, A2), , is related to the roughness of an AM tone (;, ), , as follows:
Eqs. (3), (4) & (5) illustrate that all existing models calculating the roughness of pairs of sines (Plomp & Levelt, 1965; Kameoka & Kuriyagawa, 1969a & 1969b; Hutchinson & Knopoff, 1978; Sethares, 1998) have largely underestimated the importance of amplitude fluctuation depth (i.e. relative amplitude values; see note 5), while overestimating the importance of SPL (i.e. absolute amplitude values.)
Combining Eqs. (2), (3), (4), & (5) gives the new model for the calculation of the roughness, R, of pairs of sines (with frequencies f1 & f2, amplitudes A1 & A2 , and zero initial phases):
where b1 = 3.5, b2 = 5.75, s = x* / (s1 f1 + s2), x* = 0.24, s1 = 0.0207, & s2 = 18.96.
The roughness of complex spectra with more than two sine components is calculated by adding up the roughness of the individual sine-pairs. Although von Békésy (1960: 350-351) has suggested that the total roughness can be less than the sum of the roughness contributions of the individual sine-pairs, depending on the relative phase of the respective amplitude fluctuations, initial experiments indicated otherwise, confirming previous experimental results (Terhardt, 1974a.)
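The pairwise summation can be sketched as follows. The pair-roughness function is passed in as a parameter, so any two-sine model can be plugged in. The names `total_roughness` and `Partial` are illustrative, and `count_pairs` is a placeholder used only to demonstrate the summation, not a roughness model.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple

Partial = Tuple[float, float]   # (frequency in Hz, amplitude)
PairModel = Callable[[float, float, float, float], float]


def total_roughness(partials: Sequence[Partial], pair_roughness: PairModel) -> float:
    """Sum a two-sine roughness model over every unordered pair of partials."""
    return sum(
        pair_roughness(f_a, a_a, f_b, a_b)
        for (f_a, a_a), (f_b, a_b) in combinations(partials, 2)
    )


def count_pairs(f1: float, a1: float, f2: float, a2: float) -> float:
    """Placeholder 'model' that returns 1 per pair (illustration only)."""
    return 1.0


if __name__ == "__main__":
    # Six harmonic partials with sawtooth-like 1/n amplitudes.
    spectrum = [(256.0 * n, 1.0 / n) for n in range(1, 7)]
    # With the placeholder model the total equals the number of pairs.
    print(total_roughness(spectrum, count_pairs))
```

A spectrum with N components yields N(N - 1)/2 sine-pairs, so a six-component tone contributes 15 terms and a two-tone interval built from such tones contributes 66.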
iv Testing and application of the new roughness estimation model:
'Roughness degrees and harmonic interval consonance / dissonance ratings'
It is argued that a) Eq. (6) is able to capture subtle variations in roughness degrees and b) in the Western musical tradition where sensory roughness is in general avoided as 'dissonant,' the consonance hierarchy of harmonic intervals corresponds to subtle variations in roughness degrees. More specifically, two hypotheses are postulated: a) dissonance ratings match the roughness degrees estimated by the model and determined by listeners and b) for musicians within the Western musical tradition, roughness ratings of harmonic intervals within the chromatic scale match the dissonance degrees suggested by Western music theory.
Experimental design - procedure - analysis:
The thirteen harmonic intervals of the chromatic scale (unison and octave included) within the octave above middle C (fundamental frequency: 256 Hz) constituted the experimental stimuli. The intervals were constructed out of synthesized complex tones with six components each and static, sawtooth spectra. The frequency components were shifted slightly away from a harmonic relationship. The sawtooth and slightly detuned spectra were chosen to introduce 'naturalness' to the stimuli. Based on previous studies (von Békésy, 1960; Terhardt, 1974a), the low-frequency / low-level beating caused by the detuning was not expected to influence the roughness ratings.
In the initial stage of experiments each interval was presented binaurally to musically trained subjects through headphones. One group of subjects was asked to rate the stimuli on a 'roughness' scale, outlined either by the adjectives 'rough' - 'not rough' or by two comparison stimuli spanning an appropriate roughness range. A second group of subjects was asked to rate the stimuli on a 'dissonance' scale, outlined by the adjectives 'dissonant' - 'not dissonant'. No comparison stimuli were included in this case, since the goal was to get at the assumed cultural associations of the terms 'consonance' and 'dissonance.' Subjects were able to listen to the stimuli as many times as needed before making their decision. Preliminary analysis of the results indicates that the roughness ratings correlate with the roughness of the stimuli as estimated by the proposed model (Eq. (6)) and that the responses of the first group of subjects correlate with those of the second group.
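A rough sketch of how such stimulus spectra might be tabulated is given below. Several details are assumptions, since the text does not specify them: the intervals are tuned in equal temperament, the sawtooth spectrum gives the n-th component an amplitude of 1/n, and the detuning rule (a fixed number of cents per component index, zero by default) is purely hypothetical.

```python
from typing import Dict, List, Tuple

Partial = Tuple[float, float]   # (frequency in Hz, amplitude)


def complex_tone(f0: float, n_components: int = 6,
                 detune_cents: float = 0.0) -> List[Partial]:
    """Six-component tone with a sawtooth-like (1/n) amplitude spectrum.

    Hypothetical detuning rule: component n is raised by
    (n - 1) * detune_cents; the paper does not state the actual shifts.
    """
    partials = []
    for n in range(1, n_components + 1):
        freq = n * f0 * 2 ** ((n - 1) * detune_cents / 1200)
        partials.append((freq, 1.0 / n))
    return partials


def chromatic_interval(semitones: int, f_low: float = 256.0,
                       **tone_options: float) -> List[Partial]:
    """Spectrum of a harmonic interval above f_low (equal temperament assumed)."""
    f_high = f_low * 2 ** (semitones / 12)
    return complex_tone(f_low, **tone_options) + complex_tone(f_high, **tone_options)


# Thirteen intervals: unison (0 semitones) through octave (12 semitones).
stimuli: Dict[int, List[Partial]] = {k: chromatic_interval(k) for k in range(13)}

if __name__ == "__main__":
    for k, spectrum in stimuli.items():
        print(k, [round(f, 1) for f, _ in spectrum])
```

Each entry of `stimuli` is a list of twelve (frequency, amplitude) pairs that could be passed to a pairwise roughness summation.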
v Summary - conclusions
All existing models quantifying roughness have demonstrated limited predictive power due, for the most part, to:
With the roughness calculation model introduced by Sethares (1998) (see Eq. (2) above) as a starting point, a new model has been proposed (Eq. (6)), which includes a term that accounts for the correct contribution of the amplitudes of interfering sines to the roughness of the resulting complex tone. This term is based on existing experimental results (Terhardt, 1974a, von Békésy, 1960) with an additional adjustment that accounts for the important quantitative difference between amplitude modulation depth and degree of amplitude fluctuation (Vassilakis, in preparation.) The roughness of complex spectra with more than two sine components is calculated by adding up the roughness of the individual sine-pairs.
The final model has been tested experimentally and has been applied to the testing of a hypothesis linking the consonance hierarchy of harmonic intervals within the Western chromatic scale to variations in roughness degrees. Analysis of the pilot data indicates that, for isolated harmonic intervals, the proposed roughness estimation model agrees well with observation.
1) The beating and roughness sensations associated with certain complex tones are essentially understood in terms of sine-component interaction within the same critical band. However, studies (von Békésy, 1960: 577-590; Plomp, 1966) examining the beating and roughness of mistuned consonances (i.e. sine-pairs with frequency ratio slightly removed from a small integer ratio) indicate that these sensations arise even when the added sines are separated by frequencies much larger than the critical bandwidth. The experimental results of von Békésy and Plomp challenge earlier explanations of this phenomenon that were based on the nonlinear creation of combination tones (Helmholtz, 1885: 197-211) or harmonics (Wegel & Lane, 1924, in Plomp, 1966: 463; Lane, 1925) inside the ear. Although their final interpretations differ, both studies link the beating and roughness sensations of mistuned consonances directly to the complex signal's amplitude-fluctuations.
2) Changes in the rate of amplitude fluctuation exploit the differences not only between the beating and roughness sensations but also between various degrees of roughness. Depending on the rate of fluctuation, three 'shades' of roughness have been distinguished (von Békésy, 1960: 354.) Approximately 45 fluctuations per second give roughness of an intermediate character, lying between that of slower rates ("R" character) and that of higher rates ("Z" character.)
3) The Plomp & Levelt (1965) model underestimates roughness because of its bias against a power function for roughness, while the Kameoka & Kuriyagawa (1969a&b) model overestimates roughness because of its bias for a power function for roughness. The fact that some sort of power function (although not exactly the one relating amplitude to loudness) is called for is supported by the relationship between the mechanisms associated with the sensations of roughness and loudness (von Békésy, 1960: 344-350.)
4) If two sines with different frequencies f1 and f2 and amplitudes A1 and A2 (A1 ≥ A2) are added together, the amplitude of the resulting signal will fluctuate between a maximum (Amax = A1 + A2) and a minimum (Amin = A1 - A2) value. The degree of amplitude fluctuation (Daf) is defined as the difference between the maximum and minimum amplitude values relative to the maximum amplitude value. So Daf = (Amax - Amin) / Amax = 2A2 / (A1 + A2).
5) Hutchinson & Knopoff assumed a linear relationship between degree of amplitude fluctuation and roughness, while all other models completely omitted the degree of amplitude fluctuation from their calculations.
References
von Békésy, G. (1960). Experiments in Hearing. New York: Acoustical Society of America Press (1989.)