The Perceptual Organisation of Tones in a Free Field
Martine Turgeona) and Albert S. Bregman
Auditory Laboratory, Psychology Department, McGill University
1205 Dr. Penfield Avenue, Montreal, Canada, H3A 1B1
This research was presented as part of a Ph.D. thesis submitted to the Psychology Department of McGill University. Funding was provided in part by a grant from the Natural Sciences and Engineering Research Council of Canada (NSERC), and in part by a team grant to A.S. Bregman and R.J. Zatorre.
a) The first author, Martine Turgeon, is currently affiliated with the Institut de Recherche et Coordination Acoustique/Musique (IRCAM), Perception et Cognition Musicales. Reprints are available from Martine Turgeon at IRCAM, 1 Place Igor-Stravinsky, 75004, Paris, France.
Issues of interest
The perceptual organization of complex tones depends on the detection of biologically relevant cues in the acoustic signal, such as those providing evidence for a common spatial location of the components of a sound source, and those reflecting spectro-temporal regularities typical of causally related sounds, such as simple harmonic ratios and temporal synchrony. There is evidence that the auditory system perceptually groups sounds that share a common fundamental frequency (McAdams and Smith, 1990) or a common spatial location of their sources (Kidd et al., 1998). Furthermore, Turgeon and Bregman (1999) have shown that the fusion of noise bursts in a free field is promoted by temporal synchrony. Though the contribution of many specific cues to auditory grouping has been established empirically (reviewed by Darwin and Carlyon, 1995), their interaction is poorly understood, especially in a free field. It is important to study grouping in the context of many interacting cues, since in real-world situations no grouping cue acts in isolation. The present study was conducted in a free field with a semi-circular array of speakers to examine how the spatial separation of sound sources interacts with two of the most robust cues for the grouping of concurrent tones: harmonicity and temporal synchrony.
Rationale of the Rhythmic Masking Release (RMR) paradigm
We used the RMR paradigm (Turgeon and Bregman, 1996) to study the relative contribution of onset asynchrony, deviations from simple harmonic ratios, and the spatial separation of sources to the segregation of concurrent brief tones. In this RMR study, a rhythm was perceptually masked by embedding identical tones irregularly among the regular tones. The rhythm is camouflaged because no acoustic property distinguishes the regular subset of tones from the irregular one. We refer to the irregular tones as "maskers"; though they do not mask the individual tones, they mask their rhythmic sequential organization. "Captor" tones can be added in different critical bands simultaneously with the irregular maskers. These tones release the rhythm from masking when they are completely simultaneous with the maskers (Turgeon and Bregman, 1996); that is, temporal coincidence fuses them perceptually. The newly formed masker-captor units have emergent properties, such as a different timbre and a new pitch; this distinguishes the irregularly-spaced components from the regularly-spaced ones. The accurate perception of the rhythm is thus contingent upon the fusion of the irregular maskers and captors. Measuring the listener's ability to identify the embedded rhythm therefore provides an estimate of the degree of perceptual fusion of the maskers and captors. We manipulated the spatial, spectral and temporal relations between the maskers and captors to see how their fusion was affected by these factors, using a two-alternative forced-choice task in which one of two rhythms was embedded in the sequence.
Objectives and hypotheses
Relative contribution of a common onset and offset, F0, and location of source to fusion. One of the objectives of the study was to assess the relative importance of auditory-grouping cues by creating competition among them. For instance, suppose that the masked-rhythm sequence and the captors are presented in different speakers. While the common relation to a fundamental frequency (F0) and the common speaker location of the masked-rhythm tones should promote their sequential grouping, the temporal coincidence and common F0 between the maskers and captors should promote their simultaneous grouping. If the common spatial location and frequency relations within the sequence overcome the effects of temporal synchrony, the rhythm should remain perceptually masked; on the other hand, if temporal synchrony and a common F0 (among spatially and spectrally distributed components) win the competition, the maskers and captors should fuse perceptually and the rhythm should be heard clearly.
We expected simultaneity of onset and offset to make a much greater contribution to the fusion of complex tones than either their harmonic relations or their separation in space. This expectation was based on the high ecological validity of temporal coincidence for the perception of components as a single event, as well as on the empirical evidence showing its powerful effect on the fusion of components (reviewed by Darwin and Carlyon, 1995; Turgeon, 1999). Despite the importance of simple harmonic ratios for pitch perception (Hartmann, 1988), we did not expect harmonicity to have a strong effect on the fusion of our brief tonal stimuli. This expectation was based on recent results showing that harmonicity only weakly affects the diotic and dichotic fusion of the same stimuli over headphones (chapter 4 in Turgeon, 1999). The weakness of the harmonicity effect was attributed to the short duration of the tones (i.e., 48 ms); the tones typically used to study the effect of harmonicity on fusion range from one to several hundred milliseconds in length.
Past results suggest that the perceptual organization of sounds is influenced by the spatial separation of sound sources (Kidd et al., 1998). However, a recent RMR experiment (Turgeon and Bregman, 1999), which presented noise-burst stimuli in a free field, showed that presenting them in different speakers only weakly affected their fusion, compared to when they came from the same speaker. Moreover, up to an angular separation (Δθ) of 180 degrees between the sources, Δθ was not sufficient for the full segregation of synchronous or slightly asynchronous bursts, and the magnitude of Δθ did not affect the strength of the fusion. This weak effect can be contrasted with the strong effect of stimulus onset asynchronies (SOAs) of 36 and 48 ms, which fully segregated the maskers and captors at all Δθ's (from 0 to 180 degrees). The weak effect of Δθ, compared to SOA, might be related to temporal coincidence being a more robust cue than a common location in space for sound-source determination. Unlike reflected light, sound travels around and through rigid surfaces (in this respect, sounding objects behave like transparent ones). As a consequence, in estimating the point of origin of a sound (i.e., the spatial location of the vibrating source), echoes may suggest more than one point of origin. We believe that echoes and reverberation present the auditory system with a degraded signal, so that spatial information is often unreliable. Given these ecological considerations and the results of our earlier free-field study with noise bursts (Turgeon and Bregman, 1999), we expected Δθ to have only a weak effect on cross-spectral fusion.
Temporal limits for event perception. Another objective was to evaluate the minimum temporal deviation from perfect synchrony that triggers the perception of concurrent tones as separate sound events. Onset asynchrony was expected to have a powerful effect on cross-spectral grouping, because it is a highly reliable cue for the segregation of sound-producing events. In a natural context, sounds coming from different environmental sources are likely to have some degree of temporal overlap; however, it is unlikely that they happen to be perfectly coincident in time. Given the adaptive value of detecting deviations from perfect coincidence, an empirical question of interest was to estimate the physical range of tolerance for the perceived simultaneity of sound events. Past research in this laboratory addressed this issue by estimating how far concurrent sounds could deviate from onset and offset synchrony before they were perceived as separate events 75% of the time (Turgeon, 1999). This SOA threshold for perceiving separate events was estimated to be between 28 and 35 ms when brief complex tones were presented diotically and dichotically over headphones (chapter 4 in Turgeon, 1999). In that study, we estimated individual SOA thresholds within each of four conditions: diotic and dichotic presentation of maskers and captors, either harmonically related or not. When they were presented dichotically, which induced a difference in perceived lateralization, the SOA required to segregate them was 12 ms lower than when they were presented diotically. This was true for both harmonic tones (40 vs. 28 ms) and inharmonic ones (38 vs. 26 ms), a lower threshold indicating less fusion of the maskers and captors.
However, whether or not the tones shared a common F0 had little influence on the SOA threshold (the SOA value required to segregate them). A difference in F0 caused only a 2-ms difference in mean SOA thresholds, and the standard errors overlapped. Turgeon (1999) concluded that dichotic presentation, but not harmonicity, influenced the temporal disparity between concurrent tones that was needed for their perception as separate sounds.
The present experiment examined whether similar temporal limits hold for the presentation of the same stimuli in a free field. We did not expect harmonicity to have a significant effect on SOA thresholds, though it might affect them weakly. Assuming that the earlier observed effects of dichotic presentation had acted through differences in perceived lateralization, we expected that larger angular separations of maskers from captors in a free field should diminish SOA thresholds.
Subjects. The listeners were 18 adults who were naive to the purpose of the experiment. All had normal hearing in the 250-8000 Hz frequency range, as assessed by a short air-conduction audiometric test.
Stimuli. Stimuli were synthesized and presented by a PC-compatible 486 computer, which controlled a Data Translation DT 2823 16-bit digital-to-analog converter. The output rate was 20000 samples per second. Signals were low-pass filtered at 5000 Hz, using a flat-amplitude (Butterworth) response with a roll-off of 48 dB/octave. Listeners sat at the center of a semi-circular array of 13 speakers, each one meter away from the listener. The speaker array was situated in the sound-attenuated chamber of Dr. Zatorre at the Montreal Neurological Institute. The listener's head was fixed so as to point in the direction of the central speaker of the array. The RMS intensity level was the same for all the four-partial tones; it was calibrated to equal that of a 1000-Hz tone presented at 60 dB SPL at the central position of the listener's head, that is, at the center of the array of speakers, one meter away from all of them. When temporally overlapping tones were presented in two different speakers (a four-harmonic tone in each speaker), the RMS level was the same at each speaker.
Two rhythmic patterns were to be discriminated by the listeners. Each was repeated to form a sequence that had a total duration of 9.5 seconds, was composed of 15 tones, and had a tempo of 1.7 tones per second. The two rhythms were different temporal arrangements of a short 384-ms inter-stimulus interval (ISI) and a long 768-ms one. Rhythm 1 repeated an alternation of short, long, short, long ISI's three and a half times. This gave rise to perceptual grouping of tones by pairs. Rhythm 2 repeated a cycle of short, long, long, short ISI's three and a half times; this gave rise to perceptual grouping of tones in which triplets alternated with a single tone. Both rhythms started and ended with an alternation of a short and a long ISI. To perceptually camouflage each rhythm, irregular maskers were interspersed among the rhythmic tones. The rhythms had a constant temporal density of one irregular masker for each 192-ms interval; there were thus two maskers in the short 384-ms ISI and four in the long 768-ms one. The variability in the distribution of irregular intervals was the same in all conditions, including the no-captor controls. There was no overlap between the rhythmic and masking tones.
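The ISI structure of the two rhythms and the masker density can be sketched as follows (a hypothetical reconstruction for illustration only: the irregular timing jitter of the maskers is randomized in the experiment and not specified here, and ISIs are treated as onset-to-onset spacings for simplicity):

```python
# Hypothetical sketch of the two rhythms' ISI structure and masker density.
# Names and the onset-to-onset simplification are assumptions, not the
# authors' implementation.
SHORT, LONG = 384, 768  # inter-stimulus intervals in ms

# 3.5 repetitions of each four-ISI cycle give the 14 ISIs between 15 tones;
# both rhythms start and end with a short-long alternation.
RHYTHM_1 = [SHORT, LONG, SHORT, LONG] * 3 + [SHORT, LONG]
RHYTHM_2 = [SHORT, LONG, LONG, SHORT] * 3 + [SHORT, LONG]

def tone_onsets(isis, start=0):
    """Onset times (ms) of the rhythmic tones, spacing onsets by each ISI."""
    onsets = [start]
    for isi in isis:
        onsets.append(onsets[-1] + isi)
    return onsets

def masker_count(isis):
    """One irregular masker per 192-ms interval: 2 per short ISI, 4 per long."""
    return sum(isi // 192 for isi in isis)
```

Under these assumptions, both rhythms contain 15 tones and the same number of maskers, consistent with the claim that no acoustic property distinguishes the regular subset from the irregular one.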
In any condition, the same spectrum was used for the rhythmic and masker tones: the same four harmonics of 300 or 333 Hz, of equal intensity. Together they formed the masked-rhythm sequence, which was presented in isolation for the no-captor control conditions. In all the other conditions, captor tones were added; they were composed of four harmonics either of the same F0 as the maskers or of a different F0. The four possible combinations of maskers and captors were: odd and even harmonics of a 300-Hz F0; odd and even harmonics of a 333-Hz F0; odd harmonics of a 333-Hz F0 and even harmonics of a 300-Hz F0; odd harmonics of a 300-Hz F0 and even harmonics of a 333-Hz F0. For each of these combinations, there were two versions: one in which the maskers (and the rhythm) had the high pitch (even harmonics of 300 or 333 Hz) and the captors the low pitch (odd harmonics of 300 or 333 Hz), and the other in which they were interchanged.
Each tone was 48 ms long, including an 8-ms onset and offset. The captors could be either simultaneous with the maskers or delayed from them by 12, 24, 36 or 48 ms. The maskers and captors were of the same duration; hence, each onset asynchrony was accompanied by an offset asynchrony of the same duration. The amount of temporal overlap between the maskers and captors varied from a full 48-ms overlap to no overlap. The asynchronous maskers and captors were aligned in phase during their period of overlap, so that the positive peaks of their waveforms coincided at the period of their common F0. The masked-rhythm sequence and the irregular captors were presented either in the same central speaker or in two different speakers equally distant from the central speaker. The speakers could be off center by 30, 60 or 90 degrees; these relative positions of the sources of the maskers and captors yielded three angular separations (Δθ): 60, 120 and 180 degrees. For each Δθ, the presentation of the masked rhythm and captors on each side of the array was counterbalanced across trials.
Procedure. The subjects had to judge which of the two rhythms was embedded in the sequence and to rate how clearly it was heard on a 5-point scale. After each trial, feedback about the accuracy of rhythm identification was provided. There was a short training session. Listeners were told that they would hear a warning tone followed by one of the two rhythms that they had previously heard in isolation. They were instructed to direct their attention to the location of the speaker that had sent the warning tone and to tell which of the two rhythms was played. The two isolated rhythms (without captors) were played randomly at each of the 13 possible speakers until the listeners reached the criterion of 10 correct identifications in a block. This was followed by a practice session which randomly presented each combination of SOA, Δθ and harmonicity. This session allowed the listeners to become familiar with the task and to hear the variations across the conditions, so as to better use the full range of the rating scale. During the experiment proper, a 1000-Hz warning tone was played in the speaker of the masked rhythm so that listeners could direct their attention to the location of that rhythm. The listeners' heads remained fixed even though their attention was directed to speakers at different locations.
Computation of scores
Measure of rhythm sensitivity and response bias. Different accuracy measures were derived from listeners' responses: d' scores, proportion-correct scores (PC) and weighted accuracy (WA). WA weights the rated accuracy by the clarity of the identified rhythm. For brevity, this short paper focuses on d' scores, occasionally reporting PC scores. The d' scores and the response-bias parameter c were evaluated according to standard procedures (Macmillan and Creelman, 1991). The d' scores measured sensitivity to Rhythm 1. In terms of Z (i.e., the inverse of the normal distribution function), d' is defined as Z(H) - Z(F), where H is the proportion of hits (i.e., Rhythm 1 is reported when it is physically present) and F is the proportion of false alarms (i.e., Rhythm 1 is reported when Rhythm 2 is physically present). In Z-score units, c is given by 0.5 × [Z(H) + Z(F)]. A standard table of the normal distribution was used to convert H and F to Z-scores (Macmillan and Creelman, 1991).
When listeners cannot discriminate at all between the two rhythms (i.e., chance-level performance), H = F and d' = 0. On the other hand, perfect accuracy implies an infinite d'. To avoid infinite values in the computation of d', proportions of 1 and 0 were converted into 0.999 and 0.001, respectively. Proportions of 0.999 and 0.001 yield d' values of 6.18 and -6.18. However, a lower value of d', namely 4.65, is usually considered the effective ceiling (Macmillan and Creelman, 1991); it is obtained when H = 0.99 and F = 0.01. As for response bias, a positive c indicates a higher tendency to respond Rhythm 1, and a negative c a higher tendency to respond Rhythm 2. A mean bias parameter c close to zero is thus considered indicative of the absence of a systematic response bias for a given subject.
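The d' and c computations described above can be sketched in a few lines of Python, using the standard-normal inverse CDF in place of a printed table (the function name and clipping interface are illustrative; the clipping values follow the text):

```python
from statistics import NormalDist  # standard normal, Python >= 3.8

def dprime_and_c(hits, false_alarms, floor=0.001, ceil=0.999):
    """Sensitivity d' = Z(H) - Z(F) and bias c = 0.5 * [Z(H) + Z(F)],
    using this paper's sign convention for c. Proportions of 0 and 1
    are clipped to avoid infinite Z-scores."""
    z = NormalDist().inv_cdf
    h = min(max(hits, floor), ceil)
    f = min(max(false_alarms, floor), ceil)
    return z(h) - z(f), 0.5 * (z(h) + z(f))
```

For example, H = 0.99 and F = 0.01 give the effective ceiling d' of about 4.65 with zero bias, and the clipped extremes (H = 1, F = 0, converted to 0.999 and 0.001) give d' of about 6.18, matching the values quoted above.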
Estimates of the asynchrony threshold for perceiving separate events. To estimate the magnitude of stimulus onset asynchrony (SOA) required for the perception of concurrent sounds as separate events, we determined the 75% SOA threshold from psychometric Weibull functions for the individual listeners (Weibull, 1951). Separate SOA thresholds were evaluated for the eight different spectro-spatial relations of this experiment (harmonic and inharmonic conditions for each Δθ). For each of the eight Δθ-by-harmonicity conditions, the mean goodness of fit of the data (as measured by r, the Pearson correlation coefficient) was equal to or larger than 0.87.
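A minimal sketch of such a threshold estimate, assuming the common two-alternative forced-choice Weibull form (chance 0.5, ceiling near 1.0); the paper does not give its exact parameterization or fitting procedure, so the grid-search fit and parameter ranges below are illustrative assumptions only:

```python
import math

def weibull_pc(soa, lam, k):
    """2AFC Weibull psychometric function: PC rises from 0.5 toward 1.0
    as SOA (ms) grows, with scale lam and slope k (illustrative form)."""
    return 0.5 + 0.5 * (1.0 - math.exp(-((soa / lam) ** k)))

def fit_weibull(soas, pcs):
    """Least-squares fit of (lam, k) by brute-force grid search
    (assumed ranges: lam 10-80 ms, k 0.5-10)."""
    best = None
    for lam in [l / 2 for l in range(20, 161)]:
        for k in [j / 4 for j in range(2, 41)]:
            err = sum((weibull_pc(x, lam, k) - p) ** 2
                      for x, p in zip(soas, pcs))
            if best is None or err < best[0]:
                best = (err, lam, k)
    return best[1], best[2]

def soa_threshold_75(lam, k):
    """SOA at which PC = 0.75: solving 1 - exp(-(t/lam)^k) = 0.5
    gives t = lam * (ln 2)^(1/k)."""
    return lam * math.log(2.0) ** (1.0 / k)
```

Given a fitted scale λ and slope k, the 75% point is λ(ln 2)^(1/k), since PC = 0.75 exactly when the Weibull term equals one half.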
Description of the main trends in the results
No-captor controls and measures of bias. The no-captor controls yielded a mean PC of 0.54 (SE=0.03) and a mean d' of 0.26 (SE=0.12); these are close to the chance levels of 0.5 for PC and 0 for d'. This verifies that the rhythm was perceptually masked in the absence of captors. The results also verified that there was no bias for one rhythm over the other. For the conditions with captors, the mean response-bias parameter c for the 18 listeners was -0.03 (SE=0.05); for their no-captor counterparts, the mean c across individuals was -0.009 (SE=0.09). Given mean c values very close to zero and the small standard errors (SE), we concluded that response bias did not diminish the power of our statistical comparisons.
Effect of stimulus onset asynchrony (SOA). For each of the eight Δθ-by-harmonicity conditions with no temporal asynchrony, rhythm-identification performance was at the ceiling value, namely a PC of 0.99 and a d' of 4.65, for each listener. It thus seems that temporal coincidence caused frequency components to be perceptually fused, whether they were harmonically related or not, and whether they came from the same location or from spatially separated sources 60, 120 or 180 degrees apart.
There was a clear monotonic decrease of d' with SOA [p < 10⁻⁵]. This powerful effect of SOA upon the fusion of the maskers and captors is consistent with past results found in the laboratory for the diotic and dichotic fusion of the same tonal stimuli presented over headphones, as well as for the fusion of brief noise bursts in a free field (Turgeon and Bregman, 1996, 1999). From the mean SOA thresholds estimated for the eight Δθ-by-harmonicity conditions, an SOA between 26 and 37 ms (i.e., the range extending from one SE below the lowest mean threshold found in the present experiment to one SE above the highest) seems to trigger the perception of concurrent brief tones as separate events. This is in good agreement with the estimated 23-to-42-ms range for the perception of the same tones as separate events when they were presented over headphones (chapter 4 in Turgeon, 1999).
The 25-to-40-ms range of the mean SOA thresholds for the diotic, dichotic and free-field segregation of brief tones agrees with the literature on auditory grouping, reviewed by Darwin and Carlyon (1995), showing that an SOA of 30 to 40 ms is required to remove a partial from contributing to the overall timbre, to the lateralization and to the vowel identity of a complex sound. There is a close correspondence between the magnitude of the SOA leading to the perception of separate sounds and that for the computation of their emergent properties, since timbre, vowel quality and lateralization are properties of perceptually segregated sound events. It is worth noting that this does not seem to apply to all perceptual properties of sounds. For instance, the SOA needed to prevent a partial from entering the computation of the pitch of a complex tone, estimated as 300 ms by Darwin and Ciocca (1992), is an order of magnitude higher than our estimated 30-ms SOA for event perception. This discrepancy between the temporal limits for pitch and event perception may be related to differences in their underlying neural mechanisms (Brunstrom and Roberts, 2000).
Effect of harmonicity and of the spatial separation of sound sources (Δθ). There was a weak but consistent effect of harmonicity in promoting the fusion of asynchronous masker and captor tones, as measured by d' scores for each listener [p < 0.01]. The spatial separation of the sources (Δθ) did not affect rhythm sensitivity at all, as estimated by d' [p > 0.1]. The d' scores are compatible with the highly consistent mean SOA thresholds found across the different spatial and spectral relations, as shown in Figure 1. The mean thresholds all fell between 28.5 ms (for a Δθ of 120 degrees and different F0's) and 34.2 ms (for a Δθ of 180 degrees and the same F0). Note that a higher threshold indicates more fusion, since a larger asynchrony is required to perceptually segregate the maskers from the captors. From this figure, it is clear that the effect of harmonicity was weak and that Δθ had no effect on fusion; still, harmonicity slightly affected the temporal disparity needed for the perception of separate events: a mean SOA of 32.4 ms (SE=3 ms) for harmonic stimuli, versus 30.1 ms (SE=3.9 ms) for inharmonic ones. This 2-ms difference between the mean SOA threshold estimates for harmonic and inharmonic tones corresponds to that found for their presentation over headphones (chapter 4 in Turgeon, 1999). The present results suggest that only spectro-temporal regularities matter for the cross-spectral segregation of concurrent brief tones in a free field, SOA making by far the greatest contribution.
Figure 1: Mean SOA thresholds across individual listeners for different spectral and spatial relations between the masker and captor tones having the same F0 (harmonic) or different F0s (inharmonic), at four angular separations of their sources in a semi-circular speaker array. Standard errors (SE) are indicated.
Temporal coincidence, and deviation from it as induced by onset and offset asynchrony, was by far the most important factor in the perception of short-duration tones as one sound or two. Whereas masker and captor tones fused into a single masker-captor event when they were synchronous, they were segregated as two distinct events when separated by an SOA of about 30 ms. Strong fusion was clearly shown by the perfect rhythm-identification performance at 0-ms SOA (PC of 0.99). On the other hand, clear segregation was shown by the low performance at 36-ms (mean PC of 0.70) and 48-ms SOA (mean PC of 0.67). Intermediate SOAs of 12 and 24 ms produced ambiguous cases of grouping, in which the maskers and captors were neither fully fused nor fully segregated. This ambiguous grouping might be linked to the inherent temporal constraints of the auditory system due to short-term adaptation of the auditory-nerve fibers (Kiang et al., 1965). As a result of the 10-to-20-ms period that it takes for an onset-sensitive neuron to return to its baseline activity, there might be a minimum temporal disparity required for the system to distinguish two consecutive sound events that are temporally contiguous. This is the situation when two sounds are close together in time and separated by a brief period of silence, as in the detection of a temporal gap, or when they are temporally overlapping, as in our RMR studies. The hypothesis that short-term adaptation imposes a limit on the temporal resolution of sound events at different places in the spectrum is consistent with the estimated minimum 30-ms disparity needed to detect a gap across the spectrum, i.e., an offset-to-onset interval (Formby, Sherlock and Li, 1998), and to detect an onset-to-onset and offset-to-offset disparity across the spectrum in our RMR studies.
The presence or absence of a common F0 does not seem to play an important role in the segregation of brief concurrent tones, as shown by the small differences in PC and d' obtained for harmonic and inharmonic maskers and captors. Furthermore, Figure 1 shows that it only weakly affected the temporal disparity needed for their segregation as separate events. This is consistent with the results found for the presentation of the same stimuli over headphones (chapter 4 in Turgeon, 1999). Further experimentation should attempt to determine whether the weak role of harmonicity in the fusion of short-duration sounds is related to differences between the temporal limits for the segregation of sounds as separate events and those for the computation of their pitch (Darwin and Carlyon, 1995).
In this study, the angular separation of the sources (Δθ) did not yield any difference in fusion, whether fusion was estimated from d' or from SOA thresholds based on PC scores. This contrasts with the results of research in which the same sounds were presented over headphones (chapter 4 in Turgeon, 1999). It might be that dichotic separation is more efficient for sound segregation because it is an extreme case of interaural differences for simultaneous sounds: the stimulation from one sound is delivered to one ear only, while that from the other sound(s) is delivered to the other ear only. Free-field testing is more akin to real-world situations, in which each of many individual sounds stimulates both ears, though at slightly different times and intensities, allowing for the computation of the location of each sound source. When drawing conclusions about the contribution of spatial disparities, one should not consider dichotic presentation as reflecting ecologically valid differences in sound-source locations. Even when two sound sources are close to different ears, a sound coming from one of them usually stimulates both ears, albeit with larger binaural differences in intensity and time of arrival than if the sources were closer to the midline axis. For this reason, the separation of sources in a free field is considered more representative of the true contribution of spatial separation to sound-source segregation. This contribution seems to be very weak when two sound sources are simultaneously active. It is also worth noting that steady-state sounds were used in the present study. Tones fluctuating in amplitude might permit spatial differences to cause segregation, especially with longer tones. This remains to be investigated empirically.
An important implication of this research is that when brief complex tones occur at the same time, sound-source segregation ("how many" individual sources are perceived) is independent of sound-source separation ("where" individual sources are relative to each other in the immediate environment). This is consistent with the claim that localization ("where") entails segregation ("how many"), but not the reverse. To localize a source, that source has to be perceived in the first place. For instance, if the bark of a dog or the sound of an unknown animal is heard as coming from a precise location, its source has to be segregated from the other environmental sources, whether or not it is identified. However, a source can be perceived and identified without being localized. Everyone has, at some point, experienced hearing a familiar sound distinctly without being able to tell exactly where it was coming from. A similar reasoning holds for pitch: pitch is a property of a perceptually segregated sound; nevertheless, a sound can be segregated without having a pitch, as happens when a brief click without a definite pitch is perceived. Sound segregation is such a basic property of audition that one might expect the system to compute it even in the face of ambiguity in the signal (e.g., as to "where" it comes from).
Summary of conclusions
The use of the RMR paradigm, which creates ambiguous auditory figures, allows for the evaluation of the relative importance of auditory-grouping cues in sound-source determination. It has shown that: i. temporal coincidence is sufficient for the perceptual fusion of short-duration tones; ii. an onset-to-onset disparity of between 28 and 35 ms segregates them as separate sound events; iii. spectral regularities, such as simple harmonic ratios, weakly affect the degree of fusion at intermediate temporal disparities of 12 to 24 ms; and iv. the fusion of short-duration tones that are spectrally non-overlapping appears to be independent of the angular separation of their sound sources. The short tones used in this study may have been responsible for these results. Whether these conclusions apply to sounds of longer duration and to other types of complex sounds (e.g., speech sounds) awaits further experimentation.
Brunstrom, J.M., and Roberts, B. (2000). Separate mechanisms govern the selection of spectral components for perceptual fusion and for the computation of global pitch. J. Acoust. Soc. Am., 107, 1566-1577 .
Darwin, C. J., and Carlyon, R. P. (1995). Auditory grouping. In B. C. J. Moore (Ed.), Hearing: Handbook of perception and cognition (2nd ed., Vol. 6, pp. 387-424). London: Academic Press.
Darwin, C. J. and Ciocca, V. (1992). Grouping in pitch perception: Effects of onset asynchrony and ear of presentation of a mistuned component. J. Acoust. Soc. Am., 91, 3381-3390.
Formby, C., Sherlock, L. P., and Li, S. (1998). Temporal gap detection measured with multiple sinusoidal markers: Effects of marker number, frequency, and temporal position. J. Acoust. Soc. Am., 104, 984-998.
Hartmann, W. M. (1988). Pitch perception and the organization and integration of auditory entities. In G. W. Edelman, W. E. Gall and W. M. Cowan (Eds.), Auditory function: Neurobiological bases of hearing (pp. 623-645). New York: John Wiley and Sons.
Kiang, N. Y.-S., Watanabe, T., Thomas, E. C., and Clark, L. F. (1965). Discharge patterns of single fibers in the cat's auditory nerve (Research Monograph No. 35). Cambridge, MA: MIT Press.
Kidd, G., Mason, C., Rohtla, T. L., and Deliwala, P. S. (1998). Release from masking due to spatial separation of sources in the identification of nonspeech auditory patterns. J. Acoust. Soc. Am., 104, 422-431.
Macmillan, N. A. and Creelman, C. D. (1991). Detection Theory: a User's Guide. Cambridge, MA: MIT Press.
Turgeon, M. (1999). Cross-spectral grouping using the paradigm of rhythmic masking release. Doctoral dissertation, McGill University, Montreal, Quebec, Canada.
Turgeon, M., and Bregman, A. S. (1996). "Rhythmic masking release": A paradigm to investigate the auditory organization of tonal sequences. In Proceedings of the 4th International Conference on Music Perception and Cognition (ICMPC) (pp. 315-316).
Turgeon, M., and Bregman, A. S. (1999). Rhythmic masking release II: Contribution of cues for perceptual organization to the cross-spectral integration of concurrent narrow-band noises in a free field: asynchrony, correlation of rapid intensity changes, frequency separation and spatial separation. Unpublished manuscript, Dept. of Psychology, McGill University, Montreal, Quebec, Canada. Submitted to J. Acoust. Soc. Am.
Weibull, W. A. (1951). A statistical distribution function of wide applicability. J. Appl. Mech., 18, 292-297.