Chapter 4
Theme 4: Central Auditory Function
Principal Representatives: Quentin Summerfield, Alan Palmer, Mark Haggard

Progress report

Structured abstract

(i) Objectives of the theme: We seek to understand the processes of auditory analysis by which complex sounds, including speech, are discriminated and interpreted, particularly when presented together with competing sounds. One emphasis has been on the role of binaural hearing. A recent second emphasis has been on the development of enabling technologies and techniques to realise the potential of functional magnetic resonance imaging (fMRI) as an additional tool for investigating auditory perception.

(ii) Scientific progress: (a) We have used a combination of psychophysical methods and computational auditory modelling to investigate and describe the processes involved when listeners with normal hearing: (i) attempt to separate sound sources on the basis of differences in interaural timing, (ii) detect, localise, and discriminate "dichotic" pitches, (iii) integrate monaural and binaural evidence of the spectral structure of speech sounds in noise, and (iv) detect changes in the interaural correlation of signals. (b) We have established a collaboration with the Magnetic Resonance Centre, Department of Physics, University of Nottingham, to use fMRI to investigate central auditory function, and have overcome many of the technical difficulties in presenting high-fidelity calibrated sounds in the acoustically and magnetically hostile environment of an MR scanner. (c) We have developed and evaluated techniques for obtaining brain images at points separate in time from the acoustic stimulus, so that activation depends on the analysis of the stimulus, not on the cognitive effort in separating it perceptually from scanner noise. (d) We have laid the ground for future projects by measuring brain activation when subjects detect frequency differences in stationary and modulated sounds, detect changes in the intensity of simple and complex tones, and lipread continuous speech.

(iii) Specific scientific achievements: (a) We have shown that perceptual grouping of sounds by common interaural timing differs from grouping by common harmonicity, despite the strong likelihood that analyses of lateral position and pitch both involve temporal correlation. (b) We have demonstrated that listeners’ sensitivity to the across-frequency profile of interaural decorrelation can be modelled by independent multi-channel equalisation-cancellation, and that this model has the emergent property of accounting for an important subset of the dichotic pitches. (c) We have developed a system for the safe delivery of high-fidelity calibrated sounds to subjects during fMRI; copies of the apparatus are to be supplied to four other groups in the UK. (d) We have developed a technique – "sparse volume acquisition" – that obtains images at the maximum and minimum of the haemodynamic response to an acoustic stimulus, but at moments that do not overlap with the stimulus, and have demonstrated that it is better able to reveal activation by the stimulus than conventional continuous imaging.

(iv) Implications of the work for improving health, or health care, and increasing wealth: (a) Demonstrations of auditory strategies for separating signals from noise have indirect influence on the design of speech recognisers and procedures for speech enhancement which have the potential to benefit hearing-impaired people. (b) An understanding of central auditory function and its breakdown in pathology will be of value eventually in the assessment and rehabilitation of central pathologies affecting the auditory system.

(v) Collaboration with industrial, international, or charitable organisations, including external funding: The imaging component of the theme has been conducted in collaboration with the Magnetic Resonance Centre, Department of Physics, University of Nottingham. The recurrent costs are part-funded by one component of a Special Project Grant awarded to the Department of Physics by MRC.

Introduction

This theme was originally titled "Auditory Perception" and embraced studies of perceptual acclimatisation to amplified sound and studies of binaural hearing. Studies of acclimatisation are now described in Theme 5. Studies of binaural hearing are described below in the first half of this progress report. The new title reflects a future of broader and deeper scope linked more to neuroscience, but one which builds on past techniques and discoveries. The second half of this progress report describes the first steps in this reorientation – the development of technologies and techniques required for research on hearing using fMRI. These developments are one of three strands of work being conducted using functional imaging. The other two – studies of the functional architecture of the auditory cortex and applications of fMRI to address fundamental questions in audition and speech perception – have also made useful progress, but can be described more economically along with the future proposals to which they have led. The strategic justification for the new direction is given chiefly in Section 4 of Chapter 0.

4.1 Binaural processing of multiple sound sources

(Summerfield, Culling, Akeroyd)

Our overall aim has been to understand the auditory processes which allow listeners to attend selectively to one source of sound when several sources are present simultaneously. In much of the work, the sources are a pair of voices. A computational solution to this version of the problem of auditory selective attention would have obvious applications in automatic speech recognition and in aids to hearing. The immediate benefits, however, are the insights which the work generates into basic processes in hearing. The fact that processes of, for example, pitch perception and localisation work successfully when more than one source of sound is present both generates informative experimental leverage and constrains accounts of those processes. The project has had two specific goals: initially to understand the contribution of binaural hearing to separating sound sources, and latterly to understand how binaural hearing plays its essential role of establishing where sources are located when more than one source is present.

4.1.1 Source segregation

In 1995, we reported the counter-intuitive discovery [4.06], subsequently corroborated (Hukin and Darwin, 1995), that listeners cannot group energy in frequency regions that display the same interaural time difference (ITD) and segregate it from energy in other regions that display a different ITD. In this respect, ITD differs from harmonicity [4.14, 4.15], where across-frequency grouping by common fundamental periodicity is a robust phenomenon. The result is surprising because ITD is the primary cue to lateralisation, and listeners have the strong sense that lateral separation of sources facilitates selective attention. The role of ITD is rationalised by the demonstration that processes which determine where a source is located operate after processes of grouping have determined what there is to locate (e.g. Hill and Darwin, 1996).

The question arises of what auditory analyses underpin the classical finding of the Binaural Intelligibility Level Difference (BILD), wherein speech intelligibility is sustained at a lower signal-to-noise ratio (SNR) in random noise if the noise and the speech are presented with different interaural timing. We have demonstrated that listeners are sensitive to the profile of interaural decorrelation across frequency and can interpret its spectral features in a similar fashion to the features of a conventional (monaural) excitation pattern [4.19]. When a complex sound, such as a vowel, is presented at an adverse SNR but with interaural timing different from a competing noise, features of the spectrum of the vowel which are obscure in the excitation pattern can be found in the profile of interaural decorrelation, and vice versa. The profile offers an additional cue that contributes to the BILD. We have developed a computational model which performs equalisation-cancellation independently in each frequency channel to generate a measure (a "recovered spectrum") that is closely related to the profile [4.06, 4.10]. Briefly, (a) The signals presented to the left and right ears are analysed with matched gammatone filterbanks. (b) Mechanical to neural transduction is simulated by a model of an inner hair-cell (Meddis, 1986), yielding waveforms representing the instantaneous probability of neural discharge. The following processes are then applied in corresponding pairs of channels with the same centre frequency from the left and right ears. (c) One member of the pair is delayed with respect to the other. The waveforms are subtracted, point for point. The summed absolute remainder after subtraction is noted. (d) The process of delay and subtraction is repeated for delays ranging from -5 ms to +5 ms. (e) The minimum remainder over this range of delays contributes the amplitude of the recovered spectrum at the centre frequency of the channels.
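
The per-channel core of steps (c)-(e) can be expressed compactly. The sketch below is illustrative only: it assumes that steps (a) and (b) have already been applied, so that `left` and `right` hold the simulated discharge-probability waveforms for one matched pair of channels, and the function and variable names are ours, not those of the implementation in [4.06, 4.10].

```python
import numpy as np

def recovered_amplitude(left, right, fs, max_delay_s=5e-3):
    """Steps (c)-(e) for one pair of channels: delay, subtract, and take
    the minimum summed absolute remainder over internal delays of +/-5 ms."""
    max_lag = int(round(max_delay_s * fs))
    remainders = []
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = left[lag:], right[:len(right) - lag]
        else:
            a, b = left[:len(left) + lag], right[-lag:]
        # Correlated waveforms cancel, leaving a small remainder;
        # decorrelated waveforms do not, leaving a large one.
        remainders.append(np.sum(np.abs(a - b)))
    # The minimum remainder contributes the amplitude of the recovered
    # spectrum at this channel's centre frequency.
    return min(remainders)
```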

The minimum remainder is closely related to the degree of interaural decorrelation because correlated waveforms cancel leaving small remainders, while decorrelated signals do not cancel and so leave large remainders. Two of the assumptions of the model appear surprising: that the internal cancelling delays can (a) exceed the ecological range of ITDs, and (b) be different in different frequency channels. The first assumption is not implausible, given the existence of units in the inferior colliculi of guinea pigs with best delays of 2 ms [2.09]. The second is required for equalisation-cancellation (E-C) to measure interaural decorrelation, and is validated by the model’s success in accounting for the Fourcin Pitch (Section 4.1.2).
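
The relationship between remainder and correlation can be made explicit for an idealised case. For zero-mean signals l and r of equal power \(\sigma^2\) and interaural correlation \(\rho\) at the equalising delay, the expected squared remainder after point-for-point subtraction is

\[ E\big[(l - r)^2\big] = 2\sigma^2\,(1 - \rho), \]

so the remainder grows monotonically as the correlation falls. This idealisation ignores the hair-cell non-linearity and so should be read as a guide to the model's behaviour, not as part of it.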

The thrust of subsequent work has been to justify this "modified Equalisation-Cancellation" (mE-C) account and the concepts that underpin it in three ways: by demonstrating its generality through its power to explain other phenomena (Section 4.1.2), by justifying the idea that listeners can access both the profile of interaural decorrelation and the (conventional) profile of spectral amplitude (Section 4.1.3), and by measuring and characterising the shape and duration of the "temporal window" through which interaural correlation is assessed (Section 4.1.4).

4.1.2 Dichotic pitches

The generality of the mE-C model is shown by its ability to account for a sub-set of the "dichotic pitches". Dichotic pitches arise from interaural interaction between broad-band noises. They are not heard when either ear is stimulated alone. They can be grouped in three classes. Class I contains the Huggins Pitch (Cramer and Huggins, 1958), the Multiple Phase Shift (MPS) pitch (Bilsen, 1977), the Fourcin Pitch (Fourcin, 1970), and the Binaural Edge Pitch (BEP) (Klein and Hartmann, 1981). These pitches are heard easily by the majority of listeners. Each generates a recovered spectrum containing peaks at the harmonics of the fundamental [4.10, 4.11]. Class II contains the Binaural Coherence Edge Pitch (Hartmann, 1984). It is necessary to invoke central lateral inhibition to explain this pitch; its recovered spectrum contains no peak. Class III contains the dichotic repetition pitch (DRP) (Bilsen and Goldstein, 1974) which is heard by only a subset of listeners and even then with some difficulty. This pitch does not give rise to a recovered spectrum and has been hypothesised to arise through weak interaural neural "cross-talk". In summary, therefore, the mE-C model provides a coherent account of the perceived frequencies of the pitches in Class I. That result in turn helps consolidate the view that the other two classes arise through different mechanisms. Overall, the work demonstrates that the dichotic pitches, in the past regarded as interesting but obscure phenomena located somewhat outside mainstream interest in binaural hearing, can now receive an explanation that is compatible with other thinking about the mechanisms of binaural hearing. The future proposals contain plans to combine central lateral inhibition with other processes to produce a unified computational model of the pitches in Classes I and II (Section 4.3.6).

4.1.3 Integrating monaural and binaural evidence of vowels

We have measured the ability of listeners to combine evidence of the spectrum of a vowel derived from the profile of interaural decorrelation (termed "binaural" evidence) with evidence derived from the excitation pattern (termed "monaural" evidence) [4.19]. Listeners identified 2-formant vowels where one formant was defined by a band of interaural decorrelation and the other was defined by a peak in excitation. They could perform this task, demonstrating that integration is possible. Integration is not mandatory because listeners could attend selectively to evidence presented in either domain, insofar as they could identify a vowel defined in the profile of decorrelation that was presented simultaneously with a vowel defined in the excitation pattern, and vice versa. Such powers of selective attention are limited, however. It was necessary to introduce an additional cue for segregation – an onset asynchrony between the monaural and binaural vowel – for listeners to identify the lagging vowel as accurately as when it was presented in isolation. By preference, therefore, listeners do not segregate evidence in the two domains, suggesting that it converges on a common representation. Such convergence is beneficial, given that the profile of interaural decorrelation generally supplements, rather than replaces, the profile of monaural excitation when listeners attend to speech in noise.

4.1.4 The binaural temporal window

A model of a listener’s response to interaural decorrelation must include a temporal weighting function over which decorrelation is computed. Traditionally, single-sided exponential functions have been used to model the temporal integration of interaural parameters. However, such functions do not describe monaural temporal integration well, suggesting that they might also be deficient in binaural modelling. Accordingly, we measured the temporal weighting function (the "binaural temporal window") through which evidence of interaural decorrelation is assessed perceptually [4.09]. Our method was an analogue of the gapped-noise technique for measuring the monaural temporal window (Moore et al., 1988). Subjects detected an Sπ tone in the N0 segment of an Nπ-N0-Nπ noise as a function of the duration of the N0 segment. The resulting window was well fitted by an asymmetric two-sided Gaussian function with an equivalent rectangular duration (ERD) of 120 ms. We have incorporated this function in our modelling.
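
Such a window can be stated in a few lines. In the sketch below the two standard deviations are illustrative values, not the fitted parameters from [4.09]; they are chosen only so that the ERD, computed as the area of the window divided by its peak height, comes out near the reported 120 ms.

```python
import numpy as np

def binaural_window(t_ms, sigma_early=40.0, sigma_late=60.0):
    """Asymmetric two-sided Gaussian: one standard deviation before the
    window centre, a different one after it."""
    sigma = np.where(t_ms < 0, sigma_early, sigma_late)
    return np.exp(-0.5 * (t_ms / sigma) ** 2)

t = np.linspace(-300.0, 300.0, 6001)      # time re. window centre (ms)
w = binaural_window(t)
erd = np.trapz(w, t) / w.max()            # ERD = area / peak height
print(f"ERD = {erd:.0f} ms")              # ~125 ms with these values
```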

We then assessed the accuracy with which functions of this type could account for performance in a new task, a binaural analogue of gap detection [4.12]. We measured the minimum duration of an interaurally decorrelated segment of noise that could be detected in an otherwise correlated burst. The minimum duration increased from a few milliseconds at 200 Hz to several tens of milliseconds at 1 kHz. The inference that the duration of the binaural temporal window increases with frequency is incorrect, however. When the jnd for interaural correlation is taken into account, along with the increase in the jnd with frequency, the ERD of the window emerges as independent of frequency at about 160 ms – a similar value to our earlier estimate. Individual differences in window duration were smaller than individual differences in either the minimal detectable gap or the jnd for interaural correlation, suggesting that the window measure may reflect a more basic parameter of binaural hearing.

4.1.5 Lateralisation of dichotic pitches

If a narrow band of frequencies is interaurally decorrelated in an otherwise correlated broad-band noise, listeners hear a tone with the frequency of the decorrelated band against the background of the noise. This stimulus is the "generic" dichotic pitch of Class I (Section 4.1.2). When the noise is diotic (N0), the pitch is clearly lateralised for the majority of listeners. This stimulus has the potential to reveal processes by which weak narrowband signals are lateralised in the presence of more-intense broadband signals. Existing models of lateralisation (e.g. Stern and Trahiotis, 1995) were designed to accommodate single sound sources. They correctly predict the lateralisation of the noise, but not the tone. An alternative account of the lateralisation of dichotic pitches (Raatgever and Bilsen, 1986) is based on the idea of a "central spectrum". A central spectrum is created by sampling the cross-correlation functions of a stimulus across frequency at a particular internal delay. The account predicts the existence of a dichotic pitch if the spectrum at any delay contains a sharp peak. The frequency of the pitch is given by the frequency of the peak. The lateralisation of the pitch is given by the internal delay of the spectrum which contains the peak. We have identified several difficulties with this account: it requires unrealistically fine frequency selectivity, it fails to predict the existence of some dichotic pitches, and it predicts the existence of some pitches which are not heard [4.11]. Its limitations as a method for predicting the lateralisation of dichotic pitches arise because there are many acceptable candidate spectra, so the lateralisation is not predicted uniquely.

The starting point for an alternative approach is the observation that a narrow decorrelated band in isolation has a diffuse lateralisation. Only when the band is bounded by a correlated noise does it take on a precise lateralisation. Accordingly, we have developed a model which takes account of the structure of the noise in predicting the lateralisation of a dichotic pitch. It does so by comparing the cross-correlogram of the noise incorporating the decorrelated band with the cross-correlogram of the same noise alone, inferred by interpolation across regions of decorrelation. It predicts lateralisation from the internal delay of the maximum positive difference between the two cross-correlograms. This model correctly predicts: (a) systematic variation with frequency of the lateralisation of the dichotic pitch when the decorrelated band is incorporated in a correlated (N0) noise, (b) constant lateralisation when the decorrelated band is incorporated in an antiphasic (Nπ) noise, and (c) systematic shifts in lateralisation with ITD when interaural decorrelation is created by giving a narrow correlated band a different ITD from the remainder of the noise [4.20]. Result (c) distinguishes the new model from Raatgever and Bilsen’s approach, which predicts that lateralisation is determined by the structure of the noise and so would not change with ITD.
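
The comparison at the heart of the model can be sketched as follows. The fragment assumes that a cross-correlogram (channels x internal delays) has already been computed for the stimulus containing the decorrelated band, and that the channels covering the band are known; the interpolation rule and all names are illustrative simplifications of the implementation in [4.20].

```python
import numpy as np

def predict_lateralisation(ccg_stim, band_channels, delays_ms):
    """Compare the stimulus cross-correlogram with that of 'the same noise
    alone', inferred by interpolating across the decorrelated channels."""
    n_chan = ccg_stim.shape[0]
    intact = np.setdiff1d(np.arange(n_chan), band_channels)
    ccg_noise = ccg_stim.copy()
    for j in range(ccg_stim.shape[1]):
        ccg_noise[band_channels, j] = np.interp(band_channels, intact,
                                                ccg_stim[intact, j])
    # The pitch is lateralised at the internal delay of the maximum
    # positive difference between the two cross-correlograms.
    diff = ccg_stim - ccg_noise
    _, best_lag = np.unravel_index(np.argmax(diff), diff.shape)
    return delays_ms[best_lag]
```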

4.2 Imaging of central auditory function

(Haggard, Summerfield, Palmer, Hall, Akeroyd, Chambers, Bullock, Foster, Gurney, Kornak, in collaboration with Morris, O’Hagan, Bowtell, Elliott)

The application of functional neuroimaging to study audition is more problematic than for other sensory modalities for several reasons: (a) the difficulty of delivering high-fidelity calibrated stimulation in high magnetic fields; (b) the intense sound levels generated by magnetic resonance (MR) scanners; (c) the limited pointers from animal studies towards the function of the auditory cortex; (d) the small size and non-superficial location of the human auditory cortex; and (e) the fact that a great deal of processing occurs within the brainstem and thalamic nuclei, which lie at the margins of the anatomical registration and spatial resolution needed to obtain replicable voxels. We have addressed problem (a) by developing a high-quality sound system for use in MR scanners (Section 4.2.1). We have addressed problem (b) by developing a "sparse" volume acquisition technique, and have analysed the basis of its success (Section 4.2.2). Our approaches to problems (c), (d), and (e) are described in the future proposals.

4.2.1 Enabling technology: the IHR fMRI sound system

(Palmer, Chambers, Bullock, Akeroyd)

The presentation of precisely specified acoustic stimuli during fMRI poses considerable challenges: (a) subjects must be shielded from the sound fields (up to 130 dB SPL) that are generated during image acquisition; (b) devices introduced into the scanner for sound delivery must neither distort the magnetic fields nor be susceptible to electromagnetic induction that would distort the acoustic stimuli. To overcome these problems, many groups deliver sounds through plastic tubes combined with passive attenuation at the ears. However, it is difficult to calibrate tube-phone systems precisely or to correct frequency and phase distortions, which reduces the potential for incorporating noise-cancellation techniques. Nor are tube-phones ideal in terms of fitting, comfort, and sound isolation. The IHR fMRI sound system [4.21] overcomes these problems and delivers low-distortion broad-band sounds (0-22 kHz) over a wide dynamic range, at output levels (currently limited for safety to 90 dB SPL) high enough to be suitable for testing hearing-impaired listeners. The key components are: (a) electrostatic headphones built into ear defenders, without compromising the performance of either component or the homogeneity of the magnetic field; (b) electrical input to the headphones through non-metallic cables within the shielded enclosure of the scanner; (c) remote digital-to-analogue conversion located outside the shielded enclosure, with data transmitted to this point in digital optical form, preventing electrical interference from the power supplies to the scanner; and (d) the delivery of audio signals under the control of a PC whose activity is synchronised with the scanner. The sound system has been improved incrementally as experience with its use has accumulated, both in Nottingham and as a result of tests in scanners in London, Oxford, and Sheffield. The design has stabilised and copies are being constructed for current and potential collaborators in four other research groups.

We are now seeking to increase the attenuation of airborne sound by incorporating active noise cancellation based on adaptive filtering. In the laboratory, we have demonstrated cancellation at the position of a microphone housed within the headset by up to 20 dB at frequencies up to 4 kHz. Psychophysical experiments are confirming that these degrees of physical cancellation are accompanied by reductions in masked thresholds for pure tones. Thresholds fall by 10-15 dB at frequencies below 1 kHz. At higher frequencies little or no benefit is measured, because bone-conducted energy exceeds air-conducted energy, given that the ear defenders around the headphones apply 40 dB of passive attenuation to airborne sounds (Berger, 1986). Thus, a further step will be to improve the sound isolation of the listener in order to reduce bone-conducted energy (Ravicz and Melcher, 1998) and thereby maximise the benefits from the cancellation system. We are also investigating ways of combining the sound system with a surface coil for signal acquisition. For some experiments, the greater resolution attainable will offset the limitation of being able only to measure activation induced by contralateral stimulation. Finally, a simulation of the sound field in the scanner has been set up in a sound-proof room as an inexpensive means of piloting equipment. It will also be used to screen subjects and to familiarise them with stimuli and procedures in order to optimise the use of scanner time.
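
As an indication of the kind of adaptive filtering involved, the sketch below implements a plain least-mean-squares (LMS) update on float arrays. The report does not describe the cancellation system at this level of detail, and a practical in-scanner controller would also need to model the electro-acoustic path from headphone to microphone (a filtered-x variant), so this is a schematic illustration only.

```python
import numpy as np

def lms_cancel(reference, error_mic, n_taps=64, mu=1e-3):
    """Adapt FIR weights so that the filtered reference signal cancels
    the noise observed at a microphone inside the headset."""
    w = np.zeros(n_taps)
    residual = np.zeros_like(error_mic)
    for n in range(n_taps, len(reference)):
        x = reference[n - n_taps:n][::-1]   # most recent sample first
        y = w @ x                           # anti-noise estimate
        e = error_mic[n] - y                # residual heard at the ear
        w += mu * e * x                     # LMS weight update
        residual[n] = e
    return residual, w
```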

4.2.2 Enabling imaging paradigm: "Sparse" volume acquisition

(Haggard, Hall)

The use of fMRI to explore central auditory function may be compromised by the intense bursts of acoustic noise produced by the scanner whenever the MR signal is read out. Others have been aware of this problem, which can distract subjects during any task and also produce auditory masking (Eden et al., in press; Bandettini et al., 1998; Cho et al., 1998), but they have not systematically decomposed its basis experimentally with a view to optimising auditory paradigms. We have developed and evaluated a method which reduces the effect of the scanner noise, and named it "sparse temporal sampling" [4.13]. When using this technique, single volumes of brain images are acquired at times which optimise detection of the activation; i.e. near to the established maxima and minima of the haemodynamic response to the acoustic stimulus of interest, but at points in time that do not overlap with the stimulus. To achieve this, 14-s bursts of continuous speech were alternated with 14-s periods of silence. In Experiment 1, the course of the haemodynamic response was mapped by acquiring a volume of images every 2.33 s. On average, the response peaked 10.5 s after the onset of the speech and had returned to within 10% of baseline after 8.1 s of silence, indicating when single volumes of images can best be acquired to compare activation to speech and silence. To validate the technique, in Experiment 2, single volumes of images were acquired every 14 s (i.e. sparse imaging). Despite a much reduced quantity of MR data, this method successfully delimited broadly the same regions of activation as conventional continuous imaging. However, the mean percentage MR signal change within the cortical regions of interest was greater during sparse imaging, suggesting that sparse imaging is advantageous, as it ensures that the obtained activation depends on the acoustical stimulus alone. Auditory experiments that use continuous imaging may be measuring activation that results from an interaction between stimulus and task variables (e.g. attentive effort) induced by the scanner noise. We now use sparse temporal sampling in the majority of our experiments. The technique will be particularly important in experiments on auditory figure-ground segregation, where we plan to exercise precise control over the cues that distinguish figure from ground (Section 4.3.5).
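
The acquisition schedule that follows from these measurements can be stated in a few lines; the figures are those reported above, and the framing as code is purely illustrative.

```python
# One cycle: 14 s of speech followed by 14 s of silence (28 s in all).
# A single volume is acquired at the end of each 14-s period, so the
# brief readout never coincides with the stimulus: the post-speech volume
# samples the response while still near its maximum (peak ~10.5 s after
# onset), and the post-silence volume samples near baseline.
for k in range(1, 7):
    t = 14.0 * k
    phase = "near-max (end of speech)" if k % 2 else "near-min (end of silence)"
    print(f"acquire volume at t = {t:5.1f} s  ->  {phase}")
```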

Future proposals

Structured abstract

(i) Objectives of proposed research: The aim is to understand the processes of central auditory analysis by which complex sounds, including speech, are discriminated and interpreted, particularly when presented together with competing sounds.

(ii) Design of research: The projects proposed form a progression which will (a) optimise the use of fMRI for studying auditory analysis, (b) define the functional architecture of the auditory cortices in analyses of spectro-temporal complexity, intensity, and the spectral extent of sounds, (c) test hypotheses from psychoacoustics and neurophysiology about the functional interplay between cortical, thalamic, and mid-brain loci.

(iii) Systems to be investigated: The work planned for the first three years will involve neurologically normal right-handed adults with normal hearing. Later, a study may be undertaken involving children.

(iv) Techniques to be used: We shall use fMRI in combination with rigorous psychophysical tasks and constrained stimuli to test specific hypotheses about the discrimination and identification of different classes of sound. In some experiments, subjects will be extensively familiarised with stimuli and procedures; thus, the limits of performance will be known and will be related to patterns of activation. For specific projects, we shall image cortical and sub-cortical loci, employing cardiac gating to reduce signal variability. Where possible, parametric or factorial imaging designs will be used in preference to subtractive designs. Statistical parametric mapping and other techniques will be used to analyse data. Structural equation modelling will be used to analyse effective connectivities between significantly activated voxels. Computational auditory modelling will be used to summarise understanding of auditory processes and to generate predictions.

(v) Measurements to be made/Outcomes to be evaluated: (a) We shall complete an analysis of the population of voxels in auditory cortex, developing maps of three parameters of their haemodynamic response (HDR), validating the maps anatomically by their spatial relationship to blood vessels, and seeking to determine, for example, respects in which early weak HDRs can be more informative of neural activity than the late strong HDRs which may reflect venous drainage. (b) We shall map the cortical response to intensity in spectrally compact and diffuse sounds, determining whether the activation of any sub-set of voxels reflects perceived loudness, and whether any component of individual differences between subjects in activation can be explained by the involvement of the stapedial reflex. (c) We shall use connectivity analysis to test the hypothesis that primary and secondary auditory areas form a functional hierarchy. (d) We shall use a combination of computational modelling, psychoacoustics, neurophysiology, and possibly fMRI, to test the hypothesis that there is a single temporal pitch extractor which is activated by both monaural and dichotic pitch stimuli. (e) We shall image cortical, thalamic, and mid-brain loci while subjects attend to auditory objects which are distinguished from their backgrounds by primitive grouping cues, also using connectivity analysis to establish the target loci of the descending influences involved in auditory selective attention. (f) We shall seek to identify the basis, in loci of activation or effective connectivities between loci, of the large and hitherto unexplained individual differences in the ability to lipread found in the neurologically normal population. (g) At a mid-term stocktaking, we shall decide whether to proceed towards imaging studies accompanying more detailed psychoacoustic studies building on the currently necessary clinical characterisation of the childhood deficit referred to as "Central Auditory Processing Disorder".

(vi) Implications for improving health or health care, or increasing wealth: The ultimate aim is to understand central auditory function and its abnormalities, and to apply that knowledge in the assessment and rehabilitation of patients.

(vii) Proposed collaboration with industrial, international, or charitable organisations, including external funding: The imaging component of the theme is founded on a collaboration with the Magnetic Resonance Centre, Department of Physics, University of Nottingham. This collaboration is part-funded by one component of a Special Project Grant awarded to the Department of Physics by MRC. We expect to collaborate further with other imaging centres on an ad hoc basis.

Introduction

Current understanding of the roles and mechanisms of the mammalian auditory cortex is limited, as summarised in Chapter 2, although one organising principle is well established: frequency is represented topographically across several adjacent areas (Merzenich et al., 1975; Imig and Reale, 1981). This principle has been confirmed in the primary auditory cortex of humans using PET (Lauter et al., 1985) and fMRI (Wessinger et al., 1997a; Talavage et al., 1997). In primary auditory cortex of cat, some topographic organisation has been demonstrated also with respect to other dimensions whose functional significance is less clear – including absolute threshold, sharpness of tuning, and the asymmetry of lateral inhibitory areas (Shamma and Symmes, 1985; Schreiner and Mendelson, 1990; Sutter and Schreiner, 1991; Clarey et al., 1991; Shamma et al., 1993; Kowalski et al., 1995). This uncertainty contrasts with the state of understanding in vision (e.g. Zeki, 1993). Visual cortex has been shown to be functionally parallel, modular, hierarchical, and re-entrant: properties relating to form, motion, and colour are analysed, in the first instance, by separate parallel systems (DeYoe and Van Essen, 1988); cells in V4 are not simply wavelength-selective as in V1 but respond uniformly to a certain colour irrespective of its reflected triplet of energies (Zeki et al., 1991); activity in V5 can selectively enhance activity in V1, V2, and V3 (Hupé et al., 1998). There are, of course, good reasons for expecting differences in functional organisation between the visual and auditory cortices, not least because audition has no peripheral analogue of eye movements and much basic sensory analysis occurs in sub-cortical nuclei. For example, in the cochlear nucleus, parallel pathways convey information relevant to the analysis of localisation, spectral shape, and amplitude modulation (possibly including pitch) (Young, 1998). Those pathways have already converged at the level of the inferior colliculus. Parallel ascending pathways from the dorsal division of the medial geniculate body project to secondary auditory cortex, and unidirectional projections from primary to secondary areas have also been identified (Liégeois-Chauvel et al., 1991; Pandya et al., 1995; Rauschecker et al., 1997). There is then a rich reciprocal innervation between primary auditory cortex and the ventral division of the medial geniculate body (Winer et al., 1977; Clarey et al., 1991), with additional descending influences from primary auditory cortex to the lower brain-stem nuclei. Two conclusions can be drawn from this picture. First, given the analyses undertaken in the brain-stem nuclei, the organisation of the auditory cortices may reflect integrative, modulatory, and redistributive roles, rather than the analysis of basic acoustical features. Second, in seeking parallels with the modular, hierarchical, and re-entrant organisation of the visual cortices, the focus should be on the entire mid-brain/thalamic/cortical system, not just the cortex. For these reasons, we believe that progress in understanding auditory perception using fMRI will be hastened by adopting the following five sub-goals:

(1) The development of techniques which distinguish voxels whose haemodynamic response mainly reflects neurophysiological function rather than venous drainage.  Given the small size of the auditory cortices, there is value in any technique that can help localise the neurophysiological origins of measured activation. Project 4.3.1 has already made progress in this direction.

(2) Stimulus control.  For reasons of technical simplicity, the scope of many auditory imaging studies has been limited to the search for gross differences in function between primary and secondary auditory cortical areas. Their results have been interpreted as indicating that secondary areas respond preferentially to spectro-temporally more complex signals than primary auditory areas do. For example, bilateral areas within superior temporal gyrus activate to 2-octave-wide bands of noise in preference to pure tones (Wessinger et al., 1997b) and a region of the left superior temporal gyrus activates preferentially to modulated tones (our own data). This region may be the same as one in left anterior superior temporal gyrus shown by Thivard et al. (1998) and Boddaert et al. (1998) to activate preferentially to frequency-modulated (FM) tones. All three results require careful scrutiny and illustrate potential interpretative difficulties. One interpretation of Wessinger’s result is that it reflects the activation of the human homologue of an area of superior temporal gyrus in monkeys where single units display preferences for noise bands over tones (Rauschecker et al., 1995). However, we have found that the level of activation in primary and secondary areas increases with an increase in either the frequency extent or the intensity of a stimulus. Thus an alternative interpretation of Wessinger’s result invokes a simple side-effect of tonotopicity. Another possibility is that the result reflects an effect of loudness, given that at fixed SPL the loudness of a pure tone is about half that of a 2-octave band of noise centred on the same frequency (Moore et al., 1997). For these reasons, it is important to distinguish effects of intensity, loudness, and bandwidth, and this will be done in Project 4.3.2. Turning to the effects of modulation, Thivard and Boddaert interpreted their results as identifying an area responsible for the analysis or representation specifically of FM. However, neither their experiments nor our own established whether the area is activated only by FM or by any other change in a stimulus (e.g. amplitude modulation). Nor is it clear whether the secondary areas activate to time-varying stimuli in preference to pure tones specifically or to stationary signals more generally. Project 4.3.3 will distinguish these alternatives. In general, it is unlikely that the "primary-simple, secondary-complex" distinction will prove to be sustainable in the light of two recent demonstrations. First, PET activation has been shown in primary auditory cortex to be correlated with the pitch strength of iterated rippled noise (Griffiths et al., 1998); this stimulus is more obviously classified as complex than simple. Second, reverse correlation has identified clearly complex stimulus dimensions that elicit strong activity in monkey primary auditory cortex (deCharms et al., 1998).

(3) Spatial differentiation among secondary auditory areas.  There is some value in identifying the specific stimulus dimensions which induce activation in broadly defined regions of auditory cortex. There is greater relevance in attempting to localise effects to smaller regions, particularly where the regions are also anatomically distinct. The anatomical delineation of four secondary auditory areas close to primary auditory cortex in the supratemporal plane and the posterior part of the superior temporal gyrus in man (Rivier and Clarke, 1997) suggests that this finer localisation should be possible. The areas are termed the anterior, posterior, lateral, and medial auditory areas. Rivier and Clarke also postulated a fifth area, the anterior insular area, although this area is notably difficult to activate consistently (Griffiths et al., 1994, 1998). The lateral and medial areas are similar in size to primary auditory cortex and should be resolvable using fMRI. This optimism offsets the fact that functional distinctions between the areas are currently less clear than the anatomical divisions between them – when the results of the neuroimaging studies undertaken so far are overlaid onto the areas, different classes of stimuli elicit responses in the same general regions of auditory cortex. Project 4.3.4 will seek to localise activation in the areas and to demonstrate a functional hierarchy among them by the application of stimuli of progressive complexity.

(4) Characterisation of connectivities between areas.  The benefits of finer spatial differentiation of function will be enhanced by the use of statistical procedures for characterising effective connectivities between loci. We have already gained some preliminary experience of the techniques of path analysis and structural equation modelling for quantifying functional interactions between voxel time series. In collaboration with the Department of Psychiatry, University of Nottingham, we are exploring the co-variance structure between significant regions of activity and have derived data-driven statistical models of high-level functional interactions between parietal, frontal, and two levels of auditory cortex. We believe these methods have great promise but must not be prematurely oversold. Problems include model stability, the large number of different ways in which path models can be composed, the degrees of anatomical constraint imposed on models, the removal of temporal autocorrelation, underdetermination of models where there are more unknowns than equations, and the fitting strategy for bi-directional causality. Given the importance of resolving these issues, we have proposed to conduct Project 4.3.4 in collaboration with the national leaders in imaging studies and the appropriate methods – Professor Frackowiak and his colleagues in the Wellcome Department of Cognitive Neurology, London (Friston et al., 1995; Friston et al., 1997; Büchel and Friston, 1997). Connectivity analysis could be highly informative in characterising descending influences during selective attention in project 4.3.5. More generally, it is likely that its major application will lie in describing connectivity differences as a function of task, experience, subject group, etc. We will proceed towards using connectivity analysis in that way, but understanding its properties and limitations in single conditions, or with simple stimulus differences, is an important first step.

(5) Imaging thalamic and mid-brain regions.  A further set of techniques is required to image thalamic and mid-brain loci along with cortical loci. Guimaraes et al. (1998) and Harms et al. (1998) have identified imaging planes that transect (a) the inferior colliculus and the posterior part of Heschl’s Gyrus and (b) the inferior colliculus and the medial geniculate body. It may also be possible to image other mid-brain nuclei, although they are small in man (cochlear nucleus ~24 mm³, superior olivary complex ~12 mm³) in relation to the inferior colliculus (~250 mm³) and medial geniculate nucleus (~84 mm³). These nuclei are all displaced by pulsatile flow of cerebro-spinal fluid through the ventricles during the cardiac cycle. Accordingly, it is necessary to acquire images at moments that are phase-locked to the cardiac cycle, adjusting signal values for inter-image variations that result from variability in interscan time. These steps reduce variability in MR responses to acoustic stimuli (Guimaraes et al., 1998). Techniques for observing CSF flow patterns in human cortex with echo-planar imaging were pioneered in Nottingham (Stehling et al., 1991). With our colleagues in the MR Centre, we shall optimise procedures for imaging sub-cortical loci and apply them first in Project 4.3.5.
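
A minimal sketch of the interscan-interval adjustment implied here, assuming a simple T1 saturation-recovery model: with cardiac gating the repetition time (TR) varies from image to image, so each image is rescaled to the signal it would have shown at a fixed nominal TR. The grey-matter T1 value and all names are illustrative assumptions, not the procedure of Guimaraes et al. (1998).

```python
import numpy as np

def gate_correct(signals, trs_s, tr_nominal_s=2.0, t1_s=1.4):
    """Rescale cardiac-gated image signals for variable interscan times,
    using the longitudinal saturation factor 1 - exp(-TR/T1)."""
    trs_s = np.asarray(trs_s, dtype=float)
    sat_actual = 1.0 - np.exp(-trs_s / t1_s)
    sat_nominal = 1.0 - np.exp(-tr_nominal_s / t1_s)
    return np.asarray(signals, dtype=float) * sat_nominal / sat_actual
```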

Overall plan. In planning projects, we have struck a careful balance between the desirability of implementing new techniques and the need to make some early substantive progress. The eight projects described below meet three criteria: (a) They test hypotheses. (b) Only a small proportion of the total effort is directed to developing new techniques. (c) Although several projects would benefit from two techniques in particular – connectivity analysis and the ability to image sub-cortical loci – none depends crucially upon them. Progress can be made using techniques that are already available to us.

4.3.1 Haemodynamics and neuro-informatics of auditory cortex

(Haggard, Kornak, Hall, in collaboration with O’Hagan)

To further understand and work within the limitations of fMRI data at the spatial, temporal, and neural levels (e.g. no distinction of local inhibitory activity), we propose to contribute to methodological work pushing back those limits. It has been shown (Goodyear and Menon, 1998; Menon et al., 1997) that the dissociation between rapid deoxygenation and delayed increase in blood volume possible at field strengths of 4T permits a physiologically more realistic and detailed image that can, for example, detect ocular dominance regions in visual cortex. Are there other ways, valid at 1.5-3T, to "see past" venous drainage to an enhanced representation of activation, by selecting aspects of the signal "closer" in space, in timing of the hypothetical activity pattern, or in neurovascular feedback terms to the active neurones than to the draining blood vessels? In a population study of voxels in auditory cortex we have obtained preliminary evidence that there may be. We noted that square-wave convolution methods (e.g. Rajapakse et al., 1998) provide a poor fit to the haemodynamic response (HDR) whatever the function used because they are inherently time-asymmetric, and that full modelling is required. Starting from an adaptation of the statistical modelling contribution of Lange and Zeger (1997), we have now shown, first, that constrained 4th- and 5th-order polynomials provide a materially better fit to HDRs than gamma functions do; they can also provide gamma-equivalent descriptive parameters (e.g. the height of the turning point or the area above the baseline) that avoid the typical arbitrariness of polynomial coefficients. Second, the aggregate region-appropriate HDR delay needs to be known to develop efficient paradigms and analyses (Section 4.2.2), but we have gone further to show smooth gradation across in-slice voxels for the delay and width parameters of the HDR; i.e. two further types of map exist. At low signal-to-noise ratios (SNRs), HDR fitting is unlikely to be stable enough for voxel-by-voxel fitting of HDR parameters to improve the estimate of activation; however, at good SNRs it is likely to be helpful in specialised problems such as edge definition. Third, we have shown that correlations between the three maps (height, width, delay) are only modest, implying that they afford independent information. One probable underlying factor is that the late strong HDRs arise from venous drainage. Although easier to detect, these may arise in voxels that are less informative neurophysiologically, as implied by the visual example above.
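
The gist of the comparison can be illustrated schematically. The sketch below synthesises a noisy HDR, fits a gamma function and a 5th-order polynomial, and reads gamma-equivalent descriptive parameters (peak height and delay) off the polynomial fit. The gamma form, the noise level, and the use of an unconstrained polynomial (where the work above used constrained ones) are all simplifying assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def gamma_hdr(t, amp, shape, scale):
    """A simple gamma-shaped haemodynamic response (illustrative form)."""
    return amp * (t / scale) ** shape * np.exp(-t / scale)

t = np.linspace(0.0, 24.0, 49)                     # s after stimulus onset
rng = np.random.default_rng(0)
y = gamma_hdr(t, 1.0, 3.0, 1.8) + 0.05 * rng.standard_normal(t.size)

gamma_pars, _ = curve_fit(gamma_hdr, t, y, p0=[1.0, 3.0, 1.5],
                          bounds=(0.1, 10.0))      # keep parameters positive
poly = np.polyval(np.polyfit(t, y, 5), t)          # 5th-order polynomial fit

peak = np.argmax(poly)                             # gamma-equivalent
height, delay = poly[peak], t[peak]                # descriptive parameters
print(f"peak height = {height:.2f}, delay = {delay:.1f} s")
```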

Four further stages are now envisaged: (a) completing the population distribution description of voxels in the auditory cortex, determining how many HDR parameters provide useful degrees of freedom beyond excitation magnitude, and providing a more complete analysis of the information contained in the phase of negative overshoot ("undershoot"); (b) providing an anatomical validation of some aspects of the HDR parameters; e.g. identifying late strong responding voxels with labelled major vessels; (c) identifying in a straightforward imaging question an instance where it can be shown that the extra information in the HDR is worthwhile, and defining the most effective answer to this question achievable at 3T (probably in more than one scanner) by using all the HDR parameters; (d) integrating voxel magnitudes and the extra variables into a Bayesian decision framework. This has potential advantages because in a classical framework complex and specific hypotheses incur a thresholding penalty against Type 1 errors, whereas expressing at least 1st-order contingencies within the data as a Markov chain (by Gibbs sampling) allows a simple account of the proportion of the data (posterior distributions) displaying the predicted properties. An example problem is the optimal delimitation of the edge of an area of activation. Certain substantive issues (e.g. functional plasticity) require a better formalised means of delimitation for anatomical mapping of activation patterns than double differencing of images allows. If we are successful in (c), we will attempt to generalise and build on the finding so as to use consequentially selected or enhanced representation of the data to shift the balance from haemodynamically to neurophysiologically originating effects, both fixed experimental effects and the random effects upon which connectivity analysis depends. Subject to the quantity and quality of raw data, the path coefficients in structural equation models using such "enhanced" data could end up being smaller but their patterns could carry more meaning for neural information processing.

4.3.2 Cortical representations of loudness

(Summerfield, Haggard, Palmer, Hall, postdoctoral scientist)

Intensity is an essential property of sounds and their sources, and its psychophysical basis is quite well understood; loudness is a function of the auditory excitation induced by a sound, integrated across frequency (Moore et al., 1997). However, the task of identifying the underlying representation of intensity in the human brain is likely to be complex for two reasons: (a) The auditory brain is primarily a pattern analyser; much of its neural matter is probably devoted to analysing spectro-temporal patterns of relative, rather than absolute, intensity. (b) Intensity may be represented by the activation of units which are distributed within volumes containing units subserving other functions (Taniguchi and Nasu, 1993), many of which may show non-monotonic responses to increases in intensity (Schreiner, 1998). Accordingly, the limited spatial resolution of fMRI may restrict the insights which it can offer into the coding of intensity. Nonetheless, Jäncke et al. (1997) have demonstrated that the extent of MR activation in primary and secondary auditory cortices increases as the intensity of consonant-vowel syllables is increased. Our own results, obtained at lower spatial resolution because non-isotropic voxels had to be used, show linear increases in the magnitude, but not the extent, of activation with increases in the intensity of pure and complex tones. However, there were large differences between subjects in both the absolute amount of activation and its pattern of change with intensity, including non-monotonic growth in some subjects. These results prompt four questions:

(1) At the highest resolution available, can a systematic classification be derived of the patterns of activation with increasing stimulus intensity shown by voxels? Is the classification stable within subjects and reproducible across subjects? Specifically: (a) Does the principal contribution to the overall increase in activation come from voxels which display a monotonic increase in activation with increases in intensity, or from the recruitment of voxels with progressively higher thresholds, many of which display saturating responses? (b) Is there a systematic spatial organisation that distinguishes voxels showing a monotonically increasing response, a saturating response, and other non-monotonic responses?

(2) Is there a subset of voxels over which changes in activation, individually or collectively, reflect changes in loudness rather than intensity?

(3) Can a significant component of the individual differences in patterns of activation be explained by differences in the extent to which the stapedial reflex is induced, either by the stimuli or by the sound of the scanner? And does any measure (amount, extent, steepness of the increase in activation with intensity, incidence of the stapedial reflex) relate to individual differences in judgements of the loudness of stimuli?

(4) Do results on the above questions generalise across stimulus class?

We shall address these issues starting with three experiments, using a surface coil to give improved spatial resolution in auditory cortex. Sufficient numbers of subjects will be tested (≥ 10) to generate an indication of the range of individual differences. Stability of results, question (1), will be evaluated by re-testing subsets of subjects both within and across sessions. The approach to question (2) will be based on the following principle: if activation across a set of voxels reflects loudness, then a similar change in activation should be observed for the same change in loudness, irrespective of the spectral content of the signal. Thus, in Experiment 1, the auditory cortices will be imaged while subjects listen passively to 400-ms bursts of either a 1-kHz tone or a Gaussian noise of uniform spectrum level (0-5 kHz), each alternating with 400-ms periods of silence. In separate conditions, the level of the tone will be 50, 60, 70, 80, or 90 dB SPL. The spectrum level of the noise will be 0, 7, 15, 26, or 38 dB/Hz. According to the model of Moore et al. (1997), these values equate the loudness of the noise and tone. We shall search for voxels over which similar increases in activation occur for increases in level of tone and noise separately.
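
For reference, the spectrum levels quoted above can be converted to overall levels by integrating over the 5-kHz bandwidth; the loudness equating itself relies on the Moore et al. (1997) model, which is not reproduced here. A minimal helper:

```python
import math

def overall_spl(spectrum_level_db_hz, bandwidth_hz=5000.0):
    """Overall level of a flat-spectrum noise: N0 + 10*log10(bandwidth)."""
    return spectrum_level_db_hz + 10.0 * math.log10(bandwidth_hz)

for n0 in (0, 7, 15, 26, 38):
    print(f"{n0:2d} dB/Hz  ->  {overall_spl(n0):4.0f} dB SPL overall")
```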

Experiment 2 will ask whether there are voxels whose activation reflects the loudness of a tone in the presence of a masking noise. Loudness and intensity will be dissociated by exploiting the phenomenon of partial masking, wherein the loudness of a tone is reduced by presenting it in a broad-band noise. In Condition 1, a 1-kHz tone will be presented at levels of 36, 57, 72, and 86 dB SPL in noise with a spectrum level of 8 dB/Hz. In Condition 2, the tone will be presented at levels of 60, 70, 80, and 90 dB SPL in noise with a spectrum level of 38 dB/Hz. These values equate the loudness of the tone in the two conditions. We shall compare levels of activation over the voxels identified in Experiment 1 between the tone-plus-noise conditions and noise-alone conditions. If the voxels are coding loudness, then the same increase in activation should occur when tones at corresponding levels are added to the two noises (i.e. raising the tone from 36 to 57 dB SPL in the 8-dB/Hz noise should induce the same increase as raising the tone from 60 to 70 dB SPL in the 38-dB/Hz noise). In a control condition (Condition 3), the tone will again be presented at levels of 60, 70, 80, and 90 dB SPL, but in the 38-dB/Hz noise which will contain a 4-ERB-wide spectral notch centred on the frequency of the tone. The notch will release the tone from partial masking and so largely restore its loudness to the level in Condition 1.

In psychophysical experiments measuring partial loudness, subjects need very little encouragement to attend to the tone. Experiment 3 will check whether the results change under conditions of explicit attention. Subjects will perform an odd-ball discrimination task. Odd-ball stimuli will have a tone duration of 500 ms, in comparison with 400-ms reference stimuli. These durations exceed those over which time-intensity trading occurs in loudness judgements. In this experiment, the noise will be gated on at the same time as the tone. It will be informative to run a further condition in which the tone starts after the notched noise so that its loudness is "enhanced"; psychoacoustic results in humans (Viemeister and Bacon, 1982) predict more activation from an enhanced tone in this further condition than in Condition 3.

The more intense of the stimuli in these experiments, and the sound generated by the scanner, will inevitably trigger the stapedius reflex in most subjects, thereby attenuating the air-conducted input to the cochlea. We shall establish whether any component of the non-monotonic growth of activation with intensity in individual subjects can be explained by the extent to which the reflex is triggered. Technically, it would be possible to monitor the reflex while simultaneously imaging subjects. However, the costs are not yet justified, given that we can use our simulation of the scanner soundfield (Section 4.2.1) to monitor the reflex and its threshold using standard clinical otoadmittance techniques. If these simulations identify informative predictors of the results of the imaging experiments, we can upgrade the methodology to measure reflex input-output functions (Section 8B.1) or middle-ear attenuation measures (Section 3.10). Finally, via a double-sampling design linked to population prevalence by a stratification questionnaire, we will take extreme groups (<10%-ile, >90%-ile) of loudness-tolerant and loudness-averse young subjects, thereby reducing the comorbidity and other biases seen in haphazard clinical sourcing. We will study their physiological (ARTs, ULLs) and psychoacoustic performances (DI, loudness summation, etc.) in the upper half of the dynamic range. If systematic differences emerge, we will seek to demonstrate cortical correlates in fMRI activation parameters (extent, i/o function, etc.). Of interest in its own right, this final stage may also have implications for subject selection in fMRI.

4.3.3 Specific activation of secondary auditory areas by AM, FM, and harmonicity?

(Summerfield, Palmer, Hall, postdoctoral scientist)

We have found that greater activation is induced in left superior temporal gyrus by sinusoidally modulated tones than by pure tones. The finding is compatible with a role of secondary areas in the analysis of more "complex" stimuli, but does not pin-point the features that define complexity, other than variation over time. The result prompts several questions: (a) Does the nature of the modulation matter – for example, do FM and AM activate the same areas? The peripheral auditory response to FM necessarily includes AM within frequency channels. Thus, if the cortex simply reflects the periphery, AM would activate a more restricted region than FM. (b) Does the nature of the carrier matter – for example, does modulation activate the same area when carried on harmonic or inharmonic carriers?

Our preliminary experiment was factorial, with two levels of attention (passive vs. active listening) crossed with two levels of complexity (modulated tones vs. pure tones). We measured more activation in auditory areas with active listening and with modulated signals, but no interaction. There is the potential for interaction when examining effects of type of modulation and carrier, given that listeners routinely experience coherent FM carried on harmonic components, but not on inharmonic components, because speech contains the former but not the latter. Therefore we shall explore the two questions in a pair of factorial experiments, designed to distinguish effects of static from modulated signals, of harmonic from inharmonic carriers, and of FM from AM. Experiment 1 would compare activation by stimuli composed of equal-amplitude components which were either static or coherently sinusoidally frequency modulated, and either harmonically related or inharmonically related. Experiment 2 would compare FM with AM, again carried on harmonic and inharmonic carriers. We would use modulation rates in the 2-10 Hz range that is dominant in the modulation spectrum of speech. Subjects would listen actively, making a button press to infrequent stimuli whose components were 6% higher in frequency than those of the majority of stimuli. If separate regions with preferences for AM and FM were identified, we would then establish whether the regions were also involved in speech perception. For example, we would hypothesise that the AM and FM regions would both be activated by natural speech, but that only the AM region would be activated if the speech was resynthesised as a limited number of amplitude-modulated noise bands (Shannon et al., 1995), and that the FM area would be de-activated by resynthesising the speech on a monotone. These tests would also be relevant to the question, addressed in the next project, of whether the secondary auditory areas form a functional hierarchy.
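
A sketch of the stimulus construction for Experiment 1 follows; the component count, modulation depth, and the jitter rule used to break harmonicity are illustrative choices, not values that will be fixed ahead of piloting.

```python
import numpy as np

def complex_tone(fs=44100, dur=0.4, f0=200.0, n=6, harmonic=True,
                 fm_rate=5.0, fm_depth=0.05, seed=0):
    """Equal-amplitude complex, harmonic or inharmonic, with coherent
    sinusoidal FM (fm_depth=0 gives the static version)."""
    t = np.arange(int(fs * dur)) / fs
    freqs = f0 * np.arange(1, n + 1)
    if not harmonic:
        rng = np.random.default_rng(seed)
        freqs = freqs * rng.uniform(0.85, 1.15, n)   # jitter the components
    x = np.zeros_like(t)
    for f in freqs:
        # Instantaneous frequency f*(1 + depth*sin(2*pi*rate*t)), integrated
        # to phase, so every component shares the same (coherent) modulator.
        phase = (2*np.pi*f*t
                 + (f*fm_depth/fm_rate) * (1 - np.cos(2*np.pi*fm_rate*t)))
        x += np.sin(phase)
    return x / n
```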

4.3.4 Intra-cortical relations in auditory analysis: functional differentiation and connectivity

(Haggard, Hall)

The first aim of this project is to investigate further the relationships between the neuronal activity in primary and secondary auditory regions by measuring the degree of differential activity produced by different stimuli and by computing effective connectivities between regions showing significant activation. In particular, we will explore the relationship between primary auditory cortex and the surrounding regions delineated by Rivier and Clarke (1997). Some of these secondary regions lie close to primary cortex, but can be identified using the Talairach co-ordinate system or individual structural landmarks.

The second aim is to quantify the extent to which hierarchical processing occurs in these regions. The clearest expression of the hierarchical view comes from Rauschecker et al. (1995), who suggest that lateral areas of monkey auditory cortex prefer stimuli of increasing spectro-temporal complexity and ecological relevance. It is not clear how sustainable this concept will prove to be, given the results discussed in the introduction from deCharms et al. (1998) and Griffiths et al. (1998). Nonetheless, human imaging studies demonstrate that stimuli that are spectro-temporally more complex than pure tones induce activation preferentially in secondary areas. Accordingly, we shall first seek to confirm our preliminary findings that path coefficients between primary and secondary areas are positive, then establish whether, later in the path, they are larger for complex than for simple stimuli. This effect should hold most strongly for any secondary regions that show differential stimulus-induced activation to the particular stimuli used.
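
As a minimal illustration of what a path coefficient means in this context, the sketch below estimates the strength of a single directed link between two region-of-interest (ROI) time series by standardised regression. It is a stand-in for the full structural-equation modelling of Buechel and Friston (1997), which estimates all paths jointly under anatomical constraints; the data here are simulated.

```python
import numpy as np

def path_coefficient(source, target):
    """Standardised regression weight of the target ROI time series on the
    source series. For a single link this equals the Pearson correlation;
    a full structural-equation model would solve all paths simultaneously."""
    s = (source - source.mean()) / source.std()
    t = (target - target.mean()) / target.std()
    return float(np.dot(s, t) / len(s))

# Simulated example: a secondary area driven partly by a primary area.
rng = np.random.default_rng(0)
primary = rng.standard_normal(128)
secondary = 0.6 * primary + 0.8 * rng.standard_normal(128)
print(round(path_coefficient(primary, secondary), 2))  # approximately 0.6
```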

We have proposed this project as a collaborative venture with Professor Frackowiak and his colleagues at the Wellcome Department of Cognitive Neurology, London. The exact form of the experiment has yet to be agreed. One option could be to construct stimuli from dichotic pitches (Section 4.1) because the spectral composition of the monaural inputs can be held constant across conditions and so avoid confounding effects of complexity with intensity or spectral extent. For example, stimuli of three levels of complexity could be created from a diotic noise by decorrelating 1-ERB-wide bands centred on 600 Hz (Stimulus 1), 400, 600, and 800 Hz (Stimulus 2) and 200, 400, 600, 800, and 1000 Hz (Stimulus 3). However, we make no commitment to particular stimuli at this stage. Their exact form will be influenced by the results of Project 4.3.3 and by further piloting.
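
The construction of such stimuli is straightforward to sketch. The fragment below creates a diotic Gaussian noise and decorrelates 1-ERB-wide bands at the chosen centre frequencies by substituting independent noise in one ear; substitution is only one of several possible decorrelation methods, and the ERB formula is a standard approximation, not a value taken from this report.

```python
import numpy as np

def erb(f):
    # Standard approximation to the equivalent rectangular bandwidth, in Hz.
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def dichotic_pitch_noise(centres, dur=0.5, fs=44100, seed=1):
    """Diotic broadband noise in which 1-ERB-wide bands at the given centre
    frequencies are decorrelated between the ears by substituting
    spectrally matched independent noise in the right ear only."""
    rng = np.random.default_rng(seed)
    n = int(dur * fs)
    left = rng.standard_normal(n)
    independent = rng.standard_normal(n)
    L = np.fft.rfft(left)
    I = np.fft.rfft(independent)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    R = L.copy()
    for fc in centres:
        band = np.abs(freqs - fc) < erb(fc) / 2.0
        R[band] = I[band]  # interaural correlation falls to zero in the band
    right = np.fft.irfft(R, n)
    return left, right

# Stimulus 2 of the example: decorrelated bands at 400, 600, and 800 Hz.
left, right = dichotic_pitch_noise([400.0, 600.0, 800.0])
```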

4.3.5 Cortical-subcortical interactions during auditory selective attention

(Summerfield, Palmer, Hall, postdoctoral scientist)

Ecological use of hearing requires selective attention. Many acoustic cues have been identified that provide a basis for selective attention because they permit listeners to separate concurrent sounds. We have modelled some of these perceptual processes (e.g. Assmann and Summerfield, 1990) [4.06]. We have also studied the neural signal processing in the mid-brain nuclei that mediates the perceptual analyses (e.g. Palmer, 1990) [2.15]. fMRI offers the potential to image mid-brain loci simultaneously with cortical loci and, through the use of connectivity analysis, to describe ascending and descending influences. The overall aim of this project is to use these techniques to bridge neurophysiological and psychoacoustical understanding of auditory selective attention.

Our approach is motivated by two related observations: (a) the speech of a single talker can be understood when only gross temporal patterning in a few frequency bands is preserved (Shannon et al., 1995); (b) yet it can readily be shown that the intelligibility of such minimal representations is lost when more than one voice is present, demonstrating that spectro-temporal detail is required to separate competing voices. There is sufficient information in the temporal patterns of neural discharges at the level of the cochlear nerve to permit segregation of competing speech sounds on the basis of differences in fundamental periodicity, and at mid-brain sites to permit segregation of tones from noise on the basis of differences in interaural timing. However, this precision is not preserved cortically; cochlear nerve fibres phase lock to frequencies of 3-4 kHz, while cortical units lock to frequencies only as high as 30 Hz. A solution to the problem of how cortically controlled attention might nonetheless exploit temporally detailed grouping cues can be derived from two ideas. First, the necessary temporal analyses are performed early in the ascending auditory pathway. Second, those analyses result in a frequency channel being "labelled" according to its dominant temporal patterning at the periphery. After the labelling stage, temporal detail need not be preserved. Central processes direct attention to the evidence of a particular voice by increasing the gain in channels with the same label and/or decreasing the gain in channels with other labels (e.g. Meddis and Hewitt, 1992). Supportive evidence for the idea that descending projections play specific roles in selective attention comes from vision. De-activation by cooling of neurons in V5 leads to a decrease in V1, V2, and V3 of the magnitude of the response to a bar moving on a stationary background of lower luminance, but to an increase in response when bar and background move coherently (Hupé et al., 1998). In other words, de-activation of V5 reduces figure-ground separation.
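
Observation (a) can be made concrete with a noise vocoder of the kind used by Shannon et al. (1995). The sketch below is a generic realisation rather than the original implementation: the band edges, filter orders, and 16-Hz envelope smoothing are illustrative choices of our own.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(speech, fs, edges=(100.0, 800.0, 1500.0, 2500.0, 4000.0)):
    """Resynthesise speech as a few amplitude-modulated noise bands, so that
    only the gross temporal envelope within each band is preserved."""
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(len(speech))
    env_sos = butter(2, 16.0, btype='low', fs=fs, output='sos')
    out = np.zeros(len(speech))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype='band', fs=fs, output='sos')
        band = sosfiltfilt(band_sos, speech)
        env = sosfiltfilt(env_sos, np.abs(hilbert(band)))  # smoothed envelope
        out += env * sosfiltfilt(band_sos, noise)          # impose on noise band
    return out / np.max(np.abs(out))

# Usage with a hypothetical recording: vocoded = noise_vocode(samples, fs=16000)
```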

An analogous understanding does not yet exist of where in the auditory system the gain of channels is controlled, nor of where in the brain control originates, although imaging studies have provided some clues. Scheich et al. (1998) have identified a region of secondary auditory cortex on the rostral parakoniocortex that may be associated with auditory figure-ground segregation. Neuromagnetic recordings taken while human subjects attend to signals presented to one ear in competition with signals presented to the other ear have been interpreted as evidence that processes of selective attention "can regulate auditory input at or before the initial stages of cortical analysis" (Woldorff et al., 1993). Thus, the loci could be cortical – both the to-be-attended voice and the competing voice could be represented in the primary auditory cortex, possibly in a map of fundamental periodicity against carrier frequency (Langner et al., 1997). Descending influences would attenuate the onward transmission of the unwanted voice from the primary to the secondary cortices. Alternatively, given rich reciprocal innervation, the descending influences could control the gain in channels projecting from thalamic to cortical sites.

This project aims to answer three questions: (a) What cortical and sub-cortical loci are involved in auditory selective attention? (b) What are the connectivities between them? (c) In what ways do the loci and/or the connectivities differ depending on the acoustic cue(s) available for grouping and segregation? Initially, two cues will be studied – fundamental periodicity and onset time – chosen because they are powerful cues whose subjective effects are somewhat different: components that start after other components seem to capture attention automatically, while listeners can direct attention to either of two sources that differ in fundamental periodicity. These differences could be reflected in different patterns of activation and connectivity.

Our eventual aim is to use two of the techniques discussed in the introduction to answer questions (a), (b) and (c): observing activity in cortical, thalamic, and mid-brain loci, and using connectivity analysis to characterise the strength and direction of modulatory influences between loci. Enough is known about the anatomy of the ascending auditory system to usefully constrain such models. We would test sufficient subjects (e.g. N ≥ 12) to develop path models from the data of half the subjects and test their validity with the data from the other half. However, the potential of connectivity analysis to reveal bi-directional influences in the auditory system is yet to be proven, and it may take time to optimise imaging of mid-brain loci. Accordingly, our experiments are intended to generate informative results initially through conventional imaging of cortical and thalamic areas. Their design was influenced by the experiment of Scheich et al. (1998), who confirmed the feasibility of using fMRI to study auditory selective attention, but highlighted the need for a careful choice of stimuli. They subtracted activation by a masker from activation by the combination of the masker with musical notes. They argued that the remaining activation embraced too large a volume of cortex simply to reflect activation by acoustic features of the notes, and must instead reflect processes of figure-ground segregation. This conclusion may be correct, but clearly the design and interpretation are not watertight, as the authors acknowledged.

Related difficulties would be encountered in interpreting results obtained with spoken sentences. For example, regions of activation could be revealed by subtracting activation induced by active listening to sentences in quiet from activation induced by sentences from the same talker plus the speech of a competing talker. However, there would be no guarantee that the difference in activation reflected selective attention, rather than any or all of: (a) additional stimulation from the added voice, (b) additional effort entailed in maintaining performance at a lower signal-to-noise ratio, or (c) additional linguistic processing resulting from breakthrough by the competing voice.

[Figure: schematic spectra of the three stimulus configurations.
(a) A buzz – a series of equal-amplitude harmonics.
(b) A vowel-like sound created by raising the levels of pairs of harmonics straddling formant frequencies.
(c) A vowel-like sound created by mistuning pairs of harmonics (shown in grey) without raising their levels.]

These problems can be overcome by using a simple task and the highly constrained stimuli schematised in the figure above, which illustrates two methods for configuring vowels from a series of equal-amplitude harmonics summed in random phase (Panel (a)). First, in Panel (b), vowels are created by raising the levels of two pairs of adjacent harmonics straddling the frequencies of the first and second formants. Such vowels can be identified without source segregation by locating peaks in the spectral profile of the complex, and then matching the peaks to stored templates. In Panel (c), all harmonics are set to the same level and the vowel is defined by mistuning two pairs of adjacent harmonics (marked in grey) straddling the formants. With this stimulus, grouping based on harmonicity allows the mistuned harmonics to be segregated from the remaining harmonics. Attention can then be directed to the mistuned harmonics, whose spectral profile can be analysed and matched. Note that in Panel (b) there is only a foreground object, whereas in Panel (c) there is both a foreground and a background object. However, the total number of stimulus components, and their sinusoidal nature, remain the same. Additionally, task difficulty can be equated across the conditions by determining the increment level in Panel (b) that gives the same level of performance as the mistuning in Panel (c).
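
A sketch of how the three configurations might be synthesised follows. The harmonic numbers chosen to straddle the formants, the 6-dB increment, and the default 6% mistuning are hypothetical placeholders; the mistuning values actually proposed appear in the next paragraph.

```python
import numpy as np

def harmonic_stimulus(f0=100.0, n=30, formant_pairs=((5, 6), (15, 16)),
                      mode='buzz', level_inc_db=6.0, mistune=0.06,
                      dur=0.5, fs=44100, seed=2):
    """Panel (a) 'buzz', Panel (b) level-increment vowel ('increment'),
    or Panel (c) mistuned vowel ('mistune'). formant_pairs gives the
    harmonic numbers of the pairs straddling F1 and F2 (hypothetical)."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(dur * fs)) / fs
    marked = {k for pair in formant_pairs for k in pair}
    out = np.zeros_like(t)
    for k in range(1, n + 1):
        amp, f_k = 1.0, k * f0
        if k in marked:
            if mode == 'increment':
                amp = 10 ** (level_inc_db / 20.0)  # Panel (b): raised level
            elif mode == 'mistune':
                f_k *= 1.0 + mistune               # Panel (c): mistuned pair
        out += amp * np.sin(2 * np.pi * f_k * t + rng.uniform(0, 2 * np.pi))
    return out / np.max(np.abs(out))
```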

Subjects will be imaged in three conditions. In Condition 1, they will listen actively to a quasi-random sequence of exemplars of five vowel-like sounds (Panel (b)). They will press a button whenever the current vowel is the same phonetically as the preceding vowel. So that the task is not simply one of acoustic matching, the overall level and fundamental frequency of the stimuli will be varied randomly from stimulus to stimulus. The task can be partitioned into four components: pre-attentive auditory analysis (A), spectral-profile analysis (P), vowel classification (V), and memory and comparison (M). In Condition 2, subjects will perform the task with vowels defined by mistuning (Panel (c)). The task components are the same supplemented by selective attention (S). Subtraction of activation in Condition 1 from Condition 2 [(A,S,P,V,M)-(A,P,V,M)] should reveal loci concerned with selective attention based on fundamental periodicity. To maximise the power of the study, the design will be parametric with the percentage mistuning taking the values 3%, 6%, 12% and 24%. Interest will focus on loci that survive subtractive analyses and which show levels of activity that are correlated (positively or negatively) with the degree of mistuning.

Experiment 2 will establish whether the loci activated in Experiment 1 are also activated when the cue for segregation is a difference in onset time. Target vowels will be defined by delaying the onsets of pairs of adjacent harmonics relative to the background harmonics, rather than mistuning them, with parametric variation introduced by varying the onset asynchrony (12.5, 25, 50, and 100 ms). Further experiments will establish whether there are correlates in activation patterns of the heightened sense of segregation that occurs when differences in either harmonicity or onset time are combined with a difference in interaural timing.

Before starting these experiments, we will train subjects in the task and will optimise the stimuli. For example, given that modulated tones induce greater and more widespread activation than static tones, we will establish whether frequency-modulated carriers induce more informative patterns of activation than the static carriers sketched above. In the longer term, we will test the generality of results using natural speech, including conditions in which natural speech is processed selectively to restrict the cues available for grouping and segregation.

4.3.6 A fully temporal account of the perception of dichotic pitches

(Summerfield, Palmer, Akeroyd, Hall)

Our account of dichotic pitches leaves open the question of whether the profile of interaural decorrelation across frequency is represented and interpreted as a mean-rate code or as a temporal code. The mE-C model generates a mean-rate code; i.e. a pattern akin to an excitation pattern. Given that place-time models provide a more coherent account of many monaural pitch phenomena than do place-rate models (e.g. Meddis and Hewitt, 1991; Meddis and O’Mard, 1997), it is important to establish whether a fully temporal account can be developed of the perception of dichotic pitches. We can do that by modifying the mE-C model to store the temporal patterning in each filter channel after cancellation by subtraction. Because cancellation occurs on a within-channel basis, each remainder after subtraction displays the periodicity of the centre frequency of its channel. Accordingly, it is possible to compute the autocorrelogram and pooled autocorrelation function of the remainder signals. We have confirmed the feasibility of this approach by demonstrating that Huggins Pitches, MPS Pitches, and Fourcin Pitches yield peaks in their summary autocorrelation functions at delays corresponding to the reciprocal of the perceived pitch frequency [4.15]. We shall consolidate these results and then test the idea experimentally that a single temporal pitch extractor is implemented in the auditory nervous system, and can be driven both monaurally and binaurally.
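
A minimal sketch of this place-time pipeline is given below: within-channel interaural subtraction (cancellation), autocorrelation of the remainders, and pooling across channels. The equalisation, hair-cell transduction, and lateral-inhibition stages of the full model are omitted for brevity, and second-order Butterworth filters stand in for a gammatone filterbank.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def summary_acf(left, right, fs, centres, max_lag_ms=20.0):
    """Place-time sketch of the modified mE-C model. A pitch at frequency
    f appears as a peak in the pooled function at lag 1/f."""
    max_lag = int(fs * max_lag_ms / 1000.0)
    pooled = np.zeros(max_lag)
    for fc in centres:
        bw = 24.7 * (4.37 * fc / 1000.0 + 1.0)  # 1 ERB, standard formula
        sos = butter(2, [fc - bw / 2.0, fc + bw / 2.0],
                     btype='band', fs=fs, output='sos')
        # Cancellation by within-channel subtraction of the two ears.
        remainder = sosfiltfilt(sos, left) - sosfiltfilt(sos, right)
        # Autocorrelation via the FFT, pooled across channels.
        spec = np.fft.rfft(remainder, 2 * len(remainder))
        pooled += np.fft.irfft(spec * np.conj(spec).real + 0)[:max_lag] \
            if False else np.fft.irfft((spec * np.conj(spec)).real)[:max_lag]
    return pooled

# e.g. applied to the decorrelated-band stimulus sketched in Project 4.3.4:
# pooled = summary_acf(left, right, 44100, [400.0, 600.0, 800.0])
```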

(1) We shall confirm with quantitative modelling that a place-time variant of the mE-C model, with lateral inhibition included, can account for the relative salience and frequency bounds of the dichotic pitches in Classes I and II (Section 4.1.2).

(2) We shall test the hypothesis that there is a single neural pitch extractor by recording from units which display temporal sensitivity to the fundamental period of complex tones in the inferior colliculus of anaesthetised guinea pigs (Palmer et al., 1990) (Section 2.5.2). We shall establish whether the units display a similar discharge pattern when stimulated by (a) a complex of monaural narrow-band noises at 400, 600, 800, and 1000 Hz, and (b) a diotic broadband noise containing decorrelated bands centred on the same four frequencies.

(3) Temporal pitch models can account for the ability of listeners to detect the mistuning of the components of harmonic series (Meddis and O’Mard, 1997). If only one pitch mechanism exists, performance in detecting mistuning should not be affected by the origin (dichotic or monaural) of the mistuned component in relation to the other components. We shall adapt the "one-mechanism/two-mechanisms" approach of Carlyon and Shackleton (1994) to establish whether it is harder for listeners to detect mistuning when the mistuned component is defined in one modality (dichotic or monaural) and the remaining components are defined in the other, than when all components are defined in the same modality.

(4) If the results of Experiments (2) and (3) point to a single temporal pitch extractor, and if it proves possible to obtain enough reliable MR data from sub-cortical nuclei, there would be value in establishing which loci are activated in common by iterated rippled noises (IRNs) and dichotic pitches. IRNs can be presented monaurally. They are synthesised by repeatedly adding a delayed copy of a noise to itself. The pitches of IRNs are not reflected systematically in the locations of peaks in their excitation patterns, but instead stem from the consistency of temporal intervals within frequency channels (e.g. Patterson et al., 1996). Thus, their pitches must be extracted by a temporal pitch analyser which can be driven monaurally. The finding that a dichotic pitch, constructed by decorrelating harmonically-related narrow bands in an otherwise diotic noise, activates the same loci as an IRN would be converging evidence for the existence of a single temporal pitch analyser.
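
For illustration, IRN synthesis reduces to the loop below; the 5-ms delay (giving a pitch near 200 Hz), unit gain, and eight iterations are arbitrary example values, not parameters fixed by this proposal.

```python
import numpy as np

def iterated_rippled_noise(delay_ms=5.0, n_iter=8, gain=1.0,
                           dur=1.0, fs=44100, seed=3):
    """IRN: repeatedly add a delayed copy of the noise to itself. The pitch
    lies near 1/delay, carried by within-channel temporal regularity
    rather than by resolved spectral peaks."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(int(dur * fs))
    d = int(fs * delay_ms / 1000.0)
    for _ in range(n_iter):
        delayed = np.zeros_like(x)
        delayed[d:] = x[:-d]
        x = x + gain * delayed
    return x / np.max(np.abs(x))
```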

4.3.7 The basis of individual differences in the ability to lipread

(Summerfield, Hall, graduate student, in collaboration with Ludman)

"Lipreading" is the ability to understand spoken language by observing articulatory movements displayed on a talker’s face. Individual differences in lipreading skills are among the largest that are found in homogeneous populations of people who are neurologically normal. Understanding their basis remains a challenge for cognitive neuroscience. Moreover, an improved understanding might contribute to the development of more effective training techniques.

It might be supposed that good lipreading depends on proficiency in many sub-tasks, with variation in the overall skill resulting from variation in any combination of the sub-skills. However, this account has proved difficult to sustain because skill in lipreading is not highly or consistently associated with other sensory or cognitive skills. That result, allied to demonstrations that lipreading improves only slightly with training, has spawned the idea that the skill crystallises early in life and is underpinned by a functionally distinct cognitive module which is developed to different degrees in different people (Summerfield, 1991a,b). The aim of this project is to build on that idea and on the work of Calvert et al. (1997) and Campbell (1998) in order to locate loci where activation is related to proficiency in lipreading. Calvert et al. (1997) demonstrated that lipreading induces activation in cortical areas including V5 and Brodmann’s areas 41 and 22. However, their experiments did not address individual differences, for the good reason that small numbers of subjects and an easy task are more appropriate for the important initial step of identifying loci involved in lipreading.

As a preliminary exploration of individual differences, we first measured lipreading skill using videorecorded tests of the ability to report words in spoken sentences. Subsequently, MR activation was measured while a selected sub-set of the same subjects "silently lipread" sentences and was compared with baseline activation measured while subjects observed the static face of the talker [4.24]. Subjects who lipread better displayed changes in activation in more cortical regions when attempting to lipread in the scanner. Thus, this experiment demonstrated the feasibility of searching for loci associated with good lipreading. However, it also identified the need for two improvements in experimental design. (a) Loci where activation is a consequence of good lipreading must be distinguished from loci where activation causes good lipreading. (b) In at least a subset of conditions, accuracy of lipreading must be measured during imaging so that differences in activation can be linked to differences in performance.

One component of an improved design can be derived from an experiment of MacLeod and Summerfield (1990; Summerfield, 1991b) which did not involve imaging. They measured the minimum signal-to-noise ratio at which subjects could report the words in sentences (the "speech reception threshold", SRT). In the auditory condition (A), subjects only listened, giving SRT(A). In the audio-visual condition (AV), they could also lipread, giving SRT(AV). The contribution of lipreading to understanding was quantified as the difference in decibels between SRT(A) and SRT(AV). This difference was correlated highly (R = 0.89, N = 20) with performance on a task of pure lipreading (V) in which no sound was presented. The fact that variation in the whole skill (the V condition) correlated so highly with the decibel-difference score is very important. It demonstrates that the basis of individual differences in lipreading cannot lie in processes that occur after the point of audio-visual conflux, since all such processes contribute equally to SRT(A) and SRT(AV) and are therefore partialled out when the thresholds are differenced. Hence, given that audio-visual conflux occurs prior to phonetic categorisation (Summerfield, 1991b), individual differences in lipreading must arise in pre-phonetic processes of visual analysis.

We shall apply the same logic in a pair of imaging experiments. Experiment 1 will use a subtraction paradigm between A and AV conditions to identify candidates for the loci of individual differences. Experiment 2 will measure activation when subjects lipread with no sound in a V condition referenced to passive observation of the face of the talker. We can then seek to determine which of the areas isolated in Experiment 1 display levels of activation that are positively correlated with lipreading performance in Experiment 2 – a variant on the technique of cognitive conjunction (Price and Friston, 1997).

The experiments require tasks of speech perception with three properties: (a) subjects must parse words from continuous speech, because such tasks tend to reveal the largest individual differences; (b) tasks must be compatible with the button-press responses that can be made while subjects are scanned; (c) it must be possible to equate performance in the A and AV conditions of Experiment 1, so that task difficulty is not a confound. We shall explore several possibilities. A promising option for Experiment 1 is to present sentences in pink noise, with the SNR controlled adaptively to maintain performance at a criterion level, while subjects judge whether or not sentences are semantically plausible [e.g. 5.07]. An option for Experiment 2 is to require subjects to lipread sentences constructed from four monosyllables, the third of which is one of the numbers one to ten ("She ran four miles", "He chose nine fish"), and require a judgement of whether the number is odd or even. This task requires words to be parsed and identified, but the sentences can be sufficiently constrained to ensure that performance is always above the floor level.
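
As a sketch of how the SNR might be controlled adaptively, the fragment below implements a simple one-down/one-up rule converging on roughly 50% correct. The rule, the 2-dB step, and the SRT estimator are placeholder choices rather than the procedure finally to be adopted (which would follow MacLeod and Summerfield, 1990).

```python
def adaptive_snr_track(respond, start_snr_db=0.0, step_db=2.0, n_trials=30):
    """One-down/one-up track: the SNR falls after a correct plausibility
    judgement and rises after an error. `respond(snr_db)` should present
    one sentence at that SNR and return True if the subject's button-press
    response was correct."""
    snr = start_snr_db
    history = []
    for _ in range(n_trials):
        correct = respond(snr)
        history.append((snr, correct))
        snr += -step_db if correct else step_db
    # SRT estimate: mean SNR over the last ten trials of the track.
    srt_estimate = sum(s for s, _ in history[-10:]) / 10.0
    return srt_estimate, history
```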

If loci associated with good lipreading can be identified in this way, we shall run two further experiments. In the first, analyses of activation would be contingent on the accuracy of responses, in order to distinguish loci involved in successful lipreading from loci involved in the attempt to lipread. In the second experiment, poor lipreaders would be imaged before and after they were trained to lipread a constrained set of stimulus materials with the goal of establishing whether the loci associated with successful lipreading become active with training.

We do not necessarily expect to identify one cortical locus or even a particular constellation of loci as the "seat of good lipreading" in the project, although the initial experiments will search for such loci. Rather, good lipreaders could display different balances of activation, or particular patterns of connectivity, between loci which are typically somewhat active. Accordingly, depending on the success of Project 4.3.4, we will seek to apply connectivity analysis in this project, too.

4.3.8 Central auditory processing disorders in children (CAPD)

(Haggard, Higson, Fortnum, Multicentre CAPD Collaborative Group)

Laboratory-based investigators have become interested recently in the functional basis of so-called central auditory processing disorders in children (CAPD), and the eventual possibility of imaging in such cases boosts the scientific potential. There is no single agreed definition of the condition; a more useful definition is a likely product of this project. The condition can be viewed as auditory comprehension problems in the absence of hearing impairment or abnormal non-verbal intelligence. This view of CAPD allows acknowledgement of possible overlap with a number of further diagnostic labels: specific language impairment, attention deficit hyperactivity disorder (ADHD), and marginal autism. This in turn recognises the need for diagnostic boundaries to be drawn in replicable, useful ways. Prevalence, impact, and long-term outcome of CAPD are unknown, and the fairly large clinical literature, chiefly on diagnosis, is unrewarding through lack of a scientific underpinning. Equally, as previously happened with dyslexia, hypotheses about a single-function basis of the deficit have been put forward. Confirmatory univariate evidence has been sought rather than disciplined attempts to model components or falsify hypotheses, often in small samples of children receiving diagnostic labels that may not have been assigned in systematic, or even agreed, ways. There has usually been little if any background characterisation of the cases in ways known to be diagnostically relevant, little basic control (e.g. for verbal IQ), and no obligation accepted to test the incremental deviance (i.e. in determining group membership) explained by the newly postulated underlying function, over and above differences that are simple, obvious, or already known. This is a common flaw in laboratory scientists’ approaches to clinical research, given the multiple differences in diffuse pathology that usually arise from comorbidity. Such approaches limit cumulative knowledge and generalisation. We intend to follow the complementary approach, likely to enhance the value of all studies, by providing a useful means of characterising the children tested. Only an outline is provided here because uncertainties of timing and resource in Theme 8 entail that the hypothesis-testing phase will not start immediately, although a collaborative framework of clinic procedures will.

Without good clinical characterisation, the scientific value of hypothesis-testing studies will remain extremely limited. To achieve this characterisation, we are leading the formulation of a consensus on the desirable form of regional clinics. This embraces: referral sources and their briefing; clinic logistics; a standard test battery covering various levels of structure and difficulty in auditory processing; and the use of contingent tests or supplementary clinical information. The referral sources will include nodes in education as well as in healthcare. A first steering group meeting has already agreed general principles, and a draft protocol will be available for discussion at the visit. To maximise recruitment and assure sustained participation, there will be an explicit commitment covering the professional feedback to be given as a service to referrers, including evaluation of its usefulness. This formula worked well a decade ago for our studies of obscure auditory dysfunction (OAD) in adults. We will locate an adequate number of appropriate controls for case-control comparisons, leading to (a) the specification of a smaller optimum test battery for subsequent clinical use, (b) the generation of new hypotheses, and (c) an appropriate minimum specification of children to be considered as cases in subsequent clinical research studies. A further similarity with our past OAD studies will be routine follow-up of first-stage results with the reduced clinical form of the battery, to confirm or otherwise the generality of the clinical characterisation findings [4.05]. We envisage being able to test two hundred children over two years via six specialising clinics in Manchester, Liverpool, Nottingham, Cambridge, London, and Birmingham. For subsequent prioritisation we will produce prevalence estimates in relation to recommended boundaries with other conditions such as ADHD, as well as qualitative and quantitative descriptions of the disability, symptomatology, and reasons for referral. On the successful precedent of the OAD studies, we anticipate several other products that make further research ethical and feasible: a recommended clinic format and associated clinical data requirement, a reduced test package for clinics in other regions, a parental guidance leaflet, and revised referral guidelines.

We are taking advice on particular tests to include in the first test battery from several specialists working in related areas in the UK, especially concerning enjoyable age-appropriate tests for the range of sensory, cognitive, and personality-related functions which we predict to be highly associated with the condition(s). As one example, we will use the work of Manly and others at the MRC Cognitive and Brain Sciences Unit showing three underlying dimensions of variation in attention in children, so as to define the boundary with ADHD, and the particular dimensions affected or otherwise in CAPD. It is necessary to include some probably dissociated tasks, because in diffuse pathology the null hypothesis is of a weak association, not zero. It is of little use to know that, for example, verbal reaction times (Dagenais and Cox, 1997) have joined the list of univariate functions showing a significant deficit in a "CAPD" group. Our test battery will cover the age-range 8-14 years and undergo multivariate analysis for factors and clusters. Given the multiple factors and/or sub-groups likely to be involved, any highly specific hypothesis is likely to be wrong in an uninformative way. There are a number of interesting possibilities for psychoacoustic (e.g. temporal processing) or genetic underpinnings for the syndrome in at least a proportion of the children, as well as an immense psychological literature on interference in concurrent processing and memory that can be used to isolate modality-specific effects (e.g. MacFarland and Cacace, 1997). We will decide in 2001 how to proceed beyond clinical characterisation to studies that test such hypotheses and whether to commence imaging in this group of children. Prior to that, demonstration of post-hoc univariate "abnormal" imaging results on half a dozen children arbitrarily receiving this label might have some exploratory value, but would be liable merely to confirm what is known behaviourally, or contribute Type 1 errors through publication biases, or Type 3 errors.

Our longer-term aim in this theme is to understand central auditory function and its abnormalities, in ways that can be applied to benefit patients. The general timing for convergence of functional imaging technology with clinical investigation having demonstrable patient benefit is uncertain; in specialised areas such as guiding temporal lobe surgery for epilepsy, applicability has already been demonstrated. This project therefore serves a further purpose. It capitalises upon experience gained in Theme 8 with children and clinical studies, to give us an involvement from the basis of which we can judge the likely timing of convergence of imaging technology with viable, informative, and acceptable studies on patients and children.

Resources

Project 4.1 (binaural hearing) was conducted in collaboration with two post-doctoral scientists, John Culling (over the first two years of the quinquennium) and Michael Akeroyd (over the last three years). Project 4.2 (imaging) started in January 1997. Since then, John Chambers, David Bullock, and Michael Akeroyd have created and evaluated the fMRI sound system under Alan Palmer’s lead. Deborah Hall has coordinated the experimental programme. Prior to leaving IHR in July 1998, Elaine Gurney contributed to preliminary data analyses. John Foster has checked sound levels for compliance with safety standards, and following training now controls the scanner during IHR experiments.

Deborah Hall will continue to coordinate the experimental programme and will take the lead in Project 4.3.4. Project 4.3.1 forms John Kornak’s doctoral research. Michael Akeroyd will complete the modelling component of Project 4.3.6 before leaving IHR in April 1999 to take up a further post-doctoral position at the University of Connecticut Health Centre, Farmington. His successor will work with the senior scientists and Deborah Hall on Projects 4.3.2 to 4.3.6. Project 4.3.7 will be assigned to a graduate student, working under the direction of Deborah Hall and Quentin Summerfield. John Foster’s involvement will increase further as he contributes to data analysis. In the longer term, Miguel Goncalves, starting in September 1998, will also contribute to data analysis. In our experience, even with only 2-3 scanning days each month, the time required for data analysis is the major bar to productivity, justifying the assignment of up to half the time of two people.

The costs incurred at the Nottingham MR Centre during IHR experiments have been met out of an MRC Special Project Grant, in which IHR is one of four collaborative partners. Supplementary funding could be required to support three components of the work during the coming quinquennium. First, the special project grant runs until the end of September 1999. If it were not to be renewed, we would apply to the Board for the additional funds required to reimburse the MR Centre for its specific contribution to this theme. At £1000 per day for our current allocation of 25 imaging days per year, those costs would amount to £25,000 p.a. Second, it will be necessary to reimburse the Wellcome Department of Cognitive Neurology for its contribution to Project 4.3.4 if agreement to collaborate is reached. Costs cannot be specified at present. Third, optimising a technique for imaging mid-brain nuclei in Project 4.3.5 may take time. Accordingly, we shall continue to foster relationships with other imaging centres where specifically required facilities may be available sooner, more conveniently, or at better SNR, than in Nottingham. To that end, we have established that key parts of the proposed programme could be conducted by purchasing time on a scanner elsewhere, after piloting and training of subjects in Nottingham. Charges would amount to £450 per hour. Thus a typical experiment in which six subjects were each tested for two hours on each of two occasions would cost £10,800. We are not requesting any of these sums now. However, we would like to discuss the issues at the site-visit with a view to a further exchange with the Board in the autumn of 1999 when the position, particularly in relation to the special project grant, is clearer.

The CAPD project 4.3.8 has to run on low resources for its first two years. Our function is mainly scientific guidance and synthesis on the choice of test battery, protocol coordination, and multi-centre facilitation, including liaison with audiologists on standard implementation of tests and techniques. This is possible with a 40% commitment from Josie Higson, balancing her residual commitment to TARGET (20%). In 2001, as cochlear implant research winds down, the time of Heather Fortnum (80%) will become available. This is the stage from which field service support and research costs will be incurred, as the research switches into a hypothesis-testing mode at a level which NHS funds cannot reasonably support, and more complex organisational issues like those in trials (e.g. recruiting different centres, negotiating with ethics committees, etc.) will be faced.

References to the work of others and to previous IHR papers

Assmann PF, Summerfield AQ (1990) Modelling the perception of concurrent vowels: Vowels with different fundamental frequencies. J. Acoust. Soc. Am. 88: 680-697.

Bandettini PA et al. (1998) Functional MRI of brain activation induced by scanner acoustic noise. Magnetic Resonance in Medicine 39: 410-416.

Berger EH (1986) Methods of measuring the attenuation of hearing protection devices. J. Acoust. Soc. Am. 79: 1655-1687.

Bilsen FA (1977) Pitch of noise signals: Evidence for a 'central spectrum'. J. Acoust. Soc. Am. 61: 150-161.

Bilsen FA, Goldstein JL (1974) Pitch of dichotically delayed noise and its possible spectral basis. J. Acoust. Soc. Am. 55: 292-296.

Buechel C, Friston KJ (1997) Modulation of connectivity in visual pathways by attention: Cortical interactions evaluated with structural equation modelling and fMRI. Cerebral Cortex 7: 768-778.

Calvert G et al. (1997) Activation of auditory cortex during silent speechreading. Science 276: 593-596.

Campbell R (1998) How brains see speech. In R Campbell, B Dodd, D Burnham (Eds) Hearing by Eye II. Hove: Psychology Press. pp. 177-193.

Carlyon RP, Shackleton TM (1994) Comparing the fundamental frequencies of resolved and unresolved harmonics: evidence for two pitch mechanisms? J. Acoust. Soc. Am. 95: 3541-3554.

Cho ZH et al. (1998) Effects of the acoustic noise of the gradient systems on fMRI: A study on auditory, motor, and visual cortices. Magnetic Resonance in Medicine 39: 331-335.

Clarey J et al. (1991) Physiology of thalamus and cortex. In A Popper, R Fay (Eds) The Mammalian Auditory Pathway. New York: Springer-Verlag. pp. 232-334.

Cramer EM, Huggins WH (1958) Creation of pitch through binaural interaction. J. Acoust. Soc. Am. 30: 858-866.

deCharms RC et al. (1998) Optimizing sound features for cortical neurons. Science 280: 1439-1443.

Dagenais PA, Cox M (1997) Vocal reaction times of children with CAPD, age-matched peers, and young adults to printed words. J. Speech. Lang. Hear. Res. 40: 694-703.

DeYoe EA, van Essen DC (1988) Concurrent processing streams in monkey visual cortex. Trends in Neurosciences 11: 219-226.

Eden GF et al. (in press) Utilizing haemodynamic delay and dispersion to detect fMRI signal change without auditory interference: The behaviour interleaved gradients technique. Magnetic Resonance in Medicine.

Fourcin AJ (1970) Central pitch and auditory lateralization. In R Plomp and GF Smoorenburg (Eds) Frequency Analysis and Periodicity Detection in Hearing. Sijthoff: The Netherlands.

Friston KJ et al. (1995) Characterizing modulatory interactions between areas V1 and V2 in human cortex: A new treatment of functional MRI data. Human Brain Mapping 2: 211-224.

Friston KJ et al. (1997) Psychophysiological and modulatory interactions in neuroimaging. NeuroImage 6: 218-229.

Griffiths TD et al. (1994) Human cortical areas selectively activated by apparent sound movement. Current Biology 4: 892-895.

Griffiths TD et al. (1998) Analysis of temporal structure in sound by the human brain. Nature Neuroscience 1: 422 - 427.

Guimaraes AR et al. (1998) Imaging subcortical auditory activity in humans. Human Brain Mapping 6: 33-41.

Harms MP et al. (1998) Time course of fMRI signals in the inferior colliculus, medial geniculate body and auditory cortex show different dependencies on noise burst rate. 4th International Conference on Functional Mapping of the Human Brain, Montreal, #365.

Hartmann WM (1984) Binaural coherence edge pitch. J. Acoust. Soc. Am. 75: K10.

Hill NI, Darwin CJ (1996) Lateralisation of a perturbed harmonic: Effects of onset asynchrony and mistuning. J. Acoust. Soc. Am. 100: 2352-2364.

Hukin RW, Darwin CJ (1995) Effects of contralateral presentation and of interaural time differences in segregating a harmonic from a vowel. J. Acoust. Soc. Am. 98: 1380-1387.

Hupé JM et al. (1998) Cortical feedback improves discrimination between figure and background in V1, V2 and V3 neurons. Nature 394: 784-787.

Imig TJ, Reale RA (1981) Ipsilateral corticocortical projections related to binaural columns in cat primary auditory cortex. J. Comp. Neurol. 203: 1-14.

Jäncke L et al. (1997) Intensity of auditory stimuli determines the spatial extent of the BOLD-response in the human auditory cortex to auditory stimuli. NeuroImage 5: S192.

Klein MA, Hartmann WM (1981) Binaural edge pitch. J. Acoust. Soc. Am. 70: 51-61.

Kowalski N et al. (1995) Comparison of responses in the anterior and primary auditory fields of the ferret cortex. J. Neurophysiol. 73: 1513-1523.

Langner G et al. (1997) Magnetoencephalography reveals orthogonal maps for periodicity pitch and frequency in the human auditory cortex. J. Comp. Physiol. (submitted).

Lange N, Zeger SL (1997) Non-linear Fourier time series analysis for human brain mapping by fMRI. J. Roy. Stat. Soc. Series C Applied Statistics 46: 1-29.

Lauter JL et al. (1985) Tonotopic organisation in human auditory cortex revealed by positron emission tomography. Hear. Res. 20: 199-205.

MacFarland DJ, Cacace A (1997) Modality specificity of auditory and visual pattern recognition: implications for the assessment of central auditory processing disorders. Audiology 36: 249-260.

MacLeod A, Summerfield AQ (1990) A procedure for measuring auditory and audio-visual speech-reception thresholds for sentences in noise. British Journal of Audiology 24, 29-44.

Meddis R (1986) Simulation of mechanical to neural transduction in the auditory receptor. J. Acoust. Soc. Am. 79: 702-711.

Meddis R, Hewitt MJ (1991) Virtual pitch and phase sensitivity of a computer model of the auditory periphery: I. Pitch identification. J. Acoust. Soc. Am. 89: 2866-2882.

Meddis R, Hewitt MJ (1992) Modelling the identification of concurrent vowels with different fundamental frequencies. J. Acoust. Soc. Am. 90: 233-245.

Meddis R, O’Mard L (1997) A unitary model of pitch perception. J. Acoust. Soc. Am. 102: 1811-1820.

Menon RS et al. (1997) Ocular dominance in human V1 demonstrated by functional magnetic resonance imaging. J. Neurophysiol. 77: 2780-2787.

Merzenich M et al. (1975) Representation of cochlea within primary auditory cortex in the cat. J. Neurophysiol. 38: 231-249.

Moore BCJ et al. (1997) A model for the prediction of thresholds, loudness, and partial loudness. J. Audio Eng. Soc. 45: 224-240.

Moore BCJ et al. (1988) The shape of the ear’s temporal window. J. Acoust. Soc. Am. 83: 1103-1116.

Palmer AR (1990) The representation of the spectra and fundamental frequencies of steady-state single- and double-vowel sounds in the temporal discharge patterns of guinea pig cochlear-nerve fibers. J. Acoust. Soc. Am. 88: 1412-1426.

Patterson RD et al. (1996) The relative strength of the tone and noise components in iterated rippled noise. J. Acoust. Soc. Am. 100: 3286-3294.

Price CJ, Friston KJ (1997) Cognitive conjunction: A new approach to brain activation experiments. NeuroImage 5: 261-270.

Raatgever J, Bilsen FA (1986) A central theory of binaural processing: evidence from dichotic pitch. J. Acoust. Soc. Am. 80: 429-441.

Rajapakse JC et al. (1998) Modeling hemodynamic response for analysis of functional MRI time-series. Human Brain Mapping 6: 283-300.

Rauschecker JP et al. (1997) Serial and parallel processing in Rhesus monkey auditory cortex. J. Comp. Neurol. 382: 89-103.

Rivier F, Clarke S (1997) Cytochrome oxidase, acetylcholinesterase, and NADPH-diaphorase staining in human supratemporal and insular cortex: Evidence for multiple auditory areas. NeuroImage 6: 288-304.

Ravicz ME, Melcher JR (1998) Reducing imager-generated acoustic noise at the ear during functional magnetic resonance imaging (fMRI): passive attenuation. Assoc. Res. Otolaryngol. 21st Midwinter Meeting, p. 208.

Scheich H et al. (1998) Functional magnetic resonance imaging of a human auditory cortex area involved in foreground-background decomposition. Euro. J. Neurosci. 10: 803-809.

Schreiner CE (1998) Spatial distribution of responses to simple and complex sounds in the primary auditory cortex. Audiology & Neuro-Otology 3: 104-122.

Schreiner CE, Mendelson JR (1990) Functional topography of cat primary auditory cortex: distribution of integrated excitation. J. Neurophysiol. 64: 1442-1459.

Shamma SA et al. (1993) Organisation of response areas in ferret primary auditory cortex. J. Neurophysiol. 69: 367-383.

Shamma SA, Symmes D (1985) Patterns of inhibition in auditory cortical cells in awake squirrel monkeys. Hear. Res. 19: 1-13.

Shannon RV et al. (1995) Speech recognition with primarily temporal cues. Science 270: 303-304.

Stehling MK et al. (1991) Observation of cerebrospinal-fluid flow with echo-planar magnetic-resonance-imaging. Brit. J. Radiol. 64:89-97.

Stern RM, Trahiotis C (1995) Models of binaural interaction. In BCJ Moore (Ed) Hearing: Handbook of Perception and Cognition. San Diego: Academic. pp. 347-386.

Sutter ML, Schreiner CE (1991) Physiology and topography of neurons with multipeaked tuning curves in cat primary auditory cortex. J. Neurophysiol. 65: 1207-1226.

Summerfield AQ (1991a) Visual perception of phonetic gestures. In IG Mattingly and M Studdert-Kennedy (Eds) Modularity and the Motor Theory of Speech Perception. Erlbaum Associates: Hillsdale, N.J. pp. 117-137.

Summerfield AQ (1991b) Lipreading and audio-visual speech perception. Philosophical Transactions of the Royal Society London B 335: 71-78.

Talavage TM et al. (1997) Comparison of impact of fMRI sequence acoustics on auditory cortex activation. International Society for Magnetic Resonance in Medicine, Sydney Meeting, 1503.

Taniguchi I, Nasu M (1993) Spatio-temporal representation of sound intensity in the guinea pig auditory cortex observed by optical recording. Neuroscience Letters 151: 178-181.

Viemeister NF, Bacon SP (1982) Forward masking by enhanced components in harmonic complexes. J. Acoust. Soc. Am. 71: 1502-1507.

Woldorff MG et al. (1993) Modulation of early sensory processing in human auditory cortex during auditory selective attention. Proc. Natl. Acad. Sci. USA 90: 8722-8726.

Wessinger CM et al. (1997a) Tonotopy in human auditory cortex examined with fMRI. Hum. Brain. Map. 5: S8.

Wessinger CM et al. (1997b) Processing of complex sounds in human auditory cortex. Assoc. Res. Otolaryngol. 20th Midwinter Meeting, p. 27.

Zeki S (1993) A Vision of the Brain. Oxford: Blackwell Scientific.

Zeki S et al. (1991) A direct demonstration of functional specialisation in human visual cortex. J. Neurosci. 11: 641-649.

Awards and Honours

In 1997, Quentin Summerfield was elected a fellow of the Acoustical Society of America for contributions to understanding processes of auditory grouping and speech perception.

Publications

Refereed Papers

[4.01] Culling JF, Summerfield AQ, Marshall DH (1994) "Effects of simulated reverberation on the use of binaural cues and fundamental-frequency differences for separating concurrent vowels" Speech Communication 14, 71-95.

[4.02] Assmann PF, Summerfield AQ (1994) "The contribution of waveform interactions to the perception of concurrent vowels" Journal of the Acoustical Society of America 95, 471-484.

[4.03] Lea AP, Summerfield AQ (1994) "Minimal spectral contrast for vowel identification" Perception and Psychophysics 56, 379-391.

[4.04] Summerfield AQ, Palmer AR, Foster JR, Marshall DH, Twomey T (1994) "Clinical evaluation and test-retest reliability of the IHR-McCormick Automated Toy Discrimination test" British Journal of Audiology 28, 165-179.

[4.05] Higson JM, Haggard MP, Field DL (1994) Robustness of determinants of OAD status across samples and test methods. British Journal of Audiology 28, 27-39.

[4.06] Culling JF, Summerfield AQ (1995) "Binaural grouping of complex sounds: absence of across frequency grouping by common inter-aural delay". Journal of the Acoustical Society of America 98, 785-797.

[4.07] Culling JF, Summerfield AQ (1995) "The role of frequency modulation in the perceptual separation of concurrent vowels". Journal of the Acoustical Society of America 98, 837-846.

[4.08] Culling JF (1996) Signal-processing software for teaching and research in psychoacoustics under UNIX and X-windows. Behaviour Research Methods, Instruments, and Computers 28, 376-382.

[4.09] Culling JF, Summerfield AQ (1998) Measurements of the binaural temporal window using a detection task. Journal of the Acoustical Society of America 103, 3540-3553.

[4.10] Culling JF, Summerfield AQ, Marshall DH (1998) Dichotic pitches as illusions of binaural unmasking I: Huggins’ Pitch and the "Binaural Edge Pitch". Journal of the Acoustical Society of America 103, 3509-3526.

[4.11] Culling JF, Marshall DH, Summerfield AQ (1998) Dichotic pitches as illusions of binaural unmasking II: the Fourcin Pitch and the Dichotic Repetition Pitch. Journal of the Acoustical Society of America 103, 3527-3539.

[4.12] Akeroyd MA, Summerfield AQ (submitted) A binaural analogue of gap detection. Submitted to the Journal of the Acoustical Society of America.

[4.13] Hall DA, Haggard MP, Akeroyd MA, Palmer AR, Summerfield AQ, Elliot MP, Gurney EM, Bowtell RW. (submitted) "Sparse" temporal sampling in auditory fMRI. Submitted to Human Brain Mapping.

Invited Chapters

[4.14] Summerfield AQ, Culling JF (1994) "Auditory processes that separate speech from competing sounds: A comparison of monaural and binaural processes." In: Keller, E. ed. Fundamentals of Speech Synthesis and Speech Recognition. London: Wiley and Sons. pp. 313-338.

[4.15] Summerfield AQ, Akeroyd MA (1998) "Computational approaches to modelling auditory selective attention: monaural and binaural processes" Course Reader for the NATO Advanced Study Institute on Computational Hearing. S Greenberg, M Slaney (Eds). International Computer Science Institute: Berkeley. pp. 743-799.

[4.16] Assmann PF, Summerfield AQ (submitted) The perception of speech under adverse conditions. To appear in: Speech Processing in the Auditory System (Eds. S. Greenberg, W.A. Ainsworth, A.N. Popper and R. Fay). Springer-Verlag, New York.

Conference Proceedings

[4.17] Summerfield AQ, Culling JF, Assmann PF (1996) The perception of speech under adverse conditions: contributions of spectro-temporal peaks, periodicity, and inter-aural timing to perceptual robustness. Distributed as a supplement to the Proceedings of the Workshop on the Auditory Basis of Speech Perception, W. Ainsworth and S. Greenberg (eds.) Newcastle-under-Lyme: Keele University.

[4.18] Akeroyd MA, Summerfield AQ (1997) Predictions of signal thresholds in a frozen-noise masker using monaural and binaural temporal windows. In Psychophysical and Physiological Advances in Hearing A.R. Palmer, A. Rees, A.Q. Summerfield, and R. Meddis (eds.). London: Whurr. pp. 433-440.

[4.19] Akeroyd MA, Summerfield AQ, Foster JR (1998) Integrating monaural and binaural spectral information. Proceedings of 16th International Congress on Acoustics and 135th Meeting of the Acoustical Society of America. Acoustical Society of America Woodbury, NY. pp. 1975-1976.

[4.20] Akeroyd MA, Summerfield AQ (1998) A computational model of the lateralisation of dichotic pitches. Proceedings of the NATO Advanced Study Institute on Computational Hearing. S. Greenberg and M Slaney (Eds). International Computer Science Institute: Berkeley. pp. 55-60.

[4.21] Palmer AR, Bullock DC, Chambers JD (1998) A high-output, high-quality sound system for use in auditory fMRI. NeuroImage 7, S359.

[4.22] Hall DA, Elliott MR, Bowtell RW, Gurney E, Haggard MP (1998) "Sparse" temporal sampling in fMRI enhances detection of activation by sound for both magnetic and acoustical reasons. NeuroImage 7, S551.

[4.23] Hall DA, Akeroyd MA, Palmer AR, Summerfield AQ, Haggard MP, Elliott MR, Gurney E, Bowtell RW (1998) Optimal sampling of haemodynamic changes in auditory cortex for Functional Magnetic Resonance Imaging (fMRI). NeuroImage 7, S576.

[4.24] Ludman CN, Hykin JL, Summerfield AQ, Clare S, Elliott M, Foster JR, Bowtell RW, Morris PM, Worthington BS (1997) Functional MR Imaging During the Silent Lipreading of Sentences. Radiology 205(P): 379, 229.

Principal Collaborators

Dr PF Assmann, Department of Human Communication, University of Texas, Dallas.

Dr RW Bowtell, Magnetic Resonance Centre, Department of Physics, University of Nottingham.

Mr MR Elliott, Magnetic Resonance Centre, Department of Physics, University of Nottingham.

Professor A O’Hagan, Department of Mathematics, University of Nottingham.

Dr C Ludman, Department of Academic Radiology, University of Nottingham.

Professor P Morris, Magnetic Resonance Centre, Department of Physics, University of Nottingham.

Multi-centre CAPD Collaborative Group (Clinicians and audiologists in six cities).