GB2519569A - A method of localizing audio sources in a reverberant environment - Google Patents


Info

Publication number
GB2519569A
GB2519569A (application GB1318869.3A)
Authority
GB
United Kingdom
Prior art keywords
directions
group
filtering
sound
method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB1318869.3A
Other versions
GB2519569B (en)
GB201318869D0 (en)
Inventor
Lionel Le Scolan
Eric Nguyen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to GB1318869.3A
Publication of GB201318869D0
Publication of GB2519569A
Application granted
Publication of GB2519569B
Application status: Active
Anticipated expiration


Classifications

    • G01S3/80 Direction-finders using ultrasonic, sonic or infrasonic waves
    • G01S3/8083 Path-difference systems (spaced transducers measuring phase or time difference) determining the direction of the source
    • G01S3/8003 Diversity systems specially adapted for direction finding
    • G01S3/8006 Multi-channel systems giving simultaneous indications of the directions of different signals
    • G01S3/801 Details
    • G01S7/527 Receivers; extracting wanted echo signals
    • G01S7/5273 Extracting wanted echo signals using digital techniques
    • G01S7/53 Means for transforming coordinates or for evaluating data, e.g. using computers
    • G01S7/536 Extracting wanted echo signals (non-pulse systems)

Abstract

The method comprises (step 510) determining directions of arrival of sound waves using signals recorded by a microphone array (10, figure 1), each direction being expressed by azimuth and elevation angles (θ, φ) in a spherical coordinate system whose reference plane is parallel to the reflective plane (24); (step 520) sorting the determined directions into groups, where each group contains directions sharing substantially the same azimuth angle; (step 530) filtering the determined directions based on at least one attribute associated with the directions of each group; and (step 540) localizing audio sources based on the filtered directions. In an embodiment, one attribute is the number of directions sorted into each group (the cardinal of the group). In this embodiment, the filtering (referred to as the first filtering) comprises discarding directions belonging to groups having a single member.

Description

TITLE OF THE INVENTION

A method of localizing audio sources in a reverberant environment

BACKGROUND OF THE INVENTION

The invention relates to the field of sound source localization (SSL) which aims at determining the directions of sound sources of interest. The sources of interest may be at the origin of any type of sound, such as speech, music or environmental sound.

The function of an SSL method is to locate a sound source, typically by determining its direction. The direction of the sound source is typically expressed in terms of azimuth and elevation angles (θ, φ) relative to a given reference system. The reference system usually has its origin at the listening position.

SSL methods operate on a set of audio signals recorded by a set of microphones forming a microphone array. The recorded audio signals result from acoustic waves emanating from the sound sources and impinging on the set of microphones from different directions.

In general, only the direct sound is used to localize a source. The localization is performed by estimating differences in intensities and time delays between different audio signals recorded at each microphone of the array where the recorded signals result from the acoustic waves emanating directly from the source.

However, in realistic acoustic conditions, acoustic waves emanating from the sources may also propagate via indirect paths prior to impinging on the microphones. This is particularly true for highly reverberant environments containing objects and/or planes causing reflections of the acoustic waves. These acoustic waves propagating via indirect paths are seen by conventional SSL methods as resulting from secondary sound sources distinct from the actual sound sources (the latter are referred to hereinafter as primary sound sources).

One may rely on the intensities of audio signals to discriminate between a primary source and its corresponding secondary sound sources since acoustic waves propagating directly from the primary source are always the most energetic.

This technique fails however to discriminate between primary and secondary sources when a plurality of (primary) audio sources need to be localized. In fact, one primary source may be closer to the listening position than another, and the acoustic waves propagating indirectly from the closest source may still be more energetic at the listening position than the acoustic waves propagating directly from the farthest source.

Aspects of the present invention have been devised to address at least the foregoing concern. More specifically, an object of embodiments of the present invention is to accurately localize a plurality of sources in a reverberant environment. Another object of embodiments of the invention is to provide a method that is simple to implement.

SUMMARY OF THE INVENTION

To this end, the present invention provides, according to a first aspect, a method of localizing sound sources in a reverberant environment comprising a plane reflective of sound waves. The method comprises: determining directions of arrival of sound waves using signals recorded by a microphone array, each direction being expressed by azimuth and elevation angles (θ, φ) in a spherical coordinate system whose reference plane is parallel to the reflective plane; sorting the determined directions into groups, where each group contains directions sharing substantially the same azimuth angle; filtering the determined directions based on at least one attribute associated with the directions of a group; and localizing audio sources based on the filtered directions.

The method advantageously makes it possible to discard secondary sound sources from the localization.

In one implementation, an attribute on which the filtering is based is the number of directions sorted into each group. The filtering thus comprises discarding directions belonging to groups having a single member, thereby obtaining a first set of filtered directions.

In one implementation, an attribute on which the filtering is based is the magnitude of a function representative of the strength of the sound signal, e.g. an angular spectrum function, for each direction in a group.

In one implementation, an attribute on which the filtering is based is the elevation angle φ of the directions in a group.

In one implementation, if the reflective plane is located below the microphone array and the sound sources, the filtering further comprises discarding, from the first set of filtered directions, directions of groups having no secondary direction with lower elevation angle than the direction associated with the strongest magnitude of the group, thereby obtaining a second set of filtered directions.

Alternatively, if the reflective plane is located above the microphone array and the sound sources, the filtering further comprises discarding, from the first set of filtered directions, directions of groups having no secondary direction with higher elevation angle than the direction associated with the strongest magnitude of the group, thereby obtaining a second set of filtered directions.

In a preferred implementation, the filtering comprises discarding, from the first or the second set of filtered directions, directions whose associated magnitudes are not the strongest among the members of each group.

In one implementation, the filtering is performed time frame by time frame, i.e. without pooling the angular spectrum function over time. This has the advantage of avoiding associating a primary peak of one time frame with a secondary peak of another time frame, as such a secondary peak cannot constitute a sound reflection of the wave forming the primary peak. In fact, when directions are pooled over time, this secondary peak may incorrectly be associated with the primary peak.

The present invention also provides, according to a second aspect, a device for localizing sound sources in a reverberant environment comprising a plane reflective of sound waves. The device comprises: determining means for determining directions of arrival of sound waves using signals recorded by a set of microphones, each direction being expressed by azimuth and elevation angles (θ, φ) in a spherical coordinate system whose reference plane is parallel to the reflective plane; sorting means for sorting the determined directions into groups, where each group contains directions sharing substantially the same azimuth angle; filtering means for filtering the determined directions based on at least one attribute associated with the directions of a group; and localizing means for localizing audio sources based on the filtered directions.

The present invention also extends to programs which, when run on a computer or processor, cause the computer or processor to carry out the method described above or which, when loaded into a programmable device, cause that device to become the device described above. The program may be provided by itself, or carried by a carrier medium. The carrier medium may be a storage or recording medium, or it may be a transmission medium such as a signal. A program embodying the present invention may be transitory or non-transitory.

The particular features and advantages of the device and the program being similar to those of the method of localizing audio sources, they are not repeated here.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 depicts for illustrative purposes a device for localizing sound sources of interest.

Figure 2 presents an example of a sound environment containing a reflective plane.

Figure 3 depicts a system for localizing sound sources of interest from the audio signals recorded by microphones of a microphone array.

Figure 4 is a flowchart illustrating general steps of a Sound Source Localization method.

Figure 5 is a flowchart illustrating steps of a method of localizing audio sources in a reverberant environment according to an embodiment of the invention.

Figure 6 is a flowchart illustrating an implementation example of the steps of figure 5 based on a histogram pooling method.

Figure 7 illustrates by an example the derivation of a 1D spectrum from a 2D angular spectrum function.

Figures 8A to 8D illustrate by an example the performance of embodiments of the invention when applying different pooling functions for the localization of sound sources.

Figure 9 is a schematic block diagram of a computing device for implementation of one or more embodiments of the invention.

DETAILED DESCRIPTION OF THE INVENTION

As illustrated in Figure 1, a device for localizing one or more sound sources of interest according to a particular embodiment of the invention comprises a microphone array 10, which itself comprises four microphones 15.

The number of microphones within the array may vary, but at least three non-aligned microphones are required to localize directions in both azimuth and elevation.

Each microphone 15 of microphone array 10 records the audio signals emanating from a number of sound sources of interest S1, S2, each located at a particular azimuth θ and elevation φ in spherical coordinates (only one sound source 100 is represented in Figure 1). The sound sources are placed in a sound environment comprising ambient noise. The plane (x, z) represents the reference plane of the spherical coordinate system, and the intersection of the three axes (x, y, z) the reference point of the system.

In a realistic reverberant sound environment, one or more reflective planes are present, such as the floor, a table surface or the ceiling, for example.

Figure 2 presents an example of a sound environment containing a reflective plane 24, e.g. the floor, and a sound source 200. In such an environment, in addition to a direct sound path 21, a first reflection path 22 over plane 24, and possibly further reflection paths over other reflective planes, may be present. Also, a "sound image" can be detected at the listening position as an additional virtual source. In fact, the direct sound 21 of the source 200 and the corresponding reflection(s) 22 combine with each other and are seen as if a virtual source were located in between, similarly to a sound image in a stereo system.

Relying on the direction of arrival of sound waves, the direct sound path 21 would indicate the existence of a primary sound source 200, whereas the first reflection path 22 and the sound image path 23 would indicate the existence of secondary sound sources. In other words, in this system, three sound sources (one primary and two secondary) can be localized that are actually all linked to a single source 200. It is important to note here that these sound sources are characterized in that they all share substantially the same azimuth angle θ, in a spherical coordinate system having a reference plane (x, z) set parallel to the reflective plane 24. This is equivalent to saying that the three sound sources belong substantially to a same plane that is perpendicular to the reflective plane 24. This property is used in embodiments of the invention to discriminate between primary and secondary sound sources, as will be detailed hereinafter.
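The shared-azimuth property can be illustrated numerically: with the reference plane (x, z) set parallel to the floor, a source and its mirror image across the floor have identical azimuths as seen from the array origin. The following Python sketch uses hypothetical coordinates (the positions, the floor height and the helper name are illustrative, not from the patent):

```python
import math

def azimuth_elevation(p):
    """Azimuth and elevation (degrees) of point p seen from the array
    origin, with the reference plane (x, z) horizontal as in figure 1."""
    x, y, z = p
    azimuth = math.degrees(math.atan2(z, x))
    elevation = math.degrees(math.atan2(y, math.hypot(x, z)))
    return azimuth, elevation

# Hypothetical geometry: a floor 1.5 m below the array origin, a source
# above it, and its mirror image across the floor (the "sound image").
floor_y = -1.5
source = (2.0, 0.8, 1.0)
image = (2.0, 2 * floor_y - 0.8, 1.0)   # reflect across y = floor_y

az_s, el_s = azimuth_elevation(source)
az_i, el_i = azimuth_elevation(image)
print(az_s == az_i)    # True: same azimuth, hence the same group
print(el_i < el_s)     # True: the image is seen below the source
```

Since the azimuth depends only on the (x, z) components, which the mirror reflection leaves unchanged, the two directions necessarily fall into the same azimuth group.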

Figure 3 depicts a system for localizing sound sources of interest from the audio signals recorded by microphones 15 of the microphone array 10 (same reference numbers kept as figure 1). Two primary sound sources 301 and 302 are assumed for example to be present in the sound environment. It is also assumed that the acoustic waves emanating from source 301 reflect on a surface 34 creating an indirect path for sound waves as if a secondary sound source 301' was present in the sound environment.

The localization is performed through the estimation of differences in intensities and time delays τ between the received signals at each microphone in the array.

A Sound Source Localization (SSL) method 300 is then executed in order to obtain the direction of arrival of only the primary sound sources of interest 301, 302, and specifically their respective coordinates (θ, φ), according to an embodiment of the invention. Directions of arrival are searched for within a given angular search window [θmin, θmax] × [φmin, φmax], using records of audio signals obtained from the set of microphones 15 during a given time duration T. Under particular conditions, called far field, corresponding to the situation where the sources are placed at a relatively large distance with respect to the dimensions of the array, only differences between the time delays (τ11, τ12, τ13, τ'11, τ'12, τ'13, τ21, τ22, τ23, ...) can be physically exploited.

These time delay differences, also known as Time Differences Of Arrival (TDOA), are usually expressed relative to a given microphone 15 of the array 10.

Considering that the TDOA depend on the Direction of Arrival (DOA) (θ, φ) of each source and on the geometry of the microphone array, and more specifically on the relative positions of the microphones, the TDOA and the relative positions of the microphones are used to obtain the desired direction of arrival.

The Sound Source Localization method 300 according to a preferred embodiment of the present disclosure is illustrated on Figure 4.

First, a digital sound recording step 405 is performed, during which environment audio signals, i.e. audio signals emanating from the sound sources present in the environment, are captured by the microphone array 10.

The digital sound recording step 405 includes pre-amplification, analog-to-digital conversion and synchronization means providing a multichannel set of M recorded digital audio signals x1(n), x2(n), ..., xM(n) sharing the same sampling clock, where M is the number of microphones 15 (e.g. 4) and n the sampling time index. Note that at least 3 non-aligned microphones are required to localize directions (θ, φ) in 3D.

The SSL method operates by first transforming, at step 410, the recorded signals from the time domain xi(n), i = 1, ..., M, into time-frequency representations Xi(t, f), i = 1, ..., M, where t and f denote respectively time and frequency indices. Most sound source localization algorithms use the Short Time Fourier Transform (STFT) for this transformation. In this case, t is the index of the time frame used in the discrete-time STFT processing. Other transforms, such as models of the human auditory front end (the ERB, Equivalent Rectangular Bandwidth, transform), can be used.

Typically, for speech source localization, sound is sampled at 16 000 Hz and the STFT window size can be set to 1024 samples with 50% overlap, considering a Hanning or sine window.
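A minimal sketch of the transformation of step 410 under the typical settings just given (16 kHz sampling, 1024-sample Hann window, 50% overlap); the `stft` helper and the synthetic noise signals are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def stft(x, win_size=1024, hop=512):
    """Short Time Fourier Transform of one channel (step 410):
    1024-sample Hann window with 50% overlap (hop = 512)."""
    window = np.hanning(win_size)
    n_frames = 1 + (len(x) - win_size) // hop
    frames = np.stack([x[t * hop : t * hop + win_size] * window
                       for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1)   # shape (frames, bins)

# M = 4 synthetic microphone channels, 1 s of noise at 16 kHz.
fs = 16_000
rng = np.random.default_rng(0)
signals = rng.standard_normal((4, fs))
X = np.stack([stft(ch) for ch in signals])
print(X.shape)   # (4, 30, 513): channels x time frames x frequency bins
```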

In step 420, a local angular spectrum function Φ(t, f, θ, φ) is computed from the time-frequency representations Xi(t, f), i = 1, ..., M. This function is local in the sense that it represents the spectrum for each given time frame t. It exhibits large values for directions (θ, φ) representing DOAs of sound waves and lower values otherwise. The local angular spectrum is generally computed for a set of discrete values representing possible DOAs lying on a given grid of directions bounded by the angular search window [θmin, θmax] × [φmin, φmax]. Different methods known in the art can be used to compute the local angular spectrum function. These methods may belong to the class of Generalized Cross-Correlation (GCC) functions, subspace functions such as MUSIC, or beamforming functions. For more details one may refer to the paper "Multi-source TDOA estimation in reverberant audio using angular spectra and clustering" by Charles Blandin, Alexey Ozerov and Emmanuel Vincent, Signal Processing, Elsevier, 2012, 92, pp. 1950-1960.
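As an illustration of the GCC class named above, a pairwise GCC-PHAT time-delay estimate (a common building block for such angular spectra) might be sketched as follows; the function name and synthetic test signal are assumptions for illustration, not the patent's method:

```python
import numpy as np

def gcc_phat(x1, x2, fs):
    """Minimal GCC-PHAT sketch: estimate the time offset (in seconds)
    between two channels from the phase of the whitened cross-spectrum."""
    n = len(x1) + len(x2)
    cross = np.fft.rfft(x1, n) * np.conj(np.fft.rfft(x2, n))
    cross /= np.abs(cross) + 1e-12            # PHAT weighting
    cc = np.fft.irfft(cross, n)
    cc = np.concatenate((cc[-n // 2:], cc[:n // 2]))
    return (np.argmax(np.abs(cc)) - n // 2) / fs

# Synthetic test: two noise channels offset by 12 samples.
fs = 16_000
rng = np.random.default_rng(3)
s = rng.standard_normal(4096)
x1, x2 = s[12:], s[:-12]
print(round(abs(gcc_phat(x1, x2, fs)) * fs))  # 12: offset recovered
```

Evaluating such pairwise correlations for the delays implied by each grid direction (θ, φ) yields a GCC-type local angular spectrum.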

The different local angular spectrum functions Φ(t, f, θ, φ) are then pooled (integrated) across time and frequency to get an angular spectrum function Φ(θ, φ) dependent only on the directions (θ, φ), from which the DOAs can be estimated.

The pooling is often performed differently over frequencies and over time. As for the pooling over frequencies (step 430), the local angular spectrum values are summed up over the frequencies. This mitigates the effect of spatial aliasing occurring at high frequencies. As for the pooling over time (step 440), different pooling functions P can be used.

Similarly to the pooling over frequencies, integration over time can be performed by summing up the spectrum over the time frames of an observation period T: Φ(θ, φ) = P(Φ(t, θ, φ)) = Σ_t Φ(t, θ, φ). A limitation of this summing approach is that it is difficult to localize a source that is active only within a few time frames during the observation period T, due to the integration of irrelevant information when the source is inactive.

An alternative pooling function P is to take the maximum over all time frames of the period T: Φ(θ, φ) = P(Φ(t, θ, φ)) = max_t Φ(t, θ, φ). A further alternative pooling function P is to build a histogram by counting occurrences of peaks of maximum energy (magnitude) over the time frames of the period T.

At step 450, the directions (θi, φi) of the sound sources are derived from the angular spectrum function Φ(θ, φ), with 1 ≤ i ≤ N, where N is a predetermined number at least equal to the number of sources (the setting of N is detailed later).
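The three time-pooling alternatives (summation, maximum, histogram of per-frame peaks) can be sketched with NumPy on a synthetic local angular spectrum; the grid size and values are illustrative:

```python
import numpy as np

# Synthetic local angular spectrum phi[t, i_theta, i_phi], already
# pooled over frequencies (step 430): 50 frames on a 36 x 18 grid.
rng = np.random.default_rng(1)
phi = rng.random((50, 36, 18))

# Step 440, three pooling alternatives over time:
phi_sum = phi.sum(axis=0)            # summation over the frames
phi_max = phi.max(axis=0)            # maximum over the frames

# Histogram pooling: count how often each grid direction hosts the
# maximum-energy peak of a frame.
hist = np.zeros(phi.shape[1:], dtype=int)
for frame in phi:
    i, j = np.unravel_index(np.argmax(frame), frame.shape)
    hist[i, j] += 1
print(hist.sum())   # 50: one peak counted per frame
```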

Typically, the directions (θi, φi) corresponding to the N highest values (peaks) of the magnitude of the angular spectrum function are chosen. It is to be understood that these directions may correspond to both primary and secondary sound sources.

Finally, at step 460, directions corresponding only to primary sources are determined.

The step 460 of localization of primary sound sources will now be described according to an embodiment of the invention illustrated by figure 5. The proposed localization method makes it possible to discriminate between primary and secondary sound sources.

Figure 5 is a flowchart illustrating general steps of a method of localizing audio sources in a reverberant environment according to an embodiment of the invention.

At a first step 510, directions of arrival of sound waves are determined using signals recorded by the microphone array 10. An implementation of this step is provided by the description of step 450 of the flowchart of figure 4. The number N of directions (peaks of the angular spectrum function) searched for in step 450 may be set beforehand to be equal to Ns × Nr, where Ns represents the expected number of primary sound sources (which may be determined using conventional sound source counting methods) and Nr represents the expected number of sound sources detected per primary source due to reflections in the reverberant environment. Nr is for example set to 3, as one may assume to have two secondary sound sources (first reflection and sound image) associated with each primary source in the sound environment illustrated in figure 2. In a variant, a threshold is set for the determination of the peaks, and the number N of directions is thus derived from this determination (instead of being set beforehand equal to Ns × Nr).

The determined directions may correspond to primary and secondary sound sources. Each direction is expressed by a couple of azimuth and elevation angles (θ, φ) in a spherical coordinate system whose reference plane (x, z) is set parallel to a reflective plane of the reverberant sound environment.

At step 520, the determined directions are sorted into groups, where each group contains directions sharing substantially the same azimuth angle. Because the reference plane (x, z) is set parallel to the reflective plane, a group formed by directions having the same azimuth should necessarily include all the directions corresponding to secondary sources (first reflection, sound image) when present.

At the same time, it is considered to be unlikely to have two or more primary sound sources all located at a same given azimuth, and thus the members of the group are all considered to be linked to a single primary source.

Note that, in order to decide that two directions share substantially the same azimuth angle, a typical margin of 1° to 2° is considered. This means that one direction (θ1, φ1) is considered to be substantially at the same azimuth angle as another direction (θ2, φ2) if the angle θ1 falls within e.g. one of the intervals (θ2 ± 0.5°) to (θ2 ± 1°).
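Step 520 with such an azimuth margin might be sketched as a simple greedy grouping of (azimuth, elevation) pairs; this is an illustrative sketch, not the patent's implementation:

```python
def sort_into_groups(directions, margin_deg=1.0):
    """Greedy sketch of step 520: directions are (azimuth, elevation)
    pairs in degrees; a direction joins the current group when its
    azimuth is within `margin_deg` of the group's first member
    (the 1-2 degree margin discussed in the text)."""
    groups = []
    for theta, phi in sorted(directions):
        if groups and abs(theta - groups[-1][0][0]) <= margin_deg:
            groups[-1].append((theta, phi))
        else:
            groups.append([(theta, phi)])
    return groups

# A primary source near azimuth 40 deg with its reflection and sound
# image, plus an isolated peak at 75 deg.
doas = [(40.2, 30.0), (39.8, -15.0), (40.0, 8.0), (75.0, 12.0)]
print(sort_into_groups(doas))
# [[(39.8, -15.0), (40.0, 8.0), (40.2, 30.0)], [(75.0, 12.0)]]
```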

At step 530, the determined directions are filtered based on at least one attribute associated with the directions of each group, and primary audio sources are then localized at step 540 based on the filtered directions.

In one embodiment, one attribute is the number of directions sorted into each group (the cardinal of the group). In this embodiment, the filtering (referred to as the first filtering) comprises discarding directions belonging to groups having a single member, i.e. of cardinal 1, thereby obtaining a first set of filtered directions.

Indeed, because of the existence of a reflective plane, groups should necessarily contain two or more directions including the direct path, the first reflection path and/or the sound image path. If only one direction is contained in a group, it is assumed that this direction doesn't correspond to an actual sound source and thus should be discarded. Primary audio sources may then be localized using this first set of filtered directions as a first approximation.
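A minimal sketch of this first filtering, assuming each group is a list of (azimuth, elevation) directions:

```python
def first_filtering(groups):
    """Discard groups of cardinal 1: with a reflective plane, a genuine
    source yields at least two directions (direct path plus reflection
    and/or sound image) at substantially the same azimuth."""
    return [g for g in groups if len(g) > 1]

groups = [[(40.0, 30.0), (40.1, -15.0)],   # primary + reflection: kept
          [(75.0, 12.0)]]                  # single member: discarded
print(first_filtering(groups))   # [[(40.0, 30.0), (40.1, -15.0)]]
```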

Another attribute which may be taken into account is the magnitude of a function representative of the strength of the sound signal (e.g. energy), such as the above discussed angular spectrum, for each direction in a group. Magnitudes associated with the directions of a given group can be compared with each other to perform a second filtering.

In a variant, a second filtering comprises discarding directions whose associated magnitudes are not the strongest among the members of each group, thereby obtaining a second set of filtered directions. In other words, the groups holding the first set of filtered directions each contain two or more directions (single-member groups having been previously discarded), and only the direction whose associated magnitude is the strongest (referred to as the primary direction) is kept in each group, as it represents the direction of the primary source. In fact, for a given azimuth (group), directions whose associated magnitudes are not the strongest among the directions of the group (referred to as secondary directions) are assumed to necessarily result from sound waves travelling via indirect paths or a sound image, and thus to correspond to secondary sound sources.

Primary audio sources are then localized from the second set of filtered directions.
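A minimal sketch of the second filtering, assuming each direction carries its angular-spectrum magnitude as a third component:

```python
def second_filtering(groups):
    """Keep, in each group, only the direction with the strongest
    magnitude (the primary direction); entries are
    (azimuth, elevation, magnitude) triples."""
    return [max(g, key=lambda d: d[2]) for g in groups]

groups = [[(40.0, 30.0, 0.9), (40.1, -15.0, 0.4)],
          [(120.0, 25.0, 0.7), (119.5, -10.0, 0.6)]]
print(second_filtering(groups))
# [(40.0, 30.0, 0.9), (120.0, 25.0, 0.7)]
```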

Another attribute which may also be taken into account is the elevation angle φ of the directions in a group. Elevation angles of the directions of a given group can be compared with each other to perform a third filtering.

For example, if it is known that the reflective plane is located below the microphone array and the sound sources of interest (typically when the reflective plane is the floor), the elevation angle of a secondary direction should necessarily be lower than the elevation angle of a primary direction. Thus, if no secondary direction with a lower elevation angle than the direction associated with the strongest magnitude (supposed to be a primary direction) is present in the group, it is assumed that that primary direction doesn't correspond to an actual primary sound source and, as a consequence, the directions of the corresponding group are discarded (filtered).

Similarly, if it is known that the reflective plane is located above the microphone array and the sound sources of interest (typically when the reflective plane is the ceiling), the elevation angle of a secondary direction should necessarily be higher than the elevation angle of a primary direction. Thus, if no secondary direction with a higher elevation angle than the direction associated with the strongest magnitude (supposed to be a primary direction) is present in the group, it is assumed that this primary direction does not correspond to an actual primary sound source and the directions of the corresponding group are discarded (filtered).

The discarding of directions based on the elevation angle attribute as discussed above is applied to the first set of filtered directions, which means that the described third filtering may be applied after the first filtering, but prior to the second filtering, as the latter removes secondary directions.
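The elevation-based (third) filtering can be sketched as follows for the floor case; for the ceiling case the comparison is simply reversed (`elev > primary[0]`). The function name and the group representation are illustrative assumptions:

```python
def third_filtering_floor(groups):
    """Discard groups having no secondary direction below the primary one.

    `groups` maps an azimuth value to a list of (elevation, magnitude)
    pairs. With the floor as reflective plane, a real primary source must
    be accompanied by at least one weaker arrival from a LOWER elevation.
    """
    kept = {}
    for azimuth, members in groups.items():
        primary = max(members, key=lambda m: m[1])
        secondaries = [m for m in members if m is not primary]
        # Keep the group only if some secondary arrives from below.
        if any(elev < primary[0] for elev, _ in secondaries):
            kept[azimuth] = members
    return kept
```

A group whose "reflection" lies above its strongest member is thus rejected as not corresponding to an actual primary source.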

Figure 6 is a flowchart illustrating an implementation example of the steps of figure 5 based on a histogram pooling method. A histogram H(θ,φ) is built by counting the number of times, within a predefined duration of analysis, a peak of maximum energy is localized at a specific location (θi, φi) in the angular spectrum and for a given search window. At step 601, the histogram H(θ,φ) is initialized to zero, which means that for any discrete location (θi, φi) in the search window the value of H is set to zero.

At step 602, the local angular spectrum function of the first time frame is obtained. The local angular spectrum functions pooled over frequencies, Φ(t, θ, φ), are obtained from the execution of step 430 of figure 4, where one angular spectrum function is available for each time frame t.

At step 603, up to N peaks of the obtained angular spectrum function are searched for. Preferably, N is chosen as discussed above. This corresponds to the determining of the directions of sound waves (step 510, figure 5) applied to the angular spectrum for the first time frame only. Because the searching of peaks is performed frame by frame, fewer than N peaks may be found for some time frames, but N distinct peaks may still be found overall.
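A minimal sketch of the peak search of step 603 is given below, assuming the angular spectrum has been sampled on a (θ, φ) grid stored as a list of lists of magnitudes; the function name and the strict local-maximum criterion are illustrative assumptions, not the patent's method:

```python
def top_n_peaks(spectrum, n):
    """Return up to n (theta_idx, phi_idx, magnitude) local maxima, strongest first."""
    peaks = []
    rows, cols = len(spectrum), len(spectrum[0])
    for i in range(rows):
        for j in range(cols):
            v = spectrum[i][j]
            # Compare against the (up to 8) in-grid neighbours.
            neighbours = [spectrum[x][y]
                          for x in (i - 1, i, i + 1)
                          for y in (j - 1, j, j + 1)
                          if (x, y) != (i, j) and 0 <= x < rows and 0 <= y < cols]
            if all(v > nb for nb in neighbours):   # strict local maximum
                peaks.append((i, j, v))
    peaks.sort(key=lambda p: p[2], reverse=True)   # strongest peaks first
    return peaks[:n]
```

When a frame's spectrum contains fewer than n local maxima, fewer peaks are returned, matching the frame-by-frame behaviour described above.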

Now one implementation of the sorting of directions into groups is presented (corresponding to step 520). For each detected peak, a 1D spectrum function is derived from the 2D angular spectrum function at the azimuth angle θi of the peak. Figure 7 illustrates by an example the derivation of the 1D spectrum 710, which corresponds to plane 720 positioned at azimuth angle θi.

Then peaks are searched for in the 1D spectrum function (step 605).

These peaks may correspond, for example, to directions (θi,φi), (θi,φj) and (θi,φk) in figure 7. All the detected peaks (directions) belong to a same group because they share substantially the same azimuth angle (θi ± Δ, where Δ = 1° to 2°).
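The grouping of peaks sharing substantially the same azimuth can be sketched as follows, with peaks represented as (azimuth, elevation, magnitude) triples and the tolerance Δ of 1° to 2° from the text; these names are illustrative assumptions:

```python
def group_by_azimuth(peaks, delta=2.0):
    """Sort peaks into groups whose azimuths agree within +/- delta degrees."""
    groups = []                               # list of lists of peaks
    for peak in sorted(peaks, key=lambda p: p[0]):
        if groups and abs(peak[0] - groups[-1][0][0]) <= delta:
            groups[-1].append(peak)           # same azimuth within tolerance
        else:
            groups.append([peak])             # start a new group
    return groups
```

For instance, peaks at azimuths 10°, 11° and 50° form two groups: {10°, 11°} and {50°}, the first holding a candidate primary direction and its suspected reflection.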

At step 606, the elevation angle corresponding to the peak with the highest magnitude (θi,φi) is determined, and then a test is performed at step 607 to start filtering. If no secondary peak is detected, i.e. only a single (primary) peak is detected, it means that the group contains only one direction (no secondary directions) and this direction should be discarded (first filtering). Consequently, the histogram is not incremented. If at least one secondary peak is detected, this may be indicative of a first reflection or a sound image and thus the primary peak may be counted. The counting is performed by incrementing the histogram, at step 608, for the direction of the primary peak, i.e. H(θi,φi) = H(θi,φi) + 1. Note that only the primary peak is taken into account (counted) when incrementing the histogram, which is equivalent to performing the second filtering discussed above. In a variant, a third filtering may also be applied, taking into account the elevation angle as discussed above with regard to figure 5. If a primary peak should not be taken into account (discarded), the histogram is not incremented.

It is to be noted for this implementation that the filtering is performed frame by frame, without pooling the angular spectrum function over time. This has the advantage of avoiding the association of one primary peak of one time frame with a secondary peak of another time frame, as this secondary peak cannot constitute a sound reflection of the wave forming the primary peak. When directions are pooled over time, this secondary peak may incorrectly be associated with the primary peak, unless information about the time frames within which peaks have been detected is kept.

At step 609, a test is performed to determine whether all time frames have been analysed. For this, the duration of analysis is compared with the total number of time frames. If not all time frames have been analysed, then the local angular spectrum of the next time frame is obtained at step 610 and steps 603 to 609 are re-executed. If all time frames have been analysed, then the search and the filtering are finished.

The localization of the primary sources is performed by identifying the peaks in the resulting histogram H(θ,φ) (step 611), either by searching for the peaks with the highest magnitudes or by setting a threshold, and then determining the corresponding directions (θi,φi) (step 612).
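The overall histogram pooling of figure 6 (steps 601 to 612) can be sketched end to end as follows. Peak detection is abstracted into a caller-supplied `find_peaks(frame)` returning (azimuth, elevation, magnitude) triples, azimuth grouping is approximated by binning, and all names and shapes are illustrative assumptions rather than the patented implementation:

```python
from collections import Counter

def localize_primaries(frames, find_peaks, delta=2.0, n_sources=3):
    """Per-frame filtering plus histogram pooling, as in figure 6."""
    hist = Counter()                            # step 601: H(theta, phi) = 0
    for frame in frames:                        # steps 602/609/610: frame loop
        peaks = find_peaks(frame)               # step 603: per-frame peaks
        groups = {}
        for az, elev, mag in peaks:             # steps 604-605: bin by azimuth
            key = round(az / delta) * delta
            groups.setdefault(key, []).append((elev, mag))
        for key, members in groups.items():     # steps 606-608
            if len(members) < 2:
                continue                        # first filtering: lone direction
            elev, _ = max(members, key=lambda m: m[1])
            hist[(key, elev)] += 1              # second filtering: primary only
    # steps 611-612: the strongest histogram bins give the primary directions
    return [d for d, _ in hist.most_common(n_sources)]
```

With a source at azimuth 10° accompanied by a weaker floor reflection near the same azimuth, and an isolated spurious peak at 50°, only the (10°, 30°) direction accumulates counts across frames.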

Figures 8A to 8D illustrate by an example the performance of embodiments of the invention when applying different pooling functions for the localization of sound sources.

Three sources (voice conversations) S1, S2 and S3, active at the same time, are considered in the example. The sources are placed at 5 meters from the microphone array and at -8°, -4° and 12° in azimuth respectively, as sketched in figure 8A. A reflective plane representing the floor is also present.

The use of the "maximum" pooling function leads to an angular spectrum function Φ(θ, φ) illustrated in figure 8B. If a search is performed for the three highest peaks using this angular spectrum function, sources S3 and S2 will be appropriately localized, but an additional source not corresponding to a real source (801) will also be identified. By implementing an embodiment of the invention, the sound source corresponding to peak 801 will be discarded because it has the same azimuth angle as source S2 but a lower magnitude, considering that only one direction is kept per group. If a search is performed for the nine highest peaks (corresponding to N = 3 × 3), although sources S1 and S2 may not be discriminated (seen as a single source), the method will still localize at the end only the primary sources S2 and S3 by discarding secondary sources.

In a variant, the histogram pooling function may be used. Figure 8C illustrates the angular spectrum function Φ(θ,φ) based on the histogram after the contributions of the secondary sources (reflections, sound images) have been discarded. The remaining content comprises mainly the activity of the direct sound paths. From this filtered histogram, the search for the angular positions of the sources (step 611 of figure 6) is greatly simplified.

Figure 8D illustrates the result of the execution of step 611, where 4 primary sound sources have been identified. It is to be noted that, although only N = 3 sound sources should have been searched for, sound source S3 was moving during the test between two positions (the head of person S3 moving during conversation), which led to a situation where source S3 was detected at two different positions. This is equivalent to the existence of four primary sound sources.

Figure 9 is a schematic block diagram of a computing device 900 for implementation of one or more embodiments of the invention. The computing device 900 may be a device such as a micro-computer, a workstation or a light portable device. The computing device 900 comprises a communication bus connected to:
- a central processing unit 901, such as a microprocessor, denoted CPU;
- a random access memory 902, denoted RAM, for storing the executable code of the method of embodiments of the invention as well as the registers adapted to record variables and parameters necessary for implementing the method for localizing sound sources according to embodiments of the invention, the memory capacity thereof can be expanded by an optional RAM connected to an expansion port for example;
- a read only memory 903, denoted ROM, for storing computer programs for implementing embodiments of the invention;
- a network interface 904, typically connected to a communication network over which digital data to be processed are transmitted or received. The network interface 904 can be a single network interface, or composed of a set of different network interfaces (for instance wired and wireless interfaces, or different kinds of wired or wireless interfaces). Data packets are written to the network interface for transmission or are read from the network interface for reception under the control of the software application running in the CPU 901;
- a user interface 905, which may be used for receiving inputs from a user or to display information to a user;
- a hard disk 906, denoted HD, which may be provided as a mass storage device;
- an I/O module 907, which may be used for receiving/sending data from/to external devices such as a display.

The executable code may be stored either in read only memory 903, on the hard disk 906 or on a removable digital medium such as for example a disk.

According to a variant, the executable code of the programs can be received by means of a communication network, via the network interface 904, in order to be stored in one of the storage means of the computing device 900, such as the hard disk 906, before being executed.

The central processing unit 901 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to embodiments of the invention, which instructions are stored in one of the aforementioned storage means. After powering on, the CPU 901 is capable of executing instructions from main RAM memory 902 relating to a software application after those instructions have been loaded from the program ROM 903 or the hard-disc (HD) 906 for example. Such a software application, when executed by the CPU 901, causes the steps of the flowcharts shown in Figures 4 to 6 to be performed.

Any step of the algorithms shown in Figures 4 to 6 may be implemented in software by execution of a set of instructions or program by a programmable computing machine, such as a PC ("Personal Computer"), a DSP ("Digital Signal Processor") or a microcontroller; or else implemented in hardware by a machine or a dedicated component, such as an FPGA ("Field-Programmable Gate Array") or an ASIC ("Application-Specific Integrated Circuit").

Although the present invention has been described hereinabove with reference to specific embodiments, the present invention is not limited to the specific embodiments, and modifications which lie within the scope of the present invention will be apparent to a person skilled in the art.

Many further modifications and variations will suggest themselves to those versed in the art upon making reference to the foregoing illustrative embodiments, which are given by way of example only and which are not intended to limit the scope of the invention, that being determined solely by the appended claims. In particular the different features from different embodiments may be interchanged, where appropriate.

In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. The mere fact that different features are recited in mutually different dependent claims does not indicate that a combination of these features cannot be advantageously used.

Claims (12)

1. A method of localizing sound sources in a reverberant environment comprising a reflective plane of sound waves, the method comprising: determining directions of arrival of sound waves using signals recorded by a microphone array, each direction being expressed by elevation and azimuth angles (θ,φ) in a spherical coordinate system having a reference plane parallel to the reflective plane; sorting the determined directions into groups where each group contains directions having substantially the same azimuth angle; filtering the determined directions based on at least one attribute associated with the directions of a group; and localizing audio sources based on the filtered directions.
2. The method of claim 1, wherein an attribute is the number of directions sorted into each group and wherein the filtering comprises discarding directions belonging to groups having a single member, thereby obtaining a first set of filtered directions.
3. The method of claim 2, wherein an attribute is the magnitude of a function representative of the strength of the sound signal for each direction in a group.
4. The method of claim 2, wherein an attribute is the elevation angle φ of the directions in a group.
5. The method of claims 3 and 4, wherein if the reflective plane is located below the microphone array and the sound sources, the filtering further comprises discarding, from the first set of filtered directions, directions of groups having no secondary direction with lower elevation angle than the direction associated with the strongest magnitude of the group, thereby obtaining a second set of filtered directions.
6. The method of claims 3 and 4, wherein if the reflective plane is located above the microphone array and the sound sources, the filtering further comprises discarding, from the first set of filtered directions, directions of groups having no secondary direction with higher elevation angle than the direction associated with the strongest magnitude of the group, thereby obtaining a second set of filtered directions.
7. The method of claim 3, 5 or 6, wherein the filtering comprises discarding, from the first or the second set of filtered directions, directions whose associated magnitudes are not the strongest among the members of each group.
8. The method of any one of claims 1 to 7, wherein the filtering is performed time frame by time frame.
9. A device for localizing sound sources in a reverberant environment comprising a plane reflective of sound waves, the device comprising: determining means for determining directions of arrival of sound waves using signals recorded by a set of microphones, each direction being expressed by elevation and azimuth angles (θ,φ) in a spherical coordinate system having a reference plane parallel to the reflective plane; sorting means for sorting the determined directions into groups where each group contains directions having substantially the same azimuth angle; filtering means for filtering the determined directions based on at least one attribute associated with the directions of a group; and localizing means for localizing audio sources based on the filtered directions.
10. A non-transitory computer-readable storage medium storing a program for causing a computer to execute the method according to any one of claims 1 to 8.
11. A program which, when run by a programmable device, causes the programmable device to execute the method according to any one of claims 1 to 8.
12. A method of localizing sound sources in a reverberant environment substantially as herein described with reference to, and as shown in, Figures 4 to 6 of the accompanying drawings.
GB1318869.3A 2013-10-25 2013-10-25 A method of localizing audio sources in a reverberant environment Active GB2519569B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1318869.3A GB2519569B (en) 2013-10-25 2013-10-25 A method of localizing audio sources in a reverberant environment

Publications (3)

Publication Number Publication Date
GB201318869D0 GB201318869D0 (en) 2013-12-11
GB2519569A true GB2519569A (en) 2015-04-29
GB2519569B GB2519569B (en) 2017-01-11

Family

ID=49767149


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3689873A (en) * 1968-12-27 1972-09-05 Shell Oil Co Directional filtering of seismic data
GB2319690A (en) * 1996-11-22 1998-05-27 Nec Corp Ambient noise suppressing method for use with microphone
GB2364121A (en) * 2000-06-30 2002-01-16 Mitel Corp Locating a talker
US20060233389A1 (en) * 2003-08-27 2006-10-19 Sony Computer Entertainment Inc. Methods and apparatus for targeted sound detection and characterization
EP1983799A1 (en) * 2007-04-17 2008-10-22 Harman Becker Automotive Systems GmbH Acoustic localization of a speaker

