WO2022135131A1

WO2022135131A1 - Sound source positioning method and apparatus, and electronic device

Info

Publication number: WO2022135131A1
Application number: PCT/CN2021/135833
Authority: WO
Inventors: 薛政; 徐杨飞; 张志飞
Original assignee: 北京有竹居网络技术有限公司
Priority date: 2020-12-23
Filing date: 2021-12-06
Publication date: 2022-06-30
Also published as: CN112799018B; CN112799018A

Abstract

A sound source positioning method and apparatus, and an electronic device. The method comprises: determining at least one candidate sound source azimuth in a pickup interval corresponding to an audio information acquisition sensor (101); for the at least one candidate sound source azimuth, obtaining a signal-to-noise ratio and a signal-to-interference ratio of the candidate sound source azimuth, and determining a weighting factor corresponding to the candidate sound source azimuth according to the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth (102); and determining a target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth (103). The accuracy of positioning a sound source in a multi-sound-source scene can be improved.

Description

Sound source localization method, device and electronic device

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to Chinese patent application No. 202011555230.6, filed on December 23, 2020 and titled "Sound Source Localization Method, Apparatus and Electronic Device", the full text of which is incorporated herein by reference.

TECHNICAL FIELD

The present disclosure relates to the technical field of information processing, and in particular, to a sound source localization method, apparatus and electronic device.

BACKGROUND

Sound source localization refers to the technique of estimating the position of a sound source from an audio signal. It covers determining the direction and the position of speech or other sounds. Sound source localization has a wide range of applications. For example, a security robot can steer its camera toward the sound source position determined by sound source localization and capture an image of that position.

SUMMARY OF THE INVENTION

This disclosure section is provided to introduce concepts in a simplified form that are described in detail in the detailed description section that follows. This disclosure section is not intended to identify key features or essential features of the claimed technical solution, nor is it intended to be used to limit the scope of the claimed technical solution.

Embodiments of the present disclosure provide a sound source localization method, apparatus, and electronic device.

In a first aspect, an embodiment of the present disclosure provides a sound source localization method. The method includes: determining at least one candidate sound source azimuth in a sound pickup interval corresponding to an audio information collection sensor; for the at least one candidate sound source azimuth, obtaining the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth, and determining the weighting factor corresponding to the candidate sound source azimuth according to that signal-to-noise ratio and signal-to-interference ratio; and determining the target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth.

In a second aspect, an embodiment of the present disclosure provides a sound source localization device. The device includes: a first determining unit, configured to determine at least one candidate sound source azimuth within a sound pickup interval corresponding to an audio information collection sensor; a unit configured to, for the at least one candidate sound source azimuth, obtain the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth, and determine the weighting factor corresponding to the candidate sound source azimuth according to that signal-to-noise ratio and signal-to-interference ratio; and a second determining unit, configured to determine the target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth.

In a third aspect, embodiments of the present disclosure provide an electronic device, including: one or more processors; and a storage device for storing one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the sound source localization method as described in the first aspect.

In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, implements the steps of the sound source localization method according to the first aspect.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent when taken in conjunction with the accompanying drawings and with reference to the following detailed description. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.

FIG. 1 is a flowchart of one embodiment of a sound source localization method according to the present disclosure;

FIG. 2 is a schematic diagram of determining the azimuth angle of a sound source signal using two microphones;

FIG. 3 is a flowchart of yet another embodiment of a sound source localization method according to the present disclosure;

FIG. 4A is a schematic sound source localization result diagram in the related art;

FIG. 4B shows a schematic sound source localization result diagram obtained according to the sound source localization method of the present disclosure;

FIG. 5 is a schematic structural diagram of the sound source localization method shown in FIG. 3;

FIG. 6 is a schematic structural diagram of an embodiment of a sound source localization device according to the present disclosure;

FIG. 7 is an exemplary system architecture to which a sound source localization method or sound source localization apparatus according to an embodiment of the present disclosure can be applied;

FIG. 8 is a schematic diagram of a basic structure of an electronic device provided according to an embodiment of the present disclosure.

DETAILED DESCRIPTION

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are only for exemplary purposes, and are not intended to limit the protection scope of the present disclosure.

It should be understood that the various steps described in the method embodiments of the present disclosure may be performed in different orders and/or in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this regard.

As used herein, the term "including" and variations thereof are open-ended inclusions, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.

It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different devices, modules or units, and are not used to limit the order or interdependence of the functions performed by these devices, modules or units.

It should be noted that the modifiers "a" and "a plurality" mentioned in the present disclosure are illustrative rather than restrictive, and those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".

The names of messages or information exchanged between multiple devices in the embodiments of the present disclosure are only for illustrative purposes, and are not intended to limit the scope of these messages or information.

Please refer to FIG. 1, which shows the flow of an embodiment of a sound source localization method according to the present disclosure. As shown in FIG. 1, the sound source localization method includes the following steps:

Step 101: Determine at least one candidate sound source azimuth in the sound pickup interval corresponding to the audio information collection sensor.

Sound signals can be collected using various audio information collection sensors. Audio information collection sensors may include microphone arrays (also called acoustic arrays).

Microphone arrays may include linear microphone arrays and non-linear microphone arrays.

After the microphone picks up the sound, the sound can be converted into an analog electrical signal, which is then sampled to obtain a digitized audio signal.

In this embodiment, it can be considered that the distance between the sound source and the audio information collection sensor is much larger than the size of the audio information collection sensor. In this scenario, the audio signal emitted by the sound source can be regarded as a plane wave.

A microphone array is usually composed of multiple microphones arranged according to certain rules. Multiple microphones can collect sound signals synchronously, and the position of the sound source that emits the sound can be determined by using the signal phase difference between the multiple microphones. The position of the sound source may be, for example, the azimuth angle from which the sound of the sound source arrives.

Different microphone arrays can correspond to different pickup intervals.

The sound pickup interval in this embodiment refers to the spatial range in which the microphone can localize the sound source in consideration of symmetry, and is generally a plane interval or a space interval.

The sound pickup interval corresponding to a linear microphone array can be 180° in two dimensions, because the localization result is rotationally symmetric about the line connecting the microphones. The sound pickup interval corresponding to a planar annular microphone array can be 360° in two dimensions, because the localization result is mirror-symmetric about the microphone plane. The sound pickup interval of a three-dimensional microphone array can be 360° in three dimensions.

Take a linear microphone array consisting of two microphones as an example. Establish a coordinate system in which the x-axis is parallel to the line connecting the two microphones and the y-axis is perpendicular to the x-axis. Place the two microphones on the x-axis, with the midpoint of the line connecting them at the origin O, the intersection of the x-axis and the y-axis. The sound pickup interval corresponding to this linear microphone array then runs from a sound source angle of 0° to a sound source angle of 180°, both measured from the positive x-axis.

Alternatively, establish a coordinate system in which the y-axis is parallel to the line connecting the two microphones and the x-axis is perpendicular to the y-axis. Place the two microphones on the y-axis, with the midpoint of the line connecting them at the origin O, the intersection of the x-axis and the y-axis. The sound pickup interval corresponding to this linear microphone array then runs from a sound source angle of -90° to a sound source angle of 90°, both measured from the positive x-axis.
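The two coordinate conventions above describe the same directions with different angle ranges. A minimal sketch of the conversion between them is given below. It assumes that the positive direction along the microphone line is the same in both conventions; the function names are hypothetical and do not come from the disclosure.

```python
def endfire_to_broadside(theta_deg):
    """Convert an angle in the 0°..180° convention (microphones on the x-axis)
    to the -90°..90° convention (microphones on the y-axis).

    Assumption: the positive microphone-line direction is shared by both
    conventions, so the two angles always sum to 90°.
    """
    if not 0.0 <= theta_deg <= 180.0:
        raise ValueError("angle must lie in [0, 180] degrees")
    return 90.0 - theta_deg


def broadside_to_endfire(theta_deg):
    """Inverse conversion: -90°..90° convention back to 0°..180°."""
    if not -90.0 <= theta_deg <= 90.0:
        raise ValueError("angle must lie in [-90, 90] degrees")
    return 90.0 - theta_deg
```

Under this assumption, a source directly in front of the array is at 90° in the first convention and at 0° in the second.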

At least one candidate sound source azimuth angle may be determined within the above-mentioned sound pickup interval.

In some optional implementations, the above-mentioned determining at least one candidate sound source azimuth in the sound pickup interval corresponding to the audio information collection sensor includes:

First, the above-mentioned sound pickup interval is divided into at least one sound pickup sub-interval.

Secondly, at least one candidate sound source azimuth is determined within the at least one sound pickup sub-interval according to a preset rule.

In some application scenarios, the entire sound pickup interval may be regarded as a single sound pickup sub-interval, and at least one candidate sound source azimuth is then determined within that sub-interval according to a preset rule. For example, the two end points of the sound pickup sub-interval are taken as candidate sound source azimuths.

In some other application scenarios, the sound pickup interval may be divided into sub-intervals according to a preset division rule. As one implementation, the number of sound pickup sub-intervals may be determined first, and the sound pickup interval is then divided into that number of sub-intervals at equal intervals. As another implementation, after the number of sound pickup sub-intervals is determined, the sound pickup interval may be divided at unequal intervals.

As a schematic illustration, the audio information collection sensor includes a linear microphone array composed of two microphones. Dividing the sound pickup interval into at least one sound pickup sub-interval then includes: dividing the 180° sound pickup interval corresponding to the linear microphone array into a plurality of sound pickup sub-intervals at equal intervals.

The following description takes as an example the case in which the sound pickup interval corresponding to the linear microphone array runs from 0° to 180° in front of the microphones. The 180° sound pickup interval can be divided into 18 sound pickup sub-intervals at equal intervals. The 18 sound pickup sub-intervals can be: 0°-10°, 10°-20°, 20°-30°, 30°-40°, 40°-50°, 50°-60°, 60°-70°, 70°-80°, 80°-90°, 90°-100°, 100°-110°, 110°-120°, 120°-130°, 130°-140°, 140°-150°, 150°-160°, 160°-170°, 170°-180°.

After the sound pickup interval is divided into sound pickup subintervals, at least one candidate sound source azimuth angle may be determined in at least one sound pickup subinterval.

As an implementation manner, the azimuth angles corresponding to the two end points of each sound pickup sub-interval may be used as the candidate sound source azimuths. After the candidate sound source azimuths corresponding to each sound pickup sub-interval are determined, the repeated candidate sound source azimuths can be de-duplicated to obtain the candidate sound source azimuths corresponding to the sound pickup interval. For example, the candidate sound source azimuths determined in the above-mentioned sound pickup sub-intervals may be: 0°, 10°, 20°, 30°, 40°, 50°, 60°, 70°, 80°, 90°, 100°, 110°, 120°, 130°, 140°, 150°, 160°, 170°, 180°.

As another implementation manner, one sound source azimuth inside each sound pickup sub-interval, excluding its two end points (for example, the azimuth at the middle of the sub-interval), can be used as the candidate sound source azimuth corresponding to that sound pickup sub-interval.
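The two ways of choosing candidates described above can be sketched as follows. This is an illustrative sketch for the equal-interval case only; the function name, its parameters, and the `mode` switch are hypothetical and do not come from the disclosure.

```python
def candidate_azimuths(start_deg=0.0, end_deg=180.0, num_subintervals=18,
                       mode="endpoints"):
    """Return candidate azimuths (degrees) for an equally divided pickup interval.

    mode="endpoints": end points of every sub-interval, with duplicates removed.
    mode="midpoints": the middle of every sub-interval (end points excluded).
    """
    if num_subintervals < 1:
        raise ValueError("need at least one sub-interval")
    width = (end_deg - start_deg) / num_subintervals
    edges = [start_deg + i * width for i in range(num_subintervals + 1)]
    subintervals = list(zip(edges[:-1], edges[1:]))

    if mode == "endpoints":
        candidates = []
        for low, high in subintervals:
            for angle in (low, high):
                # Neighbouring sub-intervals share an end point; keep it once.
                if angle not in candidates:
                    candidates.append(angle)
        return candidates
    if mode == "midpoints":
        return [(low + high) / 2.0 for low, high in subintervals]
    raise ValueError("mode must be 'endpoints' or 'midpoints'")
```

With the default arguments the end-point rule yields the 19 azimuths 0°, 10°, ..., 180°, while the mid-point rule yields 18 azimuths starting at 5°.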

Step 102: For at least one candidate sound source azimuth, determine the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth, and determine the weighting factor corresponding to the candidate sound source azimuth according to that signal-to-noise ratio and signal-to-interference ratio.

In this embodiment, the signal-to-noise ratio of the azimuth angle of the candidate sound source can be determined according to various methods for determining the signal-to-noise ratio.

For example, the power of the noise can be measured by a noise measurement method, the power of the audio signal corresponding to the candidate sound source azimuth can then be determined, and the signal-to-noise ratio of the candidate sound source azimuth can be determined as the ratio of the power of the audio signal to the power of the noise.

The signal-to-interference ratio of the candidate sound source azimuth can likewise be determined according to various methods for determining the signal-to-interference ratio. For example, the power of the interference signal can be obtained by measuring the interference signal, the power of the audio signal corresponding to the candidate sound source azimuth can then be determined, and the signal-to-interference ratio of the candidate sound source azimuth can be determined as the ratio of the power of the audio signal to the power of the interference signal.

The weighting factor corresponding to the candidate azimuth can be determined by any function that is positively correlated with the signal-to-noise ratio and the signal-to-interference ratio.

Step 103: Determine the target azimuth angle of the sound source according to the weighting factor of the azimuth angle of at least one candidate sound source.

The weighting factors of the candidate sound source azimuths can be used in various analyses to determine the target azimuth of the sound source.

Specifically, the above step 103 may include the following steps:

Sub-step 1031, for at least one candidate sound source azimuth, generate a value of the cross-correlation function corresponding to the candidate sound source azimuth according to the weighting factor of the candidate sound source azimuth.

Sub-step 1032: Determine the target azimuth of the sound source according to the value of the cross-correlation function corresponding to each of the azimuth angles of the at least one candidate sound source.

The following takes a linear array composed of two microphones as an example for description. In other arrays of multiple microphones, the most basic unit is a linear array of two microphones. For other arrays, when determining the azimuth angle of the sound source, a linear array composed of two microphones can be used as the basic array unit for analysis, which will not be described here.

FIG. 2 shows a schematic diagram of two microphones A and B respectively receiving audio signals. It is assumed that the first audio signal received by A is x₁(m), and the second audio signal received by B is x₂(m+τ). The time difference τ between the first audio signal and the second audio signal is found as the delay that maximizes the cross-correlation function of the two signals. The sound source azimuth angle θ is then determined using the following formula (1).

$$\tau = \frac{d\cos(\theta)}{c} \tag{1}$$

Here d cos(θ) is the path-length difference between the first audio signal and the second audio signal, d is the distance between the two microphones, and c is the speed of sound.
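Formula (1) can be evaluated in both directions: from a candidate azimuth to the expected delay, and from a measured delay back to an azimuth. The sketch below illustrates this under the assumption that c is the speed of sound in air, taken here as 343 m/s; the constant and the function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 °C


def delay_from_azimuth(theta_deg, mic_distance_m, c=SPEED_OF_SOUND):
    """Formula (1): tau = d * cos(theta) / c, in seconds."""
    return mic_distance_m * math.cos(math.radians(theta_deg)) / c


def azimuth_from_delay(tau_s, mic_distance_m, c=SPEED_OF_SOUND):
    """Invert formula (1): theta = arccos(c * tau / d), in degrees (0..180)."""
    cos_theta = c * tau_s / mic_distance_m
    # Measurement noise can push the ratio slightly outside [-1, 1]; clamp it.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))
```

For a microphone spacing of 0.1 m the largest possible delay is 0.1 / 343 s, about 0.29 ms, which is reached when the source lies on the microphone line.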

The cross-correlation function can be expressed by the following formula:

$$R(\tau) = \int A(\omega)\,P(\omega)\,e^{j\omega\tau}\,d\omega \tag{2}$$

where ω is the angular frequency, τ is the time delay between the signals received by the two microphones, P(ω) is the cross-power spectrum of the two microphone signals, and A(ω) is the weighting factor.

According to formula (2), R(τ) is calculated for different time delays. The τ that maximizes R(τ) is the time delay of the sound source, and the corresponding sound source azimuth can be calculated from the distance between the microphones.

Specifically, the above formula (1) can be substituted into the above formula (2). Set θ as the azimuth angle of each candidate sound source mentioned above. Then, the target azimuth angle of the sound source is determined according to the value of the cross-correlation function corresponding to the azimuth angle of each candidate sound source calculated from the azimuth angle of the candidate sound source. For example, the candidate azimuth angle with the maximum value of the corresponding cross-correlation function may be determined as the target azimuth angle of the sound source.
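Substituting formula (1) into formula (2) and scanning the candidate azimuths can be sketched as below. This is a minimal sketch, not the disclosed implementation: it replaces the integral by a sum over discrete frequency bins, assumes the cross-power spectrum is formed as X₁(ω) times the complex conjugate of X₂(ω) so that the peak falls at the true delay, and assumes c = 343 m/s. All function and parameter names are hypothetical.

```python
import cmath
import math


def weighted_cross_correlation(cross_spectrum, weights, omegas, tau):
    """Discrete form of formula (2): sum of A(w) * P(w) * exp(j*w*tau) over bins."""
    total = 0j
    for p, a, w in zip(cross_spectrum, weights, omegas):
        total += a * p * cmath.exp(1j * w * tau)
    return total.real


def scan_candidates(cross_spectrum, weights, omegas, candidates_deg,
                    mic_distance_m, c=343.0):
    """Evaluate R at the delay of every candidate azimuth and pick the largest."""
    scores = {}
    for theta in candidates_deg:
        # Formula (1): delay implied by this candidate azimuth.
        tau = mic_distance_m * math.cos(math.radians(theta)) / c
        scores[theta] = weighted_cross_correlation(
            cross_spectrum, weights, omegas, tau)
    best = max(scores, key=scores.get)
    return best, scores
```

Here `weights` holds one value of A(ω) per frequency bin; passing all ones gives the unweighted cross-correlation, so the effect of the weighting factor can be compared directly.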

In the embodiment of the present disclosure, at least one candidate sound source azimuth is determined within the sound pickup interval corresponding to the audio information collection sensor; for the at least one candidate sound source azimuth, the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth are obtained, and the weighting factor corresponding to the candidate sound source azimuth is determined according to them; and the target azimuth of the sound source is determined according to the weighting factor of the at least one candidate sound source azimuth. Because each weighting factor is determined from the signal-to-noise ratio and the signal-to-interference ratio of its own candidate sound source azimuth, and the target azimuth is then determined from these weighting factors, the accuracy of sound source localization in multi-sound-source scenarios can be improved.

Please continue to refer to FIG. 3, which shows a flowchart of yet another embodiment of the sound source localization method according to the present disclosure. As shown in FIG. 3, the sound source localization method includes the following steps:

Step 301: Determine at least one candidate sound source azimuth in the sound pickup interval corresponding to the audio information collection sensor.

In this embodiment, for the specific implementation of the foregoing step 301, reference may be made to the description of step 101 in the embodiment shown in FIG. 1, which is not repeated here.

Step 302: For at least one candidate sound source azimuth, obtain a spatial enhancement signal of the audio signal and a spatial notch signal of the audio signal at the candidate sound source azimuth.

In this embodiment, a linear array composed of two microphones is still used as an example for description.

After the at least one candidate sound source azimuth is determined in step 301, for the at least one candidate sound source azimuth, the spatial enhancement signal and the spatial notch signal of the candidate sound source azimuth can be obtained.

Still take the example shown in FIG. 2 for description. For any candidate sound source azimuth, the first audio signal and the second audio signal corresponding to the candidate sound source azimuth can be input to a preset beamforming module to obtain the spatial enhancement signal corresponding to the candidate sound source azimuth. The first audio signal and the second audio signal may be the audio signals respectively received by the two microphones.

In practice, the signal obtained by delay-and-sum processing of the audio signals received by the two microphones can be determined as the spatially enhanced signal.

The spatially enhanced signal bf_ori can be characterized by the following formula (3):

$$\mathrm{bf\_ori} = X_1(\omega) + X_2(\omega)\,e^{-j\omega\tau} \tag{3}$$

where X₁(ω) is the frequency-domain signal obtained by transforming x₁(t) from the time domain to the frequency domain, X₂(ω) is the frequency-domain signal obtained by transforming x₂(t) from the time domain to the frequency domain, and τ = d cos(θ)/c as in formula (1). Here c is the speed of sound, d is the distance between the two microphones, θ is the azimuth angle of the candidate sound source, and τ is the difference between the times at which the sound source signal reaches the two microphones.

The first audio signal and the second audio signal corresponding to the azimuth angle of the candidate sound source may be input into a preset blocking matrix to obtain a spatial notch signal corresponding to the azimuth angle of the candidate sound source. Wherein, the first audio signal and the second audio signal may be audio signals respectively received by each of the above-mentioned two microphones.

In practice, the signal obtained by delaying the audio signals of the two microphones and taking their difference can be determined as the spatial notch signal. The spatial notch signal null_ori can be characterized by the following formula (4).

$$\mathrm{null\_ori} = X_1(\omega) - X_2(\omega)\,e^{-j\omega\tau} \tag{4}$$
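Formulas (3) and (4) differ only in the sign of the delayed second channel, so both signals can be computed in one pass over the frequency bins. The sketch below assumes the spectra are given as lists of complex values with one entry per angular frequency; the function and parameter names are hypothetical.

```python
import cmath


def enhance_and_notch(X1, X2, omegas, tau):
    """Per-bin spatial enhancement (formula 3) and spatial notch (formula 4).

    X1, X2: complex spectra of the two microphone signals, one value per bin.
    omegas: angular frequency of every bin, in rad/s.
    tau:    delay of the candidate azimuth from formula (1), in seconds.
    """
    bf_ori, null_ori = [], []
    for x1, x2, w in zip(X1, X2, omegas):
        aligned = x2 * cmath.exp(-1j * w * tau)  # X2(w) * e^{-j w tau}
        bf_ori.append(x1 + aligned)    # delay and sum: reinforces this azimuth
        null_ori.append(x1 - aligned)  # delay and subtract: cancels this azimuth
    return bf_ori, null_ori
```

When the candidate delay matches the true delay of a single source, the aligned second channel equals the first, so bf_ori doubles the source while null_ori vanishes; whatever remains in null_ori then comes from other directions or from noise.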

Step 303: Determine the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth according to the above-mentioned spatial enhancement signal and spatial notch signal, and determine the weighting factor corresponding to the candidate sound source azimuth according to that signal-to-noise ratio and signal-to-interference ratio.

After obtaining the spatial enhancement signal and the spatial notch signal of the candidate sound source azimuth in step 302, the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth can be determined.

In some optional implementations, the spatially enhanced signal may be input to a preset noise estimation module to obtain a first estimated noise signal, and the signal-to-noise ratio is determined from the spatially enhanced signal and the first estimated noise signal.

For example, formula (5) can be used to determine the signal-to-noise ratio (SNR) corresponding to the azimuth angle of the candidate sound source:

[Formula (5) is not reproduced in this text.]

where bf_noise is the first estimated noise signal, and bf_ori is the spatially enhanced signal of the azimuth angle of the candidate sound source.

The preset noise estimation module may be implemented by any of various algorithms for estimating the noise floor of a signal. In some application scenarios, the algorithm may be, for example, minima controlled recursive averaging (MCRA).
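To illustrate what a noise floor estimator does, the sketch below tracks the minimum of a recursively smoothed power spectrum over a sliding window of frames. It is a deliberately simplified stand-in and not MCRA itself: it omits the speech-presence probability that MCRA uses to control the averaging. The function name and the parameters `alpha_s` and `window` are hypothetical.

```python
def estimate_noise_floor(power_frames, alpha_s=0.8, window=8):
    """Simplified noise floor tracking (NOT full MCRA).

    power_frames: list of frames, each a list of per-bin power values.
    alpha_s:      smoothing constant of the first-order recursive average.
    window:       number of recent frames searched for the minimum.
    Returns one noise floor estimate (list of per-bin values) per input frame.
    """
    if not power_frames:
        return []
    num_bins = len(power_frames[0])
    smoothed = list(power_frames[0])
    history = []
    noise_frames = []
    for frame in power_frames:
        # First-order recursive smoothing of the power in every bin.
        smoothed = [alpha_s * s + (1.0 - alpha_s) * p
                    for s, p in zip(smoothed, frame)]
        history.append(smoothed)
        history = history[-window:]
        # The minimum over recent frames approximates the stationary floor.
        noise_frames.append([min(h[k] for h in history)
                             for k in range(num_bins)])
    return noise_frames
```

The window has to be longer than the loudest bursts to be bridged, otherwise the estimate rises with them; this trade-off is one reason more elaborate estimators are used in practice.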

In these optional implementations, the spatial notch signal may be input to the preset noise estimation module to obtain a second estimated noise signal, and the signal-to-interference ratio (SIR) is determined according to formula (6) from two quantities: the difference between the spatially enhanced signal and the first estimated noise signal, and the difference between the spatial notch signal and the second estimated noise signal.

[Formula (6) is not reproduced in this text.]

where bf_noise is the first estimated noise signal, and bf_ori is the spatially enhanced signal of the azimuth angle of the candidate sound source. null_ori is the spatial notch signal, and null_noise is the second estimated noise signal.
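Formulas (5) and (6) themselves are not legible in this text, so the sketch below uses assumed forms that are merely consistent with the verbal description: the SNR is taken as the noise-reduced enhanced power divided by the estimated noise power, and the SIR as the noise-reduced enhanced power divided by the noise-reduced notch power. These exact expressions, the flooring at zero, the small constant `EPS`, and all names are assumptions rather than the disclosed formulas.

```python
EPS = 1e-12  # assumed guard against division by zero


def snr_and_sir(bf_power, bf_noise, null_power, null_noise):
    """Per-bin SNR and SIR under ASSUMED forms of formulas (5) and (6).

    bf_power:   power of the spatially enhanced signal bf_ori, per bin.
    bf_noise:   first estimated noise power, per bin.
    null_power: power of the spatial notch signal null_ori, per bin.
    null_noise: second estimated noise power, per bin.
    """
    snr, sir = [], []
    for bp, bn, npow, nn in zip(bf_power, bf_noise, null_power, null_noise):
        target = max(bp - bn, 0.0)          # enhanced signal minus its noise
        interference = max(npow - nn, 0.0)  # notch signal minus its noise
        snr.append(target / (bn + EPS))
        sir.append(target / (interference + EPS))
    return snr, sir
```

The notch signal suppresses the candidate direction, so after its noise is removed it mostly reflects sound from other directions, which is why it serves as the interference term here.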

After the above signal-to-noise ratio and signal-to-interference ratio are determined, the weighting factor A(ω) of the azimuth angle of the sound source can be determined according to the following formula.

A(ω)=f(SNR(ω),SIR(ω)) (7);

The above-mentioned function f(SNR(ω), SIR(ω)) may be any function that is positively correlated with SNR(ω) and SIR(ω), which is not limited here.
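Since the disclosure leaves f open, one possible choice is sketched below: each ratio is mapped to the range from 0 to 1 by x / (1 + x), and the two results are multiplied. This particular f, and the function name, are illustrative assumptions only; the mapping is increasing in both arguments for non-negative inputs, which satisfies the positive-correlation requirement.

```python
def weighting_factor(snr, sir):
    """One ASSUMED choice of f(SNR, SIR): bounded and increasing in both ratios.

    snr, sir: linear (not dB) ratios for a single frequency bin.
    Returns a weight in [0, 1).
    """
    snr = max(snr, 0.0)
    sir = max(sir, 0.0)
    return (snr / (1.0 + snr)) * (sir / (1.0 + sir))
```

Applied to every frequency bin, for example `[weighting_factor(s, i) for s, i in zip(snr_bins, sir_bins)]`, this yields one A(ω) value per bin, and bins where either ratio is close to zero contribute almost nothing.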

Frequency bins with a low signal-to-noise ratio smooth the sound source estimate, which reduces the directional resolution. Frequency bins with a low signal-to-interference ratio interfere seriously with the sound source estimate: in a multi-sound-source scenario, the value of R near a high-intensity sound source is high, which affects the direction estimation of the other sound sources. The method for determining the weighting factor provided in this embodiment assigns higher weights to frequency bins with a high signal-to-noise ratio and a high signal-to-interference ratio. In this way, the adverse effects of frequency bins with a low signal-to-noise ratio or a low signal-to-interference ratio are reduced when the sound source azimuth is determined.

Step 304, for each candidate sound source azimuth, determine the value of the cross-correlation function corresponding to the candidate sound source azimuth according to the weighting factor of the candidate sound source azimuth.

Equation (2) can be used to calculate the value of the cross-correlation function R(τ) corresponding to each candidate sound source azimuth.

Step 305: Determine the target azimuth of the sound source according to the value of the cross-correlation function corresponding to each of the azimuth angles of the at least one candidate sound source.

In some application scenarios, only one sound source azimuth needs to be determined. In this case, the maximum among the values of the cross-correlation function corresponding to the at least one candidate sound source azimuth may be determined, and the candidate sound source azimuth corresponding to that maximum is determined as the target azimuth.

In some other application scenarios, the scene includes multiple sound sources, and the azimuth corresponding to each of them needs to be determined. In these application scenarios, a plurality of local extrema can be determined from the values of the cross-correlation function corresponding to the at least one candidate sound source azimuth, and the candidate sound source azimuths corresponding to these local extrema are determined as the target azimuths of the respective sound sources.
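Both selection rules can be sketched over a list of candidate azimuths and their cross-correlation values. The sketch interprets "local extrema" as local maxima, treats an end point as a peak when it exceeds its single neighbour, and keeps the strongest peaks when more are found than sources requested; these choices and the function names are assumptions.

```python
def pick_global_maximum(candidates_deg, scores):
    """Single-source case: the candidate with the largest cross-correlation value."""
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return candidates_deg[best_index]


def pick_local_maxima(candidates_deg, scores, num_sources):
    """Multi-source case: the num_sources strongest local maxima."""
    peaks = []
    for i, value in enumerate(scores):
        left = scores[i - 1] if i > 0 else float("-inf")
        right = scores[i + 1] if i < len(scores) - 1 else float("-inf")
        # Strictly above the left neighbour, not below the right one.
        if value > left and value >= right:
            peaks.append((value, candidates_deg[i]))
    peaks.sort(reverse=True)  # strongest peak first
    return [angle for _, angle in peaks[:num_sources]]
```

For example, with candidates 0°, 10°, ..., 60° and values 0.1, 0.9, 0.3, 0.2, 0.6, 0.4, 0.1, the single-source rule returns 10° and the two-source rule returns 10° and 40°.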

Compared with the embodiment shown in FIG. 1, this embodiment highlights the step of determining the weighting factor corresponding to each candidate azimuth by using the spatial enhancement signal and the spatial notch signal corresponding to that candidate azimuth. A weighting factor obtained in this way is more robust to non-stationary interference, so the target azimuth determined with this solution is also more robust to non-stationary interference. In addition, the accuracy of the determined target azimuth of the sound source can be further improved.

Please refer to FIGS. 4A and 4B. FIG. 4A is a schematic diagram of an audio signal energy distribution for sound source localization in the related art; FIG. 4B shows an audio signal energy distribution obtained according to the sound source localization method of the present disclosure. FIG. 4A shows the energy distribution of the audio signals for two sound sources in the related art. It can be seen in FIG. 4A that there is a maximum at an azimuth angle of 0°, and the angle corresponding to this maximum (0°) can be used as the target azimuth of one sound source. However, no second extremum can be clearly identified from FIG. 4A.

Please refer to FIG. 4B. FIG. 4B is a schematic diagram of an audio signal energy distribution determined according to the sound source localization method shown in FIGS. 1 and 3. As shown in FIG. 4B, the energy of the audio signal clearly has two extrema, at 0° and -60°, so it can be determined that the target azimuths corresponding to the two sound sources are 0° and -60°, respectively.

Please refer to FIG. 5, which shows a schematic structural diagram of the sound source localization method shown in FIG. 3. As shown in FIG. 5, two microphones A and B form a linear microphone array, and the microphone array can collect the audio signal emitted by the sound source. The 180° sound pickup interval corresponding to the linear microphone array formed by microphones A and B can be divided at equal intervals (every 10°) into 18 sound pickup sub-intervals. The endpoints of each of the 18 sound pickup sub-intervals can be used as candidate sound source azimuths. After the repeated candidate sound source azimuths are removed, 19 candidate sound source azimuths are obtained. For the 19 candidate sound source azimuths, reference may be made to the description of the example shown in FIG. 1, and details are not repeated here.
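By way of non-limiting illustration, the division into 18 sub-intervals and the removal of repeated endpoints might be sketched as follows. The function name `candidate_azimuths` and the angular range of -90° to 90° are assumptions made for the example (the range is chosen to be consistent with the -60° peak mentioned for FIG. 4B).

```python
def candidate_azimuths(start_deg=-90.0, end_deg=90.0, num_subintervals=18):
    """Collect both endpoints of every sub-interval, then drop duplicates."""
    step = (end_deg - start_deg) / num_subintervals
    endpoints = []
    for k in range(num_subintervals):
        lower = start_deg + k * step
        upper = start_deg + (k + 1) * step
        endpoints.extend([lower, upper])
    # Interior endpoints are shared by two neighbouring sub-intervals,
    # so they appear twice before de-duplication.
    return sorted(set(endpoints))

candidates = candidate_azimuths()
print(len(candidates), candidates[0], candidates[-1])
```

With 18 sub-intervals there are 36 endpoints in total, of which 17 interior endpoints are counted twice, leaving 36 - 17 = 19 distinct candidates.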

For each candidate sound source azimuth, the audio signal from the sound source arriving at microphone A can be expressed as x1(m), and the audio signal from the sound source arriving at microphone B can be expressed as x2(m+τ). The signals x1(m) and x2(m+τ) received by microphones A and B, respectively, are converted to the frequency domain to obtain the frequency domain signals X₁(ω) and X₂(ω)×e^(-jωτ).
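By way of non-limiting illustration, the relationship between a candidate azimuth, the delay τ, and the frequency-domain phase term might be sketched as follows. The far-field model τ = d·sin(θ)/c, the microphone spacing of 0.05 m, the speed of sound of 343 m/s, and the function names are all assumptions introduced for the example; only the phase term e^(-jωτ) is taken from the text above, with its sign convention kept as written.

```python
import cmath
import math

def tdoa_seconds(azimuth_deg, mic_spacing_m=0.05, speed_of_sound=343.0):
    """Assumed far-field delay between microphones A and B for one azimuth."""
    return mic_spacing_m * math.sin(math.radians(azimuth_deg)) / speed_of_sound

def mic_b_spectrum(x2_bin, omega, tau):
    """Apply the phase term e^(-j*omega*tau) to one frequency bin of microphone B."""
    return x2_bin * cmath.exp(-1j * omega * tau)

tau_broadside = tdoa_seconds(0)     # source straight ahead: no delay
tau_endfire = tdoa_seconds(90)      # source along the array axis: largest delay
omega = 2 * math.pi * 1000.0        # angular frequency of a 1 kHz bin
shifted = mic_b_spectrum(1.0 + 0.0j, omega, tau_endfire)
print(tau_broadside, tau_endfire, shifted)
```

The delay only rotates the phase of each bin and leaves its magnitude unchanged, which is why the later steps can compare the two microphones bin by bin.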

First, the frequency domain signals corresponding to microphones A and B are input to the beamforming module to obtain the spatially enhanced signal (a frequency domain signal) bf_ori of the signals corresponding to microphones A and B. The spatially enhanced signal is then input to the noise estimation module to obtain the first estimated noise bf_noise. The signal-to-noise ratio SNR can then be calculated from the spatially enhanced signal bf_ori and the first estimated noise bf_noise according to formula (5).
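By way of non-limiting illustration, the first step might be sketched for a single frequency bin as follows. A delay-and-sum beamformer is assumed for the beamforming module, and a simple power ratio is assumed for the signal-to-noise ratio; formula (5) is not reproduced in this passage, so the `snr` function below is a stand-in rather than the formula of the disclosure. The noise estimate is passed in as a given value because the noise estimation module is not specified here.

```python
import cmath
import math

def delay_and_sum(x1_bin, x2_bin, omega, tau):
    """Undo the phase e^(-j*omega*tau) on microphone B, then average with A."""
    return 0.5 * (x1_bin + x2_bin * cmath.exp(1j * omega * tau))

def snr(bf_ori, bf_noise, eps=1e-12):
    """Assumed power-ratio SNR; not necessarily formula (5)."""
    return abs(bf_ori) ** 2 / (abs(bf_noise) ** 2 + eps)

omega = 2 * math.pi * 1000.0
tau = 1e-4                                  # delay of the steered azimuth
s = 1.0 + 0.5j                              # source spectrum in this bin
x1_bin = s
x2_bin = s * cmath.exp(-1j * omega * tau)   # same source seen at microphone B

bf_ori = delay_and_sum(x1_bin, x2_bin, omega, tau)
bf_noise = 0.1                              # hypothetical noise estimate
snr_value = snr(bf_ori, bf_noise)
print(bf_ori, snr_value)
```

When the source lies at the steered azimuth, the two aligned bins add coherently and the beamformer output equals the source spectrum.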

Second, the frequency domain signals corresponding to microphones A and B are input to the blocking matrix module to obtain the spatial notch signal (a frequency domain signal) null_ori of the signals corresponding to microphones A and B. The spatial notch signal is then input to the noise estimation module to obtain the second estimated noise null_noise. Next, the difference between the spatially enhanced signal bf_ori and the first estimated noise bf_noise and the difference between the spatial notch signal null_ori and the second estimated noise null_noise are determined, and the signal-to-interference ratio SIR is calculated from these two differences according to formula (6).
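By way of non-limiting illustration, the second step might be sketched for a single frequency bin as follows. A delay-and-subtract structure is assumed for the blocking matrix module, and the signal-to-interference ratio is assumed to be the ratio of the noise-reduced powers of the two beams; formula (6) is not reproduced in this passage, so the `sir` function is a stand-in. Noise estimates are set to zero to keep the example small.

```python
import cmath
import math

def delay_and_sum(x1_bin, x2_bin, omega, tau):
    return 0.5 * (x1_bin + x2_bin * cmath.exp(1j * omega * tau))

def delay_and_subtract(x1_bin, x2_bin, omega, tau):
    """Places a spatial notch at the steered azimuth (assumed blocking matrix)."""
    return 0.5 * (x1_bin - x2_bin * cmath.exp(1j * omega * tau))

def sir(bf_ori, bf_noise, null_ori, null_noise, eps=1e-12):
    """Assumed ratio of noise-reduced powers; not necessarily formula (6)."""
    target = max(abs(bf_ori) ** 2 - abs(bf_noise) ** 2, 0.0)
    interference = max(abs(null_ori) ** 2 - abs(null_noise) ** 2, 0.0)
    return target / (interference + eps)

omega = 2 * math.pi * 1000.0
tau_steer = 1e-4
s = 1.0 + 0.5j

def observe(tau_source):
    """Return (bf_ori, null_ori) for a source whose true delay is tau_source."""
    x1_bin = s
    x2_bin = s * cmath.exp(-1j * omega * tau_source)
    return (delay_and_sum(x1_bin, x2_bin, omega, tau_steer),
            delay_and_subtract(x1_bin, x2_bin, omega, tau_steer))

bf_on, null_on = observe(tau_steer)      # source at the steered azimuth
bf_off, null_off = observe(-1e-4)        # source at a different azimuth
sir_on = sir(bf_on, 0.0, null_on, 0.0)
sir_off = sir(bf_off, 0.0, null_off, 0.0)
print(abs(null_on), abs(null_off), sir_on > sir_off)
```

A source at the steered azimuth is cancelled in the notch output, so the ratio is large; a source elsewhere leaks into the notch output and lowers the ratio.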

Third, the weighting factor A(ω) corresponding to the candidate sound source azimuth can be determined according to a function that is positively correlated with the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth. The value of the cross-correlation function corresponding to the candidate sound source azimuth is then determined using the weighting factor corresponding to that azimuth.
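By way of non-limiting illustration, one function that is positively correlated with both ratios might be sketched as follows. The specific form, a product of two saturating terms, is an assumption made for the example; the passage above only requires that the weighting factor increase with the signal-to-noise ratio and the signal-to-interference ratio.

```python
def weighting_factor(snr, sir):
    """Hypothetical A(omega): grows with SNR and with SIR, bounded in [0, 1)."""
    snr = max(snr, 0.0)
    sir = max(sir, 0.0)
    return (snr / (1.0 + snr)) * (sir / (1.0 + sir))

high = weighting_factor(10.0, 10.0)
medium = weighting_factor(1.0, 10.0)
low = weighting_factor(1.0, 1.0)
print(high, medium, low)
```

Bounding the factor keeps a single very clean frequency bin from dominating the cross-correlation value, while bins with little target energy contribute almost nothing.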

Finally, the target azimuth of the sound source is determined from the values of the cross-correlation function corresponding to the respective candidate sound source azimuths.
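By way of non-limiting illustration, the final step might be sketched as a scan over the candidate azimuths followed by selection of the maximum. The phase-normalized, weighted cross-correlation used here, the far-field delay model, the microphone spacing, and all names are assumptions made for the example; uniform weights are used in place of A(ω) so that the block stays self-contained.

```python
import cmath
import math

MIC_SPACING_M = 0.05      # assumed spacing of microphones A and B
SPEED_OF_SOUND = 343.0    # assumed, in m/s

def tdoa(azimuth_deg):
    return MIC_SPACING_M * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND

def weighted_cross_correlation(x1, x2, omegas, tau, weights):
    """Sum over bins of weight * Re{normalized cross-spectrum * e^(-j*omega*tau)}."""
    total = 0.0
    for a, b, omega, weight in zip(x1, x2, omegas, weights):
        cross = a * b.conjugate()
        magnitude = abs(cross)
        if magnitude == 0.0:
            continue  # an empty bin carries no phase information
        total += weight * (cross / magnitude * cmath.exp(-1j * omega * tau)).real
    return total

def locate(x1, x2, omegas, candidates, weights=None):
    if weights is None:
        weights = [1.0] * len(omegas)
    scores = [weighted_cross_correlation(x1, x2, omegas, tdoa(c), weights)
              for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores

# Synthetic single source at -60 degrees, six bins from 500 Hz to 3 kHz.
omegas = [2 * math.pi * f for f in range(500, 3001, 500)]
true_azimuth = -60
x1 = [1.0 + 0.0j for _ in omegas]
x2 = [a * cmath.exp(-1j * w * tdoa(true_azimuth)) for a, w in zip(x1, omegas)]
candidates = list(range(-90, 91, 10))

estimate, scores = locate(x1, x2, omegas, candidates)
print(estimate)
```

In this noise-free example every bin is perfectly aligned at the true azimuth, so the score there equals the number of bins and all other candidates score lower.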

Further referring to FIG. 6, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of a sound source localization apparatus. The apparatus embodiment corresponds to the method embodiment shown in FIG. 1, and the apparatus can be applied to various electronic devices.

As shown in FIG. 6, the sound source localization apparatus of this embodiment includes a first determining unit 601, an acquiring unit 602, and a second determining unit 603. The first determining unit 601 is configured to determine at least one candidate sound source azimuth in the sound pickup interval corresponding to the audio information collection sensor; the acquiring unit 602 is configured to, for the at least one candidate sound source azimuth, obtain the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth, and determine the weighting factor corresponding to the candidate sound source azimuth according to the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth; the second determining unit 603 is configured to determine the target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth.

In this embodiment, for the specific processing of the first determining unit 601, the acquiring unit 602, and the second determining unit 603 of the sound source localization apparatus and the technical effects they bring, reference may be made to the related descriptions of step 101, step 102, and step 103 in the embodiment corresponding to FIG. 1, respectively, which are not repeated here.

In some optional implementations, the second determining unit 603 is further configured to: for the at least one candidate sound source azimuth, generate the value of the cross-correlation function corresponding to the candidate sound source azimuth according to the weighting factor of the candidate sound source azimuth; and determine the target azimuth of the sound source according to the values of the cross-correlation function corresponding to the respective candidate sound source azimuths.

In some optional implementations, the first determining unit 601 is further configured to: divide the sound pickup interval into at least one sound pickup sub-interval; and determine the at least one candidate sound source azimuth within the at least one sound pickup sub-interval according to a preset rule.

In some optional implementations, the audio information collection sensor includes a linear microphone array composed of two microphones; and the first determining unit 601 is further configured to: divide the 180° sound pickup interval corresponding to the linear microphone array into a plurality of sound pickup sub-intervals at equal intervals.

In some optional implementations, the acquiring unit 602 is further configured to: obtain the spatial enhancement signal and the spatial notch signal of the audio signal at the candidate sound source azimuth; and determine the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth according to the spatial enhancement signal and the spatial notch signal.

In some optional implementations, the acquiring unit 602 is further configured to: determine a signal obtained by delaying and summing the audio signals received by the two microphones as the spatial enhancement signal; and determine a signal obtained by delaying and subtracting the audio signals received by the two microphones as the spatial notch signal.

In some optional implementation manners, the obtaining unit 602 is further configured to: input the spatially enhanced signal into a preset noise estimation module to obtain a first estimated noise signal, and use the spatially enhanced signal and the first estimated noise signal to determine the signal-to-noise ratio; input the spatial notch signal into the preset noise estimation module to obtain a second estimated noise, and use the difference between the spatially enhanced signal and the first estimated noise, the spatial notch The difference between the signal and the second estimated noise determines the signal-to-interference ratio.

In some optional implementations, the obtaining unit 602 is further configured to: determine the weighting factor according to a function that is positively correlated with the signal-to-noise ratio and the signal-to-interference ratio.

In some optional implementations, the second determining unit 603 is further configured to: determine the maximum value among the values of the cross-correlation function corresponding to the respective candidate sound source azimuths, and determine the candidate sound source azimuth corresponding to the maximum value as the target azimuth of the sound source.

In some optional implementations, the second determining unit 603 is further configured to: generate a distribution map of the values of the cross-correlation function according to the values of the cross-correlation function corresponding to the respective candidate sound source azimuths; and determine at least two local extrema in the distribution map, and determine the candidate sound source azimuths corresponding to the at least two local extrema as the target azimuths corresponding to at least two sound sources, respectively.

Please refer to FIG. 7 , which shows an exemplary system architecture in which a sound source localization method or a sound source localization apparatus according to an embodiment of the present disclosure can be applied.

As shown in FIG. 7, the system architecture may include an audio information collection sensor, a terminal device 703, a network 704, and a server 705. The audio information collection sensor may include microphones 701 and 702. In some application scenarios, the audio information collection sensor may be connected to the terminal device 703 through wired communication. In some other application scenarios, the audio information collection sensor may be set in the terminal device. The network 704 is the medium used to provide the communication link between the terminal device 703 and the server 705. The network 704 may include various connection types, such as wired or wireless communication links, fiber optic cables, and the like.

The audio information collection sensor may send the collected audio signal to the terminal device 703 through wired communication.

The terminal device 703 can interact with the server 705 through the network 704 to receive or send messages and the like. Various client applications may be installed on the terminal device 703 , such as web browser applications, search applications, news information applications, and audio signal processing applications. The client application in the terminal device 703 can receive the user's instruction, and perform corresponding functions according to the user's instruction, for example, analyze and process the audio signal according to the user's instruction.

The terminal device 703 may be hardware or software. When the terminal device 703 is hardware, it can be any of various electronic devices that have a display screen and support web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, desktop computers, and the like. When the terminal device 703 is software, it can be installed in the electronic devices listed above. It can be implemented as multiple pieces of software or software modules (e.g., software or software modules for providing distributed services), or as a single piece of software or software module. There is no specific limitation here.

The server 705 may be a server that provides various services, such as receiving the audio signal sent by the terminal device 703, performing analysis and processing according to the audio signal, and sending the processing result (eg, the target azimuth of the sound source) to the terminal device.

It should be noted that the sound source localization method provided by the embodiment of the present disclosure may be executed by a terminal device, and correspondingly, the sound source localization apparatus may be set in the terminal device 703 . In addition, the sound source localization method provided by the embodiment of the present disclosure may also be executed by the server 705 , and accordingly, the sound source localization apparatus may be provided in the server 705 .

It should be understood that the numbers of terminal devices, networks and servers in FIG. 7 are only illustrative. There can be any number of terminal devices, networks and servers according to implementation needs.

Referring next to FIG. 8, it shows a schematic structural diagram of an electronic device (e.g., a terminal device or a server in FIG. 7) suitable for implementing an embodiment of the present disclosure. Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablets), PMPs (portable multimedia players), and vehicle-mounted terminals (e.g., in-vehicle navigation terminals), and stationary terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 8 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.

As shown in FIG. 8, the electronic device may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 801, which may perform various appropriate operations and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage device 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data required for the operation of the electronic device 800. The processing device 801, the ROM 802, and the RAM 803 are connected to each other through a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

Typically, the following devices may be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 807 including, for example, a liquid crystal display (LCD), speaker, and vibrator; storage devices 808 including, for example, a magnetic tape and a hard disk; and a communication device 809. The communication device 809 may allow the electronic device to communicate wirelessly or by wire with other devices to exchange data. While FIG. 8 illustrates an electronic device having various devices, it should be understood that not all of the illustrated devices are required to be implemented or available. More or fewer devices may alternatively be implemented or provided.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from the network via the communication device 809, or from the storage device 808, or from the ROM 802. When the computer program is executed by the processing device 801, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.

It should be noted that the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the above. More specific examples of computer-readable storage media may include, but are not limited to, an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied thereon. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any suitable medium, including but not limited to electrical wire, optical fiber cable, RF (radio frequency), or any suitable combination of the foregoing.

In some embodiments, clients and servers can communicate using any currently known or future developed network protocol such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.

The above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or may exist alone without being assembled into the electronic device.

The above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: determine at least one candidate sound source azimuth in the sound pickup interval corresponding to the audio information collection sensor; for the at least one candidate sound source azimuth, obtain the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth, and determine the weighting factor corresponding to the candidate sound source azimuth according to the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth; and determine the target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth.

Computer program code for performing operations of the present disclosure may be written in one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It is also noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented in dedicated hardware-based systems that perform the specified functions or operations, or can be implemented in a combination of dedicated hardware and computer instructions.

The units involved in the embodiments of the present disclosure may be implemented in a software manner, and may also be implemented in a hardware manner. Among them, the name of the unit does not constitute a limitation of the unit itself under certain circumstances.

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logical Devices (CPLDs) and more.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The sound source localization method provided according to one or more embodiments of the present disclosure includes: determining at least one candidate sound source azimuth in a sound pickup interval corresponding to an audio information collection sensor; for the at least one candidate sound source azimuth, obtaining the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth, and determining the weighting factor corresponding to the candidate sound source azimuth according to the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth; and determining the target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth.

According to one or more embodiments of the present disclosure, the determining the target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth includes: for the at least one candidate sound source azimuth, generating the value of the cross-correlation function corresponding to the candidate sound source azimuth according to the weighting factor of the candidate sound source azimuth; and determining the target azimuth of the sound source according to the values of the cross-correlation function corresponding to the respective candidate sound source azimuths.

According to one or more embodiments of the present disclosure, the determining at least one candidate sound source azimuth in a sound pickup interval corresponding to the audio information collection sensor includes: dividing the sound pickup interval into at least one sound pickup sub-interval; and determining the at least one candidate sound source azimuth within the at least one sound pickup sub-interval according to a preset rule.

According to one or more embodiments of the present disclosure, the audio information collection sensor includes a linear microphone array composed of two microphones; and the dividing the sound pickup interval into at least one sound pickup sub-interval includes: dividing the 180° sound pickup interval corresponding to the linear microphone array into a plurality of sound pickup sub-intervals at equal intervals.

According to one or more embodiments of the present disclosure, the determining the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth includes: acquiring the spatial enhancement signal and the spatial notch signal of the audio signal at the candidate sound source azimuth; and determining the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth according to the spatial enhancement signal and the spatial notch signal.

According to one or more embodiments of the present disclosure, the acquiring the spatial enhancement signal and the spatial notch signal of the audio signal at the candidate sound source azimuth includes: determining a signal obtained by delaying and summing the audio signals received by the two microphones as the spatial enhancement signal; and determining a signal obtained by delaying and subtracting the audio signals received by the two microphones as the spatial notch signal.

According to one or more embodiments of the present disclosure, the determining, according to the spatial enhancement signal and the spatial notch signal, the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth includes: inputting the spatial enhancement signal into a preset noise estimation module to obtain a first estimated noise, and determining the signal-to-noise ratio by using the spatial enhancement signal and the first estimated noise; and inputting the spatial notch signal into the preset noise estimation module to obtain a second estimated noise, and determining the signal-to-interference ratio by using the difference between the spatial enhancement signal and the first estimated noise and the difference between the spatial notch signal and the second estimated noise.

According to one or more embodiments of the present disclosure, the determining the weighting factor corresponding to the candidate sound source azimuth according to the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth includes: determining the weighting factor according to a function that is positively correlated with the signal-to-noise ratio and the signal-to-interference ratio.

According to one or more embodiments of the present disclosure, the determining the target azimuth of the sound source according to the values of the cross-correlation function corresponding to the respective candidate sound source azimuths includes: determining the maximum value among the values of the cross-correlation function corresponding to the respective candidate sound source azimuths, and determining the candidate sound source azimuth corresponding to the maximum value as the target azimuth of the sound source.

According to one or more embodiments of the present disclosure, the determining the target azimuth of the sound source according to the values of the cross-correlation function corresponding to the respective candidate sound source azimuths includes: generating a distribution map of the values of the cross-correlation function according to the values of the cross-correlation function corresponding to the respective candidate sound source azimuths; and determining at least two local extrema in the distribution map, and determining the candidate sound source azimuths corresponding to the at least two local extrema as the target azimuths corresponding to at least two sound sources, respectively.

The above description is merely a preferred embodiment of the present disclosure and an illustration of the technical principles employed. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example, a technical solution formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.

Additionally, although operations are depicted in a particular order, this should not be construed as requiring that the operations be performed in the particular order shown or in a sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although the above discussion contains several implementation-specific details, these should not be construed as limitations on the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or logical acts of method, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.

Claims

A sound source localization method, comprising:

determining at least one candidate sound source azimuth within a sound pickup interval corresponding to an audio information collection sensor;

for the at least one candidate sound source azimuth, obtaining a signal-to-noise ratio and a signal-to-interference ratio of the candidate sound source azimuth, and determining a weighting factor corresponding to the candidate sound source azimuth according to the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth; and

determining a target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth.

The method according to claim 1, wherein the determining the target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth comprises:

For at least one candidate sound source azimuth, generate the value of the cross-correlation function corresponding to the candidate sound source azimuth according to the weighting factor of the candidate sound source azimuth;

The target azimuth of the sound source is determined according to the value of the cross-correlation function corresponding to each of the at least one candidate sound source azimuth.
3. The method according to claim 1, wherein the determining at least one candidate sound source azimuth within the sound pickup interval corresponding to the audio information collection sensor comprises:

dividing the sound pickup interval into at least one sound pickup sub-interval; and

determining the at least one candidate sound source azimuth within the at least one sound pickup sub-interval according to a preset rule.
4. The method according to claim 2, wherein the audio information collection sensor comprises a linear microphone array consisting of two microphones; and

the dividing the sound pickup interval into at least one sound pickup sub-interval comprises:

dividing the 180° sound pickup interval corresponding to the linear microphone array into a plurality of sound pickup sub-intervals at equal intervals.
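
As a non-limiting illustration, the equal division of the 180° sound pickup interval could be sketched in Python as follows. Taking the midpoint of each sub-interval as the candidate azimuth is only one possible preset rule and is an assumption of this sketch, as is the function name `candidate_azimuths`.

```python
def candidate_azimuths(num_subintervals, span_deg=180.0):
    """Split [0, span_deg] into equal sub-intervals and return one
    candidate azimuth per sub-interval (its midpoint, by assumption)."""
    if num_subintervals < 1:
        raise ValueError("at least one sub-interval is required")
    width = span_deg / num_subintervals
    return [(index + 0.5) * width for index in range(num_subintervals)]


# Four sub-intervals of 45 degrees each give four candidate azimuths.
print(candidate_azimuths(4))
```

A larger number of sub-intervals gives a finer angular grid at the cost of evaluating more candidates.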
5. The method according to claim 3, wherein the determining the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth comprises:

obtaining a spatial enhancement signal of the audio signal and a spatial notch signal of the audio signal at the candidate sound source azimuth; and

determining the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth according to the spatial enhancement signal and the spatial notch signal.
6. The method according to claim 5, wherein the obtaining the spatial enhancement signal of the audio signal and the spatial notch signal of the audio signal at the candidate sound source azimuth comprises:

determining, as the spatial enhancement signal, the signal obtained by delaying and summing the audio signals received by the two microphones; and

determining, as the spatial notch signal, the signal obtained by delaying and subtracting the audio signals received by the two microphones.
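
As a non-limiting illustration, the two signals above can be read as a delay-and-sum output and a delay-and-difference output of the two-microphone array. The Python sketch below works on a single frequency bin and makes several assumptions that do not come from the claims: a far-field source, a speed of sound of 343 m/s, the second microphone receiving the wavefront later than the first by d·cos(θ)/c, and the hypothetical function names `steering_delay` and `enhance_and_notch`.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # metres per second, assumed value


def steering_delay(theta_deg, mic_spacing_m):
    """Delay between the two microphones for a far-field source at theta."""
    return mic_spacing_m * math.cos(math.radians(theta_deg)) / SPEED_OF_SOUND


def enhance_and_notch(x1, x2, freq_hz, theta_deg, mic_spacing_m):
    """Return (enhancement, notch) for one frequency bin.

    x1, x2 are the complex spectrum values at the two microphones.
    """
    tau = steering_delay(theta_deg, mic_spacing_m)
    # Advance the second channel so that it lines up with the first.
    aligned = x2 * cmath.exp(1j * 2 * math.pi * freq_hz * tau)
    return x1 + aligned, x1 - aligned
```

Under these assumptions, a source located exactly at the candidate azimuth is doubled in the enhancement output and cancelled in the notch output, which is what makes the pair useful for separating the target from interference.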
7. The method according to claim 5, wherein the determining the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth according to the spatial enhancement signal and the spatial notch signal comprises:

inputting the spatial enhancement signal into a preset noise estimation module to obtain a first estimated noise signal, and determining the signal-to-noise ratio using the spatial enhancement signal and the first estimated noise signal; and

inputting the spatial notch signal into the preset noise estimation module to obtain a second estimated noise signal, and determining the signal-to-interference ratio using the difference between the spatial enhancement signal and the first estimated noise signal and the difference between the spatial notch signal and the second estimated noise signal.
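
As a non-limiting illustration, the two ratios could be computed from signal powers as in the Python sketch below. The noise estimation module itself is not modelled; its two outputs are passed in as `noise_enh` and `noise_notch`. Forming each ratio as a simple quotient of powers, clamping negative differences, and the function name `snr_and_sir` are all assumptions of this sketch rather than the formulas of the disclosure.

```python
def snr_and_sir(enh_power, notch_power, noise_enh, noise_notch, eps=1e-12):
    """Return (snr, sir) as linear power ratios.

    enh_power, notch_power: power of the spatial enhancement / notch signal.
    noise_enh, noise_notch: first / second estimated noise power.
    eps: small floor that avoids division by zero (assumed value).
    """
    snr = enh_power / max(noise_enh, eps)
    # Enhancement minus its noise approximates the target component.
    target = max(enh_power - noise_enh, 0.0)
    # Notch minus its noise approximates the interference component.
    interference = max(notch_power - noise_notch, eps)
    return snr, target / interference


print(snr_and_sir(10.0, 3.0, 2.0, 1.0))
```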
8. The method according to claim 1, wherein the determining the weighting factor corresponding to the candidate sound source azimuth according to the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth comprises:

determining the weighting factor according to a function that is positively correlated with the signal-to-noise ratio and the signal-to-interference ratio.
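
As a non-limiting illustration, one function that increases with both ratios is a product of two saturating terms, sketched in Python below. This particular form and the function name `weighting_factor` are assumptions of this sketch; any other function that grows with both the signal-to-noise ratio and the signal-to-interference ratio would fit the wording above equally well.

```python
def weighting_factor(snr, sir):
    """Map linear SNR and SIR values to a weight in the range [0, 1).

    Each term x / (1 + x) rises monotonically with x, so the product
    rises with both the SNR and the SIR.
    """
    snr = max(snr, 0.0)
    sir = max(sir, 0.0)
    return (snr / (1.0 + snr)) * (sir / (1.0 + sir))
```

With this choice, a candidate azimuth that is dominated by noise or by interference receives a weight close to zero and contributes little to the cross-correlation value.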
9. The method according to claim 1, wherein the determining the target azimuth of the sound source according to the value of the cross-correlation function corresponding to each of the at least one candidate sound source azimuth comprises:

determining the maximum value among the values of the cross-correlation function corresponding to the at least one candidate sound source azimuth, and determining the candidate sound source azimuth corresponding to the maximum value as the target azimuth of the sound source.
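
As a non-limiting illustration, selecting the candidate azimuth with the largest cross-correlation value could be sketched in Python as follows. The function name `pick_target_azimuth` and the example values are hypothetical.

```python
def pick_target_azimuth(azimuths_deg, correlation_values):
    """Return the candidate azimuth whose cross-correlation value is largest."""
    if not azimuths_deg or len(azimuths_deg) != len(correlation_values):
        raise ValueError("expected one value per candidate azimuth")
    best = max(range(len(azimuths_deg)), key=lambda i: correlation_values[i])
    return azimuths_deg[best]


# Made-up values for four candidate azimuths.
print(pick_target_azimuth([22.5, 67.5, 112.5, 157.5], [0.1, 0.9, 0.3, 0.2]))
```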
10. The method according to claim 1, wherein the determining the target azimuth of the sound source according to the value of the cross-correlation function corresponding to each of the at least one candidate sound source azimuth comprises:

generating a distribution map of the values of the cross-correlation function according to the values of the cross-correlation function corresponding to the at least one candidate sound source azimuth; and

determining at least two local extrema in the distribution map, and determining the candidate sound source azimuths corresponding to the at least two local extrema as the target azimuths of at least two sound sources, respectively.
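
As a non-limiting illustration, locating several sound sources from local maxima of the cross-correlation values could be sketched in Python as follows. Treating the local extrema as local maxima, letting the two end points of the interval count as maxima, keeping the strongest `num_sources` peaks, and the function name `pick_multiple_azimuths` are assumptions of this sketch.

```python
def pick_multiple_azimuths(azimuths_deg, values, num_sources=2):
    """Return the azimuths of the strongest local maxima, in ascending order."""
    peaks = []
    for i, value in enumerate(values):
        left = values[i - 1] if i > 0 else float("-inf")
        right = values[i + 1] if i < len(values) - 1 else float("-inf")
        if value > left and value > right:
            peaks.append(i)
    # Rank the peaks by height and keep the requested number of them.
    peaks.sort(key=lambda i: values[i], reverse=True)
    return sorted(azimuths_deg[i] for i in peaks[:num_sources])


# Made-up values with peaks at 30, 120 and 180 degrees.
grid = [0, 30, 60, 90, 120, 150, 180]
print(pick_multiple_azimuths(grid, [0.1, 0.8, 0.2, 0.1, 0.6, 0.3, 0.4]))
```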
11. A sound source localization apparatus, comprising:

a first determining unit, configured to determine at least one candidate sound source azimuth within the sound pickup interval corresponding to an audio information collection sensor;

an acquisition unit, configured to, for the at least one candidate sound source azimuth, acquire the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth, and determine the weighting factor corresponding to the candidate sound source azimuth according to the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth; and

a second determining unit, configured to determine the target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth.
12. An electronic device, comprising:

at least one processor; and

a storage apparatus configured to store at least one program,

wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method according to any one of claims 1-10.
13. A computer-readable medium on which a computer program is stored, wherein, when the program is executed by a processor, the method according to any one of claims 1-10 is implemented.