WO2022135131A1 - Sound source positioning method and apparatus, and electronic device - Google Patents
- Publication number
- WO2022135131A1 (PCT/CN2021/135833)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sound source
- signal
- azimuth
- candidate
- candidate sound
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S5/00—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
- G01S5/18—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
- G01S5/22—Position of source determined by co-ordinating a plurality of position lines defined by path-difference measurements
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Definitions
- The present disclosure relates to the technical field of information processing, and in particular, to a sound source localization method, apparatus and electronic device.
- Sound source localization refers to the technique of estimating the position of a sound source from an audio signal. Sound source localization includes the localization of speech and of other sounds. Sound source localization has a wide range of applications. For example, a security robot can adjust its camera to capture an image of the sound source position determined by the sound source localization technology.
- Embodiments of the present disclosure provide a sound source localization method, apparatus, and electronic device.
- In a first aspect, an embodiment of the present disclosure provides a sound source localization method, the method including: determining at least one candidate sound source azimuth in a sound pickup interval corresponding to an audio information collection sensor; for the at least one candidate sound source azimuth, obtaining the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth, and determining the weighting factor corresponding to the candidate sound source azimuth according to the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth; and determining the target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth.
- In a second aspect, an embodiment of the present disclosure provides a sound source localization device, the device including: a first determination unit, configured to determine at least one candidate sound source azimuth within a sound pickup interval corresponding to an audio information collection sensor; an acquisition unit, configured to, for the at least one candidate sound source azimuth, obtain the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth, and determine the weighting factor corresponding to the candidate sound source azimuth according to the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth; and a second determining unit, configured to determine the target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth.
- In a third aspect, embodiments of the present disclosure provide an electronic device, including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the sound source localization method as described in the first aspect.
- In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored, where the program, when executed by a processor, implements the steps of the sound source localization method according to the first aspect.
- FIG. 1 is a flowchart of one embodiment of a sound source localization method according to the present disclosure;
- FIG. 2 is a schematic diagram of determining the azimuth angle of a sound source signal using two microphones;
- FIG. 3 is a flowchart of yet another embodiment of a sound source localization method according to the present disclosure;
- FIG. 4A is a schematic diagram of a sound source localization result in the related art;
- FIG. 4B is a schematic diagram of a sound source localization result obtained according to the sound source localization method of the present disclosure;
- FIG. 5 is a schematic structural diagram of the sound source localization method shown in FIG. 3;
- FIG. 6 is a schematic structural diagram of an embodiment of a sound source localization device according to the present disclosure;
- FIG. 7 is an exemplary system architecture to which a sound source localization method or sound source localization apparatus according to an embodiment of the present disclosure can be applied;
- FIG. 8 is a schematic diagram of a basic structure of an electronic device provided according to an embodiment of the present disclosure.
- The term “including” and variations thereof are open-ended inclusions, i.e., “including but not limited to”.
- the term “based on” is “based at least in part on.”
- the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
- FIG. 1 shows a flow of an embodiment of a sound source localization method according to the present disclosure.
- the sound source localization method includes the following steps:
- Step 101 Determine at least one candidate sound source azimuth in the sound pickup interval corresponding to the audio information collection sensor.
- Audio information collection sensors may include microphone arrays (also called acoustic arrays).
- Microphone arrays may include linear microphone arrays and non-linear microphone arrays.
- The collected analog audio signal can be converted into an electrical signal and then sampled to obtain a digitized audio signal.
- The distance between the sound source and the audio information collection sensor is much larger than the size of the audio information collection sensor, so the audio signal emitted by the sound source can be regarded as a plane wave.
- A microphone array is usually composed of multiple microphones arranged according to certain rules. The microphones can collect sound signals synchronously, and the position of the sound source that emits the sound can be determined by using the signal phase differences between the microphones. The position of the sound source may be, for example, the azimuth angle at which the sound source is located.
- Different microphone arrays can correspond to different pickup intervals.
- The sound pickup interval in this embodiment refers to the spatial range within which the microphone array can localize a sound source once symmetry is taken into account, and is generally a planar interval or a spatial interval.
- The sound pickup interval corresponding to a linear microphone array can be a two-dimensional 180° interval, because the localization result is rotationally symmetric about the line connecting the microphones.
- The sound pickup interval corresponding to a planar annular microphone array can be a two-dimensional 360° interval, because the localization result is mirror-symmetric about the microphone plane.
- The sound pickup interval of a three-dimensional microphone array can be a three-dimensional 360° interval.
- Take a linear microphone array consisting of two microphones as an example. A coordinate system is established with a line parallel to the line connecting the two microphones as the x-axis and a line perpendicular to the x-axis as the y-axis. The two microphones are placed on the x-axis, with the midpoint of the line connecting them at the intersection O of the x-axis and the y-axis. The sound pickup interval corresponding to this array then runs from a sound source angle of 0° to a sound source angle of 180°, measured from the positive x-axis.
- Alternatively, if a line parallel to the line connecting the two microphones is used as the y-axis and a line perpendicular to the y-axis is used as the x-axis, the sound pickup interval corresponding to the linear microphone array composed of two microphones runs from a sound source angle of -90° to a sound source angle of 90°, measured from the positive x-axis.
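- The two coordinate conventions above describe the same set of directions. A minimal sketch of converting between them follows. It assumes that the positive y-axis of the second convention points along the positive x-axis of the first, an orientation the text does not specify; the function names are illustrative only.

```python
def axis_to_perpendicular(theta_deg):
    """Convert an angle in [0, 180] measured from the microphone axis
    into the [-90, 90] convention measured from the perpendicular axis."""
    if not 0 <= theta_deg <= 180:
        raise ValueError("theta_deg must lie in [0, 180]")
    return 90 - theta_deg


def perpendicular_to_axis(phi_deg):
    """Inverse conversion: [-90, 90] back to [0, 180]."""
    if not -90 <= phi_deg <= 90:
        raise ValueError("phi_deg must lie in [-90, 90]")
    return 90 - phi_deg
```

Under this assumption, an azimuth of -60° in the second convention corresponds to 150° in the first.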
- At least one candidate sound source azimuth angle may be determined within the above-mentioned sound pickup interval.
- Determining at least one candidate sound source azimuth in the sound pickup interval corresponding to the audio information collection sensor includes:
- dividing the sound pickup interval into at least one sound pickup sub-interval; and
- determining at least one candidate sound source azimuth within the at least one sound pickup sub-interval according to a preset rule.
- The entire sound pickup interval may be regarded as a single sound pickup sub-interval, and at least one candidate sound source azimuth is then determined within that sub-interval according to a preset rule. For example, the two end points of the sound pickup sub-interval are taken as candidate sound source azimuths.
- The sound pickup sub-intervals may be divided according to a preset division rule.
- For example, the number of sound pickup sub-intervals may be determined first, and the sound pickup interval is then divided into that number of equal sub-intervals.
- Alternatively, the sound pickup interval may be divided at unequal intervals.
- the audio information collection sensor includes a linear microphone array composed of two microphones.
- Dividing the sound pickup interval into at least one sound pickup sub-interval includes: dividing the 180° sound pickup interval corresponding to the linear microphone array into a plurality of sound pickup sub-intervals at equal intervals.
- The following description takes the sound pickup interval of the linear microphone array to be the interval from 0° to 180° in front of the microphones as an example; this 180° sound pickup interval can be divided into 18 sound pickup sub-intervals at equal intervals.
- The above 18 sound pickup sub-intervals can be: 0°~10°, 10°~20°, 20°~30°, 30°~40°, 40°~50°, 50°~60°, 60°~70°, 70°~80°, 80°~90°, 90°~100°, 100°~110°, 110°~120°, 120°~130°, 130°~140°, 140°~150°, 150°~160°, 160°~170°, 170°~180°.
- At least one candidate sound source azimuth angle may be determined in at least one sound pickup subinterval.
- The azimuth angles corresponding to the two end points of each sound pickup sub-interval may be used as the candidate sound source azimuths.
- Repeated candidate sound source azimuths can be de-duplicated to obtain the candidate sound source azimuths corresponding to the sound pickup interval.
- The candidate sound source azimuths determined in the above sound pickup sub-intervals may be: 0°, 10°, 20°, 30°, 40°, 50°, 60°, 70°, 80°, 90°, 100°, 110°, 120°, 130°, 140°, 150°, 160°, 170°, 180°.
- Alternatively, one sound source azimuth inside each sound pickup sub-interval (excluding its two end points), for example the azimuth in the middle of the sub-interval, can be used as the candidate sound source azimuth corresponding to that sound pickup sub-interval.
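- The two ways of choosing candidates described above (de-duplicated end points, or one interior point per sub-interval) can be sketched as follows. The function names and default arguments are illustrative assumptions; only the 0° to 180° range and the 18 equal sub-intervals come from the example in the text.

```python
def endpoint_candidates(start_deg=0.0, stop_deg=180.0, n_sub=18):
    """Split [start_deg, stop_deg] into n_sub equal sub-intervals and return
    the de-duplicated end points as candidate azimuths, in degrees."""
    step = (stop_deg - start_deg) / n_sub
    sub_intervals = [(start_deg + i * step, start_deg + (i + 1) * step)
                     for i in range(n_sub)]
    # Neighbouring sub-intervals share an end point, so a set removes repeats.
    return sorted({round(a, 6) for pair in sub_intervals for a in pair})


def midpoint_candidates(start_deg=0.0, stop_deg=180.0, n_sub=18):
    """Return the midpoint of each sub-interval as its candidate azimuth."""
    step = (stop_deg - start_deg) / n_sub
    return [start_deg + (i + 0.5) * step for i in range(n_sub)]
```

With the defaults, the end-point rule yields 19 candidates (0°, 10°, ..., 180°) and the midpoint rule yields 18 (5°, 15°, ..., 175°).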
- Step 102 for at least one candidate sound source azimuth, determine the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth, and determine the weighting factor corresponding to the candidate sound source azimuth according to the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth.
- the signal-to-noise ratio of the azimuth angle of the candidate sound source can be determined according to various methods for determining the signal-to-noise ratio.
- For example, the power of the noise can be measured by a noise measurement method, the power of the audio signal corresponding to the candidate sound source azimuth can then be determined, and the signal-to-noise ratio of the candidate sound source azimuth can be determined from the ratio of the power of the audio signal to the power of the noise.
- The signal-to-interference ratio of the candidate sound source azimuth can be determined according to various methods for determining the signal-to-interference ratio. For example, the power of the interference signal can be obtained by measuring the interference signal, the power of the audio signal corresponding to the candidate sound source azimuth can then be determined, and the signal-to-interference ratio of the candidate sound source azimuth can be determined from the ratio of the power of the audio signal to the power of the interference signal.
- the weighting factor corresponding to the candidate azimuth can be determined by any function that is positively related to the signal-to-noise ratio and the signal-to-interference ratio.
- Step 103 Determine the target azimuth angle of the sound source according to the weighting factor of the azimuth angle of at least one candidate sound source.
- step 103 may include the following steps:
- Sub-step 1031 for at least one candidate sound source azimuth, generate a value of the cross-correlation function corresponding to the candidate sound source azimuth according to the weighting factor of the candidate sound source azimuth.
- Sub-step 1032 Determine the target azimuth of the sound source according to the value of the cross-correlation function corresponding to each of the azimuth angles of the at least one candidate sound source.
- the following takes a linear array composed of two microphones as an example for description.
- The most basic unit of a microphone array is a linear array of two microphones.
- For other microphone arrays, a linear array composed of two microphones can be used as the basic array unit for analysis, which will not be described here.
- FIG. 2 shows a schematic diagram of two microphones A and B respectively receiving audio signals. It is assumed that the first audio signal received by A is $x_1(m)$, and the second audio signal received by B is $x_2(m+\tau)$. The cross-correlation function of the first audio signal and the second audio signal is calculated, and the delay that maximizes the cross-correlation function is the time difference $\tau$ between the first audio signal and the second audio signal.
- The sound source azimuth angle $\theta$ is determined using the following formula (1):
- $\tau = \dfrac{d\cos(\theta)}{c}$ (1)
- where $d\cos(\theta)$ is the travel difference between the first audio signal and the second audio signal, $d$ is the distance between the two microphones, and $c$ is the speed of sound.
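- The relation between the time difference and the azimuth, $\tau = d\cos(\theta)/c$, can be sketched in both directions as follows. The value 343 m/s for the propagation speed of sound in air, the 0.1 m spacing in the usage note, and the function names are assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def delay_from_azimuth(theta_deg, mic_distance_m, c=SPEED_OF_SOUND):
    """Time difference of arrival (seconds) for a plane wave from theta_deg."""
    return mic_distance_m * math.cos(math.radians(theta_deg)) / c


def azimuth_from_delay(tau_s, mic_distance_m, c=SPEED_OF_SOUND):
    """Invert the relation; the cosine is clamped to [-1, 1] so that small
    measurement errors in tau_s do not make acos fail."""
    cos_theta = max(-1.0, min(1.0, tau_s * c / mic_distance_m))
    return math.degrees(math.acos(cos_theta))
```

For a spacing of 0.1 m, a source at 0° gives the largest delay, 0.1/343 s (about 0.29 ms), and a source at 90° gives a delay of essentially zero.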
- The cross-correlation function can be expressed by the following formula (2):
- $R(\tau) = \int A(\omega)\,P(\omega)\,e^{j\omega\tau}\,d\omega$ (2)
- where $\omega$ is the frequency, $\tau$ is the time delay between the signals received by the two microphones, $P(\omega)$ is the cross-power spectrum of the two microphone signals, and $A(\omega)$ is the weighting factor.
- $R(\tau)$ is calculated for different time delays.
- The $\tau$ that maximizes $R(\tau)$ is the time delay of the sound source, and the corresponding sound source azimuth can be calculated from the distance between the microphones.
- the above formula (1) can be substituted into the above formula (2).
- Set $\theta$ to each of the candidate sound source azimuths mentioned above in turn.
- The target azimuth angle of the sound source is determined according to the values of the cross-correlation function calculated for the candidate sound source azimuths.
- the candidate azimuth angle with the maximum value of the corresponding cross-correlation function may be determined as the target azimuth angle of the sound source.
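- Substituting formula (1) into formula (2) and scanning the candidate azimuths can be sketched as follows. This is a sketch under stated assumptions rather than the disclosed implementation: the integral is replaced by a sum over frequency bins, the cross-power spectrum is taken as $X_1(\omega)\cdot\overline{X_2(\omega)}$, the speed of sound is 343 m/s, and the spectra in the demo are synthetic (flat spectrum, one source at 60°, uniform weights). All names are illustrative.

```python
import cmath
import math


def gcc_over_azimuths(X1, X2, omegas, weights, candidates_deg, d, c=343.0):
    """Return {azimuth: R}, with R = Re(sum_k A_k * P_k * exp(j * w_k * tau))
    and tau = d * cos(theta) / c for each candidate azimuth theta."""
    scores = {}
    for theta in candidates_deg:
        tau = d * math.cos(math.radians(theta)) / c
        total = 0.0
        for x1, x2, w, a in zip(X1, X2, omegas, weights):
            cross_power = x1 * x2.conjugate()  # assumed form of P(w)
            total += (a * cross_power * cmath.exp(1j * w * tau)).real
        scores[theta] = total
    return scores


# Synthetic demo: microphone B leads microphone A by the delay of a 60 degree source.
d = 0.05                                           # microphone spacing in metres
omegas = [2 * math.pi * f for f in range(200, 3001, 200)]
true_tau = d * math.cos(math.radians(60)) / 343.0
X1 = [1 + 0j for _ in omegas]
X2 = [x * cmath.exp(1j * w * true_tau) for x, w in zip(X1, omegas)]
scores = gcc_over_azimuths(X1, X2, omegas, [1.0] * len(omegas),
                           range(0, 181, 10), d)
best = max(scores, key=scores.get)                 # candidate with the largest R
```

In this synthetic case the largest value of R falls on the 60° candidate. Replacing the uniform weights with SNR- and SIR-dependent weights changes which frequency bins dominate the sum.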
- In this embodiment, at least one candidate sound source azimuth is determined within the sound pickup interval corresponding to the audio information collection sensor; for the at least one candidate sound source azimuth, the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth are obtained and the corresponding weighting factor is determined from them; and the target azimuth of the sound source is determined according to the weighting factors.
- Because each weighting factor is determined by the signal-to-noise ratio and the signal-to-interference ratio of the corresponding candidate sound source azimuth, and the target azimuth is then determined using these weighting factors, the accuracy of sound source localization in multi-sound-source scenarios can be improved.
- FIG. 3 shows a flowchart of yet another embodiment of the sound source localization method according to the present disclosure.
- the sound source localization method includes the following steps:
- Step 301 Determine at least one candidate sound source azimuth in the sound pickup interval corresponding to the audio information collection sensor.
- For the specific implementation of step 301, reference may be made to the description of step 101 in the embodiment shown in FIG. 1, which is not repeated here.
- Step 302 for at least one candidate sound source azimuth, obtain a spatial enhancement signal of the audio signal and a spatial notch signal of the audio signal at the sound source azimuth.
- a linear array composed of two microphones is still used as an example for description.
- the spatial enhancement signal and the spatial notch signal of the candidate sound source azimuth can be obtained.
- The first audio signal and the second audio signal corresponding to the candidate sound source azimuth can be input to a preset beamforming module to obtain the spatial enhancement signal corresponding to the candidate sound source azimuth.
- the first audio signal and the second audio signal may be audio signals respectively received by each of the above-mentioned two microphones.
- For example, a signal obtained by delay-compensating and summing the audio signals received by the two microphones can be determined as the spatially enhanced signal.
- The spatially enhanced signal bf_ori can be characterized by the following formula (3):
- $\mathrm{bf\_ori} = X_1(\omega) + X_2(\omega)\cdot e^{-j\omega\tau}$ (3)
- where $\tau = d\cos(\theta)/c$;
- $X_1(\omega)$ is the frequency domain signal obtained by converting $x_1(t)$ from the time domain to the frequency domain;
- $X_2(\omega)$ is the frequency domain signal obtained by converting $x_2(t)$ from the time domain to the frequency domain;
- $c$ is the speed of sound;
- $d$ is the distance between the two microphones;
- $\theta$ is the azimuth angle of the candidate sound source;
- $\tau$ is the time difference between the sound source signal reaching the two microphones.
- the first audio signal and the second audio signal corresponding to the azimuth angle of the candidate sound source may be input into a preset blocking matrix to obtain a spatial notch signal corresponding to the azimuth angle of the candidate sound source.
- the first audio signal and the second audio signal may be audio signals respectively received by each of the above-mentioned two microphones.
- For example, a signal obtained by delay-compensating and subtracting the audio signals of the two microphones can be determined as the spatial notch signal.
- The spatial notch signal null_ori can be characterized by the following formula (4):
- $\mathrm{null\_ori} = X_1(\omega) - X_2(\omega)\cdot e^{-j\omega\tau}$ (4)
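- For a single frequency bin, the spatial enhancement signal and the spatial notch signal of one candidate azimuth can be sketched as follows. The sketch assumes an unscaled sum for the enhancement signal, the difference of formula (4) for the notch signal, a delay of $d\cos(\theta)/c$ with $c$ = 343 m/s, and a synthetic source at 60°. The function name and the demo values are illustrative.

```python
import cmath
import math


def enhanced_and_notch(x1, x2, theta_deg, omega, d, c=343.0):
    """Return (bf_ori, null_ori) for one frequency bin and one candidate azimuth."""
    tau = d * math.cos(math.radians(theta_deg)) / c
    aligned = x2 * cmath.exp(-1j * omega * tau)  # delay-compensated second channel
    bf_ori = x1 + aligned                        # sum reinforces the steered direction
    null_ori = x1 - aligned                      # difference cancels the steered direction
    return bf_ori, null_ori


# Synthetic bin at 1 kHz with a single source at 60 degrees.
d = 0.05
omega = 2 * math.pi * 1000.0
true_tau = d * math.cos(math.radians(60)) / 343.0
x1 = 1 + 0j
x2 = x1 * cmath.exp(1j * omega * true_tau)

bf_at_source, null_at_source = enhanced_and_notch(x1, x2, 60, omega, d)
bf_off_source, null_off_source = enhanced_and_notch(x1, x2, 120, omega, d)
```

When the candidate azimuth matches the source, the notch output is close to zero and the enhanced output is close to twice the input; steering away from the source lets the source leak into the notch output.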
- Step 303 Determine the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth according to the above-mentioned spatial enhancement signal and spatial notch signal, and determine the weighting factor corresponding to the candidate sound source azimuth according to the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth.
- The signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth can be determined.
- The above-mentioned spatially enhanced signal may be input to a preset noise estimation module, a first estimated noise signal is obtained by the preset noise estimation module, and the SNR is determined by using the spatially enhanced signal and the first estimated noise signal.
- the following formula can be used to determine the signal-to-noise ratio (SNR) corresponding to the azimuth angle of the candidate sound source:
- bf_noise is the first estimated noise signal
- bf_ori is the spatially enhanced signal of the azimuth angle of the candidate sound source.
- the above-mentioned preset noise estimation module may be a noise estimation module implemented by various algorithms for determining the signal noise floor.
- The algorithm for determining the signal noise floor may be, for example, the minima controlled recursive averaging (MCRA) algorithm.
- The spatial notch signal may be input to the preset noise estimation module to obtain a second estimated noise signal, and the signal-to-interference ratio (SIR) is determined from the difference between the spatially enhanced signal and the first estimated noise and the difference between the spatial notch signal and the second estimated noise.
- bf_noise is the first estimated noise signal
- bf_ori is the spatially enhanced signal of the azimuth angle of the candidate sound source.
- null_ori is the spatial notch signal
- null_noise is the second estimated noise signal.
- The weighting factor $A(\omega)$ of the azimuth angle of the sound source can be determined according to the following formula:
- $A(\omega) = f(\mathrm{SNR}(\omega), \mathrm{SIR}(\omega))$
- The above-mentioned function $f(\mathrm{SNR}(\omega), \mathrm{SIR}(\omega))$ may be any function that is positively correlated with $\mathrm{SNR}(\omega)$ and $\mathrm{SIR}(\omega)$, which is not limited here.
- Frequency bins with a low signal-to-noise ratio have a smoothing effect on the sound source estimate, which reduces the directional resolution.
- Frequency bins with a low signal-to-interference ratio cause serious interference to the sound source estimate.
- For example, the R value near a high-intensity sound source will be high, which affects the direction estimation of other sound sources.
- The method for determining the weighting factor provided in this embodiment assigns higher weights to the frequency bins with a high signal-to-noise ratio and a high signal-to-interference ratio. In this way, the adverse effects caused by frequency bins with a low signal-to-noise ratio or a low signal-to-interference ratio can be reduced when determining the sound source azimuth.
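- One possible way to turn the four per-bin quantities into a weight is sketched below. The exact forms are assumptions made for illustration: SNR and SIR are taken as ratios of noise-subtracted powers, and f is chosen as a product of two saturating terms, which is only one of many functions that increase with both SNR and SIR. The names `snr_sir`, `weighting_factor` and `EPS` are hypothetical.

```python
EPS = 1e-12  # floor that keeps powers positive and avoids division by zero


def snr_sir(bf_ori, bf_noise, null_ori, null_noise):
    """Assumed forms: noise-subtracted beam power over noise power (SNR) and
    over noise-subtracted notch power (SIR)."""
    signal_power = max(abs(bf_ori) ** 2 - abs(bf_noise) ** 2, EPS)
    interference_power = max(abs(null_ori) ** 2 - abs(null_noise) ** 2, EPS)
    snr = signal_power / max(abs(bf_noise) ** 2, EPS)
    sir = signal_power / interference_power
    return snr, sir


def weighting_factor(snr, sir):
    """Illustrative f(SNR, SIR): grows with both arguments and stays in [0, 1)."""
    return (snr / (1.0 + snr)) * (sir / (1.0 + sir))


snr, sir = snr_sir(bf_ori=2.0, bf_noise=0.5, null_ori=0.6, null_noise=0.5)
weight = weighting_factor(snr, sir)
```

A bin dominated by noise or by an interfering source receives a weight near zero under this choice, so it contributes little to the weighted cross-correlation.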
- Step 304 for each candidate sound source azimuth, determine the value of the cross-correlation function corresponding to the candidate sound source azimuth according to the weighting factor of the candidate sound source azimuth.
- Formula (2) can be used to calculate the value of the cross-correlation function R corresponding to each candidate sound source azimuth.
- Step 305 Determine the target azimuth of the sound source according to the value of the cross-correlation function corresponding to each of the azimuth angles of the at least one candidate sound source.
- In some application scenarios, it may be required to determine the azimuth of a single sound source.
- In this case, the maximum value among the values of the cross-correlation functions corresponding to the at least one candidate sound source azimuth may be determined, and the candidate sound source azimuth corresponding to the maximum value is determined as the target azimuth.
- In other application scenarios, the scenario includes multiple sound sources, and it is required to determine the azimuths corresponding to the multiple sound sources respectively.
- In this case, a plurality of local extrema can be determined from the values of the cross-correlation functions corresponding to the at least one candidate sound source azimuth, and the candidate sound source azimuth corresponding to each of the local extrema can be determined as a target azimuth.
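- The two selection rules (global maximum for one source, local maxima for several) can be sketched as follows. The scores in the demo are made-up values shaped like the two-source case discussed for FIG. 4B, not measured data, and the function names are illustrative.

```python
def single_source_azimuth(scores):
    """scores: list of (azimuth_deg, R) pairs; return the azimuth with the largest R."""
    return max(scores, key=lambda pair: pair[1])[0]


def multi_source_azimuths(scores, n_sources):
    """Return the azimuths of the n_sources strongest local maxima.
    scores must be sorted by azimuth; a point at the edge of the scan is
    compared against its single neighbour only."""
    peaks = []
    for i, (azimuth, r) in enumerate(scores):
        left = scores[i - 1][1] if i > 0 else float("-inf")
        right = scores[i + 1][1] if i < len(scores) - 1 else float("-inf")
        if r > left and r > right:
            peaks.append((azimuth, r))
    peaks.sort(key=lambda pair: pair[1], reverse=True)
    return sorted(azimuth for azimuth, _ in peaks[:n_sources])


# Made-up cross-correlation values with peaks at -60 and 0 degrees.
demo_scores = [(-90, 0.10), (-60, 0.80), (-30, 0.20), (0, 1.00),
               (30, 0.30), (60, 0.10), (90, 0.05)]
one = single_source_azimuth(demo_scores)
two = multi_source_azimuths(demo_scores, 2)
```

Here `one` is the 0° candidate and `two` contains the -60° and 0° candidates.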
- this embodiment highlights the step of determining the weighting factor corresponding to each candidate azimuth angle by using the spatial enhancement signal and the spatial notch signal corresponding to each candidate azimuth angle.
- The above weighting factor is more robust to non-stationary interference, so the target azimuth angle determined using the above solution is also more robust to non-stationary interference.
- The accuracy of the determined target azimuth angle of the sound source can thereby be further improved.
- FIG. 4A is a schematic diagram of an audio signal energy distribution for sound source localization in the related art
- FIG. 4B shows an audio signal energy distribution diagram obtained according to the sound source localization method of the present disclosure.
- FIG. 4A shows a schematic diagram of the energy distribution of audio signals for two sound sources in the related art. FIG. 4A has a maximum at an azimuth angle of 0°, and the angle corresponding to this maximum (0°) can be used as the target azimuth angle of one sound source. However, no other extremum can be clearly identified from FIG. 4A.
- FIG. 4B is a schematic diagram of an audio signal energy distribution determined according to the sound source localization method shown in FIGS. 1 and 3. As shown in FIG. 4B, the energy of the audio signal clearly has two extrema, at 0° and -60°, so it can be determined that the target azimuths corresponding to the two sound sources are 0° and -60°, respectively.
- FIG. 5 shows a schematic structural diagram of the sound source localization method shown in FIG. 3 .
- a linear microphone array is formed by two microphones A and B, and the microphone array can collect the audio signal emitted by the sound source.
- the 180° audio signal pickup interval corresponding to the linear microphone array formed by the two microphones A and B can be divided into 18 sound pickup sub-intervals at equal intervals (every 10°).
- the endpoints corresponding to each of the 18 sound pickup subsections can be used as candidate sound source azimuths.
- 19 candidate sound source azimuths can be obtained.
- The signal arriving at microphone A from the sound source can be expressed as $x_1(m)$, and the signal arriving at microphone B from the sound source can be expressed as $x_2(m+\tau)$.
- The signals $x_1(m)$ and $x_2(m+\tau)$ respectively received by microphones A and B are converted to the frequency domain to obtain the frequency domain signals $X_1(\omega)$ and $X_2(\omega)\cdot e^{-j\omega\tau}$, where $\tau = d\cos(\theta)/c$, $c$ is the speed of sound, $d$ is the distance between the two microphones, $\theta$ is the azimuth angle of the candidate sound source, and $\tau$ is the time difference between the sound source signal reaching the two microphones.
- First, the frequency domain signals corresponding to microphones A and B are input to the beamforming module to obtain the spatially enhanced signal (a frequency domain signal) bf_ori. Then, the spatially enhanced signal is input to the noise estimation module to obtain the first estimated noise bf_noise. The signal-to-noise ratio SNR can then be calculated from the spatially enhanced signal bf_ori and the first estimated noise bf_noise according to formula (5).
- Next, the frequency domain signals corresponding to microphones A and B are input to the blocking matrix module to obtain the spatial notch signal (a frequency domain signal) null_ori.
- the above-mentioned spatial notch signal is input to the noise estimation module to obtain the second estimated noise null_noise.
- the difference between the spatial enhancement signal bf_ori and the first estimated noise bf_noise and the difference between the spatial notch signal null_ori and the second estimated noise null_noise can be determined, and then the signal-to-interference ratio SIR is calculated according to formula (6).
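Formulas (5) and (6) are not reproduced in this text, so the following is only a sketch of one plausible form: SNR as the ratio of enhanced-signal power to estimated noise power, and SIR as the ratio of the two noise-subtracted differences. Working on per-bin power values, clamping at zero, and the `EPS` guard are assumptions; the names `snr_sir`, `bf_pow`, and `null_pow` are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero (assumed)

def snr_sir(bf_pow, bf_noise_pow, null_pow, null_noise_pow):
    """Per-bin SNR and SIR from power spectra (assumed forms, not formulas (5)/(6))."""
    snr = bf_pow / (bf_noise_pow + EPS)
    # Noise-subtracted target and interference estimates, clamped at zero.
    target = np.maximum(bf_pow - bf_noise_pow, 0.0)
    interference = np.maximum(null_pow - null_noise_pow, 0.0)
    sir = target / (interference + EPS)
    return snr, sir

snr, sir = snr_sir(np.array([10.0]), np.array([1.0]),
                   np.array([2.0]), np.array([1.0]))
print(snr, sir)
```

The intuition is that the notch signal suppresses the candidate direction, so whatever remains after removing noise is attributed to interference from other directions.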
- the weighting factor A(θ) corresponding to the azimuth of the candidate sound source can be determined according to a function that is positively correlated with the signal-to-noise ratio and the signal-to-interference ratio of the azimuth of the candidate sound source.
- the value of the cross-correlation function corresponding to the sound source azimuth is determined by the weighting factor corresponding to the sound source azimuth.
- the target azimuth angle of the sound source is determined by the value of the cross-correlation function corresponding to each sound source azimuth angle.
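The weighting and selection steps can be sketched as a weighted, phase-normalized cross-correlation evaluated at each candidate delay. The disclosure only states that the weighting factor is positively correlated with SNR and SIR; the specific form in `weighting_factor`, the PHAT-style normalization, and the far-field delay model used in the demo are all assumptions, and every name here is hypothetical.

```python
import numpy as np

def weighting_factor(snr, sir):
    """One possible weight that increases with both SNR and SIR (assumed form)."""
    return (snr / (1.0 + snr)) * (sir / (1.0 + sir))

def weighted_correlation(X1, X2, omega, taus, weights):
    """Cross-correlation value per candidate delay; weights has shape (n_cand, n_bins)."""
    cross = X1 * np.conj(X2)
    cross = cross / (np.abs(cross) + 1e-12)          # PHAT-style normalization (assumed)
    steering = np.exp(-1j * np.outer(taus, omega))   # shape (n_cand, n_bins)
    return np.sum(weights * np.real(cross[None, :] * steering), axis=1)

# Demo: one simulated source at 60 degrees, uniform weights.
rng = np.random.default_rng(0)
angles = np.arange(0, 181, 10)
d, c = 0.1, 343.0
omega = 2 * np.pi * np.linspace(100.0, 4000.0, 64)
taus = d * np.cos(np.deg2rad(angles)) / c
X1 = rng.standard_normal(64) + 1j * rng.standard_normal(64)
X2 = X1 * np.exp(-1j * omega * (d * np.cos(np.deg2rad(60.0)) / c))
values = weighted_correlation(X1, X2, omega, taus, np.ones((angles.size, omega.size)))
target = angles[np.argmax(values)]
print(target)
```

In the demo the weights are uniform; in practice each candidate would use its own A(θ), which reduces the contribution of candidates dominated by noise or interference.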
- the present disclosure provides an embodiment of a sound source localization apparatus, and the apparatus embodiment corresponds to the method embodiment shown in FIG. 1 .
- the sound source localization apparatus of this embodiment includes: a first determination unit 601 , an acquisition unit 602 , and a second determination unit 603 .
- the first determining unit 601 is used for determining at least one candidate sound source azimuth in the sound pickup interval corresponding to the audio information collection sensor; the acquiring unit 602 is used for, for at least one candidate sound source azimuth, obtaining the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth, and determining the weighting factor corresponding to the candidate sound source azimuth according to the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth; the second determining unit 603 is used for determining the target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth.
- for the specific processing of the first determination unit 601, the acquisition unit 602 and the second determination unit 603 of the sound source localization apparatus, and the technical effects they bring, refer to the descriptions of steps 101, 102 and 103 in the embodiment corresponding to FIG. 1, respectively; these are not repeated here.
- the second determining unit 603 is further configured to: for at least one candidate sound source azimuth, generate the value of the cross-correlation function corresponding to the candidate sound source azimuth according to the weighting factor of the candidate sound source azimuth; and determine the target azimuth of the sound source according to the value of the cross-correlation function corresponding to each of the at least one candidate sound source azimuth.
- the first determining unit 601 is further configured to: divide the sound pickup interval into at least one sound pickup sub-interval; and determine the at least one candidate sound source azimuth within the at least one sound pickup sub-interval according to a preset rule.
- the audio information collection sensor includes a linear microphone array composed of two microphones; and the first determining unit 601 is further configured to: divide the 180° sound pickup interval corresponding to the linear microphone array into a plurality of sound pickup sub-intervals at equal intervals.
- the obtaining unit 602 is further configured to: obtain the spatially enhanced signal of the audio signal and the spatial notch signal of the audio signal at the candidate sound source azimuth; and determine the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth according to the spatially enhanced signal and the spatial notch signal.
- the obtaining unit 602 is further configured to: determine the signal obtained by delaying and summing the audio signals received by the two microphones as the spatially enhanced signal; and determine the signal obtained by delaying and subtracting the audio signals received by the two microphones as the spatial notch signal.
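A minimal frequency-domain sketch of the delay-and-sum and delay-and-subtract operations follows. It assumes that the signal at the second microphone equals the first multiplied by e^(-jωτ) when the source lies at the candidate azimuth; the 0.5 scaling and the function name `enhance_and_notch` are illustrative assumptions.

```python
import numpy as np

def enhance_and_notch(X1, X2, omega, tau):
    """Align X2 to X1 for candidate delay tau, then sum and subtract."""
    aligned = X2 * np.exp(1j * omega * tau)   # undo the assumed inter-microphone delay
    bf_ori = 0.5 * (X1 + aligned)             # spatially enhanced signal (delay-and-sum)
    null_ori = 0.5 * (X1 - aligned)           # spatial notch signal (delay-and-subtract)
    return bf_ori, null_ori

# Demo: a source exactly at the candidate delay is kept by the sum and cancelled by the difference.
omega = 2 * np.pi * np.linspace(100.0, 4000.0, 32)
tau = 2e-4
X1 = np.exp(1j * np.linspace(0.0, 3.0, 32))
X2 = X1 * np.exp(-1j * omega * tau)
bf_ori, null_ori = enhance_and_notch(X1, X2, omega, tau)
print(np.max(np.abs(null_ori)))
```

Because the notch output cancels the candidate direction, its residual energy gives a handle on interference arriving from other directions.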
- the obtaining unit 602 is further configured to: input the spatially enhanced signal into a preset noise estimation module to obtain a first estimated noise, and determine the signal-to-noise ratio using the spatially enhanced signal and the first estimated noise; and input the spatial notch signal into the preset noise estimation module to obtain a second estimated noise, and determine the signal-to-interference ratio using the difference between the spatially enhanced signal and the first estimated noise and the difference between the spatial notch signal and the second estimated noise.
- the obtaining unit 602 is further configured to: determine the weighting factor according to a function that is positively correlated with the signal-to-noise ratio and the signal-to-interference ratio.
- the second determining unit 603 is further configured to: determine the maximum value among the values of the cross-correlation functions corresponding to the at least one candidate sound source azimuth, and determine the candidate sound source azimuth corresponding to the maximum value as the target azimuth of the sound source.
- the second determining unit 603 is further configured to: generate a distribution map of the values of the cross-correlation function according to the values of the cross-correlation function corresponding to the at least one candidate sound source azimuth; and determine at least two local extrema in the distribution map, and determine the candidate sound source azimuths corresponding to the at least two local extrema as the target azimuths corresponding to at least two sound sources, respectively.
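For the multi-source case, picking local extrema from the distribution of cross-correlation values can be sketched as below. This is an illustrative assumption about how the extrema might be found: the function `local_maxima` is hypothetical, treats only interior points as candidates (so 0° and 180° are never returned), and ranks peaks by height.

```python
import numpy as np

def local_maxima(values, count=2):
    """Indices of the `count` highest interior local maxima, highest first."""
    values = np.asarray(values, dtype=float)
    peaks = [i for i in range(1, len(values) - 1)
             if values[i] > values[i - 1] and values[i] >= values[i + 1]]
    peaks.sort(key=lambda i: values[i], reverse=True)
    return peaks[:count]

angles = np.arange(0, 90, 10)
values = [0, 1, 3, 1, 0, 2, 5, 2, 0]
peak_idx = local_maxima(values)
print([int(angles[i]) for i in peak_idx])
```

A practical implementation would likely also apply a minimum peak height or separation so that small ripples are not reported as separate sources.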
- FIG. 7 shows an exemplary system architecture in which a sound source localization method or a sound source localization apparatus according to an embodiment of the present disclosure can be applied.
- the system architecture may include an audio information collection sensor, a terminal device 703 , a network 704 , and a server 705 .
- the audio information collection sensor may include microphones 701 and 702 .
- the audio information collection sensor may be connected to the terminal device 703 through wired communication.
- the above-mentioned audio information collection sensor may be set in the terminal device.
- the network 704 is the medium used to provide the communication link between the terminal device 703 and the server 705 .
- Network 704 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
- the audio information collection sensor may send the collected audio signal to the terminal device 703 through wired communication.
- the terminal device 703 can interact with the server 705 through the network 704 to receive or send messages and the like.
- client applications may be installed on the terminal device 703 , such as web browser applications, search applications, news information applications, and audio signal processing applications.
- the client application in the terminal device 703 can receive the user's instruction, and perform corresponding functions according to the user's instruction, for example, analyze and process the audio signal according to the user's instruction.
- the terminal device 703 may be hardware or software.
- when the terminal device 703 is hardware, it can be any of various electronic devices that have a display screen and support web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, etc.
- when the terminal device 703 is software, it can be installed in the electronic devices listed above. It can be implemented as multiple pieces of software or software modules (eg, software or software modules for providing distributed services), or as a single piece of software or software module. There is no specific limitation here.
- the server 705 may be a server that provides various services, such as receiving the audio signal sent by the terminal device 703, performing analysis and processing according to the audio signal, and sending the processing result (eg, the target azimuth of the sound source) to the terminal device.
- the sound source localization method provided by the embodiment of the present disclosure may be executed by a terminal device, and correspondingly, the sound source localization apparatus may be set in the terminal device 703 .
- the sound source localization method provided by the embodiment of the present disclosure may also be executed by the server 705 , and accordingly, the sound source localization apparatus may be provided in the server 705 .
- terminal devices, networks and servers in FIG. 7 are only illustrative. There can be any number of terminal devices, networks and servers according to implementation needs.
- referring to FIG. 8, it shows a schematic structural diagram of an electronic device (eg, a terminal device or a server in FIG. 7) suitable for implementing an embodiment of the present disclosure.
- Terminal devices in the embodiments of the present disclosure may include, but are not limited to, such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablets), PMPs (portable multimedia players), vehicle-mounted terminals (eg, mobile terminals such as in-vehicle navigation terminals), etc., and stationary terminals such as digital TVs, desktop computers, and the like.
- the electronic device shown in FIG. 8 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
- the electronic device may include a processing device (eg, a central processing unit, a graphics processor, etc.) 801, which may execute various appropriate operations and processes according to a program stored in a read only memory (ROM) 802 or a program loaded from a storage device 808 into a random access memory (RAM) 803.
- in the RAM 803, various programs and data required for the operation of the electronic device 800 are also stored.
- the processing device 801, the ROM 802, and the RAM 803 are connected to each other through a bus 804.
- An input/output (I/O) interface 805 is also connected to bus 804 .
- the following devices may be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 807 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 808 including, for example, a magnetic tape, hard disk, etc.; and a communication device 809. The communication device 809 may allow the electronic device to communicate wirelessly or by wire with other devices to exchange data. While FIG. 8 illustrates an electronic device having various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
- embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
- the computer program may be downloaded and installed from the network via the communication device 809, or from the storage device 808, or from the ROM 802.
- when the computer program is executed by the processing device 801, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.
- the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
- the computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above. More specific examples of computer readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable Programmable read only memory (EPROM or flash memory), fiber optics, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
- a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
- a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with computer-readable program code embodied thereon. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
- a computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device .
- Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, electrical wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.
- clients and servers can communicate using any currently known or future developed network protocol such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication in any form or medium (eg, a communication network).
- examples of communication networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (eg, the Internet), and peer-to-peer networks (eg, ad hoc peer-to-peer networks), as well as any currently known or future developed network.
- the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or may exist alone without being assembled into the electronic device.
- the above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: determine at least one candidate sound source azimuth in the sound pickup interval corresponding to the audio information collection sensor; for at least one candidate sound source azimuth, obtain the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth, and determine the weighting factor corresponding to the candidate sound source azimuth according to the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth; and determine the target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth.
- Computer program code for performing operations of the present disclosure may be written in one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
- the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
- the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (eg, through the Internet using an Internet service provider).
- each block in the flowchart or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
- the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
- each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations can be implemented in dedicated hardware-based systems that perform the specified functions or operations , or can be implemented in a combination of dedicated hardware and computer instructions.
- the units involved in the embodiments of the present disclosure may be implemented in a software manner, and may also be implemented in a hardware manner. Among them, the name of the unit does not constitute a limitation of the unit itself under certain circumstances.
- exemplary types of hardware logic components include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logical Devices (CPLDs) and more.
- a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with the instruction execution system, apparatus or device.
- the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
- Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
- machine-readable storage media would include one or more wire-based electrical connections, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), fiber optics, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
- the sound source localization method includes: determining at least one candidate sound source azimuth in a sound pickup interval corresponding to an audio information collection sensor; for the at least one candidate sound source azimuth, obtaining the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth, and determining the weighting factor corresponding to the candidate sound source azimuth according to the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth; and determining the target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth.
- the determining the target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth includes: for at least one candidate sound source azimuth, generating the value of the cross-correlation function corresponding to the candidate sound source azimuth according to the weighting factor of the candidate sound source azimuth; and determining the target azimuth of the sound source according to the value of the cross-correlation function corresponding to each of the at least one candidate sound source azimuth.
- the determining at least one candidate sound source azimuth in a sound pickup interval corresponding to the audio information collection sensor includes: dividing the sound pickup interval into at least one sound pickup sub-interval; and determining the at least one candidate sound source azimuth within the at least one sound pickup sub-interval according to a preset rule.
- the audio information collection sensor includes a linear microphone array composed of two microphones; and the dividing the sound pickup interval into at least one sound pickup sub-interval includes: dividing the 180° sound pickup interval corresponding to the linear microphone array into a plurality of sound pickup sub-intervals at equal intervals.
- the determining the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth includes: acquiring a spatially enhanced signal of the audio signal and a spatial notch signal of the audio signal at the candidate sound source azimuth; and determining the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth according to the spatially enhanced signal and the spatial notch signal.
- the acquiring the spatially enhanced signal of the audio signal and the spatial notch signal of the audio signal at the candidate sound source azimuth includes: determining the signal obtained by delaying and summing the audio signals received by the two microphones as the spatially enhanced signal; and determining the signal obtained by delaying and subtracting the audio signals received by the two microphones as the spatial notch signal.
- the determining, according to the spatially enhanced signal and the spatial notch signal, the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth includes: inputting the spatially enhanced signal into a preset noise estimation module to obtain a first estimated noise, and determining the signal-to-noise ratio using the spatially enhanced signal and the first estimated noise; and inputting the spatial notch signal into the preset noise estimation module to obtain a second estimated noise, and determining the signal-to-interference ratio using the difference between the spatially enhanced signal and the first estimated noise and the difference between the spatial notch signal and the second estimated noise.
- the determining the weighting factor corresponding to the candidate sound source azimuth according to the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth includes: determining the weighting factor according to a function that is positively correlated with the signal-to-noise ratio and the signal-to-interference ratio.
- the determining the target azimuth of the sound source according to the value of the cross-correlation function corresponding to each of the at least one candidate sound source azimuth includes: determining the maximum value among the values of the cross-correlation functions corresponding to the at least one candidate sound source azimuth, and determining the candidate sound source azimuth corresponding to the maximum value as the target azimuth of the sound source.
- the determining the target azimuth of the sound source according to the value of the cross-correlation function corresponding to each of the at least one candidate sound source azimuth includes: generating a distribution diagram of the values of the cross-correlation function according to the values of the cross-correlation function corresponding to the at least one candidate sound source azimuth; and determining at least two local extrema in the distribution diagram, and determining the candidate sound source azimuths corresponding to the at least two local extrema as the target azimuths corresponding to at least two sound sources, respectively.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Signal Processing (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Circuit For Audible Band Transducer (AREA)
- Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
- Obtaining Desirable Characteristics In Audible-Bandwidth Transducers (AREA)
Abstract
A sound source positioning method and apparatus, and an electronic device. The method comprises: determining at least one candidate sound source azimuth in a pickup interval corresponding to an audio information acquisition sensor (101); for the at least one candidate sound source azimuth, obtaining a signal-to-noise ratio and a signal-to-interference ratio of the candidate sound source azimuth, and determining a weighting factor corresponding to the candidate sound source azimuth according to the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth (102); and determining a target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth (103). The accuracy of positioning a sound source in a multi-sound-source scene can be improved.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to the Chinese patent application filed on December 23, 2020 with application number 202011555230.6 and the invention title "Sound Source Localization Method, Apparatus and Electronic Device", the full text of which is incorporated herein by reference.
The present disclosure relates to the technical field of information processing, and in particular, to a sound source localization method, apparatus and electronic device.
Sound source localization refers to the technique of estimating where a sound originates from an audio signal. It includes locating the direction and position of speech or other sounds. Sound source localization has a wide range of applications. For example, a security robot can, according to the sound source direction determined by sound source localization, adjust its camera to capture an image in that direction.
SUMMARY OF THE INVENTION
This summary is provided to introduce, in a simplified form, concepts that are described in detail in the detailed description section that follows. This summary is not intended to identify key features or essential features of the claimed technical solution, nor is it intended to be used to limit the scope of the claimed technical solution.
Embodiments of the present disclosure provide a sound source localization method, apparatus, and electronic device.
In a first aspect, an embodiment of the present disclosure provides a sound source localization method, the method including: determining at least one candidate sound source azimuth in a sound pickup interval corresponding to an audio information collection sensor; for the at least one candidate sound source azimuth, obtaining the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth, and determining the weighting factor corresponding to the candidate sound source azimuth according to the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth; and determining the target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth.
In a second aspect, an embodiment of the present disclosure provides a sound source localization apparatus, the apparatus including: a first determining unit, configured to determine at least one candidate sound source azimuth in a sound pickup interval corresponding to an audio information collection sensor; an obtaining unit, configured to, for at least one candidate sound source azimuth, obtain the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth, and determine the weighting factor corresponding to the candidate sound source azimuth according to the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth; and a second determining unit, configured to determine the target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a storage device for storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the sound source localization method according to the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored, where the program, when executed by a processor, implements the steps of the sound source localization method according to the first aspect.
The above and other features, advantages and aspects of the embodiments of the present disclosure will become more apparent when taken in conjunction with the accompanying drawings and with reference to the following detailed description. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.
FIG. 1 is a flowchart of one embodiment of a sound source localization method according to the present disclosure;
FIG. 2 is a schematic diagram of determining the azimuth angle of a sound source signal using two microphones;
FIG. 3 is a flowchart of yet another embodiment of a sound source localization method according to the present disclosure;
FIG. 4A is a schematic sound source localization result diagram in the related art;
FIG. 4B shows a schematic sound source localization result diagram obtained according to the sound source localization method of the present disclosure;
FIG. 5 is a schematic structural diagram of the principle of the sound source localization method shown in FIG. 3;
FIG. 6 is a schematic structural diagram of an embodiment of a sound source localization apparatus according to the present disclosure;
FIG. 7 is an exemplary system architecture to which a sound source localization method or sound source localization apparatus according to an embodiment of the present disclosure can be applied;
FIG. 8 is a schematic diagram of a basic structure of an electronic device provided according to an embodiment of the present disclosure.
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration only and are not intended to limit the scope of protection of the present disclosure.
It should be understood that the steps described in the method embodiments of the present disclosure may be performed in a different order and/or in parallel. In addition, method embodiments may include additional steps and/or omit some of the illustrated steps. The scope of the present disclosure is not limited in this regard.
As used herein, the term "including" and its variants are open-ended, that is, "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms are given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are used only to distinguish different apparatuses, modules, or units, and are not used to limit the order of, or the interdependence between, the functions performed by these apparatuses, modules, or units.
It should be noted that the modifiers "a" and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand them as "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.
Please refer to FIG. 1, which shows the flow of one embodiment of the sound source localization method according to the present disclosure. As shown in FIG. 1, the sound source localization method includes the following steps:
Step 101: determine at least one candidate sound source azimuth within the sound pickup interval corresponding to the audio information collection sensor.
Various audio information collection sensors may be used to collect sound signals. The audio information collection sensor may include a microphone array (also called a sound transducer array or an acoustic array).
Microphone arrays include linear microphone arrays and non-linear microphone arrays.
After a microphone collects an audio signal, the collected analog audio signal may be converted into an electrical signal, which is then sampled to obtain a digitized audio signal.
In this embodiment, the distance between the sound source and the audio information collection sensor is assumed to be much larger than the size of the sensor. In this scenario, the audio signal emitted by the sound source can be treated as a plane wave.
A microphone array usually consists of multiple microphones arranged according to a certain rule. The microphones collect sound signals synchronously, and the phase differences between the signals at the different microphones can be used to determine the position of the sound source. The position of the sound source may be, for example, the azimuth from which the sound arrives.
Different microphone arrays may correspond to different sound pickup intervals.
The sound pickup interval in this embodiment refers to the spatial range within which the microphones can localize a sound source once symmetry is taken into account; it is generally a planar interval or a spatial interval.
The sound pickup interval of a linear microphone array may be 180° in two dimensions, because its localization result is rotationally symmetric about the line connecting the microphones. The sound pickup interval of a planar circular microphone array may be 360° in two dimensions, because its localization result is mirror symmetric about the microphone plane. The sound pickup interval of a three-dimensional microphone array may be a full 360° in three dimensions.
Take a linear microphone array consisting of two microphones as an example. Establish a coordinate system with the line parallel to the line connecting the two microphones as the x-axis and the line perpendicular to the x-axis as the y-axis, place the two microphones on the x-axis, and place the midpoint of the line connecting them at the intersection O of the x-axis and the y-axis. The sound pickup interval of this two-microphone linear array is then the interval from the sound source angle of 0° to the sound source angle of 180°, both measured from the positive x-axis.
Alternatively, establish a coordinate system with the line parallel to the line connecting the two microphones as the y-axis and the line perpendicular to the y-axis as the x-axis, place the two microphones on the y-axis, and place the midpoint of the line connecting them at the intersection O of the x-axis and the y-axis. The sound pickup interval of this two-microphone linear array is then the interval from the sound source angle of -90° to the sound source angle of 90°, both measured from the positive x-axis.
At least one candidate sound source azimuth may be determined within the above sound pickup interval.
In some optional implementations, determining at least one candidate sound source azimuth within the sound pickup interval corresponding to the audio information collection sensor includes:
First, dividing the sound pickup interval into at least one sound pickup sub-interval.
Second, determining at least one candidate sound source azimuth within the at least one sound pickup sub-interval according to a preset rule.
In some application scenarios, the entire sound pickup interval may be treated as a single sound pickup sub-interval, and at least one candidate sound source azimuth is then determined within it according to a preset rule. For example, the two endpoints of the sub-interval may be used as candidate sound source azimuths.
In other application scenarios, the sound pickup interval may be divided according to a preset division rule. In one implementation, the number of sub-intervals into which the sound pickup interval is to be divided is determined first, and the interval is then divided into that number of equal sub-intervals. In another implementation, after that number has been determined, the sound pickup interval is divided into unequal sub-intervals.
As an illustrative example, the audio information collection sensor includes a linear microphone array composed of two microphones. Dividing the sound pickup interval into at least one sound pickup sub-interval then includes: dividing the 180° sound pickup interval of the linear microphone array into a plurality of equal sound pickup sub-intervals.
Taking the sound pickup interval of the linear microphone array to be the 0° to 180° range in front of the microphones, the 180° interval can be divided into 18 equal sound pickup sub-intervals: 0°~10°, 10°~20°, 20°~30°, 30°~40°, 40°~50°, 50°~60°, 60°~70°, 70°~80°, 80°~90°, 90°~100°, 100°~110°, 110°~120°, 120°~130°, 130°~140°, 140°~150°, 150°~160°, 160°~170°, 170°~180°.
After the sound pickup interval has been divided into sound pickup sub-intervals, at least one candidate sound source azimuth may be determined within at least one sound pickup sub-interval.
In one implementation, the azimuths of the two endpoints of each sound pickup sub-interval are used as candidate sound source azimuths. After the candidate azimuths of all sub-intervals have been determined, duplicate candidates are removed to obtain the candidate sound source azimuths of the sound pickup interval. For the sub-intervals above, the candidate sound source azimuths would be: 0°, 10°, 20°, 30°, 40°, 50°, 60°, 70°, 80°, 90°, 100°, 110°, 120°, 130°, 140°, 150°, 160°, 170°, 180°.
In another implementation, one azimuth inside each sound pickup sub-interval (excluding its two endpoints), for example the azimuth at the middle of the sub-interval, is used as the candidate sound source azimuth of that sub-interval.
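The two ways of choosing candidate azimuths can be sketched in a few lines of Python. This is only an illustration of the equal-interval case described above; the function names `split_interval`, `endpoint_candidates`, and `midpoint_candidates` are hypothetical and do not come from the disclosure.

```python
def split_interval(start_deg, end_deg, count):
    """Split [start_deg, end_deg] into `count` equal sub-intervals."""
    step = (end_deg - start_deg) / count
    return [(start_deg + i * step, start_deg + (i + 1) * step)
            for i in range(count)]


def endpoint_candidates(subintervals):
    """Use both endpoints of every sub-interval, then drop duplicates."""
    candidates = []
    for low, high in subintervals:
        for angle in (low, high):
            if angle not in candidates:
                candidates.append(angle)
    return candidates


def midpoint_candidates(subintervals):
    """Use the middle of every sub-interval instead of its endpoints."""
    return [(low + high) / 2 for low, high in subintervals]


# 180 degree pickup interval of a two-microphone linear array, 18 sub-intervals.
subintervals = split_interval(0.0, 180.0, 18)
endpoints = endpoint_candidates(subintervals)   # 0, 10, ..., 180
midpoints = midpoint_candidates(subintervals)   # 5, 15, ..., 175
```

With endpoints, adjacent sub-intervals share a boundary, which is why de-duplication leaves 19 candidates for 18 sub-intervals; with midpoints there is exactly one candidate per sub-interval.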
Step 102: for the at least one candidate sound source azimuth, determine the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth, and determine the weighting factor corresponding to the candidate sound source azimuth according to that signal-to-noise ratio and signal-to-interference ratio.
In this embodiment, the signal-to-noise ratio of a candidate sound source azimuth may be determined by any of various methods for determining a signal-to-noise ratio.
For example, the noise power may first be measured with a noise measurement method, the power of the audio signal corresponding to the candidate sound source azimuth is then determined, and the signal-to-noise ratio of the candidate sound source azimuth is determined from the ratio of the audio signal power to the noise power.
Likewise, the signal-to-interference ratio of a candidate sound source azimuth may be determined by any of various methods for determining a signal-to-interference ratio. For example, the power of the interference signal may be extracted with an interference signal measurement method, the power of the audio signal corresponding to the candidate sound source azimuth is then determined, and the signal-to-interference ratio of the candidate sound source azimuth is determined from the ratio of the audio signal power to the interference signal power.
The weighting factor corresponding to a candidate azimuth may be determined by any function that is positively correlated with the signal-to-noise ratio and the signal-to-interference ratio.
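As a minimal sketch of the power-ratio definitions above, the following Python computes a signal-to-noise ratio and a signal-to-interference ratio from sample sequences and combines them into a weight. The disclosure leaves the weighting function open; the product form used in `weighting_factor` is only one assumed example of a function that increases with both ratios, and all names and sample values here are hypothetical.

```python
def mean_power(samples):
    """Mean power of a real-valued sample sequence."""
    return sum(s * s for s in samples) / len(samples)


def power_ratio(signal_power, reference_power, floor=1e-12):
    """Linear power ratio; the floor avoids division by zero."""
    return signal_power / max(reference_power, floor)


def weighting_factor(snr, sir):
    """Assumed example weight: grows with both SNR and SIR, stays in [0, 1)."""
    return (snr / (1.0 + snr)) * (sir / (1.0 + sir))


# Toy signals standing in for measured audio, noise, and interference.
signal = [0.9, -1.1, 1.0, -1.0]
noise = [0.1, -0.1, 0.1, -0.1]
interference = [0.5, -0.5, 0.5, -0.5]

snr = power_ratio(mean_power(signal), mean_power(noise))
sir = power_ratio(mean_power(signal), mean_power(interference))
weight = weighting_factor(snr, sir)
```

Any other monotonically increasing combination (for example a sum of logarithms) would satisfy the same "positively correlated" requirement.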
Step 103: determine the target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth.
Various analyses may be performed using the weighting factor of the at least one candidate sound source azimuth to determine the target azimuth of the sound source.
Specifically, step 103 may include the following sub-steps:
Sub-step 1031: for the at least one candidate sound source azimuth, generate the value of the cross-correlation function corresponding to the candidate sound source azimuth according to the weighting factor of that azimuth.
Sub-step 1032: determine the target azimuth of the sound source according to the cross-correlation function values corresponding to the at least one candidate sound source azimuth.
A linear array of two microphones is used as an example below. In other arrays composed of multiple microphones, the most basic unit is still a linear array of two microphones, so such arrays can be analyzed using the two-microphone linear array as the basic array unit when determining the sound source azimuth; the details are not repeated here.
FIG. 2 is a schematic diagram of two microphones A and B each receiving an audio signal. Suppose the first audio signal, received by A, is $x_1(m)$, and the second audio signal, received by B, is $x_2(m+\tau)$. By computing the cross-correlation function of the first and second audio signals, the delay that maximizes the cross-correlation function is taken as the time difference τ between the two signals. The sound source azimuth θ is then determined using formula (1):
$$\tau = \frac{d\cos(\theta)}{c} \tag{1}$$
where $d\cos(\theta)$ is the path difference between the first audio signal and the second audio signal, d is the distance between the two microphones, and c is the speed of sound.
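Formula (1) can be checked numerically with a short Python sketch. It assumes that c is the propagation speed of sound in air, taken here as 343 m/s; that value, the microphone spacings, and the function names are illustrative assumptions rather than values from the disclosure.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed propagation speed in air


def delay_from_azimuth(theta_deg, mic_distance_m, c=SPEED_OF_SOUND):
    """Formula (1): tau = d * cos(theta) / c, in seconds."""
    return mic_distance_m * math.cos(math.radians(theta_deg)) / c


def azimuth_from_delay(tau_s, mic_distance_m, c=SPEED_OF_SOUND):
    """Invert formula (1); the cosine is clamped to [-1, 1] for safety."""
    cos_theta = max(-1.0, min(1.0, tau_s * c / mic_distance_m))
    return math.degrees(math.acos(cos_theta))


# A source on the array axis (0 degrees) gives the largest delay, d / c.
tau_endfire = delay_from_azimuth(0.0, 0.0343)
# A source broadside to the array (90 degrees) reaches both microphones together.
tau_broadside = delay_from_azimuth(90.0, 0.1)
# Round trip: azimuth -> delay -> azimuth.
theta_back = azimuth_from_delay(delay_from_azimuth(60.0, 0.1), 0.1)
```

Because the delay depends only on cos(θ), a two-microphone array cannot distinguish θ from -θ, which matches the 180° sound pickup interval of a linear array.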
The cross-correlation function can be expressed by formula (2):
$$R(\tau) = \int A(\omega)\,P(\omega)\,e^{j\omega\tau}\,d\omega \tag{2}$$
where ω is the frequency, τ is the time delay between the signals received by the two microphones, P(ω) is the cross-power spectrum of the two microphone signals, and A(ω) is the weighting factor.
R(τ) is computed for different time delays according to formula (2); the τ at which R(τ) is largest is the time delay of the sound source, and the corresponding sound source direction can be computed from the microphone spacing.
Specifically, formula (1) can be substituted into formula (2), with θ set in turn to each candidate sound source azimuth. The target azimuth of the sound source is then determined from the cross-correlation function values computed for the candidate azimuths. For example, the candidate azimuth whose cross-correlation function value is the largest may be determined as the target azimuth of the sound source.
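The candidate scan can be illustrated with a discrete version of the weighted cross-correlation, in which the integral over frequency becomes a sum over frequency bins. The sketch below is written under several assumptions that are not in the disclosure: c is 343 m/s, the cross-power spectrum is formed as X1(ω) times the conjugate of X2(ω) with microphone B leading microphone A by τ, the weights are all 1, and the test spectrum is synthetic.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def weighted_cross_correlation(tau, omegas, cross_spectrum, weights):
    """Discrete form of R(tau): sum over bins of A(w) * P(w) * e^{j w tau}."""
    total = sum(a * p * cmath.exp(1j * w * tau)
                for w, p, a in zip(omegas, cross_spectrum, weights))
    return total.real


def pick_azimuth(candidates_deg, mic_distance, omegas, cross_spectrum, weights):
    """Return the candidate azimuth with the largest cross-correlation value."""
    best_angle, best_value = None, -math.inf
    for theta in candidates_deg:
        tau = mic_distance * math.cos(math.radians(theta)) / SPEED_OF_SOUND
        value = weighted_cross_correlation(tau, omegas, cross_spectrum, weights)
        if value > best_value:
            best_angle, best_value = theta, value
    return best_angle


# Synthetic test: a single source at 60 degrees, microphones 10 cm apart.
mic_distance = 0.1
true_tau = mic_distance * math.cos(math.radians(60)) / SPEED_OF_SOUND
omegas = [2 * math.pi * f for f in range(200, 4001, 100)]
# With X2 = X1 * e^{j w tau0}, P = X1 * conj(X2) = |X1|^2 * e^{-j w tau0}.
cross_spectrum = [cmath.exp(-1j * w * true_tau) for w in omegas]
weights = [1.0] * len(omegas)

candidates = list(range(0, 181, 10))
estimated = pick_azimuth(candidates, mic_distance, omegas, cross_spectrum, weights)
```

In this synthetic case every bin adds in phase only at the true delay, so the scan selects the 60° candidate. Replacing the unit weights with SNR- and SIR-dependent weights changes how much each bin contributes without changing the structure of the scan.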
In the embodiments of the present disclosure, at least one candidate sound source azimuth is determined within the sound pickup interval corresponding to the audio information collection sensor; for the at least one candidate sound source azimuth, its signal-to-noise ratio and signal-to-interference ratio are obtained and used to determine the corresponding weighting factor; and the target azimuth of the sound source is determined according to the weighting factor of the at least one candidate sound source azimuth. Because the weighting factor of each candidate azimuth is determined from that azimuth's own signal-to-noise ratio and signal-to-interference ratio, and the target azimuth is determined from those weighting factors, the accuracy of sound source localization in multi-source scenarios can be improved.
Please continue to refer to FIG. 3, which shows a flowchart of another embodiment of the sound source localization method according to the present disclosure. As shown in FIG. 3, the sound source localization method includes the following steps:
Step 301: determine at least one candidate sound source azimuth within the sound pickup interval corresponding to the audio information collection sensor.
In this embodiment, for the specific implementation of step 301, refer to the description of step 101 in the embodiment shown in FIG. 1; it is not repeated here.
Step 302: for the at least one candidate sound source azimuth, obtain the spatial enhancement signal and the spatial notch signal of the audio signal at that candidate sound source azimuth.
In this embodiment, a linear array of two microphones is again used as an example.
After at least one candidate sound source azimuth has been determined in step 301, the spatial enhancement signal and the spatial notch signal of each such candidate sound source azimuth can be obtained.
Still taking FIG. 2 as an example: for any candidate sound source azimuth, the first audio signal and the second audio signal corresponding to that azimuth may be input to a preset beamforming module to obtain the spatial enhancement signal corresponding to the azimuth. The first audio signal and the second audio signal may be the audio signals received by the two microphones respectively.
In practice, the signal obtained by delaying and summing the audio signals received by the two microphones may be taken as the spatial enhancement signal.
The spatial enhancement signal bf_ori can be expressed by formula (3):
$$\mathrm{bf\_ori} = X_1(\omega) + X_2(\omega)\,e^{-j\omega\tau} \tag{3}$$
where $X_1(\omega)$ is the frequency-domain signal obtained by transforming $x_1(t)$ from the time domain to the frequency domain, $X_2(\omega)$ is the frequency-domain signal obtained by transforming $x_2(t)$ from the time domain to the frequency domain, $\tau = d\cos(\theta)/c$ as in formula (1), c is the speed of sound, d is the distance between the two microphones, θ is the candidate sound source azimuth, and τ is the time difference with which the sound source signal reaches the two microphones.
The first audio signal and the second audio signal corresponding to the candidate sound source azimuth may be input to a preset blocking matrix to obtain the spatial notch signal corresponding to that azimuth. The first audio signal and the second audio signal may be the audio signals received by the two microphones respectively.
In practice, the signal obtained by delaying the audio signals of the two microphones and taking their difference may be taken as the spatial notch signal. The spatial notch signal null_ori can be expressed by formula (4):
$$\mathrm{null\_ori} = X_1(\omega) - X_2(\omega)\,e^{-j\omega\tau} \tag{4}$$
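The enhancement and notch signals are the sum and the difference of the same two terms: the first microphone's frequency bin and the second microphone's bin after a phase shift of e^{-jωτ}. A minimal per-bin Python sketch follows; the test values (a 1 kHz bin, a unit-amplitude source, and the two delays) are assumptions chosen only to make the behaviour visible.

```python
import cmath
import math


def spatial_enhancement(x1_bin, x2_bin, omega, tau):
    """bf_ori = X1 + X2 * e^{-j w tau}: delay-and-sum toward the candidate."""
    return x1_bin + x2_bin * cmath.exp(-1j * omega * tau)


def spatial_notch(x1_bin, x2_bin, omega, tau):
    """null_ori = X1 - X2 * e^{-j w tau}: places a null at the candidate."""
    return x1_bin - x2_bin * cmath.exp(-1j * omega * tau)


omega = 2 * math.pi * 1000.0          # one frequency bin at 1 kHz
source_tau = 1.0e-4                   # assumed true inter-microphone delay
x1 = 1.0 + 0.0j                       # bin observed at microphone A
x2 = x1 * cmath.exp(1j * omega * source_tau)  # same bin at microphone B

# Steering at the true delay: the two terms add coherently and cancel exactly.
bf_matched = spatial_enhancement(x1, x2, omega, source_tau)
null_matched = spatial_notch(x1, x2, omega, source_tau)

# Steering at a wrong delay: the sum is weaker and the notch leaks signal.
bf_mismatched = spatial_enhancement(x1, x2, omega, -source_tau)
null_mismatched = spatial_notch(x1, x2, omega, -source_tau)
```

When the candidate azimuth matches the source, the enhancement output carries the source and the notch output carries mostly everything else, which is what makes the pair useful for estimating signal-to-noise and signal-to-interference ratios.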
Step 303: determine the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth according to the spatial enhancement signal and the spatial notch signal, and determine the weighting factor corresponding to the candidate sound source azimuth according to that signal-to-noise ratio and signal-to-interference ratio.
After the spatial enhancement signal and the spatial notch signal of the candidate sound source azimuth have been obtained in step 302, the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth can be determined.
In some optional implementations, the spatial enhancement signal may be input to a preset noise estimation module, which produces a first estimated noise signal; the signal-to-noise ratio is then determined from the spatial enhancement signal and the first estimated noise signal.
For example, the signal-to-noise ratio (SNR) corresponding to the candidate sound source azimuth may be determined using the following formula:
where bf_noise is the first estimated noise signal and bf_ori is the spatial enhancement signal of the candidate sound source azimuth.
The preset noise estimation module may be a noise estimation module implemented with any of various algorithms for determining the noise floor of a signal. In some application scenarios, the algorithm for determining the noise floor may be, for example, the minima controlled recursive averaging (MCRA) algorithm.
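To give a feel for what a noise floor estimator does, the following Python tracks the minimum of a recursively smoothed power sequence over a short window. This is not MCRA itself, only a much simpler stand-in that shares the idea of combining recursive averaging with minimum tracking; the function name, the smoothing constant, and the window length are assumptions.

```python
from collections import deque


def track_noise_floor(frame_powers, alpha=0.8, window=5):
    """Smooth the per-frame power recursively, then take the minimum of the
    smoothed values over the last `window` frames as the noise floor."""
    smoothed = None
    recent = deque(maxlen=window)
    floors = []
    for power in frame_powers:
        if smoothed is None:
            smoothed = power
        else:
            smoothed = alpha * smoothed + (1.0 - alpha) * power
        recent.append(smoothed)
        floors.append(min(recent))
    return floors


# Stationary noise at power 1.0 with a short burst of speech-like energy.
frame_powers = [1.0, 1.0, 9.0, 1.0, 1.0]
floors = track_noise_floor(frame_powers)
```

The burst raises the smoothed power but not the tracked minimum, so the estimate stays at the stationary noise level. A real MCRA implementation additionally uses a speech presence probability to control the averaging.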
In these optional implementations, the spatial notch signal may be input to the preset noise estimation module to obtain a second estimated noise signal, and the signal-to-interference ratio (SIR) is determined from the difference between the spatial enhancement signal and the first estimated noise and the difference between the spatial notch signal and the second estimated noise.
Here, bf_noise is the first estimated noise signal, bf_ori is the spatial enhancement signal of the candidate sound source azimuth, null_ori is the spatial notch signal, and null_noise is the second estimated noise signal.
After the signal-to-noise ratio and the signal-to-interference ratio have been determined, the weighting factor A(ω) of the candidate sound source azimuth can be determined according to formula (7):
$$A(\omega) = f(\mathrm{SNR}(\omega), \mathrm{SIR}(\omega)) \tag{7}$$
The function f(SNR(ω), SIR(ω)) may be any function that is positively correlated with SNR(ω) and SIR(ω); it is not limited here.
Frequency bins with a low signal-to-noise ratio have a smoothing effect on the sound source estimate and reduce the resolution of the estimated direction. Frequency bins with a low signal-to-interference ratio seriously interfere with the estimate: in a multi-source scenario they raise the R values near a high-intensity sound source and thereby affect the direction estimates of the other sound sources. The method of determining the weighting factor in this embodiment gives higher weights to frequency bins with a high signal-to-noise ratio and a high signal-to-interference ratio, which reduces the adverse effect of low-SNR and low-SIR bins when determining the sound source direction.
Step 304: for each candidate sound source azimuth, determine the value of the cross-correlation function corresponding to the candidate sound source azimuth according to the weighting factor of that azimuth.
Formula (2) may be used to compute the value of the cross-correlation function R(τ) corresponding to each candidate sound source azimuth.
Step 305: determine the target azimuth of the sound source according to the cross-correlation function values corresponding to the at least one candidate sound source azimuth.
In some application scenarios, the azimuth of a single sound source is to be determined. The maximum among the cross-correlation function values corresponding to the at least one candidate sound source azimuth may be found, and the candidate sound source azimuth corresponding to that maximum is determined as the target azimuth.
In other application scenarios, there are multiple sound sources and the azimuth of each of them is to be determined. In these scenarios, multiple local extrema may be identified among the cross-correlation function values corresponding to the at least one candidate sound source azimuth, and the target azimuths of the multiple sound sources are determined from the candidate sound source azimuths corresponding to those local extrema.
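Picking several sources from the scan amounts to finding local maxima of the cross-correlation values over the candidate azimuths. The sketch below is one simple way to do that; the relative threshold `min_ratio`, the assumption that the values are non-negative, and the sample values are illustrative and not taken from the disclosure.

```python
def local_maxima(angles_deg, values, min_ratio=0.5):
    """Return the angles whose value exceeds both neighbours and is at least
    `min_ratio` times the global maximum (values assumed non-negative)."""
    threshold = min_ratio * max(values)
    peaks = []
    for i, value in enumerate(values):
        left = values[i - 1] if i > 0 else float("-inf")
        right = values[i + 1] if i < len(values) - 1 else float("-inf")
        if value > left and value > right and value >= threshold:
            peaks.append(angles_deg[i])
    return peaks


# Made-up cross-correlation values over candidates from -90 to 90 degrees,
# shaped to have one peak near -60 degrees and one at 0 degrees.
angles = list(range(-90, 91, 10))
values = [0.1, 0.2, 0.5, 0.9, 0.5, 0.3, 0.2, 0.4, 0.7, 1.0,
          0.7, 0.4, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
targets = local_maxima(angles, values)
```

The threshold keeps small ripples from being reported as sources; for the single-source case, taking the global maximum alone is sufficient.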
与图1所示实施例相比,本实施例突出了利用每一个候选方位角对应的空间增强信号和空间陷波信号,确定各个候选方位角对应的加权因子的步骤。上述加权因子的抗非平稳干扰性能较好,从而使用上述方案确定目标方位角的抗非平稳干扰的能力较好,此外,可以进一步提高所确定出的声源的目标方位角准确度。Compared with the embodiment shown in FIG. 1 , this embodiment highlights the step of determining the weighting factor corresponding to each candidate azimuth angle by using the spatial enhancement signal and the spatial notch signal corresponding to each candidate azimuth angle. The above weighting factor has better anti-non-stationary interference performance, so the above-mentioned solution can be used to determine the target azimuth angle with better anti-non-stationary interference ability. In addition, the accuracy of the determined target azimuth angle of the sound source can be further improved.
请结合图4A和图4B,图4A是相关技术中一个示意性的用于声源定位的音频信号能量分布图;图4B示出了根据本公开的声源定位方法得到的音频信号能量分布图。如图4A所示,其示出了相关技术中对双声源的音频信号能量分布示意图。在图4A中可以看到0°方位角内有一个极大值,可以将该极大值对应的角度(0°)作为一个声源的目标方位角。此外,由图4A中无法明确确定出另一个极值。4A and 4B, FIG. 4A is a schematic diagram of an audio signal energy distribution for sound source localization in the related art; FIG. 4B shows an audio signal energy distribution diagram obtained according to the sound source localization method of the present disclosure. . As shown in FIG. 4A , it shows a schematic diagram of the energy distribution of audio signals for dual sound sources in the related art. It can be seen in FIG. 4A that there is a maximum value in the azimuth angle of 0°, and the angle (0°) corresponding to the maximum value can be used as the target azimuth angle of a sound source. Furthermore, another extreme value cannot be clearly identified from Figure 4A.
Referring to FIG. 4B, FIG. 4B is a schematic diagram of the audio signal energy distribution determined with the sound source localization method shown in FIG. 1 and FIG. 3. As shown in FIG. 4B, the energy of the audio signal clearly has two extrema, at 0° and -60°, so the target azimuths of the two sound sources can be determined to be 0° and -60°, respectively.
Referring to FIG. 5, it shows a schematic diagram of the principle of the sound source localization method shown in FIG. 3. As shown in FIG. 5, two microphones A and B form a linear microphone array, which can collect the audio signal emitted by the sound source. The 180° sound pickup interval of the linear microphone array formed by microphones A and B can be divided at equal intervals (every 10°) into 18 sound pickup sub-intervals. The endpoints of these 18 sub-intervals can be used as candidate sound source azimuths. After duplicate endpoints are removed, 19 candidate sound source azimuths are obtained. For these 19 candidate sound source azimuths, refer to the description of the example shown in FIG. 1; details are not repeated here.
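The partition described above can be sketched as follows. The sketch assumes the 180° pickup interval runs from -90° to 90° (consistent with the -60° azimuth mentioned for FIG. 4B, but not stated explicitly in this passage); the function name and parameters are hypothetical.

```python
def candidate_azimuths(start_deg=-90, end_deg=90, step_deg=10):
    """Split [start_deg, end_deg] into equal sub-intervals and return the
    de-duplicated, sorted endpoints as candidate azimuths (in degrees)."""
    endpoints = []
    lo = start_deg
    while lo < end_deg:
        hi = min(lo + step_deg, end_deg)
        endpoints.extend([lo, hi])  # both endpoints of this sub-interval
        lo = hi
    # Adjacent sub-intervals share an endpoint, so remove duplicates.
    return sorted(set(endpoints))


candidates = candidate_azimuths()
```

Each of the 18 sub-intervals contributes two endpoints (36 in total), and removing the shared ones leaves the 19 candidates mentioned in the text.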
For each candidate sound source azimuth, the audio signal from the sound source as received at microphone A can be expressed as x1(m), and as received at microphone B as x2(m+τ). The signals x1(m) and x2(m+τ) received by microphones A and B are converted to the frequency domain to obtain the frequency-domain signals X1(ω) and X2(ω)×e^(-jωτ). Here c is the speed of sound, d is the distance between the two microphones, θ is the candidate sound source azimuth, and τ is the difference between the arrival times of the sound source signal at the two microphones. Under a far-field model, τ = d·sin θ / c when θ is measured from the broadside of the array.
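As a small worked illustration of the relation between θ, d, c and τ, the sketch below computes the inter-microphone delay for a candidate azimuth. It rests on stated assumptions: a far-field source, θ measured from the array broadside so that τ = d·sin θ / c, and c taken as the speed of sound in air (about 343 m/s at room temperature). The function name and the 0.1 m spacing are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def inter_mic_delay(theta_deg, mic_distance_m, c=SPEED_OF_SOUND):
    """Arrival-time difference (seconds) between two microphones for a
    far-field source at azimuth theta_deg, measured from broadside."""
    return mic_distance_m * math.sin(math.radians(theta_deg)) / c


# Delay for every 10-degree candidate with a hypothetical 0.1 m spacing.
delays = {theta: inter_mic_delay(theta, 0.1) for theta in range(-90, 91, 10)}
```

Under this convention a source at 0° reaches both microphones simultaneously, and the delay magnitude is largest (d / c) at ±90°.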
First, the frequency-domain signals corresponding to microphones A and B are input to the beamforming module to obtain the spatial enhancement signal (a frequency-domain signal) bf_ori of the signals from microphones A and B. The spatial enhancement signal is then input to the noise estimation module to obtain the first estimated noise bf_noise. The signal-to-noise ratio (SNR) can then be calculated from the spatial enhancement signal bf_ori and the first estimated noise bf_noise according to formula (5).
Second, the frequency-domain signals corresponding to microphones A and B are input to the blocking matrix module to obtain the spatial notch signal (a frequency-domain signal) null_ori of the signals from microphones A and B. The spatial notch signal is then input to the noise estimation module to obtain the second estimated noise null_noise. Next, the difference between the spatial enhancement signal bf_ori and the first estimated noise bf_noise, and the difference between the spatial notch signal null_ori and the second estimated noise null_noise, are determined, and the signal-to-interference ratio (SIR) is calculated from these two differences according to formula (6).
Third, the weighting factor A(ω) corresponding to the candidate sound source azimuth can be determined according to a function that is positively correlated with the SNR and the SIR of that candidate sound source azimuth. The value of the cross-correlation function corresponding to the candidate sound source azimuth is then determined from the weighting factor corresponding to that azimuth.
Finally, the target azimuth of the sound source is determined from the cross-correlation function values corresponding to the respective candidate sound source azimuths.
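The last two steps can be sketched as below. The passage only requires the weighting factor to be positively correlated with the SNR and the SIR; the specific function used here (a product of two saturating terms), the scalar per-candidate weight, and all sample numbers are assumptions chosen for illustration and are not the formulas of the disclosure.

```python
def weighting_factor(snr, sir):
    """Hypothetical weight in [0, 1) that increases with both SNR and SIR."""
    return (snr / (1.0 + snr)) * (sir / (1.0 + sir))


def pick_target_azimuth(candidates, correlations):
    """Return the candidate azimuth with the largest cross-correlation value."""
    best = max(range(len(candidates)), key=lambda i: correlations[i])
    return candidates[best]


# Illustrative values for three candidate azimuths (degrees).
candidates = [-60, 0, 60]
raw_correlation = [0.9, 0.8, 0.7]   # unweighted values, made up for the example
snr = [1.0, 9.0, 1.0]
sir = [1.0, 9.0, 1.0]

weighted = [
    weighting_factor(s, i) * r
    for s, i, r in zip(snr, sir, raw_correlation)
]
target = pick_target_azimuth(candidates, weighted)
```

In this example the unweighted values would favour -60°, but the candidate at 0° has much higher SNR and SIR, so the weighted values select 0°. This illustrates why a weight tied to SNR and SIR can suppress directions dominated by noise or interference.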
With further reference to FIG. 6, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of a sound source localization apparatus. This apparatus embodiment corresponds to the method embodiment shown in FIG. 1, and the apparatus can be applied in various electronic devices.
As shown in FIG. 6, the sound source localization apparatus of this embodiment includes a first determination unit 601, an acquisition unit 602, and a second determination unit 603. The first determination unit 601 is configured to determine at least one candidate sound source azimuth within the sound pickup interval corresponding to the audio information collection sensor. The acquisition unit 602 is configured to, for the at least one candidate sound source azimuth, acquire the SNR and the SIR of the candidate sound source azimuth and determine the weighting factor corresponding to the candidate sound source azimuth according to its SNR and SIR. The second determination unit 603 is configured to determine the target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth.
In this embodiment, for the specific processing of the first determination unit 601, the acquisition unit 602, and the second determination unit 603 of the sound source localization apparatus and the technical effects they bring, refer to the descriptions of step 101, step 102, and step 103, respectively, in the embodiment corresponding to FIG. 1; details are not repeated here.
In some optional implementations, the second determination unit 603 is further configured to: for the at least one candidate sound source azimuth, generate the value of the cross-correlation function corresponding to the candidate sound source azimuth according to the weighting factor of that azimuth; and determine the target azimuth of the sound source according to the cross-correlation function values corresponding to the at least one candidate sound source azimuth.
In some optional implementations, the first determination unit 601 is further configured to: divide the sound pickup interval into at least one sound pickup sub-interval; and determine the at least one candidate sound source azimuth within the at least one sound pickup sub-interval according to a preset rule.
In some optional implementations, the audio information collection sensor includes a linear microphone array composed of two microphones, and the first determination unit 601 is further configured to divide the 180° sound pickup interval corresponding to the linear microphone array into multiple sound pickup sub-intervals at equal intervals.
In some optional implementations, the acquisition unit 602 is further configured to: acquire the spatial enhancement signal and the spatial notch signal of the audio signal for the candidate sound source azimuth; and determine the SNR and the SIR of the candidate sound source azimuth according to the spatial enhancement signal and the spatial notch signal.
In some optional implementations, the acquisition unit 602 is further configured to: determine, as the spatial enhancement signal, the signal obtained by delaying and summing the audio signals received by the two microphones; and determine, as the spatial notch signal, the signal obtained by delaying and subtracting the audio signals of the two microphones.
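The delay-and-sum and delay-and-subtract operations can be sketched in the frequency domain as follows. This is an illustrative sketch, not the disclosed implementation: the function names, the bin frequencies, and the sample spectra are hypothetical, and the steering phase e^(-jωτ) is applied to the second microphone's spectrum as an assumption about which channel is delayed.

```python
import cmath
import math


def steer(x2, omegas, tau):
    """Apply the phase term e^{-j*omega*tau} to each bin of microphone B."""
    return [x * cmath.exp(-1j * w * tau) for x, w in zip(x2, omegas)]


def delay_and_sum(x1, x2_steered):
    """Spatial enhancement signal: aligned channels add coherently."""
    return [a + b for a, b in zip(x1, x2_steered)]


def delay_and_subtract(x1, x2_steered):
    """Spatial notch signal: a source at the steered azimuth cancels."""
    return [a - b for a, b in zip(x1, x2_steered)]


# Three illustrative frequency bins and a source exactly at the steered delay.
omegas = [2 * math.pi * f for f in (500.0, 1000.0, 2000.0)]
tau = 1e-4
x1 = [1 + 0j, 0.5j, -0.3 + 0.2j]
x2 = [a * cmath.exp(1j * w * tau) for a, w in zip(x1, omegas)]

x2_steered = steer(x2, omegas, tau)
bf_ori = delay_and_sum(x1, x2_steered)
null_ori = delay_and_subtract(x1, x2_steered)
```

When the source lies at the candidate azimuth, the sum doubles the signal while the difference is close to zero; energy remaining in the notch output therefore comes from other directions or from noise, which is what makes it useful for estimating interference.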
In some optional implementations, the acquisition unit 602 is further configured to: input the spatial enhancement signal into a preset noise estimation module to obtain a first estimated noise, and determine the SNR from the spatial enhancement signal and the first estimated noise; and input the spatial notch signal into the preset noise estimation module to obtain a second estimated noise, and determine the SIR from the difference between the spatial enhancement signal and the first estimated noise and the difference between the spatial notch signal and the second estimated noise.
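A rough sketch of these two ratios is given below. Formulas (5) and (6) are not reproduced in this passage, so the power-ratio forms used here are assumptions that merely follow the verbal description (SNR from the enhancement signal and its noise estimate; SIR from the two noise-compensated differences). The function names, the `EPS` floor, and the sample spectra are hypothetical.

```python
EPS = 1e-12  # floor that avoids division by zero (an implementation choice)


def power(spectrum):
    """Total power of a complex spectrum."""
    return sum(abs(x) ** 2 for x in spectrum)


def estimate_snr(bf_ori, bf_noise):
    """Assumed form: enhancement-signal power over estimated noise power."""
    return power(bf_ori) / max(power(bf_noise), EPS)


def estimate_sir(bf_ori, bf_noise, null_ori, null_noise):
    """Assumed form: noise-compensated enhancement power over
    noise-compensated notch power."""
    target = max(power(bf_ori) - power(bf_noise), 0.0)
    interference = max(power(null_ori) - power(null_noise), EPS)
    return target / interference


# Single-bin illustrative spectra.
snr = estimate_snr([2 + 0j], [1 + 0j])
sir = estimate_sir([2 + 0j], [1 + 0j], [2 + 0j], [1 + 0j])
```

Subtracting the noise estimates before forming the SIR keeps stationary background noise from being counted as either target or interference energy; the floors guard against negative or zero denominators when the estimates are imperfect.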
In some optional implementations, the acquisition unit 602 is further configured to determine the weighting factor according to a function that is positively correlated with the SNR and the SIR.
In some optional implementations, the second determination unit 603 is further configured to: determine the maximum among the cross-correlation function values corresponding to the at least one candidate sound source azimuth, and determine the candidate sound source azimuth corresponding to that maximum as the target azimuth of the sound source.
In some optional implementations, the second determination unit 603 is further configured to: generate a distribution diagram of the cross-correlation function values from the values corresponding to the at least one candidate sound source azimuth; and determine at least two local extrema in the distribution diagram, and determine the candidate sound source azimuths corresponding to the at least two local extrema as the target azimuths of at least two sound sources, respectively.
Referring to FIG. 7, it shows an exemplary system architecture to which the sound source localization method or the sound source localization apparatus of an embodiment of the present disclosure can be applied.
As shown in FIG. 7, the system architecture may include an audio information collection sensor, a terminal device 703, a network 704, and a server 705. The audio information collection sensor may include microphones 701 and 702. In some application scenarios, the audio information collection sensor may be connected to the terminal device 703 by wired communication. In other application scenarios, the audio information collection sensor may be built into the terminal device. The network 704 is the medium that provides a communication link between the terminal device 703 and the server 705. The network 704 may include various connection types, such as wired or wireless communication links or fiber-optic cables.
The audio information collection sensor may send the collected audio signal to the terminal device 703 by wired communication.
The terminal device 703 can interact with the server 705 through the network 704 to receive or send messages and the like. Various client applications may be installed on the terminal device 703, such as web browser applications, search applications, news applications, and audio signal processing applications. A client application on the terminal device 703 can receive a user's instruction and perform the corresponding function, for example analyzing and processing an audio signal according to the user's instruction.
The terminal device 703 may be hardware or software. When the terminal device 703 is hardware, it may be any of various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, and desktop computers. When the terminal device 703 is software, it may be installed in the electronic devices listed above. It may be implemented as multiple pieces of software or software modules (for example, software or software modules used to provide distributed services), or as a single piece of software or software module. No specific limitation is imposed here.
The server 705 may be a server that provides various services, for example receiving the audio signal sent by the terminal device 703, analyzing and processing the audio signal, and sending the processing result (for example, the target azimuth of the sound source) to the terminal device.
It should be noted that the sound source localization method provided by the embodiments of the present disclosure may be executed by the terminal device, in which case the sound source localization apparatus may be provided in the terminal device 703. The method may also be executed by the server 705, in which case the sound source localization apparatus may be provided in the server 705.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 7 are merely illustrative. There may be any number of terminal devices, networks, and servers, depending on implementation needs.
Referring now to FIG. 8, it shows a schematic structural diagram of an electronic device (for example, the terminal device or the server in FIG. 7) suitable for implementing the embodiments of the present disclosure. Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (for example, in-vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 8 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in FIG. 8, the electronic device may include a processing device (for example, a central processing unit or a graphics processor) 801, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage device 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data required for the operation of the electronic device 800. The processing device 801, the ROM 802, and the RAM 803 are connected to one another through a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Generally, the following devices can be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 807 including, for example, a liquid crystal display (LCD), speaker, and vibrator; storage devices 808 including, for example, magnetic tape and a hard disk; and a communication device 809. The communication device 809 allows the electronic device to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 8 shows an electronic device with various devices, it should be understood that not all of the illustrated devices need to be implemented or present. More or fewer devices may alternatively be implemented or present.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 809, installed from the storage device 808, or installed from the ROM 802. When the computer program is executed by the processing device 801, the above functions defined in the methods of the embodiments of the present disclosure are performed.
It should be noted that the computer-readable medium described in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave and carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to electrical wire, optical cable, RF (radio frequency), or any suitable combination of the above.
In some embodiments, the client and the server can communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication in any form or medium (for example, a communication network). Examples of communication networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (for example, the Internet), and peer-to-peer networks (for example, ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
The above computer-readable medium may be included in the above electronic device, or it may exist separately without being assembled into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: determine at least one candidate sound source azimuth within the sound pickup interval corresponding to the audio information collection sensor; for the at least one candidate sound source azimuth, acquire the SNR and the SIR of the candidate sound source azimuth, and determine the weighting factor corresponding to the candidate sound source azimuth according to its SNR and SIR; and determine the target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself.
The functions described above may be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), and complex programmable logic devices (CPLDs).
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
The sound source localization method provided according to one or more embodiments of the present disclosure includes: determining at least one candidate sound source azimuth within the sound pickup interval corresponding to an audio information collection sensor; for the at least one candidate sound source azimuth, acquiring the SNR and the SIR of the candidate sound source azimuth, and determining the weighting factor corresponding to the candidate sound source azimuth according to its SNR and SIR; and determining the target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth.
According to one or more embodiments of the present disclosure, determining the target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth includes: for the at least one candidate sound source azimuth, generating the value of the cross-correlation function corresponding to the candidate sound source azimuth according to the weighting factor of that azimuth; and determining the target azimuth of the sound source according to the cross-correlation function values corresponding to the at least one candidate sound source azimuth.
According to one or more embodiments of the present disclosure, determining at least one candidate sound source azimuth within the sound pickup interval corresponding to the audio information collection device includes: dividing the sound pickup interval into at least one sound pickup sub-interval; and determining the at least one candidate sound source azimuth within the at least one sound pickup sub-interval according to a preset rule.
According to one or more embodiments of the present disclosure, the audio information collection sensor includes a linear microphone array composed of two microphones, and dividing the sound pickup interval into at least one sound pickup sub-interval includes: dividing the 180° sound pickup interval corresponding to the linear microphone array into multiple sound pickup sub-intervals at equal intervals.
According to one or more embodiments of the present disclosure, determining the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth includes: acquiring a spatial enhancement signal and a spatial notch signal of the audio signal at the candidate sound source azimuth; and determining the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth according to the spatial enhancement signal and the spatial notch signal.
According to one or more embodiments of the present disclosure, acquiring the spatial enhancement signal and the spatial notch signal of the audio signal at the candidate sound source azimuth includes: determining, as the spatial enhancement signal, the signal obtained by delaying and summing the audio signals received by the two microphones; and determining, as the spatial notch signal, the signal obtained by delaying and subtracting the audio signals of the two microphones.
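A minimal delay-and-sum and delay-and-subtract sketch for two microphones follows. It assumes a far-field source, a speed of sound of 343 m/s, and whole-sample delays; the helper names and the convention that a positive delay means microphone 2 hears the sound later are illustrative choices, not taken from the text.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air


def steering_delay_samples(azimuth_deg, mic_spacing_m, sample_rate_hz):
    """Inter-microphone delay, in whole samples, for a far-field source."""
    tau = mic_spacing_m * math.cos(math.radians(azimuth_deg)) / SPEED_OF_SOUND_M_S
    return round(tau * sample_rate_hz)


def shift(signal, delay):
    """Delay (or advance, if negative) a signal, zero-padded to the same length."""
    n = len(signal)
    if delay >= 0:
        return ([0.0] * delay + list(signal))[:n]
    return (list(signal)[-delay:] + [0.0] * (-delay))[:n]


def enhance_and_notch(x1, x2, delay):
    """Align x1 to x2 for the candidate delay, then sum and subtract."""
    aligned = shift(x1, delay)
    enhanced = [a + b for a, b in zip(aligned, x2)]
    notch = [a - b for a, b in zip(aligned, x2)]
    return enhanced, notch


# Toy check: microphone 2 hears the same tone 3 samples after microphone 1.
tone = [math.sin(0.3 * n) for n in range(64)]
mic1, mic2 = tone, shift(tone, 3)
enhanced, notch = enhance_and_notch(mic1, mic2, 3)
```

When the candidate delay matches the true one, the notch output cancels the source from that direction, so what remains approximates interference plus noise; a mismatched delay leaves a clear residue.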
According to one or more embodiments of the present disclosure, determining the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth according to the spatial enhancement signal and the spatial notch signal includes: inputting the spatial enhancement signal into a preset noise estimation module to obtain a first estimated noise signal, and determining the signal-to-noise ratio using the spatial enhancement signal and the first estimated noise signal; and inputting the spatial notch signal into the preset noise estimation module to obtain a second estimated noise signal, and determining the signal-to-interference ratio using the difference between the spatial enhancement signal and the first estimated noise signal and the difference between the spatial notch signal and the second estimated noise signal.
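The sketch below shows one plausible power-domain reading of these two ratios. The text only says the ratios are determined "using" the listed quantities, so the exact formulas are assumptions: SNR is taken as enhanced power over the first noise estimate, and SIR as (enhanced power minus first noise estimate) over (notch power minus second noise estimate). The noise estimator here is a crude stand-in (minimum frame power), not the preset module of the disclosure.

```python
def estimate_noise_power(frame_powers):
    """Stand-in noise estimator: the smallest frame power seen so far."""
    return min(frame_powers)


def snr_and_sir(enhanced_powers, notch_powers, eps=1e-12):
    """Return (SNR, SIR) as linear ratios for the most recent frame."""
    n1 = estimate_noise_power(enhanced_powers)  # first estimated noise
    n2 = estimate_noise_power(notch_powers)     # second estimated noise
    p_enh = enhanced_powers[-1]
    p_notch = notch_powers[-1]
    snr = p_enh / max(n1, eps)
    # Noise-free target power over noise-free interference power.
    sir = max(p_enh - n1, 0.0) / max(p_notch - n2, eps)
    return snr, sir


# Per-frame powers of the enhanced and notch signals (illustrative numbers).
snr, sir = snr_and_sir([1.0, 1.0, 9.0], [1.0, 1.0, 3.0])
```

The `eps` floor only guards against division by zero when a noise or interference estimate vanishes.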
According to one or more embodiments of the present disclosure, determining the weighting factor corresponding to the candidate sound source azimuth according to the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth includes: determining the weighting factor according to a function that is positively correlated with both the signal-to-noise ratio and the signal-to-interference ratio.
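Any function that increases with both ratios satisfies this description. As a hedged example, the sketch below uses a product of two saturating terms, which maps non-negative linear ratios into the range [0, 1); this particular form is an assumption and is not given in the text.

```python
def weighting_factor(snr, sir):
    """Weight that grows with both SNR and SIR (linear, non-negative ratios).

    The product-of-saturating-terms form is an illustrative choice only.
    """
    if snr < 0 or sir < 0:
        raise ValueError("SNR and SIR must be non-negative linear ratios")
    return (snr / (1.0 + snr)) * (sir / (1.0 + sir))


w = weighting_factor(9.0, 4.0)  # 0.9 * 0.8, up to floating-point rounding
```

The saturation keeps a single very clean direction from dominating the weighted values without bound.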
According to one or more embodiments of the present disclosure, determining the target azimuth of the sound source according to the values of the cross-correlation function corresponding to each of the at least one candidate sound source azimuth includes: determining the maximum among the values of the cross-correlation function corresponding to each of the at least one candidate sound source azimuth, and determining the candidate sound source azimuth corresponding to the maximum as the target azimuth of the sound source.
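For a single source, this selection reduces to an argmax over the candidates. A small sketch, with the hypothetical helper name `pick_target_azimuth` and illustrative values:

```python
def pick_target_azimuth(azimuths_deg, corr_values):
    """Return the candidate azimuth whose cross-correlation value is largest."""
    if not azimuths_deg or len(azimuths_deg) != len(corr_values):
        raise ValueError("need one correlation value per candidate azimuth")
    best = max(range(len(corr_values)), key=corr_values.__getitem__)
    return azimuths_deg[best]


target = pick_target_azimuth([30.0, 60.0, 90.0], [0.2, 0.9, 0.4])
```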
According to one or more embodiments of the present disclosure, determining the target azimuth of the sound source according to the values of the cross-correlation function corresponding to each of the at least one candidate sound source azimuth includes: generating a distribution map of the cross-correlation function values from the values corresponding to each of the at least one candidate sound source azimuth; and determining at least two local extrema in the distribution map, and determining the candidate sound source azimuths corresponding to the at least two local extrema as the target azimuths of at least two sound sources, respectively.
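The multi-source case can be sketched as a peak search over the distribution of values. The sketch assumes that "local extrema" means local maxima and that the strongest peaks are kept, one per source; the helper name and the sample values are hypothetical.

```python
def pick_peak_azimuths(azimuths_deg, corr_values, num_sources=2):
    """Return the azimuths of the strongest local maxima, in ascending order."""
    peaks = []
    last = len(corr_values) - 1
    for i, v in enumerate(corr_values):
        left = corr_values[i - 1] if i > 0 else float("-inf")
        right = corr_values[i + 1] if i < last else float("-inf")
        if v > left and v >= right:
            peaks.append((v, azimuths_deg[i]))
    peaks.sort(reverse=True)  # strongest peak first
    return sorted(az for _, az in peaks[:num_sources])


azimuths = [0, 30, 60, 90, 120, 150, 180]
values = [0.1, 0.8, 0.3, 0.2, 0.6, 0.4, 0.1]
targets = pick_peak_azimuths(azimuths, values)
```

With these values the two peaks sit at 30° and 120°, so two sources would be reported at those azimuths.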
The above description covers only preferred embodiments of the present disclosure and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.
Additionally, although operations are depicted in a particular order, this should not be construed as requiring that the operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although the above discussion contains several specific implementation details, these should not be construed as limitations on the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.
Claims (13)
- A sound source localization method, comprising: determining at least one candidate sound source azimuth within a sound pickup interval corresponding to an audio information collection sensor; for the at least one candidate sound source azimuth, obtaining a signal-to-noise ratio and a signal-to-interference ratio of the candidate sound source azimuth, and determining a weighting factor corresponding to the candidate sound source azimuth according to the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth; and determining a target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth.
- The method according to claim 1, wherein determining the target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth comprises: for the at least one candidate sound source azimuth, generating a value of a cross-correlation function corresponding to the candidate sound source azimuth according to the weighting factor of the candidate sound source azimuth; and determining the target azimuth of the sound source according to the values of the cross-correlation function corresponding to each of the at least one candidate sound source azimuth.
- The method according to claim 1, wherein determining the at least one candidate sound source azimuth within the sound pickup interval corresponding to the audio information collection sensor comprises: dividing the sound pickup interval into at least one sound pickup sub-interval; and determining the at least one candidate sound source azimuth within the at least one sound pickup sub-interval according to a preset rule.
- The method according to claim 2, wherein the audio information collection sensor comprises a linear microphone array composed of two microphones; and dividing the sound pickup interval into at least one sound pickup sub-interval comprises: dividing the 180° sound pickup interval corresponding to the linear microphone array into a plurality of sound pickup sub-intervals at equal intervals.
- The method according to claim 3, wherein determining the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth comprises: acquiring a spatial enhancement signal and a spatial notch signal of the audio signal at the candidate sound source azimuth; and determining the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth according to the spatial enhancement signal and the spatial notch signal.
- The method according to claim 5, wherein acquiring the spatial enhancement signal and the spatial notch signal of the audio signal at the candidate sound source azimuth comprises: determining, as the spatial enhancement signal, the signal obtained by delaying and summing the audio signals received by the two microphones; and determining, as the spatial notch signal, the signal obtained by delaying and subtracting the audio signals of the two microphones.
- The method according to claim 5, wherein determining the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth according to the spatial enhancement signal and the spatial notch signal comprises: inputting the spatial enhancement signal into a preset noise estimation module to obtain a first estimated noise signal, and determining the signal-to-noise ratio using the spatial enhancement signal and the first estimated noise signal; and inputting the spatial notch signal into the preset noise estimation module to obtain a second estimated noise signal, and determining the signal-to-interference ratio using the difference between the spatial enhancement signal and the first estimated noise signal and the difference between the spatial notch signal and the second estimated noise signal.
- The method according to claim 1, wherein determining the weighting factor corresponding to the candidate sound source azimuth according to the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth comprises: determining the weighting factor according to a function that is positively correlated with both the signal-to-noise ratio and the signal-to-interference ratio.
- The method according to claim 1, wherein determining the target azimuth of the sound source according to the values of the cross-correlation function corresponding to each of the at least one candidate sound source azimuth comprises: determining the maximum among the values of the cross-correlation function corresponding to each of the at least one candidate sound source azimuth, and determining the candidate sound source azimuth corresponding to the maximum as the target azimuth of the sound source.
- The method according to claim 1, wherein determining the target azimuth of the sound source according to the values of the cross-correlation function corresponding to each of the at least one candidate sound source azimuth comprises: generating a distribution map of the cross-correlation function values from the values corresponding to each of the at least one candidate sound source azimuth; and determining at least two local extrema in the distribution map, and determining the candidate sound source azimuths corresponding to the at least two local extrema as the target azimuths of at least two sound sources, respectively.
- A sound source localization apparatus, comprising: a first determining unit, configured to determine at least one candidate sound source azimuth within a sound pickup interval corresponding to an audio information collection sensor; an acquisition unit, configured to, for the at least one candidate sound source azimuth, obtain a signal-to-noise ratio and a signal-to-interference ratio of the candidate sound source azimuth, and determine a weighting factor corresponding to the candidate sound source azimuth according to the signal-to-noise ratio and the signal-to-interference ratio of the candidate sound source azimuth; and a second determining unit, configured to determine a target azimuth of the sound source according to the weighting factor of the at least one candidate sound source azimuth.
- An electronic device, comprising: at least one processor; and a storage apparatus configured to store at least one program, wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method according to any one of claims 1-10.
- A computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1-10.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011555230.6 | 2020-12-23 | ||
CN202011555230.6A CN112799018B (en) | 2020-12-23 | 2020-12-23 | Sound source positioning method and device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022135131A1 (en) | 2022-06-30 |
Family
ID=75805682
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/135833 WO2022135131A1 (en) | 2020-12-23 | 2021-12-06 | Sound source positioning method and apparatus, and electronic device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112799018B (en) |
WO (1) | WO2022135131A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112799018B (en) * | 2020-12-23 | 2023-07-18 | 北京有竹居网络技术有限公司 | Sound source positioning method and device and electronic equipment |
CN113889140A (en) * | 2021-09-24 | 2022-01-04 | 北京有竹居网络技术有限公司 | Audio signal playing method and device and electronic equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100926132B1 (en) * | 2007-08-10 | 2009-11-11 | 한국전자통신연구원 | Method and apparatus for fixing sound source direction in robot environment |
CN102854494A (en) * | 2012-08-08 | 2013-01-02 | Tcl集团股份有限公司 | Sound source locating method and device |
US20140241549A1 (en) * | 2013-02-22 | 2014-08-28 | Texas Instruments Incorporated | Robust Estimation of Sound Source Localization |
US20160171965A1 (en) * | 2014-12-16 | 2016-06-16 | Nec Corporation | Vibration source estimation device, vibration source estimation method, and vibration source estimation program |
CN111770568A (en) * | 2019-04-02 | 2020-10-13 | 电信科学技术研究院有限公司 | Method and device for determining positioning measurement value |
CN112799018A (en) * | 2020-12-23 | 2021-05-14 | 北京有竹居网络技术有限公司 | Sound source positioning method and device and electronic equipment |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101395722B1 (en) * | 2007-10-31 | 2014-05-15 | 삼성전자주식회사 | Method and apparatus of estimation for sound source localization using microphone |
CN101227694B (en) * | 2008-01-02 | 2012-12-26 | 重庆重邮信科通信技术有限公司 | Method and apparatus for obtaining TD-SCDMA system noise power, signal-noise ratio and signal-interference ratio |
EP3236672B1 (en) * | 2016-04-08 | 2019-08-07 | Oticon A/s | A hearing device comprising a beamformer filtering unit |
CN111025233B (en) * | 2019-11-13 | 2023-09-15 | 阿里巴巴集团控股有限公司 | Sound source direction positioning method and device, voice equipment and system |
CN111044973B (en) * | 2019-12-31 | 2021-06-01 | 山东大学 | MVDR target sound source directional pickup method for microphone matrix |
- 2020-12-23: CN application CN202011555230.6A (patent CN112799018B), status: Active
- 2021-12-06: WO application PCT/CN2021/135833 (WO2022135131A1), status: Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN112799018B (en) | 2023-07-18 |
CN112799018A (en) | 2021-05-14 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
WO2022135131A1 (en) | Sound source positioning method and apparatus, and electronic device | |
WO2022121799A1 (en) | Sound signal processing method and apparatus, and electronic device | |
WO2022135130A1 (en) | Voice extraction method and apparatus, and electronic device | |
CN111967339B (en) | Method and device for planning unmanned aerial vehicle path | |
WO2022105622A1 (en) | Image segmentation method and apparatus, readable medium, and electronic device | |
CN114399588B (en) | Three-dimensional lane line generation method and device, electronic device and computer readable medium | |
CN114692085B (en) | Feature extraction method and device, storage medium and electronic equipment | |
WO2023029893A1 (en) | Texture mapping method and apparatus, device and storage medium | |
Hu et al. | Geometry calibration for acoustic transceiver networks based on network newton distributed optimization | |
CN113297277B (en) | Test statistic determining method and device, readable medium and electronic equipment | |
WO2023138468A1 (en) | Virtual object generation method and apparatus, device, and storage medium | |
WO2022121800A1 (en) | Sound source positioning method and apparatus, and electronic device | |
CN112464039A (en) | Data display method and device of tree structure, electronic equipment and medium | |
WO2023045870A1 (en) | Network model compression method, apparatus and device, image generation method, and medium | |
Belloch et al. | Real-time sound source localization on an embedded GPU using a spherical microphone array | |
CN110765238A (en) | Data encryption query method and device | |
WO2022194145A1 (en) | Photographing position determination method and apparatus, device, and medium | |
CN115272760A (en) | Small sample smoke image fine classification method suitable for forest fire smoke detection | |
Feng et al. | Interpolation of the early part of the acoustic transfer functions using block sparse models | |
CN111460334B (en) | Information display method and device and electronic equipment | |
CN112634934B (en) | Voice detection method and device | |
CN115879320B (en) | Grid model generation method, device, electronic equipment and computer readable storage medium | |
CN115817163B (en) | Method, apparatus, electronic device and computer readable medium for adjusting wheel speed of vehicle | |
CN113808050B (en) | Denoising method, device and equipment for 3D point cloud and storage medium | |
CN111145793B (en) | Audio processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21909128; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 21909128; Country of ref document: EP; Kind code of ref document: A1 |