WO2020199351A1 - Sound source locating method, device and storage medium

Info

Publication number: WO2020199351A1
Application number: PCT/CN2019/091280
Authority: WO
Inventors: 关海欣; 丁少为
Original assignee: 北京云知声信息技术有限公司
Priority date: 2019-04-01
Filing date: 2019-06-14
Publication date: 2020-10-08
Also published as: CN110095755A; CN110095755B

Abstract

A sound source locating method, a device and a storage medium. The sound source locating method comprises: arranging a microphone array which has a specific shape distribution, the microphone array including a microphone set in a first linear distribution and a microphone set in a second linear distribution, which are arranged perpendicular to each other; then, on the basis of the microphone set in a first linear distribution and the microphone set in a second linear distribution, respectively acquiring first position information and second position information of a target to be located; and finally, calculating, according to the first position information and the second position information, 360-degree full-plane locating information of the target to be located.

Description

Sound source localization method, device and storage medium

Technical field

The present invention relates to the technical field of target positioning, in particular to a sound source positioning method, device and storage medium.

Background art

Sound source localization refers to collecting the sound signal emitted by a target object and applying a specific algorithm to that signal to determine the position of the target relative to the sound collection device. To improve the accuracy of the collected signal, the sound collection device usually collects the sound signal of the target through a microphone array, that is, a number of single microphones distributed in a specific arrangement. Because the sound emitted by the target is a spherical wave that is centered on the target and spreads outward, a device that collects the sound through only a single microphone obtains a missing or incomplete signal. A microphone array captures the sound signal as completely as possible. In addition, because of the relative distance between the target and the sound collection device, the array also yields the signal differences between its individual microphones, and these differences further improve the accuracy of the position calculation for the target, an effect that a single microphone cannot achieve.

Current sound source localization algorithms based on sound signals collected by microphone arrays mainly include localization algorithms based on time delay estimation (TDE), localization algorithms based on high-resolution spectrum estimation, and localization algorithms based on sparse representation. The core of the TDE-based localization algorithm is the accurate estimation of the sound wave propagation time delay, which is generally obtained by cross-correlating the sound signals collected by different microphones in the microphone array. To obtain the position of the sound source, the cross-correlation results can then be processed by simple delay-and-sum, geometric calculation, or a steered response power search. This class of algorithm is easy to implement, has low computational complexity, and is convenient for real-time processing, so it is widely used in practice.

Although the prior art already includes 360-degree omnidirectional localization methods based on a microphone array combined with a corresponding algorithm, existing sound source localization methods struggle to produce accurate results when the target has a narrow-frame flat-panel shape. In that case it is difficult for the microphone array to distinguish the front of the target from its rear, which makes accurate sound source localization for narrow-frame flat-panel targets very difficult.

Summary of the invention

In microphone-array-based sound source localization, existing microphone arrays, such as linear microphone arrays or differential microphone arrays, cannot accurately distinguish the front and back of the target when the target has a narrow-frame flat-panel shape. Most existing smart voice interaction devices, such as mobile phones, tablets, and flat-panel TVs, have this shape, which severely limits the application of sound source localization to such devices: when a traditional sound source localization method is used on them, the results do not meet basic accuracy requirements. In addition, the existing arrangements of linear or differential microphone arrays cannot efficiently obtain the sound-related signals of a target with a narrow-frame flat-panel shape. Such signals, for example differences in sound intensity or the trend of the sound along different azimuths, are important inputs to the subsequent localization algorithm. The existing microphone array arrangements therefore cannot effectively improve the accuracy of sound source localization.

In view of these defects in the prior art, the present invention provides a sound source localization method, device and storage medium. The method arranges a microphone array with a specific shape distribution, wherein the microphone array includes a first linear distributed microphone set and a second linear distributed microphone set arranged perpendicular to each other. Based on the first linear distributed microphone set and the second linear distributed microphone set, first position information and second position information of the target to be located are obtained respectively, and 360° full-plane positioning information of the target is finally calculated from the first position information and the second position information. In effect, the two microphone sets divide the full plane in which the target is located into two half-planes; the position information of the target in the two half-planes is obtained in turn, and the full-plane position of the target is calculated by combining the two. This overcomes the inability of traditional sound source localization methods to distinguish the front and rear directions of a narrow-frame flat-panel object, and thereby effectively improves the localization accuracy for targets with that shape.

According to a first aspect of the embodiments of the present invention, a sound source localization method is provided. The sound source localization method includes the following steps:

Step (1), arranging a T-shaped distributed microphone array composed of several microphones, the T-shaped distributed microphone array including a first linear distributed microphone set and a second linear distributed microphone set arranged perpendicular to each other;

Step (2), based on the voice signal collected by the first linear distributed microphone set, calculating first position information about the target;

Step (3), based on the voice signal collected by the second linear distributed microphone set, calculating second position information about the target, and determining the 360° full-plane positioning information of the target according to the first position information and the second position information.

In one embodiment, in step (1), arranging the microphone array in a T-shaped distribution specifically includes: arranging a number of microphones at a predetermined interval along a first direction to form the first linear distributed microphone set, and arranging at least one microphone in a second direction perpendicular to the first direction to form the second linear distributed microphone set.

In one embodiment, in step (1), all microphones in the second linear distributed microphone set are located on the same side of the first linear distributed microphone set, and the second direction passes through one of the microphones in the first linear distributed microphone set.

In one embodiment, in step (2), calculating the first position information specifically includes: first obtaining the voice signal of each microphone in the first linear distributed microphone set, and then calculating the first position information based on a time delay estimation algorithm, wherein the first position information is 0°-180° half-plane position information of the target, referenced to the direction along which the second linear distributed microphone set is arranged.

In one embodiment, in step (2), calculating the first position information based on the time delay estimation algorithm specifically includes: performing cross-correlation processing on the voice signals corresponding to the microphones in the first linear distributed microphone set, and performing a steered response power search on the cross-correlation results, so as to calculate the first position information.

In one embodiment, in step (2), calculating the first position information based on the time delay estimation algorithm specifically includes: performing generalized cross-correlation function processing on the voice signals corresponding to the microphones in the first linear distributed microphone set, wherein the generalized cross-correlation function introduces a weighting function on the cross-power spectral density between different microphones, and then calculating the first position information from the generalized cross-correlation function according to the generalized cross-correlation phase transform algorithm.

In one embodiment, in step (3), calculating the second position information specifically includes: selecting one microphone from the first linear distributed microphone set and one microphone from the second linear distributed microphone set to form a small-pitch microphone differential array, and then calculating the second position information based on the small-pitch microphone differential array in combination with a corresponding differential array algorithm.

In one embodiment, in step (3), calculating the second position information based on the small-pitch microphone differential array in combination with a corresponding differential array algorithm specifically includes: using the small-pitch microphone differential array as a fixed beamformer, obtaining the first-order differential beam pattern corresponding to the fixed beamformer, and calculating the second position information based on the first-order differential beam pattern.

In one embodiment, in step (3), calculating the second position information based on the small-pitch microphone differential array in combination with a corresponding differential array algorithm specifically includes: using the small-pitch microphone differential array as a fixed beamformer, designing two different beam weights for the fixed beamformer, namely a first beam weight and a second beam weight, calculating the first output signal energy corresponding to the first beam weight and the second output signal energy corresponding to the second beam weight, and calculating the second position information according to the larger of the first output signal energy and the second output signal energy.

In an embodiment, calculating the first output signal energy specifically includes: for the first beam weight, taking the front as the desired direction and the rear as the null direction, and performing weighted summation on the input signals of the small-pitch microphone differential array to obtain the first output signal energy; or, calculating the second output signal energy specifically includes: for the second beam weight, taking the rear as the desired direction and the front as the null direction, and performing weighted summation on the input signals of the small-pitch microphone differential array to obtain the second output signal energy.

According to a second aspect of the embodiments of the present invention, there is provided a sound source localization device, including:

An array arrangement module for arranging a T-shaped distributed microphone array composed of several microphones, the T-shaped distributed microphone array including a first linear distributed microphone set and a second linear distributed microphone set arranged perpendicular to each other;

The first collection module is configured to calculate the first position information about the target based on the voice signal collected by the first linear distributed microphone set;

The second collection module is configured to calculate second position information about the target based on the voice signal collected by the second linear distributed microphone set, and determine according to the first position information and the second position information 360° full plane positioning information about the target.

According to a third aspect of the embodiments of the present invention, there is provided a sound source localization device, including:

A processor;

A memory for storing processor executable instructions;

Wherein, the processor is configured to execute the steps of the method described in any embodiment of the first aspect.

According to a fourth aspect of the embodiments of the present invention, there is provided a computer-readable storage medium having computer instructions stored thereon, characterized in that, when the instructions are executed by a processor, the steps of the method described in any of the embodiments of the first aspect are implemented.

Compared with the prior art, the sound source localization method of the present invention uses the first linear distributed microphone set and the second linear distributed microphone set, arranged perpendicular to each other, to divide the full plane in which the target to be located lies into two half-planes. It then obtains the position information of the target in the two half-planes in turn and calculates the full-plane position of the target by combining them. This overcomes the inability of traditional sound source localization methods to distinguish the front and rear directions of a narrow-frame flat-panel shape, thereby effectively improving the localization accuracy for targets with that shape.

Other features and advantages of the present invention will be described in the following description, and partly become obvious from the description, or understood by implementing the present invention. The purpose and other advantages of the present invention can be realized and obtained by the structures specifically pointed out in the written description, claims, and drawings.

The technical solutions of the present invention will be further described in detail below through the accompanying drawings and embodiments.

Description of the drawings

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative work.

Fig. 1 is a schematic flowchart of a sound source localization method provided by the present invention;

Fig. 2 is a schematic diagram of the distribution of microphone arrays in a sound source localization method provided by the present invention;

Fig. 3 is a block diagram of a sound source localization device in an embodiment of the present invention;

Fig. 4 is a block diagram of a sound source localization device in an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.

Refer to FIG. 1, which is a schematic flowchart of a sound source localization method provided by an embodiment of the present invention. The method arranges a microphone array with a specific shape distribution near the target to be located. The microphone array divides the space near the target into four different regions: a front region, a rear region, a left region, and a right region. When the target has a narrow-frame flat-panel shape, these four regions can be resolved in turn within different half-planes, which effectively overcomes the inability of existing sound source localization methods to distinguish the front and rear directions of such a target. The basic idea of position detection in the method of the present invention is to collect different voice signals through two different linearly distributed microphone sets in the microphone array and to calculate different position information from those signals. Each piece of position information is different position data about the target, so the full-plane positioning information of the target can be determined from them. Preferably, the microphones in the microphone array together form a T-shaped distribution.

Specifically, the sound source localization method may include the following steps:

Step (1): arranging a T-shaped distributed microphone array composed of several microphones. The T-shaped distributed microphone array includes a first linear distributed microphone set and a second linear distributed microphone set arranged perpendicular to each other.

Preferably, in step (1), arranging the microphone array in a T-shaped distribution specifically includes: arranging a plurality of microphones at a predetermined interval along a first direction to form the first linear distributed microphone set, and arranging at least one microphone in a second direction perpendicular to the first direction to form the second linear distributed microphone set. The first linear distributed microphone set may include at least three microphones, and the second linear distributed microphone set may include at least two microphones; further, the first linear distributed microphone set and the second linear distributed microphone set may have one or several microphones in common.

Preferably, in step (1), all the microphones in the second linear distributed microphone set may be located on the same side of the first linear distributed microphone set, and the second direction passes through one of the microphones in the first linear distributed microphone set. All the microphones in the first linear distributed microphone set can be evenly distributed on both sides of the second linear distributed microphone set, with the axis of the second linear distributed microphone set as the axis of symmetry. Further, the distances between all adjacent microphones in the first linear distributed microphone set or in the second linear distributed microphone set may be equal, so that the different microphones within each set have the same receiving performance in the sound field.

Step (2), based on the voice signal collected by the first linear distributed microphone set, calculate the first position information about the target.

Preferably, in this step (2), calculating the first position information may specifically include: first obtaining the voice signal of each microphone in the first linear distributed microphone set, and then calculating the first position information based on a time delay estimation algorithm, where the first position information is the 0°-180° half-plane position information of the target, referenced to the direction along which the second linear distributed microphone set is arranged. Further, in the process of calculating the first position information, each microphone in the first linear distributed microphone set is controlled to receive the sound signal generated by the target in real time, and the sound signals received by the microphones are then processed by the time delay estimation algorithm to calculate the first position information.
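As an illustration of how an estimated time delay maps to a 0°-180° half-plane angle, the following is a minimal sketch under a far-field (plane wave) assumption. The function name `delay_to_angle`, the speed of sound of 343 m/s, and the sign convention (a positive delay means the wave reaches the microphone at the positive end of the first direction earlier) are assumptions made for this example and are not taken from the patent text.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def delay_to_angle(tau, spacing, c=SPEED_OF_SOUND):
    """Return the far-field arrival angle in degrees (0..180).

    tau     -- estimated delay between two microphones, in seconds
    spacing -- distance between the two microphones, in metres
    """
    # Plane-wave geometry: cos(theta) = c * tau / spacing.
    # Clamp so that small estimation errors cannot leave the acos domain.
    cos_theta = max(-1.0, min(1.0, c * tau / spacing))
    return math.degrees(math.acos(cos_theta))
```

A delay of zero corresponds to a source broadside to the first linear distributed microphone set (90°). A pair on a single line cannot tell whether that source is in front of or behind the line, which is why the second position information is needed.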

Preferably, in this step (2), calculating the first position information based on the time delay estimation algorithm may specifically include: performing cross-correlation processing on the voice signals corresponding to the microphones in the first linear distributed microphone set, and performing a steered response power search on the cross-correlation results, thereby calculating the first position information.
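The search over candidate directions can be sketched as follows. This is a simplified frequency-domain delay-and-sum version of a steered (controllable) power response search, not the patent's exact procedure; the steered power it computes equals the sum of the pairwise cross-correlations at the steered lags plus a constant term. The function name `srp_half_plane`, the far-field model, the 1° grid and the speed of sound are assumptions for this example.

```python
import numpy as np


def srp_half_plane(signals, mic_x, fs, c=343.0, step_deg=1.0):
    """Return the angle in [0, 180] degrees with the largest steered power.

    signals -- array of shape (num_mics, num_samples)
    mic_x   -- microphone coordinates along the first direction, in metres
    fs      -- sampling rate in Hz
    """
    signals = np.asarray(signals, dtype=float)
    mic_x = np.asarray(mic_x, dtype=float)
    n = signals.shape[1]
    spectra = np.fft.rfft(signals, axis=1)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    angles = np.arange(0.0, 180.0 + 0.5 * step_deg, step_deg)
    powers = np.empty(len(angles))
    for i, theta in enumerate(np.radians(angles)):
        # Far-field arrival time at each microphone relative to the origin:
        # microphones further along the source direction hear the wave earlier.
        delays = -mic_x * np.cos(theta) / c
        # Undo each channel's hypothesised delay, then sum the channels.
        steering = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
        powers[i] = np.sum(np.abs(np.sum(spectra * steering, axis=0)) ** 2)
    return float(angles[int(np.argmax(powers))])
```

The grid covers only 0° to 180° because a single line of microphones produces identical delays for a source and its mirror image across the line.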

Preferably, in this step (2), calculating the first position information based on the time delay estimation algorithm may specifically include: performing generalized cross-correlation function processing on the voice signals corresponding to the microphones in the first linear distributed microphone set, wherein the generalized cross-correlation function introduces a weighting function on the cross-power spectral density between different microphones, and then calculating the first position information from the generalized cross-correlation function according to the generalized cross-correlation phase transform algorithm.

The time delay estimation algorithm can be one based on a generalized cross-correlation function. Such an algorithm introduces a weighting function that adjusts the cross-power spectral density of the sound signals, so as to improve the accuracy of the delay estimate. Depending on the type of weighting function, the generalized cross-correlation function has a variety of variants; specifically, the algorithm can be the generalized cross-correlation phase transform method (GCC-PHAT). GCC-PHAT has a certain robustness to noise and reverberation, so using it reduces, to some extent, the interference that noise and/or reverberation in the voice signal of each microphone in the first linear distributed microphone set causes to the first position information. Simply put, a time delay estimation algorithm based on the generalized cross-correlation function estimates the time delay from the peak of the cross-correlation function between the sound signals collected by two microphones. This works because, within one sound source localization system, the sound signals received by all microphones of the array come from the same sound source, so the channel signals of different microphones are strongly correlated. Computing the correlation function between the channel signals of each pair of microphones therefore gives the time delay between them, and the position of the sound source can finally be calculated from these time delays.
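A common way to implement GCC-PHAT for one microphone pair is sketched below with NumPy. It is a generic textbook formulation offered for illustration, not code from the patent; the function name `gcc_phat`, the `max_tau` parameter and the small constant that guards the division are assumptions of this example.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref`, in seconds."""
    n = len(sig) + len(ref)             # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)          # cross-power spectrum
    cross /= np.abs(cross) + 1e-12      # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)       # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that lag 0 sits in the middle of the search window.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs
```

Setting `max_tau` to the microphone spacing divided by the speed of sound restricts the peak search to physically possible delays, which tends to make the estimate more robust.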

Step (3), based on the voice signal collected by the second linear distributed microphone set, calculate the second position information about the target, and determine the 360° full-plane positioning information of the target according to the first position information and the second position information.

Preferably, in step (3), calculating the second position information may specifically include: selecting one microphone from the first linear distributed microphone set and one microphone from the second linear distributed microphone set to form a small-pitch microphone differential array, and then calculating the second position information based on the small-pitch microphone differential array in combination with the corresponding differential array algorithm.

Preferably, in this step (3), calculating the second position information based on the small-pitch microphone differential array in combination with the corresponding differential array algorithm may specifically include: using the small-pitch microphone differential array as a fixed beamformer, obtaining the first-order differential beam pattern corresponding to the fixed beamformer, and calculating the second position information based on the first-order differential beam pattern.

Preferably, in this step (3), calculating the second position information based on the small-pitch microphone differential array in combination with the corresponding differential array algorithm may specifically include: using the small-pitch microphone differential array as a fixed beamformer, designing two different beam weights for the fixed beamformer, namely a first beam weight and a second beam weight, calculating the first output signal energy corresponding to the first beam weight and the second output signal energy corresponding to the second beam weight, and calculating the second position information according to the larger of the first output signal energy and the second output signal energy.

Preferably, in this step (3), calculating the first output signal energy may specifically include: for the first beam weight, taking the front as the desired direction and the rear as the null direction, and performing weighted summation on the input signals of the small-pitch microphone differential array to obtain the first output signal energy; or, calculating the second output signal energy may specifically include: for the second beam weight, taking the rear as the desired direction and the front as the null direction, and performing weighted summation on the input signals of the small-pitch microphone differential array to obtain the second output signal energy.
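The comparison of the two beam output energies can be sketched with a pair of back-to-back first-order cardioid beams. This is one possible choice of beam weights, assumed here for illustration; the patent does not fix the exact weights. The function name `front_or_back`, the frequency-domain fractional delay and the speed of sound are likewise assumptions of this example.

```python
import numpy as np


def front_or_back(x_front, x_back, spacing, fs, c=343.0):
    """Decide whether the source lies in front of or behind the microphone pair.

    x_front -- samples from the microphone nearer the front
    x_back  -- samples from the microphone nearer the rear
    spacing -- distance between the two microphones, in metres
    Returns (side, front_energy, back_energy).
    """
    n = len(x_front)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    XF = np.fft.rfft(x_front)
    XB = np.fft.rfft(x_back)
    # Phase term equal to the acoustic travel time across the pair.
    delay = np.exp(-2j * np.pi * freqs * spacing / c)
    # First beam: desired direction = front, null = rear.
    y_front = XF - XB * delay
    # Second beam: desired direction = rear, null = front.
    y_back = XB - XF * delay
    e_front = float(np.sum(np.abs(y_front) ** 2))
    e_back = float(np.sum(np.abs(y_back) ** 2))
    return ("front" if e_front >= e_back else "back"), e_front, e_back
```

A source directly in front falls into the null of the second beam, so the first energy dominates, and vice versa. For sources near the sides, both energies become similar and the decision is less reliable.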

Refer to FIG. 2, which is a schematic diagram of the distribution of the microphone array in a sound source localization method according to an embodiment of the present invention. FIG. 2 only schematically shows one distribution of several microphones in the microphone array; the distribution of the microphone array of the present invention is not limited to the situation shown in FIG. 2 and may take other forms, which are not listed here. As can be seen from FIG. 2, the microphone array includes a first linear microphone set composed of three microphones M1, M2, and M3 arranged in a horizontal direction, and a second linear microphone set composed of two microphones M2 and M4 arranged in a vertical direction. Preferably, the distances between adjacent microphones among M1, M2, and M3 are equal; preferably, M1 and M3 are distributed symmetrically on the two sides of the line on which M2 and M4 lie. The sound source localization process for the microphone array shown in FIG. 2 is the same as the process of the sound source localization method introduced above and is not described further here.
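The layout of FIG. 2 can be written down as coordinates, for example as in the sketch below. The spacings of 4 cm and 1 cm, the coordinate origin at M2, and the helper names are hypothetical values chosen for illustration; the patent does not specify dimensions.

```python
def t_array_positions(row_spacing=0.04, stem_spacing=0.01):
    """Return hypothetical (x, y) coordinates in metres for M1..M4."""
    return {
        "M1": (-row_spacing, 0.0),
        "M2": (0.0, 0.0),           # shared by both microphone sets
        "M3": (row_spacing, 0.0),
        "M4": (0.0, stem_spacing),  # placed on the front side of the row
    }


FIRST_SET = ("M1", "M2", "M3")   # horizontal line, used for the 0-180 degree angle
SECOND_SET = ("M2", "M4")        # vertical line, used for the front/back decision


def is_perpendicular(pos):
    """Check that the two microphone lines are at right angles."""
    row = (pos["M3"][0] - pos["M1"][0], pos["M3"][1] - pos["M1"][1])
    stem = (pos["M4"][0] - pos["M2"][0], pos["M4"][1] - pos["M2"][1])
    return abs(row[0] * stem[0] + row[1] * stem[1]) < 1e-12
```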

It can be seen from the above embodiments that the sound source localization method arranges a microphone array with a specific shape distribution, wherein the microphone array includes a first linear distributed microphone set and a second linear distributed microphone set arranged perpendicular to each other. Based on the first linear distributed microphone set and the second linear distributed microphone set, the first position information and the second position information about the target to be located are obtained respectively, and the 360° full-plane positioning information of the target is finally calculated from the first position information and the second position information. This overcomes the inability of traditional sound source localization methods to distinguish the front and rear directions of a narrow-frame flat-panel shape, thereby effectively improving the sound source localization accuracy for targets with that shape.
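One way to merge the two pieces of position information into a single 360° bearing is sketched below. The angle convention (0° along the first direction, front half-plane mapped to 0°-180°, rear half-plane mapped to 180°-360°) and the function name `full_plane_angle` are assumptions of this example; the patent only states that the two pieces of information are combined.

```python
def full_plane_angle(half_plane_deg, side):
    """Combine a 0..180 degree angle with a front/back decision.

    half_plane_deg -- first position information, in degrees within [0, 180]
    side           -- second position information, "front" or "back"
    """
    if not 0.0 <= half_plane_deg <= 180.0:
        raise ValueError("half_plane_deg must lie in [0, 180]")
    if side not in ("front", "back"):
        raise ValueError("side must be 'front' or 'back'")
    if side == "front":
        return float(half_plane_deg)
    # Mirror the angle across the first direction for sources behind the row.
    return (360.0 - half_plane_deg) % 360.0
```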

The following are device embodiments of the present invention, which can be used to implement the method embodiments of the present invention.

Fig. 3 is a schematic structural diagram of a sound source localization device provided by an embodiment of the present invention. The device can be implemented as part or all of an electronic device through software, hardware, or a combination of both. As shown in FIG. 3, the sound source localization device includes an array arrangement module 301, a first collection module 302 and a second collection module 303. The array arrangement module 301 is used to arrange a T-shaped distributed microphone array composed of several microphones, the T-shaped distributed microphone array including a first linear distributed microphone set and a second linear distributed microphone set arranged perpendicular to each other; the first collection module 302 is used to calculate the first position information about the target based on the voice signal collected by the first linear distributed microphone set; the second collection module 303 is used to calculate the second position information about the target based on the voice signal collected by the second linear distributed microphone set, and to determine the 360° full-plane positioning information about the target according to the first position information and the second position information.

Regarding the device in the above-mentioned embodiment, the specific operation mode of each module thereof has been described in detail in the embodiment of the method, and detailed description will not be given here.

The embodiment of the present invention also provides a sound source localization device, which includes:

processor;

A memory for storing processor executable instructions;

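Wherein, the processor is configured to execute:

Step (1), arranging a T-shaped distributed microphone array composed of several microphones, the T-shaped distributed microphone array including a first linear distributed microphone set and a second linear distributed microphone set arranged perpendicular to each other;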

Step (2), based on the voice signal collected by the first linear distributed microphone set, calculate the first position information about the target;

Step (3), based on the voice signal collected by the second linear distributed microphone set, calculate the second position information about the target, and determine the 360° full-plane positioning information about the target according to the first position information and the second position information.
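As an illustrative sketch only, the final combination in step (3) can be pictured as folding a 0°-180° half-plane angle and a front/back decision into a single 0°-360° azimuth. The function name `full_plane_azimuth`, the choice of measuring the angle from the axis of the first linear set, and the convention that the "front" is the side on which the second set is placed are assumptions made for this example and are not specified by the embodiments.

```python
def full_plane_azimuth(half_plane_deg: float, is_front: bool) -> float:
    """Fold a half-plane angle and a front/back flag into a 0-360 degree azimuth.

    Assumed convention (hypothetical): the angle is measured from the axis of the
    first linear microphone set, and "front" is the side of the second set.
    """
    if not 0.0 <= half_plane_deg <= 180.0:
        raise ValueError("half-plane angle must lie in [0, 180] degrees")
    # A front source keeps its angle; a back source is mirrored across the array axis.
    azimuth = half_plane_deg if is_front else 360.0 - half_plane_deg
    return azimuth % 360.0
```

Under these assumptions, a half-plane estimate of 60° maps to 60° for a front source and to 300° for a back source.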

In an embodiment, the above-mentioned processor may also be configured to:

In step (1), arranging the microphone array in a T-shaped distribution specifically includes: arranging a plurality of microphones at a predetermined interval along a first direction to form the first linear distributed microphone set, and arranging at least one microphone along a second direction perpendicular to the first direction to form the second linear distributed microphone set.
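The geometry described above can be sketched as a set of planar coordinates. This is a minimal example under stated assumptions: the function `t_array_positions`, its parameters, the default 1 cm stem spacing, and the choice of placing the first direction along the x axis are all hypothetical and serve only to make the layout concrete.

```python
import numpy as np

def t_array_positions(n_line, spacing, n_stem=1, stem_spacing=0.01, stem_index=None):
    """Return (n_line + n_stem, 2) microphone coordinates in metres.

    Assumptions (hypothetical): the first linear set lies on the x axis, centred
    at the origin; the second set extends along +y from one microphone of the
    first set, so all of its microphones are on the same side.
    """
    if n_line < 2 or n_stem < 1:
        raise ValueError("need at least two line microphones and one stem microphone")
    if stem_index is None:
        stem_index = n_line // 2
    # First linear set: equally spaced along the first direction (x).
    x = (np.arange(n_line) - (n_line - 1) / 2.0) * spacing
    line = np.column_stack([x, np.zeros(n_line)])
    # Second linear set: along the perpendicular direction (y), above the chosen microphone.
    y = stem_spacing * np.arange(1, n_stem + 1)
    stem = np.column_stack([np.full(n_stem, x[stem_index]), y])
    return np.vstack([line, stem])
```

For example, `t_array_positions(4, 0.05)` yields four microphones on the x axis and one additional microphone directly above the third of them.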

In an embodiment, the above-mentioned processor may also be configured to:

In step (1), the microphones in the second linear distributed microphone set are all located on the same side of the first linear distributed microphone set, and the second direction passes through one of the microphones in the first linear distributed microphone set.

In an embodiment, the above-mentioned processor may also be configured to:

In step (2), calculating the first position information specifically includes: first obtaining the voice signal of each microphone in the first linear distributed microphone set, and then calculating the first position information based on a time delay estimation algorithm, where the first position information is the 0°-180° half-plane position information of the target, referenced to the setting direction of the second linear distributed microphone set.

In an embodiment, the above-mentioned processor may also be configured to:

In step (2), calculating the first position information based on the time delay estimation algorithm specifically includes: performing cross-correlation processing on the voice signal corresponding to each microphone in the first linear distributed microphone set, and performing a controllable power response search (commonly known as a steered response power, SRP, search) on the results of the cross-correlation processing, thereby calculating the first position information.
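A minimal sketch of such a search is given below, assuming a far-field source, a speed of sound of 343 m/s, integer-sample lag rounding, and a 1° search grid. The function `srp_half_plane_angle` and its arguments are hypothetical names, and the embodiments do not prescribe this particular formulation.

```python
import numpy as np
from itertools import combinations

SPEED_OF_SOUND = 343.0  # m/s, assumed value

def srp_half_plane_angle(signals, mic_x, fs, angles_deg=None):
    """Estimate a 0-180 degree angle by summing pairwise cross-correlations.

    signals: array of shape (n_mics, n_samples); mic_x: positions along the
    first direction in metres; fs: sampling rate in Hz.
    """
    signals = np.asarray(signals, dtype=float)
    mic_x = np.asarray(mic_x, dtype=float)
    if angles_deg is None:
        angles_deg = np.arange(0.0, 181.0, 1.0)
    n = signals.shape[1]
    cos_t = np.cos(np.deg2rad(angles_deg))
    power = np.zeros(len(angles_deg))
    for i, j in combinations(range(len(mic_x)), 2):
        # cc[k + n - 1] = sum_t signals[i][t + k] * signals[j][t]
        cc = np.correlate(signals[i], signals[j], mode="full")
        # Far-field lag (samples) of microphone i relative to j for each candidate angle.
        lags = np.round(-(mic_x[i] - mic_x[j]) * cos_t * fs / SPEED_OF_SOUND).astype(int)
        lags = np.clip(lags, -(n - 1), n - 1)
        # Accumulate the correlation value at the lag each candidate angle predicts.
        power += cc[lags + n - 1]
    return float(angles_deg[np.argmax(power)])
```

Because a linear array cannot tell a source from its mirror image across the array axis, the returned angle is only meaningful within the 0°-180° half-plane, which is why a separate front/back decision is needed.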

In an embodiment, the above-mentioned processor may also be configured to:

In step (2), calculating the first position information based on the time delay estimation algorithm specifically includes: performing generalized cross-correlation function processing on the voice signal corresponding to each microphone in the first linear distributed microphone set, where the generalized cross-correlation function also introduces a weighting function on the cross power spectral density between different microphones, and then calculating the first position information according to the generalized cross-correlation phase transform (GCC-PHAT) algorithm of the generalized cross-correlation function.
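The following sketch illustrates one common way to realise a GCC-PHAT delay estimate for a single microphone pair and to convert that delay into a half-plane angle. It assumes a far-field source and a speed of sound of 343 m/s; the function names `gcc_phat_delay` and `delay_to_half_plane_angle`, and the sign convention for the baseline, are hypothetical choices for this example.

```python
import numpy as np

def gcc_phat_delay(sig, ref, fs, max_tau=None):
    """Return the delay (seconds) of sig relative to ref using PHAT weighting."""
    n = len(sig) + len(ref)
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    # PHAT weighting: keep only the phase of the cross power spectrum.
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index max_shift corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs

def delay_to_half_plane_angle(tau, baseline, c=343.0):
    """Map a delay to a 0-180 degree angle; baseline = x_sig - x_ref in metres."""
    # The microphone nearer the source hears it earlier, hence the minus sign.
    cos_theta = np.clip(-tau * c / baseline, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))
```

The clipping step guards against delays that slightly exceed the physical maximum because of noise or sampling resolution.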

In an embodiment, the above-mentioned processor may also be configured to:

In step (3), calculating the second position information specifically includes: selecting one of the microphones in the first linear distributed microphone set and one of the microphones in the second linear distributed microphone set to form a small-pitch microphone differential array, and then calculating the second position information based on the small-pitch microphone differential array in combination with the corresponding differential array algorithm.

In an embodiment, the above-mentioned processor may also be configured to:

In step (3), calculating the second position information based on the small-pitch microphone differential array in combination with the corresponding differential array algorithm specifically includes: using the small-pitch microphone differential array as a fixed beamformer, obtaining the first-order differential beam pattern corresponding to the fixed beamformer, and calculating the second position information based on the first-order differential beam pattern.
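For orientation, the textbook first-order differential beam pattern for a closely spaced microphone pair has the form B(θ) = α + (1 − α)·cos θ, where θ is measured from the axis joining the two microphones. The sketch below evaluates this standard expression; the parameter `alpha` and the function name `first_order_pattern` are hypothetical, and the embodiments do not state which pattern is actually used.

```python
import numpy as np

def first_order_pattern(theta_deg, alpha=0.5):
    """Evaluate the standard first-order differential pattern at angle(s) theta.

    alpha = 0.5 gives a cardioid (null at 180 degrees); alpha = 0 gives a dipole.
    """
    theta = np.deg2rad(np.asarray(theta_deg, dtype=float))
    return alpha + (1.0 - alpha) * np.cos(theta)
```

With the cardioid setting, the response is largest toward the look direction and vanishes in the opposite direction, which is the property that makes a front/back decision possible.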

In an embodiment, the above-mentioned processor may also be configured to:

In step (3), calculating the second position information based on the small-pitch microphone differential array in combination with the corresponding differential array algorithm specifically includes: using the small-pitch microphone differential array as a fixed beamformer, designing a first beam weight and a second beam weight, different from each other, for the fixed beamformer, calculating the first output signal energy and the second output signal energy corresponding to the first beam weight and the second beam weight respectively, and then calculating the second position information according to the larger one of the first output signal energy and the second output signal energy.

In an embodiment, the above-mentioned processor may also be configured to:

Calculating the first output signal energy specifically includes: for the first beam weight, taking the desired direction of the first beam weight as the front and the null direction of the first beam weight as the back, and performing weighted summation processing on the input signals of the small-pitch microphone differential array to obtain the first output signal energy; or, calculating the second output signal energy specifically includes: for the second beam weight, taking the desired direction of the second beam weight as the back and the null direction of the second beam weight as the front, and performing weighted summation processing on the input signals of the small-pitch microphone differential array to obtain the second output signal energy.
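One way to picture the two beam weights is as a pair of back-to-back cardioid beams formed by delay-and-subtract processing in the frequency domain. The sketch below is an assumption-laden illustration, not the embodiment's definitive weights: the function `front_back_decision`, the use of a pure inter-microphone delay as the weight, the 343 m/s speed of sound, and the convention that the stem microphone lies toward the front are all hypothetical.

```python
import numpy as np

def front_back_decision(x_line, x_stem, fs, spacing, c=343.0):
    """Compare the energies of a front-facing and a back-facing cardioid beam.

    x_line: signal of the microphone taken from the first linear set;
    x_stem: signal of the microphone taken from the second set (assumed in front);
    spacing: distance between the two microphones in metres.
    """
    n = len(x_line)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Phase shift equal to the acoustic travel time between the two microphones.
    delay = np.exp(-2j * np.pi * freqs * spacing / c)
    X_line = np.fft.rfft(x_line)
    X_stem = np.fft.rfft(x_stem)
    # First beam weight [1, -delay]: desired direction front, null toward the back.
    front_beam = X_stem - delay * X_line
    # Second beam weight [-delay, 1]: desired direction back, null toward the front.
    back_beam = X_line - delay * X_stem
    e_front = float(np.sum(np.abs(front_beam) ** 2))
    e_back = float(np.sum(np.abs(back_beam) ** 2))
    return ("front" if e_front >= e_back else "back"), e_front, e_back
```

The beam whose null points at the source produces almost no output, so the larger of the two energies indicates the side on which the source lies.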

Fig. 4 is a structural block diagram of a sound source localization device provided by an embodiment of the present invention. For example, the device 40 may be provided as a server. The device 40 includes a processing component 402, which further includes one or more processors, and a memory resource represented by the memory 404, for storing instructions executable by the processing component 402, such as application programs. The application program stored in the memory 404 may include one or more modules each corresponding to a set of instructions. In addition, the processing component 402 is configured to execute instructions to perform the above-mentioned method.

The device 40 may also include a power component 406 configured to perform power management of the device 40, a wired or wireless network interface 408 configured to connect the device 40 to a network, and an input/output (I/O) interface 410. The device 40 can operate based on an operating system stored in the memory 404, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.

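The embodiments of the present disclosure also provide a non-transitory computer-readable storage medium. When the instructions in the storage medium are executed by the processor of the device 40, the device 40 can execute the above sound source localization method. The method includes:

Step (1), arranging a T-shaped distributed microphone array composed of several microphones, the T-shaped distributed microphone array including a first linear distributed microphone set and a second linear distributed microphone set arranged perpendicular to each other;

Step (2), based on the voice signal collected by the first linear distributed microphone set, calculating the first position information about the target;

Step (3), based on the voice signal collected by the second linear distributed microphone set, calculating the second position information about the target, and determining the 360° full-plane positioning information about the target according to the first position information and the second position information.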

In one embodiment, in step (1), arranging the microphone array in a T-shaped distribution specifically includes: arranging a plurality of microphones at a predetermined interval along a first direction to form the first linear distributed microphone set, and arranging at least one microphone along a second direction perpendicular to the first direction to form the second linear distributed microphone set.

In one embodiment, in step (1), the microphones in the second linear distributed microphone set are all located on the same side of the first linear distributed microphone set, and the second direction passes through one of the microphones in the first linear distributed microphone set.

In one embodiment, in step (2), calculating the first position information specifically includes: first obtaining the voice signal of each microphone in the first linear distributed microphone set, and then calculating the first position information based on a time delay estimation algorithm, wherein the first position information is the 0°-180° half-plane position information of the target, referenced to the setting direction of the second linear distributed microphone set.

In one embodiment, in step (2), calculating the first position information based on the time delay estimation algorithm specifically includes: performing cross-correlation processing on the voice signal corresponding to each microphone in the first linear distributed microphone set, and performing a controllable power response search on the cross-correlation results obtained by the cross-correlation processing, thereby calculating the first position information.

In one embodiment, in step (2), calculating the first position information based on the time delay estimation algorithm specifically includes: performing generalized cross-correlation function processing on the voice signal corresponding to each microphone in the first linear distributed microphone set, wherein the generalized cross-correlation function also introduces a weighting function on the cross power spectral density between different microphones, and then calculating the first position information according to the generalized cross-correlation phase transform algorithm of the generalized cross-correlation function.

In one embodiment, in step (3), calculating the second position information specifically includes: selecting one of the microphones in the first linear distributed microphone set and one of the microphones in the second linear distributed microphone set to form a small-pitch microphone differential array, and then calculating the second position information based on the small-pitch microphone differential array in combination with the corresponding differential array algorithm.

In one embodiment, in step (3), calculating the second position information based on the small-pitch microphone differential array in combination with the corresponding differential array algorithm specifically includes: using the small-pitch microphone differential array as a fixed beamformer, obtaining the first-order differential beam pattern corresponding to the fixed beamformer, and calculating the second position information based on the first-order differential beam pattern.

In one embodiment, in step (3), calculating the second position information based on the small-pitch microphone differential array in combination with the corresponding differential array algorithm specifically includes: using the small-pitch microphone differential array as a fixed beamformer, designing a first beam weight and a second beam weight, different from each other, for the fixed beamformer, calculating the first output signal energy and the second output signal energy corresponding to the first beam weight and the second beam weight respectively, and then calculating the second position information according to the larger one of the first output signal energy and the second output signal energy.

In one embodiment, calculating the first output signal energy specifically includes: for the first beam weight, taking the desired direction of the first beam weight as the front and the null direction of the first beam weight as the back, and performing weighted summation processing on the input signals of the small-pitch microphone differential array to obtain the first output signal energy; or, calculating the second output signal energy specifically includes: for the second beam weight, taking the desired direction of the second beam weight as the back and the null direction of the second beam weight as the front, and performing weighted summation processing on the input signals of the small-pitch microphone differential array to obtain the second output signal energy.

Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. In this way, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include these modifications and variations.

Claims

A sound source localization method, characterized in that the sound source localization method includes the following steps:

Step (1), arranging a T-shaped distributed microphone array composed of several microphones, the T-shaped distributed microphone array including a first linear distributed microphone set and a second linear distributed microphone set arranged perpendicular to each other;

Step (2): Calculate first position information about the target based on the voice signal collected by the first linear distributed microphone set;

Step (3), based on the voice signal collected by the second linear distributed microphone set, calculate second position information about the target, and determine the 360° full-plane positioning information about the target according to the first position information and the second position information.
The sound source localization method according to claim 1, characterized in that: in step (1), arranging the microphone array in a T-shaped distribution specifically includes arranging a plurality of microphones at a predetermined interval along a first direction to form the first linear distributed microphone set, and arranging at least one microphone along a second direction perpendicular to the first direction to form the second linear distributed microphone set.
The sound source localization method according to claim 2, wherein in step (1), all microphones in the second linear distributed microphone set are located on the same side of the first linear distributed microphone set, and the second direction passes through one of the microphones in the first linear distributed microphone set.
The sound source localization method according to claim 1, wherein in step (2), calculating the first position information specifically includes first obtaining the voice signal of each microphone in the first linear distributed microphone set, and then calculating the first position information based on a time delay estimation algorithm, wherein the first position information is the 0°-180° half-plane position information of the target, referenced to the setting direction of the second linear distributed microphone set.
The sound source localization method according to claim 4, wherein in step (2), calculating the first position information based on the time delay estimation algorithm specifically includes: performing cross-correlation processing on the voice signal corresponding to each microphone in the first linear distributed microphone set, and performing a controllable power response search on the cross-correlation results obtained by the cross-correlation processing, so as to calculate the first position information.
The sound source localization method according to claim 4, wherein in step (2), calculating the first position information based on the time delay estimation algorithm specifically includes: performing generalized cross-correlation function processing on the voice signal corresponding to each microphone in the first linear distributed microphone set, wherein the generalized cross-correlation function also introduces a weighting function on the cross power spectral density between different microphones, and then calculating the first position information according to the generalized cross-correlation phase transform algorithm of the generalized cross-correlation function.
The sound source localization method according to claim 1, wherein in step (3), calculating the second position information specifically includes selecting one of the microphones in the first linear distributed microphone set and one of the microphones in the second linear distributed microphone set to form a small-pitch microphone differential array, and then calculating the second position information based on the small-pitch microphone differential array in combination with a corresponding differential array algorithm.
The sound source localization method according to claim 7, characterized in that: in step (3), calculating the second position information based on the small-pitch microphone differential array in combination with the corresponding differential array algorithm specifically includes: using the small-pitch microphone differential array as a fixed beamformer, obtaining the first-order differential beam pattern corresponding to the fixed beamformer, and calculating the second position information based on the first-order differential beam pattern.
The sound source localization method according to claim 7, characterized in that: in step (3), calculating the second position information based on the small-pitch microphone differential array in combination with the corresponding differential array algorithm specifically includes: using the small-pitch microphone differential array as a fixed beamformer, designing a first beam weight and a second beam weight, different from each other, for the fixed beamformer, calculating the first output signal energy and the second output signal energy corresponding to the first beam weight and the second beam weight respectively, and then calculating the second position information according to the larger one of the first output signal energy and the second output signal energy.
The sound source localization method according to claim 9, characterized in that: calculating the first output signal energy specifically includes, for the first beam weight, taking the desired direction of the first beam weight as the front and the null direction of the first beam weight as the back, and performing weighted summation processing on the input signals of the small-pitch microphone differential array to obtain the first output signal energy; or, calculating the second output signal energy specifically includes, for the second beam weight, taking the desired direction of the second beam weight as the back and the null direction of the second beam weight as the front, and performing weighted summation processing on the input signals of the small-pitch microphone differential array to obtain the second output signal energy.
A sound source localization device, characterized in that it comprises:

An array arrangement module for arranging a T-shaped distributed microphone array composed of several microphones, the T-shaped distributed microphone array including a first linear distributed microphone set and a second linear distributed microphone set arranged perpendicular to each other;

The first collection module is configured to calculate the first position information about the target based on the voice signal collected by the first linear distributed microphone set;

The second collection module is configured to calculate second position information about the target based on the voice signal collected by the second linear distributed microphone set, and to determine 360° full-plane positioning information about the target according to the first position information and the second position information.
A sound source localization device, characterized in that it comprises:

processor;

A memory for storing processor executable instructions;

Wherein, the processor is configured to perform the steps of the method according to any one of claims 1-10.
A computer-readable storage medium having computer instructions stored thereon, wherein the instructions implement the steps of the method according to any one of claims 1-10 when the instructions are executed by a processor.