CN113311391A - Sound source positioning method, device and equipment based on microphone array and storage medium

Info

Publication number: CN113311391A
Application number: CN202110452117.3A
Authority: CN
Inventors: 陈英博
Original assignee: Pulian International Co ltd
Current assignee: Pulian International Co ltd
Priority date: 2021-04-25
Filing date: 2021-04-25
Publication date: 2021-08-27

Abstract

The invention relates to the technical field of sound source positioning, and discloses a sound source positioning method, a sound source positioning device, positioning equipment and a storage medium based on a microphone array, wherein the method comprises the following steps: acquiring output signals of a plurality of microphone arrays; calculating the similarity of any two output signals of different microphone arrays, and classifying all the output signals according to the similarity to obtain a classification result; and determining the number and the position information of the sound sources according to the classification result. According to the invention, the similarity of all output signals of the plurality of microphone arrays is calculated, and the output signals are classified according to the similarity, so that the position of a sound source is determined according to the classification result, and the problem that the sound source matching cannot be realized in the prior art is solved.

Description

Sound source positioning method, device and equipment based on microphone array and storage medium

Technical Field

The invention relates to the technical field of sound source positioning, in particular to a sound source positioning method, a sound source positioning device, sound source positioning equipment and a storage medium based on a microphone array.

Background

In the technical field of sound source positioning, a plurality of microphone arrays are generally used to locate a plurality of sound sources. During positioning, each microphone array picks up several sound sources and produces an output signal for each of them, so the system as a whole yields many output signals for many sound sources. The prior art, however, cannot perform sound source matching on these output signals: it cannot determine which outputs of the different microphone arrays correspond to the same sound source, and therefore cannot determine the position of that sound source.

Disclosure of Invention

The embodiment of the invention aims to provide a sound source positioning method, a sound source positioning device, positioning equipment and a storage medium based on a microphone array.

In order to achieve the above object, an embodiment of the present invention provides a sound source localization method based on a microphone array, including:

acquiring output signals of a plurality of microphone arrays;

calculating the similarity of any two output signals of different microphone arrays, and classifying all the output signals according to the similarity to obtain a classification result;

and determining the number and the position information of the sound sources according to the classification result.

Preferably, the calculating the similarity between any two output signals of different microphone arrays and classifying all the output signals according to the similarity to obtain a classification result specifically includes:

obtaining an initial classification result according to an output signal of any microphone array;

calculating the similarity between any output signal of other microphone arrays and all categories in the initial classification result, and acquiring the maximum similarity;

when the maximum similarity is larger than a preset threshold value, classifying any output signal into the category of the initial classification result corresponding to the maximum similarity;

and when the maximum similarity is smaller than a preset threshold value, updating the initial classification result according to any output signal.

Preferably, the similarity is calculated from a cross-correlation function.

Preferably, the calculating the similarity between any output signal of the other microphone array and all the categories in the initial classification result specifically includes:

calculating a similarity r according to the formula, wherein $S_i$ represents the i-th frequency-domain signal of the output signal corresponding to any one of the categories in the initial classification result, $\bar{S}$ represents the average of all frequency-domain signals of that output signal, $T_j$ represents the j-th frequency-domain signal of any output signal of the other microphone arrays, and $\bar{T}$ represents the average of all frequency-domain signals of that output signal.
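As a hedged illustration only: one expression that is consistent with the symbol definitions above, and with a similarity bounded between 0 and 1, is the absolute value of the Pearson correlation coefficient of the two spectra. This is an assumption, not the formula of the original publication. It further assumes that $S_k$ and $T_k$ are real-valued magnitudes compared over a common index range $k = 1, \dots, K$, and that $\bar{S}$ and $\bar{T}$ denote the two averages described above.

```latex
r = \frac{\left| \sum_{k=1}^{K} (S_k - \bar{S})(T_k - \bar{T}) \right|}
         {\sqrt{\sum_{k=1}^{K} (S_k - \bar{S})^2 \; \sum_{k=1}^{K} (T_k - \bar{T})^2}}
```

By the Cauchy-Schwarz inequality this quantity always lies in the interval [0, 1], and it equals 1 when one spectrum is a scaled and offset copy of the other.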

Preferably, the determining the number and the position information of the sound sources according to the classification result specifically includes:

determining the number of sound sources according to the category number of the classification result;

and determining the position information of the corresponding sound source according to the output signal corresponding to each category.

Preferably, before the determining the number of sound sources according to the number of categories of the classification result, the method further includes:

and deleting the categories of which the number of the output signals in the classification result is less than the preset number.

Another embodiment of the present invention provides a sound source localization apparatus based on a microphone array, including:

the signal acquisition module is used for acquiring output signals of a plurality of microphone arrays;

the classification module is used for calculating the similarity of any two output signals and classifying all the output signals according to the similarity to obtain a classification result;

and the positioning module is used for determining the number and the position information of the sound sources according to the classification result.

Another embodiment of the present invention provides a microphone array based sound source localization apparatus, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the microphone array based sound source localization method as described in any one of the above items when executing the computer program.

Another embodiment of the present invention provides a computer-readable storage medium comprising a stored computer program, wherein the computer program, when executed, controls an apparatus on which the computer-readable storage medium is located to perform any one of the above-mentioned microphone array based sound source localization methods.

Compared with the prior art, the sound source positioning method, the sound source positioning device, the sound source positioning equipment and the storage medium based on the microphone arrays provided by the embodiment of the invention calculate the similarity of all output signals of the microphone arrays and classify the output signals according to the similarity, so that which output signals correspond to the same sound source is determined, and the problem that the sound source matching cannot be realized in the prior art is solved. Meanwhile, after the output signal corresponding to each sound source is determined, the position information of the sound source can be determined according to the corresponding output signal, and the sound source can be accurately positioned.

Drawings

Fig. 1 is a schematic flowchart of a sound source localization method based on a microphone array according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of microphone arrays used for sound source localization according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a sound source localization apparatus based on a microphone array according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of a sound source localization apparatus based on a microphone array according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 is a schematic flowchart of the sound source localization method based on a microphone array according to the embodiment of the present invention. The method includes steps S1 to S3:

S1, acquiring output signals of a plurality of microphone arrays;

S2, calculating the similarity of any two output signals of different microphone arrays, and classifying all the output signals according to the similarity to obtain a classification result;

S3, determining the number and the position information of the sound sources according to the classification result.

It should be noted that one microphone array can monitor a plurality of sound sources in a space. When several microphone arrays are present in the space, the microphone array system as a whole outputs many signals, but these signals are not classified or matched, so it is not known which output signals correspond to the same sound source. For ease of understanding, the embodiment of the present invention provides a schematic diagram of microphone arrays used for sound source localization; see Fig. 2. As can be seen from Fig. 2, the first microphone array O1 detects three sound sources and therefore outputs three output signals, shown as the three dashed lines from O1 in Fig. 2, each dashed line representing one sound source signal. Each array may detect a different number of signals. For example, when O1 and O3 are far apart and the sound source P2 is far from O3, O3 cannot detect P2 and can only detect P1 and P3. The three microphone arrays in Fig. 2 produce 8 output signals in total, but the prior art cannot determine which output signals correspond to the same sound source; the present invention aims to solve this technical problem.

Specifically, a plurality of microphone arrays are controlled to monitor a space containing multiple sound sources, and the output signals of the microphone arrays are then acquired. Typically, each microphone array outputs N output signals, one for each sound source it detects, and each output signal includes a pitch angle, an azimuth angle and an audio signal. If the number of sound sources in the space is W, then N ≤ W, because some sound sources may be too far from a given microphone array to be detected, in which case no corresponding signal is output.
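The structure of one output signal can be sketched as a small record. The following is a minimal illustration, not the implementation of the invention: the class name `OutputSignal`, the field `array_id`, the use of radians, and the angle convention in `direction()` (azimuth measured in the x-y plane from the x axis, pitch measured upward from that plane) are all assumptions introduced here.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(eq=False)  # eq=False avoids ambiguous element-wise array comparison
class OutputSignal:
    """One output of one microphone array (hypothetical layout)."""
    array_id: int       # which microphone array produced it (assumed field)
    pitch: float        # pitch angle in radians (assumed unit)
    azimuth: float      # azimuth angle in radians (assumed unit)
    audio: np.ndarray   # audio samples attributed to this sound source

    def direction(self) -> np.ndarray:
        """Unit vector toward the source in array coordinates (assumed convention)."""
        cp = np.cos(self.pitch)
        return np.array([cp * np.cos(self.azimuth),
                         cp * np.sin(self.azimuth),
                         np.sin(self.pitch)])


# Example: a source straight along the array's y axis, level with the array.
sig = OutputSignal(array_id=1, pitch=0.0, azimuth=np.pi / 2, audio=np.zeros(4))
vec = sig.direction()
```

Keeping the angles and the audio together matters later: the audio is used for matching, while the angles are used for positioning.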

Since the output signals of one microphone array correspond to different sound sources, they are necessarily dissimilar, so the similarity between them need not be calculated, which reduces the amount of computation. The similarity is therefore calculated only between output signals of different microphone arrays, and all the output signals are classified according to the similarity to obtain a classification result. Note that the similarity is generally calculated from the audio signals contained in the output signals, since the audio from the same sound source will be similar.

The number and the position information of the sound sources are then determined according to the classification result. That is, each category corresponds to one sound source, and the position of that sound source can be determined from the pitch angles and azimuth angles of the output signals in the category.

The embodiment of the invention provides a sound source positioning method based on a microphone array, which classifies output signals according to the similarity by calculating the similarity of all the output signals of a plurality of microphone arrays, determines the position of a sound source according to a classification result and solves the problem that the sound source matching cannot be realized in the prior art.

As an improvement of the above scheme, calculating the similarity between any two output signals of different microphone arrays and classifying all the output signals according to the similarity to obtain a classification result proceeds as follows.

Specifically, an initial classification result is obtained from the output signal of any microphone array. For example, if the first microphone array has K output signals, each output signal is taken as a class, and the initial classification result has K classes.

The similarity between any output signal of the other microphone arrays and every category in the initial classification result is calculated, and the maximum similarity is obtained. Note that calculating the similarity between an output signal and a category means calculating its similarity to each output signal in that category.

When the maximum similarity is greater than a preset threshold, the output signal is assigned to the category of the initial classification result that yields the maximum similarity. In this case the output signal and that category are considered to correspond to the same sound source, so they need to be placed in the same category.

When the maximum similarity is smaller than the preset threshold, the output signal does not match any existing category, and the initial classification result needs to be updated: the output signal is added to the initial classification result as a new category of its own. In the subsequent similarity calculations for other output signals, the similarity to this new category is also calculated.

To aid understanding of this embodiment, an example is described below. Suppose the first microphone array has 3 output signals, so 3 sets are established in advance: C1 = {O(1,1)}, C2 = {O(1,2)} and C3 = {O(1,3)}. For the 1st output signal O(2,1) of the second microphone array, the similarity of O(2,1) to each element in each existing set is calculated; if the similarity of O(2,1) to every element in C1, C2 and C3 is smaller than the threshold T, a new set C4 = {O(2,1)} is created for O(2,1). For the 2nd output signal O(2,2) of the second microphone array, if the similarity between O(2,2) and O(1,1) is found to be greater than the threshold T, O(2,2) is added to the set C1 containing O(1,1). At this point there are 4 sets: C1 = {O(1,1), O(2,2)}, C2 = {O(1,2)}, C3 = {O(1,3)} and C4 = {O(2,1)}. The output signals of the other microphone arrays are processed in the same way, which is not repeated here.
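The threshold-based classification described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `classify`, the string labels, and the toy similarity (1.0 when two signals come from the same hypothetical source, 0.0 otherwise) are introduced here for demonstration. The text does not say what happens when the maximum similarity equals the threshold; this sketch opens a new category in that case.

```python
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")


def classify(arrays: Sequence[Sequence[S]],
             similarity: Callable[[S, S], float],
             threshold: float) -> List[List[S]]:
    """Group output signals so that each group should share one sound source."""
    if not arrays:
        return []
    # Initial classification: every output of the first array is its own category.
    categories: List[List[S]] = [[sig] for sig in arrays[0]]
    for outputs in arrays[1:]:
        for sig in outputs:
            best_idx, best_score = -1, float("-inf")
            for idx, cat in enumerate(categories):
                # Similarity to a category: best similarity to any of its members.
                score = max(similarity(sig, member) for member in cat)
                if score > best_score:
                    best_idx, best_score = idx, score
            if best_idx >= 0 and best_score > threshold:
                categories[best_idx].append(sig)   # same source as that category
            else:
                categories.append([sig])           # start a new category
    return categories


# Toy data mirroring the worked example: O(2,2) shares a source with O(1,1).
source_of = {"O(1,1)": "a", "O(1,2)": "b", "O(1,3)": "c",
             "O(2,1)": "d", "O(2,2)": "a"}
arrays = [["O(1,1)", "O(1,2)", "O(1,3)"], ["O(2,1)", "O(2,2)"]]


def toy_similarity(x: str, y: str) -> float:
    return 1.0 if source_of[x] == source_of[y] else 0.0


groups = classify(arrays, toy_similarity, threshold=0.5)
```

With this toy input, `groups` contains the four sets of the example: {O(1,1), O(2,2)}, {O(1,2)}, {O(1,3)} and {O(2,1)}.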

As an improvement of the above scheme, the similarity is calculated from a cross-correlation function.

Specifically, the similarity is calculated according to a cross-correlation function, that is, the cross-correlation value of any output signal and each output signal in each category is calculated by using the cross-correlation function, and the maximum cross-correlation value is taken as the similarity between the two corresponding output signals.
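A cross-correlation similarity of this kind might be sketched as below. The text only says that the maximum cross-correlation value is used; removing the mean, dividing by the product of the signal norms (so the result falls between 0 and 1) and taking the absolute value are assumptions added here, as is the function name `xcorr_similarity`.

```python
import numpy as np


def xcorr_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Peak of the normalized cross-correlation of two 1-D audio signals."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # at least one signal is constant; treat as dissimilar
    corr = np.correlate(a, b, mode="full")  # correlation at every lag
    return float(np.max(np.abs(corr)) / denom)


rng = np.random.default_rng(0)
x = rng.standard_normal(256)
delayed = np.concatenate([np.zeros(5), x[:-5]])  # same audio, 5 samples later
other = rng.standard_normal(256)                 # unrelated audio

same_score = xcorr_similarity(x, x)
delayed_score = xcorr_similarity(x, delayed)
other_score = xcorr_similarity(x, other)
```

Searching over all lags is what makes the measure tolerant of the different propagation delays from one source to different arrays. `np.correlate` costs time quadratic in the signal length, so an FFT-based correlation may be preferable for long recordings.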

As an improvement of the above solution, the similarity between any output signal of the other microphone arrays and all the categories in the initial classification result is calculated as follows.

Specifically, the two output signals whose similarity is to be calculated are converted into the frequency domain by fast Fourier transform to obtain the corresponding frequency-domain signals, and the similarity r of the two output signals is then calculated according to the formula, wherein $S_i$ represents the i-th frequency-domain signal of the output signal corresponding to any category in the initial classification result, 1 ≤ i ≤ I/2, and I is the audio length of that output signal, i.e. the length is I points; $\bar{S}$ represents the average of all frequency-domain signals of that output signal; $T_j$ represents the j-th frequency-domain signal of any output signal of the other microphone arrays, 1 ≤ j ≤ J/2, and J is the audio length of that output signal, i.e. the length is J points; and $\bar{T}$ represents the average of all frequency-domain signals of that output signal. The similarity satisfies 0 ≤ r ≤ 1, and the larger r is, the more similar the two output signals are.
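A frequency-domain similarity along these lines could be sketched as follows. Because the formula is not reproduced in this text, the sketch assumes an absolute Pearson correlation of the magnitude spectra, which does satisfy 0 ≤ r ≤ 1. Truncating both signals to a common length and using magnitudes rather than complex values are further assumptions, and `spectral_similarity` is a hypothetical name.

```python
import numpy as np


def spectral_similarity(s: np.ndarray, t: np.ndarray) -> float:
    """Absolute correlation of two magnitude spectra (assumed form of r)."""
    n = min(len(s), len(t))            # assumption: compare over a common length
    if n < 2:
        return 0.0
    # rfft returns bins 0..n//2; keep bins 1..n//2, matching 1 <= i <= I/2.
    S = np.abs(np.fft.rfft(s[:n]))[1:n // 2 + 1]
    T = np.abs(np.fft.rfft(t[:n]))[1:n // 2 + 1]
    S = S - S.mean()
    T = T - T.mean()
    denom = np.sqrt(np.sum(S ** 2) * np.sum(T ** 2))
    if denom == 0.0:
        return 0.0                     # a flat spectrum carries no shape to compare
    return float(abs(np.sum(S * T)) / denom)


fs = 8000                              # sample rate in Hz (illustrative value)
time = np.arange(1024) / fs
tone_a = np.sin(2 * np.pi * 440 * time)
tone_b = np.sin(2 * np.pi * 1500 * time)

same = spectral_similarity(tone_a, tone_a)
quieter = spectral_similarity(tone_a, 0.5 * tone_a)
different = spectral_similarity(tone_a, tone_b)
```

Because the measure is invariant to scaling, the same source picked up at different loudness by two arrays still scores close to 1, while tones at different frequencies score close to 0.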

As an improvement of the above scheme, the number and the position information of the sound sources are determined according to the classification result as follows.

Specifically, the number of sound sources is determined according to the number of categories in the classification result. Generally, the number of sound sources is equal to the number of categories.

The position information of each sound source is then determined according to the output signals of the corresponding category. Generally, the position information of a sound source is determined from the pitch angles and azimuth angles of those output signals.

As an improvement of the above solution, the following step is performed before the number of sound sources is determined according to the number of categories of the classification result.

Specifically, the categories in the classification result whose number of output signals is less than the preset number are deleted. Optionally, the preset number is at least three. A category with only one output signal must be deleted, because at least two output signals are needed to triangulate a sound source. Triangulation from only two output signals, however, may produce a large positioning error, so to make the positioning more accurate only categories with three or more output signals are retained.
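The pruning step and the resulting source count can be sketched in a few lines. The function name `prune_categories` and the sample labels are hypothetical; the default of three follows the optional preset number mentioned above.

```python
from typing import List, Sequence, TypeVar

S = TypeVar("S")


def prune_categories(categories: Sequence[Sequence[S]],
                     min_signals: int = 3) -> List[List[S]]:
    """Keep only categories that contain at least min_signals output signals."""
    return [list(cat) for cat in categories if len(cat) >= min_signals]


categories = [
    ["O(1,1)", "O(2,2)", "O(3,1)"],   # seen by three arrays: kept
    ["O(1,2)", "O(3,2)"],             # seen by two arrays: dropped
    ["O(2,1)"],                       # seen by one array: dropped
]
kept = prune_categories(categories, min_signals=3)
num_sources = len(kept)               # one category per remaining sound source
```

Counting the sound sources after pruning means that a spurious detection seen by a single array does not inflate the reported number of sources.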

After the categories that do not meet the requirement are deleted, the position information of the corresponding sound source can be determined from the remaining categories. A cost function is constructed from the output signals of any remaining category, and the cost function is solved to obtain the spatial coordinates of the corresponding sound source. In the cost function, each output signal defines a straight line along the direction in which it points; for the m-th output signal in the category, that direction is given by its pitch angle and its azimuth angle $\theta_m$, where 1 ≤ m ≤ M and M is the total number of output signals in the category. $P_m = H_m P$ and $P = (x, y, z)$, where $H_m$ is the spatial transformation matrix of the microphone array corresponding to the m-th output signal relative to the world coordinate system, (x, y, z) are the coordinates of the sound source point P in the world coordinate system, and $P_m$ are the coordinates of P in the array coordinate system of the microphone array corresponding to the m-th output signal. $d_m$ is the distance between the sound source point P and the straight line pointed to by the m-th output signal, and n is a preset norm. Optionally, n is 2, in which case the cost function is solved by the least squares method; when n is 1, it is solved by gradient descent.
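For the n = 2 case, minimizing the sum of squared point-to-line distances has a closed-form solution, sketched below. This is an illustration under assumptions, not the cost function of the publication, which is not reproduced in this text. It assumes each array's origin is known in world coordinates, that the array axes are aligned with the world axes (so the transformation reduces to a translation), and an angle convention with azimuth in the x-y plane and pitch above it. The names `direction` and `locate` are hypothetical.

```python
import numpy as np


def direction(pitch: float, azimuth: float) -> np.ndarray:
    """Unit vector for given pitch and azimuth in radians (assumed convention)."""
    cp = np.cos(pitch)
    return np.array([cp * np.cos(azimuth), cp * np.sin(azimuth), np.sin(pitch)])


def locate(origins: np.ndarray, directions: np.ndarray) -> np.ndarray:
    """Point minimizing the sum of squared distances to M lines.

    origins:    (M, 3) array positions in world coordinates
    directions: (M, 3) pointing directions in world coordinates
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, u in zip(origins, directions):
        u = u / np.linalg.norm(u)
        proj = np.eye(3) - np.outer(u, u)  # projects onto the plane normal to u
        A += proj
        b += proj @ o
    # Setting the gradient of sum ||proj (P - o)||^2 to zero gives A P = b.
    return np.linalg.solve(A, b)


# Three arrays looking at one source; directions are exact here for clarity.
source = np.array([1.0, 2.0, 0.5])
origins = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0], [0.0, 5.0, 1.0]])
directions = source - origins
estimate = locate(origins, directions)
```

`np.linalg.solve` raises `LinAlgError` when all lines are parallel, since no unique closest point exists in that case. With noisy angles the lines no longer intersect, and the estimate is the point closest to all of them in the least squares sense.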

Referring to fig. 3, it is a schematic structural diagram of a sound source localization apparatus based on a microphone array according to the embodiment of the present invention, where the apparatus includes:

a signal acquisition module 11, configured to acquire output signals of a plurality of microphone arrays;

the classification module 12 is configured to calculate similarity between any two output signals, and classify all the output signals according to the similarity to obtain a classification result;

and the positioning module 13 is configured to determine the number and the position information of the sound sources according to the classification result.

Preferably, the classification module 12 specifically includes:

the initial classification unit is used for obtaining an initial classification result according to an output signal of any microphone array;

the calculating unit is used for calculating the similarity between any output signal of other microphone arrays and all categories in the initial classification result and acquiring the maximum similarity;

the dividing unit is used for classifying any output signal into the category of the initial classification result corresponding to the maximum similarity when the maximum similarity is larger than a preset threshold;

and the updating unit is used for updating the initial classification result according to any output signal when the maximum similarity is smaller than a preset threshold value.

Preferably, the similarity is calculated from a cross-correlation function.

Preferably, the computing unit specifically includes:

a similarity calculation subunit, configured to calculate the similarity according to the formula

Preferably, the positioning module 13 specifically includes:

a sound source number determination unit for determining the number of sound sources according to the number of categories of the classification result;

and the sound source positioning unit is used for determining the position information of the corresponding sound source according to the output signal corresponding to each category.

Preferably, the positioning module 13 further comprises:

and the deleting unit is used for deleting the categories of which the number of the output signals in the classification result is less than the preset number.

The sound source positioning device based on the microphone array provided by the embodiment of the invention can realize all the processes of the sound source positioning method based on the microphone array described in any one of the embodiments, and the functions and the realized technical effects of each module and unit in the device are respectively the same as the functions and the realized technical effects of the sound source positioning method based on the microphone array described in the embodiment, and are not repeated herein.

Referring to fig. 4, it is a schematic diagram of a microphone array based sound source positioning apparatus provided by the embodiment of the present invention, the positioning apparatus includes a processor 10, a memory 20, and a computer program stored in the memory 20 and configured to be executed by the processor 10, and when the processor 10 executes the computer program, the microphone array based sound source positioning method described in any of the above embodiments is implemented.

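Illustratively, the computer program may be divided into one or more modules/units, which are stored in the memory 20 and executed by the processor 10 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments describing the execution of the computer program in the sound source positioning device based on a microphone array. For example, the computer program may be divided into a signal acquisition module, a classification module and a positioning module, with the following specific functions: the signal acquisition module acquires output signals of a plurality of microphone arrays; the classification module calculates the similarity of any two output signals and classifies all the output signals according to the similarity to obtain a classification result; and the positioning module determines the number and the position information of the sound sources according to the classification result.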

The positioning device may be a computing device such as a desktop computer, a notebook computer, a palmtop computer or a cloud server. The positioning device may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that Fig. 4 is merely an example of the positioning device and does not limit it; the positioning device may include more or fewer components than shown, combine certain components, or use different components. For example, the positioning device may also include input/output devices, network access devices, buses and the like.

The processor 10 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor or any conventional processor. The processor 10 is the control center of the positioning device and connects the various parts of the entire positioning device through various interfaces and lines.

The memory 20 may be used to store the computer programs and/or modules, and the processor 10 implements the various functions of the positioning device by running or executing the computer programs and/or modules stored in the memory 20 and invoking data stored in the memory 20. The memory 20 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the device, and the like. In addition, the memory 20 may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage device.

If the modules/units integrated in the positioning device are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium and which implements the steps of the method embodiments when executed by a processor. The computer program includes computer program code, which may be in source code form, object code form, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in the jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.

The embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, where when the computer program runs, the apparatus on which the computer-readable storage medium is located is controlled to perform the sound source positioning method based on a microphone array according to any of the above embodiments.

In summary, the sound source positioning method, device, positioning apparatus and storage medium based on a microphone array provided in the embodiments of the present invention calculate the similarity of all output signals of a plurality of microphone arrays, and classify the output signals according to the similarity, thereby determining which output signals correspond to the same sound source, and solving the problem that the sound source matching cannot be implemented in the prior art. Meanwhile, after the output signal corresponding to each sound source is determined, the position information of the sound source can be determined according to the corresponding output signal, and the sound source can be accurately positioned.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims

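1. A sound source positioning method based on a microphone array, characterized by comprising the following steps:

acquiring output signals of a plurality of microphone arrays;

calculating the similarity of any two output signals of different microphone arrays, and classifying all the output signals according to the similarity to obtain a classification result;

and determining the number and the position information of the sound sources according to the classification result.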

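2. The method for positioning a sound source based on a microphone array according to claim 1, wherein the calculating a similarity between any two output signals of different microphone arrays and classifying all the output signals according to the similarity to obtain a classification result comprises:

obtaining an initial classification result according to an output signal of any microphone array;

calculating the similarity between any output signal of other microphone arrays and all categories in the initial classification result, and acquiring the maximum similarity;

when the maximum similarity is larger than a preset threshold value, classifying that output signal into the category of the initial classification result corresponding to the maximum similarity;

and when the maximum similarity is smaller than the preset threshold value, updating the initial classification result according to that output signal.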

3. The microphone array-based sound source localization method of claim 1, wherein the similarity is calculated according to a cross-correlation function.

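4. The method as claimed in claim 2, wherein the calculating the similarity between any output signal of other microphone arrays and all categories in the initial classification result comprises:

calculating a similarity r according to the formula, wherein $S_i$ represents the i-th frequency-domain signal of the output signal corresponding to any one of the categories in the initial classification result, $\bar{S}$ represents the average of all frequency-domain signals of that output signal, $T_j$ represents the j-th frequency-domain signal of any output signal of the other microphone arrays, and $\bar{T}$ represents the average of all frequency-domain signals of that output signal.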

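5. The sound source localization method based on a microphone array according to claim 1, wherein the determining the number and location information of the sound sources according to the classification result specifically comprises:

determining the number of sound sources according to the number of categories of the classification result;

and determining the position information of the corresponding sound source according to the output signal corresponding to each category.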

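6. The microphone array-based sound source localization method of claim 5, further comprising, before the determining the number of sound sources according to the number of categories of the classification result:

deleting the categories of which the number of the output signals in the classification result is less than the preset number.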

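7. A sound source localization apparatus based on a microphone array, comprising:

the signal acquisition module is used for acquiring output signals of a plurality of microphone arrays;

the classification module is used for calculating the similarity of any two output signals and classifying all the output signals according to the similarity to obtain a classification result;

and the positioning module is used for determining the number and the position information of the sound sources according to the classification result.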

8. A microphone array based sound source localization device comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor when executing the computer program implementing a microphone array based sound source localization method according to any of claims 1 to 6.

9. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus on which the computer-readable storage medium is located to perform a microphone array based sound source localization method according to any one of claims 1 to 6.