CN106950542A

CN106950542A - Sound source localization method, apparatus and system

Info

Publication number: CN106950542A
Application number: CN201610010206.1A
Authority: CN
Inventors: 唐邦友; 李星; 黄家典
Original assignee: ZTE Corp
Current assignee: ZTE Corp
Priority date: 2016-01-06
Filing date: 2016-01-06
Publication date: 2017-07-14

Abstract

The invention provides a sound source localization method, apparatus and system. The method includes: acquiring the signal of each microphone in a microphone array, the microphone array being used to collect the sound of a sound source; obtaining, frame by frame from the signals, the controllable power responses of the multiple microphone pairs formed by the microphones; obtaining the sum of the controllable power responses of the microphone pairs and determining the direction angle between the sound source and the microphone array from the maximum of that sum; and determining the relative position of the microphone array and the sound source from the direction angle. This addresses the low resolution and heavy computation of the controllable power response technique and improves the real-time performance, stability and precision of sound source localization.

Description

Sound source positioning method, device and system

Technical Field

The invention relates to the field of communication, in particular to a method, a device and a system for positioning a sound source.

Background

During a video conference, the camera should focus on the speaker in order to capture important information such as body language and facial expressions. When the speaker is outside the shooting range, the traditional approach is to rotate the camera manually with a remote control until the speaker is in frame. This is very inconvenient, especially when the speaker changes frequently, and the delay in operating the camera causes important information to be lost. A camera that automatically tracks the speaker overcomes these shortcomings and gives both parties in the conference a better experience.

A camera that tracks the speaker relies on sound source localization, and computing the sound source direction with a microphone array is a basic method of sound source localization. The design of the microphone array depends on the product's functional requirements and cost. It is also closely tied to the localization algorithm: the topology, size and number of microphones in the array depend on the algorithm used, and the two must be designed together. The localization algorithm also largely determines the positional relationship between the microphone array and the camera. In short, a camera device that can track a speaker depends heavily on its sound source localization algorithm.

In the related art, microphone-array sound source localization methods include steerable beamforming based on maximum output power (the controllable power response technique). This technique must select the direction of arrival from a set of discrete beamforming angles, so its resolution degrades significantly when the sound source is far away. In addition, beamforming-based localization is a nonlinear optimization problem that requires a global search, so the computation is heavy and real-time implementation is difficult. These disadvantages limit the application of the method.

No effective solution is currently available for the low resolution and heavy computation of the controllable power response technique in the related art.

Disclosure of Invention

The invention provides a sound source localization method, apparatus and system, to at least solve the problems of low resolution and heavy computation of the controllable power response technique in the related art.

According to an aspect of the present invention, there is provided a sound source localization method, including:

acquiring signals of all microphones in a microphone array, wherein the microphone array is used for collecting sound of a sound source;

acquiring, frame by frame from the signals, controllable power responses of a plurality of microphone pairs formed by the microphones;

obtaining the sum of the controllable power responses of the plurality of microphone pairs, and determining the direction angle between the sound source and the microphone array according to the maximum value of that sum;

determining a relative positional relationship between the microphone array and the sound source from the direction angle.

Further, calculating, frame by frame from the signals, the controllable power responses of the plurality of microphone pairs formed by the microphones comprises:

performing voice activity detection on the signal of one microphone in the microphone array, used as the reference signal, to select speech frames, and calculating the controllable power responses of the plurality of microphone pairs formed by the microphones according to the reference signal.

Further, the microphone array includes:

in the same coordinate plane, M microphones along the abscissa (X) axis and N microphones along the ordinate (Y) axis form an L-shaped topology, with the M microphones and the N microphones each arranged at equal intervals;

the plurality of microphone pairs of the microphone array comprises: M(M-1)/2 microphone pairs formed by the M microphones, and N(N-1)/2 microphone pairs formed by the N microphones.
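The pair counts can be checked by enumerating every unordered pair (p, q) with p < q. The sketch below takes M = 5 and N = 4 from the embodiment; `microphone_pairs` is a hypothetical helper name.

```python
from itertools import combinations


def microphone_pairs(num_mics):
    """All unordered microphone pairs (p, q) with p < q, microphones indexed from 0."""
    return list(combinations(range(num_mics), 2))


M, N = 5, 4  # microphones on the X and Y axes in the embodiment
print(len(microphone_pairs(M)), len(microphone_pairs(N)))  # 10 6
```

The count is the binomial coefficient M(M-1)/2, so the number of pairs, and with it the cost of the search, grows quadratically with the number of microphones on each axis.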

Further, calculating the sum of the controllable power responses of the plurality of microphone pairs and determining the direction angle of the sound source relative to the microphone array from the maximum of that sum comprises:

establishing a three-dimensional coordinate system from the microphone array, the coordinate system comprising an X axis, a Y axis, a Z axis, an origin O and a sound source point P, where the M microphones lie along the X (abscissa) axis and the N microphones lie along the Y (ordinate) axis;

calculating the time delay τ of each of the plurality of microphone pairs, where d is the spacing of the microphone pair along the coordinate axis, C is the speed of sound, and θ is the angle swept counterclockwise, viewed from the positive Z axis, from the X axis to the line segment OS; S is the projection of the sound source point P onto the XOY plane spanned by the X and Y axes, and OS is the line segment from the origin O to S;
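The delay formula itself did not survive extraction. What follows is a hedged reconstruction under the standard far-field (plane-wave) model, not the patent's own equation. It assumes the source lies in or near the XOY plane, so that θ is also the angle between the arrival direction and the X axis. The microphone coordinates x_p < x_q and the pair delay τ_pq are notation introduced here. The arrival time at microphone p relative to O is then τ_p = −x_p cos θ / C, so for an X-axis pair with spacing d = x_q − x_p:

$$\tau_{pq}(\theta)=\tau_p-\tau_q=\frac{(x_q-x_p)\cos\theta}{C}=\frac{d\cos\theta}{C},\qquad \lvert\tau_{pq}(\theta)\rvert\le\frac{d}{C}$$

The bound d/C limits the range of delays that must be searched for each pair.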

calculating the sum E of the controllable power responses of the plurality of microphone pairs, where p and q are the indices of the two microphones in a pair, M is the number of microphones on the X axis, and each term of the sum is the controllable power response of one microphone pair;

obtaining the direction angle θ at which E takes its maximum value:

θ = arg max E(θ).
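Below is a minimal sketch of the grid search θ = arg max E(θ) over the X-axis pairs. It is not the patent's implementation, and it rests on several assumptions:

- The speed of sound is C = 343 m/s.
- The grid step is 0.5°.
- Delays follow the in-plane far-field model τ_pq(θ) = (x_q − x_p) cos θ / C.
- Each PHAT-weighted pair response is evaluated directly in the frequency domain at the candidate delay, instead of interpolating a time-domain cross-correlation.

`srp_phat_azimuth` is a hypothetical name.

```python
import numpy as np

C = 343.0     # assumed speed of sound in m/s
FS = 48_000   # sampling rate from the embodiment
D = 0.08      # microphone spacing from the embodiment (8 cm)


def srp_phat_azimuth(frames, positions, fs=FS, c=C, step_deg=0.5):
    """Return (theta_deg, thetas_deg, energy) maximising E(theta) over [0, 180] degrees.

    frames:    (num_mics, K) array, one frame per X-axis microphone.
    positions: (num_mics,) X coordinates of those microphones in metres.
    """
    K = frames.shape[1]
    spectra = np.fft.rfft(frames, axis=1)
    omega = 2 * np.pi * np.fft.rfftfreq(K, d=1 / fs)      # angular frequency of each bin (rad/s)
    thetas = np.arange(0.0, 180.0 + step_deg, step_deg)   # candidate direction angles (deg)
    cos_t = np.cos(np.deg2rad(thetas))
    energy = np.zeros_like(thetas)
    num_mics = len(positions)
    for p in range(num_mics - 1):
        for q in range(p + 1, num_mics):                  # every pair (p, q) with p < q
            cross = spectra[p] * np.conj(spectra[q])      # cross-power spectrum X_p X_q^*
            cross = cross / (np.abs(cross) + 1e-12)       # PHAT weighting: keep phase only
            tau = (positions[q] - positions[p]) * cos_t / c   # expected tau_p - tau_q per angle
            # R_pq(tau) = Re sum_k W(k) X_p(k) X_q^*(k) exp(j omega_k tau), for every candidate angle
            energy += np.real(np.exp(1j * np.outer(tau, omega)) @ cross)
    best = int(np.argmax(energy))
    return thetas[best], thetas, energy


# Synthetic check: white noise arriving from 60 degrees at five microphones 8 cm apart.
rng = np.random.default_rng(0)
K = 1024
positions = D * np.arange(5)
source = np.fft.rfft(rng.standard_normal(K))
omega = 2 * np.pi * np.fft.rfftfreq(K, d=1 / FS)
arrival = -positions * np.cos(np.deg2rad(60.0)) / C      # far-field arrival time at each microphone
frames = np.fft.irfft(source * np.exp(-1j * np.outer(arrival, omega)), n=K, axis=1)
theta_hat, _, _ = srp_phat_azimuth(frames, positions)
print(theta_hat)
```

The synthetic signal uses circular (phase-shift) delays, so it matches the frequency-domain model exactly. Real recordings with reverberation will produce a broader and noisier peak.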

Further, determining the relative positional relationship between the microphone array and the sound source according to the direction angle includes:

calculating the pitch angle γ of the sound source in the three-dimensional coordinate system according to the direction angle θ:

γ = arctan(a)

where λ is the angle between the directed line segment OP and the positive Z axis, OP being the line segment from the origin O to the sound source point P.

Further, calculating the direction angle θ and the pitch angle γ of the sound source in the three-dimensional coordinate system includes:

dividing the range of 0° to 180° about the X axis or the Y axis into H intervals; within a predetermined number of frames, counting how many times the direction angle θ and the pitch angle γ fall into each interval; selecting the interval with the largest count; and averaging the direction angles θ and the pitch angles γ in that interval, respectively, to obtain the direction angle θ and the pitch angle γ of the sound source in the three-dimensional coordinate system, where H is a positive integer.
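A minimal sketch of this interval vote, applied to one angle at a time (run it once for θ and once for γ). It assumes H = 18 (10° intervals) and angles in degrees within [0°, 180°]. `vote_angle` is a hypothetical helper name. Ties go to the lowest interval because `np.argmax` returns the first maximum; the text does not say how ties are broken.

```python
import numpy as np


def vote_angle(angles_deg, num_bins=18):
    """Histogram-vote angles over [0, 180] deg and average those in the winning interval."""
    angles = np.asarray(angles_deg, dtype=float)
    edges = np.linspace(0.0, 180.0, num_bins + 1)            # H equal intervals
    # np.digitize gives 1..num_bins for angles in [0, 180); clip so 180 lands in the last interval
    idx = np.clip(np.digitize(angles, edges) - 1, 0, num_bins - 1)
    counts = np.bincount(idx, minlength=num_bins)             # hits per interval
    winner = int(np.argmax(counts))                           # first interval with the most hits
    return float(angles[idx == winner].mean())


# Per-frame direction-angle estimates with two outliers (20 and 150 degrees).
print(vote_angle([61.0, 62.0, 63.0, 20.0, 150.0, 64.0]))  # 62.5
```

Averaging only inside the winning interval discards isolated outliers entirely, which a plain mean over all frames would not.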

According to another aspect of the present invention, there is also provided a sound source localization apparatus including:

a first acquisition module, configured to acquire the signals of the microphones in a microphone array, the microphone array being used to collect the sound of a sound source;

a second acquisition module, configured to acquire, frame by frame from the signals, the controllable power responses of a plurality of microphone pairs formed by the microphones;

a third acquisition module, configured to obtain the sum of the controllable power responses of the plurality of microphone pairs and determine the direction angle between the sound source and the microphone array according to the maximum value of that sum;

and a position module, configured to determine the relative positional relationship between the microphone array and the sound source according to the direction angle.

Further, the second acquisition module includes:

a reference unit, configured to perform voice activity detection on the signal of one microphone in the microphone array, used as the reference signal, to select speech frames, and to calculate the controllable power responses of the plurality of microphone pairs formed by the microphones according to the reference signal.

According to another aspect of the present invention, there is also provided a sound source localization system including:

a microphone array control unit and a camera, wherein

the microphone array control unit is configured to: acquire the signals of the microphones in a microphone array, the microphone array being used to collect the sound of a sound source; acquire, frame by frame from the signals, the controllable power responses of a plurality of microphone pairs formed by the microphones; obtain the sum of the controllable power responses of the microphone pairs and determine the direction angle between the sound source and the microphone array according to the maximum value of that sum; determine the relative positional relationship between the microphone array and the sound source according to the direction angle; and send the relative positional relationship to the camera;

the camera is configured to adjust its position according to the relative positional relationship.

Further, the microphone array is arranged as follows: in the same coordinate plane, M microphones along the abscissa (X) axis and N microphones along the ordinate (Y) axis form an L-shaped topology, with the M microphones and the N microphones each arranged at equal intervals.

With the invention, the signal of each microphone in a microphone array that collects the sound of a sound source is acquired. The controllable power responses of the microphone pairs formed by the microphones are obtained frame by frame, and their sum is computed. The direction angle between the sound source and the microphone array is determined from the maximum of that sum, and the relative positional relationship between the microphone array and the sound source is determined from the direction angle. This solves the problems of low resolution and heavy computation of the controllable power response technique and improves the real-time performance, stability and precision of sound source localization.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

Fig. 1 is a flowchart of a sound source localization method according to an embodiment of the present invention;

Fig. 2 is a block diagram of a sound source localization apparatus according to an embodiment of the present invention;

Fig. 3 is a block diagram of a sound source localization system according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of a camera system capable of tracking a speaker according to a preferred embodiment of the present invention;

Fig. 5 is a schematic flowchart of a sound source localization algorithm according to a preferred embodiment of the present invention;

Fig. 6 is a schematic diagram of a three-dimensional coordinate system model of a microphone array according to an embodiment of the present invention;

Fig. 7 is a schematic diagram of the relationship between the horizontal deflection angle and the elevation angle according to a preferred embodiment of the present invention.

Detailed Description

The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.

This embodiment provides a sound source localization method. Fig. 1 is a flowchart of the method according to an embodiment of the present invention; as shown in Fig. 1, the flow includes the following steps:

Step S102: acquiring the signal of each microphone in a microphone array, the microphone array being used to collect the sound of a sound source;

Step S104: acquiring, frame by frame from the signals, the controllable power responses of a plurality of microphone pairs formed by the microphones;

Step S106: obtaining the sum of the controllable power responses of the plurality of microphone pairs, and determining the direction angle between the sound source and the microphone array according to the maximum value of that sum;

Step S108: determining the relative positional relationship between the microphone array and the sound source according to the direction angle.

Through these steps, the direction angle of the sound source is taken from the maximum of the summed controllable power responses of the microphone pairs, and the relative positional relationship between the microphone array and the sound source follows from that angle.

In this embodiment, calculating, frame by frame from the signals, the controllable power responses of the plurality of microphone pairs formed by the microphones includes:

performing voice activity detection on the signal of one microphone in the microphone array, used as the reference signal, to select speech frames, and calculating the controllable power responses of the plurality of microphone pairs formed by the microphones according to the reference signal.

The microphone array comprises:

in the same coordinate plane, M microphones along the abscissa (X) axis and N microphones along the ordinate (Y) axis forming an L-shaped topology, with the M microphones and the N microphones each arranged at equal intervals;

the plurality of microphone pairs of the microphone array includes: M(M-1)/2 microphone pairs formed by the M microphones, and N(N-1)/2 microphone pairs formed by the N microphones.

In this embodiment, a three-dimensional coordinate system is established from the microphone array. It comprises an X axis, a Y axis, a Z axis, an origin O and a sound source point P, where the M microphones lie along the X (abscissa) axis and the N microphones along the Y (ordinate) axis. The method then includes:

calculating the time delays τ of the plurality of microphone pairs, where d is the spacing of a microphone pair along its coordinate axis, C is the speed of sound, and θ is the angle swept counterclockwise, viewed from the positive Z axis, from the X axis to the segment OS, S being the projection of the sound source point P onto the XOY plane;

calculating the sum E of the controllable power responses of the plurality of microphone pairs, where p and q are the indices of the two microphones in a pair, M is the number of microphones on the X axis, and each term of the sum is the controllable power response of one microphone pair;

obtaining the direction angle θ at which E takes its maximum value:

θ = arg max E(θ);

calculating the pitch angle γ of the sound source in the three-dimensional coordinate system from the direction angle θ:

γ = arctan(a)

where λ is the angle between the directed line segment OP and the positive Z axis, OP being the line segment from the origin O to the sound source point P.

In this embodiment, calculating the direction angle θ and the pitch angle γ of the sound source in the three-dimensional coordinate system includes:

dividing the range of 0° to 180° about the X axis or the Y axis into H intervals; within a preset number of frames, counting how many times θ and γ fall into each interval; selecting the interval with the largest count; and averaging θ and γ within that interval, respectively, to obtain the direction angle θ and the pitch angle γ of the sound source in the three-dimensional coordinate system, where H is a positive integer.

This embodiment further provides a sound source localization apparatus that implements the foregoing embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. The apparatus described in the following embodiments is preferably implemented in software, but implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

Fig. 2 is a block diagram of a sound source localization apparatus according to an embodiment of the present invention. As shown in Fig. 2, the apparatus includes:

a first obtaining module 22, configured to obtain signals of respective microphones in a microphone array, where the microphone array is configured to collect sound of a sound source;

a second obtaining module 24, connected to the first obtaining module 22, configured to obtain, frame by frame from the signals, the controllable power responses of a plurality of microphone pairs formed by the microphones;

a third obtaining module 26, connected to the second obtaining module 24, for obtaining a sum of the controllable power responses of the plurality of microphone pairs, and determining a direction angle between the sound source and the microphone array according to a maximum value of the sum of the controllable power responses;

and a position module 28 connected to the third obtaining module 26 for determining a relative position relationship between the microphone array and the sound source according to the direction angle.

With the above apparatus, the modules work in sequence:

- The first obtaining module 22 obtains the signal of each microphone in the microphone array.
- The second obtaining module 24 obtains, frame by frame, the controllable power responses of the microphone pairs formed by the microphones.
- The third obtaining module 26 obtains the sum of those responses and determines the direction angle between the sound source and the microphone array from the maximum of the sum.
- The position module 28 determines the relative positional relationship between the microphone array and the sound source from the direction angle.

This solves the problems of low resolution and heavy computation of the controllable power response technique and improves the real-time performance, stability and accuracy of sound source localization.

The second obtaining module 24 includes a reference unit, configured to perform voice activity detection on the signal of one microphone in the microphone array, used as the reference signal, to select speech frames, and to calculate the controllable power responses of the plurality of microphone pairs formed by the microphones according to the reference signal.

Fig. 3 is a block diagram of a sound source localization system according to an embodiment of the present invention. As shown in Fig. 3, the system includes:

a microphone array control unit 32, and a camera 34, wherein,

the microphone array control unit 32 is configured to: acquire the signal of each microphone in a microphone array, the microphone array being used to collect the sound of a sound source; acquire, frame by frame from the signals, the controllable power responses of a plurality of microphone pairs formed by the microphones; obtain the sum of those responses and determine the direction angle between the sound source and the microphone array according to its maximum value; determine the relative positional relationship between the microphone array and the sound source according to the direction angle; and send the relative positional relationship to the camera 34;

the camera 34 is configured to adjust a position of the camera 34 according to the relative position relationship.

Further, the microphone array is arranged as follows: in the same coordinate plane, M microphones along the abscissa (X) axis and N microphones along the ordinate (Y) axis form an L-shaped topology, with the M microphones and the N microphones each arranged at equal intervals.

The present invention will be described in detail with reference to preferred examples and embodiments.

The preferred embodiment of the invention provides a simple, practical and highly reliable camera device that tracks the speaker. It also addresses shortcomings of existing sound source localization algorithms, improving the real-time performance, stability and precision of sound source localization.

Fig. 4 is a schematic diagram of a camera system for tracking a speaker according to a preferred embodiment of the present invention. As shown in Fig. 4, the apparatus comprises:

an omnidirectional microphone array 42 (corresponding to the microphone array control unit 32) and a camera 44 (corresponding to the camera 34). The preferred embodiment uses multiple high-sensitivity omnidirectional microphones, which form the array 42 and, as shown in Fig. 4, lie in the same plane. M microphones in a horizontal row and N microphones in a vertical row form an L-shaped topology, and the microphones in each row are equally spaced. The camera 44 is placed within the 90-degree spatial angle formed by the horizontal and vertical microphone rows.

The preferred embodiment computes the yaw (direction) angle of the sound source from the data collected by the horizontal microphones. It then computes the pitch angle from the data collected by the vertical microphones, combined with the yaw angle.

For the localization algorithm, the preferred embodiment uses a far-field sound-field model and proposes a plane-search sound source localization technique based on the controllable (steered) response power (SRP). The algorithm comprises the following steps:

Step one: establish a microphone array model in a three-dimensional coordinate system and determine the position of each microphone in it. M microphones on the abscissa axis form M(M-1)/2 microphone pairs, and N microphones on the ordinate axis form N(N-1)/2 microphone pairs, where M and N are both integers greater than 1.

Step two: sample the signal received by each microphone to obtain digital signals, and process them frame by frame.

Step three: perform voice activity detection (VAD) on the data of one microphone to distinguish speech frames from noise frames; only speech frames are processed in the following steps. This step greatly increases the accuracy of the algorithm.

Step four: apply overlapping framing and windowing to the speech frames of each microphone, using a Hamming window of length 1024, and compute the discrete Fourier transform (DFT) with a fast Fourier transform (FFT).

Step five: calculate the controllable power response of each microphone pair.

(501) Let the sound source s(n) arrive at microphone p and microphone q at times τ_p and τ_q, respectively, and calculate the power of the signals of microphones p and q after delay compensation.

(502) To reduce the influence of ambient noise and reverberation on the controllable power response, the amplitude is normalized in the frequency domain (PHAT weighting), so that only the phase information is retained.
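A minimal sketch of the PHAT-weighted cross-correlation (GCC-PHAT) of one microphone pair, computed with NumPy's FFT. The peak lag estimates τ_p − τ_q in samples. The `interp` argument is an assumption, not from the text: it upsamples the result by zero-padding the weighted cross spectrum, which is one common way to realise the interpolation of the next step. The 1e-12 floor only guards against division by zero, and `gcc_phat` is a hypothetical name.

```python
import numpy as np


def gcc_phat(x_p, x_q, interp=1):
    """PHAT-weighted cross-correlation of two equal-length frames.

    Returns (lags, r): lags in samples (fractional when interp > 1), zero lag centred.
    The peak of r sits at the lag tau_p - tau_q, i.e. how much later x_p arrives than x_q.
    """
    n = len(x_p)
    cross = np.fft.rfft(x_p) * np.conj(np.fft.rfft(x_q))   # cross-power spectrum X_p X_q^*
    cross = cross / (np.abs(cross) + 1e-12)                  # PHAT: drop magnitude, keep phase
    m = interp * n
    r = np.fft.irfft(cross, n=m)                             # zero-padded inverse = band-limited upsampling
    r = np.fft.fftshift(r)                                   # move zero lag to the centre
    lags = (np.arange(m) - m // 2) / interp
    return lags, r


rng = np.random.default_rng(1)
x = rng.standard_normal(1024)
lags, r = gcc_phat(np.roll(x, 3), x, interp=10)   # x_p is x delayed by 3 samples
print(lags[np.argmax(r)])
```

Because PHAT discards the magnitude, every frequency bin votes equally for the delay. This is what makes the peak sharp and less sensitive to the spectral colouring that reverberation introduces.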

Step six: interpolate the controllable power response.

In the far-field model the microphone spacing is short, so the delays between microphones span only a small number of samples, and a higher sampling rate improves the accuracy of direction-angle estimation. To improve the accuracy further, the cross-correlation function is interpolated: for each microphone pair, the cross-correlation values within the maximum delay range are interpolated tenfold.
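The arithmetic behind the maximum delay range works as follows, assuming C = 343 m/s (the text does not state C):

- With the embodiment's 8 cm spacing and 48 kHz sampling, an adjacent pair's delay is at most 0.08 / 343 × 48000 ≈ 11.2 samples, so only about ±11 lags matter.
- Tenfold interpolation then gives a 0.1-sample grid, about 2.1 µs.

The sketch uses linear interpolation (`np.interp`) as a stand-in, since the text does not say which interpolation is used. The helper names are hypothetical.

```python
import numpy as np

C = 343.0     # assumed speed of sound in m/s (not stated in the text)
FS = 48_000   # sampling rate from the embodiment
FACTOR = 10   # tenfold interpolation, as in step six


def max_lag_samples(spacing_m, fs=FS, c=C):
    """Largest physically possible delay between two microphones, in samples."""
    return spacing_m / c * fs


def interpolate_within_max_delay(lags, r, spacing_m, factor=FACTOR):
    """Keep lags within the maximum delay range, then linearly upsample r by `factor`."""
    keep = np.abs(lags) <= max_lag_samples(spacing_m)
    lags, r = lags[keep], r[keep]
    fine_lags = np.linspace(lags[0], lags[-1], factor * (len(lags) - 1) + 1)
    return fine_lags, np.interp(fine_lags, lags, r)


# Adjacent pair (8 cm) and the widest X-axis pair of five microphones (32 cm).
print(round(max_lag_samples(0.08), 2), round(max_lag_samples(4 * 0.08), 2))
```

Restricting to the physical delay range before interpolating keeps the extra work proportional to d/C rather than to the frame length.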

Step seven: search for the maximum controllable power response over a semicircle, as follows:

(701) Calculate the time delay of each microphone pair, where d is the spacing of the microphone pair along the coordinate axis and C is the speed of sound.

(702) Sum the controllable power responses of all microphone pairs.

(703) Find the direction angle θ that maximizes E:

θ = arg max E(θ)

Step eight: steps five to seven give the horizontal deflection angle θ of the sound source and the angle λ between OP' and OZ. The pitch angle of the sound source is then obtained:

γ = arctan(a)

Step nine: compute the statistical averages of θ and γ.

The range from 0° to 180° is divided evenly into H intervals. Over 30 frames, the number of times θ and γ fall into each interval is counted, the interval with the most hits is selected, and the angles in it are averaged. The averages are the deflection angle and the pitch angle of the sound source.

Fig. 4 shows the camera system capable of tracking a speaker in an embodiment of the present invention. The horizontal and vertical rows of microphones form an L-shaped topology, and all microphones face the same direction and lie in the same plane. The horizontal row has five microphones, the vertical row has four, and adjacent microphones are 8 cm apart. The microphone spacing is not limited to the 8 cm of this embodiment; other spacings may be chosen according to the implementation requirements. In the figure the camera is directly above microphone No. 2 (Mic2), but any position within the 90-degree spatial range formed by the horizontal and vertical microphone rows may be used instead.

Fig. 5 is a schematic flowchart of a sound source localization algorithm according to a preferred embodiment of the present invention. As shown in Fig. 5, the algorithm comprises the following steps:

Step S501: as shown in Fig. 6, which is a schematic diagram of a three-dimensional coordinate system model of a microphone array according to an embodiment of the present invention, a microphone array model is established in the three-dimensional coordinate system to determine the position of each microphone. In this embodiment, the abscissa axis has 5 microphones, forming 10 microphone pairs, and the ordinate axis has 4 microphones, forming 6 microphone pairs.

Step S502: sample the signal received by each microphone at 48000 Hz to obtain digital signals, and process them in frames of 20 ms. A longer frame gives higher estimation accuracy but significantly increases the computation, so the frame length is limited to 20 ms.

Step S503: perform voice activity detection (VAD) on the data of one microphone to distinguish speech frames from noise frames; only speech frames are processed in the following steps. Because noise degrades the performance of the algorithm, processing only speech frames greatly improves its robustness.
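The text does not specify the VAD method. As a placeholder only, the sketch below flags frames of the reference microphone with a simple energy threshold relative to the loudest frame, then keeps the same frame indices for every microphone. The −40 dB threshold and the names `energy_vad` and `select_speech_frames` are assumptions, and a practical VAD would typically also track a noise floor.

```python
import numpy as np


def energy_vad(ref_frames, threshold_db=-40.0):
    """Boolean mask of frames whose energy is within threshold_db of the loudest frame."""
    ref_frames = np.asarray(ref_frames, dtype=float)
    energy = np.sum(ref_frames ** 2, axis=1) + 1e-12       # per-frame energy
    energy_db = 10.0 * np.log10(energy / energy.max())      # relative to the loudest frame
    return energy_db > threshold_db


def select_speech_frames(all_frames, ref_index=0, threshold_db=-40.0):
    """all_frames: (num_mics, num_frames, frame_len); keep speech frames for every microphone."""
    mask = energy_vad(all_frames[ref_index], threshold_db)
    return all_frames[:, mask, :]


# Three microphones, two frames each: a loud frame and one 60 dB quieter.
frames = np.stack([np.vstack([np.ones(8), 1e-3 * np.ones(8)])] * 3)   # shape (3, 2, 8)
print(energy_vad(frames[0]), select_speech_frames(frames).shape)
```

Deciding on one reference microphone and applying the same mask to all channels keeps the frames of every pair time-aligned, which the cross-spectra in the following steps require.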

Step S504: apply overlapping framing and windowing to the speech frames of each microphone, using a Hamming window of length 1024, and compute the DFT.

The DFT is the most computationally expensive part of the algorithm, so it is computed with the efficient split-radix FFT algorithm, which greatly reduces the computation.
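A sketch of the framing, windowing and transform step with NumPy. The text gives a 20 ms frame at 48 kHz (960 samples) and a 1024-point Hamming window but does not say how the two combine. This sketch assumes 1024-sample windowed frames advanced by 960 samples, a 64-sample overlap. NumPy's FFT stands in for the split-radix FFT, and `frame_spectra` is a hypothetical name.

```python
import numpy as np

FS = 48_000               # sampling rate from the embodiment
HOP = FS * 20 // 1000     # 20 ms frame advance -> 960 samples (assumed to be the hop)
WIN = 1024                # Hamming window length from the embodiment


def frame_spectra(x, win=WIN, hop=HOP):
    """Cut x into overlapping frames, apply a Hamming window, return one-sided DFTs."""
    x = np.asarray(x, dtype=float)
    if len(x) < win:
        return np.empty((0, win // 2 + 1), dtype=complex)
    num_frames = 1 + (len(x) - win) // hop
    window = np.hamming(win)
    frames = np.stack([x[i * hop : i * hop + win] * window for i in range(num_frames)])
    return np.fft.rfft(frames, axis=1)    # shape (num_frames, win // 2 + 1)


one_second = np.random.default_rng(0).standard_normal(FS)
print(frame_spectra(one_second).shape)
```

Using `rfft` keeps only the 513 non-negative-frequency bins of each real frame, roughly halving the transform and storage cost compared with a full complex DFT.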

Step S505: calculate the controllable power response of each microphone pair.

(5051) Let the sound source s(n) arrive at microphone p and microphone q at times τ_p and τ_q, respectively, and calculate the power of the signals of microphones p and q after time-domain alignment, where X_p(k)X_q^*(k) is the cross-power spectrum of x_p(n) and x_q(n).

(5052) To reduce the influence of ambient noise and reverberation on the controllable power response, the amplitude is normalized in the frequency domain (PHAT weighting), so that only the phase information is retained.

When noise is ignored, x_p(n) = s(n − τ_p), so the FFT of each microphone signal is the source spectrum multiplied by a linear phase term determined by τ_p.

Therefore, after PHAT weighting, the cross-power spectrum of microphones p and q depends only on the delay difference τ_p − τ_q.
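The equations in this passage were lost in extraction. Below is a hedged reconstruction using the standard GCC-PHAT identities, assuming S(k) ≠ 0 in every bin. K (the DFT length) and ω_k (the bin frequency, in radians per sample) are notation introduced here, and the delays are in samples:

$$x_p(n)=s(n-\tau_p)\;\Longrightarrow\;X_p(k)=S(k)\,e^{-j\omega_k\tau_p},\qquad \omega_k=\frac{2\pi k}{K}$$

$$X_p(k)X_q^*(k)=\lvert S(k)\rvert^2\,e^{-j\omega_k(\tau_p-\tau_q)},\qquad \frac{X_p(k)X_q^*(k)}{\lvert X_p(k)X_q^*(k)\rvert}=e^{-j\omega_k(\tau_p-\tau_q)}$$

$$R_{pq}(\tau)=\sum_{k=0}^{K-1}e^{\,j\omega_k\left(\tau-(\tau_p-\tau_q)\right)}$$

The real part of R_pq(τ) reaches its maximum value K at τ = τ_p − τ_q. This holds exactly for a circular shift and approximately for windowed real frames.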

and step S506, performing controllable power response interpolation calculation.

Step S507, searching for the maximum controllable power response in the semicircular range, which is specifically as follows:

(5071) Calculate the time delay of each microphone pair:

τ = \frac{d \cos θ}{C}

(5072) Sum the controllable power responses of all microphone pairs:

E = \sum_{p=1}^{M} \sum_{q=p+1}^{M} R_{x_p(n), x_q(n)}(τ)

(5073) Find the direction angle θ at which E takes its maximum value:

θ = arg max E(θ)
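The search of step S507 can be sketched as a grid search over 0° to 180°. For each candidate θ, each pair's delay τ = d·cosθ/C (as in claim 4) is computed, the pair responses are summed into E(θ), and the θ with the largest E is kept. The sketch assumes the microphones lie on one axis at known positions, C = 343 m/s, and a 1° grid. `pair_response` stands for any callable returning the controllable power of pair (p, q) at delay τ, for instance built from the PHAT and interpolation steps. All names and values here are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def find_direction(mic_x, pair_response, step_deg=1.0, c=SPEED_OF_SOUND):
    """Return the theta (degrees, 0..180) that maximizes E(theta).

    mic_x: positions (m) of microphones along one axis (assumed linear array).
    pair_response(p, q, tau): controllable power of pair (p, q) at delay tau (s).
    For each pair, tau = d * cos(theta) / c with d = mic_x[q] - mic_x[p].
    """
    m = len(mic_x)
    best_theta, best_e = None, -math.inf
    steps = int(round(180.0 / step_deg))
    for i in range(steps + 1):
        theta = i * step_deg
        cos_t = math.cos(math.radians(theta))
        # E(theta): sum of the responses of all pairs p < q
        e = sum(pair_response(p, q, (mic_x[q] - mic_x[p]) * cos_t / c)
                for p in range(m) for q in range(p + 1, m))
        if e > best_e:
            best_theta, best_e = theta, e
    return best_theta


# Synthetic check: every pair's response peaks at the delay for theta = 60 degrees.
mics = [0.0, 0.05, 0.10, 0.15]
true_tau = lambda p, q: (mics[q] - mics[p]) * math.cos(math.radians(60)) / 343.0
print(find_direction(mics, lambda p, q, tau: -abs(tau - true_tau(p, q))))  # 60.0
```

The cost is one response evaluation per pair per grid angle, so the grid step directly trades angular resolution against computation.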

Step S508, calculating the pitch angle. Fig. 7 is a schematic diagram showing the relationship between the horizontal deflection angle and the elevation (pitch) angle according to the preferred embodiment of the present invention. As shown in Fig. 7, the horizontal deflection angle θ of the sound source and the included angle λ between OP' and OZ are obtained through steps S505 to S507, and the pitch angle of the sound source is then derived as follows:

The coordinates of the sound source P are expressed in polar coordinates as:

and since

it follows that tan γ = tan λ / sin θ = a, so the pitch angle is:

γ＝arctan(a)
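The closed form in claim 5, tan γ = tan λ / sin θ = a and γ = arctan(a), can be evaluated directly. The sketch below takes and returns degrees. It raises an error when sin θ ≈ 0, because the ratio is undefined at θ = 0° or 180°. The degree convention, the 1e-12 guard, and the function name are choices made for this sketch.

```python
import math


def pitch_angle(theta_deg, lambda_deg):
    """Pitch angle gamma = arctan(tan(lambda) / sin(theta)), in degrees.

    theta: horizontal deflection angle; lambda: included angle between OP' and OZ.
    """
    s = math.sin(math.radians(theta_deg))
    if abs(s) < 1e-12:
        raise ValueError("theta too close to 0 or 180 degrees; tan(gamma) undefined")
    a = math.tan(math.radians(lambda_deg)) / s
    return math.degrees(math.atan(a))
```

For θ = 90° the formula reduces to γ = λ. For θ = 30° and λ = 45°, a = 1 / 0.5 = 2 and γ = arctan 2 ≈ 63.43°.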

Step S509, calculating the statistical averages of θ and γ.

The range from 0° to 180° is divided evenly into H intervals. Over 30 frames, the number of times θ and γ fall into each interval is counted, and the average of the values in the interval with the highest count is taken as the deflection angle and pitch angle of the sound source. Until 30 frames have accumulated, each frame outputs the previous statistical result; once 30 frames are reached, the newly computed angle is output. This step reduces external interference and the number of camera rotations.
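The statistics of step S509 can be sketched as a small block-wise histogram averager, applied separately to θ and to γ. H is not given in the text, so `num_bins=18` (10° intervals) is an assumed value. The 30-frame block and the 0° to 180° range come from the text. The class name and the `None` returned before the first full block are hypothetical choices.

```python
class AngleSmoother:
    """Block-wise histogram averaging of per-frame angle estimates.

    The range [lo, hi] is split into num_bins equal intervals (H in the text).
    After every `window` frames the fullest interval is found and the mean of
    the estimates in it becomes the output; between updates the previous
    result is repeated. num_bins=18 is an assumed value.
    """

    def __init__(self, num_bins=18, window=30, lo=0.0, hi=180.0):
        self.num_bins = num_bins
        self.window = window
        self.lo = lo
        self.width = (hi - lo) / num_bins
        self.buffer = []
        self.last = None  # nothing to report before the first full block

    def update(self, angle):
        self.buffer.append(angle)
        if len(self.buffer) == self.window:
            bins = [[] for _ in range(self.num_bins)]
            for a in self.buffer:
                idx = int((a - self.lo) / self.width)
                idx = min(max(idx, 0), self.num_bins - 1)  # clamp the upper edge
                bins[idx].append(a)
            fullest = max(bins, key=len)  # first interval with the highest count
            self.last = sum(fullest) / len(fullest)
            self.buffer = []
        return self.last


smoother = AngleSmoother()
outputs = [smoother.update(v) for v in [61.0] * 10 + [65.0] * 10 + [130.0] * 10]
print(outputs[-1])  # 63.0
```

In this example the ten outlying estimates at 130° land in a less populated interval and are ignored, so the reported angle only moves when a majority of a 30-frame block agrees. This is how the step suppresses spurious camera rotations.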

The sound source positioning algorithm provided by the preferred embodiment of the invention can effectively improve the accuracy and stability of sound source positioning in noisy and reverberant environments, and a microphone array camera device based on this algorithm can track a speaker accurately and stably in real time.

Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, or by hardware, although in many cases the former is the better implementation. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

It should be noted that, the above modules may be implemented by software or hardware, and for the latter, the following may be implemented, but not limited to: the modules are all positioned in the same processor; alternatively, the modules are respectively located in a plurality of processors.

The embodiment of the invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code for executing the method steps of the above embodiment:

Optionally, the storage medium is further configured to store program code for executing the method steps of the above embodiments:

optionally, in this embodiment, the storage medium may include, but is not limited to: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.

Optionally, in this embodiment, the processor executes the method steps of the above embodiments according to the program code stored in the storage medium.

Optionally, for specific examples of this embodiment, refer to the examples described in the above embodiments and optional implementations; they are not repeated here.

It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for locating a sound source, comprising:

2. The method of claim 1, wherein calculating, from the frames of the signals, the controllable power responses of the plurality of microphone pairs formed by the microphones comprises:

3. The method of claim 1, wherein the microphone array comprises:

4. The method of claim 3, wherein calculating the sum of the controllable power responses of the plurality of microphone pairs, and determining the direction angle between the sound source and the microphone array based on the maximum of the sum of the controllable power responses, comprise:

calculating the time delays τ of the plurality of microphone pairs:

τ = \frac{R}{C} = \frac{d \cos θ}{C}

summing the controllable power responses of the plurality of microphone pairs:

E = \sum_{p=1}^{M} \sum_{q=p+1}^{M} R_{x_p(n), x_q(n)}(τ)

wherein R_{x_p(n), x_q(n)}(τ) is the controllable power of the microphone pair;

obtaining a direction angle θ at which E takes a maximum value:

θ = arg max E(θ).

5. The method of claim 4, wherein determining the relative positional relationship between the microphone array and the sound source according to the direction angle comprises:

\tan γ = \frac{\tan λ}{\sin θ} = a

γ＝arctan(a)

6. The method according to claim 5, wherein calculating the direction angle θ and the pitch angle γ of the sound source in the three-dimensional coordinate system comprises:

7. An apparatus for locating a sound source, comprising:

8. The apparatus of claim 7, wherein the second obtaining module comprises:

9. A system for locating a sound source, comprising:

a microphone array, a control unit, and a camera, wherein,

and the camera is configured to adjust its position according to the relative positional relationship.

10. The system of claim 9, wherein

the microphone array is arranged as follows: in one coordinate plane, M microphones along the abscissa axis and N microphones along the ordinate axis form an L-shaped topology, with the M microphones and the N microphones each arranged at equal intervals;