CN106950542A - The localization method of sound source, apparatus and system - Google Patents

The localization method of sound source, apparatus and system

Publication number
CN106950542A
Authority
CN
China
Prior art keywords
microphone
microphones
sound source
microphone array
controllable power
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201610010206.1A
Other languages
Chinese (zh)
Inventor
唐邦友
李星
黄家典
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/22 Position of source determined by co-ordinating a plurality of position lines defined by path-difference measurements

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention provides a sound source localization method, apparatus and system. The method includes: acquiring the signal of each microphone in a microphone array, where the microphone array is used to collect the sound of a sound source; obtaining, according to the framing of the signals, the controllable power responses of a plurality of microphone pairs formed by the microphones; obtaining the sum of the controllable power responses of the plurality of microphone pairs; determining the direction angle between the sound source and the microphone array according to the maximum value of that sum; and determining the relative positional relationship between the microphone array and the sound source according to the direction angle. This solves the problems of low resolution and heavy computation in the controllable power response technique, and improves the real-time performance, stability and precision of sound source localization.

Description

Sound source positioning method, device and system
Technical Field
The invention relates to the field of communication, in particular to a method, a device and a system for positioning a sound source.
Background
In video conferencing, the camera needs to focus on the speaker in order to capture important information such as body language and facial expressions. When the speaker is outside the shooting range, the traditional approach is to rotate the camera manually with a remote control until the speaker is back in range. This is very inconvenient, especially when the speaker changes frequently, and the operating delay causes important information to be lost. A camera that tracks the speaker automatically overcomes these drawbacks and gives both parties to the conference a better experience.
A camera that tracks the speaker relies on sound source localization, and computing the sound source azimuth with a microphone array is the basic localization method. Besides product requirements and cost, the design of the microphone array is closely tied to the sound source localization algorithm: the topology, size and number of microphones all depend on the algorithm employed, and the two cannot be considered separately. The localization algorithm also largely determines the positional relationship between the microphone array and the camera. In short, a speaker-tracking camera device is closely tied to its sound source localization algorithm.
In the related art, one microphone-array localization method is steerable beamforming based on maximum output power (the controllable power response technique). This technique must select the direction of arrival from a set of discrete beamforming angles, so its resolution degrades significantly when the sound source is far away. In addition, beamforming is a nonlinear optimization problem that requires a global search, so the computation is heavy and real-time implementation is difficult. These disadvantages limit the application of the method.
No effective solution has yet been proposed for the low resolution and heavy computation of the controllable power response technique in the related art.
Disclosure of Invention
The invention provides a sound source positioning method, a sound source positioning device and a sound source positioning system, which are used for at least solving the problems of low resolution and large computation amount of a controllable power response technology in the related technology.
According to an aspect of the present invention, there is provided a sound source localization method, including:
acquiring signals of all microphones in a microphone array, wherein the microphone array is used for collecting sound of a sound source;
acquiring controllable power responses of a plurality of microphone pairs formed by the microphones according to the framing of the signals;
obtaining the sum of the controllable power responses of the plurality of microphone pairs, and determining the direction angle of the sound source and the microphone array according to the maximum value of the sum of the controllable power responses;
determining a relative positional relationship of the microphone array and the sound source from the directional angle.
Further, calculating controllable power responses of a plurality of microphone pairs formed by the microphones according to the framing of the signal comprises:
and selecting a voice frame in a signal of one microphone in the microphone array as a reference signal for voice activity detection, and calculating controllable power response of a plurality of microphone pairs formed by the microphones according to the reference signal.
Further, the microphone array includes:
in the same coordinate system plane, M microphones in the abscissa axial direction of the coordinate system plane and N microphones in the ordinate axial direction of the coordinate system plane form an L-shaped topological structure, and the M microphones and the N microphones are arranged at equal intervals;
a plurality of microphone pairs of the microphone array comprising: the M microphones form (M x (M-1))/2 pairs of microphones; the N microphones form (N x (N-1))/2 pairs of microphones.
Further, calculating a sum of the controllable power responses of the plurality of microphone pairs, determining a directional angle of the sound source with respect to the microphone array from a maximum of the sum of the controllable power responses comprises:
establishing a three-dimensional coordinate system from the microphone array, the three-dimensional coordinate system comprising: an X axis, a Y axis, a Z axis, an origin O and a sound source point P, wherein the M microphones in the abscissa axial direction lie on the X axis, and the N microphones in the ordinate axial direction lie on the Y axis;
calculating the time delay τ of each of the plurality of microphone pairs, where d is the spacing of the microphone pair on the coordinate axis, C is the speed of sound, θ is the angle, viewed from the positive Z axis, rotated counterclockwise from the X axis to the line segment OS, the point S is the projection of the sound source point P onto the XOY plane containing the X axis and the Y axis, and the line segment OS runs from the origin O to the point S;
calculating the sum E of the controllable power responses of the plurality of microphone pairs, where p and q are the index numbers of the microphones in a pair, M is the number of microphones on the X axis, and each term of the sum is the controllable power response of one microphone pair;
obtaining the direction angle θ at which E takes its maximum value:
θ=arg max E(θ).
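The expressions for τ and E are not legible in this text. Under the far-field (plane-wave) model that the description adopts later, one plausible reading is the following, where the symbol R_pq for the controllable power response of the pair (p, q), the pair separation d_pq, and the summation limits are assumptions rather than the patent's own notation, and θ is treated as the angle between the incoming wavefront direction and the array axis:

```latex
\tau_{pq}(\theta) = \frac{d_{pq}\,\cos\theta}{C},
\qquad
E(\theta) = \sum_{p=1}^{M-1} \sum_{q=p+1}^{M} R_{pq}\bigl(\tau_{pq}(\theta)\bigr),
\qquad
\hat{\theta} = \arg\max_{\theta} E(\theta)
```

With M microphones on the axis the double sum runs over M(M-1)/2 pairs, which matches the pair count stated for the array.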
Further, determining the relative positional relationship of the microphone array and the sound source according to the direction angle includes:
calculating the pitch angle γ of the sound source in the three-dimensional coordinate system according to the direction angle θ:
γ=arctan(a)
where λ is the angle between the directed line segment OP and the positive direction of the Z axis, and the line segment OP runs from the origin O to the sound source point P.
Further, calculating the direction angle θ and the pitch angle γ of the sound source in the three-dimensional coordinate system includes:
dividing the angular range of 0 degrees to 180 degrees about the X axis or the Y axis into H intervals; within a predetermined number of frames, counting the number of times the direction angle θ and the pitch angle γ fall into each interval; selecting the interval with the largest count; and averaging the direction angles θ and the pitch angles γ in that interval, respectively, to obtain the direction angle θ and the pitch angle γ of the sound source in the three-dimensional coordinate system, wherein H is a positive integer.
According to another aspect of the present invention, there is also provided a sound source localization apparatus including:
a first obtaining module, configured to acquire signals of all microphones in a microphone array, wherein the microphone array is used for collecting sound of a sound source;
a second obtaining module, configured to obtain controllable power responses of a plurality of microphone pairs formed by the microphones according to the framing of the signals;
a third obtaining module, configured to obtain a sum of the controllable power responses of the plurality of microphone pairs, and determine a direction angle between the sound source and the microphone array according to a maximum value of the sum of the controllable power responses;
a position module, configured to determine the relative positional relationship between the microphone array and the sound source according to the direction angle.
Further, the second obtaining module includes:
a reference unit, configured to select a voice frame in a signal of one microphone in the microphone array as a reference signal for voice activity detection, and calculate controllable power responses of a plurality of microphone pairs formed by the microphones according to the reference signal.
According to another aspect of the present invention, there is also provided a sound source localization system including:
a microphone array control unit, and a camera, wherein,
the microphone array control unit is used for acquiring signals of all microphones in a microphone array, wherein the microphone array is used for acquiring sound of a sound source; acquiring controllable power responses of a plurality of microphone pairs formed by the microphones according to the framing of the signals; obtaining the sum of the controllable power responses of the plurality of microphone pairs, and determining the direction angle of the sound source and the microphone array according to the maximum value of the sum of the controllable power responses; determining a relative positional relationship of the microphone array and the sound source according to the direction angle; sending the relative position relation to the camera;
the camera is configured to adjust its position according to the relative positional relationship.
Further, the microphone array is realized by the following steps: in the same coordinate system plane, M microphones in the abscissa axial direction of the coordinate system plane and N microphones in the ordinate axial direction of the coordinate system plane form an L-shaped topological structure, and the M microphones and the N microphones are arranged at equal intervals;
a plurality of microphone pairs of the microphone array comprising: the M microphones form (M x (M-1))/2 pairs of microphones; the N microphones form (N x (N-1))/2 pairs of microphones.
According to the invention, the signals of each microphone in the microphone array are acquired, wherein the microphone array is used for acquiring the sound of a sound source, the controllable power responses of a plurality of microphone pairs formed by each microphone are acquired according to the framing of the signals, the sum of the controllable power responses of the plurality of microphone pairs is acquired, the direction angle of the sound source and the microphone array is determined according to the maximum value of the sum of the controllable power responses, and the relative position relationship between the microphone array and the sound source is determined according to the direction angle, so that the problems of low resolution and large computation amount of the controllable power response technology are solved, and the real-time performance, the stability and the precision of sound source positioning are improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
Fig. 1 is a flowchart of a method of positioning a sound source according to an embodiment of the present invention;
Fig. 2 is a block diagram showing a structure of a sound source localization apparatus according to an embodiment of the present invention;
Fig. 3 is a block diagram of a sound source localization system according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a trackable speaker camera system according to a preferred embodiment of the present invention;
Fig. 5 is a schematic flow chart of a sound source localization algorithm according to a preferred embodiment of the present invention;
Fig. 6 is a schematic diagram of a three-dimensional coordinate system model of a microphone array in accordance with an embodiment of the invention;
Fig. 7 is a schematic diagram of the relationship between the horizontal deflection angle and the pitch angle according to a preferred embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
In the present embodiment, a method for positioning a sound source is provided, and fig. 1 is a flowchart of a method for positioning a sound source according to an embodiment of the present invention, as shown in fig. 1, the flowchart includes the following steps:
step S102, acquiring signals of each microphone in a microphone array, wherein the microphone array is used for collecting sound of a sound source;
step S104, acquiring controllable power responses of a plurality of microphone pairs formed by each microphone according to the framing of the signal;
step S106, obtaining the sum of the controllable power responses of the plurality of microphone pairs, and determining the direction angle of the sound source and the microphone array according to the maximum value of the sum of the controllable power responses;
step S108, determining the relative position relationship between the microphone array and the sound source according to the direction angle.
Through the steps, the signals of all the microphones in the microphone array are obtained, wherein the microphone array is used for collecting the sound of a sound source, the controllable power responses of a plurality of microphone pairs formed by all the microphones are obtained according to the framing of the signals, the sum of the controllable power responses of the plurality of microphone pairs is obtained, the direction angle of the sound source and the microphone array is determined according to the maximum value of the sum of the controllable power responses, and the relative position relation between the microphone array and the sound source is determined according to the direction angle.
In this embodiment, calculating the controllable power response of the plurality of microphone pairs formed by the microphones according to the framing of the signal includes:
and selecting a voice frame in a signal of one microphone in the microphone array as a reference signal for voice activity detection, and calculating controllable power response of a plurality of microphone pairs formed by each microphone according to the reference signal.
Wherein the microphone array comprises:
in the same coordinate system plane, M microphones on the abscissa axial direction of the coordinate system plane and N microphones on the ordinate axial direction of the coordinate system plane form an L-shaped topological structure, and the M microphones and the N microphones are arranged at equal intervals;
the plurality of microphone pairs of the microphone array includes: the M microphones form (M x (M-1))/2 pairs of microphones; the N microphones form (N x (N-1))/2 pairs of microphones.
In this embodiment, a three-dimensional coordinate system is established according to the microphone array, the three-dimensional coordinate system including: an X axis, a Y axis, a Z axis, an origin O and a sound source point P, wherein the M microphones in the abscissa axial direction lie on the X axis, and the N microphones in the ordinate axial direction lie on the Y axis;
calculating the time delay τ of each of the plurality of microphone pairs, where d is the spacing of the microphone pair on the coordinate axis, C is the speed of sound, θ is the angle, viewed from the positive Z axis, rotated counterclockwise from the X axis to the line segment OS, the point S is the projection of the sound source point P onto the XOY plane containing the X axis and the Y axis, and the line segment OS runs from the origin O to the point S;
calculating the sum E of the controllable power responses of the plurality of microphone pairs, where p and q are the index numbers of the microphones in a pair, M is the number of microphones on the X axis, and each term of the sum is the controllable power response of one microphone pair;
obtaining the direction angle θ at which E takes its maximum value:
θ=arg max E(θ).
calculating the pitch angle γ of the sound source in the three-dimensional coordinate system according to the direction angle θ:
γ=arctan(a)
where λ is the angle between the directed line segment OP and the positive direction of the Z axis, and the line segment OP runs from the origin O to the sound source point P.
In this embodiment, calculating the direction angle θ and the pitch angle γ of the sound source in the three-dimensional coordinate system includes:
dividing the angular range of 0 degrees to 180 degrees about the X axis or the Y axis into H intervals; within a preset number of frames, counting the number of times the direction angle θ and the pitch angle γ fall into each interval; selecting the interval with the largest count; and averaging the direction angles θ and the pitch angles γ in that interval, respectively, to obtain the direction angle θ and the pitch angle γ of the sound source in the three-dimensional coordinate system, wherein H is a positive integer.
In this embodiment, a sound source positioning device is further provided, and the device is used to implement the foregoing embodiments and preferred embodiments, which have already been described and are not described again. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 2 is a block diagram showing a structure of a sound source localization apparatus according to an embodiment of the present invention, as shown in fig. 2, the apparatus including:
a first obtaining module 22, configured to obtain signals of respective microphones in a microphone array, where the microphone array is configured to collect sound of a sound source;
a second obtaining module 24, connected to the first obtaining module 22, for obtaining controllable power responses of a plurality of microphone pairs formed by the microphones according to the framing of the signal;
a third obtaining module 26, connected to the second obtaining module 24, for obtaining a sum of the controllable power responses of the plurality of microphone pairs, and determining a direction angle between the sound source and the microphone array according to a maximum value of the sum of the controllable power responses;
and a position module 28 connected to the third obtaining module 26 for determining a relative position relationship between the microphone array and the sound source according to the direction angle.
With the above device, the first obtaining module 22 obtains signals of each microphone in a microphone array, where the microphone array is used to collect sound of a sound source, the second obtaining module 24 obtains controllable power responses of a plurality of microphone pairs formed by each microphone according to framing of the signals, the third obtaining module 26 obtains a sum of the controllable power responses of the plurality of microphone pairs, determines a direction angle between the sound source and the microphone array according to a maximum value of the sum of the controllable power responses, and the position module 28 determines a relative position relationship between the microphone array and the sound source according to the direction angle, so as to solve the problems of low resolution and large computation amount in the controllable power response technology, and improve real-time performance, stability and accuracy of sound source positioning.
The second obtaining module 24 includes:
a reference unit, configured to select a voice frame in a signal of one microphone in the microphone array as a reference signal for voice activity detection, and calculate controllable power responses of a plurality of microphone pairs formed by the microphones according to the reference signal.
Fig. 3 is a block diagram showing a configuration of a sound source localization system according to an embodiment of the present invention, and as shown in fig. 3, the system includes:
a microphone array control unit 32, and a camera 34, wherein,
the microphone array control unit 32 is configured to acquire signals of respective microphones in a microphone array, where the microphone array is configured to collect sound of a sound source; acquiring controllable power responses of a plurality of microphone pairs formed by each microphone according to the framing of the signal; acquiring the sum of the controllable power responses of the plurality of microphone pairs, and determining the direction angle of the sound source and the microphone array according to the maximum value of the sum of the controllable power responses; determining the relative position relationship between the microphone array and the sound source according to the direction angle; sending the relative positional relationship to the camera 34;
the camera 34 is configured to adjust a position of the camera 34 according to the relative position relationship.
Further, the microphone array is realized by the following method: in the same coordinate system plane, M microphones on the abscissa axial direction of the coordinate system plane and N microphones on the ordinate axial direction of the coordinate system plane form an L-shaped topological structure, and the M microphones and the N microphones are arranged at equal intervals;
the plurality of microphone pairs of the microphone array includes: the M microphones form (M x (M-1))/2 pairs of microphones; the N microphones form (N x (N-1))/2 pairs of microphones.
The present invention will be described in detail with reference to preferred examples and embodiments.
The preferred embodiment of the invention provides a simple, practical and highly reliable camera device for tracking the speaker, and meanwhile, some improvement measures are taken aiming at the defects of the existing sound source positioning algorithm, so that the real-time property, the stability and the precision of sound source positioning are improved.
Fig. 4 is a schematic diagram of a camera system for tracking a speaker according to a preferred embodiment of the present invention, as shown in fig. 4, the apparatus comprising:
an omnidirectional microphone array 42 (corresponding to the microphone array control unit 32) and a camera 44 (corresponding to the camera 34). The preferred embodiment of the present invention builds the array 42 from a plurality of high-sensitivity omnidirectional microphones; as shown in fig. 4, all microphones of the array 42 lie in the same plane. M microphones in a horizontal row and N microphones in a vertical row form an L-shaped topological structure, and the microphones in each row are arranged at equal intervals. The camera 44 is placed within the 90-degree angular region formed by the horizontal and vertical rows of microphones.
The preferred embodiment of the present invention calculates the horizontal deflection angle of the sound source with the sound source localization algorithm from the data collected by the horizontal row of microphones, and calculates the pitch angle of the sound source from the data collected by the vertical row of microphones in combination with the horizontal deflection angle.
For the sound source localization algorithm, the preferred embodiment of the invention uses a far-field model of the sound field and provides a plane-search-based controllable power response (Steered Power Response, SPR) sound source localization technique. The algorithm comprises the following steps:
Step one, establishing a microphone array model in a three-dimensional coordinate system and determining the position of each microphone in the coordinate system. The abscissa axis carries M microphones, which form M(M-1)/2 pairs of microphones; the ordinate axis carries N microphones, which form N(N-1)/2 pairs of microphones; M and N are both integers greater than 1.
Step two, sampling the signal received by each microphone to obtain a digital signal, and processing it frame by frame.
Step three, selecting the data of one microphone for voice activity detection (VAD), distinguishing voice frames from noise frames, and processing only the voice frames in the following steps. This step can greatly increase the accuracy of the algorithm.
Step four, overlapping and windowing the voice frames of each microphone, using a Hamming window with a window length of 1024, and applying the discrete Fourier transform (DFT).
Step five, calculating the controllable power response of each microphone pair.
(501) Let the sound source signal s(n) arrive at microphone p and microphone q at times τ_p and τ_q, respectively, and calculate the power of the signals of microphone p and microphone q after delay compensation:
wherein,
(502) In order to reduce the influence of ambient noise and reverberation on the controllable power response, the amplitude is normalized in the frequency domain (PHAT weighting) and only the phase information is retained, so that the following expression is obtained:
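PHAT weighting as described in (502) can be sketched for a single microphone pair as follows. This is a minimal illustration of the generic GCC-PHAT technique using NumPy, not the patent's own expression; the function name `gcc_phat`, the small constant `eps`, and the centring of lag zero in the output are assumptions made for the example.

```python
import numpy as np

def gcc_phat(x_p, x_q, n_fft=1024, eps=1e-12):
    """PHAT-weighted cross-correlation of two frames.

    Index k of the returned array corresponds to a lag of
    k - n_fft // 2 samples (positive when x_p lags x_q).
    """
    X_p = np.fft.rfft(x_p, n=n_fft)
    X_q = np.fft.rfft(x_q, n=n_fft)
    cross = X_p * np.conj(X_q)
    cross /= np.abs(cross) + eps       # drop the magnitude, keep only the phase
    r = np.fft.irfft(cross, n=n_fft)   # back to the lag domain
    return np.fft.fftshift(r)          # move lag 0 to the centre
```

Because only phase is kept, the correlation of a signal with a delayed copy of itself collapses to a sharp peak at the delay, which is what makes the later angle search less sensitive to the spectrum of the speech.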
Step six, interpolating the controllable power response.
In a far-field model the microphone spacing is relatively short, and a higher sampling rate improves the accuracy of the direction angle estimate; to improve the accuracy further, the cross-correlation function needs to be interpolated. The cross-correlation function values of the selected microphone pair within the maximum time-delay range are interpolated by a factor of ten.
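The text does not say how the tenfold interpolation is carried out. One common option, shown here purely as an assumption, is to zero-pad the cross-spectrum before the inverse FFT, which resamples the cross-correlation on a grid ten times finer. The function name `interpolated_lags` and its parameters are hypothetical.

```python
import numpy as np

def interpolated_lags(cross_spectrum, n_fft, max_lag, interp=10):
    """Upsample a cross-correlation by `interp` via spectral zero-padding.

    cross_spectrum: one-sided spectrum of length n_fft // 2 + 1.
    max_lag: largest physically possible delay in original samples (>= 1).
    Returns (lags in original samples, correlation values at those lags).
    """
    n_up = interp * n_fft
    # irfft pads the spectrum with zeros up to n_up // 2 + 1 bins
    r = np.fft.irfft(cross_spectrum, n=n_up) * interp
    k = max_lag * interp
    # negative lags sit at the end of the circular correlation
    r_win = np.concatenate((r[-k:], r[:k + 1]))
    lags = np.arange(-k, k + 1) / interp
    return lags, r_win
```

Restricting the output to the maximum time-delay range keeps the later search small: for adjacent microphones 8 cm apart sampled at 48000 Hz, the delay cannot exceed roughly 11 samples if the speed of sound is taken as about 343 m/s.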
Step seven, searching for the maximum controllable power response over a semicircle, specifically as follows:
(701) Calculate the time delay of each microphone pair,
where d is the spacing of the microphone pair on the coordinate axis, and C is the speed of sound.
(702) Sum the controllable power responses of all microphone pairs.
(703) Find the direction angle θ at which E takes its maximum value:
θ=arg max E(θ)
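Steps (701) to (703) can be sketched as a one-dimensional grid search. The sketch assumes the far-field delay d·cos θ / C, a speed of sound of 343 m/s, a 1-degree search grid, and per-pair correlation curves already available as (lags, values) arrays; the function name `steer_search` and the data layout are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for C

def steer_search(pair_corr, spacing, fs, n_angles=181):
    """Return the angle in degrees that maximises the summed response E(theta).

    pair_corr maps (p, q) microphone indices on one axis to a tuple
    (lags_in_samples, correlation_values) with increasing lags.
    spacing is the distance between adjacent microphones in metres.
    """
    thetas = np.linspace(0.0, np.pi, n_angles)     # semicircle, 0..180 degrees
    energy = np.zeros(n_angles)
    for (p, q), (lags, values) in pair_corr.items():
        d = (q - p) * spacing                      # separation of this pair
        tau = d * np.cos(thetas) / SPEED_OF_SOUND * fs   # (701) delay in samples
        energy += np.interp(tau, lags, values)     # (702) accumulate E(theta)
    return float(np.degrees(thetas[np.argmax(energy)]))  # (703) arg max
```

Searching a single angle over a semicircle, rather than a full three-dimensional grid, is what keeps the number of evaluations per frame small.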
Step eight, from the horizontal deflection angle θ of the sound source and the angle λ between OP' and OZ obtained in steps five to seven, obtain the pitch angle of the sound source:
γ=arctan(a)
Step nine, calculating the statistical averages of θ and γ.
The range from 0 degrees to 180 degrees is divided evenly into H intervals. Over 30 frames, the number of times θ and γ fall into each interval is counted, the interval with the highest count is selected, and the values in it are averaged; the resulting averages are the horizontal deflection angle and the pitch angle of the sound source.
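The interval-voting average of step nine can be sketched as below for one angle at a time (the same routine would be applied to θ and to γ). The function name `histogram_average` is hypothetical, and the number of intervals H is left as a parameter because the text does not give a value.

```python
import numpy as np

def histogram_average(angles_deg, n_bins):
    """Split 0..180 degrees into n_bins equal intervals, pick the interval
    that receives the most per-frame estimates, and return the mean of the
    estimates that fall inside it."""
    angles = np.asarray(angles_deg, dtype=float)
    counts, edges = np.histogram(angles, bins=n_bins, range=(0.0, 180.0))
    best = int(np.argmax(counts))
    lower_ok = angles >= edges[best]
    # np.histogram treats only the last interval as closed on the right
    if best == n_bins - 1:
        upper_ok = angles <= edges[best + 1]
    else:
        upper_ok = angles < edges[best + 1]
    return float(angles[lower_ok & upper_ok].mean())
```

For example, with 10-degree intervals the estimates 61, 63, 65, 64, 62, 120 and 10 yield 63.0: the two outliers land in other intervals and do not affect the result.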
Fig. 4 is a schematic diagram of a camera system capable of tracking a speaker in an embodiment of the present invention. In the schematic diagram, the horizontal and vertical rows of microphones form an L-shaped topological structure; all microphones face the same direction and lie in the same plane. There are five microphones in the horizontal row and four in the vertical row, and adjacent microphones are 8 cm apart. The microphone spacing of the present invention is not limited to the 8 cm given in this embodiment; other spacings, selected according to the specific implementation requirements, may be used as alternatives of the present invention. In the schematic diagram, the camera is located directly above microphone No. 2 (Mic2); any position within the 90-degree angular region formed by the horizontal and vertical rows of microphones can likewise be used as an alternative of the invention.
Fig. 5 is a flow chart of a sound source localization algorithm according to a preferred embodiment of the present invention, and as shown in fig. 5, the sound source localization algorithm proposed by the preferred embodiment of the present invention comprises the following steps:
Step S501: Fig. 6 is a schematic diagram of a three-dimensional coordinate system model of the microphone array according to an embodiment of the present invention. As shown in Fig. 6, a microphone array model is established in the three-dimensional coordinate system to determine the position of each microphone. In this embodiment, the abscissa axis carries 5 microphones, which form 10 microphone pairs; the ordinate axis carries 4 microphones, which form 6 microphone pairs.
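The pair counts in step S501 can be checked with a short sketch. It assumes, for illustration only, that microphone i on an axis sits at coordinate i times the 8 cm spacing; the names `axis_pairs` and `SPACING_M` are hypothetical and do not come from the patent.

```python
from itertools import combinations

SPACING_M = 0.08  # 8 cm spacing from the embodiment


def axis_pairs(num_mics, spacing=SPACING_M):
    """Return (p, q, distance) for every microphone pair on one axis.

    Assumption: microphone i sits at coordinate i * spacing on the axis.
    """
    positions = [i * spacing for i in range(num_mics)]
    return [(p, q, positions[q] - positions[p])
            for p, q in combinations(range(num_mics), 2)]


x_pairs = axis_pairs(5)  # horizontal row: 5 microphones
y_pairs = axis_pairs(4)  # vertical row: 4 microphones
print(len(x_pairs), len(y_pairs))  # 10 6
```

Each axis with K microphones yields K(K-1)/2 pairs, which matches the 10 and 6 pairs stated in the embodiment.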
Step S502: the signal received by each microphone is sampled at 48000 Hz to obtain a digital signal, which is then divided into frames of 20 ms. A longer frame gives higher estimation accuracy but significantly increases the amount of computation, so the frame length is limited to 20 ms.
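As a minimal sketch of the framing in step S502: at 48000 Hz a 20 ms frame holds 960 samples. The helper `split_frames` is a hypothetical name, and dropping a trailing partial frame is an assumption rather than something the patent specifies.

```python
import numpy as np

FS = 48000                          # sampling rate from the embodiment, Hz
FRAME_MS = 20                       # frame length from the embodiment, ms
FRAME_LEN = FS * FRAME_MS // 1000   # 960 samples per frame


def split_frames(signal, frame_len=FRAME_LEN):
    """Split a 1-D signal into consecutive frames, dropping a trailing partial frame."""
    n_frames = len(signal) // frame_len
    return np.reshape(signal[:n_frames * frame_len], (n_frames, frame_len))


one_second = np.zeros(FS)
frames = split_frames(one_second)
print(frames.shape)  # (50, 960)
```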
Step S503, selecting data of one microphone to perform Voice Activity Detection (VAD), distinguishing voice frames from noise frames, and only processing the voice frames in the following steps. Since noise degrades the performance of the algorithm, only speech frames are selected for processing, which can greatly improve the robustness of the algorithm.
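The patent does not say which VAD method is used. The sketch below assumes a simple energy threshold against a known noise floor, purely to illustrate how the frames of one reference microphone could gate the later steps; `is_speech`, `select_speech_frames`, and the `ratio` parameter are hypothetical.

```python
import numpy as np


def is_speech(frame, noise_floor, ratio=4.0):
    """Energy-based VAD (assumed method): speech if mean power exceeds ratio * noise_floor."""
    power = float(np.mean(np.square(frame)))
    return power > ratio * noise_floor


def select_speech_frames(ref_frames, noise_floor, ratio=4.0):
    """Return the indices of the reference-microphone frames classified as speech."""
    return [i for i, frame in enumerate(ref_frames)
            if is_speech(frame, noise_floor, ratio)]
```

The returned indices would then be applied to the frames of every microphone, so that all channels process the same speech frames.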
Step S504: the speech frames of each microphone are overlapped and windowed; the invention uses a Hamming window of length 1024, after which the DFT is computed:
The DFT is the most computationally expensive part of the algorithm. For this reason, an efficient split-radix FFT is used to compute the DFT, which greatly reduces the amount of computation.
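A minimal sketch of the windowing and transform in step S504. It assumes that the 1024-sample window covers the 960 new samples of a frame plus the last 64 samples of the previous frame; the patent does not state the overlap explicitly. `numpy.fft.rfft` stands in for the split-radix FFT named in the text, and `windowed_spectrum` is a hypothetical helper.

```python
import numpy as np

N_FFT = 1024     # Hamming window length from the embodiment
FRAME_LEN = 960  # 20 ms at 48 kHz


def windowed_spectrum(signal, frame_index, n_fft=N_FFT, hop=FRAME_LEN):
    """One-sided DFT of the Hamming-windowed block that ends with frame `frame_index`.

    Assumption: the block holds the frame's 960 samples preceded by the last
    64 samples of the previous frame (zeros at the start of the signal).
    """
    end = (frame_index + 1) * hop
    start = end - n_fft
    block = np.zeros(n_fft)
    src = signal[max(start, 0):end]
    block[n_fft - len(src):] = src          # right-align the available samples
    return np.fft.rfft(block * np.hamming(n_fft))
```

For a real input of length 1024 the one-sided spectrum has 513 bins, spaced 48000 / 1024 = 46.875 Hz apart.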
Step S505: calculate the controllable power response of each microphone pair.
(5051) Let the sound source signal s(n) arrive at microphone p and microphone q at times τ_p and τ_q, respectively, and calculate the power of the signals of microphone p and microphone q after time-domain alignment:
where X_p(k) X_q(k)* is the cross-power spectrum of x_p(n) and x_q(n).
(5052) To reduce the influence of ambient noise and reverberation on the controllable power response, the amplitude is normalized in the frequency domain (PHAT weighting) so that only the phase information is retained, which gives the following expression:
When noise is ignored, x_p(n) = s(n − τ_p). Applying the FFT gives:
Therefore,
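As a hedged illustration of step S505, the sketch below uses the standard GCC-PHAT formulation: the cross-power spectrum X_p(k) X_q(k)* is normalized to unit magnitude and transformed back to the lag domain. This is an assumption about the form of the response, not a transcription of the patent's equations; the function name `gcc_phat` and the zero-padding length are illustrative choices.

```python
import numpy as np


def gcc_phat(x_p, x_q, max_lag):
    """Return (lags, r), where r[i] is the PHAT-weighted cross-correlation at lags[i].

    With this sign convention r peaks at lag = tau_p - tau_q (in samples),
    the arrival-time difference between microphone p and microphone q.
    max_lag must be at least 1.
    """
    n = len(x_p) + len(x_q)             # zero-pad to avoid circular wrap-around
    X_p = np.fft.rfft(x_p, n=n)
    X_q = np.fft.rfft(x_q, n=n)
    cross = X_p * np.conj(X_q)          # cross-power spectrum
    cross /= np.abs(cross) + 1e-12      # PHAT weighting: keep the phase only
    r_full = np.fft.irfft(cross, n=n)
    # Negative lags are stored at the end of r_full; reorder to -max_lag..max_lag.
    r = np.concatenate((r_full[-max_lag:], r_full[:max_lag + 1]))
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, r
```

When x_p(n) = s(n − τ_p) and x_q(n) = s(n − τ_q), the normalized cross-spectrum reduces to a pure phase term in τ_p − τ_q, so the lag-domain response concentrates in a sharp peak at that delay.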
Step S506: interpolate the controllable power response.
In a far-field model the microphone spacing is relatively small, and a higher sampling rate improves the accuracy of the direction angle estimate. To improve the accuracy further, the cross-correlation function is interpolated: for each microphone pair, the cross-correlation values within the maximum time-delay range are interpolated by a factor of ten.
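The patent does not name the interpolation method. One common choice, assumed here, is band-limited interpolation: the PHAT-weighted cross-spectrum is inverse-transformed onto a lag grid ten times finer than the sample period by zero-padding the spectrum. `interpolated_gcc_phat` is a hypothetical name.

```python
import numpy as np


def interpolated_gcc_phat(x_p, x_q, max_lag, factor=10):
    """PHAT cross-correlation on a lag grid `factor` times finer than one sample.

    Returns (lags, r) with lags in fractional samples covering -max_lag..max_lag.
    max_lag must be at least 1.
    """
    n = len(x_p) + len(x_q)
    cross = np.fft.rfft(x_p, n=n) * np.conj(np.fft.rfft(x_q, n=n))
    cross /= np.abs(cross) + 1e-12
    # Requesting factor * n output points zero-pads the spectrum,
    # which interpolates the correlation between the original lags.
    r_full = np.fft.irfft(cross, n=factor * n)
    m = max_lag * factor
    r = np.concatenate((r_full[-m:], r_full[:m + 1]))
    lags = np.arange(-m, m + 1) / factor
    return lags, r
```

With 8 cm spacing and 48000 Hz sampling, adjacent microphones differ by at most about 11 samples of delay (assuming a speed of sound near 343 m/s), so the finer grid gives noticeably more distinct delay values across the semicircle.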
Step S507: search for the maximum controllable power response within the semicircular range, as follows:
(5071) Calculate the time delay of each microphone pair:
$$\tau = \frac{R}{C} = \frac{d\cos\theta}{C}$$
where d is the spacing of the microphone pair on the coordinate axis and C is the speed of sound.
(5072) Sum the controllable power responses of all microphone pairs:
$$E = \sum_{p=1}^{M}\sum_{q=p+1}^{M} R_{x_p,x_q}(\tau)$$
(5073) Find the direction angle θ at which E takes its maximum value:
$$\theta = \arg\max_{\theta} E(\theta)$$
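The search in step S507 can be sketched for the microphones on one axis as follows. The sketch assumes a speed of sound of 343 m/s, a one-degree search grid, microphone i located at i times the spacing, and a PHAT-weighted pair response evaluated directly at fractional delays; all function names are hypothetical.

```python
import numpy as np

C = 343.0   # speed of sound in m/s (assumed value)
FS = 48000  # sampling rate from the embodiment, Hz


def pair_response(x_p, x_q, tau_samples):
    """PHAT-weighted response of one pair at the (fractional) lags in tau_samples."""
    n = len(x_p) + len(x_q)
    cross = np.fft.rfft(x_p, n=n) * np.conj(np.fft.rfft(x_q, n=n))
    cross /= np.abs(cross) + 1e-12
    k = np.arange(len(cross))
    # Evaluate the inverse transform directly at each requested lag.
    phase = np.exp(2j * np.pi * np.outer(tau_samples, k) / n)
    return (phase @ cross).real


def search_direction(signals, spacing, angles_deg=np.arange(0, 181)):
    """Return the angle in degrees that maximizes E(theta) for one axis.

    signals: array of shape (num_mics, num_samples); microphone i is assumed
    to sit at i * spacing along the axis.
    """
    cos_theta = np.cos(np.deg2rad(angles_deg))
    energy = np.zeros(len(angles_deg))
    num_mics = len(signals)
    for p in range(num_mics):
        for q in range(p + 1, num_mics):
            d = (q - p) * spacing
            tau = d * cos_theta / C * FS   # tau = d cos(theta) / C, in samples
            energy += pair_response(signals[p], signals[q], tau)
    return angles_deg[int(np.argmax(energy))]
```

Because cos θ is monotonic between 0 and 180 degrees, each candidate angle maps to a distinct delay per pair, which is why a linear array resolves the direction only within a semicircle.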
Step S508: Fig. 7 is a schematic diagram of the relationship between the horizontal deflection angle and the pitch angle according to the preferred embodiment of the present invention. As shown in Fig. 7, the horizontal deflection angle θ of the sound source and the angle λ between OP' and OZ are obtained through steps S505 to S507, and the pitch angle of the sound source is then derived as follows:
The coordinates of the sound source P are expressed in polar coordinates as:
and since
the pitch angle is obtained as:
$$\tan\gamma = \tan\lambda\,\sin\theta = a$$
$$\gamma = \arctan(a)$$
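A short numerical sketch of the pitch angle computation, using the relation tan γ = tan λ · sin θ = a stated in claim 5. The function name and the use of degrees are illustrative choices; λ = 90 degrees is excluded because tan λ is undefined there.

```python
import math


def pitch_angle_deg(theta_deg, lambda_deg):
    """gamma = arctan(tan(lambda) * sin(theta)), with all angles in degrees."""
    a = math.tan(math.radians(lambda_deg)) * math.sin(math.radians(theta_deg))
    return math.degrees(math.atan(a))
```

For example, with θ = 90 degrees the sine factor is 1, so γ equals λ; as θ approaches 0 the pitch angle approaches 0 regardless of λ.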
Step S509: calculate the statistical average of θ and γ.
The angular range from 0 degrees to 180 degrees is divided evenly into H intervals. Over 30 frames, the number of times θ and γ fall into each interval is counted, the interval with the highest count is selected, and the values in that interval are averaged. The resulting averages are the horizontal deflection angle and the pitch angle of the sound source. Until 30 frames have accumulated, the previous statistical result is output for the current frame; once 30 frames are reached, the newly computed angles are output. This step reduces the effect of external interference and the number of camera rotations.
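The statistic in step S509 can be sketched as a histogram vote followed by an average over the winning interval. The value of H is not given in the text, so the example call uses H = 18 (10-degree intervals) as a hypothetical choice; the function would be applied to θ and γ separately, and it assumes all angles lie in the range 0 to 180 degrees.

```python
import numpy as np


def dominant_interval_mean(angles_deg, num_intervals):
    """Average the angles falling in the most populated of H equal intervals over [0, 180]."""
    angles = np.asarray(angles_deg, dtype=float)
    width = 180.0 / num_intervals
    # Interval index of each angle; 180 degrees is folded into the last interval.
    idx = np.minimum((angles // width).astype(int), num_intervals - 1)
    counts = np.bincount(idx, minlength=num_intervals)
    best = int(np.argmax(counts))
    return float(angles[idx == best].mean())


# Per-frame estimates with two outliers (120 and 10 degrees).
theta_frames = [58, 60, 62, 61, 59, 120, 10]
print(dominant_interval_mean(theta_frames, 18))  # 61.0
```

Outliers such as the 120 and 10 degree estimates fall in sparsely populated intervals and do not influence the output, which is how this step suppresses short disturbances.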
The sound source positioning algorithm provided by the preferred embodiment of the invention can effectively improve the accuracy and stability of sound source positioning in noise and reverberation environments, and the microphone array camera device based on the algorithm can accurately track a speaker in real time and has good stability.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
It should be noted that the above modules may be implemented by software or by hardware. For the latter, this may be achieved in, but is not limited to, the following ways: the modules are all located in the same processor; or the modules are located in a plurality of processors respectively.
The embodiment of the invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program codes for executing the method steps of the above embodiment:
Optionally, the storage medium is further arranged to store program code for performing the method steps of the above embodiments:
Optionally, in this embodiment, the storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program code.
Optionally, in this embodiment, the processor executes the method steps of the above embodiments according to the program code stored in the storage medium.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments and optional implementation manners, and this embodiment is not described herein again.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for locating a sound source, comprising:
acquiring signals of all microphones in a microphone array, wherein the microphone array is used for collecting sound of a sound source;
acquiring controllable power responses of a plurality of microphone pairs formed by the microphones according to the framing of the signals;
obtaining the sum of the controllable power responses of the plurality of microphone pairs, and determining the direction angle of the sound source and the microphone array according to the maximum value of the sum of the controllable power responses;
determining a relative positional relationship of the microphone array and the sound source from the directional angle.
2. The method of claim 1, wherein acquiring the controllable power responses of the plurality of microphone pairs formed by the microphones according to the framing of the signals comprises:
selecting a voice frame in a signal of one microphone in the microphone array as a reference signal for voice activity detection, and calculating the controllable power responses of the plurality of microphone pairs formed by the microphones according to the reference signal.
3. The method of claim 1, wherein the microphone array comprises:
in the same coordinate system plane, M microphones in the abscissa axial direction of the coordinate system plane and N microphones in the ordinate axial direction of the coordinate system plane form an L-shaped topological structure, and the M microphones and the N microphones are arranged at equal intervals;
a plurality of microphone pairs of the microphone array comprising: the M microphones form (M x (M-1))/2 pairs of microphones; the N microphones form (N x (N-1))/2 pairs of microphones.
4. The method of claim 3, wherein calculating a sum of the controllable power responses of the plurality of microphone pairs, and wherein determining the directional angle of the sound source from the microphone array based on the maximum of the sum of the controllable power responses comprises:
establishing a three-dimensional coordinate system from the microphone array, the three-dimensional coordinate system comprising: the microphone array comprises an X axis, a Y axis, a Z axis, an origin O and a sound source point P, wherein the X axis is M microphones in the horizontal coordinate axial direction, and the Y axis is N microphones in the vertical coordinate axial direction;
calculating the time delays τ of the plurality of microphone pairs:
$$\tau = \frac{R}{C} = \frac{d\cos\theta}{C}$$
wherein d is the spacing of the microphone pair on the coordinate axis, C is the speed of sound, θ is the angle swept counterclockwise from the X axis to the line segment OS when viewed from the positive Z axis, the point S is the projection of the sound source point P onto the XOY plane containing the X axis and the Y axis, and the line segment OS runs from the origin O to the point S;
calculating a sum E of controllable power responses of the plurality of microphone pairs:
$$E = \sum_{p=1}^{M}\sum_{q=p+1}^{M} R_{x_p,x_q}(\tau)$$
wherein p and q are the indices of the microphones in a microphone pair, M is the number of microphones on the X axis, and
$$R_{x_p(n),x_q(n)}(\tau)$$
is the controllable power response of the microphone pair;
obtaining a direction angle θ at which E takes a maximum value:
θ=arg max E(θ).
5. The method of claim 4, wherein determining the relative positional relationship of the microphone array and the sound source according to the direction angle comprises:
calculating a pitch angle gamma of the sound source in the three-dimensional coordinate system according to the direction angle theta;
$$\tan\gamma = \tan\lambda\,\sin\theta = a$$
γ=arctan(a)
wherein λ is the angle between the directed line segment OP and the positive direction of the Z axis, and the line segment OP runs from the origin O to the sound source point P.
6. The method according to claim 5, wherein calculating the direction angle θ and the pitch angle γ of the sound source in the three-dimensional coordinate system comprises:
within a predetermined number of frames of the framing, dividing the angular range from 0 degrees to 180 degrees about the X axis or the Y axis into H intervals, counting the number of times the direction angle θ and the pitch angle γ fall into each interval, selecting the interval with the largest count, and averaging the direction angles θ and the pitch angles γ in that interval respectively, to obtain the direction angle θ and the pitch angle γ of the sound source in the three-dimensional coordinate system, wherein H is a positive integer.
7. An apparatus for locating a sound source, comprising:
a first acquisition module, configured to acquire signals of all microphones in a microphone array, wherein the microphone array is used for collecting sound of a sound source;
a second acquisition module, configured to acquire controllable power responses of a plurality of microphone pairs formed by the microphones according to the framing of the signals;
a third acquisition module, configured to obtain a sum of the controllable power responses of the plurality of microphone pairs, and determine a direction angle between the sound source and the microphone array according to a maximum value of the sum of the controllable power responses;
and a position module, configured to determine the relative positional relationship between the microphone array and the sound source according to the direction angle.
8. The apparatus of claim 7, wherein the second acquisition module comprises:
a reference unit, configured to select a voice frame in a signal of one microphone in the microphone array as a reference signal for voice activity detection, and calculate the controllable power responses of the plurality of microphone pairs formed by the microphones according to the reference signal.
9. A system for locating a sound source, comprising:
a microphone array control unit, and a camera, wherein,
the microphone array control unit is used for acquiring signals of all microphones in a microphone array, wherein the microphone array is used for acquiring sound of a sound source; acquiring controllable power responses of a plurality of microphone pairs formed by the microphones according to the framing of the signals; obtaining the sum of the controllable power responses of the plurality of microphone pairs, and determining the direction angle of the sound source and the microphone array according to the maximum value of the sum of the controllable power responses; determining a relative positional relationship of the microphone array and the sound source according to the direction angle; sending the relative position relation to the camera;
and the camera is used for adjusting the position of the camera according to the relative position relation.
10. The system of claim 9,
the microphone array comprises: in the same coordinate system plane, M microphones in the abscissa axial direction of the coordinate system plane and N microphones in the ordinate axial direction of the coordinate system plane form an L-shaped topological structure, and the M microphones and the N microphones are arranged at equal intervals;
a plurality of microphone pairs of the microphone array comprising: the M microphones form (M x (M-1))/2 pairs of microphones; the N microphones form (N x (N-1))/2 pairs of microphones.
CN201610010206.1A 2016-01-06 2016-01-06 The localization method of sound source, apparatus and system Withdrawn CN106950542A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610010206.1A CN106950542A (en) 2016-01-06 2016-01-06 The localization method of sound source, apparatus and system

Publications (1)

Publication Number Publication Date
CN106950542A true CN106950542A (en) 2017-07-14

Family

ID=59465498

Country Status (1)

Country Link
CN (1) CN106950542A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20170714
