CN114646920A - Sound source positioning method, device, equipment and storage medium

Info

Publication number: CN114646920A
Application number: CN202210248170.6A
Authority: CN
Inventors: 柯国富; 严体华; 吕智杰; 杨光
Original assignee: GHT CO Ltd
Current assignee: GHT CO Ltd
Priority date: 2022-03-14
Filing date: 2022-03-14
Publication date: 2022-06-21

Abstract

The invention discloses a sound source positioning method, which comprises the following steps: calculating the signal delay value of each relative microphone and the reference microphone according to the acquired real-time microphone signals; calculating the initial position of the real-time microphone signal according to the acquired spatial positions of all the relative microphones, the spatial position of the reference microphone, the preset sound velocity and all the signal time delay values; searching a local space to obtain the position of the maximum local response power, and taking the position of the maximum local response power as the sound source position of the real-time microphone signal; the center of the local space is an initial position, and the radius of the local space is a preset real-time searching radius. The invention also discloses a sound source positioning device, equipment and a storage medium. The embodiment of the invention can carry out preliminary positioning by utilizing the time delay among the microphone arrays, greatly reduces the initial range of space search, reduces the positioning calculation amount and is suitable for scenes with higher real-time requirements.

Description

Sound source positioning method, device, equipment and storage medium

Technical Field

The invention relates to the technical field of audio signal processing, in particular to a sound source positioning method, a sound source positioning device, sound source positioning equipment and a storage medium.

Background

Sound source positioning is important in fields such as video conferencing, smart homes, robotics, and fault detection and location. The prior art usually locates a sound source with a microphone array, for example with a sound source positioning algorithm based on steered response power with phase transform (SRP-PHAT). However, this algorithm is computationally expensive and places high demands on the computing capability of the equipment. An improved prior-art variant of SRP-PHAT therefore removes the accumulated cross-power spectrum components that do not contribute to the phase and replaces the full search over all frequency bands with a coarse-to-fine search over sub-bands. Although this reduces the calculation amount compared with the original scheme, the calculation amount is still large, so the method is not suitable for scenes with high real-time requirements.

Disclosure of Invention

Based on this, embodiments of the present invention provide a sound source positioning method, device, equipment and storage medium that perform preliminary positioning using the time delays between the microphones of an array, which greatly reduces the initial range of the spatial search, reduces the positioning calculation amount, and suits scenes with higher real-time requirements.

In order to achieve the above object, an embodiment of the present invention provides a sound source localization method, including:

calculating the signal time delay value between each relative microphone and the reference microphone according to the acquired real-time microphone signals; wherein the real-time microphone signals comprise real-time signals of the relative microphones and a real-time signal of the reference microphone, and the number of the relative microphones is at least 3;

calculating the initial position of the real-time microphone signal according to the acquired spatial positions of all the relative microphones, the spatial position of the reference microphone, a preset sound velocity and all the signal time delay values;

searching a local space to obtain the position of the maximum local response power, and taking the position of the maximum local response power as the sound source position of the real-time microphone signal; the center of the local space is the initial position, and the radius of the local space is a preset real-time searching radius.

As an improvement of the above scheme, the real-time search radius is obtained by:

searching a global space to obtain a maximum global response power and a global position corresponding to the maximum global response power, calculating a distance error between the global position and the initial position, and setting a power threshold according to the maximum global response power;

setting a search radius according to the distance error;

and taking the search radius as the real-time search radius until the maximum local response power obtained by real-time calculation is smaller than the power threshold value, and returning to the step of searching the global space.

As an improvement of the above solution, after the step of using the position of the maximum local response power as the sound source position of the real-time microphone signal, the method further includes:

and performing weighted calculation according to the acquired sound source positions of a plurality of recent historical microphone signals and the sound source position of the real-time microphone signal to obtain a final position, and updating the sound source position of the real-time microphone signal according to the final position.

As a refinement of the above, the final position is calculated by:

IF (||r_s(t) - r_s(t-1)||² > μ);

wherein r_Final(t) represents the final position, ω(t-i) represents the weight coefficient of the sound source position at the previous i-th time, r_s(t) denotes the sound source position of the real-time microphone signal, r_s(t-1) represents the sound source position at the previous time, and μ represents a preset distance threshold.

As an improvement of the above solution, before calculating the signal delay value of each relative microphone and reference microphone according to the acquired real-time microphone signal, the method further includes:

carrying out normalization processing on the real-time microphone signals;

and performing valid frame detection on the real-time microphone signal to delete invalid frames.

As an improvement of the above scheme, the calculating a signal delay value of each relative microphone and reference microphone according to the acquired real-time microphone signal specifically includes:

and calculating the signal delay value of each relative microphone and the reference microphone according to the acquired real-time microphone signals based on a GCC-PHAT algorithm.

As an improvement of the above scheme, the calculating an initial position of the real-time microphone signal according to the acquired spatial position of each relative microphone, a preset sound velocity, and all the signal delay values specifically includes:

acquiring a reference position of the reference microphone and relative positions of all the relative microphones, and setting a sound source position of the real-time microphone signal as an unknown position;

listing a solution equation set of the spatial time delay value between each relative microphone and the reference microphone according to the relative positions, the reference position, the unknown position and the acquired sound velocity;

establishing a sound source position calculation equation set by using the signal time delay values and the solution equation set of the spatial time delay values;

and solving the sound source position calculation equation set based on a minimum mean square error method to obtain the initial position of the real-time microphone signal.

In order to achieve the above object, an embodiment of the present invention further provides a sound source localization apparatus, including:

a signal delay value calculation module, configured to calculate the signal time delay value between each relative microphone and the reference microphone according to the acquired real-time microphone signals; wherein the real-time microphone signals comprise real-time signals of the relative microphones and a real-time signal of the reference microphone, and the number of the relative microphones is at least 3;

an initial position calculating module, configured to calculate an initial position of the real-time microphone signal according to the acquired spatial positions of all the relative microphones, the spatial position of the reference microphone, a preset sound velocity, and all the signal delay values;

a sound source position calculating module, configured to search a local space to obtain a position of a maximum local response power, and use the position of the maximum local response power as a sound source position of the real-time microphone signal; the center of the local space is the initial position, and the radius of the local space is a preset real-time searching radius.

To achieve the above object, an embodiment of the present invention further provides a sound source positioning device, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, and the processor, when executing the computer program, implements the sound source positioning method according to any of the above embodiments.

To achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, where when the computer program runs, the apparatus where the computer-readable storage medium is located is controlled to execute the sound source positioning method according to any one of the above embodiments.

Compared with the prior art, the sound source positioning method, device, equipment and storage medium disclosed by the embodiments of the invention acquire real-time microphone signals and calculate the signal time delay value between each relative microphone and the reference microphone; calculate the initial position of the real-time microphone signal from the signal time delay values, the acquired spatial positions of all the relative microphones and of the reference microphone, and the preset sound velocity; and search a local space, centered on the initial position and with the preset real-time search radius as its radius, taking the position of the maximum local response power as the sound source position of the real-time microphone signal. Because the search starts from an initial position determined from the time delays, the spatial positions of the microphones and the sound velocity, the initial range of the spatial search is greatly reduced, the positioning calculation amount is reduced, and the scheme is suitable for scenes with high real-time requirements.

Drawings

In order to more clearly illustrate the technical solution of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a schematic flow chart of a sound source localization method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of a microphone array model according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of a microphone pair and a sound source according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a sound source positioning device according to an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of a sound source localization apparatus according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 is a schematic flow chart of a sound source localization method according to an embodiment of the present invention. The sound source positioning method may be executed by a computing terminal, and the computing terminal is a computer, a tablet computer, or other devices, and is not limited herein.

Specifically, the sound source localization method includes the following steps:

S1, calculating the signal time delay value between each relative microphone and the reference microphone according to the acquired real-time microphone signals; wherein the real-time microphone signals comprise real-time signals of the relative microphones and a real-time signal of the reference microphone, and the number of the relative microphones is at least 3;

S2, calculating the initial position of the real-time microphone signal according to the acquired spatial positions of all the relative microphones, the spatial position of the reference microphone, the preset sound velocity and all the signal time delay values;

S3, searching a local space to obtain the position of the maximum local response power, and taking the position of the maximum local response power as the sound source position of the real-time microphone signal; the center of the local space is the initial position, and the radius of the local space is a preset real-time searching radius.

Specifically, as an example, the microphone array is composed of 4 microphones: one is selected as the reference microphone and the other 3 as relative microphones. The microphone array collects audio signals in real time to obtain the real-time microphone signals, and the signal time delay value between each relative microphone and the reference microphone is calculated from these signals, that is, the delays between the microphones are estimated with a signal-based method. Referring to the microphone array model shown in Fig. 2, M represents the positions of the microphones (4 microphones in the figure), and the sound source is assumed to be at distance D from the center of the array elements, with azimuth angle α and pitch angle β. Because the time delay of the received signals between microphones depends on the distance from each microphone to the sound source and on the propagation speed, the initial position of the real-time microphone signal is calculated from the spatial positions of the microphones, the preset sound velocity (the sound velocity in the medium in which the audio signal propagates) and the signal time delay values. The initial position is then taken as the center of the local space and the preset real-time search radius as its radius; the space near the initial position is searched for the local maximum of the response power, and that point is taken as the sound source position of the real-time microphone signal.

Therefore, the signal time delay value between each relative microphone and the reference microphone is calculated from the real-time microphone signals, the initial position of the real-time microphone signal is determined by combining the spatial positions of the microphones and the sound velocity, and the search for the sound source position is centered on that initial position. This greatly reduces the initial range of the spatial search, reduces the positioning calculation amount, and makes the method suitable for scenes with high real-time requirements.
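The local search of step S3 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a standard frequency-domain SRP-PHAT power, a cubic grid clipped to a sphere, and NumPy. The function names `srp_phat_power` and `local_search`, the grid `step`, and the default sound velocity of 343 m/s are all hypothetical choices made for the example.

```python
import numpy as np

def srp_phat_power(frames, mic_pos, points, fs, c=343.0):
    """Steered response power (PHAT-weighted) at each candidate point.

    frames:  (M, N) array, one time-domain frame per microphone
    mic_pos: (M, 3) microphone coordinates in metres
    points:  (K, 3) candidate source positions
    """
    n_mics, n = frames.shape
    spec = np.fft.rfft(frames, axis=1)
    omega = 2.0 * np.pi * np.fft.rfftfreq(n, d=1.0 / fs)
    # Distance from every candidate point to every microphone: shape (K, M).
    dist = np.linalg.norm(points[:, None, :] - mic_pos[None, :, :], axis=2)
    power = np.zeros(len(points))
    for i in range(n_mics):
        for j in range(i + 1, n_mics):
            cross = spec[i] * np.conj(spec[j])
            cross = cross / (np.abs(cross) + 1e-12)   # PHAT weighting
            tau = (dist[:, i] - dist[:, j]) / c        # expected delay per point
            # Compensate the expected delay and sum over frequency.
            power += np.real(np.exp(1j * np.outer(tau, omega)) @ cross)
    return power

def local_search(frames, mic_pos, center, radius, fs, step=0.05, c=343.0):
    """Search a sphere around `center` and return (best_position, best_power)."""
    center = np.asarray(center, dtype=float)
    ax = np.arange(-radius, radius + step / 2.0, step)
    offsets = np.array(np.meshgrid(ax, ax, ax, indexing="ij")).reshape(3, -1).T
    offsets = offsets[np.linalg.norm(offsets, axis=1) <= radius + 1e-9]
    points = center + offsets
    power = srp_phat_power(frames, mic_pos, points, fs, c)
    k = int(np.argmax(power))
    return points[k], float(power[k])
```

The cost of the search grows with the number of grid points, which is why restricting the grid to a small sphere around the delay-based initial position saves computation compared with scanning the whole space.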

In one embodiment, the real-time search radius is obtained by:

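searching a global space to obtain a maximum global response power and a global position corresponding to the maximum global response power, calculating a distance error between the global position and the initial position, and setting a power threshold according to the maximum global response power;

setting a search radius according to the distance error;

and taking the search radius as the real-time search radius until the maximum local response power obtained by real-time calculation is smaller than the power threshold, and then returning to the step of searching the global space.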

Specifically, this embodiment improves the SRP-PHAT algorithm. When SRP-PHAT is calculated for the first time, the global space U = {(d, α, β) | d represents the distance to the center of the array elements, α represents the azimuth angle, and β represents the pitch angle} is searched to obtain the maximum global response power P_max, and the position of the maximum global response power (the global position, represented as a vector) is recorded. The distance error between the global position and the initial position is calculated, and a power threshold is set according to the maximum global response power (the power threshold is positively correlated with the maximum global response power). Optionally, the distance error is multiplied by a preset radius coefficient (a constant) to obtain the search radius, which is used as the real-time search radius for the real-time microphone signals acquired subsequently. As time goes on, if the sound source moves significantly and leaves the original local space, the maximum local response power found in the original local space drops significantly. Therefore, if in the subsequent calculation the maximum local response power is smaller than the power threshold, the global space is searched again to determine a new local space.
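The bookkeeping described above (radius from the distance error, threshold from the maximum global power, fallback to a global search when the local power drops) can be sketched as a small state holder. The class name `SearchRadiusTracker` and the parameters `radius_coeff` and `threshold_ratio` are hypothetical; the text only requires a constant radius coefficient and a threshold that is positively correlated with the maximum global response power, so a simple proportional threshold is assumed here.

```python
import numpy as np

class SearchRadiusTracker:
    """Keeps the real-time search radius and the power threshold."""

    def __init__(self, radius_coeff=1.5, threshold_ratio=0.6):
        self.radius_coeff = radius_coeff        # assumed constant multiplier
        self.threshold_ratio = threshold_ratio  # assumed: threshold = ratio * P_max
        self.radius = None
        self.power_threshold = None

    def needs_global_search(self, max_local_power=None):
        """True before the first global search, or when local power drops too low."""
        if self.radius is None:
            return True
        if max_local_power is None:
            return False
        return max_local_power < self.power_threshold

    def update_from_global(self, global_pos, global_power, initial_pos):
        """Set radius and threshold after a global search; returns the radius."""
        err = np.linalg.norm(np.asarray(global_pos, dtype=float)
                             - np.asarray(initial_pos, dtype=float))
        self.radius = self.radius_coeff * err
        self.power_threshold = self.threshold_ratio * global_power
        return self.radius
```

In a processing loop, `needs_global_search` would be checked once per frame with the latest maximum local response power, and `update_from_global` would be called only on the frames where it returns true.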

In one embodiment, after the step in S3 of using the position of the maximum local response power as the sound source position of the real-time microphone signal, the method further includes:

and performing weighted calculation according to the acquired sound source positions of the plurality of recent historical microphone signals and the sound source position of the real-time microphone signal to obtain a final position, and updating the sound source position of the real-time microphone signal according to the final position.

Specifically, after the position where the maximum local response power is obtained is searched, the sound source positions of a plurality of recent historical moments are considered, weighting processing is performed on the sound source positions, smooth positioning is performed, the final position is obtained through calculation, and the sound source position of the real-time microphone signal is updated according to the final position.

In one embodiment, the final position is calculated by:

IF (||r_s(t) - r_s(t-1)||² > μ);

wherein r_Final(t) represents the final position, ω(t-i) represents the weight coefficient of the sound source position at the previous i-th time, r_s(t) denotes the sound source position of the real-time microphone signal, r_s(t-1) represents the sound source position at the previous time, μ represents a preset distance threshold, and p is greater than or equal to 1.

Specifically, the final position is obtained by weighting the sound source position of the real-time microphone signal together with the sound source positions of the most recent p-1 times. When the squared distance between the current sound source position and the previous sound source position is greater than the preset distance threshold, the sound source position has changed greatly between the previous time and the current time, that is, the sound source has clearly moved, and the weight coefficients are calculated in the first coefficient calculation mode. When the squared distance is not greater than the preset distance threshold, the coefficients are set by averaging (1/p). It is worth noting that the specific value of μ is set according to engineering experience.
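A minimal sketch of this smoothing step is given below. The averaging branch (weights 1/p) follows the description directly. The weights used when the source has clearly moved are not recoverable from this text, so the sketch substitutes a geometric decay that favours the newest position; that choice, the function name `smooth_position`, and the `decay` parameter are assumptions made only for illustration.

```python
import numpy as np

def smooth_position(history, mu, decay=0.5):
    """Weighted combination of the last p positions (newest last).

    history: sequence of p positions, each of length 3
    mu:      preset threshold on the squared distance between the last two positions
    """
    h = np.asarray(history, dtype=float)
    p = len(h)
    moved = p >= 2 and np.sum((h[-1] - h[-2]) ** 2) > mu
    if moved:
        # ASSUMPTION: the source text's weights for this branch are missing;
        # a geometric decay with age is used here as a stand-in.
        w = decay ** np.arange(p - 1, -1, -1)
        w = w / w.sum()
    else:
        # Averaging branch described in the text: every weight is 1/p.
        w = np.full(p, 1.0 / p)
    return w @ h
```

With the stand-in weights, a sudden jump pulls the final position toward the newest estimate faster than plain averaging would, while small jitter is still averaged out.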

In one embodiment, before calculating the signal delay value of each relative microphone and reference microphone according to the acquired real-time microphone signals, the method further comprises:

carrying out normalization processing on the real-time microphone signals;

and performing valid frame detection on the real-time microphone signals to delete invalid frames.

Specifically, an original audio signal is acquired and its amplitude is shifted and scaled to obtain a standard audio signal (normalized signal) with a mean of 0 and a maximum amplitude of 1. The normalized signal is then preprocessed: windowing (optionally with a sine window or a Hanning window) yields a series of audio frames, each frame representing one real-time microphone signal. The frames are processed in sequence, and valid frame detection is performed on each one (a VAD algorithm decides whether it is a voice frame or a noise frame). A non-voice (noise) frame is skipped; a valid frame is retained for the subsequent algorithm processing.
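The preprocessing chain can be illustrated with the sketch below. Normalization and Hanning windowing follow the description; the valid frame detector is replaced by a plain frame-energy threshold, which is only a stand-in for whatever VAD algorithm is actually used. The function names, frame length, hop size and `energy_thresh` value are hypothetical.

```python
import numpy as np

def normalize(signal):
    """Shift to zero mean and scale so the maximum absolute amplitude is 1."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x

def valid_frames(signal, frame_len=1024, hop=512, energy_thresh=1e-3):
    """Normalize, window into frames, and keep frames that pass the energy test."""
    x = normalize(signal)
    window = np.hanning(frame_len)
    kept = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len] * window
        # ASSUMPTION: mean frame energy stands in for a real VAD decision.
        if np.mean(frame ** 2) >= energy_thresh:
            kept.append(frame)
    return kept
```

In a multi-microphone setting the same frame boundaries would be applied to every channel, so that a frame kept for one microphone is kept for all of them.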

In one embodiment, the calculating the signal delay value of each relative microphone and the reference microphone according to the acquired real-time microphone signal in step S1 specifically includes:

and calculating the signal time delay value of each relative microphone and the reference microphone according to the acquired real-time microphone signals based on a GCC-PHAT algorithm.

Specifically, the GCC-PHAT algorithm is as follows:

wherein R(τ) is the cross-correlation function, A(ω) is the weighting function, equal to the reciprocal of the magnitude of the cross-power spectrum of X and Y, nFFT is the Fourier transform length, Fs is the sampling rate, and X(ω) and Y(ω) are the Fourier transforms of the microphone signals x(t) and y(t). In this embodiment, x(t) is the real-time microphone signal of one of the relative microphones and y(t) is the real-time microphone signal of the reference microphone; the signal time delay value between each relative microphone and the reference microphone is calculated with the above GCC-PHAT algorithm.
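A minimal sketch of a standard GCC-PHAT delay estimate is shown below, using the quantities named above (X, Y, the PHAT weighting, the FFT length and the sampling rate Fs). It is a textbook formulation under the assumption of NumPy and integer-sample resolution, not necessarily the exact expression of this embodiment; the function name `gcc_phat` is hypothetical.

```python
import numpy as np

def gcc_phat(x, y, fs):
    """Estimate the delay of x relative to y in seconds.

    A positive result means x (relative microphone) lags y (reference microphone).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_fft = len(x) + len(y)                 # zero-padding avoids circular wrap-around
    X = np.fft.rfft(x, n=n_fft)
    Y = np.fft.rfft(y, n=n_fft)
    cross = X * np.conj(Y)                  # cross-power spectrum
    cross = cross / (np.abs(cross) + 1e-12) # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n_fft)       # generalized cross-correlation R(tau)
    max_shift = n_fft // 2
    # Reorder so that index max_shift corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs
```

For example, if `x` is `y` delayed by 5 samples at 16 kHz, the function returns 5/16000 s; swapping the arguments flips the sign.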

Further, GCC-PHAT (generalized cross-correlation with phase transform) is calculated on each valid frame to obtain the tde (signal time delay value) between each relative microphone and the reference microphone, and the delay values are then screened. If the delay values do not pass the screening, the method returns to the valid frame detection step and a new valid frame is selected.

Specifically, the delay value screening method is as follows:

tde ≤ d/c;

where tde is the signal time delay value, which must not exceed the maximum possible time delay d/c between the corresponding relative microphone and the reference microphone; d is the distance between the corresponding relative microphone and the reference microphone, and c is the preset sound velocity.
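The screening rule can be written as a one-line check per microphone pair. The sketch below compares the absolute value of the delay with d/c, on the assumption that the estimated delay may be negative when the relative microphone is closer to the source than the reference microphone; the function name and the default sound velocity are hypothetical.

```python
import numpy as np

def passes_delay_screening(tde, mic_pos, ref_pos, c=343.0):
    """Return True if |tde| does not exceed the physical maximum d / c."""
    d = np.linalg.norm(np.asarray(mic_pos, dtype=float)
                       - np.asarray(ref_pos, dtype=float))
    # ASSUMPTION: the magnitude is tested so that negative delays are handled too.
    return bool(abs(tde) <= d / c)
```

A frame would be discarded as soon as any relative microphone fails this test, after which the next valid frame is tried.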

In one embodiment, the calculating an initial position of the real-time microphone signal according to the acquired spatial positions of all the relative microphones, the spatial position of the reference microphone, the preset sound speed, and all the signal delay values in step S2 specifically includes:

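acquiring a reference position of the reference microphone and relative positions of all the relative microphones, and setting the sound source position of the real-time microphone signal as an unknown position;

listing a solution equation set of the spatial time delay value between each relative microphone and the reference microphone according to the relative positions, the reference position, the unknown position and the acquired sound velocity;

establishing a sound source position calculation equation set by using the signal time delay values and the solution equation set of the spatial time delay values;

and solving the sound source position calculation equation set based on a minimum mean square error method to obtain the initial position of the real-time microphone signal.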

Specifically, referring to the schematic diagram of the positions of a microphone pair and the sound source in Fig. 3, the solution equation set of the spatial time delay values is listed:

$$\tau_{ij} = \frac{1}{c}\left(\sqrt{(x_{Mi}-x_S)^2+(y_{Mi}-y_S)^2+(z_{Mi}-z_S)^2}-\sqrt{(x_{Mj}-x_S)^2+(y_{Mj}-y_S)^2+(z_{Mj}-z_S)^2}\right)$$

where c represents the preset sound velocity, τ_ij represents the spatial time delay value between the i-th relative microphone and the reference microphone, (x_Mi, y_Mi, z_Mi) represents the spatial position of the i-th relative microphone, (x_Mj, y_Mj, z_Mj) represents the spatial position of the reference microphone, and (x_S, y_S, z_S) represents the sound source position (unknown position) of the real-time microphone signal;

As can be seen from the solution equation set of the spatial time delay values, the sound source lies on a hyperboloid whose foci are the relative microphone M_i and the reference microphone M_j. When the microphone array has several array elements (three or more), each pair of microphones and its spatial time delay value define one hyperboloid. Because of errors in the delay estimates, the hyperboloids cannot be expected to intersect at exactly one point, but their overlapping regions are generally concentrated around the sound source.
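As a small illustration of the spatial time delay relation, the sketch below evaluates τ_ij for every relative microphone given a candidate source position, taking the delay as the distance from the relative microphone to the source minus the distance from the reference microphone to the source, divided by c. The function name `spatial_delays` and the default sound velocity are hypothetical.

```python
import numpy as np

def spatial_delays(relative_mics, reference_mic, source, c=343.0):
    """Return tau_ij for each relative microphone i against the reference j."""
    relative_mics = np.asarray(relative_mics, dtype=float)  # shape (N, 3)
    reference_mic = np.asarray(reference_mic, dtype=float)  # shape (3,)
    source = np.asarray(source, dtype=float)                # shape (3,)
    dist_rel = np.linalg.norm(relative_mics - source, axis=1)
    dist_ref = np.linalg.norm(reference_mic - source)
    return (dist_rel - dist_ref) / c
```

All source positions that give the same value of τ_ij for one microphone pair lie on the same hyperboloid sheet, so a point equidistant from both microphones yields a delay of zero.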

In Fig. 3, M_j represents the spatial position of the reference microphone (reference position), M_i represents the spatial position of the i-th relative microphone, S represents the sound source position of the real-time microphone signal (unknown position), R_s is the distance from the reference microphone M_j to the sound source S, R_i is the distance from the relative microphone M_i to the sound source S, and d_ij is the difference between the distances from the two microphones to the sound source, expressed by the following formula:

The formula is simplified to obtain:

$$R_i^2 - d_{ij}^2 - 2 d_{ij} R_s - 2 r_i^T r_s = 0$$

Since d_ij is obtained from the estimated time delay between the microphones, d_ij = cτ_ij, it inevitably deviates from the true value, so the left-hand side of the above equation is not exactly 0. Let the error be:

$$\varepsilon = R_i^2 - d_{ij}^2 - 2 d_{ij} R_s - 2 r_i^T r_s$$

Suppose there are M microphones in the sound source localization system, numbered from 0 to M-1. The position of microphone M_0 is taken as the reference point and as the origin of the coordinate system. It is then possible to obtain:

$$\varepsilon = \delta - 2 R_s d - 2 S r_s$$

The minimum mean square error method minimizes the mean square error of the above formula, which gives the most accurate positioning result. By derivation, the minimum mean square error is reached when the sound source position is:
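As a hedged illustration of this least-squares step, the sketch below solves the linear system implied by ε = δ - 2R_s d - 2S r_s with `numpy.linalg.lstsq`. Several things are assumptions rather than statements of the embodiment: the reference microphone is at the origin; row i of S is the position vector r_i of relative microphone i; R_i is read as the norm of r_i, which is the reading under which the simplified equation holds when d_i is the range difference; δ_i = R_i² - d_i²; and R_s is treated as a fourth independent unknown, which needs at least four relative microphones in general position. The function name is hypothetical, and this unconstrained solve is not necessarily the closed-form expression of the embodiment.

```python
import numpy as np

def locate_least_squares(relative_mics, range_diffs):
    """Linear least-squares estimate of the source position.

    relative_mics: (N, 3) positions r_i, with the reference microphone at the origin
    range_diffs:   (N,) values d_i = c * tau_i
    Returns (r_s, R_s): estimated source position and its distance to the origin.
    """
    S = np.asarray(relative_mics, dtype=float)
    d = np.asarray(range_diffs, dtype=float)
    # delta_i = R_i^2 - d_i^2, with R_i assumed to be ||r_i||.
    delta = np.sum(S ** 2, axis=1) - d ** 2
    # epsilon = delta - 2*R_s*d - 2*S*r_s  ->  minimize ||A @ theta - delta||.
    A = np.hstack((2.0 * S, 2.0 * d[:, None]))
    theta, *_ = np.linalg.lstsq(A, delta, rcond=None)
    return theta[:3], float(theta[3])
```

With noisy delays the returned R_s will generally not equal the norm of the returned r_s; the result is only meant as the initial position that seeds the subsequent local search.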

Specifically, the SRP-PHAT formula is as follows:

Compared with the prior art, the sound source positioning method disclosed by the embodiment of the invention acquires real-time microphone signals and calculates the signal time delay value between each relative microphone and the reference microphone; calculates the initial position of the real-time microphone signal from the signal time delay values, the acquired spatial positions of all the relative microphones and of the reference microphone, and the preset sound velocity; and searches a local space, centered on the initial position and with the preset real-time search radius as its radius, taking the position of the maximum local response power as the sound source position of the real-time microphone signal. Because the search starts from an initial position determined from the time delays, the spatial positions of the microphones and the sound velocity, the initial range of the spatial search is greatly reduced, the positioning calculation amount is reduced, and the method is suitable for scenes with high real-time requirements.

Referring to fig. 4, it is a schematic structural diagram of a sound source positioning device according to an embodiment of the present invention, where the sound source positioning device includes:

a signal delay value calculation module 11, configured to calculate the signal time delay value between each relative microphone and the reference microphone according to the acquired real-time microphone signals; wherein the real-time microphone signals comprise real-time signals of the relative microphones and a real-time signal of the reference microphone, and the number of the relative microphones is at least 3;

an initial position calculating module 12, configured to calculate an initial position of the real-time microphone signal according to the acquired spatial positions of all the relative microphones, the spatial position of the reference microphone, a preset sound velocity, and all the signal delay values;

a sound source position calculating module 13, configured to search a local space to obtain a position of a maximum local response power, and use the position of the maximum local response power as a sound source position of the real-time microphone signal; the center of the local space is the initial position, and the radius of the local space is a preset real-time searching radius.

It should be noted that, for a specific working process of the sound source positioning device, reference may be made to the working process of the sound source positioning method in the foregoing embodiment, and details are not repeated herein.

Compared with the prior art, the sound source positioning device disclosed by the embodiment of the invention acquires real-time microphone signals and calculates the signal time delay value between each relative microphone and the reference microphone; calculates the initial position of the real-time microphone signal from the signal time delay values, the acquired spatial positions of all the relative microphones and of the reference microphone, and the preset sound velocity; and searches a local space, centered on the initial position and with the preset real-time search radius as its radius, taking the position of the maximum local response power as the sound source position of the real-time microphone signal. Because the search starts from an initial position determined from the time delays, the spatial positions of the microphones and the sound velocity, the initial range of the spatial search is greatly reduced, the positioning calculation amount is reduced, and the device is suitable for scenes with higher real-time requirements.

Referring to fig. 5, which is a schematic structural diagram of a sound source positioning device according to an embodiment of the present invention, the sound source positioning device includes a processor 21, a memory 22, and a computer program stored in the memory 22 and configured to be executed by the processor 21, and the processor 21, when executing the computer program, implements the steps in the sound source positioning method embodiments, such as the steps S1 to S3 shown in fig. 1; alternatively, the processor 21, when executing the computer program, implements the functions of the modules in the above device embodiments, such as the signal delay value calculating module 11.

Illustratively, the computer program may be divided into one or more modules, which are stored in the memory 22 and executed by the processor 21 to accomplish the present invention. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program in the sound source localization device. For example, the computer program may be divided into a signal delay value calculation module 11, an initial position calculation module 12, and an acoustic source position calculation module 13, where the specific functions of each module are as follows:

The specific working process of each module may refer to the working process of the sound source positioning device described in the above embodiment, and is not described herein again.

The sound source positioning device may be a computing device such as a desktop computer, a notebook computer, a handheld computer, or a cloud server. The sound source localization device may include, but is not limited to, the processor 21 and the memory 22. Those skilled in the art will appreciate that the schematic diagram is merely an example of a sound source localization device and does not limit it; the device may include more or fewer components than shown, combine some components, or use different components. For example, the sound source localization device may also include input/output devices, network access devices, buses, and the like.

The processor 21 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor or any conventional processor. The processor 21 is the control center of the sound source localization device and connects the various parts of the overall device using various interfaces and lines.

The memory 22 may be used to store the computer program and/or modules, and the processor 21 implements the various functions of the sound source localization device by running or executing the computer program and/or modules stored in the memory 22 and invoking the data stored in the memory 22. The memory 22 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like; the data storage area may store data created according to the use of the device, and the like. In addition, the memory 22 may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or another solid-state storage device.

If the modules integrated in the sound source localization device are implemented in the form of software functional units and sold or used as a separate product, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the method embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments are implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims

1. A sound source localization method, comprising:

calculating the signal delay value of each relative microphone and the reference microphone according to the acquired real-time microphone signals; wherein the real-time microphone signals comprise real-time signals of the relative microphones and a real-time signal of the reference microphone, and the number of the relative microphones is at least 3;

2. The sound source localization method of claim 1, wherein the real-time search radius is obtained by:

setting a search radius according to the distance error;

and taking the search radius as the real-time search radius until the maximum local response power obtained by real-time calculation is smaller than the power threshold, and returning to the step of searching the global space.

3. The sound source localization method of claim 1, wherein the taking the position of the maximum local response power as the sound source position of the real-time microphone signal further comprises:

4. The sound source localization method of claim 3, wherein the final position is calculated by:

IF(||r_s(t) - r_s(t-1)||² > μ);

wherein r_{final}(t) represents the final position, ω(t-i) represents the weight coefficient of the sound source position at the i-th previous time, r_s(t) represents the sound source position of the real-time microphone signal, r_s(t-1) represents the sound source position at the previous time, and μ represents a preset distance threshold.

5. The sound source localization method of claim 1, wherein, before the calculating of the signal delay value of each relative microphone and the reference microphone according to the acquired real-time microphone signals, the method further comprises:

carrying out normalization processing on the real-time microphone signals;

6. The sound source localization method of claim 1, wherein the calculating of the signal delay value of each relative microphone and the reference microphone according to the acquired real-time microphone signals comprises:

7. The sound source localization method according to claim 1, wherein the calculating an initial position of the real-time microphone signal according to the acquired spatial positions of all the relative microphones, the spatial position of the reference microphone, a preset sound velocity, and all the signal delay values specifically comprises:

acquiring the reference position of the reference microphone and the relative positions of all the relative microphones, and setting the sound source position of the real-time microphone signal as an unknown position;

establishing a system of equations for the spatial time delay value between each relative microphone and the reference microphone according to the relative positions, the reference position, the unknown position and the acquired sound velocity;

8. A sound source localization apparatus, comprising:

the initial position calculation module is used for calculating the initial position of the real-time microphone signal according to the acquired spatial positions of all the relative microphones, the spatial position of the reference microphone, a preset sound velocity and all the signal time delay values;

9. A sound source localization device comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the sound source localization method according to any of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform a sound source localization method according to any one of claims 1 to 7.