CN114646920A - Sound source positioning method, device, equipment and storage medium - Google Patents

Info

Publication number
CN114646920A
Authority
CN
China
Prior art keywords
time
real
microphone
sound source
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210248170.6A
Other languages
Chinese (zh)
Inventor
柯国富
严体华
吕智杰
杨光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GHT CO Ltd
Original Assignee
GHT CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by GHT CO Ltd
Priority to CN202210248170.6A
Publication of CN114646920A

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a sound source positioning method comprising the following steps: calculating the signal delay value between each relative microphone and the reference microphone from the acquired real-time microphone signals; calculating the initial position of the real-time microphone signal from the acquired spatial positions of all the relative microphones, the spatial position of the reference microphone, the preset sound velocity and all the signal delay values; and searching a local space to obtain the position of the maximum local response power, and taking that position as the sound source position of the real-time microphone signal, where the center of the local space is the initial position and the radius of the local space is a preset real-time search radius. The invention also discloses a sound source positioning device, equipment and a storage medium. By using the time delays between the microphones of the array for preliminary positioning, the embodiments of the invention greatly reduce the initial range of the spatial search, reduce the positioning computation, and suit scenarios with high real-time requirements.

Description

Sound source positioning method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of audio signal processing, in particular to a sound source positioning method, a sound source positioning device, sound source positioning equipment and a storage medium.
Background
Sound source positioning is important in fields such as video conferencing, smart homes, robotics, and fault detection and location. The prior art usually performs sound source positioning with a microphone array, for example with an algorithm based on steered response power with phase transform (SRP-PHAT). That algorithm is computationally expensive and places high demands on the computing capability of the equipment. An improved prior-art scheme therefore removes from the accumulation the cross-power spectrum components that do not contribute to the phase, and replaces the full search over all frequency bands with a coarse-to-fine search over sub-bands. Although this improved SRP-PHAT algorithm requires less computation than the original, its computation is still large, and it is not suitable for scenarios with high real-time requirements.
Disclosure of Invention
In view of this, the invention provides a sound source positioning method, device, equipment and storage medium that perform initial positioning using the time delays between the microphones of the array, which greatly reduces the initial range of the spatial search, reduces the positioning computation, and suits scenarios with high real-time requirements.
In order to achieve the above object, an embodiment of the present invention provides a sound source localization method, including:
calculating the signal delay value between each relative microphone and the reference microphone according to the acquired real-time microphone signals; wherein the real-time microphone signals comprise the real-time signals of the relative microphones and the real-time signal of the reference microphone, and the number of relative microphones is at least 3;
calculating the initial position of the real-time microphone signal according to the acquired spatial positions of all the relative microphones, the spatial position of the reference microphone, a preset sound velocity and all the signal time delay values;
searching a local space to obtain the position of the maximum local response power, and taking the position of the maximum local response power as the sound source position of the real-time microphone signal; the center of the local space is the initial position, and the radius of the local space is a preset real-time searching radius.
As an improvement of the above scheme, the real-time search radius is obtained by:
searching a global space to obtain a maximum global response power and a global position corresponding to the maximum global response power, calculating a distance error between the global position and the initial position, and setting a power threshold according to the maximum global response power;
setting a search radius according to the distance error;
and using the search radius as the real-time search radius until the maximum local response power computed in real time falls below the power threshold, and then returning to the step of searching the global space.
As an improvement of the above solution, after the step of using the position of the maximum local response power as the sound source position of the real-time microphone signal, the method further includes:
and performing weighted calculation according to the acquired sound source positions of a plurality of recent historical microphone signals and the sound source position of the real-time microphone signal to obtain a final position, and updating the sound source position of the real-time microphone signal according to the final position.
As a refinement of the above, the final position is calculated by:
Figure BDA0003545726310000021
IF ($\|r_s(t) - r_s(t-1)\|^2 > \mu$);
Figure BDA0003545726310000022
Figure BDA0003545726310000031
where $r_{final}(t)$ represents the final position, $\omega(t-i)$ represents the weight coefficient of the sound source position $i$ time steps earlier, $r_s(t)$ denotes the sound source position of the real-time microphone signal, $r_s(t-1)$ represents the sound source position at the previous time step, and $\mu$ represents a preset distance threshold.
As an improvement of the above solution, before calculating the signal delay value of each relative microphone and reference microphone according to the acquired real-time microphone signal, the method further includes:
carrying out normalization processing on the real-time microphone signals;
and performing valid frame detection on the real-time microphone signal to delete invalid frames.
As an improvement of the above scheme, the calculating a signal delay value of each relative microphone and reference microphone according to the acquired real-time microphone signal specifically includes:
and calculating the signal delay value of each relative microphone and the reference microphone according to the acquired real-time microphone signals based on a GCC-PHAT algorithm.
As an improvement of the above scheme, the calculating an initial position of the real-time microphone signal according to the acquired spatial position of each relative microphone, a preset sound velocity, and all the signal delay values specifically includes:
acquiring a reference position of the reference microphone and relative positions of all the relative microphones, and setting a sound source position of the real-time microphone signal as an unknown position;
listing a solution equation set for the spatial delay value between each relative microphone and the reference microphone according to the relative positions, the reference position, the unknown position and the acquired sound speed;
establishing a sound source position calculation equation set from the signal delay values and the solution equation set of the spatial delay values;
and solving the sound source position calculation equation set based on a minimum mean square error method to obtain the initial position of the real-time microphone signal.
In order to achieve the above object, an embodiment of the present invention further provides a sound source localization apparatus, including:
a signal delay value calculation module, configured to calculate the signal delay value between each relative microphone and the reference microphone according to the acquired real-time microphone signals; wherein the real-time microphone signals comprise the real-time signals of the relative microphones and the real-time signal of the reference microphone, and the number of relative microphones is at least 3;
an initial position calculating module, configured to calculate an initial position of the real-time microphone signal according to the acquired spatial positions of all the relative microphones, the spatial position of the reference microphone, a preset sound velocity, and all the signal delay values;
a sound source position calculating module, configured to search a local space to obtain a position of a maximum local response power, and use the position of the maximum local response power as a sound source position of the real-time microphone signal; the center of the local space is the initial position, and the radius of the local space is a preset real-time searching radius.
To achieve the above object, an embodiment of the present invention further provides a sound source positioning device, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, and the processor, when executing the computer program, implements the sound source positioning method according to any of the above embodiments.
To achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, where when the computer program runs, the apparatus where the computer-readable storage medium is located is controlled to execute the sound source positioning method according to any one of the above embodiments.
Compared with the prior art, the sound source positioning method, device, equipment and storage medium disclosed in the embodiments of the invention acquire real-time microphone signals and calculate the signal delay value between each relative microphone and the reference microphone. They then calculate the initial position of the real-time microphone signal from these delay values, the acquired spatial positions of all the relative microphones and of the reference microphone, and the preset sound velocity. Finally, they search a local space centered on the initial position, with the preset real-time search radius as its radius, and take the position of the maximum local response power as the sound source position of the real-time microphone signal. Because the delay-based initial position serves as the search center, the initial range of the spatial search is greatly reduced, the positioning computation is reduced, and the approach suits scenarios with high real-time requirements.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a sound source localization method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a microphone array model according to an embodiment of the invention;
Fig. 3 is a schematic diagram of a microphone pair and a sound source according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a sound source positioning device according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a sound source localization apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a sound source localization method according to an embodiment of the present invention. The sound source positioning method may be executed by a computing terminal, and the computing terminal is a computer, a tablet computer, or other devices, and is not limited herein.
Specifically, the sound source localization method includes the following steps:
S1, calculating the signal delay value between each relative microphone and the reference microphone according to the acquired real-time microphone signals; wherein the real-time microphone signals comprise the real-time signals of the relative microphones and the real-time signal of the reference microphone, and the number of relative microphones is at least 3;
S2, calculating the initial position of the real-time microphone signal according to the acquired spatial positions of all the relative microphones, the spatial position of the reference microphone, the preset sound velocity and all the signal delay values;
S3, searching a local space to obtain the position of the maximum local response power, and taking the position of the maximum local response power as the sound source position of the real-time microphone signal; the center of the local space is the initial position, and the radius of the local space is a preset real-time search radius.
As an example, the microphone array consists of 4 microphones: one is selected as the reference microphone and the other 3 serve as relative microphones. The array collects audio in real time to obtain the real-time microphone signals, and the signal delay value between each relative microphone and the reference microphone is calculated from these signals, that is, the delays between microphones are computed with a signal-based method. In the microphone array model shown in Fig. 2, M denotes the microphone positions (the figure contains 4 microphones), and the sound source is assumed to be at distance D from the center of the array elements, with azimuth angle α and pitch angle β. Because the delays between the signals received by the microphones depend on the distance from each microphone to the sound source and on the propagation speed, the initial position of the real-time microphone signal is calculated from the spatial positions of the microphones, the preset sound velocity (the speed of sound in the medium through which the audio signal propagates) and the signal delay values. The space near the initial position is then searched, with the initial position as the center of the local space and the preset real-time search radius as its radius, to find the local maximum of the response power, and that point is taken as the sound source position of the real-time microphone signal.
Thus, the signal delay value between each relative microphone and the reference microphone is calculated from the real-time microphone signals, the initial position of the real-time microphone signal is determined from these delay values together with the spatial position of each microphone and the sound velocity, and the search for the sound source position is centered on that initial position. This greatly reduces the initial range of the spatial search, reduces the positioning computation, and suits scenarios with high real-time requirements.
In one embodiment, the real-time search radius is obtained by:
searching a global space to obtain a maximum global response power and a global position corresponding to the maximum global response power, calculating a distance error between the global position and the initial position, and setting a power threshold according to the maximum global response power;
setting a search radius according to the distance error;
and using the search radius as the real-time search radius until the maximum local response power computed in real time falls below the power threshold, and then returning to the step of searching the global space.
Specifically, this embodiment improves the SRP-PHAT algorithm. When SRP-PHAT is computed for the first time, the global space U = {(d, α, β)}, where d is the distance to the center of the array elements, α is the azimuth angle and β is the pitch angle, is searched to obtain the maximum global response power $P_{max}$, and the position of that maximum (the global position, represented as a vector) is recorded. The distance error between the global position and the initial position is calculated, and a power threshold is set according to the maximum global response power (the threshold is positively correlated with the maximum global power). Optionally, the distance error is multiplied by a preset radius coefficient (a constant) to obtain the search radius, which is used as the real-time search radius for the real-time microphone signals acquired subsequently. Over time, if the sound source moves significantly and leaves the original local space, the maximum local response power found in the original local space drops markedly. Therefore, in subsequent calculations, if the maximum local response power is smaller than the power threshold, the global space is searched again to determine a new local space.
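The radius and threshold bookkeeping described above can be sketched as follows. This is a minimal sketch, not the patented implementation: the names `SearchState`, `init_search_state` and `needs_global_search`, and the default values of `radius_coeff` and `threshold_ratio`, are assumptions chosen for illustration. The text only states that the radius is the distance error times a constant and that the threshold grows with the maximum global power.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SearchState:
    radius: float           # real-time search radius of the local space
    power_threshold: float  # below this, fall back to a global search


def init_search_state(global_pos, initial_pos, p_max,
                      radius_coeff=1.5, threshold_ratio=0.5):
    """Derive the search radius and power threshold from one global search.

    radius_coeff and threshold_ratio are hypothetical tuning constants.
    """
    diff = np.asarray(global_pos, dtype=float) - np.asarray(initial_pos, dtype=float)
    dist_err = float(np.linalg.norm(diff))
    return SearchState(radius=radius_coeff * dist_err,
                       power_threshold=threshold_ratio * p_max)


def needs_global_search(state, max_local_power):
    """True when the local maximum has dropped below the threshold."""
    return max_local_power < state.power_threshold
```

A caller would run one global search, build the state, and keep using `state.radius` for local searches until `needs_global_search` returns true.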
In one embodiment, after using the position of the maximum local response power as the sound source position of the real-time microphone signal in step S3, the method further includes:
and performing weighted calculation according to the acquired sound source positions of the plurality of recent historical microphone signals and the sound source position of the real-time microphone signal to obtain a final position, and updating the sound source position of the real-time microphone signal according to the final position.
Specifically, after the position where the maximum local response power is obtained is searched, the sound source positions of a plurality of recent historical moments are considered, weighting processing is performed on the sound source positions, smooth positioning is performed, the final position is obtained through calculation, and the sound source position of the real-time microphone signal is updated according to the final position.
In one embodiment, the final position is calculated by:
Figure BDA0003545726310000081
IF ($\|r_s(t) - r_s(t-1)\|^2 > \mu$);
Figure BDA0003545726310000082
Figure BDA0003545726310000083
where $r_{final}(t)$ represents the final position, $\omega(t-i)$ represents the weight coefficient of the sound source position $i$ time steps earlier, $r_s(t)$ denotes the sound source position of the real-time microphone signal, $r_s(t-1)$ represents the sound source position at the previous time step, $\mu$ represents a preset distance threshold, and $p$ is greater than or equal to 1.
Specifically, the final position is obtained by a weighted calculation over the sound source position of the real-time microphone signal and the sound source positions of the most recent p-1 time steps. When the squared distance between the current sound source position and the previous one is greater than the preset distance threshold, the sound source position has changed substantially between the two time steps, that is, the source has clearly moved, and the weight coefficients are computed in the first manner. When the squared distance is not greater than the preset distance threshold, the coefficients are set by averaging (1/p). The specific value of μ is set according to engineering experience.
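The smoothing step can be sketched as below. The weight formula used when the source has moved was given only as a figure and is not recoverable from the text, so this sketch takes those weights as a caller-supplied argument (`moving_weights`, a hypothetical parameter) instead of guessing them; only the 1/p averaging branch follows the description directly.

```python
import numpy as np


def smooth_position(history, mu, moving_weights):
    """Weighted smoothing of recent source positions.

    history: positions ordered newest first, shape (p, 3).
    mu: preset threshold on the squared distance between the two newest positions.
    moving_weights: weights used when the source has moved; supplied by the
        caller because the original formula is not available (assumption).
    """
    r = np.asarray(history, dtype=float)
    p = len(r)
    moved = p > 1 and np.sum((r[0] - r[1]) ** 2) > mu
    if moved:
        w = np.asarray(moving_weights, dtype=float)
    else:
        w = np.full(p, 1.0 / p)  # averaging branch described in the text
    return w @ r
```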
In one embodiment, before calculating the signal delay value of each relative microphone and reference microphone according to the acquired real-time microphone signals, the method further comprises:
carrying out normalization processing on the real-time microphone signals;
and performing valid frame detection on the real-time microphone signal to delete invalid frames.
Specifically, the original audio signal is acquired, then shifted and scaled in amplitude to obtain a standard (normalized) audio signal with mean 0 and maximum amplitude 1. The normalized signal is preprocessed by windowing (optionally with a sine window or a Hanning window) to obtain a series of audio frames, each frame representing one real-time microphone signal. The frames are processed in sequence: valid frame detection is performed on each frame (a VAD algorithm classifies it as a speech frame or a noise frame); a non-speech (noise) frame is skipped, and a valid frame is kept for the subsequent processing.
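The preprocessing chain above can be sketched as follows. The text does not name a specific VAD algorithm, so the energy-based test in `valid_frames` is a stand-in chosen for brevity; the function names, the frame and hop parameters and the `energy_ratio` default are likewise assumptions.

```python
import numpy as np


def normalize(signal):
    """Shift to zero mean and scale so the maximum amplitude is 1."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x


def split_frames(x, frame_len, hop):
    """Cut the signal into Hanning-windowed frames."""
    window = np.hanning(frame_len)
    starts = range(0, len(x) - frame_len + 1, hop)
    return np.array([x[s:s + frame_len] * window for s in starts])


def valid_frames(frames, energy_ratio=0.1):
    """Keep frames whose mean energy exceeds a fraction of the loudest frame.

    A simple energy detector used in place of an unspecified VAD (assumption).
    """
    energy = np.mean(frames ** 2, axis=1)
    return frames[energy > energy_ratio * energy.max()]
```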
In one embodiment, the calculating the signal delay value of each relative microphone and the reference microphone according to the acquired real-time microphone signal in step S1 specifically includes:
and calculating the signal time delay value of each relative microphone and the reference microphone according to the acquired real-time microphone signals based on a GCC-PHAT algorithm.
Specifically, the GCC-PHAT algorithm is as follows:
Figure BDA0003545726310000091
Figure BDA0003545726310000092
Figure BDA0003545726310000093
where $R(\tau)$ is the cross-correlation function, $A(w)$ is the weighting function, equal to the reciprocal of the magnitude of the cross-power spectrum of X and Y, nFFT is the Fourier transform length, Fs is the sampling rate, and $X(w)$, $Y(w)$ are the Fourier transforms of the microphone signals $x(t)$, $y(t)$. In this embodiment, $x(t)$ is the real-time microphone signal of one of the relative microphones and $y(t)$ is the real-time microphone signal of the reference microphone; the signal delay value between each relative microphone and the reference microphone is calculated with the GCC-PHAT algorithm above.
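The GCC-PHAT delay estimate can be sketched as below. This is a minimal NumPy sketch assuming the standard GCC-PHAT form (cross-power spectrum divided by its magnitude, inverse transform, peak picking). The patent's own equations were figures and are not reproduced here, so the function name `gcc_phat`, the zero-padding length and the small regularization constant are assumptions made for illustration.

```python
import numpy as np


def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the delay of x relative to y in seconds using GCC-PHAT."""
    n = len(x) + len(y)                      # zero-pad to avoid circular wrap
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)                   # cross-power spectrum
    cross /= np.abs(cross) + 1e-12           # PHAT weighting keeps phase only
    r = np.fft.irfft(cross, n=n)             # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so that index 0 corresponds to lag -max_shift.
    r = np.concatenate((r[-max_shift:], r[:max_shift + 1]))
    shift = int(np.argmax(np.abs(r))) - max_shift
    return shift / fs
```

With this sign convention a positive result means that `x` arrives later than `y`.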
Further, GCC-PHAT (generalized cross-correlation with phase transform) is computed on each valid frame to obtain tde (the signal delay value) between each relative microphone and the reference microphone, and the delay values are then screened. If a delay value fails the screening, the method returns to the valid frame detection step and selects another valid frame.
Specifically, the delay value screening method is as follows:
tde<=d/c;
where tde is the signal delay value, which must not exceed the maximum possible time delay between the corresponding relative microphone and the reference microphone, d is the distance between that relative microphone and the reference microphone, and c is the preset speed of sound.
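The screening rule can be sketched as a one-line check. The text states `tde <= d/c`; the sketch compares the absolute value because an estimated delay can be negative, which is an assumption about the intended behavior. The function name and the default of 343 m/s are also illustrative.

```python
import numpy as np


def passes_delay_screen(tde, mic_pos, ref_pos, c=343.0):
    """True if |tde| does not exceed the physical maximum d / c for this pair."""
    diff = np.asarray(mic_pos, dtype=float) - np.asarray(ref_pos, dtype=float)
    d = np.linalg.norm(diff)   # microphone spacing in meters
    return bool(abs(tde) <= d / c)
```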
In one embodiment, the calculating an initial position of the real-time microphone signal according to the acquired spatial positions of all the relative microphones, the spatial position of the reference microphone, the preset sound speed, and all the signal delay values in step S2 specifically includes:
acquiring a reference position of the reference microphone and relative positions of all the relative microphones, and setting a sound source position of the real-time microphone signal as an unknown position;
listing a solution equation set of a space time delay value of each relative microphone and the reference microphone according to the relative position, the reference position, the unknown position and the acquired sound speed;
establishing a sound source position calculation equation set by using the signal delay value and the solution equation set of the space delay value;
and solving the sound source position calculation equation set based on a minimum mean square error method to obtain the initial position of the real-time microphone signal.
Specifically, referring to the schematic diagram of the microphone pair in relation to the position of the sound source in fig. 3, the system of equations for solving the spatial delay values is listed:
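$$\tau_{ij} = \frac{1}{c}\left(\sqrt{(x_S-x_{Mi})^2+(y_S-y_{Mi})^2+(z_S-z_{Mi})^2} - \sqrt{(x_S-x_{Mj})^2+(y_S-y_{Mj})^2+(z_S-z_{Mj})^2}\right)$$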
where $c$ represents the preset sound velocity, $\tau_{ij}$ represents the spatial delay value between the $i$-th relative microphone and the reference microphone, $(x_{Mi}, y_{Mi}, z_{Mi})$ represents the spatial position of the $i$-th relative microphone, $(x_{Mj}, y_{Mj}, z_{Mj})$ represents the spatial position of the reference microphone, and $(x_S, y_S, z_S)$ represents the sound source position (the unknown position) of the real-time microphone signal.
As the solution equation set of the spatial delay values shows, the sound source lies on a hyperboloid whose foci are the relative microphone $M_i$ and the reference microphone $M_j$. When the microphone array has multiple elements (three or more), each pair of microphones and its spatial delay value define one hyperboloid. Because of delay estimation errors, the hyperboloids generally do not intersect at a single point, but their regions of overlap are concentrated around the sound source.
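The spatial delay model can be checked numerically with a short sketch. It computes the delay between a relative microphone and the reference microphone for a known source position, with the sign convention that a positive value means the relative microphone is farther from the source. The function name and the 343 m/s default are assumptions.

```python
import numpy as np


def spatial_delay(source, mic_i, mic_ref, c=343.0):
    """Delay (s) of the signal at mic_i relative to mic_ref for a given source."""
    s = np.asarray(source, dtype=float)
    dist_i = np.linalg.norm(s - np.asarray(mic_i, dtype=float))
    dist_ref = np.linalg.norm(s - np.asarray(mic_ref, dtype=float))
    return (dist_i - dist_ref) / c
```

All source positions that give the same value for one microphone pair lie on one sheet of the hyperboloid described above.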
In Fig. 3, $M_j$ represents the spatial position of the reference microphone (the reference position), $M_i$ represents the spatial position of the $i$-th relative microphone, $S$ represents the sound source position of the real-time microphone signal (the unknown position), $R_s$ is the distance from the reference microphone $M_j$ to the sound source $S$, $R_i$ is the distance from the relative microphone $M_i$ to the sound source $S$, and $d_{ij}$ is the difference between the distances from the two microphones to the sound source, expressed by the following formulas:
Figure BDA0003545726310000111
Figure BDA0003545726310000112
Figure BDA0003545726310000113
Figure BDA0003545726310000114
Figure BDA0003545726310000115
Simplifying gives:
$$R_i^2 - d_{ij}^2 - 2 d_{ij} R_s - 2\,\mathbf{r}_i^T \mathbf{r}_s = 0$$
Because $d_{ij}$ is obtained from the estimated time delay between the microphones, $d_{ij} = c\tau_{ij}$, it inevitably deviates from the true value, so the left-hand side above is not exactly 0. Let the error be:
$$\varepsilon = R_i^2 - d_{ij}^2 - 2 d_{ij} R_s - 2\,\mathbf{r}_i^T \mathbf{r}_s$$
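The intermediate steps were given as figures and are not available, so the following is one way to arrive at the simplified equation, not necessarily the patent's own derivation. It assumes the reference microphone sits at the origin, writes $D_i$ (a symbol introduced here) for the distance from $M_i$ to the source, and reads $R_i$ in the simplified equation as $\lVert \mathbf{r}_i \rVert$, the distance of $M_i$ from the origin, which is the interpretation under which the equation holds exactly.

```latex
\begin{aligned}
D_i &= \lVert \mathbf{r}_i - \mathbf{r}_s \rVert, \qquad
R_s = \lVert \mathbf{r}_s \rVert, \qquad
R_i = \lVert \mathbf{r}_i \rVert, \qquad
d_{ij} = D_i - R_s \\
(R_s + d_{ij})^2 &= \lVert \mathbf{r}_i - \mathbf{r}_s \rVert^2
  = R_i^2 - 2\,\mathbf{r}_i^T \mathbf{r}_s + R_s^2 \\
R_s^2 + 2 d_{ij} R_s + d_{ij}^2 &= R_i^2 - 2\,\mathbf{r}_i^T \mathbf{r}_s + R_s^2 \\
0 &= R_i^2 - d_{ij}^2 - 2 d_{ij} R_s - 2\,\mathbf{r}_i^T \mathbf{r}_s
\end{aligned}
```

With a measured $d_{ij} = c\tau_{ij}$ in place of the true value, the last line no longer vanishes, which is the residual $\varepsilon$.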
Suppose the sound source localization system has M microphones, numbered 0 to M-1. Taking the position of microphone $M_0$ as the reference point and as the origin of the coordinate system gives:
$$\boldsymbol{\varepsilon} = \boldsymbol{\delta} - 2 R_s \mathbf{d} - 2 S\,\mathbf{r}_s$$
Figure BDA0003545726310000121
The minimum mean square error method minimizes the mean square error of the above expression, at which point the positioning result is most accurate. By derivation, the sound source position that achieves the minimum mean square error is:
Figure BDA0003545726310000122
Figure BDA0003545726310000123
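The closed-form solution above was given as figures, so the sketch below substitutes a plain linear least-squares solve of the same error equation, treating the source position and the range $R_s$ as four independent unknowns. That simplification is an assumption of this sketch: it needs at least four relative microphones in general position, whereas the patent's own formula may differ. The reference microphone is assumed to be at the origin, and `locate_least_squares` is a hypothetical name.

```python
import numpy as np


def locate_least_squares(mic_positions, range_diffs):
    """Solve delta = 2*S*r_s + 2*R_s*d for r_s and R_s in the least-squares sense.

    mic_positions: (N, 3) relative microphone positions, reference at the origin.
    range_diffs: (N,) values d_i = c * tau_i for each relative microphone.
    """
    S = np.asarray(mic_positions, dtype=float)
    d = np.asarray(range_diffs, dtype=float)
    delta = np.sum(S ** 2, axis=1) - d ** 2          # R_i^2 - d_i^2
    A = np.hstack((2.0 * S, 2.0 * d[:, None]))       # unknowns: x, y, z, R_s
    theta, *_ = np.linalg.lstsq(A, delta, rcond=None)
    return theta[:3], theta[3]
```

The returned position serves only as the initial position; the local response power search refines it afterwards.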
Specifically, the SRP-PHAT formula is as follows:
Figure BDA0003545726310000124
Figure BDA0003545726310000125
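The local response power search can be sketched as follows. It assumes the standard SRP-PHAT form, in which the response power at a candidate point is the sum, over all microphone pairs, of the PHAT-weighted cross-correlation evaluated at the delay that point implies. The patent's formula was given as figures and may differ in detail. The Cartesian grid, the function names and the `step` parameter are assumptions; the description itself parameterizes space by distance, azimuth and pitch.

```python
import itertools

import numpy as np


def phat_correlation(x, y):
    """PHAT-weighted circular cross-correlation; index k (mod n) holds lag k."""
    n = len(x) + len(y)
    cross = np.fft.rfft(x, n=n) * np.conj(np.fft.rfft(y, n=n))
    cross /= np.abs(cross) + 1e-12
    return np.fft.irfft(cross, n=n)


def srp_phat_local_search(signals, mics, fs, center, radius, step, c=343.0):
    """Return the grid point of maximum response power inside a ball."""
    mics = np.asarray(mics, dtype=float)
    center = np.asarray(center, dtype=float)
    pairs = list(itertools.combinations(range(len(mics)), 2))
    corr = {(k, l): phat_correlation(signals[k], signals[l]) for k, l in pairs}
    offsets = np.arange(-radius, radius + step / 2, step)
    best_pos, best_power = None, -np.inf
    for off in itertools.product(offsets, repeat=3):
        if np.linalg.norm(off) > radius:
            continue                                  # outside the local space
        q = center + np.array(off)
        dist = np.linalg.norm(mics - q, axis=1)
        power = 0.0
        for k, l in pairs:
            # Lag in samples that a source at q would produce for this pair.
            lag = int(round((dist[k] - dist[l]) / c * fs))
            r = corr[(k, l)]
            power += r[lag % len(r)]
        if power > best_power:
            best_pos, best_power = q, power
    return best_pos, best_power
```

Restricting the loop to a ball around the initial position is what keeps the number of evaluated points small compared with a global search.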
Compared with the prior art, the sound source positioning method disclosed in the embodiments of the invention acquires real-time microphone signals and calculates the signal delay value between each relative microphone and the reference microphone. It then calculates the initial position of the real-time microphone signal from these delay values, the acquired spatial positions of all the relative microphones and of the reference microphone, and the preset sound velocity. Finally, it searches a local space centered on the initial position, with the preset real-time search radius as its radius, and takes the position of the maximum local response power as the sound source position of the real-time microphone signal. Because the delay-based initial position serves as the search center, the initial range of the spatial search is greatly reduced, the positioning computation is reduced, and the method suits scenarios with high real-time requirements.
Referring to fig. 4, it is a schematic structural diagram of a sound source positioning device according to an embodiment of the present invention, where the sound source positioning device includes:
a signal delay value calculation module 11, configured to calculate a signal delay value between each relative microphone and the reference microphone according to the acquired real-time microphone signals; wherein the real-time microphone signals comprise real-time signals of the relative microphones and real-time signals of the reference microphone, and the number of the relative microphones is at least 3;
an initial position calculating module 12, configured to calculate an initial position of the real-time microphone signal according to the acquired spatial positions of all the relative microphones, the spatial position of the reference microphone, a preset sound velocity, and all the signal delay values;
a sound source position calculating module 13, configured to search a local space to obtain a position of a maximum local response power, and use the position of the maximum local response power as a sound source position of the real-time microphone signal; the center of the local space is the initial position, and the radius of the local space is a preset real-time searching radius.
It should be noted that, for a specific working process of the sound source positioning device, reference may be made to the working process of the sound source positioning method in the foregoing embodiment, and details are not repeated herein.
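As an illustration of the kind of computation the signal delay value calculation module 11 performs, the following Python sketch estimates the delay between one relative microphone and the reference microphone with the standard GCC-PHAT procedure. The function name, its parameters and the synthetic signals are hypothetical and not taken from the embodiment. A positive result means the relative microphone receives the sound later than the reference microphone, and the resolution is limited to one sample because no interpolation is applied.

```python
import numpy as np


def gcc_phat_delay(rel_signal, ref_signal, fs, max_delay=None):
    """Delay in seconds of rel_signal with respect to ref_signal via GCC-PHAT."""
    rel_signal = np.asarray(rel_signal, dtype=float)
    ref_signal = np.asarray(ref_signal, dtype=float)

    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(rel_signal) + len(ref_signal)
    rel_spec = np.fft.rfft(rel_signal, n=n)
    ref_spec = np.fft.rfft(ref_signal, n=n)

    # PHAT weighting: normalize the cross-spectrum to unit magnitude.
    cross = rel_spec * np.conj(ref_spec)
    cross = cross / np.maximum(np.abs(cross), 1e-12)
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to physically plausible lags if max_delay is given.
    max_shift = n // 2
    if max_delay is not None:
        max_shift = max(1, min(int(round(fs * max_delay)), max_shift))

    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs


# Synthetic example: the relative microphone lags the reference by 5 samples.
rng = np.random.default_rng(0)
fs = 16000
ref_signal = rng.standard_normal(1024)
rel_signal = np.concatenate((np.zeros(5), ref_signal[:-5]))
delay = gcc_phat_delay(rel_signal, ref_signal, fs, max_delay=0.005)
print(delay * fs)  # delay expressed in samples
```

The `max_delay` bound can be set from the largest microphone spacing divided by the sound velocity, which discards correlation peaks that no real source position could produce.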
Compared with the prior art, the sound source positioning device disclosed by the embodiment of the invention calculates the signal delay value between each relative microphone and the reference microphone from the acquired real-time microphone signals, calculates the initial position of the real-time microphone signal from the signal delay values, the acquired spatial positions of all the relative microphones and of the reference microphone, and the preset sound velocity, and then searches a local space centered on the initial position, taking the position of the maximum local response power as the sound source position of the real-time microphone signal. Because the search starts from an initial position derived from the signal delay values, the spatial positions of the microphones and the sound velocity, the initial range of the spatial search is greatly reduced, the positioning calculation amount is reduced, and the device is suitable for scenes with high real-time requirements.
Referring to fig. 5, which is a schematic structural diagram of a sound source positioning device according to an embodiment of the present invention, the sound source positioning device includes a processor 21, a memory 22, and a computer program stored in the memory 22 and configured to be executed by the processor 21, and the processor 21, when executing the computer program, implements the steps in the sound source positioning method embodiments, such as the steps S1 to S3 shown in fig. 1; alternatively, the processor 21, when executing the computer program, implements the functions of the modules in the above device embodiments, such as the signal delay value calculating module 11.
Illustratively, the computer program may be divided into one or more modules, which are stored in the memory 22 and executed by the processor 21 to accomplish the present invention. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program in the sound source localization device. For example, the computer program may be divided into a signal delay value calculation module 11, an initial position calculation module 12, and a sound source position calculation module 13, where the specific functions of each module are as follows:
a signal delay value calculation module 11, configured to calculate a signal delay value between each relative microphone and the reference microphone according to the acquired real-time microphone signals; wherein the real-time microphone signals comprise real-time signals of the relative microphones and real-time signals of the reference microphone, and the number of the relative microphones is at least 3;
an initial position calculating module 12, configured to calculate an initial position of the real-time microphone signal according to the acquired spatial positions of all the relative microphones, the spatial position of the reference microphone, a preset sound velocity, and all the signal delay values;
a sound source position calculating module 13, configured to search a local space to obtain a position of a maximum local response power, and use the position of the maximum local response power as a sound source position of the real-time microphone signal; the center of the local space is the initial position, and the radius of the local space is a preset real-time searching radius.
The specific working process of each module may refer to the working process of the sound source positioning device described in the above embodiment, and is not described herein again.
The sound source positioning device can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing devices. The sound source localization device may include, but is not limited to, a processor 21, a memory 22. It will be appreciated by those skilled in the art that the schematic diagram is merely an example of a sound source localization device and is not intended to be limiting, and may include more or fewer components than shown, or some components in combination, or different components, e.g., the sound source localization device may also include input output devices, network access devices, buses, etc.
The processor 21 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor or any conventional processor. The processor 21 is the control center of the sound source localization device and connects the various parts of the overall sound source localization device using various interfaces and lines.
The memory 22 may be used to store the computer programs and/or modules, and the processor 21 may implement the various functions of the sound source localization device by running or executing the computer programs and/or modules stored in the memory 22 and invoking the data stored in the memory 22. The memory 22 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like; the data storage area may store data created according to the use of the device, and the like. In addition, the memory 22 may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.
The integrated modules of the sound source localization device, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier wave signal, a telecommunications signal, a software distribution medium, and the like.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (10)

1. A sound source localization method, comprising:
calculating the signal delay value between each relative microphone and the reference microphone according to the acquired real-time microphone signals; wherein the real-time microphone signals comprise real-time signals of the relative microphones and real-time signals of the reference microphone, and the number of the relative microphones is at least 3;
calculating the initial position of the real-time microphone signal according to the acquired spatial positions of all the relative microphones, the spatial position of the reference microphone, a preset sound velocity and all the signal time delay values;
searching a local space to obtain the position of the maximum local response power, and taking the position of the maximum local response power as the sound source position of the real-time microphone signal; the center of the local space is the initial position, and the radius of the local space is a preset real-time searching radius.
2. The sound source localization method of claim 1, wherein the real-time search radius is obtained by:
searching a global space to obtain a maximum global response power and a global position corresponding to the maximum global response power, calculating a distance error between the global position and the initial position, and setting a power threshold according to the maximum global response power;
setting a search radius according to the distance error;
and taking the search radius as the real-time search radius until the maximum local response power calculated in real time is smaller than the power threshold, and then returning to the step of searching the global space.
3. The sound source localization method of claim 1, wherein after taking the position of the maximum local response power as the sound source position of the real-time microphone signal, the method further comprises:
and performing weighted calculation according to the acquired sound source positions of the plurality of recent historical microphone signals and the sound source position of the real-time microphone signal to obtain a final position, and updating the sound source position of the real-time microphone signal according to the final position.
4. The sound source localization method of claim 3, wherein the final position is calculated by:
Figure FDA0003545726300000021
IF (||r_s(t) - r_s(t-1)||_2 > μ);
Figure FDA0003545726300000022
Figure FDA0003545726300000023
wherein r_final(t) represents the final position, ω(t-i) represents the weight coefficient of the sound source position at the i-th previous time, r_s(t) denotes the sound source position of the real-time microphone signal, r_s(t-1) represents the sound source position at the previous time, and μ represents a preset distance threshold.
5. The sound source localization method of claim 1, wherein before calculating the signal delay value between each relative microphone and the reference microphone according to the acquired real-time microphone signals, the method further comprises:
carrying out normalization processing on the real-time microphone signals;
and performing valid frame detection on the real-time microphone signal to delete invalid frames.
6. The sound source localization method of claim 1, wherein the calculating the signal delay value between each relative microphone and the reference microphone according to the acquired real-time microphone signals comprises:
and calculating the signal time delay value of each relative microphone and the reference microphone according to the acquired real-time microphone signals based on a GCC-PHAT algorithm.
7. The sound source localization method according to claim 1, wherein the calculating an initial position of the real-time microphone signal according to the acquired spatial positions of all the relative microphones, the spatial position of the reference microphone, a preset sound velocity, and all the signal delay values specifically comprises:
acquiring the reference position of the reference microphone and the relative positions of all the relative microphones, and setting the sound source position of the real-time microphone signal as an unknown position;
listing a solution equation set of the space time delay value of each relative microphone and the reference microphone according to the relative position, the reference position, the unknown position and the obtained sound speed;
establishing a sound source position calculation equation set by using the signal delay value and the solution equation set of the space delay value;
and solving the sound source position calculation equation set based on a minimum mean square error method to obtain the initial position of the real-time microphone signal.
8. A sound source localization apparatus, comprising:
a signal delay value calculation module, configured to calculate the signal delay value between each relative microphone and the reference microphone according to the acquired real-time microphone signals; wherein the real-time microphone signals comprise real-time signals of the relative microphones and real-time signals of the reference microphone, and the number of the relative microphones is at least 3;
the initial position calculation module is used for calculating the initial position of the real-time microphone signal according to the acquired spatial positions of all the relative microphones, the spatial position of the reference microphone, a preset sound velocity and all the signal time delay values;
a sound source position calculating module, configured to search a local space to obtain a position of a maximum local response power, and use the position of the maximum local response power as a sound source position of the real-time microphone signal; the center of the local space is the initial position, and the radius of the local space is a preset real-time searching radius.
9. A sound source localization device comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the sound source localization method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform a sound source localization method according to any one of claims 1 to 7.
CN202210248170.6A 2022-03-14 2022-03-14 Sound source positioning method, device, equipment and storage medium Pending CN114646920A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210248170.6A CN114646920A (en) 2022-03-14 2022-03-14 Sound source positioning method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114646920A true CN114646920A (en) 2022-06-21

Family

ID=81994119

Country Status (1)

Country Link
CN (1) CN114646920A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination