CN117289208B - Sound source positioning method and device

Info

Publication number: CN117289208B
Application number: CN202311575126.7A
Authority: CN
Inventors: 徐燕平; 李娅娆; 王邵惇; 刘宇杰
Original assignee: Rstech Ltd
Current assignee: Rstech Ltd
Priority date: 2023-11-24
Filing date: 2023-11-24
Publication date: 2024-02-20
Anticipated expiration: 2043-11-24
Also published as: CN117289208A

Abstract

The invention discloses a sound source positioning method and device, and relates to the technical field of signal processing. One embodiment of the method comprises: acquiring a sound signal based on a microphone array; preprocessing the sound signal; identifying, based on the preprocessed sound signal, a target sound source type to which the sound signal belongs; determining, among a plurality of positioning algorithms corresponding to sound source types, a target positioning algorithm corresponding to the target sound source type; performing beamforming processing on the preprocessed sound signal; obtaining a sound source positioning result by applying the target positioning algorithm to the beamformed sound signal; and determining whether a cycle termination condition is met; if so, terminating the current flow; otherwise, calculating a steering vector based on the sound source positioning result, performing beamforming processing on the sound signal based on the steering vector to obtain an enhanced sound signal, and performing the preprocessing again based on the enhanced sound signal. This embodiment can improve sound source positioning accuracy.

Description

Sound source positioning method and device

Technical Field

The present invention relates to the field of signal processing technologies, and in particular, to a method and an apparatus for positioning a sound source.

Background

Sound source positioning is widely applied in fields such as security monitoring and smart homes, so how to position a sound source accurately has become a problem to be solved.

In the prior art, a microphone array is used for collecting sound signals, and the sound signals are processed through a positioning algorithm such as a least square method to obtain the position of a sound source.

However, this method has low positioning accuracy.

Disclosure of Invention

In view of the above, the embodiments of the present invention provide a sound source positioning method and apparatus, which can improve sound source positioning accuracy.

In a first aspect, an embodiment of the present invention provides a sound source positioning method, including:

acquiring sound signals based on the microphone array;

preprocessing the sound signal;

identifying a target sound source type to which the sound signal belongs based on the preprocessed sound signal;

determining a target positioning algorithm corresponding to the target sound source type in a plurality of positioning algorithms corresponding to the sound source type;

carrying out beam forming processing on the preprocessed sound signals;

based on the sound signals subjected to the beam forming processing, the target positioning algorithm is adopted to obtain a sound source positioning result;

and determining whether a cycle termination condition is met; if so, terminating the current flow; otherwise, calculating a steering vector based on the positioning result of the sound source, performing beamforming processing on the sound signal based on the steering vector to obtain an enhanced sound signal, and performing the preprocessing of the sound signal based on the enhanced sound signal.

In a second aspect, an embodiment of the present invention provides a sound source positioning apparatus, including:

an acquisition module configured to acquire sound signals based on the microphone array;

an identification module configured to: preprocess the sound signal; identify, based on the preprocessed sound signal, a target sound source type to which the sound signal belongs; and determine, among a plurality of positioning algorithms corresponding to sound source types, a target positioning algorithm corresponding to the target sound source type;

a positioning module configured to: perform beamforming processing on the preprocessed sound signal; obtain a sound source positioning result by applying the target positioning algorithm to the beamformed sound signal; and determine whether a cycle termination condition is met and, if so, terminate the current flow, or otherwise calculate a steering vector based on the positioning result of the sound source, perform beamforming processing on the sound signal based on the steering vector to obtain an enhanced sound signal, and perform the preprocessing of the sound signal based on the enhanced sound signal.

In a third aspect, an embodiment of the present invention provides an electronic device, including:

one or more processors;

storage means for storing one or more programs,

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the embodiments above.

In a fourth aspect, embodiments of the present invention provide a computer readable medium having stored thereon a computer program which, when executed by a processor, implements a method as in any of the embodiments described above.

One embodiment of the above invention has the following advantages or benefits: a better-suited positioning algorithm is selected based on the sound source type to which the sound signal belongs, which improves the accuracy of the positioning result. The positioning result is in turn used to enhance the sound signal, yielding a sound signal with a higher signal-to-noise ratio. The enhanced sound signal is less affected by noise, which helps classify the sound source type more accurately and thus obtain a more accurate positioning result. In short, the embodiment of the invention performs sound source localization based on sound source identification, optimizes the sound source identification based on the localization result, and improves the accuracy of sound source localization over multiple cycles.

Further effects of the above optional implementations are described below in connection with the specific embodiments.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:

FIG. 1 is a flow chart of a method of sound source localization provided in one embodiment of the present invention;

FIG. 2 is a flow chart of a method for controlling a camera based on sound source localization according to one embodiment of the present invention;

FIG. 3 is a schematic view of a sound source localization apparatus provided in accordance with one embodiment of the present invention;

FIG. 4 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.

Detailed Description

Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

As shown in FIG. 1, an embodiment of the present invention provides a sound source localization method, including:

Step 101: sound signals are acquired based on the microphone array.

The microphone array may collect sound signals in the frequency range [100 Hz, 8000 Hz].

Step 102: the sound signal is preprocessed.

The preprocessing may include noise removal, filtering, and the like.

Step 103: based on the preprocessed sound signal, the target sound source type to which the sound signal belongs is identified.

Step 104: among a plurality of localization algorithms corresponding to the sound source type, a target localization algorithm corresponding to the target sound source type is determined.

The sound source type may be, for example, a human voice or a vehicle sound. Different localization algorithms suit different sound source types; for example, if the sound source type is a human voice, the cross-correlation method may be used as the localization algorithm.
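The selection in step 104 can be pictured as a simple lookup. The sketch below is illustrative only: the pairing of a human voice with the cross-correlation method comes from the text, while the other table entry, the fallback, and all names are assumptions introduced here.

```python
# Hypothetical lookup table from sound source type to localization algorithm.
ALGORITHM_BY_SOURCE_TYPE = {
    "human_voice": "cross_correlation",  # example given in the text
    "vehicle": "least_squares",          # assumption, for illustration only
}

# Assumed fallback when a source type has no dedicated algorithm.
DEFAULT_ALGORITHM = "least_squares"


def select_algorithm(source_type: str) -> str:
    """Return the name of the localization algorithm for a source type."""
    return ALGORITHM_BY_SOURCE_TYPE.get(source_type, DEFAULT_ALGORITHM)
```

A table keeps the mapping easy to extend when new sound source types are added to the classifier.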

Step 105: beamforming processing is performed on the preprocessed sound signal.

The beamforming process can enhance the sound signal in the direction of interest, suppressing noise and interference in other directions.

Step 106: based on the beamformed sound signal, the target localization algorithm is applied to obtain a sound source localization result.

The positioning result may include position coordinates of the sound source, and may further include angle information, such as an incident angle of the sound source.

Step 107: it is determined whether the loop termination condition is satisfied, if so, the current flow is terminated, otherwise, step 108 is performed.

Step 108: a steering vector is calculated based on the localization result of the sound source.

Step 109: the sound signal is beamformed based on the steering vector, resulting in an enhanced sound signal, and step 102 is performed based on the enhanced sound signal.

According to the embodiment of the invention, a better-suited positioning algorithm is selected based on the sound source type to which the sound signal belongs, which improves the accuracy of the positioning result. The positioning result is in turn used to enhance the sound signal, yielding a sound signal with a higher signal-to-noise ratio. The enhanced sound signal is less affected by noise, which helps classify the sound source type more accurately and thus obtain a more accurate positioning result. In short, the embodiment of the invention performs sound source localization based on sound source identification, optimizes the sound source identification based on the localization result, and improves the accuracy of sound source localization over multiple cycles.
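The control flow of steps 102 to 109 can be sketched as a loop. This is a minimal sketch, not the implementation of the embodiment: every processing stage is passed in as a hypothetical callable, and the termination condition (a cycle limit, or a position change below a tolerance) is an assumption, since the text does not specify it.

```python
import math
from typing import Callable, List, Optional, Tuple

Signal = List[List[float]]      # one list of samples per microphone
Position = Tuple[float, ...]    # position coordinates of the sound source
Locator = Callable[[Signal], Position]


def localize_iteratively(
    raw: Signal,
    preprocess: Callable[[Signal], Signal],
    classify: Callable[[Signal], str],
    select_algorithm: Callable[[str], Locator],
    beamform: Callable[[Signal], Signal],
    enhance: Callable[[Signal, Position], Signal],
    max_cycles: int = 5,
    tolerance: float = 0.01,
) -> Tuple[Position, int]:
    """Run the identify / localize / enhance cycle until it settles."""
    if max_cycles < 1:
        raise ValueError("max_cycles must be at least 1")
    signal = raw
    previous: Optional[Position] = None
    for cycle in range(1, max_cycles + 1):
        clean = preprocess(signal)               # step 102
        source_type = classify(clean)            # step 103
        locate = select_algorithm(source_type)   # step 104
        beam = beamform(clean)                   # step 105
        position = locate(beam)                  # step 106
        # Step 107 (assumed condition): stop once the estimate stops moving.
        if previous is not None and math.dist(position, previous) < tolerance:
            return position, cycle
        previous = position
        # Steps 108-109: steering vector and enhancement, hidden in `enhance`.
        signal = enhance(raw, position)
    return previous, max_cycles
```

Passing the stages as callables keeps the loop independent of any particular classifier, beamformer, or localization algorithm.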

Before the method is applied, that is, before the sound signals are acquired based on the microphone array, the parameters of the microphone array and the parameters of the filter used for preprocessing the sound signal may be calibrated using a calibration signal. Specifically: the calibration signal is preprocessed; the calibration sound source type to which the calibration signal belongs is identified based on the preprocessed calibration signal; a calibration positioning algorithm corresponding to the calibration sound source type is determined among the plurality of positioning algorithms corresponding to the sound source types; beamforming processing is performed on the preprocessed calibration signal; a calibration positioning result is obtained by applying the calibration positioning algorithm to the beamformed calibration signal; the calibration positioning result is compared with the position information of the sound source that emitted the calibration signal to obtain a positioning error; and the parameters of the microphone array and/or the parameters of the filter are adjusted based on the positioning error. The parameters of the microphone array include the type, layout, and number of microphones, among others.

In one embodiment of the present invention, identifying a target sound source type to which a sound signal belongs based on a preprocessed sound signal includes:

extracting features from the preprocessed sound signal to obtain sound features;

standardizing the sound features;

and inputting the standardized sound features into a trained sound source classification model to obtain the target sound source type to which the sound signal belongs.

Feature standardization may be performed by subtracting the mean of each sound feature from its values, so that the mean of the updated feature becomes approximately 0, and by dividing each sound feature by its standard deviation, so that the feature has unit variance. Feature standardization eliminates the influence of scale differences and improves the accuracy of sound source identification.
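As a small illustration of this zero-mean, unit-variance scaling, the sketch below standardizes each feature dimension (column) over a set of feature vectors. The function name and the use of the population standard deviation are choices made here, not details taken from the text.

```python
import statistics
from typing import List


def standardize(features: List[List[float]]) -> List[List[float]]:
    """Scale every feature dimension to zero mean and unit variance."""
    columns = list(zip(*features))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        # A constant feature (std == 0) carries no information; map it to 0.
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in features
    ]
```

In a deployed classifier the mean and standard deviation would normally be computed once on the training samples and reused at inference time.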

The sound source classification model is obtained in advance by training on sample signals. Specifically, the sample signals are preprocessed; features are extracted from the preprocessed sample signals to obtain their sound features; the sound features are standardized; the standardized sound features are input into a decision tree model; and the parameters of the decision tree model are adjusted based on the output of the decision tree model and the labels of the sample signals. The decision tree model may also be replaced by another classification model, such as a support vector machine model.

A machine learning model can learn the relationship between sound features and sound source types, and can therefore identify the sound source type to which a sound signal belongs more accurately.

In one embodiment of the present invention, feature extraction is performed on a preprocessed sound signal to obtain sound features, including:

converting the preprocessed sound signal from a time domain representation to a frequency domain representation based on a short-time Fourier transform;

extracting mel spectrum features from the frequency-domain sound signal;

extracting channel parameters from the preprocessed sound signal based on linear predictive coding;

and extracting the fundamental frequency of the sound signal.

According to the embodiment of the invention, features of different dimensions are extracted from the sound signal, which improves the accuracy of sound source identification. In an actual application scenario, the sound features may include any one or more of the mel spectrum features, the channel parameters, and the fundamental frequency; for example, they may include only the mel spectrum features.
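Of the listed features, the fundamental frequency is the simplest to illustrate. The text does not say how it is extracted; the sketch below assumes a plain autocorrelation peak search, and the function name and the 80 Hz to 400 Hz search range are hypothetical.

```python
import math
from typing import List


def estimate_f0(samples: List[float], sample_rate: int,
                f_min: float = 80.0, f_max: float = 400.0) -> float:
    """Estimate the fundamental frequency in Hz from the autocorrelation peak."""
    lag_min = int(sample_rate / f_max)   # shortest period considered
    lag_max = int(sample_rate / f_min)   # longest period considered
    if len(samples) <= lag_max:
        raise ValueError("frame is too short for the requested f_min")
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(samples[n] * samples[n - lag]
                    for n in range(lag, len(samples)))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag
```

For example, a frame of a pure 200 Hz tone sampled at 8000 Hz has a period of 40 samples, so the search settles on a lag of 40.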

In one embodiment of the present invention, after obtaining the localization result of the sound source using the target localization algorithm, before determining whether the loop termination condition is satisfied, the method further includes:

calculating an average positioning result based on the positioning results of multiple frames of the sound signal, and then performing the determination of whether the cycle termination condition is satisfied.

In order to reduce the positioning error of the single-frame sound signal, the positioning results of the multi-frame sound signals can be fused to obtain the final positioning result. The fusion mode adopted by the embodiment of the invention is to calculate the average value, specifically, calculate the average position coordinate according to the position coordinates in each positioning result.
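The averaging described here amounts to a per-coordinate mean over the frames. A minimal sketch, with a hypothetical function name:

```python
from typing import List, Tuple


def average_position(results: List[Tuple[float, ...]]) -> Tuple[float, ...]:
    """Average the position coordinates of several per-frame results."""
    if not results:
        raise ValueError("at least one positioning result is required")
    count = len(results)
    # zip(*results) groups all x values, all y values, and so on.
    return tuple(sum(axis) / count for axis in zip(*results))
```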

In one embodiment of the invention, the method further comprises: smoothing the sound source positioning result based on a Kalman filter, and then performing the determination of whether the cycle termination condition is satisfied.

In order to further reduce the influence of factors such as noise, the embodiment of the invention carries out smoothing processing on the sound source positioning result based on the Kalman filter, and further improves the accuracy of the positioning result.
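To make the smoothing step concrete, the sketch below runs a scalar Kalman filter over a sequence of measurements of one coordinate. The state model (a nearly static source) and the two variance values are assumptions made for illustration; the text does not specify the filter design.

```python
from typing import List


def kalman_smooth(measurements: List[float],
                  process_var: float = 1e-3,
                  measurement_var: float = 0.25) -> List[float]:
    """Smooth one position coordinate with a scalar Kalman filter."""
    if not measurements:
        raise ValueError("at least one measurement is required")
    estimate = measurements[0]   # initialize from the first measurement
    error_var = 1.0              # assumed initial uncertainty
    smoothed = [estimate]
    for z in measurements[1:]:
        # Predict: the source is assumed static, so only uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        smoothed.append(estimate)
    return smoothed
```

A full positioning result would be smoothed by applying the filter to each coordinate, or by using a vector-valued filter with a motion model if the source moves.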

In an actual application scenario, the positioning results may also be filtered against a set threshold to eliminate abnormal or unstable positioning points.

In one embodiment of the present invention, calculating a steering vector based on a localization result of a sound source includes:

calculating a steering vector based on formula (1);

(1) [formula not reproduced in the source text]

wherein the symbols of formula (1) respectively characterize: the steering vector; the imaginary unit; the frequency of the sound signal; the distance from the sound source to the m-th microphone of the microphone array; the angle of incidence of the sound source; and the speed of sound. The distance and the angle of incidence are determined by the localization result of the sound source.
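The text lists the quantities that enter formula (1), but the expression itself is not reproduced here. Purely as an illustration, the sketch below assumes a conventional narrowband phase-delay form, a_m = exp(-j * 2 * pi * f * d_m * cos(theta) / c); this form, the function name, and the default speed of sound are assumptions and should not be read as the formula of the publication.

```python
import cmath
import math
from typing import List

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def steering_vector(frequency_hz: float,
                    distances_m: List[float],
                    incidence_rad: float,
                    speed: float = SPEED_OF_SOUND) -> List[complex]:
    """Assumed form: a_m = exp(-j * 2*pi*f * d_m * cos(theta) / c)."""
    return [
        cmath.exp(-1j * 2.0 * math.pi * frequency_hz
                  * d * math.cos(incidence_rad) / speed)
        for d in distances_m
    ]
```

Each element is a unit-magnitude phase term, one per microphone, so the vector only rotates phases and never scales amplitudes.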

Steering vectors are used to determine weight vectors in the beamforming formula, e.g., steering vectors of microphones are directly taken as weight vectors.

The beamforming formula is shown in formula (2).

(2) [formula not reproduced in the source text]

wherein the symbols of formula (2) respectively characterize: the signal output after the beamforming process; the weight vector of microphone i; and the signal received by microphone i at time t.
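The quantities named for formula (2) (an output signal, a weight per microphone, and the signal received by each microphone at time t) match a weighted-sum beamformer. The sketch below assumes the common form y(t) = (1/M) * sum_i conj(w_i) * x_i(t); the conjugation, the 1/M normalization, and the function name are assumptions, since the expression is not reproduced in the text.

```python
from typing import List


def weighted_sum_beamform(weights: List[complex],
                          channels: List[List[complex]]) -> List[complex]:
    """Assumed form: y(t) = (1/M) * sum_i conj(w_i) * x_i(t)."""
    if not weights or len(weights) != len(channels):
        raise ValueError("need exactly one weight per microphone channel")
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same length")
    count = len(weights)
    return [
        sum(w.conjugate() * ch[t] for w, ch in zip(weights, channels)) / count
        for t in range(length)
    ]
```

With the steering vector used directly as the weight vector, as the text suggests, conjugating the weights cancels the per-microphone phase so that the channels add coherently.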

In one embodiment of the invention, the positioning algorithm comprises any one or more of a cross-correlation method, a mutual information method, and a least squares method.
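The text names the cross-correlation method without detailing it. One common use, sketched below under that assumption, is to estimate the delay between two microphone channels by searching for the lag that maximizes their cross-correlation; the function name and the brute-force search are illustrative choices.

```python
from typing import List


def estimate_delay(reference: List[float], delayed: List[float],
                   max_lag: int) -> int:
    """Return the lag in samples at which `delayed` best matches `reference`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, value in enumerate(reference):
            j = i + lag
            if 0 <= j < len(delayed):
                score += value * delayed[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

Dividing the returned lag by the sampling rate gives the time difference of arrival, which can then be combined with the microphone geometry to estimate a direction or position.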

As shown in FIG. 2, an embodiment of the present invention provides a method for controlling a camera based on sound source localization, including:

Step 201: the control system obtains the positioning result of the noise source.

Step 202: the control system calculates the azimuth angle and the pitch angle of the noise source relative to the camera according to the positioning result.

Step 203: the control system sends a control instruction to the camera according to the azimuth angle and the pitch angle.

Step 204: the camera turns toward the noise source according to the control instruction and captures an image that includes the noise source.

Step 205: the control system analyzes the image of the noise source, identifies the type of the noise source, and sends out an alarm signal when the noise source meets the alarm condition.

If the noise source is identified as a potential safety risk, the control system may trigger an alarm mechanism to inform the relevant personnel or automatically take action.

According to the embodiment of the invention, the sound source positioning is combined with the control of the camera, so that potential safety risks can be found in time, and the emergency treatment speed is improved.
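The angle calculation in step 202 can be sketched as follows. The coordinate convention is an assumption made here (x and y span the horizontal plane, z points up, azimuth is measured counterclockwise from the x axis), as is the function name; the text does not define either.

```python
import math
from typing import Sequence, Tuple


def azimuth_and_pitch(source: Sequence[float],
                      camera: Sequence[float]) -> Tuple[float, float]:
    """Return (azimuth, pitch) in degrees of the source seen from the camera."""
    dx, dy, dz = (s - c for s, c in zip(source, camera))
    azimuth = math.degrees(math.atan2(dy, dx))
    # Pitch is the elevation above the horizontal plane through the camera.
    pitch = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return azimuth, pitch
```

The two angles would then be translated into the pan and tilt commands of the specific camera, which depend on its control protocol.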

As shown in FIG. 3, an embodiment of the present invention provides a sound source positioning apparatus, including:

an acquisition module 301 configured to acquire sound signals based on the microphone array;

an identification module 302 configured to: preprocess the sound signal; identify, based on the preprocessed sound signal, a target sound source type to which the sound signal belongs; and determine, among a plurality of positioning algorithms corresponding to sound source types, a target positioning algorithm corresponding to the target sound source type;

a positioning module 303 configured to: perform beamforming processing on the preprocessed sound signal; obtain a sound source positioning result by applying the target positioning algorithm to the beamformed sound signal; and determine whether a cycle termination condition is met and, if so, terminate the current flow, or otherwise calculate a steering vector based on the sound source positioning result, perform beamforming processing on the sound signal based on the steering vector to obtain an enhanced sound signal, and perform the preprocessing of the sound signal based on the enhanced sound signal.

In one embodiment of the present invention, the identification module 302 is configured to: extract features from the preprocessed sound signal to obtain sound features; standardize the sound features; and input the standardized sound features into a trained sound source classification model to obtain the target sound source type to which the sound signal belongs.

In one embodiment of the present invention, the positioning module 303 is configured to calculate an average positioning result based on the positioning results of multiple frames of the sound signal, and then perform the determination of whether the cycle termination condition is satisfied.

In one embodiment of the present invention, the positioning module 303 is configured to smooth the sound source positioning result based on a Kalman filter, and then perform the determination of whether the cycle termination condition is satisfied.

In one embodiment of the invention, the positioning module 303 is configured to calculate the steering vector based on formula (1).

In one embodiment of the invention, the identification module 302 is configured to: convert the preprocessed sound signal from a time domain representation to a frequency domain representation based on a short-time Fourier transform; extract mel spectrum features from the frequency-domain sound signal; extract channel parameters from the preprocessed sound signal based on linear predictive coding; and extract the fundamental frequency of the sound signal.

An embodiment of the present invention provides an electronic device, including:

one or more processors;

storage means for storing one or more programs,

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of the embodiments described above.

The present invention provides a computer readable medium having stored thereon a computer program which when executed by a processor implements a method as in any of the embodiments described above.

Referring now to FIG. 4, there is illustrated a schematic diagram of a computer system 400 suitable for use in implementing an embodiment of the present invention. The computer system shown in FIG. 4 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.

As shown in FIG. 4, the computer system 400 includes a Central Processing Unit (CPU) 401, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. The RAM 403 also stores various programs and data required for the operation of the system 400. The CPU 401, ROM 402, and RAM 403 are connected to each other by a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.

The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output portion 407 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker, and the like; a storage section 408 including a hard disk or the like; and a communication section 409 including a network interface card such as a LAN card, a modem, or the like. The communication section 409 performs communication processing via a network such as the internet. The drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed on the drive 410 as needed, so that a computer program read therefrom is installed into the storage section 408 as needed.

In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 409 and/or installed from the removable medium 411. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 401.

The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, which may be described, for example, as: a processor including a sending module, an obtaining module, a determining module, and a first processing module. In some cases the names of these modules do not limit the modules themselves; for example, the sending module may also be described as "a module that sends a picture acquisition request to a connected server".

The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims

1. A sound source localization method, comprising:

acquiring sound signals based on the microphone array;

preprocessing the sound signal;

carrying out beam forming processing on the preprocessed sound signals;

determining whether a cycle termination condition is met; if so, terminating the current flow; otherwise, calculating a steering vector based on a positioning result of the sound source, performing beamforming processing on the sound signal based on the steering vector to obtain an enhanced sound signal, and performing preprocessing on the sound signal based on the enhanced sound signal;

calculating a steering vector based on the localization result of the sound source, comprising:

calculating the steering vector based on equation 1;

equation 1: [formula not reproduced in the source text]

wherein the symbols of equation 1 respectively characterize: the steering vector; the imaginary unit; the frequency of the sound signal; the distance from the sound source to the m-th microphone of the microphone array; the angle of incidence of the sound source; and the speed of sound; and wherein the distance and the angle of incidence are determined by the positioning result of the sound source.

2. The method of claim 1, wherein,

based on the preprocessed sound signal, identifying a target sound source type to which the sound signal belongs, comprising:

performing feature standardization on the sound features;

3. The method of claim 1, wherein,

after the target positioning algorithm is adopted to obtain the positioning result of the sound source, before determining whether the cycle termination condition is met, the method further comprises the following steps:

and calculating an average positioning result based on the positioning result of the multi-frame sound signals, and executing the determination whether the cycle termination condition is met.

4. The method as recited in claim 1, further comprising: and carrying out smoothing processing on the sound source positioning result based on a Kalman filter, and executing the determination whether the loop termination condition is met.

5. The method of claim 2, wherein,

extracting the characteristics of the preprocessed sound signals to obtain sound characteristics, wherein the method comprises the following steps:

extracting a fundamental frequency of the sound signal.

6. The method of claim 1, wherein,

the positioning algorithm comprises: any one or more of a cross-correlation method, a mutual information method, and a least squares method.

7. A sound source localization apparatus, comprising:

a positioning module configured to: perform beamforming processing on the preprocessed sound signal; obtain a sound source positioning result by applying the target positioning algorithm to the beamformed sound signal; and determine whether a cycle termination condition is met and, if so, terminate the current flow, or otherwise calculate a steering vector based on a positioning result of the sound source, perform beamforming processing on the sound signal based on the steering vector to obtain an enhanced sound signal, and perform preprocessing on the sound signal based on the enhanced sound signal;

calculating the steering vector based on equation 1;

equation 1: [formula not reproduced in the source text]

wherein the symbols of equation 1 respectively characterize: the steering vector; the imaginary unit; the frequency of the sound signal; the distance from the sound source to the m-th microphone of the microphone array; the angle of incidence of the sound source; and the speed of sound; and wherein the distance and the angle of incidence are determined by the positioning result of the sound source.

8. An electronic device, comprising:

one or more processors;

storage means for storing one or more programs,

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-6.

9. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-6.