CN117289208B - Sound source positioning method and device - Google Patents
- Publication number
- CN117289208B (application CN202311575126.7A)
- Authority
- CN
- China
- Prior art keywords
- sound
- sound source
- sound signal
- positioning
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S5/00—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
- G01S5/18—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention discloses a sound source positioning method and device, and relates to the technical field of signal processing. One embodiment of the method comprises: acquiring a sound signal based on a microphone array; preprocessing the sound signal; identifying, based on the preprocessed sound signal, a target sound source type to which the sound signal belongs; determining, among a plurality of positioning algorithms corresponding to sound source types, a target positioning algorithm corresponding to the target sound source type; performing beamforming processing on the preprocessed sound signal; obtaining a sound source positioning result by applying the target positioning algorithm to the beamformed sound signal; and determining whether a cycle termination condition is met: if so, terminating the current flow; otherwise, calculating a steering vector based on the sound source positioning result, performing beamforming processing on the sound signal based on the steering vector to obtain an enhanced sound signal, and performing the preprocessing again based on the enhanced sound signal. This embodiment can improve sound source positioning accuracy.
Description
Technical Field
The present invention relates to the field of signal processing technologies, and in particular, to a method and an apparatus for positioning a sound source.
Background
Sound source positioning is widely used in fields such as security monitoring and smart homes, so positioning a sound source accurately has become a problem to be solved.
In the prior art, a microphone array is used to collect sound signals, and the sound signals are processed by a positioning algorithm, such as the least squares method, to obtain the position of the sound source.
However, this method has low positioning accuracy.
Disclosure of Invention
In view of the above, the embodiments of the present invention provide a sound source positioning method and apparatus, which can improve sound source positioning accuracy.
In a first aspect, an embodiment of the present invention provides a sound source positioning method, including:
acquiring sound signals based on the microphone array;
preprocessing the sound signal;
identifying a target sound source type to which the sound signal belongs based on the preprocessed sound signal;
determining a target positioning algorithm corresponding to the target sound source type in a plurality of positioning algorithms corresponding to the sound source type;
carrying out beam forming processing on the preprocessed sound signals;
based on the sound signals subjected to the beam forming processing, the target positioning algorithm is adopted to obtain a sound source positioning result;
and determining whether a cycle termination condition is met; if so, terminating the current flow; otherwise, calculating a steering vector based on the sound source positioning result, performing beamforming processing on the sound signal based on the steering vector to obtain an enhanced sound signal, and performing the preprocessing of the sound signal again based on the enhanced sound signal.
In a second aspect, an embodiment of the present invention provides a sound source positioning apparatus, including:
an acquisition module configured to acquire sound signals based on the microphone array;
an identification module configured to pre-process the sound signal; identifying a target sound source type to which the sound signal belongs based on the preprocessed sound signal; determining a target positioning algorithm corresponding to the target sound source type in a plurality of positioning algorithms corresponding to the sound source type;
a positioning module configured to perform beamforming processing on the preprocessed sound signal; obtain a sound source positioning result by applying the target positioning algorithm to the beamformed sound signal; and determine whether a cycle termination condition is met; if so, terminate the current flow; otherwise, calculate a steering vector based on the sound source positioning result, perform beamforming processing on the sound signal based on the steering vector to obtain an enhanced sound signal, and perform the preprocessing of the sound signal again based on the enhanced sound signal.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the embodiments above.
In a fourth aspect, embodiments of the present invention provide a computer readable medium having stored thereon a computer program which, when executed by a processor, implements a method as in any of the embodiments described above.
One embodiment of the above invention has the following advantages or benefits: a better-suited positioning algorithm is selected based on the sound source type to which the sound signal belongs, which improves the accuracy of the positioning result. The positioning result is then used to enhance the sound signal, yielding a sound signal with a higher signal-to-noise ratio. The enhanced sound signal reduces the influence of noise, which helps classify the sound source type more accurately and in turn yields a more accurate positioning result. In short, the embodiment of the invention performs sound source localization based on sound source identification, refines the sound source identification based on the localization result, and improves the accuracy of sound source localization over multiple cycles.
Further effects of the above optional implementations are described below in connection with the specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a flow chart of a method of sound source localization provided in one embodiment of the present invention;
FIG. 2 is a flow chart of a method for controlling a camera based on sound source localization according to one embodiment of the present invention;
FIG. 3 is a schematic view of a sound source localization apparatus provided in accordance with one embodiment of the present invention;
FIG. 4 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
As shown in fig. 1, an embodiment of the present invention provides a sound source localization method, including:
Step 101: A sound signal is acquired based on a microphone array.
The microphone array may collect sound signals in the frequency range [100 Hz, 8000 Hz].
Step 102: the sound signal is preprocessed.
The preprocessing may include noise removal, filtering, and the like.
Step 103: based on the preprocessed sound signal, the target sound source type to which the sound signal belongs is identified.
Step 104: among a plurality of localization algorithms corresponding to the sound source type, a target localization algorithm corresponding to the target sound source type is determined.
The sound source type may be, for example, a human voice or a vehicle sound, and different localization algorithms suit different sound source types. For example, if the sound source type is a human voice, the localization algorithm adopted may be the cross-correlation method.
Step 105: Beamforming processing is performed on the preprocessed sound signal.
The beamforming process enhances the sound signal in the direction of interest and suppresses noise and interference from other directions.
Step 106: A sound source positioning result is obtained by applying the target positioning algorithm to the beamformed sound signal.
The positioning result may include position coordinates of the sound source, and may further include angle information, such as an incident angle of the sound source.
Step 107: it is determined whether the loop termination condition is satisfied, if so, the current flow is terminated, otherwise, step 108 is performed.
Step 108: a steering vector is calculated based on the localization result of the sound source.
Step 109: the sound signal is beamformed based on the steering vector, resulting in an enhanced sound signal, and step 102 is performed based on the enhanced sound signal.
According to the embodiment of the invention, a better-suited positioning algorithm is selected based on the sound source type to which the sound signal belongs, which improves the accuracy of the positioning result. The positioning result is then used to enhance the sound signal, yielding a sound signal with a higher signal-to-noise ratio. The enhanced sound signal reduces the influence of noise, which helps classify the sound source type more accurately and in turn yields a more accurate positioning result. In short, the embodiment of the invention performs sound source localization based on sound source identification, refines the sound source identification based on the localization result, and improves the accuracy of sound source localization over multiple cycles.
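The loop formed by steps 102 to 109 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the callables `preprocess`, `classify`, `beamform`, and `enhance`, the `locators` table, and the termination test (a maximum iteration count, or an estimate that moves less than `tol`) are all hypothetical, because the text does not fix the cycle termination condition or which signal is re-beamformed in step 109 (the sketch uses the originally acquired signal).

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Signal = List[List[float]]       # one list of samples per microphone
Position = Tuple[float, float]   # (x, y) coordinates, units assumed to be metres


def localize_iteratively(
    raw: Signal,
    preprocess: Callable[[Signal], Signal],
    classify: Callable[[Signal], str],
    locators: Dict[str, Callable[[Signal], Position]],
    beamform: Callable[[Signal], Signal],
    enhance: Callable[[Signal, Position], Signal],
    max_iters: int = 5,
    tol: float = 1e-3,
) -> Position:
    """Run steps 102-109 until the (assumed) termination condition holds."""
    signal = raw
    previous: Optional[Position] = None
    result: Position = (0.0, 0.0)
    for _ in range(max_iters):
        pre = preprocess(signal)          # step 102
        source_type = classify(pre)       # step 103
        locate = locators[source_type]    # step 104: pick the algorithm by type
        result = locate(beamform(pre))    # steps 105-106
        # Step 107 (assumed condition): stop once the estimate stops moving.
        if previous is not None and math.dist(previous, result) < tol:
            break
        previous = result
        signal = enhance(raw, result)     # steps 108-109: steer toward the estimate
    return result
```

Keeping the per-type algorithms in a dictionary makes step 104 a single lookup, so a new sound source type only needs a new entry.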
Before the method is applied, that is, before the sound signal is acquired based on the microphone array, the parameters of the microphone array and the parameters of the filter used to preprocess the sound signal may be calibrated using a calibration signal. Specifically, the calibration signal is preprocessed; the calibration sound source type to which the calibration signal belongs is identified based on the preprocessed calibration signal; a calibration positioning algorithm corresponding to the calibration sound source type is determined among the plurality of positioning algorithms corresponding to sound source types; beamforming processing is performed on the preprocessed calibration signal; a calibration positioning result is obtained by applying the calibration positioning algorithm to the beamformed calibration signal; the calibration positioning result is compared with the position information of the sound source to which the calibration signal belongs to obtain a positioning error; and the parameters of the microphone array and/or the parameters of the filter are adjusted based on the positioning error. The parameters of the microphone array include the type, layout, and number of microphones, among others.
In one embodiment of the present invention, identifying a target sound source type to which a sound signal belongs based on a preprocessed sound signal includes:
extracting features from the preprocessed sound signal to obtain sound features;
standardizing the sound features;
and inputting the standardized sound features into a trained sound source classification model to obtain the target sound source type to which the sound signal belongs.
Feature standardization may be performed by subtracting the mean of the sound features from each sound feature, so that the updated sound features have a mean of approximately 0, and then dividing each sound feature by the standard deviation of the sound features, so that the sound features have unit variance. Feature standardization eliminates the influence of scale differences and improves the accuracy of sound source identification.
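The standardization described above can be sketched as a per-feature z-score. This is a minimal sketch under two assumptions that the text does not state: the mean and standard deviation are computed per feature column across frames, and a constant column is left at 0 rather than divided by zero. The function name `standardize` is hypothetical.

```python
from statistics import fmean, pstdev
from typing import List


def standardize(features: List[List[float]]) -> List[List[float]]:
    """Z-score each feature column; rows are frames, columns are features."""
    columns = list(zip(*features))
    means = [fmean(col) for col in columns]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by 0.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in features
    ]


# Two frames with two features each; the second feature is constant.
scaled = standardize([[1.0, 10.0], [3.0, 10.0]])
```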
The sound source classification model is obtained in advance by training on sample signals. Specifically, the sample signals are preprocessed; features are extracted from the preprocessed sample signals to obtain the sound features of the sample signals; the sound features of the sample signals are standardized; the standardized sound features are input into a decision tree model; and the parameters of the decision tree model are adjusted based on the output of the decision tree model and the labels of the sample signals. The decision tree model may also be replaced by another classification model, such as a support vector machine model.
The machine learning algorithm can fully learn the relation between the sound characteristics and the sound source types, and can more accurately identify the sound source types to which the sound signals belong.
In one embodiment of the present invention, feature extraction is performed on a preprocessed sound signal to obtain sound features, including:
converting the preprocessed sound signal from a time domain representation to a frequency domain representation based on a short-time fourier transform;
extracting mel-frequency spectrum features from the sound signal represented by the frequency domain;
extracting channel parameters from the preprocessed sound signal based on linear predictive coding;
and extracting the fundamental frequency of the sound signal.
By extracting features of different dimensions from the sound signal, the embodiment of the invention improves the accuracy of sound source identification. In a practical application scenario, the sound features may include any one or more of mel spectral features, channel parameters, and the fundamental frequency; for example, they may include only mel spectral features.
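As an illustration of one of the features listed above, the fundamental frequency of a frame can be estimated from the peak of its autocorrelation. The text does not say how the fundamental frequency is extracted, so the autocorrelation approach, the function name `estimate_f0`, and the search band of 100 Hz to 500 Hz are assumptions made for this sketch.

```python
import math
from typing import List


def estimate_f0(frame: List[float], sample_rate: float,
                f_min: float = 100.0, f_max: float = 500.0) -> float:
    """Return the frequency (Hz) whose lag maximizes the autocorrelation."""
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Usage: a synthetic 200 Hz tone sampled at 8 kHz (period of 40 samples).
tone = [math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(400)]
f0 = estimate_f0(tone, 8000.0)
```

Because the autocorrelation is not normalized, shorter lags are slightly favored, which helps avoid picking a multiple of the true period.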
In one embodiment of the present invention, after obtaining the localization result of the sound source using the target localization algorithm, before determining whether the loop termination condition is satisfied, the method further includes:
calculating an average positioning result based on the positioning results of multiple frames of the sound signal, and then performing the determination of whether the cycle termination condition is satisfied.
In order to reduce the positioning error of the single-frame sound signal, the positioning results of the multi-frame sound signals can be fused to obtain the final positioning result. The fusion mode adopted by the embodiment of the invention is to calculate the average value, specifically, calculate the average position coordinate according to the position coordinates in each positioning result.
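The averaging described above amounts to a coordinate-wise mean over the per-frame results. A minimal sketch, in which the function name `average_position` and the tuple representation of a positioning result are assumptions:

```python
from statistics import fmean
from typing import List, Tuple


def average_position(results: List[Tuple[float, ...]]) -> Tuple[float, ...]:
    """Average the position coordinates of several per-frame results."""
    # zip(*results) groups all x values together, all y values together, and so on.
    return tuple(fmean(axis) for axis in zip(*results))


fused = average_position([(1.0, 2.0), (3.0, 4.0)])
```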
In one embodiment of the invention, the method further comprises: smoothing the sound source positioning result based on a Kalman filter, and then performing the determination of whether the cycle termination condition is satisfied.
In order to further reduce the influence of factors such as noise, the embodiment of the invention carries out smoothing processing on the sound source positioning result based on the Kalman filter, and further improves the accuracy of the positioning result.
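A scalar Kalman filter applied to one coordinate of successive positioning results can be sketched as follows. The state model is an assumption, since the text does not specify one: the source is treated as nearly stationary (a random-walk model), and the variances `process_var` and `measurement_var` are illustrative values rather than tuned parameters.

```python
from typing import List


def kalman_track(measurements: List[float],
                 process_var: float = 1e-3,
                 measurement_var: float = 1e-1) -> List[float]:
    """Filter one coordinate of successive positioning results."""
    estimate, variance = measurements[0], 1.0
    filtered = [estimate]
    for z in measurements[1:]:
        variance += process_var                        # predict: uncertainty grows
        gain = variance / (variance + measurement_var)
        estimate += gain * (z - estimate)              # update: move toward measurement
        variance *= 1.0 - gain                         # uncertainty shrinks after update
        filtered.append(estimate)
    return filtered
```

Each coordinate of the positioning result can be filtered independently with this function; a smaller `process_var` gives a smoother but slower-reacting track.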
In a practical application scenario, the positioning results can also be filtered against a set threshold to eliminate abnormal or unstable positioning points.
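One possible form of such threshold filtering is to discard points that lie too far from the coordinate-wise median of the recent results. The text does not say which quantity is compared with the threshold, so the distance-to-median criterion and the name `reject_outliers` are assumptions.

```python
import math
from statistics import median
from typing import List, Tuple


def reject_outliers(points: List[Tuple[float, ...]],
                    max_dist: float) -> List[Tuple[float, ...]]:
    """Keep only the points within max_dist of the coordinate-wise median."""
    center = tuple(median(axis) for axis in zip(*points))
    return [p for p in points if math.dist(p, center) <= max_dist]


kept = reject_outliers([(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)], 1.0)
```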
In one embodiment of the present invention, calculating a steering vector based on a localization result of a sound source includes:
calculating a steering vector based on formula (1);
(1)
where $a_m(\theta)$ characterizes the steering vector, $j$ the imaginary unit, $f$ the frequency of the sound signal, $d_m$ the distance from the sound source to the $m$-th microphone of the microphone array, $\theta$ the angle of incidence of the sound source, and $c$ the speed of sound; $d_m$ and $\theta$ are determined by the localization result of the sound source.
The steering vector is used to determine the weight vectors in the beamforming formula; for example, the steering vector of each microphone may be taken directly as its weight vector.
The beamforming formula is shown in formula (2).
$$y(t) = \sum_{i} w_i \, x_i(t) \qquad (2)$$
where $y(t)$ characterizes the signal output after the beamforming process, $w_i$ the weight vector of microphone $i$, and $x_i(t)$ the signal received by microphone $i$ at time $t$.
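The relationship between a steering vector and the weighted sum of formula (2) can be illustrated with a narrowband sketch. Everything specific here is an assumption and may differ from formula (1): a far-field plane wave, a linear array described by microphone offsets along its axis (`offsets_m`, used in place of source-to-microphone distances), a phase of 2πf·x·sin(θ)/c, a speed of sound of 343 m/s, and weights taken as the conjugated, normalized steering vector so that the plain weighted sum adds in phase. Whether the conjugate belongs in the weights or in the sum is a convention the text does not settle.

```python
import cmath
import math
from typing import List, Sequence

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees Celsius


def steering_vector(freq_hz: float, offsets_m: Sequence[float],
                    angle_rad: float) -> List[complex]:
    """Far-field phase factor per microphone for a linear array (assumed model)."""
    return [
        cmath.exp(-1j * 2.0 * math.pi * freq_hz * x * math.sin(angle_rad) / SPEED_OF_SOUND)
        for x in offsets_m
    ]


def matched_weights(steering: Sequence[complex]) -> List[complex]:
    """Conjugate and normalize the steering vector to form per-microphone weights."""
    return [a.conjugate() / len(steering) for a in steering]


def beamform(weights: Sequence[complex], snapshot: Sequence[complex]) -> complex:
    """Weighted sum over microphones for one narrowband snapshot."""
    return sum(w * x for w, x in zip(weights, snapshot))


# Usage: a 1 kHz source at 30 degrees seen by four microphones spaced 5 cm apart.
offsets = [0.0, 0.05, 0.10, 0.15]
a = steering_vector(1000.0, offsets, math.radians(30.0))
s = 2.0 + 0.0j
snapshot = [am * s for am in a]
on_target = beamform(matched_weights(a), snapshot)
off_target = beamform(
    matched_weights(steering_vector(1000.0, offsets, math.radians(-30.0))),
    snapshot,
)
```

Steering at the true direction recovers the source amplitude, while steering elsewhere attenuates it, which is the enhancement effect the loop relies on.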
In one embodiment of the invention, the positioning algorithm comprises any one or more of a cross-correlation method, a mutual information method, and a least squares method.
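As an illustration of the cross-correlation method, the time difference of arrival between two microphones can be estimated as the lag that maximizes their cross-correlation. This is a generic textbook sketch rather than the specific procedure of the invention; the function name `tdoa_by_cross_correlation` and the bounded lag search are assumptions.

```python
from typing import Sequence


def tdoa_by_cross_correlation(ref: Sequence[float], delayed: Sequence[float],
                              sample_rate: float, max_lag: int) -> float:
    """Return the delay (seconds) of `delayed` relative to `ref`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlate ref[n] with delayed[n + lag] over the overlapping samples.
        score = sum(
            ref[n] * delayed[n + lag]
            for n in range(len(ref))
            if 0 <= n + lag < len(delayed)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


# Usage: the second channel is the first one delayed by three samples.
mic_a = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
mic_b = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
delay = tdoa_by_cross_correlation(mic_a, mic_b, 8000.0, 5)
```

Delays estimated for several microphone pairs can then be combined geometrically to obtain a direction or position.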
As shown in fig. 2, an embodiment of the present invention provides a method for controlling a camera based on sound source localization, including:
Step 201: The control system obtains the positioning result of the noise source.
Step 202: The control system calculates the azimuth angle and the pitch angle of the noise source relative to the camera according to the positioning result.
Step 203: The control system sends a control instruction to the camera according to the azimuth angle and the pitch angle.
Step 204: The camera turns toward the noise source according to the control instruction and captures an image that includes the noise source.
Step 205: The control system analyzes the image of the noise source, identifies the type of the noise source, and issues an alarm signal when the noise source meets the alarm condition.
If the noise source is identified as a potential safety risk, the control system may trigger an alarm mechanism that notifies the relevant personnel or automatically takes action.
According to the embodiment of the invention, the sound source positioning is combined with the control of the camera, so that potential safety risks can be found in time, and the emergency treatment speed is improved.
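The angle computation of step 202 can be sketched with basic trigonometry. The coordinate convention is an assumption that the text does not specify: both positions are given as (x, y, z) in a shared frame with z pointing up, azimuth is measured in the horizontal plane from the +x axis, and pitch is measured upward from the horizontal plane. The function name `camera_angles` is hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]


def camera_angles(source: Point, camera: Point) -> Tuple[float, float]:
    """Return (azimuth, pitch) in degrees of the source as seen from the camera."""
    dx, dy, dz = (s - c for s, c in zip(source, camera))
    azimuth = math.degrees(math.atan2(dy, dx))
    # Pitch is the elevation above the horizontal plane through the camera.
    pitch = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return azimuth, pitch


azimuth_deg, pitch_deg = camera_angles((1.0, 1.0, math.sqrt(2.0)), (0.0, 0.0, 0.0))
```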
As shown in fig. 3, an embodiment of the present invention provides a sound source positioning apparatus, including:
an acquisition module 301 configured to acquire sound signals based on the microphone array;
an identification module 302 configured to pre-process the sound signal; identifying a target sound source type to which the sound signal belongs based on the preprocessed sound signal; determining a target positioning algorithm corresponding to the target sound source type in a plurality of positioning algorithms corresponding to the sound source type;
a positioning module 303 configured to perform beam forming processing on the preprocessed sound signal; based on the sound signals subjected to the beam forming processing, a target positioning algorithm is adopted to obtain a sound source positioning result; and determining whether a cycle termination condition is met, if so, terminating the current flow, otherwise, calculating a steering vector based on a sound source positioning result, performing beam forming processing on the sound signal based on the steering vector to obtain an enhanced sound signal, and performing preprocessing on the sound signal based on the enhanced sound signal.
In one embodiment of the present invention, the identification module 302 is configured to extract features from the preprocessed sound signal to obtain sound features; standardize the sound features; and input the standardized sound features into a trained sound source classification model to obtain the target sound source type to which the sound signal belongs.
In one embodiment of the present invention, the positioning module 303 is configured to calculate an average positioning result based on the positioning results of multiple frames of the sound signal, and then perform the determination of whether the cycle termination condition is satisfied.
In one embodiment of the present invention, the positioning module 303 is configured to smooth the sound source positioning result based on a Kalman filter, and then perform the determination of whether the cycle termination condition is satisfied.
In one embodiment of the invention, the positioning module 303 is configured to calculate the steering vector based on equation (1).
In one embodiment of the invention, the identification module 302 is configured to convert the preprocessed sound signal from a time domain representation to a frequency domain representation based on a short-time Fourier transform; extract mel-frequency spectrum features from the frequency-domain sound signal; extract channel parameters from the preprocessed sound signal based on linear predictive coding; and extract the fundamental frequency of the sound signal.
An embodiment of the present invention provides an electronic device, including:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of the embodiments described above.
The present invention provides a computer readable medium having stored thereon a computer program which when executed by a processor implements a method as in any of the embodiments described above.
Referring now to FIG. 4, there is illustrated a schematic diagram of a computer system 400 suitable for use in implementing an embodiment of the present invention. The terminal device shown in fig. 4 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 4, the computer system 400 includes a Central Processing Unit (CPU) 401, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. In RAM 403, various programs and data required for the operation of system 400 are also stored. The CPU 401, ROM 402, and RAM 403 are connected to each other by a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 408 including a hard disk or the like; and a communication section 409 including a network interface card such as a LAN card, a modem, or the like. The communication section 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 410 as needed, so that a computer program read from it can be installed into the storage section 408 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 409 and/or installed from the removable medium 411. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 401.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, which may be described, for example, as: a processor comprising a sending module, an obtaining module, a determining module, and a first processing module. In some cases the names of these modules do not limit the modules themselves; for example, the sending module may also be described as "a module that sends a picture acquisition request to a connected server".
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.
Claims (9)
1. A sound source localization method, comprising:
acquiring a sound signal based on a microphone array;
preprocessing the sound signal;
identifying, based on the preprocessed sound signal, a target sound source type to which the sound signal belongs;
determining, from among a plurality of positioning algorithms corresponding to sound source types, a target positioning algorithm corresponding to the target sound source type;
performing beamforming processing on the preprocessed sound signal;
obtaining a sound source positioning result by applying the target positioning algorithm to the beamformed sound signal;
determining whether a cycle termination condition is met; if so, terminating the current flow; otherwise, calculating a steering vector based on the positioning result of the sound source, performing beamforming processing on the sound signal based on the steering vector to obtain an enhanced sound signal, and performing the preprocessing of the sound signal based on the enhanced sound signal;
calculating a steering vector based on the localization result of the sound source, comprising:
calculating the steering vector based on equation 1;
equation 1: [equation not reproduced in the source text]
wherein the symbols of equation 1 respectively characterize: the steering vector; the imaginary unit; the frequency of the sound signal; the distance from the sound source to the m-th microphone of the microphone array; the angle of incidence of the sound source; and the speed of sound; the distance and the angle of incidence being determined by the positioning result of the sound source.
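As an illustration of the steering-vector step recited above, the following is a minimal sketch only. Because equation 1 is not reproduced in the text, the phase-only form a_m = exp(-j·2π·f·r_m/c) used here is an assumed, commonly used form and is not asserted to be equation 1. The linear array geometry, the function names `mic_distances` and `steering_vector`, and the value 343 m/s for the speed of sound are likewise assumptions introduced for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def mic_distances(source_range, incidence_angle, mic_positions):
    """Distance from the source to each microphone of a linear array.

    Assumption: microphones lie on the x-axis at mic_positions (metres) and
    the source sits at polar coordinates (source_range, incidence_angle),
    with the angle in radians measured from the positive x-axis.
    """
    sx = source_range * math.cos(incidence_angle)
    sy = source_range * math.sin(incidence_angle)
    return [math.hypot(sx - x, sy) for x in mic_positions]


def steering_vector(frequency, source_range, incidence_angle, mic_positions,
                    c=SPEED_OF_SOUND):
    """Assumed phase-only steering vector: a_m = exp(-j * 2*pi*f * r_m / c)."""
    distances = mic_distances(source_range, incidence_angle, mic_positions)
    return [cmath.exp(-1j * 2 * math.pi * frequency * r / c) for r in distances]
```

In this sketch the source range and incidence angle would come from the previous positioning result, so each pass of the cycle re-aims the beamformer at the latest estimate.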
2. The method of claim 1, wherein,
identifying, based on the preprocessed sound signal, the target sound source type to which the sound signal belongs comprises:
extracting features from the preprocessed sound signal to obtain sound features;
performing feature standardization on the sound features;
inputting the standardized sound features into a trained sound source classification model to obtain the target sound source type to which the sound signal belongs.
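The feature standardization step of claim 2 could be sketched as below. The claim does not state which standardization is used; z-score scaling with statistics taken from training data is an assumption, and `fit_standardizer` and `standardize` are hypothetical helper names.

```python
import math


def fit_standardizer(training_features):
    """Return per-dimension (means, stds) computed over training feature rows."""
    n = len(training_features)
    dims = len(training_features[0])
    means = [sum(row[d] for row in training_features) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((row[d] - means[d]) ** 2 for row in training_features) / n
        # A constant feature has zero spread; fall back to 1.0 to avoid
        # dividing by zero when standardizing.
        stds.append(math.sqrt(var) or 1.0)
    return means, stds


def standardize(features, means, stds):
    """Z-score one feature vector using previously fitted statistics."""
    return [(x - m) / s for x, m, s in zip(features, means, stds)]
```

The standardized vector would then be passed to whatever trained classifier is in use; applying the same fitted statistics at training and inference time keeps the model inputs on a consistent scale.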
3. The method of claim 1, wherein,
after obtaining the positioning result of the sound source by the target positioning algorithm and before determining whether the cycle termination condition is met, the method further comprises:
calculating an average positioning result based on the positioning results of multiple frames of the sound signal, and then performing the determination of whether the cycle termination condition is met.
4. The method of claim 1, further comprising: smoothing the sound source positioning result using a Kalman filter, and then performing the determination of whether the cycle termination condition is met.
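The Kalman smoothing of claim 4 might look like the following minimal sketch for a single scalar quantity such as an estimated angle. The constant-state model, the noise variances, the initial error variance, and the name `kalman_smooth` are all assumptions; the claims do not specify the filter design.

```python
def kalman_smooth(measurements, process_var=1e-3, measurement_var=0.5):
    """Smooth a sequence of scalar positioning results with a 1-D Kalman filter.

    Assumes the true value is roughly constant between frames, so the
    prediction step only inflates the uncertainty by process_var.
    """
    if not measurements:
        return []
    estimate = measurements[0]
    error_var = 1.0  # assumed initial uncertainty
    smoothed = [estimate]
    for z in measurements[1:]:
        # Predict: state unchanged, uncertainty grows.
        error_var += process_var
        # Update: blend the prediction and the new measurement by the gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        smoothed.append(estimate)
    return smoothed
```

Since the gain always lies strictly between 0 and 1, every smoothed value stays within the range of the measurements seen so far, which damps frame-to-frame jitter in the positioning result.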
5. The method of claim 2, wherein,
extracting features from the preprocessed sound signal to obtain sound features comprises:
converting the preprocessed sound signal from a time-domain representation to a frequency-domain representation based on a short-time Fourier transform;
extracting Mel-spectrum features from the frequency-domain representation of the sound signal;
extracting vocal tract parameters from the preprocessed sound signal based on linear predictive coding;
extracting a fundamental frequency of the sound signal.
6. The method of claim 1, wherein,
the plurality of positioning algorithms comprises any one or more of: a cross-correlation method, a mutual information method, and a least squares method.
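Of the algorithms named in claim 6, the cross-correlation method can be illustrated with a two-microphone time-delay estimate. This is a sketch under stated assumptions, not the patented procedure: it uses a brute-force time-domain correlation, a far-field source, and an angle measured from the array broadside. The names `estimate_delay` and `delay_to_angle` are hypothetical.

```python
import math


def estimate_delay(reference, delayed, max_lag):
    """Return the lag (in samples) that maximizes sum_n reference[n] * delayed[n + lag]."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, x in enumerate(reference):
            k = n + lag
            if 0 <= k < len(delayed):
                score += x * delayed[k]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle(lag, sample_rate, mic_spacing, c=343.0):
    """Convert a sample lag to an arrival angle (radians from broadside).

    Far-field assumption: sin(theta) = c * tau / d, clipped to [-1, 1].
    """
    tau = lag / sample_rate
    s = max(-1.0, min(1.0, c * tau / mic_spacing))
    return math.asin(s)
```

For example, `estimate_delay([0, 0, 1, 2, 1, 0, 0, 0], [0, 0, 0, 0, 1, 2, 1, 0], 3)` returns a lag of 2 samples, which `delay_to_angle` then maps to an angle for a given sample rate and microphone spacing.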
7. A sound source localization apparatus, comprising:
an acquisition module configured to acquire a sound signal based on a microphone array;
an identification module configured to: preprocess the sound signal; identify, based on the preprocessed sound signal, a target sound source type to which the sound signal belongs; and determine, from among a plurality of positioning algorithms corresponding to sound source types, a target positioning algorithm corresponding to the target sound source type;
a positioning module configured to: perform beamforming processing on the preprocessed sound signal; obtain a sound source positioning result by applying the target positioning algorithm to the beamformed sound signal; and determine whether a cycle termination condition is met, and if so, terminate the current flow, and otherwise calculate a steering vector based on the positioning result of the sound source, perform beamforming processing on the sound signal based on the steering vector to obtain an enhanced sound signal, and perform the preprocessing of the sound signal based on the enhanced sound signal;
calculating a steering vector based on the localization result of the sound source, comprising:
calculating the steering vector based on equation 1;
equation 1: [equation not reproduced in the source text]
wherein the symbols of equation 1 respectively characterize: the steering vector; the imaginary unit; the frequency of the sound signal; the distance from the sound source to the m-th microphone of the microphone array; the angle of incidence of the sound source; and the speed of sound; the distance and the angle of incidence being determined by the positioning result of the sound source.
8. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-6.
9. A computer-readable medium storing a computer program which, when executed by a processor, implements the method of any of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311575126.7A CN117289208B (en) | 2023-11-24 | 2023-11-24 | Sound source positioning method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311575126.7A CN117289208B (en) | 2023-11-24 | 2023-11-24 | Sound source positioning method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117289208A CN117289208A (en) | 2023-12-26 |
CN117289208B CN117289208B (en) | 2024-02-20 |
Family
ID=89241038
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311575126.7A Active CN117289208B (en) | 2023-11-24 | 2023-11-24 | Sound source positioning method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117289208B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109920448A (en) * | 2019-02-26 | 2019-06-21 | 江苏大学 | A kind of identifying system and method for automatic driving vehicle traffic environment special type sound |
CN112379330A (en) * | 2020-11-27 | 2021-02-19 | 浙江同善人工智能技术有限公司 | Multi-robot cooperative 3D sound source identification and positioning method |
CN112485761A (en) * | 2021-02-03 | 2021-03-12 | 成都启英泰伦科技有限公司 | Sound source positioning method based on double microphones |
CN114089279A (en) * | 2021-10-15 | 2022-02-25 | 浙江工业大学 | Sound target positioning method based on uniform concentric circle microphone array |
CN114114153A (en) * | 2021-11-23 | 2022-03-01 | 哈尔滨工业大学(深圳) | Multi-sound-source positioning method and system, microphone array and terminal device |
CN114325214A (en) * | 2021-11-18 | 2022-04-12 | 国网辽宁省电力有限公司电力科学研究院 | Electric power online monitoring method based on microphone array sound source positioning technology |
CN116106826A (en) * | 2022-12-16 | 2023-05-12 | 北京奕斯伟计算技术股份有限公司 | Sound source positioning method, related device and medium |
CN116381717A (en) * | 2023-04-25 | 2023-07-04 | 广东石油化工学院 | Unmanned aerial vehicle positioning device and positioning method based on Kalman filtering algorithm |
CN116705056A (en) * | 2023-07-25 | 2023-09-05 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio generation method, vocoder, electronic device and storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6501260B2 (en) * | 2015-08-20 | 2019-04-17 | 本田技研工業株式会社 | Sound processing apparatus and sound processing method |
JP7266433B2 (en) * | 2019-03-15 | 2023-04-28 | 本田技研工業株式会社 | Sound source localization device, sound source localization method, and program |
Also Published As
Publication number | Publication date |
---|---|
CN117289208A (en) | 2023-12-26 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11398235B2 (en) | Methods, apparatuses, systems, devices, and computer-readable storage media for processing speech signals based on horizontal and pitch angles and distance of a sound source relative to a microphone array | |
CN107799126B (en) | Voice endpoint detection method and device based on supervised machine learning | |
EP2530484B1 (en) | Sound source localization apparatus and method | |
RU2642353C2 (en) | Device and method for providing informed probability estimation and multichannel speech presence | |
CN108701469B (en) | Cough sound recognition method, device, and storage medium | |
US11941968B2 (en) | Systems and methods for identifying an acoustic source based on observed sound | |
CN111048104B (en) | Speech enhancement processing method, device and storage medium | |
CN106872945B (en) | Sound source positioning method and device and electronic equipment | |
US10951982B2 (en) | Signal processing apparatus, signal processing method, and computer program product | |
JP2017102085A (en) | Information processing apparatus, information processing method, and program | |
CN110709929B (en) | Processing sound data to separate sound sources in a multi-channel signal | |
US20190281386A1 (en) | Apparatus and a method for unwrapping phase differences | |
JP5708294B2 (en) | Signal detection apparatus, signal detection method, and signal detection program | |
CN112992190A (en) | Audio signal processing method and device, electronic equipment and storage medium | |
AU2013204156A1 (en) | Classification apparatus and program | |
US20180188104A1 (en) | Signal detection device, signal detection method, and recording medium | |
CN116421163A (en) | Vital sign detection method and device | |
CN117289208B (en) | Sound source positioning method and device | |
CN103890843B (en) | Signal noise attenuation | |
CN113093106A (en) | Sound source positioning method and system | |
CN110890099A (en) | Sound signal processing method, device and storage medium | |
KR101671305B1 (en) | Apparatus for extracting feature parameter of input signal and apparatus for recognizing speaker using the same | |
CN117169812A (en) | Sound source positioning method based on deep learning and beam forming | |
KR101711302B1 (en) | Discriminative Weight Training for Dual-Microphone based Voice Activity Detection and Method thereof | |
KR20180068467A (en) | Speech recognition method and apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |