CN117289208B - Sound source positioning method and device - Google Patents

Sound source positioning method and device

Info

Publication number
CN117289208B
Authority
CN
China
Prior art keywords
sound
sound source
sound signal
positioning
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311575126.7A
Other languages
Chinese (zh)
Other versions
CN117289208A (en
Inventor
徐燕平
李娅娆
王邵惇
刘宇杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rstech Ltd
Original Assignee
Rstech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rstech Ltd
Priority to CN202311575126.7A
Publication of CN117289208A
Application granted
Publication of CN117289208B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

The invention discloses a sound source positioning method and device, and relates to the technical field of signal processing. One embodiment of the method comprises: acquiring a sound signal by means of a microphone array; preprocessing the sound signal; identifying, based on the preprocessed sound signal, a target sound source type to which the sound signal belongs; determining, among a plurality of positioning algorithms corresponding to sound source types, a target positioning algorithm corresponding to the target sound source type; performing beamforming on the preprocessed sound signal; obtaining a sound source positioning result by applying the target positioning algorithm to the beamformed sound signal; and determining whether a loop termination condition is met; if so, terminating the current flow; otherwise, calculating a steering vector based on the sound source positioning result, performing beamforming on the sound signal based on the steering vector to obtain an enhanced sound signal, and performing the preprocessing again on the enhanced sound signal. This embodiment can improve sound source positioning accuracy.

Description

Sound source positioning method and device
Technical Field
The present invention relates to the field of signal processing technologies, and in particular, to a method and an apparatus for positioning a sound source.
Background
Sound source positioning is widely used in fields such as security monitoring and smart homes, so positioning a sound source accurately is a problem that needs to be solved.
In the prior art, a microphone array collects sound signals, and the signals are processed by a positioning algorithm such as the least squares method to obtain the position of the sound source.
However, this approach has low positioning accuracy.
Disclosure of Invention
In view of the above, the embodiments of the present invention provide a sound source positioning method and apparatus, which can improve sound source positioning accuracy.
In a first aspect, an embodiment of the present invention provides a sound source positioning method, including:
acquiring a sound signal by means of a microphone array;
preprocessing the sound signal;
identifying, based on the preprocessed sound signal, a target sound source type to which the sound signal belongs;
determining, among a plurality of positioning algorithms corresponding to sound source types, a target positioning algorithm corresponding to the target sound source type;
performing beamforming on the preprocessed sound signal;
obtaining a sound source positioning result by applying the target positioning algorithm to the beamformed sound signal;
and determining whether a loop termination condition is met; if so, terminating the current flow; otherwise, calculating a steering vector based on the sound source positioning result, performing beamforming on the sound signal based on the steering vector to obtain an enhanced sound signal, and performing the preprocessing again on the enhanced sound signal.
In a second aspect, an embodiment of the present invention provides a sound source positioning apparatus, including:
an acquisition module configured to acquire a sound signal by means of a microphone array;
an identification module configured to: preprocess the sound signal; identify, based on the preprocessed sound signal, a target sound source type to which the sound signal belongs; and determine, among a plurality of positioning algorithms corresponding to sound source types, a target positioning algorithm corresponding to the target sound source type;
a positioning module configured to: perform beamforming on the preprocessed sound signal; obtain a sound source positioning result by applying the target positioning algorithm to the beamformed sound signal; and determine whether a loop termination condition is met; if so, terminate the current flow; otherwise, calculate a steering vector based on the sound source positioning result, perform beamforming on the sound signal based on the steering vector to obtain an enhanced sound signal, and perform the preprocessing again on the enhanced sound signal.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the embodiments above.
In a fourth aspect, embodiments of the present invention provide a computer readable medium having stored thereon a computer program which, when executed by a processor, implements a method as in any of the embodiments described above.
One embodiment of the above invention has the following advantages or benefits: a positioning algorithm better suited to the sound source type of the sound signal is selected, which improves the accuracy of the positioning result. The positioning result is then used to enhance the sound signal, yielding a sound signal with a higher signal-to-noise ratio. The enhanced sound signal is less affected by noise, which helps classify the sound source type more accurately and, in turn, yields a more accurate positioning result. In short, the embodiment of the invention performs sound source positioning based on sound source identification, refines the sound source identification based on the positioning result, and improves positioning accuracy over multiple loop iterations.
Further effects of the above optional implementations are described below in connection with the specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a flow chart of a method of sound source localization provided in one embodiment of the present invention;
FIG. 2 is a flow chart of a method for controlling a camera based on sound source localization according to one embodiment of the present invention;
FIG. 3 is a schematic view of a sound source localization apparatus provided in accordance with one embodiment of the present invention;
FIG. 4 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
As shown in fig. 1, an embodiment of the present invention provides a sound source localization method, including:
Step 101: A sound signal is acquired by means of a microphone array.
The microphone array may collect sound signals in the frequency range [100 Hz, 8000 Hz].
Step 102: the sound signal is preprocessed.
The preprocessing may include noise removal, filtering, and the like.
Step 103: based on the preprocessed sound signal, the target sound source type to which the sound signal belongs is identified.
Step 104: among a plurality of localization algorithms corresponding to the sound source type, a target localization algorithm corresponding to the target sound source type is determined.
The sound source type may be, for example, a human voice or a vehicle sound. Different positioning algorithms suit different sound source types; for example, if the sound source type is a human voice, the positioning algorithm used may be the cross-correlation method.
Step 105: Beamforming is performed on the preprocessed sound signal.
Beamforming enhances the sound signal arriving from the direction of interest and suppresses noise and interference from other directions.
Step 106: A sound source positioning result is obtained by applying the target positioning algorithm to the beamformed sound signal.
The positioning result may include position coordinates of the sound source, and may further include angle information, such as an incident angle of the sound source.
Step 107: It is determined whether the loop termination condition is satisfied; if so, the current flow is terminated; otherwise, step 108 is performed.
Step 108: a steering vector is calculated based on the localization result of the sound source.
Step 109: the sound signal is beamformed based on the steering vector, resulting in an enhanced sound signal, and step 102 is performed based on the enhanced sound signal.
According to the embodiment of the invention, a positioning algorithm better suited to the sound source type of the sound signal is selected, which improves the accuracy of the positioning result. The positioning result is then used to enhance the sound signal, yielding a sound signal with a higher signal-to-noise ratio. The enhanced sound signal is less affected by noise, which helps classify the sound source type more accurately and, in turn, yields a more accurate positioning result. In short, the embodiment of the invention performs sound source positioning based on sound source identification, refines the sound source identification based on the positioning result, and improves positioning accuracy over multiple loop iterations.
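The loop formed by steps 102 to 109 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the callables `preprocess`, `classify`, `pick_algorithm`, `beamform`, and `enhance` are hypothetical stand-ins for the individual steps, and the termination condition (a maximum iteration count, or a position change smaller than `tol`) is assumed, since the text does not specify one.

```python
import numpy as np

def localize_iteratively(signals, preprocess, classify, pick_algorithm,
                         beamform, enhance, max_iters=5, tol=1e-3):
    """Iterate steps 102-109 until the position estimate stops changing.

    All callables are hypothetical placeholders supplied by the caller.
    Returns (position, number_of_iterations_run).
    """
    current = signals
    previous = None
    for iteration in range(1, max_iters + 1):
        clean = preprocess(current)                # step 102
        source_type = classify(clean)              # step 103
        algorithm = pick_algorithm(source_type)    # step 104
        focused = beamform(clean)                  # step 105
        position = np.asarray(algorithm(focused), dtype=float)  # step 106
        # Step 107 (assumed condition): stop once the estimate has converged.
        if previous is not None and np.linalg.norm(position - previous) < tol:
            break
        previous = position
        # Steps 108-109: steering vector and enhancement, hidden in `enhance`.
        current = enhance(signals, position)
    return position, iteration

# Usage with trivial stand-ins that always report the same position.
position, iterations = localize_iteratively(
    signals=np.zeros((4, 16)),
    preprocess=lambda x: x,
    classify=lambda x: "human_voice",
    pick_algorithm=lambda source_type: (lambda x: [1.0, 2.0]),
    beamform=lambda x: x,
    enhance=lambda raw, pos: raw,
)
```

With these stand-ins the second pass reproduces the first estimate, so the assumed convergence test ends the loop after two iterations.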
Before the method is applied, that is, before sound signals are acquired by the microphone array, the parameters of the microphone array and the parameters of the filter used to preprocess the sound signal may be calibrated using a calibration signal. Specifically: the calibration signal is preprocessed; the calibration sound source type to which the calibration signal belongs is identified based on the preprocessed calibration signal; among the plurality of positioning algorithms corresponding to sound source types, a calibration positioning algorithm corresponding to the calibration sound source type is determined; beamforming is performed on the preprocessed calibration signal; a calibration positioning result is obtained by applying the calibration positioning algorithm to the beamformed calibration signal; the calibration positioning result is compared with the position information of the sound source that emitted the calibration signal to obtain a positioning error; and the parameters of the microphone array and/or the parameters of the filter are adjusted based on the positioning error. The parameters of the microphone array include the type, layout, and number of microphones, among others.
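As an illustration of the kind of preprocessing filter whose parameters could be calibrated, the sketch below applies a simple FFT-based band-pass. The filter design is an assumption: the text only says that preprocessing may include noise removal and filtering. The function name `bandpass_fft` and the cut-off parameters `low_hz` and `high_hz` are hypothetical, with defaults chosen to match the [100 Hz, 8000 Hz] acquisition range mentioned for the microphone array.

```python
import numpy as np

def bandpass_fft(signal, sample_rate, low_hz=100.0, high_hz=8000.0):
    """Zero every FFT bin outside [low_hz, high_hz] and return the filtered signal."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    keep = (freqs >= low_hz) & (freqs <= high_hz)
    return np.fft.irfft(spectrum * keep, n=len(signal))

# Usage: a 1 kHz tone contaminated by 20 Hz hum, one second at 32 kHz.
sample_rate = 32000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
hum = 0.5 * np.sin(2 * np.pi * 20 * t)
cleaned = bandpass_fft(tone + hum, sample_rate)
```

In a calibration pass, `low_hz` and `high_hz` would be among the filter parameters adjusted in response to the measured positioning error.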
In one embodiment of the present invention, identifying a target sound source type to which a sound signal belongs based on a preprocessed sound signal includes:
extracting the characteristics of the preprocessed sound signals to obtain sound characteristics;
carrying out feature standardization on sound features;
and inputting the sound characteristics subjected to the characteristic standardization into a trained sound source classification model to obtain the target sound source type to which the sound signal belongs.
Feature standardization may subtract the mean of the sound features from each obtained sound feature, so that the mean of the updated sound features is approximately 0, and may divide each sound feature by the standard deviation of the sound features, so that the sound features are normalized to unit variance. Feature standardization removes the influence of scale differences between features and improves the accuracy of sound source identification.
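The standardization just described is the usual z-score transform. A minimal sketch follows; the function name `standardize` and the small `eps` guard against zero variance are assumptions added for illustration.

```python
import numpy as np

def standardize(features, eps=1e-12):
    """Shift each feature column to zero mean and scale it to unit variance."""
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    # eps avoids division by zero for a constant feature column.
    return (features - mean) / np.maximum(std, eps)

# Usage: two features on very different scales.
raw = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
scaled = standardize(raw)
```

When a trained classifier is used, the mean and standard deviation would normally be computed once on the training samples and reused at inference time; that choice is not stated in the text and is mentioned here only as common practice.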
The sound source classification model is obtained in advance by training on sample signals. Specifically, each sample signal is preprocessed; features are extracted from the preprocessed sample signal to obtain its sound features; the sound features are standardized; the standardized sound features are input into a decision tree model; and the parameters of the decision tree model are adjusted based on the output of the decision tree model and the label of the sample signal. The decision tree model may also be replaced by another classification model, such as a support vector machine.
The machine learning algorithm can fully learn the relation between the sound characteristics and the sound source types, and can more accurately identify the sound source types to which the sound signals belong.
In one embodiment of the present invention, feature extraction is performed on a preprocessed sound signal to obtain sound features, including:
converting the preprocessed sound signal from a time domain representation to a frequency domain representation based on a short-time fourier transform;
extracting mel-frequency spectrum features from the sound signal represented by the frequency domain;
extracting vocal tract parameters from the preprocessed sound signal based on linear predictive coding;
the fundamental frequency of the sound signal is extracted.
By extracting features of different dimensions from the sound signal, the embodiment of the invention improves the accuracy of sound source identification. In a practical application scenario, the sound features may include any one or more of the mel spectral features, the vocal tract parameters, and the fundamental frequency; for example, they may include only the mel spectral features.
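Two of the extraction steps listed above can be sketched with NumPy alone: a short-time Fourier transform and a fundamental-frequency estimate. The frame length, hop size, Hann window, pitch search range, and the autocorrelation-based pitch estimator are all assumptions chosen for illustration; the text does not say how the fundamental frequency is obtained. The mel filter bank and the linear predictive coding step are omitted here.

```python
import numpy as np

def stft_magnitude(signal, frame_len=512, hop=256):
    """Magnitude spectrogram with shape (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def fundamental_frequency(frame, sample_rate, fmin=60.0, fmax=500.0):
    """Estimate pitch as the autocorrelation peak inside [fmin, fmax]."""
    frame = frame - frame.mean()
    # Keep non-negative lags only: index k holds the autocorrelation at lag k.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / fmax)
    lag_max = int(sample_rate / fmin)
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / lag

# Usage: a 200 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(1600) / sample_rate
tone = np.sin(2 * np.pi * 200 * t)
spectrogram = stft_magnitude(tone)
f0 = fundamental_frequency(tone, sample_rate)
```

The spectrogram rows would feed a mel filter bank in a fuller pipeline, and `f0` could be appended to the feature vector before standardization.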
In one embodiment of the present invention, after obtaining the localization result of the sound source using the target localization algorithm, before determining whether the loop termination condition is satisfied, the method further includes:
calculating an average positioning result based on the positioning results of multiple frames of the sound signal, and then performing the determination of whether the loop termination condition is satisfied.
In order to reduce the positioning error of the single-frame sound signal, the positioning results of the multi-frame sound signals can be fused to obtain the final positioning result. The fusion mode adopted by the embodiment of the invention is to calculate the average value, specifically, calculate the average position coordinate according to the position coordinates in each positioning result.
In one embodiment of the invention, the method further comprises: smoothing the sound source positioning result with a Kalman filter, and then performing the determination of whether the loop termination condition is satisfied.
In order to further reduce the influence of factors such as noise, the embodiment of the invention carries out smoothing processing on the sound source positioning result based on the Kalman filter, and further improves the accuracy of the positioning result.
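A minimal sketch of such smoothing is given below. It assumes a random-walk (nearly stationary source) state model applied independently to each coordinate; the text does not describe the state model, and the noise variances `process_var` and `measurement_var` are hypothetical tuning values.

```python
import numpy as np

def kalman_smooth(measurements, process_var=1e-3, measurement_var=1e-1):
    """Random-walk Kalman filter run independently on each coordinate."""
    z = np.asarray(measurements, dtype=float)
    estimate = z[0].copy()
    variance = np.full(z.shape[1], measurement_var)
    smoothed = [estimate.copy()]
    for measurement in z[1:]:
        variance = variance + process_var                      # predict
        gain = variance / (variance + measurement_var)         # Kalman gain
        estimate = estimate + gain * (measurement - estimate)  # update
        variance = (1.0 - gain) * variance
        smoothed.append(estimate.copy())
    return np.array(smoothed)

# Usage: noisy per-frame estimates of a source fixed at (1, 2).
rng = np.random.default_rng(0)
truth = np.array([1.0, 2.0])
noisy = truth + 0.3 * rng.standard_normal((200, 2))
smooth = kalman_smooth(noisy)
```

A moving source would call for a state that also carries velocity; the random-walk model is used here only because it keeps the example short.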
In practical application scenarios, the positioning results can also be filtered against a set threshold to eliminate abnormal or unstable positioning points.
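One way to combine threshold filtering with the multi-frame averaging described earlier is sketched below. The rejection criterion (distance from the coordinate-wise median greater than `max_distance`) is an assumption; the text only states that a set threshold is used.

```python
import numpy as np

def reject_outliers(positions, max_distance):
    """Keep only the positions within max_distance of the coordinate-wise median."""
    positions = np.asarray(positions, dtype=float)
    center = np.median(positions, axis=0)
    distances = np.linalg.norm(positions - center, axis=1)
    return positions[distances <= max_distance]

# Usage: three consistent per-frame estimates and one abnormal point.
frame_positions = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [10.0, 10.0]]
kept = reject_outliers(frame_positions, max_distance=1.0)
average_position = kept.mean(axis=0)  # multi-frame average of the kept points
```

The median is used as the reference point because a single abnormal estimate shifts it far less than it would shift the mean.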
In one embodiment of the present invention, calculating a steering vector based on a localization result of a sound source includes:
calculating the steering vector based on formula (1);
(1) [formula image not preserved in this text]
wherein the symbols of formula (1) characterize, respectively: the steering vector; the imaginary unit; the frequency of the sound signal; the distance from the sound source to the m-th microphone of the microphone array; the angle of incidence of the sound source; and the speed of sound. The distance and the angle of incidence are determined by the sound source positioning result.
Steering vectors are used to determine weight vectors in the beamforming formula, e.g., steering vectors of microphones are directly taken as weight vectors.
The beamforming formula is shown in formula (2).
(2) [formula image not preserved in this text]
wherein the symbols of formula (2) characterize, respectively: the signal output after the beamforming process; the weight vector of microphone i; and the signal received by microphone i at time t.
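Because the formula images are not available here, the sketch below uses assumed forms rather than the patent's own formulas (1) and (2). It assumes a steering vector whose m-th element is exp(-j·2π·f·d_m·cos(θ)/c) and a beamformer output equal to the sum over microphones of the conjugated weight times the received signal, with the steering vector used directly as the weight vector as the text suggests. The function names and the 343 m/s speed of sound are likewise assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def steering_vector(frequency_hz, distances_m, incidence_rad, speed=SPEED_OF_SOUND):
    """Assumed form: a_m = exp(-j * 2*pi * f * d_m * cos(theta) / c)."""
    distances_m = np.asarray(distances_m, dtype=float)
    phase = 2.0 * np.pi * frequency_hz * distances_m * np.cos(incidence_rad) / speed
    return np.exp(-1j * phase)

def beamform(weights, mic_signals):
    """Assumed form: y(t) = sum_i conj(w_i) * x_i(t).

    mic_signals has shape (n_mics, n_samples).
    """
    return np.conj(weights) @ mic_signals

# Usage: a 1 kHz narrowband source seen by four microphones.
frequency = 1000.0
t = np.arange(256) / 16000.0
source = np.exp(1j * 2 * np.pi * frequency * t)
a = steering_vector(frequency, [2.00, 2.05, 2.10, 2.15], np.pi / 3)
received = a[:, None] * source[None, :]   # each channel is a phase-shifted copy
output = beamform(a, received)            # channels add coherently
```

When the weights match the true steering vector, the per-channel phase shifts cancel and the four channels sum in phase, which is the enhancement effect that steps 108 and 109 rely on.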
In one embodiment of the invention, the positioning algorithm comprises any one or more of the cross-correlation method, the mutual information method, and the least squares method.
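Of the listed algorithms, the cross-correlation method is the simplest to illustrate. The sketch below estimates the delay between two microphone channels and converts it to an arrival angle. The two-microphone far-field geometry, the function names, and the 343 m/s speed of sound are assumptions made for the example; the text does not describe how the patent applies the method.

```python
import numpy as np

def estimate_delay_samples(sig_a, sig_b):
    """Lag (in samples) by which sig_b trails sig_a, from the cross-correlation peak."""
    corr = np.correlate(sig_b, sig_a, mode="full")
    # In 'full' mode, index len(sig_a) - 1 corresponds to zero lag.
    return int(np.argmax(corr)) - (len(sig_a) - 1)

def delay_to_angle(delay_samples, sample_rate, mic_spacing_m, speed=343.0):
    """Far-field arrival angle in radians from broadside for one microphone pair."""
    sin_theta = delay_samples / sample_rate * speed / mic_spacing_m
    return float(np.arcsin(np.clip(sin_theta, -1.0, 1.0)))

# Usage: the second channel is the first one delayed by 5 samples.
rng = np.random.default_rng(0)
reference = rng.standard_normal(256)
delayed = np.concatenate([np.zeros(5), reference[:-5]])
lag = estimate_delay_samples(reference, delayed)
angle = delay_to_angle(lag, sample_rate=48000, mic_spacing_m=0.1)
```

With more than two microphones, the pairwise delays could be combined into a position estimate, for instance through a least squares fit, which is another of the listed algorithms.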
As shown in fig. 2, an embodiment of the present invention provides a method for controlling a camera based on sound source localization, including:
Step 201: The control system obtains the positioning result of the noise source.
Step 202: The control system calculates the azimuth angle and the pitch angle of the noise source relative to the camera according to the positioning result.
Step 203: The control system sends a control instruction to the camera according to the azimuth angle and the pitch angle.
Step 204: The camera turns toward the noise source according to the control instruction and captures an image that includes the noise source.
Step 205: The control system analyzes the image of the noise source, identifies the type of the noise source, and issues an alarm signal when the noise source meets the alarm condition.
If the noise source is identified as a potential safety risk, the control system may trigger an alarm mechanism that notifies the relevant personnel or automatically takes action.
According to the embodiment of the invention, the sound source positioning is combined with the control of the camera, so that potential safety risks can be found in time, and the emergency treatment speed is improved.
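The angle computation of step 202 can be sketched as follows. The coordinate convention is an assumption: positions are Cartesian (x, y, z) in metres with z pointing up, azimuth is measured in the horizontal plane from the x axis, and pitch is the elevation above that plane. The function name is hypothetical.

```python
import math

def azimuth_and_pitch(source_xyz, camera_xyz):
    """Return (azimuth_deg, pitch_deg) of the source as seen from the camera."""
    dx, dy, dz = (s - c for s, c in zip(source_xyz, camera_xyz))
    azimuth = math.degrees(math.atan2(dy, dx))
    pitch = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return azimuth, pitch

# Usage: a source diagonally ahead of and above a camera at the origin.
azimuth_deg, pitch_deg = azimuth_and_pitch((1.0, 1.0, math.sqrt(2.0)), (0.0, 0.0, 0.0))
```

The two angles would then be packed into the control instruction of step 203, in whatever format the camera's pan and tilt interface expects.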
As shown in fig. 3, an embodiment of the present invention provides a sound source positioning apparatus, including:
an acquisition module 301 configured to acquire sound signals based on the microphone array;
an identification module 302 configured to pre-process the sound signal; identifying a target sound source type to which the sound signal belongs based on the preprocessed sound signal; determining a target positioning algorithm corresponding to the target sound source type in a plurality of positioning algorithms corresponding to the sound source type;
a positioning module 303 configured to perform beam forming processing on the preprocessed sound signal; based on the sound signals subjected to the beam forming processing, a target positioning algorithm is adopted to obtain a sound source positioning result; and determining whether a cycle termination condition is met, if so, terminating the current flow, otherwise, calculating a steering vector based on a sound source positioning result, performing beam forming processing on the sound signal based on the steering vector to obtain an enhanced sound signal, and performing preprocessing on the sound signal based on the enhanced sound signal.
In one embodiment of the present invention, the recognition module 302 is configured to perform feature extraction on the preprocessed sound signal to obtain sound features; carrying out feature standardization on sound features; and inputting the sound characteristics subjected to the characteristic standardization into a trained sound source classification model to obtain the target sound source type to which the sound signal belongs.
In one embodiment of the present invention, the positioning module 303 is configured to calculate an average positioning result based on the positioning results of multiple frames of the sound signal, and then to perform the determination of whether the loop termination condition is satisfied.
In one embodiment of the present invention, the positioning module 303 is configured to smooth the sound source positioning result with a Kalman filter, and then to perform the determination of whether the loop termination condition is satisfied.
In one embodiment of the invention, the positioning module 303 is configured to calculate the steering vector based on equation (1).
In one embodiment of the invention, the recognition module 302 is configured to: convert the preprocessed sound signal from a time domain representation to a frequency domain representation based on a short-time Fourier transform; extract mel spectral features from the sound signal in the frequency domain representation; extract vocal tract parameters from the preprocessed sound signal based on linear predictive coding; and extract the fundamental frequency of the sound signal.
An embodiment of the present invention provides an electronic device, including:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of the embodiments described above.
The present invention provides a computer readable medium having stored thereon a computer program which when executed by a processor implements a method as in any of the embodiments described above.
Referring now to FIG. 4, there is illustrated a schematic diagram of a computer system 400 suitable for use in implementing an embodiment of the present invention. The terminal device shown in fig. 4 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 4, the computer system 400 includes a Central Processing Unit (CPU) 401, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. In RAM 403, various programs and data required for the operation of system 400 are also stored. The CPU 401, ROM 402, and RAM 403 are connected to each other by a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output portion 407 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker, and the like; a storage section 408 including a hard disk or the like; and a communication section 409 including a network interface card such as a LAN card, a modem, or the like. The communication section 409 performs communication processing via a network such as the internet. The drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed on the drive 410 as needed, so that a computer program read therefrom is installed into the storage section 408 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 409 and/or installed from the removable medium 411. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 401.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, which may be described, for example, as: a processor comprising a sending module, an obtaining module, a determining module, and a first processing module. In some cases the names of these modules do not limit the modules themselves; for example, the sending module may also be described as "a module that sends a picture acquisition request to a connected server".
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (9)

1. A sound source localization method, comprising:
acquiring a sound signal by means of a microphone array;
preprocessing the sound signal;
identifying, based on the preprocessed sound signal, a target sound source type to which the sound signal belongs;
determining, from among a plurality of positioning algorithms corresponding to sound source types, a target positioning algorithm corresponding to the target sound source type;
performing beam forming processing on the preprocessed sound signal;
obtaining a positioning result of the sound source by applying the target positioning algorithm to the sound signal subjected to the beam forming processing;
determining whether a cycle termination condition is met; if so, terminating the current flow; otherwise, calculating a steering vector based on the positioning result of the sound source, performing beam forming processing on the sound signal based on the steering vector to obtain an enhanced sound signal, and executing the preprocessing of the sound signal based on the enhanced sound signal;
wherein calculating the steering vector based on the positioning result of the sound source comprises:
calculating the steering vector based on equation 1;
equation 1: [formula image not reproduced in this text]
wherein the symbols of equation 1 characterize, respectively: the steering vector; the imaginary unit; the frequency of the sound signal; the distance from the sound source to the m-th microphone of the microphone array; the angle of incidence of the sound source; and the speed of sound; the distance and the angle of incidence are determined from the positioning result of the sound source.
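Equation 1 itself is not reproduced in this text, so the following is only a sketch of one common steering vector form that uses the quantities listed in the claim (imaginary unit, frequency, per-microphone distance, angle of incidence, speed of sound). The exponent's sign, the use of the cosine, the value of the speed of sound, and the names `steering_vector` and `delay_and_sum` are assumptions made for illustration, not the patent's formula.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C


def steering_vector(freq_hz, distances_m, angle_rad, c=SPEED_OF_SOUND):
    """Return one unit-magnitude complex weight per microphone.

    Assumed form (not taken from the patent's equation 1):
        a_m = exp(-j * 2*pi*f * d_m * cos(theta) / c)
    """
    return [
        cmath.exp(-1j * 2.0 * math.pi * freq_hz * d * math.cos(angle_rad) / c)
        for d in distances_m
    ]


def delay_and_sum(bin_values, weights):
    """Combine one frequency bin across microphones using conjugate weights."""
    return sum(w.conjugate() * x for w, x in zip(weights, bin_values)) / len(weights)
```

In the loop of claim 1, weights of this kind would be recomputed from each new positioning result and applied to the microphone signals to obtain the enhanced sound signal for the next pass.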
2. The method of claim 1, wherein
identifying, based on the preprocessed sound signal, the target sound source type to which the sound signal belongs comprises:
performing feature extraction on the preprocessed sound signal to obtain sound features;
performing feature standardization on the sound features;
and inputting the standardized sound features into a trained sound source classification model to obtain the target sound source type to which the sound signal belongs.
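The chain from features to a target positioning algorithm can be sketched as follows. This is a minimal illustration under stated assumptions: the nearest-centroid rule merely stands in for the trained sound source classification model, and the type labels, feature values, training statistics, and the mapping `ALGORITHM_FOR_TYPE` are all hypothetical.

```python
import math


def standardize(features, means, stds):
    """Z-score each feature with statistics assumed to come from training data."""
    return [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(features, means, stds)]


def classify(features, centroids):
    """Nearest-centroid stand-in for a trained sound source classification model."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


# Hypothetical mapping from sound source type to positioning algorithm.
ALGORITHM_FOR_TYPE = {
    "speech": "cross_correlation",
    "machinery": "least_squares",
}

# Hypothetical training statistics and class centroids (standardized space).
MEANS = [200.0, 0.5]
STDS = [50.0, 0.1]
CENTROIDS = {"speech": [-1.0, 1.0], "machinery": [1.0, -1.0]}

raw = [150.0, 0.62]                 # two illustrative feature values
z = standardize(raw, MEANS, STDS)   # standardized features
source_type = classify(z, CENTROIDS)
algorithm = ALGORITHM_FOR_TYPE[source_type]
```

Keeping the type-to-algorithm mapping in a plain lookup table makes the selection step of claim 1 explicit and easy to extend with further source types.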
3. The method of claim 1, wherein
after the positioning result of the sound source is obtained using the target positioning algorithm and before it is determined whether the cycle termination condition is met, the method further comprises:
calculating an average positioning result based on the positioning results of multiple frames of the sound signal, and then executing the determination of whether the cycle termination condition is met.
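As an illustration of averaging over frames, the sketch below assumes that each per-frame positioning result is an angle in radians and uses a circular mean so that estimates on either side of the 0/2π wrap-around average correctly. The claim does not specify the form of the result or the averaging rule; the function name `average_angle` is hypothetical.

```python
import math


def average_angle(angles_rad):
    """Circular mean of per-frame angle estimates (radians)."""
    s = sum(math.sin(a) for a in angles_rad)
    c = sum(math.cos(a) for a in angles_rad)
    return math.atan2(s, c)
```

A plain arithmetic mean would place the average of 350 degrees and 10 degrees at 180 degrees, whereas the circular mean gives 0 degrees.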
4. The method of claim 1, further comprising: smoothing the positioning result of the sound source based on a Kalman filter, and then executing the determination of whether the cycle termination condition is met.
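A minimal sketch of Kalman smoothing follows, assuming a scalar positioning result (for example an angle in degrees) and a random-walk state model. The noise variances, the initial error variance, and the name `kalman_smooth` are assumed tuning choices for illustration; the claim does not specify them.

```python
def kalman_smooth(measurements, process_var=1e-3, meas_var=0.05):
    """Scalar Kalman filter with a random-walk state model.

    process_var and meas_var are assumed tuning values, not from the patent.
    """
    estimate = measurements[0]
    error_var = 1.0
    smoothed = [estimate]
    for z in measurements[1:]:
        # Predict: the state is assumed unchanged, so only uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = error_var / (error_var + meas_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        smoothed.append(estimate)
    return smoothed
```

As more frames arrive the gain shrinks, so single-frame outliers move the estimate less than they would early in the sequence.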
5. The method of claim 2, wherein
performing feature extraction on the preprocessed sound signal to obtain sound features comprises:
converting the preprocessed sound signal from a time domain representation to a frequency domain representation based on a short-time Fourier transform;
extracting mel-frequency spectrum features from the frequency domain representation of the sound signal;
extracting vocal tract parameters from the preprocessed sound signal based on linear predictive coding;
extracting a fundamental frequency of the sound signal.
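Two of these feature steps can be sketched with the Python standard library alone: a naive short-time Fourier transform and an autocorrelation-based fundamental frequency estimate. The frame length, hop size, pitch search range, and function names are assumptions; the mel filterbank and linear predictive coding steps are omitted here, and the claim does not say which pitch estimator is used.

```python
import cmath
import math


def stft_magnitudes(signal, frame_len=64, hop=32):
    """Naive short-time Fourier transform: Hann window plus direct DFT per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = [
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


def fundamental_frequency(signal, sample_rate, f_min=50.0, f_max=500.0):
    """Estimate pitch as the lag that maximizes the autocorrelation."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag = max(
        range(lag_min, lag_max + 1),
        key=lambda lag: sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag)),
    )
    return sample_rate / best_lag
```

The direct DFT is quadratic in the frame length and is used only to keep the sketch dependency-free; a production implementation would normally use an FFT routine.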
6. The method of claim 1, wherein
the positioning algorithms comprise any one or more of: a cross-correlation method, a mutual information method, and a least squares method.
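To illustrate the cross-correlation method, the sketch below estimates the time difference of arrival between two microphone signals and converts it to an angle. It assumes a two-microphone pair, a far-field source, an angle measured from the array broadside, and a speed of sound of 343 m/s; the function names and the geometry are assumptions, not details from the claims.

```python
import math


def estimate_delay(reference, delayed, max_lag):
    """Return the lag (in samples) at which `delayed` best matches `reference`."""
    def correlation(lag):
        return sum(
            reference[n] * delayed[n + lag]
            for n in range(len(reference))
            if 0 <= n + lag < len(delayed)
        )
    return max(range(-max_lag, max_lag + 1), key=correlation)


def delay_to_angle(delay_samples, sample_rate, mic_spacing_m, c=343.0):
    """Far-field angle of arrival for a two-microphone pair (assumed geometry)."""
    ratio = c * delay_samples / (sample_rate * mic_spacing_m)
    # Clamp to the valid asin domain in case noise yields an impossible delay.
    return math.asin(max(-1.0, min(1.0, ratio)))
```

For example, a pulse that reaches the second microphone three samples later than the first yields a delay estimate of 3, which `delay_to_angle` maps to an angle between 0 and 90 degrees for a 0.1 m spacing at 16 kHz.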
7. A sound source localization apparatus, comprising:
an acquisition module configured to acquire a sound signal by means of a microphone array;
an identification module configured to: preprocess the sound signal; identify, based on the preprocessed sound signal, a target sound source type to which the sound signal belongs; and determine, from among a plurality of positioning algorithms corresponding to sound source types, a target positioning algorithm corresponding to the target sound source type;
a positioning module configured to: perform beam forming processing on the preprocessed sound signal; obtain a positioning result of the sound source by applying the target positioning algorithm to the sound signal subjected to the beam forming processing; and determine whether a cycle termination condition is met; if so, terminate the current flow; otherwise, calculate a steering vector based on the positioning result of the sound source, perform beam forming processing on the sound signal based on the steering vector to obtain an enhanced sound signal, and execute the preprocessing of the sound signal based on the enhanced sound signal;
wherein calculating the steering vector based on the positioning result of the sound source comprises:
calculating the steering vector based on equation 1;
equation 1: [formula image not reproduced in this text]
wherein the symbols of equation 1 characterize, respectively: the steering vector; the imaginary unit; the frequency of the sound signal; the distance from the sound source to the m-th microphone of the microphone array; the angle of incidence of the sound source; and the speed of sound; the distance and the angle of incidence are determined from the positioning result of the sound source.
8. An electronic device, comprising:
one or more processors;
a storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
9. A computer readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-6.
CN202311575126.7A 2023-11-24 2023-11-24 Sound source positioning method and device Active CN117289208B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311575126.7A CN117289208B (en) 2023-11-24 2023-11-24 Sound source positioning method and device

Publications (2)

Publication Number Publication Date
CN117289208A CN117289208A (en) 2023-12-26
CN117289208B true CN117289208B (en) 2024-02-20

Family

ID=89241038

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311575126.7A Active CN117289208B (en) 2023-11-24 2023-11-24 Sound source positioning method and device

Country Status (1)

Country Link
CN (1) CN117289208B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109920448A (en) * 2019-02-26 2019-06-21 江苏大学 A kind of identifying system and method for automatic driving vehicle traffic environment special type sound
CN112379330A (en) * 2020-11-27 2021-02-19 浙江同善人工智能技术有限公司 Multi-robot cooperative 3D sound source identification and positioning method
CN112485761A (en) * 2021-02-03 2021-03-12 成都启英泰伦科技有限公司 Sound source positioning method based on double microphones
CN114089279A (en) * 2021-10-15 2022-02-25 浙江工业大学 Sound target positioning method based on uniform concentric circle microphone array
CN114114153A (en) * 2021-11-23 2022-03-01 哈尔滨工业大学(深圳) Multi-sound-source positioning method and system, microphone array and terminal device
CN114325214A (en) * 2021-11-18 2022-04-12 国网辽宁省电力有限公司电力科学研究院 Electric power online monitoring method based on microphone array sound source positioning technology
CN116106826A (en) * 2022-12-16 2023-05-12 北京奕斯伟计算技术股份有限公司 Sound source positioning method, related device and medium
CN116381717A (en) * 2023-04-25 2023-07-04 广东石油化工学院 Unmanned aerial vehicle positioning device and positioning method based on Kalman filtering algorithm
CN116705056A (en) * 2023-07-25 2023-09-05 腾讯音乐娱乐科技(深圳)有限公司 Audio generation method, vocoder, electronic device and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6501260B2 (en) * 2015-08-20 2019-04-17 本田技研工業株式会社 Sound processing apparatus and sound processing method
JP7266433B2 (en) * 2019-03-15 2023-04-28 本田技研工業株式会社 Sound source localization device, sound source localization method, and program

Also Published As

Publication number Publication date
CN117289208A (en) 2023-12-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant