CN110544490A - Sound source positioning method based on Gaussian mixture model and spatial power spectrum characteristics


Info

Publication number
CN110544490A
Authority
CN
China
Prior art keywords
sound source
formula
mixture model
signal
gaussian mixture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910694072.3A
Other languages
Chinese (zh)
Other versions
CN110544490B (en)
Inventor
赵小燕
陈书文
刘鸿斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Second Normal University (Jiangsu Institute of Educational Science Research)
Nanjing Institute of Technology
Original Assignee
Jiangsu Second Normal College (Jiangsu Academy of Educational Sciences)
Nanjing Forestry University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Second Normal College (Jiangsu Academy of Educational Sciences) and Nanjing Forestry University
Priority to CN201910694072.3A
Publication of CN110544490A
Application granted
Publication of CN110544490B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination


Abstract

The invention discloses a sound source positioning method based on a Gaussian mixture model and spatial power spectrum features. The method comprises a training stage and a testing stage. In the training stage, the spatial power spectrum of each azimuth is extracted as a feature vector and a Gaussian mixture model is built for each azimuth. In the testing stage, a Gaussian mixture model classifier gives the likelihood of the test signal with respect to each azimuth, and an estimate of the sound source azimuth is obtained by maximum likelihood. The invention exploits information such as the sound source position and the acoustic environment, effectively characterizes class features through the Gaussian mixture model, realizes real-time sound source positioning from only one frame of signal, significantly improves positioning performance, and has strong noise robustness.

Description

Sound source positioning method based on Gaussian mixture model and spatial power spectrum characteristics
Technical Field
The invention belongs to the technical field of signal processing, and particularly relates to a sound source positioning technology.
Background
Small microphone arrays are typically used in settings such as offices, conference rooms and intelligent robots. Traditional positioning algorithms such as the SRP algorithm perform unsatisfactorily in complex acoustic environments with reverberation and noise, because reverberation produces multiple local peaks in the spatial power spectrum; these spurious peaks are caused by reflections of the real sound source off the acoustic environment.
The reverberation information of a room is embodied in the room impulse response. Research has shown that the phase-transform-weighted steered response power spectrum (SRP-PHAT) of the signals received by the microphones equals the SRP-PHAT of the room impulse responses from the sound source to each array element. Reverberation therefore carries acoustic environment information about the sound source, including the room size, the position of the sound source relative to the room, and the room's reflection characteristics, so the spatial power spectrum function contains spatial structure information that is closely related to the sound source position and the acoustic environment.
A Gaussian mixture model (GMM) can be regarded as a model with a single state carrying several Gaussian distribution functions; such a model can effectively characterize the feature distribution of a class.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a sound source positioning method based on a Gaussian mixture model and spatial power spectrum features.
An array is formed using microphones as array elements. The number of microphones is U and each microphone has index u, u = 1, 2, …, U. Sound source signals with known azimuths are collected; the direction vector of each sound source is γ, determined by the horizontal angle θ and the pitch angle of the sound source relative to the array. The time-domain signal of direction vector γ collected by the u-th array element is then used as the training signal.
The training signal is divided into frames of length N, with sample index n, 0 ≤ n < N; the number of single frames is L and the frame index is l, l = 1, 2, …, L. Each frame is windowed with a Hamming window function and substituted into the corresponding formula to obtain the l-th single-frame signal of direction vector γ collected by the u-th array element.
A discrete Fourier transform (DFT) of length K with frequency bin index k is applied to each single-frame signal, yielding the frequency-domain signal corresponding to the l-th time-domain frame of direction vector γ collected by the u-th array element, thereby converting the training signal from the time domain to the frequency domain.
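Since the framing and DFT formulas themselves are not reproduced in this text, the preprocessing described above can be sketched in Python; the frame length, FFT length and the function name frames_to_spectra are illustrative choices, with non-overlapping frames assumed:

```python
import numpy as np

def frames_to_spectra(x, frame_len=512, n_fft=1024):
    """Split a signal into non-overlapping frames, apply a Hamming
    window to each frame, and return the DFT of every frame."""
    n_frames = len(x) // frame_len
    w = np.hamming(frame_len)
    frames = x[:n_frames * frame_len].reshape(n_frames, frame_len)
    return np.fft.fft(frames * w, n=n_fft, axis=1)  # (L, K) complex spectra

# usage on a synthetic 1 s tone sampled at 16 kHz
fs = 16000
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
spectra = frames_to_spectra(x)
print(spectra.shape)  # (31, 1024)
```

Each row of the returned array is the K-point spectrum of one windowed frame, which is the per-frame frequency-domain signal the method operates on.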
Assume the sound source is in the far field and propagates from a known candidate position to the array. Let the speed of sound in air be c, the far-field candidate position r, the array centre position r0, and the position of the u-th array element ru. The sound propagation delay from the candidate position to the array centre is τ0(r), the propagation delay from the candidate position to the u-th array element is τu(r), and the steering delay from the candidate position to the u-th array element is τu0(r) = τu(r) − τ0(r).
The steering delay of a far-field sound source is calculated by treating the sound source position as equivalent to a sound source azimuth. The direction vector ξ of a far-field candidate position is determined by the horizontal angle α and pitch angle φ of the candidate position relative to the array: ξ = [cos φ cos α, cos φ sin α, sin φ]^T. Substituting into the corresponding formula yields the steering delay τu0(r) of the far-field candidate position.
With the signal sampling rate set to fs, the steering delay τu0(r) and the frequency-domain signal are substituted into the corresponding formula to obtain the PHAT-weighted steered response output of the single-frame signal; this is substituted into the next formula to obtain the steered response power value of the training signal, which is then normalized. With the vector dimension set to D, the normalized power values form the spatial power spectrum feature vector of the l-th frame signal of direction vector γ.
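A minimal sketch of the PHAT-weighted steered response power feature described above; since the patent's exact formulas are elided, the sign convention, the normalization and the name srp_phat_feature are assumptions:

```python
import numpy as np

def srp_phat_feature(X, tau, fs, n_fft):
    """PHAT-weighted steered response power of one frame.

    X:   (U, K) complex spectra of the U microphone channels
    tau: (D, U) steering delays in seconds for the D candidate azimuths
    Returns a normalized D-dimensional spatial power spectrum vector."""
    K = X.shape[1]
    k = np.arange(K)
    Xp = X / np.maximum(np.abs(X), 1e-12)   # PHAT weighting keeps phase only
    p = np.empty(tau.shape[0])
    for d in range(tau.shape[0]):
        # advance each channel by its steering delay, then sum the aligned channels
        steer = np.exp(2j * np.pi * k[None, :] * fs / n_fft * tau[d][:, None])
        p[d] = np.sum(np.abs(np.sum(Xp * steer, axis=0)) ** 2)
    return p / p.sum()                      # normalize to form the feature

# with zero delays every candidate direction receives the same power
X = np.exp(2j * np.pi * np.random.rand(6, 1024))
feat = srp_phat_feature(X, np.zeros((36, 6)), 16000, 1024)
```

Each entry of the returned vector is the steered response power for one candidate azimuth, so the vector dimension D equals the number of candidate directions.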
A Gaussian mixture model λγ is constructed from the feature vectors. The feature vectors of all L frames of the training signal are used as training samples, giving the training set of direction vector γ, and a Gaussian mixture model λγ is constructed for each training azimuth. The feature distribution of a class is decomposed into several Gaussian sub-distributions and approximated by a mixture-weighted average of the Gaussian components; the complete Gaussian mixture model consists of the mixture weights, mean vectors and covariance matrices.
Let d index the components of the feature vector, let λγ denote the parameters of the Gaussian mixture model of direction vector γ, let I be the order of the model and i the index of a Gaussian component; each component i is described by its mixture weight, its mean vector (whose d-th component appears in the formulas) and its covariance matrix.
The model parameters are estimated with the EM algorithm: 1/I is used as the initial value of each mixture weight, training samples randomly selected from the training set serve as initial mean vectors, the identity matrix serves as the initial covariance matrix, and these initial values form the initial estimate of λγ.
The current parameters are substituted into the formula to compute the Gaussian probability sub-distributions, which are substituted into the next formula to compute the posterior probability of each training sample under the i-th Gaussian component. Substituting the posterior probabilities into the re-estimation formulas yields re-estimated mixture weights and mean vectors; for 1 ≤ d ≤ D, re-estimated variances are computed and assembled into a diagonal covariance matrix. The re-estimated values form a new estimate of λγ, and the re-estimation is iterated in this way until the model converges.
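The EM re-estimation loop described above can be sketched as follows, assuming diagonal covariances as in the text. The initialization mirrors the description (weights 1/I, means drawn from the training set, unit variances), while the function name fit_gmm_diag and the fixed iteration count are illustrative:

```python
import numpy as np

def fit_gmm_diag(Y, I=8, n_iter=50, seed=0):
    """EM for a diagonal-covariance Gaussian mixture model on samples Y (L, D)."""
    rng = np.random.default_rng(seed)
    L, D = Y.shape
    w = np.full(I, 1.0 / I)                         # mixture weights start at 1/I
    mu = Y[rng.choice(L, I, replace=False)].copy()  # means drawn from the data
    var = np.ones((I, D))                           # unit (identity) covariances
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample
        logp = (-0.5 * np.sum((Y[:, None, :] - mu) ** 2 / var
                              + np.log(2 * np.pi * var), axis=2) + np.log(w))
        logp -= logp.max(axis=1, keepdims=True)
        post = np.exp(logp)
        post /= post.sum(axis=1, keepdims=True)     # (L, I)
        # M-step: re-estimate weights, means and diagonal covariances
        Nk = post.sum(axis=0)
        w = Nk / L
        mu = (post.T @ Y) / Nk[:, None]
        var = np.maximum((post.T @ Y ** 2) / Nk[:, None] - mu ** 2, 1e-6)
    return w, mu, var

# usage sketch on synthetic two-cluster data
rng = np.random.default_rng(1)
Y = np.vstack([rng.normal(0.0, 0.5, (200, 2)), rng.normal(5.0, 0.5, (200, 2))])
w, mu, var = fit_gmm_diag(Y, I=2)
```

The variance floor (1e-6) is a common numerical safeguard rather than something stated in the patent; in practice convergence would be detected by monitoring the log-likelihood instead of running a fixed number of iterations.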
A sound source with unknown azimuth is collected as the test signal, which is converted into single-frame signals; the spatial power spectrum feature vector ytest(l) of the unknown-azimuth sound source is computed from each single-frame signal and substituted into the Gaussian mixture models.
Substituting ytest(l) into the formula yields the Gaussian probability sub-distributions, which are substituted into the next formula to obtain the likelihood of the test sample under each direction vector γ, p(ytest(l)|λγ).
Substituting p(ytest(l)|λγ) into the decision formula, the direction vector for which p(ytest(l)|λγ) is maximal is taken as the estimate of the direction vector γ of the l-th frame of the unknown-azimuth sound source.
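The maximum-likelihood decision of the testing stage can be sketched as follows; gmm_loglik and locate are illustrative names, and each model is assumed to be the (weights, means, variances) triple of one training azimuth's diagonal-covariance GMM:

```python
import numpy as np

def gmm_loglik(y, w, mu, var):
    """Log-likelihood of one feature vector y under a diagonal GMM."""
    logp = (-0.5 * np.sum((y - mu) ** 2 / var + np.log(2 * np.pi * var),
                          axis=1) + np.log(w))
    m = logp.max()
    return m + np.log(np.exp(logp - m).sum())  # log-sum-exp over components

def locate(y, models, azimuths):
    """Return the training azimuth whose GMM gives the maximal likelihood."""
    scores = [gmm_loglik(y, *model) for model in models]
    return azimuths[int(np.argmax(scores))]

# two toy single-component models centred at 0 and at 5
models = [(np.array([1.0]), np.array([[0.0]]), np.array([[1.0]])),
          (np.array([1.0]), np.array([[5.0]]), np.array([[1.0]]))]
print(locate(np.array([4.8]), models, [0, 180]))  # 180
```

Working in the log domain with log-sum-exp avoids the numerical underflow that evaluating the raw likelihood product would cause for high-dimensional feature vectors.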
The method comprises a training stage and a testing stage. In the training stage, the spatial power spectrum of each azimuth is extracted as a feature vector and a Gaussian mixture model is built for each azimuth; in the testing stage, a Gaussian mixture model classifier gives the likelihood of the test signal with respect to each azimuth, and the estimate of the sound source azimuth is obtained by maximum likelihood. The spatial power spectrum feature vector contains spatial structure information closely related to the sound source direction and the acoustic environment. The Gaussian mixture model effectively characterizes class features by approximating the class feature distribution with a mixture-weighted average of several Gaussian probability density functions. The Gaussian mixture models of the training azimuths can be computed offline and stored in memory, so only one frame of signal is needed at test time for real-time sound source positioning. Extracting the SRP-PHAT spatial power spectrum of the microphone array as the feature vector of the sound source positioning system makes full use of acoustic environment information, significantly improves positioning performance, and strengthens noise robustness.
Drawings
Fig. 1 is a flowchart of the method, fig. 2 is a diagram illustrating a comparison between the positioning success rates of the method and the conventional SRP-PHAT method when the reverberation time T60 is 0.3s, and fig. 3 is a diagram illustrating a comparison between the positioning success rates of the method and the conventional SRP-PHAT method when the reverberation time T60 is 0.6 s.
Detailed Description
The technical scheme of the invention is explained in detail below with reference to the accompanying drawings.
As shown in fig. 1, a sound source signal at a known position is collected as the training signal. The training signal is converted into single-frame signals by framing and windowing preprocessing, and each single-frame signal is transformed into a frequency-domain signal by the discrete Fourier transform. The steering delays of the candidate positions are calculated to obtain the steered response output, and the spatial power spectrum feature vector of each training azimuth is extracted as a training sample to construct a Gaussian mixture model. A sound source signal at an unknown position is collected as the test signal by the same method, its spatial power spectrum feature vector is extracted as a test sample and input into the constructed Gaussian mixture model classifier, the likelihood of the test sample under the Gaussian mixture model of each training azimuth is calculated, and the azimuth with the maximum likelihood is selected as the sound source azimuth.
A sound source is set at a known position in the far field, where the known position is equivalent to a known azimuth; the direction vector γ of the sound source relative to the array is determined by a horizontal angle θ and a pitch angle, with one direction vector for each known sound source.
In this embodiment, 6 omnidirectional microphones are used as array elements to form a uniform circular array with radius 0.1 m; the microphone index is u, u = 1, 2, …, 6. For convenience of explanation, the sound sources with known azimuths are set in the same horizontal plane as the microphone array, so the pitch angle is zero and the sound source direction vector γ degenerates from three dimensions to two: γ = [cos θ, sin θ]^T.
Defining straight ahead in the horizontal plane as 90°, the horizontal angle θ ranges over [0°, 360°) with a spacing of 10°, i.e. θ = 0°, 10°, …, 350°, so the number of training azimuths is 36. Each array element collects the sound source signals as training signals, giving the training signal of direction vector γ collected by the u-th array element.
The training signal is assumed to come from a candidate position r; the array centre position is r0, the position of the u-th array element is ru, and the speed of sound in air is taken as 342 m/s. The sound propagation delay from the candidate position to the array centre is τ0(r), the propagation delay from the candidate position to the u-th array element is τu(r), and the steering delay from the candidate position to the u-th array element is τu0(r).
The training signal is divided into frames with frame length 512 samples (32 ms) and zero frame overlap; the sample index is n with 0 ≤ n < N, the number of single frames is 300, and the frame index is l, l = 1, 2, …, L.
A Hamming window function is applied and the result is substituted into the formula to obtain the l-th frame time-domain signal of direction vector γ collected by the u-th array element.
A discrete Fourier transform of length K = 1024 with frequency bin index k is applied to each single-frame signal, yielding the frequency-domain signal corresponding to the l-th time-domain frame of direction vector γ collected by the u-th array element, thereby converting the training signal from the time domain to the frequency domain.
The steering delay of the far-field sound source is calculated, with the sound source position treated as equivalent to the sound source azimuth. The direction vector of a candidate position is ξ, composed of the horizontal angle α and the pitch angle φ of the candidate position relative to the array: ξ = [cos φ cos α, cos φ sin α, sin φ]^T, with one ξ for each candidate position.
Since the sound source and the microphone array are in the same horizontal plane, the direction vector degenerates to ξ = [cos α, sin α]^T. Substituting into the formula yields the steering delay τu0(r) of the far-field candidate position. Both ξ = [cos α, sin α]^T and τu0(r) are independent of the received signal and can be calculated offline and stored in memory.
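Because ξ and τu0(r) are signal-independent, the steering delay table for the 6-element circular array can be computed once offline. The sketch below assumes the delay is the projection of the element offset onto the look direction, τu0 = −ξ·(ru − r0)/c; the sign convention and function name are assumptions, since the patent's formula is not reproduced here:

```python
import numpy as np

def circular_array_delays(radius=0.1, n_mics=6, c=342.0, step_deg=10):
    """Far-field steering delays tau_u0 (seconds) for a uniform circular
    array, for candidate azimuths 0, 10, ..., 350 degrees."""
    mic_ang = 2 * np.pi * np.arange(n_mics) / n_mics
    r_u = radius * np.stack([np.cos(mic_ang), np.sin(mic_ang)], axis=1)  # (U, 2)
    alphas = np.deg2rad(np.arange(0, 360, step_deg))
    xi = np.stack([np.cos(alphas), np.sin(alphas)], axis=1)              # (D, 2)
    return -(xi @ r_u.T) / c   # (D, U), assumed sign convention

tau = circular_array_delays()
print(tau.shape)  # (36, 6)
```

With a 0.1 m radius and c = 342 m/s, no delay exceeds 0.1/342 ≈ 0.29 ms in magnitude, so the whole (36, 6) table is tiny and well suited to offline storage in memory.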
With the signal sampling rate fs = 16 kHz, the steering delay τu0(r) and the frequency-domain signal are substituted into the formula to obtain the PHAT-weighted steered response output of the single-frame signal; this is substituted into the next formula to obtain the steered response power value of the training signal, which is then normalized.
The vector dimension is D; since the number of training azimuths was set to 36 above, D = 36. Substituting the normalized power values into the formula yields the spatial power spectrum feature vector of the l-th frame signal of direction vector γ.
A Gaussian mixture model λγ is constructed from the feature vectors. For each training azimuth, the spatial power spectrum feature vectors of all frames of that azimuth are used as training samples; L is the number of frames of the training signal, consistent with the number of single frames set above, i.e. L = 300. Substituting into the formula gives the training sample set of direction vector γ. A Gaussian mixture model is constructed for each training azimuth: the feature distribution of a class is decomposed into several Gaussian sub-distributions and approximated by a mixture-weighted average of the Gaussian components, and the complete Gaussian mixture model consists of the mixture weights, mean vectors and covariance matrices.
Let d index the components of the feature vector; the order of the Gaussian mixture model is I = 8 and the index of a Gaussian component is i. Each component is described by its mixture weight, its mean vector (whose d-th component appears in the formulas), its variances and its covariance matrix.
The model parameters are estimated iteratively with the EM algorithm: 1/I is set as the initial value of each mixture weight, training samples randomly selected from the training set serve as initial mean vectors, the identity matrix serves as the initial covariance matrix, and the Gaussian mixture model λγ is initialized from these values.
First, the current parameters are substituted into the formula to compute the Gaussian probability sub-distributions, from which the posterior probability of each training sample under the i-th Gaussian component is computed.
Second, the posterior probabilities are substituted into the re-estimation formulas to obtain re-estimated mixture weights and mean vectors; for 1 ≤ d ≤ D, re-estimated variances are computed and assembled into a diagonal covariance matrix.
Finally, the re-estimated values are combined into a new estimate of λγ; the re-estimation is repeated and λγ is updated iteratively until the model converges. Once the Gaussian mixture models of all training azimuths have been constructed, they form a classifier that is stored offline in memory.
The same microphone array is used. For convenience of explanation, a sound source at an unknown position is set in the far field, in the same horizontal plane as the array, so the sound source position is equivalent to a sound source azimuth and the azimuth angle is equivalent to a horizontal angle. The sound source signal with unknown azimuth is collected as the test signal, which is converted into single-frame signals using the same framing and windowing method as above.
By the same procedure, each single-frame test signal is converted from the time domain to the frequency domain, the steering delays of the candidate positions are calculated, the steered response output of the test signal is computed from the frequency-domain signal and the steering delays, and the spatial power spectrum feature vector ytest(l) of the unknown sound source is obtained as the test sample.
ytest(l) is input into the Gaussian mixture model classifiers corresponding to the different training azimuths: substituting ytest(l) into the formula yields the Gaussian probability sub-distributions, which are substituted into the next formula to obtain the likelihood p(ytest(l)|λγ) of the test sample under each direction vector γ.
The likelihood of the test sample is maximal at its true training azimuth, and since the prior probabilities of the training azimuths are equal, by Bayes' formula the posterior probability of the test sample is also maximal at that azimuth. The direction vector with the maximum likelihood p(ytest(l)|λγ) is therefore taken as the direction vector estimate of the l-th frame of the unknown-azimuth signal, realizing sound source positioning.
With reverberation times T60 of 0.3 s and 0.6 s, the signal-to-noise ratio (dB) on the abscissa and the positioning success rate (%) on the ordinate, the sound source is localized with both the proposed method and the traditional SRP-PHAT method; the comparison shows that the proposed method outperforms the traditional SRP-PHAT method.
The above-described embodiments are not intended to limit the present invention, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present invention are included in the scope of the present invention.

Claims (9)

1. A sound source positioning method based on a Gaussian mixture model and spatial power spectrum features is characterized by comprising the following steps:
collecting a sound source signal of known azimuth as a training signal, converting the training signal into L single-frame signals, calculating the spatial power spectrum feature vector of the known-azimuth sound source signal from the single-frame signals, constructing a Gaussian mixture model λγ from the feature vectors, and positioning a sound source of unknown azimuth through the Gaussian mixture model.
2. The sound source localization method based on Gaussian mixture model and spatial power spectrum features according to claim 1, wherein the acquiring the sound source signal with known orientation as the training signal comprises:
Forming an array using microphones as array elements and collecting the sound source signal of known azimuth, the number of array elements being U, the index being u, u = 1, 2, …, U, and the direction vector of the known azimuth being γ, to obtain the training signal of direction vector γ collected by the u-th array element.
3. The sound source localization method based on Gaussian mixture model and spatial power spectrum features according to claim 1, wherein the converting the training signal into L single frame signals comprises:
Setting the frame length as N, the sample index as n with 0 ≤ n < N, the number of single frames as L, and the frame index as l with l = 1, 2, …, L, and dividing the signal into frames;
applying a Hamming window function and substituting into the formula to obtain the l-th frame time-domain signal of direction vector γ collected by the u-th array element.
4. the sound source localization method according to claim 1, wherein the calculating the spatial power spectral feature vector of the sound source signal of the known azimuth from the single frame signal comprises:
Calculating the steering delay τu0(r) of the far-field candidate position, converting the single-frame signal into a frequency-domain signal, and calculating the spatial power spectrum feature vector of the known-azimuth sound source from the steering delay and the frequency-domain signal.
5. The sound source localization method based on Gaussian mixture model and spatial power spectrum features according to claim 4, wherein the calculating the steering time delay tau u0(r) of the far-field candidate position comprises:
Setting the speed of sound in air as c, the far-field candidate position as r, the array centre position as r0, the position of the u-th array element as ru, the sound propagation delay from the far-field candidate position to the array centre as τ0(r), the sound propagation delay from the far-field candidate position to the u-th array element as τu(r), and the steering delay from the far-field candidate position to the u-th array element as τu0(r);
and measuring the direction vector ξ of the far-field candidate position and substituting it into the formula to obtain the steering delay τu0(r).
6. The sound source localization method based on Gaussian mixture model and spatial power spectrum features according to claim 4, wherein the converting the single frame signal into a frequency domain signal comprises:
Performing the discrete Fourier transform with a DFT of length K and frequency bin index k, and substituting into the formula to obtain the l-th frame frequency-domain signal of direction γ collected by the u-th array element.
7. the sound source localization method according to claim 6, wherein the calculating the spatial power spectrum feature vector of the known sound source from the pilot time delay and the frequency domain signal comprises:
Setting the signal sampling rate as fs and substituting the steering delay τu0(r) and the frequency-domain signal into the formula to obtain the PHAT-weighted steered response output of the single-frame signal;
substituting into the formula to obtain the steered response power value of the training signal and normalizing it;
setting the vector dimension as D and substituting the normalized power values into the formula to obtain the spatial power spectrum feature vector of the l-th frame signal of direction vector γ.
8. The sound source localization method based on Gaussian mixture model and spatial power spectrum features according to claim 1, wherein constructing the Gaussian mixture model λγ from the feature vectors comprises:
Substituting the feature vectors as training samples into the formula to obtain the training set for direction vector γ;
Setting the parameter set of the Gaussian mixture model for direction vector γ as λγ, the order of the Gaussian mixture model as I, the index of a Gaussian component as i, and d as the index of a component of the feature vector, each Gaussian component being specified by a mixture weight, a mean vector, and a covariance matrix;
Estimating the parameters iteratively, with 1/I as the initial value of each mixture weight, training samples randomly selected from the training set as initial mean vectors, and the identity matrix as the initial covariance matrix;
Constructing the Gaussian mixture model λγ, with its initial value built from these initial parameter values;
Substituting the training samples into the formula to obtain the Gaussian probability sub-distributions, and substituting these into the formula to obtain the posterior probability of each training sample under the i-th Gaussian component;
Substituting into the formula to obtain the re-estimated mixture weights;
Substituting into the formula to obtain the re-estimated mean vectors;
Substituting into the formula to obtain the re-estimated covariance matrices, which are taken to be diagonal;
Substituting the re-estimated values into the formula to obtain the re-estimated λγ, and repeating the re-estimation iterations until the model converges.
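The EM iteration of this claim can be sketched as below: mixture weights start at 1/I, mean vectors at randomly chosen training samples, and covariance matrices at the identity, with covariances kept diagonal as in the re-estimation step. The iteration count, the variance floor, and the log-domain E-step are illustrative implementation choices rather than the patent's formulas.

```python
import numpy as np

def train_gmm(Y, I, n_iter=50, seed=0):
    """EM training of a diagonal-covariance GMM on training set Y (N, D).

    Initialization follows the steps above: weights 1/I, means drawn from
    the training samples, unit variances. Returns (weights, means, vars).
    """
    rng = np.random.default_rng(seed)
    N, D = Y.shape
    w = np.full(I, 1.0 / I)                       # mixture weights = 1/I
    mu = Y[rng.choice(N, I, replace=False)].copy()  # means = random samples
    var = np.ones((I, D))                         # identity covariance
    for _ in range(n_iter):
        # E-step: log posterior of each sample under each Gaussian component
        logp = (-0.5 * np.sum((Y[:, None] - mu) ** 2 / var
                              + np.log(2 * np.pi * var), axis=2)
                + np.log(w))
        logp -= logp.max(axis=1, keepdims=True)
        post = np.exp(logp)
        post /= post.sum(axis=1, keepdims=True)   # (N, I) posteriors
        # M-step: re-estimate weights, means, diagonal covariances
        Nk = post.sum(axis=0) + 1e-12
        w = Nk / N
        mu = (post.T @ Y) / Nk[:, None]
        var = (post.T @ (Y ** 2)) / Nk[:, None] - mu ** 2
        var = np.maximum(var, 1e-6)               # variance floor
    return w, mu, var
```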
9. The Gaussian mixture model-based sound source localization method according to claim 8, wherein localizing the unknown sound source through the Gaussian mixture model comprises:
Collecting the signal of a sound source with unknown azimuth as the test signal, converting the test signal into single-frame signals, calculating the spatial power spectrum feature vector ytest(l) of the unknown sound source from each single-frame signal, and substituting ytest(l) into the Gaussian mixture model;
Substituting ytest(l) into the formula to obtain the Gaussian probability sub-distributions, and substituting these into the formula to obtain the likelihood p(ytest(l)|λγ) of the test sample for direction vector γ;
Substituting p(ytest(l)|λγ) into the formula, the direction vector corresponding to the maximum value of p(ytest(l)|λγ) is taken as the estimated direction vector γ of the l-th frame signal of the sound source with unknown azimuth.
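The test phase reduces to an argmax of the likelihood p(ytest(l)|λγ) over the trained per-direction models. A sketch under stated assumptions: the diagonal-covariance parameterization matches the training step above, while the dictionary of per-direction models and the function names are illustrative.

```python
import numpy as np

def gmm_loglik(y, w, mu, var):
    """Log-likelihood log p(y | lambda_gamma) of one feature vector under
    a diagonal-covariance GMM (weights w, means mu, variances var)."""
    logp = (-0.5 * np.sum((y - mu) ** 2 / var + np.log(2 * np.pi * var),
                          axis=1)
            + np.log(w))
    m = logp.max()                       # log-sum-exp for stability
    return m + np.log(np.exp(logp - m).sum())

def localize(y_test, models):
    """Return the direction gamma whose GMM gives y_test the highest
    likelihood, i.e. the argmax over p(y_test | lambda_gamma).

    models maps each direction to its (weights, means, variances) tuple.
    """
    return max(models, key=lambda g: gmm_loglik(y_test, *models[g]))
```

Frame-level estimates obtained this way can then be pooled across frames (e.g. by summing log-likelihoods) if a single direction estimate per utterance is wanted, though the claim itself is stated per frame.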
CN201910694072.3A 2019-07-30 2019-07-30 Sound source positioning method based on Gaussian mixture model and spatial power spectrum characteristics Active CN110544490B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910694072.3A CN110544490B (en) 2019-07-30 2019-07-30 Sound source positioning method based on Gaussian mixture model and spatial power spectrum characteristics

Publications (2)

Publication Number Publication Date
CN110544490A true CN110544490A (en) 2019-12-06
CN110544490B CN110544490B (en) 2022-04-05

Family

ID=68709911

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910694072.3A Active CN110544490B (en) 2019-07-30 2019-07-30 Sound source positioning method based on Gaussian mixture model and spatial power spectrum characteristics

Country Status (1)

Country Link
CN (1) CN110544490B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20090118718A * 2008-05-14 2009-11-18 Samsung Electronics Co., Ltd. Real-time srp-phat sound source localization system and control method using a search space clustering method
KR20090128221A * 2008-06-10 2009-12-15 Samsung Electronics Co., Ltd. Method for sound source localization and system thereof
WO2011091754A1 * 2010-01-27 2011-08-04 Huawei Device Co., Ltd. Sound source locating method and apparatus thereof
CN102438189A * 2011-08-30 2012-05-02 Southeast University Dual-channel acoustic signal-based sound source localization method
KR20140015893A * 2012-07-26 2014-02-07 Samsung Techwin Co., Ltd. Apparatus and method for estimating location of sound source
CN104142492A * 2014-07-29 2014-11-12 Foshan University SRP-PHAT multi-source spatial positioning method
CN106093864A * 2016-06-03 2016-11-09 Tsinghua University A kind of microphone array sound source space real-time location method
CN108806694A * 2018-06-13 2018-11-13 Gao Yanyan A kind of teaching Work attendance method based on voice recognition

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
HOANG DO ET AL: "SRP-PHAT METHODS OF LOCATING SIMULTANEOUS MULTIPLE TALKERS USING A FRAME OF MICROPHONE ARRAY DATA", 《ICASSP 2010》 *
ZHAO XIAOYAN ET AL: "A FAST SEARCH METHOD OF STEERED RESPONSE POWER WITH SMALL-APERTURE MICROPHONE ARRAY FOR SOUND SOURCE LOCALIZATION", 《JOURNAL OF ELECTRONICS (CHINA)》 *
ZHAO XIAOYAN ET AL: "Accelerated steered response power method for sound source localization via clustering search", 《SCIENCE CHINA》 *
ZHOU LIN ET AL: "Robust binaural sound source localization algorithm based on sub-band SNR estimation and soft decision", 《Journal of Southeast University (Natural Science Edition)》 *
XIAO JUN: "Research on real-time sound source localization technology based on microphone arrays", 《China Master's Theses Full-text Database, Information Science and Technology》 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112951264A (en) * 2019-12-10 2021-06-11 中国科学院声学研究所 Multichannel sound source separation method based on hybrid probability model
CN112951264B (en) * 2019-12-10 2022-05-17 中国科学院声学研究所 Multichannel sound source separation method based on hybrid probability model
CN111929645A (en) * 2020-09-23 2020-11-13 深圳市友杰智新科技有限公司 Method and device for positioning sound source of specific human voice and computer equipment
CN112946576A (en) * 2020-12-10 2021-06-11 北京有竹居网络技术有限公司 Sound source positioning method and device and electronic equipment
WO2022121800A1 (en) * 2020-12-10 2022-06-16 北京有竹居网络技术有限公司 Sound source positioning method and apparatus, and electronic device
CN112946576B (en) * 2020-12-10 2023-04-14 北京有竹居网络技术有限公司 Sound source positioning method and device and electronic equipment
CN112904279A (en) * 2021-01-18 2021-06-04 南京工程学院 Sound source positioning method based on convolutional neural network and sub-band SRP-PHAT space spectrum
CN112904279B (en) * 2021-01-18 2024-01-26 南京工程学院 Sound source positioning method based on convolutional neural network and subband SRP-PHAT spatial spectrum
CN113534198A (en) * 2021-06-16 2021-10-22 北京遥感设备研究所 Satellite navigation dynamic anti-interference method and system based on covariance matrix reconstruction
CN113534198B (en) * 2021-06-16 2023-05-23 北京遥感设备研究所 Satellite navigation dynamic anti-interference method and system based on covariance matrix reconstruction
CN114355289A (en) * 2022-03-19 2022-04-15 深圳市烽火宏声科技有限公司 Sound source positioning method, sound source positioning device, storage medium and computer equipment
CN114355289B (en) * 2022-03-19 2022-06-10 深圳市烽火宏声科技有限公司 Sound source positioning method, sound source positioning device, storage medium and computer equipment

Also Published As

Publication number Publication date
CN110544490B (en) 2022-04-05

Similar Documents

Publication Publication Date Title
CN110544490B (en) Sound source positioning method based on Gaussian mixture model and spatial power spectrum characteristics
CN107703486B (en) Sound source positioning method based on convolutional neural network CNN
Nadiri et al. Localization of multiple speakers under high reverberation using a spherical microphone array and the direct-path dominance test
JP5587396B2 (en) System, method and apparatus for signal separation
JP4912778B2 (en) Method and system for modeling the trajectory of a signal source
CN109490822B (en) Voice DOA estimation method based on ResNet
CN111415676A (en) Blind source separation method and system based on separation matrix initialization frequency point selection
Sivasankaran et al. Keyword-based speaker localization: Localizing a target speaker in a multi-speaker environment
Xiao et al. Beamforming networks using spatial covariance features for far-field speech recognition
Smaragdis et al. Position and trajectory learning for microphone arrays
CN113111765B (en) Multi-voice source counting and positioning method based on deep learning
CN111341339A (en) Target voice enhancement method based on acoustic vector sensor adaptive beam forming and deep neural network technology
Salvati et al. Two-microphone end-to-end speaker joint identification and localization via convolutional neural networks
Nesta et al. Enhanced multidimensional spatial functions for unambiguous localization of multiple sparse acoustic sources
Zhu et al. Modified complementary joint sparse representations: a novel post-filtering to MVDR beamforming
Hu et al. Evaluation and comparison of three source direction-of-arrival estimators using relative harmonic coefficients
CN115713943A (en) Beam forming voice separation method based on complex space angular center Gaussian mixture clustering model and bidirectional long-short-term memory network
Aroudi et al. DBNET: DOA-driven beamforming network for end-to-end farfield sound source separation
CN111505569B (en) Sound source positioning method and related equipment and device
Firoozabadi et al. Combination of nested microphone array and subband processing for multiple simultaneous speaker localization
Dwivedi et al. Hybrid sh-cnn-mp approach for super resolution doa estimation
Li et al. Low complex accurate multi-source RTF estimation
Chen et al. Multi-channel end-to-end neural network for speech enhancement, source localization, and voice activity detection
Kawase et al. Automatic parameter switching of noise reduction for speech recognition
Yuan et al. Multi-channel Speech Enhancement with Multiple-target GANs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200911

Address after: No. 1 Hongjing Road, Jiangning Science Park, Nanjing, Jiangsu 210000

Applicant after: NANJING INSTITUTE OF TECHNOLOGY

Applicant after: JIANGSU SECOND NORMAL University (JIANGSU INSTITUTE OF EDUCATIONAL SCIENCE RESEARCH)

Address before: No. 159 Longpan Road, Xuanwu District, Nanjing, Jiangsu Province 210000

Applicant before: NANJING FORESTRY University

Applicant before: JIANGSU SECOND NORMAL University (JIANGSU INSTITUTE OF EDUCATIONAL SCIENCE RESEARCH)

GR01 Patent grant