CN108269581B - Double-microphone time delay difference estimation method based on frequency domain coherent function - Google Patents


Info

Publication number
CN108269581B
CN108269581B
Authority
CN
China
Prior art keywords
coherent
function
frequency domain
calculating
signals
Prior art date
Legal status
Active
Application number
CN201710004194.6A
Other languages
Chinese (zh)
Other versions
CN108269581A (en)
Inventor
Yi Fang
Haihong Feng
Youyuan Chen
Current Assignee
Institute of Acoustics CAS
Original Assignee
Institute of Acoustics CAS
Priority date
Filing date
Publication date
Application filed by Institute of Acoustics CAS filed Critical Institute of Acoustics CAS
Priority to CN201710004194.6A
Publication of CN108269581A
Application granted
Publication of CN108269581B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G10L21/0264 Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming


Abstract

The invention discloses a double-microphone time-delay-difference estimation method based on a frequency-domain coherence function, comprising the following steps: step 1) calculating, without the two microphones receiving actual sound-source signals (i.e. theoretically, under ideal conditions), the coherence functions at different azimuth angles, extracting the real and imaginary parts of the coherence function corresponding to each angle, and establishing a database of coherence-function feature quantities; and step 2) converting the signals received by the two microphones into the frequency domain, calculating the coherence function of the two signals, performing peak smoothing at each frequency bin according to the modulus of the coherence function, then calculating the real and imaginary parts of the coherence function, matching them in the coherence-function feature database to obtain the sound-source azimuth, and calculating the delay difference. By extracting, frame by frame, the frequency bins where the modulus of the coherence function is large, the method effectively suppresses interference from background noise and reverberation and improves the accuracy of double-microphone delay-difference estimation.

Description

Double-microphone time delay difference estimation method based on frequency domain coherent function
Technical Field
The invention relates to the field of speech-signal delay estimation, and in particular to a double-microphone delay-difference estimation method based on a frequency-domain coherence function.
Background
Time-delay-difference estimation plays an important role in algorithms such as sound-source localization and speech enhancement for small microphone arrays. The delay difference is the difference in the arrival times of the same source signal at the microphones of the array, caused by their different propagation distances.
Known methods estimate the delay difference by finding the peak time of the generalized cross-correlation function (references [1]-[5]; e.g. reference [1]: Knapp C, Carter G. The generalized correlation method for estimation of time delay [J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1976, 24(4): 320-327; reference [2]: Liu C, Wheeler B C, O'Brien W D Jr, et al. Localization of multiple sound sources with two microphones [J]. Journal of the Acoustical Society of America, 2000, 108(4): 1888-1905; reference [4]: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 1983: 1148-1151; reference [5]: Lindemann W. Extension of a binaural cross-correlation model by contralateral inhibition. I. Simulation of lateralization for stationary signals [J]. Journal of the Acoustical Society of America, 1986, 80(6): 1608-1622). Although such methods give a good estimate of the delay difference in quiet environments, their accuracy degrades sharply under background-noise interference and in reverberant environments.
Disclosure of Invention
The invention aims to overcome the shortcomings of prior-art delay-difference estimation methods. It exploits the property that the modulus of the coherence function of direct speech is 1, while the coherence modulus of noise and reverberant segments is generally less than 1. The coherence-function peak is smoothed in the frequency domain using the coherence modulus, which ensures that each frequency bin of the peak-smoothed coherence function is dominated by direct sound; that is, the peak-smoothed coherence function filters out part of the interference from background noise and reverberant sound. The peak-smoothed coherence function is then matched against the coherence functions under ideal conditions to obtain the azimuth of the closest ideal coherence function, and finally the delay difference is computed, thereby remedying the deficiencies of existing delay-difference estimation techniques.
In order to achieve the above object, the present invention provides a double-microphone delay-difference estimation method based on a frequency-domain coherence function, the method comprising:
step 1) calculating, without the two microphones receiving actual sound-source signals (i.e. theoretically, under ideal conditions), the coherence functions at different azimuth angles, extracting the real and imaginary parts of the coherence function corresponding to each angle, and establishing a database of coherence-function feature quantities;
and step 2) converting the signals received by the two microphones into the frequency domain, calculating the coherence function of the two signals in the frequency domain, performing peak smoothing at each frequency bin according to the modulus of the coherence function, then calculating the real and imaginary parts of the coherence function, matching them in the coherence-function feature database to obtain the sound-source azimuth, and calculating the delay difference.
In the above technical solution, the step 1) specifically includes:
step 1-1) varying the azimuth angle of the sound source from 0° to 180° in increments of 7.5°, and calculating the 25 coherence functions of the double-microphone signals under ideal conditions;
the formula for the coherence function is:
Γ(ω, θ) = exp(j·ω·τ·cos(θ))
wherein d is the distance between the two microphones; c = 340 m/s is the speed of sound; θ is the azimuth of the sound source; ω is the angular frequency; and fs is the sampling rate;
step 1-2) respectively extracting real parts and imaginary parts of 25 coherent functions;
the real part is: cos (ω · τ · cos (θ)); the imaginary part is sin (ω · τ · cos (θ)); wherein,
τ = d·fs/c
step 1-3) using the K-nearest-neighbour (KNN) classification algorithm to obtain the coherence-function feature quantity of each angle: the imaginary and real parts are classified, with each angle corresponding to one class (25 classes in total) and class labels set to 1-25, thereby establishing the coherence-function feature database.
In the above technical solution, the step 2) specifically includes:
step 2-1) framing and windowing the signals received by the two microphones, then transforming each frame into the frequency domain by FFT; the transformed signals are denoted X1(λ, μ) and X2(λ, μ), where λ is the time-frame index and μ is the frequency bin;
step 2-2) calculating a coherent function of the two signals on a frequency domain;
the formula for the calculation of the coherence function is:
Γ_X1X2(λ, μ) = P_X1X2(λ, μ) / √(P_X1X1(λ, μ)·P_X2X2(λ, μ))
where P_X1X1(λ, μ) is the auto-power spectrum of signal X1(λ, μ), P_X2X2(λ, μ) is the auto-power spectrum of signal X2(λ, μ), and P_X1X2(λ, μ) is the cross-power spectrum of the two signals:
P_X1X1(λ, μ) = α·P_X1X1(λ-1, μ) + (1-α)·|X1(λ, μ)|²
P_X2X2(λ, μ) = α·P_X2X2(λ-1, μ) + (1-α)·|X2(λ, μ)|²
P_X1X2(λ, μ) = α·P_X1X2(λ-1, μ) + (1-α)·X1(λ, μ)·X2*(λ, μ)
(the cross-power spectrum uses the conjugate product X1·X2*, not a squared modulus, so that the coherence function retains its phase)
wherein α is a smoothing factor;
step 2-3) calculating the modulus |Γ_X1X2(λ, μ)| of the coherence function, and performing peak smoothing at each frequency bin to obtain the peak-smoothed coherence function;
step 2-4) extracting, in the frequency domain, the real and imaginary parts of the peak-smoothed coherence function, and matching these two quantities against the coherence-function feature database using the KNN classification algorithm, so that each signal frame yields a matching result; the matching result is the index of a class label, from which the corresponding azimuth θ0 is obtained; the delay difference Time is then:
Time = d·fs·cos(θ0)/c
in the above technical solution, the value of the smoothing factor α in the step 2-2) is 0.68.
In the above technical solution, the specific implementation process of step 2-3) is as follows:
computing the modulus value | Γ of the coherence functionX1X2(lambda, mu) l, and acquiring the Peak value Peak (lambda, mu) of each frequency point; if | ΓX1X2(lambda, mu) is greater than | Peak (lambda, mu) |, Peak (lambda, mu) is smoothed, and the smoothed Peak value
Figure BDA0001202577780000034
Comprises the following steps:
Figure BDA0001202577780000035
wherein alpha is1The value is 0.35;
if | ΓX1X2(λ, μ) | is smaller than | Peak (λ, μ) |, the smoothed Peak value
Figure BDA0001202577780000036
Comprises the following steps:
Figure BDA0001202577780000037
wherein alpha is2The value is 0.95, and the initial value of Peak (lambda, mu) is equal to gammaX1X2(1, μ), the initial value is equal to the coherence function of the first frame of speech.
The invention has the advantage that, by extracting frame by frame the frequency bins where the modulus of the coherence function is large, the method effectively suppresses interference from background noise and reverberation and improves the accuracy of double-microphone delay-difference estimation.
Drawings
FIG. 1 is a schematic diagram of the two-microphone delay-difference estimation scenario of the present invention;
FIG. 2 is a schematic diagram of the present invention for building an ideal frequency domain coherence function database;
FIG. 3 is a schematic representation of the real and imaginary parts of an ideal coherence function at three different azimuths;
fig. 4 is a flow chart of step 2) of the method of the invention.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments.
The delay estimation method based on the frequency-domain coherence function estimates the delay difference from the coherence function obtained in real time in the frequency domain. The modulus of the coherence function (1 for the target direct sound, smaller for background noise and for speech segments strongly affected by reverberation) is used to judge the reliability of each frequency bin; concretely, a peak-smoothing function is used to obtain a reliable coherence function, which filters out the influence of background noise and reverberant sound to a certain extent. The result is then matched against the pre-built database of ideal coherence-function features at different angles; the best match gives the current angle, from which the delay difference follows. The method is suitable for devices performing double-microphone sound-source localization, separation, and the like.
FIG. 1 is a schematic diagram of the two-microphone delay-difference estimation scenario: the two microphones are separated by a distance d, the sound source lies at azimuth θ relative to the microphone pair, and the delay difference between the two microphones is:
τ·cos(θ) = d·fs·cos(θ)/c
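As a numerical illustration (not part of the patent text), the delay difference implied by this geometry can be sketched in Python; the microphone distance 0.255 m and sampling rate 16 kHz are taken from the embodiment, and the function name is illustrative:

```python
import math

def delay_difference_samples(d, theta_deg, fs=16000, c=340.0):
    """Far-field delay difference between two microphones, in samples:
    Time = d * fs * cos(theta) / c."""
    return d * fs * math.cos(math.radians(theta_deg)) / c

# End-fire source (theta = 0): maximum delay of d*fs/c = 12 samples for d = 0.255 m.
print(delay_difference_samples(0.255, 0.0))  # 12.0
```

A broadside source (θ = 90°) gives a delay of zero, and azimuths beyond 90° give negative delays, which is why azimuths from 0° to 180° can be distinguished by the sign and magnitude of the delay.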
A two-microphone time delay difference estimation method based on a frequency domain coherence function comprises the following steps:
step 1) calculating, without the two microphones receiving actual sound-source signals (i.e. theoretically, under ideal conditions), the coherence functions at different azimuth angles, extracting the real and imaginary parts of the coherence function corresponding to each angle, and establishing a database of coherence-function feature quantities;
as shown in fig. 2, the step 1) specifically includes:
step 1-1) varying the azimuth angle of the sound source from 0° to 180° in increments of 7.5°, and calculating the 25 coherence functions of the double-microphone signals under ideal conditions;
the formula for the coherence function is:
Γ(ω, θ) = exp(j·ω·τ·cos(θ))
wherein d is the distance between the two microphones; c = 340 m/s is the speed of sound; θ is the azimuth of the sound source; ω is the angular frequency; and fs is the sampling rate, which is 16000 Hz in this embodiment.
The azimuth angle of the sound source has 25 values, and 25 different coherence functions can be obtained in total.
Step 1-2) respectively extracting real parts and imaginary parts of 25 coherent functions;
the real part is: cos (ω · τ · cos (θ));
the imaginary part is sin (ω · τ · cos (θ));
wherein,
τ = d·fs/c
from the real part and imaginary part formulas, in an ideal environment (only a target sound source, no interference of background noise and reverberation), the modulus of the coherence function is:
|Γ(ω, θ)| = √(cos²(ω·τ·cos(θ)) + sin²(ω·τ·cos(θ))) = 1
step 1-3) using the K-nearest-neighbour (KNN) classification algorithm to obtain the coherence-function feature quantity of each angle: the imaginary and real parts are classified, with each angle corresponding to one class (25 classes in total) and class labels set to 1-25, thereby establishing the coherence-function feature database.
Fig. 3 shows the real and imaginary parts of the ideal coherence function at three different azimuth angles; the microphone distance in this figure is 0.255 m. As can be seen from Fig. 3, the real and imaginary parts of the coherence function differ markedly between azimuths, and the invention uses these characteristics of the real and imaginary parts at different azimuths to make the classification decision.
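The database-building step above (steps 1-1 to 1-3) can be sketched as follows. This is a minimal illustration under the embodiment's parameters (d = 0.255 m, fs = 16 kHz, 512-point FFT); the function name and the [Re, Im] feature layout are my own choices, not the patent's:

```python
import numpy as np

def ideal_coherence_db(d=0.255, fs=16000, c=340.0, nfft=512, step_deg=7.5):
    """Ideal free-field coherence Gamma(omega, theta) = exp(j*omega*tau*cos(theta)),
    with tau = d*fs/c in samples; one [Re, Im] feature vector per azimuth."""
    tau = d * fs / c
    thetas = np.arange(0.0, 180.0 + step_deg / 2, step_deg)  # 0..180 deg, 25 classes
    omega = 2.0 * np.pi * np.arange(nfft // 2 + 1) / nfft    # normalized angular frequency
    feats = []
    for th in thetas:
        g = np.exp(1j * omega * tau * np.cos(np.radians(th)))
        feats.append(np.concatenate([g.real, g.imag]))       # feature vector of this class
    return thetas, np.vstack(feats)

thetas, db = ideal_coherence_db()
print(len(thetas))  # 25
```

Each row of the database has unit modulus at every frequency bin, matching the ideal-condition property |Γ| = 1 derived above.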
Step 2) convert the signals received by the two microphones into the frequency domain, calculate the coherence function of the two signals, perform peak smoothing at each frequency bin according to the modulus of the coherence function, then calculate the real and imaginary parts of the coherence function, match them in the coherence-function feature database to obtain the sound-source azimuth, and calculate the delay difference.
As shown in fig. 4, the step 2) specifically includes:
step 2-1) framing and windowing the signals received by the two microphones, then transforming each frame into the frequency domain by FFT; the transformed signals are denoted X1(λ, μ) and X2(λ, μ), where λ is the time-frame index and μ is the frequency bin;
in the embodiment, the received signals x1, x2 of the two microphones are subjected to framing, windowing and FFT conversion; at a sampling rate of 16000Hz, 512 samples per frame and 128 points are shifted.
Step 2-2) calculating a coherent function of the two signals on a frequency domain;
the formula for the calculation of the coherence function is:
Γ_X1X2(λ, μ) = P_X1X2(λ, μ) / √(P_X1X1(λ, μ)·P_X2X2(λ, μ))
where P_X1X1(λ, μ) is the auto-power spectrum of signal X1(λ, μ), P_X2X2(λ, μ) is the auto-power spectrum of signal X2(λ, μ), and P_X1X2(λ, μ) is the cross-power spectrum of the two signals:
P_X1X1(λ, μ) = α·P_X1X1(λ-1, μ) + (1-α)·|X1(λ, μ)|²
P_X2X2(λ, μ) = α·P_X2X2(λ-1, μ) + (1-α)·|X2(λ, μ)|²
P_X1X2(λ, μ) = α·P_X1X2(λ-1, μ) + (1-α)·X1(λ, μ)·X2*(λ, μ)
where α is a smoothing factor taking the value 0.68; the cross-power spectrum uses the conjugate product X1·X2*, not a squared modulus, so that the coherence function retains its phase.
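A minimal sketch of step 2-2's recursively smoothed spectra and coherence, with α = 0.68 as in the patent; the small floor added under the square root is my own numerical safeguard, not part of the patent:

```python
import numpy as np

def coherence_frames(X1, X2, alpha=0.68):
    """Recursively smoothed auto-/cross-power spectra and per-frame coherence
    Gamma = P12 / sqrt(P11 * P22). The cross spectrum uses X1 * conj(X2) so
    that Gamma keeps its phase (real and imaginary parts)."""
    n_frames, n_bins = X1.shape
    P11 = np.abs(X1[0]) ** 2
    P22 = np.abs(X2[0]) ** 2
    P12 = X1[0] * np.conj(X2[0])
    G = np.empty((n_frames, n_bins), dtype=complex)
    G[0] = P12 / np.sqrt(P11 * P22 + 1e-20)
    for lam in range(1, n_frames):
        P11 = alpha * P11 + (1 - alpha) * np.abs(X1[lam]) ** 2
        P22 = alpha * P22 + (1 - alpha) * np.abs(X2[lam]) ** 2
        P12 = alpha * P12 + (1 - alpha) * X1[lam] * np.conj(X2[lam])
        G[lam] = P12 / np.sqrt(P11 * P22 + 1e-20)
    return G

# Two identical channels are perfectly coherent: |Gamma| ~= 1 in every bin.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 257)) + 1j * rng.standard_normal((10, 257))
print(np.abs(coherence_frames(X, X)).max() <= 1.0)  # True
```

This illustrates the property the invention relies on: direct sound drives the coherence modulus toward 1, while uncorrelated noise between the channels drives it below 1.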
Step 2-3) compute the modulus |Γ_X1X2(λ, μ)| of the coherence function and perform peak smoothing at each frequency bin:
track the peak value Peak(λ, μ) of each frequency bin; if |Γ_X1X2(λ, μ)| is greater than Peak(λ-1, μ), the smoothed peak is:
Peak(λ, μ) = α1·Peak(λ-1, μ) + (1-α1)·|Γ_X1X2(λ, μ)|
where α1 takes the value 0.35;
if |Γ_X1X2(λ, μ)| is smaller than Peak(λ-1, μ), the smoothed peak is:
Peak(λ, μ) = α2·Peak(λ-1, μ) + (1-α2)·|Γ_X1X2(λ, μ)|
where α2 takes the value 0.95; the initial value Peak(1, μ) equals |Γ_X1X2(1, μ)|, the coherence modulus of the first speech frame.
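The attack/decay behaviour of this peak tracker can be illustrated per frequency bin. The recursion form below is reconstructed from the α1/α2 description above (the patent's equation images are not reproduced here), so treat it as a sketch:

```python
def smooth_peak(prev_peak, coh_mod, alpha1=0.35, alpha2=0.95):
    """One frame of the per-bin peak tracker: fast attack (alpha1 = 0.35)
    when the coherence modulus rises above the peak, slow decay
    (alpha2 = 0.95) when it falls below."""
    if coh_mod > prev_peak:
        return alpha1 * prev_peak + (1 - alpha1) * coh_mod
    return alpha2 * prev_peak + (1 - alpha2) * coh_mod

# Rising coherence is tracked quickly; falling coherence is released slowly.
p = smooth_peak(0.2, 0.9)   # attack: 0.35*0.2 + 0.65*0.9 = 0.655
p = smooth_peak(p, 0.1)     # decay:  0.95*0.655 + 0.05*0.1 = 0.62725
print(round(p, 5))  # 0.62725
```

The asymmetry (α1 ≪ α2) is what keeps the smoothed peak dominated by direct-sound frames: brief high-coherence bursts raise it immediately, while noisy low-coherence frames pull it down only slowly.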
Step 2-4) extract, in the frequency domain, the real and imaginary parts of the peak-smoothed coherence function, and match these two quantities against the coherence-function feature database using the KNN classification algorithm; each signal frame yields a matching result. The matching result is the index of a class label, from which the corresponding azimuth θ0 is obtained; the delay difference is then:
Time = d·fs·cos(θ0)/c
the invention fully utilizes the characteristics of a real part and an imaginary part of a frequency domain coherent function to construct a characteristic vector to classify sound sources with different azimuth angles, and the algorithm realizes the azimuth judgment of each frame of voice signals through off-line modeling and on-line prediction, namely delay difference estimation. Meanwhile, the algorithm considers the interference problem under the background noise and reverberation environment, utilizes the module value of the frequency domain coherent function to smooth the peak value of the coherent function, and effectively ensures the effectiveness of the characteristic vector at the online prediction stage. The algorithm is clear in idea, simple and effective. The method is convenient to realize in real time in devices such as microphone arrays and the like.
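The on-line prediction stage (step 2-4) can be sketched with a plain nearest-neighbour match, a K = 1 simplification of the KNN classifier described above; function names and the feature layout are illustrative, and the embodiment values d = 0.255 m, fs = 16 kHz are assumed:

```python
import numpy as np

def match_azimuth(coh_frame, thetas, db):
    # Nearest-neighbour match (K = 1 stand-in for the KNN step) of one frame's
    # [Re, Im] coherence features against the ideal database rows.
    feat = np.concatenate([coh_frame.real, coh_frame.imag])
    label = int(np.argmin(np.linalg.norm(db - feat, axis=1)))
    return thetas[label]

def delay_difference(theta0_deg, d=0.255, fs=16000, c=340.0):
    # Time = d * fs * cos(theta0) / c, in samples.
    return d * fs * np.cos(np.radians(theta0_deg)) / c

print(delay_difference(0.0))  # 12.0
```

Each frame's matched label indexes an azimuth in the database built off-line, and the closed-form delay formula then converts that azimuth into the delay difference.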
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the invention, not to limit them. Although the invention has been described in detail with reference to the embodiments, those skilled in the art will understand that various changes may be made and equivalents substituted without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (1)

1. A two-microphone time delay difference estimation method based on a frequency domain coherence function, the method comprising:
step 1) calculating, without the two microphones receiving actual sound-source signals (i.e. theoretically, under ideal conditions), the coherence functions at different azimuth angles, extracting the real and imaginary parts of the coherence function corresponding to each angle, and establishing a database of coherence-function feature quantities;
step 2) converting the signals received by the two microphones into the frequency domain, calculating the coherence function of the two signals in the frequency domain, performing peak smoothing at each frequency bin according to the modulus of the coherence function, then calculating the real and imaginary parts of the coherence function, matching them in the coherence-function feature database to obtain the sound-source azimuth, and calculating the delay difference;
the step 1) specifically comprises the following steps:
step 1-1) varying the azimuth angle of the sound source from 0° to 180° in increments of 7.5°, and calculating the 25 coherence functions of the double-microphone signals under ideal conditions;
the formula for the coherence function is:
Γ(ω, θ) = exp(j·ω·τ·cos(θ))
wherein d is the distance between the two microphones; c is 340m/s, theta is the azimuth angle of the sound source, omega is the angular frequency, and fs is the sampling rate;
step 1-2) respectively extracting real parts and imaginary parts of 25 coherent functions;
the real part is: cos (ω · τ · cos (θ)); the imaginary part is sin (ω · τ · cos (θ)); wherein,
τ = d·fs/c
step 1-3) using the K-nearest-neighbour (KNN) classification algorithm to obtain the coherence-function feature quantity of each angle: the imaginary and real parts are classified, with each angle corresponding to one class (25 classes in total) and class labels set to 1-25, thereby establishing the coherence-function feature database;
the step 2) specifically comprises the following steps:
step 2-1) framing and windowing the signals received by the two microphones, then transforming each frame into the frequency domain by FFT; the transformed signals are denoted X1(λ, μ) and X2(λ, μ), where λ is the time-frame index and μ is the frequency bin;
step 2-2) calculating a coherent function of the two signals on a frequency domain;
the formula for the calculation of the coherence function is:
Γ_X1X2(λ, μ) = P_X1X2(λ, μ) / √(P_X1X1(λ, μ)·P_X2X2(λ, μ))
where P_X1X1(λ, μ) is the auto-power spectrum of signal X1(λ, μ), P_X2X2(λ, μ) is the auto-power spectrum of signal X2(λ, μ), and P_X1X2(λ, μ) is the cross-power spectrum of the two signals:
P_X1X1(λ, μ) = α·P_X1X1(λ-1, μ) + (1-α)·|X1(λ, μ)|²
P_X2X2(λ, μ) = α·P_X2X2(λ-1, μ) + (1-α)·|X2(λ, μ)|²
P_X1X2(λ, μ) = α·P_X1X2(λ-1, μ) + (1-α)·X1(λ, μ)·X2*(λ, μ)
wherein α is a smoothing factor;
step 2-3) calculating the modulus |Γ_X1X2(λ, μ)| of the coherence function, and performing peak smoothing at each frequency bin to obtain the peak-smoothed coherence function;
step 2-4) extracting, in the frequency domain, the real and imaginary parts of the peak-smoothed coherence function, and matching these two quantities against the coherence-function feature database using the KNN classification algorithm, so that each signal frame yields a matching result; the matching result is the index of a class label, from which the corresponding azimuth θ0 is obtained; the delay difference Time is then:
Time = d·fs·cos(θ0)/c
the value of the smoothing factor alpha in the step 2-2) is 0.68;
the specific implementation process of the step 2-3) is as follows:
computing the modulus value | Γ of the coherence functionX1X2(lambda, mu) l, and acquiring the Peak value Peak (lambda, mu) of each frequency point; if | ΓX1X2(lambda, mu) is greater than | Peak (lambda, mu) |, Peak (lambda, mu) is smoothed, and the smoothed Peak value
Figure FDA0002964018620000024
Comprises the following steps:
Figure FDA0002964018620000025
wherein alpha is1The value is 0.35;
if | ΓX1X2(λ, μ) | is smaller than | Peak (λ, μ) |, the smoothed Peak value
Figure FDA0002964018620000026
Comprises the following steps:
Figure FDA0002964018620000027
wherein alpha is2The value is 0.95, and the initial value of Peak (lambda, mu) is equal to gammaX1X2(1, μ), the initial value is equal to the coherence function of the first frame of speech.
CN201710004194.6A 2017-01-04 2017-01-04 Double-microphone time delay difference estimation method based on frequency domain coherent function Active CN108269581B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710004194.6A CN108269581B (en) 2017-01-04 2017-01-04 Double-microphone time delay difference estimation method based on frequency domain coherent function


Publications (2)

Publication Number Publication Date
CN108269581A CN108269581A (en) 2018-07-10
CN108269581B (en) 2021-06-08

Family

ID=62771665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710004194.6A Active CN108269581B (en) 2017-01-04 2017-01-04 Double-microphone time delay difference estimation method based on frequency domain coherent function

Country Status (1)

Country Link
CN (1) CN108269581B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112992176B (en) * 2020-09-30 2022-07-19 北京海兰信数据科技股份有限公司 Ship acoustic signal identification method and device
CN112526452B (en) * 2020-11-24 2024-08-06 杭州萤石软件有限公司 Sound source detection method, pan-tilt camera, intelligent robot and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101494522A (en) * 2008-12-30 2009-07-29 清华大学 Method for eliminating wireless signal interference based on network encode
CN102854494A (en) * 2012-08-08 2013-01-02 Tcl集团股份有限公司 Sound source locating method and device
CN103076593A (en) * 2012-12-28 2013-05-01 中国科学院声学研究所 Sound source localization method and device
CN104101871A (en) * 2013-04-15 2014-10-15 中国科学院声学研究所 Narrowband interference suppression method and narrowband interference suppression system used for passive synthetic aperture
JP2016156944A (en) * 2015-02-24 2016-09-01 日本電信電話株式会社 Model estimation device, target sound enhancement device, model estimation method, and model estimation program

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9354310B2 (en) * 2011-03-03 2016-05-31 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for source localization using audible sound and ultrasound


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A BINAURAL SOUND SOURCE LOCALIZATION MODEL BASED ON TIME-DELAY COMPENSATION AND INTERAURAL COHERENCE;Hong Liu, Jie Zhang;《ICASSP》;20141231;第1424-1428页 *
A binaural speech enhancement algorithm: Application to background and directional noise fields;Yi Fang, Youyuan Chen, Haihong Feng;《CISP 2015》;20151231;第1261-1265页 *
A binaural speech enhancement algorithm for suppressing directional noise; Yi Fang, Haihong Feng, Youyuan Chen, Xiaocheng Hu; Acta Acustica (《声学学报》); 20161130; Vol. 41, No. 6; pp. 897-904 *
A frequency-domain adaptive maximum-likelihood time delay estimation algorithm; Huawei Chen, Junwei Zhao, Yecai Guo; Systems Engineering and Electronics (《系统工程与电子技术》); 20031130; Vol. 25, No. 11; pp. 1355-1361 *

Also Published As

Publication number Publication date
CN108269581A (en) 2018-07-10

Similar Documents

Publication Publication Date Title
CN106251877B Voice sound source direction estimation method and device
CN108731886B A multi-leak-point acoustic localization method for water supply pipelines based on iterative recursion
WO2019080551A1 (en) Target voice detection method and apparatus
CN111429939B (en) Sound signal separation method of double sound sources and pickup
CN110534126B (en) Sound source positioning and voice enhancement method and system based on fixed beam forming
CN102411138A (en) Method for positioning sound source by robot
Pavlidi et al. Real-time multiple sound source localization using a circular microphone array based on single-source confidence measures
CN111239687A (en) Sound source positioning method and system based on deep neural network
CN111044973A (en) MVDR target sound source directional pickup method for microphone matrix
CN109188362A A microphone-array sound source localization signal processing method
Ren et al. A novel multiple sparse source localization using triangular pyramid microphone array
CN103901400B A binaural sound source localization method based on delay compensation and binaural coherence
CN109212481A A method for sound source localization using a microphone array
CN108269581B (en) Double-microphone time delay difference estimation method based on frequency domain coherent function
Imran et al. A methodology for sound source localization and tracking: Development of 3D microphone array for near-field and far-field applications
Wang et al. Pseudo-determined blind source separation for ad-hoc microphone networks
Hu et al. Decoupled direction-of-arrival estimations using relative harmonic coefficients
WO2022042864A1 (en) Method and apparatus for measuring directions of arrival of multiple sound sources
Hu et al. Evaluation and comparison of three source direction-of-arrival estimators using relative harmonic coefficients
Grondin et al. A study of the complexity and accuracy of direction of arrival estimation methods based on GCC-PHAT for a pair of close microphones
Nikunen et al. Time-difference of arrival model for spherical microphone arrays and application to direction of arrival estimation
Firoozabadi et al. Combination of nested microphone array and subband processing for multiple simultaneous speaker localization
Dang et al. Multiple sound source localization based on a multi-dimensional assignment model
Deleforge et al. Audio-motor integration for robot audition
Sledevič et al. An evaluation of hardware-software design for sound source localization based on SoC

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant