WO2018086444A1 - Method for estimating signal-to-noise ratio for noise suppression, and user terminal - Google Patents

Method for estimating signal-to-noise ratio for noise suppression, and user terminal

Info

Publication number
WO2018086444A1
WO2018086444A1 PCT/CN2017/106502 CN2017106502W WO2018086444A1 WO 2018086444 A1 WO2018086444 A1 WO 2018086444A1 CN 2017106502 W CN2017106502 W CN 2017106502W WO 2018086444 A1 WO2018086444 A1 WO 2018086444A1
Authority
WO
WIPO (PCT)
Prior art keywords
noise ratio
estimated
audio frame
current audio
signal
Prior art date
Application number
PCT/CN2017/106502
Other languages
French (fr)
Chinese (zh)
Inventor
谢单辉
Original Assignee
电信科学技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 电信科学技术研究院 filed Critical 电信科学技术研究院
Publication of WO2018086444A1 publication Critical patent/WO2018086444A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise

Definitions

  • the present disclosure relates to the field of voice technologies, and in particular to a signal-to-noise ratio estimation method for noise suppression and a user terminal.
  • a single-microphone noise reduction method is generally used in a user terminal to perform noise reduction on an audio signal.
  • the method mainly includes the following steps:
  • the noisy speech is transformed into the frequency-domain signal Y by the fast Fourier transform (FFT);
  • the noise-reduced frequency-domain signal is transformed back into a time-domain signal by the inverse fast Fourier transform (IFFT).
  • the a priori signal-to-noise ratio is estimated using a direct decision (decision-directed) method, that is, estimated by the following formula:
  • where $\hat{\xi}(m)$ is the estimate of the a priori signal-to-noise ratio of the current frame, and the smoothing number $\alpha$ usually needs to be close to 1, specifically 0.95 to 1.
  • because $\alpha$ is close to 1, the estimated a priori signal-to-noise ratio is heavily biased towards the noise reduction result of the previous frame, and the noise-reduced power of the previous frame can be seen as the instantaneous value of the speech variance of the previous frame. Therefore, the a priori signal-to-noise ratio $\hat{\xi}(m)$ estimated by the above formula is not really an estimate of the signal-to-noise ratio $\xi(m)$ of the current frame, and is better regarded as an estimate of the a priori signal-to-noise ratio $\xi(m-1)$ of the previous frame. The a priori signal-to-noise ratio currently estimated for the current audio frame therefore correlates poorly with that frame, which is detrimental to noise suppression of the current audio frame.
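The one-frame lag described above can be illustrated with a minimal Python sketch of the widely used decision-directed estimator. The patent's own formula is not preserved in this text, so the function below follows the standard textbook form; the function name, argument names, and the numeric values are illustrative assumptions.

```python
def decision_directed_snr(prev_clean_power, noise_var, post_snr, alpha=0.98):
    """Standard decision-directed a priori SNR estimate for one frequency bin.

    prev_clean_power: squared magnitude of the previous frame's noise-reduced
                      spectrum (assumed name, not from the patent text)
    noise_var:        noise variance estimate for this bin
    post_snr:         a posteriori SNR of the current frame, |Y|^2 / noise_var
    alpha:            smoothing number, usually 0.95 to 1
    """
    # Second half: maximum likelihood term, floored at zero.
    ml_term = max(post_snr - 1.0, 0.0)
    # First half: previous-frame result, weighted heavily when alpha is near 1.
    return alpha * prev_clean_power / noise_var + (1.0 - alpha) * ml_term


# Speech onset: the previous frame was almost pure noise, the current frame is not.
noise_var = 1.0
prev_clean_power = 0.01
post_snr = 11.0
xi = decision_directed_snr(prev_clean_power, noise_var, post_snr, alpha=0.98)
instantaneous = post_snr - 1.0
print(xi, instantaneous)
```

With a smoothing number of 0.98 the estimate stays far below the instantaneous value at the onset, which is the poor correlation with the current frame that the disclosure sets out to fix.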
  • the purpose of the present disclosure is to provide a signal-to-noise ratio estimation method for noise suppression and a user terminal, which solve the problem that the a priori signal-to-noise ratio currently estimated for the current audio frame correlates poorly with that frame, which is detrimental to noise suppression of the current audio frame.
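  • an embodiment of the present disclosure provides a method for estimating an a priori signal-to-noise ratio, including:
  • estimating an estimated a priori signal-to-noise ratio of the current audio frame;
  • calculating, according to the estimated a priori signal-to-noise ratio, an estimated value of the minimum mean square error (MMSE) corresponding to the estimated a priori signal-to-noise ratio of the current audio frame;
  • calculating a speech presence probability of the current audio frame;
  • estimating a final a priori signal-to-noise ratio of the current audio frame by combining the speech presence probability and the estimated value.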
  • optionally, estimating the estimated a priori signal-to-noise ratio of the current audio frame includes: estimating it based on the a posteriori signal-to-noise ratio estimate of the current audio frame.
  • optionally, estimating the estimated a priori signal-to-noise ratio based on the a posteriori signal-to-noise ratio estimate includes: estimating the estimated a priori signal-to-noise ratio of the current audio frame by either of the following two formulas:
  • optionally, the method further includes adjusting the smoothing number required to estimate the estimated a priori signal-to-noise ratio by the following formula:
  • where $a_1$ and $a_2$ are two preset smoothing numbers with $a_1 > a_2$, and the two thresholds (subscript th) are empirical values.
  • optionally, the step of estimating the estimated a priori signal-to-noise ratio of the current audio frame further includes: further estimating it, based on the speech presence probability, by the following formula:
  • where the formula involves the estimated a priori signal-to-noise ratio, the two a priori signal-to-noise ratio estimates of the current audio frame obtained with smoothing numbers $a_1$ and $a_2$ respectively, and $p$, the speech presence probability of the current audio frame.
  • optionally, calculating, according to the estimated a priori signal-to-noise ratio, the estimated value of the minimum mean square error corresponding to the estimated a priori signal-to-noise ratio of the current audio frame includes:
  • where the formula involves three quantities: the estimated value of the minimum mean square error corresponding to the estimated a priori signal-to-noise ratio, the estimated a priori signal-to-noise ratio itself, and the a posteriori signal-to-noise ratio estimate of the current audio frame.
  • the calculating a voice existence probability of the current audio frame includes:
  • $p(H_1 \mid Y)$ represents the speech presence probability;
  • $p(H_1)$ and $p(H_0)$ represent the a priori speech presence probability and the a priori speech absence probability, respectively;
  • exp() is the exponential function;
  • $\xi_{min}$ and $\xi_{max}$ are two empirical values with $\xi_{min} < \xi_{max}$;
  • $p_{max}$ and $p_{min}$ are two empirical values.
  • the estimating the final a priori signal to noise ratio of the current audio frame by combining the voice existence probability and the estimated value including:
  • the final a priori signal to noise ratio of the current audio frame is estimated by the following formula:
  • the embodiment of the present disclosure further provides a user terminal, including:
  • a first estimating module configured to estimate an estimated a priori signal to noise ratio of the current audio frame
  • a first calculating module configured to calculate an estimated value of the MMSE corresponding to the estimated a priori signal to noise ratio of the current audio frame according to the estimated a priori signal to noise ratio;
  • a second calculating module configured to calculate a voice existence probability of the current audio frame
  • a second estimating module configured to estimate a final a priori signal to noise ratio of the current audio frame in combination with the voice presence probability and the estimated value.
  • the first estimation module is configured to estimate an estimated a priori signal to noise ratio of the current audio frame based on the a posteriori signal to noise ratio estimation value of the current audio frame.
  • the first estimation module is configured to estimate an estimated a priori signal-to-noise ratio of the current audio frame by using either of the following two formulas:
  • the user terminal further includes:
  • An adjustment module for adjusting a smoothing number required to estimate the estimated a priori signal to noise ratio by the following formula:
  • where $a_1$ and $a_2$ are two preset smoothing numbers with $a_1 > a_2$, and the two thresholds (subscript th) are empirical values.
  • the first estimation module is further configured to further estimate an estimated a priori signal to noise ratio of the current audio frame by using the following formula:
  • the first calculating module is configured to calculate, according to the estimated a priori signal to noise ratio, an estimated value of the MMSE corresponding to the estimated a priori signal to noise ratio of the current audio frame by using:
  • where the formula involves three quantities: the estimated value of the minimum mean square error corresponding to the estimated a priori signal-to-noise ratio, the estimated a priori signal-to-noise ratio itself, and the a posteriori signal-to-noise ratio estimate of the current audio frame.
  • the second calculating module is configured to calculate a voice existence probability of the current audio frame by using the following formula:
  • $p(H_1 \mid Y)$ represents the speech presence probability;
  • $p(H_1)$ and $p(H_0)$ represent the a priori speech presence probability and the a priori speech absence probability, respectively;
  • exp() is the exponential function;
  • $\xi_{min}$ and $\xi_{max}$ are two empirical values with $\xi_{min} < \xi_{max}$;
  • $p_{max}$ and $p_{min}$ are two empirical values.
  • the second estimation module is configured to estimate a final a priori signal to noise ratio of the current audio frame by using the following formula:
  • the embodiment of the present disclosure further provides a user terminal, including: a processor, a memory, and a transceiver, where:
  • the processor is configured to read a program in the memory and perform the following process:
  • the transceiver is configured to receive and transmit data
  • the memory is capable of storing data used by the processor when performing operations.
  • the final a priori signal-to-noise ratio is estimated by combining the speech presence probability of the current frame with the minimum mean square error estimate corresponding to the estimated a priori signal-to-noise ratio of the current audio frame. Compared with the prior art, which estimates from the signal-to-noise ratio of the previous frame, the a priori signal-to-noise ratio estimated by the embodiments of the present disclosure correlates more strongly with the current audio frame, thereby facilitating noise suppression of the current audio frame.
  • FIG. 1 is a schematic flowchart diagram of a noise suppression signal to noise ratio estimation method according to an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of another noise suppression signal to noise ratio estimation method according to an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of experimental data of a noise suppression signal to noise ratio estimation method according to an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of another experimental data of a noise suppression signal to noise ratio estimation method according to an embodiment of the present disclosure
  • FIG. 5 is a schematic diagram of another experimental data of a noise suppression signal to noise ratio estimation method according to an embodiment of the present disclosure
  • FIG. 6 is a schematic structural diagram of a user terminal according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of another user terminal according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of another user terminal according to an embodiment of the present disclosure.
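  • an embodiment of the present disclosure provides a signal-to-noise ratio estimation method for noise suppression, as shown in FIG. 1, including the following steps:
  • step 101: estimating an estimated a priori signal-to-noise ratio of the current audio frame;
  • step 102: calculating, according to the estimated a priori signal-to-noise ratio, an estimated value of the minimum mean square error (MMSE) corresponding to the estimated a priori signal-to-noise ratio of the current audio frame;
  • step 103: calculating a speech presence probability of the current audio frame;
  • step 104: estimating a final a priori signal-to-noise ratio of the current audio frame by combining the speech presence probability and the estimated value.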
  • the current audio frame may be a current frame collected by a microphone of the user terminal, and the current frame may be a voice frame or a noise frame.
  • the above-mentioned estimated a priori signal-to-noise ratio may be an a priori signal-to-noise ratio estimated by a direct decision method or a maximum likelihood method.
  • the estimated value of the MMSE corresponding to the estimated a priori signal-to-noise ratio may be the estimate obtained by applying the MMSE algorithm to the estimated a priori signal-to-noise ratio.
  • the speech presence probability of the current audio frame may be calculated from the a posteriori signal-to-noise ratio of the current audio frame, or from an averaged or smoothed value obtained by combining the a posteriori signal-to-noise ratios of the same frequency bin over the previous several frames.
  • the order of step 101 and step 103 is not limited: step 103 may be performed before step 101, or step 101 may be performed before step 103.
  • the final a priori signal-to-noise ratio of the current audio frame may be understood as the a priori signal-to-noise ratio used for gain calculation when performing noise reduction on the audio frame, or as the a priori signal-to-noise ratio of the current audio frame output by the embodiments of the present disclosure.
  • estimating the final a priori signal-to-noise ratio of the current audio frame according to the speech presence probability and the estimated value may be: determining from the speech presence probability how likely the current audio frame is to be a speech frame. When the current audio frame is determined to be a pure noise frame, the final a priori signal-to-noise ratio is set to a stable minimum, such as $\xi_{min}$, to ensure smooth processing of pure noise segments and reduce musical noise. When the current audio frame is determined to be a frame in a speech segment, the final a priori signal-to-noise ratio is biased toward the minimum mean square error estimate corresponding to the estimated a priori signal-to-noise ratio, so that the final a priori signal-to-noise ratio estimate is more accurate.
  • through the above steps, the final a priori signal-to-noise ratio is estimated from the speech presence probability of the current frame and the minimum mean square error estimate corresponding to the estimated a priori signal-to-noise ratio of the current audio frame, so the estimated a priori signal-to-noise ratio correlates more strongly with the current audio frame, which benefits noise suppression of the current audio frame and improves the noise suppression effect.
  • optionally, estimating the estimated a priori signal-to-noise ratio of the current audio frame includes: estimating it based on the a posteriori signal-to-noise ratio estimate of the current audio frame.
  • the a posteriori signal-to-noise ratio of the current audio frame is common knowledge and is not described in detail herein.
  • estimating the estimated a priori signal-to-noise ratio of the current audio frame based on the a posteriori signal-to-noise ratio estimate may be done with the direct decision method; of course, the embodiments of the present disclosure are not limited to this.
  • optionally, estimating the estimated a priori signal-to-noise ratio based on the a posteriori signal-to-noise ratio estimate includes: estimating the estimated a priori signal-to-noise ratio of the current audio frame by either of the following two formulas:
  • the estimated a priori signal-to-noise ratio can be estimated by either of the two formulas above. According to experiments, one of the two formulas works better for calculating the estimated a priori signal-to-noise ratio, mainly in that less musical noise remains, so in the embodiments of the present disclosure that formula is optionally used.
  • the smoothing number may be a preset value, for example a value from 0.95 to 1, or 0.98 or 0.3, which is not limited here; the noise variance is common knowledge and is not described in detail.
  • optionally, the foregoing method further includes adjusting the smoothing number required to estimate the estimated a priori signal-to-noise ratio by the following formula:
  • where $a_1$ and $a_2$ are two preset smoothing numbers with $a_1 > a_2$, and the two thresholds (subscript th) are empirical values.
  • the smoothing factor $\alpha$ needs to be as large as possible in pure noise, so that the estimated value is as stable as possible, and as small as possible in speech segments, to ensure fast tracking of the speech.
  • $a_1$ and $a_2$ may be 0.98 and 0.3, respectively. The embodiments of the present disclosure do not limit this; for example, they may be 0.95 and 0.28, and they may be adjusted according to actual conditions.
  • choosing between $a_1$ and $a_2$ in this way improves the accuracy of the estimated a priori signal-to-noise ratio.
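The selection formula itself is not preserved in this text, so the following Python sketch only illustrates the stated intent: a large smoothing number in pure noise and a small one in speech. The decision rule, the threshold names `gamma_th` and `xi_th`, and their default values are all assumptions made for illustration.

```python
A1 = 0.98  # large smoothing number, example value from the text
A2 = 0.3   # small smoothing number, example value from the text


def select_smoothing(post_snr, prev_prior_snr, gamma_th=2.0, xi_th=1.0):
    """Pick a smoothing number from two presets (hypothetical rule).

    A frame is treated as noise-like when both the a posteriori SNR and the
    previous a priori SNR fall below their empirical thresholds.
    """
    noise_like = post_snr < gamma_th and prev_prior_snr < xi_th
    return A1 if noise_like else A2


alpha_noise = select_smoothing(post_snr=1.0, prev_prior_snr=0.1)
alpha_speech = select_smoothing(post_snr=10.0, prev_prior_snr=0.1)
print(alpha_noise, alpha_speech)
```

A hard switch like this is the simplest reading; the disclosure also describes a soft combination driven by the speech presence probability.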
  • optionally, the step of estimating the estimated a priori signal-to-noise ratio of the current audio frame further includes: further estimating it, based on the speech presence probability, by the following formula:
  • where the formula involves the estimated a priori signal-to-noise ratio, the two a priori signal-to-noise ratio estimates of the current audio frame obtained with smoothing numbers $a_1$ and $a_2$ respectively, and $p$, the speech presence probability of the current audio frame.
  • in this way, the estimated a priori signal-to-noise ratio is switched according to the speech presence probability of the current audio frame, which improves its accuracy.
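As a hedged illustration of switching by speech presence probability, the sketch below blends two decision-directed estimates, one per smoothing number, with weight `p`. The convex combination is an assumption (the patent's formula is lost in this text), and all names are hypothetical.

```python
def dd_estimate(prev_clean_power, noise_var, post_snr, alpha):
    """Standard decision-directed estimate for one bin (textbook form)."""
    return alpha * prev_clean_power / noise_var + (1.0 - alpha) * max(post_snr - 1.0, 0.0)


def soft_switched_snr(prev_clean_power, noise_var, post_snr, p, a1=0.98, a2=0.3):
    """Blend the slow (a1) and fast (a2) estimates by speech presence probability p.

    Assumed rule: p near 1 favours the fast-tracking a2 estimate, p near 0
    favours the stable a1 estimate.
    """
    xi_a1 = dd_estimate(prev_clean_power, noise_var, post_snr, a1)
    xi_a2 = dd_estimate(prev_clean_power, noise_var, post_snr, a2)
    return p * xi_a2 + (1.0 - p) * xi_a1


xi_noise = soft_switched_snr(0.01, 1.0, 11.0, p=0.0)
xi_speech = soft_switched_snr(0.01, 1.0, 11.0, p=1.0)
print(xi_noise, xi_speech)
```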
  • optionally, calculating, according to the estimated a priori signal-to-noise ratio, the estimated value of the minimum mean square error corresponding to the estimated a priori signal-to-noise ratio of the current audio frame includes:
  • where the formula involves three quantities: the estimated value of the minimum mean square error corresponding to the estimated a priori signal-to-noise ratio, the estimated a priori signal-to-noise ratio itself, and the a posteriori signal-to-noise ratio estimate of the current audio frame.
  • the estimated a priori signal-to-noise ratio calculated in step 101 is not limited to the one calculated by the above-mentioned formula.
  • a super-Gaussian model of speech can also be used to calculate $E(X^2 \mid Y)$.
  • estimating the a priori signal-to-noise ratio is mainly a matter of estimating the variance of the speech signal. By definition, this variance depends only on the speech signal X. But X is not available, so most estimation algorithms have to estimate it from the noisy signal Y. This can also be seen from the direct decision method: the second half of its formula, $\gamma - 1$, is the maximum likelihood estimate of the speech variance when $\gamma$ is known (i.e., Y is known), and the first half uses the instantaneous value to replace $E(X^2)$.
  • in the embodiments of the present disclosure, a conditional expectation of $X^2$ given the noisy observation is employed to estimate the variance of speech. Based on this idea, it can be seen from the definition of the conditional expectation that it is actually the MMSE estimate of the squared speech amplitude spectrum $X^2$. Considering the probability p(H 1 |
  • the above formula can be derived from the complex Gaussian model or the super-Gaussian model.
  • the estimated value of the minimum mean square error corresponding to the estimated a priori signal-to-noise ratio may be calculated directly with the above formula, without carrying out the conditional-expectation derivation described above. That is, the conditional expectation is merely an explanation of the principle behind the embodiments of the present disclosure.
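Under the complex Gaussian model, the conditional expectation of the speech power given the noisy observation has a well-known closed form. The Python sketch below uses that textbook result, normalized by the noise variance; whether it matches the patent's lost formula term for term is an assumption, and the function name is hypothetical.

```python
def mmse_prior_snr(prior_snr, post_snr):
    """E(|X|^2 | Y) / noise variance under the complex Gaussian model.

    With Wiener gain G = xi / (1 + xi), the conditional mean of |X|^2 is
    G^2 * |Y|^2 + G * noise_var, so dividing by noise_var gives G^2 * gamma + G.
    """
    gain = prior_snr / (1.0 + prior_snr)
    return gain * gain * post_snr + gain


# When gamma equals its expected value 1 + xi, the refined estimate equals xi.
consistent = mmse_prior_snr(prior_snr=1.0, post_snr=2.0)
# A larger observed gamma pulls the refined estimate upward in the current frame.
onset = mmse_prior_snr(prior_snr=1.0, post_snr=11.0)
print(consistent, onset)
```

Because the result depends on the current a posteriori signal-to-noise ratio, the refined value follows the current frame more closely than the decision-directed input does.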
  • the calculating a voice existence probability of the current audio frame includes:
  • $p(H_1 \mid Y)$ represents the speech presence probability;
  • $p(H_1)$ and $p(H_0)$ represent the a priori speech presence probability and the a priori speech absence probability, respectively;
  • exp() is the exponential function;
  • $\xi_{min}$ and $\xi_{max}$ are two empirical values with $\xi_{min} < \xi_{max}$;
  • $p_{max}$ and $p_{min}$ are two empirical values.
  • speech and noise are distinguished by the above formula.
  • when the above formula is used, the speech presence probability of the current audio frame can also be calculated from an averaged or smoothed value obtained by combining the a posteriori signal-to-noise ratios of the same frequency bin over the previous several frames. Additionally, the above formula can be derived directly from the complex Gaussian model described above.
  • the speech presence probability allows the current estimated a priori signal-to-noise ratio to be soft-switched between pure noise and speech segments, thereby alleviating the tracking delay problem of the direct decision method,
  • while the advantages of the direct decision method are retained.
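A common closed form for the speech presence probability under the complex Gaussian model, using a fixed a priori signal-to-noise ratio for the speech-present hypothesis, is sketched below in Python. The 15 dB value, the equal priors, and the clamping to `p_min` and `p_max` are assumptions chosen for illustration; the patent's own formula is not preserved in this text.

```python
import math


def speech_presence_probability(post_snr, xi_h1_db=15.0, p_h1=0.5,
                                p_min=0.01, p_max=0.99):
    """Posterior probability p(H1 | Y) for one bin, complex Gaussian model.

    post_snr: a posteriori SNR (optionally smoothed over previous frames)
    xi_h1_db: fixed a priori SNR assumed when speech is present (assumption)
    """
    xi_h1 = 10.0 ** (xi_h1_db / 10.0)
    p_h0 = 1.0 - p_h1
    # Inverse of the generalized likelihood ratio between H1 and H0.
    inv_ratio = (p_h0 / p_h1) * (1.0 + xi_h1) * math.exp(-post_snr * xi_h1 / (1.0 + xi_h1))
    p = 1.0 / (1.0 + inv_ratio)
    # Keep the probability inside empirical bounds (assumed use of p_min, p_max).
    return min(max(p, p_min), p_max)


p_noise = speech_presence_probability(1.0)
p_mid = speech_presence_probability(5.0)
p_speech = speech_presence_probability(20.0)
print(p_noise, p_mid, p_speech)
```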
  • the foregoing estimating the final a priori signal to noise ratio of the current audio frame by combining the voice existence probability and the estimated value including:
  • the final a priori signal to noise ratio of the current audio frame is estimated by the following formula:
  • with the above formula, the final a priori signal-to-noise ratio is kept at a stable small value, such as $\xi_{min}$, in pure noise, while in speech segments it is biased toward the minimum mean square error estimate corresponding to the estimated a priori signal-to-noise ratio.
  • in this way, the speech-present and speech-absent states are distinguished, and in the speech-present state the optimal a priori signal-to-noise ratio estimate is derived according to the MMSE criterion.
  • the speech-present and speech-absent states are weighted by the speech presence probability, which is calculated using a fixed a priori signal-to-noise ratio value. This makes the a priori signal-to-noise ratio estimate more accurate and resolves the tracking delay problem of the direct decision method.
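One way to realize the behaviour described above is a convex combination weighted by the speech presence probability. The Python sketch below assumes that form, with a floor of -20 dB taken from the experimental settings; the combination rule and all names are illustrative assumptions, since the patent's formula is lost in this text.

```python
def final_prior_snr(p, xi_mmse, xi_min_db=-20.0):
    """Combine the MMSE-refined SNR with a stable floor (assumed rule).

    p:        speech presence probability of the current frame
    xi_mmse:  MMSE estimate corresponding to the estimated a priori SNR
    """
    xi_min = 10.0 ** (xi_min_db / 10.0)
    blended = p * xi_mmse + (1.0 - p) * xi_min
    # Never fall below the floor, which keeps pure-noise segments smooth.
    return max(blended, xi_min)


in_noise = final_prior_snr(p=0.0, xi_mmse=5.0)
in_speech = final_prior_snr(p=1.0, xi_mmse=5.0)
halfway = final_prior_snr(p=0.5, xi_mmse=5.0)
print(in_noise, halfway, in_speech)
```

The floor keeps the gain steady in noise-only segments, which is what limits musical noise, while speech frames follow the MMSE estimate.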
  • the a priori signal-to-noise ratio estimated in this way may be used for gain calculation in noise reduction of the audio signal and, optionally, for gain calculation in single-microphone noise reduction.
  • for example, the a posteriori signal-to-noise ratio and the power spectrum of the previous frame's processing result are obtained; the a priori signal-to-noise ratio of the current audio frame is calculated from them using the direct decision method; the speech presence probability of the current audio frame is calculated from the a posteriori signal-to-noise ratio; the estimated value of the MMSE corresponding to the estimated a priori signal-to-noise ratio is calculated; and the final a priori signal-to-noise ratio of the current audio frame, which is used for gain calculation, is estimated by combining the speech presence probability and the estimated value.
  • the above steps eliminate the effect of the inherent one-frame delay and reduce both the attenuation of speech onsets and the trailing at speech offsets, thereby improving noise reduction performance.
  • the following is an explanation of the results through experimental data:
  • the experiment uses the NOIZEUS database with a data sampling rate of 8 kHz; the white noise is generated using Cool Edit (an audio processing software), and the other noise comes from the NOIZEUS database.
  • the frame length is 20 ms, the overlap rate is 50%, and a square-root Hanning window is applied before and after processing. One empirical parameter, whose symbol is not preserved in this text, takes 15 dB, and $\xi_{min}$ takes -20 dB. The suppression rule uses the MMSE-STSA (minimum mean square error short-time spectral amplitude) algorithm, and the noise estimation uses the unbiased MMSE algorithm.
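The framing parameters above can be checked with a short Python sketch. It assumes a periodic Hann definition for the square-root window, which is a common choice but is not stated in the text, and verifies that applying the window before and after processing gives unit overlap-add gain at 50% overlap.

```python
import math

SAMPLE_RATE = 8000                      # Hz, from the experimental setup
FRAME_LEN = SAMPLE_RATE * 20 // 1000    # 20 ms frame -> 160 samples
HOP = FRAME_LEN // 2                    # 50% overlap -> 80 samples

# Square-root of a periodic Hann window (assumed definition).
window = [
    math.sqrt(0.5 * (1.0 - math.cos(2.0 * math.pi * n / FRAME_LEN)))
    for n in range(FRAME_LEN)
]

# Analysis and synthesis windows multiply, so each sample is weighted by w^2.
# With 50% overlap, two frames cover every sample in the steady state.
overlap_add_gain = [window[n] ** 2 + window[n + HOP] ** 2 for n in range(HOP)]
print(FRAME_LEN, HOP, min(overlap_add_gain), max(overlap_add_gain))
```

Because the squared window is a Hann window, the two overlapping contributions sum to one, so an unmodified spectrum is reconstructed without amplitude ripple.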
  • Figures 3 and 4 compare the direct decision method with the method of the present disclosure at signal-to-noise ratios of 0 dB and 5 dB, respectively.
  • the speech in Figure 3 is sp01 and the noise is white noise; the speech in Figure 4 is sp04 and the noise is car noise. sp01 and sp04 are speech numbers in the data set.
  • Figure 5 shows the average segmental signal-to-noise ratio improvement at 0/5/10/15 dB for 30 sets of NOIZEUS speech with car noise and white noise. The figure shows that the method of the present disclosure outperforms the direct decision method.
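For readers unfamiliar with the metric behind Figure 5, a minimal Python sketch of segmental signal-to-noise ratio follows. The clamping range of -10 dB to 35 dB is a common convention rather than something stated in the text, and the function name and toy signals are illustrative assumptions.

```python
import math


def segmental_snr_db(clean, processed, frame_len, floor_db=-10.0, ceil_db=35.0):
    """Average per-frame SNR in dB between a clean and a processed signal."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        out = processed[start:start + frame_len]
        signal_energy = sum(s * s for s in ref)
        error_energy = sum((s - o) ** 2 for s, o in zip(ref, out))
        if signal_energy == 0.0:
            continue  # skip silent reference frames
        if error_energy == 0.0:
            snr = ceil_db
        else:
            snr = 10.0 * math.log10(signal_energy / error_energy)
        # Clamp so that a few extreme frames do not dominate the average.
        values.append(min(max(snr, floor_db), ceil_db))
    if not values:
        raise ValueError("no frames with signal energy")
    return sum(values) / len(values)


clean = [1.0] * 8
noisy = [0.5] * 8
enhanced = [0.9] * 8
improvement = segmental_snr_db(clean, enhanced, 4) - segmental_snr_db(clean, noisy, 4)
print(improvement)
```

The improvement reported for a method is the segmental value of its output minus the segmental value of the unprocessed noisy input.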
  • the user terminal may be any user terminal with a microphone, such as a mobile phone, a tablet personal computer, a laptop computer, a personal digital assistant (PDA), a mobile Internet device (MID), an in-vehicle device, or a wearable device. It should be noted that the specific type of the user terminal is not limited in the embodiments of the present disclosure.
  • estimating an estimated a priori signal-to-noise ratio of the current audio frame; calculating, according to the estimated a priori signal-to-noise ratio, an estimated value of the MMSE corresponding to the estimated a priori signal-to-noise ratio of the current audio frame; calculating the speech presence probability of the current audio frame; and estimating the final a priori signal-to-noise ratio of the current audio frame by combining the speech presence probability and the estimated value.
  • the user terminal 600 includes the following modules:
  • the first estimating module 601 is configured to estimate an estimated a priori signal to noise ratio of the current audio frame
  • the first calculating module 602 is configured to calculate, according to the estimated a priori signal to noise ratio, an estimated value of a minimum mean square error corresponding to the estimated a priori signal to noise ratio of the current audio frame;
  • a second calculating module 603, configured to calculate a voice existence probability of the current audio frame
  • the second estimation module 604 is configured to estimate a final a priori signal to noise ratio of the current audio frame in conjunction with the voice presence probability and the estimated value.
  • the first estimating module 601 is configured to estimate an estimated a priori signal to noise ratio of the current audio frame based on the a posteriori signal to noise ratio estimation value of the current audio frame.
  • the first estimation module 601 is configured to estimate an estimated a priori signal-to-noise ratio of the current audio frame by using either of the following two formulas:
  • the user terminal 600 further includes:
  • the adjusting module 605 is configured to adjust, by using the following formula, a smoothing number required to estimate the estimated a priori signal to noise ratio:
  • where $a_1$ and $a_2$ are two preset smoothing numbers with $a_1 > a_2$, and the two thresholds (subscript th) are empirical values.
  • the first estimation module 601 is further configured to further estimate an estimated a priori signal to noise ratio of the current audio frame by using the following formula:
  • where the formula involves the estimated a priori signal-to-noise ratio, the two a priori signal-to-noise ratio estimates of the current audio frame obtained with smoothing numbers $a_1$ and $a_2$ respectively, and $p$, the speech presence probability of the current audio frame.
  • the first calculating module 602 is configured to calculate, according to the estimated a priori signal to noise ratio, an estimated value of a minimum mean square error corresponding to the estimated a priori signal to noise ratio of the current audio frame by using a formula :
  • where the formula involves three quantities: the estimated value of the minimum mean square error corresponding to the estimated a priori signal-to-noise ratio, the estimated a priori signal-to-noise ratio itself, and the a posteriori signal-to-noise ratio estimate of the current audio frame.
  • the second calculating module 603 is configured to calculate a voice existence probability of the current audio frame by using the following formula:
  • $p(H_1 \mid Y)$ represents the speech presence probability;
  • $p(H_1)$ and $p(H_0)$ represent the a priori speech presence probability and the a priori speech absence probability, respectively;
  • exp() is the exponential function;
  • $\xi_{min}$ and $\xi_{max}$ are two empirical values with $\xi_{min} < \xi_{max}$;
  • $p_{max}$ and $p_{min}$ are two empirical values.
  • the second estimation module 604 is configured to estimate a final a priori signal to noise ratio of the current audio frame by using the following formula:
  • the user terminal 600 may be the user terminal corresponding to the method embodiments of the present disclosure. Any implementation in those method embodiments can be carried out by the user terminal 600 of this embodiment with the same beneficial effects, and details are not described herein again.
  • an embodiment of the present disclosure provides a structure of another user terminal, including: a processor 800, a transceiver 810, a memory 820, a user interface 830, and a bus interface, where:
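  • the processor 800 is configured to read a program in the memory 820 and perform the following process:
  • estimating an estimated a priori signal-to-noise ratio of the current audio frame; calculating, according to the estimated a priori signal-to-noise ratio, an estimated value of the MMSE corresponding to the estimated a priori signal-to-noise ratio of the current audio frame; calculating the speech presence probability of the current audio frame; and estimating a final a priori signal-to-noise ratio of the current audio frame by combining the speech presence probability and the estimated value.
  • the user interface 830 includes a microphone, and the transceiver 810 is configured to receive and transmit data under the control of the processor 800.
  • the bus architecture may include any number of interconnected buses and bridges, which link together various circuits including one or more processors represented by the processor 800 and memory represented by the memory 820.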
  • the bus architecture can also link various other circuits such as peripherals, voltage regulators, and power management circuits.
  • the bus interface provides an interface.
  • Transceiver 810 can be a plurality of components, including a transmitter and a receiver, providing means for communicating with various other devices on a transmission medium.
  • the user interface 830 may also be an interface capable of externally connecting the required devices, including but not limited to a keypad, a display, a speaker, a microphone, a joystick, and the like.
  • the processor 800 is responsible for managing the bus architecture and general processing, and the memory 820 can store data used by the processor 800 in performing operations.
  • optionally, estimating the estimated a priori signal-to-noise ratio of the current audio frame includes: estimating it based on the a posteriori signal-to-noise ratio estimate of the current audio frame.
  • optionally, estimating the estimated a priori signal-to-noise ratio based on the a posteriori signal-to-noise ratio estimate includes: estimating the estimated a priori signal-to-noise ratio of the current audio frame by either of the following two formulas:
  • processor 800 is further configured to:
  • where $a_1$ and $a_2$ are two preset smoothing numbers with $a_1 > a_2$, and the two thresholds (subscript th) are empirical values.
  • the step of estimating an estimated a priori signal to noise ratio of the current audio frame based on the estimated probability of existence of the voice further comprising:
  • the estimated a priori signal to noise ratio of the current audio frame is further estimated by the following formula:
  • one symbol represents the estimated a priori signal to noise ratio, and two further symbols respectively represent the estimated a priori signal to noise ratio of the current audio frame when the smoothing number is a1 and when the smoothing number is a2
  • the calculating, according to the estimated a priori signal to noise ratio, an estimated value of a minimum mean square error corresponding to the estimated a priori signal to noise ratio of the current audio frame including:
  • An estimate of the minimum mean square error corresponding to the estimated a priori signal to noise ratio Representing the estimated a priori signal to noise ratio, Representing an a posteriori signal to noise ratio estimate for the current audio frame.
  • the calculating a voice existence probability of the current audio frame includes:
  • p(H1|Y) represents the speech presence probability
  • p(H 1 ) and p(H 0 ) respectively represent a priori speech existence probability and a priori no speech probability
  • exp() is an exponential function
  • γmin and γmax are two empirical values
  • γmin < γmax, and pmax and pmin are two empirical values
  • the estimating the final a priori signal to noise ratio of the current audio frame by combining the voice existence probability and the estimated value including:
  • the final a priori signal to noise ratio of the current audio frame is estimated by the following formula:
  • the user terminal may be the user terminal corresponding to the voice signal noise reduction method provided by the method embodiments of the present disclosure; any implementation in those method embodiments can be realized by the user terminal in this embodiment with the same beneficial effects, and details are not repeated here.
  • the disclosed method and apparatus may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • there may be another division manner; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may be physically included separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional units.
  • the above-described integrated unit implemented in the form of a software functional unit can be stored in a computer readable storage medium.
  • the software functional unit described above is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, server, or network device, etc.) to perform part of the steps of the method of transmitting and receiving described in various embodiments of the present disclosure.
  • the foregoing storage medium includes: a U disk, a mobile hard disk, a read-only memory (ROM), and a random access memory.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)

Abstract

A method for estimating a signal-to-noise ratio for noise suppression, and a user terminal. The method may comprise: estimating a pre-estimated a priori signal-to-noise ratio of a current audio frame (101); computing, according to the pre-estimated a priori signal-to-noise ratio, an estimated value of a minimum mean square error (MMSE) corresponding to the pre-estimated a priori signal-to-noise ratio of the current audio frame (102); computing a speech presence probability of the current audio frame (103); and estimating a final a priori signal-to-noise ratio of the current audio frame with reference to the speech presence probability and the estimated value (104).

Description

Noise suppression signal to noise ratio estimation method and user terminal
Cross-reference to related applications
The present application claims priority to Chinese Patent Application No. 201611039463.4, filed in China on November 10, 2016, the entire contents of which are incorporated herein by reference.
Technical field
The present disclosure relates to the field of voice technologies, and in particular, to a noise suppression signal to noise ratio estimation method and a user terminal.
Background
At present, a user terminal generally uses a single-microphone noise reduction method to reduce noise in an audio signal. The method mainly includes the following steps:
transforming the noisy speech into a frequency domain signal Y using a Fast Fourier Transform (FFT) or another transform method;
estimating the noise variance of the frequency domain signal Y;
estimating the a priori signal to noise ratio and the a posteriori signal to noise ratio based on the noise variance;
calculating a suitable gain from the a priori signal to noise ratio and the a posteriori signal to noise ratio;
multiplying each frequency bin of the frequency domain signal Y by the gain to obtain a noise-reduced frequency domain signal;
transforming the noise-reduced frequency domain signal into a time domain signal by an Inverse Fast Fourier Transform (IFFT).
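The steps listed above can be sketched for a single frame as follows. This is a minimal illustration rather than the terminal's actual implementation: the naive DFT stands in for an FFT, the per-bin noise variance is assumed to be supplied by a separate estimator, the a priori SNR uses a simple maximum-likelihood-style estimate, and the Wiener gain is an assumed gain rule. The names `dft`, `idft`, `denoise_frame`, and `xi_floor` are hypothetical.

```python
import cmath

def dft(x):
    """Naive O(N^2) discrete Fourier transform (stands in for an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Inverse of dft(); returns the real part of each time-domain sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def denoise_frame(frame, noise_var, xi_floor=1e-3):
    """Run one frame through the listed steps, given a per-bin noise variance."""
    spectrum = dft(frame)                      # transform to the frequency domain
    cleaned = []
    for y, var in zip(spectrum, noise_var):    # noise_var: assumed already estimated
        gamma = abs(y) ** 2 / var              # a posteriori SNR of this bin
        xi = max(gamma - 1.0, xi_floor)        # a priori SNR (assumed ML-style estimate)
        gain = xi / (1.0 + xi)                 # Wiener gain (assumed gain rule)
        cleaned.append(gain * y)               # scale the bin by the gain
    return idft(cleaned)                       # transform back to the time domain
```

With a very small noise variance the gain of the active bins approaches 1 and the frame passes through almost unchanged; with a very large noise variance every bin is attenuated to the floor set by `xi_floor`.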
However, in the above technique, the a priori signal to noise ratio is estimated using the direct decision method, that is, by the following formula:
Figure PCTCN2017106502-appb-000001
where
Figure PCTCN2017106502-appb-000002
denotes the estimate of the a priori signal to noise ratio of the current frame; α is usually a smoothing number close to 1, specifically a value from 0.95 to 1;
Figure PCTCN2017106502-appb-000003
denotes the noise reduction result of the previous frame;
Figure PCTCN2017106502-appb-000004
denotes the noise variance; and
Figure PCTCN2017106502-appb-000005
denotes the a posteriori signal to noise ratio estimate of the current frame.
It can be seen from the above formula that the estimate is heavily biased toward the noise reduction result of the previous frame
Figure PCTCN2017106502-appb-000006
and
Figure PCTCN2017106502-appb-000007
can be regarded as the instantaneous value of the speech variance of the previous frame
Figure PCTCN2017106502-appb-000008
Therefore, the a priori signal to noise ratio ξ finally estimated by the above formula is not an estimate of the signal to noise ratio ξ(m) of the current frame; it can be regarded as an estimate of the a priori signal to noise ratio ξ(m-1) of the previous frame. The a priori signal to noise ratio currently estimated for the current audio frame therefore correlates poorly with the current audio frame, which hinders noise suppression of the current audio frame.
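The one-frame lag described above can be illustrated numerically. The formula image is not reproduced in this text, so the sketch below assumes the commonly published decision-directed form, which uses the same ingredients as the symbols listed above (smoothing number α, previous-frame noise reduction result, noise variance, current a posteriori SNR). The function and argument names are hypothetical.

```python
def decision_directed_prior_snr(prev_clean_power, noise_var, gamma, alpha=0.98):
    """Assumed form: alpha * |X(m-1)|^2 / var + (1 - alpha) * max(gamma(m) - 1, 0)."""
    previous_term = alpha * prev_clean_power / noise_var      # weight on frame m-1
    current_term = (1.0 - alpha) * max(gamma - 1.0, 0.0)      # weight on frame m
    return previous_term + current_term

# Speech onset: the previous frame was noise only, the current frame is strong speech.
# The estimate stays near 2 although the instantaneous value gamma - 1 is 100.
onset = decision_directed_prior_snr(prev_clean_power=0.0, noise_var=1.0, gamma=101.0)

# Speech offset: the previous frame held speech, the current frame is noise only.
# The estimate stays near 98 although the current frame carries no speech.
offset = decision_directed_prior_snr(prev_clean_power=100.0, noise_var=1.0, gamma=1.0)
```

Under this assumed form, with α close to 1 the result follows the previous frame in both cases, which is the poor correlation with the current frame that the text points out.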
Summary of the invention
The purpose of the present disclosure is to provide a noise suppression signal to noise ratio estimation method and a user terminal, which solve the problem that the estimated a priori signal to noise ratio of the current audio frame correlates poorly with the current audio frame, which hinders noise suppression of the current audio frame.
To achieve the above purpose, an embodiment of the present disclosure provides an a priori signal to noise ratio estimation method, including:
estimating an estimated a priori signal to noise ratio of the current audio frame;
calculating, according to the estimated a priori signal to noise ratio, an estimated value of the minimum mean square error (MMSE) corresponding to the estimated a priori signal to noise ratio of the current audio frame;
calculating a speech presence probability of the current audio frame;
estimating a final a priori signal to noise ratio of the current audio frame by combining the speech presence probability and the estimated value.
Optionally, the estimating an estimated a priori signal to noise ratio of the current audio frame includes:
estimating the estimated a priori signal to noise ratio of the current audio frame based on the a posteriori signal to noise ratio estimate of the current audio frame.
Optionally, the estimating the estimated a priori signal to noise ratio of the current audio frame based on the a posteriori signal to noise ratio estimate of the current audio frame includes:
estimating the estimated a priori signal to noise ratio of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000009
where the formula yields the estimated a priori signal to noise ratio, α is a smoothing number,
Figure PCTCN2017106502-appb-000011
denotes the noise reduction result of the previous frame,
Figure PCTCN2017106502-appb-000012
denotes the noise variance, and
Figure PCTCN2017106502-appb-000013
denotes the a posteriori signal to noise ratio estimate of the current audio frame;
or,
estimating the estimated a priori signal to noise ratio of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000014
where
Figure PCTCN2017106502-appb-000015
denotes the estimated a priori signal to noise ratio, α is a smoothing number,
Figure PCTCN2017106502-appb-000016
is the a priori signal to noise ratio of the previous frame, and
Figure PCTCN2017106502-appb-000017
denotes the a posteriori signal to noise ratio estimate of the current frame.
Optionally, the method further includes:
adjusting the smoothing number used in estimating the estimated a priori signal to noise ratio by the following formula:
Figure PCTCN2017106502-appb-000018
where a1 and a2 are two preset smoothing numbers with a1 > a2, and γth and ξth are two empirical thresholds.
Optionally, the step of estimating the estimated a priori signal to noise ratio of the current audio frame based on the speech presence probability estimate further includes:
further estimating the estimated a priori signal to noise ratio of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000019
or
Figure PCTCN2017106502-appb-000020
where
Figure PCTCN2017106502-appb-000021
denotes the estimated a priori signal to noise ratio,
Figure PCTCN2017106502-appb-000022
and
Figure PCTCN2017106502-appb-000023
respectively denote the estimated a priori signal to noise ratio of the current audio frame when the smoothing number is a1 and when the smoothing number is a2, p(H1|Y) denotes the speech presence probability, and pth is a preset threshold.
Optionally, the calculating, according to the estimated a priori signal to noise ratio, an estimated value of the minimum mean square error corresponding to the estimated a priori signal to noise ratio of the current audio frame includes:
calculating, according to the estimated a priori signal to noise ratio, the estimated value of the minimum mean square error corresponding to the estimated a priori signal to noise ratio of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000024
where
Figure PCTCN2017106502-appb-000025
denotes the estimated value of the minimum mean square error corresponding to the estimated a priori signal to noise ratio,
Figure PCTCN2017106502-appb-000026
denotes the estimated a priori signal to noise ratio, and
Figure PCTCN2017106502-appb-000027
denotes the a posteriori signal to noise ratio estimate of the current audio frame.
Optionally, the calculating a speech presence probability of the current audio frame includes:
calculating the speech presence probability of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000028
Figure PCTCN2017106502-appb-000029
or
Figure PCTCN2017106502-appb-000030
where p(H1|Y) denotes the speech presence probability, p(H1) and p(H0) respectively denote the a priori speech presence probability and the a priori speech absence probability,
Figure PCTCN2017106502-appb-000031
is a fixed value,
Figure PCTCN2017106502-appb-000032
denotes the a posteriori signal to noise ratio estimate of the current audio frame, exp() is the exponential function, γmin and γmax are two empirical values with γmin < γmax, and pmax and pmin are two empirical values with pmin < pmax.
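As a rough illustration of how a speech presence probability can be computed from these ingredients, the sketch below uses a well-known likelihood-ratio form under a complex Gaussian model with a fixed a priori SNR. It is an assumption chosen because it uses the same quantities named above (the two priors, a fixed value, the a posteriori SNR, and exp()); it is not taken from the formula images, and it does not model the empirical bounds γmin, γmax, pmin, and pmax. The function name and default values are hypothetical.

```python
import math

def speech_presence_probability(gamma, p_h1=0.5, xi_fixed=10 ** 1.5):
    """Posterior probability of speech given the a posteriori SNR gamma.

    p_h1 is the a priori speech presence probability; xi_fixed is a fixed
    a priori SNR assumed under the speech-present hypothesis (15 dB here).
    """
    p_h0 = 1.0 - p_h1                                   # a priori speech absence probability
    likelihood_ratio = (p_h0 / p_h1) * (1.0 + xi_fixed) * math.exp(
        -gamma * xi_fixed / (1.0 + xi_fixed))           # evidence for "noise only"
    return 1.0 / (1.0 + likelihood_ratio)
```

Under these assumptions the probability rises monotonically with the a posteriori SNR: it is small for bins whose power is near or below the noise variance and approaches 1 for bins that stand well above it.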
Optionally, the estimating a final a priori signal to noise ratio of the current audio frame by combining the speech presence probability and the estimated value includes:
estimating the final a priori signal to noise ratio of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000033
where
Figure PCTCN2017106502-appb-000034
denotes the final a priori signal to noise ratio of the current audio frame,
Figure PCTCN2017106502-appb-000035
denotes the estimated value of the minimum mean square error of the estimated a priori signal to noise ratio, p(H1|Y) denotes the speech presence probability, and ξmin is a certain small value.
An embodiment of the present disclosure further provides a user terminal, including:
a first estimation module, configured to estimate an estimated a priori signal to noise ratio of the current audio frame;
a first calculation module, configured to calculate, according to the estimated a priori signal to noise ratio, an estimated value of the MMSE corresponding to the estimated a priori signal to noise ratio of the current audio frame;
a second calculation module, configured to calculate a speech presence probability of the current audio frame;
a second estimation module, configured to estimate a final a priori signal to noise ratio of the current audio frame by combining the speech presence probability and the estimated value.
Optionally, the first estimation module is configured to estimate the estimated a priori signal to noise ratio of the current audio frame based on the a posteriori signal to noise ratio estimate of the current audio frame.
Optionally, the first estimation module is configured to estimate the estimated a priori signal to noise ratio of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000036
where
Figure PCTCN2017106502-appb-000037
denotes the estimated a priori signal to noise ratio, α is a smoothing number,
Figure PCTCN2017106502-appb-000038
denotes the noise reduction result of the previous frame,
Figure PCTCN2017106502-appb-000039
denotes the noise variance, and
Figure PCTCN2017106502-appb-000040
denotes the a posteriori signal to noise ratio estimate of the current audio frame;
or,
the first estimation module is configured to estimate the estimated a priori signal to noise ratio of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000041
where
Figure PCTCN2017106502-appb-000042
denotes the estimated a priori signal to noise ratio, α is a smoothing number,
Figure PCTCN2017106502-appb-000043
is the a priori signal to noise ratio of the previous frame, and
Figure PCTCN2017106502-appb-000044
denotes the a posteriori signal to noise ratio estimate of the current frame.
Optionally, the user terminal further includes:
an adjustment module, configured to adjust the smoothing number used in estimating the estimated a priori signal to noise ratio by the following formula:
Figure PCTCN2017106502-appb-000045
where a1 and a2 are two preset smoothing numbers with a1 > a2, and γth and ξth are two empirical thresholds.
Optionally, the first estimation module is further configured to further estimate the estimated a priori signal to noise ratio of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000046
or
Figure PCTCN2017106502-appb-000047
where
Figure PCTCN2017106502-appb-000048
denotes the estimated a priori signal to noise ratio,
Figure PCTCN2017106502-appb-000049
Figure PCTCN2017106502-appb-000050
Figure PCTCN2017106502-appb-000051
respectively denote the estimated a priori signal to noise ratio of the current audio frame when the smoothing number is a1 and when the smoothing number is a2, p(H1|Y) denotes the speech presence probability, and pth is a preset threshold.
Optionally, the first calculation module is configured to calculate, according to the estimated a priori signal to noise ratio, the estimated value of the MMSE corresponding to the estimated a priori signal to noise ratio of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000052
where
Figure PCTCN2017106502-appb-000053
denotes the estimated value of the minimum mean square error corresponding to the estimated a priori signal to noise ratio,
Figure PCTCN2017106502-appb-000054
denotes the estimated a priori signal to noise ratio, and
Figure PCTCN2017106502-appb-000055
denotes the a posteriori signal to noise ratio estimate of the current audio frame.
Optionally, the second calculation module is configured to calculate the speech presence probability of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000056
Figure PCTCN2017106502-appb-000057
or
Figure PCTCN2017106502-appb-000058
where p(H1|Y) denotes the speech presence probability, p(H1) and p(H0) respectively denote the a priori speech presence probability and the a priori speech absence probability,
Figure PCTCN2017106502-appb-000059
is a fixed value,
Figure PCTCN2017106502-appb-000060
denotes the a posteriori signal to noise ratio estimate of the current audio frame, exp() is the exponential function, γmin and γmax are two empirical values with γmin < γmax, and pmax and pmin are two empirical values with pmin < pmax.
Optionally, the second estimation module is configured to estimate the final a priori signal to noise ratio of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000061
where
Figure PCTCN2017106502-appb-000062
denotes the final a priori signal to noise ratio of the current audio frame,
Figure PCTCN2017106502-appb-000063
denotes the estimated value of the minimum mean square error of the estimated a priori signal to noise ratio, p(H1|Y) denotes the speech presence probability, and ξmin is a certain small value.
An embodiment of the present disclosure further provides a user terminal, including a processor, a memory, and a transceiver, where:
the processor is configured to read a program in the memory and perform the following process:
estimating an estimated a priori signal to noise ratio of the current audio frame;
calculating, according to the estimated a priori signal to noise ratio, an estimated value of the minimum mean square error corresponding to the estimated a priori signal to noise ratio of the current audio frame;
calculating a speech presence probability of the current audio frame;
estimating a final a priori signal to noise ratio of the current audio frame by combining the speech presence probability and the estimated value,
where the transceiver is configured to receive and transmit data, and the memory is capable of storing data used by the processor when performing operations.
The above technical solutions of the present disclosure have at least the following beneficial effects:
In the embodiments of the present disclosure, an estimated a priori signal to noise ratio of the current audio frame is estimated; an estimated value of the MMSE corresponding to the estimated a priori signal to noise ratio of the current audio frame is calculated according to the estimated a priori signal to noise ratio; a speech presence probability of the current audio frame is calculated; and a final a priori signal to noise ratio of the current audio frame is estimated by combining the speech presence probability and the estimated value. Because the final a priori signal to noise ratio is estimated by combining the speech presence probability of the current frame with the estimated value of the minimum mean square error corresponding to the estimated a priori signal to noise ratio of the current audio frame, the a priori signal to noise ratio estimated by the embodiments of the present disclosure correlates more strongly with the current audio frame than the related-art estimate based on the a priori signal to noise ratio of the previous frame, which benefits noise suppression of the current audio frame.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are merely some embodiments described in the present application, and a person of ordinary skill in the art may obtain other drawings from these drawings without creative effort. The following drawings are not deliberately drawn to scale; the emphasis is on illustrating the gist of the present application.
FIG. 1 is a schematic flowchart of a noise suppression signal to noise ratio estimation method according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of another noise suppression signal to noise ratio estimation method according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of experimental data of a noise suppression signal to noise ratio estimation method according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of further experimental data of a noise suppression signal to noise ratio estimation method according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of further experimental data of a noise suppression signal to noise ratio estimation method according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a user terminal according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of another user terminal according to an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of another user terminal according to an embodiment of the present disclosure.
Detailed description
The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.
Referring to FIG. 1, an embodiment of the present disclosure provides a noise suppression signal to noise ratio estimation method which, as shown in FIG. 1, includes the following steps:
101. Estimate an estimated a priori signal to noise ratio of the current audio frame.
102. Calculate, according to the estimated a priori signal to noise ratio, an estimated value of the MMSE corresponding to the estimated a priori signal to noise ratio of the current audio frame.
103. Calculate a speech presence probability of the current audio frame.
104. Estimate a final a priori signal to noise ratio of the current audio frame by combining the speech presence probability and the estimated value.
In the embodiments of the present disclosure, the current audio frame may be the current frame collected by a microphone of the user terminal; the current frame may be a speech frame or a noise frame.
In addition, the estimated a priori signal to noise ratio may be an a priori signal to noise ratio estimated by a method such as the direct decision method or the maximum likelihood method. The estimated value of the MMSE of the estimated a priori signal to noise ratio may be obtained by applying an MMSE algorithm to the estimated a priori signal to noise ratio. The speech presence probability of the current audio frame may be calculated from the a posteriori signal to noise ratio of the current audio frame, or from a value obtained by averaging or smoothing the a posteriori signal to noise ratios of the same frequency bin over the previous several frames.
It should be noted that the embodiments of the present disclosure do not limit the execution order of step 103 relative to steps 101 and 102. For example, step 103 may be performed first and then step 101, or step 101 may be performed first and then step 103.
In addition, the final a priori signal to noise ratio of the current audio frame may be understood as the a priori signal to noise ratio used for gain calculation when reducing noise in the audio frame, or as the a priori signal to noise ratio output for the current audio frame in the embodiments of the present disclosure. Estimating the final a priori signal to noise ratio of the current audio frame by combining the speech presence probability and the estimated value may proceed as follows: the probability that the current audio frame is a speech frame is determined from the speech presence probability; if the current audio frame is determined to be a pure noise frame, the final a priori signal to noise ratio is set to a stable minimum value, for example ξmin, so that pure noise segments are processed smoothly and musical noise is reduced; if the current audio frame is determined to be an audio frame within a speech segment, the final a priori signal to noise ratio is calculated so as to lean toward the estimated value of the minimum mean square error corresponding to the estimated a priori signal to noise ratio, which makes the final a priori signal to noise ratio estimate more accurate.
Through the above steps, the final a priori signal to noise ratio is estimated by combining the speech presence probability of the current frame with the estimated value of the minimum mean square error of the estimated a priori signal to noise ratio of the current audio frame. The a priori signal to noise ratio estimated in this way correlates more strongly with the current audio frame, which benefits noise suppression of the current audio frame and improves the noise suppression effect.
Optionally, estimating the estimated a priori SNR of the current audio frame includes:
estimating the estimated a priori SNR of the current audio frame based on the a posteriori SNR estimate of the current audio frame.
The a posteriori SNR of the current audio frame is common knowledge and is not described in detail here. The estimated a priori SNR of the current audio frame may, for example, be estimated from the a posteriori SNR estimate of the current audio frame using the direct decision method; of course, the embodiments of the present disclosure are not limited in this respect.
Optionally, estimating the estimated a priori SNR of the current audio frame based on the a posteriori SNR estimate of the current audio frame includes:
estimating the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000064
where
Figure PCTCN2017106502-appb-000065
denotes the estimated a priori SNR, α is the smoothing factor,
Figure PCTCN2017106502-appb-000066
denotes the noise-reduction result of the previous frame,
Figure PCTCN2017106502-appb-000067
denotes the noise variance, and
Figure PCTCN2017106502-appb-000068
denotes the a posteriori SNR estimate of the current audio frame;
or,
estimating the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000069
where
Figure PCTCN2017106502-appb-000070
denotes the estimated a priori SNR, α is the smoothing factor,
Figure PCTCN2017106502-appb-000071
is the a priori SNR of the previous frame, and
Figure PCTCN2017106502-appb-000072
denotes the a posteriori SNR estimate of the current frame.
In this implementation, the estimated a priori SNR may be obtained from either of the two formulas above. Experiments indicate that the formula for
Figure PCTCN2017106502-appb-000073
gives better results when calculating the estimated a priori SNR, mainly because it produces less musical noise (musical tone). In the embodiments of the present disclosure, the estimated a priori SNR is therefore optionally calculated using the formula for
Figure PCTCN2017106502-appb-000074
In addition, the smoothing factor may be a preset value, for example a value between 0.95 and 1, or a value such as 0.98 or 0.3; this is not limited here. The noise variance is common knowledge and is not described in detail.
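The two alternatives above read like the textbook direct decision (decision-directed) recursion, so a minimal sketch can be given under that assumption. The exact formulas exist only as Figures PCTCN2017106502-appb-000064 and PCTCN2017106502-appb-000069, which are not reproduced in this text; the function names, argument names, the flooring of γ − 1 at zero and all numeric inputs below are illustrative assumptions, not the patent's wording.

```python
def dd_from_previous_output(prev_clean_power, noise_var, gamma, alpha=0.98):
    """Assumed form: alpha * |X_prev|^2 / sigma_N^2 + (1 - alpha) * max(gamma - 1, 0)."""
    # Maximum likelihood term, floored at zero (the flooring is an assumption).
    ml_term = max(gamma - 1.0, 0.0)
    return alpha * prev_clean_power / noise_var + (1.0 - alpha) * ml_term


def dd_from_previous_snr(prev_xi, gamma, alpha=0.98):
    """Assumed form: alpha * xi_prev + (1 - alpha) * max(gamma - 1, 0)."""
    ml_term = max(gamma - 1.0, 0.0)
    return alpha * prev_xi + (1.0 - alpha) * ml_term


# Per-bin usage for one frame; every number here is a made-up illustration value.
prev_power = [4.0, 0.2]   # |X_prev|^2 of the previous denoised frame
noise_var = [2.0, 1.0]    # noise variance per frequency bin
gamma = [3.0, 0.5]        # a posteriori SNR per frequency bin
xi_first = [dd_from_previous_output(p, n, g, alpha=0.5)
            for p, n, g in zip(prev_power, noise_var, gamma)]
xi_second = [dd_from_previous_snr(2.0, g, alpha=0.5) for g in gamma]
```

A smoothing factor close to 1 makes the estimate follow the previous frame, which is why the text associates large values with stable behaviour in noise.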
Optionally, the above method further includes:
adjusting, by the following formula, the smoothing factor required when estimating the estimated a priori SNR:
Figure PCTCN2017106502-appb-000075
where a1 and a2 are two preset smoothing factors with a1 > a2, and γth and ξth are two empirical thresholds.
In this implementation, the factor α should be as large as possible in pure noise, so that the estimated value is as stable as possible, and as small as possible in speech segments, so that speech is tracked quickly. The values a1 and a2 may be 0.98 and 0.3, respectively. Of course, the embodiments of the present disclosure are not limited in this respect; for example, the values may also be 0.95 and 0.28, and may be adjusted according to actual conditions.
In this implementation, using a1 and a2 as described improves the accuracy of the estimated a priori SNR.
Optionally, in this implementation, the above step of estimating the estimated a priori SNR of the current audio frame based on the speech presence probability estimate further includes:
further estimating the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000076
or
Figure PCTCN2017106502-appb-000077
where
Figure PCTCN2017106502-appb-000078
denotes the estimated a priori SNR,
Figure PCTCN2017106502-appb-000079
and
Figure PCTCN2017106502-appb-000080
denote the estimated a priori SNR of the current audio frame when the smoothing factor is a1 and when the smoothing factor is a2, respectively, p(H1|Y) denotes the speech presence probability, and pth is a preset threshold.
In this implementation, the estimated a priori SNR can be switched according to the speech presence probability of the current audio frame, which improves the accuracy of the estimated a priori SNR.
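The switching idea can be sketched as a hard threshold on the speech presence probability. This is only an illustration: the actual formulas are Figures PCTCN2017106502-appb-000076 and PCTCN2017106502-appb-000077, which are not reproduced here, so the direction of the switch (the heavily smoothed a1 estimate below the threshold, the fast-tracking a2 estimate at or above it) is inferred from the statement that α should be large in pure noise and small in speech. The function name and the default threshold of 0.5 are hypothetical.

```python
def switch_estimated_snr(xi_a1, xi_a2, p_speech, p_th=0.5):
    """Pick between two candidate a priori SNR estimates (assumed hard switch).

    xi_a1: estimate computed with the large smoothing factor a1 (stable in noise)
    xi_a2: estimate computed with the small smoothing factor a2 (tracks speech quickly)
    p_speech: speech presence probability p(H1|Y) of the current frame
    p_th: preset threshold (0.5 is a placeholder, not a value from the source)
    """
    return xi_a2 if p_speech >= p_th else xi_a1


# Illustration values only.
likely_speech = switch_estimated_snr(xi_a1=0.1, xi_a2=3.0, p_speech=0.9)
likely_noise = switch_estimated_snr(xi_a1=0.1, xi_a2=3.0, p_speech=0.2)
```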
Optionally, calculating, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame includes:
calculating, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000081
where
Figure PCTCN2017106502-appb-000082
denotes the MMSE estimate corresponding to the estimated a priori SNR,
Figure PCTCN2017106502-appb-000083
denotes the estimated a priori SNR, and
Figure PCTCN2017106502-appb-000084
denotes the a posteriori SNR estimate of the current audio frame.
It should be noted that the above
Figure PCTCN2017106502-appb-000085
denotes the estimated a priori SNR calculated in step 101, and is not limited to the estimated a priori SNR calculated by the formula for
Figure PCTCN2017106502-appb-000086
mentioned above.
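For orientation, the standard complex Gaussian result for the conditional second moment of speech can be sketched as follows. Under that model the posterior of X given Y has mean G·Y and variance G·σ_N², with Wiener gain G = ξ/(1+ξ), so E(|X|²|Y)/σ_N² = G + G²·γ. Whether Figure PCTCN2017106502-appb-000081 has exactly this form cannot be confirmed from the text; the function name is hypothetical.

```python
def mmse_snr_estimate(xi, gamma):
    """E(|X|^2 | Y) / sigma_N^2 under a complex Gaussian speech and noise model.

    xi: estimated a priori SNR (linear scale) from step 101
    gamma: a posteriori SNR estimate (linear scale) of the current frame
    """
    wiener_gain = xi / (1.0 + xi)
    # Posterior variance term plus squared posterior mean term, both normalised
    # by the noise variance.
    return wiener_gain + wiener_gain * wiener_gain * gamma


# Illustration: xi = 1 (0 dB) and gamma = 2 give G = 0.5, hence 0.5 + 0.25 * 2.
example = mmse_snr_estimate(1.0, 2.0)
```

Because γ of the current frame enters directly, the result reacts to the current frame even when ξ still reflects the previous one.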
The above may be obtained from a complex Gaussian model as
Figure PCTCN2017106502-appb-000087
Alternatively, a super-Gaussian model of speech may be used to calculate E(X²|Y), where
Figure PCTCN2017106502-appb-000088
may be regarded as equivalent to E(X²|Y). This is because, in practical applications, estimating the a priori SNR mainly amounts to estimating the variance of the speech signal
Figure PCTCN2017106502-appb-000089
By definition,
Figure PCTCN2017106502-appb-000090
which depends only on the speech signal X. X, however, cannot be observed, so most algorithms for estimating
Figure PCTCN2017106502-appb-000091
have to estimate it from the noisy signal Y. This can also be seen from the direct decision method: in the second half of its formula, γ-1 is the maximum likelihood estimate of the speech variance
Figure PCTCN2017106502-appb-000092
given that γ is known (i.e., Y is known), and the first half uses the instantaneous value
Figure PCTCN2017106502-appb-000093
in place of E(X²).
Therefore, most SNR estimation algorithms rely on the noisy signal Y being known. In other words, the speech variance
Figure PCTCN2017106502-appb-000094
cannot in practice be estimated directly; instead, given Y, what is estimated is
Figure PCTCN2017106502-appb-000095
Therefore, in the embodiments of the present disclosure, the conditional expectation
Figure PCTCN2017106502-appb-000096
or
Figure PCTCN2017106502-appb-000097
is used to estimate the speech variance
Figure PCTCN2017106502-appb-000098
On this basis, the definition of the conditional expectation
Figure PCTCN2017106502-appb-000099
shows that it corresponds to the MMSE estimate of the squared speech spectral amplitude X². Taking into account the probability p(H1|Y) that speech is present in Y, the final expression of the conditional expectation is:
Figure PCTCN2017106502-appb-000100
According to the complex Gaussian model:
Figure PCTCN2017106502-appb-000101
where p(H0|Y) denotes the probability of speech absence H0 given Y, i.e. a conditional probability, under the binary hypothesis:
H0: Y = N, meaning that speech is absent
H1: Y = X + N, meaning that speech is present
According to the above binary hypothesis, E(X²|Y, H0) = 0.
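The role of the binary hypothesis can be written out with the law of total expectation. The identity below is generic probability theory applied to the two hypotheses H0 and H1; whether Figure PCTCN2017106502-appb-000100 is typeset in exactly this way cannot be confirmed from the text.

```latex
\begin{aligned}
E(X^2 \mid Y) &= p(H_1 \mid Y)\,E(X^2 \mid Y, H_1) + p(H_0 \mid Y)\,E(X^2 \mid Y, H_0) \\
              &= p(H_1 \mid Y)\,E(X^2 \mid Y, H_1),
              \qquad \text{since } E(X^2 \mid Y, H_0) = 0 .
\end{aligned}
```

The speech-absent term vanishes because X = 0 under H0, so only the speech-present conditional expectation has to be modelled.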
In the above formula,
Figure PCTCN2017106502-appb-000102
is the true speech variance, which in practice must itself be estimated, for example by the maximum likelihood method or the direct decision method. Alternatively, speech may be assumed to follow another model, such as a super-Gaussian model, for example the chi distribution:
Figure PCTCN2017106502-appb-000103
from which the following are derived:
Figure PCTCN2017106502-appb-000104
Figure PCTCN2017106502-appb-000105
Here
Figure PCTCN2017106502-appb-000106
is the confluent hypergeometric function. Because a transcendental function is involved, the overall calculation is relatively complex and is generally implemented by means such as table lookup.
The above analysis shows that the above formula for
Figure PCTCN2017106502-appb-000107
can be derived from the complex Gaussian model
Figure PCTCN2017106502-appb-000108
and the super-Gaussian model
Figure PCTCN2017106502-appb-000109
It should be noted that, in the embodiments of the present disclosure, the MMSE estimate of the estimated a priori SNR may be calculated directly with the above formula. The derivation of the conditional expectation need not be performed; only the corresponding steps are executed. The conditional expectation above merely explains the principle behind the implementation of the embodiments of the present disclosure.
Optionally, calculating the speech presence probability of the current audio frame includes:
calculating the speech presence probability of the current audio frame by the following formulas:
Figure PCTCN2017106502-appb-000110
Figure PCTCN2017106502-appb-000111
or
Figure PCTCN2017106502-appb-000112
where p(H1|Y) denotes the speech presence probability, p(H1) and p(H0) denote the a priori speech presence probability and the a priori speech absence probability, respectively,
Figure PCTCN2017106502-appb-000113
is a fixed value,
Figure PCTCN2017106502-appb-000114
denotes the a posteriori SNR estimate of the current audio frame, exp() is the exponential function, γmin and γmax are two empirical values with γmin < γmax, and pmax and pmin are two empirical values with pmin < pmax.
In this implementation, speech and noise are distinguished by the above formulas. When the speech presence probability is calculated with these formulas, the a posteriori SNR may be replaced by a value obtained by averaging or smoothing the a posteriori SNRs of the same frequency bin over the preceding frames. In addition, the above formulas may be derived directly from the complex Gaussian model given above.
In the embodiments of the present disclosure, the speech presence probability provides the probability that speech is present, so that the currently estimated a priori SNR can switch softly between pure-noise segments and speech segments. This mitigates the tracking delay of the direct decision method while retaining its advantages.
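A speech presence probability of the kind described (complex Gaussian model, fixed a priori SNR under H1) can be sketched from Bayes' rule. With Gaussian likelihoods the ratio p(Y|H0)/p(Y|H1) equals (1+ξ)·exp(−γ·ξ/(1+ξ)). This is the generic derivation and is not confirmed to match Figures PCTCN2017106502-appb-000110 to 112 term by term; the function name, the equal priors and the 15 dB default for the fixed value are assumptions.

```python
import math


def speech_presence_probability(gamma, xi_fixed_db=15.0, p_h1=0.5):
    """Posterior p(H1|Y) for one frequency bin under a complex Gaussian model.

    gamma: a posteriori SNR estimate (linear scale), optionally pre-smoothed
    xi_fixed_db: fixed a priori SNR assumed under H1, in dB (assumed default)
    p_h1: a priori speech presence probability; p(H0) = 1 - p(H1)
    """
    xi_fixed = 10.0 ** (xi_fixed_db / 10.0)
    p_h0 = 1.0 - p_h1
    # Likelihood ratio p(Y|H0) / p(Y|H1) for complex Gaussian speech and noise.
    ratio_h0_over_h1 = (1.0 + xi_fixed) * math.exp(-gamma * xi_fixed / (1.0 + xi_fixed))
    return 1.0 / (1.0 + (p_h0 / p_h1) * ratio_h0_over_h1)


# Illustration: a weak bin versus a strong bin.
p_weak = speech_presence_probability(0.5)
p_strong = speech_presence_probability(50.0)
```

The probability rises monotonically with γ, which is what lets it act as a soft switch between noise-like and speech-like frames.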
Optionally, estimating the final a priori SNR of the current audio frame by combining the speech presence probability with the estimated value includes:
estimating the final a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000115
where
Figure PCTCN2017106502-appb-000116
denotes the final a priori SNR of the current audio frame,
Figure PCTCN2017106502-appb-000117
denotes the MMSE estimate of the estimated a priori SNR, p(H1|Y) denotes the speech presence probability, and ξmin is a small value.
In this implementation, the above formula keeps the final a priori SNR at a stable small value, for example ξmin, in pure noise as far as possible, whereas in speech segments the estimated a priori SNR is biased toward
Figure PCTCN2017106502-appb-000118
or, equivalently, the estimated a priori SNR is biased toward
Figure PCTCN2017106502-appb-000119
In this implementation, the speech-present and speech-absent states can be distinguished. In the speech-present state, the optimal a priori SNR estimate is derived according to the MMSE criterion. In the speech-absent state, a minimum value is used as the limit on the maximum suppression, which keeps the processing of pure-noise segments smooth and reduces musical noise. The speech-present and speech-absent states are weighted using the speech presence probability, which is calculated with a fixed a priori SNR; this makes the a priori SNR estimate more accurate and addresses the tracking delay of the direct decision method.
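One way to realise the described behaviour (ξmin in pure noise, the MMSE estimate in speech, a soft transition in between) is a probability-weighted combination. The actual expression is Figure PCTCN2017106502-appb-000115 and is not reproduced in this text, so the linear weighting below is an assumption, as are the function name and the −20 dB default for ξmin.

```python
def final_a_priori_snr(xi_mmse, p_speech, xi_min_db=-20.0):
    """Blend the MMSE estimate with the floor xi_min using p(H1|Y) (assumed form).

    xi_mmse: MMSE estimate corresponding to the estimated a priori SNR (linear scale)
    p_speech: speech presence probability p(H1|Y) of the current frame
    xi_min_db: small floor value in dB (assumed default)
    """
    xi_min = 10.0 ** (xi_min_db / 10.0)
    return p_speech * xi_mmse + (1.0 - p_speech) * xi_min


# Illustration values only.
in_noise = final_a_priori_snr(2.0, p_speech=0.0)    # collapses to xi_min
in_speech = final_a_priori_snr(2.0, p_speech=1.0)   # equals the MMSE estimate
in_between = final_a_priori_snr(2.0, p_speech=0.5)
```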
It should be noted that the implementations described above may be combined with one another or implemented separately; the embodiments of the present disclosure are not limited in this respect. In addition, in the embodiments of the present disclosure, the estimated a priori SNR may be used for the gain calculation in the noise reduction of an audio signal, and optionally for the gain calculation in single-microphone noise reduction. For example, as shown in FIG. 2: the a posteriori SNR and the power spectrum of the previous frame's processing result are obtained; the estimated a priori SNR of the current audio frame is calculated from them using the direct decision method; the speech presence probability of the current audio frame is calculated from the a posteriori SNR; the MMSE estimate of the estimated a priori SNR is calculated; and the final a priori SNR of the current audio frame is estimated by combining the speech presence probability with the estimated value. This a priori SNR is then used for the gain calculation.
In the embodiments of the present disclosure, the above steps can eliminate the effect of the inherent one-frame delay and mitigate both the attenuation of speech onsets and the trailing at speech offsets, thereby improving noise reduction performance. The effect is illustrated below with experimental data:
The experiment uses the NOIZEUS database with a sampling rate of 8 kHz. White noise was generated with Cool Edit (an audio processing program); the other noises are those supplied with the NOIZEUS database. The frame length is 20 ms with an overlap of 50%, and a square-root Hanning window is applied both before and after processing.
Figure PCTCN2017106502-appb-000120
is set to 15 dB. ξmin is set to -20 dB, the suppression rule is the MMSE-STSA (Short-Time Spectral Amplitude) algorithm, and the noise estimate uses the unbiased MMSE algorithm.
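The framing parameters quoted for the experiment translate into concrete sample counts, sketched below. The periodic form of the Hanning window and the helper names are assumptions; the source only states 8 kHz, 20 ms, 50% overlap and a square-root Hanning window.

```python
import math

SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
OVERLAP = 0.5

frame_len = SAMPLE_RATE_HZ * FRAME_MS // 1000   # samples per frame
hop = int(frame_len * (1.0 - OVERLAP))          # frame advance in samples


def sqrt_hann(n_points):
    """Square root of a periodic Hann window: sqrt(0.5 * (1 - cos(2*pi*n/N))) = sin(pi*n/N)."""
    return [math.sin(math.pi * n / n_points) for n in range(n_points)]


window = sqrt_hann(frame_len)


def split_frames(signal, frame_len, hop, window):
    """Cut a list of samples into overlapping, windowed frames (no padding)."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + i] * window[i] for i in range(frame_len)])
    return frames


frames = split_frames([0.0] * 400, frame_len, hop, window)
```

Applying the same window at analysis and synthesis gives an effective Hann window, whose 50%-overlapped copies sum to one, so overlap-add reconstructs an unmodified signal.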
Figures 3 and 4 compare the direct decision method with the method of the present disclosure at SNRs of 0 dB and 5 dB, respectively. In Figure 3 the speech is sp01 and the noise is white noise; in Figure 4 the speech is sp04 and the noise is car noise, where sp01 and sp04 are utterance identifiers in the data set. At the positions marked by arrows, the method of the present disclosure is clearly better than the comparison algorithm. In subjective listening comparisons, musical noise is not noticeable in either processed result. Figure 5 shows the average segmental SNR improvement for the 30 sets of NOIZEUS data with car noise and white noise at 0/5/10/15 dB; the figure shows that the method of the present disclosure outperforms the direct decision method.
It should be noted that the above method can be applied to any user terminal equipped with a microphone, for example a mobile phone, a tablet personal computer, a laptop computer, a personal digital assistant (PDA), a mobile Internet device (MID), an in-vehicle device or a wearable device. The embodiments of the present disclosure do not limit the specific type of the user terminal.
In summary: the estimated a priori SNR of the current audio frame is estimated; the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame is calculated from the estimated a priori SNR; the speech presence probability of the current audio frame is calculated; and the final a priori SNR of the current audio frame is estimated by combining the speech presence probability with the estimated value. Because the final a priori SNR combines the speech presence probability of the current frame with the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame, the a priori SNR estimated by the embodiments of the present disclosure is more strongly correlated with the current audio frame than in the related art, which estimates it from the a priori SNR of the previous frame. This benefits noise suppression for the current audio frame.
Referring to FIG. 6, an embodiment of the present disclosure provides a user terminal. As shown in FIG. 6, the user terminal 600 includes the following modules:
a first estimation module 601, configured to estimate the estimated a priori SNR of the current audio frame;
a first calculation module 602, configured to calculate, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame;
a second calculation module 603, configured to calculate the speech presence probability of the current audio frame;
a second estimation module 604, configured to estimate the final a priori SNR of the current audio frame by combining the speech presence probability with the estimated value.
Optionally, the first estimation module 601 is configured to estimate the estimated a priori SNR of the current audio frame based on the a posteriori SNR estimate of the current audio frame.
Optionally, the first estimation module 601 is configured to estimate the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000121
where
Figure PCTCN2017106502-appb-000122
denotes the estimated a priori SNR, α is the smoothing factor,
Figure PCTCN2017106502-appb-000123
denotes the noise-reduction result of the previous frame,
Figure PCTCN2017106502-appb-000124
denotes the noise variance, and
Figure PCTCN2017106502-appb-000125
denotes the a posteriori SNR estimate of the current audio frame;
or,
the first estimation module 601 is configured to estimate the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000126
where
Figure PCTCN2017106502-appb-000127
denotes the estimated a priori SNR, α is the smoothing factor,
Figure PCTCN2017106502-appb-000128
is the a priori SNR of the previous frame, and
Figure PCTCN2017106502-appb-000129
denotes the a posteriori SNR estimate of the current frame.
Optionally, as shown in FIG. 7, the user terminal 600 further includes:
an adjustment module 605, configured to adjust, by the following formula, the smoothing factor required when estimating the estimated a priori SNR:
Figure PCTCN2017106502-appb-000130
where a1 and a2 are two preset smoothing factors with a1 > a2, and γth and ξth are two empirical thresholds.
Optionally, the first estimation module 601 is further configured to further estimate the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000131
or
Figure PCTCN2017106502-appb-000132
where
Figure PCTCN2017106502-appb-000133
denotes the estimated a priori SNR,
Figure PCTCN2017106502-appb-000134
and
Figure PCTCN2017106502-appb-000135
denote the estimated a priori SNR of the current audio frame when the smoothing factor is a1 and when the smoothing factor is a2, respectively, p(H1|Y) denotes the speech presence probability, and pth is a preset threshold.
Optionally, the first calculation module 602 is configured to calculate, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000136
where
Figure PCTCN2017106502-appb-000137
denotes the MMSE estimate corresponding to the estimated a priori SNR,
Figure PCTCN2017106502-appb-000138
denotes the estimated a priori SNR, and
Figure PCTCN2017106502-appb-000139
denotes the a posteriori SNR estimate of the current audio frame.
Optionally, the second calculation module 603 is configured to calculate the speech presence probability of the current audio frame by the following formulas:
Figure PCTCN2017106502-appb-000140
Figure PCTCN2017106502-appb-000141
or
Figure PCTCN2017106502-appb-000142
where p(H1|Y) denotes the speech presence probability, p(H1) and p(H0) denote the a priori speech presence probability and the a priori speech absence probability, respectively,
Figure PCTCN2017106502-appb-000143
is a fixed value,
Figure PCTCN2017106502-appb-000144
denotes the a posteriori SNR estimate of the current audio frame, exp() is the exponential function, γmin and γmax are two empirical values with γmin < γmax, and pmax and pmin are two empirical values with pmin < pmax.
Optionally, the second estimating module 604 is configured to estimate the final a priori signal-to-noise ratio of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000145
where
Figure PCTCN2017106502-appb-000146
denotes the final a priori signal-to-noise ratio of the current audio frame,
Figure PCTCN2017106502-appb-000147
denotes the MMSE estimate of the estimated a priori signal-to-noise ratio, p(H1|Y) denotes the speech presence probability, and ξ_min is a small value.
It should be noted that the user terminal 600 in this embodiment may be the user terminal corresponding to the speech signal noise reduction method provided by the method embodiments of the present disclosure. Any implementation in those method embodiments can be realized by the user terminal 600 of this embodiment with the same beneficial effects, and the details are not repeated here.
Referring to FIG. 8, an embodiment of the present disclosure provides another structure of a user terminal. The user terminal includes a processor 800, a transceiver 810, a memory 820, a user interface 830, and a bus interface, wherein:
The processor 800 is configured to read a program in the memory 820 and perform the following process:
estimating an estimated a priori signal-to-noise ratio of the current audio frame;
calculating, based on the estimated a priori signal-to-noise ratio, the MMSE estimate corresponding to the estimated a priori signal-to-noise ratio of the current audio frame;
calculating a speech presence probability of the current audio frame; and
estimating a final a priori signal-to-noise ratio of the current audio frame by combining the speech presence probability and the MMSE estimate.
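The four-step process can be read as a simple data flow for one frequency bin of the current frame. The Python sketch below is illustrative only: the function name `final_a_priori_snr` and the four callables are hypothetical placeholders, because the formulas themselves appear only as figures in this text.

```python
from typing import Callable


def final_a_priori_snr(
    gamma: float,
    estimate_prior: Callable[[float], float],
    mmse_of_prior: Callable[[float, float], float],
    speech_presence: Callable[[float], float],
    combine: Callable[[float, float], float],
) -> float:
    """Run the four steps in order for one bin (all callables are placeholders)."""
    xi_hat = estimate_prior(gamma)          # step 1: estimated a priori SNR
    xi_mmse = mmse_of_prior(xi_hat, gamma)  # step 2: MMSE estimate from xi_hat and gamma
    p_h1 = speech_presence(gamma)           # step 3: speech presence probability
    return combine(xi_mmse, p_h1)           # step 4: final a priori SNR


# Usage with trivial stand-in functions (not the patent's formulas).
xi = final_a_priori_snr(
    4.0,
    estimate_prior=lambda g: max(g - 1.0, 0.0),
    mmse_of_prior=lambda x, g: x,
    speech_presence=lambda g: 0.5,
    combine=lambda x, p: p * x,
)
```

Passing the steps as callables keeps the ordering explicit while leaving each formula open, which matches the "optionally" wording used for the individual formulas.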
The user interface 830 includes a microphone, and the transceiver 810 is configured to receive and send data under the control of the processor 800.
In FIG. 8, the bus architecture may include any number of interconnected buses and bridges, which link together various circuits of one or more processors, represented by the processor 800, and of memory, represented by the memory 820. The bus architecture may also link various other circuits, such as peripherals, voltage regulators, and power management circuits. The bus interface provides an interface. The transceiver 810 may be a plurality of elements, that is, it may include a transmitter and a receiver, and provides a unit for communicating with various other apparatuses over a transmission medium. Depending on the user equipment, the user interface 830 may also be an interface for connecting required devices externally or internally; the connected devices include, but are not limited to, a keypad, a display, a speaker, a microphone, and a joystick.
The processor 800 is responsible for managing the bus architecture and general processing, and the memory 820 may store data used by the processor 800 when performing operations.
Optionally, estimating the estimated a priori signal-to-noise ratio of the current audio frame includes:
estimating the estimated a priori signal-to-noise ratio of the current audio frame based on the a posteriori signal-to-noise ratio estimate of the current audio frame.
Optionally, estimating the estimated a priori signal-to-noise ratio of the current audio frame based on the a posteriori signal-to-noise ratio estimate of the current audio frame includes:
estimating the estimated a priori signal-to-noise ratio of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000148
where
Figure PCTCN2017106502-appb-000149
denotes the estimated a priori signal-to-noise ratio, α is the smoothing factor,
Figure PCTCN2017106502-appb-000150
denotes the noise reduction result of the previous frame,
Figure PCTCN2017106502-appb-000151
denotes the noise variance, and
Figure PCTCN2017106502-appb-000152
denotes the a posteriori signal-to-noise ratio estimate of the current audio frame;
or,
estimating the estimated a priori signal-to-noise ratio of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000153
where
Figure PCTCN2017106502-appb-000154
denotes the estimated a priori signal-to-noise ratio, α is the smoothing factor,
Figure PCTCN2017106502-appb-000155
is the a priori signal-to-noise ratio of the previous frame, and
Figure PCTCN2017106502-appb-000156
denotes the a posteriori signal-to-noise ratio estimate of the current frame.
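The two formulas are available here only as figures, but the symbols listed (a smoothing factor α, the previous frame's noise reduction result or a priori signal-to-noise ratio, the noise variance, and the current a posteriori signal-to-noise ratio) match the well-known decision-directed estimator. The Python sketch below assumes that form; the function names and the `max(gamma - 1, 0)` term are assumptions, not text recovered from the figures.

```python
def prior_snr_from_previous_output(alpha: float, prev_clean_power: float,
                                   noise_var: float, gamma: float) -> float:
    """Assumed first form: smooth |previous denoised bin|^2 / noise variance
    with the instantaneous estimate max(gamma - 1, 0)."""
    return alpha * prev_clean_power / noise_var + (1.0 - alpha) * max(gamma - 1.0, 0.0)


def prior_snr_from_previous_prior(alpha: float, prev_xi: float, gamma: float) -> float:
    """Assumed second form: smooth the previous frame's a priori SNR
    with the instantaneous estimate max(gamma - 1, 0)."""
    return alpha * prev_xi + (1.0 - alpha) * max(gamma - 1.0, 0.0)


# Example values are arbitrary and chosen only to make the arithmetic easy to follow.
xi_a = prior_snr_from_previous_output(alpha=0.5, prev_clean_power=2.0, noise_var=1.0, gamma=3.0)
xi_b = prior_snr_from_previous_prior(alpha=0.5, prev_xi=4.0, gamma=0.5)
```

A larger α gives a smoother but slower-tracking estimate, which is why the text treats the smoothing factor as an adjustable quantity.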
Optionally, the processor 800 is further configured to:
adjust the smoothing factor required for estimating the estimated a priori signal-to-noise ratio by the following formula:
Figure PCTCN2017106502-appb-000157
where a1 and a2 are two preset smoothing factors with a1 > a2, and γ_th and ξ_th are two empirical thresholds.
Optionally, the step of estimating the estimated a priori signal-to-noise ratio of the current audio frame based on the speech presence probability estimate further includes:
further estimating the estimated a priori signal-to-noise ratio of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000158
or
Figure PCTCN2017106502-appb-000159
where
Figure PCTCN2017106502-appb-000160
denotes the estimated a priori signal-to-noise ratio,
Figure PCTCN2017106502-appb-000161
and
Figure PCTCN2017106502-appb-000162
denote the estimated a priori signal-to-noise ratio of the current audio frame when the smoothing factor is a1 and when it is a2, respectively, p(H1|Y) denotes the speech presence probability, and p_th is a preset threshold.
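The selection formulas are only available as figures in this text, so the Python sketch below is a guess at their general shape rather than the patent's definition. One variant switches between the two estimates by comparing p(H1|Y) with p_th; the other blends them with p(H1|Y) as the weight. Which estimate is used on which side of the threshold, the blend itself, and the default `p_th` are all assumptions.

```python
def select_prior_snr(xi_a1: float, xi_a2: float, p_h1: float, p_th: float = 0.5) -> float:
    """Assumed hard switch: when speech is likely, use the lightly smoothed
    estimate (a2 < a1, faster tracking); otherwise use the heavily smoothed one."""
    return xi_a2 if p_h1 > p_th else xi_a1


def blend_prior_snr(xi_a1: float, xi_a2: float, p_h1: float) -> float:
    """Assumed soft alternative: weight the two estimates by the
    speech presence probability."""
    return p_h1 * xi_a2 + (1.0 - p_h1) * xi_a1
```

Under this reading, the larger factor a1 suppresses fluctuation in noise-only frames, while the smaller factor a2 lets the estimate follow speech onsets more quickly.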
Optionally, calculating, based on the estimated a priori signal-to-noise ratio, the MMSE estimate corresponding to the estimated a priori signal-to-noise ratio of the current audio frame includes:
calculating, based on the estimated a priori signal-to-noise ratio, the MMSE estimate corresponding to the estimated a priori signal-to-noise ratio of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000163
where
Figure PCTCN2017106502-appb-000164
denotes the MMSE estimate corresponding to the estimated a priori signal-to-noise ratio,
Figure PCTCN2017106502-appb-000165
denotes the estimated a priori signal-to-noise ratio, and
Figure PCTCN2017106502-appb-000166
denotes the a posteriori signal-to-noise ratio estimate of the current audio frame.
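The formula depends only on the estimated a priori signal-to-noise ratio and the a posteriori signal-to-noise ratio. As an illustration, the Python sketch below uses the standard result for complex Gaussian speech and noise, E[|X|²|Y]/σ² = G(1 + Gγ) with Wiener gain G = ξ/(1 + ξ). That this is the expression shown in the figure is an assumption, and the function name is hypothetical.

```python
def mmse_prior_snr(xi_hat: float, gamma: float) -> float:
    """MMSE estimate of the a priori SNR under a Gaussian model (assumed form).

    xi_hat: estimated a priori SNR (linear scale, >= 0)
    gamma:  a posteriori SNR estimate of the current frame (linear scale)
    """
    g = xi_hat / (1.0 + xi_hat)   # Wiener gain implied by xi_hat
    return g * (1.0 + g * gamma)  # posterior variance term + squared posterior mean term


# Arbitrary example values.
xi_mmse = mmse_prior_snr(xi_hat=1.0, gamma=2.0)
```

The result combines the prior belief (through G) with the current observation (through γ), so it reacts to the current frame more directly than the smoothed estimate alone.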
Optionally, calculating the speech presence probability of the current audio frame includes:
calculating the speech presence probability of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000167
Figure PCTCN2017106502-appb-000168
or
Figure PCTCN2017106502-appb-000169
where p(H1|Y) denotes the speech presence probability, p(H1) and p(H0) denote the a priori speech presence probability and the a priori speech absence probability, respectively,
Figure PCTCN2017106502-appb-000170
is a fixed value,
Figure PCTCN2017106502-appb-000171
denotes the a posteriori signal-to-noise ratio estimate of the current audio frame, exp() is the exponential function, γ_min and γ_max are two empirical values with γ_min < γ_max, and p_max and p_min are two empirical values with p_min < p_max.
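The two alternatives can be sketched in Python as follows. Both forms are assumptions inferred from the listed symbols, since the formulas appear only as figures: the first is a likelihood-ratio expression with a fixed a priori signal-to-noise ratio (called `xi_h1` here, a hypothetical name for the "fixed value"), and the second is a clamped linear map from the a posteriori signal-to-noise ratio to a probability. All default values are illustrative.

```python
import math


def spp_likelihood(gamma: float, p_h1: float = 0.5, p_h0: float = 0.5,
                   xi_h1: float = 31.6) -> float:
    """Assumed exponential form: posterior speech presence probability
    from the likelihood ratio with a fixed a priori SNR xi_h1."""
    ratio = (p_h0 / p_h1) * (1.0 + xi_h1) * math.exp(-gamma * xi_h1 / (1.0 + xi_h1))
    return 1.0 / (1.0 + ratio)


def spp_linear(gamma: float, gamma_min: float = 1.0, gamma_max: float = 10.0,
               p_min: float = 0.05, p_max: float = 0.95) -> float:
    """Assumed cheaper alternative: interpolate linearly between
    (gamma_min, p_min) and (gamma_max, p_max), clamping outside that range."""
    if gamma <= gamma_min:
        return p_min
    if gamma >= gamma_max:
        return p_max
    t = (gamma - gamma_min) / (gamma_max - gamma_min)
    return p_min + t * (p_max - p_min)
```

The linear variant avoids the exponential, which may matter on a user terminal with a limited processing budget; that motivation is an inference, not a statement from this section.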
Optionally, estimating the final a priori signal-to-noise ratio of the current audio frame by combining the speech presence probability and the MMSE estimate includes:
estimating the final a priori signal-to-noise ratio of the current audio frame by the following formula:
Figure PCTCN2017106502-appb-000172
where
Figure PCTCN2017106502-appb-000173
denotes the final a priori signal-to-noise ratio of the current audio frame,
Figure PCTCN2017106502-appb-000174
denotes the MMSE estimate of the estimated a priori signal-to-noise ratio, p(H1|Y) denotes the speech presence probability, and ξ_min is a small value.
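A minimal Python sketch of this last step follows, assuming the combination is a probability-weighted MMSE estimate floored at ξ_min. The product-and-floor form, the function name, and the default floor are assumptions; the figure itself is not recoverable from this text.

```python
def final_prior_snr(xi_mmse: float, p_h1: float, xi_min: float = 0.003) -> float:
    """Assumed combination: scale the MMSE estimate by the speech presence
    probability and never return less than the small floor xi_min."""
    return max(p_h1 * xi_mmse, xi_min)


# Arbitrary example values.
xi_speech = final_prior_snr(xi_mmse=2.0, p_h1=0.5)
xi_noise = final_prior_snr(xi_mmse=2.0, p_h1=0.0)
```

A floor of this kind is commonly used to limit musical noise, because it stops the resulting suppression gain from collapsing to zero in noise-only frames.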
It should be noted that the user terminal in this embodiment may be the user terminal corresponding to the speech signal noise reduction method provided by the method embodiments of the present disclosure. Any implementation in those method embodiments can be realized by the user terminal of this embodiment with the same beneficial effects, and the details are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may be physically separate, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform some of the steps of the transmitting and receiving method described in the embodiments of the present disclosure. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are preferred embodiments of the present disclosure. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles described in the present disclosure, and these improvements and refinements shall also be regarded as falling within the scope of protection of the present disclosure.

Claims (17)

  1. A signal-to-noise ratio estimation method for noise suppression, comprising:
    estimating an estimated a priori signal-to-noise ratio of a current audio frame;
    calculating, based on the estimated a priori signal-to-noise ratio, a minimum mean square error (MMSE) estimate corresponding to the estimated a priori signal-to-noise ratio of the current audio frame;
    calculating a speech presence probability of the current audio frame; and
    estimating a final a priori signal-to-noise ratio of the current audio frame by combining the speech presence probability and the MMSE estimate.
  2. The method according to claim 1, wherein estimating the estimated a priori signal-to-noise ratio of the current audio frame comprises:
    estimating the estimated a priori signal-to-noise ratio of the current audio frame based on an a posteriori signal-to-noise ratio estimate of the current audio frame.
  3. The method according to claim 2, wherein estimating the estimated a priori signal-to-noise ratio of the current audio frame based on the a posteriori signal-to-noise ratio estimate of the current audio frame comprises:
    estimating the estimated a priori signal-to-noise ratio of the current audio frame by the following formula:
    Figure PCTCN2017106502-appb-100001
    where
    Figure PCTCN2017106502-appb-100002
    denotes the estimated a priori signal-to-noise ratio, α is a smoothing factor,
    Figure PCTCN2017106502-appb-100003
    denotes the noise reduction result of the previous frame,
    Figure PCTCN2017106502-appb-100004
    denotes the noise variance, and
    Figure PCTCN2017106502-appb-100005
    denotes the a posteriori signal-to-noise ratio estimate of the current audio frame;
    or,
    estimating the estimated a priori signal-to-noise ratio of the current audio frame by the following formula:
    Figure PCTCN2017106502-appb-100006
    where
    Figure PCTCN2017106502-appb-100007
    denotes the estimated a priori signal-to-noise ratio, α is a smoothing factor,
    Figure PCTCN2017106502-appb-100008
    is the a priori signal-to-noise ratio of the previous frame, and
    Figure PCTCN2017106502-appb-100009
    denotes the a posteriori signal-to-noise ratio estimate of the current frame.
  4. The method according to claim 3, further comprising:
    adjusting the smoothing factor required for estimating the estimated a priori signal-to-noise ratio by the following formula:
    Figure PCTCN2017106502-appb-100010
    where a1 and a2 are two preset smoothing factors with a1 > a2, and γ_th and ξ_th are two empirical thresholds.
  5. The method according to claim 4, wherein the step of estimating the estimated a priori signal-to-noise ratio of the current audio frame based on the speech presence probability estimate further comprises:
    further estimating the estimated a priori signal-to-noise ratio of the current audio frame by the following formula:
    Figure PCTCN2017106502-appb-100011
    or
    Figure PCTCN2017106502-appb-100012
    where
    Figure PCTCN2017106502-appb-100013
    denotes the estimated a priori signal-to-noise ratio,
    Figure PCTCN2017106502-appb-100014
    and
    Figure PCTCN2017106502-appb-100015
    denote the estimated a priori signal-to-noise ratio of the current audio frame when the smoothing factor is a1 and when it is a2, respectively, p(H1|Y) denotes the speech presence probability, and p_th is a preset threshold.
  6. The method according to any one of claims 1 to 5, wherein calculating, based on the estimated a priori signal-to-noise ratio, the MMSE estimate corresponding to the estimated a priori signal-to-noise ratio of the current audio frame comprises:
    calculating, based on the estimated a priori signal-to-noise ratio, the MMSE estimate corresponding to the estimated a priori signal-to-noise ratio of the current audio frame by the following formula:
    Figure PCTCN2017106502-appb-100016
    where
    Figure PCTCN2017106502-appb-100017
    denotes the MMSE estimate corresponding to the estimated a priori signal-to-noise ratio,
    Figure PCTCN2017106502-appb-100018
    denotes the estimated a priori signal-to-noise ratio, and
    Figure PCTCN2017106502-appb-100019
    denotes the a posteriori signal-to-noise ratio estimate of the current audio frame.
  7. The method according to any one of claims 1 to 5, wherein calculating the speech presence probability of the current audio frame comprises:
    calculating the speech presence probability of the current audio frame by the following formula:
    Figure PCTCN2017106502-appb-100020
    Figure PCTCN2017106502-appb-100021
    or
    Figure PCTCN2017106502-appb-100022
    where p(H1|Y) denotes the speech presence probability, p(H1) and p(H0) denote the a priori speech presence probability and the a priori speech absence probability, respectively,
    Figure PCTCN2017106502-appb-100023
    is a fixed value,
    Figure PCTCN2017106502-appb-100024
    denotes the a posteriori signal-to-noise ratio estimate of the current audio frame, exp() is the exponential function, γ_min and γ_max are two empirical values with γ_min < γ_max, and p_max and p_min are two empirical values with p_min < p_max.
  8. The method according to any one of claims 1 to 5, wherein estimating the final a priori signal-to-noise ratio of the current audio frame by combining the speech presence probability and the MMSE estimate comprises:
    estimating the final a priori signal-to-noise ratio of the current audio frame by the following formula:
    Figure PCTCN2017106502-appb-100025
    where
    Figure PCTCN2017106502-appb-100026
    denotes the final a priori signal-to-noise ratio of the current audio frame,
    Figure PCTCN2017106502-appb-100027
    denotes the MMSE estimate of the estimated a priori signal-to-noise ratio, p(H1|Y) denotes the speech presence probability, and ξ_min is a small value.
  9. A user terminal, comprising:
    a first estimating module, configured to estimate an estimated a priori signal-to-noise ratio of a current audio frame;
    a first calculating module, configured to calculate, based on the estimated a priori signal-to-noise ratio, a minimum mean square error (MMSE) estimate corresponding to the estimated a priori signal-to-noise ratio of the current audio frame;
    a second calculating module, configured to calculate a speech presence probability of the current audio frame; and
    a second estimating module, configured to estimate a final a priori signal-to-noise ratio of the current audio frame by combining the speech presence probability and the MMSE estimate.
  10. The user terminal according to claim 9, wherein the first estimating module is configured to estimate the estimated a priori signal-to-noise ratio of the current audio frame based on an a posteriori signal-to-noise ratio estimate of the current audio frame.
  11. The user terminal according to claim 10, wherein the first estimating module is configured to estimate the estimated a priori signal-to-noise ratio of the current audio frame by the following formula:
    Figure PCTCN2017106502-appb-100028
    where
    Figure PCTCN2017106502-appb-100029
    denotes the estimated a priori signal-to-noise ratio, α is a smoothing factor,
    Figure PCTCN2017106502-appb-100030
    denotes the noise reduction result of the previous frame,
    Figure PCTCN2017106502-appb-100031
    denotes the noise variance, and
    Figure PCTCN2017106502-appb-100032
    denotes the a posteriori signal-to-noise ratio estimate of the current audio frame;
    or,
    the first estimating module is configured to estimate the estimated a priori signal-to-noise ratio of the current audio frame by the following formula:
    Figure PCTCN2017106502-appb-100033
    where
    Figure PCTCN2017106502-appb-100034
    denotes the estimated a priori signal-to-noise ratio, α is a smoothing factor,
    Figure PCTCN2017106502-appb-100035
    is the a priori signal-to-noise ratio of the previous frame, and
    Figure PCTCN2017106502-appb-100036
    denotes the a posteriori signal-to-noise ratio estimate of the current frame.
  12. The user terminal according to claim 11, further comprising:
    an adjusting module, configured to adjust the smoothing factor required for estimating the estimated a priori signal-to-noise ratio by the following formula:
    Figure PCTCN2017106502-appb-100037
    where a1 and a2 are two preset smoothing factors with a1 > a2, and γ_th and ξ_th are two empirical thresholds.
  13. The user terminal according to claim 12, wherein the first estimating module is further configured to further estimate the estimated a priori signal-to-noise ratio of the current audio frame by the following formula:
    Figure PCTCN2017106502-appb-100038
    or
    Figure PCTCN2017106502-appb-100039
    where
    Figure PCTCN2017106502-appb-100040
    denotes the estimated a priori signal-to-noise ratio,
    Figure PCTCN2017106502-appb-100041
    and
    Figure PCTCN2017106502-appb-100042
    denote the estimated a priori signal-to-noise ratio of the current audio frame when the smoothing factor is a1 and when it is a2, respectively, p(H1|Y) denotes the speech presence probability, and p_th is a preset threshold.
  14. The user terminal according to any one of claims 9 to 13, wherein the first calculating module is configured to calculate, based on the estimated a priori signal-to-noise ratio, the MMSE estimate corresponding to the estimated a priori signal-to-noise ratio of the current audio frame by the following formula:
    Figure PCTCN2017106502-appb-100043
    where
    Figure PCTCN2017106502-appb-100044
    denotes the MMSE estimate corresponding to the estimated a priori signal-to-noise ratio,
    Figure PCTCN2017106502-appb-100045
    denotes the estimated a priori signal-to-noise ratio, and
    Figure PCTCN2017106502-appb-100046
    denotes the a posteriori signal-to-noise ratio estimate of the current audio frame.
  15. The user terminal according to any one of claims 9 to 13, wherein the second calculating module is configured to calculate the speech presence probability of the current audio frame by the following formula:
    Figure PCTCN2017106502-appb-100047
    Figure PCTCN2017106502-appb-100048
    or
    Figure PCTCN2017106502-appb-100049
    where p(H1|Y) denotes the speech presence probability, p(H1) and p(H0) denote the a priori speech presence probability and the a priori speech absence probability, respectively,
    Figure PCTCN2017106502-appb-100050
    is a fixed value,
    Figure PCTCN2017106502-appb-100051
    denotes the a posteriori signal-to-noise ratio estimate of the current audio frame, exp() is the exponential function, γ_min and γ_max are two empirical values with γ_min < γ_max, and p_max and p_min are two empirical values with p_min < p_max.
  16. The user terminal according to any one of claims 9 to 13, wherein the second estimating module is configured to estimate the final a priori signal-to-noise ratio of the current audio frame by the following formula:
    Figure PCTCN2017106502-appb-100052
    where
    Figure PCTCN2017106502-appb-100053
    denotes the final a priori signal-to-noise ratio of the current audio frame,
    Figure PCTCN2017106502-appb-100054
    denotes the MMSE estimate of the estimated a priori signal-to-noise ratio, p(H1|Y) denotes the speech presence probability, and ξ_min is a small value.
  17. A user terminal, comprising a processor, a memory, and a transceiver, wherein:
    the processor is configured to read a program in the memory and perform the following process:
    estimating an estimated a priori signal-to-noise ratio of a current audio frame;
    calculating, based on the estimated a priori signal-to-noise ratio, a minimum mean square error (MMSE) estimate corresponding to the estimated a priori signal-to-noise ratio of the current audio frame;
    calculating a speech presence probability of the current audio frame; and
    estimating a final a priori signal-to-noise ratio of the current audio frame by combining the speech presence probability and the MMSE estimate,
    wherein the transceiver is configured to receive and send data, and the memory is capable of storing data used by the processor when performing operations.
PCT/CN2017/106502 2016-11-10 2017-10-17 Method for estimating signal-to-noise ratio for noise suppression, and user terminal WO2018086444A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611039463.4A CN108074582B (en) 2016-11-10 2016-11-10 Noise suppression signal-to-noise ratio estimation method and user terminal
CN201611039463.4 2016-11-10

Publications (1)

Publication Number Publication Date
WO2018086444A1 (en) 2018-05-17

Family

ID=62109133

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/106502 WO2018086444A1 (en) 2016-11-10 2017-10-17 Method for estimating signal-to-noise ratio for noise suppression, and user terminal

Country Status (2)

Country Link
CN (1) CN108074582B (en)
WO (1) WO2018086444A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111986693A (en) * 2020-08-10 2020-11-24 北京小米松果电子有限公司 Audio signal processing method and device, terminal equipment and storage medium
US20210327448A1 (en) * 2018-12-18 2021-10-21 Tencent Technology (Shenzhen) Company Limited Speech noise reduction method and apparatus, computing device, and computer-readable storage medium
CN113838474A (en) * 2021-11-25 2021-12-24 全时云商务服务股份有限公司 Communication system howling suppression method and device
CN114724571A (en) * 2022-03-29 2022-07-08 大连理工大学 Robust distributed speaker noise elimination system

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109767781A (en) * 2019-03-06 2019-05-17 哈尔滨工业大学(深圳) Speech separating method, system and storage medium based on super-Gaussian priori speech model and deep learning
CN109817234B (en) * 2019-03-06 2021-01-26 哈尔滨工业大学(深圳) Target speech signal enhancement method, system and storage medium based on continuous noise tracking
CN111899752B (en) * 2020-07-13 2023-01-10 紫光展锐(重庆)科技有限公司 Noise suppression method and device for rapidly calculating voice existence probability, storage medium and terminal
CN112969130A (en) * 2020-12-31 2021-06-15 维沃移动通信有限公司 Audio signal processing method and device and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1763846A (en) * 2005-11-23 2006-04-26 北京中星微电子有限公司 Voice gain factor estimating device and method
WO2006136900A1 (en) * 2005-06-15 2006-12-28 Nortel Networks Limited Method and apparatus for non-intrusive single-ended voice quality assessment in voip
CN103187068A (en) * 2011-12-30 2013-07-03 联芯科技有限公司 Priori signal-to-noise ratio estimation method, device and noise inhibition method based on Kalman
CN105280193A (en) * 2015-07-20 2016-01-27 广东顺德中山大学卡内基梅隆大学国际联合研究院 Prior signal-to-noise ratio estimating method based on MMSE error criterion
CN105702262A (en) * 2014-11-28 2016-06-22 上海航空电器有限公司 Headset double-microphone voice enhancement method

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101814290A (en) * 2009-02-25 2010-08-25 三星电子株式会社 Method for enhancing robustness of voice recognition system
CN101853665A (en) * 2009-06-18 2010-10-06 博石金(北京)信息技术有限公司 Method for eliminating noise in voice
JP6129316B2 (en) * 2012-09-03 2017-05-17 フラウンホーファー−ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン Apparatus and method for providing information-based multi-channel speech presence probability estimation
CN102938254B (en) * 2012-10-24 2014-12-10 中国科学技术大学 Voice signal enhancement system and method
US9449610B2 (en) * 2013-11-07 2016-09-20 Continental Automotive Systems, Inc. Speech probability presence modifier improving log-MMSE based noise suppression performance
US9449609B2 (en) * 2013-11-07 2016-09-20 Continental Automotive Systems, Inc. Accurate forward SNR estimation based on MMSE speech probability presence
CN103646648B (en) * 2013-11-19 2016-03-23 清华大学 A kind of noise power estimation method
CN105741849B (en) * 2016-03-06 2019-03-22 北京工业大学 The sound enhancement method of phase estimation and human hearing characteristic is merged in digital deaf-aid

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210327448A1 (en) * 2018-12-18 2021-10-21 Tencent Technology (Shenzhen) Company Limited Speech noise reduction method and apparatus, computing device, and computer-readable storage medium
CN111986693A (en) * 2020-08-10 2020-11-24 北京小米松果电子有限公司 Audio signal processing method and device, terminal equipment and storage medium
CN113838474A (en) * 2021-11-25 2021-12-24 全时云商务服务股份有限公司 Communication system howling suppression method and device
CN113838474B (en) * 2021-11-25 2022-02-18 全时云商务服务股份有限公司 Communication system howling suppression method and device
CN114724571A (en) * 2022-03-29 2022-07-08 大连理工大学 Robust distributed speaker noise elimination system
CN114724571B (en) * 2022-03-29 2024-05-03 大连理工大学 Robust distributed speaker noise elimination system

Also Published As

Publication number Publication date
CN108074582B (en) 2021-08-06
CN108074582A (en) 2018-05-25

Similar Documents

Publication Publication Date Title
WO2018086444A1 (en) Method for estimating signal-to-noise ratio for noise suppression, and user terminal
US20210327448A1 (en) Speech noise reduction method and apparatus, computing device, and computer-readable storage medium
US20230298610A1 (en) Noise suppression method and apparatus for quickly calculating speech presence probability, and storage medium and terminal
US8239196B1 (en) System and method for multi-channel multi-feature speech/noise classification for noise suppression
CN110634497B (en) Noise reduction method and device, terminal equipment and storage medium
WO2021179424A1 (en) Speech enhancement method combined with ai model, system, electronic device and medium
US8483398B2 (en) Methods and systems for reducing acoustic echoes in multichannel communication systems by reducing the dimensionality of the space of impulse responses
JP6361156B2 (en) Noise estimation apparatus, method and program
AU2015240992B2 (en) Situation dependent transient suppression
WO2021128670A1 (en) Noise reduction method, device, electronic apparatus and readable storage medium
WO2012158156A1 (en) Noise supression method and apparatus using multiple feature modeling for speech/noise likelihood
CN109817234A (en) Targeted voice signal Enhancement Method, system and storage medium based on continuing noise tracking
CN109727607B (en) Time delay estimation method and device and electronic equipment
WO2020124325A1 (en) Echo elimination adaptive filtering method, apparatus, device and storage medium
US20140357326A1 (en) Echo suppression
WO2022218254A1 (en) Voice signal enhancement method and apparatus, and electronic device
WO2019119593A1 (en) Voice enhancement method and apparatus
WO2012166092A1 (en) Control of adaptation step size and suppression gain in acoustic echo control
WO2021143249A1 (en) Transient noise suppression-based audio processing method, apparatus, device, and medium
CN112289337B (en) Method and device for filtering residual noise after machine learning voice enhancement
WO2021007841A1 (en) Noise estimation method, noise estimation apparatus, speech processing chip and electronic device
CN113763975B (en) Voice signal processing method, device and terminal
CN113611319A (en) Wind noise suppression method, device, equipment and system based on voice component
CN116913306A (en) Voice enhancement method and device and electronic equipment
CN116453538A (en) Voice noise reduction method and device

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application (Ref document number: 17869048; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 17869048; Country of ref document: EP; Kind code of ref document: A1)