WO2018086444A1 - Method for estimating signal-to-noise ratio for noise suppression, and user terminal - Google Patents
- Publication number: WO2018086444A1
- Application number: PCT/CN2017/106502
- Authority: WO, WIPO (PCT)
- Prior art keywords: noise ratio, estimated, audio frame, current audio, signal
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
Definitions
- The present disclosure relates to the field of voice technologies, and in particular to a method for estimating signal-to-noise ratio for noise suppression and a user terminal.
- A single-microphone noise reduction method is generally used in a user terminal to perform noise reduction on an audio signal.
- The method mainly includes the following steps:
- the noisy speech is transformed into a frequency-domain signal Y by fast Fourier transform (FFT);
- the noise-reduced frequency-domain signal is transformed back into a time-domain signal by inverse fast Fourier transform (IFFT).
- The a priori signal-to-noise ratio (SNR) is estimated using the direct decision method, that is, by the following formula:
- where ξ̂(m) is the estimate of the a priori signal-to-noise ratio of the current frame m, and the smoothing factor α usually needs to be close to 1, specifically 0.95 to 1.
- With α this close to 1, the estimated a priori SNR is heavily biased towards the noise reduction result of the previous frame, which can be seen as an instantaneous value of the speech variance of the previous frame. Therefore, the a priori SNR ξ̂ estimated by the above formula is not really an estimate of the a priori SNR ξ(m) of the current frame, and is better regarded as an estimate of the a priori SNR ξ(m−1) of the previous frame. In other words, the a priori SNR currently estimated for the current audio frame correlates poorly with the current audio frame, which is not conducive to noise suppression of the current audio frame.
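The bias toward the previous frame can be illustrated with a minimal Python sketch. It assumes the standard decision-directed form of the direct decision rule (a weighted sum of the previous frame's enhanced power and the floored current-frame term); the function name, argument names, and the flooring at zero are illustrative assumptions, not taken from the disclosure.

```python
def decision_directed_snr(prev_clean_power, noise_var, post_snr, alpha=0.98):
    """A priori SNR for one frequency bin (linear scale).

    ASSUMPTION: standard decision-directed rule; names are hypothetical.
    prev_clean_power: power of the previous frame's noise-reduced output
    noise_var:        noise variance estimate for this bin
    post_snr:         a posteriori SNR of the current frame
    alpha:            smoothing factor, close to 1 in the classic setting
    """
    if noise_var <= 0:
        raise ValueError("noise_var must be positive")
    # First half: instantaneous speech power of the previous frame.
    previous_term = prev_clean_power / noise_var
    # Second half: current-frame term, floored at zero.
    current_term = max(post_snr - 1.0, 0.0)
    return alpha * previous_term + (1.0 - alpha) * current_term


# Speech onset: the previous frame was noise only, the current frame is strong speech.
xi_slow = decision_directed_snr(0.0, 1.0, 11.0, alpha=0.98)
xi_fast = decision_directed_snr(0.0, 1.0, 11.0, alpha=0.3)
print(xi_slow, xi_fast)
```

With the smoothing factor at 0.98 the estimate barely reacts to the onset, while a small factor tracks it almost immediately, which is the trade-off the text describes.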
- The purpose of the present disclosure is to provide a method for estimating signal-to-noise ratio for noise suppression and a user terminal, which solve the problem that the estimated a priori signal-to-noise ratio of the current audio frame correlates poorly with the current audio frame, which is disadvantageous to noise suppression of the current audio frame.
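- An embodiment of the present disclosure provides a method for estimating signal-to-noise ratio for noise suppression, including:
- estimating an estimated a priori signal-to-noise ratio of the current audio frame;
- calculating, according to the estimated a priori signal-to-noise ratio, an estimated value of the minimum mean square error (MMSE) corresponding to the estimated a priori signal-to-noise ratio of the current audio frame;
- calculating a voice existence probability of the current audio frame; and
- estimating a final a priori signal-to-noise ratio of the current audio frame in conjunction with the voice existence probability and the estimated value.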
- The estimating of the estimated a priori signal-to-noise ratio of the current audio frame includes:
- estimating the estimated a priori signal-to-noise ratio of the current audio frame based on the a posteriori signal-to-noise ratio estimate of the current audio frame.
- The estimated a priori SNR of the current audio frame is estimated by the following formula:
- or by the following formula:
- The method further includes adjusting the smoothing factor required to estimate the estimated a priori signal-to-noise ratio by the following formula:
- where a_1 and a_2 are two preset smoothing factors with a_1 > a_2, and the two quantities with subscript th are empirical thresholds.
- The step of estimating the estimated a priori signal-to-noise ratio of the current audio frame further comprises estimating it based on the voice existence probability:
- the estimated a priori signal-to-noise ratio of the current audio frame is further estimated by the following formula:
- where the symbols denote the estimated a priori signal-to-noise ratio and the a priori SNR estimates of the current audio frame obtained with smoothing factor a_1 and with smoothing factor a_2, respectively, and p is the voice existence probability.
- the calculating, according to the estimated a priori signal to noise ratio, an estimated value of a minimum mean square error corresponding to the estimated a priori signal to noise ratio of the current audio frame including:
- where the symbols of the formula denote, respectively, the estimated value of the minimum mean square error corresponding to the estimated a priori signal-to-noise ratio, the estimated a priori signal-to-noise ratio, and the a posteriori signal-to-noise ratio estimate of the current audio frame.
- the calculating a voice existence probability of the current audio frame includes:
- where p(H_1|Y) represents the voice existence probability;
- p(H_1) and p(H_0) respectively represent the a priori probability of voice presence and the a priori probability of voice absence;
- exp() is the exponential function;
- the two quantities with subscripts min and max are empirical values, and p_max and p_min are likewise two empirical values.
- the estimating the final a priori signal to noise ratio of the current audio frame by combining the voice existence probability and the estimated value including:
- the final a priori signal to noise ratio of the current audio frame is estimated by the following formula:
- the embodiment of the present disclosure further provides a user terminal, including:
- a first estimating module configured to estimate an estimated a priori signal to noise ratio of the current audio frame
- a first calculating module configured to calculate an estimated value of the MMSE corresponding to the estimated a priori signal to noise ratio of the current audio frame according to the estimated a priori signal to noise ratio;
- a second calculating module configured to calculate a voice existence probability of the current audio frame
- a second estimating module configured to estimate a final a priori signal to noise ratio of the current audio frame in combination with the voice presence probability and the estimated value.
- the first estimation module is configured to estimate an estimated a priori signal to noise ratio of the current audio frame based on the a posteriori signal to noise ratio estimation value of the current audio frame.
- the first estimation module is configured to estimate an estimated a priori signal to noise ratio of the current audio frame by using the following formula:
- the first estimation module is configured to estimate an estimated a priori signal to noise ratio of the current audio frame by using the following formula:
- the user terminal further includes:
- An adjustment module for adjusting a smoothing number required to estimate the estimated a priori signal to noise ratio by the following formula:
- where a_1 and a_2 are two preset smoothing factors with a_1 > a_2, and the two quantities with subscript th are empirical thresholds.
- the first estimation module is further configured to further estimate an estimated a priori signal to noise ratio of the current audio frame by using the following formula:
- the first calculating module is configured to calculate, according to the estimated a priori signal to noise ratio, an estimated value of the MMSE corresponding to the estimated a priori signal to noise ratio of the current audio frame by using:
- where the symbols of the formula denote, respectively, the estimated value of the minimum mean square error corresponding to the estimated a priori signal-to-noise ratio, the estimated a priori signal-to-noise ratio, and the a posteriori signal-to-noise ratio estimate of the current audio frame.
- the second calculating module is configured to calculate a voice existence probability of the current audio frame by using the following formula:
- where p(H_1|Y) represents the voice existence probability;
- p(H_1) and p(H_0) respectively represent the a priori probability of voice presence and the a priori probability of voice absence;
- exp() is the exponential function;
- the two quantities with subscripts min and max are empirical values, and p_max and p_min are likewise two empirical values.
- the second estimation module is configured to estimate a final a priori signal to noise ratio of the current audio frame by using the following formula:
- the embodiment of the present disclosure further provides a user terminal, including: a processor, a memory, and a transceiver, where:
- the processor is configured to read a program in the memory and perform the following process:
- the transceiver is configured to receive and transmit data
- the memory is capable of storing data used by the processor when performing operations.
- In the embodiments of the present disclosure, the final a priori signal-to-noise ratio is estimated by combining the voice existence probability of the current frame with the minimum mean square error estimate corresponding to the estimated a priori SNR of the current audio frame. Compared with the prior art, which estimates the a priori signal-to-noise ratio mainly from the previous frame, the a priori signal-to-noise ratio estimated by the embodiments of the present disclosure is more strongly correlated with the current audio frame, thereby facilitating noise suppression of the current audio frame.
- FIG. 1 is a schematic flowchart diagram of a noise suppression signal to noise ratio estimation method according to an embodiment of the present disclosure
- FIG. 2 is a schematic diagram of another noise suppression signal to noise ratio estimation method according to an embodiment of the present disclosure
- FIG. 3 is a schematic diagram of experimental data of a noise suppression signal to noise ratio estimation method according to an embodiment of the present disclosure
- FIG. 4 is a schematic diagram of another experimental data of a noise suppression signal to noise ratio estimation method according to an embodiment of the present disclosure
- FIG. 5 is a schematic diagram of another experimental data of a noise suppression signal to noise ratio estimation method according to an embodiment of the present disclosure
- FIG. 6 is a schematic structural diagram of a user terminal according to an embodiment of the present disclosure.
- FIG. 7 is a schematic structural diagram of another user terminal according to an embodiment of the present disclosure.
- FIG. 8 is a schematic structural diagram of another user terminal according to an embodiment of the present disclosure.
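- An embodiment of the present disclosure provides a method for estimating signal-to-noise ratio for noise suppression, as shown in FIG. 1, including the following steps:
- Step 101: estimating an estimated a priori signal-to-noise ratio of the current audio frame;
- Step 102: calculating, according to the estimated a priori signal-to-noise ratio, an estimated value of the minimum mean square error (MMSE) corresponding to the estimated a priori signal-to-noise ratio of the current audio frame;
- Step 103: calculating a voice existence probability of the current audio frame;
- Step 104: estimating a final a priori signal-to-noise ratio of the current audio frame in conjunction with the voice existence probability and the estimated value.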
- the current audio frame may be a current frame collected by a microphone of the user terminal, and the current frame may be a voice frame or a noise frame.
- the above-mentioned estimated a priori signal-to-noise ratio may be an a priori signal-to-noise ratio estimated by a direct decision method or a maximum likelihood method.
- The estimated value of the MMSE corresponding to the estimated a priori SNR may be obtained by applying the MMSE algorithm to the estimated a priori SNR.
- The voice existence probability of the current audio frame may be calculated from the a posteriori signal-to-noise ratio of the current audio frame, or from a value obtained by averaging or smoothing the a posteriori signal-to-noise ratios of the same frequency bin over the preceding frames.
- The order of step 101 and step 103 is not limited: step 103 may be performed before step 101, or step 101 before step 103.
- The final a priori signal-to-noise ratio of the current audio frame may be understood as the a priori signal-to-noise ratio used for gain calculation in the process of performing noise reduction on the audio frame, or as the a priori signal-to-noise ratio of the current audio frame output by the embodiments of the present disclosure.
- Estimating the final a priori signal-to-noise ratio of the current audio frame according to the voice existence probability and the estimated value may proceed as follows. The probability that the current audio frame is a voice frame is determined from the voice existence probability. When the current audio frame is determined to be a pure noise frame, the final a priori SNR is set to a stable minimum, such as ξ_min, to ensure smooth processing of pure-noise segments and to reduce musical noise. When the current audio frame is determined to be a frame within a speech segment, the final a priori SNR is biased toward the minimum mean square error estimate corresponding to the estimated a priori SNR, so that the final a priori SNR estimate is more accurate.
- In this way, the final a priori SNR is estimated from the voice existence probability of the current frame and the minimum mean square error estimate corresponding to the estimated a priori SNR of the current audio frame, so the estimated a priori SNR is more highly correlated with the current audio frame, which benefits noise suppression of the current audio frame and improves the noise suppression effect.
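The soft decision described above can be sketched in Python as follows. The exact combining formula is not reproduced in this text, so the linear interpolation below is an assumption chosen only because it matches the described behaviour (the floor in pure noise, the MMSE estimate in speech); the function and argument names are hypothetical.

```python
def final_a_priori_snr(p_speech, xi_mmse, xi_min):
    """Blend the MMSE-based estimate with a fixed floor (linear scale).

    ASSUMPTION: linear interpolation weighted by the voice existence
    probability; the disclosure's own formula may differ.
    """
    if not 0.0 <= p_speech <= 1.0:
        raise ValueError("p_speech must lie in [0, 1]")
    return p_speech * xi_mmse + (1.0 - p_speech) * xi_min


XI_MIN = 10.0 ** (-20.0 / 10.0)  # a floor of -20 dB expressed on a linear scale

noise_frame = final_a_priori_snr(0.0, 5.0, XI_MIN)   # stays at the floor
speech_frame = final_a_priori_snr(1.0, 5.0, XI_MIN)  # follows the MMSE estimate
print(noise_frame, speech_frame)
```

Keeping pure-noise frames pinned to a constant floor is what avoids the frame-to-frame gain fluctuations heard as musical noise.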
- The estimating of the estimated a priori signal-to-noise ratio of the current audio frame includes:
- estimating the estimated a priori signal-to-noise ratio of the current audio frame based on the a posteriori signal-to-noise ratio estimate of the current audio frame.
- The a posteriori signal-to-noise ratio of the current audio frame is common knowledge and is not described in detail herein.
- This estimation may use the direct decision method, based on the a posteriori signal-to-noise ratio estimate of the current audio frame; of course, the embodiments of the present disclosure are not limited to this.
- the estimating the a priori signal to noise ratio of the current audio frame based on the a posteriori signal to noise ratio estimation value of the current audio frame including:
- the estimated a priori SNR of the current audio frame is estimated by the following formula:
- The estimated a priori signal-to-noise ratio can be estimated by either of the above two formulas. According to experiments, one of the two formulas is better for calculating the estimated a priori signal-to-noise ratio, mainly because it produces less musical noise, so the embodiments of the present disclosure optionally use that formula.
- The smoothing factor may be a preset value, for example a value from 0.95 to 1, or 0.98 or 0.3, without limitation. The noise variance is common knowledge and is not described in detail.
- The foregoing method further includes adjusting the smoothing factor required to estimate the estimated a priori signal-to-noise ratio by the following formula:
- where a_1 and a_2 are two preset smoothing factors with a_1 > a_2, and the two quantities with subscript th are empirical thresholds.
- The smoothing factor needs to be as large as possible in pure noise, so that the estimate is as stable as possible, and as small as possible in voice segments, to ensure fast tracking of the voice.
- For example, a_1 and a_2 may be 0.98 and 0.3, respectively. The embodiments of the present disclosure do not limit this; they may also be, for example, 0.95 and 0.28, and may be adjusted according to actual conditions.
- the accuracy of estimating the a priori signal to noise ratio can be improved by the above a 1 and a 2 .
- the step of estimating the estimated a priori signal to noise ratio of the current audio frame based on the estimated probability of existence of the voice further comprising:
- the estimated a priori signal to noise ratio of the current audio frame is further estimated by the following formula:
- where the symbols denote the estimated a priori signal-to-noise ratio and the a priori SNR estimates of the current audio frame obtained with smoothing factor a_1 and with smoothing factor a_2, respectively, and p is the voice existence probability.
- the estimated a priori signal to noise ratio may be switched according to the audio presence probability of the current audio frame to improve the accuracy of the estimated a priori signal to noise ratio.
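Both ways of choosing the smoothing factor can be sketched in Python. The formulas themselves are not reproduced in this text, so everything below is an assumption: the hard switch uses a single hypothetical threshold on the a posteriori SNR (the disclosure mentions two thresholds), the soft switch is a probability-weighted mix of the two estimates, and the underlying estimator is the standard decision-directed rule. Only the example values 0.98 and 0.3 come from the text.

```python
def decision_directed(prev_clean_power, noise_var, post_snr, alpha):
    """Standard decision-directed a priori SNR for one bin (assumed form)."""
    return (alpha * prev_clean_power / noise_var
            + (1.0 - alpha) * max(post_snr - 1.0, 0.0))


def hard_switched_alpha(post_snr, post_th=2.0, a1=0.98, a2=0.3):
    """Hypothetical hard switch: large factor in noise, small one in speech.

    ASSUMPTION: a single threshold post_th on the a posteriori SNR.
    """
    return a1 if post_snr < post_th else a2


def soft_switched_snr(prev_clean_power, noise_var, post_snr, p_speech,
                      a1=0.98, a2=0.3):
    """Hypothetical soft switch between the a1-based and a2-based estimates."""
    xi_a1 = decision_directed(prev_clean_power, noise_var, post_snr, a1)
    xi_a2 = decision_directed(prev_clean_power, noise_var, post_snr, a2)
    # High speech probability favours the fast-tracking (a2) estimate.
    return p_speech * xi_a2 + (1.0 - p_speech) * xi_a1


print(hard_switched_alpha(1.0), hard_switched_alpha(10.0))
print(soft_switched_snr(0.0, 1.0, 11.0, p_speech=0.5))
```

The soft variant avoids an abrupt jump in the estimate when a frame sits near the decision boundary.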
- Calculating, according to the estimated a priori signal-to-noise ratio, the estimated value of the minimum mean square error corresponding to the estimated a priori signal-to-noise ratio of the current audio frame includes:
- where the symbols of the formula denote, respectively, the estimated value of the minimum mean square error corresponding to the estimated a priori signal-to-noise ratio, the estimated a priori signal-to-noise ratio, and the a posteriori signal-to-noise ratio estimate of the current audio frame.
- The estimated a priori signal-to-noise ratio calculated in step 101 is not limited to the one calculated by the formulas mentioned above.
- A super-Gaussian model of speech can also be used to calculate E(X²|Y).
- The a priori SNR mainly requires estimating the variance of the speech signal. By definition, this variance depends only on the speech signal X. But X is not available, so most estimation algorithms have to estimate it from the noisy signal Y. This can also be seen from the direct decision method: in the second half of its formula, γ − 1 is the maximum likelihood estimate of the speech variance for the case where γ is known (i.e., Y is known), and the first half uses an instantaneous value in place of E(X²).
- In the embodiments of the present disclosure, conditional expectations are employed to estimate the speech variance. From the definition of conditional expectation, the corresponding quantity is actually the MMSE estimate of the squared speech amplitude spectrum X². The voice existence probability p(H_1|Y) is also taken into account.
- The formula above can be derived from the complex Gaussian model or from a super-Gaussian model.
- The estimated value of the minimum mean square error corresponding to the estimated a priori signal-to-noise ratio may be calculated directly with the above formula, without carrying out the derivation of the conditional expectation when performing the corresponding steps. That is, the conditional expectation above merely explains the principle behind the implementation of the embodiments of the present disclosure.
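As an illustration of the conditional-expectation idea, the following Python sketch evaluates the MMSE estimate of the speech power under the complex Gaussian model, normalized by the noise variance. This is the standard textbook result for that model and is offered as an assumption about what the lost formula expresses; the function name and arguments are hypothetical.

```python
def mmse_power_snr(xi, gamma):
    """E(|X|^2 | Y) / noise variance under a complex Gaussian model.

    xi:    estimated a priori SNR (linear scale, >= 0)
    gamma: a posteriori SNR of the current frame (linear scale, >= 0)

    With G = xi / (1 + xi), the posterior of X given Y has mean G*Y and
    variance G * noise_var, so E(|X|^2 | Y) = G^2 |Y|^2 + G * noise_var.
    """
    if xi < 0 or gamma < 0:
        raise ValueError("xi and gamma must be non-negative")
    wiener = xi / (1.0 + xi)
    return wiener * wiener * gamma + wiener


print(mmse_power_snr(1.0, 2.0))
```

Because the result depends on the current frame's a posteriori SNR, it reacts to the current frame even when the input a priori SNR still lags behind.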
- the calculating a voice existence probability of the current audio frame includes:
- where p(H_1|Y) represents the voice existence probability;
- p(H_1) and p(H_0) respectively represent the a priori probability of voice presence and the a priori probability of voice absence;
- exp() is the exponential function;
- the two quantities with subscripts min and max are empirical values, and p_max and p_min are likewise two empirical values.
- Speech and noise are distinguished by the above formula.
- When the above formula is used, the voice existence probability of the current audio frame can also be calculated from a value obtained by averaging or smoothing the a posteriori signal-to-noise ratios of the same frequency bin over the preceding frames. Additionally, the above formula may be derived directly from the complex Gaussian model provided above.
- The voice existence probability allows the estimated a priori signal-to-noise ratio to be soft-switched between pure-noise and voice segments, thereby alleviating the tracking delay problem of the direct decision method while retaining its advantages.
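A minimal Python sketch of a voice existence probability derived from the complex Gaussian model is given below. It assumes the usual likelihood-ratio form with a fixed a priori SNR under the speech hypothesis; the 15 dB default, the equal priors, and the clipping to [p_min, p_max] are illustrative assumptions, since the disclosure's formula and the exact role of its empirical values are not reproduced in this text.

```python
import math


def speech_presence_probability(gamma, xi_h1_db=15.0, p_h1=0.5,
                                p_min=0.01, p_max=0.99):
    """p(H1 | Y) for one bin under a complex Gaussian model (assumed form).

    gamma:    a posteriori SNR (linear scale, >= 0), possibly smoothed
    xi_h1_db: fixed a priori SNR assumed when speech is present, in dB
    p_h1:     a priori probability of speech presence
    """
    xi = 10.0 ** (xi_h1_db / 10.0)
    p_h0 = 1.0 - p_h1
    # Inverse of the generalized likelihood ratio p(Y|H1)p(H1) / p(Y|H0)p(H0).
    inv_ratio = (p_h0 / p_h1) * (1.0 + xi) * math.exp(-gamma * xi / (1.0 + xi))
    p = 1.0 / (1.0 + inv_ratio)
    # ASSUMPTION: clip to empirical bounds to keep the soft switch well behaved.
    return min(max(p, p_min), p_max)


print(speech_presence_probability(1.0))   # noise-like bin
print(speech_presence_probability(20.0))  # speech-like bin
```

Using a fixed a priori SNR here means the probability depends only on the current observation, so it does not inherit the one-frame lag of the direct decision estimate.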
- the foregoing estimating the final a priori signal to noise ratio of the current audio frame by combining the voice existence probability and the estimated value including:
- the final a priori signal to noise ratio of the current audio frame is estimated by the following formula:
- The above formula keeps the final a priori signal-to-noise ratio at a stable small value, such as ξ_min, in pure noise, while in speech segments it biases the final a priori signal-to-noise ratio toward the minimum mean square error estimate corresponding to the estimated a priori signal-to-noise ratio.
- In this way, the voice-present and voice-absent states can be distinguished, and the optimal a priori signal-to-noise ratio estimate is derived according to the MMSE criterion in the voice-present state.
- The voice-present and voice-absent states are weighted by the voice existence probability.
- This probability is calculated using a fixed a priori SNR, which makes the a priori SNR estimate more accurate and can solve the tracking delay problem of the direct decision method.
- The estimated a priori signal-to-noise ratio may be used for gain calculation in the noise reduction of the audio signal, optionally in a single-microphone noise reduction process.
- For example, the a posteriori signal-to-noise ratio and the power spectrum of the previous frame's processing result are obtained, and the a priori signal-to-noise ratio of the current audio frame is calculated from them using the direct decision method. The voice existence probability of the current audio frame is calculated from the a posteriori signal-to-noise ratio, the estimated value of the MMSE corresponding to the estimated a priori signal-to-noise ratio is calculated, and the final a priori signal-to-noise ratio of the current audio frame is estimated by combining the voice existence probability and the estimated value. The final a priori signal-to-noise ratio is then used for gain calculation.
- The above steps can eliminate the effects of the inherent one-frame delay, namely attenuation of the initial segment of speech and trailing at the end segment, thereby improving the noise reduction performance.
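The per-frame flow above can be sketched for a single frequency bin in Python. This is a sketch under several assumptions rather than the disclosed implementation: the standard decision-directed rule, complex Gaussian formulas for the MMSE estimate and the voice existence probability, a linear blend for the final value, and a Wiener gain as a stand-in for the gain rule. All names and default values are hypothetical.

```python
import math


def process_bin(noisy_powers, noise_var, alpha=0.98, xi_min=0.01,
                xi_h1=10.0 ** 1.5, p_h1=0.5):
    """Run the estimation chain over successive frames of one bin.

    Returns a list of (xi_direct, xi_final, gain) tuples, one per frame.
    """
    prev_clean_power = 0.0
    results = []
    for power in noisy_powers:
        gamma = power / noise_var                     # a posteriori SNR
        # Step 1: direct decision estimate (assumed standard form).
        xi_dd = (alpha * prev_clean_power / noise_var
                 + (1.0 - alpha) * max(gamma - 1.0, 0.0))
        # Step 2: MMSE estimate of the speech power (complex Gaussian model).
        g = xi_dd / (1.0 + xi_dd)
        xi_mmse = g * g * gamma + g
        # Step 3: voice existence probability with a fixed a priori SNR.
        inv_ratio = (((1.0 - p_h1) / p_h1) * (1.0 + xi_h1)
                     * math.exp(-gamma * xi_h1 / (1.0 + xi_h1)))
        p = 1.0 / (1.0 + inv_ratio)
        # Step 4: final a priori SNR (ASSUMED linear blend with the floor).
        xi_final = p * xi_mmse + (1.0 - p) * xi_min
        # Gain calculation: Wiener gain used here only as a simple stand-in.
        gain = xi_final / (1.0 + xi_final)
        prev_clean_power = gain * gain * power
        results.append((xi_dd, xi_final, gain))
    return results


# Two noise-only frames followed by a speech onset.
frames = process_bin([1.0, 1.0, 30.0, 30.0], noise_var=1.0)
for xi_dd, xi_final, gain in frames:
    print(round(xi_dd, 4), round(xi_final, 4), round(gain, 4))
```

In this toy run the final value at the onset frame is noticeably larger than the direct decision value, so the onset is attenuated less.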
- The following explains the results through experimental data:
- The experiment uses the NOIZEUS database with a data sampling rate of 8 kHz; the white noise is generated using Cool Edit (audio processing software), and the other noise comes from the NOIZEUS database.
- The frame length is 20 ms, the overlap is 50%, and a square-root Hanning window is used for both analysis and synthesis. One parameter is set to 15 dB and ξ_min to −20 dB; the suppression rule uses the MMSE-STSA (short-time spectral amplitude) algorithm, and the noise estimation uses the unbiased MMSE algorithm.
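The framing used in the experiment can be sketched in pure Python. At 8 kHz a 20 ms frame is 160 samples and 50% overlap gives a hop of 80 samples; applying a periodic square-root Hanning window at both analysis and synthesis makes the overlap-added output equal to the input wherever two frames overlap. The function names are illustrative, and the frequency-domain processing is omitted.

```python
import math


def sqrt_hann(n):
    """Periodic square-root Hanning window of length n."""
    return [math.sqrt(0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
            for i in range(n)]


def overlap_add_identity(signal, frame_len=160, hop=80):
    """Window, (optionally process), window again and overlap-add."""
    window = sqrt_hann(frame_len)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Analysis window applied before the transform.
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        # Frequency-domain noise suppression would be applied to `frame` here.
        # Synthesis window applied after the inverse transform.
        for i in range(frame_len):
            out[start + i] += frame[i] * window[i]
    return out


tone = [math.sin(2.0 * math.pi * 440.0 * n / 8000.0) for n in range(800)]
rebuilt = overlap_add_identity(tone)
print(max(abs(rebuilt[i] - tone[i]) for i in range(80, 720)))
```

Only the first and last half frames, which are covered by a single window, deviate from the input.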
- Figures 3 and 4 compare direct decision with the method of the present disclosure at signal-to-noise ratios of 0 dB and 5 dB, respectively.
- The speech in Figure 3 is sp01 and the noise is white noise; the speech in Figure 4 is sp04 and the noise is car noise. sp01 and sp04 are utterance identifiers in the data set.
- Figure 5 shows the average segmental signal-to-noise ratio improvement over the 30 NOIZEUS utterances with car noise and white noise at 0/5/10/15 dB. The figure shows that the method of the present disclosure outperforms direct decision.
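For reference, a segmental signal-to-noise ratio can be computed as in the Python sketch below. The disclosure does not state its exact definition, so the non-overlapping 160-sample frames and the conventional clamping of per-frame values to [−10, 35] dB are assumptions, as are the function and argument names.

```python
import math


def segmental_snr_db(clean, processed, frame_len=160, lo=-10.0, hi=35.0):
    """Mean per-frame SNR in dB between a clean and a processed signal.

    ASSUMPTION: non-overlapping frames, per-frame values clamped to [lo, hi].
    """
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        est = processed[start:start + frame_len]
        signal_energy = sum(c * c for c in ref)
        error_energy = sum((c - e) ** 2 for c, e in zip(ref, est))
        if signal_energy == 0.0:
            continue  # skip silent frames
        if error_energy == 0.0:
            snr = hi
        else:
            snr = 10.0 * math.log10(signal_energy / error_energy)
        values.append(min(max(snr, lo), hi))
    if not values:
        raise ValueError("no usable frames")
    return sum(values) / len(values)


clean = [1.0] * 320
print(segmental_snr_db(clean, [1.1] * 320))
```

The improvement would then be the segmental SNR of the enhanced signal minus that of the noisy input.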
- The user terminal may be any user terminal with a microphone, such as a mobile phone, a tablet personal computer, a laptop computer, a personal digital assistant (PDA), a mobile Internet device (MID), an in-vehicle device, or a wearable device. It should be noted that the specific type of the user terminal is not limited in the embodiments of the present disclosure.
- In the embodiments of the present disclosure, an estimated a priori signal-to-noise ratio of the current audio frame is estimated; an estimated value of the MMSE corresponding to the estimated a priori signal-to-noise ratio of the current audio frame is calculated according to the estimated a priori signal-to-noise ratio; the voice existence probability of the current audio frame is calculated; and the final a priori signal-to-noise ratio of the current audio frame is estimated in conjunction with the voice existence probability and the estimated value.
- the user terminal 600 includes the following modules:
- the first estimating module 601 is configured to estimate an estimated a priori signal to noise ratio of the current audio frame
- the first calculating module 602 is configured to calculate, according to the estimated a priori signal to noise ratio, an estimated value of a minimum mean square error corresponding to the estimated a priori signal to noise ratio of the current audio frame;
- a second calculating module 603, configured to calculate a voice existence probability of the current audio frame
- the second estimation module 604 is configured to estimate a final a priori signal to noise ratio of the current audio frame in conjunction with the voice presence probability and the estimated value.
- the first estimating module 601 is configured to estimate an estimated a priori signal to noise ratio of the current audio frame based on the a posteriori signal to noise ratio estimation value of the current audio frame.
- the first estimation module 601 is configured to estimate an estimated a priori signal to noise ratio of the current audio frame by using the following formula:
- the first estimation module 601 is configured to estimate an estimated a priori signal to noise ratio of the current audio frame by using the following formula:
- the user terminal 600 further includes:
- the adjusting module 605 is configured to adjust, by using the following formula, a smoothing number required to estimate the estimated a priori signal to noise ratio:
- where a_1 and a_2 are two preset smoothing factors with a_1 > a_2, and the two quantities with subscript th are empirical thresholds.
- the first estimation module 601 is further configured to further estimate an estimated a priori signal to noise ratio of the current audio frame by using the following formula:
- where the symbols denote the estimated a priori signal-to-noise ratio and the a priori SNR estimates of the current audio frame obtained with smoothing factor a_1 and with smoothing factor a_2, respectively, and p is the voice existence probability.
- the first calculating module 602 is configured to calculate, according to the estimated a priori signal to noise ratio, an estimated value of a minimum mean square error corresponding to the estimated a priori signal to noise ratio of the current audio frame by using a formula :
- where the symbols of the formula denote, respectively, the estimated value of the minimum mean square error corresponding to the estimated a priori signal-to-noise ratio, the estimated a priori signal-to-noise ratio, and the a posteriori signal-to-noise ratio estimate of the current audio frame.
- the second calculating module 603 is configured to calculate a voice existence probability of the current audio frame by using the following formula:
- where p(H_1|Y) represents the voice existence probability;
- p(H_1) and p(H_0) respectively represent the a priori probability of voice presence and the a priori probability of voice absence;
- exp() is the exponential function;
- the two quantities with subscripts min and max are empirical values, and p_max and p_min are likewise two empirical values.
- the second estimation module 604 is configured to estimate a final a priori signal to noise ratio of the current audio frame by using the following formula:
- The user terminal 600 may be the user terminal corresponding to the method provided by the method embodiments of the present disclosure. Any implementation in the method embodiments can be implemented by the user terminal 600 of this embodiment with the same beneficial effects, and details are not described herein again.
- an embodiment of the present disclosure provides a structure of another user terminal, including: a processor 800, a transceiver 810, a memory 820, a user interface 830, and a bus interface, where:
- the processor 800 is configured to read a program in the memory 820 and perform the following process:
- a final a priori signal to noise ratio of the current audio frame is estimated in conjunction with the speech presence probability and the estimate.
- The user interface 830 includes a microphone, and the transceiver 810 is configured to receive and transmit data under the control of the processor 800.
- The bus architecture may include any number of interconnected buses and bridges, which link together various circuits, specifically one or more processors represented by the processor 800 and memory represented by the memory 820.
- the bus architecture can also link various other circuits such as peripherals, voltage regulators, and power management circuits.
- the bus interface provides an interface.
- Transceiver 810 can be a plurality of components, including a transmitter and a receiver, providing means for communicating with various other devices on a transmission medium.
- the user interface 830 may also be an interface capable of externally connecting the required devices, including but not limited to a keypad, a display, a speaker, a microphone, a joystick, and the like.
- the processor 800 is responsible for managing the bus architecture and general processing, and the memory 820 can store data used by the processor 800 in performing operations.
- the estimating an a priori signal to noise ratio of the current audio frame includes:
- the estimating an a priori signal to noise ratio of the current audio frame based on the a posteriori signal to noise ratio estimation value of the current audio frame including:
- the estimated a priori SNR of the current audio frame is estimated by the following formula:
- the estimated a priori SNR of the current audio frame is estimated by the following formula:
- processor 800 is further configured to:
- a 1 and a 2 are two preset smoothing numbers, with a 1 > a 2 ; γ th and ξ th are two empirical thresholds.
- the step of estimating an estimated a priori signal to noise ratio of the current audio frame based on the estimated probability of existence of the voice further comprising:
- the estimated a priori signal to noise ratio of the current audio frame is further estimated by the following formula:
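- where the terms denote, respectively, the estimated a priori signal to noise ratio, the estimated a priori signal to noise ratio of the current audio frame when the smoothing number is a 1 , and the estimated a priori signal to noise ratio of the current audio frame when the smoothing number is a 2 ; p(H 1 |Y) represents the speech presence probability, and p th is a preset threshold.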
- the calculating, according to the estimated a priori signal to noise ratio, an estimated value of a minimum mean square error corresponding to the estimated a priori signal to noise ratio of the current audio frame including:
- where the terms denote, respectively, the estimate of the minimum mean square error corresponding to the estimated a priori signal to noise ratio, the estimated a priori signal to noise ratio, and the a posteriori signal to noise ratio estimate of the current audio frame.
- the calculating a voice existence probability of the current audio frame includes:
- p(H 1 |Y) represents the speech presence probability
- p(H 1 ) and p(H 0 ) respectively represent a priori speech existence probability and a priori no speech probability
- exp() is an exponential function
- γ min and γ max are two empirical values
- γ min < γ max ; p max and p min are two empirical values
- the estimating the final a priori signal to noise ratio of the current audio frame by combining the voice existence probability and the estimated value including:
- the final a priori signal to noise ratio of the current audio frame is estimated by the following formula:
- the user terminal may be the user terminal corresponding to the voice signal noise reduction method provided by the method embodiments of the present disclosure; any implementation in those method embodiments can be implemented by the user terminal of this embodiment with the same beneficial effects, and details are not described again here.
- the disclosed method and apparatus may be implemented in other manners.
- the device embodiments described above are merely illustrative.
- the division of the unit is only a logical function division.
- there may be another division manner in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
- the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
- each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may be physically included separately, or two or more units may be integrated into one unit.
- the above integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional units.
- the above-described integrated unit implemented in the form of a software functional unit can be stored in a computer readable storage medium.
- the software functional unit described above is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, server, or network device, etc.) to perform part of the steps of the method of transmitting and receiving described in various embodiments of the present disclosure.
- the foregoing storage medium includes: a U disk, a mobile hard disk, a read-only memory (ROM), and a random access memory.
Abstract
A method for estimating a signal-to-noise ratio for noise suppression, and a user terminal. The method may comprise: estimating a pre-estimated a priori signal-to-noise ratio of a current audio frame (101); computing, according to the pre-estimated a priori signal-to-noise ratio, an estimated value of a minimum mean square error (MMSE) corresponding to the pre-estimated a priori signal-to-noise ratio of the current audio frame (102); computing a speech presence probability of the current audio frame (103); and estimating a final a priori signal-to-noise ratio of the current audio frame with reference to the speech presence probability and the estimated value (104).
Description
Cross-reference to related applications
The present application claims priority to Chinese Patent Application No. 201611039463.4, filed in China on November 10, 2016, the entire contents of which are incorporated herein by reference.
The present disclosure relates to the field of speech technologies, and in particular to a signal-to-noise ratio estimation method for noise suppression and a user terminal.
At present, user terminals generally use a single-microphone noise reduction method to reduce noise in an audio signal. The method mainly includes the following steps:
transforming the noisy speech with a fast Fourier transform (FFT) or another transform method to decompose it into a frequency-domain signal Y;
estimating the noise variance of the frequency-domain signal Y;
deriving the a priori signal-to-noise ratio and the a posteriori signal-to-noise ratio from the noise variance;
calculating a suitable gain from the a priori signal-to-noise ratio and the a posteriori signal-to-noise ratio;
multiplying each frequency bin of the frequency-domain signal Y by the gain to obtain a noise-reduced frequency-domain signal;
transforming the noise-reduced frequency-domain signal into a time-domain signal by an inverse fast Fourier transform (IFFT).
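The six steps above can be sketched for a single frame as follows. This is a minimal illustration only, not the implementation of the disclosure: the function names are hypothetical, a naive DFT stands in for the FFT, the noise variance is assumed to be supplied by the caller, the a priori signal-to-noise ratio is taken as the simple estimate max(γ - 1, 0), and the gain is assumed to be a Wiener gain.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (stands in for an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT; returns the real part of each time-domain sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def denoise_frame(frame, noise_var):
    """Run the six steps on one frame.

    noise_var[k] is the (assumed known, strictly positive) noise variance
    of frequency bin k.
    """
    spectrum = dft(frame)                      # step 1: frequency-domain signal Y
    cleaned = []
    for y, var in zip(spectrum, noise_var):    # step 2: noise variance (given here)
        gamma = abs(y) ** 2 / var              # step 3: a posteriori SNR
        xi = max(gamma - 1.0, 0.0)             # step 3: crude a priori SNR (assumption)
        gain = xi / (1.0 + xi)                 # step 4: Wiener gain (assumption)
        cleaned.append(gain * y)               # step 5: apply the gain per bin
    return idft(cleaned)                       # step 6: back to the time domain
```

Because the assumed gain never exceeds 1, the output frame can never carry more energy than the input frame, and a bin whose power is below the noise variance is zeroed.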
However, in the above technique the a priori signal-to-noise ratio is estimated with the decision-directed (direct decision) method, that is, by the following formula:
where the terms denote, respectively: the estimate of the a priori signal-to-noise ratio of the current frame; α, a smoothing factor that usually has to be close to 1, specifically a value from 0.95 to 1; the noise reduction result of the previous frame; the noise variance; and the a posteriori signal-to-noise ratio estimate of the current frame.
It can be seen from the above formula that the estimated value is heavily biased toward the noise reduction result of the previous frame, which can be regarded as the instantaneous value of the speech variance of the previous frame. Therefore, the a priori signal-to-noise ratio ξ finally obtained from the formula is not an estimate of the signal-to-noise ratio ξ(m) of the current frame; it is better regarded as an estimate of the a priori signal-to-noise ratio ξ(m-1) of the previous frame. The a priori signal-to-noise ratio currently estimated for the current audio frame therefore correlates poorly with that frame, which hampers noise suppression for the current audio frame.
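The bias toward the previous frame can be illustrated with a short sketch. The formula itself is not reproduced in this text, so the code assumes the classical decision-directed form, in which the estimate is α times the previous frame's noise-reduced power divided by the noise variance, plus (1 - α) times max(γ - 1, 0). The function and parameter names are hypothetical.

```python
def decision_directed_prior_snr(prev_clean_mag, noise_var, gamma, alpha=0.98):
    """Classical decision-directed a priori SNR estimate (assumed form).

    prev_clean_mag: magnitude of the previous frame's noise-reduced bin
    noise_var:      noise variance of the bin
    gamma:          a posteriori SNR of the current frame
    alpha:          smoothing factor, close to 1
    """
    previous_term = prev_clean_mag ** 2 / noise_var   # driven by the previous frame
    current_term = max(gamma - 1.0, 0.0)              # driven by the current frame
    return alpha * previous_term + (1.0 - alpha) * current_term
```

With α = 0.98, a bin that was silent in the previous frame and has an a posteriori signal-to-noise ratio of 101 in the current frame yields an estimate of only about 2, because the current frame receives a weight of just 0.02.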
Summary of the invention
An object of the present disclosure is to provide a signal-to-noise ratio estimation method for noise suppression and a user terminal, so as to solve the problem that the a priori signal-to-noise ratio estimated for the current audio frame correlates poorly with that frame, which hampers noise suppression for the current audio frame.
To achieve the above object, an embodiment of the present disclosure provides a method for estimating an a priori signal-to-noise ratio, including:
estimating a pre-estimated a priori signal-to-noise ratio of a current audio frame;
calculating, according to the pre-estimated a priori signal-to-noise ratio, an estimated value of a minimum mean square error (MMSE) corresponding to the pre-estimated a priori signal-to-noise ratio of the current audio frame;
calculating a speech presence probability of the current audio frame; and
estimating a final a priori signal-to-noise ratio of the current audio frame by combining the speech presence probability and the estimated value.
Optionally, estimating the pre-estimated a priori signal-to-noise ratio of the current audio frame includes:
estimating the pre-estimated a priori signal-to-noise ratio of the current audio frame based on an a posteriori signal-to-noise ratio estimate of the current audio frame.
Optionally, estimating the pre-estimated a priori signal-to-noise ratio of the current audio frame based on the a posteriori signal-to-noise ratio estimate of the current audio frame includes:
estimating the pre-estimated a priori signal-to-noise ratio of the current audio frame by the following formula:
where the terms denote, respectively: the pre-estimated a priori signal-to-noise ratio; α, a smoothing factor; the noise reduction result of the previous frame; the noise variance; and the a posteriori signal-to-noise ratio estimate of the current audio frame;
or,
estimating the pre-estimated a priori signal-to-noise ratio of the current audio frame by the following formula:
where the terms denote, respectively: the pre-estimated a priori signal-to-noise ratio; α, a smoothing factor; the a priori signal-to-noise ratio of the previous frame; and the a posteriori signal-to-noise ratio estimate of the current frame.
Optionally, the method further includes:
adjusting, by the following formula, the smoothing factor used when estimating the pre-estimated a priori signal-to-noise ratio:
where a1 and a2 are two preset smoothing factors with a1 > a2, and γth and ξth are two empirical thresholds.
Optionally, the step of estimating the pre-estimated a priori signal-to-noise ratio of the current audio frame based on the speech presence probability estimate further includes:
further estimating the pre-estimated a priori signal-to-noise ratio of the current audio frame by the following formula:
where the terms denote, respectively: the pre-estimated a priori signal-to-noise ratio; the pre-estimated a priori signal-to-noise ratio of the current audio frame when the smoothing factor is a1; and the pre-estimated a priori signal-to-noise ratio of the current audio frame when the smoothing factor is a2. p(H1|Y) denotes the speech presence probability, and pth is a preset threshold.
Optionally, calculating, according to the pre-estimated a priori signal-to-noise ratio, the estimated value of the minimum mean square error corresponding to the pre-estimated a priori signal-to-noise ratio of the current audio frame includes:
calculating, according to the pre-estimated a priori signal-to-noise ratio, the estimated value of the minimum mean square error corresponding to the pre-estimated a priori signal-to-noise ratio of the current audio frame by the following formula:
where the terms denote, respectively: the estimated value of the minimum mean square error corresponding to the pre-estimated a priori signal-to-noise ratio; the pre-estimated a priori signal-to-noise ratio; and the a posteriori signal-to-noise ratio estimate of the current audio frame.
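The formula for this step is not reproduced in this text. As a hedged illustration only, the sketch below uses the standard MMSE estimate of the clean-speech power under a complex Gaussian model, normalised by the noise variance: with the Wiener gain G = ξ/(1+ξ), the estimate is G²γ + G. Whether the disclosure uses exactly this expression is an assumption, and the function name is hypothetical.

```python
def mmse_prior_snr(xi_pre, gamma):
    """MMSE-refined a priori SNR under a complex Gaussian model (assumed form).

    xi_pre: pre-estimated a priori SNR of the current frame
    gamma:  a posteriori SNR estimate of the current frame
    """
    wiener_gain = xi_pre / (1.0 + xi_pre)
    # E[|X|^2 | Y] / noise variance = G^2 * gamma + G
    return wiener_gain * wiener_gain * gamma + wiener_gain
```

One property of this form is self-consistency: when the a posteriori signal-to-noise ratio equals its expected value 1 + ξ, the refined estimate equals the pre-estimate, for example `mmse_prior_snr(3.0, 4.0)` evaluates to 3.0. Unlike the pre-estimate alone, the result also moves with the current frame's γ.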
Optionally, calculating the speech presence probability of the current audio frame includes:
calculating the speech presence probability of the current audio frame by the following formula:
where p(H1|Y) denotes the speech presence probability; p(H1) and p(H0) denote the a priori probability of speech presence and the a priori probability of speech absence, respectively; one term of the formula is a fixed value and another denotes the a posteriori signal-to-noise ratio estimate of the current audio frame; exp() is the exponential function; γmin and γmax are two empirical values with γmin < γmax; and pmax and pmin are two empirical values with pmin < pmax.
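The formula for the speech presence probability is not reproduced in this text. The listed ingredients (the priors p(H1) and p(H0), a fixed value, the a posteriori signal-to-noise ratio, exp(), and the limits γmin, γmax, pmin and pmax) match a widely used posterior of the following shape, which the sketch assumes. How the disclosure applies the limits is not recoverable, so clamping is an assumption, and every default value and name below is hypothetical.

```python
import math


def speech_presence_probability(gamma, p_h1=0.5, xi_h1=31.6,
                                gamma_min=0.0, gamma_max=50.0,
                                p_min=0.01, p_max=0.99):
    """A posteriori speech presence probability p(H1|Y) (assumed form).

    gamma: a posteriori SNR estimate of the current frame
    p_h1:  a priori speech presence probability; p(H0) = 1 - p(H1)
    xi_h1: fixed a priori SNR assumed when speech is present (about 15 dB)
    """
    p_h0 = 1.0 - p_h1
    g = min(max(gamma, gamma_min), gamma_max)          # limit gamma (assumption)
    ratio = (p_h0 / p_h1) * (1.0 + xi_h1) * math.exp(-g * xi_h1 / (1.0 + xi_h1))
    p = 1.0 / (1.0 + ratio)
    return min(max(p, p_min), p_max)                   # limit p (assumption)
```

The probability grows with the a posteriori signal-to-noise ratio and saturates at the upper limit for very strong bins, while the two limits keep it away from exactly 0 and 1.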
Optionally, estimating the final a priori signal-to-noise ratio of the current audio frame by combining the speech presence probability and the estimated value includes:
estimating the final a priori signal-to-noise ratio of the current audio frame by the following formula:
where the terms denote, respectively: the final a priori signal-to-noise ratio of the current audio frame; and the estimated value of the minimum mean square error of the pre-estimated a priori signal-to-noise ratio. p(H1|Y) denotes the speech presence probability, and ξmin is a small value.
An embodiment of the present disclosure further provides a user terminal, including:
a first estimation module, configured to estimate a pre-estimated a priori signal-to-noise ratio of a current audio frame;
a first calculation module, configured to calculate, according to the pre-estimated a priori signal-to-noise ratio, an estimated value of the MMSE corresponding to the pre-estimated a priori signal-to-noise ratio of the current audio frame;
a second calculation module, configured to calculate a speech presence probability of the current audio frame; and
a second estimation module, configured to estimate a final a priori signal-to-noise ratio of the current audio frame by combining the speech presence probability and the estimated value.
Optionally, the first estimation module is configured to estimate the pre-estimated a priori signal-to-noise ratio of the current audio frame based on an a posteriori signal-to-noise ratio estimate of the current audio frame.
Optionally, the first estimation module is configured to estimate the pre-estimated a priori signal-to-noise ratio of the current audio frame by the following formula:
where the terms denote, respectively: the pre-estimated a priori signal-to-noise ratio; α, a smoothing factor; the noise reduction result of the previous frame; the noise variance; and the a posteriori signal-to-noise ratio estimate of the current audio frame;
or,
the first estimation module is configured to estimate the pre-estimated a priori signal-to-noise ratio of the current audio frame by the following formula:
where the terms denote, respectively: the pre-estimated a priori signal-to-noise ratio; α, a smoothing factor; the a priori signal-to-noise ratio of the previous frame; and the a posteriori signal-to-noise ratio estimate of the current frame.
Optionally, the user terminal further includes:
an adjustment module, configured to adjust, by the following formula, the smoothing factor used when estimating the pre-estimated a priori signal-to-noise ratio:
where a1 and a2 are two preset smoothing factors with a1 > a2, and γth and ξth are two empirical thresholds.
Optionally, the first estimation module is further configured to further estimate the pre-estimated a priori signal-to-noise ratio of the current audio frame by the following formula:
where the terms denote, respectively: the pre-estimated a priori signal-to-noise ratio; the pre-estimated a priori signal-to-noise ratio of the current audio frame when the smoothing factor is a1; and the pre-estimated a priori signal-to-noise ratio of the current audio frame when the smoothing factor is a2. p(H1|Y) denotes the speech presence probability, and pth is a preset threshold.
Optionally, the first calculation module is configured to calculate, according to the pre-estimated a priori signal-to-noise ratio, the estimated value of the MMSE corresponding to the pre-estimated a priori signal-to-noise ratio of the current audio frame by the following formula:
where the terms denote, respectively: the estimated value of the minimum mean square error corresponding to the pre-estimated a priori signal-to-noise ratio; the pre-estimated a priori signal-to-noise ratio; and the a posteriori signal-to-noise ratio estimate of the current audio frame.
Optionally, the second calculation module is configured to calculate the speech presence probability of the current audio frame by the following formula:
where p(H1|Y) denotes the speech presence probability; p(H1) and p(H0) denote the a priori probability of speech presence and the a priori probability of speech absence, respectively; one term of the formula is a fixed value and another denotes the a posteriori signal-to-noise ratio estimate of the current audio frame; exp() is the exponential function; γmin and γmax are two empirical values with γmin < γmax; and pmax and pmin are two empirical values with pmin < pmax.
Optionally, the second estimation module is configured to estimate the final a priori signal-to-noise ratio of the current audio frame by the following formula:
where the terms denote, respectively: the final a priori signal-to-noise ratio of the current audio frame; and the estimated value of the minimum mean square error of the pre-estimated a priori signal-to-noise ratio. p(H1|Y) denotes the speech presence probability, and ξmin is a small value.
An embodiment of the present disclosure further provides a user terminal, including a processor, a memory, and a transceiver, where:
the processor is configured to read a program in the memory and perform the following process:
estimating a pre-estimated a priori signal-to-noise ratio of a current audio frame;
calculating, according to the pre-estimated a priori signal-to-noise ratio, an estimated value of the minimum mean square error corresponding to the pre-estimated a priori signal-to-noise ratio of the current audio frame;
calculating a speech presence probability of the current audio frame; and
estimating a final a priori signal-to-noise ratio of the current audio frame by combining the speech presence probability and the estimated value,
where the transceiver is configured to receive and transmit data, and the memory can store data used by the processor when performing operations.
The above technical solutions of the present disclosure have at least the following beneficial effects:
In the embodiments of the present disclosure, a pre-estimated a priori signal-to-noise ratio of the current audio frame is estimated; an estimated value of the MMSE corresponding to the pre-estimated a priori signal-to-noise ratio of the current audio frame is calculated according to the pre-estimated a priori signal-to-noise ratio; a speech presence probability of the current audio frame is calculated; and a final a priori signal-to-noise ratio of the current audio frame is estimated by combining the speech presence probability and the estimated value. Because the final a priori signal-to-noise ratio is estimated by combining the speech presence probability of the current frame with the estimated value of the minimum mean square error corresponding to the pre-estimated a priori signal-to-noise ratio of the current audio frame, rather than from the a priori signal-to-noise ratio of the previous frame as in the related art, the a priori signal-to-noise ratio estimated by the embodiments of the present disclosure correlates more strongly with the current audio frame, which benefits noise suppression for the current audio frame.
To describe the technical solutions of the embodiments of the present application more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Evidently, the drawings described below show only some of the embodiments described in the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort. The drawings are not deliberately drawn to scale; the emphasis is on illustrating the gist of the present application.
FIG. 1 is a schematic flowchart of a signal-to-noise ratio estimation method for noise suppression according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of another signal-to-noise ratio estimation method for noise suppression according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of experimental data of a signal-to-noise ratio estimation method for noise suppression according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of further experimental data of a signal-to-noise ratio estimation method for noise suppression according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of further experimental data of a signal-to-noise ratio estimation method for noise suppression according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a user terminal according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of another user terminal according to an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of another user terminal according to an embodiment of the present disclosure.
The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.
Referring to FIG. 1, an embodiment of the present disclosure provides a signal-to-noise ratio estimation method for noise suppression which, as shown in FIG. 1, includes the following steps:
101. Estimate a pre-estimated a priori signal-to-noise ratio of a current audio frame.
102. Calculate, according to the pre-estimated a priori signal-to-noise ratio, an estimated value of the MMSE corresponding to the pre-estimated a priori signal-to-noise ratio of the current audio frame.
103. Calculate a speech presence probability of the current audio frame.
104. Estimate a final a priori signal-to-noise ratio of the current audio frame by combining the speech presence probability and the estimated value.
In the embodiments of the present disclosure, the current audio frame may be the current frame captured by a microphone of the user terminal; the current frame may be a speech frame or a noise frame.
In addition, the pre-estimated a priori signal-to-noise ratio may be an a priori signal-to-noise ratio estimated with the decision-directed method, the maximum likelihood method, or a similar method. The estimated value of the MMSE of the pre-estimated a priori signal-to-noise ratio may be obtained by applying an MMSE algorithm to the pre-estimated a priori signal-to-noise ratio. The speech presence probability of the current audio frame may be calculated from the a posteriori signal-to-noise ratio of the current audio frame, or from a value obtained by averaging or smoothing the a posteriori signal-to-noise ratios of the same frequency bin over the preceding few frames.
It should be noted that the embodiments of the present disclosure do not limit the order in which step 103 is performed relative to steps 101 and 102. For example, step 103 may be performed before step 101, or step 101 may be performed first and step 103 afterwards.
In addition, the final a priori signal-to-noise ratio of the current audio frame may be understood as the a priori signal-to-noise ratio used for gain calculation when noise reduction is performed on the audio frame, or as the a priori signal-to-noise ratio that the embodiments of the present disclosure output for the current audio frame. Estimating the final a priori signal-to-noise ratio of the current audio frame by combining the speech presence probability and the estimated value may proceed as follows: the probability that the current audio frame is a speech frame is determined from the speech presence probability. If the current audio frame is determined to be a pure noise frame, the final a priori signal-to-noise ratio is set to a stable minimum value, for example ξmin, so that pure-noise segments are processed smoothly and musical noise is reduced. If the current audio frame is determined to be an audio frame within a speech segment, the final a priori signal-to-noise ratio is calculated so that it leans toward the estimated value of the minimum mean square error corresponding to the pre-estimated a priori signal-to-noise ratio, which makes the final a priori signal-to-noise ratio estimate more accurate.
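The behaviour described above can be sketched with a one-line combination rule. The exact formula of the disclosure is not reproduced in this text, so the rule below is an assumption chosen only because it matches the two limiting cases: it returns the floor ξmin when the speech presence probability is near 0 and approaches the MMSE estimate when the probability is near 1. The function name and the default floor are hypothetical.

```python
def final_prior_snr(p_speech, xi_mmse, xi_min=0.01):
    """Combine speech presence probability and MMSE estimate (assumed rule).

    p_speech: speech presence probability p(H1|Y) of the current frame
    xi_mmse:  MMSE estimate corresponding to the pre-estimated a priori SNR
    xi_min:   small floor that keeps pure-noise frames stable
    """
    # Weight the MMSE estimate by the probability of speech, never below the floor.
    return max(p_speech * xi_mmse, xi_min)
```

For a pure-noise frame (`p_speech` of 0) the result is the constant floor regardless of `xi_mmse`, which is what keeps the gain steady in noise-only segments; for a confident speech frame (`p_speech` of 1) the result is the MMSE estimate itself.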
Through the above steps, the final a priori signal-to-noise ratio is estimated by combining the speech presence probability of the current frame with the estimated value of the minimum mean square error of the pre-estimated a priori signal-to-noise ratio of the current audio frame. The estimated a priori signal-to-noise ratio therefore correlates more strongly with the current audio frame, which benefits noise suppression for the current audio frame and improves the noise suppression effect.
Optionally, estimating the pre-estimated a priori signal-to-noise ratio of the current audio frame includes:
estimating the pre-estimated a priori signal-to-noise ratio of the current audio frame based on an a posteriori signal-to-noise ratio estimate of the current audio frame.
The a posteriori signal-to-noise ratio of the current audio frame is common general knowledge and is not described in detail here. Estimating the pre-estimated a priori signal-to-noise ratio based on the a posteriori signal-to-noise ratio estimate of the current audio frame may be done with the decision-directed method; the embodiments of the present disclosure are, of course, not limited to this.
Optionally, computing the estimated a priori SNR of the current audio frame based on the a posteriori SNR estimate of the current audio frame includes:
computing the estimated a priori SNR of the current audio frame by the following formula:
In this formula, the symbols denote, respectively: the estimated a priori SNR; the smoothing factor α; the noise reduction result of the previous frame; the noise variance; and the a posteriori SNR estimate of the current audio frame;
or,
computing the estimated a priori SNR of the current audio frame by the following formula:
In this formula, the symbols denote, respectively: the estimated a priori SNR; the smoothing factor α; the a priori SNR of the previous frame; and the a posteriori SNR estimate of the current frame.
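The two variants described above can be sketched as follows. This is a minimal illustration that assumes the textbook form of the decision-directed estimate, in which the previous-frame term is weighted by α and the term built from the a posteriori SNR minus one is weighted by 1 - α. The function names, the clamping of that second term at zero, and the default α are assumptions made for illustration and are not taken from the original formulas.

```python
def prior_snr_from_previous_output(prev_clean_power, noise_var, post_snr, alpha=0.98):
    """Variant 1 (assumed form): previous denoised power divided by the noise variance."""
    # Clamping at zero keeps the SNR estimate non-negative (an assumption).
    ml_term = max(post_snr - 1.0, 0.0)
    return alpha * prev_clean_power / noise_var + (1.0 - alpha) * ml_term


def prior_snr_from_previous_prior(prev_prior_snr, post_snr, alpha=0.98):
    """Variant 2 (assumed form): reuses the a priori SNR of the previous frame."""
    ml_term = max(post_snr - 1.0, 0.0)
    return alpha * prev_prior_snr + (1.0 - alpha) * ml_term
```

With α close to 1 the estimate follows the previous frame and changes slowly; with a small α it follows the current a posteriori SNR.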
In this embodiment, the estimated a priori SNR may be computed with either of the two formulas above. Experiments indicate that one of the two formulas gives better results, mainly because it produces less musical noise (musical tones), so the embodiments of the present disclosure optionally use that formula to compute the estimated a priori SNR.
In addition, the smoothing factor may be a preset value, for example a value between 0.95 and 1, or a value such as 0.98 or 0.3; this is not limited here. The noise variance is common knowledge and is not described in detail.
Optionally, the method further includes:
adjusting, by the following formula, the smoothing factor used when computing the estimated a priori SNR:
where a1 and a2 are two preset smoothing factors with a1 > a2, and γth and ξth are two empirical thresholds.
In this embodiment, the α factor should be as large as possible in pure noise, so that the estimate stays as stable as possible, and as small as possible in speech segments, so that speech is tracked quickly. The values a1 and a2 may be 0.98 and 0.3, respectively; the embodiments of the present disclosure are not limited to these values, and other values such as 0.95 and 0.28 may be used and tuned in practice.
In this embodiment, using a1 and a2 as described above can improve the accuracy of the estimated a priori SNR.
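One way such an adjustment could look is sketched below. The text states only that a1 > a2 and that γth and ξth are empirical thresholds; the specific rule used here (choose a1 when both the a posteriori SNR and the previous a priori SNR are below their thresholds, otherwise a2) is an assumption, as are the function and parameter names.

```python
def select_smoothing_factor(post_snr, prev_prior_snr, gamma_th, xi_th, a1=0.98, a2=0.3):
    """Pick a large factor in likely noise and a small one in likely speech (assumed rule)."""
    if a1 <= a2:
        raise ValueError("a1 must be greater than a2")
    # Both quantities below their thresholds is treated as a pure-noise indication.
    looks_like_noise = post_snr < gamma_th and prev_prior_snr < xi_th
    return a1 if looks_like_noise else a2
```

The large factor keeps the estimate stable in noise, and the small factor lets it react quickly at speech onsets, which matches the stated design goal.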
Optionally, in this embodiment, the step of computing the estimated a priori SNR of the current audio frame based on the speech presence probability estimate further includes:
further computing the estimated a priori SNR of the current audio frame by the following formula:
In this formula, the symbols denote: the estimated a priori SNR; the estimated a priori SNR of the current audio frame computed with smoothing factor a1 and the one computed with smoothing factor a2; p(H1|Y), the speech presence probability; and pth, a preset threshold.
In this embodiment, the estimated a priori SNR may be switched according to the speech presence probability of the current audio frame, which improves the accuracy of the estimated a priori SNR.
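The switching described above might be sketched as a threshold test on the speech presence probability. The direction of the switch (use the a2-based estimate when speech is likely, the a1-based estimate otherwise) is an assumption inferred from the stated roles of a1 and a2; the function name and the default threshold are hypothetical.

```python
def switch_prior_snr(xi_a1, xi_a2, speech_prob, p_th=0.5):
    """Choose between the two estimated a priori SNRs (assumed switching rule)."""
    # xi_a1: estimate with the large factor a1 (stable in noise).
    # xi_a2: estimate with the small factor a2 (fast tracking in speech).
    return xi_a2 if speech_prob > p_th else xi_a1
```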
Optionally, computing, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame includes:
computing, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame by the following formula:
In this formula, the symbols denote, respectively: the MMSE estimate corresponding to the estimated a priori SNR; the estimated a priori SNR; and the a posteriori SNR estimate of the current audio frame.
Note that the estimated a priori SNR in this formula is the value computed in step 101 and is not limited to the estimated a priori SNR computed with the formulas mentioned above.
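As an illustration, the sketch below uses the standard closed form of the conditional second moment of the speech coefficient under a complex Gaussian model, normalized by the noise variance. It is assumed, not confirmed by the text, that this is the form of the formula referred to above; the function name is hypothetical.

```python
def mmse_prior_snr(prior_snr, post_snr):
    """MMSE-based SNR value under a complex Gaussian model (assumed closed form).

    prior_snr: estimated a priori SNR (linear scale).
    post_snr:  a posteriori SNR estimate (linear scale).
    """
    wiener = prior_snr / (1.0 + prior_snr)
    # Squared posterior mean term plus posterior variance term, both over the noise variance.
    return wiener * wiener * post_snr + wiener
```

For example, an estimated a priori SNR of 1 and an a posteriori SNR of 4 give 0.25 * 4 + 0.5 = 1.5.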
The MMSE estimate above may be obtained from a complex Gaussian model; alternatively, a super-Gaussian model of speech may be used to compute E(X²|Y), to which the estimate can be regarded as equivalent. In practical applications, estimating the a priori SNR mainly amounts to estimating the variance of the speech signal, which by definition depends only on the speech signal X. Because X is not available, most algorithms that estimate this variance have to work from the noisy signal Y. The decision-directed method shows the same thing: in the second half of its formula, γ - 1 is the maximum likelihood estimate of the speech variance given that γ is known (i.e., Y is known), and the first half uses an instantaneous value in place of E(X²).
Therefore, most SNR estimation algorithms rely on the condition that the noisy signal Y is known. In other words, the speech variance cannot in practice be estimated directly; it is estimated under the condition that Y is known. The embodiments of the present disclosure therefore use a conditional expectation to estimate the speech variance. On this basis, the definition of the conditional expectation shows that the corresponding quantity is in fact the MMSE estimate of the squared speech spectral amplitude X². Taking into account the probability p(H1|Y) that speech is present in Y, the final expression for the conditional expectation is:
According to the complex Gaussian model:
where p(H0|Y) denotes the probability of speech absence H0 given that Y is known, i.e., a conditional probability, under the binary hypothesis:
H0: Y = N, speech absent
H1: Y = X + N, speech present
Under this binary hypothesis, E(X²|Y, H0) = 0.
The quantity appearing in the expression above is the true speech variance, which in practice must itself be estimated, for example by maximum likelihood or by the decision-directed method. Alternatively, speech may be assumed to follow another model, such as a super-Gaussian model, for example a chi-square (chi) distribution:
The function appearing above is the confluent hypergeometric function. Because it involves a transcendental function, the overall computation is relatively complex and is generally implemented with a lookup table or similar means.
The analysis above shows that the formula for the MMSE estimate can be derived from either the complex Gaussian model or the super-Gaussian model.
Note that, in the embodiments of the present disclosure, the MMSE estimate of the estimated a priori SNR may be computed directly with the formula above. The derivation of the conditional expectation need not be carried out when performing the corresponding steps; it is given only to explain the principle behind the implementation.
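The reasoning above can be written out as a short derivation. The first two lines follow from the binary hypothesis and from E(X²|Y, H0) = 0 as stated in the text. The third line is the textbook result for a complex Gaussian speech and noise model and is assumed here to be the closed form intended; the symbols σ_X², σ_N², ξ and γ are notation chosen for this sketch, with |X|² standing for the X² of the text.

```latex
\begin{aligned}
E\!\left(|X|^2 \mid Y\right)
  &= p(H_1 \mid Y)\,E\!\left(|X|^2 \mid Y, H_1\right)
   + p(H_0 \mid Y)\,E\!\left(|X|^2 \mid Y, H_0\right) \\
  &= p(H_1 \mid Y)\,E\!\left(|X|^2 \mid Y, H_1\right), \\
E\!\left(|X|^2 \mid Y, H_1\right)
  &= \left(\frac{\xi}{1+\xi}\right)^{2} |Y|^2 + \frac{\xi}{1+\xi}\,\sigma_N^2,
  \qquad \xi = \frac{\sigma_X^2}{\sigma_N^2}, \quad \gamma = \frac{|Y|^2}{\sigma_N^2}.
\end{aligned}
```

Dividing the last expression by σ_N² gives an SNR-domain value of (ξ/(1+ξ))² γ + ξ/(1+ξ), which depends only on the estimated a priori SNR and the a posteriori SNR.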
Optionally, computing the speech presence probability of the current audio frame includes:
computing the speech presence probability of the current audio frame by the following formula:
where p(H1|Y) denotes the speech presence probability; p(H1) and p(H0) denote the a priori speech presence probability and the a priori speech absence probability, respectively; the formula also contains a fixed-value parameter and the a posteriori SNR estimate of the current audio frame; exp() is the exponential function; γmin and γmax are two empirical values with γmin < γmax; and pmax and pmin are two empirical values with pmin < pmax.
In this embodiment, the formula above distinguishes speech from noise. When the speech presence probability is computed with it, the a posteriori SNR may be replaced by an average or smoothed value over the same frequency bin in the preceding few frames. The formula may be derived directly from the complex Gaussian model given above.
In the embodiments of the present disclosure, the speech presence probability provides a probability that speech is present, so that the currently estimated a priori SNR can switch softly between pure-noise and speech segments. This mitigates the tracking delay of the decision-directed method while retaining its advantages.
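A possible form of this computation is sketched below. It assumes the likelihood-ratio expression that follows from a complex Gaussian model with a fixed a priori SNR under H1, with the a posteriori SNR clamped to [γmin, γmax] and the result clamped to [pmin, pmax]. The placement of the clamps, every default value, and all names are assumptions made for illustration.

```python
import math


def speech_presence_probability(post_snr, xi_fixed=10 ** 1.5, p_h1=0.5,
                                gamma_min=0.0, gamma_max=100.0,
                                p_min=0.01, p_max=0.99):
    """A posteriori speech presence probability (assumed form, hypothetical defaults)."""
    # Limit the a posteriori SNR to the empirical range before use.
    gamma = min(max(post_snr, gamma_min), gamma_max)
    prior_ratio = (1.0 - p_h1) / p_h1  # p(H0) / p(H1)
    # Likelihood ratio p(Y|H0) / p(Y|H1) under the complex Gaussian model.
    likelihood_ratio = (1.0 + xi_fixed) * math.exp(-gamma * xi_fixed / (1.0 + xi_fixed))
    prob = 1.0 / (1.0 + prior_ratio * likelihood_ratio)
    # Keep the probability away from exactly 0 and 1.
    return min(max(prob, p_min), p_max)
```

The default `xi_fixed` corresponds to 15 dB on a linear scale. The probability rises monotonically with the a posteriori SNR, so low-energy bins are treated as noise and high-energy bins as speech.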
Optionally, estimating the final a priori SNR of the current audio frame by combining the speech presence probability and the estimated value includes:
estimating the final a priori SNR of the current audio frame by the following formula:
In this formula, the symbols denote: the final a priori SNR of the current audio frame; the MMSE estimate of the estimated a priori SNR; p(H1|Y), the speech presence probability; and ξmin, a small value.
In this embodiment, the formula above keeps the final a priori SNR at a stable small value, for example ξmin, in pure noise, while in speech segments the estimated a priori SNR is biased toward the MMSE estimate.
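One combination rule consistent with this behavior is sketched below: the MMSE estimate is weighted by the speech presence probability and floored at ξmin. This particular rule is an assumption; another rule with the same limiting behavior, such as a probability-weighted average of the MMSE estimate and ξmin, would fit the description equally well. The function name and default value are hypothetical.

```python
def final_prior_snr(mmse_estimate, speech_prob, xi_min=0.01):
    """Final a priori SNR: probability-weighted MMSE estimate with a floor (assumed rule)."""
    # In pure noise speech_prob is near 0, so the floor xi_min is returned.
    # In speech speech_prob is near 1, so the MMSE estimate dominates.
    return max(speech_prob * mmse_estimate, xi_min)
```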
In this embodiment, the speech-present and speech-absent states can be distinguished. In the speech-present state, the optimal a priori SNR estimate is derived according to the MMSE criterion. In the speech-absent state, a minimum value is used to limit the maximum suppression, which keeps the processing of pure-noise segments smooth and reduces musical noise. The speech-present and speech-absent states are weighted by the speech presence probability, which is computed with a fixed a priori SNR. This makes the a priori SNR estimate more accurate and addresses the tracking delay of the decision-directed method.
Note that the embodiments described above may be implemented in combination or separately; this is not limited here. In addition, the estimated a priori SNR may be used for gain calculation in the noise reduction of an audio signal, optionally in single-microphone noise reduction. For example, as shown in FIG. 2: the a posteriori SNR and the power spectrum of the previous frame's processing result are obtained; the estimated a priori SNR of the current audio frame is computed from them with the decision-directed method; the speech presence probability of the current audio frame is computed from the a posteriori SNR; the MMSE estimate of the estimated a priori SNR is computed; and the final a priori SNR of the current audio frame is estimated by combining the speech presence probability and the MMSE estimate. This final a priori SNR is used for gain calculation.
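The flow of FIG. 2 for a single frequency bin can be sketched end to end as follows. Every formula inside is an assumed textbook form (decision-directed estimate, complex Gaussian speech presence probability and MMSE value, probability-weighted result floored at ξmin), and all names and defaults are hypothetical; the sketch is meant only to show the order of the steps.

```python
import math


def final_prior_snr_for_bin(noisy_power, noise_var, prev_clean_power,
                            alpha=0.98, xi_fixed_db=15.0, xi_min_db=-20.0, p_h1=0.5):
    """Run the four steps for one bin and return the final a priori SNR (linear scale)."""
    # A posteriori SNR of the current frame.
    gamma = noisy_power / noise_var
    # Step 1: estimated a priori SNR by the decision-directed method (assumed form).
    xi_dd = alpha * prev_clean_power / noise_var + (1.0 - alpha) * max(gamma - 1.0, 0.0)
    # Step 2: speech presence probability with a fixed a priori SNR (assumed form).
    xi_fixed = 10.0 ** (xi_fixed_db / 10.0)
    ratio = (1.0 - p_h1) / p_h1
    prob = 1.0 / (1.0 + ratio * (1.0 + xi_fixed)
                  * math.exp(-gamma * xi_fixed / (1.0 + xi_fixed)))
    # Step 3: MMSE value corresponding to the estimated a priori SNR (assumed form).
    wiener = xi_dd / (1.0 + xi_dd)
    xi_mmse = wiener * wiener * gamma + wiener
    # Step 4: combine with the probability and apply the floor (assumed rule).
    xi_min = 10.0 ** (xi_min_db / 10.0)
    return max(prob * xi_mmse, xi_min)
```

A bin that contains only noise and had no speech in the previous frame yields the floor value, while a bin with strong speech yields a value close to the a posteriori SNR.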
In the embodiments of the present disclosure, these steps remove the effect of the inherent one-frame delay, reducing the attenuation of speech onsets and the trailing at speech offsets, and thereby improve noise reduction performance. The effect is illustrated below with experimental data:
The experiment uses the Noizus database, with a sampling rate of 8 kHz. White noise is generated with Cool Edit (an audio processing program); the other noise types are supplied with the Noizus database. The frame length is 20 ms with 50% overlap, and a square-root Hanning window is applied both before and after processing. The fixed-value parameter of the speech presence probability formula is set to 15 dB, and ξmin is set to -20 dB. The suppression rule is the MMSE-STSA (Short-Time Spectral Amplitude) algorithm, and noise is estimated with the unbiased MMSE algorithm.
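The framing parameters above translate into sample counts as sketched below. The periodic form of the Hanning window is an assumption, chosen because its square sums to one at 50% overlap, so applying the square-root window before and after processing reconstructs the signal; the helper names are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 8000
FRAME_MS = 20

frame_len = SAMPLE_RATE_HZ * FRAME_MS // 1000  # samples per frame
hop = frame_len // 2                           # 50% overlap


def sqrt_hann(n):
    """Square root of a periodic Hanning window of length n (periodic form assumed)."""
    return [math.sqrt(0.5 - 0.5 * math.cos(2.0 * math.pi * i / n)) for i in range(n)]


def db_to_linear(db):
    """Convert a power ratio in dB to linear scale."""
    return 10.0 ** (db / 10.0)


window = sqrt_hann(frame_len)
xi_min = db_to_linear(-20.0)    # floor for the final a priori SNR
xi_fixed = db_to_linear(15.0)   # fixed-value parameter
```

At 8 kHz a 20 ms frame holds 160 samples and the hop is 80 samples; -20 dB corresponds to 0.01 on a linear scale.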
FIG. 3 and FIG. 4 compare the decision-directed method with the method of the present disclosure at SNRs of 0 dB and 5 dB, respectively. In FIG. 3 the speech is sp01 and the noise is white noise; in FIG. 4 the speech is sp04 and the noise is car noise, where sp01 and sp04 are utterance identifiers in the data set. At the positions marked by arrows, the method of the present disclosure is clearly better than the comparison algorithm. In subjective listening comparisons, musical noise is not noticeable in either result. FIG. 5 shows the average segmental SNR improvement over the 30 utterances of the Noizus database with car noise and white noise at 0/5/10/15 dB; the figure shows that the method of the present disclosure outperforms the decision-directed method.
Note that the method may be applied to any user terminal equipped with a microphone, for example a mobile phone, a tablet personal computer, a laptop computer, a personal digital assistant (PDA), a mobile Internet device (MID), an in-vehicle device, or a wearable device. The embodiments of the present disclosure do not limit the specific type of user terminal.
The estimated a priori SNR of the current audio frame is estimated; the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame is computed according to the estimated a priori SNR; the speech presence probability of the current audio frame is computed; and the final a priori SNR of the current audio frame is estimated by combining the speech presence probability and the MMSE estimate. Because the final a priori SNR combines the speech presence probability of the current frame with the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame, it correlates more strongly with the current audio frame than an estimate based on the a priori SNR of the previous frame, as in the related art, which benefits noise suppression for the current audio frame.
Referring to FIG. 6, an embodiment of the present disclosure provides a user terminal. As shown in FIG. 6, the user terminal 600 includes the following modules:
a first estimation module 601, configured to estimate the estimated a priori SNR of the current audio frame;
a first calculation module 602, configured to compute, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame;
a second calculation module 603, configured to compute the speech presence probability of the current audio frame;
a second estimation module 604, configured to estimate the final a priori SNR of the current audio frame by combining the speech presence probability and the MMSE estimate.
Optionally, the first estimation module 601 is configured to compute the estimated a priori SNR of the current audio frame based on the a posteriori SNR estimate of the current audio frame.
Optionally, the first estimation module 601 is configured to compute the estimated a priori SNR of the current audio frame by the following formula:
In this formula, the symbols denote, respectively: the estimated a priori SNR; the smoothing factor α; the noise reduction result of the previous frame; the noise variance; and the a posteriori SNR estimate of the current audio frame;
or,
the first estimation module 601 is configured to compute the estimated a priori SNR of the current audio frame by the following formula:
In this formula, the symbols denote, respectively: the estimated a priori SNR; the smoothing factor α; the a priori SNR of the previous frame; and the a posteriori SNR estimate of the current frame.
Optionally, as shown in FIG. 7, the user terminal 600 further includes:
an adjustment module 605, configured to adjust, by the following formula, the smoothing factor used when computing the estimated a priori SNR:
where a1 and a2 are two preset smoothing factors with a1 > a2, and γth and ξth are two empirical thresholds.
Optionally, the first estimation module 601 is further configured to further compute the estimated a priori SNR of the current audio frame by the following formula:
In this formula, the symbols denote: the estimated a priori SNR; the estimated a priori SNR of the current audio frame computed with smoothing factor a1 and the one computed with smoothing factor a2; p(H1|Y), the speech presence probability; and pth, a preset threshold.
Optionally, the first calculation module 602 is configured to compute, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame by the following formula:
In this formula, the symbols denote, respectively: the MMSE estimate corresponding to the estimated a priori SNR; the estimated a priori SNR; and the a posteriori SNR estimate of the current audio frame.
Optionally, the second calculation module 603 is configured to compute the speech presence probability of the current audio frame by the following formula:
where p(H1|Y) denotes the speech presence probability; p(H1) and p(H0) denote the a priori speech presence probability and the a priori speech absence probability, respectively; the formula also contains a fixed-value parameter and the a posteriori SNR estimate of the current audio frame; exp() is the exponential function; γmin and γmax are two empirical values with γmin < γmax; and pmax and pmin are two empirical values with pmin < pmax.
Optionally, the second estimation module 604 is configured to estimate the final a priori SNR of the current audio frame by the following formula:
In this formula, the symbols denote: the final a priori SNR of the current audio frame; the MMSE estimate of the estimated a priori SNR; p(H1|Y), the speech presence probability; and ξmin, a small value.
Note that the user terminal 600 in this embodiment may be the user terminal corresponding to the speech signal noise reduction method provided by the method embodiments of the present disclosure. Any implementation in those method embodiments can be realized by the user terminal 600 of this embodiment with the same beneficial effects, and details are not repeated here.
Referring to FIG. 8, an embodiment of the present disclosure provides another user terminal structure. The user terminal includes a processor 800, a transceiver 810, a memory 820, a user interface 830, and a bus interface, where:
the processor 800 is configured to read a program in the memory 820 and perform the following process:
estimating the estimated a priori SNR of the current audio frame;
computing, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame;
computing the speech presence probability of the current audio frame;
estimating the final a priori SNR of the current audio frame by combining the speech presence probability and the MMSE estimate.
The user interface 830 includes a microphone, and the transceiver 810 is configured to receive and send data under the control of the processor 800.
In FIG. 8, the bus architecture may include any number of interconnected buses and bridges, which link together various circuits of one or more processors, represented by the processor 800, and of memory, represented by the memory 820. The bus architecture may also link various other circuits such as peripheral devices, voltage regulators, and power management circuits. The bus interface provides an interface. The transceiver 810 may comprise multiple elements, namely a transmitter and a receiver, providing a unit for communicating with various other apparatuses over a transmission medium. For different user equipment, the user interface 830 may also be an interface for connecting required devices externally or internally; such devices include, but are not limited to, a keypad, a display, a speaker, a microphone, and a joystick.
The processor 800 is responsible for managing the bus architecture and general processing, and the memory 820 may store data used by the processor 800 when performing operations.
Optionally, estimating the estimated a priori SNR of the current audio frame includes:
computing the estimated a priori SNR of the current audio frame based on the a posteriori SNR estimate of the current audio frame.
Optionally, computing the estimated a priori SNR of the current audio frame based on the a posteriori SNR estimate of the current audio frame includes:
computing the estimated a priori SNR of the current audio frame by the following formula:
In this formula, the symbols denote, respectively: the estimated a priori SNR; the smoothing factor α; the noise reduction result of the previous frame; the noise variance; and the a posteriori SNR estimate of the current audio frame;
or,
computing the estimated a priori SNR of the current audio frame by the following formula:
In this formula, the symbols denote, respectively: the estimated a priori SNR; the smoothing factor α; the a priori SNR of the previous frame; and the a posteriori SNR estimate of the current frame.
Optionally, the processor 800 is further configured to:
adjust, by the following formula, the smoothing factor used when computing the estimated a priori SNR:
where a1 and a2 are two preset smoothing factors with a1 > a2, and γth and ξth are two empirical thresholds.
Optionally, the step of computing the estimated a priori SNR of the current audio frame based on the speech presence probability estimate further includes:
further computing the estimated a priori SNR of the current audio frame by the following formula:
In this formula, the symbols denote: the estimated a priori SNR; the estimated a priori SNR of the current audio frame computed with smoothing factor a1 and the one computed with smoothing factor a2; p(H1|Y), the speech presence probability; and pth, a preset threshold.
Optionally, computing, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame includes:
computing, according to the estimated a priori SNR, the MMSE estimate corresponding to the estimated a priori SNR of the current audio frame by the following formula:
In this formula, the symbols denote, respectively: the MMSE estimate corresponding to the estimated a priori SNR; the estimated a priori SNR; and the a posteriori SNR estimate of the current audio frame.
Optionally, computing the speech presence probability of the current audio frame includes:
computing the speech presence probability of the current audio frame by the following formula:
where p(H1|Y) denotes the speech presence probability; p(H1) and p(H0) denote the a priori speech presence probability and the a priori speech absence probability, respectively; the formula also contains a fixed-value parameter and the a posteriori SNR estimate of the current audio frame; exp() is the exponential function; γmin and γmax are two empirical values with γmin < γmax; and pmax and pmin are two empirical values with pmin < pmax.
Optionally, estimating the final a priori SNR of the current audio frame by combining the speech presence probability and the estimated value includes:
estimating the final a priori SNR of the current audio frame by the following formula:
In this formula, the symbols denote: the final a priori SNR of the current audio frame; the MMSE estimate of the estimated a priori SNR; p(H1|Y), the speech presence probability; and ξmin, a small value.
It should be noted that the user terminal in this embodiment may be the user terminal corresponding to the speech signal noise reduction method provided by the method embodiments of the present disclosure. Any implementation of those method embodiments can be realized by the user terminal in this embodiment with the same beneficial effects, which are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and an actual implementation may divide them differently: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist as a physically separate unit, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform some of the steps of the method described in the embodiments of the present disclosure. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are preferred embodiments of the present disclosure. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present disclosure, and these improvements and refinements shall also be regarded as falling within the scope of protection of the present disclosure.
Claims (17)
- A signal-to-noise ratio estimation method for noise suppression, comprising: estimating an estimated a priori signal-to-noise ratio of a current audio frame; calculating, according to the estimated a priori signal-to-noise ratio, a minimum mean square error estimate corresponding to the estimated a priori signal-to-noise ratio of the current audio frame; calculating a speech presence probability of the current audio frame; and estimating a final a priori signal-to-noise ratio of the current audio frame by combining the speech presence probability and the estimate.
- The method of claim 1, wherein estimating the estimated a priori signal-to-noise ratio of the current audio frame comprises: estimating the estimated a priori signal-to-noise ratio of the current audio frame based on an a posteriori signal-to-noise ratio estimate of the current audio frame.
- The method of claim 2, wherein estimating the estimated a priori signal-to-noise ratio of the current audio frame based on the a posteriori signal-to-noise ratio estimate of the current audio frame comprises: estimating the estimated a priori signal-to-noise ratio of the current audio frame by the following formula: where the symbols of the formula denote, in order, the estimated a priori signal-to-noise ratio, the smoothing factor α, the noise reduction result of the previous frame, the noise variance, and the a posteriori signal-to-noise ratio estimate of the current audio frame; or estimating the estimated a priori signal-to-noise ratio of the current audio frame by the following formula:
- The method of claim 3, further comprising: adjusting, by the following formula, the smoothing factor required when estimating the estimated a priori signal-to-noise ratio: where a1 and a2 are two preset smoothing factors with a1 > a2, and γth and ξth are two empirical thresholds.
- The method of claim 4, wherein the step of estimating the estimated a priori signal-to-noise ratio of the current audio frame based on the speech presence probability estimate further comprises: further estimating the estimated a priori signal-to-noise ratio of the current audio frame by the following formula: where the symbols of the formula denote the estimated a priori signal-to-noise ratio, the estimated a priori signal-to-noise ratio of the current audio frame when the smoothing factor is a1, and the estimated a priori signal-to-noise ratio of the current audio frame when the smoothing factor is a2; p(H1|Y) denotes the speech presence probability; and pth is a preset threshold.
- The method of any one of claims 1 to 5, wherein calculating, according to the estimated a priori signal-to-noise ratio, the minimum mean square error estimate corresponding to the estimated a priori signal-to-noise ratio of the current audio frame comprises: calculating the minimum mean square error estimate by the following formula: where the symbols of the formula denote, in order, the minimum mean square error estimate corresponding to the estimated a priori signal-to-noise ratio, the estimated a priori signal-to-noise ratio, and the a posteriori signal-to-noise ratio estimate of the current audio frame.
- The method of any one of claims 1 to 5, wherein calculating the speech presence probability of the current audio frame comprises: calculating the speech presence probability of the current audio frame by the following formula: where p(H1|Y) denotes the speech presence probability; p(H1) and p(H0) denote the a priori speech presence probability and the a priori speech absence probability, respectively; one further parameter of the formula is a fixed value; the formula also uses the a posteriori signal-to-noise ratio estimate of the current audio frame; exp() is the exponential function; γmin and γmax are two empirical values with γmin < γmax; and pmax and pmin are two empirical values with pmin < pmax.
- The method of any one of claims 1 to 5, wherein estimating the final a priori signal-to-noise ratio of the current audio frame by combining the speech presence probability and the estimate comprises: estimating the final a priori signal-to-noise ratio of the current audio frame by the following formula: where the first two symbols of the formula denote the final a priori signal-to-noise ratio of the current audio frame and the minimum mean square error estimate of the estimated a priori signal-to-noise ratio, respectively; p(H1|Y) denotes the speech presence probability; and ξmin is a small value.
- A user terminal, comprising: a first estimation module, configured to estimate an estimated a priori signal-to-noise ratio of a current audio frame; a first calculation module, configured to calculate, according to the estimated a priori signal-to-noise ratio, a minimum mean square error estimate corresponding to the estimated a priori signal-to-noise ratio of the current audio frame; a second calculation module, configured to calculate a speech presence probability of the current audio frame; and a second estimation module, configured to estimate a final a priori signal-to-noise ratio of the current audio frame by combining the speech presence probability and the estimate.
- The user terminal of claim 9, wherein the first estimation module is configured to estimate the estimated a priori signal-to-noise ratio of the current audio frame based on an a posteriori signal-to-noise ratio estimate of the current audio frame.
- The user terminal of claim 10, wherein the first estimation module is configured to estimate the estimated a priori signal-to-noise ratio of the current audio frame by the following formula: where the symbols of the formula denote, in order, the estimated a priori signal-to-noise ratio, the smoothing factor α, the noise reduction result of the previous frame, the noise variance, and the a posteriori signal-to-noise ratio estimate of the current audio frame; or the first estimation module is configured to estimate the estimated a priori signal-to-noise ratio of the current audio frame by the following formula:
- The user terminal of claim 11, further comprising: an adjustment module, configured to adjust, by the following formula, the smoothing factor required when estimating the estimated a priori signal-to-noise ratio: where a1 and a2 are two preset smoothing factors with a1 > a2, and γth and ξth are two empirical thresholds.
- The user terminal of claim 12, wherein the first estimation module is further configured to further estimate the estimated a priori signal-to-noise ratio of the current audio frame by the following formula: where the symbols of the formula denote the estimated a priori signal-to-noise ratio, the estimated a priori signal-to-noise ratio of the current audio frame when the smoothing factor is a1, and the estimated a priori signal-to-noise ratio of the current audio frame when the smoothing factor is a2; p(H1|Y) denotes the speech presence probability; and pth is a preset threshold.
- The user terminal of any one of claims 9 to 13, wherein the first calculation module is configured to calculate, according to the estimated a priori signal-to-noise ratio, the minimum mean square error estimate corresponding to the estimated a priori signal-to-noise ratio of the current audio frame by the following formula: where the symbols of the formula denote, in order, the minimum mean square error estimate corresponding to the estimated a priori signal-to-noise ratio, the estimated a priori signal-to-noise ratio, and the a posteriori signal-to-noise ratio estimate of the current audio frame.
- The user terminal of any one of claims 9 to 13, wherein the second calculation module is configured to calculate the speech presence probability of the current audio frame by the following formula: where p(H1|Y) denotes the speech presence probability; p(H1) and p(H0) denote the a priori speech presence probability and the a priori speech absence probability, respectively; one further parameter of the formula is a fixed value; the formula also uses the a posteriori signal-to-noise ratio estimate of the current audio frame; exp() is the exponential function; γmin and γmax are two empirical values with γmin < γmax; and pmax and pmin are two empirical values with pmin < pmax.
- The user terminal of any one of claims 9 to 13, wherein the second estimation module is configured to estimate the final a priori signal-to-noise ratio of the current audio frame by the following formula: where the first two symbols of the formula denote the final a priori signal-to-noise ratio of the current audio frame and the minimum mean square error estimate of the estimated a priori signal-to-noise ratio, respectively; p(H1|Y) denotes the speech presence probability; and ξmin is a small value.
- A user terminal, comprising a processor, a memory, and a transceiver, wherein the processor is configured to read a program in the memory and perform the following process: estimating an estimated a priori signal-to-noise ratio of a current audio frame; calculating, according to the estimated a priori signal-to-noise ratio, a minimum mean square error estimate corresponding to the estimated a priori signal-to-noise ratio of the current audio frame; calculating a speech presence probability of the current audio frame; and estimating a final a priori signal-to-noise ratio of the current audio frame by combining the speech presence probability and the estimate; and wherein the transceiver is configured to receive and transmit data, and the memory is capable of storing data used by the processor when performing operations.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611039463.4A CN108074582B (en) | 2016-11-10 | 2016-11-10 | Noise suppression signal-to-noise ratio estimation method and user terminal |
CN201611039463.4 | 2016-11-10 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018086444A1 (en) | 2018-05-17 |
Family
ID=62109133
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/106502 WO2018086444A1 (en) | 2016-11-10 | 2017-10-17 | Method for estimating signal-to-noise ratio for noise suppression, and user terminal |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108074582B (en) |
WO (1) | WO2018086444A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111986693A (en) * | 2020-08-10 | 2020-11-24 | 北京小米松果电子有限公司 | Audio signal processing method and device, terminal equipment and storage medium |
US20210327448A1 (en) * | 2018-12-18 | 2021-10-21 | Tencent Technology (Shenzhen) Company Limited | Speech noise reduction method and apparatus, computing device, and computer-readable storage medium |
CN113838474A (en) * | 2021-11-25 | 2021-12-24 | 全时云商务服务股份有限公司 | Communication system howling suppression method and device |
CN114724571A (en) * | 2022-03-29 | 2022-07-08 | 大连理工大学 | Robust distributed speaker noise elimination system |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109767781A (en) * | 2019-03-06 | 2019-05-17 | 哈尔滨工业大学(深圳) | Speech separating method, system and storage medium based on super-Gaussian priori speech model and deep learning |
CN109817234B (en) * | 2019-03-06 | 2021-01-26 | 哈尔滨工业大学(深圳) | Target speech signal enhancement method, system and storage medium based on continuous noise tracking |
CN111899752B (en) * | 2020-07-13 | 2023-01-10 | 紫光展锐(重庆)科技有限公司 | Noise suppression method and device for rapidly calculating voice existence probability, storage medium and terminal |
CN112969130A (en) * | 2020-12-31 | 2021-06-15 | 维沃移动通信有限公司 | Audio signal processing method and device and electronic equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1763846A (en) * | 2005-11-23 | 2006-04-26 | 北京中星微电子有限公司 | Voice gain factor estimating device and method |
WO2006136900A1 (en) * | 2005-06-15 | 2006-12-28 | Nortel Networks Limited | Method and apparatus for non-intrusive single-ended voice quality assessment in voip |
CN103187068A (en) * | 2011-12-30 | 2013-07-03 | 联芯科技有限公司 | Priori signal-to-noise ratio estimation method, device and noise inhibition method based on Kalman |
CN105280193A (en) * | 2015-07-20 | 2016-01-27 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Prior signal-to-noise ratio estimating method based on MMSE error criterion |
CN105702262A (en) * | 2014-11-28 | 2016-06-22 | 上海航空电器有限公司 | Headset double-microphone voice enhancement method |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101814290A (en) * | 2009-02-25 | 2010-08-25 | 三星电子株式会社 | Method for enhancing robustness of voice recognition system |
CN101853665A (en) * | 2009-06-18 | 2010-10-06 | 博石金(北京)信息技术有限公司 | Method for eliminating noise in voice |
JP6129316B2 (en) * | 2012-09-03 | 2017-05-17 | フラウンホーファー−ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン | Apparatus and method for providing information-based multi-channel speech presence probability estimation |
CN102938254B (en) * | 2012-10-24 | 2014-12-10 | 中国科学技术大学 | Voice signal enhancement system and method |
US9449610B2 (en) * | 2013-11-07 | 2016-09-20 | Continental Automotive Systems, Inc. | Speech probability presence modifier improving log-MMSE based noise suppression performance |
US9449609B2 (en) * | 2013-11-07 | 2016-09-20 | Continental Automotive Systems, Inc. | Accurate forward SNR estimation based on MMSE speech probability presence |
CN103646648B (en) * | 2013-11-19 | 2016-03-23 | 清华大学 | A kind of noise power estimation method |
CN105741849B (en) * | 2016-03-06 | 2019-03-22 | 北京工业大学 | The sound enhancement method of phase estimation and human hearing characteristic is merged in digital deaf-aid |
- 2016-11-10: CN application CN201611039463.4A, patent CN108074582B (en), status: Active
- 2017-10-17: WO application PCT/CN2017/106502, publication WO2018086444A1 (en), status: Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006136900A1 (en) * | 2005-06-15 | 2006-12-28 | Nortel Networks Limited | Method and apparatus for non-intrusive single-ended voice quality assessment in voip |
CN1763846A (en) * | 2005-11-23 | 2006-04-26 | 北京中星微电子有限公司 | Voice gain factor estimating device and method |
CN103187068A (en) * | 2011-12-30 | 2013-07-03 | 联芯科技有限公司 | Priori signal-to-noise ratio estimation method, device and noise inhibition method based on Kalman |
CN105702262A (en) * | 2014-11-28 | 2016-06-22 | 上海航空电器有限公司 | Headset double-microphone voice enhancement method |
CN105280193A (en) * | 2015-07-20 | 2016-01-27 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Prior signal-to-noise ratio estimating method based on MMSE error criterion |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210327448A1 (en) * | 2018-12-18 | 2021-10-21 | Tencent Technology (Shenzhen) Company Limited | Speech noise reduction method and apparatus, computing device, and computer-readable storage medium |
CN111986693A (en) * | 2020-08-10 | 2020-11-24 | 北京小米松果电子有限公司 | Audio signal processing method and device, terminal equipment and storage medium |
CN113838474A (en) * | 2021-11-25 | 2021-12-24 | 全时云商务服务股份有限公司 | Communication system howling suppression method and device |
CN113838474B (en) * | 2021-11-25 | 2022-02-18 | 全时云商务服务股份有限公司 | Communication system howling suppression method and device |
CN114724571A (en) * | 2022-03-29 | 2022-07-08 | 大连理工大学 | Robust distributed speaker noise elimination system |
CN114724571B (en) * | 2022-03-29 | 2024-05-03 | 大连理工大学 | Robust distributed speaker noise elimination system |
Also Published As
Publication number | Publication date |
---|---|
CN108074582B (en) | 2021-08-06 |
CN108074582A (en) | 2018-05-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2018086444A1 (en) | Method for estimating signal-to-noise ratio for noise suppression, and user terminal | |
US20210327448A1 (en) | Speech noise reduction method and apparatus, computing device, and computer-readable storage medium | |
US20230298610A1 (en) | Noise suppression method and apparatus for quickly calculating speech presence probability, and storage medium and terminal | |
US8239196B1 (en) | System and method for multi-channel multi-feature speech/noise classification for noise suppression | |
CN110634497B (en) | Noise reduction method and device, terminal equipment and storage medium | |
WO2021179424A1 (en) | Speech enhancement method combined with ai model, system, electronic device and medium | |
US8483398B2 (en) | Methods and systems for reducing acoustic echoes in multichannel communication systems by reducing the dimensionality of the space of impulse responses | |
JP6361156B2 (en) | Noise estimation apparatus, method and program | |
AU2015240992B2 (en) | Situation dependent transient suppression | |
WO2021128670A1 (en) | Noise reduction method, device, electronic apparatus and readable storage medium | |
WO2012158156A1 (en) | Noise supression method and apparatus using multiple feature modeling for speech/noise likelihood | |
CN109817234A (en) | Targeted voice signal Enhancement Method, system and storage medium based on continuing noise tracking | |
CN109727607B (en) | Time delay estimation method and device and electronic equipment | |
WO2020124325A1 (en) | Echo elimination adaptive filtering method, apparatus, device and storage medium | |
US20140357326A1 (en) | Echo suppression | |
WO2022218254A1 (en) | Voice signal enhancement method and apparatus, and electronic device | |
WO2019119593A1 (en) | Voice enhancement method and apparatus | |
WO2012166092A1 (en) | Control of adaptation step size and suppression gain in acoustic echo control | |
WO2021143249A1 (en) | Transient noise suppression-based audio processing method, apparatus, device, and medium | |
CN112289337B (en) | Method and device for filtering residual noise after machine learning voice enhancement | |
WO2021007841A1 (en) | Noise estimation method, noise estimation apparatus, speech processing chip and electronic device | |
CN113763975B (en) | Voice signal processing method, device and terminal | |
CN113611319A (en) | Wind noise suppression method, device, equipment and system based on voice component | |
CN116913306A (en) | Voice enhancement method and device and electronic equipment | |
CN116453538A (en) | Voice noise reduction method and device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
|  | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 17869048; Country of ref document: EP; Kind code of ref document: A1 |
|  | NENP | Non-entry into the national phase | Ref country code: DE |
|  | 122 | EP: PCT application non-entry in European phase | Ref document number: 17869048; Country of ref document: EP; Kind code of ref document: A1 |