WO2021007841A1 - Noise estimation method, noise estimation apparatus, speech processing chip and electronic device - Google Patents

Noise estimation method, noise estimation apparatus, speech processing chip and electronic device

Info

Publication number
WO2021007841A1
WO2021007841A1 PCT/CN2019/096503 CN2019096503W WO2021007841A1 WO 2021007841 A1 WO2021007841 A1 WO 2021007841A1 CN 2019096503 W CN2019096503 W CN 2019096503W WO 2021007841 A1 WO2021007841 A1 WO 2021007841A1
Authority
WO
WIPO (PCT)
Prior art keywords
speech
power spectrum
target
existence
noisy speech
Prior art date
Application number
PCT/CN2019/096503
Other languages
French (fr)
Chinese (zh)
Inventor
何婷婷
王鑫山
朱虎
李国梁
郭红敬
Original Assignee
深圳市汇顶科技股份有限公司
Application filed by 深圳市汇顶科技股份有限公司
Priority to CN201980001368.0A (published as CN112602150A)
Priority to PCT/CN2019/096503 (published as WO2021007841A1)
Publication of WO2021007841A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation

Definitions

  • The embodiments of the present application relate to the field of signal processing technology, and in particular to a noise estimation method, a noise estimation device, a speech processing chip, and an electronic device.
  • Voice is an important means of communication between people.
  • The forms of communication between people are becoming increasingly diverse.
  • Examples include voice communication such as phone calls, WeChat voice, and video; in addition, voice communication is no longer limited to people.
  • Voice interactions between people and machines, and between machines, are now common in daily life.
  • Speech enhancement technology is an important means to suppress noise.
  • The main tasks of speech enhancement include two aspects. First, it suppresses noise through signal processing to obtain relatively pure speech, thereby improving the intelligibility and comfort of speech and reducing the hearing fatigue caused by noise. Second, as a necessary part of various voice communication and voice interaction systems, it can effectively reduce the bit error rate of voice communication and the error rate of voice recognition, thereby improving the performance of the voice processing system.
  • Speech enhancement technology is an important branch in the field of signal processing.
  • There are many speech enhancement technologies, mainly including non-parametric methods, parametric methods, statistical methods, wavelet transforms, and neural networks.
  • Among non-parametric methods, the more typical processing techniques include spectral subtraction and its improved variants. This type of method is widely used because of its simple principle and easy implementation.
  • However, non-parametric methods produce severe musical noise under strong noise.
  • Typical processing techniques among parametric methods include the subspace method. The subspace method requires eigenvalue decomposition, which introduces a large amount of computation, so it is rarely used in engineering practice.
  • The typical representative of statistical methods is the Minimum Mean Square Error (MMSE) method and its improved variants.
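  • The embodiments of the present application provide a noise estimation method, a noise estimation device, a speech processing chip, and an electronic device to overcome the above-mentioned defects in the prior art.
  • An embodiment of the present application provides a noise estimation method, which includes:
  • determining an initial estimated noise power spectrum of noisy speech; calculating a smoothing factor according to the probability of the existence of the target speech; and updating the initial estimated noise power spectrum according to the noisy speech and the smoothing factor to obtain an effective noise power spectrum.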
  • An embodiment of the present application provides a noise estimation device, which includes:
  • an initial noise estimation unit, used to determine the initial estimated noise power spectrum of noisy speech; and
  • a noise update unit, configured to update the initial estimated noise power spectrum according to the noisy speech and a smoothing factor to obtain an effective noise power spectrum, where the smoothing factor is calculated according to the probability of the existence of the target speech.
  • An embodiment of the present application provides a voice processing chip, which includes the noise estimation device in any embodiment of the present application.
  • An embodiment of the present application provides an electronic device, which includes the voice processing chip in any embodiment of the present application.
  • The smoothing factor is calculated according to the probability of the existence of the target speech, and the initial estimated noise power spectrum is updated according to the noisy speech and the smoothing factor to obtain the effective noise power spectrum.
  • This makes the effective noise power spectrum as close to the real noise power spectrum as possible, so that noise is eliminated as far as possible in the subsequent noise elimination process, noise residue is avoided, and the overall performance of speech enhancement is improved.
  • FIG. 1 is a schematic diagram of the structure of a speech enhancement system in Embodiment 1 of this application;
  • FIG. 2 is a schematic flowchart of a voice enhancement method in Embodiment 2 of this application;
  • Figures 3 and 4 respectively show the first and second schematic diagrams of the mapping curves of the posterior probability and the smoothing factor of the target speech.
  • Fig. 1 is a schematic structural diagram of a speech enhancement system in Embodiment 1 of this application; as shown in Fig. 1, the noise estimation scheme of this application is applied in the speech enhancement system.
  • The specific structure of the speech enhancement system in this embodiment is only an example and is not a limitation.
  • Those of ordinary skill in the art can also simplify some of the modules according to the needs of the application scenario, or add other modules on this basis.
  • The functions of the modules can also be integrated with each other.
  • the voice enhancement system includes: a collection module, a preprocessing module, a voice enhancement device, a restoration module, and an output module.
  • The collection module may specifically be a voice receiving device such as a microphone. It is mainly used to collect the target voice generated by the sound source of interest; environmental noise and interference from other sound sources are collected at the same time, which yields noisy speech that includes both target speech and noise.
  • The collection module also performs processing such as sampling and encoding on the noisy speech and converts it into a binary code group, that is, noisy speech in digital form, also simply called the original digital noisy speech.
  • The preprocessing module is used to sequentially perform windowing and framing, pre-emphasis, and fast Fourier transform (FFT) processing on the noisy speech, finally converting the noisy speech from the time domain to the frequency domain.
  • the preprocessing module includes but is not limited to the above processing steps.
  • the pre-processing module may include a windowing unit, a pre-emphasis unit, and an FFT unit, but is not limited to the above processing unit.
  • the windowing unit is mainly used for windowing and framing the input noisy speech through a window function.
  • The duration of each frame of noisy speech is between 10 ms and 30 ms.
  • Adjacent frames of noisy speech are windowed with overlap, and the degree of overlap is 50%.
  • The window function is selected according to the application scenario, for example a rectangular window, a Hamming window, or a Kaiser window.
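The framing step described above can be sketched as follows. This is a minimal illustration, assuming a 20 ms frame, a symmetric Hamming window, and the hypothetical helper names `hamming_window` and `split_into_frames`; none of these names come from the application.

```python
import math


def hamming_window(length):
    """Return a symmetric Hamming window of the given length."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (length - 1))
            for i in range(length)]


def split_into_frames(samples, sample_rate_hz, frame_ms=20, overlap=0.5):
    """Split samples into windowed frames with the given overlap.

    Trailing samples that do not fill a whole frame are dropped
    (an assumption made for this sketch).
    """
    frame_len = int(sample_rate_hz * frame_ms / 1000)
    hop = max(1, int(frame_len * (1.0 - overlap)))
    window = hamming_window(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames
```

With a sampling rate of 8000 Hz, a 20 ms frame holds 160 samples and a 50% overlap gives a hop of 80 samples.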
  • the pre-emphasis unit performs pre-emphasis processing on each frame of noisy speech after windowing and framing, so as to enhance the high-frequency components of the noisy speech and at the same time remove the influence of lip radiation.
  • the pre-emphasis unit can be specifically, but not limited to, a high-pass filter.
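A common way to realize such a high-pass pre-emphasis is a first-order difference filter. The sketch below assumes that form and an illustrative coefficient of 0.97; the application does not specify the filter, and `pre_emphasis` is a hypothetical name.

```python
def pre_emphasis(frame, coeff=0.97):
    """Apply y[n] = x[n] - coeff * x[n-1] to one frame.

    The first sample is passed through unchanged because it has no
    predecessor inside the frame (an assumption of this sketch).
    """
    if not frame:
        return []
    out = [frame[0]]
    for n in range(1, len(frame)):
        out.append(frame[n] - coeff * frame[n - 1])
    return out
```

A constant input is attenuated after the first sample, while a rapidly alternating input is amplified, which is the high-frequency boost the text describes.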
  • The FFT unit performs a fast Fourier transform on each frame of noisy speech after pre-emphasis to obtain the frequency domain signal of each frame, so that noise reduction can be performed on the noisy speech in the frequency domain.
  • the speech enhancement device is mainly used to estimate the noise in the noisy speech in the frequency domain, and further remove the noise from the noisy speech by means of filtering.
  • the speech enhancement device includes a noise estimation device, a noise update control module, and a filtering module.
  • Since this embodiment performs noise estimation and filter coefficient calculation based on the noisy speech power spectrum, the speech enhancement device also includes a power spectrum calculation module. The speech enhancement device therefore includes four main modules: a power spectrum calculation module, a noise estimation device, a noise update control module, and a filtering module.
  • The speech enhancement device does not necessarily include the power spectrum calculation module and the filtering module. Those of ordinary skill in the art can also place the power spectrum calculation module and the filtering module on other modules of the speech enhancement system as needed, or make them independent modules.
  • The power spectrum calculation module is configured to calculate the power spectrum of the noisy speech according to the spectrum of the noisy speech.
  • The initial noise estimation unit is further configured to determine the initial estimated noise power spectrum of the noisy speech according to the noisy speech power spectrum.
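As an illustration of the power spectrum calculation, the sketch below computes the squared magnitude of each complex spectral bin. A naive DFT stands in for the FFT unit so that the example stays dependency-free; the function names are hypothetical.

```python
import cmath


def dft(frame):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def power_spectrum(spectrum):
    """Return |Y(k)|^2 for every frequency point k."""
    return [c.real * c.real + c.imag * c.imag for c in spectrum]
```

For a unit impulse every bin of the spectrum has magnitude one, so the power spectrum is flat.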
  • the noise update control module is used to calculate the smoothing factor according to the posterior probability of the target voice.
  • The noise estimation device is used to determine the initial estimated noise power spectrum according to the noisy speech power spectrum, and to update or correct the initial estimated noise power spectrum according to the smoothing factor output by the noise update control module to obtain the effective noise power spectrum.
  • the filtering module is configured to calculate filter coefficients according to the effective noise power spectrum; according to the filter coefficients, the real part and the imaginary part of the noisy speech spectrum are respectively filtered to obtain an enhanced speech spectrum.
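The application does not give the filter coefficient formula in this passage. As a hedged sketch, the code below assumes a power spectral subtraction style gain with a lower floor, and applies the real-valued gain to the real and imaginary parts separately as the text describes. The names `subtraction_gain`, `apply_gain`, and `gain_floor` are hypothetical.

```python
def subtraction_gain(noisy_power, noise_power, gain_floor=0.1):
    """Assumed gain rule: G = max(1 - N / |Y|^2, gain_floor) per bin."""
    gains = []
    for p_y, p_n in zip(noisy_power, noise_power):
        g = 1.0 - p_n / p_y if p_y > 0.0 else gain_floor
        gains.append(max(g, gain_floor))
    return gains


def apply_gain(spectrum, gains):
    """Filter the real and imaginary parts of each bin with the same gain."""
    return [complex(g * c.real, g * c.imag) for c, g in zip(spectrum, gains)]
```

Because the gain is real, filtering the two parts separately leaves the phase of the noisy spectrum unchanged.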
  • the noise in the noisy speech is estimated in the frequency domain, and the noise is further eliminated from the noisy speech through frequency domain filtering.
  • The noise estimation device adopts a two-step estimation: determining the initial estimated noise power spectrum, and then updating the initial estimated noise power spectrum to obtain the effective noise power spectrum.
  • The noise estimation device includes an initial noise estimation unit and a noise update unit. The initial noise estimation unit is used to perform windowing on the noisy speech power spectrum, that is, smoothing between frequency points; to smooth the windowed noisy speech power spectrum across successive frames, that is, inter-frame smoothing, to obtain a smoothed noisy speech power spectrum; and to search for the minimum of the inter-frame-smoothed noisy speech power spectrum within a certain time window, using the minimum power spectrum found as the initial estimated noise power spectrum. The noise update unit is configured to update the initial estimated noise power spectrum according to the smoothing factor output by the noise update control module to obtain the effective noise power spectrum.
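The three operations of the initial noise estimation unit (smoothing between frequency points, inter-frame smoothing, and a minimum search over a time window) can be sketched as below. The three-tap window, the recursive smoothing constant `alpha`, the search length, and the class name `MinimumTracker` are all assumptions made for illustration.

```python
from collections import deque


def smooth_over_frequency(power, window):
    """Convolve with a normalized window, clamping indices at the edges."""
    half = len(window) // 2
    last = len(power) - 1
    return [sum(w * power[min(max(k + i - half, 0), last)]
                for i, w in enumerate(window))
            for k in range(len(power))]


class MinimumTracker:
    """Tracks the per-bin minimum of the smoothed power spectrum."""

    def __init__(self, alpha=0.8, search_frames=50, window=(0.25, 0.5, 0.25)):
        self.alpha = alpha            # inter-frame smoothing constant
        self.window = window          # normalized frequency window
        self.smoothed = None          # smoothed spectrum of the last frame
        self.history = deque(maxlen=search_frames)

    def update(self, noisy_power):
        """Feed one frame; return the initial noise estimate per bin."""
        p_w = smooth_over_frequency(noisy_power, self.window)
        if self.smoothed is None:
            self.smoothed = p_w
        else:
            self.smoothed = [self.alpha * s + (1.0 - self.alpha) * p
                             for s, p in zip(self.smoothed, p_w)]
        self.history.append(self.smoothed)
        return [min(frame[k] for frame in self.history)
                for k in range(len(p_w))]
```

Storing whole frames in a bounded deque keeps the sketch short; a production design would more likely use sub-window minima to reduce memory.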
  • Other methods can also be chosen to determine the initial estimated noise power spectrum, such as quantile-based methods, histogram-based methods, or time-recursive averaging, depending on hardware overhead, algorithm simplicity, application scenario, and algorithm performance. Since the initial noise estimation unit only produces a rough estimate of the noise in the noisy speech, there is still a large deviation between the initial estimated noise power spectrum and the real noise power spectrum; generally, the initial estimate is smaller than the real noise power spectrum. As mentioned earlier, the accuracy of noise estimation directly affects the performance of the subsequent filter, and hence the overall performance of the speech enhancement system.
  • Therefore, this application adds a noise update unit to correct (or update) the initial estimated noise power spectrum, so that the effective noise power spectrum obtained after the correction is as close as possible to the real noise power spectrum. This effectively solves the problem of large residual noise in the filtered, enhanced speech and improves the overall performance of the speech enhancement system.
  • The smoothing factor is calculated in real time for each frame of noisy speech, and the initial estimated noise power spectrum is updated with the smoothing factor to obtain the effective noise power spectrum.
  • The effective noise power spectrum is closer to the real noise power spectrum, which solves the problem that the initial estimated noise power spectrum is small compared with the real noise power spectrum.
  • Because the smoothing factor is a function of the posterior probability of the target speech, its size can be controlled according to the posterior probability of the target speech at each frequency point in the current frame, which effectively prevents the effective noise power spectrum from becoming too large compared with the real noise power spectrum. Therefore, adding a noise update control module effectively solves both the excessive noise residue caused by a small initial estimated noise power spectrum during speech enhancement and the voice loss caused by a large effective noise power spectrum.
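The exact update rule is not stated in this passage. One plausible form, assumed here purely for illustration, is a per-bin convex combination of the initial noise estimate and the noisy speech power, weighted by the smoothing factor. The function name `update_noise` is hypothetical.

```python
def update_noise(initial_noise, noisy_power, smoothing):
    """Assumed update: N_eff = a * N_init + (1 - a) * |Y|^2 per bin.

    A smoothing factor near 1 (target speech likely present) keeps the
    initial estimate; a smaller factor lets the estimate follow the
    noisy speech power, which raises an underestimated noise floor.
    """
    return [a * n0 + (1.0 - a) * p_y
            for n0, p_y, a in zip(initial_noise, noisy_power, smoothing)]
```

This shows why tying the smoothing factor to the speech presence probability limits both failure modes named above: the estimate only moves toward the noisy power where speech is unlikely.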
  • The following provides a specific noise update control module.
  • The noise update control module includes a likelihood ratio calculation unit, a prior probability calculation unit for the existence of a target voice, a posterior probability calculation unit for the existence of a target voice, and a smoothing factor calculation unit.
  • The likelihood ratio calculation unit, the prior probability calculation unit for the existence of the target speech, the posterior probability calculation unit for the existence of the target speech, and the smoothing factor calculation unit all carry out their respective processing based on the power spectrum of the noisy speech in the frequency domain.
  • The noise update control module in this embodiment is only an example and not a limitation. A person of ordinary skill in the art can also simplify some of the modules according to the needs of the application scenario, or add other modules on this basis. The functions of the modules can also be integrated with each other.
  • The likelihood ratio calculation unit is used to calculate the likelihood ratio according to the probability density distribution function of the noisy speech spectrum when the target speech is assumed to exist and the probability density distribution function of the noisy speech spectrum when the target speech is assumed to be absent. Further, when calculating the likelihood ratio from these two probability density distribution functions, the unit replaces the target speech power spectrum with the estimated enhanced speech power spectrum and replaces the real noise power spectrum with the initial estimated noise power spectrum, so that the likelihood ratio is calculated from the noisy speech power spectrum, the enhanced speech power spectrum, and the initial estimated noise power spectrum.
  • the specific method for the likelihood ratio calculation unit to calculate the likelihood ratio depends on the probability density distribution characteristics of the target speech spectrum and the noise spectrum in a specific application scenario. For details, please refer to the description of the following method embodiments.
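As one concrete case, suppose both the target speech spectrum and the noise spectrum are zero-mean complex Gaussian. That distribution is an assumption of this sketch, not something this passage fixes. The ratio of the two densities then reduces to exp(g * x / (1 + x)) / (1 + x), where x is the enhanced speech power over the noise power and g is the noisy speech power over the noise power. The function name is hypothetical.

```python
import math


def gaussian_likelihood_ratio(noisy_power, enhanced_power, noise_power):
    """Likelihood ratio p(Y|H1) / p(Y|H0) per bin under a Gaussian model.

    enhanced_power stands in for the target speech power spectrum and
    noise_power for the real noise power spectrum, as the text describes.
    noise_power is assumed to be strictly positive.
    """
    ratios = []
    for p_y, p_x, p_n in zip(noisy_power, enhanced_power, noise_power):
        xi = p_x / p_n        # speech-to-noise power ratio
        gamma = p_y / p_n     # noisy-to-noise power ratio
        exponent = min(gamma * xi / (1.0 + xi), 700.0)  # avoid overflow
        ratios.append(math.exp(exponent) / (1.0 + xi))
    return ratios
```

When the enhanced speech power is zero the ratio is exactly 1, meaning the observation does not favor either hypothesis.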
  • The prior probability calculation unit for the existence of the target speech is used to determine, according to the power spectrum of the noisy speech, the prior probability of the existence of the effective target speech, which indicates the possibility that the target speech exists in the noisy speech.
  • The prior probability calculation unit for the existence of the target speech calculates the prior probability of the existence of the target speech in two steps. In the first step, it preliminarily judges, according to the power spectrum of the noisy speech, whether there is a target voice in the noisy speech of the current frame. In the second step, it determines the estimated prior probability of the existence of the target voice according to the result of that preliminary judgment, and then determines the prior probability of the existence of the effective target speech according to the estimated prior probability.
  • The prior probability calculation unit for the existence of the target speech is further configured to: if the target speech does not exist, perform inter-frequency smoothing on the power spectrum of the noisy speech without the target speech to obtain the noisy speech power spectrum smoothed between frequency points; or, if the target speech exists, use the historical inter-frame-smoothed noisy speech power spectrum as the noisy speech power spectrum smoothed between frequency points; perform inter-frame smoothing on the noisy speech power spectrum smoothed between frequency points to obtain the inter-frame-smoothed noisy speech power spectrum; and determine the estimated prior probability of the existence of the target speech according to the inter-frame-smoothed noisy speech power spectrum.
  • The historical inter-frame-smoothed noisy speech power spectrum may directly be the inter-frame-smoothed noisy speech power spectrum obtained for the previous frame of noisy speech.
  • However, the choice is not limited to the inter-frame-smoothed noisy speech power spectrum of the previous frame; it can be selected flexibly according to the requirements of the application scenario.
  • The prior probability calculation unit for the existence of the target speech further determines the first detection factor at each frequency point of the current frame according to the power spectrum of the noisy speech and the minimum power spectrum of the inter-frame-smoothed noisy speech.
  • It determines the second detection factor at each frequency point of the current frame according to the power spectrum of the noisy speech after windowing and inter-frame smoothing and the minimum power spectrum of the inter-frame-smoothed noisy speech. From the first detection factor and the second detection factor calculated at each frequency point of the current frame, it preliminarily determines whether the target voice exists at each frequency point of the noisy speech.
  • If, at a certain frequency point, the first detection factor is less than the set first detection factor threshold and the second detection factor is less than the set second detection factor threshold, it is preliminarily determined that the noisy speech does not have the target speech at this frequency point; if these conditions are not met, it is preliminarily determined that the noisy speech has the target speech at this frequency point.
  • The prior probability calculation unit for the existence of the target speech uses a defined indicator function to indicate whether the target speech exists at a given frequency point of the noisy speech in the current frame. When it is determined that the target speech exists at the frequency point in the current frame, the value of the indicator function at that frequency point is 0; when it is determined that the target speech does not exist at the frequency point in the current frame, the value of the indicator function at that frequency point is 1.
  • The prior probability calculation unit for the existence of the target speech is further configured to perform inter-frequency smoothing (the first smoothing) of the noisy speech in the current frame according to the value of the indicator function calculated at each frequency point of the current frame. If the values of the indicator function at the frequency points of the current frame are not all zero, it is determined that the target voice does not exist in the frame, and the indicator function and the window function are used to smooth the power spectrum of the noisy speech without the target speech between frequency points; inter-frame smoothing (the second smoothing) is then performed on the result of the first smoothing to obtain the twice-smoothed noisy speech power spectrum. If the values of the indicator function at the frequency points of the current frame are all zero, it is determined that the target speech exists in the frame, and the twice-smoothed noisy speech power spectrum obtained in the previous frame is used as the twice-smoothed noisy speech power spectrum of the current frame.
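The two smoothing passes can be sketched as follows. This version makes the "not all zero" decision per frequency point over the span of the window rather than once per frame, which is an assumption of the sketch; the window taps, the constant `alpha`, and the name `speech_free_smoothing` are likewise hypothetical.

```python
def speech_free_smoothing(noisy_power, indicator, previous,
                          window=(0.25, 0.5, 0.25), alpha=0.9):
    """Indicator-weighted frequency smoothing followed by frame smoothing.

    indicator[k] is 1 where the target speech is judged absent and 0
    where it is judged present. Bins whose window contains no
    speech-free point reuse the previous twice-smoothed value.
    """
    half = len(window) // 2
    n = len(noisy_power)
    out = []
    for k in range(n):
        num = 0.0
        den = 0.0
        for i, w in enumerate(window):
            j = k + i - half
            if 0 <= j < n and indicator[j]:
                num += w * noisy_power[j]
                den += w
        if den > 0.0:
            first_pass = num / den                      # first smoothing
            out.append(alpha * previous[k] + (1.0 - alpha) * first_pass)
        else:
            out.append(previous[k])                     # speech everywhere
    return out
```

Excluding speech-dominated bins from the average keeps strong speech components from inflating the spectrum used for the later detection factors.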
  • The prior probability calculation unit for the existence of the target speech is further configured to determine the third detection factor at each frequency point of the current frame according to the noisy speech power spectrum and the minimum power spectrum of the twice-smoothed noisy speech power spectrum.
  • The fourth detection factor at each frequency point in the current frame is determined according to the twice-smoothed noisy speech power spectrum and its minimum power spectrum. According to the third detection factor and the fourth detection factor at each frequency point, the estimated prior probability of the existence of the target speech at the corresponding frequency point is determined.
  • The prior probability calculation unit for the existence of the target voice is further configured to compare the third detection factor and the fourth detection factor calculated at each frequency point of the current frame with their corresponding thresholds, and to determine the estimated prior probability of the existence of the target speech at the corresponding frequency point in the current frame according to the comparison results.
  • The prior probability calculation unit for the existence of the target voice is further configured to compare the estimated prior probability of the existence of the target speech at each frequency point of the current frame with the minimum value of the prior probability of the existence of the target speech, and to take the larger of the two as the prior probability of the existence of the effective target speech at that frequency point. In this way, the prior probability of the existence of the effective target speech at each frequency point of each frame of noisy speech is obtained.
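The lower bound described above amounts to a per-bin maximum. A minimal sketch, with an illustrative minimum of 0.05 and a hypothetical function name:

```python
def effective_prior(estimated_prior, q_min=0.05):
    """Clamp the estimated prior probability from below by q_min per bin."""
    return [max(q, q_min) for q in estimated_prior]
```

Keeping the prior strictly above zero means a later posterior computation can still respond to a large likelihood ratio in bins where the estimated prior collapsed to zero.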
  • the posterior probability calculation unit for the existence of the target speech is configured to determine the posterior probability of the existence of the target speech according to the likelihood ratio and the prior probability of the existence of the effective target speech.
  • The smoothing factor calculation unit is used to determine the mapping model between the posterior probability of the existence of the target speech and the smoothing factor according to the noise reduction scenario.
  • The posterior probability of the existence of the target speech is used as the input of the mapping model, and the smoothing factor is the output of the mapping model.
  • The smoothing factor is a function of the posterior probability of the existence of the target speech. For the frequency domain speech signal corresponding to each frame of noisy speech, the posterior probability of the existence of the target speech can be calculated at each frequency point, and the posterior probabilities at different frequency points can be mapped to different smoothing factors. The smoothing factors obtained by the mapping are then used to correct the initial estimated noise power spectrum at each frequency point.
  • y(n) is the collected noisy speech
  • x(n) is the target speech
  • n(n) is the noise
  • the n in parentheses represents the sampling time sequence.
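These symbols are consistent with the usual additive noise model, in which the noisy speech is the sum of the target speech and the noise. The relation below is written out under that assumption:

```latex
y(n) = x(n) + n(n)
```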
  • step S204a specifically includes the following steps when determining the initial estimated noise power spectrum:
  • hamming(n) is the normalized Hamming window
  • cov is the convolution operation
  • $P_w(\lambda,k)$ is the power spectrum of noisy speech after windowing in the $\lambda$-th frame
  • m is the parameter representing the window length
  • k represents different frequency points.
  • the above formula (15) can be simplified to obtain likelihood ratio calculation formulas in different simplified forms to save hardware resource overhead.
  • $H_0$ indicates that there is no target speech
  • $H_1$ indicates that there is a target speech. Therefore, $p(Y(\lambda,k) \mid H_1)$ represents the probability density distribution function of the noisy speech spectrum in the $\lambda$-th frame when there is a target speech. Referring again to formula (13), the likelihood ratio corresponding to the $k$-th frequency point is the ratio of $p(Y(\lambda,k) \mid H_1)$ to $p(Y(\lambda,k) \mid H_0)$.
  • the first step is to preliminarily determine whether there is a target speech in the noisy speech in the current frame according to the power spectrum of the noisy speech;
  • the second step is to determine the estimated prior probability of the existence of the target speech according to the preliminary judgment of whether the target speech exists in the noisy speech in the current frame, and then determine the prior probability of the existence of the effective target speech.
  • Determining the estimated prior probability of the existence of the target speech according to the power spectrum of the noisy speech includes: performing inter-frequency smoothing and inter-frame smoothing on the power spectrum of the noisy speech without the target speech; and determining the estimated prior probability of the existence of the target speech according to the twice-smoothed noisy speech power spectrum.
  • In step S204c, when preliminarily determining whether the target speech exists in the noisy speech, the first detection factor at each frequency point of the current frame is determined according to the power spectrum of the noisy speech and the minimum power spectrum of the noisy speech after windowing and inter-frame smoothing, namely the above $P_{\min}(\lambda,k)$.
  • The second detection factor at each frequency point in the current frame is determined according to the power spectrum of the noisy speech after windowing and inter-frame smoothing, namely the above $P(\lambda,k)$, and the same minimum power spectrum $P_{\min}(\lambda,k)$.
  • If the first detection factor calculated at a certain frequency point of the noisy speech in the current frame is less than the set first detection factor threshold, and the second detection factor at this frequency point is less than the set second detection factor threshold, it is preliminarily determined that the noisy speech does not have the target voice at this frequency point; if these conditions are not met, it is preliminarily determined that the noisy speech has the target voice at this frequency point.
  • $\gamma_0$ and $\zeta_0$ are thresholds
  • The second detection factor $\zeta(\lambda,k)$ is determined according to the smoothed noisy speech power spectrum $P(\lambda,k)$ of the $\lambda$-th frame and the minimum power spectrum $P_{\min}(\lambda,k)$ calculated according to the above formula (6) or (8); $\zeta(\lambda,k)$ is used to detect whether there is a target voice in the frequency domain signal corresponding to each frequency point of the noisy speech in the $\lambda$-th frame.
  • When the target speech is absent, the values of the first detection factor and the second detection factor calculated according to the above formulas (17) and (18) are relatively small. Therefore, the first detection factor $\gamma_{\min}(\lambda,k)$ and the second detection factor $\zeta(\lambda,k)$ obtained from formulas (17) and (18) are compared with the corresponding thresholds $\gamma_0$ and $\zeta_0$. If the first detection factor $\gamma_{\min}(\lambda,k)$ at a certain frequency point in the current frame is less than the threshold $\gamma_0$, and the second detection factor $\zeta(\lambda,k)$ is less than the threshold $\zeta_0$, it is preliminarily determined that the noisy speech includes only noise at this frequency point and does not include the target speech. Under all other conditions, it is preliminarily determined that the noisy speech at this frequency point includes both noise and target speech.
  • The thresholds $\gamma_0$ and $\zeta_0$ can be set flexibly according to the application scenario.
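The preliminary decision can be sketched as below. Formulas (17) and (18) are not reproduced in this text, so the sketch assumes each detection factor is a plain ratio to the tracked minimum power spectrum, with no bias compensation; the default threshold values and the name `preliminary_indicator` are illustrative assumptions.

```python
def preliminary_indicator(noisy_power, smoothed_power, min_power,
                          first_threshold=4.6, second_threshold=1.67):
    """Return 1 per bin where the target speech is judged absent, else 0.

    Assumed detection factors (min_power must be strictly positive):
      first  = noisy power    / minimum power
      second = smoothed power / minimum power
    """
    indicator = []
    for p_y, p_s, p_min in zip(noisy_power, smoothed_power, min_power):
        first = p_y / p_min
        second = p_s / p_min
        absent = first < first_threshold and second < second_threshold
        indicator.append(1 if absent else 0)
    return indicator
```

Both factors must be small for a bin to count as noise only, so a single strong frame or a sustained rise in the smoothed power is enough to flag possible speech.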
  • The third detection factor at each frequency point of the current frame is determined according to the noisy speech power spectrum and the minimum power spectrum of the twice-smoothed noisy speech power spectrum. The fourth detection factor at each frequency point in the current frame is determined according to the twice-smoothed noisy speech power spectrum and its minimum power spectrum. The estimated prior probability of the existence of the target speech at each frequency point in the current frame is then determined according to the third detection factor and the fourth detection factor at that frequency point.
  • The noisy speech power spectrum is smoothed between frequency points and between frames to obtain the twice-smoothed noisy speech power spectrum.
  • $\gamma_1$ and $\zeta_0$ are thresholds
  • $q_{\min}$ is the minimum value of the prior probability of the existence of the target speech. Once the application scenario is determined, $q_{\min}$ is approximately fixed; that is, $q_{\min}$ can be set according to the application scenario.
  • $q(\lambda,k)$ is the prior probability of the existence of a valid target speech.
  • the above first detection factor is similarly determined.
  • determining the estimated prior probability that target speech is present includes: according to the minimum power spectrum of the twice-smoothed noisy speech power spectrum and the noisy speech power spectrum
  • the third detection factor and the fourth detection factor calculated for each frequency point of the noisy speech in the current frame are compared with their corresponding thresholds, and the estimated prior probability that target speech is present at each frequency point of the current frame is determined according to the comparison result.
  • the corresponding frequency point is less than the corresponding threshold, the estimated prior probability q_s that target speech is present at this frequency point is determined to be 0; if the value of the third detection factor at a frequency point is greater than 1 but less than its threshold, and the value of the fourth detection factor at the corresponding frequency point is less than its threshold, the estimated prior probability q_s at that frequency point is calculated according to formula (21) above, that is, from the third detection factor and the corresponding threshold; in all cases other than these two, the values of the estimated prior probability q_s
  • S204d Determine the posterior probability of the existence of the target speech at each frequency point of the current frame according to the likelihood ratio and the prior probability of the existence of the effective target speech.
  • the smoothing factor calculation unit calculates the smoothing factor according to the posterior probability of the existence of the target voice
  • the mapping model between the posterior probability of the target voice and the smoothing factor is determined; correspondingly, the corresponding smoothing factor is calculated according to the posterior probability of the target voice, including: The posterior probability of the existence of the target speech is used as the input of the mapping model, and the output of the mapping model is the smoothing factor.
  • Fig. 3 and Fig. 4 exemplarily show the first and second schematic diagrams of the mapping curve of the posterior probability and the smoothing factor of the target speech.
  • the abscissa is the posterior probability of the existence of the target speech, and the ordinate is the smoothing factor.
  • ⁇ , ⁇ , ⁇ , ⁇ are all configurable parameters, and different parameter configurations will produce different p(H 1
  • as can be seen from formulas (28)-(29), the relationship between the posterior probability that target speech is present and the smoothing factor includes a non-linear relationship.
  • the noise update unit is configured to update the initial estimated noise power spectrum according to the smoothing factor and the noisy speech power spectrum to obtain an effective noise power spectrum
  • when step S205 is executed, it is assumed that updating of the initial estimated noise power spectrum stops when target speech is present, so as to avoid damaging the target speech, and that the initial estimated noise power spectrum is updated when target speech is absent, so as to improve the accuracy of noise estimation. Update modes are therefore derived separately for the case without target speech and the case with target speech.
  • ⁇ 3 is the aforementioned smoothing factor, which has a functional relationship with the posterior probability of the existence of the target speech.
  • p(H0|Y) is the posterior probability that target speech is absent, p(H1|Y) is the posterior probability that target speech is present, and p(H0|Y) = 1 - p(H1|Y), where Y is the noisy speech spectrum at the given frame and frequency point k.
  • the above three parameters are calculated by the noise update control module; the remaining term is the initial estimated noise power spectrum obtained for the noisy speech in the previous frame. If the noise reduction capability needs to be enhanced, the effective noise power spectrum corresponding to the noisy speech in the previous frame can be used instead, namely
  • the initial estimated noise power spectrum is updated to obtain the effective noise power spectrum; specifically, the update is performed according to the noisy speech power spectrum, the smoothing factor, the posterior probability that target speech is absent, the initial estimated noise power spectrum, and the posterior probability that target speech is present.
  • the historical initial estimated noise power spectrum can simply be the initial estimated noise power spectrum corresponding to the noisy speech in the previous frame. If the noise reduction capability needs to be enhanced, it can instead be the effective noise power spectrum corresponding to the noisy speech in the previous frame.
  • the filter coefficient calculation module calculates the filter coefficient according to the effective noise power spectrum.
  • the filter module separately filters the real part and the imaginary part of the noisy speech spectrum according to the filter coefficient to obtain an enhanced speech spectrum.
  • a and b are variable parameters.
  • the real target speech power spectrum and the real noise power spectrum cannot be obtained, so the classic decision-directed method is used to approximate the a priori signal-to-noise ratio.
  • ⁇ min is The minimum desirable value.
  • 2 represents the power spectrum of the noisy speech in the ⁇ th frame.
  • the filter mainly includes an adder and a multiplier.
  • the filter coefficients calculated by equations (34) and (35) are used to reduce noise in the real and imaginary parts of the noisy speech spectrum of the current frame; that is, they are combined with the real and imaginary parts by multiplication and addition to obtain the enhanced complex speech spectrum.
  • the restoration module restores the enhanced speech spectrum from the frequency domain back to the time domain to obtain a time domain binary code group
  • the output module decodes and transmits the time-domain binary code group to be played through the speaker.
  • the "user” in the foregoing embodiment is a relative concept, and is not specifically limited to a person, but may also be a machine.
  • the above embodiments can be applied to various application scenarios such as human-to-human voice calls, human-to-robot voice calls, and robot-to-robot voice calls. In fact, the term can be generalized to any object that can produce effective speech.
  • steps S204a-S204e and step S205 are actually an exemplary embodiment of the noise estimation method.
  • further or specific technical implementation manners are not uniquely limited.
  • An embodiment of the present application provides a voice processing chip, which includes the noise estimation device in any embodiment of the present application.
  • An embodiment of the present application also provides an electronic device, which includes the solution described in any of the embodiments of the present application.
  • Mobile communication equipment: this type of equipment is characterized by mobile communication functions, and its main goal is to provide voice and data communications.
  • Such terminals include: smart phones (such as iPhone), multimedia phones, feature phones, and low-end phones.
  • Ultra-mobile personal computer equipment: this type of equipment belongs to the category of personal computers, has calculation and processing functions, and generally also has mobile Internet features.
  • Such terminals include: PDA, MID and UMPC devices, such as iPad.
  • Portable entertainment equipment: this type of equipment can display and play multimedia content.
  • Such devices include: audio, video players (such as iPod), handheld game consoles, e-books, as well as smart toys and portable car navigation devices.
  • a typical implementation device is a computer.
  • the computer may be, for example, a personal computer, a laptop computer, a cell phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or any combination of these devices.
  • the embodiments of the present application can be provided as methods, systems, or computer program products. Therefore, the present application may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, this application may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.
  • These computer program instructions can also be stored in a computer-readable memory that can guide a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including the instruction device.
  • the device implements the functions specified in one process or multiple processes in the flowchart and/or one block or multiple blocks in the block diagram.
  • These computer program instructions can also be loaded on a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing, and the instructions executed on the computer or other programmable equipment thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • this application can be provided as methods, systems, or computer program products. Therefore, this application may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, this application may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.
  • This application may be described in the general context of computer-executable instructions executed by a computer, such as program modules.
  • program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • This application can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network.
  • program modules can be located in local and remote computer storage media including storage devices.

Abstract

A noise estimation method. The method comprises: determining an initially estimated noise power spectrum of noisy speech; calculating a smoothing factor according to the probability that target speech is present; and updating the initially estimated noise power spectrum according to the noisy speech and the smoothing factor so as to obtain an effective noise power spectrum. By means of the method, the effective noise power spectrum is made as close as possible to the real noise power spectrum, thereby avoiding residual noise and improving the overall performance of speech enhancement.

Description

Noise estimation method, noise estimation apparatus, speech processing chip and electronic device
Technical field
The embodiments of the present application relate to the field of signal processing technology, and in particular to a noise estimation method, a noise estimation device, a speech processing chip, and an electronic device.
Background
Speech is an important means of communication between people. With the development of electronic information technology, the forms of communication have become increasingly diverse: in addition to traditional face-to-face conversation, they now include many types of voice communication, such as phone calls, WeChat voice messages, and video calls. Voice communication is also no longer limited to people; in recent years, voice interaction between people and machines, and between machines, has become commonplace in daily life. However, because people and machines are often located in noisy public places, speech is inevitably disturbed by ambient noise during voice communication or human-computer interaction, for example car noise on the street, air-conditioning noise in an office, machine noise in a factory, or interfering sound sources in a restaurant. As a result, the receiver no longer receives clean speech but noisy speech mixed with various kinds of noise. This noise seriously interferes with the speech and degrades the quality of voice communication products: in voice communication it distorts the speech and can cause the communication to fail, and in a speech recognition system it sharply reduces the recognition rate and seriously affects system performance. Noise not only lowers the quality of voice communication products but also gives users a poor experience. Therefore, suppressing noise and extracting a relatively clean speech signal (also called the target speech) is particularly important.
Speech enhancement is an important means of suppressing noise. It has two main tasks. First, it suppresses noise by signal processing to obtain relatively clean speech, thereby improving the intelligibility and comfort of the speech and reducing the listening fatigue caused by noise. Second, it is a necessary stage of speech communication and speech interaction systems, where it can effectively reduce the bit error rate of voice communication and the misrecognition rate of speech recognition, thereby improving the performance of the speech processing system.
Speech enhancement is an important branch of signal processing. The prior art includes several representative classes of speech enhancement techniques, mainly non-parametric methods, parametric methods, statistical methods, and wavelet-transform and neural-network methods. Typical non-parametric techniques include spectral subtraction and its improved variants, which are widely used because their principle is simple and they are easy to implement; under strong noise, however, non-parametric methods produce severe musical noise. Typical parametric techniques include the subspace method, which requires eigenvalue decomposition and therefore a large amount of computation, so it is rarely adopted in engineering practice. Typical statistical methods are the minimum mean square error (MMSE) method and its improved variants, which are optimal estimators under the minimum mean square error criterion and suppress residual noise better, but their principle is more complex and their hardware cost is higher. The emerging wavelet-transform and neural-network techniques are still at the research stage and are currently seldom used in engineering implementations.
In fact, all of the above speech enhancement techniques assume that the noise is known. In practice, however, the noise characteristics cannot be obtained in advance; the noise must instead be estimated from the noisy speech, and the accuracy of that estimate directly affects the overall performance of speech enhancement. An effective noise estimation method is therefore urgently needed to improve the overall performance of speech enhancement.
Summary of the invention
In view of this, the embodiments of the present application provide a noise estimation method, a noise estimation device, a speech processing chip, and an electronic device to overcome the above-mentioned defects in the prior art.
An embodiment of the present application provides a noise estimation method, which includes:
determining an initial estimated noise power spectrum of noisy speech;
calculating a smoothing factor according to the probability that target speech is present;
updating the initial estimated noise power spectrum according to the noisy speech and the smoothing factor to obtain an effective noise power spectrum.
An embodiment of the present application provides a noise estimation device, which includes:
an initial noise estimation unit, configured to determine an initial estimated noise power spectrum of noisy speech;
a noise update unit, configured to update the initial estimated noise power spectrum according to the noisy speech and a smoothing factor to obtain an effective noise power spectrum, the smoothing factor being calculated according to the probability that target speech is present.
An embodiment of the present application provides a speech processing chip, which includes the noise estimation device of any embodiment of the present application.
An embodiment of the present application provides an electronic device, which includes the speech processing chip of any embodiment of the present application.
In the embodiments of the present application, an initial estimated noise power spectrum of the noisy speech is determined, a smoothing factor is calculated according to the probability that target speech is present, and the initial estimated noise power spectrum is updated according to the noisy speech and the smoothing factor to obtain an effective noise power spectrum. The effective noise power spectrum is thereby brought as close as possible to the real noise power spectrum, so that as much noise as possible is removed in the subsequent noise elimination process, residual noise is avoided, and the overall performance of speech enhancement is improved.
Description of the drawings
Some specific embodiments of the present application are described in detail below, by way of example and not limitation, with reference to the accompanying drawings. The same reference numerals in the drawings denote the same or similar components or parts. Those skilled in the art should understand that the drawings are not necessarily drawn to scale. In the drawings:
Fig. 1 is a schematic structural diagram of the speech enhancement system in Embodiment 1 of this application;
Fig. 2 is a schematic flowchart of the speech enhancement method in Embodiment 2 of this application;
Fig. 3 and Fig. 4 are the first and second schematic diagrams, respectively, of the mapping curve between the posterior probability that target speech is present and the smoothing factor.
Detailed description
Implementing any technical solution of the embodiments of the present application does not necessarily require achieving all of the above advantages at the same time.
The specific implementation of the embodiments of the present application is further described below with reference to the drawings.
Fig. 1 is a schematic structural diagram of the speech enhancement system in Embodiment 1 of this application. As shown in Fig. 1, the noise estimation scheme of this application is applied in the speech enhancement system. The specific structure of the speech enhancement system in this embodiment is only an example and not a limitation: those of ordinary skill in the art may omit some of the modules or add other modules according to the needs of the application scenario, and the functions of the modules may also be integrated with one another.
As shown in Fig. 1, in this embodiment the speech enhancement system includes an acquisition module, a preprocessing module, a speech enhancement device, a restoration module, and an output module.
(1) Acquisition module
In this embodiment, the acquisition module may be a speech receiving device such as a microphone. It is mainly used to capture the target speech produced by the sound source of interest; in doing so it also captures ambient noise and interference from other sound sources, yielding noisy speech that contains both target speech and noise. The acquisition module also samples and encodes the noisy speech, converting it into binary code groups, that is, noisy speech in digital form, or simply the original digital noisy speech.
(2) Preprocessing module
The preprocessing module applies windowing and framing, pre-emphasis, and a fast Fourier transform (FFT) to the noisy speech in sequence, finally converting the noisy speech from the time domain to the frequency domain. The preprocessing module includes, but is not limited to, these processing steps.
Further, in this embodiment, the preprocessing module may include a windowing unit, a pre-emphasis unit, and an FFT unit, but is not limited to these units.
Specifically, in this embodiment, the windowing unit applies a window function to the input noisy speech to divide it into frames. Because the target speech is short-time stationary, each frame of noisy speech lasts between 10 and 30 ms. To keep the transition between frames smooth, consecutive frames of noisy speech are windowed with an overlap of 50%. The window function is chosen according to the application scenario and may be, for example, a rectangular window, a Hamming window, or a Kaiser window.
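The windowing and framing step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the 16 kHz sample rate, and the 20 ms frame length are assumptions chosen to fall inside the 10 to 30 ms range, and a Hamming window is used as one of the listed options.

```python
import math


def hamming(n):
    """Return an n-point Hamming window (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def frame_signal(samples, sample_rate=16000, frame_ms=20):
    """Split `samples` into Hamming-windowed frames with 50% overlap.

    sample_rate and frame_ms are illustrative defaults (assumptions);
    frame_ms should stay within the 10-30 ms range given in the text.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = frame_len // 2  # 50% overlap between consecutive frames
    window = hamming(frame_len)
    frames = []
    # Trailing samples that do not fill a whole frame are dropped in this sketch.
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames
```

With these defaults each frame holds 320 samples and a new frame starts every 160 samples.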
Specifically, in this embodiment, the pre-emphasis unit applies pre-emphasis to each frame of noisy speech after windowing and framing, which boosts the high-frequency components of the noisy speech and removes the effect of lip radiation. The pre-emphasis unit may be implemented with, but is not limited to, a high-pass filter.
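One common way to realize such a high-pass pre-emphasis is a first-order difference filter, y[n] = x[n] - a·x[n-1]. The sketch below assumes that form; the coefficient 0.97 is a conventional choice and is not taken from the patent.

```python
def pre_emphasis(frame, coeff=0.97):
    """First-order high-pass filter: y[n] = x[n] - coeff * x[n-1].

    `coeff` is an assumed, commonly used value; the first sample is
    passed through unchanged because it has no predecessor.
    """
    if not frame:
        return []
    out = [frame[0]]
    for n in range(1, len(frame)):
        out.append(frame[n] - coeff * frame[n - 1])
    return out
```

A constant (low-frequency) input is strongly attenuated after the first sample, which is the intended high-frequency emphasis.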
Specifically, in this embodiment, the FFT unit applies a fast Fourier transform to each pre-emphasized frame of noisy speech to obtain its frequency-domain signal, so that noise reduction can be performed on the noisy speech in the frequency domain.
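As an illustration of the time-to-frequency conversion, the sketch below uses a naive discrete Fourier transform, which produces the same values as an FFT but in O(N²) time, and then forms the power spectrum as the squared magnitude of each frequency point. The function names are assumptions for this example only.

```python
import cmath


def dft(frame):
    """Naive O(N^2) DFT; an FFT computes the same spectrum faster."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def power_spectrum(spectrum):
    """Squared magnitude (real^2 + imag^2) at each frequency point."""
    return [c.real ** 2 + c.imag ** 2 for c in spectrum]
```

In a real system a library FFT would replace `dft`; the power spectrum computation is unchanged.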
(3) Speech enhancement device
In this embodiment, the speech enhancement device is mainly used to estimate, in the frequency domain, the noise in the noisy speech and then to remove that noise from the noisy speech by filtering.
Specifically, the speech enhancement device includes a noise estimation device, a noise update control module, and a filtering module. In addition, because this embodiment performs noise estimation and filter coefficient calculation on the basis of the noisy speech power spectrum, the speech enhancement device also includes a power spectrum calculation module. The speech enhancement device therefore has four main modules: the power spectrum calculation module, the noise estimation device, the noise update control module, and the filtering module. Note that the speech enhancement device need not include the power spectrum calculation module and the filtering module; those of ordinary skill in the art may, as required, place them in other modules of the speech enhancement system or implement them as independent modules.
The power spectrum calculation module is configured to calculate the noisy speech power spectrum from the spectrum of the noisy speech; the initial noise estimation unit is further configured to determine the initial estimated noise power spectrum of the noisy speech from the noisy speech power spectrum.
The noise update control module is configured to calculate the smoothing factor according to the posterior probability that target speech is present.
The noise estimation device is configured to determine the initial estimated noise power spectrum from the noisy speech power spectrum, and to update or correct the initial estimated noise power spectrum according to the smoothing factor output by the noise update control module, obtaining the effective noise power spectrum.
The filtering module is configured to calculate filter coefficients from the effective noise power spectrum and, according to those coefficients, to filter the real part and the imaginary part of the noisy speech spectrum separately, obtaining the enhanced speech spectrum.
As stated above, the noise in the noisy speech is estimated in the frequency domain and then removed from the noisy speech by frequency-domain filtering.
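The filtering step can be illustrated with a simple real-valued gain per frequency point applied to the real and imaginary parts separately. The gain rule below is a spectral-subtraction-style stand-in chosen for brevity; it is not the patent's filter coefficient formula, and `spectral_gain`, `apply_gain`, and `gain_floor` are hypothetical names.

```python
def spectral_gain(noisy_power, noise_power, gain_floor=0.1):
    """Assumed gain rule: G = max(1 - N / |Y|^2, gain_floor) per frequency point."""
    gains = []
    for y2, n in zip(noisy_power, noise_power):
        g = 1.0 - n / y2 if y2 > 0.0 else gain_floor
        gains.append(max(g, gain_floor))  # the floor limits over-suppression
    return gains


def apply_gain(spectrum, gains):
    """Scale the real and imaginary parts of each frequency point by its gain."""
    return [complex(g * c.real, g * c.imag) for c, g in zip(spectrum, gains)]
```

Because the gain is real, scaling the two parts separately leaves the phase of the noisy speech spectrum unchanged; only the noise power estimate determines how much each frequency point is attenuated.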
(3.1) Noise estimation device
Specifically, to bring the effective noise power spectrum closer to the real noise power spectrum, the noise estimation device in this embodiment uses a two-step estimate: it first determines the initial estimated noise power spectrum and then updates it to obtain the effective noise power spectrum.
As shown in Fig. 1, the noise estimation device includes an initial noise estimation unit and a noise update unit. The initial noise estimation unit applies a window to the noisy speech power spectrum, that is, smooths it across frequency points; it then smooths the windowed noisy speech power spectrum across consecutive frames, that is, inter-frame smoothing, to obtain the smoothed noisy speech power spectrum; finally it searches for the minimum of the inter-frame-smoothed noisy speech power spectrum within a certain time window and takes the minimum power spectrum found as the initial estimated noise power spectrum. The noise update unit updates the initial estimated noise power spectrum according to the smoothing factor output by the noise update control module to obtain the effective noise power spectrum.
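The three stages of the initial estimate (frequency smoothing, inter-frame smoothing, minimum search over a time window) can be sketched as below. The three-tap window, the recursive-averaging constant `alpha_s`, and the search length `search_frames` are illustrative assumptions; the patent does not fix these values in this passage.

```python
from collections import deque


def smooth_over_frequency(power, window=(0.25, 0.5, 0.25)):
    """Weighted average over neighbouring frequency points (edges are clamped)."""
    half = len(window) // 2
    n = len(power)
    out = []
    for k in range(n):
        acc = 0.0
        for i, w in enumerate(window):
            j = min(max(k + i - half, 0), n - 1)
            acc += w * power[j]
        out.append(acc)
    return out


class InitialNoiseEstimator:
    """Minimum tracking over the last `search_frames` smoothed frames."""

    def __init__(self, alpha_s=0.8, search_frames=8):
        self.alpha_s = alpha_s  # assumed inter-frame smoothing constant
        self.smoothed = None
        self.history = deque(maxlen=search_frames)

    def update(self, power):
        s_f = smooth_over_frequency(power)
        if self.smoothed is None:
            self.smoothed = s_f  # first frame: no previous value to average with
        else:
            self.smoothed = [
                self.alpha_s * prev + (1.0 - self.alpha_s) * cur
                for prev, cur in zip(self.smoothed, s_f)
            ]
        self.history.append(self.smoothed)
        # Per-frequency-point minimum over the retained frames.
        return [min(column) for column in zip(*self.history)]
```

Keeping the full window of smoothed frames is the simplest way to express the search; practical implementations often use sub-window minima to save memory.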
具体地,除此之外,本领域普通技术人员也可以综合硬件开销,算法简易程度,应用场景和算法性能等角度选择其他不同的方法确定初始估计噪声功率谱,比如分位数、直方图、时间递归平均等。由于初始噪声估计单元仅仅是对带噪语音中的噪声的粗估,因此初始估计噪声功率谱与真实噪声功率谱之间还存在较大偏差。通常情况下,初始估计噪声功率谱与真实噪声功率谱相比偏小。如前所述,考虑到噪声估计的准确度直接影响后续滤波器的性能,以及直接影响语音增强系统的整体性能。因此,本申请增加了噪声更新单元来修正(或又称之更新)初始估计噪声功率谱,从而使得经过修正后得到的有效噪声功率谱尽可能接近真实噪声功率谱,从而可以有效解决滤波后的增强语音中有较大噪声残留问题,提高语音增强系统的整体性能。Specifically, in addition, those of ordinary skill in the art can also choose other different methods to determine the initial estimated noise power spectrum, such as quantiles, histograms, etc., from the perspectives of hardware overhead, algorithm simplicity, application scenarios, and algorithm performance. Time recursive average etc. Since the initial noise estimation unit is only a rough estimate of the noise in the noisy speech, there is still a large deviation between the initial estimated noise power spectrum and the real noise power spectrum. Generally, the initial estimated noise power spectrum is smaller than the real noise power spectrum. As mentioned earlier, considering that the accuracy of noise estimation directly affects the performance of subsequent filters, as well as the overall performance of the speech enhancement system. Therefore, this application adds a noise update unit to correct (or update) the initial estimated noise power spectrum, so that the effective noise power spectrum obtained after the correction is as close as possible to the real noise power spectrum, so that the filtered noise power spectrum can be effectively solved. There is a large residual noise problem in the enhanced speech, which improves the overall performance of the speech enhancement system.
(3.2) Noise update control module
In this embodiment, a smoothing factor is calculated in real time for each frame of noisy speech and is used to update the initial estimated noise power spectrum, yielding the effective noise power spectrum. The effective noise power spectrum is closer to the real noise power spectrum, which solves the problem that the initial estimated noise power spectrum is smaller than the real one. In addition, because the smoothing factor is a function of the posterior probability of target speech existence, its size can be controlled by the posterior probability of target speech existence at each frequency point of the current frame, which effectively prevents the effective noise power spectrum from exceeding the real noise power spectrum. Adding the noise update control module therefore addresses both excessive residual noise caused by an initial estimated noise power spectrum that is too small and speech loss caused by an effective noise power spectrum that is too large during speech enhancement. A specific noise update control module is described below.
As shown in Figure 1, the noise update control module includes a likelihood ratio calculation unit, a calculation unit for the prior probability of target speech existence, a calculation unit for the posterior probability of target speech existence, and a smoothing factor calculation unit. In this embodiment, all four units perform their respective processing in the frequency domain on the basis of the noisy speech power spectrum.
The specific structure of the noise update control module in this embodiment is only an example and is not limiting. A person of ordinary skill in the art may omit some of the modules or add others according to the needs of the application scenario, and the functions of the modules may also be integrated with one another.
Specifically, in this embodiment, the likelihood ratio calculation unit calculates the likelihood ratio from the probability density function of the noisy speech spectrum under the hypothesis that the target speech exists and the probability density function of the noisy speech spectrum under the hypothesis that it does not exist. When doing so, the unit replaces the target speech power spectrum with the estimated enhanced speech power spectrum and the real noise power spectrum with the initial estimated noise power spectrum, and calculates the likelihood ratio from the noisy speech power spectrum, the enhanced speech power spectrum, and the initial estimated noise power spectrum. The exact way the likelihood ratio is calculated depends on the probability density characteristics of the target speech spectrum and the noise spectrum in the particular application scenario; see the description of the method embodiments below for details.
Specifically, the prior probability calculation unit determines, from the noisy speech power spectrum, the effective prior probability of target speech existence, which is used to judge how likely it is that the target speech exists in the noisy speech.
Further, the prior probability calculation unit calculates the prior probability of target speech existence in two steps. In the first step, it makes a preliminary judgment, from the noisy speech power spectrum, as to whether the target speech exists in the noisy speech of the current frame. In the second step, it determines the estimated prior probability of target speech existence from the result of that preliminary judgment, and then determines the effective prior probability of target speech existence from the estimated one.
Further, the prior probability calculation unit operates as follows. If the target speech does not exist, it smooths the noisy speech power spectrum in which the target speech is absent across frequency points to obtain the frequency-smoothed noisy speech power spectrum; if the target speech exists, it uses a historical inter-frame-smoothed noisy speech power spectrum as the frequency-smoothed noisy speech power spectrum. It then smooths the frequency-smoothed noisy speech power spectrum across frames to obtain the inter-frame-smoothed noisy speech power spectrum, and determines the estimated prior probability of target speech existence from it. For the noisy speech of the current frame, the historical inter-frame-smoothed noisy speech power spectrum may simply be the one obtained for the previous frame. This is not a strict limitation, however; the historical spectrum can be chosen flexibly according to the requirements of the application scenario.
Further, the prior probability calculation unit determines a first detection factor for each frequency point of the current frame from the noisy speech power spectrum and the minimum power spectrum of the inter-frame-smoothed noisy speech, and determines a second detection factor for each frequency point of the current frame from the windowed and inter-frame-smoothed noisy speech power spectrum and the same minimum power spectrum. From the first and second detection factors calculated at each frequency point of the current frame, it makes a preliminary judgment as to whether the target speech exists at each frequency point of the noisy speech.
If, at a given frequency point of the current frame of noisy speech, the first detection factor is less than the set first detection factor threshold and the second detection factor is less than the set second detection factor threshold, the target speech is preliminarily judged to be absent at that frequency point; otherwise, the target speech is preliminarily judged to be present at that frequency point.
Further, the prior probability calculation unit uses a defined indicator function to represent the result of judging whether the target speech exists at a given frequency point of the current frame of noisy speech: when the target speech is judged to exist at that frequency point, the indicator function takes the value 0 there; when it is judged not to exist, the indicator function takes the value 1.
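The threshold test and indicator function described above can be sketched as follows. This is only a minimal sketch: the function name `indicator`, the argument names, and the example threshold values are hypothetical, and the detection factors are assumed to have been computed already.

```python
from typing import List


def indicator(first: List[float], second: List[float],
              thr1: float, thr2: float) -> List[int]:
    """Return 1 where target speech is judged absent, 0 where it is judged present."""
    flags = []
    for f1, f2 in zip(first, second):
        # Speech is judged absent only when BOTH factors are below their thresholds.
        absent = f1 < thr1 and f2 < thr2
        flags.append(1 if absent else 0)
    return flags


# Hypothetical detection factors for four frequency points.
flags = indicator([0.5, 3.0, 0.9, 0.2], [0.4, 0.3, 2.5, 0.1], thr1=1.0, thr2=1.0)
print(flags)
```

Only the first and last frequency points satisfy both conditions, so only those two are flagged as speech-absent.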
Further, the prior probability calculation unit smooths the noisy speech of the current frame across frequency points (the first smoothing) according to the indicator function values calculated at each frequency point of the current frame. If the indicator function values at the frequency points of the current frame are not all zero, that is, the target speech is judged not to exist in that frame, the unit smooths the noisy speech power spectrum in which the target speech is absent across frequency points using the indicator function and the window function; it then smooths the result of the first smoothing across frames (the second smoothing) to obtain the twice-smoothed noisy speech power spectrum. If the indicator function values at all frequency points of the current frame are zero, that is, the target speech is judged to exist in that frame, the twice-smoothed noisy speech power spectrum obtained for the previous frame is reused as the twice-smoothed noisy speech power spectrum of the current frame.
Further, the prior probability calculation unit determines a third detection factor at each frequency point of the current frame from the noisy speech power spectrum and the minimum power spectrum of the twice-smoothed noisy speech power spectrum; determines a fourth detection factor at each frequency point of the current frame from the twice-smoothed noisy speech power spectrum and its minimum power spectrum; and determines the estimated prior probability of target speech existence at each frequency point from the third and fourth detection factors at that frequency point.
Further, the prior probability calculation unit compares the third and fourth detection factors calculated at each frequency point of the current frame with their corresponding thresholds and, according to the outcome of the comparison, determines the estimated prior probability of target speech existence at the corresponding frequency point of the current frame.
Further, the prior probability calculation unit compares the estimated prior probability of target speech existence at each frequency point of the current frame with the minimum value of the prior probability of target speech existence, and takes the larger of the two as the effective prior probability of target speech existence at that frequency point. In this way the effective prior probability of target speech existence is obtained at every frequency point of every frame of noisy speech.
Specifically, in this embodiment, the posterior probability calculation unit determines the posterior probability of target speech existence from the likelihood ratio and the effective prior probability of target speech existence.
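As an illustration of the two preceding paragraphs, the sketch below clamps an estimated prior from below by a minimum prior and then combines it with a likelihood ratio using Bayes' rule. The function names, the minimum prior of 0.1, and the sample values are assumptions made for the example; the exact expression used in the application is given in the method embodiments and may differ in form.

```python
from typing import List


def effective_prior(estimated: List[float], prior_min: float) -> List[float]:
    """Take the larger of the estimated prior and the minimum prior at each frequency point."""
    return [max(q, prior_min) for q in estimated]


def posterior(likelihood_ratio: List[float], prior: List[float]) -> List[float]:
    """Bayes' rule per frequency point: q * L / (q * L + (1 - q))."""
    return [q * lr / (q * lr + (1.0 - q))
            for lr, q in zip(likelihood_ratio, prior)]


# Hypothetical values for three frequency points.
q_eff = effective_prior([0.0, 0.5, 0.9], prior_min=0.1)
post = posterior([1.0, 4.0, 0.25], q_eff)
print(q_eff, post)
```

With a likelihood ratio of 1 the posterior equals the prior, which is a quick sanity check on the formula.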
Specifically, in this embodiment, the smoothing factor calculation unit determines, for different noise reduction scenarios, a mapping model between the posterior probability of target speech existence and the smoothing factor; the posterior probability is the input of the mapping model and the smoothing factor is its output. The smoothing factor is thus a function of the posterior probability of target speech existence. For the frequency-domain speech signal of each frame of noisy speech, a posterior probability of target speech existence can be calculated at every frequency point, and the posterior probabilities at different frequency points can map to different smoothing factors. The smoothing factors obtained from the mapping are then used to correct the initial estimated noise power spectrum at each frequency point.
Specifically, a mapping table between the posterior probability of target speech existence and the smoothing factor used for the noise update can be built, so that an implementation can use table lookup to reduce computation and save hardware resources.
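A table lookup of this kind might look like the sketch below. The table size, the table values, and the choice of a larger smoothing factor for a larger posterior are all assumptions made for illustration; the application leaves the mapping model to the noise reduction scenario.

```python
from typing import Sequence

# Hypothetical table: entry i covers posteriors in [i/4, (i+1)/4).
# The values are placeholders, not taken from the application.
TABLE: Sequence[float] = (0.80, 0.90, 0.95, 0.99)


def smoothing_factor(posterior: float, table: Sequence[float] = TABLE) -> float:
    """Quantize the posterior into len(table) uniform bins and look up the factor."""
    p = min(max(posterior, 0.0), 1.0)                   # clamp to [0, 1]
    index = min(int(p * len(table)), len(table) - 1)    # posterior 1.0 uses the last bin
    return table[index]


factors = [smoothing_factor(p) for p in (0.0, 0.3, 0.74, 1.0)]
print(factors)
```

The lookup replaces a per-bin function evaluation with one multiplication, one truncation, and one memory read, which is the saving the text refers to.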
(3.3) Filtering module
In this embodiment, the filtering module calculates filter coefficients from the effective noise power spectrum and filters the noisy speech with those coefficients to obtain the enhanced speech spectrum. Specifically, the filter coefficients can be calculated from the effective noise power spectra for the previous and current frames of noisy speech, the target speech power spectrum (also called the enhanced speech power spectrum) calculated for the previous frame, and the noisy speech power spectrum of the current frame. The real and imaginary parts of the noisy speech spectrum of the current frame are then filtered separately with the filter coefficients to obtain the enhanced speech spectrum.
Further, as shown in Figure 1, the filtering module may include a filter coefficient calculation unit and a filter unit. The filter coefficient calculation unit calculates the filter coefficients from the effective noise power spectra for the previous and current frames of noisy speech, the target speech power spectrum (also called the enhanced speech power spectrum) calculated for the previous frame, and the noisy speech power spectrum of the current frame. The filter unit filters the real and imaginary parts of the noisy speech spectrum of the current frame separately with the filter coefficients to obtain the enhanced speech spectrum. In this embodiment, the filter may be a Wiener filter, an MMSE estimator, or the like.
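One way to combine the four inputs listed above is a Wiener gain driven by a decision-directed a priori SNR estimate. The sketch below uses that approach as an assumption; the application does not state the coefficient formula in this passage, and the names `wiener_gains`, `apply_gains`, and the weighting `beta` are hypothetical.

```python
from typing import List


def wiener_gains(noisy_power: List[float], noise_cur: List[float],
                 noise_prev: List[float], speech_prev: List[float],
                 beta: float = 0.98) -> List[float]:
    """Per-bin Wiener gain from a decision-directed a priori SNR (noise powers must be > 0)."""
    gains = []
    for y2, n_cur, n_prev, s_prev in zip(noisy_power, noise_cur, noise_prev, speech_prev):
        snr_inst = max(y2 / n_cur - 1.0, 0.0)                       # current-frame estimate
        snr_prio = beta * s_prev / n_prev + (1.0 - beta) * snr_inst  # blend with previous frame
        gains.append(snr_prio / (1.0 + snr_prio))
    return gains


def apply_gains(spectrum: List[complex], gains: List[float]) -> List[complex]:
    """Scale the real and imaginary parts separately by the same real gain."""
    return [complex(g * y.real, g * y.imag) for y, g in zip(spectrum, gains)]


# Hypothetical two-bin frame.
spectrum = [3 + 4j, 1 + 0j]
gains = wiener_gains([25.0, 1.0], [5.0, 1.0], [5.0, 1.0], [20.0, 0.0], beta=0.5)
enhanced = apply_gains(spectrum, gains)
print(gains, enhanced)
```

Because the gain is real, filtering the real and imaginary parts separately leaves the phase of the noisy spectrum unchanged.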
(4) Restoration module
In this embodiment, the restoration module mainly restores the noise-reduced enhanced speech from the frequency domain to the time domain and removes the effects of certain operations of the preprocessing module.
Specifically, the restoration module includes an inverse fast Fourier transform (IFFT) unit, a de-emphasis unit, and a de-windowing unit. The IFFT unit performs an IFFT on the enhanced speech spectrum, restoring the enhanced speech from the frequency domain to the time domain to obtain its time-domain waveform. The de-emphasis unit removes the effect of the high-pass filter used in pre-emphasis and is mainly implemented with a low-pass filter. The de-windowing unit removes the effect of the earlier windowing: it must undo the frame overlap, restoring the enhanced time-domain speech to the original time-domain sequence, and also remove the effect of the windowing operation on the amplitude. For this reason, the windowing and de-windowing units in this embodiment are preferably designed together.
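The relationship between pre-emphasis and de-emphasis can be illustrated with a first-order filter pair. This is a sketch under assumptions: the first-order form and the coefficient 0.97 are common choices, not values stated in the application.

```python
from typing import List


def pre_emphasis(x: List[float], a: float = 0.97) -> List[float]:
    """First-order high-pass: p[n] = x[n] - a * x[n-1], with x[-1] taken as 0."""
    out, prev = [], 0.0
    for sample in x:
        out.append(sample - a * prev)
        prev = sample
    return out


def de_emphasis(p: List[float], a: float = 0.97) -> List[float]:
    """First-order low-pass that inverts pre_emphasis: y[n] = p[n] + a * y[n-1]."""
    out, prev = [], 0.0
    for sample in p:
        prev = sample + a * prev
        out.append(prev)
    return out


signal = [0.0, 1.0, 0.5, -0.25, 0.75, -1.0]
restored = de_emphasis(pre_emphasis(signal))
print(restored)
```

Applying the two filters in sequence returns the original samples up to floating-point rounding, which is the sense in which de-emphasis removes the effect of pre-emphasis.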
(5) Output module
In this embodiment, the output module performs decoding, transmission, and related operations on the time-domain binary code groups supplied by the restoration module, and the result is then played through a loudspeaker.
It should be noted that the embodiment of Figure 1 above is an exemplary explanation, from a system perspective, of how the noise estimation apparatus of the embodiments of this application is applied, and is not the only possible arrangement. Likewise, the further or specific technical implementations in the embodiment shown in Figure 1 are only examples chosen according to the needs of the application scenario, and are not uniquely limiting.
Figure 2 is a schematic flowchart of the speech enhancement method in Embodiment 2 of this application, which corresponds to the structure of the speech enhancement system of Figure 1. Specifically, this embodiment includes the following steps:
S201. The collection module collects noisy speech.
In this embodiment, the collected noisy speech is expressed by the following formula (1).
y(n) = x(n) + n(n)     (1)
Here y(n) is the collected noisy speech, x(n) is the target speech, and n(n) is the noise; the n in parentheses is the sampling-time index.
S202. The preprocessing module preprocesses the noisy speech to transform it into the frequency domain.
In this embodiment, step S202 specifically includes steps S212 to S232:
S212. The windowing unit windows and frames the noisy speech with a window function;
S222. The pre-emphasis unit applies pre-emphasis to each frame of noisy speech after windowing and framing;
S232. The FFT unit applies a fast Fourier transform to each pre-emphasized frame of noisy speech to transform it into the frequency domain.
After steps S212 to S232, the frequency-domain signal of the λ-th frame of noisy speech is obtained, as shown in formula (2):
Figure PCTCN2019096503-appb-000001
Here Y(λ,k) is the spectrum of the λ-th frame of noisy speech, X(λ,k) is the spectrum of the λ-th frame of target speech, N(λ,k) is the spectrum of the λ-th frame of noise, and k indexes the frequency points of the frequency-domain signal, 0 ≤ k ≤ N-1. w(l-m) is the window function used in the windowing operation, where m is a parameter representing the position of the window, l is a parameter representing the window length, and N is the number of FFT points. In one specific application scenario, the window function satisfies the following property:
w^2(M) + w^2(M+L) = 1    (3)
Here L is the specific length of each frame of noisy speech that takes part in the windowing operation, that is, the specific window length, and M is the specific position of the window; in other words, l = L and m = M in the formula above.
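A window with a property of the kind stated in formula (3) can be checked numerically. The sketch below uses a sine window of length 2H with a frame shift of H samples, so that the shift H plays the role of the offset L in formula (3); this particular window and the names `sine_window` and `analysis_synthesis` are assumptions for illustration, not the window used in the application.

```python
import math
from typing import List


def sine_window(hop: int) -> List[float]:
    """Window of length 2*hop whose squared values at n and n+hop sum to 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * hop)) for n in range(2 * hop)]


def analysis_synthesis(x: List[float], hop: int) -> List[float]:
    """Apply the window twice per frame (analysis, then synthesis) and overlap-add."""
    w = sine_window(hop)
    out = [0.0] * len(x)
    for start in range(0, len(x) - 2 * hop + 1, hop):
        for n in range(2 * hop):
            out[start + n] += x[start + n] * w[n] * w[n]
    return out


hop = 4
w = sine_window(hop)
sums = [w[n] ** 2 + w[n + hop] ** 2 for n in range(hop)]
signal = [float(i) for i in range(16)]
rebuilt = analysis_synthesis(signal, hop)
print(sums, rebuilt[4:12])
```

Samples covered by two overlapping frames are recovered with their original amplitude, which shows why the windowing and de-windowing steps are best designed as a pair.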
S203. The power spectrum calculation module calculates the noisy speech power spectrum.
In this embodiment, the noisy speech power spectrum |Y(λ,k)|^2 of the λ-th frame can be obtained by squaring the real and imaginary parts of that frame's noisy speech spectrum Y(λ,k) and adding the squares. In some application scenarios, however, calculating and storing |Y(λ,k)|^2 would take up considerable hardware resources, so the noisy speech magnitude |Y(λ,k)|, that is, the square root of the noisy speech power spectrum, can be used in its place.
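The two quantities described above can be computed as in the following sketch; the function names are chosen for the example only.

```python
import math
from typing import List


def power_spectrum(spectrum: List[complex]) -> List[float]:
    """|Y|^2 per frequency point: square the real and imaginary parts and add them."""
    return [y.real * y.real + y.imag * y.imag for y in spectrum]


def magnitude_spectrum(power: List[float]) -> List[float]:
    """|Y| per frequency point: square root of the power spectrum."""
    return [math.sqrt(p) for p in power]


frame = [3 + 4j, 0 - 2j]
power = power_spectrum(frame)
magnitude = magnitude_spectrum(power)
print(power, magnitude)
```

The magnitude has a smaller dynamic range than the power, which is the practical reason it can be cheaper to store.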
S204a. The initial noise estimation unit determines the initial estimated noise power spectrum from the noisy speech power spectrum.
In this embodiment, determining the initial estimated noise power spectrum in step S204a specifically includes the following steps:
S214a. Window the noisy speech power spectrum, that is, smooth the noisy speech power spectrum across frequency points:
P_w(λ,k) = cov(|Y(λ,k)|^2, hamming(n))     (4)
Here hamming(n) is a normalized Hamming window, cov denotes the convolution operation, P_w(λ,k) is the windowed noisy speech power spectrum of the λ-th frame, n is a parameter representing the window length, and k indexes the frequency points.
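Formula (4) can be sketched as a convolution of the power spectrum with a Hamming window whose coefficients sum to 1. The window length of 3 and the treatment of frequency points beyond the spectrum edges as zero are assumptions made for this example.

```python
import math
from typing import List


def normalized_hamming(length: int) -> List[float]:
    """Hamming window of odd length >= 3, scaled so its coefficients sum to 1."""
    w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]
    total = sum(w)
    return [v / total for v in w]


def smooth_across_frequency(power: List[float], length: int = 3) -> List[float]:
    """Convolve the power spectrum with the window, keeping the original length."""
    w = normalized_hamming(length)
    half = length // 2
    out = []
    for k in range(len(power)):
        acc = 0.0
        for i, wi in enumerate(w):
            j = k + i - half
            if 0 <= j < len(power):   # points outside the spectrum count as zero
                acc += wi * power[j]
        out.append(acc)
    return out


flat = smooth_across_frequency([1.0] * 5)
spike = smooth_across_frequency([0.0, 0.0, 1.0, 0.0, 0.0])
print(flat, spike)
```

Normalizing the window keeps the level of a flat spectrum unchanged away from the edges, while an isolated peak is spread over its neighbours.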
S224a. Smooth the windowed noisy speech power spectrum across frames:
P(λ,k) = α_1 P(λ-1,k) + (1-α_1) P_w(λ,k)    (5)
Here α_1 is the smoothing factor; P(λ-1,k) is the result of smoothing the windowed noisy speech power spectrum P_w(λ-1,k) of frame λ-1 across frames; P_w(λ,k) is the windowed noisy speech power spectrum of the λ-th frame; and P(λ,k) is the result of smoothing P_w(λ,k) across frames, that is, the smoothed noisy speech power spectrum.
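Formula (5) is a first-order recursive average applied independently at each frequency point. A minimal sketch follows; the function name and the value 0.5 for α_1 are chosen only for the example.

```python
from typing import List


def smooth_across_frames(previous: List[float], windowed: List[float],
                         alpha: float) -> List[float]:
    """P(λ,k) = alpha * P(λ-1,k) + (1 - alpha) * P_w(λ,k) for every frequency point k."""
    return [alpha * p + (1.0 - alpha) * pw for p, pw in zip(previous, windowed)]


p_prev = [1.0, 2.0]   # P(λ-1,k)
p_w = [3.0, 2.0]      # P_w(λ,k)
p_cur = smooth_across_frames(p_prev, p_w, alpha=0.5)
print(p_cur)
```

A value of α_1 closer to 1 gives more weight to the history and therefore a smoother but slower-reacting estimate.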
S234a. Search the windowed and inter-frame-smoothed noisy speech power spectrum (the smoothed noisy speech power spectrum) for the minimum power spectrum within a given time window.
In this embodiment, the minimum power spectrum found is taken as the initial estimated noise power spectrum.
if mod(λ, D) == 0
    P_min(λ,k) = min{P_temp(λ-1,k), P(λ,k)}     (6)
    P_temp(λ,k) = P(λ,k)                         (7)
else
    P_min(λ,k) = min{P_min(λ-1,k), P(λ,k)}      (8)
    P_temp(λ,k) = min{P_temp(λ-1,k), P(λ,k)}    (9)
end
Here D is the length of the minimum power spectrum search window. If D is too small, the noise power spectrum fluctuates strongly; if D is too large, the initial estimated noise lags the real noise by a long time. D is therefore chosen as a compromise in a given application.
As formulas (6) to (9) show, the remainder of the index λ of the noisy speech frame currently being processed divided by the search window length D is computed and tested against 0. If it is 0, the smoothed noisy speech power spectrum P(λ,k) of the λ-th frame is saved to the temporary array P_temp(λ,k), and the minimum, at each frequency point k, of the data held in the temporary array P_temp(λ-1,k) of frame λ-1 and P(λ,k) is taken as the minimum power spectrum P_min(λ,k) of the λ-th frame. If it is not 0, the minimum at each frequency point k of P_temp(λ-1,k) and P(λ,k) is stored in the temporary array P_temp(λ,k) of the current frame, and the minimum at each frequency point k of the minimum power spectrum P_min(λ-1,k) of frame λ-1 and the smoothed noisy speech power spectrum P(λ,k) of the current frame is taken as the minimum power spectrum P_min(λ,k) of the current frame.
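The update in formulas (6) to (9) can be written as a small function and traced over a few frames. In this sketch the function name, the initial state of infinity for both arrays, and the sample power values are assumptions made for the example.

```python
import math
from typing import List, Tuple


def track_minimum(frame_index: int, window_len: int, p: List[float],
                  p_min_prev: List[float], p_temp_prev: List[float]
                  ) -> Tuple[List[float], List[float]]:
    """One step of the minimum search; returns (P_min, P_temp) for the current frame."""
    if frame_index % window_len == 0:
        p_min = [min(t, v) for t, v in zip(p_temp_prev, p)]    # formula (6)
        p_temp = list(p)                                        # formula (7)
    else:
        p_min = [min(m, v) for m, v in zip(p_min_prev, p)]     # formula (8)
        p_temp = [min(t, v) for t, v in zip(p_temp_prev, p)]   # formula (9)
    return p_min, p_temp


D = 3
p_min, p_temp = [math.inf], [math.inf]   # assumed initial state, one frequency point
history = []
for lam, power in enumerate([5.0, 3.0, 4.0, 6.0, 7.0, 8.0], start=1):
    p_min, p_temp = track_minimum(lam, D, [power], p_min, p_temp)
    history.append(p_min[0])
print(history)
```

In this trace the old minimum of 3.0 is dropped at frame 6, once it is more than one restart of the temporary array in the past, which lets the estimate follow a rising noise floor.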
Figure PCTCN2019096503-appb-000002
Referring to formula (10) above, the minimum power spectrum of the smoothed noisy speech output for each frame after the comparison is taken as the initial estimated noise power spectrum of the current frame. P_min(λ,k) is the minimum power spectrum of the smoothed noisy speech output for the λ-th frame, and
Figure PCTCN2019096503-appb-000003
denotes the initial estimated noise power spectrum.
S204b. Determine the likelihood ratio from the probability density distribution of the noisy speech spectrum under the hypothesis that the target speech exists and the probability density distribution of the noisy speech spectrum under the hypothesis that it does not exist.
In statistical theory, the likelihood ratio changes with the distribution characteristics of the spectral probability density functions of the target speech and the noise. Assuming below that the spectra of both the target speech and the noise follow Gaussian distributions, then
Figure PCTCN2019096503-appb-000004
Figure PCTCN2019096503-appb-000005
Figure PCTCN2019096503-appb-000006
Figure PCTCN2019096503-appb-000007
Specifically, in an engineering implementation, when the likelihood ratio is determined from the probability density distribution of the noisy speech spectrum under the hypothesis that the target speech exists and that under the hypothesis that it does not exist, the estimated target speech power spectrum of the current frame (for example, the λ-th frame)
Figure PCTCN2019096503-appb-000008
(also called the enhanced speech power spectrum) is used in place of the real target speech power spectrum |X(λ,k)|^2 of the current frame. Specifically, the enhanced speech power spectrum obtained by filtering the noisy speech power spectrum of the current frame (for example, the λ-th frame) with the filter coefficients obtained for the previous frame (for example, frame λ-1) can be used as the estimated target speech power spectrum
Figure PCTCN2019096503-appb-000009
Figure PCTCN2019096503-appb-000010
denotes the real noise power spectrum, which can be replaced by the initial estimated noise power spectrum
Figure PCTCN2019096503-appb-000011
calculated according to formula (10) above, giving the likelihood ratio used in the engineering implementation,
Figure PCTCN2019096503-appb-000012
which is calculated as follows:
Figure PCTCN2019096503-appb-000013
Specifically, formula (15) above can be simplified to obtain likelihood ratio formulas in different reduced forms, so as to save hardware resources.
In addition, in formulas (11) and (12) above, H_0 denotes the hypothesis that the target speech is absent and H_1 the hypothesis that it is present. Accordingly, p(Y(λ,k)|H_0) is the probability density function of the noisy speech spectrum of the λ-th frame when the target speech is absent, and p(Y(λ,k)|H_1) is the probability density function of the noisy speech spectrum of the λ-th frame when the target speech is present. Referring again to formula (13), the likelihood ratio at the k-th frequency point is the ratio of p(Y(λ,k)|H_1) to p(Y(λ,k)|H_0) at that frequency point. Therefore, once the specific forms of formulas (11) and (12) are fixed and substituted into formula (13), the likelihood ratio Δ_k at each frequency point is obtained. Formula (14) is the specific expression of the likelihood ratio obtained after choosing one such form for formulas (11) and (12), and formula (15) is the specific expression of the likelihood ratio used in the engineering implementation.
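As a hedged illustration of the likelihood ratio discussed above, the following Python sketch assumes that formulas (11) and (12) are zero-mean complex Gaussian densities, which this section does not confirm. Under that assumption the ratio reduces to exp(γξ/(1+ξ))/(1+ξ), where ξ is the estimated target speech power divided by the estimated noise power and γ is the noisy speech power divided by the estimated noise power. The function name, the `floor` guard and the `max_exponent` clamp are hypothetical additions for illustration only.

```python
import math


def likelihood_ratio(noisy_power, speech_power_est, noise_power_est,
                     floor=1e-12, max_exponent=50.0):
    """Per-frequency-point ratio p(Y|H_1) / p(Y|H_0) under an assumed
    complex Gaussian model (an assumption, not the patent's formula (15))."""
    ratios = []
    for y2, x2, n2 in zip(noisy_power, speech_power_est, noise_power_est):
        n2 = max(n2, floor)      # guard against division by zero
        xi = x2 / n2             # estimated speech power over noise power
        gamma = y2 / n2          # noisy speech power over noise power
        # Clamp the exponent so math.exp cannot overflow on loud frames.
        exponent = min(gamma * xi / (1.0 + xi), max_exponent)
        ratios.append(math.exp(exponent) / (1.0 + xi))
    return ratios
```

When the estimated target speech power is zero, the ratio is exactly 1, meaning the observation favours neither hypothesis.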
S204c. Determine the effective prior probability of target speech presence according to the noisy speech power spectrum.
In this embodiment, the effective prior probability of target speech presence is determined in step S204c in two steps. First, a preliminary judgment is made, from the noisy speech power spectrum, as to whether the target speech is present in the noisy speech of the current frame. Second, the estimated prior probability of target speech presence is determined from the result of that preliminary judgment, and the effective prior probability of target speech presence is then determined from the estimated prior probability.
Further, in step S204c, determining the estimated prior probability of target speech presence according to the noisy speech power spectrum includes: performing smoothing across frequency points and smoothing across frames on the noisy speech power spectrum in which the target speech is absent; and determining the estimated prior probability of target speech presence from the twice-smoothed noisy speech power spectrum.
Further, in step S204c, the preliminary judgment as to whether the target speech is present in the noisy speech is made as follows. The first detection factor at each frequency point of the current frame is determined from the noisy speech power spectrum, i.e., |Y(λ,k)|^2 above, and the minimum power spectrum of the windowed and inter-frame-smoothed noisy speech, i.e., P_min(λ,k) above. The second detection factor at each frequency point of the current frame is determined from the windowed and inter-frame-smoothed noisy speech power spectrum, i.e., P(λ,k) above, and the same minimum power spectrum P_min(λ,k). Whether the target speech is present in the noisy speech at each frequency point of the current frame is then preliminarily judged from the first and second detection factors at that frequency point.
Specifically, if the first detection factor calculated at a frequency point of the current frame of noisy speech is smaller than the set first detection factor threshold, and the second detection factor at that frequency point is smaller than the set second detection factor threshold, it is preliminarily determined that the target speech is absent from the noisy speech at that frequency point; otherwise, it is preliminarily determined that the target speech is present in the noisy speech at that frequency point.
In a specific application scenario, whether the target speech is present in the noisy speech is preliminarily judged by formulas (16)-(18) below.
Figure PCTCN2019096503-appb-000014
where γ_0 and ζ_0 are thresholds, and
Figure PCTCN2019096503-appb-000015
Figure PCTCN2019096503-appb-000016
where B_min = 1.66 is the estimation bias factor, P_min is the minimum power spectrum of the smoothed noisy speech power spectrum output by formula (6) or formula (8), and P(λ,k) is the smoothed noisy speech power spectrum calculated by formula (5). B_min compensates or corrects P_min; for example, when P_min is biased low, correcting it with B_min makes it more accurate.
Referring to formula (17) above, the first detection factor γ_min(λ,k) is determined from the noisy speech power spectrum |Y(λ,k)|^2 of the λ-th frame and the minimum power spectrum P_min calculated according to formula (6) or (8) above; γ_min(λ,k) is used to detect whether the target speech is present in the frequency-domain signal at each frequency point of the λ-th frame of noisy speech.
Referring to formula (18) above, the second detection factor ζ(λ,k) is determined from the smoothed noisy speech power spectrum P(λ,k) of the λ-th frame and the minimum power spectrum P_min calculated according to formula (6) or (8) above; ζ(λ,k) is likewise used to detect whether the target speech is present in the frequency-domain signal at each frequency point of the λ-th frame of noisy speech.
When there is no target speech, that is, when in all likelihood only noise is present, the noise is relatively stationary, so the first and second detection factors calculated by formulas (17) and (18) are relatively small. For this reason, the first detection factor γ_min(λ,k) and the second detection factor ζ(λ,k) obtained from formulas (17) and (18) are compared with their respective thresholds γ_0 and ζ_0. If, at a frequency point of the current frame, γ_min(λ,k) is smaller than γ_0 and ζ(λ,k) is smaller than ζ_0, it is preliminarily determined that the noisy speech contains only noise and no target speech at that frequency point; in all other cases, it is preliminarily determined that the noisy speech at that frequency point contains both noise and the target speech.
The result of judging whether the target speech is present at each frequency point of the current frame is represented by 0 and 1, where 0 indicates that the target speech is judged present at the current frequency point of the current frame and 1 indicates that it is judged absent. I(λ,k) is defined as the indicator function, and the result of the judgment is stored at the corresponding frequency point of I(λ,k). That is, when the noisy speech contains no target speech at a frequency point, the indicator function takes the value 1 at that frequency point; otherwise it takes the value 0.
In formula (16) above, the thresholds γ_0 and ζ_0 can be set flexibly according to the application scenario.
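The preliminary judgment described above can be sketched in Python as follows. This is a minimal sketch that assumes formulas (17) and (18) divide |Y(λ,k)|^2 and P(λ,k), respectively, by B_min·P_min(λ,k); the text only states which quantities each factor depends on, so the exact form is an assumption. The function name, the `floor` guard and the threshold values used in the usage note are hypothetical.

```python
B_MIN = 1.66  # estimation bias factor given in the text


def preliminary_speech_absence(noisy_power, smoothed_power, min_power,
                               gamma0, zeta0, floor=1e-12):
    """Return the indicator I(lambda, k) for one frame: 1 where the target
    speech is judged absent (noise only), 0 where it is judged present."""
    indicator = []
    for y2, p, p_min in zip(noisy_power, smoothed_power, min_power):
        denom = max(B_MIN * p_min, floor)   # bias-corrected minimum power
        gamma_min = y2 / denom              # first detection factor (assumed form)
        zeta = p / denom                    # second detection factor (assumed form)
        noise_only = gamma_min < gamma0 and zeta < zeta0
        indicator.append(1 if noise_only else 0)
    return indicator
```

For example, with the illustrative thresholds `gamma0=4.6` and `zeta0=1.67`, a frequency point whose power is close to the tracked minimum is flagged 1 (noise only), while a point whose power is far above the minimum is flagged 0.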
Specifically, when the effective prior probability of target speech presence is determined in step S204c, the indicator function is used to judge whether the target speech is present in the noisy speech of the current frame. If the target speech is absent from the current frame, i.e., the values of the indicator function over the frequency points are not all zero, the noisy speech power spectrum is smoothed across frequency points using the indicator function and the window function. If the target speech is present in the noisy speech of the current frame, i.e., the indicator function is zero at every frequency point, the twice-smoothed noisy speech power spectrum obtained for the previous frame is used as the frequency-smoothed noisy speech power spectrum of the current frame. Inter-frame smoothing is then applied to the frequency-smoothed noisy speech power spectrum of the current frame to obtain the inter-frame-smoothed noisy speech power spectrum. Because it has passed through both smoothing across frequency points and smoothing across frames, this final spectrum is also called the twice-smoothed noisy speech power spectrum.
Further, the third detection factor at each frequency point of the current frame is determined from the minimum power spectrum of the twice-smoothed noisy speech power spectrum and the noisy speech power spectrum; the fourth detection factor at each frequency point of the current frame is determined from the twice-smoothed noisy speech power spectrum and its minimum power spectrum; and the estimated prior probability of target speech presence at each frequency point of the current frame is determined from the third and fourth detection factors at that frequency point.
The minimum power spectrum of the twice-smoothed noisy speech power spectrum can be obtained with reference to formulas (6)-(9) above.
Further, smoothing across frequency points and smoothing across frames are applied to the noisy speech power spectrum |Y(λ,k)|^2 of the λ-th frame according to formulas (19) and (20) below, to obtain the twice-smoothed noisy speech power spectrum.
Figure PCTCN2019096503-appb-000017
Figure PCTCN2019096503-appb-000018
where α_2 is the smoothing factor;
Figure PCTCN2019096503-appb-000019
denotes the power spectrum obtained by smoothing the noisy speech power spectrum |Y(λ,k)|^2 of the λ-th frame (the current frame) across frequency points using the indicator function I(λ,k) and the window function w(L); Lw denotes the window length and i the position within the window;
Figure PCTCN2019096503-appb-000020
denotes the twice-smoothed noisy speech power spectrum of the λ-th frame, obtained by applying inter-frame smoothing to the frequency-smoothed power spectrum
Figure PCTCN2019096503-appb-000021
of that frame;
Figure PCTCN2019096503-appb-000022
denotes the twice-smoothed noisy speech power spectrum of frame λ-1.
Combining formulas (16)-(18) with formula (19) above, it can be seen that formula (19) amounts to smoothing the noisy speech power spectrum across frequency points according to the result of the preliminary judgment. Formula (20) in turn amounts to applying inter-frame smoothing to the frequency-smoothed noisy speech power spectrum.
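The two smoothing stages can be sketched as below. The sketch assumes that formula (19) is an indicator-weighted, window-weighted average over neighbouring frequency points and that formula (20) is a first-order recursive average with factor α_2; both forms are assumptions inferred from the description. The function names, the odd-length centred window, and the per-point fallback to the previous frame when no noise-only point lies inside the window are hypothetical choices made here to keep the division well defined.

```python
def smooth_across_frequency(noisy_power, indicator, window, prev_twice_smoothed):
    """Average |Y|^2 over neighbouring frequency points, counting only
    points whose indicator is 1 (judged noise only)."""
    if not any(indicator):
        # Indicator is zero everywhere: reuse the previous frame's result.
        return list(prev_twice_smoothed)
    half = len(window) // 2          # window is assumed odd and centred
    n = len(noisy_power)
    smoothed = []
    for k in range(n):
        num = 0.0
        den = 0.0
        for i, w in enumerate(window):
            j = k + i - half
            if 0 <= j < n:
                num += w * indicator[j] * noisy_power[j]
                den += w * indicator[j]
        # Fallback (an assumption): no noise-only point inside the window.
        smoothed.append(num / den if den > 0.0 else prev_twice_smoothed[k])
    return smoothed


def smooth_across_frames(freq_smoothed, prev_twice_smoothed, alpha2):
    """First-order recursive smoothing between frames (assumed form of (20))."""
    return [alpha2 * prev + (1.0 - alpha2) * cur
            for cur, prev in zip(freq_smoothed, prev_twice_smoothed)]
```

Calling `smooth_across_frequency` and then `smooth_across_frames` on each frame yields the twice-smoothed spectrum, which is also the `prev_twice_smoothed` input for the next frame.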
After
Figure PCTCN2019096503-appb-000023
is obtained, the minimum power spectrum
Figure PCTCN2019096503-appb-000024
of the twice-smoothed noisy speech power spectrum is determined with reference to formulas (6)-(8) above, and the estimated prior probability q_s(λ,k) of target speech presence and the effective prior probability q(λ,k) of target speech presence are then determined according to formulas (21)-(24) below.
Figure PCTCN2019096503-appb-000025
q(λ,k) = max(q_s(λ,k), q_min)        (22)
where γ_1 and ζ_0 are thresholds and q_min is the minimum value of the prior probability of target speech presence. Once the application scenario is fixed, q_min is essentially constant; that is, q_min can be set according to the application scenario. q(λ,k) is the effective prior probability of target speech presence.
Figure PCTCN2019096503-appb-000026
Figure PCTCN2019096503-appb-000027
Referring to formulas (23) and (24) above, for the k-th frequency point of the λ-th frame of noisy speech, the estimated prior probability of target speech presence is determined in a manner similar to the first detection factor above. This includes determining the third detection factor
Figure PCTCN2019096503-appb-000030
from the minimum power spectrum
Figure PCTCN2019096503-appb-000029
of the twice-smoothed noisy speech power spectrum
Figure PCTCN2019096503-appb-000028
and from the noisy speech power spectrum |Y(λ,k)|^2; determining the fourth detection factor
Figure PCTCN2019096503-appb-000033
from the twice-smoothed noisy speech power spectrum
Figure PCTCN2019096503-appb-000031
and its minimum power spectrum
Figure PCTCN2019096503-appb-000032
and determining, from the third and fourth detection factors, the estimated prior probability of target speech presence at each frequency point of the current frame.
Further, referring to formulas (21)-(22) above, the third and fourth detection factors calculated at each frequency point of the current frame of noisy speech are compared with their corresponding thresholds, and the estimated prior probability of target speech presence at each frequency point of the current frame is determined from the result of the comparison.
Referring to formulas (21)-(22) above, for the λ-th frame of noisy speech, if the third detection factor
Figure PCTCN2019096503-appb-000034
is less than or equal to 1 at a frequency point, and the fourth detection factor
Figure PCTCN2019096503-appb-000035
is smaller than the corresponding threshold ζ_0 at that frequency point, the estimated prior probability q_s(λ,k) of target speech presence at that frequency point is set to 0. If the third detection factor
Figure PCTCN2019096503-appb-000036
is greater than 1 but smaller than the threshold γ_1 at a frequency point, and the fourth detection factor
Figure PCTCN2019096503-appb-000037
is smaller than the corresponding threshold ζ_0 at that frequency point, the estimated prior probability q_s(λ,k) of target speech presence at that frequency point is calculated according to formula (21) above, specifically from the third detection factor
Figure PCTCN2019096503-appb-000038
and the corresponding threshold. In all cases other than these two, the estimated prior probability q_s(λ,k) of target speech presence is 1.
Further, referring to formula (22) above, the larger of the estimated prior probability q_s(λ,k) of target speech presence at each frequency point and the minimum value q_min of the prior probability of target speech presence is taken as the effective prior probability q(λ,k) of target speech presence at that frequency point.
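The three-way decision and the floor of formula (22) can be sketched as follows. The two boundary cases (0 and 1) and the `max` with q_min follow the text directly. The middle case is an assumption: the text only says it is calculated from the third detection factor and the threshold γ_1, and this sketch uses a linear ramp from 0 at a factor of 1 to 1 at a factor of γ_1. The function names are hypothetical.

```python
def estimated_speech_prior(third_factor, fourth_factor, gamma1, zeta0):
    """Estimated prior probability q_s of target speech presence at one
    frequency point, following the three cases described for formula (21)."""
    if fourth_factor < zeta0:
        if third_factor <= 1.0:
            return 0.0                       # judged noise only
        if third_factor < gamma1:
            # Assumed linear interpolation between the two boundary cases.
            return (third_factor - 1.0) / (gamma1 - 1.0)
    return 1.0                               # all remaining cases


def effective_speech_prior(q_s, q_min):
    """Formula (22): floor the estimated prior at q_min."""
    return max(q_s, q_min)
```

Flooring at q_min keeps the prior strictly positive, so the posterior computed in the next step can still respond to a large likelihood ratio.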
It should be noted here that the above embodiment provides, by way of example, one way of calculating the effective prior probability q(λ,k) of target speech presence; other methods of obtaining q(λ,k) may also be chosen according to different requirements.
S204d. Determine the posterior probability of target speech presence at each frequency point of the current frame according to the likelihood ratio and the effective prior probability of target speech presence.
According to Bayesian theory, the posterior probability of target speech presence is calculated by formula (25) below:
Figure PCTCN2019096503-appb-000039
Simplifying formula (25) gives:
Figure PCTCN2019096503-appb-000040
In formula (13) above,
Figure PCTCN2019096503-appb-000041
is the likelihood ratio of the λ-th frame of noisy speech at the different frequency points, and q(λ,k) is the effective prior probability of target speech presence described above. Since the likelihood ratio and the effective prior probability of target speech presence are already known from the formulas above, substituting them into formula (26) yields the posterior probability p(H_1|Y(λ,k)) of target speech presence at each frequency point of the λ-th frame of noisy speech.
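As a worked illustration of the Bayes step, the sketch below applies Bayes' rule with q as the prior probability of H_1 and 1 - q as the prior probability of H_0. Dividing numerator and denominator by p(Y|H_0) gives qΔ / (qΔ + 1 - q), where Δ is the likelihood ratio. It is assumed, not confirmed by this section, that formula (26) has this form. The function name and the zero-denominator guard are hypothetical.

```python
def speech_presence_posterior(likelihood_ratio, prior):
    """Posterior p(H_1 | Y) from the likelihood ratio p(Y|H_1)/p(Y|H_0)
    and the prior probability of target speech presence."""
    weighted = prior * likelihood_ratio
    denom = weighted + (1.0 - prior)
    if denom <= 0.0:
        # Degenerate input (prior of 1 with a zero ratio); treated as absent.
        return 0.0
    return weighted / denom
```

With a likelihood ratio of 1 the posterior equals the prior, and a ratio above 1 raises it toward 1.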
S204e. The smoothing factor calculation unit calculates the smoothing factor according to the posterior probability of target speech presence;
A mapping model between the posterior probability of target speech presence and the smoothing factor is determined according to the noise reduction scenario. Correspondingly, calculating the smoothing factor from the posterior probability of target speech presence includes: taking the posterior probability of target speech presence as the input of the mapping model, the output of the mapping model being the smoothing factor.
Specifically, the smoothing factor at each frequency point of the λ-th frame of noisy speech is calculated with reference to formula (27) below.
α(k) = f(p(H_1|Y(λ,k)))     (27)
As can be seen from formula (25) above, for the frequency-domain signal of the λ-th frame of noisy speech at the k-th frequency point, the corresponding smoothing factor α(k) can be calculated from the posterior probability of target speech presence at that frequency point. The functional relationship between p(H_1|Y(λ,k)) and α(k) may be linear, exponential, logarithmic, and so on; which mapping model is used depends on the noise characteristics of the application environment.
Fig. 3 and Fig. 4 show, by way of example, two schematic mapping curves between the posterior probability of target speech presence and the smoothing factor. The abscissa is the posterior probability of target speech presence, and the ordinate is the smoothing factor. The curves show how α(k) varies with p(H_1|Y(λ,k)) under different parameter configurations.
α(k) = min{β + (1-β)*P(k), 0.96}     (28)
Figure PCTCN2019096503-appb-000042
where β, γ, μ and ε are all configurable parameters; different parameter configurations produce different functional relationships and curves between p(H_1|Y(λ,k)) and α(k). As formulas (28)-(29) show, the relationship between the posterior probability of target speech presence and the smoothing factor includes a nonlinear relationship.
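Formula (28) can be sketched directly. Because formula (29), which defines P(k) in terms of γ, μ and ε, is not recoverable from this text, the sketch takes P(k) as an input rather than guessing its form. The function name and the `cap` parameter name are hypothetical; the value 0.96 is the one given in formula (28).

```python
def smoothing_factor(p_mapped, beta, cap=0.96):
    """Formula (28): alpha(k) = min(beta + (1 - beta) * P(k), 0.96).

    p_mapped is P(k) from formula (29), which is not reproduced here.
    """
    return min(beta + (1.0 - beta) * p_mapped, cap)
```

For example, with `beta=0.8` the factor rises from 0.8 when P(k) is 0 to the cap of 0.96 when P(k) is 1, so the noise estimate is updated more slowly as speech becomes more likely.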
S205. The noise update unit updates the initial estimated noise power spectrum according to the smoothing factor and the noisy speech power spectrum to obtain the effective noise power spectrum;
In this embodiment, when step S205 is executed, it is assumed that updating of the initial estimated noise power spectrum is stopped while the target speech is present, which avoids damaging the target speech, and that the initial estimated noise power spectrum is updated while the target speech is absent, which improves the accuracy of the noise estimate. The update rules for the case without target speech and the case with target speech are therefore obtained separately.
Figure PCTCN2019096503-appb-000043
Figure PCTCN2019096503-appb-000044
Formula (30) above gives, for the noisy speech power spectrum of the λ-th frame, the update rule for the initial estimated noise power spectrum when no target speech is present; when the target speech is present, the initial estimated noise power spectrum is not updated, as in formula (31) above. In other words, the initial estimated noise power spectrum is updated only when no target speech is present, which avoids both speech loss and excessive residual noise.
Therefore, based on the assumptions of formulas (30) and (31) above, the effective noise power spectrum
Figure PCTCN2019096503-appb-000045
corresponding to the λ-th frame of noisy speech is as shown in formula (32).
Figure PCTCN2019096503-appb-000046
Figure PCTCN2019096503-appb-000047
Substituting formulas (30) and (31) into formula (32) gives
Figure PCTCN2019096503-appb-000048
Figure PCTCN2019096503-appb-000049
where α_3 is the smoothing factor described above, which is a function of the posterior probability of target speech presence. For the λ-th frame of noisy speech, p(H_0|Y(λ,k)) is the posterior probability of target speech absence and p(H_1|Y(λ,k)) is the posterior probability of target speech presence, with p(H_0|Y(λ,k)) = 1 - p(H_1|Y(λ,k)). All three quantities are calculated by the noise update control module.
Figure PCTCN2019096503-appb-000050
is the initial estimated noise power spectrum obtained for frame λ-1 of the noisy speech. If stronger noise reduction is required,
Figure PCTCN2019096503-appb-000051
may instead be the effective noise power spectrum corresponding to frame λ-1 of the noisy speech, namely
Figure PCTCN2019096503-appb-000052
As formulas (30)-(33) above show, when the initial estimated noise power spectrum is updated to obtain the effective noise power spectrum, the update is based on the noisy speech power spectrum, the smoothing factor, the posterior probability of target speech absence, the historical initial estimated noise power spectrum, and the posterior probability of target speech presence. For the λ-th frame of noisy speech, the historical initial estimated noise power spectrum may simply be the initial estimated noise power spectrum corresponding to frame λ-1; if stronger noise reduction is required, it may instead be the effective noise power spectrum corresponding to frame λ-1.
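The posterior-weighted update can be sketched as below. The sketch assumes that formula (30) is a first-order recursive average of the previous noise estimate and |Y(λ,k)|^2 with smoothing factor α_3, and that formula (32) weights the two hypotheses by their posterior probabilities; the text describes these roles but the exact expressions are assumptions. The function name is hypothetical.

```python
def update_noise_power(prev_noise, noisy_power, posterior, alpha):
    """Effective noise power spectrum for the current frame.

    prev_noise:  noise power estimate carried over from frame lambda-1
    noisy_power: |Y(lambda, k)|^2 of the current frame
    posterior:   p(H_1 | Y(lambda, k)) per frequency point
    alpha:       smoothing factor per frequency point
    """
    updated = []
    for n_prev, y2, p1, a in zip(prev_noise, noisy_power, posterior, alpha):
        p0 = 1.0 - p1                             # posterior of speech absence
        absent = a * n_prev + (1.0 - a) * y2      # assumed form of formula (30)
        present = n_prev                          # formula (31): no update
        updated.append(p0 * absent + p1 * present)
    return updated
```

When the posterior probability of target speech presence is 1, the estimate is left unchanged; when it is 0, the estimate follows the recursive average alone.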
S206. The filter coefficient calculation module calculates the filter coefficients according to the effective noise power spectrum;
S207. The filter module filters the real part and the imaginary part of the noisy speech spectrum separately according to the filter coefficients to obtain the enhanced speech spectrum.
The classic frequency-domain Wiener filter has the following structure:
Figure PCTCN2019096503-appb-000053
where a and b are both variable parameters.
Figure PCTCN2019096503-appb-000054
In practice, the true target speech power spectrum and the true noise power spectrum cannot be obtained, so ξ_k is approximated with the classic decision-directed method as follows.
Figure PCTCN2019096503-appb-000055
where a is the smoothing factor; ξ_min is the minimum value that
Figure PCTCN2019096503-appb-000056
may take;
Figure PCTCN2019096503-appb-000057
denotes the effective noise power spectrum estimated for the λ-th frame;
Figure PCTCN2019096503-appb-000058
denotes the effective noise power spectrum estimated for frame λ-1;
Figure PCTCN2019096503-appb-000059
denotes the target speech power spectrum, or enhanced target speech power spectrum, obtained for frame λ-1; and |Y(λ,k)|^2 denotes the noisy speech power spectrum of the λ-th frame.
The filter mainly comprises adders and multipliers. The filter coefficients calculated by formulas (34) and (35) are used to denoise the real part and the imaginary part of the noisy speech spectrum of the λ-th frame separately; that is, the coefficients are multiplied with the real part and the imaginary part respectively, and the results are combined to give the enhanced complex speech spectrum.
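The filtering stage can be sketched as follows. The sketch assumes that formula (34) is the parametric Wiener gain (ξ/(a+ξ))^b and that formula (35) is the usual decision-directed estimate, floored at ξ_min; neither expression is visible in this text, so both are assumptions. Since the text uses the letter a both for a Wiener parameter and for the decision-directed smoothing factor, the sketch gives them the distinct, hypothetical names `a_param` and `dd_weight`. All function names are hypothetical.

```python
def decision_directed_snr(prev_speech_power, prev_noise, noisy_power, noise,
                          dd_weight, xi_min, floor=1e-12):
    """Approximate xi_k at one frequency point (assumed form of formula (35))."""
    previous = prev_speech_power / max(prev_noise, floor)
    current = max(noisy_power / max(noise, floor) - 1.0, 0.0)
    return max(dd_weight * previous + (1.0 - dd_weight) * current, xi_min)


def wiener_gain(xi, a_param=1.0, b_param=1.0):
    """Parametric Wiener gain (assumed form of formula (34))."""
    return (xi / (a_param + xi)) ** b_param


def apply_gain(real_part, imag_part, gains):
    """Scale the real and imaginary parts of the noisy spectrum separately."""
    enhanced_real = [g * r for g, r in zip(gains, real_part)]
    enhanced_imag = [g * i for g, i in zip(gains, imag_part)]
    return enhanced_real, enhanced_imag
```

Because the gain is real-valued, scaling the real and imaginary parts separately leaves the phase of each frequency point unchanged and needs only multipliers.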
S208、还原模块将增强语音频谱从频域还原回时域得到时域二进制码组;S208. The restoration module restores the enhanced speech spectrum from the frequency domain back to the time domain to obtain a time domain binary code group;
S209、输出模块对时域二进制码组进行解码传输等处理,以经扬声器播放出来。S209. The output module decodes and transmits the time-domain binary code group to be played through the speaker.
此处,上述实施例中“用户”是相对概念,并非特定限定为人,也可以为机器。上述实施例中可以运用到人与人的语音通话、人与机器人的语音通话,机器人与机器人的语音通话等各种引用场景中,实际是其可以概括任意可产生有效语音的对象。Here, the "user" in the foregoing embodiment is a relative concept, and is not specifically limited to a person, but may also be a machine. The above embodiments can be applied to various reference scenarios such as human-to-human voice calls, human-to-robot voice calls, and robot-to-robot voice calls. In fact, it can generalize any object that can produce effective voice.
上述实施例二中,步骤S04a-S204e以及步骤S205实际上是噪声估计方法的一种示例性实施例。但是,需要说明的是,其中进一步或者具体的技术实现方式并非唯一性限定。In the second embodiment mentioned above, steps S04a-S204e and step S205 are actually an exemplary embodiment of the noise estimation method. However, it should be noted that further or specific technical implementation manners are not uniquely limited.
An embodiment of the present application provides a speech processing chip, which includes the noise estimation apparatus of any embodiment of the present application.
An embodiment of the present application further provides an electronic device, which includes the foregoing solution of any embodiment of the present application.
In addition, the specific formulas described in the foregoing embodiments are merely examples and are not the only possible ones; those of ordinary skill in the art may modify them without departing from the idea of the present application.
The above technical solutions of the embodiments of the present application can be applied to various types of electronic devices, which exist in many forms, including but not limited to:
(1) Mobile communication devices: these devices are characterized by mobile communication functions, and their main purpose is to provide voice and data communication. Such terminals include smartphones (for example, iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing functions, and generally also support mobile Internet access. Such terminals include PDA, MID, and UMPC devices, for example, iPad.
(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (for example, iPod), handheld game consoles, e-book readers, smart toys, and portable in-vehicle navigation devices.
(4) Other electronic apparatuses with data interaction functions.
Specific embodiments of the subject matter have now been described. Other embodiments are within the scope of the appended claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing may be advantageous.
The systems, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described in terms of functions divided into various units. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
It should also be noted that the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, product, or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, product, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, product, or device that includes the element.
The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The above descriptions are merely embodiments of the present application and are not intended to limit it. Those skilled in the art may make various modifications and changes to the present application. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall fall within the scope of the claims of the present application.

Claims (32)

  1. A noise estimation method, characterized by comprising:
    determining an initial estimated noise power spectrum of noisy speech;
    calculating a smoothing factor according to a probability of target speech presence;
    updating the initial estimated noise power spectrum according to the noisy speech and the smoothing factor to obtain an effective noise power spectrum.
  2. The method according to claim 1, characterized in that the probability of target speech presence comprises a posterior probability of target speech presence, and the relationship between the posterior probability of target speech presence and the smoothing factor comprises a nonlinear relationship.
  3. The method according to claim 2, characterized by further comprising: determining a likelihood ratio according to the probability density distribution of the noisy speech spectrum under the hypothesis that the target speech is present and the probability density distribution of the noisy speech spectrum under the hypothesis that the target speech is absent; and determining the posterior probability of target speech presence according to the likelihood ratio and an effective prior probability of target speech presence;
    correspondingly, the smoothing factor is calculated according to the posterior probability of target speech presence.
  4. The method according to claim 3, characterized by further comprising: determining the effective prior probability of target speech presence according to a noisy speech power spectrum.
  5. The method according to claim 4, characterized in that determining the effective prior probability of target speech presence according to the noisy speech power spectrum comprises: preliminarily determining, according to the noisy speech power spectrum, whether the target speech is present in the noisy speech; determining an estimated prior probability of target speech presence according to the result of the preliminary determination; and determining the effective prior probability of target speech presence according to the estimated prior probability of target speech presence.
  6. The method according to claim 5, characterized in that determining the estimated prior probability of target speech presence according to the result of the preliminary determination comprises: if the target speech is absent, performing inter-frequency-bin smoothing on the noisy speech power spectrum in which the target speech is absent to obtain a frequency-smoothed noisy speech power spectrum, or, if the target speech is present, using the historical inter-frame-smoothed noisy speech power spectrum as the frequency-smoothed noisy speech power spectrum; performing inter-frame smoothing on the frequency-smoothed noisy speech power spectrum to obtain an inter-frame-smoothed noisy speech power spectrum; and determining the estimated prior probability of target speech presence according to the inter-frame-smoothed noisy speech power spectrum.
  7. The method according to claim 5, characterized in that preliminarily determining, according to the noisy speech power spectrum, whether the target speech is present in the noisy speech comprises: determining a first detection factor according to the noisy speech power spectrum and the minimum power spectrum of the windowed and inter-frame-smoothed noisy speech power spectrum; and determining a second detection factor according to the windowed and inter-frame-smoothed noisy speech power spectrum and its minimum power spectrum, so as to preliminarily determine, according to the first detection factor and the second detection factor, whether the target speech is present in the noisy speech.
  8. The method according to claim 7, characterized in that if the first detection factor is less than a set first detection factor threshold and the second detection factor is less than a set second detection factor threshold, it is preliminarily determined that the target speech is absent from the noisy speech; otherwise, it is preliminarily determined that the target speech is present in the noisy speech.
  9. The method according to any one of claims 6-8, characterized in that determining the estimated prior probability of target speech presence according to the inter-frame-smoothed noisy speech power spectrum comprises: determining a third detection factor according to the noisy speech power spectrum and the minimum power spectrum of the inter-frame-smoothed noisy speech power spectrum; determining a fourth detection factor according to the inter-frame-smoothed noisy speech power spectrum and its minimum power spectrum; and determining the estimated prior probability of target speech presence according to the third detection factor and the fourth detection factor.
  10. The method according to claim 9, characterized in that determining the estimated prior probability of target speech presence according to the third detection factor and the fourth detection factor comprises: comparing the third detection factor and the fourth detection factor with corresponding thresholds, and determining the estimated prior probability of target speech presence according to the comparison result.
  11. The method according to any one of claims 5-10, characterized in that determining the effective prior probability of target speech presence according to the estimated prior probability of target speech presence comprises: determining the effective prior probability of target speech presence according to the estimated prior probability of target speech presence and a minimum value of the prior probability of target speech presence.
  12. The method according to any one of claims 4-11, characterized by further comprising: calculating the noisy speech power spectrum, so as to determine the initial estimated noise power spectrum of the noisy speech according to the noisy speech power spectrum.
  13. The method according to claim 12, characterized in that determining the initial estimated noise power spectrum of the noisy speech according to the noisy speech power spectrum comprises: windowing the noisy speech power spectrum; performing inter-frame smoothing on the windowed noisy speech power spectrum; and performing a minimum power spectrum search on the inter-frame-smoothed noisy speech power spectrum, and using the minimum power spectrum found as the initial estimated noise power spectrum.
  14. The method according to any one of claims 4-13, characterized in that updating the initial estimated noise power spectrum according to the noisy speech and the smoothing factor to obtain the effective noise power spectrum comprises: updating the initial estimated noise power spectrum according to the noisy speech power spectrum, the smoothing factor, the posterior probability of target speech absence, the historical initial estimated noise power spectrum, and the posterior probability of target speech presence, to obtain the effective noise power spectrum.
  15. The method according to any one of claims 1-14, characterized by further comprising: calculating filter coefficients according to the effective noise power spectrum; and filtering the noisy speech according to the filter coefficients to obtain an enhanced speech spectrum.
  16. A noise estimation apparatus, characterized by comprising:
    an initial noise estimation unit, configured to determine an initial estimated noise power spectrum of noisy speech;
    a noise update unit, configured to update the initial estimated noise power spectrum according to the noisy speech and a smoothing factor to obtain an effective noise power spectrum, the smoothing factor being calculated according to a probability of target speech presence.
  17. The apparatus according to claim 16, characterized in that the probability of target speech presence comprises a posterior probability of target speech presence, and the relationship between the posterior probability of target speech presence and the smoothing factor comprises a nonlinear relationship.
  18. The apparatus according to claim 17, characterized by further comprising: a likelihood ratio calculation unit, configured to determine a likelihood ratio according to the probability density distribution of the noisy speech spectrum under the hypothesis that the target speech is present and the probability density distribution of the noisy speech spectrum under the hypothesis that the target speech is absent; and a target speech presence posterior probability calculation unit, configured to determine the posterior probability of target speech presence according to the likelihood ratio and an effective prior probability of target speech presence;
    correspondingly, the smoothing factor is calculated according to the posterior probability of target speech presence.
  19. The apparatus according to claim 18, characterized by further comprising: a target speech presence prior probability calculation unit, configured to determine the effective prior probability of target speech presence according to a noisy speech power spectrum.
  20. The apparatus according to claim 19, characterized in that the target speech presence prior probability calculation unit is further configured to: preliminarily determine, according to the noisy speech power spectrum, whether the target speech is present in the noisy speech; determine an estimated prior probability of target speech presence according to the result of the preliminary determination of whether the target speech is present in the noisy speech of the current frame; and determine the effective prior probability of target speech presence according to the estimated prior probability of target speech presence.
  21. The apparatus according to claim 20, characterized in that the target speech presence prior probability calculation unit is further configured to: if the target speech is absent, perform inter-frequency-bin smoothing on the noisy speech power spectrum in which the target speech is absent to obtain a frequency-smoothed noisy speech power spectrum, or, if the target speech is present, use the historical inter-frame-smoothed noisy speech power spectrum as the frequency-smoothed noisy speech power spectrum; perform inter-frame smoothing on the frequency-smoothed noisy speech power spectrum to obtain an inter-frame-smoothed noisy speech power spectrum; and determine the estimated prior probability of target speech presence according to the inter-frame-smoothed noisy speech power spectrum.
  22. The apparatus according to claim 20, characterized in that the target speech presence prior probability calculation unit is further configured to: determine a first detection factor according to the noisy speech power spectrum and the minimum power spectrum of the windowed and inter-frame-smoothed noisy speech power spectrum; and determine a second detection factor according to the windowed and inter-frame-smoothed noisy speech power spectrum and its minimum power spectrum, so as to preliminarily determine, according to the first detection factor and the second detection factor, whether the target speech is present in the noisy speech.
  23. The apparatus according to claim 22, characterized in that if the first detection factor is less than a set first detection factor threshold and the second detection factor is less than a set second detection factor threshold, it is preliminarily determined that the target speech is absent from the noisy speech; otherwise, it is preliminarily determined that the target speech is present in the noisy speech.
  24. The apparatus according to any one of claims 21-23, characterized in that the target speech presence prior probability calculation unit is further configured to: determine a third detection factor according to the noisy speech power spectrum and the minimum power spectrum of the inter-frame-smoothed noisy speech power spectrum; determine a fourth detection factor according to the inter-frame-smoothed noisy speech power spectrum and its minimum power spectrum; and determine the estimated prior probability of target speech presence according to the third detection factor and the fourth detection factor.
  25. The apparatus according to claim 24, characterized in that the target speech presence prior probability calculation unit is further configured to: compare the third detection factor and the fourth detection factor with corresponding thresholds, and determine the estimated prior probability of target speech presence according to the comparison result.
  26. The apparatus according to any one of claims 20-25, characterized in that the target speech presence prior probability calculation unit is further configured to: determine the effective prior probability of target speech presence according to the estimated prior probability of target speech presence and a minimum value of the prior probability of target speech presence.
  27. The apparatus according to any one of claims 19-26, characterized by further comprising: a power spectrum calculation module, configured to calculate the noisy speech power spectrum, so as to determine the initial estimated noise power spectrum of the noisy speech according to the noisy speech power spectrum.
  28. The apparatus according to claim 27, characterized in that the initial noise estimation unit is further configured to: window the noisy speech power spectrum; perform inter-frame smoothing on the windowed noisy speech power spectrum; and perform a minimum power spectrum search on the inter-frame-smoothed noisy speech power spectrum, and use the minimum power spectrum found as the initial estimated noise power spectrum.
  29. The apparatus according to any one of claims 19-28, characterized in that the noise update unit is further configured to: update the initial estimated noise power spectrum according to the noisy speech power spectrum, the smoothing factor, the posterior probability of target speech absence, the historical initial estimated noise power spectrum, and the posterior probability of target speech presence, to obtain the effective noise power spectrum.
  30. The apparatus according to any one of claims 16-29, characterized by further comprising: a filtering module, configured to calculate filter coefficients according to the effective noise power spectrum, and to filter the noisy speech according to the filter coefficients to obtain an enhanced speech spectrum.
  31. A speech processing chip, characterized by comprising the noise estimation apparatus according to any one of claims 16-30.
  32. An electronic device, characterized by comprising the speech processing chip according to claim 31.
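
Claims 3 and 18 combine a likelihood ratio with an effective prior probability to obtain the posterior probability of target speech presence. The claims do not state the expression used, so the following is only a sketch that assumes the standard two-hypothesis Bayes rule; the function name `speech_presence_posterior` and the example numbers are hypothetical.

```python
def speech_presence_posterior(likelihood_ratio, prior):
    """Posterior probability of speech presence from Bayes' rule.

    likelihood_ratio: p(Y | speech present) / p(Y | speech absent), >= 0.
    prior: prior probability of speech presence, in [0, 1].
    Returns prior * L / (prior * L + (1 - prior)).
    """
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must lie in [0, 1]")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio must be non-negative")
    numerator = prior * likelihood_ratio
    denominator = numerator + (1.0 - prior)
    if denominator == 0.0:
        # Happens only when prior == 1 and the likelihood ratio is 0.
        raise ValueError("posterior is undefined for these inputs")
    return numerator / denominator


# Evidence three times more likely under "speech present", neutral prior.
posterior = speech_presence_posterior(3.0, 0.5)
```

A likelihood ratio of 1 carries no evidence, so the posterior then equals the prior; larger ratios push the posterior toward 1.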
PCT/CN2019/096503 2019-07-18 2019-07-18 Noise estimation method, noise estimation apparatus, speech processing chip and electronic device WO2021007841A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980001368.0A CN112602150A (en) 2019-07-18 2019-07-18 Noise estimation method, noise estimation device, voice processing chip and electronic equipment
PCT/CN2019/096503 WO2021007841A1 (en) 2019-07-18 2019-07-18 Noise estimation method, noise estimation apparatus, speech processing chip and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/096503 WO2021007841A1 (en) 2019-07-18 2019-07-18 Noise estimation method, noise estimation apparatus, speech processing chip and electronic device

Publications (1)

Publication Number Publication Date
WO2021007841A1 true WO2021007841A1 (en) 2021-01-21

Family

ID=74209600

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/096503 WO2021007841A1 (en) 2019-07-18 2019-07-18 Noise estimation method, noise estimation apparatus, speech processing chip and electronic device

Country Status (2)

Country Link
CN (1) CN112602150A (en)
WO (1) WO2021007841A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114166491A (en) * 2021-11-26 2022-03-11 中科传启(苏州)科技有限公司 Target equipment fault monitoring method and device, electronic equipment and medium
CN116403594B (en) * 2023-06-08 2023-08-18 澳克多普有限公司 Speech enhancement method and device based on noise update factor

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103650040A (en) * 2011-05-16 2014-03-19 谷歌公司 Noise supression method and apparatus using multiple feature modeling for speech/noise likelihood
CN108735225A (en) * 2018-04-28 2018-11-02 南京邮电大学 It is a kind of based on human ear masking effect and Bayesian Estimation improvement spectrum subtract method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110099007A1 (en) * 2009-10-22 2011-04-28 Broadcom Corporation Noise estimation using an adaptive smoothing factor based on a teager energy ratio in a multi-channel noise suppression system
CN108831499B (en) * 2018-05-25 2020-07-21 西南电子技术研究所(中国电子科技集团公司第十研究所) Speech enhancement method using speech existence probability
WO2020107269A1 (en) * 2018-11-28 2020-06-04 深圳市汇顶科技股份有限公司 Self-adaptive speech enhancement method, and electronic device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ISRAEL COHEN: "Noise Spectrum Estimation in Adverse Environments: Improved Minima Controlled Recursive Averaging", IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 30 September 2003 (2003-09-30), XP011100006, DOI: 20200417104418Y *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113270107A (en) * 2021-04-13 2021-08-17 维沃移动通信有限公司 Method and device for acquiring noise loudness in audio signal and electronic equipment
WO2022218252A1 (en) * 2021-04-13 2022-10-20 维沃移动通信有限公司 Method and apparatus for acquiring noise loudness in audio signal, and electronic device
CN113270107B (en) * 2021-04-13 2024-02-06 维沃移动通信有限公司 Method and device for acquiring loudness of noise in audio signal and electronic equipment
CN113838476A (en) * 2021-09-24 2021-12-24 世邦通信股份有限公司 Noise estimation method and device for noisy speech
CN113838476B (en) * 2021-09-24 2023-12-01 世邦通信股份有限公司 Noise estimation method and device for noisy speech

Also Published As

Publication number Publication date
CN112602150A (en) 2021-04-02

Similar Documents

Publication Publication Date Title
CN108831499B (en) Speech enhancement method using speech existence probability
WO2021007841A1 (en) Noise estimation method, noise estimation apparatus, speech processing chip and electronic device
CN109767783B (en) Voice enhancement method, device, equipment and storage medium
WO2020107269A1 (en) Self-adaptive speech enhancement method, and electronic device
CN110634497B (en) Noise reduction method and device, terminal equipment and storage medium
CN111899752B (en) Noise suppression method and device for rapidly calculating voice existence probability, storage medium and terminal
CN111418010A (en) Multi-microphone noise reduction method and device and terminal equipment
JP5875609B2 (en) Noise suppressor
CN113539285B (en) Audio signal noise reduction method, electronic device and storage medium
CN110556125B (en) Feature extraction method and device based on voice signal and computer storage medium
CN110875049B (en) Voice signal processing method and device
WO2022218254A1 (en) Voice signal enhancement method and apparatus, and electronic device
CN112289337B (en) Method and device for filtering residual noise after machine learning voice enhancement
CN116312545B (en) Speech recognition system and method in a multi-noise environment
CN115440240A (en) Training method for voice noise reduction, voice noise reduction system and voice noise reduction method
CN115662461A (en) Noise reduction model training method, device and equipment
CN113611319A (en) Wind noise suppression method, device, equipment and system based on voice component
CN113299308A (en) Voice enhancement method and device, electronic equipment and storage medium
CN113808608B (en) Method and device for suppressing mono noise based on time-frequency masking smoothing strategy
CN116013337B (en) Audio signal processing method, training method, device, equipment and medium for model
CN115050367B (en) Method, device, equipment and storage medium for positioning speaking target
CN114333885A (en) Voice noise reduction method and device and storage medium
CN114664319A (en) Band spreading method, device, apparatus, medium, and program product
CN114360566A (en) Noise reduction processing method and device for voice signal and storage medium
Wang Research of the Tolerated Noise in Noisy Speech

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19938057; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 19938057; Country of ref document: EP; Kind code of ref document: A1)