WO2020107269A1 - Self-adaptive speech enhancement method, and electronic device - Google Patents


Info

Publication number
WO2020107269A1
Authority
WO
WIPO (PCT)
Prior art keywords
current frame
log
quantile
noise
signal
Prior art date
Application number
PCT/CN2018/117972
Other languages
French (fr)
Chinese (zh)
Inventor
朱虎
王鑫山
李国梁
曾端
郭红敬
Original Assignee
深圳市汇顶科技股份有限公司 (Shenzhen Goodix Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市汇顶科技股份有限公司 (Shenzhen Goodix Technology Co., Ltd.)
Priority to CN201880002760.2A (CN109643554B)
Priority to PCT/CN2018/117972 (WO2020107269A1)
Publication of WO2020107269A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 - Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being power information
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 - Reducing energy consumption in communication networks
    • Y02D30/70 - Reducing energy consumption in communication networks in wireless communication networks

Definitions

  • This application relates to the field of information processing technology, and in particular to an adaptive speech enhancement method and an electronic device.
  • Speech enhancement is an effective way to address noise pollution.
  • On the one hand, speech enhancement can improve the clarity, intelligibility, and comfort of speech in a noisy environment, improving the quality of human auditory perception; on the other hand, speech enhancement is also an indispensable part of a speech processing system.
  • Speech enhancement must first be performed to reduce the impact of noise on the speech processing system and improve the system's performance.
  • Speech enhancement mainly includes two parts: noise estimation and filter coefficient solution.
  • Representative speech enhancement methods include spectral subtraction, Wiener filtering, minimum mean square error estimation, subspace methods, wavelet transform-based enhancement methods, and so on. Most of these methods are based on statistical models of speech and noise components in frequency, and combined with various estimation theories to design targeted noise cancellation techniques.
  • The purpose of some embodiments of the present application is to provide an adaptive speech enhancement method that makes the noise estimation more accurate and reduces the complexity of the algorithm, thereby facilitating enhancement of speech signals and improving the quality of human auditory perception.
  • An embodiment of the present application provides an adaptive speech enhancement method, which includes: after receiving a speech signal, calculating the power of the current frame of the speech signal; comparing the power of the current frame with the noise power of the previous frame; obtaining the noise estimate of the current frame according to the result of the comparison and the noise power of the previous frame; and obtaining the pure voice signal according to the noise estimate.
  • An embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the adaptive speech enhancement method described above.
  • The embodiment of the present application calculates the power of the current frame of the voice signal from the received voice signal, compares the power of the current frame with the noise power of the previous frame, and obtains the noise estimate of the current frame according to the result of the comparison and the noise power of the previous frame.
  • It is therefore not necessary to use a VAD algorithm to detect whether the current frame is a speech frame or a noise frame, avoiding the large noise-estimation deviations that inaccurate VAD detection can cause, which is beneficial for quickly estimating the noise component in the speech signal.
  • This application uses an iterative estimation method.
  • the noise power of each frame is adaptively updated.
  • the power of the current frame is compared with the noise power of the previous frame to estimate the noise value of the current frame.
  • Because the power is recalculated for each frame, the noise can be estimated and updated continuously. Only the power of the current frame and the noise power of the previous frame need to be compared; there is no need to store the previous D frames of data or sort them by power, which reduces the algorithm's resource overhead and complexity. Obtaining the pure voice signal from the noise estimate helps enhance the voice signal and improve the quality of human auditory perception.
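As a minimal sketch of this compare-and-nudge update (the fixed step size here is a placeholder; the embodiments below derive the step adaptively from a density function):

```python
import numpy as np

def update_noise_estimate(frame_power, prev_noise, step=0.05):
    """Compare each frequency bin's power in the current frame with the
    previous frame's noise power and nudge the noise estimate up or down
    by a small step, so it is updated on every frame, speech or noise."""
    return np.where(frame_power >= prev_noise,
                    prev_noise + step,   # power rose: raise the estimate
                    prev_noise - step)   # power fell: lower the estimate
```

Because the update touches every frame, the estimate keeps tracking slowly varying noise without any voice-activity detection.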
  • the power of the current frame is specifically: the log power spectrum of the current frame
  • the noise power of the previous frame is specifically: the log quantile of the previous frame.
  • The logarithmic coordinates amplify details and can extract signals that cannot be extracted at an ordinary coordinate scale, which helps compress the dynamic range of the values, so that in the logarithmic coordinate system the comparison between the log power spectrum of the current frame and the log quantile of the previous frame is more precise, facilitating subsequent accurate processing.
  • Obtaining the noise estimate of the current frame includes: obtaining the incremental step of the current frame according to the comparison result of the log power spectrum of the current frame and the log quantile of the previous frame; obtaining the log quantile of the current frame according to the log quantile of the previous frame and the incremental step of the current frame; and obtaining the noise estimate of the current frame according to the log quantile of the current frame.
  • The incremental step of the current frame provides a meaningful reference for obtaining the log quantile of the current frame, which is beneficial for accurately obtaining the log quantile of the current frame and thus for accurately estimating the noise value of the current frame.
  • The log quantile of the current frame is obtained according to the log quantile of the previous frame and the incremental step of the current frame, which specifically includes: if the log power spectrum of the current frame is greater than or equal to the log quantile of the previous frame, adaptively increasing the log quantile of the previous frame according to the incremental step to obtain the log quantile of the current frame; if the log power spectrum of the current frame is less than the log quantile of the previous frame, adaptively decreasing the log quantile of the previous frame according to the incremental step to obtain the log quantile of the current frame.
  • Adaptively increasing or decreasing the log quantile of the previous frame according to the incremental step is beneficial for accurately obtaining the log quantile of the current frame.
  • Obtaining the incremental step of the current frame specifically includes: obtaining the density function according to the comparison result of the log power spectrum of the current frame and the log quantile of the previous frame; and obtaining the incremental step of the current frame according to the density function. This provides a way to obtain the incremental step of the current frame.
  • λ is the frame number of the current frame
  • k is the number of frequency points
  • one further parameter is an experimental empirical value
  • another is the preset threshold
  • the remaining terms are the log power spectrum of the current frame and lq(λ-1, k), the log quantile of the previous frame; a specific calculation formula for obtaining the density function is provided, which is beneficial for obtaining the density function quickly and accurately.
  • the incremental step of the current frame is obtained according to the density function
  • the incremental step delta is specifically obtained by the following formula:
  • is the frame number of the current frame
  • K is the incremental step size control factor
  • density(λ-1, k) is the density function of the previous frame; a specific calculation formula for obtaining the incremental step is provided, which is helpful for obtaining the incremental step quickly and accurately.
  • the log quantile of the previous frame is adaptively increased to obtain the log quantile of the current frame.
  • the calculation formulas for adaptively increasing and decreasing the log quantile are provided, which is beneficial to obtain the log quantile of the current frame directly, quickly and accurately.
  • Obtaining the pure voice signal according to the noise estimate includes: obtaining the power spectrum of the current frame of the voice signal; obtaining the spectral gain coefficient according to the noise estimate; and obtaining the pure voice signal of the current frame according to the spectral gain coefficient. This is beneficial for adaptively tracking the change of noise in each frame, enhancing the voice of each frame, improving the clarity, intelligibility, and comfort of the voice in a noisy environment, reducing the impact of noise on the voice processing system, and improving the system's performance.
  • Obtaining the spectral gain coefficient based on the noise estimate includes: calculating the a priori signal-to-noise ratio based on the previous frame's noise estimate and the previous frame's pure voice signal; calculating the posterior signal-to-noise ratio based on the current frame's noise estimate and the current frame's power; and obtaining the spectral gain coefficient according to the a priori and posterior signal-to-noise ratios. This provides a way to obtain the spectral gain coefficient.
  • obtaining the spectral gain coefficient according to the a priori signal-to-noise ratio and the posterior signal-to-noise ratio specifically includes: obtaining the spectral gain coefficient according to the following formula:
  • γk is the posterior signal-to-noise ratio
  • ξk is the a priori signal-to-noise ratio
  • p is the perceptual weighted order
  • β is the order of the higher-order amplitude spectrum.
  • Calculating the signal-to-noise ratios of several subbands specifically includes: calculating the signal-to-noise ratios of the several subbands by the following formula:
  • b is the serial number of the subband
  • k is the number of frequency points
  • B low (b) is the starting frequency point of subband b in the Bark domain
  • B up (b) is the ending frequency point of subband b in the Bark domain.
  • Calculating the perceptual weighted order based on the signal-to-noise ratios of the several subbands specifically means calculating the perceptual weighted order p by the following formula:
  • The two weighting constants, together with p min and p max, are experimental empirical values.
  • A specific calculation formula for obtaining the perceptual weighted order is provided, which is beneficial for obtaining the perceptual weighted order accurately and quickly.
  • For example, the values of the special functions used in the gain expression can be obtained by querying pre-stored tables of their input/output correspondences. This table-lookup approach greatly reduces the computational complexity of the method, reduces the amount of calculation, and makes it more suitable for engineering applications.
  • the pure voice signal is obtained according to the spectral gain coefficient, specifically obtained by the following formula:
  • Y w (k) is the signal amplitude of the current frame. A specific formula for obtaining the pure voice signal is provided, which is beneficial for obtaining the pure voice signal of the current frame quickly and accurately.
  • FIG. 1 is a flowchart of the adaptive speech enhancement method according to the first embodiment of the present application
  • FIG. 2 is a schematic diagram of the Kaiser window function according to the first embodiment of the present application.
  • FIG. 3 is a schematic diagram of the sub-steps of step 104 in the first embodiment of the present application.
  • FIG. 4 is a flowchart of an adaptive speech enhancement method according to the second embodiment of the present application.
  • FIG. 5 is a schematic diagram of modules for implementing an adaptive speech enhancement method according to the second embodiment of the present application.
  • FIG. 6 is a flowchart of an adaptive speech enhancement method according to the third embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present application.
  • The first embodiment of the present application relates to an adaptive speech enhancement method, which includes: after receiving the speech signal, calculating the power of the current frame of the speech signal; comparing the power of the current frame with the adaptively updated noise power, where the adaptively updated noise power is the noise power of the previous frame of the speech signal; obtaining the noise estimate of the current frame according to the comparison result; and obtaining the pure voice signal according to the noise estimate. This makes the noise estimation more accurate and reduces the complexity of the algorithm, which is beneficial for enhancing the speech signal and improving the quality of human auditory perception.
  • The implementation details of the adaptive speech enhancement method of this embodiment are described below. The following content is provided only for ease of understanding and is not necessary for implementing this solution.
  • The adaptive speech enhancement method of this embodiment can be applied in the field of speech signal processing technology and is applicable to low-power speech enhancement, speech recognition, and voice interaction products, including but not limited to electronic equipment such as headphones, stereos, mobile phones, televisions, automobiles, wearable devices, and smart home devices.
  • The specific process of the adaptive speech enhancement method in this embodiment is shown in FIG. 1 and includes:
  • Step 101 After receiving the voice signal, calculate the power of the current frame of the voice signal according to the voice signal.
  • After receiving the voice signal, the voice signal can be transformed from the time domain to the frequency domain to obtain the frequency domain voice signal.
  • The frequency domain is a coordinate system used to describe the frequency characteristics of the voice signal.
  • the transformation of the speech signal from the time domain to the frequency domain is mainly realized by Fourier series and Fourier transform.
  • the periodic signal depends on the Fourier series, and the non-periodic signal depends on the Fourier transform.
  • the power of the current frame is obtained according to the amplitude of the current frame of the frequency domain speech signal.
  • The data length of a frame is generally between 8 ms and 30 ms.
  • The processing of the voice signal can take 64 new points and overlap them with 64 points of the previous frame, so that 128 points are processed at a time; that is, the overlap rate between the current frame and the previous frame is 50%. In practical applications, however, it is not limited to this.
  • In the specific operation, α is a smoothing factor. In this embodiment, α can take a value of 0.98, but in practical applications different settings can be made according to actual needs.
  • y(n) is the sampled speech signal of the current frame
  • y(n-1) is the sampled speech signal of the previous frame.
  • the interception function can be used to truncate the signal.
  • the truncation function is called the window function, that is, the speech signal is windowed.
  • The window function can be selected according to the application scenario; rectangular, Hamming, Hanning, and Gaussian window functions can all be chosen flexibly in the actual design.
  • the Kaiser window function shown in FIG. 2 is used, and the overlap is 50%.
  • the windowed data can be subjected to the fast Fourier transform FFT by the following formula to obtain the frequency domain signal.
  • N = 128 points are actually processed at a time.
  • N = 128 is taken as an example here, but it is not limited to this in practical applications.
  • m is the frame number, and the value of n can range from 1 to 128.
  • The amplitudes of the 128 frequency points of the transformed frequency domain signal can be obtained, and the amplitudes of the 128 frequency points can each be squared to obtain the power of the current frame.
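The framing, Kaiser windowing, and power computation described above can be sketched as follows (the Kaiser shape parameter `beta=8.0` is an assumed value for illustration; the patent only shows the window in FIG. 2):

```python
import numpy as np

def frame_power_spectrum(signal, frame_shift=64, n_fft=128, beta=8.0):
    """Split the signal into 128-point frames with 50% overlap (64 new
    samples per frame), apply a Kaiser window, take the FFT, and square
    the magnitudes to obtain each frame's power spectrum."""
    win = np.kaiser(n_fft, beta)
    n_frames = (len(signal) - n_fft) // frame_shift + 1
    power = np.empty((n_frames, n_fft))
    for m in range(n_frames):
        frame = signal[m * frame_shift : m * frame_shift + n_fft] * win
        power[m] = np.abs(np.fft.fft(frame, n_fft)) ** 2
    return power
```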
  • Step 102 Compare the power of the current frame with the noise power of the previous frame.
  • the noise power of the previous frame is the adaptively updated noise power.
  • the noise power can be initialized according to the experimental value first. If the current frame is the first frame, the power of the current frame can be compared with the initialized noise power.
  • the adaptively updated noise power means that the noise power of different frames is different.
  • The noise power of the current frame can be adaptively updated during the iteration. For example, the power of the 128 frequency points of the current frame is compared with the noise power of the corresponding 128 frequency points of the previous frame, and the noise power corresponding to each frequency point of the current frame is adaptively updated.
  • Step 103 Acquire the noise estimate value of the current frame according to the comparison result and the noise power of the previous frame.
  • If the power of the current frame is greater than or equal to the noise power of the previous frame, the noise power of the previous frame can be adaptively increased to serve as the noise estimate of the current frame; for example, an incremental step can be preset, and the increase is made adaptively according to the incremental step.
  • The incremental step can also be updated adaptively during the iteration. If the power of the current frame is less than the noise power of the previous frame, the noise power of the previous frame can be adaptively reduced, and the reduced noise power can be used as the noise estimate of the current frame.
  • Step 104 Obtain a pure voice signal according to the noise estimate.
  • step 104 may include the following sub-steps as shown in FIG. 3:
  • Step 1041 Calculate the a priori signal-to-noise ratio based on the noise estimate of the previous frame and the pure voice signal of the previous frame.
  • The a priori signal-to-noise ratio can be calculated using the classic improved decision-directed method, according to the following formula:
  • a is the smoothing factor, and ξmin is a preset empirical value
  • the remaining term is the pure voice signal power of the previous frame
  • λ is the frame number of the current frame.
  • The value of a may be 0.98,
  • and ξmin may be -15 dB according to experience, but these values are not limited in practical applications.
  • the prior signal-to-noise ratio is calculated by the above formula as an example, but it is not limited to this in practical applications.
  • Step 1042 Calculate the posterior signal-to-noise ratio according to the current frame noise estimate and the current frame power.
  • the posterior signal-to-noise ratio can be calculated according to the following formula:
  • one term is the power of the current frame
  • λd(k) is the noise estimate of the current frame
  • The execution order of step 1041 and step 1042 is not limited. In practical applications, step 1042 may be executed first and then step 1041, or step 1041 and step 1042 may be executed simultaneously.
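Steps 1041 and 1042 can be sketched as follows, assuming the "improved decision guidance" method is the standard decision-directed rule and the posterior SNR is the ratio of the current frame's power to its noise estimate (both are common formulations; the patent's exact formulas appear only as figures):

```python
import numpy as np

def snr_estimates(noise_cur, noise_prev, clean_prev_power, frame_power,
                  a=0.98, xi_min=10 ** (-15 / 10)):
    """Posterior SNR gamma = current frame power / current noise estimate.
    Prior SNR xi via the decision-directed rule: mix the previous frame's
    clean-speech power with the instantaneous max(gamma - 1, 0), then
    floor at xi_min (-15 dB, per the text)."""
    gamma = frame_power / noise_cur
    xi = a * clean_prev_power / noise_prev + (1 - a) * np.maximum(gamma - 1, 0)
    return np.maximum(xi, xi_min), gamma
```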
  • Step 1043 Calculate the perceptual weighted order p.
  • the parameter p can be calculated adaptively according to the sub-band signal-to-noise ratio and the characteristics of the Bark domain.
  • The Bark domain can be divided into several subbands. For example, the Bark domain can be divided into 18 subbands, with the upper limit frequencies of the subbands being [100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400]. Because the human ear is more sensitive to speech in the Bark domain, the signal-to-noise ratio of each subband is calculated; the signal-to-noise ratios of the several subbands are calculated by the following formula:
  • b is the serial number of the sub-band
  • the serial number of the sub-band satisfies 1 ≤ b ≤ 18
  • k is the number of frequency points
  • B low (b) is the starting frequency point of the b-th sub-band in the Bark domain
  • B up (b) is the ending frequency point of the b-th sub-band in the Bark domain.
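Under these definitions, the subband SNRs can be sketched by summing per-bin powers over the FFT bins that fall inside each Bark subband (the 8 kHz sampling rate and the dB form of the ratio are assumptions for illustration):

```python
import numpy as np

# Subband upper-edge frequencies (Hz) for the 18 Bark subbands, from the text
BARK_EDGES = [100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
              1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400]

def subband_snr_db(signal_power, noise_power, fs=8000, n_fft=128):
    """Sum per-bin signal and noise power over the rFFT bins inside each
    Bark subband and return the per-subband SNR in dB."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    snr, low = [], 0.0
    for up in BARK_EDGES:
        bins = (freqs > low) & (freqs <= up)
        if bins.any():
            snr.append(10 * np.log10(signal_power[bins].sum()
                                     / noise_power[bins].sum()))
        else:
            snr.append(0.0)  # no bin falls inside this narrow subband
        low = up
    return np.array(snr)
```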
  • the parameter p can be calculated by the following formula:
  • The two weighting constants, together with p min and p max, are experimental empirical values.
  • Step 1044 Calculate the order β of the higher-order amplitude spectrum.
  • F s is the sampling frequency
  • β max and A are experimental empirical values.
  • The execution order of step 1043 and step 1044 is likewise not limited: step 1044 can be executed first and then step 1043, or step 1043 and step 1044 can be executed simultaneously.
  • Step 1045 Obtain the spectral gain coefficient according to the a priori signal-to-noise ratio, the posterior signal-to-noise ratio, the perceptual weighted order, and the order of the higher-order amplitude spectrum.
  • the core idea of obtaining the spectral gain coefficient can be Bayesian short-term amplitude spectrum estimation, and its cost function is:
  • The spectral gain coefficient G can be calculated according to the prior signal-to-noise ratio ξk, the posterior signal-to-noise ratio γk, and the parameters β and p.
  • the spectral gain coefficient can be calculated in the form of a look-up table.
  • The specific input/output correspondences of the two special functions appearing in the gain expression can be pre-stored as tables. For a given input, the corresponding output value is found by querying the pre-stored correspondence table, and the found output values are then substituted into the calculation expression of the spectral gain coefficient to obtain the spectral gain coefficient, which greatly reduces the computational complexity of the method.
  • Obtaining the spectral gain coefficient by the above expression for G is taken as an example, but it is not limited to this in practical applications.
  • Step 1046 Obtain the pure voice signal of the current frame according to the spectral gain coefficient.
  • the pure voice signal of the current frame can be calculated according to the following formula
  • Y w (k) is the signal amplitude of the current frame.
  • Compared with the prior art, this embodiment has the following technical effects. First, compared with traditional noise estimation, there is no need to detect voiced and unvoiced speech; the noise is updated in both noise frames and speech frames, so changes in the noise can be tracked adaptively. Second, compared with traditional quantile noise estimation, there is no need to store the previous D frames of data and sort them by power, which reduces the algorithm's resource overhead. Third, when calculating the spectral gain coefficient, the human ear's masking mechanism and its sensitivity to noise and spectral amplitude are considered via the adaptively updated parameters p and β; compared with the traditional generalized weighted higher-order spectral estimator for speech enhancement, this reduces the amount of calculation and is more suitable for engineering applications.
  • the second embodiment of the present application relates to an adaptive speech enhancement method.
  • The power of the current frame in this embodiment is specifically the log power spectrum of the current frame, and the noise power in this embodiment is specifically the log quantile.
  • the comparison between the log power spectrum of the current frame and the log quantile of the previous frame is more accurate, thereby facilitating subsequent accurate processing.
  • the specific process of the adaptive speech enhancement method in this embodiment is shown in FIG. 4 and includes:
  • Step 201 After receiving the voice signal, calculate the log power spectrum of the current frame of the voice signal according to the voice signal.
  • Step 201 is substantially the same as step 101; the difference is that step 101 calculates the power of the current frame, while this step calculates the log power spectrum of the current frame, that is, the logarithm of the calculated power of the current frame.
  • The processing of the voice signal of the current frame can take 64 new points and overlap them with 64 points of the previous frame, so that 128 points are actually processed at a time and the power values of 128 points are obtained; taking the logarithm of each of the 128 power values yields the logarithmic power corresponding to the 128 frequency points, and these 128 logarithmic powers constitute the logarithmic power spectrum of the current frame.
  • Step 202 Acquire a density function according to the comparison result of the log power spectrum of the current frame and the log quantile of the previous frame.
  • the initial log quantile and the initial density function may be preset. That is, the density function and the log quantile can be initialized according to the experimental value first.
  • the density function of the current frame can be updated according to the log power spectrum of the current frame and the log quantile of the previous frame. Specifically, it can be updated according to the following formula:
  • λ is the frame number of the current frame
  • k is the number of frequency points
  • one further parameter is an experimental empirical value
  • another is the preset threshold
  • a further term is the logarithmic power spectrum of the current frame
  • lq(λ-1, k) is the log quantile of the previous frame.
  • the density function of the current frame is obtained by using the above density function calculation formula as an example, but it is not limited to this in practical applications.
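The exact density formula appears only as an image in the original; a plausible sketch of the step 202 update, following the standard quantile-tracking recursion (the smoothing factor `beta` and the threshold are assumed experimental values), is:

```python
import numpy as np

def update_density(density_prev, log_power, lq_prev, beta=0.95, thresh=0.5):
    """The density rises when the current log power lies within `thresh`
    of the tracked log quantile, and decays toward zero otherwise."""
    near = np.abs(log_power - lq_prev) < thresh
    return beta * density_prev + (1 - beta) * near.astype(float)
```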
  • Step 203 Obtain the incremental step size of the current frame according to the density function.
  • the initial incremental step size can be set in advance.
  • the incremental step of the current frame is updated according to the density function of the previous frame, which can be specifically updated according to the following formula:
  • K is the incremental step size control factor. If the current frame is the first frame, the incremental step size control factor K is the initial incremental step size.
  • Obtaining the incremental step of the current frame by the above incremental step calculation formula is taken as an example; any method for obtaining the incremental step of the current frame according to the density function falls within the protection scope of this embodiment.
  • Step 204 Obtain the log quantile of the current frame according to the log quantile of the previous frame and the incremental step size of the current frame.
  • If the log power spectrum of the current frame is greater than or equal to the log quantile of the previous frame, the log quantile of the previous frame can be adaptively increased according to the incremental step to obtain the log quantile of the current frame; if the log power spectrum of the current frame is less than the log quantile of the previous frame, the log quantile of the previous frame can be adaptively reduced according to the incremental step to obtain the log quantile of the current frame.
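The raise-or-lower rule of step 204 can be sketched per frequency bin as:

```python
import numpy as np

def update_log_quantile(lq_prev, log_power, delta):
    """If the current frame's log power is >= the previous log quantile,
    raise the quantile by the incremental step delta; otherwise lower it."""
    return np.where(log_power >= lq_prev, lq_prev + delta, lq_prev - delta)
```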
  • Step 205 Acquire the noise estimate of the current frame according to the log quantile of the current frame.
  • the noise estimate can be calculated by the following formula:
  • Step 206 Obtain a pure voice signal according to the noise estimate.
  • Step 206 is substantially the same as step 104 in the first embodiment, and will not be repeated here to avoid repetition.
  • this embodiment provides a block diagram as shown in FIG. 5 to explain the adaptive speech enhancement method in this embodiment:
  • the de-pre-emphasis module 310 is mainly a low-pass filter.
  • the de-pre-emphasis module 310 and the pre-emphasis module 301 are reciprocal processes, and the combination of the two can achieve the effect of de-reverberation.
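The reciprocal relationship between modules 301 and 310 can be illustrated with the usual pre-emphasis filter and its inverse (the coefficient 0.97 is an assumed typical value, not taken from the patent):

```python
import numpy as np

def pre_emphasis(y, alpha=0.97):
    """Pre-emphasis high-pass filter: out(n) = y(n) - alpha * y(n - 1)."""
    out = np.copy(y)
    out[1:] -= alpha * y[:-1]
    return out

def de_pre_emphasis(x, alpha=0.97):
    """Inverse (first-order low-pass IIR) filter: y(n) = x(n) + alpha * y(n - 1)."""
    y = np.copy(x)
    for n in range(1, len(y)):
        y[n] += alpha * y[n - 1]
    return y
```

Applying one filter after the other recovers the input, which is why the two modules cancel each other in the processing chain.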
  • the windowing module 302 is mainly to avoid the occurrence of sudden changes in overlapping signals.
  • the window synthesis module 309 mainly removes the effect of the window function on the output of pure voice signals.
  • The windowing module 302 and the window synthesis module 309 use the same window function in the implementation process. Therefore, the window function must be a power-preserving mapping; that is, the sum of the squared windows over the overlapping parts of the voice signal must be 1, as shown in the following formula:
  • N is the number of FFT processing points, with a value of 128, and M is the frame length, with a value of 64.
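The power-preserving constraint can be checked numerically; with 50% overlap it reduces to w(i)^2 + w(i + M)^2 = 1 for every i in the first half of the window. A square-root Hann window (an example choice, not mandated by the patent) satisfies it, while a rectangular window does not:

```python
import numpy as np

def is_power_preserving(window, shift, tol=1e-9):
    """For 50% overlap (shift = len(window) // 2), check that the squared
    windows of adjacent frames sum to one over the overlapping region."""
    sq = window ** 2
    return bool(np.all(np.abs(sq[:shift] + sq[shift:] - 1.0) < tol))

N = 128
sqrt_hann = np.sin(np.pi * np.arange(N) / N)  # sin^2 + cos^2 = 1 across the overlap
```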
  • the fast Fourier transform FFT module 303 is mainly used for mutual conversion between the time domain signal and the frequency domain signal.
  • the FFT module 303 and the inverse FFT module 308 are inverse processes of each other.
  • the FFT module 303 converts the time domain signal into a frequency domain signal, and after conversion into the frequency domain signal, the signal amplitude Y w can be obtained according to the frequency domain signal.
  • the inverse FFT module 308 converts the frequency domain signal into a time domain signal.
  • the power spectrum calculation module 304 is configured to obtain the power P of the current frame by squaring the amplitude obtained from the frequency domain signal.
  • the log power spectrum calculation module 305 is configured to log the power of the current frame to obtain the log power spectrum of the current frame.
  • the power spectrum calculation module 304 and the log power spectrum calculation module 305 mainly constitute a pre-processing stage before noise estimation.
  • the noise value estimation module 306 mainly performs noise estimation on the noisy speech signal, estimating the noise signal as accurately as possible.
  • the noise estimation value is mainly obtained through noise estimation according to the principle of adaptive quantile noise estimation.
  • the spectral gain coefficient calculation module 307 mainly calculates the spectral gain coefficient G from the noise estimate and the power of the noisy speech signal. Specifically, the calculation of the spectral gain coefficient is mainly based on the principle of the generalized weighted high-order short-time spectral amplitude estimator.
  • the pure voice signal in the frequency domain is obtained according to the spectral gain coefficient G and the signal amplitude Y_w.
  • the frequency domain signal is converted into a time domain signal through the inverse FFT module 308, and then processed by the window synthesis module 309 and the de-pre-emphasis module 310 to output a pure voice signal in the time domain.
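The frequency-domain core of the chain in FIG. 5 (modules 304 through 307, plus applying G to Y_w) can be sketched per frame as below. The noise update and the gain rule here are deliberately crude stand-ins (simple floor tracking and a spectral-subtraction-style gain) for the adaptive quantile estimator and the generalized weighted high-order short-time spectral amplitude gain that the text describes; all constants are illustrative.

```python
import math

def enhance_spectrum(amplitudes, noise_floor):
    """One frame of the 304->307 chain: power, log power, noise update, gain.

    `noise_floor` is the running per-bin noise power estimate; the update and
    gain below are simple stand-ins for the patent's quantile estimator and
    weighted high-order STSA gain.
    """
    power = [a * a for a in amplitudes]                       # module 304
    log_power = [math.log(p + 1e-12) for p in power]          # module 305
    # module 306 stand-in: track the floor (drop fast, rise slowly)
    noise = [p if p < nf else 1.02 * nf for p, nf in zip(power, noise_floor)]
    # module 307 stand-in: spectral-subtraction-style gain, floored at 0.05
    gain = [max(1.0 - n / (p + 1e-12), 0.05) for n, p in zip(noise, power)]
    clean = [g * a for g, a in zip(gain, amplitudes)]         # G * Y_w
    return clean, noise, log_power
```

Feeding each frame's amplitudes through this chain, then handing `clean` to the inverse FFT, window synthesis, and de-pre-emphasis stages (modules 308 to 310), mirrors the data flow of the block diagram.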
  • this embodiment compares the log power spectrum of the current frame of the noisy speech with the log quantile of the previous frame to modify the log quantile and obtain the noise estimate. This avoids the voice activity detection, large-scale data storage, and power spectrum sorting operations of the prior art, reducing the algorithm's resource overhead. Moreover, logarithmic coordinates can amplify details and extract signals that cannot be extracted at an ordinary coordinate scale, which helps compress the dynamic range of the values, so that in the logarithmic coordinate system the comparison between the log power spectrum of the current frame and the log quantile of the previous frame is more accurate, facilitating subsequent accurate processing.
  • the third embodiment of the present application relates to an adaptive speech enhancement method.
  • a specific formula is provided to adaptively increase the log quantile of the previous frame by the incremental step size to obtain the log quantile of the current frame, which helps obtain the log quantile of the current frame directly, quickly, and accurately.
  • the specific process of the adaptive speech enhancement method in this embodiment is shown in FIG. 6 and includes:
  • Step 401 After receiving the voice signal, calculate the log power spectrum of the current frame of the voice signal according to the voice signal.
  • Step 402 Acquire a density function according to the comparison result of the log power spectrum of the current frame and the log quantile of the previous frame.
  • Step 403 Obtain the incremental step size of the current frame according to the density function.
  • Steps 401 to 403 are substantially the same as steps 201 to 203 in the second embodiment, and will not be repeated here to avoid repetition.
  • Step 404 Determine whether the log power spectrum of the current frame is greater than or equal to the log quantile of the previous frame. If yes, perform step 405; otherwise, perform step 406.
  • Step 405: Obtain the log quantile of the current frame by adaptively increasing the log quantile of the previous frame by the incremental step size: lq(λ,k)=lq(λ-1,k)+α·delta(λ,k)/β, where λ is the current frame number, k is the number of frequency points, and α and β are empirical experimental values.
  • Step 406: Obtain the log quantile of the current frame by adaptively decreasing the log quantile of the previous frame by the incremental step size: lq(λ,k)=lq(λ-1,k)-(1-α)·delta(λ,k)/β.
  • Step 407: Obtain the noise estimate of the current frame according to the log quantile of the current frame.
  • Step 408 Obtain the pure voice signal according to the noise estimate.
  • Steps 407 to 408 are substantially the same as steps 205 to 206 in the second embodiment, and will not be repeated here to avoid repetition.
  • this embodiment provides specific formulas for adaptively increasing or decreasing the log quantile of the previous frame by the incremental step size to obtain the log quantile of the current frame, which helps obtain the log quantile of the current frame directly, quickly, and accurately, and thus facilitates noise estimation based on the log quantile of the current frame.
  • the fourth embodiment of the present application relates to an electronic device, as shown in FIG. 7, including at least one processor 501 and a memory 502 communicatively connected to the at least one processor 501, wherein the memory 502 stores instructions executable by the at least one processor 501, and the instructions are executed by the at least one processor 501 so that the at least one processor 501 can perform the adaptive speech enhancement method described above.
  • the bus may include any number of interconnected buses and bridges.
  • the bus connects one or more processors 501 and various circuits of the memory 502 together.
  • the bus can also connect various other circuits such as peripheral devices, voltage regulators, and power management circuits, etc., which are well known in the art, and therefore, they will not be described further herein.
  • the bus interface provides an interface between the bus and the transceiver.
  • the transceiver can be a single element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other devices on the transmission medium.
  • the data processed by the processor 501 is transmitted on the wireless medium through the antenna. Further, the antenna also receives the data and transmits the data to the processor 501.
  • the processor 501 is responsible for managing the bus and general processing, and can also provide various functions, including timing, peripheral interfaces, voltage regulation, power management, and other control functions.
  • the memory 502 may be used to store data used by the processor 501 when performing operations.
  • a storage medium includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store program code.

Abstract

A self-adaptive speech enhancement method and an electronic device. The self-adaptive speech enhancement method comprises: after receiving a speech signal, calculating, according to the speech signal, the power of a current frame of the speech signal (101); comparing the power of the current frame with the noise power of a previous frame (102); acquiring, according to a comparison result and the noise power of the previous frame, a noise estimation value of the current frame (103); and acquiring, according to the noise estimation value, a pure speech signal (104). The use of this method makes the estimation of noise more accurate and reduces the complexity of an algorithm, thereby facilitating the enhancement of a speech signal and improving the quality of human auditory perception.

Description

Adaptive speech enhancement method and electronic device

Technical field

This application relates to the field of information processing technology, and in particular to an adaptive speech enhancement method and electronic device.
Background art

In real life, since the speaker is often in various noisy environments, the speech signal is inevitably polluted by background noise, and background noise sharply degrades the performance of many speech processing systems. As a signal processing method, speech enhancement is an efficient way to address noise pollution. On the one hand, speech enhancement can improve the clarity, intelligibility, and comfort of speech in a noisy environment, improving the quality of human auditory perception; on the other hand, speech enhancement is also an indispensable part of a speech processing system: before performing various speech signal processing operations, speech enhancement must first be performed to reduce the impact of noise on the speech processing system and improve the performance of the system.

Speech enhancement mainly includes two parts: noise estimation and filter coefficient solution. Representative speech enhancement methods include spectral subtraction, Wiener filtering, minimum mean square error estimation, subspace methods, and wavelet-transform-based enhancement methods. Most of these methods are based on statistical models of the speech and noise components in frequency, combined with various estimation theories, to design targeted noise cancellation techniques.

In the speech enhancement algorithms of the prior art, there are problems of inaccurate noise estimation and high algorithm complexity.
Summary of the invention

The purpose of some embodiments of the present application is to provide an adaptive speech enhancement method that makes the estimation of noise more accurate and reduces the complexity of the algorithm, thereby facilitating the enhancement of speech signals and improving the quality of human auditory perception.

An embodiment of the present application provides an adaptive speech enhancement method, including: after receiving a speech signal, calculating the power of the current frame of the speech signal according to the speech signal; comparing the power of the current frame with the noise power of the previous frame; obtaining the noise estimate of the current frame according to the result of the comparison and the noise power of the previous frame; and obtaining a pure speech signal according to the noise estimate.

An embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the adaptive speech enhancement method described above.
Compared with the prior art, the embodiments of the present application calculate the power of the current frame of the speech signal according to the received speech signal, compare the power of the current frame with the noise power of the previous frame, and obtain the noise estimate of the current frame according to the result of the comparison and the noise power of the previous frame. The noise estimation does not need a VAD algorithm to detect whether the current frame is a speech frame or a noise frame, thereby avoiding the large noise estimation deviation caused by inaccurate VAD detection and facilitating rapid estimation of the noise component in the speech signal. This application uses an iterative estimation method: the noise power of each frame is adaptively updated, and the power of the current frame is compared with the noise power of the previous frame to estimate the noise value of the current frame; in the continuous iteration process, the estimated noise value becomes more and more accurate. Moreover, in this application, the power is recalculated for each frame, which enables continuous estimation and continuous updating of the noise; it is only necessary to compare the power of the current frame with the noise power of the previous frame, without storing the previous D frames of data or sorting them by power, thereby reducing the resource overhead and the complexity of the algorithm. Obtaining the pure speech signal based on the noise estimate facilitates enhancing the speech signal and improving the quality of human auditory perception.
For example, the power of the current frame is specifically the log power spectrum of the current frame, and the noise power of the previous frame is specifically the log quantile of the previous frame. Logarithmic coordinates can amplify details and can extract signals that cannot be extracted at an ordinary coordinate scale, which helps compress the dynamic range of the values, so that in the logarithmic coordinate system the comparison between the log power spectrum of the current frame and the log quantile of the previous frame is more accurate, facilitating subsequent accurate processing.
For example, obtaining the noise estimate of the current frame according to the comparison result and the noise power of the previous frame specifically includes: obtaining the incremental step size of the current frame according to the comparison result of the log power spectrum of the current frame and the log quantile of the previous frame; obtaining the log quantile of the current frame according to the log quantile of the previous frame and the incremental step size of the current frame; and obtaining the noise estimate of the current frame according to the log quantile of the current frame. The incremental step size of the current frame provides a targeted reference for obtaining the log quantile of the current frame, which helps obtain the log quantile of the current frame accurately and thus estimate the noise value of the current frame accurately.
For example, obtaining the log quantile of the current frame according to the log quantile of the previous frame and the incremental step size of the current frame specifically includes: if the log power spectrum of the current frame is greater than or equal to the log quantile of the previous frame, adaptively increasing the log quantile of the previous frame by the incremental step size to obtain the log quantile of the current frame; if the log power spectrum of the current frame is less than the log quantile of the previous frame, adaptively decreasing the log quantile of the previous frame by the incremental step size to obtain the log quantile of the current frame. Adaptively increasing or decreasing the log quantile of the previous frame by the incremental step size helps obtain the log quantile of the current frame accurately.

For example, obtaining the incremental step size of the current frame according to the comparison result of the log power spectrum of the current frame and the log quantile of the previous frame specifically includes: obtaining a density function according to the comparison result of the log power spectrum of the current frame and the log quantile of the previous frame; and obtaining the incremental step size of the current frame according to the density function. This provides a way to obtain the incremental step size of the current frame.
For example, the density function density is specifically obtained by the following formula:

Figure PCTCN2018117972-appb-000001

where λ is the frame number of the current frame, k is the number of frequency points, β is an empirical experimental value, ξ is a preset threshold, log(|Y_w(λ)|²) is the log power spectrum of the current frame, and lq(λ-1,k) is the log quantile of the previous frame. This provides a specific calculation formula for obtaining the density function, which helps obtain the density function quickly and accurately.
For example, the incremental step size of the current frame is obtained according to the density function; specifically, the incremental step size delta is obtained by the following formula:

Figure PCTCN2018117972-appb-000002

where λ is the frame number of the current frame, K is the incremental step size control factor, and density(λ-1,k) is the density function of the previous frame. This provides a specific calculation formula for obtaining the incremental step size, which helps obtain the incremental step size quickly and accurately.
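The density and step-size formulas are available here only as images. In quantile-tracking estimators of this general family, the density function is typically a recursive estimate of how often the log power spectrum lands within the threshold ξ of the current quantile, and the incremental step shrinks as that density grows, e.g. delta = K/density. The sketch below implements that generic construction under those assumptions; it reuses the symbols defined in the text (β, ξ, K) but is not a transcription of the patent's exact formula images, and the constants are placeholders.

```python
def update_density(density_prev, log_power, lq_prev, beta=0.95, xi=0.5):
    # recursive estimate of how often the log power falls within xi of the
    # previous frame's log quantile (an assumed form, not the patent image)
    hit = 1.0 / (2.0 * xi) if abs(log_power - lq_prev) < xi else 0.0
    return beta * density_prev + (1.0 - beta) * hit

def step_size(density_prev, K=0.1, floor=1e-4):
    # denser observations near the quantile -> smaller, more cautious steps
    return K / max(density_prev, floor)
```

Under this reading, frames whose log power clusters tightly around the quantile raise the density and therefore shrink the step, which stabilizes the tracked quantile in stationary noise.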
For example, adaptively increasing the log quantile of the previous frame by the incremental step size to obtain the log quantile of the current frame is specifically done by the following formula: lq(λ,k)=lq(λ-1,k)+α·delta(λ,k)/β. Adaptively decreasing the log quantile of the previous frame by the incremental step size to obtain the log quantile of the current frame is specifically done by the following formula: lq(λ,k)=lq(λ-1,k)-(1-α)·delta(λ,k)/β. Here λ is the frame number of the current frame, k is the number of frequency points, α is an empirical experimental value, and delta(λ,k) is the incremental step size. These calculation formulas for adaptively increasing and decreasing the log quantile help obtain the log quantile of the current frame directly, quickly, and accurately.
For example, obtaining the pure speech signal according to the noise estimate specifically includes: obtaining the power spectrum of the current frame of the speech signal; obtaining the spectral gain coefficient according to the noise estimate; and obtaining the pure speech signal of the current frame according to the spectral gain coefficient. This helps adaptively track the change of noise in each frame and perform speech enhancement on every frame, improving the clarity, intelligibility, and comfort of speech in a noisy environment, reducing the impact of noise on the speech processing system, and improving the performance of the system.
For example, obtaining the spectral gain coefficient according to the noise estimate specifically includes: calculating the a priori signal-to-noise ratio according to the noise estimate of the previous frame and the pure speech signal of the previous frame; calculating the a posteriori signal-to-noise ratio according to the noise estimate of the current frame and the power of the current frame; and obtaining the spectral gain coefficient according to the a priori signal-to-noise ratio and the a posteriori signal-to-noise ratio. This provides a way to obtain the spectral gain coefficient.
For example, obtaining the spectral gain coefficient according to the a priori signal-to-noise ratio and the a posteriori signal-to-noise ratio specifically includes obtaining the spectral gain coefficient according to the following formula:

Figure PCTCN2018117972-appb-000003

where γ_k is the a posteriori signal-to-noise ratio, ξ_k is the a priori signal-to-noise ratio,

Figure PCTCN2018117972-appb-000004

p is the perceptual weighting order, and β is the order of the high-order amplitude spectrum. This provides a specific calculation formula for obtaining the spectral gain coefficient, which helps obtain the spectral gain coefficient accurately and quickly.
For example, calculating the signal-to-noise ratios of several subbands specifically includes calculating the signal-to-noise ratios of the several subbands by the following formula:

Figure PCTCN2018117972-appb-000005

where b is the index of the subband, k is the number of frequency points, B_low(b) is the starting frequency bin of the b-th subband in the Bark domain, and B_up(b) is the ending frequency bin of the b-th subband in the Bark domain. This takes into account that the human ear is more sensitive to speech in the Bark domain, as well as the masking mechanism of the human ear, which helps improve the quality of human auditory perception.
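The subband SNR formula itself is a formula image, but the structure it describes — energies accumulated over the frequency bins from B_low(b) to B_up(b) of Bark subband b — can be sketched as below. Expressing the ratio in dB and dividing by the summed noise power are assumptions about the missing formula, not a transcription of it.

```python
import math

def subband_snr_db(signal_power, noise_power, b_low, b_up):
    # accumulate per-bin powers over the subband's bins b_low..b_up (inclusive)
    sig = sum(signal_power[b_low:b_up + 1])
    noi = sum(noise_power[b_low:b_up + 1])
    return 10.0 * math.log10(sig / max(noi, 1e-12))
```

Grouping bins into Bark-scale subbands before forming the SNR is what lets the later perceptual weighting follow the ear's frequency resolution rather than the uniform FFT grid.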
For example, calculating the perceptual weighting order according to the signal-to-noise ratios of several subbands is specifically done by calculating the perceptual weighting order p with the following formula:

p(b,k) = max{min[α1·SNR(b,k)+α2, p_max], p_min}

where α1, α2, p_min, and p_max are all empirical experimental values. This provides a specific calculation formula for obtaining the perceptual weighting order, which helps obtain the perceptual weighting order accurately and quickly.
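The clamped affine mapping for p is simple enough to state directly in code; the constants below are placeholders standing in for the empirical values α1, α2, p_min, and p_max, which are not disclosed here.

```python
def perceptual_order(snr_db, a1=0.1, a2=1.0, p_min=0.5, p_max=2.0):
    # p(b,k) = max{min[a1*SNR(b,k) + a2, p_max], p_min}
    return max(min(a1 * snr_db + a2, p_max), p_min)
```

The order thus grows linearly with the subband SNR but is clamped to [p_min, p_max], so very clean or very noisy subbands do not push the estimator to extreme orders.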
For example,

Figure PCTCN2018117972-appb-000006

and

Figure PCTCN2018117972-appb-000007

are specifically obtained as follows: according to the pre-stored input-output correspondence of the Γ function, query

Figure PCTCN2018117972-appb-000008

and

Figure PCTCN2018117972-appb-000009

and

Figure PCTCN2018117972-appb-000010

is specifically obtained as follows: according to the pre-stored input-output correspondence of the Φ function, query

Figure PCTCN2018117972-appb-000011

and

Figure PCTCN2018117972-appb-000012

Querying according to the correspondence greatly reduces the computational complexity of the method, reduces the amount of calculation, and makes it more suitable for engineering application.
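The lookup-based evaluation described here — precompute a function's input-output pairs once, then answer queries from the table — can be sketched generically as follows. The standard library's gamma function stands in for the patent's tabulated Γ- and Φ-related quantities (which are only available as formula images), and linear interpolation between grid points is an assumed detail.

```python
import bisect
import math

def build_table(fn, lo, hi, n):
    # pre-store n input-output pairs of fn on [lo, hi]
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return xs, [fn(x) for x in xs]

def lookup(xs, ys, x):
    # clamp outside the table, otherwise linearly interpolate neighbours
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, x)
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])

xs, ys = build_table(math.gamma, 1.0, 5.0, 401)
approx = lookup(xs, ys, 3.456)
```

Replacing direct evaluation with a table query trades a small interpolation error for a large reduction in per-frame computation, which is exactly the engineering trade-off the text describes.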
For example, the pure speech signal is obtained according to the spectral gain coefficient, specifically by the following formula:

Figure PCTCN2018117972-appb-000013

where Y_w(k) is the signal amplitude of the current frame. This provides a specific formula for obtaining the pure speech signal, which helps obtain the pure speech signal of the current frame quickly and accurately.
Brief description of the drawings

One or more embodiments are exemplarily illustrated by the figures in the corresponding drawings. These exemplary descriptions do not constitute a limitation on the embodiments, and elements with the same reference numerals in the drawings represent similar elements. Unless otherwise stated, the figures in the drawings are not drawn to scale.

FIG. 1 is a flowchart of the adaptive speech enhancement method according to the first embodiment of the present application;

FIG. 2 is a schematic diagram of the Kaiser window function according to the first embodiment of the present application;

FIG. 3 is a schematic diagram of the sub-steps of step 104 in the first embodiment of the present application;

FIG. 4 is a flowchart of the adaptive speech enhancement method according to the second embodiment of the present application;

FIG. 5 is a schematic diagram of the modules implementing the adaptive speech enhancement method according to the second embodiment of the present application;

FIG. 6 is a flowchart of the adaptive speech enhancement method according to the third embodiment of the present application;

FIG. 7 is a schematic structural diagram of the electronic device according to the fourth embodiment of the present application.
Detailed description

In order to make the purpose, technical solutions, and advantages of the present application clearer, some embodiments of the present application are described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not used to limit it. The division into the following embodiments is for convenience of description and should not constitute any limitation on the specific implementation of the present invention; the embodiments can be combined with and reference each other without contradiction.
The first embodiment of the present application relates to an adaptive speech enhancement method, including: after receiving a speech signal, calculating the power of the current frame of the speech signal according to the speech signal; comparing the power of the current frame with the adaptively updated noise power, where the adaptively updated noise power is the noise power of the previous frame of the speech signal; obtaining the noise estimate of the current frame according to the result of the comparison; and obtaining the pure speech signal according to the noise estimate. This makes the estimation of noise more accurate and reduces the complexity of the algorithm, thereby facilitating the enhancement of the speech signal and improving the quality of human auditory perception. The implementation details of the adaptive speech enhancement method of this embodiment are described in detail below; the following content is provided only to facilitate understanding and is not necessary for implementing this solution.
The adaptive speech enhancement method of this embodiment can be applied in the field of speech signal processing and is suitable for low-power speech enhancement, speech recognition, and voice interaction products, including but not limited to electronic devices such as earphones, speakers, mobile phones, televisions, automobiles, wearable devices, and smart home devices.
The specific process of the adaptive speech enhancement method in this embodiment is shown in FIG. 1 and includes:

Step 101: After receiving the speech signal, calculate the power of the current frame of the speech signal according to the speech signal.
Specifically, after the speech signal is received, it can be transformed between the time domain and the frequency domain to obtain the frequency-domain speech; the frequency domain is a coordinate system used to describe the characteristics of the speech signal in terms of frequency. The transformation of the speech signal from the time domain to the frequency domain is mainly realized by the Fourier series and the Fourier transform: periodic signals rely on the Fourier series, and aperiodic signals rely on the Fourier transform. Generally, the wider a speech signal is in the time domain, the shorter it is in the frequency domain. The power of the current frame is obtained according to the amplitude of the current frame of the frequency-domain speech signal.
In one example, assume that the sampling rate of the speech signal is Fs = 8000 Hz and the data length generally processed is between 8 ms and 30 ms; the speech signal can be processed 64 points at a time with an overlap of 64 points with the previous frame, so that 128 points are actually processed at a time, i.e., the overlap rate between the current frame and the previous frame is 50%. In practical applications, however, it is not limited to this. The received speech signal is pre-emphasized to boost its high-frequency components; the specific operation can be:

Figure PCTCN2018117972-appb-000014

where α is a smoothing factor; in this embodiment α can take the value 0.98, but in practical applications it can be set differently according to actual needs. y(n) is the sampled speech signal of the current frame, and y(n-1) is the sampled speech signal of the previous frame.
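The pre-emphasis operation (whose formula appears above only as an image) is conventionally the first-order difference y'(n) = y(n) − α·y(n−1), which matches the symbols α, y(n), and y(n−1) defined in the text; the sketch below implements that reading together with its exact inverse, the de-pre-emphasis low-pass of module 310. The difference form itself is an assumption about the formula image.

```python
ALPHA = 0.98  # smoothing factor, per the text

def pre_emphasis(x, alpha=ALPHA):
    # assumed form of the formula image: y'(n) = y(n) - alpha * y(n-1)
    out, prev = [], 0.0
    for v in x:
        out.append(v - alpha * prev)
        prev = v
    return out

def de_pre_emphasis(y, alpha=ALPHA):
    # exact inverse, a one-pole low-pass: x(n) = y(n) + alpha * x(n-1)
    out, prev = [], 0.0
    for v in y:
        prev = v + alpha * prev
        out.append(prev)
    return out
```

Chaining the two recovers the input signal, which is the "mutually inverse processes" property attributed to modules 301 and 310.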
Further, after pre-emphasis, the signal can be truncated with a truncation function in order to reduce spectral energy leakage; this truncation function is called a window function, i.e. the speech signal is windowed. Depending on the application scenario, the window may be a rectangular, Hamming, Hanning, or Gaussian window, chosen flexibly in the actual design. This embodiment uses the Kaiser window function shown in FIG. 2, with 50% overlap.
In addition, since the power of the current frame of the speech signal is usually computed in the frequency domain, the windowed data can be converted to a frequency-domain signal by the fast Fourier transform (FFT):

$y_w(n) = w(n)\,y(n)$

$Y_w(m,k) = \sum_{n=1}^{N} y_w(n)\,e^{-j2\pi kn/N}$
where k is the frequency bin index, w(n) is the Kaiser window function, and N is 128, i.e. 128 points are actually processed at a time; this embodiment only takes N = 128 as an example, and practical applications are not limited to it. m is the frame number, and n ranges from 1 to 128. The power of the current frame is obtained by taking the magnitude of the transformed frequency-domain signal at each of the 128 frequency bins and squaring each magnitude.
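The framing, windowing, FFT, and power computation above can be sketched as follows; the Kaiser shape parameter `beta` is an assumption (the patent only shows its window in FIG. 2), and `numpy.kaiser` stands in for it:

```python
import numpy as np

N, M = 128, 64  # FFT size and hop: 50% overlap between consecutive frames

def frame_power_spectrum(y: np.ndarray, beta: float = 8.0) -> np.ndarray:
    """Split y into 50%-overlapping N-sample frames, window each with a
    Kaiser window, FFT it, and return the per-bin power |Y_w(m,k)|^2."""
    w = np.kaiser(N, beta)
    n_frames = (len(y) - N) // M + 1
    P = np.empty((n_frames, N))
    for m in range(n_frames):
        frame = y[m * M : m * M + N] * w   # y_w(n) = w(n) y(n)
        Y = np.fft.fft(frame, N)           # Y_w(m, k)
        P[m] = np.abs(Y) ** 2              # power = squared magnitude
    return P

P = frame_power_spectrum(np.random.randn(8000))
print(P.shape)
```

Each row of `P` is the 128-bin power spectrum of one frame, which is exactly the quantity compared against the noise power in step 102.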
Step 102: Compare the power of the current frame with the noise power of the previous frame.
Specifically, the noise power of the previous frame is an adaptively updated noise power. In practice, the noise power can first be initialized from an experimental value; if the current frame is the first frame, the power of the current frame is compared with this initialized noise power. "Adaptively updated" means that the noise power differs from frame to frame: once its initial value is set, the noise power of the current frame is updated adaptively during iteration. For example, the powers of the 128 frequency bins of the current frame are compared with the powers of the 128 frequency bins of the previous frame, and the noise power corresponding to each frequency bin of the current frame is updated adaptively.
Step 103: Obtain the noise estimate of the current frame according to the comparison result and the noise power of the previous frame.
Specifically, if the power of the current frame is greater than the noise power of the previous frame, the noise power of the previous frame can be adaptively increased and used as the noise estimate of the current frame; for example, an increment step size can be preset, and the increase performed according to it. Preferably, the increment step size can itself be updated adaptively during iteration. If the power of the current frame is less than the noise power of the previous frame, the noise power of the previous frame can be adaptively decreased, and the decreased noise power used as the noise estimate of the current frame.
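A minimal sketch of the up/down noise tracking described in steps 102 and 103; the multiplicative factors `up` and `down` are illustrative stand-ins for the patent's adaptive increment step, not its actual values:

```python
import numpy as np

def update_noise(noise_prev: np.ndarray, power: np.ndarray,
                 up: float = 1.05, down: float = 0.95) -> np.ndarray:
    """Per-bin noise tracking: raise the estimate where the frame power
    exceeds it, lower it elsewhere (ties are treated as 'not exceeding')."""
    return np.where(power > noise_prev, noise_prev * up, noise_prev * down)

noise = np.full(4, 10.0)
power = np.array([20.0, 5.0, 10.0, 30.0])
print(update_noise(noise, power))
```

Because the estimate moves a bounded step each frame in both noise frames and speech frames, it follows slowly varying noise without any voiced/unvoiced detector, which is the first technical effect claimed below.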
Step 104: Obtain the pure speech signal according to the noise estimate.
Specifically, step 104 may include the following sub-steps, as shown in FIG. 3:
Step 1041: Calculate the a priori signal-to-noise ratio based on the noise estimate of the previous frame and the pure speech signal of the previous frame.
Specifically, the a priori signal-to-noise ratio can be calculated with the classic improved decision-directed method:

$\hat{\xi}(\lambda,k) = a\,\dfrac{|\hat{X}(\lambda-1,k)|^{2}}{\lambda_d(\lambda-1,k)} + (1-a)\max\{\gamma(\lambda,k)-1,\;0\}$

$\xi(\lambda,k) = \max\{\hat{\xi}(\lambda,k),\;\xi_{min}\}$

where a is a smoothing factor, ξ_min is a preset empirical value, |X̂(λ−1,k)|² is the pure speech signal power of the previous frame, and λ is the frame number of the current frame. In one example, a may take the value 0.98 and ξ_min may be −15 dB from experience, although practical applications are not limited to these values.
It should be noted that this embodiment calculates the a priori signal-to-noise ratio with the above formula only as an example; practical applications are not limited to it.
Step 1042: Calculate the a posteriori signal-to-noise ratio according to the noise estimate of the current frame and the power of the current frame.
Specifically, the a posteriori signal-to-noise ratio can be calculated according to the following formula:

$\gamma_k = \dfrac{|Y_w(k)|^{2}}{\lambda_d(k)}$

where |Y_w(k)|² is the power of the current frame and λ_d(k) is the noise estimate of the current frame.
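Steps 1041 and 1042 can be sketched together; this assumes the textbook definitions γ_k = |Y_w(k)|²/λ_d(k) and the classic decision-directed a priori estimate with a = 0.98 and a −15 dB floor as stated in the text (the patent's own formulas appear only as images):

```python
import numpy as np

A_DD = 0.98
XI_MIN = 10 ** (-15 / 10)   # -15 dB floor as a linear power ratio

def posterior_snr(power: np.ndarray, noise: np.ndarray) -> np.ndarray:
    """gamma_k = |Y_w(k)|^2 / lambda_d(k)."""
    return power / noise

def prior_snr_dd(clean_prev_power: np.ndarray, noise_prev: np.ndarray,
                 gamma: np.ndarray) -> np.ndarray:
    """Decision-directed a priori SNR: smooth the previous frame's clean-speech
    SNR with the instantaneous (gamma - 1), then apply the xi_min floor."""
    xi = A_DD * clean_prev_power / noise_prev \
         + (1 - A_DD) * np.maximum(gamma - 1.0, 0.0)
    return np.maximum(xi, XI_MIN)

gamma = posterior_snr(np.array([4.0, 0.5]), np.array([1.0, 1.0]))
print(prior_snr_dd(np.array([2.0, 0.0]), np.array([1.0, 1.0]), gamma))
```

The floor keeps the gain rule from over-suppressing bins where the estimate would otherwise collapse to zero during noise-only frames.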
It should be noted that this embodiment calculates the a posteriori signal-to-noise ratio with the above formula only as an example; practical applications are not limited to it. Moreover, this embodiment does not restrict the execution order of step 1041 and step 1042: in practice, step 1042 may be executed before step 1041, or step 1041 and step 1042 may be executed simultaneously.
Step 1043: Calculate the perceptual weighting order p.
Specifically, the parameter p can be calculated adaptively from the subband signal-to-noise ratio and the characteristics of the Bark domain. In the frequency spectrum of the speech signal, the Bark domain can be divided into a number of subbands; for example, into 18 subbands whose upper-limit frequencies are [100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400] Hz. Because the human ear is more sensitive to speech in the Bark domain, the signal-to-noise ratio is computed per subband, as the ratio of summed signal power to summed noise power in each subband:

$SNR(b) = 10\log_{10}\left(\sum_{k=B_{low}(b)}^{B_{up}(b)} |Y_w(k)|^{2} \Big/ \sum_{k=B_{low}(b)}^{B_{up}(b)} \lambda_d(k)\right)$

where b is the subband index with 1 ≤ b ≤ 18, k is the frequency bin index, B_low(b) is the first frequency bin of subband b in the Bark domain, and B_up(b) is the last frequency bin of subband b. Further, the parameter p can be calculated as:

$p(b,k) = \max\{\min[\alpha_1\,SNR(b,k) + \alpha_2,\;p_{max}],\;p_{min}\}$

where α₁, α₂, p_min, and p_max are experimental empirical values; in this embodiment they may take, for example, α₁ = 0.251, α₂ = −1.542, p_max = 4, p_min = −1, although practical applications are not limited to these values.
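The clamped affine mapping for p, using the empirical constants stated in the text, can be sketched as follows; the subband SNR helper assumes the power-ratio-in-dB form, which the patent shows only as an image:

```python
import numpy as np

ALPHA1, ALPHA2 = 0.251, -1.542
P_MAX, P_MIN = 4.0, -1.0

def perceptual_order_p(snr_db: np.ndarray) -> np.ndarray:
    """p = max(min(alpha1 * SNR + alpha2, p_max), p_min): the patent's clamp."""
    return np.maximum(np.minimum(ALPHA1 * snr_db + ALPHA2, P_MAX), P_MIN)

def subband_snr_db(power: np.ndarray, noise: np.ndarray,
                   lo: int, hi: int) -> float:
    """Assumed form of the subband SNR: summed power over summed noise, in dB."""
    return 10.0 * np.log10(np.sum(power[lo:hi]) / np.sum(noise[lo:hi]))

print(perceptual_order_p(np.array([-10.0, 0.0, 10.0, 40.0])))
```

Low-SNR subbands are pinned at p_min and very clean subbands at p_max, so the estimator's weighting only varies over the roughly 6 dB to 22 dB transition region.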
Step 1044: Calculate the order β of the higher-order amplitude spectrum.
Specifically, the order β of the higher-order amplitude spectrum is calculated by the following formula:

Figure PCTCN2018117972-appb-000023

where F_s is the sampling frequency, f(k) = k·F_s/N is the frequency represented by each bin after the FFT, and β_max, β_min, and A are experimental empirical values. For example, in this embodiment these values may be β_max = 0.8, β_min = 0.2, A = 165.4 Hz, although practical applications are not limited to them.
It should be noted that this embodiment does not restrict the execution order of step 1043 and step 1044: in practice, step 1044 may be executed before step 1043, or step 1043 and step 1044 may be executed simultaneously.
Step 1045: Obtain the spectral gain coefficient according to the a priori signal-to-noise ratio, the a posteriori signal-to-noise ratio, the perceptual weighting order, and the order of the higher-order amplitude spectrum.
Specifically, the core idea behind the spectral gain coefficient is Bayesian short-time amplitude spectrum estimation, whose cost function is:

Figure PCTCN2018117972-appb-000024

Following a derivation similar to that of the classic MMSE estimator, one obtains:

Figure PCTCN2018117972-appb-000025

Assuming that X_k and D_k both follow complex Gaussian distributions, this yields:

Figure PCTCN2018117972-appb-000026

Figure PCTCN2018117972-appb-000027

where $\xi_k = \lambda_x(k)/\lambda_d(k)$ is the theoretical formula for the a priori signal-to-noise ratio. Since the pure speech power λ_x(k) of the current frame is difficult to obtain in practice, the a priori signal-to-noise ratio ξ_k is usually estimated and approximated by:

$\hat{\xi}_k = a\,\dfrac{|\hat{X}(\lambda-1,k)|^{2}}{\lambda_d(\lambda-1,k)} + (1-a)\max\{\gamma_k-1,\;0\}$

From the above derivation, the formula for the spectral gain coefficient G is:

Figure PCTCN2018117972-appb-000030

As the expression for G shows, the spectral gain coefficient can be calculated from the a priori signal-to-noise ratio ξ_k, the a posteriori signal-to-noise ratio γ_k, and the parameters β and p.
Further, considering the complexity of the Γ function and the Φ function, the spectral gain coefficient can be computed by table lookup: the input-output correspondences of the Γ function and the Φ function are prestored. For example, the prestored input-output correspondence table of the Γ function is queried so that an input of Figure PCTCN2018117972-appb-000031 yields the corresponding output value Figure PCTCN2018117972-appb-000032, and an input of Figure PCTCN2018117972-appb-000033 yields the corresponding output value Figure PCTCN2018117972-appb-000034. Likewise, the prestored input-output correspondence table of the Φ function is queried: an input of Figure PCTCN2018117972-appb-000035 yields the corresponding output value Figure PCTCN2018117972-appb-000036, and an input of Figure PCTCN2018117972-appb-000037 yields the corresponding output value Figure PCTCN2018117972-appb-000038. Finally, the retrieved output values are substituted into the calculation expression for the spectral gain coefficient to obtain it, which greatly reduces the computational complexity of the method.
It should be noted that this embodiment obtains the spectral gain coefficient through the expression for G only as an example; practical applications are not limited to it.
Step 1046: Obtain the pure speech signal of the current frame according to the spectral gain coefficient.
Specifically, after the spectral gain coefficient is obtained, the pure speech signal $\hat{X}(k)$ of the current frame can be calculated according to the following formula:

$\hat{X}(k) = G(k)\,Y_w(k)$

where Y_w(k) is the signal amplitude of the current frame.
It should be noted that this embodiment obtains the pure speech signal $\hat{X}(k)$ through the above calculation formula only as an example; in practical applications, any method of obtaining the pure speech signal of the current frame through the spectral gain coefficient falls within the protection scope of this embodiment.
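Step 1046 amounts to scaling each noisy spectral bin by its gain, assuming the gain is applied multiplicatively per bin as the surrounding text describes:

```python
import numpy as np

def apply_gain(Y: np.ndarray, G: np.ndarray) -> np.ndarray:
    """X_hat(k) = G(k) * Y_w(k): attenuate each noisy bin by its gain.
    G in [0, 1] keeps the noisy phase and shrinks only the amplitude."""
    return G * Y

Y = np.array([1.0 + 1.0j, 2.0 + 0.0j])   # two noisy spectral bins
G = np.array([0.5, 0.1])                 # per-bin spectral gains
print(apply_gain(Y, G))
```

Because G is real-valued, the enhanced spectrum inherits the phase of the noisy observation; only the amplitude estimate changes, which is characteristic of short-time spectral amplitude estimators.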
Compared with the prior art, this embodiment has the following technical effects. First, unlike traditional noise estimation, no voiced/unvoiced speech detection is needed: the noise is updated in both noise frames and speech frames, so changes in the noise can be tracked adaptively. Second, compared with traditional quantile noise estimation, there is no need to store the previous D frames of data or to sort them by power, which reduces the algorithm's resource overhead. Third, when calculating the spectral gain coefficient, the masking mechanism of the human ear and its sensitivity to noise and spectral amplitude are both taken into account, and the parameters p and β are updated adaptively; compared with speech enhancement based on the traditional generalized weighted higher-order spectral estimator, this reduces the amount of computation and is more suitable for engineering use.
The second embodiment of the present application relates to an adaptive speech enhancement method. In this embodiment, the power of the current frame is specifically the log power spectrum of the current frame, and the noise power is specifically the log quantile. In a logarithmic coordinate system, the comparison between the log power spectrum of the current frame and the log quantile of the previous frame is more precise, which facilitates subsequent accurate processing.
The specific flow of the adaptive speech enhancement method in this embodiment is shown in FIG. 4 and includes:
Step 201: After the speech signal is received, calculate the log power spectrum of the current frame of the speech signal according to the speech signal.
Specifically, step 201 is substantially the same as step 101; the difference is that step 101 computes the power of the current frame, whereas this step computes the log power spectrum of the current frame, i.e. the logarithm of the computed frame power is also taken. For example, if the speech signal of the current frame is processed 64 samples at a time with a 64-sample overlap with the previous frame, 128 samples are actually processed at once, giving power values at 128 bins; taking the logarithm of each of the 128 power values yields the log powers of the 128 frequency bins, and these 128 log powers form the log power spectrum of the current frame.
Step 202: Obtain the density function according to the result of comparing the log power spectrum of the current frame with the log quantile of the previous frame.
Specifically, in this embodiment the initial log quantile and the initial density function can be preset; that is, the density function and the log quantile are first initialized from experimental values. For example, the log quantile initialized from an experimental value may be lq(1,k) = 8. If the current frame is the first frame, the log power spectrum of the first frame is compared with the initial log quantile. In subsequent processing, the density function of the current frame can be updated according to the log power spectrum of the current frame and the log quantile of the previous frame, specifically by the following formula:

Figure PCTCN2018117972-appb-000042

where λ is the frame number of the current frame, k is the frequency bin index, β is an experimental empirical value, ξ is a preset threshold, log(|Y_w(λ)|²) is the log power spectrum of the current frame, and lq(λ−1,k) is the log quantile of the previous frame.

It should be noted that this embodiment obtains the density function of the current frame through the above formula only as an example; practical applications are not limited to it.
Step 203: Obtain the increment step size of the current frame according to the density function.
Specifically, the initial increment step size can be preset. For example, the initial increment step size obtained after initialization from experimental values may be delta(1,k) = 40. In subsequent processing, the increment step size of the current frame is updated according to the density function of the previous frame, specifically by the following formula:

Figure PCTCN2018117972-appb-000043

where K is the increment step size control factor. If the current frame is the first frame, the increment step size control factor K is the initial increment step size.

It should be noted that this embodiment obtains the increment step size of the current frame through the above formula only as an example; in practical applications, any method of obtaining the increment step size of the current frame according to the density function falls within the protection scope of this embodiment.
Step 204: Obtain the log quantile of the current frame according to the log quantile of the previous frame and the increment step size of the current frame.
Specifically, if the log power spectrum of the current frame is greater than or equal to the log quantile of the previous frame, the log quantile of the previous frame can be adaptively increased according to the increment step size to obtain the log quantile of the current frame; if the log power spectrum of the current frame is less than the log quantile of the previous frame, the log quantile of the previous frame can be adaptively decreased according to the increment step size to obtain the log quantile of the current frame.
Step 205: Obtain the noise estimate of the current frame according to the log quantile of the current frame.
Specifically, after the log quantile lq(λ,k) of the current frame is obtained, the noise estimate can be calculated by exponentiating it:

$\hat{\lambda}_d(\lambda,k) = e^{lq(\lambda,k)}$
Step 206: Obtain the pure speech signal according to the noise estimate.
Step 206 is substantially the same as step 104 in the first embodiment and, to avoid repetition, is not described again here.
For convenience of description, this embodiment provides the block diagram shown in FIG. 5 to explain the adaptive speech enhancement method of this embodiment:
The pre-emphasis module 301 mainly implements the function of a high-pass filter, filtering out low-frequency components and enhancing high-frequency speech components; that is, it filters the low-frequency components out of the received noisy speech signal y(n) = x(n) + d(n), where x(n) is the pure speech signal and d(n) is the noise signal. The de-pre-emphasis module 310 is mainly a low-pass filter; the de-pre-emphasis module 310 and the pre-emphasis module 301 are mutually inverse processes, and combining the two can achieve a de-reverberation effect.
The windowing module 302 mainly avoids abrupt changes in the overlapping signal. The window synthesis module 309 mainly removes the effect of the window function on the output pure speech signal. In this embodiment, the windowing module 302 and the window synthesis module 309 use the same window function in their implementation; therefore, the window function must be a power-preserving mapping, i.e. the sum of squared window values over the overlapping part of the speech signal must equal 1, as in the following formula:

$w^{2}(n) + w^{2}(n+M) = 1$

where N is the number of points processed by the FFT, taken as 128, and M is the frame length, taken as 64.
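The power-preserving condition can be checked numerically. The square-root Hann window is used here as an illustrative power-complementary window satisfying w²(n) + w²(n+M) = 1 exactly for N = 128, M = 64; the embodiment's own Kaiser window is not reproduced, since its shape parameter is not given:

```python
import numpy as np

N, M = 128, 64

# Periodic Hann: h(n) = 0.5 * (1 - cos(2*pi*n/N)). Shifting by M = N/2 flips
# the cosine's sign, so h(n) + h(n+M) = 1, and sqrt(h) is power-complementary.
n = np.arange(N)
h = 0.5 * (1.0 - np.cos(2.0 * np.pi * n / N))
w = np.sqrt(h)

# Verify w^2(n) + w^2(n+M) = 1 over the overlapping half.
residual = w[:M] ** 2 + w[M:] ** 2 - 1.0
print(np.max(np.abs(residual)))
```

Using the same power-complementary window for analysis and synthesis is what lets the 50% overlap-add in module 309 reconstruct the signal without amplitude modulation.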
The fast Fourier transform (FFT) module 303 mainly performs the mutual conversion between time-domain and frequency-domain signals. The FFT module 303 and the inverse FFT module 308 are mutually inverse processes: the FFT module 303 converts the time-domain signal into a frequency-domain signal, from which the signal amplitude Y_w can be obtained, and the inverse FFT module 308 converts the frequency-domain signal back into a time-domain signal.
The power spectrum calculation module 304 obtains the power P of the current frame by squaring the amplitudes obtained from the frequency-domain signal. The log power spectrum calculation module 305 takes the logarithm of the power of the current frame to obtain the log power spectrum of the current frame. Modules 304 and 305 mainly constitute the preprocessing performed before noise estimation.
The noise estimation module 306 mainly performs noise estimation on the noisy speech signal, estimating the noise signal as accurately as possible; the noise estimate $\hat{\lambda}_d$ is obtained mainly according to the principle of adaptive quantile noise estimation.

The spectral gain calculation module 307 mainly computes the spectral gain coefficient G from the noise estimate and the power of the noisy speech signal. Specifically, the calculation of the spectral gain coefficient is based mainly on the principle of the generalized weighted higher-order short-time spectral amplitude estimator.
Further, the frequency-domain pure speech signal $\hat{X}(k)$ is obtained from the spectral gain coefficient G and the signal amplitude Y_w. The inverse FFT module 308 then converts this frequency-domain signal into a time-domain signal, which is processed by the window synthesis module 309 and the de-pre-emphasis module 310 to output the time-domain pure speech signal $\hat{x}(n)$, completing the enhancement of the speech signal.
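The analysis-synthesis chain of the block diagram (windowing, FFT, per-bin gain, inverse FFT, window synthesis with 50% overlap-add) can be sketched end to end. With the gain fixed at 1 and a power-complementary window, the interior of the signal is reconstructed exactly; the square-root Hann window is an illustrative stand-in for the Kaiser window, and the pre-emphasis/de-pre-emphasis stages are omitted:

```python
import numpy as np

N, M = 128, 64
# Power-complementary analysis/synthesis window: w^2(n) + w^2(n+M) = 1.
w = np.sqrt(0.5 * (1.0 - np.cos(2.0 * np.pi * np.arange(N) / N)))

def enhance(y: np.ndarray, gain_fn) -> np.ndarray:
    """Windowed FFT -> per-bin gain -> inverse FFT -> windowed overlap-add."""
    out = np.zeros(len(y))
    for start in range(0, len(y) - N + 1, M):
        frame = y[start:start + N] * w           # analysis window (module 302)
        Y = np.fft.rfft(frame)                   # FFT (module 303)
        X = gain_fn(Y) * Y                       # spectral gain G(k) (module 307)
        x = np.fft.irfft(X, N)                   # inverse FFT (module 308)
        out[start:start + N] += x * w            # synthesis window + OLA (module 309)
    return out

y = np.random.randn(1024)
y_hat = enhance(y, lambda Y: np.ones_like(Y.real))  # identity gain
err = np.max(np.abs(y_hat[M:-M] - y[M:-M]))         # edges lack full overlap
print(err)
```

Replacing the identity `gain_fn` with a gain computed from the noise estimate and the a priori/a posteriori SNRs turns this skeleton into the full enhancement path of FIG. 5.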
Compared with the prior art, this embodiment compares the log power spectrum of the current frame of the noisy speech with the log quantile of the previous frame and modifies the log quantile to obtain the noise estimate. This approach avoids the speech presence detection, large data storage, and power spectrum sorting operations of the prior art, reducing the algorithm's resource overhead. Moreover, logarithmic coordinates magnify detail and can bring out signals that cannot be extracted at an ordinary coordinate scale, which helps compress the dynamic range of the values; as a result, in the logarithmic coordinate system, the comparison between the log power spectrum of the current frame and the log quantile of the previous frame is more precise, facilitating subsequent accurate processing.
The third embodiment of the present application relates to an adaptive speech enhancement method. This embodiment provides a specific formula for adaptively increasing the log quantile of the previous frame according to the increment step size to obtain the log quantile of the current frame, which helps obtain the log quantile of the current frame directly, quickly, and accurately.
The specific flow of the adaptive speech enhancement method in this embodiment is shown in FIG. 6 and includes:
Step 401: After the speech signal is received, calculate the log power spectrum of the current frame of the speech signal according to the speech signal.

Step 402: Obtain the density function according to the result of comparing the log power spectrum of the current frame with the log quantile of the previous frame.

Step 403: Obtain the increment step size of the current frame according to the density function.

Steps 401 to 403 are substantially the same as steps 201 to 203 in the second embodiment and, to avoid repetition, are not described again here.
Step 404: Determine whether the log power spectrum of the current frame is greater than or equal to the log quantile of the previous frame; if so, execute step 405, otherwise execute step 406.
Step 405: Calculate the log quantile of the current frame according to the formula lq(λ,k) = lq(λ−1,k) + α·delta(λ,k)/β.
That is, when log(|Y_w(λ)|²) ≥ lq(λ−1,k), the log quantile of the current frame is obtained by adaptively increasing the log quantile of the previous frame according to the increment step size, specifically by the formula lq(λ,k) = lq(λ−1,k) + α·delta(λ,k)/β, where λ is the current frame number, k is the frequency bin index, and α and β are experimental empirical values. In this embodiment, the experimental empirical values may be α = 0.25 and β = 67, although practical applications are not limited to these values.
Step 406: Calculate the log quantile of the current frame according to the formula lq(λ,k) = lq(λ−1,k) − (1−α)·delta(λ,k)/β.
That is, when log(|Y_w(λ)|²) < lq(λ−1,k), the log quantile of the current frame is obtained by adaptively decreasing the log quantile of the previous frame according to the increment step size, specifically by the formula lq(λ,k) = lq(λ−1,k) − (1−α)·delta(λ,k)/β.
Step 407: Obtain the noise estimate of the current frame according to the formula $\hat{\lambda}_d(\lambda,k) = e^{lq(\lambda,k)}$.
Step 408: Obtain the pure speech signal according to the noise estimate.
Steps 407 to 408 are substantially the same as steps 205 to 206 in the second embodiment and, to avoid repetition, are not described again here.
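Steps 404 to 407 can be sketched as a per-bin quantile tracker using the stated constants α = 0.25 and β = 67. Two assumptions apply: the increment step `delta` is held fixed here because its density-function update is only shown as an image, and the exponentiation in the last line assumes the log power spectrum uses the natural logarithm:

```python
import numpy as np

ALPHA, BETA = 0.25, 67.0

def update_quantile(lq_prev: np.ndarray, log_power: np.ndarray,
                    delta: np.ndarray):
    """One frame of log-quantile tracking (steps 404-406) followed by the
    noise estimate (step 407). All arrays are per-frequency-bin."""
    up = lq_prev + ALPHA * delta / BETA              # log power >= quantile
    down = lq_prev - (1.0 - ALPHA) * delta / BETA    # log power <  quantile
    lq = np.where(log_power >= lq_prev, up, down)
    noise = np.exp(lq)   # assumed inverse of the (natural) log power spectrum
    return lq, noise

lq = np.full(3, 8.0)          # initialized log quantile lq(1,k) = 8
delta = np.full(3, 40.0)      # initialized increment step delta(1,k) = 40
lq_new, noise = update_quantile(lq, np.array([9.0, 7.0, 8.0]), delta)
print(lq_new)
```

With α = 0.25 the quantile rises a quarter-step when exceeded and falls three quarter-steps otherwise, so it settles near a low quantile of the log power distribution, which is dominated by noise-only frames.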
Compared with the prior art, this embodiment provides a specific formula for adaptively increasing the log quantile of the previous frame according to the increment step size to obtain the log quantile of the current frame. This makes it possible to obtain the log quantile of the current frame directly, quickly, and accurately from the increment step size of the current frame, and thus facilitates noise estimation based on the log quantile of the current frame.
The fourth embodiment of the present application relates to an electronic device. As shown in FIG. 7, it includes at least one processor 501 and a memory 502 communicatively connected to the at least one processor 501. The memory 502 stores instructions executable by the at least one processor 501, and the instructions are executed by the at least one processor 501 so that the at least one processor 501 can perform the adaptive speech enhancement method described above.
The memory 502 and the processor 501 are connected by a bus, which may include any number of interconnected buses and bridges linking the various circuits of the one or more processors 501 and the memory 502. The bus may also link various other circuits, such as peripheral devices, voltage regulators, and power management circuits, which are well known in the art and therefore not described further here. A bus interface provides an interface between the bus and a transceiver. The transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other apparatuses over a transmission medium. Data processed by the processor 501 is transmitted over a wireless medium through an antenna; further, the antenna also receives data and passes it to the processor 501.
The processor 501 is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory 502 may be used to store data used by the processor 501 when performing operations.
Those skilled in the art will understand that all or some of the steps in the methods of the above embodiments may be implemented by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Those of ordinary skill in the art will understand that the above embodiments are specific embodiments for implementing the present application, and that in practical applications various changes may be made to them in form and detail without departing from the spirit and scope of the present application.
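Taken together, the embodiments above describe one frame loop: a per-frame log power spectrum, quantile-based noise tracking, and a spectral gain applied to the noisy spectrum. The sketch below is a hedged illustration of that loop, not the patented implementation: the patent's gain formula and the exact mapping from log quantile to noise power are given only as formula images, so the exp(lq) mapping, the fixed step delta0, and the Wiener-style gain used here are assumptions.

```python
import numpy as np

def enhance(frames, n_fft=256, alpha=0.25, beta=67.0, delta0=1.0):
    """Illustrative frame loop: log power -> log-quantile noise -> gain."""
    lq = None
    out = []
    for frame in frames:
        spec = np.fft.rfft(frame, n_fft)
        power = np.abs(spec) ** 2 + 1e-12
        log_power = np.log(power)
        if lq is None:
            lq = log_power.copy()  # stands in for the preset initial log quantile
        else:
            step = delta0 / beta   # fixed delta for illustration (patent adapts it)
            lq = np.where(log_power >= lq,
                          lq + alpha * step,
                          lq - (1.0 - alpha) * step)
        noise = np.exp(lq)                             # assumed lq -> noise power mapping
        snr = np.maximum(power / noise - 1.0, 0.0)     # SNR from the a posteriori ratio
        gain = snr / (snr + 1.0)                       # Wiener-style gain (illustrative)
        out.append(np.fft.irfft(gain * spec, n_fft)[: len(frame)])
    return np.array(out)
```

Because the gain never exceeds 1, the output frames carry no more energy than the input frames; a stationary noise floor is progressively attenuated while bins that rise above the tracked quantile are kept.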

Claims (20)

  1. An adaptive speech enhancement method, characterized by comprising:
    after receiving a speech signal, calculating a power of a current frame of the speech signal according to the speech signal;
    comparing the power of the current frame with a noise power of a previous frame;
    obtaining a noise estimate of the current frame according to a result of the comparison and the noise power of the previous frame; and
    obtaining a clean speech signal according to the noise estimate.
  2. The adaptive speech enhancement method according to claim 1, wherein:
    the power of the current frame is a log power spectrum of the current frame; and
    the noise power of the previous frame is a log quantile of the previous frame.
  3. The adaptive speech enhancement method according to claim 2, wherein the obtaining the noise estimate of the current frame according to the result of the comparison and the noise power of the previous frame comprises:
    obtaining an incremental step size of the current frame according to a result of comparing the log power spectrum of the current frame with the log quantile of the previous frame;
    obtaining a log quantile of the current frame according to the log quantile of the previous frame and the incremental step size of the current frame; and
    obtaining the noise estimate of the current frame according to the log quantile of the current frame.
  4. The adaptive speech enhancement method according to claim 3, wherein the obtaining the log quantile of the current frame according to the log quantile of the previous frame and the incremental step size of the current frame comprises:
    if the log power spectrum of the current frame is greater than or equal to the log quantile of the previous frame, adaptively increasing the log quantile of the previous frame by the incremental step size to obtain the log quantile of the current frame; and
    if the log power spectrum of the current frame is less than the log quantile of the previous frame, adaptively decreasing the log quantile of the previous frame by the incremental step size to obtain the log quantile of the current frame.
  5. The adaptive speech enhancement method according to claim 3, further comprising:
    presetting an initial log quantile and an initial incremental step size.
  6. The adaptive speech enhancement method according to claim 3, wherein the obtaining the incremental step size of the current frame according to the result of comparing the log power spectrum of the current frame with the log quantile of the previous frame comprises:
    obtaining a density function according to the result of comparing the log power spectrum of the current frame with the log quantile of the previous frame; and
    obtaining the incremental step size of the current frame according to the density function.
  7. The adaptive speech enhancement method according to claim 6, wherein the obtaining the density function comprises:
    obtaining the density function density by the following formula:
    [formula image PCTCN2018117972-appb-100001]
    where λ is the frame number of the current frame, k is the number of frequency bins, β is an experimental value, ξ is a preset threshold, log(|Y_w(λ)|²) is the log power spectrum of the current frame, and lq(λ-1,k) is the log quantile of the previous frame.
  8. The adaptive speech enhancement method according to claim 6, wherein the obtaining the incremental step size of the current frame according to the density function comprises:
    obtaining the incremental step size delta by the following formula:
    [formula image PCTCN2018117972-appb-100002]
    where λ is the frame number of the current frame, K is an incremental step size control factor, and density(λ-1,k) is the density function of the previous frame.
  9. The adaptive speech enhancement method according to claim 4, wherein the adaptively increasing the log quantile of the previous frame by the incremental step size to obtain the log quantile of the current frame comprises:
    obtaining the log quantile of the current frame by the following formula:
    lq(λ,k)=lq(λ-1,k)+α·delta(λ,k)/β
    and the adaptively decreasing the log quantile of the previous frame by the incremental step size to obtain the log quantile of the current frame comprises:
    obtaining the log quantile of the current frame by the following formula:
    lq(λ,k)=lq(λ-1,k)-(1-α)·delta(λ,k)/β
    where λ is the frame number of the current frame, k is the number of frequency bins, α is an experimental empirical value, and delta(λ,k) is the incremental step size.
  10. The adaptive speech enhancement method according to claim 3, wherein the obtaining the noise estimate of the current frame according to the log quantile of the current frame comprises:
    obtaining the noise estimate of the current frame by the following formula:
    [formula image PCTCN2018117972-appb-100003]
    where [formula image PCTCN2018117972-appb-100004] is the noise estimate, lq(λ,k) is the log quantile of the current frame, λ is the frame number of the current frame, and k is the number of frequency bins.
  11. The adaptive speech enhancement method according to claim 1, wherein the obtaining the clean speech signal according to the noise estimate comprises:
    obtaining a spectral gain coefficient according to the noise estimate; and
    obtaining the clean speech signal of the current frame according to the spectral gain coefficient.
  12. The adaptive speech enhancement method according to claim 11, wherein the obtaining the spectral gain coefficient according to the noise estimate comprises:
    calculating an a priori signal-to-noise ratio according to the noise estimate of the previous frame and the clean speech signal of the previous frame;
    calculating an a posteriori signal-to-noise ratio according to the noise estimate of the current frame and the power of the current frame; and
    obtaining the spectral gain coefficient according to the a priori signal-to-noise ratio and the a posteriori signal-to-noise ratio.
  13. The adaptive speech enhancement method according to claim 12, wherein the obtaining the spectral gain coefficient according to the a priori signal-to-noise ratio and the a posteriori signal-to-noise ratio comprises:
    obtaining the spectral gain coefficient G according to the following formula:
    [formula image PCTCN2018117972-appb-100005]
    where γ_k is the a posteriori signal-to-noise ratio, ξ_k is the a priori signal-to-noise ratio, [formula image PCTCN2018117972-appb-100006], p is the perceptual weighting order, and β is the order of the higher-order amplitude spectrum.
  14. The adaptive speech enhancement method according to claim 13, wherein the perceptual weighting order is obtained by:
    dividing, in the frequency spectrum of the speech signal, the frequency band of the Bark domain into several subbands;
    calculating signal-to-noise ratios of the several subbands; and
    calculating the perceptual weighting order according to the signal-to-noise ratios of the several subbands.
  15. The adaptive speech enhancement method according to claim 14, wherein the calculating the signal-to-noise ratios of the several subbands comprises:
    calculating the signal-to-noise ratio SNR of the several subbands by the following formula:
    [formula image PCTCN2018117972-appb-100007]
    where b is the index of the subband, k is the number of frequency bins, B_low(b) is the starting frequency bin of the b-th subband of the Bark domain, and B_up(b) is the ending frequency bin of the b-th subband of the Bark domain.
  16. The adaptive speech enhancement method according to claim 15, wherein the calculating the perceptual weighting order according to the signal-to-noise ratios of the several subbands comprises:
    calculating the perceptual weighting order p by the following formula:
    p(b,k)=max{min[α_1·SNR(b,k)+α_2, p_max], p_min}
    where α_1, α_2, p_min and p_max are all experimental empirical values.
  17. The adaptive speech enhancement method according to claim 13, wherein the order of the higher-order amplitude spectrum is obtained by:
    dividing, in the frequency spectrum of the speech signal, the Bark domain into several subbands; and
    calculating the order β of the higher-order amplitude spectrum by the following formula:
    [formula image PCTCN2018117972-appb-100008]
    where F_s is the sampling frequency; β_min, β_max, p_min, p_max and A are all experimental empirical values; b is the index of the subband; k is the number of frequency bins; B_low(b) is the starting frequency bin of the b-th subband of the Bark domain; B_up(b) is the ending frequency bin of the b-th subband of the Bark domain; and f(k)=kF_s/N is the frequency of the k-th frequency bin after performing a fast Fourier transform on the received speech signal.
  18. The adaptive speech enhancement method according to claim 13, wherein:
    [formula image PCTCN2018117972-appb-100009] and [formula image PCTCN2018117972-appb-100010] are obtained by querying a prestored input-output correspondence of the Γ function; and
    [formula image PCTCN2018117972-appb-100011] and [formula image PCTCN2018117972-appb-100012] are obtained by querying a prestored input-output correspondence of the Φ function.
  19. The adaptive speech enhancement method according to claim 13, wherein the obtaining the clean speech signal according to the spectral gain coefficient comprises:
    obtaining the clean speech signal [formula image PCTCN2018117972-appb-100013] by the following formula:
    [formula image PCTCN2018117972-appb-100014]
    where Y_w(k) is the signal amplitude of the current frame.
  20. An electronic device, characterized by comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the adaptive speech enhancement method according to any one of claims 1 to 19.
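The clamped affine mapping of claim 16 from subband SNR to perceptual weighting order, p(b,k)=max{min[α_1·SNR(b,k)+α_2, p_max], p_min}, can be illustrated as follows. The subband SNR formula of claim 15 is shown only as a formula image, so the dB-scale subband SNR used below, the constants a1, a2, p_min, and p_max, and the band layout are all illustrative assumptions, not values from the patent.

```python
import numpy as np

def perceptual_order(power, noise, bands, a1=0.1, a2=1.0,
                     p_min=1.0, p_max=2.0):
    """p(b,k) = max{min[a1*SNR(b,k) + a2, p_max], p_min} per Bark subband."""
    p = np.empty_like(power)
    for b_low, b_up in bands:  # each band covers bins [B_low(b), B_up(b))
        sig = power[b_low:b_up].sum()
        nse = noise[b_low:b_up].sum() + 1e-12
        snr_db = 10.0 * np.log10(sig / nse + 1e-12)  # assumed subband SNR in dB
        p[b_low:b_up] = max(min(a1 * snr_db + a2, p_max), p_min)
    return p
```

All bins within a subband share one order: high-SNR subbands saturate at p_max and noisy subbands are clamped at p_min, matching the max/min structure of the claimed formula.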
PCT/CN2018/117972 2018-11-28 2018-11-28 Self-adaptive speech enhancement method, and electronic device WO2020107269A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880002760.2A CN109643554B (en) 2018-11-28 2018-11-28 Adaptive voice enhancement method and electronic equipment
PCT/CN2018/117972 WO2020107269A1 (en) 2018-11-28 2018-11-28 Self-adaptive speech enhancement method, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/117972 WO2020107269A1 (en) 2018-11-28 2018-11-28 Self-adaptive speech enhancement method, and electronic device

Publications (1)

Publication Number Publication Date
WO2020107269A1

Family

ID=66060188

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/117972 WO2020107269A1 (en) 2018-11-28 2018-11-28 Self-adaptive speech enhancement method, and electronic device

Country Status (2)

Country Link
CN (1) CN109643554B (en)
WO (1) WO2020107269A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111986660A (en) * 2020-08-26 2020-11-24 深圳信息职业技术学院 Single-channel speech enhancement method, system and storage medium for neural network sub-band modeling
CN112735458A (en) * 2020-12-28 2021-04-30 苏州科达科技股份有限公司 Noise estimation method, noise reduction method and electronic equipment
CN113299308A (en) * 2020-09-18 2021-08-24 阿里巴巴集团控股有限公司 Voice enhancement method and device, electronic equipment and storage medium
CN113593599A (en) * 2021-09-02 2021-11-02 北京云蝶智学科技有限公司 Method for removing noise signal in voice signal

Families Citing this family (11)

Publication number Priority date Publication date Assignee Title
CN112151053B (en) * 2019-06-11 2024-04-16 北京汇钧科技有限公司 Speech enhancement method, system, electronic device and storage medium
CN113113039B (en) * 2019-07-08 2022-03-18 广州欢聊网络科技有限公司 Noise suppression method and device and mobile terminal
WO2021007841A1 (en) * 2019-07-18 2021-01-21 深圳市汇顶科技股份有限公司 Noise estimation method, noise estimation apparatus, speech processing chip and electronic device
CN110739005B (en) * 2019-10-28 2022-02-01 南京工程学院 Real-time voice enhancement method for transient noise suppression
CN110706716B (en) * 2019-10-30 2022-08-19 歌尔科技有限公司 Voice signal processing method, voice signal processing device and storage medium
CN111429933B (en) * 2020-03-06 2022-09-30 北京小米松果电子有限公司 Audio signal processing method and device and storage medium
CN111508519B (en) * 2020-04-03 2022-04-26 北京达佳互联信息技术有限公司 Method and device for enhancing voice of audio signal
CN112116914B (en) * 2020-08-03 2022-11-25 四川大学 Sound processing method and system based on variable step length LMS algorithm
CN111899724A (en) * 2020-08-06 2020-11-06 中国人民解放军空军预警学院 Voice feature coefficient extraction method based on Hilbert-Huang transform and related equipment
CN113270107B (en) * 2021-04-13 2024-02-06 维沃移动通信有限公司 Method and device for acquiring loudness of noise in audio signal and electronic equipment
CN113345461A (en) * 2021-04-26 2021-09-03 北京搜狗科技发展有限公司 Voice processing method and device for voice processing

Citations (4)

Publication number Priority date Publication date Assignee Title
US20060271362A1 (en) * 2005-05-31 2006-11-30 Nec Corporation Method and apparatus for noise suppression
CN103021420A (en) * 2012-12-04 2013-04-03 中国科学院自动化研究所 Speech enhancement method of multi-sub-band spectral subtraction based on phase adjustment and amplitude compensation
CN103646648A (en) * 2013-11-19 2014-03-19 清华大学 Noise power estimation method
CN103730124A (en) * 2013-12-31 2014-04-16 上海交通大学无锡研究院 Noise robustness endpoint detection method based on likelihood ratio test

Family Cites Families (14)

Publication number Priority date Publication date Assignee Title
US5485522A (en) * 1993-09-29 1996-01-16 Ericsson Ge Mobile Communications, Inc. System for adaptively reducing noise in speech signals
JPH11514453A (en) * 1995-09-14 1999-12-07 エリクソン インコーポレイテッド A system for adaptively filtering audio signals to enhance speech intelligibility in noisy environmental conditions
DE60222813T2 (en) * 2002-07-12 2008-07-03 Widex A/S HEARING DEVICE AND METHOD FOR INCREASING REDEEMBLY
CN1162838C (en) * 2002-07-12 2004-08-18 清华大学 Speech intensifying-characteristic weighing-logrithmic spectrum addition method for anti-noise speech recognization
GB2426167B (en) * 2005-05-09 2007-10-03 Toshiba Res Europ Ltd Noise estimation method
EP2226794B1 (en) * 2009-03-06 2017-11-08 Harman Becker Automotive Systems GmbH Background noise estimation
CN103650040B (en) * 2011-05-16 2017-08-25 谷歌公司 Use the noise suppressing method and device of multiple features modeling analysis speech/noise possibility
CN104103278A (en) * 2013-04-02 2014-10-15 北京千橡网景科技发展有限公司 Real time voice denoising method and device
US10141003B2 (en) * 2014-06-09 2018-11-27 Dolby Laboratories Licensing Corporation Noise level estimation
EP3252766B1 (en) * 2016-05-30 2021-07-07 Oticon A/s An audio processing device and a method for estimating a signal-to-noise-ratio of a sound signal
CN104269178A (en) * 2014-08-08 2015-01-07 华迪计算机集团有限公司 Method and device for conducting self-adaption spectrum reduction and wavelet packet noise elimination processing on voice signals
KR102475869B1 (en) * 2014-10-01 2022-12-08 삼성전자주식회사 Method and apparatus for processing audio signal including noise
WO2016135741A1 (en) * 2015-02-26 2016-09-01 Indian Institute Of Technology Bombay A method and system for suppressing noise in speech signals in hearing aids and speech communication devices
CN107393553B (en) * 2017-07-14 2020-12-22 深圳永顺智信息科技有限公司 Auditory feature extraction method for voice activity detection



Also Published As

Publication number Publication date
CN109643554A (en) 2019-04-16
CN109643554B (en) 2023-07-21

Similar Documents

Publication Publication Date Title
WO2020107269A1 (en) Self-adaptive speech enhancement method, and electronic device
US11056130B2 (en) Speech enhancement method and apparatus, device and storage medium
US10573301B2 (en) Neural network based time-frequency mask estimation and beamforming for speech pre-processing
CN105741849B (en) The sound enhancement method of phase estimation and human hearing characteristic is merged in digital deaf-aid
US8712074B2 (en) Noise spectrum tracking in noisy acoustical signals
CN100543842C (en) Realize the method that ground unrest suppresses based on multiple statistics model and least mean-square error
US7313518B2 (en) Noise reduction method and device using two pass filtering
US20120245927A1 (en) System and method for monaural audio processing based preserving speech information
CN112735456B (en) Speech enhancement method based on DNN-CLSTM network
Borowicz et al. Signal subspace approach for psychoacoustically motivated speech enhancement
CN1210608A (en) Noisy speech parameter enhancement method and apparatus
Verteletskaya et al. Noise reduction based on modified spectral subtraction method
US10839820B2 (en) Voice processing method, apparatus, device and storage medium
WO2021007841A1 (en) Noise estimation method, noise estimation apparatus, speech processing chip and electronic device
CN113345460B (en) Audio signal processing method, device, equipment and storage medium
CN110808057A (en) Voice enhancement method for generating confrontation network based on constraint naive
US20240046947A1 (en) Speech signal enhancement method and apparatus, and electronic device
CN110556125A (en) Feature extraction method and device based on voice signal and computer storage medium
Shi et al. Fusion feature extraction based on auditory and energy for noise-robust speech recognition
CN109102823A (en) A kind of sound enhancement method based on subband spectrum entropy
CN101322183A (en) Signal distortion elimination apparatus, method, program, and recording medium having the program recorded thereon
WO2017128910A1 (en) Method, apparatus and electronic device for determining speech presence probability
CN114566179A (en) Time delay controllable voice noise reduction method
CN113035216B (en) Microphone array voice enhancement method and related equipment
Wei et al. Analysis and implementation of low‐power perceptual multiband noise reduction for the hearing aids application

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18941159; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 18941159; Country of ref document: EP; Kind code of ref document: A1)