CN107045874B - Non-linear voice enhancement method based on correlation - Google Patents

Non-linear voice enhancement method based on correlation

Info

Publication number
CN107045874B
Authority
CN
China
Prior art keywords
noise
attenuation gain
speech
calculating
gain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610079921.0A
Other languages
Chinese (zh)
Other versions
CN107045874A (en)
Inventor
韩翀蛟
高可攀
羊开云
徐晓峰
李夏宾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GRANDSTREAM NETWORK Inc
SHENZHEN GRANDSTREAM NETWORKS Inc
Original Assignee
GRANDSTREAM NETWORK Inc
SHENZHEN GRANDSTREAM NETWORKS Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by GRANDSTREAM NETWORK Inc and SHENZHEN GRANDSTREAM NETWORKS Inc
Priority to CN201610079921.0A
Publication of CN107045874A
Application granted
Publication of CN107045874B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Noise Elimination (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a correlation-based nonlinear speech enhancement method, comprising: step a: performing a fast Fourier transform on the preprocessed noisy speech data $s_w(l)$ and the estimated noise data $n'_w(l)$ to obtain the spectrum $S(k)$ of the noisy speech frame and the spectrum $N'(k)$ of the estimated noise frame; step b: calculating the signal-to-noise ratio and the attenuation gain to obtain the attenuation gain $\mathrm{gain}(k)$ (its defining formula is an image in the original and is not recoverable here); step c: calculating the correlation between the noisy speech and the noise to obtain the cross-correlation function $\mathrm{CohSN}(k)$ of the noisy speech frame spectrum $S(k)$ and the estimated noise frame spectrum $N'(k)$ (formula likewise given as an image); step d: calculating the nonlinear attenuation gain $\mathrm{NlpGain}(k)$; step e: speech enhancement processing, in which the attenuation gain $\mathrm{gain}(k)$ and the nonlinear attenuation gain $\mathrm{NlpGain}(k)$ of step d act jointly on the noisy speech frame spectrum $S(k)$ to obtain the clean speech signal spectrum $S_{out}(k)$. The technical scheme provided by the invention removes the noise component of the noisy speech signal more thoroughly, and the trade-off between noise removal and speech quality can be adjusted to suit different application scenarios.

Description

Non-linear voice enhancement method based on correlation
Technical Field
The invention belongs to the technical field of voice communication, and particularly relates to a voice enhancement technology.
Background
During voice communication, the speech sent by the talker is disturbed by noise from the talker's surroundings, such as the sound of an office air conditioner or the fan of a computer. The speech received at the receiving end is therefore no longer the talker's clean speech but noisy speech corrupted by various noises, which reduces its intelligibility for the listener. In many situations, especially teleconferencing, speech intelligibility and speech quality must be well guaranteed, so speech enhancement is necessary, and speech enhancement techniques have developed rapidly as a result.
The first class of existing speech enhancement methods is based on spectral subtraction: the estimated noise spectrum is subtracted from the noisy speech spectrum to obtain the enhanced speech spectrum. Its advantages are low algorithmic complexity and a small amount of computation, but serious residual noise remains in the enhanced speech. The second class is based on adaptive filtering algorithms, which cannot fundamentally resolve the conflict between convergence rate and steady-state error and which perform poorly at low signal-to-noise ratios. The third class is based on matrix decomposition or model learning. It removes non-stationary burst noise well, but it involves complex procedures such as matrix decomposition and model training, and its computational cost is much higher than that of the first two classes. In view of this, the present invention discloses a novel speech enhancement technique that overcomes these disadvantages of the prior art.
Disclosure of Invention
The invention aims to provide a correlation-based nonlinear speech enhancement method that solves problems such as incomplete noise removal while preserving speech quality, and that achieves a better enhancement effect at low signal-to-noise ratios.
To achieve this object, the technical solution of the invention is a correlation-based nonlinear speech enhancement method that mainly comprises the following steps:
Step a: perform a fast Fourier transform on the preprocessed noisy speech data $s_w(l)$ and the estimated noise data $n'_w(l)$ to obtain the spectrum $S(k)$ of the noisy speech frame and the spectrum $N'(k)$ of the estimated noise frame;
Step b: calculate the signal-to-noise ratio and the attenuation gain to obtain the attenuation gain $\mathrm{gain}(k)$ (defining formula given as an image in the original; not recoverable here);
Step c: calculate the correlation between the noisy speech and the noise to obtain the cross-correlation function $\mathrm{CohSN}(k)$ of the noisy speech frame spectrum $S(k)$ and the estimated noise frame spectrum $N'(k)$ (formula given as an image in the original; not recoverable here);
Step d: calculate the nonlinear attenuation gain $\mathrm{NlpGain}(k)$;
Step e: speech enhancement processing, in which the attenuation gain $\mathrm{gain}(k)$ and the nonlinear attenuation gain $\mathrm{NlpGain}(k)$ of step d act jointly on the noisy speech frame spectrum $S(k)$ to obtain the clean speech signal spectrum $S_{out}(k)$.
Preferably, step e is followed by step f, an inverse fast Fourier transform: the known inverse fast Fourier transform is applied to the speech signal spectrum $S_{out}(k)$ to convert the signal from the frequency domain back to the time domain:
$$s_{out}(l) = \mathrm{IFFT}(S_{out}(k))$$
Preferably, step b further comprises the following steps:
Step b1: calculate the posterior signal-to-noise ratio $\mathrm{SNR}_{post}(k)$:
$$\mathrm{SNR}_{post}(k) = S(k)/N'(k) - 1.0$$
Step b2: calculate the signal-to-noise ratio update coefficient $\gamma(k)$:
$$\gamma(k) = \alpha + (1-\alpha)\times\left(\frac{S_{n-1}(k)}{S_{n-1}(k)+N'(k)}\right)^2$$
where $S_{n-1}(k)$ is the previous frame of noisy speech data and the parameter $\alpha$ can be chosen to suit the specific application scenario;
Step b3: calculate the a priori signal-to-noise ratio $\mathrm{SNR}_{prior}(k)$:
$$\mathrm{SNR}_{prior}(k) = \gamma(k)\cdot\mathrm{SNR}_{post}(k) + (1-\gamma(k))\cdot\left(S_{n-1}(k)/N'(k)\right)$$
Step b4: calculate the a priori signal-to-noise ratio ratio $\mathrm{SNR}_{ratio}(k)$ (formula given as an image in the original; not recoverable here);
Step b5: calculate the optimal attenuation gain $\mathrm{Gain}_{opt}(k)$ using a formula based on the hypergeometric function;
Step b6: calculate the lower limit of the attenuation gain, $\mathrm{Gain\_floor}(k)$;
Step b7: calculate the attenuation gain $\mathrm{gain}(k)$.
Preferably, the parameter $\alpha$ in step b2 commonly takes a value in the range [0.05, 0.30]. The parameter $\alpha$ may be taken to be 0.25.
Preferably, the optimal attenuation gain $\mathrm{Gain}_{opt}(k)$ of step b5 is given by a formula shown as an image in the original (not recoverable here), in which $\theta(k) = \mathrm{SNR}_{ratio}(k)\cdot(1.0+\mathrm{SNR}_{post}(k))$, $\Gamma(\cdot)$ is the known gamma function (an accompanying expression is also given only as an image), $\exp(\cdot)$ is the exponential function with the natural constant $e$ as its base, and $I_0(\cdot)$ and $I_1(\cdot)$ are the Bessel functions of order 0 and order 1, respectively.
Preferably, the lower limit of the attenuation gain in step b6, $\mathrm{Gain\_floor}(k)$, is given by a formula shown as an image in the original (not recoverable here), in which $\exp(\cdot)$ is the exponential function with the natural constant $e$ as its base.
Preferably, the attenuation gain $\mathrm{gain}(k)$ is given by a formula shown as an image in the original (not recoverable here), in which $\lambda$ is a weighting coefficient that can be chosen to suit the application scenario; its commonly used range is [0.60, 0.90].
Preferably, the nonlinear attenuation gain $\mathrm{NlpGain}(k)$ is calculated as
$$\mathrm{NlpGain}(k) = \mathrm{MIN}(1.0,\ 1.35 - \mathrm{CohSN}(k))$$
where $\mathrm{MIN}(\cdot)$ is the usual take-the-smaller operation, i.e.
$$\mathrm{MIN}(a,b) = \begin{cases} a, & a \le b \\ b, & a > b \end{cases}$$
Preferably, the nonlinear attenuation gain $\mathrm{NlpGain}(k)$ is calculated as follows:
$$\mathrm{NlpGain}(k) = \begin{cases} 1.0, & \mathrm{AvgCoh} < \mathrm{ThreCoh} \\ \mathrm{MIN}(1.0,\ 1.35 - \mathrm{CohSN}(k)), & \mathrm{AvgCoh} \ge \mathrm{ThreCoh} \end{cases}$$
where $\mathrm{AvgCoh}$ is the mean correlation over the frequency band of interest and $\mathrm{ThreCoh}$ is the correlation threshold.
The invention provides a correlation-based nonlinear speech enhancement method that overcomes the shortcomings of the prior-art methods at a low computational cost. With this technical scheme, the noise components of a noisy speech signal can be removed more thoroughly, and the trade-off between noise removal and speech quality can be adjusted flexibly for different application scenarios.
Drawings
FIG. 1 is a flow chart of a non-linear speech enhancement method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The basic principle of the invention is as follows. The method first calculates a signal-to-noise ratio from the frequency-domain information of the noisy speech signal and a reference noise signal, and uses that signal-to-noise ratio to calculate an attenuation gain for each frequency band. It then calculates the correlation between the noisy speech signal and the reference noise signal and adjusts the attenuation gain nonlinearly according to that correlation. Finally, the adjusted attenuation gain is multiplied by the noisy speech spectrum to obtain clean speech free of noise interference.
FIG. 1 is a flow chart of a non-linear speech enhancement method according to an embodiment of the present invention. The steps of the method of the present invention are further described below with reference to FIG. 1.
The invention performs speech enhancement on the premise that the noisy speech $s(l)$ and the estimated noise $n'(l)$ are known; the process of estimating the noise $n'(l)$ is not described.
Step 1, voice preprocessing:
The noisy speech $s(l)$ and the estimated noise $n'(l)$ are divided into frames. Windowing and framing yield the noisy speech data to be enhanced, $s_w(l)$, and the estimated noise data, $n'_w(l)$:
$$s_w(l) = s(l)\cdot w(l), \qquad n'_w(l) = n'(l)\cdot w(l)$$
where $w(l)$ is the window function; a Hamming window is used in this embodiment. Windowing and framing are a common and necessary part of digital signal processing: a digital signal processing unit can read and process only a limited number of samples at a time, so the signal is divided into frames of that size using a window function.
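The framing and windowing of Step 1 can be sketched as follows. This is a minimal illustration in plain Python, not the patent's implementation: the function names `hamming` and `frame_and_window`, and the `frame_len` and `hop` parameters, are hypothetical, and the common Hamming coefficients 0.54 and 0.46 are assumed because the text does not state them.

```python
import math

def hamming(length):
    """Hamming window w(l) = 0.54 - 0.46*cos(2*pi*l/(length-1)) (assumed coefficients)."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * l / (length - 1))
            for l in range(length)]

def frame_and_window(signal, frame_len, hop):
    """Split `signal` into frames of `frame_len` samples, advancing by `hop`
    samples, and multiply each frame by a Hamming window.

    Trailing samples that do not fill a whole frame are dropped.
    """
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        frames.append([x * w for x, w in zip(frame, window)])
    return frames

# Usage: one second of a 440 Hz tone at 16 kHz, 320-sample frames, 160-sample hop.
fs = 16000
tone = [math.sin(2.0 * math.pi * 440.0 * n / fs) for n in range(fs)]
frames = frame_and_window(tone, frame_len=320, hop=160)
```

The same function would be applied to both the noisy speech and the estimated noise so that the two sets of frames stay aligned.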
Step 2, fast Fourier transform:
The known fast Fourier transform is applied to the windowed noisy speech $s_w(l)$ and the windowed estimated noise $n'_w(l)$ to obtain the spectrum $S(k)$ of the noisy speech frame and the spectrum $N'(k)$ of the estimated noise frame:
$$S(k) = \mathrm{FFT}(s_w(l)), \qquad N'(k) = \mathrm{FFT}(n'_w(l))$$
where $\mathrm{FFT}(\cdot)$ is the known fast Fourier transform.
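As an illustration of Step 2, the sketch below computes a frame spectrum with a small recursive radix-2 FFT written in plain Python. The helper names are hypothetical, and zero-padding a frame to the next power of two is an assumption made only so that the radix-2 routine applies; the text does not specify the transform length.

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a non-zero power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def zero_pad_to_pow2(frame):
    """Append zeros until the length is a power of two (an assumption, see lead-in)."""
    n = 1
    while n < len(frame):
        n *= 2
    return list(frame) + [0.0] * (n - len(frame))

def frame_spectrum(frame):
    """Spectrum of one windowed frame, e.g. S(k) from s_w(l)."""
    return fft(zero_pad_to_pow2(frame))
```

In practice a library FFT would normally be used; the hand-written version only keeps the example free of dependencies.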
Step 3, calculating signal-to-noise ratio and attenuation gain:
In this step, the estimation of the signal-to-noise ratio and the attenuation gain follows the classic algorithm proposed by Y. Ephraim and D. Malah in "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-32, no. 6, pp. 1109-1121, 1984, with improvements and simplifications. The calculation is described only briefly here; for details, refer to the original paper.
1) First calculate the posterior signal-to-noise ratio $\mathrm{SNR}_{post}(k)$:
$$\mathrm{SNR}_{post}(k) = S(k)/N'(k) - 1.0$$
2) Then calculate the signal-to-noise ratio update coefficient $\gamma(k)$:
$$\gamma(k) = \alpha + (1-\alpha)\times\left(\frac{S_{n-1}(k)}{S_{n-1}(k)+N'(k)}\right)^2$$
where $S_{n-1}(k)$ is the previous frame of noisy speech data. The parameter $\alpha$ can be chosen to suit the specific application scenario; its commonly used range is [0.05, 0.30], and in this embodiment $\alpha$ is set to 0.25.
3) Calculate the a priori signal-to-noise ratio $\mathrm{SNR}_{prior}(k)$:
$$\mathrm{SNR}_{prior}(k) = \gamma(k)\cdot\mathrm{SNR}_{post}(k) + (1-\gamma(k))\cdot\left(S_{n-1}(k)/N'(k)\right)$$
That is, the posterior signal-to-noise ratio $\mathrm{SNR}_{post}(k)$ from 1) is combined by weighted summation, using the update coefficient $\gamma(k)$ from 2), to obtain the estimated a priori signal-to-noise ratio $\mathrm{SNR}_{prior}(k)$.
4) Using $\mathrm{SNR}_{prior}(k)$ from 3), calculate the a priori signal-to-noise ratio ratio $\mathrm{SNR}_{ratio}(k)$ (formula given as an image in the original; not recoverable here).
5) Calculate the optimal attenuation gain $\mathrm{Gain}_{opt}(k)$ using a formula based on the hypergeometric function (formula given as an image in the original; not recoverable here), in which $\theta(k) = \mathrm{SNR}_{ratio}(k)\cdot(1.0+\mathrm{SNR}_{post}(k))$, $\Gamma(\cdot)$ is the known gamma function (an accompanying expression is also given only as an image), $\exp(\cdot)$ is the exponential function with the natural constant $e$ as its base, and $I_0(\cdot)$ and $I_1(\cdot)$ are Bessel functions; for their computation see William J. Lentz, "Generating Bessel functions in Mie scattering calculations using continued fractions".
6) Calculate the lower limit of the attenuation gain, $\mathrm{Gain\_floor}(k)$ (formula given as an image in the original; not recoverable here), in which $\exp(\cdot)$ is, as in 5), the exponential function with the natural constant $e$ as its base. The lower limit $\mathrm{Gain\_floor}(k)$ is positive and is used to bound the optimal attenuation gain $\mathrm{Gain}_{opt}(k)$. If $\mathrm{Gain}_{opt}(k) < \mathrm{Gain\_floor}(k)$, the optimal attenuation gain is too small and the enhanced speech would contain fluctuating "musical noise", so $\mathrm{Gain\_floor}(k)$ must be used to limit $\mathrm{Gain}_{opt}(k)$; see the operation in 7).
7) Calculate the attenuation gain $\mathrm{gain}(k)$ (formula given as an image in the original; not recoverable here), in which $\mathrm{MAX}(\cdot)$ is the usual take-the-larger operation, i.e.
$$\mathrm{MAX}(a,b) = \begin{cases} a, & a \ge b \\ b, & a < b \end{cases}$$
$\mathrm{Gain}_{opt}(k)$ is limited by $\mathrm{Gain\_floor}(k)$, and the limited value is combined with a further term (shown only as an image in the original) by weighted summation and squaring to obtain the attenuation gain $\mathrm{gain}(k)$. Here $\lambda$ is the weighting coefficient, which can be chosen to suit the application scenario; its commonly used range is [0.60, 0.90], and in this embodiment $\lambda$ is set to 0.75.
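The parts of Step 3 whose formulas are stated in the claims can be sketched as below: the posterior signal-to-noise ratio, the update coefficient, the a priori signal-to-noise ratio, and the lower-bound limiting with MAX. Several points are assumptions: the function names are hypothetical; S(k), N'(k) and the previous-frame values are treated as positive real per-bin quantities (the text does not say whether magnitudes or powers are meant); and the second term of the a priori SNR is read as a division because the operator is missing in the source. The optimal gain, the gain floor and the final weighted combination are not sketched because their formulas survive only as images.

```python
def snr_chain(S, N, S_prev, alpha=0.25):
    """Per-bin posterior SNR, update coefficient gamma and a priori SNR.

    S, N, S_prev: equal-length lists of positive per-bin values for the current
    noisy frame, the estimated noise frame and the previous noisy frame.
    alpha: smoothing parameter; the text suggests the range [0.05, 0.30].
    """
    snr_post, gamma, snr_prior = [], [], []
    for s, n, sp in zip(S, N, S_prev):
        post = s / n - 1.0                                 # SNR_post(k)
        g = alpha + (1.0 - alpha) * (sp / (sp + n)) ** 2   # gamma(k)
        prior = g * post + (1.0 - g) * (sp / n)            # SNR_prior(k), division assumed
        snr_post.append(post)
        gamma.append(g)
        snr_prior.append(prior)
    return snr_post, gamma, snr_prior

def floor_gain(gain_opt, gain_floor):
    """MAX(Gain_opt(k), Gain_floor(k)) per bin, the limiting described in 6) and 7)."""
    return [max(g, f) for g, f in zip(gain_opt, gain_floor)]

# Usage with one bin: S = 4, N' = 2, previous frame S = 2.
post, g, prior = snr_chain([4.0], [2.0], [2.0])
```

Because gamma grows with the previous frame's level relative to the noise, bins that recently carried speech lean more on the current posterior SNR, while noise-dominated bins lean more on the previous-frame ratio.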
Step 4, calculating the correlation between the noisy speech and the noise:
In this step, first calculate the power spectrum $\mathrm{Spsd}(k)$ of the noisy speech spectrum $S(k)$ and the power spectrum $\mathrm{Npsd}(k)$ of the estimated noise spectrum $N'(k)$. In the formulas of this step, the subscript $r$ denotes the real part of a complex quantity and the subscript $i$ denotes its imaginary part (the formulas themselves are given as images in the original and are not recoverable here).
Then calculate the cross power spectrum $\mathrm{SNpsd}(k)$ of $S(k)$ and $N'(k)$ (formula given as an image in the original).
Then calculate the cross-correlation function $\mathrm{CohSN}(k)$ of $S(k)$ and $N'(k)$ (formula given as an image in the original).
The invention uses the correlation between the noisy speech signal $s(l)$ and the estimated noise $n'(l)$ to improve the enhancement. In this step the cross-correlation function $\mathrm{CohSN}(k)$ of the two signals is calculated in the frequency domain from the noisy speech power spectrum $\mathrm{Spsd}(k)$, the estimated noise power spectrum $\mathrm{Npsd}(k)$, and their cross power spectrum $\mathrm{SNpsd}(k)$. In speech processing, the cross-correlation function of the noisy speech signal $s(l)$ and the estimated noise signal $n'(l)$ expresses how strongly the two are correlated in each frequency band. A large value means the noisy speech and the estimated noise are strongly correlated: the noisy speech contains little or no speech and the noise component dominates. A small value means the correlation is weak: the noisy speech contains more speech, which is why it correlates only weakly with the estimated noise.
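The exact formulas of Step 4 are not recoverable from this text, so the following is only a sketch of one common way to obtain a per-bin correlation measure from two spectra: the magnitude-squared coherence computed from recursively smoothed power and cross-power spectra. The class name, the smoothing factor `beta`, the regularizer `eps` and the smoothing itself are all assumptions and may differ from the patent's definition of CohSN(k).

```python
class CoherenceTracker:
    """Magnitude-squared coherence between two spectra, smoothed over frames.

    This is a stand-in for CohSN(k), not the patent's formula.
    """

    def __init__(self, beta=0.9, eps=1e-12):
        self.beta = beta      # smoothing factor across frames (assumed)
        self.eps = eps        # avoids division by zero in silent bins
        self.spsd = None      # smoothed power spectrum of the noisy speech
        self.npsd = None      # smoothed power spectrum of the estimated noise
        self.snpsd = None     # smoothed cross power spectrum

    def update(self, S, N):
        """Feed one frame of complex spectra S(k), N'(k); return coherence per bin."""
        inst_s = [s.real ** 2 + s.imag ** 2 for s in S]
        inst_n = [n.real ** 2 + n.imag ** 2 for n in N]
        inst_sn = [s * n.conjugate() for s, n in zip(S, N)]
        if self.spsd is None:
            self.spsd, self.npsd, self.snpsd = inst_s, inst_n, inst_sn
        else:
            b = self.beta
            self.spsd = [b * o + (1 - b) * v for o, v in zip(self.spsd, inst_s)]
            self.npsd = [b * o + (1 - b) * v for o, v in zip(self.npsd, inst_n)]
            self.snpsd = [b * o + (1 - b) * v for o, v in zip(self.snpsd, inst_sn)]
        return [abs(c) ** 2 / (p * q + self.eps)
                for c, p, q in zip(self.snpsd, self.spsd, self.npsd)]
```

Smoothing across frames matters for this particular measure: computed from a single frame, the magnitude-squared coherence equals 1 in every bin and carries no information.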
Step 5, nonlinear attenuation gain calculation:
Calculate the mean value $\mathrm{AvgCoh}$ of $\mathrm{CohSN}(k)$ over the frequency band of interest (formula given as an image in the original; not recoverable here). The band-limit index in that formula is an integer no greater than the upper limit of $k$, and it can be chosen differently for different application scenarios: where the noise is concentrated at low frequencies, a smaller value may be selected, and where the noise characteristics are unknown, it can be set equal to the upper limit of $k$. For example, with a sampling rate of 16 kHz and a frame length of 10 ms in the windowing preprocessing, one frame contains 160 data points; when frame overlapping is used for the fast Fourier transform and the cross-correlation function $\mathrm{CohSN}(k)$ is computed, $k$ ranges over [0, 159]. If the noise is known to be concentrated in the low band from 0 Hz to 4 kHz, the band-limit index can be set to 79 to obtain $\mathrm{AvgCoh}$.
The mean correlation $\mathrm{AvgCoh}$ of the band of interest determines whether a nonlinear attenuation gain is applied to the current frame, by comparing $\mathrm{AvgCoh}$ with the correlation threshold $\mathrm{ThreCoh}$. If $\mathrm{AvgCoh} < \mathrm{ThreCoh}$, then in the band of interest the current speech frame is only weakly correlated with the estimated noise and speech is the main component; to avoid damaging speech quality, no nonlinear attenuation is applied and $\mathrm{NlpGain}(k)$ is set to 1.0. If $\mathrm{AvgCoh} \ge \mathrm{ThreCoh}$, then in the band of interest the current speech frame is strongly correlated with the estimated noise and noise dominates; to enhance the speech component further, a nonlinear attenuation gain is applied to remove more noise, with $\mathrm{NlpGain}(k)$ calculated as
$$\mathrm{NlpGain}(k) = \mathrm{MIN}(1.0,\ 1.35 - \mathrm{CohSN}(k))$$
where $\mathrm{MIN}(\cdot)$ is the usual take-the-smaller operation, i.e.
$$\mathrm{MIN}(a,b) = \begin{cases} a, & a \le b \\ b, & a > b \end{cases}$$
$\mathrm{MIN}(\cdot)$ is used to ensure that $\mathrm{NlpGain}(k) \le 1.0$, so that the nonlinear attenuation gain attenuates the noisy speech rather than amplifying it.
In summary, $\mathrm{NlpGain}(k)$ is calculated as follows:
$$\mathrm{NlpGain}(k) = \begin{cases} 1.0, & \mathrm{AvgCoh} < \mathrm{ThreCoh} \\ \mathrm{MIN}(1.0,\ 1.35 - \mathrm{CohSN}(k)), & \mathrm{AvgCoh} \ge \mathrm{ThreCoh} \end{cases}$$
The threshold $\mathrm{ThreCoh}$ can be chosen to suit the specific application scenario, and it acts as a compromise between removing noise and preserving speech quality. If a larger $\mathrm{ThreCoh}$ is chosen, then by the formula above $\mathrm{NlpGain}(k)$ is more likely to be set to 1.0, so the nonlinear attenuation has less effect: speech quality is protected, but some noise remains. If a smaller $\mathrm{ThreCoh}$ is chosen, $\mathrm{NlpGain}(k)$ is less likely to be set to 1.0, so the nonlinear attenuation has more effect and noise is removed better; but if $\mathrm{ThreCoh}$ is too small, the nonlinear attenuation becomes too strong and may damage speech quality. $\mathrm{ThreCoh}$ therefore needs to be chosen for the specific application scenario; its commonly used range is [0.70, 0.80], and in this embodiment $\mathrm{ThreCoh}$ is 0.735.
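The decision rule of Step 5, as stated in claims 9 and 10, can be sketched in a few lines. The function name `nlp_gain` and the parameter name `band_limit` are hypothetical; the plain arithmetic mean over bins 0 to `band_limit` is an assumption about how the mean correlation is formed, since the averaging formula is given only as an image.

```python
def nlp_gain(coh, band_limit, thre_coh=0.735):
    """Nonlinear attenuation gain NlpGain(k) for one frame.

    coh: per-bin correlation values CohSN(k).
    band_limit: last bin index included in the mean AvgCoh (assumed inclusive).
    thre_coh: correlation threshold ThreCoh; 0.735 is the embodiment's value.
    """
    band = coh[:band_limit + 1]
    avg_coh = sum(band) / len(band)
    if avg_coh < thre_coh:
        # Speech dominates the band of interest: leave the frame untouched.
        return [1.0] * len(coh)
    # Noise dominates: attenuate more where the correlation is higher.
    return [min(1.0, 1.35 - c) for c in coh]

# Usage: a noise-dominated frame and a speech-dominated frame.
noisy_frame_gain = nlp_gain([0.9, 0.9, 0.2], band_limit=1)
speech_frame_gain = nlp_gain([0.3, 0.4, 0.95], band_limit=1)
```

Note that the decision is made once per frame from the band average, while the attenuation itself is applied per bin, so a weakly correlated bin in a noise-dominated frame still keeps a gain of 1.0.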
Step 6, speech enhancement processing:
The attenuation gain $\mathrm{gain}(k)$ calculated in Step 3 and the nonlinear attenuation gain $\mathrm{NlpGain}(k)$ calculated in Step 5 act together on the noisy speech spectrum $S(k)$ to perform the speech enhancement:
$$S_{out}(k) = \mathrm{gain}(k)\cdot\mathrm{NlpGain}(k)\cdot S(k)$$
That is, after the attenuation gain $\mathrm{gain}(k)$ obtained from the signal-to-noise ratio has been applied to the noisy speech spectrum $S(k)$, the nonlinear attenuation gain is applied as well, which removes noise more effectively and yields the cleaner speech spectrum $S_{out}(k)$.
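A minimal sketch of Step 6 follows, assuming that "acting together" means a per-bin product of the two gains with the noisy spectrum, as the basic-principle paragraph suggests. The function name `enhance_spectrum` is hypothetical.

```python
def enhance_spectrum(S, gain, nlp):
    """Apply gain(k) and NlpGain(k) to the complex noisy spectrum S(k), bin by bin."""
    if not (len(S) == len(gain) == len(nlp)):
        raise ValueError("S, gain and nlp must have the same number of bins")
    return [g * n * s for s, g, n in zip(S, gain, nlp)]

# Usage: two bins, the second attenuated only by the nonlinear gain.
S_out = enhance_spectrum([2 + 2j, 1j], gain=[0.5, 1.0], nlp=[0.5, 0.45])
```

Both gains are real and at most 1.0 in this scheme, so the phase of each bin is preserved and only its magnitude is reduced.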
Step 7, inverse fast Fourier transform:
The known inverse fast Fourier transform is applied to the enhanced speech signal spectrum $S_{out}(k)$ to convert the signal from the frequency domain back to the time domain:
$$s_{out}(l) = \mathrm{IFFT}(S_{out}(k))$$
yielding the enhanced time-domain speech signal $s_{out}(l)$, where $\mathrm{IFFT}(\cdot)$ is the known inverse fast Fourier transform.
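Step 7 can be illustrated with a small radix-2 inverse transform in plain Python. The helper names are hypothetical, the length is assumed to be a power of two, and taking the real part of the result assumes the enhanced spectrum is conjugate-symmetric, which holds when real gains are applied to the spectrum of a real frame. Recombining overlapping frames is not covered by the text and is not sketched here.

```python
import cmath

def _radix2(x, sign):
    """Radix-2 transform with exponent sign -1 (forward) or +1 (inverse, unscaled)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    even = _radix2(x[0::2], sign)
    odd = _radix2(x[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def ifft(spectrum):
    """Inverse FFT: positive exponent and division by the length."""
    n = len(spectrum)
    return [v / n for v in _radix2(spectrum, +1)]

def spectrum_to_frame(s_out_spectrum):
    """Time-domain frame s_out(l) from S_out(k); imaginary residue is discarded."""
    return [v.real for v in ifft(s_out_spectrum)]

# Usage: a forward transform followed by the inverse recovers the frame.
x = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, -2.0, -3.0]
y = spectrum_to_frame(_radix2(x, -1))
```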
The present invention is not limited to the above-described preferred embodiments, but rather, the present invention is intended to cover all modifications, equivalents, and improvements falling within the spirit and scope of the present invention.

Claims (10)

1. A non-linear speech enhancement method based on correlation is characterized in that the method mainly comprises the following steps:
step a: performing a fast Fourier transform on the preprocessed noisy speech data $s_w(l)$ and the estimated noise data $n'_w(l)$ to obtain the spectrum $S(k)$ of the noisy speech frame and the spectrum $N'(k)$ of the estimated noise frame;
step b: calculating the signal-to-noise ratio and the attenuation gain to obtain the attenuation gain $\mathrm{gain}(k)$ (formula given as an image in the original; not recoverable here), wherein $\mathrm{Gain}_{opt}(k)$ is the optimal attenuation gain, $\mathrm{Gain\_floor}(k)$ is the lower limit of the attenuation gain, $\mathrm{MAX}(\cdot)$ is the usual take-the-larger operation, i.e.
$$\mathrm{MAX}(a,b) = \begin{cases} a, & a \ge b \\ b, & a < b \end{cases}$$
and $\lambda$ is a weighting coefficient;
step c: calculating the correlation between the noisy speech and the noise to obtain the cross-correlation function $\mathrm{CohSN}(k)$ of the noisy speech frame spectrum $S(k)$ and the estimated noise frame spectrum $N'(k)$ (formula given as an image in the original; not recoverable here), wherein $\mathrm{Spsd}(k)$ is the power spectrum of the noisy speech, $\mathrm{Npsd}(k)$ is the power spectrum of the estimated noise, $\mathrm{SNpsd}(k)$ is the cross power spectrum of the noisy speech frame spectrum $S(k)$ and the estimated noise frame spectrum $N'(k)$, the subscript $r$ denotes the real part of a complex quantity, and the subscript $i$ denotes its imaginary part;
step d: calculating the nonlinear attenuation gain to obtain the nonlinear attenuation gain $\mathrm{NlpGain}(k)$;
step e: speech enhancement processing, in which the attenuation gain $\mathrm{gain}(k)$ and the nonlinear attenuation gain $\mathrm{NlpGain}(k)$ of step d are applied jointly to the noisy speech frame spectrum $S(k)$ to obtain the clean speech signal spectrum $S_{out}(k)$.
2. The method of claim 1, wherein step e is further followed by a step f of applying the known inverse fast Fourier transform to the speech signal spectrum $S_{out}(k)$ to convert the signal from the frequency domain back to the time domain: $s_{out}(l) = \mathrm{IFFT}(S_{out}(k))$.
3. The method of claim 2, wherein step b further comprises the steps of: step b1: calculating the posterior signal-to-noise ratio $\mathrm{SNR}_{post}(k)$, $\mathrm{SNR}_{post}(k) = S(k)/N'(k) - 1.0$; step b2: calculating the signal-to-noise ratio update coefficient $\gamma(k)$, $\gamma(k) = \alpha + (1-\alpha)\times\left(S_{n-1}(k)/(S_{n-1}(k)+N'(k))\right)^2$, where $S_{n-1}(k)$ is the previous frame of noisy speech data and the parameter $\alpha$ can be chosen to suit the specific application scenario; step b3: calculating the a priori signal-to-noise ratio $\mathrm{SNR}_{prior}(k)$, $\mathrm{SNR}_{prior}(k) = \gamma(k)\cdot\mathrm{SNR}_{post}(k) + (1-\gamma(k))\cdot\left(S_{n-1}(k)/N'(k)\right)$; step b4: calculating the a priori signal-to-noise ratio ratio $\mathrm{SNR}_{ratio}(k)$ (formula given as an image in the original; not recoverable here); step b5: calculating the optimal attenuation gain $\mathrm{Gain}_{opt}(k)$ using a formula based on the hypergeometric function; step b6: calculating the lower limit of the attenuation gain $\mathrm{Gain\_floor}(k)$; step b7: calculating the attenuation gain $\mathrm{gain}(k)$.
4. The method of claim 3, wherein the value of the parameter $\alpha$ in step b2 is in the range [0.05, 0.30].
5. The method of claim 4, wherein the parameter $\alpha$ is 0.25.
6. The method of claim 4 or 5, wherein the optimal attenuation gain $\mathrm{Gain}_{opt}(k)$ in step b5 is given by a formula shown as an image in the original (not recoverable here), where $\theta(k) = \mathrm{SNR}_{ratio}(k)\cdot(1.0+\mathrm{SNR}_{post}(k))$, $\Gamma(\cdot)$ is the known gamma function (an accompanying expression is also given only as an image), $\exp(\cdot)$ is the exponential function with the natural constant $e$ as its base, and $I_0(\cdot)$ and $I_1(\cdot)$ are the Bessel functions of order 0 and order 1, respectively.
7. The method of claim 6, wherein the lower limit of the attenuation gain $\mathrm{Gain\_floor}(k)$ in step b6 is given by a formula shown as an image in the original (not recoverable here), where $\exp(\cdot)$ is the exponential function with the natural constant $e$ as its base.
8. The method of claim 7, wherein the attenuation gain $\mathrm{gain}(k)$ is given by a formula shown as an image in the original (not recoverable here), where $\lambda$ is the weighting coefficient, with a value in the range [0.60, 0.90].
9. The method of claim 8, wherein the nonlinear attenuation gain $\mathrm{NlpGain}(k)$ is calculated as $\mathrm{MIN}(1.0,\ 1.35 - \mathrm{CohSN}(k))$, where $\mathrm{MIN}(\cdot)$ is the usual take-the-smaller operation, i.e.
$$\mathrm{MIN}(a,b) = \begin{cases} a, & a \le b \\ b, & a > b \end{cases}$$
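10. The method of claim 9, wherein the nonlinear attenuation gain $\mathrm{NlpGain}(k)$ is calculated as:
$$\mathrm{NlpGain}(k) = \begin{cases} 1.0, & \mathrm{AvgCoh} < \mathrm{ThreCoh} \\ \mathrm{MIN}(1.0,\ 1.35 - \mathrm{CohSN}(k)), & \mathrm{AvgCoh} \ge \mathrm{ThreCoh} \end{cases}$$
where $\mathrm{AvgCoh}$ is the mean correlation of the frequency band and $\mathrm{ThreCoh}$ is the correlation threshold.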
CN201610079921.0A 2016-02-05 2016-02-05 Non-linear voice enhancement method based on correlation Active CN107045874B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610079921.0A CN107045874B (en) 2016-02-05 2016-02-05 Non-linear voice enhancement method based on correlation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610079921.0A CN107045874B (en) 2016-02-05 2016-02-05 Non-linear voice enhancement method based on correlation

Publications (2)

Publication Number Publication Date
CN107045874A (en) 2017-08-15
CN107045874B (en) 2021-03-02

Family

ID=59542672

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610079921.0A Active CN107045874B (en) 2016-02-05 2016-02-05 Non-linear voice enhancement method based on correlation

Country Status (1)

Country Link
CN (1) CN107045874B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108682429A (en) * 2018-05-29 2018-10-19 平安科技(深圳)有限公司 Sound enhancement method, device, computer equipment and storage medium
CN109410975B (en) * 2018-10-31 2021-03-09 歌尔科技有限公司 Voice noise reduction method, device and storage medium
CN110335618B (en) * 2019-06-06 2021-07-30 福建星网智慧软件有限公司 Method for improving nonlinear echo suppression and computer equipment
CN114047010A (en) * 2021-11-10 2022-02-15 宜昌达瑞机电科技有限公司 Vibration and noise detection method and device for automobile air conditioner and computer equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050025263A1 (en) * 2003-07-23 2005-02-03 Gin-Der Wu Nonlinear overlap method for time scaling
CN102065190A (en) * 2010-12-31 2011-05-18 杭州华三通信技术有限公司 Method and device for eliminating echo
CN104685903A (en) * 2012-10-09 2015-06-03 皇家飞利浦有限公司 Method and apparatus for audio interference estimation

Family Cites Families (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001352594A (en) * 2000-06-07 2001-12-21 Sony Corp Method and device for reducing wind sound
JP4282227B2 (en) * 2000-12-28 2009-06-17 日本電気株式会社 Noise removal method and apparatus
US20040024596A1 (en) * 2002-07-31 2004-02-05 Carney Laurel H. Noise reduction system
US7680656B2 (en) * 2005-06-28 2010-03-16 Microsoft Corporation Multi-sensory speech enhancement using a speech-state model
US8615393B2 (en) * 2006-11-15 2013-12-24 Microsoft Corporation Noise suppressor for speech recognition
KR20080075362A (en) * 2007-02-12 2008-08-18 인하대학교 산학협력단 A method for obtaining an estimated speech signal in noisy environments
CN101192411B (en) * 2007-12-27 2010-06-02 北京中星微电子有限公司 Large distance microphone array noise cancellation method and noise cancellation system
US9142221B2 (en) * 2008-04-07 2015-09-22 Cambridge Silicon Radio Limited Noise reduction
CN101673550A (en) * 2008-09-09 2010-03-17 联芯科技有限公司 Spectral gain calculating method and device and noise suppression system
CN101430882B (en) * 2008-12-22 2012-11-28 无锡中星微电子有限公司 Method and apparatus for restraining wind noise
CN101510426B (en) * 2009-03-23 2013-03-27 北京中星微电子有限公司 Method and system for eliminating noise
CN101582264A (en) * 2009-06-12 2009-11-18 瑞声声学科技(深圳)有限公司 Method and voice collecting system for speech enhancement
JP5629372B2 (en) * 2010-06-17 2014-11-19 ドルビー ラボラトリーズ ライセンシング コーポレイション Method and apparatus for reducing the effects of environmental noise on a listener
CN101976566B (en) * 2010-07-09 2012-05-02 瑞声声学科技(深圳)有限公司 Voice enhancement method and device using same
CN101976565A (en) * 2010-07-09 2011-02-16 瑞声声学科技(深圳)有限公司 Dual-microphone-based speech enhancement device and method
CN101894563B (en) * 2010-07-15 2013-03-20 瑞声声学科技(深圳)有限公司 Voice enhancing method
CN102347028A (en) * 2011-07-14 2012-02-08 瑞声声学科技(深圳)有限公司 Double-microphone speech enhancer and speech enhancement method thereof
FR2992459B1 (en) * 2012-06-26 2014-08-15 Parrot METHOD FOR DENOISING AN ACOUSTIC SIGNAL FOR A MULTI-MICROPHONE AUDIO DEVICE OPERATING IN A NOISY ENVIRONMENT
CN103730126B (en) * 2012-10-16 2017-04-05 联芯科技有限公司 Noise suppressing method and noise silencer
US9401746B2 (en) * 2012-11-27 2016-07-26 Nec Corporation Signal processing apparatus, signal processing method, and signal processing program
US9449615B2 (en) * 2013-11-07 2016-09-20 Continental Automotive Systems, Inc. Externally estimated SNR based modifiers for internal MMSE calculators
KR101535135B1 (en) * 2013-12-31 2015-07-24 서울대학교산학협력단 Method and system for speech enhancement using non negative matrix factorization and basis matrix update

Also Published As

Publication number Publication date
CN107045874A (en) 2017-08-15

Similar Documents

Publication Publication Date Title
CN108831499B (en) Speech enhancement method using speech existence probability
US7359838B2 (en) Method of processing a noisy sound signal and device for implementing said method
US8010355B2 (en) Low complexity noise reduction method
CN105280193B (en) Priori signal-to-noise ratio estimation method based on MMSE error criterion
CN107045874B (en) Non-linear voice enhancement method based on correlation
CN107610712B (en) Voice enhancement method combining MMSE and spectral subtraction
JP2021128328A (en) Method for enhancing telephone voice signal based on convolutional neural network
Mosayyebpour et al. Single-microphone early and late reverberation suppression in noisy speech
CN105489226A (en) Wiener filtering speech enhancement method for multi-taper spectrum estimation of pickup
Martín-Doñas et al. Dual-channel DNN-based speech enhancement for smartphones
Zou et al. Speech signal enhancement based on MAP algorithm in the ICA space
Ravi et al. A survey on speech enhancement methodologies
Li et al. Single-channel speech enhancement based on improved frame-iterative spectral subtraction in the modulation domain
Yamashita et al. Spectral subtraction iterated with weighting factors
Goel et al. Developments in spectral subtraction for speech enhancement
Alam et al. Speech enhancement based on novel two-step a priori SNR estimators.
Lu et al. Speech enhancement using hybrid gain factor in critical-band-wavelet-packet transform
Sulong et al. Speech enhancement based on wiener filter and compressive sensing
Jeub et al. Blind Dereverberation for Hearing Aids with Binaural Link.
Nemade et al. Performance comparison of single channel Speech enhancement techniques for personal Communication
Esch et al. Combined reduction of time varying harmonic and stationary noise using frequency warping
Rao et al. Speech enhancement using cross-correlation compensated multi-band wiener filter combined with harmonic regeneration
Islam et al. Enhancement of noisy speech based on decision-directed Wiener approach in perceptual wavelet packet domain
Jan et al. Joint blind dereverberation and separation of speech mixtures
Alam et al. COMPARATIVE STUDY OF A PRIORI SIGNAL-TO-NOISE RATIO (SNR) ESTIMATION APPROACHES FOR SPEECH ENHANCEMENT

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant