CN107910016A - Method for judging the noise tolerability of noisy speech

Info

Publication number: CN107910016A
Application number: CN201711372174.0A
Authority: CN
Inventors: 王亦红
Original assignee: Hohai University HHU
Current assignee: Hohai University HHU
Priority date: 2017-12-19
Filing date: 2017-12-19
Publication date: 2018-04-13
Anticipated expiration: 2037-12-19
Also published as: CN107910016B

Abstract

The invention discloses a method for judging the noise tolerability of noisy speech. Noisy speech samples are selected, the a priori signal-to-noise ratio (SNR) of every frequency bin of each sample signal is calculated, and the minimum a priori SNR of each sample is found. The minimum a priori SNRs of the samples are compared to determine the noise tolerability threshold for noisy speech signals. The noisy speech to be judged is framed and pre-emphasized, and endpoint detection is performed; the power spectra of the noise and of the noisy speech are estimated separately; and the a priori SNR of every frequency bin of every frame is calculated. If the a priori SNR of one or more frequency bins in a frame is below the threshold, the noise of that speech frame is not tolerable and speech enhancement is required. If the a priori SNR of every frequency bin in the frame is above the threshold, the noise of that speech frame is tolerable and no speech enhancement is required. The method thus identifies noisy speech whose noise is tolerable. Such speech is not enhanced, which avoids the loss of useful information.

Description

Method for judging the noise tolerability of noisy speech

Technical field

The present invention relates to a method for judging whether the noise in noisy speech is tolerable, and belongs to the technical field of noisy speech signal processing.

Background

In speech signal processing, speech enhancement improves the signal-to-noise ratio (SNR) of noisy speech, but it can also discard part of the useful speech information and distort the speech signal. According to the auditory masking effect of the human ear, the presence of a speech signal raises the threshold at which its background noise becomes audible. On this principle, the background noise of some noisy speech signals either cannot be heard by the human ear or, if heard, causes no discomfort. The present invention calls such background noise tolerable noise and provides a method for judging whether noise is tolerable.

Summary of the invention

Purpose of the invention: To address the problems in the prior art, the present invention provides a method for judging the noise tolerability of noisy speech. If the background noise of a noisy speech signal is judged tolerable, denoising can be skipped during signal processing. This avoids the loss of useful speech information caused by denoising and preserves as much useful information as possible.

Technical solution: A method for judging the noise tolerability of noisy speech, comprising two parts: determining a threshold, and making a judgment based on that threshold.

Part I: steps for determining the threshold

Step 1: record several segments of clean speech.

Step 2: from the NOISEX-92 noise database, extract noise samples from several scenes in each of four noise classes: impulse noise, broadband noise, periodic noise, and speech interference.

Step 3: add each noise sample to each clean speech signal recorded in step 1 to form the various noisy speech signals. Over the SNR range 0 dB to 20 dB, apply enhancement processing to each noisy speech signal at different SNRs.

Step 4: give a MOS (mean opinion score) rating to each noisy speech signal before and after speech enhancement.

Step 5: treat the same noisy speech at different SNRs as one group, find within the group the signals whose MOS ratings before and after speech enhancement converge, and take the one with the lowest SNR among them as the sample signal for that noisy speech.

Step 6: frame and pre-emphasize each noisy speech sample, then perform endpoint detection on it, and estimate the power spectra of the noise and of the noisy speech separately.

Step 7: calculate the a priori SNR of every frequency bin in every frame of each noisy speech sample according to formula (1):

[Formula (1): equation not reproduced in this text]

where ξ(n, k) is the a priori SNR of the k-th frequency bin of the n-th frame. The formula also uses the noisy speech power of the n-th frame at the k-th frequency bin, the noise power of the n-th frame at the k-th frequency bin, and the coefficient α, whose value is 0.98.

Find the minimum a priori SNR of each noisy speech sample. Compare the minimum a priori SNRs of the samples to determine the noise tolerability threshold for noisy speech. Following the above steps, the threshold determined by the present invention is 0.95-1.05.
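The equation for ξ(n, k) is not reproduced in the text above. One widely used estimator that combines the noisy speech power, the noise power, and a smoothing coefficient α close to 1 is the decision-directed estimate. The sketch below assumes that form, which may differ from the patent's formula (1). The function name `a_priori_snr`, the Wiener gain used to estimate the previous clean-speech power, and the first-frame initialization are all assumptions made for illustration.

```python
import numpy as np

def a_priori_snr(noisy_psd, noise_psd, alpha=0.98):
    """Decision-directed a priori SNR estimate (an ASSUMED form of formula (1)).

    noisy_psd: array (frames, bins), noisy speech power per frame and bin.
    noise_psd: array (bins,) or (frames, bins), noise power.
    Returns xi with the same shape as noisy_psd.
    """
    noisy_psd = np.asarray(noisy_psd, dtype=float)
    noise_psd = np.broadcast_to(np.asarray(noise_psd, dtype=float), noisy_psd.shape)
    eps = 1e-12                                   # guards against division by zero
    gamma = noisy_psd / (noise_psd + eps)         # a posteriori SNR
    xi = np.empty_like(noisy_psd)
    prev_clean = np.zeros(noisy_psd.shape[1])     # estimated clean power of the previous frame
    for n in range(noisy_psd.shape[0]):
        instant = np.maximum(gamma[n] - 1.0, 0.0)  # instantaneous SNR estimate
        if n == 0:
            xi[n] = instant                        # assumption: no history for the first frame
        else:
            xi[n] = alpha * prev_clean / (noise_psd[n - 1] + eps) + (1.0 - alpha) * instant
        gain = xi[n] / (1.0 + xi[n])               # Wiener gain (assumed clean-speech estimator)
        prev_clean = gain ** 2 * noisy_psd[n]
    return xi

# Toy input: constant noisy power 2 and noise power 1 in every bin.
xi = a_priori_snr(np.full((3, 4), 2.0), np.ones(4))
sample_minimum = xi.min()   # minimum a priori SNR of this (toy) sample
```

With α = 0.98 the estimate follows the previous frame closely, so the per-sample minimum is less sensitive to single-frame fluctuations than a purely instantaneous estimate would be.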

Part II: judging whether the noise of a noisy speech signal is tolerable

Step 1: frame and pre-emphasize the noisy speech signal.

Step 2: perform endpoint detection on the noisy speech signal.

Step 3: estimate the power spectra of the noise and of the noisy speech separately.

Step 4: calculate the a priori SNR of every frequency bin in every frame of the noisy speech according to formula (1).

Step 5: select a threshold within the range 0.95-1.05. If the a priori SNR of one or more frequency bins of a noisy speech frame is below the selected threshold, the noise of that frame is considered not tolerable and speech enhancement is required. Otherwise, the noise is considered tolerable and no enhancement is required.
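The preprocessing in the first three steps can be sketched as follows. The patent does not give the pre-emphasis coefficient, frame length, window, or endpoint detection rule, so the values 0.97, 256, 128, the Hamming window, and the simple energy-based detector are assumptions, as are all function names. The detector assumes the first few frames contain noise only.

```python
import numpy as np

def preemphasize(x, coeff=0.97):
    """First-order pre-emphasis: y[t] = x[t] - coeff * x[t-1] (coeff is an assumed value)."""
    x = np.asarray(x, dtype=float)
    return np.append(x[0], x[1:] - coeff * x[:-1])

def frame_signal(x, frame_len=256, hop=128):
    """Split x into overlapping Hamming-windowed frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

def detect_speech_frames(frames, n_init=5, factor=3.0):
    """Energy-based endpoint detection (assumed rule): a frame counts as speech when its
    energy exceeds `factor` times the mean energy of the first `n_init` frames."""
    energy = np.sum(frames ** 2, axis=1)
    return energy > factor * np.mean(energy[:n_init])

def power_spectra(frames, is_speech):
    """Return the noisy speech power per frame and the noise power averaged over non-speech frames."""
    psd = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    noise_psd = psd[~is_speech].mean(axis=0)
    return psd, noise_psd

# Synthetic stand-in: background noise, with a 1 kHz tone "speech" in the second half.
rng = np.random.default_rng(0)
fs = 8000
x = 0.01 * rng.standard_normal(4096)
x[2048:] += np.sin(2 * np.pi * 1000 * np.arange(2048) / fs)
frames = frame_signal(preemphasize(x))
is_speech = detect_speech_frames(frames)
noisy_psd, noise_psd = power_spectra(frames, is_speech)
```

The two power spectra returned here are the inputs that the a priori SNR calculation in step 4 needs.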

Brief description of the drawings

Fig. 1 is a flow chart of the method according to an embodiment of the present invention;

Fig. 2 is a waveform of the clean speech signal in the embodiment of the present invention;

Fig. 3 is a waveform of the noisy speech signal with an SNR of 15 dB in the embodiment of the present invention;

Fig. 4 is the waveform after filtering with the noise tolerability judgment in the embodiment of the present invention;

Fig. 5 is the waveform after filtering without the noise tolerability judgment in the embodiment of the present invention.

Detailed description

The present invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments only illustrate the invention and do not limit its scope. After reading the present disclosure, modifications of various equivalent forms made by those skilled in the art fall within the scope defined by the appended claims of this application.

The method for judging whether the noise of noisy speech is tolerable comprises two parts: determining the threshold, and making a judgment based on the threshold.

Part I: determining the threshold

Step 1: extracting noise samples from the NOISEX-92 noise database

Noise samples from different scenes are extracted from the NOISEX-92 noise database for four classes of additive noise: impulse noise, broadband noise, periodic noise, and speech interference. For the impulse noise class, applause and hammer knocking are selected as noise samples. For the broadband noise class, the noise inside a car traveling at 130 km/h, the noise in a noisy workshop, and street clamor are selected. For the periodic noise class, the sounds of an outdoor air-conditioner unit, an electric fan, and a hair dryer are selected. For the speech interference class, several people talking in an office and a single person talking are selected.

Step 2: forming and enhancing the noisy speech

Each noise sample is added to each of the following five clean speech segments: "The highest good is like water; great virtue carries all things", "Water can carry a boat and can also capsize it", "Sailing against the current, not to advance is to fall back", "One of the speech enhancement algorithms", and "The selected noise samples". This forms the various noisy speech signals, each of which is enhanced at different SNRs over the range 0 dB to 20 dB. A speech enhancement algorithm based on short-time average energy is used for the speech signals with hammer knocking and with applause. Spectral subtraction based on the auditory masking effect is used for the speech signals with the noise inside a car traveling at 130 km/h, with noisy-workshop noise, and with street noise. LMS adaptive filtering is used for the speech signals with hair dryer noise, electric fan noise, and outdoor air-conditioner unit noise. A comb filter is used for the speech signals with interference from one or several talkers.
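Forming noisy speech at a prescribed SNR amounts to scaling the noise before adding it. The sketch below assumes the clean recording and the noise sample are NumPy arrays at the same sampling rate. The function name `mix_at_snr`, the synthetic signals, and the 5 dB spacing of the SNR grid are illustrative assumptions, since the patent only states the range 0 dB to 20 dB.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db`, then add it to `clean`."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    # Repeat or trim the noise so that it covers the whole utterance.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain that brings the noise power to p_clean / 10^(snr_db / 10).
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 200 * np.arange(8000) / 8000.0)   # stand-in for a recorded utterance
noise = rng.standard_normal(8000)                            # stand-in for a NOISEX-92 sample
# Assumed SNR grid: 0, 5, 10, 15, 20 dB.
noisy_by_snr = {snr: mix_at_snr(clean, noise, snr) for snr in range(0, 21, 5)}
```

Each entry of `noisy_by_snr` would then be passed to the enhancement algorithm chosen for its noise class.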

Step 3: selecting the sample signals

A MOS rating is given to each noisy speech signal before and after speech enhancement. The same noisy speech signal at different SNRs is treated as one group. Within the group, the signals whose MOS ratings before and after speech enhancement converge are identified, and the one with the lowest SNR among them is taken as the sample for that noisy speech signal.
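The selection rule can be sketched as a small lookup. The patent does not say how close two MOS ratings must be to count as converging, so the tolerance `tol` is an assumption, as are the function name and the scores, which are hypothetical and not data from the patent.

```python
def select_sample_snr(mos_by_snr, tol=0.1):
    """mos_by_snr maps SNR in dB to (MOS before enhancement, MOS after enhancement).

    Returns the lowest SNR whose two ratings differ by at most `tol`
    (an assumed convergence criterion), or None if no entry qualifies.
    """
    converged = [snr for snr, (before, after) in mos_by_snr.items()
                 if abs(after - before) <= tol]
    return min(converged) if converged else None

# Hypothetical ratings for one noisy speech type at five SNRs.
scores = {0: (1.8, 2.9), 5: (2.4, 3.2), 10: (3.1, 3.5), 15: (3.9, 3.95), 20: (4.3, 4.3)}
sample_snr = select_sample_snr(scores)
```

Choosing the lowest converged SNR picks the noisiest signal for which enhancement no longer changes perceived quality, which is the boundary case the threshold is meant to capture.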

Step 4: determining the threshold from the minimum a priori SNR of the sample signals

The a priori SNR of every frequency bin in every frame of each noisy speech sample is calculated according to formula (1):

[Formula (1): equation not reproduced in this text]

where ξ(n, k) is the a priori SNR of the k-th frequency bin of the n-th frame. The formula also uses the noisy speech power of the n-th frame at the k-th frequency bin, the noise power of the n-th frame at the k-th frequency bin, and the coefficient α, whose value is 0.98.

The minimum a priori SNR of each noisy speech sample is found. The minimum a priori SNRs of the samples are compared, and the noise tolerability threshold for noisy speech is determined to be 0.95-1.05.
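The text does not spell out how the per-sample minima are compared. One plausible reading, assumed here, is that the threshold range is the spread of those minima across the sample signals. The function names and the small ξ matrices below are hypothetical and only illustrate the bookkeeping.

```python
import numpy as np

def minimum_a_priori_snr(xi):
    """Smallest a priori SNR over all frames and frequency bins of one sample."""
    return float(np.min(xi))

def threshold_range(xi_per_sample):
    """Lowest and highest of the per-sample minima (an assumed way of comparing them)."""
    minima = [minimum_a_priori_snr(xi) for xi in xi_per_sample]
    return min(minima), max(minima)

# Hypothetical a priori SNR matrices (frames x bins) for three sample signals.
samples = [np.array([[1.20, 0.97], [3.0, 1.5]]),
           np.array([[1.05, 2.00], [4.1, 1.3]]),
           np.array([[2.20, 1.80], [1.01, 6.0]])]
low, high = threshold_range(samples)
```

Under this reading, a value anywhere between `low` and `high` can serve as the working threshold, matching the way the embodiment later picks 0.95 from the stated range.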

Part II: judging whether the noise of a noisy speech signal is tolerable

A clean speech signal of a female speaker saying "One of the speech enhancement algorithms" is recorded; its waveform is shown in Fig. 2. The additive noise is the factory2 factory noise from the NOISEX-92 standard noise database. Together they form the noisy speech signal with an SNR of 15 dB whose waveform is shown in Fig. 3. The noisy speech signal of Fig. 3 is Wiener filtered twice, once without and once with the noise tolerability judgment. The specific steps are as follows:

First, the noisy speech signal is pre-emphasized, windowed, and framed, and endpoint detection is performed.

Second, the power spectra of the noise and of the noisy speech are estimated separately.

Then, the a priori SNR of every frequency bin in every frame of the noisy speech is calculated according to formula (1).

Finally, from the threshold range 0.95-1.05, the value 0.95 is taken as the threshold for the noise tolerability judgment. If the a priori SNR of one or more frequency bins of a noisy speech frame is below 0.95, the noise of that frame is considered not tolerable and Wiener filtering is required. Otherwise, the noise is considered tolerable and no filtering is required. The noisy speech signal of Fig. 3 is processed frame by frame in this way, and the processed waveform is shown in Fig. 4. For comparison, the noisy speech signal of Fig. 3 is also Wiener filtered directly, without the tolerability judgment; the result is shown in Fig. 5.
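The frame-by-frame gating can be sketched as a gain mask. The sketch follows the rule stated in the abstract and claim 1 (a frame needs filtering when one or more bins fall below the threshold) and uses the textbook Wiener gain ξ / (1 + ξ), which is an assumption about the filter actually used in the embodiment. Bins exactly equal to the threshold are treated as tolerable, which the patent leaves unspecified. The function name and the ξ values are hypothetical.

```python
import numpy as np

def gated_wiener_gain(xi, threshold=0.95):
    """Per-frame, per-bin spectral gain with the noise tolerability judgment.

    xi: array (frames, bins) of a priori SNR values.
    Tolerable frames (every bin >= threshold) are left unfiltered (gain 1).
    All other frames receive the Wiener gain xi / (1 + xi).
    """
    xi = np.asarray(xi, dtype=float)
    needs_filter = np.any(xi < threshold, axis=1)   # one or more bins below the threshold
    gain = np.ones_like(xi)
    gain[needs_filter] = xi[needs_filter] / (1.0 + xi[needs_filter])
    return gain, needs_filter

xi = np.array([[1.2, 3.0, 0.96],    # every bin above 0.95: tolerable, left as is
               [2.0, 0.40, 5.0]])   # one bin below 0.95: Wiener filter this frame
gain, needs_filter = gated_wiener_gain(xi)
```

Multiplying each frame's spectrum by its row of `gain` and resynthesizing would give the gated result; setting `threshold` to infinity reproduces the ungated comparison, in which every frame is filtered.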

Claims

1. A method for judging the noise tolerability of noisy speech, characterized in that: noise samples from several scenes are extracted; the noise samples are added to clean speech signals to form noisy speech signals; enhancement processing is applied to the noisy speech; a MOS rating is given to each noisy speech signal before and after speech enhancement; the noisy speech signal is framed and pre-emphasized, and endpoint detection is performed on it; next, the power spectra of the noise and of the noisy speech are estimated separately; the a priori SNR of every frequency bin of every frame of the noisy speech signal is calculated; if the a priori SNR of every frequency bin is above the threshold, the noise of that speech frame is tolerable and no speech enhancement is required; if the a priori SNR of one or more frequency bins is below the threshold, the noise of that speech frame is not tolerable and speech enhancement is required.
2. The method for judging the noise tolerability of noisy speech as claimed in claim 1, characterized in that noise samples from several scenes are extracted from the NOISEX-92 noise database in each of four noise classes: impulse noise, broadband noise, periodic noise, and speech interference.
3. The method for judging the noise tolerability of noisy speech as claimed in claim 1, characterized in that each noise sample is added to each clean speech signal recorded in the first step to form the various noisy speech signals; and, over the SNR range 0 dB to 20 dB, enhancement processing is applied to each noisy speech signal at different SNRs.
4. The method for judging the noise tolerability of noisy speech as claimed in claim 1, characterized in that the same noisy speech at different SNRs is treated as one group, the signals in the group whose MOS ratings before and after speech enhancement converge are identified, and the one with the lowest SNR among them is taken as the sample signal for that noisy speech; the noisy speech sample is framed and pre-emphasized, endpoint detection is then performed on it, and the power spectra of the noise and of the noisy speech are estimated separately.
5. The method for judging the noise tolerability of noisy speech as claimed in claim 1, characterized in that the a priori SNR of every frequency bin in every frame of each noisy speech sample is calculated; the minimum a priori SNRs of the samples are then compared to determine the noise tolerability threshold for noisy speech signals.
6. The method for judging the noise tolerability of noisy speech as claimed in claim 1, characterized in that the threshold range is 0.95-1.05 dB.