CN107910016B - Noise tolerance judgment method for noisy speech - Google Patents

Noise tolerance judgment method for noisy speech

Info

Publication number
CN107910016B
Authority
CN
China
Prior art keywords
noise
voice
signal
sample
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711372174.0A
Other languages
Chinese (zh)
Other versions
CN107910016A (en)
Inventor
王亦红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU
Priority to CN201711372174.0A
Publication of CN107910016A
Application granted
Publication of CN107910016B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
  • Noise Elimination (AREA)

Abstract

The invention discloses a method for judging the noise tolerance of noisy speech. Noisy speech samples are selected, the prior signal-to-noise ratio (SNR) at each frequency point of each noisy speech sample is calculated, and the minimum prior SNR of each sample is found; the minimum prior SNRs of the samples are then compared to determine the noise tolerance threshold for noisy speech signals. The noisy speech to be judged is framed and pre-emphasized, and endpoint detection is performed; the power spectra of the noise and of the noisy speech are estimated separately; and the prior SNR at each frequency point of each frame is calculated. If the prior SNR at one or more frequency points is less than the threshold, the noise of that speech frame is not tolerable and speech enhancement is required. If the prior SNR at every frequency point of the frame is greater than the threshold, the noise of that frame is tolerable and speech enhancement is not needed. The method thus identifies noisy speech whose noise is tolerable. Such speech needs no enhancement, which avoids the loss of useful information.

Description

Noise tolerance judgment method for noisy speech
Technical Field
The invention relates to a method for judging whether the noise in noisy speech is tolerable, and belongs to the technical field of noisy speech signal processing.
Background
In speech signal processing, speech enhancement improves the signal-to-noise ratio of noisy speech, but it also discards part of the useful speech information and so distorts the speech signal. Owing to the auditory masking effect of the human ear, the presence of a speech signal can raise the threshold at which its background noise becomes audible. On this principle, the background noise of some noisy speech signals either cannot be heard by the human ear or, if heard, causes no discomfort. The present invention calls such background noise tolerable noise, and provides a method for judging whether noise is tolerable.
Disclosure of Invention
Purpose of the invention: to address the problems in the prior art, the invention provides a method for judging the noise tolerance of noisy speech. If the background noise of a noisy speech signal is judged to be tolerable, the noise need not be removed during signal processing, which avoids the loss of useful speech information caused by noise removal and retains that information to the greatest possible extent.
Technical scheme: a method for judging the noise tolerance of noisy speech, comprising two parts: determining a threshold, and making a judgment based on the threshold.
Part one: determining the threshold
First, several segments of clean speech are recorded.
Second, noise samples from a number of scenes are extracted for each of four noise types in the Noisex-92 noise library: impulse noise, broadband noise, periodic noise and speech interference.
Third, each noise sample is added to each clean speech signal recorded in the first step to form a variety of noisy speech signals. Each noisy speech signal, at different signal-to-noise ratios within the range 0 dB to 20 dB, is then enhanced.
Fourth, MOS (mean opinion score) scoring is carried out on each noisy speech signal before and after speech enhancement.
Fifth, noisy speech signals of the same kind at different signal-to-noise ratios are treated as a group; within each group, the signals whose MOS scores before and after speech enhancement converge are identified, and the one with the lowest signal-to-noise ratio is taken as the noisy speech sample.
Sixth, each noisy speech sample is framed and pre-emphasized, endpoint detection is performed on it, and the power spectra of the noise and of the noisy speech are estimated separately.
Seventh, the prior signal-to-noise ratio at each frequency point in each frame of each noisy speech sample is calculated:
[Formula (1): equation image BDA0001513956630000021, not reproduced in this text]
where:
ξ(n, k) is the prior signal-to-noise ratio at the k-th frequency point of the n-th frame;
[symbol image BDA0001513956630000022] is the noisy speech power at the k-th frequency point of the n-th frame;
[symbol image BDA0001513956630000023] is the noise power at the k-th frequency point of the n-th frame;
α = 0.98.
The minimum prior signal-to-noise ratio of each noisy speech sample is then found. The minimum prior signal-to-noise ratios of the samples are compared to determine the noise tolerance threshold for noisy speech. Following these steps, the threshold determined by the invention is 0.95 to 1.05.
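The signal preparation in the sixth step can be sketched as follows. This is a minimal illustration and not the patent's implementation: the pre-emphasis coefficient 0.97, the Hamming window, the frame length and hop size, and every function name are assumptions, since the text specifies none of them. The noise estimate simply averages the power spectra of frames that endpoint detection has already marked as non-speech, which is likewise an assumed strategy. A naive DFT is used so that the sketch needs only the standard library; in practice an FFT routine such as `numpy.fft.rfft` would normally take its place.

```python
import cmath
import math


def pre_emphasis(signal, coeff=0.97):
    """Apply y[t] = x[t] - coeff * x[t-1]; coeff=0.97 is an assumed value."""
    if not signal:
        return []
    return [signal[0]] + [signal[t] - coeff * signal[t - 1]
                          for t in range(1, len(signal))]


def split_frames(signal, frame_len=256, hop=128):
    """Cut the signal into overlapping Hamming-windowed frames (assumed sizes)."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + i] * window[i] for i in range(frame_len)])
    return frames


def power_spectrum(frame):
    """Return |DFT|^2 for bins 0..N/2 using a naive O(N^2) DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mean_power_spectrum(spectra):
    """Average per-bin power over frames, e.g. frames labelled as noise only."""
    return [sum(column) / len(spectra) for column in zip(*spectra)]
```

The per-frame noisy speech power and the averaged noise power produced this way are the two inputs that the prior signal-to-noise ratio in the seventh step is defined on.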
Part two: judging whether the noise of a noisy speech signal is tolerable
First, the noisy speech signal is framed and pre-emphasized.
Second, endpoint detection is performed on the noisy speech signal.
Third, the power spectra of the noise and of the noisy speech are estimated separately.
Fourth, the prior signal-to-noise ratio at each frequency point in each frame of the noisy speech is calculated according to formula (1).
Fifth, a threshold is selected within the range 0.95 to 1.05. If the prior signal-to-noise ratio of each frequency point of the noisy speech frame is smaller than the selected threshold, the noise of that frame is considered not tolerable and speech enhancement is required. Otherwise, the noise is considered tolerable and no enhancement is needed.
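The per-frame judgment in the fifth step can be sketched as below. The abstract phrases the rule as "one or more frequency points" falling below the threshold, whereas this step speaks of "each frequency point"; the hypothetical `rule` parameter covers both readings rather than choosing between them. The function name and parameters are illustrative, and the threshold is assumed to be expressed in the same units as the prior signal-to-noise ratio.

```python
def frame_needs_enhancement(prior_snr_bins, threshold=0.95, rule="any"):
    """Decide whether one frame's noise is intolerable.

    prior_snr_bins: prior SNR values, one per frequency point of the frame.
    threshold:      value chosen from the range 0.95 to 1.05 given in the text.
    rule:           "any" -> intolerable if at least one bin is below threshold
                    (the abstract's wording);
                    "all" -> intolerable only if every bin is below threshold.
    """
    if not 0.95 <= threshold <= 1.05:
        raise ValueError("threshold should lie in the range 0.95 to 1.05")
    if rule not in ("any", "all"):
        raise ValueError("rule must be 'any' or 'all'")
    below = [xi < threshold for xi in prior_snr_bins]
    return any(below) if rule == "any" else all(below)


# One bin (0.90) is below 0.95, so the two readings disagree on this frame.
frame = [2.0, 1.2, 0.90]
print(frame_needs_enhancement(frame, rule="any"))  # True
print(frame_needs_enhancement(frame, rule="all"))  # False
```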
Drawings
FIG. 1 is a flow diagram of the method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a clean speech signal waveform according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the waveform of a noisy speech signal with a signal-to-noise ratio of 15 dB according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the filtered waveform with the tolerance judgment applied, according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the filtered waveform without the tolerance judgment, according to an embodiment of the present invention.
Detailed Description
The present invention is further illustrated by the following examples, which are intended to be purely exemplary and are not intended to limit the scope of the invention, as various equivalent modifications of the invention will occur to those skilled in the art upon reading the present disclosure and fall within the scope of the appended claims.
The method for judging the noise tolerance of noisy speech comprises two parts: determining the threshold, and making a judgment based on the threshold.
Part one: determining the threshold
First, noise samples are extracted from the Noisex-92 noise library
Noise samples from different scenes are extracted from the Noisex-92 noise library for each of four types of additive noise: impulse noise, broadband noise, periodic noise and speech interference. For impulse noise, clapping and hammer-knocking sounds are selected as noise samples. For broadband noise, the noise inside a car travelling at 130 km/h, the noise in a noisy workshop and the noise on a road are selected. For periodic noise, the sounds of an air conditioner outdoor unit, an electric fan and an electric hair dryer are selected. For speech interference, multi-person speech in an office and single-person speech are selected.
Second, the formation and enhancement of noisy speech
Each noise sample is added to each of the following five clean speech signals: "The highest good is like water; great virtue carries all things", "Water can carry a boat and can also capsize it", "When rowing against the current, not to advance is to fall back", "speech enhancement algorithm" and "selected noise samples". This forms a variety of noisy speech signals, each of which is enhanced at different signal-to-noise ratios within the range 0 dB to 20 dB. A short-time energy mean speech enhancement algorithm is used for speech with hammer-knocking and clapping noise. Spectral subtraction based on the auditory masking effect is used for speech with the noise inside a car travelling at 130 km/h, noisy workshop noise and road noise. LMS adaptive filtering is used for speech with electric hair dryer, electric fan and air conditioner outdoor unit noise. A comb filter is used for speech with single-person or multi-person speech interference.
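Forming a noisy signal at a prescribed signal-to-noise ratio amounts to scaling the noise before adding it. The sketch below shows one common way to do this; the function name, the looping of a short noise clip, and the 5 dB step of the example grid are assumptions, as the text only gives the 0 dB to 20 dB range.

```python
import math


def mix_at_snr(clean, noise, snr_db):
    """Scale noise so that 10*log10(P_clean / P_noise) equals snr_db, then add it."""
    n = len(clean)
    # Repeat the noise clip if it is shorter than the speech (an assumption).
    noise = [noise[i % len(noise)] for i in range(n)]
    p_clean = sum(x * x for x in clean) / n
    p_noise = sum(x * x for x in noise) / n
    if p_noise == 0.0:
        raise ValueError("noise sample has zero power")
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * d for c, d in zip(clean, noise)]


# Synthetic stand-ins for a recorded utterance and a noise sample.
clean = [math.sin(0.1 * t) for t in range(1000)]
noise = [0.3 if t % 2 else -0.3 for t in range(400)]

# One noisy version per SNR in an assumed 5 dB grid over 0 dB to 20 dB.
noisy_by_snr = {snr: mix_at_snr(clean, noise, snr) for snr in range(0, 21, 5)}
```

Each entry of `noisy_by_snr` would then be passed to the enhancement algorithm chosen for that noise type.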
Third, selecting sample signal
MOS (mean opinion score) scoring is carried out on each noisy speech signal before and after speech enhancement. Noisy speech signals of the same kind at different signal-to-noise ratios are treated as a group; within each group, the signals whose MOS scores before and after enhancement converge are identified, and the one with the lowest signal-to-noise ratio is taken as the sample of that noisy speech signal.
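The selection rule can be sketched as follows. The text does not say how close two MOS scores must be to count as converged, so the tolerance `tol` is a hypothetical parameter, and the scores in the example are made-up values used only to exercise the function.

```python
def pick_sample_snr(mos_by_snr, tol=0.1):
    """Return the lowest SNR whose MOS before and after enhancement converge.

    mos_by_snr: {snr_db: (mos_before, mos_after)} for one group of noisy speech.
    tol:        assumed maximum |mos_after - mos_before| that counts as converged.
    Returns None when no signal in the group converges.
    """
    converged = [snr for snr, (before, after) in mos_by_snr.items()
                 if abs(after - before) <= tol]
    return min(converged) if converged else None


# Illustrative scores only, not measurements from the patent.
group = {
    0: (1.8, 2.9),
    5: (2.4, 3.2),
    10: (3.1, 3.5),
    15: (3.9, 3.95),
    20: (4.3, 4.3),
}
print(pick_sample_snr(group))  # 15
```

When enhancement no longer changes the perceived quality, the noise at that signal-to-noise ratio is already tolerable, which is why the lowest converged SNR marks the boundary case.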
Fourth, threshold is determined based on sample signal with minimum prior signal-to-noise ratio
Respectively calculating the prior signal-to-noise ratio of each frequency point in each frame of each noisy speech sample according to the following formula:
[Formula (1): equation image BDA0001513956630000031, not reproduced in this text]
where:
ξ(n, k) is the prior signal-to-noise ratio at the k-th frequency point of the n-th frame;
[symbol image BDA0001513956630000041] is the noisy speech power at the k-th frequency point of the n-th frame;
[symbol image BDA0001513956630000042] is the noise power at the k-th frequency point of the n-th frame;
α = 0.98.
The minimum prior signal-to-noise ratio of each noisy speech sample is then found. The minimum prior signal-to-noise ratios of the samples are compared, and the noise tolerance threshold for noisy speech is determined to be 0.95 to 1.05.
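A sketch of this step is given below under two loudly stated assumptions. First, the patent's formula survives here only as an image placeholder, so the recursion in `prior_snr` is a simplified smoothing in the spirit of the decision-directed approach, built from the two quantities the text defines and α = 0.98; it is not claimed to be the patent's formula (1). Second, reading the threshold range as the spread of the per-sample minima is one plausible interpretation of "comparing the minimum prior signal-to-noise ratio of each sample". All names are illustrative.

```python
def prior_snr(noisy_power, noise_power, alpha=0.98, floor=1e-12):
    """Estimate xi[n][k] from per-frame, per-bin power values.

    Assumed recursion (NOT taken from the patent):
        xi(n, k) = alpha * xi(n-1, k) + (1 - alpha) * max(Py(n, k) / Pd(n, k) - 1, 0)
    The first frame uses the instantaneous term alone.
    """
    xi, prev = [], None
    for py, pd in zip(noisy_power, noise_power):
        inst = [max(y / max(d, floor) - 1.0, 0.0) for y, d in zip(py, pd)]
        if prev is None:
            cur = inst
        else:
            cur = [alpha * p + (1.0 - alpha) * i for p, i in zip(prev, inst)]
        xi.append(cur)
        prev = cur
    return xi


def min_prior_snr(xi):
    """Smallest prior SNR over all frames and frequency points of one sample."""
    return min(min(frame) for frame in xi)


def threshold_range(samples_xi):
    """Spread of the per-sample minima (assumed reading of the comparison step)."""
    minima = [min_prior_snr(xi) for xi in samples_xi]
    return min(minima), max(minima)
```

For example, `threshold_range([[[1.2, 0.95]], [[1.05, 3.0]]])` returns `(0.95, 1.05)`: the two samples have minima 0.95 and 1.05 respectively.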
Part two: judging whether the noise of a noisy speech signal is tolerable
A clean recording of a female speaker saying "speech enhancement algorithm" is used; its waveform is shown in FIG. 2. The factory2 noise from the Noisex-92 standard noise library is taken as the additive noise, forming the noisy speech signal with a signal-to-noise ratio of 15 dB shown in FIG. 3. Wiener filtering is applied to the noisy speech signal of FIG. 3 both with and without the noise tolerance judgment. The specific implementation steps are as follows:
First, the noisy speech signal is pre-emphasized, windowed and framed, and endpoint detection is performed.
Second, the power spectra of the noise and of the noisy speech are estimated separately.
Then, the prior signal-to-noise ratio at each frequency point in each frame of the noisy speech is calculated according to formula (1).
Finally, from the threshold range 0.95 to 1.05, the value 0.95 is taken as the threshold for the noise tolerance judgment. If the prior signal-to-noise ratio of each frequency point of the noisy speech frame is less than 0.95, the noise of that frame is considered not tolerable and Wiener filtering is required. Otherwise, the noise is considered tolerable and no filtering is needed. The noisy speech signal shown in FIG. 3 is processed frame by frame in this way, and the resulting waveform is shown in FIG. 4. For comparison, the noisy speech signal shown in FIG. 3 is also Wiener filtered without the tolerance judgment, and the result is shown in FIG. 5.
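The frame-by-frame processing of this embodiment can be sketched as a gate in front of a Wiener filter. The gain ξ/(1+ξ) is the standard Wiener gain expressed through the prior signal-to-noise ratio; the patent text does not state which form of the Wiener filter it uses, so that choice is an assumption, as are the function name and the `rule` parameter, which selects between "at least one frequency point below the threshold" and "every frequency point below the threshold".

```python
def gated_wiener(frames_spectra, frames_xi, threshold=0.95, rule="any"):
    """Wiener-filter only the frames whose noise is judged intolerable.

    frames_spectra: list of frames, each a list of complex spectral bins.
    frames_xi:      matching prior SNR values per frame and bin.
    Returns (output_spectra, filtered_flags).
    """
    output, flags = [], []
    for spectrum, xi in zip(frames_spectra, frames_xi):
        below = [x < threshold for x in xi]
        intolerable = any(below) if rule == "any" else all(below)
        if intolerable:
            # Standard Wiener gain per bin: G = xi / (1 + xi).
            spectrum = [s * x / (1.0 + x) for s, x in zip(spectrum, xi)]
        output.append(list(spectrum))  # tolerable frames pass through unchanged
        flags.append(intolerable)
    return output, flags


spectra = [[1 + 0j, 2 + 0j], [1 + 0j, 2 + 0j]]
xi = [[3.0, 1.0], [3.0, 0.5]]
out, flags = gated_wiener(spectra, xi)
print(flags)  # [False, True]
```

Leaving tolerable frames untouched is what distinguishes the result in FIG. 4 from the ungated filtering in FIG. 5: only frames judged intolerable are attenuated, so speech in the remaining frames is not altered by the filter.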

Claims (1)

1. A noise tolerance judging method of a voice with noise is characterized by comprising two parts of threshold value determination and threshold value-based judgment, and specifically comprises the following steps:
(1) a step of determining a threshold:
(1.1) recording several segments of pure voice signals;
(1.2) respectively extracting a plurality of scene noise samples from four types of noise signals, namely impulse noise, broadband noise, periodic noise and voice interference in a noise library of Noisex-92;
(1.3) respectively adding each noise sample into each pure voice signal recorded in the first step to form various voice signals with noise, and enhancing each voice with noise based on different signal-to-noise ratios within the range of 0dB to 20 dB;
(1.4) respectively carrying out MOS scoring on each noisy voice before and after voice enhancement;
(1.5) taking the same noisy speech based on different signal-to-noise ratios as a group, finding out signals with converged MOS scoring before and after speech enhancement, and taking the signal with the relatively lowest signal-to-noise ratio as a sample signal of the noisy speech;
(1.6) carrying out framing and pre-emphasis processing on the voice sample with noise, then carrying out end point detection on the voice sample with noise, and respectively carrying out power spectrum estimation on the noise and the voice with noise;
(1.7) respectively calculating the prior signal-to-noise ratio of each frequency point in each frame of each noisy speech sample:
[Formula (1): equation image FDA0003091594650000011, not reproduced in this text]
where ξ(n, k) is the prior signal-to-noise ratio at the k-th frequency point of the n-th frame;
[symbol image FDA0003091594650000012] is the noisy speech power at the k-th frequency point of the n-th frame;
[symbol image FDA0003091594650000013] is the noise power at the k-th frequency point of the n-th frame; the value of α is 0.98;
finding out the minimum prior signal-to-noise ratio of each voice sample with noise, comparing the minimum prior signal-to-noise ratios of the samples, and determining the threshold value of the noise tolerance of the voice sample with noise, wherein based on the steps, the threshold value determined by the invention is 0.95-1.05;
(2) judging whether the noise of the voice signal with noise is tolerable or not:
(2.1) carrying out framing and pre-emphasis processing on the noisy speech signal;
(2.2) carrying out endpoint detection on the noisy speech signal;
(2.3) respectively carrying out power spectrum estimation on the noise and the voice with the noise;
(2.4) respectively calculating the prior signal-to-noise ratio of each frequency point in each frame of the voice with noise according to the formula (1);
(2.5) selecting a threshold value within the range of 0.95-1.05, and if the prior signal-to-noise ratio of each frequency point of the voice frame with noise is less than the selected threshold value, considering that the noise of the voice frame with noise is not tolerable and that voice enhancement is needed; otherwise, considering the noise tolerable, with no enhancement needed.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711372174.0A CN107910016B (en) 2017-12-19 2017-12-19 Noise tolerance judgment method for noisy speech

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711372174.0A CN107910016B (en) 2017-12-19 2017-12-19 Noise tolerance judgment method for noisy speech

Publications (2)

Publication Number Publication Date
CN107910016A CN107910016A (en) 2018-04-13
CN107910016B true CN107910016B (en) 2021-07-27

Family

ID=61870324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711372174.0A Active CN107910016B (en) 2017-12-19 2017-12-19 Noise tolerance judgment method for noisy speech

Country Status (1)

Country Link
CN (1) CN107910016B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108831500B (en) * 2018-05-29 2023-04-28 平安科技(深圳)有限公司 Speech enhancement method, device, computer equipment and storage medium
CN109920434B (en) * 2019-03-11 2020-12-15 南京邮电大学 Noise classification removal method based on conference scene

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103730110A (en) * 2012-10-10 2014-04-16 北京百度网讯科技有限公司 Method and device for detecting voice endpoint
CN104869209A (en) * 2015-04-24 2015-08-26 广东小天才科技有限公司 Method and apparatus for adjusting recording of mobile terminal
CN105810201A (en) * 2014-12-31 2016-07-27 展讯通信(上海)有限公司 Voice activity detection method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013073230A (en) * 2011-09-29 2013-04-22 Renesas Electronics Corp Audio encoding device

Also Published As

Publication number Publication date
CN107910016A (en) 2018-04-13

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant