CN107910016B - Noise tolerance judgment method for noisy speech - Google Patents
- Publication number
- CN107910016B (application CN201711372174.0A)
- Authority
- CN
- China
- Prior art keywords
- noise
- voice
- signal
- sample
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
- Noise Elimination (AREA)
Abstract
The invention discloses a method for judging the noise tolerance of noisy speech. Noisy speech samples are selected, the a priori signal-to-noise ratio of each frequency point of each noisy speech sample is calculated, and the minimum a priori signal-to-noise ratio of each sample is found; the minima of the samples are then compared to determine the noise-tolerance threshold for noisy speech signals. The noisy speech to be judged is framed and pre-emphasized, and endpoint detection is performed; the power spectra of the noise and of the noisy speech are estimated; and the a priori signal-to-noise ratio of each frequency point of each frame is calculated. If the a priori signal-to-noise ratio of one or more frequency points is less than the threshold, the noise of that speech frame is not tolerable, and speech enhancement is required. If the a priori signal-to-noise ratio of every frequency point of the frame is greater than the threshold, the noise of that frame is tolerable, and speech enhancement is not needed. The method can therefore identify noisy speech whose noise is tolerable. Such speech need not be enhanced, which avoids the loss of useful information.
Description
Technical Field
The invention relates to a method for judging whether the noise in noisy speech is tolerable, and belongs to the technical field of noisy speech signal processing.
Background
In speech signal processing, speech enhancement improves the signal-to-noise ratio of noisy speech but also discards part of the useful speech information, which distorts the speech signal. According to the auditory masking effect of the human ear, the presence of a speech signal can raise the threshold at which its background noise becomes audible. On this principle, the background noise of some noisy speech signals may not be heard at all or, even if heard, may not cause discomfort. The present invention calls such background noise tolerable noise, and provides a method for judging whether noise is tolerable.
Disclosure of Invention
Purpose of the invention: in view of the problems in the prior art, the invention provides a method for judging the noise tolerance of noisy speech. If the background noise of a noisy speech signal is judged to be tolerable, the noise need not be removed during signal processing, which avoids the loss of useful speech information caused by noise removal and retains that information to the greatest extent.
Technical scheme: a method for judging the noise tolerance of noisy speech, comprising two parts: threshold determination and threshold-based judgment.
Part one: determining the threshold
First, several segments of clean speech are recorded.
Second, noise samples from several scenes are extracted from the Noisex-92 noise library for each of four noise types: impulse noise, broadband noise, periodic noise and speech interference.
Third, each noise sample is added to each clean speech signal recorded in the first step to form a variety of noisy speech signals. Each noisy speech signal is generated at different signal-to-noise ratios within the range 0 dB to 20 dB and is then enhanced.
Fourth, each noisy speech signal is given a MOS score before and after speech enhancement.
Fifth, noisy speech signals of the same kind at different signal-to-noise ratios are treated as a group; within the group, the signals whose MOS scores before and after enhancement converge are identified, and the one with the lowest signal-to-noise ratio is taken as the noisy speech sample.
Sixth, each noisy speech sample is framed and pre-emphasized, endpoint detection is performed on it, and the power spectra of the noise and of the noisy speech are estimated.
Seventh, the a priori signal-to-noise ratio of each frequency point in each frame of each noisy speech sample is calculated according to formula (1) (the formula is an image in the original publication and is not reproduced in this text), where:
ξ(n, k) is the a priori signal-to-noise ratio of the k-th frequency point of the n-th frame;
α takes the value 0.98.
The minimum a priori signal-to-noise ratio of each noisy speech sample is found. The minima of the samples are compared to determine the noise-tolerance threshold for noisy speech. Following these steps, the threshold determined by the invention is 0.95 to 1.05.
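Because formula (1) is not reproduced in this text, the following is only a sketch of how the seventh step might look, assuming the widely used decision-directed estimator with a smoothing factor α = 0.98. The function names, the first-frame initialization and the stationary noise estimate are illustrative assumptions, not the patent's own definitions.

```python
def a_priori_snr(noisy_power, noise_power, alpha=0.98):
    """Decision-directed a priori SNR (an assumed stand-in for formula (1)).

    noisy_power[n][k]: power of the noisy speech at frame n, frequency point k.
    noise_power[k]:    estimated noise power at frequency point k (assumed > 0
                       and constant over frames, which is a simplification).
    Returns xi[n][k].
    """
    xi = []
    prev_clean_power = None  # clean-speech power estimate of the previous frame
    for frame in noisy_power:
        xi_frame, clean_power = [], []
        for k, p_y in enumerate(frame):
            p_d = noise_power[k]
            # Instantaneous estimate: a posteriori SNR minus one, floored at 0.
            inst = max(p_y / p_d - 1.0, 0.0)
            if prev_clean_power is None:
                x = inst  # first frame: no history (an assumed initialization)
            else:
                x = alpha * prev_clean_power[k] / p_d + (1.0 - alpha) * inst
            gain = x / (1.0 + x)                   # Wiener gain from xi
            clean_power.append(gain * gain * p_y)  # feeds the next frame
            xi_frame.append(x)
        xi.append(xi_frame)
        prev_clean_power = clean_power
    return xi


def min_a_priori_snr(xi):
    """Smallest a priori SNR over all frames and frequency points of one sample."""
    return min(min(frame) for frame in xi)
```

Calling `min_a_priori_snr` on each noisy speech sample gives the per-sample minima that the text compares to arrive at the threshold range.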
Part two: judging whether the noise of a noisy speech signal is tolerable
First, the noisy speech signal is framed and pre-emphasized.
Second, endpoint detection is performed on the noisy speech signal.
Third, the power spectra of the noise and of the noisy speech are estimated.
Fourth, the a priori signal-to-noise ratio of each frequency point in each frame of the noisy speech is calculated according to formula (1).
Fifth, a threshold is selected within the range 0.95 to 1.05. If the a priori signal-to-noise ratio of each frequency point of a noisy speech frame is smaller than the selected threshold, the noise of that frame is considered not tolerable, and speech enhancement is required. Otherwise, the noise is considered tolerable and no enhancement is needed.
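The frame-by-frame decision can be sketched as follows. This is a minimal illustration that follows the rule as stated in the abstract (a frame needs enhancement as soon as one frequency point falls below the threshold); treating a value equal to the threshold as tolerable is an assumption, and `frames_needing_enhancement` is a hypothetical name.

```python
def frames_needing_enhancement(xi, threshold=0.95):
    """Return the indices of frames whose noise is judged not tolerable.

    xi[n][k] is the a priori SNR of frame n at frequency point k.
    The threshold must lie in the 0.95 to 1.05 range given in the text.
    """
    if not 0.95 <= threshold <= 1.05:
        raise ValueError("threshold must be within 0.95 to 1.05")
    # Rule taken from the abstract: one frequency point below the threshold
    # is enough to mark the whole frame for enhancement.
    return [n for n, frame in enumerate(xi) if any(x < threshold for x in frame)]
```

Frames that are not returned are left untouched, which is how the method avoids discarding useful speech information.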
Drawings
FIG. 1 is a block diagram of a method flow of an embodiment of the present invention;
FIG. 2 is a schematic diagram of a pure speech signal waveform according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the waveform of a noisy speech signal with a signal-to-noise ratio of 15 dB according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the waveform after filtering with the tolerance judgment applied according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the waveform after filtering without the tolerance judgment according to an embodiment of the present invention.
Detailed Description
The present invention is further illustrated by the following examples, which are intended to be purely exemplary and are not intended to limit the scope of the invention, as various equivalent modifications of the invention will occur to those skilled in the art upon reading the present disclosure and fall within the scope of the appended claims.
The method for judging the noise tolerance of noisy speech comprises two parts: threshold determination and threshold-based judgment.
Part one: determination of the threshold
First, noise samples are extracted from the Noisex-92 noise library
Noise samples from different scenes are extracted from the Noisex-92 noise library for each of four types of additive noise: impulse noise, broadband noise, periodic noise and speech interference. For impulse noise, clapping and hammer-knocking sounds are selected as noise samples. For broadband noise, the noise inside a car travelling at 130 km/h, the noise in a noisy workshop and road noise are selected. For periodic noise, the sounds of an air-conditioner outdoor unit, an electric fan, an electric hair dryer and the like are selected. For speech interference, multi-person speech in an office and single-person speech are selected.
Second, the formation and enhancement of noisy speech
Each noise sample is added to each of the following five clean speech signals: "the highest good is like water; great virtue carries all things", "water can carry a boat and can also capsize it", "a boat sailing against the current falls back if it does not advance", "speech enhancement algorithm" and "selected noise samples". This forms a variety of noisy speech signals, which are generated at different signal-to-noise ratios within the range 0 dB to 20 dB and then enhanced. Speech with hammer-knocking or clapping noise is enhanced with a short-time energy mean speech enhancement algorithm. Speech with in-car noise at 130 km/h, noisy-workshop noise or road noise is enhanced by spectral subtraction based on the auditory masking effect. Speech with hair-dryer, electric-fan or air-conditioner outdoor unit noise is enhanced by LMS adaptive filtering. Speech with single-person or multi-person speech interference is enhanced with a comb filter.
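Forming noisy speech at a chosen signal-to-noise ratio amounts to scaling the noise before adding it. The sketch below shows one common way to do this, assuming the SNR is defined over the whole utterance; the source does not say how the mixing was actually carried out, and both function names are illustrative.

```python
import math


def mix_at_snr(clean, noise, target_snr_db):
    """Add `noise` to `clean` so that the mixture has the requested SNR in dB."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(d * d for d in noise) / len(noise)
    if p_noise == 0:
        raise ValueError("noise has zero power")
    # SNR_dB = 10 * log10(p_clean / (g^2 * p_noise)), solved for the gain g.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (target_snr_db / 10)))
    return [x + gain * d for x, d in zip(clean, noise)]


def measure_snr_db(clean, noisy):
    """SNR in dB of `noisy` relative to the known clean signal."""
    p_clean = sum(x * x for x in clean)
    p_resid = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10 * math.log10(p_clean / p_resid)
```

Looping `target_snr_db` over values between 0 and 20 produces the group of same-noise signals at different SNRs that the next step scores.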
Third, selecting the sample signals
Each noisy speech signal is given a MOS score before and after speech enhancement. Noisy speech signals of the same kind at different signal-to-noise ratios are treated as a group; the signals whose MOS scores before and after enhancement converge are identified, and the one with the lowest signal-to-noise ratio is taken as the noisy speech sample.
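The selection rule can be illustrated with a small sketch. The source does not define "converge" numerically, so the tolerance `tol` below is a hypothetical parameter, and the MOS values in the usage note are invented purely for illustration.

```python
def pick_sample_snr(mos_by_snr, tol=0.1):
    """Pick the sample SNR for one group of same-noise signals.

    mos_by_snr maps an SNR in dB to (mos_before, mos_after).
    Returns the lowest SNR whose two MOS scores differ by at most `tol`,
    or None if no pair converges. `tol` is an assumed convergence criterion.
    """
    converged = [snr for snr, (before, after) in mos_by_snr.items()
                 if abs(after - before) <= tol]
    return min(converged) if converged else None
```

For example, with the made-up scores `{0: (1.8, 2.9), 5: (2.4, 3.2), 10: (3.1, 3.6), 15: (3.9, 3.95), 20: (4.3, 4.3)}`, enhancement stops helping from 15 dB upward, so the 15 dB signal would be taken as the sample.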
Fourth, the threshold is determined from the minimum a priori signal-to-noise ratio of the sample signals
The a priori signal-to-noise ratio of each frequency point in each frame of each noisy speech sample is calculated according to formula (1) (the formula is an image in the original publication and is not reproduced in this text), where:
ξ(n, k) is the a priori signal-to-noise ratio of the k-th frequency point of the n-th frame;
α takes the value 0.98.
The minimum a priori signal-to-noise ratio of each noisy speech sample is found. The minima of the samples are compared, and the noise-tolerance threshold for noisy speech is determined to be 0.95 to 1.05.
Part two: judging whether the noise of a noisy speech signal is tolerable
A clean speech signal of a female speaker saying "speech enhancement algorithm" is recorded; its waveform is shown in FIG. 2. The factory2 noise from the Noisex-92 standard noise library is used as the additive noise, forming a noisy speech signal with a signal-to-noise ratio of 15 dB, as shown in FIG. 3. Wiener filtering is applied to the noisy speech signal of FIG. 3 both with and without the noise tolerance judgment. The specific implementation steps are as follows:
first, the noisy speech signal is pre-emphasized, windowed and framed, and endpoint detection is performed;
second, the power spectra of the noise and of the noisy speech are estimated;
then, the a priori signal-to-noise ratio of each frequency point in each frame of the noisy speech is calculated according to formula (1);
finally, from the threshold range 0.95 to 1.05, the value 0.95 is taken as the threshold for the noise tolerance judgment. If the a priori signal-to-noise ratio of each frequency point of a noisy speech frame is less than 0.95, the noise of that frame is considered not tolerable, and Wiener filtering is required. Otherwise, the noise is considered tolerable and no filtering is needed. The noisy speech signal shown in FIG. 3 is processed frame by frame in this way, and the processed waveform is shown in FIG. 4. For comparison, the noisy speech signal shown in FIG. 3 is also Wiener filtered without the tolerance judgment, and the result is shown in FIG. 5.
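The front end and the gated filtering of this embodiment can be sketched as below. This is an illustration under stated assumptions: the pre-emphasis coefficient 0.97, the Hamming window, the frame length and hop are common defaults that the source does not specify; the Wiener gain is written in its textbook form ξ/(1 + ξ); and the gate follows the abstract's rule that one frequency point below the threshold marks the frame for filtering.

```python
import math


def pre_emphasize(x, coeff=0.97):
    """y[t] = x[t] - coeff * x[t-1]; coeff is an assumed, commonly used value."""
    if not x:
        return []
    return [x[0]] + [x[t] - coeff * x[t - 1] for t in range(1, len(x))]


def frame_and_window(x, frame_len=256, hop=128):
    """Split x into overlapping frames and apply a Hamming window to each."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    return [[x[start + i] * window[i] for i in range(frame_len)]
            for start in range(0, len(x) - frame_len + 1, hop)]


def gated_wiener_gains(xi_frame, threshold=0.95):
    """Per-frequency-point gains for one frame.

    A frame judged tolerable passes through unchanged (gain 1.0); otherwise
    the textbook Wiener gain xi / (1 + xi) is applied at each frequency point.
    """
    if all(x >= threshold for x in xi_frame):
        return [1.0] * len(xi_frame)
    return [x / (1.0 + x) for x in xi_frame]
```

Multiplying each frame's spectrum by these gains and resynthesizing would give a result in the spirit of FIG. 4, while applying the Wiener gain to every frame regardless of the gate corresponds to the comparison in FIG. 5.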
Claims (1)
1. A noise tolerance judging method of a voice with noise is characterized by comprising two parts of threshold value determination and threshold value-based judgment, and specifically comprises the following steps:
(1) a step of determining a threshold:
(1.1) recording several segments of pure voice signals;
(1.2) respectively extracting a plurality of scene noise samples from four types of noise signals, namely impulse noise, broadband noise, periodic noise and voice interference in a noise library of Noisex-92;
(1.3) respectively adding each noise sample into each pure voice signal recorded in the first step to form various voice signals with noise, and enhancing each voice with noise based on different signal-to-noise ratios within the range of 0dB to 20 dB;
(1.4) respectively carrying out MOS scoring on each noisy voice before and after voice enhancement;
(1.5) taking the same noisy speech based on different signal-to-noise ratios as a group, finding out signals with converged MOS scoring before and after speech enhancement, and taking the signal with the relatively lowest signal-to-noise ratio as a sample signal of the noisy speech;
(1.6) carrying out framing and pre-emphasis processing on the voice sample with noise, then carrying out end point detection on the voice sample with noise, and respectively carrying out power spectrum estimation on the noise and the voice with noise;
(1.7) respectively calculating the prior signal-to-noise ratio of each frequency point in each frame of each noisy speech sample:
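where ξ(n, k) is the a priori signal-to-noise ratio of the k-th frequency point of the n-th frame; the two power terms of the formula are the power of the noisy speech at the k-th frequency point of the n-th frame and the noise power at the k-th frequency point of the n-th frame (the formula and the symbols of these terms are images in the original publication and are not reproduced in this text); and α takes the value 0.98;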
finding out the minimum prior signal-to-noise ratio of each voice sample with noise, comparing the minimum prior signal-to-noise ratios of the samples, and determining the threshold value of the noise tolerance of the voice sample with noise, wherein based on the steps, the threshold value determined by the invention is 0.95-1.05;
(2) judging whether the noise of the voice signal with noise is tolerable or not:
(2.1) carrying out framing and pre-emphasis processing on the noisy speech signal;
(2.2) carrying out endpoint detection on the noisy speech signal;
(2.3) respectively carrying out power spectrum estimation on the noise and the voice with the noise;
(2.4) respectively calculating the prior signal-to-noise ratio of each frequency point in each frame of the voice with noise according to the formula (1);
(2.5) selecting a threshold value within the range of 0.95-1.05, and if the prior signal-to-noise ratio of each frequency point of the voice frame with noise is less than the selected threshold value, considering that the noise of the voice frame with noise is not tolerable and needing voice enhancement; otherwise, it is considered as being acceptable without enhancement.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711372174.0A CN107910016B (en) | 2017-12-19 | 2017-12-19 | Noise tolerance judgment method for noisy speech |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711372174.0A CN107910016B (en) | 2017-12-19 | 2017-12-19 | Noise tolerance judgment method for noisy speech |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107910016A (en) | 2018-04-13 |
CN107910016B (en) | 2021-07-27 |
Family
ID=61870324
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711372174.0A Active CN107910016B (en) | 2017-12-19 | 2017-12-19 | Noise tolerance judgment method for noisy speech |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107910016B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108831500B (en) * | 2018-05-29 | 2023-04-28 | 平安科技(深圳)有限公司 | Speech enhancement method, device, computer equipment and storage medium |
CN109920434B (en) * | 2019-03-11 | 2020-12-15 | 南京邮电大学 | Noise classification removal method based on conference scene |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103730110A (en) * | 2012-10-10 | 2014-04-16 | 北京百度网讯科技有限公司 | Method and device for detecting voice endpoint |
CN104869209A (en) * | 2015-04-24 | 2015-08-26 | 广东小天才科技有限公司 | Method and apparatus for adjusting recording of mobile terminal |
CN105810201A (en) * | 2014-12-31 | 2016-07-27 | 展讯通信(上海)有限公司 | Voice activity detection method and system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2013073230A (en) * | 2011-09-29 | 2013-04-22 | Renesas Electronics Corp | Audio encoding device |
Also Published As
Publication number | Publication date |
---|---|
CN107910016A (en) | 2018-04-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108831499B (en) | Speech enhancement method using speech existence probability | |
CN109817209B (en) | Intelligent voice interaction system based on double-microphone array | |
CN106875938B (en) | Improved nonlinear self-adaptive voice endpoint detection method | |
US9558755B1 (en) | Noise suppression assisted automatic speech recognition | |
CN107863099B (en) | Novel double-microphone voice detection and enhancement method | |
CN110970053A (en) | Multichannel speaker-independent voice separation method based on deep clustering | |
CN105280193B (en) | Priori signal-to-noise ratio estimation method based on MMSE error criterion | |
CN107610712B (en) | Voice enhancement method combining MMSE and spectral subtraction | |
CN107910016B (en) | Noise tolerance judgment method for noisy speech | |
CN111091833A (en) | Endpoint detection method for reducing noise influence | |
CN103021405A (en) | Voice signal dynamic feature extraction method based on MUSIC and modulation spectrum filter | |
CN112951257A (en) | Audio image acquisition equipment and speaker positioning and voice separation method | |
CN105702262A (en) | Headset double-microphone voice enhancement method | |
CN110827847A (en) | Microphone array voice denoising and enhancing method with low signal-to-noise ratio and remarkable growth | |
Ramirez et al. | Voice activity detection with noise reduction and long-term spectral divergence estimation | |
CN112259117B (en) | Target sound source locking and extracting method | |
CN111225317B (en) | Echo cancellation method | |
May et al. | Generalization of supervised learning for binary mask estimation | |
Koldovský et al. | CHiME data separation based on target signal cancellation and noise masking | |
KR20030010432A (en) | Apparatus for speech recognition in noisy environment | |
KR100571427B1 (en) | Feature Vector Extraction Unit and Inverse Correlation Filtering Method for Speech Recognition in Noisy Environments | |
CN115410593A (en) | Audio channel selection method, device, equipment and storage medium | |
TWI749547B (en) | Speech enhancement system based on deep learning | |
Nakatani et al. | Reduction of Highly Nonstationary Ambient Noise by Integrating Spectral and Locational Characteristics of Speech and Noise for Robust ASR. | |
Lu et al. | Reduction of musical residual noise using hybrid median filter |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||