CN103824562A - Psychological acoustic model-based voice post-perception filter - Google Patents
- Publication number
- CN103824562A (application CN201410046572.3A; granted as CN103824562B)
- Authority
- CN
- China
- Prior art keywords
- perception
- voice
- filter
- noise
- perceptual filter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention relates to a speech post-perceptual filter based on a psychoacoustic model. The perceptual filter does not need to be fused into each enhancement algorithm, so it does not affect the complexity of those algorithms, yet it achieves the same improvement in auditory perception. Because it focuses on re-processing already-enhanced speech, it further improves the perceived quality of the enhanced signal; even when residual noise remains and the signal-to-noise ratio is not improved, the post-perceptual filter can still improve auditory perception. The filter is derived under the constraint that speech distortion is minimized while residual noise is, as far as possible, kept inaudible to the human ear. Its gain is obtained by constructing a cost function containing the masking threshold under this constraint, and is further optimized by a perceptual normalization factor built from the masking threshold. This avoids excessive signal attenuation and ensures minimal perceptual distortion of the enhanced speech.
Description
Technical field
The present invention relates to a speech post-perceptual filter based on a psychoacoustic model.
Background technology
At present, the various speech-enhancement algorithms can remove noise to some extent, but residual noise and musical noise more or less remain, degrading speech quality, so further suppression is needed. Moreover, the final evaluation of speech quality depends on human auditory perception, so speech-enhancement research should exploit the perceptual characteristics of the human auditory system, namely the masking effect of the ear, which naturally suppresses noise falling below the masking threshold. This allows the enhanced speech to reduce listening fatigue as much as possible, improve auditory perception, and thereby raise overall speech quality. Combining the masking effect of human hearing with speech enhancement therefore plays a very important role.
In recent years, many researchers have studied speech enhancement based on the masking effect of the human ear and obtained good results. However, these algorithms are all built by fusing the masking model into some other enhancement algorithm, which makes the original algorithm more complicated because of the added masking computation, and may even prevent real-time implementation. To address this problem, the present invention proposes a post-perceptual filter based on the masking effect and applies it to speech enhancement.
Summary of the invention
To address the residual noise that remains in enhanced speech and degrades its perceived auditory quality, the present invention proposes a post-perceptual filter based on a psychoacoustic model and applies it to speech enhancement. First, the perceptual filter does not need to be fused into each enhancement algorithm, so it does not affect algorithm complexity, yet it achieves the same improvement in auditory perception. Second, because it operates as a re-processing stage on already-enhanced speech, it further improves perceived quality; even when noise remains and the signal-to-noise ratio does not improve, the post-filter can still improve auditory perception. The post-perceptual filter is derived under the constraint that speech distortion is minimized while residual noise is, as far as possible, inaudible to the human ear. Its gain is obtained by constructing a cost function containing the masking threshold under this constraint, and is further optimized by a perceptual normalization factor built from the masking threshold; this avoids excessive signal attenuation and ensures minimal perceptual distortion of the enhanced speech.
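The masking threshold that drives the filter comes from a psychoacoustic model, which the text does not spell out. The sketch below is a deliberately simplified, Johnston-style estimate: the Hz-to-Bark mapping is the standard Zwicker approximation, but the exponential spreading function and the fixed 10 dB offset are illustrative assumptions, not the patent's model.

```python
import numpy as np

def masking_threshold(frame, fs=8000, nfft=160):
    """Simplified per-frame masking-threshold estimate.
    Johnston-style in spirit; the spreading function and the fixed
    10 dB offset below the spread masker energy are assumptions."""
    spec = np.abs(np.fft.rfft(frame, nfft)) ** 2          # power spectrum
    freqs = np.fft.rfftfreq(nfft, 1.0 / fs)
    # Hz -> Bark (standard Zwicker approximation)
    bark = 13.0 * np.arctan(0.00076 * freqs) + 3.5 * np.arctan((freqs / 7500.0) ** 2)
    thresh = np.empty_like(spec)
    for i in range(len(spec)):
        # crude exponential roll-off of masking across the Bark axis
        spread = 10.0 ** (-2.5 * np.abs(bark - bark[i]))
        thresh[i] = 0.1 * np.sum(spread * spec)           # 10 dB below spread energy
    return thresh
```

Any calibrated psychoacoustic model (e.g. the one used in MPEG audio coding) can replace this stand-in without changing the rest of the scheme.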
As shown in Fig. 1, the specific scheme is:
1) The noisy speech is enhanced by spectral subtraction (any enhancement method may be substituted), and the masking threshold of each frame is then computed frame by frame according to the psychoacoustic model.
2) The masking threshold obtained in the first step is used to build a cost function whose objective is to keep residual noise inaudible to the human ear, as far as possible, under the constraint of minimal speech distortion:
J = P(ε_s) + μ(P(ε_r) − E[T_k]) = |G − 1|²·E[|S_k|²] + μ(|G|²·E[|N_k|²] − E[T_k])

where ε_s = S_k(G − 1) is the speech distortion and ε_r = N_k·G is the residual noise. Since speech and noise are uncorrelated, E(N_k·S_k) = 0; P(ε_s) is the power of the speech distortion and P(ε_r) the power of the residual noise.
3) The gain of the perceptual filter is solved by minimizing the cost function under this constraint.
4) To avoid excessive signal attenuation, the perceptual filter is then corrected by the perceptual normalization factor, ensuring minimal perceptual distortion of the enhanced speech.
The perceptual normalization factor is built from the masking threshold, where T_min(l) is the minimum value in the l-th frame and T_max(l) is the maximum value in the l-th frame; from it the gain G_k of the final perceptual filter is obtained.
5) The enhanced speech is finally obtained.
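Setting ∂J/∂G = 0 in the cost function of step 2) gives 2(G − 1)·E[|S_k|²] + 2μG·E[|N_k|²] = 0, i.e. the Wiener-like gain G = E[|S_k|²] / (E[|S_k|²] + μ·E[|N_k|²]). The sketch below implements steps 3) and 4) under that derivation; since the source omits the exact normalization formula, the correction built from T_min(l) and T_max(l) is an assumed form, not the patent's.

```python
import numpy as np

def perceptual_postfilter_gain(speech_psd, noise_psd, mask_thresh, mu=1.0):
    """Steps 3)-4): gain from minimizing J, then a perceptual correction.

    Minimizing J = |G-1|^2 E[|S_k|^2] + mu*(|G|^2 E[|N_k|^2] - E[T_k])
    over G yields G = S / (S + mu*N). The normalization below, built from
    the frame's masking-threshold extrema, is an assumed form.
    """
    g = speech_psd / (speech_psd + mu * noise_psd + 1e-12)
    # normalize the masking threshold to [0, 1] within the frame
    t_min, t_max = mask_thresh.min(), mask_thresh.max()
    alpha = (mask_thresh - t_min) / (t_max - t_min + 1e-12)
    # where masking is strong, residual noise is inaudible: relax attenuation
    g = g ** (1.0 - alpha)
    return np.clip(g, 0.0, 1.0)
```

The per-bin gains are applied to the enhanced spectrum before the inverse transform and overlap-add.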
Description of the drawings
The above and other aspects and advantages of the present invention will become clearer from the following detailed description of exemplary embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a schematic diagram of the speech post-perceptual filter based on a psychoacoustic model of the present invention;
Fig. 2 compares the results of SS and WF before and after adding the perceptual filter under a white-noise background;
Fig. 3 compares the results of SS and WF before and after adding the perceptual filter under a train-noise background.
Embodiment
Hereinafter, the present invention is described more fully with reference to the accompanying drawings, in which various embodiments are shown. The invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art.
Exemplary embodiments of the present invention are described in more detail below with reference to the accompanying drawings.
The post-perceptual filter was added in MATLAB to speech enhanced by spectral subtraction (SS) and Wiener filtering (WF), and experimental simulations were run. The test speech is an English male utterance from the 863 corpus: "The birch canoe slid on the smooth planks."; the sampling rate is 8 kHz, the frame length K is 160 samples, and the frame overlap is 50%. The noise signals are white Gaussian noise and train noise from the NOISEX-92 database, added to clean speech to form the noisy speech. The SNRs of the white-Gaussian-noise mixtures are −10 dB, −5 dB, 0 dB, 5 dB, and 10 dB; the SNRs of the train-noise mixtures are 0 dB, 5 dB, 10 dB, and 15 dB. The aim of the simulation is to compare SNR (Signal-to-Noise Ratio) and PESQ (Perceptual Evaluation of Speech Quality) for spectral subtraction and Wiener filtering before and after adding the post-perceptual filter; the experimental results are shown in Figs. 2 and 3.
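The SNR half of this comparison can be reproduced as below, using the framing stated above (160-sample frames, 50% overlap); PESQ requires an external implementation of ITU-T P.862 (e.g. the reference code or a wrapper package), so it is not sketched here.

```python
import numpy as np

def snr_db(clean, processed):
    """Overall SNR of a processed signal against the clean reference."""
    noise = clean - processed
    return 10.0 * np.log10(np.sum(clean ** 2) / (np.sum(noise ** 2) + 1e-12))

def seg_snr_db(clean, processed, frame_len=160, hop=80):
    """Segmental SNR with the stated framing: K = 160, 50% overlap."""
    vals = []
    for start in range(0, len(clean) - frame_len + 1, hop):
        c = clean[start:start + frame_len]
        p = processed[start:start + frame_len]
        vals.append(snr_db(c, p))
    return float(np.mean(vals))
```

Segmental SNR weights every frame equally and so tracks perceived quality somewhat better than the overall SNR, though neither captures masking effects the way PESQ does.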
As can be seen from Fig. 2, under the white-noise background, after the perceptual filter is added to spectral subtraction and Wiener filtering, the SNR changes only slightly up or down, but the PESQ value improves overall, for example at 10 dB. This confirms the design philosophy of the perceptual filter: noise may remain and the SNR may even decrease slightly, yet auditory perception improves. In addition, the situation under the train-noise background in Fig. 3 is essentially the same, showing that regardless of the noise background or the underlying enhancement algorithm, this design meets the requirements of a perceptual filter and of the human auditory system. This also demonstrates the validity of the newly proposed perceptual filter and its applicability to speech enhancement.
The foregoing is merely an embodiment of the present invention and is not intended to limit it; the invention admits various suitable changes and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (1)
1. A speech post-perceptual filter based on a psychoacoustic model, characterized in that, in said filter:
1) the noisy speech is enhanced by a continuously differentiable spectral subtraction, and the masking threshold of each frame is then computed frame by frame according to the psychoacoustic model;
2) the masking threshold obtained in the first step is used to build the cost function:
J = P(ε_s) + μ(P(ε_r) − E[T_k]) = |G − 1|²·E[|S_k|²] + μ(|G|²·E[|N_k|²] − E[T_k])

where ε_s = S_k(G − 1) is the speech distortion, ε_r = N_k·G is the residual noise, E(N_k·S_k) = 0, P(ε_s) is the power of the speech distortion, and P(ε_r) the power of the residual noise;
3) the gain of the perceptual filter is solved by minimizing the cost function;
4) the perceptual filter is then corrected by the perceptual normalization factor, where T_min(l) is the minimum value in the l-th frame and T_max(l) is the maximum value in the l-th frame, giving the gain G_k of the final perceptual filter;
5) the enhanced speech is finally obtained.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410046572.3A CN103824562B (en) | 2014-02-10 | 2014-02-10 | The rearmounted perceptual filter of voice based on psychoacoustic model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103824562A true CN103824562A (en) | 2014-05-28 |
CN103824562B CN103824562B (en) | 2016-08-17 |
Family
ID=50759584
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410046572.3A Expired - Fee Related CN103824562B (en) | 2014-02-10 | 2014-02-10 | The rearmounted perceptual filter of voice based on psychoacoustic model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103824562B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6477489B1 (en) * | 1997-09-18 | 2002-11-05 | Matra Nortel Communications | Method for suppressing noise in a digital speech signal |
EP1619793A1 (en) * | 2004-07-20 | 2006-01-25 | Harman Becker Automotive Systems GmbH | Audio enhancement system and method |
CN101505447A (en) * | 2008-02-07 | 2009-08-12 | 奥迪康有限公司 | Method of estimating weighting function of audio signals in a hearing aid |
CN101636648A (en) * | 2007-03-19 | 2010-01-27 | 杜比实验室特许公司 | Speech enhancement employing a perceptual model |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105869649A (en) * | 2015-01-21 | 2016-08-17 | 北京大学深圳研究院 | Perceptual filtering method and perceptual filter |
CN105869649B (en) * | 2015-01-21 | 2020-02-21 | 北京大学深圳研究院 | Perceptual filtering method and perceptual filter |
CN109036466A (en) * | 2018-08-01 | 2018-12-18 | 太原理工大学 | The emotion dimension PAD prediction technique of Emotional Speech identification |
CN109036466B (en) * | 2018-08-01 | 2022-11-29 | 太原理工大学 | Emotion dimension PAD prediction method for emotion voice recognition |
CN109979478A (en) * | 2019-04-08 | 2019-07-05 | 网易(杭州)网络有限公司 | Voice de-noising method and device, storage medium and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN103824562B (en) | 2016-08-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103778920B | Fusion method of speech enhancement and frequency response compensation in a digital hearing aid | |
CN110473567B (en) | Audio processing method and device based on deep neural network and storage medium | |
JP7258182B2 (en) | Speech processing method, device, electronic device and computer program | |
CN105611477B | Speech enhancement algorithm combining deep and wide neural networks in a digital hearing aid | |
CN103236260B (en) | Speech recognition system | |
CN103236263B | Method, system and mobile terminal for improving speech quality | |
CN112767963B (en) | Voice enhancement method, device and system and computer readable storage medium | |
AU2010204470B2 (en) | Automatic sound recognition based on binary time frequency units | |
CN105741849A | Speech enhancement method fusing phase estimation and human auditory characteristics in a digital hearing aid | |
DE602007001338D1 | Speech recognition with speaker adaptation based on fundamental-frequency classification | |
CN103761974B (en) | Cochlear implant | |
CN108335702A | Noise reduction method based on a deep neural network | |
WO2020186742A1 (en) | Voice recognition method applied to ground-air communication | |
CN104505100A | Unsupervised speech enhancement method based on robust non-negative matrix factorization and data fusion | |
CN106878851A | Active noise-reduction earphone based on channel compensation and speech recognition | |
Abdullah et al. | Towards more efficient DNN-based speech enhancement using quantized correlation mask | |
CN103824562A (en) | Psychological acoustic model-based voice post-perception filter | |
Min et al. | Mask estimate through Itakura-Saito nonnegative RPCA for speech enhancement | |
CN102314883B (en) | Music noise judgment method and voice noise elimination method | |
CN104778948A (en) | Noise-resistant voice recognition method based on warped cepstrum feature | |
CN106658323A (en) | Dual microphone noise reduction system and method for cochlear implants and hearing aids | |
Sun et al. | An RNN-based speech enhancement method for a binaural hearing aid system | |
CN104703108B | Dynamic range compression algorithm for a digital hearing aid under noise conditions | |
Lee et al. | Citear: A two-stage end-to-end system for noisy-reverberant hearing-aid processing | |
Liu et al. | Speech enhancement based on the integration of fully convolutional network, temporal lowpass filtering and spectrogram masking |
Legal Events

Date | Code | Title | Description
---|---|---|---
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| C14 | Grant of patent or utility model | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20160817; Termination date: 20180210 |