CN103824562B - Speech post-positioned perceptual filter based on a psychoacoustic model - Google Patents

Speech post-positioned perceptual filter based on a psychoacoustic model

Info

Publication number
CN103824562B
CN103824562B (application CN201410046572.3A)
Authority
CN
China
Prior art keywords
voice
noise
perceptual filter
filter
post-positioned
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410046572.3A
Other languages
Chinese (zh)
Other versions
CN103824562A (en)
Inventor
贾海蓉
李鸿燕
武奕峰
张雪英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taiyuan University of Technology
Original Assignee
Taiyuan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taiyuan University of Technology filed Critical Taiyuan University of Technology
Priority to CN201410046572.3A priority Critical patent/CN103824562B/en
Publication of CN103824562A publication Critical patent/CN103824562A/en
Application granted granted Critical
Publication of CN103824562B publication Critical patent/CN103824562B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Soundproofing, Sound Blocking, And Sound Damping (AREA)

Abstract

The present invention relates to a speech post-positioned perceptual filter based on a psychoacoustic model. First, this perceptual filter does not need to be merged into each enhancement algorithm, so it does not affect the complexity of the algorithm, yet it achieves the same effect of improving auditory perception. Second, because it only reprocesses the already-enhanced speech, it further improves the auditory perception of the enhanced speech: even when noise remains and the signal-to-noise ratio is not improved, the post-positioned perceptual filter can still improve auditory perception. The post-positioned perceptual filter is built on the condition that, while keeping the speech signal distortion minimal, the residual noise cannot be heard by the human ear; the filter gain is obtained from a cost function containing the masking threshold constructed under this condition, and is further optimized by a perceptual normalization factor built from the masking threshold, the purpose of which is to avoid over-attenuating the signal and to ensure that the perceptual distortion of the enhanced speech is minimal.

Description

Speech post-positioned perceptual filter based on a psychoacoustic model
Technical field
The present invention relates to a speech post-positioned perceptual filter based on a psychoacoustic model.
Background art
At present, the various speech enhancement algorithms can remove noise to some extent, but residual noise and musical noise more or less remain and degrade speech quality, so they need to be eliminated further. Moreover, the evaluation of speech ultimately depends on human auditory perception, so research on speech enhancement should fully take into account the perceptual characteristics of the human auditory system, namely the masking effect of the human ear. This effect has a special suppressing function on unwanted noise, allowing the enhanced speech to reduce auditory fatigue as much as possible, improve auditory perception, and thereby improve speech quality. Therefore, incorporating the masking effect of human hearing has an extremely important effect on the performance of speech enhancement.
In recent years, many experts and scholars have studied speech enhancement based on the masking effect of the human ear and obtained certain results. However, these algorithms are all built on fusing the masking model with another algorithm, which makes the original algorithm more complicated because of the added masking-model computation and may even prevent real-time implementation. To address this problem, this patent proposes a post-positioned perceptual filter based on the masking effect and applies it to speech enhancement.
Summary of the invention
The present invention addresses the problem that enhanced speech still contains residual noise, which causes poor auditory perception, by proposing a post-positioned perceptual filter based on a psychoacoustic model and applying it to speech enhancement. First, this perceptual filter does not need to be merged into each enhancement algorithm, so it does not affect the complexity of the algorithm, yet it achieves the same effect of improving auditory perception. Second, because it only reprocesses the already-enhanced speech, it further improves the auditory perception of the enhanced speech: even when noise remains and the signal-to-noise ratio is not improved, the post-positioned perceptual filter can still improve auditory perception. The post-positioned perceptual filter is built on the condition that, while keeping the speech signal distortion minimal, the residual noise cannot be heard by the human ear; the filter gain is obtained from a cost function containing the masking threshold constructed under this condition, and is further optimized by a perceptual normalization factor built from the masking threshold, the purpose of which is to avoid over-attenuating the signal and to ensure that the perceptual distortion of the enhanced speech is minimal.
As shown in Fig. 1, the specific scheme is as follows:
1) The noisy speech is first enhanced by spectral subtraction (this method may be replaced by another); the masking threshold of every frame is then calculated frame by frame according to the psychoacoustic model. A sketch of the masking-threshold step is given below.
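The patent does not specify which psychoacoustic model is used to compute the per-frame masking threshold T_k, so the following is a minimal sketch assuming a Johnston-style model (critical-band analysis, spreading function, tonality-dependent offset). All function and variable names are illustrative only, and the renormalization and absolute-threshold steps of the full model are omitted for brevity.

```python
import numpy as np

def bark(f_hz):
    """Map frequency in Hz to the Bark scale (Zwicker/Terhardt approximation)."""
    return 13.0 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500.0) ** 2)

def masking_threshold(frame, fs=8000, n_fft=160):
    """Johnston-style per-frame masking threshold estimate (illustrative only).

    Returns one threshold value T_k for every FFT bin k of a single analysis frame.
    """
    power = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    z = bark(freqs)
    n_bands = int(np.ceil(z.max()))
    band_of_bin = np.minimum(z.astype(int), n_bands - 1)

    # 1) energy in each critical band
    band_energy = np.array([power[band_of_bin == b].sum() for b in range(n_bands)])

    # 2) spread the band energies with the Schroeder spreading function (in dB)
    idx = np.arange(n_bands)
    dz = idx[:, None] - idx[None, :]
    spread_db = 15.81 + 7.5 * (dz + 0.474) - 17.5 * np.sqrt(1.0 + (dz + 0.474) ** 2)
    spread = (10.0 ** (spread_db / 10.0)) @ band_energy

    # 3) tonality-dependent offset via the spectral flatness measure (SFM)
    gmean = np.exp(np.mean(np.log(power + 1e-12)))
    amean = np.mean(power) + 1e-12
    alpha = min(10.0 * np.log10(gmean / amean) / -60.0, 1.0)  # 1 = tonal, 0 = noisy
    offset_db = alpha * (14.5 + idx + 1) + (1.0 - alpha) * 5.5

    # 4) per-band threshold, mapped back to the FFT bins
    # (renormalization and comparison with the absolute threshold of hearing are omitted)
    thr_band = spread * 10.0 ** (-offset_db / 10.0)
    return thr_band[band_of_bin]
```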
2) The masking threshold obtained in step 1) is used to build a cost function whose purpose is to keep the residual noise inaudible to the human ear under the condition that the speech signal distortion is minimal:

J = P(ε_s) + μ(P(ε_r) - E[T_k])
  = |G - 1|² E[|S_k|²] + μ(|G|² E[|N_k|²] - E[T_k])

where S_k is the k-th spectral component of the clean speech contained in the noisy speech after the Fourier transform; N_k is the k-th spectral component of the noise contained in the noisy speech after the Fourier transform, with E[|N_k|²] = λ_k; G is the gain of the perceptual filter, taken to be uniform within each subband; μ is the Lagrange multiplier; T_k is the k-th masking threshold component, meaning that sound whose energy exceeds T_k can be heard by the human ear, otherwise it cannot; ε_s = S_k(G - 1) is the speech distortion and ε_r = N_k·G is the residual noise. Because the speech and the noise are uncorrelated and the noise has zero mean, E[N_k S_k] = 0, so the speech distortion power can be expressed as P(ε_s) = |G - 1|² E[|S_k|²] and the residual noise power as P(ε_r) = |G|² E[|N_k|²]. A small sketch that evaluates this cost function for one subband is given below.
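To make the roles of the terms above concrete, here is a small sketch that evaluates the cost function J for one subband given a candidate gain G; the variable names mirror the symbols in the formula, and the choice of the Lagrange multiplier μ is left to the caller.

```python
def cost_J(G, S_power, N_power, T_k, mu):
    """J = |G - 1|^2 E[|S_k|^2] + mu * (|G|^2 E[|N_k|^2] - E[T_k]).

    S_power : E[|S_k|^2], clean-speech power in subband k
    N_power : E[|N_k|^2] = lambda_k, noise power in subband k
    T_k     : masking threshold component of subband k
    mu      : Lagrange multiplier weighting the residual-noise term
    """
    speech_distortion = (G - 1.0) ** 2 * S_power   # P(eps_s)
    residual_noise = G ** 2 * N_power              # P(eps_r)
    return speech_distortion + mu * (residual_noise - T_k)
```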
3) The cost function is then minimized, which means that, under the condition that the speech distortion is minimal, the residual noise in each subband is masked by the clean speech as far as possible so that the human ear does not perceive it; the perceptual filter is designed in this way, i.e. the gain of the perceptual filter is solved for.
4) To avoid over-attenuating the signal, the perceptual filter is then corrected by a perceptual normalization factor, which ensures that the perceptual distortion of the enhanced speech is minimal.
The perceptual normalization factor θ is constructed from the masking threshold, where T_min(l) is the minimum masking threshold in the l-th frame and T_max(l) is the maximum masking threshold in the l-th frame. The gain G_k of the final perceptual filter is obtained as:
G_k = 1 / max(θ·|N_k|² / T_k, 1)
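A minimal sketch of the corrected gain of step 4), assuming the per-frame perceptual normalization factor θ has already been computed from T_min(l) and T_max(l); since its exact construction is not reproduced here, θ is simply passed in as a given scalar.

```python
import numpy as np

def perceptual_gain(noise_power, thresholds, theta):
    """G_k = 1 / max(theta * |N_k|^2 / T_k, 1) for every FFT bin k.

    noise_power : estimated noise power |N_k|^2 per bin (lambda_k)
    thresholds  : masking thresholds T_k per bin
    theta       : per-frame perceptual normalization factor (taken as given)
    """
    ratio = theta * noise_power / np.maximum(thresholds, 1e-12)
    return 1.0 / np.maximum(ratio, 1.0)  # gain stays 1 where the noise is already masked
```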
5) The enhanced speech is finally obtained. An end-to-end sketch of the whole procedure is given below.
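Pulling steps 1) to 5) together, the sketch below wires a basic magnitude spectral-subtraction front end to the post-positioned perceptual filter with 50% overlapped frames and overlap-add resynthesis. It reuses the masking_threshold and perceptual_gain functions from the earlier sketches; the leading-frames noise estimate and the fixed θ value are simplifying assumptions, not part of the patent.

```python
import numpy as np

def enhance(noisy, fs=8000, frame_len=160, theta=1.0, noise_frames=10):
    """Spectral subtraction followed by the post-positioned perceptual filter."""
    hop = frame_len // 2                     # 50% frame overlap, as in the experiments
    win = np.hanning(frame_len)
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))

    # crude noise PSD estimate from the first few (assumed noise-only) frames
    noise_psd = np.mean(
        [np.abs(np.fft.rfft(win * noisy[i * hop:i * hop + frame_len])) ** 2
         for i in range(noise_frames)], axis=0)

    for start in range(0, len(noisy) - frame_len, hop):
        frame = win * noisy[start:start + frame_len]
        spec = np.fft.rfft(frame)
        power = np.abs(spec) ** 2

        # step 1: spectral-subtraction enhancement (replaceable by another method)
        clean_power = np.maximum(power - noise_psd, 0.01 * power)
        enhanced = np.sqrt(clean_power) * np.exp(1j * np.angle(spec))

        # steps 1-4: per-frame masking threshold and perceptual post-filter gain
        T_k = masking_threshold(frame, fs=fs, n_fft=frame_len)
        G_k = perceptual_gain(noise_psd, T_k, theta)
        enhanced *= G_k

        # step 5: overlap-add resynthesis of the enhanced speech
        out[start:start + frame_len] += win * np.fft.irfft(enhanced, frame_len)
        norm[start:start + frame_len] += win ** 2

    return out / np.maximum(norm, 1e-8)
```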
Brief description of the drawings
By describing exemplary embodiments of the present invention more fully with reference to the accompanying drawings, the above and other aspects and advantages of the present invention will become clearer. In the drawings:
Fig. 1 is a schematic diagram of the speech post-positioned perceptual filter based on a psychoacoustic model of the present invention;
Fig. 2 is a schematic comparison of the results of SS and WF before and after adding the perceptual filter under a white-noise background, for the speech post-positioned perceptual filter based on a psychoacoustic model of the present invention;
Fig. 3 is a schematic comparison of the results of SS and WF before and after adding the perceptual filter under a train-noise background, for the speech post-positioned perceptual filter based on a psychoacoustic model of the present invention.
Detailed description of the invention
Hereinafter, the present invention is described more fully with reference to the accompanying drawings, in which various embodiments are shown. The present invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the present invention to those skilled in the art.
Hereinafter, exemplary embodiments of the present invention are described more fully with reference to the accompanying drawings.
In MATLAB, the post-positioned perceptual filter is added to the speech enhanced by spectral subtraction (SS) and by Wiener filtering (WF), and simulation experiments are carried out. The speech is an English male utterance taken from the 863 speech database: "The birch canoe slid on the smooth planks."; the sampling rate is 8 kHz, the frame length K is 160, and the frame overlap is 50%. The noise is the white Gaussian noise and the train noise of the NOISEX-92 database, which are added to the clean speech to form the noisy speech. The SNRs of the noisy speech with added white Gaussian noise are -10 dB, -5 dB, 0 dB, 5 dB and 10 dB; the SNRs of the noisy speech with added train noise are 0 dB, 5 dB, 10 dB and 15 dB. The purpose of the simulation is to compare the SNR (Signal-to-Noise Ratio) and PESQ (Perceptual Evaluation of Speech Quality) of spectral subtraction and Wiener filtering before and after the post-positioned perceptual filter; the experimental results are shown in Figs. 2 and 3.
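A minimal sketch of how such a before/after comparison can be scored, assuming the clean reference signal is available and using the third-party Python `pesq` package for the PESQ value (the patent's own experiments were run in MATLAB, so this is only an analogous setup, not the original evaluation code).

```python
import numpy as np
from pesq import pesq  # pip install pesq (ITU-T P.862 implementation, assumed available)

def snr_db(clean, processed):
    """Global SNR of a processed signal measured against the clean reference."""
    noise = clean - processed
    return 10.0 * np.log10(np.sum(clean ** 2) / (np.sum(noise ** 2) + 1e-12))

def evaluate(clean, enhanced, postfiltered, fs=8000):
    """Compare SNR and PESQ before and after the post-positioned perceptual filter."""
    return {
        "snr_before":  snr_db(clean, enhanced),
        "snr_after":   snr_db(clean, postfiltered),
        "pesq_before": pesq(fs, clean, enhanced, "nb"),     # narrowband mode at 8 kHz
        "pesq_after":  pesq(fs, clean, postfiltered, "nb"),
    }
```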
As can be seen from Fig. 2, under the white-noise background, after the perceptual filter is added to spectral subtraction and Wiener filtering, the SNR is only slightly improved or even reduced (for example at 10 dB), but the PESQ value is improved overall. This confirms the design idea of the perceptual filter: noise may remain and the SNR may even decrease, yet auditory perception is improved. In addition, Fig. 3 shows that the situation under the train-noise background is essentially the same. This indicates that, regardless of the noise background and of the speech enhancement algorithm used, the design meets the design requirements of the perceptual filter and the requirements of human hearing; it also verifies the effectiveness of the newly proposed perceptual filter, which can be used in speech enhancement.
The foregoing is only an embodiment of the present invention and is not intended to limit the present invention. The present invention may have various appropriate changes and modifications. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (1)

1. A speech post-positioned perceptual filter based on a psychoacoustic model, characterized in that:
in said filter,
1) after the noisy speech is enhanced by a continuously differentiable spectral subtraction, the masking threshold of every frame is calculated frame by frame according to the psychoacoustic model;
2) the masking threshold obtained in step 1) is used to build a cost function:
J = P(ε_s) + μ(P(ε_r) - E[T_k])
  = |G - 1|² E[|S_k|²] + μ(|G|² E[|N_k|²] - E[T_k])
where S_k is the k-th spectral component of the clean speech contained in the noisy speech after the Fourier transform; N_k is the k-th spectral component of the noise contained in the noisy speech after the Fourier transform, with E[|N_k|²] = λ_k; G is the gain of the perceptual filter, taken to be uniform within each subband; μ is the Lagrange multiplier; T_k is the k-th masking threshold component, meaning that sound whose energy exceeds T_k can be heard by the human ear, otherwise it cannot; ε_s = S_k(G - 1) is the speech distortion and ε_r = N_k·G is the residual noise; because the speech and the noise are uncorrelated and the noise has zero mean, E[N_k S_k] = 0, so the speech distortion power can be expressed as P(ε_s) = |G - 1|² E[|S_k|²] and the residual noise power as P(ε_r) = |G|² E[|N_k|²];
3) the cost function is minimized, which means that, under the condition that the speech distortion is minimal, the residual noise in each subband is masked by the clean speech as far as possible so that the human ear does not perceive it; the perceptual filter is designed in this way, i.e. the gain of the perceptual filter is solved for;
4) the perceptual filter is corrected by a perceptual normalization factor θ, where T_min(l) is the minimum masking threshold in the l-th frame and T_max(l) is the maximum masking threshold in the l-th frame, and the gain G_k of the final perceptual filter is obtained as:
G_k = 1 / max(θ·|N_k|² / T_k, 1)
5) the enhanced speech is finally obtained.
CN201410046572.3A 2014-02-10 2014-02-10 Speech post-positioned perceptual filter based on a psychoacoustic model Expired - Fee Related CN103824562B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410046572.3A CN103824562B (en) 2014-02-10 2014-02-10 Speech post-positioned perceptual filter based on a psychoacoustic model


Publications (2)

Publication Number Publication Date
CN103824562A CN103824562A (en) 2014-05-28
CN103824562B true CN103824562B (en) 2016-08-17

Family

ID=50759584

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410046572.3A Expired - Fee Related CN103824562B (en) 2014-02-10 2014-02-10 Speech post-positioned perceptual filter based on a psychoacoustic model

Country Status (1)

Country Link
CN (1) CN103824562B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105869649B (en) * 2015-01-21 2020-02-21 北京大学深圳研究院 Perceptual filtering method and perceptual filter
CN109036466B (en) * 2018-08-01 2022-11-29 太原理工大学 Emotion dimension PAD prediction method for emotion voice recognition
CN109979478A (en) * 2019-04-08 2019-07-05 网易(杭州)网络有限公司 Voice de-noising method and device, storage medium and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6477489B1 (en) * 1997-09-18 2002-11-05 Matra Nortel Communications Method for suppressing noise in a digital speech signal
EP1619793A1 (en) * 2004-07-20 2006-01-25 Harman Becker Automotive Systems GmbH Audio enhancement system and method
CN101505447A (en) * 2008-02-07 2009-08-12 奥迪康有限公司 Method of estimating weighting function of audio signals in a hearing aid
CN101636648A (en) * 2007-03-19 2010-01-27 杜比实验室特许公司 Speech enhancement employing a perceptual model


Also Published As

Publication number Publication date
CN103824562A (en) 2014-05-28

Similar Documents

Publication Publication Date Title
EP3698360B1 (en) Noise reduction using machine learning
EP2141695B1 (en) Speech sound enhancement device
CN101901602B (en) Method for reducing noise by using hearing threshold of impaired hearing
CN105741849A (en) Voice enhancement method for fusing phase estimation and human ear hearing characteristics in digital hearing aid
CN107371079B A dual-microphone noise reduction system and noise reduction method for earphones
CN104704560A (en) Formant dependent speech signal enhancement
CN104505100A Unsupervised speech enhancement method based on robust non-negative matrix factorization and data fusion
CN103824562B (en) The rearmounted perceptual filter of voice based on psychoacoustic model
Abdullah et al. Towards more efficient DNN-based speech enhancement using quantized correlation mask
CN103578466B Speech/non-speech detection method based on the fractional Fourier transform
Lin et al. Subband noise estimation for speech enhancement using a perceptual Wiener filter
Min et al. Mask estimate through Itakura-Saito nonnegative RPCA for speech enhancement
JP2007251354A (en) Microphone and sound generation method
CN104867498A (en) Mobile communication terminal and voice enhancement method and module thereof
CN105869649A (en) Perceptual filtering method and perceptual filter
CN106658323A (en) Dual microphone noise reduction system and method for cochlear implants and hearing aids
CN105719658A (en) Wavelet packet speech denoising method based on new threshold function and self-adaptive threshold
Sun et al. An RNN-based speech enhancement method for a binaural hearing aid system
CN104703108B Dynamic range compression algorithm for digital hearing aids under noise conditions
CN210629614U (en) Voice noise reduction processor for built-in interphone
Liu et al. Speech enhancement based on the integration of fully convolutional network, temporal lowpass filtering and spectrogram masking
Tu et al. Sheffield system for the second clarity enhancement challenge
Jiang et al. An algorithm combined with spectral subtraction and binary masking for monaural speech segregation
CN108810692A (en) Active noise reduction system, active denoising method and earphone
Deepa et al. Performance evaluation of white noise for different noisy speech signals in mobile applications

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160817

Termination date: 20180210