CN103824562A - Psychological acoustic model-based voice post-perception filter - Google Patents
- Publication number
- CN103824562A (application CN201410046572.3A; granted as CN103824562B)
- Authority
- CN
- China
- Prior art keywords
- perception
- voice
- filter
- noise
- perceptual filter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention relates to a speech post-perceptual filter based on a psychoacoustic model. The perceptual filter does not need to be fused into each enhancement algorithm, so it does not affect the complexity of those algorithms, yet it achieves the same improvement in auditory perception. Because it focuses on re-processing already-enhanced speech, it further improves the perceived quality of the enhanced signal; even when residual noise remains and the signal-to-noise ratio is not improved, the post-perceptual filter can still improve auditory perception. The filter is derived under the constraint that speech distortion is minimized while residual noise is, as far as possible, kept inaudible to the human ear. Its gain is obtained by constructing a cost function containing the masking threshold under this constraint, and is further optimized by a perceptual normalization factor built from the masking threshold. This avoids excessive signal attenuation and ensures minimal perceptual distortion of the enhanced speech.
Description
Technical field
The present invention relates to a speech post-perceptual filter based on a psychoacoustic model.
Background technology
At present, the various speech-enhancement algorithms can remove noise to some extent, but residual noise and musical noise more or less remain, degrading speech quality, so further suppression is needed. Moreover, the final evaluation of speech quality depends on human auditory perception, so speech-enhancement research should exploit the perceptual characteristics of the human auditory system, namely the masking effect of the ear, which naturally suppresses noise falling below the masking threshold. This allows the enhanced speech to reduce listening fatigue as much as possible, improve auditory perception, and thereby raise overall speech quality. Combining the masking effect of human hearing with speech enhancement therefore plays a very important role.
In recent years, many researchers have studied speech enhancement based on the masking effect of the human ear and obtained good results. However, these algorithms are all built by fusing the masking model into some other enhancement algorithm, which makes the original algorithm more complicated because of the added masking computation, and may even prevent real-time implementation. To address this problem, the present invention proposes a post-perceptual filter based on the masking effect and applies it to speech enhancement.
Summary of the invention
To address the residual noise that remains in enhanced speech and degrades its perceived auditory quality, the present invention proposes a post-perceptual filter based on a psychoacoustic model and applies it to speech enhancement. First, the perceptual filter does not need to be fused into each enhancement algorithm, so it does not affect algorithm complexity, yet it achieves the same improvement in auditory perception. Second, because it operates as a re-processing stage on already-enhanced speech, it further improves perceived quality; even when noise remains and the signal-to-noise ratio does not improve, the post-filter can still improve auditory perception. The post-perceptual filter is derived under the constraint that speech distortion is minimized while residual noise is, as far as possible, inaudible to the human ear. Its gain is obtained by constructing a cost function containing the masking threshold under this constraint, and is further optimized by a perceptual normalization factor built from the masking threshold; this avoids excessive signal attenuation and ensures minimal perceptual distortion of the enhanced speech.
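The masking threshold that drives the filter comes from a psychoacoustic model, which the text does not spell out. The sketch below is a deliberately simplified, Johnston-style estimate: the Hz-to-Bark mapping is the standard Zwicker approximation, but the exponential spreading function and the fixed 10 dB offset are illustrative assumptions, not the patent's model.

```python
import numpy as np

def masking_threshold(frame, fs=8000, nfft=160):
    """Simplified per-frame masking-threshold estimate.
    Johnston-style in spirit; the spreading function and the fixed
    10 dB offset below the spread masker energy are assumptions."""
    spec = np.abs(np.fft.rfft(frame, nfft)) ** 2          # power spectrum
    freqs = np.fft.rfftfreq(nfft, 1.0 / fs)
    # Hz -> Bark (standard Zwicker approximation)
    bark = 13.0 * np.arctan(0.00076 * freqs) + 3.5 * np.arctan((freqs / 7500.0) ** 2)
    thresh = np.empty_like(spec)
    for i in range(len(spec)):
        # crude exponential roll-off of masking across the Bark axis
        spread = 10.0 ** (-2.5 * np.abs(bark - bark[i]))
        thresh[i] = 0.1 * np.sum(spread * spec)           # 10 dB below spread energy
    return thresh
```

Any calibrated psychoacoustic model (e.g. the one used in MPEG audio coding) can replace this stand-in without changing the rest of the scheme.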
As shown in Fig. 1, the specific scheme is:
1) The noisy speech is enhanced by spectral subtraction (any enhancement method may be substituted), and the masking threshold of each frame is then computed frame by frame according to the psychoacoustic model.
2) The masking threshold obtained in the first step is used to build a cost function whose objective is to keep residual noise inaudible to the human ear, as far as possible, under the constraint of minimal speech distortion:
J = P(ε_s) + μ(P(ε_r) − E[T_k]) = |G − 1|²·E[|S_k|²] + μ(|G|²·E[|N_k|²] − E[T_k])

where ε_s = S_k(G − 1) is the speech distortion and ε_r = N_k·G is the residual noise. Since speech and noise are uncorrelated, E(N_k·S_k) = 0; P(ε_s) is the power of the speech distortion and P(ε_r) the power of the residual noise.
3) The gain of the perceptual filter is solved by minimizing the cost function under this constraint.
4) To avoid excessive signal attenuation, the perceptual filter is then corrected by the perceptual normalization factor, ensuring minimal perceptual distortion of the enhanced speech.
The perceptual normalization factor is built from the masking threshold, where T_min(l) is the minimum value in the l-th frame and T_max(l) is the maximum value in the l-th frame; from it the gain G_k of the final perceptual filter is obtained.
5) The enhanced speech is finally obtained.
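Setting ∂J/∂G = 0 in the cost function of step 2) gives 2(G − 1)·E[|S_k|²] + 2μG·E[|N_k|²] = 0, i.e. the Wiener-like gain G = E[|S_k|²] / (E[|S_k|²] + μ·E[|N_k|²]). The sketch below implements steps 3) and 4) under that derivation; since the source omits the exact normalization formula, the correction built from T_min(l) and T_max(l) is an assumed form, not the patent's.

```python
import numpy as np

def perceptual_postfilter_gain(speech_psd, noise_psd, mask_thresh, mu=1.0):
    """Steps 3)-4): gain from minimizing J, then a perceptual correction.

    Minimizing J = |G-1|^2 E[|S_k|^2] + mu*(|G|^2 E[|N_k|^2] - E[T_k])
    over G yields G = S / (S + mu*N). The normalization below, built from
    the frame's masking-threshold extrema, is an assumed form.
    """
    g = speech_psd / (speech_psd + mu * noise_psd + 1e-12)
    # normalize the masking threshold to [0, 1] within the frame
    t_min, t_max = mask_thresh.min(), mask_thresh.max()
    alpha = (mask_thresh - t_min) / (t_max - t_min + 1e-12)
    # where masking is strong, residual noise is inaudible: relax attenuation
    g = g ** (1.0 - alpha)
    return np.clip(g, 0.0, 1.0)
```

The per-bin gains are applied to the enhanced spectrum before the inverse transform and overlap-add.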
Description of the drawings
The above and other aspects and advantages of the present invention will become clearer from the following detailed description of exemplary embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a schematic diagram of the speech post-perceptual filter based on a psychoacoustic model of the present invention;
Fig. 2 compares the results of SS and WF before and after adding the perceptual filter under a white-noise background;
Fig. 3 compares the results of SS and WF before and after adding the perceptual filter under a train-noise background.
Embodiment
Hereinafter, the present invention is described more fully with reference to the accompanying drawings, in which various embodiments are shown. The invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art.
Exemplary embodiments of the present invention are described in more detail below with reference to the accompanying drawings.
The post-perceptual filter was added in MATLAB to speech enhanced by spectral subtraction (SS) and Wiener filtering (WF), and experimental simulations were run. The test speech is an English male utterance from the 863 corpus: "The birch canoe slid on the smooth planks."; the sampling rate is 8 kHz, the frame length K is 160 samples, and the frame overlap is 50%. The noise signals are white Gaussian noise and train noise from the NOISEX-92 database, added to clean speech to form the noisy speech. The SNRs of the white-Gaussian-noise mixtures are −10 dB, −5 dB, 0 dB, 5 dB, and 10 dB; the SNRs of the train-noise mixtures are 0 dB, 5 dB, 10 dB, and 15 dB. The aim of the simulation is to compare SNR (Signal-to-Noise Ratio) and PESQ (Perceptual Evaluation of Speech Quality) for spectral subtraction and Wiener filtering before and after adding the post-perceptual filter; the experimental results are shown in Figs. 2 and 3.
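The SNR half of this comparison can be reproduced as below, using the framing stated above (160-sample frames, 50% overlap); PESQ requires an external implementation of ITU-T P.862 (e.g. the reference code or a wrapper package), so it is not sketched here.

```python
import numpy as np

def snr_db(clean, processed):
    """Overall SNR of a processed signal against the clean reference."""
    noise = clean - processed
    return 10.0 * np.log10(np.sum(clean ** 2) / (np.sum(noise ** 2) + 1e-12))

def seg_snr_db(clean, processed, frame_len=160, hop=80):
    """Segmental SNR with the stated framing: K = 160, 50% overlap."""
    vals = []
    for start in range(0, len(clean) - frame_len + 1, hop):
        c = clean[start:start + frame_len]
        p = processed[start:start + frame_len]
        vals.append(snr_db(c, p))
    return float(np.mean(vals))
```

Segmental SNR weights every frame equally and so tracks perceived quality somewhat better than the overall SNR, though neither captures masking effects the way PESQ does.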
As can be seen from Fig. 2, under the white-noise background, after the perceptual filter is added to spectral subtraction and Wiener filtering, the SNR changes only slightly up or down, but the PESQ value improves overall, for example at 10 dB. This confirms the design philosophy of the perceptual filter: noise may remain and the SNR may even decrease slightly, yet auditory perception improves. In addition, the situation under the train-noise background in Fig. 3 is essentially the same, showing that regardless of the noise background or the underlying enhancement algorithm, this design meets the requirements of a perceptual filter and of the human auditory system. This also demonstrates the validity of the newly proposed perceptual filter and its applicability to speech enhancement.
The foregoing is merely an embodiment of the present invention and is not intended to limit it; the invention admits various suitable changes and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (1)
1. A speech post-perceptual filter based on a psychoacoustic model, characterized in that, in said filter:
1) the noisy speech is enhanced by a continuously differentiable spectral subtraction, and the masking threshold of each frame is then computed frame by frame according to the psychoacoustic model;
2) the masking threshold obtained in the first step is used to build the cost function:
J = P(ε_s) + μ(P(ε_r) − E[T_k]) = |G − 1|²·E[|S_k|²] + μ(|G|²·E[|N_k|²] − E[T_k])

where ε_s = S_k(G − 1) is the speech distortion, ε_r = N_k·G is the residual noise, E(N_k·S_k) = 0, P(ε_s) is the power of the speech distortion, and P(ε_r) the power of the residual noise;
3) the gain of the perceptual filter is solved by minimizing the cost function;
4) the perceptual filter is then corrected by the perceptual normalization factor, where T_min(l) is the minimum value in the l-th frame and T_max(l) is the maximum value in the l-th frame, giving the gain G_k of the final perceptual filter;
5) the enhanced speech is finally obtained.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410046572.3A CN103824562B (en) | 2014-02-10 | 2014-02-10 | The rearmounted perceptual filter of voice based on psychoacoustic model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103824562A true CN103824562A (en) | 2014-05-28 |
CN103824562B CN103824562B (en) | 2016-08-17 |
Family
ID=50759584
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410046572.3A Expired - Fee Related CN103824562B (en) | 2014-02-10 | 2014-02-10 | The rearmounted perceptual filter of voice based on psychoacoustic model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103824562B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6477489B1 (en) * | 1997-09-18 | 2002-11-05 | Matra Nortel Communications | Method for suppressing noise in a digital speech signal |
EP1619793A1 (en) * | 2004-07-20 | 2006-01-25 | Harman Becker Automotive Systems GmbH | Audio enhancement system and method |
CN101505447A (en) * | 2008-02-07 | 2009-08-12 | 奥迪康有限公司 | Method of estimating weighting function of audio signals in a hearing aid |
CN101636648A (en) * | 2007-03-19 | 2010-01-27 | 杜比实验室特许公司 | Speech enhancement employing a perceptual model |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105869649A (en) * | 2015-01-21 | 2016-08-17 | 北京大学深圳研究院 | Perceptual filtering method and perceptual filter |
CN105869649B (en) * | 2015-01-21 | 2020-02-21 | 北京大学深圳研究院 | Perceptual filtering method and perceptual filter |
CN109036466A (en) * | 2018-08-01 | 2018-12-18 | 太原理工大学 | The emotion dimension PAD prediction technique of Emotional Speech identification |
CN109036466B (en) * | 2018-08-01 | 2022-11-29 | 太原理工大学 | Emotion dimension PAD prediction method for emotion voice recognition |
CN109979478A (en) * | 2019-04-08 | 2019-07-05 | 网易(杭州)网络有限公司 | Voice de-noising method and device, storage medium and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN103824562B (en) | 2016-08-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103778920B | Fusion method of speech enhancement and frequency response compensation in a digital hearing aid | |
CN110473567B (en) | Audio processing method and device based on deep neural network and storage medium | |
JP7258182B2 (en) | Speech processing method, device, electronic device and computer program | |
CN105611477B | Speech enhancement algorithm combining deep and wide neural networks in a digital hearing aid | |
CN103236260B (en) | Speech recognition system | |
CN103236263B | Method, system and mobile terminal for improving speech quality | |
CN112767963B (en) | Voice enhancement method, device and system and computer readable storage medium | |
AU2010204470B2 (en) | Automatic sound recognition based on binary time frequency units | |
CN105741849A | Speech enhancement method fusing phase estimation and human auditory characteristics in a digital hearing aid | |
DE602007001338D1 | Speech recognition with speaker adaptation based on fundamental-frequency classification | |
CN103761974B (en) | Cochlear implant | |
CN108335702A | Noise reduction method based on a deep neural network | |
WO2020186742A1 (en) | Voice recognition method applied to ground-air communication | |
CN104505100A | Unsupervised speech enhancement method based on robust non-negative matrix factorization and data fusion | |
CN106878851A | Active noise-reduction earphone based on channel compensation and speech recognition | |
Abdullah et al. | Towards more efficient DNN-based speech enhancement using quantized correlation mask | |
CN103824562A (en) | Psychological acoustic model-based voice post-perception filter | |
Min et al. | Mask estimate through Itakura-Saito nonnegative RPCA for speech enhancement | |
CN102314883B (en) | Music noise judgment method and voice noise elimination method | |
CN104778948A (en) | Noise-resistant voice recognition method based on warped cepstrum feature | |
CN106658323A (en) | Dual microphone noise reduction system and method for cochlear implants and hearing aids | |
Sun et al. | An RNN-based speech enhancement method for a binaural hearing aid system | |
CN104703108B | Dynamic range compression algorithm for a digital hearing aid under noise conditions | |
Lee et al. | Citear: A two-stage end-to-end system for noisy-reverberant hearing-aid processing | |
Liu et al. | Speech enhancement based on the integration of fully convolutional network, temporal lowpass filtering and spectrogram masking |
Legal Events

Date | Code | Title | Description
---|---|---|---
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| C14 | Grant of patent or utility model | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20160817; Termination date: 20180210 |