CN104867497A

CN104867497A - Voice noise-reducing method

Info

Publication number: CN104867497A
Application number: CN201410076957.4A
Authority: CN
Inventors: 陈子华; 徐正春
Original assignee: BEIJING XINYOUDA VIDEO TECHNOLOGY Co Ltd; Beijing Xinwei Telecom Technology Inc
Current assignee: BEIJING XINYOUDA VIDEO TECHNOLOGY Co Ltd; Beijing Xinwei Telecom Technology Inc
Priority date: 2014-02-26
Filing date: 2014-02-26
Publication date: 2015-08-26

Abstract

The invention provides a voice noise-reducing method comprising the steps of: a, dividing the frames to be processed into silent frames and speech frames through endpoint detection; b, for silent frames, calculating the power spectrum of the current frame to serve as the noise power spectrum estimate, and for speech frames, calculating an average noise power spectrum to serve as the noise power spectrum estimate; c, subtracting the noise power spectrum estimate from the power spectrum of each speech frame to obtain the noise-reduced speech power spectrum; and d, obtaining the noise-reduced speech frames from the noise-reduced speech power spectrum. By adopting endpoint detection, the method reduces the error of the noise power spectrum estimate and essentially eliminates musical noise, thereby improving both the noise reduction quality and the subjective listening experience.

Description

A voice noise reduction method

Technical field

The present invention relates to the field of voice calls, and in particular to a voice noise reduction method.

Background art

In voice services, the most common problem is noise in the call, and the technique most often used to handle it at present is spectral subtraction. Spectral subtraction exploits the short-term stationarity of the speech signal: it subtracts the short-time spectrum of the noise from the short-time spectrum of the noisy speech, yielding a spectrum closer to that of clean speech and thereby reducing the noise.

Spectral subtraction comes in two forms. Amplitude spectral subtraction subtracts, in the frequency domain, the amplitude spectrum of the noise from the amplitude spectrum of the noisy speech and takes the result as the amplitude spectrum of the speech signal. Power spectral subtraction subtracts the power spectrum of the noise from the power spectrum of the noisy speech to obtain the power spectrum of the clean speech, and then obtains the amplitude spectrum by taking the square root. Because the human ear is insensitive to the phase of speech spectral components, both algorithms modify only the amplitude and leave the phase unchanged; after the noise has been processed, the phase of the noisy speech is still used to recover the noise-reduced speech.

For the noise spectrum estimate, the noise spectrum measured before any speech occurs is generally used as the estimate for the entire noise-reduction interval.
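The two variants described above can be written compactly as follows. This is a sketch in notation introduced here for illustration, not taken from the source: $Y(k)$ is the short-time spectrum of the noisy frame at frequency bin $k$, $\hat{N}(k)$ is the noise spectrum estimate, $\hat{S}(k)$ is the recovered speech spectrum, and $\varphi_Y(k)$ is the phase of the noisy frame.

```latex
\begin{aligned}
\text{amplitude spectral subtraction:}\quad
  & |\hat{S}(k)| = |Y(k)| - |\hat{N}(k)| \\
\text{power spectral subtraction:}\quad
  & |\hat{S}(k)|^{2} = |Y(k)|^{2} - |\hat{N}(k)|^{2} \\
\text{reconstruction with the noisy phase:}\quad
  & \hat{S}(k) = |\hat{S}(k)|\, e^{\,j\varphi_Y(k)}
\end{aligned}
```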

The spectral subtraction described above reduces noise by subtracting the short-time spectrum of the noise from the short-time spectrum of the noisy speech; the algorithm is simple and easy to implement. However, because the noise spectrum measured before any speech occurs is used as the estimate for the entire noise-reduction interval, the error of the noise spectrum estimate is relatively large. After the noise spectrum is subtracted, some spectral components of relatively high power therefore remain, the spectrum shows randomly occurring spikes, and these are heard as residual noise. This noise has a certain rhythmic, fluctuating quality and is called "musical noise"; it is the combined effect of the tones that appear at multiple random frequency points in each frame. Listeners can usually detect musical noise in the processed speech; it is more distinct than the noise in the original speech and also more annoying.

Summary of the invention

To solve the problem of musical noise that arises when noise is processed with current spectral subtraction, the invention proposes an improved voice noise reduction method based on spectral subtraction. The method comprises the following steps:

a. dividing speech frames into silent frames and speech frames by endpoint detection;

b. for a silent frame, calculating the power spectrum of the current frame as the noise power spectrum estimate; for a speech frame, calculating the average noise power spectrum as the noise power spectrum estimate;

c. subtracting the noise power spectrum estimate from the power spectrum of the speech frame to obtain the noise-reduced speech power spectrum;

d. obtaining the noise-reduced speech frame from the noise-reduced speech power spectrum.

Preferably, step a is specifically: calculating the energy of each speech frame; if the energy is greater than or equal to a threshold, the frame is a speech frame, and if it is less than the threshold, the frame is a silent frame. Further, the average noise energy of the first 30 speech frames is used as the threshold.

Preferably, in step b, the average noise energy of the first 30 speech frames is used as the average noise power spectrum.

Preferably, the noise spectrum estimate in step b is also smoothed.

Preferably, step d uses the phase spectrum of the speech frame before noise reduction, together with the noise-reduced speech power spectrum, to calculate the noise-reduced speech spectrum, and then obtains the noise-reduced speech frame.

By using endpoint detection, the present invention reduces the error of the noise power spectrum estimate and essentially eliminates musical noise, thereby improving both the noise reduction quality and the subjective listening experience.

Brief description of the drawings

To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of the endpoint detection of an embodiment of the present invention.

Detailed description of the embodiments

To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. It should be noted that, where no conflict arises, the embodiments in this application and the features in them may be combined with one another. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

In a noise reduction method, the estimation of the noise spectrum is critical: if the noise estimate deviates significantly, the quality of the voice noise reduction will undoubtedly suffer. This embodiment performs noise estimation on the basis of endpoint detection. Endpoint detection means determining the start point and end point of speech in a signal that contains speech, so that the speech signal of actual interest can be separated from a continuously recorded noisy speech signal. This embodiment uses endpoint detection to divide the frames to be noise-reduced into silent frames and speech frames. In a silent frame, the current spectrum is itself the noise spectrum; in a speech frame, the average noise power spectrum is used as the noise power spectrum estimate. The resulting estimation error is much smaller than with the traditional approach, which uses the average noise power spectrum as the noise power spectrum estimate over the entire noise-reduction interval.
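The per-frame choice of noise estimate described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function name `estimate_noise_power` and its parameters are hypothetical, and power spectra are represented as plain lists of floats.

```python
def estimate_noise_power(frame_power, is_silent, avg_noise_power):
    """Return the noise power spectrum estimate for one frame.

    frame_power     -- power spectrum of the current frame (list of floats)
    is_silent       -- endpoint-detection result for the current frame
    avg_noise_power -- average noise power spectrum, computed beforehand
    (All names are hypothetical; they do not appear in the source.)
    """
    if is_silent:
        # Silent frame: it contains only noise, so its own power
        # spectrum is used directly as the estimate.
        return list(frame_power)
    # Speech frame: fall back to the average noise power spectrum.
    return list(avg_noise_power)


avg = [1.0, 1.0, 1.0, 1.0]
silent_est = estimate_noise_power([0.8, 1.3, 0.9, 1.1], True, avg)
speech_est = estimate_noise_power([9.0, 7.5, 1.2, 0.9], False, avg)
```

In this sketch the estimate tracks the actual noise whenever a silent frame is available, and only speech frames rely on the fixed average.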

The endpoint detection method of this embodiment compares the short-time energy of the speech signal with a threshold: if the energy exceeds the threshold, the current frame belongs to a speech segment; otherwise it belongs to a silent segment. The whole endpoint detection flow is shown in Fig. 1. First, an empirical value is set as the threshold; this embodiment uses the average noise energy (EMN) of the first 30 speech frames as the threshold. Then the energy of each frame is calculated in turn, where N is the frame length, n is the frame index, 1 ≤ n ≤ L, L is the total number of frames, and m indexes the sample points within each frame. If the energy of the current frame is greater than or equal to the threshold, the current frame is a speech frame; if it is less than the threshold, it is a noise (silent) frame.
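The energy-threshold test above can be sketched as follows. The frame energy formula is not reproduced in this text, so the sketch assumes the usual short-time energy, $E_n = \sum_{m=1}^{N} x_n^2(m)$, where $x_n(m)$ (a symbol introduced here) is the m-th sample of frame n. The function names and the `n_noise_frames` parameter are hypothetical; the small value used in the example is only for readability, whereas the embodiment uses 30.

```python
def frame_energy(frame):
    """Short-time energy of one frame: sum of squared samples (assumed form)."""
    return sum(x * x for x in frame)


def classify_frames(frames, n_noise_frames=30):
    """Label each frame True (speech) or False (silent).

    The threshold is the mean energy of the first n_noise_frames frames,
    which are assumed to contain noise only.
    """
    energies = [frame_energy(f) for f in frames]
    lead = energies[:n_noise_frames]
    threshold = sum(lead) / len(lead)
    # Energy >= threshold -> speech frame; otherwise silent frame.
    return [e >= threshold for e in energies], threshold


frames = [[0.1, -0.1], [0.2, 0.0], [0.0, 0.1], [1.0, -1.0], [0.05, 0.05]]
labels, threshold = classify_frames(frames, n_noise_frames=3)
```

Because the threshold is a mean, a leading noise frame whose energy sits above that mean (the second frame here) is labelled as speech under this rule; such a frame simply falls back to the average noise power spectrum.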

The specific implementation steps of the noise reduction method of this embodiment are described in detail below:

1. Pre-filter the input speech signal;

2. Divide the speech signal into frames of 128 sample points each;

3. Apply a Hamming window to each signal frame;

4. Apply an FFT to each windowed signal frame;

5. Calculate the power spectrum of each speech frame signal;

6. Calculate the average noise power spectrum from the first 30 frames;

7. Use endpoint detection to detect silent frames and estimate the noise. For a silent frame, use the power spectrum of the current frame as the noise power spectrum estimate; for a speech frame, use the average noise power spectrum calculated in step 6 as the noise power spectrum estimate (other common methods of calculating the average noise power spectrum may also be used);

8. Apply median smoothing to the noise spectrum estimate to remove outliers and make the estimate smoother;

9. Perform the spectral subtraction: subtract the noise power spectrum estimate from the power spectrum of the speech to be noise-reduced to obtain the noise-reduced speech power spectrum;

10. Insert the phase spectrum of the speech before noise reduction and calculate the speech spectrum;

11. Apply an IFFT to recover the noise-reduced speech frame;

12. Combine the speech frames into the noise-reduced speech signal.
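Steps 2 to 12 above can be sketched end to end as follows. This is an illustration under stated assumptions rather than the patent's implementation: pre-filtering (step 1) is omitted, a naive DFT stands in for an FFT library, the median filter is 3 bins wide and circular, negative values after subtraction are clamped to zero so that a square root can be taken, and all names are hypothetical. The example uses 4 leading noise frames only to keep it fast; the embodiment uses 30.

```python
import cmath
import math
import random

FRAME_LEN = 128  # samples per frame (step 2)


def dft(x, inverse=False):
    """Naive O(N^2) DFT, standing in for an FFT/IFFT library call."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
                  for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def median3(values):
    """Circular 3-point median across frequency bins (step 8, assumed width)."""
    n = len(values)
    return [sorted((values[k - 1], values[k], values[(k + 1) % n]))[1]
            for k in range(n)]


def denoise(signal, n_noise_frames=30):
    window = [0.54 - 0.46 * math.cos(2 * math.pi * m / (FRAME_LEN - 1))
              for m in range(FRAME_LEN)]                          # Hamming window
    n_frames = len(signal) // FRAME_LEN
    frames = [signal[i * FRAME_LEN:(i + 1) * FRAME_LEN]
              for i in range(n_frames)]                           # step 2
    spectra = [dft([s * w for s, w in zip(f, window)])
               for f in frames]                                   # steps 3-4
    powers = [[abs(c) ** 2 for c in spec] for spec in spectra]    # step 5

    lead = powers[:n_noise_frames]
    avg_noise = [sum(p[k] for p in lead) / len(lead)
                 for k in range(FRAME_LEN)]                       # step 6
    energies = [sum(x * x for x in f) for f in frames]
    threshold = sum(energies[:n_noise_frames]) / len(lead)

    out = []
    for spec, power, energy in zip(spectra, powers, energies):
        noise = power if energy < threshold else avg_noise        # step 7
        noise = median3(noise)                                    # step 8
        # Step 9; clamping at zero is an assumption, not stated in the source.
        clean = [max(p - q, 0.0) for p, q in zip(power, noise)]
        # Step 10: new magnitude combined with the noisy frame's phase.
        rebuilt = [cmath.rect(math.sqrt(p), cmath.phase(c))
                   for p, c in zip(clean, spec)]
        out.extend(v.real for v in dft(rebuilt, inverse=True))    # steps 11-12
    return out


# Example: 4 noise-only frames followed by 4 frames of a tone plus noise.
random.seed(0)
total = 8 * FRAME_LEN
noisy = [random.gauss(0.0, 0.05)
         + (math.sin(2 * math.pi * 10 * m / FRAME_LEN) if m >= 4 * FRAME_LEN else 0.0)
         for m in range(total)]
cleaned = denoise(noisy, n_noise_frames=4)
```

The steps describe non-overlapping frames, so the sketch simply concatenates the reconstructed frames; the Hamming taper therefore remains in the output, and an overlap-add scheme would be a separate design choice not covered by the source.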

A simulation experiment was also carried out for this embodiment. Three representative situations were chosen: speaking in a computer room; speaking beside a fan set to its lowest speed; and speaking beside a fan set to medium speed. In each situation, an audio capture device without a noise reduction function was used to collect 2 minutes of original noisy speech PCM data at a sampling rate of 8 kHz. Noise reduction was then simulated with both traditional spectral subtraction and the method of this embodiment, yielding one data set processed by each. Comparing the data shows that, in all three situations, the method of this embodiment reduces noise better than traditional spectral subtraction, both in the plotted data and by ear.

A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiment may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiment. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk or an optical disc.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

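1. A voice noise reduction method, characterized in that the method comprises the following steps:

a. dividing speech frames into silent frames and speech frames by endpoint detection;

b. for a silent frame, calculating the power spectrum of the current frame as the noise power spectrum estimate; for a speech frame, calculating the average noise power spectrum as the noise power spectrum estimate;

c. subtracting the noise power spectrum estimate from the power spectrum of the speech frame to obtain the noise-reduced speech power spectrum;

d. obtaining the noise-reduced speech frame from the noise-reduced speech power spectrum.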

2. The method according to claim 1, characterized in that step a is specifically: calculating the energy of each speech frame; if the energy is greater than or equal to a threshold, the frame is a speech frame, and if it is less than the threshold, the frame is a silent frame.

3. The method according to claim 2, characterized in that the average noise energy of the first 30 speech frames is used as the threshold.

4. The method according to claim 1, characterized in that, in step b, the average noise energy of the first 30 speech frames is used as the average noise power spectrum.

5. The method according to claim 1 or 4, characterized in that, in step b, the noise spectrum estimate is also smoothed.

6. The method according to claim 1, characterized in that step d uses the phase spectrum of the speech frame before noise reduction, together with the noise-reduced speech power spectrum, to calculate the noise-reduced speech spectrum, and then obtains the noise-reduced speech frame.