CN1190773A - Method estimating wave shape gain for phoneme coding - Google Patents
- Publication number
- CN1190773A (application CN97100716A)
- Authority
- CN
- China
- Prior art keywords
- signal
- voice
- frame
- consonant
- gain
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The waveform gain estimation method for speech coding includes the following steps: providing decoded envelope data, consisting of an envelope shape index value and a quantized gain value; passing a periodic speech pulse train through an oscillator to produce an aperiodic pulse signal and feeding it into a voiced/unvoiced decision unit, while a noise signal is fed into the same decision unit through another path; dividing each frame of the input speech signal into several subframes and classifying each subframe; supplying a modified LPC parameter simultaneously to a synthesis filter and a post-filter; using an amplitude calculation unit to obtain the LPC parameter from the synthesis filter together with the decoded envelope data; calculating a gain value and feeding it into a gain unit to control the level of the synthesized speech; and finally outputting a smooth speech signal through the post-filter.
Description
The present invention relates to speech coding technology, and in particular to a waveform gain estimation method for speech coding.
In speech synthesis, linear predictive coding (LPC, Linear Predictive Coding) vocoders are generally used. Among LPC-based methods, the LPC-10 vocoder is widely applied in low-bit-rate speech compression.
Fig. 1 shows the block diagram of this conventional speech coding technique. The block includes a speech pulse generator 11 (Impulse Train Generator), a random noise signal generator 12 (Random Noise Generator), a voiced/unvoiced switch 13 (Voiced/Unvoiced Switch), a gain unit 14 (Gain Unit), a synthesis filter 15 (LPC Filter), and a synthesis-filter control parameter setting unit 16; the gain unit 14 additionally contains a gain setting unit.
The periodic impulse train (Periodic Impulse Train) produced by the speech pulse generator 11, or the white noise (White Noise) produced by the random noise signal generator 12, is selected by the voiced/unvoiced switch 13 according to the type of the input signal. The selected excitation first passes through the gain unit 14, which scales it by the preset gain value to adjust its level, and is then filtered by the synthesis filter 15 according to the preset LPC parameters (LPC Parameters); finally the speech signal S(n) is output at the output of the synthesis filter 15.
In practical speech coding, the output gain of the synthesized speech must be set or controlled so that the output signal matches the level of the input speech. Conventionally, two techniques are used for this gain setting and control. The first determines the gain from the energy of the linearly predicted samples (Linear Predicted Samples) of the speech signal. The second computes the gain from the root-mean-square (RMS) value. In this prior art, the gain of an unvoiced frame (Unvoiced Frame) of a noise signal is estimated purely from the RMS value; for a voiced frame (Voiced Frame) the same RMS estimate is used, but a rectangular window covering several current pitch cycles can further be applied to obtain a more precise gain value. The gain obtained by this prior art is then quantized uniformly as a 7-bit logarithmic value.
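As a sketch of the prior-art approach just described, the following computes an RMS frame gain and quantizes it logarithmically to 7 bits. The function names and the gain bounds `g_min`/`g_max` are assumptions for illustration, not from the patent:

```python
import math

def rms_gain(frame):
    """Root-mean-square gain of one speech frame (the prior-art estimate)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def quantize_log_gain(gain, bits=7, g_min=1.0, g_max=4096.0):
    """Uniform quantization of the gain on a log scale to `bits` bits.
    The 7-bit width comes from the text; g_min/g_max are assumed bounds."""
    levels = (1 << bits) - 1                 # 127 steps for 7 bits
    g = min(max(gain, g_min), g_max)         # clamp into the coded range
    t = math.log(g / g_min) / math.log(g_max / g_min)
    return round(t * levels)
```

For example, a frame of alternating +/-3, +/-4 samples has RMS about 3.54, which the quantizer maps onto one of 128 logarithmically spaced levels.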
However, no matter which conventional gain estimation technique is adopted, a single estimation method cannot accurately determine the correct gain value, because the traditional LPC coder is an open-loop system.
The object of the present invention is to overcome the shortcomings of the aforementioned prior art and to provide a speech coding gain estimation method that can obtain a smooth synthesized speech signal.
Another object of the present invention is to provide a speech coding gain estimation method that estimates the gain value according to the shape of the signal envelope of the speech waveform.
For arriving above-mentioned purpose, the present invention takes following scheme:
The waveform gain estimation method for speech coding of the present invention comprises the following steps:
a. providing decoded envelope data, obtained by analyzing typical speech signals;
b. having a voiced/unvoiced decision unit select one of the aperiodic pulse signal produced by passing a periodic speech pulse train through an oscillator, and a noise signal;
c. dividing each frame of the input speech signal into several subframes, and then performing voiced/unvoiced discrimination on each subframe with said decision unit;
d. supplying a modified LPC parameter simultaneously to a synthesis filter and a post-filter;
e. obtaining the LPC parameter from the synthesis filter and the decoded envelope data with an amplitude calculation unit, calculating the gain value, and feeding the gain value into a gain unit to control the level of the output synthesized speech;
f. outputting a speech signal through the post-filter.
The invention is described in detail below with reference to the drawings and an embodiment:
Brief Description Of Drawings:
Fig. 1 is the basic block diagram of a conventional speech synthesis circuit;
Fig. 2 is a schematic diagram of the speech synthesis steps of the present invention;
Fig. 3 is the coding table of the preferred embodiment of the present invention, in which 16 different envelope shapes are encoded with 4-bit codes.
As shown in Fig. 2, the speech synthesis scheme of the present invention mainly includes an oscillator 21 (Vibrator), a voiced/unvoiced decision unit 22 (Voiced/Unvoiced Decision), a synthesis filter 24 (Synthesis Filter), a modified LPC parameter unit 23 that interpolates the LPC coefficients in the LSP domain (Interpolate LPC Coefficient in LSP Domain), an amplitude calculation unit 25 (Amplitude Calculation Unit), a decoded signal envelope information unit 26 (Decoded Envelope), a gain unit 27 (Gain Unit), and a post-filter 28 (Post Filter). The synthesis filter 24 includes an all-pole filter (All-pole Filter) and a de-emphasis filter (De-emphasis Filter).
After the periodic impulse train (Periodic Impulse Train) passes through the oscillator 21, an aperiodic pulse signal (Aperiodic Pulse) is sent to the voiced/unvoiced decision unit 22; the white noise signal (White Noise) is delivered to the voiced/unvoiced decision unit 22 through another path.
The discrimination method adopted by the voiced/unvoiced decision unit 22 is to divide each frame of the input speech signal into four subframes (Subframe) and then classify each subframe. For each subframe, the decision is made comprehensively according to its relevant parameters, which include the number of zero crossings (NC), the energy, the line spectrum pair (Line Spectrum Pair, LSP), and the low-to-high band energy ratio (Low to High Band Energy Ratio Value, LOH). The same applicant has filed another patent application on this voiced/unvoiced decision technique.
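As a rough illustration of a subframe voiced/unvoiced decision of the kind described (the patent combines NC, energy, LSP, and LOH; this sketch uses only energy and zero-crossing rate, with made-up thresholds):

```python
import math

def classify_subframes(frame, n_sub=4, energy_thr=0.01, zc_thr=0.35):
    """Label each of the 4 subframes voiced/unvoiced from its average power
    and zero-crossing rate. Thresholds are illustrative, not from the patent."""
    sub_len = len(frame) // n_sub
    labels = []
    for k in range(n_sub):
        sub = frame[k * sub_len:(k + 1) * sub_len]
        energy = sum(x * x for x in sub) / sub_len            # average power
        zc = sum(1 for a, b in zip(sub, sub[1:]) if a * b < 0) / (sub_len - 1)
        labels.append("voiced" if energy > energy_thr and zc < zc_thr
                      else "unvoiced")
    return labels
```

A low-pitched tone (high energy, few zero crossings) would come out voiced in every subframe, while a rapidly alternating noise-like signal would come out unvoiced.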
For a slowly varying speech input signal, updating each frame one by one achieves the required output signal quality. Under transient conditions, however, transient distortion can occur at frame transitions. To reduce this distortion, when the LPC parameters are sent to the synthesis filter 24, the LSP parameters (in the above explanation, the LSP parameters denote the LPC parameters before modification) are modified by the modified LPC parameter unit 23 of the present invention. The method evaluates intermediate parameter sets between frames, so that the frame junctions become smoother without increasing the code capacity. To reduce the number of linear interpolation computations on the LPC parameters, in the preferred embodiment of the present invention each speech frame is divided into four subframes, and the LSP parameters of each subframe are obtained by interpolating between the LSP parameter values of the present frame and the previous frame. The interpolated LSP parameters are then converted into LPC parameters, and the modified LPC parameters are finally delivered simultaneously to the synthesis filter 24 and the post-filter 28.
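The per-subframe LSP interpolation between the previous and the current frame can be sketched as follows; the function name and the linear weighting are an assumed reading of the text:

```python
def interpolate_lsp(prev_lsp, curr_lsp, n_sub=4):
    """Per-subframe LSP vectors, linearly interpolated between the previous
    frame's LSPs and the current frame's, so frame junctions stay smooth."""
    subframe_lsps = []
    for k in range(1, n_sub + 1):
        w = k / n_sub          # weight grows toward the current frame
        subframe_lsps.append([(1 - w) * p + w * c
                              for p, c in zip(prev_lsp, curr_lsp)])
    return subframe_lsps
```

The last subframe lands exactly on the current frame's LSPs, so no extra bits are spent: the decoder can regenerate the same interpolated values from the two transmitted frames.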
The amplitude calculation unit 25 obtains the LPC parameters from the synthesis filter 24 and the related data sent by the decoded envelope information unit 26, and outputs a gain control signal to the gain unit 27; finally a required speech signal is output by the post-filter 28.
The signal input to the envelope information unit 26 comprises an envelope shape index value (Shape Index) and a quantized gain value (Quantized Gain). These two parameters are obtained by analyzing frames of typical speech signals. In an embodiment of the present invention, 16 different envelope shapes are encoded with 4-bit codes; the corresponding table is shown in Fig. 3. According to this envelope shape coding table, during envelope coding the shape of an input speech frame is compared against the table to find the best-matching envelope shape index value, and the gain is quantized, for example to 7 bits, with a known logarithmic quantizer. The quantized gain value and the envelope shape index value obtained in this way are sent into the envelope information unit 26 of Fig. 2.
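The shape-index search described above — matching an input frame's envelope against a table of shapes — can be sketched as a nearest-codebook lookup. The function name and the small example table are illustrative; the patent's Fig. 3 table holds 16 shapes for a 4-bit index:

```python
def encode_envelope_shape(frame_env, shape_table):
    """Return the index of the codebook shape closest (in squared error)
    to the frame's envelope. A real table would hold 16 shapes."""
    def sq_err(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(shape_table)),
               key=lambda idx: sq_err(frame_env, shape_table[idx]))
```

Only the 4-bit index and the quantized gain need to be transmitted; the decoder reconstructs the envelope from its copy of the table.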
In the gain calculation of the present invention, the gain is computed so that the maximum amplitude of the synthesized speech just reaches the decoded envelope. The calculation is performed separately for voiced frames and noise (unvoiced) subframes.
1. Voiced frames:
A voiced subframe is excited by aperiodic pulses. To compute the gain, the unit impulse response of the synthesis filter at the pulse position is calculated first. The gain of the pulse is then calculated by the following formula:
α_k = min(abs(Env_k,i / imp_res_k,i)),  p_o ≤ i ≤ p_o + r

where:
α_k is the gain of the k-th pulse;
Env_k,i is the decoded envelope at position i for the k-th pulse;
imp_res_k,i is the impulse response;
p_o is the pulse position;
r is the search length (a typical value is 10).

After the gain of the pulse is calculated, the pulse is sent into the synthesis filter, which multiplies the signal by the computed α_k, so that a synthesized speech signal (Synthesized Speech) is produced at the output of the synthesis filter 24. After completing the above calculation, the steps are repeated to calculate the gain of the next pulse.
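The per-pulse gain formula can be sketched as follows; `env`, `imp_res`, `p0`, and `r` mirror Env_k,i, imp_res_k,i, p_o, and r, and the function name is illustrative:

```python
def pulse_gain(env, imp_res, p0, r=10):
    """alpha_k = min over p0 <= i <= p0+r of |env[i] / imp_res[i]|:
    the largest scale that keeps the pulse's filter response under
    the decoded envelope throughout the search window."""
    hi = min(p0 + r, len(env) - 1, len(imp_res) - 1)
    return min(abs(env[i] / imp_res[i])
               for i in range(p0, hi + 1) if imp_res[i] != 0)
```

Taking the minimum of the ratios guarantees that, once the impulse response is scaled by α_k, its magnitude never exceeds the decoded envelope anywhere in the window.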
2. Noise (unvoiced) subframes:
A noise subframe adopts white noise (White Noise) as the excitation. The noise response of the synthesis filter over the whole subframe is calculated first, to prevent the amplitude of the synthesized signal from exceeding the decoded envelope within the subframe. The gain of the noise signal of the whole subframe is calculated by the following formula:
β_j = min(abs(Env_j,i / noise_res_j,i)),  w_o ≤ i ≤ sub_leng

where:
β_j is the gain of the noise signal of the whole j-th subframe;
Env_j,i is the decoded envelope of the noise signal at position i;
noise_res_j,i is the noise signal response;
w_o is the starting position of each subframe;
sub_leng is the subframe length.

After the gain of the noise signal is calculated, the noise signal is sent into the synthesis filter, which multiplies the signal by the computed β_j, so that an unvoiced synthesized speech signal (Unvoiced Synthesized Speech) is produced in the whole j-th subframe at the output of the synthesis filter 24.
In summary, the effects of the present invention are as follows:
Because the present invention estimates the gain according to the waveform shape of the speech signal envelope, the data of each frame can be updated one by one for a slowly varying speech input signal and the transient distortion of the signal can be reduced, so a more realistic and smooth synthesized speech signal can be obtained.
Claims (8)
1. A waveform gain estimation method for speech coding, comprising the following steps:
a. providing decoded envelope data, obtained by analyzing typical speech signals;
b. having a voiced/unvoiced decision unit select one of the aperiodic pulse signal produced by passing a periodic speech pulse train through an oscillator, and a noise signal;
c. dividing each frame of the input speech signal into several subframes, and then performing voiced/unvoiced discrimination on each subframe with said decision unit;
d. supplying a modified LPC parameter simultaneously to a synthesis filter and a post-filter;
e. obtaining the LPC parameter from the synthesis filter and the decoded envelope data with an amplitude calculation unit, calculating the gain value, and feeding the gain value into a gain unit to control the level of the output synthesized speech;
f. outputting a speech signal through the post-filter.
2. The waveform gain estimation method for speech coding according to claim 1, characterized in that the envelope data in said step a includes the envelope shape index value and the quantized gain value of the speech signal.
3. The waveform gain estimation method for speech coding according to claim 2, characterized in that the envelope shape index value and the quantized gain value are obtained by analyzing frames of the speech signal; according to the analysis result, 16 different envelope shapes are encoded with 4-bit codes, and a corresponding table is obtained.
4. The waveform gain estimation method for speech coding according to claim 1, characterized in that the modified LPC parameter delivered to the synthesis filter in said step d is obtained by the following step:
according to a decoded LSP parameter, interpolating the LPC parameter in the LSP domain; the method evaluates intermediate parameter sets between frames so that, without increasing the code capacity, the interpolation makes the frame junctions smoother and reduces the transient distortion.
5. The waveform gain estimation method for speech coding according to claim 4, characterized in that in the step of interpolating the LPC parameters in the LSP domain, each speech frame is divided into four subframes, the LSP parameters of each subframe are obtained by interpolating between the LSP parameter values of the present frame and the previous frame, and the LSP parameters are then converted into LPC parameters.
6. The waveform gain estimation method for speech coding according to claim 1, characterized in that in the gain calculation of said step e, a suitable gain value is calculated so that the maximum amplitude of the synthesized speech just reaches the decoded envelope, and the voiced and unvoiced subframes of the input speech signal are analyzed and calculated separately, to calculate the gain values of the voiced and noise subframes respectively.
7. The waveform gain estimation method for speech coding according to claim 6, characterized in that the calculation of the gain value of said voiced frame comprises the following steps:
a. calculating the unit impulse response of the synthesis filter at the pulse position;
b. calculating the gain of the pulse with the following formula:
α_k = min(abs(Env_k,i / imp_res_k,i)),  p_o ≤ i ≤ p_o + r
where α_k is the gain of the k-th pulse; Env_k,i is the decoded envelope at position i for the k-th pulse; imp_res_k,i is the impulse response; p_o is the pulse position; and r is the search length;
c. after the gain of the pulse is calculated, sending the pulse into the synthesis filter;
d. after receiving the pulse, the synthesis filter multiplying the signal by the computed α_k to produce a synthesized speech signal at the output of the synthesis filter;
e. after completing the above calculation, repeating the above steps to calculate the gain of the next pulse.
8. The waveform gain estimation method for speech coding according to claim 6, characterized in that the calculation of the gain value of said noise subframe includes:
a. first calculating the noise signal response of the synthesis filter over the whole subframe;
b. calculating the gain of the whole subframe with the following formula:
β_j = min(abs(Env_j,i / noise_res_j,i)),  w_o ≤ i ≤ sub_leng
where β_j is the noise signal gain of the whole j-th subframe; Env_j,i is the decoded envelope of the noise signal at position i; noise_res_j,i is the noise signal response; w_o is the starting position of each subframe; and sub_leng is the subframe length;
c. after the gain of the noise signal is calculated, sending the noise signal into the synthesis filter;
d. after receiving the signal, the synthesis filter multiplying the signal by the computed β_j, so that a noise (unvoiced) synthesized speech is produced in the whole j-th subframe at the output of the synthesis filter.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN97100716A CN1190773A (en) | 1997-02-13 | 1997-02-13 | Method estimating wave shape gain for phoneme coding |
Publications (1)
Publication Number | Publication Date |
---|---|
CN1190773A true CN1190773A (en) | 1998-08-19 |
Family
ID=5165266
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN97100716A Pending CN1190773A (en) | 1997-02-13 | 1997-02-13 | Method estimating wave shape gain for phoneme coding |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1190773A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100587807C (en) * | 1999-01-27 | 2010-02-03 | 编码技术股份公司 | Device for enhancing information source decoder and method for enhancing information source decoding method |
CN101625866B (en) * | 1999-01-27 | 2012-12-26 | 杜比国际公司 | Methods and an apparatus for enhancement of source decoder |
US7519530B2 (en) | 2003-01-09 | 2009-04-14 | Nokia Corporation | Audio signal processing |
CN101199233B (en) * | 2005-05-18 | 2012-01-18 | 松下电器产业株式会社 | Howling control apparatus and acoustic apparatus |
CN103001598A (en) * | 2011-07-19 | 2013-03-27 | 联发科技股份有限公司 | Audio processing device and audio systems using same |
CN103001598B (en) * | 2011-07-19 | 2015-10-28 | 联发科技股份有限公司 | Apparatus for processing audio and use the audio system of this apparatus for processing audio |
US9252730B2 (en) | 2011-07-19 | 2016-02-02 | Mediatek Inc. | Audio processing device and audio systems using the same |
CN105355197A (en) * | 2015-10-30 | 2016-02-24 | 百度在线网络技术(北京)有限公司 | Gain processing method and device for speech recognition system |
CN105355197B (en) * | 2015-10-30 | 2020-01-07 | 百度在线网络技术(北京)有限公司 | Gain processing method and device for voice recognition system |
Legal Events
Code | Title | Description
---|---|---
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
C06 | Publication |
PB01 | Publication |
C53 | Correction of patent for invention or patent application |
CB02 | Change of applicant information | Applicant after: Shengqun Semiconductor Co., Ltd.; Applicant before: Hetai Semiconductor Co., Ltd.
COR | Change of bibliographic data | Free format text: CORRECT: APPLICANT; FROM: HETAI SEMICONDUCTOR CO., LTD. TO: SHENGQUN SEMICONDUCTOR CO., LTD.
C01 | Deemed withdrawal of patent application (patent law 1993) |
WD01 | Invention patent application deemed withdrawn after publication |