CN1190773A - Method estimating wave shape gain for phoneme coding - Google Patents
- Publication number
- CN1190773A (application CN97100716A)
- Authority
- CN
- China
- Prior art keywords
- signal
- voice
- frame
- consonant
- gain
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The waveform gain estimation method for speech coding includes the following steps: providing decoded envelope data, consisting of an envelope shape index value and a quantized gain value; passing a periodic speech pulse train through an oscillator to produce an aperiodic pulse signal and feeding it into a voiced/unvoiced decision unit, while a noise signal is fed into the same decision unit through another path; dividing each frame of the input speech signal into several subframes and classifying each subframe; supplying a modified LPC parameter simultaneously to a synthesis filter and a post-filter; using an amplitude calculation unit to obtain the LPC parameter from the synthesis filter together with the decoded envelope data; calculating a gain value and feeding it into a gain unit to control the level of the synthesized speech; and finally outputting a smooth speech signal through the post-filter.
Description
The present invention relates to speech coding technology, and in particular to a waveform gain estimation method for speech coding.
In speech synthesis, linear predictive coding (LPC, Linear Predictive Coding) vocoders are generally used. Among LPC-based methods, the LPC-10 vocoder is widely applied in low-bit-rate speech compression.
Fig. 1 shows the block diagram of this conventional speech coding technique. The block includes a speech pulse generator 11 (Impulse Train Generator), a random noise signal generator 12 (Random Noise Generator), a voiced/unvoiced switch 13 (Voiced/Unvoiced Switch), a gain unit 14 (Gain Unit), a synthesis filter 15 (LPC Filter), and a synthesis-filter control parameter setting unit 16; the gain unit 14 additionally contains a gain setting unit.
The periodic impulse train (Periodic Impulse Train) produced by the speech pulse generator 11, or the white noise (White Noise) produced by the random noise signal generator 12, is selected by the voiced/unvoiced switch 13 according to the type of the input signal. The selected excitation first passes through the gain unit 14, which scales it by the preset gain value to adjust its level, and is then filtered by the synthesis filter 15 according to the preset LPC parameters (LPC Parameters); finally the speech signal S(n) is output at the output of the synthesis filter 15.
In practical speech coding, the output gain of the synthesized speech must be set or controlled so that the output signal matches the level of the input speech. Conventionally, two techniques are used for this gain setting and control. The first determines the gain from the energy of the linearly predicted samples (Linear Predicted Samples) of the speech signal. The second computes the gain from the root-mean-square (RMS) value. In this prior art, the gain of an unvoiced frame (Unvoiced Frame) of a noise signal is estimated purely from the RMS value; for a voiced frame (Voiced Frame) the same RMS estimate is used, but a rectangular window covering several current pitch cycles can further be applied to obtain a more precise gain value. The gain obtained by this prior art is then quantized uniformly as a 7-bit logarithmic value.
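As a sketch of the prior-art approach just described, the following computes an RMS frame gain and quantizes it logarithmically to 7 bits. The function names and the gain bounds `g_min`/`g_max` are assumptions for illustration, not from the patent:

```python
import math

def rms_gain(frame):
    """Root-mean-square gain of one speech frame (the prior-art estimate)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def quantize_log_gain(gain, bits=7, g_min=1.0, g_max=4096.0):
    """Uniform quantization of the gain on a log scale to `bits` bits.
    The 7-bit width comes from the text; g_min/g_max are assumed bounds."""
    levels = (1 << bits) - 1                 # 127 steps for 7 bits
    g = min(max(gain, g_min), g_max)         # clamp into the coded range
    t = math.log(g / g_min) / math.log(g_max / g_min)
    return round(t * levels)
```

For example, a frame of alternating +/-3, +/-4 samples has RMS about 3.54, which the quantizer maps onto one of 128 logarithmically spaced levels.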
However, no matter which conventional gain estimation technique is adopted, a single estimation method cannot accurately determine the correct gain value, because the traditional LPC coder is an open-loop system.
The object of the present invention is to overcome the shortcomings of the aforementioned prior art and to provide a speech coding gain estimation method that can obtain a smooth synthesized speech signal.
Another object of the present invention is to provide a speech coding gain estimation method that estimates the gain value according to the shape of the signal envelope of the speech waveform.
For arriving above-mentioned purpose, the present invention takes following scheme:
The waveform gain estimation method for speech coding of the present invention comprises the following steps:
a. providing decoded envelope data, obtained by analyzing typical speech signals;
b. having a voiced/unvoiced decision unit select one of the aperiodic pulse signal produced by passing a periodic speech pulse train through an oscillator, and a noise signal;
c. dividing each frame of the input speech signal into several subframes, and then performing voiced/unvoiced discrimination on each subframe with said decision unit;
d. supplying a modified LPC parameter simultaneously to a synthesis filter and a post-filter;
e. obtaining the LPC parameter from the synthesis filter and the decoded envelope data with an amplitude calculation unit, calculating the gain value, and feeding the gain value into a gain unit to control the level of the output synthesized speech;
f. outputting a speech signal through the post-filter.
The invention is described in detail below with reference to the drawings and an embodiment:
Brief Description Of Drawings:
Fig. 1 is the basic block diagram of a conventional speech synthesis circuit;
Fig. 2 is a schematic diagram of the speech synthesis steps of the present invention;
Fig. 3 is the coding table of the preferred embodiment of the present invention, in which 16 different envelope shapes are encoded with 4-bit codes.
As shown in Fig. 2, the speech synthesis scheme of the present invention mainly includes an oscillator 21 (Vibrator), a voiced/unvoiced decision unit 22 (Voiced/Unvoiced Decision), a synthesis filter 24 (Synthesis Filter), a modified LPC parameter unit 23 that interpolates the LPC coefficients in the LSP domain (Interpolate LPC Coefficient in LSP Domain), an amplitude calculation unit 25 (Amplitude Calculation Unit), a decoded signal envelope information unit 26 (Decoded Envelope), a gain unit 27 (Gain Unit), and a post-filter 28 (Post Filter). The synthesis filter 24 includes an all-pole filter (All-pole Filter) and a de-emphasis filter (De-emphasis Filter).
After the periodic impulse train (Periodic Impulse Train) passes through the oscillator 21, an aperiodic pulse signal (Aperiodic Pulse) is sent to the voiced/unvoiced decision unit 22; the white noise signal (White Noise) is delivered to the voiced/unvoiced decision unit 22 through another path.
The discrimination method adopted by the voiced/unvoiced decision unit 22 is to divide each frame of the input speech signal into four subframes (Subframe) and then classify each subframe. For each subframe, the decision is made comprehensively according to its relevant parameters, which include the number of zero crossings (NC), the energy, the line spectrum pair (Line Spectrum Pair, LSP), and the low-to-high band energy ratio (Low to High Band Energy Ratio Value, LOH). The same applicant has filed another patent application on this voiced/unvoiced decision technique.
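As a rough illustration of a subframe voiced/unvoiced decision of the kind described (the patent combines NC, energy, LSP, and LOH; this sketch uses only energy and zero-crossing rate, with made-up thresholds):

```python
import math

def classify_subframes(frame, n_sub=4, energy_thr=0.01, zc_thr=0.35):
    """Label each of the 4 subframes voiced/unvoiced from its average power
    and zero-crossing rate. Thresholds are illustrative, not from the patent."""
    sub_len = len(frame) // n_sub
    labels = []
    for k in range(n_sub):
        sub = frame[k * sub_len:(k + 1) * sub_len]
        energy = sum(x * x for x in sub) / sub_len            # average power
        zc = sum(1 for a, b in zip(sub, sub[1:]) if a * b < 0) / (sub_len - 1)
        labels.append("voiced" if energy > energy_thr and zc < zc_thr
                      else "unvoiced")
    return labels
```

A low-pitched tone (high energy, few zero crossings) would come out voiced in every subframe, while a rapidly alternating noise-like signal would come out unvoiced.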
For a slowly varying speech input signal, updating each frame one by one achieves the required output signal quality. Under transient conditions, however, transient distortion can occur at frame transitions. To reduce this distortion, when the LPC parameters are sent to the synthesis filter 24, the LSP parameters (in the above explanation, the LSP parameters denote the LPC parameters before modification) are modified by the modified LPC parameter unit 23 of the present invention. The method evaluates intermediate parameter sets between frames, so that the frame junctions become smoother without increasing the code capacity. To reduce the number of linear interpolation computations on the LPC parameters, in the preferred embodiment of the present invention each speech frame is divided into four subframes, and the LSP parameters of each subframe are obtained by interpolating between the LSP parameter values of the present frame and the previous frame. The interpolated LSP parameters are then converted into LPC parameters, and the modified LPC parameters are finally delivered simultaneously to the synthesis filter 24 and the post-filter 28.
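The per-subframe LSP interpolation between the previous and the current frame can be sketched as follows; the function name and the linear weighting are an assumed reading of the text:

```python
def interpolate_lsp(prev_lsp, curr_lsp, n_sub=4):
    """Per-subframe LSP vectors, linearly interpolated between the previous
    frame's LSPs and the current frame's, so frame junctions stay smooth."""
    subframe_lsps = []
    for k in range(1, n_sub + 1):
        w = k / n_sub          # weight grows toward the current frame
        subframe_lsps.append([(1 - w) * p + w * c
                              for p, c in zip(prev_lsp, curr_lsp)])
    return subframe_lsps
```

The last subframe lands exactly on the current frame's LSPs, so no extra bits are spent: the decoder can regenerate the same interpolated values from the two transmitted frames.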
The amplitude calculation unit 25 obtains the LPC parameters from the synthesis filter 24 and the related data sent by the decoded envelope information unit 26, and outputs a gain control signal to the gain unit 27; finally a required speech signal is output by the post-filter 28.
The signal input to the envelope information unit 26 comprises an envelope shape index value (Shape Index) and a quantized gain value (Quantized Gain). These two parameters are obtained by analyzing frames of typical speech signals. In an embodiment of the present invention, 16 different envelope shapes are encoded with 4-bit codes; the corresponding table is shown in Fig. 3. According to this envelope shape coding table, during envelope coding the shape of an input speech frame is compared against the table to find the best-matching envelope shape index value, and the gain is quantized, for example to 7 bits, with a known logarithmic quantizer. The quantized gain value and the envelope shape index value obtained in this way are sent into the envelope information unit 26 of Fig. 2.
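The shape-index search described above — matching an input frame's envelope against a table of shapes — can be sketched as a nearest-codebook lookup. The function name and the small example table are illustrative; the patent's Fig. 3 table holds 16 shapes for a 4-bit index:

```python
def encode_envelope_shape(frame_env, shape_table):
    """Return the index of the codebook shape closest (in squared error)
    to the frame's envelope. A real table would hold 16 shapes."""
    def sq_err(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(shape_table)),
               key=lambda idx: sq_err(frame_env, shape_table[idx]))
```

Only the 4-bit index and the quantized gain need to be transmitted; the decoder reconstructs the envelope from its copy of the table.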
In the gain calculation of the present invention, the gain is computed so that the maximum amplitude of the synthesized speech just reaches the decoded envelope. The calculation is performed separately for voiced frames and noise (unvoiced) subframes.
1. Voiced frames:
A voiced subframe is excited by aperiodic pulses. To compute the gain, the unit impulse response of the synthesis filter at the pulse position is calculated first. The gain of the pulse is then calculated by the following formula:
α_k = min(abs(Env_k,i / imp_res_k,i)),  p_o ≤ i ≤ p_o + r

where:
α_k is the gain of the k-th pulse;
Env_k,i is the decoded envelope at position i for the k-th pulse;
imp_res_k,i is the impulse response;
p_o is the pulse position;
r is the search length (a typical value is 10).

After the gain of the pulse is calculated, the pulse is sent into the synthesis filter, which multiplies the signal by the computed α_k, so that a synthesized speech signal (Synthesized Speech) is produced at the output of the synthesis filter 24. After completing the above calculation, the steps are repeated to calculate the gain of the next pulse.
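The per-pulse gain formula can be sketched as follows; `env`, `imp_res`, `p0`, and `r` mirror Env_k,i, imp_res_k,i, p_o, and r, and the function name is illustrative:

```python
def pulse_gain(env, imp_res, p0, r=10):
    """alpha_k = min over p0 <= i <= p0+r of |env[i] / imp_res[i]|:
    the largest scale that keeps the pulse's filter response under
    the decoded envelope throughout the search window."""
    hi = min(p0 + r, len(env) - 1, len(imp_res) - 1)
    return min(abs(env[i] / imp_res[i])
               for i in range(p0, hi + 1) if imp_res[i] != 0)
```

Taking the minimum of the ratios guarantees that, once the impulse response is scaled by α_k, its magnitude never exceeds the decoded envelope anywhere in the window.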
2. Noise (unvoiced) subframes:
A noise subframe adopts white noise (White Noise) as the excitation. The noise response of the synthesis filter over the whole subframe is calculated first, to prevent the amplitude of the synthesized signal from exceeding the decoded envelope within the subframe. The gain of the noise signal of the whole subframe is calculated by the following formula:
β_j = min(abs(Env_j,i / noise_res_j,i)),  w_o ≤ i ≤ sub_leng

where:
β_j is the gain of the noise signal of the whole j-th subframe;
Env_j,i is the decoded envelope of the noise signal at position i;
noise_res_j,i is the noise signal response;
w_o is the starting position of each subframe;
sub_leng is the subframe length.

After the gain of the noise signal is calculated, the noise signal is sent into the synthesis filter, which multiplies the signal by the computed β_j, so that an unvoiced synthesized speech signal (Unvoiced Synthesized Speech) is produced in the whole j-th subframe at the output of the synthesis filter 24.
In summary, the effects of the present invention are as follows:
Because the present invention estimates the gain according to the waveform shape of the speech signal envelope, the data of each frame can be updated one by one for a slowly varying speech input signal and the transient distortion of the signal can be reduced, so a more realistic and smooth synthesized speech signal can be obtained.
Claims (8)
1. A waveform gain estimation method for speech coding, comprising the following steps:
a. providing decoded envelope data, obtained by analyzing typical speech signals;
b. having a voiced/unvoiced decision unit select one of the aperiodic pulse signal produced by passing a periodic speech pulse train through an oscillator, and a noise signal;
c. dividing each frame of the input speech signal into several subframes, and then performing voiced/unvoiced discrimination on each subframe with said decision unit;
d. supplying a modified LPC parameter simultaneously to a synthesis filter and a post-filter;
e. obtaining the LPC parameter from the synthesis filter and the decoded envelope data with an amplitude calculation unit, calculating the gain value, and feeding the gain value into a gain unit to control the level of the output synthesized speech;
f. outputting a speech signal through the post-filter.
2. The waveform gain estimation method for speech coding according to claim 1, characterized in that the envelope data in said step a includes the envelope shape index value and the quantized gain value of the speech signal.
3. The waveform gain estimation method for speech coding according to claim 2, characterized in that the envelope shape index value and the quantized gain value are obtained by analyzing frames of the speech signal; according to the analysis result, 16 different envelope shapes are encoded with 4-bit codes, and a corresponding table is obtained.
4. The waveform gain estimation method for speech coding according to claim 1, characterized in that the modified LPC parameter delivered to the synthesis filter in said step d is obtained by the following step:
according to a decoded LSP parameter, interpolating the LPC parameter in the LSP domain; the method evaluates intermediate parameter sets between frames so that, without increasing the code capacity, the interpolation makes the frame junctions smoother and reduces the transient distortion.
5. The waveform gain estimation method for speech coding according to claim 4, characterized in that in the step of interpolating the LPC parameters in the LSP domain, each speech frame is divided into four subframes, the LSP parameters of each subframe are obtained by interpolating between the LSP parameter values of the present frame and the previous frame, and the LSP parameters are then converted into LPC parameters.
6. The waveform gain estimation method for speech coding according to claim 1, characterized in that in the gain calculation of said step e, a suitable gain value is calculated so that the maximum amplitude of the synthesized speech just reaches the decoded envelope, and the voiced and unvoiced subframes of the input speech signal are analyzed and calculated separately, to calculate the gain values of the voiced and noise subframes respectively.
7. The waveform gain estimation method for speech coding according to claim 6, characterized in that the calculation of the gain value of said voiced frame comprises the following steps:
a. calculating the unit impulse response of the synthesis filter at the pulse position;
b. calculating the gain of the pulse with the following formula:
α_k = min(abs(Env_k,i / imp_res_k,i)),  p_o ≤ i ≤ p_o + r
where α_k is the gain of the k-th pulse; Env_k,i is the decoded envelope at position i for the k-th pulse; imp_res_k,i is the impulse response; p_o is the pulse position; and r is the search length;
c. after the gain of the pulse is calculated, sending the pulse into the synthesis filter;
d. after receiving the pulse, the synthesis filter multiplying the signal by the computed α_k to produce a synthesized speech signal at the output of the synthesis filter;
e. after completing the above calculation, repeating the above steps to calculate the gain of the next pulse.
8. The waveform gain estimation method for speech coding according to claim 6, characterized in that the calculation of the gain value of said noise subframe includes:
a. first calculating the noise signal response of the synthesis filter over the whole subframe;
b. calculating the gain of the whole subframe with the following formula:
β_j = min(abs(Env_j,i / noise_res_j,i)),  w_o ≤ i ≤ sub_leng
where β_j is the noise signal gain of the whole j-th subframe; Env_j,i is the decoded envelope of the noise signal at position i; noise_res_j,i is the noise signal response; w_o is the starting position of each subframe; and sub_leng is the subframe length;
c. after the gain of the noise signal is calculated, sending the noise signal into the synthesis filter;
d. after receiving the signal, the synthesis filter multiplying the signal by the computed β_j, so that a noise (unvoiced) synthesized speech is produced in the whole j-th subframe at the output of the synthesis filter.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN97100716A CN1190773A (en) | 1997-02-13 | 1997-02-13 | Method estimating wave shape gain for phoneme coding |
Publications (1)
Publication Number | Publication Date |
---|---|
CN1190773A true CN1190773A (en) | 1998-08-19 |
Family
ID=5165266
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN97100716A Pending CN1190773A (en) | 1997-02-13 | 1997-02-13 | Method estimating wave shape gain for phoneme coding |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1190773A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100587807C (en) * | 1999-01-27 | 2010-02-03 | 编码技术股份公司 | Device for enhancing information source decoder and method for enhancing information source decoding method |
CN101625866B (en) * | 1999-01-27 | 2012-12-26 | 杜比国际公司 | Methods and an apparatus for enhancement of source decoder |
US7519530B2 (en) | 2003-01-09 | 2009-04-14 | Nokia Corporation | Audio signal processing |
CN101199233B (en) * | 2005-05-18 | 2012-01-18 | 松下电器产业株式会社 | Howling control apparatus and acoustic apparatus |
CN103001598A (en) * | 2011-07-19 | 2013-03-27 | 联发科技股份有限公司 | Audio processing device and audio systems using same |
CN103001598B (en) * | 2011-07-19 | 2015-10-28 | 联发科技股份有限公司 | Apparatus for processing audio and use the audio system of this apparatus for processing audio |
US9252730B2 (en) | 2011-07-19 | 2016-02-02 | Mediatek Inc. | Audio processing device and audio systems using the same |
CN105355197A (en) * | 2015-10-30 | 2016-02-24 | 百度在线网络技术(北京)有限公司 | Gain processing method and device for speech recognition system |
CN105355197B (en) * | 2015-10-30 | 2020-01-07 | 百度在线网络技术(北京)有限公司 | Gain processing method and device for voice recognition system |
Legal Events
Code | Title | Description
---|---|---
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
C06 | Publication |
PB01 | Publication |
C53 | Correction of patent for invention or patent application |
CB02 | Change of applicant information | Applicant after: Shengqun Semiconductor Co., Ltd.; Applicant before: Hetai Semiconductor Co., Ltd.
COR | Change of bibliographic data | Free format text: CORRECT: APPLICANT; FROM: HETAI SEMICONDUCTOR CO., LTD. TO: SHENGQUN SEMICONDUCTOR CO., LTD.
C01 | Deemed withdrawal of patent application (patent law 1993) |
WD01 | Invention patent application deemed withdrawn after publication |