CN103050121A - Linear prediction speech coding method and speech synthesis method - Google Patents

Linear prediction speech coding method and speech synthesis method

Info

Publication number
CN103050121A
CN103050121A (application CN201210592909A)
Authority
CN
China
Prior art keywords
carried out
pitch period
subband
wavelet
residual signals
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 201210592909
Other languages
Chinese (zh)
Inventor
洪小斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING XUNGUANGDA COMMUNICATION TECHNOLOGY CO LTD
Original Assignee
BEIJING XUNGUANGDA COMMUNICATION TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING XUNGUANGDA COMMUNICATION TECHNOLOGY CO LTD filed Critical BEIJING XUNGUANGDA COMMUNICATION TECHNOLOGY CO LTD
Priority to CN201210592909
Publication of CN103050121A
Legal status: Pending

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a linear prediction speech coding method and a speech synthesis method. The linear prediction speech coding method includes the following steps: the speech is preprocessed; second-order inverse linear prediction is performed on the preprocessed speech to obtain a residual signal; wavelet decomposition and compression are performed on the residual signal to obtain wavelet coefficients, which are vector-quantized; meanwhile, the pitch period and gain parameters of the residual signal and the voiced/unvoiced character of each subband are calculated and each scalar-quantized. The speech synthesis method is based on the linear prediction speech coding method. The technical scheme of the invention reduces the effect of noise on decoded speech quality, suppresses the deterioration of speech quality when the voicing decision is wrong, and improves the coding performance for unvoiced speech and background noise.

Description

Linear prediction speech coding method and speech synthesis method
Technical field
The present invention relates to speech coding technology, and in particular to a linear prediction speech coding method and a speech synthesis method.
Background technology
With the rapid development of the information society and communication technology, frequency resources are increasingly valuable. In digital mobile communication and speech storage, using a speech coder to compress the transmission bandwidth of the speech signal or to reduce the transmission bit rate of the telephone channel, so as to make efficient use of communication bandwidth or storage space, has long been a goal. As the number of network users grows and networks become more integrated and diversified, the tension between network bandwidth on one side and system capacity and quality of service on the other means that traditional speech compression coding and decoding techniques can no longer satisfy the requirements of increasingly crowded transmission channels. How to reduce the transmission bit rate as far as possible without sacrificing call quality is therefore an important research topic. Over the past decade, research on medium-bit-rate (4.8 kbps to 16 kbps) speech coding algorithms has made significant progress and found widespread application, while low-bit-rate algorithms, particularly those at 2.4 kbps and below, have gradually become a research focus. With the rapid increase in the processing speed of the chips that run the coding algorithms, algorithms based on mixed linear prediction coding techniques have gradually become the mainstream of low-bit-rate speech coding.
Linear prediction coding (LPC) is based on the assumption that the (voiced) speech signal is produced by a buzzer at the end of a tube, occasionally accompanied by hissing and popping sounds (sibilants and plosives): the glottis between the vocal cords produces sound of varying intensity (volume) and frequency (pitch), while the throat and mouth form the resonant vocal tract; hisses and pops are generated by the action of the tongue, lips, and throat. Linear prediction coding analyzes the speech signal by estimating the formants, removing their effect from the signal, and estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the signal remaining after this process is called the residual signal. The parameters describing the formants (the linear prediction coefficients) and the residual signal can be stored or sent to a receiver. The receiver synthesizes the speech signal by the reverse process: the residual signal serves as the excitation source, the linear prediction coefficients define the vocal-tract filter, and passing the source signal through the filter yields the speech signal.
According to how the excitation signal is described, linear prediction speech coding methods are mainly divided into LPC-10, code-excited linear prediction (CELP), mixed-excitation linear prediction (MELP), sinusoidal-excitation linear prediction (SELP), multi-band excitation (MBE), and so on. These speech coders divide speech into frames of a certain length (about 20 ms to 50 ms), perform linear prediction on each frame, and encode the linear prediction vector and the per-frame prediction residual (the excitation signal) with known codebooks.
Fig. 1 is the basic block diagram of existing linear-prediction-based speech coding methods. Apart from differences in how the residual parameters are extracted, these methods extract the other parameters in essentially the same way. In Fig. 1, the excitation signal is represented by the residual parameters, the pitch period of the original speech, the gain of the original speech, and the voiced/unvoiced character of each subband of the original speech; the residual parameters describe the harmonic component of voiced sound in the residual, while unvoiced sound is replaced with noise.
The speech quality of existing linear-prediction-based vocoders depends strongly on the noise level of the original speech. When the signal-to-noise ratio of the original speech is poor, voicing decision errors and pitch extraction errors cause severe pitch distortion and reduce the naturalness of the synthesized speech. In these techniques, the pitch period, gain, and subband voicing used to generate the excitation signal are all extracted from the original speech, so when the receiving end reconstructs the excitation signal, some parameters derive from the original speech and some from the residual signal, which limits the quality of the decoded speech.
Summary of the invention
(1) Technical problem to be solved
The object of the present invention is to provide a linear prediction speech coding method and a speech synthesis method that reduce the effect of noise on decoded speech quality, suppress the deterioration of speech quality when the voicing decision is wrong, and improve the coding performance for unvoiced speech and background noise.
(2) Technical scheme
In order to solve the above technical problem, the present invention proposes a linear prediction speech coding method comprising the following steps:
S101: preprocessing the speech to remove the DC component and power-frequency interference;
S102: performing second-order inverse linear prediction on the preprocessed speech to obtain a residual signal;
S103: performing wavelet decomposition and compression on the residual signal to obtain wavelet coefficients, and vector-quantizing the wavelet coefficients;
calculating the pitch period of the residual signal and scalar-quantizing the pitch period;
calculating the gain parameters of the residual signal and scalar-quantizing the gain parameters;
dividing the residual signal into several subbands, making a voicing decision for each subband to obtain the voiced/unvoiced character of each subband, and scalar-quantizing it.
Optionally, step S102 further comprises:
performing linear prediction analysis on the preprocessed speech to obtain linear prediction coefficients, converting the linear prediction coefficients to line spectral frequency pairs, and vector-quantizing the line spectral frequency pairs.
Optionally, in step S102, the linear prediction analysis specifically comprises:
windowing the preprocessed speech with a Hamming window, computing the autocorrelation of the windowed speech signal, computing 10th-order linear prediction coefficients with the Levinson-Durbin algorithm, and then multiplying the 10th-order linear prediction coefficients by 0.994^(i+1) (i = 1, 2, ..., 10) to obtain bandwidth-expanded linear prediction coefficients.
Optionally, in step S103, the wavelet decomposition and compression specifically comprises:
taking samples of the residual signal and performing a single-level wavelet decomposition with the db10 wavelet basis to obtain wavelet coefficients, and compressing the first 100 wavelet coefficients for analysis.
Optionally, in step S103, vector-quantizing the wavelet coefficients specifically comprises:
first converting the wavelet coefficients to a wavelet excitation amplitude spectrum, then vector-quantizing the wavelet excitation amplitude spectrum, using a full-search algorithm for the codebook search and a weighted Euclidean distance as the distortion measure.
Optionally, in step S103, calculating the pitch period of the residual signal specifically comprises:
performing spectral analysis on the residual signal with a Fourier transform and applying an inverse Fourier transform to the spectral magnitude, taking the resulting autocorrelation peak of the residual signal as the integer pitch;
searching within ±1 of the integer pitch, and obtaining the pitch period by interpolating and locally correlating the residual signal.
Optionally, in step S103, calculating the pitch period of the residual signal further comprises:
using the pitch period to search the residual signal for the pitch peak and its harmonic peaks, and taking the mean spacing between the peaks as the final pitch period.
Optionally, in step S103, making a voicing decision for each subband specifically comprises:
calculating the maximum normalized autocorrelation value of each subband signal near the pitch period;
calculating the maximum normalized autocorrelation value of each subband envelope signal near the pitch period;
making a voicing decision for each subband by threshold comparison, according to the maximum normalized autocorrelation value of each subband signal near the pitch period and the maximum normalized autocorrelation value of each subband envelope signal near the pitch period.
The present invention also proposes a speech synthesis method based on the above speech coding method, comprising the following steps:
S201: decoding the quantized line spectral frequency pairs, wavelet coefficients, pitch period, gain parameters, and per-subband voicing to obtain the line spectral frequency pairs, wavelet excitation amplitude spectrum, pitch period, gain parameters, and voiced/unvoiced character of each subband;
S202: synthesizing the wavelet excitation signal from the wavelet excitation amplitude spectrum, the pitch period, and the voiced/unvoiced character of each subband;
S203: passing the wavelet excitation signal through the linear prediction synthesis filter derived from the line spectral frequency pairs to obtain synthesized speech;
S204: applying spectral enhancement and phase adjustment to the synthesized speech.
Optionally, step S202 specifically comprises:
according to the voiced/unvoiced character of each subband, filtering and mixing the unvoiced and voiced components of each subband to obtain the wavelet spectrum;
applying an inverse Fourier transform to the wavelet spectrum to obtain wavelet coefficients, and reconstructing the wavelet excitation signal with the db10 wavelet basis.
(3) Beneficial effects
The technical scheme of the present invention has the following advantages:
1. Wavelet compression removes the background noise of the speech signal and at the same time removes redundant information, and the wavelet coefficient spectrum describes the original speech signal better as an excitation source. Because the excitation signal is produced by wavelet decomposition and compression, the residual signal can be described more accurately than in the prior art for the same number of quantization bits, improving decoded speech quality.
2. When extracting the integer pitch, spectral analysis is performed on the residual signal with a Fourier transform and an inverse FFT is applied to the spectral magnitude, and the position of the resulting autocorrelation peak of the residual signal is taken as the integer pitch. This is more accurate than the integer pitch extracted in the background art, and thus markedly improves the quality of the synthesized speech.
3. The pitch period, gain parameters, and per-subband voicing used to produce the wavelet excitation signal are all extracted from the residual signal, which improves the quality of the decoded speech.
Brief description of the drawings
Fig. 1 is the basic block diagram of an existing linear-prediction-based speech coding method.
Fig. 2 is the basic block diagram of the linear prediction speech coding method of the present invention.
Fig. 3 is the basic block diagram of the speech synthesis method of the present invention.
Detailed description
The specific embodiments of the present invention are described in further detail below with reference to the drawings and examples.
The invention provides a linear prediction speech coding method; as shown in Fig. 2, the speech coding method comprises the following steps:
S101: preprocessing the speech to remove the DC component and power-frequency interference;
S102: performing linear prediction analysis on the preprocessed speech to obtain linear prediction coefficients, converting the linear prediction coefficients to line spectral frequency pairs, and vector-quantizing the line spectral frequency pairs;
performing second-order inverse linear prediction on the preprocessed speech to obtain a residual signal;
S103: performing wavelet decomposition and compression on the residual signal to obtain wavelet coefficients, and vector-quantizing the wavelet coefficients;
calculating the pitch period of the residual signal and scalar-quantizing the pitch period;
calculating the gain parameters of the residual signal and scalar-quantizing the gain parameters;
dividing the residual signal into several subbands, making a voicing decision for each subband to obtain the voiced/unvoiced character of each subband, and scalar-quantizing it.
Preferably, in step S102, the linear prediction analysis specifically comprises:
windowing the preprocessed speech with a Hamming window, computing the autocorrelation of the windowed speech signal, computing 10th-order linear prediction coefficients with the Levinson-Durbin algorithm, and then multiplying the 10th-order linear prediction coefficients by 0.994^(i+1) (i = 1, 2, ..., 10) to obtain bandwidth-expanded linear prediction coefficients.
Preferably, in step S102, when vector-quantizing the line spectral frequency pairs, a 3-stage codebook is used, and the codebook search uses the weighted Euclidean distance criterion.
Preferably, in step S103, the wavelet decomposition and compression specifically comprises:
taking samples of the residual signal and performing a single-level wavelet decomposition with the db10 wavelet basis to obtain wavelet coefficients, and compressing the first 100 wavelet coefficients for analysis.
Preferably, in step S103, vector-quantizing the wavelet coefficients specifically comprises:
first converting the wavelet coefficients to a wavelet excitation amplitude spectrum, then vector-quantizing the wavelet excitation amplitude spectrum, using a full-search algorithm for the codebook search and a weighted Euclidean distance as the distortion measure.
Preferably, in step S103, calculating the pitch period of the residual signal specifically comprises:
performing spectral analysis on the residual signal with a Fourier transform and applying an inverse Fourier transform to the spectral magnitude, taking the resulting autocorrelation peak of the residual signal as the integer pitch;
searching within ±1 of the integer pitch, and obtaining the pitch period by interpolating and locally correlating the residual signal.
Preferably, in step S103, calculating the pitch period of the residual signal further comprises:
using the pitch period to search the residual signal for the pitch peak and its harmonic peaks, and taking the mean spacing between the peaks as the final pitch period.
Preferably, in step S103, making a voicing decision for each subband specifically comprises:
calculating the maximum normalized autocorrelation value of each subband signal near the pitch period;
calculating the maximum normalized autocorrelation value of each subband envelope signal near the pitch period;
making a voicing decision for each subband by threshold comparison, according to the maximum normalized autocorrelation value of each subband signal near the pitch period and the maximum normalized autocorrelation value of each subband envelope signal near the pitch period.
The technical scheme proposed by the present invention can be realized by a wavelet-excited linear prediction (WELP) speech coder/decoder, referred to below as the WELP vocoder. The parameters the WELP vocoder extracts mainly comprise the line spectral frequencies (LSF), the residual wavelet excitation factors (the wavelet coefficients), the wavelet coefficient period (the pitch period), the wavelet gains (the gain parameters), and the wavelet coefficient periodicity flags (the voiced/unvoiced character of each subband).
The concrete implementation of each step is described in detail below in the context of the WELP vocoder.
In step S101, the WELP vocoder passes the input speech through a high-pass filter to remove power-frequency interference. Preprocessing is performed by a high-pass filter with a cutoff frequency of 60 Hz, which removes the DC component and the 50 Hz power-frequency interference and applies high-frequency emphasis. The frequency response of the high-pass filter is given by Equation (1):

H(z) = (1 - z^{-1}) / (1 - 0.95 z^{-1})    (1)

where z is the z-transform variable. The frequency response of this high-pass filter provides about 50 dB of attenuation at 50 Hz.
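The preprocessing filter of Equation (1) can be sketched as a direct difference equation, y[n] = x[n] - x[n-1] + 0.95·y[n-1]. This is a minimal illustration of the filter only (function name and the test signal are ours, not the patent's):

```python
def highpass_preprocess(x):
    # H(z) = (1 - z^-1) / (1 - 0.95 z^-1), Equation (1), applied as
    # y[n] = x[n] - x[n-1] + 0.95 * y[n-1].
    y = []
    x_prev = 0.0
    y_prev = 0.0
    for sample in x:
        out = sample - x_prev + 0.95 * y_prev
        y.append(out)
        x_prev, y_prev = sample, out
    return y

# A constant (DC) input decays toward zero, as expected of a high-pass filter.
out = highpass_preprocess([1.0] * 200)
print(abs(out[-1]) < 1e-3)  # True: the DC component is removed
```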
In step S102, linear prediction analysis extracts the coefficients of the linear prediction filter (the LPC coefficients) from the chosen speech frame; a 10th-order filter is used, with transfer function given by Equation (2):

A(z) = 1 + Σ_{i=1}^{10} a_i z^{-i}    (2)

where {a_i} (i = 1, 2, ..., 10) are the LPC coefficients.
The WELP vocoder windows the preprocessed speech signal with a 200-point Hamming window; the Hamming window w(n) is given by Equation (3):

w(n) = 0.54 - 0.46 cos(2πn/199)    (3)

where the sample index is n = 0, 1, ..., 199. The windowed speech signal s_w(n) is used to compute the autocorrelation function, as in Equation (4):

r(k) = Σ_{n=k}^{199} s_w(n) s_w(n-k)    (4)

where s_w(n) is the windowed speech signal, r(k) is the autocorrelation function, and n and k are the sample and lag indices. The LPC coefficients are obtained by solving Equation (5):

Σ_{i=1}^{10} a_i r(|i-k|) = -r(k),  k = 1, 2, ..., 10    (5)

where the a_i are the LPC coefficients and r is the autocorrelation function. Solving this system of equations with the Levinson-Durbin algorithm yields a_i, i = 1, 2, ..., 10. The resulting 10th-order linear prediction coefficients a_i are multiplied by 0.994^(i+1) for bandwidth expansion, which helps improve the formant structure and makes the LSF parameters easier to quantize.
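The windowing, autocorrelation, and Levinson-Durbin steps above can be sketched as follows. The code solves Equation (5) as written (right-hand side -r(k)), which corresponds to an analysis filter A(z) = 1 + Σ a_i z^{-i}; the bandwidth-expansion exponent i+1 follows the text (common practice is γ^i), and the function names are ours:

```python
import math

def hamming_window(n_len=200):
    # w(n) = 0.54 - 0.46*cos(2*pi*n/199), Equation (3)
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_len - 1)) for n in range(n_len)]

def autocorr(s, max_lag=10):
    # r(k) = sum_{n=k}^{199} s_w(n)*s_w(n-k), Equation (4)
    return [sum(s[n] * s[n - k] for n in range(k, len(s))) for k in range(max_lag + 1)]

def levinson_durbin(r, order=10):
    # Solves sum_i a_i * r(|i-k|) = -r(k), Equation (5).
    a = []
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[m] * r[i - 1 - m] for m in range(len(a)))
        k = -acc / err
        a = [a[m] + k * a[i - 2 - m] for m in range(len(a))] + [k]
        err *= 1.0 - k * k
    return a

def bandwidth_expand(a, gamma=0.994):
    # a_i * 0.994^(i+1) per the text (a is 0-indexed, so a[m] is a_{m+1}).
    return [c * gamma ** (m + 2) for m, c in enumerate(a)]

# Sanity check: for autocorrelation r(k) = 0.9^k (a first-order AR process),
# Levinson-Durbin recovers a_1 = -0.9 and near-zero higher-order coefficients.
r = [0.9 ** k for k in range(11)]
lpc = levinson_durbin(r)
print(round(lpc[0], 6))  # -0.9
```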
The WELP vocoder converts the extracted LPC coefficients into line spectral frequencies (LSF); the LSF transform maps LPC roots scattered inside the unit circle onto the circle itself, which makes them well suited to quantization. The LSF parameters are defined as the roots of Equations (6) and (7):

F_1(z) = [A(z) + z^{-11} A(z^{-1})] / (1 + z^{-1})    (6)

F_2(z) = [A(z) - z^{-11} A(z^{-1})] / (1 - z^{-1})    (7)

where A(z) is the transfer function of the linear prediction filter of Equation (2).
Next, the WELP vocoder applies vector quantization to the 10-dimensional line spectral frequency parameters, using a 3-stage codebook with 7, 7, and 6 bits per stage, 20 bits in total. The codebook search uses the weighted Euclidean distance criterion of Equation (8):

d_lsp = Σ_{i=1}^{10} w(i) [l(i) - l̂(i)]²    (8)

where d_lsp is the weighted Euclidean distance, l is the LSF vector to be quantized, l̂ is the codebook vector, l(i) is the i-th LSF to be quantized, l̂(i) is the i-th LSF in the codebook, and w(i) is the weight of each dimension, given by Equation (9):

w(i) = p(f_i)^{0.3},         i = 1, ..., 8
w(i) = 0.64 · p(f_i)^{0.3},  i = 9
w(i) = 0.16 · p(f_i)^{0.3},  i = 10    (9)

where p(f_i) is the power spectral density of the linear prediction filter at the frequency corresponding to the i-th line spectral frequency parameter.
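The weighted-distance codebook search of Equation (8) can be sketched as an exhaustive search over a toy codebook. The tiny 3-dimensional codebook below is purely illustrative (the patent uses a 3-stage, 20-bit codebook over 10-dimensional LSF vectors):

```python
def weighted_dist(l, cand, w):
    # d_lsp = sum_i w(i) * (l(i) - l_hat(i))^2, Equation (8)
    return sum(wi * (x - y) ** 2 for wi, x, y in zip(w, l, cand))

def vq_full_search(l, codebook, w):
    # Exhaustive search for the codebook index minimizing the weighted distance.
    return min(range(len(codebook)), key=lambda j: weighted_dist(l, codebook[j], w))

codebook = [[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]]
weights = [1.0, 1.0, 1.0]
best = vq_full_search([0.19, 0.41, 0.58], codebook, weights)
print(best)  # 1: the second entry is closer
```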
At the same time, the WELP vocoder filters the speech signal with a low-pass filter with a 1000 Hz cutoff frequency to remove the influence of high-frequency components on pitch estimation, then applies second-order inverse linear prediction to this signal to remove the influence of the formants on the pitch harmonics, obtaining the residual signal.
In step S103, the residual signal is first wavelet-decomposed and compressed. The residual signal is the signal left after the spectral envelope has been removed; for voiced signals its spectrum is periodic. The 160 residual samples of the current frame together with 40 residual samples of the previous frame undergo a single-level wavelet decomposition with the db10 wavelet basis, yielding 218 wavelet coefficients, of which the first 100 are taken for compression analysis.
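A single-level wavelet decomposition of the 200 residual samples can be sketched as below. The patent uses the db10 basis (with a library such as PyWavelets, `pywt.dwt(x, 'db10')`, whose boundary extension is consistent with the 218 coefficients mentioned above); a Haar basis is substituted here so the sketch stays dependency-free, and it produces 200 coefficients rather than 218:

```python
import math

def haar_dwt_single_level(x):
    # One level of wavelet decomposition: approximation (low-pass) and
    # detail (high-pass) coefficients, concatenated.
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (x[2 * i] + x[2 * i + 1]) for i in range(len(x) // 2)]
    detail = [s * (x[2 * i] - x[2 * i + 1]) for i in range(len(x) // 2)]
    return approx + detail

# 160 current-frame samples plus 40 from the previous frame, as in the text.
frame = [math.sin(2.0 * math.pi * n / 20.0) for n in range(200)]
coeffs = haar_dwt_single_level(frame)
compressed = coeffs[:100]  # keep the first 100 coefficients for analysis
```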
Next, the wavelet coefficients are transformed to the frequency domain and the fundamental frequency is multiplied by 2; the 10 spectral peaks of the wavelet coefficient power spectrum near the doubled fundamental frequency and its harmonics are computed, and when there are fewer than 10 peaks, the remaining entries are filled with 1.
The WELP vocoder applies vector quantization to the 10-dimensional wavelet excitation amplitude spectrum; the codebook search uses a full-search algorithm, and the distortion measure is the weighted Euclidean distance of Equation (10):

d = Σ_{i=1}^{10} w(i) [A(i) - Â(i)]²    (10)

where d is the distortion measure, A(i) is the wavelet excitation amplitude spectrum to be quantized, Â(i) is the codebook vector, and w(i) is the weighting coefficient.
To calculate the pitch period, spectral analysis is applied to the formant-removed signal with a Fourier transform, and an inverse FFT is applied to the spectral magnitude to obtain the autocorrelation of this signal, as in Equation (11):

r(τ) = ∫ [ | ∫ x(t) e^{-jωt} dt | ]² e^{jωτ} dω    (11)

where r(τ) is the autocorrelation function, x(t) is the formant-removed signal, ω is the angular frequency, and t is time. The offset between the peak of the correlation curve r(τ) and the midpoint of the time axis is the pitch period.
The pitch period obtained by this method is an integer, but the true pitch period of speech need not be; to obtain an accurate pitch value, a search is made within ±1 of this integer, and the accurate pitch period is found by interpolating and locally correlating the formant-removed signal.
Because the pitch period strongly affects vocoder speech quality, to estimate it still more accurately, the pitch period obtained above is used to search the spectrum of the formant-removed signal for the pitch peak and its harmonic peaks, and the mean spacing between the peaks is taken as the final pitch period.
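The integer-pitch step of Equation (11) — autocorrelation as the inverse transform of the power spectrum (the Wiener-Khinchin relation) — can be sketched as below. A naive O(n²) DFT stands in for the FFT so the sketch stays dependency-free; the narrowed search range in the demo is ours:

```python
import cmath
import math

def autocorr_via_power_spectrum(x):
    # Forward transform, squared magnitude, inverse transform: Equation (11).
    n = len(x)
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    power = [abs(s) ** 2 for s in spec]
    return [sum(power[k] * cmath.exp(2j * math.pi * k * tau / n)
                for k in range(n)).real / n
            for tau in range(n)]

def integer_pitch(x, lo=18, hi=145):
    # The integer pitch is the lag of the autocorrelation peak within the
    # allowed pitch range (18..145 samples per the text).
    r = autocorr_via_power_spectrum(x)
    hi = min(hi, len(x) - 1)
    return max(range(lo, hi + 1), key=lambda tau: r[tau])

# A tone with a 20-sample period yields an integer pitch of 20
# (search range narrowed to exclude the lag-40/60 aliases of this short frame).
samples = [math.sin(2.0 * math.pi * n / 20.0) for n in range(80)]
print(integer_pitch(samples, lo=18, hi=30))  # 20
```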
The pitch period ranges over 18 to 145 samples and is scalar-quantized with 7 bits, mapping 18 to 145 onto the 7-bit indices 0 to 127.
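The 7-bit mapping is a one-to-one offset, since 145 - 18 = 127. A minimal sketch (rounding of fractional pitch values before indexing is our assumption):

```python
def quantize_pitch(pitch):
    # 7-bit scalar quantization: periods 18..145 map one-to-one onto 0..127.
    p = min(max(int(round(pitch)), 18), 145)
    return p - 18

def dequantize_pitch(index):
    return index + 18

print(quantize_pitch(18), quantize_pitch(145), dequantize_pitch(quantize_pitch(73.4)))
# 0 127 73
```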
The gain parameter (Gain) expresses the level of the speech signal, as in Equation (12):

G = 10 log_{10} [ 0.01 + (1/L) Σ_{n=0}^{L-1} r(n)² ]    (12)

where L is the pitch period and r(n) are the wavelet coefficients; gain parameters G1 and G2 are extracted at the head and tail of the frame, respectively.
The WELP vocoder quantizes the gain parameters in the log domain. The gain parameters are first transformed to the log domain and limited to the range 10 to 77 dB; G1 is quantized with 3 bits and G2 with 5 bits, with quantization indices given by Equation (13) (shown only as figures in the original), where g' denotes the log-domain gain.
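The gain extraction and log-domain quantization can be sketched as below. Since the exact index formulas of Equation (13) are not legible in the source, a plain uniform quantizer over the stated 10-77 dB range is assumed:

```python
import math

def frame_gain(coeffs, L):
    # G = 10*log10(0.01 + (1/L) * sum_{n=0}^{L-1} r(n)^2), Equation (12)
    energy = sum(c * c for c in coeffs[:L]) / L
    return 10.0 * math.log10(0.01 + energy)

def quantize_gain(g_db, bits, lo=10.0, hi=77.0):
    # Uniform scalar quantization over 10..77 dB (assumed form of Eq. (13));
    # G1 uses bits=3, G2 uses bits=5.
    levels = (1 << bits) - 1
    g = min(max(g_db, lo), hi)
    return int(round((g - lo) / (hi - lo) * levels))

# Silence hits the 0.01 floor: G = 10*log10(0.01) = -20 dB, clamped to index 0.
print(round(frame_gain([0.0] * 40, 40), 3), quantize_gain(77.0, 5), quantize_gain(40.0, 3))
# -20.0 31 3
```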
In addition, a voicing decision must be made for each subband, and the resulting voiced/unvoiced character of each subband scalar-quantized. The WELP vocoder divides the 0-4000 Hz speech signal into several subbands for voiced/unvoiced analysis. The following description takes 5 subbands as an example, dividing the speech signal into 0-500 Hz, 500-1000 Hz, 1000-2000 Hz, 2000-3000 Hz, and 3000-4000 Hz, 5 subbands in total. The preprocessed speech signal is filtered with 6th-order Butterworth IIR bandpass filters to obtain the individual subband signals. Then, for each bandpass-filtered signal, the maximum normalized autocorrelation value near the pitch period is computed, as in Equation (14):
r_i(p) = Σ_{n=0}^{L-1} s_i(n) s_i(n-p) / [ Σ_{n=0}^{L-1} s_i(n)² · Σ_{n=0}^{L-1} s_i(n-p)² ]^{1/2},  i = 1, ..., 5    (14)

where s_i(n) is the bandpass-filtered signal and p is a lag near the pitch period. Among these normalized autocorrelation values, the maximum, r_1(i), i = 1, ..., 5, is found for each subband.
Then the normalized autocorrelation of the envelope signal of each subband is computed. The envelope signal is obtained by applying a second-order low-pass filter to the subband signal. For the envelope signal, the maximum normalized autocorrelation value near the pitch period, r_2(i), i = 1, ..., 5, is computed.
Using the threshold comparison method, r_1(i) and r_2(i), i = 1, ..., 5, are combined to judge the voicing of each subband.
In the WELP vocoder, the voicing of the first subband is represented by the pitch period: 0 denotes unvoiced and a nonzero value denotes voiced. The other 4 subbands are represented with 4 bits, where 1 denotes voiced and 0 unvoiced.
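The per-subband voicing decision of Equation (14) can be sketched as follows. The threshold value and the ±1 lag search are assumptions, and the envelope-correlation branch that the patent combines with this one is omitted for brevity:

```python
import math

def normalized_autocorr(frame, history, p):
    # r_i(p), Equation (14); `history` supplies the p samples preceding `frame`.
    s = history + frame
    off = len(history)
    L = len(frame)
    num = sum(s[off + n] * s[off + n - p] for n in range(L))
    den = math.sqrt(sum(s[off + n] ** 2 for n in range(L)) *
                    sum(s[off + n - p] ** 2 for n in range(L)))
    return num / den if den else 0.0

def subband_voicing(frame, history, pitch, threshold=0.5):
    # Maximum normalized autocorrelation over lags near the pitch period,
    # compared against a threshold (threshold value assumed).
    best = max(normalized_autocorr(frame, history, p)
               for p in range(max(1, pitch - 1), pitch + 2))
    return 1 if best >= threshold else 0  # 1 = voiced, 0 = unvoiced

# A clean tone with a 20-sample period is judged voiced at pitch 20.
hist = [math.sin(2.0 * math.pi * n / 20.0) for n in range(21)]
frm = [math.sin(2.0 * math.pi * n / 20.0) for n in range(21, 61)]
print(subband_voicing(frm, hist, 20))  # 1
```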
Based on the above linear prediction speech coding method, the present invention also proposes a speech synthesis method; as shown in Fig. 3, the speech synthesis method comprises the following steps:
S201: decoding the quantized line spectral frequency pairs, wavelet coefficients, pitch period, gain parameters, and per-subband voicing to obtain the line spectral frequency pairs, wavelet excitation amplitude spectrum, pitch period, gain parameters, and voiced/unvoiced character of each subband;
S202: synthesizing the wavelet excitation signal from the wavelet excitation amplitude spectrum, the pitch period, and the voiced/unvoiced character of each subband;
S203: passing the wavelet excitation signal through the linear prediction synthesis filter derived from the line spectral frequency pairs to obtain synthesized speech;
S204: applying spectral enhancement and phase adjustment to the synthesized speech.
Preferably, step S202 specifically comprises:
according to the voiced/unvoiced character of each subband, filtering and mixing the unvoiced and voiced components of each subband to obtain the wavelet spectrum;
applying an inverse Fourier transform to the wavelet spectrum to obtain wavelet coefficients, and reconstructing the wavelet excitation signal with the db10 wavelet basis.
The speech synthesis process is described below taking 5 subbands as an example. The wavelet excitation signal is synthesized from the voiced/unvoiced character of the 5 subbands, the pitch period, and the wavelet spectrum: according to the voicing decision results for the 5 subbands 0-500 Hz, 500-1000 Hz, 1000-2000 Hz, 2000-3000 Hz, and 3000-4000 Hz, the voiced and unvoiced components of each frequency band are filtered and mixed in proportion, where voiced subbands use the wavelet coefficient spectral amplitudes (restored by inverse Fourier transform) and unvoiced subbands are matched with white noise.
According to the encoder's voicing decisions for the 5 subbands, the voiced and unvoiced components are filtered and mixed proportionally in each band, to obtain the wavelet spectrum $c_f(k)$:

$$c_f(k) = V_k A_k + (1 - V_k)\,e_k, \quad 0 \le k \le 9 \qquad (15)$$

where $A_k$ is the wavelet-coefficient spectral amplitude, $V_k$ is the U/V flag of the subband containing the $k$-th amplitude (0 for unvoiced, 1 for voiced), and $e_k$ is random noise.
The wavelet coefficients can be recovered by applying an inverse Fourier transform to $c_f(k)$, and the excitation signal is then reconstructed with the db10 wavelet basis.
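As a concrete illustration, the per-coefficient mixing of equation (15) can be sketched in a few lines of NumPy. The flags, amplitudes and noise below are illustrative stand-ins for decoded values, and the final db10 wavelet reconstruction (available, for example, through the PyWavelets package) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.abs(rng.normal(size=10))                # wavelet-coefficient spectral amplitudes A_k (stand-in)
V = np.array([1, 1, 1, 0, 0, 1, 1, 0, 0, 0])   # U/V flag of the subband holding coefficient k (stand-in)
e = rng.normal(size=10)                        # random noise e_k used at unvoiced positions

# Equation (15): voiced positions keep the coded amplitude, unvoiced ones take noise.
c_f = V * A + (1 - V) * e

# The inverse Fourier transform recovers the wavelet coefficients; the real part
# is kept because the excitation to be reconstructed is a real-valued signal.
coeffs = np.fft.ifft(c_f).real
```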
The synthesized speech is obtained by passing the excitation signal through the LP synthesis filter $H(z)$:

$$H(z) = \frac{1}{1 + \sum_{i=1}^{10} a_i z^{-i}} \qquad (16)$$

where $a_i$, $i = 1, 2, \ldots, 10$, are the linear prediction coefficients.
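Equation (16) is a standard all-pole filter, which in the time domain reads $s[n] = e[n] - \sum_i a_i\, s[n-i]$. A minimal NumPy sketch follows, using a single-pole example rather than the fixed 10th order of the text:

```python
import numpy as np

def lp_synthesis(excitation, a):
    """All-pole filtering per H(z) = 1 / (1 + sum_i a_i z^-i), equation (16)."""
    p = len(a)
    s = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for i in range(1, min(n, p) + 1):   # subtract feedback from past outputs
            acc -= a[i - 1] * s[n - i]
        s[n] = acc
    return s

# A single-pole example: an impulse through 1 / (1 - 0.5 z^-1) decays by halves.
out = lp_synthesis(np.array([1.0, 0.0, 0.0, 0.0]), np.array([-0.5]))
# out == [1.0, 0.5, 0.25, 0.125]
```

The same operation is available as `scipy.signal.lfilter([1.0], np.concatenate(([1.0], a)), excitation)`.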
To make the synthesized speech more natural, spectral enhancement and phase adjustment are applied to it. The spectral enhancement is constructed from the linear filter $A(z)$:

$$H_{pf}(z) = \frac{A(z/\alpha)}{A(z/\beta)}\,(1 + \mu z^{-1}) \qquad (17)$$

$$A(z) = \frac{1}{H(z)} \qquad (18)$$

where $\alpha$ and $\beta$ are the short-term postfilter adjustment factors and $\mu$ is the tilt (slope) compensation factor. In this embodiment $\alpha = 0.5$ and $\beta = 0.8$; $G_{int}$ is the interpolated current gain, $G_n$ is the background-noise gain, and $\mu$ is the interpolated tilt factor $\max(0.5 k_1, 0)$.
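A NumPy sketch of the postfilter in equations (17) and (18): scaling the prediction-error taps by powers of α and β yields A(z/α) and A(z/β), followed by the first-order tilt term. The fixed μ used here is illustrative only, since the text interpolates it per frame.

```python
import numpy as np

def postfilter(s, a, alpha=0.5, beta=0.8, mu=0.2):
    """Spectral enhancement H_pf(z) = A(z/alpha) / A(z/beta) * (1 + mu z^-1),
    with A(z) = 1 + sum_i a_i z^-i (equations (17) and (18))."""
    p = len(a)
    num = a * alpha ** np.arange(1, p + 1)   # taps of A(z/alpha) after the leading 1
    den = a * beta ** np.arange(1, p + 1)    # taps of A(z/beta) after the leading 1
    x = np.convolve(s, np.concatenate(([1.0], num)))[: len(s)]  # FIR part A(z/alpha)
    y = np.zeros(len(s))
    for n in range(len(s)):                  # IIR part 1 / A(z/beta)
        acc = x[n]
        for i in range(1, min(n, p) + 1):
            acc -= den[i - 1] * y[n - i]
        y[n] = acc
    return y + mu * np.concatenate(([0.0], y[:-1]))  # tilt term (1 + mu z^-1)
```

With all-zero prediction coefficients and μ = 0 the filter reduces to a pass-through, which gives a quick sanity check.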
An application example of the present invention is given below.
For the WELP vocoder, the input speech is a linear PCM signal sampled at 8 kHz, with an analysis frame length of 20 ms: the current frame holds 160 samples, and the preceding 240 samples are buffered. Parameters are extracted from these 400 samples: 10 linear prediction coefficients converted into 10 line spectrum pairs, 10 wavelet coefficients, 1 pitch period, 2 gain parameters, and the U/V decisions of 4 subbands (the U/V characteristic of the 1st subband is conveyed by the pitch period: a pitch period of 0 denotes unvoiced, otherwise voiced). The bit allocation of these parameters is shown in Table 1:
Table 1: Bit allocation of the parameters

Parameter              Voiced  Unvoiced
Line spectrum pair       20       20
Pitch period              8        8
Wavelet coefficients      8        0
Gain                      8        8
Sub-band U/V flags        4        0
FEC                       0       13
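Taking Table 1 at face value, the per-frame totals and the bit rate implied by the 20 ms frame length can be checked in a few lines (the ~2.4 kb/s figure is an inference from the table, not a rate quoted by the text):

```python
# (voiced bits, unvoiced bits) per parameter, transcribed from Table 1
bits = {
    "line spectrum pair":   (20, 20),
    "pitch period":         (8, 8),
    "wavelet coefficients": (8, 0),
    "gain":                 (8, 8),
    "sub-band U/V flags":   (4, 0),
    "FEC":                  (0, 13),
}

voiced_bits = sum(v for v, _ in bits.values())      # total bits in a voiced frame
unvoiced_bits = sum(u for _, u in bits.values())    # total bits in an unvoiced frame

frames_per_second = 1000 // 20                      # 20 ms analysis frames
voiced_rate = voiced_bits * frames_per_second       # bit/s for all-voiced speech
```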
To measure the effect of the present invention, the speech coding is evaluated with the objective MOS evaluation method provided by ITU-T Recommendation P.862. The MOS score of the evaluation software PESQ ranges from 0 to 4.5; the higher the MOS value, the better the speech quality. The test results are shown in Table 2:

Table 2: Test results

              English  Chinese
Male voice      3.3      3.18
Female voice    3.4      3.21
The above are only preferred embodiments of the present invention. It should be pointed out that those skilled in the art can make improvements and substitutions without departing from the technical principles of the present invention, and such improvements and substitutions shall also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A linear-prediction speech coding method, characterized in that the speech coding method comprises the following steps:
S101, preprocessing the speech to remove the DC component and power-frequency interference;
S102, performing second-order backward linear prediction on the preprocessed speech, to obtain a residual signal;
S103, performing wavelet decomposition and compression on the residual signal to obtain wavelet coefficients, and vector-quantizing the wavelet coefficients,
calculating the pitch period of the residual signal, and scalar-quantizing the pitch period,
calculating the gain parameters of the residual signal, and scalar-quantizing the gain parameters,
dividing the residual signal into several subbands, making a voicing decision for each subband, and scalar-quantizing the resulting U/V characteristic of each subband.
2. The speech coding method according to claim 1, characterized in that step S102 further comprises:
performing linear prediction analysis on the preprocessed speech to obtain linear prediction coefficients, then converting the linear prediction coefficients into line spectral frequency pairs, and vector-quantizing the line spectral frequency pairs.
3. The speech coding method according to claim 2, characterized in that in step S102 the linear prediction analysis specifically comprises:
windowing the preprocessed speech with a Hamming window, performing an autocorrelation calculation on the windowed speech signal, computing 10th-order linear prediction coefficients with the Levinson-Durbin algorithm, then multiplying the 10th-order linear prediction coefficients by 0.994^(i+1) (i = 1, 2, ..., 10) to obtain bandwidth-expanded linear prediction coefficients.
4. The speech coding method according to claim 1, characterized in that in step S103 the wavelet decomposition and compression specifically comprises:
taking samples of the residual signal and performing a first-order wavelet decomposition with the db10 wavelet basis to obtain wavelet coefficients, and compressing and analyzing the first 100 wavelet coefficients.
5. The speech coding method according to claim 1 or 4, characterized in that in step S103 vector-quantizing the wavelet coefficients specifically comprises:
first converting the wavelet coefficients into a wavelet excitation amplitude spectrum, then vector-quantizing the wavelet excitation amplitude spectrum, where the codebook search during quantization uses a full-search algorithm and the distortion measure uses a weighted Euclidean distance.
6. The speech coding method according to claim 1, characterized in that in step S103 calculating the pitch period of the residual signal specifically comprises:
performing spectral analysis on the residual signal with a Fourier transform and applying an inverse Fourier transform to the spectral magnitude, taking the autocorrelation peak of the resulting residual signal as the integer pitch;
searching within ±1 of the integer pitch, and obtaining the pitch period by interpolating and locally correlating the residual signal.
7. The speech coding method according to claim 6, characterized in that in step S103 calculating the pitch period of the residual signal further comprises:
using the pitch period to search the residual signal for the pitch peak and its resonance peaks, and taking the mean of the differences between the peaks as the final pitch period.
8. The speech coding method according to claim 1, characterized in that in step S103 making a voicing decision for each subband specifically comprises:
calculating the maximum normalized autocorrelation value of each subband signal near the pitch period;
calculating the maximum normalized autocorrelation value of each subband envelope signal near the pitch period;
making the voicing decision for each subband by threshold comparison, based on the maximum normalized autocorrelation value of each subband signal near the pitch period and the maximum normalized autocorrelation value of each subband envelope signal near the pitch period.
9. A speech synthesis method based on the linear-prediction speech coding method of claim 2 or 3, characterized in that the speech synthesis method comprises the following steps:
S201, decoding the quantized line spectral frequency pairs, wavelet coefficients, pitch period, gain parameters and the U/V characteristic of each subband, to obtain the line spectral frequency pairs, the wavelet excitation amplitude spectrum, the pitch period, the gain parameters and the U/V characteristic of each subband;
S202, synthesizing the wavelet excitation signal from the wavelet excitation amplitude spectrum, the pitch period and the U/V characteristic of each subband;
S203, passing the wavelet excitation signal through the inverse linear-prediction filter defined by the line spectral frequency pairs, to obtain the synthesized speech;
S204, applying spectral enhancement and phase adjustment to the synthesized speech.
10. The speech synthesis method according to claim 9, characterized in that step S202 specifically comprises:
according to the U/V characteristic of each subband, filtering and mixing the unvoiced and voiced components of each subband, to obtain the wavelet spectrum;
applying an inverse Fourier transform to the wavelet spectrum to obtain the wavelet coefficients, and reconstructing the wavelet excitation signal with the db10 wavelet basis.
CN 201210592909 2012-12-31 2012-12-31 Linear prediction speech coding method and speech synthesis method Pending CN103050121A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201210592909 CN103050121A (en) 2012-12-31 2012-12-31 Linear prediction speech coding method and speech synthesis method


Publications (1)

Publication Number Publication Date
CN103050121A true CN103050121A (en) 2013-04-17

Family

ID=48062736

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201210592909 Pending CN103050121A (en) 2012-12-31 2012-12-31 Linear prediction speech coding method and speech synthesis method

Country Status (1)

Country Link
CN (1) CN103050121A (en)


Cited By (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104517614A (en) * 2013-09-30 2015-04-15 上海爱聊信息科技有限公司 Voiced/unvoiced decision device and method based on sub-band characteristic parameter values
CN103714820A (en) * 2013-12-27 2014-04-09 广州华多网络科技有限公司 Packet loss hiding method and device of parameter domain
CN103714820B (en) * 2013-12-27 2017-01-11 广州华多网络科技有限公司 Packet loss hiding method and device of parameter domain
CN106415718A (en) * 2014-01-24 2017-02-15 日本电信电话株式会社 Linear-predictive analysis device, method, program, and recording medium
CN106415718B (en) * 2014-01-24 2019-10-25 日本电信电话株式会社 Linear prediction analysis device, method and recording medium
US11450329B2 (en) 2014-03-28 2022-09-20 Samsung Electronics Co., Ltd. Method and device for quantization of linear prediction coefficient and method and device for inverse quantization
US11222644B2 (en) 2014-04-25 2022-01-11 Ntt Docomo, Inc. Linear prediction coefficient conversion device and linear prediction coefficient conversion method
US10714108B2 (en) 2014-04-25 2020-07-14 Ntt Docomo, Inc. Linear prediction coefficient conversion device and linear prediction coefficient conversion method
US10714107B2 (en) 2014-04-25 2020-07-14 Ntt Docomo, Inc. Linear prediction coefficient conversion device and linear prediction coefficient conversion method
CN107077857A (en) * 2014-05-07 2017-08-18 三星电子株式会社 The method and apparatus and the method and apparatus of de-quantization quantified to linear predictor coefficient
US11922960B2 (en) 2014-05-07 2024-03-05 Samsung Electronics Co., Ltd. Method and device for quantizing linear predictive coefficient, and method and device for dequantizing same
US11238878B2 (en) 2014-05-07 2022-02-01 Samsung Electronics Co., Ltd. Method and device for quantizing linear predictive coefficient, and method and device for dequantizing same
CN107077857B (en) * 2014-05-07 2021-03-09 三星电子株式会社 Method and apparatus for quantizing linear prediction coefficients and method and apparatus for dequantizing linear prediction coefficients
CN106575511A (en) * 2014-07-29 2017-04-19 瑞典爱立信有限公司 Estimation of background noise in audio signals
CN104990553A (en) * 2014-12-23 2015-10-21 上海安悦四维信息技术有限公司 Hand-held vehicle terminal C-Pad intelligent navigation system and working method thereof
CN110415713A (en) * 2018-04-28 2019-11-05 北京展讯高科通信技术有限公司 The coding method of DMR system and device, storage medium, digital walkie-talkie
CN110415713B (en) * 2018-04-28 2021-11-09 北京紫光展锐通信技术有限公司 Encoding method and device of DMR system, storage medium and digital interphone
WO2020001568A1 (en) * 2018-06-29 2020-01-02 华为技术有限公司 Method and apparatus for determining weighting coefficient during stereo signal coding process
US11551701B2 (en) 2018-06-29 2023-01-10 Huawei Technologies Co., Ltd. Method and apparatus for determining weighting factor during stereo signal encoding
EP3800638A4 (en) * 2018-06-29 2021-08-18 Huawei Technologies Co., Ltd. Method and apparatus for determining weighting coefficient during stereo signal coding process
US11922958B2 (en) 2018-06-29 2024-03-05 Huawei Technologies Co., Ltd. Method and apparatus for determining weighting factor during stereo signal encoding
CN109003621A (en) * 2018-09-06 2018-12-14 广州酷狗计算机科技有限公司 A kind of audio-frequency processing method, device and storage medium
CN109003621B (en) * 2018-09-06 2021-06-04 广州酷狗计算机科技有限公司 Audio processing method and device and storage medium
CN109243479A (en) * 2018-09-20 2019-01-18 广州酷狗计算机科技有限公司 Acoustic signal processing method, device, electronic equipment and storage medium
CN109256143A (en) * 2018-09-21 2019-01-22 西安蜂语信息科技有限公司 Speech parameter quantization method, device, computer equipment and storage medium
CN113196388A (en) * 2018-12-17 2021-07-30 微软技术许可有限责任公司 Phase quantization in a speech encoder
CN110380826B (en) * 2019-08-21 2021-09-28 苏州大学 Self-adaptive mixed compression method for mobile communication signal
CN110380826A (en) * 2019-08-21 2019-10-25 苏州大学 The compression of mobile communication signal ADAPTIVE MIXED and decompressing method
TWI723545B (en) * 2019-09-17 2021-04-01 宏碁股份有限公司 Speech processing method and device thereof
US11587573B2 (en) 2019-09-17 2023-02-21 Acer Incorporated Speech processing method and device thereof
CN112562699B (en) * 2019-09-26 2023-08-15 宏碁股份有限公司 Voice processing method and device thereof
CN112562699A (en) * 2019-09-26 2021-03-26 宏碁股份有限公司 Voice processing method and device
CN110730015A (en) * 2019-11-20 2020-01-24 深圳市星网荣耀科技有限公司 Multilink portable communication device and voice coding compression and decoding method thereof
CN113409756B (en) * 2020-03-16 2022-05-03 阿里巴巴集团控股有限公司 Speech synthesis method, system, device and storage medium
CN113409756A (en) * 2020-03-16 2021-09-17 阿里巴巴集团控股有限公司 Speech synthesis method, system, device and storage medium
CN112270934A (en) * 2020-09-29 2021-01-26 天津联声软件开发有限公司 Voice data processing method of NVOC low-speed narrow-band vocoder
CN112270934B (en) * 2020-09-29 2023-03-28 天津联声软件开发有限公司 Voice data processing method of NVOC low-speed narrow-band vocoder
CN112233686A (en) * 2020-09-29 2021-01-15 天津联声软件开发有限公司 Voice data processing method of NVOCPLUS high-speed broadband vocoder

Similar Documents

Publication Publication Date Title
CN103050121A (en) Linear prediction speech coding method and speech synthesis method
KR101147878B1 (en) Coding and decoding methods and devices
EP3039676B1 (en) Adaptive bandwidth extension and apparatus for the same
EP3138096B1 (en) High band excitation signal generation
KR101747917B1 (en) Apparatus and method for determining weighting function having low complexity for lpc coefficients quantization
CN102341850B (en) Speech coding
CN103325375B (en) One extremely low code check encoding and decoding speech equipment and decoding method
JP4270866B2 (en) High performance low bit rate coding method and apparatus for non-speech speech
JP6980871B2 (en) Signal coding method and its device, and signal decoding method and its device
US6138092A (en) CELP speech synthesizer with epoch-adaptive harmonic generator for pitch harmonics below voicing cutoff frequency
CN106463134B (en) method and apparatus for quantizing linear prediction coefficients and method and apparatus for inverse quantization
CN108231083A (en) A kind of speech coder code efficiency based on SILK improves method
EP1597721B1 (en) 600 bps mixed excitation linear prediction transcoding
KR102052144B1 (en) Method and device for quantizing voice signals in a band-selective manner
WO2004090864A2 (en) Method and apparatus for the encoding and decoding of speech
JP4578145B2 (en) Speech coding apparatus, speech decoding apparatus, and methods thereof
Li et al. A new distortion measure for parameter quantization based on MELP
Zou et al. High quality 0.6/1.2/2.4 kbps multi-band lpc speech coding algorithm
Laurent et al. A robust 2400 bps subband LPC vocoder
Bao et al. High quality harmonic excitation linear predictive speech coding at 2 kb/s
Saleem et al. Implementation of Low Complexity CELP Coder and Performance Evaluation in terms of Speech Quality
KR0156983B1 (en) Voice coder
Bao Harmonic excitation LPC (HE-LPC) speech coding at 2.3 kb/s
KR100757366B1 (en) Device for coding/decoding voice using zinc function and method for extracting prototype of the same
Zou et al. A 300bps speech coding algorithm based on multi-mode matrix quantization

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130417