CN103050121A - Linear prediction speech coding method and speech synthesis method - Google Patents

Linear prediction speech coding method and speech synthesis method

Info

Publication number
CN103050121A
CN103050121A (application CN201210592909A)
Authority
CN
China
Prior art keywords
carried out
pitch period
subband
wavelet
residual signals
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 201210592909
Other languages
Chinese (zh)
Inventor
洪小斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING XUNGUANGDA COMMUNICATION TECHNOLOGY CO LTD
Original Assignee
BEIJING XUNGUANGDA COMMUNICATION TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING XUNGUANGDA COMMUNICATION TECHNOLOGY CO LTD filed Critical BEIJING XUNGUANGDA COMMUNICATION TECHNOLOGY CO LTD
Priority to CN201210592909
Publication of CN103050121A
Legal status: Pending

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a linear prediction speech coding method and a speech synthesis method. The linear prediction speech coding method includes the following steps: the speech is preprocessed; second-order inverse linear prediction is performed on the preprocessed speech to obtain a residual signal; wavelet decomposition and compression are performed on the residual signal to obtain wavelet coefficients, which are vector-quantized; meanwhile, the pitch period and gain parameters of the residual signal and the voiced/unvoiced character of each subband are calculated and each scalar-quantized. The speech synthesis method is based on the linear prediction speech coding method. The technical scheme of the invention reduces the effect of noise on decoded speech quality, suppresses the deterioration of speech quality when the voicing decision is wrong, and improves the coding performance for unvoiced speech and background noise.

Description

Linear prediction speech coding method and speech synthesis method
Technical field
The present invention relates to speech coding technology, and in particular to a linear prediction speech coding method and a speech synthesis method.
Background technology
With the rapid development of the information society and communication technology, frequency resources are increasingly valuable. In digital mobile communication and speech storage, using a speech coder to compress the transmission bandwidth of the speech signal or to reduce the transmission bit rate of the telephone channel, so as to make efficient use of communication bandwidth or storage space, has long been a goal. As the number of network users grows and networks become more integrated and diversified, the tension between network bandwidth on one side and system capacity and quality of service on the other means that traditional speech compression coding and decoding techniques can no longer satisfy the requirements of increasingly crowded transmission channels. How to reduce the transmission bit rate as far as possible without sacrificing call quality is therefore an important research topic. Over the past decade, research on medium-bit-rate (4.8 kbps to 16 kbps) speech coding algorithms has made significant progress and found widespread application, while low-bit-rate algorithms, particularly those at 2.4 kbps and below, have gradually become a research focus. With the rapid increase in the processing speed of the chips that run the coding algorithms, algorithms based on mixed linear prediction coding techniques have gradually become the mainstream of low-bit-rate speech coding.
Linear prediction coding (LPC) is based on the assumption that the (voiced) speech signal is produced by a buzzer at the end of a tube, occasionally accompanied by hissing and popping sounds (sibilants and plosives): the glottis between the vocal cords produces sound of varying intensity (volume) and frequency (pitch), while the throat and mouth form the resonant vocal tract; hisses and pops are generated by the action of the tongue, lips, and throat. Linear prediction coding analyzes the speech signal by estimating the formants, removing their effect from the signal, and estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the signal remaining after this process is called the residual signal. The parameters describing the formants (the linear prediction coefficients) and the residual signal can be stored or sent to a receiver. The receiver synthesizes the speech signal by the reverse process: the residual signal serves as the excitation source, the linear prediction coefficients define the vocal-tract filter, and passing the source signal through the filter yields the speech signal.
According to how the excitation signal is described, linear prediction speech coding methods are mainly divided into LPC-10, code-excited linear prediction (CELP), mixed-excitation linear prediction (MELP), sinusoidal-excitation linear prediction (SELP), multi-band excitation (MBE), and so on. These speech coders divide speech into frames of a certain length (about 20 ms to 50 ms), perform linear prediction on each frame, and encode the linear prediction vector and the per-frame prediction residual (the excitation signal) with known codebooks.
Fig. 1 is the basic block diagram of existing linear-prediction-based speech coding methods. Apart from differences in how the residual parameters are extracted, these methods extract the other parameters in essentially the same way. In Fig. 1, the excitation signal is represented by the residual parameters, the pitch period of the original speech, the gain of the original speech, and the voiced/unvoiced character of each subband of the original speech; the residual parameters describe the harmonic component of voiced sound in the residual, while unvoiced sound is replaced with noise.
The speech quality of existing linear-prediction-based vocoders depends strongly on the noise level of the original speech. When the signal-to-noise ratio of the original speech is poor, voicing decision errors and pitch extraction errors cause severe pitch distortion and reduce the naturalness of the synthesized speech. In these techniques, the pitch period, gain, and subband voicing used to generate the excitation signal are all extracted from the original speech, so when the receiving end reconstructs the excitation signal, some parameters derive from the original speech and some from the residual signal, which limits the quality of the decoded speech.
Summary of the invention
(1) Technical problem to be solved
The object of the present invention is to provide a linear prediction speech coding method and a speech synthesis method that reduce the effect of noise on decoded speech quality, suppress the deterioration of speech quality when the voicing decision is wrong, and improve the coding performance for unvoiced speech and background noise.
(2) Technical scheme
In order to solve the above technical problem, the present invention proposes a linear prediction speech coding method comprising the following steps:
S101: preprocessing the speech to remove the DC component and power-frequency interference;
S102: performing second-order inverse linear prediction on the preprocessed speech to obtain a residual signal;
S103: performing wavelet decomposition and compression on the residual signal to obtain wavelet coefficients, and vector-quantizing the wavelet coefficients;
calculating the pitch period of the residual signal and scalar-quantizing the pitch period;
calculating the gain parameters of the residual signal and scalar-quantizing the gain parameters;
dividing the residual signal into several subbands, making a voicing decision for each subband to obtain the voiced/unvoiced character of each subband, and scalar-quantizing it.
Optionally, step S102 further comprises:
performing linear prediction analysis on the preprocessed speech to obtain linear prediction coefficients, converting the linear prediction coefficients to line spectral frequency pairs, and vector-quantizing the line spectral frequency pairs.
Optionally, in step S102, the linear prediction analysis specifically comprises:
windowing the preprocessed speech with a Hamming window, computing the autocorrelation of the windowed speech signal, computing 10th-order linear prediction coefficients with the Levinson-Durbin algorithm, and then multiplying the 10th-order linear prediction coefficients by 0.994^(i+1) (i = 1, 2, ..., 10) to obtain bandwidth-expanded linear prediction coefficients.
Optionally, in step S103, the wavelet decomposition and compression specifically comprises:
taking samples of the residual signal and performing a single-level wavelet decomposition with the db10 wavelet basis to obtain wavelet coefficients, and compressing the first 100 wavelet coefficients for analysis.
Optionally, in step S103, vector-quantizing the wavelet coefficients specifically comprises:
first converting the wavelet coefficients to a wavelet excitation amplitude spectrum, then vector-quantizing the wavelet excitation amplitude spectrum, using a full-search algorithm for the codebook search and a weighted Euclidean distance as the distortion measure.
Optionally, in step S103, calculating the pitch period of the residual signal specifically comprises:
performing spectral analysis on the residual signal with a Fourier transform and applying an inverse Fourier transform to the spectral magnitude, taking the resulting autocorrelation peak of the residual signal as the integer pitch;
searching within ±1 of the integer pitch, and obtaining the pitch period by interpolating and locally correlating the residual signal.
Optionally, in step S103, calculating the pitch period of the residual signal further comprises:
using the pitch period to search the residual signal for the pitch peak and its harmonic peaks, and taking the mean spacing between the peaks as the final pitch period.
Optionally, in step S103, making a voicing decision for each subband specifically comprises:
calculating the maximum normalized autocorrelation value of each subband signal near the pitch period;
calculating the maximum normalized autocorrelation value of each subband envelope signal near the pitch period;
making a voicing decision for each subband by threshold comparison, according to the maximum normalized autocorrelation value of each subband signal near the pitch period and the maximum normalized autocorrelation value of each subband envelope signal near the pitch period.
The present invention also proposes a speech synthesis method based on the above speech coding method, comprising the following steps:
S201: decoding the quantized line spectral frequency pairs, wavelet coefficients, pitch period, gain parameters, and per-subband voicing to obtain the line spectral frequency pairs, wavelet excitation amplitude spectrum, pitch period, gain parameters, and voiced/unvoiced character of each subband;
S202: synthesizing the wavelet excitation signal from the wavelet excitation amplitude spectrum, the pitch period, and the voiced/unvoiced character of each subband;
S203: passing the wavelet excitation signal through the linear prediction synthesis filter derived from the line spectral frequency pairs to obtain synthesized speech;
S204: applying spectral enhancement and phase adjustment to the synthesized speech.
Optionally, step S202 specifically comprises:
according to the voiced/unvoiced character of each subband, filtering and mixing the unvoiced and voiced components of each subband to obtain the wavelet spectrum;
applying an inverse Fourier transform to the wavelet spectrum to obtain wavelet coefficients, and reconstructing the wavelet excitation signal with the db10 wavelet basis.
(3) Beneficial effects
The technical scheme of the present invention has the following advantages:
1. Wavelet compression removes the background noise of the speech signal and at the same time removes redundant information, and the wavelet coefficient spectrum describes the original speech signal better as an excitation source. Because the excitation signal is produced by wavelet decomposition and compression, the residual signal can be described more accurately than in the prior art for the same number of quantization bits, improving decoded speech quality.
2. When extracting the integer pitch, spectral analysis is performed on the residual signal with a Fourier transform and an inverse FFT is applied to the spectral magnitude, and the position of the resulting autocorrelation peak of the residual signal is taken as the integer pitch. This is more accurate than the integer pitch extracted in the background art, and thus markedly improves the quality of the synthesized speech.
3. The pitch period, gain parameters, and per-subband voicing used to produce the wavelet excitation signal are all extracted from the residual signal, which improves the quality of the decoded speech.
Brief description of the drawings
Fig. 1 is the basic block diagram of an existing linear-prediction-based speech coding method.
Fig. 2 is the basic block diagram of the linear prediction speech coding method of the present invention.
Fig. 3 is the basic block diagram of the speech synthesis method of the present invention.
Detailed description
The specific embodiments of the present invention are described in further detail below with reference to the drawings and examples.
The invention provides a linear prediction speech coding method; as shown in Fig. 2, the speech coding method comprises the following steps:
S101: preprocessing the speech to remove the DC component and power-frequency interference;
S102: performing linear prediction analysis on the preprocessed speech to obtain linear prediction coefficients, converting the linear prediction coefficients to line spectral frequency pairs, and vector-quantizing the line spectral frequency pairs;
performing second-order inverse linear prediction on the preprocessed speech to obtain a residual signal;
S103: performing wavelet decomposition and compression on the residual signal to obtain wavelet coefficients, and vector-quantizing the wavelet coefficients;
calculating the pitch period of the residual signal and scalar-quantizing the pitch period;
calculating the gain parameters of the residual signal and scalar-quantizing the gain parameters;
dividing the residual signal into several subbands, making a voicing decision for each subband to obtain the voiced/unvoiced character of each subband, and scalar-quantizing it.
Preferably, in step S102, the linear prediction analysis specifically comprises:
windowing the preprocessed speech with a Hamming window, computing the autocorrelation of the windowed speech signal, computing 10th-order linear prediction coefficients with the Levinson-Durbin algorithm, and then multiplying the 10th-order linear prediction coefficients by 0.994^(i+1) (i = 1, 2, ..., 10) to obtain bandwidth-expanded linear prediction coefficients.
Preferably, in step S102, when vector-quantizing the line spectral frequency pairs, a 3-stage codebook is used, and the codebook search uses the weighted Euclidean distance criterion.
Preferably, in step S103, the wavelet decomposition and compression specifically comprises:
taking samples of the residual signal and performing a single-level wavelet decomposition with the db10 wavelet basis to obtain wavelet coefficients, and compressing the first 100 wavelet coefficients for analysis.
Preferably, in step S103, vector-quantizing the wavelet coefficients specifically comprises:
first converting the wavelet coefficients to a wavelet excitation amplitude spectrum, then vector-quantizing the wavelet excitation amplitude spectrum, using a full-search algorithm for the codebook search and a weighted Euclidean distance as the distortion measure.
Preferably, in step S103, calculating the pitch period of the residual signal specifically comprises:
performing spectral analysis on the residual signal with a Fourier transform and applying an inverse Fourier transform to the spectral magnitude, taking the resulting autocorrelation peak of the residual signal as the integer pitch;
searching within ±1 of the integer pitch, and obtaining the pitch period by interpolating and locally correlating the residual signal.
Preferably, in step S103, calculating the pitch period of the residual signal further comprises:
using the pitch period to search the residual signal for the pitch peak and its harmonic peaks, and taking the mean spacing between the peaks as the final pitch period.
Preferably, in step S103, making a voicing decision for each subband specifically comprises:
calculating the maximum normalized autocorrelation value of each subband signal near the pitch period;
calculating the maximum normalized autocorrelation value of each subband envelope signal near the pitch period;
making a voicing decision for each subband by threshold comparison, according to the maximum normalized autocorrelation value of each subband signal near the pitch period and the maximum normalized autocorrelation value of each subband envelope signal near the pitch period.
The technical scheme proposed by the present invention can be realized by a wavelet-excited linear prediction (WELP) speech coder/decoder, referred to below as the WELP vocoder. The parameters the WELP vocoder extracts mainly comprise the line spectral frequencies (LSF), the residual wavelet excitation factors (the wavelet coefficients), the wavelet coefficient period (the pitch period), the wavelet gains (the gain parameters), and the wavelet coefficient periodicity flags (the voiced/unvoiced character of each subband).
The concrete implementation of each step is described in detail below in the context of the WELP vocoder.
In step S101, the WELP vocoder passes the input speech through a high-pass filter to remove power-frequency interference. Preprocessing is performed by a high-pass filter with a cutoff frequency of 60 Hz, which removes the DC component and the 50 Hz power-frequency interference and applies high-frequency emphasis. The frequency response of the high-pass filter is given by Equation (1):

H(z) = (1 - z^{-1}) / (1 - 0.95 z^{-1})    (1)

where z is the z-transform variable. The frequency response of this high-pass filter provides about 50 dB of attenuation at 50 Hz.
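The preprocessing filter of Equation (1) can be sketched as a direct difference equation, y[n] = x[n] - x[n-1] + 0.95·y[n-1]. This is a minimal illustration of the filter only (function name and the test signal are ours, not the patent's):

```python
def highpass_preprocess(x):
    # H(z) = (1 - z^-1) / (1 - 0.95 z^-1), Equation (1), applied as
    # y[n] = x[n] - x[n-1] + 0.95 * y[n-1].
    y = []
    x_prev = 0.0
    y_prev = 0.0
    for sample in x:
        out = sample - x_prev + 0.95 * y_prev
        y.append(out)
        x_prev, y_prev = sample, out
    return y

# A constant (DC) input decays toward zero, as expected of a high-pass filter.
out = highpass_preprocess([1.0] * 200)
print(abs(out[-1]) < 1e-3)  # True: the DC component is removed
```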
In step S102, linear prediction analysis extracts the coefficients of the linear prediction filter (the LPC coefficients) from the chosen speech frame; a 10th-order filter is used, with transfer function given by Equation (2):

A(z) = 1 + Σ_{i=1}^{10} a_i z^{-i}    (2)

where {a_i} (i = 1, 2, ..., 10) are the LPC coefficients.
The WELP vocoder windows the preprocessed speech signal with a 200-point Hamming window; the Hamming window w(n) is given by Equation (3):

w(n) = 0.54 - 0.46 cos(2πn/199)    (3)

where the sample index is n = 0, 1, ..., 199. The windowed speech signal s_w(n) is used to compute the autocorrelation function, as in Equation (4):

r(k) = Σ_{n=k}^{199} s_w(n) s_w(n-k)    (4)

where s_w(n) is the windowed speech signal, r(k) is the autocorrelation function, and n and k are the sample and lag indices. The LPC coefficients are obtained by solving Equation (5):

Σ_{i=1}^{10} a_i r(|i-k|) = -r(k),  k = 1, 2, ..., 10    (5)

where the a_i are the LPC coefficients and r is the autocorrelation function. Solving this system of equations with the Levinson-Durbin algorithm yields a_i, i = 1, 2, ..., 10. The resulting 10th-order linear prediction coefficients a_i are multiplied by 0.994^(i+1) for bandwidth expansion, which helps improve the formant structure and makes the LSF parameters easier to quantize.
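The windowing, autocorrelation, and Levinson-Durbin steps above can be sketched as follows. The code solves Equation (5) as written (right-hand side -r(k)), which corresponds to an analysis filter A(z) = 1 + Σ a_i z^{-i}; the bandwidth-expansion exponent i+1 follows the text (common practice is γ^i), and the function names are ours:

```python
import math

def hamming_window(n_len=200):
    # w(n) = 0.54 - 0.46*cos(2*pi*n/199), Equation (3)
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_len - 1)) for n in range(n_len)]

def autocorr(s, max_lag=10):
    # r(k) = sum_{n=k}^{199} s_w(n)*s_w(n-k), Equation (4)
    return [sum(s[n] * s[n - k] for n in range(k, len(s))) for k in range(max_lag + 1)]

def levinson_durbin(r, order=10):
    # Solves sum_i a_i * r(|i-k|) = -r(k), Equation (5).
    a = []
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[m] * r[i - 1 - m] for m in range(len(a)))
        k = -acc / err
        a = [a[m] + k * a[i - 2 - m] for m in range(len(a))] + [k]
        err *= 1.0 - k * k
    return a

def bandwidth_expand(a, gamma=0.994):
    # a_i * 0.994^(i+1) per the text (a is 0-indexed, so a[m] is a_{m+1}).
    return [c * gamma ** (m + 2) for m, c in enumerate(a)]

# Sanity check: for autocorrelation r(k) = 0.9^k (a first-order AR process),
# Levinson-Durbin recovers a_1 = -0.9 and near-zero higher-order coefficients.
r = [0.9 ** k for k in range(11)]
lpc = levinson_durbin(r)
print(round(lpc[0], 6))  # -0.9
```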
The WELP vocoder converts the extracted LPC coefficients into line spectral frequencies (LSF); the LSF transform maps LPC roots scattered inside the unit circle onto the circle itself, which makes them well suited to quantization. The LSF parameters are defined as the roots of Equations (6) and (7):

F_1(z) = [A(z) + z^{-11} A(z^{-1})] / (1 + z^{-1})    (6)

F_2(z) = [A(z) - z^{-11} A(z^{-1})] / (1 - z^{-1})    (7)

where A(z) is the transfer function of the linear prediction filter of Equation (2).
Next, the WELP vocoder applies vector quantization to the 10-dimensional line spectral frequency parameters, using a 3-stage codebook with 7, 7, and 6 bits per stage, 20 bits in total. The codebook search uses the weighted Euclidean distance criterion of Equation (8):

d_lsp = Σ_{i=1}^{10} w(i) [l(i) - l̂(i)]²    (8)

where d_lsp is the weighted Euclidean distance, l is the LSF vector to be quantized, l̂ is the codebook vector, l(i) is the i-th LSF to be quantized, l̂(i) is the i-th LSF in the codebook, and w(i) is the weight of each dimension, given by Equation (9):

w(i) = p(f_i)^{0.3},         i = 1, ..., 8
w(i) = 0.64 · p(f_i)^{0.3},  i = 9
w(i) = 0.16 · p(f_i)^{0.3},  i = 10    (9)

where p(f_i) is the power spectral density of the linear prediction filter at the frequency corresponding to the i-th line spectral frequency parameter.
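The weighted-distance codebook search of Equation (8) can be sketched as an exhaustive search over a toy codebook. The tiny 3-dimensional codebook below is purely illustrative (the patent uses a 3-stage, 20-bit codebook over 10-dimensional LSF vectors):

```python
def weighted_dist(l, cand, w):
    # d_lsp = sum_i w(i) * (l(i) - l_hat(i))^2, Equation (8)
    return sum(wi * (x - y) ** 2 for wi, x, y in zip(w, l, cand))

def vq_full_search(l, codebook, w):
    # Exhaustive search for the codebook index minimizing the weighted distance.
    return min(range(len(codebook)), key=lambda j: weighted_dist(l, codebook[j], w))

codebook = [[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]]
weights = [1.0, 1.0, 1.0]
best = vq_full_search([0.19, 0.41, 0.58], codebook, weights)
print(best)  # 1: the second entry is closer
```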
At the same time, the WELP vocoder filters the speech signal with a low-pass filter with a 1000 Hz cutoff frequency to remove the influence of high-frequency components on pitch estimation, then applies second-order inverse linear prediction to this signal to remove the influence of the formants on the pitch harmonics, obtaining the residual signal.
In step S103, the residual signal is first wavelet-decomposed and compressed. The residual signal is the signal left after the spectral envelope has been removed; for voiced signals its spectrum is periodic. The 160 residual samples of the current frame together with 40 residual samples of the previous frame undergo a single-level wavelet decomposition with the db10 wavelet basis, yielding 218 wavelet coefficients, of which the first 100 are taken for compression analysis.
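A single-level wavelet decomposition of the 200 residual samples can be sketched as below. The patent uses the db10 basis (with a library such as PyWavelets, `pywt.dwt(x, 'db10')`, whose boundary extension is consistent with the 218 coefficients mentioned above); a Haar basis is substituted here so the sketch stays dependency-free, and it produces 200 coefficients rather than 218:

```python
import math

def haar_dwt_single_level(x):
    # One level of wavelet decomposition: approximation (low-pass) and
    # detail (high-pass) coefficients, concatenated.
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (x[2 * i] + x[2 * i + 1]) for i in range(len(x) // 2)]
    detail = [s * (x[2 * i] - x[2 * i + 1]) for i in range(len(x) // 2)]
    return approx + detail

# 160 current-frame samples plus 40 from the previous frame, as in the text.
frame = [math.sin(2.0 * math.pi * n / 20.0) for n in range(200)]
coeffs = haar_dwt_single_level(frame)
compressed = coeffs[:100]  # keep the first 100 coefficients for analysis
```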
Next, the wavelet coefficients are transformed to the frequency domain and the fundamental frequency is multiplied by 2; the 10 spectral peaks of the wavelet coefficient power spectrum near the doubled fundamental frequency and its harmonics are computed, and when there are fewer than 10 peaks, the remaining entries are filled with 1.
The WELP vocoder applies vector quantization to the 10-dimensional wavelet excitation amplitude spectrum; the codebook search uses a full-search algorithm, and the distortion measure is the weighted Euclidean distance of Equation (10):

d = Σ_{i=1}^{10} w(i) [A(i) - Â(i)]²    (10)

where d is the distortion measure, A(i) is the wavelet excitation amplitude spectrum to be quantized, Â(i) is the codebook vector, and w(i) is the weighting coefficient.
To calculate the pitch period, spectral analysis is applied to the formant-removed signal with a Fourier transform, and an inverse FFT is applied to the spectral magnitude to obtain the autocorrelation of this signal, as in Equation (11):

r(τ) = ∫ [ | ∫ x(t) e^{-jωt} dt | ]² e^{jωτ} dω    (11)

where r(τ) is the autocorrelation function, x(t) is the formant-removed signal, ω is the angular frequency, and t is time. The offset between the peak of the correlation curve r(τ) and the midpoint of the time axis is the pitch period.
The pitch period obtained by this method is an integer, but the true pitch period of speech need not be; to obtain an accurate pitch value, a search is made within ±1 of this integer, and the accurate pitch period is found by interpolating and locally correlating the formant-removed signal.
Because the pitch period strongly affects vocoder speech quality, to estimate it still more accurately, the pitch period obtained above is used to search the spectrum of the formant-removed signal for the pitch peak and its harmonic peaks, and the mean spacing between the peaks is taken as the final pitch period.
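The integer-pitch step of Equation (11) — autocorrelation as the inverse transform of the power spectrum (the Wiener-Khinchin relation) — can be sketched as below. A naive O(n²) DFT stands in for the FFT so the sketch stays dependency-free; the narrowed search range in the demo is ours:

```python
import cmath
import math

def autocorr_via_power_spectrum(x):
    # Forward transform, squared magnitude, inverse transform: Equation (11).
    n = len(x)
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    power = [abs(s) ** 2 for s in spec]
    return [sum(power[k] * cmath.exp(2j * math.pi * k * tau / n)
                for k in range(n)).real / n
            for tau in range(n)]

def integer_pitch(x, lo=18, hi=145):
    # The integer pitch is the lag of the autocorrelation peak within the
    # allowed pitch range (18..145 samples per the text).
    r = autocorr_via_power_spectrum(x)
    hi = min(hi, len(x) - 1)
    return max(range(lo, hi + 1), key=lambda tau: r[tau])

# A tone with a 20-sample period yields an integer pitch of 20
# (search range narrowed to exclude the lag-40/60 aliases of this short frame).
samples = [math.sin(2.0 * math.pi * n / 20.0) for n in range(80)]
print(integer_pitch(samples, lo=18, hi=30))  # 20
```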
The pitch period ranges over 18 to 145 samples and is scalar-quantized with 7 bits, mapping 18 to 145 onto the 7-bit indices 0 to 127.
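The 7-bit mapping is a one-to-one offset, since 145 - 18 = 127. A minimal sketch (rounding of fractional pitch values before indexing is our assumption):

```python
def quantize_pitch(pitch):
    # 7-bit scalar quantization: periods 18..145 map one-to-one onto 0..127.
    p = min(max(int(round(pitch)), 18), 145)
    return p - 18

def dequantize_pitch(index):
    return index + 18

print(quantize_pitch(18), quantize_pitch(145), dequantize_pitch(quantize_pitch(73.4)))
# 0 127 73
```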
The gain parameter (Gain) expresses the level of the speech signal, as in Equation (12):

G = 10 log_{10} [ 0.01 + (1/L) Σ_{n=0}^{L-1} r(n)² ]    (12)

where L is the pitch period and r(n) are the wavelet coefficients; gain parameters G1 and G2 are extracted at the head and tail of the frame, respectively.
The WELP vocoder quantizes the gain parameters in the log domain. The gain parameters are first transformed to the log domain and limited to the range 10 to 77 dB; G1 is quantized with 3 bits and G2 with 5 bits, with quantization indices given by Equation (13) (shown only as figures in the original), where g' denotes the log-domain gain.
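The gain extraction and log-domain quantization can be sketched as below. Since the exact index formulas of Equation (13) are not legible in the source, a plain uniform quantizer over the stated 10-77 dB range is assumed:

```python
import math

def frame_gain(coeffs, L):
    # G = 10*log10(0.01 + (1/L) * sum_{n=0}^{L-1} r(n)^2), Equation (12)
    energy = sum(c * c for c in coeffs[:L]) / L
    return 10.0 * math.log10(0.01 + energy)

def quantize_gain(g_db, bits, lo=10.0, hi=77.0):
    # Uniform scalar quantization over 10..77 dB (assumed form of Eq. (13));
    # G1 uses bits=3, G2 uses bits=5.
    levels = (1 << bits) - 1
    g = min(max(g_db, lo), hi)
    return int(round((g - lo) / (hi - lo) * levels))

# Silence hits the 0.01 floor: G = 10*log10(0.01) = -20 dB, clamped to index 0.
print(round(frame_gain([0.0] * 40, 40), 3), quantize_gain(77.0, 5), quantize_gain(40.0, 3))
# -20.0 31 3
```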
In addition, a voicing decision must be made for each subband, and the resulting voiced/unvoiced character of each subband scalar-quantized. The WELP vocoder divides the 0-4000 Hz speech signal into several subbands for voiced/unvoiced analysis. The following description takes 5 subbands as an example, dividing the speech signal into 0-500 Hz, 500-1000 Hz, 1000-2000 Hz, 2000-3000 Hz, and 3000-4000 Hz, 5 subbands in total. The preprocessed speech signal is filtered with 6th-order Butterworth IIR bandpass filters to obtain the individual subband signals. Then, for each bandpass-filtered signal, the maximum normalized autocorrelation value near the pitch period is computed, as in Equation (14):
r_i(p) = Σ_{n=0}^{L-1} s_i(n) s_i(n-p) / [ Σ_{n=0}^{L-1} s_i(n)² · Σ_{n=0}^{L-1} s_i(n-p)² ]^{1/2},  i = 1, ..., 5    (14)

where s_i(n) is the bandpass-filtered signal and p is a lag near the pitch period. Among these normalized autocorrelation values, the maximum, r_1(i), i = 1, ..., 5, is found for each subband.
Then the normalized autocorrelation of the envelope signal of each subband is computed. The envelope signal is obtained by applying a second-order low-pass filter to the subband signal. For the envelope signal, the maximum normalized autocorrelation value near the pitch period, r_2(i), i = 1, ..., 5, is computed.
Using the threshold comparison method, r_1(i) and r_2(i), i = 1, ..., 5, are combined to judge the voicing of each subband.
In the WELP vocoder, the voicing of the first subband is represented by the pitch period: 0 denotes unvoiced and a nonzero value denotes voiced. The other 4 subbands are represented with 4 bits, where 1 denotes voiced and 0 unvoiced.
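The per-subband voicing decision of Equation (14) can be sketched as follows. The threshold value and the ±1 lag search are assumptions, and the envelope-correlation branch that the patent combines with this one is omitted for brevity:

```python
import math

def normalized_autocorr(frame, history, p):
    # r_i(p), Equation (14); `history` supplies the p samples preceding `frame`.
    s = history + frame
    off = len(history)
    L = len(frame)
    num = sum(s[off + n] * s[off + n - p] for n in range(L))
    den = math.sqrt(sum(s[off + n] ** 2 for n in range(L)) *
                    sum(s[off + n - p] ** 2 for n in range(L)))
    return num / den if den else 0.0

def subband_voicing(frame, history, pitch, threshold=0.5):
    # Maximum normalized autocorrelation over lags near the pitch period,
    # compared against a threshold (threshold value assumed).
    best = max(normalized_autocorr(frame, history, p)
               for p in range(max(1, pitch - 1), pitch + 2))
    return 1 if best >= threshold else 0  # 1 = voiced, 0 = unvoiced

# A clean tone with a 20-sample period is judged voiced at pitch 20.
hist = [math.sin(2.0 * math.pi * n / 20.0) for n in range(21)]
frm = [math.sin(2.0 * math.pi * n / 20.0) for n in range(21, 61)]
print(subband_voicing(frm, hist, 20))  # 1
```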
Based on the above linear prediction speech coding method, the present invention also proposes a speech synthesis method; as shown in Fig. 3, the speech synthesis method comprises the following steps:
S201: decoding the quantized line spectral frequency pairs, wavelet coefficients, pitch period, gain parameters, and per-subband voicing to obtain the line spectral frequency pairs, wavelet excitation amplitude spectrum, pitch period, gain parameters, and voiced/unvoiced character of each subband;
S202: synthesizing the wavelet excitation signal from the wavelet excitation amplitude spectrum, the pitch period, and the voiced/unvoiced character of each subband;
S203: passing the wavelet excitation signal through the linear prediction synthesis filter derived from the line spectral frequency pairs to obtain synthesized speech;
S204: applying spectral enhancement and phase adjustment to the synthesized speech.
Preferably, step S202 specifically comprises:
according to the voiced/unvoiced character of each subband, filtering and mixing the unvoiced and voiced components of each subband to obtain the wavelet spectrum;
applying an inverse Fourier transform to the wavelet spectrum to obtain wavelet coefficients, and reconstructing the wavelet excitation signal with the db10 wavelet basis.
The speech synthesis process is described below taking 5 subbands as an example. The wavelet excitation signal is synthesized from the voiced/unvoiced character of the 5 subbands, the pitch period, and the wavelet spectrum: according to the voicing decision results for the 5 subbands 0-500 Hz, 500-1000 Hz, 1000-2000 Hz, 2000-3000 Hz, and 3000-4000 Hz, the voiced and unvoiced components of each frequency band are filtered and mixed in proportion, where voiced subbands use the wavelet coefficient spectral amplitudes (restored by inverse Fourier transform) and unvoiced subbands are matched with white noise.
According to the encoder's voicing decisions for the 5 subbands, the voiced and unvoiced components are filtered and mixed proportionally in each band, to obtain the wavelet spectrum $c_f(k)$:

$$c_f(k) = V_k A_k + (1 - V_k)\,e_k, \quad 0 \le k \le 9 \qquad (15)$$

where $A_k$ is the wavelet-coefficient spectral amplitude, $V_k$ is the U/V flag of the subband containing the $k$-th amplitude (0 for unvoiced, 1 for voiced), and $e_k$ is random noise.
The wavelet coefficients can be recovered by applying an inverse Fourier transform to $c_f(k)$, and the excitation signal is then reconstructed with the db10 wavelet basis.
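As a concrete illustration, the per-coefficient mixing of equation (15) can be sketched in a few lines of NumPy. The flags, amplitudes and noise below are illustrative stand-ins for decoded values, and the final db10 wavelet reconstruction (available, for example, through the PyWavelets package) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.abs(rng.normal(size=10))                # wavelet-coefficient spectral amplitudes A_k (stand-in)
V = np.array([1, 1, 1, 0, 0, 1, 1, 0, 0, 0])   # U/V flag of the subband holding coefficient k (stand-in)
e = rng.normal(size=10)                        # random noise e_k used at unvoiced positions

# Equation (15): voiced positions keep the coded amplitude, unvoiced ones take noise.
c_f = V * A + (1 - V) * e

# The inverse Fourier transform recovers the wavelet coefficients; the real part
# is kept because the excitation to be reconstructed is a real-valued signal.
coeffs = np.fft.ifft(c_f).real
```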
The synthesized speech is obtained by passing the excitation signal through the LP synthesis filter $H(z)$:

$$H(z) = \frac{1}{1 + \sum_{i=1}^{10} a_i z^{-i}} \qquad (16)$$

where $a_i$, $i = 1, 2, \ldots, 10$, are the linear prediction coefficients.
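Equation (16) is a standard all-pole filter, which in the time domain reads $s[n] = e[n] - \sum_i a_i\, s[n-i]$. A minimal NumPy sketch follows, using a single-pole example rather than the fixed 10th order of the text:

```python
import numpy as np

def lp_synthesis(excitation, a):
    """All-pole filtering per H(z) = 1 / (1 + sum_i a_i z^-i), equation (16)."""
    p = len(a)
    s = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for i in range(1, min(n, p) + 1):   # subtract feedback from past outputs
            acc -= a[i - 1] * s[n - i]
        s[n] = acc
    return s

# A single-pole example: an impulse through 1 / (1 - 0.5 z^-1) decays by halves.
out = lp_synthesis(np.array([1.0, 0.0, 0.0, 0.0]), np.array([-0.5]))
# out == [1.0, 0.5, 0.25, 0.125]
```

The same operation is available as `scipy.signal.lfilter([1.0], np.concatenate(([1.0], a)), excitation)`.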
To make the synthesized speech more natural, spectral enhancement and phase adjustment are applied to it. The spectral enhancement is constructed from the linear filter $A(z)$:

$$H_{pf}(z) = \frac{A(z/\alpha)}{A(z/\beta)}\,(1 + \mu z^{-1}) \qquad (17)$$

$$A(z) = \frac{1}{H(z)} \qquad (18)$$

where $\alpha$ and $\beta$ are the short-term postfilter adjustment factors and $\mu$ is the tilt (slope) compensation factor. In this embodiment $\alpha = 0.5$ and $\beta = 0.8$; $G_{int}$ is the interpolated current gain, $G_n$ is the background-noise gain, and $\mu$ is the interpolated tilt factor $\max(0.5 k_1, 0)$.
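A NumPy sketch of the postfilter in equations (17) and (18): scaling the prediction-error taps by powers of α and β yields A(z/α) and A(z/β), followed by the first-order tilt term. The fixed μ used here is illustrative only, since the text interpolates it per frame.

```python
import numpy as np

def postfilter(s, a, alpha=0.5, beta=0.8, mu=0.2):
    """Spectral enhancement H_pf(z) = A(z/alpha) / A(z/beta) * (1 + mu z^-1),
    with A(z) = 1 + sum_i a_i z^-i (equations (17) and (18))."""
    p = len(a)
    num = a * alpha ** np.arange(1, p + 1)   # taps of A(z/alpha) after the leading 1
    den = a * beta ** np.arange(1, p + 1)    # taps of A(z/beta) after the leading 1
    x = np.convolve(s, np.concatenate(([1.0], num)))[: len(s)]  # FIR part A(z/alpha)
    y = np.zeros(len(s))
    for n in range(len(s)):                  # IIR part 1 / A(z/beta)
        acc = x[n]
        for i in range(1, min(n, p) + 1):
            acc -= den[i - 1] * y[n - i]
        y[n] = acc
    return y + mu * np.concatenate(([0.0], y[:-1]))  # tilt term (1 + mu z^-1)
```

With all-zero prediction coefficients and μ = 0 the filter reduces to a pass-through, which gives a quick sanity check.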
An application example of the present invention is given below.
For the WELP vocoder, the input speech is a linear PCM signal sampled at 8 kHz, with an analysis frame length of 20 ms: the current frame holds 160 samples, and the preceding 240 samples are buffered. Parameters are extracted from these 400 samples: 10 linear prediction coefficients converted into 10 line spectrum pairs, 10 wavelet coefficients, 1 pitch period, 2 gain parameters, and the U/V decisions of 4 subbands (the U/V characteristic of the 1st subband is conveyed by the pitch period: a pitch period of 0 denotes unvoiced, otherwise voiced). The bit allocation of these parameters is shown in Table 1:
Table 1: Bit allocation of the parameters

Parameter              Voiced  Unvoiced
Line spectrum pair       20       20
Pitch period              8        8
Wavelet coefficients      8        0
Gain                      8        8
Sub-band U/V flags        4        0
FEC                       0       13
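Taking Table 1 at face value, the per-frame totals and the bit rate implied by the 20 ms frame length can be checked in a few lines (the ~2.4 kb/s figure is an inference from the table, not a rate quoted by the text):

```python
# (voiced bits, unvoiced bits) per parameter, transcribed from Table 1
bits = {
    "line spectrum pair":   (20, 20),
    "pitch period":         (8, 8),
    "wavelet coefficients": (8, 0),
    "gain":                 (8, 8),
    "sub-band U/V flags":   (4, 0),
    "FEC":                  (0, 13),
}

voiced_bits = sum(v for v, _ in bits.values())      # total bits in a voiced frame
unvoiced_bits = sum(u for _, u in bits.values())    # total bits in an unvoiced frame

frames_per_second = 1000 // 20                      # 20 ms analysis frames
voiced_rate = voiced_bits * frames_per_second       # bit/s for all-voiced speech
```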
To measure the effect of the present invention, the speech coding is evaluated with the objective MOS evaluation method provided by ITU-T Recommendation P.862. The MOS score of the evaluation software PESQ ranges from 0 to 4.5; the higher the MOS value, the better the speech quality. The test results are shown in Table 2:

Table 2: Test results

              English  Chinese
Male voice      3.3      3.18
Female voice    3.4      3.21
The above are only preferred embodiments of the present invention. It should be pointed out that those skilled in the art can make improvements and substitutions without departing from the technical principles of the present invention, and such improvements and substitutions shall also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A linear-prediction speech coding method, characterized in that the speech coding method comprises the following steps:
S101, preprocessing the speech to remove the DC component and power-frequency interference;
S102, performing second-order backward linear prediction on the preprocessed speech, to obtain a residual signal;
S103, performing wavelet decomposition and compression on the residual signal to obtain wavelet coefficients, and vector-quantizing the wavelet coefficients,
calculating the pitch period of the residual signal, and scalar-quantizing the pitch period,
calculating the gain parameters of the residual signal, and scalar-quantizing the gain parameters,
dividing the residual signal into several subbands, making a voicing decision for each subband, and scalar-quantizing the resulting U/V characteristic of each subband.
2. The speech coding method according to claim 1, characterized in that step S102 further comprises:
performing linear prediction analysis on the preprocessed speech to obtain linear prediction coefficients, then converting the linear prediction coefficients into line spectral frequency pairs, and vector-quantizing the line spectral frequency pairs.
3. The speech coding method according to claim 2, characterized in that in step S102 the linear prediction analysis specifically comprises:
windowing the preprocessed speech with a Hamming window, performing an autocorrelation calculation on the windowed speech signal, computing 10th-order linear prediction coefficients with the Levinson-Durbin algorithm, then multiplying the 10th-order linear prediction coefficients by 0.994^(i+1) (i = 1, 2, ..., 10) to obtain bandwidth-expanded linear prediction coefficients.
4. The speech coding method according to claim 1, characterized in that in step S103 the wavelet decomposition and compression specifically comprises:
taking samples of the residual signal and performing a first-order wavelet decomposition with the db10 wavelet basis to obtain wavelet coefficients, and compressing and analyzing the first 100 wavelet coefficients.
5. The speech coding method according to claim 1 or 4, characterized in that in step S103 vector-quantizing the wavelet coefficients specifically comprises:
first converting the wavelet coefficients into a wavelet excitation amplitude spectrum, then vector-quantizing the wavelet excitation amplitude spectrum, where the codebook search during quantization uses a full-search algorithm and the distortion measure uses a weighted Euclidean distance.
6. The speech coding method according to claim 1, characterized in that in step S103 calculating the pitch period of the residual signal specifically comprises:
performing spectral analysis on the residual signal with a Fourier transform and applying an inverse Fourier transform to the spectral magnitude, taking the autocorrelation peak of the resulting residual signal as the integer pitch;
searching within ±1 of the integer pitch, and obtaining the pitch period by interpolating and locally correlating the residual signal.
7. The speech coding method according to claim 6, characterized in that in step S103 calculating the pitch period of the residual signal further comprises:
using the pitch period to search the residual signal for the pitch peak and its resonance peaks, and taking the mean of the differences between the peaks as the final pitch period.
8. The speech coding method according to claim 1, characterized in that in step S103 making a voicing decision for each subband specifically comprises:
calculating the maximum normalized autocorrelation value of each subband signal near the pitch period;
calculating the maximum normalized autocorrelation value of each subband envelope signal near the pitch period;
making the voicing decision for each subband by threshold comparison, based on the maximum normalized autocorrelation value of each subband signal near the pitch period and the maximum normalized autocorrelation value of each subband envelope signal near the pitch period.
9. A speech synthesis method based on the linear-prediction speech coding method of claim 2 or 3, characterized in that the speech synthesis method comprises the following steps:
S201, decoding the quantized line spectral frequency pairs, wavelet coefficients, pitch period, gain parameters and the U/V characteristic of each subband, to obtain the line spectral frequency pairs, the wavelet excitation amplitude spectrum, the pitch period, the gain parameters and the U/V characteristic of each subband;
S202, synthesizing the wavelet excitation signal from the wavelet excitation amplitude spectrum, the pitch period and the U/V characteristic of each subband;
S203, passing the wavelet excitation signal through the inverse linear-prediction filter defined by the line spectral frequency pairs, to obtain the synthesized speech;
S204, applying spectral enhancement and phase adjustment to the synthesized speech.
10. The speech synthesis method according to claim 9, characterized in that step S202 specifically comprises:
according to the U/V characteristic of each subband, filtering and mixing the unvoiced and voiced components of each subband, to obtain the wavelet spectrum;
applying an inverse Fourier transform to the wavelet spectrum to obtain the wavelet coefficients, and reconstructing the wavelet excitation signal with the db10 wavelet basis.
CN 201210592909 2012-12-31 2012-12-31 Linear prediction speech coding method and speech synthesis method Pending CN103050121A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201210592909 CN103050121A (en) 2012-12-31 2012-12-31 Linear prediction speech coding method and speech synthesis method


Publications (1)

Publication Number Publication Date
CN103050121A true CN103050121A (en) 2013-04-17

Family

ID=48062736

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201210592909 Pending CN103050121A (en) 2012-12-31 2012-12-31 Linear prediction speech coding method and speech synthesis method

Country Status (1)

Country Link
CN (1) CN103050121A (en)


Cited By (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104517614A (en) * 2013-09-30 2015-04-15 上海爱聊信息科技有限公司 Voiced/unvoiced decision device and method based on sub-band characteristic parameter values
CN103714820A (en) * 2013-12-27 2014-04-09 广州华多网络科技有限公司 Packet loss hiding method and device of parameter domain
CN103714820B (en) * 2013-12-27 2017-01-11 广州华多网络科技有限公司 Packet loss hiding method and device of parameter domain
CN106415718A (en) * 2014-01-24 2017-02-15 日本电信电话株式会社 Linear-predictive analysis device, method, program, and recording medium
CN106415718B (en) * 2014-01-24 2019-10-25 日本电信电话株式会社 Linear prediction analysis device, method and recording medium
US11450329B2 (en) 2014-03-28 2022-09-20 Samsung Electronics Co., Ltd. Method and device for quantization of linear prediction coefficient and method and device for inverse quantization
US11222644B2 (en) 2014-04-25 2022-01-11 Ntt Docomo, Inc. Linear prediction coefficient conversion device and linear prediction coefficient conversion method
US10714108B2 (en) 2014-04-25 2020-07-14 Ntt Docomo, Inc. Linear prediction coefficient conversion device and linear prediction coefficient conversion method
US10714107B2 (en) 2014-04-25 2020-07-14 Ntt Docomo, Inc. Linear prediction coefficient conversion device and linear prediction coefficient conversion method
CN107077857A (en) * 2014-05-07 2017-08-18 三星电子株式会社 The method and apparatus and the method and apparatus of de-quantization quantified to linear predictor coefficient
US11922960B2 (en) 2014-05-07 2024-03-05 Samsung Electronics Co., Ltd. Method and device for quantizing linear predictive coefficient, and method and device for dequantizing same
US11238878B2 (en) 2014-05-07 2022-02-01 Samsung Electronics Co., Ltd. Method and device for quantizing linear predictive coefficient, and method and device for dequantizing same
CN107077857B (en) * 2014-05-07 2021-03-09 三星电子株式会社 Method and apparatus for quantizing linear prediction coefficients and method and apparatus for dequantizing linear prediction coefficients
CN106575511A (en) * 2014-07-29 2017-04-19 瑞典爱立信有限公司 Estimation of background noise in audio signals
CN104990553A (en) * 2014-12-23 2015-10-21 上海安悦四维信息技术有限公司 Hand-held vehicle terminal C-Pad intelligent navigation system and working method thereof
CN110415713A (en) * 2018-04-28 2019-11-05 北京展讯高科通信技术有限公司 The coding method of DMR system and device, storage medium, digital walkie-talkie
CN110415713B (en) * 2018-04-28 2021-11-09 北京紫光展锐通信技术有限公司 Encoding method and device of DMR system, storage medium and digital interphone
WO2020001568A1 (en) * 2018-06-29 2020-01-02 华为技术有限公司 Method and apparatus for determining weighting coefficient during stereo signal coding process
US11551701B2 (en) 2018-06-29 2023-01-10 Huawei Technologies Co., Ltd. Method and apparatus for determining weighting factor during stereo signal encoding
EP3800638A4 (en) * 2018-06-29 2021-08-18 Huawei Technologies Co., Ltd. Method and apparatus for determining weighting coefficient during stereo signal coding process
US11922958B2 (en) 2018-06-29 2024-03-05 Huawei Technologies Co., Ltd. Method and apparatus for determining weighting factor during stereo signal encoding
CN109003621A (en) * 2018-09-06 2018-12-14 广州酷狗计算机科技有限公司 A kind of audio-frequency processing method, device and storage medium
CN109003621B (en) * 2018-09-06 2021-06-04 广州酷狗计算机科技有限公司 Audio processing method and device and storage medium
CN109243479A (en) * 2018-09-20 2019-01-18 广州酷狗计算机科技有限公司 Acoustic signal processing method, device, electronic equipment and storage medium
CN109256143A (en) * 2018-09-21 2019-01-22 西安蜂语信息科技有限公司 Speech parameter quantization method, device, computer equipment and storage medium
CN113196388A (en) * 2018-12-17 2021-07-30 微软技术许可有限责任公司 Phase quantization in a speech encoder
CN110380826B (en) * 2019-08-21 2021-09-28 苏州大学 Self-adaptive mixed compression method for mobile communication signal
CN110380826A (en) * 2019-08-21 2019-10-25 苏州大学 The compression of mobile communication signal ADAPTIVE MIXED and decompressing method
TWI723545B (en) * 2019-09-17 2021-04-01 宏碁股份有限公司 Speech processing method and device thereof
US11587573B2 (en) 2019-09-17 2023-02-21 Acer Incorporated Speech processing method and device thereof
CN112562699B (en) * 2019-09-26 2023-08-15 宏碁股份有限公司 Voice processing method and device thereof
CN112562699A (en) * 2019-09-26 2021-03-26 宏碁股份有限公司 Voice processing method and device
CN110730015A (en) * 2019-11-20 2020-01-24 深圳市星网荣耀科技有限公司 Multilink portable communication device and voice coding compression and decoding method thereof
CN113409756B (en) * 2020-03-16 2022-05-03 阿里巴巴集团控股有限公司 Speech synthesis method, system, device and storage medium
CN113409756A (en) * 2020-03-16 2021-09-17 阿里巴巴集团控股有限公司 Speech synthesis method, system, device and storage medium
CN112270934A (en) * 2020-09-29 2021-01-26 天津联声软件开发有限公司 Voice data processing method of NVOC low-speed narrow-band vocoder
CN112270934B (en) * 2020-09-29 2023-03-28 天津联声软件开发有限公司 Voice data processing method of NVOC low-speed narrow-band vocoder
CN112233686A (en) * 2020-09-29 2021-01-15 天津联声软件开发有限公司 Voice data processing method of NVOCPLUS high-speed broadband vocoder

Similar Documents

Publication Publication Date Title
CN103050121A (en) Linear prediction speech coding method and speech synthesis method
KR101147878B1 (en) Coding and decoding methods and devices
EP3039676B1 (en) Adaptive bandwidth extension and apparatus for the same
EP3138096B1 (en) High band excitation signal generation
KR101747917B1 (en) Apparatus and method for determining weighting function having low complexity for lpc coefficients quantization
CN102341850B (en) Speech coding
CN103325375B (en) One extremely low code check encoding and decoding speech equipment and decoding method
JP4270866B2 (en) High performance low bit rate coding method and apparatus for non-speech speech
JP6980871B2 (en) Signal coding method and its device, and signal decoding method and its device
US6138092A (en) CELP speech synthesizer with epoch-adaptive harmonic generator for pitch harmonics below voicing cutoff frequency
CN106463134B (en) method and apparatus for quantizing linear prediction coefficients and method and apparatus for inverse quantization
CN108231083A (en) A kind of speech coder code efficiency based on SILK improves method
EP1597721B1 (en) 600 bps mixed excitation linear prediction transcoding
KR102052144B1 (en) Method and device for quantizing voice signals in a band-selective manner
WO2004090864A2 (en) Method and apparatus for the encoding and decoding of speech
JP4578145B2 (en) Speech coding apparatus, speech decoding apparatus, and methods thereof
Li et al. A new distortion measure for parameter quantization based on MELP
Zou et al. High quality 0.6/1.2/2.4 kbps multi-band lpc speech coding algorithm
Laurent et al. A robust 2400 bps subband LPC vocoder
Bao et al. High quality harmonic excitation linear predictive speech coding at 2 kb/s
Saleem et al. Implementation of Low Complexity CELP Coder and Performance Evaluation in terms of Speech Quality
KR0156983B1 (en) Voice coder
Bao Harmonic excitation LPC (HE-LPC) speech coding at 2.3 kb/s
KR100757366B1 (en) Device for coding/decoding voice using zinc function and method for extracting prototype of the same
Zou et al. A 300bps speech coding algorithm based on multi-mode matrix quantization

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130417