CN106935243A - A kind of low bit digital speech vector quantization method and system based on MELP - Google Patents


Info

Publication number
CN106935243A
Authority
CN
China
Prior art keywords
vector quantization
lsf
signal
melp
lsf parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201511005800.3A
Other languages
Chinese (zh)
Inventor
王国文
罗世新
何丽
张盼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aisino Corp
Original Assignee
Aisino Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aisino Corp filed Critical Aisino Corp
Priority to CN201511005800.3A priority Critical patent/CN106935243A/en
Publication of CN106935243A publication Critical patent/CN106935243A/en
Pending legal-status Critical Current


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02: Speech or audio signals analysis-synthesis techniques using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/032: Quantisation or dequantisation of spectral components
    • G10L19/038: Vector quantisation, e.g. TwinVQ audio
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: Speech or audio signals analysis-synthesis techniques using predictive techniques
    • G10L19/08: Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/087: Determination or coding of the excitation function using mixed excitation models, e.g. MELP, MBE, split band LPC or HVXC

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The embodiment of the invention provides a low-bit digital speech vector quantization method and system based on MELP. The invention performs linear prediction coefficient vector quantization on the adjusted pitch signal using the mixed excitation linear prediction (MELP) algorithm, including: applying two-stage split vector quantization to the LSF parameters, first obtaining the LSF parameters of the first-stage vector quantization, then obtaining the LSF parameters of the second-stage vector quantization based on those of the first stage; and performing digital speech vector quantization using the LSF parameters after the second-stage vector quantization. On the basis of the MELP algorithm, the invention adopts a two-stage LSF vector quantization scheme, reducing the code rate and reducing the storage and computational complexity of the codebook.

Description

Low-bit digital speech vector quantization method and system based on MELP
Technical Field
The invention relates to the technical field of signal processing, in particular to a low-bit digital speech vector quantization method based on MELP.
Background
At present, research on low-bit digital speech compression algorithms is increasingly mature. Among low-bit digital speech algorithms, the mixed excitation linear prediction (MELP) algorithm has its own specific advantages: 2.4 kbps MELP is a speech generation model that better matches human pronunciation, built on linear predictive coding (LPC) and combining the advantages of coding methods such as mixed excitation, multi-band excitation, and prototype waveform interpolation for synthesizing speech. The MELP algorithm is characterized by multi-band mixed excitation, aperiodic pulses, residual harmonic processing, adaptive spectral enhancement, and pulse shaping filtering.
In view of the above, the prior art generally proposes recognition-synthesis vocoders that encode a speech signal using speech recognition and synthesis techniques, with speech primitives as the coding units, to reduce the coding rate below 1 kb/s. In addition, based on 2.4 kb/s linear predictive coding (LPC), the speech data can be further compressed using vector quantization and the inter-frame correlation of speech. Vector quantization treats a group of scalar data as a vector and quantizes the vector space as a whole, compressing the data without losing much information; the efficiency of the vector quantization determines the efficiency of the encoder. In parameter quantization for low-rate coding, the line spectrum pair (LSP) parameters occupy a relatively high number of bits, so improving the LSP parameter quantization method would significantly reduce the coding rate. Because adjacent frames of a speech signal are highly correlated, especially in the stationary phase of speech, the coding rate is greatly reduced if the speech parameters are transmitted only every other frame. It has therefore also been proposed to further reduce the number of bits for parameter quantization by exploiting inter-frame correlation, i.e. to encode several consecutive frames as one super-frame and vector-quantize the super-frame parameters as a whole to compress inter-frame redundancy. A piecewise quantization method with variable segment length has also been proposed, which treats the input speech as segments of variable length, each segment consisting of one or several frames, with each frame represented by parameters such as gain, pitch, and spectrum.
Although such methods are complex to implement, they can greatly reduce the coding rate, shorten the coding delay, and produce synthesized speech of higher quality.
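The core vector-quantization operation described above (grouping scalars into a vector and replacing it with the nearest codeword of a codebook) can be sketched as follows. This is a minimal illustration, not the patent's trained codebook; the tiny 2-D codebook here is an assumption for demonstration only.

```python
import numpy as np

def vq_encode(vectors, codebook):
    """Return the index of the nearest codeword (squared-error distance) for each vector."""
    # pairwise squared distances, shape (n_vectors, n_codewords)
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

# toy 1-bit codebook over 2-dimensional vectors (stand-in for a trained LSF codebook)
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
x = np.array([[0.1, -0.1], [0.9, 1.2]])

idx = vq_encode(x, codebook)   # transmitted indices
reconstructed = codebook[idx]  # decoder side: a simple table lookup
```

Only the indices are transmitted, which is where the bit savings over scalar quantization come from.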
Disclosure of Invention
The embodiment of the invention provides a low-bit digital speech vector quantization method and system based on MELP, and the invention provides the following scheme:
performing linear prediction coefficient vector quantization on the adjusted pitch signal by adopting a Mixed Excitation Linear Prediction (MELP) algorithm, wherein the method comprises the following steps: the LSF parameters are quantized by adopting two-stage split vectors, the LSF parameters of the first-stage vector quantization are firstly obtained, and the LSF parameters of the second-stage vector quantization are obtained based on the LSF parameters of the first-stage vector quantization;
and performing digital voice vector quantization by using the LSF parameter after the second-stage vector quantization.
According to another aspect of the present invention, there is also provided a MELP-based low-bit digital speech vector quantization system, comprising:
a coefficient acquisition module: the method is used for performing linear prediction coefficient vector quantization on the adjusted pitch signal by adopting a Mixed Excitation Linear Prediction (MELP) algorithm, and comprises the following steps: the LSF parameters are quantized by adopting two-stage split vectors, the LSF parameters of the first-stage vector quantization are firstly obtained, and the LSF parameters of the second-stage vector quantization are obtained based on the LSF parameters of the first-stage vector quantization;
a quantization module: which is used for digital speech vector quantization using LSF parameters after the second level of vector quantization.
As can be seen from the technical solutions provided above, the embodiments of the present invention provide a method and a system for MELP-based low-bit digital speech vector quantization. The invention uses the mixed excitation linear prediction (MELP) algorithm to perform linear prediction coefficient vector quantization on the adjusted pitch signal, comprising: applying two-stage split vector quantization to the LSF parameters, first obtaining the LSF parameters of the first-stage vector quantization, then obtaining those of the second stage based on the first stage; and performing digital speech vector quantization using the LSF parameters after the second-stage vector quantization. Starting from existing design methods and addressing their shortcomings, the scheme provides a novel MELP-based low-bit digital speech construction method. On the basis of the MELP algorithm, the quantization steps are analyzed, with emphasis on the quantization of the pitch period and of the linear prediction coefficients; an improved method is then proposed for linear prediction coefficient quantization. The two-stage LSF vector quantization scheme reduces the code rate as well as the storage and computational complexity of the codebook, giving it advantages over the original scheme.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive labor.
Fig. 1 is a process flow diagram of a MELP-based low-bit digital speech vector quantization method according to an embodiment of the present invention;
fig. 2 is a block diagram of a MELP-based low-bit digital speech vector quantization system according to a second embodiment of the present invention.
Detailed Description
For the convenience of understanding the embodiments of the present invention, the following description will be further explained by taking several specific embodiments as examples in conjunction with the drawings, and the embodiments are not to be construed as limiting the embodiments of the present invention.
Example one
In the embodiment of the invention, a pitch signal must first be acquired; in this embodiment, obtaining the pitch signal specifically includes:
passing the sampled digital speech signal through a high-pass filter to obtain a filtered signal;
performing an unvoiced/voiced decision on the filtered signal using multi-band mixed excitation, and calculating the gain of the filtered signal to obtain the pitch signal.
Specifically, the filtered signal is divided into a plurality of sub-bands, an unvoiced/voiced decision is made for each, and the sound intensity of each sub-band is labeled as voiced or unvoiced.
The sound intensity of each sub-band is represented by a parameter Vbp_i (i = 1, 2, …, n), where Vbp_i denotes the sound intensity of the i-th sub-band: a value of 1 indicates voiced sound and a value of 0 indicates unvoiced sound.
In this embodiment, every 22.5 ms of speech is used as an analysis frame, corresponding to 180 samples at an 8 kHz sampling rate (8000 samples/s); after processing, 54 bits are output per frame for transmission, giving a rate of 2.4 kbps;
taking the division of the filtered signal into 5 sub-bands as an example, the sub-band parameters are Vbp_i (i = 1, 2, …, 5).
Preferably, the input signal is passed through five 6th-order Butterworth band-pass filters, dividing it into five sub-bands of 0-500 Hz, 500-1000 Hz, 1000-2000 Hz, 2000-3000 Hz, and 3000-4000 Hz. The output of the speech signal filtered by the 0-500 Hz band-pass filter is used for a first fractional pitch estimation, yielding the fractional pitch period P2 and the corresponding autocorrelation function value r(P2); the value of r(P2) determines the lowest-band and overall unvoiced/voiced decision result. A first intensity threshold is set according to the autocorrelation function value r(P2) corresponding to the fractional pitch period P2; in this embodiment its value is 0.6.
When the sound intensity parameter Vbp_1 of the first sub-band is not greater than the first intensity threshold, the current frame is an unvoiced frame, and the remaining band-pass intensities Vbp_i (i = 1, 2, 3, 4, 5) are all quantized and encoded as for an unvoiced frame;
when Vbp_1 is greater than the first intensity threshold, the current frame is a voiced frame, and the remaining band-pass intensities Vbp_i (i = 1, 2, 3, 4, 5) are all quantized and encoded as for a voiced frame.
Concretely, when Vbp_1 ≤ 0.6, the current frame is an unvoiced frame and all remaining band-pass intensities Vbp_i (i = 1, 2, 3, 4, 5) are quantized and coded as 0;
when Vbp_1 > 0.6, the current frame is a voiced frame, Vbp_1 is encoded as 1, and the remaining intensities Vbp_i (i = 2, 3, 4, 5) are quantized accordingly.
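The five-band split described above can be sketched with standard NumPy/SciPy filter design. This is a minimal sketch under stated assumptions: the bands and the 6th-order Butterworth design follow the text, but the exact filter coefficients of the MELP standard are not reproduced, and the topmost band is realized as a high-pass filter because its upper edge coincides with the Nyquist frequency.

```python
import numpy as np
from scipy.signal import butter, lfilter

FS = 8000  # 8 kHz sampling rate, as in the text
BANDS = [(0, 500), (500, 1000), (1000, 2000), (2000, 3000), (3000, 4000)]

def split_bands(x, fs=FS):
    """Split a frame into the 5 sub-bands of the text with 6th-order Butterworth filters."""
    nyq = fs / 2
    out = []
    for lo, hi in BANDS:
        if lo == 0:
            b, a = butter(6, hi / nyq, btype="low")      # 0 Hz lower edge -> low-pass
        elif hi >= nyq:
            b, a = butter(6, lo / nyq, btype="high")     # upper edge at Nyquist -> high-pass
        else:
            b, a = butter(6, [lo / nyq, hi / nyq], btype="band")
        out.append(lfilter(b, a, x))
    return out

# a 200 Hz tone should land almost entirely in the lowest (0-500 Hz) band
x = np.sin(2 * np.pi * 200 * np.arange(180) / FS)   # one 22.5 ms analysis frame
bands = split_bands(x)
energies = [float(np.sum(b ** 2)) for b in bands]
```

A voicing parameter Vbp_i would then be derived per band, e.g. from each band's normalized autocorrelation compared against the 0.6 threshold.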
Before the gain of the filtered signal is calculated, the sampled digital speech signal is subjected to a windowing adjustment. Specifically,
the window length applied to the sampled digital speech signal is adjusted according to the sound intensity parameter of the first sub-band: when the sound intensity of the first sub-band is greater than the first intensity threshold and the smallest multiple of the fractional pitch period P2 exceeding the minimum window length is not greater than the window length threshold, the window length is set to that smallest multiple of P2;
when the sound intensity of the first sub-band is greater than the first intensity threshold and the smallest multiple of the fractional pitch period exceeds the window length threshold, the window length is set to half of that smallest multiple;
when the sound intensity of the first sub-band is less than or equal to the first intensity threshold, a fixed minimum window length is used.
For example, in this embodiment the first intensity threshold is 0.6. When Vbp_1 > 0.6, the window length is set to the smallest multiple of the fractional pitch period P2 greater than 120 samples; as above, a 22.5 ms analysis frame corresponds to 180 samples at an 8 kHz sampling rate, and each processed frame outputs 54 bits for transmission, giving a rate of 2.4 kbps.
Taking a window length threshold of 320 samples as an example, if the computed window length exceeds 320 samples, it is divided by 2.
When Vbp_1 ≤ 0.6, the window length is set to 120 samples.
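The window-length rule above can be sketched as a small function. This is one reading of the text (the exact MELP gain-window rule may differ): voiced frames use the smallest multiple of the fractional pitch period P2 exceeding 120 samples, halved if it exceeds 320; unvoiced frames use a fixed 120-sample window.

```python
def analysis_window_length(vbp1, p2, threshold=0.6, min_len=120, max_len=320):
    """Window length for gain analysis, per the rule described in the text (a sketch)."""
    if vbp1 <= threshold:
        return min_len                 # unvoiced frame: fixed 120-sample window
    length = p2
    while length <= min_len:           # smallest multiple of P2 exceeding min_len
        length += p2
    if length > max_len:               # too long: halve it
        length /= 2
    return length
```

For example, with Vbp_1 = 0.8 and P2 = 50, the multiples 50, 100, 150 are tried and 150 is the first to exceed 120 samples.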
Secondly, linear predictive coding is performed on the pitch signal to obtain a residual signal, the pitch period of the pitch signal is calculated, and the pitch signal is adjusted according to the pitch period and the residual signal to obtain the adjusted pitch signal. Specifically, the method comprises the following steps:
Step A: performing LPC (linear predictive coding) on the sampled digital speech signal of the pitch signal;
in this embodiment, a Hamming window of 200 samples (a 25 ms speech segment) is used to weight the sampled digital speech signal, followed by 10th-order linear predictive coding; the center of the window is the reference point of the current frame.
Step B: obtaining the residual signal after linear predictive coding. The residual signal contains no vocal tract response information but retains the complete excitation information, which reduces the influence of vocal tract characteristics and improves the pitch period estimation;
to obtain the residual signal, the sampled digital speech signal is passed through a linear prediction error filter whose transfer function is
A(z) = 1 - Σ_{i=1}^{10} a_i z^{-i}
where a_i are the linear prediction coefficients. The residual signal is
e(n) = s(n) - Σ_{i=1}^{10} a_i s(n - i), n = 1, 2, …, N
where N is the window length for residual analysis. The linear prediction error filter is an FIR filter whose output is the residual signal.
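The prediction-error filtering can be sketched directly from that definition. A minimal sketch, assuming the common convention A(z) = 1 - Σ a_i z^{-i} and zero signal history before the window:

```python
import numpy as np

def lpc_residual(x, a):
    """Residual e(n) = x(n) - sum_i a_i * x(n-i): output of the FIR
    prediction-error filter A(z) = 1 - sum_i a_i z^-i (sketch)."""
    x = np.asarray(x, dtype=float)
    e = np.copy(x)
    for i, ai in enumerate(a, start=1):
        e[i:] -= ai * x[:-i]           # samples before the window are treated as zero
    return e

# a first-order predictor with a1 = 1 removes a constant signal after the first sample
e = lpc_residual(np.ones(5), [1.0])
```

For speech, a perfect 10th-order predictor would leave only the excitation (pulses plus noise) in e(n), which is why the residual suits pitch estimation.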
Step C, calculating a pitch period of the pitch signal, adjusting the pitch signal according to the pitch period and the residual signal, and obtaining an adjusted pitch signal;
Step C1: for the calculation of the integer pitch period, the sampled digital speech signal is first passed through a 6th-order Butterworth low-pass filter with a cut-off frequency of 1 kHz, eliminating the interference of high-frequency speech components with the pitch period estimation in the parameter analysis. The normalized autocorrelation function r(τ) is defined as
r(τ) = c(0, τ) / sqrt(c(0, 0) · c(τ, τ))
where c(m, n) = Σ_k s(k + m) s(k + n) is the correlation of the low-pass filtered signal.
The integer pitch period equals the lag τ corresponding to the maximum of the normalized autocorrelation function r(τ); from the above formula, the lag attaining max(r(τ)) is taken as the integer pitch period P1.
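The integer pitch search can be sketched as follows. A minimal sketch: the lag range 20-160 samples (50-400 Hz at 8 kHz) and the exact windowing of c(m, n) are assumptions, simplified relative to the MELP standard.

```python
import numpy as np

def integer_pitch(x, tau_min=20, tau_max=160):
    """Integer pitch P1 = argmax over tau of
    r(tau) = c(0,tau) / sqrt(c(0,0) * c(tau,tau))  (sketch)."""
    x = np.asarray(x, dtype=float)
    n = len(x) - tau_max                 # number of products used per lag
    best_tau, best_r = tau_min, -1.0
    for tau in range(tau_min, tau_max + 1):
        c0t = np.dot(x[:n], x[tau:tau + n])
        c00 = np.dot(x[:n], x[:n])
        ctt = np.dot(x[tau:tau + n], x[tau:tau + n])
        r = c0t / np.sqrt(c00 * ctt + 1e-12)
        if r > best_r:
            best_r, best_tau = r, tau
    return best_tau, best_r

# a clean 200 Hz tone at 8 kHz has a 40-sample period
x = np.sin(2 * np.pi * np.arange(400) / 40)
tau, r = integer_pitch(x)
```

Note that for a perfectly periodic signal every multiple of the true period also maximizes r(τ); this is the pitch-doubling ambiguity that the later fractional-pitch and doubling checks address.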
Step C2: the output signal of the first sub-band band-pass filter (0-500 Hz) is Sb1(n), and this signal is mainly used for searching the fractional pitch period. Since passing through the first sub-band filter has already removed the fourth and higher harmonics of the pitch, the effect of higher harmonics on the pitch search is eliminated. Combined with the previously obtained rough integer pitch period P1, the pitch period can then be estimated more accurately. Using the integer pitch periods estimated from the current frame and the previous frame, a search in the range (P1 - 5, P1 + 5) yields P2, and P2 is then used to compute the fractional pitch period; computing the fractional pitch period greatly improves the accuracy of the pitch period estimation. The true pitch period is likely to lie in (P2 - 1, P2) or (P2, P2 + 1); therefore the correlations c_τ(0, P2 - 1) and c_τ(0, P2 + 1), computed with the formula c_τ(m, n), are usually compared. Once the pitch period interval [P, P + 1] is determined, interpolation can be used to determine the fractional pitch period.
In this embodiment, for the fractional pitch period calculation, the fractional pitch extraction uses the output signal of the first band (0-500 Hz) of the band-pass analysis, and the two candidate values are the integer pitch periods of the current frame and the previous frame respectively. Let Δ, with 0 < Δ < 1, be the offset of the actual pitch period from the integer pitch period T; Δ is calculated as
Δ = [c_T(0, T+1) c_T(T, T) - c_T(0, T) c_T(T, T+1)] / [c_T(0, T+1) (c_T(T, T) - c_T(T, T+1)) + c_T(0, T) (c_T(T+1, T+1) - c_T(T, T+1))]
The normalized autocorrelation value for the fractional pitch period is
r = [(1 - Δ) c_T(0, T) + Δ c_T(0, T+1)] / sqrt( c_T(0, 0) [ (1-Δ)² c_T(T, T) + 2Δ(1-Δ) c_T(T, T+1) + Δ² c_T(T+1, T+1) ] )
Setting a = c_T(0, 0); b = c_T(0, T); c = c_T(0, T+1); d = c_T(T, T); e = c_T(T, T+1); f = c_T(T+1, T+1), and substituting these into the two expressions above, yields the fractional pitch period.
Step C3: the final pitch period is computed from the candidate pitch period values of steps C1 and C2 above. P3 denotes the final pitch period estimate and r(P3) the corresponding normalized autocorrelation value. When the autocorrelation value is large (r(P3) ≥ 0.6), the pitch period estimate is considered accurate, and a final pitch-doubling check is performed using the low-pass filtered residual signal to obtain the final pitch period estimate. The pitch period affects the recognition rate of speech recognition and the accuracy of speech compression coding.
When the autocorrelation value is small (r(P3) < 0.6), indicating that the pitch component in the LPC residual signal may be corrupted by noise or that the frame signal is not stationary, the fractional pitch period is searched only in the vicinity, with the sampled digital speech signal used in place of the LPC residual signal, yielding new values of P3 and r(P3).
The processing flow of the low-bit digital speech vector quantization method based on the MELP provided by the embodiment of the invention is shown in fig. 1, and comprises the following processing steps:
Step 11: performing linear prediction coefficient vector quantization on the adjusted pitch signal using the mixed excitation linear prediction (MELP) algorithm, including: applying two-stage split vector quantization to the LSF (line spectral frequency) parameters, first obtaining the LSF parameters of the first-stage vector quantization, then obtaining the LSF parameters of the second-stage vector quantization based on the LSF parameters of the first-stage vector quantization;
specifically, in this embodiment, 5 bits are used for the first-stage vector quantization of the 10-dimensional LSF parameter vector;
the 10-dimensional LSF parameters are then split into the first five dimensions and the last five dimensions, and second-stage vector quantization is applied with a 7-bit codebook for the first five dimensions and a 5-bit codebook for the last five dimensions;
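The two-stage split scheme (5 bits for stage one, then 7 + 5 bits for the split second stage, 17 bits in total) can be sketched as follows. The random Gaussian codebooks here are stand-ins for trained LSF codebooks and are an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in codebooks (in practice these would be trained on LSF data):
cb1 = rng.normal(size=(32, 10))            # stage 1: 5 bits over the full 10-dim vector
cb2_lo = rng.normal(scale=0.3, size=(128, 5))  # stage 2: 7 bits for dims 0-4 of the residual
cb2_hi = rng.normal(scale=0.3, size=(32, 5))   # stage 2: 5 bits for dims 5-9 of the residual

def nearest(v, cb):
    """Index of the codeword closest to v in squared-error distance."""
    return int(((cb - v) ** 2).sum(axis=1).argmin())

def two_stage_split_vq(lsf):
    i1 = nearest(lsf, cb1)                 # first-stage index (5 bits)
    res = lsf - cb1[i1]                    # residual coded by the second stage
    i2a = nearest(res[:5], cb2_lo)         # front split (7 bits)
    i2b = nearest(res[5:], cb2_hi)         # back split (5 bits)
    quantized = cb1[i1] + np.concatenate([cb2_lo[i2a], cb2_hi[i2b]])
    return (i1, i2a, i2b), quantized       # 5 + 7 + 5 = 17 bits total

(i1, i2a, i2b), q = two_stage_split_vq(cb1[3].copy())
```

Splitting the second stage keeps codebook storage and search at 128 + 32 five-dimensional entries instead of the 2^12 ten-dimensional entries a single 12-bit stage would need, which is the storage/complexity saving the text claims.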
Specifically, in this embodiment, 17 bits are used for vector quantization of the LSF parameters of the 2nd and 4th subframes of the adjusted pitch signal;
the LSF parameters of the 1st and 3rd subframes of the adjusted pitch signal are calculated using the following formulas:
l̂1(j) = a1(j) l̂0(j) + [1 - a1(j)] l̂2(j)
l̂3(j) = a2(j) l̂2(j) + [1 - a2(j)] l̂4(j)
j = 1, 2, ..., 9
where l̂1(j) and l̂3(j) are the interpolated LSF parameter values for subframes 1 and 3, l̂0(j) is the quantized LSF value of the last subframe of the previous joint frame, l̂2(j) and l̂4(j) are the quantized LSF values of subframes 2 and 4, and a1(j), a2(j) are the LSF interpolation coefficients, vector-quantized with a 4-bit codebook.
Vector-quantizing a1(j), a2(j) with a 4-bit codebook comprises:
establishing the following objective function of the vector quantization:
E = Σ_j w1(j) |l1(j) - l̂1(j)|² + Σ_j w3(j) |l3(j) - l̂3(j)|²
where w1(j), w3(j) are weighting coefficients and l1(j), l3(j) are the unquantized LSF parameters of the 1st and 3rd subframes.
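The 4-bit coefficient search can be sketched as an exhaustive minimization of that objective. A minimal sketch: `coeff_cb` is a hypothetical (16, 2) codebook of (a1, a2) pairs, and the weights are supplied by the caller; the patent's actual codebook entries and weighting are not reproduced here.

```python
import numpy as np

def best_interp_coeffs(l0q, l2q, l4q, l1, l3, w1, w3, coeff_cb):
    """Exhaustive search over a (16, 2) codebook of (a1, a2) pairs minimizing
    E = sum_j w1|l1 - l1_hat|^2 + sum_j w3|l3 - l3_hat|^2  (sketch)."""
    best_idx, best_e = 0, np.inf
    for idx, (a1, a2) in enumerate(coeff_cb):
        l1_hat = a1 * l0q + (1 - a1) * l2q   # interpolated subframe-1 LSFs
        l3_hat = a2 * l2q + (1 - a2) * l4q   # interpolated subframe-3 LSFs
        e = np.sum(w1 * (l1 - l1_hat) ** 2) + np.sum(w3 * (l3 - l3_hat) ** 2)
        if e < best_e:
            best_e, best_idx = e, idx
    return best_idx, best_e

# if the true coefficients are in the codebook, the search must recover them exactly
rng = np.random.default_rng(1)
l0q, l2q, l4q = rng.normal(size=(3, 10))
cb = np.array([[i / 16.0, (15 - i) / 16.0] for i in range(16)])
cb[5] = [0.25, 0.5]                          # plant a known pair at index 5
l1 = 0.25 * l0q + 0.75 * l2q
l3 = 0.5 * l2q + 0.5 * l4q
w1 = w3 = np.ones(10)
idx, e = best_interp_coeffs(l0q, l2q, l4q, l1, l3, w1, w3, cb)
```

Only the 4-bit index of the winning pair is transmitted, so subframes 1 and 3 cost almost nothing beyond the 17 bits spent on subframes 2 and 4.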
Step 12: performing digital speech vector quantization using the LSF parameters after the second-stage vector quantization.
Specifically, the method further comprises the steps of comparing the LSF parameter after the second-stage vector quantization with the original LSF quantization value by adopting a spectrum distortion index, wherein N represents the spectrum distortion index;
wherein L is the number of fundamental tone harmonics in the subframe, AmlAs original spectral amplitude values, AmrlIs the spectral amplitude value reconstructed after the LSF parameter after the second-stage vector quantization is adopted.
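The distortion index can be sketched directly from that formula. A minimal sketch following the expression as written in the text (an average of squared log-spectral ratios over the L harmonics); the spectral-distortion measure used in practice may normalize differently.

```python
import numpy as np

def spectral_distortion(Am, Amr):
    """N = (1/L) * sum_l [10*lg(|Am_l / Amr_l|^2)]^2 over L pitch harmonics (sketch)."""
    Am = np.asarray(Am, dtype=float)
    Amr = np.asarray(Amr, dtype=float)
    terms = (10.0 * np.log10((Am / Amr) ** 2)) ** 2
    return float(terms.mean())

# identical spectra give zero distortion; a 20 dB amplitude error gives 400
n_zero = spectral_distortion([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
n_err = spectral_distortion([10.0], [1.0])
```

A lower N indicates that the two-stage quantized LSFs reconstruct spectral amplitudes closer to the originals.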
Example two
The embodiment provides a MELP-based low-bit digital speech vector quantization system, and a specific implementation structure of the system is shown in fig. 2, which may specifically include the following modules:
the coefficient acquisition module 21: the method is used for performing linear prediction coefficient vector quantization on the adjusted pitch signal by adopting a Mixed Excitation Linear Prediction (MELP) algorithm, and comprises the following steps: the LSF parameters are quantized by adopting two-stage split vectors, the LSF parameters of the first-stage vector quantization are firstly obtained, and the LSF parameters of the second-stage vector quantization are obtained based on the LSF parameters of the first-stage vector quantization;
the quantization module 23: which is used for digital speech vector quantization using LSF parameters after the second stage of vector quantization.
The specific process of performing digital speech vector quantization by using the system of the embodiment of the present invention is similar to the method embodiment described above, and is not described here again.
In summary, the embodiment of the present invention obtains the adjusted pitch signal and performs linear prediction coefficient vector quantization on it using the mixed excitation linear prediction (MELP) algorithm, including: converting the LPC parameters into line-spectral-frequency (LSF) parameters and applying two-stage split vector quantization to them, namely: performing first-stage vector quantization of the 10-dimensional LSF parameters with 5 bits; then splitting the 10-dimensional LSF parameters into the first five and last five dimensions and performing second-stage vector quantization with a 7-bit codebook for the first five dimensions and a 5-bit codebook for the last five dimensions. Starting from existing design methods and addressing their shortcomings, the scheme provides a novel MELP-based low-bit digital speech construction method. On the basis of the MELP algorithm, the quantization steps are analyzed, with emphasis on the quantization of the pitch period and of the linear prediction coefficients; an improved method is proposed for linear prediction coefficient quantization. The two-stage LSF vector quantization scheme reduces the code rate as well as the storage and computational complexity of the codebook, giving it advantages over the original scheme.
Those of ordinary skill in the art will understand that: the figures are merely schematic representations of one embodiment, and the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The embodiments in this specification are described in a progressive manner; identical or similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the others. In particular, since the apparatus and system embodiments are substantially similar to the method embodiments, they are described relatively briefly, and reference may be made to the corresponding descriptions of the method embodiments. The apparatus and system embodiments described above are merely illustrative: units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiment's solution. Those of ordinary skill in the art can understand and implement this without inventive effort.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for low-bit digital speech vector quantization based on MELP, comprising:
performing linear prediction coefficient vector quantization on the adjusted pitch signal by adopting a Mixed Excitation Linear Prediction (MELP) algorithm, wherein the method comprises the following steps: the LSF parameters are quantized by adopting two-stage split vectors, the LSF parameters of the first-stage vector quantization are firstly obtained, and the LSF parameters of the second-stage vector quantization are obtained based on the LSF parameters of the first-stage vector quantization;
and performing digital voice vector quantization by using the LSF parameter after the second-stage vector quantization.
2. The MELP-based low-bit digital speech vector quantization method according to claim 1, wherein first obtaining the LSF parameters of the first-stage vector quantization and obtaining the LSF parameters of the second-stage vector quantization based on the LSF parameters of the first-stage vector quantization comprises:
carrying out first-stage vector quantization on the LSF parameters by adopting 5 bits to obtain 10-dimensional LSF parameters; and dividing the 10-dimensional LSF parameters into front 5-dimensional LSF parameters and rear 5-dimensional LSF parameters, respectively carrying out second-stage vector quantization on the front 5-dimensional LSF parameters by adopting a 7-bit codebook, and carrying out second-stage vector quantization on the rear 5-dimensional LSF parameters by adopting a 5-bit codebook to obtain the LSF parameters of the second-stage vector quantization.
3. The MELP-based low-bit digital speech vector quantization method according to claim 1, wherein performing linear prediction coefficient vector quantization on the adjusted pitch signal using the mixed excitation linear prediction (MELP) algorithm comprises:
performing vector quantization on the LSF parameters of the 2nd and 4th subframes of the adjusted pitch signal using 17 bits; and
calculating the LSF parameters of the 1st and 3rd subframes of the adjusted pitch signal by the following formulas:
\hat{l}_1(j) = a_1(j)\,\hat{l}_0(j) + [1 - a_1(j)]\,\hat{l}_2(j)
\hat{l}_3(j) = a_2(j)\,\hat{l}_2(j) + [1 - a_2(j)]\,\hat{l}_4(j)
j = 1, 2, \ldots, 9
where \hat{l}_1(j) and \hat{l}_3(j) are the interpolated LSF parameter values for the 1st and 3rd subframes, \hat{l}_0(j) is the quantized LSF value of the last subframe of the previous joint frame, \hat{l}_2(j) and \hat{l}_4(j) are the quantized LSF values of the 2nd and 4th subframes, and a_1(j), a_2(j) are the LSF interpolation coefficients, which are vector quantized using a 4-bit codebook.
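A minimal sketch of the interpolation formulas of claim 3, assuming the quantized subframe LSF vectors and the interpolation coefficients are already available as Python lists of equal length:

```python
def interpolate_lsf(l0_hat, l2_hat, l4_hat, a1, a2):
    """Interpolate subframe-1 and subframe-3 LSFs per the claim-3 formulas.

    l0_hat         : quantized LSFs of the last subframe of the previous frame
    l2_hat, l4_hat : quantized LSFs of subframes 2 and 4
    a1, a2         : per-dimension interpolation coefficients (4-bit codebook)
    """
    l1_hat = [a1[j] * l0_hat[j] + (1 - a1[j]) * l2_hat[j] for j in range(len(a1))]
    l3_hat = [a2[j] * l2_hat[j] + (1 - a2[j]) * l4_hat[j] for j in range(len(a2))]
    return l1_hat, l3_hat
```

With a1[j] = 1 the 1st subframe copies the previous frame's last subframe; with a1[j] = 0 it copies subframe 2.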
4. The MELP-based low-bit digital speech vector quantization method according to claim 3, wherein said a_1(j), a_2(j) are vector quantized using a 4-bit codebook by:
establishing the following objective function for the vector quantization:
E = \sum_{j=0}^{9} w_1(j)\,|l_1(j) - \hat{l}_1(j)|^2 + \sum_{j=0}^{9} w_3(j)\,|l_3(j) - \hat{l}_3(j)|^2
where w_1(j), w_3(j) are weighting coefficients and l_1(j), l_3(j) are the unquantized LSF parameters of the 1st and 3rd subframes.
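The objective function E of claim 4 can be evaluated as follows; in an encoder it would be minimized over the 16 entries of the 4-bit coefficient codebook, a search this sketch does not model:

```python
def interpolation_error(l1, l3, l1_hat, l3_hat, w1, w3):
    """Weighted squared error E of claim 4, used to select the 4-bit
    interpolation-coefficient codeword."""
    e = sum(w1[j] * (l1[j] - l1_hat[j]) ** 2 for j in range(len(l1)))
    e += sum(w3[j] * (l3[j] - l3_hat[j]) ** 2 for j in range(len(l3)))
    return e
```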
5. The MELP-based low-bit digital speech vector quantization method according to claim 4, wherein the LSF parameters after the second-stage vector quantization are compared with the original LSF quantization values using a spectral distortion index:
N = \frac{1}{L} \sum_{l=1}^{L} \left[ 10 \lg \left| A_m^l / A_{mr}^l \right|^2 \right]^2
where L is the number of pitch harmonics in the subframe, A_m^l is the original spectral amplitude value, and A_{mr}^l is the spectral amplitude value reconstructed from the LSF parameters after the second-stage vector quantization.
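The spectral distortion index of claim 5, taken literally from the formula above (with lg read as log base 10):

```python
import math

def spectral_distortion(amp_orig, amp_rec):
    """Spectral distortion index N over L pitch harmonics (claim-5 formula).

    amp_orig : original spectral amplitude values A_m^l
    amp_rec  : amplitudes reconstructed from the second-stage-quantized LSFs
    """
    L = len(amp_orig)
    return sum((10.0 * math.log10((a / ar) ** 2)) ** 2
               for a, ar in zip(amp_orig, amp_rec)) / L
```

Identical amplitude sets give N = 0; a uniform factor-of-10 amplitude error gives N = 400.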
6. The MELP-based low-bit digital speech vector quantization method according to claim 1, wherein obtaining the adjusted pitch signal comprises:
passing the sampled digital speech signal through a high-pass filter to obtain a filtered signal;
performing an unvoiced/voiced decision on the filtered signal using multi-band mixed excitation, and calculating the gain of the filtered signal to obtain the pitch signal; and
performing linear predictive coding on the pitch signal to obtain a residual signal, calculating the pitch period of the pitch signal, and adjusting the pitch signal according to the pitch period and the residual signal to obtain the adjusted pitch signal.
7. The MELP-based low-bit digital speech vector quantization method according to claim 6, wherein performing the unvoiced/voiced decision on the filtered signal using multi-band mixed excitation comprises:
dividing the filtered signal into a plurality of sub-bands, making an unvoiced/voiced decision for each sub-band, and labeling each sub-band's voicing strength, the voicing strength of the i-th sub-band being represented by the parameter Vbpi (i = 1, 2, ..., n);
setting a first intensity threshold according to the autocorrelation function value r(P_2) corresponding to the fractional pitch period P_2; when the voicing strength parameter Vbp1 of the first sub-band is not greater than the first intensity threshold, the current frame is an unvoiced frame, and all the remaining band-pass voicing strengths Vbpi (i = 1, 2, 3, 4, 5) are quantized and encoded as an unvoiced frame; and
when the voicing strength parameter Vbp1 of the first sub-band is greater than the first intensity threshold, the current frame is a voiced frame, and all the remaining band-pass voicing strengths Vbpi (i = 1, 2, 3, 4, 5) are quantized and encoded as a voiced frame.
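A sketch of the frame-level decision in claim 7. The threshold is assumed to be precomputed from r(P_2), and clearing the band strengths in the unvoiced case is an illustrative assumption; the claim only states that unvoiced-frame quantization coding is applied:

```python
def voicing_decision(vbp, threshold):
    """Frame-level unvoiced/voiced decision driven by the first sub-band
    strength Vbp1 (claim 7). Returns the frame type and the band strengths
    passed on to quantization coding."""
    if vbp[0] > threshold:
        return "voiced", list(vbp)        # encode all bands as a voiced frame
    return "unvoiced", [0.0] * len(vbp)   # encode all bands as an unvoiced frame
```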
8. The MELP-based low-bit digital speech vector quantization method according to claim 6, wherein calculating the gain of the filtered signal comprises:
when the voicing strength of the first sub-band is greater than the first intensity threshold and the smallest factor product of the fractional pitch period P2 is not greater than the window-length threshold, adjusting the window length to be greater than the smallest factor product of the fractional pitch period P2;
when the voicing strength of the first sub-band is greater than the first intensity threshold and the smallest factor product of the fractional pitch period P2 is greater than the window-length threshold, adjusting the window length to be half of the smallest factor product of the fractional pitch period; and
when the voicing strength of the first sub-band is less than or equal to the first intensity threshold, setting the window length equal to the smallest factor product of the fractional pitch period.
9. The MELP-based low-bit digital speech vector quantization method according to claim 6, wherein performing linear predictive coding on the pitch signal to obtain a residual signal comprises: passing the sampled digital speech signal through a linear prediction error filter with the transfer function:
H(z) = 1 - \sum_{i=1}^{10} a_i \, z^{-i}
where a_i are the linear prediction coefficients, so that the residual signal is:
r(n) = s(n) - \sum_{i=1}^{10} a_i \, s(n - i)
where n ranges over the residual analysis window; the linear prediction error filter is an FIR filter, and its output is the residual signal.
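The prediction error filter of claim 9 is a plain FIR filter; a direct implementation of the residual formula, with samples before the start of the signal taken as zero (a boundary assumption of this sketch):

```python
def lpc_residual(s, a):
    """Residual r(n) = s(n) - sum_{i=1}^{p} a_i * s(n - i): the output of the
    FIR prediction error filter H(z) = 1 - sum a_i z^{-i} applied to s.
    Samples with negative index are treated as zero."""
    p = len(a)
    return [s[n] - sum(a[i - 1] * s[n - i] for i in range(1, p + 1) if n - i >= 0)
            for n in range(len(s))]
```

With all-zero coefficients the residual is the signal itself; a perfect one-tap predictor on a constant signal leaves a residual of zero after the first sample.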
10. A MELP-based low-bit digital speech vector quantization system, comprising:
a coefficient acquisition module, configured to perform linear prediction coefficient vector quantization on the adjusted pitch signal using the mixed excitation linear prediction (MELP) algorithm, comprising: quantizing the LSF parameters with two-stage split vector quantization, wherein the LSF parameters of the first-stage vector quantization are obtained first, and the LSF parameters of the second-stage vector quantization are obtained based on the LSF parameters of the first-stage vector quantization; and
a quantization module, configured to perform digital speech vector quantization using the LSF parameters after the second-stage vector quantization.
CN201511005800.3A 2015-12-29 2015-12-29 A kind of low bit digital speech vector quantization method and system based on MELP Pending CN106935243A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201511005800.3A CN106935243A (en) 2015-12-29 2015-12-29 A kind of low bit digital speech vector quantization method and system based on MELP


Publications (1)

Publication Number Publication Date
CN106935243A true CN106935243A (en) 2017-07-07

Family

ID=59458182

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201511005800.3A Pending CN106935243A (en) 2015-12-29 2015-12-29 A kind of low bit digital speech vector quantization method and system based on MELP

Country Status (1)

Country Link
CN (1) CN106935243A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040153317A1 (en) * 2003-01-31 2004-08-05 Chamberlain Mark W. 600 Bps mixed excitation linear prediction transcoding
CN101114450A (en) * 2007-07-20 2008-01-30 华中科技大学 Speech encoding selectivity encipher method
CN101281750A (en) * 2008-05-29 2008-10-08 上海交通大学 Expanding encoding and decoding system based on vector quantization high-order code book of variable splitting table
CN101937680A (en) * 2010-08-27 2011-01-05 太原理工大学 Vector quantization method for sorting and rearranging code book and vector quantizer thereof
CN103050122A (en) * 2012-12-18 2013-04-17 北京航空航天大学 MELP-based (Mixed Excitation Linear Prediction-based) multi-frame joint quantization low-rate speech coding and decoding method


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
王国文 et al.: Proceedings of the 16th National Youth Communication Academic Conference (Part I), 31 December 2011 *
王国文: "Research on an Improved Speech Compression Algorithm for Voice Encryption Machines", China Master's Theses Full-text Database, Information Science and Technology series *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109256143A (en) * 2018-09-21 2019-01-22 西安蜂语信息科技有限公司 Speech parameter quantization method, device, computer equipment and storage medium
CN111818519A (en) * 2020-07-16 2020-10-23 郑州信大捷安信息技术股份有限公司 End-to-end voice encryption and decryption method and system
CN111818519B (en) * 2020-07-16 2022-02-11 郑州信大捷安信息技术股份有限公司 End-to-end voice encryption and decryption method and system

Similar Documents

Publication Publication Date Title
CN105825861B (en) Apparatus and method for determining weighting function, and quantization apparatus and method
KR101698905B1 (en) Apparatus and method for encoding and decoding an audio signal using an aligned look-ahead portion
EP3132443B1 (en) Methods, encoder and decoder for linear predictive encoding and decoding of sound signals upon transition between frames having different sampling rates
JP2002023800A (en) Multi-mode sound encoder and decoder
JP6395612B2 (en) System and method for mixed codebook excitation for speech coding
CN103050121A (en) Linear prediction speech coding method and speech synthesis method
EP1141946A1 (en) Coded enhancement feature for improved performance in coding communication signals
US20040111257A1 (en) Transcoding apparatus and method between CELP-based codecs using bandwidth extension
WO2009125588A1 (en) Encoding device and encoding method
US20040153317A1 (en) 600 Bps mixed excitation linear prediction transcoding
CN106935243A (en) A kind of low bit digital speech vector quantization method and system based on MELP
JP6042900B2 (en) Method and apparatus for band-selective quantization of speech signal
Tanaka et al. Low-bit-rate speech coding using a two-dimensional transform of residual signals and waveform interpolation
JP4578145B2 (en) Speech coding apparatus, speech decoding apparatus, and methods thereof
KR101857799B1 (en) Apparatus and method for determining weighting function having low complexity for lpc coefficients quantization
JP3185748B2 (en) Signal encoding device
KR101997897B1 (en) Apparatus and method for determining weighting function having low complexity for lpc coefficients quantization
KR20100006491A (en) Method and apparatus for encoding and decoding silence signal
Girin Long-term quantization of speech LSF parameters
WO2011048810A1 (en) Vector quantisation device and vector quantisation method
JP3715417B2 (en) Audio compression encoding apparatus, audio compression encoding method, and computer-readable recording medium storing a program for causing a computer to execute each step of the method
Ozaydin Residual Lsf Vector Quantization Using Arma Prediction
JP3144244B2 (en) Audio coding device
Saleem et al. Implementation of Low Complexity CELP Coder and Performance Evaluation in terms of Speech Quality
JP2005062410A (en) Method for encoding speech signal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170707