CN106935243A - A kind of low bit digital speech vector quantization method and system based on MELP - Google Patents


Info

Publication number
CN106935243A
Authority
CN
China
Prior art keywords
vector quantization
lsf
signal
melp
lsf parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201511005800.3A
Other languages
Chinese (zh)
Inventor
王国文
罗世新
何丽
张盼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aisino Corp
Original Assignee
Aisino Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aisino Corp filed Critical Aisino Corp
Priority to CN201511005800.3A priority Critical patent/CN106935243A/en
Publication of CN106935243A publication Critical patent/CN106935243A/en
Pending legal-status Critical Current


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02: Speech or audio signals analysis-synthesis techniques using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/032: Quantisation or dequantisation of spectral components
    • G10L19/038: Vector quantisation, e.g. TwinVQ audio
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: Speech or audio signals analysis-synthesis techniques using predictive techniques
    • G10L19/08: Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/087: Determination or coding of the excitation function using mixed excitation models, e.g. MELP, MBE, split band LPC or HVXC

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The embodiment of the invention provides a low-bit digital speech vector quantization method and system based on MELP. The invention performs linear prediction coefficient vector quantization on the adjusted pitch signal using the mixed excitation linear prediction (MELP) algorithm, including: applying two-stage split vector quantization to the LSF parameters, first obtaining the LSF parameters of the first-stage vector quantization, then obtaining the LSF parameters of the second-stage vector quantization based on those of the first stage; and performing digital speech vector quantization using the LSF parameters after the second-stage vector quantization. On the basis of the MELP algorithm, the invention adopts a two-stage LSF vector quantization scheme, reducing the code rate and reducing the storage and computational complexity of the codebook.

Description

Low-bit digital speech vector quantization method and system based on MELP
Technical Field
The invention relates to the technical field of signal processing, in particular to a low-bit digital speech vector quantization method based on MELP.
Background
At present, research on low-bit digital speech compression algorithms is increasingly mature. Among low-bit digital speech algorithms, the mixed excitation linear prediction (MELP) algorithm has its own specific advantages: 2.4 kbps MELP is a speech generation model that better matches human pronunciation, built on linear predictive coding (LPC) and combining the advantages of coding methods such as mixed excitation, multi-band excitation, and prototype waveform interpolation for synthesizing speech. The MELP algorithm is characterized by multi-band mixed excitation, aperiodic pulses, residual harmonic processing, adaptive spectral enhancement, and pulse shaping filtering.
In view of the above, the prior art generally proposes recognition-synthesis vocoders that encode a speech signal using speech recognition and synthesis techniques, with speech primitives as the coding units, to reduce the coding rate below 1 kb/s. In addition, based on 2.4 kb/s linear predictive coding (LPC), the speech data can be further compressed using vector quantization and the inter-frame correlation of speech. Vector quantization treats a group of scalar data as a vector and quantizes the vector space as a whole, compressing the data without losing much information; the efficiency of the vector quantization determines the efficiency of the encoder. In parameter quantization for low-rate coding, the line spectrum pair (LSP) parameters occupy a relatively high number of bits, so improving the LSP parameter quantization method would significantly reduce the coding rate. Because adjacent frames of a speech signal are highly correlated, especially in the stationary phase of speech, the coding rate is greatly reduced if the speech parameters are transmitted only every other frame. It has therefore also been proposed to further reduce the number of bits for parameter quantization by exploiting inter-frame correlation, i.e. to encode several consecutive frames as one super-frame and vector-quantize the super-frame parameters as a whole to compress inter-frame redundancy. A piecewise quantization method with variable segment length has also been proposed, which treats the input speech as segments of variable length, each segment consisting of one or several frames, with each frame represented by parameters such as gain, pitch, and spectrum.
Although such methods are complex to implement, they can greatly reduce the coding rate, shorten the coding delay, and produce synthesized speech of higher quality.
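The core vector-quantization operation described above (grouping scalars into a vector and replacing it with the nearest codeword of a codebook) can be sketched as follows. This is a minimal illustration, not the patent's trained codebook; the tiny 2-D codebook here is an assumption for demonstration only.

```python
import numpy as np

def vq_encode(vectors, codebook):
    """Return the index of the nearest codeword (squared-error distance) for each vector."""
    # pairwise squared distances, shape (n_vectors, n_codewords)
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

# toy 1-bit codebook over 2-dimensional vectors (stand-in for a trained LSF codebook)
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
x = np.array([[0.1, -0.1], [0.9, 1.2]])

idx = vq_encode(x, codebook)   # transmitted indices
reconstructed = codebook[idx]  # decoder side: a simple table lookup
```

Only the indices are transmitted, which is where the bit savings over scalar quantization come from.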
Disclosure of Invention
The embodiment of the invention provides a low-bit digital speech vector quantization method and system based on MELP, and the invention provides the following scheme:
performing linear prediction coefficient vector quantization on the adjusted pitch signal by adopting a Mixed Excitation Linear Prediction (MELP) algorithm, wherein the method comprises the following steps: the LSF parameters are quantized by adopting two-stage split vectors, the LSF parameters of the first-stage vector quantization are firstly obtained, and the LSF parameters of the second-stage vector quantization are obtained based on the LSF parameters of the first-stage vector quantization;
and performing digital voice vector quantization by using the LSF parameter after the second-stage vector quantization.
According to another aspect of the present invention, there is also provided a MELP-based low-bit digital speech vector quantization system, comprising:
a coefficient acquisition module: the method is used for performing linear prediction coefficient vector quantization on the adjusted pitch signal by adopting a Mixed Excitation Linear Prediction (MELP) algorithm, and comprises the following steps: the LSF parameters are quantized by adopting two-stage split vectors, the LSF parameters of the first-stage vector quantization are firstly obtained, and the LSF parameters of the second-stage vector quantization are obtained based on the LSF parameters of the first-stage vector quantization;
a quantization module: which is used for digital speech vector quantization using LSF parameters after the second level of vector quantization.
As can be seen from the technical solutions provided above, the embodiments of the present invention provide a method and a system for MELP-based low-bit digital speech vector quantization. The invention uses the mixed excitation linear prediction (MELP) algorithm to perform linear prediction coefficient vector quantization on the adjusted pitch signal, comprising: applying two-stage split vector quantization to the LSF parameters, first obtaining the LSF parameters of the first-stage vector quantization, then obtaining those of the second stage based on the first stage; and performing digital speech vector quantization using the LSF parameters after the second-stage vector quantization. Starting from existing design methods and addressing their shortcomings, the scheme provides a novel MELP-based low-bit digital speech construction method. On the basis of the MELP algorithm, the quantization steps are analyzed, with emphasis on the quantization of the pitch period and of the linear prediction coefficients; an improved method is then proposed for linear prediction coefficient quantization. The two-stage LSF vector quantization scheme reduces the code rate as well as the storage and computational complexity of the codebook, giving it advantages over the original scheme.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive labor.
Fig. 1 is a process flow diagram of a MELP-based low-bit digital speech vector quantization method according to an embodiment of the present invention;
fig. 2 is a block diagram of a MELP-based low-bit digital speech vector quantization system according to a second embodiment of the present invention.
Detailed Description
For the convenience of understanding the embodiments of the present invention, the following description will be further explained by taking several specific embodiments as examples in conjunction with the drawings, and the embodiments are not to be construed as limiting the embodiments of the present invention.
Example one
In the embodiment of the invention, a pitch signal must first be acquired; in this embodiment, obtaining the pitch signal specifically includes:
passing the sampled digital speech signal through a high-pass filter to obtain a filtered signal;
performing an unvoiced/voiced decision on the filtered signal using multi-band mixed excitation, and calculating the gain of the filtered signal to obtain the pitch signal.
Specifically, the filtered signal is divided into a plurality of sub-bands, an unvoiced/voiced decision is made for each, and the sound intensity of each sub-band is labeled as voiced or unvoiced.
The sound intensity of each sub-band is represented by a parameter Vbp_i (i = 1, 2, …, n), where Vbp_i denotes the sound intensity of the i-th sub-band: a value of 1 indicates voiced sound and a value of 0 indicates unvoiced sound.
In this embodiment, every 22.5 ms of speech is used as an analysis frame, corresponding to 180 samples at an 8 kHz sampling rate (8000 samples/s); after processing, 54 bits are output per frame for transmission, giving a rate of 2.4 kbps;
taking the division of the filtered signal into 5 sub-bands as an example, the sub-band parameters are Vbp_i (i = 1, 2, …, 5).
Preferably, the input signal is passed through five 6th-order Butterworth band-pass filters, dividing it into five sub-bands of 0-500 Hz, 500-1000 Hz, 1000-2000 Hz, 2000-3000 Hz, and 3000-4000 Hz. The output of the speech signal filtered by the 0-500 Hz band-pass filter is used for a first fractional pitch estimation, yielding the fractional pitch period P2 and the corresponding autocorrelation function value r(P2); the value of r(P2) determines the lowest-band and overall unvoiced/voiced decision result. A first intensity threshold is set according to the autocorrelation function value r(P2) corresponding to the fractional pitch period P2; in this embodiment its value is 0.6.
When the sound intensity parameter Vbp_1 of the first sub-band is not greater than the first intensity threshold, the current frame is an unvoiced frame, and the remaining band-pass intensities Vbp_i (i = 1, 2, 3, 4, 5) are all quantized and encoded as for an unvoiced frame;
when Vbp_1 is greater than the first intensity threshold, the current frame is a voiced frame, and the remaining band-pass intensities Vbp_i (i = 1, 2, 3, 4, 5) are all quantized and encoded as for a voiced frame.
Concretely, when Vbp_1 ≤ 0.6, the current frame is an unvoiced frame and all remaining band-pass intensities Vbp_i (i = 1, 2, 3, 4, 5) are quantized and coded as 0;
when Vbp_1 > 0.6, the current frame is a voiced frame, Vbp_1 is encoded as 1, and the remaining intensities Vbp_i (i = 2, 3, 4, 5) are quantized accordingly.
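The five-band split described above can be sketched with standard NumPy/SciPy filter design. This is a minimal sketch under stated assumptions: the bands and the 6th-order Butterworth design follow the text, but the exact filter coefficients of the MELP standard are not reproduced, and the topmost band is realized as a high-pass filter because its upper edge coincides with the Nyquist frequency.

```python
import numpy as np
from scipy.signal import butter, lfilter

FS = 8000  # 8 kHz sampling rate, as in the text
BANDS = [(0, 500), (500, 1000), (1000, 2000), (2000, 3000), (3000, 4000)]

def split_bands(x, fs=FS):
    """Split a frame into the 5 sub-bands of the text with 6th-order Butterworth filters."""
    nyq = fs / 2
    out = []
    for lo, hi in BANDS:
        if lo == 0:
            b, a = butter(6, hi / nyq, btype="low")      # 0 Hz lower edge -> low-pass
        elif hi >= nyq:
            b, a = butter(6, lo / nyq, btype="high")     # upper edge at Nyquist -> high-pass
        else:
            b, a = butter(6, [lo / nyq, hi / nyq], btype="band")
        out.append(lfilter(b, a, x))
    return out

# a 200 Hz tone should land almost entirely in the lowest (0-500 Hz) band
x = np.sin(2 * np.pi * 200 * np.arange(180) / FS)   # one 22.5 ms analysis frame
bands = split_bands(x)
energies = [float(np.sum(b ** 2)) for b in bands]
```

A voicing parameter Vbp_i would then be derived per band, e.g. from each band's normalized autocorrelation compared against the 0.6 threshold.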
Before the gain of the filtered signal is calculated, the sampled digital speech signal is subjected to a windowing adjustment. Specifically,
the window length applied to the sampled digital speech signal is adjusted according to the sound intensity parameter of the first sub-band: when the sound intensity of the first sub-band is greater than the first intensity threshold and the smallest multiple of the fractional pitch period P2 exceeding the minimum window length is not greater than the window length threshold, the window length is set to that smallest multiple of P2;
when the sound intensity of the first sub-band is greater than the first intensity threshold and the smallest multiple of the fractional pitch period exceeds the window length threshold, the window length is set to half of that smallest multiple;
when the sound intensity of the first sub-band is less than or equal to the first intensity threshold, a fixed minimum window length is used.
For example, in this embodiment the first intensity threshold is 0.6. When Vbp_1 > 0.6, the window length is set to the smallest multiple of the fractional pitch period P2 greater than 120 samples; as above, a 22.5 ms analysis frame corresponds to 180 samples at an 8 kHz sampling rate, and each processed frame outputs 54 bits for transmission, giving a rate of 2.4 kbps.
Taking a window length threshold of 320 samples as an example, if the computed window length exceeds 320 samples, it is divided by 2.
When Vbp_1 ≤ 0.6, the window length is set to 120 samples.
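The window-length rule above can be sketched as a small function. This is one reading of the text (the exact MELP gain-window rule may differ): voiced frames use the smallest multiple of the fractional pitch period P2 exceeding 120 samples, halved if it exceeds 320; unvoiced frames use a fixed 120-sample window.

```python
def analysis_window_length(vbp1, p2, threshold=0.6, min_len=120, max_len=320):
    """Window length for gain analysis, per the rule described in the text (a sketch)."""
    if vbp1 <= threshold:
        return min_len                 # unvoiced frame: fixed 120-sample window
    length = p2
    while length <= min_len:           # smallest multiple of P2 exceeding min_len
        length += p2
    if length > max_len:               # too long: halve it
        length /= 2
    return length
```

For example, with Vbp_1 = 0.8 and P2 = 50, the multiples 50, 100, 150 are tried and 150 is the first to exceed 120 samples.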
Secondly, linear predictive coding is performed on the pitch signal to obtain a residual signal, the pitch period of the pitch signal is calculated, and the pitch signal is adjusted according to the pitch period and the residual signal to obtain the adjusted pitch signal. Specifically, the method comprises the following steps:
Step A: performing LPC (linear predictive coding) on the sampled digital speech signal of the pitch signal;
in this embodiment, a Hamming window of 200 samples (a 25 ms speech segment) is used to weight the sampled digital speech signal, followed by 10th-order linear predictive coding; the center of the window is the reference point of the current frame.
Step B: obtaining the residual signal after linear predictive coding. The residual signal contains no vocal tract response information but retains the complete excitation information, which reduces the influence of vocal tract characteristics and improves the pitch period estimation;
to obtain the residual signal, the sampled digital speech signal is passed through a linear prediction error filter whose transfer function is
A(z) = 1 - Σ_{i=1}^{10} a_i z^{-i}
where a_i are the linear prediction coefficients. The residual signal is
e(n) = s(n) - Σ_{i=1}^{10} a_i s(n - i), n = 1, 2, …, N
where N is the window length for residual analysis. The linear prediction error filter is an FIR filter whose output is the residual signal.
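The prediction-error filtering can be sketched directly from that definition. A minimal sketch, assuming the common convention A(z) = 1 - Σ a_i z^{-i} and zero signal history before the window:

```python
import numpy as np

def lpc_residual(x, a):
    """Residual e(n) = x(n) - sum_i a_i * x(n-i): output of the FIR
    prediction-error filter A(z) = 1 - sum_i a_i z^-i (sketch)."""
    x = np.asarray(x, dtype=float)
    e = np.copy(x)
    for i, ai in enumerate(a, start=1):
        e[i:] -= ai * x[:-i]           # samples before the window are treated as zero
    return e

# a first-order predictor with a1 = 1 removes a constant signal after the first sample
e = lpc_residual(np.ones(5), [1.0])
```

For speech, a perfect 10th-order predictor would leave only the excitation (pulses plus noise) in e(n), which is why the residual suits pitch estimation.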
Step C, calculating a pitch period of the pitch signal, adjusting the pitch signal according to the pitch period and the residual signal, and obtaining an adjusted pitch signal;
Step C1: for the calculation of the integer pitch period, the sampled digital speech signal is first passed through a 6th-order Butterworth low-pass filter with a cut-off frequency of 1 kHz, eliminating the interference of high-frequency speech components with the pitch period estimation in the parameter analysis. The normalized autocorrelation function r(τ) is defined as
r(τ) = c(0, τ) / sqrt(c(0, 0) · c(τ, τ))
where c(m, n) = Σ_k s(k + m) s(k + n) is the correlation of the low-pass filtered signal.
The integer pitch period equals the lag τ corresponding to the maximum of the normalized autocorrelation function r(τ); from the above formula, the lag attaining max(r(τ)) is taken as the integer pitch period P1.
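The integer pitch search can be sketched as follows. A minimal sketch: the lag range 20-160 samples (50-400 Hz at 8 kHz) and the exact windowing of c(m, n) are assumptions, simplified relative to the MELP standard.

```python
import numpy as np

def integer_pitch(x, tau_min=20, tau_max=160):
    """Integer pitch P1 = argmax over tau of
    r(tau) = c(0,tau) / sqrt(c(0,0) * c(tau,tau))  (sketch)."""
    x = np.asarray(x, dtype=float)
    n = len(x) - tau_max                 # number of products used per lag
    best_tau, best_r = tau_min, -1.0
    for tau in range(tau_min, tau_max + 1):
        c0t = np.dot(x[:n], x[tau:tau + n])
        c00 = np.dot(x[:n], x[:n])
        ctt = np.dot(x[tau:tau + n], x[tau:tau + n])
        r = c0t / np.sqrt(c00 * ctt + 1e-12)
        if r > best_r:
            best_r, best_tau = r, tau
    return best_tau, best_r

# a clean 200 Hz tone at 8 kHz has a 40-sample period
x = np.sin(2 * np.pi * np.arange(400) / 40)
tau, r = integer_pitch(x)
```

Note that for a perfectly periodic signal every multiple of the true period also maximizes r(τ); this is the pitch-doubling ambiguity that the later fractional-pitch and doubling checks address.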
Step C2: the output signal of the first sub-band band-pass filter (0-500 Hz) is Sb1(n), and this signal is mainly used for searching the fractional pitch period. Since passing through the first sub-band filter has already removed the fourth and higher harmonics of the pitch, the effect of higher harmonics on the pitch search is eliminated. Combined with the previously obtained rough integer pitch period P1, the pitch period can then be estimated more accurately. Using the integer pitch periods estimated from the current frame and the previous frame, a search in the range (P1 - 5, P1 + 5) yields P2, and P2 is then used to compute the fractional pitch period; computing the fractional pitch period greatly improves the accuracy of the pitch period estimation. The true pitch period is likely to lie in (P2 - 1, P2) or (P2, P2 + 1); therefore the correlations c_τ(0, P2 - 1) and c_τ(0, P2 + 1), computed with the formula c_τ(m, n), are usually compared. Once the pitch period interval [P, P + 1] is determined, interpolation can be used to determine the fractional pitch period.
In this embodiment, for the fractional pitch period calculation, the fractional pitch extraction uses the output signal of the first band (0-500 Hz) of the band-pass analysis, and the two candidate values are the integer pitch periods of the current frame and the previous frame respectively. Let Δ, with 0 < Δ < 1, be the offset of the actual pitch period from the integer pitch period T; Δ is calculated as
Δ = [c_T(0, T+1) c_T(T, T) - c_T(0, T) c_T(T, T+1)] / [c_T(0, T+1) (c_T(T, T) - c_T(T, T+1)) + c_T(0, T) (c_T(T+1, T+1) - c_T(T, T+1))]
The normalized autocorrelation value for the fractional pitch period is
r = [(1 - Δ) c_T(0, T) + Δ c_T(0, T+1)] / sqrt( c_T(0, 0) [ (1-Δ)² c_T(T, T) + 2Δ(1-Δ) c_T(T, T+1) + Δ² c_T(T+1, T+1) ] )
Setting a = c_T(0, 0); b = c_T(0, T); c = c_T(0, T+1); d = c_T(T, T); e = c_T(T, T+1); f = c_T(T+1, T+1), and substituting these into the two expressions above, yields the fractional pitch period.
Step C3: the final pitch period is computed from the candidate pitch period values of steps C1 and C2 above. P3 denotes the final pitch period estimate and r(P3) the corresponding normalized autocorrelation value. When the autocorrelation value is large (r(P3) ≥ 0.6), the pitch period estimate is considered accurate, and a final pitch-doubling check is performed using the low-pass filtered residual signal to obtain the final pitch period estimate. The pitch period affects the recognition rate of speech recognition and the accuracy of speech compression coding.
When the autocorrelation value is small (r(P3) < 0.6), indicating that the pitch component in the LPC residual signal may be corrupted by noise or that the frame signal is not stationary, the fractional pitch period is searched only in the vicinity, with the sampled digital speech signal used in place of the LPC residual signal, yielding new values of P3 and r(P3).
The processing flow of the low-bit digital speech vector quantization method based on the MELP provided by the embodiment of the invention is shown in fig. 1, and comprises the following processing steps:
Step 11: performing linear prediction coefficient vector quantization on the adjusted pitch signal using the mixed excitation linear prediction (MELP) algorithm, including: applying two-stage split vector quantization to the LSF (line spectral frequency) parameters, first obtaining the LSF parameters of the first-stage vector quantization, then obtaining the LSF parameters of the second-stage vector quantization based on the LSF parameters of the first-stage vector quantization;
specifically, in this embodiment, 5 bits are used for the first-stage vector quantization of the 10-dimensional LSF parameter vector;
the 10-dimensional LSF parameters are then split into the first five dimensions and the last five dimensions, and second-stage vector quantization is applied with a 7-bit codebook for the first five dimensions and a 5-bit codebook for the last five dimensions;
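The two-stage split scheme (5 bits for stage one, then 7 + 5 bits for the split second stage, 17 bits in total) can be sketched as follows. The random Gaussian codebooks here are stand-ins for trained LSF codebooks and are an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in codebooks (in practice these would be trained on LSF data):
cb1 = rng.normal(size=(32, 10))            # stage 1: 5 bits over the full 10-dim vector
cb2_lo = rng.normal(scale=0.3, size=(128, 5))  # stage 2: 7 bits for dims 0-4 of the residual
cb2_hi = rng.normal(scale=0.3, size=(32, 5))   # stage 2: 5 bits for dims 5-9 of the residual

def nearest(v, cb):
    """Index of the codeword closest to v in squared-error distance."""
    return int(((cb - v) ** 2).sum(axis=1).argmin())

def two_stage_split_vq(lsf):
    i1 = nearest(lsf, cb1)                 # first-stage index (5 bits)
    res = lsf - cb1[i1]                    # residual coded by the second stage
    i2a = nearest(res[:5], cb2_lo)         # front split (7 bits)
    i2b = nearest(res[5:], cb2_hi)         # back split (5 bits)
    quantized = cb1[i1] + np.concatenate([cb2_lo[i2a], cb2_hi[i2b]])
    return (i1, i2a, i2b), quantized       # 5 + 7 + 5 = 17 bits total

(i1, i2a, i2b), q = two_stage_split_vq(cb1[3].copy())
```

Splitting the second stage keeps codebook storage and search at 128 + 32 five-dimensional entries instead of the 2^12 ten-dimensional entries a single 12-bit stage would need, which is the storage/complexity saving the text claims.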
Specifically, in this embodiment, 17 bits are used for vector quantization of the LSF parameters of the 2nd and 4th subframes of the adjusted pitch signal;
the LSF parameters of the 1st and 3rd subframes of the adjusted pitch signal are calculated using the following formulas:
l̂1(j) = a1(j) l̂0(j) + [1 - a1(j)] l̂2(j)
l̂3(j) = a2(j) l̂2(j) + [1 - a2(j)] l̂4(j)
j = 1, 2, ..., 9
where l̂1(j) and l̂3(j) are the interpolated LSF parameter values for subframes 1 and 3, l̂0(j) is the quantized LSF value of the last subframe of the previous joint frame, l̂2(j) and l̂4(j) are the quantized LSF values of subframes 2 and 4, and a1(j), a2(j) are the LSF interpolation coefficients, vector-quantized with a 4-bit codebook.
Vector-quantizing a1(j), a2(j) with a 4-bit codebook comprises:
establishing the following objective function of the vector quantization:
E = Σ_j w1(j) |l1(j) - l̂1(j)|² + Σ_j w3(j) |l3(j) - l̂3(j)|²
where w1(j), w3(j) are weighting coefficients and l1(j), l3(j) are the unquantized LSF parameters of the 1st and 3rd subframes.
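The 4-bit coefficient search can be sketched as an exhaustive minimization of that objective. A minimal sketch: `coeff_cb` is a hypothetical (16, 2) codebook of (a1, a2) pairs, and the weights are supplied by the caller; the patent's actual codebook entries and weighting are not reproduced here.

```python
import numpy as np

def best_interp_coeffs(l0q, l2q, l4q, l1, l3, w1, w3, coeff_cb):
    """Exhaustive search over a (16, 2) codebook of (a1, a2) pairs minimizing
    E = sum_j w1|l1 - l1_hat|^2 + sum_j w3|l3 - l3_hat|^2  (sketch)."""
    best_idx, best_e = 0, np.inf
    for idx, (a1, a2) in enumerate(coeff_cb):
        l1_hat = a1 * l0q + (1 - a1) * l2q   # interpolated subframe-1 LSFs
        l3_hat = a2 * l2q + (1 - a2) * l4q   # interpolated subframe-3 LSFs
        e = np.sum(w1 * (l1 - l1_hat) ** 2) + np.sum(w3 * (l3 - l3_hat) ** 2)
        if e < best_e:
            best_e, best_idx = e, idx
    return best_idx, best_e

# if the true coefficients are in the codebook, the search must recover them exactly
rng = np.random.default_rng(1)
l0q, l2q, l4q = rng.normal(size=(3, 10))
cb = np.array([[i / 16.0, (15 - i) / 16.0] for i in range(16)])
cb[5] = [0.25, 0.5]                          # plant a known pair at index 5
l1 = 0.25 * l0q + 0.75 * l2q
l3 = 0.5 * l2q + 0.5 * l4q
w1 = w3 = np.ones(10)
idx, e = best_interp_coeffs(l0q, l2q, l4q, l1, l3, w1, w3, cb)
```

Only the 4-bit index of the winning pair is transmitted, so subframes 1 and 3 cost almost nothing beyond the 17 bits spent on subframes 2 and 4.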
Step 12: performing digital speech vector quantization using the LSF parameters after the second-stage vector quantization.
Specifically, the method further comprises the steps of comparing the LSF parameter after the second-stage vector quantization with the original LSF quantization value by adopting a spectrum distortion index, wherein N represents the spectrum distortion index;
wherein L is the number of fundamental tone harmonics in the subframe, AmlAs original spectral amplitude values, AmrlIs the spectral amplitude value reconstructed after the LSF parameter after the second-stage vector quantization is adopted.
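The distortion index can be sketched directly from that formula. A minimal sketch following the expression as written in the text (an average of squared log-spectral ratios over the L harmonics); the spectral-distortion measure used in practice may normalize differently.

```python
import numpy as np

def spectral_distortion(Am, Amr):
    """N = (1/L) * sum_l [10*lg(|Am_l / Amr_l|^2)]^2 over L pitch harmonics (sketch)."""
    Am = np.asarray(Am, dtype=float)
    Amr = np.asarray(Amr, dtype=float)
    terms = (10.0 * np.log10((Am / Amr) ** 2)) ** 2
    return float(terms.mean())

# identical spectra give zero distortion; a 20 dB amplitude error gives 400
n_zero = spectral_distortion([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
n_err = spectral_distortion([10.0], [1.0])
```

A lower N indicates that the two-stage quantized LSFs reconstruct spectral amplitudes closer to the originals.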
Example two
The embodiment provides a MELP-based low-bit digital speech vector quantization system, and a specific implementation structure of the system is shown in fig. 2, which may specifically include the following modules:
the coefficient acquisition module 21: the method is used for performing linear prediction coefficient vector quantization on the adjusted pitch signal by adopting a Mixed Excitation Linear Prediction (MELP) algorithm, and comprises the following steps: the LSF parameters are quantized by adopting two-stage split vectors, the LSF parameters of the first-stage vector quantization are firstly obtained, and the LSF parameters of the second-stage vector quantization are obtained based on the LSF parameters of the first-stage vector quantization;
the quantization module 23: which is used for digital speech vector quantization using LSF parameters after the second stage of vector quantization.
The specific process of performing digital speech vector quantization by using the system of the embodiment of the present invention is similar to the method embodiment described above, and is not described here again.
In summary, the embodiment of the present invention obtains the adjusted pitch signal and performs linear prediction coefficient vector quantization on it using the mixed excitation linear prediction (MELP) algorithm, including: converting the LPC parameters into line-spectral-frequency (LSF) parameters and applying two-stage split vector quantization to them, namely: performing first-stage vector quantization of the 10-dimensional LSF parameters with 5 bits; then splitting the 10-dimensional LSF parameters into the first five and last five dimensions and performing second-stage vector quantization with a 7-bit codebook for the first five dimensions and a 5-bit codebook for the last five dimensions. Starting from existing design methods and addressing their shortcomings, the scheme provides a novel MELP-based low-bit digital speech construction method. On the basis of the MELP algorithm, the quantization steps are analyzed, with emphasis on the quantization of the pitch period and of the linear prediction coefficients; an improved method is proposed for linear prediction coefficient quantization. The two-stage LSF vector quantization scheme reduces the code rate as well as the storage and computational complexity of the codebook, giving it advantages over the original scheme.
Those of ordinary skill in the art will understand that: the figures are merely schematic representations of one embodiment, and the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The embodiments in this specification are described in a progressive manner; identical or similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the others. In particular, since the apparatus and system embodiments are substantially similar to the method embodiments, they are described relatively briefly, and reference may be made to the corresponding descriptions of the method embodiments. The apparatus and system embodiments described above are merely illustrative: units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiment's solution. Those of ordinary skill in the art can understand and implement this without inventive effort.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for low-bit digital speech vector quantization based on MELP, comprising:
performing linear prediction coefficient vector quantization on the adjusted pitch signal by adopting a Mixed Excitation Linear Prediction (MELP) algorithm, wherein the method comprises the following steps: the LSF parameters are quantized by adopting two-stage split vectors, the LSF parameters of the first-stage vector quantization are firstly obtained, and the LSF parameters of the second-stage vector quantization are obtained based on the LSF parameters of the first-stage vector quantization;
and performing digital voice vector quantization by using the LSF parameter after the second-stage vector quantization.
2. The MELP-based low-bit digital speech vector quantization method according to claim 1, wherein first obtaining the LSF parameters of the first-stage vector quantization and obtaining the LSF parameters of the second-stage vector quantization based on the LSF parameters of the first-stage vector quantization comprises:
carrying out first-stage vector quantization on the LSF parameters by adopting 5 bits to obtain 10-dimensional LSF parameters; and dividing the 10-dimensional LSF parameters into front 5-dimensional LSF parameters and rear 5-dimensional LSF parameters, respectively carrying out second-stage vector quantization on the front 5-dimensional LSF parameters by adopting a 7-bit codebook, and carrying out second-stage vector quantization on the rear 5-dimensional LSF parameters by adopting a 5-bit codebook to obtain the LSF parameters of the second-stage vector quantization.
3. The MELP-based low-bit digital speech vector quantization method according to claim 1, wherein performing linear prediction coefficient vector quantization on the adjusted pitch signal using the mixed excitation linear prediction (MELP) algorithm comprises:
performing vector quantization on the LSF parameters of the 2nd and 4th subframes of the adjusted pitch signal using 17 bits; and
calculating the LSF parameters of the 1st and 3rd subframes of the adjusted pitch signal by the following formulas:
\hat{l}_1(j) = a_1(j)\,\hat{l}_0(j) + [1 - a_1(j)]\,\hat{l}_2(j)
\hat{l}_3(j) = a_2(j)\,\hat{l}_2(j) + [1 - a_2(j)]\,\hat{l}_4(j)
j = 1, 2, \ldots, 9
where \hat{l}_1(j) and \hat{l}_3(j) are the interpolated LSF parameter values for the 1st and 3rd subframes, \hat{l}_0(j) is the quantized LSF value of the last subframe of the previous joint frame, \hat{l}_2(j) and \hat{l}_4(j) are the quantized LSF values of the 2nd and 4th subframes, and a_1(j), a_2(j) are the LSF interpolation coefficients, which are vector quantized using a 4-bit codebook.
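A minimal sketch of the interpolation formulas of claim 3, assuming the quantized subframe LSF vectors and the interpolation coefficients are already available as Python lists of equal length:

```python
def interpolate_lsf(l0_hat, l2_hat, l4_hat, a1, a2):
    """Interpolate subframe-1 and subframe-3 LSFs per the claim-3 formulas.

    l0_hat         : quantized LSFs of the last subframe of the previous frame
    l2_hat, l4_hat : quantized LSFs of subframes 2 and 4
    a1, a2         : per-dimension interpolation coefficients (4-bit codebook)
    """
    l1_hat = [a1[j] * l0_hat[j] + (1 - a1[j]) * l2_hat[j] for j in range(len(a1))]
    l3_hat = [a2[j] * l2_hat[j] + (1 - a2[j]) * l4_hat[j] for j in range(len(a2))]
    return l1_hat, l3_hat
```

With a1[j] = 1 the 1st subframe copies the previous frame's last subframe; with a1[j] = 0 it copies subframe 2.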
4. The MELP-based low-bit digital speech vector quantization method according to claim 3, wherein said a_1(j), a_2(j) are vector quantized using a 4-bit codebook by:
establishing the following objective function for the vector quantization:
E = \sum_{j=0}^{9} w_1(j)\,|l_1(j) - \hat{l}_1(j)|^2 + \sum_{j=0}^{9} w_3(j)\,|l_3(j) - \hat{l}_3(j)|^2
where w_1(j), w_3(j) are weighting coefficients and l_1(j), l_3(j) are the unquantized LSF parameters of the 1st and 3rd subframes.
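The objective function E of claim 4 can be evaluated as follows; in an encoder it would be minimized over the 16 entries of the 4-bit coefficient codebook, a search this sketch does not model:

```python
def interpolation_error(l1, l3, l1_hat, l3_hat, w1, w3):
    """Weighted squared error E of claim 4, used to select the 4-bit
    interpolation-coefficient codeword."""
    e = sum(w1[j] * (l1[j] - l1_hat[j]) ** 2 for j in range(len(l1)))
    e += sum(w3[j] * (l3[j] - l3_hat[j]) ** 2 for j in range(len(l3)))
    return e
```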
5. The MELP-based low-bit digital speech vector quantization method according to claim 4, wherein the LSF parameters after the second-stage vector quantization are compared with the original LSF quantization values using a spectral distortion index:
N = \frac{1}{L} \sum_{l=1}^{L} \left[ 10 \lg \left| A_m^l / A_{mr}^l \right|^2 \right]^2
where L is the number of pitch harmonics in the subframe, A_m^l is the original spectral amplitude value, and A_{mr}^l is the spectral amplitude value reconstructed from the LSF parameters after the second-stage vector quantization.
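The spectral distortion index of claim 5, taken literally from the formula above (with lg read as log base 10):

```python
import math

def spectral_distortion(amp_orig, amp_rec):
    """Spectral distortion index N over L pitch harmonics (claim-5 formula).

    amp_orig : original spectral amplitude values A_m^l
    amp_rec  : amplitudes reconstructed from the second-stage-quantized LSFs
    """
    L = len(amp_orig)
    return sum((10.0 * math.log10((a / ar) ** 2)) ** 2
               for a, ar in zip(amp_orig, amp_rec)) / L
```

Identical amplitude sets give N = 0; a uniform factor-of-10 amplitude error gives N = 400.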
6. The MELP-based low-bit digital speech vector quantization method according to claim 1, wherein obtaining the adjusted pitch signal comprises:
passing the sampled digital speech signal through a high-pass filter to obtain a filtered signal;
performing an unvoiced/voiced decision on the filtered signal using multi-band mixed excitation, and calculating the gain of the filtered signal to obtain the pitch signal; and
performing linear predictive coding on the pitch signal to obtain a residual signal, calculating the pitch period of the pitch signal, and adjusting the pitch signal according to the pitch period and the residual signal to obtain the adjusted pitch signal.
7. The MELP-based low-bit digital speech vector quantization method according to claim 6, wherein performing the unvoiced/voiced decision on the filtered signal using multi-band mixed excitation comprises:
dividing the filtered signal into a plurality of sub-bands, making an unvoiced/voiced decision for each sub-band, and labeling each sub-band's voicing strength, the voicing strength of the i-th sub-band being represented by the parameter Vbpi (i = 1, 2, ..., n);
setting a first intensity threshold according to the autocorrelation function value r(P_2) corresponding to the fractional pitch period P_2; when the voicing strength parameter Vbp1 of the first sub-band is not greater than the first intensity threshold, the current frame is an unvoiced frame, and all the remaining band-pass voicing strengths Vbpi (i = 1, 2, 3, 4, 5) are quantized and encoded as an unvoiced frame; and
when the voicing strength parameter Vbp1 of the first sub-band is greater than the first intensity threshold, the current frame is a voiced frame, and all the remaining band-pass voicing strengths Vbpi (i = 1, 2, 3, 4, 5) are quantized and encoded as a voiced frame.
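A sketch of the frame-level decision in claim 7. The threshold is assumed to be precomputed from r(P_2), and clearing the band strengths in the unvoiced case is an illustrative assumption; the claim only states that unvoiced-frame quantization coding is applied:

```python
def voicing_decision(vbp, threshold):
    """Frame-level unvoiced/voiced decision driven by the first sub-band
    strength Vbp1 (claim 7). Returns the frame type and the band strengths
    passed on to quantization coding."""
    if vbp[0] > threshold:
        return "voiced", list(vbp)        # encode all bands as a voiced frame
    return "unvoiced", [0.0] * len(vbp)   # encode all bands as an unvoiced frame
```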
8. The MELP-based low-bit digital speech vector quantization method according to claim 6, wherein calculating the gain of the filtered signal comprises:
when the voicing strength of the first sub-band is greater than the first intensity threshold and the smallest factor product of the fractional pitch period P2 is not greater than the window-length threshold, adjusting the window length to be greater than the smallest factor product of the fractional pitch period P2;
when the voicing strength of the first sub-band is greater than the first intensity threshold and the smallest factor product of the fractional pitch period P2 is greater than the window-length threshold, adjusting the window length to be half of the smallest factor product of the fractional pitch period; and
when the voicing strength of the first sub-band is less than or equal to the first intensity threshold, setting the window length equal to the smallest factor product of the fractional pitch period.
9. The MELP-based low-bit digital speech vector quantization method according to claim 6, wherein performing linear predictive coding on the pitch signal to obtain a residual signal comprises: passing the sampled digital speech signal through a linear prediction error filter with the transfer function:
H(z) = 1 - \sum_{i=1}^{10} a_i \, z^{-i}
where a_i are the linear prediction coefficients, so that the residual signal is:
r(n) = s(n) - \sum_{i=1}^{10} a_i \, s(n - i)
where n ranges over the residual analysis window; the linear prediction error filter is an FIR filter, and its output is the residual signal.
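The prediction error filter of claim 9 is a plain FIR filter; a direct implementation of the residual formula, with samples before the start of the signal taken as zero (a boundary assumption of this sketch):

```python
def lpc_residual(s, a):
    """Residual r(n) = s(n) - sum_{i=1}^{p} a_i * s(n - i): the output of the
    FIR prediction error filter H(z) = 1 - sum a_i z^{-i} applied to s.
    Samples with negative index are treated as zero."""
    p = len(a)
    return [s[n] - sum(a[i - 1] * s[n - i] for i in range(1, p + 1) if n - i >= 0)
            for n in range(len(s))]
```

With all-zero coefficients the residual is the signal itself; a perfect one-tap predictor on a constant signal leaves a residual of zero after the first sample.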
10. A MELP-based low-bit digital speech vector quantization system, comprising:
a coefficient acquisition module, configured to perform linear prediction coefficient vector quantization on the adjusted pitch signal using the mixed excitation linear prediction (MELP) algorithm, comprising: quantizing the LSF parameters with two-stage split vector quantization, wherein the LSF parameters of the first-stage vector quantization are obtained first, and the LSF parameters of the second-stage vector quantization are obtained based on the LSF parameters of the first-stage vector quantization; and
a quantization module, configured to perform digital speech vector quantization using the LSF parameters after the second-stage vector quantization.
CN201511005800.3A 2015-12-29 2015-12-29 A kind of low bit digital speech vector quantization method and system based on MELP Pending CN106935243A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201511005800.3A CN106935243A (en) 2015-12-29 2015-12-29 A kind of low bit digital speech vector quantization method and system based on MELP


Publications (1)

Publication Number Publication Date
CN106935243A true CN106935243A (en) 2017-07-07

Family

ID=59458182

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201511005800.3A Pending CN106935243A (en) 2015-12-29 2015-12-29 A kind of low bit digital speech vector quantization method and system based on MELP

Country Status (1)

Country Link
CN (1) CN106935243A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040153317A1 (en) * 2003-01-31 2004-08-05 Chamberlain Mark W. 600 Bps mixed excitation linear prediction transcoding
CN101114450A (en) * 2007-07-20 2008-01-30 华中科技大学 Speech encoding selectivity encipher method
CN101281750A (en) * 2008-05-29 2008-10-08 上海交通大学 Expanding encoding and decoding system based on vector quantization high-order code book of variable splitting table
CN101937680A (en) * 2010-08-27 2011-01-05 太原理工大学 Vector quantization method for sorting and rearranging code book and vector quantizer thereof
CN103050122A (en) * 2012-12-18 2013-04-17 北京航空航天大学 MELP-based (Mixed Excitation Linear Prediction-based) multi-frame joint quantization low-rate speech coding and decoding method


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
王国文 et al.: Proceedings of the 16th National Youth Communication Academic Conference (Part I), 31 December 2011 *
王国文: "Research on an Improved Speech Compression Algorithm for Voice Encryption Machines", China Master's Theses Full-text Database, Information Science and Technology series *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109256143A (en) * 2018-09-21 2019-01-22 西安蜂语信息科技有限公司 Speech parameter quantization method, device, computer equipment and storage medium
CN111818519A (en) * 2020-07-16 2020-10-23 郑州信大捷安信息技术股份有限公司 End-to-end voice encryption and decryption method and system
CN111818519B (en) * 2020-07-16 2022-02-11 郑州信大捷安信息技术股份有限公司 End-to-end voice encryption and decryption method and system

Similar Documents

Publication Publication Date Title
CN105825861B (en) Apparatus and method for determining weighting function, and quantization apparatus and method
KR101698905B1 (en) Apparatus and method for encoding and decoding an audio signal using an aligned look-ahead portion
EP3132443B1 (en) Methods, encoder and decoder for linear predictive encoding and decoding of sound signals upon transition between frames having different sampling rates
JP2002023800A (en) Multi-mode sound encoder and decoder
JP6395612B2 (en) System and method for mixed codebook excitation for speech coding
CN103050121A (en) Linear prediction speech coding method and speech synthesis method
EP1141946A1 (en) Coded enhancement feature for improved performance in coding communication signals
US20040111257A1 (en) Transcoding apparatus and method between CELP-based codecs using bandwidth extension
WO2009125588A1 (en) Encoding device and encoding method
US20040153317A1 (en) 600 Bps mixed excitation linear prediction transcoding
CN106935243A (en) A kind of low bit digital speech vector quantization method and system based on MELP
JP6042900B2 (en) Method and apparatus for band-selective quantization of speech signal
Tanaka et al. Low-bit-rate speech coding using a two-dimensional transform of residual signals and waveform interpolation
JP4578145B2 (en) Speech coding apparatus, speech decoding apparatus, and methods thereof
KR101857799B1 (en) Apparatus and method for determining weighting function having low complexity for lpc coefficients quantization
JP3185748B2 (en) Signal encoding device
KR101997897B1 (en) Apparatus and method for determining weighting function having low complexity for lpc coefficients quantization
KR20100006491A (en) Method and apparatus for encoding and decoding silence signal
Girin Long-term quantization of speech LSF parameters
WO2011048810A1 (en) Vector quantisation device and vector quantisation method
JP3715417B2 (en) Audio compression encoding apparatus, audio compression encoding method, and computer-readable recording medium storing a program for causing a computer to execute each step of the method
Ozaydin Residual Lsf Vector Quantization Using Arma Prediction
JP3144244B2 (en) Audio coding device
Saleem et al. Implementation of Low Complexity CELP Coder and Performance Evaluation in terms of Speech Quality
JP2005062410A (en) Method for encoding speech signal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170707