WO1994007239A1

WO1994007239A1 - Speech encoding method and apparatus

Info

Publication number: WO1994007239A1
Application number: PCT/JP1993/001323
Authority: WO
Inventors: Tomohiko Taniguchi; Yoshinori Tanaka; Yasuji Ohta; Hideaki Kurihara
Original assignee: Fujitsu Limited
Priority date: 1992-09-16
Filing date: 1993-09-16
Publication date: 1994-03-31
Also published as: JP3531935B2

Abstract

L' delta vectors Δi (i = 0, 1, 2, ..., L'-1), more than the required number L, are each multiplied by the matrix of a linear prediction synthesis filter (3), and the power (AΔi)^T(AΔi) of each result is evaluated. The vectors are sorted by the magnitude of these powers (43), and the L vectors with the largest powers are selected. Using the selected vectors as a tree-structured delta codebook (41), A-b-S vector quantization is performed (48). This increases the freedom of the space spanned by the delta vectors and improves the quantization characteristics. Furthermore, the structure of the tree-structured delta codebook can be exploited to achieve variable rate encoding.

Description

Speech coding method and device

The present invention relates to a speech encoding method and apparatus for compressing the information of a speech signal, and more particularly to a speech encoding method and apparatus using analysis-by-synthesis (A-b-S) vector quantization for encoding at transmission rates of 4 to 16 kbps.

Background art

Speech encoders that use A-b-S vector quantization, such as Code-Excited Linear Prediction (CELP) encoders, are regarded as promising speech coders that compress information while maintaining quality in in-house communication systems and digital mobile radio systems. In such a vector-quantizing speech coder (hereinafter simply "encoder"), linear prediction filtering is applied to each code vector of a codebook to generate a reproduced signal, the error power between the reproduced signal and the input speech signal is evaluated, and the number (index) of the code vector with the smallest error is determined and transmitted to the receiving side.

An encoder using A-b-S vector quantization applies linear prediction synthesis filtering to each of the roughly 1,000 excitation vector patterns stored in the codebook, and searches these patterns for the one that minimizes the error between the reproduced speech signal and the input speech signal to be encoded.

Because the encoder must support real-time conversation, this processing must be performed in real time: the search must be repeated continuously throughout a call at short intervals, for example every 5 ms.

However, as described later, this search includes complex operations such as filtering and correlation, which require an enormous amount of computation, for example several hundred Mops (million operations per second). Even with the fastest digital signal processors (DSPs) currently available, several chips are required, which makes it difficult to reduce power consumption.

As a speech coding method that solves this problem, the present applicant proposed, in Japanese Patent Application No. 3-127669 (Japanese Patent Application Laid-Open No. 4-352200), a tree-structured delta codebook. Instead of storing the code vectors themselves as in the prior art, this codebook stores delta vectors, which are the differences between signal vectors, and code vectors forming a tree structure are generated by sequentially adding and subtracting these delta vectors.

This method greatly reduces the memory required for the codebook. It also greatly reduces computation: instead of performing the filter and correlation operations for each code vector, they are performed only for the delta vectors, and the results are sequentially added and subtracted.

However, because each code vector in this method is generated as a linear combination of a small number of delta vectors acting as basis vectors, it has no components outside the span of those delta vectors. That is, of the space in which the vectors to be encoded are distributed (usually 40 to 64 dimensions), code vectors can be distributed only over a subspace whose dimension is at most the number of delta vectors (usually 8 to 10). Consequently, even if the basis (delta) vectors are carefully designed from the statistical distribution of the speech signal to be encoded, the quantization characteristics of a tree-structured delta codebook are worse than those of a conventional codebook without such structural constraints.

In Japanese Patent Application No. 3-515016, the present applicant noted two facts. First, the distance is evaluated after the linear prediction synthesis filter operation is applied to the code vectors, and this operation does not act equally on all delta vector components. Second, in a tree-structured delta codebook, each delta vector contributes differently to the code vectors, and the contribution of a delta vector can be changed by changing the order of the delta vectors. Based on these observations, the applicant proposed applying the filter operation to each delta vector whenever the coefficients of the linear prediction synthesis filter are determined, comparing their powers (vector lengths), and encoding with a tree-structured delta codebook rearranged in descending order of power, thereby improving the characteristics.

However, like the earlier method, this method still generates code vectors from only a limited number of delta vectors, so the improvement in characteristics is limited and further improvement has been sought.

Another challenge for speech encoders using A-b-S vector quantization is variable bit rate encoding. Variable bit rate coding changes the bit rate of the code as appropriate according to conditions such as the available capacity of the transmission path and the importance of the speech content, so as to improve overall coding efficiency.

To use vector quantization for variable-rate speech coding, a codebook with the number of patterns corresponding to each transmission rate must be prepared, and encoding must be performed while switching among the codebooks according to the desired rate. With a conventional codebook in which code vectors are simply listed, each codebook requires N × M words of memory, the product of the vector dimension N and the number of patterns M. Since M grows as a power of 2 of the number of index bits, widening the range of transmission rates or controlling the rate in fine steps requires a large amount of memory.
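To make the memory growth concrete, the following Python sketch tallies the words needed when a separate conventional codebook of 2^B N-dimensional vectors is kept for every index length B. The dimension N = 40 and the range of 4 to 10 bits are illustrative assumptions, not values taken from the patent, and the function names are hypothetical.

```python
def codebook_words(n_dim: int, index_bits: int) -> int:
    """Memory (words) of one conventional codebook: N x M with M = 2**bits."""
    return n_dim * 2 ** index_bits


def multirate_words(n_dim: int, bit_lengths: list[int]) -> int:
    """Memory when one separate codebook is stored for every transmission rate."""
    return sum(codebook_words(n_dim, b) for b in bit_lengths)


if __name__ == "__main__":
    N = 40                      # assumed vector dimension
    rates = list(range(4, 11))  # assumed index lengths: 4 to 10 bits
    for b in rates:
        print(f"{b:2d}-bit index: {codebook_words(N, b):6d} words")
    print("all rates together:", multirate_words(N, rates), "words")
```

Each extra bit of index length doubles the memory of the largest codebook, which is why fine-grained rate control is expensive with flat codebooks.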

In variable-rate transmission, the transmission network may also require the rate of an already encoded signal to be forcibly reduced. In that case, the speech signal must be reproduced from the encoder output with some of its bits dropped.

In scalar quantization, which is less efficient than vector quantization, a conventional countermeasure against bit dropping is to drop bits starting from the least significant bit (LSB). For example, schemes (embedded coding) have been devised in which the quantization levels of a high-rate quantizer include those of a low-rate quantizer.
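The embedded property can be illustrated with nested uniform scalar quantizers. In this minimal Python sketch (the uniform quantizer on [0, 1) is an assumption chosen for simplicity, and the function names are hypothetical), dropping the LSB of a 4-bit index yields exactly the index a 3-bit quantizer would have produced, so the decoder can still reconstruct a coarser value.

```python
import math


def quantize(x: float, bits: int) -> int:
    """Index of x in [0, 1) for a uniform quantizer with 2**bits cells."""
    levels = 2 ** bits
    return min(levels - 1, max(0, math.floor(x * levels)))


def reconstruct(index: int, bits: int) -> float:
    """Mid-point of the quantization cell addressed by `index`."""
    return (index + 0.5) / 2 ** bits


if __name__ == "__main__":
    x = 0.7123
    fine = quantize(x, 4)        # 4-bit code from the encoder
    coarse = fine >> 1           # LSB dropped by the network
    print(fine, coarse, quantize(x, 3))   # coarse equals the 3-bit index
    print(reconstruct(fine, 4), reconstruct(coarse, 3))
```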

With a conventional vector quantizer whose codebook simply lists code vectors, however, the codebook has no structure at all, so the bits of a code vector index carry no difference in significance: dropping either the LSB or the MSB calls up a completely different vector. The countermeasure used in scalar quantization therefore cannot be applied, and bit dropping causes severe degradation of speech quality.

Disclosure of the invention

Accordingly, a first object of the present invention is to provide a speech encoding method and apparatus that use a tree-structured delta codebook further improved over the method described above.

A second object of the present invention is to provide a speech coding method and apparatus using vector quantization that do not require a huge amount of codebook memory and that can cope with bit dropping.

According to the present invention, there is provided a speech encoding method that encodes an input speech signal vector by the index assigned to the code vector, among predetermined code vectors, that has the smallest distance from the input speech signal vector, the method comprising the steps of:

a) storing a plurality of difference code vectors;

b) multiplying each of the difference code vectors by the matrix of a linear prediction synthesis filter;

c) evaluating the power amplification factor of each difference code vector multiplied by the matrix;

d) rearranging the matrix-multiplied difference code vectors in order of the magnitude of the evaluated power amplification factor;

e) selecting, from the rearranged vectors, a predetermined number of vectors in order of the magnitude of the evaluated power amplification factor;

f) evaluating the distance between the input speech signal vector and each code vector that is generated by sequentially adding and subtracting the selected vectors on a tree structure and that has been subjected to linear prediction synthesis filter processing; and

g) determining the code vector having the smallest evaluated distance.
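Steps a) to g) can be sketched in plain Python as follows. This is a hedged illustration, not the patented implementation: the filter matrix A is passed in directly, the tree search is done by brute force over all code vectors rather than by the recursive correlation updates described later, the distance is the optimal-gain A-b-S criterion R_xc^2/R_cc, and all function names are hypothetical.

```python
def matvec(a, v):
    """Product of matrix a (list of rows) and vector v."""
    return [sum(p * q for p, q in zip(row, v)) for row in a]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def select_deltas(a, candidates, count):
    """Steps b) to e): filter, rank by power amplification, keep `count`."""
    filtered = [matvec(a, d) for d in candidates]                          # b)
    gains = [dot(f, f) / dot(d, d) for f, d in zip(filtered, candidates)]  # c)
    order = sorted(range(len(candidates)), key=lambda i: -gains[i])        # d)
    return [candidates[i] for i in order[:count]]                          # e)


def tree_codebook(deltas):
    """Code vectors C_0..C_{2^L-2}: C_{2k+1} = C_k + D_i, C_{2k+2} = C_k - D_i."""
    codes = [list(deltas[0])]
    for i in range(1, len(deltas)):
        for k in range(2 ** (i - 1) - 1, 2 ** i - 1):
            codes.append([c + d for c, d in zip(codes[k], deltas[i])])
            codes.append([c - d for c, d in zip(codes[k], deltas[i])])
    return codes


def encode(a, x, candidates, count):
    """Steps f) and g): index of the code vector whose filtered version, scaled
    by its optimal gain, is closest to x (i.e. maximum R_xc^2 / R_cc)."""
    best, best_score = -1, float("-inf")
    for j, c in enumerate(tree_codebook(select_deltas(a, candidates, count))):
        ac = matvec(a, c)
        r_cc = dot(ac, ac)
        if r_cc > 0 and dot(x, ac) ** 2 / r_cc > best_score:
            best, best_score = j, dot(x, ac) ** 2 / r_cc
    return best
```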

The present invention also provides a speech encoding apparatus that encodes an input speech signal vector by the index assigned to the code vector, among predetermined code vectors, that has the smallest distance from the input speech signal vector, the apparatus comprising:

means for storing a plurality of difference code vectors;

means for multiplying each of the difference code vectors by the matrix of a linear prediction synthesis filter;

means for evaluating the power amplification factor of each difference code vector multiplied by the matrix;

means for rearranging the matrix-multiplied difference code vectors in order of the magnitude of the evaluated power amplification factor;

means for selecting, from the rearranged vectors, a predetermined number of vectors in order of the magnitude of the evaluated power amplification factor;

means for evaluating the distance between the input speech signal vector and each code vector that is generated by sequentially adding and subtracting the selected vectors on a tree structure and that has been subjected to linear prediction synthesis filter processing; and

means for determining the code vector having the smallest evaluated distance.

The present invention further provides a variable-length speech coding method that performs variable-length coding of an input speech signal vector with a code of variable bit length assigned to the code vector, among predetermined code vectors, that has the smallest distance from the input speech signal vector, the method comprising the steps of:

a) storing a plurality of difference code vectors;

b) evaluating the distance between the input speech signal vector and each code vector generated by sequentially adding and subtracting, on a tree structure, as many difference code vectors, counted from the top, as correspond to the desired code bit length;

c) determining the code vector with the smallest evaluated distance; and

d) determining the code of the desired code bit length to be assigned to the determined code vector.
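A minimal Python sketch of steps a) to d) follows. It assumes that B difference vectors are used for a B-bit code and that the transmitted code is the tree index of the chosen node; to stay short, it uses a plain squared Euclidean distance instead of the filtered, optimal-gain distance. These are simplifications for illustration, and the function names are hypothetical. Because the tree built from the first B vectors is the top of the tree built from the first B + 1 vectors, every rate shares one stored set of difference vectors.

```python
def tree_codebook(deltas):
    """C_0 = D_0; C_{2k+1} = C_k + D_i and C_{2k+2} = C_k - D_i for layer i."""
    codes = [list(deltas[0])]
    for i in range(1, len(deltas)):
        for k in range(2 ** (i - 1) - 1, 2 ** i - 1):
            codes.append([c + d for c, d in zip(codes[k], deltas[i])])
            codes.append([c - d for c, d in zip(codes[k], deltas[i])])
    return codes


def encode_at_rate(x, deltas, bits):
    """Search only the tree built from the first `bits` difference vectors.
    Its 2**bits - 1 nodes have indexes that fit in `bits` bits."""
    codes = tree_codebook(deltas[:bits])
    # Assumption: plain squared distance, no synthesis filter.
    dist = [sum((p - q) ** 2 for p, q in zip(x, c)) for c in codes]
    return min(range(len(codes)), key=dist.__getitem__)


def parent(index):
    """Tree parent of C_index; it is also a code vector of every lower rate."""
    return (index - 1) // 2
```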

There is also provided a variable-length speech coding apparatus that performs variable-length coding of an input speech signal vector with a code of variable bit length assigned to the code vector, among predetermined code vectors, that has the smallest distance from the input speech signal vector, the apparatus comprising:

means for storing a plurality of difference code vectors;

means for evaluating the distance between the input speech signal vector and each code vector generated by sequentially adding and subtracting, on a tree structure, as many difference code vectors, counted from the top, as correspond to the desired code bit length;

means for determining the code vector with the smallest evaluated distance; and

code determining means for determining the code of the desired code bit length to be assigned to the determined code vector.

BRIEF DESCRIPTION OF THE FIGURES

Fig. 1 is a block diagram showing the concept of the speech production system;

Fig. 2 is a block diagram showing the principle of general CELP speech coding;

Fig. 3 is a block diagram showing the configuration of a conventional encoder that uses A-b-S vector quantization with a noise codebook;

Fig. 4 is a block diagram modeling the algorithm of the noise codebook search process;

Fig. 5 is a block diagram explaining the principle of the delta codebook;

Figs. 6A and 6B are diagrams explaining the adaptation method of the tree-structured delta codebook;

Figs. 7A, 7B, and 7C are diagrams explaining the principle of the present invention;

Fig. 8 is a block diagram of the speech encoder of the present invention; and

Figs. 9A and 9B are diagrams explaining the variable rate coding method of the present invention.

BEST MODE FOR CARRYING OUT THE INVENTION

Speech consists of voiced and unvoiced sounds. Voiced sound is produced from a pulse source created by vibration of the vocal cords, to which the vocal tract characteristics of the speaker's throat and mouth are added. Unvoiced sound is produced without vibrating the vocal cords: a simple Gaussian noise sequence serves as the source and becomes speech after passing through the vocal tract. Therefore, as shown in Fig. 1, the speech production mechanism can be modeled by a pulse source PSG that generates voiced sound, a noise source NSG that generates unvoiced sound, and a linear prediction synthesis filter LPCF that adds the vocal tract characteristics to the signal output from each source. Human speech has a pitch periodicity, which corresponds to the period of the pulses output from the pulse source and differs depending on the speaker and on what is being said.
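The model of Fig. 1 can be sketched in Python as two excitation generators feeding one all-pole filter. The filter sign convention, the second-order coefficients, and the 40-sample pitch period are assumptions for illustration only, and the function names are hypothetical.

```python
import random


def pulse_source(length, pitch_period, amplitude=1.0):
    """Voiced excitation (PSG): one pulse every `pitch_period` samples."""
    return [amplitude if n % pitch_period == 0 else 0.0 for n in range(length)]


def noise_source(length, seed=0):
    """Unvoiced excitation (NSG): white Gaussian noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


def lpc_synthesis(excitation, alphas):
    """All-pole filter (LPCF): y[n] = e[n] + sum_k alphas[k-1] * y[n-k]."""
    out = []
    for n, e in enumerate(excitation):
        y = e
        for k, alpha in enumerate(alphas, start=1):
            if n >= k:
                y += alpha * out[n - k]
        out.append(y)
    return out


if __name__ == "__main__":
    vocal_tract = [1.3, -0.6]          # hypothetical stable 2nd-order filter
    voiced = lpc_synthesis(pulse_source(160, 40), vocal_tract)
    unvoiced = lpc_synthesis(noise_source(160), vocal_tract)
    print(voiced[:5], unvoiced[:5])
```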

Hence, if the period of the pulse source and the noise sequence of the noise source corresponding to the input speech can be identified, the input speech can be encoded by codes (indexes) that specify the pulse period and the noise sequence.

As shown in Fig. 2, vectors P obtained by delaying the past excitation (bP + gC) by different numbers of samples are stored in an adaptive codebook 11. A vector P from the adaptive codebook 11 is multiplied by a gain b, the product bP is input to a linear prediction synthesis filter 12 for filtering, and the result bAP is subtracted from the input speech signal X. The period is determined by having the error power evaluation unit 13 select the vector P of the adaptive codebook 11 that minimizes the power of the resulting error signal.

After this, or simultaneously, a plurality of noise sequences (each represented by an N-dimensional code vector) are prepared in a noise codebook 1. Each code vector C is multiplied by a gain g and filtered by a linear prediction synthesis filter 3, and the error power evaluation unit 5 determines the code vector for which the error between the resulting reproduced signal vector gAC and the input signal vector X (an N-dimensional vector) is smallest. The speech can then be encoded by the data (indexes) that specify the period and the code vector. The example of Fig. 2 assumes that the vectors AC and AP have been orthogonalized; otherwise, the code vector is determined as the one that minimizes the error between gAC and X - bAP, the input signal vector X minus the vector bAP.

Fig. 3 is a block diagram of a speech transmission (encoding) system using A-b-S vector quantization, corresponding to the lower half of Fig. 2. In the figure, 1 is a noise codebook that stores M (the codebook size) N-dimensional code vectors C; 2 is an amplifier that applies the gain g; 3 is a linear prediction synthesis filter whose coefficients are determined from the input signal X by linear prediction analysis and which performs the linear prediction filter operation on the output of the amplifier 2; 4 is an error generator that outputs the error between the reproduced signal vector output from the filter 3 and the input signal vector; and 5 is an error power evaluation unit that evaluates the error and finds the code vector minimizing it.

In A-b-S quantization, unlike ordinary vector quantization, each code vector C in the noise codebook 1 is multiplied by the optimal gain g and filtered by the linear prediction synthesis filter 3. The error generator 4 computes the error signal E between the resulting reproduced signal vector gAC and the input signal vector X, and the error power evaluation unit 5 searches the noise codebook 1 for the optimal code vector, using the power of the error signal as the evaluation function (distance measure). The input signal is then encoded by the code (index) that specifies this code vector, and the code is transmitted.

The error power at this time is given by

$$|E|^2 = |X - gAC|^2 \tag{1}$$

The optimal code vector and gain g are determined so as to minimize the error power of Eq. (1). Because the power varies with the loudness of the speech, the gain g is optimized so that the power of the reproduced signal matches that of the input signal. The optimal gain is obtained by setting the partial derivative of Eq. (1) with respect to g to zero, that is, d|E|^2/dg = 0, which gives

$$g = \frac{X^T AC}{(AC)^T (AC)} \tag{2}$$

Substituting this g into Eq. (1) gives

$$|E|^2 = |X|^2 - \frac{(X^T AC)^2}{(AC)^T (AC)} \tag{3}$$

Denoting by R_xc the cross-correlation between the input signal X and the output AC of the linear prediction synthesis filter 3, and by R_cc the autocorrelation of the output AC, these are

$$R_{xc} = X^T AC \tag{4}$$

$$R_{cc} = (AC)^T (AC) \tag{5}$$

Since the code vector C that minimizes the error power of Eq. (3) is the one that maximizes the second term on the right side of Eq. (3), it is given by

$$C = \arg\max_{C} \left( \frac{R_{xc}^2}{R_{cc}} \right) \tag{6}$$

and the optimal gain is obtained from Eq. (2), using the cross-correlation and autocorrelation that satisfy Eq. (6), as

$$g = \frac{R_{xc}}{R_{cc}} \tag{7}$$
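Equations (4) to (7) translate directly into an exhaustive search. In this minimal Python sketch the filtered vectors AC are assumed to be precomputed and passed in; the function names are hypothetical.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def search(x, filtered_codes):
    """Exhaustive search over the filtered code vectors AC.

    Returns (index, gain): the index maximizes R_xc^2 / R_cc (Eq. (6)) and
    the gain is g = R_xc / R_cc (Eq. (7))."""
    best, best_score, best_gain = -1, float("-inf"), 0.0
    for j, ac in enumerate(filtered_codes):
        r_xc = dot(x, ac)                  # Eq. (4)
        r_cc = dot(ac, ac)                 # Eq. (5)
        if r_cc == 0.0:
            continue                       # an all-zero vector cannot be scaled
        score = r_xc * r_xc / r_cc
        if score > best_score:
            best, best_score, best_gain = j, score, r_xc / r_cc
    return best, best_gain


def error_power(x, ac, g):
    """|X - gAC|^2, Eq. (1)."""
    return sum((p - g * q) ** 2 for p, q in zip(x, ac))
```

Because Eq. (3) equals |X|^2 minus the maximized score, the error of the winning vector can be checked against `error_power` directly.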

Fig. 4 is a block diagram that models the algorithm of the noise codebook search process, which, following the above equations, encodes the input signal by searching for the code vector that minimizes the error power. It is equivalent to Fig. 3 and comprises a calculation unit that calculates the cross-correlation R_xc (= X^T AC), a calculation unit 7 that calculates the square of R_xc, a calculation unit 8 that calculates the autocorrelation R_cc of AC, a calculation unit 9 that calculates R_xc^2/R_cc, and an error power evaluation unit 5 that determines the code vector maximizing R_xc^2/R_cc, in other words minimizing the error power, and outputs a code specifying that code vector.

The conventional codebook search thus consists mainly of three processes: ① filtering of the code vector C, ② calculation of the cross-correlation R_xc, and ③ calculation of the autocorrelation R_cc. If the order of the LPC filter 3 is N_p and the dimension of vector quantization (the code vector) is N, the computation required for ① to ③ per code vector is N_p·N, N, and N, respectively. The computation required per code vector in the codebook search is therefore (N_p + 2)·N.

A commonly used noise codebook 1 has dimension 40 and size 1024 (N = 40, M = 1024), and the analysis order N_p of the LPC filter 3 is about 10. One codebook search therefore requires

$$(10 + 2) \cdot 40 \cdot 1024 \approx 480 \times 10^3$$

multiply-accumulate operations.

Performing such a codebook search in every 5 ms subframe of speech coding requires an enormous processing capacity of about 96 Mops (million operations per second). Even with the fastest current digital signal processors (allowable computation: 20 to 40 Mops), several chips are needed for real-time implementation.
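The arithmetic behind these figures can be checked with a few lines of Python (function names hypothetical). With M = 1024 exactly, the count is 491,520 multiply-accumulates, or about 98 Mops; the text's 480 × 10³ and 96 Mops correspond to rounding M to 1,000.

```python
def search_macs(n_p: int, n_dim: int, size: int) -> int:
    """Multiply-accumulates for one exhaustive search: (N_p + 2) * N * M."""
    return (n_p + 2) * n_dim * size


def mops(macs_per_search: float, subframe_seconds: float) -> float:
    """Required rate in millions of operations per second."""
    return macs_per_search / subframe_seconds / 1e6


if __name__ == "__main__":
    macs = search_macs(10, 40, 1024)
    print(macs, round(mops(macs, 0.005), 1))   # exact count with M = 1024
    print(mops(480e3, 0.005))                   # the text's rounded figure
    print("codebook memory:", 40 * 1024, "words")
```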

In addition, storing such a noise codebook 1 as a table requires N·M (= 40 · 1024 = 40 Kwords) of memory.

Furthermore, in mobile and portable telephones, which are expected application fields of speech encoders using A-b-S vector quantization, small equipment size and low power consumption are indispensable, so this enormous computation and memory are serious obstacles to implementing a speech encoder.

For these reasons, the present applicant proposed in Japanese Patent Application No. 3-127669 (Japanese Unexamined Patent Application Publication No. 4-352200) a speech coding method that uses a tree-structured delta codebook, as shown in Fig. 5, in place of the conventional noise codebook, so as to reduce both the computation required to search the noise codebook and the memory required to store it.

As shown in Fig. 5, one reference noise sequence is stored in advance in the delta codebook 10 as the initial vector C0 (= Δ0), together with (L - 1) kinds (layers) of delta noise sequences as the delta vectors Δ1 to Δ(L-1) (L = 10). By sequentially adding each of the delta vectors Δ1 to Δ(L-1) to, and subtracting it from, the initial vector C0 at each layer of a tree structure, (2^10 - 1) noise sequences can be expressed as the code vectors (codewords) C0 to C1022. Alternatively, by adding a zero vector (no vector) to these code vectors, 2^10 noise sequences can be expressed as the code vectors (codewords) C0 to C1023.

Thus, the delta codebook 10 stores only the initial vector Δ0 and (L - 1) delta vectors Δ1 to Δ(L-1) (L = 10), yet can represent 2^L - 1 (= 2^10 - 1 = M - 1) or 2^L (= 2^10 = M) code vectors. Its storage requirement is L·N (= 10N) words, far smaller than the M·N (= 1024·N) words of a conventional noise codebook.
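The expansion of Fig. 5 can be sketched in Python: only L vectors are stored, and the 2^L - 1 code vectors are generated layer by layer. The function name and the dummy delta values are hypothetical; in the patent the delta vectors are designed noise sequences.

```python
def build_tree_codebook(deltas, include_zero=False):
    """Expand L stored vectors into 2**L - 1 code vectors (2**L with zero).

    deltas[0] is the initial vector C_0 = Delta_0; deltas[i] (i >= 1) is added
    to and subtracted from every node of layer i to form layer i + 1."""
    codes = [list(deltas[0])]
    for i in range(1, len(deltas)):
        for k in range(2 ** (i - 1) - 1, 2 ** i - 1):    # nodes of layer i
            codes.append([c + d for c, d in zip(codes[k], deltas[i])])  # C_{2k+1}
            codes.append([c - d for c, d in zip(codes[k], deltas[i])])  # C_{2k+2}
    if include_zero:
        codes.append([0.0] * len(deltas[0]))
    return codes


if __name__ == "__main__":
    L, N = 10, 40
    deltas = [[float((i * 7 + n) % 5 - 2) for n in range(N)] for i in range(L)]
    codes = build_tree_codebook(deltas, include_zero=True)
    print(len(codes), "code vectors from", L * N, "stored words;",
          "a flat codebook would need", len(codes) * N)
```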

With the tree-structured delta codebook 10 configured in this way, the cross-correlation R_xc^(j) and autocorrelation R_cc^(j) for the code vectors C_j (j = 0 to 1022, or 1023) can be calculated as follows. Each code vector can be expressed as

$$C_{2k+1} = C_k + \Delta_i \tag{8}$$

or

$$C_{2k+2} = C_k - \Delta_i \tag{9}$$

where i = 1, 2, ..., L - 1 and 2^(i-1) - 1 ≤ k ≤ 2^i - 2. Hence

$$R_{xc}^{(2k+1)} = R_{xc}^{(k)} + X^T (A\Delta_i) \tag{10}$$

or

$$R_{xc}^{(2k+2)} = R_{xc}^{(k)} - X^T (A\Delta_i) \tag{11}$$

and

$$R_{cc}^{(2k+1)} = R_{cc}^{(k)} + (A\Delta_i)^T (A\Delta_i) + 2 (A\Delta_i)^T (AC_k) \tag{12}$$

or

$$R_{cc}^{(2k+2)} = R_{cc}^{(k)} + (A\Delta_i)^T (A\Delta_i) - 2 (A\Delta_i)^T (AC_k) \tag{13}$$

Therefore, for the cross-correlation R_xc, once the cross-correlation X^T(AΔi) has been calculated for each delta vector Δi (i = 0 to L - 1), the cross-correlations R_xc^(j) for all code vectors C_j are obtained immediately by sequentially adding or subtracting these values according to Eqs. (10) and (11), that is, according to the tree structure of Fig. 5. With a conventional codebook, calculating the cross-correlations for all code vectors requires

$$M \cdot N \; (= 1024N)$$

multiply-accumulate operations. With the tree-structured delta codebook, by contrast, R_xc is not calculated directly from each code vector C_j (j = 0, 1, ..., 2^L - 1); it is obtained from the values for the delta vectors Δj (j = 0, 1, ..., L - 1) by sequential addition and subtraction, which requires only

$$L \cdot N \; (= 10N)$$

multiply-accumulate operations, a large reduction.
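The recursion of Eqs. (10) and (11) can be sketched in Python and checked against a direct computation. Because A is linear, each filtered code vector AC_j follows the same tree built from the filtered deltas AΔi, which the reference function exploits. Function names are hypothetical.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def cross_correlations(x, filtered_deltas):
    """R_xc^(j) for all tree code vectors via Eqs. (10) and (11).

    Only the L inner products X^T(A Delta_i) touch N-dimensional data
    (L*N MACs); every other value costs one addition or subtraction."""
    base = [dot(x, ad) for ad in filtered_deltas]
    r_xc = [base[0]]                                    # C_0 = Delta_0
    for i in range(1, len(base)):
        for k in range(2 ** (i - 1) - 1, 2 ** i - 1):
            r_xc.append(r_xc[k] + base[i])              # Eq. (10)
            r_xc.append(r_xc[k] - base[i])              # Eq. (11)
    return r_xc


def direct_cross_correlations(x, filtered_deltas):
    """Reference: build every A C_j explicitly, then correlate (M*N MACs)."""
    codes = [list(filtered_deltas[0])]
    for i in range(1, len(filtered_deltas)):
        for k in range(2 ** (i - 1) - 1, 2 ** i - 1):
            codes.append([c + d for c, d in zip(codes[k], filtered_deltas[i])])
            codes.append([c - d for c, d in zip(codes[k], filtered_deltas[i])])
    return [dot(x, c) for c in codes]
```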

In addition, the cross term (AΔi)^T(AC_k) in the third term of the autocorrelation Eqs. (12) and (13) can be expanded by writing

$$C_k = \Delta_0 \pm \Delta_1 \pm \Delta_2 \pm \cdots \pm \Delta_{i-1}$$

so that

$$(A\Delta_i)^T (AC_k) = (A\Delta_i)^T (A\Delta_0) \pm (A\Delta_i)^T (A\Delta_1) \pm \cdots \pm (A\Delta_i)^T (A\Delta_{i-1}) \tag{14}$$

The cross term is therefore obtained by calculating the correlations (AΔi)^T(AΔj) (j = 0, 1, ..., i - 1) between AΔi and each of AΔ0, AΔ1, ..., AΔ(i-1), and adding or subtracting them sequentially. Furthermore, by calculating the autocorrelation (AΔi)^T(AΔi) of each delta vector in the second term and accumulating these terms according to Eqs. (12) and (13), that is, according to the tree structure of Fig. 5, the autocorrelations R_cc^(j) of all code vectors C_j are obtained immediately.

That is, with a conventional codebook, calculating the autocorrelations requires

$$M \cdot N \; (= 1024N)$$

multiply-accumulate operations. With the tree-structured delta codebook, by contrast, the autocorrelation R_cc is not calculated directly from each code vector C_j (j = 0, 1, ..., 2^L - 1); instead, the autocorrelation of each delta vector Δj (j = 0, 1, ..., L - 1) and the cross-correlations of all pairs of different delta vectors are calculated, which requires only

$$L(L+1)N/2 \; (= 55N)$$

multiply-accumulate operations.
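Eqs. (12) to (14) can be sketched in Python as follows. The only N-dimensional work is the L(L+1)/2 inner products of the Gram matrix G[i][j] = (AΔi)^T(AΔj); the sign bookkeeping used here to evaluate Eq. (14) for each node is an implementation choice for illustration, and the function names are hypothetical.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def autocorrelations(filtered_deltas):
    """R_cc^(j) for all tree code vectors via Eqs. (12) to (14)."""
    n = len(filtered_deltas)
    gram = [[dot(filtered_deltas[i], filtered_deltas[j]) for j in range(i + 1)]
            for i in range(n)]                     # L(L+1)/2 inner products
    r_cc = [gram[0][0]]
    signs = [[1]]                                  # signs[k][j]: sign of Delta_j in C_k
    for i in range(1, n):
        for k in range(2 ** (i - 1) - 1, 2 ** i - 1):
            # Eq. (14): (A Delta_i)^T (A C_k) as a signed sum of Gram entries
            cross = sum(s * gram[i][j] for j, s in enumerate(signs[k]))
            r_cc.append(r_cc[k] + gram[i][i] + 2 * cross)   # Eq. (12)
            signs.append(signs[k] + [1])
            r_cc.append(r_cc[k] + gram[i][i] - 2 * cross)   # Eq. (13)
            signs.append(signs[k] + [-1])
    return r_cc


def direct_autocorrelations(filtered_deltas):
    """Reference: build every A C_j explicitly and take its power."""
    codes = [list(filtered_deltas[0])]
    for i in range(1, len(filtered_deltas)):
        for k in range(2 ** (i - 1) - 1, 2 ** i - 1):
            codes.append([c + d for c, d in zip(codes[k], filtered_deltas[i])])
            codes.append([c - d for c, d in zip(codes[k], filtered_deltas[i])])
    return [dot(c, c) for c in codes]
```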

However, because all codewords (code vectors) of such a tree-structured delta codebook are generated as linear combinations of the delta vectors, they have no components outside the span of the delta vectors. That is, of the space in which the vectors to be encoded are distributed (usually 40 to 64 dimensions), code vectors can be distributed only over a subspace whose dimension is at most the number of delta vectors (usually 8 to 10).

Therefore, even if the basis (delta) vectors are carefully designed from the statistical distribution of the speech signal to be coded, the quantization characteristics of a tree-structured delta codebook are worse than those of a conventional codebook without such structural constraints.
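The subspace limitation can be checked numerically. This Python sketch, with hypothetical small sizes (N = 8, L = 3) and random delta vectors, compares the rank of the 2^L - 1 tree code vectors with that of the same number of unconstrained random vectors; the tree codebook can never exceed rank L. Function names are hypothetical.

```python
import random


def rank(rows, tol=1e-9):
    """Rank of a list of vectors by Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        if r == len(m):
            break
        p = max(range(r, len(m)), key=lambda i: abs(m[i][col]))
        if abs(m[p][col]) < tol:
            continue                         # no pivot in this column
        m[r], m[p] = m[p], m[r]
        for i in range(len(m)):
            if i != r:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def tree_codebook(deltas):
    codes = [list(deltas[0])]
    for i in range(1, len(deltas)):
        for k in range(2 ** (i - 1) - 1, 2 ** i - 1):
            codes.append([c + d for c, d in zip(codes[k], deltas[i])])
            codes.append([c - d for c, d in zip(codes[k], deltas[i])])
    return codes


if __name__ == "__main__":
    rng = random.Random(0)
    N, L = 8, 3
    deltas = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(L)]
    codes = tree_codebook(deltas)            # 7 code vectors in 8-D space
    free = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(len(codes))]
    print(rank(codes), "vs", rank(free))     # tree codes span only L dimensions
```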

In the CELP speech encoder to which the present invention is applied, as described above, vector quantization differs from ordinary vector quantization: the code vectors are filtered, and the distance is evaluated in the space of signal vectors that have passed through the linear prediction synthesis filter having the transfer function A, in order to determine the optimal vector.

Consequently, as shown in Figs. 6A and 6B, the linear prediction synthesis filter maps the space of the residual signal (the sphere of Fig. 6A, for L = 3) into the space of the reproduced signal. In general, as shown in Fig. 6B, the amplification is not uniform along the directions of the axes, so the sphere is distorted.

In other words, the characteristic A of the linear prediction synthesis filter amplifies each delta vector of the codebook by a different amount, so the resulting vectors are not distributed uniformly over the whole space.
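The uneven amplification can be seen by building A as the lower-triangular Toeplitz matrix of the filter's impulse response and measuring |Av|²/|v|² for two directions. The first-order filter with α = 0.9 and N = 16 are assumptions for illustration; such a low-pass filter amplifies a slowly varying direction far more than a rapidly alternating one. Function names are hypothetical.

```python
import math


def impulse_response(alphas, n):
    """First n samples of 1 / (1 - sum_k alphas[k-1] z^-k)."""
    h = []
    for t in range(n):
        y = 1.0 if t == 0 else 0.0
        for k, a in enumerate(alphas, start=1):
            if t >= k:
                y += a * h[t - k]
        h.append(y)
    return h


def synthesis_matrix(alphas, n):
    """Lower-triangular Toeplitz N x N matrix A with A[r][c] = h[r - c]."""
    h = impulse_response(alphas, n)
    return [[h[r - c] if r >= c else 0.0 for c in range(n)] for r in range(n)]


def power_gain(a, v):
    """Power amplification |A v|^2 / |v|^2 of direction v."""
    av = [sum(x * y for x, y in zip(row, v)) for row in a]
    return sum(x * x for x in av) / sum(x * x for x in v)


if __name__ == "__main__":
    N = 16
    A = synthesis_matrix([0.9], N)                        # assumed filter
    slow = [1 / math.sqrt(N)] * N                          # unit-norm constant
    fast = [(-1) ** n / math.sqrt(N) for n in range(N)]    # unit-norm alternating
    print(power_gain(A, slow), power_gain(A, fast))
```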

In the tree-structured delta codebook shown in Fig. 5, the contribution of each delta vector to the code vectors depends on its position in the delta codebook 10. For example, the second delta vector Δ1 contributes to all code vectors from the second layer down, the third delta vector Δ2 contributes to all code vectors from the third layer down, and Δ9 contributes only to the code vectors of the tenth layer. That is, the contribution of each delta vector to the code vectors can be changed by changing the order of the delta vectors. In view of this, the present applicant proposed in Japanese Patent Application No. 3-515016 first applying the filter characteristic A to each delta vector Δi and calculating the power of AΔi, |AΔi|^2 = (AΔi)^T(AΔi), as its amplification factor (if each delta vector is normalized, the power of AΔi itself is the amplification factor). These powers are compared with one another, and encoding is performed with a codebook rearranged in descending order of power, starting from the delta vector with the largest power. Biasing the distribution in this way improved the characteristics compared with a tree-structured delta codebook whose delta vectors are given in a fixed order.
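The dependence of a delta vector's influence on its position can be quantified by counting how many code vectors contain it. The following Python sketch (function names hypothetical) derives the count in closed form and cross-checks it by enumerating the tree; for L = 10 it gives 1022 code vectors for Δ1 and 512 for Δ9, consistent with the description above.

```python
def contribution_counts(num_deltas):
    """How many of the 2**L - 1 tree code vectors contain each Delta_i.

    Delta_0 is in all of them; Delta_i (i >= 1) enters at layer i + 1 and is
    inherited by every layer below, i.e. 2**L - 2**i code vectors."""
    total = 2 ** num_deltas - 1
    return [total - (2 ** i - 1) for i in range(num_deltas)]


def counted_by_enumeration(num_deltas):
    """Cross-check by walking the tree and recording which deltas each node uses."""
    members = [{0}]
    for i in range(1, num_deltas):
        for k in range(2 ** (i - 1) - 1, 2 ** i - 1):
            members.append(members[k] | {i})   # C_{2k+1} = C_k + Delta_i
            members.append(members[k] | {i})   # C_{2k+2} = C_k - Delta_i
    return [sum(i in m for m in members) for i in range(num_deltas)]
```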

However, in this case too, the number of delta vectors prepared equals the number actually used, and encoding merely uses the rearranged delta vectors, so the restriction on the subspace remains.

For simplicity, consider L = 2, that is, a tree-structured delta codebook that generates, from the vector C0 (= Δ0) and the delta vector Δ1, the code vectors C0, C1 (= Δ0 + Δ1), and C2 (= Δ0 - Δ1). As shown in Fig. 7A, if the vectors used as Δ0 and Δ1 are limited to the unit vectors e_x and e_y, the generated code vectors are confined to the hatched x-y plane no matter how their order is changed. If instead two of the three linearly independent unit vectors e_x, e_y, and e_z are selected as needed and used as Δ0 and Δ1, the freedom in choosing the subspace increases, as shown in Figs. 7A to 7C.

Improvement of the tree-structured delta codebook

Therefore, to further improve the delta codebook, the present invention provides more delta vector candidates (L' vectors, L' > L) than the L delta vectors actually used (the initial vector plus L - 1 delta vectors). These candidates are sorted by the same operation as above, and the desired number L of delta vectors are selected, starting from the one with the largest amplitude amplification factor, to construct the codebook. This yields a codebook with greater freedom and improves the quantization characteristics.
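The candidate selection can be sketched in Python. The diagonal matrix below is a hypothetical stand-in for the filter matrix A, chosen so that the sorted order reproduces the Fig. 6B example (e_z, then e_x, then e_y) and, with L' = 3 and L = 2, the selection of e_z and e_x described for the embodiment of Fig. 8. The function name is hypothetical.

```python
def matvec(a, v):
    return [sum(p * q for p, q in zip(row, v)) for row in a]


def select_delta_vectors(a, candidates, count):
    """Keep the `count` candidates with the largest power gain |A d|^2 / |d|^2.

    Candidates are assumed to be listed in a fixed order shared by encoder and
    decoder; Python's sort is stable (also with reverse=True), so ties keep that
    order and both sides derive the same codebook from the same A."""
    def gain(d):
        ad = matvec(a, d)
        return sum(x * x for x in ad) / sum(x * x for x in d)
    return sorted(candidates, key=gain, reverse=True)[:count]


if __name__ == "__main__":
    e_x, e_y, e_z = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
    A = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 3.0]]   # hypothetical A
    print(select_delta_vectors(A, [e_x, e_y, e_z], 2))        # e_z, then e_x
```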

The above description concerns the encoder. The decoder paired with this encoder holds the same delta vector candidates as the encoder and performs the same control, so it always generates and uses a codebook with the same contents as the encoder, which ensures correspondence with the encoder.

Fig. 8 is a block diagram showing an embodiment of the speech coding method according to the present invention based on the above idea. In this embodiment, the delta codebook 10 stores an initial vector C0 (= Δ0) representing one reference noise sequence and delta vectors Δ1 to Δ(L'-1) representing (L' - 1) N-dimensional delta noise sequences, more than the (L - 1) actually used. The initial vector C0 and each delta vector Δ1 to Δ(L'-1) are N-dimensional; that is, they are N-dimensional vectors whose elements are the amplitudes of N noise samples generated in time sequence.

In this embodiment, the linear predictive synthesis filter 3 is an IIR filter of order $N_p$. An $N \times N$ square matrix $A$ is generated from the impulse response of this filter and multiplied by each delta vector $\Delta_i$. Each $\Delta_i$ thus undergoes the filtering $A$, and the vector $A\Delta_i$ is output. The $N_p$ coefficients of the IIR filter vary with the input speech signal and are determined each time by a well-known method, which runs as follows:

1. Because adjacent samples of the input speech signal are correlated, the correlation coefficients between samples are calculated.
2. Partial autocorrelation coefficients, known as PARCOR coefficients, are calculated from these correlation coefficients.
3. The $\alpha$ coefficients of the IIR filter are determined from the PARCOR coefficients.
4. The $N \times N$ square matrix $A$ is formed from the impulse response sequence of the filter, and the vectors $\Delta_i$ are filtered.

The $L'$ filtered vectors $A\Delta_i$ $(i = 0, 1, \ldots, L'-1)$ are stored in the storage unit 40. The power evaluation unit 42 then evaluates the power $\lVert A\Delta_i \rVert^2 = (A\Delta_i)^T(A\Delta_i)$. Each delta vector is normalized ($\lVert \Delta_i \rVert^2 = \Delta_i^T \Delta_i = 1$), so evaluating this power directly gives the degree of amplification caused by the filtering $A$. Next, the sorting unit 43 sorts the vectors in descending order of power, based on the evaluation result of the power evaluation unit 42. In the example of Fig. 6B, for instance, the result is the order

$\Delta_0 = e_z, \quad \Delta_1 = e_x, \quad \Delta_2 = e_y$
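The filtering, power evaluation, and sorting steps (storage unit 40, units 42 and 43), together with the selection made by the selection storage unit 41, can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the embodiment itself. The helper names `synthesis_matrix` and `select_deltas` are hypothetical. The sketch assumes the all-pole sign convention $1/(1 - \sum_k \alpha_k z^{-k})$. The diagonal toy matrix only mimics the Fig. 6B ordering: $e_z$ is amplified most, then $e_x$, then $e_y$.

```python
import numpy as np


def synthesis_matrix(alpha, n):
    """N x N lower-triangular Toeplitz matrix A built from the impulse response
    of the all-pole filter 1 / (1 - sum_k alpha[k-1] * z**-k).
    ASSUMPTION: this sign convention for the alpha (LPC) coefficients."""
    h = np.zeros(n)
    for t in range(n):
        acc = 1.0 if t == 0 else 0.0            # unit impulse at t = 0
        for k, a in enumerate(alpha, start=1):
            if t - k >= 0:
                acc += a * h[t - k]             # recursive (IIR) feedback
        h[t] = acc
    A = np.zeros((n, n))
    for col in range(n):
        A[col:, col] = h[: n - col]             # column = impulse response delayed by col
    return A


def select_deltas(A, candidates, L):
    """Normalize the L' candidates, filter them with A, sort by the power
    (A d)^T (A d) in descending order, and keep the L strongest."""
    D = np.asarray(candidates, dtype=float)
    D = D / np.linalg.norm(D, axis=1, keepdims=True)   # ||delta_i||^2 = 1
    filtered = D @ A.T                                  # row i is A @ delta_i
    power = np.einsum("ij,ij->i", filtered, filtered)   # (A delta_i)^T (A delta_i)
    order = np.argsort(-power, kind="stable")           # largest power first
    keep = order[:L]
    return keep, filtered[keep], power[keep]


# Toy stand-in for Fig. 6B: A amplifies z most, then x, then y.
A_toy = np.diag([2.0, 1.0, 3.0])                 # axis order (x, y, z)
e_x, e_y, e_z = np.eye(3)
keep, AD, pw = select_deltas(A_toy, [e_x, e_y, e_z], L=2)
# keep -> [2, 0] (e_z, then e_x); pw -> [9.0, 4.0]
```

With a real filter, `synthesis_matrix(alpha, N)` would replace the toy diagonal matrix. Because the filter coefficients are determined anew each time, the ordering and selection would presumably be recomputed each time as well. The decoder can repeat the same computation from the same coefficients.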

The rearranged vectors $A\Delta_i$ $(i = 0, 1, \ldots, L'-1)$ number $L'$ in total. The subsequent encoding process, however, uses only the $L$ vectors $A\Delta_i$ $(i = 0, 1, \ldots, L-1)$ that are actually needed.

The selection storage unit 41 therefore selects and stores only the $L$ vectors with the largest amplitude amplification factors. In the above example, $\Delta_0 = e_z$ and $\Delta_1 = e_x$ are selected. Encoding then uses the tree-structured delta codebook formed by these vectors and proceeds exactly as with the conventional tree-structured delta codebook described earlier.

Details of the encoding process

The following describes the encoding unit 48 in detail. This unit uses the tree-structured delta codebook formed by the vectors $A\Delta_0, A\Delta_1, A\Delta_2, \ldots, A\Delta_{L-1}$ stored in the selection storage unit 41. From this codebook it finds the index of the code vector $C$ with the minimum distance from the input signal vector $X$. The encoding unit 48 consists of the following units:

- an arithmetic unit 50 that calculates the cross-correlation $X^T(A\Delta_i)$ between the input signal vector $X$ and each delta vector;
- an arithmetic unit 52 that calculates the autocorrelation $(A\Delta_i)^T(A\Delta_i)$ of each delta vector;
- an arithmetic unit 54 that calculates the cross-correlations $(A\Delta_i)^T(A\Delta_j)$ between delta vectors;
- an arithmetic unit 55 that calculates the cross terms $(A\Delta_i)^T(AC_k)$ from the output of arithmetic unit 54;
- an arithmetic unit 56 that accumulates the cross-correlations of the delta vectors from arithmetic unit 50 to calculate the cross-correlation $R_{XC}$ between the input signal vector $X$ and each code vector $C$;
- an arithmetic unit 58 that accumulates the autocorrelations $(A\Delta_i)^T(A\Delta_i)$ from arithmetic unit 52 and the cross terms $(A\Delta_i)^T(AC_k)$ from arithmetic unit 55 to calculate the autocorrelation $R_{CC}$ of each code vector $C$;
- an arithmetic unit 60 that calculates $R_{XC}^2 / R_{CC}$;
- a minimum error noise sequence determination unit 62; and
- a speech coding unit 64.
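The following two lines show why maximizing $R_{XC}^2/R_{CC}$ is equivalent to minimizing the distance. The derivation assumes, as is usual in A-b-S coding though not stated in this passage, that each filtered code vector $AC$ is scaled by an optimal gain $g$:

```latex
E(g) = \lVert X - g\,AC \rVert^2 = X^T X - 2g\,R_{XC} + g^2 R_{CC},
\qquad R_{XC} = X^T(AC), \quad R_{CC} = (AC)^T(AC)

\frac{\partial E}{\partial g} = 0 \;\Rightarrow\; g^\ast = \frac{R_{XC}}{R_{CC}},
\qquad E(g^\ast) = X^T X - \frac{R_{XC}^2}{R_{CC}}
```

$X^T X$ does not depend on $C$. The code vector with the largest $F(X, C) = R_{XC}^2/R_{CC}$ is therefore the one with the smallest error.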

First, the parameter $i$, which indicates the hierarchy level being processed, is set to 0. In this state, arithmetic units 50 and 52 calculate and output $X^T(A\Delta_0)$ and $(A\Delta_0)^T(A\Delta_0)$, respectively, and arithmetic units 54 and 55 output 0. Arithmetic units 56 and 58 store $X^T(A\Delta_0)$ and $(A\Delta_0)^T(A\Delta_0)$ as the cross-correlation $R_{XC}^{(0)}$ and the autocorrelation $R_{CC}^{(0)}$ of the first hierarchy level, and output them to arithmetic unit 60. From $R_{XC}^{(0)}$ and $R_{CC}^{(0)}$, arithmetic unit 60 calculates and outputs the value of

$F(X, C) = R_{XC}^2 / R_{CC}$

The minimum error noise sequence determination unit 62 compares the calculated $F(X, C)$ with $F_{max}$, the maximum of the previous values of $F(X, C)$ (initially 0). If $F(X, C) > F_{max}$, it sets $F_{max} \leftarrow F(X, C)$. It also replaces the code stored so far with the code that specifies the noise sequence (code vector) giving this $F_{max}$.

Next, the parameter $i$ is updated from 0 to 1. In this state, arithmetic units 50 and 52 calculate and output $X^T(A\Delta_1)$ and $(A\Delta_1)^T(A\Delta_1)$, respectively. Arithmetic unit 54 calculates and outputs the cross-correlation $(A\Delta_1)^T(A\Delta_0)$ between $\Delta_1$ and $\Delta_0$, and arithmetic unit 55 outputs this value as the cross term $(A\Delta_1)^T(AC_0)$. Arithmetic unit 56 takes the stored $R_{XC}^{(0)}$ and the value $X^T(A\Delta_1)$ from arithmetic unit 50. From these it calculates, outputs, and stores the cross-correlations $R_{XC}^{(1)}$ and $R_{XC}^{(2)}$ of the second hierarchy level according to equations (10) and (11). Arithmetic unit 58 takes the stored $R_{CC}^{(0)}$ and the values $(A\Delta_1)^T(A\Delta_1)$ and $(A\Delta_1)^T(AC_0)$ from arithmetic units 52 and 55, respectively. From these it calculates, outputs, and stores the autocorrelations $R_{CC}^{(1)}$ and $R_{CC}^{(2)}$ of the second hierarchy level according to equations (12) and (13). Arithmetic unit 60 and the minimum error noise sequence determination unit 62 operate in the same way as for $i = 0$.

Next, the parameter $i$ is updated from 1 to 2. In this state, arithmetic units 50 and 52 calculate and output $X^T(A\Delta_2)$ and $(A\Delta_2)^T(A\Delta_2)$, respectively. Arithmetic unit 54 calculates and outputs the cross-correlations $(A\Delta_2)^T(A\Delta_1)$ and $(A\Delta_2)^T(A\Delta_0)$, that is, between $\Delta_2$ and each of $\Delta_1$ and $\Delta_0$. From these values, arithmetic unit 55 calculates and outputs the cross terms $(A\Delta_2)^T(AC_1)$ and $(A\Delta_2)^T(AC_2)$ according to equation (14).

Arithmetic unit 56 takes the stored $R_{XC}^{(1)}$ and $R_{XC}^{(2)}$ and the value $X^T(A\Delta_2)$ from arithmetic unit 50. From these it calculates, outputs, and stores the cross-correlations $R_{XC}^{(3)}$ to $R_{XC}^{(6)}$ of the third hierarchy level according to equations (10) and (11). Arithmetic unit 58 takes the stored $R_{CC}^{(1)}$ and $R_{CC}^{(2)}$ and the values $(A\Delta_2)^T(A\Delta_2)$, $(A\Delta_2)^T(AC_1)$, and $(A\Delta_2)^T(AC_2)$ from arithmetic units 52 and 55. From these it calculates, outputs, and stores the autocorrelations $R_{CC}^{(3)}$ to $R_{CC}^{(6)}$ of the third hierarchy level according to equations (12) and (13). Arithmetic unit 60 and the minimum error noise sequence determination unit 62 operate in the same way as for $i = 0$ and $1$.
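The level-by-level updates performed by arithmetic units 55, 56, and 58 can be written out as follows. This is a reconstruction from the tree construction rule implied by Table 4 and by the example $C_{11} = \Delta_0 - \Delta_1 + \Delta_2 + \Delta_3$. Under that rule, the children of $C_k$ at level $i$ are $C_{2k+1} = C_k + \Delta_i$ and $C_{2k+2} = C_k - \Delta_i$. The document's own equations (10) to (14) are not reproduced here, so read the following as a consistent restatement rather than a quotation:

```latex
R_{XC}^{(2k+1)} = R_{XC}^{(k)} + X^T(A\Delta_i), \qquad
R_{XC}^{(2k+2)} = R_{XC}^{(k)} - X^T(A\Delta_i)

R_{CC}^{(2k+1)} = R_{CC}^{(k)} + 2\,(A\Delta_i)^T(AC_k) + (A\Delta_i)^T(A\Delta_i)
R_{CC}^{(2k+2)} = R_{CC}^{(k)} - 2\,(A\Delta_i)^T(AC_k) + (A\Delta_i)^T(A\Delta_i)

(A\Delta_i)^T(AC_k) = \sum_{j=0}^{i-1} s_j^{(k)}\,(A\Delta_i)^T(A\Delta_j), \qquad s_j^{(k)} \in \{+1, -1\}
```

Here $s_j^{(k)}$ is the sign with which $\Delta_j$ appears in $C_k$. For $i = 2$ this gives $(A\Delta_2)^T(AC_1) = (A\Delta_2)^T(A\Delta_0) + (A\Delta_2)^T(A\Delta_1)$ and $(A\Delta_2)^T(AC_2) = (A\Delta_2)^T(A\Delta_0) - (A\Delta_2)^T(A\Delta_1)$. These are the two cross terms that unit 55 produces at that step.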

Once this processing has been repeated up to $i = L-1$, the speech coding unit 64 outputs the latest code stored in the minimum error noise sequence determination unit 62. This code is the index of the code vector with the smallest distance to the input signal vector $X$.
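The following NumPy sketch puts the level-by-level procedure together. It searches the whole tree without forming any code vector explicitly. The sketch rests on these assumptions:

- `tree_search` is a hypothetical name.
- The rows of `AD` are the already filtered and selected vectors $A\Delta_i$.
- Node numbering follows Table 4: $C_{2k+1} = C_k + \Delta_i$ and $C_{2k+2} = C_k - \Delta_i$.
- A code vector with $R_{CC} = 0$ is simply given $F = 0$.

The optional `num_bits` argument stops the search early, as in the variable-rate use of encoding unit 48.

```python
import numpy as np


def tree_search(x, AD, num_bits=None):
    """Return (k, F_max): the index k of the code vector C_k that maximizes
    F = R_xc**2 / R_cc, searching level by level without building C_k.
    Row i of AD is the filtered delta vector A @ delta_i.
    num_bits (optional) stops the search after that many levels."""
    levels = AD.shape[0] if num_bits is None else min(num_bits, AD.shape[0])
    G = AD @ AD.T          # (A d_i)^T (A d_j) for all i, j (units 52 and 54)
    xc = AD @ x            # X^T (A d_i) for all i (unit 50)
    best_k, best_F = -1, 0.0   # F_max starts at 0; -1 means nothing chosen yet
    start = 0                  # index of the first node on the current level
    for i in range(levels):
        if i == 0:
            Rxc = np.array([xc[0]])            # R_xc^(0)
            Rcc = np.array([G[0, 0]])          # R_cc^(0)
            signs = np.ones((1, 1))            # C_0 = +delta_0
        else:
            cross = signs @ G[i, :i]           # (A d_i)^T (A C_k) per parent (unit 55)
            # children interleaved as [C_{2k+1}, C_{2k+2}] per parent k (units 56, 58)
            Rxc = np.stack([Rxc + xc[i], Rxc - xc[i]], axis=1).ravel()
            Rcc = np.stack([Rcc + 2 * cross + G[i, i],
                            Rcc - 2 * cross + G[i, i]], axis=1).ravel()
            n = signs.shape[0]
            signs = np.stack([np.hstack([signs, np.ones((n, 1))]),
                              np.hstack([signs, -np.ones((n, 1))])],
                             axis=1).reshape(-1, i + 1)
            start = 2 * start + 1
        # F = R_xc^2 / R_cc, with F = 0 where R_cc vanishes (unit 60)
        F = np.divide(Rxc ** 2, Rcc, out=np.zeros_like(Rxc), where=Rcc > 1e-12)
        m = int(np.argmax(F))
        if F[m] > best_F:                      # keep the running maximum (unit 62)
            best_k, best_F = start + m, float(F[m])
    return best_k, best_F


rng = np.random.default_rng(0)
AD_demo = rng.standard_normal((4, 8))               # 4 filtered deltas, N = 8
x_demo = rng.standard_normal(8)
k4, F4 = tree_search(x_demo, AD_demo)               # searches C_0 .. C_14
k2, F2 = tree_search(x_demo, AD_demo, num_bits=2)   # stops after i = 1 (C_0 .. C_2)
```

Each level costs one small matrix-vector product for the cross terms plus elementwise updates. The full set of $2^L - 1$ correlations is therefore obtained without ever forming an N-dimensional code vector.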

Arithmetic unit 52 can reuse the result of the power evaluation unit 42 as is when computing $(A\Delta_i)^T(A\Delta_i)$.

Variable rate coding

Variable rate coding can be realized with either the tree-structured delta codebook described above or the version improved by the present invention. It does not need the huge memory of a conventional codebook, and it can cope with bit dropping.

Suppose the tree-structured delta codebook of Fig. 9A stores $\Delta_0, \Delta_1, \Delta_2, \ldots$. As shown in Fig. 9B, only the first-level vector $\Delta_0$ may be used, with encoding performed so that the two code vectors

$C^* = 0$ (zero vector)
$C_0 = \Delta_0$

are generated. This achieves 1-bit encoding: the index data is 1 bit indicating whether or not $C_0$ is selected.

Similarly, the vectors up to the second level, $\Delta_0$ and $\Delta_1$, may be used, with encoding performed so that the four code vectors

$C^* = 0$
$C_0 = \Delta_0$
$C_1 = \Delta_0 + \Delta_1$
$C_2 = \Delta_0 - \Delta_1$

are generated. This achieves 2-bit encoding: the index data is 2 bits specifying whether $C_0$ is selected and whether $\Delta_1$ is added or subtracted. In general, i-bit encoding is achieved with the vectors up to the i-th level, $\Delta_0, \Delta_1, \ldots, \Delta_{i-1}$. A single tree-structured delta codebook containing $L$ delta vectors $\Delta_0, \Delta_1, \ldots, \Delta_{L-1}$ therefore lets the bit length of the index data vary freely from 1 to $L$ bits.
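The enumeration just described can be sketched as follows. `codebook_for_bits` is a hypothetical helper, and node numbering is assumed to follow Table 4: the children of $C_k$ are $C_{2k+1} = C_k + \Delta_i$ and $C_{2k+2} = C_k - \Delta_i$. With $b$ bits the helper returns the $2^b$ vectors $C^*, C_0, \ldots, C_{2^b-2}$, all built from the first $b$ delta vectors of the same stored codebook.

```python
import numpy as np


def codebook_for_bits(deltas, b):
    """Return [C*, C_0, ..., C_{2**b - 2}] built from the first b delta vectors."""
    deltas = np.asarray(deltas, dtype=float)
    C = [deltas[0]]                                   # C_0 = delta_0
    for k in range(1, 2 ** b - 1):
        level = (k + 1).bit_length() - 1              # delta vector used at node k
        parent = C[(k - 1) // 2]
        # odd node: parent + delta_level, even node: parent - delta_level
        C.append(parent + deltas[level] if k % 2 else parent - deltas[level])
    return [np.zeros_like(deltas[0])] + C             # prepend the zero vector C*


cb2 = codebook_for_bits(np.eye(3), 2)   # [0, d0, d0 + d1, d0 - d1]
```

Only the $L$ delta vectors are stored. Every smaller codebook is a prefix of the same tree.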

With conventional codebooks, variable-rate encoding of 1 to $L$ bits requires the following number of memory words, where $N$ is the vector dimension:

$N \times (2^0 + 2^1 + \cdots + 2^L) = N \times (2^{L+1} - 1)$

By contrast, using the tree-structured delta codebook of Fig. 9A as shown in Fig. 9B requires only

$N \times L$

memory words.
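The two memory counts can be compared for concrete numbers. The values $N = 40$ and $L = 10$ below are illustrative only and are not taken from the document, and the function names are hypothetical.

```python
def conventional_words(N, L):
    """Memory words for separate codebooks, counted as N * (2^0 + ... + 2^L)."""
    return N * sum(2 ** b for b in range(L + 1))


def tree_delta_words(N, L):
    """Memory words for one tree-structured delta codebook of L vectors."""
    return N * L


N, L = 40, 10                                   # illustrative values only
print(conventional_words(N, L), tree_delta_words(N, L))   # 81880 400
```

The conventional count grows exponentially with $L$, while the tree-structured count grows only linearly.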

The tree-structured delta codebook used here may be any of the following:

- one without the reordering described above;
- one in which the delta vectors are rearranged according to their amplification by $A$; or
- one formed by selecting $L$ of $L'$ delta vectors.

The bit rate is easy to change: the processing in the encoding unit 48 of Fig. 8 is simply terminated at the intermediate hierarchy level that matches the desired number of bits. For 4-bit encoding, for example, the processing of encoding unit 48 described above is performed for $i = 0, 1, 2, 3$.

Embedded coding

Embedded coding is a coding scheme in which the decoder can still reproduce speech even if some bits are forcibly dropped on the transmission path. With the tree-structured delta codebook described above, it is achieved by assigning codes so that a code with missing bits is reproduced as the code vector of the parent or an ancestor in the tree structure. For example, consider the 4-bit code system $[C_0, C_1, \ldots, C_{14}]$. If one bit is lost, $C_{13}$ and $C_{14}$ are reproduced as the 3-bit $C_6$, and $C_{11}$ and $C_{12}$ as the 3-bit $C_5$. A parent and its child code vectors have relatively close values, so speech can be reproduced without significant degradation of sound quality.

Tables 1 to 4 show an example of such a coding system.

Table 1  Transmission bits: 1 bit

Table 2  Transmission bits: 2 bits

Table 3  Transmission bits: 3 bits
Code vector    Transmission code
$C^*$          000
$C_0$          001
$C_1$          011
$C_2$          010
$C_3$          111
$C_4$          110
$C_5$          101
$C_6$          100

Table 4  Transmission bits: 4 bits
Code vector    Transmission code
$C^*$          0000
$C_0$          0001
$C_1$          0011
$C_2$          0010
$C_3$          0111
$C_4$          0110
$C_5$          0101
$C_6$          0100
$C_7$          1111
$C_8$          1110
$C_9$          1101
$C_{10}$       1100
$C_{11}$       1011
$C_{12}$       1010
$C_{13}$       1001
$C_{14}$       1000

In the 4-bit case, for example, this code system is defined as follows.

$C_{11} = \Delta_0 - \Delta_1 + \Delta_2 + \Delta_3$ has four delta-vector elements. Read from the top level down, its signs are $(+, -, +, +)$. Writing $+$ as 1 and $-$ as 0 gives the code "1011".

$C_2 = \Delta_0 - \Delta_1$ has only two delta-vector elements, with signs $(+, -)$. In this case the signs are treated as $(0, 0, +, -)$, so the code is "0010". Table 5 shows how information encoded in this way is reproduced when one bit is dropped, reducing 4 bits to 3 bits.
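The assignment rule illustrated by $C_{11}$ and $C_2$ can be sketched as a pair of encode/decode helpers. Assumptions: the function names and the use of index $-1$ for $C^*$ are hypothetical choices, node numbering follows Table 4, and the bit lost in transmission is taken to be the last transmitted bit. The checks reproduce Tables 3 and 4. They also confirm that dropping the last bit, or the last two bits, decodes to the parent or to the ancestor two levels higher.

```python
def encode_scheme1(k, bits):
    """Code of C_k (k = -1 means the zero vector C*): leading zeros, a 1 for
    delta_0, then one bit per later delta vector (1 for +, 0 for -)."""
    if k < 0:
        return "0" * bits
    path = []
    n = k
    while n > 0:                                 # climb to the root
        path.append("1" if n % 2 else "0")       # odd node = "+", even node = "-"
        n = (n - 1) // 2
    depth = len(path) + 1                        # number of delta vectors in C_k
    return "0" * (bits - depth) + "1" + "".join(reversed(path))


def decode_scheme1(code):
    """Inverse of encode_scheme1: return k, or -1 for C*."""
    body = code.lstrip("0")
    if not body:
        return -1
    k = 0                                        # the first 1 marks C_0 = delta_0
    for bit in body[1:]:
        k = 2 * k + 1 if bit == "1" else 2 * k + 2
    return k


print(encode_scheme1(11, 4), encode_scheme1(2, 4))   # 1011 0010
print(decode_scheme1(encode_scheme1(13, 4)[:-1]))    # 6, the parent of C_13
```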

Table 5

If two bits are lost, the data is reproduced as shown in Table 6.

Table 6

In this case, each code vector is reproduced as its ancestor two levels higher in the hierarchy. Tables 7 to 10 show another example of the embedded coding system of the present invention.

Table 7  Transmission bits: 1 bit
Code vector    Transmission code
$C^*$          0
$C_0$          1

Table 8  Transmission bits: 2 bits

Table 9  Transmission bits: 3 bits
Code vector    Transmission code
$C^*$          000
$C_0$          001
$C_1$          010
$C_2$          011
$C_3$          100
$C_4$          101
$C_5$          110
$C_6$          111

Table 10  Transmission bits: 4 bits

In this coding system, if one bit is lost, the parent vector is reproduced. If two bits are lost, the ancestor vector two levels higher is reproduced.
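The surviving rows of Tables 7 and 9 are consistent with a simple rule: $C^*$ is 0 and $C_k$ is $k+1$ written in binary. With the node numbering of Table 4, dropping the last bit then halves the number, which gives exactly the parent's code. The rule is inferred from those rows rather than stated in the text, and the function names and the index $-1$ for $C^*$ are hypothetical. The sketch below checks the rule against Table 9 and against the parent and ancestor behaviour described above.

```python
def encode_scheme2(k, bits):
    """Code of C_k (k = -1 means C*): the number k + 1 in plain binary."""
    return format(k + 1, f"0{bits}b")


def decode_scheme2(code):
    """Inverse of encode_scheme2: return k, or -1 for C* (or an empty code)."""
    return int(code, 2) - 1 if code else -1


print([encode_scheme2(k, 3) for k in range(-1, 7)])   # Table 9, from C* to C_6
print(decode_scheme2(encode_scheme2(6, 3)[:-1]))      # 2, the parent of C_6
```

Compared with the first scheme, this assignment needs no sign bookkeeping: the parent of the node coded $n$ is simply the node coded $\lfloor n/2 \rfloor$.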

Claims


1. A speech coding method for encoding an input speech signal vector by means of the index assigned to the code vector, among code vectors given in advance, having the smallest distance from the input speech signal vector, the method comprising the steps of:

a) storing a plurality of differential code vectors;

c) evaluating the power amplification factor of each differential code vector multiplied by the matrix;

d) rearranging the differential code vectors multiplied by the matrix in order of the magnitude of the evaluated power amplification factor;

e) selecting, from the rearranged vectors, a predetermined number of vectors in descending order of the evaluated power amplification factor;

f) evaluating the distance between the input speech signal vector and each code vector that is to be generated by sequentially adding and subtracting the selected vectors on a tree structure and that has been subjected to linear prediction synthesis filtering; and

g) determining the code vector with the smallest evaluated distance.

2. The method of claim 1, wherein each of the difference code vectors is normalized.

3. The method according to claim 1, wherein step f) includes:

calculating the cross-correlation between each of the selected vectors and the input speech signal vector and sequentially adding and subtracting these on the tree structure, thereby calculating the cross-correlation $R_{XC}$ between the input speech signal vector and the code vector subjected to linear prediction synthesis filtering;

calculating the autocorrelation of each of the selected vectors and the cross-correlations of all combinations of different vectors and sequentially adding and subtracting these on the tree structure, thereby calculating the autocorrelation $R_{CC}$ of the code vector subjected to linear prediction synthesis filtering; and

calculating, for each code vector, $R_{XC}^2 / R_{CC}$ by dividing the square of the cross-correlation $R_{XC}$ by the autocorrelation $R_{CC}$;

and wherein step g) includes determining the code vector giving the maximum value of $R_{XC}^2 / R_{CC}$ to be the code vector with the minimum distance from the input speech signal vector.

4. A speech coding apparatus for encoding an input speech signal vector by means of the index assigned to the code vector, among code vectors given in advance, having the smallest distance from the input speech signal vector, the apparatus comprising:

means for storing a plurality of differential code vectors;

means for evaluating the power amplification factor of each differential code vector multiplied by the matrix;

means for rearranging the differential code vectors multiplied by the matrix in order of the magnitude of the evaluated power amplification factor;

means for selecting, from the rearranged vectors, a predetermined number of vectors in descending order of the evaluated power amplification factor;

means for evaluating the distance between the input speech signal vector and each code vector that is to be generated by sequentially adding and subtracting the selected vectors on a tree structure and that has been subjected to linear prediction synthesis filtering; and

means for determining the code vector with the smallest evaluated distance.

5. The apparatus according to claim 4, wherein each of the difference code vectors is normalized.

6. The apparatus according to claim 4, wherein the distance evaluating means comprises:

means for calculating the cross-correlation between each of the selected vectors and the input speech signal vector and sequentially adding and subtracting these on the tree structure, thereby calculating the cross-correlation $R_{XC}$ between the input speech signal vector and the code vector subjected to linear prediction synthesis filtering;

means for calculating the autocorrelation of each of the selected vectors and the cross-correlations of all combinations of different vectors and sequentially adding and subtracting these on the tree structure, thereby calculating the autocorrelation $R_{CC}$ of the code vector subjected to linear prediction synthesis filtering; and

means for calculating, for each code vector, $R_{XC}^2 / R_{CC}$ by dividing the square of the cross-correlation $R_{XC}$ by the autocorrelation $R_{CC}$;

and wherein the code vector determining means comprises means for determining the code vector giving the maximum value of $R_{XC}^2 / R_{CC}$ to be the code vector with the minimum distance from the input speech signal vector.

7. A variable-length speech coding method for variable-length encoding an input speech signal vector by means of a variable-bit-length code assigned to the code vector, among code vectors given in advance, having the smallest distance from the input speech signal vector, the method comprising the steps of:

a) storing a plurality of differential code vectors;

b) evaluating the distance between the input speech signal vector and each code vector to be generated by sequentially adding and subtracting on a tree structure, starting from the head, a number of differential code vectors corresponding to the desired code bit length;

c) determining the code vector with the smallest evaluated distance; and

d) determining the code of the desired code bit length to be assigned to the determined code vector.

8. The method according to claim 7, further comprising the step of multiplying each of the differential code vectors by a matrix of a linear prediction synthesis filter,

wherein in step b), the distance is evaluated between the input speech signal vector and each code vector that is to be generated by sequentially adding and subtracting the differential code vectors multiplied by the matrix on the tree structure and that has been subjected to linear prediction synthesis filtering.

9. The method according to claim 8, wherein step b) includes:

calculating the cross-correlation between each of the differential code vectors multiplied by the matrix and the input speech signal vector and sequentially adding and subtracting these on the tree structure, thereby calculating the cross-correlation $R_{XC}$ between the input speech signal vector and the code vector subjected to linear prediction synthesis filtering;

calculating the autocorrelation of each differential code vector multiplied by the matrix and the cross-correlations of all combinations of different vectors and sequentially adding and subtracting these on the tree structure, thereby calculating the autocorrelation $R_{CC}$ of the code vector subjected to linear prediction synthesis filtering; and

evaluating, for each code vector, $R_{XC}^2 / R_{CC}$ by dividing the square of the cross-correlation $R_{XC}$ by the autocorrelation $R_{CC}$;

and wherein step c) includes determining the code vector giving the maximum value of $R_{XC}^2 / R_{CC}$ to be the code vector with the minimum distance from the input speech signal vector.

10. The method according to claim 9, further comprising the steps of:

evaluating the power amplification factor of each differential code vector multiplied by the matrix; and

rearranging the differential code vectors multiplied by the matrix in order of the magnitude of the evaluated power amplification factor;

wherein in step b), the addition and subtraction on the tree structure are performed in the rearranged order.

11. The method according to claim 10, further comprising the step of selecting, from the rearranged vectors, a predetermined number of vectors in descending order of the evaluated power amplification factor,

wherein in step b), the addition and subtraction on the tree structure are performed on the selected vectors.

12. A variable-length speech coding apparatus for variable-length encoding an input speech signal vector by means of a variable-bit-length code assigned to the code vector, among code vectors given in advance, having the smallest distance from the input speech signal vector, the apparatus comprising:

means for storing a plurality of differential code vectors;

means for evaluating the distance between the input speech signal vector and each code vector to be generated by sequentially adding and subtracting on a tree structure, starting from the head, a number of differential code vectors corresponding to the desired code bit length;

means for determining the code vector with the smallest evaluated distance; and

means for determining the code of the desired code bit length to be assigned to the determined code vector.

13. The apparatus according to claim 12, further comprising means for multiplying each of the differential code vectors by a matrix of a linear prediction synthesis filter,

wherein the distance evaluating means evaluates the distance between the input speech signal vector and each code vector that is to be generated by sequentially adding and subtracting the differential code vectors multiplied by the matrix on the tree structure and that has been subjected to linear prediction synthesis filtering.

14. The apparatus according to claim 13, wherein the distance evaluating means comprises:

means for calculating the cross-correlation between each of the differential code vectors multiplied by the matrix and the input speech signal vector and sequentially adding and subtracting these on the tree structure, thereby calculating the cross-correlation $R_{XC}$ between the input speech signal vector and the code vector subjected to linear prediction synthesis filtering;

means for calculating the autocorrelation of each of the differential code vectors multiplied by the matrix and the cross-correlations of all combinations of different vectors and sequentially adding and subtracting these on the tree structure, thereby calculating the autocorrelation $R_{CC}$ of the code vector subjected to linear prediction synthesis filtering; and

means for evaluating, for each code vector, $R_{XC}^2 / R_{CC}$ by dividing the square of the cross-correlation $R_{XC}$ by the autocorrelation $R_{CC}$;

and wherein the code vector determining means comprises means for determining the code vector giving the maximum value of $R_{XC}^2 / R_{CC}$ to be the code vector with the minimum distance from the input speech signal vector.

15. The apparatus according to claim 14, further comprising means for evaluating the power amplification factor of each differential code vector multiplied by the matrix;

wherein the distance evaluating means performs the addition and subtraction on the tree structure in the rearranged order.

16. The apparatus according to claim 15, further comprising means for selecting, from the rearranged vectors, a predetermined number of vectors in descending order of the evaluated power amplification factor,

wherein the distance evaluating means performs the addition and subtraction on the tree structure on the selected vectors.

17. The method according to claim 7, wherein a code is assigned to each of the code vectors such that, if one bit is lost, the code corresponds to the code vector of its parent in the tree structure.

18. The apparatus according to claim 12, wherein a code is assigned to each of the code vectors such that, if one bit is lost, the code corresponds to the code vector of its parent in the tree structure.