JP4236675B2

JP4236675B2 - Speech code conversion method and apparatus

Info

Publication number: JP4236675B2
Application number: JP2006205814A
Authority: JP
Inventors: 政直鈴木; 恭士大田; 義照土永; 正清田中
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2006-07-28
Filing date: 2006-07-28
Publication date: 2009-03-11
Anticipated expiration: 2022-01-29
Also published as: JP2006350373A

Abstract

PROBLEM TO BE SOLVED: To enable speech code conversion even between speech coding schemes that differ in subframe length, to reduce degradation of sound quality, and to perform code conversion matched to each EVRC rate while keeping the delay as short as possible.

SOLUTION: Speech code conversion units for the full rate, the half rate, and the 1/8 rate are provided, corresponding to the rates of a first speech code. When the first speech code is at the 1/8 rate, the 1/8-rate speech code conversion unit (1) quantizes, by a second speech coding scheme, the dequantized value of the LSP code contained in the speech code of the first speech coding scheme to obtain the LSP code of a second speech code, (2) generates a target signal and an algebraic synthesis signal and obtains the algebraic code of the second speech coding scheme such that the difference between the target signal and the algebraic synthesis signal is minimized, and (3) obtains and outputs the gain code of the second speech coding scheme using the dequantized value of the pitch lag code obtained in full-rate or half-rate speech code conversion, the algebraic code, and the like.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a speech code conversion method and apparatus for converting a speech code, obtained by encoding with a first speech coding scheme, into a speech code of a second speech coding scheme. In particular, it relates to a speech code conversion method and apparatus for converting a speech code obtained by encoding speech with a first speech coding scheme used on the Internet, in mobile phone systems, and the like into a speech code of a different, second speech coding scheme.

In recent years the number of mobile phone subscribers has grown explosively, and the number of users is expected to keep growing. Voice communication over the Internet (Voice over IP; VoIP) is also spreading in fields such as corporate networks (intranets) and long-distance telephone services. Voice communication systems such as mobile phones and VoIP use speech coding techniques that compress speech in order to use communication lines efficiently. Mobile phones use different speech coding techniques depending on the country or the system. cdma2000, which is expected to serve as a next-generation mobile phone system, adopts EVRC (Enhanced Variable Rate CODEC) as its speech coding scheme. In VoIP, on the other hand, ITU-T Recommendation G.729A is widely used as the speech coding scheme. G.729A and EVRC are first outlined below.

(1) Description of G.729A
・Encoder configuration and operation
FIG. 15 is a block diagram of an encoder conforming to ITU-T Recommendation G.729A. In FIG. 15, an input signal (speech signal) X with a predetermined number of samples (= N) per frame is input to the LPC analysis unit 1 frame by frame. With a sampling rate of 8 kHz and a frame period of 10 msec, one frame is 80 samples. The LPC analysis unit 1 regards the human vocal tract as an all-pole filter expressed by
$$H(z) = \frac{1}{1 + \sum_{i=1}^{P} \alpha_i z^{-i}} \qquad (1)$$
and obtains the coefficients α_i (i = 1, ..., P) of this filter. Here, P is the filter order. For telephone-band speech, a value of 10 to 12 is generally used for P. The LPC (linear prediction) analysis unit 1 performs LPC analysis on a total of 240 samples, consisting of the 80 samples of the input signal, 40 look-ahead samples, and 120 past samples, to obtain the LPC coefficients.
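
As an illustration of equation (1), the all-pole filter can be run as a simple recursion. The following is a minimal sketch rather than the G.729A implementation; the function name and the zero initial state are assumptions made for the example.

```python
from typing import List, Sequence


def all_pole_synthesis(x: Sequence[float], alpha: Sequence[float]) -> List[float]:
    """Filter x through H(z) = 1 / (1 + sum_i alpha_i z^-i), starting from a zero state."""
    y: List[float] = []
    for n, sample in enumerate(x):
        acc = float(sample)
        # Recursion implied by equation (1): y[n] = x[n] - sum_i alpha_i * y[n - i]
        for i, a in enumerate(alpha, start=1):
            if n - i >= 0:
                acc -= a * y[n - i]
        y.append(acc)
    return y
```

Feeding a unit impulse through the function yields the impulse response of the filter, which is the quantity denoted A in the codebook searches described later.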

The parameter conversion unit 2 converts the LPC coefficients into LSP (line spectrum pair) parameters. LSP parameters are frequency-domain parameters that can be converted to and from LPC coefficients; because their quantization characteristics are better than those of LPC coefficients, quantization is performed in the LSP domain. The LSP quantization unit 3 quantizes the converted LSP parameters to obtain an LSP code and an LSP dequantized value. The LSP interpolation unit 4 obtains an LSP interpolated value from the LSP dequantized value of the current frame and that of the previous frame. That is, one frame is divided into two 5-msec subframes, the first and the second; the LPC analysis unit 1 determines the LPC coefficients of the second subframe but not those of the first. The LSP interpolation unit 4 therefore estimates the LSP dequantized value of the first subframe by interpolating between the LSP dequantized values obtained in the current and previous frames.

The parameter inverse conversion unit 5 converts the LSP dequantized value and the LSP interpolated value into LPC coefficients and sets them in the LPC synthesis filter 6. As the filter coefficients of the LPC synthesis filter 6, the LPC coefficients converted from the LSP interpolated value are used in the first subframe of the frame, and the LPC coefficients converted from the LSP dequantized value are used in the second subframe. In the following, the character "l" that carries a subscript or argument, as in lsp_i, l_i(n), ..., is the lowercase letter L, not the digit 1.

The LSP parameters lsp_i (i = 1, ..., P) are quantized in the LSP quantization unit 3 by scalar quantization, vector quantization, or the like, and the quantization index (LSP code) is then transmitted to the decoder. FIG. 16 illustrates the quantization method. The quantization table 3a stores a large number of sets of quantized LSP parameters, one set for each index number 1 to n. The distance calculation unit 3b calculates the distance
$$d = \sum_{i=1}^{P} \left\{ lsp_q(i) - lsp_i \right\}^2$$
As q is varied from 1 to n, the minimum distance index detection unit 3c finds the q that minimizes the distance d and transmits that index q to the decoder as the LSP code.
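
The table search described above amounts to a nearest-neighbor lookup. The sketch below shows the idea with a toy table; the function name and the table contents are hypothetical and are not the G.729A codebook.

```python
from typing import Sequence


def lsp_code(lsp: Sequence[float], table: Sequence[Sequence[float]]) -> int:
    """Return the 1-based index q whose table entry is closest to lsp."""
    best_q, best_d = 1, float("inf")
    for q, entry in enumerate(table, start=1):
        # d = sum_i (lsp_q(i) - lsp_i)^2, as in the distance formula above
        d = sum((e - v) ** 2 for e, v in zip(entry, lsp))
        if d < best_d:
            best_q, best_d = q, d
    return best_q
```

For example, with the hypothetical table `[[0.1, 0.2], [0.3, 0.5], [0.6, 0.9]]`, the input `[0.32, 0.48]` maps to index 2.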

Next, the excitation (sound source) and gain search is performed, subframe by subframe. The excitation signal is first divided into a pitch-period component and a noise component. The pitch-period component is quantized using the adaptive codebook 7, which stores past excitation signal sequences, and the noise component is quantized using an algebraic codebook, a noise codebook, or the like. The following describes a speech coding scheme that uses two excitation codebooks, the adaptive codebook 7 and the algebraic codebook 8.

The adaptive codebook 7 outputs N samples of excitation signal (called a periodic signal), each successively delayed by one sample, corresponding to indexes 1 to L. FIG. 17 shows the configuration of the adaptive codebook 7 when one subframe is 40 samples (N = 40). It consists of a buffer BF that stores the pitch-period component of the latest (L + 39) samples: index 1 specifies the periodic signal consisting of samples 1 to 40, index 2 specifies the periodic signal consisting of samples 2 to 41, ..., and index L specifies the periodic signal consisting of samples L to L + 39. In the initial state, the adaptive codebook 7 holds an all-zero signal. In every subframe, the oldest samples are discarded by one subframe length and the excitation signal obtained in the current subframe is stored in the adaptive codebook 7.
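
The buffer behavior described above can be sketched as follows. The class and method names are hypothetical, and the ordering of the buffer (oldest sample first, so that index 1 reads the oldest N samples) is an assumption, since the text does not fix the direction.

```python
from typing import List, Sequence


class AdaptiveCodebook:
    """Toy adaptive codebook holding the latest (max_index + n - 1) excitation samples."""

    def __init__(self, max_index: int, n: int = 40) -> None:
        self.n = n
        self.max_index = max_index
        # Initial state: every amplitude is zero.
        self.buf: List[float] = [0.0] * (max_index + n - 1)

    def periodic_signal(self, index: int) -> List[float]:
        """Index k selects buffer samples k .. k + n - 1 (1-based, assumed oldest first)."""
        if not 1 <= index <= self.max_index:
            raise ValueError("index out of range")
        return self.buf[index - 1:index - 1 + self.n]

    def update(self, excitation: Sequence[float]) -> None:
        """Drop the oldest n samples and append the current subframe's excitation."""
        if len(excitation) != self.n:
            raise ValueError("excitation must be one subframe long")
        self.buf = self.buf[self.n:] + [float(v) for v in excitation]
```

Running the same `update` call in the encoder and in the decoder is what keeps the two adaptive codebooks in the same state.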

The adaptive codebook search identifies the periodic component of the excitation signal using the adaptive codebook 7, which stores past excitation signals. That is, while the read start point in the adaptive codebook 7 is shifted one sample at a time, a subframe length (= 40 samples) of past excitation signal is taken out of the adaptive codebook 7 and input to the LPC synthesis filter 6 to generate the pitch synthesis signal β·A·P_L. Here, P_L is the past periodic signal (adaptive code vector) taken out of the adaptive codebook 7 at lag L, A is the impulse response of the LPC synthesis filter 6, and β is the adaptive codebook gain.

The arithmetic unit 9 obtains the error power E_L between the input speech X and β·A·P_L by
$$E_L = \left| X - \beta A P_L \right|^2 \qquad (2)$$
Let A·P_L be the weighted synthesis output of the adaptive codebook output, R_pp the autocorrelation of A·P_L, and R_xp the cross-correlation between A·P_L and the input signal X. The adaptive code vector P_L at the pitch lag L_opt that minimizes the error power of equation (2) is then given by
$$P_L = \arg\max_{L} \left( \frac{R_{xp}^2}{R_{pp}} \right) \qquad (3)$$
That is, the optimum read start point is the one that maximizes the squared cross-correlation R_xp between the pitch synthesis signal A·P_L and the input signal X, normalized by the autocorrelation R_pp of the pitch synthesis signal. The error power evaluation unit 10 thus finds the pitch lag L_opt that satisfies equation (3). The optimum pitch gain β_opt is then given by
$$\beta_{opt} = \frac{R_{xp}}{R_{pp}} \qquad (4)$$
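
Equations (3) and (4) can be sketched as a loop over candidate lags. In this minimal example the synthesized candidates A·P_L are assumed to be precomputed and passed in as a dictionary keyed by lag; the function name and that interface are illustrative assumptions, not part of G.729A.

```python
from typing import Dict, Optional, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(u * v for u, v in zip(a, b))


def search_adaptive(x: Sequence[float],
                    synthesized: Dict[int, Sequence[float]]) -> Tuple[Optional[int], float]:
    """Return (L_opt, beta_opt) maximizing Rxp^2 / Rpp over the given lags."""
    best_lag, best_gain, best_score = None, 0.0, float("-inf")
    for lag, ap in synthesized.items():
        rpp = dot(ap, ap)          # autocorrelation of A*P_L
        if rpp <= 0.0:
            continue               # skip all-zero candidates
        rxp = dot(x, ap)           # cross-correlation with the input X
        score = rxp * rxp / rpp    # criterion of equation (3)
        if score > best_score:
            best_lag, best_gain, best_score = lag, rxp / rpp, score  # equation (4)
    return best_lag, best_gain
```

With `x = [1, 2, 3, 4]` and a candidate `[0.5, 1, 1.5, 2]` at lag 21, the search returns lag 21 with gain 2.0, because that candidate scaled by 2 reproduces x exactly.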

Next, the noise component contained in the excitation signal is quantized using the algebraic codebook 8. The algebraic codebook 8 consists of a plurality of pulses of amplitude +1 or -1. As an example, FIG. 18 shows the pulse positions for a subframe length of 40 samples. The algebraic codebook 8 divides the N (= 40) sample points of one subframe into pulse system groups 1 to 4 and, for every combination formed by taking one sample point from each pulse system group, sequentially outputs as the noise component a pulse signal having a +1 or -1 pulse at each selected sample point. In this example, four pulses are basically placed per subframe. FIG. 19 shows the sample points assigned to the pulse system groups 1 to 4:
(1) pulse system group 1 is assigned the eight sample points 0, 5, 10, 15, 20, 25, 30, 35;
(2) pulse system group 2 is assigned the eight sample points 1, 6, 11, 16, 21, 26, 31, 36;
(3) pulse system group 3 is assigned the eight sample points 2, 7, 12, 17, 22, 27, 32, 37;
(4) pulse system group 4 is assigned the sixteen sample points 3, 4, 8, 9, 13, 14, 18, 19, 23, 24, 28, 29, 33, 34, 38, 39.
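
The sample-point assignment listed above can be written down compactly. The sketch below also counts the bits needed to select one signed pulse from each group (position bits plus one polarity bit); the constant and function names are hypothetical.

```python
from typing import List

# Pulse system groups 1 to 4 for a 40-sample subframe, as listed above.
PULSE_GROUPS: List[List[int]] = [
    list(range(0, 40, 5)),
    list(range(1, 40, 5)),
    list(range(2, 40, 5)),
    sorted(list(range(3, 40, 5)) + list(range(4, 40, 5))),
]


def bits_for_group(group: List[int]) -> int:
    """Bits to pick one position in the group, plus one bit for the pulse polarity."""
    position_bits = (len(group) - 1).bit_length()
    return position_bits + 1


TOTAL_BITS = sum(bits_for_group(g) for g in PULSE_GROUPS)  # 4 + 4 + 4 + 5
CODEBOOK_SIZE = 2 ** TOTAL_BITS                            # number of distinct pulse signals
```

The four groups together cover every sample point 0 to 39 exactly once, which is why one position per group is enough to place four non-overlapping pulses.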

Three bits are needed to express the sample point of each of pulse system groups 1 to 3 and one bit to express the pulse polarity, four bits in total per group. Four bits are needed to express the sample point of pulse system group 4 and one bit for the polarity, five bits in total. Accordingly, 17 bits are needed to specify the pulse signal output from the algebraic codebook 8 having the pulse arrangement of FIG. 18, and there are $2^{17}$ ($= 2^4 \times 2^4 \times 2^4 \times 2^5$) kinds of pulse signal.
As shown in FIG. 18, the pulse positions of each pulse system are restricted. The algebraic codebook search selects, from the combinations of pulse positions of the pulse systems, the combination of pulses that minimizes the error power with respect to the input speech in the reproduced-signal domain. That is, with β_opt the optimum pitch gain found in the adaptive codebook search, the adaptive codebook output P_L is multiplied by β_opt and input to the adder 11. At the same time, pulse signals are input sequentially from the algebraic codebook 8 to the adder 11, and the pulse signal is identified that minimizes the difference between the input signal X and the reproduced signal obtained by inputting the adder output to the LPC synthesis filter 6. Specifically, a target vector X′ for the algebraic codebook search is first generated from the input signal X, the optimum adaptive codebook output P_L found in the adaptive codebook search, and the optimum pitch gain β_opt by

$$X' = X - \beta_{opt} A P_L \qquad (5)$$
In this example, the pulse positions and amplitudes (polarities) are expressed in 17 bits as described above, so there are $2^{17}$ combinations. Letting C_k be the k-th algebraic code output vector, the algebraic codebook search finds the code vector C_k that minimizes the evaluation-function error power D in
$$D = \left| X' - G_C A C_k \right|^2 \qquad (6)$$
where G_C is the algebraic codebook gain. In the algebraic codebook search, the error power evaluation unit 10 searches for the combination of pulse positions and polarities that maximizes the normalized cross-correlation value (R_cx·R_cx / R_cc), obtained by normalizing the square of the cross-correlation R_cx between the algebraic synthesis signal A·C_k and the target signal X′ by the autocorrelation R_cc of the algebraic synthesis signal. When the pitch lag is shorter than the subframe length, a pitch periodization unit can be provided to improve sound quality; it performs pitch periodization, which gives the algebraic codebook output periodicity. The output of the algebraic codebook search is the position and sign (positive or negative) of each pulse, which are collectively called the algebraic code.
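
The search criterion above can be sketched as an exhaustive loop over pulse positions and polarities. This is a toy version for small track sets, not the fast search used by real codecs; the function names are hypothetical, and skipping candidates with non-positive R_cx (so that the implied gain G_C is positive) is an assumption added for the example.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple


def synthesize(c: Sequence[float], h: Sequence[float]) -> List[float]:
    """A*C: convolve code vector c with impulse response h, truncated to len(c)."""
    n = len(c)
    out = [0.0] * n
    for i, ci in enumerate(c):
        if ci == 0.0:
            continue
        for j in range(min(len(h), n - i)):
            out[i + j] += ci * h[j]
    return out


def search_algebraic(target: Sequence[float], h: Sequence[float],
                     tracks: Sequence[Sequence[int]]
                     ) -> Optional[Tuple[Tuple[int, ...], Tuple[int, ...]]]:
    """Return (positions, signs) maximizing Rcx^2 / Rcc, one pulse per track."""
    n = len(target)
    best, best_score = None, -1.0
    for positions in product(*tracks):
        for signs in product((1, -1), repeat=len(tracks)):
            c = [0.0] * n
            for p, s in zip(positions, signs):
                c[p] += s
            ac = synthesize(c, h)
            rcc = sum(v * v for v in ac)
            rcx = sum(t * v for t, v in zip(target, ac))
            if rcc == 0.0 or rcx <= 0.0:   # assumption: keep positive-gain candidates only
                continue
            score = rcx * rcx / rcc
            if score > best_score:
                best, best_score = (positions, signs), score
    return best
```

A full 17-bit search with this loop would visit all $2^{17}$ candidates, so the sketch is best tried on reduced tracks such as `[[0, 2, 4, 6], [1, 3, 5, 7]]` with an 8-sample target.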

Next, gain quantization is described. In G.729A, the algebraic codebook gain is not quantized directly. Instead, the adaptive codebook gain G_a (= β_opt) and a correction coefficient γ for the algebraic codebook gain G_C are vector-quantized. The algebraic codebook gain G_C and the correction coefficient γ are related by
$$G_C = g' \times \gamma$$
where g′ is the gain of the current frame predicted from the logarithmic gains of the past four subframes. The gain quantization table (gain codebook, not shown) of the gain quantizer 12 holds 128 ($= 2^7$) combinations of the adaptive codebook gain G_a and the correction coefficient γ for the algebraic codebook gain. The gain codebook search (1) takes one set of table values from the gain quantization table for the adaptive codebook output vector and the algebraic codebook output vector and sets them in the gain variable units 13 and 14, (2) multiplies the respective vectors by the gains G_a and G_C in the gain variable units 13 and 14 and inputs the results to the LPC synthesis filter 6, and (3) selects, in the error power evaluation unit 10, the combination that minimizes the error power with respect to the input signal X.
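
The three-step gain codebook search can be sketched as follows. Because the LPC synthesis filter is linear, the sketch assumes the two codebook vectors have already been passed through it (inputs `ap` and `ac`), which is equivalent to scaling first and filtering afterwards. The function name, the toy table, and the predicted gain value are hypothetical.

```python
from typing import Sequence, Tuple


def search_gain(x: Sequence[float], ap: Sequence[float], ac: Sequence[float],
                g_pred: float, table: Sequence[Tuple[float, float]]) -> int:
    """Return the 0-based table index of the (Ga, gamma) pair with the least error power."""
    best_i, best_e = 0, float("inf")
    for i, (ga, gamma) in enumerate(table):
        gc = g_pred * gamma  # G_C = g' * gamma
        e = sum((xv - ga * p - gc * c) ** 2 for xv, p, c in zip(x, ap, ac))
        if e < best_e:
            best_i, best_e = i, e
    return best_i
```

Only the index is transmitted as the gain code; the decoder repeats the prediction of g′ from past subframes and recovers G_C from the same table entry.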

The line encoding unit 15 then multiplexes (1) the LSP code, which is the LSP quantization index, (2) the pitch lag code L_opt, (3) the algebraic code, which is the algebraic codebook index, and (4) the gain code, which is the gain quantization index, to create line data, and transmits the line data to the decoder. As described above, G.729A models the speech production process and quantizes and transmits the characteristic parameters of that model, which allows speech to be compressed efficiently.

・Decoder configuration and operation
FIG. 20 is a block diagram of a G.729A decoder. Line data sent from the encoder is input to the line decoding unit 21, which outputs the LSP code, pitch lag code, algebraic code, and gain code. The decoder decodes speech data from these codes. Because the decoder's functions are contained in the encoder, part of the following repeats the description above, so it is kept brief.
When the LSP code is input, the LSP dequantization unit 22 dequantizes it and outputs an LSP dequantized value. The LSP interpolation unit 23 interpolates the LSP dequantized value of the first subframe of the current frame from the LSP dequantized value of the second subframe of the current frame and that of the second subframe of the previous frame. The parameter inverse conversion unit 24 then converts the LSP interpolated value and the LSP dequantized value into LPC synthesis filter coefficients. The G.729A LPC synthesis filter 25 uses the LPC coefficients converted from the LSP interpolated value in the first subframe and the LPC coefficients converted from the LSP dequantized value in the second subframe.

The adaptive codebook 26 outputs a pitch signal of subframe length (= 40 samples) from the read start position indicated by the pitch lag code, and the algebraic codebook 27 outputs the pulse positions and pulse polarities read from the location corresponding to the algebraic code. The gain dequantization unit 28 calculates the adaptive codebook gain dequantized value and the algebraic codebook gain dequantized value from the input gain code and sets them in the gain variable units 29 and 30. The adder 31 adds the signal obtained by multiplying the adaptive codebook output by the adaptive codebook gain dequantized value to the signal obtained by multiplying the algebraic codebook output by the algebraic codebook gain dequantized value to create the excitation signal, and inputs this excitation signal to the LPC synthesis filter 25. Reproduced speech is thereby obtained from the LPC synthesis filter 25.
In the initial state, the adaptive codebook 26 on the decoder side holds an all-zero signal. In every subframe, the oldest samples are discarded by one subframe length and the excitation signal obtained in the current subframe is stored in the adaptive codebook 26. In other words, the adaptive codebooks of the encoder and the decoder are always kept in the same, latest state.
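
The excitation reconstruction performed by the adder 31 can be sketched in a few lines. The helper names and the representation of the decoded algebraic code as (position, sign) pairs are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def pulses_to_vector(pulses: Sequence[Tuple[int, int]], n: int) -> List[float]:
    """Expand decoded (position, sign) pairs into an n-sample algebraic code vector."""
    c = [0.0] * n
    for position, sign in pulses:
        c[position] += float(sign)
    return c


def build_excitation(adaptive_out: Sequence[float], algebraic_out: Sequence[float],
                     ga: float, gc: float) -> List[float]:
    """Excitation = Ga * adaptive codebook output + Gc * algebraic codebook output."""
    return [ga * p + gc * c for p, c in zip(adaptive_out, algebraic_out)]
```

The resulting excitation would then be passed through the LPC synthesis filter to obtain reproduced speech and stored in the adaptive codebook for the next subframe.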

(2) Description of EVRC
EVRC is characterized by varying the number of transmitted bits per frame according to the nature of the input signal. The bit rate is raised in steady portions such as vowels, and the number of transmitted bits is lowered in silent or transient portions, which reduces the time-averaged bit rate. Table 1 shows the EVRC bit rates.

In EVRC, rate determination is performed on the input signal of the current frame. The frequency range of the input speech signal is divided into a low band and a high band, and the power of each band is calculated. The power of each band is compared with two predetermined thresholds: the full rate is selected when both the low-band power and the high-band power are above the thresholds, the half rate when only one of the two is above its threshold, and the 1/8 rate when both are below the thresholds.
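
The decision rule described above can be sketched as a small function. The function name, the use of one threshold per band, and the string labels are assumptions for illustration; the actual EVRC rate determination algorithm involves more state than this.

```python
def select_rate(low_power: float, high_power: float,
                low_threshold: float, high_threshold: float) -> str:
    """Pick the frame rate from the two band powers, following the rule in the text."""
    low_above = low_power > low_threshold
    high_above = high_power > high_threshold
    if low_above and high_above:
        return "full"
    if low_above or high_above:
        return "half"
    return "eighth"
```

For instance, `select_rate(5.0, 0.1, 1.0, 1.0)` gives `"half"` because only the low band exceeds its threshold.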

FIG. 21 shows the configuration of the EVRC encoder. In EVRC, the input signal, divided into 20-msec (160-sample) frames, is input to the encoder. Each frame of the input signal is further divided into three subframes, as shown in Table 2. The encoder configuration is almost identical for the full rate and the half rate, differing only in the number of quantization bits of each quantizer, so the full rate is described below.

As shown in FIG. 22, the LPC (linear prediction) analysis unit 41 obtains the LPC coefficients by LPC analysis using a total of 240 samples: the 160 samples of the current frame's input signal and 80 look-ahead samples. The LSP quantization unit 42 converts the LPC coefficients to LSP parameters and quantizes them to obtain the LSP code, and the LSP dequantization unit 43 obtains the LSP dequantized value from the LSP code. The LSP interpolation unit 44 obtains the LSP dequantized values of the first, second, and third subframes of the current frame by linear interpolation between the LSP dequantized value obtained in the current frame (the LSP dequantized value of the third subframe) and the LSP dequantized value of the third subframe obtained in the previous frame.
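
The per-subframe linear interpolation can be sketched as below. The default weights (1/3, 2/3, 1) are hypothetical placeholders chosen only so that the third subframe equals the current frame's value; they are not taken from the EVRC specification.

```python
from typing import List, Sequence


def interpolate_lsp(prev: Sequence[float], cur: Sequence[float],
                    weights: Sequence[float] = (1 / 3, 2 / 3, 1.0)) -> List[List[float]]:
    """Return one LSP vector per subframe: (1 - w) * prev + w * cur for each weight w."""
    return [[(1.0 - w) * p + w * c for p, c in zip(prev, cur)] for w in weights]
```

The same helper covers the G.729A case described earlier if a single weight is passed for the first subframe, for example `weights=(0.5,)`, again as an illustrative choice.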

Next, the pitch analysis unit 45 obtains the pitch lag and pitch gain of the current frame. EVRC performs pitch analysis twice per frame. The analysis window positions for pitch analysis are as shown in FIG. 22. The pitch analysis procedure is as follows.
(1) The input signal of the current frame and the look-ahead signal are input to an LPC inverse filter built from the LPC coefficients to obtain the LPC residual signal. If the LPC synthesis filter is H(z), the LPC inverse filter is 1/H(z).
(2) The autocorrelation function of the LPC residual signal is computed, and the pitch lag and gain at which the autocorrelation function is maximized are obtained.
(3) The above processing is performed at two analysis window positions. The pitch lag and pitch gain obtained in the first analysis are denoted Lag1 and Gain1, and those obtained in the second analysis are denoted Lag2 and Gain2.
(4) When the difference between Gain1 and Gain2 is larger than a predetermined threshold, Gain1 and Lag1 are taken as the pitch gain and pitch lag of the current frame. When it is equal to or smaller than the threshold, Gain2 and Lag2 are taken as the pitch gain and pitch lag of the current frame.
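
Steps (2) and (4) of the procedure above can be sketched as follows. The function names are hypothetical, and two details are assumptions not fixed by the text: the gain is taken as the autocorrelation at the best lag divided by the zero-lag autocorrelation, and the "difference" in step (4) is taken as the signed value Gain1 - Gain2.

```python
from typing import Sequence, Tuple


def autocorr(r: Sequence[float], lag: int) -> float:
    return sum(r[n] * r[n - lag] for n in range(lag, len(r)))


def pitch_from_residual(residual: Sequence[float],
                        min_lag: int, max_lag: int) -> Tuple[int, float]:
    """Step (2): lag maximizing the residual autocorrelation, with a normalized gain."""
    energy = autocorr(residual, 0)
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        val = autocorr(residual, lag)
        if val > best_val:
            best_lag, best_val = lag, val
    gain = best_val / energy if energy > 0.0 else 0.0  # assumed gain definition
    return best_lag, gain


def select_pitch(lag1: int, gain1: float, lag2: int, gain2: float,
                 threshold: float) -> Tuple[int, float]:
    """Step (4): keep the first analysis only if Gain1 exceeds Gain2 by more than threshold."""
    if gain1 - gain2 > threshold:  # assumed signed difference
        return lag1, gain1
    return lag2, gain2
```

On a residual with one pulse every 5 samples, `pitch_from_residual` searching lags 3 to 12 returns lag 5.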

The pitch lag and pitch gain of the current frame are obtained by the above procedure. The pitch gain quantization unit 46 quantizes the pitch gain using a quantization table and outputs a pitch gain code, and the pitch gain dequantization unit 47 dequantizes the pitch gain code and inputs the result to the gain variable unit 48. Whereas G.729A obtains the pitch lag and pitch gain for each subframe, EVRC obtains them for each frame.

EVRC also differs in that the input speech modification unit 49 modifies the input signal according to the pitch lag code. Rather than finding the pitch lag and pitch gain that minimize the error with respect to the input signal, as G.729A does, EVRC has the input speech modification unit 49 modify the input signal so that it comes closest to the adaptive codebook output determined by the pitch lag and pitch gain found by pitch analysis. Specifically, the input speech modification unit 49 converts the input signal into a residual signal with the LPC inverse filter and time-shifts it so that the pitch peak positions in the residual domain coincide with the pitch peak positions of the output of the adaptive codebook 47.

Next, the noise-like excitation signal and the gains are determined subframe by subframe. First, the adaptive codebook synthesis signal, obtained by passing the adaptive codebook output through the gain variable unit 48 and the LPC synthesis filter 51, is subtracted in the arithmetic unit 52 from the modified input signal output by the input speech modification unit 49 to generate the target signal X′ for the algebraic codebook search. Like that of G.729A, the EVRC algebraic codebook 53 consists of a plurality of pulses; at the full rate, 35 bits are allocated per subframe. Table 3 shows the full-rate pulse positions.

The algebraic codebook search method is the same as in G.729A, but the number of pulses selected from each pulse system differs. Of the five pulse systems, three are assigned two pulses each and two are assigned one pulse each. The combinations of systems assigned one pulse are limited to four: T3-T4, T4-T0, T0-T1, and T1-T2. The combinations of pulse system and number of pulses are therefore as shown in Table 4.
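
The four permitted allocations can be enumerated directly from the rule above. This is a small sketch with hypothetical names; it derives the pulse counts from the text rather than reproducing Table 4 itself.

```python
from typing import Dict, List

PULSE_SYSTEMS = ("T0", "T1", "T2", "T3", "T4")
# Pairs of systems that receive a single pulse, as listed in the text.
SINGLE_PULSE_PAIRS = (("T3", "T4"), ("T4", "T0"), ("T0", "T1"), ("T1", "T2"))


def pulse_allocations() -> List[Dict[str, int]]:
    """For each permitted pair, give 1 pulse to the pair and 2 pulses to the other systems."""
    return [{s: (1 if s in pair else 2) for s in PULSE_SYSTEMS}
            for pair in SINGLE_PULSE_PAIRS]
```

Every allocation places 3 × 2 + 2 × 1 = 8 pulses in the subframe.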

Because some pulse systems are assigned one pulse and others two, the number of bits allocated to each pulse system depends on its pulse count. Table 5 shows the bit allocation of the full-rate algebraic codebook.

As Table 4 shows, there are four combinations of single-pulse systems, so 2 bits are needed to identify the combination. If the 11 pulse positions of each of the two single-pulse systems are arranged along the X and Y directions, an 11 × 11 grid is formed, and one grid point specifies the pulse positions of both systems. Since 11 × 11 = 121 does not exceed 2^7 = 128, 7 bits suffice to specify the pulse positions of the two single-pulse systems, and 2 bits are needed to express their polarities. Likewise, 7 × 3 bits are needed to specify the pulse positions of the three two-pulse systems, and 1 × 3 bits to express their polarities; the two pulses within one system have the same polarity. In total, 2 + 7 + 2 + 21 + 3 = 35 bits represent the algebraic code in EVRC.
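The bit count above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the patent: the function and key names are illustrative, and the 7 bits per two-pulse system is taken as stated in the text rather than derived.

```python
def bits_for(n_values):
    """Smallest number of bits that can index n_values distinct values."""
    return (n_values - 1).bit_length()

def full_rate_algebraic_bits():
    """Bit budget of the full-rate EVRC algebraic code, per the description."""
    positions_per_system = 11          # pulse positions in a single-pulse system
    return {
        # which pair of systems carries one pulse each (4 combinations)
        "single_pulse_combination": bits_for(4),
        # one grid point of the 11 x 11 grid gives both single-pulse positions
        "single_pulse_positions": bits_for(positions_per_system ** 2),
        # one polarity bit for each of the two single-pulse systems
        "single_pulse_polarity": 2,
        # 7 bits for each of the three two-pulse systems (value from the text)
        "double_pulse_positions": 7 * 3,
        # one shared polarity bit per two-pulse system
        "double_pulse_polarity": 1 * 3,
    }

budget = full_rate_algebraic_bits()
total_bits = sum(budget.values())
```

Summing the entries reproduces the 35 bits per subframe quoted for the full rate.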

In the algebraic codebook search, the algebraic codebook 53 sequentially inputs pulse signals to the gain multiplier 54 and the LPC synthesis filter 55 to generate an algebraic synthesized signal. The arithmetic unit 56 computes the difference between the algebraic synthesized signal and the target signal X′, and the code vector Ck that minimizes the error power D of the evaluation function
D = |X′ − Gc·A·Ck|²
is obtained, where Gc is the algebraic codebook gain. In searching the algebraic codebook, the error power evaluation unit 59 looks for the combination of pulse positions and polarities that maximizes the normalized cross-correlation value (Rcx·Rcx/Rcc), which is obtained by normalizing the square of the cross-correlation Rcx between the algebraic synthesized signal A·Ck and the target signal X′ by the autocorrelation Rcc of the algebraic synthesized signal.
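The link between the two criteria is that, for a fixed Ck, the gain minimizing D is Gc = Rcx/Rcc, which leaves D = |X′|² − Rcx²/Rcc; minimizing D is therefore the same as maximizing Rcx²/Rcc. The sketch below illustrates this with an exhaustive search. It is an illustration only: `impulse_response` stands in for the LPC synthesis filter A, the candidate list is hypothetical, and skipping candidates with non-positive Rcx (so that the gain stays positive) is an assumption added here to resolve the polarity.

```python
def synthesize(code, impulse_response):
    """Filter a pulse vector with a truncated impulse response (A * Ck)."""
    n = len(code)
    out = [0.0] * n
    for i, c in enumerate(code):
        if c == 0:
            continue
        for j, h in enumerate(impulse_response):
            if i + j < n:
                out[i + j] += c * h
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search_codebook(target, impulse_response, candidates):
    """Return (index, gain) of the candidate maximizing Rcx^2 / Rcc."""
    best_index, best_score, best_gain = -1, float("-inf"), 0.0
    for k, code in enumerate(candidates):
        synth = synthesize(code, impulse_response)
        rcx = dot(synth, target)   # cross-correlation with the target X'
        rcc = dot(synth, synth)    # energy of the synthesized signal
        if rcc <= 0.0 or rcx <= 0.0:   # assumption: keep the gain positive
            continue
        score = rcx * rcx / rcc
        if score > best_score:
            best_index, best_score, best_gain = k, score, rcx / rcc
    return best_index, best_gain

# Toy example: four single-pulse candidates, target built from the last one.
h = [1.0, 0.5]
cands = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, -1, 0]]
x_target = [0.0, 0.0, -2.0, -1.0]
idx, gain = search_codebook(x_target, h, cands)
```

A practical encoder would not enumerate every candidate; it would exploit the pulse structure to prune the search, which is outside the scope of this sketch.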

The algebraic codebook gain is not quantized directly; instead, a correction coefficient γ for the algebraic codebook gain is scalar-quantized with 5 bits per subframe. The correction coefficient γ is the value obtained by normalizing the algebraic codebook gain Gc by the gain g′ predicted from past subframes (γ = Gc/g′). The multiplexing unit 60 then multiplexes (1) the LSP code, which is the quantization index of the LSP, (2) the pitch lag code, (3) the algebraic code, which is the algebraic codebook index, (4) the pitch gain code, which is the quantization index of the pitch gain, and (5) the algebraic codebook gain code, which is the quantization index of the algebraic codebook gain, to create line data, and transmits it to the decoder.
The decoder is configured to decode speech data using the LSP code, pitch lag code, algebraic code, pitch gain code, and algebraic codebook gain code sent from the encoder. An EVRC decoder can be constructed from its encoder in the same way that the G.729 decoder is constructed from its encoder, so its description is omitted.
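The predictive gain quantization described above (γ = Gc/g′, 5 bits per subframe) can be sketched as follows. The 32-entry table here is a made-up uniform table used only for illustration; the actual EVRC quantization table and the way g′ is predicted are not given in this section.

```python
# Hypothetical 5-bit table: 32 uniformly spaced correction factors.
GAMMA_TABLE = [0.1 * (i + 1) for i in range(32)]

def quantize_gain(gc, predicted_gain, table=GAMMA_TABLE):
    """Return the index of the table entry nearest to gamma = Gc / g'."""
    gamma = gc / predicted_gain
    return min(range(len(table)), key=lambda i: abs(table[i] - gamma))

def dequantize_gain(index, predicted_gain, table=GAMMA_TABLE):
    """Recover the algebraic codebook gain as gamma_hat * g'."""
    return table[index] * predicted_gain

gain_index = quantize_gain(3.0, 2.0)          # gamma = 1.5
gain_hat = dequantize_gain(gain_index, 2.0)
```

Because only the ratio to the predicted gain is coded, the decoder must form the same prediction g′ from past subframes before it can recover Gc.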

(3) Conventional speech code conversion methods
With the spread of the Internet and mobile phones, the volume of voice calls between Internet users and mobile phone network users is expected to keep growing. However, the mobile phone network and the Internet use different speech encoding schemes, so they cannot communicate directly. Conventionally, therefore, a speech code conversion unit has converted the speech code encoded in one network into a speech code of the encoding scheme used in the other network.
FIG. 23 shows the principle of a typical conventional speech code conversion method, referred to below as Prior Art 1. Only the case where speech input by user A to the terminal 71 is conveyed to the terminal 72 of user B is considered. Here, the terminal 71 of user A has only an encoder 71a of encoding scheme 1, and the terminal 72 of user B has only a decoder 72a of encoding scheme 2.

Speech uttered by user A on the transmitting side is input to the encoder 71a of encoding scheme 1 built into the terminal 71. The encoder 71a encodes the input speech signal into a speech code of encoding scheme 1 and sends it to the transmission path 71b. When the speech code arrives via the transmission path 71b, the decoder 73a of the speech code conversion unit 73 first decodes reproduced speech from the speech code of encoding scheme 1. The encoder 73b of the speech code conversion unit 73 then converts the reproduced speech signal into a speech code of encoding scheme 2 and sends it to the transmission path 72b. This speech code is input to the terminal 72 through the transmission path 72b, and the decoder 72a decodes reproduced speech from it, so that user B on the receiving side can hear the reproduced speech. Decoding speech that has been encoded once and then encoding the decoded speech again, as described above, is called a tandem connection.

As described above, Prior Art 1 uses a tandem connection in which the speech code encoded by speech encoding scheme 1 is first decoded into speech and then encoded again by speech encoding scheme 2, which causes marked degradation of speech quality and increased delay. Speech that has been encoded and compressed once (reproduced speech) carries less information than the original speech, and its quality is, strictly speaking, worse than that of the original. In particular, recent low-bit-rate speech encoding schemes such as G.729A and EVRC discard much of the information contained in the input speech to achieve a high compression ratio, so a tandem connection that repeats encoding and decoding markedly degrades the quality of the reproduced speech.

To solve these problems of the tandem connection, a technique has been proposed in which the speech code is not restored to a speech signal but is instead separated into parameter codes such as the LSP code and the pitch lag code, and each parameter code is individually converted into a code of the other speech encoding scheme (see Japanese Patent Application No. 2001-75427). FIG. 24 shows the principle. This is referred to below as Prior Art 2.
The encoder 71a of encoding scheme 1 built into the terminal 71 encodes the speech signal uttered by user A into a speech code of encoding scheme 1 and sends it to the transmission path 71b. The speech code conversion unit 74 converts the speech code of encoding scheme 1 received from the transmission path 71b into a speech code of encoding scheme 2 and sends it to the transmission path 72b. The decoder 72a of the terminal 72 decodes reproduced speech from the speech code of encoding scheme 2 received via the transmission path 72b, and user B can hear this reproduced speech.

Encoding scheme 1 encodes a speech signal with (1) a first LSP code obtained by quantizing LSP parameters derived from the linear prediction coefficients (LPC coefficients) produced by frame-by-frame linear prediction analysis, (2) a first pitch lag code specifying the output signal of an adaptive codebook that outputs a periodic excitation signal, (3) a first algebraic code (noise code) specifying the output signal of an algebraic codebook (or noise codebook) that outputs a noise-like excitation signal, and (4) a first gain code obtained by quantizing the pitch gain, which represents the amplitude of the adaptive codebook output signal, and the algebraic codebook gain, which represents the amplitude of the algebraic codebook output signal. Encoding scheme 2 encodes a speech signal with (1) a second LSP code, (2) a second pitch lag code, (3) a second algebraic code (noise code), and (4) a second gain code, each obtained by a quantization method different from that of the first speech encoding scheme.

The speech code conversion unit 74 has a code separation unit 74a, an LSP code conversion unit 74b, a pitch lag code conversion unit 74c, an algebraic code conversion unit 74d, a gain code conversion unit 74e, and a code multiplexing unit 74f. The code separation unit 74a separates the speech code of encoding scheme 1, input from the encoder 71a of the terminal 71 via the transmission path 71b, into the codes of the components needed to reproduce the speech signal, namely (1) the LSP code, (2) the pitch lag code, (3) the algebraic code, and (4) the gain code, and inputs them to the respective code conversion units 74b to 74e. The code conversion units 74b to 74e convert the input LSP code, pitch lag code, algebraic code, and gain code of speech encoding scheme 1 into the LSP code, pitch lag code, algebraic code, and gain code of speech encoding scheme 2, respectively, and the code multiplexing unit 74f multiplexes the converted codes and sends them to the transmission path 72b.

FIG. 25 is a block diagram of the speech code conversion unit 74 showing the configuration of each of the code conversion units 74b to 74e in detail; parts identical to those in FIG. 24 bear the same reference characters. The code separation unit 74a separates LSP code 1, pitch lag code 1, algebraic code 1, and gain code 1 from the speech code of encoding scheme 1 input from the transmission path via input terminal #1, and inputs them to the code conversion units 74b to 74e, respectively.

The LSP dequantizer 74b₁ of the LSP code conversion unit 74b dequantizes LSP code 1 of encoding scheme 1 and outputs an LSP dequantized value, and the LSP quantizer 74b₂ quantizes this value using the LSP quantization table of encoding scheme 2 and outputs LSP code 2. The pitch lag dequantizer 74c₁ of the pitch lag code conversion unit 74c dequantizes pitch lag code 1 of encoding scheme 1 and outputs a pitch lag dequantized value, and the pitch lag quantizer 74c₂ quantizes this value using the pitch lag quantization table of encoding scheme 2 and outputs pitch lag code 2. The algebraic code dequantizer 74d₁ of the algebraic code conversion unit 74d dequantizes algebraic code 1 of encoding scheme 1 and outputs an algebraic code dequantized value, and the algebraic code quantizer 74d₂ quantizes this value using the algebraic code quantization table of encoding scheme 2 and outputs algebraic code 2. The gain dequantizer 74e₁ of the gain code conversion unit 74e dequantizes gain code 1 of encoding scheme 1 and outputs a gain dequantized value, and the gain quantizer 74e₂ quantizes this value using the gain quantization table of encoding scheme 2 and outputs gain code 2.
The code multiplexing unit 74f multiplexes LSP code 2, pitch lag code 2, algebraic code 2, and gain code 2 output from the quantizers 74b₂ to 74e₂ to create a speech code of encoding scheme 2, and sends it to the transmission path from output terminal #2.
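Each converter in Prior Art 2 follows the same pattern: look up the value behind the scheme-1 code, then pick the nearest entry in the scheme-2 table. The sketch below shows that pattern for a scalar parameter. The tables are invented for illustration, and real LSP and gain quantizers are typically vector quantizers, so this is a simplification rather than the method of either codec.

```python
def dequantize(code, table):
    """Look up the parameter value that a code stands for."""
    return table[code]

def quantize(value, table):
    """Return the index of the table entry nearest to value."""
    return min(range(len(table)), key=lambda i: abs(table[i] - value))

def convert_code(code1, table1, table2):
    """Scheme 1 code -> dequantized value -> scheme 2 code."""
    return quantize(dequantize(code1, table1), table2)

# Hypothetical scalar quantization tables for the two schemes.
TABLE_1 = [0.0, 0.5, 1.0, 1.5]
TABLE_2 = [0.0, 0.4, 0.8, 1.2, 1.6]

code2 = convert_code(3, TABLE_1, TABLE_2)   # 1.5 is nearest to 1.6
```

The speech signal is never reconstructed in this path, which is why the approach avoids the look-ahead delay of re-encoding.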

The tandem connection scheme of FIG. 23 (Prior Art 1) takes as input the reproduced speech obtained by decoding the speech code of encoding scheme 1, and encodes and decodes it again. Speech parameters are therefore extracted from reproduced speech whose information content is already far smaller than that of the original speech because of the encoding (that is, speech information compression), so the resulting speech code is not necessarily optimal. In contrast, the speech code conversion apparatus of Prior Art 2 in FIG. 24 converts the speech code of encoding scheme 1 into a speech code of encoding scheme 2 through dequantization and quantization, which allows speech code conversion with far less degradation than the tandem connection of Prior Art 1. Moreover, because the speech code never has to be decoded into speech for the conversion, the delay that was a problem with the conventional tandem connection is also reduced.

G.729A is used as the speech encoding scheme in VoIP networks, while EVRC is adopted in the cdma2000 network, which is expected to serve as a next-generation mobile phone system. Table 6 compares the main specifications of G.729A and EVRC.

The frame length of G.729A is 10 msec and its subframe length is 5 msec. The frame length of EVRC is 20 msec, and each frame is divided into three subframes, so the EVRC subframe length is 6.625 msec (6.75 msec for the last subframe only). EVRC thus differs from G.729A in both frame length and subframe length. Table 7 compares the bit allocations of G.729A and EVRC.
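The mismatch becomes concrete when the durations are expressed in samples. The sketch below assumes an 8 kHz sampling rate, which is not stated in this section but is consistent with the 6.625 msec figure (53 samples); the variable names are illustrative.

```python
SAMPLE_RATE_HZ = 8000  # assumption: narrowband telephony sampling rate

def ms_to_samples(ms):
    return round(ms * SAMPLE_RATE_HZ / 1000)

def boundaries(lengths):
    """Cumulative subframe boundaries, starting at sample 0."""
    pos = [0]
    for n in lengths:
        pos.append(pos[-1] + n)
    return pos

# Two G.729A frames (4 subframes of 5 msec) cover one 20 msec EVRC frame.
g729a_subframes = [ms_to_samples(5.0)] * 4
evrc_subframes = [ms_to_samples(6.625), ms_to_samples(6.625), ms_to_samples(6.75)]

shared = sorted(set(boundaries(g729a_subframes)) & set(boundaries(evrc_subframes)))
```

Under this assumption the only boundaries the two schemes share within 20 msec are the start and end of the EVRC frame, so no G.729A subframe lines up with an EVRC subframe.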

Voice communication between a VoIP network and a cdma2000 network requires a speech code conversion technique for converting the speech code of one into the speech code of the other. Prior Art 1 and Prior Art 2 described above are known techniques for this purpose.
Prior Art 1 first reproduces speech from the speech code of encoding scheme 1 and then encodes the reproduced speech again with speech encoding scheme 2, so code conversion is possible regardless of differences between the encoding schemes. However, re-encoding requires signal look-ahead (that is, delay) for LPC analysis and pitch analysis, and the sound quality degrades considerably.

Prior Art 2, on the other hand, converts speech codes on the premise that encoding scheme 1 and encoding scheme 2 have equal subframe lengths, so code conversion is problematic when the subframe lengths differ. Because the pulse position candidates of an algebraic codebook are determined by the subframe length, the pulse positions of schemes with different subframe lengths (such as G.729A and EVRC) are entirely different, and it is difficult to map them one to one.

Accordingly, an object of the present invention is to enable speech code conversion even between speech encoding schemes having different subframe lengths.
Another object of the present invention is to reduce degradation of sound quality while also reducing delay time.
Another object of the present invention is to enable code conversion corresponding to the EVRC rate.

A first aspect of the present invention is a speech code conversion method for converting a first speech code, obtained by encoding a speech signal with an LSP code, a pitch lag code, an algebraic code, a pitch gain code, and an algebraic codebook gain code based on a first speech encoding scheme, into a second speech code based on a second speech encoding scheme. Speech code conversion units for full rate, half rate, and 1/8 rate are provided in correspondence with the rate of the first speech code. When the rate of the first speech code is the 1/8 rate, the 1/8-rate speech code conversion unit performs the steps of: (1) separating the LSP code and the algebraic codebook gain code included in the speech code of the first speech encoding scheme and dequantizing each to output dequantized values; (2) quantizing the dequantized value of the LSP code by the second speech encoding scheme to obtain the LSP code of the second speech code; (3) multiplying a random signal output from an excitation generator by the dequantized value of the gain code and inputting the product to an LPC synthesis filter constructed from the dequantized value of the LSP code, thereby generating a target signal; (4) generating an algebraic synthesized signal using an arbitrary algebraic code of the second speech encoding scheme and the dequantized value of the LSP code of the second speech code; (5) obtaining the algebraic code of the second speech encoding scheme that minimizes the difference between the target signal and the algebraic synthesized signal; (6) obtaining the gain code of the second speech encoding scheme using the dequantized value of the LSP code of the second speech code, the dequantized value of the pitch lag code obtained by full-rate or half-rate speech code conversion, the obtained algebraic code, and the target signal; and (7) outputting the obtained LSP code, algebraic code, and gain code of the second speech encoding scheme together with the pitch lag code.
In the above speech code conversion method, when the rate of the first speech code is the full rate, the full-rate speech code conversion unit performs the steps of: (1) dequantizing each code constituting the first speech code and quantizing the dequantized values of the LSP code and the pitch lag code by the second speech encoding scheme to obtain the LSP code and pitch lag code of the second speech code; (2) obtaining the dequantized value of the pitch gain code of the second speech code by interpolation using the dequantized value of the pitch gain code of the first speech code; (3) generating a pitch periodic synthesized signal using the dequantized values of the LSP code, pitch lag code, and pitch gain of the second speech code, reproducing a speech signal from the first speech code, and generating the difference between the reproduced speech signal and the pitch periodic synthesized signal as a target signal; (4) generating an algebraic synthesized signal using an arbitrary algebraic code of the second speech encoding scheme and the dequantized value of the LSP code of the second speech code; (5) obtaining the algebraic code of the second speech encoding scheme that minimizes the difference between the target signal and the algebraic synthesized signal; (6) obtaining the gain code of the second speech encoding scheme using the dequantized values of the LSP code and pitch lag code of the second speech code, the obtained algebraic code, and the target signal; and (7) outputting the obtained LSP code, pitch lag code, algebraic code, and gain code of the second speech encoding scheme.
A second aspect of the present invention is a speech code conversion apparatus for converting a first speech code, obtained by encoding a speech signal with an LSP code, a pitch lag code, an algebraic code, a pitch gain code, and an algebraic codebook gain code based on a first speech encoding scheme, into a second speech code based on a second speech encoding scheme. The apparatus comprises speech code conversion units for full rate, half rate, and 1/8 rate in correspondence with the rate of the first speech code. The 1/8-rate speech code conversion unit has: (1) a dequantization unit that separates the LSP code and the algebraic codebook gain code included in the speech code of the first speech encoding scheme and dequantizes each to output dequantized values; (2) an LSP quantization unit that quantizes the dequantized value of the LSP code by the second speech encoding scheme and outputs the LSP code of the second speech code; (3) a target generation unit that multiplies a random signal output from an excitation generator by the dequantized value of the gain code and inputs the product to an LPC synthesis filter constructed from the dequantized value of the LSP code, thereby generating a target signal; (4) an algebraic code acquisition unit that generates an algebraic synthesized signal using an arbitrary algebraic code of the second speech encoding scheme and the dequantized value of the LSP code of the second speech code, and obtains the algebraic code of the second speech encoding scheme that minimizes the difference between the target signal and the algebraic synthesized signal; (5) a gain code acquisition unit that obtains the gain code of the second speech encoding scheme using the dequantized value of the LSP code of the second speech code, the dequantized value of the pitch lag code obtained by full-rate or half-rate speech code conversion, the obtained algebraic code, and the target signal; and (6) a code multiplexing unit that multiplexes and outputs the obtained LSP code, algebraic code, and gain code of the second speech encoding scheme together with the pitch lag code.
In the above speech code conversion apparatus, the full-rate speech code conversion unit has: (1) a dequantization unit that dequantizes each code constituting the first speech code and quantizes the dequantized values of the LSP code and the pitch lag code by the second speech encoding scheme to obtain the LSP code and pitch lag code of the second speech code; (2) a pitch gain interpolation unit that obtains the dequantized value of the pitch gain code of the second speech code by interpolation using the dequantized value of the pitch gain code of the first speech code; (3) a target generation unit that generates a pitch periodic synthesized signal using the dequantized values of the LSP code, pitch lag code, and pitch gain of the second speech code, reproduces a speech signal from the first speech code, and generates the difference between the reproduced speech signal and the pitch periodic synthesized signal as a target signal; (4) an algebraic code acquisition unit that generates an algebraic synthesized signal using an arbitrary algebraic code of the second speech encoding scheme and the dequantized value of the LSP code of the second speech code, and obtains the algebraic code of the second speech encoding scheme that minimizes the difference between the target signal and the algebraic synthesized signal; (5) a gain code acquisition unit that obtains the gain code of the second speech encoding scheme using the dequantized values of the LSP code and pitch lag code of the second speech code, the obtained algebraic code, and the target signal; and (6) a code multiplexing unit that multiplexes and outputs the obtained LSP code, pitch lag code, algebraic code, and gain code of the second speech encoding scheme.
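The 1/8-rate target generation described in the first and second aspects (a random signal scaled by the dequantized gain and passed through an LPC synthesis filter) can be sketched as follows. This is a rough illustration under several assumptions: the LPC coefficients are taken as already derived from the dequantized LSP values, the filter is written as 1/A(z) with A(z) = 1 + a₁z⁻¹ + ... + aₚz⁻ᵖ, the filter starts from zero state, and the uniform noise source is a stand-in for the excitation generator.

```python
import random

def lpc_synthesis(excitation, lpc):
    """All-pole filter: y[n] = x[n] - sum_i a_i * y[n - i], zero initial state."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for i, a in enumerate(lpc, start=1):
            if n - i >= 0:
                acc -= a * out[n - i]
        out.append(acc)
    return out

def eighth_rate_target(gain, lpc, length, seed=0):
    """Scale a random excitation by the dequantized gain, then synthesize."""
    rng = random.Random(seed)              # stand-in for the excitation generator
    noise = [rng.uniform(-1.0, 1.0) for _ in range(length)]
    return lpc_synthesis([gain * v for v in noise], lpc)

# Impulse through a one-pole filter with a_1 = -0.5 decays as 1, 0.5, 0.25.
impulse_out = lpc_synthesis([1.0, 0.0, 0.0], [-0.5])
target = eighth_rate_target(gain=2.0, lpc=[-0.5], length=53)
```

The resulting signal then plays the role of the target against which the algebraic code of the second scheme is searched.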

According to the present invention, speech code conversion can be performed even between speech encoding schemes having different subframe lengths, degradation of sound quality can be reduced, and delay time can be reduced.
Further, according to the present invention, code conversion corresponding to the EVRC rate can be performed.

(A) Outline of the present invention
FIG. 1 illustrates the principle of the speech code conversion apparatus of the present invention, showing the basic configuration for converting a speech code CODE1 of encoding scheme 1 (G.729A) into a speech code CODE2 of encoding scheme 2 (EVRC).
The present invention converts the LSP code, pitch lag code, and pitch gain code from encoding scheme 1 to encoding scheme 2 in the quantized-parameter domain, as in Prior Art 2. In addition, it creates a target signal from the reproduced speech and a pitch periodic synthesized signal, and obtains the algebraic code and algebraic codebook gain so that the error between the target signal and the algebraic synthesized signal is minimized. The invention is characterized by converting from encoding scheme 1 to encoding scheme 2 in this way. The conversion procedure is described in detail below with reference to the figure.

符号分離部101は符号化方式１（G.729A）の音声符号CODE１が入力すると該音声符号CODE1をLSP符号Lsp1、ピッチラグ符号Lag1、ピッチゲイン符号Gain1、代数符号Cb1の各パラメータ符号に分離し、LSP符号変換部102、ピッチラグ変換部103、ピッチゲイン変換部104、音声再生部105に入力する。
LSP符号変換部102はLSP符号Lsp1を符号化方式２のLSP符号Lsp2に変換する。ピッチラグ変換部103はピッチラグ符号Lag1を符号化方式2のピッチラグ符号Lag2に変換する。ピッチゲイン変換部104はピッチゲイン符号Gain1からピッチゲイン逆量子化値を求め、このピッチゲイン逆量子化値を符号化方式２のピッチゲイン符号Gp2に変換する。 When a speech code CODE1 of encoding method 1 (G.729A) is input, the code separation unit 101 separates the speech code CODE1 into parameter codes of LSP code Lsp1, pitch lag code Lag1, pitch gain code Gain1, and algebraic code Cb1, The data is input to the LSP code conversion unit 102, the pitch lag conversion unit 103, the pitch gain conversion unit 104, and the audio reproduction unit 105.
The LSP code converter 102 converts the LSP code Lsp1 into the LSP code Lsp2 of the encoding scheme 2. The pitch lag conversion unit 103 converts the pitch lag code Lag1 into the pitch lag code Lag2 of the encoding scheme 2. The pitch gain conversion unit 104 obtains a pitch gain inverse quantization value from the pitch gain code Gain1, and converts the pitch gain inverse quantization value into a pitch gain code Gp2 of the encoding scheme 2.

音声再生部105は音声符号CODE1の符号成分であるLSP符号Lsp1、ピッチラグ符号Lag1、ピッチゲイン符号Gain1、代数符号Cb1を用いて音声信号Spを再生する。ターゲット作成部106は、音声符号化方式２のLSP符号Lsp2、ピッチラグ符号Lag2、ピッチゲイン符号Gp2から符号化方式２のピッチ周期性合成信号を作成する。しかる後、ターゲット作成部106は再生音声信号Spからピッチ周期性合成信号を差し引いてターゲット信号Targetを作成する。
代数符号変換部107は音声符号化方式２における任意の代数符号と音声符号化方式２のLSP符号Lsp2の逆量子化値を用いて代数合成信号を生成し、ターゲット信号Targetと該代数合成信号との差が最小となる音声符号化方式２の代数符号Cb2を決定する。 The audio reproduction unit 105 reproduces the audio signal Sp using the LSP code Lsp1, the pitch lag code Lag1, the pitch gain code Gain1, and the algebraic code Cb1, which are code components of the audio code CODE1. The target creation unit 106 creates a pitch periodicity synthesized signal of the encoding method 2 from the LSP code Lsp2, the pitch lag code Lag2, and the pitch gain code Gp2 of the speech encoding method 2. Thereafter, the target creating unit 106 creates the target signal Target by subtracting the pitch periodic synthesized signal from the reproduced audio signal Sp.
The algebraic code conversion unit 107 generates an algebraic synthesized signal using an arbitrary algebraic code of speech coding scheme 2 and the dequantized value of the LSP code Lsp2 of speech coding scheme 2, and determines the algebraic code Cb2 of speech coding scheme 2 that minimizes the difference between the target signal Target and the algebraic synthesized signal.

代数符号帳ゲイン変換部108は、音声符号化方式２の前記代数符号Cb2に応じた代数符号帳出力信号をLSP符号Lsp2の逆量子化値で構成されたLPC合成フィルタに入力して代数合成信号を作成し、該代数合成信号と前記ターゲット信号とから代数符号帳ゲインを決定し、該代数符号帳ゲインを符号化方式2の量子化テーブルを用いて代数符号帳ゲイン符号を発生する。
符号多重部109は以上により求まった符号化方式2のLSP符号Lsp2、ピッチラグ符号Lag2、ピッチゲイン符号Gp2、代数符号Cb2、代数符号帳ゲイン符号Gc2を多重化して符号化方式２の音声符号CODE2として出力する。 The algebraic codebook gain conversion unit 108 inputs an algebraic codebook output signal corresponding to the algebraic code Cb2 of the speech coding method 2 to an LPC synthesis filter composed of an inverse quantization value of the LSP code Lsp2, and inputs the algebraic synthesized signal , And an algebraic codebook gain is determined from the algebraic synthesized signal and the target signal, and an algebraic codebook gain code is generated using the algebraic codebook gain using a quantization table of encoding scheme 2.
The code multiplexing unit 109 multiplexes the LSP code Lsp2, the pitch lag code Lag2, the pitch gain code Gp2, the algebraic code Cb2, and the algebraic codebook gain code Gc2 of encoding scheme 2 obtained as described above, and outputs the result as the speech code CODE2 of encoding scheme 2.

（B）第1実施例
図２は本発明の第1実施例の音声符号変換装置の構成図であり、図1の原理図と同一部分には同一符号を付している。本実施例では、音声符号化方式１としてG.729Aを用い、音声符号化方式２としてEVRCを用いる場合を示している。また、EVRCにはフルレート、ハーフレート、1/8レートの３種類のモードが存在するが、ここではフルレートのみを用いることとする。
G.729Aのフレーム長は10msecであり、EVRCのフレーム長が20msecであることから、G.729Aの２フレーム分の音声符号をEVRCの１フレーム分の音声符号に変換する。以下では、図３(a)に示すG.729Aの第nフレーム及び第n+1フレームの音声符号を、図3(b)に示すEVRCの第mフレームの音声符号に変換する場合について説明する。 (B) First Embodiment FIG. 2 is a block diagram of a speech code conversion apparatus according to a first embodiment of the present invention. The same reference numerals are given to the same parts as in the principle diagram of FIG. In the present embodiment, G.729A is used as the speech encoding scheme 1 and EVRC is used as the speech encoding scheme 2. EVRC has three modes of full rate, half rate, and 1/8 rate. Here, only full rate is used.
Since the frame length of G.729A is 10 msec and the frame length of EVRC is 20 msec, the speech codes of two G.729A frames are converted into the speech code of one EVRC frame. The following describes the case in which the speech codes of the n-th and (n+1)-th frames of G.729A shown in FIG. 3(a) are converted into the speech code of the m-th frame of EVRC shown in FIG. 3(b).

図2において、G.729Aの符号器（図示せず）から伝送路を介して第nフレーム目の音声符号（回線データ）CODE1(n)が端子＃１に入力する。符号分離部101は、音声符号CODE1(n)からLSP符号Lsp1(n)、ピッチラグ符号Lag1(n, j)、ゲイン符号Gain1(n, j)、代数符号Cb1(n, j)を分離して各変換部102,103,104及び代数符号逆量子化部110に入力する。ここで、括弧内の添字jはサブフレームの番号を表し(図3(a)参照)、0または１の値を取る。
LSP符号変換部102はLSP逆器量子化部102aとLSP量子化部102bを有している。前述のようにG.729Aのフレーム長は10msecであり、G.729A符号器は10msecに1回だけ第１サブフレームの入力信号から求めたLSPパラメータを量子化する。これに対し、EVRCのフレーム長は20msecであり、EVRC符号器は20msecに1回だけ第２サブフレーム及び先読み部分の入力信号から求めたLSPパラメータを量子化する。つまり、同じ20msecを単位として考えると、G.729A符号器は２回のLSP量子化を行うのに対してEVRC符号器は１回しか量子化を行わない。このため、G.729の連続する２つのフレームのLSP符号をそのままではEVRCのLSP符号に変換することはできない。 In FIG. 2, the voice code (line data) CODE1 (n) of the nth frame is input to the terminal # 1 from the G.729A encoder (not shown) via the transmission line. The code separation unit 101 separates the LSP code Lsp1 (n), the pitch lag code Lag1 (n, j), the gain code Gain1 (n, j), and the algebraic code Cb1 (n, j) from the speech code CODE1 (n). Each of the transform units 102, 103, 104 and the algebraic code inverse quantization unit 110 is input. Here, the subscript j in parentheses represents a subframe number (see FIG. 3A), and takes a value of 0 or 1.
The LSP code converter 102 includes an LSP inverse quantizer 102a and an LSP quantizer 102b. As described above, the frame length of G.729A is 10 msec, and the G.729A encoder quantizes the LSP parameters obtained from the input signal of the first subframe once every 10 msec. In contrast, the frame length of EVRC is 20 msec, and the EVRC encoder quantizes the LSP parameters obtained from the input signal of the second subframe and the look-ahead portion only once every 20 msec. That is, taking the same 20 msec as the unit, the G.729A encoder performs LSP quantization twice, whereas the EVRC encoder performs it only once. For this reason, the LSP codes of two consecutive G.729 frames cannot be converted as they are into an EVRC LSP code.

そこで、第1実施例では、G.729Aの奇数フレーム(第(n+1)フレーム)におけるLSP符号のみをEVRCのLSP符号に変換し、偶数フレーム（第nフレーム）のLSP符号は変換しない構成とした。ただし、偶数フレームのLSP符号をEVRCのLSP符号に変換し、奇数フレームのLSP符号を変換しないようにすることもできる。
LSP逆量子化部102aは、LSP符号Lsp1(n)が入力されると該符号を逆量子化してLSP逆量子化値lsp1を出力する。ここで、lsp1は10個の係数からなるベクトルである。又、LSP逆量子化部102aはG.729Aの復号器において用いられる逆量子化器と同じ動作をする。 Therefore, in the first embodiment, only the LSP code in the odd frame ((n + 1) th frame) of G.729A is converted into the EVRC LSP code, and the LSP code in the even frame (nth frame) is not converted. It was. However, it is possible to convert the LSP code of the even frame into the LSP code of EVRC and not convert the LSP code of the odd frame.
When the LSP code Lsp1 (n) is input, the LSP dequantization unit 102a dequantizes the code and outputs an LSP dequantized value lsp1. Here, lsp1 is a vector composed of 10 coefficients. The LSP inverse quantization unit 102a performs the same operation as the inverse quantizer used in the G.729A decoder.

LSP量子化部102bに奇数フレームのLSP逆量子化値lsp1が入力されると、該LSP量子化部102bはEVRCのLSP量子化方法に従って量子化してLSP符号Lsp2(m)を出力する。ここで、LSP量子化部102bはEVRC符号器において用いられる量子化器と必ずしも全く同じものである必要はないが、少なくともLSP量子化テーブルはEVRCの量子化テーブルと同一のテーブルを用いるものとする。尚、偶数フレームのLSP逆量子化値はLSP符号変換には用いられない。また、LSP逆量子化値lsp1は後述する音声再生部105においてLPC合成フィルタの係数として用いられる。
ついで、LSP量子化部102bは変換されたLSP符号Lsp2(m)を復号して得られるLSP逆量子化値と、前フレームのLSP符号Lsp2(m-1)を復号して得られるLSP逆量子化値とから線形補間により現フレーム内の３つのサブフレームにおけるLSPパラメータlsp2(k)、(k=0,1,2)を求める。lsp2(k)は後述するターゲット生成部106等で用いられる。lsp2(k)は10次元のベクトルである。 When the LSP inverse quantized value lsp1 of the odd frame is input to the LSP quantizing unit 102b, the LSP quantizing unit 102b quantizes it according to the EVRC LSP quantization method and outputs the LSP code Lsp2 (m). Here, the LSP quantization unit 102b is not necessarily the same as the quantizer used in the EVRC encoder, but at least the LSP quantization table uses the same table as the EVRC quantization table. . Note that the LSP inverse quantization value of the even frame is not used for LSP code conversion. Further, the LSP inverse quantization value lsp1 is used as a coefficient of the LPC synthesis filter in the audio reproduction unit 105 described later.
Next, the LSP quantization unit 102b obtains the LSP parameters lsp2(k) (k = 0, 1, 2) for the three subframes of the current frame by linear interpolation between the LSP dequantized value obtained by decoding the converted LSP code Lsp2(m) and the LSP dequantized value obtained by decoding the LSP code Lsp2(m-1) of the previous frame. lsp2(k) is used in the target generation unit 106 and elsewhere, as described later. lsp2(k) is a 10-dimensional vector.
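The per-subframe linear interpolation of the LSP vectors can be sketched as follows. This is a minimal illustration, not the EVRC reference procedure: the function name `interpolate_lsp` and the default interpolation weights are assumptions made for this example, since the description states only that linear interpolation is used.

```python
def interpolate_lsp(prev_lsp, curr_lsp, weights=(1 / 6, 1 / 2, 5 / 6)):
    """Return one interpolated LSP vector per subframe.

    prev_lsp: dequantized LSP vector of the previous frame (from Lsp2(m-1))
    curr_lsp: dequantized LSP vector of the current frame (from Lsp2(m))
    weights:  ASSUMED weight of the current frame in subframes k = 0, 1, 2
    """
    if len(prev_lsp) != len(curr_lsp):
        raise ValueError("LSP vectors must have the same dimension")
    return [
        [(1.0 - w) * p + w * c for p, c in zip(prev_lsp, curr_lsp)]
        for w in weights
    ]
```

Each returned vector plays the role of lsp2(k) for one subframe; in the description these vectors have 10 dimensions.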

ピッチラグ変換部103は、ピッチラグ逆器量子化部103aとピッチラグ量子化部103bを有している。G.729Aでは5msecのサブフレームごとにピッチラグを量子化する。一方、EVRCでは１フレームに1回だけピッチラグを量子化する。20msecを単位として考えるとG.729Aは４つのピッチラグを量子化するのに対して、EVRCは１つのピッチラグのみを量子化する。したがって、G.729Aの音声符号をEVRCの音声符号へ変換する場合には、G.729Aの全てのピッチラグをEVRCのピッチラグに変換することはできない。
そこで、第1実施例では、G.729Aの第n+1フレームの最終サブフレーム（第１サブレーム）におけるピッチラグ符号Lag1(n+1,1)をG.729Aのピッチラグ逆量子化部103aにより逆量子化してピッチラグlag1を求め、このlag1をEVRCのピッチラグ量子化部103bにより量子化して第mフレーム第2サブフレームにおけるピッチラグ符号Lag2(m)とする。また、ピッチラグ量子化部103bはEVRC符号器・復号器と同じ方法によりピッチラグの補間をする。すなわち、Lag2(m)を逆量子化して得られる第2サブフレームのピッチラグ逆量子化値と、前フレームの第2サブフレームのピッチラグ逆量子化値との線形補間により各サブフレームのピッチラグ補間値lag2(k), (k=0,1,2)を求める。ピッチラグ補間値は後述するターゲット生成部106で使用される。 The pitch lag conversion unit 103 includes a pitch lag inverse quantization unit 103a and a pitch lag quantization unit 103b. In G.729A, the pitch lag is quantized every 5 msec subframe. On the other hand, EVRC quantizes the pitch lag only once per frame. Considering 20 msec as a unit, G.729A quantizes four pitch lags, whereas EVRC quantizes only one pitch lag. Therefore, when converting a G.729A speech code into an EVRC speech code, it is not possible to convert all pitch lags in G.729A into EVRC pitch lags.
Therefore, in the first embodiment, the pitch lag code Lag1(n+1, 1) in the last subframe (subframe 1) of the (n+1)-th frame of G.729A is dequantized by the G.729A pitch lag inverse quantization unit 103a to obtain the pitch lag lag1, and this lag1 is quantized by the EVRC pitch lag quantization unit 103b to give the pitch lag code Lag2(m) for the second subframe of the m-th frame. The pitch lag quantization unit 103b also interpolates the pitch lag by the same method as the EVRC encoder and decoder. That is, the pitch lag interpolation values lag2(k) (k = 0, 1, 2) of the subframes are obtained by linear interpolation between the dequantized pitch lag of the second subframe, obtained by dequantizing Lag2(m), and the dequantized pitch lag of the second subframe of the previous frame. The pitch lag interpolation values are used by the target generation unit 106 described later.

ピッチゲイン変換部104は、ピッチゲイン逆器量子化部104aとピッチゲイン量子化部104bを有している。G.729Aでは5msecのサブフレーム毎にピッチゲインを量子化するから、20msecを単位として考えるとG.729Aは１フレームに４つのピッチゲインを量子化する。一方、EVRCは１フレームに3つのピッチゲインを量子化する。したがって、G.729Aの音声符号をEVRCの音声符号へ変換する場合には、G.729Aの全てのピッチゲインをEVRCのピッチゲインに変換することはできない。そこで、第1実施例では図４に示す方法によりゲインの変換を行う。すなわち、G.729Aの連続する２つのフレームのピッチゲインをgp1(0)、gp1(1)、gp1(2)、gp1(3)とし、次式
gp2(0) = gp1(0)
gp2(1) = (gp1(1) + gp1(2)) / 2
gp2(2) = gp1(3)
によりピッチゲインを合成する。合成されたピッチゲインgp2(k)(k=0,1,2)をそれぞれEVRCのピッチゲイン量子化テーブルを用いてスカラー量子化し、ピッチゲイン符号Gp2(m,k)を求める。ピッチゲインgp2(k)(k=0,1,2)は後述するターゲット生成部106で使用される。 The pitch gain converter 104 includes a pitch gain inverse quantizer 104a and a pitch gain quantizer 104b. In G.729A, the pitch gain is quantized every 5 msec sub-frame, so when considering 20 msec as a unit, G.729A quantizes four pitch gains in one frame. On the other hand, EVRC quantizes three pitch gains in one frame. Therefore, when converting a G.729A speech code into an EVRC speech code, it is not possible to convert all pitch gains of G.729A into EVRC pitch gains. Therefore, in the first embodiment, gain conversion is performed by the method shown in FIG. That is, gp1 (0), gp1 (1), gp1 (2), gp1 (3) are set as the pitch gain of two consecutive frames of G.729A
gp2(0) = gp1(0)
gp2(1) = (gp1(1) + gp1(2)) / 2
gp2(2) = gp1(3)
The pitch gains are synthesized by these equations. Each synthesized pitch gain gp2(k) (k = 0, 1, 2) is scalar quantized using the EVRC pitch gain quantization table to obtain the pitch gain code Gp2(m, k). The pitch gains gp2(k) (k = 0, 1, 2) are used by the target generation unit 106 described later.
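The mapping of four G.729A pitch gains onto three EVRC pitch gains can be sketched as below. The function name `merge_pitch_gains` is hypothetical; the arithmetic follows the three equations given for the first embodiment.

```python
def merge_pitch_gains(gp1):
    """Map four dequantized G.729A pitch gains (two consecutive frames,
    two subframes each) onto three EVRC subframe pitch gains."""
    if len(gp1) != 4:
        raise ValueError("expected four dequantized pitch gains")
    return [
        gp1[0],                    # gp2(0) = gp1(0)
        (gp1[1] + gp1[2]) / 2.0,   # gp2(1) = (gp1(1) + gp1(2)) / 2
        gp1[3],                    # gp2(2) = gp1(3)
    ]
```

The middle EVRC subframe straddles the boundary between the two G.729A frames, which is why it takes the mean of the two inner gains.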

代数符号逆量子化部110は代数符号Cb(n,j)を逆量子化し、得られた代数符号逆量子化値cb1(j)を音声再生部105に入力する。
音声再生部105は、第nフレームにおけるG.729Aの再生音声Sp(n，h)と、第n+1フレームにおけるG.729Aの再生音声Sp(n+1，h)を作成する。なお、再生音声の作成方法はG.729Aの復号器の動作と同じであり、従来技術の項で説明済みであり、ここでは説明を省略する。再生音声Sp(n，h)とSp(n+1，h)の次元数はG.729Aのフレーム長と同じ80サンプルであり（h＝１〜80）、合わせて160サンプルとなりEVRCの1フレーム当たりのサンプル数になる。音声再生部105は、作成した再生音声Sp(n，h)，Sp(n+1，h)を図５に示すようにSp (0，i)、Sp(1，i)、Sp(2，i)の３つのベクトルに分割して出力する。ｉはEVRCの第0、1サブフレームでは１〜53、第2サブフレームでは１〜54である。 The algebraic code dequantization unit 110 dequantizes the algebraic code Cb (n, j), and inputs the obtained algebraic code dequantized value cb1 (j) to the speech reproduction unit 105.
The audio reproduction unit 105 creates the G.729A reproduced speech Sp(n, h) of the n-th frame and the G.729A reproduced speech Sp(n+1, h) of the (n+1)-th frame. The reproduced speech is created in the same way as in the G.729A decoder, which was described in the prior art section, so the description is omitted here. The reproduced speech Sp(n, h) and Sp(n+1, h) each have 80 samples, the same as the G.729A frame length (h = 1 to 80), giving 160 samples in total, which equals the number of samples per EVRC frame. The audio reproduction unit 105 divides the reproduced speech Sp(n, h), Sp(n+1, h) into the three vectors Sp(0, i), Sp(1, i), and Sp(2, i) as shown in FIG. 5 and outputs them. Here i ranges from 1 to 53 in subframes 0 and 1 of EVRC, and from 1 to 54 in subframe 2.
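The regrouping of two 80-sample G.729A frames into three EVRC subframes of 53, 53, and 54 samples can be sketched as follows. The names `EVRC_SUBFRAME_LENGTHS` and `split_into_evrc_subframes` are introduced only for this illustration.

```python
EVRC_SUBFRAME_LENGTHS = (53, 53, 54)  # subframes 0, 1, 2; total 160 samples


def split_into_evrc_subframes(frame_n, frame_n1):
    """Concatenate two 80-sample G.729A frames and cut them into
    the three EVRC subframe vectors Sp(0, i), Sp(1, i), Sp(2, i)."""
    if len(frame_n) != 80 or len(frame_n1) != 80:
        raise ValueError("each G.729A frame must hold 80 samples")
    samples = list(frame_n) + list(frame_n1)
    subframes, start = [], 0
    for length in EVRC_SUBFRAME_LENGTHS:
        subframes.append(samples[start:start + length])
        start += length
    return subframes
```

Note that the code uses zero-based sample indices, whereas the description counts i from 1.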

ターゲット生成部106は、代数符号変換部107及び代数符号長ゲイン変換部108で参照信号として用いられるターゲット信号Target(k、I)を作成する。図６はターゲット生成部106の構成図である。適応符号帳106aは、ピッチラグ符号変換部103で求めたピッチラグlag2(k)に対応するN個のサンプル信号acb(k，i)(i=0〜N-1)を出力する。ここで、kはEVRCのサブフレーム番号、NはEVRCのサブフレーム長であり、第0、1サブフレームでは53、第2サブフレームでは54である。以下、特に断らない限り添字iは53又は54である。尚、106eは適応符号帳更新部である。 The target generation unit 106 generates a target signal Target (k, I) that is used as a reference signal in the algebraic code conversion unit 107 and the algebraic code length gain conversion unit 108. FIG. 6 is a configuration diagram of the target generation unit 106. The adaptive codebook 106a outputs N sample signals acb (k, i) (i = 0 to N−1) corresponding to the pitch lag lag2 (k) obtained by the pitch lag code conversion unit 103. Here, k is the EVRC subframe number, N is the EVRC subframe length, and is 53 for the 0th and 1st subframes and 54 for the 2nd subframe. Hereinafter, the suffix i is 53 or 54 unless otherwise specified. Reference numeral 106e denotes an adaptive codebook update unit.

ゲイン乗算部106bは適応符号帳出力acb(k，i)にピッチゲインgp2(k)を乗算してLPC合成フィルタ106cに入力する。LPC合成フィルタ106cは、LSP符号の逆量子化値lsp2(k)で構成されており、適応符号帳合成信号syn(k，i)を出力する。演算部106dは、3分割された再生信号Sp(k，i)から適応符号帳合成信号syn(k，i)を差し引いてターゲット信号Target(k，i)を求める。Target(k，i)は後述する代数符号変換部107及び代数符号帳ゲイン変換部108で使用される。 Gain multiplication section 106b multiplies adaptive codebook output acb (k, i) by pitch gain gp2 (k) and inputs the result to LPC synthesis filter 106c. The LPC synthesis filter 106c is configured with an inverse quantization value lsp2 (k) of the LSP code, and outputs an adaptive codebook synthesis signal syn (k, i). The computing unit 106d obtains the target signal Target (k, i) by subtracting the adaptive codebook synthesized signal syn (k, i) from the reproduction signal Sp (k, i) divided into three. Target (k, i) is used in an algebraic code converter 107 and an algebraic codebook gain converter 108 described later.
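The target computation described above (gain multiplication, LPC synthesis, subtraction) can be sketched as follows. This is a simplified illustration under stated assumptions: the LPC coefficients are taken as already derived from lsp2(k) (the LSP-to-LPC conversion is omitted), the filter starts from zero memory in every call, and the names `lpc_synthesis` and `make_target` are hypothetical.

```python
def lpc_synthesis(excitation, lpc):
    """All-pole filter 1/A(z) with A(z) = 1 + sum(lpc[j] * z^-(j+1)).

    ASSUMPTION: zero initial filter state; a real codec carries the
    filter memory over from the previous subframe.
    """
    out = []
    for i, x in enumerate(excitation):
        acc = x
        for j, a in enumerate(lpc):
            if i - 1 - j >= 0:
                acc -= a * out[i - 1 - j]
        out.append(acc)
    return out


def make_target(sp, acb, gp2, lpc):
    """Target(k, i) = Sp(k, i) - syn(k, i) for one subframe."""
    scaled = [gp2 * v for v in acb]    # gain multiplication (106b)
    syn = lpc_synthesis(scaled, lpc)   # LPC synthesis filter (106c)
    return [s - y for s, y in zip(sp, syn)]  # subtraction (106d)
```

Here `sp` is one of the three reproduced-speech vectors, `acb` is the adaptive codebook output for that subframe, and `gp2` is the corresponding pitch gain.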

代数符号変換部107は、EVRCの代数符号探索と全く同じ処理を行う。図7は代数符号変換部107の構成図である。代数符号帳107aは、表３に示したパルス位置・極性の組み合わせでできる任意のパルス性音源信号を出力する。すなわち、代数符号帳107aは誤差評価部107bから所定の代数符号に応じたパルス性音源信号の出力が指示されると、該指示された代数符号に応じたパルス性音源信号をLPC合成フィルタ107cに入力する。LSP符号の逆量子化値lsp2(k)で構成されるLPC合成フィルタ107cは、代数符号帳出力信号が入力すると代数合成信号alg(k，i)を作成して出力する。誤差評価部107bは、代数合成信号alg(k，i)とターゲット信号Target(k，i)の相互相関値Rcx、代数合成信号の自己相関値Rccを計算し、Rcxの２乗をRccで正規化して得られる正規化相互相関値Rcx・Rcx/Rccが最も大きくなる代数符号Cb2(m，k)を探索して出力する。 The algebraic code conversion unit 107 performs exactly the same processing as the EVRC algebraic code search. FIG. 7 is a configuration diagram of the algebraic code converter 107. The algebraic codebook 107a outputs an arbitrary pulsed sound source signal that can be generated by a combination of pulse positions and polarities shown in Table 3. That is, when the algebraic codebook 107a is instructed to output a pulsed excitation signal corresponding to a predetermined algebraic code from the error evaluation unit 107b, the algebraic codebook 107a sends the pulsed excitation signal corresponding to the instructed algebraic code to the LPC synthesis filter 107c. input. The LPC synthesis filter 107c composed of the dequantized value lsp2 (k) of the LSP code creates and outputs an algebraic synthesized signal alg (k, i) when an algebraic codebook output signal is input. The error evaluation unit 107b calculates the cross-correlation value Rcx of the algebraic synthesized signal alg (k, i) and the target signal Target (k, i) and the autocorrelation value Rcc of the algebraic synthesized signal, and normalizes the square of Rcx with Rcc. The algebraic code Cb2 (m, k) having the largest normalized cross-correlation value Rcx · Rcx / Rcc obtained by the conversion is searched and output.
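The selection criterion of the error evaluation unit 107b can be sketched as an exhaustive search over candidate codes. This is only an illustration of the criterion Rcx·Rcx/Rcc: the function name `search_algebraic_code` and the dictionary of pre-synthesized candidates are assumptions, and a practical EVRC search uses a structured pulse search rather than enumerating every code.

```python
def search_algebraic_code(target, candidates):
    """Return the code whose synthesized signal maximizes Rcx^2 / Rcc.

    target:     Target(k, i) for one subframe
    candidates: dict mapping an algebraic code to its algebraic
                synthesized signal alg(k, i) (HYPOTHETICAL input format)
    """
    best_code, best_score = None, -1.0
    for code, alg in candidates.items():
        rcx = sum(a * t for a, t in zip(alg, target))  # cross-correlation
        rcc = sum(a * a for a in alg)                  # autocorrelation
        if rcc == 0.0:
            continue  # skip an all-zero synthesized signal
        score = rcx * rcx / rcc
        if score > best_score:
            best_code, best_score = code, score
    return best_code
```

Maximizing the normalized cross-correlation is equivalent to minimizing the squared error between the target and the optimally scaled algebraic synthesized signal.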

代数符号帳ゲイン変換部108は図8に示す構成を備えている。代数符号帳108aは代数符号変換部107で得られた代数符号Cb2(m, k)に対応するパルス性音源信号を発生してLPC合成フィルタ108bに入力する。LSP符号の逆量子化値lsp2(k)で構成されるLPC合成フィルタ108bは、代数符号帳出力信号が入力すると代数合成信号gan(k，i)を作成して出力する。代数符号帳ゲイン算出部108cは、代数合成信号gan(k，i)とターゲット信号Target(k，i)との相互相関値Rcx、代数合成信号の自己相関値Rccを求め、しかる後、RcxをRccで正規化して代数符号帳ゲインgc2(k)（=Rcx/Rcc）を求める。代数符号帳ゲイン量子化部108dは代数符号帳ゲインgc(k)2をEVRCの代数符号帳ゲイン量子化テーブル108eを使ってスカラー量子化する。EVRCでは代数符号帳ゲインの量子化ビットとして１サブフレーム当たり5bit（３２パタン）を割り当てている。したがって、この３２通りのテーブル値の中からgc2(k)に最も近いテーブル値を探し、その時のインデックス値を変換された代数符号帳ゲイン符号Gc2(m, k)とする。 The algebraic codebook gain converter 108 has the configuration shown in FIG. The algebraic codebook 108a generates a pulsed excitation signal corresponding to the algebraic code Cb2 (m, k) obtained by the algebraic code converter 107 and inputs it to the LPC synthesis filter 108b. The LPC synthesis filter 108b composed of the dequantized value lsp2 (k) of the LSP code creates and outputs an algebraic synthesized signal gan (k, i) when an algebraic codebook output signal is input. The algebraic codebook gain calculation unit 108c obtains the cross-correlation value Rcx between the algebraic synthesized signal gan (k, i) and the target signal Target (k, i) and the autocorrelation value Rcc of the algebraic synthesized signal, and then calculates Rcx. The algebraic codebook gain gc2 (k) (= Rcx / Rcc) is obtained by normalization with Rcc. The algebraic codebook gain quantization unit 108d performs scalar quantization on the algebraic codebook gain gc (k) 2 using the EVRC algebraic codebook gain quantization table 108e. In EVRC, 5 bits (32 patterns) are assigned per subframe as quantization bits of algebraic codebook gain. Therefore, the table value closest to gc2 (k) is searched from among these 32 table values, and the index value at that time is set as the converted algebraic codebook gain code Gc2 (m, k).
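The gain calculation and the table lookup can be sketched as follows. The function names are hypothetical, and the 32-entry table passed in by the caller is a placeholder; the real EVRC algebraic codebook gain table values are not given in this description.

```python
def algebraic_codebook_gain(target, gan):
    """gc2(k) = Rcx / Rcc for one subframe."""
    rcx = sum(g * t for g, t in zip(gan, target))  # cross-correlation
    rcc = sum(g * g for g in gan)                  # autocorrelation
    return rcx / rcc if rcc != 0.0 else 0.0


def quantize_gain(gain, table):
    """Return the index of the table entry closest to the gain.

    With a 5-bit code the table holds 32 entries; the index is the
    converted gain code Gc2(m, k).
    """
    return min(range(len(table)), key=lambda idx: abs(table[idx] - gain))
```

For example, with a placeholder table `[0.5 * i for i in range(32)]`, a gain of 2.0 maps to index 4.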

EVRCの１つのサブフレームについてピッチラグ符号、ピッチゲイン符号、代数符号、代数符号帳ゲイン符号の変換が終った後に、適応符号帳106a(図6)の更新を行う。初期状態では適応符号帳106aには全て振幅０の信号が格納されている。サブフレームの変換処理が終ると、図6の適応符号帳更新部106eは適応符号帳内で時間的に最も古い信号をサブフレーム長さだけ捨て、残りの信号をサブフレーム長だけシフトし、変換直後の最新の音源信号を適応符号帳内に格納する。ここで最新の音源信号とは、変換後のピッチラグ符号lag2(k)、ピッチゲインgp2(k)に応じた周期性音源信号と、代数符号Cb2(m,k)、代数符号帳ゲインgc2(k)に応じた雑音性音源信号を合成した音源信号である。
以上により、EVRCのLSP符号Lsp2(m)、ピッチラグ符号Lag2(m)、ピッチゲイン符号Gp2(m,k)、代数符号Cb2(m，k)、代数符号帳ゲイン符号Gc2(m，k)が求まれば、符号多重部109はこれらの符号を多重して一つにまとめて符号化方式２の音声符号CODE2(m)として出力する。 After conversion of the pitch lag code, pitch gain code, algebraic code, and algebraic codebook gain code is completed for one EVRC subframe, the adaptive codebook 106a (FIG. 6) is updated. In the initial state, all signals with an amplitude of 0 are stored in the adaptive codebook 106a. When the subframe conversion process is completed, adaptive codebook updating section 106e in FIG. 6 discards the oldest temporal signal in the adaptive codebook by the subframe length, shifts the remaining signals by the subframe length, and converts them. The latest excitation signal immediately after is stored in the adaptive codebook. Here, the latest excitation signal is a periodic excitation signal corresponding to the converted pitch lag code lag2 (k), pitch gain gp2 (k), algebraic code Cb2 (m, k), algebraic codebook gain gc2 (k ) Is a sound source signal synthesized with a noisy sound source signal.
Once the EVRC LSP code Lsp2(m), pitch lag code Lag2(m), pitch gain code Gp2(m, k), algebraic code Cb2(m, k), and algebraic codebook gain code Gc2(m, k) have been obtained as described above, the code multiplexing unit 109 multiplexes these codes into one and outputs it as the speech code CODE2(m) of encoding scheme 2.
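The adaptive codebook update performed after each subframe can be sketched as a shift of a fixed-length buffer. The function name `update_adaptive_codebook` and the list representation of the buffer are assumptions for this illustration.

```python
def update_adaptive_codebook(codebook, acb, gp2, pulses, gc2):
    """Drop the oldest subframe of samples and append the newest excitation.

    codebook: past excitation, oldest sample first (all zeros initially)
    acb:      adaptive codebook output for the subframe (periodic part)
    pulses:   algebraic codebook output for the subframe (noise part)
    """
    if len(acb) != len(pulses) or len(acb) > len(codebook):
        raise ValueError("subframe vectors do not fit the codebook")
    excitation = [gp2 * a + gc2 * c for a, c in zip(acb, pulses)]
    # Discard the oldest len(excitation) samples, shift, store the newest.
    return codebook[len(excitation):] + excitation
```

The buffer length stays constant, so the caller can reuse the returned list as the adaptive codebook for the next subframe.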

第1実施例では、LSP符号、ピッチラグ符号、ピッチゲイン符号を量子化パラメータ領域で符号変換しているため、再生音声を再度LPC分析、ピッチ分析する場合に比べて分析誤差が小さく、音質劣化の少ないパラメータ変換が可能である。また、再生音声を再度LPC分析、ピッチ分析しないため、従来技術１で問題となっていた符号変換による遅延の問題を解決することができる。
一方、代数符号、代数符号帳ゲイン符号については、再生音声からターゲット信号を作成し、ターゲット信号との誤差が最小になるように変換することにより、従来技術２で問題となっていた符号化方式１と符号化方式２の代数符号帳の構成が大きく異なっている場合でも音質劣化の少ない符号変換が可能である。 In the first embodiment, since the LSP code, pitch lag code, and pitch gain code are code-converted in the quantization parameter area, the analysis error is small compared with the case where the reproduced speech is analyzed again by LPC analysis and pitch analysis, and the sound quality is deteriorated. Less parameter conversion is possible. Further, since the reproduced speech is not subjected to LPC analysis or pitch analysis again, the problem of delay due to code conversion, which has been a problem in the prior art 1, can be solved.
On the other hand, for the algebraic code and the algebraic codebook gain code, a target signal is created from the reproduced speech and the codes are converted so that the error from the target signal is minimized. This enables code conversion with little deterioration in sound quality even when the algebraic codebooks of encoding method 1 and encoding method 2 differ greatly in structure, which was a problem in prior art 2.

（C）第2実施例
図９は本発明の第2実施例の音声符号変換装置の構成図であり、図2の第1実施例と同一部分には同一符号を付している。第2実施例において第1実施例と異なる点は、（1）第1実施例の代数符号帳ゲイン変換部108を除去し、替わって代数符号帳ゲイン量子化部１１０を設けた点、（2）LSP符号、ピッチラグ符号、ピッチゲイン符号に加えて、代数符号帳ゲイン符号も量子化パラメータ領域で符号変換する点である。 (C) Second Embodiment FIG. 9 is a block diagram of a speech code conversion apparatus according to a second embodiment of the present invention. Components identical with those of the first embodiment of FIG. The second embodiment differs from the first embodiment in that (1) the algebraic codebook gain conversion unit 108 of the first embodiment is removed, and an algebraic codebook gain quantization unit 110 is provided instead. ) In addition to the LSP code, pitch lag code, and pitch gain code, the algebraic codebook gain code is also subjected to code conversion in the quantization parameter area.

第2実施例において、代数符号帳ゲイン符号の変換方法だけが第1実施例と異なる。以下、第2実施例の代数符号帳ゲイン符号の変換方法を説明する。
G.729Aでは5msecのサブフレーム毎に代数符号帳ゲインを量子化するから、20msecを単位として考えるとG.729Aは１フレームに４つの代数符号帳ゲインを量子化する。一方、EVRCは１フレームに3つの代数符号帳ゲインを量子化する。したがって、G.729Aの音声符号をEVRCの音声符号へ変換する場合には、G.729Aの全ての代数符号帳ゲインをEVRCの代数符号帳ゲインに変換することはできない。そこで、第２実施例では図１０に示す方法によりゲインの変換を行う。すなわち、G.729Aの連続する２つのフレームの代数符号帳ゲインをgc1(0)、gc1(1)、gc1(2)、gc1(3)とし、次式
gc2(0) ＝ gc1(0)
gc2(1) ＝ (gc1(1) + gc1(2)) / 2
gc2(2) ＝ gc1(3)
により代数符号帳ゲインを合成する。合成された代数符号帳ゲインgc2(k)(k=0,1,2)をそれぞれEVRCの代数符号帳ゲイン量子化テーブルを用いてスカラー量子化し、代数符号帳ゲイン符号Gc2(m,k)を求める。 The second embodiment differs from the first embodiment only in the algebraic codebook gain code conversion method. The algebraic codebook gain code conversion method of the second embodiment will be described below.
In G.729A, the algebraic codebook gain is quantized every 5 msec sub-frame, so when considering 20 msec as a unit, G.729A quantizes four algebraic codebook gains in one frame. On the other hand, EVRC quantizes three algebraic codebook gains in one frame. Therefore, when converting a G.729A speech code to an EVRC speech code, it is not possible to convert all G.729A algebraic codebook gains to EVRC algebraic codebook gains. Therefore, in the second embodiment, gain conversion is performed by the method shown in FIG. That is, the algebraic codebook gains of two consecutive frames of G.729A are gc1 (0), gc1 (1), gc1 (2), and gc1 (3),
gc2(0) = gc1(0)
gc2(1) = (gc1(1) + gc1(2)) / 2
gc2(2) = gc1(3)
The algebraic codebook gains are synthesized by these equations. Each synthesized algebraic codebook gain gc2(k) (k = 0, 1, 2) is scalar quantized using the EVRC algebraic codebook gain quantization table to obtain the algebraic codebook gain code Gc2(m, k).

第2実施例では、LSP符号、ピッチラグ符号、ピッチゲイン符号、代数符号帳ゲイン符号を量子化パラメータ領域で符号変換しているため、再生音声を再度LPC分析、ピッチ分析する場合に比べて分析誤差が小さく、音質劣化の少ないパラメータ変換が可能である。また、再生音声を再度LPC分析、ピッチ分析しないため、従来技術１で問題となっていた符号変換による遅延の問題を解決することができる。
一方、代数符号については、再生音声からターゲット信号を作成し、ターゲット信号との誤差が最小になるように変換することにより、従来技術２で問題となっていた符号化方式１と符号化方式２の代数符号帳の構成が大きく異なっている場合でも音質劣化の少ない符号変換が可能である。 In the second embodiment, since the LSP code, pitch lag code, pitch gain code, and algebraic codebook gain code are code-converted in the quantization parameter area, the analysis error is compared with the case where the reproduced speech is subjected to LPC analysis and pitch analysis again. Parameter conversion with small sound quality degradation is possible. Further, since the reproduced speech is not subjected to LPC analysis or pitch analysis again, the problem of delay due to code conversion, which has been a problem in the prior art 1, can be solved.
On the other hand, for the algebraic code, a target signal is created from the reproduced speech and the code is converted so that the error from the target signal is minimized. This enables code conversion with little deterioration in sound quality even when the algebraic codebooks of encoding method 1 and encoding method 2 differ greatly in structure, which was a problem in prior art 2.

（D）第3実施例
図11は第3実施例の音声符号変換装置の全体構成図である。第3実施例はEVRCの音声符号をG.729Aの音声符号に変換する場合の例を示している。図11において、レート判定部201は、EVRC符号器より音声符号が入力すると、EVRCのレートを判別する。EVRC音声符号の中にフルレート、ハーフレート、1/8レートの何れであるかを示すレート情報が含まれているから、レート判定部201はこの情報を用いてEVRCのレートを判別する。そして、レート判定部201はレートに応じてスイッチS1、S2を切り替え、EVRC音声符号を選択的に所定のレート用音声符号変換部202,203,204に入力し、かつ、該レート用音声符号変換部から出力されるG.729Aの音声符号をG.729A復号器側に送出する。 (D) Third Embodiment FIG. 11 is an overall configuration diagram of a speech code conversion device according to a third embodiment. The third embodiment shows an example in which EVRC speech codes are converted to G.729A speech codes. In FIG. 11, when a speech code is input from an EVRC encoder, a rate determination unit 201 determines an EVRC rate. Since the EVRC speech code includes rate information indicating whether the rate is a full rate, a half rate, or a 1/8 rate, the rate determination unit 201 uses this information to determine the EVRC rate. Then, the rate determination unit 201 switches the switches S1 and S2 according to the rate, and selectively inputs the EVRC speech code to the predetermined rate speech code conversion units 202, 203, and 204, and is output from the rate speech code conversion unit. The G.729A speech code is sent to the G.729A decoder.

・フルレート用音声符号変換部
図12はフルレート用音声符号変換部２０２の構成図である。EVRCのフレーム長は20msecであり、G.729Aのフレーム長は10msecであるため、EVRCの１フレーム（第mフレーム）の音声符号をG.729Aの２フレーム（第n, n+1フレーム）の音声符号に変換する。
EVRCの符号器（図示せず）から伝送路を介して第ｍフレーム目の音声符号（回線データ）CODE1(m)が端子＃１に入力する。符号分離部301は、音声符号CODE1(m)からLSP符号Lsp1(m)、ピッチラグ符号Lag1(m)、ピッチゲイン符号Gp1(m, k)、代数符号Cb1(m, k)、代数符号帳ゲイン符号Gc1(m, k)を分離して各逆量子化部302〜306に入力する。ここで、ｋはEVRCのサブフレーム番号であり、0、１，２の何れかの値を取る。 Full-rate speech code converter FIG. 12 is a block diagram of the full-rate speech code converter 202. Since the EVRC frame length is 20 msec and the G.729A frame length is 10 msec, the EVRC 1 frame (m-th frame) voice code is changed to 2 G.729A frames (n-th and n + 1-th frames). Convert to speech code.
The mth frame speech code (line data) CODE1 (m) is input to terminal # 1 from an EVRC encoder (not shown) via a transmission line. Code separation unit 301 is LSP code Lsp1 (m), pitch lag code Lag1 (m), pitch gain code Gp1 (m, k), algebraic code Cb1 (m, k), algebraic codebook gain from speech code CODE1 (m) The code Gc1 (m, k) is separated and input to the inverse quantization units 302 to 306. Here, k is a subframe number of EVRC, and takes a value of 0, 1, or 2.

LSP逆量子化部302は、サブフレーム番号2のLSP符号Lsp1(m)の逆量子化値lsp1(m，2)を求める。なお、LSP逆量子化部302はEVRC復号器と同じ量子化テーブルを用いるものとする。次に、LSP逆量子化部302は、前フレーム（第m-1フレーム）で同様にして求めたサブフレーム番号2の逆量子化値lsp1(m-1，2)と前記逆量子化値lsp1(m，2)を用いて線形補間によりサブフレーム番号0,1の逆量子化値lsp1(m，0)とlsp1(m，1)を求め、サブフレーム番号1の逆量子化値lsp1(m, 1)をLSP量子化部307に入力する。LSP量子化部307は、符号化方式2(G.729A)の量子化テーブルを用いて逆量子化値lsp1(m，1)を量子化して符号化方式2のLSP符号Lsp2(n)を求めると共にそのLSP逆量子化値lsp2(n，1)を求める。同様にして、LSP逆量子化部302は、サブフレーム番号２の逆量子化値lsp1(m，2)をLSP量子化部307に入力し、符号化方式2のLSP符号Lsp2(n+1)とそのLSP逆量子化値lsp2(n+1，1)を求める。ここで、LSP量子化部302はG.729Aと同じ量子化テーブルを用いるものとする。
ついで、LSP量子化部307は前フレーム（第n-1フレーム）で求めた逆量子化値lsp2(n-1,1)と現フレームの逆量子化値lsp2(n，1)との線形補間によりサブフレーム番号0の逆量子化値lsp2(n，0)を求める。また、逆量子化値lsp2(n，1)と逆量子化値lsp2(n+1，1)との線形補間によりサブフレーム0の逆量子化値lsp2(n+1，0)を求める。これら逆量子化値lsp2(n，j)はターゲット信号の作成や代数符号、ゲイン符号の変換に使用される。 The LSP inverse quantization unit 302 obtains an inverse quantization value lsp1 (m, 2) of the LSP code Lsp1 (m) of subframe number 2. Note that the LSP inverse quantization unit 302 uses the same quantization table as the EVRC decoder. Next, the LSP inverse quantization unit 302 performs the inverse quantization value lsp1 (m−1,2) of subframe number 2 obtained in the same manner in the previous frame (the m−1th frame) and the inverse quantization value lsp1. The inverse quantization values lsp1 (m, 0) and lsp1 (m, 1) of subframe numbers 0 and 1 are obtained by linear interpolation using (m, 2), and the inverse quantization values lsp1 (m , 1) is input to the LSP quantization unit 307. The LSP quantization unit 307 quantizes the inverse quantization value lsp1 (m, 1) using the quantization table of coding scheme 2 (G.729A) to obtain the LSP code Lsp2 (n) of coding scheme 2 In addition, the LSP inverse quantization value lsp2 (n, 1) is obtained. Similarly, the LSP inverse quantization unit 302 inputs the inverse quantization value lsp1 (m, 2) of subframe number 2 to the LSP quantization unit 307, and the LSP code Lsp2 (n + 1) of encoding scheme 2 And its LSP inverse quantization value lsp2 (n + 1, 1). Here, it is assumed that the LSP quantization unit 302 uses the same quantization table as G.729A.
Next, the LSP quantization unit 307 obtains the dequantized value lsp2(n, 0) of subframe 0 by linear interpolation between the dequantized value lsp2(n-1, 1) obtained in the previous frame (the (n-1)-th frame) and the dequantized value lsp2(n, 1) of the current frame. Likewise, the dequantized value lsp2(n+1, 0) of subframe 0 is obtained by linear interpolation between the dequantized values lsp2(n, 1) and lsp2(n+1, 1). These dequantized values lsp2(n, j) are used for creating the target signal and for converting the algebraic code and the gain code.

ピッチラグ逆量子化部303はサブフレーム番号２のピッチラグ符号Lag1(m)の逆量子化値lag1(m, 2)を求め、この逆量子化値lag1(m, 2)と第m-1フレームで求めたサブフレーム番号2の逆量子化値lag(m-1, 2)の線形補間によりサブフレーム番号0,1の逆量子化値lag1(m, 0)，lag1(m, 1)を求める。次に、ピッチラグ逆量子化部303は逆量子化値lag1(m, 1)をピッチラグ量子化部308に入力し、ピッチラグ量子化部308は符号化方式2(G.729A)の量子化テーブルを用いて逆量子化値lag1(m,1)に対応する符号化方式2のピッチラグ符号Lag2(n)を求めると共にその逆量子化値lag2(n,1)を求める。同様にして、ピッチラグ逆量子化部303は逆量子化値lag1(m, 2)をピッチラグ量子化部308に入力し、ピッチラグ量子化部308はピッチラグ符号Lag2(n+1)を求めると共にその逆量子化値lag2(n+1, 1)を求める。ここで、ピッチラグ量子化部308はG.729Aと同じ量子化テーブルを用いる。 The pitch lag inverse quantization unit 303 obtains an inverse quantization value lag1 (m, 2) of the pitch lag code Lag1 (m) of subframe number 2, and uses the inverse quantization value lag1 (m, 2) and the m−1th frame. The inverse quantization values lag1 (m, 0) and lag1 (m, 1) of the subframe numbers 0 and 1 are obtained by linear interpolation of the obtained inverse quantization value lag (m-1, 2) of the subframe number 2. Next, the pitch lag inverse quantization unit 303 inputs the inverse quantization value lag1 (m, 1) to the pitch lag quantization unit 308, and the pitch lag quantization unit 308 obtains the quantization table of coding scheme 2 (G.729A). The pitch lag code Lag2 (n) of the encoding scheme 2 corresponding to the inverse quantized value lag1 (m, 1) is obtained and the inverse quantized value lag2 (n, 1) is obtained. Similarly, the pitch lag inverse quantization unit 303 inputs the inverse quantization value lag1 (m, 2) to the pitch lag quantization unit 308, and the pitch lag quantization unit 308 obtains the pitch lag code Lag2 (n + 1) and vice versa. The quantized value lag2 (n + 1, 1) is obtained. Here, the pitch lag quantization unit 308 uses the same quantization table as G.729A.

ついで、ピッチラグ量子化部308は前フレーム（第n-1フレーム）で求めた逆量子化値lag2(n-1,1)と現フレームの逆量子化値lag2(n,1)との線形補間によりサブフレーム0の逆量子化値lag2(n, 0)を求める。また、逆量子化値lag2(n, 1)と逆量子化値lag2(n+1, 1)との線形補間によりサブフレーム0の逆量子化値lag2(n+1，0)を求める。これら逆量子化値lag2(n，j)はターゲット信号の作成やゲイン符号の変換に使用される。
ピッチゲイン逆量子化部304はEVRCの第mフレームの３つのピッチゲイン符号Gp1(m, k) (k=0,1,2)の逆量子化値gp1(m, k)を求め、ピッチゲイン補間部309に入力する。ピッチゲイン補間部309は逆量子化値gp1(m, k)を用いて、符号化方式2(G.729A)のピッチゲイン逆量子化値gp2(n, j)(j＝0,1)、gp2(n+1, j)(j＝0,1)を次式
(1) gp2(n, 0) ＝ gp1(m, 0)
(2) gp2(n, 1) ＝ (gp1(m, 0) + gp1(m,1)) / 2
(3) gp2(n+1, 0) ＝ (gp1(m, 1) + gp1(m, 2)) / 2
(4) gp2(n+1, 1) ＝ gp1(m, 2)
により補間して求める。尚、ゲイン符号変換の際にピッチゲイン逆量子化値gp2(n, j)は、直接必要でないが、ターゲット信号の生成に使用する。 Next, the pitch lag quantization unit 308 performs linear interpolation between the inverse quantization value lag2 (n-1,1) obtained in the previous frame (n-1 frame) and the inverse quantization value lag2 (n, 1) of the current frame. Thus, the inverse quantization value lag2 (n, 0) of subframe 0 is obtained. Further, the inverse quantization value lag2 (n + 1, 0) of subframe 0 is obtained by linear interpolation between the inverse quantization value lag2 (n, 1) and the inverse quantization value lag2 (n + 1, 1). These dequantized values lag2 (n, j) are used for creating a target signal and converting a gain code.
The pitch gain inverse quantization unit 304 obtains the inverse quantization values gp1(m, k) of the three pitch gain codes Gp1(m, k) (k = 0, 1, 2) of the mth EVRC frame and inputs them to the pitch gain interpolation unit 309. Using the inverse quantization values gp1(m, k), the pitch gain interpolation unit 309 obtains the pitch gain inverse quantization values gp2(n, j) (j = 0, 1) and gp2(n+1, j) (j = 0, 1) of coding scheme 2 (G.729A) by interpolation according to the following equations:
(1) gp2(n, 0) = gp1(m, 0)
(2) gp2(n, 1) = (gp1(m, 0) + gp1(m, 1)) / 2
(3) gp2(n+1, 0) = (gp1(m, 1) + gp1(m, 2)) / 2
(4) gp2(n+1, 1) = gp1(m, 2)
The pitch gain inverse quantization values gp2(n, j) are not needed directly for gain code conversion, but they are used to generate the target signal.
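Equations (1) to (4) map the three EVRC subframe gains onto the four G.729A subframe gains of two consecutive frames. A minimal sketch follows; the function name and the tuple layout of the return value are illustrative choices, not part of the patent.

```python
def interpolate_pitch_gains(gp1):
    """Map the three EVRC pitch gains gp1(m, 0..2) of frame m onto the G.729A
    pitch gains of frames n and n+1, following equations (1) to (4)."""
    g0, g1, g2 = gp1
    gp2_n = (g0, (g0 + g1) / 2)    # (1) gp2(n, 0) and (2) gp2(n, 1)
    gp2_n1 = ((g1 + g2) / 2, g2)   # (3) gp2(n+1, 0) and (4) gp2(n+1, 1)
    return gp2_n, gp2_n1


gp2_n, gp2_n1 = interpolate_pitch_gains([0.25, 0.5, 1.0])
# gp2_n  == (0.25, 0.375)
# gp2_n1 == (0.75, 1.0)
```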

音声再生部310はEVRCの各符号の逆量子化値lsp1(m, k)、lag1(m, k)、gp1(m,k)、cb1(m, k)、gc1(m, k)を入力されて、第ｍフレームにおけるトータル160サンプルのEVRCの再生音声SP(k,i)を作成し、これら再生信号を80サンプルづつの2つのG.729Aの再生信号Sp(n,h)，Sp(n+1,h)に分割して出力する。ここで、再生音声の作成方法はEVRCの復号器と同じで周知であるので説明を省略する。
ターゲット生成部311は第1実施例のターゲット生成部（図6参照）と同様な構成を備えており、代数符号変換部312と代数符号帳ゲイン変換部313で用いるターゲット信号Target(n,h)，Target(n+1,h)を作成する。すなわち、ターゲット生成部311は、まず、ピッチラグ量子化部308で求めたピッチラグlag2(n, j)に対応する適応符号帳出力を求め、これにピッチゲインgp2(n, j)を乗じて音源信号を作成する。次に、該音源信号をLSP逆量子化値lsp2(n, j)で構成されるLPC合成フィルタに入力して適応符号帳合成信号syn(n,h)を作成する。しかる後、音声再生部310で作成した再生音声Sp(n,h)から適応符号帳合成信号syn(n,h)を差し引いてターゲット信号Target(n,h)を求める。同様にして、第n+1フレーム目のターゲット信号Target(n+1,h)を作成する。 The speech reproduction unit 310 receives the inverse quantization values lsp1(m, k), lag1(m, k), gp1(m, k), cb1(m, k), and gc1(m, k) of the EVRC codes, creates the EVRC reproduced speech SP(k, i) for the mth frame (160 samples in total), and divides it into two G.729A reproduced signals Sp(n, h) and Sp(n+1, h) of 80 samples each for output. The reproduced speech is created in the same way as in the EVRC decoder, which is well known, so its description is omitted.
The target generation unit 311 has the same configuration as the target generation unit of the first embodiment (see FIG. 6) and creates the target signals Target(n, h) and Target(n+1, h) used by the algebraic code conversion unit 312 and the algebraic codebook gain conversion unit 313. Specifically, the target generation unit 311 first obtains the adaptive codebook output corresponding to the pitch lag lag2(n, j) obtained by the pitch lag quantization unit 308 and multiplies it by the pitch gain gp2(n, j) to create an excitation (sound source) signal. Next, it inputs the excitation signal to an LPC synthesis filter constructed from the LSP inverse quantization values lsp2(n, j) to create the adaptive codebook synthesis signal syn(n, h). It then subtracts syn(n, h) from the reproduced speech Sp(n, h) created by the speech reproduction unit 310 to obtain the target signal Target(n, h). The target signal Target(n+1, h) of the (n+1)th frame is created in the same way.
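The target computation can be sketched as follows, under several simplifying assumptions that are not from the patent: the LPC coefficients a[i] of A(z) = 1 + a[0]z^-1 + ... are taken as already derived from lsp2(n, j), the pitch lag is an integer, the synthesis filter starts from zero state, and all function names are hypothetical.

```python
def lpc_synthesis(excitation, a):
    """All-pole filter 1/A(z) with A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ...
    The filter state is assumed to start at zero (a simplification)."""
    hist = [0.0] * len(a)          # hist[0] holds y[n-1], hist[1] holds y[n-2], ...
    out = []
    for x in excitation:
        y = x - sum(ai * hi for ai, hi in zip(a, hist))
        out.append(y)
        hist = [y] + hist[:-1]
    return out


def adaptive_codebook_output(past_excitation, lag, length):
    """Repeat the past excitation with period `lag` (integer lags only)."""
    if not 0 < lag <= len(past_excitation):
        raise ValueError("lag must lie within the stored excitation")
    buf = list(past_excitation)
    start = len(buf) - lag
    for i in range(length):
        buf.append(buf[start + i])   # reads freshly appended samples when i >= lag
    return buf[len(past_excitation):]


def make_target(reproduced, past_excitation, lag, pitch_gain, a):
    """Target = reproduced speech minus the adaptive codebook synthesis signal."""
    v = adaptive_codebook_output(past_excitation, lag, len(reproduced))
    excitation = [pitch_gain * s for s in v]
    syn = lpc_synthesis(excitation, a)
    return [s - y for s, y in zip(reproduced, syn)]


def split_frame(sp, half=80):
    """Split one 160-sample EVRC frame into two 80-sample G.729A frames."""
    if len(sp) != 2 * half:
        raise ValueError("expected exactly two G.729A frames of samples")
    return sp[:half], sp[half:]
```

For example, `make_target([1.0, 1.0, 1.0, 1.0], [1.0, 0.0], lag=2, pitch_gain=0.5, a=[-0.5])` subtracts the filtered periodic excitation from a four-sample toy signal.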

代数符号変換部312は第1実施例の代数符号変換部(図7参照)と同様の構成を備え、G.729Aの代数符号帳探索と全く同じ処理を行う。まず、図１８に示したパルス位置・極性の組合せでできる代数符号帳出信号をLSP逆量子化値lsp2(n, j)で構成されるLPC合成フィルタに入力して代数合成信号を作成する。次に、前記代数合成信号とターゲット信号の相互相関値Rcxと、代数合成信号の自己相関値Rccを計算し、Rcxの２乗をRccで正規化して得られる正規化相互相関値Rcx・Rcx/Rccが最も大きくなる代数符号Cb2(n, j)を探索する。同様にして代数符号Cb2(n+1,j)を求める。 The algebraic code conversion unit 312 has the same configuration as the algebraic code conversion unit of the first embodiment (see FIG. 7) and performs exactly the same processing as the G.729A algebraic codebook search. First, the algebraic codebook output signal formed from each combination of pulse positions and polarities shown in FIG. 18 is input to an LPC synthesis filter constructed from the LSP inverse quantization values lsp2(n, j) to create an algebraic synthesis signal. Next, the cross-correlation Rcx between the algebraic synthesis signal and the target signal and the autocorrelation Rcc of the algebraic synthesis signal are calculated, and a search is made for the algebraic code Cb2(n, j) that maximizes the normalized cross-correlation Rcx·Rcx/Rcc, obtained by normalizing the square of Rcx by Rcc. The algebraic code Cb2(n+1, j) is obtained in the same way.
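The selection criterion can be illustrated with a brute-force sketch. It does not reproduce the G.729A pulse-position structure of FIG. 18 or the fast search used in practice; the candidate list, the `synthesize` callback (standing in for the LPC synthesis filter), and the function name are all hypothetical.

```python
def search_algebraic_code(target, codevectors, synthesize):
    """Return the index of the code vector maximizing Rcx*Rcx/Rcc, where Rcx is
    the cross-correlation of the synthesized vector with the target and Rcc is
    the energy (autocorrelation at lag 0) of the synthesized vector."""
    best_index, best_score = None, -1.0
    for index, code in enumerate(codevectors):
        y = synthesize(code)
        rcx = sum(t * s for t, s in zip(target, y))
        rcc = sum(s * s for s in y)
        if rcc <= 0.0:
            continue                      # skip all-zero candidates
        score = rcx * rcx / rcc
        if score > best_score:
            best_index, best_score = index, score
    return best_index


# Toy example with an identity "filter" and three single-pulse candidates
candidates = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
best = search_algebraic_code([0.0, 2.0, 0.0, 0.0], candidates, lambda c: c)
```

Because Rcx is squared, a vector and its negation score identically under this criterion, so a practical search also has to fix the pulse polarities so that Rcx is positive.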

ゲイン変換部313はターゲット信号Target(n,h)、ピッチラグlag2(n, j)、代数符号Cb2(n, j)、LSP逆量子化値lsp2(n, j)を用いてゲイン変換を行う。変換方法はG.729Aの符号器におけるゲイン量子化と同じである。手順を以下に示す。
(1) G.729のゲイン量子化テーブルの中から一組のテーブル値（ピッチゲイン、代数符号帳ゲインの補正係数γ）を取り出す。
(2) 適応符号帳出力に前記ピッチゲインのテーブル値を乗じて信号Xを作成する。
(3) 代数符号帳出力に、前記補正係数γとゲイン予測値g′を乗じて信号Yを作成する。
(4) 信号Xと信号Yを加算して得られる信号を、LSP逆量子化値lsp2(n, j)で構成されるLPC合成フィルタに入力して合成信号Zを作成する。
(5) ターゲット信号と合成信号Zの誤差電力Eを計算する。
(6) (1)〜(5)の処理をゲイン量子化テーブルの全てのテーブル値について行い、誤差電力Eが最小となるテーブル値を決定し、そのインデックスをゲイン符号Gain2(n, j)とする。同様にして、ターゲット信号Target(n+1,h)とピッチラグlag2(n+1, j)、代数符号Cb2(n+1, j)、LSP逆量子化値lsp2(n+1, j)からゲイン符号Gain2(n+1, j)を求める。 The gain conversion unit 313 performs gain conversion using the target signal Target (n, h), the pitch lag lag2 (n, j), the algebraic code Cb2 (n, j), and the LSP inverse quantization value lsp2 (n, j). The conversion method is the same as the gain quantization in the G.729A encoder. The procedure is shown below.
(1) A set of table values (pitch gain, algebraic codebook gain correction coefficient γ) is extracted from the G.729 gain quantization table.
(2) The signal X is generated by multiplying the adaptive codebook output by the pitch gain table value.
(3) The signal Y is generated by multiplying the algebraic codebook output by the correction coefficient γ and the predicted gain value g ′.
(4) A signal obtained by adding the signal X and the signal Y is input to an LPC synthesis filter composed of the LSP dequantized value lsp2 (n, j) to create a synthesized signal Z.
(5) The error power E between the target signal and the synthesized signal Z is calculated.
(6) Steps (1) to (5) are performed for every entry of the gain quantization table, the table entry that minimizes the error power E is determined, and its index is taken as the gain code Gain2(n, j). In the same way, the gain code Gain2(n+1, j) is obtained from the target signal Target(n+1, h), the pitch lag lag2(n+1, j), the algebraic code Cb2(n+1, j), and the LSP inverse quantization value lsp2(n+1, j).
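Steps (1) to (6) amount to an exhaustive search over the gain table. The sketch below assumes the table is a list of (pitch gain, γ) pairs and that `synthesize` stands in for the LPC synthesis filter; the table contents, the predicted gain value, and all names are placeholders rather than actual G.729 data.

```python
def search_gain_code(target, adaptive_out, algebraic_out, gain_table,
                     predicted_gain, synthesize):
    """Try every (pitch gain, gamma) table entry and return (index, error) for
    the entry giving the smallest error power E against the target."""
    best_index, best_error = None, float("inf")
    for index, (pitch_gain, gamma) in enumerate(gain_table):          # step (1)
        x = [pitch_gain * v for v in adaptive_out]                    # step (2)
        y = [gamma * predicted_gain * c for c in algebraic_out]       # step (3)
        z = synthesize([xi + yi for xi, yi in zip(x, y)])             # step (4)
        error = sum((t - zi) ** 2 for t, zi in zip(target, z))        # step (5)
        if error < best_error:                                        # step (6)
            best_index, best_error = index, error
    return best_index, best_error


toy_table = [(0.0, 0.0), (0.5, 1.0), (1.0, 2.0)]   # placeholder entries
gain_code, err = search_gain_code(
    target=[0.5, 2.0], adaptive_out=[1.0, 0.0], algebraic_out=[0.0, 1.0],
    gain_table=toy_table, predicted_gain=2.0, synthesize=lambda s: s)
```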

しかる後、符号多重部314はLSP符号Lsp2(n)、ピッチラグ符号Lag2(n)、代数符号Cb2(n, j)、ゲイン符号Gain2(n, j)を多重してG.729Aの第nフレームにおける音声符号CODE2(n)を出力する。また、符号多重部314は、LSP符号Lsp2(n+1)、ピッチラグ符号Lag2(n+1)、代数符号Cb2(n+1, j)、ゲイン符号Gain2(n+1, j)を多重してG.729Aの第n+1フレームにおける音声符号CODE2(n+1)を出力する。以上の説明の通り、第3実施例によれば、EVRC(フルレート)の音声符号をG.729Aの音声符号に変換することができる。 Thereafter, the code multiplexing unit 314 multiplexes the LSP code Lsp2(n), the pitch lag code Lag2(n), the algebraic code Cb2(n, j), and the gain code Gain2(n, j), and outputs the speech code CODE2(n) for the nth G.729A frame. The code multiplexing unit 314 likewise multiplexes the LSP code Lsp2(n+1), the pitch lag code Lag2(n+1), the algebraic code Cb2(n+1, j), and the gain code Gain2(n+1, j), and outputs the speech code CODE2(n+1) for the (n+1)th G.729A frame. As described above, according to the third embodiment, an EVRC (full-rate) speech code can be converted into a G.729A speech code.

・ハーフレート用音声符号変換部
フルレートとハーフレートの符号器・復号器は、各量子化テーブルの大きさが異なるだけであり、その構成はほぼ同じである。したがって、ハーフレート用の音声符号変換部203も、前述したフルレート用の音声符号変換部202と同様に構成でき、同様にハーフレートの音声符号をG.729Aの音声符号に変換することができる。
・Half-rate speech code conversion unit
The full-rate and half-rate encoders and decoders differ only in the size of each quantization table and otherwise have substantially the same configuration. The half-rate speech code conversion unit 203 can therefore be configured in the same way as the full-rate speech code conversion unit 202 described above, and can likewise convert a half-rate speech code into a G.729A speech code.

・1/8レート用の音声符号変換部
図13は、1/8レート用の音声符号変換部204の構成図である。1/8レートは無音部や背景雑音部などの非音声区間に用いられる。又、1/8レートで伝送する情報はLSP符号(8bit/フレーム)とゲイン符号(8bit/フレーム)の計16bitであり、音源信号は符号器・復号器の内部でランダム発生させるため伝送しない。
図13において、符号分離部401はEVRC(1/8レート)の第ｍフレームにおける音声符号CODE1(m)が入力すると、LSP符号Lsp1(m)とゲイン符号Gc1(m)を分離する。LSP逆量子化部402及びLSP量子化部403は図12のフルレートの場合と同様にEVRCのLSP符号Lsp1(m)をG.729AのLSP符号Lsp2(n)に変換する。尚、LSP逆量子化部402はLSP符号逆量子化値lsp1(m, k)を求め、LSP量子化部403はG.729AのLSP符号Lsp2(n)を出力すると共にLSP符号逆量子化値lsp2(n, j)を求める。
・1/8-rate speech code conversion unit
FIG. 13 is a configuration diagram of the 1/8-rate speech code conversion unit 204. The 1/8 rate is used for non-speech sections such as silence and background noise. The information transmitted at the 1/8 rate totals 16 bits, consisting of the LSP code (8 bits/frame) and the gain code (8 bits/frame); the excitation signal is not transmitted because it is generated randomly inside the encoder and decoder.
In FIG. 13, when the speech code CODE1 (m) in the mth frame of EVRC (1/8 rate) is input, the code separation unit 401 separates the LSP code Lsp1 (m) and the gain code Gc1 (m). The LSP inverse quantization unit 402 and the LSP quantization unit 403 convert the EVRC LSP code Lsp1 (m) into the G.729A LSP code Lsp2 (n), as in the case of the full rate in FIG. The LSP inverse quantization unit 402 obtains an LSP code inverse quantization value lsp1 (m, k), and the LSP quantization unit 403 outputs an LSP code Lsp2 (n) of G.729A and an LSP code inverse quantization value. Find lsp2 (n, j).

ゲイン逆量子化部404はゲイン符号Gc1(m)のゲイン逆量子化値gc1(m, k)を求める。尚、1/8レートでは雑音性音源信号に対するゲインのみが使用され、周期性音源に対するゲイン（ピッチゲイン）は使用しない。
1/8レートでは音源信号を符号器・復号器の内部でランダム発生させて使用している。そこで、1/8レート用音声符号変換部においても、音源発生部405はEVRC符号器・復号器と同様にランダム信号を発生し、該ランダム信号の振幅がガウス分布になるように調節した信号を音源信号Cb1(m, k)として出力する。尚、ランダム信号の発生方法、ガウス分布への調節方法についてはEVRCと同様の方法を用いる。 The gain inverse quantization unit 404 obtains a gain inverse quantization value gc1 (m, k) of the gain code Gc1 (m). At the 1/8 rate, only the gain for the noisy sound source signal is used, and the gain for the periodic sound source (pitch gain) is not used.
At the 1/8 rate, the excitation (sound source) signal is generated randomly inside the encoder and decoder. Accordingly, in the 1/8-rate speech code conversion unit as well, the excitation generation unit 405 generates a random signal in the same manner as the EVRC encoder and decoder, adjusts it so that its amplitude follows a Gaussian distribution, and outputs the result as the excitation signal Cb1(m, k). The random signal is generated and shaped to a Gaussian distribution by the same methods as in EVRC.

ゲイン乗算部406は音源信号Cb1(m, k)にゲイン逆量子化値gc1(m, k)を乗算してLPC合成フィルタに入力してターゲット信号Target(n,h)，Target(n+1,h)を作成する。なお、LPC合成フィルタ407はLSP符号逆量子化値lsp1(m, k)で構成される。
代数符号変換部408は図12のフルレートの場合と同様にして代数符号変換を行い、G.729Aの代数符号Cb2(n, j)を出力する。
EVRCの1/8レートは、無音部や雑音部などの周期性のほとんどない非音声区間に対して用いられるためピッチラグ符号が存在しない。そこで、以下の方法によりG.729A用のピッチラグ符号を生成する。1/8レートの音声符号変換機204は、フルレートあるいはハーフレートの音声符号変換部202,203のピッチラグ変換部303,308で得られたG.729A用のピッチラグ符号を取り出し、ピッチラグバッファ409に格納する。そして、現フレーム（第nフレーム）で1/8レートが選択されると該ピッチラグバッファ409内のピッチラグ符号Lag2(n, j)を出力する。ただし、ピッチラグバッファの記憶内容は変更しない。一方、現フレームで1/8レートが選択されなかった場合は、選択されたレート（フルレート又はハーフレート）の音声符号変換部202,203のピッチラグ変換部303,308で得られたG.729A用のピッチラグ符号がバッファ409に格納される。 The gain multiplication unit 406 multiplies the excitation signal Cb1(m, k) by the gain inverse quantization value gc1(m, k) and inputs the product to the LPC synthesis filter 407 to create the target signals Target(n, h) and Target(n+1, h). The LPC synthesis filter 407 is constructed from the LSP inverse quantization values lsp1(m, k).
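The 1/8-rate target generation (random excitation, gain scaling, LPC synthesis) can be sketched as follows. Python's `random.gauss` is only a stand-in for the EVRC random generator and Gaussian shaping, which are not reproduced here; the LPC coefficients are assumed to be already derived from lsp1(m, k), the filter starts from zero state, and the function names are hypothetical.

```python
import random


def lpc_synthesis(excitation, a):
    """All-pole filter 1/A(z) with A(z) = 1 + a[0] z^-1 + ..., zero initial state."""
    hist = [0.0] * len(a)
    out = []
    for x in excitation:
        y = x - sum(ai * hi for ai, hi in zip(a, hist))
        out.append(y)
        hist = [y] + hist[:-1]
    return out


def eighth_rate_target(length, gain, a, rng):
    """Gaussian noise excitation, scaled by the dequantized gain, then passed
    through the LPC synthesis filter to form the target signal."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(length)]   # stand-in for Cb1(m, k)
    excitation = [gain * s for s in noise]                 # multiply by gc1(m, k)
    return lpc_synthesis(excitation, a)


target = eighth_rate_target(80, 0.5, [-0.9], random.Random(1))
```

Seeding the generator makes the sketch repeatable; whether the converter must track the decoder's random sequence exactly is not addressed here.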
Algebraic code conversion section 408 performs algebraic code conversion in the same manner as in the full rate case of FIG. 12, and outputs an algebraic code Cb2 (n, j) of G.729A.
Because the EVRC 1/8 rate is used for non-speech sections with almost no periodicity, such as silence and noise, it has no pitch lag code. A pitch lag code for G.729A is therefore generated as follows. The 1/8-rate speech code conversion unit 204 takes the G.729A pitch lag code obtained by the pitch lag conversion units 303 and 308 of the full-rate or half-rate speech code conversion unit 202 or 203 and stores it in the pitch lag buffer 409. When the 1/8 rate is selected in the current frame (the nth frame), the pitch lag code Lag2(n, j) held in the pitch lag buffer 409 is output, and the contents of the buffer are left unchanged. When the 1/8 rate is not selected in the current frame, the G.729A pitch lag code obtained by the pitch lag conversion units 303 and 308 of the speech code conversion unit 202 or 203 for the selected rate (full rate or half rate) is stored in the buffer 409.
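The buffer behaviour can be sketched as a small class. The class name, the rate labels, and the `default` value used before any full-rate or half-rate frame has arrived are assumptions; the text does not say what is output in that initial case.

```python
class PitchLagBuffer:
    """Holds the most recent G.729A pitch lag code produced by the full-rate or
    half-rate converter and replays it during 1/8-rate frames."""

    def __init__(self, default=0):
        self._lag_code = default   # assumed initial content, not from the patent

    def process(self, rate, lag_code=None):
        """Return the pitch lag code to output for the current frame."""
        if rate == "eighth":
            return self._lag_code      # output the buffered code, buffer unchanged
        if lag_code is None:
            raise ValueError("full-rate and half-rate frames must supply a lag code")
        self._lag_code = lag_code      # refresh the buffer on full or half rate
        return lag_code


buf = PitchLagBuffer()
outputs = [buf.process("full", 57), buf.process("eighth"), buf.process("eighth"),
           buf.process("half", 60), buf.process("eighth")]
```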

ゲイン変換部410は図12のフルレートの場合と同様にしてゲイン符号変換を行ってゲイン符号Gc2(n, j)を出力する。
しかる後、符号多重部411はLSP符号Lsp2(n)、ピッチラグ符号Lag2(n)、代数符号Cb2(n, j)、ゲイン符号Gain2(n, j)を多重してG.729Aの第nフレームにおける音声符号CODE2(n)を出力する。また、符号多重部411は、LSP符号Lsp2(n+1)、ピッチラグ符号Lag2(n+1)、代数符号Cb2(n+1, j)、ゲイン符号Gain2(n+1, j)を多重してG.729Aの第n+1フレームにおける音声符号CODE2(n+1)を出力する。以上の説明の通り、EVRC(1/8レート)の音声符号をG.729Aの音声符号に変換することができる。 Gain conversion section 410 performs gain code conversion in the same manner as in the full rate case of FIG. 12, and outputs gain code Gc2 (n, j).
Thereafter, the code multiplexing unit 411 multiplexes the LSP code Lsp2(n), the pitch lag code Lag2(n), the algebraic code Cb2(n, j), and the gain code Gain2(n, j), and outputs the speech code CODE2(n) for the nth G.729A frame. The code multiplexing unit 411 likewise multiplexes the LSP code Lsp2(n+1), the pitch lag code Lag2(n+1), the algebraic code Cb2(n+1, j), and the gain code Gain2(n+1, j), and outputs the speech code CODE2(n+1) for the (n+1)th G.729A frame. As described above, an EVRC (1/8-rate) speech code can be converted into a G.729A speech code.

（E）第4実施例
図14は第4実施例の音声符号変換装置の構成図であり、音声符号に回線誤りが発生しても対応できるようになっており、図2の第1実施例と同一符号を付している。異なる点は、（1）回線誤り検出部501が設けられている点、（2）LSP逆量子化部102a、ピッチラグ逆量子化部103a、ゲイン逆量子化部104a、代数符号逆量子化部110の替わりにLSP符号修正部511、ピッチラグ修正部512、ゲイン符号修正部513、代数符号修正部514が設けられている点である。
(E) Fourth Embodiment
FIG. 14 is a configuration diagram of the speech code conversion apparatus of the fourth embodiment, which can cope with line errors in the speech code; components identical to those of the first embodiment in FIG. 2 are given the same reference numerals. It differs in that (1) a line error detection unit 501 is provided, and (2) an LSP code correction unit 511, a pitch lag correction unit 512, a gain code correction unit 513, and an algebraic code correction unit 514 are provided in place of the LSP inverse quantization unit 102a, the pitch lag inverse quantization unit 103a, the gain inverse quantization unit 104a, and the algebraic code inverse quantization unit 110.

入力音声xinが符号化方式１(G.729A)の符号器500へ入力されると、符号器500は符号化方式１の音声符号sp1を発生する。音声符号sp1は、無線回線又は有線回線(インターネット等)の伝送路502を通って音声符号変換装置へ入力する。ここで、音声符号変換装置に入力される前に回線誤りERRが混入すると、音声符号sp1は回線誤りの入った音声符号sp′に変形される。回線誤りERRのパターンはシステムに依存し、ランダムビット誤り、バースト性誤りなどの様々なパターンを取りえる。尚、誤りが混入しない場合にはsp1′とsp1は全く同じ符号となる。音声符号sp1′は符号分離部101へ入力され、LSP符号Lsp1(n)、ピッチラグ符号Lag1(n,j)、代数符号Cbi(n,j)、ゲイン符号Gain1(n,j)に分離される。又、音声符号sp1′は回線誤り検出部501に入力し、周知の方法で回線誤りの有無が検出される。たとえば音声符号sp1にCRC符号を付加しておくことにより回線誤りを検出することができる。 When the input speech xin is input to the encoder 500 of coding scheme 1 (G.729A), the encoder 500 generates the speech code sp1 of coding scheme 1. The speech code sp1 is input to the speech code conversion apparatus through a transmission path 502, which is a wireless line or a wired line (the Internet or the like). If a line error ERR is introduced before the speech code reaches the speech code conversion apparatus, the speech code sp1 is altered into a speech code sp1′ containing the line error. The pattern of the line error ERR depends on the system and can take various forms, such as random bit errors and burst errors. If no error is introduced, sp1′ and sp1 are exactly the same code. The speech code sp1′ is input to the code separation unit 101 and separated into the LSP code Lsp1(n), the pitch lag code Lag1(n, j), the algebraic code Cb1(n, j), and the gain code Gain1(n, j). The speech code sp1′ is also input to the line error detection unit 501, which detects the presence or absence of a line error by a known method. For example, a line error can be detected by appending a CRC code to the speech code sp1.
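CRC-based detection can be illustrated with a short sketch. The patent does not say which CRC or framing is used, so the choice of CRC-32 (via Python's `zlib.crc32`), the 4-byte trailer, and the function names are all assumptions made for illustration.

```python
import zlib


def attach_crc(payload: bytes) -> bytes:
    """Sender side: append a 4-byte CRC-32 of the speech code (assumed framing)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def has_line_error(packet: bytes) -> bool:
    """Receiver side: True if the packet is too short or its CRC does not match."""
    if len(packet) < 4:
        return True
    payload, received = packet[:-4], packet[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") != received


packet = attach_crc(bytes(range(10)))                 # one 80-bit G.729A frame
corrupted = bytes([packet[0] ^ 0x01]) + packet[1:]    # flip a single bit
```

Here `has_line_error(packet)` is false and `has_line_error(corrupted)` is true, since a CRC detects every single-bit error.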

LSP修正部511は誤りのないLSP符号Lsp1(n)が入力すると、第1実施例のLSP逆量子化部102aと同一の処理を行なってLSP逆量子化値lsp1を出力する。一方、回線誤りやフレーム消失により現フレームの正しいLsp符号を受信できない場合に最後に受信した良好な過去4フレームのLsp符号を用いてLSP逆量子化値lsp1を出力する。
ピッチラグ修正部512は、回線誤りやフレーム消失しなければ、受信した現フレームのピッチラグ符号の逆量子化値lag1を出力する。また、回線誤りやフレーム消失があれば、最後に受信した良好なフレームのピッチラグ符号の逆量子化値を出力する。一般的に、有声部ではピッチラグが滑らかに変化することが知られている。したがって、有声部では上記のように前フレームのピッチラグで代用させても音質上の劣化はほとんどない。また、無声部では、ピッチラグは大きく変化することが知られているが、無声部における適応符号帳の寄与率は小さい(ピッチゲインが小さい)ため、前述の方法による音質劣化はほとんどない。 When an error-free LSP code Lsp1 (n) is input, the LSP correction unit 511 performs the same processing as the LSP inverse quantization unit 102a of the first embodiment and outputs an LSP inverse quantization value lsp1. On the other hand, when the correct Lsp code of the current frame cannot be received due to a line error or frame loss, the LSP inverse quantization value lsp1 is output using the Lsp code of the last four good frames received.
If there is no line error or frame loss, the pitch lag correction unit 512 outputs the inverse quantization value lag1 of the pitch lag code of the received current frame. If there is a line error or frame loss, it outputs the inverse quantization value of the pitch lag code of the last good frame received. It is generally known that the pitch lag changes smoothly in voiced sections, so substituting the pitch lag of the previous frame in this way causes almost no degradation in sound quality there. In unvoiced sections the pitch lag is known to vary widely, but the contribution of the adaptive codebook is small (the pitch gain is small), so this method again causes almost no degradation in sound quality.
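The "repeat the last good value" rule can be sketched as a small holder object. The class name and the initial value used before any good frame has been received are assumptions.

```python
class LastGoodHold:
    """Pass a parameter through when the frame is good; otherwise repeat the
    value from the last good frame."""

    def __init__(self, initial):
        self._last_good = initial   # assumed start-up value, not from the patent

    def correct(self, value, frame_ok):
        if frame_ok:
            self._last_good = value
        return self._last_good


lag_hold = LastGoodHold(initial=40.0)
# Frames 2 and 3 are flagged as erroneous, so their received lags are ignored.
lags = [lag_hold.correct(v, ok)
        for v, ok in [(52.0, True), (97.5, False), (13.0, False), (54.0, True)]]
```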

ゲイン符号修正部513は、回線誤りやフレーム消失がない場合、第１実施例と同様に、受信した現フレームのゲイン符号Gain1(n,j)からピッチゲインgp1(j)と代数符号帳ゲインgc1(j)を求める。一方、回線誤りやフレーム消失がある場合には、現フレームのゲイン符号を用いることができないので、次式
gp1(n,0) =α・gp1(n-1,1)
gp1(n,1) =α・gp1(n-1,0)
gc1(n,0) =β・gc1(n-1,1)
gc1(n,1) =β・gc1(n-1,0)
により記憶してある1サブフレーム前のゲインを減衰してピッチゲインgp1(n,j)と代数符号帳ゲインgc1(n,j)を求めて出力する。ここでα、βは１以下の定数である。 When there is no line error or frame loss, the gain code correction unit 513 obtains the pitch gain gp1(j) and the algebraic codebook gain gc1(j) from the gain code Gain1(n, j) of the received current frame, as in the first embodiment. When there is a line error or frame loss, the gain code of the current frame cannot be used, so the stored gains of the preceding subframe are attenuated according to the following equations
gp1(n, 0) = α・gp1(n-1, 1)
gp1(n, 1) = α・gp1(n-1, 0)
gc1(n, 0) = β・gc1(n-1, 1)
gc1(n, 1) = β・gc1(n-1, 0)
to obtain and output the pitch gain gp1(n, j) and the algebraic codebook gain gc1(n, j). Here, α and β are constants no greater than 1.
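The gain concealment equations can be sketched as follows. The subframe indices follow the four equations exactly as printed, and the default values of α and β are illustrative assumptions; the patent states only that both are constants no greater than 1.

```python
def conceal_gains(prev_gp, prev_gc, alpha=0.9, beta=0.98):
    """Derive the gains of an erroneous frame n from the stored gains of frame
    n-1, each given as a pair indexed by subframe (0, 1).

    alpha and beta are placeholder values, not taken from the patent.
    """
    if not (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("alpha and beta must not exceed 1")
    gp = (alpha * prev_gp[1], alpha * prev_gp[0])   # gp1(n, 0), gp1(n, 1)
    gc = (beta * prev_gc[1], beta * prev_gc[0])     # gc1(n, 0), gc1(n, 1)
    return gp, gc


gp_n, gc_n = conceal_gains(prev_gp=(0.8, 0.4), prev_gc=(2.0, 4.0),
                           alpha=0.5, beta=0.25)
```

Applying the function again to its own output models a run of consecutive lost frames, with the gains decaying further each time.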

代数符号修正部514は、回線誤りやフレーム消失がない場合、受信した現フレームの代数符号Cb1(n,j)の逆量子化値cbi(j)を出力する。また、回線誤りやフレーム消失があった場合には、記憶してある最後に受信した良好なフレームの代数符号の逆量子化値を出力する。 The algebraic code correction unit 514 outputs the dequantized value cbi (j) of the received algebraic code Cb1 (n, j) when there is no line error or frame loss. In addition, when there is a line error or frame loss, the dequantized value of the algebraic code of the last received good frame stored is output.

・付記
(付記１)第1音声符号化方式により符号化して得られる音声符号を第２音声符号化方式の音声符号に変換する音声符号変換方法において、
第1音声符号化方式による音声符号より、音声信号を再現するために必要な複数の符号成分を分離し、
各成分の符号をそれぞれ逆量子化して逆量子化値を出力し、
代数符号以外の符号成分の前記逆量子化値を量子化して第２音声符号化方式の音声符号の符号成分に変換し、
前記各逆量子化値から音声を再生し、
前記第２音声符号化方式の各符号成分を逆量子化して第２音声符号化方式の逆量子化値を求め、
前記再生音声と、前記第２音声符号化方式の各逆量子化値を用いてターゲット信号を生成し、
前記ターゲット信号を用いて第2音声符号化方式の代数符号を求め、
前記第２音声符号化方式の各符号成分を音声符号として出力する、
ことを特徴とする音声符号変換方法。
（付記２）伝送路誤り発生の有無を検出し、
伝送路誤りが発生していなければ前記分離された符号成分を使用し、伝送路誤りが発生していれば過去の正常な符号成分を用いて前記逆量子化値を出力する、ことを特徴とする付記1記載の音声符号変換方法。
(付記３）第1音声符号化方式に基いて音声信号をLSP符号、ピッチラグ符号、代数符号、ゲイン符号で符号化した第1音声符号を、第２音声符号化方式に基いた第２音声符号に変換する音声符号変換方法において、
第1音声符号のLSP符号、ピッチラグ符号、ゲイン符号を逆量子化し、これらの逆量子化値を第２音声符号化方式により量子化して第２音声符号のLSP符号、ピッチラグ符号、ゲイン符号を求め、
前記第２音声符号化方式のLSP符号、ピッチラグ符号、ゲイン符号の逆量子化値を用いてピッチ周期性合成信号を生成すると共に第1音声符号より音声信号を再生し、該再生された音声信号と前記ピッチ周期性合成信号の差信号をターゲット信号として発生し、
第２音声符号化方式における任意の代数符号と前記第2音声符号を構成するLSP符号の逆量子化値とを用いて代数合成信号を生成し、
前記ターゲット信号と該代数合成信号との差が最小となる第２音声符号化方式における代数符号を求め、
第２音声符号化方式における前記LSP符号、ピッチラグ符号、代数符号、ゲイン符号を出力する、
ことを特徴とする音声符号変換方法。
（付記４）第2音声符号化方式の前記ピッチラグ符号の逆量子化値に応じた適応符号帳出力信号に、第2音声符号化方式の前記ゲイン符号に応じたゲインを掛けて得られた信号を、第2音声符号化方式の前記LSP符号の逆量子化値に基いたLPC合成フィルタに入力し、その出力信号を前記ピッチ周期性合成信号とする、
ことを特徴とする付記３記載の音声符号変換方法。
（付記５）第2音声符号化方式の前記任意の代数符号に応じた代数符号帳出力信号を第2音声符号化方式の前記LSP符号の逆量子化値に基いたLPC合成フィルタに入力し、その出力信号を前記代数合成信号とする、
ことを特徴とする付記３記載の音声符号変換方法。
（付記６）前記第1音声符号化方式のゲイン符号はピッチゲインと代数符号帳ゲインを組にして符号化したものであり、該ゲイン符号を逆量子化して得られた逆量子化値のうちピッチゲイン逆量子化値を第２音声符号化方式により量子化して第２音声符号のピッチゲイン符号を求める、
ことを特徴とする付記３記載の音声符号変換方法。
（付記７）第2音声符号化方式の前記求めた代数符号に応じた代数符号帳出力信号を第2音声符号化方式の前記LSP符号の逆量子化値に基いたLPC合成フィルタに入力し、
その出力信号と前記ターゲット信号とから代数符号帳ゲインを求め、
該代数符号帳ゲインを量子化して第2音声符号化方式に基いた代数符号帳ゲインを求める、
ことを特徴とする付記６記載の音声符号変換方法。
（付記８）前記第1音声符号化方式のゲイン符号はピッチゲインと代数符号帳ゲインを組にして符号化したものであり、該ゲイン符号を逆量子化して得られたピッチゲイン逆量子化値及び代数符号帳ゲイン逆量子化値をそれぞれ第２音声符号化方式により量子化して第２音声符号のピッチゲイン符号及び代数符号帳ゲイン符号を求める、
ことを特徴とする付記３記載の音声符号変換方法。
(付記９）第1音声符号化方式に基いて音声信号をLSP符号、ピッチラグ符号、代数符号、ピッチゲイン符号、代数符号帳ゲイン符号で符号化した第1音声符号を、第２音声符号化方式に基いた第２音声符号に変換する音声符号変換方法において、
第1音声符号を構成する各符号を逆量子化し、逆量子化値のうちLSP符号、ピッチラグ符号の逆量子化値を第２音声符号化方式により量子化して第２音声符号のLSP符号、ピッチラグ符号を求め、
第1音声符号のピッチゲイン符号の逆量子化値を用いて補間処理により第2音声符号のピッチゲイン符号の逆量子化値を求め、
前記第2音声符号のLSP符号、ピッチラグ符号、ピッチゲインの逆量子化値を用いてピッチ周期性合成信号を生成すると共に、第1音声符号より音声信号を再生し、該再生された音声信号と前記ピッチ周期性合成信号の差信号をターゲット信号として発生し、
第２音声符号化方式における任意の代数符号と前記第2音声符号のLSP符号の逆量子化値を用いて代数合成信号を生成し、
前記ターゲット信号と該代数合成信号との差が最小となる第２音声符号化方式における代数符号を求め、
第2音声符号の前記LSP符号、ピッチラグ符号の逆量子化値、前記求めた代数符号及び前記ターゲット信号を用いて第２音声符号化方式によりピッチゲインと代数符号帳ゲインを組み合せた第２音声符号のゲイン符号を求め、
前記求めた第２音声符号化方式におけるLSP符号、ピッチラグ符号、代数符号、ゲイン符号を出力する、
ことを特徴とする音声符号変換方法。
(付記１０) 第１音声符号化方式により符号化して得られる音声符号を第２音声符号化方式の音声符号に変換する音声符号変換装置において、
第１音声符号化方式による音声符号より音声信号を再現するために必要な複数の符号成分を分離する符号分離手段、
各成分の符号をそれぞれ逆量子化して逆量子化値を出力する逆量子化部、
前記各逆量子化部から出力する代数符号以外の符号成分の逆量子化値を量子化して第２音声符号化方式の音声符号の符号成分に変換する量子化部、
前記各逆量子化値から音声を再生する音声再生部、
前記第２音声符号化方式の各符号成分を逆量子化して第２音声符号化方式の逆量子化値を求める逆量子化手段、
前記音声再生部から出力される再生音声と、前記第２音声符号化方式の逆量子化手段から出力される各逆量子化値とを用いてターゲット信号を生成するターゲット生成手段、
前記ターゲット信号を用いて第2音声符号化方式の代数符号を求める代数符号取得部、
前記第２音声符号化方式の各符号成分を音声符号として出力する符号多重手段、
を備えたことを特徴とする音声符号変換装置。
(付記１１) 第1音声符号化方式に基いて音声信号をLSP符号、ピッチラグ符号、代数符号、ゲイン符号で符号化した第1音声符号を、第２音声符号化方式に基いた第２音声符号に変換する音声符号変換装置において、
第1音声符号のLSP符号、ピッチラグ符号、ゲイン符号を逆量子化し、これらの逆量子化値を第２音声符号化方式により量子化して第２音声符号のLSP符号、ピッチラグ符号、ゲイン符号に変換する変換部、
前記第1音声符号より音声信号を再生する音声再生部、
前記第２音声符号化方式のLSP符号、ピッチラグ符号、ゲイン符号の逆量子化値を用いてピッチ周期性合成信号を生成し、前記音声再生部で再生した音声信号と該ピッチ周期性合成信号の差信号をターゲット信号として発生するターゲット信号生成部、
第２音声符号化方式における任意の代数符号と、前記第2音声符号のLSP符号の逆量子化値とを用いて代数合成信号を生成し、前記ターゲット信号と該代数合成信号との差が最小となる第２音声符号化方式における代数符号を求める代数符号取得部、
求めた第２音声符号化方式におけるLSP符号、ピッチラグ符号、代数符号、ゲイン符号を多重して出力する符号多重部、
を備えたことを特徴とする音声符号変換装置。
（付記１２）前記ターゲット信号生成部は、
第2音声符号化方式の前記ピッチラグ符号の逆量子化値に応じた周期性音源信号を発生する適応符号帳、
適応符号帳出力信号に、第2音声符号化方式の前記ゲイン符号に応じたゲインを掛けるゲイン乗算部、
第2音声符号化方式の前記LSP符号の逆量子化値に基いて作成され、前記ゲイン乗算部の出力信号を入力されて前記ピッチ周期性合成信号を出力するLPC合成フィルタ、
前記音声再生部で再生した音声信号と該ピッチ周期性合成信号の差信号をターゲット信号として出力する手段、
を備えた付記1１記載の音声符号変換装置。
（付記１３）前記代数符号取得部は、
第2音声符号化方式の任意の代数符号に応じた雑音性音源信号を出力する代数符号帳、
第2音声符号化方式の前記LSP符号の逆量子化値に基いて作成され、代数符号帳出力信号が入力されて前記代数合成信号を出力するLPC合成フィルタ、
前記ターゲット信号と該代数合成信号との差が最小となる第２音声符号化方式における代数符号を求める手段、
を有することを特徴とする付記1１記載の音声符号変換装置。
（付記１４）前記第1音声符号化方式のゲイン符号がピッチゲインと代数符号帳ゲインを組にして符号化したものであれば、前記変換部は、
該ゲイン符号を逆量子化してピッチゲイン逆量子化値及び代数符号帳ゲイン逆量子化値を発生する逆量子化部、
逆量子化値のうちピッチゲイン逆量子化値を第２音声符号化方式により量子化して第２音声符号のピッチゲイン符号に変換する手段、
を有することを特徴とする請求項1１記載の音声符号変換装置。
（付記１５）前記音声符号変換装置は更に、
第2音声符号化方式の前記LSP符号の逆量子化値に基いて作成されたLPC合成フィルタ、
前記求めた代数符号に応じた代数符号帳出力信号を前記LPC合成フィルタに入力したときの出力信号と前記ターゲット信号とから代数符号帳ゲインを決定する代数符号帳ゲイン決定部、該代数符号帳ゲインを量子化して第2音声符号化方式に基いた代数符号帳ゲイン符号を発生する代数符号帳ゲイン符号発生部、
を有することを特徴とする付記１４記載の音声符号変換装置。
（付記１６）前記第1音声符号化方式のゲイン符号がピッチゲインと代数符号帳ゲインを組にして符号化したものであれば、前記変換部は、
該ゲイン符号を逆量子化してピッチゲイン逆量子化値及び代数符号帳ゲイン逆量子化値を発生する逆量子化部、
逆量子化により得られたピッチゲイン逆量子化値及び代数符号帳ゲイン逆量子化値をそれぞれ第２音声符号化方式により量子化して第２音声符号のピッチゲイン符号及び代数符号帳ゲインに変換する手段、
を有することを特徴とする請求項１１記載の音声符号変換装置。
(付記１７）前記音声再生部は、前記変換部で逆量子化された第1音声符号のLSP符号、ピッチラグ符号、ゲイン符号の逆量子化値を用いて音声信号を再生することを特徴とする請求項１１記載の音声符号変換装置。
(付記１８）第1音声符号化方式に基いて音声信号をLSP符号、ピッチラグ符号、代数符号、ピッチゲイン符号、代数符号帳ゲイン符号で符号化した第1音声符号を、第２音声符号化方式に基いた第２音声符号に変換する音声符号変換装置において、
第1音声符号を構成する各符号を逆量子化し、逆量子化値のうちLSP符号、ピッチラグ符号の逆量子化値を第２音声符号化方式により量子化して第２音声符号のLSP符号、ピッチラグ符号に変換する変換部、
第1音声符号のピッチゲイン符号の逆量子化値を用いて補間処理により第2音声符号のピッチゲイン符号の逆量子化値を発生するピッチゲイン補間部、
第1音声符号より音声信号を再生する音声信号再生部、
前記第2音声符号のLSP符号、ピッチラグ符号、ピッチゲインの逆量子化値を用いてピッチ周期性合成信号を生成し、前記音声信号再生部から出力する再生音声信号と前記ピッチ周期性合成信号の差信号をターゲット信号として発生するターゲット信号発生部、
第２音声符号化方式における任意の代数符号と前記第2音声符号のLSP符号の逆量子化値を用いて代数合成信号を生成し、前記ターゲット信号と該代数合成信号との差が最小となる第２音声符号化方式における代数符号を求める代数符号取得部、
第2音声符号の前記LSP符号の逆量子化値、第2音声符号のピッチラグ符号及び代数符号、前記ターゲット信号を用いて第２音声符号化方式により、ピッチゲインと代数符号帳ゲインを組み合せた第２音声符号のゲイン符号を取得するゲイン符号取得部、
前記求めた第２音声符号化方式におけるLSP符号、ピッチラグ符号、代数符号、ゲイン符号を多重して出力する符号多重部、
を備えたことを特徴とする音声符号変換装置。
（付記１９）前記ターゲット信号生成部は、
第2音声符号化方式の前記ピッチラグ符号の逆量子化値に応じた周期性音源信号を発生する適応符号帳、
適応符号帳出力信号に、第2音声符号化方式の前記ピッチゲイン符号に応じたゲインを掛けるゲイン乗算部、
第2音声符号化方式の前記LSP符号の逆量子化値に基いて作成され、前記ゲイン乗算部の出力信号を入力されて前記ピッチ周期性合成信号を出力するLPC合成フィルタ、
前記音声再生部で再生した音声信号と該ピッチ周期性合成信号の差信号をターゲット信号として出力する手段、
を備えた付記18記載の音声符号変換装置。
（付記２０）前記代数符号取得部は、
第2音声符号化方式の任意の代数符号に応じた雑音性音源信号を出力する代数符号帳、
第2音声符号化方式の前記LSP符号の逆量子化値に基いて作成され、代数符号帳出力信号が入力されて前記代数合成信号を出力するLPC合成フィルタ、
前記ターゲット信号と該代数合成信号との差が最小となる第２音声符号化方式における代数符号を取得する手段、
を有することを特徴とする付記1８記載の音声符号変換装置。
・Additional notes
(Supplementary note 1) A speech code conversion method for converting a speech code obtained by encoding with a first speech coding scheme into a speech code of a second speech coding scheme, the method comprising:
separating, from the speech code of the first speech coding scheme, a plurality of code components necessary to reproduce a speech signal;
dequantizing the code of each component and outputting dequantized values;
quantizing the dequantized values of the code components other than the algebraic code to convert them into code components of the speech code of the second speech coding scheme;
reproducing speech from the dequantized values;
dequantizing each code component of the second speech coding scheme to obtain dequantized values of the second speech coding scheme;
generating a target signal using the reproduced speech and the dequantized values of the second speech coding scheme;
obtaining an algebraic code of the second speech coding scheme using the target signal; and
outputting the code components of the second speech coding scheme as a speech code.
(Supplementary note 2) The speech code conversion method according to supplementary note 1, wherein the presence or absence of a transmission path error is detected, and
the dequantized values are output using the separated code components if no transmission path error has occurred, or using past normal code components if a transmission path error has occurred.
(Supplementary note 3) A speech code conversion method for converting a first speech code, obtained by encoding a speech signal with an LSP code, a pitch lag code, an algebraic code, and a gain code based on a first speech coding scheme, into a second speech code based on a second speech coding scheme, the method comprising:
dequantizing the LSP code, pitch lag code, and gain code of the first speech code, and quantizing the dequantized values by the second speech coding scheme to obtain the LSP code, pitch lag code, and gain code of the second speech code;
generating a pitch-periodic synthesis signal using the dequantized values of the LSP code, pitch lag code, and gain code of the second speech coding scheme, reproducing a speech signal from the first speech code, and generating the difference between the reproduced speech signal and the pitch-periodic synthesis signal as a target signal;
generating an algebraic synthesis signal using an arbitrary algebraic code of the second speech coding scheme and the dequantized value of the LSP code constituting the second speech code;
obtaining the algebraic code of the second speech coding scheme that minimizes the difference between the target signal and the algebraic synthesis signal; and
outputting the LSP code, pitch lag code, algebraic code, and gain code of the second speech coding scheme.
(Supplementary note 4) The speech code conversion method according to supplementary note 3, wherein an adaptive codebook output signal corresponding to the dequantized value of the pitch lag code of the second speech coding scheme is multiplied by a gain corresponding to the gain code of the second speech coding scheme, the resulting signal is input to an LPC synthesis filter based on the dequantized value of the LSP code of the second speech coding scheme, and the filter output is used as the pitch-periodic synthesis signal.
(Supplementary note 5) The speech code conversion method according to supplementary note 3, wherein an algebraic codebook output signal corresponding to the arbitrary algebraic code of the second speech coding scheme is input to an LPC synthesis filter based on the dequantized value of the LSP code of the second speech coding scheme, and the filter output is used as the algebraic synthesis signal.
(Supplementary note 6) The speech code conversion method according to supplementary note 3, wherein the gain code of the first speech coding scheme encodes a pitch gain and an algebraic codebook gain as a pair, and, of the dequantized values obtained by dequantizing the gain code, the pitch gain dequantized value is quantized by the second speech coding scheme to obtain the pitch gain code of the second speech code.
(Supplementary note 7) The speech code conversion method according to supplementary note 6, wherein an algebraic codebook output signal corresponding to the obtained algebraic code of the second speech coding scheme is input to an LPC synthesis filter based on the dequantized value of the LSP code of the second speech coding scheme,
an algebraic codebook gain is obtained from the filter output and the target signal, and
the algebraic codebook gain is quantized to obtain an algebraic codebook gain based on the second speech coding scheme.
(Supplementary note 8) The speech code conversion method according to supplementary note 3, wherein the gain code of the first speech coding scheme encodes a pitch gain and an algebraic codebook gain as a pair, and the pitch gain dequantized value and the algebraic codebook gain dequantized value obtained by dequantizing the gain code are each quantized by the second speech coding scheme to obtain the pitch gain code and the algebraic codebook gain code of the second speech code.
(Supplementary note 9) The first speech code obtained by encoding the speech signal with the LSP code, the pitch lag code, the algebraic code, the pitch gain code, and the algebraic codebook gain code based on the first speech coding method is used as the second speech coding method. In the speech code conversion method for converting to the second speech code based on
Each code constituting the first speech code is inversely quantized, and among the inversely quantized values, the LSP code and the inverse quantized value of the pitch lag code are quantized by the second speech coding method, and the LSP code and pitch lag of the second speech code Find the sign
Obtain the inverse quantization value of the pitch gain code of the second speech code by interpolation using the inverse quantization value of the pitch gain code of the first speech code,
A pitch periodic synthesized signal is generated using the inverse quantized values of the LSP code, the pitch lag code, and the pitch gain of the second speech code, the speech signal is reproduced from the first speech code, and the difference signal between the reproduced speech signal and the pitch periodic synthesized signal is generated as a target signal;
An algebraic synthesized signal is generated using an arbitrary algebraic code in the second speech coding system and an inverse quantization value of the LSP code of the second speech code;
Obtaining an algebraic code in the second speech coding scheme that minimizes the difference between the target signal and the algebraic synthesized signal;
A gain code of the second speech code, combining the pitch gain and the algebraic codebook gain, is obtained by the second speech coding method using the inverse quantized value of the LSP code of the second speech code, the inverse quantized value of the pitch lag code, the obtained algebraic code, and the target signal,
Outputting an LSP code, a pitch lag code, an algebraic code, and a gain code in the obtained second speech encoding method;
A speech code conversion method characterized by the above.
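The pitch gain interpolation in supplementary note 9 is needed because the two coding methods use different subframe lengths. The sketch below is one plausible rule, not the formula of this patent: it assumes conversion from G.729A (40-sample subframes) to EVRC (subframes of 53, 53, and 54 samples) and weights each source gain by how many samples it shares with the destination subframe. The function name and the weighting rule are assumptions.

```python
def interpolate_pitch_gain(src_gains, src_len, dst_lens):
    """Map per-subframe gains onto destination subframes.

    Each destination gain is the average of the source gains, weighted by the
    number of samples each source subframe overlaps with the destination
    subframe (an assumed interpolation rule).
    """
    total = src_len * len(src_gains)
    if sum(dst_lens) != total:
        raise ValueError("source and destination must span the same samples")
    out = []
    start = 0
    for length in dst_lens:
        end = start + length
        acc = 0.0
        for i, gain in enumerate(src_gains):
            s0, s1 = i * src_len, (i + 1) * src_len
            overlap = max(0, min(end, s1) - max(start, s0))
            acc += gain * overlap
        out.append(acc / length)
        start = end
    return out


if __name__ == "__main__":
    # Four 40-sample subframes (two G.729A frames) -> one EVRC frame.
    print(interpolate_pitch_gain([0.2, 0.8, 0.6, 0.4], 40, [53, 53, 54]))
```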
(Supplementary Note 10) In a speech code conversion device that converts a speech code obtained by encoding using the first speech encoding method into a speech code of the second speech encoding method,
Code separating means for separating a plurality of code components necessary for reproducing a sound signal from a sound code by the first sound coding method;
An inverse quantization unit that inversely quantizes the sign of each component and outputs an inverse quantized value;
A quantization unit that quantizes a dequantized value of a code component other than the algebraic code output from each dequantization unit and converts the quantized value into a code component of a speech code of a second speech coding scheme;
An audio reproduction unit for reproducing audio from each dequantized value;
Inverse quantization means for inversely quantizing each code component of the second speech encoding method to obtain an inverse quantization value of the second speech encoding method;
Target generation means for generating a target signal using the reproduced sound output from the sound reproduction unit and each dequantized value output from the dequantization means of the second speech encoding method;
An algebraic code obtaining unit for obtaining an algebraic code of the second speech coding method using the target signal;
Code multiplexing means for outputting each code component of the second speech coding method as a speech code;
A speech code conversion device comprising:
(Supplementary Note 11) In a speech code conversion device for converting a first speech code, obtained by encoding a speech signal into an LSP code, a pitch lag code, an algebraic code, and a gain code based on a first speech coding method, into a second speech code based on a second speech coding method,
A conversion unit that dequantizes the LSP code, pitch lag code, and gain code of the first speech code, quantizes these dequantized values by the second speech coding method, and converts them into the LSP code, pitch lag code, and gain code of the second speech code;
An audio reproduction unit for reproducing an audio signal from the first audio code;
A target signal generator that generates a pitch periodic synthesized signal using the dequantized values of the LSP code, pitch lag code, and gain code of the second speech coding method, and generates, as a target signal, the difference signal between the speech signal reproduced by the speech reproduction unit and the pitch periodic synthesized signal;
An algebraic code acquisition unit that generates an algebraic synthesized signal using an arbitrary algebraic code in the second speech coding method and the inverse quantized value of the LSP code of the second speech code, and obtains the algebraic code in the second speech coding method for which the difference between the target signal and the algebraic synthesized signal is minimized;
A code multiplexing unit that multiplexes and outputs the LSP code, pitch lag code, algebraic code, and gain code in the obtained second speech coding method;
A speech code conversion device comprising:
(Supplementary Note 12) The target signal generator
An adaptive codebook for generating a periodic excitation signal according to an inverse quantization value of the pitch lag code of the second speech coding system;
A gain multiplier that multiplies the adaptive codebook output signal by a gain according to the gain code of the second speech encoding method;
An LPC synthesis filter that is created based on the dequantized value of the LSP code of the second speech coding system, and that receives the output signal of the gain multiplier and outputs the pitch periodic synthesis signal,
Means for outputting a difference signal between the audio signal reproduced by the audio reproduction unit and the pitch periodic synthetic signal as a target signal;
The speech code conversion device according to supplementary note 11, comprising:
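The target signal generator of supplementary note 12 (adaptive codebook, gain multiplier, LPC synthesis filter, subtractor) can be sketched as follows. This is a simplified illustration under stated assumptions: integer pitch lag only, zero initial filter memory, no perceptual weighting, and LPC coefficients given directly rather than derived from LSPs. All function names are hypothetical.

```python
def lpc_synthesis(excitation, lpc):
    """All-pole filter 1/A(z) with A(z) = 1 + sum_k lpc[k-1] * z**-k, zero initial state."""
    past = [0.0] * len(lpc)  # past[0] holds y[n-1]
    out = []
    for x in excitation:
        y = x - sum(a * p for a, p in zip(lpc, past))
        out.append(y)
        past = ([y] + past[:-1]) if lpc else []
    return out


def adaptive_codebook(past_excitation, lag, length):
    """Periodic excitation: repeat the past excitation with period `lag` (integer lag assumed)."""
    if lag <= 0 or lag > len(past_excitation):
        raise ValueError("lag must be within the stored excitation history")
    buf = list(past_excitation)
    for _ in range(length):
        buf.append(buf[-lag])
    return buf[len(past_excitation):]


def target_signal(reproduced, past_excitation, lag, pitch_gain, lpc):
    """Reproduced speech minus the pitch periodic synthesized signal."""
    periodic = adaptive_codebook(past_excitation, lag, len(reproduced))
    scaled = [pitch_gain * p for p in periodic]   # gain multiplier
    pitch_synth = lpc_synthesis(scaled, lpc)      # LPC synthesis filter
    return [s - p for s, p in zip(reproduced, pitch_synth)]


if __name__ == "__main__":
    print(target_signal([1.0, 1.0, 1.0, 1.0], [1.0, 0.0], 2, 0.5, [-0.5]))
```

The remaining signal is what the algebraic codebook must model, which is why it serves as the target of the algebraic code search.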
(Supplementary Note 13) The algebraic code acquisition unit includes:
An algebraic codebook that outputs a noisy sound source signal according to an arbitrary algebraic code of the second speech coding scheme;
An LPC synthesis filter that is created based on the dequantized value of the LSP code of the second speech coding system, and that receives the algebraic codebook output signal and outputs the algebraic synthesized signal,
Means for obtaining an algebraic code in the second speech coding system in which a difference between the target signal and the algebraic synthesized signal is minimized;
The speech code conversion device according to supplementary note 11, characterized by comprising:
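The search described in supplementary note 13 can be illustrated with a toy exhaustive search. The codebook here, a single signed unit pulse at any position, is a hypothetical stand-in for the real algebraic codebook of the second coding method, and scoring each candidate with its own non-negative least-squares gain is an assumption, since the note only requires that the difference from the target be minimized.

```python
def lpc_synthesis(excitation, lpc):
    """All-pole filter 1/A(z) with A(z) = 1 + sum_k lpc[k-1] * z**-k, zero initial state."""
    past = [0.0] * len(lpc)
    out = []
    for x in excitation:
        y = x - sum(a * p for a, p in zip(lpc, past))
        out.append(y)
        past = ([y] + past[:-1]) if lpc else []
    return out


def search_algebraic_code(target, lpc):
    """Return (position, sign, gain) of the single-pulse code closest to `target`."""
    n = len(target)
    best = None
    best_err = float("inf")
    for pos in range(n):
        for sign in (1.0, -1.0):
            code = [0.0] * n
            code[pos] = sign                      # toy algebraic codebook entry
            synth = lpc_synthesis(code, lpc)      # algebraic synthesized signal
            energy = sum(y * y for y in synth)    # >= 1 because synth[pos] = sign
            gain = sum(x * y for x, y in zip(target, synth)) / energy
            gain = max(gain, 0.0)                 # keep the sign in the code, not the gain
            err = sum((x - gain * y) ** 2 for x, y in zip(target, synth))
            if err < best_err:
                best, best_err = (pos, sign, gain), err
    return best


if __name__ == "__main__":
    print(search_algebraic_code([0.0, 0.0, -3.0, -1.5, -0.75], [-0.5]))
```

Practical codecs avoid this brute-force loop with correlation-based fast searches; the exhaustive form is used here only because it mirrors the wording of the note most directly.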
(Supplementary note 14) If the gain code of the first speech encoding method is encoded by combining a pitch gain and an algebraic codebook gain,
An inverse quantization unit for inversely quantizing the gain code to generate a pitch gain inverse quantization value and an algebraic codebook gain inverse quantization value;
Means for quantizing a pitch gain dequantized value among the dequantized values by a second speech coding method and converting the quantized value into a pitch gain code of a second speech code;
The speech code conversion device according to supplementary note 11, comprising:
(Supplementary note 15) The speech code converter further includes:
An LPC synthesis filter created based on the inverse quantization value of the LSP code of the second speech encoding method,
An algebraic codebook gain determining unit that determines an algebraic codebook gain from the target signal and the output signal obtained when the algebraic codebook output signal corresponding to the obtained algebraic code is input to the LPC synthesis filter;
An algebraic codebook gain code generator that quantizes the algebraic codebook gain to generate an algebraic codebook gain code based on the second speech coding method;
The speech code conversion device according to supplementary note 14, characterized by comprising:
(Supplementary note 16) If the gain code of the first speech encoding method is encoded by combining a pitch gain and an algebraic codebook gain,
An inverse quantization unit for inversely quantizing the gain code to generate a pitch gain inverse quantization value and an algebraic codebook gain inverse quantization value;
Means for quantizing the pitch gain inverse quantized value and the algebraic codebook gain inverse quantized value obtained by the inverse quantization by the second speech coding method, and converting them into the pitch gain code and the algebraic codebook gain code of the second speech code, respectively;
The speech code conversion device according to supplementary note 11, comprising:
(Supplementary Note 17) The speech code conversion device according to supplementary note 11, wherein the speech reproduction unit reproduces the speech signal using the dequantized values of the LSP code, the pitch lag code, and the gain code of the first speech code dequantized by the conversion unit.
(Supplementary Note 18) In a speech code conversion device for converting a first speech code, obtained by encoding a speech signal into an LSP code, a pitch lag code, an algebraic code, a pitch gain code, and an algebraic codebook gain code based on a first speech coding method, into a second speech code based on a second speech coding method,
A conversion unit that inversely quantizes each code constituting the first speech code, quantizes, among the inverse quantized values, those of the LSP code and the pitch lag code by the second speech coding method, and converts them into the LSP code and the pitch lag code of the second speech code;
A pitch gain interpolation unit that generates an inverse quantized value of the pitch gain code of the second speech code by an interpolation process using an inverse quantized value of the pitch gain code of the first speech code;
An audio signal reproduction unit for reproducing an audio signal from the first audio code;
A target signal generator that generates a pitch periodic synthesized signal using the inverse quantized values of the LSP code, the pitch lag code, and the pitch gain of the second speech code, and generates, as a target signal, the difference signal between the reproduced speech signal output from the speech signal reproduction unit and the pitch periodic synthesized signal;
An algebraic code acquisition unit that generates an algebraic synthesized signal using an arbitrary algebraic code in the second speech coding method and the inverse quantized value of the LSP code of the second speech code, and obtains the algebraic code in the second speech coding method for which the difference between the target signal and the algebraic synthesized signal is minimized;
A gain code acquisition unit that obtains a gain code of the second speech code, combining the pitch gain and the algebraic codebook gain, by the second speech coding method using the inverse quantized value of the LSP code of the second speech code, the pitch lag code and the algebraic code of the second speech code, and the target signal;
A code multiplexing unit that multiplexes and outputs the LSP code, pitch lag code, algebraic code, and gain code in the obtained second speech encoding method;
A speech code conversion device comprising:
(Supplementary note 19) The target signal generator
An adaptive codebook for generating a periodic excitation signal according to an inverse quantization value of the pitch lag code of the second speech coding system;
A gain multiplier that multiplies the adaptive codebook output signal by a gain corresponding to the pitch gain code of the second speech encoding method;
An LPC synthesis filter that is created based on the dequantized value of the LSP code of the second speech coding system, and that receives the output signal of the gain multiplier and outputs the pitch periodic synthesis signal,
Means for outputting a difference signal between the audio signal reproduced by the audio reproduction unit and the pitch periodic synthetic signal as a target signal;
The speech code conversion device according to supplementary note 18, comprising:
(Supplementary note 20) The algebraic code acquisition unit
An algebraic codebook that outputs a noisy sound source signal according to an arbitrary algebraic code of the second speech coding scheme;
An LPC synthesis filter that is created based on the dequantized value of the LSP code of the second speech coding system, and that receives the algebraic codebook output signal and outputs the algebraic synthesized signal,
Means for obtaining an algebraic code in a second speech coding scheme in which a difference between the target signal and the algebraic synthesized signal is minimized;
The speech code conversion device according to supplementary note 18, characterized by comprising:

As described above, according to the present invention, the LSP code, pitch lag code, and pitch gain code (or the LSP code, pitch lag code, pitch gain code, and algebraic codebook gain code) are converted in the quantization parameter domain. Parameter conversion is therefore possible with smaller analysis error and less degradation in sound quality than when the reproduced speech is subjected to LPC analysis and pitch analysis again.
Further, according to the present invention, since the reproduced speech is not subjected to LPC analysis or pitch analysis again, the problem of delay due to code conversion, which was a problem in prior art 1, can be solved.

According to the present invention, for the algebraic code and the algebraic codebook gain code, a target signal is created from the reproduced speech and the conversion is performed so that the error between the target signal and the algebraic synthesized signal is minimized. Code conversion with little degradation in sound quality is therefore possible even when the algebraic codebook configurations of coding method 1 and coding method 2 differ greatly, which was a problem in prior art 2.
Further, according to the present invention, speech code conversion can be performed between the G.729A coding method and the EVRC coding method.
Furthermore, according to the present invention, code conversion corresponding to the EVRC rate can be performed.

Principle explanatory diagram of the speech code conversion device of the present invention.
Configuration diagram of the speech code conversion device of the first embodiment of the present invention.
Frame configuration diagram of G.729A and EVRC.
Explanatory diagram of pitch gain code conversion.
Explanatory diagram of the number of samples per subframe in G.729A and EVRC.
Configuration diagram of the target generation unit.
Configuration diagram of the algebraic code conversion unit.
Configuration diagram of the algebraic codebook gain conversion unit.
Configuration diagram of the speech code conversion device of the second embodiment of the present invention.
Explanatory diagram of algebraic codebook gain code conversion.
Overall configuration diagram of the speech code conversion device of the third embodiment.
Configuration diagram of the full-rate speech code conversion unit.
Configuration diagram of the 1/8-rate speech code conversion unit.
Configuration diagram of the speech code conversion device of the fourth embodiment.
Configuration diagram of an encoder of the ITU-T Recommendation G.729A system.
Explanatory diagram of the quantization method.
Configuration diagram of the adaptive codebook.
Explanatory diagram of the G.729A algebraic codebook.
Explanatory diagram of the sample points assigned to each pulse-system group.
Block diagram of a G.729A decoder.
Configuration diagram of the EVRC encoder.
Explanatory diagram of the relationship between the EVRC frame, the LPC analysis window, and the pitch analysis window.
Principle diagram of a conventional typical speech code conversion method.
Speech coding apparatus of prior art 2.
Detailed speech coding apparatus of prior art 2.

Explanation of symbols

201 Rate determination unit
202 Full-rate speech code conversion unit
203 Half-rate speech code conversion unit
204 1/8-rate speech code conversion unit
401 Code separation unit
402 LSP inverse quantization unit
403 LSP quantization unit
404 Gain inverse quantization unit
405 Sound source generation unit
406 Gain multiplication unit
407 LPC synthesis filter
408 Algebraic code conversion unit
409 Pitch lag buffer
410 Gain conversion unit
411 Code multiplexing unit

Claims

In a speech code conversion method for converting a first speech code, obtained by encoding a speech signal into an LSP code, a pitch lag code, an algebraic code, a pitch gain code, and an algebraic codebook gain code based on a first speech coding method, into a second speech code based on a second speech coding method,
Full-rate, half-rate, and 1/8-rate speech code conversion units are provided corresponding to the rate of the first speech code,
When the rate of the first speech code is 1/8 rate, the speech code conversion unit for 1/8 rate is
Separating the LSP code and the algebraic codebook gain code included in the speech code by the first speech coding method and dequantizing each to output an inverse quantization value,
Among the inverse quantization values, the inverse quantization value of the LSP code is quantized by the second speech encoding method to obtain the LSP code of the second speech code,
The random signal output from the sound source generator is multiplied by the inverse quantization value of the gain code, the multiplication result is input to an LPC synthesis filter composed of the inverse quantization value of the LSP code, and a target signal is generated,
An algebraic synthesized signal is generated using an arbitrary algebraic code in the second speech coding system and an inverse quantization value of the LSP code of the second speech code;
Obtaining an algebraic code in the second speech coding scheme that minimizes the difference between the target signal and the algebraic synthesized signal;
A gain code of the second speech coding method is obtained using the inverse quantization value of the LSP code of the second speech code, the inverse quantization value of the pitch lag code obtained by full-rate or half-rate speech code conversion, the obtained algebraic code, and the target signal,
Outputting the LSP code, the algebraic code, the gain code and the pitch lag code in the obtained second speech encoding method;
A speech code conversion method characterized by the above.
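The 1/8-rate target generation in this claim (random signal, gain multiplication, LPC synthesis filter) can be sketched as below. The uniform noise source, the fixed seed, the zero initial filter state, and passing LPC coefficients directly instead of deriving them from the dequantized LSPs are all assumptions made to keep the example short; the function names are hypothetical.

```python
import random


def lpc_synthesis(excitation, lpc):
    """All-pole filter 1/A(z) with A(z) = 1 + sum_k lpc[k-1] * z**-k, zero initial state."""
    past = [0.0] * len(lpc)
    out = []
    for x in excitation:
        y = x - sum(a * p for a, p in zip(lpc, past))
        out.append(y)
        past = ([y] + past[:-1]) if lpc else []
    return out


def eighth_rate_target(gain, lpc, length, seed=0):
    """Target signal for a 1/8-rate frame: scaled random excitation through 1/A(z)."""
    rng = random.Random(seed)                                    # sound source generator
    noise = [rng.uniform(-1.0, 1.0) for _ in range(length)]
    excitation = [gain * v for v in noise]                       # gain multiplier
    return lpc_synthesis(excitation, lpc)                        # LPC synthesis filter


if __name__ == "__main__":
    print(eighth_rate_target(0.1, [-0.9], 8))
```

Because a 1/8-rate frame carries no algebraic code or pitch lag code of its own, the target is built from noise shaped by the frame's spectral envelope and gain rather than from decoded speech.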

When the rate of the first speech code is full rate, the speech code conversion unit for full rate is
Each code constituting the first speech code is inversely quantized, and among the inverse quantized values, those of the LSP code and the pitch lag code are quantized by the second speech coding method to obtain the LSP code and the pitch lag code of the second speech code,
Obtain the inverse quantization value of the pitch gain code of the second speech code by interpolation using the inverse quantization value of the pitch gain code of the first speech code,
A pitch periodic synthesized signal is generated using the inverse quantized values of the LSP code, the pitch lag code, and the pitch gain of the second speech code, the speech signal is reproduced from the first speech code, and the difference signal between the reproduced speech signal and the pitch periodic synthesized signal is generated as a target signal;
An algebraic synthesized signal is generated using an arbitrary algebraic code in the second speech coding system and an inverse quantization value of the LSP code of the second speech code;
Obtaining an algebraic code in the second speech coding scheme that minimizes the difference between the target signal and the algebraic synthesized signal;
Using the LSP code of the second speech code, the inverse quantization value of the pitch lag code, the obtained algebraic code and the target signal, a gain code of the second speech coding method is obtained,
Outputting an LSP code, a pitch lag code, an algebraic code, and a gain code in the obtained second speech encoding method;
The speech code conversion method according to claim 1, wherein:

In a speech code conversion device for converting a first speech code, obtained by encoding a speech signal into an LSP code, a pitch lag code, an algebraic code, a pitch gain code, and an algebraic codebook gain code based on a first speech coding method, into a second speech code based on a second speech coding method,
Full-rate, half-rate, and 1/8-rate speech code conversion units are provided corresponding to the rate of the first speech code, and the 1/8-rate speech code conversion unit comprises:
An inverse quantization unit that separates an LSP code and an algebraic codebook gain code included in the speech code according to the first speech coding scheme and dequantizes each to output an inverse quantization value;
An LSP quantization unit that quantizes the inverse quantization value of the LSP code among the inverse quantization values by the second speech encoding method and outputs the LSP code of the second speech code;
A target generation unit that multiplies a random signal output from a sound source generation unit by the inverse quantization value of the gain code, and inputs the multiplication result to an LPC synthesis filter constructed from the inverse quantization value of the LSP code to generate a target signal;
An algebraic code acquisition unit that generates an algebraic synthesized signal using an arbitrary algebraic code in the second speech coding method and the inverse quantization value of the LSP code of the second speech code, and obtains the algebraic code in the second speech coding method for which the difference between the target signal and the algebraic synthesized signal is minimized;
A gain code acquisition unit that obtains a gain code of the second speech coding method using the inverse quantization value of the LSP code of the second speech code, the inverse quantization value of the pitch lag code obtained by full-rate or half-rate speech code conversion, the obtained algebraic code, and the target signal;
A code multiplexing unit that multiplexes and outputs the LSP code, the algebraic code, the gain code, and the pitch lag code in the obtained second speech encoding method;
A speech code conversion device comprising:
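The rate-dependent structure of this claim can be sketched as a small dispatcher. The class, the `Rate` enum, the dictionary-based frame format, and the initial pitch lag of 40 are hypothetical placeholders; the only behaviour taken from the claim is that each rate has its own conversion unit and that the 1/8-rate unit reuses the pitch lag obtained by the most recent full-rate or half-rate conversion.

```python
from enum import Enum


class Rate(Enum):
    FULL = 1
    HALF = 2
    EIGHTH = 3


class RateSwitchedConverter:
    """Routes each frame to the conversion unit matching its rate."""

    def __init__(self, full, half, eighth, initial_pitch_lag=40):
        self._converters = {Rate.FULL: full, Rate.HALF: half}
        self._eighth = eighth
        # Pitch lag buffer: last lag produced by a full-rate or half-rate conversion.
        self.pitch_lag_buffer = initial_pitch_lag

    def convert(self, rate, frame):
        if rate is Rate.EIGHTH:
            # 1/8-rate frames carry no pitch lag code, so the buffered lag is used.
            return self._eighth(frame, self.pitch_lag_buffer)
        result = self._converters[rate](frame)
        self.pitch_lag_buffer = result["pitch_lag"]
        return result


if __name__ == "__main__":
    # Stub conversion units standing in for the real ones.
    conv = RateSwitchedConverter(
        full=lambda f: {"kind": "full", "pitch_lag": f["lag"]},
        half=lambda f: {"kind": "half", "pitch_lag": f["lag"]},
        eighth=lambda f, lag: {"kind": "eighth", "pitch_lag": lag},
    )
    conv.convert(Rate.FULL, {"lag": 57})
    print(conv.convert(Rate.EIGHTH, {}))
```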

The full-rate speech code converter is
An inverse quantization unit that inversely quantizes each code constituting the first speech code, quantizes, among the inverse quantized values, those of the LSP code and the pitch lag code by the second speech coding method, and obtains the LSP code and the pitch lag code of the second speech code;
A pitch gain interpolation unit that obtains an inverse quantization value of the pitch gain code of the second speech code by an interpolation process using an inverse quantization value of the pitch gain code of the first speech code;
A target generator that generates a pitch periodic synthesized signal using the inverse quantized values of the LSP code, the pitch lag code, and the pitch gain of the second speech code, reproduces the speech signal from the first speech code, and generates, as a target signal, the difference signal between the reproduced speech signal and the pitch periodic synthesized signal;
An algebraic code acquisition unit that generates an algebraic synthesized signal using an arbitrary algebraic code in the second speech coding method and the inverse quantization value of the LSP code of the second speech code, and obtains the algebraic code in the second speech coding method for which the difference between the target signal and the algebraic synthesized signal is minimized;
A gain code obtaining unit for obtaining a gain code of a second speech coding method using the LSP code of the second speech code, the inverse quantization value of the pitch lag code, the obtained algebraic code and the target signal;
A code multiplexing unit that multiplexes and outputs the LSP code, pitch lag code, algebraic code, and gain code in the obtained second speech encoding method;
The speech code conversion device according to claim 3, further comprising: