JP4900402B2

JP4900402B2 - Speech code conversion method and apparatus

Info

Publication number: JP4900402B2
Application number: JP2009029787A
Authority: JP
Inventors: 義照土永; 恭士大田; 政直鈴木; 正清田中
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2009-02-12
Filing date: 2009-02-12
Publication date: 2012-03-21
Anticipated expiration: 2022-02-04
Also published as: JP2009104200A

Abstract

PROBLEM TO BE SOLVED: To enable both speech communication and data communication between a communication system that has only a speech line and a communication system that has a data line in addition to a speech line.

SOLUTION: A speech code converter 103 converts a first speech code, obtained by encoding input speech with a first speech encoding scheme, into a second speech code conforming to a second speech encoding scheme. When arbitrary data is embedded in the first speech code received from the transmitting end, the converter extracts the embedded data from the first speech code and transmits the second speech code and the extracted data separately to the destination.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a speech code conversion method and a speech code conversion apparatus, and in particular to a method and apparatus that convert a speech code, encoded by a speech encoder used in a network such as the Internet or in a car-phone or mobile-phone system, into a speech code of a different encoding scheme.

In recent years, traffic between different communication systems is expected to keep growing because of the diversification of mobile phone systems, the explosive increase in subscribers, and the spread of voice communication over the Internet (Voice over IP: VoIP). Voice communication systems such as mobile phones and VoIP use speech coding techniques that compress speech so that communication lines are used efficiently. Mobile phones use different speech coding techniques depending on the country or the system, and W-CDMA has adopted the AMR (Adaptive Multi-Rate) scheme as a worldwide common speech coding scheme. In VoIP, on the other hand, ITU-T Recommendation G.729A is widely used. The following describes the G.729A encoding and decoding schemes and then the differences between G.729A and AMR.

The G.729A encoding and decoding schemes are as follows.

- Configuration and operation of the encoder

FIG. 18 is a block diagram of an encoder conforming to ITU-T Recommendation G.729A. In FIG. 18, an input signal (speech signal) X with a predetermined number of samples N per frame is input to the LPC analysis unit 1 frame by frame. With a sampling rate of 8 kHz and a frame period of 10 msec, one frame contains 80 samples. The LPC analysis unit 1 regards the human vocal tract as an all-pole filter expressed by

$$H(z) = \frac{1}{1 + \sum_{i=1}^{P} \alpha_i z^{-i}} \tag{1}$$

and obtains the coefficients αi (i = 1, ..., P) of this filter, where P is the filter order. For telephone-band speech, P is generally 10 to 12. The LPC (linear prediction) analysis unit 1 performs LPC analysis on a total of 240 samples (the 80 samples of the input signal, 40 look-ahead samples, and 120 past samples) to obtain the LPC coefficients.
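The frame arithmetic and the all-pole model of equation (1) can be illustrated with a minimal Python sketch. The function name `all_pole_filter` and the constants are assumptions introduced here for illustration; the sketch only applies a given set of coefficients and does not perform the LPC analysis itself.

```python
SAMPLE_RATE_HZ = 8000
FRAME_MS = 10

# 8 kHz x 10 msec = 80 samples per frame, as stated in the text.
FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000

# 120 past samples + 80 current samples + 40 look-ahead samples.
ANALYSIS_WINDOW = 120 + FRAME_LEN + 40


def all_pole_filter(alpha, excitation):
    """Apply H(z) = 1 / (1 + sum_i alpha_i z^-i) to an excitation sequence.

    Equivalent difference equation:
        y[n] = x[n] - sum_{i=1..P} alpha[i-1] * y[n-i]
    Samples before n = 0 are treated as zero (illustrative assumption).
    """
    output = []
    for n, x in enumerate(excitation):
        acc = float(x)
        for i, a in enumerate(alpha, start=1):
            if n - i >= 0:
                acc -= a * output[n - i]
        output.append(acc)
    return output
```

With a single coefficient of -0.5, a unit impulse produces a geometrically decaying response, which is the expected behaviour of a first-order all-pole filter.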

The parameter conversion unit 2 converts the LPC coefficients into LSP (line spectrum pair) parameters. LSP parameters are frequency-domain parameters that can be converted to and from LPC coefficients; because their quantization characteristics are better than those of LPC coefficients, quantization is performed in the LSP domain. The LSP quantization unit 3 quantizes the converted LSP parameters to obtain an LSP code and a dequantized LSP value. The LSP interpolation unit 4 obtains an interpolated LSP value from the dequantized LSP value of the current frame and that of the previous frame. Specifically, one frame is divided into two 5-msec subframes, the first and the second, and the LPC analysis unit 1 determines LPC coefficients for the second subframe but not for the first. The LSP interpolation unit 4 therefore estimates the dequantized LSP value of the first subframe by interpolating between the dequantized LSP values obtained in the current frame and in the previous frame.
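The interpolation step can be sketched as follows. The text does not state the interpolation weights, so the equal weighting (0.5) used here is an assumption, as is the function name `interpolate_lsp`.

```python
def interpolate_lsp(prev_lsp, curr_lsp, weight=0.5):
    """Estimate the first-subframe LSP vector by linear interpolation.

    prev_lsp: dequantized LSP values of the previous frame (second subframe)
    curr_lsp: dequantized LSP values of the current frame (second subframe)
    weight:   weight given to the current frame (0.5 is an assumed default)
    """
    if len(prev_lsp) != len(curr_lsp):
        raise ValueError("LSP vectors must have the same order")
    return [(1.0 - weight) * p + weight * c
            for p, c in zip(prev_lsp, curr_lsp)]
```

The second subframe uses the current frame's dequantized LSP values directly, so only the first subframe needs this estimate.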

The parameter inverse conversion unit 5 converts the dequantized LSP value and the interpolated LSP value into LPC coefficients and sets them in the LPC synthesis filter 6. As the filter coefficients of the LPC synthesis filter 6, the LPC coefficients converted from the interpolated LSP value are used in the first subframe of the frame, and the LPC coefficients converted from the dequantized LSP value are used in the second subframe. In the following description, the leading character of subscripted symbols such as lspi and li(n) is the lowercase letter "l" (el), not the numeral 1.
The LSP parameters lspi (i = 1, ..., P) are quantized in the LSP quantization unit 3 by scalar quantization, vector quantization, or the like, and the quantization index (LSP code) is then transmitted to the decoder side.

Next, the excitation and gain search is performed. The excitation and the gains are processed subframe by subframe. The excitation signal is first divided into a pitch period component and a noise component. The pitch period component is quantized with the adaptive codebook 7, which stores the past excitation signal sequence, and the noise component is quantized with an algebraic codebook, a noise codebook, or the like. The following describes a speech coding scheme that uses two excitation codebooks: the adaptive codebook 7 and the algebraic codebook 8.

The adaptive codebook 7 outputs N-sample excitation signals (called periodic signals), each delayed by one further sample, in correspondence with indices 1 to L. N is the number of samples in one subframe (N = 40), and the codebook has a buffer that stores the pitch period component of the most recent (L + 39) samples. Index 1 identifies the periodic signal made up of the 1st to 40th samples, index 2 the periodic signal made up of the 2nd to 41st samples, and so on up to index L, which identifies the periodic signal made up of the L-th to (L + 39)-th samples. In the initial state, the adaptive codebook 7 is filled with zero-amplitude signals. In every subframe, the oldest signal is discarded by one subframe length and the excitation signal obtained in the current subframe is stored in the adaptive codebook 7.
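The buffer behaviour described above can be sketched in Python. The class name `AdaptiveCodebook` and the storage order (oldest sample first, newest last) are assumptions made for illustration; the text fixes only the buffer length, the index-to-window mapping, and the update rule.

```python
SUBFRAME_LEN = 40  # N in the text


class AdaptiveCodebook:
    """Sliding buffer of past excitation, indexed 1..max_index."""

    def __init__(self, max_index):
        self.max_index = max_index
        # (L + 39) samples, all zero in the initial state.
        self.buffer = [0.0] * (max_index + SUBFRAME_LEN - 1)

    def vector(self, index):
        """Return the 40-sample window for a 1-based index.

        Index k selects samples k .. k+39 (1-based) of the buffer.
        """
        if not 1 <= index <= self.max_index:
            raise ValueError("index out of range")
        start = index - 1
        return self.buffer[start:start + SUBFRAME_LEN]

    def update(self, excitation):
        """Drop the oldest subframe and append the newest excitation."""
        if len(excitation) != SUBFRAME_LEN:
            raise ValueError("excitation must be one subframe long")
        self.buffer = self.buffer[SUBFRAME_LEN:] + list(excitation)
```

Because the encoder and the decoder apply the same update with the same excitation, both buffers stay identical from subframe to subframe.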

The adaptive codebook search identifies the periodic component of the excitation signal using the adaptive codebook 7, which stores the past excitation signal. The read start point in the adaptive codebook 7 is shifted one sample at a time; at each start point, one subframe length (= 40 samples) of the past excitation signal is taken out and input to the LPC synthesis filter 6 to create the pitch synthesis signal β×A×PL. Here, PL is the past pitch periodic signal (adaptive code vector) corresponding to delay L taken from the adaptive codebook 7, A is the impulse response of the LPC synthesis filter 6, and β is the adaptive codebook gain.

The arithmetic unit 9 obtains the error power EL between the input speech X and β×A×PL by

$$E_L = \lvert X - \beta A P_L \rvert^2 \tag{2}$$

Let A×PL be the weighted synthesis output of the adaptive codebook output, Rpp the autocorrelation of A×PL, and Rxp the cross-correlation between A×PL and the input signal X. The adaptive code vector PL at the pitch lag Lopt that minimizes the error power of equation (2) is then given by

$$P_L = \arg\max_{L} \frac{R_{xp}^2}{R_{pp}} \tag{3}$$

That is, the optimum read start point is the one that maximizes the square of the cross-correlation Rxp between the pitch synthesis signal A×PL and the input signal X, normalized by the autocorrelation Rpp of the pitch synthesis signal. The error power evaluation unit 10 thus obtains the pitch lag Lopt that satisfies equation (3). The optimum pitch gain βopt is then given by

$$\beta_{opt} = \frac{R_{xp}}{R_{pp}} \tag{4}$$
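The search defined by equations (3) and (4) can be sketched as an exhaustive loop over candidate lags. The sketch assumes the synthesized candidates A×PL have already been computed and are passed in as a dictionary keyed by lag; the function names are hypothetical, and real codecs typically add constraints (lag range, gain bounds) that are not covered here.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def search_pitch_lag(target, synthesized):
    """Return (best_lag, best_gain) maximizing Rxp^2 / Rpp.

    target:      input signal X for the subframe
    synthesized: dict mapping lag L -> synthesized vector A x P_L
    """
    best_lag, best_score, best_gain = None, -1.0, 0.0
    for lag, ap in synthesized.items():
        rpp = dot(ap, ap)          # autocorrelation of A x P_L
        if rpp == 0:
            continue               # silent candidate, criterion undefined
        rxp = dot(target, ap)      # cross-correlation with X
        score = rxp * rxp / rpp    # criterion of equation (3)
        if score > best_score:
            best_lag, best_score, best_gain = lag, score, rxp / rpp
    return best_lag, best_gain     # gain follows equation (4)
```

In the toy case where one candidate is exactly half of the target, that candidate wins and the returned gain is 2.0, which reproduces the target without error.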

Next, the noise component contained in the excitation signal is quantized using the algebraic codebook 8. The algebraic codebook 8 is composed of a plurality of pulses with amplitude 1 or -1. As an example, Table 1 shows the pulse positions for a subframe length of 40 samples.

The algebraic codebook 8 divides the N (= 40) sample points that make up one subframe into pulse system groups 1 to 4 and, for every combination obtained by taking one sample point from each pulse system group, sequentially outputs as the noise component a pulse signal that has a pulse of +1 or -1 at each selected sample point. In this example, four pulses are basically placed per subframe.

FIG. 19 shows the sample points assigned to pulse system groups 1 to 4:
(1) pulse system group 1 is assigned the 8 sample points 0, 5, 10, 15, 20, 25, 30, 35;
(2) pulse system group 2 is assigned the 8 sample points 1, 6, 11, 16, 21, 26, 31, 36;
(3) pulse system group 3 is assigned the 8 sample points 2, 7, 12, 17, 22, 27, 32, 37;
(4) pulse system group 4 is assigned the 16 sample points 3, 4, 8, 9, 13, 14, 18, 19, 23, 24, 28, 29, 33, 34, 38, 39.
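The pulse grid listed above can be written down directly, which also makes it easy to count the bits needed per group (position bits plus one polarity bit) and the total number of code vectors. The names `TRACKS`, `bits_for_track`, and `build_pulse_vector` are hypothetical and serve only as an illustration.

```python
SUBFRAME_LEN = 40

# Sample points of pulse system groups 1 to 4, as listed for FIG. 19.
TRACKS = [
    list(range(0, 40, 5)),
    list(range(1, 40, 5)),
    list(range(2, 40, 5)),
    sorted(list(range(3, 40, 5)) + list(range(4, 40, 5))),
]


def bits_for_track(track):
    """Bits to select one position in the track plus one polarity bit."""
    position_bits = (len(track) - 1).bit_length()
    return position_bits + 1


def build_pulse_vector(positions, signs):
    """Build a 40-sample vector with one +1/-1 pulse per track."""
    vector = [0] * SUBFRAME_LEN
    for track, pos, sign in zip(TRACKS, positions, signs):
        if pos not in track:
            raise ValueError(f"position {pos} is not in its track")
        if sign not in (1, -1):
            raise ValueError("sign must be +1 or -1")
        vector[pos] = sign
    return vector


TOTAL_BITS = sum(bits_for_track(t) for t in TRACKS)
```

The per-group bit counts come out as 4, 4, 4, and 5, so one subframe of pulses is described by 17 bits, that is, 2^17 possible code vectors.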

Representing a sample point of pulse system groups 1 to 3 takes 3 bits and representing the pulse polarity takes 1 bit, for a total of 4 bits per group; representing a sample point of pulse system group 4 takes 4 bits and the polarity 1 bit, for a total of 5 bits. Specifying the pulse signal output from the algebraic codebook 8 with the pulse arrangement of Table 1 therefore requires 17 bits, and there are 2^17 (= 2^4 × 2^4 × 2^4 × 2^5) kinds of pulse signals.
As Table 1 shows, the pulse positions of each pulse system are restricted. The algebraic codebook search determines, from among the combinations of pulse positions of the pulse systems, the combination of pulses that minimizes the error power against the input speech in the reproduction domain. That is, with the optimum pitch gain βopt obtained in the adaptive codebook search, the adaptive codebook output PL is multiplied by βopt and input to the adder 11. At the same time, pulse signals are sequentially input from the algebraic codebook 8 to the adder 11, and the pulse signal that minimizes the difference between the input signal X and the reproduced signal obtained by feeding the adder output to the LPC synthesis filter 6 is identified. Specifically, a target vector X′ for the algebraic codebook search is first generated from the input signal X, the optimum adaptive codebook output PL obtained in the adaptive codebook search, and the optimum pitch gain βopt by the following equation.

$$X' = X - \beta_{opt} A P_L \tag{5}$$

In this example, the pulse positions and amplitudes (polarities) are expressed in 17 bits as described above, so there are 2^17 combinations. Letting Ck be the k-th algebraic code output vector, the algebraic codebook search finds the code vector Ck that minimizes the evaluation-function error power D in

$$D = \lvert X' - G_c A C_k \rvert^2 \tag{6}$$

where Gc is the algebraic codebook gain. In the algebraic codebook search, the error power evaluation unit 10 searches for the combination of pulse positions and polarities that maximizes the normalized cross-correlation value (Rcx×Rcx/Rcc), obtained by normalizing the square of the cross-correlation Rcx between the algebraic synthesis signal A×Ck and the input signal X′ by the autocorrelation Rcc of the algebraic synthesis signal.
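The step from minimizing D to maximizing Rcx×Rcx/Rcc is not spelled out in the text. A short derivation, assuming the gain Gc is chosen optimally for each candidate Ck and writing Rcx and Rcc as inner products, is sketched below.

```latex
\begin{align*}
D &= \lvert X' - G_c A C_k \rvert^2
   = \lvert X' \rvert^2 - 2 G_c R_{cx} + G_c^2 R_{cc},
   \qquad R_{cx} = X'^{\mathsf T} (A C_k),\quad
          R_{cc} = (A C_k)^{\mathsf T} (A C_k) \\
\frac{\partial D}{\partial G_c} &= 0
   \;\Longrightarrow\; G_c = \frac{R_{cx}}{R_{cc}} \\
D_{\min} &= \lvert X' \rvert^2 - \frac{R_{cx}^2}{R_{cc}}
\end{align*}
```

Since |X′|² does not depend on k, the code vector that minimizes D is the one that maximizes Rcx²/Rcc. The same argument applied to equation (2) yields the criterion of equation (3) and the optimum gain of equation (4).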

Next, gain quantization is described. In G.729A the algebraic codebook gain is not quantized directly; instead, the adaptive codebook gain Ga (= βopt) and a correction coefficient γ for the algebraic codebook gain Gc are vector-quantized together. The algebraic codebook gain Gc and the correction coefficient γ are related by Gc = g′ × γ, where g′ is the gain of the current frame predicted from the logarithmic gains of the past four subframes.
The gain quantization table (not shown) of the gain quantizer 12 holds 128 (= 2^7) combinations of the adaptive codebook gain Ga and the correction coefficient γ for the algebraic codebook gain. The gain codebook is searched by (1) taking one pair of table values from the gain quantization table for the adaptive codebook output vector and the algebraic codebook output vector and setting them in the gain variable units 13 and 14, (2) multiplying the respective vectors by the gains Ga and Gc in the gain variable units 13 and 14 and inputting the results to the LPC synthesis filter 6, and (3) selecting, in the error power evaluation unit 10, the combination that minimizes the error power against the input signal X.
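The three-step table search can be sketched as follows. The sketch assumes that both codebook vectors have already been passed through the LPC synthesis filter (the filter is linear, so scaling before or after filtering gives the same result) and that the predicted gain g′ is supplied by the caller. The function name and the tiny example table are hypothetical; the actual table holds 128 entries.

```python
def search_gain_table(target, adaptive_syn, algebraic_syn,
                      predicted_gain, table):
    """Return the index of the (Ga, gamma) pair with minimum error power.

    target:         input signal X for the subframe
    adaptive_syn:   synthesized adaptive codebook vector
    algebraic_syn:  synthesized algebraic codebook vector
    predicted_gain: g' predicted from past subframes (given, not computed)
    table:          sequence of (Ga, gamma) pairs
    """
    best_index, best_error = None, float("inf")
    for index, (ga, gamma) in enumerate(table):
        gc = predicted_gain * gamma  # Gc = g' x gamma
        error = sum((x - ga * p - gc * c) ** 2
                    for x, p, c in zip(target, adaptive_syn, algebraic_syn))
        if error < best_error:
            best_index, best_error = index, error
    return best_index
```

The returned index is what would be transmitted as the gain code; with a 128-entry table it fits in 7 bits.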

From the above, the line encoding unit 15 multiplexes (1) the LSP code, which is the LSP quantization index, (2) the pitch lag code Lopt, which is the pitch lag quantization index, (3) the algebraic code, which is the algebraic codebook index, and (4) the gain code, which is the gain quantization index, to create line data and transmits it to the decoder.
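Multiplexing and demultiplexing of the four codes can be sketched as fixed-width bit packing. The 17-bit algebraic code and the 7-bit gain code follow from the text; the 8-bit widths used for the LSP code and the pitch lag code in the usage example are placeholders, not the actual G.729A allocation, and the function names are hypothetical.

```python
def multiplex(fields):
    """Pack (value, width) pairs into one integer, first field in the MSBs.

    Returns (word, total_bits).
    """
    word, total_bits = 0, 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
        total_bits += width
    return word, total_bits


def demultiplex(word, widths):
    """Split a packed integer back into fields of the given widths."""
    values = []
    for width in reversed(widths):
        values.append(word & ((1 << width) - 1))
        word >>= width
    return list(reversed(values))
```

A usage example with the placeholder widths:

```python
# LSP code and pitch lag code widths (8, 8) are placeholders.
word, nbits = multiplex([(5, 8), (100, 8), (0x1ABCD, 17), (99, 7)])
fields = demultiplex(word, [8, 8, 17, 7])
```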

- Configuration and operation of the decoder

FIG. 20 is a block diagram of a G.729A decoder. The line data sent from the encoder side is input to the line decoding unit 21, which outputs the LSP code, pitch lag code, algebraic code, and gain code. The decoder decodes the speech data on the basis of these codes. Because the functions of the decoder are contained in the encoder, part of the following overlaps the encoder description, but the operation of the decoder is described briefly below.
The LSP dequantization unit 22 dequantizes the input LSP code and outputs a dequantized LSP value. The LSP interpolation unit 23 interpolates the dequantized LSP value of the first subframe of the current frame from the dequantized LSP value of the second subframe of the current frame and that of the second subframe of the previous frame. The parameter inverse conversion unit 24 then converts the interpolated LSP value and the dequantized LSP value into LPC synthesis filter coefficients. The G.729A LPC synthesis filter 25 uses the LPC coefficients converted from the interpolated LSP value in the first subframe and the LPC coefficients converted from the dequantized LSP value in the second subframe.

The adaptive codebook 26 outputs a pitch signal of one subframe length (= 40 samples) from the read start position indicated by the pitch lag code, and the noise codebook 27 outputs the pulse positions and pulse polarities from the read position corresponding to the algebraic code. The gain dequantization unit 28 calculates the dequantized adaptive codebook gain and the dequantized algebraic codebook gain from the input gain code and sets them in the gain variable units 29 and 30. The adder 31 adds the signal obtained by multiplying the adaptive codebook output by the dequantized adaptive codebook gain to the signal obtained by multiplying the algebraic codebook output by the dequantized algebraic codebook gain to create the excitation signal, and inputs this excitation signal to the LPC synthesis filter 25. Reproduced speech is thereby obtained from the LPC synthesis filter 25.
In the initial state, the adaptive codebook 26 on the decoder side is filled with zero-amplitude signals. In every subframe, the oldest signal is discarded by one subframe length and the excitation signal obtained in the current subframe is stored in the adaptive codebook 26. The adaptive codebooks of the encoder and the decoder are thus always kept in the same, most recent state.
The above is the G.729A encoding and decoding scheme. Like G.729A, the AMR scheme uses a basic algorithm called CELP (Code Excited Linear Prediction); its differences from G.729A are as follows.

- Differences between the encoding methods of G.729A and AMR

FIG. 21 compares the main specifications of G.729A and AMR. AMR has eight encoding modes in total, but the specifications in FIG. 21 are common to all of them. G.729A and AMR have the same input sampling frequency (= 8 kHz), subframe length (= 5 msec), and linear prediction order (= 10), but their frame lengths, and hence the number of subframes per frame, differ. As shown in FIG. 22, one G.729A frame consists of two subframes (subframes 0 and 1), whereas one AMR frame consists of four subframes (subframes 0 to 3).
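The frame-length relationship follows from the shared 5-msec subframe: two subframes give a 10-msec G.729A frame and four give a 20-msec AMR frame, so one AMR frame spans two G.729A frames. The sketch below illustrates that arithmetic and a simple grouping of G.729A frames into AMR-sized units; the grouping function is a hypothetical illustration, not a procedure stated in the text.

```python
SUBFRAME_MS = 5
SUBFRAMES_PER_FRAME = {"G.729A": 2, "AMR": 4}


def frame_ms(codec):
    """Frame length in msec derived from the subframe count."""
    return SUBFRAME_MS * SUBFRAMES_PER_FRAME[codec]


def group_for_amr(g729a_frames):
    """Group consecutive G.729A frames into AMR-frame-sized units.

    A trailing frame that does not fill a complete AMR frame is left out
    (in a real converter it would wait for the next input frame).
    """
    ratio = frame_ms("AMR") // frame_ms("G.729A")
    usable = len(g729a_frames) - len(g729a_frames) % ratio
    return [g729a_frames[i:i + ratio] for i in range(0, usable, ratio)]
```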

FIG. 23 compares the bit allocation of G.729A and AMR; for AMR, the 7.95 kbit/s mode, whose bit rate is closest to that of G.729A, is shown. As FIG. 23 makes clear, the number of algebraic codebook bits per subframe (= 17 bits) is the same, but the bit allocations of all the other codes differ. In addition, G.729A vector-quantizes the adaptive codebook gain and the algebraic codebook gain together, so it has one kind of gain code per subframe, whereas AMR requires two kinds per subframe: an adaptive codebook gain code and an algebraic codebook gain code.
As described above, G.729A, which is widely used for VoIP speech communication over the Internet, and AMR, which has been adopted in mobile phone systems, share the same basic algorithm but differ in frame length and in the number of bits used to represent each code.

- Speech code conversion

With the spread of the Internet and mobile phones, the volume of voice calls between Internet users and mobile phone network users is expected to keep increasing. Voice communication between such different communication systems requires a speech code conversion device 53 in between, as shown in FIG. 24. The speech code conversion device 53 converts the speech code encoded by the encoder 52 of one communication system 51 according to a first speech encoding scheme into a speech code of the second speech encoding scheme used in the other communication system 54. With this conversion, the decoder 55 of the second speech encoding scheme in the communication system 54 can correctly reproduce the speech of user 1.

Proposed code conversion techniques include (1) a tandem connection method, in which decoding and encoding are repeated with the speech encoding scheme of each system, and (2) a method in which the speech code is decomposed into the element codes that constitute it and each element code is individually converted into a code of the other speech encoding scheme (see Japanese Patent Application No. 2001-75427). FIG. 25 illustrates the latter method.
The encoder 71a of encoding scheme 1 built into the terminal 71 encodes the speech signal uttered by user A into a speech code of encoding scheme 1 and sends it to the transmission path 71b. The speech code conversion unit 74 converts the speech code of encoding scheme 1 input from the transmission path 71b into a speech code of encoding scheme 2 and sends it to the transmission path 72b. The decoder 72a of the terminal 72 decodes reproduced speech from the speech code of encoding scheme 2 input via the transmission path 72b, and user B can listen to this reproduced speech.

Encoding scheme 1 encodes the speech signal with (1) a first LSP code obtained by quantizing LSP parameters derived from the linear prediction coefficients (LPC coefficients) that result from frame-by-frame linear prediction analysis, (2) a first pitch lag code that specifies the output signal of an adaptive codebook for outputting a periodic excitation signal, (3) a first algebraic code (noise code) that specifies the output signal of an algebraic codebook (or noise codebook) for outputting a noise-like excitation signal, and (4) a first gain code obtained by quantizing the pitch gain, which represents the amplitude of the adaptive codebook output signal, and the algebraic codebook gain, which represents the amplitude of the algebraic codebook output signal. Encoding scheme 2 encodes the speech signal with (1) a second LSP code, (2) a second pitch lag code, (3) a second algebraic code (noise code), and (4) a second gain code, each obtained with a quantization method different from that of the first speech encoding scheme.

The speech code conversion unit 74 has a code separation unit 74a, an LSP code conversion unit 74b, a pitch lag code conversion unit 74c, an algebraic code conversion unit 74d, a gain code conversion unit 74e, and a code multiplexing unit 74f. The code separation unit 74a separates the speech code of encoding scheme 1, input from the encoder 71a of terminal 1 via the transmission path 71b, into the codes of the components needed to reproduce the speech signal, namely (1) the LSP code, (2) the pitch lag code, (3) the algebraic code, and (4) the gain code, and inputs each to the corresponding code conversion unit 74b to 74e. The code conversion units 74b to 74e convert the input LSP code, pitch lag code, algebraic code, and gain code of speech encoding scheme 1 into the LSP code, pitch lag code, algebraic code, and gain code (pitch gain code and algebraic gain code) of speech encoding scheme 2, respectively. The code multiplexing unit 74f multiplexes the converted codes of speech encoding scheme 2 and sends them to the transmission path 72b.

- Data embedding technology

With the recent spread of computers and the Internet, "digital watermarking", which embeds special data in multimedia content (still images, video, audio, speech, and so on), has been attracting attention. Digital watermarking exploits the characteristics of human perception to embed arbitrary additional information in the multimedia content itself, such as images, video, or audio, with almost no effect on quality. The technique is often used for copyright protection, for example by embedding the name of the creator or seller in the content to deter unauthorized copying and data tampering, but it is also used to embed related or supplementary information about the content and so make the content more convenient to use.

In the field of voice communication as well, attempts have been made to embed such arbitrary information in a speech code and transmit it. FIG. 26 is a conceptual diagram of a voice communication system that applies the data embedding technique. When encoding the input speech SP into a speech code, the encoder 81 embeds an arbitrary data sequence DT other than speech in the speech code SCD and transmits it to the decoder 82. Because the data is embedded in the speech code itself without changing the speech code format, the amount of information in the speech code does not increase. The decoder 82 reads out the arbitrary data sequence embedded in the speech code and also applies normal decoding to the speech code to output the reproduced speech SP′. The embedding is done so that it has almost no effect on the quality of the reproduced speech SP′, so the reproduced speech is almost indistinguishable from speech without embedding. With this configuration, arbitrary data can be transmitted alongside speech without increasing the transmission volume. Moreover, a third party who does not know that data is embedded perceives nothing more than ordinary voice communication.

データの埋め込み方法としては、さまざまな方法がある。特にCELP方式をベースとする高圧縮音声符号化方式では、符号化された音声符号に任意の情報を埋め込む方法がいくつか提案されている。例えば、代数符号帳および適応符号帳を用いて符号化を行う音声符号化方式において、ピッチラグ符号、代数符号に任意のデータを埋め込む技術が提案されている。この埋め込む技術は、ある規則に従って代数符号帳あるいは適応符号帳で量子化した符号（ピッチラグ符号、代数符号）に任意のデータ系列を埋め込むものである。
ピッチ音源に対応するピッチラグ符号と雑音音源に対応する代数符号に着目すると、これらのゲイン(ピッチゲイン、代数符号帳ゲイン)が各符号の寄与度を示すファクタとみなすことができ、ゲインが小さい場合は対応する符号の寄与度が小さくなる。そこで、ゲインを判定パラメータとして定義し、該ゲインがある閾値以下になる場合は対応する符号の寄与度が小さいと判断して、該符号のインデックスを任意のデータ系列で置き換える。これにより、置き換えの影響を小さく抑えながら、任意のデータを埋め込むことが可能となる。 There are various methods for embedding data. In particular, in the high-compression voice coding system based on the CELP system, several methods for embedding arbitrary information in the coded voice code have been proposed. For example, in a speech coding system that performs coding using an algebraic codebook and an adaptive codebook, a technique for embedding arbitrary data in pitch lag codes and algebraic codes has been proposed. This embedding technique embeds an arbitrary data sequence in a code (pitch lag code, algebraic code) quantized with an algebraic codebook or an adaptive codebook according to a certain rule.
Focusing on the pitch lag code, which corresponds to the pitch excitation, and the algebraic code, which corresponds to the noise excitation, their gains (pitch gain and algebraic codebook gain) can be regarded as factors indicating the contribution of each code: when a gain is small, the contribution of the corresponding code is small. The gain is therefore defined as a decision parameter, and when the gain is at or below a certain threshold, the contribution of the corresponding code is judged to be small and the index of that code is replaced with an arbitrary data sequence. This makes it possible to embed arbitrary data while keeping the effect of the replacement small.
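The gain-threshold rule described above can be illustrated with a short Python sketch. It is not taken from the cited proposals: the threshold value, the 4-bit index width, and the names `Subframe`, `embed`, and `extract` are assumptions chosen for illustration, and the caller is assumed to supply enough payload bits for every low-gain subframe.

```python
from dataclasses import dataclass

GAIN_THRESHOLD = 0.3  # hypothetical; the text only says "a certain threshold"
INDEX_BITS = 4        # hypothetical width of the index that gets replaced


@dataclass
class Subframe:
    gain: float  # dequantized gain (pitch gain or algebraic codebook gain)
    index: int   # associated code index (pitch lag code or algebraic code)


def embed(subframes, bits):
    """Overwrite the index of each low-gain subframe with payload bits.

    Returns the new subframe list and the number of payload bits consumed.
    Assumption: `bits` is long enough for every low-gain subframe; if it
    were not, the extractor below would read stale indices as data.
    """
    out, pos = [], 0
    for sf in subframes:
        if sf.gain <= GAIN_THRESHOLD and pos + INDEX_BITS <= len(bits):
            chunk = bits[pos:pos + INDEX_BITS]
            out.append(Subframe(sf.gain, int(chunk, 2)))
            pos += INDEX_BITS
        else:
            out.append(sf)  # high gain: the index matters, leave it alone
    return out, pos


def extract(subframes):
    """Recover the payload by applying the same gain test as the embedder."""
    return "".join(format(sf.index, f"0{INDEX_BITS}b")
                   for sf in subframes if sf.gain <= GAIN_THRESHOLD)


frames = [Subframe(0.9, 5), Subframe(0.1, 7), Subframe(0.2, 3)]
embedded, used = embed(frames, "10100110")
print(used, extract(embedded))
```

Because the gain itself is transmitted unchanged, the receiver can repeat the same threshold test without any side information, which is why the format and the bit rate of the speech code stay the same.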

今後、以上説明したようなデータ埋め込み技術を適用した通信システム間での通信が増大することが予想される。このとき音声符号変換装置はデータ埋め込みを施された音声符号を対象に符号変換を行う必要性がある。 In the future, it is expected that communication between communication systems to which the data embedding technology as described above is applied will increase. At this time, it is necessary for the speech code conversion apparatus to perform code conversion on the speech code on which data is embedded.

・課題1
図２７に符号変換の原理図を示す。図２７は第1符号化方式の符号化データCode1を第2符号化方式の符号化データCode2に変換する場合を示している。符号変換部91は、第1符号化方式による符号化の際に使用される第1量子化テーブル92と第2符号化方式による符号化の際に使用される第2量子化テーブル93をそれぞれ備えている。また、第1量子化テーブル92と第2量子化テーブル93はテーブルサイズおよびテーブル値が異なるが、図２７では、説明の簡略化のためにテーブルサイズが2ビットと同じ場合を示す。・ Problem 1
FIG. 27 shows the principle of code conversion, for the case where encoded data Code1 of the first encoding method is converted into encoded data Code2 of the second encoding method. The code conversion unit 91 includes a first quantization table 92, used for encoding by the first encoding method, and a second quantization table 93, used for encoding by the second encoding method. The first quantization table 92 and the second quantization table 93 differ in table size and table values, but to simplify the explanation FIG. 27 shows the case where both tables have the same size of 2 bits.

図２７において、符号変換部91に入力される第1符号化方式の符号化データCode1（図では"01"）は、第1量子化テーブル92のインデックス番号を表している。したがって、入力されたCode1に対応する第1量子化テーブル92の値（図では2.0）に最も誤差の小さい値を第2量子化テーブル93より選択し、それに対応する第2量子化テーブル93のインデックス番号（図では、"10"）を第2符号化方式の符号化データCode2として出力する。このように符号変換部91では、変換元、変換先の量子化テーブルを比較して誤差が最も小さくなるようにインデックス番号の対応付けを行っている。
ここで入力符号Code1のデータ系列が、前述した埋め込み方法によって埋め込まれた任意のデータ("01"とする)である場合を考える。符号変換部91は、前述と同様の変換処理を行うため、入力データ系列"01"を"10"へ変換する。しかし、これでは、埋め込まれたデータ系列が"01"→"10"と変化してしまい保持されなくなり、受信側の第2符号化方式の復号器は埋め込まれたデータ系列を正常に復元することができない。
以上のように、従来の符号変換方式では、入力符号に任意のデータ系列が埋め込まれている場合、該埋め込みデータ系列を保持できず、結果として符号変換装置において埋め込みデータが損なわれる問題があった。 In FIG. 27, the encoded data Code1 of the first encoding method input to the code conversion unit 91 ("01" in the figure) represents an index number of the first quantization table 92. The code conversion unit therefore selects from the second quantization table 93 the value with the smallest error relative to the value of the first quantization table 92 corresponding to the input Code1 (2.0 in the figure), and outputs the corresponding index number of the second quantization table 93 ("10" in the figure) as the encoded data Code2 of the second encoding method. In this way, the code conversion unit 91 compares the quantization tables of the conversion source and the conversion destination and associates index numbers so that the error is minimized.
Now consider the case where the data sequence of the input code Code1 is arbitrary data (say "01") embedded by the embedding method described above. Because the code conversion unit 91 performs the same conversion processing as before, it converts the input data sequence "01" into "10". The embedded data sequence thus changes from "01" to "10" and is not preserved, so the decoder of the second encoding method on the receiving side cannot correctly restore the embedded data sequence.
As described above, when an arbitrary data sequence is embedded in the input code, the conventional code conversion method cannot preserve the embedded data sequence, and as a result the embedded data is corrupted in the code conversion device.
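The index mapping of FIG. 27 can be sketched in a few lines of Python. Only the value 2.0 at index "01" and the resulting output "10" come from the description; all other table entries and the names `TABLE1`, `TABLE2`, and `convert_index` are hypothetical and chosen so that the example reproduces that result.

```python
# First quantization table 92. Only index 0b01 -> 2.0 is given in the text;
# the remaining values are made up for illustration.
TABLE1 = [1.0, 2.0, 3.0, 4.0]
# Second quantization table 93 (all values hypothetical).
TABLE2 = [0.5, 1.2, 2.1, 3.5]


def convert_index(code1: int) -> int:
    """Map a table-92 index to the table-93 index with the smallest error."""
    value = TABLE1[code1]
    return min(range(len(TABLE2)), key=lambda i: abs(TABLE2[i] - value))


code2 = convert_index(0b01)
print(format(code2, "02b"))  # embedded bits "01" do not survive the mapping
```

The mapping is correct for genuine speech parameters, since 2.1 is the closest available value to 2.0. It is only when the two input bits are payload rather than a quantized value that the nearest-value rule destroys them.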

・課題2
今後、第3世代携帯電話システムに代表されるように、音声通信に加え、データ通信等マルチメディア情報を対象とした通信システムの普及が予想される。このため、従来のような音声回線のみを持つ通信システムと、音声回線とその他のデータ回線を持つ通信システム間での通信が発生する。かかる場合、音声回線については従来の音声符号変換装置で両通信システム間の音声符号の相互変換を行うことによりユーザ間の音声通信が可能となる。しかし、データ回線については、一方がデータ回線を持たないため、ユーザ間のデータ通信は不可能である。以上のように音声回線のみを持つ通信システムと音声回線と他にデータ回線を持つ通信システム間では、ユーザ間で音声通信しか行うことが出来ない問題がある。・ Problem 2
In the future, as exemplified by third-generation mobile phone systems, communication systems that handle multimedia information such as data communication in addition to voice communication are expected to become widespread. Communication will therefore take place between a conventional communication system having only a voice line and a communication system having a voice line and a separate data line. In that case, voice communication between users is possible on the voice line, because a conventional speech code conversion device converts the speech codes between the two communication systems. Data communication between users is impossible, however, because one of the systems has no data line. Thus, between a communication system having only a voice line and a communication system having both a voice line and a data line, users can perform only voice communication.

以上から、本発明の目的は、音声回線のみを持つ通信システムと音声回線の外にデータ回線を持つ通信システム間で、音声通信とデータ通信の両方の通信ができるようにすることである。 Accordingly, an object of the present invention is to enable both voice communication and data communication between a communication system having only a voice line and a communication system having a data line in addition to the voice line.

本発明は入力音声を第1音声符号化方式により符号化した第1音声符号を第2音声符号化方式による第2音声符号に変換する音声符号変換方法および音声符号変換装置である。
・音声符号変換方法
本発明の音声符号変換方法は、第1音声符号を受信するステップ、該第1音声符号に任意のデータが埋め込まれている場合、該第1音声符号を第2音声符号に変換すると共に、該第1音声符号から埋め込みデータを抽出するステップ、前記変換により得られる第2音声符号と前記抽出したデータを別々に送信先に送信するステップを有している。 The present invention is a speech code conversion method and a speech code conversion device for converting a first speech code obtained by encoding an input speech by a first speech encoding method into a second speech code by a second speech encoding method.
・ Speech code conversion method
The speech code conversion method of the present invention includes: a step of receiving a first speech code; a step of, when arbitrary data is embedded in the first speech code, converting the first speech code into a second speech code and extracting the embedded data from the first speech code; and a step of transmitting the second speech code obtained by the conversion and the extracted data separately to the transmission destination.

送信元において、データ埋め込み条件が満たされた時、第1音声符号の一部を前記データで置き換えることにより、第1音声符号にデータを埋め込み、音声符号変換部において、前記データ抽出ステップは、受信した第1音声符号を構成する所定の要素符号の逆量子化値を参照して前記データ埋め込み条件が満たされているか監視し、データ埋め込み条件が満たされていれば該第1音声符号より前記埋め込みデータを抽出する。 At the transmission source, when a data embedding condition is satisfied, data is embedded in the first speech code by replacing part of the first speech code with the data. In the speech code conversion unit, the data extraction step monitors whether the data embedding condition is satisfied by referring to the dequantized value of a predetermined element code constituting the received first speech code, and extracts the embedded data from the first speech code if the condition is satisfied.

・音声符号変換装置
本発明の音声符号変換装置は、第1音声符号を第2音声符号に変換する符号変換部、送信元から受信した第1音声符号に任意のデータが埋め込まれている場合、該該第1音声符号から埋め込みデータを抽出する埋め込みデータ抽出部、前記変換により得られる第2音声符号と前記抽出したデータを別々に送信先に送信する手段を有している。
送信元において、データ埋め込み条件が満たされた時、第1音声符号の一部を前記データで置き換えることにより、第1音声符号にデータを埋め込んだ場合、前記埋め込みデータ抽出部は、送信元から受信した1音声符号を構成する所定の要素符号の逆量子化値を参照して前記データ埋め込み条件が満たされているか監視する監視部、データ埋め込み条件が満たされていれば第1音声符号より前記埋め込みデータを抽出する抽出部を有している。
・ Speech code conversion device
The speech code conversion device of the present invention includes: a code conversion unit that converts a first speech code into a second speech code; an embedded data extraction unit that, when arbitrary data is embedded in the first speech code received from the transmission source, extracts the embedded data from the first speech code; and means for transmitting the second speech code obtained by the conversion and the extracted data separately to the transmission destination.
When the transmission source embeds data in the first speech code by replacing part of the first speech code with the data whenever a data embedding condition is satisfied, the embedded data extraction unit includes: a monitoring unit that monitors whether the data embedding condition is satisfied by referring to the dequantized value of a predetermined element code constituting the first speech code received from the transmission source; and an extraction unit that extracts the embedded data from the first speech code if the data embedding condition is satisfied.

本発明によれば、変換元の音声回線によって伝送された音声情報とデータ情報とを、変換先の音声回線とデータ回線に分離して伝送することが可能となる。 According to the present invention, voice information and data information transmitted through a conversion source voice line can be separated and transmitted to a conversion destination voice line and data line.

FIG. 1: 本発明の第1のシステム概念図である。 Conceptual diagram of the first system of the present invention.
FIG. 2: 本発明の第1システムにおける音声符号変換装置の構成図である。 Block diagram of the speech code conversion device in the first system of the present invention.
FIG. 3: 本発明の第1システムにおける音声符号変換装置の別の概略構成図である。 Another schematic block diagram of the speech code conversion device in the first system of the present invention.
FIG. 4: 本発明の第２のシステム概念図である。 Conceptual diagram of the second system of the present invention.
FIG. 5: 本発明の第２システムにおける音声符号変換装置の概略構成図である。 Schematic block diagram of the speech code conversion device in the second system of the present invention.
FIG. 6: 本発明の第２システムにおける音声符号変換装置の別の概略構成図である。 Another schematic block diagram of the speech code conversion device in the second system of the present invention.
FIG. 7: 本発明の第３のシステム概念図である。 Conceptual diagram of the third system of the present invention.
FIG. 8: 本発明の第３システムにおける音声符号変換装置の概略構成図である。 Schematic block diagram of the speech code conversion device in the third system of the present invention.
FIG. 9: 本発明の第３システムにおける音声符号変換装置の別の概略構成図である。 Another schematic block diagram of the speech code conversion device in the third system of the present invention.
FIG. 10: 本発明の第1システムにおける音声符号変換装置の構成図である。 Block diagram of the speech code conversion device in the first system of the present invention.
FIG. 11: 本発明の第1システムにおける音声符号変換装置の別の実施例構成図である。 Block diagram of another embodiment of the speech code conversion device in the first system of the present invention.
FIG. 12: 本発明の第1システムにおける音声符号変換装置の更に別の実施例構成図である。 Block diagram of yet another embodiment of the speech code conversion device in the first system of the present invention.
FIG. 13: 代数符号の構成図である。 Structure of the algebraic code.
FIG. 14: 本発明の第2のシステムにおける音声符号変換装置の実施例構成図である。 Block diagram of an embodiment of the speech code conversion device in the second system of the present invention.
FIG. 15: 本発明の第2のシステムにおける音声符号変換装置の別の実施例構成図である。 Block diagram of another embodiment of the speech code conversion device in the second system of the present invention.
FIG. 16: 本発明の第3のシステムにおける音声符号変換装置の実施例構成図である。 Block diagram of an embodiment of the speech code conversion device in the third system of the present invention.
FIG. 17: 本発明の第3のシステムにおける音声符号変換装置の別の実施例構成図である。 Block diagram of another embodiment of the speech code conversion device in the third system of the present invention.
FIG. 18: ITU-T勧告G.729A方式の符号器の構成図である。 Block diagram of an encoder conforming to ITU-T Recommendation G.729A.
FIG. 19: 各パルス系統グループ１〜４に割り当てたサンプル点の説明図である。 Explanatory diagram of the sample points assigned to pulse system groups 1 to 4.
FIG. 20: G.729A方式の復号器のブロック図である。 Block diagram of a G.729A decoder.
FIG. 21: G.729A方式とAMRの主要諸元の比較説明図である。 Comparison of the main specifications of G.729A and AMR.
FIG. 22: G.729A方式とAMRのフレーム構成説明図である。 Explanatory diagram of the frame structures of G.729A and AMR.
FIG. 23: G.729A方式とAMR方式におけるビット割り当ての比較説明図である。 Comparison of bit allocation in G.729A and AMR.
FIG. 24: 異なる通信システム間での音声符号変換説明図である。 Explanatory diagram of speech code conversion between different communication systems.
FIG. 25: 音声符号を別の音声符号化方式の符号に変換する従来技術の説明図である。 Explanatory diagram of a prior-art technique for converting a speech code into a code of another speech encoding method.
FIG. 26: データ埋め込み技術を適用した音声通信システムの概念図である。 Conceptual diagram of a voice communication system to which the data embedding technique is applied.
FIG. 27: 符号変換の原理図である。 Principle diagram of code conversion.

(A)本発明の概略
(a)第1のシステム
図１は本発明の第1のシステム概念図であり、任意のデータDTを埋め込んだ第1符号化方式の音声符号ＳＰ1を、該データDTを埋め込んだ第2符号化方式の音声符号SP2へ変換する場合を示している。
第１符号化方式の通信システム101と第2符号化方式の通信システム102間に音声符号変換装置103が設けられている。通信システム101における第１符号化方式の符号器104は、入力音声SP１を符号化する際、音声データ以外の任意のデータ系列DTを音声符号SCD１に埋め込んで伝送路105に送出する。この際、符号器104によるデータの埋め込みは、音声符号のフォーマットを変えずに音声符号自体に行われるため、音声符号の情報量の増加はない。
音声符号変換装置103は、符号器104から第1音声符号化方式に従って符号化した音声符号SCD1を受信すれば、該音声符号を通信システム102で使用されている第2音声符号化方式の音声符号SCD2に変換して伝送路106に送出する。この際、音声符号変換装置103は埋め込みデータを損なわずに音声符号変換を行う。
通信システム102における第2符号化方式の復号器107は音声符号SCD2に埋め込まれた任意のデータ系列ＤＴを読み出して出力するとともに、音声符号に通常の復号器処理を施して再生音声SP２を出力する。このとき、再生音声SP２の品質にほとんど影響がないように埋め込みが行われるため、再生音声は埋め込みを行わない場合とほとんど差がない。 (A) Outline of the present invention
(a) First system
FIG. 1 is a conceptual diagram of the first system of the present invention, showing the case where a speech code SP1 of the first encoding method in which arbitrary data DT is embedded is converted into a speech code SP2 of the second encoding method in which the data DT is embedded.
A speech code conversion device 103 is provided between the communication system 101 of the first encoding method and the communication system 102 of the second encoding method. When encoding the input speech SP1, the encoder 104 of the first encoding method in the communication system 101 embeds an arbitrary data sequence DT other than speech data in the speech code SCD1 and sends it to the transmission path 105. Because the encoder 104 embeds the data in the speech code itself without changing the speech code format, the amount of information in the speech code does not increase.
On receiving from the encoder 104 the speech code SCD1 encoded according to the first speech encoding method, the speech code conversion device 103 converts it into a speech code SCD2 of the second speech encoding method used in the communication system 102 and sends it to the transmission path 106. In doing so, the speech code conversion device 103 performs the speech code conversion without damaging the embedded data.
The decoder 107 of the second encoding method in the communication system 102 reads out and outputs the arbitrary data sequence DT embedded in the speech code SCD2, and applies normal decoder processing to the speech code to output reproduced speech SP2. Because the embedding is performed so as to have almost no effect on the quality of the reproduced speech SP2, the reproduced speech is almost indistinguishable from the case without embedding.

図２は本発明の第1システムにおける符号変換装置103の構成図である。変換元で第1符号化方式に従って符号化され、且つ、データDTが埋め込まれた音声符号SCD1は、フレーム単位で順番に符号変換部111と埋め込みデータ抽出部112に入力する。符号変換部111は図２５に示す従来と同様の構成を有し、第1符号化方式の音声符号SCD1を第２符号化方式の音声符号SCD２′に変換する。埋め込みデータ抽出部112は、音声符号SCD1に埋め込まれたデータDTを抽出してデータ埋め込み部113へ出力する。埋め込みデータ抽出部112によるデータ抽出方法は、第1符号化方式の復号器のデータ抽出方法と同じである。データ埋め込み部113は、符号変換部111で変換された第2符号化方式の音声符号SCD2′と音声符号SCD1から抽出したデータDTが入力すると、音声符号SCD2′へフレーム単位でデータDTの埋め込みを行い、音声符号SCD2として出力する。データ埋め込み部113によるデータ埋め込み方法は、第2符号化方式の符号器のデータ埋め込み方法と同じである。 FIG. 2 is a block diagram of the code conversion apparatus 103 in the first system of the present invention. The speech code SCD1 encoded at the conversion source according to the first encoding method and embedded with the data DT is input to the code conversion unit 111 and the embedded data extraction unit 112 in order in units of frames. The code converting unit 111 has the same configuration as that shown in FIG. 25, and converts the first encoding speech code SCD1 into the second encoding speech code SCD2 '. The embedded data extraction unit 112 extracts the data DT embedded in the speech code SCD1 and outputs it to the data embedding unit 113. The data extraction method by the embedded data extraction unit 112 is the same as the data extraction method of the first encoding decoder. The data embedding unit 113 embeds the data DT in units of frames in the speech code SCD2 ′ when the second encoding method speech code SCD2 ′ converted by the code conversion unit 111 and the data DT extracted from the speech code SCD1 are input. And output as voice code SCD2. The data embedding method by the data embedding unit 113 is the same as the data embedding method of the encoder of the second encoding method.
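The extract, convert, and re-embed flow of FIG. 2 can be sketched as follows. This is a toy model under stated assumptions: a frame is reduced to a dictionary with hypothetical `gain` and `algebraic` fields, the payload is assumed to occupy the whole `algebraic` field of every frame, and `convert` is an arbitrary index remapping standing in for the code conversion unit 111, not a real transcoder.

```python
def convert(frame):
    """Hypothetical stand-in for code conversion unit 111.

    Any remapping of indices is enough to show the effect: here the
    4-bit algebraic index is scrambled by an arbitrary formula.
    """
    return {"gain": frame["gain"],
            "algebraic": (frame["algebraic"] * 3 + 1) % 16}


def extract(frame):
    """Embedded data extraction unit 112: read the payload field of SCD1."""
    return frame["algebraic"]


def embed(frame, data):
    """Data embedding unit 113: overwrite the payload field of SCD2'."""
    return {**frame, "algebraic": data}


def transcode(scd1):
    data = extract(scd1)           # DT taken from SCD1 before conversion
    scd2_prime = convert(scd1)     # SCD1 -> SCD2' (payload field corrupted)
    return embed(scd2_prime, data) # SCD2 carries the original DT again


scd1 = {"gain": 2, "algebraic": 0b0101}
print(convert(scd1)["algebraic"], transcode(scd1)["algebraic"])
```

Conversion alone changes the payload field, whereas the three-stage path returns it unchanged, which is the property the first system relies on.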

図３は本発明の第1システムにおける符号変換装置103の別の構成図であり、図２の符号変換装置と同一部分には同一符号を付している。この符号変換装置103は、音声符号の性質に基いて適応的に音声符号SCD1から埋め込みデータDTを抽出すると共に音声符号SCD2′へデータDTの埋め込みを行う。たとえば、従来技術の項で説明したように、第1符号化方式の符号器は、ゲイン(ピッチゲイン、代数符号帳ゲイン)がある閾値以下であれば対応する符号(ピッチラグ符号、代数符号)の音声に対する寄与は小さいもの見なして、該符号のインデックスを任意のデータ系列DTで置き換える。このため、第1符号化方式の音声符号SCD1には、ゲインに応じてデータが埋め込まれている区間と埋め込まれていない区間が生じる。 FIG. 3 is another block diagram of the code conversion device 103 in the first system of the present invention; parts identical to those of the code conversion device of FIG. 2 are denoted by the same reference numerals. This code conversion device 103 adaptively extracts the embedded data DT from the speech code SCD1, and embeds the data DT in the speech code SCD2′, based on the properties of the speech code. For example, as described in the prior art section, the encoder of the first encoding method regards the contribution to speech of a code (pitch lag code, algebraic code) as small when the corresponding gain (pitch gain, algebraic codebook gain) is at or below a certain threshold, and replaces the index of that code with an arbitrary data sequence DT. As a result, the speech code SCD1 of the first encoding method contains sections in which data is embedded and sections in which it is not, depending on the gain.

埋め込み判定部121は、音声符号SCD1のゲインに基いてフレームあるいはサブフレーム単位で該符号に別のデータが埋め込まれているかどうかを判定し、データが埋め込まれていると判定した場合には、スイッチSW1を閉じて音声符号SCD1を埋め込みデータ抽出部112に入力する。埋め込みデータ抽出部112は音声符号SCD1よりデータを抽出し、FIFOバッファ構成のデータ保持部122に入力する。FIFOバッファはfirst-in first-outのバッファである。 Based on the gain of the speech code SCD1, the embedding determination unit 121 determines, frame by frame or subframe by subframe, whether other data is embedded in the code. When it determines that data is embedded, it closes the switch SW1 and inputs the speech code SCD1 to the embedded data extraction unit 112. The embedded data extraction unit 112 extracts the data from the speech code SCD1 and inputs it to the data holding unit 122, which is configured as a FIFO (first-in, first-out) buffer.

埋め込み判定部123は、符号変換部111より出力された第2符号化方式の音声符号SCD2′のゲインに基いてフレームあるいはサブフレーム単位で該音声符号にデータを埋め込むかどうか判定し、データを埋め込むと判定すればスイッチSW2を閉じ、データ保持部122は保持しているデータを古いものからフレームあるいはサブフレーム単位でデータ埋め込み部113に入力する。この結果、データ埋め込み部113は、第2符号化方式の音声符号SCD2′にデータ保持部122から出力するデータDTをフレーム単位で埋め込み、音声符号SCD2として出力する。 The embedding determination unit 123 determines whether to embed data in the audio code in units of frames or subframes based on the gain of the audio code SCD2 ′ of the second encoding method output from the code conversion unit 111, and embeds the data If it is determined, the switch SW2 is closed, and the data holding unit 122 inputs the held data from the oldest to the data embedding unit 113 in units of frames or subframes. As a result, the data embedding unit 113 embeds the data DT output from the data holding unit 122 in the second encoding speech code SCD2 ′ in units of frames and outputs the result as the speech code SCD2.

各埋め込み判定の方法は、それぞれの符号化方式において使用されている方法と同じでよい。埋め込み判定部１２１と埋め込み判定部123の埋め込み判定方法が異なる場合、スイッチSW1,SW2の閉じるタイミングは必ずしも一致しない。さらに埋め込み判定方法が同じ場合でも、音声符号変換部111の変換誤差により変換前後で音声符号が異なるため、同様な現象が生じる。図3のデータ保持部122は上記スイッチングタイミングの差を吸収してデータの消失を防止する機能を有している。 Each embedding determination method may be the same as the method used in each encoding method. When the embedding determination unit 121 and the embedding determination unit 123 have different embedding determination methods, the closing timings of the switches SW1 and SW2 do not necessarily match. Further, even when the embedding determination method is the same, the same phenomenon occurs because the speech code differs before and after conversion due to the conversion error of the speech code conversion unit 111. The data holding unit 122 in FIG. 3 has a function of absorbing the difference in the switching timing and preventing data loss.

すなわち、変換先が埋め込み対象区間でない場合には、データ保持部122により第1音声符号SCD1から抽出したデータDTを一旦保持する。逆に変換元が埋め込み対象区間でない場合には、データ保持部122に保持しているデータを取り出して第2音声符号SCD2′に埋め込む。さらに、変換元の埋め込み対象の符号データサイズが変換先よりも大きい場合は、埋め込み可能なデータ量のみを埋め込み、残りをデータ保持部122により一旦保持する。また、データ保持部122のデータ保持数が減少した場合、変換先のデータ埋め込みを一旦停止し、データ保持数を回復させる。以上により、スイッチングタイミングの差を吸収してデータの消失を防止する。 That is, when the conversion destination is not in an embedding target section, the data holding unit 122 temporarily holds the data DT extracted from the first speech code SCD1. Conversely, when the conversion source is not in an embedding target section, data held in the data holding unit 122 is taken out and embedded in the second speech code SCD2′. Furthermore, when the size of the code data available for embedding at the conversion source is larger than that at the conversion destination, only as much data as can be embedded is embedded, and the remainder is temporarily held by the data holding unit 122. When the amount of data held by the data holding unit 122 runs low, embedding at the conversion destination is temporarily suspended so that the amount of held data can recover. In this way the difference in switching timing is absorbed and data loss is prevented.
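The buffering behaviour described above can be sketched with a small FIFO model. The class name `DataHolder`, the function `transfer`, and the representation of each frame as a pair (bits extracted from SCD1, number of bits SCD2′ can carry) are assumptions made for this illustration; a capacity of 0 models a frame in which switch SW2 stays open.

```python
from collections import deque


class DataHolder:
    """FIFO bit buffer standing in for data holding unit 122."""

    def __init__(self):
        self._bits = deque()

    def push(self, bits):
        self._bits.extend(bits)  # one deque entry per bit character

    def pop(self, n):
        """Return the n oldest bits, or None if fewer than n are held."""
        if len(self._bits) < n:
            return None
        return "".join(self._bits.popleft() for _ in range(n))

    def __len__(self):
        return len(self._bits)


def transfer(frames):
    """Run (extracted_bits, dest_capacity) pairs through the holder.

    extracted_bits is None when the source frame carried no data
    (SW1 open); dest_capacity is 0 when the destination frame cannot
    carry data (SW2 open).
    """
    holder = DataHolder()
    sent = []
    for extracted, capacity in frames:
        if extracted:
            holder.push(extracted)
        # Embedding is skipped when too few bits are held, so that the
        # buffer can refill before the next attempt.
        sent.append(holder.pop(capacity) if capacity else None)
    return sent, len(holder)


sent, left = transfer([("1011", 0), (None, 2), ("01", 2), (None, 4)])
print(sent, left)
```

In the run above the first frame only buffers data, the second and third drain it two bits at a time, and the fourth skips embedding because only two bits remain for a four-bit slot, so nothing is lost or reordered.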

(b)第２のシステム
図４は本発明の第２のシステム概念図であり、変換元の通信システム101が音声回線105とデータ回線108を持ち、変換先の通信システム102が音声回線106のみ持つ場合を示している。図に示すように通信システム101における第１符号化方式の符号器104は、入力音声SP1を符号化して音声符号SCD1にし該音声符号を音声回線105に送出すると共に、音声符号以外の任意のデータ系列DTをデータ回線108に送出する。実際には音声符号SCDとデータ系列DTを時分割多重して多重回線に送出し、適当な箇所で分離して音声符号変換装置103に入力する。以上により、音声符号変換装置103には音声回線105から音声符号SCD1とデータ回線108からデータDTがそれぞれ入力する。音声符号変換装置103は第1符号化方式の音声符号SCD1を第２符号化方式の音声符号に変換するとともに該音声符号にデータDTを埋め込んで音声符号SCD2として変換先の通信システム102に音声回線106を介して伝送する。
(b) Second system
FIG. 4 is a conceptual diagram of the second system of the present invention, showing the case where the conversion source communication system 101 has a voice line 105 and a data line 108, and the conversion destination communication system 102 has only a voice line 106. As shown in the figure, the encoder 104 of the first encoding method in the communication system 101 encodes the input speech SP1 into a speech code SCD1 and sends it to the voice line 105, and sends an arbitrary data sequence DT other than the speech code to the data line 108. In practice, the speech code SCD and the data sequence DT are time-division multiplexed and sent over a multiplex line, then separated at an appropriate point and input to the speech code conversion device 103. The speech code conversion device 103 thus receives the speech code SCD1 from the voice line 105 and the data DT from the data line 108. The speech code conversion device 103 converts the speech code SCD1 of the first encoding method into a speech code of the second encoding method, embeds the data DT in that speech code, and transmits the result as the speech code SCD2 to the conversion destination communication system 102 via the voice line 106.

通信システム102における第2符号化方式の復号器107は音声符号に埋め込まれた任意のデータ系列ＤＴを読み出して出力すると共に、音声符号に通常の復号器処理を施して再生音声SP2を出力する。このとき、再生音声SP2の品質にほとんど影響がないように埋め込みが行われるため、再生音声は埋め込みを行わない場合とほとんど差がない。
図5は本発明の第２システムにおける符号変換装置103の構成図であり、図2の第１システムにおける符号変換装置と同一部分には同一符号を付している。異なる点は、(1)データDTが音声符号SCD1とは別の経路で入力する点、(2)埋め込みデータ抽出部がなく、埋め込みデータDTを直接データ埋め込み部113へ入力する点である。 The decoder 107 of the second encoding method in the communication system 102 reads out and outputs an arbitrary data series DT embedded in the speech code, and performs a normal decoder process on the speech code to output the reproduced speech SP2. At this time, since the embedding is performed so that the quality of the reproduced sound SP2 is hardly affected, the reproduced sound is hardly different from the case where the embedding is not performed.
FIG. 5 is a block diagram of the code conversion device 103 in the second system of the present invention. The same reference numerals are given to the same parts as those of the code conversion device in the first system of FIG. The difference is that (1) the data DT is input via a route different from that of the speech code SCD1, and (2) there is no embedded data extraction unit, and the embedded data DT is directly input to the data embedding unit 113.

変換元である通信システムは第1符号化方式に従って符号化した音声符号SCD1とデータDTを時分割多重して多重回線200に送出し、回線分離部201はこれら音声符号SCD1とデータDTを分離して音声回線105、データ回線108を介して符号変換装置103に入力する。データ埋め込み部113は、符号変換部111で変換された第2符号化方式の音声符号SCD2′とデータDTが入力すると、音声符号SCD2′へフレーム単位でデータDTの埋め込みを行い、音声符号SCD2として音声回線106に送出する。 The conversion source communication system time-division multiplexes the speech code SCD1 encoded according to the first encoding method and the data DT and sends them to the multiplex line 200; the line separation unit 201 separates the speech code SCD1 and the data DT and inputs them to the code conversion device 103 via the voice line 105 and the data line 108. When the data embedding unit 113 receives the speech code SCD2′ of the second encoding method converted by the code conversion unit 111 and the data DT, it embeds the data DT in the speech code SCD2′ frame by frame and sends the result to the voice line 106 as the speech code SCD2.

図６は本発明の第２システムにおける符号変換装置103の別の構成図であり、図３の第１システムにおける符号変換装置と同一部分には同一符号を付している。図3と異なる点は、(1)データDTが音声符号SCD1とは別の経路で入力する点、(2)埋め込み判定部、埋め込みデータ抽出部がなく、埋め込みデータDTを直接データ保持部122へ入力する点である。
変換元である通信システムは第1符号化方式に従って符号化した音声符号SCD1とデータDTを時分割多重して多重回線200に送出し、回線分離部201はこれら音声符号SCD1とデータDTを分離して音声回線105、データ回線108を介して符号変換装置103に入力する。 FIG. 6 is another block diagram of the code conversion device 103 in the second system of the present invention; parts identical to those of the code conversion device in the first system of FIG. 3 are denoted by the same reference numerals. It differs from FIG. 3 in that (1) the data DT is input via a path separate from the speech code SCD1, and (2) there is no embedding determination unit or embedded data extraction unit, and the data DT to be embedded is input directly to the data holding unit 122.
The conversion source communication system time-division multiplexes the speech code SCD1 encoded according to the first encoding method and the data DT and sends them to the multiplex line 200; the line separation unit 201 separates the speech code SCD1 and the data DT and inputs them to the code conversion device 103 via the voice line 105 and the data line 108.

符号変換装置103は、音声符号の性質に基いて適応的に音声符号SCD′へデータDTの埋め込みを行う。すなわち、符号変換部111は第1符号化方式の音声符号SCD1を第２符号化方式の音声符号SCD２′に変換し、FIFOバッファ構成のデータ保持部122は入力されたデータDTを保持する。埋め込み判定部123は、符号変換部111より出力された第2符号化方式の音声符号SCD2′を基にフレームあるいはサブフレーム単位で該音声符号にデータを埋め込むかどうか判定し、データを埋め込むと判定すればスイッチSW2を閉じ、データ保持部122は保持しているデータを古いものからフレームあるいはサブフレーム単位でデータ埋め込み部113に入力する。この結果、データ埋め込み部113は、第2符号化方式の音声符号SCD2′にデータ保持部122から出力するデータDTをフレーム単位で埋め込み、音声符号SCD2として音声回線106に送出する。 The code conversion device 103 adaptively embeds the data DT in the speech code SCD ′ based on the nature of the speech code. That is, the code conversion unit 111 converts the first encoding speech code SCD1 into the second encoding speech code SCD2 ', and the FIFO buffer configuration data holding unit 122 holds the input data DT. The embedding determination unit 123 determines whether to embed data in the audio code in units of frames or subframes based on the second encoding method audio code SCD2 ′ output from the code conversion unit 111, and determines to embed the data Then, the switch SW2 is closed, and the data holding unit 122 inputs the held data from the oldest to the data embedding unit 113 in units of frames or subframes. As a result, the data embedding unit 113 embeds the data DT output from the data holding unit 122 in the second encoding method voice code SCD2 ′ in units of frames, and sends it as the voice code SCD2 to the voice line 106.

(c)第３のシステム
図７は本発明の第３のシステム概念図であり、第2のシステムとは逆に、変換元の通信システム101が音声回線105のみを持ち、変換先の通信システム102が音声回線106とデータ回線109を持つ場合を示している。
通信システム101における第１符号化方式の符号器104は、入力音声SP1を符号化すると共に該符号に音声データ以外の任意のデータ系列DTを埋め込み、音声符号SCD1として音声回線105に送出する。音声符号変換装置103は、第1符号化方式の音声符号SCD1を第2符号化方式の音声符号SCD2に変換するとともに、音声符号SCD1に埋め込まれているデータDTを抽出し、これら音声符号SCD2、データDTを各回線106,109に送出する。通信システム102はデータ回線109を介して入力したデータを出力すると共に、復号器107で音声符号SCD2を復号して再生音声SP2を出力する。なお、実際には音声符号SCD2、データDTは適所で時分割多重されて通信システム102に伝送され、通信システムで分離される。
(c) Third system
FIG. 7 is a conceptual diagram of the third system of the present invention, showing the case where, conversely to the second system, the conversion source communication system 101 has only a voice line 105 and the conversion destination communication system 102 has a voice line 106 and a data line 109.
The encoder 104 of the first encoding method in the communication system 101 encodes the input speech SP1, embeds an arbitrary data sequence DT other than speech data in the code, and sends the result to the voice line 105 as the speech code SCD1. The speech code conversion device 103 converts the speech code SCD1 of the first encoding method into the speech code SCD2 of the second encoding method, extracts the data DT embedded in the speech code SCD1, and sends the speech code SCD2 and the data DT to the lines 106 and 109, respectively. The communication system 102 outputs the data input via the data line 109, and the decoder 107 decodes the speech code SCD2 to output reproduced speech SP2. In practice, the speech code SCD2 and the data DT are time-division multiplexed at a suitable point, transmitted to the communication system 102, and separated there.

図8は本発明の第３システムにおける符号変換装置103の構成図であり、図2の第１システムにおける符号変換装置と同一部分には同一符号を付している。異なる点は、(1)データ埋め込み部がなく、符号変換部111から出力する第2符号化方式の音声符号SCD2に埋め込みデータ抽出部112で抽出したデータDTを埋め込まない点、(2)データDTが第2符号化方式の音声符号SCD２とは別々に送出される点である。
変換元で第1符号化方式に従って符号化され、且つ、データDTが埋め込まれた音声符号SCD1は、フレーム単位で順番に符号変換部111と埋め込みデータ抽出部112に入力する。符号変換部111は第1符号化方式の音声符号SCD1を第２符号化方式の音声符号SCD２に変換して音声回線106に送出する。また、埋め込みデータ抽出部112は、音声符号SCD1に埋め込まれたデータDTを抽出してデータ回線109に送出する。回線多重部203は音声回線106 データ回線109を介して入力する音声符号SCD2及びデータDTを時分割多重して多重回線204に送出する。 FIG. 8 is a block diagram of the code conversion device 103 in the third system of the present invention; parts identical to those of the code conversion device in the first system of FIG. 2 are denoted by the same reference numerals. It differs in that (1) there is no data embedding unit, so the data DT extracted by the embedded data extraction unit 112 is not embedded in the speech code SCD2 of the second encoding method output from the code conversion unit 111, and (2) the data DT is sent out separately from the speech code SCD2 of the second encoding method.
The speech code SCD1 encoded at the conversion source according to the first encoding method and embedded with the data DT is input to the code conversion unit 111 and the embedded data extraction unit 112 in order in units of frames. The code conversion unit 111 converts the voice code SCD1 of the first coding system into the voice code SCD2 of the second coding system and sends it to the voice line 106. The embedded data extraction unit 112 extracts the data DT embedded in the speech code SCD1 and sends it to the data line 109. The line multiplexing unit 203 time-division-multiplexes the voice code SCD2 and the data DT input via the voice line 106 and the data line 109 and sends them to the multiplexing line 204.

FIG. 9 is another block diagram of the code conversion device 103 in the third system of the present invention; parts identical to those of the code conversion device in the first system of FIG. 3 carry the same reference numerals. FIG. 9 differs from FIG. 3 in that (1) there is no data holding unit, no embedding determination unit on the output side, and no data embedding unit, (2) the data DT is not embedded in the speech code SCD2 output from the code conversion unit 111, and (3) the data DT is sent separately from the speech code SCD2.

When a gain (pitch gain or algebraic codebook gain) is at or below a certain threshold, the encoder of the transmitting communication system regards the contribution of the corresponding code (pitch lag code or algebraic code) to the speech as small and replaces the index of that code with an arbitrary data sequence DT. As a result, the speech code SCD1 of the first encoding system contains sections in which data is embedded and sections in which it is not. The embedding determination unit 121 determines, frame by frame or subframe by subframe, whether data has been embedded in the code on the basis of the gain obtained from the speech code SCD1. When it determines that data is embedded, it closes the switch SW1 and inputs the speech code SCD1 to the embedded data extraction unit 112. The embedded data extraction unit 112 extracts the embedded data from the speech code SCD1 and sends it to the data line 109. In parallel, the speech code conversion unit 111 converts the speech code SCD1 of the first encoding system into the speech code SCD2 of the second encoding system and sends it to the speech line 106. The line multiplexing unit 203 time-division multiplexes the speech code SCD2 and the data DT input via the speech line 106 and the data line 109 and sends them to the multiplexed line 204.
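The gain-controlled embedding and extraction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Subframe` type, the function names, and the threshold value are hypothetical, and the treatment of the boundary case (`<=` rather than `<`) is an assumption. The point it shows is that the gain itself is transmitted unchanged, so the converter can apply the same test as the sender to decide whether an index carries data.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical value; the text only says "a certain threshold".
GAIN_THRESHOLD = 0.1


@dataclass
class Subframe:
    gain: float  # dequantized pitch gain or algebraic codebook gain
    index: int   # corresponding pitch lag code or algebraic code index


def embed(sf: Subframe, payload: int, thr: float = GAIN_THRESHOLD) -> Subframe:
    """Sender side: replace the index with payload when the gain is low."""
    if sf.gain <= thr:
        return Subframe(sf.gain, payload)
    return sf  # gain is high: keep the real index, no data in this subframe


def extract(sf: Subframe, thr: float = GAIN_THRESHOLD) -> Optional[int]:
    """Converter side (unit 121, SW1, unit 112): payload, or None if absent."""
    return sf.index if sf.gain <= thr else None
```

Because `embed` never touches `gain`, `extract` reaches the same decision as the sender for every subframe without any side information.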

(B) Embodiments of the first system
(a) First embodiment
FIG. 10 is a configuration diagram of the code conversion device in the first system of the present invention and shows the configuration used when embedding is controlled adaptively.
The first embodiment converts an AMR speech code in which arbitrary data is embedded into a G.729A speech code without losing the embedded data. The AMR encoder at the conversion source is assumed to embed arbitrary data in all 17 bits per subframe assigned to the algebraic code when the algebraic codebook gain is smaller than a set value, and to place the original algebraic code data there when the gain is larger than the set value. The G.729A side at the conversion destination is likewise assumed to embed data in all 17 bits assigned to the algebraic code, depending on the algebraic codebook gain.

In FIG. 10, when the line data bst1(m), which is the AMR encoder output for the m-th frame, is input to the code separation unit 114 through terminal 1, the code separation unit 114 separates bst1(m) into the AMR element codes (LSP code 1, pitch lag code 1, pitch gain code 1, algebraic code 1, algebraic gain code 1). These element codes are input to the corresponding converters in the code conversion unit 111 (LSP code conversion unit 111a, pitch lag code conversion unit 111b, pitch gain code conversion unit 111c, algebraic gain code conversion unit 111d, algebraic code conversion unit 111e). The code conversion units 111a to 111e convert the codes of the first encoding system into codes of the second encoding system; their operation is the same as in the prior art and is not described here. Only the parts related to data embedding are described below.

The embedding determination unit 121 obtains the dequantized algebraic gain from the algebraic gain code 1 and switches the switch SW1 according to that gain. That is, when the AMR algebraic gain is smaller than a certain threshold, it determines that embedded data is present, closes the switch SW1, and inputs the algebraic code 1 to the embedded data extraction unit 112. The embedded data extraction unit 112 extracts the embedded data Dcode contained in the algebraic code and outputs it to the data holding unit 122. In this embodiment data is embedded in the entire AMR algebraic code (17 bits per subframe), so the 17-bit sequence is taken out unchanged as the embedded data Dcode. The data holding unit 122, which has a FIFO structure, stores the incoming data sequences in order of arrival, oldest first.

Meanwhile, the embedding determination unit 123 obtains the dequantized algebraic gain from the converted G.729A algebraic gain code 2 supplied by the algebraic gain code conversion unit 111d and switches the switch SW2 according to that gain. That is, when the G.729A algebraic gain is smaller than a certain threshold, it decides to embed data, closes the switch SW2, and passes data from the data holding unit 122 to the data embedding unit 113. In this embodiment data is embedded in the entire G.729A algebraic code (17 bits per subframe), so the data holding unit 122 supplies 17 bits of data to the data embedding unit 113. The data embedding unit 113 embeds the supplied data in the 17 bits assigned to the algebraic code 2; in other words, the whole G.729A algebraic code (17 bits) is replaced with the data sequence (17 bits).

The algebraic code 2 with the data embedded is multiplexed with the other element codes by the code multiplexing unit 115 and output from terminal 2 as the line data bst2(n) of the n-th G.729A frame, which contains the embedded data.
According to the first embodiment, when arbitrary data is embedded in the algebraic code of the AMR speech code bst1(m), the speech code can be converted into a speech code bst2(n) in which the same data is embedded in the G.729A algebraic code, without losing the embedded data. This makes it possible to carry out data communication in addition to voice communication between AMR and G.729A without changing the speech code format.
The conversion from AMR to G.729A has been described above, but the data extraction and data embedding parts of the first embodiment can also be applied to the reverse conversion from G.729A to AMR.
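The interplay of the two embedding determination units and the FIFO data holding unit in the first embodiment can be sketched as below. The class and method names and the threshold parameters are hypothetical. The text does not say what happens when the destination gain is low but the data holding unit is empty; this sketch assumes the converted algebraic code is then left unchanged.

```python
from collections import deque

CODE_MASK = (1 << 17) - 1  # algebraic code is 17 bits per subframe


class EmbeddedDataRelay:
    """Carries embedded 17-bit words from source subframes to converted ones."""

    def __init__(self, src_threshold: float, dst_threshold: float) -> None:
        self.src_threshold = src_threshold
        self.dst_threshold = dst_threshold
        self.fifo = deque()  # plays the role of data holding unit 122

    def on_source_subframe(self, gain1: float, algebraic_code1: int) -> None:
        # Unit 121: low source gain means the code carries data (SW1 closes).
        if gain1 < self.src_threshold:
            self.fifo.append(algebraic_code1 & CODE_MASK)

    def on_converted_subframe(self, gain2: float, algebraic_code2: int) -> int:
        # Unit 123: low converted gain means data may be embedded (SW2 closes).
        if gain2 < self.dst_threshold and self.fifo:
            return self.fifo.popleft()  # oldest data first
        return algebraic_code2          # assumption: otherwise pass through
```

The queue decouples the two decisions, so data extracted in one subframe can wait until a later converted subframe satisfies the embedding condition.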

(b) Second embodiment
FIG. 11 is another configuration diagram of the code conversion device in the first system of the present invention and shows the configuration used when embedding is controlled adaptively; parts identical to those of the first embodiment in FIG. 10 carry the same reference numerals. In the first embodiment, arbitrary data is embedded in all 17 bits per subframe assigned to the algebraic code when the algebraic gain is smaller than a set value. In the second embodiment, by contrast, arbitrary data is embedded in all 8 or 5 bits per subframe assigned to the pitch lag code when the pitch gain is smaller than a set value.

The embedding determination unit 121 obtains the dequantized pitch gain from the pitch gain code 1 and switches the switch SW1 according to that gain. That is, when the AMR pitch gain is smaller than a certain threshold, it determines that embedded data is present, closes the switch SW1, and inputs the pitch lag code 1 to the embedded data extraction unit 112. The embedded data extraction unit 112 extracts the embedded data Dcode contained in the pitch lag code and outputs it to the data holding unit 122. In this embodiment data is embedded in the entire AMR pitch lag code (8 or 6 bits per subframe), so the 8-bit or 6-bit sequence is taken out unchanged as the embedded data Dcode. The data holding unit 122, which has a FIFO structure, stores the incoming data sequences in order of arrival, oldest first.

Meanwhile, the embedding determination unit 123 obtains the dequantized pitch gain from the converted G.729A pitch gain code 2 supplied by the pitch gain code conversion unit 111c and switches the switch SW2 according to that gain. That is, when the G.729A pitch gain is smaller than a certain threshold, it decides to embed data, closes the switch SW2, and passes data from the data holding unit 122 to the data embedding unit 113. In this embodiment data is embedded in the entire G.729A pitch lag code (8 or 5 bits per subframe), so the data holding unit 122 supplies 8 or 5 bits of data to the data embedding unit 113 depending on the subframe. The data embedding unit 113 embeds the supplied data in the 8 or 5 bits assigned to the pitch lag code 2.

The pitch lag code 2 with the data embedded is multiplexed with the other element codes by the code multiplexing unit 115 and output from terminal 2 as the line data bst2(n) of the n-th G.729A frame, which contains the embedded data.
According to the second embodiment, when arbitrary data is embedded in the pitch lag code of the AMR speech code bst1(m), the speech code can be converted into a speech code bst2(n) in which the same data is embedded in the G.729A pitch lag code, without losing the embedded data. This makes it possible to carry out data communication in addition to voice communication between AMR (7.95 kbps) and G.729A without changing the speech code format.
The conversion from AMR to G.729A has been described above, but the data extraction and data embedding parts can also be applied to the reverse conversion from G.729A to AMR and to other code conversions.
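In the second embodiment the extracted words are 8 or 6 bits wide while the embedded words are 8 or 5 bits wide, so the data holding unit has to hand out data at a different granularity than it receives it. One possible reading is a bit-level FIFO, sketched below. The class name and the most-significant-bit-first ordering are assumptions; the text only states that the stored sequence is kept in order and that 8 or 5 bits are supplied depending on the subframe.

```python
from collections import deque


class BitFifo:
    """Bit-granular FIFO: push words of one width, pop words of another."""

    def __init__(self) -> None:
        self._bits = deque()

    def __len__(self) -> int:
        return len(self._bits)

    def push(self, value: int, width: int) -> None:
        """Append the low `width` bits of value, most significant bit first."""
        for shift in range(width - 1, -1, -1):
            self._bits.append((value >> shift) & 1)

    def pop(self, width: int) -> int:
        """Remove and return the oldest `width` bits as an integer."""
        if len(self._bits) < width:
            raise ValueError("not enough buffered bits")
        value = 0
        for _ in range(width):
            value = (value << 1) | self._bits.popleft()
        return value
```

A caller would check `len(fifo)` against the pitch lag width of the current G.729A subframe before popping, and skip embedding for that subframe when too few bits are buffered.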

(c) Third embodiment
FIG. 12 is another configuration diagram of the code conversion device in the first system of the present invention and shows the configuration used when embedding is not controlled adaptively. The third embodiment converts an AMR speech code into a G.729A speech code without losing the embedded data. Referring to FIGS. 21 to 23, an AMR speech code has a frame length of 20 msec divided into four 5 msec subframes, each with a 17-bit algebraic code. A G.729A speech code has a frame length of 10 msec divided into two 5 msec subframes, each likewise with a 17-bit algebraic code. In both AMR and G.729A these 17 bits express the pulse positions m0 to m3 and the polarities s0 to s3 of the four pulse systems (see Table 1). The bit assignment for the pulse positions m0 to m3 and the polarities s0 to s3 is shown in FIG. 13.

In the third embodiment, the AMR encoder at the conversion source embeds the data Dcode in, for example, the 5 bits of m3 and s3, which indicate the pulse position and polarity of the fourth pulse system. The embedded data extraction unit 112 always extracts the embedded data Dcode contained in the algebraic code 1 and inputs it to the data embedding unit 113. The data embedding unit 113 embeds the supplied data Dcode in the 5 bits of m3 and s3 among the 17 bits assigned to the algebraic code 2. The algebraic code 2 with the data embedded is multiplexed with the other element codes by the code multiplexing unit 115 and output from terminal 2 as the line data bst2(n) of the n-th G.729A frame, which contains the embedded data.
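The fixed-position copy of the third embodiment amounts to a masked bit-field transfer between two 17-bit words. The sketch below assumes, purely for illustration, that the m3 and s3 bits occupy the five low-order bits of the algebraic code; the actual assignment is the one given in FIG. 13, and `DATA_SHIFT` and all function names are hypothetical. The helper `data_rate_bps` works out the resulting channel capacity from the 5 msec subframe length stated above.

```python
CODE_BITS = 17                      # algebraic code width per subframe
CODE_MASK = (1 << CODE_BITS) - 1
DATA_BITS = 5                       # the m3 and s3 bits together
DATA_SHIFT = 0                      # ASSUMPTION: field sits in the low-order bits
DATA_MASK = ((1 << DATA_BITS) - 1) << DATA_SHIFT


def extract_dcode(algebraic_code1: int) -> int:
    """Unit 112: read the 5-bit data field out of the source algebraic code."""
    return (algebraic_code1 & DATA_MASK) >> DATA_SHIFT


def embed_dcode(algebraic_code2: int, dcode: int) -> int:
    """Unit 113: overwrite the same field in the converted algebraic code."""
    cleared = algebraic_code2 & CODE_MASK & ~DATA_MASK
    return cleared | ((dcode << DATA_SHIFT) & DATA_MASK)


def data_rate_bps(bits_per_subframe: int, subframe_ms: int = 5) -> int:
    """Embedded-data rate when every subframe carries the given bit count."""
    return bits_per_subframe * 1000 // subframe_ms
```

With 5 bits in every 5 msec subframe the embedded channel runs at 1000 bit/s on both sides, since AMR and G.729A share the same subframe length. That matching rate is why this configuration needs no data holding unit.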

As described above, the first system extracts the embedded data DT from the speech code SCD1 of the first encoding system at the conversion source and embeds it again in the converted speech code SCD2′ of the second encoding system. The speech code SCD1 can therefore be converted into a speech code SCD2 carrying the same data without losing the data DT embedded in SCD1.
Furthermore, when embedding is controlled adaptively at both the conversion source and the conversion destination, the timing of data extraction and the timing of data embedding can differ, either because the embedding control methods of the two encoding systems differ or because of conversion error in the conventional speech code conversion unit. The first system absorbs this timing difference in the data holding unit, so the speech code SCD1 can be converted into a speech code SCD2 carrying the same data without losing the embedded data.
In addition, the first system allows both voice and data to be communicated over the speech line between voice communication systems whose speech lines use the data embedding technique, without losing the embedded data and without changing the speech code format.

(C) Embodiments of the second system of the present invention
(a) First embodiment
FIG. 14 is a configuration diagram of the speech code conversion device in the second system of the present invention. It differs from the embodiments of the first system in that the data Dcode is not embedded in the speech code bst1(m); instead, the data is input to the speech code conversion device over a line separate from the speech code. The line multiplexing unit 201 separates the speech code bst1(m) and the data Dcode from the multiplexed data received via the multiplexed line 200, inputs the speech code bst1(m) to the code separation unit 114 through terminal 1, and inputs the data Dcode directly to the data holding unit 122 through terminal 3.
The code separation unit 114 separates the line data bst1(m) into the AMR element codes (LSP code 1, pitch lag code 1, pitch gain code 1, algebraic code 1, algebraic gain code 1) and inputs them to the corresponding converters in the code conversion unit 111 (LSP code conversion unit 111a, pitch lag code conversion unit 111b, pitch gain code conversion unit 111c, algebraic gain code conversion unit 111d, algebraic code conversion unit 111e). The code conversion units 111a to 111e convert the codes of the first encoding system into codes of the second encoding system.

The embedding determination unit 123 obtains the dequantized algebraic gain from the converted G.729A algebraic gain code 2 supplied by the algebraic gain code conversion unit 111d and switches the switch SW2 according to that gain. That is, when the G.729A algebraic gain is smaller than a certain threshold, it decides to embed data, closes the switch SW2, and passes data from the data holding unit 122 to the data embedding unit 113. The data embedding unit 113 embeds the supplied data in the 17 bits assigned to the algebraic code 2. The algebraic code 2 with the data embedded is multiplexed with the other element codes by the code multiplexing unit 115 and output from terminal 2 as the line data bst2(n) of the n-th G.729A frame, which contains the embedded data.

According to this embodiment, when the communication system on the AMR side has a data line in addition to a speech line, the speech code bst1(m) and the data Dcode, which arrive separately over the speech line and the data line, can be converted into a speech code bst2(n) with the data embedded and transmitted to the communication system on the G.729A side, which has only a speech line. Data communication can thus be performed in addition to voice communication from a system capable of both voice and data communication, for example a third-generation mobile phone system (which uses AMR as its speech encoding system), to a system that has only a speech line, for example a conventional second-generation mobile phone system that performs only voice communication (G.729A).

(b) Second embodiment
FIG. 15 is another configuration diagram of the speech code conversion device in the second system of the present invention and shows the configuration used when embedding is not controlled adaptively. In the second embodiment the data Dcode is not embedded in the speech code bst1(m); the data is input to the speech code conversion device over a line separate from the speech code. Because the G.729A algebraic code expresses the pulse positions m0 to m3 and the polarities s0 to s3 of the four pulse systems in 17 bits, the second embodiment embeds the data Dcode in, for example, the 5 bits of m3 and s3, which indicate the pulse position and polarity of the fourth pulse system.
The line multiplexing unit 201 separates the speech code bst1(m) and the data Dcode from the multiplexed data received via the multiplexed line 200, inputs the speech code bst1(m) to the code separation unit 114 through terminal 1, and inputs the data Dcode directly to the data embedding unit 113 through terminal 3.
The code separation unit 114 separates the line data bst1(m) into the AMR element codes (LSP code 1, pitch lag code 1, pitch gain code 1, algebraic code 1, algebraic gain code 1) and inputs them to the corresponding converters in the code conversion unit 111 (LSP code conversion unit 111a, pitch lag code conversion unit 111b, pitch gain code conversion unit 111c, algebraic gain code conversion unit 111d, algebraic code conversion unit 111e). The code conversion units 111a to 111e convert the codes of the first encoding system into codes of the second encoding system.

The data embedding unit 113 embeds the supplied data Dcode in the 5 bits of m3 and s3 among the 17 bits assigned to the algebraic code 2. The algebraic code 2 with the data embedded is multiplexed with the other element codes by the code multiplexing unit 115 and output from terminal 2 as the line data bst2(n) of the n-th G.729A frame, which contains the embedded data.

As described above, the second system allows voice communication and data communication to be performed from a communication system that has a data line in addition to a speech line to a communication system that has only a speech line, without changing the speech code format.
The conversion from AMR to G.729A has been described above, but the same approach can also be applied to the reverse conversion from G.729A to AMR and to other code conversions. Likewise, although the case of embedding data in the algebraic code according to the algebraic gain has been described, data may instead be embedded in the pitch lag code according to the pitch gain.

(D) Third system of the present invention
(a) First embodiment
FIG. 16 is a configuration diagram of the speech code conversion device in the third system of the present invention and shows the configuration used when embedded data is extracted adaptively. In this embodiment the first encoding system is G.729A and the second encoding system is AMR (7.95 kbps). The code conversion device converts the G.729A speech code into an AMR speech code and transmits it, and also extracts the data embedded in the G.729A speech code and transmits it separately from the speech code. The G.729A encoder at the conversion source (not shown) is assumed to embed arbitrary data in all 17 bits per subframe assigned to the algebraic code when the algebraic gain is smaller than a set value, and to place the original algebraic code data there when the algebraic gain is larger than the set value.

When the line data bst1(m), which is the G.729A encoder output for the m-th frame, is input to the code separation unit 114 through terminal 1, the code separation unit 114 separates bst1(m) into the G.729A element codes (LSP code 1, pitch lag code 1, pitch gain code 1, algebraic code 1, algebraic gain code 1). These element codes are input to the corresponding converters in the code conversion unit 111 (LSP code conversion unit 111a, pitch lag code conversion unit 111b, pitch gain code conversion unit 111c, algebraic gain code conversion unit 111d, algebraic code conversion unit 111e). The code conversion units 111a to 111e convert the G.729A codes into AMR codes, and the code multiplexing unit 115 multiplexes the AMR codes and inputs the result to the line multiplexing unit 203 as the speech code bst2(n).

In parallel with the above, the embedding determination unit 121 obtains the dequantized algebraic gain from the algebraic gain code 1 and switches the switch SW1 according to that gain. That is, when the G.729A algebraic gain is smaller than a certain threshold, it determines that embedded data is present, closes the switch SW1, and inputs the algebraic code 1 to the embedded data extraction unit 112. The embedded data extraction unit 112 extracts the embedded data Dcode contained in the algebraic code and inputs it to the line multiplexing unit 203. Because data is embedded in the entire G.729A algebraic code (17 bits per subframe), the 17-bit sequence is taken out unchanged as the embedded data Dcode and input to the line multiplexing unit 203.
The line multiplexing unit 203 multiplexes the incoming speech code bst2(n) and the data Dcode and sends them to the multiplexed line 204.
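The role of the line multiplexing unit 203, and of the matching separation at the receiving system, can be sketched as a tagged slot stream. The slot tags, the function names, and the slot ordering (voice first, then data when present) are hypothetical; the text specifies only that the speech code and the extracted data are time-division multiplexed onto one line and separated at the destination.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

VOICE, DATA = "voice", "data"  # hypothetical slot tags

Slot = Tuple[str, bytes]


def multiplex(frames: Iterable[Tuple[bytes, Optional[bytes]]]) -> Iterator[Slot]:
    """Emit one voice slot per frame, followed by a data slot if data exists."""
    for speech_code, dcode in frames:
        yield (VOICE, speech_code)
        if dcode is not None:  # data is present only when SW1 was closed
            yield (DATA, dcode)


def demultiplex(slots: Iterable[Slot]) -> Tuple[List[bytes], List[bytes]]:
    """Receiving side: split the slot stream back into voice and data."""
    voice: List[bytes] = []
    data: List[bytes] = []
    for tag, payload in slots:
        (voice if tag == VOICE else data).append(payload)
    return voice, data
```

Since data slots appear only for subframes in which the gain test succeeded, the data stream is intermittent while the voice stream stays continuous.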

(b) Second embodiment
FIG. 17 is another configuration diagram of the speech code conversion device in the third system of the present invention and covers the case in which embedded data is always inserted in the algebraic code. In this embodiment the first encoding system is G.729A and the second encoding system is AMR (7.95 kbps). The speech code conversion device converts the G.729A speech code into an AMR speech code and transmits it, and also extracts the data embedded in the G.729A speech code and transmits it over a line separate from the speech code. The G.729A encoder at the conversion source is assumed to embed the data Dcode in the 5 bits of m3 and s3 of the algebraic code (see FIG. 13).

In parallel with the above, the embedded data extraction unit 112 extracts the embedded data Dcode contained in the algebraic code and inputs it to the line multiplexing unit 203. Because the data is embedded at the m3 and s3 bit positions of the G.729A algebraic code, those bits are cut out and input to the line multiplexing unit 203 as the embedded data Dcode. The line multiplexing unit 203 multiplexes the incoming speech code bst2(n) and the data Dcode and sends them to the multiplexed line 204.
According to the third system, voice communication and data communication can be performed from a communication system that has only a speech line to a communication system that has a data line in addition to a speech line, without changing the speech code format.
The conversion from G.729A to AMR has been described above, but the same approach can also be applied to other code conversions. Likewise, although the case of embedding data in the algebraic code according to the algebraic gain has been described, data may instead be embedded in the pitch lag code according to the pitch gain.

Supplementary Notes
(Appendix 1)
A speech code conversion method for converting a first speech code, obtained by encoding input speech by a first speech encoding method, into a second speech code according to a second speech encoding method, wherein, when arbitrary data is embedded in the first speech code, the first speech code is converted into the second speech code, the embedded data is extracted from the first speech code, and the extracted data is embedded in the second speech code obtained by the conversion.
(Appendix 2)
The speech code conversion method according to appendix 1, wherein, in a case where the transmission source embeds the data in the first speech code by replacing a part of the first speech code with the data when a data embedding condition is satisfied, whether the data embedding condition is satisfied is monitored with reference to a dequantized value of a predetermined element code constituting the received first speech code, and, if the data embedding condition is satisfied, the embedded data is extracted from the first speech code.
(Appendix 3)
The speech code conversion method according to appendix 2, wherein the extracted embedded data is stored in a data holding unit, and the embedded data is read from the data holding unit and embedded in the second speech code.
(Appendix 4)
The speech code conversion method according to appendix 1, wherein, in a case where the transmission source embeds the data in the first speech code by replacing a part of the first speech code with the data when a data embedding condition is satisfied, whether the data embedding condition is satisfied is monitored with reference to a dequantized value of a predetermined element code constituting the first speech code received from the transmission source; if the data embedding condition is satisfied, the embedded data is extracted from the first speech code and held; whether a data embedding condition is satisfied is monitored with reference to a dequantized value of a predetermined element code constituting the second speech code obtained by the conversion; and, if it is satisfied, the data is embedded in the second speech code by replacing a part of the second speech code with the held data.
(Appendix 5)
A speech code conversion method for converting a first speech code, obtained by encoding input speech by a first speech encoding method, into a second speech code according to a second speech encoding method, wherein the first speech code and data are received separately from a transmission source, the first speech code is converted into the second speech code, and the data is embedded in the second speech code obtained by the conversion and transmitted to a transmission destination.
(Appendix 6)
The speech code conversion method according to appendix 5, wherein the first speech code is received from a voice line and the data is received from a data line, and the second speech code in which the data is embedded is transmitted to the transmission destination via a voice line.
(Appendix 7)
The speech code conversion method according to appendix 5, wherein the received data is stored in a data holding unit, whether a data embedding condition is satisfied is monitored with reference to a dequantized value of a predetermined element code constituting the second speech code, and, if it is satisfied, the data is embedded in the second speech code by replacing a part of the second speech code with the data stored in the data holding unit.
(Appendix 8)
A speech code conversion method for converting a first speech code, obtained by encoding input speech by a first speech encoding method, into a second speech code according to a second speech encoding method, wherein the first speech code is received, and, when arbitrary data is embedded in the first speech code, the first speech code is converted into the second speech code, the embedded data is extracted from the first speech code, and the second speech code obtained by the conversion and the extracted data are transmitted separately to a transmission destination.
(Appendix 9)
The speech code conversion method according to appendix 8, wherein, in a case where the transmission source embeds the data in the first speech code by replacing a part of the first speech code with the data when a data embedding condition is satisfied, whether the data embedding condition is satisfied is monitored with reference to a dequantized value of a predetermined element code constituting the received first speech code, and, if the data embedding condition is satisfied, the embedded data is extracted from the first speech code.
(Appendix 10)
A speech code conversion device for converting a first speech code, obtained by encoding input speech by a first speech encoding method, into a second speech code according to a second speech encoding method, comprising: a code conversion unit that converts the first speech code into the second speech code when arbitrary data is embedded in the first speech code; an embedded data extraction unit that extracts the embedded data from the first speech code; and a data embedding unit that embeds the extracted data in the second speech code obtained by the conversion.
(Appendix 11)
The speech code conversion device according to appendix 10, wherein, in a case where the transmission source embeds the data in the first speech code by replacing a part of the first speech code with the data when a data embedding condition is satisfied, the embedded data extraction unit monitors whether the data embedding condition is satisfied with reference to a dequantized value of a predetermined element code constituting the received first speech code, and, if the data embedding condition is satisfied, extracts the embedded data from the first speech code.
(Appendix 12)
The speech code conversion device according to appendix 11, further comprising a data holding unit that stores the extracted embedded data, wherein the embedded data extraction unit stores the extracted embedded data in the data holding unit, and the data embedding unit reads the embedded data from the data holding unit and embeds it in the second speech code.
(Appendix 13)
The speech code conversion device according to appendix 12, wherein the embedded data extraction unit monitors whether a data embedding condition is satisfied with reference to a dequantized value of a predetermined element code constituting the second speech code, and, if it is satisfied, embeds the data in the second speech code by replacing a part of the second speech code with the data stored in the data holding unit.
(Appendix 14)
A speech code conversion device for converting a first speech code, obtained by encoding input speech by a first speech encoding method, into a second speech code according to a second speech encoding method, comprising: receiving means for receiving the first speech code and data separately from a transmission source; a code conversion unit that converts the first speech code into the second speech code; and a data embedding unit that embeds the data in the second speech code obtained by the conversion and transmits it to a transmission destination.
(Appendix 15)
The speech code conversion device according to appendix 14, further comprising a data holding unit that stores the data, wherein the data embedding unit includes: means for monitoring whether a data embedding condition is satisfied with reference to a dequantized value of a predetermined element code constituting the second speech code; and means for embedding the data in the second speech code, if the condition is satisfied, by replacing a part of the second speech code with the data stored in the data holding unit.
(Appendix 16)
A speech code conversion device for converting a first speech code, obtained by encoding input speech by a first speech encoding method, into a second speech code according to a second speech encoding method, comprising: a code conversion unit that converts the first speech code into the second speech code when arbitrary data is embedded in the first speech code received from a transmission source; an embedded data extraction unit that extracts the embedded data from the first speech code; and means for transmitting the second speech code obtained by the conversion and the extracted data separately to a transmission destination.
(Appendix 17)
The speech code conversion device according to appendix 16, wherein, in a case where the transmission source embeds the data in the first speech code by replacing a part of the first speech code with the data when a data embedding condition is satisfied, the embedded data extraction unit monitors whether the data embedding condition is satisfied with reference to a dequantized value of a predetermined element code constituting the first speech code received from the transmission source, and, if the data embedding condition is satisfied, extracts the embedded data from the first speech code.

As described above, according to the present invention, the embedded data is first extracted from the speech code of the first encoding method (the conversion source) and is then re-embedded in the speech code of the second encoding method after code conversion. The speech code of the first encoding method can thus be converted into a speech code of the second encoding method that carries the same embedded data, without the data being lost.
Further, according to the present invention, when embedding is controlled adaptively at both the conversion source and the conversion destination, the data holding unit absorbs the difference between the extraction timing and the embedding timing that arises from differences between the embedding control methods of the two encoding methods or from conversion errors in a conventional speech code conversion unit. The speech code of the first encoding method can therefore be converted into a speech code of the second encoding method that carries the same embedded data, without the data being lost.
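The role of the data holding unit can be sketched as a first-in, first-out buffer placed between extraction and re-embedding. This is a minimal Python illustration. The class and function names, the per-frame boolean condition flag, and the queue discipline are assumptions made for the example, not details given in this description.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class DataHoldingUnit:
    """Hypothetical FIFO that decouples extraction timing from embedding timing."""

    def __init__(self) -> None:
        self._queue: Deque[int] = deque()

    def store(self, data: int) -> None:
        self._queue.append(data)

    def fetch(self) -> Optional[int]:
        return self._queue.popleft() if self._queue else None

    def pending(self) -> int:
        return len(self._queue)


def relay(frames: List[Tuple[Optional[int], bool]]) -> Tuple[List[Optional[int]], int]:
    """Process (extracted_data, dest_condition_met) pairs, one per frame.

    extracted_data is None when the source-side condition was not met.
    Returns the data embedded in each output frame and the number of
    data units still held at the end.
    """
    holder = DataHoldingUnit()
    embedded: List[Optional[int]] = []
    for extracted, dest_ok in frames:
        if extracted is not None:
            holder.store(extracted)
        embedded.append(holder.fetch() if dest_ok else None)
    return embedded, holder.pending()
```

In this sketch, data extracted in a frame whose converted code does not satisfy the destination-side condition simply waits in the queue until a later frame does, which is the timing absorption described above.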

Further, according to the present invention, between voice communication systems that have voice lines to which the data embedding technique is applied, both voice and data can be communicated over the voice line without loss of the embedded data and without changing the speech code format.
Further, according to the present invention, when the speech code of the first encoding method and the data are input to the speech code conversion unit from the conversion-source system over separate lines, the speech code conversion unit embeds the data in the speech code of the second encoding method after code conversion, so that both can be transmitted to the conversion destination over the voice line alone.
Further, according to the present invention, when a speech code of the first encoding method in which arbitrary data DT is embedded is input from the conversion-source system over a voice line, the speech code conversion unit extracts the embedded data from the speech code and sends it to a data line, and converts the speech code of the first encoding method into a speech code of the second encoding method and sends it to a voice line. The voice information and data information carried on the conversion-source voice line can thus be transmitted separately on the voice line and the data line of the conversion destination.
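The two line-conversion cases above can be outlined in Python as follows. The sketch treats a speech code as a dictionary of element codes and takes the code converter and the embedding condition as caller-supplied functions. All names here (`split_to_two_lines`, `merge_to_voice_line`, the `"algebraic"` key) are hypothetical and chosen for illustration only.

```python
from typing import Callable, Dict, Optional, Tuple

SpeechCode = Dict[str, int]
Converter = Callable[[SpeechCode], SpeechCode]
Condition = Callable[[SpeechCode], bool]


def split_to_two_lines(
    first: SpeechCode, convert: Converter, has_data: Condition
) -> Tuple[SpeechCode, Optional[int]]:
    """Voice line in; voice line and data line out."""
    data = first["algebraic"] if has_data(first) else None
    return convert(first), data


def merge_to_voice_line(
    first: SpeechCode, data: Optional[int], convert: Converter, can_embed: Condition
) -> Tuple[SpeechCode, Optional[int]]:
    """Voice line and data line in; voice line only out.

    Returns the second speech code and any data that could not be embedded
    in this frame (to be retried on a later frame).
    """
    second = convert(first)
    if data is not None and can_embed(second):
        return {**second, "algebraic": data}, None
    return second, data
```

Passing the converter in as a function keeps the sketch independent of any particular pair of encoding methods. A real converter would map each element code of the first method to the second.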

Also, according to the present invention, voice communication and data communication can be performed between a communication system having only a voice line and a communication system having a data line in addition to the voice line, without changing the speech code format.
As multimedia communication becomes widespread, techniques that combine data embedding with speech code conversion will be increasingly needed for communication between diverse systems, such as between conventional and next-generation mobile phone systems, or between VoIP and mobile systems such as mobile phones. The effect of the present invention is therefore significant.

Description of Symbols
101 Conversion-source communication system
102 Conversion-destination communication system
103 Speech code conversion device
104 Encoder of the first encoding method
105 Voice line
106 Voice line
107 Decoder of the second encoding method
108 Data line

Claims

A speech code conversion method for converting a first speech code, obtained by encoding input speech by a first speech encoding method, into a second speech code according to a second speech encoding method, the method comprising:
receiving the first speech code;
when arbitrary data is embedded in the first speech code, converting the first speech code into the second speech code and extracting the embedded data from the first speech code; and
transmitting the second speech code obtained by the conversion and the extracted data separately to a transmission destination.

The speech code conversion method according to claim 1, wherein, in a case where the transmission source embeds the data in the first speech code by replacing a part of the first speech code with the data when a data embedding condition is satisfied,
whether the data embedding condition is satisfied is monitored with reference to a dequantized value of a predetermined element code constituting the received first speech code, and
if the data embedding condition is satisfied, the embedded data is extracted from the first speech code.

A speech code conversion device for converting a first speech code, obtained by encoding input speech by a first speech encoding method, into a second speech code according to a second speech encoding method, the device comprising:
a code conversion unit that converts the first speech code into the second speech code;
an embedded data extraction unit that, when arbitrary data is embedded in the first speech code received from a transmission source, extracts the embedded data from the first speech code; and
means for transmitting the second speech code obtained by the conversion and the extracted data separately to a transmission destination.

The speech code conversion device according to claim 3, wherein, in a case where the transmission source embeds the data in the first speech code by replacing a part of the first speech code with the data when a data embedding condition is satisfied, the embedded data extraction unit comprises:
a monitoring unit that monitors whether the data embedding condition is satisfied with reference to a dequantized value of a predetermined element code constituting the first speech code received from the transmission source; and
an extraction unit that extracts the embedded data from the first speech code if the data embedding condition is satisfied.