JP2010044408A

JP2010044408A - Speech code conversion method

Info

Publication number: JP2010044408A
Application number: JP2009240710A
Authority: JP
Inventors: Yoshiteru Tsuchinaga; 義照土永; Takashi Ota; 恭士大田; Masanao Suzuki; 政直鈴木
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2009-10-19
Filing date: 2009-10-19
Publication date: 2010-02-25
Anticipated expiration: 2021-08-31
Also published as: JP4985743B2

Abstract

[Purpose] To convert a first non-speech code of a first non-speech encoding scheme on the transmitting side into a second non-speech code of a second non-speech encoding scheme on the receiving side, taking into account the differences in frame length and in DTX control between the two sides.
[Configuration] In a non-speech section, a non-speech code is transmitted only in predetermined frames and is not transmitted in the other frames. Frame type information indicating whether a frame is a speech frame, a non-speech frame, or a non-transmission frame in which no code is transmitted is added to the code information of each frame, and the frame type is identified from this information. For a non-speech frame or a non-transmission frame, the first non-speech code is converted into the second non-speech code in consideration of the difference in frame length between the first and second non-speech encoding schemes and the difference in their transmission control of non-speech codes. If, for a non-speech frame, the first non-speech code to be converted is not available, the second non-speech code is obtained from the speech codes of past first speech frames.
[Selected drawing] FIG. 4

Description

The present invention relates to a speech code conversion method, and more particularly to a speech code conversion method for converting a speech code, encoded by a speech encoder used in a network such as the Internet or in an automobile or mobile phone system, into a speech code of a different encoding scheme.

In recent years the number of mobile phone subscribers has grown explosively, and it is expected to keep growing. Voice communication over the Internet (Voice over IP: VoIP) has also become widespread in fields such as corporate networks and long-distance telephone services. Such voice communication systems use speech coding techniques that compress speech in order to make efficient use of the communication line, but the speech encoding scheme differs from system to system. For example, W-CDMA, which is expected to be the next-generation mobile phone system, adopts the AMR (Adaptive Multi-Rate) scheme as a worldwide common speech encoding scheme, whereas ITU-T Recommendation G.729A is widely used as the speech encoding scheme in VoIP.

With the spread of the Internet and mobile phones, the volume of voice communication between Internet users and mobile phone users is expected to increase further. However, as described above, the mobile phone network and the Internet use different speech encoding schemes and therefore cannot communicate with each other directly. For this reason, a speech code encoded in one network has conventionally had to be converted by a speech code converter into a speech code of the encoding scheme used in the other network.

・Speech code conversion
FIG. 15 shows the principle of a typical conventional speech code conversion method, referred to below as prior art 1. Only the case in which speech input by user A to terminal 1 is conveyed to terminal 2 of user B is considered. Terminal 1 of user A has only an encoder 1a of encoding scheme 1, and terminal 2 of user B has only a decoder 2a of encoding scheme 2.

Speech uttered by user A on the transmitting side is input to the encoder 1a of encoding scheme 1 built into terminal 1. The encoder 1a encodes the input speech signal into a speech code of encoding scheme 1 and sends it to transmission path 1b. When the speech code arrives via transmission path 1b, the decoder 3a of the speech code conversion unit 3 first decodes reproduced speech from the speech code of encoding scheme 1. The encoder 3b of the speech code conversion unit 3 then encodes the reproduced speech signal into a speech code of encoding scheme 2 and sends it to transmission path 2b. This speech code is input to terminal 2 through transmission path 2b, and the decoder 2a decodes reproduced speech from it, so that user B on the receiving side can hear the reproduced speech. Decoding speech that has been encoded once and then encoding the decoded speech again in this way is called a tandem connection.

As described above, prior art 1 uses a tandem connection in which the speech code encoded by speech encoding scheme 1 is first decoded into reproduced speech and then encoded again by speech encoding scheme 2, which causes marked degradation of speech quality and increased delay. As a way of solving these problems, a technique has been proposed in which the speech code is not returned to a speech signal but is instead separated into parameter codes such as an LSP code and a pitch lag code, and each parameter code is individually converted into the corresponding code of the other speech encoding scheme (see Patent Document 1). FIG. 16 shows the principle of this technique, referred to below as prior art 2.

The encoder 1a of encoding scheme 1 built into terminal 1 encodes the speech signal uttered by user A into a speech code of encoding scheme 1 and sends it to transmission path 1b. The speech code conversion unit 4 converts the speech code of encoding scheme 1 received from transmission path 1b into a speech code of encoding scheme 2 and sends it to transmission path 2b. The decoder 2a of terminal 2 decodes reproduced speech from the speech code of encoding scheme 2 received via transmission path 2b, and user B can hear this reproduced speech.

Encoding scheme 1 encodes a speech signal with (1) a first LSP code obtained by quantizing LSP parameters derived from the linear prediction coefficients (LPC coefficients) obtained by frame-by-frame linear prediction analysis, (2) a first pitch lag code specifying the output signal of an adaptive codebook for outputting a periodic excitation signal, (3) a first algebraic code (noise code) specifying the output signal of an algebraic codebook (or noise codebook) for outputting a noise-like excitation signal, and (4) a first gain code obtained by quantizing the pitch gain, which represents the amplitude of the adaptive codebook output signal, and the algebraic gain, which represents the amplitude of the algebraic codebook output signal. Encoding scheme 2 encodes a speech signal with (1) a second LSP code, (2) a second pitch lag code, (3) a second algebraic code (noise code), and (4) a second gain code, each obtained by a quantization method different from that of encoding scheme 1.

The speech code conversion unit 4 has a code separation unit 4a, an LSP code conversion unit 4b, a pitch lag code conversion unit 4c, an algebraic code conversion unit 4d, a gain code conversion unit 4e, and a code multiplexing unit 4f. The code separation unit 4a separates the speech code of encoding scheme 1, input from the encoder 1a of terminal 1 via transmission path 1b, into the codes of the components needed to reproduce the speech signal, namely (1) the LSP code, (2) the pitch lag code, (3) the algebraic code, and (4) the gain code, and inputs them to the code conversion units 4b to 4e, respectively. The code conversion units 4b to 4e convert the input LSP code, pitch lag code, algebraic code, and gain code of speech encoding scheme 1 into the LSP code, pitch lag code, algebraic code, and gain code of speech encoding scheme 2, respectively, and the code multiplexing unit 4f multiplexes the converted codes and sends them to transmission path 2b.

FIG. 17 is a block diagram of the speech code conversion unit showing the configuration of each of the code conversion units 4b to 4e; parts identical to those in FIG. 16 carry the same reference numerals. The code separation unit 4a separates LSP code 1, pitch lag code 1, algebraic code 1, and gain code 1 from the speech code of encoding scheme 1 input from the transmission path via input terminal #1, and inputs them to the code conversion units 4b to 4e, respectively.

The LSP dequantizer 4b₁ of the LSP code conversion unit 4b dequantizes LSP code 1 of encoding scheme 1 and outputs an LSP dequantized value, and the LSP quantizer 4b₂ quantizes this value using the LSP quantization table of encoding scheme 2 and outputs LSP code 2. The pitch lag dequantizer 4c₁ of the pitch lag code conversion unit 4c dequantizes pitch lag code 1 of encoding scheme 1 and outputs a pitch lag dequantized value, and the pitch lag quantizer 4c₂ quantizes this value using the pitch lag quantization table of encoding scheme 2 and outputs pitch lag code 2. The algebraic code dequantizer 4d₁ of the algebraic code conversion unit 4d dequantizes algebraic code 1 of encoding scheme 1 and outputs an algebraic code dequantized value, and the algebraic code quantizer 4d₂ quantizes this value using the algebraic code quantization table of encoding scheme 2 and outputs algebraic code 2. The gain dequantizer 4e₁ of the gain code conversion unit 4e dequantizes gain code 1 of encoding scheme 1 and outputs a gain dequantized value, and the gain quantizer 4e₂ quantizes this value using the gain quantization table of encoding scheme 2 and outputs gain code 2.
The code multiplexing unit 4f multiplexes LSP code 2, pitch lag code 2, algebraic code 2, and gain code 2 output from the quantizers 4b₂ to 4e₂ to create a speech code of encoding scheme 2, and sends it to the transmission path from output terminal #2.
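The dequantize-then-requantize step performed by each code conversion unit can be illustrated with a minimal Python sketch. It assumes scalar quantization tables; the table contents and the function names below are hypothetical and are not taken from G.729A or AMR, whose real LSP quantizers are vector quantizers.

```python
from typing import List


def dequantize(code: int, table: List[float]) -> float:
    """Look up the value that a code (table index) stands for."""
    return table[code]


def quantize(value: float, table: List[float]) -> int:
    """Return the index of the table entry closest to value."""
    return min(range(len(table)), key=lambda i: abs(table[i] - value))


def convert_code(code1: int, table1: List[float], table2: List[float]) -> int:
    """Convert a scheme-1 code to a scheme-2 code without decoding to speech."""
    return quantize(dequantize(code1, table1), table2)


# Hypothetical gain tables for the two schemes (illustrative values only).
GAIN_TABLE_1 = [0.0, 0.5, 1.0, 1.5]
GAIN_TABLE_2 = [0.0, 0.25, 0.6, 0.9, 1.2, 1.6, 2.0, 2.4]

gain_code_2 = convert_code(3, GAIN_TABLE_1, GAIN_TABLE_2)
```

The same pattern would be applied per parameter (LSP, pitch lag, algebraic code, gain), each with its own pair of tables, before the converted codes are multiplexed.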

The tandem connection of FIG. 15 (prior art 1) takes as its input the reproduced speech obtained by decoding the speech code of encoding scheme 1, and encodes and decodes it again. Speech parameters are therefore extracted from reproduced speech whose information content is far smaller than that of the original sound as a result of encoding (that is, speech information compression), so the resulting speech code is not necessarily optimal. In contrast, the apparatus of prior art 2 in FIG. 16 converts the speech code of encoding scheme 1 into the speech code of encoding scheme 2 through dequantization and quantization, and so achieves speech code conversion with far less degradation than the tandem connection of prior art 1. In addition, because the speech code never has to be decoded into speech for the conversion, the delay that was a problem in the conventional tandem connection is also reduced.

・Non-speech compression
Actual voice communication systems generally have a non-speech compression function that further improves transmission efficiency by exploiting the non-speech sections contained in a conversation. FIG. 18 shows the concept of the non-speech compression function. In human conversation, non-speech sections such as silence and background noise exist between stretches of speech. Speech information does not need to be transmitted in such sections, so the communication line can be used more efficiently. This is the basic idea of non-speech compression. However, if nothing more were done, the gaps between stretches of speech reproduced on the receiving side would be completely silent and would sound unnatural, so the receiving side normally generates natural noise that does not sound out of place (comfort noise). To generate comfort noise resembling the input signal, comfort noise information (hereinafter, CN information) must be transmitted from the transmitting side, but the amount of CN information is small compared with speech, and because the characteristics of a non-speech section change slowly, CN information does not need to be sent all the time. The amount of information transmitted can thus be greatly reduced compared with a speech section, which further improves the transmission efficiency of the communication line as a whole. The non-speech compression function is implemented by a VAD unit (Voice Activity Detection), which detects speech and non-speech sections, a DTX unit (Discontinuous Transmission), which controls the generation and transmission of CN information on the transmitting side, and a CNG unit (Comfort Noise Generator), which generates comfort noise on the receiving side.

The operating principle of the non-speech compression function is described below with reference to FIG. 19.
On the transmitting side, the input signal, divided into frames of fixed length (for example, 80 samples / 10 msec), is input to the VAD unit 5a for voice activity detection. The VAD unit 5a outputs a decision result vad_flag of 1 for a speech section and 0 for a non-speech section. In a speech section (vad_flag=1), the switches SW1 to SW4 are all set to the speech side, and the speech encoder 5b on the transmitting side and the speech decoder 6a on the receiving side encode and decode the speech signal according to an ordinary speech encoding scheme (for example, G.729A or AMR). In a non-speech section (vad_flag=0), the switches SW1 to SW4 are all set to the non-speech side; the non-speech encoder 5c on the transmitting side, under the control of a DTX unit (not shown), encodes the non-speech signal, that is, generates CN information and controls its transmission, and the non-speech decoder 6b on the receiving side, under the control of a CNG unit (not shown), performs decoding, that is, generates comfort noise.

Next, the operation of the non-speech encoder 5c and the non-speech decoder 6b is described. FIG. 20 shows their block diagrams, and FIGS. 21(a) and 21(b) show their processing flows.
The CN information generation unit 7a analyzes the input signal frame by frame and calculates the CN parameters that the CNG unit 8a on the receiving side needs to generate comfort noise (step S101). The CN parameters generally consist of information on the rough shape of the frequency characteristic and amplitude information. The DTX control unit 7b controls the switch 7c to decide, frame by frame, whether or not the obtained CN information is transmitted to the receiving side (S102). Control methods include adaptive control according to the nature of the signal and periodic control at fixed intervals. When transmission is required, the CN parameters are input to the CN quantization unit 7d, which quantizes them to generate a CN code (S103), and the CN code is transmitted to the receiving side as line data (S104). A frame in which CN information is transmitted is hereinafter called an SID (Silence Insertion Descriptor) frame. All other frames are non-transmission frames, in which nothing is transmitted (S105).
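The transmitting-side flow S101 to S105 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the method of any particular codec: only the amplitude information (log frame power) is modelled, the spectral-shape part of the CN parameters is omitted, transmission uses the periodic control option, and the names `cn_level_db`, `quantize_cn`, `step_db`, and `period` are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def cn_level_db(frame: Sequence[float]) -> float:
    """S101: amplitude information only, as mean frame power in dB."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)


def quantize_cn(level_db: float, step_db: float = 2.0, bits: int = 5) -> int:
    """S103: uniform scalar quantizer (hypothetical step size and width)."""
    code = round(level_db / step_db)
    return max(0, min((1 << bits) - 1, code))


def encode_non_speech(frames: Sequence[Sequence[float]],
                      period: int = 8) -> List[Optional[int]]:
    """Return one entry per frame: a CN code (SID frame) or None (nothing sent)."""
    line_data: List[Optional[int]] = []
    for n, frame in enumerate(frames):
        level = cn_level_db(frame)                 # S101
        if n % period == 0:                        # S102: periodic control
            line_data.append(quantize_cn(level))   # S103, S104: SID frame
        else:
            line_data.append(None)                 # S105: non-transmission frame
    return line_data


# Ten identical 80-sample frames of constant amplitude 100.
stream = encode_non_speech([[100.0] * 80] * 10)
```

With `period=8`, only frames 0 and 8 of the ten-frame example carry a CN code; an adaptive controller would replace the `n % period` test with a check on how much the CN parameters have changed.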

The CNG unit 8a on the receiving side generates comfort noise from the transmitted CN code. That is, the CN code sent from the transmitting side is input to the CN dequantization unit 8b, which dequantizes it into CN parameters (S111), and the CNG unit 8a generates comfort noise using those CN parameters (S112). In a non-transmission frame, in which no CN parameters arrive, comfort noise is generated using the most recently received CN parameters (S113).
As described above, an actual voice communication system identifies the non-speech sections in a conversation and, in those sections, intermittently transmits only the information needed for the receiving side to generate natural-sounding noise, which further improves transmission efficiency. Such a non-speech compression function is also adopted in the next-generation mobile phone network and the VoIP network mentioned above, with a different scheme used in each system.
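The receiving-side flow S111 to S113 can be sketched in the same spirit. The Python below is an assumption-laden illustration: the CN code is taken to be a single level index with a hypothetical 2 dB step, and the comfort noise is plain white Gaussian noise scaled to that level, with no spectral shaping.

```python
import random
from typing import List, Optional, Sequence


def dequantize_cn(code: int, step_db: float = 2.0) -> float:
    """S111: map a CN code back to a level in dB (hypothetical step size)."""
    return code * step_db


def comfort_noise(level_db: float, n: int = 80, seed: int = 0) -> List[float]:
    """S112/S113: white Gaussian noise whose RMS matches level_db."""
    rng = random.Random(seed)
    rms = 10.0 ** (level_db / 20.0)
    return [rms * rng.gauss(0.0, 1.0) for _ in range(n)]


def decode_non_speech(codes: Sequence[Optional[int]]) -> List[Optional[float]]:
    """Return the level used for each frame; None until a CN code has arrived."""
    last_level: Optional[float] = None
    used: List[Optional[float]] = []
    for code in codes:
        if code is not None:
            last_level = dequantize_cn(code)  # S111: new CN parameters
        # S112 for an SID frame, S113 (reuse last parameters) otherwise
        used.append(last_level)
    return used


levels = decode_non_speech([20, None, None, 18, None])
noise = comfort_noise(levels[0])
```

The key point is the held state `last_level`: non-transmission frames carry nothing, so the decoder keeps generating noise from the parameters of the last SID frame.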

Next, the non-speech compression functions used in G.729A (VoIP) and AMR (next-generation mobile phones), which are representative encoding schemes, are described. Table 1 shows the specifications of both schemes.

Both G.729A and AMR use LPC coefficients (linear prediction coefficients) and frame signal power as CN information. The LPC coefficients are parameters representing the rough shape of the frequency characteristic of the input signal, and the frame signal power is a parameter representing its amplitude characteristic. These parameters are obtained by analyzing the input signal frame by frame. The methods by which G.729A and AMR generate CN information are described below.

In G.729A, the LPC information is obtained as the average of the LPC coefficients of the past six frames including the current frame. To allow for signal variation near the SID frame, either this average or the LPC coefficients of the current frame are finally used as the CN information. The choice is made by measuring the distortion between the two sets of LPC coefficients: if the signal is judged to be varying (large distortion), the LPC coefficients of the current frame are used. The frame power information is obtained by averaging the logarithmic power of the LPC prediction residual signal over the past 0 to 3 frames including the current frame. The LPC residual signal is the signal obtained by passing the input signal through an LPC inverse filter frame by frame.
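The choice between the six-frame average and the current frame can be sketched as follows. The text does not specify the distortion measure or the decision threshold, so this Python sketch assumes a squared Euclidean distance and a caller-supplied `threshold`; both, along with the function name, are hypothetical stand-ins.

```python
from typing import List, Sequence


def select_cn_lpc(history: Sequence[Sequence[float]],
                  threshold: float) -> List[float]:
    """Pick the LPC vector to send as CN information.

    history holds one LPC coefficient vector per frame, most recent last.
    """
    recent = history[-6:]                      # current frame + up to 5 past
    order = len(recent[0])
    average = [sum(f[k] for f in recent) / len(recent) for k in range(order)]
    current = list(recent[-1])
    # Assumed distortion measure: squared Euclidean distance.
    distortion = sum((a - c) ** 2 for a, c in zip(average, current))
    # Large distortion means the signal is changing: trust the current frame.
    return current if distortion > threshold else average


steady = select_cn_lpc([[0.5, -0.25]] * 6, threshold=0.1)
changing = select_cn_lpc([[0.5, -0.25]] * 5 + [[1.5, 0.75]], threshold=0.1)
```

For a stationary background the averaged vector is returned, which smooths frame-to-frame estimation noise; after an abrupt change the current frame wins so that the comfort noise tracks the new background.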

In AMR, the LPC information is obtained as the average of the LPC coefficients of the past 8 frames including the current frame. The average is calculated in the domain in which the LPC coefficients have been converted into LSP parameters. LSP parameters are frequency-domain parameters that can be converted to and from LPC coefficients. The frame signal power information is obtained by averaging the logarithmic power of the input signal over the past 8 frames (including the current frame).
As described above, both G.729A and AMR use LPC information and frame signal power information as CN information, but they generate (calculate) it in different ways.
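The eight-frame averaging described for AMR can be sketched as below. This is an illustration only: the LSP vectors are taken as given (the LPC-to-LSP conversion is not shown), and the base-2 logarithm, the small offset guarding against log of zero, and the function name are assumptions not stated in the text.

```python
import math
from typing import List, Sequence, Tuple


def amr_style_cn_info(lsp_history: Sequence[Sequence[float]],
                      frame_history: Sequence[Sequence[float]]
                      ) -> Tuple[List[float], float]:
    """Average LSP vectors and log frame power over the last 8 frames."""
    lsps = lsp_history[-8:]
    frames = frame_history[-8:]
    order = len(lsps[0])
    mean_lsp = [sum(v[k] for v in lsps) / len(lsps) for k in range(order)]
    log_powers = [
        math.log2(sum(x * x for x in f) / len(f) + 1e-9) for f in frames
    ]
    mean_log_power = sum(log_powers) / len(log_powers)
    return mean_lsp, mean_log_power


mean_lsp, mean_log_power = amr_style_cn_info(
    [[0.25, 0.5]] * 8,      # eight identical 2nd-order LSP vectors
    [[4.0] * 160] * 8,      # eight frames of constant amplitude 4
)
```

Unlike the G.729A procedure, the power here is measured on the input signal itself rather than on the LPC prediction residual, and there is no switch between the average and the current frame.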

The CN information is quantized into a CN code and transmitted to the decoder. Table 1 shows the bit allocation of the CN codes of G.729A and AMR. G.729A quantizes the LPC information with 10 bits and the frame power information with 5 bits, whereas AMR quantizes the LPC information with 29 bits and the frame power information with 6 bits. In both cases the LPC information is converted into LSP parameters before quantization. G.729A and AMR thus also differ in the bit allocation used for quantization. FIGS. 22(a) and 22(b) show the structure of the non-speech code (CN code) in G.729A and AMR, respectively.

In G.729A, as shown in FIG. 22(a), the non-speech code is 15 bits long and consists of an LSP code I_LSPg (10 bits) and a power code I_POWg (5 bits). Each code consists of indexes (element numbers) into the codebooks of the G.729A quantizers, as follows: (1) the LSP code I_LSPg consists of the codes L_G1 (1 bit), L_G2 (5 bits), and L_G3 (4 bits), where L_G1 is the prediction coefficient switching information of the LSP quantizer and L_G2 and L_G3 are indexes into the codebooks CB_G1 and CB_G2 of the LSP quantizer; (2) the power code is an index into the codebook CB_G3 of the power quantizer.
In AMR, as shown in FIG. 22(b), the non-speech code is 35 bits long and consists of an LSP code I_LSPa (29 bits) and a power code I_POWa (6 bits). Each code consists of indexes into the codebooks of the AMR quantizers, as follows: (1) the LSP code I_LSPa consists of the codes L_A1 (3 bits), L_A2 (8 bits), L_A3 (9 bits), and L_A4 (9 bits), which are indexes into the codebooks GB_A1, GB_A2, GB_A3, and GB_A4 of the LSP quantizer; (2) the power code is an index into the codebook GB_A5 of the power quantizer.
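The two code layouts can be written down as field lists and packed into integers, which makes the 15-bit and 35-bit totals easy to check. The field names and widths below come from the description; the bit ordering (first-listed field in the most significant bits) and the helper names `pack` and `unpack` are assumptions made for this Python sketch.

```python
from typing import Dict, List, Tuple

Field = Tuple[str, int]  # (name, width in bits)

G729A_CN_FIELDS: List[Field] = [
    ("L_G1", 1), ("L_G2", 5), ("L_G3", 4),                # I_LSPg, 10 bits
    ("I_POWg", 5),
]
AMR_CN_FIELDS: List[Field] = [
    ("L_A1", 3), ("L_A2", 8), ("L_A3", 9), ("L_A4", 9),   # I_LSPa, 29 bits
    ("I_POWa", 6),
]


def pack(fields: List[Field], values: Dict[str, int]) -> int:
    """Concatenate field values, first field in the most significant bits."""
    word = 0
    for name, width in fields:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(fields: List[Field], word: int) -> Dict[str, int]:
    """Inverse of pack: split a code word back into its fields."""
    values: Dict[str, int] = {}
    for name, width in reversed(fields):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


g_values = {"L_G1": 1, "L_G2": 17, "L_G3": 9, "I_POWg": 30}
g_word = pack(G729A_CN_FIELDS, g_values)
```

Because the two layouts have different field counts and widths, a G.729A CN code cannot simply be copied into an AMR CN code; each field has to be dequantized and requantized with the other scheme's codebooks.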

・DTX control
Next, the DTX control methods are described. FIG. 23 shows the time sequence of DTX control in G.729A, and FIGS. 24 and 25 show that of AMR. G.729A DTX control is described first, with reference to FIG. 23.
In G.729A, when the VAD detects a change from a speech section (VAD_flag=1) to a non-speech section (VAD_flag=0), the first frame of the non-speech section is set as an SID frame. The SID frame is created by generating and quantizing CN information in the manner described above, and is transmitted to the receiving side. Within the non-speech section, signal variation is monitored frame by frame; only frames in which variation is detected are set as SID frames, and CN information is transmitted again in those frames. Frames judged to show no variation are set as non-transmission frames, and no information is transmitted in them. In addition, SID frames are constrained so that at least two non-transmission frames lie between them. Variation is detected by measuring how much the CN information of the current frame differs from that of the last transmitted SID frame. In G.729A, therefore, SID frames are set adaptively in response to variation in the non-speech signal.
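The adaptive rule described for G.729A can be sketched as a small state machine. The Python below reduces the CN information to a single level per frame and uses a hypothetical `threshold` for "variation detected"; the real change measure is not given in the text. The two-frame minimum gap is taken from the description.

```python
from typing import List, Optional, Sequence


def g729a_style_dtx(levels: Sequence[float],
                    threshold: float = 2.0,
                    min_gap: int = 2) -> List[str]:
    """Classify each frame of one non-speech section as 'SID' or 'NO_TX'."""
    frame_types: List[str] = []
    last_sid_level: Optional[float] = None
    untransmitted = 0  # non-transmission frames since the last SID frame
    for level in levels:
        if last_sid_level is None:
            is_sid = True  # first frame of the non-speech section
        else:
            changed = abs(level - last_sid_level) > threshold
            is_sid = changed and untransmitted >= min_gap
        if is_sid:
            frame_types.append("SID")
            last_sid_level = level
            untransmitted = 0
        else:
            frame_types.append("NO_TX")
            untransmitted += 1
    return frame_types


types_jump = g729a_style_dtx([40.0, 50.0, 50.0, 50.0])
types_flat = g729a_style_dtx([40.0, 40.5, 39.8, 40.2])
```

In `types_jump` the level changes right after the first SID frame, but the update is held back until two non-transmission frames have passed; in `types_flat` nothing is sent after the first frame.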

Next, AMR DTX control is described with reference to FIGS. 24 and 25. In AMR, as shown in FIG. 24, SID frames are not set adaptively as in G.729A but are basically set periodically, every 8 frames. At a transition to a non-speech section after a long speech section, however, hangover control is performed as shown in FIG. 25. Specifically, the 7 frames following the change point are treated as a speech section even though they belong to a non-speech section (VAD_flag=0), and normal speech encoding is performed on them. This section is called the hangover. Hangover is applied when the number of frames elapsed since the last SID frame was set (P-FRM) is 23 or more. This prevents the CN information at the change point (the start of the non-speech section) from being derived from the characteristic parameters of the speech section (the past 8 frames), and thus improves the sound quality at the transition from speech to non-speech.

Thereafter, the eighth frame is set as the first SID frame (SID_FIRST frame), but CN information is not transmitted in the SID_FIRST frame, because the decoder on the receiving side can generate CN information from the decoded signal in the hangover section. The third frame after the SID_FIRST frame is set as an SID_UPDATE frame, and CN information is transmitted here for the first time. In the rest of the non-speech section, an SID_UPDATE frame is set every 8 frames. SID_UPDATE frames are created by the method described above and transmitted to the receiving side. All other frames are set as non-transmission frames, and no CN information is transmitted in them.

As shown in FIG. 24, when the number of frames elapsed since the last SID frame was set is 23 or fewer, hangover control is not performed. In this case the frame at the change point (the first frame of the non-speech interval) is set as SID_UPDATE, but CN information is not recomputed; the CN information transmitted last is transmitted again. As described above, AMR DTX control transmits CN information under fixed control rather than adaptive control as in G.729A, and therefore applies hangover control as appropriate to handle the change point from speech to non-speech.
As shown above, the non-speech compression functions of G.729A and AMR share the same basic principle but differ in CN information generation, quantization, and DTX control.
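The AMR frame-type schedule described above can be sketched as follows. This is only an illustration under stated assumptions: the function name and the string labels are hypothetical, the text leaves the boundary case of exactly 23 elapsed frames ambiguous (the sketch assumes hangover only when more than 23 frames have elapsed), and the 8-frame SID_UPDATE spacing after a change point without hangover is assumed rather than stated.

```python
HANGOVER_FRAMES = 7      # frames still coded as speech after the change point
SID_UPDATE_PERIOD = 8    # SID_UPDATE spacing in a steady non-speech interval
FIRST_UPDATE_OFFSET = 3  # first SID_UPDATE is the 3rd frame after SID_FIRST


def amr_nonspeech_frame_types(num_frames, frames_since_last_sid, threshold=23):
    """Frame types for the first num_frames frames after a speech -> non-speech change."""
    # Assumption: hangover is used only when strictly more than `threshold`
    # frames have elapsed since the last SID frame.
    use_hangover = frames_since_last_sid > threshold
    if use_hangover:
        first_update = HANGOVER_FRAMES + 1 + FIRST_UPDATE_OFFSET  # 11th frame
    else:
        first_update = 1  # change-point frame re-sends the last CN information
    types = []
    for k in range(1, num_frames + 1):  # k counts frames from the change point
        if use_hangover and k <= HANGOVER_FRAMES:
            types.append("SPEECH")
        elif use_hangover and k == HANGOVER_FRAMES + 1:
            types.append("SID_FIRST")
        elif k >= first_update and (k - first_update) % SID_UPDATE_PERIOD == 0:
            types.append("SID_UPDATE")
        else:
            types.append("NO_DATA")
    return types
```

Calling `amr_nonspeech_frame_types(20, 30)` yields seven SPEECH frames, one SID_FIRST, two NO_DATA frames, and then SID_UPDATE frames eight frames apart.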

FIG. 26 shows the configuration of Prior Art 1 when each communication system has a non-speech compression function. In a tandem connection, as described above, the speech code of encoding scheme 1 is first decoded into a reproduced signal, which is then re-encoded by encoding scheme 2. When each system has a non-speech compression function, the VAD unit 3c of the code conversion unit 3 must, as shown in FIG. 26, make its speech/non-speech decision on a reproduced signal that has already been encoded and decoded (information-compressed) by encoding scheme 1. As a result, the decision accuracy of the VAD unit 3c drops, and misjudgments can cause problems such as clipping of the beginning of speech, degrading sound quality. One possible countermeasure is to process everything as a speech interval in encoding scheme 2, but then optimum non-speech compression cannot be performed and the transmission efficiency gain that non-speech compression is meant to provide is lost. Furthermore, in a non-speech interval the CN information of encoding scheme 2 is derived from the comfort noise generated by the decoder 1a of encoding scheme 1, so it is not necessarily optimal as CN information for generating noise that resembles the input signal.
Prior Art 2 is an excellent speech code conversion method with less sound quality degradation and less transmission delay than Prior Art 1 (tandem connection), but it does not take the non-speech compression function into account. That is, Prior Art 2 assumes that the input speech code always carries information encoded as a speech interval, so it cannot convert correctly when the non-speech compression function produces an SID frame or a non-transmission frame.

Japanese Patent Application No. 2001-75427 (Japanese Patent Laid-Open No. 2002-202799)

An object of the present invention is, in communication between two voice communication systems that use different non-speech coding methods, to convert a CN code encoded by the non-speech coding method of the transmitting side into a CN code conforming to the non-speech coding method of the receiving side without decoding the CN code into a CN signal.
Another object of the present invention is to convert the CN code of the transmitting side into the CN code of the receiving side while taking into account the difference in frame length and the difference in DTX control between the transmitting side and the receiving side.
Another object of the present invention is to realize high-quality non-speech code conversion and speech code conversion in communication between two voice communication systems that use different non-speech coding methods and speech coding methods.

The present invention is a speech code conversion method in a voice communication system in which a fixed number of samples of an input signal constitutes a frame; a first speech code, obtained by encoding the speech signal of a speech interval frame by frame with a first speech coding scheme, and a first non-speech code, obtained by encoding the non-speech signal of a non-speech interval with a first non-speech coding scheme, are transmitted intermixed from the transmitting side; the first speech code and the first non-speech code are converted into a second speech code of a second speech coding scheme and a second non-speech code of a second non-speech coding scheme, respectively; and the second speech code and the second non-speech code obtained by the conversion are transmitted intermixed to the receiving side.

In the first speech code conversion method of the present invention, a non-speech code is transmitted only in predetermined frames of a non-speech interval and is not transmitted in the other frames; frame type information indicating whether a frame is a speech frame, a non-speech frame, or a non-transmission frame in which no code is transmitted is added to the code information of each frame; the type of frame to which a code belongs is identified from the frame type information; and, for a non-speech frame or a non-transmission frame, the first non-speech code is converted into the second non-speech code taking into account the difference in frame length between the first and second non-speech coding schemes and the difference in their transmission control of non-speech codes. In addition, when the frame is a non-speech frame but the first non-speech code to be converted is not available, the second non-speech code is obtained from the speech codes of past first speech frames.

In the second speech code conversion method of the present invention, a non-speech code is transmitted only in predetermined frames of a non-speech interval and is not transmitted in the other frames; frame type information indicating whether a frame is a speech frame, a non-speech frame, or a non-transmission frame in which no code is transmitted is added to the code information of each frame; the type of frame to which a code belongs is identified from the frame type information; and, for a non-speech frame or a non-transmission frame, the first non-speech code is converted into the second non-speech code taking into account the difference in frame length between the first and second non-speech coding schemes and the difference in their transmission control of non-speech codes. In addition, when the first non-speech code is to be converted into the second non-speech code periodically, the second non-speech code is generated periodically by averaging the received first non-speech codes, regardless of whether a non-speech code is present in a given frame, and using the periodically obtained average as the second non-speech code.

In the second speech code conversion method, when the second non-speech coding scheme is one that, at a change point from a speech interval to a non-speech interval, regards n consecutive frames including the change-point frame as speech frames and transmits speech codes for them, speech codes of the second speech coding scheme for the n consecutive frames are generated using the dequantized values of the element codes obtained by dequantizing the non-speech code of the first non-speech frame, together with predetermined or random dequantized values for the other element codes, and the speech codes of the second speech coding scheme for the n frames are output.

According to the present invention, in communication between two voice communication systems that use different non-speech coding methods, a non-speech code (CN code) encoded by the non-speech coding method of the transmitting side can be converted into a non-speech code (CN code) conforming to the non-speech coding method of the receiving side without being decoded into a CN signal, so that high-quality non-speech code conversion can be realized.
According to the present invention, the non-speech code (CN code) of the transmitting side can be converted into the non-speech code (CN code) of the receiving side without being decoded into a non-speech signal, while taking into account the difference in frame length and the difference in DTX control between the transmitting side and the receiving side, so that high-quality non-speech code conversion can be realized.

FIG. 1 is a diagram explaining the principle of the present invention.
FIG. 2 is a configuration diagram of the first embodiment of non-speech code conversion according to the present invention.
FIG. 3 shows the processing frames of G.729A and AMR.
FIG. 4 shows the frame type conversion control procedure from AMR to G.729A.
FIG. 5 shows the processing flow of the power correction unit.
FIG. 6 is a configuration diagram of the second embodiment of the present invention.
FIG. 7 is a configuration diagram of the third embodiment of the present invention.
FIG. 8 is a diagram explaining conversion control in a speech interval.
FIG. 9 is a diagram explaining conversion control in a non-speech interval.
FIG. 10 is a diagram explaining conversion control in a non-speech interval (conversion control every 8 AMR frames).
FIG. 11 is a configuration diagram of the fourth embodiment of the present invention.
FIG. 12 is a configuration diagram of the speech code conversion unit in the fourth embodiment.
FIG. 13 is a diagram explaining conversion control at a speech-to-non-speech change point.
FIG. 14 is a diagram explaining conversion control at a non-speech-to-speech change point.
FIG. 15 is a diagram explaining Prior Art 1 (tandem connection).
FIG. 16 is a diagram explaining Prior Art 2.
FIG. 17 is a more detailed diagram explaining Prior Art 2.
FIG. 18 is a conceptual diagram of the non-speech compression function.
FIG. 19 is a diagram of the principle of the non-speech compression function.
FIG. 20 is a processing block diagram of the non-speech compression function.
FIG. 21 shows the processing flow of the non-speech compression function.
FIG. 22 shows the structure of the non-speech code.
FIG. 23 is a diagram explaining DTX control in G.729A.
FIG. 24 is a diagram explaining DTX control in AMR (without hangover control).
FIG. 25 is a diagram explaining DTX control in AMR (with hangover control).
FIG. 26 is a configuration diagram of the prior art when a non-speech compression function is provided.

(A) Principle of the present invention
FIG. 1 is a diagram explaining the principle of the present invention. Encoding schemes 1 and 2 are schemes based on CELP (Code Excited Linear Prediction), such as AMR and G.729A, and each is assumed to have the non-speech compression function described above. In FIG. 1, when an input signal xin is input to the encoder 51a of encoding scheme 1, the encoder 51a encodes it and outputs code data bst1. In doing so, the encoder 51a uses its non-speech compression function to encode speech and non-speech intervals according to the decision result (VAD_flag) of the VAD unit 51b. The code data bst1 therefore consists of either a speech code or a CN code. The code data bst1 also includes frame type information Ftype1 indicating whether the frame is a speech frame, an SID frame, or a non-transmission frame.

The frame type detection unit 52 detects the frame type Ftype1 from the input code data bst1 and outputs the frame type information Ftype1 to the conversion control unit 53. The conversion control unit 53 distinguishes speech intervals from non-speech intervals on the basis of Ftype1, selects the appropriate conversion process according to the result, and switches the control switches S1 and S2.
If the frame type information Ftype1 indicates an SID frame, the non-speech code conversion unit 60 is selected. In the non-speech code conversion unit 60, the code data bst1 is first input to the code separation unit 61, which separates bst1 into the element CN codes of encoding scheme 1. The element CN codes are input to the CN code conversion units 62₁ to 62ₙ, each of which converts its element CN code directly into an element CN code of encoding scheme 2 without decoding it into CN information. The code multiplexing unit 63 multiplexes the converted element CN codes and inputs the result to the decoder 54 of encoding scheme 2 as the non-speech code bst2 of encoding scheme 2.

If the frame type information Ftype1 indicates a non-transmission frame, no conversion is performed. In this case the non-speech code bst2 contains only the frame type information of a non-transmission frame.
If Ftype1 indicates a speech frame, the speech code conversion unit 70, configured according to Prior Art 1 or Prior Art 2, is selected. The speech code conversion unit 70 performs speech code conversion according to Prior Art 1 or Prior Art 2 and outputs code data bst2 consisting of the speech code of encoding scheme 2.
Because the frame type information Ftype1 is included in the speech code, the frame type can be identified simply by referring to it. The encoding scheme converter therefore needs no VAD unit, and misjudgment between speech and non-speech intervals is eliminated.
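The switching performed by the conversion control unit 53 can be pictured as a dispatch on Ftype1. The following is a minimal sketch, not the patented implementation: the function name, the string labels, and the two converter callables are hypothetical stand-ins for the speech code conversion unit 70 and the non-speech code conversion unit 60.

```python
def convert_frame(ftype1, payload, convert_speech, convert_cn):
    """Route one frame of scheme-1 code data by its frame type.

    Returns (ftype2, payload2); no VAD decision is made here because
    the frame type travels with the code data.
    """
    if ftype1 == "SPEECH":
        # Speech frame: hand over to the speech code converter (unit 70).
        return ("SPEECH", convert_speech(payload))
    if ftype1 == "SID":
        # SID frame: convert the CN code directly (unit 60).
        return ("SID", convert_cn(payload))
    if ftype1 == "NO_DATA":
        # Non-transmission frame: only the frame type is forwarded.
        return ("NO_DATA", None)
    raise ValueError("unknown frame type: %r" % (ftype1,))
```

For example, `convert_frame("NO_DATA", None, f, g)` returns `("NO_DATA", None)` without calling either converter.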

In addition, because the CN code of encoding scheme 1 is converted directly into the CN code of encoding scheme 2 without first being returned to a decoded signal (CN signal), the receiving side obtains CN information that is optimal for the input signal. Natural background noise can thus be reproduced without sacrificing the transmission efficiency gain of the non-speech compression function.
Normal code conversion can also be performed on SID frames and non-transmission frames in addition to speech frames. This enables code conversion between different speech coding schemes that have a non-speech compression function.
Furthermore, code conversion between two speech coding schemes with different non-speech/speech compression functions becomes possible while preserving the transmission efficiency gain of the non-speech compression function and suppressing quality degradation and transmission delay, which is a significant benefit.

(B) First embodiment
FIG. 2 is a configuration diagram of the first embodiment of non-speech code conversion according to the present invention, showing an example in which AMR is used as encoding scheme 1 and G.729A as encoding scheme 2. In FIG. 2, the line data of the n-th frame, that is, the speech code bst1(n), is input to terminal 1 from an AMR encoder (not shown). The frame type detection unit 52 extracts the frame type information Ftype1(n) contained in the line data bst1(n) and outputs it to the conversion control unit 53. The AMR frame type information Ftype1(n) takes one of four values: speech frame (SPEECH), SID frame (SID_FIRST), SID frame (SID_UPDATE), and non-transmission frame (NO_DATA) (see FIGS. 24 and 25). The non-speech code conversion unit 60 performs CN code conversion control according to Ftype1(n).

This CN code conversion control must take into account the difference in frame length between AMR and G.729A. As shown in FIG. 3, the AMR frame length is 20 ms, whereas the G.729A frame length is 10 ms. The conversion process therefore converts one AMR frame (the n-th frame) into two G.729A frames (the m-th and (m+1)-th frames). FIG. 4 shows the frame type conversion control procedure from AMR to G.729A. Each case is described in turn below.

(a) When Ftype1(n) = SPEECH
As shown in FIG. 4(a), when Ftype1(n) = SPEECH, the control switches S1 and S2 in FIG. 2 are switched to terminal 2, and code conversion is performed by the speech code conversion unit 70.
(b) When Ftype1(n) = SID_UPDATE
Next, the case of Ftype1(n) = SID_UPDATE is described. As shown in FIG. 4(b-1), when an AMR frame is a SID_UPDATE frame, the m-th frame of G.729A is set as an SID frame and CN code conversion is performed. That is, the switches in FIG. 2 are switched to terminal 3, and the non-speech code conversion unit 60 converts the AMR CN code bst1(n) into the CN code bst2(m) of the m-th G.729A frame. Since, as explained with reference to FIG. 23, G.729A never sets SID frames consecutively, the following (m+1)-th frame is set as a non-transmission frame. The operation of each CN element code conversion unit (the LSP conversion unit 62₁ and the frame power conversion unit 62₂) is described below.

First, when the CN code bst1(n) is input to the code separation unit 61, the code separation unit 61 separates it into an LSP code I_LSP1(n) and a frame power code I_POW1(n). I_LSP1(n) is input to the LSP dequantizer 81, which has the same quantization table as AMR, and I_POW1(n) is input to the frame power dequantizer 91, which also has the same quantization table as AMR.

The LSP dequantizer 81 dequantizes the input LSP code I_LSP1(n) and outputs the AMR LSP parameter LSP1(n). This dequantized LSP parameter LSP1(n) is input unchanged to the LSP quantizer 82 as the LSP parameter LSP2(m) of the m-th G.729A frame. The LSP quantizer 82 quantizes LSP2(m) and outputs the G.729A LSP code I_LSP2(m). The quantization method of the LSP quantizer 82 is arbitrary, but the quantization table it uses is the same as the one used in G.729A.

The frame power dequantizer 91 dequantizes the input frame power code I_POW1(n) and outputs the AMR frame power parameter POW1(n). As shown in Table 1, the frame power parameters of AMR and G.729A are computed in different signal domains: AMR uses the input signal domain, whereas G.729A uses the LPC residual signal domain. The frame power correction unit 92 therefore corrects the AMR parameter POW1(n) into the LPC residual signal domain, following the procedure described later, so that it can be used in G.729A. In this way the frame power correction unit 92 takes POW1(n) as input and outputs the G.729A frame power parameter POW2(m). The frame power quantizer 93 quantizes POW2(m) and outputs the G.729A frame power code I_POW2(m). The quantization method of the frame power quantizer 93 is arbitrary, but the quantization table it uses is the same as the one used in G.729A.
The code multiplexing unit 63 multiplexes I_LSP2(m) and I_POW2(m) and outputs the result as the G.729A CN code bst2(m). Because the (m+1)-th frame is set as a non-transmission frame, no conversion is performed for it; bst2(m+1) therefore contains only frame type information indicating a non-transmission frame.
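The SID_UPDATE path through units 81, 82, 91, 92, and 93 can be sketched as a dequantize, correct, requantize chain. This is an illustration under assumptions: the quantization tables passed in are toy stand-ins rather than the real AMR or G.729A tables, nearest-neighbour search is just one possible choice for the "arbitrary" quantization method, and the `correct_power(pow1, lsp1)` callback signature is hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(value, table, dist):
    """Index of the table entry closest to value under dist."""
    return min(range(len(table)), key=lambda i: dist(table[i], value))


def convert_sid_update(i_lsp1, i_pow1, amr_lsp_table, amr_pow_table,
                       g729a_lsp_table, g729a_pow_table, correct_power):
    """Convert AMR CN element codes to G.729A CN element codes."""
    lsp1 = amr_lsp_table[i_lsp1]        # LSP dequantizer 81
    lsp2 = lsp1                         # reused unchanged as LSP2(m)
    i_lsp2 = nearest_index(lsp2, g729a_lsp_table, sq_dist)  # LSP quantizer 82
    pow1 = amr_pow_table[i_pow1]        # frame power dequantizer 91
    pow2 = correct_power(pow1, lsp1)    # frame power correction unit 92
    i_pow2 = nearest_index(pow2, g729a_pow_table,
                           lambda a, b: (a - b) ** 2)       # quantizer 93
    return i_lsp2, i_pow2               # multiplexed into bst2(m) by unit 63
```

The two returned indices correspond to I_LSP2(m) and I_POW2(m); the (m+1)-th frame needs no call because it carries only a non-transmission frame type.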

(c) When Ftype1(n) = NO_DATA
When the frame type information Ftype1(n) = NO_DATA, both the m-th and (m+1)-th frames are set as non-transmission frames, as shown in FIG. 4(c). In this case no conversion is performed, and bst2(m) and bst2(m+1) contain only frame type information indicating a non-transmission frame.
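The frame type conversion of FIG. 4 amounts to a lookup from one 20 ms AMR frame to two 10 ms G.729A frames. The table below is a sketch: the G.729A labels ("SPEECH", "SID", "NO_DATA") and the function name are hypothetical, mapping a SPEECH frame to two speech frames is assumed from the 2:1 frame ratio, and the SID_FIRST row follows FIG. 4(b-2), which is treated in the second embodiment.

```python
# One AMR frame type -> (type of G.729A frame m, type of G.729A frame m+1)
AMR_TO_G729A = {
    "SPEECH": ("SPEECH", "SPEECH"),     # case (a)
    "SID_UPDATE": ("SID", "NO_DATA"),   # case (b-1): no consecutive SID frames
    "SID_FIRST": ("SID", "NO_DATA"),    # case (b-2)
    "NO_DATA": ("NO_DATA", "NO_DATA"),  # case (c)
}


def map_frame_types(amr_types):
    """Expand a sequence of AMR frame types into G.729A frame types."""
    out = []
    for t in amr_types:
        out.extend(AMR_TO_G729A[t])
    return out
```

The output sequence is always twice as long as the input, reflecting the 20 ms versus 10 ms frame lengths.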

(d) Frame power correction method
In this subsection, POW1 and E1 denote G.729A quantities and POW2 and E2 denote AMR quantities; this differs from the indexing used in (b), where the subscript 1 refers to AMR.
The logarithmic power POW1 of G.729A is calculated from the following equation:
POW1 = 20 log₁₀ E1   (1)
Here, E1 is the power given by Equation (2), which is computed from err(n) (n = 0, ..., N₁−1; N₁ is the G.729A frame length, 80 samples). err(n) is the LPC residual signal, obtained by Equation (3) from the input signal s(n) (n = 0, ..., N₁−1) and the LPC coefficients α_i (i = 1, ..., 10) derived from s(n).

The logarithmic power POW2 of AMR, on the other hand, is calculated from the following equation:
POW2 = log₂ E2   (4)
Here, E2 is the power given by Equation (5), which is computed from the input signal s(n), and N₂ is the AMR frame length (160 samples).
As is clear from Equations (2) and (5), G.729A and AMR compute the powers E1 and E2 from signals in different domains: the residual err(n) and the input signal s(n), respectively. A power correction unit that converts between the two is therefore required. The correction method is arbitrary; one possible method is as follows.

・Correction from G.729A to AMR
FIG. 5(a) shows the processing flow. First, the power E1 is obtained from the G.729A logarithmic power POW1:
E1 = 10^(POW1/20)   (6)
Next, a pseudo LPC residual signal d_err(n) (n = 0, ..., N₁−1) whose power is E1 is generated by the following equation:
d_err(n) = E1 · q(n)   (7)
Here, q(n) (n = 0, ..., N₁−1) is a random noise signal whose power is normalized to 1. d_err(n) is passed through the LPC synthesis filter to generate a pseudo signal in the input signal domain, d_s(n) (n = 0, ..., N₁−1). The filter coefficients α_i (i = 1, ..., 10) are the G.729A LPC coefficients obtained from the dequantized LSP values, and the initial values d_s(−i) (i = 1, ..., 10) are set to 0. The power of d_s(n) is then calculated and used as the AMR power E2, from which the AMR logarithmic power POW2 is obtained via the relation of Equation (4).

・Correction from AMR to G.729A
FIG. 5(b) shows the processing flow. First, the power E2 is obtained from the AMR logarithmic power POW2:
E2 = 2^POW2   (10)
Next, a pseudo input signal d_s(n) (n = 0, ..., N₂−1) whose power is E2 is generated by the following equation:
d_s(n) = E2 · q(n)   (11)
Here, q(n) is a random noise signal whose power is normalized to 1. d_s(n) is passed through the LPC inverse synthesis filter to generate a pseudo signal in the LPC residual signal domain, d_err(n) (n = 0, ..., N₂−1). The filter coefficients α_i (i = 1, ..., 10) are the AMR LPC coefficients obtained from the dequantized LSP values, and the initial values d_s(−i) (i = 1, ..., 10) are set to 0. The power of d_err(n) is then calculated and used as the G.729A power E1, from which the G.729A logarithmic power POW1 is obtained via the relation of Equation (1).
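The AMR-to-G.729A correction can be sketched in a few lines. Several details below are assumptions, because the corresponding equations are not reproduced in this text: "power" is taken to be the root-mean-square value of the frame (so that scaling unit-power noise by E2 gives a signal of power E2), the inverse filter is taken as A(z) = 1 + Σ α_i z^(−i), and Gaussian noise is used for q(n). The function name and parameters are hypothetical.

```python
import math
import random


def amr_to_g729a_log_power(pow2_amr, lpc, n2=160, seed=0):
    """Map an AMR log power POW2 to a G.729A log power POW1 (sketch)."""
    e2 = 2.0 ** pow2_amr                            # Eq. (10)
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n2)]
    rms = math.sqrt(sum(x * x for x in noise) / n2)
    q = [x / rms for x in noise]                    # noise normalized to power 1
    d_s = [e2 * x for x in q]                       # Eq. (11)
    # LPC inverse filtering with zero initial state d_s(-i) = 0.
    # Assumed sign convention: err(n) = s(n) + sum_i alpha_i * s(n - i).
    d_err = []
    for n in range(n2):
        acc = d_s[n]
        for i, a in enumerate(lpc, start=1):
            if n - i >= 0:
                acc += a * d_s[n - i]
        d_err.append(acc)
    e1 = math.sqrt(sum(x * x for x in d_err) / n2)  # assumed power measure (RMS)
    return 20.0 * math.log10(e1)                    # Eq. (1)
```

With all LPC coefficients set to zero the inverse filter is the identity, so E1 equals E2 and the function only changes the logarithmic scale; with real coefficients the result also reflects the spectral envelope, which is the purpose of the correction.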

(e) Effects of the first embodiment
As described above, according to the first embodiment, the LSP code and frame power code that make up the AMR CN code can be converted directly into the G.729A CN code. Moreover, by switching between the speech code conversion unit 70 and the non-speech code conversion unit 60, code data (speech codes and non-speech codes) from AMR with its non-speech compression function can be converted correctly into code data of G.729A with its non-speech compression function, without first being decoded into reproduced speech.

(C) Second embodiment
FIG. 6 is a configuration diagram of the second embodiment of the present invention; parts identical to those of the first embodiment in FIG. 2 carry the same reference numerals. As in the first embodiment, AMR is used as encoding scheme 1 and G.729A as encoding scheme 2. The second embodiment implements the conversion process for the case where the AMR frame type detected by the frame type detection unit 52 is Ftype1(n) = SID_FIRST.
As shown in FIG. 4(b-2), when an AMR frame is a SID_FIRST frame, conversion can be performed as for the SID_UPDATE frame of the first embodiment (FIG. 4(b-1)), by setting the m-th G.729A frame as an SID frame and the (m+1)-th frame as a non-transmission frame. However, as explained with reference to FIG. 25, an AMR SID_FIRST frame carries no CN code because of hangover control, and this must be taken into account. That is, in the configuration of the first embodiment shown in FIG. 2, bst1(n) is not sent, so the G.729A CN parameters LSP2(m) and POW2(m) cannot be obtained as things stand.

The second embodiment therefore calculates these parameters from the information in the 7 speech frames transmitted immediately before the SID_FIRST frame. The conversion process is described below.
LSP2(m) in the SID_FIRST frame is calculated as the average of the LSP parameters OLD_LSP(l) (l = n−1, ..., n−7) of the past 7 frames, which are output from the LSP dequantization unit 4b₁ (see FIG. 17) of the LSP code conversion unit 4b in the speech code conversion unit 70. Accordingly, the LSP buffer unit 83 always holds the LSP parameters of the 7 frames preceding the current frame, and the LSP average calculation unit 84 calculates and holds the average of the LSP parameters OLD_LSP(l) (l = n−1, ..., n−7) of those 7 frames.

POW2(m) is likewise calculated as the average of the frame powers OLD_POW(l) (l = n-1, ..., n-7) of the past seven frames. OLD_POW(l) is obtained as the frame power of the excitation signal EX(l) generated by the gain code conversion unit 4e (see FIG. 17) in the speech code conversion unit 70. Accordingly, the power calculation unit 94 calculates the frame power of the excitation signal EX(l), the frame power buffer unit 95 always holds the frame powers OLD_POW(l) of the seven frames preceding the current frame, and the power average value calculation unit 96 calculates and holds their average.

When the frame type in a non-speech interval is not SID_FIRST, the conversion control unit 53 notifies the LSP quantizer 82 and the frame power quantizer 93 of that fact. They then use the LSP parameter and the frame power parameter output from the LSP inverse quantizer 81 and the frame power inverse quantizer 91 to obtain and output the G.729A LSP code I_LSP2(m) and frame power code I_POW2(m).
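The seven-frame buffering and averaging performed by units 83, 84 and 94 to 96 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `SpeechHistory` is hypothetical, and the frame power is assumed here to be the mean square of the excitation samples, since the text does not give the formula.

```python
from collections import deque

HISTORY = 7  # number of past speech frames kept, as stated in the text


class SpeechHistory:
    """Holds the last seven LSP vectors and frame powers (hypothetical helper)."""

    def __init__(self):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self.lsp = deque(maxlen=HISTORY)
        self.power = deque(maxlen=HISTORY)

    def push(self, lsp_vector, excitation):
        # Assumption: frame power = mean square of the excitation signal EX(l).
        power = sum(x * x for x in excitation) / len(excitation)
        self.lsp.append(list(lsp_vector))
        self.power.append(power)

    def average_lsp(self):
        # Element-wise mean over the buffered LSP vectors.
        n = len(self.lsp)
        return [sum(column) / n for column in zip(*self.lsp)]

    def average_power(self):
        return sum(self.power) / len(self.power)
```

Calling `push` once per decoded speech frame keeps the averages ready at the moment a SID_FIRST frame arrives.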

However, when the frame type in a non-speech interval is SID_FIRST, that is, when Ftype1(n) = SID_FIRST, the conversion control unit 53 notifies the quantizers of that fact. The LSP quantizer 82 and the frame power quantizer 93 then use the average LSP parameter and the average frame power parameter of the past seven frames, held in the LSP average value calculation unit 84 and the power average value calculation unit 96, to obtain and output the G.729A LSP code I_LSP2(m) and frame power code I_POW2(m).

The code multiplexing unit 63 multiplexes the LSP code I_LSP2(m) and the frame power code I_POW2(m) and outputs them as bst2(m).

No conversion processing is performed for the (m+1)-th frame; bst2(m+1) is sent containing only the frame type information that indicates a non-transmission frame.
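The choice between the dequantized CN parameters and the seven-frame averages can be illustrated with a short sketch. The function name and the dictionary layout are hypothetical, and quantization with the G.729A tables is left out for brevity; the sketch only shows which parameter source feeds the m-th frame and that the (m+1)-th frame carries the frame type alone.

```python
def convert_amr_nonspeech_frame(ftype1, decoded_cn, history_avg):
    """Return the G.729A frames m and m+1 for one AMR SID frame.

    decoded_cn and history_avg are (lsp, power) pairs. They stand in for
    the outputs of units 81/91 and units 84/96 respectively (assumption).
    """
    if ftype1 == "SID_FIRST":
        # No CN code arrived: fall back to the averages of past speech frames.
        lsp2, pow2 = history_avg
    elif ftype1 == "SID_UPDATE":
        # A CN code arrived: use its dequantized parameters.
        lsp2, pow2 = decoded_cn
    else:
        raise ValueError("not a SID frame: " + ftype1)

    frame_m = {"ftype": "SID", "lsp": lsp2, "pow": pow2}
    frame_m1 = {"ftype": "NO_DATA"}  # frame type information only
    return frame_m, frame_m1
```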

As described above, according to the second embodiment, even when the CN code to be converted cannot be obtained because of AMR hangover control, the CN parameters can be derived from the speech parameters of past speech frames, and a G.729A CN code can be generated.

(D) Third Embodiment
FIG. 7 is a block diagram of the third embodiment of the present invention; parts identical to those of the first embodiment carry the same reference numerals. The third embodiment is an example in which G.729A is used as encoding scheme 1 and AMR as encoding scheme 2.

In FIG. 7, the line data of the m-th frame, that is, the speech code bst1(m), is input to terminal 1 from a G.729A encoder (not shown). The frame type detection unit 52 extracts the frame type Ftype(m) contained in bst1(m) and outputs it to the conversion control unit 53. G.729A has three frame types Ftype(m): speech frame (SPEECH), SID frame (SID), and non-transmission frame (NO_DATA) (see FIG. 23). Based on the frame type, the conversion control unit 53 identifies speech and non-speech intervals and switches the control switches S1 and S2.

In a non-speech interval, the non-speech code conversion unit 60 controls CN code conversion according to the frame type information Ftype(m). As in the first embodiment, the difference in frame length between AMR and G.729A must be taken into account: two G.729A frames (the m-th and (m+1)-th frames) are converted as one AMR frame (the n-th frame). In converting from G.729A to AMR, the conversion processing must also be controlled with the differences in DTX control taken into account.

As shown in FIG. 8, when Ftype1(m) and Ftype1(m+1) are both speech frames (SPEECH), the n-th AMR frame is also set as a speech frame. That is, the control switches S1 and S2 in FIG. 7 are switched to terminals 2 and 4, and the speech code conversion unit 70 converts the speech code according to prior art 2.

As shown in FIG. 9, when Ftype1(m) and Ftype1(m+1) are both non-transmission frames (NO_DATA), the n-th AMR frame is also set as a non-transmission frame and no conversion is performed. That is, the control switches S1 and S2 in FIG. 7 are switched to terminals 3 and 5, and the code multiplexing unit 63 sends only the frame type information of the non-transmission frame. Consequently, bst2(n) contains only the frame type information indicating a non-transmission frame.

Next, the CN code conversion method in a non-speech interval, shown in FIG. 10, is described. FIG. 10 shows the CN code conversion method in a non-speech interval over time. In a non-speech interval, the switches S1 and S2 in FIG. 7 are switched to terminals 3 and 5, and the non-speech code conversion unit 60 converts the CN code. This conversion must take into account the differences in DTX control between G.729A and AMR. SID frame transmission in G.729A is adaptive: SID frames are set at irregular intervals according to variations in the CN information (non-speech signal). In AMR, by contrast, a SID frame (SID_UPDATE) is set periodically, once every 8 frames. In a non-speech interval, therefore, as shown in FIG. 10, a conversion to a SID frame (SID_UPDATE) is made once every 8 frames (equivalent to 16 G.729A frames) to match the destination AMR, regardless of the frame type (SID or NO_DATA) of the source G.729A frames. The other seven frames are converted so that they become non-transmission frames (NO_DATA).
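The periodic schedule can be sketched as below. This is an illustration under two assumptions that the text does not fix: AMR frame 0 is taken to be the first SID_UPDATE of the non-speech interval, and AMR frame 0 is aligned with G.729A frame 0. Both function names are hypothetical.

```python
SID_PERIOD = 8  # AMR frames between SID_UPDATE frames, as stated in the text


def amr_nonspeech_types(num_frames):
    """Frame types for consecutive AMR frames in a non-speech interval.

    Assumes frame 0 is the first SID_UPDATE of the interval.
    """
    return ["SID_UPDATE" if n % SID_PERIOD == 0 else "NO_DATA"
            for n in range(num_frames)]


def g729a_frames_for(n):
    """Indices of the two G.729A frames covered by AMR frame n."""
    return 2 * n, 2 * n + 1
```

Note that the schedule depends only on the AMR frame index, never on the G.729A frame types, which is the point made in the paragraph above.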

Specifically, in the conversion to the SID_UPDATE frame at the n-th AMR frame in FIG. 10, an average is computed from the CN parameters of the SID frames received during the past 16 frames including the current frames (the m-th and (m+1)-th), that is, the (m-14)-th through (m+1)-th frames (equivalent to 8 AMR frames), and is converted to the CN parameters of the AMR SID_UPDATE frame. The conversion processing is described with reference to FIG. 7.

When a G.729A SID frame is received at the k-th frame, the code separation unit 61 separates the CN code bst1(k) into the LSP code I_LSP1(k) and the frame power code I_POW1(k). I_LSP1(k) is input to the LSP inverse quantizer 81, which has the same quantization table as G.729A, and I_POW1(k) is input to the frame power inverse quantizer 91, which also has the same quantization table as G.729A. The LSP inverse quantizer 81 dequantizes I_LSP1(k) and outputs the G.729A LSP parameter LSP1(k). The frame power inverse quantizer 91 dequantizes I_POW1(k) and outputs the G.729A frame power parameter POW1(k).

As shown in Table 1, the frame power parameters of G.729A and AMR are computed in different signal domains: G.729A uses the LPC residual signal domain, whereas AMR uses the input signal domain. The frame power correction unit 92 therefore corrects the G.729A parameter POW1(k), which is in the LPC residual signal domain, to the input signal domain so that it can be used in AMR. As a result, the frame power correction unit 92 receives POW1(k) and outputs the AMR frame power parameter POW2(k).

The resulting LSP1(k) and POW2(k) are input to the buffer units 85 and 97, respectively. Here k = m-14, ..., m+1, so the buffer units 85 and 97 hold the CN parameters of the SID frames received in the past 16 frames. If no SID frame was received in the past 16 frames, the CN parameters of the most recently received SID frame are used.

The average value calculation units 86 and 98 calculate the averages of the buffered data and output them as the AMR CN parameters LSP2(n) and POW2(n). The LSP quantizer 82 quantizes LSP2(n) and outputs the AMR LSP code I_LSP2(n). The quantization method of the LSP quantizer 82 is arbitrary, but the quantization table is the same as the one used in AMR. The frame power quantizer 93 quantizes POW2(n) and outputs the AMR frame power code I_POW2(n). Its quantization method is likewise arbitrary, but the quantization table is the same as the one used in AMR. The code multiplexing unit 63 multiplexes I_LSP2(n) and I_POW2(n), adds the frame type information (= U), and outputs the result as bst2(n).
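The averaging over the 16-frame window, including the fallback to the most recently received SID frame, might look like the sketch below. The function name and the representation of a window as a list with `None` for frames that carried no SID are assumptions made for illustration; the power values are taken to be already corrected to the input signal domain.

```python
WINDOW = 16  # G.729A frames covered by one AMR SID_UPDATE period


def average_cn(window_frames, last_sid):
    """Average the CN parameters of the SID frames inside the window.

    window_frames: WINDOW entries, each None (no SID received in that
                   frame) or an (lsp, pow2) pair.
    last_sid:      (lsp, pow2) of the most recently received SID frame,
                   used when the window contains no SID frame.
    """
    received = [f for f in window_frames if f is not None]
    if not received:
        return last_sid
    n = len(received)
    lsp_avg = [sum(col) / n for col in zip(*(lsp for lsp, _ in received))]
    pow_avg = sum(p for _, p in received) / n
    return lsp_avg, pow_avg
```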

As described above, according to the third embodiment, when CN code conversion in a non-speech interval is performed periodically to match the DTX control of the destination AMR, regardless of the frame type of the source G.729A, the AMR CN code can be generated by using the average of the G.729A CN parameters received up to the time of conversion as the AMR CN parameters.

In addition, by switching between the speech code conversion unit and the CN code conversion unit, G.729A code data (speech codes and non-speech codes) with a non-speech compression function can be converted correctly into AMR code data with a non-speech compression function without first being decoded into reproduced speech.

(E) Fourth Embodiment
FIG. 11 is a block diagram of the fourth embodiment of the present invention; parts identical to those of the third embodiment in FIG. 7 carry the same reference numerals. FIG. 12 is a block diagram of the speech code conversion unit 70 in the fourth embodiment. As in the third embodiment, G.729A is used as encoding scheme 1 and AMR as encoding scheme 2. The fourth embodiment implements CN code conversion at the transition point from a speech interval to a non-speech interval.

FIG. 13 shows the conversion control method over time. When the m-th G.729A frame is a speech frame and the (m+1)-th frame is a SID frame, that point is a transition from a speech interval to a non-speech interval. AMR performs hangover control at such a transition point. However, hangover control is not performed when the number of AMR frames elapsed between the last conversion to a SID_UPDATE frame and the interval-change frame is 23 or fewer. The case where more than 23 frames have elapsed and hangover control is performed is described below.

When hangover control is performed, seven frames from the transition-point frame (the n-th, ..., (n+7)-th frames) must be set as speech frames even though they are non-speech frames. Accordingly, as shown in FIG. 13(a), the (m+1)-th through (m+13)-th G.729A frames are treated as speech frames and converted to match the DTX control of the destination AMR, even though they are non-speech frames (SID frames or non-transmission frames). The conversion processing is described below with reference to FIGS. 11 and 12.

At the transition point from a speech interval to a non-speech interval, the only way to convert G.729A frames into AMR speech frames is to use the speech code conversion unit 70. After the transition point, however, the G.729A side consists of non-speech frames, so the G.729A speech parameters that the speech code conversion unit 70 takes as input (LSP, pitch lag, algebraic code, pitch gain, and algebraic code gain) are not available as is. Therefore, as shown in FIG. 12, the LSP and the algebraic code gain are replaced by the CN parameters LSP1(k) and POW1(k) (k < n) last received by the non-speech code conversion unit 60. The other parameters (pitch lag lag(m), pitch gain Ga(m), and algebraic code code(m)) are generated arbitrarily by the pitch lag generation unit 101, the algebraic code generation unit 102, and the pitch gain generation unit 103, to the extent that they have no audible adverse effect. They may be generated randomly or from fixed values. For the pitch gain, however, it is desirable to set the minimum value (0.2).
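A rough sketch of this parameter substitution follows. Only the pitch gain value 0.2 and the reuse of LSP1(k) and POW1(k) come from the text; the pitch lag range, the number of pulses, the subframe length, and the pulse representation are placeholder assumptions chosen for illustration, and the function name is hypothetical.

```python
import random

MIN_PITCH_GAIN = 0.2          # minimum value recommended in the text
LAG_MIN, LAG_MAX = 20, 143    # assumed pitch lag range (illustrative)
NUM_PULSES = 4                # assumed pulses per subframe (illustrative)
SUBFRAME_LEN = 40             # assumed subframe length in samples (illustrative)


def substitute_speech_params(last_lsp1, last_pow1, rng=None):
    """Build stand-in speech parameters for a non-speech frame."""
    rng = rng or random.Random(0)
    return {
        # Taken from the CN parameters last received by unit 60.
        "lsp": list(last_lsp1),
        "algebraic_gain": last_pow1,
        # Generated freely; the text allows random or fixed values.
        "pitch_lag": rng.randint(LAG_MIN, LAG_MAX),
        "pitch_gain": MIN_PITCH_GAIN,
        "algebraic_code": [(rng.randrange(SUBFRAME_LEN), rng.choice((-1, 1)))
                           for _ in range(NUM_PULSES)],
    }
```

Pinning the pitch gain at its minimum keeps the adaptive codebook contribution small, so the randomly chosen pitch lag has little audible effect.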

In a speech interval and at a switch from a speech interval to a non-speech interval, the speech code conversion unit 70 operates as follows.

In a speech interval, the code separation unit 71 separates the LSP code I_LSP1(m), the pitch lag code I_LAG1(m), the algebraic code I_CODE1(m), and the gain code I_GAIN1(m) from the input G.729A speech code and inputs them to the LSP inverse quantizer 72a, the pitch lag inverse quantizer 73a, the algebraic code inverse quantizer 74a, and the gain inverse quantizer 75a, respectively. Also in a speech interval, the switching units 77a to 77e select, as instructed by the conversion control unit 53, the outputs of the LSP inverse quantizer 72a, the pitch lag inverse quantizer 73a, the algebraic code inverse quantizer 74a, and the gain inverse quantizer 75a.

The LSP inverse quantizer 72a dequantizes the G.729A LSP code and outputs an LSP dequantized value, and the LSP quantizer 72b quantizes that value using the AMR LSP quantization table and outputs the LSP code I_LSP2(n). The pitch lag inverse quantizer 73a dequantizes the G.729A pitch lag code and outputs a pitch lag dequantized value, and the pitch lag quantizer 73b quantizes that value using the AMR pitch lag quantization table and outputs the pitch lag code I_LAG2(n). The algebraic code inverse quantizer 74a dequantizes the G.729A algebraic code and outputs an algebraic code dequantized value, and the algebraic code quantizer 74b quantizes that value using the AMR algebraic code quantization table and outputs the algebraic code I_CODE2(n). The gain inverse quantizer 75a dequantizes the G.729A gain code and outputs a pitch gain dequantized value Ga and an algebraic gain dequantized value Gc. The pitch gain quantizer 75b quantizes Ga using the AMR pitch gain quantization table and outputs the pitch gain code I_GAIN2a(n), and the algebraic gain quantizer 75c quantizes Gc using the AMR gain quantization table and outputs the algebraic gain code I_GAIN2c(n).
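Each of these conversions follows the same pattern: look the source code up in the source scheme's table, then search the destination scheme's table for the closest entry. The sketch below shows that pattern for a scalar parameter. The tables are toy values, not the real G.729A or AMR tables, and nearest-neighbour search is only one possible quantization method.

```python
def dequantize(code, table):
    """Map a code (table index) back to its parameter value."""
    return table[code]


def quantize(value, table):
    """Return the index of the table entry closest to value."""
    return min(range(len(table)), key=lambda i: abs(table[i] - value))


def convert_element_code(code1, table1, table2):
    """Convert one element code from scheme 1 to scheme 2 without
    decoding the signal itself."""
    return quantize(dequantize(code1, table1), table2)


# Toy tables for illustration only.
TABLE_SRC = [0.0, 1.0, 2.0, 3.0]
TABLE_DST = [0.0, 1.5, 3.0]
```

With these toy tables, `convert_element_code(2, TABLE_SRC, TABLE_DST)` maps the value 2.0 to index 1, the destination entry 1.5.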

The code multiplexing unit 76 multiplexes the LSP code, pitch lag code, algebraic code, pitch gain code, and algebraic gain code output from the quantizers 72b to 75b and 75c, adds the frame type information (= S), and creates and sends out an AMR speech code.

In a speech interval, the above operation is repeated, converting G.729A speech codes into AMR speech codes for output.

At a switch from a speech interval to a non-speech interval, on the other hand, if hangover control is to be performed, the switching unit 77a, as instructed by the conversion control unit 53, selects the LSP parameter LSP1(k) obtained from the LSP code last received by the non-speech code conversion unit 60 and inputs it to the LSP quantizer 72b. The switching unit 77b selects the pitch lag parameter lag(m) generated by the pitch lag generation unit 101 and inputs it to the pitch lag quantizer 73b. The switching unit 77c selects the algebraic code parameter code(m) generated by the algebraic code generation unit 102 and inputs it to the algebraic code quantizer 74b. The switching unit 77d selects the pitch gain parameter Ga(m) generated by the pitch gain generation unit 103 and inputs it to the pitch gain quantizer 75b. The switching unit 77e selects the frame power parameter POW1(k) obtained from the frame power code I_POW1(k) last received by the non-speech code conversion unit 60 and inputs it to the algebraic gain quantizer 75c.

The LSP quantizer 72b quantizes the LSP parameter LSP1(k), input from the non-speech code conversion unit 60 via the switching unit 77a, using the AMR LSP quantization table and outputs the LSP code I_LSP2(n). The pitch lag quantizer 73b quantizes the pitch lag parameter, input from the pitch lag generation unit 101 via the switching unit 77b, using the AMR pitch lag quantization table and outputs the pitch lag code I_LAG2(n). The algebraic code quantizer 74b quantizes the algebraic code parameter, input from the algebraic code generation unit 102 via the switching unit 77c, using the AMR algebraic code quantization table and outputs the algebraic code I_CODE2(n). The pitch gain quantizer 75b quantizes the pitch gain parameter, input from the pitch gain generation unit 103 via the switching unit 77d, using the AMR pitch gain quantization table and outputs the pitch gain code I_GAIN2a(n). The algebraic gain quantizer 75c quantizes the frame power parameter POW1(k), input from the non-speech code conversion unit 60 via the switching unit 77e, using the AMR algebraic gain quantization table and outputs the algebraic gain code I_GAIN2c(n).

The code multiplexing unit 76 multiplexes the LSP code, pitch lag code, algebraic code, pitch gain code, and algebraic gain code output from the quantizers 72b to 75b and 75c, adds the frame type information (= S), and creates and sends out an AMR speech code.

At the transition point from a speech interval to a non-speech interval, the speech code conversion unit 70 repeats the above operation until it has sent seven frames of AMR speech codes. Once the seven frames have been sent, it stops outputting speech codes until the next speech interval is detected.

Once the seven frames of speech codes have been sent, the switches S1 and S2 in FIG. 11 are switched to terminals 3 and 5 under the control of the conversion control unit 53, and CN code conversion by the non-speech code conversion unit 60 takes over.

As shown in FIG. 13(a), the (m+14)-th and (m+15)-th frames after the hangover (the (n+7)-th frame on the AMR side) must be set as a SID_FIRST frame to match AMR DTX control. No CN parameters need to be transmitted, however, so the code multiplexing unit 63 outputs bst2(m+7) containing only the information indicating the SID_FIRST frame type. CN code conversion then proceeds as in the third embodiment of FIG. 7.

The above describes CN code conversion when hangover control is performed. When the number of AMR frames elapsed between the last conversion to a SID_UPDATE frame and the transition-point frame is 23 or fewer, hangover control is not performed. FIG. 13(b) shows the control method for this case.

The m-th and (m+1)-th frames, which are the boundary frames between the speech and non-speech intervals, are converted into an AMR speech frame by the speech code conversion unit 70 and output, in the same way as during hangover.

The following (m+2)-th and (m+3)-th frames are converted into a SID_UPDATE frame.

The frames from the (m+4)-th onward are converted by the same method as described for non-speech intervals in the third embodiment.

Next, the CN code conversion method at the transition point from a non-speech interval to a speech interval is described. FIG. 14 shows the conversion control method over time. When the m-th G.729A frame is a non-speech frame (SID frame or non-transmission frame) and the (m+1)-th frame is a speech frame, that point is a transition from a non-speech interval to a speech interval. In this case, the n-th AMR frame is converted as a speech frame to prevent clipping of the speech onset (loss of the beginning of the utterance). The m-th G.729A frame, although a non-speech frame, is therefore converted as a speech frame. As during hangover, the speech code conversion unit 70 converts it into an AMR speech frame and outputs it.
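The two speech-to-non-speech cases of FIG. 13 described earlier in this embodiment (with and without hangover) can be summarized as a small decision function. This is a sketch based only on the frame counts stated in the text; the function name is hypothetical, and the returned list starts at AMR frame n, which covers G.729A frames m and m+1.

```python
HANGOVER_THRESHOLD = 23  # AMR frames since the last SID_UPDATE, per the text
HANGOVER_FRAMES = 7      # AMR frames sent as speech during hangover


def amr_types_after_speech_end(elapsed_since_sid_update):
    """AMR frame types from frame n onward, up to the first SID frame."""
    if elapsed_since_sid_update > HANGOVER_THRESHOLD:
        # FIG. 13(a): frames n..n+6 go out as speech, frame n+7 is SID_FIRST.
        return ["SPEECH"] * HANGOVER_FRAMES + ["SID_FIRST"]
    # FIG. 13(b): only the boundary frame is speech, then SID_UPDATE.
    return ["SPEECH", "SID_UPDATE"]
```

After the returned frames, conversion continues with the periodic non-speech method of the third embodiment.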

As described above, according to this embodiment, when a G.729A non-speech frame must be converted into an AMR speech frame at the transition point from a speech interval to a non-speech interval, the AMR speech code can be generated by using the G.729A CN parameters in place of AMR speech parameters.

・Supplementary Notes
(Supplementary Note 1) A speech code conversion method for converting a first speech code, obtained by encoding an input signal with a first speech encoding scheme, into a second speech code of a second speech encoding scheme,
wherein a first non-speech code, obtained by encoding a non-speech signal contained in the input signal with the non-speech compression function of the first speech encoding scheme, is converted into a second non-speech code of the second speech encoding scheme without first being decoded into a non-speech signal.

(Supplementary Note 2) A speech code conversion method for converting a first speech code, obtained by encoding an input signal with a first speech encoding scheme, into a second speech code of a second speech encoding scheme, wherein:
a first non-speech code, obtained by encoding a non-speech signal contained in the input signal with the non-speech compression function of the first speech encoding scheme, is separated into a first plurality of element codes;
the first plurality of element codes are converted into a second plurality of element codes that constitute the second non-speech code; and
the second plurality of element codes obtained by the conversion are multiplexed and output as the second non-speech code.

(Supplementary Note 3) The speech code conversion method according to Supplementary Note 2, wherein:
the first element codes are codes obtained by dividing the non-speech signal into frames each consisting of a fixed number of samples, analyzing each frame to obtain feature parameters representing the characteristics of the non-speech signal, and quantizing the feature parameters using a quantization table specific to the first speech encoding scheme; and
the second element codes are codes obtained by quantizing the feature parameters using a quantization table specific to the second speech encoding scheme.

(Supplementary Note 4) The speech code conversion method according to Supplementary Note 3, wherein the feature parameters are LPC coefficients (linear prediction coefficients), which represent the general shape of the frequency characteristic of the non-speech signal, and frame signal power, which represents the amplitude characteristic of the non-speech signal.

(Supplementary Note 5) The speech code conversion method according to Supplementary Note 2, 3, or 4, wherein, in the conversion step:
the first plurality of element codes are dequantized by inverse quantizers having the same quantization tables as the first speech encoding scheme; and
the dequantized values of the element codes obtained by the dequantization are quantized by quantizers having the same quantization tables as the second speech encoding scheme and thereby converted into the second plurality of element codes.

(Supplementary Note 6) A speech code conversion method in a speech communication system in which, with a fixed number of samples of an input signal taken as a frame, a first speech code obtained by encoding, frame by frame, a speech signal in a speech interval by a first speech encoding scheme and a first non-speech code obtained by encoding a non-speech signal in a non-speech interval by a first non-speech encoding scheme are transmitted in a mixed manner from a transmitting side, the first speech code and the first non-speech code are converted into a second speech code of a second speech encoding scheme and a second non-speech code of a second non-speech encoding scheme, respectively, and the second speech code and the second non-speech code obtained by the conversion are transmitted in a mixed manner to a receiving side, wherein
in a non-speech interval, a non-speech code is transmitted only in predetermined frames and no non-speech code is transmitted in the other frames,
frame type information indicating whether a frame is a speech frame, a non-speech frame, or a non-transmission frame in which no code is transmitted is added to the code information of each frame,
the type of frame to which a code belongs is identified based on the frame type information, and
in the case of a non-speech frame or a non-transmission frame, the first non-speech code is converted into the second non-speech code in consideration of the difference in frame length between the first and second non-speech encoding schemes and the difference in their transmission control of non-speech codes.

(Supplementary Note 7) The speech code conversion method according to Supplementary Note 6, wherein, when (1) the first non-speech encoding scheme transmits a non-speech code averaged over every predetermined number of frames in a non-speech interval and transmits no non-speech code in the other frames, (2) the second non-speech encoding scheme transmits a non-speech code only in frames in which the non-speech signal changes greatly within a non-speech interval, transmits no non-speech code in the other frames, and does not transmit non-speech codes consecutively, and (3) the frame length of the first non-speech encoding scheme is twice that of the second non-speech encoding scheme,
the code information of a non-transmission frame of the first non-speech encoding scheme is converted into the code information of two non-transmission frames of the second non-speech encoding scheme, and
the code information of a non-speech frame of the first non-speech encoding scheme is converted into two pieces of code information of the second non-speech encoding scheme: that of a non-speech frame and that of a non-transmission frame.
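
The frame mapping of Supplementary Note 7 (one long frame becomes two short frames) can be sketched as follows. The `FrameType` enum and the function name are hypothetical, and the non-speech code is assumed to have been converted already and is passed in as `converted_code`.

```python
from enum import Enum

class FrameType(Enum):
    SPEECH = "speech"
    NON_SPEECH = "non-speech"              # frame carrying a non-speech code
    NON_TRANSMISSION = "non-transmission"  # frame carrying no code

def map_long_frame_to_short_frames(frame_type, converted_code=None):
    """Map one frame of scheme 1 onto two frames of scheme 2,
    assuming the scheme-1 frame is twice as long."""
    if frame_type is FrameType.NON_TRANSMISSION:
        # One non-transmission frame becomes two non-transmission frames.
        return [(FrameType.NON_TRANSMISSION, None),
                (FrameType.NON_TRANSMISSION, None)]
    if frame_type is FrameType.NON_SPEECH:
        # One non-speech frame becomes a non-speech frame followed by a
        # non-transmission frame, so codes are never sent consecutively.
        return [(FrameType.NON_SPEECH, converted_code),
                (FrameType.NON_TRANSMISSION, None)]
    raise ValueError("speech frames are handled by the speech code converter")
```

Emitting the non-speech frame first and the non-transmission frame second respects the second scheme's rule that non-speech codes are not transmitted consecutively.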

(Supplementary Note 8) The speech code conversion method according to Supplementary Note 7, wherein, when, at a change from a speech interval to a non-speech interval, the first non-speech encoding scheme regards n consecutive frames including the frame at the change point as speech frames and transmits speech codes for them, and transmits, for the next frame, frame type information designating it as a first non-speech frame that contains no non-speech code,
upon detection of that first non-speech frame of the first non-speech encoding scheme, the dequantized values obtained by dequantizing the speech codes of the immediately preceding n speech frames of the first speech encoding scheme are averaged, and the average is quantized to obtain the non-speech code of a non-speech frame of the second non-speech encoding scheme.
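
The averaging described in Supplementary Note 8 can be sketched with a fixed-length buffer. The class name, the use of a single scalar parameter per frame (for example frame power), and the table values are assumptions made for illustration only.

```python
from collections import deque

class SpeechFrameAverager:
    """Keep the dequantized values of the latest n speech frames and turn
    their average into a scheme-2 non-speech code (hypothetical sketch)."""

    def __init__(self, n, table2):
        self.buffer = deque(maxlen=n)  # the oldest value is dropped automatically
        self.table2 = table2           # assumed scalar quantization table

    def on_speech_frame(self, dequantized_value):
        self.buffer.append(dequantized_value)

    def on_first_non_speech_frame(self):
        if not self.buffer:
            raise ValueError("no speech frame has been buffered yet")
        mean = sum(self.buffer) / len(self.buffer)
        # Quantize the average with the second scheme's table.
        return min(range(len(self.table2)),
                   key=lambda i: abs(self.table2[i] - mean))

averager = SpeechFrameAverager(n=3, table2=[0.0, 16.0, 32.0, 48.0])
for value in (10.0, 20.0, 30.0, 40.0):
    averager.on_speech_frame(value)
code = averager.on_first_non_speech_frame()  # mean of 20, 30, 40 is 30
```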

(Supplementary Note 9) The speech code conversion method according to Supplementary Note 6, wherein, when (1) the first non-speech encoding scheme transmits a non-speech code only in frames in which the non-speech signal changes greatly within a non-speech interval, transmits no non-speech code in the other frames, and does not transmit non-speech codes consecutively, (2) the second non-speech encoding scheme transmits a non-speech code averaged over every predetermined number N of frames in a non-speech interval and transmits no non-speech code in the other frames, and (3) the frame length of the first non-speech encoding scheme is half that of the second non-speech encoding scheme,
the dequantized values of the non-speech codes in 2×N consecutive frames of the first non-speech encoding scheme are averaged, and the average is quantized to give the non-speech code of the frame occurring every N frames in the second non-speech encoding scheme, and
for frames other than those occurring every N frames, the code information of two consecutive frames of the first non-speech encoding scheme is converted, regardless of frame type, into the code information of one non-transmission frame of the second non-speech encoding scheme.
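
The 2×N-to-N conversion of Supplementary Note 9 can be sketched as follows. Several details are assumptions that the note does not fix: a non-transmission frame is taken to reuse the most recently received value (`last_value`), the averaged code is placed in the first of the N output frames, and a single scalar parameter stands in for the full set of element codes.

```python
def convert_group(short_frames, n, table2, last_value):
    """Convert 2*n short frames of scheme 1 into n long frames of scheme 2.

    short_frames: list of 2*n entries, each a dequantized value, or None
                  for a frame that carried no non-speech code.
    Returns (long_frames, last_value) so the held value can carry over.
    """
    if len(short_frames) != 2 * n:
        raise ValueError("expected exactly 2*n short frames")
    values = []
    for value in short_frames:
        if value is not None:
            last_value = value     # a new code updates the held value
        values.append(last_value)  # assumption: hold the last value otherwise
    mean = sum(values) / len(values)
    code = min(range(len(table2)), key=lambda i: abs(table2[i] - mean))
    # Assumption: the averaged code goes in the first of the n long frames.
    long_frames = [("non-speech", code)] + [("non-transmission", None)] * (n - 1)
    return long_frames, last_value

frames, held = convert_group([10.0, None, 30.0, None], n=2,
                             table2=[0.0, 10.0, 20.0, 30.0], last_value=0.0)
```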

(Supplementary Note 10) The speech code conversion method according to Supplementary Note 9, wherein, when, at a change from a speech interval to a non-speech interval, the second non-speech encoding scheme regards n consecutive frames including the frame at the change point as speech frames and transmits speech codes for them, and transmits, for the next frame, frame type information designating it as a first non-speech frame that contains no non-speech code,
the non-speech code of the first non-speech frame is dequantized to generate dequantized values of a plurality of element codes and, at the same time, predetermined or random dequantized values of other element codes are generated,
the dequantized values of the element codes of two consecutive frames are each quantized using the quantization tables of the second speech encoding scheme and converted into one frame of speech code of the second speech encoding scheme, and
after n frames of speech code of the second speech encoding scheme have been output, the frame type information of the first non-speech frame, which contains no non-speech code, is sent.
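
The idea in Supplementary Note 10 of building speech codes from non-speech parameters plus predetermined or random values can be sketched as follows. This is a simplified illustration: the element names, the tables, and the choice to draw the random values directly from the second scheme's tables are all assumptions, and the pairing of two consecutive input frames per output frame is omitted.

```python
import random

def nearest_index(table, value):
    """Index of the table entry closest to value."""
    return min(range(len(table)), key=lambda i: abs(table[i] - value))

def make_hangover_frames(cn_values, other_elements, tables2, n, rng):
    """Build n speech frames of scheme 2, then the code-less first
    non-speech frame (hypothetical sketch).

    cn_values:      dequantized values taken from the non-speech code
    other_elements: names of elements the non-speech code does not carry
    tables2:        assumed scalar quantization table per element name
    """
    frames = []
    for _ in range(n):
        values = dict(cn_values)
        for name in other_elements:
            values[name] = rng.choice(tables2[name])  # random stand-in value
        codes = {name: nearest_index(tables2[name], v)
                 for name, v in values.items()}
        frames.append(("speech", codes))
    frames.append(("non-speech", None))  # frame type only, no code
    return frames

tables2 = {"lpc": [0.0, 0.5, 1.0], "power": [0.0, 10.0, 20.0],
           "pitch": [20.0, 40.0, 80.0]}
out = make_hangover_frames({"lpc": 0.6, "power": 19.0}, ["pitch"],
                           tables2, n=2, rng=random.Random(0))
```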

(Supplementary Note 11) A speech code conversion device that converts a first speech code, obtained by encoding an input signal by a first speech encoding scheme, into a second speech code of a second speech encoding scheme, the device comprising:
a code separation unit that separates a first non-speech code, obtained by encoding a non-speech signal contained in the input signal by the non-speech compression function of the first speech encoding scheme, into a first plurality of element codes;
an element code conversion unit that converts the first plurality of element codes into a second plurality of element codes constituting the second non-speech code; and
a code multiplexing unit that multiplexes the second element codes obtained by the conversion and outputs the second non-speech code.
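
The separate, convert, and multiplex structure of Supplementary Note 11 can be sketched as follows. The bit-field layout (`widths1`, `widths2`), the scalar tables, and the function names are hypothetical and chosen only to make the data flow concrete.

```python
def separate(code, widths):
    """Split a packed code into element codes, most significant field first."""
    fields, shift = [], sum(widths)
    for w in widths:
        shift -= w
        fields.append((code >> shift) & ((1 << w) - 1))
    return fields

def multiplex(fields, widths):
    """Pack element codes back into one code, most significant field first."""
    code = 0
    for f, w in zip(fields, widths):
        if not 0 <= f < (1 << w):
            raise ValueError("element code does not fit its field")
        code = (code << w) | f
    return code

def nearest_index(table, value):
    return min(range(len(table)), key=lambda i: abs(table[i] - value))

def convert_non_speech_code(code1, widths1, tables1, widths2, tables2):
    """Code separation -> per-element conversion -> code multiplexing."""
    elements1 = separate(code1, widths1)
    elements2 = [nearest_index(t2, t1[e])
                 for e, t1, t2 in zip(elements1, tables1, tables2)]
    return multiplex(elements2, widths2)

# Assumed layouts: scheme 1 packs two 2-bit fields, scheme 2 a 3-bit and a 1-bit field.
tables1 = [[0.0, 10.0, 20.0, 30.0], [0.0, 1.0, 2.0, 3.0]]
tables2 = [[0.0, 4.0, 8.0, 12.0, 16.0, 20.0, 24.0, 28.0], [0.0, 3.0]]
code2 = convert_non_speech_code(0b1100, [2, 2], tables1, [3, 1], tables2)
```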

(Supplementary Note 12) The speech code conversion device according to Supplementary Note 11, wherein
the first element codes are codes obtained by dividing a non-speech signal into frames each consisting of a fixed number of samples, analyzing each frame to obtain feature parameters representing the characteristics of the non-speech signal, and quantizing those feature parameters using quantization tables specific to the first speech encoding scheme, and
the second element codes are codes obtained by quantizing the feature parameters using quantization tables specific to the second speech encoding scheme.
(Supplementary Note 13) The speech code conversion device according to Supplementary Note 11 or 12, wherein the element code conversion unit comprises:
a dequantizer that dequantizes each of the first element codes based on the same quantization tables as the first speech encoding scheme; and
a quantizer that quantizes the dequantized value of each element code obtained by the dequantization based on the same quantization tables as the second speech encoding scheme, thereby converting it into the corresponding second element code.

(Supplementary Note 14) A speech code conversion device in a speech communication system in which, with a fixed number of samples of an input signal taken as a frame, a first speech code obtained by encoding, frame by frame, a speech signal in a speech interval by a first speech encoding scheme and a first non-speech code obtained by encoding a non-speech signal in a non-speech interval by a first non-speech encoding scheme are transmitted in a mixed manner from a transmitting side, the first speech code and the first non-speech code are converted into a second speech code of a second speech encoding scheme and a second non-speech code of a second non-speech encoding scheme, respectively, and the second speech code and the second non-speech code obtained by the conversion are transmitted to a receiving side, the device comprising:
a frame type identification unit that identifies, based on frame type information added to the code information, whether a frame is a speech frame, a non-speech frame, or a non-transmission frame in which no non-speech code is transmitted in a non-speech interval;
a non-speech code conversion unit that dequantizes the first non-speech code of a non-speech frame based on the same quantization tables as the first non-speech encoding scheme and quantizes the resulting dequantized values based on the same quantization tables as the second non-speech encoding scheme, thereby converting it into the second non-speech code; and
a conversion control unit that controls the non-speech code conversion unit in consideration of the difference in frame length between the first and second non-speech encoding schemes and the difference in their transmission control of non-speech codes.

(Supplementary Note 15) The speech code conversion device according to Supplementary Note 14, wherein, when (1) the first non-speech encoding scheme transmits a non-speech code averaged over every predetermined number of frames in a non-speech interval and transmits no non-speech code in the other frames, (2) the second non-speech encoding scheme transmits a non-speech code only in frames in which the non-speech signal changes greatly within a non-speech interval, transmits no non-speech code in the other frames, and does not transmit non-speech codes consecutively, and (3) the frame length of the first non-speech encoding scheme is twice that of the second non-speech encoding scheme, the non-speech code conversion unit
converts the code information of a non-transmission frame of the first non-speech encoding scheme into the code information of two non-transmission frames of the second non-speech encoding scheme, and converts the code information of a non-speech frame of the first non-speech encoding scheme into two pieces of code information of the second non-speech encoding scheme: that of a non-speech frame and that of a non-transmission frame.

(Supplementary Note 16) The speech code conversion device according to Supplementary Note 15, wherein, when, at a change from a speech interval to a non-speech interval, the first non-speech encoding scheme regards n consecutive frames including the frame at the change point as speech frames and transmits speech codes for them, and transmits, for the next frame, frame type information designating it as a first non-speech frame that contains no non-speech code, the non-speech code conversion unit comprises:
a buffer that holds the dequantized values obtained by dequantizing the speech codes of the latest n speech frames of the first speech encoding scheme;
an average value calculation unit that averages the n dequantized values; and
a quantizer that quantizes the average value when the first non-speech frame is detected,
and outputs the non-speech code of the second non-speech encoding scheme based on the output of the quantizer.

(Supplementary Note 17) The speech code conversion device according to Supplementary Note 14, wherein, when (1) the first non-speech encoding scheme transmits a non-speech code only in frames in which the non-speech signal changes greatly within a non-speech interval, transmits no non-speech code in the other frames, and does not transmit non-speech codes consecutively, (2) the second non-speech encoding scheme transmits a non-speech code averaged over every predetermined number N of frames in a non-speech interval and transmits no non-speech code in the other frames, and (3) the frame length of the first non-speech encoding scheme is half that of the second non-speech encoding scheme, the non-speech code conversion unit comprises:
a buffer that holds the dequantized values of the non-speech codes in 2×N consecutive frames of the first non-speech encoding scheme;
an average value calculation unit that calculates the average of the held dequantized values;
a quantizer that quantizes the average value and converts it into the non-speech code of the frame occurring every N frames in the second non-speech encoding scheme; and
means for converting, for frames other than those occurring every N frames, the code information of two consecutive frames of the first non-speech encoding scheme into the code information of one non-transmission frame of the second non-speech encoding scheme regardless of frame type.

(Supplementary Note 18) The speech code conversion device according to Supplementary Note 17, wherein, when, at a change from a speech interval to a non-speech interval, the second non-speech encoding scheme regards n consecutive frames including the frame at the change point as speech frames and transmits speech codes for them, and transmits, for the next frame, frame type information designating it as a first non-speech frame that contains no non-speech code, the non-speech code conversion unit comprises:
a dequantizer that dequantizes the non-speech code of the first non-speech frame to generate dequantized values of a plurality of element codes; and
means for generating predetermined or random dequantized values of a plurality of element codes,
and quantizes the dequantized values of the element codes of two consecutive frames using the quantization tables of the second speech encoding scheme, converts them into one frame of speech code of the second speech encoding scheme and outputs it, and, after outputting n frames of speech code of the second speech encoding scheme, sends the frame type information of the first non-speech frame, which contains no non-speech code.

As described above, according to the present invention, in communication between two speech communication systems that use different non-speech encoding methods, a non-speech code (CN code) encoded by the transmitting side's non-speech encoding method can be converted into a non-speech code (CN code) conforming to the receiving side's non-speech encoding method without being decoded into a CN signal, so that high-quality non-speech code conversion is achieved.
Further, according to the present invention, the transmitting side's non-speech code (CN code) can be converted into the receiving side's non-speech code (CN code) without being decoded into a non-speech signal, while taking into account the differences in frame length and in DTX control between the transmitting and receiving sides, so that high-quality non-speech code conversion is achieved.

Further, according to the present invention, code conversion can be performed correctly not only for speech frames but also for the SID frames and non-transmission frames produced by the non-speech compression function. This enables code conversion between speech encoding schemes that have a non-speech compression function, which was a problem for conventional speech code conversion units.
Further, according to the present invention, speech code conversion between different communication systems becomes possible that preserves the transmission-efficiency gain of the non-speech compression function while suppressing quality degradation and transmission delay. Because most speech communication systems, including VoIP and mobile phone systems, use a non-speech compression function, the benefit of the present invention is substantial.

51a Encoder of encoding scheme 1
51b VAD unit
52 Frame type detection unit
53 Conversion control unit
54 Decoder of encoding scheme 2
60 Non-speech code conversion unit
61 Code separation unit
62_1 to 62_n CN code conversion units
63 Code multiplexing unit
70 Speech code conversion unit

Claims

A speech code conversion method in a speech communication system in which, with a fixed number of samples of an input signal taken as a frame, a first speech code obtained by encoding, frame by frame, a speech signal in a speech interval by a first speech encoding scheme and a first non-speech code obtained by encoding a non-speech signal in a non-speech interval by a first non-speech encoding scheme are transmitted in a mixed manner from a transmitting side, the first speech code and the first non-speech code are converted into a second speech code of a second speech encoding scheme and a second non-speech code of a second non-speech encoding scheme, respectively, and the second speech code and the second non-speech code obtained by the conversion are transmitted in a mixed manner to a receiving side, wherein
in a non-speech interval, a non-speech code is transmitted only in predetermined frames and no non-speech code is transmitted in the other frames,
frame type information indicating whether a frame is a speech frame, a non-speech frame, or a non-transmission frame in which no code is transmitted is added to the code information of each frame,
the type of frame to which a code belongs is identified based on the frame type information,
in the case of a non-speech frame or a non-transmission frame, the first non-speech code is converted into the second non-speech code in consideration of the difference in frame length between the first and second non-speech encoding schemes and the difference in their transmission control of non-speech codes, and
in the case of a non-speech frame for which the first non-speech code to be converted cannot be obtained, the second non-speech code is obtained using the speech codes of past first speech frames, and the conversion into the second non-speech code is thereby performed.

A speech code conversion method in a speech communication system in which, with a fixed number of samples of an input signal taken as a frame, a first speech code obtained by encoding, frame by frame, a speech signal in a speech interval by a first speech encoding scheme and a first non-speech code obtained by encoding a non-speech signal in a non-speech interval by a first non-speech encoding scheme are transmitted in a mixed manner from a transmitting side, the first speech code and the first non-speech code are converted into a second speech code of a second speech encoding scheme and a second non-speech code of a second non-speech encoding scheme, respectively, and the second speech code and the second non-speech code obtained by the conversion are transmitted in a mixed manner to a receiving side, wherein
in a non-speech interval, a non-speech code is transmitted only in predetermined frames and no non-speech code is transmitted in the other frames,
frame type information indicating whether a frame is a speech frame, a non-speech frame, or a non-transmission frame in which no code is transmitted is added to the code information of each frame,
the type of frame to which a code belongs is identified based on the frame type information,
in the case of a non-speech frame or a non-transmission frame, the first non-speech code is converted into the second non-speech code in consideration of the difference in frame length between the first and second non-speech encoding schemes and the difference in their transmission control of non-speech codes, and
when the first non-speech code is to be converted periodically into the second non-speech code, an average value obtained by averaging the received first non-speech codes is used as the second non-speech code regardless of the presence or absence of a non-speech code, whereby the second non-speech code is generated periodically.

The speech code conversion method according to claim 2, wherein, when the second non-speech encoding scheme is a scheme that, at a transition point from a speech interval to a non-speech interval, regards n consecutive frames including the frame at the transition point as speech frames and transmits speech codes for them, speech codes of n consecutive frames of the second speech encoding scheme are generated using the dequantized values of a plurality of element codes obtained by dequantizing the non-speech code of one non-speech frame together with predetermined or random dequantized values of other element codes, and the speech codes of the second speech encoding scheme for the n frames are output.