JP2002221994A

JP2002221994A - Method and apparatus for assembling packet of code string of voice signal, method and apparatus for disassembling packet, program for executing these methods, and recording medium for recording program thereon

Info

Publication number: JP2002221994A
Application number: JP2001018541A
Authority: JP
Inventors: Toru Morinaga; 徹森永; Shigeaki Sasaki; 茂明佐々木
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2001-01-26
Filing date: 2001-01-26
Publication date: 2002-08-09
Anticipated expiration: 2021-01-26
Also published as: JP3566931B2

Abstract

PROBLEM TO BE SOLVED: To suppress the deterioration of voice quality caused by packet loss by adding a small amount of auxiliary information. SOLUTION: In assembling voice packets, the voice signal is encoded frame by frame to generate code strings. The code string of the current frame (a high-quality main bitstream, e.g. (1)) is combined with the code string of the following frame (a sub-bitstream compressed at a high compression rate, e.g. (2)), and the voice packet for the current frame time (e.g. packet (1)) is created and stored. In disassembling voice packets, when no voice packet has been lost, the code string of the current frame (the main bitstream) among the stored code strings is decoded. When voice packet (3) is lost, code string (3) (the sub-bitstream for the following frame) carried in voice packet (2) is decoded to reproduce the voice signal.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a packet assembling method and apparatus and a packet disassembling method and apparatus for code strings of a voice signal, used in a technique that compensates for voice signal segments lost through packet loss, which can occur when a voice signal is transmitted in packets, while suppressing the resulting quality degradation. It also relates to a program for executing these methods and a storage medium storing the program.

[0002]

[Prior art] As typified by mobile communication and VoIP (Voice over IP), packet communication makes it possible to handle voice and data in an integrated manner. To reduce packet header overhead, several voice frames are often packed into one packet for transmission. In voice communication, the order in which the signal sequence making up the voice signal was generated must generally be preserved when the signal is played back after transmission. In packet communication, however, the transmission delay of packets sent at fixed intervals varies from packet to packet, so arrival times can jitter. A jitter buffer (fluctuation absorbing buffer) is used to absorb this jitter and reproduce the voice signal from the codes stored in the packets, in the order in which they were sent.

[0003] One of the problems in packet voice communication is packet loss. As communication paths become wider in bandwidth and faster, the degradation and delay caused by coding are eliminated. Packet loss, by contrast, cannot be avoided even as communication capacity grows. Packet loss has the following causes. First, when there are many packets, a packet may be lost completely through collisions between packets. Second, when code bit errors reach roughly 50% during transmission, all of the packet's information is regarded as lost and the packet may be judged lost. Third, when the arrival delay of a packet exceeds what the jitter buffer can compensate for, the packet may be treated as missing and judged lost. These causes degrade voice quality.

[0004] So that quality degradation does not cause listening discomfort, the lost packet portion must be compensated for with some other signal. Some coding schemes encode using information from before and after the buffer, so once a packet is lost, quality may remain degraded for a while after reception resumes. Perceptually suppressing that degradation is also part of packet loss compensation. A conventional technique for compensating for a voice signal lost through packet loss while suppressing quality degradation is described with reference to FIG. 9.

[0005] The missing-voice interpolation device comprises a demultiplexing circuit 102, a residual signal power decoder 119, an inverse quantizer 120, a long-term prediction coefficient decoder/selector 118, a short-term prediction coefficient decoder 107, a long-term synthesis/interpolation filter 117, a short-term synthesis filter 110, and input/output terminals 101, 115, and 116. When loss of coded voice information is detected, that is, when a loss detection signal is input, the connection cutoff switch 121 is opened and the long-term prediction coefficient decoder/selector 118 outputs a long-term prediction coefficient for interpolation to the long-term predictor 112. This coefficient is either (1) a preset value that is always constant, or (2) a value chosen according to the long-term prediction coefficient of the previous frame obtained from the long-term prediction coefficient decoder. The short-term predictor 114 keeps the short-term prediction coefficients of the previous frame unchanged. The long-term synthesis/interpolation filter 117 receives no input and runs self-driven, feeding its output signal to the short-term synthesis filter 110. The short-term synthesis filter 110 performs its normal decoding process, and the reproduced digital voice signal is thereby interpolated (see Japanese Unexamined Patent Publication No. H5-88697).

In this conventional technique, the decoder analyzes the pitch period from past signals, extracts a suitable waveform, and repeats it to produce a pseudo signal. In this pitch-period repetition compensation, the most likely cause of degradation is waveform discontinuity. Discontinuities tend to arise where the compensation signal generated during the loss is joined to the signal received after recovery from the loss. To make the discontinuity less noticeable, it has been proposed to adjust the pitch period so that the waveform is continuous with the signal after recovery, to use OLA (overlap-add) to change gradually from the synthesized signal to the recovered signal, or to attenuate the power of the synthesized signal gradually.
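The conventional pitch-repetition compensation and OLA smoothing described above can be illustrated with a minimal sketch. It is not taken from the cited publication: the autocorrelation-based pitch search, the linear cross-fade weights, and all function names are assumptions chosen for illustration.

```python
def estimate_pitch_period(history, min_lag, max_lag):
    """Return the lag (in samples) that maximizes the autocorrelation of `history`."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(history[i] * history[i - lag] for i in range(lag, len(history)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def conceal_by_repetition(history, n_missing, min_lag=2, max_lag=None):
    """Fill a lost segment by repeating the last pitch cycle of `history`."""
    if max_lag is None:
        max_lag = len(history) // 2
    period = estimate_pitch_period(history, min_lag, max_lag)
    cycle = history[-period:]
    return [cycle[i % period] for i in range(n_missing)]


def overlap_add(concealed_tail, recovered_head):
    """Linearly cross-fade from the concealment signal into the recovered signal."""
    if len(concealed_tail) != len(recovered_head):
        raise ValueError("segments must have equal length")
    n = len(concealed_tail)
    out = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight of the recovered signal rises across the overlap
        out.append((1.0 - w) * concealed_tail[i] + w * recovered_head[i])
    return out
```

Because the repeated cycle is only a guess at the lost waveform, any change in pitch or power during the gap shows up at the join, which is the weakness the following paragraphs attribute to this approach.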

[0006] In packet loss compensation for the CELP (Code Excited Linear Prediction) scheme, which is used for low-bit-rate speech coding, a common method classifies the voice signal in each packet as periodic or aperiodic in advance; if the pitch of the lost packet is periodic, the excitation signal of the adaptive codebook is used, and if it is aperiodic, random white noise is used. Other methods are characterized by repeating the synthesis filter coefficients, attenuating the adaptive and fixed codebook gains, or attenuating the gain prediction.

[0007] These methods were effective at suppressing signals that are unpleasant to hear. However, they only reproduce a pseudo synthesized signal, and it is often difficult to reproduce a sound that is consistently close to the original. When pitch or power changes rapidly between packets, or when pitch intervals do not match, waveform discontinuities or forced adjustments could degrade sound quality considerably. In addition, with compression codecs, the onset portion after recovery from a loss is degraded.

[0008]

[Problems to be solved by the invention] Assuming that the communication path has ample capacity and that some auxiliary information can be added, the present invention aims to eliminate the drawbacks of conventional packet loss compensation techniques and to reduce the voice quality degradation caused by packet loss. In the prior art, degradation becomes pronounced when the pitch period, power, or similar properties change during the interval in which packets are lost. An object of the present invention is to provide encoding and decoding methods, and means for implementing them, that can suppress degradation of the voice signal both during the loss and after recovery from it, even when each packet carries a long span of voice data.

[0009]

[Means for solving the problems] To solve the above problems, the present invention, in assembling voice packets, encodes the voice signal frame by frame to generate code strings, combines the code string of the current frame with the code string of a later frame, and creates and stores the packet for the current frame time. In disassembling voice packets, when no packet has been lost, it decodes the current-frame code string among the code strings stored in the packet for the current frame time; when the packet has been lost, it decodes the code string for the current frame carried in the packet of an earlier frame time. The voice signal is thereby reproduced.
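The assembly and disassembly rule just stated can be sketched as follows. This is an illustrative sketch only: the `VoicePacket` type, the function names, and the use of opaque `bytes` for code strings are assumptions, and a single look-ahead frame per packet is assumed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class VoicePacket:
    seq: int               # frame time index of this packet
    main: bytes            # main bitstream for frame `seq`
    sub: Optional[bytes]   # sub-bitstream for frame `seq + 1`, if one exists


def assemble(main_codes: List[bytes], sub_codes: List[bytes]) -> List[VoicePacket]:
    """Pair each frame's main code string with the next frame's sub code string."""
    packets = []
    for k, main in enumerate(main_codes):
        ahead = sub_codes[k + 1] if k + 1 < len(sub_codes) else None
        packets.append(VoicePacket(seq=k, main=main, sub=ahead))
    return packets


def select_code(received: Dict[int, VoicePacket], k: int) -> Tuple[str, Optional[bytes]]:
    """Return (source, code string) to decode for frame k from the packets that arrived."""
    if k in received:                       # packet k arrived: use its main bitstream
        return "main", received[k].main
    prev = received.get(k - 1)              # packet k lost: look in the earlier packet
    if prev is not None and prev.sub is not None:
        return "sub", prev.sub
    return "none", None
```

With four frames and packet 2 lost, `select_code` returns the main bitstream for frames 0, 1, and 3 and the sub-bitstream carried in packet 1 for frame 2.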

[0010]

[Embodiments of the invention] As shown in FIG. 1, in VoP (Voice over Packet), the network module 1 receives voice packets, disassembles each packet and outputs it to the jitter buffer 3, and also determines packet loss and outputs a packet loss flag to the packet loss compensation unit 2. Disassembled packets accumulate in the jitter buffer 3, and playback starts after waiting a short time for packets. When no packet loss is determined, the main bitstream of the current frame of the packet accumulated in the jitter buffer 3 is output to the voice decoder 4. When packet loss is determined, efficient packet loss compensation can be performed using the main and sub-bitstreams of the neighboring packets that have reached the jitter buffer 3. For example, when the packet to be played back has not arrived, the voice decoder 4 generates analysis coefficients and outputs them to the packet loss compensation unit 2; the packet loss compensation unit 2 then uses the analysis coefficients and the sub-bitstream of an earlier packet that has reached the jitter buffer 3 to output loss compensation data to the voice decoder 4. The voice decoder 4 applies parameter interpolation and volume control to keep degradation as low as possible, and outputs the voice.

[0011] In the present invention, resistance to packet loss is obtained by combining the single main codec in normal use with a plurality of sub codecs as a means of compensation when a packet is lost. The main codec uses a high-quality coding scheme with a relatively low compression ratio, while the sub codec must be a coding scheme with higher compression than the main codec and sufficiently good quality. This keeps the increase in data volume small.

(Encoder) The encoder of the present invention is described with reference to FIGS. 2 and 3.

[0012] The input voice data is divided into frames (data units). The signal that the decoder side should normally play back is encoded by the main codec (encoder) 11 using a first coding method to produce the main bitstream. For example, it is encoded according to the G.711 coding standard, the ITU-T recommended pulse code modulation (PCM) speech coding scheme that transmits voice-band signals sampled at 8 kHz at 64 kb/s. In addition, the voice data one packet ahead is encoded by the sub codec (encoder) 14 using a second coding method to produce the sub-bitstream. For example, it is encoded according to the G.728 coding standard, the ITU-T recommended LD-CELP scheme for 16 kb/s telephone-band speech coding. When two or more sub-bitstreams are to be included, bitstreams encoding signals read even further ahead should be included.
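As a rough check on the added data volume, the bit rates quoted above (64 kb/s for G.711, 16 kb/s for G.728) can be turned into per-packet payload sizes. The 20 ms packet duration in the usage note is a hypothetical figure, not a value from the text, and the function names are illustrative.

```python
MAIN_BPS = 64_000  # G.711 bit rate, as stated in the text
SUB_BPS = 16_000   # G.728 bit rate, as stated in the text


def payload_bytes(packet_ms, n_sub=1):
    """Return (main bytes, total sub bytes) for one packet of `packet_ms` milliseconds."""
    main = MAIN_BPS * packet_ms // 8000  # bits/s * ms / 1000 / 8
    sub = SUB_BPS * packet_ms // 8000
    return main, sub * n_sub


def overhead_ratio(n_sub=1):
    """Extra payload from the sub-bitstreams, relative to the main bitstream."""
    return n_sub * SUB_BPS / MAIN_BPS
```

For a hypothetical 20 ms packet, `payload_bytes(20)` gives 160 bytes of main bitstream and 40 bytes of sub-bitstream, so one G.728 sub-bitstream adds 25% to the G.711 payload.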

[0013] The main bitstream and the sub-bitstream for one packet ahead are combined in the combining unit 15. The packetizing unit 16 then adds a time-ordered sequence number (using a known serial-number assignment circuit or the like), a bit error detection code, and so on, and outputs the result as a packet. With this approach, each packet carries a main bitstream, in which the signal held in the main buffer is encoded by the main codec, and a sub-bitstream, in which the signal held in the sub buffer, which is ahead of the main buffer, is encoded by the sub codec.
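One possible packet layout for the combining and packetizing steps is sketched below. The text does not specify a wire format, so the 16-bit sequence number, the two length fields, and the use of CRC-32 as the bit error detection code are all assumptions made for illustration.

```python
import struct
import zlib

# Hypothetical header: sequence number, main length, sub length (network byte order).
HEADER = struct.Struct("!HHH")


def packetize(seq, main, sub):
    """Combine the main and sub bitstreams and add a sequence number and CRC-32."""
    body = HEADER.pack(seq & 0xFFFF, len(main), len(sub)) + main + sub
    return body + struct.pack("!I", zlib.crc32(body))


def depacketize(packet):
    """Return (seq, main, sub), or None if the packet fails the error check."""
    if len(packet) < HEADER.size + 4:
        return None
    body = packet[:-4]
    (crc,) = struct.unpack("!I", packet[-4:])
    if zlib.crc32(body) != crc:
        return None
    seq, n_main, n_sub = HEADER.unpack(body[:HEADER.size])
    payload = body[HEADER.size:]
    if len(payload) != n_main + n_sub:
        return None
    return seq, payload[:n_main], payload[n_main:]
```

A packet that fails the check is returned as `None`, which a receiver could treat the same way as a packet that never arrived.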

[0014] As shown in FIG. 4, creating the bitstreams in this way means that when the decoding side judges a packet lost, the voice for the lost interval (the decoded voice signal) can be produced from the information in the sub-bitstream contained in the packet immediately preceding the lost one. This is possible because the sub-bitstream contains the look-ahead signal encoded by the sub codec. Also, to reduce packet header overhead, several voice frames are often packed into one packet for transmission. Conventional packet loss compensation creates a pseudo voice signal by repeating past signals as synthesized speech, so the longer the voice data in a packet, the more pronounced the degradation.

[0015] In the present invention, the sub codec carries the voice data one packet ahead of the main codec, so packet loss compensation with little degradation is possible regardless of the length of voice data in each packet. Depending on the compression coding used for the sub codec, one point needs attention. In a codec that carries over from the previous packet the analysis coefficients needed for encoding and decoding (these differ by codec; examples include the synthesized signal, filter coefficients, and prediction coefficients), a packet loss normally leaves the encoder and decoder with different analysis coefficients in their predictors, quantizers, and so on. To keep the analysis coefficients matched even then, the encoder side would need to transmit the initial analysis coefficient information as coded information.

[0016] In the present invention, each packet contains as a set a main bitstream encoded by the main codec (high-quality coding) and a sub-bitstream encoded by the sub codec (high-compression coding). If the analysis coefficients of the sub codec are therefore generated from the signal decoded by the main codec, using the main codec (decoder) 12 and the sub codec (analysis coefficient calculation) 13, that information need not be sent. For example, when the encoder's analysis coefficients are derived from the past synthesized signal, as in G.728, the synthesized signal can be replaced with the high-quality signal decoded by the main codec. The decoder must likewise replace the synthesized signal with the high-quality decoded signal. Matching the internal states of the encoder and decoder in this way allows correct decoding. Moreover, deriving the analysis coefficients from the high-quality signal rather than the synthesized signal can also improve the quality of the decoded signal.

[0017] The analysis coefficients are, for example, the synthesis filter coefficients obtained by the backward synthesis filter adapter of a G.728 LD-CELP encoder (not shown) and the perceptual weighting filter coefficients obtained by its perceptual weighting filter.

Inverse synthesis filter:

$$e_n = x_n + a_1 x_{n-1} + \cdots + a_n x_0$$

where $a$ denotes the filter coefficients, $x$ the synthesized signal, and $e$ the residual signal.

In the same way, the perceptual weighting filter can also be replaced using the high-quality signal.
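The inverse synthesis filter given above can be written out as a short routine. This is a sketch under stated assumptions: a fixed filter order equal to `len(a)`, zero signal history before the first sample, and an input `x` that stands for whichever signal drives the filter (the synthesized signal, or the high-quality signal decoded by the main codec when it is substituted as the text describes). The function name is hypothetical.

```python
def inverse_synthesis_filter(x, a):
    """Compute e[n] = x[n] + sum over k of a[k-1] * x[n-k], taking x[m] = 0 for m < 0.

    `a` holds the coefficients a_1, a_2, ... in order.
    """
    e = []
    for n in range(len(x)):
        acc = x[n]
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:
                acc += a_k * x[n - k]
        e.append(acc)
    return e
```

For example, `inverse_synthesis_filter([1.0, 2.0, 3.0], [0.5])` yields `[1.0, 2.5, 4.0]`.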

[0018] Perceptual weighting filter:

$$w_n = a_0 x_n + a_1 x_{n-1} + \cdots + a_n x_0 - (b_0 w_n + b_1 w_{n-1} + \cdots + b_n w_0)$$

where $a$ and $b$ denote the filter coefficients, $x$ the synthesized signal, and $w$ the perceptually weighted signal.

Replacing the synthesized signal with the high-quality signal in this way allows good-quality decoding. The calculated analysis coefficients are transferred to the sub codec (encoder) 14; that is, as the output of the optimum codebook data selector of the G.728 LD-CELP encoder (not shown), the codes corresponding to the best match among the shape code vectors (waveforms) and gain levels stored in the codebook are selected, and the sub-bitstream is output.

(Decoder) The decoder is described with reference to FIGS. 5 and 6.

[0019] On the decoder side, the received signal is first depacketized by the depacketizing unit 21. The main codec / sub codec distribution unit 22 then splits the voice packet for the current frame time into the main bitstream (the code string of the current frame under the G.711 speech coding standard) and the sub-bitstream (the code string of the later frame under the G.728 speech coding standard). The main bitstream is decoded into a voice signal by the main codec (decoder) 23 using a first decoding method.

[0020] The sub codec (analysis coefficient calculation) 24 then calculates analysis coefficients from the decoded signal and builds up the internal state of the sub codec (decoder) 25. Alternatively, as described above, the internal state of the sub codec is created directly from the main codec. Finally, with that internal state carried over, the sub codec (decoder) 25 decodes the sub-bitstream using a second decoding method and outputs a voice signal.

[0021] Specifically, the code for the shape code vector (waveform) and the code for the gain level are each input to an LD-CELP decoder (not shown) as the code string under the G.728 speech coding standard; the shape code vector and the gain vector are selected from the codebook (excitation VW codebook); and the calculated analysis coefficients are transferred and used as the filter coefficients of the synthesis filter to reproduce the decoded voice. Whether a voice packet has been lost is detected upstream of the depacketizing unit 21 shown in FIG. 5, by a general-purpose packet loss detection circuit that detects a break in the sequence numbers or a bit error.
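Detecting loss from a break in the sequence numbers can be sketched as below. The text only says that a general-purpose detection circuit is used, so the details here are assumptions: a 16-bit wrapping sequence number, and arrivals already placed in sending order (for example by the jitter buffer).

```python
def find_lost(seqs, modulo=1 << 16):
    """Return the sequence numbers skipped between consecutive arrivals.

    Assumes `seqs` is in sending order; reordered input would be misread as loss.
    """
    lost = []
    for prev, cur in zip(seqs, seqs[1:]):
        gap = (cur - prev) % modulo          # 1 means no loss, 0 means a duplicate
        lost.extend((prev + i) % modulo for i in range(1, gap))
    return lost
```

`find_lost([5, 6, 8, 9])` reports `[7]`, and the modulo arithmetic keeps the check valid across wrap-around, so `find_lost([65534, 1])` reports `[65535, 0]`.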

[0022] When no packet loss is signaled, the changeover switch 27 is set to the main codec (decoder) 23 side and the voice signal is output. When packet loss is signaled, the changeover switch 27 is set to the sub codec (decoder) 25 side. When the main codec needs past information, that is, when it carries over internal state as ADPCM (Adaptive Differential Pulse Code Modulation) does, the loss of an earlier packet causes degradation at the join between the signal decoded by the sub codec for loss compensation and the voice signal decoded by the main codec. In that case, generating the required information from the signal reproduced by the sub codec suppresses the degradation of main codec playback after the compensation.

[0023] When there is only one sub codec and packets are lost consecutively, so that there are fewer sub codecs than lost packets, loss compensation by the sub codec is not possible and the voice is expected to degrade. In that case, the compensation shown in FIG. 6 is performed using the waveform repetition compensation unit 26 of the conventional method, shown in FIG. 5. A waveform is generated from a synthesized signal, for example by conventional pitch repetition loss compensation based on the past signal, only when the sub codec cannot be used. FIG. 5 shows an example in which burst loss (two or more lost packets) is handled by the conventional method. Alternatively, each voice packet may store code strings for the voice signals of two or more later frames, and on disassembly the voice signal may be decoded using, from the code strings spanning those two or more later frames, the code string for each later frame in turn.
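The burst-loss handling described above, where a packet carries sub-bitstreams for several later frames and waveform repetition is the last resort, can be sketched as a source-selection rule. The dictionary layout, the function name, and the preference for the nearest earlier packet are assumptions for illustration.

```python
def choose_source(received, k, n_sub):
    """Decide how to reproduce frame k.

    `received` maps a packet's sequence number to
    {"main": code, "subs": [code for seq+1, ..., code for seq+n_sub]}.
    Returns ("main", k), ("sub", seq of the packet holding the code),
    or ("repeat", None) when only waveform repetition is left.
    """
    if k in received:
        return ("main", k)
    for d in range(1, n_sub + 1):            # nearest earlier packet first
        pkt = received.get(k - d)
        if pkt is not None and len(pkt["subs"]) >= d:
            return ("sub", k - d)            # its subs[d - 1] encodes frame k
    return ("repeat", None)
```

With one sub-bitstream per packet, losing packets 2 and 3 leaves frame 3 to waveform repetition; with two, frame 3 is recovered from the second sub-bitstream of packet 1.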

[0024] When the quantization noise distortion of the main codec and the sub codec does not differ greatly, synchronizing the two decoded signals and adding them together can raise the signal-to-quantization-noise ratio (SNR). With different codecs the quantization noise is often uncorrelated, so adding the signals is expected to increase the power of the correlated speech component relative to the power of the uncorrelated noise. The more sub codecs are added, the more look-ahead information is available relative to the main codec. This makes it possible to provide resilience even when packets are lost consecutively.
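The SNR argument above can be made concrete with a short derivation. It is a sketch under idealized assumptions that go beyond the text: both decoders reproduce the same speech component $s$ with power $P_s$, their quantization noises $n_1$ and $n_2$ are zero-mean and uncorrelated with each other and with $s$, and the two outputs are combined with equal weights.

```latex
\begin{align*}
y_1 &= s + n_1, \qquad y_2 = s + n_2, \qquad
  \mathrm{E}[n_1 n_2] = 0, \quad \mathrm{E}[n_i^2] = \sigma_i^2 \\
\bar{y} &= \tfrac{1}{2}(y_1 + y_2) = s + \tfrac{1}{2}(n_1 + n_2) \\
\mathrm{E}\!\left[\bigl(\tfrac{1}{2}(n_1 + n_2)\bigr)^2\right]
  &= \tfrac{1}{4}\left(\sigma_1^2 + \sigma_2^2\right) \\
\sigma_1^2 = \sigma_2^2 = \sigma^2 \;&\Rightarrow\;
  \mathrm{SNR}_{\bar{y}} = \frac{P_s}{\sigma^2 / 2} = 2\,\frac{P_s}{\sigma^2},
  \quad \text{a gain of } 10 \log_{10} 2 \approx 3.0\ \text{dB} \\
\sigma_1^2 \le \sigma_2^2:\quad
  \tfrac{1}{4}\left(\sigma_1^2 + \sigma_2^2\right) < \sigma_1^2
  \;&\Longleftrightarrow\; \sigma_2^2 < 3\,\sigma_1^2
\end{align*}
```

The last line shows why the two codecs' noise levels should not differ greatly: under these assumptions the equal-weight sum beats the better codec alone only while the noisier codec has less than three times its noise power.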

[0025] However, as shown in FIG. 7, the more sub codecs are added, the more look-ahead information is needed, and the delay increases as a result. The amount of information also grows with the number of sub codecs. Besides the above, a major source of delay in VoIP is the jitter buffer, which absorbs fluctuations such as packet arrival delay. On a PC (personal computer), buffers in network cards, sound cards, and the like also cause large delays, but these can be resolved by introducing dedicated hardware or by improved PC performance. For real-time conversation, the total one-way delay should be within 200 milliseconds. The number of sub codecs, the jitter absorption delay, and the other delays must therefore be adjusted so that their total meets that criterion.
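The trade-off between the number of sub codecs and the 200 ms one-way target can be explored with a simple budget calculation. The delay model is an assumption, not the patent's: each sub codec is taken to add one packet duration of look-ahead, and the figures in the usage note are hypothetical.

```python
ONE_WAY_BUDGET_MS = 200  # one-way target stated in the text


def one_way_delay_ms(packet_ms, n_sub, jitter_buffer_ms, other_ms):
    """Packet duration, plus one packet of look-ahead per sub codec, plus other delays."""
    return packet_ms * (1 + n_sub) + jitter_buffer_ms + other_ms


def max_sub_codecs(packet_ms, jitter_buffer_ms, other_ms, budget_ms=ONE_WAY_BUDGET_MS):
    """Largest number of sub codecs that keeps the total within the budget (0 if none fit)."""
    spare = budget_ms - jitter_buffer_ms - other_ms - packet_ms
    return max(0, spare // packet_ms)
```

With hypothetical values of 20 ms packets, a 60 ms jitter buffer, and 40 ms of other delay, one sub codec gives 140 ms in this model and up to four sub codecs fit within 200 ms.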

[0026] In mobile communication and VoIP, the communication speed is not always constant, and the amount of information available for voice communication is also expected to vary with the amount of information used by applications. The present invention is characterized in that the quality and number of sub codecs can be changed flexibly according to the communication speed and the computing speed of the computer, allowing a combination suited to the network. FIG. 8 shows a schematic diagram of the waveforms obtained with this method. The figure shows that the present method is closer to the original sound than the conventional method.
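One way to picture the flexible choice of sub codec count is a rule that fits as many sub-bitstreams as the currently available bit rate allows. The selection rule, the cap of three sub codecs, and the function name are assumptions for illustration; the default rates are the G.711 and G.728 figures quoted earlier in the description.

```python
def sub_codecs_for_bandwidth(available_bps, main_bps=64_000, sub_bps=16_000, max_sub=3):
    """Return how many sub-bitstreams fit, or None if the main bitstream alone does not fit."""
    if available_bps < main_bps:
        return None
    return min(max_sub, (available_bps - main_bps) // sub_bps)
```

Under these defaults, 80 kb/s of available rate allows one sub-bitstream, 100 kb/s allows two, and anything from 112 kb/s upward is capped at three. Packet header overhead is ignored in this sketch.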

[0027] The packet assembling apparatus and packet disassembling apparatus of the present invention can be built from a computer having a CPU, memory, and so on; a user terminal used by the accessing user; and a machine-readable recording medium such as a CD-ROM, magnetic disk device, or semiconductor memory. A control program that causes the computer to perform the operations described above is stored on the recording medium; the computer reads this control program, its operation is controlled accordingly, and each element of the embodiment described above is realized on the computer.

[0028]

[Effects of the invention] According to the present invention, compared with conventional schemes, quality degradation due to packet loss is kept to a minimum, waveform discontinuities are eliminated, and the lost portion can be compensated for faithfully to the original sound. Moreover, because the packet for the current frame time combines the code string of the current frame with that of the later frame, the loss of the following packet can be compensated for easily.

[Brief description of the drawings]

【図１】VoPの基本構成を示す図。FIG. 1 is a diagram showing a basic configuration of VoP.

【図２】エンコーダの処理の説明図。FIG. 2 is an explanatory diagram of processing of an encoder.

【図３】エンコーダの概略構成図。FIG. 3 is a schematic configuration diagram of an encoder.

【図４】パケットが消失した場合の処理の説明図。FIG. 4 is an explanatory diagram of processing when a packet is lost.

【図５】デコーダの概略構成図。FIG. 5 is a schematic configuration diagram of a decoder.

【図６】バースト消失した場合の処理の説明図。FIG. 6 is an explanatory diagram of processing when burst loss occurs.

【図７】複数サブコーデックを持たせる場合の１パケッ
トの構造を示す図。FIG. 7 is a diagram showing the structure of one packet when a plurality of sub-codecs are provided.

【図８】原音に対する従来技術と本発明の手法の波形の
比較図。FIG. 8 is a diagram comparing the waveforms of the prior art and of the method of the present invention against the original sound.

【図９】従来の欠落音声補間装置の構成図。FIG. 9 is a configuration diagram of a conventional missing voice interpolation device.

[Explanation of symbols]

１ネットワークモジュール２パケット消失補償部３揺らぎ吸収バッファ４音声デコーダ１１メインコーデック（エンコーダ）１２、２３メインコーデック（デコーダ）１３、２４サブコーデック（分析係数算出）１４サブコーデック（エンコーダ）１５結合部１６パケット化部２１デパケット化部２２メインコーデック・サブコーデック分配部２５サブコーデック（デコーダ）２６波形繰り返し補償部２７切換スイッチ
1 Network module
2 Packet loss compensation unit
3 Jitter absorption buffer
4 Audio decoder
11 Main codec (encoder)
12, 23 Main codec (decoder)
13, 24 Sub-codec (analysis coefficient calculation)
14 Sub-codec (encoder)
15 Combining unit
16 Packetization unit
21 Depacketization unit
22 Main codec / sub-codec distribution unit
25 Sub-codec (decoder)
26 Waveform repetition compensation unit
27 Changeover switch

Continued from the front page: F-terms (reference) 5D045 DA20; 5J064 AA01 BB04 BC01 BC02 BD02 BD03 BD04; 5K067 AA23 BB04 CC08 EE02 GG11 HH21 HH23

Claims


1. A packet assembling method for assembling packets of a code string of an audio signal, comprising the steps of: encoding the audio signal for each frame to generate a code string; and combining and storing, as the packet at the current frame time, the code string of the current frame and the code strings of the following N (N: an arbitrary integer) frames.

2. A packet assembling method for assembling packets of a code string of an audio signal, comprising the steps of: encoding the audio signal of the current frame, for each frame, by a high-quality first encoding method to generate a first code string; encoding the audio signal of the following N (N: an arbitrary integer) frames, for each frame, by a high-compression second encoding method to generate a second code string; and combining and storing, as the packet at the current frame time, the first code string of the current frame and the second code string of the following frame.

3. The packet assembling method according to claim 2, wherein the step of generating the second code string includes the steps of: decoding the first code string to generate a decoded signal; and calculating an analysis coefficient from the decoded signal, and wherein the second encoding method generates the second code string from the audio signal using the analysis coefficient.

4. A packet assembling apparatus for assembling packets of a code string of an audio signal, comprising: an encoder that encodes the audio signal for each frame to generate a code string; and a combining unit that combines and stores, as the packet at the current frame time, the code string of the current frame and the code strings of the following N (N: an arbitrary integer) frames.

5. A packet assembling apparatus for assembling packets of a code string of an audio signal, comprising: a first encoder that encodes the audio signal of the current frame, for each frame, by a high-quality first encoding method to generate a first code string; a second encoder that encodes the audio signal of the following N (N: an arbitrary integer) frames, for each frame, by a high-compression second encoding method to generate a second code string; and a combining unit that combines and stores, as the packet at the current frame time, the first code string of the current frame generated by the first encoder and the second code string of the following frame generated by the second encoder.

6. The packet assembling apparatus according to claim 5, wherein the second encoder comprises: means for decoding the first code string to generate a decoded signal; means for calculating an analysis coefficient from the decoded signal; and means for generating the second code string from the audio signal, by the second encoding method, using the analysis coefficient.

7. A packet disassembling method that receives a packet at the current frame time, in which the code string of the current frame and the code strings of the following N (N: an arbitrary integer) frames are combined, and that decodes an audio signal for each frame from the code strings stored in the packet, the method comprising the steps of: determining, for each frame, whether a packet has been lost; decoding the audio signal from the code string of the current frame stored in the packet at the current frame time when it is determined that the packet at the current frame time has not been lost; and decoding the audio signal from the code string of the current frame contained in a packet at a past frame time when it is determined that the packet at the current frame time has been lost.

8. A packet disassembling method that receives a packet at the current frame time, in which a first code string of the current frame, generated by encoding the audio signal for each frame by a high-quality first encoding method, is combined with a second code string of the following frame, generated by encoding the following N (N: an arbitrary integer) frames by a high-compression second encoding method, and that decodes the audio signal for each frame from the code strings stored in the packet, the method comprising: a step of determining, for each frame, whether a packet has been lost; a first step of decoding the audio signal, when it is determined that the packet at the current frame time has not been lost, from the first code string of the current frame among the code strings stored in the packet at the current frame time, by a first decoding method corresponding to the first encoding method; and a second step of decoding the audio signal, when it is determined that the packet at the current frame time has been lost, from the second code string of the current frame among the code strings stored in a packet at a past frame time, by a second decoding method corresponding to the second encoding method.

9. The packet disassembling method according to claim 8, further comprising a step of calculating an analysis coefficient from the audio signal decoded from the first code string of a past frame among the code strings stored in a packet at a past frame time, wherein the second step decodes the audio signal, using the analysis coefficient, from the second code string of the current frame among the code strings stored in the packet at the past frame time.

10. The packet disassembling method according to claim 7, further comprising a step of, when it is determined that a plurality of packets including the packet at the current frame time have been lost consecutively and the audio signal of the current frame cannot be decoded from the second code string, generating a pseudo audio signal from the frames before and after the frame that cannot be decoded, thereby performing loss compensation.

11. The packet disassembling method according to claim 7, further comprising a step of, when it is determined that the packet at the current frame time has not been lost, generating the audio signal by adding the signal of the current frame and the signal of the current frame contained in a past packet.

12. A packet disassembling apparatus that receives a packet at the current frame time, in which the code string of the current frame and the code strings of the following N (N: an arbitrary integer) frames, each generated by encoding an audio signal for each frame, are combined, and that decodes the audio signal, the apparatus comprising: packet loss determination means for determining, for each frame, whether a packet has been lost; and decoding means for decoding the audio signal from the code string of the current frame stored in the packet at the current frame time when the packet loss determination means determines that the packet at the current frame time has not been lost, and for decoding the audio signal from the code string of the current frame contained in a packet at a past frame time when it is determined that the packet at the current frame time has been lost.

13. A packet disassembling apparatus that receives a packet at the current frame time, in which a first code string of the current frame, generated by encoding the audio signal for each frame by a high-quality first encoding method, is combined with a second code string of the following frame, generated by encoding the following N (N: an arbitrary integer) frames by a high-compression second encoding method, and that decodes the audio signal for each frame from the code strings stored in the packet, the apparatus comprising: packet loss determination means for determining, for each frame, whether a packet has been lost; and decoding means for decoding the audio signal, when the packet loss determination means determines that the packet at the current frame time has not been lost, from the first code string of the current frame among the code strings stored in the packet at the current frame time by a first decoding method corresponding to the first encoding method, and for decoding the audio signal, when it is determined that the packet at the current frame time has been lost, from the second code string of the current frame among the code strings stored in a packet at a past frame time by a second decoding method corresponding to the second encoding method.

14. The packet disassembling apparatus according to claim 13, wherein the decoding means comprises means for calculating an analysis coefficient from the audio signal decoded from the first code string of a past frame among the code strings stored in a packet at a past frame time, and decodes the audio signal, using the analysis coefficient, from the second code string of the current frame among the code strings stored in the packet at the past frame time.

15. The packet disassembling apparatus according to claim 12, wherein the decoding means, when it is determined that the packet at the current frame time has not been lost, generates the audio signal by adding the signal of the current frame and the signal of the current frame contained in a packet at a past frame time.

16. A program for causing a computer to execute: a procedure of encoding, for each frame, the audio signal of the current frame by a high-quality first encoding method to generate a first code string; a procedure of encoding the following N (N: an arbitrary integer) frames by a high-compression second encoding method to generate a second code string; and a procedure of combining and storing, as the packet at the current frame time, the first code string of the current frame and the second code string of the following frame.

17. A computer-readable recording medium on which is recorded a program for causing a computer to execute: a procedure of encoding, for each frame, the audio signal of the current frame by a high-quality first encoding method to generate a first code string; a procedure of encoding the following N (N: an arbitrary integer) frames by a high-compression second encoding method to generate a second code string; and a procedure of combining and storing, as the packet at the current frame time, the first code string of the current frame and the second code string of the following frame.

18. A program for causing a computer to execute: a procedure of receiving a packet at the current frame time in which the code string of the current frame and the code strings of the following N (N: an arbitrary integer) frames are combined; a procedure of determining, for each frame, whether a packet has been lost; a procedure of decoding the audio signal from the code string of the current frame stored in the packet at the current frame time when it is determined that the packet at the current frame time has not been lost; and a procedure of decoding the audio signal from the code string of the current frame contained in a packet at a past frame time when it is determined that the packet at the current frame time has been lost.

19. A computer-readable recording medium on which is recorded a program for causing a computer to execute: a procedure of receiving a packet at the current frame time in which the code string of the current frame and the code strings of the following N (N: an arbitrary integer) frames are combined; a procedure of determining, for each frame, whether a packet has been lost; a procedure of decoding the audio signal from the code string of the current frame stored in the packet at the current frame time when it is determined that the packet at the current frame time has not been lost; and a procedure of decoding the audio signal from the code string of the current frame contained in a packet at a past frame time when it is determined that the packet at the current frame time has been lost.
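
The fallback recited in claim 10 can be illustrated with a short Python sketch. It assumes frames have already been decoded from the main or sub code strings wherever possible, with `None` marking frames that could not be decoded because of consecutive losses. The function name `conceal_burst` and the use of a sample-wise average of the neighbouring frames are illustrative assumptions only; the claims state just that a pseudo audio signal is generated from the frames before and after.

```python
from typing import List, Optional

Frame = List[float]


def conceal_burst(decoded: List[Optional[Frame]], frame_len: int) -> List[Frame]:
    """Fill frames that are still None after main/sub decoding.

    Each gap is filled from the nearest decoded frame before and after it.
    Averaging the two neighbours is an illustrative choice only; the claims
    just say a pseudo signal is generated from the surrounding frames.
    """
    out: List[Frame] = []
    for i, frame in enumerate(decoded):
        if frame is not None:
            out.append(list(frame))
            continue
        before = next((f for f in reversed(decoded[:i]) if f is not None), None)
        after = next((f for f in decoded[i + 1:] if f is not None), None)
        if before is not None and after is not None:
            out.append([(a + b) / 2 for a, b in zip(before, after)])
        elif before is not None or after is not None:
            out.append(list(before if before is not None else after))
        else:
            out.append([0.0] * frame_len)   # nothing to copy from: silence
    return out
```

Only genuinely decoded frames are used as neighbours, so a run of lost frames is filled from the same pair of surrounding frames rather than from earlier guesses.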