JP3240832B2

JP3240832B2 - Packet voice decoding method

Info

Publication number: JP3240832B2
Application number: JP12371294A
Authority: JP
Inventors: 一則間野; 宏志小西; 仲大室
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1994-06-06
Filing date: 1994-06-06
Publication date: 2001-12-25
Anticipated expiration: 2016-12-25
Also published as: JPH07334191A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a packet speech decoding method that receives packets in which the coded information of an encoded speech signal has been packetized and transmitted, decodes them, and outputs a speech signal.

[0002]

[Prior Art] First, the transmission and reception of speech by packets is described. As shown in FIG. 10, a speech signal input from terminal 1 is stored in input buffer 2 and then encoded by encoding unit 3. The encoded speech is temporarily stored in transmission buffer 4 and then sent out as packets from terminal 5 to packet network 11. The transmitted packets are received at input terminal 6 of the receiver, temporarily stored in reception buffer 7, and then decoded by decoding unit 8. The decoded speech is sent to output buffer 9 and output as speech from terminal 10. The speech coding scheme used here may be either a sample-by-sample scheme or a block scheme operating on multiple samples. For example, linear PCM, ITU-T Recommendation G.711 (μ-law PCM), G.726 (ADPCM), G.728 (LD-CELP), or a CELP (Code Excited Linear Prediction) coding scheme may be used.

[0003] FIG. 11 shows the timing of packet transmission and reception. The received packets (b) at terminal 6 are delayed relative to the ten transmitted packets P₁ to P₁₀ (a) sent from terminal 5 in FIG. 10. The figure shows the case in which all packets P₁ to P₁₀ arrive with the same delay, that is, each arrives no later than the arrival time expected from the arrival time of the first received packet P₁. The speech output from terminal 10 obtained by decoding these received packets is as shown in FIG. 11(c). When all received packets arrive without delay in this way, the output speech signal (c) suffers no degradation such as interruption.

[0004] However, as shown in FIG. 12(a), when received packets P₄ and P₈ arrive later than their expected arrival times t₄ and t₈, the decoded output speech signal, as shown in FIG. 12(b), is interrupted between decoded speech signal V₃ of packet P₃ and decoded speech signal V₄ of packet P₄, and likewise between decoded speech signals V₇ and V₈.

[0005] In this conventional packet reception and decoding process, as shown in FIG. 13, a speech packet is received (S₁), each speech packet is decoded (S₂), the decoded speech signal is buffered (S₃), and the buffer is checked for a decoded speech signal (S₄). If a speech signal is present it is output (S₅); if not, silence is output (S₆). Speech is thus decoded from the received packets and output, but if there is no speech to output at output time, zero (silence) is output until the delayed packet is output, as shown in FIG. 12(b). The delays of packets P₄ and P₈ therefore create interrupted sections in the output speech, and the cumulative duration of those sections becomes, unchanged, the cumulative delay of the output speech.
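The accumulation of delay in the conventional process of FIG. 13 can be illustrated with a minimal sketch. It assumes idealized timing (decoding takes no time, and a frame can start playing the instant its packet arrives); the function name `conventional_playout` and the tuple layout of the schedule are hypothetical and do not come from the patent.

```python
def conventional_playout(arrivals, frame_len):
    """Simulate FIG. 13 style playout: play frames in order, output
    silence whenever the next packet has not yet arrived.

    arrivals  -- arrival time of each packet, in transmission order
    frame_len -- duration of the decoded speech of one packet
    Returns (schedule, total_gap), where schedule is a list of
    (label, start, end) tuples and total_gap is the accumulated delay.
    """
    schedule = []
    clock = arrivals[0]          # playout starts when the first packet arrives
    total_gap = 0
    for k, arrival in enumerate(arrivals, start=1):
        if arrival > clock:      # packet is late: fill the gap with silence
            schedule.append(("silence", clock, arrival))
            total_gap += arrival - clock
            clock = arrival
        schedule.append((f"V{k}", clock, clock + frame_len))
        clock += frame_len
    return schedule, total_gap
```

With arrivals `[0, 20, 40, 70, 80]` and a frame length of 20, the fourth packet is 10 time units late, so a 10-unit silent gap is inserted and every later frame is played 10 units behind schedule.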

[0006] To prevent such interruptions, it has been proposed to delay the initial speech output time and to provide an output buffer large enough to absorb the expected packet delay so that speech can be output continuously. In this case, as shown for example in FIG. 12(c), the time at which the initial decoded speech signal V₁ is output is delayed by a sufficient time T₃, and the interruptions disappear. However, the speech output delay becomes large, which makes the approach unsuitable for low-delay speech communication intended for conversation.

[0007] As a conventional decoding method without output delay, the process shown in FIG. 14 has been proposed. A speech packet is received (S₁), and it is determined whether it is a delayed packet arriving later than its scheduled time (S₂). If it is not a delayed packet, it is decoded (S₃) and buffered (S₄), and the decoded speech is output (S₅). A delayed packet is regarded as lost, and silence is output (S₆). In this case, as shown in FIG. 12(d), there is no output delay, but the sections corresponding to speech signal V₄ of packet P₄ and speech signal V₈ of packet P₈, which arrive too late to be decoded, become interrupted sections T₄ and T₅, respectively.

[0008] Accordingly, a method has been proposed, with the block configuration shown in FIG. 15 and the procedure shown in FIG. 16, in which a packet that is not in time for decoding the current frame is treated as lost, and the decoded speech of the current frame is produced by extrapolation from the frame speech of packets that arrived before the current frame. Control unit 20 monitors reception buffer 7 for the arrival of the information packet to be decoded for the current frame. If the required packet is late, control unit 20 switches switch 22 so that the input of output buffer 9 is connected to the output of interpolation unit 21 instead of the output of decoding unit 8, and interpolation unit 21 generates interpolated speech from the decoded speech information already obtained. As shown in FIG. 16, a speech packet is received (S₁) and checked to see whether it is a delayed packet arriving later than its scheduled time (S₂). If it is not, it is decoded (S₃) and buffered (S₄), and the speech signal is then output (S₅). If it is a delayed packet, interpolation is performed using the decoded speech signal already received (S₇), and the result is buffered (S₄).

[0009] As the speech interpolation method, the technique described in Japanese Examined Patent Publication No. 61-7779, "Interpolating receiver for momentary speech interruptions", can be used, for example. This technique has pitch period detection means that measures the period of the received speech signal. When interpolation is needed, it repeats the signal from one pitch period earlier, based on the obtained pitch period, for the required time from the start of interpolation. In addition, R. V. Cox et al., "Robust CELP coders for noisy backgrounds and noisy channels", IEEE Proc. ICASSP-89, pp. 739-742 (1989), describes an interpolation method for CELP-type speech coding. In CELP-type coding, linear prediction coefficients, pitch period, gain, and excitation code are transmitted as the speech coding information. To interpolate a frame, the parameters of the previous frame may simply be reused. Furthermore, when the interpolation section is long, the gain may be reduced little by little.
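Waveform interpolation by pitch-period repetition can be sketched as follows. This is only an illustration under stated assumptions: the pitch period is taken as already measured, the function name `interpolate_by_pitch_repetition` is hypothetical, and the optional per-period `decay` factor borrows the "reduce the gain little by little" idea that the text mentions for CELP parameters; applying it directly to the waveform is an assumption of this sketch.

```python
def interpolate_by_pitch_repetition(history, pitch_period, num_samples, decay=1.0):
    """Fill a gap by repeating the last pitch period of already decoded speech.

    history      -- previously decoded samples (most recent last)
    pitch_period -- pitch period in samples, assumed already detected
    num_samples  -- number of interpolated samples to generate
    decay        -- gain multiplier applied once per repeated period
                    (1.0 = no attenuation; an assumption of this sketch)
    """
    if not 0 < pitch_period <= len(history):
        raise ValueError("pitch_period must be between 1 and len(history)")
    last_period = history[-pitch_period:]
    out = []
    for n in range(num_samples):
        gain = decay ** (n // pitch_period)   # attenuate each further repetition
        out.append(gain * last_period[n % pitch_period])
    return out
```

For example, with `history = [1, 2, 3, 4]` and a pitch period of 2, five interpolated samples repeat the last period as 3, 4, 3, 4, 3.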

[0010] FIG. 12(e) shows the output speech when the interpolation of FIGS. 15 and 16 is performed. The interrupted section T₄ in FIG. 12(d), caused by the delay of packet P₄, is filled with interpolated speech signal V₃′ derived from the immediately preceding decoded speech signal V₃. Likewise, the interrupted section T₅ caused by the delay of packet P₈ is filled with interpolated signal V₇′ generated from the immediately preceding decoded speech signal V₇. In this method the missing packet is interpolated only from frames that arrived earlier. There is therefore no problem when the speech content of the current frame is unchanged from that of the immediately preceding frame and the same phoneme continues. However, if a phoneme was contained only in the missing packet, its content cannot be restored by interpolation.

[0011] Furthermore, with conventional speech interpolation, as shown in FIG. 17(a), packet P_K+1 is delayed at time t₀, when decoded speech signal V_K of packet P_K ends, so the section T_i corresponding to the delay is filled with interpolated speech signal V_K′ derived from packet P_K. If decoded speech signal V_K+1 of delayed packet P_K+1 (FIG. 17(b)) is simply connected at the end time t₁ of interpolation section T_i, then, because there is no constraint relating t₀ and t₁, the connected signal becomes discontinuous around connection point t₁, as shown in FIG. 17(c), and the pitch periodicity is also disturbed. The speech output is also delayed by the interpolation time T_i. Moreover, when decoding relies on previously received signals, delayed packet P_K+1 would be decoded using interpolated speech V_K′. Since the transmitting side cannot encode and transmit with the interpolated speech taken into account, the decoding process differs between the transmitting side (encoder) and the receiving side (decoder), and the same speech as on the transmitting side can no longer be decoded.

[0012]

[Problems to Be Solved by the Invention] As described above, in conventional packet speech decoding methods, decoded speech that contains interrupted sections and no interpolation sounds choppy and is perceptually very degraded. Reducing interruptions by buffering requires a large time delay, which hampers real-time spoken dialogue. Moreover, when a delayed packet is interpolated as a lost packet, a phoneme that existed only in that delayed packet cannot be interpolated, and the correct speech content cannot be restored.

[0013] An object of the present invention is to overcome the above drawbacks by providing a packet speech decoding method that, for packet delays within a certain time limit, avoids losing the phonemes of the delayed frame, outputs smooth interpolated speech to eliminate interrupted sections, and does not let the time delay grow.

[0014]

[Means for Solving the Problems] In the present invention, when a packet is late, an interpolated speech signal obtained by interpolation is first output following the preceding speech signal. If the late packet then arrives within a predetermined time limit, it is decoded and the decoded speech signal is connected after the interpolated speech signal. In the invention of claim 1, the decoded speech of the late packet is time-axis compressed and connected so that it runs until the time at which that decoded speech signal would have ended had the packet arrived on time. In the invention of claim 2, the late packet is decoded in full, and silent sections following that decoded speech signal are compressed by the time spent on the interpolated speech to make the time adjustment. In the invention of claim 3, the voiced sections of the late packet are additionally time-compressed in the invention of claim 2, and this compression together with the compression in silent sections makes the time adjustment for the interpolated speech section.

[0015] In the packet speech decoding method of any one of claims 1, 2, and 3: when the interpolated speech has pitch periodicity and the speech of the delayed packet is to be connected, the interpolated speech section runs from the start time of the interpolated speech to a time an integer multiple of the pitch period later (invention of claim 4); when past decoded speech is needed to decode a packet, the speech information immediately preceding the interpolated speech signal is used to decode the late packet (invention of claim 5); and the interpolated speech signal and the decoded speech signal of the delayed packet are connected by multiplying each by an interpolation window function and adding them (invention of claim 6).

[0016]

[Operation] In the invention of claim 1, when a packet is late, interpolated speech generated from the coded speech information of previously arrived packets is output until the packet arrives and is decoded, so the speech is not interrupted and the quality degradation caused by interruption is prevented. When the late speech packet arrives, its decoded speech signal is connected, so the speech content is reliably reproduced without losing phonemes. Moreover, because the decoded speech signal of the late packet is time-axis compressed, speech delay does not accumulate.

[0017] In the invention of claim 2, when a packet is late, interpolated speech generated from the coded speech information of previously arrived packets is output until the packet arrives and is decoded, so the speech is not interrupted and the quality degradation caused by interruption is prevented. When the late speech packet arrives, its decoded speech signal is connected, so the speech content in that packet is reliably reproduced. Furthermore, because silent sections of the speech signal following this decoded speech signal are time-axis compressed, speech delay does not accumulate.

[0018] In the invention of claim 3, when a packet is late, interpolated speech generated from the coded speech information of previously arrived packets is output until the packet arrives and is decoded, so the speech is not interrupted and the quality degradation caused by interruption is prevented. When the late speech packet arrives, its decoded speech signal is connected, so the speech content in that packet is reliably reproduced. Furthermore, because time-axis compression is applied in both the silent and the voiced sections of the speech signal following this decoded speech signal, speech delay does not accumulate.

[0019] In the packet speech decoding method of any one of claims 1, 2, and 3, the invention of claim 4 operates as follows. When the interpolated speech has pitch periodicity and the speech of the delayed packet is to be connected, the interpolated speech runs from its start time to a time an integer multiple of the pitch period later. The waveform at the start of the interpolated speech and the waveform at the end of interpolation therefore fall at the same position within one pitch period, so connecting the decoded speech of the delayed packet afterwards does not produce a discontinuity at the connection boundary. In the invention of claim 5, when past decoded speech is needed to decode the delayed packet, decoding uses the speech information immediately preceding the interpolation. Although speech interpolation is performed only on the receiving side, it therefore does not affect subsequent speech decoding, and the subsequent decoded speech has the same waveform as on the transmitting side. In the invention of claim 6, the interpolated speech and the decoded speech of the delayed packet are connected after being multiplied by interpolation window functions. Even if the speech changes during interpolation, the two are added with continuously varying weights, so the discontinuity at the connection boundary is reduced.
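The ideas of claims 4 and 6 can be illustrated with a small sketch. Both helpers are hypothetical: `interpolation_length` rounds the gap up to a whole number of pitch periods (rounding up rather than down is an assumption, since the text only requires an integer multiple), and `crossfade` uses a linear window, which is one possible choice of interpolation window function and is not specified by the source.

```python
def interpolation_length(gap, pitch_period):
    """Length of the interpolated section in samples: the smallest whole
    number of pitch periods that covers `gap` samples (claim 4 idea)."""
    if pitch_period <= 0:
        raise ValueError("pitch_period must be positive")
    periods = (gap + pitch_period - 1) // pitch_period   # integer ceiling
    return periods * pitch_period


def crossfade(interp_tail, decoded_head):
    """Join interpolated speech to decoded speech by weighting both with
    complementary linear windows and adding them (claim 6 idea)."""
    n = len(interp_tail)
    if len(decoded_head) != n:
        raise ValueError("both segments must have the same length")
    out = []
    for i in range(n):
        w = (i + 1) / (n + 1)        # weight of the decoded speech, rising 0 -> 1
        out.append((1.0 - w) * interp_tail[i] + w * decoded_head[i])
    return out
```

With a 20-sample pitch period, a 50-sample gap is extended to 60 samples so that the interpolated waveform ends at the same pitch phase at which it began.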

[0020]

[Embodiments] Embodiment of Claim 1

FIG. 1 shows the receiving-side block configuration of speech packet communication to which an embodiment of the invention of claim 1 is applied, with parts corresponding to those of FIG. 15 given the same reference numerals. In FIG. 1, packets received at terminal 6 are stored in reception buffer 7 and rearranged into transmission order. As shown in the flowchart of FIG. 2, when a speech packet is received (S₁), control unit 30 determines whether the speech packet to be decoded is late (S₂). Packets from reception buffer 7 are decoded in order by decoding unit 8 to generate decoded speech signals. For a packet that is not delayed, the decoded speech signal is sent through the N contacts of switches 33 and 34 to output buffer 9, and the speech signal is output from terminal 10 via output buffer 9. In the flowchart of FIG. 2, a packet that is not late is decoded (S₃) and buffered (S₄), and the speech is then output (S₅).

[0021] If control unit 30 determines that the packet is delayed, interpolation unit 31 performs speech interpolation until the delayed packet arrives and is decoded, as shown in FIG. 2 (S₆). The interpolation here uses the repetition process based on waveform pitch period extraction described in the Prior Art section or, for CELP-type coding, reuses the previously transmitted parameters.

[0022] This speech interpolation continues until the delayed packet arrives (S₇). When it arrives, the delayed packet is decoded (S₈), and the decoded speech signal is time-axis compressed by time-axis compression unit 32. The compressed signal is output through the A contacts of switches 33 and 34 to output buffer 9 (S₄) until the time at which the decoded speech signal would have ended had the packet arrived without delay, and is output from terminal 10 following the interpolated speech (S₅).

[0023] As the time-axis compression method used here, one option is the Time Domain Harmonic Scaling (TDHS) algorithm in D. Malah, "Time-Domain Algorithms for Harmonic Bandwidth Reduction and Time Scaling of Speech Signals", IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. ASSP-27, No. 2, pp. 121-133 (1979). Another is the similar algorithm in Morita and Itakura, "Time-scale expansion and compression of speech by the autocorrelation method and its evaluation", IEICE Technical Report on Electroacoustics EA86-5 (1986). These methods apply weighting windows to the preceding and following waveforms in units of the pitch period and overlap the sections to compress the time axis. FIG. 3 shows 2:1 compression by the TDHS algorithm. First, pitch period T_p is obtained from the speech signal shown in FIG. 3(a). Next, as shown for example in FIG. 3(b), the two pitch periods of the speech signal in FIG. 3(a) are multiplied by a weighting window function that rises linearly from 0 to 1 going from time t₁ toward times t₀ and t₂, each one pitch period T_p away from t₁. The speech waveforms of the sections t₀ to t₁ and t₁ to t₂ thereby become the waveforms of FIG. 3(c). These two waveforms are overlapped and added to obtain the time-axis compressed speech signal of one pitch period T_p shown in FIG. 3(d). For sections that have no pitch period, time-axis compression is likewise performed by overlapping at a suitable period.
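The 2:1 overlap-add step of FIG. 3 can be sketched as below. This is a simplified illustration of the windowing described in the text, not a full TDHS implementation: the pitch period is assumed to be known, only one pair of pitch periods is processed, and the name `tdhs_compress_2to1` is hypothetical.

```python
def tdhs_compress_2to1(samples, pitch_period):
    """Compress two consecutive pitch periods into one by overlap-add.

    The first period (t0..t1) is weighted by a window falling linearly
    from 1 to 0, the second (t1..t2) by a window rising from 0 to 1,
    and the two weighted periods are added sample by sample.
    """
    if pitch_period <= 0 or len(samples) < 2 * pitch_period:
        raise ValueError("need at least two pitch periods of samples")
    first = samples[:pitch_period]
    second = samples[pitch_period:2 * pitch_period]
    out = []
    for n in range(pitch_period):
        w_rise = n / pitch_period            # weight of the second period
        out.append((1.0 - w_rise) * first[n] + w_rise * second[n])
    return out
```

For a perfectly periodic input such as `[1, 2, 3, 4, 1, 2, 3, 4]` with a pitch period of 4, the output is a single period 1, 2, 3, 4, since both weighted copies carry the same waveform and the weights sum to 1 at every sample.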

[0024] As shown in FIG. 4(a), when packets P₄ and P₈ are late as in FIG. 12(a), the output speech signal produced by the processing of FIGS. 1 and 2 is as shown in FIG. 4(b). Because packet P₄ is late, decoded speech signal V₄ of packet P₄ is not ready at time t_3e, when decoded speech signal V₃ of packet P₃ ends, so interpolated speech signal V₃′ generated from the preceding speech signal is output continuously after V₃. Then, in this example, decoded speech signal V₄ of packet P₄ is obtained at time t₅, and its time-compressed speech signal V₄* is output continuously after interpolated speech signal V₃′. At time t_4e, when decoded speech signal V₄ would have ended had the late packet P₄ arrived at its expected time, output of compressed speech signal V₄* stops and decoded speech signal V₅ of the next packet P₅ is output from t_4e. In other words, in this example, when packet P₄ is later than its expected arrival time by a predetermined time or more, decoded speech signal V₃ of packet P₃ ends before V₄ is ready at end time t_3e, so an interpolated speech signal is output. When decoded speech signal V₄ of the late packet P₄ is obtained before t_4e, the time at which V₄ would have ended had P₄ not been late, compressed speech signal V₄* of V₄ is output from the moment it is obtained until t_4e.

[0025] In this case, the insertion section t₅ to t_4e of compressed speech signal V₄* is preferably arranged so that the start of a one-pitch-period (T_P) segment of V₄* coincides with t₅, or the end of a one-pitch-period segment of V₄* coincides with t_4e. Similarly, at time t_7e, when decoded speech signal V₇ of packet P₇ ends, the decoded speech signal of packet P₈ is not ready, so the gap is filled with interpolated speech signal V₇′. Once delayed packet P₈ is received, compressed speech signal V₈* of decoded speech signal V₈ is output, in this example from just after time t₉ until t_8e, the time at which the decoded speech signal of P₈ would have ended had it been received without delay. In this way the interpolated speech eliminates interrupted sections, and because compressed speech signals V₄* and V₈* are output, the phoneme content of decoded speech signals V₄ and V₈ is not lost. Also, since the total duration of interpolated speech signal V₃′ and compressed speech signal V₄* equals the decoded speech signal length of one packet, the final output speech is not delayed, and conversational speech communication is possible.
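The timing relation in the claim 1 embodiment (interpolated speech plus compressed speech together fill exactly one frame slot) can be checked with a short sketch. The function name `claim1_segments` and its return layout are hypothetical, pitch alignment of the segment boundaries is ignored, and packets that become ready only after the slot has ended are simply rejected, since their handling is outside what this sketch illustrates.

```python
def claim1_segments(t_prev_end, t_ready, frame_len):
    """Plan the output for one frame slot under the claim 1 scheme.

    t_prev_end -- time the previous decoded frame ends (e.g. t_3e)
    t_ready    -- time the late packet's decoded speech is available (e.g. t5)
    frame_len  -- duration of one packet's decoded speech
    Returns (segments, compression_ratio); segments are (kind, start, end).
    """
    t_slot_end = t_prev_end + frame_len          # e.g. t_4e
    if t_ready <= t_prev_end:                    # on time: play normally
        return [("decoded", t_prev_end, t_slot_end)], 1.0
    if t_ready >= t_slot_end:
        raise ValueError("packet too late for this sketch")
    remaining = t_slot_end - t_ready             # time left for the real speech
    ratio = frame_len / remaining                # frame_len squeezed into remaining
    segments = [("interpolated", t_prev_end, t_ready),
                ("compressed", t_ready, t_slot_end)]
    return segments, ratio
```

For a 20-unit frame whose packet becomes available 10 units late, the slot is split into 10 units of interpolated speech and 10 units of compressed speech, which requires 2:1 compression and leaves the following frame on schedule.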

[0026] Embodiment of Claim 2

FIG. 5 shows the receiving-side block diagram of speech packet communication to which an embodiment of the invention of claim 2 is applied. In this case the output of decoding unit 8 is connected to interpolation unit 41, silent section detection unit 42, silent section time-axis compression unit 44, and contact N of the switch, and the output of silent section time-axis compression unit 44 is connected to contact A of switch 44. As shown in the flowchart of FIG. 6, control unit 40 receives a speech packet (S₁) and then determines whether the speech packet about to be decoded is late (S₂). If it is not a delayed packet, it is decoded (S₃) and the decoded speech signal is checked for a silent section (S₄). If it is not a silent section, the decoded speech signal is sent to output buffer 9 (S₅). If there is a silent section, it is checked whether compression is needed (S₆). If compression is not needed, the decoded speech signal is sent through the N contacts of switches 43 and 45 to output buffer 9 (S₅) and output to output terminal 10 (S₇).

【００２７】[0027]

If the packet is found to be a delayed packet in step S₂, speech interpolation is performed until the delayed packet arrives and is decoded (S₈, S₉). This interpolation is performed either by the repetition processing based on waveform pitch period extraction described in the Related Art section or, for CELP-type coders, by reusing the previously transmitted parameters. When the delayed packet arrives during the interpolation, it is decoded (S₁₀), and its decoded speech is output from terminal 10 via the output buffer 9, following the interpolated speech signal. At this point the output speech has no gap, but the output lags by the time spent on interpolation. Therefore, the silent section detection unit 42 detects silent sections in the decoded speech signal, and when a silent section is detected (S₄) and compression is required (S₆), the silent section time axis compression unit 44 compresses the silent decoded speech signal by the time spent on interpolation (S₁₁). This eliminates the output delay.
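The flow of steps S₂ to S₁₁ can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `play_out`, the `late_by` field, and the trivial gap filler (cyclic repetition of the previous frame, standing in for pitch-based or CELP-parameter interpolation) are assumptions made only for illustration.

```python
def play_out(packets, is_silent):
    """Simulate the receive loop on frames that are already decoded.

    packets: list of (frame, late_by). late_by is the number of samples
    that had to be interpolated before the frame became available
    (0 for a packet that arrived on time). Hypothetical representation.
    """
    out = []
    debt = 0          # interpolated samples not yet recovered
    prev = [0.0]      # last frame played, used as the interpolation source
    for frame, late_by in packets:
        if late_by > 0:
            # S8/S9: fill the gap (stand-in for real speech interpolation)
            out.extend(prev[i % len(prev)] for i in range(late_by))
            debt += late_by
            # S10: the delayed frame itself is then played in full
        elif debt > 0 and is_silent(frame):
            # S4/S6/S11: shorten a later silent frame to repay the debt
            cut = min(debt, len(frame))
            frame = frame[cut:]
            debt -= cut
        out.extend(frame)  # S5: hand the frame to the output buffer
        if frame:
            prev = list(frame)
    return out, debt
```

When `debt` returns to zero, the total output length equals what it would have been with no delay, which is the sense in which the output delay is eliminated.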

【００２８】[0028]

For silent section detection, if the transmitted packet already carries an identifier indicating whether or not it is silent, that identifier is used. If there is no identifier, the receiving side judges a frame to be a silent section when, for example, the power ratio (P_C / P_V) between the power P_C of the current frame and the average power P_V of voiced sections is at or below a fixed threshold. For time axis compression of a silent section, it suffices to cut the time required for compression directly out of the decoded speech signal and join the silent sections before and after the cut. If the silent section contains background noise or the like, the time axis compression shown in FIG. 3 may be used with a predetermined fixed period in place of the pitch period T_P, overlapping the segments with weighting windows. If the silent section of one packet is short compared with the duration of the interpolated speech, applying silent section compression across several sections lowers the compression ratio in each section and causes little speech degradation.
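The power-ratio test described above can be sketched as follows. The helper names, the threshold value of 0.01, and the choice to treat a frame as non-silent when no voiced reference exists yet are assumptions for illustration; the patent only states that the ratio P_C / P_V is compared with a fixed threshold.

```python
def frame_power(frame):
    """Mean squared amplitude of one frame (P_C in the text)."""
    return sum(x * x for x in frame) / len(frame)


def is_silent(frame, voiced_avg_power, ratio_threshold=0.01):
    """Return True when P_C / P_V is at or below the threshold.

    voiced_avg_power plays the role of P_V. The default threshold is a
    hypothetical value, not one given in the source.
    """
    if voiced_avg_power <= 0.0:
        return False  # no voiced reference yet: assume not silent
    return frame_power(frame) / voiced_avg_power <= ratio_threshold
```

A receiver would typically update `voiced_avg_power` only from frames classified as voiced, so that long pauses do not drag the reference level down.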

【００２９】[0029]

FIG. 4(c) shows the output speech timing of this embodiment for the received packets of FIG. 4(a). Here, because packet P₄ is delayed, the interpolated speech signal V₃′ is inserted from time t_3e. When the decoded speech signal V₄ of the delayed packet P₄ is obtained at time t₅, it is immediately appended to the interpolated speech signal V₃′ and output in full. The silent sections in the subsequent decoded speech signals V₅ and V₆ are then compressed, giving signals V₅♯ and V₆♯ that are shorter than V₅ and V₆, which absorbs the length of the interpolated speech signal V₃′. Similarly, the interpolated signal V₇′ is inserted because of the delay of packet P₈, and when the decoded signal V₈ of packet P₈ is obtained, all of V₈ is appended to V₇′. The immediately following decoded speech signal V₉ contains no silent section, so the silent section in the next decoded speech signal V₁₀ is compressed by the length of the interpolated signal V₇′, giving the compressed speech signal V₁₀♯. In this way the interpolated speech signal leaves no gap, and the decoded speech signals V₄ and V₈ of the delayed packets P₄ and P₈ are output unchanged, so no phonemic content is lost. Moreover, because the time taken by the interpolated speech signals V₃′ and V₇′ equals the silent section compression time of the speech signals V₅♯, V₆♯, and V₁₀♯, there is ultimately no delay in the output speech, and real-time spoken conversation is possible.

【００３０】[0030]

Embodiment of claim 3

FIG. 7 is a block diagram of the receiving side of voice packet communication to which an embodiment of the invention of claim 3 is applied. In FIG. 7, the output side of the decoding unit 8 is connected to a silent/voiced section determination unit 52, a silent section time axis compression unit 54, a voiced section time axis compression unit 55, and contact N of a switch 53. The output sides of the silent section time axis compression unit 54 and the voiced section time axis compression unit 55 are connected to contacts A₁ and A₂ of the switch 53, respectively. As shown in the flowchart of FIG. 8, when a packet is received (S₁), the control unit 50 determines whether the voice packet about to be decoded is delayed (S₂). If it is not a delayed packet, it is decoded to generate a decoded speech signal (S₃), and it is determined whether that signal is a silent section (S₄). Whether the section is silent or voiced, it is then checked whether compression is required (S₅, S₆). If no compression is required, the decoded speech signal is sent to the output buffer 9 (S₇) and output from terminal 10 via the output buffer 9 (S₈).

【００３１】[0031]

If the packet is found to be a delayed packet in step S₂, speech interpolation is performed until the delayed packet arrives and its speech signal is decoded (S₉, S₁₀). This interpolation uses either the repetition processing based on waveform pitch period extraction described in the Related Art section or, for CELP-type coders, the previously transmitted parameters, used repeatedly. When the delayed packet arrives during the interpolation and is decoded (S₁₁), its decoded speech signal is output from terminal 10 via the output buffer 9, following the interpolated speech signal. With this processing alone the output speech has no gap, but the output lags by the time spent on interpolation. Therefore, the silent/voiced section determination unit 52 classifies the decoded speech as silent or voiced. For a speech signal judged to be silent (S₄), if compression is required (S₆), the silent section time axis compression unit 54 compresses it to recover the time spent on interpolation (S₁₂). For a speech signal judged to be voiced in step S₄, if compression is required (S₅), the voiced section time axis compression unit 55 compresses it to recover the time spent on interpolation (S₁₃). This eliminates the output delay.

【００３２】[0032]

If the transmitted packet already carries an identifier indicating whether it is silent or voiced, the silent/voiced section determination unit 52 uses that identifier. If there is no identifier, the receiving side treats a section as silent when, for example, the ratio (P_C / P_V) between the power P_C of the current section and the average power P_V of voiced sections is at or below a fixed threshold, and as voiced otherwise. For time axis compression of a silent section, it suffices to cut the time required for compression directly out of the decoded speech signal and join the silent sections before and after the cut. If the silent section contains background noise or the like, the time axis compression shown in FIG. 3 may be used with a predetermined fixed period in place of the pitch period T_P, overlapping the segments with weighting windows. For voiced sections, the TDHS time axis compression method shown in FIG. 3 and described in the embodiment of the invention of claim 1 is used here.
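The two silent-section compression options mentioned above can be sketched in Python. Both functions are illustrative assumptions rather than the patent's implementation: `cut_silence` removes samples from the middle of a frame and joins the remainder, and `overlap_add_shorten` merges two adjacent fixed-length segments with linear weighting windows, in the spirit of the FIG. 3 scheme with a predetermined period in place of T_P. The linear window shape is a choice made for this sketch.

```python
def cut_silence(frame, n_cut):
    """Cut n_cut samples out of the middle and join what is left."""
    n_cut = min(n_cut, len(frame))
    start = (len(frame) - n_cut) // 2
    return list(frame[:start]) + list(frame[start + n_cut:])


def overlap_add_shorten(frame, period):
    """Shorten a frame by `period` samples using weighted overlap-add.

    The first segment is faded out while the second is faded in, so the
    merged segment starts like the first and ends like the second.
    """
    if len(frame) < 2 * period:
        return list(frame)  # too short to merge two segments
    merged = []
    for i in range(period):
        w_first = 1.0 - i / period          # decreasing window
        w_second = 1.0 - w_first            # increasing window
        merged.append(w_first * frame[i] + w_second * frame[period + i])
    return merged + list(frame[2 * period:])
```

Plain cutting is adequate for true silence, while the overlap-add variant avoids an audible step when the section contains background noise.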

【００３３】[0033]

When the interpolation time is long and only a short compression time, compared with the duration of the interpolated speech, can be obtained in one subsequent section (the decoded speech signal period of one packet), applying silent/voiced section compression across several sections lowers the proportion of time to be compressed in each section, that is, the compression ratio, and allows processing with little speech degradation. FIG. 4(d) shows the output speech timing of this embodiment for the received packets of FIG. 4(a). Parts corresponding to FIG. 4(c) bear the same reference symbols. The decoded speech signal V₄ of packet P₄ is connected to the interpolated speech signal V₃′, but in this example the voiced signal of V₄ is time-axis compressed and the compressed signal V₄* is connected. Unlike FIG. 4(b), however, the compressed signal of V₄ is used in its entirety without being cut off partway. This compression time alone falls short of the length of the interpolated signal V₃′, so the silent sections in the subsequent decoded speech signals V₅ and V₆ are compressed and connected in turn as the silence-compressed signals V₅♯ and V₆♯. The total of the compression times of the voiced compressed signal V₄* and the silence-compressed signals V₅♯ and V₆♯ is made equal to the length of the interpolated signal V₃′. Similarly, of the decoded speech signals V₈, V₉, and V₁₀ that follow the interpolated signal V₇′, V₈ and V₉ are voiced time-axis compressed into V₈* and V₉*, and V₁₀ has its silent section compressed into V₁₀♯; the total compression time of these three decoded speech signals is made equal to the length of the interpolated signal V₇′. Here too, the voiced compressed signals V₈* and V₉* use the whole of the compressed versions of V₈ and V₉ without truncation.
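Spreading the required compression over several sections, as in the V₄*, V₅♯, V₆♯ example, amounts to a small allocation problem. The sketch below is one possible way to do it and is not described in the patent: the function name `spread_compression` and the per-section cap `max_ratio` are assumptions for illustration.

```python
def spread_compression(debt, section_lengths, max_ratio=0.5):
    """Split `debt` samples of compression across consecutive sections.

    Each section is asked for an even share of what is still owed, but
    never more than max_ratio of its own length (hypothetical cap).
    Returns (cuts, remaining), where remaining is debt left unpaid.
    """
    cuts = []
    remaining = debt
    for i, length in enumerate(section_lengths):
        sections_left = len(section_lengths) - i
        share = -(-remaining // sections_left)   # ceiling division
        cut = min(share, int(length * max_ratio), remaining)
        cuts.append(cut)
        remaining -= cut
    return cuts, remaining
```

For example, 90 samples of interpolation spread over three 160-sample sections costs each section 30 samples, a far lower compression ratio than removing all 90 from a single section.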

【００３４】[0034]

In this way the interpolated speech signal leaves no gap, and the delayed packets are output as V₄* and V₈*, so no phonemic content is lost. In addition, the time taken by the interpolated speech signals V₃′ and V₇′ is made equal to the total silent/voiced section compression time of the signals V₅♯, V₆♯, V₁₀♯ and V₄*, V₈*, V₉*, so there is ultimately no delay in the output speech, and real-time spoken conversation is possible.

【００３５】[0035]

Embodiments of the other claims

The switching/connecting units 34, 45, and 56 in FIGS. 1, 5, and 7 have been shown as simple changeover switches, but the interpolated speech signal and the decoded speech signal of the delayed packet may also be connected as follows. In the invention of claim 4, when the interpolated speech signal has pitch periodicity, the interpolated speech signal V_K′ is made to run, as shown for example in FIGS. 9(a) and 9(b), from its start time t₀ to a time t₁ that is an integer multiple of the pitch period T_P later (T_i = n × T_P; n = 2 in the example). The waveform at the interpolation start time t₀ and the waveform at the interpolation end time t₁ then correspond to the same position within one pitch period, so connecting the decoded speech signal V_{K+1} of the delayed packet after t₁ does not produce a large discontinuity at the connection boundary t₁ such as that shown in FIG. 10(c).
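The rule T_i = n × T_P can be sketched as follows. The function name and the policy of rounding the waiting time up to the next whole pitch period are assumptions for illustration; the patent only requires that the interpolated section be an integer multiple of the pitch period.

```python
import math


def interpolate_by_pitch(history, pitch_period, gap_len):
    """Repeat the last pitch cycle of `history` for n whole periods.

    history: list of past output samples, at least pitch_period long.
    gap_len: samples elapsed before the delayed packet became available.
    n is the smallest integer with n * pitch_period >= gap_len, so the
    interpolation always ends on a pitch-period boundary.
    """
    n_periods = max(1, math.ceil(gap_len / pitch_period))
    cycle = list(history[-pitch_period:])
    return cycle * n_periods
```

Because the returned signal always contains whole cycles, its end falls at the same pitch phase as its start, which is the property the text relies on at the boundary t₁.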

【００３６】[0036]

When generating the decoded speech signal V_{K+1} of the delayed packet requires past decoded speech, the interpolated signal V_K′ is not used. Instead, on the assumption that there was no delay and that V_{K+1} directly follows the interpolation start time t₀, decoding uses the speech information immediately before that time, namely the decoded speech signal V_K (invention of claim 5). In this way, even if a packet is delayed on the receiving side, subsequent speech decoding is unaffected by the interpolated speech signal, and the same speech as on the transmitting side can be output.
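The idea of decoding V_{K+1} from the state that existed at t₀ can be illustrated with a toy stateful decoder. `DeltaDecoder` is a hypothetical differential decoder invented for this sketch and does not correspond to any real codec; the point is only the snapshot taken before interpolation and restored before the delayed packet is decoded.

```python
class DeltaDecoder:
    """Toy decoder: each code is a difference from the previous sample."""

    def __init__(self):
        self.last = 0

    def decode(self, codes):
        out = []
        for c in codes:
            self.last += c
            out.append(self.last)
        return out

    def snapshot(self):
        return self.last

    def restore(self, state):
        self.last = state


def decode_with_late_packet(packet_k, packet_k1):
    dec = DeltaDecoder()
    v_k = dec.decode(packet_k)
    saved = dec.snapshot()             # state at t0, right after V_K
    v_interp = dec.decode(packet_k)    # stand-in interpolation: reuse parameters
    dec.restore(saved)                 # discard the effect of interpolation
    v_k1 = dec.decode(packet_k1)       # decoded as if no delay had occurred
    return v_k, v_interp, v_k1
```

Without the restore step the decoder state would have drifted during interpolation, and V_{K+1} would no longer match what the transmitting side encoded.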

【００３７】[0037]

Furthermore, as shown in FIGS. 9(a) and 9(b), the interpolated speech signal V_K′ and the decoded speech V_{K+1} of the delayed packet may be connected by multiplying each by an interpolation window function and adding the results. The window applied to the interpolated speech signal V_K′, the signal being connected to, decreases gradually from the connection time t₁; conversely, the window applied to the decoded speech signal V_{K+1}, the connecting signal, rises gradually to 1 from the connection time t₁. Because the two signals are added with continuously varying weights, the discontinuity at the connection boundary t₁ is weakened even if the speech signal changed during interpolation, and quality degradation due to the connection is suppressed (invention of claim 6).
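The windowed connection can be sketched as a cross-fade over the samples that follow t₁. The linear window shape and the function name `crossfade` are assumptions for illustration; the patent specifies only that one window decreases gradually and the other rises gradually to 1.

```python
def crossfade(interp_tail, decoded_head):
    """Weighted sum of the two signals over the overlap after t1.

    interp_tail: interpolated signal continued past t1.
    decoded_head: start of the delayed packet's decoded signal.
    The decoded signal's weight rises linearly to 1 while the
    interpolated signal's weight falls to 0.
    """
    n = min(len(interp_tail), len(decoded_head))
    out = []
    for i in range(n):
        w_decoded = (i + 1) / n
        w_interp = 1.0 - w_decoded
        out.append(w_interp * interp_tail[i] + w_decoded * decoded_head[i])
    return out
```

After the overlap, playback continues with the remaining samples of the decoded signal at full weight.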

【００３８】[0038]

[Effects of the Invention] As described above, in this invention, if the packet delay is within a certain time limit, the speech signal is interpolated for the duration of the delay from the decoded speech signal of the previously arrived packet, and the decoded speech signal of the delayed packet is then connected. At that time, time axis compression is applied to that decoded speech signal itself, to subsequent silent sections, or to subsequent silent and voiced sections. This realizes a packet speech decoding method that avoids losing the phonemes of the delayed frame, outputs a smooth interpolated speech signal, and does not let the time delay grow, and the effect is extremely large.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an example of a receiving apparatus to which the invention of claim 1 is applied.

FIG. 2 is a flowchart showing an example of the reception decoding procedure of that apparatus.

FIG. 3 is a waveform diagram for explaining time axis compression processing.

FIG. 4 is a diagram showing output examples (b), (c), and (d) of the speech signal according to the inventions of claims 1, 2, and 3 for a received packet example (a).

FIG. 5 is a block diagram showing an example of a receiving apparatus to which the invention of claim 2 is applied.

FIG. 6 is a flowchart showing an example of the reception decoding procedure of that apparatus.

FIG. 7 is a block diagram showing an example of a receiving apparatus to which the invention of claim 3 is applied.

FIG. 8 is a flowchart showing an example of the reception decoding procedure of that apparatus.

FIG. 9 is a waveform diagram for explaining the inventions of claims 4 to 6, which are methods of connecting the interpolated speech signal and the decoded speech signal of a delayed packet.

FIG. 10 is a block diagram showing the general configuration of a packet transmission and reception system for speech signals.

FIG. 11 is a diagram showing the relationship among the transmitted packets, the received packets, and the decoded speech signal.

FIG. 12 is a diagram showing the relationship between a delayed packet and the various conventional decoded speech signals for it.

FIG. 13 is a flowchart showing a conventional received packet decoding procedure.

FIG. 14 is a flowchart showing a conventional decoding procedure that handles delayed packets.

FIG. 15 is a block diagram showing a conventional decoding device that performs speech interpolation for delayed packets.

FIG. 16 is a flowchart showing the processing procedure of that conventional device.

FIG. 17 is a waveform diagram for explaining the conventional connection between an interpolated speech signal and the decoded speech signal of a delayed packet.

Continuation of the front page

(56) References: JP-A-54-139417 (JP, A); JP-A-2-183648 (JP, A); JP-A-5-88697 (JP, A); JP-A-4-219797 (JP, A)

(58) Fields searched (Int. Cl.⁷, DB name): G10L 19/00

Claims

(57) [Claims]

1. A packet speech decoding method in which, when a packet is delayed by a predetermined time or more from its scheduled arrival time, a speech signal is generated by interpolation processing from the encoded information of a previously arrived packet and is output continuously with the speech signal output up to that time, wherein, when the delayed packet is received before the scheduled arrival time of the next packet, the decoded speech of the delayed packet is compressed on the time axis and is output continuously with the interpolated speech signal up to the final time at which its decoded speech would have been output had the packet arrived at its original scheduled arrival time.

2. A packet speech decoding method in which, when a packet is delayed by a predetermined time or more from its scheduled arrival time, a speech signal is generated by interpolation processing from the encoded information of a previously arrived packet and is output continuously with the speech signal output up to that time, wherein, when the delayed packet is received within a predetermined time, the decoded speech signal of the delayed packet is output following the interpolated speech signal, and silent sections in the decoded speech signals of the packets after that decoded speech signal are compressed by the time length of the interpolated speech signal.

3. The packet speech decoding method according to claim 2, wherein voiced sections in the decoded speech signals of the packets from the output of the decoded speech signal of the delayed packet following the interpolated speech signal onward are compressed on the time axis, and the sum of this compression and the compression of the silent sections is made equal to the time length of the interpolated speech signal.

4. The packet speech decoding method according to claim 1, wherein, when the interpolated speech signal has pitch periodicity, the section of the interpolated speech signal is set to an integer multiple of the pitch period.

5. The packet speech decoding method according to any one of the preceding claims, wherein, when decoding a packet requires past speech information, the speech information immediately before the interpolated speech signal is used for decoding the delayed packet.

6. The packet speech decoding method according to claim 1, wherein the interpolated speech signal and the decoded speech signal of the delayed packet are connected by multiplying each of them by an interpolation window function and adding the results.