JP2006171353A

JP2006171353A - Voice decoding system

Info

Publication number: JP2006171353A
Application number: JP2004363680A
Authority: JP
Inventors: Masahiro Omine; 正寛大峯; Yoshiaki Nozawa; 善明野澤
Original assignee: NEC Engineering Ltd
Current assignee: NEC Engineering Ltd
Priority date: 2004-12-15
Filing date: 2004-12-15
Publication date: 2006-06-29

Abstract

PROBLEM TO BE SOLVED: To achieve both temporal continuity of a voice signal and low delay.

SOLUTION: An arrival time recorder 50 records the arrival time of each voice code packet, and a jitter counter 60 calculates a jitter value from the recorded times. Based on the jitter value, a readout controller 70 instructs a jitter buffer 10 either to skip a packet or to stop reading. A voice decoder 20 decodes the voice code packets supplied from the jitter buffer 10. A compensation voice signal generating section 30 generates a compensation voice signal for use when packet skip-read control starts, and a repetition voice signal generating section 40 generates a voice signal for repeated reproduction for use when read-stop control starts. A selector 80 receives a notice from the readout controller 70 and selects and outputs the voice signal to be reproduced.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a speech decoding system that receives, from a transmission line, a speech signal encoded by a speech encoder, decodes it, and outputs it.

In a speech communication system that receives packets of a compression-encoded speech signal (speech code) over a transmission line and decodes them, congestion on the transmission line or similar causes can disturb the packet arrival intervals and produce jitter. The decoding timing at the decoder then ceases to be constant, and the decoded speech signal becomes temporally discontinuous.

The usual countermeasure is to place a storage device called a jitter buffer between the transmission line and the decoder. The jitter buffer holds a certain number of speech code packets and supplies them to the decoder at fixed time intervals, and the decoder decodes the speech code.

However, when the jitter is large and all of the speech code packets stored in the jitter buffer are used up, the decoder has no speech code to decode before the next packet arrives, and speech communication breaks down. Increasing the capacity of the jitter buffer so that it holds more speech code packets improves tolerance to jitter. Storing too many packets in the jitter buffer is undesirable, however, because it increases the transfer delay of the speech code packets and impairs the real-time nature of speech communication.

To address this problem, a known technique extracts the silent portions of the received speech signal. To reduce the buffered amount it deletes a silent portion of the received speech signal, and to increase the buffered amount it inserts a silence pattern immediately after a silent portion. In this way it controls the buffered amount dynamically while suppressing noise in the decoded speech signal (see, for example, Patent Document 1).

Another known technique provides a fluctuation absorbing buffer that accumulates speech code packets supplied from the transmission line and a speech signal accumulation buffer that accumulates the speech signal obtained by decoding the speech code. When the amount of speech code stored in the fluctuation absorbing buffer falls below a lower limit, the technique interpolates additional speech signal into the speech signal accumulation buffer and at the same time delays reading of speech code from the fluctuation absorbing buffer, thereby maintaining the amount of speech code in that buffer. When the amount of speech code stored in the fluctuation absorbing buffer exceeds an upper limit, it thins out the speech signal held in the speech signal accumulation buffer and at the same time advances reading of speech code from the fluctuation absorbing buffer, thereby reducing the amount of speech code in that buffer (see, for example, Patent Document 2). This is said to suppress degradation of speech quality even when transmitting continuous speech in which silent sections appear infrequently.

Patent Document 1: JP 2002-271391 (pages 2 to 3, FIG. 2)
Patent Document 2: JP 2003-050598 (pages 2 to 4, FIG. 1)

However, in the technique of Patent Document 1, buffer control is triggered only when the received speech signal contains a silent portion. In a call made where background music is playing, or in an environment with a high noise level, no silent portion may occur for a long time, so speech communication may break down without the buffer amount control ever being triggered.

In the technique of Patent Document 2, the interpolation process uses linear interpolation: it computes the average of the past speech signal and the latest speech signal and uses that average as the interpolated speech signal, which preserves the temporal continuity of the decoded speech signal. The thinning process, however, simply removes an arbitrary number of samples from the speech signal, so the temporal continuity of the decoded speech signal is not preserved.

Furthermore, the fluctuation absorbing buffer control in this technique is intended to suppress jitter buffer overflow or underflow caused by a monotonic increase or decrease in the buffered amount, which arises because the operating clocks of the sender and receiver are not synchronized. It does not consider buffer failure due to jitter caused by congestion on the transmission line or the like. In other words, the technique presupposes that the fluctuation absorbing buffer is large enough not to overflow or underflow because of such jitter.

Moreover, the techniques of Patent Documents 1 and 2 both merely control the jitter buffer to prevent overflow or underflow. Neither actively reduces the number of speech code packets in the jitter buffer in order to lower the delay of speech communication.

Consequently, in a speech communication environment where packet arrival intervals are disturbed by congestion on the transmission line or the like, none of the above conventional techniques can both ensure the continuity of speech communication through jitter buffer control and lower the delay of speech communication.

Accordingly, an object of the present invention is to provide a speech decoding system that, in a speech communication environment where packet arrival intervals are disturbed by congestion on the transmission line or the like, ensures the continuity of speech communication by absorbing jitter and lowers the delay of speech communication through jitter buffer control.

To achieve this object, the present invention comprises: means for stamping an arrival time on each packet, calculating jitter from the stamped times, and controlling a jitter buffer based on the calculated jitter value; a speech decoder that decodes the speech code packets supplied from the jitter buffer and outputs a speech signal; a storage device that temporarily holds the decoded speech signal; means for generating a prediction signal from the decoded speech signal; means for generating a compensation speech signal using the decoded speech signal held in the storage device and the prediction signal; and means, operating in conjunction with the means for controlling the buffer memory, for controlling the output so that either the decoded speech signal or the compensation speech signal is output.

More specifically, the speech decoding system of the present invention decodes and outputs speech code packets input from a transmission line, and comprises: a jitter buffer (10 in FIG. 1) that holds speech code packets and outputs them under clock control; an arrival time stamper (50 in FIG. 1) that sequentially stamps the arrival times of the speech code packets; a jitter counter (60 in FIG. 1) that counts the arrival-time jitter from the differences between the times stamped by the arrival time stamper; a read controller (70 in FIG. 1) that causes the jitter buffer to skip a speech code packet when the jitter amount indicated by the jitter counter is sufficiently small relative to the amount of speech code packets held in the jitter buffer, and that stops reading of speech code packets from the jitter buffer when the amount of speech code packets held in the jitter buffer does not leave sufficient margin over the jitter amount indicated by the jitter counter; a speech decoder (20 in FIG. 1) that receives speech code packets from the jitter buffer under the control of the read controller, decodes them, and outputs a decoded speech signal; a compensation speech signal generation unit (30 in FIG. 1) that generates, from the decoded speech signal output by the speech decoder, a compensation speech signal for suppressing speech noise during packet skip control by the read controller; a duplication speech signal generation unit (40 in FIG. 1) that generates, from the decoded speech signal output by the speech decoder, a duplication speech signal for suppressing speech noise during packet read stop control by the read controller; and a selector (80 in FIG. 1) that selects and outputs, under the control of the read controller, one of the decoded speech signal output by the speech decoder, the compensation speech signal, and the duplication speech signal.

The compensation speech signal generation unit (30 in FIG. 1) has: a buffer memory (31 in FIG. 1) that holds the speech signal output by the speech decoder one clock earlier; a prediction signal generator (32 in FIG. 1) that uses the speech signal of one clock earlier stored in the buffer memory to predict the speech signal discarded by the skip; and an overlap calculator (33 in FIG. 1) that generates the compensation speech signal by performing an overlap calculation in which the weighting coefficient applied to the predicted speech signal generated by the prediction signal generator is large at the start of skip control and the weighting coefficient applied to the subsequent speech signal is large at the end of skip control.

The duplication speech signal generation unit (40 in FIG. 1) has: a buffer memory (41 in FIG. 1) that holds the speech signal output by the speech decoder one clock earlier; a prediction signal generator (42 in FIG. 1) that uses the speech signal of one clock earlier stored in the buffer memory to predict the speech signal discarded by the read stop; and an overlap calculator (43 in FIG. 1) that generates the duplication speech signal by performing an overlap calculation in which the weighting coefficient applied to the predicted speech signal generated by the prediction signal generator is large at the start of read stop control and the weighting coefficient applied to the speech signal of one clock earlier stored in the buffer memory is large at the end of read stop control.

According to the present invention, the number of packets stored in the jitter buffer is adapted to the jitter in the arrival times of the speech code packets. When the jitter is small, the number of stored packets is reduced, which lowers the transmission delay. When the jitter is large, the number of stored packets is increased, which prevents speech interruptions caused by jitter buffer underflow and preserves the real-time nature of speech communication.

In addition, when the number of packets stored in the jitter buffer is adaptively reduced or increased, the temporal discontinuity of the speech signal that accompanies jitter buffer control can also be reduced, which contributes greatly to improving the quality of real-time speech communication.

In the speech communication system of the present invention, an arrival time is stamped on each speech code packet supplied from the transmission line, and jitter is calculated from the stamped times. When the calculated jitter is small, the jitter buffer is instructed to skip a packet so as to reduce the number of packets it stores and thereby shorten the transfer delay. When the jitter is large, an instruction to stop reading packets is issued so as to maintain the number of packets stored in the jitter buffer and prevent speech communication from being interrupted by packet depletion in the buffer.

When skip control is triggered in the jitter buffer, the decoded speech signal becomes temporally discontinuous. To counter this, the compensation speech signal generation unit generates and outputs a compensation speech signal using the speech signal obtained by decoding the packet one time step before the skipped packet (held in the storage device inside the compensation speech signal generation unit) and a prediction signal generated using the speech signal decoded from the packet one time step after the skipped packet. This reduces the discontinuity in speech communication caused by skipping the packet.

Likewise, when read stop control is triggered in the jitter buffer, the decoded speech signal becomes temporally discontinuous. To counter this, the duplication speech signal generation unit generates and outputs a duplication speech signal using the speech signal decoded immediately before read stop control was triggered, which is held in the storage device inside the unit, and a prediction signal generated from that decoded speech signal. This reduces the discontinuity in speech communication caused by stopping packet reads.

FIG. 1 is a block diagram showing the configuration of one embodiment of the speech decoding system according to the present invention. The system receives an encoded input speech signal from a transmission line, decodes it, and outputs an output speech signal. It comprises a jitter buffer 10, a speech decoder 20, a compensation speech signal generation unit 30, a duplication speech signal generation unit 40, an arrival time stamper 50, a jitter counter 60, a read controller 70, and a selector 80.

The jitter buffer 10 holds a predetermined number of packets of the encoded speech signal (hereinafter, "speech code") supplied from the transmission line and sends them to the speech decoder 20 at fixed intervals in response to a clock from an external clock source (not shown).

The arrival time stamper 50 monitors the arrival of packets from the transmission line at the jitter buffer 10. When a packet arrives at the jitter buffer 10, it stamps the arrival time and passes the stamped time to the jitter counter 60.

The jitter counter 60 holds the time received from the arrival time stamper 50 and waits for the stamped time of the next arriving packet. On receiving it, the jitter counter 60 takes the difference between the stamped times of the previous and current packets and supplies this difference to the read controller 70 as the jitter. It then discards the older of the two stamped times it holds.
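The behavior of the jitter counter can be sketched as follows. This is a minimal illustration, not the patented implementation: the class and method names are hypothetical, and time stamps are assumed to be integers in milliseconds. As literally described, the value reported as jitter is the difference between consecutive stamped times; whether a practical design would also subtract the nominal packet interval is not stated in the text.

```python
from typing import Optional


class JitterCounter:
    """Hypothetical model of jitter counter 60: holds one time stamp at a time."""

    def __init__(self) -> None:
        # Stamped arrival time of the previously arrived packet (None before the first).
        self._previous: Optional[int] = None

    def on_arrival(self, stamped_time_ms: int) -> Optional[int]:
        """Return the difference to the previous stamped time, or None for the first packet."""
        previous = self._previous
        # Keeping only the newest time stamp discards the older one.
        self._previous = stamped_time_ms
        if previous is None:
            return None
        return stamped_time_ms - previous


counter = JitterCounter()
gaps = [counter.on_arrival(t) for t in (0, 20, 55, 60)]
print(gaps)
```

Each call corresponds to one packet arrival reported by the arrival time stamper.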

The read controller 70 holds the jitter value supplied from the jitter counter 60 and updates it to the latest value each time a new one is supplied. When the updated jitter is smaller than a predetermined value, the read controller 70 instructs the jitter buffer 10 to skip a packet in order to reduce the amount of speech code packets stored in the jitter buffer 10. At the same time, it notifies the selector 80 that skip control has been triggered.

When the updated jitter is larger than a predetermined value, the read controller 70 instructs the jitter buffer 10 to stop reading packets in order to prevent the speech code packets stored in the jitter buffer 10 from being depleted. At the same time, it notifies the selector 80 that read stop control has been triggered.
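The decision made by the read controller can be sketched as a simple threshold comparison. The names below are hypothetical, and the use of two separate thresholds is an assumption: the text only refers to "a predetermined value" in each case and does not say whether the two values are the same.

```python
from enum import Enum


class ReadCommand(Enum):
    NORMAL = "normal"  # read the next packet as usual
    SKIP = "skip"      # skip one packet to reduce the buffered amount
    STOP = "stop"      # stop reading to keep packets in the buffer


def decide(jitter: float, low_threshold: float, high_threshold: float) -> ReadCommand:
    """Map the latest jitter value to a command (thresholds are assumed parameters)."""
    if jitter < low_threshold:
        return ReadCommand.SKIP
    if jitter > high_threshold:
        return ReadCommand.STOP
    return ReadCommand.NORMAL


print(decide(5, low_threshold=10, high_threshold=40))
print(decide(25, low_threshold=10, high_threshold=40))
print(decide(60, low_threshold=10, high_threshold=40))
```

In the described system the same decision is also sent to the selector 80 as a notification, so that the output source switches together with the buffer control.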

The speech decoder 20 sequentially decodes the speech code packets supplied from the jitter buffer 10 and outputs the decoded speech signal to the selector 80, the compensation speech signal generation unit 30, and the duplication speech signal generation unit 40.

The compensation speech signal generation unit 30 generates a compensation speech signal for suppressing speech noise during packet skip control by the read controller 70 and outputs it to the selector 80. For this purpose it has a buffer memory 31, a prediction signal generator 32, and an overlap calculator 33.

The buffer memory 31 holds the speech signal output from the speech decoder 20 and, in response to the same clock as that of the jitter buffer 10, supplies the held speech signal to the prediction signal generator 32.

The prediction signal generator 32 generates a prediction signal from the speech signal supplied from the buffer memory 31 and supplies it to the overlap calculator 33. The prediction signal here is a speech signal generated by predicting, from the speech signal supplied from the buffer memory 31, the input speech signal at the next time step. The speech signal used for the prediction and the prediction signal are therefore temporally continuous. Linear prediction is a common way to generate the prediction signal, but any method that preserves temporal continuity between the speech signal and the prediction signal will do. Because the prediction method does not affect the essence of the present invention, no particular method is specified here.

A well-known example of a prediction method for speech signals is the one given in "G.711 APPENDIX I", defined as an extension of the international speech coding standard ITU-T G.711. Because a speech signal repeats similar waveforms periodically, this method takes the waveform of one repetition interval as a unit, compares it with the past speech signal, and predicts the speech signal using the waveform with the highest correlation.
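The idea of predicting from the most strongly correlated repetition interval can be sketched as below. This is a deliberately simplified illustration and not the procedure of G.711 Appendix I itself: the function names, the lag search range, and the scoring formula are all assumptions chosen for brevity.

```python
from typing import List, Sequence


def estimate_period(history: Sequence[float], min_lag: int, max_lag: int) -> int:
    """Return the lag in [min_lag, max_lag] at which the signal best matches its own past."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Pair each sample with the sample one candidate period earlier.
        pairs = [(history[i], history[i - lag]) for i in range(lag, len(history))]
        energy = sum(past * past for _, past in pairs)
        if energy == 0:
            continue
        # Correlation normalized by the energy of the delayed samples (assumed scoring).
        score = sum(now * past for now, past in pairs) / energy ** 0.5
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def predict_next(history: Sequence[float], length: int, min_lag: int, max_lag: int) -> List[float]:
    """Extend the signal by repeating its last estimated period."""
    period = estimate_period(history, min_lag, max_lag)
    last_cycle = list(history[-period:])
    return [last_cycle[i % period] for i in range(length)]


history = [0.0, 1.0, 0.0, -1.0] * 4  # toy signal with a period of 4 samples
prediction = predict_next(history, length=6, min_lag=2, max_lag=6)
print(prediction)
```

Because the predicted samples continue the last observed cycle, the prediction joins the preceding signal without a jump, which is the only property the description requires of the predictor.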

The overlap calculator 33 performs an overlap calculation on the speech signal output from the speech decoder 20 and the prediction signal input from the prediction signal generator 32, generates the compensation speech signal, and outputs it to the selector 80.

The overlap calculation here takes the sum of two speech signals (called speech signal X and speech signal Y here) while applying a window function that increases linearly from 0 to 1. Let X(t) be the value of speech signal X at time t, Y(t) the value of speech signal Y, and W(t) the window function. The speech signal XY(t) after the overlap calculation at time t is then given by Equation 1.

XY(t) = (1 - W(t)) * X(t) + W(t) * Y(t)   (0 ≤ t < N)   ... Equation 1

Here, time t takes values from 0 to N, where time 0 is the time of the first value of the speech signal and time N is the time of the last value, and the window function W(t) increases linearly from 0 to 1 over 0 ≤ t < N.
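Equation 1 can be sketched as a linear cross-fade from X to Y. The function name is hypothetical, and the exact window is an assumption: the text only says that W(t) increases linearly from 0 to 1, so W(t) = t / (N - 1) is used here so that the last sample is taken entirely from Y.

```python
from typing import List, Sequence


def overlap(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Apply XY(t) = (1 - W(t)) * X(t) + W(t) * Y(t) sample by sample."""
    if len(x) != len(y):
        raise ValueError("both signals must have the same length")
    n = len(x)
    if n == 0:
        return []
    # Assumed window: W(t) = t / (n - 1); a single-sample signal keeps X unchanged.
    denom = max(n - 1, 1)
    return [(1 - t / denom) * x[t] + (t / denom) * y[t] for t in range(n)]


mixed = overlap([1.0] * 5, [0.0] * 5)
print(mixed)  # [1.0, 0.75, 0.5, 0.25, 0.0]
```

The output starts at the value of X and ends at the value of Y, so a signal that precedes X and a signal that follows Y are both joined without a step.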

The duplication speech signal generation unit 40 generates a duplication speech signal for suppressing speech noise during packet read stop control by the read controller 70 and outputs it to the selector 80. For this purpose it has a buffer memory 41, a prediction signal generator 42, and an overlap calculator 43.

The buffer memory 41 holds the speech signal output from the speech decoder 20 and, in response to the same clock as that of the jitter buffer 10, supplies the held speech signal to the prediction signal generator 42 and the overlap calculator 43.

The prediction signal generator 42 generates a prediction signal from the speech signal supplied from the buffer memory 41 and outputs it to the overlap calculator 43. The prediction signal is generated in the same way as in the prediction signal generator 32.

The overlap calculator 43 performs an overlap calculation on the speech signal output from the buffer memory 41 and the prediction signal from the prediction signal generator 42, and outputs the result to the selector 80 as the duplication speech signal. The overlap calculation is performed in the same way as in the overlap calculator 33.

In response to the same clock as that of the jitter buffer 10, the selector 80 selects one of the speech signal from the speech decoder 20, the output of the overlap calculator 33, and the output of the overlap calculator 43, and outputs it as the output speech signal.

If the selector 80 has not received an output switching instruction from the read controller 70 at the output timing, it selects the speech signal output from the speech decoder 20 and outputs it as the output speech signal. If it has received an output switching instruction from the read controller 70, it selects and outputs whichever of the compensation speech signal from the overlap calculator 33 and the duplication speech signal from the overlap calculator 43 is designated by the instruction.
[Description of operation]
In FIG. 1, the input speech signal supplied from the transmission line is handled in units of packets that contain speech code. In general, packets supplied over a transmission line arrive at irregular time intervals. When a packet containing speech code arrives from the transmission line, it is stored in the jitter buffer 10. The arrival time stamper 50 stamps the arrival time of the packet, and this time is passed to the jitter counter 60.

First, the operation when a packet is skipped is described with reference to FIG. 2, which extracts from FIG. 1 the parts related to packet skipping to simplify the description, FIG. 3, which illustrates the processing, and FIG. 4, which is a timing chart. The packets in the "jitter buffer 10" row of FIG. 4 represent the speech code packets output from the jitter buffer 10.

Suppose that temporally continuous speech code packets from the transmission line, packets A, B, C, D, and E in order of arrival, have been stored in the jitter buffer 10, that packets A and B have already been output to the speech decoder 20 by normal operation (clocks 1 and 2 in FIG. 4) and decoded by it, and that packet C is the next to be output. The speech signals obtained when the speech decoder 20 decodes packets A, B, C, D, and E are called speech signals A, B, C, D, and E, respectively.

Normal operation here means operation in which the read controller 70 triggers neither packet skip control nor packet read stop control for the jitter buffer 10 and the selector 80. In this case the selector 80 outputs the speech signal from the speech decoder 20 as the output speech signal.

When a new speech code packet (called packet F) arrives from the transmission line, the arrival time stamper 50 stamps the arrival time of packet F. The jitter counter 60 calculates the jitter value from the arrival time of packet F and the arrival time of packet E, which arrived one time step before packet F, and passes it to the read controller 70.

If this jitter value is smaller than a predetermined value, the read controller 70 instructs the jitter buffer 10 to skip packet C, the next packet to be read (clock 3 in FIG. 4), and notifies the selector 80 that packet skipping has been invoked. In response to the instruction from the read controller 70, the jitter buffer 10 skips packet C at the packet output timing and outputs packet D.

On receiving packet D from the jitter buffer 10, the voice decoder 20 decodes packet D and outputs voice signal D to the selector 80 and to the buffer memory 31 and the overlap calculator 33 in the compensation voice signal generator 30.

The buffer memory 31, operating on the same clock as the jitter buffer 10, outputs to the prediction signal generator 32 the voice signal B that the voice decoder 20 supplied one time step before voice signal D, and then holds the newly arriving voice signal D. Through this operation, the buffer memory 31 always holds the voice signal from one time step before the packet currently supplied from the jitter buffer 10 to the voice decoder 20.
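
The behavior of the buffer memory 31 amounts to a one-step delay. The following is a minimal sketch under that reading; the class and method names are hypothetical, and signals are represented by labels rather than sample data.

```python
class BufferMemory:
    """One-step delay: on each clock, emit the held signal, then hold the new one."""

    def __init__(self):
        self._held = None  # nothing is held before the first decoded signal

    def clock(self, decoded_signal):
        previous = self._held
        self._held = decoded_signal
        return previous


# With packet C skipped, the decoder produces A, B, D in that order.
memory = BufferMemory()
emitted = [memory.clock(signal) for signal in ["A", "B", "D"]]
# When D arrives, the memory hands B to the prediction signal generator.
```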

The prediction signal generator 32 generates a prediction signal (denoted here as prediction signal B´) for voice signal B, the voice signal one time step before voice signal D.

The overlap calculator 33 performs an overlap operation on voice signal D supplied from the voice decoder 20 and prediction signal B´ supplied from the prediction signal generator 32, generates the compensation voice signal (D+B´), and outputs it to the selector 80. The compensation voice signal (D+B´) is a voice signal that preserves temporal continuity with the preceding voice signal B and the following voice signal E.

In this overlap operation, prediction signal B´ is applied to W(t) and voice signal D to Y(t) in Equation 1 above. This yields XY(t), a voice signal in which the weighting coefficient for prediction signal B´ is large at the start of skip control and the weighting coefficient for voice signal D is large at the end of skip control. Temporal continuity between the preceding voice signal B and the following voice signal E is thereby ensured.
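
The weighting described here behaves like a cross-fade from W(t) to Y(t). The sketch below assumes a linear weight ramp; the actual window of Equation 1 may differ, and the function name and sample values are illustrative only.

```python
def overlap(w, y):
    """Cross-fade from w to y over one frame.

    Assumption: the weight on w falls linearly from 1 to 0 while the weight
    on y rises from 0 to 1. The window actually used in Equation 1 is not
    reproduced here.
    """
    n = len(w)
    if n != len(y) or n < 2:
        raise ValueError("w and y must have the same length of at least 2")
    mixed = []
    for t in range(n):
        weight_y = t / (n - 1)  # 0.0 at the start, 1.0 at the end
        mixed.append((1.0 - weight_y) * w[t] + weight_y * y[t])
    return mixed


# Skip case: W(t) is prediction signal B', Y(t) is voice signal D.
prediction_b = [1.0, 1.0, 1.0, 1.0, 1.0]
signal_d = [0.0, 0.0, 0.0, 0.0, 0.0]
compensation = overlap(prediction_b, signal_d)
```

The output starts at the first sample of W(t) and ends at the last sample of Y(t), which is what lets it join the preceding and following frames without a step.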

The selector 80 receives both voice signal D from the voice decoder 20 and the compensation voice signal (D+B´) from the overlap calculator 33 (clock 4 in FIG. 4), and selects and outputs the compensation voice signal (D+B´) as instructed by the read controller 70. In practice, as described later, the duplication voice signal (B+B´) from the overlap calculator 43 is also input to the selector 80.

As a result, as shown in FIG. 4, the output voice signal at clocks 3, 4, and 5 is voice signal B, the compensation voice signal (D+B´), and voice signal E, in that order. Although the output of voice signal C is skipped, the compensation voice signal (D+B´) preserves the temporal continuity of the output voice signal sequence.
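
The resulting output order can be traced with a small symbolic sketch. It works on packet labels only and does not model the clock alignment of FIG. 4; the function name and the string notation for the compensation signal are assumptions for illustration.

```python
def playout_with_skip(packets, skip_index):
    """Return the output sequence when the packet at skip_index is skipped.

    The skipped packet is dropped, and the packet after it is emitted as a
    compensation signal blended with a prediction from the last output.
    """
    output = []
    previous = None
    i = 0
    while i < len(packets):
        can_skip = previous is not None and i + 1 < len(packets)
        if i == skip_index and can_skip:
            following = packets[i + 1]
            output.append(f"({following}+{previous}')")
            previous = following
            i += 2  # the skipped packet and its successor are both consumed
        else:
            output.append(packets[i])
            previous = packets[i]
            i += 1
    return output


sequence = playout_with_skip(["A", "B", "C", "D", "E"], skip_index=2)
```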

Next, the operation for packet read stop is described. To simplify the explanation, reference is made to FIG. 5, which extracts from FIG. 1 the portions related to packet read stop; FIG. 6, which illustrates the processing; and FIG. 7, which is a timing chart. The packets shown in the jitter buffer 10 row of FIG. 7 are the voice code packets output from the jitter buffer 10.

Assume that temporally continuous voice code packets have arrived from the transmission line and are stored in the jitter buffer 10 in order of arrival as packet A, packet B, and packet C; that packets A and B have already been output to the voice decoder 20 by normal operation (clocks 1 and 2 in FIG. 7) and decoded by the voice decoder 20; and that packet C is the next packet to be output. The voice signals obtained by decoding packets A, B, and C with the voice decoder 20 are referred to as voice signals A, B, and C, respectively.

When a new voice code packet (call it packet D) arrives from the transmission line, the arrival time stamper 50 stamps the arrival time of packet D. The jitter counter 60 then calculates a jitter value from the arrival time of packet D and the arrival time of packet C, which arrived one time step before packet D, and passes the value to the read controller 70.

If this jitter value is larger than a predetermined value, the read controller 70 instructs the jitter buffer 10 to stop reading packet C (clock 3 in FIG. 7) and notifies the selector 80 that packet read stop has been invoked. In response to the instruction from the read controller 70, the jitter buffer 10 withholds packet C at the packet output timing and outputs nothing to the voice decoder 20.
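
The two threshold comparisons made by the read controller 70 can be summarized in a minimal sketch. The description refers only to "a predetermined value" in each case, and the claims relate the jitter amount to the amount of packets retained in the jitter buffer; the two fixed thresholds, their values, and all names below are simplifying assumptions.

```python
from enum import Enum


class ReadControl(Enum):
    NORMAL = "normal"
    SKIP = "skip"
    STOP = "stop"


def decide_read_control(jitter_ms, skip_below_ms, stop_above_ms):
    """Choose the control for the next readout from the latest jitter value.

    Assumption: two fixed thresholds stand in for the predetermined values;
    jitter between them leaves the system in normal operation.
    """
    if skip_below_ms > stop_above_ms:
        raise ValueError("skip threshold must not exceed stop threshold")
    if jitter_ms < skip_below_ms:
        return ReadControl.SKIP  # small jitter: shorten the buffered delay
    if jitter_ms > stop_above_ms:
        return ReadControl.STOP  # large jitter: hold back the next packet
    return ReadControl.NORMAL


control = decide_read_control(jitter_ms=45.0, skip_below_ms=5.0, stop_above_ms=30.0)
```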

The buffer memory 41, operating on the same clock as the jitter buffer 10, outputs the voice signal B that it holds from one time step earlier to the overlap calculator 43 and the prediction signal generator 42. The prediction signal generator 42 generates a prediction signal for voice signal B (denoted here as prediction signal B´) and outputs it to the overlap calculator 43.

The overlap calculator 43 performs an overlap operation on voice signal B supplied from the buffer memory 41 and prediction signal B´ supplied from the prediction signal generator 42, generates the duplication voice signal (B+B´), and outputs it to the selector 80. The duplication voice signal (B+B´) is a voice signal that preserves temporal continuity with the preceding voice signal B and the following voice signal C.

In this overlap operation, the preceding voice signal B is applied to W(t) and prediction signal B´ to Y(t) in Equation 1 above. This yields XY(t), a voice signal in which the weighting coefficient for voice signal B is large at the start of read stop control and the weighting coefficient for prediction signal B´ is large at the end of read stop control. Temporal continuity between the preceding voice signal B and the following voice signal C is thereby ensured.

The selector 80 receives no signal from the voice decoder 20 and receives the duplication voice signal (B+B´) from the overlap calculator 43 (clock 4 in FIG. 7), and selects and outputs the duplication voice signal (B+B´) as instructed by the read controller 70. In practice, as described above, the compensation voice signal (D+B´) from the overlap calculator 33 is also input to the selector 80.
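
The selection performed by the selector 80 across the three cases can be sketched as follows. The string values used for the control notification and the function name are hypothetical; the text says only that the selector follows the notification from the read controller 70.

```python
def select_output(control, decoded, compensation, duplication):
    """Pick the signal to output for the current clock.

    control is "skip", "stop", or anything else for normal operation
    (hypothetical encoding of the notification from the read controller).
    """
    if control == "skip":
        return compensation
    if control == "stop":
        return duplication
    return decoded


# During read stop the decoder supplies nothing, so decoded is None here.
chosen = select_output("stop", None, "(D+B')", "(B+B')")
```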

As a result, as shown in FIG. 7, the output voice signal at clocks 3, 4, and 5 is voice signal B, the duplication voice signal (B+B´), and voice signal C, in that order. Although the output of voice signal C is held back once, the duplication voice signal (B+B´) preserves the temporal continuity of the output voice signal sequence.
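
The read stop output order can be traced symbolically in the same way. The sketch works on packet labels only and does not model the clock alignment of FIG. 7; the function name and the string notation for the duplication signal are assumptions for illustration.

```python
def playout_with_stop(packets, stop_index):
    """Return the output sequence when readout stalls once before stop_index.

    At the stalled clock, a duplication signal built from the previous
    output and its prediction fills the gap; the withheld packet follows.
    """
    output = []
    previous = None
    for i, packet in enumerate(packets):
        if i == stop_index and previous is not None:
            output.append(f"({previous}+{previous}')")
        output.append(packet)
        previous = packet
    return output


sequence = playout_with_stop(["A", "B", "C"], stop_index=2)
```

Compared with skipping, which drops one packet, read stop lengthens the output by one frame, so the buffered delay grows by one packet interval.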

FIG. 1 is a block diagram showing the configuration of one embodiment of the voice decoding system according to the present invention.
FIG. 2 shows the portions related to packet skipping, extracted from FIG. 1.
FIG. 3 illustrates the packet skipping operation in the present invention.
FIG. 4 is a timing chart of the packet skipping operation in the present invention.
FIG. 5 shows the portions related to packet read stop, extracted from FIG. 1.
FIG. 6 illustrates the packet read stop operation in the present invention.
FIG. 7 is a timing chart of the packet read stop operation in the present invention.

Explanation of symbols

10 Jitter buffer
20 Voice decoder
30 Compensation voice signal generator
31 Buffer memory
32 Prediction signal generator
33 Overlap calculator
40 Duplication voice signal generator
41 Buffer memory
42 Prediction signal generator
43 Overlap calculator
50 Arrival time stamper
60 Jitter counter
70 Read controller
80 Selector

Claims

A voice decoding system that decodes and outputs voice code packets input from a transmission line, comprising:
a jitter buffer that holds the voice code packets and outputs them under clock control;
an arrival time stamper that sequentially stamps the arrival time of each voice code packet;
a jitter counter that calculates arrival-time jitter from the difference between the times stamped by the arrival time stamper;
a read controller that causes a voice code packet to be skipped when reading from the jitter buffer if the jitter amount indicated by the jitter counter is sufficiently small relative to the amount of voice code packets retained in the jitter buffer, and that stops reading voice code packets from the jitter buffer if the amount of voice code packets retained in the jitter buffer is insufficient for the jitter amount indicated by the jitter counter;
a voice decoder that receives from the jitter buffer the voice code packets supplied under the control of the read controller, decodes them, and outputs a decoded voice signal;
a compensation voice signal generator that generates, based on the decoded voice signal output by the voice decoder, a compensation voice signal for suppressing audible noise during packet skip control by the read controller;
a duplication voice signal generator that generates, based on the decoded voice signal output by the voice decoder, a duplication voice signal for suppressing audible noise during packet read stop control by the read controller; and
a selector that, under the control of the read controller, selects and outputs one of the decoded voice signal output by the voice decoder, the compensation voice signal, and the duplication voice signal.

The voice decoding system according to claim 1, wherein the compensation voice signal generator comprises:
a buffer memory that holds the voice signal output from the voice decoder one clock earlier;
a prediction signal generator that predicts the voice signal discarded by skipping, using the voice signal from one clock earlier stored in the buffer memory; and
an overlap calculator that generates the compensation voice signal by performing an overlap operation in which the weighting coefficient for the predicted voice signal generated by the prediction signal generator is large at the start of skip control and the weighting coefficient for the subsequent voice signal is large at the end of skip control.

The voice decoding system according to claim 1, wherein the duplication voice signal generator comprises:
a buffer memory that holds the voice signal output from the voice decoder one clock earlier;
a prediction signal generator that predicts the voice signal withheld by the read stop, using the voice signal from one clock earlier stored in the buffer memory; and
an overlap calculator that generates the duplication voice signal by performing an overlap operation in which the weighting coefficient for the voice signal from one clock earlier stored in the buffer memory is large at the start of read stop control and the weighting coefficient for the predicted voice signal generated by the prediction signal generator is large at the end of read stop control.

The voice decoding system according to claim 1, wherein the compensation voice signal generator comprises:
a first buffer memory that holds the voice signal output from the voice decoder one clock earlier;
a first prediction signal generator that predicts the voice signal discarded by skipping, using the voice signal from one clock earlier stored in the first buffer memory; and
a first overlap calculator that generates the compensation voice signal by performing an overlap operation in which the weighting coefficient for the predicted voice signal generated by the first prediction signal generator is large at the start of skip control and the weighting coefficient for the subsequent voice signal is large at the end of skip control;
and wherein the duplication voice signal generator comprises:
a second buffer memory that holds the voice signal output from the voice decoder one clock earlier;
a second prediction signal generator that uses the voice signal from one clock earlier stored in the second buffer memory to generate a smooth predicted voice signal in which discontinuity with the following voice signal is suppressed; and
a second overlap calculator that generates the duplication voice signal by performing an overlap operation in which the weighting coefficient for the voice signal from one clock earlier stored in the second buffer memory is large at the start of read stop control and the weighting coefficient for the predicted voice signal generated by the second prediction signal generator is large at the end of read stop control.

The voice decoding system according to claim 4, wherein the overlap calculator applies a window function to the voice signal from one clock earlier stored in the buffer memory and to the predicted voice signal generated by the prediction signal generator.