JP5074749B2

JP5074749B2 - Voice signal receiving apparatus, voice packet loss compensation method used therefor, program for implementing the method, and recording medium recording the program

Info

Publication number: JP5074749B2
Application number: JP2006327051A
Authority: JP
Inventors: 仲大室; 岳至森; 祐介日和▲崎▼; 章俊片岡
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2006-12-04
Filing date: 2006-12-04
Publication date: 2012-11-14
Anticipated expiration: 2026-12-04
Also published as: JP2008139661A

Abstract

PROBLEM TO BE SOLVED: To easily perform the processing of speech packet loss compensation.

SOLUTION: Speech packets stored in a receiving buffer (12) are read out in increasing order of frame number and decoded, and speech signals for a predetermined number of frames are held in a delay buffer unit (19). The speech signal with the smallest frame number is output from the delay buffer unit and stored in an output buffer (14). When a frame output from the delay buffer unit has packet loss, a speech waveform interpolation processing unit (17) cuts waveforms of a pitch length out of a frame waveform from the output buffer (14) and a frame waveform from the delay buffer unit (19), arranges each of them repeatedly to generate two frame waveforms, and generates an interpolated speech signal from them.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a voice signal receiving apparatus that reproduces, at stable quality on the receiving side, acoustic signals such as digitized speech and music (hereinafter collectively called voice signals) transmitted over a packet communication network such as the Internet. It also relates to a voice packet loss compensation method used in the apparatus, a program for implementing the method, and a recording medium on which the program is recorded.

Services that transmit voice signals using Voice over IP technology are becoming widespread. As shown in FIG. 1, a voice signal transmitting apparatus 20 divides the input voice signal into fixed-length segments called frames (for example, 10 to 20 milliseconds), converts them into voice packets, and transmits them in real time to a voice signal receiving apparatus 10 through a packet communication network 30. Depending on the state of the network, packets may be lost along the communication path, and the resulting quality degradation, such as interruptions in the reproduced voice, is a problem. The problem is especially pronounced on best-effort communication services such as the Internet when the network is congested.

For this reason, when voice signals are communicated over a packet communication network, a technique called packet loss compensation is used. When a packet is lost along the communication path, or fails to reach the receiving side within the time limit because of delay on the path (both cases are hereinafter collectively called packet loss), the receiving side estimates and compensates the voice signal of the section corresponding to the packet that did not arrive (hereinafter called the lost packet). The method of Non-Patent Document 1 is known as a representative packet loss compensation method.

FIG. 2 shows a configuration example of the voice signal receiving apparatus 10 used in Non-Patent Document 1. To simplify the description, the voice signal transmitting apparatus is assumed to place the coded voice data of one frame in each packet. A series of voice packets is received sequentially by the packet receiving unit 11, which is also called a jitter (fluctuation) absorbing buffer, and stored in the reception buffer 12.

Under the control of the control unit 16, the voice packets stored in the reception buffer 12 are taken out in ascending order of frame number. Each extracted voice packet is sent to the voice waveform decoding unit 13, decoded into a digital voice signal, and output. The output digital voice signal is stored in the output voice buffer 14 for a predetermined time (number of frames). Throughout the following description, including the description of the present invention, the signal decoded and output by the voice waveform decoding unit is a digital voice signal, so it is simply called a voice signal without noting each time that it is digital.

When a voice packet is to be taken out of the reception buffer 12 and the packet with the required frame number has not been stored, the packet loss determination unit 15 judges that packet loss has occurred in that frame and notifies the control unit 16. On receiving this notification, the control unit 16 sets the switch SW to the B side. The voice waveform interpolation processing unit 17 uses the voice signal in the output voice buffer 14 to generate, by interpolation, the voice signal of the frame in which the packet loss occurred, and outputs it through the switch SW.

FIG. 3 conceptually shows the interpolation processing by which the voice waveform interpolation processing unit 17 generates the output voice signal. A waveform 3A, extending back from the last sample point S_E of the voice signal in the output voice buffer 14 by a length corresponding to the fundamental period of the voice (called the pitch length), is copied and pasted in sequence as waveforms 3B, 3C, and 3D into a loss frame buffer (not shown) in the voice waveform interpolation processing unit 17. The voice signal with this synthesized waveform is output through the switch SW as the voice signal of the current frame (the lost frame). To keep the waveform junctions from becoming discontinuous, waveform 3A may be taken slightly longer than one pitch length and copied so that adjacent copies partially overlap.
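The pitch-waveform repetition described for FIG. 3 can be sketched as follows. This is a minimal illustration, not the procedure of Non-Patent Document 1 itself: the function name `extrapolate_forward` and its parameters are hypothetical, the pitch length is assumed to be already known in samples, and the optional overlap at the junctions is omitted.

```python
def extrapolate_forward(history, pitch, frame_len):
    """Fill one lost frame by repeating the last pitch period of past output.

    history   : past output samples (contents of the output voice buffer)
    pitch     : pitch length in samples (assumed to be estimated elsewhere)
    frame_len : number of samples in one frame
    """
    # Waveform 3A: the last `pitch` samples before the end point S_E.
    cycle = history[-pitch:]
    # Waveforms 3B, 3C, 3D, ...: paste the cycle repeatedly until the frame is full.
    return [cycle[n % pitch] for n in range(frame_len)]


if __name__ == "__main__":
    past = [0, 1, 2, 3, 4, 5]
    print(extrapolate_forward(past, pitch=3, frame_len=7))
```

Because the first pasted sample is the one that lay exactly one pitch period before the lost frame, the periodic structure continues across the frame boundary without a phase jump.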

The problem with the method of Non-Patent Document 1 (FIG. 3) is that it interpolates the waveform using only past output signals (this is also called one-directional extrapolation). Sound quality therefore degrades when packet loss occurs in a transient part where the characteristics of the voice are changing, for example at a change from a silent (background noise) section to a voice section, or from an unvoiced sound to a voiced sound.
The method of Non-Patent Document 2 has been proposed to solve this problem. FIG. 4 shows a configuration example of the voice signal receiving apparatus used in Non-Patent Document 2, with components identical to those in FIG. 2 given the same reference numbers.

The voice signal receiving apparatus 10 shown in FIG. 4 is the apparatus of FIG. 2 with a buffer search unit 18 added. In the apparatus of FIG. 4, when packet loss occurs, the buffer search unit 18 searches the reception buffer 12. If a packet with a frame number later than that of the lost frame has already arrived in the reception buffer 12, that packet is decoded, and the voice signal obtained by decoding (hereinafter called the look-ahead waveform) is sent to the voice waveform interpolation processing unit 17.

FIG. 5 conceptually shows the interpolation processing by which the voice waveform interpolation processing unit 17 of FIG. 4 generates a voice waveform. As in the method of Non-Patent Document 1 (FIG. 3), the voice waveform interpolation processing unit 17 copies a waveform 3A of one pitch length from the voice signal in the output voice buffer 14 and pastes it in sequence as waveforms 3B, 3C, and 3D so as to fill one frame. It also copies a waveform 4A of one pitch length from the head of the look-ahead waveform and pastes it in sequence as waveforms 4B, 4C, and 4D so as to fill one frame in the backward direction. These two processes produce a forward-interpolated frame waveform 31 and a backward-interpolated frame waveform 41, and a single frame waveform is generated as the weighted sum of the two. This processing is also called bidirectional interpolation. As in the description of FIG. 3, waveform 3A or 4A may be taken slightly longer than one pitch length and copied with partial overlap so that the waveform junctions do not become discontinuous.
Non-Patent Document 1: ITU-T Rec. G.711 Appendix I (1999)
Non-Patent Document 2: Omuro, Mori, Hiwasaki, Kurihara, Kataoka, "Improvement of burst packet loss tolerance by parallel transmission of voice features", IEICE Technical Report SP2004-77 (2004)

The method of Non-Patent Document 2 (FIG. 5) is an excellent way to suppress quality degradation when packet loss occurs in a transient part of the voice signal. In typical packet voice communication terminals, however, the reception buffer and the packet loss compensation processing are often implemented on separate processors. On such hardware, realizing the method of Non-Patent Document 2 requires the reception buffer processing and the packet loss compensation processing, in particular the search by the buffer search unit 18, to operate in tight coordination on the same reception buffer 12 in FIG. 4. This demands a very complicated implementation and has been difficult to achieve in practice.

An object of the present invention is to provide a voice signal receiving apparatus in which packet loss compensation processing can be performed easily, a voice packet loss compensation method, a program for the method, and a recording medium on which the program is recorded.

The voice signal receiving apparatus according to the present invention comprises:
a reception buffer that temporarily stores received voice packets;
decoding means for taking voice packets out of the reception buffer in ascending order of frame number and decoding the voice code in each voice packet to obtain a voice signal;
packet loss determination means for determining whether the voice packet to be taken out is stored in the reception buffer and generating the result as a packet loss flag indicating whether packet loss has occurred;
delay buffer means for storing the decoded voice signal up to a specified number of frames and outputting voice signals in ascending order of frame number;
output voice buffer means for storing output voice signals for a predetermined time or number of frames;
voice waveform interpolation processing means for, when the frame to be output from the delay buffer means is a lost frame, generating a first interpolation waveform by copying a waveform from the output voice signal in the output voice buffer means, generating a second interpolation waveform from a waveform copied from a voice signal that is found by searching the delay buffer means and has a frame number later than that of the lost frame, generating an interpolated voice signal from the first and second interpolation waveforms, and using it as the output voice for the lost frame; and
output control means for outputting the voice signal from the delay buffer means as the output voice signal if the frame to be output from the delay buffer means is not a lost frame, and outputting the interpolated voice signal from the voice waveform interpolation processing means as the output voice signal if it is a lost frame.

The voice packet loss compensation method according to the present invention comprises the steps of:
(a) temporarily storing received voice packets in a reception buffer;
(b) taking voice packets out of the reception buffer in ascending order of frame number and decoding the voice code in each voice packet to obtain a voice signal;
(c) determining whether the voice packet to be taken out is stored in the reception buffer and generating the result as a packet loss flag indicating whether packet loss has occurred;
(d) storing the decoded voice signal in delay buffer means up to a specified number of frames and outputting voice signals from the delay buffer means in ascending order of frame number;
(e) storing output voice signals for a predetermined time or number of frames in output voice buffer means;
(f) when the frame to be output from the delay buffer means is a lost frame, generating a first interpolation waveform by copying a waveform from the output voice signal in the output voice buffer means, generating a second interpolation waveform from a waveform copied from a voice signal that is found by searching the delay buffer means and has a frame number later than that of the lost frame, generating an interpolated voice signal from the first and second interpolation waveforms, and using it as the output voice for the lost frame; and
(g) outputting the voice signal from the delay buffer means as the output voice signal if the frame to be output from the delay buffer means is not a lost frame, and outputting the interpolated voice signal as the output voice signal if it is a lost frame.

According to the present invention, the decoded voice signal is held in the delay buffer means for the specified number of frames before being output, and for a lost frame the interpolation waveform is generated by looking ahead at the voice signal that follows the lost frame inside the delay buffer means. Even when the reception buffer and the packet loss compensation are implemented on separate processors, packet loss compensation with little degradation of sound quality in transient parts can therefore be achieved with only the minimum necessary increase in delay. As a result, a low-cost packet voice communication terminal with high call quality can be realized easily.
The present invention can be carried out with a computer and a computer program, and can also be implemented on a digital signal processor or a dedicated LSI.

[Example 1]
FIG. 6 shows a configuration example of the voice signal receiving apparatus according to the present invention. Components shared with the apparatuses of FIGS. 2 and 4 are given the same reference numbers. As in FIG. 4, the voice signal receiving apparatus according to the present invention includes a packet receiving unit 11, a reception buffer 12, a voice waveform decoding unit 13, an output voice buffer 14, a packet loss determination unit 15, a control unit 16, a voice waveform interpolation processing unit 17, and a buffer search unit 18. In the embodiment of FIG. 6, a delay buffer unit 19 and an output control unit 21 are added. The output voice buffer 14, the voice waveform interpolation processing unit 17, the buffer search unit 18, the delay buffer unit 19, and the switch SW constitute a packet loss compensation processing unit 100.

As described for FIG. 2, voice packets are received by the packet receiving unit 11 and sent to the reception buffer 12. Under the control of the control unit 16, the stored voice packets are taken out of the reception buffer 12 in ascending order of frame number. Each extracted voice packet is sent to the voice waveform decoding unit 13 and decoded into a voice signal, which in the present invention is sent to the delay buffer unit 19. Here the decoded voice signal may include not only a voice waveform signal in the narrow sense, typified by the PCM format, but also voice parameters obtained during decoding (for example, pitch, power, and frame number information).

When a voice packet is to be taken out of the reception buffer, the packet loss determination unit 15 determines whether the voice packet with the frame number to be taken out is stored, and supplies the result to the control unit 16 as a packet loss flag F_PL indicating whether packet loss has occurred in that frame. For example, F_PL = 1 indicates that packet loss has occurred.

When the control unit 16 receives the packet loss flag F_PL, it passes the flag to the delay buffer unit 19. Alternatively, the packet loss determination unit 15 may supply F_PL to the delay buffer unit 19 directly. Each time the statistical value calculation unit 16A of the control unit 16 receives F_PL from the packet loss determination unit 15, it computes the frequency and pattern of packet loss as packet loss statistics, determines the number of frames N_F to be held in the delay buffer unit 19, and supplies N_F to the delay buffer unit 19. The relationship between the frequency and pattern of packet loss and the required number of held frames N_F is defined in advance as rules (a correspondence table) stored in the table memory 16B, and N_F is determined by referring to that table.

The packet loss rate is used as the frequency of packet loss. For example, the statistical value calculation unit 16A calculates the packet loss rate e(k) at frame k, frame by frame, as
e(k) = e(k-1) × 0.99 + 0.01, if packet loss (F_PL = 1)   (1)
e(k) = e(k-1) × 0.99, if no packet loss (F_PL = 0)   (2)
where k = 1, 2, 3, ... is the frame number and the initial value is e(0) = 0. Alternatively, the packet loss flags F_PL may be accumulated for a fixed time in a buffer (not shown) in the statistical value calculation unit 16A, and the packet loss rate e(k) calculated for each frame as the ratio of the number of flags with F_PL = 1 in the buffer to the total number of flags (frames) in the buffer.
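Equations (1) and (2) amount to an exponentially weighted running average of the loss flag. The following is a small sketch of both variants described above; the function names and the parameter names `decay` and `gain` are illustrative assumptions, not terms from the patent.

```python
def update_loss_rate(prev_rate, lost, decay=0.99, gain=0.01):
    """One step of equations (1)/(2): e(k) from e(k-1) and the flag F_PL."""
    rate = prev_rate * decay          # equation (2): decay on every frame
    if lost:
        rate += gain                  # equation (1): add 0.01 on a lost frame
    return rate


def windowed_loss_rate(flags):
    """Alternative: ratio of lost frames to all frames in a flag buffer."""
    return sum(1 for f in flags if f) / len(flags)


if __name__ == "__main__":
    e = 0.0                           # e(0) = 0
    for flag in [1, 0, 0, 1, 1, 0]:   # example F_PL sequence
        e = update_loss_rate(e, flag)
    print(e, windowed_loss_rate([1, 0, 0, 1, 1, 0]))
```

With the constants 0.99 and 0.01, a channel that loses every frame drives e(k) toward 1, and a loss-free channel lets it decay toward 0.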

To express the pattern of packet loss numerically, consecutive packet loss rates are calculated. The n-consecutive packet loss rate (n is an integer of 1 or more) is defined as the rate at which packets are lost in runs of n or more consecutive frames, and its value at the k-th frame is written e_n(k). Here the consecutive packet loss rates e_i(k), i = 1, 2, ..., n, are each calculated, and the required number of held frames N_F is determined from them by referring to a table, as described later. The method of calculating e_i(k) is described below with reference to the flowchart of FIG. 7.

First, the consecutive-loss counter r is initialized to 0 (step S1). Next, whether the frame is lost is determined from whether the packet loss flag F_PL is 1 or 0 (step S2). If the frame is lost, r is incremented by 1 in step S3 and processing returns to step S2. If step S2 determines that the frame is not lost, e_i(k), i = 1, 2, ..., n, is calculated in step S4 as follows.
For each e_i(k) with r < i,
e_i(k) = e_i(k-1) × 0.99   (3)
is applied r+1 times.
For each e_i(k) with r ≥ i,
e_i(k) = e_i(k-1) × 0.99 + 0.01   (4)
is applied r times, and then
e_i(k) = e_i(k-1) × 0.99   (5)
is applied once. Note that e_1(k), the case i = 1, is identical to e(k) above. As with e(k), e_i(k) may instead be obtained by accumulating the packet loss flags F_PL in a buffer for a fixed time and calculating the consecutive packet loss rates from the flags in the buffer for each frame. The constants 0.99 and 0.01 in equations (1) to (5) are to be chosen experimentally in advance; combinations such as 0.95 and 0.05, or 0.995 and 0.005, may also be used. The constants may also be changed partway through: for example, the pair 0.95 and 0.05 may be used while the frame number k is small, that is, shortly after reception of voice packets begins, and the pair 0.99 and 0.01 after a certain time has elapsed.
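The run-based update of steps S1 to S4 can be sketched as follows. FIG. 7 is not reproduced here, so this sketch assumes that the counter r is reset to 0 after step S4 and that a run of losses still in progress at the end of the flag sequence has not yet been counted. The function name and parameters are hypothetical.

```python
def consecutive_loss_rates(flags, n, decay=0.99, gain=0.01):
    """Return [e_1, ..., e_n] after processing a sequence of F_PL flags.

    Assumption: r is reset to 0 after each received frame (after step S4).
    """
    rates = [0.0] * n                 # e_i(0) = 0 for every i
    r = 0                             # step S1: consecutive-loss counter
    for lost in flags:
        if lost:                      # steps S2/S3: extend the current run
            r += 1
            continue
        # Step S4: a received frame closes a run of r lost frames.
        for idx in range(n):
            i = idx + 1
            e = rates[idx]
            if r < i:
                # Run too short to count for e_i: equation (3), r+1 times.
                for _ in range(r + 1):
                    e = e * decay
            else:
                # Run counts for e_i: equation (4) r times, then (5) once.
                for _ in range(r):
                    e = e * decay + gain
                e = e * decay
            rates[idx] = e
        r = 0
    return rates


if __name__ == "__main__":
    # Two consecutive losses followed by a received frame.
    print(consecutive_loss_rates([1, 1, 0], n=3))
```

In this example the run of two losses raises e_1 and e_2 equally, while e_3 stays at 0 because no run of three or more lost frames has occurred.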

FIG. 8 and FIG. 9A show, in table form, example rules for determining the required number of held frames N_F from e(k) and from e_i(k), i = 2, 3, 4, respectively. These tables are stored in advance in the table memory 16B of the control unit 16. N_F may be determined from e(k) alone using FIG. 8, or from both e(k) in FIG. 8 and e_i(k) in FIG. 9A. In the latter case, if more than one entry in FIGS. 8 and 9A applies, the largest applicable value is used as N_F. The thresholds in FIGS. 8 and 9A may be changed to other values, and FIG. 9B may be used in place of FIG. 9A. In general, larger thresholds lead to a smaller N_F, which reduces the call delay introduced by the delay buffer unit 19 but increases quality degradation when packets are lost. Conversely, smaller thresholds lead to a larger N_F, which suppresses quality degradation when packets are lost but increases the call delay introduced by the delay buffer unit 19. The two are therefore in a trade-off relationship, and the thresholds should be tuned in an environment where packet loss actually occurs so that they are as large as possible without the quality degradation becoming noticeable.
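The table lookup can be illustrated with a short sketch. The tables of FIGS. 8 and 9A are not reproduced in this text, so every threshold and frame count below is a made-up placeholder chosen only to show the "largest applicable value" rule; the names are hypothetical as well.

```python
# HYPOTHETICAL rules: the real values are in FIG. 8 and FIG. 9A (not shown here).
LOSS_RATE_RULES = [(0.05, 2), (0.01, 1)]      # (minimum e(k), N_F)
CONSECUTIVE_RULES = {                          # i: (minimum e_i(k), N_F)
    2: (0.01, 2),
    3: (0.005, 3),
    4: (0.002, 4),
}


def required_hold_frames(e, e_i=None):
    """Return N_F as the largest value among all rules that apply.

    e   : packet loss rate e(k)
    e_i : optional dict mapping i to the consecutive packet loss rate e_i(k)
    """
    candidates = [0]                           # no rule applies: no added delay
    for threshold, frames in LOSS_RATE_RULES:
        if e >= threshold:
            candidates.append(frames)
    for i, rate in (e_i or {}).items():
        rule = CONSECUTIVE_RULES.get(i)
        if rule is not None and rate >= rule[0]:
            candidates.append(rule[1])
    return max(candidates)


if __name__ == "__main__":
    print(required_hold_frames(0.02))
    print(required_hold_frames(0.02, {3: 0.006}))
```

Raising the placeholder thresholds makes fewer rules fire and so lowers N_F, which mirrors the delay versus quality trade-off described above.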

The delay buffer unit 19 is a shift buffer or ring buffer that internally holds decoded voice signals for the required number of held frames N_F specified by the control unit 16. The delay buffer unit 19 stores each voice signal received from the voice waveform decoding unit 13 and outputs the held voice signal with the smallest frame number. However, when it is notified by the control unit 16 that packet loss has occurred (F_PL = 1), it stores information indicating that the frame is lost instead of receiving and storing a decoded voice signal from the voice waveform decoding unit 13. The packet loss flag F_PL and the frame number information, for example, are used as this packet loss information. When the N_F specified by the control unit 16 is 0, the voice signal received from the voice waveform decoding unit 13 is output as soon as it is received, so no delay is introduced.

The voice signal output from the delay buffer unit 19 is sent to terminal A of the switch SW. When the delay buffer unit 19 outputs the voice signal with the smallest frame number, the output control unit 21 sets the switch SW to the A side, that is, the delay buffer unit side, if that frame is not a lost frame. If the frame output from the delay buffer unit 19 holds information indicating packet loss instead of a decoded voice signal, the output control unit 21 sets the switch SW to the B side, that is, the voice waveform interpolation processing unit 17 side. The voice signal is output through the switch SW and is also written to the output voice buffer 14.

The output voice buffer 14 stores the output voice signal sent from the switch SW for a predetermined time (number of frames) and sends the stored voice signal to the voice waveform interpolation processing unit 17. When the frame output from the delay buffer unit 19 is a lost frame, the buffer search unit 18 searches the delay buffer unit 19. If a voice signal with a frame number later (larger) than that of the lost frame is stored in the delay buffer unit 19, the waveform found by the search (the look-ahead waveform) is sent to the voice waveform interpolation processing unit 17.
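The interplay of the delay buffer unit 19, the switch SW, and the buffer search unit 18 can be sketched as below. This is an illustrative model under simplifying assumptions: N_F is fixed for the lifetime of the buffer, a lost frame is marked by `samples=None`, and the concealment routine is passed in as a callback. All class, function, and field names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Frame:
    number: int
    samples: Optional[List[float]]    # None marks a lost frame (F_PL = 1)


class DelayBuffer:
    """Holds up to hold_frames frames and releases the oldest one first."""

    def __init__(self, hold_frames: int):
        self.hold_frames = hold_frames            # N_F, assumed fixed here
        self._frames = deque()

    def push(self, frame: Frame) -> Optional[Frame]:
        """Store a frame; return the oldest frame once N_F frames are held."""
        self._frames.append(frame)
        if len(self._frames) > self.hold_frames:
            return self._frames.popleft()
        return None                               # still filling up

    def look_ahead(self, lost_number: int) -> Optional[Frame]:
        """Buffer search: first received frame later than the lost frame."""
        for frame in self._frames:
            if frame.number > lost_number and frame.samples is not None:
                return frame
        return None


def output_step(buffer: DelayBuffer, incoming: Frame,
                conceal: Callable[[int, Optional[Frame]], List[float]]):
    """Push one frame and return the output samples, if any are due."""
    out = buffer.push(incoming)
    if out is None:
        return None
    if out.samples is not None:                   # switch SW on the A side
        return out.samples
    future = buffer.look_ahead(out.number)        # switch SW on the B side
    return conceal(out.number, future)
```

With `hold_frames=0` every pushed frame is returned immediately, which matches the statement that N_F = 0 adds no delay; with `hold_frames=1` a lost frame is released only after its successor has been stored, so the look-ahead waveform is available to the concealment routine.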

As in the method described with reference to FIG. 5, the speech waveform interpolation processing unit 17 generates two frame waveforms: one by cutting a one-pitch waveform out of the audio signal in the output audio buffer 14 and repeating it, and the other by cutting a one-pitch waveform from the head of the look-ahead waveform supplied from the delay buffer unit 19 and repeating it backward in time. It then generates a single frame waveform, the interpolated audio signal, as the weighted sum of the two. As in the description of FIG. 5, the cut-out waveform may be made slightly longer than one pitch length and copied with partial overlap so that the joints between waveforms are not discontinuous. The interpolated audio signal is output from terminal B of the switch SW as the output audio signal and is also written to the output audio buffer 14.
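The two-sided interpolation can be illustrated with a short sketch. The function names are hypothetical, and the linear cross-fade weights are an assumption made for the example; the text only states that a weighted sum is taken. The overlap-add smoothing at the joints is omitted, and `pitch` is assumed to be no larger than the length of either input.

```python
def forward_fill(past, pitch, frame_len):
    """Repeat the last `pitch` samples of the past signal forward in time."""
    seg = past[-pitch:]
    return [seg[i % pitch] for i in range(frame_len)]


def backward_fill(future, pitch, frame_len):
    """Repeat the first `pitch` samples of the look-ahead signal backward in
    time, so that the frame ends exactly where the look-ahead begins."""
    seg = future[:pitch]
    return [seg[(i - frame_len) % pitch] for i in range(frame_len)]


def interpolate(past, future, pitch, frame_len):
    """Weighted sum of the forward and backward waveforms (linear cross-fade,
    an assumed choice of weights)."""
    fwd = forward_fill(past, pitch, frame_len)
    bwd = backward_fill(future, pitch, frame_len)
    out = []
    for i in range(frame_len):
        w = (i + 1) / (frame_len + 1)   # weight of the backward waveform
        out.append((1.0 - w) * fwd[i] + w * bwd[i])
    return out
```

With these weights the start of the lost frame leans on the preceding audio and the end leans on the look-ahead audio, which keeps both frame boundaries close to the neighboring waveforms.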

The voice packet loss compensation method carried out in the voice signal receiving apparatus of FIG. 6 described above will now be explained with reference to the flowchart of FIG. 10.
Step S1: It is determined whether the voice packet to be extracted is stored in the reception buffer 12, and the packet loss flag F_PL is generated.
Step S2: The control unit 16 calculates the consecutive packet loss rates e_i(k), i = 1, 2, ..., n, based on the packet loss flag F_PL.
Step S3: The control unit 16 determines the required number of held frames N_F from the consecutive packet loss rates by referring to the table in the table memory 16B (FIGS. 8 and 9A).

Step S4: It is determined whether the packet loss flag F_PL indicates a packet loss.
Step S5: If step S4 determines that there is no packet loss, the voice packet extracted from the reception buffer 12 is decoded to obtain an audio signal, and the decoded audio signal is written to the delay buffer unit 19 subject to the required number of held frames N_F. Whether the decoded audio signal is actually written to the delay buffer unit 19 under the constraint of N_F is described in the examples below.
Step S6: If step S4 determines that there is a packet loss, packet loss information is written to the delay buffer unit 19.
Step S7: It is determined whether the frame with the smallest number in the delay buffer unit 19 is a packet loss.

Step S8: If step S7 determines that the frame is not a packet loss, the audio signal read out is output from the voice signal receiving apparatus and written to the output audio buffer 14, and the process returns to step S1.
Step S9: If step S7 determines that the frame is a packet loss, the buffer search unit 18 checks the frames in the delay buffer unit 19 whose numbers are larger than that of the lost frame, in ascending order of frame number, until a frame that is not a packet loss is found.
Step S10: Using the audio signal of the frame found by the search in step S9 and the audio signal of the preceding frame held in the output audio buffer 14, the speech waveform interpolation processing unit 17 generates an interpolated audio signal by the interpolation processing described above, outputs it through the switch SW, and writes it to the output audio buffer 14; the process then returns to step S1.
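Steps S7 to S10 can be sketched as a single output routine. This is an illustration under stated assumptions: the delay buffer is modeled as a plain list in ascending frame order with `None` marking a lost frame, and `conceal` is a hypothetical callable standing in for the speech waveform interpolation processing unit 17. What happens when no later frame is available is left to `conceal`, since the steps above do not specify it.

```python
from typing import Callable, List, Optional, Sequence

Samples = Sequence[float]


def find_lookahead(held: List[Optional[Samples]]) -> Optional[Samples]:
    """Step S9: scan the frames after the lost head frame in ascending order
    and return the first one that is not a packet loss (None if none exists)."""
    for samples in held[1:]:
        if samples is not None:
            return samples
    return None


def output_step(
    held: List[Optional[Samples]],
    history: List[List[float]],
    conceal: Callable[[List[List[float]], Optional[Samples]], List[float]],
) -> List[float]:
    """Steps S7 to S10 for one output frame; held[0] is the smallest-numbered frame."""
    head = held[0]
    if head is not None:
        out = list(head)                                  # S8: pass decoded audio through
    else:
        out = conceal(history, find_lookahead(held))      # S9 + S10: interpolate
    held.pop(0)
    history.append(out)                                   # write to the output audio buffer
    return out
```

Because the interpolated frame is appended to `history`, a second consecutive lost frame is concealed from the previously interpolated output, mirroring the write-back to the output audio buffer 14 in step S10.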

As described above, the present invention provides the delay buffer unit 19, which holds decoded audio signals, and obtains the look-ahead waveform used for interpolation by searching within the delay buffer unit 19. It is therefore unnecessary to consider timing relative to the write and read processing of the reception buffer 12, and packet loss compensation by interpolation can be implemented easily.
[Example 2]

In the configuration of FIG. 6, the delay buffer unit 19 internally holds decoded audio signals for the required number of held frames N_F specified by the control unit 16. If N_F changes during operation, the delay buffer unit 19 transitions its internal state so that it holds decoded audio signals for the updated N_F. For example, when N_F decreases, the delay buffer unit 19 reduces the number of held frames by refusing to take in decoded audio signals of a silent or non-speech section and discarding them. When N_F increases, it increases the number of held frames by taking the same decoded audio signal of a silent or non-speech section into the delay buffer unit 19 more than once.

FIG. 11 shows a configuration example of such a delay buffer unit 19, and FIG. 12 shows its operation flow. The delay buffer unit 19 comprises a frame buffer 19A, a comparison unit 19B, a silent section detection unit 19C, and a write control unit 19D. In the following, for simplicity, it is assumed that the initial value N_F0 of the required number of held frames N_F is a predetermined value and that, with no packet loss since the start of decoding, decoded audio signals S_D have been taken into the frame buffer 19A until the number of frames m reaches m = N_F0.
The comparison unit 19B receives the required number of held frames N_F from the control unit 16 (step S1), compares it with the number of frames m in the frame buffer 19A reported by the write control unit 19D, and gives the comparison result to the write control unit 19D (step S2).

The silent section detection unit 19C determines whether the decoded audio signal S_D supplied from the speech waveform decoding unit 13 belongs to a silent section (steps S3 and S6) and gives the result to the write control unit 19D. If m < N_F and the decoded audio signal S_D belongs to a silent section, the write control unit 19D takes S_D into the frame buffer 19A, increases m by 1 (step S4), and proceeds to step S5. If step S3 determines that S_D does not belong to a silent section, the process proceeds directly to step S5. If m = N_F in step S2, the process also proceeds to step S5.

In step S5, the write control unit 19D takes the supplied decoded audio signal S_D into the frame buffer 19A and returns to step S1. Whenever one frame of audio signal is taken in, the frame buffer 19A outputs the audio signal of the frame with the smallest number, so step S5 does not change the value of m.

If m > N_F and step S6 determines that the signal does not belong to a silent section, step S5 is executed. If step S6 determines that it belongs to a silent section, the write control unit 19D refuses to take in the supplied decoded audio signal S_D of that silent section and discards it, while the frame buffer 19A outputs the frame with the smallest number. The number of frames m in the frame buffer 19A thus decreases by 1, so the write control unit 19D decrements m by 1 (step S7) and returns to step S1.
By performing the processing of FIG. 12 for every frame in this way, the number of frames m gradually changes until m = N_F.
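The per-frame flow of FIG. 12 can be sketched as below. The class and function names are hypothetical, packet-loss entries are ignored for brevity, and the silence test (mean power under an arbitrary threshold of 1e-4) is an assumed stand-in for the silent section detection unit 19C.

```python
from collections import deque


def is_silent(samples, threshold=1e-4):
    """Treat a frame as silent when its mean power is below a threshold.
    The threshold value is illustrative only."""
    power = sum(x * x for x in samples) / len(samples)
    return power < threshold


class AdaptiveDelayBuffer:
    def __init__(self, initial_frames):
        self.buf = deque(initial_frames)    # m == len(self.buf)

    def step(self, decoded, n_f):
        """Process one decoded frame and return the frame that is output."""
        m = len(self.buf)
        silent = is_silent(decoded)
        if m < n_f and silent:
            # Step S4: take in an extra copy of the silent frame (m grows by 1).
            self.buf.append(decoded)
        elif m > n_f and silent:
            # Step S7: discard the incoming silent frame and output the oldest
            # held frame (m shrinks by 1).
            return self.buf.popleft()
        # Step S5: normal write, then output of the smallest-numbered frame.
        self.buf.append(decoded)
        return self.buf.popleft()
```

Each call outputs exactly one frame, so the playback rate is unchanged; only the number of held frames drifts toward N_F, and only during silence.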

As a method of detecting a silent section, the silent section detection unit 19C determines, for example, that a section is silent when the power, one of the audio parameters contained in the decoded audio signal, is smaller than a predetermined threshold. When steps S4 and S5 are both executed, the same decoded audio signal S_D is taken into the frame buffer 19A twice. Instead of taking the decoded audio signal S_D into the frame buffer 19A in step S4, a predetermined audio waveform for a silent or non-speech section may be stored in a waveform memory 19E, shown by the broken line in FIG. 11, and the audio signal of that waveform may be taken into the frame buffer 19A in step S4.
[Example 3]

FIG. 13 shows an example in which, as mentioned earlier, the decoded audio signal S_D of the k-th frame is a set consisting of an audio waveform signal in the narrow sense, that is, an audio waveform signal S_PCM typified by the PCM format, a relative position j that is a frame number difference, and the audio parameter P_{k-j} of the (k-j)-th frame. In other words, the audio signal transmitting apparatus (see FIG. 1) combines the code of the PCM-format audio waveform signal S_PCM of the k-th frame with the code of the audio parameter P_{k-j} of the (k-j)-th frame, which is j frames earlier, inserts them into one voice packet, and transmits it.

If the packet of the k-th frame carried only the audio waveform signal code and the audio parameter code of the k-th frame, all audio information for the k-th frame would be lost whenever that packet could not be received, and an audio signal generated by interpolation from the audio signals of adjacent frames would not be of high quality. In contrast, when the audio parameter P_{k-j} of the (k-j)-th frame is attached to the audio waveform signal S_PCM of the k-th frame as in FIG. 13, relatively high-quality audio output can be obtained even if the k-th frame is lost. When the output control unit 21 detects that the k-th frame about to be output from the delay buffer unit 19 is a packet loss, the pitch length in the audio parameter P_k of the k-th frame, which is contained in the audio signal S_D of the (k+j)-th frame held in the delay buffer unit 19, is used to cut a one-pitch waveform from each of the preceding and following frame waveforms, for example as described with reference to FIG. 5, and a waveform is generated by interpolation. If necessary, the power in the audio parameter P_k is also used to correct the power of the waveform generated by interpolation.
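As a rough sketch of how the side information might be used, the following looks up P_k in the held frames and rescales an interpolated waveform to the transmitted power. The dictionary layout (`number`, `j`, `params`), the function names, and the definition of power as mean squared amplitude are all assumptions made for illustration.

```python
import math


def find_parameters(held, k):
    """Return the audio parameters of frame k carried by a later held frame
    (the frame whose number minus its relative position j equals k), or None."""
    for frame in held:
        if frame is not None and frame["number"] - frame["j"] == k:
            return frame["params"]
    return None


def correct_power(waveform, target_power):
    """Scale the waveform so that its mean power equals target_power."""
    power = sum(x * x for x in waveform) / len(waveform)
    if power == 0.0:
        return list(waveform)     # nothing to scale
    gain = math.sqrt(target_power / power)
    return [gain * x for x in waveform]
```

For a lost frame k, `find_parameters` would supply the pitch length for cutting the one-pitch waveforms, and `correct_power` would then be applied to the interpolated result when a power value is available.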

When the required number of held frames N_F specified to the delay buffer unit 19 by the control unit 16 changes during operation, the delay buffer unit 19 discards or inserts frame waveforms by the processing described with reference to FIG. 12 so that it holds audio signals for the updated N_F. If only the PCM audio waveform signal S_PCM is subject to this, the discard or insertion requires no additional processing. However, when the audio parameter P_{k-j} of frame number k-j is paired with the PCM audio waveform signal S_PCM of the k-th frame as in FIG. 13, discarding or inserting a frame waveform shifts the relative frame position between the PCM audio waveform signal S_PCM and the audio parameter P_{k-j}. In that case, the delay buffer unit 19 must correct the relative position of the audio parameters together with discarding or inserting the frame waveform.

FIG. 14 shows the configuration of the delay buffer unit 19 for this purpose, and FIG. 15 shows its processing flow. The delay buffer unit 19 of FIG. 14 is the delay buffer unit 19 of FIG. 11 with a relative position correction unit 19D1 added inside the write control unit 19D. When a decoded audio signal S_D from the speech waveform decoding unit 13 is inserted, the relative position correction unit 19D1 changes the relative position j contained in the audio signals of the j frames following that decoded audio signal to j+1; when a decoded audio signal is deleted, it changes the relative position j contained in the audio signals of the following j-1 frames to j-1.

The processing flow of FIG. 15 is the flow of FIG. 12 with correction of the relative position j added to steps S4 and S7. When m < N_F and step S3 determines that the decoded audio signal belongs to a silent section, step S4 changes the relative position j contained in each of the audio signals of the j frames following the decoded audio signal to j+1, then takes the decoded audio signal into the frame buffer 19A and increases m by 1. In step S7, writing of the decoded audio signal is inhibited, m is decremented by 1, and the relative position j contained in each of the j-1 frames following the decoded audio signal is changed to j-1. The rest of the processing is the same as in FIG. 12.
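The correction rule can be illustrated with a small sketch. For simplicity it assumes the frames that follow the inserted or discarded frame are already available as a list of dictionaries with a `j` field; in a streaming receiver they would arrive later, so a real implementation would need to remember how many upcoming frames still require correction. The function names are hypothetical.

```python
def on_frame_inserted(following, j):
    """After one frame is inserted, the next j frames carry parameters of
    frames that are now one position further back, so j becomes j + 1."""
    for frame in following[:j]:
        frame["j"] = j + 1


def on_frame_discarded(following, j):
    """After one frame is discarded, the next j - 1 frames carry parameters of
    frames that are now one position closer, so j becomes j - 1."""
    for frame in following[: j - 1]:
        frame["j"] = j - 1
```

After a discard, the j-th following frame carries the parameters of the discarded frame itself, which is why only j-1 frames are corrected.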

In the description of FIGS. 13, 14, and 15, the relative position j was taken to mean that the audio parameter of the k-th frame is stored in the (k+j)-th frame, j frames later. The relative position j can instead be defined to mean that it is stored in the (k-j)-th frame, j frames earlier. In that case, for example, when an audio signal is inserted in step S4, j is changed to j+1 for the most recent j frames in the frame buffer 19A, and when writing is inhibited in step S7, j is changed to j-1 for the most recent j-1 frames in the frame buffer 19A.
[Example 4]

FIG. 16 shows a configuration example in which the voice signal receiving apparatus of the present invention is applied to band-split coding, typified by wideband speech coding. The packet loss compensation processing unit 100 of FIG. 6 is provided twice, as a low-band packet loss compensation processing unit 100L and a high-band packet loss compensation processing unit 100H. Their components are denoted by adding the letter L to the reference numerals on the low-band side and the letter H on the high-band side.

The speech waveform decoding unit 13 decodes the voice packet extracted from the reception buffer 12 and outputs a high-band audio signal S_DH and a low-band audio signal S_DL. The high-band audio signal S_DH is sent to the delay buffer unit 19H of the high-band packet loss compensation processing unit 100H, and the low-band audio signal S_DL is sent to the delay buffer unit 19L of the low-band packet loss compensation processing unit 100L.

The processing of the high-band packet loss compensation processing unit 100H and of the low-band packet loss compensation processing unit 100L is the same as that described for FIG. 6 in Examples 1 to 3. However, the high-band speech waveform interpolation processing unit 17H need not cut out and arrange waveforms in pitch units; it may simply treat a frame-length waveform as a pitch-unit waveform and use it for interpolation as is. The audio signals output from the high-band packet loss compensation processing unit 100H and the low-band packet loss compensation processing unit 100L are combined by the band synthesis unit 22 and output as the output audio.

When the control unit 16 is notified by the packet loss determination unit 15 that a packet loss has occurred (packet loss flag F_PL = 1), it passes this on to the delay buffer units 19H and 19L. As in the case of FIG. 6, the control unit 16 also statistically derives the frequency and pattern of packet loss from the packet loss information (packet loss flag F_PL) received from the packet loss determination unit 15, determines the number of frames N_F to be held in the delay buffer units 19H and 19L, and specifies this required number of held frames N_F to the delay buffer units 19H and 19L of the high-band and low-band packet loss compensation processing units 100H and 100L.

The delay buffer units 19H and 19L internally hold decoded audio signals for the required number of held frames N_F specified by the control unit 16. If N_F changes during operation, the delay buffer units 19H and 19L transition their frame holding state so that they hold decoded audio signals for the updated N_F. That is, as in the configuration of FIG. 6, when N_F decreases, frame waveforms are discarded in silent or non-speech sections, and when N_F increases, frame waveforms corresponding to silence are inserted in silent or non-speech sections.
[Example 5]

As can be understood from the description of FIG. 12, in the configuration of FIG. 16 there is often a time lag between the instruction from the control unit 16 to change the required number of held frames N_F and the point at which both delay buffer units 19H and 19L actually hold the specified number of frames N_F. This is because a method that deletes or inserts waveforms in silent sections must wait, after receiving the instruction, until a silent section occurs. Even so, increases and decreases in the actual delay, that is, in the number of held frames, of the delay buffer units 19H and 19L of the high-band and low-band packet loss compensation processing units 100H and 100L are controlled so that they are always synchronized.

For example, the low band often takes longer for the actual number of held low-band frames m_L to reach the specified value N_F after the control unit 16 instructs the delay buffer unit 19L to change N_F. The control unit 16 may therefore first specify N_F to the low-band delay buffer unit 19L, and the actual number of held frames m_L of the low-band delay buffer unit 19L may be conveyed to the high-band delay buffer unit 19H as shown by the broken line in FIG. 16. The high-band delay buffer unit 19H then deletes the supplied high-band decoded audio signal S_DH or takes it in more than once, regardless of whether it belongs to a silent section, so that its number of held frames m_H is forced to match the current low-band number of frames m_L.
[Example 6]

FIG. 17 shows a configuration example of the speech waveform interpolation processing unit 17 (17H, 17L) in FIG. 6 (and FIG. 16). In this example, the speech waveform interpolation processing unit 17 comprises a pitch extraction unit 17A, a forward waveform generation unit 17B, a backward waveform generation unit 17C, and a weighted addition unit 17D. The pitch extraction unit 17A analyzes the audio signal in the output audio buffer 14 (14H, 14L), determines the pitch length corresponding to the fundamental period of the speech, and sends the pitch length to the forward waveform generation unit 17B and the backward waveform generation unit 17C. The pitch length is usually expressed as a number of samples and, for 8 kHz sampling, is often a value of about 20 to 140. As described with reference to FIG. 5, the forward waveform generation unit 17B copies a waveform of one pitch length ending at the last sample point of the audio signal in the output audio buffer 14 and pastes copies one after another into the buffer for the lost frame.

As also described with reference to FIG. 5, the backward waveform generation unit 17C copies a one-pitch waveform from the head of the look-ahead waveform and pastes copies backward in time into the buffer for the current frame. The weighted addition unit 17D weights and adds the waveform generated by the forward waveform generation unit 17B and the waveform generated by the backward waveform generation unit 17C to create an interpolated waveform for one frame, which it outputs as the interpolated audio signal.
[Modification]

In FIG. 17, the pitch length is extracted by analyzing the audio signal held in the output audio buffer 14, and the interpolated audio signal is generated by copying and pasting audio waveforms of that pitch length. However, when the decoded audio signal S_D of each frame is a set consisting of an audio waveform signal in the narrow sense, that is, an audio waveform signal typified by the PCM format, and an audio parameter, as mentioned earlier, the pitch parameter contained in that audio parameter may be used instead. Here, unlike the example of FIG. 13, the audio signal S_D of each frame is assumed to be a set consisting of the audio waveform signal S_PCM of the k-th frame and the audio parameter P_k of the k-th frame, where k is the current frame number, as shown in FIG. 18.

FIG. 19 shows a configuration example of the speech waveform interpolation processing unit 17 (17H, 17L) for use with the audio signal of FIG. 18. In this case, both the audio signal of each frame in the delay buffer unit 19 and the audio signal in the output audio buffer 14 carry audio parameters. The pitch parameter acquisition unit 17E obtains the pitch length from the audio parameter attached to the audio signal of the look-ahead waveform read out by the search in the delay buffer unit 19 and sends it to the backward waveform generation unit 17C. The pitch parameter acquisition unit 17F obtains the pitch length from the audio parameter attached to the audio signal in the output audio buffer 14 and sends it to the forward waveform generation unit 17B. The rest of the processing is the same as in FIG. 17.

Compared with the method of FIG. 17, the method of FIG. 19 yields an interpolated waveform with less quality degradation. It also has the advantage of requiring less processing on the receiving side.

The speech waveform interpolation processing unit of FIG. 19 was described as a configuration example for audio signals in which the audio waveform signal and the audio parameter of the same frame number form a set, as in FIG. 18. A decoded audio signal S_D in which the frame number of the audio waveform signal differs from that of the audio parameter, as shown in FIG. 13, can also be used. In that case, the pitch parameter acquisition unit 17F of FIG. 19 is omitted. When the k-th frame about to be output from the delay buffer unit 19 is a lost frame, the pitch parameter acquisition unit 17E obtains the pitch length from the audio parameter P_k of the k-th frame, which is contained in the decoded audio signal of the (k+j)-th frame held in the delay buffer unit 19, and supplies it to the backward waveform generation unit 17C; the same pitch length is also supplied to the forward waveform generation unit 17B, as shown by the broken line.

The voice signal receiving apparatus according to the present invention described above may be configured so that its processing is carried out by executing a program on a computer. The voice packet loss compensation method according to the present invention may also be stored on a recording medium as a computer-executable program, and the program on that recording medium may be executed by a computer.

Voice communication over packet communication networks has become widespread, and applying the present invention makes it possible to realize inexpensive and highly reliable voice communication.

FIG. 1 is a diagram showing an example of a system that packetizes and transmits audio signals.
FIG. 2 is a block diagram showing a configuration example of a conventional voice signal receiving apparatus.
FIG. 3 is a waveform diagram conceptually showing the speech waveform interpolation processing in FIG. 2.
FIG. 4 is a block diagram showing another configuration example of a conventional voice signal receiving apparatus.
FIG. 5 is a waveform diagram conceptually showing the speech waveform interpolation processing in FIG. 4.
FIG. 6 is a block diagram showing a configuration example of the voice signal receiving apparatus according to the present invention.
FIG. 7 is a processing flow diagram for calculating the n-consecutive packet loss rates.
FIG. 8 is a table showing an example of the required number of held frames versus the packet loss rate.
FIG. 9A is an example of a table showing the required number of held frames versus the consecutive packet loss rate, and FIG. 9B is another example of the table.
FIG. 10 is a processing flow diagram of voice packet loss compensation.
FIG. 11 is a block diagram showing a configuration example of a delay buffer unit that deletes and inserts frame waveforms.
FIG. 12 is a flow diagram of the frame waveform deletion and insertion processing in FIG. 11.
FIG. 13 is a diagram showing an example of an audio signal in which an audio parameter is paired with a PCM audio waveform signal.
FIG. 14 is a block diagram showing a configuration example of a delay buffer unit that corrects the relative frame position of the audio parameter.
FIG. 15 is a processing flow diagram for correcting the relative frame position of the audio parameter in FIG. 14.
FIG. 16 is a block diagram showing another configuration example of the voice signal receiving apparatus according to the present invention.
FIG. 17 is a block diagram showing a configuration example of the speech waveform interpolation processing unit.
FIG. 18 is a diagram showing another example of an audio signal in which an audio parameter is paired with a PCM audio waveform signal.
FIG. 19 is a block diagram showing another configuration example of the speech waveform interpolation processing unit.

Claims

An audio signal receiving apparatus comprising:
a reception buffer for temporarily storing received voice packets;
decoding means for extracting voice packets from the reception buffer in ascending order of frame number, decoding the voice code in each voice packet, and obtaining an audio signal in which a PCM-format audio waveform signal S_PCM of the k-th frame (k being the frame number), a relative position j that is a frame number difference, and an audio parameter P_kj of the k-th frame are combined as a set;
packet loss determination means for determining whether a voice packet to be extracted is stored in the reception buffer, and generating the determination result as a packet loss flag indicating whether a packet loss has occurred;
required frame number determining means for determining the number of frames to be held in delay buffer means (hereinafter referred to as the number of required frames) and giving the number of required frames to the delay buffer means;
the delay buffer means, which, when the number of frames held in the delay buffer means is larger than the number of required frames, discards the decoded audio signal and outputs the audio signals stored in a frame buffer in ascending order of frame number; when the number of frames held in the delay buffer means is smaller than the number of required frames, stores the decoded audio signal twice in the frame buffer and outputs the audio signals stored in the frame buffer in ascending order of frame number; and when the number of frames held in the delay buffer means is equal to the number of required frames, stores the decoded audio signal in the frame buffer and outputs the audio signals stored in the frame buffer in ascending order of frame number;
output audio buffer means for accumulating the output audio signal for a predetermined time or a predetermined number of frames;
speech waveform interpolation processing means for generating, when the frame to be output from the delay buffer means is a packet loss, an interpolated audio signal using the audio waveform signal of the output audio signal in the output audio buffer means, the audio waveform signal of a frame after the packet-loss frame in the delay buffer means, and an audio parameter corresponding to the audio waveform signal of the packet-loss frame in the delay buffer means; and
output control means for outputting the audio signal from the delay buffer means as the output audio signal when the frame to be output from the delay buffer means is not a packet loss, and outputting the interpolated audio signal from the speech waveform interpolation processing means as the output audio signal when it is a packet loss,
wherein the delay buffer means includes relative position correcting means for correcting to j-1 the value of the relative position j included in the j-1 audio signals following the discarded frame when the number of frames held in the delay buffer means is larger than the number of required frames, and for correcting to j+1 the value of the relative position j included in the audio signals of the j frames following the frame concerned when the number of frames held in the delay buffer means is smaller than the number of required frames.
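
The three-way write control of the delay buffer means recited above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the class name `DelayBuffer` and the method `write_and_read` are hypothetical, frames are represented by arbitrary Python objects, and the relative position correction of the audio parameters is not modelled.

```python
from collections import deque


class DelayBuffer:
    """Hypothetical sketch of the delay buffer write control (names are assumptions)."""

    def __init__(self, initial_frames=()):
        # Frames are kept in ascending order of frame number, oldest first.
        self.frames = deque(initial_frames)

    def write_and_read(self, decoded, required):
        """Handle one decoded frame and return the oldest buffered frame."""
        held = len(self.frames)
        if held > required:
            # More frames held than required: discard the decoded frame,
            # so the buffer shrinks by one after the read below.
            pass
        elif held < required:
            # Fewer frames held than required: store the decoded frame twice,
            # so the buffer grows by one after the read below.
            self.frames.append(decoded)
            self.frames.append(decoded)
        else:
            # Held count matches the requirement: store once, size unchanged.
            self.frames.append(decoded)
        return self.frames.popleft()
```

In this sketch each call moves the number of held frames one step toward the number of required frames, which is why a change in the required value is absorbed gradually rather than in a single jump.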

The audio signal receiving apparatus according to claim 1, further comprising packet loss statistic calculation means that is given the packet loss flag and obtains a statistical value of packet loss,
wherein the required frame number determining means determines, based on the packet loss statistical value, the number of frames to be held in the delay buffer means (hereinafter referred to as the number of required frames) and gives the number of required frames to the delay buffer means.

The audio signal receiving apparatus according to claim 2, further comprising memory means for storing rules that define in advance the number of required frames for packet loss statistical values,
wherein the required frame number determining means determines the number of required frames from the packet loss statistical value by referring to the rules in the memory means.
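
As an illustration of determining the number of required frames from a packet loss statistic by referring to stored rules, the following sketch computes a packet loss rate from recent packet loss flags and looks it up in a rule table. The thresholds and frame counts in the table are hypothetical values chosen for the example, not values taken from this document, and the function names are assumptions.

```python
from bisect import bisect_right

# Hypothetical rule table (assumed values): loss-rate upper bounds and the
# number of required frames for each resulting interval.
LOSS_RATE_BOUNDS = [0.01, 0.05, 0.10]
REQUIRED_FRAMES = [1, 2, 3, 4]


def packet_loss_rate(flags):
    """Fraction of True (lost) entries in a sequence of packet loss flags."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def required_frames(loss_rate):
    """Look up the number of required frames for a given packet loss rate."""
    return REQUIRED_FRAMES[bisect_right(LOSS_RATE_BOUNDS, loss_rate)]
```

For example, with these assumed rules, 3 lost packets out of the last 100 give a loss rate of 0.03 and a required frame count of 2.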

The audio signal receiving apparatus according to claim 2, wherein the delay buffer means includes
silent section detecting means for determining whether the decoded audio signal is an audio signal of a silent section or a non-speech section, and
the audio signal discarded or stored twice by the write control means is an audio signal of a silent section or a non-speech section.
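
Restricting discarding and double storage to silent or non-speech frames can be sketched as below. The detector is an assumption: this document does not specify here how silence is detected, so a mean-square power threshold is used purely for illustration, and the names `is_silent`, `write_action`, and the threshold value are hypothetical.

```python
def is_silent(frame, threshold=1e-4):
    """Assumed detector: mean-square power below a hypothetical threshold."""
    if not frame:
        return True
    power = sum(x * x for x in frame) / len(frame)
    return power < threshold


def write_action(held, required, frame):
    """Choose the buffer action for one decoded frame of PCM samples."""
    # Only silent or non-speech frames may be discarded or stored twice;
    # any other frame is stored once even if the held count is off target.
    if held == required or not is_silent(frame):
        return "store"
    return "discard" if held > required else "store_twice"
```

Deferring the adjustment to silent frames means the buffer length changes where a dropped or repeated frame is least likely to be audible.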

The audio signal receiving apparatus according to any one of claims 1 to 4, wherein
the decoding means decodes a high-frequency voice code and a low-frequency voice code obtained from each voice packet and outputs a high-frequency decoded audio signal and a low-frequency decoded audio signal, respectively,
the combination of the delay buffer means, the output audio buffer means, and the speech waveform interpolation processing means is provided as two sets: a high-frequency set that processes the high-frequency decoded audio signal given as the decoded audio signal to generate a high-frequency output audio signal, and a low-frequency set that processes the low-frequency decoded audio signal given as the decoded audio signal to generate a low-frequency output audio signal, and
the apparatus further comprises band synthesizing means for generating the output audio signal by synthesizing the high-frequency output audio signal and the low-frequency output audio signal.

The audio signal receiving apparatus according to claim 5, wherein the same value of the number of required frames is given to the high-frequency delay buffer means and the low-frequency delay buffer means, and means is provided for controlling the deletion or insertion of the high-frequency decoded audio signal and the low-frequency decoded audio signal so that the numbers of frames actually held in the high-frequency delay buffer means and the low-frequency delay buffer means are always synchronized.

The audio signal receiving apparatus according to claim 5, wherein the number of required frames is given to the low-frequency delay buffer means; after the low-frequency delay buffer means has transitioned to a state with a different number of held frames through deletion or insertion of the low-frequency decoded audio signal, that number of held frames is given to the high-frequency delay buffer means; and the high-frequency delay buffer means deletes or inserts the high-frequency decoded audio signal so as to forcibly match that number of held frames.

The audio signal receiving apparatus according to any one of claims 1 to 4, wherein the speech waveform interpolation processing means comprises:
forward waveform generating means for acquiring a first pitch length from an audio parameter included in the output audio signal in the output audio buffer means, and copying a waveform corresponding to the first pitch length from the output audio signal to generate a first interpolation waveform;
backward waveform generating means for acquiring a second pitch length from an audio parameter included in an audio signal found by searching the delay buffer means, and copying a waveform corresponding to the second pitch length from the found audio signal to generate a second interpolation waveform; and
weighted addition means for generating the interpolated audio signal by weighted addition of the first interpolation waveform and the second interpolation waveform.
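
The forward waveform, backward waveform, and weighted addition recited above can be sketched as follows. This is an illustrative sketch under assumptions: pitch lengths are passed in directly as sample counts rather than read from audio parameters, waveforms are plain Python lists, the linear cross-fade weights are an assumed choice (the claim only requires a weighted addition), and all function names are hypothetical.

```python
def forward_waveform(past, pitch, n):
    """Repeat the last `pitch` samples of the past output forward for n samples."""
    assert 0 < pitch <= len(past)
    cycle = past[-pitch:]
    return [cycle[i % pitch] for i in range(n)]


def backward_waveform(future, pitch, n):
    """Repeat the first `pitch` samples of the following frame backward in time,
    aligned so that the lost frame's end joins the start of `future`."""
    assert 0 < pitch <= len(future)
    cycle = future[:pitch]
    # Sample i of the lost frame lies (n - i) samples before the start of `future`.
    return [cycle[(i - n) % pitch] for i in range(n)]


def interpolate(past, future, pitch_fwd, pitch_bwd, n):
    """Weighted addition of the two interpolation waveforms (assumed linear weights)."""
    fwd = forward_waveform(past, pitch_fwd, n)
    bwd = backward_waveform(future, pitch_bwd, n)
    out = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight of the backward waveform grows over the frame
        out.append((1.0 - w) * fwd[i] + w * bwd[i])
    return out
```

With these weights the interpolated frame starts close to the forward waveform, which continues the already output signal, and ends close to the backward waveform, which leads into the following frame.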

The audio signal receiving apparatus according to any one of claims 1 to 4, wherein the speech waveform interpolation processing means comprises:
backward waveform generating means for acquiring a pitch length from an audio parameter included in the audio signal of the frame that is a predetermined j frames after the packet-loss frame in the delay buffer means, and copying a waveform corresponding to the pitch length from the found audio signal to generate a second interpolation waveform;
forward waveform generating means for copying a waveform corresponding to the pitch length from the output audio signal in the output audio buffer means to generate a first interpolation waveform; and
weighted addition means for generating the interpolated audio signal by weighted addition of the first interpolation waveform and the second interpolation waveform.

A voice packet loss compensation method comprising:
(a) temporarily storing received voice packets in a reception buffer;
(b) extracting voice packets from the reception buffer in ascending order of frame number, decoding the voice code in each voice packet, and obtaining an audio signal in which a PCM-format audio waveform signal S_PCM of the k-th frame (k being the frame number), a relative position j that is a frame number difference, and an audio parameter P_kj of the k-th frame are combined as a set;
(c) determining whether a voice packet to be extracted is stored in the reception buffer, and generating the determination result as a packet loss flag indicating whether a packet loss has occurred;
(d) determining the number of frames to be held in delay buffer means (hereinafter referred to as the number of required frames) and giving the number of required frames to the delay buffer means;
(e) when the number of frames held in the delay buffer means is larger than the number of required frames, discarding the decoded audio signal and outputting the audio signals stored in a frame buffer in ascending order of frame number; when the number of frames held in the delay buffer means is smaller than the number of required frames, storing the decoded audio signal twice in the frame buffer and outputting the audio signals stored in the frame buffer in ascending order of frame number; and when the number of frames held in the delay buffer means is equal to the number of required frames, storing the decoded audio signal in the frame buffer and outputting the stored audio signals from the delay buffer means in ascending order of frame number;
(f) storing the output audio signal in output audio buffer means for a predetermined time or a predetermined number of frames;
(g) when the frame to be output from the delay buffer means is a packet loss, generating an interpolated audio signal using the audio waveform signal of the output audio signal in the output audio buffer means, the audio waveform signal of a frame after the packet-loss frame in the delay buffer means, and an audio parameter corresponding to the audio waveform signal of the packet-loss frame in the delay buffer means; and
(h) outputting the audio signal from the delay buffer means as the output audio signal when the frame to be output from the delay buffer means is not a packet loss, and outputting the interpolated audio signal from the speech waveform interpolation processing means as the output audio signal when it is a packet loss,
wherein step (e) includes correcting to j-1 the value of the relative position j included in the j-1 audio signals following the discarded frame when the number of frames held in the delay buffer means is larger than the number of required frames, and correcting to j+1 the value of the relative position j included in the audio signals of the j frames following the frame concerned when the number of frames held in the delay buffer means is smaller than the number of required frames.

The method according to claim 10, further comprising
(i) obtaining a statistical value of packet loss from the packet loss flag,
wherein step (d) determines, based on the packet loss statistical value, the number of frames to be held in the delay buffer means (hereinafter referred to as the number of required frames) and gives the number of required frames to the delay buffer means.

The method according to claim 11, wherein step (d) determines the number of required frames from the packet loss statistical value by referring to rules that define in advance the number of required frames for packet loss statistical values.

The method according to claim 11 or 12, wherein step (e) includes
(e-1) determining whether the decoded audio signal is an audio signal of a silent section or a non-speech section, and
the audio signal discarded or stored twice by the write control means is an audio signal of a silent section or a non-speech section.

A program for causing a computer to execute the voice packet loss compensation method according to any one of claims 10 to 13.

A computer-readable recording medium on which the program according to claim 14 is recorded.