JP4232030B2

JP4232030B2 - Fluctuation absorption control method of voice packet

Info

Publication number: JP4232030B2
Application number: JP2004135437A
Authority: JP
Inventors: 耕二山宮; 豪宮尾
Original assignee: サクサ株式会社
Priority date: 2004-04-30
Filing date: 2004-04-30
Publication date: 2009-03-04
Anticipated expiration: 2024-04-30
Also published as: JP2005318379A

Description

The present invention relates to a fluctuation (jitter) absorption control method that is suitable for use in, for example, the exchange of voice packets in a VoIP (Voice over Internet Protocol) telephone system.

VoIP telephone systems have been provided that carry telephone calls using VoIP, a technology for transmitting voice signals over an IP (Internet Protocol) network such as the Internet or an intranet. Such a VoIP telephone system is built using, for example, ITU-T Recommendation H.323, a standard for general-purpose LANs (Local Area Networks).

In this case, the call voice signal is packetized in units of a predetermined time length and transmitted sequentially over the LAN, and RTP (Real-time Transport Protocol) is used as the transport-layer protocol for this purpose. In RTP, the transmitting side sends each packet with a sequence number and a time stamp (time information) placed in the RTP header, and the receiving side synchronizes playback on the basis of the sequence number and time stamp, which makes real-time operation possible.
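As an illustration of the sequence number and time stamp mentioned above, the following minimal Python sketch decodes the 12-byte fixed RTP header. The field layout follows RFC 3550, which the patent text itself does not cite; the function name `parse_rtp_header` and the sample packet are assumptions made for this example only.

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header (layout per RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the fixed header")
    # Network byte order: 2 single bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronization source identifier.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,          # top two bits of the first byte
        "payload_type": b1 & 0x7F,   # low seven bits of the second byte
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }

# Hypothetical packet: version 2, payload type 0, sequence 7, timestamp 160.
example = struct.pack("!BBHII", 0x80, 0, 7, 160, 0x1234) + bytes(160)
header = parse_rtp_header(example)
print(header["sequence_number"], header["timestamp"])
```

A receiver can sort or gap-check packets by `sequence_number` and schedule playback from `timestamp`, which is the synchronization role the paragraph above describes.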

One cause of degraded call quality between VoIP terminals is delay in the arrival intervals of RTP packets received from the LAN, that is, fluctuation. Countermeasures against this fluctuation have long been taken, as described, for example, in Patent Document 1 (JP 2004-48680 A). Normally, a predetermined number of packets are accumulated in a fluctuation absorbing buffer at the start of reception and voice playback is delayed accordingly, so that the fluctuation is absorbed.

The buffer size of the fluctuation absorbing buffer is determined by the start accumulation packet count and the maximum accumulation packet count. The start accumulation packet count is the number of received packets for which, at the start of RTP packet reception, reading from the fluctuation absorbing buffer is held back so that the start of voice playback waits.

That is, at the start of RTP packet reception, no packet data is read from the fluctuation absorbing buffer until it has accumulated as many received packets as the start accumulation packet count. Once that many packets have accumulated, the packet at the head of the accumulated packets is read out when the next received packet arrives.

The maximum accumulation packet count is the maximum number of packets that can be accumulated in the fluctuation absorbing buffer, and it corresponds to the maximum voice playback delay. When the number of packets accumulated in the fluctuation absorbing buffer exceeds this maximum, as many packets as the start accumulation packet count are left in the buffer and the remaining accumulated packets are discarded. This keeps the voice playback delay within the range given by the maximum accumulation packet count.
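The behavior of the two parameters described above can be sketched in Python as follows. This is only an illustrative model, not the implementation in the patent: the class name `FluctuationBuffer` and its methods are hypothetical, and the choice to discard the oldest packets on overflow is an assumption, since the text only says that as many packets as the start accumulation packet count are left in the buffer.

```python
from collections import deque

class FluctuationBuffer:
    """Fixed-size fluctuation absorbing buffer (illustrative sketch)."""

    def __init__(self, start_count, max_count):
        self.start_count = start_count  # packets held back before playback starts
        self.max_count = max_count      # upper bound on accumulated packets
        self.queue = deque()
        self.started = False

    def push(self, packet):
        """Store a received packet and return any packets discarded on overflow."""
        self.queue.append(packet)
        dropped = []
        if len(self.queue) > self.max_count:
            # Assumption: drop the oldest packets, keeping start_count of them.
            while len(self.queue) > self.start_count:
                dropped.append(self.queue.popleft())
        return dropped

    def pop(self):
        """Return the next packet to play, or None while playback must wait."""
        if not self.started:
            # Reading begins when the packet after the first start_count arrives.
            if len(self.queue) <= self.start_count:
                return None
            self.started = True
        return self.queue.popleft() if self.queue else None

buf = FluctuationBuffer(start_count=2, max_count=4)
buf.push(1)
first = buf.pop()    # still waiting: only 1 packet stored
buf.push(2)
second = buf.pop()   # still waiting: exactly start_count packets stored
buf.push(3)
third = buf.pop()    # the next arrival releases the head packet
print(first, second, third)
```

In this model the playback delay is bounded by `max_count` packets, and a `pop()` that returns `None` after playback has started corresponds to the underrun case in which the reproduced voice would break up.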

Conventionally, the start accumulation packet count and the maximum accumulation packet count, and hence the buffer size of the fluctuation absorbing buffer, are generally fixed values set according to the usage environment. If the buffer size is too small or too large relative to the actual fluctuation, however, the following problems arise.

If the fluctuation absorbing buffer is too small for the fluctuation that occurs, all of the voice data for the packets accumulated in the buffer is played out while packet reception is delayed by the fluctuation, so the reproduced voice breaks up. Conversely, if the buffer is too large for the fluctuation that occurs, voice playback is delayed more than necessary.

The invention of Patent Document 1 solves these problems by dynamically changing the size of the fluctuation absorbing buffer, that is, the start accumulation packet count and the maximum accumulation packet count, in accordance with increases and decreases in the amount of fluctuation occurring in the transmission system. In the fluctuation absorbing method of Patent Document 1, the amount of packets accumulated in the fluctuation absorbing buffer is changed dynamically by discarding data in units of packets or by repeatedly playing silence data or the most recently received packet data.

FIGS. 34 and 35 illustrate the change in the size of the fluctuation absorbing buffer and the packet-unit data processing in Patent Document 1. In this example, voice 1, voice 2, voice 3, and so on, sent in packet units from a transmitting terminal, are received by a receiving terminal and played back in real time. For simplicity of explanation, the start accumulation packet count of the fluctuation absorbing buffer is "0" and the maximum accumulation packet count is "1" or more. FIG. 36 is a flowchart of the processing at the receiving terminal in this case.

In FIG. 34, the packet data of voice 1 and voice 2 sent from the transmitting terminal experiences no fluctuation in the transmission system, so the receiving terminal can receive the packets in sequence and play them back in real time (steps S1 and S2 in FIG. 36).

However, the packet of voice 3 sent from the transmitting terminal arrives late at the receiving terminal because of fluctuation in the transmission system. The receiving terminal therefore generates and plays one packet's worth of synthesized signal, produced from voice 2 for example (step S3).

The receiving terminal then receives the late-arriving packet data of voice 3 and plays voice 3. At this point it increases the start accumulation packet count and the maximum accumulation packet count of the fluctuation absorbing buffer in response to the fluctuation (step S4); in other words, it increases the buffer amount (the number of accumulated packets) of the fluctuation absorbing buffer. Because the start accumulation packet count has been increased, the packet data of voice 4, which arrives next, is received and stored but not played (step S5).

Next, when the fluctuation in the transmission system has subsided and the packet of voice 5 arrives, the receiving terminal decreases the start accumulation packet count and the maximum accumulation packet count of the fluctuation absorbing buffer in response, discards the stored packet of voice 4, and plays the packet of voice 5 (step S6).

The subsequent packet data of voice 6 and voice 7 arrives with no fluctuation in the transmission system, so the receiving terminal can receive the packets in sequence and play them back in real time (steps S7 and S8). The resulting playback order of the packet-unit voice data is as shown in FIG. 35.

The patent document cited above is as follows.
Patent Document 1: JP 2004-48680 A

As described above, in a fluctuation absorption control method that uses a fluctuation absorbing buffer, voice data is discarded in units of packets, silence data is added, or a synthesized voice signal is generated from the most recently received packet data and played, in response to the fluctuation occurring in the transmission system.

However, because the voice data is discarded and the synthesized voice signal is generated in units of packets, discontinuities arise in the voice signal waveform at the boundaries between packet-unit data, and in the reproduced sound these discontinuities are heard by the human ear as noise.

In view of the above, an object of the present invention is to provide a voice packet fluctuation absorption control method that can minimize the influence of noise when a fluctuation absorbing buffer is used to absorb fluctuation in the transmission of voice packets.

To solve the above problem, the invention of claim 1 provides a voice packet fluctuation absorption control method in which voice packets, obtained by packetizing a voice signal to be played back in real time in units of a predetermined time length and transmitted sequentially, are received through a fluctuation absorbing buffer, and fluctuation in the arrival timing of the voice packets occurring in the transmission system is controlled by the fluctuation absorbing buffer,
wherein, at the start of reception, reading of the voice packets from the fluctuation absorbing buffer is started after as many voice packets as a start accumulation packet count have been accumulated in the fluctuation absorbing buffer, and data for a predetermined number of voice packets is discarded when the number of voice packets accumulated in the fluctuation absorbing buffer exceeds a maximum accumulation packet count,
the method being characterized in that a voice waveform period is detected for the voice packet data accumulated in the fluctuation absorbing buffer, and, when the data for the predetermined number of voice packets is discarded, the discarding is performed in units of the detected voice waveform period so that the continuity of the voice waveform is maintained.

In the invention of claim 1, when voice packet data accumulated in the fluctuation absorbing buffer needs to be discarded, the data is discarded in units of the voice waveform period detected for the voice packet data accumulated in that buffer.

Because the data is discarded in units of the voice waveform period, continuity is preserved between the data immediately before and after the discarded portion. In addition, the last part of the packet data that remains after the discard and the first part of the following voice packet data were continuous to begin with. As a whole, therefore, no discontinuous joins arise in the waveform, and the generation of noise is suppressed.
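The effect of discarding in waveform-period units can be checked numerically with a small Python sketch. Everything here is an assumption chosen for illustration: a pure sine wave stands in for voiced speech, the period of 50 samples and the packet length of 160 samples are arbitrary, and the helper names (`max_step`, `discard`, `whole_periods`) do not come from the patent.

```python
import math

PERIOD = 50        # samples per waveform cycle (assumed for the demo)
PACKET_LEN = 160   # samples per packet, e.g. 20 ms at 8 kHz (assumed)

def max_step(samples):
    """Largest jump between neighbouring samples; a large value marks a click."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))

def discard(samples, start, length):
    """Remove `length` samples beginning at index `start`."""
    return samples[:start] + samples[start + length:]

def whole_periods(length, period):
    """Round a discard length to the nearest whole number of periods (at least one)."""
    return max(1, round(length / period)) * period

wave = [math.sin(2 * math.pi * n / PERIOD) for n in range(400)]

# Packet-unit discard removes 3.2 cycles, so the phase jumps at the join.
by_packet = discard(wave, 100, PACKET_LEN)
# Period-unit discard removes exactly 3 cycles, so the phase lines up.
by_period = discard(wave, 100, whole_periods(PACKET_LEN, PERIOD))

print(max_step(wave), max_step(by_packet), max_step(by_period))
```

For this test signal the packet-unit cut leaves a jump of more than 1.0 at the join, while the period-unit cut leaves the largest sample-to-sample step the same as in the untouched wave. Real speech is only approximately periodic, so the join would be close to continuous rather than exact.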

The invention of claim 2 provides a voice packet fluctuation absorption control method in which voice packets, obtained by packetizing a voice signal to be played back in real time in units of a predetermined time length and transmitted sequentially, are received through a fluctuation absorbing buffer, and fluctuation in the arrival timing of the voice packets occurring in the transmission system is controlled by the fluctuation absorbing buffer,
wherein, at the start of reception, reading of the voice packets from the fluctuation absorbing buffer is started after as many voice packets as a start accumulation packet count have been accumulated in the fluctuation absorbing buffer, and, when the number of voice packets accumulated in the fluctuation absorbing buffer falls below the start accumulation packet count and the voice signal to be played back in real time can no longer be read from the fluctuation absorbing buffer, a synthesized voice signal is used as the voice signal played back in real time,
the method being characterized in that a voice waveform period is detected for the voice packet data accumulated in the fluctuation absorbing buffer, and the synthesized voice signal is generated, using the voice packet data accumulated in the fluctuation absorbing buffer, in units of the detected voice waveform period so that the continuity of the voice waveform is maintained.

In the invention of claim 2, when the voice signal to be played back in real time can no longer be read from the fluctuation absorbing buffer because of fluctuation in the transmission system, playback uses a synthesized voice signal generated in units of the voice waveform period detected for the voice packet data accumulated in that buffer.

Because the synthesized voice signal is generated in units of the voice waveform period, continuity is preserved within the generated data. Because the join between the synthesized voice signal and the voice packet data that arrives next can also fall on a voice waveform period boundary, discontinuous joins in the waveform are prevented and the generation of noise is suppressed.
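One simple way to generate a fill-in signal in waveform-period units is to repeat the last full cycle of the buffered data, as in the Python sketch below. This is an assumed stand-in, since this part of the description does not yet say how the synthesized signal is built; the function name `synthesize` and the sine-wave test data are hypothetical.

```python
import math

def synthesize(history, period, cycles):
    """Build a fill-in signal by repeating the last full waveform cycle."""
    if period <= 0 or len(history) < period:
        raise ValueError("need at least one full period of history")
    last_cycle = history[-period:]
    # The result is always a whole number of periods long.
    return last_cycle * cycles

# Assumed test data: two cycles of a sine wave with a 50-sample period.
history = [math.sin(2 * math.pi * n / 50) for n in range(100)]
fill = synthesize(history, period=50, cycles=3)
joined = history + fill

largest_step = max(abs(b - a) for a, b in zip(joined, joined[1:]))
print(len(fill), largest_step)
```

Because the fill starts at the same phase at which the history ends, the largest sample-to-sample step in `joined` stays at the level of the original wave, which is the continuity property the paragraph above relies on.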

According to the present invention, voice signals are discarded and synthesized in units of the voice waveform period detected from the voice packet data accumulated in the fluctuation absorbing buffer. This prevents waveform discontinuities at the joins in the voice data and suppresses noise in the reproduced voice.

An embodiment of the voice packet fluctuation absorption method according to the present invention is described below with reference to the drawings, taking as an example its application to voice packet transmission in a VoIP telephone system.

FIG. 1 is a block diagram showing an overview of the entire VoIP telephone system of this embodiment. The VoIP telephone system of this example comprises a gatekeeper 1, a plurality of telephone terminals (H.323 terminals) 2, a LAN 3, and a VoIP gateway 4.

Each telephone terminal 2 has the functions of a computer and is equipped with a handset 2H. Each of the telephone terminals 2 is connected to the LAN 3, which constitutes an IP network, and is connected to the gatekeeper 1 through the LAN 3.

The VoIP gateway 4 is connected to a telephone network 5 through telephone lines 6 for a plurality of lines and is also connected to the LAN 3. The telephone lines 6 include, for example, ISDN (Integrated Services Digital Network) lines and leased lines. The gateway 4 serves as a relay management device with functions for connecting the telephone network 5 and the IP network: it converts between continuous voice signals and IP voice packets, exchanges IP voice packets with the gatekeeper 1, and converts between telephone numbers and IP addresses.

In this case, the LAN 3 is configured as an IP network conforming to ITU-T Recommendation H.323.

The gatekeeper 1 handles the central management functions in the ITU-T Recommendation H.323 system configuration and is implemented by, for example, a personal computer. Each of the telephone terminals 2 is registered with the gatekeeper 1 when it is connected to the LAN 3.

The gatekeeper 1 performs switching management of the gateway 4 and the telephone terminals 2 in the system, bandwidth allocation, association of telephone numbers with IP addresses, and so on. It serves as a switching management device that manages telephone communication conducted by the registered telephone terminals over the plurality of telephone lines 6.

[Hardware configuration example of gatekeeper 1]
FIG. 2 shows an example of the hardware configuration of the gatekeeper 1 in the system of this embodiment. The gatekeeper 1 of this embodiment is implemented by, for example, a personal computer, in which a ROM 112, a RAM 113, a LAN interface 114, a packet processing unit 115, and a network management memory 116 are connected to a CPU 110 via a system bus 111.

The ROM 112 stores the processing programs executed by the gatekeeper 1, such as a program for converting between telephone number information and IP addresses. The RAM 113 is used mainly as a work area when the CPU 110 executes the programs in the ROM 112.

The LAN interface 114 has functions for taking in packetized data sent over the LAN 3 and for sending packetized data out to the LAN 3.

When a packet taken in by the LAN interface 114 is a control data packet, the packet processing unit 115 disassembles the received packet so that the control data can be interpreted. It also generates packetized data for control data to be transmitted.

Voice data packets are generally exchanged between the gateway 4 and a telephone terminal 2 without passing through the gatekeeper 1. In some cases, however, voice data packets are transferred via the gatekeeper 1. In that case, the gatekeeper 1 forwards voice packets from the gateway 4 to the telephone terminal 2, and forwards voice packets from a telephone terminal 2 to the gateway 4 or to another telephone terminal 2.

The packet processing unit 115 includes a buffer memory for disassembling and generating packetized data and for temporarily storing it for transfer processing.

The network management memory 116 stores registration information for the gateway 4 and the telephone terminals 2 present in the network, their IP addresses, and information such as the correspondence between the IP address and telephone number of each telephone terminal 2. The CPU 110 uses this information to determine the destination IP address of a packet and to identify the IP address of the receiving party.

[Hardware configuration example of telephone terminal 2]
The telephone terminal 2 in the system of this embodiment has, for example, the hardware configuration shown in FIG. 3. The telephone terminal 2 of this embodiment consists of a terminal body 200 and the handset 2H. Although not illustrated, the handset 2H includes a microphone constituting the transmitter, a transmitting amplifier, a speaker constituting the receiver, and a receiving amplifier.

The terminal body 200 is implemented by a computer, in which a ROM 212, a RAM 213, a display controller 214, an operation input interface 216 (interfaces are labeled I/F in the figures; the same applies below), a LAN interface 218, a packet disassembly/generation unit 219, a voice data input/output interface 220, an RTP reception buffer 221 constituting the fluctuation absorbing buffer, and an RTP transmission buffer 222 are connected to a CPU 210 via a system bus 211.

Also connected to the system bus 211 are a voice waveform period calculation unit 223, which extracts and calculates the voice waveform period of received voice data and transmitted voice data, a voice data synthesis processing unit 224, and a voice data discard processing unit 225.

A display 215 is connected to the display controller 214, and the display screen of the display 215 shows content under the control of the CPU 210.

An operation input unit 217, which includes a numeric keypad, cursor keys, and other operation keys, is connected to the operation input interface 216. Through the operation input interface 216, the CPU 210 recognizes which key of the operation input unit 217 the user has operated and, based on the result, executes the processing corresponding to the key operation in accordance with the programs in the ROM 212.

The ROM 212 stores, among others, a program for executing the processing sequence for registering the telephone terminal 2 with the gatekeeper 1; a program for controlling the buffer size of the RTP reception buffer 221, which constitutes the fluctuation absorbing buffer, dynamically in response to the fluctuation that actually occurs on the LAN 3; and a program for controlling the voice waveform period calculation unit 223, the voice data synthesis processing unit 224, the voice data discard processing unit 225, and so on to synthesize or discard voice data in units of the voice waveform.

The RAM 213 is used mainly as a work area when the CPU 210 executes the programs in the ROM 212.

The LAN interface 218 takes in packetized data sent over the LAN 3 and accumulates it sequentially in the RTP reception buffer 221. It also sends the voice data transmission packets accumulated in the RTP transmission buffer 222 out to the LAN 3 in sequence while checking that the LAN 3 is free.

The packet disassembly/generation unit 219 disassembles the packetized data taken in by the LAN interface 218 and read from the RTP reception buffer 221 to obtain control data and voice data. It also generates packetized data from control data to be transmitted and from voice data in units of the predetermined time length, and transfers it to the RTP transmission buffer 222. The packet disassembly/generation unit 219 includes a buffer memory for disassembling and generating packetized data.

The packet disassembly and generation functions of the packet disassembly/generation unit 219 are implemented by a DSP (Digital Signal Processor), but they can also be implemented in software by the CPU 210 and the ROM 212.

The voice data input/output interface 220 converts the voice data obtained by packet disassembly into an analog voice signal and supplies it to the handset 2H, and converts the analog voice signal input from the handset 2H into a digital signal and takes it in.

The voice waveform period calculation unit 223 calculates the voice waveform period of the voice data stored in the RTP reception buffer 221, which constitutes the fluctuation absorbing buffer. The unit is implemented by, for example, a DSP, but it can also be implemented in software by the CPU 210 and the ROM 212. The voice waveform period calculation performed by the unit is described in detail later.
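As a rough idea of what a software version of such a period calculation could look like, the Python sketch below estimates the period by normalized autocorrelation. This is a generic, commonly used technique swapped in for illustration; it is not taken from the patent, whose own calculation is described later. The function name `estimate_period`, the lag range, and the sine-wave input are assumptions.

```python
import math

def estimate_period(samples, min_lag, max_lag):
    """Return the shortest lag in [min_lag, max_lag] with the best normalized autocorrelation."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        a = samples[:-lag]   # signal
        b = samples[lag:]    # same signal shifted by `lag` samples
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        score = num / den if den else 0.0
        # The small margin keeps the shortest lag when multiples of the
        # period score equally well.
        if score > best_score + 1e-9:
            best_lag, best_score = lag, score
    return best_lag

# Assumed test signal: a sine wave that repeats every 50 samples.
wave = [math.sin(2 * math.pi * n / 50) for n in range(400)]
print(estimate_period(wave, min_lag=20, max_lag=80))
```

For speech sampled at 8 kHz, a lag range of roughly 20 to 160 samples would cover typical pitch frequencies of about 50 to 400 Hz; those figures are general rules of thumb rather than values from this document.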

When transmission fluctuation of the voice packets leaves no voice data to be read from the RTP reception buffer 221 serving as the fluctuation absorbing buffer, the voice data synthesis processing unit 224 generates a synthesized voice signal from the voice data stored in the RTP reception buffer 221 and joins the generated signal to the voice data of the next packet.

In this embodiment, the voice data synthesis processing unit 224 generates the synthesized voice signal and performs the joining in units of the voice waveform period, based on the voice waveform period obtained by the voice waveform period calculation unit 223. This processing is also described in detail later.

As described later, when the buffer size of the RTP reception buffer 221 serving as the fluctuation absorbing buffer is changed dynamically in response to the transmission fluctuation of the voice packet data and voice data needs to be discarded, the voice data discard processing unit 225 discards the voice data in units of the voice waveform period, based on the voice waveform period obtained by the voice waveform period calculation unit 223.

なお、音声データ合成処理部２２４や音声データ廃棄処理部２２５も、例えばＤＳＰで構成されるが、ＣＰＵ２１０と、ＲＯＭ２１２とにより、ソフトウエアとして実現することもできる。 Note that the voice data synthesis processing unit 224 and the voice data discard processing unit 225 are also configured by, for example, a DSP, but can also be realized as software by the CPU 210 and the ROM 212.

[Hardware configuration example of Gateway 4]
FIG. 4 shows an example of the hardware configuration of the gateway 4 in the system of this embodiment. The gateway 4 of this embodiment is configured by, for example, a personal computer, in which a ROM 412, a RAM 413, a LAN interface 414, a packet processing unit 415, an RTP reception buffer 417 constituting the fluctuation absorbing buffer, and an RTP transmission buffer 418 are connected to a CPU 410 via a system bus 411.

In the gateway 4 of this embodiment, a voice waveform cycle calculation unit 419 that extracts and calculates the voice waveform period of received voice data and transmitted voice data, a voice data synthesis processing unit 420, and a voice data discard processing unit 421 are also connected to the system bus 411.

The ROM 412 stores the following: a program for receiving voice packet data from the telephone terminal 2 via the RTP reception buffer 417 and converting it into data to be sent to the telephone network 5 through the telephone line 6; conversely, a program for converting data obtained from the telephone network 5 into packetized data to be sent to the telephone terminal 2 through the LAN 3; a program for controlling the buffer size of the RTP reception buffer 417 constituting the fluctuation absorbing buffer dynamically in response to the fluctuation that actually occurs on the LAN 3; and a program for controlling the voice waveform cycle calculation unit 419, the voice data synthesis processing unit 420, the voice data discard processing unit 421, and the like so that voice data is synthesized or discarded in voice waveform units.

ＲＡＭ４１３は、主としてＲＯＭ４１２のプログラムがＣＰＵ４１０によって実行される際にワークエリアとして使用される。 The RAM 413 is mainly used as a work area when the program of the ROM 412 is executed by the CPU 410.

また、ＬＡＮインターフェイス４１４は、ＬＡＮ３を通じて送られてくるパケット化データを取り込み、また、ＬＡＮ３にパケット化データを送出するための機能を備える。 The LAN interface 414 has a function for capturing packetized data sent through the LAN 3 and sending the packetized data to the LAN 3.

The packet processing unit 415 disassembles packets taken in by the LAN interface 414 and converts them into data in the format transmitted over the telephone network 5. It also packetizes data received from the telephone network 5 to generate the packetized data that is sent to the LAN 3 through the LAN interface 414.

The RTP reception buffer 417 and the RTP transmission buffer 418 function in the same manner as the RTP reception buffer 221 and the RTP transmission buffer 222 of the telephone terminal in FIG. 3 described above. Although FIG. 4 shows only one of each, in practice one RTP reception buffer 417 and one RTP transmission buffer 418 are provided for each of the plurality of telephone lines 6.

The voice waveform cycle calculation unit 419, the voice data synthesis processing unit 420, and the voice data discard processing unit 421 perform the same processing operations as the voice waveform cycle calculation unit 223, the voice data synthesis processing unit 224, and the voice data discard processing unit 225 of FIG. 3, respectively.

[Description of the embodiment of the fluctuation absorption control method]
The fluctuation absorption control method of this embodiment applies to the control of both the RTP reception buffer 221 of the telephone terminal and the RTP reception buffer 417 of the gateway 4. In this embodiment, the buffer size of the RTP reception buffer 221 or 417 serving as the fluctuation absorbing buffer is changed dynamically according to the fluctuation that actually occurs on the LAN 3.

In this embodiment, when the change control of the buffer size of the RTP reception buffer 221 or 417 serving as the fluctuation absorbing buffer makes it necessary to discard or add voice data (add a synthesized voice signal), the discard or addition is performed in units of the voice waveform period rather than in units of voice packets.

この実施形態のバッファサイズの制御方法を説明する前に、揺らぎ吸収バッファのバッファサイズを決定するためのパラメータの定義を、図５を参照して説明する。この図５は、揺らぎ吸収バッファとしてのＲＴＰ受信バッファ２２１または４１７のバッファサイズを決定するためのイメージを説明するためのものである。 Before describing the buffer size control method of this embodiment, the definition of parameters for determining the buffer size of the fluctuation absorbing buffer will be described with reference to FIG. FIG. 5 is an illustration for explaining an image for determining the buffer size of the RTP reception buffer 221 or 417 as the fluctuation absorbing buffer.

初期再生パケット数は、ＲＴＰ受信バッファから読み出すパケット数であり、固定値であって、この例では、初期再生パケット数＝１とされている。 The number of initial playback packets is the number of packets read from the RTP reception buffer, and is a fixed value. In this example, the number of initial playback packets = 1.

開始蓄積パケット数は、前述したように、パケット受信開始時に、ＲＴＰ受信バッファからのパケットの読み出しを遅らせて、音声再生開始を待たせる受信パケット数である。前述したように、この開始蓄積パケット数は従来は固定値であったが、この実施形態では、後述するようにＬＡＮ３に発生する揺らぎ量に応じて動的に変化する。以下に説明する例では、開始蓄積パケット数の初期値は、開始蓄積パケット数の初期値＝４とされている。 As described above, the start accumulation packet number is the number of received packets that delays the reading of the packet from the RTP reception buffer and waits for the start of audio reproduction at the start of packet reception. As described above, the number of starting accumulated packets has conventionally been a fixed value, but in this embodiment, it dynamically changes according to the amount of fluctuation generated in the LAN 3 as will be described later. In the example described below, the initial value of the starting accumulation packet number is set to 4 as the initial value of the starting accumulation packet number.

次に、最大蓄積パケット数は、前述したように、揺らぎ吸収バッファであるＲＴＰ受信バッファに蓄積可能な最大パケット数であり、ＲＴＰ受信バッファに最大蓄積パケット数以上のパケットが蓄積されたときには、前述したように、開始蓄積パケット数分のパケットをＲＴＰ受信バッファ内に残して、それ以外の蓄積パケットを廃棄する。このときに廃棄するパケット数を、溢れバッファ廃棄数と呼び、この例では、溢れバッファ廃棄数＝２とされる。ただし、後述するように、実際に廃棄する音声データは、音声波形周期単位のデータである。 Next, as described above, the maximum number of accumulated packets is the maximum number of packets that can be accumulated in the RTP reception buffer, which is a fluctuation absorbing buffer, and when more packets than the maximum number of accumulated packets are accumulated in the RTP reception buffer, As described above, packets corresponding to the number of start accumulation packets are left in the RTP reception buffer, and other accumulation packets are discarded. The number of packets discarded at this time is called the overflow buffer discard number, and in this example, the overflow buffer discard number = 2. However, as will be described later, the audio data that is actually discarded is data in units of audio waveform periods.

The maximum number of accumulated packets can be defined as
maximum number of accumulated packets = number of start accumulation packets + overflow buffer discard number
and corresponds to the maximum delay of voice reproduction. In this embodiment, since the number of start accumulation packets is a dynamically changing value, it follows from this definition that the maximum number of accumulated packets is also a dynamically changing value.
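As a minimal sketch of this definition and of the overflow rule described above, the following Python fragment computes the maximum number of accumulated packets and the number of packets dropped on overflow. The function names are hypothetical, and the default overflow discard number of 2 is the example value from the text. The sketch counts whole packets, whereas the embodiment actually discards voice data in units of the voice waveform period.

```python
def max_accumulated_packets(start_packets: int, overflow_discard: int = 2) -> int:
    """Maximum accumulated packets = start accumulation packets + overflow discard number."""
    return start_packets + overflow_discard


def packets_to_discard_on_overflow(stored: int, start_packets: int,
                                   overflow_discard: int = 2) -> int:
    """Number of packets to drop so that only the start accumulation count remains.

    Nothing is dropped until the stored count reaches the maximum.
    """
    if stored >= max_accumulated_packets(start_packets, overflow_discard):
        return stored - start_packets
    return 0
```

With the example values in the text (4 start accumulation packets, overflow discard number 2), the maximum is 6 packets, and reaching it drops 2 packets.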

次に、最大開始蓄積パケット数は、動的変化値である開始蓄積パケット数の最大値（上限）であり、固定値である。 Next, the maximum start accumulation packet number is the maximum value (upper limit) of the start accumulation packet number which is a dynamic change value, and is a fixed value.

ＲＴＰ受信バッファ２２１または４１７は、図５に示すように、最大開始蓄積パケット数以上のメモリセル（１つのパケットの蓄積メモリ部をこの明細書ではセルと呼ぶ）を備え、メモリアクセス上、最後のメモリセルの次には最初のメモリセルに戻るようにされたリング状のバッファ構成とされている。 As shown in FIG. 5, the RTP reception buffer 221 or 417 includes memory cells (a storage memory portion of one packet is referred to as a cell in this specification) equal to or more than the maximum start accumulation packet number. Next to the memory cell, a ring-shaped buffer configuration is made so as to return to the first memory cell.

そして、ＲＴＰ受信バッファ２２１または４１７から読み出すパケット位置（メモリセル）は参照インデックスにより指定される。参照インデックスのメモリセルにパケットが蓄積されていないとき、または、ＲＴＰ受信バッファ２２１または４１７に開始蓄積パケット数が蓄積されていないときには、パケットは読み出さずに、後述するように、その前の音声パケットのデータから合成音声信号のデータを生成して、その再生を行なう。このときは、参照インデックスはインクリメントしない。 A packet position (memory cell) to be read from the RTP reception buffer 221 or 417 is designated by a reference index. When no packet is accumulated in the memory cell of the reference index, or when the start accumulation packet number is not accumulated in the RTP reception buffer 221 or 417, the packet is not read out, and the previous audio packet as will be described later. The synthesized voice signal data is generated from the data and reproduced. At this time, the reference index is not incremented.

そして、開始蓄積パケット数分のパケットがＲＴＰ受信バッファ２２１または４１７に蓄積されていて、参照インデックスのセルのパケットの読み出しをしたときには、参照インデックスをインクリメントする。 Then, when the packets corresponding to the start accumulation packet number are accumulated in the RTP reception buffer 221 or 417 and the packet of the cell of the reference index is read, the reference index is incremented.

また、ＣＰＵ２１０またはＣＰＵ４１０は、ＲＴＰ受信バッファ２２１または４１７の各メモリセルに蓄積されているパケットのシーケンス番号を保持して管理している。シーケンス番号は、パケットのＲＴＰヘッダに含まれており、パケットの受信時にＲＴＰヘッダを解析して取得する。 In addition, the CPU 210 or the CPU 410 holds and manages the sequence number of the packet stored in each memory cell of the RTP reception buffer 221 or 417. The sequence number is included in the RTP header of the packet, and is obtained by analyzing the RTP header when the packet is received.

The search start index is defined as
search start index = reference index + number of initial playback packets
When the CPU 210 or the CPU 410 receives a packet, it extracts the sequence number from the RTP header of the received packet, searches the sequence numbers of the packets stored in the memory cells, starting from the memory cell at the search start index, and determines the memory cell in which to store the received packet.
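The read rule and the search start index described above can be sketched as follows. This is only an illustration under stated assumptions: the buffer is modeled as a Python list with `None` for an empty cell, the function names are hypothetical, the read condition follows the wording of the text literally, and the wrap-around of the search start index is assumed from the ring-shaped buffer configuration rather than stated in the definition.

```python
from typing import List, Optional, Tuple


def search_start_index(reference_index: int, num_cells: int,
                       initial_playback_packets: int = 1) -> int:
    """Search start index = reference index + number of initial playback packets.

    The modulo wrap-around is an assumption based on the ring-shaped buffer.
    """
    return (reference_index + initial_playback_packets) % num_cells


def read_packet(cells: List[Optional[bytes]], reference_index: int,
                start_packets: int) -> Tuple[Optional[bytes], int]:
    """Return (packet, new_reference_index).

    The packet is None when the cell at the reference index is empty or when
    fewer than start_packets packets are stored; the caller would then play a
    synthesized signal, and the reference index is left unchanged.
    """
    stored = sum(cell is not None for cell in cells)
    packet = cells[reference_index]
    if packet is None or stored < start_packets:
        return None, reference_index
    cells[reference_index] = None  # the cell is free again after reading
    return packet, (reference_index + 1) % len(cells)
```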

FIG. 6 is a functional block diagram of the fluctuation absorption control device 500. The fluctuation absorption control device 500 of FIG. 6 applies to both the telephone terminal 2 and the gateway 4. It corresponds to the parts of the hardware of the terminal main body 200 of the telephone terminal 2 shown in FIG. 3 and of the hardware of the gateway 4 shown in FIG. 4 that relate to the control of the RTP reception buffers 221 and 417 as fluctuation absorbing buffers and to the discarding and addition of voice data.

ＬＡＮ３を通じて自分宛に送られてくるＲＴＰパケットは、データ受信部５０１（ＬＡＮインターフェイス２１８，４１４に対応）で受信され、揺らぎ吸収バッファ５０２（ＲＴＰ受信バッファ２２１または４１７に対応）に蓄積される。データ受信部５０１で受信されたＲＴＰパケットは、揺らぎ検出部５０３に供給されて、ＬＡＮ３において発生しているＲＴＰパケットについての揺らぎ量（単位は時間）が検出される。 The RTP packet sent to itself through the LAN 3 is received by the data receiving unit 501 (corresponding to the LAN interfaces 218 and 414) and accumulated in the fluctuation absorbing buffer 502 (corresponding to the RTP receiving buffer 221 or 417). The RTP packet received by the data reception unit 501 is supplied to the fluctuation detection unit 503, and the fluctuation amount (unit: time) of the RTP packet generated in the LAN 3 is detected.

Here, the fluctuation amount is calculated each time a packet is received, from the difference between the arrival times of two successively received RTP packets and the difference between the time stamps contained in their RTP headers. That is, letting $T_i$ be the time stamp of the packet $S_i$ with sequence number $i$, $A_i$ the arrival time of the packet $S_i$, and $D_i$ the fluctuation amount at reception of the packet $S_i$, the fluctuation amount $D_i$ is calculated by
$$D_i = (A_i - A_{i-1}) - (T_i - T_{i-1}) \qquad \text{(Equation 1)}$$
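A small sketch of Equation 1 in Python follows. It assumes that arrival times are measured in milliseconds and that RTP time stamps count ticks of the media clock, so they are converted to milliseconds before subtraction. The function name and the default 8000 Hz clock rate (typical for narrowband telephony codecs) are assumptions of this sketch, and time stamp wrap-around is ignored.

```python
def fluctuation_ms(arrival_prev_ms: float, arrival_cur_ms: float,
                   ts_prev: int, ts_cur: int,
                   clock_rate_hz: int = 8000) -> float:
    """D_i = (A_i - A_{i-1}) - (T_i - T_{i-1}), with all terms in milliseconds."""
    arrival_diff_ms = arrival_cur_ms - arrival_prev_ms
    timestamp_diff_ms = (ts_cur - ts_prev) * 1000.0 / clock_rate_hz
    return arrival_diff_ms - timestamp_diff_ms
```

For example, two packets sent 20 ms apart (160 ticks at 8000 Hz) that arrive 35 ms apart give a fluctuation amount of 15 ms, while an arrival spacing of exactly 20 ms gives 0.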

揺らぎ検出部５０３で検出された揺らぎ量Ｄ_ｉは、揺らぎ吸収制御部５０４に供給される。揺らぎ吸収制御部５０４は、揺らぎ検出部５０３で検出された揺らぎ量Ｄ_ｉに基づいて、後述するようにＲＴＰ受信バッファのバッファサイズの制御を行ない、揺らぎ吸収処理を行ないながら、揺らぎ吸収バッファ５０２から受信パケットを読み出し、バッファ出力データ処理制御部５０５に供給する。 The fluctuation amount D _i detected by the fluctuation detection unit 503 is supplied to the fluctuation absorption control unit 504. Based on the fluctuation amount D _i detected by the fluctuation detection unit 503, the fluctuation absorption control unit 504 controls the buffer size of the RTP reception buffer as will be described later, and performs the fluctuation absorption processing from the fluctuation absorption buffer 502. The received packet is read and supplied to the buffer output data processing control unit 505.

The buffer output data processing control unit 505 includes a voice waveform cycle calculation unit 5051, a voice data synthesis processing unit 5052, and a voice data discard processing unit 5053. These correspond to the voice waveform cycle calculation unit 223, the voice data synthesis processing unit 224, and the voice data discard processing unit 225 of FIG. 3, or to the voice waveform cycle calculation unit 419, the voice data synthesis processing unit 420, and the voice data discard processing unit 421 of FIG. 4.

In the buffer output data processing control unit 505, the voice waveform cycle calculation unit 5051 calculates the voice waveform period from voice data that has already been output. In addition, when the change control of the buffer size of the fluctuation absorbing buffer 502 by the fluctuation absorption control unit 504 makes it necessary to discard or add voice data (add a synthesized voice signal), the voice data synthesis processing unit 5052 and the voice data discard processing unit 5053 perform the discard or addition, based on control information from the fluctuation absorption control unit 504, in units of the voice waveform period calculated by the voice waveform cycle calculation unit 5051.

そして、バッファ出力データ処理制御部５０５は、その出力音声データをデータデコード処理部５０６に供給する。 Then, the buffer output data processing control unit 505 supplies the output audio data to the data decoding processing unit 506.

データデコード処理部５０６は、受け取った音声データをデコードし、データ送信部５０７に供給する。データ送信部５０７は、電話端末２の場合であれば、音声データをアナログ音声信号に変換してハンドセット２Ｈに送る。また、データ送信部５０７は、ゲートウエイ４の場合であれば、音声データを空き回線を通じて電話網５に送出する。 The data decoding processing unit 506 decodes the received audio data and supplies it to the data transmission unit 507. In the case of the telephone terminal 2, the data transmission unit 507 converts the voice data into an analog voice signal and sends it to the handset 2H. In the case of the gateway 4, the data transmission unit 507 transmits the voice data to the telephone network 5 through an empty line.

なお、揺らぎ検出部５０３および揺らぎ吸収制御部５０４の動作は、ＣＰＵ２１０または４１０により実行される。また、バッファ出力データ処理制御部５０５およびデータデコード処理部５０６は、ＤＳＰ（ＤｉｇｉｔａｌＳｉｇｎａｌＰｒｏｃｅｓｓｏｒ）により構成される。データ送信部５０７は、音声データ入出力インターフェイス２２０または回線インターフェイス４１６により構成される。 The operations of the fluctuation detection unit 503 and the fluctuation absorption control unit 504 are executed by the CPU 210 or 410. Further, the buffer output data processing control unit 505 and the data decoding processing unit 506 are configured by a DSP (Digital Signal Processor). The data transmission unit 507 is configured by the voice data input / output interface 220 or the line interface 416.

[Control operation of the start accumulation amount in the fluctuation absorption control unit 504]
In this embodiment, the start accumulation amount of the RTP reception buffer (corresponding to the number of start accumulation packets) is controlled dynamically according to the amount of fluctuation occurring on the LAN 3. This dynamic control consists of increase control and decrease control of the start accumulation amount. In this specification, an "amount", as in "accumulation amount", denotes a value in units of time, and a "number of packets", as in "number of accumulated packets", is that time "amount" divided by the time per packet.

〔Procedure for increasing the start accumulation amount〕
First, the control procedure for increasing the start accumulation amount is described.
In this embodiment, when fluctuation larger than the start accumulation amount occurs on the LAN 3 continuously or periodically, the start accumulation amount is increased and received packets are re-accumulated. The increase procedure is described below.

(1) Calculation of the increase target accumulation amount
Each time a packet is received, the fluctuation amount is detected, and the increase target accumulation amount is calculated from the increase or decrease of the detected fluctuation amount. The increase target accumulation amount is the target value used when increasing the start accumulation amount. That is, letting $\mathrm{IB}_i$ be the increase target accumulation amount, $\mathrm{IB}_o$ its initial value, and $D_i$ the fluctuation amount, the increase target accumulation amount $\mathrm{IB}_i$ is obtained as follows. The suffix $i$ corresponds to the sequence number of the packet.

$$\mathrm{IB}_o = \text{start accumulation amount}$$
$$\mathrm{IB}_i = \mathrm{IB}_{i-1} + \frac{\mathrm{fa}(D_i, \mathrm{IB}_{i-1})}{\mathrm{CI}} \qquad \text{(Equation 2)}$$
where
$$\mathrm{fa}(D_i, \mathrm{IB}_{i-1}) = \begin{cases} D_i - \mathrm{IB}_{i-1} & \text{if } D_i > \mathrm{IB}_{i-1} \\ 0 & \text{if } D_i \le \mathrm{IB}_{i-1} \end{cases} \qquad \text{(Equation 3)}$$
Here, $\mathrm{CI}$ is a convergence rate constant with $\mathrm{CI} \ge 1$. The convergence rate constant $\mathrm{CI}$ suppresses the influence of locally occurring fluctuation, so that the amount of continuously occurring fluctuation can be obtained. $\mathrm{fa}()$ denotes a function of the variables in parentheses.
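Equations 2 and 3 amount to moving the target toward the observed fluctuation by a fraction 1/CI of the shortfall. A minimal Python sketch, with a hypothetical function name and an arbitrary illustrative value of CI = 4:

```python
def update_increase_target(ib_prev: float, d: float, ci: float = 4.0) -> float:
    """IB_i = IB_{i-1} + fa(D_i, IB_{i-1}) / CI  (Equations 2 and 3)."""
    if ci < 1:
        raise ValueError("the convergence rate constant CI must be >= 1")
    # fa: the shortfall of the current target against the fluctuation, or 0
    shortfall = d - ib_prev if d > ib_prev else 0.0
    return ib_prev + shortfall / ci
```

With a previous target of 80 ms and a fluctuation of 120 ms, one update gives 90 ms. Repeated updates with the same fluctuation approach 120 ms, and a single isolated spike moves the target only by a quarter of its excess.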

(2) Update of the start accumulation amount
When the RTP reception buffer becomes empty due to fluctuation delay, the start accumulation amount is increased by replacing it with the increase target accumulation amount (start accumulation amount = increase target accumulation amount). Reading of packets from the RTP reception buffer is then stopped, and received packets are re-accumulated in the RTP reception buffer until the updated start accumulation amount is reached. When the increase target accumulation amount exceeds the maximum start accumulation amount, it is limited to the maximum start accumulation amount (increase target accumulation amount = maximum start accumulation amount).
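The update step can be sketched as below. Both function names are hypothetical, all amounts are assumed to be times in milliseconds, and treating "reached" as greater than or equal is an assumption of the sketch.

```python
def start_amount_after_empty(increase_target_ms: float, max_start_ms: float) -> float:
    """New start accumulation amount when the buffer has run empty.

    The increase target is adopted, limited to the maximum start accumulation amount.
    """
    return min(increase_target_ms, max_start_ms)


def may_resume_reading(buffered_ms: float, start_ms: float) -> bool:
    """Reading stays stopped until the updated start accumulation amount is buffered."""
    return buffered_ms >= start_ms
```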

以上のようにして、この実施の形態においては、揺らぎが連続的または周期的に発生する期間が十分に長ければ、揺らぎに対する蓄積量の不足分が増加目標蓄積量に累積されてゆき、増加目標蓄積量は適切な揺らぎ量（ＩＢ_ｉ＞Ｄ_ｉ）で収束してゆくので、その収束した増加目標蓄積量に開始蓄積量を置き換えることにより、発生する揺らぎ量に対応した揺らぎ吸収を行なうことができる。 As described above, in this embodiment, if the period in which the fluctuation occurs continuously or periodically is sufficiently long, the shortage of the accumulated amount with respect to the fluctuation is accumulated in the increased target accumulated amount, and the increase target Since the accumulated amount converges with an appropriate amount of fluctuation (IB _i > D _i ), it is possible to perform fluctuation absorption corresponding to the generated fluctuation amount by substituting the starting accumulated amount for the converged increase target accumulated amount. it can.

〔Procedure for decreasing the start accumulation amount〕
Next, the control procedure for decreasing the start accumulation amount is described.
After the start accumulation amount has been increased as described above, the fluctuation may settle to a smaller value. If the start accumulation amount is kept at the increased value even after the fluctuation amount has become smaller, the delay in the RTP reception buffer (call delay) remains large, which is a problem.

この実施形態では、揺らぎ量が、そのときの開始蓄積量よりも低い値で安定している場合には、通話遅延を減少させるために、適切な開始蓄積量にまで減少させるようにする。この実施の形態では、以下の手順で揺らぎが安定しているかどうかを判定し、適切な開始蓄積量の算出を行なう。 In this embodiment, when the fluctuation amount is stable at a value lower than the starting accumulation amount at that time, it is reduced to an appropriate starting accumulation amount in order to reduce the call delay. In this embodiment, it is determined whether the fluctuation is stable by the following procedure, and an appropriate starting accumulation amount is calculated.

(1) Calculation of the decrease target accumulation amount
Each time a packet is received, the fluctuation amount is detected, and the decrease target accumulation amount, which gives the target value used when decreasing the start accumulation amount, is calculated from the increase or decrease of the detected fluctuation amount. That is, letting $\mathrm{DB}_i$ be the decrease target accumulation amount, $\mathrm{DB}_o$ its initial value, $D_i$ the fluctuation amount, $\mathrm{IB}_i$ the increase target accumulation amount, and $\mathrm{CD}$ the convergence rate constant, the decrease target accumulation amount $\mathrm{DB}_i$ is obtained as follows.

$$\mathrm{DB}_o = 0$$
$$\mathrm{DB}_i = \begin{cases} 0 & \text{if } D_i > \mathrm{IB}_{i-1} \\ \mathrm{DB}_{i-1} + \dfrac{\mathrm{fb}(D_i, \mathrm{DB}_{i-1})}{\mathrm{CD}} & \text{if } D_i \le \mathrm{IB}_{i-1} \end{cases} \qquad \text{(Equation 4)}$$
where
$$\mathrm{fb}(D_i, \mathrm{DB}_{i-1}) = \begin{cases} D_i - \mathrm{DB}_{i-1} & \text{if } D_i > \mathrm{DB}_{i-1} \\ 0 & \text{if } D_i \le \mathrm{DB}_{i-1} \end{cases} \qquad \text{(Equation 5)}$$
Here, the convergence rate constant $\mathrm{CD}$ satisfies $\mathrm{CD} \ge 1$.

なお、式４において、減少目標蓄積量ＤＢ_ｉの更新判定に、増加目標蓄積量ＩＢ_ｉを用いるのは、前述の（式２），（式３）との同期のためである。ｆｂ（）は、（）内の変数に関する関数を意味している。 In Equation 4, the increase target accumulation amount IB _i is used for the update determination of the decrease target accumulation amount DB _i for the purpose of synchronization with the above-described (Equation 2) and (Equation 3). fb () means a function related to a variable in ().
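Equations 4 and 5 can be sketched in Python as follows. The function name and the illustrative value CD = 4 are assumptions; the branch structure follows the equations directly.

```python
def update_decrease_target(db_prev: float, ib_prev: float, d: float,
                           cd: float = 4.0) -> float:
    """DB_i per Equations 4 and 5."""
    if cd < 1:
        raise ValueError("the convergence rate constant CD must be >= 1")
    if d > ib_prev:
        # The fluctuation exceeds the increase target: restart from zero.
        return 0.0
    # fb: the excess of the fluctuation over the current decrease target, or 0
    excess = d - db_prev if d > db_prev else 0.0
    return db_prev + excess / cd
```

Starting from 0 with an increase target of 80 ms, a fluctuation of 40 ms raises the decrease target to 10 ms; a later fluctuation of 100 ms, which exceeds the increase target, resets it to 0.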

(2) Determination of fluctuation stability
Whether the fluctuation has stabilized is determined from the length of the period (number of packet receptions) during which the decrease target accumulation amount does not change. In this embodiment, the convergence period count value CNT is updated by the following procedure to determine fluctuation stability.

That is,
$$\mathrm{CNT}_i = \begin{cases} 0 & \text{if } D_i > \mathrm{DB}_{i-1} \\ \mathrm{CNT}_{i-1} + 1 & \text{if } D_i \le \mathrm{DB}_{i-1} \end{cases} \qquad \text{(Equation 6)}$$

そして、収束期間カウント値ＣＮＴが、予め定めた揺らぎ安定と認められるような収束期間カウント値である収束期間定数ＣＮＴ−ｔｈよりも大きくなったかどうかを検査し、収束期間カウント値ＣＮＴが、収束期間定数ＣＮＴ−ｔｈよりも大きくなったときには、揺らぎが安定したと判定する。 Then, it is checked whether or not the convergence period count value CNT is larger than a convergence period constant CNT-th that is a convergence period count value that is recognized as stable fluctuation, and the convergence period count value CNT is equal to the convergence period. When it becomes larger than the constant CNT-th, it is determined that the fluctuation is stable.
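The counter update of Equation 6 and the stability test can be sketched as follows, with hypothetical function names:

```python
def update_convergence_count(cnt_prev: int, d: float, db_prev: float) -> int:
    """CNT_i per Equation 6: reset when D_i > DB_{i-1}, otherwise count up."""
    return 0 if d > db_prev else cnt_prev + 1


def fluctuation_is_stable(cnt: int, cnt_th: int) -> bool:
    """Stable once the count has become larger than the convergence period constant."""
    return cnt > cnt_th
```

The count only grows while the decrease target stays unchanged, since a fluctuation above the previous decrease target both changes the target (Equation 5) and resets the count.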

なお、収束期間定数ＣＮＴ−ｔｈは、固定値でもよいし、また、受信パケットサイズにより変更するようにしてもよい。 The convergence period constant CNT-th may be a fixed value or may be changed according to the received packet size.

(3) Buffer update determination and buffer update processing
When the convergence period count value CNT has become larger than the convergence period constant CNT-th and the fluctuation is determined to be stable, a buffer update determination is made as to whether to change the buffer size of the RTP reception buffer. The buffer update determination depends on whether the difference between the start accumulation amount and the decrease target accumulation amount is larger than a predetermined minimum decrease amount. The minimum decrease amount is, for example, the time of one packet.

バッファ更新判定の結果、開始蓄積量と減少目標蓄積量との差が、予め定めた最小減少量よりも小さい場合には、ＲＴＰ受信バッファサイズは、減少処理する必要がないとして、収束期間カウント値ＣＮＴや減少目標蓄積量はゼロリセットされる。 If the difference between the start accumulation amount and the reduction target accumulation amount is smaller than the predetermined minimum reduction amount as a result of the buffer update determination, the RTP reception buffer size does not need to be reduced, and the convergence period count value CNT and the reduction target accumulation amount are reset to zero.

バッファ更新判定の結果、開始蓄積量と減少目標蓄積量との差が、予め定めた最小減少量よりも大きい場合には、バッファ更新処理を実行して、開始蓄積量＝減少目標蓄積量とし、開始蓄積量を減少目標蓄積量まで減少させる。ただし、１度のバッファ更新手順で減少させる開始蓄積量は、予めパラメータとして設定された最大減少量により制限する。そして、収束期間カウント値ＣＮＴや減少目標蓄積量はゼロリセットすると共に、減少した分のバッファ内パケットを廃棄するようにする。 As a result of the buffer update determination, when the difference between the start accumulation amount and the decrease target accumulation amount is larger than the predetermined minimum decrease amount, the buffer update process is executed, and the start accumulation amount = the decrease target accumulation amount is set. Decrease the starting accumulation amount to the target reduction amount. However, the starting accumulation amount to be reduced in one buffer update procedure is limited by the maximum reduction amount set in advance as a parameter. Then, the convergence period count value CNT and the decrease target accumulation amount are reset to zero, and the decreased amount of packets in the buffer are discarded.

図７のフローチャートを参照して、以上のバッファ更新判定およびバッファ更新処理のルーチンを、さらに説明する。この図７の処理は、パケットを受信する毎に、電話端末２では、ＣＰＵ２１０により、ゲートウエイ４では、ＣＰＵ４１０により実行されるものである。 With reference to the flowchart of FIG. 7, the routine for the above buffer update determination and buffer update processing will be further described. The processing in FIG. 7 is executed by the CPU 210 in the telephone terminal 2 and by the CPU 410 in the gateway 4 every time a packet is received.

まず、収束期間カウント値ＣＮＴが、収束期間定数ＣＮＴ−ｔｈよりも大きくなって、揺らぎが安定したかどうかを判定する（ステップＳ１１）。揺らぎが安定していないと判定したときには、このルーチンを抜けて他の処理ステップに進む。 First, it is determined whether the convergence period count value CNT is larger than the convergence period constant CNT-th and the fluctuation is stable (step S11). When it is determined that the fluctuation is not stable, the routine exits and proceeds to another processing step.

また、揺らぎが安定したと判定したときには、その時点における開始蓄積量から減少目標蓄積量を減算して、その減算結果ΔＳを求める（ステップＳ１２）。そして、その減算結果ΔＳが、予め設定された最小減少量よりも大きいかどうか判定する（ステップＳ１３）。 When it is determined that the fluctuation is stable, the decrease target accumulation amount is subtracted from the starting accumulation amount at that time, and the subtraction result ΔS is obtained (step S12). Then, it is determined whether or not the subtraction result ΔS is larger than a preset minimum reduction amount (step S13).

If it is determined in step S13 that the subtraction result ΔS is smaller than the preset minimum decrease amount, the convergence period count value CNT is set to "0" (step S14), and the decrease target accumulation amount is set to "0" (step S15).

ステップＳ１３で、減算結果ΔＳが、予め設定された最小減少量よりも大きいと判定されたときには、当該減算結果ΔＳは、予め設定された最大減少量以下であるか否か判別する（ステップＳ１６）。 When it is determined in step S13 that the subtraction result ΔS is larger than a preset minimum decrease amount, it is determined whether or not the subtraction result ΔS is equal to or less than a preset maximum decrease amount (step S16). .

このステップＳ１６において、減算結果ΔＳは、予め設定された最大減少量以下であると判別したときには、開始蓄積量＝減少目標蓄積量として、開始蓄積量を減少させる（ステップＳ１７）。また、ステップＳ１６において、減算結果ΔＳは、予め設定された最大減少量よりも大きいと判別したときには、開始蓄積量＝そのときの開始蓄積量−最大減少量として、開始蓄積量を、最大減少量分だけ減少させる（ステップＳ１８）。 In this step S16, when it is determined that the subtraction result ΔS is equal to or less than the preset maximum decrease amount, the start accumulation amount is decreased as start accumulation amount = decrease target accumulation amount (step S17). In Step S16, when it is determined that the subtraction result ΔS is larger than the preset maximum decrease amount, the start accumulation amount is set to the maximum decrease amount as Start accumulation amount = Start accumulation amount at that time−Maximum decrease amount. Decrease by the amount (step S18).

ステップＳ１７およびステップＳ１８の後は、ステップＳ１９に進んで、増加目標蓄積量を、更新された開始蓄積量に置き換える。つまり、この時点から、増加目標蓄積量の初期値は、更新された開始蓄積量になる。 After step S17 and step S18, the process proceeds to step S19 to replace the increase target accumulation amount with the updated start accumulation amount. That is, from this time, the initial value of the increase target accumulation amount becomes the updated start accumulation amount.

次に、収束期間カウント値ＣＮＴを「０」にし（ステップＳ２０）、また、減少目標蓄積量を「０」にする（ステップＳ２１）。さらに、開始蓄積量を減少させたことにより溢れる分のパケットデータを、ＲＴＰ受信バッファから廃棄する（ステップＳ２２）。ここで、実際に廃棄する音声データは、パケット単位ではなく、音声波形周期単位とする。この場合の開始蓄積量の減少により廃棄するパケット数は、最大蓄積パケット数を越えて溢れたときに廃棄する溢れバッファ廃棄数のような固定値ではなく、動的なものとなる。以上で、バッファ更新処理は終了となる。 Next, the convergence period count value CNT is set to “0” (step S20), and the decrease target accumulation amount is set to “0” (step S21). Further, the packet data that overflows because the start accumulation amount was reduced is discarded from the RTP reception buffer (step S22). Here, the audio data actually discarded is in units of voice waveform periods, not in units of packets. The number of packets discarded due to this reduction of the start accumulation amount is not a fixed value like the overflow buffer discard number used when the maximum accumulated packet number is exceeded, but is dynamic. This completes the buffer update process.
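
The branch structure of steps S12 to S18 can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function name, the use of milliseconds, and the handling of the case where ΔS exactly equals the minimum decrease amount (treated here as "no reduction") are assumptions that the description does not specify.

```python
def reduce_start_amount(start, decrease_target, min_decrease, max_decrease):
    """Return the updated start accumulation amount (hypothetical helper).

    All amounts are assumed to be in the same unit (for example msec).
    """
    delta = start - decrease_target      # step S12: subtraction result ΔS
    if delta <= min_decrease:            # step S13: reduction too small (equality case assumed)
        return start                     # S14/S15 only reset CNT and the decrease target
    if delta <= max_decrease:            # step S16
        return decrease_target           # step S17: jump straight to the decrease target
    return start - max_decrease          # step S18: reduce by the maximum decrease amount only


# Example with made-up values: ΔS = 70 exceeds the maximum decrease of 40.
print(reduce_start_amount(100, 30, 5, 40))
```

After either reduction branch, the description also replaces the increase target accumulation amount with the updated start accumulation amount (step S19) and resets CNT and the decrease target accumulation amount (steps S20 and S21); those state updates are omitted from the sketch.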

［揺らぎ吸収バッファ制御の処理ルーチン］ [Processing routine for fluctuation absorption buffer control]
次に、揺らぎ吸収バッファのバッファサイズの制御および読み出し処理動作の全体を、図８および図９のフローチャートを参照しながら説明する。 Next, the overall buffer size control and read processing operation of the fluctuation absorbing buffer will be described with reference to the flowcharts of FIGS. 8 and 9.

ＲＴＰパケットを受信すると、電話端末２では、ＣＰＵ２１０に、ゲートウエイ４ではＣＰＵ４１０に割り込みが発生して、図８および図９のフローチャートの処理ルーチンを開始し、まず、ＲＴＰヘッダを解析する（ステップＳ３１）。次に、既にパケットの受信を開始しているかどうか判別する（ステップＳ３２）。 When an RTP packet is received, an interrupt is raised to the CPU 210 in the telephone terminal 2, or to the CPU 410 in the gateway 4, and the processing routine of the flowcharts of FIGS. 8 and 9 starts. First, the RTP header is analyzed (step S31). Next, it is determined whether packet reception has already started (step S32).

ステップＳ３２で、未だ、パケット受信開始状態になっておらず、受信したパケットが受信開始のパケットであると判別したときには、パケットサイズ、つまり、音声データサイズを、予め決定されているパラメータに基づいて取得する（ステップＳ３３）。 When it is determined in step S32 that the packet reception start state has not yet been entered and the received packet is the first packet of a reception, the packet size, that is, the audio data size, is obtained from a predetermined parameter (step S33).

すなわち、この例においては、送信側は、音声信号を所定時間長分毎にパケット化して、ＬＡＮ３に送出するが、この場合のパケット化の際の所定時間長（音声データのサイズ）は、例えば１０ｍｓｅｃ、２０ｍｓｅｃ、３０ｍｓｅｃなどのいくつかのサイズの中から、選択することができる。そして、どのサイズでパケット化されているかの情報（パラメータ）は、音声データの送受信の前に制御信号をやり取りすることにより決定されており、その情報（パラメータ）により、音声データサイズを取得する。 That is, in this example, the transmission side packetizes the audio signal every predetermined time length and sends it to the LAN 3. In this case, the predetermined time length (size of the audio data) at the time of packetization is, for example, It can be selected from several sizes such as 10 msec, 20 msec, and 30 msec. The information (parameter) indicating what size is packetized is determined by exchanging control signals before transmission / reception of audio data, and the audio data size is acquired from the information (parameter).

音声データサイズを取得したら、初期再生パケット数、開始蓄積パケット数の初期値、バッファ廃棄数、最大開始蓄積パケット数等のバッファサイズ制御パラメータを算出する（ステップＳ３４）。この例では、開始蓄積量の初期値、バッファ廃棄量、最大開始蓄積量は、時間単位で定められており、１パケット当たりの時間長である音声データサイズから、前記パラメータをパケット数単位に変換する。 When the audio data size has been obtained, buffer size control parameters such as the initial reproduction packet number, the initial value of the start accumulation packet number, the buffer discard number, and the maximum start accumulation packet number are calculated (step S34). In this example, the initial value of the start accumulation amount, the buffer discard amount, and the maximum start accumulation amount are defined in units of time, and these parameters are converted into numbers of packets using the audio data size, which is the time length per packet.

例えば、開始蓄積量の初期値が３０ｍｓｅｃ、バッファ廃棄量が２０ｍｓｅｃ、最大開始蓄積量が１０００ｍｓｅｃに設定されており、音声データサイズが、１０ｍｓｅｃ／パケットであったときには、開始蓄積パケット数の初期値は「３」、バッファ廃棄数は「２」、最大開始蓄積パケット数は「１００」となる。 For example, when the initial value of the start accumulation amount is set to 30 msec, the buffer discard amount is set to 20 msec, the maximum start accumulation amount is set to 1000 msec, and the audio data size is 10 msec / packet, the initial value of the start accumulation packet number is “3”, the number of buffer discards is “2”, and the maximum start accumulated packet number is “100”.
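
The conversion in step S34 is a division of each time-based parameter by the packet duration. The following minimal sketch reproduces the example above; the function and key names are hypothetical, and rounding up when a parameter is not a whole multiple of the packet duration is an assumption, since the description only gives an evenly divisible case.

```python
def to_packets(duration_ms, packet_ms):
    """Convert a duration to a packet count, rounding up (rounding rule is assumed)."""
    return -(-duration_ms // packet_ms)  # integer ceiling division


def buffer_params(packet_ms, initial_start_ms=30, discard_ms=20, max_start_ms=1000):
    """Convert the time-based buffer parameters to packet counts (step S34)."""
    return {
        "start_packets": to_packets(initial_start_ms, packet_ms),
        "discard_packets": to_packets(discard_ms, packet_ms),
        "max_start_packets": to_packets(max_start_ms, packet_ms),
    }


# 10 msec per packet, as in the example in the text.
print(buffer_params(10))
# {'start_packets': 3, 'discard_packets': 2, 'max_start_packets': 100}
```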

以上のパラメータの算出が終了したら、ＲＴＰ受信バッファを受信開始状態にし、ステップＳ３３で取得した音声データサイズに合わせて、図６のバッファ出力データ処理制御部５０５の制御側からの受信割り込み周期を設定する（ステップＳ３５）。そして、前記受信割り込み周期は、バッファ出力データ処理制御部５０５が制御側から受信する割り込みに関するものであり、この割り込みは、ＬＡＮ３側の音声データの受信で発生するものではない。 When the calculation of the above parameters is completed, the RTP reception buffer is set to the reception start state, and the reception interrupt cycle from the control side of the buffer output data processing control unit 505 in FIG. 6 is set in accordance with the audio data size acquired in step S33. (Step S35). The reception interrupt cycle relates to an interrupt received by the buffer output data processing control unit 505 from the control side, and this interrupt does not occur upon reception of voice data on the LAN 3 side.

電話端末２あるいはゲートウエイ４は、この受信割り込み周期で、ＲＴＰ受信バッファからパケットの取り出しを行って、後述する音声送出処理を行なう。 The telephone terminal 2 or the gateway 4 takes out the packet from the RTP reception buffer at this reception interrupt cycle, and performs voice transmission processing described later.

次に、受信開始状態にしたら、到着したパケットについての揺らぎ量の算出を行なう。また、ステップＳ３２で既に受信開始状態であると判別したときには、即座に、到着したパケットについての揺らぎ量の算出を行なう（ステップＳ３６）。このステップＳ３６における揺らぎ量の算出は、前述した（式１）によって行われる。 Next, when the reception is started, the fluctuation amount for the arrived packet is calculated. If it is determined in step S32 that the reception has already started, the amount of fluctuation for the arrived packet is immediately calculated (step S36). The calculation of the fluctuation amount in step S36 is performed by the above-described (Equation 1).

次に、算出した揺らぎ量に基づき、前述の（式２）、（式３）に基づき増加目標蓄積量ＩＢ_ｉの算出を行なうと共に、前述の（式４）、（式５）に基づき減少目標蓄積量ＤＢ_ｉの算出を行なう（ステップＳ３７）。 Next, based on the calculated fluctuation amount, the increase target accumulation amount IB _i is calculated based on the above-described (Expression 2) and (Expression 3), and the decrease target is determined based on the above-mentioned (Expression 4) and (Expression 5) The accumulation amount DB _i is calculated (step S37).

次に、ＲＴＰ受信バッファ内の蓄積パケット数が、初期再生パケット数より小さいかどうか判定する（ステップＳ３８）。 Next, it is determined whether or not the number of stored packets in the RTP reception buffer is smaller than the number of initial reproduction packets (step S38).

ＲＴＰ受信バッファ内の蓄積パケット数が、初期再生パケット数より小さいときには、再生するパケットがバッファ内にないため、ＲＴＰ受信バッファからのパケットの読み出し時に用いるバッファ参照フラグを「ＦＡＬＳＥ」にして、バッファ参照を禁止させ、無音生成を指示する（ステップＳ３９）。 When the number of packets stored in the RTP reception buffer is smaller than the initial number of packets to be reproduced, there are no packets to be reproduced, so the buffer reference flag used when reading the packet from the RTP reception buffer is set to “FALSE” and the buffer is referenced. Is prohibited, and silence generation is instructed (step S39).

次に、開始蓄積量を、そのときの増加目標蓄積量とする（ステップＳ４０）。ただし、増加目標蓄積量が最大開始蓄積量よりも大きいとき（増加目標蓄積量＞最大開始蓄積量）には、開始蓄積量＝最大開始蓄積量に制限される。前述したように、増加目標蓄積量の初期値は、開始蓄積量の初期値である。 Next, the starting accumulation amount is set as an increase target accumulation amount at that time (step S40). However, when the increase target accumulation amount is larger than the maximum start accumulation amount (increase target accumulation amount> maximum start accumulation amount), the start accumulation amount is limited to the maximum start accumulation amount. As described above, the initial value of the increase target accumulation amount is the initial value of the start accumulation amount.

このステップＳ４０においては、パケットの受信を開始したときには、前回のパケット受信時における揺らぎ量に応じたＲＴＰ受信バッファサイズとなるように、開始蓄積量が設定される。また、パケットの受信を開始している状態においては、ＲＴＰ受信バッファ内の蓄積パケット数が、初期再生パケット数より小さいということは、揺らぎによる遅延のために、ＲＴＰ受信バッファ内のパケットが空になったことを意味するため、開始蓄積量がそのときの増加目標蓄積量に増加変更される。 In this step S40, when the reception of the packet is started, the start accumulation amount is set so that the RTP reception buffer size corresponding to the fluctuation amount at the previous packet reception is obtained. In addition, in a state where reception of packets is started, the number of packets stored in the RTP reception buffer is smaller than the number of initial reproduction packets. This means that the packets in the RTP reception buffer are emptied due to delay due to fluctuations. This means that the starting accumulation amount is increased and changed to the increase target accumulation amount at that time.

次に、ＲＴＰ受信バッファサイズを、
ＲＴＰ受信バッファサイズ＝開始蓄積パケット数
として算出する（ステップＳ４１）。また、最大蓄積パケット数を、
最大蓄積パケット数＝ＲＴＰ受信バッファサイズ＋溢れ時のバッファ廃棄数
として算出する（ステップＳ４２）。次に、次回受信したときに、受信パケットをＲＴＰ受信バッファのいずれのメモリセルに蓄積するかを検索する際の開始インデックス（インデックスは、ＲＴＰ受信バッファのメモリセルを指し示す指標である）である検索開始インデックスを、ＲＴＰ受信バッファの先頭にセットする（ステップＳ４３）。その後、ＲＴＰ受信バッファへの受信パケット蓄積処理に進む（ステップＳ４５）。
Next, the RTP reception buffer size is calculated as
RTP reception buffer size = start accumulation packet number
(step S41). The maximum accumulated packet number is calculated as
maximum accumulated packet number = RTP reception buffer size + buffer discard number at overflow
(step S42). Next, the search start index is set to the head of the RTP reception buffer (step S43). The search start index is the index from which the search begins, at the next reception, for the memory cell of the RTP reception buffer in which the received packet is to be stored (an index is a pointer to a memory cell of the RTP reception buffer). Thereafter, the process proceeds to the process of storing the received packet in the RTP reception buffer (step S45).
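
Steps S40 to S42 amount to a clamp followed by two assignments. The sketch below is illustrative only and uses hypothetical names; the sample values (a start accumulation packet number of 5 and a buffer discard number of 2 giving a maximum accumulated packet number of 7) match the FIG. 10 example described later.

```python
def update_buffer_limits(increase_target, max_start, discard_count):
    """Return (start, buffer_size, max_stored) in packets for steps S40 to S42."""
    start = min(increase_target, max_start)   # S40: limited to the maximum start accumulation
    buffer_size = start                       # S41: RTP reception buffer size
    max_stored = buffer_size + discard_count  # S42: maximum accumulated packet number
    return start, buffer_size, max_stored


print(update_buffer_limits(5, 10, 2))   # (5, 5, 7)
print(update_buffer_limits(12, 10, 2))  # (10, 10, 12): the increase target is clamped
```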

また、ステップＳ３８で、蓄積パケット数が初期再生パケット数以上であると判別したときには、検索開始インデックスを、
検索開始インデックス＝参照インデックス＋初期再生パケット数
とする（ステップＳ４４）。その後、ステップＳ４５の受信パケット蓄積処理に進む。
If it is determined in step S38 that the number of accumulated packets is equal to or greater than the initial reproduction packet number, the search start index is set to
search start index = reference index + initial reproduction packet number
(step S44). Thereafter, the process proceeds to the received packet accumulation process in step S45.

ステップＳ４５の受信パケット蓄積処理においては、前述したように、検索開始インデックスで指し示されるメモリセルから順に、ＲＴＰ受信バッファの各メモリセルに蓄積されているパケットのシーケンス番号を検索して、今回受信したパケットを格納すべきセルを検出し、検出されたセルに、受信パケットを格納する。 In the received packet accumulation process of step S45, as described above, the sequence numbers of the packets stored in the memory cells of the RTP reception buffer are examined in order, starting from the memory cell indicated by the search start index, to find the cell in which the packet received this time should be stored, and the received packet is stored in that cell.

以上のようにして、ＲＴＰ受信バッファ内に受信パケットを蓄積したら、蓄積パケット数を１だけインクリメントする（図９のステップＳ５１）。 When the received packet is accumulated in the RTP reception buffer as described above, the accumulated packet number is incremented by 1 (step S51 in FIG. 9).

次に、揺らぎが、収束したか否か判別する（ステップＳ５２）。そして、揺らぎが収束したと判別したときには、前述した開始蓄積量の減少手順を実行する（ステップＳ５３）。ステップＳ５２の処理は、図７のステップＳ１１の処理に対応し、また、ステップＳ５３の処理は、図７のステップＳ１２〜ステップＳ２２に処理に対応する。 Next, it is determined whether or not the fluctuation has converged (step S52). When it is determined that the fluctuation has converged, the above-described start accumulation amount reduction procedure is executed (step S53). The processing in step S52 corresponds to the processing in step S11 in FIG. 7, and the processing in step S53 corresponds to the processing in steps S12 to S22 in FIG.

ステップＳ５３での開始蓄積量の減少手順が終了した後と、ステップＳ５２で、減少目標蓄積量が収束していないと判別したときには、ＲＴＰ受信バッファの蓄積パケット数が、最大蓄積パケット数以上であるか否か判別する（ステップＳ５４）。そして、蓄積パケット数が最大蓄積パケット数以上であると判別したときには、ＲＴＰ受信バッファの先頭から、ステップＳ３４で定められた溢れ時のバッファ廃棄数分のパケットを、ＲＴＰ受信バッファから廃棄する（ステップＳ５５）。なお、実際の音声データの廃棄は、後述するように音声波形周期単位である。 After the start accumulation amount reduction procedure in step S53 has finished, or when it is determined in step S52 that the decrease target accumulation amount has not converged, it is determined whether the number of accumulated packets in the RTP reception buffer is equal to or greater than the maximum accumulated packet number (step S54). When the number of accumulated packets is equal to or greater than the maximum accumulated packet number, as many packets as the buffer discard number at overflow determined in step S34 are discarded from the RTP reception buffer, starting from its head (step S55). Note that the actual audio data is discarded in units of voice waveform periods, as will be described later.

ステップＳ５４で、蓄積パケット数が最大蓄積パケット数よりも少ないと判別されたとき、また、ステップＳ５５でのパケットの廃棄処理が終了した後には、蓄積パケット数が、ＲＴＰ受信バッファサイズ以上であるか否か判別する（ステップＳ５６）。 When it is determined in step S54 that the number of accumulated packets is smaller than the maximum accumulated packet number, or after the packet discard process in step S55 has finished, it is determined whether the number of accumulated packets is equal to or greater than the RTP reception buffer size (step S56).

ステップＳ５６で、蓄積パケット数が、ＲＴＰ受信バッファサイズ以上であると判別したときには、参照インデックスを、ＲＴＰ受信バッファの先頭のセル番号とし（ステップＳ５７）、検索開始インデックスは、
検索開始インデックス＝参照インデックス＋初期再生パケット数
とした後（ステップＳ５８）、バッファ参照フラグを「ＴＲＵＥ」として、音声データ送出処理手順において、バッファ参照を許可し、音声データ再生開始を指示する（ステップＳ５９）。
When it is determined in step S56 that the number of accumulated packets is equal to or larger than the RTP reception buffer size, the reference index is set to the first cell number of the RTP reception buffer (step S57), and the search start index is set to
search start index = reference index + initial reproduction packet number
(step S58). The buffer reference flag is then set to “TRUE”, which permits buffer reference in the audio data transmission processing procedure and instructs the start of audio data reproduction (step S59).

その後、このパケット受信時の処理ルーチンを終了する。また、ステップＳ５６で、蓄積パケット数が、ＲＴＰ受信バッファサイズ以上ではないと判別したときにも、このパケット受信時の処理ルーチンを終了する。 Thereafter, the processing routine at the time of receiving the packet is terminated. Further, when it is determined in step S56 that the number of accumulated packets is not equal to or larger than the RTP reception buffer size, the processing routine at the time of packet reception is ended.
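
The tail of the reception routine (steps S54 to S59) can be summarized in a few lines. The following is a rough sketch under simplifying assumptions: only the packet count and the buffer reference flag are modeled, the index updates of steps S57 and S58 are omitted, and the function name is hypothetical.

```python
def after_store(stored, buffer_size, max_stored, discard_count, ref_flag):
    """Return (stored, ref_flag) after the overflow and playback-start checks."""
    if stored >= max_stored:      # step S54: buffer has overflowed
        stored -= discard_count   # step S55: discard from the head of the buffer
    if stored >= buffer_size:     # step S56: enough packets accumulated
        ref_flag = True           # step S59: permit buffer reference, start playback
    return stored, ref_flag       # otherwise the flag is left unchanged


# 7 packets stored with a buffer size of 5 and a maximum of 7: two are discarded,
# and playback is permitted because 5 packets remain.
print(after_store(7, 5, 7, 2, False))  # (5, True)
```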

［ＲＴＰ受信バッファ内のパケット数の変化の具体例］ [Specific example of change in the number of packets in the RTP reception buffer]
次に、図１０〜図１２を参照して、パケット到着時のＲＴＰ受信バッファ内のパケット数の変化の具体例について説明する。図１０〜図１２の各図において、ＢＦはＲＴＰ受信バッファを模式的に示し、網掛けを付して示したメモリセルは、パケットが蓄積されていることを示しており、当該蓄積パケットの中に示した数字はシーケンス番号を示している。また、白抜き部分のセルは、空きを示している。 Next, a specific example of the change in the number of packets in the RTP reception buffer when packets arrive will be described with reference to FIGS. 10 to 12. In each of FIGS. 10 to 12, BF schematically shows the RTP reception buffer, the shaded memory cells indicate that a packet is stored, and the number shown inside each stored packet is its sequence number. White cells are empty.

以下に説明する例では、初期再生パケット数＝１、開始蓄積量の初期パケット数＝３、最大蓄積パケット数の初期値＝５、最大開始蓄積パケット数＝１０とした場合である。開始蓄積パケット数と、最大蓄積パケット数は動的に変化する。 In the example described below, the number of initial reproduction packets = 1, the initial number of packets of the starting accumulation amount = 3, the initial value of the maximum accumulation packet number = 5, and the maximum number of start accumulation packets = 10. The starting accumulated packet number and the maximum accumulated packet number dynamically change.

なお、図１０〜図１２の各図において、Ａ〜Ｄは、Ａ＝開始蓄積量の初期パケット数、Ｂ＝開始蓄積パケット数（動的変化）、Ｃ＝最大蓄積パケット数（動的変化）、Ｄ＝最大開始蓄積パケット数であり、各図において、上方に示したこれらＡ〜Ｄのうち、ＢおよびＣの値は、その初期値を示している。また、図１０〜図１２において、網掛けを付して示したものは、蓄積パケットを示しており、当該蓄積パケットの中に示した数字はシーケンス番号を示している。 10 to 12, A to D are A = the number of initial packets of the starting accumulation amount, B = the number of starting accumulation packets (dynamic change), and C = the maximum number of accumulation packets (dynamic change). , D = maximum starting accumulated packet number, and in each figure, among these A to D shown above, the values of B and C indicate their initial values. In FIGS. 10 to 12, shaded portions indicate stored packets, and the numbers shown in the stored packets indicate sequence numbers.

図１０は、パケットの受信開始からの処理と、揺らぎ発生による開始蓄積量および最大蓄積パケット数の変更および揺らぎ吸収処理を示すものである。この図１０では、発生した揺らぎは、パケットを廃棄しなければならないほど大きいものではなかった場合である。 FIG. 10 shows the process from the start of packet reception, the change of the start accumulation amount and the maximum number of accumulated packets due to fluctuations, and the fluctuation absorption process. In FIG. 10, the fluctuation that has occurred is not so large that the packet must be discarded.

最初のパケットを受信すると、前述したように、音声データサイズ（パケットサイズ）が認識され、バッファサイズ制御パラメータの算出がなされて、受信開始状態とされ、以後、音声データサイズ（パケットサイズ）に基づいて設定された受信割り込み周期で、後述するような音声データ送出処理がなされる。そして、到着したパケットがＲＴＰ受信バッファＢＦに蓄積される。 When the first packet is received, as described above, the voice data size (packet size) is recognized, the buffer size control parameter is calculated, and reception is started. Thereafter, based on the voice data size (packet size). The audio data transmission process described later is performed at the reception interrupt period set in the above. Then, the arrived packet is accumulated in the RTP reception buffer BF.

そして、パケット受信開始から、ＲＴＰ受信バッファＢＦに３個のパケットが蓄積されるまでは、ＲＴＰ受信バッファＢＦからパケットは読み出されず、無音再生が行われる。 From the start of packet reception until the three packets are accumulated in the RTP reception buffer BF, no packets are read from the RTP reception buffer BF, and silent reproduction is performed.

４個目のパケットが到着すると、それが蓄積されると共に、前述した受信割り込み周期のタイミングで、ＲＴＰ受信バッファＢＦの先頭のパケットが読み出されて、音声データの送出がなされる。以後は、受信割り込み周期のタイミングで、ＲＴＰ受信バッファＢＦに蓄積されているパケットは、先頭から順次読み出されて音声データ送出される。したがって、揺らぎが発生しなければ、図１０の例においては、ＲＴＰ受信バッファＢＦには、常に３個のパケットが蓄積される状態になる。 When the fourth packet arrives, it is accumulated, and at the timing of the reception interrupt cycle described above, the head packet of the RTP reception buffer BF is read out and voice data is transmitted. Thereafter, packets stored in the RTP reception buffer BF are sequentially read from the head and transmitted as audio data at the timing of the reception interrupt cycle. Therefore, if fluctuation does not occur, in the example of FIG. 10, three packets are always stored in the RTP reception buffer BF.

ここで、遅延揺らぎが発生すると、受信パケットが到着しないにも関わらず、受信割り込み周期のタイミングで、ＲＴＰ受信バッファに蓄積されているパケットは、先頭から順次、読み出されるので、新パケットが到着しない限り、ＲＴＰ受信バッファＢＦ内の蓄積パケット数は、徐々に減ってゆく。 Here, when delay fluctuation occurs, the packets stored in the RTP reception buffer are sequentially read from the beginning at the timing of the reception interrupt period even though the reception packet does not arrive, so that no new packet arrives. As long as the number of packets stored in the RTP reception buffer BF is gradually reduced.

そして、ＲＴＰ受信バッファＢＦ内の蓄積パケット数がゼロになると、揺らぎ吸収制御部５０４は、前述したように、開始蓄積パケット数Ｂを、増加目標蓄積パケット数に増加変更する。これに伴い、最大蓄積パケット数Ｃも増加変更される。図１０の例では、開始蓄積パケット数は「５」に、最大蓄積パケット数は「７」にそれぞれ変更される。 When the number of accumulated packets in the RTP reception buffer BF becomes zero, the fluctuation absorption control unit 504 changes the start accumulated packet number B to the increased target accumulated packet number as described above. Accordingly, the maximum accumulated packet number C is also changed. In the example of FIG. 10, the starting accumulated packet number is changed to “5”, and the maximum accumulated packet number is changed to “7”.

その後、揺らぎの発生により遅れた３周期分と合わせて４個のパケットが到着すると、それがＲＴＰ受信バッファＢＦに蓄積される。この時点では、変更された開始蓄積パケット数「５」には満たないので、ＲＴＰ受信バッファＢＦからパケットは読み出されず、無音再生が行われる。そして、次にパケットが到着してＲＴＰ受信バッファＢＦの蓄積パケット数が５になるまで、ＲＴＰ受信バッファＢＦからパケットは読み出されず、バッファ出力データ処理制御部５０５で、音声信号がその直前の音声データから音声波形周期単位で合成され、それが出力音声信号とされる。 Thereafter, when four packets arrive together with the three periods delayed by the occurrence of fluctuation, they are accumulated in the RTP reception buffer BF. At this time, since the number of changed start accumulation packets “5” is not reached, the packets are not read from the RTP reception buffer BF, and silent reproduction is performed. Then, until the next packet arrives and the number of accumulated packets in the RTP reception buffer BF reaches 5, the packet is not read from the RTP reception buffer BF, and the buffer output data processing control unit 505 outputs the audio signal immediately before the audio data. Are synthesized in units of speech waveform periods, and are used as output speech signals.

そして、ＲＴＰ受信バッファＢＦの蓄積パケット数が５になると、受信割り込み周期のタイミングで、ＲＴＰ受信バッファＢＦに蓄積されているパケットは、先頭から順次読み出されて音声データの送出がなされる。こうして、発生した揺らぎの量に合わせて、開始蓄積パケット数が変更されるので、パケットが廃棄されることによる音切れが防止される。 When the number of accumulated packets in the RTP reception buffer BF becomes 5, the packets accumulated in the RTP reception buffer BF are sequentially read from the head and the audio data is transmitted at the timing of the reception interrupt cycle. In this way, since the number of start accumulation packets is changed in accordance with the amount of fluctuation that has occurred, sound interruption due to packet discard is prevented.

次に、大きな揺らぎが発生した場合におけるＲＴＰ受信バッファのバッファサイズの変更制御（増加制御）について、図１１を参照して説明する。図１１の例は、開始蓄積パケット数が３、最大蓄積パケット数が５である場合において、大きな揺らぎが発生した場合である。 Next, buffer size change control (increase control) of the RTP reception buffer when a large fluctuation occurs will be described with reference to FIG. The example of FIG. 11 is a case where a large fluctuation occurs when the number of start accumulation packets is 3 and the maximum number of accumulation packets is 5.

図１１において、揺らぎが発生するまでは、受信割り込み周期のタイミングで、順次、ＲＴＰ受信バッファＢＦの蓄積パケットの先頭のパケットが読み出されると共に、受信パケットが蓄積される。 In FIG. 11, until the fluctuation occurs, at the timing of the reception interrupt cycle, the leading packet of the accumulation packet of the RTP reception buffer BF is sequentially read and the reception packet is accumulated.

この状態で揺らぎが発生すると、前述と同様にして、受信パケットが到着しないにも関わらず、受信割り込み周期のタイミングで、ＲＴＰ受信バッファに蓄積されているパケットは、先頭から順次、読み出されるので、ＲＴＰ受信バッファＢＦ内の蓄積パケット数は、徐々に減ってゆく。 When fluctuations occur in this state, the packets accumulated in the RTP reception buffer are sequentially read from the beginning at the timing of the reception interrupt cycle, even though the reception packet does not arrive, as described above. The number of accumulated packets in the RTP reception buffer BF gradually decreases.

そして、ＲＴＰ受信バッファＢＦ内の蓄積パケット数がゼロになると、前述と同様に、揺らぎ吸収制御部５０４により、開始蓄積パケット数Ｂは、増加目標蓄積パケット数に増加変更される。これに伴い、最大蓄積パケット数Ｃも増加変更される。図１１の例では、開始蓄積パケット数は「４」に、最大蓄積パケット数は「６」にそれぞれ変更される。 Then, when the number of accumulated packets in the RTP reception buffer BF becomes zero, the fluctuation accumulation control unit 504 increases and changes the starting accumulated packet number B to the increase target accumulated packet number as described above. Accordingly, the maximum accumulated packet number C is also changed. In the example of FIG. 11, the start accumulation packet number is changed to “4”, and the maximum accumulation packet number is changed to “6”.

この図１１の例では、このように開始蓄積パケット数の増加変更を行った後においても、揺らぎ量が大きいため、さらにパケットの到着が遅れる。このため、ＲＴＰ受信バッファＢＦからパケットは読み出されず、バッファ出力データ処理制御部５０５で、音声信号がその直前の音声データから音声波形周期単位で合成され、それが出力音声信号とされる。このとき、後述するように、出力音声データは、音声信号がその直前の音声データから合成されたものとされる。 In the example of FIG. 11, the arrival of packets is further delayed because the amount of fluctuation is large even after the increase in the number of starting accumulated packets is changed. For this reason, the packet is not read from the RTP reception buffer BF, and the buffer output data processing control unit 505 synthesizes the audio signal from the immediately preceding audio data in units of the audio waveform period, and sets it as the output audio signal. At this time, as will be described later, the output audio data is obtained by synthesizing the audio signal from the immediately preceding audio data.

そして、到着が遅れていた複数個のパケットと共に、図１１に示すように、最大蓄積パケット数を越える新たなパケットがまとめて到着する。すると、揺らぎ吸収制御部５０４は、予め定められている溢れバッファ廃棄数のパケット、この例では２パケットと、さらに１個のパケットを、ＲＴＰ受信バッファＢＦの先頭から、順次、バッファ出力データ処理制御部５０５に送出して、バッファ出力データ処理制御部５０５に廃棄処理させるように指示する。 Then, new packets exceeding the maximum number of accumulated packets arrive together, as shown in FIG. 11, together with a plurality of packets whose arrival has been delayed. Then, the fluctuation absorption control unit 504 sequentially controls the buffer output data processing for packets of a predetermined number of overflow buffer discards, two packets in this example, and one more packet from the top of the RTP reception buffer BF. And instruct the buffer output data processing control unit 505 to discard the data.

なお、このときに、図示のように、ＲＴＰ受信バッファＢＦからはパケット単位で読み出されるが、バッファ出力データ処理制御部５０５においては、廃棄処理のために送られてきたパケットデータの全てを廃棄するのではなく、後述するように、音声波形周期単位で廃棄して、音声波形の連続性を保持するようにする。 At this time, as shown in the figure, the packet is read from the RTP reception buffer BF in units of packets, but the buffer output data processing control unit 505 discards all the packet data sent for the discarding process. Instead, as described later, it is discarded in units of speech waveform cycles so as to maintain the continuity of the speech waveform.

ＲＴＰ受信バッファＢＦからの前記パケットの廃棄処理のための送出の結果、ＲＴＰ受信バッファＢＦの蓄積パケット数は５になるので、最大蓄積パケット数「６」以内に収まる。そして、受信割り込み周期のタイミングで、ＲＴＰ受信バッファＢＦに蓄積されているパケットは、先頭から読み出されて再生される。 As a result of transmission for discarding the packet from the RTP reception buffer BF, the number of accumulated packets in the RTP reception buffer BF becomes 5, so that it falls within the maximum number of accumulated packets “6”. Then, at the timing of the reception interrupt cycle, the packet stored in the RTP reception buffer BF is read from the head and reproduced.

図１１の例では、以上の処理により、ＲＴＰ受信バッファＢＦ内の蓄積パケット数は、開始蓄積パケット数になるので、その後は、揺らぎがなければ、先頭からのパケットの読み出しと、到着パケットの蓄積が行われる。 In the example of FIG. 11, the accumulated packet count in the RTP reception buffer BF becomes the start accumulated packet count by the above processing. After that, if there is no fluctuation, reading of the packet from the head and accumulation of the arrival packet are performed. Is done.

次に、揺らぎが停止してゆくときのＲＴＰ受信バッファのバッファサイズの変更制御（減少制御）について、図１２を参照して説明する。図１２の例は、開始蓄積パケット数Ｂが５に増加していて、このため、最大蓄積パケット数Ｃが７である場合において、揺らぎが収束して停止してゆく場合である。 Next, buffer size change control (decrease control) of the RTP reception buffer when fluctuations stop will be described with reference to FIG. The example of FIG. 12 is a case where the start accumulation packet number B has increased to 5, and therefore, when the maximum accumulation packet number C is 7, the fluctuation converges and stops.

すなわち、図１２に示すように、揺らぎが発生している間は、開始蓄積パケット数Ｂは「５」とされて、揺らぎ吸収処理が行われるが、揺らぎが停止して収束してゆくと、揺らぎ吸収制御部５０４は、前述した収束期間カウント値ＣＮＴが、収束期間定数ＣＮＴ−ｔｈよりも大きくなって、揺らぎが安定したと判定する。このときには、前述したように、揺らぎ吸収制御部５０４は、開始蓄積パケット数Ｂを減少目的蓄積量に減少変更し、減少させたパケット数分をＲＴＰ受信バッファＢＦの先頭から廃棄のためにバッファ出力データ処理制御部５０５に送出して、バッファ出力データ処理制御部５０５に廃棄処理させるように指示する。 That is, as shown in FIG. 12, while the fluctuation is occurring, the start accumulation packet number B is set to “5” and the fluctuation absorbing process is performed, but when the fluctuation stops and converges, The fluctuation absorption control unit 504 determines that the fluctuation is stable because the convergence period count value CNT described above becomes larger than the convergence period constant CNT-th. At this time, as described above, the fluctuation absorption control unit 504 changes the start accumulation packet number B to the reduction target accumulation amount, and outputs the reduced packet number from the head of the RTP reception buffer BF for discarding the buffer. The data is sent to the data processing control unit 505 and instructs the buffer output data processing control unit 505 to discard the data.

さらに、揺らぎが収束してゆくと、揺らぎ吸収制御部５０４は、収束期間カウント値ＣＮＴが、再び、収束期間定数ＣＮＴ−ｔｈよりも大きくなって、揺らぎが安定したと判定する。そして、揺らぎ吸収制御部５０４は、開始蓄積パケット数Ｂを減少目的蓄積量に減少変更し、減少させたパケット数分をＲＴＰ受信バッファＢＦの先頭から廃棄のためにバッファ出力データ処理制御部５０５に送出して、バッファ出力データ処理制御部５０５に廃棄処理させるように指示する。 Further, when the fluctuation converges, the fluctuation absorption control unit 504 determines that the convergence period count value CNT becomes larger than the convergence period constant CNT-th again and the fluctuation is stable. Then, the fluctuation absorption control unit 504 changes the start accumulation packet number B to the reduction target accumulation amount and changes the reduced packet number from the head of the RTP reception buffer BF to the buffer output data processing control unit 505 for discarding. Send out and instruct the buffer output data processing control unit 505 to discard the data.

なお、このときにも、図１１の場合と同様に、ＲＴＰ受信バッファＢＦからは図示のようにパケット単位で読み出されるが、バッファ出力データ処理制御部５０５においては、廃棄処理のために送られてきたパケットデータの全てを廃棄するのではなく、後述するように、音声波形周期単位で廃棄して、音声波形の連続性を保持するようにする。 At this time, as in the case of FIG. 11, the packet is read from the RTP reception buffer BF in units of packets as shown in the figure, but the buffer output data processing control unit 505 sends it for discard processing. Instead of discarding all of the packet data, as will be described later, it is discarded in units of speech waveform periods to maintain the continuity of the speech waveform.

以上の処理を繰り返すことにより、揺らぎがなくなる方向に収束するにつれて、開始蓄積パケット数Ｂは、初期値にまで減少させられる。 By repeating the above processing, the start accumulation packet number B is reduced to the initial value as it converges in the direction in which the fluctuation is eliminated.

以上のようにして、この実施形態によれば、揺らぎの発生に合わせて、ＲＴＰ受信バッファの開始蓄積パケット数を増加し、揺らぎが停止して減少収束するにつれて、開始蓄積パケット数を減少させるようにするので、ＲＴＰ受信バッファサイズは揺らぎ量に合わせて動的に制御され、音切れが軽減されると共に、音の遅延も最小限に抑えられるものである。 As described above, according to this embodiment, the number of start accumulation packets in the RTP reception buffer is increased in accordance with the occurrence of fluctuations, and the number of start accumulation packets is reduced as fluctuations stop and converge. Therefore, the RTP reception buffer size is dynamically controlled in accordance with the fluctuation amount, so that sound interruption is reduced and sound delay is minimized.

なお、上述の実施形態においては、ＲＴＰ受信バッファが空になったら、開始蓄積パケット数を増加目標蓄積量に変更するようにしたが、空ではなく、ＲＴＰ受信バッファの蓄積パケット数が所定数以下、例えば１個以下になったときに、開始蓄積パケット数を増加目標蓄積量に変更するようにしてもよい。 In the above-described embodiment, when the RTP reception buffer becomes empty, the start accumulation packet number is changed to the increase target accumulation amount. However, it is not empty, and the accumulation packet number of the RTP reception buffer is equal to or less than a predetermined number. For example, when the number becomes one or less, the number of start accumulation packets may be changed to an increase target accumulation amount.

［バッファ出力データ処理制御部５０５での処理］ [Processing in buffer output data processing control unit 505]
図１３は、バッファ出力データ処理制御部５０５での処理をさらに説明するための機能ブロック図である。 FIG. 13 is a functional block diagram for further explaining the processing in the buffer output data processing control unit 505.

すなわち、バッファ出力データ処理制御部５０５では、この例では、音声パケットサイズに合わせた周期で揺らぎ吸収バッファ５０２に対して受信パケットの取得を要求し、処理判断部５０５４で、受信パケットが得られたかどうかを判断する。 That is, in this example, the buffer output data processing control unit 505 requests the fluctuation absorbing buffer 502 to acquire a received packet at a cycle according to the voice packet size, and the processing determination unit 5054 obtains the received packet. Judge whether.

そして、処理判断部５０５４は、受信パケットの受信に成功したと判断したときには、当該受信パケットのデータをデータデコード処理部５０６に出力すると共に、履歴バッファＨＴＢＦに追加するようにする。 When it is determined that the received packet has been successfully received, the process determining unit 5054 outputs the received packet data to the data decode processing unit 506 and adds it to the history buffer HTBF.

履歴バッファＨＴＢＦは、所定時間分（複数パケット分以上）の音声データを保持することができるメモリで構成されており、常に、再生出力された最新の所定時間分の音声データを保持するようにする。すなわち、履歴バッファＨＴＢＦは再生した音声を前記所定時間分保持するもので、新たに履歴バッファＨＴＢＦに音声データが追加されるときには、当該履歴バッファＨＴＢＦに蓄積されている音声データの内の最も旧いデータが廃棄される。 The history buffer HTBF is composed of a memory that can hold audio data for a predetermined time (a plurality of packets or more), and always holds the audio data for the latest predetermined time that has been reproduced and output. . In other words, the history buffer HTBF holds the reproduced audio for the predetermined time, and when new audio data is added to the history buffer HTBF, the oldest data among the audio data stored in the history buffer HTBF. Is discarded.
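
A buffer that always holds the most recent fixed amount of audio and drops the oldest samples on insertion behaves like a bounded queue. The sketch below illustrates that behavior with Python's `collections.deque` and its `maxlen` argument; the class name and the capacity used in the example are assumptions, and the embodiment's actual memory layout is not specified here.

```python
from collections import deque


class HistoryBuffer:
    """Keeps only the latest `capacity_samples` audio samples (hypothetical class)."""

    def __init__(self, capacity_samples):
        # A deque with maxlen silently drops the oldest items when it is full.
        self._samples = deque(maxlen=capacity_samples)

    def append(self, samples):
        """Add newly reproduced samples; the oldest samples fall out automatically."""
        self._samples.extend(samples)

    def snapshot(self):
        """Return the retained samples, oldest first."""
        return list(self._samples)


hb = HistoryBuffer(5)
hb.append([1, 2, 3])
hb.append([4, 5, 6, 7])
print(hb.snapshot())  # [3, 4, 5, 6, 7]
```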

音声波形周期演算部５０５１は、この履歴バッファＨＴＢＦに蓄積された音声データを用いて、最新の音声波形周期を算出する。この音声波形周期の算出処理方法については、後で詳述する。 The voice waveform cycle calculation unit 5051 calculates the latest voice waveform cycle using the voice data accumulated in the history buffer HTBF. A method for calculating the speech waveform period will be described in detail later.

また、処理判断部５０５４は、受信パケットを受信できなかったと判断したときには、音声データ合成処理部５０５２を制御して、出力した直近の音声データに基づいて音声信号を合成させる。 Also, when the process determining unit 5054 determines that the received packet has not been received, the process determining unit 5054 controls the audio data synthesizing unit 5052 to synthesize an audio signal based on the most recently output audio data.

音声データ合成処理部５０５２は、処理判断部５０５４からの制御指示により、ピッチバッファＰｉＢＦに記憶されている再生出力された最新の所定時間分の音声データに基づいて、その音声波形周期単位の合成音声信号を生成する。そして、音声データ合成処理部５０５２は、生成した合成音声信号をデータデコード処理部５０６に出力すると共に、履歴バッファＨＴＢＦに追加するようにする。 In response to a control instruction from the processing determination unit 5054, the voice data synthesis processing unit 5052 generates a synthesized voice signal in units of the speech waveform period, based on the most recent predetermined time of reproduced and output voice data stored in the pitch buffer PiBF. The voice data synthesis processing unit 5052 then outputs the generated synthesized voice signal to the data decoding processing unit 506 and adds it to the history buffer HTBF.

ピッチバッファＰｉＢＦには、合成音声信号の生成が必要となったときに、処理判断部５０５４からの制御指示により、履歴バッファＨＴＢＦに格納されている所定時間分の音声データがコピーされて書き込まれる。音声データ合成処理部５０５２は、履歴バッファＨＴＢＦの蓄積データを直接アクセスして合成音声信号の生成をすることもできるが、この例では、合成音声信号の生成処理作業を容易化するため、履歴バッファＨＴＢＦの内容を、ピッチバッファＰｉＢＦにコピーして、当該ピッチバッファＰｉＢＦを音声データ合成処理部５０５２がアクセスするようにしている。 When a synthesized voice signal needs to be generated, the predetermined time of voice data stored in the history buffer HTBF is copied into the pitch buffer PiBF in accordance with a control instruction from the processing determination unit 5054. The voice data synthesis processing unit 5052 could generate the synthesized voice signal by accessing the data in the history buffer HTBF directly. In this example, however, to simplify the generation of the synthesized voice signal, the contents of the history buffer HTBF are copied to the pitch buffer PiBF, and the voice data synthesis processing unit 5052 accesses the pitch buffer PiBF.

＜音声波形周期の算出処理＞ <Speech waveform period calculation processing>
図１４〜図１５は、音声波形周期の算出処理を説明するための図である。図１４（Ａ）は、履歴バッファＨＴＢＦの記憶データを説明するための図である。この例では、音声データはＰＣＭ信号であって、サンプリング周波数は例えば８ｋＨｚとされ、１音声パケットは８０サンプルとされている。 FIGS. 14 and 15 are diagrams for explaining the calculation of the speech waveform period. FIG. 14A is a diagram for explaining the data stored in the history buffer HTBF. In this example, the voice data is a PCM signal, the sampling frequency is, for example, 8 kHz, and one voice packet is 80 samples.

図１４（Ａ）に示すように、この例の履歴バッファＨＴＢＦは、音声データの３９０サンプル分を記憶可能な容量を有するものとされている。図１４（Ａ）において、０〜３９０の数値は、サンプルアドレスを示し、数値が小さいアドレスほど、旧い音声データが記憶されているものとする。 As shown in FIG. 14A, the history buffer HTBF in this example has a capacity capable of storing 390 samples of audio data. In FIG. 14A, numerical values from 0 to 390 indicate sample addresses, and it is assumed that older audio data is stored for addresses having a smaller numerical value.

音声波形周期は、この例では、履歴バッファＨＴＢＦに記憶データの最新の過去の２０ミリ秒から求めるようにする。そのため、この例では、履歴バッファＨＴＢＦの最後（最新）の２０ミリ秒の波形ＳＡ（アドレス２３０〜３９０までの１６０サンプル）と、最新時点から３５ミリ秒までの過去の波形中の２０ミリ秒分の波形ＳＢ（アドレス１１０〜３９０までの２８０サンプルのうちの１６０サンプル）とを比較し、最も近似した箇所を検出するようにする。当該近似した箇所の検出方法は、２つの波形ＳＡとＳＢとの自己相関関数を計算し、その計算結果の値が大きいものほど近似しているという方法を用いるものである。 In this example, the voice waveform period is obtained from the latest past 20 milliseconds stored in the history buffer HTBF. Therefore, in this example, the last (latest) 20 ms waveform SA (160 samples from addresses 230 to 390) of the history buffer HTBF and 20 ms in the past waveform from the latest time to 35 ms. The waveform SB (160 samples out of 280 samples from addresses 110 to 390) is compared to detect the most approximate location. The approximate location detection method uses a method in which the autocorrelation function between the two waveforms SA and SB is calculated, and the larger the calculation result value is, the closer the approximation is.

すなわち、アドレス１１０〜３９０までの２８０サンプル中から、順次に１６０サンプル分を抽出したものを波形ＳＢとして、この波形ＳＢと波形ＳＡとを比較する。このとき、波形ＳＢの先頭アドレスのアドレス１１０に対する差をオフセット量と呼び、このオフセット量により、音声波形周期を算出するものである。 That is, the waveform SB is obtained by sequentially extracting 160 samples from 280 samples at addresses 110 to 390, and the waveform SB is compared with the waveform SA. At this time, the difference between the head address of the waveform SB and the address 110 is called an offset amount, and the speech waveform period is calculated based on the offset amount.

つまり、図１４（Ｂ）、（Ｃ）、（Ｄ）に示すように、この例では、オフセット０から１づつ順次にオフセットを増加しながら、オフセット８０まで、波形ＳＢを更新し、波形ＳＡと比較する（自己相関関数を演算する）。そして、自己相関関数の演算結果がピーク値を示すオフセット量から音声波形周期を算出する。ここで、オフセット８０の波形ＳＢまでしか繰り返さない理由は、求める音声波形周期の範囲が４０〜１２０サンプル分であるためである。 That is, as shown in FIGS. 14B, 14C, and 14D, in this example the offset is increased one at a time from offset 0 up to offset 80, and at each offset the waveform SB is updated and compared with the waveform SA (the autocorrelation function is computed). The speech waveform period is then calculated from the offset at which the autocorrelation result peaks. The search stops at the waveform SB of offset 80 because the speech waveform period being sought lies in the range of 40 to 120 samples.

図１５は、音声波形周期の算出方法を説明するための図である。図１５（Ａ）は、オフセット０のときの相関関数値が最大ピークとなった場合であり、図１５（Ｂ）は、オフセットが８０のときの相関関数値が最大ピークになった場合である。 FIG. 15 is a diagram for explaining a method of calculating a speech waveform period. FIG. 15A shows a case where the correlation function value has a maximum peak when the offset is 0, and FIG. 15B shows a case where the correlation function value has a maximum peak when the offset is 80. .

図１５（Ａ）の場合には、履歴バッファＨＴＢＦ（図１５（Ａ−１）参照）のアドレス２３０〜３９０の波形ＳＡと、オフセット０の波形ＳＢ（アドレス１１０〜２７０の波形；図１５（Ａ−２）参照）とでは、両波形ＳＡ，ＳＢにおいて斜線を付した部分と網点を付した部分も近似した波形となっていることが分かる。 In the case of FIG. 15A, comparing the waveform SA at addresses 230 to 390 of the history buffer HTBF (see FIG. 15A-1) with the waveform SB of offset 0 (the waveform at addresses 110 to 270; see FIG. 15A-2) shows that the hatched portions and the half-tone dotted portions of the two waveforms SA and SB also approximate each other.

したがって、アドレス２３０〜３９０の波形ＳＡにおいては、図１５（Ａ−３）で斜線を付して示すように、アドレス２３０〜２７０の部分と、アドレス３５０〜３９０の部分も近似している。このことから、図１５（Ａ）の例の音声波形周期は、図１５（Ａ−４）に示すように、アドレス２７０〜３９０までの１５ミリ秒であることが分かる。 Therefore, in the waveform SA of the addresses 230 to 390, as shown by hatching in FIG. 15A-3, the address 230 to 270 portion and the address 350 to 390 portion are also approximated. From this, it can be seen that the speech waveform period in the example of FIG. 15A is 15 milliseconds from the address 270 to 390 as shown in FIG. 15A-4.

また、図１５（Ｂ）の場合には、履歴バッファＨＴＢＦ（図１５（Ｂ−１）参照）のアドレス２３０〜３９０の１６０サンプル分の波形ＳＡと、オフセット８０の波形ＳＢ（アドレス１９０〜２７０の１６０サンプルの波形；図１５（Ｂ−２）参照）とでは、４０サンプル分ごとの部分においても近似した波形となっていることが分かる。 In the case of FIG. 15B, comparing the 160-sample waveform SA at addresses 230 to 390 of the history buffer HTBF (see FIG. 15B-1) with the waveform SB of offset 80 (the 160-sample waveform at addresses 190 to 270; see FIG. 15B-2) shows that the waveforms also approximate each other in every 40-sample portion.

したがって、アドレス２３０〜３９０の波形ＳＡにおいては、図１５（Ｂ−３）で網点を付して示すように、アドレス２３０〜２７０の部分、アドレス２７０〜３１０の部分と、アドレス３１０〜３５０の部分およびアドレス３５０〜３９０の部分の４つは互いに近似していることになる。このことから、図１５（Ｂ）の例の音声波形周期は、図１５（Ｂ−４）に示すように、アドレス３５０〜３９０までの５ミリ秒であることが分かる。 Therefore, in the waveform SA of the addresses 230 to 390, as shown with halftone dots in FIG. 15B-3, the address 230 to 270, the address 270 to 310, and the addresses 310 to 350 are shown. The part and the four parts of the addresses 350 to 390 are approximate to each other. From this, it can be seen that the speech waveform period in the example of FIG. 15B is 5 milliseconds from addresses 350 to 390, as shown in FIG. 15B-4.

以上のことから、この実施形態では、音声波形周期の値は、（１２０−オフセット）サンプル分として求めることができる。 From the above, in this embodiment, the value of the speech waveform period can be obtained as (120−offset) samples.
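The expression (120 − offset) follows directly from the window addresses given in this example: the period is the distance in samples between the start of the waveform SA and the start of the best-matching waveform SB. A short derivation, where the symbols a_SA, a_SB, k and T are introduced here only for illustration:

```latex
\begin{aligned}
a_{SA} &= 230, \qquad a_{SB}(k) = 110 + k, \qquad 0 \le k \le 80,\\
T(k) &= a_{SA} - a_{SB}(k) = 120 - k \quad \text{[samples]},\\
T(0) &= 120 \ \text{samples} = 15\ \text{ms}, \qquad
T(80) = 40 \ \text{samples} = 5\ \text{ms} \quad (\text{at } 8\ \text{kHz}).
\end{aligned}
```

The two end values agree with the cases of FIG. 15A and FIG. 15B and with the stated search range of 40 to 120 samples.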

図１６に、この例の場合における音声波形周期の算出処理の一例のフローチャートを示す。 FIG. 16 shows a flowchart of an example of the calculation process of the speech waveform period in the case of this example.

先ず、オフセット０での音声信号エネルギーを算出する（ステップＳ６１）。次に、オフセット０の波形ＳＢと波形ＳＡとの自己相関関数の値を算出する（ステップＳ６２）。次に、ステップＳ６１で求めたエネルギー値を元に正規化係数を算出する（ステップＳ６３）。そして、ステップＳ６３で求めた正規化係数により、ステップＳ６２で求めたオフセット０のときの自己相関関数の値を、正規化する（ステップＳ６４）。そして、そのときに得られた正規化された自己相関関数の値をピーク値として保持する（ステップＳ６５）。 First, the audio signal energy at offset 0 is calculated (step S61). Next, the value of the autocorrelation function between the waveform SB with the offset 0 and the waveform SA is calculated (step S62). Next, a normalization coefficient is calculated based on the energy value obtained in step S61 (step S63). Then, the value of the autocorrelation function at the offset 0 obtained in step S62 is normalized by the normalization coefficient obtained in step S63 (step S64). Then, the value of the normalized autocorrelation function obtained at that time is held as a peak value (step S65).

次に、オフセットを２インクリメントして、上記ステップＳ６１〜６４を行なって、正規化された自己相関関数の値を求め、それまでのピーク値と比較して、それまでのピーク値よりも新たに求めた自己相関関数が大きい時には、当該新たに求めた自己相関関数の値をピーク値としてそのときのオフセットと共に保持する。この処理を、オフセット８０まで繰り返す（ステップＳ６６）。 Next, the offset is incremented by 2 and steps S61 to S64 are performed to obtain the normalized autocorrelation value, which is compared with the peak value held so far. When the newly obtained autocorrelation value is larger than the peak value held so far, it is held as the new peak value together with the offset at that time. This process is repeated up to offset 80 (step S66).

ステップＳ６６の処理が終了したら、最後まで保持されていたピーク値のときのオフセットの前後±１のオフセットの範囲で、上記ステップＳ６１〜６４における自己相関関数の値を求める演算を行ない、値が最大となるオフセットを求める（ステップＳ６７）。そして、ステップＳ６７で求めたオフセットを、１２０から減算して、音声波形周期分のサンプル数を求める（ステップＳ６８）。以上で、音声波形周期の演算処理を終了する。 When step S66 is complete, the autocorrelation calculation of steps S61 to S64 is performed over the range of ±1 around the offset of the peak value that was held at the end, and the offset giving the maximum value is obtained (step S67). The offset obtained in step S67 is then subtracted from 120 to obtain the number of samples in one speech waveform period (step S68). This completes the speech waveform period calculation.
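The coarse-then-fine search of steps S61 to S68 can be sketched as follows. This is an illustration under stated assumptions, not the patent's code: the function names `normalized_corr` and `estimate_period` are hypothetical, and since the text does not give the normalization formula, the sketch assumes division by the square root of the product of the two window energies.

```python
import math


def normalized_corr(sa, sb):
    """Correlation of two equal-length windows, scaled by their energies.

    Assumed normalization: dot(sa, sb) / sqrt(energy(sa) * energy(sb)).
    """
    energy = sum(a * a for a in sa) * sum(b * b for b in sb)
    if energy == 0:
        return 0.0
    return sum(a * b for a, b in zip(sa, sb)) / math.sqrt(energy)


def estimate_period(history, sa_start=230, sb_base=110, window=160,
                    max_offset=80, coarse_step=2):
    """Return the period in samples, i.e. (sa_start - sb_base) - best offset."""
    sa = history[sa_start:sa_start + window]

    def score(offset):
        start = sb_base + offset
        return normalized_corr(sa, history[start:start + window])

    # Coarse pass (steps S61-S66): offsets 0, 2, 4, ..., max_offset
    best = max(range(0, max_offset + 1, coarse_step), key=score)
    # Fine pass (step S67): re-check the neighbours at +/-1
    candidates = [o for o in (best - 1, best, best + 1) if 0 <= o <= max_offset]
    best = max(candidates, key=score)
    # Step S68: with the example addresses this is 120 - offset
    return (sa_start - sb_base) - best


# A 390-sample sawtooth that repeats every 71 samples
history = [i % 71 for i in range(390)]
print(estimate_period(history))
```

Stepping by 2 roughly halves the number of correlations, and the ±1 refinement recovers odd offsets such as the one in this usage example.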

この音声波形周期の算出結果は、音声データ合成処理部５０５２および音声データ廃棄処理部５０５３に通知されて、音声データの合成処理および音声データの廃棄処理が、当該音声波形周期単位で実行されるようにされる。 The calculated speech waveform period is notified to the voice data synthesis processing unit 5052 and the voice data discard processing unit 5053, so that voice data synthesis and voice data discarding are executed in units of that speech waveform period.

＜正常再生音声データ出力処理および合成音声データ処理＞ <Normal playback voice data output processing and synthesized voice data processing>
次に、バッファ出力データ処理制御部５０５が、揺らぎ吸収バッファ５０２（ＲＴＰ受信バッファ）からのパケットを受信したときは、正常処理を行ない、受信できなかったときには、合成音声信号を生成する処理について説明する。この処理は、図８および図９に示した揺らぎ吸収バッファ５０２のバッファサイズ変更制御処理に併せて行なわれるのは、前述した通りである。 Next, the processing in which the buffer output data processing control unit 505 performs normal processing when it receives a packet from the fluctuation absorbing buffer 502 (RTP reception buffer), and generates a synthesized voice signal when it cannot, will be described. As described above, this processing is performed together with the buffer size change control processing of the fluctuation absorbing buffer 502 shown in FIGS. 8 and 9.

図１７は、バッファ出力データ処理制御部５０５における処理のメインルーチンを示すフローチャートである。この図１７のバッファ出力処理制御部５０５における処理は、設定された割り込み周期で起動される。この例では、この割り込み周期は、音声パケットサイズに合わせたものとされている。なお、割り込み周期は、音声パケットサイズと一致させなくても勿論よい。 FIG. 17 is a flowchart showing a main routine of processing in the buffer output data processing control unit 505. The processing in the buffer output processing control unit 505 in FIG. 17 is started at the set interrupt cycle. In this example, this interrupt cycle is set to match the voice packet size. Of course, the interrupt period does not have to match the voice packet size.

まず、ＲＴＰ受信バッファを検索する（ステップＳ７１）。そして、バッファ参照フラグが「ＴＲＵＥ」であるか「ＦＡＬＳＥ」であるか判別する（ステップＳ７２）。バッファ参照フラグが「ＴＲＵＥ」であった場合には、バッファ出力データ処理制御部５０５では後述する正常フレーム処理を行なう（ステップＳ７３）。また、バッファ参照フラグが「ＦＡＬＳＥ」であった場合には、バッファ出力データ処理制御部５０５では、再生すべき音声データがないと判断して、後述する異常フレーム処理を行なう（ステップＳ７４）。なお、ここで、フレームは、パケットと同義の意味として用いている。 First, the RTP reception buffer is searched (step S71). Then, it is determined whether the buffer reference flag is “TRUE” or “FALSE” (step S72). If the buffer reference flag is “TRUE”, the buffer output data processing control unit 505 performs normal frame processing described later (step S73). If the buffer reference flag is “FALSE”, the buffer output data processing control unit 505 determines that there is no audio data to be reproduced, and performs an abnormal frame process to be described later (step S74). Here, the frame is used as the same meaning as the packet.

そして、ステップＳ７３およびステップＳ７４の処理の後、これらステップＳ７３、ステップＳ７４での処理結果の音声データをデータデコード処理部５０６（コーデック部に相当）に渡す。そして、この例では、割り込み周期分である１パケット分の時間を待ち（ステップＳ７６）、ステップＳ７１に戻って、以上の処理を繰り返す。 Then, after the processing in step S73 and step S74, the audio data resulting from the processing in steps S73 and S74 is transferred to the data decoding processing unit 506 (corresponding to the codec unit). In this example, a time of one packet that is an interrupt cycle is waited (step S76), the process returns to step S71, and the above processing is repeated.
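One pass of this main routine can be sketched as a small dispatch function. All names here (`output_tick`, `normal_frame`, `abnormal_frame`) are hypothetical, the fluctuation absorbing buffer is modelled as a plain list of packets, and a non-empty list stands in for the buffer reference flag being TRUE.

```python
def output_tick(jitter_buffer, normal_frame, abnormal_frame):
    """Run one interrupt-cycle pass corresponding to steps S71 to S74.

    jitter_buffer  : list of pending packets, oldest first (assumed model)
    normal_frame   : callable taking the packet that was read (step S73)
    abnormal_frame : callable taking no arguments (step S74)
    Returns whatever the chosen handler returns, i.e. the data for the decoder.
    """
    if jitter_buffer:                      # buffer reference flag TRUE
        return normal_frame(jitter_buffer.pop(0))
    return abnormal_frame()                # buffer reference flag FALSE


pending = [[1, 2, 3]]
print(output_tick(pending, lambda p: ("normal", p), lambda: ("abnormal", None)))
print(output_tick(pending, lambda p: ("normal", p), lambda: ("abnormal", None)))
```

In a real terminal this function would be called once per interrupt cycle, which in this example is one packet time.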

＜ステップＳ７３の正常フレーム処理＞ <Normal frame processing in step S73>
ステップＳ７３の正常フレーム処理について、図１８および図１９のフローチャートを参照して説明する。 The normal frame processing in step S73 will be described with reference to the flowcharts of FIGS. 18 and 19.

先ず、揺らぎ吸収バッファ５０２からの音声パケットデータを受信する（ステップＳ８１）。次に、直前はデータロスのため、合成音声信号が出力されていたか否か判別する（ステップＳ８２）。直前は、データロスではなかったときには、受信した音声パケットデータを、履歴バッファＨＴＢＦに追加すると共にデータデコード処理部５０６に送る（ステップＳ９０）。また、このとき、揺らぎ吸収制御部５０４は、管理上、揺らぎ吸収バッファ５０２の蓄積パケット数をデクリメントすると共に、参照インデックスをインクリメントする。バッファ出力データ処理制御部５０５からのパケット取得要求があったときには、参照インデックスから音声パケットを読み出すものである。 First, voice packet data is received from the fluctuation absorbing buffer 502 (step S81). Next, it is determined whether a synthesized voice signal was output immediately before because of data loss (step S82). If there was no data loss immediately before, the received voice packet data is added to the history buffer HTBF and sent to the data decoding processing unit 506 (step S90). At this time, for management purposes, the fluctuation absorption control unit 504 decrements the count of packets accumulated in the fluctuation absorbing buffer 502 and increments the reference index. When the buffer output data processing control unit 505 issues a packet acquisition request, the voice packet is read from the position given by the reference index.

なお、音声データは、オーディオポートに送られる前に、例えば３０サンプル分遅延される。この遅延により、次フレーム（次パケット）が合成音声信号となった場合に、この遅延区間を波形合成することにより、波形が滑らかに遷移するようになる。 The audio data is delayed by, for example, 30 samples before being sent to the audio port. When the next frame (next packet) becomes a synthesized speech signal due to this delay, the waveform transitions smoothly by synthesizing the delay section.

ステップＳ８２で、直前はデータロスであったため、合成音声信号が出力されていたと判別したときには、受信パケットのシーケンス番号が、合成音声信号が出力される前に受信されて出力された受信パケットのシーケンス番号に連続するシーケンス番号であるか、不連続のシーケンス番号であるかを判別する（ステップＳ８３）。 In step S82, when it is determined that a synthesized voice signal has been output because there was a data loss immediately before, the sequence number of the received packet is received and output before the synthesized voice signal is output. It is determined whether the sequence number is a sequence number consecutive to the number or a discontinuous sequence number (step S83).

ステップＳ８３で、連続するシーケンス番号であると判別したときには、合成音声信号が、前の受信パケットと波形的に連続するように、前の受信パケットから求められた音声周期に基づいて生成されたパケット単位データであることから、受信パケットは、そのまま波形的に連続するものと考えられるが、より波形を滑らかに遷移させるようにするため、この例では、先に出力された合成音声信号と受信パケットの音声データとは波形合成するようにする。 When step S83 determines that the sequence numbers are consecutive, the received packet can be expected to continue the waveform as it is, because the synthesized voice signal is packet-unit data generated from the voice period obtained from the previous received packet so as to be continuous in waveform with that packet. In this example, however, the previously output synthesized voice signal and the voice data of the received packet are waveform-synthesized so that the waveform transitions more smoothly.

すなわち、先ず、先に出力された合成音声信号と受信パケットとについて波形合成範囲を算出する（ステップＳ８８）。次に、先に出力された合成音声信号についての波形合成範囲を波形合成用バッファＯＬＢＦにコピーする（ステップＳ８９）。そして、波形合成範囲において、先に出力された合成音声信号と受信パケットの音声データとを波形合成する（ステップＳ９０）。そして、波形合成範囲において波形合成処理した１パケット分の受信データを履歴バッファＨＴＢＦに追加すると共に、データデコード処理部５０６に送る（ステップＳ９１）。 That is, first, a waveform synthesis range is calculated for the synthesized speech signal and the received packet that have been output previously (step S88). Next, the waveform synthesis range for the synthesized speech signal output previously is copied to the waveform synthesis buffer OLBF (step S89). Then, in the waveform synthesis range, the synthesized voice signal output earlier and the voice data of the received packet are synthesized (step S90). Then, the received data for one packet subjected to the waveform synthesis processing in the waveform synthesis range is added to the history buffer HTBF and sent to the data decoding processing unit 506 (step S91).

また、ステップＳ８３で、不連続のシーケンス番号であると判別したときには、音声波形周期演算部５０５１で算出された音声波形周期を取得する（ステップＳ８４）。そして、音声波形周期は、パケットサイズである８０サンプルに対してどのような値であるか判別する（ステップＳ８５）。 If it is determined in step S83 that the sequence number is discontinuous, the voice waveform cycle calculated by the voice waveform cycle calculator 5051 is acquired (step S84). Then, it is determined what value the voice waveform cycle is for 80 samples which is the packet size (step S85).

ステップＳ８５で、音声波形周期が丁度８０サンプル分で、パケットサイズと等しいと判別されたときには、合成音声信号が音声周期単位で生成されていることから、そのまま波形的に連続するものと考えられるが、より波形を滑らかに遷移させるようにするため、この例では、先に出力された合成音声信号と受信パケットの音声データとは波形合成するようにする。 If it is determined in step S85 that the speech waveform period is exactly 80 samples and equal to the packet size, the synthesized speech signal is generated in units of speech cycles, so it is considered that the waveform continues as it is. In this example, the synthesized voice signal output earlier and the voice data of the received packet are subjected to waveform synthesis in order to make the waveform transition more smoothly.

すなわち、ステップＳ８５からステップＳ８８に飛び、前述したステップＳ８８〜ステップＳ９１の処理を行なうようにする。 That is, the process jumps from step S85 to step S88, and the processes of steps S88 to S91 described above are performed.

また、ステップＳ８５で、音声波形周期が８０サンプル分よりも短い周期であると判別したときには、図２０に示すような処理を行なう。 If it is determined in step S85 that the speech waveform cycle is shorter than 80 samples, the processing shown in FIG. 20 is performed.

すなわち、図２０（Ａ）は、履歴バッファＨＴＢＦに格納されている過去の音声パケットデータを示すものであり、この例では、最新のパケットデータは合成音声信号となっている。この実施形態では、音声波形周期単位で合成音声信号を生成して、パケットのつなぎ目で再生音声波形が滑らかに遷移するようにしており、合成音声信号は、図２０（Ｂ）に示すような状態で、音声波形周期単位で生成されているものである。 That is, FIG. 20A shows the past voice packet data stored in the history buffer HTBF; in this example, the latest packet data is a synthesized voice signal. In this embodiment, the synthesized voice signal is generated in units of the speech waveform period so that the reproduced voice waveform transitions smoothly at packet boundaries, and the synthesized voice signal is generated in units of the speech waveform period in the state shown in FIG. 20B.

受信パケットは、伝送系での揺らぎにより遅れて到着したものであり、図２０の例であれば、「音声２」のパケットの次の「音声３」であって、合成音声信号が算出された音声波形周期分であれば、滑らかに繋がると予想できる。 The received packet is one that arrived late because of fluctuation in the transmission system. In the example of FIG. 20, it is "voice 3", the packet following "voice 2", and it can be expected to connect smoothly if the synthesized voice signal spans exactly the calculated speech waveform period.

しかし、合成音声信号のパケットサイズよりも音声波形周期が短いため、合成音声信号のパケットは、図２０（Ｂ）に示すように、（１音声波形周期＋残余部分）からなり、当該残余部分の存在により、そのまま「音声３」の受信パケットをつなげたのでは、波形は不連続になってしまう。 However, since the speech waveform cycle is shorter than the packet size of the synthesized speech signal, the packet of the synthesized speech signal is composed of (1 speech waveform cycle + residual portion) as shown in FIG. If the received packet of “voice 3” is connected as it is, the waveform becomes discontinuous.

そこで、この例においては、図２０（Ｃ）に示すように、「音声３」の受信パケットは、その先頭の前記残余部分に相当する部分を削除した後、合成音声信号とつなげるようにすれば、図２０（Ｄ）に示すように、音声波形周期単位で連続したものとなり、音声波形は連続性を維持することができる。ただし、合成音声信号と「音声３」の受信パケットとをより滑らかにつなげるため、この例では、合成音声信号と「音声３」の受信パケットとの間で後述するような波形合成処理を行なう。 Therefore, in this example, as shown in FIG. 20C, the received packet of “voice 3” can be connected to the synthesized voice signal after the portion corresponding to the remaining portion at the head is deleted. As shown in FIG. 20D, the speech waveform is continuous in units of speech waveform, and the speech waveform can maintain continuity. However, in order to connect the synthesized voice signal and the “voice 3” received packet more smoothly, in this example, a waveform synthesis process as described later is performed between the synthesized voice signal and the “voice 3” received packet.

以上のことを考慮して、ステップＳ８５で、音声波形周期の値がパケットサイズである８０よりも短いときには、図２０（Ｃ）に示した残余部分に対応する削除部分ＥＲの長さ（削除レングス）を算出する（ステップＳ８６）。削除レングスは、（８０−音声波形周期）として算出することができる。そして、算出した削除レングス分を、受信パケットの先頭から削除する（ステップＳ８７）。 Considering the above, when the value of the speech waveform period is shorter than 80, which is the packet size, in step S85, the length (deletion length) of the deletion portion ER corresponding to the remaining portion shown in FIG. ) Is calculated (step S86). The deletion length can be calculated as (80−speech waveform period). Then, the calculated deletion length is deleted from the head of the received packet (step S87).

そして、合成音声信号と受信パケットとについて波形合成範囲を算出する（ステップＳ８８）。次に、合成音声信号についての波形合成範囲を波形合成用バッファＯＬＢＦにコピーする（ステップＳ８９）。そして、波形合成範囲において、波形合成用バッファＯＬＢＦの音声データと受信パケットの音声データとを波形合成する（ステップＳ９０）。そして、波形合成範囲において波形合成処理した１パケット分の受信データを履歴バッファＨＴＢＦに追加すると共に、データデコード処理部５０６に送る（ステップＳ９１）。 Then, a waveform synthesis range is calculated for the synthesized voice signal and the received packet (step S88). Next, the waveform synthesis range for the synthesized speech signal is copied to the waveform synthesis buffer OLBF (step S89). Then, in the waveform synthesis range, the waveform data is synthesized between the audio data of the waveform synthesis buffer OLBF and the audio data of the received packet (step S90). Then, the received data for one packet subjected to the waveform synthesis processing in the waveform synthesis range is added to the history buffer HTBF and sent to the data decoding processing unit 506 (step S91).

また、ステップＳ８５で、音声波形周期が８０サンプル分よりも長い周期であると判別したときには、図２１に示すような処理を行なう。 If it is determined in step S85 that the speech waveform cycle is longer than 80 samples, processing as shown in FIG. 21 is performed.

すなわち、図２１（Ａ）は、履歴バッファＨＴＢＦに格納されている過去の音声パケットデータを示すものであり、この例では、最新のパケットデータは合成音声信号となっている。この実施形態では、音声波形周期単位で合成音声信号を生成して、パケットのつなぎ目で再生音声波形が滑らかに遷移するようにしており、合成音声信号は、図２１（Ｂ）に示すような状態で、音声波形周期単位で生成されているものである。 That is, FIG. 21A shows past voice packet data stored in the history buffer HTBF. In this example, the latest packet data is a synthesized voice signal. In this embodiment, a synthesized speech signal is generated in units of speech waveform cycles so that the reproduced speech waveform smoothly transitions at the joint of packets, and the synthesized speech signal is in a state as shown in FIG. Thus, it is generated in units of speech waveform periods.

受信パケットは、伝送系での揺らぎにより遅れて到着したものであり、図２１の例であれば、「音声２」のパケットの次の「音声３」であって、合成音声信号が算出された音声波形周期分であれば、滑らかに繋がると予想できる。 The received packet is one that arrived late because of fluctuation in the transmission system. In the example of FIG. 21, it is "voice 3", the packet following "voice 2", and it can be expected to connect smoothly if the synthesized voice signal spans exactly the calculated speech waveform period.

しかし、合成音声信号のパケットサイズよりも音声波形周期が長いため、合成音声信号のパケットは、図２１（Ｂ）に示すように、（１音声波形周期−不足部分）からなり、当該不足部分の存在により、そのまま「音声３」の受信パケットをつなげたのでは、波形は不連続になってしまう。 However, since the speech waveform cycle is longer than the packet size of the synthesized speech signal, the synthesized speech signal packet is composed of (1 speech waveform cycle-insufficient portion) as shown in FIG. If the received packet of “voice 3” is connected as it is, the waveform becomes discontinuous.

そこで、この例においては、図２１（Ｃ）に示すように、「音声３」の受信パケットは、そのパケットの前に前記不足部分に相当する部分を追加した後、合成音声信号とつなげるようにすれば、図２１（Ｄ）に示すように、音声波形周期単位で連続したものとなり、音声波形は連続性を維持することができる。ただし、合成音声信号と「音声３」の受信パケットとをより滑らかにつなげるため、この例では、合成音声信号と「音声３」の受信パケットとの間で後述するような波形合成処理を行なう。 Therefore, in this example, as shown in FIG. 21C, the received packet of “voice 3” is connected to the synthesized voice signal after adding a portion corresponding to the insufficient portion before the packet. Then, as shown in FIG. 21 (D), it becomes continuous in units of speech waveform cycles, and the speech waveform can maintain continuity. However, in order to connect the synthesized voice signal and the “voice 3” received packet more smoothly, in this example, a waveform synthesis process as described later is performed between the synthesized voice signal and the “voice 3” received packet.

以上のことを考慮して、ステップＳ８５で、音声波形周期の値がパケットサイズである８０よりも長いときには、図２１（Ｃ）に示した不足部分に対応する追加部分の長さ（追加レングス）を算出する（図１９のステップＳ１０１）。追加レングスは、（音声波形周期−８０）として算出することができる。そして、算出した追加レングス分を、図２１（Ｃ）に示すように、「音声２」のパケットの最後の相当分として抽出してコピーすることにより、受信パケットの先頭に追加する（ステップＳ１０２）。 In consideration of the above, when the value of the speech waveform period is longer than 80, which is the packet size, in step S85, the length of the additional portion (additional length) corresponding to the shortage portion shown in FIG. Is calculated (step S101 in FIG. 19). The additional length can be calculated as (speech waveform period-80). Then, as shown in FIG. 21C, the calculated additional length is extracted and copied as the last equivalent of the “voice 2” packet, and added to the head of the received packet (step S102). .
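The two length adjustments described for step S85 (deleting from the head when the period is shorter than the packet, prepending when it is longer) can be sketched in one function. The name `align_received_packet` and its parameters are hypothetical, `previous_packet` is assumed to be the last normally received packet ("voice 2" in the figures), and the attenuation and waveform synthesis steps that follow are left out.

```python
PACKET_SIZE = 80  # samples per packet in this example


def align_received_packet(packet, period, previous_packet,
                          packet_size=PACKET_SIZE):
    """Trim or extend the head of `packet` to line up with the pitch period."""
    if period < packet_size:
        delete_len = packet_size - period            # step S86
        return list(packet[delete_len:])             # step S87
    if period > packet_size:
        add_len = period - packet_size               # step S101
        # Copy the tail of the previous packet in front (step S102)
        return list(previous_packet[-add_len:]) + list(packet)
    return list(packet)                              # period == packet size


voice2 = list(range(0, 80))
voice3 = list(range(100, 180))
print(len(align_received_packet(voice3, 60, voice2)))   # shorter period
print(len(align_received_packet(voice3, 100, voice2)))  # longer period
```

With a 60-sample period the first 20 samples of the packet are dropped, and with a 100-sample period the last 20 samples of the previous packet are placed in front of it.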

そして、追加したデータ部分を適当に波形減衰させ（ステップＳ１０３）、それを履歴バッファＨＴＢＦに追加する（ステップＳ１０４）。その後、追加部分と受信パケットとについて波形合成範囲を算出する（ステップＳ１０５）。次に、波形合成範囲を波形合成用バッファＯＬＢＦにコピーする（ステップＳ１０６）。そして、波形合成範囲において、波形合成用バッファの音声データと受信パケットの音声データとを波形合成する（ステップＳ１０７）。そして、波形合成範囲において波形合成処理した１パケット分の受信データを履歴バッファＨＴＢＦに追加すると共に、データデコード処理部５０６に送る（図１８のステップＳ９０）。 Then, the added data portion is appropriately waveform attenuated (step S103), and added to the history buffer HTBF (step S104). Thereafter, a waveform synthesis range is calculated for the additional portion and the received packet (step S105). Next, the waveform synthesis range is copied to the waveform synthesis buffer OLBF (step S106). Then, in the waveform synthesis range, the voice data of the waveform synthesis buffer and the voice data of the received packet are synthesized (step S107). Then, the received data for one packet subjected to the waveform synthesis processing in the waveform synthesis range is added to the history buffer HTBF and sent to the data decoding processing unit 506 (step S90 in FIG. 18).

図２２は、この波形合成処理を説明するための図であり、この図２２の例は、音声波形周期がパケットサイズである８０よりも短い場合である。 FIG. 22 is a diagram for explaining this waveform synthesis processing, and the example of FIG. 22 is a case where the speech waveform cycle is shorter than 80 which is the packet size.

すなわち、図２２（Ａ）は、波形合成処理前の履歴バッファＨＴＢＦの記憶内容およびピッチバッファＰｉＢＦの記憶内容を説明するための図である。ピッチバッファＰｉＢＦの記憶内容は、後述する合成処理（図２４〜図２８参照）を行なった後の記憶内容であり、ｐｉｓｔは、音声波形周期のデータのスタートアドレス、ｐｏｆｓは、スタートアドレスｐｉｓｔよりも（パケットサイズ−音声波形周期）分だけ離れたアドレス、ｐｅｎｄは、ピッチバッファＰｉＢＦの終わりのアドレスである。 That is, FIG. 22A is a diagram for explaining the storage contents of the history buffer HTBF and the storage contents of the pitch buffer PiBF before the waveform synthesis processing. The content stored in the pitch buffer PiBF is the content stored after a synthesizing process (see FIGS. 24 to 28) described later, pist is the start address of the data of the speech waveform period, and pofs is more than the start address pist. An address separated by (packet size−voice waveform period), “pend”, is an end address of the pitch buffer PiBF.

履歴バッファＨＴＢＦの記憶内容である合成信号（１パケット分）は、ピッチバッファＰｉＢＦの１周期分の音声波形データを、スタートアドレスｐｉｓｔからエンドアドレスｐｅｎｄに向かって、その間を繰り返すことにより得られるもので、１周期分に対してｐｉｓｔ−ｐｏｆｓだけの差分が追加あるいは削除されたものに等しい。つまり、履歴バッファＨＴＢＦの最後尾の音声データサンプルは、アドレスｐｏｆｓのサンプルに等しい。 The synthesized signal (one packet) that is stored in the history buffer HTBF is obtained by repeating the sound waveform data for one cycle in the pitch buffer PiBF from the start address pist toward the end address pend. This is equivalent to adding or deleting a difference of only pist-pofs for one period. That is, the last audio data sample in the history buffer HTBF is equal to the sample at the address pofs.

したがって、波形合成範囲は、図２２（Ｂ）に示すように、ピッチバッファＰｉＢＦのアドレスｐｏｆｓから所定時間分（例えば、音声波形周期の１／４）とされる部分ｏｌａｐとされ、この部分ｏｌａｐが波形合成用バッファＯＬＢＦにコピーされる。 Therefore, as shown in FIG. 22B, the waveform synthesis range is the portion olap that extends for a predetermined time (for example, 1/4 of the speech waveform period) from the address pofs of the pitch buffer PiBF, and this portion olap is copied to the waveform synthesis buffer OLBF.

そして、図２２（Ｃ）に示すように、波形合成用バッファＯＬＢＦにコピーされた部分ｏｌａｐと、受信パケットの音声データの対応する範囲部分が波形合成され、その波形合成結果が、受信パケットの音声データの対応する範囲部分に書き戻される。 Then, as shown in FIG. 22 (C), the portion olap copied to the waveform synthesis buffer OLBF and the corresponding range portion of the voice data of the received packet are subjected to waveform synthesis, and the waveform synthesis result becomes the voice of the received packet. Written back to the corresponding range part of the data.

そして、このようにして、先頭の波形合成範囲のデータについて波形合成が行なわれた受信パケットが、図２２（Ｄ）に示すようにして、履歴バッファＨＴＢＦに追加されるものである。 Then, the received packet that has been subjected to waveform synthesis for the data in the head waveform synthesis range in this way is added to the history buffer HTBF as shown in FIG. 22 (D).

波形合成処理は、図２３に示すようにして行なわれる。すなわち、図２３は、波形Ａと波形Ｂとを合成する場合を示すものである。ここで、図２３（Ａ）に示す波形Ａは、履歴バッファＨＴＢＦに記憶されている音声データ、図２３（Ｂ）に示す波形Ｂは、受信パケットの音声データとする。 The waveform synthesis process is performed as shown in FIG. That is, FIG. 23 shows a case where the waveform A and the waveform B are synthesized. Here, the waveform A shown in FIG. 23A is audio data stored in the history buffer HTBF, and the waveform B shown in FIG. 23B is audio data of a received packet.

先ず、波形Ａは、図２３（Ｃ）に示すように、徐々に減衰するゲイン特性を掛け算して、図２３（Ｅ）に示すように、徐々に振幅が減衰する波形の信号に変換する。また、波形Ｂは、図２３（Ｄ）に示すように、徐々に上昇するゲイン特性を掛け算して、図２３（Ｆ）に示すように、徐々に振幅が大きくなる波形の信号に変換する。そして、図２３（Ｅ）および（Ｆ）の波形を掛け算することにより、両波形を合成して、図２３（Ｇ）に示すような合成波形を得る。この合成波形は、波形Ａの信号から波形Ｂの信号に連続性を保持して滑らかに変化するものとなる。 First, the waveform A is multiplied by a gradually decreasing gain characteristic, as shown in FIG. 23C, and converted into a signal whose amplitude gradually decays, as shown in FIG. 23E. The waveform B is multiplied by a gradually increasing gain characteristic, as shown in FIG. 23D, and converted into a signal whose amplitude gradually grows, as shown in FIG. 23F. The two waveforms are then combined by multiplying the waveforms of FIGS. 23E and 23F, giving the synthesized waveform shown in FIG. 23G. This synthesized waveform changes smoothly from the signal of waveform A to the signal of waveform B while maintaining continuity.
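A minimal sketch of this waveform synthesis as a cross-fade is given below. Two points are assumptions rather than statements of the source: the gain curves are taken to be linear ramps, and the two weighted waveforms are summed, which is the usual way to cross-fade, whereas the text describes the final combination as a multiplication. The function name `cross_fade` is hypothetical.

```python
def cross_fade(wave_a, wave_b):
    """Blend wave_a (fading out) into wave_b (fading in), sample by sample."""
    if len(wave_a) != len(wave_b):
        raise ValueError("both segments must have the same length")
    n = len(wave_a)
    out = []
    for k, (a, b) in enumerate(zip(wave_a, wave_b)):
        rise = (k + 1) / (n + 1)       # assumed linear gain for waveform B
        fall = 1.0 - rise              # assumed linear gain for waveform A
        out.append(a * fall + b * rise)
    return out


print(cross_fade([4.0, 4.0, 4.0], [8.0, 8.0, 8.0]))
```

Because the two gains add up to one at every sample, a segment that is identical in both inputs passes through unchanged, which is what keeps the joint free of level jumps.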

＜ステップＳ７４の異常フレーム処理＞ <Abnormal frame processing in step S74>
図２４および図２５は、バッファ出力データ処理制御部５０５におけるステップＳ７４の異常フレーム処理の詳細例を示すフローチャートである。また、図２６〜図２８は、その処理内容を説明するための図である。以下、これらの図を参照しながら、ステップＳ７４の異常フレーム処理の例について説明する。 FIGS. 24 and 25 are flowcharts showing a detailed example of the abnormal frame processing of step S74 in the buffer output data processing control unit 505. FIGS. 26 to 28 are diagrams for explaining that processing. An example of the abnormal frame processing of step S74 is described below with reference to these drawings.

先ず、揺らぎ吸収バッファ５０２から受信できなかったパケットが１個目であるか否か判別する（ステップＳ１１１）。１個目であると判別したときには、履歴バッファＨＴＢＦの内容をピッチバッファＰｉＢＦにコピーする（ステップＳ１１２；図２６（Ａ）参照）。そして、音声波形周期演算部５０５１から算出された音声波形周期を取得する（ステップＳ１１３）。 First, it is determined whether or not the packet that could not be received from the fluctuation absorbing buffer 502 is the first packet (step S111). When it is determined that it is the first one, the contents of the history buffer HTBF are copied to the pitch buffer PiBF (step S112; see FIG. 26A). Then, the voice waveform cycle calculated from the voice waveform cycle calculation unit 5051 is acquired (step S113).

次に、ピッチバッファＰｉＢＦにコピーされた音声データの最後の部分（最新部分）について、波形合成範囲部分ｐｏｖｌを算出し（ステップＳ１１４）、算出した波形合成範囲部分ｐｏｖｌをバッファｌａｓｔｑにコピーしておく（ステップＳ１１５；図２６（Ｂ）参照）。この例では、波形合成範囲は、前述したように、音声波形周期の１／４とされている。 Next, the waveform synthesis range part povl is calculated for the last part (latest part) of the audio data copied to the pitch buffer PiBF (step S114), and the calculated waveform synthesis range part povl is copied to the buffer lastq. (Step S115; see FIG. 26B). In this example, the waveform synthesis range is ¼ of the speech waveform cycle as described above.

次に、図２６（Ｃ）に示すように、ピッチバッファＰｉＢＦの最新の音声波形周期分よりも前の波形合成範囲分ｐｏｖｌと、バッファｌａｓｔｑにコピーされた波形合成範囲ｐｏｖｌとについて前述した波形合成処理を行ない、その波形合成結果を、ピッチバッファＰｉＢＦの最新の音声波形周期分のうちの最後尾の波形合成範囲分ｐｏｖｌに書き戻す（ステップＳ１１６）。 Next, as shown in FIG. 26C, the waveform synthesis described above with respect to the waveform synthesis range portion povl before the latest speech waveform cycle of the pitch buffer PiBF and the waveform synthesis range povl copied to the buffer lastq. The processing is performed, and the waveform synthesis result is written back to the last waveform synthesis range povl of the latest speech waveform period of the pitch buffer PiBF (step S116).

Next, the trailing waveform synthesis range portion povl of the most recent voice waveform period in the pitch buffer PiBF, as written back in step S116, is added to the history buffer HTBF and sent to the data decode processing unit 506 (step S117; see FIG. 26(D)).

Next, as shown in FIG. 26(E), one packet's worth of data is read from the pitch buffer PiBF by repeatedly reading the most recent voice waveform period, from address pist to address pend (= 390) (step S118). The synthesized voice signal for one packet thus read is sent to the data decode processing unit 506 via the output buffer OUT, which has a capacity of one packet, and is also added to the history buffer HTBF (step S119).
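The repeated read of step S118 can be sketched as follows. This is only a minimal Python illustration, not the embodiment's implementation: the function name `repeat_last_period`, the use of a Python list as the pitch buffer, and the convention that pend is the index one past the newest sample are assumptions made for the example.

```python
def repeat_last_period(pitch_buffer, period, packet_size=80):
    """Build one packet by cycling over the newest `period` samples.

    Hypothetical helper: `pend` is taken as one past the newest sample,
    which is an assumed indexing convention.
    """
    if not 0 < period <= len(pitch_buffer):
        raise ValueError("period must be between 1 and the buffer length")
    pend = len(pitch_buffer)
    pist = pend - period  # start of the newest voice waveform period
    # Read pist..pend-1 over and over until one packet is filled.
    return [pitch_buffer[pist + (i % period)] for i in range(packet_size)]
```

With a 390-sample buffer and a period of 30 samples, the packet consists of two full copies of the last period followed by its first 20 samples.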

If it is determined in step S111 that this is not the first failure to receive packet data, it is determined whether this is the second or third consecutive failure (step S120). If so, a waveform synthesis range portion is calculated from the most recent voice waveform period in the pitch buffer PiBF and copied to the buffer tmp (step S121; see FIGS. 27(A) and 27(B)).

FIG. 27(A) shows the contents of the history buffer HTBF and of the pitch buffer PiBF at this time, for the case where acquisition of a received packet has failed for the second consecutive time. The contents of the pitch buffer PiBF are those left after the processing for the first reception failure described above. Here, pist is the start address of the voice waveform period data, pofs is the address offset from the start address pist by (packet size - voice waveform period), and pend is the end address of the pitch buffer PiBF (= 390).

As described above, the synthesized signal (one packet's worth) held in the history buffer HTBF is obtained by repeatedly reading one period of voice waveform data in the pitch buffer PiBF from the start address pist toward the end address pend. It is therefore equivalent to one period with a difference of pist - pofs added or removed, and the last voice data sample in the history buffer HTBF equals the sample at address pofs.

The waveform synthesis range is therefore, as shown in FIG. 27(B), the portion povl extending from address pofs of the pitch buffer PiBF for a predetermined length (for example, 1/4 of the voice waveform period), and this portion povl is copied to the temporary buffer tmp.

Next, an address is calculated that lies two voice waveform periods before the end address pend of the pitch buffer PiBF in the case of the second consecutive reception failure, or three voice waveform periods before it in the case of the third consecutive failure. This address is taken as pist, and the portion extending a predetermined length (for example, 1/4 of the voice waveform period) before pist is taken as the waveform synthesis range povl (step S122). The case of the second consecutive failure is shown in the upper part of FIG. 27(C).

Then, as shown in FIG. 27(C), the waveform synthesis range povl obtained in step S122 and the waveform synthesis range povl saved in the buffer lastq at the first reception failure are synthesized as described above with reference to FIG. 23, and the result is written back to the waveform synthesis range povl immediately before address pend of the pitch buffer PiBF (step S123).

Next, a synthesized voice signal for one packet is read from the pitch buffer PiBF and written to the buffer OUT, which has a capacity of one packet (step S124). For the second consecutive reception failure, as shown in FIG. 27(D), reading starts at address pofs of the pitch buffer PiBF (the address one waveform synthesis range povl before pend). When address pend is reached, reading jumps to address pist, two voice waveform periods before pend, and then proceeds sequentially toward pend.

The third consecutive reception failure is handled in the same way, except that address pist in the pitch buffer PiBF lies three voice waveform periods before address pend.
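The read order of step S124 can be illustrated by listing the pitch buffer indices that are visited. The sketch below is a hedged illustration only: the function name `read_order`, the parameter `periods_back` (2 for the second consecutive failure, 3 for the third), and the treatment of pend as an exclusive end index are assumptions made for the example.

```python
def read_order(pend, period, povl, periods_back, packet_size=80):
    """Return the pitch buffer indices read for one concealment packet.

    Assumed convention: `pend` is exclusive, so the newest sample sits
    at index pend - 1.
    """
    pist = pend - periods_back * period  # restart point after reaching pend
    pofs = pend - povl                   # reading starts povl samples before pend
    if pist < 0 or not 0 < povl <= pend:
        raise ValueError("addresses fall outside the pitch buffer")
    indices = []
    pos = pofs
    while len(indices) < packet_size:
        if pos >= pend:   # reached the end address: jump back to pist
            pos = pist
        indices.append(pos)
        pos += 1
    return indices
```

For pend = 390, a period of 40 samples and povl = 10, the second consecutive failure reads indices 380 to 389 and then 310 to 379.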

Next, as shown in FIG. 27(E), the data in the leading waveform synthesis range povl of the buffer OUT and the waveform synthesis range povl data copied to the temporary buffer tmp are synthesized as described above with reference to FIG. 23, and the result is written back to the leading waveform synthesis range povl of the buffer OUT (step S125). The synthesized voice signal data in the buffer OUT is then attenuated as a whole by, for example, 20% (step S126), added to the history buffer HTBF, and output to the data decode processing unit 506 (step S119).
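The waveform synthesis of FIG. 23 is described outside this passage, so the following sketch substitutes a plain linear cross-fade to show the kind of splice performed in step S125. The function name `crossfade` and the weighting are assumptions for illustration and should not be read as the synthesis actually used in the embodiment.

```python
def crossfade(fade_out, fade_in):
    """Blend two equal-length sample runs with linear weights.

    Assumed stand-in for the waveform synthesis of FIG. 23: the weight
    of `fade_in` rises linearly while the weight of `fade_out` falls.
    """
    if len(fade_out) != len(fade_in):
        raise ValueError("both ranges must have the same length")
    n = len(fade_out)
    mixed = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # stays strictly between 0 and 1
        mixed.append((1 - w) * fade_out[i] + w * fade_in[i])
    return mixed
```

In the context of step S125, `fade_out` would correspond to the data held in tmp and `fade_in` to the leading povl samples of OUT, under the assumption that the splice should leave the previous packet's continuation and arrive at the newly read data.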

If it is determined in step S120 that this is not the second or third consecutive reception failure, it is determined whether it is the fourth or fifth (step S131 in FIG. 25). If so, as shown in FIG. 28, a voice signal for one packet is read from the voice data stored in the pitch buffer PiBF at that time, using the data from three voice waveform periods before address pend up to pend (step S132).

At this time, as shown in FIG. 28, reading of one packet's worth of voice data from the pitch buffer PiBF starts at address pofs (the address one waveform synthesis range povl before pend). When address pend is reached, reading jumps to address pist, three voice waveform periods before pend, and then proceeds sequentially toward pend.

Next, the whole of the one packet of voice data read in step S132 is attenuated by 60% (step S133), added to the history buffer HTBF, and output to the data decode processing unit 506 (step S134).

If it is determined in step S131 that the number of consecutive reception failures is not four or five but six or more, all zeros (silence) are written to the output buffer OUT as one packet's worth of data (step S135), and this data is added to the history buffer HTBF and output to the data decode processing unit 506 (step S134).
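The attenuation applied at each stage of consecutive loss can be summarized in a short sketch. The function names `concealment_gain` and `attenuate` are hypothetical, the 20% and 60% figures are the example values given in the text, and the gain of 1.0 for the first failure is an assumption based on no attenuation being mentioned for that case.

```python
def concealment_gain(consecutive_losses):
    """Gain applied to the concealment packet for the n-th consecutive loss."""
    if consecutive_losses < 1:
        raise ValueError("consecutive_losses must be at least 1")
    if consecutive_losses == 1:
        return 1.0  # assumed: no attenuation is described for the first loss
    if consecutive_losses <= 3:
        return 0.8  # attenuated by 20% (step S126)
    if consecutive_losses <= 5:
        return 0.4  # attenuated by 60% (step S133)
    return 0.0      # six or more: silence (step S135)


def attenuate(samples, consecutive_losses):
    """Scale integer samples by the gain for the current loss count."""
    gain = concealment_gain(consecutive_losses)
    return [round(s * gain) for s in samples]
```

The stepwise reduction means that a long burst of lost packets fades toward silence instead of repeating the same waveform at full level.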

As described above, in this embodiment the synthesized voice signal is generated and output in units of the voice waveform period, so the continuity of the voice waveform is preserved. Moreover, when reception fails two or three times in succession, the voice waveform period data used for the first failure is not simply reused. Voice data spanning two and three voice waveform periods stored in the pitch buffer PiBF is used as well, which prevents the synthesized voice signal from sounding artificial.

[Discarding voice data]
As described above, the buffer output data processing control unit 505 discards voice data in units of the voice waveform period in response to a discard request issued as an interrupt from the fluctuation absorption control unit 504. This discard processing is performed by the voice data discard processing unit 5053.

FIGS. 29, 30, and 31 are flowcharts for explaining this discarding of voice data in units of the voice waveform period. FIGS. 32 and 33 are diagrams for explaining how the discard processing proceeds.

The buffer output data processing control unit 505 first receives the discard length contained in the discard request from the fluctuation absorption control unit 504 (the amount of data corresponding to the number of packets forcibly sent out from the fluctuation absorbing buffer 502 for discarding) and holds it in a buffer (step S141).

Next, it receives the voice packet data sent from the fluctuation absorbing buffer 502 (step S142) and discards voice data in units of the voice waveform period (step S143). It then determines whether processing for the discard length held in step S141 has been completed (step S144). If not, it returns to step S142 and repeats steps S142 and S143. If so, it ends the discard processing routine.

Next, the discarding of voice data in units of the voice waveform period in step S143 is described with reference to the flowcharts of FIGS. 30 and 31 and to FIGS. 32 and 33. The flowcharts of FIGS. 30 and 31 are executed once for each packet.

First, the voice data discard processing unit determines whether there is any period residual data (step S151). When data is discarded in units of the voice waveform period and the period is longer than one packet, discarding one packet's worth of voice samples does not remove a full voice waveform period, and part of the period still remains to be discarded. The period residual data is that remaining part.

If it is determined in step S151 that the period residual data is "0", it is determined whether the discard counter is at its maximum value (step S152). Discarding data from every packet would make the voice sound unnatural, like fast-forwarded speech. The discard counter avoids this by setting the interval between packets from which data is discarded: discarding is carried out when the count value of the discard counter reaches the set maximum value. In this example, the maximum value of the discard counter is set to, for example, "1". The number of discarded packets determined by the fluctuation absorption control unit 504 is determined in accordance with the maximum count value set for the discard counter.

If it is determined in step S152 that the count value of the discard counter has reached the maximum value, the count value is reset to "0" (step S153), and the voice waveform period is acquired from the voice waveform period calculation unit 5051 (step S154).

Next, the waveform synthesis range data used for waveform synthesis is updated to the trailing (voice waveform period / 4) samples of the history buffer HTBF (step S155). The value of the voice waveform period is then checked to determine whether it is less than the packet size of 80 samples, or 80 samples or more (step S156).

If it is determined in step S156 that the voice waveform period is less than 80 samples, one voice waveform period is deleted (discarded) from the head of the received packet data (step S157). The waveform synthesis range to be synthesized with the last data in the history buffer HTBF is then calculated (step S158). If (80 (samples per packet) - samples per voice waveform period) is larger than (voice waveform period / 4), the waveform synthesis range is set to (voice waveform period / 4). If it is smaller, the range is set to (80 - samples per voice waveform period).
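The rule of step S158 amounts to taking the smaller of the two candidate lengths. A minimal sketch follows. The function name `synthesis_range` and the use of integer division for the quarter period are assumptions, since the text does not state how fractions are handled.

```python
def synthesis_range(period, packet_size=80):
    """Waveform synthesis range, in samples, for a period shorter than a packet."""
    if not 0 < period < packet_size:
        raise ValueError("this rule applies only when 0 < period < packet_size")
    leftover = packet_size - period  # samples of the packet left after the discard
    quarter = period // 4            # assumed rounding: integer division
    # The range cannot be longer than the data that survives in the packet.
    return min(quarter, leftover)
```

For a period of 40 samples the range is 10 samples, while for a period of 72 samples only 8 samples survive in the packet, so the range is limited to 8.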

Next, the voice data in the waveform synthesis range thus obtained is copied from the history buffer HTBF to the waveform synthesis buffer (step S159), and the copied data in the waveform synthesis buffer is synthesized, in the same manner as described above, with the (80 - samples per voice waveform period) samples of the received packet that remain undiscarded (step S160).

The (80 - samples per voice waveform period) samples of the received packet that remain after waveform synthesis are added to the history buffer HTBF and output to the data decode processing unit 506 (step S161). The one voice waveform period that was discarded is subtracted from the discard length value held in the buffer (step S162), and the discard processing routine for this packet ends.

If it is determined in step S156 that the voice waveform period is 80 samples or more, (samples per voice waveform period - 80 (samples per packet)) is held as the period residual data (step S163). The one voice waveform period that was discarded is subtracted from the discard length value held in the buffer (step S164), and the discard processing routine for this packet ends.

If it is determined in step S152 that the count value of the discard counter is not at the maximum value, the received packet is not subject to discarding, so the discard counter is incremented by 1 (step S165), and the received packet is added to the history buffer HTBF and output to the data decode processing unit 506 (step S166). The discard processing routine for this packet then ends.

If it is determined in step S151 that the period residual data is not "0", samples amounting to the period residual data are deleted (discarded) from the head of the received packet (step S171 in FIG. 31).

Next, a waveform synthesis range is calculated for the received packet data from which the period residual has been deleted (step S172), the voice data for the calculated range is copied from the history buffer HTBF to the waveform synthesis buffer (step S173), and the copied data is synthesized, in the same manner as described above, with the (80 - number of period residual samples) samples of the received packet that remain undiscarded (step S174).

The (80 - number of period residual samples) samples of the received packet that remain after waveform synthesis are added to the history buffer HTBF and output to the data decode processing unit 506 (step S175). The discarded period residual samples are subtracted from the discard length value held in the buffer (step S176). The period residual data is then set to "0" (step S177), and the discard processing routine for this packet ends.
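The per-packet flow of steps S151 to S177, together with the outer loop of steps S141 to S144, can be sketched as below. This is a simplified illustration under several assumptions, not the embodiment's implementation: the names `PeriodDiscarder` and `discard_stream` are hypothetical, the waveform synthesis steps are omitted, the discard length is counted in samples, the counter is assumed to start at its maximum so that the first packet after a request is eligible, the period is held fixed instead of being re-measured in step S154, and the discard length is reduced by the number of samples actually removed from each packet.

```python
PACKET_SIZE = 80  # samples per packet, as in the example in the text


class PeriodDiscarder:
    """Per-packet discard state (hypothetical helper, cross-fade omitted)."""

    def __init__(self, discard_length, counter_max=1):
        self.remaining = discard_length  # samples still to discard (unit assumed)
        self.counter_max = counter_max
        self.counter = counter_max       # assumed: first packet is eligible
        self.residual = 0                # period residual for the next packet

    def process(self, packet, period):
        """Return the samples of `packet` that are passed on."""
        if self.residual > 0:                    # S171 to S177
            cut = min(self.residual, len(packet))
            self.residual -= cut
            self.remaining -= cut
            return packet[cut:]
        if self.counter < self.counter_max:      # S165, S166: pass through
            self.counter += 1
            return packet
        self.counter = 0                         # S153
        if period < PACKET_SIZE:                 # S157 to S162
            self.remaining -= period
            return packet[period:]
        self.residual = period - PACKET_SIZE     # S163
        self.remaining -= PACKET_SIZE            # assumed accounting
        return []


def discard_stream(packets, period, discard_length, counter_max=1):
    """Outer loop of S141 to S144, with a fixed period for simplicity."""
    state = PeriodDiscarder(discard_length, counter_max)
    out = []
    for packet in packets:
        if state.remaining > 0 or state.residual > 0:
            out.extend(state.process(packet, period))
        else:
            out.extend(packet)  # discard request fully served
    return out
```

With a 50-sample period and a discard length of 100 samples, the first and third packets each lose their first 50 samples and the second packet passes unchanged, which matches the alternating pattern produced by a counter maximum of 1.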

The voice data discard processing described above is explained further with reference to FIGS. 32 and 33. FIG. 32 illustrates the discard processing when the voice waveform period is shorter than the packet size of 80 samples.

In the example of FIG. 32, as shown in FIG. 32(A), a data discard request arrives after the voice 2 packet has been received and processed. In this case, as shown in FIG. 32(C), one voice waveform period of voice data is discarded from the head of the voice 3 packet. The part of voice 3 that remains is added to the history buffer HTBF, as shown in FIG. 32(D).

As can be seen from FIGS. 32(B) and 32(C), exactly one voice waveform period has been discarded, so voice 2 and the remaining part of voice 3 are continuous in terms of the voice waveform period. In addition, because waveform synthesis is performed between voice 2 and the remaining part of voice 3, the waveform joins smoothly.

Because the discard counter is reset when data is discarded from the voice 3 packet, the voice 4 packet is not subject to discarding and is added to the history buffer HTBF unchanged, as shown in FIG. 32(D). Then, as shown in FIG. 32(C), the next packet, voice 5, is processed in the same way as the voice 3 packet. This processing is repeated until the discard length has been fully processed.

FIG. 33 illustrates the discard processing when the voice waveform period is equal to or longer than the packet size of 80 samples.

In the example of FIG. 33 as well, as shown in FIG. 33(A), a data discard request arrives after the voice 2 packet has been received and processed. In this case, as shown in FIG. 33(C), the voice waveform period is equal to or longer than the packet size, so all the voice data of the voice 3 packet is discarded, and the period residual data is discarded from the head of the next packet, voice 4.

The part of voice 4 that remains is added to the history buffer HTBF, as shown in FIG. 33(D). This part of voice 4 is thus appended directly after voice 2, but because data is discarded in units of the voice waveform period, the continuity of the waveform is preserved, and because waveform synthesis is performed between voice 2 and the remaining part of voice 4, the waveform joins smoothly.

This processing is repeated until the discard length has been fully processed.

As described above, even when the discard length corresponds to a plurality of voice packets, the data of those voice packets is not discarded wholesale. It is discarded in units of the voice waveform period, so the reproduced sound has little noise and is easy to listen to.

[Other modifications]
In the embodiment described above, the buffer size of the fluctuation absorbing buffer is changed dynamically according to the fluctuation that occurs. However, the present invention can also be applied where the buffer size of the fluctuation absorbing buffer is fixed and data for a predetermined number of voice packets is discarded when the number of voice packets accumulated in the fluctuation absorbing buffer exceeds the maximum accumulated packet number.

The present invention can also be applied where the buffer size of the fluctuation absorbing buffer is fixed, reading of voice packets from the fluctuation absorbing buffer starts after the start accumulation number of voice packets has been accumulated, and a synthesized voice signal is used as the voice signal to be reproduced in real time when the number of accumulated voice packets falls below the start accumulation number and the voice signal to be reproduced in real time can no longer be read from the fluctuation absorbing buffer.

Furthermore, the present invention is applicable to fluctuation arising on a LAN whenever data requiring real-time reproduction is transmitted over the LAN, so it goes without saying that it is not limited to the VoIP telephone system described in the above embodiment.

FIG. 1 is a diagram showing a configuration example of a VoIP telephone system as an application example of the present invention.
FIG. 2 is a block diagram showing a configuration example of the gatekeeper in the VoIP telephone system of FIG. 1.
FIG. 3 is a block diagram showing a configuration example of the telephone terminal in the VoIP telephone system of FIG. 1.
FIG. 4 is a block diagram showing a configuration example of the gateway in the VoIP telephone system of FIG. 1.
FIG. 5 is a diagram for explaining the configuration of the fluctuation absorbing buffer in the embodiment of the present invention.
FIG. 6 is a block diagram for explaining the configuration of the fluctuation absorption control apparatus in the embodiment of the present invention.
FIG. 7 is a flowchart for explaining part of the fluctuation absorption control method in the embodiment of the present invention.
FIGS. 8 and 9 are parts of a flowchart for explaining the fluctuation absorption control method in the embodiment of the present invention.
FIGS. 10 to 12 are diagrams for explaining changes in the accumulated packets and buffer size change control in the fluctuation absorbing buffer to which the embodiment of the present invention is applied.
FIG. 13 is a block diagram for explaining the configuration of the main part of the fluctuation absorption control apparatus in the embodiment of the present invention.
FIGS. 14 and 15 are diagrams for explaining the voice waveform period calculation processing in the embodiment of the present invention.
FIG. 16 is a flowchart for explaining the voice waveform period calculation processing in the embodiment of the present invention.
FIGS. 17 to 19 are flowcharts for explaining the fluctuation absorption control method of the embodiment of the present invention.
FIGS. 20 to 22 are diagrams for explaining the packet data processing procedure in the fluctuation absorption control method of the embodiment of the present invention.
FIG. 23 is a diagram for explaining the waveform synthesis processing in the fluctuation absorption control method of the embodiment of the present invention.
FIGS. 24 and 25 are parts of a flowchart for explaining the fluctuation absorption control method of the embodiment of the present invention.
FIGS. 26 to 28 are diagrams for explaining the packet data processing procedure in the fluctuation absorption control method of the embodiment of the present invention.
FIG. 29 is a flowchart for explaining the fluctuation absorption control method of the embodiment of the present invention.
FIGS. 30 and 31 are parts of a flowchart for explaining the fluctuation absorption control method of the embodiment of the present invention.
FIGS. 32 and 33 are diagrams for explaining the packet data processing procedure in the fluctuation absorption control method of the embodiment of the present invention.
FIG. 34 is a sequence diagram for explaining a conventional fluctuation absorption control method.
FIG. 35 is a diagram for explaining the conventional fluctuation absorption control method.
FIG. 36 is a flowchart for explaining the conventional fluctuation absorption control method.

Explanation of symbols

1 Gatekeeper
2 Telephone terminal
3 LAN constituting the IP network
4 Gateway
210, 410 CPU
221, 417 RTP reception buffer constituting the fluctuation absorbing buffer
502 Fluctuation absorbing buffer
503 Fluctuation detection unit
504 Fluctuation absorption control unit
505 Buffer output data processing control unit
506 Data decoding processing unit
5051 Voice waveform period calculation unit
5052 Voice data synthesis processing unit
5053 Voice data discard processing unit
5054 Processing determination unit

Claims

A voice packet fluctuation absorption control method in which a voice signal to be reproduced in real time is packetized every predetermined time length and transmitted sequentially, the resulting voice packets are received through a fluctuation absorbing buffer, and fluctuation in the arrival timing of the voice packets arising in the transmission system is absorbed under control by the fluctuation absorbing buffer,
wherein, at the start of reception, reading of the voice packets from the fluctuation absorbing buffer is started after voice packets equal in number to a start accumulation packet number have accumulated in the fluctuation absorbing buffer, and data for a predetermined number of voice packets is discarded when the number of voice packets accumulated in the fluctuation absorbing buffer exceeds a maximum accumulation packet number,
characterized in that a voice waveform period is detected for the voice packet data accumulated in the fluctuation absorbing buffer, and, when the data for the predetermined number of voice packets is discarded, the data is discarded in units of the detected voice waveform period, so that the continuity of the voice waveform is maintained.
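
The discard step of this claim can be illustrated with a minimal sketch. It assumes 8 kHz samples held as a Python list, and it uses normalized autocorrelation to estimate the voice waveform period; the claim does not state a detection technique, so that choice, the lag range, and the names `detect_period` and `discard_whole_periods` are all hypothetical.

```python
import math

def detect_period(samples, min_lag=20, max_lag=160, tol=1e-6):
    """Estimate the waveform period (in samples) by normalized autocorrelation.

    Assumption: this estimator is illustrative only. Among lags that score
    equally well, the shortest lag wins because a later lag must beat the
    current best by more than `tol`.
    """
    best_lag, best_score = None, float("-inf")
    upper = min(max_lag, len(samples) // 2)
    for lag in range(min_lag, upper + 1):
        n = len(samples) - lag
        a, b = samples[:n], samples[lag:]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b)) or 1.0
        score = num / den
        if score > best_score + tol:
            best_lag, best_score = lag, score
    if best_lag is None:
        raise ValueError("not enough samples for the requested lag range")
    return best_lag

def discard_whole_periods(samples, start, target, period):
    """Remove a whole number of periods (at least one) beginning at `start`.

    `target` is the number of samples that should be dropped, for example one
    packet's worth. The amount actually removed is the largest multiple of
    `period` that does not exceed `target`, so the waveform phase is preserved.
    """
    cycles = max(1, target // period)
    cut = cycles * period
    if start + cut > len(samples):
        raise ValueError("not enough samples after start to discard")
    return samples[:start] + samples[start + cut:]

# Example: a 200 Hz tone sampled at 8 kHz has a period of 40 samples.
tone = [math.sin(2 * math.pi * i / 40) for i in range(320)]
period = detect_period(tone)
shortened = discard_whole_periods(tone, start=60, target=160, period=period)
```

Because the removed span is a whole number of periods, the samples after the cut line up in phase with the samples before it, which is the continuity property the claim relies on.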

A voice packet fluctuation absorption control method in which a voice signal to be reproduced in real time is packetized every predetermined time length and transmitted sequentially, the resulting voice packets are received through a fluctuation absorbing buffer, and fluctuation in the arrival timing of the voice packets arising in the transmission system is absorbed under control by the fluctuation absorbing buffer,
wherein, at the start of reception, reading of the voice packets from the fluctuation absorbing buffer is started after voice packets equal in number to a start accumulation packet number have accumulated in the fluctuation absorbing buffer, and, when the number of voice packets accumulated in the fluctuation absorbing buffer falls below the start accumulation packet number and the voice signal to be reproduced in real time can no longer be read from the fluctuation absorbing buffer, a synthesized voice signal is used as the voice signal to be reproduced in real time,
characterized in that a voice waveform period is detected for the voice packet data accumulated in the fluctuation absorbing buffer, and the synthesized voice signal is generated from the voice packet data accumulated in the fluctuation absorbing buffer in units of the detected voice waveform period, so that the continuity of the voice waveform is maintained.
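
One way to picture the synthesized voice signal of this claim is to repeat the most recent waveform period cyclically until real data arrives again. The sketch below assumes the period has already been detected and is passed in as a sample count; the function name `synthesize_fill` and the simple repetition scheme are hypothetical and are not taken from the claim.

```python
import math

def synthesize_fill(history, period, length):
    """Generate `length` fill samples by cyclically repeating the last period.

    Assumption: `history` holds the most recent decoded samples and `period`
    is the detected voice waveform period in samples.
    """
    if period <= 0 or len(history) < period:
        raise ValueError("history must contain at least one full period")
    last_cycle = history[-period:]
    return [last_cycle[i % period] for i in range(length)]

# Example: 120 samples of a tone with a 40-sample period, then a 160-sample gap.
history = [math.sin(2 * math.pi * i / 40) for i in range(120)]
fill = synthesize_fill(history, period=40, length=160)
```

For a strictly periodic input the fill continues the waveform exactly; real speech is only approximately periodic, so the fill is an approximation there.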

The voice packet fluctuation absorption control method according to claim 1 or 2,
characterized in that the fluctuation is absorbed by dynamically changing the start accumulation packet number and the maximum accumulation packet number in accordance with an increase or decrease in the amount of fluctuation arising in the transmission system.
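
The dynamic adjustment in this claim can be sketched as a mapping from a measured fluctuation amount to the two thresholds. The claim does not say how the fluctuation is measured or how the thresholds are derived from it, so everything below is an assumption: fluctuation is taken as the worst extra inter-arrival delay expressed in whole packets, and the `margin` and `headroom` parameters, like the names `jitter_in_packets` and `adapt_thresholds`, are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Thresholds:
    start_count: int  # start accumulation packet number
    max_count: int    # maximum accumulation packet number

def jitter_in_packets(arrival_ms, packet_ms):
    """Worst inter-arrival delay beyond the nominal spacing, in whole packets."""
    worst = 0.0
    for prev, cur in zip(arrival_ms, arrival_ms[1:]):
        worst = max(worst, (cur - prev) - packet_ms)
    return math.ceil(worst / packet_ms)

def adapt_thresholds(jitter_packets, margin=1, headroom=2):
    """Raise or lower both thresholds together as the fluctuation changes.

    Assumption: start = fluctuation + margin, max = start * headroom.
    """
    start = max(1, jitter_packets + margin)
    return Thresholds(start_count=start, max_count=start * headroom)

# Example: 20 ms packets, with one packet arriving 35 ms late.
bursty = adapt_thresholds(jitter_in_packets([0, 20, 40, 95, 100, 120], 20))
steady = adapt_thresholds(jitter_in_packets([0, 20, 40, 60], 20))
```

With these assumed parameters the bursty stream yields larger thresholds than the steady one, which trades added playback delay for fewer underruns.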

The voice packet fluctuation absorption control method according to claim 1,
wherein, when the fluctuation amount is stable at a value lower than the start accumulation packet number at that time, the start accumulation packet number and the maximum accumulation packet number are changed to values smaller than their values at that time, and voice data for the number of packets that overflow the fluctuation absorbing buffer as a result of the change is discarded in units of the voice waveform period.

The voice packet fluctuation absorption control method according to claim 1,
wherein the discarding in units of the voice waveform period is performed once every N (N ≥ 1) voice packets.

The voice packet fluctuation absorption control method according to claim 1,
wherein waveform synthesis processing is performed at the joint between the data preceding and the data following the voice data discarded in units of the voice waveform period.
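
The claim does not specify what the waveform synthesis processing at the joint consists of. As one plausible reading, the sketch below blends the two sides with a linear crossfade over a short overlap; the crossfade itself, the weighting, and the name `crossfade_join` are assumptions made for illustration.

```python
def crossfade_join(before, after, overlap):
    """Join two sample lists, blending the last `overlap` samples of `before`
    with the first `overlap` samples of `after` using linear weights.

    Assumption: a linear crossfade stands in for the unspecified waveform
    synthesis processing at the joint.
    """
    if overlap <= 0:
        return list(before) + list(after)
    if overlap > min(len(before), len(after)):
        raise ValueError("overlap longer than one of the segments")
    head = len(before) - overlap
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of `after` rises toward 1
        mixed.append(before[head + i] * (1 - w) + after[i] * w)
    return list(before[:head]) + mixed + list(after[overlap:])

# Example: a step from 1.0 to 0.0 becomes a four-sample ramp.
joined = crossfade_join([1.0] * 8, [0.0] * 8, overlap=4)
```

The output is shorter than the two inputs combined by `overlap` samples, so a caller that needs an exact length would have to account for that.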

The voice packet fluctuation absorption control method according to claim 2,
wherein the synthesized voice signal is subjected to waveform synthesis processing with the data preceding and following the synthesized voice signal.