JP2002026964A

JP2002026964A - Voice packet transmitter, voice packet receiver, and packet communication system

Info

Publication number: JP2002026964A
Application number: JP2000200584A
Authority: JP
Inventors: Satoshi Watanabe (渡辺 聡); Shinji Hayakawa (早川 慎司)
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2000-07-03
Filing date: 2000-07-03
Publication date: 2002-01-25

Abstract

PROBLEM TO BE SOLVED: To provide a voice packet receiver that minimizes deterioration in voice quality. SOLUTION: The voice packet receiver, which receives voice packets sent from a voice packet transmitter via a prescribed network, is provided with a reception state detection means that detects the reception state of the voice packets, which fluctuates with the traffic on the network, and with a reception state information transmission means that sends reception state information corresponding to the detected reception state onto the network for delivery to the voice packet transmitter.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a packet communication system and can be applied, for example, to voice communication over a packet communication network such as the Internet.

[0002] The present invention also relates to a voice packet receiving apparatus as a component of such a packet communication system.

[0003] The present invention further relates to a voice packet transmitting apparatus as a component of such a packet communication system.

[0004]

[Description of the Related Art] Voice communication over packet communication networks such as the Internet (so-called Internet telephony and the like) is now in widespread use.

[0005]

[Problems to Be Solved by the Invention] However, in ensuring good sound quality in such packet communication, the relationship between bit rate and sound quality is an important issue.

[0006] Raising the bit rate reduces coding distortion and improves sound quality, but it also increases traffic on the network. Packet arrival times then become unstable, buffering delay and packet loss must be tolerated, and sound quality ultimately falls. Lowering the bit rate eases the traffic problem but increases coding distortion, so sound quality falls in that case as well.

[0007] One solution to this problem is silence compression. Silence compression reduces the bits transmitted during silent periods of speech: for each frame, the transmitting side determines whether the frame is voiced or silent (hereinafter, the "voice decision"), and a silent frame is encoded with less information than a voiced frame before being sent to the network. In conversation between people, each speaker is generally said to be silent 50% to 60% of the time, so silence compression is a highly effective coding method for lowering the average bit rate.
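The effect of the silence ratio on the average bit rate can be illustrated with a simple time-weighted mean. The sketch below is only an illustration: the function name and the two bit rates (128 kbit/s for voiced frames, 32 kbit/s for silent frames) are assumptions chosen for the example, not values given in this publication.

```python
def average_bit_rate(voiced_bps: float, silent_bps: float, silence_ratio: float) -> float:
    """Time-weighted mean bit rate of a stream that is silent `silence_ratio` of the time."""
    if not 0.0 <= silence_ratio <= 1.0:
        raise ValueError("silence_ratio must be between 0 and 1")
    return (1.0 - silence_ratio) * voiced_bps + silence_ratio * silent_bps


# Hypothetical rates: 128 kbit/s for voiced frames, 32 kbit/s for silent frames.
for ratio in (0.0, 0.5, 0.6):
    print(ratio, average_bit_rate(128_000, 32_000, ratio))
```

With these assumed rates, the average falls steadily as a larger share of frames is classified as silent, which is the saving that silence compression targets.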

[0008] How to carry out the voice decision in an actual communication device, however, is technically very difficult. If the voice decision level is set too strictly (that is, too high), frames that belong to an utterance may be judged silent; conversely, if it is set too leniently (that is, too low), frames in which nothing is uttered may be judged voiced. Such decision errors degrade sound quality. Therefore, as far as the amount of traffic on the network permits, it is preferable for sound quality to apply as little silence compression as possible.

[0009] FIG. 2 shows the relationship between average bit rate and sound quality as the silence compression level Thresh (the voice decision level) is varied. The horizontal axis shows the average bit rate, taking the average bit rate without silence compression (that is, at Thresh = 0) as 100%. The vertical axis shows sound quality as evaluated by a MOS test (a five-grade rating in which a higher value indicates higher sound quality).

[0010] As Thresh is raised, the bit rate falls and so does the sound quality. To maintain sound quality, it is therefore preferable to use as low a Thresh as possible within the range that causes no problems on the network during transmission. Because traffic on the network changes over time, Thresh should also be changed dynamically to adapt to those fluctuations.

[0011] As a method of dynamically changing the voice decision level, a VAD (voice activity detection) algorithm that successively adapts to the noise level is known.

[0012] The VAD algorithm is described in Reference 1 below.

[0013] Reference 1: ITU-T Recommendation G.723.1 Annex A. The method of Reference 1, however, aims to adapt the voice decision level to background noise that changes over time, and is unrelated to the amount of traffic on the network.

[0014] A method of adapting the transmission data rate to the amount of traffic on the network is described in Reference 2 below.

[0015] Reference 2: Japanese Unexamined Patent Application Publication No. H11-177623. The method of Reference 2, however, uses a multi-rate audio encoder to adapt the bit rate to an exact specified value; it does not specify a voice decision level.

[0016] In short, even with the methods described in References 1 and 2, the problem of providing packet voice communication that varies the voice detection level at fixed time intervals, within a range that does not burden the traffic on the network, and thereby minimizes sound quality degradation remains unsolved.

[0017]

[Means for Solving the Problems] To solve this problem, a first invention is a voice packet receiving apparatus that receives, via a predetermined network, voice packets transmitted from a voice packet transmitting apparatus, characterized by comprising: reception state detecting means for detecting the reception state of the voice packets, which fluctuates with the traffic on the network; and reception state information transmitting means for sending reception state information corresponding to the reception state detected by the reception state detecting means onto the network for delivery to the voice packet transmitting apparatus.

[0018] A second invention is a voice packet transmitting apparatus that transmits voice packets to a voice packet receiving apparatus via a predetermined network, characterized by comprising: (1) decision means for deciding, according to a decision level, whether a voice frame to be transmitted is voiced or silent; (2) voice encoding means for encoding a silent frame, which the decision means has judged silent, with less information than a voiced frame, which it has judged voiced; (3) reception state information receiving means for receiving, via the network, reception state information that the voice packet receiving apparatus has transmitted according to the reception state of the voice packets, which fluctuates with the traffic on the network; and (4) decision level changing means for changing the decision level according to the reception state information received by the reception state information receiving means.

[0019] Further, a packet communication system according to a third invention is characterized by comprising the voice packet receiving apparatus of claim 1 and the voice packet transmitting apparatus of claim 2.

[0020]

[Embodiments of the Invention] (A) Embodiments

Embodiments of the voice packet transmitting apparatus, voice packet receiving apparatus, and packet communication system of the present invention are described below.

[0021] (A-1) Configuration of the First Embodiment

FIG. 1 shows the overall configuration of a communication system 10 of this embodiment.

[0022] In FIG. 1, the communication system 10 comprises a voice transmitting terminal 11, an IP (Internet Protocol) network 12, and a voice receiving terminal 13.

[0023] The voice transmitting terminal 11 comprises a voice input unit 20, a voice encoding unit 21, a voice packet transmitting unit 22, a reception state information receiving unit 23, and a voice decision level calculation unit 24.

[0024] The voice input unit 20 captures voice input from outside, for example through a microphone, converts it into voice data AD one frame length at a time, and outputs the voice data AD.

[0025] The voice encoding unit 21 receives the voice data AD from the voice input unit 20 and, for each fixed time length (30 ms, as one example), converts it into a voice frame (that is, voice-frame encoded data) CF and outputs it. Because the voice encoding unit 21 performs silence compression, each output voice frame CF is either a silent frame (silent-frame encoded data) CF2 (see FIG. 3) or a voiced frame (voiced-frame encoded data) CF1.

[0026] That is, the voice encoding unit 21 has the internal configuration shown in FIG. 3 and comprises a voice decision unit 21A, a voiced frame encoding unit 21B, and a silent frame encoding unit 21C.

[0027] The voice decision unit 21A performs the voice decision according to a voice decision level (voice detection level) LTH supplied from the voice decision level calculation unit 24, described later. For each frame of voice data AD, it supplies voice data AD judged voiced to the voiced frame encoding unit 21B and voice data AD judged silent to the silent frame encoding unit 21C.

[0028] For the voice decision, the voice decision unit 21A calculates the variance of one frame of voice data AD, and judges the frame voiced if the value is equal to or greater than the voice decision level LTH and silent if it is equal to or less than LTH.
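The variance-based decision described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `is_voiced` is hypothetical, the population variance is used (the text does not say which variance estimator is meant), and a frame whose variance exactly equals LTH is treated as voiced, which is one possible reading of the boundary case.

```python
from statistics import pvariance
from typing import Sequence


def is_voiced(frame: Sequence[int], lth: float) -> bool:
    """Return True when the frame's sample variance reaches the decision level LTH.

    Assumption: a variance exactly equal to LTH counts as voiced.
    """
    return pvariance(frame) >= lth


loud = [1000, -1000, 1000, -1000]   # large swings, high variance
quiet = [3, -2, 1, -2]              # near-silent samples, low variance
print(is_voiced(loud, 500.0), is_voiced(quiet, 500.0))
```

Raising `lth` makes more frames fall on the silent side, which is how the decision level controls the degree of silence compression.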

[0029] The voiced frame encoding unit 21B encodes voiced frames with a comparatively large encoded data size and outputs the voiced frame CF1; the silent frame encoding unit 21C encodes silent frames with a smaller encoded data size and outputs the silent frame CF2. For example, the voiced frame encoding unit 21B may encode using 16-bit PCM (Pulse Code Modulation), and the silent frame encoding unit 21C may encode using 4-bit PCM.
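The size difference between the two encodings can be made concrete with a small sketch. The assumptions here are considerable and are not taken from this publication: an 8 kHz sampling rate (so a 30 ms frame holds 240 samples), signed 16-bit input samples, and a 4-bit quantizer that simply keeps the top four bits of each sample and packs two samples per byte. The function names are hypothetical.

```python
import struct
from typing import Sequence

SAMPLE_RATE_HZ = 8000   # assumption: not stated in the source
FRAME_MS = 30           # example frame length from the description
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000


def encode_voiced(frame: Sequence[int]) -> bytes:
    """16-bit PCM: two bytes per sample, little-endian signed."""
    return struct.pack(f"<{len(frame)}h", *frame)


def encode_silent(frame: Sequence[int]) -> bytes:
    """4-bit PCM (assumed scheme): keep the top 4 bits, two samples per byte."""
    nibbles = [(sample >> 12) & 0x0F for sample in frame]
    if len(nibbles) % 2:
        nibbles.append(0)  # pad an odd-length frame with one zero nibble
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


frame = [0] * SAMPLES_PER_FRAME
print(len(encode_voiced(frame)), len(encode_silent(frame)))
```

Under these assumptions a silent frame occupies a quarter of the bytes of a voiced frame, so every frame that the voice decision moves to the silent side reduces the load on the network.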

[0030] The voice packet transmitting unit 22 shown in FIG. 1 forms a voice packet AP from the voice frames CF and sends it to the network 12. One voice packet AP may contain one or more voice frames CF, but normally multiple frames (for example, five) are contained in one voice packet AP.

[0031] The voice packet transmitting unit 22 also includes a continuity information adding unit 22A, which adds continuity information indicating the continuity of the voice packet sequence to the packet header of each voice packet AP transmitted in time series. In this embodiment, the continuity information is a packet number (pID) that is incremented by one in transmission order.
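Grouping frames into packets and numbering the packets can be sketched as below. The `VoicePacket` type and `packetize` function are hypothetical names introduced for illustration; the choice to flush a final, partially filled packet is also an assumption, since the text does not say how a trailing remainder is handled.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class VoicePacket:
    pid: int             # packet number carried in the header, starting at 1
    frames: List[bytes]  # encoded voice frames CF carried in the payload


def packetize(frames: Iterable[bytes], frames_per_packet: int = 5) -> Iterator[VoicePacket]:
    """Group encoded frames into packets and assign pID = 1, 2, 3, ... in send order."""
    pid = 0
    batch: List[bytes] = []
    for frame in frames:
        batch.append(frame)
        if len(batch) == frames_per_packet:
            pid += 1
            yield VoicePacket(pid, batch)
            batch = []
    if batch:
        # Assumption: a trailing partial batch is still sent as its own packet.
        pid += 1
        yield VoicePacket(pid, batch)


packets = list(packetize([b"frame"] * 12))
print([(p.pid, len(p.frames)) for p in packets])
```

Because the pID grows by exactly one per packet, a receiver can detect loss purely from gaps in the numbers it sees.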

[0032] The reception state information receiving unit 23 receives reception state information SI from the network 12. Because the network 12 is a packet communication network, the reception state information SI also arrives contained in a packet RP. The reception state information receiving unit 23 extracts the reception state information SI from the packet RP and outputs it to the voice decision level calculation unit 24.

[0033] The voice decision level calculation unit 24 changes the voice decision level LTH according to the reception state information SI.

[0034] The voice decision level LTH is a threshold corresponding to the silence compression level Thresh. Therefore, as described above, if the voice decision level LTH is set too high, voice data AD of frames that belong to an utterance may be judged silent; conversely, if it is set too low, voice data AD of frames in which nothing is uttered may be judged voiced. Either error degrades sound quality.

[0035] The voice decision level calculation unit 24 keeps such sound quality degradation to a minimum by adapting the voice decision level LTH to the reception state information SI.
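The general idea of adapting LTH to the reported reception state can be illustrated with a purely hypothetical policy. Nothing in the sketch below is taken from this publication: it assumes the reception state information is a count of lost packets per reporting period, and the step size, the bounds, and the rule itself (raise LTH when loss is reported, lower it when none is) are illustrative choices only.

```python
def adapt_lth(current_lth: float, lost_packets: int,
              step: float = 50.0, lth_min: float = 0.0, lth_max: float = 1000.0) -> float:
    """Hypothetical update rule, not the method of the source.

    Loss reported -> raise LTH so more frames are compressed and traffic drops.
    No loss       -> lower LTH so fewer frames are compressed and quality improves.
    """
    if lost_packets > 0:
        candidate = current_lth + step * lost_packets
    else:
        candidate = current_lth - step
    return max(lth_min, min(lth_max, candidate))


lth = 100.0
for reported_loss in (0, 3, 0, 0):
    lth = adapt_lth(lth, reported_loss)
    print(reported_loss, lth)
```

The clamp keeps the level inside a fixed range, so a long run of lossy or loss-free reports cannot push it without bound in either direction.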

[0036] The voice receiving terminal 13, which receives the voice packets AP via the network 12 and is also the source of the packets RP, comprises a voice packet receiving unit 30, a voice decoding unit 31, a voice output unit 32, a reception state information creation unit 33, and a reception state information transmitting unit 34.

[0037] The voice packet receiving unit 30 receives from the network 12 the voice packets AP transmitted by the voice packet transmitting unit 22 in the voice transmitting terminal 11. It supplies the voice frames CF extracted from each received voice packet AP to the voice decoding unit 31, and supplies the packet number pID to the reception state information creation unit 33 as reception state data SD indicating the reception state.

[0038] On receiving a voice frame CF, the voice decoding unit 31 converts it into voice data AD and outputs the voice data AD to the voice output unit 32.

[0039] The voice output unit 32 outputs sound corresponding to the voice data AD, for example through a speaker.

[0040] The reception state information creation unit 33 creates the reception state information SI by performing the processing described later on the reception state data SD, and outputs it to the reception state information transmitting unit 34.

[0041] The reception state information transmitting unit 34 creates a packet RP containing the reception state information SI and sends it to the network 12. The packet RP is given routing information that designates the voice transmitting terminal 11 as its destination.

[0042] The voice transmitting terminal 11 and the voice receiving terminal 13 can be implemented, as one example, by a PC (personal computer) or the like equipped with IP network interface means and sound input/output means.

[0043] The operation of this embodiment, configured as described above, is explained below.

[0044] (A-2) Operation of the First Embodiment

A voice signal input from outside is converted by the voice input unit 20 into voice data AD in units of one frame and supplied to the voice encoding unit 21.

[0045] After encoding by the voice encoding unit 21 and packetization by the voice packet transmitting unit 22, the voice packet AP is sent to the network 12.

[0046] Because the voice packet AP carries routing information designating the voice receiving terminal 13, the IP network 12 forwards it to the voice receiving terminal 13.

[0047] As the voice transmitting terminal 11 sends out voice packets AP one after another in time series, the continuity information adding unit 22A assigns "1" (in decimal notation) as the packet number pID to the first voice packet AP(1).

[0048] Likewise, the second voice packet AP(2) is assigned the packet number pID "2", the third voice packet AP(3) is assigned "3", the fourth voice packet AP(4) is assigned "4", and so on, up to the N-th voice packet AP(N), which is assigned "N".

[0049] Accordingly, if traffic on the network 12 is within the normal range and congestion is at a normal level, the reception state at the voice receiving terminal 13 is also normal, and the voice packets AP(1) to AP(N) are received in this order without a single packet dropping out. In that case, the reception state data SD output by the voice packet receiving unit 30 in the voice receiving terminal 13 is the unbroken sequence of packet numbers pID "1" to "N", each incremented by one.

[0050] If, however, traffic on the network 12 exceeds the normal range and congestion rises above normal, packet loss occurs in the network 12 and some of the voice packets AP(1) to AP(N) cannot be received.

[0051] In that case, gaps appear in the reception state data SD output by the voice packet receiving unit 30.

[0052] If, for example, the reception state data SD is 1, 2, 3, 4, ..., M-1, M+1, ..., N, then the M-th voice packet AP(M) has been lost to packet loss.
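Finding the lost packets from such a sequence amounts to listing the numbers that are skipped. The following sketch assumes packets arrive in sending order and that numbering starts at 1; the function name `missing_pids` is hypothetical, and late or duplicate packets are simply ignored.

```python
from typing import Iterable, List


def missing_pids(received: Iterable[int]) -> List[int]:
    """List the packet numbers skipped in an in-order stream that starts at pID 1."""
    missing: List[int] = []
    expected = 1
    for pid in received:
        while expected < pid:        # every number skipped before pid was lost
            missing.append(expected)
            expected += 1
        expected = max(expected, pid + 1)  # duplicates or late packets do not rewind
    return missing


print(missing_pids([1, 2, 3, 5, 6]))
```

A packet lost at the very end of a stream cannot be detected this way, because a gap only becomes visible once a later packet arrives.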

[0053] Such reception state data SD (the received packet numbers pID) is converted into reception state information SI by the processing of the reception state information creation unit 33, as shown in FIG. 4.

[0054] The flowchart of FIG. 4 consists of steps S10 to S17. In FIG. 4, the number of packets lost per 100 packets (that is, the packet loss rate) is calculated and output as the reception state information SI.

[0055] In FIG. 4, when the processing of the reception state information creation unit 33 starts, the variables Num, Loss, and Last_pID are all initialized to "0" (S10). Here, the variable (packet counter) Num indicates the number of voice packets AP received; the variable (packet loss counter) Loss indicates the number of voice packets AP lost to packet loss; and the variable Last_pID indicates the pID carried by the voice packet AP(X-1), received immediately before the voice packet AP(X) that is about to be processed with the flowchart of FIG. 4.

[0056] In hardware terms, a variable means an area of memory reserved for that purpose. The same applies below.

[0057] Next, the function Get_pID obtains the pID value of the voice packet AP(X) supplied from the voice packet receiving unit 30 as reception state data SD, and assigns that value to the variable pID (S11). The function Get_pID performs the same processing each time a new packet number pID is supplied from the voice packet receiving unit 30.

[0058] In step S12, the value of the variable Num is checked against "100" (in decimal notation); if they are equal, processing branches to the Yes side, and if not, to the No side.

[0059] Initially, "0" has been assigned to the variable Num, so the No side of this branch is selected. Even in the earliest case, step S12 branches to the Yes side only when the 101st voice packet AP(101) is received. If packets are lost, this figure of 101 grows by the number of lost packets, and the Yes branch of step S12 is selected correspondingly later.

[0060] When the voice packet AP(X), the X-th packet (1 ≤ X ≤ 101, where X is a natural number), is processed, the value of the variable Num is X-1 provided no packet has been lost up to that point.

[0061] When step S12 branches to the No side, the processing of step S13 is performed.

[0062] In step S13, the value of the variable pID for the voice packet AP(X) is checked against the value of Last_pID plus "1". In other words, the step checks whether the pIDs supplied from the voice packet receiving unit 30 as reception state data SD increase one at a time with no gaps. If there is no gap, step S13 branches to the Yes side and processing proceeds to step S17; if there is a gap, it branches to the No side and proceeds to step S16.

[0063] For example, because the initial value of Last_pID is "0", if the first voice packet AP(1) transmitted from the voice transmitting terminal 11 is not lost in the network 12 and is the first packet received by the voice receiving terminal 13, the pID "1" of the voice packet AP(1) equals the value of Last_pID + 1.

[0064] When the X-th voice packet AP(X) is processed, the Yes side of step S13 is selected only if neither the voice packet AP(X) nor AP(X-1) has been lost to packet loss.

[0065] In step S17, which follows step S13, Last_pID++ and Num++ are executed; that is, the variables Last_pID and Num are incremented. After step S17, step S11 is executed.

[0066] In step S16, which also follows step S13, the variable Loss is incremented in addition to the variables Last_pID and Num, and processing returns to step S12.

[0067] When step S12 branches to the Yes side, Send(Loss) is executed in step S14. That is, the transmission function Send() outputs the value of the variable Loss to the reception state information transmitting unit 34 as the reception state information SI.

[0068] After step S14, in step S15 the variables Loss and Num are reset to the initial value "0", and processing proceeds to step S13. The variable Last_pID is not reinitialized at this point, so the relationship between the variables pID and Last_pID in step S13 carries over unchanged. In other words, consecutive numbers of 101 and above can be used as the packet number pID.

[0069] As long as normal communication without packet loss is maintained between the voice transmitting terminal 11 and the voice receiving terminal 13, the loop formed by steps S11, S12, S13, and S17 repeats, and the processing of steps S14 and S15 is performed at every 101st iteration of the loop (every 100-packet period). In this case the number of lost packets is "0", so the reception state information SI output by the function Send(Loss) is "0".

[0070] If packets are lost, however, the loop formed by steps S12, S13, and S16 repeats once for each lost packet.
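The flow of steps S10 to S17 described above can be sketched in Python as follows. The step labels in the comments follow the description of FIG. 4; the function name `loss_reports`, the returned list standing in for the calls to Send(Loss), and the guard that skips duplicate or out-of-order packet numbers are assumptions added for illustration and are not part of the described flowchart.

```python
from typing import Iterable, List


def loss_reports(received_pids: Iterable[int], window: int = 100) -> List[int]:
    """Return the Loss value passed to Send() each time a window of packets completes."""
    num = 0        # S10: packets accounted for in the current window
    loss = 0       # S10: packets found missing in the current window
    last_pid = 0   # S10: pID of the most recently accounted-for packet
    reports: List[int] = []

    for pid in received_pids:          # S11: Get_pID
        if pid <= last_pid:
            # Assumption (not in the source): ignore duplicates and late,
            # out-of-order packets so the gap loop below always terminates.
            continue
        while True:
            if num == window:          # S12: has the window reached 100?
                reports.append(loss)   # S14: Send(Loss)
                num = 0                # S15: reset Num and Loss, keep Last_pID
                loss = 0
            if pid == last_pid + 1:    # S13: no gap before this packet
                last_pid += 1          # S17
                num += 1
                break                  # back to S11 for the next packet
            last_pid += 1              # S16: one packet is missing
            num += 1
            loss += 1                  # then back to S12


    return reports


# Packets 1..201 are sent; packets 50 and 150 never arrive.
received = [p for p in range(1, 202) if p not in (50, 150)]
print(loss_reports(received))
```

In this example each 100-packet window contains exactly one missing number, so the function reports a loss of one for each of the two completed windows. A report is only produced when the first packet of the following window arrives, which matches the description that the Yes branch of S12 is first taken on packet 101.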

[0071] The reception state information transmitting unit 34, having received the value of the variable Loss as reception state information SI in step S14, places that value in a packet RP and sends it to the network 12 addressed to the voice transmitting terminal 11.

[0072] Inside the voice transmitting terminal 11, which receives the packet RP from the network 12, the reception state information receiving unit 23 extracts the value of the variable Loss from the packet RP as reception state information SI and outputs it to the voice decision level calculation unit 24.

[0073] The flowchart of FIG. 5 shows the processing by which the voice decision level calculation unit 24 sets Thresh in the voice encoding unit 21 as the voice decision level LTH, based on the value of the variable Loss.

[0074] The flowchart of FIG. 5 consists of steps S20 to S27.

[0075] In FIG. 5, when the processing of the voice decision level calculation unit 24 starts (S20), the function Get_Loss obtains the value of the variable Loss supplied from the reception state information receiving unit 23 as reception state information SI and assigns it to the variable Loss on the voice decision level calculation unit 24 side (S21). The function Get_Loss repeats the same operation each time new reception state information SI is supplied.

[0076] In the flowchart of FIG. 5, the variable Loss refers to this variable Loss on the voiced determination level calculation unit 24 side. Both this variable and the variable Loss on the reception state information creation unit 33 side hold a packet loss count. However, the variable Loss on the reception state information creation unit 33 side is a packet loss counter that is incremented, and whose value can therefore change, during a single 100-packet period, whereas the value of this variable Loss remains constant throughout a single 100-packet period.

[0077] Next, in step S22, the function Get_Thresh obtains the value of Thresh as the voiced determination level LTH that the voice encoding unit 21 used in the immediately preceding 100-packet period (this may instead be, for example, the voiced determination level LTH that the voice encoding unit 21 is currently using) and assigns it to the variable Thresh0. The function Get_Thresh likewise repeats the same operation each time new reception state information SI is supplied.

[0078] Here, the variable Thresh0 stores the value of the voiced determination level LTH that the voice encoding unit 21 used in the immediately preceding 100-packet period.

[0079] The value of Thresh is the value against which the voiced determination unit 21A compares the variance of one frame of audio data AD. The larger this value, the higher the voiced determination level LTH and the more likely the voiced determination unit 21A is to judge the audio data AD to be a silent frame. Conversely, the smaller this value, the lower the voiced determination level LTH and the more likely the audio data AD is to be judged a voiced frame.
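
A minimal sketch of the comparison described above, assuming a frame is a sequence of PCM sample values. The function name `is_voiced_frame` and the strict "greater than" test are assumptions; the text states only that the frame variance is compared with Thresh.

```python
from statistics import pvariance

def is_voiced_frame(frame, thresh):
    """Judge one frame of audio data AD as voiced when its variance exceeds Thresh.

    The strict '>' comparison is an assumption; the source only says the
    variance of the frame is compared with Thresh.
    """
    return pvariance(frame) > thresh

quiet = [1, -1, 1, -1]          # population variance 1
loud = [100, -100, 100, -100]   # population variance 10000
```

With these hypothetical frames, raising `thresh` from 0.5 to 50 turns `quiet` from a voiced frame into a silent frame while `loud` stays voiced, which is the tendency the paragraph describes.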

[0080] In step S23, which follows step S22, the value of the variable Loss is checked against "0". If it is "0", processing branches to the Yes side and proceeds to step S24; if it is not "0", processing branches to the No side and proceeds to step S27.

[0081] In step S24, the value obtained by subtracting a predetermined step width α (α > 0) from the value of the variable Thresh0 is assigned to the variable Thresh. The variable Thresh basically stores the voiced determination level LTH that the voice encoding unit 21 will use in the next 100-packet period.

[0082] If the packet loss count during the immediately preceding 100-packet period is "0", this processing treats the current bit rate as low enough to pose no traffic problem, and steers the control toward raising the bit rate and reducing coding distortion.

[0083] In step S25, executed after step S24, the function Limitation confines the value of the variable Thresh to a valid range bounded above by a predetermined maximum value and below by a predetermined minimum value. That is, if the value of the variable Thresh lies within the valid range, it is used as is; if it is larger than the valid range, the maximum value defined by the function Limitation becomes the value of Thresh; and if it is smaller than the valid range, the minimum value defined by the function Limitation becomes the value of Thresh.

[0084] Next, in step S26, the function SetEnc sets the value of the variable Thresh as the voiced determination level LTH of the voice encoding unit 21, and processing returns to step S21.

[0085] In step S27, which is executed when the No branch is taken in step S23, the value obtained by adding Loss*β to the value of the variable Thresh0 is assigned to the variable Thresh. Here "*" is the multiplication operator, and Loss*β is the value of the variable Loss multiplied by β (β > 0).

[0086] If the packet loss count during the immediately preceding 100-packet period is not "0", this processing treats the current bit rate as too high and the traffic problem as serious, and steers the control toward lowering the bit rate even at the cost of increased coding distortion.

[0087] The value of β may be equal to or different from α.

[0088] Steps S25 and S26 are also executed after step S27.
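
Steps S23 to S27 described above can be sketched as one function. This is a minimal illustration rather than the patented implementation: the name `next_thresh` and the parameters `thresh_min` and `thresh_max` (standing in for the limits applied by the function Limitation) are assumptions, as are the numeric values in the usage lines.

```python
def next_thresh(thresh0, loss, alpha, beta, thresh_min, thresh_max):
    """Compute Thresh for the next 100-packet period from Thresh0 and Loss."""
    if loss == 0:
        thresh = thresh0 - alpha          # S24: no loss, lower the level
    else:
        thresh = thresh0 + loss * beta    # S27: loss seen, raise the level
    # S25: confine Thresh to the valid range (the role of Limitation)
    return min(max(thresh, thresh_min), thresh_max)

# Hypothetical values: alpha = 1.0, beta = 0.5, valid range 0.0 to 20.0
no_loss = next_thresh(10.0, 0, 1.0, 0.5, 0.0, 20.0)     # 9.0
three_lost = next_thresh(10.0, 3, 1.0, 0.5, 0.0, 20.0)  # 11.5
clamped = next_thresh(19.0, 10, 1.0, 0.5, 0.0, 20.0)    # 20.0
```

The returned value corresponds to what step S26 would hand to the voice encoding unit 21 through SetEnc.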

[0089] The voice encoding unit 21 then performs voiced determination for the next 100-packet period based on the value of the variable Thresh set as the voiced determination level LTH in step S26.

[0090] Repeating the above procedure allows voice communication to proceed with a voiced determination level LTH that reflects the reception state of the preceding 100 packets.

[0091] In FIG. 5, when the traffic on the network 12 is stable, the Yes and No branches of step S23 are taken with roughly equal frequency, and the voiced determination level LTH keeps fluctuating with a small amplitude around a constant (optimum) value.

[0092] When the traffic on the network 12 changes, the Yes and No branches of step S23 are taken with correspondingly unequal frequency, so that the voiced determination level LTH follows the change in the optimum value.

[0093] In short, the present embodiment calculates the reception state of the voice packets AP, raises the voiced determination level LTH when the reception state is poor, and lowers it when the reception state has margin to spare.

[0094] As a result, when the traffic on the network 12 is congested, the average bit rate of the voice packets AP is lowered, suppressing the sound quality degradation caused by packet loss and increased jitter. When the traffic has margin to spare, the sound quality degradation caused by silence compression is suppressed. The average bit rate thus adapts automatically to the congestion of the network.

[0095] The flowchart of FIG. 4 (in particular, the loop composed of steps S11, S12, S13, and S17, or the loop composed of steps S11, S12, S13, and S16) is normally repeated 100 times in a 100-packet period. In the flowchart of FIG. 5, by contrast, the loop composed of steps S21, S22, S23, S24, S25, and S26, or the loop composed of steps S21, S22, S23, S27, S25, and S26, need only be executed once, either within a 100-packet period or in the interval between one 100-packet period and the next.

[0096] (A-3) Effects of the First Embodiment
As described above, according to the present embodiment, setting an average bit rate suited to the congestion of the network (12) prevents the packet loss caused by a bit rate that is too high and also reduces, as far as possible, the sound quality degradation due to the coding distortion caused by a bit rate that is too low. Sound quality degradation is thereby kept to a minimum, and high-quality, reliable voice communication can be achieved.

[0097] In a communication state where packet loss occurs, jitter is also likely to increase along with the packet loss. By preventing packet loss, the present embodiment can also reduce the sound quality degradation caused by that jitter.

[0098] (B) Second Embodiment
Only the points in which the present embodiment differs from the first embodiment are described below.

[0099] The present embodiment applies a silence compression method in which the voiced determination level LTH changes dynamically inside the voice encoding unit according to the content of the audio data AD.

[0100] The present embodiment is characterized by the portions related to the variable Scale calculated by the voiced determination level calculation unit.

[0101] The present embodiment can also implement the voice coding method described in Document 1 cited above.

[0102] (B-1) Configuration and Operation of the Second Embodiment
The overall configuration of the communication system 40 of the present embodiment is the same as that of the communication system 10 of the first embodiment shown in FIG. 1. Accordingly, the units bearing the same reference numerals 20, 22, 23, and 30 to 34 as in the first embodiment have the same functions as in the first embodiment.

[0103] The voiced determination level calculation unit 44 of the present embodiment, however, differs from the voiced determination level calculation unit 24: it receives the reception state information SI (Loss) and calculates a voiced determination scale Scale rather than the voiced determination level LTH (Thresh). The processing by which the voiced determination level calculation unit 44 obtains the voiced determination scale Scale from the reception state information SI is shown in FIG. 7, described later.

[0104] The internal configuration of the voice encoding unit 41 also differs from that of the voice encoding unit 21 of the first embodiment, as shown in FIG. 6. The voice encoding unit 41 of the present embodiment performs encoding by the ACELP (Algebraic Code Excited Linear Prediction) method, incorporating voiced determination by the VAD method.

[0105] The principle of ACELP incorporating VAD is as described in detail in Document 1. In the ordinary VAD method described in Document 1, the threshold is the voiced determination level Thresh(Nlev), which is based on the average residual power Err obtained by LPC (Linear Predictive Coding) analysis and on the noise level Nlev. In the present embodiment, by contrast, the threshold SC is Scale*Thresh(Nlev), that is, this value of Thresh multiplied by the value of Scale (initial value 1.00) read from the voiced determination Scale storage unit 45.

[0106] Consequently, the method of Document 1 cannot reflect the packet loss rate (the reception state SI) in the threshold, whereas the present embodiment can.

[0107] Compared with the first embodiment, the difference is that Thresh (the voiced determination level LTH) in the first embodiment is generated by the flowchart of FIG. 5, whereas Thresh in the present embodiment is Thresh(Nlev), generated from the average residual power Err and the noise level Nlev.

[0108] In FIG. 6, the voice encoding unit 41 includes a voiced frame encoding unit 41B, a silent frame encoding unit 41C, a voice analysis unit 42, a voiced determination unit 43, and a voiced determination Scale storage unit 45.

[0109] Of these, the voiced frame encoding unit 41B corresponds to the voiced frame encoding unit 21B, the silent frame encoding unit 41C corresponds to the silent frame encoding unit 21C, and the voiced determination unit 43 corresponds to the voiced determination unit 21A.

[0110] The voiced determination unit 43 is the same as the voiced determination unit 21A in that it performs voiced determination. It differs, however, in various respects described later; for example, the function of supplying the audio data AD to the voiced frame encoding unit 41B or the silent frame encoding unit 41C belongs to the voice analysis unit 42.

[0111] The voice analysis unit 42, which receives one frame of audio data AD from the voice input unit 20, performs voice analysis on that audio data AD and supplies the analysis result to the voiced determination unit 43.

[0112] In the present embodiment, LPC analysis is performed as this voice analysis. The analysis result, together with auxiliary information that can be extracted from the audio data AD (for example, power calculation results and the zero-crossing count), is supplied to the voiced determination unit 43 as basic information BI used to generate Thresh(Nlev).

[0113] The voiced determination unit 43 has a function of generating Thresh(Nlev) based on the basic information BI.

[0114] Besides being used to generate Thresh(Nlev), the basic information BI is also used as the object of the voiced determination.

[0115] The voiced determination Scale storage unit 45 receives the voiced determination scale Scale from the voiced determination level calculation unit 44, stores it, and supplies it to the voiced determination unit 43.

[0116] The voiced determination unit 43 multiplies Thresh(Nlev) by this voiced determination scale Scale to obtain the determination threshold SC (= Scale*Thresh(Nlev)). This determination threshold SC serves as the voiced determination level used for voiced determination in the voiced determination unit 43.

[0117] When the determination result JA of the voiced determination by the voiced determination unit 43 is supplied to the voice analysis unit 42, the voice analysis unit 42 supplies the one frame of audio data AD to either the voiced frame encoding unit 41B or the silent frame encoding unit 41C according to the determination result JA.
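
The threshold scaling and frame routing described above might look like the following. The quantity actually compared with SC is derived from the basic information BI and is not spelled out in this passage, so the scalar `frame_measure`, the function names, and the returned unit labels are assumptions for illustration.

```python
def determination_threshold(scale, thresh_nlev):
    """SC = Scale * Thresh(Nlev), as computed by the voiced determination unit 43."""
    return scale * thresh_nlev

def route_frame(frame_measure, scale, thresh_nlev):
    """Return the label of the encoding unit that should receive the frame.

    frame_measure is a hypothetical scalar derived from BI; comparing it
    with SC using '>' is an assumption.
    """
    sc = determination_threshold(scale, thresh_nlev)
    return "41B" if frame_measure > sc else "41C"  # voiced vs. silent encoder
```

For example, a frame with `frame_measure` 5.0 and Thresh(Nlev) 4.0 goes to unit 41B while Scale is 1.0, but to unit 41C once Scale has risen to 2.0.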

[0118] Next, the processing of the voiced determination level calculation unit 44, which supplies the voiced determination scale Scale to the voiced determination Scale storage unit 45, is described. As shown in the flowchart of FIG. 7, this processing converts the reception state information SI (Loss) into the voiced determination scale Scale.

[0119] The flowchart of FIG. 7 consists of steps S30 to S35.

[0120] In FIG. 7, when processing starts (S30), the real number 1.00 is stored in the real-valued variable Scale as its initial value.

[0121] Next, the function Get_Loss obtains the value of the variable Loss supplied from the reception state information receiving unit 23 as the reception state information SI and assigns it to the variable Loss on the voiced determination level calculation unit 44 side (S31).

[0122] Step S31 is exactly the same processing as step S21. In step S31 as well, therefore, the function Get_Loss repeats the same operation each time new reception state information SI is supplied.

[0123] If the value stored in the variable Loss is "0", processing proceeds to step S33; if it is not "0", processing proceeds to step S35.

[0124] In step S33, the value obtained by multiplying the previous value of the variable Scale (the initial value 1.00 on the first pass) by α, namely Scale*α, is assigned to the variable Scale. Here α is a real constant satisfying 0 < α < 1.

[0125] In step S34, which follows step S33, the function SetEnc sets the value of the variable Scale as the determination threshold SC of the voiced determination unit 43, and processing returns to step S31.

[0126] In step S35, on the other hand, the result of multiplying the previous value of the variable Scale by the value of the variable Loss and the value of the constant β is assigned to the variable Scale. Here the constant β is a real number (or integer) satisfying 1 < β.

[0127] Step S35 is followed by step S34, in which the function SetEnc supplies the value of the variable Scale to the voiced determination unit 43, and processing returns to step S31.

[0128] According to the flowchart of FIG. 7, as long as packet loss remains absent, the constant α is multiplied in repeatedly, so the value of the variable Scale falls and the determination threshold SC falls with it. Conversely, as long as packet loss persists, the constant β is multiplied in repeatedly and the determination threshold SC rises.
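
The update performed in steps S33 and S35 can be sketched as follows. The function name `next_scale` and the sample constants are assumptions; the source requires only 0 < α < 1 and 1 < β.

```python
def next_scale(scale, loss, alpha, beta):
    """Return the new value of Scale after one report of Loss."""
    if loss == 0:
        return scale * alpha        # S33: shrink Scale, lowering SC
    return scale * loss * beta      # S35: grow Scale, raising SC

scale = 1.0                          # initial value set at S30
for loss in (0, 0, 0):               # three loss-free periods in a row
    scale = next_scale(scale, loss, alpha=0.5, beta=2.0)
# scale is now 0.5 ** 3 == 0.125
after_loss = next_scale(scale, 2, alpha=0.5, beta=2.0)  # 0.125 * 2 * 2.0 == 0.5
```

Because both branches multiply, Scale changes geometrically, in contrast to the additive steps of the first embodiment.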

[0129] When the traffic on the network 12 is stable and the content of the audio data AD is also stable, the Yes and No branches of step S32 are taken with roughly equal frequency, and the determination threshold SC keeps fluctuating with a small amplitude around a constant (optimum) value.

[0130] When the traffic on the network 12 or the content of the audio data AD changes, the Yes and No branches of step S32 are taken with correspondingly unequal frequency, so that the determination threshold SC follows the change in the optimum value.

[0131] In the first embodiment, the voiced determination level LTH changes only when the traffic on the network 12 fluctuates and the reception state information SI changes. In the present embodiment, the determination threshold SC can change in response to a change in the content of the audio data AD even when the traffic on the network 12 is nearly constant.

[0132] (B-2) Effects of the Second Embodiment
The present embodiment provides effects equivalent to those of the first embodiment.

[0133] In addition, because the determination threshold (SC) in the present embodiment also varies dynamically with the content of the audio data (AD) supplied from the voice input unit (20), more appropriate and precise bit rate control is possible.

[0134] (C) Other Embodiments
The first and second embodiments describe voice packets AP being sent and received between two terminals 11 and 13. As shown in FIG. 8, however, the system may instead connect many terminals 51 to 56 to the same network 12 and exchange voice packets between multiple pairs of terminals. In the case of FIG. 8, terminals that communicate with each other, like the terminals 11 and 13, must support the present invention, but terminals that do not support the present invention may coexist in the system.

[0135] The first and second embodiments present a PC (personal computer) equipped with IP network interface means and sound input/output means as an implementation of the voice transmitting terminal 11 and the voice receiving terminal 13. Each of these PCs can be replaced by a combination of multiple devices, as shown in FIG. 9.

[0136] For example, each PC may be replaced by a combination of a telephone 64, a PBX (private branch exchange) 63, and a telephone-packet conversion device 62, or by a combination of an IP-capable telephone-packet conversion device 60 and a telephone 61. As the conversion device, for example, the BS-1200 made by Oki Electric, a VoIP (voice over IP) gateway device, can be used.

[0137] The first and second embodiments use the packet number pID as the continuity information, but time stamp information, for example, can be used in its place. Time stamp information indicates the time at which a voice packet AP was transmitted or the time at which it was generated.

[0138] The processing when time stamps are used as the continuity information is outlined below with reference to FIG. 1.

[0139] First, the continuity information adding unit 22A in the voice packet transmitting unit 22 adds a time stamp to each voice packet AP when sending it out.

[0140] Next, the voice packet receiving unit 30 compares the time stamp of each received voice packet AP with the current time and calculates the time taken to transfer that voice packet AP (the packet transfer time).

[0141] The reception state information creation unit 33 then obtains the packet fluctuation time (jitter) by comparing the packet transfer time of each voice packet AP with a predetermined average packet transfer time, calculates the jitter variance every 100 packets, and uses it as the reception state information SI.
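
A minimal sketch of this calculation, assuming the sender and receiver clocks are synchronized and that the "jitter variance" is the mean squared deviation of the transfer time from the predetermined average. Both points are assumptions, as are the function and parameter names.

```python
def jitter_variance(send_times, recv_times, mean_transfer_time):
    """Mean squared jitter over one block of packets (100 in the text).

    send_times: time stamps carried in the voice packets AP
    recv_times: arrival times of the same packets at the receiver
    mean_transfer_time: predetermined average packet transfer time
    """
    jitters = [(recv - send) - mean_transfer_time
               for send, recv in zip(send_times, recv_times)]
    return sum(j * j for j in jitters) / len(jitters)

# Four packets sent every 20 ms; transfer times are 50, 50, 55, 45 ms
si = jitter_variance([0, 20, 40, 60], [50, 70, 95, 105], 50)  # 12.5
```

If the clocks are not synchronized, a constant offset appears in every transfer time and would have to be absorbed into `mean_transfer_time`.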

[0142] Any method may be used to attach continuity information such as the packet number pID or the time stamp information to the voice packets AP; an existing protocol such as RTP (Real-time Transport Protocol) may be used.

[0143] The interval at which the reception state information is sent, that is, the transmission interval of the packet RP, is arbitrary. Care is needed, however, because sending it too frequently makes the transmission of the packets RP itself a load on the traffic of the network (12).

[0144] The encoding methods described above are 16-bit and 4-bit PCM, but any combination of methods may be used as long as the amount of encoded voice data per frame is smaller for a silent frame than for a voiced frame. For example, the silence compression scheme for G.729 given in ITU-T Recommendation G.729 Annex B may be used.

[0145] The first and second embodiments show methods of dynamically controlling the voiced determination level LTH, but the control may instead be binary, switching the silence compression function ON and OFF. In that case, the present invention can be implemented without modifying a voice encoding unit that already has such an ON/OFF switch.
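
One possible reading of this binary control, sketched under the assumption that silence compression is switched ON whenever the last report shows packet loss and OFF otherwise. The switching rule and the function name are assumptions; the text does not state when the switch should change.

```python
def silence_compression_on(loss: int) -> bool:
    """Return True to enable silence compression for the next period.

    Assumed rule: any reported packet loss turns compression ON so that
    the average bit rate drops; a loss-free report turns it OFF.
    """
    return loss > 0
```

The result would simply drive the existing ON/OFF switch of the encoder, which is why no change to the encoder itself is needed.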

[0146] In the first and second embodiments, with respect to the voice packets AP, the voice transmitting terminal 11 has only a transmitting function and the voice receiving terminal 13 has only a receiving function. Both terminals 11 and 13 may instead be equipped with both the transmitting and receiving functions for voice packets so that voice packet communication is possible in both directions.

[0147] In that case, the reception state information SI need not be carried in a separate packet RP; it can instead be carried in part of a voice packet sent from the terminal 13 to the terminal 11.

[0148] Also in this case, as shown in FIG. 10, a voiced determination level calculation unit 81 may be provided in the receiving terminal (transmitting/receiving terminal) 71.

[0149] In step S12 of FIG. 4, whether the value of the variable Num equals "100" is checked in order to examine the packet loss count per 100 voice packets AP. This "100" can be changed to any other constant so that the packet loss count is examined per any desired number of packets.

[0150] As the reception status information, a feature quantity other than the loss rate (for example, the degree of jitter or of delay) may be examined in place of the reception state information SI. In other words, any variable may be used as the reception state information as long as it represents a reception state of the voice packets AP that is related to the traffic volume on the network (12).

[0151] The flowchart of FIG. 5 uses the constants α and β to change the voiced determination level LTH, but any other method may be used as long as it can raise the voiced determination level LTH when the packet reception state is judged to be poor and lower it when the packet reception state is judged to have margin to spare.

[0152] For example, the reception states of several past periods may be stored and reflected in the change to the voiced determination level. As one example, the voiced determination level LTH may be raised by α only once Loss = 0 has held for 10 consecutive periods.
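
The idea of acting only after several consecutive loss-free periods can be sketched with a small counter. The class name `ZeroLossStreak`, restarting the count after each trigger, and leaving the actual level change to the caller are all assumptions made for illustration.

```python
class ZeroLossStreak:
    """Signal when Loss has been 0 for `required` consecutive periods."""

    def __init__(self, required: int = 10):
        self.required = required
        self.count = 0

    def update(self, loss: int) -> bool:
        """Record one period; return True when the streak reaches `required`."""
        if loss != 0:
            self.count = 0           # any loss breaks the streak
            return False
        self.count += 1
        if self.count >= self.required:
            self.count = 0           # assumed: start a new streak after triggering
            return True
        return False

streak = ZeroLossStreak(required=3)
signals = [streak.update(loss) for loss in (0, 0, 1, 0, 0, 0, 0)]
```

In this run only the sixth period triggers, because the loss in the third period resets the count.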

[0153] In the flowchart of FIG. 5, one of two loops is executed depending on whether the value of the variable Loss is "0". However, even when the value is not "0", the variable Loss can range from 1 to 100, so different loops may be executed, for example, for the case of "1" and the case of "10".
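One way to picture such a finer-grained branch is the Python sketch below. The band boundaries (0, 1 to 9, 10 or more) and the doubling of the step for heavy loss are hypothetical choices made for this example; the source only states that different processing may be used for different non-zero values of Loss.

```python
def level_step(loss, alpha, beta):
    """Return the signed change to apply to LTH for a given Loss value.

    Bands are illustrative assumptions:
      Loss == 0       -> lower LTH by beta
      1 <= Loss < 10  -> raise LTH by alpha
      Loss >= 10      -> raise LTH by 2 * alpha
    """
    if loss == 0:
        return -beta
    if loss < 10:
        return alpha
    return 2 * alpha
```

With α = 4 and β = 2, a report of Loss = 1 yields a step of +4, while Loss = 10 yields +8.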

[0154] Further, since the flowchart of FIG. 4 is repeated a very large number of times as described above, its processing may become a bottleneck in improving the operating speed of the entire system. Configuring the flowchart of FIG. 4 more efficiently therefore offers significant advantages, such as speeding up the entire system.

[0155] For example, depending on conditions such as the degree of congestion of the network (12), it may be more efficient to configure a flowchart that branches according to the value of the packet loss counter Loss, rather than according to the value of the packet counter Num as in the flowchart of FIG. 4. As one example, when the value of the packet loss counter Loss reaches 5, the packet loss rate at that time may be calculated and used as the reception state information SI.
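The loss-triggered variant can be sketched in Python as follows. As before, this is only an illustration: the function name, the flag-list input, the `loss_trigger` parameter, and the resetting of both counters after each report are assumptions, and the loss rate is taken to be Loss divided by Num at the moment the trigger is reached.

```python
def loss_rates_on_trigger(received_flags, loss_trigger=5):
    """Report a packet loss rate each time Loss reaches `loss_trigger`.

    received_flags: iterable of booleans, True if the packet arrived,
    False if it was judged lost (hypothetical input format).
    """
    rates = []
    num = 0   # packets examined since the last report (Num)
    loss = 0  # packets lost since the last report (Loss)
    for received in received_flags:
        num += 1
        if received:
            continue
        loss += 1
        if loss == loss_trigger:   # branch on Loss instead of Num
            rates.append(loss / num)
            num = 0
            loss = 0
    return rates
```

Under heavy congestion this reports sooner than a fixed 100-packet window, and when no packets are lost the branch is never taken.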

[0156] In the first and second embodiments, the IP network 12 supporting the IP protocol is used, but the network used in the present invention is not limited to an IP network.

[0157] The functions of the flowcharts of FIGS. 4, 5, and 7 may be described in any programming language, and may also be implemented in hardware using logic circuits.

[0158]

[Effects of the Invention] As described above, according to the present invention, degradation of sound quality can be kept to the minimum corresponding to the network traffic, and high-quality, highly reliable voice communication can be realized.

[Brief description of the drawings]

FIG. 1 is a schematic diagram showing the overall configuration of a communication system according to a first embodiment.

FIG. 2 is a schematic diagram showing the relationship between average bit rate and sound quality.

FIG. 3 is a block diagram showing a schematic configuration of the main part of the first embodiment.

FIG. 4 is a diagram explaining the operation of the first embodiment.

FIG. 5 is a diagram explaining the operation of the first embodiment.

FIG. 6 is a block diagram showing a schematic configuration of the main part of a second embodiment.

FIG. 7 is a diagram explaining the operation of the second embodiment.

FIG. 8 is a schematic diagram showing the overall configuration of a communication system according to another embodiment.

FIG. 9 is a schematic diagram showing the overall configuration of a communication system according to another embodiment.

FIG. 10 is a schematic diagram showing the overall configuration of a communication system according to another embodiment.

[Explanation of symbols]

10, 40: communication system; 11: voice transmitting terminal; 12: network; 13: voice receiving terminal; 20: voice input unit; 21, 41: voice encoding unit; 21A, 43: voiced determination unit; 21B, 41B: voiced frame encoding unit; 21C, 41C: silent frame encoding unit; 22: voice packet transmitting unit; 23: reception state information receiving unit; 24, 44: voiced determination level calculation unit; 30: voice packet receiving unit; 31: voice decoding unit; 32: voice output unit; 33: reception state information creation unit; 34: reception state information transmitting unit; 42: voice analysis unit; 45: voice determination Scale storage unit.

Claims

[Claims]

1. A voice packet receiving apparatus for receiving, via a predetermined network, voice packets transmitted from a voice packet transmitting apparatus, comprising: reception state detecting means for detecting a reception state of the voice packets that fluctuates according to traffic on the network; and reception state information transmitting means for sending, onto the network, reception state information corresponding to the reception state detected by the reception state detecting means, so as to deliver the reception state information to the voice packet transmitting apparatus.

2. A voice packet transmitting apparatus for transmitting voice packets to a voice packet receiving apparatus via a predetermined network, comprising: determining means for determining, in accordance with a determination level, whether a voice frame to be transmitted is voiced or silent; voice encoding means for encoding a silent frame, determined to be silent by the determining means, so that it has a smaller amount of information than a voiced frame determined to be voiced; reception state information receiving means for receiving, via the network, reception state information that the voice packet receiving apparatus transmits according to the reception state of the voice packets, which fluctuates according to traffic on the network; and determination level changing means for changing the determination level in accordance with the reception state information received by the reception state information receiving means.

3. A packet communication system comprising: the voice packet receiving apparatus according to claim 1; and the voice packet transmitting apparatus according to claim 2.