JP2007312265A

JP2007312265A - Voice packet communication system and speech reproducer

Info

Publication number: JP2007312265A
Application number: JP2006141154A
Authority: JP
Inventors: Ken Yoshii; 謙吉井
Original assignee: Konica Minolta Inc
Current assignee: Konica Minolta Inc
Priority date: 2006-05-22
Filing date: 2006-05-22
Publication date: 2007-11-29

Abstract

PROBLEM TO BE SOLVED: To provide a voice packet communication system in which high-quality voice can be reproduced even when the delay time of voice packets varies.

SOLUTION: In the voice packet communication system, a voice reproducer has a stored amount detecting means for detecting the amount of voice packets stored in a voice jitter buffer, an information amount calculating means for calculating the information amount of a voice packet, and an information amount determining means for determining whether or not the information amount calculated by the information amount calculating means is equal to or less than a predetermined value. Further, a reproduction control means changes, in accordance with the amount of voice packets detected by the stored amount detecting means, the reproduction time of a voice packet whose information amount the information amount determining means has determined to be equal to or less than the predetermined value.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a voice packet communication system and a voice reproduction device.

Conventionally, in a voice reproduction device of the packet communication type, received voice packets are loaded into a voice packet queue in order while their sequence numbers are checked. When voice frames are reproduced, the voice packets are taken out of the queue in order and inverse-transformed while the sequence number attached to each packet is checked, yielding a voice frame sequence in the correct order. However, because the arrival of voice packets deviates from the predetermined time interval owing to network delay and the like, the voice breaks up during reproduction.

To address this problem, a method has been proposed in which a substitute voice frame is prepared in advance and reproduced when the arrival of a voice packet is delayed beyond a predetermined time (see, for example, Patent Document 1).

Meanwhile, known voice communication techniques using packet communication include silence compression, which distinguishes voiced from unvoiced segments packet by packet and does not transfer the unvoiced segments in order to reduce network bandwidth, and the compression method of ITU-T Recommendation G.729 Annex B, which sends background noise only at a transition from voiced to unvoiced and when the background noise pattern changes during an unvoiced interval.

In addition, a method has been proposed in which, so that the network can distinguish three states (voiced, background noise in the unvoiced state, and unvoiced), the packet header carries identification information for the three states, and substitute sound such as background noise is reproduced for intervals in which no packet has arrived (see, for example, Patent Document 2).
Patent Document 1: JP 7-38608 A
Patent Document 2: JP 10-285212 A

In the method of Patent Document 1, because a substitute voice frame prepared in advance is reproduced, the reproduced sound can seem odd and unnatural to the listener.

Meanwhile, networks have increasingly moved to broadband in recent years, and there is little need to compress voice to the point of removing silent portions and background noise before transmission.

In the method of Patent Document 2, however, the transmitting side discriminates among voiced, background noise in the unvoiced state, and unvoiced, and heavily compresses the background noise and unvoiced states before transmission. Information that is important for conveying a sense of presence is therefore lost, and voice quality may be impaired.

The present invention has been made in view of the above problems, and its object is to provide a voice packet communication system capable of reproducing high-quality voice even when the delay time of voice packets varies.

1.
A voice packet communication system comprising:
a voice transmission device including a voice packet encoder that packetizes a voice signal into voice packets,
a reproduction information adding unit that adds reproduction information to the voice packets, and
a transmission unit that transmits the voice packets to which the reproduction information has been added; and
a voice reproduction device including a reception unit that receives the transmitted voice packets,
a voice jitter buffer that sequentially accumulates the voice packets received by the reception unit,
a voice packet decoder that decodes and reproduces the voice packets,
reproduction information detecting means for detecting the reproduction information, and
reproduction control means for instructing the voice packet decoder to reproduce based on the reproduction information detected by the reproduction information detecting means,
wherein the voice reproduction device has
accumulated amount detecting means for detecting the amount of voice packets accumulated in the voice jitter buffer,
information amount calculating means for calculating the information amount of a voice packet to be reproduced by the voice packet decoder, and
information amount determining means for determining the information amount calculated by the information amount calculating means, and
the reproduction control means changes, in accordance with the amount of voice packets detected by the accumulated amount detecting means, the reproduction time of a voice packet that the information amount determining means has determined to have an information amount equal to or less than a predetermined value.

2.
The voice packet communication system according to 1, wherein,
when the amount of voice packets detected by the accumulated amount detecting means is equal to or greater than a first value,
the reproduction control means does not reproduce voice packets that the information amount determining means has determined to have an information amount equal to or less than the predetermined value, and sequentially reproduces voice packets whose information amount is equal to or greater than the predetermined value.

3.
The voice packet communication system according to 1, wherein,
when the amount of voice packets detected by the accumulated amount detecting means is equal to or less than a second value,
the reproduction control means repeats reproduction of a voice packet that the information amount determining means has determined to have an information amount equal to or less than the predetermined value, the repetition depending on the amount of voice packets, and then sequentially reproduces the following voice packets.

4.
The voice packet communication system according to any one of 1 to 3, wherein the reproduction control means
calculates the amount of variation in the amount of voice packets detected by the accumulated amount detecting means,
decreases the first value and the second value when the amount of variation is less than a predetermined value, and
increases the first value and the second value when the amount of variation is equal to or greater than the predetermined value.

5.
The voice packet communication system according to any one of 1 to 4, wherein the information amount calculating means calculates the information amount based on an average value of the voice signal.

6.
The voice packet communication system according to any one of 1 to 4, wherein the information amount calculating means calculates each frequency component of the voice signal and calculates the information amount based on the difference between the arithmetic mean of the frequency-component values of the voice packets calculated up to the previous time and the frequency-component values calculated this time.

7.
A voice reproduction device comprising:
a reception unit that receives voice packets to which reproduction information has been added,
a voice jitter buffer that sequentially accumulates the voice packets received by the reception unit,
a voice packet decoder that decodes and reproduces the voice packets,
reproduction information detecting means for detecting the reproduction information, and
reproduction control means for instructing the voice packet decoder to reproduce based on the reproduction information detected by the reproduction information detecting means,
wherein the voice reproduction device has
accumulated amount detecting means for detecting the amount of voice packets accumulated in the voice jitter buffer,
information amount calculating means for calculating the information amount of a voice packet, and
information amount determining means for determining the information amount calculated by the information amount calculating means, and
the reproduction control means changes, in accordance with the amount of voice packets detected by the accumulated amount detecting means, the reproduction time of a voice packet that the information amount determining means has determined to have an information amount equal to or less than a predetermined value.

8.
The voice reproduction device according to 7, wherein,
when the amount of voice packets detected by the accumulated amount detecting means is equal to or greater than a first value,
the reproduction control means does not reproduce voice packets that the information amount determining means has determined to have an information amount equal to or less than the predetermined value, and sequentially reproduces voice packets whose information amount is equal to or greater than the predetermined value.

9.
The voice reproduction device according to 7, wherein,
when the amount of voice packets detected by the accumulated amount detecting means is equal to or less than a second value,
the reproduction control means repeats reproduction of a voice packet that the information amount determining means has determined to have an information amount equal to or less than the predetermined value, the repetition depending on the amount of voice packets, and then sequentially reproduces the following voice packets.

10.
The voice reproduction device according to any one of 7 to 9, wherein the reproduction control means
calculates the amount of variation in the amount of voice packets detected by the accumulated amount detecting means,
decreases the first value and the second value when the amount of variation is less than a predetermined value, and
increases the first value and the second value when the amount of variation is equal to or greater than the predetermined value.

11.
The voice reproduction device according to any one of 7 to 10, wherein the information amount calculating means calculates the information amount based on an average value of the voice signal.

12.
The voice reproduction device according to any one of 7 to 10, wherein the information amount calculating means calculates each frequency component of the voice signal and calculates the information amount based on the difference between the arithmetic mean of the frequency-component values of the voice packets calculated up to the previous time and the frequency-component values calculated this time.

According to the present invention, a voice packet communication system capable of reproducing high-quality voice even when the delay time of voice packets varies can be provided.

Hereinafter, the present invention will be described in detail with reference to embodiments, but the present invention is not limited thereto.

FIG. 1 is a block diagram showing an example of a voice packet communication system according to the present invention.

FIG. 1(a) is a block diagram showing the overall configuration, and FIG. 1(b) is a detailed block diagram of the main microcomputer unit 713.

The voice transmission device 701 and the voice reproduction device 702 in FIG. 1(a) are connected to a transmission line 717. The voice transmission device 701 encodes input voice into voice packets and transmits them to the transmission line 717; the voice reproduction device 702 receives the voice packets via the transmission line 717 and reproduces the voice.

The voice transmission device 701 comprises a voice input unit 703, a voice packet encoder 704, a reproduction information adding unit 707, and a transmission unit 708.

A voice signal input from the voice input unit 703 is packetized by the voice packet encoder 704 into voice packets. The voice packets are then input to the reproduction information adding unit 707, which records reproduction-order information in each voice packet as reproduction information. The transmission unit 708 transmits the voice packets with the reproduction information added to the transmission line 717.

The transmission line 717 is a wired or wireless network and may be either the Internet or an intranet.

The voice reproduction device 702 comprises a voice output unit 709, a voice packet decoder unit 710, a main microcomputer unit 713, a reception unit 714, and a voice jitter buffer 716.

Voice packets received by the reception unit 714 via the transmission line 717 are stored sequentially in the voice jitter buffer 716. The voice jitter buffer 716 is a buffer memory provided to compensate for variation in the arrival times of voice packets received via the transmission line 717; received voice packets are held in it temporarily.

The voice packet decoder unit 710 decodes the voice packets, reproduces the decoded voice signal, and outputs it from the voice output unit 709.

Next, the main microcomputer unit 713 will be described with reference to FIG. 1(b).

The main microcomputer unit 713 is built around a microcomputer. Specifically, it includes a CPU (Central Processing Unit) 20 that performs various arithmetic operations, a RAM 21 that serves as a work area for those operations, and a ROM 22 that stores a control program and the like, and it centrally controls the operation of each processing unit of the voice reproduction device 702. An EEPROM, whose data can be electrically rewritten, is used as the nonvolatile ROM 22, for example.

The CPU 20 of this embodiment has a reproduction information detection unit 10, a reproduction control unit 11, an accumulation amount detection unit 12, an information amount calculation unit 13, and an information amount determination unit 14. These units correspond, respectively, to the reproduction information detecting means, reproduction control means, accumulated amount detecting means, information amount calculating means, and information amount determining means of the present invention.

The reproduction information detection unit 10 detects information on the reproduction order from the reproduction information of each voice packet.

The reproduction control unit 11 instructs the voice packet decoder unit 710 to decode and reproduce based on the reproduction information detected by the reproduction information detection unit 10.

The accumulation amount detection unit 12 detects the amount of voice packets accumulated in the voice jitter buffer 716. Because voice packets are accumulated in the voice jitter buffer 716 in order, the accumulation amount detection unit 12 can detect the accumulated amount from the address of the head voice packet in the voice jitter buffer 716.
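The address-based detection described above can be sketched in Python as follows. This is only an illustration under stated assumptions: packets are taken to have a fixed size and to be stored contiguously from a base address up to and including the head packet, and the function name `stored_fraction` and all of its parameters are hypothetical rather than taken from the embodiment.

```python
from typing import Optional


def stored_fraction(head_addr: Optional[int], base_addr: int,
                    packet_bytes: int, capacity_bytes: int) -> float:
    """Return the fraction of the jitter buffer that is occupied.

    Assumption: packets fill the buffer contiguously from base_addr, and
    head_addr is the start address of the head packet (the one reproduced
    next). head_addr is None when the buffer is empty.
    """
    if head_addr is None:
        return 0.0
    end_of_head = head_addr + packet_bytes
    if head_addr < base_addr or end_of_head > base_addr + capacity_bytes:
        raise ValueError("head packet lies outside the buffer")
    # Occupied region runs from the base address to the end of the head packet.
    return (end_of_head - base_addr) / capacity_bytes
```

With a 1600-byte buffer holding five 160-byte packets, `stored_fraction(0x1000 + 4 * 160, 0x1000, 160, 1600)` evaluates to 0.5.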

The information amount calculation unit 13 calculates the information amount of a voice packet, and the information amount determination unit 14 determines whether the calculated information amount is equal to or less than a predetermined value. The information amount is, for example, the level of the voice signal, and the predetermined value is a level at which the signal can be judged to be silent. When the voice signal level is this low, the information amount is also considered low, and the packet is considered to contain no important information.
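A minimal Python sketch of a level-based information amount is shown below. It assumes that the "level" is the mean absolute value of the decoded PCM samples, which is one plausible reading of the average-based calculation in item 5; the function names and the default `silence_level` of 100 are hypothetical and would need tuning to the use environment.

```python
from typing import Sequence


def information_amount(samples: Sequence[int]) -> float:
    """Mean absolute sample value, used here as a stand-in for signal level."""
    if not samples:
        return 0.0
    return sum(abs(s) for s in samples) / len(samples)


def is_low_information(samples: Sequence[int],
                       silence_level: float = 100.0) -> bool:
    """True when the level is at or below the (assumed) silence threshold."""
    return information_amount(samples) <= silence_level
```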

In this embodiment, as an example, the voice transmission device 701 packetizes voice in units of 20 msec. The following processing is described on the assumption that the reproduction information adding unit 707 assigns numbers to the voice packets in transmission order as the reproduction information and that the transmission unit 708 transmits the packets to the voice reproduction device 702. The reproduction information is not limited to the numbers described in this embodiment; any information that lets the reproducing side determine the reproduction order, such as time information, may be used.
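As a rough illustration of this packetization, the following Python sketch cuts a PCM sample stream into 20 msec frames and numbers them in transmission order. The 8 kHz sample rate, the `VoicePacket` type, and the `packetize` function are assumptions made for illustration; the embodiment does not specify a codec or a packet layout.

```python
from dataclasses import dataclass
from typing import Iterator, Sequence, Tuple


@dataclass(frozen=True)
class VoicePacket:
    seq: int                   # reproduction information: transmission-order number
    samples: Tuple[int, ...]   # up to 20 msec of PCM samples


def packetize(pcm: Sequence[int], sample_rate_hz: int = 8000,
              frame_ms: int = 20) -> Iterator[VoicePacket]:
    """Split a sample stream into frame_ms-long packets numbered from 0."""
    frame_len = sample_rate_hz * frame_ms // 1000  # 160 samples at 8 kHz
    for seq, start in enumerate(range(0, len(pcm), frame_len)):
        # The final packet may be shorter if the stream length is not a multiple.
        yield VoicePacket(seq=seq, samples=tuple(pcm[start:start + frame_len]))
```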

A specific example of the processing of the CPU 20 of this embodiment will now be described.

The reproduction control unit 11 does not start reproduction until a predetermined amount of received voice packets has accumulated in the voice jitter buffer 716. For example, it starts reproduction control once the amount of voice packets in the voice jitter buffer 716, as detected by the accumulation amount detection unit 12, reaches 50% of the buffer's storage capacity.

The reproduction control unit 11 controls reproduction so that voice packets are reproduced in the order given by the reproduction information detected by the reproduction information detection unit 10. Using the reproduction information of the voice packet being reproduced, it searches the voice jitter buffer 716 for the voice packet to be reproduced next and sends it to the voice packet decoder unit 710, which reproduces the next voice.
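The in-order selection can be sketched in Python as below. The sketch models the jitter buffer as a dictionary keyed by sequence number, which is an assumption for illustration only; `pop_oldest` is a hypothetical name, and sequence-number wraparound and lost packets are deliberately ignored.

```python
from typing import Dict, Optional, Tuple


def pop_oldest(buffer: Dict[int, bytes]) -> Optional[Tuple[int, bytes]]:
    """Remove and return the packet with the smallest sequence number.

    Returns None when the buffer is empty. Packets may have been inserted
    in any arrival order; reproduction order follows the sequence number.
    """
    if not buffer:
        return None
    seq = min(buffer)
    return seq, buffer.pop(seq)
```

For example, a buffer filled in arrival order 7, 5, 6 is drained in the order 5, 6, 7.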

The voice jitter buffer 716 may underflow or overflow because of network propagation delay or the data processing time of the voice reproduction device 702. This phenomenon will be described with reference to FIG. 2.

FIG. 2 is an explanatory diagram schematically illustrating underflow and overflow. In FIG. 2, reference numeral 716 schematically represents the state of the voice packets stored in the voice jitter buffer 716, and its outline represents the buffer's maximum memory capacity. The horizontal axis is the memory address, and voice packets are accumulated FIFO (First In First Out) in order from the memory address at the left end of FIG. 2. The hatched portion represents the part of the voice jitter buffer 716 in which voice packets are accumulated. Accordingly, packets stored farther toward the left arrow in FIG. 2 are "newer" and packets stored farther toward the right arrow are "older".

If voice packets run out in the voice jitter buffer 716 (underflow) or overflow it (overflow), voice reproduction is interrupted, so the amount of voice packets accumulated in the voice jitter buffer 716 must stay within a proper range in which neither can occur. If there is a risk of underflow when the amount is at or below a threshold U(X) and a risk of overflow when it is at or above a threshold O(X), the proper range is above U(X) and below O(X). The threshold O(X) is the first value of the present invention, and the threshold U(X) is the second value of the present invention.
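The proper range can be expressed as a simple classification, sketched here in Python. The names `BufferState` and `classify` are hypothetical, and the stored amount and both thresholds are assumed to be expressed as fractions of the buffer capacity.

```python
from enum import Enum


class BufferState(Enum):
    UNDERFLOW_RISK = "underflow risk"
    OK = "ok"
    OVERFLOW_RISK = "overflow risk"


def classify(stored: float, u_threshold: float, o_threshold: float) -> BufferState:
    """Classify the stored amount against U(X) (second value) and O(X) (first value)."""
    if u_threshold >= o_threshold:
        raise ValueError("U(X) must be smaller than O(X)")
    if stored <= u_threshold:
        return BufferState.UNDERFLOW_RISK
    if stored >= o_threshold:
        return BufferState.OVERFLOW_RISK
    return BufferState.OK  # U(X) < stored < O(X)
```

With illustrative thresholds U(X) = 0.10 and O(X) = 0.50, a stored amount of 0.30 is classified as `BufferState.OK`, while 0.10 and 0.50 fall on the underflow and overflow sides respectively.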

Such underflow and overflow are unavoidable because data is sent and received over an IP network. They also arise from unit-to-unit differences in encoder and decoder processing time.

FIG. 2(a) is an explanatory diagram of the case where the communication state is good and voice packets are being reproduced smoothly.

Arrow A1 in FIG. 2(a) indicates the head voice packet, which is reproduced next. When voice packets are being reproduced smoothly like this, there is no need to accumulate many voice packets in the voice jitter buffer 716, so the accumulated amount is reduced to shorten the delay of the reproduced voice from real time. The threshold O(X) is O(1) and the threshold U(X) is U(1), both indicated by arrows, and the accumulated amount of voice packets need only lie between U(1) and O(1).

FIG. 2(b) is an explanatory diagram of the case where the communication state is poor and voice packets are not being reproduced smoothly.

Arrow A2 in FIG. 2(b) indicates the head voice packet, which is reproduced next. When voice packets are not being reproduced smoothly like this, more voice packets are accumulated in the voice jitter buffer 716 than in FIG. 2(a). The threshold O(X) is O(2) and the threshold U(X) is U(2), both indicated by arrows, and the accumulated amount of voice packets need only lie between U(2) and O(2).

Next, the processing that keeps the accumulated amount of voice packets within the proper range in this embodiment will be described.

FIG. 3 is a flowchart showing the flow of reproduction control by the main microcomputer unit 713.

The flowchart of FIG. 3 is described for the processing after reproduction has started, assuming that the initial thresholds are, for example, U(1) = 10% and O(1) = 50%, and that reproduction starts when the received voice packets reach, for example, 30% of the storage capacity of the voice jitter buffer 716.

S101: The voice packet with the oldest reproduction information is decoded.

The reproduction control unit 11 instructs the reproduction information detection unit 10 to search the voice jitter buffer 716 for the voice packet with the oldest reproduction information and sends the packet found to the voice packet decoder unit 710, which decodes it into voice data.

S500: The information amount of the decoded voice packet is calculated.

The information amount calculation unit 13 calls a subroutine that calculates the information amount of the decoded voice packet and obtains the value. For example, the information amount of voice packets is large while participants are speaking actively during a conference, and small when speech pauses and the sound is nearly silent or only background sound such as air conditioning can be heard. The algorithm for calculating the information amount from the voice data is described in detail later.

S102: Voice reproduction is started.

The reproduction control unit 11 instructs the voice packet decoder unit 710 to reproduce the decoded voice data and output it from the voice output unit 709.

S800: The subroutine that sets the thresholds O(X) and U(X) is called.

A threshold setting subroutine is called that sets the thresholds O(X) and U(X) according to the amount of variation in the amount of voice packets accumulated in the voice jitter buffer 716. The processing of the threshold setting subroutine is described in detail later.

The threshold setting subroutine returns appropriate values of the thresholds O(X) and U(X). For example, when voice packets are being reproduced smoothly, O(X) = O(1) and U(X) = U(1); when they are not, O(X) = O(2) and U(X) = U(2), where O(2) > O(1) and U(2) > U(1).
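The threshold setting subroutine itself is described later in the specification, so the following Python sketch is only a guess at its shape based on item 4 above: both thresholds are lowered when the variation in the stored amount is small and raised when it is large. The variation measure (maximum minus minimum of recent stored amounts), the `variation_limit` of 0.2, and the O(2) and U(2) values of 80% and 30% are all hypothetical; only U(1) = 10% and O(1) = 50% come from the example values in the text.

```python
from typing import Sequence, Tuple


def select_thresholds(recent_stored: Sequence[float],
                      variation_limit: float = 0.2,
                      calm: Tuple[float, float] = (0.10, 0.50),
                      jittery: Tuple[float, float] = (0.30, 0.80),
                      ) -> Tuple[float, float]:
    """Return (U(X), O(X)) chosen from the variation of recent stored amounts.

    calm    = (U(1), O(1)): used when the variation is below the limit.
    jittery = (U(2), O(2)): used otherwise; assumed values with U(2) > U(1)
              and O(2) > O(1).
    """
    if recent_stored:
        variation = max(recent_stored) - min(recent_stored)
    else:
        variation = 0.0  # no history yet: treat the link as stable
    return calm if variation < variation_limit else jittery
```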

S104: This is a step of determining whether or not the accumulated amount of voice packets is equal to or greater than O(X).

The accumulation amount detection unit 12 checks the head address of the voice packets stored in the voice jitter buffer 716 and detects the accumulated amount. The reproduction control unit 11 determines whether or not the accumulated amount is equal to or greater than O(X).

If it is less than O(X) (step S104; No), the process proceeds to step S105.

If it is equal to or greater than O(X) (step S104; Yes), the process proceeds to step S110.

S110: This is a step of determining whether or not the audio information amount is equal to or less than a predetermined value.

The information amount determination unit 14 determines whether or not the audio information amount calculated in step S500 is equal to or less than the predetermined value. The predetermined value is a threshold for identifying, for example, a silent state or a state in which only background sound is heard, and is set in advance according to the usage environment.

If it exceeds the predetermined value (step S110; No), audio reproduction is continued and the process proceeds to step S112.

If it is equal to or less than the predetermined value (step S110; Yes), the process proceeds to step S111.

S111: This is a step of stopping audio reproduction.

The reproduction control unit 11 instructs the voice packet decoder 710 to stop audio reproduction, and the process proceeds to step S112. In this case, the processing of voice packets is lagging, so the reproduced audio is delayed considerably relative to real time, which feels unnatural, for example, in conversation. Therefore, reproduction of voice packets with a small information amount, for example packets containing silence or only background sound, is stopped in order to reduce the audio delay. This shortens the audio delay with little sense of incongruity for the listener and without important information being missed.

S105: This is a step of determining whether or not the accumulated amount of voice packets is equal to or less than U(X).

The accumulation amount detection unit 12 checks the head address of the voice packets stored in the voice jitter buffer 716 and detects the amount of voice packets accumulated in the voice jitter buffer 716. The reproduction control unit 11 determines whether or not the accumulated amount is equal to or less than U(X).

If it exceeds U(X) (step S105; No), the process proceeds to step S112.

If it is equal to or less than U(X) (step S105; Yes), the process proceeds to step S122.

S122: This is a step of determining whether or not the audio information amount is equal to or less than a predetermined value.

The reproduction control unit 11 determines whether or not the information amount of the audio data calculated in step S500 is equal to or less than the predetermined value. The predetermined value is a threshold for identifying, for example, a silent state or a state in which only background sound is heard, and is set in advance according to the usage environment.

If it exceeds the predetermined value (step S122; No), the process proceeds to step S112.

If it is equal to or less than the predetermined value (step S122; Yes), the process proceeds to step S123.

S123: This is a step of reproducing the audio repeatedly.

The reproduction control unit 11 instructs the voice packet decoder 710 to repeatedly reproduce the audio data that was determined in step S122 to have an information amount equal to or less than the predetermined value, until voice packets amounting to U(X) or more have accumulated in the voice jitter buffer 716. In this case, the arrival of voice packets is delayed, so a voice packet with a small information amount, for example one containing silence or only background sound, is reproduced repeatedly while waiting for voice packets amounting to U(X) or more to accumulate in the voice jitter buffer 716. When the accumulation amount detection unit 12 detects that this amount of voice packets has accumulated in the voice jitter buffer 716, the main microcomputer unit 713 proceeds to step S112.

S112: This is a step of determining whether or not there is a next voice packet.

The reproduction control unit 11 searches the voice jitter buffer 716 for the voice packet to be reproduced next.

If there is a voice packet (step S112; Yes), the process returns to step S101 and the next voice packet is reproduced.

If there is no voice packet (step S112; No), the process ends.

As described above, in the present invention, even voice packets with a small information amount transmitted from the voice transmission device 701, for example packets containing silence or only background sound, are reproduced as long as the amount of voice packets accumulated in the voice jitter buffer 716 is within a predetermined range, as is the case when the communication state is good. On the other hand, when the voice jitter buffer 716 is at risk of overflow or underflow, reproduction of voice packets with a small information amount is stopped or repeated to adjust the delay of the audio relative to real time, so that the amount of voice packets held in the voice jitter buffer 716 is kept at a constant level. This makes it possible to provide a high-quality voice packet communication system in which important audio information is not lost and the parties to the call rarely find the reproduced audio unnatural.
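The branching of steps S104 to S123 can be summarized as a single decision per packet. The following Python sketch is illustrative only: the names `Action`, `decide_action`, `over`, `under`, and `info_threshold` are hypothetical and do not appear in the specification, and the sketch collapses "start reproduction, then stop it" (S102, S111) into one returned action.

```python
from enum import Enum


class Action(Enum):
    PLAY = "play"      # reproduce the packet normally
    SKIP = "skip"      # S111: stop reproduction of this packet
    REPEAT = "repeat"  # S123: reproduce this packet repeatedly


def decide_action(buffered, info_amount, over, under, info_threshold):
    """Return the playback action for one decoded voice packet.

    buffered       -- amount of voice packets in the jitter buffer
    info_amount    -- value returned by the information amount routine
    over, under    -- thresholds O(X) and U(X)
    info_threshold -- the predetermined value tested in S110 and S122
    """
    low_info = info_amount <= info_threshold
    if buffered >= over:      # S104: Yes, buffer is filling up
        return Action.SKIP if low_info else Action.PLAY      # S110
    if buffered <= under:     # S105: Yes, buffer is running dry
        return Action.REPEAT if low_info else Action.PLAY    # S122
    return Action.PLAY        # buffer within the normal range
```

With assumed thresholds O(X) = 10 and U(X) = 3 and an information threshold of 100, `decide_action(12, 80, 10, 3, 100)` selects `Action.SKIP`, while the same low-information packet with only 2 packets buffered selects `Action.REPEAT`. Packets whose information amount exceeds the threshold are always played.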

Next, the routine for calculating the audio information amount, referred to in step S500, is described with reference to FIGS. 4 and 5.

FIG. 4 shows a first embodiment of the audio information amount calculation routine of the present invention.

S501: The average signal level of the audio signal is calculated.

The information amount calculation unit 13 calculates the average signal level of the audio data decoded by the voice packet decoder 710 and uses this value as the audio information amount.

For example, suppose that the audio data is 12 bits wide and that complete silence corresponds to the digital value 0. If the average value calculated in step S500 is the digital value 80, then 80 is the audio information amount. If the predetermined value used in step S110 is 100, for example, an audio information amount of 80 is judged to be a silent state.
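The numerical example above can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the specification says only "average signal level", so taking the mean of the absolute sample values (with silence at 0 and signed samples) is an assumption, and the function names are hypothetical.

```python
def average_level(samples):
    """Mean absolute sample value, used here as the information amount.

    Assumption: samples are signed integers centered on 0 (silence).
    """
    if not samples:
        return 0.0
    return sum(abs(s) for s in samples) / len(samples)


def is_low_information(samples, threshold=100):
    """True when the packet would be treated as silence or background."""
    return average_level(samples) <= threshold


frame = [80, -80, 80, -80]          # hypothetical decoded samples
level = average_level(frame)        # 80.0
quiet = is_low_information(frame)   # True, since 80 <= 100
```

The threshold of 100 mirrors the example value in the text; in practice it would be chosen in advance according to the usage environment, as the description states.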

FIG. 5 shows a second embodiment of the audio information amount calculation routine of the present invention.

S601: The audio signal is frequency-converted.

The information amount calculation unit 13 frequency-converts the audio data decoded by the voice packet decoder 710 and obtains the signal level at each frequency.

S602: The current values are averaged with the values up to the previous time.

The main microcomputer unit 713 adds the current values to the signal levels for each frequency up to the previous time, which are stored in its built-in memory, and obtains the average value for each frequency.

Averaging the frequency distribution in this way each time yields the average frequency distribution of background sound that is always present, such as the sound of air conditioning.

S603: The difference between the averaged value and the current value is calculated.

The information amount calculation unit 13 calculates, for each frequency, the difference between the averaged value obtained in step S602 and the current value.

S604: The sum of the differences over all frequencies is calculated.

The information amount calculation unit 13 calculates the sum of the per-frequency differences obtained in step S603.

The sum obtained in step S604 is used as the information amount, and the process returns to the calling routine.

For example, when a silent state with a low information amount continues, the value calculated in this way is 0. On the other hand, when several people speak, as in a conference, the frequency distribution often differs from one calculation to the next, so the resulting value is large.
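Steps S601 to S604 can be sketched as follows. The code is an illustration, not the implementation of the specification: it uses a naive discrete Fourier transform from the standard library for S601, takes the absolute value of each per-frequency difference before summing (the text says only "difference"), and assumes every frame has the same length. The names `magnitude_spectrum` and `SpectralInfoEstimator` are hypothetical.

```python
import cmath


def magnitude_spectrum(samples):
    """S601: naive DFT magnitudes for bins 0..N/2 of one frame."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples))) / n
        for k in range(n // 2 + 1)
    ]


class SpectralInfoEstimator:
    """Keeps a running per-frequency average across frames."""

    def __init__(self):
        self.mean = None   # averaged level for each frequency bin
        self.count = 0     # number of frames folded into the average

    def update(self, samples):
        spectrum = magnitude_spectrum(samples)
        if self.mean is None:
            self.mean = [0.0] * len(spectrum)
        self.count += 1
        # S602: fold the current values into the running average
        self.mean = [m + (s - m) / self.count
                     for m, s in zip(self.mean, spectrum)]
        # S603 and S604: per-frequency difference, then the sum
        return sum(abs(s - m) for s, m in zip(spectrum, self.mean))
```

Feeding the estimator identical frames (continued silence, or an unchanging background) returns 0, because the running average equals the current spectrum. A frame whose spectrum departs from the average, such as a tone following silence, returns a positive value.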

Next, the subroutine for setting the threshold values O(X) and U(X), referred to in step S800, is described with reference to FIG. 6.

FIG. 6 is a flowchart explaining the processing of the threshold setting routine of the present invention.

S801: The accumulated amount of voice packets is detected.

The accumulation amount detection unit 12 detects the amount of voice packets accumulated in the voice jitter buffer 716. The detected amount is stored in the RAM 21.

S802: The standard deviation is calculated from the accumulated amounts up to the previous time and the current accumulated amount.

The reproduction control unit 11 calculates the standard deviation from the accumulated amounts up to the previous time, which are stored in the RAM 21, and the current accumulated amount.

S803: It is determined whether or not the standard deviation is equal to or greater than a predetermined value.

The reproduction control unit 11 determines whether or not the calculated standard deviation is equal to or greater than the predetermined value.

If the standard deviation is equal to or greater than the predetermined value (step S803; Yes), the process proceeds to step S805.

S805: O(X) = O(2) and U(X) = U(2) are set.

When the standard deviation is equal to or greater than the predetermined value, the accumulated amount of voice packets fluctuates widely, so the reproduction control unit 11 sets the threshold values O(X) and U(X) to the larger values O(2) and U(2) and returns to the calling step.

If the standard deviation is less than the predetermined value (step S803; No), the process proceeds to step S804.

S804: O(X) = O(1) and U(X) = U(1) are set.

When the standard deviation is less than the predetermined value, the accumulated amount of voice packets fluctuates little, so the reproduction control unit 11 sets the threshold values O(X) and U(X) to the smaller values O(1) and U(1) and returns to the calling step.

This concludes the description of the threshold setting routine. Note that the thresholds O and U are not limited to two values each; a larger number of values may be set according to the fluctuation of the voice packets.
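The threshold setting routine of steps S801 to S805 amounts to a comparison of the standard deviation against a limit. The sketch below is an assumption-laden illustration: the specification does not say whether a population or sample standard deviation is meant (the population form is used here), and the concrete values for O(1), U(1), O(2), U(2) and the function name `select_thresholds` are hypothetical.

```python
from statistics import pstdev


def select_thresholds(history, std_limit,
                      normal=(10, 3), widened=(14, 5)):
    """Return (O(X), U(X)) from the recorded buffer occupancies.

    history   -- accumulated amounts detected so far (S801)
    std_limit -- the predetermined value tested in S803
    normal    -- (O(1), U(1)), used when fluctuation is small (S804)
    widened   -- (O(2), U(2)), used when fluctuation is large (S805)
    """
    if not history:
        return normal
    spread = pstdev(history)              # S802
    return widened if spread >= std_limit else normal   # S803


steady = select_thresholds([5, 5, 5, 5], std_limit=1.0)    # (10, 3)
bursty = select_thresholds([1, 9, 2, 10], std_limit=1.0)   # (14, 5)
```

Extending the routine to more than two threshold pairs, as the text permits, would only require replacing the single comparison with a lookup over several standard deviation ranges.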

As described above, according to the present invention, it is possible to provide a voice packet communication system capable of reproducing high-quality audio even when the delay time of voice packets fluctuates.

FIG. 1 is a block diagram showing one embodiment of the configuration of the voice packet communication system according to the present invention.
FIG. 2 is an explanatory diagram schematically illustrating underflow and overflow.
FIG. 3 is a flowchart showing the flow of reproduction control by the main microcomputer unit 713.
FIG. 4 shows a first embodiment of the audio information amount calculation routine of the present invention.
FIG. 5 shows a second embodiment of the audio information amount calculation routine of the present invention.
FIG. 6 is a flowchart explaining the processing of the threshold setting routine of the present invention.

Explanation of symbols

10 Reproduction information detection unit
11 Reproduction control unit
12 Accumulation amount detection unit
13 Information amount calculation unit
14 Information amount determination unit
20 CPU
21 RAM
22 ROM
701 Voice transmission device
713 Main microcomputer unit
702 Voice reproduction device
717 Transmission line

Claims

A voice packet communication system comprising:
a voice transmission device comprising a voice packet encoder that packetizes a voice signal into voice packets, a reproduction information adding unit that adds reproduction information to the voice packets, and a transmitting unit that transmits the voice packets to which the reproduction information has been added; and
a voice reproduction device comprising a receiving unit that receives the transmitted voice packets, a voice jitter buffer that sequentially stores the voice packets received by the receiving unit, a voice packet decoder that decodes and reproduces the voice packets, reproduction information detecting means for detecting the reproduction information, and reproduction control means for instructing the voice packet decoder to reproduce based on the reproduction information detected by the reproduction information detecting means,
wherein the voice reproduction device further comprises:
accumulated amount detecting means for detecting the amount of voice packets accumulated in the voice jitter buffer;
information amount calculating means for calculating the information amount of a voice packet to be reproduced by the voice packet decoder; and
information amount determining means for judging the information amount calculated by the information amount calculating means,
and wherein the reproduction control means changes, according to the amount of voice packets detected by the accumulated amount detecting means, the reproduction time of a voice packet that the information amount determining means has determined to have an information amount equal to or less than a predetermined value.

The voice packet communication system according to claim 1, wherein, when the amount of voice packets detected by the accumulated amount detecting means is equal to or greater than a first value, the reproduction control means does not reproduce voice packets determined to have an information amount equal to or less than the predetermined value, and sequentially reproduces voice packets whose information amount is equal to or greater than the predetermined value.

The voice packet communication system according to claim 1, wherein, when the amount of voice packets detected by the accumulated amount detecting means is equal to or less than a second value, the reproduction control means repeats reproduction of a voice packet determined by the information amount determining means to have an information amount equal to or less than the predetermined value, according to the amount of voice packets, and then sequentially reproduces the subsequent voice packets.

The voice packet communication system according to any one of claims 1 to 3, wherein the reproduction control means calculates the amount of fluctuation in the amount of voice packets detected by the accumulated amount detecting means, decreases the first value and the second value when the amount of fluctuation is less than a predetermined value, and increases the first value and the second value when the amount of fluctuation is equal to or greater than the predetermined value.

The voice packet communication system according to claim 1, wherein the information amount calculating means calculates the information amount based on an average value of the voice signal.

The voice packet communication system according to claim 1, wherein the information amount calculating means calculates each frequency component of the voice signal, and calculates the information amount based on the difference between the averaged value of each frequency component of the voice packets calculated up to the previous time and the value of each frequency component calculated this time.

A voice reproduction device comprising:
a receiving unit that receives voice packets to which reproduction information has been added;
a voice jitter buffer that sequentially stores the voice packets received by the receiving unit;
a voice packet decoder that decodes and reproduces the voice packets;
reproduction information detecting means for detecting the reproduction information; and
reproduction control means for instructing the voice packet decoder to reproduce based on the reproduction information detected by the reproduction information detecting means,
the voice reproduction device further comprising:
accumulated amount detecting means for detecting the amount of voice packets accumulated in the voice jitter buffer;
information amount calculating means for calculating the information amount of a voice packet; and
information amount determining means for judging the information amount calculated by the information amount calculating means,
wherein the reproduction control means changes, according to the amount of voice packets detected by the accumulated amount detecting means, the reproduction time of a voice packet that the information amount determining means has determined to have an information amount equal to or less than a predetermined value.

The voice reproduction device according to claim 7, wherein, when the amount of voice packets detected by the accumulated amount detecting means is equal to or greater than a first value, the reproduction control means does not reproduce voice packets determined to have an information amount equal to or less than the predetermined value, and sequentially reproduces voice packets whose information amount is equal to or greater than the predetermined value.

The voice reproduction device according to claim 7, wherein, when the amount of voice packets detected by the accumulated amount detecting means is equal to or less than a second value, the reproduction control means repeats reproduction of a voice packet determined by the information amount determining means to have an information amount equal to or less than the predetermined value, according to the amount of voice packets, and then sequentially reproduces the subsequent voice packets.

The voice reproduction device according to claim 7, wherein the reproduction control means calculates the amount of fluctuation in the amount of voice packets detected by the accumulated amount detecting means, decreases the first value and the second value when the amount of fluctuation is less than a predetermined value, and increases the first value and the second value when the amount of fluctuation is equal to or greater than the predetermined value.

The voice reproduction device according to claim 7, wherein the information amount calculating means calculates the information amount based on an average value of the voice signal.

The voice reproduction device according to claim 7, wherein the information amount calculating means calculates each frequency component of the voice signal, and calculates the information amount based on the difference between the averaged value of each frequency component of the voice packets calculated up to the previous time and the value of each frequency component calculated this time.