JP2007108388A

JP2007108388A - Voice packet receiving and reproducing method, and device therefor and its program recording medium

Info

Publication number: JP2007108388A
Application number: JP2005298691A
Authority: JP
Inventors: Naka Omuro (大室仲); Takeshi Mori (森岳至); Yuusuke Hiwazaki (日和﨑祐介); Akitoshi Kataoka (片岡章俊)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2005-10-13
Filing date: 2005-10-13
Publication date: 2007-04-26
Anticipated expiration: 2025-10-13
Also published as: JP4510742B2

Abstract

PROBLEM TO BE SOLVED: To reproduce a voice signal without degradation even when no voice packet is available in the reception buffer for a relatively long time.
SOLUTION: Voice packets are taken out of the reception buffer 81 in frame-number order, decoded, and output as a voice signal. Before each take-out, the method determines whether there is a voice packet to take out of the reception buffer 81. If there is none, a compensation signal for one frame is generated by voice packet loss compensation (packet loss concealment), and the frame number counter 85 is incremented by one. N is a predetermined number of frames, corresponding for example to 40 to 100 milliseconds. When no packet has been available for N consecutive frames and there is still no packet to take out for the next frame, the silence generation unit 87 generates a silence signal. In that case, however, the frame number counter 85 is not incremented.
COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a voice packet reception and reproduction method, an apparatus therefor, and a program recording medium therefor. They are used on the receiving side to reproduce voice signals with stable quality when digitized acoustic signals are transmitted over a packet communication network such as the Internet. Such signals include speech and music and are collectively called "voice signals" in this application.

In recent years, there has been demand for technology that uses Voice over IP (VoIP) to transmit and receive voice signals of higher quality.
FIG. 8 shows an example configuration in which voice signals are converted into voice packets and communicated in real time over a packet communication network such as an IP network. An input voice signal fed to the voice signal transmitting unit 70 is divided by the frame dividing unit 71 into fixed time intervals called frames. Each frame is then encoded by the voice encoding unit 72. A frame length of 10 to 20 milliseconds is commonly used. The voice packet converter 73 converts the encoded voice signal into voice packets. In this way, the input voice is converted into voice packets within the voice signal transmitting unit 70 and sent to the IP communication network 74.
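The following is a minimal Python sketch of the frame division step. It assumes 8 kHz sampling (the rate G.711 uses) and 20 ms frames. The function name and the decision to drop a trailing partial frame are illustrative assumptions and are not part of the patent.

```python
from typing import List, Sequence

def split_into_frames(samples: Sequence[int], sample_rate_hz: int = 8000,
                      frame_ms: int = 20) -> List[List[int]]:
    """Split samples into fixed-length frames; a trailing partial frame is dropped."""
    frame_len = sample_rate_hz * frame_ms // 1000  # 8000 Hz x 20 ms -> 160 samples
    n_frames = len(samples) // frame_len
    return [list(samples[i * frame_len:(i + 1) * frame_len]) for i in range(n_frames)]

frames = split_into_frames(list(range(800)))  # 100 ms of dummy samples
print(len(frames), len(frames[0]))  # 5 160
```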

Each voice packet contains two items. The first is a voice code, obtained by encoding one frame of the voice signal with a speech coding method. The second is a time stamp or frame number that indicates the packet's position in time order. A typical speech coding method is G.711, a standard of the ITU-T (International Telecommunication Union), but any other method may be used. Time stamps and frame numbers can be converted into each other, so both are hereinafter referred to simply as the frame number.
Voice packets from the IP communication network 74 are received by the voice signal receiving unit 75, which converts them into a voice signal and outputs it.
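The text treats time stamps and frame numbers as interchangeable, and the conversion can be sketched as follows. The sketch assumes RTP-style 32-bit time stamps that count samples at 8 kHz, and 20 ms frames, which gives 160 ticks per frame. Frame numbers are counted from a base time stamp. All of these are assumptions for illustration.

```python
SAMPLE_RATE_HZ = 8000                                  # assumed time stamp clock
FRAME_MS = 20                                          # assumed frame length
TICKS_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000    # 160 ticks per frame
WRAP = 1 << 32                                         # 32-bit time stamps wrap around

def timestamp_to_frame(ts: int, base_ts: int) -> int:
    """Frame number of time stamp ts, counted from base_ts (valid within one wrap)."""
    return ((ts - base_ts) % WRAP) // TICKS_PER_FRAME

def frame_to_timestamp(frame: int, base_ts: int) -> int:
    """Time stamp at which frame number `frame` starts."""
    return (base_ts + frame * TICKS_PER_FRAME) % WRAP

print(timestamp_to_frame(1320, base_ts=1000))  # 2
```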

FIG. 9 shows a specific configuration example of the voice signal receiving unit 75. The packet receiving unit 80 receives voice packets from the IP communication network and stores them in the reception buffer 81. The voice packet decoding unit 82 takes voice packets out of the reception buffer 81 and decodes them into a voice signal. It outputs that signal to the output terminal 88 through the changeover switch 89.
The reception buffer 81 is also called a fluctuation (jitter) absorbing buffer. Packet arrival times vary with the state of the IP network, and the buffer absorbs this variation so that the reproduced voice signal is not interrupted. In two cases, the voice packet decoding unit 82 cannot take out the next voice packet to decode:
- a packet is lost in the network (packet loss);
- the jitter exceeds the amount the reception buffer 81 can absorb, and the buffer temporarily becomes empty.
In both cases, the control unit 84 activates the loss compensation processing unit (packet loss concealment unit) 83. That unit generates a compensation voice signal and outputs it to the output terminal 88 through the changeover switch 89. The compensation signal is built from the voice signal already received, so that the degradation caused by the missing packet is not noticeable. Known representative methods include G.711 Appendix I, standardized by the ITU-T (International Telecommunication Union), and the method described in Non-Patent Document 1.
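The buffer behavior relied on later is to take out the stored packet with the smallest frame number, and a min-heap can model it. This is an illustrative data structure with hypothetical class and method names, not the patent's implementation. Packet payloads are shown as raw bytes.

```python
import heapq
from typing import List, Optional, Tuple

class ReceptionBuffer:
    """Jitter buffer that always hands out the stored packet with the smallest frame number."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, bytes]] = []

    def store(self, frame_number: int, payload: bytes) -> None:
        heapq.heappush(self._heap, (frame_number, payload))

    def take_out(self) -> Optional[Tuple[int, bytes]]:
        """Return (frame_number, payload), or None when the buffer is empty."""
        return heapq.heappop(self._heap) if self._heap else None

    def __len__(self) -> int:
        return len(self._heap)

buf = ReceptionBuffer()
for fn in (3, 1, 2):          # packets arrive out of order
    buf.store(fn, b"")
print([buf.take_out()[0] for _ in range(3)], buf.take_out())  # [1, 2, 3] None
```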

The control unit 84 sets the changeover switch 89 according to whether the voice signal comes from the voice packet decoding unit 82 or from the loss compensation processing unit 83.
FIG. 10 shows an example of a single apparatus that contains the following parts:
- a voice signal receiving unit 91;
- a voice signal transmitting unit 92;
- a speaker 93 that reproduces the voice signal from the voice signal receiving unit 91;
- a microphone 94 that picks up the transmit-side sound;
- an echo canceller 95 that suppresses echo in the signal picked up by the microphone 94 before passing it to the voice signal transmitting unit 92.
The speaker 93 and the microphone 94 are an ordinary loudspeaker and microphone. Because the apparatus has both a receiving unit 91 and a transmitting unit 92, it supports two-way conversation as in telephony. The speaker 93 plays out the received voice signal x(k) from the voice signal receiving unit 91. The reproduced sound also reaches the microphone 94 through the echo path 96 and would otherwise be transmitted as acoustic echo. To prevent this, the signal x(k) supplied to the speaker 93 is also input to the echo canceller 95. There it passes through a pseudo echo path, which generates a pseudo acoustic echo. The echo canceller 95 subtracts this pseudo echo from the microphone signal y(k) and sends the resulting echo-suppressed transmit signal to the voice signal transmitting unit 92.
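The patent does not say how the pseudo echo path is adapted. A common choice is a normalized LMS (NLMS) adaptive FIR filter, and the sketch below uses it purely as an assumed example. The tap count, step size, and echo path values are hypothetical. The sketch shows the pseudo echo being generated from x(k) and subtracted from y(k). The filter can learn the echo path only while x(k) keeps arriving in step with y(k).

```python
import random

class NlmsEchoCanceller:
    """Adaptive FIR model of the echo path, updated with normalized LMS (NLMS)."""

    def __init__(self, taps: int = 8, mu: float = 0.5, eps: float = 1e-8) -> None:
        self.w = [0.0] * taps        # pseudo echo path coefficients
        self.x_hist = [0.0] * taps   # most recent received samples x(k), x(k-1), ...
        self.mu, self.eps = mu, eps

    def process(self, x_k: float, y_k: float) -> float:
        """Feed one received sample x(k) and one mic sample y(k); return echo-suppressed e(k)."""
        self.x_hist = [x_k] + self.x_hist[:-1]
        pseudo_echo = sum(wi * xi for wi, xi in zip(self.w, self.x_hist))
        e_k = y_k - pseudo_echo                       # subtract pseudo echo from y(k)
        norm = self.eps + sum(xi * xi for xi in self.x_hist)
        step = self.mu * e_k / norm                   # normalized step size
        self.w = [wi + step * xi for wi, xi in zip(self.w, self.x_hist)]
        return e_k

# Hypothetical true echo path, driven by white noise; the mic picks up echo only.
true_path = [0.6, -0.3, 0.2, 0.1, 0.0, -0.05, 0.02, 0.01]
rng = random.Random(0)
ec = NlmsEchoCanceller(taps=len(true_path))
x_hist = [0.0] * len(true_path)
residual = 0.0
for _ in range(5000):
    x = rng.gauss(0.0, 1.0)
    x_hist = [x] + x_hist[:-1]
    y = sum(h * xi for h, xi in zip(true_path, x_hist))
    residual = ec.process(x, y)
print(max(abs(a - b) for a, b in zip(ec.w, true_path)))  # near zero after convergence
```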

The pseudo echo path in the echo canceller 95 is adaptively controlled so that it approaches the characteristics of the true echo path. The echo canceller 95 therefore operates correctly only while the received signal x(k) and the picked-up signal y(k) are input in correct synchronization. If x(k) stops being input, or synchronization is lost, the echo canceller no longer operates correctly; that is, the echo is no longer properly suppressed.
Some transmission methods available to the voice signal transmitting unit 92 send no voice packets at all while the input voice signal is silent. In that state the call is connected, but the talker on the transmitting side is not speaking. This method is called silence compression or Discontinuous Transmission (hereinafter DTX). It is used to make efficient use of the bandwidth of the packet communication network.
Non-Patent Document 1: Naka Omuro et al., "Improvement of burst packet loss tolerance by parallel transmission of voice features," IEICE Technical Report, SP2004-77, pp. 35-40, 2004.

The present invention addresses two problems.
The first problem arises when packet jitter exceeds the absorption capacity of the reception buffer, as described above, so that the buffer temporarily becomes empty. This could be prevented by increasing the buffer's absorption capacity, that is, by storing more packets in it. Storing more packets, however, increases the delay from packet reception to playback, which makes two-way conversation difficult. It is therefore common to tolerate the buffer becoming temporarily empty when the jitter changes.

When the reception buffer becomes empty, the loss compensation processing unit is activated to perform packet loss compensation, as described above. If this processing continues for several seconds, however, the state of the reception buffer becomes unstable when packets start arriving again. Reception buffers are often program-controlled to keep the amount stored at a predetermined appropriate level. After several seconds of continuous packet loss compensation, this level control no longer works well. As a result, either the delay becomes inappropriate, or the processing that restores it degrades the reproduced sound. In addition, voice reproduced by packet loss compensation over such a long period has poor sound quality.

The second problem arises from a different way of avoiding the buffer instability above. If loss compensation is not performed and decoding of voice packets is simply suspended until packets accumulate in the reception buffer, the echo canceller malfunctions. In FIG. 10, when the received signal x(k) input to the echo canceller 95 stops, only the picked-up signal y(k) is input. The echo canceller 95 then does not operate correctly, and the transmitted sound, that is, the sound reproduced at the far end, is degraded.

For each frame, a first determination step determines whether there is a voice packet to take out of the reception buffer. When there is none, a second determination step decides between packet loss compensation and silence generation, and the selected process generates the signal. N is an integer of 0 or more. While the no-packet state has lasted for no more than N consecutive frames, the second determination step selects packet loss compensation. Once the no-packet state continues beyond N frames, it selects silence generation.
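The two determination steps for one frame can be sketched as a small decision function. The hypothetical parameter consecutive_missing counts the no-packet frames in the current gap, including the current one. Frames 1 to N of a gap are therefore concealed, and frame N+1 onward becomes silence.

```python
from enum import Enum

class Action(Enum):
    DECODE = "decode"
    CONCEAL = "packet loss compensation"
    SILENCE = "silence generation"

def decide(packet_available: bool, consecutive_missing: int, n_max: int) -> Action:
    """First and second determination steps for one frame.

    consecutive_missing: frames without a packet in the current gap, counting this one.
    n_max: the integer N >= 0.
    """
    if packet_available:               # first determination step
        return Action.DECODE
    if consecutive_missing <= n_max:   # second step: still within N frames
        return Action.CONCEAL
    return Action.SILENCE              # gap has continued beyond N frames

# A five-frame gap with N = 2: conceal, conceal, then silence.
print([decide(False, k, 2).name for k in range(1, 6)])
# ['CONCEAL', 'CONCEAL', 'SILENCE', 'SILENCE', 'SILENCE']
```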

Optionally, a DTX detection step observes the sequence of received voice packets and estimates whether the sender used silence compression. If silence compression is estimated not to be in use, N ≥ 1. If it is estimated to be in use, N = 0.
A frame number counter may also be used, with the following rules:
- Each time a voice packet is taken out of the reception buffer and decoded, its frame number is stored in the counter.
- Each time one frame of packet loss compensation is performed, the counter value is incremented by 1.
- Each time silence generation is performed, the counter value is left unchanged.
When the first determination step finds a packet to take out, a frame number determination step checks whether the packet's frame number is less than or equal to the value held in the counter. If it is, the packet is discarded. Otherwise, the extracted packet is decoded and a voice signal is output.
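The following is a minimal sketch of the frame number counter rules and the DTX-dependent choice of N. Several details are illustrative assumptions: the class and function names, the initial counter value of -1, and the default N of 3 when DTX is not in use. The text requires only N ≥ 1 in that case.

```python
class FrameNumberCounter:
    """Frame number of the last frame played out by decoding or concealment."""

    def __init__(self, start: int = -1) -> None:
        self.value = start            # hypothetical initial value: nothing played yet

    def on_decode(self, frame_number: int) -> None:
        self.value = frame_number     # store the decoded packet's frame number

    def on_conceal(self) -> None:
        self.value += 1               # a concealed frame stands in for the next frame number

    def on_silence(self) -> None:
        pass                          # silence generation leaves the counter unchanged

    def should_discard(self, frame_number: int) -> bool:
        """Frame number determination step: discard packets at or below the counter."""
        return frame_number <= self.value

def choose_n(dtx_in_use: bool, n_without_dtx: int = 3) -> int:
    """N = 0 under DTX; otherwise N >= 1 (3 is an arbitrary example value)."""
    return 0 if dtx_in_use else max(1, n_without_dtx)
```

Because the counter is left unchanged during silence, packets delayed beyond the concealment window are still accepted and played rather than discarded as already played.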

[Example 1]
FIG. 1 shows an example functional configuration of a voice packet reception and reproduction apparatus according to the best mode for carrying out the invention. Parts that correspond to those in FIG. 9 have the same reference numerals, and their description is not repeated. The packet receiving unit 80 receives voice packets from the IP communication network and sends them to the reception buffer 81.
When voice packets are stored in the reception buffer 81, the voice packet decoding unit 82 takes out the one packet (one frame) with the smallest frame number and decodes the voice code it contains. At the same time, the frame number of that packet is set as the count value of the frame number counter 85. The decoded voice signal is sent to the output terminal 88 and output.

The reception buffer 81 may be empty, so that the voice packet decoding unit 82 finds no voice packet to take out. In that case, the control unit 84 activates the loss compensation processing unit 83, and the compensation voice signal is sent to the output terminal 88 and output. Each frame of packet loss compensation increments the count value of the frame number counter 85 by 1. After each frame of compensation, the apparatus checks again whether a voice packet can be taken out of the reception buffer 81. Suppose the loss compensation processing unit 83 has been activated N consecutive times because no packet was available, and there is still no packet to take out. The silence generation unit 87 is then activated instead; it generates a silence signal, which is sent to the output terminal 88 and output. N is an integer of 0 or more. This solves the first problem.
The signal supplied to the output terminal 88 is also supplied to the echo canceller. The echo canceller therefore always receives a signal on the received-signal side, so it can operate correctly, which solves the second problem.
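The per-frame behavior of Example 1 can be simulated end to end. In this sketch all names are hypothetical, and packets are reduced to their frame numbers. The argument arrivals maps each playout tick to the frames that arrive just before it. The output labels each tick as decoded (D followed by the frame number), concealed (C), or silence (S).

```python
import heapq
from typing import Dict, List

def simulate_playout(arrivals: Dict[int, List[int]], n_max: int, ticks: int) -> List[str]:
    buffer: List[int] = []   # reception buffer 81 as a min-heap of frame numbers
    counter = -1             # frame number counter 85 (nothing played yet)
    missing = 0              # consecutive frames with no packet to take out
    out: List[str] = []
    for tick in range(ticks):
        for fn in arrivals.get(tick, []):
            if fn > counter:                      # frames already played are discarded
                heapq.heappush(buffer, fn)
        while buffer and buffer[0] <= counter:    # drop duplicate or stale frames
            heapq.heappop(buffer)
        if buffer:                                # packet available: decode it
            fn = heapq.heappop(buffer)
            counter, missing = fn, 0
            out.append(f"D{fn}")
        else:
            missing += 1
            if missing <= n_max:                  # up to N frames of concealment
                counter += 1
                out.append("C")
            else:                                 # beyond N: silence, counter unchanged
                out.append("S")
    return out

late = {0: [0], 1: [1], 7: [2, 3, 4], 8: [5]}     # frames 2-4 delayed until tick 7
print(simulate_playout(late, n_max=2, ticks=10))
# ['D0', 'D1', 'C', 'C', 'S', 'S', 'S', 'D4', 'D5', 'C']
print(simulate_playout(late, n_max=0, ticks=10))
# ['D0', 'D1', 'S', 'S', 'S', 'S', 'S', 'D2', 'D3', 'D4']
```

With N = 2, frames 2 and 3 are concealed during the gap, so their late packets are discarded on arrival and playback resumes at frame 4. With N = 0, as under DTX, silence fills the gap from the first missing frame. The counter stays at 1, so playback resumes at frame 2.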

Experiments have confirmed the following preferred values of N:
- 2 to 5 frames when the frame length is 20 milliseconds;
- 4 to 10 frames when the frame length is 10 milliseconds.
In other words, N should be the number of frames corresponding to 40 to 100 milliseconds, and preferably to 40 to 60 milliseconds. If N is too small, the output switches to silence too soon, which is undesirable for speech. If N is too large, voice quality also degrades.
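The recommended window can be converted into a range of N for a given frame length. The following minimal sketch uses a hypothetical function name and reproduces the 2 to 5 and 4 to 10 frame ranges quoted above.

```python
import math
from typing import Tuple

def n_range(frame_ms: float, lo_ms: float = 40.0, hi_ms: float = 100.0) -> Tuple[int, int]:
    """Smallest and largest N (in frames) whose duration N * frame_ms lies in [lo_ms, hi_ms]."""
    return math.ceil(lo_ms / frame_ms), math.floor(hi_ms / frame_ms)

print(n_range(20), n_range(10))   # (2, 5) (4, 10)
print(n_range(20, hi_ms=60.0))    # (2, 3): the preferred 40-60 ms window
```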
[Example 2]
This example describes the apparatus of the invention for the case where the transmitting side uses DTX.
In this case, a DTX detection unit 86 is provided, shown by the dash-dot frame in FIG. 1. The DTX detection unit 86 monitors the packet reception state in the packet receiving unit 80. From this it continuously estimates whether the transmitting side is sending voice packets using DTX. Whether DTX is used can be estimated, for example, by referring to the header information of the packets.
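The patent says only that DTX use can be estimated from header information. The sketch below shows one assumed heuristic and rests on several assumptions:
- packets travel over RTP, with the header layout of RFC 3550;
- each frame is 20 ms at an 8 kHz clock;
- the sender follows the RFC 3551 recommendation to set the marker bit on the first packet after a silence period.
Under these assumptions, the detector flags DTX in either of two cases:
- a marker bit appears after the first packet;
- the time stamp jumps by more than one frame while the sequence number advances by exactly one, meaning frames were skipped on purpose rather than lost.
Every name and threshold here is hypothetical.

```python
import struct
from typing import Optional, Tuple

TICKS_PER_FRAME = 160   # assumption: 8 kHz RTP clock and 20 ms frames

def build_rtp_header(seq: int, ts: int, marker: bool = False,
                     payload_type: int = 0, ssrc: int = 0) -> bytes:
    """Build a 12-byte RTP fixed header (version 2, no padding/extension/CSRC)."""
    return struct.pack("!BBHII", 0x80, (0x80 if marker else 0) | payload_type,
                       seq & 0xFFFF, ts & 0xFFFFFFFF, ssrc)

def parse_rtp_header(packet: bytes) -> Tuple[bool, int, int]:
    """Return (marker, sequence_number, timestamp) from an RTP fixed header."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte RTP fixed header")
    _b0, b1, seq, ts, _ssrc = struct.unpack("!BBHII", packet[:12])
    return bool(b1 & 0x80), seq, ts

class DtxDetector:
    def __init__(self) -> None:
        self.prev: Optional[Tuple[int, int]] = None   # (seq, ts) of previous packet
        self.dtx_flag = 0                             # latched once set

    def observe(self, packet: bytes) -> int:
        marker, seq, ts = parse_rtp_header(packet)
        if self.prev is not None:
            prev_seq, prev_ts = self.prev
            consecutive = (seq - prev_seq) % 65536 == 1
            ts_jump = (ts - prev_ts) % (1 << 32)
            # A marker bit after the first packet, or a time stamp jump with no
            # sequence gap, suggests the sender stopped sending during silence.
            if marker or (consecutive and ts_jump > TICKS_PER_FRAME):
                self.dtx_flag = 1
        self.prev = (seq, ts)
        return self.dtx_flag
```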

Suppose the transmitting side sends voice packets using DTX, and the reception buffer 81 on the receiving side becomes empty, so that the voice packet decoding unit 82 cannot take out a voice packet. At that moment the receiver cannot tell whether the cause is jitter larger than anticipated or the use of DTX. If an empty buffer caused by DTX is wrongly treated as packet loss and loss compensation is performed, the reproduced sound is degraded.
Therefore, when the reception buffer 81 becomes empty, the receiver acts as follows:
- if the transmitting side is not using DTX, packet loss compensation is performed, followed by silence generation when necessary, as described above;
- if the received packets use DTX, a silence signal is generated immediately.
Specific example of the control unit
FIG. 2 shows an example configuration of the specific functions of the control unit 84 in FIG. 1. The packet extraction availability signal generation unit 20 examines the packet storage state of the reception buffer 81. From it, the unit generates a signal indicating whether there is a voice packet that the voice packet decoding unit 82 can take out. In FIG. 2, the unit 20 outputs "1" if such a packet exists and "0" otherwise. This packet extraction availability signal is sent to three units: the voice packet decoding instruction unit 22, the loss compensation instruction unit 24, and the silence generation instruction unit 26. If the signal is "1", the voice packet decoding instruction unit 22 instructs the voice packet decoding unit 82 to decode.

The detection signal from the DTX detection unit 86 is sent to the DTX flag register 28. In this example, the DTX flag in the DTX flag register 28 is set to "1" when DTX is used and to "0" when it is not. The DTX flag signal is sent to the loss compensation instruction unit 24 and the silence generation instruction unit 26.
Packet loss compensation may be performed for at most N consecutive frames (N is an integer of 0 or more). To enforce this limit, a compensation continuation availability signal generation unit 34 is provided. The compensation continuation availability signal it produces is sent to the loss compensation instruction unit 24 and the silence generation instruction unit 26.

FIG. 3 shows an example of the compensation continuation availability signal generation unit 34. It contains an n counter 34e, initially reset to zero, which is incremented by 1 each time packet loss compensation for one frame is instructed. The comparison unit 34d compares the count value n of the n counter 34e with the value N held in the register 34c. If n is less than N, the compensation continuation availability signal is output as "1"; if n is N or more, it is output as "0". The decoding instruction signal from the voice packet decoding instruction unit 22 resets the n counter to 0.
Alternatively, the loss compensation processing unit 83 may store the count value F_1 of the frame number counter 85 at the start of packet loss compensation. After each frame of compensation, it takes the current count value F_c of the frame number counter 85 and determines whether F_c - F_1 is less than N.
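Both ways of producing the compensation continuation signal can be sketched side by side, using hypothetical class and function names. The frame number counter advances by exactly one per concealed frame, so F_c - F_1 equals the n counter value and the two versions agree.

```python
class CompensationContinuation:
    """n-counter version (FIG. 3): outputs 1 while n < N and 0 once n >= N."""

    def __init__(self, n_max: int) -> None:
        self.n_max = n_max   # value N held in register 34c
        self.n = 0           # n counter 34e, reset to zero

    def signal(self) -> int:
        return 1 if self.n < self.n_max else 0   # comparison unit 34d

    def on_compensation_instructed(self) -> None:
        self.n += 1

    def on_decode_instructed(self) -> None:
        self.n = 0

def continuation_from_frame_numbers(f_c: int, f_1: int, n_max: int) -> int:
    """Alternative: 1 if F_c - F_1 < N, else 0, using frame number counter 85."""
    return 1 if f_c - f_1 < n_max else 0

# Both versions agree over a gap that starts after frame 10 was decoded (N = 2).
cc, f_1 = CompensationContinuation(n_max=2), 10
f_c = f_1
history = []
for _ in range(4):
    history.append((cc.signal(), continuation_from_frame_numbers(f_c, f_1, 2)))
    if cc.signal():                   # conceal one frame: both counters advance
        cc.on_compensation_instructed()
        f_c += 1
print(history)  # [(1, 1), (1, 1), (0, 0), (0, 0)]
```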

Both configurations count the number of consecutive frames of packet loss compensation, so they are also called loss compensation processing counters.
The loss compensation instruction unit 24 issues a loss compensation instruction, which activates the loss compensation processing unit 83, when all of the following hold:
- the voice packet decoding unit 82 cannot take a voice packet out of the reception buffer 81;
- the DTX flag is "0";
- packet loss compensation has been performed fewer than N consecutive times, that is, the compensation continuation availability signal is "1".
For example, as shown in the figure, the loss compensation instruction unit 24 consists of an AND circuit 38, and it issues a loss compensation instruction when the output of the AND circuit 38 becomes "1".

The silence generation instruction unit 26 issues a silence generation instruction to the silence generation unit 87, activating it, when either of two conditions holds.
Condition 1:
- the voice packet decoding unit 82 cannot take a voice packet out of the reception buffer 81;
- the DTX flag is 0;
- packet loss compensation has already been performed N or more consecutive times, that is, the compensation continuation availability signal is "0".
Condition 2:
- the voice packet decoding unit 82 cannot take out a voice packet;
- the DTX flag is 1, meaning the transmitting side sends no voice packets during silent intervals.
For example, as shown in the figure, the AND circuit 40 detects condition 1 and the AND circuit 42 detects condition 2. If the output of either AND circuit 40 or 42 is "1", the silence generation instruction unit 26 outputs a silence generation instruction through the OR circuit 46.
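The gate logic of units 22, 24, and 26 can be written as bitwise operations on 0/1 signals. The text gives only the AND and OR circuits, so the inversions of the packet extraction availability signal, the DTX flag, and the continuation signal are assumptions. With these gates, exactly one instruction is active for every input combination.

```python
from typing import Dict

def control_signals(extractable: int, dtx_flag: int, continuation: int) -> Dict[str, int]:
    """Gate-level sketch of units 22, 24 and 26; all inputs and outputs are 0 or 1."""
    no_packet = 1 - extractable                                # assumed inverter on unit 20's signal
    decode = extractable                                       # voice packet decoding instruction unit 22
    compensate = no_packet & (1 - dtx_flag) & continuation     # AND circuit 38
    cond1 = no_packet & (1 - dtx_flag) & (1 - continuation)   # AND circuit 40
    cond2 = no_packet & dtx_flag                               # AND circuit 42
    silence = cond1 | cond2                                    # OR circuit 46
    return {"decode": decode, "compensate": compensate, "silence": silence}

print(control_signals(extractable=0, dtx_flag=0, continuation=1))
# {'decode': 0, 'compensate': 1, 'silence': 0}
```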

The control unit 84 sets the switch 89 in FIG. 1 according to the instruction issued:
- on a decoding instruction from the voice packet decoding instruction unit 22, it connects the output of the voice packet decoding unit 82 to the output terminal 88;
- on a loss compensation instruction from the loss compensation instruction unit 24, it connects the output of the loss compensation processing unit 83 to the output terminal 88;
- on a silence generation instruction from the silence generation instruction unit 26, it connects the output of the silence generation unit 87 to the output terminal 88.
As described above, the frame number of each voice packet that the voice packet decoding unit 82 takes out of the reception buffer 81 is set as the count value of the frame number counter 85. Each frame of packet loss compensation increments that count value by 1. When a silence signal is generated, however, the count value of the frame number counter 85 is not incremented and is left unchanged.

When the packet receiving unit 80 receives a voice packet, it first stores the packet in the received packet register 80a, as shown for example in FIG. 4. Within the packet receiving unit 80, the matching determination unit 80b compares the frame number of the received voice packet with the count value of the frame number counter 85. If the frame number is less than or equal to the count value, the packet is not stored in the reception buffer 81. Instead, the matching determination unit 80b activates the packet discard unit 80c, which discards the received voice packet. If the frame number is greater than the count value, the matching determination unit 80b opens the gate 80d, and the received voice packet passes through the gate 80d into the reception buffer 81.

This frame-number consistency check between a packet and the frames already played may instead be applied to voice packets taken out of the reception buffer 81 rather than to received voice packets. In that case, the corresponding functions are provided in the reception buffer 81, as indicated by the parenthesized reference numerals in FIG. 4. A voice packet taken out of the reception buffer 81 is stored in the extracted packet register 81a. It is then input to the matching determination unit 81b, which checks it against the count value of the frame number counter 85. The packet discard unit 81c then discards the extracted voice packet, or the gate 81d passes it to the voice packet decoding unit 82 for decoding.
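The following is a minimal sketch of the two placements of the frame-number check. Packets are reduced to frame numbers, the reception buffer is kept as a min-heap, and the function names are hypothetical.

```python
import heapq
from typing import List, Optional

def receive(buffer: List[int], frame_number: int, counter: int) -> bool:
    """Reception-side check (unit 80b): store only frames newer than the counter."""
    if frame_number <= counter:
        return False                       # packet discard unit 80c
    heapq.heappush(buffer, frame_number)   # gate 80d -> reception buffer 81
    return True

def take_out_checked(buffer: List[int], counter: int) -> Optional[int]:
    """Extraction-side variant (unit 81b): skip stale frames, return the next playable one."""
    while buffer:
        fn = heapq.heappop(buffer)
        if fn > counter:
            return fn                      # gate 81d -> voice packet decoding unit 82
        # otherwise packet discard unit 81c drops the stale frame and we try the next one
    return None
```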

This configuration example includes the DTX detection unit 86. When the DTX detection unit 86 is not provided, the following components are omitted: the DTX flag register 28, and the AND circuit 42 and the OR circuit 46 in the silence generation instruction unit 26.
[Example 3]
With reference to FIGS. 5 and 6, an embodiment of the method of the present invention in which the DTX detection unit 86 is provided will be described. Storing voice packets in the reception buffer 81 and extracting them from the reception buffer 81 operate asynchronously. FIG. 5 shows the flow of the storing operation.

First, the DTX flag register 28 is reset to 0 (S201). When the packet receiver 80 receives a voice packet (S203), the DTX detection unit 86 examines the packet header (S205) and determines whether the received voice packet uses DTX (S207). If DTX is used, the DTX flag register 28 is set to 1 (S209); if not, the DTX flag register 28 is left unchanged.

The match determination unit 80b then refers to the count value of the frame number counter 85 (S211) and compares it with the frame number of the received voice packet (S213). If the frame has not yet been reproduced, that is, if the frame number of the received voice packet is larger, the packet is stored in the reception buffer 81 (S215). If the frame has already been reproduced, that is, if the frame number of the received voice packet is equal to or less than the count value of the frame number counter 85, the packet discarding unit 80c discards the received voice packet (S217).

It is then checked whether the call has ended. If not, the process returns to step S203 to receive the next packet; if so, the storing process ends (S219).
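The storing flow of FIG. 5 can be sketched as a loop over arriving packets. This is a minimal illustration under stated assumptions:

- The predicate `header_indicates_dtx` is hypothetical, since the text does not say which header field reveals DTX.
- The counter is read through a callable because storing and extraction run asynchronously.

```python
def store_packets(packets, read_counter, header_indicates_dtx):
    """Sketch of the FIG. 5 storing flow (S201-S219) for one call.

    packets              -- iterable of (frame_number, packet) pairs in arrival order
    read_counter         -- callable returning the current value of frame number counter 85
    header_indicates_dtx -- hypothetical predicate standing in for DTX detection (S205-S207)
    Returns (buffer, dtx_flag, discarded) when the call ends.
    """
    dtx_flag = 0                                  # S201: reset DTX flag register 28
    buffer, discarded = {}, []
    for frame_number, packet in packets:          # S203: a packet arrives
        if header_indicates_dtx(packet):          # S205-S207
            dtx_flag = 1                          # S209: never cleared again during the call
        if frame_number > read_counter():         # S211-S213: frame not yet reproduced
            buffer[frame_number] = packet         # S215
        else:
            discarded.append(frame_number)        # S217
    return buffer, dtx_flag, discarded            # S219: call ended
```

Once set, the flag is never cleared by this flow, so the first DTX packet switches the extraction side to the silence path for the rest of the call.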

Next, the extraction of voice packets from the reception buffer 81 and the subsequent processing will be described with reference to FIG. 6. The loss compensation processing counter in the compensation continuation enable/disable signal generation unit 34 is reset to 0 (S301). The voice packet decoding unit 82 then attempts to extract a voice packet from the reception buffer 81 (S303), and it is determined whether the reception buffer 81 holds a voice packet that can be extracted (S305).

If such a packet exists, the following steps are performed:

- The voice packet with the smallest frame number is extracted (S307).
- The packet is decoded by the voice packet decoding unit 82 (S309).
- The frame number of the decoded voice packet is set in the frame number counter 85 as its count value (S311).
- The decoded voice signal is output to the output terminal 88. If a voice signal transmission unit is also provided, the decoded voice signal is also sent to the echo canceller (S313).

Next, it is determined whether the call has ended (S315). If not, the process returns to step S303 to extract the next voice packet from the reception buffer 81.
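The normal path of FIG. 6 (S303 to S311) can be sketched with Python's `heapq` module, which always yields the smallest frame number first. Using a heap for the reception buffer and a caller-supplied `decode` function are assumptions. The text only requires that the packet with the smallest frame number be taken out.

```python
import heapq


def extract_and_decode(heap, decode):
    """Sketch of S303-S311: take the smallest-numbered packet and decode it.

    heap   -- list kept as a min-heap of (frame_number, payload); stands in for buffer 81
    decode -- hypothetical decoder callable(payload) -> samples (decoding unit 82)
    Returns (samples, new_counter_value), or None when no packet can be extracted.
    """
    if not heap:                                   # S305: nothing to extract
        return None
    frame_number, payload = heapq.heappop(heap)    # S307: smallest frame number first
    samples = decode(payload)                      # S309
    return samples, frame_number                   # S311: counter 85 := this frame number
```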

On the other hand, if there is no voice packet to be extracted from the reception buffer 81 in step S305, the value of the DTX flag is checked (S317).

If the DTX flag is 0, that is, if DTX is not in use, it is determined whether the number of consecutive packet loss compensation operations is less than N (S319). If it is, the following steps are performed:

- The loss compensation processing unit 83 is activated to generate a loss compensation signal for one frame (S321).
- The count value of the frame number counter 85 is incremented by 1 (S323).
- The generated compensation signal is output to the output terminal 88 (S313).

The silence generation unit 87 is activated (S325) and a silence signal is output to the output terminal 88 (S313) in either of two cases:

- the DTX flag is 1 in step S317, that is, DTX is in use; or
- it is determined in step S319 that the number of consecutive packet loss compensation operations is N or more.

When it is determined in step S315 that the call has ended, the voice packet extraction and the associated processing described above end.
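The branch taken when the buffer is empty (S317 to S325) reduces to a small decision function. In this hypothetical sketch, `consecutive` models the loss compensation processing counter of unit 34. The text does not state when that counter is cleared, so resetting it after a successful decode is left to the caller as an assumption.

```python
def on_empty_buffer(dtx_flag, consecutive, n, counter):
    """Sketch of S317-S325 when no packet can be extracted (S305: no).

    dtx_flag    -- value of DTX flag register 28 (0 or 1)
    consecutive -- loss compensation frames produced in a row so far
    n           -- preset limit N
    counter     -- current value of frame number counter 85
    Returns (action, consecutive, counter) after this frame.
    """
    if dtx_flag == 0 and consecutive < n:         # S317 -> S319
        # S321: conceal one frame; S323: advance the counter by one.
        return "conceal", consecutive + 1, counter + 1
    # S325: DTX in use, or N concealed frames already produced.
    # A silence signal is output, but the frame number counter is NOT advanced.
    return "silence", consecutive, counter
```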
[Example 4]
Next, with reference to FIGS. 5 and 7, an embodiment in which the frame-number consistency check is performed when a voice packet is extracted from the reception buffer will be described. In this case, as indicated by the broken line in FIG. 5, steps S211, S213, and S217 are omitted from the process of storing voice packets in the reception buffer 81, and the process proceeds directly from step S209 to step S215.

The process of extracting voice packets from the reception buffer 81 is shown in FIG. 7, where steps corresponding to those in FIG. 6 are given the same numbers. The difference lies in what happens after the voice packet decoding unit 82 extracts the voice packet with the smallest frame number. The unit refers to the count value of the frame number counter 85 (S601) and compares this count value with the frame number of the extracted voice packet (S603):

- If the frame number of the extracted packet is larger, the packet is judged not yet reproduced, and the process moves to step S309 to decode it.
- If the frame number is equal to or less than the count value, the packet is judged already reproduced. The extracted voice packet is discarded (S605), and the process moves to step S315.
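In this variant the stale-packet test moves from the receive side to the extraction side. A minimal sketch follows, again assuming a `heapq` min-heap stands in for the reception buffer 81 and using hypothetical action labels.

```python
import heapq


def extract_with_check(heap, counter):
    """Sketch of FIG. 7 (S307, S601-S605): check the frame number at extraction time.

    heap    -- min-heap of (frame_number, payload) standing in for buffer 81
    counter -- current value of frame number counter 85
    Returns ("decode", frame_number, payload), ("discard", frame_number, None),
    or None if the buffer is empty.
    """
    if not heap:
        return None
    frame_number, payload = heapq.heappop(heap)     # S307: smallest frame number
    if frame_number > counter:                      # S601-S603: not yet reproduced
        return "decode", frame_number, payload      # continue with S309
    return "discard", frame_number, None            # S605, then S315
```

Compared with checking on receipt, this keeps the receive path trivial at the cost of briefly holding late packets in the buffer.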

In the above description, the silence signal need not be completely silent; it may be a low-amplitude signal that does not disturb the listener.
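As a hedged illustration, one frame of "silence" could be produced either as true digital zeros or as very low-level noise. The uniform noise, the integer PCM representation, and the parameter names below are assumptions. The text only requires a signal small enough not to disturb the listener.

```python
import random


def silence_frame(frame_len, amplitude=0, seed=None):
    """Return one frame of 'silence' as integer PCM samples.

    amplitude=0 gives true digital silence; a small positive amplitude gives
    low-level uniform noise in [-amplitude, amplitude] (an illustrative choice).
    """
    if amplitude == 0:
        return [0] * frame_len
    rng = random.Random(seed)
    return [rng.randint(-amplitude, amplitude) for _ in range(frame_len)]
```

For example, `silence_frame(160)` would be one 20 ms frame at an assumed 8 kHz sampling rate.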
The apparatus shown in FIGS. 1 and 2 may also be implemented by a computer. In this case, a program that causes the computer to function as the apparatus shown in FIGS. 1 and 2 may be recorded on a recording medium such as a CD-ROM or a magnetic disk. Alternatively, the program may be downloaded to the computer via a communication line.
With the configuration of the present invention, when voice packets do not arrive from the network and the reception buffer is empty, loss compensation processing is performed to prevent degradation of the reproduced sound. However, continuing loss compensation for several seconds tends to degrade the reproduced sound. Therefore, once loss compensation has been performed for a preset number N of frames, a silence signal is generated and output. Because this silence signal is also input to the echo canceller, the echo canceller can operate correctly.

Furthermore, a DTX detection unit can be provided to determine why the reception buffer became empty: either because the DTX function is in use, or because of larger-than-expected jitter. Processing suited to each case is then performed, so that sound quality is not degraded.
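The N-frame rule can be traced over a sequence of frames to show when concealment gives way to silence. In this sketch the function names are hypothetical. Rounding a duration up to whole frames, and ending a gap when a packet is decoded, are both assumptions.

```python
import math


def frames_for_ms(duration_ms, frame_ms):
    """Frames needed to cover duration_ms; rounding up is an assumption."""
    return math.ceil(duration_ms / frame_ms)


def playout_trace(available, n):
    """Output chosen for each frame under the N-frame rule.

    available -- one boolean per frame: True if a packet could be extracted
    n         -- number of frames for which loss compensation may continue
    """
    trace, missing_run = [], 0
    for has_packet in available:
        if has_packet:
            trace.append("decode")
            missing_run = 0              # assumption: a decoded frame ends the gap
        else:
            trace.append("conceal" if missing_run < n else "silence")
            missing_run += 1
    return trace


N = frames_for_ms(60, 20)    # 60 ms of 20 ms frames gives N = 3
```

With 20 ms frames, the 40 to 100 ms range mentioned in the claims corresponds to N = 2 to 5.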

FIG. 1 is a block diagram showing a functional configuration example of Example 1 of the present invention.
FIG. 2 is a diagram showing a specific functional configuration example of the control unit 84 in FIG. 1.
FIG. 3 is a block diagram showing a specific configuration example of the compensation continuation enable/disable signal generation unit 34 in FIG. 2.
FIG. 4 is a block diagram showing a functional configuration for determining consistency between the frame number of a received or extracted packet and the frame number counter 85.
FIG. 5 is a flowchart showing an example of the process of storing packets in the reception buffer 81 in Examples 3 and 4 of the present invention.
FIG. 6 is a flowchart showing an example of the process of extracting packets from the reception buffer 81 in Example 3 of the present invention.
FIG. 7 is a flowchart showing an example of the process of extracting packets from the reception buffer 81 in Example 4 of the present invention.
FIG. 8 is a block diagram showing an example of a prior-art system configuration in which voice signals are converted into voice packets and communicated over a packet communication network.
FIG. 9 is a block diagram showing a configuration example of a prior-art voice signal receiving unit.
FIG. 10 is a block diagram showing a configuration example of a prior-art transmitting/receiving apparatus with an echo canceller, as used in the system of FIG. 8.

Claims

A voice packet receiving and reproducing method that stores received voice packets in a reception buffer, sequentially extracts voice packets from the reception buffer in order of frame number, decodes the extracted voice packets, and outputs a voice signal, the method comprising:
a first determination step of determining, for each frame, whether there is a voice packet to be extracted from the reception buffer;
a second determination step of determining whether to perform packet loss compensation processing or silence generation processing when the first determination step determines that there is no voice packet; and
an output step of, when the first determination step determines that there is no voice packet, generating a signal by performing either the packet loss compensation processing or the silence generation processing according to the determination in the second determination step, and outputting the signal;
wherein, until the state in which the first determination step determines that there is no voice packet has continued for N frames (N being an integer of 0 or more), the second determination step determines that packet loss compensation processing is to be performed, and
when that state continues beyond N frames, the second determination step determines that silence generation processing is to be performed.

The method according to claim 1, further comprising
a DTX detection step of observing a series of received voice packets and estimating whether the received voice packets use silence compression,
wherein N ≥ 1 if the DTX detection step estimates that silence compression is not used, and
N = 0 if the DTX detection step estimates that silence compression is used.

The method according to claim 1 or 2, further comprising:
storing the corresponding frame number in a frame number counter each time a voice packet is extracted from the reception buffer and decoded;
incrementing the count value of the frame number counter by 1 each time the packet loss compensation processing is performed for one frame;
maintaining the count value of the frame number counter, without incrementing it, each time the silence generation processing is performed;
a frame number determination step of, when the first determination step determines that there is a voice packet to be extracted, determining whether the frame number of the extracted voice packet is equal to or less than the frame number indicated by the frame number counter; and
discarding the voice packet if the frame number determination step determines that its frame number is equal to or less than that frame number, and otherwise decoding the extracted voice packet and outputting a voice signal.

The method according to claim 2 or 3, wherein,
when the DTX detection step estimates that silence compression is not used, N is set to the number of frames corresponding to 40 to 100 milliseconds.

A computer-readable recording medium storing a program for causing a computer to execute each step of the voice packet receiving and reproducing method according to any one of claims 1 to 4.

A voice packet receiving and reproducing apparatus that stores received voice packets in a reception buffer, sequentially extracts voice packets from the reception buffer in order of frame number, decodes the extracted voice packets in a decoding unit, outputs the voice signal to an output terminal, and stores the frame number in a frame number counter, the apparatus comprising:
a packet extraction enable/disable signal generation unit that generates, for each frame, a packet extraction enable/disable signal indicating whether there is a voice packet to be extracted from the reception buffer;
a compensation continuation enable/disable signal generation unit that receives the packet extraction enable/disable signal from the packet extraction enable/disable signal generation unit and generates a compensation continuation enable/disable signal, the signal indicating whether the state in which the packet extraction enable/disable signal indicates that extraction is impossible has continued for N frames (N being an integer of 0 or more);
a loss compensation unit that receives, for each frame, the packet extraction enable/disable signal and the compensation continuation enable/disable signal, and, when the packet extraction enable/disable signal indicates that extraction is impossible and the compensation continuation enable/disable signal indicates that compensation may continue, executes packet loss compensation processing for one frame to generate a voice signal, outputs the voice signal to the output terminal, and increments the count value of the frame number counter by 1; and
a silence generation unit that receives, for each frame, the packet extraction enable/disable signal and the compensation continuation enable/disable signal, and, when the packet extraction enable/disable signal indicates that extraction is impossible and the compensation continuation enable/disable signal indicates that compensation may not continue, generates a silence signal and outputs it to the output terminal without incrementing the count value of the frame number counter.

The apparatus according to claim 6, further comprising
a DTX detection unit that receives the received voice packets and detects whether the transmitting side uses silence compression,
wherein the detection output of the DTX detection unit is input, N is set to 1 or more if the detection output indicates that silence compression is not used, and N is set to 0 if the detection output indicates that silence compression is used.
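The per-frame control logic of the apparatus claims can be summarized as Boolean expressions. This is a hedged sketch rather than the circuit of FIG. 2. The function and parameter names are hypothetical, and the DTX output is applied only through N, as the claims describe.

```python
def control_signals(extractable, missing_run, n_default, dtx_used):
    """Decide which unit runs this frame, following the apparatus claims.

    extractable -- packet extraction enable/disable signal (True: a packet can be extracted)
    missing_run -- consecutive frames so far in which extraction was impossible
    n_default   -- N (>= 1) used when silence compression is not detected
    dtx_used    -- detection output of the DTX detection unit
    Returns (run_loss_compensation, run_silence_generation).
    """
    n = 0 if dtx_used else n_default                  # N = 0 when silence compression is used
    can_continue = missing_run < n                    # compensation continuation signal
    compensate = (not extractable) and can_continue   # loss compensation unit fires
    silence = (not extractable) and not can_continue  # silence generation unit fires
    return compensate, silence
```

At most one of the two outputs is true for any frame, and neither is true when a packet can be extracted.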