JP2005107283A

JP2005107283A - Method, device and program of packet loss concealment in voip voice communication

Info

Publication number: JP2005107283A
Application number: JP2003341918A
Authority: JP
Inventors: Tadashi Aoki; 直史青木; Takashi Nakano; 隆司中野
Original assignee: Individual
Current assignee: Individual
Priority date: 2003-09-30
Filing date: 2003-09-30
Publication date: 2005-04-21

Abstract

PROBLEM TO BE SOLVED: To provide a packet loss concealment method and device for VoIP voice communication that conceal lost packets by waveform replication effectively and with high accuracy, so as to suppress the deterioration of call quality in VoIP as much as possible.

SOLUTION: On detecting a lost packet while receiving packets transmitted via an IP network, the device calculates the pitches of the frames before and after the lost portion in the lost packet (steps S17, S19), calculates the pitch variation rate from those pitches (step S21), and compares the pitch variation rate with a predetermined threshold (step S23). When the pitch variation rate is larger than the threshold, it executes the ordinary 2-side PWR (Pitch Waveform Replication) method (step S25); when it is smaller, it executes a 2-side PWR method that takes the pitch variation rate into account (step S27).

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to a packet loss concealment method, apparatus, and program for VoIP (Voice over Internet Protocol) voice communication over an IP network, in which lost packets are concealed by waveform replication.

In recent years, VoIP technology, which carries voice communication over an IP network, has attracted attention (see, for example, Non-Patent Document 1). Because VoIP telephone service can offer lower charges than conventional telephone service, whose call charges are proportional to distance and time, it has spread rapidly and is expected to be the next-generation telephone service (see, for example, Non-Patent Document 2).

However, whereas conventional telephone service carries voice over a guaranteed-type network, VoIP carries voice over a best-effort IP network that is inherently unsuited to real-time communication. Communication errors such as packet loss and delay are therefore unavoidable, and VoIP has the fundamental problem that call quality is prone to deteriorate (see, for example, Non-Patent Documents 3 and 4).

To realize voice communication by VoIP, it is important to provision the network so that packets are reliably received within a fixed time, so that the errors described above occur as rarely as possible (see, for example, Non-Patent Document 5). However, a best-effort network cannot be completely controlled, and it is difficult to eliminate errors entirely. It is therefore necessary to assume that some errors will occur and to take measures that keep the resulting loss of call quality as small as possible. In non-real-time communication, an error is normally handled by retransmitting the affected packet. VoIP, which requires real-time operation, has almost no time to spare for retransmission, so it needs error concealment processing that does not require packets to be retransmitted.

Various error concealment methods have been proposed. They fall broadly into two types: sender-based methods, which act on the transmitting side, and receiver-based methods, which act on the receiving side (Non-Patent Documents 6 and 7). Three kinds of error situation can occur: a momentary interruption caused by the loss of a single packet, an overrun caused by overflow of the receive buffer, and an underrun caused by underflow of the receive buffer.

In VoIP voice communication, during a call the transmitting side AD-converts the analog voice, divides the digitized voice data into frames, and compresses it. It then builds IP packets by placing the compressed voice data in the payload and transmits them to the receiving side over the IP network. On receiving an IP packet, the receiving side disassembles the packet, decompresses the compressed voice data, and reproduces the analog voice by DA conversion.

To realize real-time communication, VoIP uses UDP (User Datagram Protocol) as its transport-layer protocol. Consequently, a packet is not retransmitted even when a communication error such as loss or delay occurs. The frame length of the compressed voice data carried in the payload is commonly set to 20 ms. The codecs supported differ from system to system, but as networks have moved to broadband, ITU Recommendation G.711 has come into general use. G.711 is a codec that logarithmically quantizes voice data digitized at a sampling frequency of 8 kHz to a precision of 8 bits; its compression efficiency is lower than that of other codecs, but it keeps the degradation of call quality small. G.711 has two variants, μ-law and A-law.
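As a rough illustration of the figures quoted above, the following Python sketch computes the samples and payload bytes in one 20 ms G.711 frame and evaluates the textbook continuous μ-law companding curve. The function names are hypothetical, and the curve is the standard analytic formula with μ = 255, not the bit-exact segmented encoder specified in G.711.

```python
import math

SAMPLE_RATE_HZ = 8000   # sampling frequency stated in the text
FRAME_MS = 20           # typical frame length stated in the text
BITS_PER_SAMPLE = 8     # quantization precision stated in the text


def samples_per_frame(sample_rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """Number of audio samples carried by one frame."""
    return sample_rate_hz * frame_ms // 1000


def payload_bytes(sample_rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS,
                  bits=BITS_PER_SAMPLE):
    """Size in bytes of one frame of encoded audio."""
    return samples_per_frame(sample_rate_hz, frame_ms) * bits // 8


def mu_law_compress(x, mu=255.0):
    """Continuous mu-law curve for x in [-1, 1] (analytic form only,
    an illustration of logarithmic quantization, not the G.711 encoder)."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
```

The logarithmic curve spends most of its output range on small amplitudes, which is why 8 bits per sample give acceptable speech quality.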

The waveform replication method, a conventional receiver-based error concealment process, estimates a replacement block to stand in for the lost frame from normally received voice data and conceals the error by copying this block into the lost frame. Waveform replication methods are classified, according to how the replacement block is defined, into the WR (Wave Replication) method and the PWR (Pitch Waveform Replication) method.

As shown in FIG. 9(a), the WR method identifies the most highly correlated portion by template matching, takes the voice data immediately following it as the replacement block, and conceals the error by copying this block into the lost frame in one piece. As shown in FIG. 9(b), the PWR method identifies a one-pitch waveform by template matching, takes it as the replacement block, and conceals the error by copying this block into the lost frame periodically and repeatedly.

As shown in FIG. 9, let L be the frame length, M the search window length, N the template length, and s(n) the received voice data. Assuming that the k-th (k ≥ 0) packet is lost, template matching in the WR method amounts to finding the time m that maximizes the cross-correlation function C(m) defined by equation (1).

Template matching in the PWR method, on the other hand, amounts to finding the time m that maximizes the cross-correlation function C(m) defined by equation (2).

In the WR method, the length of voice data needed for processing is L + M + N, so at least the two packets immediately before the lost frame must have been received normally. In the PWR method, the length needed is M + N, and the maximum pitch length that can be extracted equals the search window length M. Male pitch lengths are generally distributed from 5 ms to 12 ms and female pitch lengths from 2 ms to 7 ms. With a frame length of L = 20 ms and a template length of N ≤ 8 ms, pitch waveforms with pitch lengths up to 12 ms can therefore be extracted from just the one packet received immediately before the lost frame.
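The one-sided PWR procedure described above can be sketched in Python as follows. Equations (1) and (2) are not reproduced in this text, so the sketch assumes a normalized cross-correlation as the matching score. The function names, the lag bounds, and the small tie-breaking tolerance that favors the shorter lag are all illustrative assumptions, not the patent's definitions.

```python
import math


def estimate_pitch(history, template_len, min_lag, max_lag):
    """Return the lag (in samples) at which the last `template_len` samples
    of `history` best match the samples `lag` earlier.

    Assumption: the score is a normalized cross-correlation; the patent's
    own equation (2) is not available here.
    """
    n = len(history)
    template = history[n - template_len:]
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        start = n - template_len - lag
        if start < 0:
            break  # not enough received data for this lag
        candidate = history[start:start + template_len]
        num = sum(a * b for a, b in zip(template, candidate))
        den = math.sqrt(sum(a * a for a in template) *
                        sum(b * b for b in candidate))
        score = num / den if den > 0 else 0.0
        # The tolerance keeps the shortest lag when multiples of the pitch
        # score equally well (avoids picking a doubled pitch).
        if score > best_score + 1e-9:
            best_lag, best_score = lag, score
    return best_lag


def pwr_backward(history, frame_len, pitch):
    """Fill a lost frame by periodically repeating the last `pitch`
    samples of `history` (the one-pitch replacement block)."""
    block = history[-pitch:]
    return [block[i % pitch] for i in range(frame_len)]
```

With 8 kHz audio, one 20 ms packet is 160 samples, so a call such as `estimate_pitch(prev_frame, 64, 16, 96)` corresponds to N = 8 ms and pitch lengths from 2 ms to 12 ms, which fits within M + N = 20 ms of history.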

When the frame length is short, the WR method also conceals errors effectively. However, as the voice data needed for processing becomes longer, its stationarity can no longer be adequately guaranteed, so when the frame length is long the PWR method is more effective.

Basically, the WR and PWR methods conceal errors using the voice data immediately before the lost frame. However, because VoIP sets up a receive buffer of about three packets to absorb delay jitter, extending them to 2-side processing, which uses the voice data both before and after the lost frame, makes more accurate concealment possible. In particular, because the PWR method needs only one packet of voice data, the look-ahead delay can be held to one packet even when it is extended to 2-side processing, which makes it highly practical.

As shown in FIG. 10, the 2-side PWR method runs the PWR method on one packet before and one packet after the lost frame and conceals the error by overlapping the two results. This improves concealment accuracy in markedly non-stationary portions such as the onset and end of a speech waveform (Non-Patent Document 8).

The conventional 2-side PWR method is described in detail with reference to FIG. 10. Suppose the original speech shown in FIG. 10(a) is received and the k-th frame fails to arrive and is lost, as shown in FIG. 10(b). For this lost frame, backward PWR is performed: as shown in FIG. 10(c), a replacement block is estimated from the preceding (k-1)-th frame and copied into the lost frame. Forward PWR is then performed: as shown in FIG. 10(d), a replacement block is estimated from the following (k+1)-th frame and copied into the lost frame.

Then, as shown in FIG. 10(e), the replacement blocks from the front side in FIG. 10(c) and from the rear side in FIG. 10(d) are overlapped and added with proportional weighting to generate the error-concealed speech. As a result, the lost frame is continuous with both the preceding and the following frame and is reproduced as continuous speech without a gap.
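The overlap-and-add step can be sketched as a crossfade between the two candidate frames. The text says only that the blend is proportional, so the linear ramp below, with full weight on the backward result at the start of the lost frame and full weight on the forward result at the end, is an assumption about the exact weighting, and `overlap_add` is a hypothetical name.

```python
def overlap_add(backward, forward):
    """Blend two equally long candidate frames with linear weights.

    `backward` is the frame built from the preceding packet and `forward`
    the frame built from the following packet. The linear ramp is an
    assumed form of the "proportional" weighting described in the text.
    """
    if len(backward) != len(forward):
        raise ValueError("candidate frames must have the same length")
    n = len(backward)
    if n == 1:
        return [0.5 * (backward[0] + forward[0])]
    out = []
    for i in range(n):
        w = i / (n - 1)  # 0 at the start of the lost frame, 1 at the end
        out.append((1.0 - w) * backward[i] + w * forward[i])
    return out
```

Because each candidate dominates next to the packet it was derived from, the concealed frame joins both neighbors smoothly. Any phase mismatch between the two candidates shows up in the middle of the frame, where both contribute.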

Comparing the reproduced error-concealed speech of FIG. 10(e) with the original speech of FIG. 10(a) yields the difference waveform shown in FIG. 10(f), which shows that the concealed speech differs considerably from the original. This is because the waveform of the replacement block generated from the preceding frame and that generated from the following frame are out of phase, that is, the pitches of the two waveforms are misaligned.

Non-Patent Document 1: Hiroshi Fujiwara, Multimedia Information Compression, Kyoritsu Shuppan, 2000
Non-Patent Document 2: G. Held, Voice & Data Integration Technology Guide, Impress, 2000
Non-Patent Document 3: Keiichi Imai, "Problems in Realizing VoIP", IEICE Journal, vol. 83, no. 4, pp. 295-301, 2001
Non-Patent Document 4: Hiromi Nagabuchi, "Problems on VoIP Quality", IEICE Technical Report, vol. IN2000-128, 2000
Non-Patent Document 5: Iwao Toda, Detailed Network QoS Technology, Ohmsha, 2001
Non-Patent Document 6: C. Perkins, O. Hodson and V. Hardman, "A survey of packet loss recovery techniques for streaming audio", IEEE Network Magazine, pp. 40-48, September/October 1998
Non-Patent Document 7: H. Sanneck, "Packet Loss Recovery and Control for Voice Transmission over the Internet", Ph.D. thesis, Technical University Berlin, 2000
Non-Patent Document 8: Noriko Komaki, Naofumi Aoki, Tsuyoshi Yamamoto, "A Method for Concealing Packet Loss in VoIP Based on Waveform Replacement", IEICE Technical Report, vol. CQ2002-59, 2002

As described above, VoIP carries voice over a best-effort IP network that is inherently unsuited to real-time communication, so communication errors such as packet loss and delay are unavoidable and call quality is prone to deteriorate. The 2-side PWR method, the conventional waveform-replication method for concealing such packet loss, suffers from a phase shift between the waveforms generated from the preceding and following frames. The error-concealed speech therefore differs considerably from the original speech, producing a relatively large difference waveform such as that shown in FIG. 10(f), and the error cannot be concealed effectively.

The present invention has been made in view of the above. Its object is to provide a packet loss concealment method, apparatus, and program for VoIP voice communication that can conceal lost packets by waveform replication effectively and with high accuracy, so as to suppress the deterioration of call quality in VoIP as much as possible.

The packet loss concealment method in VoIP voice communication according to claim 1 conceals, by waveform replication, packets lost in VoIP voice communication over an IP network. In summary, the method detects a lost packet while receiving packets transmitted via the IP network; calculates the pitches of the frames before and after the lost portion in the detected lost packet; calculates the pitch variation rate between those frames from the calculated pitches; compares the calculated pitch variation rate with a predetermined threshold; and, as a result of the comparison, performs the ordinary 2-side PWR (Pitch Waveform Replication) method when the pitch variation rate is larger than the threshold, and performs a 2-side PWR method that takes the pitch variation rate into account when it is smaller.

The packet loss concealment apparatus in VoIP voice communication according to claim 2 conceals, by waveform replication, packets lost in VoIP voice communication over an IP network. In summary, the apparatus comprises lost packet detection means for detecting a lost packet while receiving packets transmitted via the IP network; pitch calculation means for calculating the pitches of the frames before and after the lost portion in the detected lost packet; pitch variation rate calculation means for calculating the pitch variation rate between those frames from the calculated pitches; and comparison means for comparing the calculated pitch variation rate with a predetermined threshold. As a result of the comparison, it performs the ordinary 2-side PWR (Pitch Waveform Replication) method when the pitch variation rate is larger than the threshold, and performs a 2-side PWR method that takes the pitch variation rate into account when it is smaller.

The packet loss concealment program in VoIP voice communication according to claim 3 is a computer-executable program for concealing, by waveform replication, packets lost in VoIP voice communication over an IP network. In summary, the program causes the computer to function as lost packet detection means for detecting a lost packet while receiving packets transmitted via the IP network; pitch calculation means for calculating the pitches of the frames before and after the lost portion in the detected lost packet; pitch variation rate calculation means for calculating the pitch variation rate between those frames from the calculated pitches; comparison means for comparing the calculated pitch variation rate with a predetermined threshold; and PWR execution means that, as a result of the comparison, performs the ordinary 2-side PWR (Pitch Waveform Replication) method when the pitch variation rate is larger than the threshold and performs a 2-side PWR method that takes the pitch variation rate into account when it is smaller.

According to the present invention, the pitches of the frames before and after the lost portion in a lost packet detected during reception are calculated, the pitch variation rate is calculated from those pitches, and the pitch variation rate is compared with a predetermined threshold. When the pitch variation rate is larger than the threshold, the ordinary 2-side PWR method is performed; when it is smaller, a 2-side PWR method that takes the pitch variation rate into account is performed. The phase shift is thereby suppressed, and errors can be concealed effectively and with high accuracy.

In the packet loss concealment method in VoIP voice communication of the present invention, when a lost packet is detected while receiving packets transmitted via the IP network, the pitches of the frames before and after the lost portion in the detected lost packet are calculated, the pitch variation rate is calculated from those pitches, and the pitch variation rate is compared with a predetermined threshold. When the pitch variation rate is larger than the threshold, the ordinary 2-side PWR method is performed; when it is smaller, a 2-side PWR method that takes the pitch variation rate into account is performed.

FIG. 1 is a flowchart showing the processing procedure of a packet loss concealment method in VoIP voice communication according to an embodiment of the present invention.

The packet loss concealment method of this embodiment uses the conventional 2-side PWR method described with FIG. 10, but takes the pitch variation before and after the lost frame into account to reduce the phase shift. This reduces the difference between the replicated error-concealed speech and the original speech, so that lost packets can be concealed by waveform replication effectively and with high accuracy.

More specifically, the method of this embodiment performs error concealment by waveform replication effectively, in order to suppress as far as possible the loss of communication quality caused by communication errors such as packet loss and delay in VoIP voice communication. For error concealment, this embodiment uses a receiver-based approach, handled on the receiving side, and within that approach builds on the waveform replication method, which conceals errors accurately despite its relatively simple processing. To conceal errors more effectively, it uses the 2-side PWR method, which extends the processing to use the voice data both before and after the lost frame. In addition, when concealing from the two sides, it takes the variation of the extracted pitch into account so as to reduce the phase shift, and thereby performs the concealment from front and rear with high accuracy.

The packet loss concealment method in VoIP voice communication according to an embodiment of the present invention is described in detail with reference to FIG. 1.

This embodiment uses the 2-side PWR method and, to improve its accuracy further, interpolates with the pitch variation of the voice data before and after the lost frame taken into account. In this embodiment, the template length in the 2-side PWR method is set to N = 8 ms, and the search window length is set to M = 10 ms so that the minimum pitch length is 2 ms and the maximum is 12 ms.

In FIG. 1, the receiving side, which receives packets transmitted via the IP network in VoIP voice communication, acquires the (k+1)-th packet (step S11) and the voice data of the (k+1)-th frame (step S13). It then checks whether the k-th packet has been lost, that is, the lost packet detection means checks whether the k-th packet was lost and not received normally (step S15).

If the k-th packet has been lost and not received normally, the pitch calculation means first extracts the pitch P_b of the (k-1)-th frame, the immediately preceding frame (step S17), and then the pitch P_f of the (k+1)-th frame, the immediately following frame (step S19). The pitch variation rate calculation means then calculates the pitch variation rate α from the two extracted pitches by the following equation (step S21).

$$\alpha = \frac{P_f - P_b}{L} \tag{3}$$

Here, L is the frame length.

The pitch variation rate α is explained with reference to FIG. 2. In FIG. 2, the frame length is L and the k-th frame is the lost frame, whose pitch variation rate is to be calculated. If the pitch of the (k-1)-th frame is P_b and the pitch of the (k+1)-th frame is P_f, the pitch variation rate α in the k-th frame between them is given by equation (3); it is the slope of the line segment defined by the two pitches P_b and P_f. The pitch length at each point in the k-th frame can therefore be calculated from α: it is obtained by multiplying α by the time n elapsed since the start of the k-th frame (n = 0) and adding the pitch length P_b at the start of the frame, that is, P_b + αn.
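Equation (3) and the linear pitch interpolation it implies can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that pitches, the frame length, and the time n are all measured in samples.

```python
def pitch_variation_rate(p_b, p_f, frame_len):
    """Equation (3): slope of the line from P_b to P_f across the lost frame."""
    return (p_f - p_b) / frame_len


def pitch_at(n, p_b, alpha):
    """Interpolated pitch length at time n, counted from the start of the
    lost frame (n = 0), as described in the text: P_b + alpha * n."""
    return p_b + alpha * n
```

For instance, with P_b = 40 samples, P_f = 48 samples, and L = 160 samples, the interpolated pitch rises linearly from P_b at n = 0 to P_f at n = L.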

In a stationary section of voiced sound, the pitch lengths of the frames before and after the lost frame are expected to be similar, and the absolute value of the pitch variation rate α is then small. In a transition from silence or unvoiced sound to voiced sound, or the reverse, the signal is strongly non-stationary and the absolute value of the pitch variation rate is not necessarily small.

FIG. 3 is a histogram of the absolute value of the pitch variation rate α extracted from 10 male and 10 female voice samples. The vertical axis shows the number of packets and the horizontal axis the absolute value of α. The voice data are speech materials selected at random from the speech database of the Acoustical Society of Japan and resampled at a sampling frequency of 8 kHz with 16-bit quantization. The total duration of the voice data was 89.92 s, or 4496 packets. As FIG. 3 shows, the absolute value of α in ordinary voice data tends to concentrate near 0, and there is no large variation.

Returning to the flowchart of FIG. 1, in this embodiment the comparison means sets a threshold T for the absolute value of the pitch variation rate α and determines whether the absolute value of α is smaller than T (step S23). If the absolute value of α is larger than T, the conventional 2-side PWR method is performed: as explained with FIG. 10, the replacement block from the front side and the replacement block from the rear side are extrapolated without changing the pitch length and then overlapped and added (step S25).

Otherwise, that is, when the absolute value of α is smaller than T, the lost frame is regarded as a stationary segment of voiced speech, and the 2-side PWR method that takes pitch fluctuation into account is performed. Each time a pitch waveform of the replacement block is copied into the lost frame, the pitch length at the copy start time n of that pitch waveform (n = 0 at the start of the lost frame) is calculated by equation (4) for backward PWR, which replaces from the front frame, and by equation (5) for forward PWR, which replaces from the rear frame, and the pitch length of each pitch waveform is updated accordingly (step S27).
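The branch taken at step S23 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `select_concealment_method` and the returned labels are hypothetical, the default threshold of 0.1 is the value used in the evaluation experiment, and the treatment of the boundary case |α| = T is an assumption because the text does not specify it. The calculation of α itself and equations (4) and (5) are defined elsewhere in the description and are not reproduced here.

```python
def select_concealment_method(alpha: float, threshold: float = 0.1) -> str:
    """Choose the 2-side PWR variant from the pitch fluctuation rate (step S23).

    alpha     -- pitch fluctuation rate between the frames around the lost frame
    threshold -- threshold T applied to the absolute value of alpha
    """
    if abs(alpha) < threshold:
        # Lost frame is treated as a stationary voiced segment (step S27).
        return "pitch-aware 2-side PWR"
    # Assumption: |alpha| == T is handled like the "larger" case (step S25).
    return "conventional 2-side PWR"


if __name__ == "__main__":
    for a in (0.02, -0.05, 0.3):
        print(a, "->", select_concealment_method(a))
```

Only the absolute value of α is compared, so a falling pitch and a rising pitch of the same magnitude select the same branch.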

The 2-side PWR method of step S27, which takes the pitch fluctuation rate α into account, is described below with reference to the waveform diagrams of FIG. 4.
Suppose that the original speech shown in FIG. 4(a) is received and that the k-th frame cannot be received and is lost, as shown in FIG. 4(b). For this lost frame, backward PWR is performed, in which a replacement block is estimated from the preceding frame k-1 and copied into the lost frame, as shown in FIG. 4(c). Forward PWR is then performed, in which a replacement block is estimated from the following frame k+1 and copied into the lost frame, as shown in FIG. 4(d). In backward PWR and forward PWR, the pitch length is calculated for every copy by equations (4) and (5), respectively, and each copy is made with the calculated pitch length. As a result, as indicated by the vertical lines spanning FIG. 4(c) and FIG. 4(d), the two waveforms are copied with almost no phase shift between them.

The backward PWR and forward PWR waveforms, generated in this way with adjusted pitch lengths so that their phases nearly coincide and copied into the lost frame, are overlapped and added with weights proportional to the time from the copy start time, as shown in FIG. 4(e), to generate the error-concealed speech. As a result, the lost frame is continuous with the front and rear frames and is reproduced as speech without loss, with pitch fluctuation also taken into account.
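The time-proportional overlap-and-add can be illustrated with a linear cross-fade. This is a sketch under stated assumptions: the function name `overlap_add` is hypothetical, the two inputs are assumed to be the backward PWR and forward PWR waveforms already generated for the lost frame, and the linear weight shape is one reading of "weights proportional to the time from the copy start time". The exact weighting function is not given in this passage.

```python
def overlap_add(backward, forward):
    """Cross-fade two equal-length candidate waveforms for the lost frame.

    backward -- samples extrapolated from the preceding frame (backward PWR)
    forward  -- samples extrapolated from the following frame (forward PWR)

    Assumed weighting: the backward waveform dominates at the start of the
    lost frame and the forward waveform dominates at the end.
    """
    n = len(backward)
    if n != len(forward):
        raise ValueError("both waveforms must cover the same lost frame")
    if n == 0:
        return []
    if n == 1:
        return [0.5 * (backward[0] + forward[0])]
    out = []
    for i in range(n):
        w_forward = i / (n - 1)          # 0.0 at frame start, 1.0 at frame end
        w_backward = 1.0 - w_forward
        out.append(w_backward * backward[i] + w_forward * forward[i])
    return out


if __name__ == "__main__":
    print(overlap_add([1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))
```

With these weights the concealed frame matches the backward waveform at its first sample and the forward waveform at its last sample, which is what keeps it continuous with both neighbouring frames.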

Comparing the error-concealed speech reproduced as shown in FIG. 4(e) with the original speech of FIG. 4(a) yields the difference waveform shown in FIG. 4(f). This difference waveform is considerably smaller than the conventional difference waveform shown in FIG. 10(e), which shows that the error-concealed speech is quite close to the original speech. This is because performing 2-side PWR with pitch fluctuation taken into account reduces the phase shift caused by pitch fluctuation in stationary voiced segments, which the conventional 2-side PWR method ignores.
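The size of such a difference waveform can be summarised as a signal-to-noise ratio. The sketch below assumes the conventional whole-signal definition, 10·log10 of the original signal energy over the difference energy. The text does not state which SNR variant was used in the experiments, and the function name `snr_db` is hypothetical.

```python
import math


def snr_db(original, concealed):
    """SNR in dB of concealed speech, with the original speech as reference.

    The "noise" is the sample-by-sample difference waveform
    (original minus concealed).
    """
    if len(original) != len(concealed):
        raise ValueError("signals must have the same length")
    signal_energy = sum(x * x for x in original)
    noise_energy = sum((x - y) ** 2 for x, y in zip(original, concealed))
    if noise_energy == 0:
        return math.inf      # concealed speech is identical to the original
    if signal_energy == 0:
        return -math.inf     # silent reference but non-zero difference
    return 10.0 * math.log10(signal_energy / noise_energy)


if __name__ == "__main__":
    # A 10% amplitude error on every sample gives roughly 20 dB.
    print(snr_db([1.0, -1.0, 1.0, -1.0], [0.9, -0.9, 0.9, -0.9]))
```

A smaller difference waveform gives a larger SNR, so a reduction in phase shift shows up directly in this measure.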

As described above, when the absolute value of the pitch fluctuation rate α is larger than the threshold, the conventional 2-side PWR method is performed, and when it is smaller, the 2-side PWR method that takes pitch fluctuation into account is performed, thereby generating a replacement waveform for the lost block (step S29). The speech data of this replacement waveform are inserted between the preceding frame k-1 and the following frame k+1 to join the frames (step S31), and the speech data are reproduced (step S33).

FIG. 5 and FIG. 6 are graphs showing the relationship between quality and the threshold T that is compared with the pitch fluctuation rate α in step S23 of FIG. 1. They plot, as a function of T, the SNR (Signal-to-Noise Ratio) and PESQ (Perceptual Evaluation of Speech Quality) of the error-concealed speech obtained when the packet loss concealment method of the present invention is applied to every frame of the speech data described above. As both figures show, a relatively small threshold T, for example about 0.4 to 0.15, is preferable; if T is too large, quality as measured by SNR and PESQ deteriorates.

PESQ is an index that takes an opinion model into account; it correlates highly with subjective evaluation and is applied to the quality evaluation of speech data in VoIP. PESQ values range from -0.5 to 4.5, and a larger value indicates better quality. Because an excessively large threshold can lower quality, it is important to set the threshold appropriately.

FIG. 7 and FIG. 8 are graphs showing the results of an evaluation experiment conducted to confirm the effectiveness of the packet loss concealment method of the present invention. They plot SNR and PESQ, respectively, against the packet loss rate, in comparison with conventional methods. The packet loss rate was varied from 0.5% to 10%. The final SNR and PESQ values are averages over all speech data.

The speech data used in this evaluation experiment were speech materials of four male and four female speakers, randomly selected from the speech database of the Acoustical Society of Japan and resampled at a sampling frequency of 8 kHz with 16-bit quantization. In view of the minimum packet loss rate used as an experimental parameter, the materials of each speaker were randomly concatenated so that each speech data item was at least 20 s long, that is, at least 1000 packets. Five such items were prepared per speaker, for a total of 40 speech data items. The resulting total duration was 898.16 s, or 44908 packets.
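The durations and packet counts quoted for the two data sets imply a fixed packet length, which can be checked with a few lines of arithmetic. The 20 ms result is derived from the quoted totals and is not stated explicitly in this passage. The helper names are hypothetical.

```python
SAMPLE_RATE_HZ = 8000  # sampling frequency stated for the speech data


def packet_duration_ms(total_seconds: float, packet_count: int) -> float:
    """Average duration of one packet in milliseconds."""
    return 1000.0 * total_seconds / packet_count


def samples_per_packet(total_seconds: float, packet_count: int) -> int:
    """Number of samples carried by one packet at SAMPLE_RATE_HZ."""
    return round(SAMPLE_RATE_HZ * total_seconds / packet_count)


if __name__ == "__main__":
    # Histogram data set: 89.92 s, 4496 packets.
    print(packet_duration_ms(89.92, 4496), samples_per_packet(89.92, 4496))
    # Evaluation data set: 898.16 s, 44908 packets.
    print(packet_duration_ms(898.16, 44908), samples_per_packet(898.16, 44908))
```

Both data sets work out to about 20 ms, or 160 samples, per packet, which agrees with the stated equivalence of 20 s and 1000 packets.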

In the evaluation experiment, errors were introduced artificially into these speech data, and objective quality evaluation using SNR and PESQ as indices was performed for (a) the packet loss concealment method of the present invention, (b) the conventional 2-side PWR method, and (c) the G.711 PLC method. The threshold in the method of the present invention was set to T = 0.1. Only momentary interruptions caused by randomly occurring losses of single packets were considered. In addition, because the evaluation differs with the original gain of the lost frame even at the same packet loss rate, errors were introduced in each speech data item in order, starting from the frame with the largest gain.
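One way to reproduce this kind of error pattern is sketched below. It is an assumption-laden illustration rather than the experimental procedure itself: the function name `choose_lost_frames` is hypothetical, the per-frame gain is assumed to be supplied by the caller (for example as frame energy), neighbouring frames of an already lost frame are skipped so that every loss is an isolated single packet, and the first and last frames are excluded because 2-side PWR needs a frame on both sides.

```python
def choose_lost_frames(frame_gains, loss_rate):
    """Pick indices of frames to drop, largest gain first.

    frame_gains -- one gain value per frame of a speech data item
    loss_rate   -- target fraction of frames to drop (e.g. 0.05 for 5%)
    """
    n = len(frame_gains)
    target = round(loss_rate * n)
    # Candidate frames, excluding the first and last, sorted by descending gain.
    order = sorted(range(1, n - 1), key=lambda i: frame_gains[i], reverse=True)
    lost = set()
    for i in order:
        if len(lost) >= target:
            break
        if (i - 1) in lost or (i + 1) in lost:
            continue  # keep every loss isolated (single-packet losses only)
        lost.add(i)
    return sorted(lost)


if __name__ == "__main__":
    gains = [1, 5, 4, 3, 9, 2]
    print(choose_lost_frames(gains, 0.34))
```

Because adjacent frames are skipped, the achieved loss rate can fall short of the target when the target is high relative to the number of frames.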

In FIG. 7 and FIG. 8, the SNR and PESQ of the packet loss concealment method of the present invention, shown as (a), are higher than those of the conventional 2-side PWR method and the G.711 PLC (Packet Loss Concealment) method, shown as (b) and (c), which shows that the present method conceals errors more effectively than the conventional methods.

The processing procedure of the packet loss concealment method in VoIP voice communication of the above embodiment can of course be recorded as a program on a recording medium such as a CD or FD. By incorporating the recording medium into a computer system, by downloading the recorded program to a computer system via a communication line, or by installing it from the recording medium, and then operating the computer system with the program, the computer system can be made to function as a packet loss concealment device that carries out the packet loss concealment method. Using such a recording medium also makes the program easier to distribute.

FIG. 1 is a flowchart showing the processing procedure of a packet loss concealment method in VoIP voice communication according to one embodiment of the present invention.
FIG. 2 is a diagram for explaining the update of the pitch length in the packet loss concealment method of the embodiment shown in FIG. 1.
FIG. 3 is a histogram showing the number of packets against the absolute value of the pitch fluctuation rate α.
FIG. 4 is a waveform diagram for explaining the 2-side PWR method that takes pitch fluctuation into account in the packet loss concealment method of the embodiment shown in FIG. 1.
FIG. 5 is a graph showing SNR against the threshold compared with the pitch fluctuation rate.
FIG. 6 is a graph showing PESQ against the threshold compared with the pitch fluctuation rate.
FIG. 7 is a graph showing SNR against the packet loss rate, from the evaluation experiment conducted to confirm the effectiveness of the packet loss concealment method of the present invention, in comparison with conventional methods.
FIG. 8 is a graph showing PESQ against the packet loss rate, from the same evaluation experiment, in comparison with conventional methods.
FIG. 9 is a diagram for explaining the conventional WR method and PWR method.
FIG. 10 is a diagram for explaining the conventional 2-side PWR method.

Claims

A packet loss concealment method in VoIP voice communication for concealing, by waveform duplication, a packet lost in VoIP voice communication using an IP network, the method comprising:
detecting a lost packet when receiving packets transmitted via the IP network;
calculating the pitches of the frames before and after the lost portion of the detected lost packet;
calculating a pitch fluctuation rate between the preceding and following frames from the calculated pitches of those frames;
comparing the calculated pitch fluctuation rate with a predetermined threshold; and
executing the normal 2-side PWR method when the comparison shows that the pitch fluctuation rate is larger than the predetermined threshold, and executing a 2-side PWR method that takes the pitch fluctuation rate into account when the pitch fluctuation rate is smaller than the predetermined threshold.

A packet loss concealment device in VoIP voice communication for concealing, by waveform duplication, a packet lost in VoIP voice communication using an IP network, the device comprising:
lost packet detection means for detecting a lost packet when receiving packets transmitted via the IP network;
pitch calculation means for calculating the pitches of the frames before and after the lost portion of the detected lost packet;
pitch fluctuation rate calculation means for calculating a pitch fluctuation rate between the preceding and following frames from the calculated pitches of those frames;
comparison means for comparing the calculated pitch fluctuation rate with a predetermined threshold; and
2-side PWR implementation means for executing the normal 2-side PWR method when the comparison shows that the pitch fluctuation rate is larger than the predetermined threshold, and executing a 2-side PWR method that takes the pitch fluctuation rate into account when the pitch fluctuation rate is smaller than the predetermined threshold.

A packet loss concealment program in VoIP voice communication, executable by a computer, for concealing, by waveform duplication, a packet lost in VoIP voice communication using an IP network, the program causing the computer to function as:
lost packet detection means for detecting a lost packet when receiving packets transmitted via the IP network;
pitch calculation means for calculating the pitches of the frames before and after the lost portion of the detected lost packet;
pitch fluctuation rate calculation means for calculating a pitch fluctuation rate between the preceding and following frames from the calculated pitches of those frames;
comparison means for comparing the calculated pitch fluctuation rate with a predetermined threshold; and
2-side PWR implementation means for executing the normal 2-side PWR method when the comparison shows that the pitch fluctuation rate is larger than the predetermined threshold, and executing a 2-side PWR method that takes the pitch fluctuation rate into account when the pitch fluctuation rate is smaller than the predetermined threshold.