JP2005077889A - Voice packet absence interpolation system - Google Patents

Voice packet absence interpolation system

Info

Publication number
JP2005077889A
Authority
JP
Japan
Prior art keywords
voice
missing
packet
signal
voice signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2003309745A
Other languages
Japanese (ja)
Inventor
Kazuhiro Kondo
和弘 近藤
Seiji Nakagawa
清司 中川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to JP2003309745A
Publication of JP2005077889A
Legal status: Pending


Abstract

PROBLEM TO BE SOLVED: To interpolate the speech signal contained in a voice packet lost during transmission.

SOLUTION: Linear prediction coefficients are calculated from the speech signals normally received before and after the loss: a forward set of coefficients, predicting in the direction of increasing time, from the signal immediately before the loss, and a backward set, predicting back in time, from the signal immediately after it. Using the forward coefficients, the sample one step into the missing portion is predicted from the pre-loss signal; the next sample is then predicted from the pre-loss signal together with the sample just predicted, and this is repeated until all samples of the missing portion are predicted. Likewise, the sample one step before the end of the loss is predicted from the post-loss signal and the backward coefficients, then the sample one step earlier is predicted from that predicted sample and the post-loss signal, and so on until all missing samples are predicted backward. Means are provided for averaging these two predicted versions of the missing speech to obtain high-quality speech samples with which the missing portion is interpolated.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to the interpolation of missing portions of a speech signal, and more particularly to an interpolation method using linear prediction.

As a conventional method of interpolating the losses that occur when a speech signal is carried in packets, the pitch immediately before the missing portion is estimated, and the one-pitch segment of speech immediately preceding the loss is repeated across the missing portion as many times as necessary. This method is described in detail in ITU-T Recommendation G.711 Appendix I.

With the above prior art, the longer the missing interval, the more repetitions are required, and the interpolated portion takes on a synthetic, unnatural sound quality.

Accordingly, an object of the present invention is to provide a method in which the sound quality of the interpolated portion does not become unnatural even when the missing interval is long.

According to the present invention, there is provided a loss-interpolation method comprising means for obtaining an interpolation signal for the missing portion by linear prediction from the speech immediately preceding the loss. The invention further provides a loss-interpolation method comprising means for obtaining the interpolation signal by linear prediction from the speech both immediately before and immediately after the loss.

According to the present invention, the speech signal contained in packets lost during transmission can be interpolated with natural sound quality. For example, when 30% of voice packets are lost, the conventional method falls below 2 on a five-point mean subjective quality scale, whereas the present invention maintains a score of 2.4.

In the present invention, linear prediction coefficients are first calculated from the speech signals normally received before and after the loss. A forward set of prediction coefficients, i.e. predicting in the direction of increasing time, is calculated from the speech signal immediately preceding the loss, and a backward set, predicting back in time, is calculated from the signal immediately following it. Next, the sample one step into the missing portion is predicted from the pre-loss speech signal using the forward coefficients; the following sample is then predicted from that predicted sample together with the pre-loss signal, and this is repeated until all samples of the missing portion have been predicted. In parallel, the sample one step before the end of the gap, i.e. the last sample of the missing portion, is predicted from the post-loss signal and the backward coefficients; the sample one step earlier is then predicted from this predicted sample and the post-loss signal, and so on until all missing samples have been predicted in the backward direction. The two predicted versions of the missing speech are averaged to obtain the speech samples that interpolate the missing portion.

Embodiments of the present invention will be described below with reference to the drawings; needless to say, the scope of the invention is not limited to them.

FIG. 1 shows a first embodiment of the present invention. A predetermined number of speech samples are gathered into a unit called a packet, additional information such as the destination is attached, and each packet is transmitted over the network. In transit a packet passes through several relay points. At each relay point, packets are temporarily stored in memory together with packets arriving from other sources and are forwarded to the next relay point in the order received. If too many packets converge on a relay point at once, the finite memory overflows, packets are discarded, and the speech signals they carry are lost. Even packets that are not discarded may suffer long queueing delays before being forwarded. A packet that finally reaches its destination too late for the playback time of the speech it carries is likewise discarded. Thus, in packet-based speech transmission, speech signals may be lost chiefly through these two mechanisms.

Voice packets reaching the final destination are first stored in a buffer. The serial number attached to each packet is monitored to detect losses, and the loss indication is sent to the loss-interpolation unit and to the playback source selector switch. If a packet arrives intact and in time for playback, it is unpacked; if the following packet is not missing, its speech signal is sent directly to the loudspeaker and reproduced as-is. If the following packet is judged missing, the signal is first smoothed in the loss-interpolation unit and then reproduced. If the packet due for playback is itself missing, no new packet is read out; instead, the speech for the missing portion is generated from the stored signal immediately preceding the loss.

FIG. 2 shows the configuration of the loss-interpolation unit in the first embodiment. Unpacked packets are first stored in a buffer, which always holds the speech signals of the two most recently received and reproduced packets. First, linear prediction coefficients are calculated in the coefficient-calculation unit from the speech signal of the most recent voice packet. The coefficients can be computed, for example, with the Levinson-Durbin recursion, which is described in detail in S. Haykin, "Adaptive Filter Theory" (Prentice-Hall, 1996, p. 254).
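As an illustration of the coefficient calculation, a minimal Levinson-Durbin recursion might look like the following Python sketch. The function names, the biased autocorrelation estimate, and the coefficient convention x[n] ≈ Σ a[k]·x[n-1-k] are illustrative choices, not taken from the patent text:

```python
def autocorr(x, order):
    """Biased autocorrelation r[0..order] of signal x."""
    n = len(x)
    return [sum(x[t] * x[t - lag] for t in range(lag, n))
            for lag in range(order + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation r[0..order]
    by the Levinson-Durbin recursion.  Returns coefficients a such that
    x[n] ~= sum(a[k] * x[n-1-k])."""
    a = [0.0] * order          # prediction coefficients
    err = r[0]                 # prediction error energy
    for i in range(order):
        # reflection coefficient for stage i+1
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        # update coefficients using a copy of the previous-stage values
        a_prev = a[:]
        a[i] = k
        for j in range(i):
            a[j] = a_prev[j] - k * a_prev[i - 1 - j]
        err *= (1.0 - k * k)
    return a
```

For a decaying exponential x[n] = 0.9^n, a first-order fit recovers a coefficient very close to 0.9, as expected for a signal obeying x[n] = 0.9·x[n-1].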

Next, the speech signal of the most recent packet is preset into a shift register. From this signal and the calculated prediction coefficients, the first sample of the missing portion is predicted, as illustrated in FIG. 3. The next sample is then predicted from this predicted sample together with the speech signal stored in the shift register. Repeating this yields predicted speech for the number of samples corresponding to the missing packet.
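The recursive one-sample-at-a-time extrapolation described above can be sketched as follows, with the shift register modeled as a plain list. The function and argument names are illustrative, and the coefficients `a` are assumed to follow the convention x[n] ≈ Σ a[k]·x[n-1-k]:

```python
def lp_extrapolate(history, a, num_samples):
    """Predict num_samples beyond `history` by recursive linear
    prediction.  Each predicted sample is pushed into the simulated
    shift register and reused to predict the next one."""
    reg = list(history[-len(a):])   # shift register preset with the latest samples
    out = []
    for _ in range(num_samples):
        pred = sum(a[k] * reg[-1 - k] for k in range(len(a)))
        out.append(pred)
        reg.append(pred)            # feed the prediction back in
        reg.pop(0)                  # shift the oldest sample out
    return out
```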

When linear prediction is iterated on its own predicted samples, the amplitude gradually decays. To compensate, a variable gain is applied as shown in FIG. 2: the gain is first set to 1 and increased uniformly up to a preset upper limit. This keeps the amplitude of the predicted samples very close to that of the original speech.
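The gain ramp could be sketched like this; the linear ramp shape and the default upper limit are illustrative assumptions, since the text does not fix them numerically:

```python
def apply_variable_gain(samples, upper_limit=1.5):
    """Compensate the amplitude decay of repeated linear prediction
    with a gain that starts at 1 and rises uniformly to a preset
    upper limit over the length of the segment."""
    n = len(samples)
    if n <= 1:
        return list(samples)
    return [s * (1.0 + (upper_limit - 1.0) * i / (n - 1))
            for i, s in enumerate(samples)]
```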

The speech signal in a received packet and the predicted signal differ in character. To smooth the transition, the same linear prediction coefficients are used to predict, from the packet one further in the past, the speech signal of the packet immediately preceding the loss, in the same manner as above. These predicted samples are multiplied by a weight ω1 that rises from 0 at the first sample of the packet toward 1 at its last sample, and added to the actually received pre-loss samples multiplied by a weight ω2 that falls from 1 at the first sample toward 0 at the last, where ω1 + ω2 = 1. FIG. 4 illustrates this. In this way samples far from the gap are taken mostly from the received speech, while samples near the gap weight the predicted speech more heavily, so the signal connects smoothly into the predicted missing portion. The same smoothing is applied to the speech immediately after the gap: the predicted missing-speech signal is used to predict, by further iteration, the signal immediately following the loss. These predicted samples are multiplied by a weight ω1 that falls from 1 immediately after the gap toward 0 at the last sample of the post-loss packet, while the received samples are multiplied by a weight ω2 rising from 0 to 1 over the same span; the two weighted signals are summed and output as the smoothed speech. Thus the predicted signal dominates immediately after the gap, and the weight shifts gradually to the received signal.
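One way to sketch this complementary-weight smoothing (ω1 + ω2 = 1) in code, with linear ramps chosen for illustration (the text only requires that the weights move gradually between 0 and 1):

```python
def crossfade(predicted, received, toward_gap=True):
    """Blend a predicted segment with the actually received one using
    complementary ramps w1 + w2 = 1 (the omega_1/omega_2 weights of the
    text).  With toward_gap=True the weight on the prediction grows
    toward the gap (pre-loss smoothing); with toward_gap=False it
    decays away from the gap (post-loss smoothing).  Segments are
    assumed to have length >= 2."""
    n = len(predicted)
    assert len(received) == n
    out = []
    for i in range(n):
        ramp = i / (n - 1)
        w1 = ramp if toward_gap else 1.0 - ramp   # weight on prediction
        w2 = 1.0 - w1                             # weight on received signal
        out.append(w1 * predicted[i] + w2 * received[i])
    return out
```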

According to the first embodiment described above, the speech signal for a missing voice packet can be interpolated with a predicted speech signal of natural quality.

A second embodiment will now be described, in which the loss is interpolated using both a prediction obtained by forward prediction from the speech immediately before the loss and a prediction obtained by backward prediction from the speech immediately after it. FIG. 5 shows the loss-interpolation unit of this embodiment; the overall configuration is as shown in FIG. 1. The forward interpolation path, consisting of the forward packet buffer, shift register, forward linear-prediction coefficient calculation, forward linear prediction, and variable gain, is identical to that of the first embodiment.

Meanwhile, the speech signal is also stored in a backward packet buffer. From the post-loss signal accumulated in this buffer, backward linear prediction coefficients for predicting the missing portion are calculated; this is done simply by time-reversing the one packet of speech stored in the buffer and applying the Levinson-Durbin recursion. The time-reversed post-loss signal is then preset into a shift register, and from this signal and the calculated coefficients the last sample of the missing portion is predicted first. The procedure is as in FIG. 3 but with the time axis reversed: prediction proceeds gradually backward from the last sample of the missing portion. The sample one step earlier is then predicted from the predicted sample together with the signal in the shift register, and this is repeated for the number of samples corresponding to the missing packet. As in forward prediction, the amplitude decays as the backward prediction is iterated, so a gradually increasing variable gain is applied.
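A sketch of backward prediction via time reversal, under the same illustrative coefficient convention as the forward case (the coefficients `a_back` are assumed to have been estimated on the time-reversed post-loss packet):

```python
def backward_extrapolate(future, a_back, num_samples):
    """Predict `num_samples` immediately *preceding* `future` by running
    forward prediction on the time-reversed signal, then reversing the
    result back into chronological order."""
    rev = list(reversed(future))            # reverse the time axis
    reg = rev[-len(a_back):]                # shift register: samples nearest the gap
    out = []
    for _ in range(num_samples):
        pred = sum(a_back[k] * reg[-1 - k] for k in range(len(a_back)))
        out.append(pred)                    # predicted sample, still reversed time
        reg = reg[1:] + [pred]
    return list(reversed(out))              # restore chronological order
```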

In this way two predicted versions of the missing portion are obtained, one from forward and one from backward prediction. They are combined by weighting the forward prediction more heavily at the start of the gap and the backward prediction more heavily at its end, as shown in FIG. 6: the forward prediction is multiplied by a weight ω2 that is 1 immediately after the start of the gap and decays gradually to 0, while the backward prediction is multiplied by a weight ω3 that starts at 0 and rises gradually to 1, with ω2 + ω3 = 1 throughout the gap. The sum of the two weighted signals is the interpolation signal. The first half of the gap therefore draws mainly on the forward prediction, whose accuracy is still high because few prediction iterations have occurred, while the second half draws mainly on the backward prediction, which is more accurate there for the same reason.
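The weighted combination of the two predictions (ω2 + ω3 = 1) might be sketched as follows, again with an illustrative linear ramp:

```python
def blend_bidirectional(fwd, bwd):
    """Combine forward- and backward-predicted versions of the missing
    segment.  The forward weight w2 ramps from 1 down to 0 and the
    backward weight w3 = 1 - w2 ramps from 0 up to 1, so early samples
    come mostly from forward prediction and late samples mostly from
    backward prediction.  Segments are assumed to have length >= 2."""
    n = len(fwd)
    assert len(bwd) == n
    out = []
    for i in range(n):
        w2 = 1.0 - i / (n - 1)   # forward-prediction weight
        w3 = 1.0 - w2            # backward-prediction weight
        out.append(w2 * fwd[i] + w3 * bwd[i])
    return out
```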

The signal preceding the gap is smoothed as in the first embodiment, blending the speech predicted from the packet one further in the past with the received packet's speech. The signal immediately after the gap is smoothed similarly: the speech is predicted backward from the packet one further ahead, multiplied by a weight, and added to the weighted signal actually received just after the gap. Immediately after the gap the weight ω3 on the predicted signal is 1 and decays gradually to 0, while the weight ω1 on the received signal starts at 0 and rises gradually to 1, again with ω3 + ω1 = 1. The sum of the two weighted signals is used as the output, so that just after the gap the output carries mostly the character of the backward-predicted signal and gradually shifts to that of the received signal.

With the second embodiment described above, the normally received speech immediately after the loss is also exploited, so the loss can be interpolated with an even more natural, higher-quality speech signal.

A third embodiment will now be described, which switches between the forward-predicted speech of the first embodiment and the bidirectionally predicted speech, using both forward and backward prediction, of the second embodiment.

The second embodiment yields high-quality interpolated speech even under heavy loss. However, because it uses the speech contained in the packet following the loss, it must wait for that packet to be received; that is, it requires a long delay. Forward-only prediction needs no such wait, and hence no long delay. Up to about 10% loss there is little difference in quality between forward and bidirectional prediction. The loss rate is therefore monitored: up to a loss rate of about 10%, the forward predictive interpolation shown in FIG. 2 and described in the first embodiment is used; above that, the bidirectional prediction shown in FIG. 5 and described in the second embodiment is used. FIG. 7 illustrates this.

The configuration of this embodiment is largely the same as the first, except that the loss indication output by the buffer is fed to a loss-rate calculation unit, which estimates a moving average of the loss rate from it. While the loss rate remains up to about 10%, the forward predictive interpolation of the first embodiment is performed and reproduced through the loudspeaker; once it reaches 10% or more, the bidirectional predictive interpolation of the second embodiment is performed and reproduced instead.
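A sketch of the loss-rate monitor and mode switch; the 10% threshold follows the text, while the moving-average window length and the simple rectangular window are illustrative assumptions:

```python
class LossRateSwitch:
    """Track a moving average of the packet-loss indicator and choose
    between forward-only and bidirectional interpolation, switching to
    bidirectional once the loss rate reaches the threshold."""

    def __init__(self, window=100, threshold=0.10):
        self.window = window
        self.threshold = threshold
        self.flags = []                    # 1 = packet lost, 0 = received

    def observe(self, lost):
        """Record the loss indication for one packet slot."""
        self.flags.append(1 if lost else 0)
        if len(self.flags) > self.window:
            self.flags.pop(0)              # keep only the recent window

    def mode(self):
        """Return the interpolation mode for the current loss rate."""
        rate = sum(self.flags) / max(len(self.flags), 1)
        return "bidirectional" if rate >= self.threshold else "forward"
```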

According to the third embodiment, high-quality loss interpolation is achieved with little delay when the loss rate is low, and with a somewhat longer, tolerated delay when the loss rate is high.

FIG. 1 shows the first embodiment of the present invention.
FIG. 2 shows the configuration of the loss-interpolation unit in the first embodiment.
FIG. 3 illustrates the operation of linear predictive interpolation in the first embodiment.
FIG. 4 illustrates the smoothing operation in the first embodiment.
FIG. 5 shows the configuration of the loss-interpolation unit in the second embodiment.
FIG. 6 illustrates the smoothing operation in the second embodiment.
FIG. 7 shows the configuration of the loss-interpolation unit in the third embodiment.

Explanation of symbols

11 … Buffer
12 … Loss-interpolation unit
21 … Linear prediction coefficient calculation unit
22 … Linear prediction unit

Claims (8)

1. A voice communication system comprising: means for packetizing a speech signal; means for transmitting the packets over a network; means for receiving packets transmitted over the network; means for unpacking received packets to reproduce the speech signal; means for detecting that a packet was not received normally; and means for estimating the speech signal contained in a missing packet from the speech signals contained in normally received packets and interpolating it.

2. The voice communication system of claim 1, wherein the missing speech signal is interpolated by recursively repeating linear prediction using the speech signal contained in the voice packet normally received immediately before the missing voice packet.

3. The voice communication system of claim 2, further comprising a variable gain that compensates the decay in prediction gain when the missing speech signal is interpolated by repeated linear prediction.

4. The voice communication system of claim 2, further comprising means for smoothing by replacing the speech signal contained in the packet immediately preceding the loss with a linear combination of the received speech signal and a speech signal predicted by repeated linear prediction of that packet's signal.

5. The voice communication system of claim 1, wherein the missing speech signal is interpolated by recursively repeating linear prediction using both the speech signal contained in the voice packet normally received immediately before the loss and the speech signal contained in the voice packet normally received immediately after it.

6. The voice communication system of claim 5, further comprising means for smoothing by replacing the speech signal contained in the packet immediately preceding the loss with a linear combination of the received speech signal and a signal predicted by repeated linear prediction of that packet's signal, and likewise replacing the speech signal contained in the packet immediately following the loss with a linear combination of the received speech signal and a signal predicted by repeated linear prediction of that packet's signal.

7. The voice communication system of claims 5 and 6, wherein the kinds and weights of the linear prediction signals used to interpolate the loss, and the kinds and weights of the signals used to smooth the speech contained in the packets before and after the loss, are varied according to the number of consecutive missing packets.

8. A voice communication system comprising: means for monitoring the loss rate of voice packets; and means for interpolating the missing speech signal by adaptively switching, according to the loss rate, between the means of claim 2 that interpolates using the speech immediately before the loss and the means of claim 5 that interpolates using the speech immediately before and after the loss.
JP2003309745A 2003-09-02 2003-09-02 Voice packet absence interpolation system Pending JP2005077889A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2003309745A JP2005077889A (en) 2003-09-02 2003-09-02 Voice packet absence interpolation system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2003309745A JP2005077889A (en) 2003-09-02 2003-09-02 Voice packet absence interpolation system

Publications (1)

Publication Number Publication Date
JP2005077889A true JP2005077889A (en) 2005-03-24

Family

ID=34411809

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2003309745A Pending JP2005077889A (en) 2003-09-02 2003-09-02 Voice packet absence interpolation system

Country Status (1)

Country Link
JP (1) JP2005077889A (en)


Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008529423A (en) * 2005-01-31 2008-07-31 クゥアルコム・インコーポレイテッド Frame loss cancellation in voice communication
JP4678440B2 (en) * 2006-07-27 2011-04-27 日本電気株式会社 Audio data decoding device
US8327209B2 (en) 2006-07-27 2012-12-04 Nec Corporation Sound data decoding apparatus
JPWO2008013135A1 (en) * 2006-07-27 2009-12-17 日本電気株式会社 Audio data decoding device
JP2008104196A (en) * 2006-10-20 2008-05-01 Kofukin Seimitsu Kogyo (Shenzhen) Yugenkoshi Packet transceiver system and method
JP5012897B2 (en) * 2007-07-09 2012-08-29 日本電気株式会社 Voice packet receiving apparatus, voice packet receiving method, and program
WO2009008220A1 (en) * 2007-07-09 2009-01-15 Nec Corporation Sound packet receiving device, sound packet receiving method and program
US8175867B2 (en) 2007-08-06 2012-05-08 Panasonic Corporation Voice communication apparatus
JP2009139399A (en) * 2007-12-03 2009-06-25 Yamaha Corp Speech processing apparatus
JP2011095378A (en) * 2009-10-28 2011-05-12 Nikon Corp Sound recording device, imaging device and program
US8698911B2 (en) 2009-10-28 2014-04-15 Nikon Corporation Sound recording device, imaging device, photographing device, optical device, and program
JP2015215539A (en) * 2014-05-13 2015-12-03 セイコーエプソン株式会社 Voice processor and control method of the same
CN107112022A (en) * 2014-07-28 2017-08-29 三星电子株式会社 The method and apparatus hidden for data-bag lost and the coding/decoding method and device using this method
US10720167B2 (en) 2014-07-28 2020-07-21 Samsung Electronics Co., Ltd. Method and apparatus for packet loss concealment, and decoding method and apparatus employing same
CN112216288A (en) * 2014-07-28 2021-01-12 三星电子株式会社 Method for time domain data packet loss concealment of audio signals
US11417346B2 (en) 2014-07-28 2022-08-16 Samsung Electronics Co., Ltd. Method and apparatus for packet loss concealment, and decoding method and apparatus employing same
JP2018160872A (en) * 2017-03-24 2018-10-11 ヤマハ株式会社 Sound data processing apparatus and sound data processing method

Similar Documents

Publication Publication Date Title
US7502733B2 (en) Method and arrangement in a communication system
US7246057B1 (en) System for handling variations in the reception of a speech signal consisting of packets
EP1746581B1 (en) Sound packet transmitting method, sound packet transmitting apparatus, sound packet transmitting program, and recording medium in which that program has been recorded
JP4651194B2 (en) Delay packet concealment method and apparatus
US20080154584A1 (en) Method for Concatenating Frames in Communication System
JP2005077889A (en) Voice packet absence interpolation system
KR20070065876A (en) Adaptive de-jitter buffer for voice over ip
JP4485690B2 (en) Transmission system for transmitting multimedia signals
EP1218876A1 (en) Apparatus and method for a telecommunications system
CN101379556B (en) Controlling a time-scaling of an audio signal
JP3416331B2 (en) Audio decoding device
JP4945429B2 (en) Echo suppression processing device
JP4572755B2 (en) Decoding device, decoding method, and digital audio communication system
KR100594599B1 (en) Apparatus and method for restoring packet loss based on receiving part
JP5074749B2 (en) Voice signal receiving apparatus, voice packet loss compensation method used therefor, program for implementing the method, and recording medium recording the program
JPH088933A (en) Voice cell coder
JP3583550B2 (en) Interpolator
JP4535069B2 (en) Compensation circuit
JP2002261629A (en) Method and device for estimating continuous value of digital symbol
EP1813045B1 (en) Methods and devices for providing protection in packet switched communication networks
JP2005233993A (en) Voice transmission system
JPH10200580A (en) Method for reproducing voice packet
WO1998039848A1 (en) Serial estimating method
JP2006319685A (en) Audio coding selection control method, audio packet transmitter, audio packet receiver, program, and storage medium
JP3231807B2 (en) Speech encoder