JP2005534984A

JP2005534984A - Voice communication unit and method for reducing errors in voice frames

Info

Publication number: JP2005534984A
Application number: JP2004526664A
Authority: JP
Inventors: Gibbs, Jonathan Alastair; Aftelak, Stephen
Original assignee: Motorola Inc
Current assignee: Motorola Solutions Inc
Priority date: 2002-07-31
Filing date: 2003-05-12
Publication date: 2005-11-17
Also published as: KR20050027272A; WO2004015690A1; CN100349395C; GB2391440A; GB0217729D0; CN1672193A; EP1527440A1; AU2003240644A1; GB2391440B

Abstract

A voice communication unit (100) comprises a speech encoder (134) capable of representing an input speech signal, the speech encoder (134) having a transmission path (281) for transmitting a number of speech frames to a speech decoder. The speech encoder (134) is characterized by a virtual transmission path (282) for transmitting one or more references for the speech frames transmitted on the transmission path (281). The one or more references relate to an alternative speech frame, among the speech frames transmitted on the transmission path (281), that is to be used as a replacement frame if a frame is received in error. The voice communication unit (100) provides at least the advantage of a more accurate replacement frame mechanism, which reduces the risk of undesirable artifacts being heard in the recovered speech frames.

Description

The present invention relates to speech coding and to methods for improving the performance of a speech codec in a voice communication unit. The invention is applicable to, but not limited to, error mitigation in speech codecs.

Many present-day voice communication systems, such as the Global System for Mobile communications (GSM) cellular telephone standard and the Terrestrial Trunked Radio (TETRA) system for private mobile radio users, use speech processing units to encode and decode speech patterns. In such systems, a speech encoder in the transmitting unit converts the analog speech pattern into a digital format suitable for transmission. A speech decoder in the receiving unit converts the received digital speech signal back into an audible analog speech pattern.

Because the frequency spectrum for such wireless voice communication systems is a valuable resource, it is desirable to limit the channel bandwidth used by speech signals in order to maximize the number of users per frequency band. The main objective of speech coding is therefore to reduce the capacity occupied by speech patterns as far as possible, by means of compression, without losing fidelity.

A further approach in voice and data communication systems is to provide substantially less protection for speech signals than for comparable data signals. With this approach, speech packets suffer considerably more errors than data packets, and the risk of losing an entire speech packet also increases.

In a speech decoder, error mitigation techniques are commonly used to improve, for example, the performance of a voice communication unit when:
(i) there are too many bit errors in a received speech frame; or
(ii) a data packet (which may contain speech information) is lost in an Internet Protocol (IP) based network.

"Bad frame" mitigation techniques are needed to minimize the audible effect of frames received in error. In this document, "received in error" means received with errors or not received at all. Such techniques regenerate an estimate of the lost speech frame rather than introducing silence or noise into the decoded speech. They typically rely on the statistically stationary nature of speech: a single frame in error is usually estimated well enough by substituting similar parameters, including energy, pitch, spectrum and voicing, taken from the previous frame. However, speech is not truly stationary (speech onsets, for example), and plosives are very short events. This simple "replacement" technique therefore often produces unnatural, and hence undesirable, artifacts.

Ideally, one would interpolate across the transmission gap, that is, take data from both before and after the bad frame sequence and interpolate between them. However, this approach introduces undesirable delay and is therefore unacceptable in a voice communication system.

When several bad frames are received, the energy of the speech signal is often reduced to zero over a few frames. A "voicing" parameter is often included because it is useful to change what is repeated depending on whether the speech is voiced. As a rule, for voiced speech it is desirable simply to repeat the periodic component. For unvoiced speech, by contrast, it is desirable to generate a similar audio spectrum with similar energy without making the signal too periodic.
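The conventional repetition scheme described above can be sketched as follows. This is only an illustration under stated assumptions: the `FrameParams` fields, the geometric `decay` factor and the `mute_after` limit are hypothetical choices, not values taken from this document or from any particular codec.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FrameParams:
    energy: float   # frame energy (linear scale)
    pitch: int      # pitch lag in samples
    voiced: bool    # voicing decision


def conceal(last_good, bad_run_length, decay=0.5, mute_after=4):
    """Substitute parameters for the n-th consecutive bad frame (n >= 1).

    The last good frame's parameters are repeated, with the energy
    attenuated on every further bad frame and forced to zero once
    `mute_after` consecutive frames have been lost (assumed policy).
    """
    if bad_run_length >= mute_after:
        energy = 0.0
    else:
        energy = last_good.energy * (decay ** bad_run_length)
    return replace(last_good, energy=energy)


last = FrameParams(energy=1.0, pitch=55, voiced=True)
substitutes = [conceal(last, n) for n in range(1, 5)]
```

Pitch and voicing are simply copied here; a fuller scheme would, as the text notes, treat voiced and unvoiced frames differently.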

The inventors of the present invention have recognized and appreciated the limitations of using such a simple "replacement" frame mechanism as a bad-frame mitigation strategy. In particular, they have realized that the replacement frame is only rarely a truly appropriate frame. Furthermore, when many frames are received in error, as can happen frequently on poor-quality wireless links, such a replacement frame mechanism becomes even less acceptable.

A need therefore exists for an improved error mitigation technique that alleviates at least some of the above disadvantages when such speech codecs are used.

In a first aspect of the present invention, there is provided a voice communication unit as claimed in claim 1.
In a second aspect of the present invention, there is provided a voice communication unit as claimed in claim 11.
In a third aspect of the present invention, there is provided a method of performing bad-frame error mitigation in a voice communication unit, as claimed in claim 13.

In a fourth aspect of the present invention, there is provided a voice communication unit as claimed in claim 14.
In a fifth aspect of the present invention, there is provided a wireless communication system as claimed in claim 15.
Further aspects of the invention are defined in the dependent claims.

In summary, the present invention aims to provide a communication unit with a speech codec, and a method, that alleviate at least some of the above disadvantages of current bad-frame error mitigation techniques. This is achieved primarily by transmitting speech frames on a transmission path and using references (pointers), transmitted on a virtual transmission path, to indicate the alternative replacement speech frame that the speech decoder should use if a speech frame on the transmission path is received in error. Because the additional virtual transmission path ideally has different error statistics (for example a different FEC scheme), a reference/pointer is unlikely to suffer the same errors as the speech frame it refers to. In addition, a buffering technique is used in the encoder to select the alternative speech frame from a number of previously transmitted speech frames, the selected frame being one that exhibits characteristics similar to those of the speech frame for which it is referenced.

Exemplary embodiments of the present invention will now be described with reference to the drawings.

Referring now to FIG. 1, there is shown a block diagram of a wireless subscriber unit, hereinafter referred to as a mobile station (MS) 100, adapted to support the inventive concepts of the preferred embodiments of the present invention. The MS 100 contains an antenna 102, preferably coupled to a duplex filter, antenna switch or circulator 104 that provides isolation between the receive and transmit chains within the MS 100.

As is known in the art, the receive chain typically includes scanning receiver front-end circuitry 106 (effectively providing reception, filtering and intermediate or baseband frequency conversion). The scanning front-end circuitry 106 is serially coupled to a signal processing function 108. The output from the signal processing function 108 is provided, via a speech processing unit 130, to a suitable output device 110 such as a loudspeaker.

The speech processing unit 130 includes a speech encoding function 134, which encodes the user's speech into a format suitable for transmission over the transmission medium, and a speech decoding function 132, which decodes received speech into a format suitable for output via the output device (loudspeaker) 110. The speech processing unit 130 is operably coupled to a memory device 116 and, via a controller 114, to a timer 118. In particular, the operation of the speech processing unit 130 has been adapted to support the inventive concepts of the preferred embodiments of the present invention: it is adapted to select a replacement speech frame from a number of previously transmitted speech frames. The speech processing unit 130, or the signal processor 108, then initiates transmission of a reference/pointer signal, indicating the selected replacement speech frame, on an alternative virtual transmission path alongside the main transmission path. This adaptation of the speech processing unit 130 is described in more detail with reference to FIG. 2.

For completeness, the receive chain also includes received signal strength indicator (RSSI) circuitry 112, shown coupled to the scanning receiver front-end 106, although the RSSI circuitry 112 could be located elsewhere within the receive chain. The RSSI circuitry is coupled to the controller 114, which maintains overall control of the subscriber unit. The controller 114 is also coupled to the scanning receiver front-end circuitry 106 and the signal processing function 108 (generally realized by a DSP), and may therefore receive bit error rate (BER) or frame error rate (FER) data from recovered information. The controller 114 is coupled to the memory device 116, which stores operating regimes such as decoding/encoding functions. A timer 118 is typically coupled to the controller 114 to control the timing of operations (transmission or reception of time-dependent signals) within the MS 100. In the context of the present invention, the timer 118 dictates the timing of speech signals in the transmit (encoding) path and/or the receive (decoding) path.

The transmit chain essentially includes an input device 120, such as a microphone transducer, coupled in series via the speech encoder 134 to transmitter/modulation circuitry 122. Any signal to be transmitted then passes through a power amplifier 124 and is radiated from the antenna 102. The transmitter/modulation circuitry 122 and the power amplifier 124 are operationally responsive to the controller, with the output of the power amplifier 124 coupled to the duplex filter or circulator 104. The transmitter/modulation circuitry 122 and the scanning receiver front-end circuitry 106 include frequency up-conversion and frequency down-conversion functions (not shown).

Of course, the various components within the MS 100 may be arranged in any suitable functional topology able to utilize the inventive concepts of the present invention. Furthermore, the various components within the MS 100 can be realized in discrete or integrated component form, with the ultimate structure being merely a matter of design choice.

It is within the contemplation of the invention that the preferred buffering or processing of speech signals can be implemented in software, firmware or hardware, preferably using a software processor (or a digital signal processor (DSP)) that performs the speech processing function.

Referring now to FIG. 2, a block diagram of a code-excited linear predictive (CELP) speech encoder 134 according to the preferred embodiment of the present invention is shown. An acoustic input signal to be analyzed is applied to the speech coder 134 at a microphone 202. The input signal is then applied to a filter 204, which will generally exhibit band-pass filter characteristics. However, if the speech bandwidth is already adequate, the filter 204 may be a direct wire connection.

The analog speech signal from the filter 204 is then converted into a sequence of N pulse samples, and the amplitude of each pulse sample is represented by a digital code in an analog-to-digital (A/D) converter 208, as is known in the art. The sampling rate is determined by a sample clock (SC), which is generated together with the frame clock (FC).

The digital output of the A/D converter 208, which may be represented as an input speech vector s(n), is then applied to a coefficient analyzer 210. The input speech vector s(n) is obtained repeatedly in separate frames, that is, in blocks of time whose length is determined by the frame clock (FC), as is known in the art.

For each block of speech, the coefficient analyzer 210 produces a set of linear predictive coding (LPC) parameters in accordance with the preferred embodiment of the invention. The generated speech coder parameters may include the LPC parameters, long-term predictor (LTP) parameters and an excitation gain factor G₂, together with the best stochastic codebook excitation codeword I. These speech coding parameters are applied to a multiplexer 250 and sent over the channel for use by the speech synthesizer at the decoder. The input speech vector s(n) is also applied to a subtractor 230, whose function is described below.

In the conventional CELP encoder portion of FIG. 2, a codebook search controller 240 selects the best indices and gains from the adaptive codebook in block 216 and the stochastic codebook in block 214 so as to minimize the weighted error of the summed, selected excitation vectors used to represent the input speech samples. The outputs of the stochastic codebook 214 and the adaptive codebook 216 are input to respective gain functions 222 and 218. The gain-adjusted outputs are then summed in an adder 220 and input to an LPC filter 224, as is known in the art.

First, the adaptive codebook, or long-term predictor, component l(n) is computed. It is characterized by a delay and a gain factor G₁.
For each individual stochastic codebook excitation vector u_i(n), a reconstructed speech vector s'_i(n) is generated for comparison with the input speech vector s(n). The gain block 222 applies the excitation gain factor G₂, and the summing block 220 adds in the adaptive codebook component. Such gains may be pre-computed by the coefficient analyzer 210 and used to analyze all excitation vectors, or may be optimized jointly with the search for the best excitation codeword I generated by the codebook search controller 240.

The resulting excitation G₁ l(n) + G₂ u_i(n) is then filtered by the linear predictive coding filter 224, which constitutes the short-term predictor (STP) filter, to generate the reconstructed speech vector s'_i(n). The reconstructed speech vector for the i-th excitation code vector is compared with the same block of the input speech vector s(n) by subtracting the two signals in the subtractor 230.

The difference vector e_i(n) represents the difference between the original and the reconstructed blocks of speech. The difference vector is perceptually weighted by a weighting filter 232, which uses the weighting filter parameters (WTP) generated by the coefficient analyzer 210. Perceptual weighting accentuates those frequencies at which the error is more important to the human ear and attenuates the others.

An energy calculator function within the codebook search controller 240 computes the energy of the weighted difference vector e'_i(n). The codebook search controller compares the i-th error signal, for the current excitation vector u_i(n), with the previous error signals to determine the excitation vector that produces the minimum error. The code of the i-th excitation vector with the minimum error is then output over the channel as the best excitation code I.

The resulting excitation G₁ l(n) + G₂ u_I(n) is stored in the long-term predictor memory 216 for future use.
Alternatively, the codebook search controller 240 may select a particular codeword that yields an error signal satisfying some predefined criterion, such as meeting a predetermined error threshold.
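The analysis-by-synthesis search described above can be illustrated with a small, simplified sketch. It makes several assumptions that are not in the text: the adaptive codebook contribution G₁ l(n) is taken as zero, the perceptual weighting filter is omitted (plain squared error is used), the gain G₂ is chosen per codeword by least squares, and the names `synthesize` and `search_codebook` are hypothetical.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis filter: y[n] = x[n] + sum_k lpc[k] * y[n-1-k]."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc += a * out[n - 1 - k]
        out.append(acc)
    return out


def search_codebook(target, codebook, lpc):
    """Return (best index I, gain G2, error energy) over all codewords."""
    best = None
    for i, codeword in enumerate(codebook):
        y = synthesize(codeword, lpc)              # candidate s'_i(n) at unit gain
        energy = sum(v * v for v in y)
        # Least-squares gain for this codeword (assumed gain rule).
        gain = sum(t * v for t, v in zip(target, y)) / energy if energy else 0.0
        err = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or err < best[2]:          # keep the minimum-error entry
            best = (i, gain, err)
    return best


lpc = [0.5]
codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, -1, 0, 0]]
# Build a target that is exactly codeword 2, synthesized and scaled by 2.
target = [2 * v for v in synthesize(codebook[2], lpc)]
index, gain, err = search_codebook(target, codebook, lpc)
```

With this toy target the search recovers codeword 2 with a gain of 2 and zero residual error. The threshold-based alternative mentioned in the text would stop at the first codeword whose error falls below a chosen limit instead of scanning the whole codebook.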

A more detailed description of the functionality of a typical speech coding unit can be found in A. M. Kondoz, "Digital speech coding for low-bit rate communications systems", published by John Wiley in 1994.

In the preferred embodiment of the present invention, the error mitigation technique is applied to the speech frames after the multiplexer 250. The invention makes use of an alternative, preferably parallel, virtual transmission path 282, which carries pointers to previously encoded speech frames that the encoder has sent on the main transmission path 281.

In the context of the present invention, a "virtual" transmission path is defined as a transmission path provided from the encoder to the decoder in addition to the main transmission path that supports the voice communication. The "virtual" transmission path may lie within the same bit stream, within the same time frame or multiframe of a time-division multiplexed scheme, or may use a different communication route, as in a VoIP system for example. Because the additional virtual transmission path ideally has different error statistics (for example a different FEC scheme), a reference/pointer is unlikely to suffer the same errors as the speech frame it refers to.

One notable difference from known encoding arrangements is a second minimization section that follows the multiplexing operation. This circuit evaluates the speech parameter data held in a buffer and selects the entry closest to the current speech frame.

In an enhanced embodiment, the parallel virtual transmission path uses forward error correction (FEC) protection different from that used by the speech coder on the main transmission path. Because the FEC paths are independent, the speech data packets experience error statistics different from those of the pointers that reference them. This difference between the main transmission path and the parallel virtual transmission path helps to improve robustness against errors.

The multiplexer 250 outputs data packets/frames to a buffer 260, which holds previously multiplexed frames. A demultiplexer 270 accesses the buffered frames of the multiplexed signal held in the buffer 260 and separates the excitation parameters 274 from the LPC parameters 272. Note that the long-term predictor memory used when generating the excitation from these parameters must be identical to the long-term predictor memory 216 at the start of the frame.

Thus, for each block of multiplexed speech, a set of linear predictive coding (LPC) parameters is available for the current frame and for the previous frames. In the preferred embodiment of the invention, each set of quantized LPC parameters and excitation parameters forms a reconstructed speech vector s'_j(n) for the j-th previous frame of buffered data. The reconstructed speech vector s'_j(n) is compared with the previously buffered speech vector s(n) by subtracting the two signals in a subtractor 262.

The difference vector e_j(n) represents the difference between the original block of speech and the previously buffered block. The difference vector is perceptually weighted by an LPC weighting filter 264. As noted above, perceptual weighting accentuates those frequencies at which the error is more important to the human ear and attenuates the others.

An energy calculator function within a codebook search controller 266 computes the energy of the weighted difference vector e'_j(n). The codebook search controller 266 compares the j-th error signal, for the current excitation vector u_j(n), with the previous error signals to determine the excitation vector that produces the minimum error. The codebook search controller 266 then selects the "best index to frame data", that is, the index that yields the minimum weighted error. The encoder then transmits to the decoder a pointer to the previous frame determined to give the minimum weighted error with respect to the corresponding speech frame on the main transmission path.
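The second minimization stage can be sketched as a search over a sliding buffer of past frames. The sketch below is illustrative only: it compares raw sample vectors with an unweighted squared error, whereas the text reconstructs each buffered frame from its parameters and applies perceptual weighting. The function names, the buffer length of 8 and the relative-offset pointer format are assumptions.

```python
from collections import deque


def error_energy(a, b):
    """Energy of the difference between two equal-length frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_reference(current, history):
    """Return the relative offset (-1 = most recent) of the best-matching past frame.

    Returns None if the history is empty.
    """
    best_offset, best_err = None, float("inf")
    for age, past in enumerate(reversed(history), start=1):
        err = error_energy(current, past)
        if err < best_err:
            best_offset, best_err = -age, err
    return best_offset


history = deque(maxlen=8)            # sliding window of previously sent frames
for frame in ([0.0, 0.1], [0.9, 1.1], [0.2, 0.0], [0.5, 0.4]):
    history.append(frame)

pointer = select_reference([1.0, 1.0], history)   # frame about to be sent
```

Here the frame three positions back is the closest match, so the pointer sent on the virtual path would be -3.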

In effect, the referenced speech frame (ideally differing in both time and frame number from the current transmit frame) is the frame, within a particular moving window of speech, that most closely resembles (in the perceptually weighted error sense) the frame being encoded by the encoder. Thus, if a speech frame is received in error, the referenced speech frame, identified by the pointer, represents the best match to the current frame for use in the error mitigation procedure. This pointer is described in more detail with reference to FIG. 3.

Referring now to FIG. 3, a buffer timing diagram 300 illustrating the preferred process of the present invention is shown. The timing diagram shows frame 0 (310) as having been received at the speech decoder and determined to be in error. The decoder then accesses the alternative virtual transmission path to determine the most suitable frame with which to replace frame 0 (310). As shown in FIG. 3, the alternative virtual transmission path contains a pointer to frame -4 (320) as the preferred replacement for frame 0 (310). Replacing frame 0 (310) with frame -4 (320) has minimal impact on speech quality in the speech decoding process.
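The decoder-side behavior of FIG. 3 can be sketched in a few lines. This is a minimal illustration, assuming frames are held in a list, a frame received in error is marked `None`, and pointers are relative offsets such as -4. The fallback to plain repetition when no pointer is available is also an assumption rather than something the text specifies.

```python
def conceal_with_pointer(received, pointers, index):
    """Return the frame to play out at position `index`.

    received[i] is a decoded frame, or None if frame i was received in error.
    pointers[i] is the relative offset carried on the virtual path for
    frame i, or None if no pointer is available.
    """
    frame = received[index]
    if frame is not None:
        return frame                        # good frame: use it as-is
    offset = pointers[index]
    if offset is not None and offset < 0 and index + offset >= 0:
        return received[index + offset]     # follow the pointer
    # Fallback (assumption): repeat the previous frame.
    return received[index - 1] if index > 0 else None


# Frames -6 .. 0; frame 0 is received in error and its pointer is -4.
received = ["f-6", "f-5", "f-4", "f-3", "f-2", "f-1", None]
pointers = [None] * 6 + [-4]
replacement = conceal_with_pointer(received, pointers, 6)
```

In this example the bad frame 0 is replaced by frame -4, mirroring the figure.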

The inventors of the present invention have recognized and exploited the fact that the immediately preceding frames were all (generally) spoken by the same talker, so that the speech frames exhibit similar pitch and formant positions. The likelihood of finding a previous speech frame similar to the current speech frame is therefore high.

According to the preferred embodiment of the invention, given the set of parameters for each frame in memory, the minimum weighted perceptual error is found by evaluating the segmental signal-to-noise ratio (SEGSNR), or average weighted SNR, for each of the buffered frames. Preferably, the segments are defined at the speech codec subframe level.
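A segmental SNR of the kind mentioned above can be computed as the mean of per-segment SNRs in decibels. The sketch below is one common formulation, not necessarily the exact measure intended here: it omits perceptual weighting, skips silent reference segments and floors the noise energy to avoid dividing by zero, all of which are assumptions.

```python
import math


def segmental_snr(reference, candidate, segment_len):
    """Mean per-segment SNR in dB between a reference and a candidate signal."""
    snrs = []
    for start in range(0, len(reference) - segment_len + 1, segment_len):
        ref = reference[start:start + segment_len]
        cand = candidate[start:start + segment_len]
        signal = sum(r * r for r in ref)
        if signal == 0.0:
            continue                          # skip silent segments (assumption)
        noise = sum((r - c) ** 2 for r, c in zip(ref, cand))
        noise = max(noise, 1e-12)             # floor to avoid log of infinity
        snrs.append(10.0 * math.log10(signal / noise))
    return sum(snrs) / len(snrs) if snrs else float("-inf")


reference = [1.0] * 8
close = [0.9] * 4 + [0.0] * 4    # good match in segment 1, poor in segment 2
poor = [0.0] * 8
snr_close = segmental_snr(reference, close, segment_len=4)
snr_poor = segmental_snr(reference, poor, segment_len=4)
```

The encoder would evaluate such a score for every buffered frame and point to the one with the highest value. Setting `segment_len` to the codec subframe length corresponds to the preference stated in the text.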

This determination is made at the encoder. It is envisaged that small pitch errors can lead to quite different SEGSNR values, because the source speech and the buffered signal can move rapidly out of phase. In an enhanced embodiment of the invention, it is therefore proposed to search around the pitch period of the buffered frame (for example ±5%) using sub-sample resolution (typically 1/3 or 1/4 sample) and to adopt the highest SEGSNR value.

In a further enhancement of the present invention, the frame used to mitigate a bad frame reception would itself be the best source of speech information for the current erroneously received frame, even if that frame was itself received in error, as shown in FIG. 4. FIG. 4 accordingly shows a timing diagram of how multiple errors are handled. The data from frame 0 410 is found to be in error. The proposed error mitigation process uses the alternative virtual transmission path, which indicates data frame -4 420 as a suitable replacement. However, data frame -4 420 is also determined to be in error. In that case, its pointer indicates the data from frame -6 430 as the frame most similar to the corrupted frame -4 420. Frame -6 430 is therefore used in place of frame -4 420, and is deemed suitable to replace frame 0 410. In this way, multiple frame errors can be handled, provided the problem of references falling outside the memory is overcome.
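The chained lookup of FIG. 4 can be illustrated as follows. The dictionaries `good` and `refs` and the function `resolve` are hypothetical data structures chosen for the example; the specification does not prescribe how the decoder stores this state.

```python
def resolve(offset, good, refs):
    """Follow references from an erroneous frame until a good frame is found.

    offset: position of the frame relative to the current one (0, -1, -2, ...)
    good:   dict mapping offset -> True if that frame was received correctly
    refs:   dict mapping offset -> offset of its referenced replacement frame
    Returns the offset of a usable frame, or None if the chain cannot be
    completed (for example because it leaves the stored window).
    """
    seen = set()
    while not good.get(offset, False):
        if offset in seen or offset not in refs:
            return None
        seen.add(offset)
        offset = refs[offset]
    return offset

# The situation of FIG. 4: frame 0 and frame -4 are both in error.
good = {0: False, -4: False, -6: True}
refs = {0: -4, -4: -6}
usable = resolve(0, good, refs)
print(usable)
```

The `None` result corresponds to the case where a reference points outside the memory, in which a decoder would need some other fallback.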

This means that the references (pointers) could, in effect, eventually fall outside the storage window. However, this need not be a problem if erroneous values in the window are updated, thereby removing the need for multiple references.

Alternatively, if replacement frames are stored in the buffer, then when frame -4 420 was the current frame it would have been replaced in the buffer by frame -6 430 (which was then frame -2). As a result, the buffer always contains only usable data.
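A minimal sketch of this buffer-update variant is given below. The class name `FrameBuffer`, its `push` method, and the single-letter frame labels are assumptions for illustration; the sketch also assumes that every reference points inside the stored window.

```python
from collections import deque

class FrameBuffer:
    """Decoder-side buffer that stores replacements instead of bad frames."""

    def __init__(self, size):
        self.frames = deque(maxlen=size)

    def push(self, frame, ok, ref_offset=-1):
        """Store the received frame, or its referenced replacement if in error.

        ref_offset counts back from the frame being pushed, so -2 means
        "two frames earlier", which is self.frames[-2] before the append.
        """
        if not ok:
            frame = self.frames[ref_offset]
        self.frames.append(frame)
        return frame

buf = FrameBuffer(size=8)
buf.push("A", ok=True)
buf.push("B", ok=True)
buf.push("X", ok=False, ref_offset=-2)   # replaced by "A" when it arrives
buf.push("Y", ok=False, ref_offset=-1)   # points at the stored replacement
print(list(buf.frames))
```

Because the erroneous frame is overwritten at the moment it is received, a later reference to it resolves in a single step and no chain has to be followed.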

In summary, a reference or pointer is sent to the decoder in a bit stream that is alternative to the main bit stream. The reference or pointer indicates the previously transmitted frame that best matches the frame currently being transmitted, and is preferably sent in a parallel bit stream. If a frame is received in error at the speech decoder, this reference or pointer is used in the frame replacement error mitigation process. Frame mitigation is thus enhanced by extending the known replacement mechanism, which uses the immediately preceding or following frame, to any frame out of many frames. The number of frames used in the process is limited only by the buffering/storage mechanism and/or the processing power required to determine the frame with the minimum weighted error.

As indicated, the buffering/storing of the speech coder's speech parameters is performed over many frames. For example, for the GSM Enhanced Full Rate (EFR) codec at less than 12 kbit/s, storing 3 seconds of speech requires only about 5 kilobytes. The most difficult task is therefore to identify the closest frame match from the 150 possible frames. Accordingly, in one embodiment of the present invention, the minimum weighted error selection technique described above may be applied to a subset of the parameters, or to parameters derived from the synthesized speech, rather than to all the parameters of a speech coder frame. In other words, for storage in memory and for comparison, the reference (or pointer) would be made to the LPC filter parameters (LSFs) and the energy of the synthesized speech frame (a speech parameter derived from the synthesized speech computed at both the encoder and the decoder), rather than to the exact coder parameters.
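The figures quoted above, and a match restricted to a parameter subset, can be checked with the short sketch below. The 20 ms frame duration is an assumption (it is consistent with 150 frames in 3 seconds), and the dictionary layout, the squared-difference distance, and `energy_weight` are hypothetical choices made only for illustration.

```python
BIT_RATE_BPS = 12_000   # upper bound quoted in the text
WINDOW_S = 3            # seconds of speech kept in memory
FRAME_MS = 20           # assumed frame duration

storage_bytes = BIT_RATE_BPS * WINDOW_S // 8   # bytes needed for the window
frame_count = WINDOW_S * 1000 // FRAME_MS      # candidate frames to search

def subset_distance(a, b, energy_weight=1.0):
    """Squared distance over the LSFs plus a weighted energy term."""
    lsf_term = sum((x - y) ** 2 for x, y in zip(a["lsf"], b["lsf"]))
    return lsf_term + energy_weight * (a["energy"] - b["energy"]) ** 2

def closest_frame(current, stored):
    """Return the index of the stored frame closest to the current one."""
    return min(range(len(stored)),
               key=lambda i: subset_distance(current, stored[i]))

stored = [
    {"lsf": [0.10, 0.50], "energy": 1.0},
    {"lsf": [0.30, 0.70], "energy": 0.2},
]
current = {"lsf": [0.29, 0.71], "energy": 0.25}
match = closest_frame(current, stored)
print(storage_bytes, frame_count, match)
```

Comparing only LSFs and frame energy keeps each of the 150 comparisons cheap, at the cost of ignoring the excitation parameters.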

In this regard, a speech frame contains many parameters, and the proposed technique can in principle be applied to any number of them. Examples of such parameters in a CELP coder include:
(i) line spectral pairs (LSPs) representing the LPC parameters;
(ii) long-term predictor (LTP) lag for subframe-1;
(iii) LTP gain for subframe-1;
(iv) codebook index for subframe-1;
(v) codebook gain for subframe-1;
(vi) LTP lag for subframe-2;
(vii) LTP gain for subframe-2;
(viii) codebook index for subframe-2;
(ix) codebook gain for subframe-2;
(x) LTP lag for subframe-3;
(xi) LTP gain for subframe-3;
(xii) codebook index for subframe-3;
(xiii) codebook gain for subframe-3;
(xiv) LTP lag for subframe-4;
(xv) LTP gain for subframe-4;
(xvi) codebook index for subframe-4; or
(xvii) codebook gain for subframe-4.

It is within the contemplation of the invention that a pointer may be sent that refers to the set of LSPs of a previous frame matching the LSPs of the current frame, rather than to the entire parameter set. Alternatively, it would be possible to have a pointer for each of the many parameters listed above.

In a wireless communication system, the parallel virtual transmission path preferably consists of sending a block-coded reference word within the unprotected bits of the data payload (7 bits would be sufficient to support a buffer of 128 frames, equivalent to about 2.5 seconds). The reference word may be encoded with a 15-bit BCH block code (at an equivalent rate of 75 bits/second), providing correction of up to 2 bit errors.
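The sizing of the reference word can be checked as follows. The 20 ms frame duration and the particular mapping of offsets -1..-128 onto word values 0..127 are assumptions made for this example; the BCH encoding itself is not shown.

```python
FRAME_MS = 20          # assumed frame duration
BUFFER_FRAMES = 128    # buffer depth quoted in the text

pointer_bits = (BUFFER_FRAMES - 1).bit_length()   # bits needed to index the buffer
window_s = BUFFER_FRAMES * FRAME_MS / 1000         # speech covered by the buffer

def encode_offset(offset):
    """Map a frame offset in -1..-128 to an unsigned reference word 0..127."""
    if not -BUFFER_FRAMES <= offset <= -1:
        raise ValueError("offset outside the buffer window")
    return -offset - 1

def decode_word(word):
    """Inverse of encode_offset."""
    return -(word + 1)

print(pointer_bits, window_s, encode_offset(-4))
```

With these assumptions, 7 bits index every frame in a window of roughly two and a half seconds, matching the figures given above.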

Alternatively, it is envisaged that the alternative virtual transmission path may provide a combination of error correction and error detection. Error detection would be useful because a missing or badly received reference could lead to poor mitigation. If the reference word is received in error, the scheme can default to repetition of the previous frame. A channel rate of 75 bits/second reduces the net bit rate of the GSM full-rate channel only slightly, from 22.8 kbit/s to 22.725 kbit/s, which is a negligible loss of sensitivity.
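The fallback rule and the bit-rate arithmetic above can be sketched as follows. The function name `choose_offset` and the boolean `ref_word_ok` flag (standing for the outcome of error detection on the reference word) are hypothetical.

```python
def choose_offset(ref_offset, ref_word_ok):
    """Use the signalled reference if it was received intact.

    Otherwise default to repeating the previous frame (offset -1).
    """
    return ref_offset if ref_word_ok else -1

CHANNEL_BPS = 22_800    # GSM full-rate channel rate quoted in the text
REFERENCE_BPS = 75      # rate quoted for the reference words
net_bps = CHANNEL_BPS - REFERENCE_BPS

print(choose_offset(-4, True), choose_offset(-4, False), net_bps)
```

Falling back to offset -1 means a corrupted reference word costs no more than the conventional previous-frame repetition scheme.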

In another embodiment, such as a Voice over Internet Protocol (VoIP) communication link, the alternative virtual transmission path may be achieved by sending multiple packet streams. In this case, it is desirable that the overall traffic is not substantially increased, since this would increase the probability of packets being dropped.

The preferred mechanism would be to send a reference to a previous frame, as described above, only when a transition occurs and the speech is non-stationary. When the speech is stationary, and conventional techniques work relatively well, no reference is sent. In this way the packet network is not unduly overloaded, while most of the performance gain is still achieved. The degree to which the speech signal is stationary can be expressed as a variable, which can be adjusted to improve playback quality in the event of lost packets.
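One possible way to express such a stationarity variable is sketched below. The measure used here (an inverse function of the Euclidean distance between consecutive LSF vectors), the threshold of 0.8, and the function names are all assumptions for illustration; the specification does not define how stationarity is measured.

```python
def stationarity(prev_lsf, cur_lsf):
    """Return a value in (0, 1]; 1 means the spectral envelope did not change."""
    dist = sum((a - b) ** 2 for a, b in zip(prev_lsf, cur_lsf)) ** 0.5
    return 1.0 / (1.0 + dist)

def should_send_reference(prev_lsf, cur_lsf, threshold=0.8):
    """Send a reference only when the speech is judged non-stationary."""
    return stationarity(prev_lsf, cur_lsf) < threshold

steady = should_send_reference([0.1, 0.2], [0.1, 0.2])
transition = should_send_reference([0.1, 0.2], [0.6, 0.9])
print(steady, transition)
```

Raising the threshold sends references more often, trading extra packet traffic for better concealment when packets are lost.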

The function of the decoder is substantially the reverse of that of the encoder (without the additional circuitry following the multiplexer) and is therefore not described in detail here. A description of the functionality of a typical speech decoding unit can also be found in "Digital Speech Coding for Low Bit Rate Communications Systems" by A. M. Kondoz, published by John Wiley in 1994. The decoder follows the standard decoding process until a bad frame is determined. When a bad frame is detected, the decoder consults the alternative virtual transmission path to determine the alternative frame indicated by the respective reference/pointer. The decoder then retrieves the "similar" frame indicated by the reference/pointer transmission. The previously indicated frame is then used in place of the received frame to synthesize the speech.

Advantageously, the inventive concepts described herein can be fitted to existing codecs by stealing bits from the FEC scheme already in place.
It is within the contemplation of the invention that any speech processing circuit would benefit from the inventive concepts described herein.

It will be understood that the bad frame error mitigation mechanism, as described above, provides at least the following advantages:
(i) A more accurate replacement frame mechanism is provided, thereby reducing the risk of undesirable artefacts being audible in recovered speech frames.
(ii) The alternative virtual transmission path can be fitted to existing codecs, for example by stealing bits from the FEC scheme already in place.
(iii) If a reference to a previous frame is sent only when a transition occurs and the speech is non-stationary, existing bad frame error mitigation techniques remain in use the rest of the time, so that the additional data required by the invention can be minimized.
(iv) Parameters received in error can be detected by cross-referencing the data received in a given frame with the frame referenced in this scheme.

Although the preferred embodiment discusses the application of the present invention to a CELP coder, the inventors envisage that any other speech processing unit in which transmission errors may occur can benefit from the inventive concepts contained herein. The inventive concepts described herein find particular application in speech processing units for wireless communication units, such as Universal Mobile Telecommunications System (UMTS) units, Global System for Mobile communications (GSM) units, Terrestrial Trunked Radio (TETRA) communication units, Digital Interchange of Information and Signalling standard (DIIS) units, and Voice over Internet Protocol (VoIP) units.

(Apparatus of the present invention:)
A voice communication unit comprises a speech encoder capable of representing an input speech signal. The speech encoder has a transmission path for transmitting a number of speech frames to a speech decoder. The speech encoder further has a virtual transmission path for transmitting one or more references to a number of speech frames transmitted on the transmission path. The one or more references relate to alternative speech frames, within the number of speech frames transmitted on the transmission path, to be used as replacement frames if a frame is received in error.
A voice communication unit, for example the above voice communication unit having a speech encoder, comprises a speech decoder adapted to receive a number of speech frames on a transmission path and one or more replacement speech frame references on a virtual transmission path. The one or more references relate to alternative speech frames, within the number of speech frames received on the transmission path, to be used as replacement frames if a frame is received in error.

(Method of the present invention:)
A method of performing bad frame error mitigation in a voice communication unit comprises the step of transmitting, by a speech encoder in the voice communication unit, a number of speech frames on a transmission path to a speech decoder. The speech encoder transmits, on a virtual transmission path, one or more references to a number of speech frames transmitted on the transmission path. The one or more references relate to alternative speech frames, within the number of speech frames transmitted on the transmission path, to be used as replacement frames if a frame is received in error.

In this way, if a speech frame is received in error, an improved replacement frame can be selected from many speech frames.
Thus, a bad frame error mitigation technique, and an associated voice communication unit and circuit, have been described that substantially alleviate at least some of the above-mentioned disadvantages of known error mitigation techniques.

FIG. 1 is a block diagram of a wireless communication unit with a speech coder adapted to support the various inventive concepts of the preferred embodiment of the present invention.
FIG. 2 is a block diagram of a code-excited linear prediction speech coder adapted to support the various inventive concepts of the preferred embodiment of the present invention.
FIG. 3 shows the use of the reference mechanism provided by the alternative virtual transmission path, in which a replacement frame is selected from many other frames, according to the preferred embodiment of the present invention.
FIG. 4 shows the enhanced use of the alternative virtual transmission path to deal with multiple errors occurring on the main transmission path, according to the preferred embodiment of the present invention.

Claims

A voice communication unit (100) comprising a speech encoder (134) capable of representing an input speech signal, the speech encoder (134) having a transmission path (281) for transmitting a number of speech frames to a speech decoder, characterized in that the speech encoder (134) has a virtual transmission path (282) for transmitting one or more references to a number of speech frames transmitted on the transmission path (281), the one or more references relating to alternative speech frames, within the number of speech frames transmitted on the transmission path (281), to be used as replacement frames if a frame is received in error.

The voice communication unit (100) of claim 1, further characterized in that the speech encoder (134) comprises:
a multiplexer (250) for multiplexing the number of speech frames;
a buffer (260), operating in combination with the multiplexer (250), for storing the multiplexed speech data; and
a processor (130, 270), operating in combination with the buffer (260), for characterizing the current speech frame in the buffer (260) and selecting an alternative speech frame that exhibits characteristics similar to the current speech frame, wherein a reference to the alternative speech frame is sent by the processor (130, 270) to the decoder on the virtual transmission path (282).

The voice communication unit (100) of claim 2, wherein the processor has a demultiplexer function (270) for accessing one or more speech frames in the buffer (260), and separates the LPC parameters (272) of the buffered speech frames from the excitation parameters (274) in order to select speech frames exhibiting similar characteristics.

The voice communication unit (100) according to any of claims 1 to 3, wherein the virtual transmission line (282) is included in the same bit stream of the transmission line (281).

The transmission line (281) uses a first forward error correction protection scheme, and the virtual transmission line (282) uses a second forward error correction protection different from that used in the transmission line (281). The voice communication unit (100) according to any one of claims 1 to 4.

The voice communication unit (100) according to any of claims 2 to 5, wherein the processor (130, 266, 270) selects an alternative replacement frame to provide a minimum weighting error.

The voice communication unit (100) of claim 6, wherein the processor (130, 266, 270) determines the minimum weighted error by evaluating a weighted segmental signal-to-noise ratio (SEGSNR) or an average weighted SNR for each of a plurality of buffered frames.

The speech communication unit (100) according to claim 6 or 7, wherein the processor (130, 266, 270) determines a minimum weighting error of a subset of speech coding parameters.

The voice communication unit (100) according to claim 1, wherein the processor (130, 266) searches the pitch period of the buffered speech frame and its vicinity and selects the frame showing the highest SEGSNR value.

The voice communication unit (100) according to any of claims 1 to 9, wherein the alternative voice frame (320) is referenced to the current voice frame only when a transition occurs and the voice is non-stationary.

The voice communication unit (100) according to any of claims 1 to 10, characterized by a speech decoder (132) adapted to receive a number of speech frames on the transmission path (281) and one or more replacement speech frame (320) references on the virtual transmission path (282), the one or more references relating to alternative speech frames (320), within the number of speech frames received on the transmission path (281), to be used as replacement frames if a frame is received in error.

The voice communication unit (100) of claim 11, wherein, if the alternative speech frame (420) is itself received in error, a frame (430) is selected as the replacement for the erroneously received alternative frame (420) and is used in place of both the current speech frame (410) received in error and the erroneously received alternative speech frame (420).

A method for performing bad frame error mitigation in a voice communication unit (100) comprising:
Sending a number of audio frames over the transmission path (281) to the audio decoder by the audio encoder (134) in the audio communication unit (100);
The method comprising:
Transmitting one or more references for a number of voice frames transmitted over a transmission path (281) on a virtual transmission path (282), wherein the one or more references are received when the frame is erroneously received. A step of relating to an alternative voice frame within the number of voice frames transmitted over the transmission path (281), which is used as a replacement frame in the case of
A method characterized by comprising.

Voice communication unit (100) adapted to carry out the two steps of the method according to claim 13.

A wireless communication system adapted to support the use of a transmission line (281) and a virtual transmission line (282) according to claims 1-14.