JP4245606B2

JP4245606B2 - Speech encoding device

Info

Publication number: JP4245606B2
Application number: JP2005500739A
Authority: JP
Inventors: 均佐々木; 恭士大田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2003-06-10
Filing date: 2003-06-10
Publication date: 2009-03-25
Anticipated expiration: 2023-06-10
Also published as: WO2004112256A1; US20050278174A1; US7072830B2; JPWO2004112256A1

Description

The present invention relates to a speech coding apparatus, and more particularly to a speech coding apparatus that compresses and encodes the information of a speech signal.

In mobile communications, CDs, and similar applications, speech is processed digitally, and digitized speech signals have become familiar to users. High-efficiency coding is used to compress and transmit digital speech signals efficiently.

High-efficiency coding is a technique that compresses information by removing its redundancy, saving transmission capacity while keeping the resulting distortion as imperceptible as possible to human senses; various schemes have been proposed. As a high-efficiency coding algorithm for speech signals, ADPCM (Adaptive Differential Pulse Code Modulation), standardized in ITU-T G.726, is widely used.

FIGS. 18 and 19 show the block configuration of an ADPCM codec. The ADPCM encoder 110 comprises an A/D unit 111, an adaptive quantization unit 112, an adaptive inverse quantization unit 113, an adaptive prediction unit 114, a subtractor 115, and an adder 116. The portion inside the dotted frame is called the local decoder. The ADPCM decoder 120 comprises an adaptive inverse quantization unit 121, an adaptive prediction unit 122, a D/A unit 123, and an adder 124 (the local decoder on the encoder side serves as the decoder as it is).

In the ADPCM encoder 110, the A/D unit 111 converts the input speech into a digital signal x. The subtractor 115 generates a prediction residual signal r by taking the difference between the current input signal x and the prediction signal y that the adaptive prediction unit 114 generates from past input signals.

The adaptive quantization unit 112 quantizes the prediction residual signal r while increasing or decreasing the quantization step width (step size) according to past quantized values of r, so that the quantization error becomes small. That is, when the amplitude of the quantized value of the immediately preceding sample is at or below a certain value, the change is regarded as small, and the quantization step size is multiplied by a coefficient smaller than 1 (called a scaling factor) so that quantization is performed with a narrower step size.

Conversely, when the amplitude of the quantized value of the immediately preceding sample exceeds that value, the change is regarded as large, and the quantization step size is multiplied by a coefficient larger than 1 so that quantization is performed coarsely with a wider step size.

The number of quantization levels of the adaptive quantization unit 112 is determined by the number of coding bits; for example, 4-bit coding quantizes to 16 levels. If the sampling frequency of the A/D unit 111 is 8 kHz, the digital output (ADPCM code) z of the adaptive quantization unit 112 is 32 kbit/s (= 8 kHz × 4 bits). If the digital speech signal output by the A/D unit 111 is 64 kbit/s, the compression ratio is 1/2.
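The arithmetic in this paragraph can be checked with a short Python sketch. The helper name `adpcm_bit_rate` is hypothetical and introduced only for illustration; the figures (8 kHz, 4 bits, 64 kbit/s) are the ones quoted above.

```python
def adpcm_bit_rate(sampling_hz: int, bits_per_sample: int) -> int:
    """Bit rate of the ADPCM code stream in bit/s (hypothetical helper)."""
    return sampling_hz * bits_per_sample


bits = 4
levels = 2 ** bits                  # quantization levels for 4-bit coding
rate = adpcm_bit_rate(8_000, bits)  # bit rate of the ADPCM code z
ratio = rate / 64_000               # relative to a 64 kbit/s A/D output

print(levels, rate, ratio)
```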

The ADPCM code z is also input to the adaptive inverse quantization unit 113 of the local decoder. The adaptive inverse quantization unit 113 inversely quantizes the ADPCM code z to generate a quantized prediction residual signal ra. The adder 116 adds the prediction signal y and the quantized prediction residual signal ra to generate a reproduced signal (local reproduced signal) xa.

The adaptive prediction unit 114 contains an adaptive filter. While successively correcting the prediction coefficients of the adaptive filter so that the power of the prediction residual signal is minimized, it generates the prediction signal y for the next input sample value from the reproduced signal xa and the quantized prediction residual signal ra, and sends it to the subtractor 115.

The ADPCM decoder 120, for its part, applies exactly the same processing as the local decoder of the ADPCM encoder 110 to the transmitted ADPCM code z to generate the reproduced signal xa, and the D/A unit 123 converts it into an analog signal to obtain the speech output.
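The encoder/decoder loop described above can be illustrated with a heavily simplified Python sketch. This is not the ITU-T G.726 algorithm: the predictor is fixed (the prediction y is simply the previous reproduced value) rather than adaptive, and the step-size limits, threshold, and scaling factors (0.9 and 1.6) are arbitrary assumptions chosen for illustration. All function and class names are hypothetical. The point it shows is that the decoder runs the same code as the encoder's local decoder, so both sides stay in step.

```python
STEP_MIN, STEP_MAX = 1.0, 2048.0  # assumed step-size limits


def quantize(r, step):
    """4-bit code: bit 3 is the sign, bits 0-2 are the magnitude index."""
    mag = min(int(abs(r) / step), 7)
    return (8 if r < 0 else 0) | mag


def dequantize(code, step):
    """Map a 4-bit code back to a quantized residual ra."""
    value = ((code & 7) + 0.5) * step
    return -value if code & 8 else value


def next_step(step, code):
    """Narrow the step after small codes, widen it after large ones."""
    factor = 0.9 if (code & 7) < 4 else 1.6  # assumed scaling factors
    return min(max(step * factor, STEP_MIN), STEP_MAX)


class LocalDecoder:
    """Shared by encoder and decoder, as in the block diagram."""

    def __init__(self, step=16.0):
        self.y = 0.0      # prediction signal y
        self.step = step  # current quantization step size

    def decode(self, z):
        ra = dequantize(z, self.step)   # quantized prediction residual
        xa = self.y + ra                # reproduced signal
        self.step = next_step(self.step, z)
        self.y = xa                     # fixed predictor (assumption)
        return xa


def encode(samples):
    dec = LocalDecoder()
    codes, local = [], []
    for x in samples:
        r = x - dec.y                   # prediction residual
        z = quantize(r, dec.step)
        codes.append(z)
        local.append(dec.decode(z))     # local reproduction
    return codes, local


def decode(codes):
    dec = LocalDecoder()
    return [dec.decode(z) for z in codes]


samples = [0.0, 10.0, 40.0, 120.0, 100.0, 30.0, -50.0]
codes, local = encode(samples)
decoded = decode(codes)
```

Because `decode` replays exactly the state updates the encoder performed, `decoded` matches the encoder's local reproduction sample for sample.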

In recent years, ADPCM has come into wide use in a variety of audio services: for example, mobile phones with a built-in ADPCM sound source play sampled animal cries or human voices as ringtones, and realistic playback sounds are used to insert sound effects into game music. Further improvement in speech quality is therefore required.

As a conventional technique for improving speech quality with ADPCM, a method has been proposed in which a signal obtained by adding or subtracting 1/2 of the unit quantization width to or from the difference between the input speech and the predicted value is adaptively quantized to obtain a code, the unit quantization width for the next step is updated from that code, and the next predicted value is obtained from the predicted value and the inverse-quantized value (see, for example, Patent Document 1).
Patent Document 1: Japanese Patent Laid-Open No. 10-233696 (paragraphs [0049] to [0089], FIG. 1)

In the loop control of the ITU-T G.726 ADPCM encoder 110 described above with reference to FIG. 18, the ADPCM code is generated from the quantization information of only the one sample at the current time (time n). Consequently, when a signal $x_{n+1}$ larger than the predicted value arrives, such as one whose amplitude suddenly increases at time (n+1), the quantization step size $\Delta_{n+1}$ at time (n+1) is still small, so the encoder cannot follow the change and a large quantization error occurs. When reproduced, this yields a sound that is unpleasant to hear (subjectively, a rough, scratchy sound), degrading sound quality.

Furthermore, the conventional technique (Japanese Patent Laid-Open No. 10-233696) requires the table needed for updating the unit quantization width to be provided in both the encoder and the decoder, which is not necessarily suitable in practice.

The present invention has been made in view of these points, and an object of the present invention is to provide a speech coding apparatus that suppresses quantization errors and thereby improves speech quality.
To solve the above problem, there is provided a speech coding apparatus 10 for encoding a speech signal, as shown in FIG. 1, comprising: a code candidate storage unit 11 that, each time a code for a sample value of the speech signal is to be obtained, stores all code candidates that can be taken up to the number of look-ahead samples, as a plurality of combinations of code candidates in a neighborhood interval of the sample value; a decoded signal generation unit 12 that decodes the codes stored in the code candidate storage unit 11 to generate reproduced signals; and an error evaluation unit 13 that calculates the sum of squares of the differences between the input sample values and the reproduced signals, detects the code candidate whose sum of squares is smallest (that is, the one that minimizes the quantization error), and outputs a code from the detected code candidate.

Here, each time a code for a sample value of the speech signal is to be obtained, the code candidate storage unit 11 stores all code candidates that can be taken up to the number of look-ahead samples, as a plurality of combinations of code candidates in the neighborhood interval of the sample value. The decoded signal generation unit 12 decodes the codes stored in the code candidate storage unit 11 to generate reproduced signals. The error evaluation unit 13 calculates the sum of squares of the differences between the input sample values and the reproduced signals, detects the code candidate whose sum of squares is smallest (the one that minimizes the quantization error), and outputs a code from the detected code candidate.

The above and other objects, features, and advantages of the present invention will become apparent from the following description taken in conjunction with the accompanying drawings, which illustrate preferred embodiments of the present invention by way of example.

Embodiments of the present invention are described below with reference to the drawings. FIG. 1 shows the principle of the speech coding apparatus. The speech coding apparatus 10 is an apparatus that compresses and encodes the information of a speech signal.
When obtaining the code for a sample value of the speech signal, the code candidate storage unit 11 takes the interval up to the look-ahead sample count pr (described later) as the neighborhood interval and stores the multiple (all) combinations of code candidates {j1, j2, ..., j(pr+1)} up to time (n+k) (0 ≤ k ≤ pr). The figure shows an example with pr = 1, in which combinations of the code j1 at time n and the code j2 at time (n+1) are stored.

The decoded signal generation unit (local decoder) 12 sequentially decodes the codes stored in the code candidate storage unit 11 to generate reproduced signals sr. The error evaluation unit 13 calculates the sum of squares of the differences between the input sample values in of the input speech signal and the reproduced signals sr, detects the code candidate whose sum of squares is smallest (that is, whose quantization error can be regarded as minimal), and outputs the code idx from the detected code candidate.

Vector notation in the figure indicates that processing is performed sequentially. That is, the vector notation for the code candidates indicates that the code candidates {1, 1}, {1, 2}, ... are input sequentially from the code candidate storage unit 11 to the local decoder 12; the vector notation for the reproduced signal indicates that it is generated sequentially by the local decoder 12 and input to the error evaluation unit 13; and the vector notation for the input sample values indicates that they are input sequentially to the error evaluation unit 13.

When the configuration of the local decoder 12 shown in FIG. 12 (described later) is used, the local decoder reproduced signals sr[n] and sr[n+1] for the code candidate {1, 2} are generated by the following procedure.

The adaptive inverse quantization unit 12a inversely quantizes code #1 to generate an inversely quantized signal dq[n]. The adder 12b adds dq[n] to the delayed signal se[n], obtained by delaying the reproduced signal sr[n-1] of the previous time, to generate the reproduced signal (local reproduced signal) sr[n].

Next, the reproduced signal at time n+1 is obtained by the same procedure. The adaptive inverse quantization unit 12a inversely quantizes code #2 to generate an inversely quantized signal dq[n+1]. The adder 12b adds dq[n+1] to the delayed signal se[n+1], obtained by delaying the reproduced signal sr[n] of the previous time, to generate the reproduced signal (local reproduced signal) sr[n+1].
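The two-step procedure above can be sketched in Python as follows. The reconstruction levels and step-size scaling factors in `LEVELS` and `FACTORS` are assumptions made for illustration (the text does not give numeric values), and the function name `decode_candidate` is hypothetical. The sketch only mirrors the structure described: inverse quantization gives dq, and the adder combines it with the delayed previous reproduced signal se.

```python
# Assumed 2-bit tables: codes #1 and #4 are the outer levels, #2 and #3 the inner ones.
LEVELS = {1: 1.5, 2: 0.5, 3: -0.5, 4: -1.5}   # reconstruction level, in units of the step
FACTORS = {1: 2.0, 2: 0.5, 3: 0.5, 4: 2.0}    # step-size scaling after each code


def decode_candidate(codes, sr_prev, step):
    """Return [sr[n], sr[n+1], ...] for one code candidate such as (1, 2)."""
    reproduced = []
    se = sr_prev                   # delayed signal se[n] = sr[n-1]
    for code in codes:
        dq = LEVELS[code] * step   # adaptive inverse quantization (unit 12a)
        sr = dq + se               # adder 12b
        reproduced.append(sr)
        step *= FACTORS[code]      # adapt the step for the next sample
        se = sr                    # becomes the delayed signal for the next time
    return reproduced


sr_n, sr_n1 = decode_candidate((1, 2), sr_prev=0.0, step=4.0)
```

With these assumed tables, code #1 gives dq[n] = 6.0 and widens the step to 8.0, so code #2 then gives dq[n+1] = 4.0.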

Conventionally, as described above, the code idx[n] for the sample value at time n was obtained by quantizing only the one sample at the current time n. In the present apparatus, by contrast, the code idx[n] is obtained by also using, as targets of error evaluation, information on the sample interval around time n (the neighborhood interval), not just time n itself.

In other words, not only the current sample value but also future samples (called look-ahead samples) are used. For example, with one look-ahead sample, the code idx[n] at time n is obtained by taking into account the information of two samples, at time n and time (n+1).

With two look-ahead samples, the code idx[n] at time n is obtained by taking into account the information of three samples, at time n, time (n+1), and time (n+2). The detailed operation of the apparatus is described from FIG. 4 onward.

Next, the problem to be solved is described in detail with reference to FIGS. 2 and 3. FIG. 2 shows how a reproduced signal is obtained. To simplify the description, it is assumed that no prediction is performed (the difference between the input sample and the reproduced signal is simply quantized) and that quantization uses 2 bits per sample (four quantization levels).

For the speech signal, let Xn-1 be the sample value taken at time (n-1) and Xn the sample value taken at time n. Also assume that the reproduced signal decoded at time (n-1) is Sn-1.

To obtain the reproduced signal at time n, the difference between the sample value Xn at time n and the reproduced signal Sn-1 at time (n-1) is first taken to generate a difference signal En. (If prediction were performed, the difference would be taken at the same time instant; because no prediction is assumed here, the difference between the previous reproduced signal and the current input sample value is used.)

The difference signal En is then quantized, and a quantized value at time n is selected. Because 2-bit quantization is assumed, there are four quantized values, h1 to h4, and from these four candidates the one that most accurately represents the difference signal En (the one closest to the sample value Xn) is selected. (The spacing between the dots corresponds to the quantization step size.)

In the figure, the value that most accurately represents the difference signal En is the quantized value h3 (that is, the dot closest to the sample value Xn is h3). The quantized value h3 is therefore selected as the reproduced signal at time n (denoted Sn), and the ADPCM code indicating h3 is output from the encoder.
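The selection at time n can be sketched as a nearest-level search. The level table `LEVELS` and the numeric inputs are assumptions for illustration only, and `nearest_code` is a hypothetical name; codes 1 to 4 stand in for the quantized values h1 to h4 in the figure.

```python
# Assumed 2-bit reconstruction levels, in units of the quantization step size.
LEVELS = {1: 1.5, 2: 0.5, 3: -0.5, 4: -1.5}


def nearest_code(x_n, s_prev, step):
    """Pick the code whose quantized value best represents En = Xn - Sn-1."""
    e_n = x_n - s_prev                                   # difference signal En
    code = min(LEVELS, key=lambda c: abs(e_n - LEVELS[c] * step))
    s_n = s_prev + LEVELS[code] * step                   # reproduced signal Sn
    return code, s_n


# Illustrative values: the sample lies slightly below the previous reproduction.
code, s_n = nearest_code(x_n=9.2, s_prev=10.0, step=2.0)
```

Here the candidate reproductions are 13.0, 11.0, 9.0, and 7.0, so the third candidate (9.0) is closest to the sample value 9.2.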

FIG. 3 shows how a large quantization error arises when the encoder cannot follow an amplitude change; it illustrates the problem of the conventional ADPCM encoder. For the speech signal shown in FIG. 2, let Xn+1 be the sample value taken at time (n+1) and Xn+2 the sample value taken at time (n+2). The reproduced signal decoded at time n is Sn, as shown in FIG. 2. The speech signal is assumed to have a waveform whose amplitude rises sharply around time (n+1).

Consider obtaining the reproduced signal at time (n+1). First, the difference between the sample value Xn+1 at time (n+1) and the reproduced signal Sn at time n is taken to generate the difference signal En+1.

The difference signal En+1 is then quantized, and a quantized value at time (n+1) is selected. Because quantization is 2-bit, there are four candidate quantized values, h5 to h8. The quantization step size of these values is determined by the quantized value selected immediately before.

That is, if the quantized value selected immediately before was one of the two middle dots of the four, the amplitude change from time (n-1) to time n was small, so the amplitude change from time n to time (n+1) is assumed to be small as well, and the quantization step size at time (n+1) is made small.

If the quantized value selected immediately before was one of the two end dots of the four, the amplitude change from time (n-1) to time n was large, so the amplitude change from time n to time (n+1) is assumed to be large as well, and the quantization step size at time (n+1) is made large.

In this example, the reproduced signal Sn at time n was selected as h3 from the candidates h1 to h4 (one of the two middle values), so the amplitude change is regarded as small. The quantization step size at time (n+1), that is, the dot spacing of h5 to h8, is therefore made small. (Here the scaling factor smaller than 1 used at time n is also used at time (n+1), so the spacing is the same as that of h1 to h4.)

Next, the candidate that most accurately represents the difference signal En+1 is selected from the quantized value candidates h5 to h8. However, because the amplitude of the speech signal rises sharply at time (n+1), the best that can be chosen from the candidates h5 to h8, whose quantization step size is not large, is h5 (the dot closest to the sample value Xn+1).

The quantized value h5 is therefore selected as the reproduced signal at time (n+1) (denoted Sn+1), and the ADPCM code indicating h5 is output from the encoder. As the figure shows, however, the quantization error becomes large, which degrades sound quality.

For the quantization at time (n+2), the reproduced signal Sn+1 at time (n+1) was selected as h5 from the candidates h5 to h8 (one of the two end values), so the amplitude change is regarded as large, and the quantization step size at time (n+2), that is, the dot spacing of h9 to h12, is larger than the step size at time (n+1). The same processing as above is then performed, and h9 is selected as the reproduced signal.

Thus, in conventional ADPCM, even when the speech level changes abruptly, the quantized value of the sample with the large amplitude change is obtained using the small quantization step size that applied before the amplitude increase. A large quantization error therefore occurs, degrading sound quality. The speech coding apparatus 10 suppresses quantization errors efficiently and improves speech quality even when the speech amplitude changes greatly.
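The failure mode described above can be reproduced with a small greedy (sample-by-sample) encoder in Python. This is a sketch under stated assumptions: no prediction, 2-bit codes, and invented level and scaling tables (`LEVELS`, `FACTORS`) in which the two inner codes halve the step and the two outer codes double it. The function name `greedy_encode` and the sample values are hypothetical.

```python
LEVELS = {1: 1.5, 2: 0.5, 3: -0.5, 4: -1.5}   # assumed reconstruction levels
FACTORS = {1: 2.0, 2: 0.5, 3: 0.5, 4: 2.0}    # assumed: outer codes widen, inner codes narrow


def greedy_encode(samples, s_prev, step):
    """Conventional encoding: each code is chosen from the current sample only."""
    codes, reproduced = [], []
    for x in samples:
        e = x - s_prev                                    # difference signal
        code = min(LEVELS, key=lambda c: abs(e - LEVELS[c] * step))
        s_prev = s_prev + LEVELS[code] * step             # reproduced signal
        step *= FACTORS[code]                             # step for the next sample
        codes.append(code)
        reproduced.append(s_prev)
    return codes, reproduced


samples = [0.4, 6.0]   # small value at time n, sharp rise at time n+1
codes, reproduced = greedy_encode(samples, s_prev=0.0, step=1.0)
errors = [x - s for x, s in zip(samples, reproduced)]
```

At time n the inner code fits well, which narrows the step size; at time (n+1) even the outermost candidate falls far short of the sample, so the second entry of `errors` is large.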

Next, the configuration and operation of the speech coding apparatus 10 are described in detail, beginning with the code candidate storage unit 11. FIG. 4 illustrates the concept of the code candidates stored in the code candidate storage unit 11. Consider obtaining the code idx[n] for the sample value of the speech signal at time n. Assume that the neighborhood interval of the sample value at time n extends to the sample value at time (n+1) (that is, one look-ahead sample) and that quantization uses 2 bits per sample.

The code j1 of the quantized value for the sample value at time n has four candidates, #1 to #4, and for each of #1 to #4 of code j1, the code j2 at time (n+1) also has four candidates, #1 to #4.

If, for example, the case in which #1 is selected for the code j1 at time n and #1 is selected for the code j2 at time (n+1) is written {1, 1}, then there are 16 combinations of code candidates in all: {1, 1}, {1, 2}, ..., {4, 3}, {4, 4}.

Therefore, when the code at the current time n is obtained with 2-bit quantization and one look-ahead sample (using sample values up to time (n+1)), the code candidate storage unit 11 stores all 16 combinations {j1, j2} = {1, 1}, ..., {4, 4} of the code j1 at time n and the code j2 at time (n+1).

The code candidate storage unit 11 inputs these code candidates sequentially to the local decoder 12. Once all 16 have been input, the apparatus moves on to obtaining the code for time (n+1), which is now the current time, and therefore uses sample values up to time (n+2). The code candidate storage unit 11 then stores all 16 combinations of the code j1 at time (n+1) and the code j2 at time (n+2) and again inputs them to the local decoder 12. This operation is repeated thereafter.

In the above example, the code idx[n] at time n was obtained with one look-ahead sample, covering up to time (n+1). With 2-bit quantization and two look-ahead samples, the code candidate storage unit 11 stores all 64 combinations {j1, j2, j3} = {1, 1, 1}, ..., {4, 4, 4} of the code j1 at time n, the code j2 at time (n+1), and the code j3 at time (n+2). The same reasoning applies to larger numbers of look-ahead samples.
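The candidate sets described above are simply all tuples of codes of length pr + 1, which can be enumerated with `itertools.product`. The function name `code_candidates` is hypothetical; the ordering {1, 1}, {1, 2}, ... follows the order in which the text lists the candidates.

```python
from itertools import product


def code_candidates(bits, pr):
    """All code combinations for `pr` look-ahead samples at `bits` bits per sample."""
    codes = range(1, 2 ** bits + 1)          # codes #1 .. #(2**bits)
    return list(product(codes, repeat=pr + 1))


pairs = code_candidates(bits=2, pr=1)        # {j1, j2}
triples = code_candidates(bits=2, pr=2)      # {j1, j2, j3}
print(len(pairs), len(triples))
```

The count grows as (2^bits)^(pr+1), so each additional look-ahead sample multiplies the number of candidates to decode and evaluate by the number of quantization levels.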

Next, the operation for suppressing quantization error during encoding is described with reference to FIGS. 5 to 11. The code idx[n] at time n is to be obtained, and the information at time (n+1) is used as one look-ahead sample. To simplify the description, no prediction is performed and quantization uses 2 bits.

FIGS. 5 to 10 illustrate the operation. For the speech signal, let Xn be the sample value taken at time n and Xn+1 the sample value taken at time (n+1). The speech signal is assumed to have a waveform whose amplitude rises sharply around time (n+1).

In FIG. 5, decoding the code candidate j1 at time n yields four candidates, #1 to #4. Suppose code candidate #1 is selected first at time n. The code candidates selectable at time (n+1) for code candidate #1 are then the four candidates #(1-1) to #(1-4), which have a wide quantization step size.

In FIG. 6, suppose #(1-1) is selected as the code candidate at time (n+1). The difference $d_1$ between the sample value Xn at time n and code candidate #1 is obtained, and the difference $d_{1\text{-}1}$ between the sample value Xn+1 at time (n+1) and code candidate #(1-1) is obtained. The sum of squares of these differences is then calculated to obtain the error evaluation value e({1, 1}).

$$e(\{1,1\}) = (d_1)^2 + (d_{1\text{-}1})^2 \qquad (1)$$
In FIG. 7, suppose #(1-2) is selected as the code candidate at time (n+1). The difference between the sample value Xn at time n and code candidate #1 is again $d_1$, and the difference $d_{1\text{-}2}$ between the sample value Xn+1 at time (n+1) and code candidate #(1-2) is obtained. The sum of squares of these differences is then calculated to obtain the error evaluation value e({1, 2}).

$$e(\{1,2\}) = (d_1)^2 + (d_{1\text{-}2})^2 \qquad (2)$$
The same processing is performed when #(1-3) and #(1-4) are selected as the code candidate at time (n+1), yielding the error evaluation values e({1, 3}) and e({1, 4}).

In FIG. 8, suppose code candidate #2 is selected at time n. The code candidates selectable at time (n+1) for code candidate #2 are then the four candidates #(2-1) to #(2-4), which have a narrow quantization step size.

In FIG. 9, suppose #(2-1) is selected as the code candidate at time (n+1). The difference $d_2$ between the sample value Xn at time n and code candidate #2 is obtained, and the difference $d_{2\text{-}1}$ between the sample value Xn+1 at time (n+1) and code candidate #(2-1) is obtained. The sum of squares of these differences is then calculated to obtain the error evaluation value e({2, 1}).

$$e(\{2,1\}) = (d_2)^2 + (d_{2\text{-}1})^2 \qquad (3)$$
In FIG. 10, suppose #(2-2) is selected as the code candidate at time (n+1). The difference between the sample value Xn at time n and candidate #2 is again $d_2$, and the difference $d_{2\text{-}2}$ between the sample value Xn+1 at time (n+1) and code candidate #(2-2) is obtained. The sum of squares of these differences is then calculated to obtain the error evaluation value e({2, 2}).

$$e(\{2,2\}) = (d_2)^2 + (d_{2\text{-}2})^2 \qquad (4)$$
The same processing is performed when #(2-3) and #(2-4) are selected as the code candidate at time (n+1), yielding the error evaluation values e({2, 3}) and e({2, 4}).

The same processing is also performed for code candidates #3 and #4 at time n, giving 16 error evaluation values e({1, 1}) to e({4, 4}) in total. The minimum is then selected from among these values. In this example, the figures show that the error evaluation value e({1, 1}) described with reference to FIG. 6 is the minimum. Code candidate #1 at time n is therefore finally selected, and the code idx[n] representing code candidate #1 is output onto the transmission path.
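The search over the 16 combinations described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the apparatus itself: the function name `best_first_code`, the representation of each candidate by its reproduction level, and all numeric values are hypothetical. The values were chosen so that candidate #2 is nearest to Xn while the combination {1, 1} gives the smallest error evaluation value, as in the figures.

```python
from itertools import product

def best_first_code(x_n, x_n1, level_n, levels_n1):
    """Exhaustive two-sample search.

    level_n[j1]       : reproduction level of candidate #j1 at time n
    levels_n1[j1][j2] : reproduction level of candidate #(j1-j2) at time n+1
    Returns (j1, j2, errors) with 1-based candidate numbers, where
    errors[(j1, j2)] is the error evaluation value e({j1, j2}).
    """
    errors = {}
    for j1, j2 in product(range(len(level_n)), range(len(levels_n1[0]))):
        d_first = x_n - level_n[j1]            # difference at time n
        d_second = x_n1 - levels_n1[j1][j2]    # difference at time n+1
        errors[(j1 + 1, j2 + 1)] = d_first ** 2 + d_second ** 2
    best = min(errors, key=errors.get)
    return best[0], best[1], errors

# Hypothetical levels: choosing an outer candidate (#1 or #4) at time n
# widens the step at time n+1, an inner candidate (#2 or #3) narrows it.
LEVEL_N = [3.0, 1.0, -1.0, -3.0]
LEVELS_N1 = [
    [9.0, 5.0, 1.0, -3.0],     # after #1 (wide step)
    [2.5, 1.5, 0.5, -0.5],     # after #2 (narrow step)
    [0.5, -0.5, -1.5, -2.5],   # after #3 (narrow step)
    [3.0, -1.0, -5.0, -9.0],   # after #4 (wide step)
]

j1, j2, errors = best_first_code(1.5, 9.0, LEVEL_N, LEVELS_N1)
# j1 == 1 is the code output for time n; errors[(2, 1)] is what the
# nearest-candidate choice at each time would have produced.
```

With these values the nearest-candidate path {2, 1} has an error evaluation value of 42.5, whereas the selected combination {1, 1} has 2.25.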

Here, the features of the speech encoding apparatus 10 are described in comparison with the prior art. FIG. 11 is a diagram showing code selection. If the prior-art processing described with reference to FIG. 3 were applied to the example of FIGS. 5 to 10, candidate #2, which lies closest to the sample value Xn, would be selected at time n, and candidate #(2-1), which lies closest to the sample value Xn+1, would be selected at time (n+1). As a result, even though the quantization error e_{1a} at time n is small, a large quantization error e_{2a} occurs at time (n+1).

As in the prior art, the quantization step size is determined from the value selected immediately before. In the conventional processing, however, the next quantization step size is determined from codes that have already been fixed in the past. Consequently, even if the code closest to the sample value at time n can be determined at time n, when the amplitude increases sharply at the next sampling time (n+1), the code at time (n+1) ends up being obtained with the quantization step size from before the amplitude increase, which suits only small changes, and a large quantization error e_{2a} occurs at time (n+1).

The speech encoding apparatus 10, on the other hand, obtains in advance the quantization errors that arise for all code candidates in the neighboring sample interval and selects the combination of code candidates that minimizes the quantization error. Therefore, even when the amplitude increases sharply, as long as that change falls within the neighborhood interval, the apparatus no longer selects, as the prior art does, a code that produces a large quantization error at a single sample point.

For example, FIG. 6 shows the code candidates #1 and #(1-1) that minimize the error evaluation value. Because candidate #1 is selected at time n, the quantization error at time n alone, e_1 (= d_1), is larger than in the conventional processing of FIG. 11 (e_1 > e_{1a}).

However, selecting candidate #1 at time n allows the quantization step size to be widened at time (n+1). At time (n+1), a candidate close to the sample value Xn+1 is therefore selected from candidates #(1-1) to #(1-4), whose step size has been widened, so that ultimately (e_1 + e_2 (= d_{1-1})) < (e_{1a} + e_{2a}). The speech encoding apparatus 10 thus achieves the smaller quantization error.

Thus, whereas the prior art can keep the quantization error small before an amplitude change but produces a large quantization error after it, the speech encoding apparatus 10 is configured to reduce the quantization error as a whole across the amplitude change, which makes it possible to improve S/N.

Next, the speech encoding apparatus 10 is described with the detailed blocks of the local decoder 12 shown. FIG. 12 is a diagram showing the configuration of the speech encoding apparatus 10. The speech encoding apparatus 10 includes a code candidate storage unit 11, a local decoder 12, and an error evaluation unit 13. The local decoder 12 consists of an adaptive inverse quantization unit 12a, an adder 12b, and a delay unit 12c. The error evaluation unit 13 consists of a difference square sum calculation unit 13a and a minimum value detection unit 13b. Since the code candidate storage unit 11 has been described above, the local decoder 12 and the error evaluation unit 13 are described here. The code candidate storage unit 11 is assumed to store combinations {j1, j2} of a code j1 at time n and a code j2 at time (n+1).

In the local decoder 12, when the adaptive inverse quantization unit 12a receives the code candidate {1, 1}, it updates the quantization step size from the result of the processing at the previous time (n-1). It first identifies the quantized value corresponding to the code j1 = #1 at time n, then inversely quantizes that value and outputs the inverse-quantized signal dq[n].

The adder 12b adds the delayed signal se[n] output from the delay unit 12c (the reproduction signal sr[n-1] at time (n-1), delayed by one sample time) and the inverse-quantized signal dq[n] to generate the reproduction signal sr[n] (= dq[n] + se[n]), and outputs it to the delay unit 12c and the error evaluation unit 13. On receiving the reproduction signal sr[n], the delay unit 12c delays it by one sample time, outputs it as the delayed signal se[n+1], and feeds it back to the adder 12b.

Next, the adaptive inverse quantization unit 12a identifies the quantized value corresponding to the code j2 = #1 at time (n+1), inversely quantizes it, and outputs the inverse-quantized signal dq[n+1]. The adder 12b and the delay unit 12c then perform the same processing as above to generate the reproduction signal for code j2.
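The adder and delay loop of the local decoder can be sketched as below. The class and function names are hypothetical, the adaptive inverse quantization unit 12a is left out (the inverse-quantized values dq are simply passed in), and an initial delayed value of 0 is an assumption.

```python
class LocalDecoderLoop:
    """Adder 12b plus one-sample delay 12c.

    sr[n]   = dq[n] + se[n]
    se[n+1] = sr[n]
    """

    def __init__(self, initial_se=0.0):
        self.se = initial_se          # delayed signal se[n]

    def step(self, dq):
        sr = dq + self.se             # adder 12b
        self.se = sr                  # delay unit 12c holds sr[n] for time n+1
        return sr

def reproduce(dq_values, initial_se=0.0):
    """Return the reproduction signal for a sequence of inverse-quantized values."""
    loop = LocalDecoderLoop(initial_se)
    return [loop.step(dq) for dq in dq_values]
```

For example, `reproduce([2.0, 6.0, -1.0])` accumulates the inverse-quantized values into the reproduction signal `[2.0, 8.0, 7.0]`.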

In the error evaluation unit 13, the difference square sum calculation unit 13a receives the input sample value in[n] and the reproduction signal sr[n] and calculates the sum of squared differences based on the following equation, where 0 ≤ k ≤ pr (pr is the number of pre-read samples).
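Equation (5) is referred to here but does not appear in the text. Judging from equations (3) and (4) and from the definitions of in, sr, and pr, it presumably has the following form. The indexing and the notation sr[n+k](J), meaning the reproduction signal obtained under the candidate combination J, are assumptions rather than a quotation of the original.

```latex
e(J) = \sum_{k=0}^{pr} \bigl( in[n+k] - sr[n+k](J) \bigr)^{2} \tag{5}
```

With pr = 1 and J = {2, 1}, the two terms are (d_2)^2 and (d_{2-1})^2, which agrees with equation (3).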

The minimum value detection unit 13b detects the minimum among the values of equation (5) over all code candidates. It then identifies the code candidate (reproduction signal) at time n within the combination that gives the minimum, and outputs the corresponding code idx[n] onto the transmission path.

When prediction is to be used with the above configuration, replacing the delay unit 12c with an adaptive prediction unit and feeding the reproduction signal and the inverse-quantized signal into that unit makes the configuration compatible with adaptive prediction.

FIG. 13 is a flowchart outlining the operation of the speech encoding apparatus 10. The code candidates are {j1, j2}, where j1 is the code at time n and j2 is the code at time (n+1).
[S1] The code candidate storage unit 11 stores the code candidates {j1, j2}.
[S2] The local decoder 12 generates the reproduction signal for code j1 at time n.
[S3] The local decoder 12 generates the reproduction signal for code j2 at time (n+1).
[S4] The error evaluation unit 13 calculates the error evaluation value e({j1, j2}) based on equation (5).
[S5] If the errors for all code candidates {j1, j2} = {1, 1} to {f, f} have been calculated, the process proceeds to step S6; otherwise, it returns to step S2.
[S6] The error evaluation unit 13 detects the minimum of the error evaluation values e({j1, j2}) and outputs j1 of the minimizing {j1, j2} as the code idx[n] at time n.
[S7] The local decoder 12 updates the quantization step size for time (n+1) based on j1 at time n determined in step S6.
[S8] Time n is updated, and processing to obtain the code at time (n+1) begins (the code candidate storage unit 11 now stores the code candidates {j1, j2} consisting of code j1 at time (n+1) and code j2 at time (n+2)).
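Steps S1 to S8 can be sketched as a loop. This is an illustration under several assumptions that do not come from the text: the four quantizer levels in `LEVELS`, the step-size adaptation table `STEP_SCALE`, the function names, the 0-based code numbering (code 0 corresponds to candidate #1), and the shortening of the search window at the end of the input are all hypothetical.

```python
from itertools import product

LEVELS = (1.5, 0.5, -0.5, -1.5)     # hypothetical levels, in units of the step size
STEP_SCALE = (2.0, 0.5, 0.5, 2.0)   # hypothetical rule: outer codes widen, inner codes narrow

def decode_step(state, code):
    """One local-decoder step; state is (previous reproduction value, step size)."""
    sr_prev, step = state
    sr = sr_prev + LEVELS[code] * step           # sr[n] = se[n] + dq[n]
    return sr, (sr, step * STEP_SCALE[code])     # new state carries the next step size

def encode(samples, pr=1, init_state=(0.0, 1.0)):
    """Encode with pr pre-read samples; pr=0 is nearest-candidate coding."""
    state, codes = init_state, []
    for n in range(len(samples)):
        window = samples[n:n + pr + 1]           # times n .. n+pr (shorter at the end)
        best_err, best_j1 = None, None
        for cand in product(range(len(LEVELS)), repeat=len(window)):  # S1
            s, err = state, 0.0
            for x, code in zip(window, cand):    # S2, S3: reproduce each code
                sr, s = decode_step(s, code)
                err += (x - sr) ** 2             # S4: accumulate squared difference
            if best_err is None or err < best_err:   # S5, S6: track the minimum
                best_err, best_j1 = err, cand[0]
        codes.append(best_j1)                    # S6: output idx[n]
        _, state = decode_step(state, best_j1)   # S7: update step size from chosen j1
    return codes                                 # S8: the loop advances n

def decode(codes, init_state=(0.0, 1.0)):
    """Decoder side: replay the codes through the same decoder step."""
    state, out = init_state, []
    for code in codes:
        sr, state = decode_step(state, code)
        out.append(sr)
    return out
```

For the input `[1.5, 9.0]` with `init_state=(0.0, 2.0)`, `encode(..., pr=0)` gives codes `[1, 0]`, which decode to `[1.0, 2.5]`, while `encode(..., pr=1)` gives `[0, 0]`, which decode to `[3.0, 9.0]`. The decoder is the same in both cases; only the encoder's choice of codes differs.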

As described above, when obtaining the code for a sample value of a speech signal, the apparatus stores all combinations of code candidates in the neighborhood interval of that sample value, generates reproduction signals from the code candidates, calculates the sum of squared differences between the input sample values and the reproduction signals, and outputs the code from the combination that minimizes that sum. This makes it possible to suppress quantization error efficiently even when the amplitude of the speech varies widely, and thus to improve speech quality. Moreover, because only the encoder side needs to be changed, the scheme can readily be put into practical use.

Next, the effect is described. FIG. 14 shows the waveforms obtained with conventional processing, and FIG. 15 shows the waveforms obtained with processing by the speech encoding apparatus 10. The vertical axis is amplitude and the horizontal axis is time. The results were measured on natural-speech (live voice) files of male and female speakers.

The upper waveform W1a in FIG. 14 is the signal reproduced from the output of a conventional ADPCM encoder (the output waveform of the ADPCM decoder), and the lower waveform W1b is the level difference between the original input speech and waveform W1a. Likewise, the upper waveform W2a in FIG. 15 is the signal reproduced from the output of the speech encoding apparatus 10 (the output waveform of the ADPCM decoder), and the lower waveform W2b is the level difference between the original input speech and waveform W2a. (The error signal showing the level difference is magnified four times.)

Comparing waveforms W1b and W2b shows that W2b is flatter, meaning that the quantization error is suppressed. The S/N, which was 28.37 dB with the conventional method, is 34.50 dB with the speech encoding apparatus 10, an improvement of 6.13 dB, which shows that the speech encoding apparatus 10 is effective.
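The text does not state how S/N was computed. A minimal sketch, assuming the usual definition as the ratio of signal power to error power in decibels (the function name `snr_db` is hypothetical), is:

```python
import math

def snr_db(original, decoded):
    """S/N in dB, assumed here to be 10*log10(sum(x^2) / sum((x - y)^2))."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, decoded))
    if noise == 0:
        return math.inf               # identical signals: no error power
    return 10.0 * math.log10(signal / noise)
```

Under this definition, an improvement of 6.13 dB corresponds to the error power falling by a factor of about 4.1 (10^0.613) for the same input signal.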

Next, a modification is described. FIG. 16 is a diagram showing the modification. The speech encoding apparatus 10a additionally includes a code selection unit 14. The other components are the same as in FIG. 12.

Taking the final-stage sample time of the neighborhood interval as time (n+k), the code selection unit 14 selects, from the code candidates at time (n+k), the code representing the value closest to the input sample value in[n+k], and outputs it to the adaptive inverse quantization unit 12a. For the reproduction signal at time (n+k), the local decoder 12 then reproduces only the code selected by the code selection unit 14.

FIG. 17 is a diagram for explaining the operation of the modification. When the code at time n is obtained with one pre-read sample, the final-stage time is time (n+1) (with two pre-read samples, the final-stage time is time (n+2)).

In the operation of the speech encoding apparatus 10 described above, up to FIG. 15, all codes supplied from the code candidate storage unit 11 are decoded to generate reproduction signals, and error evaluation is performed on them. In the modification, by contrast, the code selection unit 14 selects in advance, from the code candidates at the final-stage time (n+k), the single code closest to the input sample value in[n+k] at that time (that is, ordinary encoding is performed). For the final-stage time (n+k), only that code is decoded by the local decoder 12 to generate the reproduction signal, after which the error evaluation unit 13 performs error evaluation.

In the case shown in the figure, therefore, #(1-1) is selected by the code selection unit 14, so the local decoder 12 decodes only #(1-1) and does not decode #(1-2) to #(1-4). With this configuration, the modification reduces the amount of computation and makes it possible to increase processing speed.
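The saving can be illustrated with a small sketch for one pre-read sample. The function name, the representation of candidates by their reproduction levels, and the numeric values are hypothetical. For each candidate at time n, only the final-stage candidate nearest to the input sample is evaluated, so 4 combinations are evaluated instead of 16.

```python
def pruned_search(x_n, x_n1, level_n, levels_n1):
    """Two-sample search with nearest-candidate selection at the final stage.

    Returns (j1, j2, error, evaluated) with 1-based candidate numbers;
    'evaluated' counts the combinations for which an error was computed.
    """
    best = None
    evaluated = 0
    for j1, level in enumerate(level_n):
        row = levels_n1[j1]
        # Role of code selection unit 14: nearest final-stage candidate to in[n+1].
        j2 = min(range(len(row)), key=lambda j: abs(x_n1 - row[j]))
        err = (x_n - level) ** 2 + (x_n1 - row[j2]) ** 2
        evaluated += 1
        if best is None or err < best[0]:
            best = (err, j1 + 1, j2 + 1)
    return best[1], best[2], best[0], evaluated

# Hypothetical levels: an outer candidate at time n widens the next step.
LEVEL_N = [3.0, 1.0, -1.0, -3.0]
LEVELS_N1 = [
    [9.0, 5.0, 1.0, -3.0],
    [2.5, 1.5, 0.5, -0.5],
    [0.5, -0.5, -1.5, -2.5],
    [3.0, -1.0, -5.0, -9.0],
]

result = pruned_search(1.5, 9.0, LEVEL_N, LEVELS_N1)
```

Because the nearest final-stage candidate also has the smallest squared difference at that stage for a given earlier choice, this sketch finds the same minimum within the window as evaluating every combination would.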

Thus, by selecting a code in consideration of the quantization error not only at the current sample but also over the neighboring sample interval, quantization error can be suppressed and sound quality improved. Although the above description takes a speech signal as the signal to be encoded, the invention is not limited to speech signals and can be applied widely, as a high-efficiency encoding scheme, in a variety of fields.

As described above, when obtaining the code for a sample value of a speech signal, the speech encoding apparatus stores all combinations of code candidates in the neighborhood interval of that sample value, decodes the stored codes to generate reproduction signals, calculates the sum of squared differences between the input sample values and the reproduction signals, regards the code candidate with the minimum sum of squares as having the minimum quantization error, and outputs the code from that code candidate. This makes it possible to suppress quantization error efficiently even when the amplitude of the speech varies widely, and thus to improve speech quality.

The foregoing merely illustrates the principle of the present invention. Numerous modifications and changes will be possible for those skilled in the art, and the present invention is not limited to the exact configurations and applications shown and described above. All corresponding modifications and equivalents are regarded as falling within the scope of the invention as defined by the appended claims and their equivalents.

FIG. 1 is a principle diagram of the speech encoding apparatus.
FIG. 2 is a diagram showing how a reproduction signal is obtained.
FIG. 3 is a diagram showing how a large quantization error occurs when amplitude changes cannot be followed.
FIG. 4 is a diagram for explaining the concept of the code candidates stored in the code candidate storage unit.
FIG. 5 is a diagram for explaining the operation.
FIG. 6 is a diagram for explaining the operation.
FIG. 7 is a diagram for explaining the operation.
FIG. 8 is a diagram for explaining the operation.
FIG. 9 is a diagram for explaining the operation.
FIG. 10 is a diagram for explaining the operation.
FIG. 11 is a diagram showing code selection.
FIG. 12 is a diagram showing the configuration of the speech encoding apparatus.
FIG. 13 is a flowchart outlining the operation of the speech encoding apparatus.
FIG. 14 is a diagram showing the waveforms obtained with conventional processing.
FIG. 15 is a diagram showing the waveforms obtained with processing by the speech encoding apparatus.
FIG. 16 is a diagram showing the modification.
FIG. 17 is a diagram for explaining the operation of the modification.
FIG. 18 is a diagram showing the block configuration of an ADPCM codec.
FIG. 19 is a diagram showing the block configuration of an ADPCM codec.

Explanation of symbols

10 Speech encoding apparatus
11 Code candidate storage unit
12 Local decoder
13 Error evaluation unit

Claims

A speech encoding apparatus that encodes a speech signal, comprising:
a code candidate storage unit that, when obtaining a code for a sample value of the speech signal, stores, each time a code is obtained, all code candidates that can be taken up to the number of pre-read samples, as a plurality of combinations of code candidates in a neighborhood interval of the sample value;
a decoded signal generation unit that decodes the codes stored in the code candidate storage unit to generate reproduction signals; and
an error evaluation unit that calculates the sum of squares of the differences between the input sample values and the reproduction signals, detects the code candidate for which the sum of squares is minimum so as to minimize the quantization error, and outputs a code in the detected code candidate.

The speech encoding apparatus according to claim 1, wherein, when obtaining a code for the sample value at time n, with the number of pre-read samples pr defining the neighborhood interval and time (n+k) taken over 0 ≤ k ≤ pr, the code candidate storage unit stores a plurality of combinations of code candidates J = {j1, j2, ..., jk}, from the code j1 for the sample value at time n to the code jk for the sample value at time (n+k); the decoded signal generation unit sequentially generates the reproduction signal sr(J) from the codes j1, j2, ..., jk; and the error evaluation unit, with in denoting the input sample value,

detects the code candidate {j1, j2, ..., jk} that minimizes the error evaluation value e(J), and outputs j1 of the detected code candidate {j1, j2, ..., jk} as the code at time n.

The speech encoding apparatus according to claim 1, further comprising a code selection unit that, when obtaining a code for the sample value at time n, with the number of pre-read samples pr defining the neighborhood interval and the final-stage sample time of the neighborhood interval being time (n+k) (k = pr), selects, from the code candidates at the final-stage time (n+k), the code representing the value closest to the input sample value in[n+k], wherein, for the reproduction signal at the final-stage time (n+k), the decoded signal generation unit reproduces only the code selected by the code selection unit to generate the reproduction signal.

An encoding method for encoding a signal, wherein:
when obtaining a code for the sample value at time n, with the number of pre-read samples pr defining the neighborhood interval and time (n+k) taken over 0 ≤ k ≤ pr,
all code candidates that can be taken up to the number of pre-read samples are stored, each time a code is obtained, as a plurality of combinations of code candidates J = {j1, j2, ..., jk}, from the code j1 for the sample value at time n to the code jk for the sample value at time (n+k);
the reproduction signal sr(J) is sequentially generated from the codes j1, j2, ..., jk;
with in denoting the input sample value,

the code candidate {j1, j2, ..., jk} that minimizes the error evaluation value e(J) is detected; and
j1 of the detected code candidate {j1, j2, ..., jk} is output as the code at time n.

The encoding method according to claim 4, wherein, when obtaining a code for the sample value at time n, with the number of pre-read samples pr defining the neighborhood interval and the final-stage sample time of the neighborhood interval being time (n+k) (k = pr), the code representing the value closest to the input sample value in[n+k] is selected from the code candidates at the final-stage time (n+k), and, for the reproduction signal at the final-stage time (n+k), only the selected code is reproduced to generate the reproduction signal.