JP3262652B2

JP3262652B2 - Audio encoding device and audio decoding device

Info

Publication number: JP3262652B2
Application number: JP28089093A
Authority: JP
Inventors: 義博有山; 浩桂川; 弘美青柳; 克俊伊東
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1993-11-10
Filing date: 1993-11-10
Publication date: 2002-03-04
Anticipated expiration: 2017-03-04
Also published as: JPH07134600A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Industrial Field of Application] The present invention relates to a speech encoding device and a speech decoding device, and is suitable for application to, for example, devices that follow the VSELP (Vector Sum Excited Linear Prediction) speech coding scheme.

[0002]

[Prior Art] The VSELP speech coding scheme is a type of code-excited linear prediction (CELP) speech coding scheme, in which a speech signal is parameter-encoded into vocal tract information and vocal cord (excitation source) information. The VSELP scheme differs from the CELP scheme particularly in how the excitation source parameters are encoded. The VSELP scheme has been standardized, for example, by a committee of the US TIA (Telecommunications Industry Association) as a speech coding scheme for digital cellular systems.

[0003] The VSELP speech coding scheme is described in, for example, Reference A and Reference B.

[0004] Reference A: "VECTOR SUM EXCITED LINEAR PREDICTION (VSELP) SPEECH CODING AT 8 KBPS", Ira A. Gerson and Mark A. Jasiuk, Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. 461-464, April 1990

Reference B: Japanese Unexamined Patent Publication No. H4-190399

FIG. 2 shows the functional configuration of a conventional speech encoding device that follows the VSELP speech coding scheme, based mainly on the description in Reference A.

[0005] In FIG. 2, the information on the dashed output lines drawn from each functional block is information transmitted to the speech decoding device, whereas the information on the solid output lines drawn from each functional block is information processed when determining (searching for) the information to be transmitted.

[0006] This VSELP speech encoding device comprises an input speech analysis unit 101 (which can be further divided by function into a speech power calculation unit, a speech power interpolation unit, and a vocal tract analysis unit), a target signal generation unit 102, a long-term lag selection unit 103, a first code selection unit 104, a second code selection unit 105, a gain selection unit 106, three amplifier circuits 107 to 109, and an adder circuit 110. The device receives a digitized speech signal V sampled at a predetermined frequency and encodes it.

[0007] The input speech analysis unit 101 performs LPC analysis (a type of vocal tract analysis) on the input speech signal V every 160 samples (one frame), for example, to obtain LPC coefficients k as vocal tract information and the average speech power R. It then quantizes the LPC coefficients k and the average speech power R, interpolates them for subframes of 40 samples each, and outputs the LPC coefficients kj and the average speech power R0 for each subframe.

[0008] The average speech power is interpolated for the subframes as follows. The average speech power of the first subframe is set to the average speech power of the previous frame. That of the second subframe is set to the geometric mean of the average speech power of the previous frame and that of the current frame. Those of the third and fourth subframes are each set to the average speech power of the current frame.
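The interpolation rule in paragraph [0008] can be sketched as follows. This is a minimal floating-point illustration, not the device's fixed-point implementation; the function name `interpolate_subframe_power` is hypothetical and does not appear in the patent.

```python
import math


def interpolate_subframe_power(prev_frame_power, cur_frame_power):
    """Return the average speech power for the four subframes of one frame.

    Subframe 1 reuses the previous frame's power, subframe 2 takes the
    geometric mean of the previous and current frame powers, and
    subframes 3 and 4 use the current frame's power.
    """
    return [
        prev_frame_power,
        math.sqrt(prev_frame_power * cur_frame_power),  # geometric mean
        cur_frame_power,
        cur_frame_power,
    ]
```

Only the second subframe needs a multiplication and a square root; the other three values are simple copies.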

[0009] For each subframe, the target signal generation unit 102 generates a residual signal of the input speech signal V using an analysis filter based on the LPC coefficients kj, resynthesizes a speech signal by feeding that residual signal to a weighting filter based on the LPC coefficients kj, and then removes the influence of the excitation signal (excitation code vector) ex synthesized in earlier subframes, thereby generating and outputting the target signal p.

[0010] With this target signal p as the goal, each functional unit described below searches for and determines the excitation source information it is responsible for. The symbols for the various items of information in FIG. 2 denote their determined values; in the following, a value still under search (a candidate) is denoted by appending the suffix t to its symbol.

[0011] The long-term lag selection unit 103 stores the excitation signals ex of several previously synthesized subframes, updated each time the processing of a subframe finishes (an internal value called the long-term filter state). The unit extracts 40-sample excitation signals c0t from the stored synthesized excitation signal at various lags Lt (each such 40-sample excitation signal is a long-term excitation code vector), feeds each extracted long-term excitation code vector c0t to a weighting filter based on the LPC coefficients kj to obtain a signal f0t, computes the squared error between f0t and the target signal p, and finds and outputs the long-term excitation code vector c0 that minimizes this squared error. The lag Lt of each long-term excitation code vector c0t serves as its index.
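The lag search in paragraph [0011] can be sketched as below. This is a simplified illustration under stated assumptions: the weighting filter is passed in as a hypothetical callable `weight` (identity by default), the function name `select_long_term_lag` is invented for this sketch, and lags shorter than the subframe length are skipped rather than handled as a real coder would.

```python
def select_long_term_lag(history, target, lags, weight=lambda x: list(x)):
    """Pick the lag whose extracted code vector best matches the target.

    history : past synthesized excitation samples (long-term filter state)
    target  : target signal p for the current subframe
    lags    : iterable of candidate lags Lt
    weight  : stand-in for the LPC-based weighting filter (assumption)
    """
    n = len(target)
    best = None
    for lag in lags:
        # This sketch only handles lags that yield a full n-sample vector.
        if lag < n or lag > len(history):
            continue
        start = len(history) - lag
        c0t = history[start:start + n]      # candidate code vector
        f0t = weight(c0t)                   # weighted candidate
        err = sum((p - f) ** 2 for p, f in zip(target, f0t))
        if best is None or err < best[0]:
            best = (err, lag, c0t)
    if best is None:
        raise ValueError("no usable lag candidate")
    return best[1], best[2]
```

The returned lag is the index L that would be transmitted; the decoder can rebuild c0 from its own copy of the history using only that lag.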

[0012] The first code selection unit 104 stores a plurality of fixed data as basis vectors v1. It creates a plurality of first excitation code vectors u1t by combining the basis vectors with a parameter (Gray code) θ that assigns a positive or negative polarity to each vector (see equation (1) of Reference A). It passes each excitation code vector u1t through a weighting filter based on the LPC coefficients kj, orthogonalizes the result against the long-term excitation code vector c0 selected by the long-term lag selection unit 103 to create a signal f1t, computes the squared error between f1t and the target signal p, and finds and outputs the first excitation code vector c1 (u1) that minimizes this squared error. An index It is assigned to each first excitation code vector u1t.
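The way a code vector is formed from basis vectors and polarity parameters can be illustrated with the sketch below. It assumes the common vector-sum form, in which each code vector is the sum of the basis vectors, each multiplied by +1 or -1; the mapping from index bits to signs used here (bit set means +1) and the name `build_codevector` are assumptions for illustration and are not taken from this document.

```python
def build_codevector(basis, index):
    """Form one excitation code vector as a signed sum of basis vectors.

    basis : list of M basis vectors of equal length
    index : integer in [0, 2**M); bit m selects the polarity of basis[m]
            (assumed mapping: bit set -> +1, bit clear -> -1)
    """
    n = len(basis[0])
    u = [0.0] * n
    for m, v in enumerate(basis):
        sign = 1.0 if (index >> m) & 1 else -1.0
        for i in range(n):
            u[i] += sign * v[i]
    return u
```

With M basis vectors this yields 2**M code vectors while storing only M vectors, and complementing every bit of the index simply negates the code vector.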

[0013] The second code selection unit 105 stores a plurality of basis vectors v2 that differ from those of the first code selection unit 104. It creates a plurality of second excitation code vectors u2t by combining the basis vectors with the parameter θ. It passes each excitation code vector through a weighting filter based on the LPC coefficients kj, orthogonalizes the result against the long-term excitation code vector c0 and also against the optimal first excitation code vector c1 selected by the first code selection unit 104 to create a signal f2t, computes the squared error between f2t and the target signal p, and finds and outputs the second excitation code vector c2 (u2) that minimizes this squared error. An index Jt is assigned to each second excitation code vector u2t.

[0014] The gain selection unit 106, together with the amplifier circuits 107 to 109 and the adder circuit 110, determines the optimal gains β, γ1, and γ2 for the excitation code vectors c0, c1, and c2.

[0015] The gain selection unit 106 stores multiple sets of gain information for the excitation code vectors c0, c1, and c2, each as a set of parameters GS, P0, and P1 obtained by an equivalent transformation of the gains β, γ1, and γ2. The sets of parameters GS, P0, and P1 are vector-quantized, and an index (gain index) G is assigned to each set. The relationship between a set of gains β, γ1, γ2 and a set of parameters GS, P0, P1 is given in equations (15) to (23) of Reference A.

[0016] For each parameter set GSt, P0t, P1t, the gain selection unit 106 converts the set into gains βt, γ1t, γ2t based on the LPC coefficients kj, the average speech power R0, and the excitation code vectors c0, c1, c2, and supplies these gains to the corresponding amplifier circuits 107, 108, and 109. The amplifier circuits 107 to 109 and the adder circuit 110 produce, as multiple candidates, synthesized excitation signals ext, each being the sum of the products of the excitation code vectors c0, c1, c2 and the corresponding gains βt, γ1t, γ2t. For each synthesized excitation signal ext, the gain selection unit 106 applies a synthesis filter derived from the LPC coefficients kj to obtain a locally reproduced signal, computes the squared error between that signal and the target signal p, and selects as optimal the set of gains β, γ1, γ2 that minimizes this squared error.

[0017] The synthesized excitation signal ex generated from the optimal excitation code vectors c0, c1, c2 and the optimal gains β, γ1, γ2 is used, among other things, to form the target signal for the next subframe and to update the long-term filter state.

[0018] The VSELP speech encoding device transmits to the VSELP speech decoding device the LPC coefficients kj, the average speech power R0, the index (long-term lag) L of the long-term excitation code vector c0, the index I of the optimal first excitation code vector c1, the index J of the optimal second excitation code vector c2, and the index G of the optimal set of gain parameters GS, P0, P1.

[0019] The VSELP speech decoding device is not illustrated, but its decoding operation is briefly described next.

[0020] In the VSELP speech decoding device, the received lag L is applied as an index to the long-term filter state, which is updated from the synthesized excitation signals of the past several subframes obtained as described below, to decode the long-term excitation code vector c0. The optimal first excitation code vector c1 is decoded from the stored basis vectors v1, which are identical to those of the encoding device (if the encoding device and the decoding device are mounted in the same apparatus, those provided for the encoding device are used), and the received index I of the first excitation code vector. Likewise, the optimal second excitation code vector c2 is decoded from the stored basis vectors v2, which are identical to those of the encoding device (again shared with the encoding device when both are in the same apparatus), and the received index J of the second excitation code vector. Furthermore, the optimal set of gain parameters GS, P0, P1 is retrieved using the optimal gain index G, and the optimal gains β, γ1, γ2 are decoded from the obtained long-term excitation code vector c0, first excitation code vector c1, and second excitation code vector c2 together with the received LPC coefficients kj and average speech power R0.

[0021] The obtained long-term excitation code vector c0, first excitation code vector c1, and second excitation code vector c2 are then multiplied by the corresponding gains β, γ1, γ2 and summed to obtain the synthesized excitation signal ex. This synthesized excitation signal ex is passed through a synthesis filter unit built from the received LPC coefficients kj to reproduce the speech signal V, and the speech signal V is then passed through a post-filter unit built from the received LPC coefficients kj, which suppresses its noise component before output.
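The first step of paragraph [0021], forming the synthesized excitation signal from the three decoded code vectors and their gains, amounts to a sample-by-sample weighted sum. A minimal floating-point sketch follows; the function name `synthesize_excitation` is hypothetical, and the synthesis filter and post-filter stages are not modeled.

```python
def synthesize_excitation(c0, c1, c2, beta, gamma1, gamma2):
    """Return ex[n] = beta*c0[n] + gamma1*c1[n] + gamma2*c2[n]."""
    return [beta * a + gamma1 * b + gamma2 * c
            for a, b, c in zip(c0, c1, c2)]
```

The same weighted sum is what the amplifier circuits 107 to 109 and the adder circuit 110 compute on the encoder side for each gain candidate.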

[0022] As its standardization by a US TIA (Telecommunications Industry Association) committee as a speech coding scheme for digital cellular systems suggests, the VSELP speech coding scheme is used mainly for compressed transmission of speech signals, as in digital mobile communications.

[0023] Processing devices that implement the VSELP speech coding scheme (speech encoding devices and speech decoding devices) are therefore required to be as small and as low in power consumption as possible. An effective way to meet these requirements is to fix the bit length (word length) of the data handled in the processing device at as short a value as possible and to perform the device's arithmetic in fixed-point representation. Accordingly, the conventional VSELP speech encoding device shown in FIG. 2 and the corresponding VSELP speech decoding device also perform their arithmetic in fixed-point representation.

[0024] However, when a short bit length is chosen for the data and fixed-point arithmetic is used, a change in the average speech power of the input speech signal in a given subframe also changes the values of the various internal variables used in the computation. With the chosen fixed-point representation, some variables may then overflow or suffer similar problems. As a result, computational precision falls, and the quality of the finally decoded speech signal can degrade markedly.

[0025] Reference B describes an invention that can resolve the problems caused by computing on short-bit-length data in fixed-point representation. Noting that there is a specific relationship between the magnitude (average power) of the input speech signal and the magnitudes of the internal variables used in the computation, it describes switching the decimal point position of the internal variables and the like according to the obtained average speech power R0, and performing the prescribed computation in fixed point at the switched decimal point position.

[0026] In FIG. 2, the target signal generation unit 102, the long-term lag selection unit 103, the first code selection unit 104, the second code selection unit 105, the gain selection unit 106, and so on are functional units that perform various computations, and are the parts to which the method of Reference B applies. In FIG. 2, the average speech power R0 is shown in parentheses as an input to the functional blocks of the target signal generation unit 102, the long-term lag selection unit 103, the first code selection unit 104, and the second code selection unit 105 to indicate that the method of Reference B is applicable to them. In the gain selection unit 106 as well, the average speech power R0 can be used not only for converting the gain parameters into gains but also for determining the fixed-point position.

[0027] When the average speech power R0 is input to one of these functional units, the unit detects the stage to which R0 belongs and obtains the fixed-point representation of the variables used in the computation, for example by consulting a table prepared in advance. After this, it performs the prescribed computation (for example, passing a given signal through a filter).

[0028] FIG. 3 shows an example configuration of the table used for switching fixed-point positions. For example, it specifies that when the average speech power R0 is at least PW0 and less than PW1, the fixed-point position of variable 1 is a1, that of variable 2 is a2, and that of variable y is ay. Incidentally, the table that associates the stage to which R0 belongs with the fixed-point representation of the internal variables is designed with the fixed-point position (that is, the precision) of the functional unit's output signal (itself an internal variable) for each stage in mind, and stores fixed-point positions of the internal variables that make that output fixed-point position achievable.
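The table lookup described in paragraphs [0027] and [0028] can be sketched as a threshold search followed by a row fetch. All numbers below are invented placeholders: the patent names the thresholds only symbolically (PW0, PW1, ...) and the positions only as a1, a2, ..., ay, so `THRESHOLDS`, `TABLE`, and `lookup_point_positions` are hypothetical.

```python
import bisect

# Hypothetical stage boundaries PW0 < PW1 < PW2 < PW3 (placeholder values).
THRESHOLDS = [0, 64, 1024, 16384]

# One row per stage: number of fractional bits for each internal variable
# (placeholder values; louder input leaves fewer fractional bits).
TABLE = [
    {"var1": 14, "var2": 12},
    {"var1": 12, "var2": 10},
    {"var1": 10, "var2": 8},
    {"var1": 8, "var2": 6},
]


def lookup_point_positions(r0):
    """Return the fixed-point positions for the stage containing r0.

    Stage i covers THRESHOLDS[i] <= r0 < THRESHOLDS[i + 1]; the last
    stage is open-ended.
    """
    if r0 < THRESHOLDS[0]:
        raise ValueError("average power below the first threshold")
    stage = bisect.bisect_right(THRESHOLDS, r0) - 1
    return TABLE[stage]
```

Because the stage depends only on R0, which is transmitted, an encoder and a decoder that share the table would arrive at the same positions without extra side information.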

[0029]

[Problems to Be Solved by the Invention] However, among the functional units whose internal variables can have their fixed-point representation switched in this way, the long-term lag selection unit 103 differs in character from the others. Specifically, the long-term lag selection unit 103 stores the synthesized excitation signals ex of several past subframes as the long-term filter state, and depending on the stored subframe, the fixed-point position of its synthesized excitation signal differs from that of the others. When the fixed-point position of the long-term excitation code vector c0 output by the long-term lag selection unit 103 is determined from the average speech power R0 of the current subframe, the synthesized excitation signals stored as the long-term filter state must be read out with their fixed-point positions matched to the determined one. This means shifting the data, and depending on the shift direction, values may overflow.
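The overflow risk described in paragraph [0029] can be made concrete with a small sketch. It assumes 16-bit signed words and expresses a fixed-point position as a number of fractional bits; both choices, and the name `realign`, are assumptions for illustration. The sketch saturates on overflow, which is one common remedy and is not stated in the patent.

```python
def realign(value, from_frac_bits, to_frac_bits, word_bits=16):
    """Shift an integer sample from one fixed-point position to another.

    Returns (result, overflowed). Moving to more fractional bits is a
    left shift and can exceed the word range; moving to fewer is a right
    shift and only loses low-order bits.
    """
    shift = to_frac_bits - from_frac_bits
    shifted = value << shift if shift >= 0 else value >> -shift
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    if shifted > hi:
        return hi, True      # saturate (assumed handling)
    if shifted < lo:
        return lo, True
    return shifted, False
```

For example, a stored sample of 20000 with 8 fractional bits does not fit in 16 bits once realigned to 10 fractional bits, whereas the opposite conversion is safe but discards the two lowest bits.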

[0030] In other words, even when the fixed-point position is switched based on the average speech power R0, sufficient precision could not be obtained in the long-term lag selection unit 103.

[0031] As described above, the average speech power R0 of each subframe is used not only for gain selection but also for switching the fixed-point position. It is determined, for example, as already explained: the average speech power of the first subframe is that of the previous frame, that of the second subframe is the geometric mean of the average speech powers of the previous frame and the current frame, and those of the third and fourth subframes are each that of the current frame. For the second subframe, the geometric mean is used to combine the two average speech powers because the average speech power is a sum of squared sample values, and the geometric mean is closer than the arithmetic mean to the sum of squares of a sample sequence with intermediate values.

[0032] With this interpolation method, the average speech powers of the first, third, and fourth subframes are obtained easily, but the second subframe requires a computation. Moreover, because that computation is a geometric mean, it requires a multiplication and a square-root operation. In general, computing a square root requires either a complicated arithmetic mechanism or many processing steps. This runs counter to the requirement that devices using the VSELP speech coding scheme (speech encoding devices and speech decoding devices) be as small and as low in power consumption as possible.

[0033] The problems have been described above for devices that follow the VSELP speech coding scheme. However, some devices that follow the general CELP speech coding scheme also have an adaptive excitation code vector selection unit (corresponding to the long-term lag selection unit described above) that stores past synthesized excitation signals and outputs a certain kind of excitation code vector, and some also interpolate the average speech power for each subframe. That is, the above problems also arise in devices that follow the general CELP speech coding scheme.

[0034] The present invention has been made in view of the above points, and aims to provide a speech encoding device and a speech decoding device in which switching the fixed-point position improves precision even in the adaptive excitation code vector selection unit, which stores past synthesized excitation signals.

[0035] The present invention also aims to provide a speech encoding device that can obtain the average speech power of each subframe by interpolation from the average speech power of each frame, using a simple configuration or simple processing.

[0036]

[Means for Solving the Problems] To solve these problems, the invention of claim 1 provides the following configuration in a speech encoding device that follows a CELP speech coding scheme, in which a speech signal is parameter-encoded by digital processing with a finite word length, and that has an adaptive excitation code vector selection unit which stores the synthesized excitation signals of several past frames (in this claim, a concept that includes subframes), outputs part of that signal sequence as an adaptive excitation code vector component, and can switch the fixed-point position of its internal variables.

[0037] Namely, the device is provided with a decimal point position buffer unit that stores fixed-point position information for each of the synthesized excitation signals of the several past frames stored in the adaptive excitation code vector selection unit, and a decimal point position selection unit that determines the fixed-point position of the internal variables in the adaptive excitation code vector selection unit from the multiple items of fixed-point position information stored in the decimal point position buffer unit and the average speech power of the current frame.
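The two units named in paragraph [0037] can be pictured with the sketch below. The passage does not state the rule by which the selection unit combines the buffered positions with the position implied by the current average power, so the rule used here (take the smallest number of fractional bits, so that no stored signal has to be shifted left) is purely an assumption for illustration. The names `PointPositionBuffer` and `select_point_position` are hypothetical.

```python
from collections import deque


class PointPositionBuffer:
    """Holds one fixed-point position per stored past (sub)frame."""

    def __init__(self, n_frames):
        # The oldest entry drops out automatically when a new one is pushed.
        self.positions = deque(maxlen=n_frames)

    def push(self, frac_bits):
        self.positions.append(frac_bits)


def select_point_position(buffer_positions, table_frac_bits):
    """Choose a position for the selection unit's internal variables.

    Assumed rule: use the fewest fractional bits among the position
    suggested by the current average power (table_frac_bits) and the
    positions of all stored excitation signals, so every stored signal
    can be aligned with right shifts only.
    """
    return min([table_frac_bits, *buffer_positions])
```

Under this assumed rule, aligning the stored excitation signals never requires a left shift, which avoids the overflow case at the cost of some low-order precision.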

[0038] The invention of claim 2 configures, as follows, a speech encoding device that follows a CELP speech coding scheme, in which a speech signal is parameter-encoded by digital processing with a finite word length, and that has a speech power calculation unit which obtains the average speech power of each frame of the speech signal and a speech power interpolation unit which obtains, from the average speech powers of two consecutive frames, average speech power information for each of the subframes into which one frame is equally divided.

[0039] Namely, the speech power interpolation unit includes at least a quantization means for the average speech power. When obtaining the average speech power information for each subframe, it handles the average speech power of each frame as a quantized index. For a subframe that requires geometric mean information of the average speech powers of two frames, it obtains that information as the arithmetic mean of the indices. It then outputs the index obtained for each subframe, or outputs the average speech power recovered from the obtained index through an inverse quantization means.
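Why an arithmetic mean of indices can stand in for a geometric mean of powers can be seen if the quantizer is uniform on a logarithmic (dB) scale, because averaging logarithms corresponds to taking a geometric mean. The passage above does not specify the quantizer, so the log-uniform quantizer with a 2 dB step used below is an assumption, as are the names `STEP_DB`, `dequantize`, and `interpolate_indices`.

```python
STEP_DB = 2.0  # assumed quantizer step on a dB scale (not from this passage)


def dequantize(index):
    """Map a quantizer index back to a power value (assumed log-uniform)."""
    return 10.0 ** (STEP_DB * index / 10.0)


def interpolate_indices(prev_idx, cur_idx):
    """Per-subframe power indices for one frame.

    The second subframe uses the integer arithmetic mean of the two frame
    indices: one addition and one right shift, with no multiplication or
    square root.
    """
    mid = (prev_idx + cur_idx) >> 1
    return [prev_idx, mid, cur_idx, cur_idx]
```

When the two indices sum to an even number, the dequantized midpoint equals the geometric mean of the two dequantized powers exactly under this assumption; for an odd sum the right shift rounds down by half a quantizer step.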

[0040] Furthermore, the invention of claim 3 configures, as follows, a speech encoding device that follows a CELP speech coding scheme, in which a speech signal is parameter-encoded by digital processing with a finite word length, and that has a speech power calculation unit which obtains the average speech power of each frame of the speech signal, a speech power interpolation unit which obtains, from the average speech powers of two consecutive frames, average speech power information for each of the subframes into which one frame is equally divided, and an adaptive excitation code vector selection unit which stores the synthesized excitation signals of several past subframes, outputs part of that signal sequence as an adaptive excitation code vector component, and can switch the fixed-point position of its internal variables.

[0041] Namely, the speech power interpolation unit includes at least a quantization means for the average speech power. When obtaining the average speech power information for each subframe, it handles the average speech power of each frame as a quantized index; for a subframe that requires geometric mean information of the average speech powers of two frames, it obtains that information as the arithmetic mean of the indices; and it outputs the index obtained for each subframe, or outputs the average speech power recovered from the obtained index through an inverse quantization means. In addition, the device is provided with a decimal point position buffer unit that stores fixed-point position information for each of the synthesized excitation signals of the several past subframes stored in the adaptive excitation code vector selection unit, and a decimal point position selection unit that determines the fixed-point position of the internal variables in the adaptive excitation code vector selection unit from the multiple items of fixed-point position information stored in the decimal point position buffer unit and the average speech power of the current subframe.

【００４２】The invention of claim 4 is characterized in that the CELP speech coding scheme used by the speech encoding apparatus according to any one of claims 1 to 3 is the forward-type VSELP speech coding scheme, which is one kind of CELP scheme.

【００４３】The invention of claim 5 concerns a speech decoding apparatus that conforms to a CELP speech coding scheme and corresponds to the speech encoding apparatus of claim 1. The apparatus has an adaptive excitation code vector selection unit that stores the synthesized excitation signals of the past several frames (in this claim, "frame" includes the notion of a subframe), outputs part of that signal sequence as an adaptive excitation code vector component, and can switch the fixed-point position of its internal variables. The following configuration is added to this apparatus.

【００４４】That is, the apparatus is provided with a decimal-point position buffer unit that stores fixed-point position information for each of the synthesized excitation signals of the past several frames held in the adaptive excitation code vector selection unit, and a decimal-point position selection unit that determines the fixed-point position of the internal variables of the adaptive excitation code vector selection unit from the plural pieces of fixed-point position information stored in the buffer unit and the average speech power of the current frame.

【００４５】The invention of claim 6 is characterized in that the CELP speech coding scheme used by the speech decoding apparatus according to claim 5 is the forward-type VSELP speech coding scheme, which is one kind of CELP scheme.

【００４６】[0046]

【Operation】The inventions of claims 1 and 5 concern a speech encoding apparatus and a speech decoding apparatus conforming to a CELP coding scheme in which a speech signal is parametrically encoded by finite-word-length digital processing and transmitted. Each apparatus has an adaptive excitation code vector selection unit that stores the synthesized excitation signals of the past several frames, outputs part of that signal sequence as an adaptive excitation code vector component, and can switch the fixed-point position of its internal variables.

【００４７】A method has already been proposed that switches, according to the average speech power, the fixed-point position of the internal variables each functional unit uses in its processing. An adaptive excitation code vector selection unit, however, stores and reuses the synthesized excitation signals of the past several frames, and the fixed-point position differs from one stored signal to another. If the fixed-point position is determined solely from the average speech power of the current frame, it may be inappropriate for the synthesized excitation signals of past frames (for example, a position that represents small numbers precisely may overflow on large numbers and give inaccurate results).

【００４８】The inventions of claims 1 and 5 therefore provide a decimal-point position buffer unit and a decimal-point position selection unit. The selection unit determines the fixed-point position of the internal variables of the adaptive excitation code vector selection unit from the plural pieces of fixed-point position information stored in the buffer unit and the average speech power of the current frame. The adaptive excitation code vector selection units of the encoding apparatus and the decoding apparatus must, of course, output the adaptive excitation code vector from the same state, so when the encoding apparatus is provided with a decimal-point position buffer unit and a decimal-point position selection unit, the corresponding decoding apparatus is provided with these functional units as well.

【００４９】The invention of claim 2 presupposes a speech encoding apparatus conforming to a CELP speech coding scheme, which parametrically encodes a speech signal by digital processing with a finite word length. The apparatus has a speech power calculation unit that obtains the average speech power of each frame of the speech signal, and a speech power interpolation unit that obtains, from the average speech powers of two consecutive frames, average speech power information for each subframe, a frame being divided equally into several subframes.

【００５０】For some subframes, information averaging the average speech powers of two consecutive frames is required as the interpolated value. Because the average speech power is obtained as a sum of squared sample values, the geometric mean has conventionally been used for this interpolation, but that makes the interpolation circuitry or the interpolation processing complex.

【００５１】In the invention of claim 2, therefore, the speech power interpolation unit includes at least means for quantizing the average speech power, and handles the average speech power of each frame as a quantized index when obtaining the average speech power information for each subframe. Some subframes call for the geometric mean of the average speech powers of two frames; in that case the arithmetic mean of the indices is used instead. Depending on how the per-subframe average speech power information is to be used, the unit may output the index obtained for each subframe, or it may output the average speech power recovered from that index by inverse quantization means.

【００５２】The speech encoding apparatus of the invention of claim 3 combines the characteristic features of the inventions of claims 1 and 2.

【００５３】The invention of claim 4 limits the CELP speech coding scheme used by the speech encoding apparatus according to any one of claims 1 to 3 to the forward-type VSELP speech coding scheme, which is one kind of CELP scheme. Likewise, the invention of claim 6 limits the CELP speech coding scheme used by the speech decoding apparatus according to claim 5 to the forward-type VSELP speech coding scheme. Forward-type VSELP is used in compressed communication such as mobile communication, where each of the inventions of claims 1 to 3, and also the invention of claim 6, functions effectively.

【００５４】[0054]

【Example】

(A) VSELP Speech Encoding Apparatus
An embodiment of the speech encoding apparatus according to the present invention is described in detail below with reference to the drawings. This embodiment is an example of a VSELP speech encoding apparatus conforming to the VSELP speech coding scheme, and FIG. 1 shows its functional block configuration.

【００５５】In FIG. 1 as well, information on a broken output line drawn from a functional block is information transmitted to the decoding apparatus, and information on a solid output line drawn from a functional block is information processed while determining (searching for) the information to be transmitted.

【００５６】The reference symbols for the various kinds of information in FIG. 1 denote values in their determined state. A value still under search (a candidate) is denoted by appending t to the end of its symbol.

【００５７】As shown in FIG. 1, the VSELP speech encoding apparatus of this embodiment consists of a speech power calculation unit 201, a speech power interpolation unit 202, an LPC analysis unit 203, a target signal generation unit 204, a long-term lag selection unit 205, a decimal-point position buffer unit 206, a decimal-point position selection unit 207, a first code selection unit 208, a second code selection unit 209, a gain selection unit 210, three amplifier circuits 211 to 213, and an adder circuit 214.

【００５８】The speech power calculation unit 201 receives the digital input speech signal V. It calculates the average speech power of V for each frame (for example, 160 samples), for example as the sum of squared sample values, and supplies the per-frame average speech power Rf to the speech power interpolation unit 202 and the LPC analysis unit 203.
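The per-frame power calculation can be sketched as follows. This is a minimal illustration only, assuming the "sum of squared sample values" example from the text and a 160-sample frame; the function name `frame_powers` is hypothetical, and whether the sum is further normalized by the frame length is not stated in the source, so no normalization is applied here.

```python
FRAME_LEN = 160  # samples per frame, the example value given in the text


def frame_powers(samples, frame_len=FRAME_LEN):
    """Return one power value Rf per complete frame (sum of squared samples)."""
    powers = []
    # Step through the signal one whole frame at a time; a trailing
    # partial frame is ignored in this sketch.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        powers.append(sum(s * s for s in frame))
    return powers
```

For instance, a signal of 160 ones followed by 160 twos yields two frame powers, one per frame.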

【００５９】The speech power interpolation unit 202 interpolates the average speech power R0 of each subframe (for example, 40 samples) from the average speech powers Rf of the current frame and the previous frame, and outputs it to various functional units as described later.

【００６０】Here, the speech power interpolation unit 202 has a quantization table 202a for average speech power expressed on a logarithmic scale, from which it can obtain an index indicating the quantization level to which a given average speech power belongs. The speech power interpolation unit 202 also has an inverse quantization table 202b for obtaining the average speech power from such an index.

【００６１】In this embodiment, the speech power interpolation unit 202 converts the average speech power Rf of each frame to an index using the quantization table 202a, performs the interpolation for the subframes at the index level, and converts the index of each subframe to the average speech power R0 using the inverse quantization table 202b.

【００６２】The index for each subframe is interpolated, for example, as follows.
(1) For the first subframe, the index of the average speech power of the previous frame is used.
(2) For the second subframe, the arithmetic mean of the index of the average speech power of the previous frame and that of the current frame is used.
(3) For the third subframe, the index of the average speech power of the current frame is used.
(4) For the fourth subframe, the index of the average speech power of the current frame is used.
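The four rules above can be sketched in a few lines. The function name `subframe_indices` is hypothetical, and representing the fractional index as a Python float is an assumption made for readability; the text itself only says the mean may have a fractional part.

```python
def subframe_indices(prev_idx, cur_idx):
    """Return the power indices for the four subframes of the current frame."""
    # Rule (2): arithmetic mean of the two frame indices; it may end in .5.
    mid = (prev_idx + cur_idx) / 2
    # Rules (1), (3) and (4): reuse the previous or current frame index as is.
    return [prev_idx, mid, cur_idx, cur_idx]
```

With a previous-frame index of 4 and a current-frame index of 7, the second subframe receives the fractional index 5.5, which is why the inverse quantization side needs finer resolution than the quantization side.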

【００６３】The inverse quantization table 202b has a finer resolution than the quantization table 202a so that an index with a fractional part, as produced by the arithmetic mean, can be converted to an average speech power.
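A minimal sketch of the two tables, under stated assumptions: the table contents below are invented for illustration (the source gives no values), the quantization levels are spaced by a factor of 4 to mimic a logarithmic scale, and the inverse table simply holds one entry per half index. The names `QUANT_TABLE`, `INVERSE_TABLE`, `quantize` and `dequantize` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical stand-in for table 202a: lower bounds of log-spaced levels.
QUANT_TABLE = [1, 4, 16, 64, 256]
# Hypothetical stand-in for table 202b: entry k is the power for index k / 2,
# so it has twice the resolution of QUANT_TABLE.
INVERSE_TABLE = [2 ** k for k in range(9)]


def quantize(power):
    """Index of the highest level whose lower bound does not exceed power."""
    return max(bisect_right(QUANT_TABLE, power) - 1, 0)


def dequantize(index):
    """Look up the power for an integer or half-integer index."""
    return INVERSE_TABLE[int(round(index * 2))]
```

With these illustrative tables, powers 4 and 16 quantize to indices 1 and 2, their arithmetic mean is 1.5, and the finer inverse table maps 1.5 to 8, which equals the geometric mean of 4 and 16.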

【００６４】In the interpolation method of this embodiment, converting an average speech power to an index and an index back to an average speech power is simple because both conversions use the tables 202a and 202b. Obtaining the averaged information for two average speech powers is also simple because it uses the arithmetic mean rather than the geometric mean. A geometric mean requires a multiplication and a square-root operation, whereas an arithmetic mean requires only an addition and a halving. The addition is a simple operation, and halving multi-bit data is just a 1-bit shift.
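The reason an arithmetic mean of indices can stand in for a geometric mean of powers can be written out briefly. The derivation below assumes an idealized logarithmic quantizer $i(R) = a \log R + b$ and ignores rounding to integer levels; the constants $a$ and $b$ are illustrative and do not appear in the source.

```latex
\begin{aligned}
i(R) &= a \log R + b, \\
\frac{i(R_1) + i(R_2)}{2}
  &= a \cdot \frac{\log R_1 + \log R_2}{2} + b
   = a \log \sqrt{R_1 R_2} + b
   = i\!\left(\sqrt{R_1 R_2}\right).
\end{aligned}
```

Under that assumption, averaging the two indices and then inverse-quantizing gives the same result as taking the geometric mean of the two powers directly, up to quantization error.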

【００６５】The description above has the index inverse-quantized back to an average speech power before it is supplied to the functional units, but the index may instead be supplied to the functional units as is, or transmitted to the decoding apparatus as is. As described later, the per-subframe average speech power R0 is used to set the fixed-point position of internal variables in the functional units. Its absolute value does not matter for that purpose; it is enough to know the level, so outputting the index suffices, and supplying the index can in fact simplify the functional units. The gain selection unit 210, however, needs the absolute value of the per-subframe average speech power for its gain selection, so if the index is supplied, the gain selection unit 210 must contain an inverse quantization table. The following description nevertheless assumes that the per-subframe average speech power R0 itself is supplied to the functional units.

【００６６】The LPC analysis unit 203 is provided as vocal tract analysis means. Based on the supplied average speech power Rf of the current frame, it switches the fixed-point position of the variables used in its internal LPC analysis computation (see FIG. 3). It then performs LPC analysis on one frame of the input speech signal V to obtain and output the LPC coefficients kj.

【００６７】The conventional apparatus described earlier also computes and outputs interpolated LPC coefficients for each subframe, whereas this embodiment uses the per-frame LPC coefficients kj for all subframes. This difference is unrelated to the features of the present invention, and the LPC coefficients may be obtained and output for each subframe as in the conventional apparatus.

【００６８】The target signal generation unit 204 performs substantially the same processing as the conventional target signal generation unit 102 shown in FIG. 2, and forms and outputs the target signal p for each subframe. It differs from the conventional unit 102 in that, before starting the computation that generates the target signal p, it sets the fixed-point position of the variables used in that computation according to the average speech power R0 of the input subframe.

【００６９】The long-term lag selection unit 205 performs substantially the same processing as the conventional long-term lag selection unit 103 shown in FIG. 2, and obtains and outputs the optimum long-term excitation code vector c0 for each subframe. The optimum long-term lag L is, of course, determined in the process.

【００７０】The long-term lag selection unit 205 of this embodiment differs from the conventional unit, however, in that it can switch the fixed-point position of the internal variables used to determine the optimum long-term excitation code vector c0 and the optimum long-term lag L. Its method of switching the fixed-point position also differs from the method described in Reference B, which relies on the average speech power R0 alone.

【００７１】To switch the fixed-point position, a decimal-point position buffer unit 206 and a decimal-point position selection unit 207 are provided in association with the long-term lag selection unit 205 of this embodiment.

【００７２】FIG. 4 shows the contents stored in the long-term filter state buffer unit inside the long-term lag selection unit 205 and in the decimal-point position buffer unit 206.

【００７３】The long-term lag selection unit 205 receives the final synthesized excitation signal ex of each subframe. The final synthesized excitation signals ex(−1), ex(−2), ex(−3), and ex(−4) of the several immediately preceding subframes (four in the example of FIG. 4) are stored in the long-term filter state buffer unit as shown in FIG. 4(a). When a synthesized excitation signal ex is stored, information on its fixed-point position fp is supplied to the decimal-point position buffer unit 206, and the fixed-point positions fp(−1), fp(−2), fp(−3), and fp(−4) are stored there, each identified with its subframe, as shown in FIG. 4(b).
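The paired buffers of FIG. 4 can be sketched as two fixed-length queues that are updated together. This is an illustrative data structure only: the class name `ExcitationHistory`, the use of `collections.deque`, and the convention that position 0 holds the most recent subframe are all assumptions, not details from the source.

```python
from collections import deque

HISTORY = 4  # number of past subframes kept, as in the FIG. 4 example


class ExcitationHistory:
    """Past excitation subframes and their fixed-point positions, kept in step."""

    def __init__(self, history=HISTORY):
        # Position 0 is ex(-1) / fp(-1); the last position is the oldest entry.
        self.ex = deque(maxlen=history)
        self.fp = deque(maxlen=history)

    def push(self, ex_subframe, fp):
        """Store a finished subframe; the oldest entry drops out when full."""
        self.ex.appendleft(list(ex_subframe))
        self.fp.appendleft(fp)
```

Pushing the excitation and its fixed-point position in the same call keeps the two buffers aligned, so each stored fp value always identifies the subframe it belongs to.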

【００７４】When searching for the optimum long-term excitation code vector c0 for the current subframe, one subframe period of data must be extracted from the contents of the long-term filter state buffer unit while varying the long-term lag Lt. In most cases the extracted subframe period straddles the synthesized excitation signals of two past subframes. The fixed-point positions fp(−1), fp(−2), fp(−3), and fp(−4) of the final synthesized excitation signals ex(−1), ex(−2), ex(−3), and ex(−4) stored in the long-term filter state buffer unit differ from one another more often than not, so it is not appropriate to determine the fixed-point position of the internal variables of the long-term lag selection unit 205 from the average speech power R0 alone.
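The observation that a candidate segment usually straddles two stored subframes can be checked with a small index calculation. The sketch below assumes 40-sample subframes and handles only lags of at least one subframe length; how shorter lags are treated is not described in this passage, so they are rejected. The function name `subframes_touched` is hypothetical.

```python
SUBFRAME_LEN = 40  # samples per subframe, the example value given in the text


def subframes_touched(lag, subframe_len=SUBFRAME_LEN):
    """Return the past subframes (-1 = most recent) a lag candidate reads from."""
    if lag < subframe_len:
        raise ValueError("this sketch only handles lag >= subframe_len")
    # The segment starts lag samples back and is one subframe long.
    oldest = -((lag + subframe_len - 1) // subframe_len)  # holds the first sample
    newest = -(lag // subframe_len)                       # holds the last sample
    return list(range(oldest, newest + 1))
```

Only lags that are exact multiples of the subframe length fall inside a single stored subframe; every other lag reads from two adjacent subframes, each of which may have been stored with a different fixed-point position.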

【００７５】The decimal-point position selection unit 207 takes into account the fixed-point positions fp(−1), fp(−2), fp(−3), and fp(−4) of the final synthesized excitation signals ex(−1), ex(−2), ex(−3), and ex(−4) of the several immediately preceding subframes, and determines the fixed-point position of the internal variables of the long-term lag selection unit 205 as follows.

【００７６】FIG. 5 shows the tables built into the decimal-point position selection unit 207. The selection unit 207 first accesses the table shown in FIG. 5(a) with the average speech power R0 of the current subframe to obtain an estimate of the fixed-point position fp(0) required for the synthesized excitation signal of the current subframe. Next, from among the fixed-point positions fp(−1), fp(−2), fp(−3), and fp(−4) stored in the decimal-point position buffer unit 206 and the position fp(0) obtained for the current subframe, it finds the fixed-point position FP that can represent the largest number. It then accesses the table shown in FIG. 5(b) with this fixed-point position FP to determine the fixed-point position of each internal variable of the long-term lag selection unit 205.
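The three-step selection just described can be sketched as below. Everything concrete here is an assumption made for illustration: the contents of the FIG. 5(a) and FIG. 5(b) tables are not given in this passage, so the thresholds, bit counts, and internal variable names are invented, and a fixed-point position is represented as a count of integer bits so that "represents the largest number" becomes a simple maximum.

```python
from bisect import bisect_right

# Hypothetical stand-in for the FIG. 5(a) table: R0 range -> integer bits.
POWER_THRESHOLDS = [16, 256, 4096]
INT_BITS_FOR_POWER = [2, 4, 6, 8]

# Hypothetical stand-in for the FIG. 5(b) table: extra integer bits that each
# internal variable needs on top of FP. The variable names are invented.
VARIABLE_OFFSETS = {"correlation": 4, "energy": 6}


def estimate_fp0(r0):
    """Step 1: estimate fp(0) for the current subframe from its power R0."""
    return INT_BITS_FOR_POWER[bisect_right(POWER_THRESHOLDS, r0)]


def select_fp(r0, past_fps):
    """Step 2: choose the position that can represent the largest number."""
    return max([estimate_fp0(r0), *past_fps])


def variable_positions(fp):
    """Step 3: derive a fixed-point position for each internal variable."""
    return {name: fp + offset for name, offset in VARIABLE_OFFSETS.items()}
```

A quiet current subframe that follows a loud one therefore still gets a position wide enough for the loud stored excitation, which is the overflow case the text sets out to avoid.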

【００７７】Determining the fixed-point positions in this way prevents a loss of precision when searching for the long-term excitation code vector c0t (long-term lag Lt).

【００７８】As in the target signal generation unit 204, the first code selection unit 208, the second code selection unit 209, and the gain selection unit 210 also determine the fixed-point position of their internal variables from the average speech power R0 (see FIG. 3). The processing these units perform after determining the fixed-point positions is substantially the same as that of the conventional first code selection unit 104, second code selection unit 105, and gain selection unit 106 shown in FIG. 2, so its description is omitted.

【００７９】In the VSELP speech encoding apparatus of the above embodiment, the fixed-point position of the internal variables used in the computations of the long-term lag selection unit 205 is determined from the fixed-point positions of the stored synthesized excitation signals of the several immediately preceding subframes and from the average speech power of the current subframe. Precision is therefore no longer affected by differences among the fixed-point positions of the past synthesized excitation signals, the encoding accuracy of the speech signal improves, and the quality of the decoded speech signal can be made higher than before.

【００８０】It is also conceivable to determine the fixed-point position of the internal variables separately for each candidate long-term excitation code vector c0 (long-term lag L), from the fixed-point positions of the synthesized excitation signals of the two past subframes involved in that candidate and the average speech power R0 of the current subframe. This, however, would require the internal variables to be re-evaluated for every candidate and is not practical. It is preferable to determine the fixed-point position of the internal variables uniformly for all candidates, as in the above embodiment.

【００８１】In addition, the VSELP speech encoding apparatus of the above embodiment obtains the average speech power of a subframe as an averaged value of the average speech powers of two frames by quantizing the average speech powers and taking the arithmetic mean of the resulting indices. The configuration that obtains the subframe average speech power can therefore be simplified, or the number of processing steps reduced (and power consumption lowered accordingly).

【００８２】(B) VSELP Speech Decoding Apparatus
Next, a VSELP speech decoding apparatus of an embodiment corresponding to the VSELP speech encoding apparatus of the above embodiment is described in detail with reference to the drawings.

【００８３】FIG. 6 shows the functional block configuration of the VSELP speech decoding apparatus of this embodiment. In FIG. 6, information on a broken input line to a functional block is information transmitted to the decoding apparatus, and information on a solid input line is information formed within the decoding apparatus.

【００８４】As shown in FIG. 6, the VSELP speech decoding apparatus of this embodiment consists of a long-term lag selection unit 301, a decimal-point position buffer unit 302, a decimal-point position selection unit 303, a first code selection unit 304, a second code selection unit 305, a gain selection unit 306, three amplifier circuits 307 to 309, an adder circuit 310, a synthesis filter unit 311, and a post-filter unit 312.

【００８５】In FIG. 6, the information transmitted from the VSELP speech encoding apparatus is distributed as follows. The long-term lag L is supplied to the long-term lag selection unit 301, the index I of the first excitation code vector to the first code selection unit 304, the index J of the second excitation code vector to the second code selection unit 305, and the gain index G to the gain selection unit 306. The average speech power R0 is supplied to the decimal-point position selection unit 303, the first code selection unit 304, the second code selection unit 305, the gain selection unit 306, the synthesis filter unit 311, and the post-filter unit 312. The LPC coefficients kj are supplied to the gain selection unit 306, the synthesis filter unit 311, and the post-filter unit 312.

【００８６】The long-term lag selection unit 301 receives the finalized synthesized excitation signal ex of each subframe and stores it in its internal long-term filter state buffer unit. It also supplies information on the fixed-point position fp of that synthesized excitation signal ex to the decimal-point position buffer unit 302 for storage. Thus, before processing of the current subframe starts, the contents of the long-term filter state buffer unit and of the decimal-point position buffer unit 302 are identical to those of the corresponding buffer units in the encoding apparatus.

【００８７】The decimal point position selection unit 303 first obtains, from the speech average power R0 of the current subframe, an estimate of the fixed-point position fp(0) required for the synthesized excitation signal of the current subframe. It then takes the fixed-point positions fp(-1), fp(-2), fp(-3), and fp(-4) of the synthesized excitation signals ex(-1), ex(-2), ex(-3), and ex(-4) of the immediately preceding subframes, which are stored in the decimal point position buffer unit 302, together with the estimate fp(0) for the current subframe, and finds among them the fixed-point position FP that can represent the largest number. This fixed-point position FP determines the fixed-point position of the internal variables of the long-term lag selection unit 301.
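
The selection rule described above can be sketched as follows. This is only an illustration under stated assumptions: a fixed-point position is modeled as a number of fractional bits (so fewer fractional bits means a larger representable range), and the names `select_common_fixed_point` and `rescale` are hypothetical and do not appear in the specification.

```python
def select_common_fixed_point(fp_estimate_current, fp_history):
    """Return the position that can represent the largest number.

    Assumption: a position is a count of fractional bits, so the
    smallest count gives the widest range.
    """
    return min([fp_estimate_current, *fp_history])


def rescale(samples, fp_from, fp_to):
    """Re-express integer samples stored at fp_from fractional bits at fp_to."""
    shift = fp_from - fp_to
    if shift >= 0:
        return [s >> shift for s in samples]  # fewer fractional bits
    return [s << -shift for s in samples]     # more fractional bits


# fp(0) estimated from R0, fp(-1)..fp(-4) read from the position buffer
FP = select_common_fixed_point(10, [12, 11, 9, 12])
```

With these illustrative values, every stored excitation segment would be rescaled to `FP` before being used, so that all samples share one representation.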

【００８８】Because the long-term lag selection unit 301 of the decoding apparatus does not need to search for the optimum long-term excitation code vector c0, the only variable whose fixed-point position is switched is the output long-term excitation code vector c0 itself.

【００８９】Once the fixed-point position for the current subframe has been determined, the long-term lag selection unit 301 extracts the long-term excitation code vector c0 from the long-term filter state buffer unit according to the supplied long-term lag L, and outputs it to the gain selection unit 306 and the amplifier circuit 307.
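
A minimal sketch of this extraction step is given below. It assumes that the state buffer is a list of past excitation samples with the newest sample last, and that a lag shorter than the subframe is handled by repeating the segment periodically, which is common in CELP-type coders but is not spelled out in this paragraph. The name `long_term_vector` is hypothetical.

```python
def long_term_vector(state, lag, subframe_len):
    """Build c0 from the past excitation that lies `lag` samples back.

    Assumption: if lag < subframe_len, the segment repeats periodically.
    """
    if not 0 < lag <= len(state):
        raise ValueError("lag must lie within the stored history")
    segment = state[-lag:]
    return [segment[n % lag] for n in range(subframe_len)]
```

For example, with ten stored samples `0..9`, a lag of 4, and a subframe of six samples, the sketch returns the last four samples followed by the first two of them again.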

【００９０】Before carrying out the processing assigned to them, the first code selection unit 304, the second code selection unit 305, the gain selection unit 306, the synthesis filter unit 311, and the post-filter unit 312 each determine the fixed-point position of the internal variables used in that processing on the basis of the speech average power R0 supplied from the coding apparatus (see FIG. 3). Because the first code selection unit 304, the second code selection unit 305, and the gain selection unit 306 of the speech decoding apparatus need no search for optimum values, they have fewer kinds of internal variables whose fixed-point position must be determined than the corresponding functional blocks of the coding apparatus.

【００９１】The first code selection unit 304 stores the same basis vectors v1 as the first code selection unit 208 of the coding apparatus. Once the fixed-point position has been determined, it decodes the optimum first excitation code vector c1 from these basis vectors and the received index I of the first excitation code vector, and outputs it to the amplifier circuit 308 and the gain selection unit 306.
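
In a vector-sum codebook, a code vector is a signed sum of the stored basis vectors, with the signs taken from the bits of the index. The sketch below illustrates that idea only; the bit-to-sign convention (bit set gives +1, bit clear gives -1) and the name `decode_code_vector` are assumptions for illustration, not details taken from this specification.

```python
def decode_code_vector(index, basis_vectors):
    """Sum the basis vectors with signs given by the bits of `index`.

    Assumption: bit m set -> +1, bit m clear -> -1.
    """
    length = len(basis_vectors[0])
    out = [0] * length
    for m, basis in enumerate(basis_vectors):
        sign = 1 if (index >> m) & 1 else -1
        for n in range(length):
            out[n] += sign * basis[n]
    return out
```

Because the decoder receives the index directly, it needs only this reconstruction and none of the search performed on the coding side.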

【００９２】The second code selection unit 305 stores the same basis vectors v2 as the second code selection unit 209 of the coding apparatus. Once the fixed-point position has been determined, it decodes the optimum second excitation code vector c2 from these basis vectors and the received index J of the second excitation code vector, and outputs it to the amplifier circuit 309 and the gain selection unit 306.

【００９３】The gain selection unit 306 stores the same sets of gain parameters GS, P0, and P1 as the gain selection unit 210 of the coding apparatus. Once the fixed-point position has been determined, it retrieves the optimum set of gain parameters GS, P0, and P1 according to the received gain index G, decodes the optimum gains β, γ1, and γ2 from the obtained long-term excitation code vector c0, first excitation code vector c1, and second excitation code vector c2 together with the received LPC coefficients kj and speech average power R0, and outputs them to the corresponding amplifier circuits 307, 308, and 309.

【００９４】The long-term excitation code vector c0, the first excitation code vector c1, and the second excitation code vector c2 obtained in this way are gain-controlled by the amplifier circuits 307, 308, and 309 according to the corresponding gains β, γ1, and γ2, and the results are summed by the adder circuit 310 to yield the synthesized excitation signal ex for the current subframe. This synthesized excitation signal ex is supplied to the synthesis filter unit 311 and the long-term lag selection unit 301.
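
The amplify-and-add step, followed by the feedback of ex into the long-term filter state, can be sketched as below. Floating-point gains are used for readability, whereas the apparatus itself works in fixed point; the function names are hypothetical.

```python
def synthesize_excitation(c0, c1, c2, beta, gamma1, gamma2):
    """ex[n] = beta*c0[n] + gamma1*c1[n] + gamma2*c2[n]."""
    return [beta * a + gamma1 * b + gamma2 * c for a, b, c in zip(c0, c1, c2)]


def update_state(state, ex):
    """Append ex to the past excitation and keep the buffer length unchanged."""
    return (state + ex)[-len(state):]
```

Keeping the state update identical on both sides is what lets the decoder's buffer match the coder's buffer at the start of each subframe.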

【００９５】The synthesis filter unit 311 sets the received LPC coefficients kj, expressed at the determined fixed-point position, to define its filter characteristic, and reproduces a speech signal by filtering the input synthesized excitation signal ex with that characteristic. The reproduced speech signal obtained in this way may contain a large noise component.
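
For orientation, a generic all-pole synthesis filter is sketched below. It assumes direct-form predictor coefficients `a` and floating-point arithmetic; how the received coefficients kj map onto such coefficients, and the fixed-point handling, are outside this sketch. The name `synthesis_filter` is hypothetical.

```python
def synthesis_filter(ex, a, history=None):
    """All-pole filter: s[n] = ex[n] + sum_j a[j-1] * s[n-j].

    `a` holds hypothetical direct-form coefficients. `history` holds at
    least len(a) past outputs, newest last (zeros if omitted).
    """
    order = len(a)
    past = list(history) if history is not None else [0.0] * order
    out = []
    for x in ex:
        s = x + sum(a[j] * past[-1 - j] for j in range(order))
        out.append(s)
        past.append(s)
    return out
```

Feeding a unit impulse through a first-order filter with coefficient 0.5 gives the decaying sequence 1, 0.5, 0.25, which is the expected impulse response.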

【００９６】The post-filter unit 312 likewise sets the received LPC coefficients kj, expressed at the determined fixed-point position, to define its filter characteristic, filters the speech signal reproduced by the synthesis filter unit 311 with that characteristic to suppress the noise component, and outputs the resulting final reproduced speech signal V.

【００９７】According to the VSELP speech decoding apparatus of the above embodiment, the fixed-point position of the internal variables used in the computation of the long-term lag selection unit 301 is determined from the stored fixed-point positions of the several immediately preceding synthesized excitation signals and from the speech average power of the current subframe. Precision is therefore no longer affected by differences among the fixed-point positions of past synthesized excitation signals, the coding accuracy of the speech signal improves, and the quality of the decoded speech signal can be raised above that of conventional apparatus. Of course, this effect is realized only when the decoding apparatus uses the same method of determining the fixed-point position as the coding apparatus.

【００９８】(C) Other Embodiments
Other embodiments have already been touched on in the description above. In addition to those, the following can be mentioned.

【００９９】Broadly speaking, the above embodiment has two features. The first concerns the method of interpolating the speech average power: the arithmetic mean of the quantization indices is used to obtain averaged information from two speech average powers. The second concerns the fixed-point representation used when the synthesized excitation signals of past subframes serve as an excitation code vector component of the current subframe: the fixed-point positions of the past synthesized excitation signals are used to determine the fixed-point position of the variables concerned. An apparatus may, however, be configured with only one of these features.

【０１００】The above embodiment applies the present invention to an apparatus conforming to the forward-type VSELP speech coding scheme, but the invention may also be applied to an apparatus conforming to a backward-type VSELP speech coding scheme or to a general CELP speech coding scheme.

【０１０１】For example, some apparatus conforming to a general CELP speech coding scheme also have an adaptive excitation code vector selection unit (corresponding to the long-term lag selection unit of the embodiment) that stores and reuses past synthesized excitation signals, and the second feature can be applied to such apparatus. In this case the apparatus need not divide a frame into subframes. Because the VSELP speech coding scheme is one kind of CELP speech coding scheme, the claims refer to the component that stores and reuses past synthesized excitation signals as the "adaptive excitation code vector selection unit".

【０１０２】Likewise, the first feature can be applied to any apparatus that needs to obtain averaged information from two speech average powers, even if it does not use that averaged information for switching the fixed-point representation.

【０１０３】[0103]

[Effects of the Invention] As described above, the speech coding apparatus and the speech decoding apparatus of the present invention are provided with a decimal point position buffer unit that stores fixed-point position information for each of the synthesized excitation signals of the several past frames held in the adaptive excitation code vector selection unit, and with a decimal point position selection unit that determines the fixed-point position of the internal variables of the adaptive excitation code vector selection unit from the plural items of fixed-point position information stored in the decimal point position buffer unit and the speech average power of the current frame. Consequently, the accuracy gain from switching the fixed-point position can be realized even in the adaptive excitation code vector selection unit, which stores past synthesized excitation signals.

【０１０４】According to the other speech coding apparatus of the present invention, the speech power interpolation unit includes at least means for quantizing the speech average power. When obtaining speech average power information for each subframe, it handles the speech average power of each frame as a quantized index, obtains the geometric-mean information of the speech average powers of two frames as the arithmetic mean of their index numbers, and outputs either the index obtained for each subframe or the speech average power recovered from that index through inverse quantization means. Speech average power information for each subframe can thus be obtained with a simple configuration or simple processing.
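
The reason an arithmetic mean of indices yields a geometric mean of powers is that the quantizer is uniform on a logarithmic scale. The sketch below illustrates this with a hypothetical quantizer of 2 dB per index step; the step size, the plain midpoint, and all function names are assumptions for illustration and are not taken from the quantization table 202a.

```python
import math

STEP_DB = 2.0  # hypothetical quantizer step, in dB per index


def quantize(power):
    """Map a power to the nearest index on a uniform dB scale."""
    return round(10 * math.log10(power) / STEP_DB)


def dequantize(index):
    """Inverse quantization: recover the power represented by an index."""
    return 10 ** (index * STEP_DB / 10)


def interpolate_index(idx_prev, idx_cur):
    """Arithmetic mean of two indices (integer midpoint)."""
    return (idx_prev + idx_cur) // 2


mid = interpolate_index(quantize(100.0), quantize(10000.0))
geometric_mean = math.sqrt(100.0 * 10000.0)
```

Here `dequantize(mid)` agrees with `geometric_mean`, and no logarithm or square root of the powers themselves has to be computed at interpolation time.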

[Brief description of the drawings]

FIG. 1 is a functional block diagram of the VSELP speech coding apparatus of the embodiment.

FIG. 2 is a functional block diagram of a conventional VSELP speech coding apparatus.

FIG. 3 is an explanatory diagram of a general method of switching the fixed-point representation.

FIG. 4 is an explanatory diagram (part 1) of the method of switching the fixed-point representation for the long-term lag selection unit of the embodiment.

FIG. 5 is an explanatory diagram (part 2) of the method of switching the fixed-point representation for the long-term lag selection unit of the embodiment.

FIG. 6 is a functional block diagram of the VSELP speech decoding apparatus of the embodiment.

[Explanation of symbols]

201: speech power calculation unit; 202: speech power interpolation unit; 202a: speech average power quantization table; 203: LPC analysis unit; 204: target signal generation unit; 205, 301: long-term lag selection unit; 206, 302: decimal point position buffer unit; 207, 303: decimal point position selection unit; 208, 304: first code selection unit; 209, 305: second code selection unit; 210, 306: gain selection unit; 211 to 213, 307 to 309: amplifier circuits; 214, 310: adder circuit; 214, 311: synthesis filter unit; 312: post-filter unit.

Continuation of the front page
(72) Inventor: Katsutoshi Ito, c/o Oki Electric Industry Co., Ltd., 1-7-12 Toranomon, Minato-ku, Tokyo
(56) References: JP H04-190399 A; JP H05-165498 A; JP S58-63998 A; JP S63-261925 A; JP H04-107600 A; JP H04-270399 A; JP H05-6199 A
(58) Fields searched (Int. Cl.⁷, DB name): G10L 19/00 - 19/14; H03M 7/30; H04B 14/04

Claims

(57) [Claims]

1. A speech coding apparatus conforming to a CELP speech coding scheme that parameter-codes a speech signal by digital processing with a finite word length, the apparatus having an adaptive excitation code vector selection unit that stores the synthesized excitation signals of several past frames, outputs a part of that signal sequence as an adaptive excitation code vector component, and is capable of switching the fixed-point position of its internal variables, characterized by comprising: a decimal point position buffer unit that stores fixed-point position information for each of the synthesized excitation signals of the several past frames stored in said adaptive excitation code vector selection unit; and a decimal point position selection unit that determines the fixed-point position of the internal variables of said adaptive excitation code vector selection unit from the plural items of fixed-point position information stored in the decimal point position buffer unit and the speech average power of the current frame.

2. A speech coding apparatus conforming to a CELP speech coding scheme that parameter-codes a speech signal by digital processing with a finite word length, the apparatus having a speech power calculation unit that calculates the speech average power of each frame of the speech signal, and a speech power interpolation unit that obtains, from the speech average powers of two successive frames, speech average power information for each of the subframes into which one frame is equally divided, characterized in that said speech power interpolation unit includes at least means for quantizing the speech average power and, when obtaining the speech average power information for each subframe, handles the speech average power of each frame as a quantized index, obtains the geometric-mean information of the speech average powers of two frames by taking the arithmetic mean of the indices, and outputs either the index obtained for each subframe or the speech average power recovered from the obtained index through inverse quantization means.

3. A speech coding apparatus conforming to a CELP speech coding scheme that parameter-codes a speech signal by digital processing with a finite word length, the apparatus having a speech power calculation unit that calculates the speech average power of each frame of the speech signal, a speech power interpolation unit that obtains, from the speech average powers of two successive frames, speech average power information for each of the subframes into which one frame is equally divided, and an adaptive excitation code vector selection unit that stores the synthesized excitation signals of several past subframes and outputs a part of that signal sequence as an adaptive excitation code vector component, characterized in that: said speech power interpolation unit includes at least means for quantizing the speech average power and, when obtaining the speech average power information for each subframe, handles the speech average power of each frame as a quantized index, obtains the geometric-mean information of the speech average powers of two frames by taking the arithmetic mean of the indices, and outputs either the index obtained for each subframe or the speech average power recovered from the obtained index through inverse quantization means; and the apparatus further comprises a decimal point position buffer unit that stores fixed-point position information for each of the synthesized excitation signals of the several past subframes stored in said adaptive excitation code vector selection unit, and a decimal point position selection unit that determines the fixed-point position of the internal variables of said adaptive excitation code vector selection unit from the plural items of fixed-point position information stored in the decimal point position buffer unit and the speech average power of the current subframe.

4. The speech coding apparatus according to claim 1, wherein said CELP speech coding scheme is the forward-type VSELP speech coding scheme, which is one kind of CELP speech coding scheme.

5. A speech decoding apparatus conforming to a CELP speech coding scheme and corresponding to the speech coding apparatus according to claim 1, the decoding apparatus having an adaptive excitation code vector selection unit that stores the synthesized excitation signals of several past frames, outputs a part of that signal sequence as an adaptive excitation code vector component, and is capable of switching the fixed-point position of its internal variables, characterized by comprising: a decimal point position buffer unit that stores fixed-point position information for each of the synthesized excitation signals of the several past frames stored in said adaptive excitation code vector selection unit; and a decimal point position selection unit that determines the fixed-point position of the internal variables of said adaptive excitation code vector selection unit from the plural items of fixed-point position information stored in the decimal point position buffer unit and the speech average power of the current frame.

6. The speech decoding apparatus according to claim 5, wherein said CELP speech coding scheme is the forward-type VSELP speech coding scheme, which is one kind of CELP speech coding scheme.