JPH07134600A

JPH07134600A - Device for encoding voice and device for decoding voice

Info

Publication number: JPH07134600A
Application number: JP5280890A
Authority: JP
Inventors: Yoshihiro Ariyama; 義博有山; Hiroshi Katsuragawa; 浩桂川; Hiromi Aoyanagi; 弘美青柳; Katsutoshi Ito; 克俊伊東
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1993-11-10
Filing date: 1993-11-10
Publication date: 1995-05-23
Anticipated expiration: 2017-03-04
Also published as: JP3262652B2

Abstract

PURPOSE: To improve precision by switching the fixed-point position even in the adaptive excitation code vector selection part, and to obtain voice mean power information for every subframe by interpolation with a simple constitution or simple processing.

CONSTITUTION: A point position selection part 207 decides the fixed-point position of the internal variables of the adaptive excitation code vector selection part 205 from two inputs: the fixed-point position information, stored in a point position buffer part 206, for the synthesis excitation signals of the past subframes held in the adaptive excitation code vector selection part 205, and the voice mean power of the present subframe. Further, a voice power interpolation part 202 is provided with at least a quantization means 202a for the voice mean power. When the voice mean power information of every subframe is obtained, the voice mean power of every frame is handled as a quantized index, and for the subframe that requires the geometric mean of the voice mean powers of two frames, the arithmetic mean of the indices is taken instead.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech encoding device and a speech decoding device, and is suitable for application to, for example, devices conforming to the VSELP (Vector Sum Excited Linear Prediction) speech coding scheme.

[0002]

[Prior Art] The VSELP speech coding scheme is a type of code-excited linear prediction (CELP) speech coding scheme, which parametrically encodes a speech signal into vocal tract information and vocal cord (excitation source) information. VSELP differs from other CELP schemes chiefly in how the excitation source parameters are encoded. VSELP has been standardized, for example, by the Telecommunications Industry Association (TIA) committee in the United States as a speech coding scheme for digital cellular systems.

[0003] This VSELP speech coding scheme is described in, for example, Document A and Document B.

[0004] Document A: Ira A. Gerson and Mark A. Jasiuk, "Vector Sum Excited Linear Prediction (VSELP) Speech Coding at 8 kbps," Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. 461-464, April 1990.

Document B: Japanese Patent Application Laid-Open No. 4-190399.

FIG. 2 shows the functional configuration of a speech encoding device conforming to the conventional VSELP speech coding scheme, based mainly on the description in Document A.

[0005] In FIG. 2, the broken output lines drawn from each functional block represent information that is transmitted to the speech decoding device, and the solid output lines represent information that is processed while determining (searching for) the information to be transmitted.

[0006] This VSELP speech encoding device comprises an input speech analysis unit 101 (which can be further broken down into a speech power calculation unit, a speech power interpolation unit, and a vocal tract analysis unit), a target signal generation unit 102, a long-term lag selection unit 103, a first code selection unit 104, a second code selection unit 105, a gain selection unit 106, three amplifier circuits 107 to 109, and an adder circuit 110. The device receives a digitized speech signal V sampled at a predetermined frequency and encodes it.

[0007] The input speech analysis unit 101 performs LPC analysis (one method of vocal tract analysis) on the input speech signal V, for example every 160 samples (one frame), to obtain the LPC coefficients k as vocal tract information and the speech average power R. It then quantizes the LPC coefficients k and the speech average power R, interpolates them over subframes of 40 samples each, and outputs the LPC coefficients kj and the speech average power R0 for each subframe.

[0008] The speech average power is interpolated over the subframes as follows. The speech average power of the first subframe is the speech average power of the previous frame. That of the second subframe is the geometric mean of the speech average powers of the previous frame and the current frame. That of the third and fourth subframes is the speech average power of the current frame.
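
The interpolation rule of paragraph [0008] can be sketched as follows. This is only an illustration: the function name and the use of floating-point values are assumptions, whereas the device described here works in fixed point.

```python
import math


def interpolate_subframe_power(prev_frame_power, cur_frame_power):
    """Return the average power assigned to subframes 1 to 4 of the current frame.

    Hypothetical helper illustrating the conventional rule:
    subframe 1 reuses the previous frame's power, subframe 2 takes the
    geometric mean of both frames, subframes 3 and 4 use the current frame.
    """
    second = math.sqrt(prev_frame_power * cur_frame_power)  # geometric mean
    return [prev_frame_power, second, cur_frame_power, cur_frame_power]
```

Only the second subframe needs any arithmetic, and that arithmetic is a multiplication followed by a square root.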

[0009] For each subframe, the target signal generation unit 102 generates a residual signal of the input speech signal V with an analysis filter that uses the LPC coefficients kj, and resynthesizes a speech signal by feeding that residual signal into a weighting filter that uses the LPC coefficients kj. It then removes the influence of the excitation signal (excitation code vector) ex synthesized in earlier subframes, and thereby generates and outputs the target signal p.

[0010] With this target signal p as the goal, each functional unit described below searches for and determines the excitation source information for which it is responsible. The symbols that denote the various pieces of information in FIG. 2 refer to their determined values; in the following, a value still under search (a candidate) is denoted by appending the letter t to its symbol.

[0011] The long-term lag selection unit 103 holds the excitation signals ex of several previously synthesized subframes, updated each time the processing of a subframe ends (an internal value called the long-term filter state). From the stored synthesized excitation signal, the long-term lag selection unit 103 extracts 40-sample excitation signals c0t at various lags Lt (each such 40-sample excitation signal is a long-term excitation code vector). It feeds each extracted long-term excitation code vector c0t into a weighting filter that uses the LPC coefficients kj to obtain a signal f0t, computes the squared error between f0t and the target signal p, and determines and outputs the long-term excitation code vector c0 that minimizes this squared error. The index of each long-term excitation code vector c0t is its lag Lt.
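
The lag search of paragraph [0011] can be sketched as below. It is a simplified illustration under stated assumptions: the weighting filter is passed in as a hypothetical `weight` callable (identity by default), every lag is at least one subframe long so that the extracted vector lies wholly inside the stored history, and all names are illustrative.

```python
def select_long_term_lag(history, target, lags, weight=lambda v: v):
    """Pick the lag whose weighted code vector is closest to the target.

    history: past synthesized excitation samples, oldest first.
    target:  target signal p for the current subframe.
    lags:    candidate lags Lt, each >= len(target) and <= len(history).
    weight:  stand-in for the LPC-based weighting filter (assumption).
    Returns (best_lag, best_squared_error).
    """
    n = len(target)
    best_lag, best_err = None, float("inf")
    for lag in lags:
        start = len(history) - lag
        c0t = history[start:start + n]   # candidate code vector at lag Lt
        f0t = weight(c0t)                # weighted candidate
        err = sum((p - f) ** 2 for p, f in zip(target, f0t))
        if err < best_err:
            best_lag, best_err = lag, err
    return best_lag, best_err
```

The returned lag doubles as the index that is sent to the decoder.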

[0012] The first code selection unit 104 stores a plurality of fixed data items as basis vectors v1. It generates a plurality of first excitation code vectors u1t by combining the basis vectors with parameters θ (Gray code) that assign a positive or negative polarity to each vector (see equation (1) of Document A). After passing each excitation code vector u1t through a weighting filter that uses the LPC coefficients kj, it orthogonalizes the result with respect to the long-term excitation code vector c0 selected by the long-term lag selection unit 103 to produce a signal f1t, computes the squared error between f1t and the target signal p, and determines and outputs the first excitation code vector c1 (u1) that minimizes this squared error. Each first excitation code vector u1t is assigned an index It.
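
The construction of a code vector from signed basis vectors can be sketched as follows. The mapping of codeword bits to polarities (bit set gives +1, bit clear gives -1) is an assumption made for illustration, as are the function and argument names; the weighting and orthogonalization steps are omitted.

```python
def build_code_vector(basis, codeword):
    """Form one excitation code vector as a signed sum of basis vectors.

    basis:    list of M basis vectors, all of the same length.
    codeword: integer whose bit k selects the polarity of basis[k]
              (assumed convention: 1 -> +1, 0 -> -1).
    """
    signs = [1 if (codeword >> k) & 1 else -1 for k in range(len(basis))]
    length = len(basis[0])
    return [sum(s * v[i] for s, v in zip(signs, basis)) for i in range(length)]
```

With M basis vectors this yields 2^M code vectors, and stepping through the codewords in Gray-code order changes only one polarity at a time.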

[0013] The second code selection unit 105 stores a plurality of basis vectors v2 that differ from those of the first code selection unit 104. It generates a plurality of second excitation code vectors u2t by combining the basis vectors with the parameters θ. After passing each excitation code vector through a weighting filter that uses the LPC coefficients kj, it orthogonalizes the result with respect to the long-term excitation code vector c0 and also with respect to the optimum first excitation code vector c1 selected by the first code selection unit 104 to produce a signal f2t, computes the squared error between f2t and the target signal p, and determines and outputs the second excitation code vector c2 (u2) that minimizes this squared error. Each second excitation code vector u2t is assigned an index Jt.

[0014] The gain selection unit 106, together with the amplifier circuits 107 to 109 and the adder circuit 110, determines the optimum gains β, γ1, and γ2 for the excitation code vectors c0, c1, and c2.

[0015] The gain selection unit 106 stores a plurality of sets of the gains β, γ1, γ2 for the excitation code vectors c0, c1, c2, in the form of sets of parameters GS, P0, P1 obtained by an equivalent transformation of those gains. The sets of parameters GS, P0, P1 are vector-quantized, and each set is assigned an index (gain index) G. The relationship between a set of gains β, γ1, γ2 and a set of parameters GS, P0, P1 is given in equations (15) to (23) of Document A.

[0016] For each set of parameters GSt, P0t, P1t, the gain selection unit 106 converts the set into gains βt, γ1t, γ2t on the basis of the LPC coefficients kj, the speech average power R0, and the excitation code vectors c0, c1, c2, and supplies them to the corresponding amplifier circuits 107, 108, and 109. The amplifier circuits 107 to 109 and the adder circuit 110 then yield, as a plurality of candidates, synthesized excitation signals ext, each of which is the sum of products of the excitation code vectors c0, c1, c2 and the corresponding gains βt, γ1t, γ2t. For each synthesized excitation signal ext, the gain selection unit 106 applies a synthesis filter derived from the LPC coefficients kj to obtain a locally reproduced signal, computes the squared error between each locally reproduced signal and the target signal p, and selects as optimum the set of gains β, γ1, γ2 that minimizes this squared error.

[0017] The synthesized excitation signal ex generated from the optimum excitation code vectors c0, c1, c2 and the optimum gains β, γ1, γ2 is used, among other things, to form the target signal for the next subframe and to update the long-term filter state.

[0018] The VSELP speech encoding device transmits the following to the VSELP speech decoding device: the LPC coefficients kj, the speech average power R0, the index (long-term lag) L of the long-term excitation code vector c0, the index I of the optimum first excitation code vector c1, the index J of the optimum second excitation code vector c2, and the index G of the optimum set of gain parameters GS, P0, P1.

[0019] The VSELP speech decoding device is not illustrated, but its decoding operation is briefly described next.

[0020] The VSELP speech decoding device decodes the long-term excitation code vector c0 by applying the received lag L as an index to its long-term filter state, which is updated from the synthesized excitation signals of the past several subframes obtained as described below. It decodes the optimum first excitation code vector c1 from the received index I of the first excitation code vector and the stored basis vectors v1, which are identical to those of the encoding device (if the encoding device and the decoding device are mounted in the same apparatus, the ones provided for the encoding device are used). Likewise, it decodes the optimum second excitation code vector c2 from the received index J of the second excitation code vector and the stored basis vectors v2, which are identical to those of the encoding device (again shared with the encoding device when both are mounted in the same apparatus). It then retrieves the optimum set of gain parameters GS, P0, P1 from the gain index G, and decodes the optimum gains β, γ1, γ2 from the obtained long-term excitation code vector c0, first excitation code vector c1, second excitation code vector c2, the received LPC coefficients kj, and the speech average power R0.

[0021] The obtained long-term excitation code vector c0, first excitation code vector c1, and second excitation code vector c2 are multiplied by the corresponding gains β, γ1, γ2 and then added to obtain the synthesized excitation signal ex. The speech signal V is reproduced by passing this synthesized excitation signal ex through a synthesis filter unit built from the received LPC coefficients kj. The speech signal V is then passed through a post-filter unit, also built from the received LPC coefficients kj, which suppresses its noise components before output.
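
The multiply-and-add step that forms the synthesized excitation signal, ex = β·c0 + γ1·c1 + γ2·c2, can be written as a short sketch. The function name is illustrative, and the synthesis filter and post-filter that follow are not modeled.

```python
def synthesize_excitation(c0, c1, c2, beta, gamma1, gamma2):
    """Return ex, the gain-weighted sum of the three excitation code vectors."""
    return [beta * a + gamma1 * b + gamma2 * c for a, b, c in zip(c0, c1, c2)]
```

The encoder's amplifier circuits 107 to 109 and adder circuit 110 perform the same product-sum for each candidate gain set.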

[0022] As its standardization by the TIA (Telecommunications Industry Association) committee in the United States as a speech coding scheme for digital cellular systems indicates, the VSELP speech coding scheme is used mainly for compressed transmission of speech signals, for example in digital mobile communications.

[0023] Processing devices that implement the VSELP speech coding scheme (speech encoding devices and speech decoding devices) are therefore required to be as small and as low in power consumption as possible. An effective way to meet this requirement is to fix the bit length (word length) of the data handled in the processing device at as short a value as possible, and to perform the device's arithmetic in fixed-point representation. Accordingly, the conventional VSELP speech encoding device shown in FIG. 2 and the corresponding VSELP speech decoding device also perform their arithmetic in fixed-point representation.

[0024] However, when a short bit length is chosen for the data and fixed-point arithmetic is adopted, a change in the speech average power of the input speech signal in a given subframe also changes the values of the various internal variables used in the arithmetic. With the chosen fixed-point representation, some variables may then overflow or suffer similar problems. As a result, calculation accuracy drops, and the quality of the finally decoded speech signal can be severely degraded.

[0025] Document B describes an invention that resolves the problems caused by operating on short-bit-length data in fixed-point representation. Noting that there is a specific relationship between the magnitude (average power) of the input speech signal and the magnitude of the internal variables used in the arithmetic, it describes switching the decimal point position of those internal variables according to the obtained speech average power R0, and performing the prescribed operations in the fixed-point format given by the switched decimal point position.

[0026] In FIG. 2, the target signal generation unit 102, the long-term lag selection unit 103, the first code selection unit 104, the second code selection unit 105, the gain selection unit 106, and so on are functional units that perform various operations, and are the parts to which the method of Document B applies. In FIG. 2, the speech average power R0 is shown in parentheses as an input to the functional blocks of the target signal generation unit 102, the long-term lag selection unit 103, the first code selection unit 104, and the second code selection unit 105 to indicate that the method of Document B can be applied to them. In the gain selection unit 106 as well, the speech average power R0 can be used not only for converting the gain parameters into gains but also for determining the fixed-point position.

[0027] When the speech average power R0 is input to one of these functional units, the unit detects the stage (range) to which R0 belongs and, for example by referring to a table prepared in advance, obtains the fixed-point representation of the variables used in its operations. Only after this does it perform the prescribed operation (for example, passing a given signal through a filter).

[0028] FIG. 3 shows an example of the structure of a table for switching the fixed-point position. For example, it specifies that when the speech average power R0 is at least PW0 and less than PW1, the fixed-point position of variable 1 is a1, that of variable 2 is a2, and that of variable y is ay. Incidentally, the table that associates the stage of the speech average power R0 with the fixed-point representation of the internal variables is built with the fixed-point position (that is, the precision) of the unit's output signal in mind. The output signal is itself an internal variable, and for each stage of R0 the table stores the fixed-point positions of the internal variables that make the desired fixed-point position of the output signal achievable.
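
A table of the kind shown in FIG. 3 can be sketched as a threshold list plus one row of fixed-point positions per stage. All numbers below (stage boundaries and fractional-bit counts) and all names are invented for illustration; the source gives only the symbolic entries PW0, PW1, a1, a2, and so on.

```python
import bisect

# Hypothetical upper stage boundaries PW1, PW2, PW3 (PW0 = 0 is the implicit lower edge).
THRESHOLDS = [100.0, 1000.0, 10000.0]

# One row per stage: assumed number of fractional bits for each internal variable.
POINT_TABLE = [
    {"var1": 14, "var2": 12},  # PW0 <= R0 < PW1
    {"var1": 12, "var2": 10},  # PW1 <= R0 < PW2
    {"var1": 10, "var2": 8},   # PW2 <= R0 < PW3
    {"var1": 8, "var2": 6},    # PW3 <= R0
]


def lookup_point_positions(r0):
    """Return the fixed-point positions for the stage that R0 falls into."""
    stage = bisect.bisect_right(THRESHOLDS, r0)
    return POINT_TABLE[stage]
```

Louder input (larger R0) maps to fewer fractional bits, which leaves more headroom for the integer part.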

[0029]

[Problems to Be Solved by the Invention] However, among the functional units whose internal variables have a switchable fixed-point representation, the long-term lag selection unit 103 differs in nature from the others. The long-term lag selection unit 103 stores the synthesized excitation signals ex of the past several subframes as its long-term filter state, and the fixed-point position of the stored synthesized excitation signal can differ from one stored subframe to another. If the fixed-point position of the long-term excitation code vector c0 output by the long-term lag selection unit 103 is determined from the speech average power R0 of the current subframe, the synthesized excitation signals stored as the long-term filter state must be brought to that fixed-point position when they are read out. This means shifting the data, and depending on the shift direction the values may overflow.
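
The overflow risk described in paragraph [0029] can be illustrated with a small sketch that moves a stored sample from one fixed-point position to another. The 16-bit word length, the saturation on overflow, and all names are assumptions made for this example only.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def requantize(sample, q_from, q_to):
    """Re-express a stored integer sample with q_to fractional bits.

    sample: integer stored with q_from fractional bits (assumed 16-bit word).
    Returns (value, overflowed); the value is saturated when it does not fit.
    """
    shift = q_to - q_from
    shifted = sample << shift if shift >= 0 else sample >> -shift
    overflowed = not (INT16_MIN <= shifted <= INT16_MAX)
    clipped = max(INT16_MIN, min(INT16_MAX, shifted))
    return clipped, overflowed
```

A right shift only discards low-order bits, whereas a left shift applied to a loud stored subframe can exceed the word length, which is the case the paragraph warns about.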

[0030] In other words, even when the fixed-point position is switched on the basis of the speech average power R0, sufficient accuracy cannot be obtained in the long-term lag selection unit 103.

[0031] As described above, the speech average power R0 of each subframe, which is used not only for gain selection but also for switching the fixed-point position, is determined, for example, as follows: the speech average power of the first subframe is that of the previous frame, that of the second subframe is the geometric mean of the speech average powers of the previous frame and the current frame, and that of the third and fourth subframes is that of the current frame. For the second subframe, the average of the two speech average powers is taken as a geometric mean because the speech average power is a sum of squared sample values, so the geometric mean is closer than the arithmetic mean to the sum of squares of a sample sequence with intermediate values.

[0032] With this interpolation method, the speech average powers of the first, third, and fourth subframes are obtained trivially, but the second subframe requires computation. Moreover, because it is a geometric mean, the computation involves a multiplication and a square root. In general, obtaining a square root requires either a complicated calculation mechanism or many processing steps. This runs counter to the demand that devices using the VSELP speech coding scheme (speech encoding devices and speech decoding devices) be as small and as low in power consumption as possible.

[0033] The problems have been described above for devices conforming to the VSELP speech coding scheme. However, some devices conforming to general CELP speech coding schemes also have an adaptive codebook unit that stores past synthesized excitation signals and outputs a kind of excitation code vector, and some interpolate the speech average power for each subframe. The above problems therefore also arise in devices conforming to general CELP speech coding schemes.

[0034] The present invention has been made in view of the above points, and aims to provide a speech encoding device and a speech decoding device in which accuracy can be improved by switching the fixed-point position even in the adaptive excitation code vector selection unit, which stores past synthesized excitation signals.

[0035] The present invention also aims to provide a speech encoding device that can interpolate the speech average power of each subframe from the speech average power of each frame with a simple configuration or simple processing.

[0036]

[Means for Solving the Problems] To solve these problems, the invention of claim 1 provides the following configuration in a speech encoding device that conforms to a CELP speech coding scheme, in which a speech signal is parametrically encoded by digital processing with a finite word length. The device has an adaptive excitation code vector selection unit that stores the synthesized excitation signals of several past frames (in this claim, a concept that includes subframes), outputs part of that signal sequence as an adaptive excitation code vector component, and has internal variables whose fixed-point position can be switched.

[0037] Specifically, the device is provided with a decimal point position buffer unit and a decimal point position selection unit. The decimal point position buffer unit stores fixed-point position information for each of the synthesized excitation signals of the several past frames held in the adaptive excitation code vector selection unit. The decimal point position selection unit determines the fixed-point position of the internal variables of the adaptive excitation code vector selection unit from the plural pieces of fixed-point position information stored in the decimal point position buffer unit and from the speech average power of the current frame.
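
One way to picture the buffer and selector of paragraph [0037] is the sketch below. The selection rule used here, taking the smallest number of fractional bits among the buffered subframes and the position suggested by the current power, is purely an assumption chosen so that no stored signal has to be shifted left; this passage does not state the actual rule. All class and function names are hypothetical.

```python
from collections import deque


class PointPositionBuffer:
    """Keeps the fixed-point position of each stored past (sub)frame."""

    def __init__(self, depth):
        self.positions = deque(maxlen=depth)  # oldest entry drops out automatically

    def push(self, frac_bits):
        self.positions.append(frac_bits)


def select_point_position(buffer, frac_bits_from_power):
    """Assumed rule: use the most conservative (smallest) fractional-bit count."""
    return min([frac_bits_from_power, *buffer.positions])
```

Under this assumed rule, aligning the stored signals to the selected position only ever requires right shifts, so the alignment itself cannot overflow.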

[0038] The invention of claim 2 concerns a speech encoding device that conforms to a CELP speech coding scheme, in which a speech signal is parametrically encoded by digital processing with a finite word length. The device has a speech power calculation unit that obtains the speech average power of each frame of the speech signal, and a speech power interpolation unit that obtains, from the speech average powers of two consecutive frames, speech average power information for each of the subframes into which one frame is equally divided. This device is configured as follows.

[0039] Specifically, the speech power interpolation unit includes at least a quantization means for the speech average power. When obtaining the speech average power information for each subframe, it handles the speech average power of each frame as a quantized index. For a subframe that requires the geometric mean of the speech average powers of two frames, it obtains the result as the arithmetic mean of the indices. It outputs the index obtained for each subframe, or outputs the speech average power restored from that index by an inverse quantization means.
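
Why an arithmetic mean of indices can stand in for a geometric mean of powers is easiest to see with a logarithmic quantizer. The sketch below assumes a uniform step of 2 dB and rounds the mean index down; the step size, the rounding direction, and all names are assumptions for illustration and are not taken from this passage.

```python
import math

STEP_DB = 2.0  # assumed quantizer step on a decibel scale


def quantize_power(power):
    """Map a positive power to the nearest index on the assumed dB grid."""
    return round(10.0 * math.log10(power) / STEP_DB)


def dequantize_power(index):
    """Inverse mapping from index back to power."""
    return 10.0 ** (index * STEP_DB / 10.0)


def second_subframe_index(prev_index, cur_index):
    """Arithmetic mean of two indices using only an add and a shift."""
    return (prev_index + cur_index) >> 1
```

Because the index is proportional to the logarithm of the power, averaging two indices averages the logarithms, which corresponds to the geometric mean of the powers, and no multiplication or square root is needed.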

【００４０】Furthermore, the invention of claim 3 relates to a speech coding apparatus conforming to a CELP speech coding scheme that parameter-codes a speech signal by digital processing with a finite word length. The apparatus has a speech power calculation unit that obtains the speech average power of each frame of the speech signal; a speech power interpolation unit that obtains, from the speech average powers of two consecutive frames, speech average power information for each of the subframes into which one frame is equally divided; and an adaptive excitation code vector selection unit that stores the synthesized excitation signals of the past several subframes, outputs part of that signal sequence as an adaptive excitation code vector component, and can switch the fixed-point position of its internal variables. This apparatus is configured as follows.

【００４１】That is, the speech power interpolation unit includes at least quantization means for the speech average power. When obtaining the speech average power information for each subframe, it handles the speech average power of each frame as a quantized index; for a subframe that requires geometric-mean information of the speech average powers of the two frames, it takes the arithmetic mean of the indices; and it outputs the index obtained for each subframe, or the speech average power recovered from that index through inverse quantization means. In addition, the apparatus is provided with a decimal point position buffer unit that stores fixed-point position information for each of the synthesized excitation signals of the past several subframes held in the adaptive excitation code vector selection unit, and a decimal point position selection unit that determines the fixed-point position of the internal variables in the adaptive excitation code vector selection unit from the plural pieces of fixed-point position information stored in the decimal point position buffer unit and the speech average power for the current subframe.

【００４２】The invention of claim 4 is characterized in that the CELP speech coding scheme adopted by the speech coding apparatus according to any one of claims 1 to 3 is the forward-type VSELP speech coding scheme, which is one kind of CELP scheme.

【００４３】The invention of claim 5 is a speech decoding apparatus conforming to the CELP speech coding scheme and corresponding to the speech coding apparatus of claim 1. In a speech decoding apparatus having an adaptive excitation code vector selection unit that stores the synthesized excitation signals of the past several frames (in this claim, a concept that includes subframes), outputs part of that signal sequence as an adaptive excitation code vector component, and can switch the fixed-point position of its internal variables, the following configuration is provided.

【００４４】That is, the apparatus is provided with a decimal point position buffer unit that stores fixed-point position information for each of the synthesized excitation signals of the past several frames held in the adaptive excitation code vector selection unit, and a decimal point position selection unit that determines the fixed-point position of the internal variables in the adaptive excitation code vector selection unit from the plural pieces of fixed-point position information stored in the decimal point position buffer unit and the speech average power for the current frame.

【００４５】The invention of claim 6 is characterized in that the CELP speech coding scheme adopted by the speech decoding apparatus according to claim 5 is the forward-type VSELP speech coding scheme, which is one kind of CELP scheme.

【００４６】[0046]

【Operation】The inventions of claims 1 and 5 relate to a speech coding apparatus and a speech decoding apparatus conforming to a CELP coding scheme that parameter-codes a speech signal by digital processing with a finite word length and transmits it. Each apparatus has an adaptive excitation code vector selection unit that stores the synthesized excitation signals of the past several frames, outputs part of that signal sequence as an adaptive excitation code vector component, and can switch the fixed-point position of its internal variables.

【００４７】A method has already been proposed in which the fixed-point position of the internal variables used by each functional unit is switched according to the speech average power. In an adaptive excitation code vector selection unit, which stores and uses the synthesized excitation signals of the past several frames, the fixed-point position differs from one stored synthesized excitation signal to another. If the fixed-point position is determined simply from the speech average power of the current frame, it may therefore be inappropriate from the standpoint of the synthesized excitation signals of past frames (for example, small numbers may be represented accurately while large numbers overflow and become inaccurate).

【００４８】Therefore, in the inventions of claims 1 and 5, a decimal point position buffer unit and a decimal point position selection unit are provided, and the decimal point position selection unit determines the fixed-point position of the internal variables in the adaptive excitation code vector selection unit from the plural pieces of fixed-point position information stored in the decimal point position buffer unit and the speech average power for the current frame. Naturally, the adaptive excitation code vector selection units of the speech coding apparatus and the speech decoding apparatus must output the adaptive excitation code vector in the same state, so when the speech coding apparatus is provided with the decimal point position buffer unit and the decimal point position selection unit, the corresponding speech decoding apparatus is provided with these functional units as well.

【００４９】The invention of claim 2 presupposes a speech coding apparatus conforming to a CELP speech coding scheme that parameter-codes a speech signal by digital processing with a finite word length, the apparatus having a speech power calculation unit that obtains the speech average power of each frame of the speech signal, and a speech power interpolation unit that obtains, from the speech average powers of two consecutive frames, speech average power information for each of the subframes into which one frame is equally divided.

【００５０】For some subframes, information averaged over the speech average powers of the two consecutive frames is required as the interpolated value. Because the speech average power is obtained as a sum of squared sample values, interpolation has conventionally used the geometric mean, but this makes the interpolation configuration or the interpolation processing complicated.

【００５１】Therefore, in the invention of claim 2, the speech power interpolation unit includes at least quantization means for the speech average power, and handles the speech average power of each frame as a quantized index when obtaining the speech average power information for each subframe. For some subframes, geometric-mean information of the speech average powers of the two frames is required; in that case, the arithmetic mean of the indices is used instead. Depending on how the per-subframe speech average power information is used, the apparatus may output the index of the speech average power obtained for each subframe, or may output the speech average power recovered from that index through inverse quantization means.

【００５２】The speech coding apparatus of the invention of claim 3 combines the characteristic features of the inventions of claims 1 and 2.

【００５３】The invention of claim 4 limits the CELP speech coding scheme adopted by the speech coding apparatus according to any one of claims 1 to 3 to the forward-type VSELP speech coding scheme, which is one kind of CELP scheme. The invention of claim 6 likewise limits the CELP speech coding scheme adopted by the speech decoding apparatus according to claim 5 to the forward-type VSELP speech coding scheme. The forward-type VSELP speech coding scheme is used in compressed communication such as mobile communication, and each of the inventions of claims 1 to 3, as well as the invention of claim 6, functions effectively in that setting.

【００５４】[0054]

【Example】

(A) VSELP Speech Coding Apparatus

An embodiment of the speech coding apparatus according to the present invention is described in detail below with reference to the drawings. This embodiment is an example of a VSELP speech coding apparatus conforming to the VSELP speech coding scheme, and FIG. 1 shows its functional block configuration.

【００５５】In FIG. 1 as well, information on the broken output lines drawn from each functional block is information transmitted to the decoding apparatus, whereas information on the solid output lines drawn from each functional block is information processed when determining (searching for) the information to be transmitted.

【００５６】The reference symbols for the various kinds of information in FIG. 1 denote values in their determined state. Values still under search (candidates) are described with the suffix t appended to the symbol.

【００５７】As shown in FIG. 1, the VSELP speech coding apparatus of this embodiment comprises a speech power calculation unit 201, a speech power interpolation unit 202, an LPC analysis unit 203, a target signal generation unit 204, a long-term lag selection unit 205, a decimal point position buffer unit 206, a decimal point position selection unit 207, a first code selection unit 208, a second code selection unit 209, a gain selection unit 210, three amplifier circuits 211 to 213, and an adder circuit 214.

【００５８】A digital input speech signal V is supplied to the speech power calculation unit 201. The speech power calculation unit 201 calculates the speech average power of the input speech signal V for each frame (for example, 160 samples), for example as the sum of squared sample values, and supplies the per-frame speech average power Rf to the speech power interpolation unit 202 and the LPC analysis unit 203.

【００５９】The speech power interpolation unit 202 obtains the speech average power R0 of each subframe (for example, 40 samples) by interpolation from the speech average powers Rf of the current frame and the previous frame, and outputs it to various functional units as described later.

【００６０】The speech power interpolation unit 202 has a quantization table 202a for the speech average power, expressed on a logarithmic scale, and can thereby obtain an index indicating the quantization level to which a given speech average power belongs. The speech power interpolation unit 202 also has an inverse quantization table 202b for obtaining the speech average power from an index.

【００６１】In this embodiment, the speech power interpolation unit 202 converts the speech average power Rf of each frame into an index using the quantization table 202a, performs the interpolation for the subframes at the index level, and converts the index of each subframe into the speech average power R0 using the inverse quantization table 202b.

【００６２】The index for each subframe is interpolated, for example, as follows.
(1) For the first subframe, the index of the speech average power of the previous frame is used.
(2) For the second subframe, the arithmetic mean of the index of the speech average power of the previous frame and that of the current frame is used.
(3) For the third subframe, the index of the speech average power of the current frame is used.
(4) For the fourth subframe, the index of the speech average power of the current frame is used.

【００６３】The inverse quantization table 202b is constructed more finely than the quantization table 202a so that an index with a fractional part, produced by the arithmetic mean, can also be converted into a speech average power.

【００６４】In the interpolation method of this embodiment, the conversion from speech average power to index and from index back to speech average power is done with the tables 202a and 202b and is therefore simple. The averaging of the two speech average powers is also simple, because it uses the arithmetic mean rather than the geometric mean. A geometric mean requires a multiplication and a square-root operation, whereas an arithmetic mean requires only an addition and a halving. The addition is a simple operation, and halving multi-bit data is simply a one-bit shift.

【００６５】The above description has the index inversely quantized back to a speech average power before it is supplied to the functional units, but the index may instead be supplied to the functional units as it is, and may also be transmitted to the decoding apparatus as it is. As described later, the per-subframe speech average power R0 is used to set the fixed-point position of internal variables in the functional units. For that purpose the absolute value does not matter; it is enough that the level is known, so outputting the index suffices, and supplying the index actually simplifies the configuration of the functional units. The gain selection unit 210, however, needs the absolute value of the per-subframe speech average power for its gain selection, so if the index is supplied, the gain selection unit 210 must contain its own inverse quantization table. In the following, the description assumes that the per-subframe speech average power R0 itself is supplied to the functional units.

【００６６】The LPC analysis unit 203 is provided as vocal tract analysis means. Based on the supplied speech average power Rf of the current frame, the LPC analysis unit 203 switches the fixed-point position of the variables used in its internal LPC analysis calculation (see FIG. 3). It then performs LPC analysis on one frame of the input speech signal V to obtain and output the LPC coefficients kj.

【００６７】The conventional apparatus was shown as also obtaining and outputting interpolated LPC coefficients for each subframe, whereas this embodiment uses the per-frame LPC coefficients kj in all subframes. This difference is unrelated to the features of the present invention, and the LPC coefficients may be obtained and output for each subframe as in the conventional apparatus.

【００６８】The target signal generation unit 204 performs substantially the same processing as the conventional target signal generation unit 102 shown in FIG. 2 to form and output the target signal p for each subframe. It differs from the conventional target signal generation unit 102 in that, before starting the calculation of the target signal p, it sets the fixed-point position of the variables used in that calculation according to the speech average power R0 of the input subframe.

【００６９】The long-term lag selection unit 205 performs substantially the same processing as the conventional long-term lag selection unit 103 shown in FIG. 2 to obtain and output the optimum long-term excitation code vector c0 for each subframe. The optimum long-term lag L is, of course, determined at the same time.

【００７０】The long-term lag selection unit 205 of this embodiment differs from the conventional unit, however, in that it can switch the fixed-point position of the internal variables used to determine the optimum long-term excitation code vector c0 and the optimum long-term lag L. Moreover, its method of switching the fixed-point position also differs from the method described in Document B, which is based only on the speech average power R0.

【００７１】To switch the fixed-point position, the decimal point position buffer unit 206 and the decimal point position selection unit 207 are provided in association with the long-term lag selection unit 205 of this embodiment.

【００７２】FIG. 4 shows the contents stored in the long-term filter state buffer unit within the long-term lag selection unit 205 and in the decimal point position buffer unit 206.

【００７３】The final synthesized excitation signal ex of each subframe is supplied to the long-term lag selection unit 205, and the final synthesized excitation signals ex(−1), ex(−2), ex(−3), and ex(−4) of the most recent several subframes (four in the example of FIG. 4) are stored in the long-term filter state buffer unit, as shown in FIG. 4(a). When a synthesized excitation signal ex is stored, information on its fixed-point position fp is supplied to the decimal point position buffer unit 206, and the fixed-point positions fp(−1), fp(−2), fp(−3), and fp(−4) are stored there with their subframes identified, as shown in FIG. 4(b).

【００７４】When searching for the optimum long-term excitation code vector c0 for the current subframe, data of one subframe period must be extracted from the contents of the long-term filter state buffer unit while the long-term lag Lt is varied. In most cases the extracted data straddles the synthesized excitation signals of two past subframes. The fixed-point positions fp(−1), fp(−2), fp(−3), and fp(−4) of the final synthesized excitation signals ex(−1), ex(−2), ex(−3), and ex(−4) stored in the long-term filter state buffer unit differ from one another more often than not, so it is not appropriate to determine the fixed-point position of the internal variables in the long-term lag selection unit 205 from the speech average power R0 alone.

【００７５】The decimal point position selection unit 207 takes into account the fixed-point positions fp(−1), fp(−2), fp(−3), and fp(−4) of the final synthesized excitation signals ex(−1), ex(−2), ex(−3), and ex(−4) of the most recent several subframes, and determines the fixed-point position of the internal variables in the long-term lag selection unit 205 as follows.

【００７６】FIG. 5 shows the tables built into the decimal point position selection unit 207. The decimal point position selection unit 207 first accesses the table shown in FIG. 5(a) with the speech average power R0 of the current subframe to obtain an estimate of the fixed-point position fp(0) required for the synthesized excitation signal of the current subframe. Next, from among the fixed-point positions fp(−1), fp(−2), fp(−3), and fp(−4) stored in the decimal point position buffer unit 206 and the fixed-point position fp(0) obtained for the current subframe, it finds the fixed-point position FP that can represent the largest number. It then accesses the table shown in FIG. 5(b) with this fixed-point position FP to determine the fixed-point position of each internal variable in the long-term lag selection unit 205.

【００７７】Determining the fixed-point position in this way prevents a loss of precision during the search for the long-term excitation code vector c0t (long-term lag Lt).

【００７８】In the first code selection unit 208, the second code selection unit 209, and the gain selection unit 210 as well, the fixed-point position of the internal variables is determined from the speech average power R0, in the same way as in the target signal generation unit 204 (see FIG. 3). The processing that these units perform after determining the fixed-point position of the internal variables is substantially the same as that of the conventional first code selection unit 104, second code selection unit 105, and gain selection unit 106 shown in FIG. 2, so its description is omitted.

【００７９】In the VSELP speech coding apparatus of the above embodiment, the fixed-point position of the internal variables used in the calculations of the long-term lag selection unit 205 is determined from the fixed-point positions of the stored synthesized excitation signals of the most recent several subframes and from the speech average power of the current subframe. Precision is therefore no longer affected by differences in the fixed-point positions of the past synthesized excitation signals, the coding precision of the speech signal improves, and the quality of the decoded speech signal can be made higher than before.

【００８０】It is also conceivable to determine the fixed-point position of the internal variables separately for each candidate of the long-term excitation code vector c0 (long-term lag L), from the fixed-point positions of the synthesized excitation signals of the two past subframes that the candidate involves and from the speech average power R0 of the current subframe. Doing so, however, would require the internal variables to be re-evaluated for every candidate and is not practical. It is preferable to determine the fixed-point position of the internal variables uniformly for all candidates, as in the above embodiment.

【００８１】Furthermore, in the VSELP speech coding apparatus of the above embodiment, when the speech average power of a subframe is obtained as an averaged value from the speech average powers of two frames, this is done by quantizing the speech average power and taking the arithmetic mean of the resulting indices. The configuration for obtaining the subframe speech average power can therefore be simplified, or the number of processing steps reduced (and power consumption lowered accordingly).

【００８２】(B) VSELP Speech Decoding Apparatus

Next, a VSELP speech decoding apparatus of an embodiment corresponding to the VSELP speech coding apparatus of the above embodiment is described in detail with reference to the drawings.

【００８３】FIG. 6 shows the functional block configuration of the VSELP speech decoding apparatus of this embodiment. In FIG. 6, information on the broken input lines to each functional block is information that has been transmitted to the decoding apparatus, and information on the solid input lines is information formed within the decoding apparatus.

【００８４】As shown in FIG. 6, the VSELP speech decoding apparatus of this embodiment comprises a long-term lag selection unit 301, a decimal point position buffer unit 302, a decimal point position selection unit 303, a first code selection unit 304, a second code selection unit 305, a gain selection unit 306, three amplifier circuits 307 to 309, an adder circuit 310, a synthesis filter unit 311, and a post filter unit 312.

[0085] In FIG. 6, of the information transmitted from the VSELP speech coding apparatus, the long-term lag L is supplied to the long-term lag selection unit 301, the index I of the first excitation code vector is supplied to the first code selection unit 304, the index J of the second excitation code vector is supplied to the second code selection unit 305, and the gain index G is supplied to the gain selection unit 306. The speech average power R0 is supplied to the decimal point position selection unit 303, the first code selection unit 304, the second code selection unit 305, the gain selection unit 306, the synthesis filter unit 311, and the post-filter unit 312. The LPC coefficients kj are supplied to the gain selection unit 306, the synthesis filter unit 311, and the post-filter unit 312.

[0086] The long-term lag selection unit 301 is supplied with the finalized synthesized excitation signal ex of each subframe and stores it in its internal long-term filter state buffer unit. Information on the fixed-point position fp of that synthesized excitation signal ex is supplied to the decimal point position buffer unit 302 and stored there. That is, before processing of the current subframe begins, the contents of the long-term filter state buffer unit and of the decimal point position buffer unit 302 are identical to those of the corresponding buffer units in the coding apparatus.

[0087] The decimal point position selection unit 303 first obtains, from the speech average power R0 of the current subframe, an estimate of the fixed-point position fp(0) required for the synthesized excitation signal of the current subframe. Next, from among the fixed-point positions fp(-1), fp(-2), fp(-3), fp(-4) of the synthesized excitation signals ex(-1), ex(-2), ex(-3), ex(-4) of the several immediately preceding subframes, which are stored in the decimal point position buffer unit 302, and the fixed-point position fp(0) obtained for the current subframe, it finds the fixed-point position FP that can represent the largest number. That fixed-point position FP is then used to determine the fixed-point positions of the internal variables in the long-term lag selection unit 301.
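The selection rule described above can be sketched as follows. The sketch assumes that a fixed-point position is expressed as a number of integer bits, so that a larger value can represent a larger magnitude. The mapping from R0 to fp(0), including the `HEADROOM` margin and the word length `WORD_BITS`, is purely hypothetical, as are the function names.

```python
import math

WORD_BITS = 16   # hypothetical word length (assumption)
HEADROOM = 4.0   # hypothetical peak-to-RMS margin (assumption)


def estimate_fp_from_power(r0):
    """Estimate fp(0): integer bits needed for an amplitude of about sqrt(r0)."""
    peak = math.sqrt(r0) * HEADROOM
    bits = math.ceil(math.log2(peak + 1.0))
    return max(0, min(WORD_BITS - 1, bits))


def select_common_fp(fp_history, r0):
    """Return FP, the position covering the largest number among fp(-4..-1) and fp(0)."""
    fp_now = estimate_fp_from_power(r0)
    return max(list(fp_history) + [fp_now])


# fp(-4) .. fp(-1) as held in the decimal point position buffer
stored_positions = [3, 5, 4, 4]
common_fp = select_common_fp(stored_positions, r0=1024.0)
```

Because every stored subframe and the current subframe fit within the selected position, none of the values read from the buffer overflows when they are brought to a common representation.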

[0088] In the long-term lag selection unit 301 of the decoding apparatus, no search for the optimum long-term excitation code vector c0 is required. The only variable whose fixed-point position is switched is therefore the output long-term excitation code vector c0 itself.

[0089] Once the fixed-point position for the current subframe has been determined, the long-term lag selection unit 301 extracts the long-term excitation code vector c0 from the long-term filter state buffer unit on the basis of the supplied long-term lag L, and outputs it to the gain selection unit 306 and the amplifier circuit 307.
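A minimal sketch of this extraction step is given below. It assumes that each stored subframe is held as integer mantissas together with its own fixed-point position (counted in integer bits), that alignment to the common position is a right shift, and that a lag shorter than the subframe length is handled by periodic repetition. These details and all names are illustrative assumptions rather than the embodiment's actual procedure.

```python
def align_to_fp(mantissas, fp_stored, fp_common):
    """Re-express mantissas stored at fp_stored in the coarser common position."""
    shift = fp_common - fp_stored
    if shift < 0:
        raise ValueError("common position must cover the stored position")
    return [m >> shift for m in mantissas]


def extract_long_term_vector(subframes, fp_common, lag, subframe_len):
    """Build c0 from past excitation; subframes is a list of (mantissas, fp), oldest first."""
    history = []
    for mantissas, fp_stored in subframes:
        history.extend(align_to_fp(mantissas, fp_stored, fp_common))
    if not 0 < lag <= len(history):
        raise ValueError("lag outside the stored history")
    start = len(history) - lag
    # For lag < subframe_len the last `lag` samples are repeated (assumption).
    return [history[start + (n % lag)] for n in range(subframe_len)]


past = [([8, 16, 24, 32], 3), ([4, 8, 12, 16], 4)]
c0 = extract_long_term_vector(past, fp_common=4, lag=6, subframe_len=4)
```

The older subframe here was stored with one fewer integer bit, so its mantissas are shifted right by one before the lagged samples are read out.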

[0090] Before executing the processing originally assigned to it, each of the first code selection unit 304, the second code selection unit 305, the gain selection unit 306, the synthesis filter unit 311, and the post-filter unit 312 determines the fixed-point positions of the internal variables used in that processing on the basis of the speech average power R0 supplied from the coding apparatus (see FIG. 3). Because the first code selection unit 304, the second code selection unit 305, and the gain selection unit 306 of the speech decoding apparatus do not need to search for optimum values, they have fewer kinds of internal variables whose fixed-point positions must be determined than the corresponding functional blocks in the coding apparatus.

[0091] The first code selection unit 304 stores the same basis vectors v1 as the first code selection unit 208 in the coding apparatus. Once the fixed-point position has been determined, it decodes the optimum first excitation code vector c1 from these basis vectors and the received index I of the first excitation code vector, and outputs c1 to the amplifier circuit 308 and the gain selection unit 306.
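As an illustration of decoding a code vector from basis vectors and an index, the following sketch uses the vector-sum construction that gives VSELP its name: each bit of the index selects the sign with which one basis vector is added. The bit-to-sign convention (bit set means plus) and the function name are assumptions, and the fixed-point handling is omitted.

```python
def decode_vselp_codevector(basis_vectors, index):
    """Sum the basis vectors with signs taken from the bits of the index."""
    length = len(basis_vectors[0])
    codevector = [0] * length
    for m, basis in enumerate(basis_vectors):
        # Bit m of the index chooses +1 or -1 for basis vector m (assumed convention).
        sign = 1 if (index >> m) & 1 else -1
        for n in range(length):
            codevector[n] += sign * basis[n]
    return codevector


basis_v1 = [[1, 0, 2], [0, 3, 1]]
c1 = decode_vselp_codevector(basis_v1, index=0b01)
```

With M basis vectors this yields 2^M code vectors without storing them individually, so the decoder only needs the same basis vectors as the coding apparatus and the received index.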

[0092] The second code selection unit 305 stores the same basis vectors v2 as the second code selection unit 209 in the coding apparatus. Once the fixed-point position has been determined, it decodes the optimum second excitation code vector c2 from these basis vectors and the received index J of the second excitation code vector, and outputs c2 to the amplifier circuit 309 and the gain selection unit 306.

[0093] The gain selection unit 306 stores the same sets of gain parameters GS, P0, P1 as the gain selection unit 210 in the coding apparatus. Once the fixed-point position has been determined, it retrieves the optimum set of gain parameters GS, P0, P1 according to the received gain index G. From this set, the obtained long-term excitation code vector c0, first excitation code vector c1, and second excitation code vector c2, and the received LPC coefficients kj and speech average power R0, it decodes the optimum gains β, γ1, γ2 and outputs them to the corresponding amplifier circuits 307, 308, 309.
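This section does not state the formula by which β, γ1, γ2 are computed. The sketch below follows the energy-domain gain parameterization commonly published for VSELP, in which GS scales an estimated residual energy and P0, P1 give the fractions of excitation energy assigned to c0 and c1. Treat the formula, the use of kj as reflection coefficients, and every name here as assumptions for illustration only; floating point is used in place of the fixed-point arithmetic of the embodiment.

```python
import math


def vector_energy(vec):
    """Sum of squares of a code vector."""
    return sum(x * x for x in vec)


def decode_gains(gs, p0, p1, c0, c1, c2, reflection_coeffs, r0):
    """Return (beta, gamma1, gamma2) under the assumed energy-domain formulation."""
    n = len(c0)
    # Estimated residual energy of the subframe (assumed form).
    rs = n * r0
    for k in reflection_coeffs:
        rs *= 1.0 - k * k
    total = rs * gs

    def gain(fraction, vec):
        energy = vector_energy(vec)
        if energy <= 0.0 or fraction <= 0.0:
            return 0.0
        return math.sqrt(total * fraction / energy)

    return gain(p0, c0), gain(p1, c1), gain(1.0 - p0 - p1, c2)


beta, gamma1, gamma2 = decode_gains(
    gs=1.0, p0=0.25, p1=0.25,
    c0=[1.0, 0.0, 0.0, 0.0], c1=[2.0, 0.0, 0.0, 0.0], c2=[0.0, 0.0, 0.0, 0.0],
    reflection_coeffs=[0.0], r0=1.0,
)
```

The inputs match those listed in the paragraph above (the parameter set selected by G, the three code vectors, kj, and R0), which is why this formulation was chosen for the illustration.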

[0094] The long-term excitation code vector c0, the first excitation code vector c1, and the second excitation code vector c2 obtained as described above are gain-controlled by the amplifier circuits 307, 308, 309 according to the corresponding gains β, γ1, γ2, and are then summed by the adder circuit 310 to give the synthesized excitation signal ex for the current subframe. This synthesized excitation signal ex is supplied to the synthesis filter unit 311 and the long-term lag selection unit 301.
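The gain control and addition amount to ex = β·c0 + γ1·c1 + γ2·c2, with ex also fed back as history for later subframes. A minimal floating-point sketch follows; the function names and the history length are hypothetical, and the fixed-point scaling of the embodiment is omitted.

```python
def synthesize_excitation(c0, c1, c2, beta, gamma1, gamma2):
    """Weighted sum of the three code vectors (amplifiers 307 to 309, adder 310)."""
    return [beta * a + gamma1 * b + gamma2 * c for a, b, c in zip(c0, c1, c2)]


def update_history(history, ex, max_len):
    """Append the new excitation and keep only the most recent max_len samples."""
    return (history + ex)[-max_len:]


ex = synthesize_excitation([1.0, 2.0], [1.0, 0.0], [0.0, 1.0], 0.5, 2.0, -1.0)
history = update_history([1.0, 2.0, 3.0], ex, max_len=4)
```

Feeding ex back into the history is what keeps the decoder's long-term filter state identical to that of the coding apparatus.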

[0095] The synthesis filter unit 311 sets the received LPC coefficients kj, expressed at the determined fixed-point position, to define its filter characteristic, and filters the input synthesized excitation signal ex with that characteristic to reproduce the speech signal. The reproduced speech signal obtained in this way may contain a large noise component.
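One way to picture the synthesis filtering is an all-pole lattice filter driven by the excitation. The sketch below assumes that the coefficients kj are reflection coefficients and adopts one common sign convention; neither point is confirmed by this section. It uses floating point rather than the switched fixed-point representation, and the function name is hypothetical.

```python
def lattice_synthesis(excitation, k):
    """All-pole lattice filter; k[i-1] is the reflection coefficient of stage i."""
    order = len(k)
    backward = [0.0] * (order + 1)   # backward[i] holds b_i at the previous sample
    output = []
    for x in excitation:
        forward = x                  # f_order[n]
        new_backward = [0.0] * (order + 1)
        for i in range(order, 0, -1):
            # f_{i-1}[n] = f_i[n] - k_i * b_{i-1}[n-1]
            forward = forward - k[i - 1] * backward[i - 1]
            # b_i[n] = b_{i-1}[n-1] + k_i * f_{i-1}[n]
            new_backward[i] = backward[i - 1] + k[i - 1] * forward
        new_backward[0] = forward    # b_0[n] = f_0[n] = y[n]
        backward = new_backward
        output.append(forward)
    return output


impulse_response = lattice_synthesis([1.0, 0.0, 0.0], [0.5])
```

The filter state carried in `backward` persists across samples, so in a subframe-based decoder it would also have to persist across subframes.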

[0096] The post-filter unit 312 sets the received LPC coefficients kj, expressed at the determined fixed-point position, to define its filter characteristic, and filters the speech signal reproduced by the synthesis filter unit 311 with that characteristic to obtain and output the final reproduced speech signal V, in which the noise component is suppressed.

[0097] According to the VSELP speech decoding apparatus of the above embodiment, the fixed-point positions of the internal variables used in the computation of the long-term lag selection unit 301 are determined from the fixed-point positions of the several stored, immediately preceding synthesized excitation signals and from the speech average power of the current subframe. Precision is therefore no longer affected by differences among the fixed-point positions of the past synthesized excitation signals, the coding precision of the speech signal improves, and the quality of the decoded speech signal can be made higher than before. Of course, this effect is realized only when the decoding apparatus uses the same method of determining the fixed-point position as the coding apparatus.

[0098] (C) Other Embodiments
Other embodiments have already been mentioned in the description of the above embodiments. In addition to those, the following embodiments are also possible.

[0099] Broadly speaking, the above embodiment has two features. The first concerns the method of interpolating the speech average power: the arithmetic mean of quantization indices is used to obtain averaged information from two speech average powers. The second concerns the fixed-point representation used when the synthesized excitation signal of past subframes serves as an excitation code vector component of the current subframe: the fixed-point positions of the past synthesized excitation signals are used to determine the fixed-point positions of the required variables. An apparatus may, however, be configured with only one of these two features.

[0100] In the above embodiment, the present invention is applied to an apparatus conforming to the forward-type VSELP speech coding scheme. It may also be applied to an apparatus conforming to the backward-type VSELP speech coding scheme, or to an apparatus conforming to a general CELP speech coding scheme.

[0101] For example, some apparatuses conforming to a general CELP speech coding scheme have an adaptive codebook unit (corresponding to the long-term lag selection unit of the embodiment) that stores and uses past synthesized excitation signals, and the second feature can be applied to such apparatuses. In this case, the apparatus may be one that does not divide a frame into subframes.

[0102] Likewise, the first feature can be applied to any apparatus that needs to obtain averaged information from two speech average powers, even if the apparatus does not use that averaged information for switching the fixed-point representation.

[0103]

[Effects of the Invention] As described above, the speech coding apparatus and the speech decoding apparatus of the present invention are provided with a decimal point position buffer unit that stores fixed-point position information for each of the synthesized excitation signals of the several past frames held in the adaptive excitation code vector selection unit, and with a decimal point position selection unit that determines the fixed-point positions of the internal variables in the adaptive excitation code vector selection unit from the plural pieces of fixed-point position information stored in the decimal point position buffer unit and the speech average power of the current frame. As a result, the precision improvement obtained by switching the fixed-point position can also be achieved in the adaptive excitation code vector selection unit, which stores past synthesized excitation signals.

[0104] According to another speech coding apparatus of the present invention, the speech power interpolation unit comprises at least quantization means for the speech average power. When obtaining the speech average power information for each subframe, it handles the speech average power of each frame as a quantized index, obtains the geometric mean information of the speech average powers of two frames as the arithmetic mean of the index numbers, and outputs either the index obtained for each subframe or the speech average power restored from that index through inverse quantization means. The speech average power information for each subframe can therefore be obtained with a simple configuration or simple processing.

[Brief description of drawings]

FIG. 1 is a functional block diagram of the VSELP speech coding apparatus of the embodiment.

FIG. 2 is a functional block diagram of a conventional VSELP speech coding apparatus.

FIG. 3 is an explanatory diagram of a general method of switching the fixed-point representation.

FIG. 4 is an explanatory diagram (part 1) of the method of switching the fixed-point representation for the long-term lag selection unit of the embodiment.

FIG. 5 is an explanatory diagram (part 2) of the method of switching the fixed-point representation for the long-term lag selection unit of the embodiment.

FIG. 6 is a functional block diagram of the VSELP speech decoding apparatus of the embodiment.

[Explanation of symbols]

201: speech power calculation unit; 202: speech power interpolation unit; 202a: quantization table for the speech average power; 203: LPC analysis unit; 204: target signal generation unit; 205, 301: long-term lag selection units; 206, 302: decimal point position buffer units; 207, 303: decimal point position selection units; 208, 304: first code selection units; 209, 305: second code selection units; 210, 306: gain selection units; 211 to 213, 307 to 309: amplifier circuits; 214, 310: adder circuits; 214, 311: synthesis filter units; 312: post-filter unit.

Continuation of front page: (72) Inventor Katsutoshi Ito, c/o Oki Electric Industry Co., Ltd., 1-7-12 Toranomon, Minato-ku, Tokyo

Claims

[Claims]

1. A speech coding apparatus conforming to a CELP speech coding scheme that parametrically encodes a speech signal by finite-word-length digital processing, the apparatus having an adaptive excitation code vector selection unit that stores the synthesized excitation signals of several past frames, outputs part of that signal sequence as an adaptive excitation code vector component, and has internal variables whose fixed-point position can be switched, the speech coding apparatus being characterized by comprising: a decimal point position buffer unit that stores fixed-point position information for each of the synthesized excitation signals of the several past frames stored in the adaptive excitation code vector selection unit; and a decimal point position selection unit that determines the fixed-point position of the internal variables in the adaptive excitation code vector selection unit from the plural pieces of fixed-point position information stored in the decimal point position buffer unit and the speech average power of the current frame.

2. A speech coding apparatus conforming to a CELP speech coding scheme that parametrically encodes a speech signal by finite-word-length digital processing, the apparatus having a speech power calculation unit that obtains the speech average power of each frame of the speech signal, and a speech power interpolation unit that obtains, from the speech average powers of two consecutive frames, speech average power information for each of the subframes into which one frame is equally divided, the speech coding apparatus being characterized in that the speech power interpolation unit comprises at least quantization means for the speech average power and, when obtaining the speech average power information for each subframe, handles the speech average power of each frame as a quantized index, obtains the geometric mean information of the speech average powers of the two frames as the arithmetic mean of the indices, and outputs either the index obtained for each subframe or the speech average power restored from the obtained index through inverse quantization means.

3. A speech coding apparatus conforming to a CELP speech coding scheme that parametrically encodes a speech signal by finite-word-length digital processing, the apparatus having a speech power calculation unit that obtains the speech average power of each frame of the speech signal, a speech power interpolation unit that obtains, from the speech average powers of two consecutive frames, speech average power information for each of the subframes into which one frame is equally divided, and an adaptive excitation code vector selection unit that stores the synthesized excitation signals of several past subframes, outputs part of that signal sequence as an adaptive excitation code vector component, and has internal variables whose fixed-point position can be switched, the speech coding apparatus being characterized in that: the speech power interpolation unit comprises at least quantization means for the speech average power and, when obtaining the speech average power information for each subframe, handles the speech average power of each frame as a quantized index, obtains the geometric mean information of the speech average powers of the two frames as the arithmetic mean of the indices, and outputs either the index obtained for each subframe or the speech average power restored from the obtained index through inverse quantization means; and the apparatus further comprises a decimal point position buffer unit that stores fixed-point position information for each of the synthesized excitation signals of the several past subframes stored in the adaptive excitation code vector selection unit, and a decimal point position selection unit that determines the fixed-point position of the internal variables in the adaptive excitation code vector selection unit from the plural pieces of fixed-point position information stored in the decimal point position buffer unit and the speech average power of the current subframe.

4. The speech coding apparatus according to claim 1, wherein the CELP speech coding method is a forward VSELP speech coding method which is one of the CELP speech coding methods.

5. A speech decoding apparatus conforming to a CELP speech coding scheme and corresponding to the speech coding apparatus according to claim 1, the speech decoding apparatus having an adaptive excitation code vector selection unit that stores the synthesized excitation signals of several past frames, outputs part of that signal sequence as an adaptive excitation code vector component, and has internal variables whose fixed-point position can be switched, the speech decoding apparatus being characterized by comprising: a decimal point position buffer unit that stores fixed-point position information for each of the synthesized excitation signals of the several past frames stored in the adaptive excitation code vector selection unit; and a decimal point position selection unit that determines the fixed-point position of the internal variables in the adaptive excitation code vector selection unit from the plural pieces of fixed-point position information stored in the decimal point position buffer unit and the speech average power of the current frame.

6. The speech decoding apparatus according to claim 5, wherein the CELP speech coding method is a forward VSELP speech coding method which is one of the CELP speech coding methods.