JP2008309956A

JP2008309956A - Speech encoding device and speech decoding device

Info

Publication number: JP2008309956A
Application number: JP2007156589A
Authority: JP
Inventors: Hisashi Yajima; Tadashi Yamaura
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2007-06-13
Filing date: 2007-06-13
Publication date: 2008-12-25
Anticipated expiration: 2027-06-13
Also published as: JP5084360B2

Abstract

PROBLEM TO BE SOLVED: To provide a speech encoding device that can select the optimal codebook entries even when the excitation signal of the fixed excitation codebook takes the form of a wave packet with a certain time length. This happens because of the impulse response of a filter placed after the pulse excitation codebook that makes up the fixed excitation codebook.

SOLUTION: An adaptive codebook 8 and a fixed excitation codebook 9 output the excitation signal of the next subframe as well as that of the current subframe. The quantization error evaluation covers the speech signal of both the current subframe and the next subframe.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a speech encoding apparatus that encodes input speech with high efficiency. It also relates to a speech decoding apparatus that decodes speech encoded by that speech encoding apparatus.

One method for encoding speech signals efficiently with code-excited linear prediction (CELP) represents the noise component of the excitation signal using an algebraic codebook composed of unit pulses. This method is called Algebraic Code-Excited Linear Prediction (ACELP) and has been adopted in various standards (see, for example, Non-Patent Documents 1, 2, and 3).

FIG. 31 is a block diagram showing a conventional speech encoding apparatus disclosed, for example, in Non-Patent Document 2 below.
The conventional speech encoding apparatus encodes speech as follows.
(1) When a speech signal such as the one shown in FIG. 33 is input, the speech encoding apparatus divides the input speech into frames of fixed length (typically about 5 msec to 50 msec; 20 msec in the AMR system).
(2) A pre-processing unit applies band-limiting filtering to remove components in bands that are not to be encoded.
(3) A linear prediction analysis unit performs spectrum analysis (LPC analysis) on each frame of the input speech. It calculates the linear prediction coefficients LPC used as the synthesis filter coefficients and converts them into line spectrum pairs LSP.

(4) An LSP quantization/inverse quantization unit performs vector quantization with reference to an LSP codebook.
That is, it identifies the LSP coefficients in the LSP codebook that are closest to the line spectrum pairs LSP, and extracts the index of those coefficients from the codebook.
The LSP quantization/inverse quantization unit then outputs that index to a multiplexing unit as spectrum information.
(5) An LSP/LPC conversion unit converts the LSP coefficients into linear prediction coefficients LPC and forms the synthesis filter from them.

(6) A driving excitation generation unit combines the excitation signals output from the adaptive codebook and the fixed excitation codebook to generate a plurality of driving excitations. It works in subframe units. A subframe is one of the sections obtained by dividing a frame along the time axis; in the AMR system a frame is divided into four, so one subframe is 5 msec.
(7) The driving excitations are passed through the synthesis filter to generate a plurality of synthesized speech signals.

(8) A minimum error search unit controls the excitation signals output from the adaptive codebook and the fixed excitation codebook, and the gain output from the gain codebook. It evaluates the quantization error between each synthesized speech signal and the input speech, and searches for the synthesized speech signal with the smallest quantization error.
(9) The multiplexing unit multiplexes four items and transmits the multiplexed signal to the speech decoding apparatus:
- the spectrum information output from the LSP quantization/inverse quantization unit;
- the pitch information of the excitation signal output from the adaptive codebook;
- the pulse information of the excitation signal output from the fixed excitation codebook;
- the gain information indicating the gain output from the gain codebook.
The last three are the values used when the synthesized speech with the smallest quantization error is obtained.
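Steps (6) to (8) amount to an analysis-by-synthesis search. The Python sketch below illustrates only the idea. Its assumptions are a tiny hand-made codebook, a fixed all-pole synthesis filter 1/A(z) with zero initial state, and a per-candidate least-squares gain in place of a gain codebook. None of these details comes from the patent or from the AMR specification.

```python
# Toy analysis-by-synthesis search (illustrative; not the AMR algorithm).

def synthesize(excitation, a):
    """All-pole synthesis filter 1/A(z), A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.

    Computes y[n] = x[n] - sum_{k=1..p} a[k] * y[n-k] with zero initial state.
    """
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y


def search(target, codebook, a):
    """Return (index, gain, error) for the entry whose gain-scaled synthesis
    is closest to `target` in squared error."""
    best = None
    for idx, code in enumerate(codebook):
        s = synthesize(code, a)
        energy = sum(v * v for v in s)
        if energy == 0.0:
            continue  # an all-zero candidate cannot be scaled to fit
        # Least-squares gain g = <target, s> / <s, s> (assumption: no gain codebook)
        gain = sum(t * v for t, v in zip(target, s)) / energy
        err = sum((t - gain * v) ** 2 for t, v in zip(target, s))
        if best is None or err < best[2]:
            best = (idx, gain, err)
    return best


# Usage: the target is candidate 1, synthesized and scaled by 2.
a = [1.0, -0.9, 0.2]                      # hypothetical 2nd-order A(z)
codebook = [[1.0 if i == j else 0.0 for i in range(4)] for j in range(4)]
target = [2.0 * v for v in synthesize(codebook[1], a)]
print(search(target, codebook, a))        # (1, 2.0, 0.0)
```

A real encoder searches the adaptive codebook and the fixed codebook in turn, on a perceptually weighted target. The exhaustive loop here only shows the "synthesize, compare, keep the minimum" structure.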

Here, the adaptive codebook of the speech encoding apparatus stores driving excitations generated in the past.
For the fixed excitation codebook, the AMR system, for example, uses a pulse excitation codebook (algebraic codebook) composed of a plurality of unit pulses, as shown in FIG. 34.
A pitch enhancement filter is also commonly placed after the pulse excitation codebook. It emphasizes the pitch frequency component according to the pitch period of the excitation signal output from the adaptive codebook, which improves the sound quality of vowel segments.
The gain codebook stores a plurality of candidate gain values, each assigned an index.
The driving excitation is generated by combining elements of these codebooks as appropriate.
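The sketch below gives a rough picture of the two codebooks. It builds an adaptive codebook vector by reading past excitation at an integer pitch lag, repeating it when the lag is shorter than the subframe. It then builds an algebraic vector from signed unit pulses and applies a recursive pitch enhancement of the form 1/(1 − βz^−T). Three simplifications are assumptions of this example:
- lags are integer only;
- β is a fixed value;
- the filter is applied only within the subframe.
Standard codecs such as AMR use fractional lags and derive β from a quantized pitch gain.

```python
# Illustrative codebook vectors (integer pitch lag only; a simplification).

def adaptive_codebook_vector(past_excitation, lag, length):
    """Read `length` samples starting `lag` samples back in the past excitation.

    When lag < length, the lag-long segment is repeated periodically.
    """
    if not 0 < lag <= len(past_excitation):
        raise ValueError("lag out of range")
    start = len(past_excitation) - lag
    out = []
    for n in range(length):
        out.append(past_excitation[start + n] if n < lag else out[n - lag])
    return out


def algebraic_vector(positions, signs, length):
    """Sparse vector of signed unit pulses (an algebraic codebook entry)."""
    c = [0.0] * length
    for p, s in zip(positions, signs):
        c[p] += float(s)
    return c


def pitch_enhance(c, lag, beta):
    """Recursive pitch enhancement 1/(1 - beta z^-lag) within the subframe.

    beta is an assumed constant here; codecs typically derive it from a pitch gain.
    """
    out = list(c)
    for n in range(lag, len(out)):
        out[n] += beta * out[n - lag]  # reuses already-enhanced samples -> recursive
    return out


print(adaptive_codebook_vector([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], 2, 5))
# [4.0, 5.0, 4.0, 5.0, 4.0]
print(pitch_enhance(algebraic_vector([1, 6], [1, -1], 8), 3, 0.5))
# [0.0, 1.0, 0.0, 0.0, 0.5, 0.0, -1.0, 0.25]
```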

FIG. 32 is a block diagram showing a conventional speech decoding apparatus disclosed, for example, in Non-Patent Document 2 below.
The conventional speech decoding apparatus decodes speech as follows.
(1) A demultiplexing unit receives the multiplexed signal transmitted from the speech encoding apparatus and separates it. It outputs the spectrum information, pitch information, pulse information, and gain information.
(2) A synthesis filter is formed according to the LSP coefficient index indicated by the spectrum information output from the demultiplexing unit.
(3) Two excitation signals for the current subframe are obtained:
- from the adaptive codebook, the signal corresponding to the pitch information (adaptive codebook component signal);
- from the fixed excitation codebook, the signal corresponding to the pulse information (pulse excitation codebook component signal).

(4) The gain indicated by the gain information is obtained from the gain codebook.
(5) Gain multipliers multiply the adaptive codebook component signal and the pulse excitation codebook component signal by the gains obtained from the gain codebook. An adder then sums the two gain-scaled signals.
(6) The excitation signal produced by the adder (adaptive codebook component signal + pulse excitation codebook component signal) is passed through the synthesis filter to decode the synthesized speech.
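Decoder steps (4) to (6) can be sketched for a single subframe as follows. This is an illustrative sketch with two assumptions:
- the gains are passed in directly rather than looked up in a gain codebook;
- the synthesis filter state is carried from one subframe to the next through an explicit `memory` list.

```python
def decode_subframe(v, c, gain_pitch, gain_code, a, memory):
    """One decoder subframe: u = gp*v + gc*c, then speech = u filtered by 1/A(z).

    `memory` holds the last p = len(a) - 1 synthesized samples (most recent last)
    and is returned updated so the next subframe continues seamlessly.
    """
    order = len(a) - 1
    if len(memory) != order:
        raise ValueError("memory must hold len(a) - 1 samples")
    excitation = [gain_pitch * x + gain_code * y for x, y in zip(v, c)]
    hist = list(memory)
    speech = []
    for e in excitation:
        acc = e
        for k in range(1, order + 1):
            acc -= a[k] * hist[-k]
        hist.append(acc)
        speech.append(acc)
    return excitation, speech, hist[len(hist) - order:]


a = [1.0, -0.5]                          # hypothetical 1st-order A(z)
exc1, sp1, mem = decode_subframe([1.0, 0.0], [0.0, 1.0], 1.0, 2.0, a, [0.0])
exc2, sp2, mem = decode_subframe([0.0, 0.0], [0.0, 0.0], 1.0, 2.0, a, mem)
print(sp1, sp2)  # [1.0, 2.5] [1.25, 0.625]
```

The returned `excitation` is what a CELP decoder would also append to its adaptive codebook for later subframes. That feedback path is not modeled here.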

Patent Document 1 below discloses an example built on the CELP framework described above. It further improves coded speech quality while keeping a low bit rate, so that natural, easy-to-hear speech can be delivered to the user.
Specifically, Patent Document 1 discloses a technique using the pulse spreading codebook described in ITU-T Recommendation G.729 Annex D.
FIG. 35 is a block diagram showing a fixed excitation codebook that uses a pulse spreading codebook. It generates a fixed excitation vector by convolving a spreading pattern (a fixed waveform) with the pulse excitation.
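The convolution in FIG. 35 can be sketched as a full linear convolution of a sparse pulse vector with a short spreading pattern. The pattern values and the 8-sample subframe are hypothetical; they are not the G.729 Annex D patterns. The example places a pulse near the end of the subframe to show that the convolved output runs past the subframe boundary, which is the situation examined in the following paragraphs.

```python
def spread(pulses, pattern):
    """Full linear convolution of a sparse pulse vector with a spreading pattern.

    The result is len(pulses) + len(pattern) - 1 samples long, so a pulse
    near the end of the subframe produces output past the subframe boundary.
    """
    out = [0.0] * (len(pulses) + len(pattern) - 1)
    for n, p in enumerate(pulses):
        if p:
            for k, h in enumerate(pattern):
                out[n + k] += p * h
    return out


SUBFRAME = 8                              # hypothetical subframe length
pattern = [1.0, 0.5, -0.25, 0.125]        # hypothetical spreading pattern
pulses = [0.0] * SUBFRAME
pulses[6] = 1.0                           # pulse near the end of the subframe
y = spread(pulses, pattern)
inside, spill = y[:SUBFRAME], y[SUBFRAME:]
print(inside)  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.5]
print(spill)   # [-0.25, 0.125, 0.0]
```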

In addition, as shown in FIG. 36, a low-pass filter (LPF) or high-pass filter (HPF) may be placed after the pulse excitation to extract and emphasize the signal in a desired frequency band.

As described above, the CELP coding scheme makes it possible to encode speech signals at a low bit rate.
However, filters such as a spreading filter or an HPF are sometimes placed after the pulse excitation codebook. In that case it is generally known that the filter's impulse response gives the output the shape of a wave packet with a certain time length.
FIG. 37 is an explanatory diagram showing a wave packet shape with a certain time length.
The upper part of FIG. 37 shows the output waveform of the pulse excitation codebook, which corresponds to (1) in FIGS. 35 and 36. The lower part shows the output waveform of the post-placed filter, which corresponds to (2) in FIGS. 35 and 36. For legibility, only the envelope of waveform (2) is drawn.

The following problems arise when CELP encoding uses a wave packet with a certain time length as the excitation signal of the fixed excitation codebook.
FIG. 38 is an explanatory diagram illustrating these problems.
The upper part of FIG. 38 shows the ideal case. Both the pulse position corresponding to waveform (1) above and the resulting wave packet lie within the subframe being encoded. The entire wave packet is then included in the quantization error evaluation of the fixed excitation codebook, so the error can be evaluated accurately.

However, as shown in the lower part of FIG. 38, the wave packet may straddle the subframe boundary even when the pulse position lies within the subframe. Part of the wave packet is then excluded from the quantization error evaluation of the fixed excitation codebook.
This happens when a pulse lies near the end of the subframe being encoded. For convenience, call this subframe the Nth subframe; in the example of FIG. 38 the pulse is pulse B. As the lower part of FIG. 38 shows, part of its wave packet may extend into the (N+1)th subframe.
If encoding then uses the frame structure shown in FIG. 33, only section B is actually subject to error evaluation (see the lower part of FIG. 38).
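The sketch below uses a hypothetical decaying impulse response to quantify this. For a given pulse position, it measures how much of a wave packet's energy falls inside the current subframe, that is, the evaluated part, section B. It also returns the samples that spill over into the next subframe. Under the conventional scheme, those samples become the unevaluated, uncarried part that the text calls section A of the next subframe.

```python
def in_subframe_energy_fraction(position, h, subframe_len):
    """Share of the wave packet's energy that lies inside the current subframe."""
    total = sum(v * v for v in h)
    inside = sum(v * v for k, v in enumerate(h) if position + k < subframe_len)
    return inside / total


def spill_into_next(position, h, subframe_len):
    """Packet samples past the subframe end, i.e. what lands in the next subframe."""
    return [v for k, v in enumerate(h) if position + k >= subframe_len]


h = [1.0, 0.5, 0.25, 0.125]   # hypothetical decaying impulse response
for pos in (0, 6, 7):
    print(pos, round(in_subframe_energy_fraction(pos, h, 8), 3), spill_into_next(pos, h, 8))
# 0 1.0 []
# 6 0.941 [0.25, 0.125]
# 7 0.753 [0.5, 0.25, 0.125]
```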

Pulse B is evaluated on only part of its wave packet (section B). It might therefore have been rejected had the entire wave packet been evaluated.
In this way, a pulse (wave packet position) that should not be selected may be selected by mistake.
Moreover, because the coding bit rate is finite, the mistakenly selected pulse may displace a pulse that should have been selected (a lost opportunity).

Furthermore, consider a pulse selected in the subframe immediately before the current one (the (N−1)th subframe). Its impulse response component is not carried over into the Nth subframe, so quantization proceeds as if section A of FIG. 38 contained no signal.
As a result, a pulse in section A that would not otherwise need to be selected, such as pulse D in FIG. 38, tends to be selected. A pulse that should have been selected (for example, pulse C) may then be left out in its place (a lost opportunity).

The speech decoding apparatus has no function for carrying over and reproducing the impulse response component corresponding to section A. The wave packet shape is therefore broken, which can cause adverse effects such as weakening the effect of the convolved filter.

Patent Document 1: Japanese Republished Patent Publication No. 2003/071522
Non-Patent Document 1: ITU-T Recommendation G.729, "Coding of Speech at 8 kbit/s using Conjugate-Structure Algebraic-Code-Excited Linear-Prediction (CS-ACELP)" (TTC Standard JT-G729, "Speech coding method using 8 kbit/s CS-ACELP", The Telecommunication Technology Committee, established 1999)
Non-Patent Document 2: 3rd Generation Partnership Project (3GPP), Technical Specification (TS) 26.090, "AMR speech codec; Transcoding functions", Version 4.0.0 (2001-03)
Non-Patent Document 3: ITU-T Recommendation G.722.2, "Wideband coding of speech at around 16 kbit/s using Adaptive Multi-Rate Wideband (AMR-WB)" (TTC Standard JT-G722.2, "Wideband speech coding at around 16 kbit/s using the Adaptive Multi-Rate Wideband (AMR-WB) scheme", The Telecommunication Technology Committee, established 2004)

With the conventional speech encoding apparatus configured as described above, the CELP coding scheme allows speech signals to be encoded at a low bit rate. However, filters such as a spreading filter or an HPF may be placed after the pulse excitation codebook. Their impulse response turns the excitation signal of the fixed excitation codebook into a wave packet with a certain time length. As a result, the optimal codebook entries cannot be selected, and the quality of speech decoded by the speech decoding apparatus deteriorates.

The present invention was made to solve the problems described above. Its object is to provide a speech encoding apparatus that can select the optimal codebook entries even when the excitation signal of the fixed excitation codebook becomes a wave packet with a certain time length. That wave packet shape is caused by the impulse response of a filter placed after the pulse excitation codebook that makes up the fixed excitation codebook.

In the speech encoding apparatus according to the present invention, the adaptive codebook and the fixed excitation codebook output the excitation signal of the next subframe as well as that of the current subframe. The parameter search means includes the input speech of the next subframe, in addition to that of the current subframe, in the quantization error evaluation.

According to the present invention, the adaptive codebook and the fixed excitation codebook output the excitation signal of the next subframe as well as that of the current subframe. The parameter search means includes the input speech of the next subframe, in addition to that of the current subframe, in the quantization error evaluation. As a result, the optimal codebook entries can be selected even when the excitation signal of the fixed excitation codebook becomes a wave packet with a certain time length because of the impulse response of a filter placed after the pulse excitation codebook.
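The toy example below makes the effect concrete. It compares a pulse search whose error window covers only the current subframe with one whose window also covers the next subframe. Everything here is hypothetical:
- a 4-sample subframe;
- a flat 3-sample response `h` standing in for the combined filter chain;
- a target in which the next subframe is silent;
- a least-squares gain.

With the short window, the pulse at the end of the subframe (like pulse B) wins because its spill-over is never scored. With the extended window, the spill-over into the silent next subframe is penalized and an earlier pulse is chosen. This illustrates the principle only and is not the patent's search procedure.

```python
def packet(position, h, n):
    """Wave packet from a unit pulse at `position`, truncated to n samples."""
    s = [0.0] * n
    for k, v in enumerate(h):
        if position + k < n:
            s[position + k] = v
    return s


def pulse_search(target, h, subframe_len, window):
    """Best single-pulse position in [0, subframe_len), scoring the
    gain-optimized squared error over target[:window]. Assumes h[0] != 0."""
    t = target[:window]
    t_energy = sum(x * x for x in t)
    best_pos, best_err = None, float("inf")
    for pos in range(subframe_len):
        s = packet(pos, h, window)
        corr = sum(x * y for x, y in zip(t, s))
        energy = sum(y * y for y in s)
        err = t_energy - corr * corr / energy   # residual error after least-squares gain
        if err < best_err:
            best_pos, best_err = pos, err
    return best_pos, best_err


h = [1.0, 1.0, 1.0]                                # hypothetical packet shape
target = [0.0, 0.2, 0.2, 1.0, 0.0, 0.0, 0.0, 0.0]  # subframe N, then a silent N+1
print(pulse_search(target, h, 4, window=4)[0])     # 3: current subframe only
print(pulse_search(target, h, 4, window=8)[0])     # 1: current + next subframe
```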

Embodiment 1.
FIG. 1 is a block diagram showing a speech encoding apparatus according to Embodiment 1 of the present invention. In the figure, a buffer 1 is a memory that stores the speech signal serving as the input speech.
A pre-processing unit 2 divides the speech signal stored in the buffer 1 into frames of fixed length (typically about 5 msec to 50 msec; 20 msec in the AMR system). It then applies band-limiting filtering to remove from each frame the unnecessary low-frequency components that are not to be encoded. The pre-processing unit 2 consists of, for example, a pole-zero filter with a cutoff frequency of 140 Hz.
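The framing performed by the pre-processing unit 2 can be sketched as follows. The sketch assumes 8 kHz sampling, so a 20 msec AMR frame holds 160 samples and each of its four 5 msec subframes holds 40 samples. It simply drops a trailing partial frame. The 140 Hz pole-zero high-pass filter is not modeled.

```python
SAMPLE_RATE = 8000   # assumption: narrowband (AMR-style) sampling rate in Hz
FRAME_MS = 20        # frame length used by the AMR system, per the text
SUBFRAMES = 4        # AMR divides each frame into four 5 msec subframes


def split_frames(signal, sample_rate=SAMPLE_RATE, frame_ms=FRAME_MS, subframes=SUBFRAMES):
    """Split a sample list into frames, each a list of equal-length subframes.
    A trailing partial frame is dropped (an assumption of this sketch)."""
    frame_len = sample_rate * frame_ms // 1000
    if frame_len % subframes:
        raise ValueError("frame length must be divisible by the subframe count")
    sub_len = frame_len // subframes
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        frames.append([frame[i:i + sub_len] for i in range(0, frame_len, sub_len)])
    return frames


frames = split_frames(list(range(400)))
print(len(frames), len(frames[0]), len(frames[0][0]))  # 2 4 40
```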

A spectrum analysis unit 3 consists of a linear prediction analysis unit 4, an LSP codebook 5, and an LSP quantization/inverse quantization unit 6. It performs speech spectrum analysis (LPC analysis) on each frame of the speech signal output from the pre-processing unit 2. The spectrum analysis unit 3 constitutes spectrum analysis means.
The linear prediction analysis unit 4 performs LPC analysis on each frame of the speech signal. It calculates the linear prediction coefficients LPC used as the coefficients of a synthesis filter 16 and converts them into line spectrum pairs LSP.
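The patent does not specify how the linear prediction analysis unit 4 computes the LPC coefficients. The autocorrelation method solved with the Levinson-Durbin recursion is a standard choice, and the sketch below assumes it. It uses the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p and omits the analysis windowing and lag windowing that practical codecs apply.

```python
def autocorrelation(x, order):
    """r[k] = sum_n x[n] * x[n-k] for k = 0..order (no windowing in this sketch)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Returns (a, e): coefficient list [1, a1, ..., ap] and the final prediction error.
    """
    if r[0] <= 0.0:
        raise ValueError("r[0] must be positive")
    a = [1.0] + [0.0] * order
    e = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / e                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        e *= 1.0 - k * k                  # prediction error shrinks at each order
    return a, e


# An AR(1)-like autocorrelation r[k] = 0.9**k gives a1 = -0.9 and a2 = 0.
a, e = levinson_durbin([1.0, 0.9, 0.81], 2)
print(a, e)
```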

The LSP codebook 5 records a plurality of LSP coefficients, each assigned an index.
The LSP quantization/inverse quantization unit 6 identifies the LSP coefficients in the LSP codebook 5 that are closest to the line spectrum pairs LSP. It extracts their index from the LSP codebook 5 and outputs that index to a multiplexing unit 21 as spectrum information.
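Below is a minimal sketch of the nearest-entry search performed by the LSP quantization/inverse quantization unit 6. It assumes plain squared Euclidean distance and a full search over a small hypothetical codebook. Practical codecs such as AMR use predictive split quantization and weighted distances, which this example does not model.

```python
def quantize_lsp(lsp, codebook):
    """Full search: index and entry with the smallest squared Euclidean distance.
    (Unweighted and non-predictive: a simplification of practical LSP quantizers.)"""
    def dist(entry):
        return sum((x - y) ** 2 for x, y in zip(lsp, entry))
    index = min(range(len(codebook)), key=lambda i: dist(codebook[i]))
    return index, codebook[index]


codebook = [[0.30, 0.60, 0.90],      # hypothetical 3-dimensional LSP codebook
            [0.25, 0.50, 1.00],
            [0.40, 0.80, 1.20]]
print(quantize_lsp([0.26, 0.52, 0.98], codebook))  # (1, [0.25, 0.5, 1.0])
```

Only the index needs to be transmitted as spectrum information. The decoder recovers the entry from its own copy of the codebook.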

Under the direction of a minimum error search unit 20, a driving excitation generation unit 7 combines the excitation signals output from an adaptive codebook 8 and a fixed excitation codebook 9 to generate a plurality of driving excitations. It works in subframe units. A subframe is one of the sections obtained by dividing a frame along the time axis; in the AMR system a frame is divided into four, so one subframe is 5 msec. The driving excitation generation unit 7 constitutes driving excitation generation means.
The adaptive codebook 8 stores excitation signals (adaptive codebook component signals), which are driving excitations generated in the past. As directed by the minimum error search unit 20, it outputs the adaptive codebook component signals of the current subframe and the next subframe.

In the AMR system, for example, the fixed excitation codebook 9 is a pulse excitation codebook (algebraic codebook) composed of a plurality of unit pulses. As directed by the minimum error search unit 20, it outputs the excitation signals (pulse excitation codebook component signals) of the current subframe and the next subframe. A pitch enhancement filter may be placed after the pulse excitation codebook. It emphasizes the pitch frequency component according to the pitch period of the excitation signal output from the adaptive codebook 8, which improves the sound quality of vowel segments.
A gain codebook 10 stores a plurality of gains and outputs the gain specified by the minimum error search unit 20.

A gain multiplier 11 multiplies the excitation signal (adaptive codebook component signal) of the current subframe and the next subframe, output from the adaptive codebook 8, by the gain output from the gain codebook 10.
A gain multiplier 12 multiplies the excitation signal (pulse excitation codebook component signal) of the current subframe and the next subframe, output from the fixed excitation codebook 9, by the gain output from the gain codebook 10.
An adder 13 adds the gain-scaled adaptive codebook component signal from the gain multiplier 11 to the gain-scaled pulse excitation codebook component signal from the gain multiplier 12.

A synthesized speech generation unit 14 forms the synthesis filter 16 according to the LSP coefficients identified by the LSP quantization/inverse quantization unit 6. It passes the driving excitations generated by the driving excitation generation unit 7 through the synthesis filter 16 to generate a plurality of synthesized speech signals. The synthesized speech generation unit 14 constitutes synthesized speech generation means.
An LSP/LPC conversion unit 15 converts the LSP coefficients identified by the LSP quantization/inverse quantization unit 6 into linear prediction coefficients LPC and forms the synthesis filter 16 from them.
The synthesis filter 16 receives the driving excitation generated by the driving excitation generation unit 7 and outputs synthesized speech to a subtractor 18.
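The LSP/LPC conversion unit 15 turns quantized LSPs back into synthesis filter coefficients, but the patent does not spell out the procedure. The sketch below uses the textbook sum/difference-polynomial reconstruction of the kind described in G.729. It assumes:
- an even prediction order p;
- LSPs given as ascending angular frequencies in radians;
- odd-numbered LSPs belonging to the symmetric polynomial F1(z).

```python
import math


def poly_mul(p, q):
    """Product of two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out


def lsp_to_lpc(lsp):
    """Convert ascending LSP frequencies (radians, even order p) to [1, a1, ..., ap].

    F1(z) = (1 + z^-1) * prod over 1st, 3rd, ... LSPs of (1 - 2cos(w) z^-1 + z^-2)
    F2(z) = (1 - z^-1) * prod over 2nd, 4th, ... LSPs of (1 - 2cos(w) z^-1 + z^-2)
    A(z)  = (F1(z) + F2(z)) / 2
    """
    p = len(lsp)
    if p % 2:
        raise ValueError("this sketch assumes an even prediction order")
    f1, f2 = [1.0, 1.0], [1.0, -1.0]
    for i, w in enumerate(lsp):
        quad = [1.0, -2.0 * math.cos(w), 1.0]
        if i % 2 == 0:
            f1 = poly_mul(f1, quad)
        else:
            f2 = poly_mul(f2, quad)
    # Both have degree p + 1; their z^-(p+1) terms (+1 and -1) cancel in the sum.
    return [(x + y) / 2.0 for x, y in zip(f1, f2)][:p + 1]


# Equally spaced LSPs w_i = i*pi/(p+1) correspond to a flat spectrum, A(z) = 1.
flat = lsp_to_lpc([i * math.pi / 11 for i in range(1, 11)])
print(flat)
```

The same routine would serve the decoder-side LSP/LPC conversion, since both sides must build identical synthesis filters.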

A reference vector assembly buffer 17 extracts the speech signals of the current subframe and the next subframe from the frame-unit speech signal output from the pre-processing unit 2. It outputs them to the subtractor 18 as the reference signal used to search for the synthesized speech with the smallest quantization error.
The subtractor 18 calculates the difference (quantization error) between each synthesized speech signal generated by the synthesized speech generation unit 14 and the reference signal output from the reference vector assembly buffer 17.

A perceptual weighting filter 19 applies perceptual weighting to the quantization error calculated by the subtractor 18.
The minimum error search unit 20 controls the excitation signals output from the adaptive codebook 8 and the fixed excitation codebook 9, and the gain output from the gain codebook 10, so as to reduce the perceptually weighted quantization error output from the perceptual weighting filter 19. It searches for the coding parameters (pitch information, gain information, pulse information) of the synthesized speech signal with the smallest quantization error.
The reference vector assembly buffer 17, subtractor 18, perceptual weighting filter 19, and minimum error search unit 20 constitute parameter search means.
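The patent does not give the form of the perceptual weighting filter 19. A common choice in CELP codecs is W(z) = A(z/γ1)/A(z/γ2). The sketch below assumes that form; its default values γ1 = 0.9 and γ2 = 0.6 are illustrative assumptions, not values from the patent.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): a_k -> a_k * gamma**k."""
    return [c * gamma ** k for k, c in enumerate(a)]


def perceptual_weighting(x, a, gamma1=0.9, gamma2=0.6):
    """Filter x with W(z) = A(z/gamma1) / A(z/gamma2), zero initial state.

    The form of W(z) and the default gammas are assumptions of this sketch.
    """
    num = bandwidth_expand(a, gamma1)
    den = bandwidth_expand(a, gamma2)     # den[0] == 1 because a[0] == 1
    y = []
    for n in range(len(x)):
        acc = sum(num[k] * x[n - k] for k in range(len(num)) if n - k >= 0)
        acc -= sum(den[k] * y[n - k] for k in range(1, len(den)) if n - k >= 0)
        y.append(acc)
    return y


a = [1.0, -0.8, 0.3]                      # hypothetical A(z)
# With gamma2 = 0 the denominator is 1, so W(z) reduces to the FIR A(z/gamma1).
print(perceptual_weighting([1.0, 0.0, 0.0, 0.0], a, gamma1=0.5, gamma2=0.0))
# [1.0, -0.4, 0.075, 0.0]
```

With γ1 = γ2 the numerator and denominator cancel and the filter passes its input unchanged. This gives a quick check that the recursion is wired correctly.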

The multiplexing unit 21 multiplexes four items and transmits the multiplexed signal to the speech decoding apparatus:
- the spectrum information output from the LSP quantization/inverse quantization unit 6;
- the pitch information of the excitation signal output from the adaptive codebook 8;
- the pulse information of the excitation signal output from the fixed excitation codebook 9;
- the gain information indicating the gain output from the gain codebook 10.
The last three are the values used when the synthesized speech with the smallest quantization error is obtained.

FIG. 2 is a block diagram showing a speech decoding apparatus according to Embodiment 1 of the present invention. In the figure, a demultiplexing unit 31 receives the multiplexed signal transmitted from the speech encoding apparatus and separates it. It outputs the spectrum information, pitch information, pulse information, and gain information. The demultiplexing unit 31 constitutes information receiving means.
An adaptive codebook 32 corresponds to the adaptive codebook 8 of the speech encoding apparatus of FIG. 1. It outputs the excitation signal (adaptive codebook component signal) of the current subframe that corresponds to the pitch information output from the demultiplexing unit 31.

A fixed excitation codebook 33 corresponds to the fixed excitation codebook 9 of the speech encoding apparatus of FIG. 1. It outputs the excitation signal (pulse excitation codebook component signal) of the current subframe that corresponds to the pulse information output from the demultiplexing unit 31.
A gain codebook 34 corresponds to the gain codebook 10 of the speech encoding apparatus of FIG. 1. It outputs the gains corresponding to the gain information from the demultiplexing unit 31 to gain multipliers 35 and 36.

The gain multiplier 35 multiplies the excitation signal (adaptive codebook component signal) of the current subframe, output from the adaptive codebook 32, by the gain output from the gain codebook 34.
The gain multiplier 36 multiplies the excitation signal (pulse excitation codebook component signal) of the current subframe, output from the fixed excitation codebook 33, by the gain output from the gain codebook 34.
The adder 37 adds two gain-scaled signals to generate the driving excitation: the excitation signal (adaptive codebook component signal) from the gain multiplier 35 and the excitation signal (pulse excitation codebook component signal) from the gain multiplier 36.
The adaptive codebook 32, fixed excitation codebook 33, gain codebook 34, gain multipliers 35 and 36, and adder 37 constitute driving excitation generation means.

The LSP codebook 38 corresponds to the LSP codebook 5 in the speech encoding apparatus of FIG. 1. It outputs the LSP coefficients corresponding to the spectrum information from the demultiplexing unit 31.
The LSP/LPC converter 39 converts the LSP coefficients output from the LSP codebook 38 into linear prediction coefficients (LPC). It then forms the synthesis filter 40 from those coefficients.
The synthesis filter 40 receives the driving excitation output from the adder 37 and outputs synthesized speech to the post filter 41.
The post filter 41 processes the synthesized speech output from the synthesis filter 40 to improve its quality.
The LSP codebook 38, LSP/LPC converter 39, synthesis filter 40, and post filter 41 constitute synthesized speech decoding means.

Next, the operation will be described.
When the buffer 1 stores the input speech signal, the preprocessing unit 2 divides it into frames of fixed length, as shown in FIG. 3. The frame length is typically about 5 ms to 50 ms; the AMR system uses 20 ms. The preprocessing unit 2 then applies band-limiting filtering to remove, from each frame, unnecessary low-frequency components that are not to be encoded.
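The framing step above can be sketched as follows. This minimal illustration assumes an 8 kHz sampling rate, so a 20 ms frame is 160 samples, matching the 160-sample figure given for LPC analysis below. It also assumes four 40-sample subframes. The first-order DC-blocking high-pass filter and its coefficient `r` are stand-ins chosen for brevity. They are not the band-limiting filter of the device or of AMR.

```python
FRAME_LEN = 160                   # 20 ms at an assumed 8 kHz sampling rate
SUBFRAMES = 4                     # four subframes per frame, as in the text
SUB_LEN = FRAME_LEN // SUBFRAMES  # 40 samples = 5 ms


def highpass(signal, r=0.99):
    """Illustrative DC blocker y[n] = x[n] - x[n-1] + r*y[n-1] (stand-in filter)."""
    out, x_prev, y_prev = [], 0.0, 0.0
    for x in signal:
        y = x - x_prev + r * y_prev
        out.append(y)
        x_prev, y_prev = x, y
    return out


def split_frames(signal):
    """Yield each complete frame as a list of SUBFRAMES subframes."""
    for start in range(0, len(signal) - FRAME_LEN + 1, FRAME_LEN):
        frame = signal[start:start + FRAME_LEN]
        yield [frame[i * SUB_LEN:(i + 1) * SUB_LEN] for i in range(SUBFRAMES)]
```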

When the spectrum analysis unit 3 receives the preprocessed speech signal from the preprocessing unit 2, it performs linear prediction analysis (LPC analysis) on each frame. This yields the linear prediction coefficients (LPC) used as the coefficients of the synthesis filter 16.
The synthesis filter 16 is defined by the following equation (1). In it, Â(z) gives the coefficients of the synthesis filter 16, and â_i is the i-th quantized linear prediction coefficient.
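Equation (1) is not reproduced in this text. The sketch below assumes the all-pole form commonly used in ACELP codecs such as AMR: H(z) = 1/Â(z), with Â(z) = 1 + Σ_{i=1}^{p} â_i z^{-i}. The function name and state layout are illustrative choices. The filter memory is returned so consecutive subframes can be filtered without discontinuities.

```python
def synthesis_filter(excitation, a_hat, memory=None):
    """Filter `excitation` through 1/A(z), A(z) = 1 + sum a_hat[i-1] z^-i (assumed form).

    `memory` holds past outputs, most recent first: memory[i] = s[n-1-i].
    Returns (output samples, updated memory).
    """
    p = len(a_hat)
    past = list(memory) if memory is not None else [0.0] * p
    out = []
    for e in excitation:
        s = e - sum(a_hat[i] * past[i] for i in range(p))  # s[n] = e[n] - sum a_i s[n-i]
        out.append(s)
        past = ([s] + past)[:p]
    return out, past
```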

The processing of the spectrum analysis unit 3 is described in detail below.
The linear prediction analysis unit 4 of the spectrum analysis unit 3 performs linear prediction analysis once per frame, for example by the autocorrelation method with a 30 ms asymmetric window.
That is, every 160 samples (20 ms), it calculates the autocorrelation coefficients of the windowed speech. It then converts them into linear prediction coefficients using the Levinson algorithm.
The linear prediction analysis unit 4 also converts the linear prediction coefficients into line spectrum pairs (LSP). This lets the downstream LSP quantization/inverse quantization unit 6 quantize and interpolate them efficiently.
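The autocorrelation and Levinson step can be sketched as below. A symmetric Hamming window stands in for the 30 ms asymmetric window, whose exact shape is not given here. The sign convention A(z) = 1 + Σ a_i z^{-i} is assumed. A 10th-order predictor is typical for narrowband codecs such as AMR.

```python
import math


def windowed_autocorrelation(frame, order):
    """Autocorrelation r[0..order] of a Hamming-windowed frame (window is a stand-in)."""
    n = len(frame)
    w = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    x = [s * wi for s, wi in zip(frame, w)]
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Return (a[1..order], prediction error) for A(z) = 1 + sum a_i z^-i."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err                  # reflection coefficient of order m
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k              # residual energy shrinks at each order
    return a[1:], err
```

For a first-order autoregressive correlation r = [1, 0.9, 0.81], the recursion returns a ≈ [-0.9, 0] with residual energy 0.19.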

When the LSP quantization/inverse quantization unit 6 receives the line spectrum pair (LSP) from the linear prediction analysis unit 4, it performs vector quantization with reference to the LSP codebook 5.
That is, it identifies the LSP coefficient in the LSP codebook 5 that is closest to the input LSP and extracts that coefficient's index.
It then outputs this index to the multiplexing unit 21 as the spectrum information.
The LSP coefficients quantized here are used in the synthesis filter 16 of the fourth subframe. The linear prediction coefficients for the first, second, and third subframes are obtained by interpolation. This interpolation is between the LSP coefficients quantized in the immediately preceding frame and the quantized LSP of the current frame.
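A simplified sketch of the LSP quantization and subframe interpolation follows. It uses a plain full-search vector quantizer, whereas practical codecs usually split the LSP vector into sub-vectors. The interpolation weights (0.75, 0.5, 0.25, 0) applied to the previous frame's quantized LSP are an assumption resembling several AMR modes. They are not values given in this text.

```python
def nearest_codeword(lsp, codebook):
    """Full-search VQ: return (index, entry) of the codeword with minimum squared error."""
    def sq_dist(entry):
        return sum((a - b) ** 2 for a, b in zip(lsp, entry))
    index = min(range(len(codebook)), key=lambda j: sq_dist(codebook[j]))
    return index, codebook[index]


def interpolate_subframe_lsps(prev_q, curr_q, prev_weights=(0.75, 0.5, 0.25, 0.0)):
    """LSP vectors for subframes 1..4; weight w goes to the previous frame's LSP.

    With the last weight 0.0, subframe 4 uses the current quantized LSP unchanged.
    """
    return [[w * p + (1.0 - w) * c for p, c in zip(prev_q, curr_q)]
            for w in prev_weights]
```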

Under the direction of the minimum error search unit 20, the driving excitation generation unit 7 combines the subframe-unit excitation signals from the adaptive codebook 8 and the fixed excitation codebook 9. This produces a plurality of candidate driving excitations.
The processing of the driving excitation generation unit 7 is described in detail below.

The adaptive codebook 8 of the driving excitation generation unit 7 stores past driving excitations as excitation signals (adaptive codebook component signals). It outputs the one indicated by the minimum error search unit 20.
The fixed excitation codebook 9 uses a pulse excitation codebook (algebraic codebook) composed of a plurality of unit pulses. It outputs the excitation signal (pulse excitation codebook component signal) indicated by the minimum error search unit 20.
However, these two codebooks output more than the excitation signal (adaptive codebook component signal, pulse excitation codebook component signal) of the current subframe. They also output the excitation signal extending into the next subframe.

A conventional speech encoding apparatus works subframe by subframe. When the frame to be encoded is the M-th frame, it outputs only the first subframe's excitation signal (adaptive codebook component signal, pulse excitation codebook component signal) in the first subframe. The driving excitation is generated from that signal alone. As described above, however, filters such as a dispersion filter or an HPF may be placed after the pulse excitation codebook that forms the fixed excitation codebook (see FIGS. 35 and 36). Their impulse response gives the fixed excitation codebook's excitation signal a wave-packet shape with a certain time length (see FIG. 37). In that case the optimal codebook entry may not be selected if only one error is evaluated: the error between the current subframe's synthesized speech and its reference signal.

Therefore, in Embodiment 1, the adaptive codebook 8 and the fixed excitation codebook 9 output two excitation signals (adaptive codebook component signal, pulse excitation codebook component signal). One belongs to the current subframe; the other extends into the next subframe.
Specifically, in the first subframe of the M-th frame, they output the excitation signal of the first subframe. They also output the excitation signal spanning from the first subframe into the second subframe.
Similarly, in the N-th subframe (N = 2, 3) of the M-th frame, they output the excitation signals of the N-th and (N+1)-th subframes.
In the fourth subframe of the M-th frame, they output the excitation signal of that subframe. They also output the excitation signal of the first subframe of the (M+1)-th frame.
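The sketch below produces both codebook contributions over a span of two subframes. It makes three assumptions. The pitch lag is an integer, with no fractional-lag interpolation. The adaptive vector is periodically extended when the lag is shorter than the span. The post-codebook filter is represented by a hypothetical impulse response `dispersion_ir`. Pulses stay inside the current subframe, but the filtered tail is kept rather than truncated.

```python
def adaptive_vector(past_exc, lag, span):
    """Adaptive-codebook vector of `span` samples for an integer pitch lag.

    When lag < span, the past excitation is repeated periodically so that the
    vector can cover both the current and the next subframe.
    """
    if not 0 < lag <= len(past_exc):
        raise ValueError("lag must be in 1..len(past_exc)")
    buf = list(past_exc)
    base = len(buf)
    for n in range(span):
        buf.append(buf[base + n - lag])
    return buf[base:]


def pulse_vector(positions, signs, sub_len, dispersion_ir, span):
    """Pulse-codebook vector filtered by a hypothetical dispersion impulse response.

    Pulse positions are restricted to the current subframe (0 <= pos < sub_len),
    but the filtered response is kept for `span` samples, so its tail reaches
    into the next subframe.
    """
    c = [0.0] * span
    for pos, sign in zip(positions, signs):
        if not 0 <= pos < sub_len:
            raise ValueError("pulse outside the current subframe")
        for j, h in enumerate(dispersion_ir):
            if pos + j < span:
                c[pos + j] += sign * h
    return c
```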

The gain codebook 10 stores a plurality of gains and outputs the gain indicated by the minimum error search unit 20.
The gain multiplier 11 multiplies the excitation signal (adaptive codebook component signal) from the adaptive codebook 8, covering the current and next subframes, by the gain from the gain codebook 10.
The gain multiplier 12 multiplies the excitation signal (pulse excitation codebook component signal) from the fixed excitation codebook 9, covering the current and next subframes, by the gain from the gain codebook 10.
The adder 13 adds two gain-scaled signals to generate the driving excitation: the excitation signal (adaptive codebook component signal) from the gain multiplier 11 and the excitation signal (pulse excitation codebook component signal) from the gain multiplier 12.
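The gain stage reduces to a per-sample weighted sum over the whole evaluation span. In this sketch, `g_p` and `g_c` stand for the gains applied by the gain multipliers 11 and 12. How the gain codebook 10 stores or quantizes them is not modeled.

```python
def driving_excitation(v, c, g_p, g_c):
    """u[n] = g_p * v[n] + g_c * c[n] over the whole span (current + next subframe)."""
    if len(v) != len(c):
        raise ValueError("component vectors must have equal length")
    return [g_p * a + g_c * b for a, b in zip(v, c)]
```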

When the LSP quantization/inverse quantization unit 6 identifies the LSP coefficients, the LSP/LPC conversion unit 15 of the synthesized speech generation unit 14 converts them into linear prediction coefficients (LPC). It then forms the synthesis filter 16 from them.
When the driving excitation generation unit 7 generates the candidate driving excitations, the synthesis filter 16 of the synthesized speech generation unit 14 filters each of them into a synthesized speech signal. It outputs these signals to the subtracter 18.

When the reference vector assembly buffer 17 receives the frame-unit speech signal from the preprocessing unit 2, it extracts the speech signals of the current and next subframes. It outputs them to the subtracter 18 as the reference signal for finding the synthesized speech with the minimum quantization error.
Specifically, in the first subframe of the M-th frame, it outputs the speech signals of the first and second subframes as the reference signal, as shown in FIG. 3.
Similarly, in the N-th subframe (N = 2, 3) of the M-th frame, it outputs the speech signals of the N-th and (N+1)-th subframes as the reference signal.
In the fourth subframe of the M-th frame, the reference signal consists of two parts. These are the speech signal of that subframe and the speech signal of the first subframe of the (M+1)-th frame.
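The sketch below shows reference assembly, assuming the signal is available as one flat list with 0-based frame and subframe indices. For the fourth subframe the window reaches into frame M+1. Under this scheme, the encoder therefore needs one subframe of look-ahead samples before it can process the last subframe of a frame.

```python
def reference_span(signal, frame_len, n_sub, frame_idx, sub_idx):
    """Reference samples for subframe `sub_idx` of frame `frame_idx` (0-based).

    The window is the current subframe followed by the next one; for the last
    subframe of a frame, the next one is the first subframe of the next frame.
    """
    sub_len = frame_len // n_sub
    start = frame_idx * frame_len + sub_idx * sub_len
    end = start + 2 * sub_len
    if end > len(signal):
        raise ValueError("look-ahead samples are not yet available")
    return signal[start:end]
```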

The subtracter 18 calculates the difference (quantization error) between each synthesized speech signal from the synthesized speech generation unit 14 and the reference signal from the reference vector assembly buffer 17. It outputs these quantization errors to the perceptual weighting filter 19.
The perceptual weighting filter 19 applies perceptual weighting to the quantization error signal from the subtracter 18, which would otherwise be evaluated with a flat frequency response. This improves the perceived quality of the coded speech.
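The text does not give the form of the perceptual weighting filter 19. A common choice in CELP codecs is W(z) = A(z/γ1)/A(z/γ2). The sketch below assumes that form with A(z) = 1 + Σ a_i z^{-i}, γ1 = 0.9, and γ2 = 0.6 as illustrative values, and a zero initial state.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma) for A(z) = 1 + sum a[i-1] z^-i."""
    return [ai * gamma ** (i + 1) for i, ai in enumerate(a)]


def weighting_filter(err, a, g1=0.9, g2=0.6):
    """Apply the assumed W(z) = A(z/g1) / A(z/g2) to an error sequence (zero state)."""
    num = bandwidth_expand(a, g1)
    den = bandwidth_expand(a, g2)
    p = len(a)
    x_hist = [0.0] * p   # past inputs, most recent first
    y_hist = [0.0] * p   # past outputs, most recent first
    out = []
    for x in err:
        y = (x + sum(num[i] * x_hist[i] for i in range(p))
             - sum(den[i] * y_hist[i] for i in range(p)))
        out.append(y)
        x_hist = ([x] + x_hist)[:p]
        y_hist = ([y] + y_hist)[:p]
    return out
```

With g1 = g2 the numerator and denominator cancel, and the filter passes its input unchanged.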

The minimum error search unit 20 controls three outputs so that the quantization error from the perceptual weighting filter 19 becomes small: the excitation signals of the adaptive codebook 8 and the fixed excitation codebook 9, and the gain of the gain codebook 10. In this way it finds the synthesized speech from the synthesis filter 16 that has the minimum quantization error.
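The selection rule can be written as a generic search. In the sketch below, `synthesize` and `weight` are hypothetical callables standing in for the synthesis filter 16 and the perceptual weighting filter 19. Real ACELP encoders do not search exhaustively. They search the adaptive codebook, the fixed codebook, and the gains in sequence, using fast criteria. This sketch shows only the criterion being minimized.

```python
def min_error_search(reference, candidates, synthesize, weight=None):
    """Return (index, error energy) of the candidate excitation whose synthesized
    output is closest to `reference` in the (optionally weighted) squared-error sense."""
    best_idx, best_err = -1, float("inf")
    for k, exc in enumerate(candidates):
        synth = synthesize(exc)
        if len(synth) != len(reference):
            raise ValueError("synthesized span must match the reference span")
        err_sig = [r - s for r, s in zip(reference, synth)]
        if weight is not None:
            err_sig = weight(err_sig)
        err = sum(e * e for e in err_sig)
        if err < best_err:
            best_idx, best_err = k, err
    return best_idx, best_err
```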

When the minimum error search unit 20 has found the synthesized speech with the minimum quantization error, the multiplexing unit 21 multiplexes four items. The first is the spectrum information from the LSP quantization/inverse quantization unit 6. The other three are taken from the minimum-error candidate: the pitch information of the current-subframe excitation signal from the adaptive codebook 8, the pulse information of the current-subframe excitation signal from the fixed excitation codebook 9, and the gain information indicating the gain from the gain codebook 10. The multiplexed signal is transmitted to the speech decoding apparatus.

The demultiplexing unit 31 of the speech decoding apparatus receives the multiplexed signal transmitted from the speech encoding apparatus and separates it. It outputs the spectrum information, pitch information, pulse information, and gain information.
When the adaptive codebook 32 receives the pitch information from the demultiplexing unit 31, it outputs the corresponding excitation signal (adaptive codebook component signal) of the current subframe.

When the fixed excitation codebook 33 receives the pulse information from the demultiplexing unit 31, it outputs the corresponding excitation signal (pulse excitation codebook component signal) of the current subframe.
When the gain codebook 34 receives the gain information from the demultiplexing unit 31, it outputs the corresponding gains to the gain multipliers 35 and 36.

The gain multiplier 35 multiplies the excitation signal (adaptive codebook component signal) of the current subframe, output from the adaptive codebook 32, by the gain from the gain codebook 34.
The gain multiplier 36 multiplies the excitation signal (pulse excitation codebook component signal) of the current subframe, output from the fixed excitation codebook 33, by the gain from the gain codebook 34.
The adder 37 adds two gain-scaled signals to generate the driving excitation: the excitation signal (adaptive codebook component signal) from the gain multiplier 35 and the excitation signal (pulse excitation codebook component signal) from the gain multiplier 36.

When the LSP codebook 38 receives the spectrum information from the demultiplexing unit 31, it outputs the corresponding LSP coefficients.
When the LSP/LPC conversion unit 39 receives these coefficients, it converts them into linear prediction coefficients (LPC). It then forms the synthesis filter 40 from them.
The synthesis filter 40 receives the driving excitation from the adder 37 and decodes the synthesized speech from it.
When the post filter 41 receives the synthesized speech from the synthesis filter 40, it processes the speech to improve its quality.
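One decoder subframe can be sketched as follows. The codebook vectors `v` and `c` cover only the current subframe. The synthesis filter is assumed to be the all-pole 1/Â(z) with Â(z) = 1 + Σ â_i z^{-i}. The post filter 41 is omitted.

```python
def decode_subframe(v, c, g_p, g_c, a_hat, syn_state):
    """One decoder subframe: u = g_p*v + g_c*c, then speech = u filtered by 1/A(z).

    `syn_state` holds past synthesis outputs, most recent first.
    Returns (excitation, speech, new_syn_state).
    """
    if len(v) != len(c) or len(syn_state) != len(a_hat):
        raise ValueError("inconsistent vector or state lengths")
    u = [g_p * a + g_c * b for a, b in zip(v, c)]    # adder 37 output
    p = len(a_hat)
    past = list(syn_state)
    speech = []
    for e in u:                                       # synthesis filter 40
        s = e - sum(a_hat[i] * past[i] for i in range(p))
        speech.append(s)
        past = ([s] + past)[:p]
    return u, speech, past
```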

As is clear from the above, Embodiment 1 makes two changes. The adaptive codebook 8 and the fixed excitation codebook 9 output the next subframe's excitation signal as well as the current one. The quantization error evaluation includes the next subframe's speech signal as well as the current one. Consequently, the optimal codebook entries can still be selected when a filter placed after the pulse excitation codebook (which forms the fixed excitation codebook 9) gives its excitation signal a wave-packet shape with a certain time length.

That is, in Embodiment 1, as shown in FIG. 4, the pulse positions may be restricted to the current subframe. Even so, the quantization error is evaluated as far as the impulse response extends into the next subframe (section C in the example of FIG. 4). Pulses (wave packet positions) that should not be selected are therefore no longer selected by mistake.
Moreover, because the coding bit rate is finite, each wrongly selected pulse occupies a slot that the correct pulse could have used (an opportunity loss). Such losses also become less frequent.
Since the wave packets can thus be placed more appropriately, speech quality can be improved at the same bit rate.
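A toy calculation shows the benefit of the longer evaluation window. The sketch measures the fraction of a filtered pulse's energy that falls inside the evaluation window; the 4-tap flat impulse response used in the test is hypothetical. With a one-subframe window, a pulse near the end of the subframe is mostly invisible to the error measure. A two-subframe window keeps the whole packet in view, provided the impulse response is no longer than one subframe plus one sample.

```python
def energy_fraction_inside(pos, ir, window_len):
    """Fraction of a filtered unit pulse's energy lying inside samples [0, window_len)."""
    total = sum(h * h for h in ir)
    inside = sum(h * h for j, h in enumerate(ir) if pos + j < window_len)
    return inside / total
```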

Embodiment 2.
FIG. 5 is a block diagram showing a speech encoding apparatus according to Embodiment 2 of the present invention. In the figure, the same reference numerals as in FIG. 1 denote the same or corresponding parts, and their description is omitted.
Like the minimum error search unit 20 in FIG. 1, the weighted minimum error search unit 22 searches the synthesized speech signals for the coding parameters with the minimum quantization error. However, it weights the two subframes differently, as shown in FIG. 6. The quantization error on the current subframe's speech signal is weighted more heavily than that on the next subframe's speech signal. This reduces the relative influence of the next subframe in the evaluation.
The reference vector assembly buffer 17, subtracter 18, perceptual weighting filter 19, and weighted minimum error search unit 22 constitute parameter search means.

Next, the operation will be described.
Apart from the weighted minimum error search unit 22, the operation is the same as in Embodiment 1. Only the operation of the weighted minimum error search unit 22 is therefore described.

For example, the coefficients of the synthesis filter 16 usually differ between the N-th subframe and the (N+1)-th subframe.
If the change in these coefficients between the two subframes (that is, the change in spectral envelope information) is small, the method of Embodiment 1 works without difficulty. If the change is large, for example when the speech signal is noise-like, problems arise. Evaluating the error across the (N+1)-th subframe with the N-th subframe's envelope gives values that differ from the true ones. A pulse that should be selected may then be missed, or a pulse that should not be selected may be chosen by mistake.
Embodiment 2 therefore aims to absorb this change by relatively reducing the weight of the quantization error on the next subframe's speech signal.

When searching for the synthesized speech with the minimum quantization error, the weighted minimum error search unit 22 weights the quantization error on the current subframe's speech signal more heavily than that on the next subframe's. The next subframe thus carries relatively less weight in the evaluation.
The specific processing of the weighted minimum error search unit 22 is described below.

In CELP coding, the error is usually evaluated with R(k) in the following equation (2) to reduce computation. R(k) is equivalent to the normalized inner product of the reference signal vector and the synthesized signal vector.
The weighted minimum error search unit 22 performs the codebook search (the search for the synthesized speech with the minimum quantization error) by finding the value of k that maximizes R(k).

In equation (2), x is the target vector for the codebook search, c_k is the fixed codebook vector for index k, and H is the impulse response matrix of the synthesis filter 16.
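Equation (2) is not reproduced here. The sketch assumes the standard ACELP criterion R(k) = (xᵀHc_k)² / (c_kᵀHᵀHc_k), which matches the description as a normalized inner product. Maximizing it is equivalent to minimizing ‖x − gHc_k‖² with the optimal gain g. H is applied implicitly as lower-triangular convolution with an impulse response `h`. The function names are illustrative.

```python
def apply_h(c, h):
    """y = H c, with H the lower-triangular Toeplitz matrix of impulse response h."""
    return [sum(h[j] * c[i - j] for j in range(min(i + 1, len(h))))
            for i in range(len(c))]


def criterion(x, c, h):
    """R(k) = (x^T H c)^2 / (c^T H^T H c) for one codevector c (assumed form of eq. (2))."""
    y = apply_h(c, h)
    corr = sum(a * b for a, b in zip(x, y))   # numerator term C(k) = x^T H c_k
    energy = sum(b * b for b in y)            # denominator c_k^T H^T H c_k
    return corr * corr / energy if energy > 0.0 else 0.0


def search_fixed_codebook(x, codebook, h):
    """Index k that maximizes R(k)."""
    return max(range(len(codebook)), key=lambda k: criterion(x, codebook[k], h))
```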

Here, the target vector x can be separated into x_current for the N-th subframe section and x_next for the (N+1)-th subframe section. The first M elements of x belong to the N-th subframe, and the remaining P elements belong to the (N+1)-th subframe:

$$\mathbf{x}_{\mathrm{current}} = (x_0, x_1, \ldots, x_{M-2}, x_{M-1}, 0, 0, \ldots, 0) \qquad (4)$$

$$\mathbf{x}_{\mathrm{next}} = (0, 0, \ldots, 0, 0, x_M, x_{M+1}, \ldots, x_{M+P-2}, x_{M+P-1}) \qquad (5)$$
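C(k) = xᵀHc_k is linear in x. Splitting x as in equations (4) and (5) therefore splits C(k) exactly into a current-subframe part and a next-subframe part. The sketch below checks this numerically. `m` is the number of samples M in the current subframe, the vector `y` stands in for Hc_k, and the helper names are illustrative.

```python
def split_target(x, m):
    """Return (x_current, x_next) as in equations (4) and (5)."""
    x_current = list(x[:m]) + [0.0] * (len(x) - m)
    x_next = [0.0] * m + list(x[m:])
    return x_current, x_next


def dot(u, w):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, w))
```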

Writing the elements of x as in equations (4) and (5) splits the numerator component C(k) of R(k) into a sum of two terms. C_current(k) in equation (6) below is the quantization error evaluation parameter of the N-th subframe. C_next(k) in equation (7) below is that of the (N+1)-th subframe.

To give the quantization error from the next subframe onward relatively less weight, C_next(k) is multiplied by a weighting coefficient α, as shown in equation (8) below. The condition on α is given by equation (9) below.
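Equations (6) to (9) are not reproduced here, so the sketch below rests on assumptions. It takes C_current(k) = x_currentᵀHc_k and C_next(k) = x_nextᵀHc_k, with a weighted numerator C_current(k) + α·C_next(k). The energy term in the denominator is left unweighted because the text describes scaling only C_next(k). The test uses α in [0, 1] purely for illustration; the actual condition is the one in equation (9). With α = 1 the criterion reduces to the unweighted R(k).

```python
def weighted_criterion(x, c, h, m, alpha):
    """(C_current + alpha * C_next)^2 / ||H c||^2 (assumed form of eq. (8)).

    x and c have the same length n; the first m samples form the current subframe.
    """
    n = len(x)
    y = [sum(h[j] * c[i - j] for j in range(min(i + 1, len(h))))
         for i in range(n)]                          # y = H c
    c_current = sum(x[i] * y[i] for i in range(m))   # current-subframe part
    c_next = sum(x[i] * y[i] for i in range(m, n))   # next-subframe part
    energy = sum(v * v for v in y)
    if energy == 0.0:
        return 0.0
    return (c_current + alpha * c_next) ** 2 / energy
```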

As is clear from the above, in Embodiment 2 the weighted minimum error search unit 22 weights the current subframe's quantization error more heavily than the next subframe's during the search. Good speech quality can therefore be maintained even when the spectral envelope changes abruptly. The same effect can also be obtained by measuring how much the synthesis filter 16 coefficients vary and choosing α adaptively.

Embodiment 3.
Embodiments 1 and 2 above include the next subframe's speech signal, as well as the current one, in the quantization error evaluation. When the next subframe belongs to a different frame, however, its speech signal may be excluded from the evaluation.
FIG. 7 is an explanatory diagram showing the waveform of the input speech, its frames and subframes, the reference signal, and so on.
The error evaluation sections for the first to third subframes are the same as in Embodiment 1 (see FIG. 3). This embodiment differs in one respect: the error evaluation section for the fourth subframe is limited to the fourth subframe itself.

Embodiment 2 described how to avoid erroneous pulse selection caused by changes in the spectral envelope. Within a frame, however, each subframe's synthesis filter coefficients are obtained by linear interpolation between the frame-level coefficients. Their variation between subframes is therefore expected to be relatively gradual.
Across frames, by contrast, each frame's coefficients come from a fresh linear prediction analysis of the speech signal, so the variation can be larger.
Embodiment 3 therefore does not evaluate the quantization error over sections that cross a frame boundary. This gives greater robustness, so good speech quality can be maintained even when the spectral envelope changes abruptly.
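In terms of the evaluation window, Embodiment 3 changes only the last subframe of each frame. The sketch below uses 0-based subframe indices; the function name is illustrative.

```python
def eval_span(sub_idx, n_sub, sub_len, cross_frames):
    """Length of the error-evaluation window for subframe `sub_idx` (0-based).

    cross_frames=True mirrors Embodiments 1 and 2 (always two subframes);
    cross_frames=False mirrors Embodiment 3 (the last subframe stands alone).
    """
    if not cross_frames and sub_idx == n_sub - 1:
        return sub_len
    return 2 * sub_len
```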

Embodiment 4.
FIG. 8 is a block diagram showing a speech encoding apparatus according to Embodiment 4 of the present invention. In the figure, the same reference numerals as in FIG. 1 denote the same or corresponding parts, and their description is omitted.
The carry-over component memory 23 stores the impulse response component of the fixed excitation codebook 9 from the previous subframe.
The adder 24 adds the impulse response component of the fixed excitation codebook 9 from the previous subframe, held in the carry-over component memory 23, to the excitation signal of the current subframe output from the fixed excitation codebook 9.
The carry-over component memory 23 and the adder 24 constitute an adding means.

FIG. 9 is a block diagram showing a speech decoding apparatus according to Embodiment 4 of the present invention. In the figure, the same reference numerals as in FIG. 2 denote the same or corresponding parts, and their description is omitted.
The carry-over component memory 42 stores the impulse response component of the fixed excitation codebook 33 from the previous subframe.
The adder 43 adds the impulse response component of the fixed excitation codebook 33 from the previous subframe, held in the carry-over component memory 42, to the excitation signal of the current subframe output from the fixed excitation codebook 33.
The carry-over component memory 42 and the adder 43 constitute an adding means.

Next, the operation will be described.
Embodiment 4 differs from Embodiment 1 in the following respect. The impulse response component of the fixed excitation codebook (9 or 33) from the previous subframe is added to the excitation signal of the current subframe output from that codebook.

In the speech encoding apparatus, the carry-over component memory 23 stores the impulse response component of the fixed excitation codebook 9 from the previous subframe. The adder 24 adds this stored component to the excitation signal of the current subframe output from the fixed excitation codebook 9.
Consider, for example, pulse A selected in the (N−1)th subframe (see FIG. 10). Part of its impulse response is carried over into the Nth subframe (section A in FIG. 10, i.e., C_previous). This component is known as soon as pulse A is selected while the (N−1)th subframe is being processed, so the carry-over component memory 23 stores the signal C_previous at that point.
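C_previous can be pictured as the part of the filtered pulse train that spills past the end of the subframe. The sketch below makes three assumptions:

- the fixed excitation codebook's internal filter is an FIR filter with impulse response `h`;
- pulses are given as (position, amplitude) pairs;
- the tail is no longer than one subframe.

The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_codevector(pulses: Sequence[Tuple[int, float]], h: Sequence[float],
                     subframe_len: int) -> Tuple[List[float], List[float]]:
    """Convolve the selected pulses with h and split the result into the
    samples inside the current subframe and the tail (C_previous) that
    falls into the next subframe."""
    if len(h) - 1 > subframe_len:
        raise ValueError("sketch assumes the tail fits in one subframe")
    full = [0.0] * (subframe_len + len(h) - 1)
    for pos, amp in pulses:
        if not 0 <= pos < subframe_len:
            raise ValueError("pulse position outside the subframe")
        for i, coeff in enumerate(h):
            full[pos + i] += amp * coeff
    return full[:subframe_len], full[subframe_len:]


def add_carry(excitation: Sequence[float], carry: Sequence[float]) -> List[float]:
    """Adder 24: add the stored C_previous to the start of the new
    subframe's fixed codebook excitation."""
    out = list(excitation)
    for i, c in enumerate(carry):
        out[i] += c
    return out
```

As an example, take a pulse at the last sample of a 4-sample subframe with h = [1.0, 0.5, 0.25]. The in-subframe part is [0, 0, 0, 1.0] and C_previous is [0.5, 0.25]. `add_carry` then adds C_previous to the first two samples of the next subframe's excitation.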

When processing moves from the (N−1)th subframe to the Nth subframe, the adder 24 reads the signal C_previous from the carry-over component memory 23 and adds it to the excitation signal of the current subframe output from the fixed excitation codebook 9. The sum is used in the fixed excitation codebook search.
In Embodiment 4, the minimum error search unit 20 searches for the value of k that maximizes the error criterion R(k) of Equation (2). During this search, the numerator term C(k) of R(k) is given by the following Equation (10).

Here, C_previous is the impulse response component of the fixed excitation codebook 9 in the (N−1)th subframe, that is, the signal C_previous stored in the carry-over component memory 23.

In the speech decoding apparatus, pulse information indicating the position of pulse A is received from the speech encoding apparatus while the (N−1)th subframe is processed. The fixed excitation codebook 33 has the same filter as the fixed excitation codebook 9 of the encoder (that is, the same impulse response of the internal filter). The impulse response component to be carried over into the Nth subframe is therefore obtained automatically, and the carry-over component memory 42 stores it as the signal C_previous.
When processing moves from the (N−1)th subframe to the Nth subframe, the adder 43 reads the signal C_previous from the carry-over component memory 42. It adds this signal to the excitation signal of the current subframe output from the fixed excitation codebook 33 and outputs the sum to the gain multiplier 36.
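Both sides share the internal filter, so the decoder can regenerate the same carry-over from the received pulse positions alone. Below is a minimal sketch of the fixed excitation path through the carry-over memory 42 and the adder 43. It assumes the internal filter is an FIR filter with impulse response `h` whose tail is no longer than one subframe, and that pulses arrive as (position, amplitude) pairs. `decode_fixed_excitation` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def decode_fixed_excitation(pulse_sets: Sequence[Sequence[Tuple[int, float]]],
                            h: Sequence[float],
                            subframe_len: int) -> List[List[float]]:
    """Rebuild the fixed codebook excitation subframe by subframe,
    carrying the filter tail of each subframe into the next one."""
    if len(h) - 1 > subframe_len:
        raise ValueError("sketch assumes the tail fits in one subframe")
    carry = [0.0] * (len(h) - 1)  # carry-over memory 42, empty at start
    subframes = []
    for pulses in pulse_sets:
        full = [0.0] * (subframe_len + len(h) - 1)
        for pos, amp in pulses:  # fixed excitation codebook 33
            for i, coeff in enumerate(h):
                full[pos + i] += amp * coeff
        for i, c in enumerate(carry):  # adder 43
            full[i] += c
        subframes.append(full[:subframe_len])  # passed on to gain multiplier 36
        carry = full[subframe_len:]  # new C_previous for the next subframe
    return subframes
```

With h = [1.0, 0.5, 0.25], the tail of a pulse at the end of one subframe reappears at the start of the next. It is summed with whatever pulses that next subframe contains.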

As is apparent from the above, Embodiment 4 adds the impulse response component of the fixed excitation codebook 9 or 33 from the previous subframe to the excitation signal of the current subframe output from that codebook. Speech quality can therefore be improved further than in Embodiment 1.
Take pulse D in FIG. 10 as an example. A pulse in section A that would not otherwise need to be selected becomes less likely to be chosen, and the pulse that should be selected (for example, pulse C) is selected instead. This contributes to better sound quality.
The speech decoding apparatus carries over and reproduces the impulse response component corresponding to section A. As a result, the wave packet shape is not broken, the convolved filter does not lose its effect, and sound quality improves.

Embodiment 5.
FIG. 11 is a block diagram showing part of a speech encoding apparatus according to Embodiment 5 of the present invention. In the figure, the same reference numerals as in FIG. 8 denote the same or corresponding parts, and their description is omitted.
The gain multiplier 25 takes the signal C_previous stored in the carry-over component memory 23, which is the impulse response component of the fixed excitation codebook 9 from the previous subframe. It multiplies this signal by one of two gains: a fixed gain of at least 0 and less than 1, or a gain that decreases gradually from 1 to 0 over time. The gain-weighted signal C_previous is output to the adder 24. The gain multiplier 25 forms part of the adding means.

FIG. 12 is a block diagram showing part of a speech decoding apparatus according to Embodiment 5 of the present invention. In the figure, the same reference numerals as in FIG. 9 denote the same or corresponding parts, and their description is omitted.
The gain multiplier 44 takes the signal C_previous stored in the carry-over component memory 42, which is the impulse response component of the fixed excitation codebook 33 from the previous subframe. It multiplies this signal by one of two gains: a fixed gain of at least 0 and less than 1, or a gain that decreases gradually from 1 to 0 over time. The gain-weighted signal C_previous is output to the adder 43. The gain multiplier 44 forms part of the adding means.

In Embodiment 4 described above, the impulse response component of the fixed excitation codebook 9 or 33 from the previous subframe is added as-is to the excitation signal of the current subframe. Alternatively, the gain multipliers 25 and 44 of the speech encoding and speech decoding apparatuses may multiply the signal C_previous by a gain of at least 0 and less than 1. This reduces the weight given to the carry-over component from the previous subframe.
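The two gain options for the gain multipliers 25 and 44 might be realized as follows. The patent does not give the shape of the time-decaying gain. Purely for illustration, this sketch assumes a linear ramp from 1 toward 0 across the carried-over samples. `weight_carry` is a hypothetical name.

```python
from typing import List, Optional, Sequence


def weight_carry(carry: Sequence[float],
                 fixed_gain: Optional[float] = None) -> List[float]:
    """Scale the carry-over signal C_previous before it reaches the adder.

    fixed_gain given: multiply every sample by a constant in [0, 1).
    fixed_gain omitted: multiply sample i by 1 - i/n, a gain that starts
    at 1 and falls linearly toward 0 (assumed decay shape).
    """
    if fixed_gain is not None:
        if not 0.0 <= fixed_gain < 1.0:
            raise ValueError("fixed gain must satisfy 0 <= g < 1")
        return [fixed_gain * c for c in carry]
    n = len(carry)
    return [c * (1.0 - i / n) for i, c in enumerate(carry)]
```

A linear ramp is the simplest shape that starts at 1 and falls toward 0. An exponential decay would serve the same purpose.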

This configuration makes the coder robust against abrupt changes in the spectral envelope.
It also helps when the synthesized speech is decoded, because the carry-over component from the previous subframe carries relatively little weight. If the previous subframe is lost, the carry-over component of the current subframe is lost with it, but the resulting damage is smaller. Tolerance to frame loss is therefore improved.

Embodiment 6.
FIG. 13 is a block diagram showing part of a speech encoding apparatus according to Embodiment 6 of the present invention. In the figure, the same reference numerals as in FIG. 8 denote the same or corresponding parts, and their description is omitted.
The switch 26 controls whether the signal C_previous reaches the adder 24. This signal is the impulse response component of the fixed excitation codebook 9 from the previous subframe, stored in the carry-over component memory 23. If the previous subframe belongs to the same frame as the current subframe, the switch 26 supplies C_previous to the adder 24. If the previous subframe belongs to a different frame, it does not. The switch 26 forms part of the adding means.

FIG. 14 is a block diagram showing part of a speech decoding apparatus according to Embodiment 6 of the present invention. In the figure, the same reference numerals as in FIG. 9 denote the same or corresponding parts, and their description is omitted.
The switch 45 controls whether the signal C_previous reaches the adder 43. This signal is the impulse response component of the fixed excitation codebook 33 from the previous subframe, stored in the carry-over component memory 42. If the previous subframe belongs to the same frame as the current subframe, the switch 45 supplies C_previous to the adder 43. If the previous subframe belongs to a different frame, it does not. The switch 45 forms part of the adding means.

In Embodiment 4 described above, the impulse response component of the fixed excitation codebook 9 or 33 from the previous subframe is always added to the excitation signal of the current subframe. In Embodiment 6, by contrast, this component is not added when the previous subframe belongs to a frame different from that of the current subframe.

That is, the switches 26 and 45 of the speech encoding and speech decoding apparatuses operate as shown in FIG. 15. When the current subframe is the second, third, or fourth subframe, the switches are turned ON. They then supply the signal C_previous, the impulse response component of the fixed excitation codebook 9 or 33 from the previous subframe, to the adder 24 or 43. When the current subframe is the first subframe, which immediately follows the previous frame, the switches are turned OFF and the signal C_previous is not supplied to the adders.

In Embodiment 6, the switches 26 and 45 are turned ON and OFF to control whether the signal C_previous reaches the adders 24 and 43. Instead of the switches 26 and 45, the gain multipliers 25 and 44 of FIGS. 11 and 12 may be used, with their gains switched between 0 and 1.
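The switching rule of FIG. 15 reduces to a function of the subframe's position within its frame. This sketch makes the following assumptions:

- the subframe counter is a running 0-based index;
- each frame has four subframes;
- the result is the 0/1 gain of the gain-multiplier variant, so it can drive either a switch or a gain multiplier.

The name `carry_gate` is hypothetical.

```python
from typing import List, Sequence

SUBFRAMES_PER_FRAME = 4  # assumption: four subframes per frame


def carry_gate(subframe_counter: int) -> float:
    """Return 1.0 (switch ON) when the previous subframe lies in the same
    frame, and 0.0 (switch OFF) for the first subframe of each frame."""
    return 0.0 if subframe_counter % SUBFRAMES_PER_FRAME == 0 else 1.0


def gated_carry(subframe_counter: int, c_previous: Sequence[float]) -> List[float]:
    """Apply the gate to C_previous before it reaches the adder."""
    g = carry_gate(subframe_counter)
    return [g * c for c in c_previous]
```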

Embodiment 7.
FIG. 16 is a block diagram showing a speech encoding apparatus according to Embodiment 7 of the present invention. In the figure, the same reference numerals as in FIG. 8 denote the same or corresponding parts, and their description is omitted.
The minimum error search unit with mode selection 27 works like the minimum error search unit 20 of FIGS. 1 and 8. It controls the excitation signals output from the adaptive codebook 8 and the fixed excitation codebook 9, and the gains output from the gain codebook 10, so as to reduce the quantization error output from the perceptual weighting filter 19. It searches the synthesized speech signals output from the synthesis filter 16 for the coding parameters of the one with the smallest quantization error. In addition, it uses the result of the quantization error evaluation to decide whether to add the impulse response component of the fixed excitation codebook 9 from the previous subframe. It outputs mode information indicating this decision to the switch 28 and the multiplexing unit 29. The minimum error search unit with mode selection 27 constitutes a parameter search means.

When the mode information output from the minimum error search unit with mode selection 27 indicates addition, the switch 28 is turned ON. It then supplies the signal C_previous to the adder 24. This signal is the impulse response component of the fixed excitation codebook 9 from the previous subframe, stored in the carry-over component memory 23. The switch 28 forms part of the adding means.
Like the multiplexing unit 21 of FIGS. 1 and 8, the multiplexing unit 29 multiplexes the spectrum information, pitch information, pulse information, and gain information. It also multiplexes the mode information output from the minimum error search unit with mode selection 27 and transmits the multiplexed signal to the speech decoding apparatus.
FIG. 18 is a flowchart outlining the processing of the speech encoding apparatus according to Embodiment 7 of the present invention.

FIG. 17 is a block diagram showing a speech decoding apparatus according to Embodiment 7 of the present invention. In the figure, the same reference numerals as in FIG. 9 denote the same or corresponding parts, and their description is omitted.
The demultiplexing unit 46 receives the multiplexed signal transmitted from the speech encoding apparatus and separates it. It outputs the spectrum information, pitch information, pulse information, gain information, and mode information. The demultiplexing unit 46 constitutes an information receiving means.
The switch 47 acts on the mode information output from the demultiplexing unit 46. If the mode information indicates that the impulse response component of the fixed excitation codebook from the previous subframe is to be added, the switch 47 supplies the signal C_previous to the adder 43. This signal is the impulse response component of the fixed excitation codebook 33 from the previous subframe, stored in the carry-over component memory 42. The switch 47 forms part of the adding means.

In Embodiment 4 described above, the impulse response component of the fixed excitation codebook 9 or 33 from the previous subframe is always added to the excitation signal of the current subframe. Alternatively, whether to add it may be decided from the result of the quantization error evaluation.
Specifically, this is done as follows.

When searching for the synthesized speech with the smallest quantization error, the minimum error search unit with mode selection 27 of the speech encoding apparatus computes the signal C_previous (step ST1). This is the impulse response component of the fixed excitation codebook 9 from the previous subframe.
The unit then compares the signal C_previous with a predetermined threshold (step ST2). If C_previous is larger than the threshold, the carry-over component contributes substantially. The unit therefore decides to add the impulse response component and outputs mode information to that effect to the switch 28 and the multiplexing unit 29 (step ST3).
In this case, the numerator term C(k) of the error criterion R(k) is computed using Equation (10) (step ST4).

If the signal C_previous is smaller than the threshold, the carry-over component contributes little and may instead degrade quality. The minimum error search unit with mode selection 27 therefore decides not to add the impulse response component. It outputs mode information to that effect to the switch 28 and the multiplexing unit 29 (step ST5).
In this case, C_current(k) of Equation (6) is used as the numerator term C(k) of the error criterion R(k); this term corresponds to the quantization error evaluation parameter of the current subframe alone. C_current(k) is computed (step ST6).
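Steps ST1 to ST6 can be sketched as a single decision. The patent says only that C_previous is compared with a threshold. This sketch adds three assumptions:

- the size of C_previous is measured by its energy (sum of squares);
- a value exactly equal to the threshold counts as "do not add";
- Equations (10) and (6), which are not reproduced here, are passed in as callables.

All names are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Numerator = Callable[[int], float]


def choose_numerator(c_previous: Sequence[float], threshold: float,
                     numerator_with_carry: Numerator,
                     numerator_current_only: Numerator) -> Tuple[bool, Numerator]:
    """Return (mode_flag, numerator) for the fixed codebook search.

    mode_flag is the mode information sent to switch 28 and the
    multiplexing unit 29: True means add the carry-over component.
    """
    energy = sum(x * x for x in c_previous)  # ST1/ST2 (energy is assumed)
    if energy > threshold:
        return True, numerator_with_carry    # ST3/ST4: Equation (10)
    return False, numerator_current_only     # ST5/ST6: Equation (6)
```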

The switch 28 of the speech encoding apparatus follows the mode information output from the minimum error search unit with mode selection 27. If the mode information indicates addition, the switch 28 is turned ON and supplies the signal C_previous to the adder 24. This signal is the impulse response component of the fixed excitation codebook 9 from the previous subframe, stored in the carry-over component memory 23. If the mode information indicates no addition, the switch 28 is turned OFF and the signal C_previous is not supplied to the adder 24.

Next, the minimum error search unit with mode selection 27 computes the error criterion R(k) for every k using Equation (2) (step ST7). It then searches for the value of k that maximizes R(k) (step ST8).
Having found that value of k, it outputs the corresponding indices (pitch information, pulse information, and gain information) to the multiplexing unit 29 (step ST9).
The multiplexing unit 29 multiplexes the spectrum information, pitch information, pulse information, and gain information. It also multiplexes the mode information output from the minimum error search unit with mode selection 27, then transmits the multiplexed signal to the speech decoding apparatus.
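Steps ST7 and ST8 are an exhaustive search for the maximum. Equation (2) is not reproduced in this section. This sketch assumes the usual ACELP form R(k) = C(k)^2 / E(k), where E(k) is the energy of the filtered code vector. `search_best_index` is a hypothetical name.

```python
from typing import Callable


def search_best_index(num_candidates: int,
                      numerator: Callable[[int], float],
                      energy: Callable[[int], float]) -> int:
    """Return the k that maximizes R(k) = C(k)**2 / E(k) (assumed form)."""
    best_k, best_r = -1, float("-inf")
    for k in range(num_candidates):       # ST7: evaluate R(k) for all k
        e = energy(k)
        if e <= 0.0:                      # skip degenerate candidates
            continue
        r = numerator(k) ** 2 / e
        if r > best_r:                    # ST8: keep the maximum
            best_k, best_r = k, r
    return best_k
```

In step ST9, the index returned here and the mode flag chosen earlier would be handed to the multiplexing unit 29.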

The demultiplexing unit 46 of the speech decoding apparatus receives the multiplexed signal transmitted from the speech encoding apparatus and separates it. It outputs the spectrum information, pitch information, pulse information, gain information, and mode information.
The switch 47 follows the mode information output from the demultiplexing unit 46. If the mode information indicates that the impulse response component of the fixed excitation codebook from the previous subframe is to be added, the switch 47 is turned ON. It then supplies the signal C_previous to the adder 43; this signal is the impulse response component of the fixed excitation codebook 33 from the previous subframe, stored in the carry-over component memory 42. If the mode information indicates no addition, the switch 47 is turned OFF and the signal C_previous is not supplied to the adder 43.

In Embodiment 7, whether to add the carry-over component is decided by evaluating how much it contributes. Transmitting the mode information increases the bit rate slightly. In return, the quantization error of the fixed excitation codebooks 9 and 33 can be reduced effectively, which improves speech quality.

Embodiment 8.
FIG. 19 is a block diagram showing part of a speech encoding apparatus according to Embodiment 8 of the present invention. In the figure, the same reference numerals as in FIG. 8 denote the same or corresponding parts, and their description is omitted.
The pitch stability evaluation unit 61 monitors how the pitch period varies over past subframes. It adjusts the filter coefficient of the pitch enhancement filter 63 according to that variation.
The filter coefficient of the pitch enhancement filter 62 is set according to the pitch period of the excitation signal output from the adaptive codebook 8. The filter emphasizes the pitch frequency component of the excitation signal output from the fixed excitation codebook 9.
The filter coefficient of the pitch enhancement filter 63 is set by the pitch stability evaluation unit 61. The filter emphasizes the pitch frequency component of the impulse response component of the fixed excitation codebook 9 from the previous subframe, stored in the carry-over component memory 23.
The pitch stability evaluation unit 61 and the pitch enhancement filters 62 and 63 form part of the adding means.

FIG. 20 is a block diagram showing part of a speech decoding apparatus according to Embodiment 8 of the present invention. In the figure, the same reference numerals as in FIG. 9 denote the same or corresponding parts, and their description is omitted.
The pitch stability evaluation unit 81 monitors how the pitch period varies over past subframes. It adjusts the filter coefficient of the pitch enhancement filter 83 according to that variation.
The filter coefficient of the pitch enhancement filter 82 is set according to the pitch period of the excitation signal output from the adaptive codebook 32. The filter emphasizes the pitch frequency component of the excitation signal output from the fixed excitation codebook 33.
The filter coefficient of the pitch enhancement filter 83 is set by the pitch stability evaluation unit 81. The filter emphasizes the pitch frequency component of the impulse response component of the fixed excitation codebook 33 from the previous subframe, stored in the carry-over component memory 42.
The pitch stability evaluation unit 81 and the pitch enhancement filters 82 and 83 form part of the adding means.

FIG. 21 is a block diagram showing an example of the internal configuration of the pitch enhancement filter 63 of FIG. 19. In this example, the pitch enhancement filter 63 consists of an adder 63a, an inverse filter 63b, and a filter coefficient multiplier 63c.
FIG. 21 shows the pitch enhancement filter 63 of FIG. 19, but the pitch enhancement filter 83 of FIG. 20 has the same internal configuration.
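A pitch enhancement filter built from an adder, a feedback path, and a coefficient multiplier is often realized as the recursive comb y[n] = x[n] + β·y[n−T], i.e. 1 / (1 − β z^(−T)). Here T is the pitch lag and β is the coefficient set by unit 61 or 81. This section does not state whether block 63b acts exactly as the T-sample delay path assumed here, so treat the sketch as illustrative. `pitch_enhance` is a hypothetical name.

```python
from typing import List, Sequence


def pitch_enhance(x: Sequence[float], lag: int, beta: float) -> List[float]:
    """Recursive comb filter y[n] = x[n] + beta * y[n - lag].

    Samples before index `lag` have no earlier output to feed back,
    so they pass through unchanged (zero initial state assumed).
    """
    if lag <= 0:
        raise ValueError("pitch lag must be positive")
    y = [float(v) for v in x]
    for n in range(lag, len(y)):
        y[n] += beta * y[n - lag]   # feedback from one pitch period earlier
    return y
```

Because the output is fed back, a single impulse produces a decaying train of echoes spaced one pitch period apart. This reinforces the pitch harmonics.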

Next, the operation will be described.
The pitch stability evaluation units 61 and 81 of the speech encoding and speech decoding apparatuses monitor how the pitch period varies over past subframes.
That is, the pitch stability evaluation units 61 and 81 refer to the pitch information output from the adaptive codebook 8 or the demultiplexing unit 31. From it they compute, for example, the mean and variance of the pitch period over the past N subframes.

When the variance of the pitch period is small (the pitch is stable), the pitch stability evaluation units 61 and 81 increase the filter coefficients of the pitch enhancement filters 63 and 83 to strengthen their effect.
When the variance is large (the pitch is unstable), they reduce the filter coefficients of the pitch enhancement filters 63 and 83 to weaken or disable their effect.
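The stability rule can be sketched as a mapping from pitch-lag variance to the coefficient β. The following are all illustrative assumptions:

- the history length;
- the two variance thresholds;
- the maximum coefficient;
- the linear blend between the thresholds;
- returning 0 until the history is full.

The class name is hypothetical.

```python
from collections import deque
from statistics import pvariance


class PitchStabilityEvaluator:
    """Sketch of units 61/81: map recent pitch-lag variance to beta."""

    def __init__(self, history: int = 4, beta_max: float = 0.8,
                 var_low: float = 1.0, var_high: float = 16.0) -> None:
        self.lags = deque(maxlen=history)  # pitch lags of the past subframes
        self.beta_max = beta_max
        self.var_low = var_low
        self.var_high = var_high

    def update(self, pitch_lag: float) -> float:
        """Record this subframe's lag and return the coefficient to use."""
        self.lags.append(pitch_lag)
        if len(self.lags) < self.lags.maxlen:
            return 0.0                      # not enough history yet
        var = pvariance(list(self.lags))
        if var <= self.var_low:             # stable pitch: strong emphasis
            return self.beta_max
        if var >= self.var_high:            # unstable pitch: disable
            return 0.0
        # in between: weaken the emphasis linearly as the variance grows
        frac = (self.var_high - var) / (self.var_high - self.var_low)
        return self.beta_max * frac
```

The decoder sees the same pitch lags. Running the same evaluator on both sides would therefore keep the coefficients in step without any side information.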

In Embodiment 8, the pitch stability evaluation units 61 and 81 monitor how the pitch period varies over past subframes and adjust the filter coefficients of the pitch enhancement filters 63 and 83 accordingly. The speech encoding apparatus therefore does not need to transmit mode information to the speech decoding apparatus. The quantization error of the fixed excitation codebook can be reduced, and speech quality improved, while keeping the transmission bit rate unchanged.

Embodiment 9.
FIG. 22 is a block diagram showing a speech encoding apparatus according to Embodiment 9 of the present invention. In the figure, the same reference numerals as in FIGS. 16 and 19 denote the same or corresponding parts, and their description is omitted.
The minimum error search unit with mode selection 64 works like the minimum error search unit 20 of FIGS. 1 and 8. It controls the excitation signals output from the adaptive codebook 8 and the fixed excitation codebook 9, and the gains output from the gain codebook 10, so as to reduce the quantization error output from the perceptual weighting filter 19. It searches the synthesized speech signals output from the synthesis filter 16 for the coding parameters of the one with the smallest quantization error. While controlling these excitation signals, it also controls the filter coefficient of the pitch enhancement filter 65. The minimum error search unit with mode selection 64 constitutes a parameter search means.
The filter coefficient of the pitch enhancement filter 65 is set by the minimum error search unit with mode selection 64. The filter emphasizes the pitch frequency component of the impulse response component of the fixed excitation codebook 9 from the previous subframe, stored in the carry-over component memory 23. The pitch enhancement filter 65 forms part of the adding means.

FIG. 23 is a block diagram showing a speech decoding apparatus according to Embodiment 9 of the present invention. In the figure, the same reference numerals as those in FIGS. 17 and 20 denote the same or corresponding parts, and their description is omitted.
The filter coefficient of the pitch enhancement filter 84 is set according to the gain index indicated by the mode information output from the demultiplexing unit 46. The filter emphasizes the pitch frequency component of the impulse response component of the fixed excitation codebook 33 in the previous subframe, which is stored in the carry component storage memory 42. The pitch enhancement filter 84 constitutes addition means.

FIG. 24 is a block diagram showing an example of the internal configuration of the pitch enhancement filter 65 of FIG. 22. In the example of FIG. 24, the pitch enhancement filter 65 consists of an adder 65a, an inverse filter 65b, a gain codebook 65c and a filter coefficient multiplier 65d.
Although FIG. 24 shows the internal configuration of the pitch enhancement filter 65 of FIG. 22, the pitch enhancement filter 84 of FIG. 23 has the same internal configuration.
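FIG. 24 names the components of the pitch enhancement filter but does not describe how they are connected. As a rough sketch only, the code below assumes a one-tap FIR pitch comb filter, y[n] = x[n] + g * x[n - T]. The coefficient g is looked up by index in a small table that stands in for the gain codebook 65c. The inverse filter 65b is omitted, and `COEF_TABLE` and `pitch_enhance` are hypothetical names.

```python
# Hypothetical coefficient table standing in for the gain codebook 65c.
COEF_TABLE = [0.0, 0.25, 0.5, 0.75]


def pitch_enhance(carry, pitch_lag, gain_index, table=COEF_TABLE):
    """Emphasize the pitch-periodic part of a carried-over component.

    Implements y[n] = x[n] + g * x[n - T] with g = table[gain_index] and
    T = pitch_lag (T >= 1); samples before the start of the buffer count as zero.
    """
    g = table[gain_index]
    out = []
    for n, x in enumerate(carry):
        delayed = carry[n - pitch_lag] if n >= pitch_lag else 0.0
        out.append(x + g * delayed)
    return out
```

Index 0 leaves the input unchanged, which corresponds to disabling the filter. Under this assumption, the decoder's filter 84 would use the same table indexed by the received gain index.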

In Embodiment 1, the minimum error search unit 20 controls the excitation signals output from the adaptive codebook 8 and the fixed excitation codebook 9, and the gains output from the gain codebook 10, so as to reduce the quantization error output from the perceptual weighting filter 19. It searches for the synthesized speech with the minimum quantization error among the plurality of synthesized speech signals output from the synthesis filter 16. The search may additionally control the filter coefficient of the pitch enhancement filter 65 while looking for the synthesized speech with the minimum quantization error.
Specifically, this works as follows.

The minimum error search unit with mode selection function 64 generates a plurality of driving excitations by combining, as appropriate, the excitation signal output from the adaptive codebook 8, the excitation signal output from the fixed excitation codebook 9, the gain output from the gain codebook 10 and the filter coefficient of the pitch enhancement filter 65.
The driving excitations generated by the driving excitation generation unit 5 are passed through the synthesis filter 16. The minimum error search unit with mode selection function 64 then searches the resulting synthesized speech signals for the one with the minimum quantization error.
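The joint search described above can be illustrated with a brute-force loop over candidate excitations and filter-coefficient indices. This is a simplified sketch under stated assumptions:

- All vectors are taken to be already in the perceptually weighted synthesis domain.
- Gains are folded into the candidates.
- `enhanced_carries[i]` is the hypothetical output of the pitch enhancement filter for coefficient index i.
- The error is a plain squared error.

A practical encoder would use a much more efficient search.

```python
import itertools


def joint_search(target, candidates, enhanced_carries):
    """Pick the candidate k and coefficient index minimizing the squared error.

    target: weighted target signal of the subframe (assumed precomputed)
    candidates: filtered candidate excitations, gains folded in
    enhanced_carries: filtered carried-over component, one per coefficient index
    Returns (k, gain_index, error).
    """
    best = None
    pairs = itertools.product(enumerate(candidates), enumerate(enhanced_carries))
    for (k, cand), (idx, carry) in pairs:
        err = sum((t - (c + p)) ** 2 for t, c, p in zip(target, cand, carry))
        if best is None or err < best[2]:
            best = (k, idx, err)
    return best
```

The winning `gain_index` is what would be sent as mode information, so the decoder can reproduce the same filter coefficient.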

When the minimum error search unit with mode selection function 64 finds the synthesized speech with the minimum quantization error, it outputs mode information to the multiplexing unit 29. This mode information indicates the gain index (the index corresponding to the filter coefficient) of the pitch enhancement filter 65 that was used to obtain that synthesized speech.
The multiplexing unit 29 multiplexes the spectrum information, pitch information, pulse information and gain information together with the mode information output from the minimum error search unit with mode selection function 64. It then transmits the multiplexed signal to the speech decoding apparatus.

The demultiplexing unit 46 of the speech decoding apparatus receives the multiplexed signal transmitted from the speech encoding apparatus and demultiplexes it. It outputs the spectrum information, pitch information, pulse information, gain information and mode information.
On receiving the mode information output from the demultiplexing unit 46, the pitch enhancement filter 84 sets its filter coefficient according to the gain index indicated by that mode information. It then emphasizes the pitch frequency component of the impulse response component of the fixed excitation codebook 33 in the previous subframe, which is stored in the carry component storage memory 42.

According to Embodiment 9, the synthesized speech with the minimum quantization error is searched for while the filter coefficient of the pitch enhancement filter 65 is controlled. Transmitting mode information that indicates the gain index of the pitch enhancement filter 65 raises the transmission bit rate slightly. In return, the quantization error of the fixed excitation codebooks 9 and 33 can be reduced effectively, improving speech quality.

Embodiment 10.
FIG. 25 is a block diagram showing a speech encoding apparatus according to Embodiment 10 of the present invention. In the figure, the same reference numerals as those in FIG. 16 denote the same or corresponding parts, and their description is omitted.
Like the minimum error search unit 20 of FIGS. 1 and 8, the minimum error search unit with mode selection function 66 controls the excitation signals output from the adaptive codebook 8 and the fixed excitation codebook 9, and the gains output from the gain codebook 10, so as to reduce the quantization error output from the perceptual weighting filter 19. It searches, among the plurality of synthesized speech signals output from the synthesis filter 16, for the encoding parameters of the synthesized speech with the minimum quantization error. In addition, it determines whether to include the input speech of the next subframe, as well as that of the current subframe, in the quantization error evaluation. When the next subframe's input speech is included, it outputs mode information to the switch 28 and the multiplexing unit 29 indicating that the impulse response component of the fixed excitation codebook in the previous subframe is to be added. The minimum error search unit with mode selection function 66 constitutes parameter search means.

FIG. 26 is a flowchart outlining the processing of the speech encoding apparatus according to Embodiment 10 of the present invention.
The speech decoding apparatus according to Embodiment 10 has, for example, the same configuration as the speech decoding apparatus of FIG. 17.

Next, the operation will be described.
The minimum error search unit with mode selection function 66 determines whether the previous subframe was in "carry-over mode" (step ST11).
That is, it determines whether, in the previous subframe, the switch 28 was ON. If it was, the signal C_previous (the impulse response component of the fixed excitation codebook 9 stored in the carry component storage memory 23) was supplied to the adder 24.

When the previous subframe was in carry-over mode, the minimum error search unit with mode selection function 66 calculates the signal C_previous stored in the carry component storage memory 23 using equation (11) above (step ST12). It adds that signal to the numerator component C(k) of the error evaluation R(k) (see equation (10)).
When the previous subframe was not in carry-over mode, it sets C_previous = 0 (step ST13) and does not add C_previous to the numerator component C(k) of the error evaluation R(k).

Next, the minimum error search unit with mode selection function 66 calculates C_next(k) of equation (7), which corresponds to the quantization error evaluation parameter of the next subframe (step ST14). It compares C_next(k) with a predetermined threshold (step ST15).
When C_next(k) is larger than the threshold, it proceeds as in Embodiment 1 and includes the input speech of the next subframe, as well as that of the current subframe, in the quantization error evaluation. It also outputs mode information to the switch 28 and the multiplexing unit 29 indicating that the impulse response component of the fixed excitation codebook in the previous subframe is to be added (step ST16).
It then calculates the numerator component C(k) of the error evaluation R(k) as shown in equation (12) below (step ST17).
C(k) = C_previous + C_current(k) + C_next(k)   (12)

When C_next(k) is smaller than the threshold, the minimum error search unit with mode selection function 66 does not extend the quantization error evaluation and restricts it to the input speech of the current subframe. It outputs mode information to the switch 28 and the multiplexing unit 29 indicating that the impulse response component of the fixed excitation codebook in the previous subframe is not to be added (step ST18).
It then calculates the numerator component C(k) of the error evaluation R(k) using equation (10) (step ST19).
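Steps ST11 to ST19 can be sketched as a per-candidate computation of the numerator C(k). This sketch rests on several assumptions:

- C_previous, C_current(k) and C_next(k) are already available as scalar correlation terms.
- Equation (10), which is not reproduced here, has the form C(k) = C_previous + C_current(k).
- The threshold test is applied per candidate k.
- A value exactly equal to the threshold is treated as "not larger".

The function name is hypothetical.

```python
def numerators_with_modes(c_previous, c_current, c_next, threshold, previous_was_carry):
    """Return (C(k) for every candidate k, carry-over mode implied by each k).

    c_current[k], c_next[k]: precomputed C_current(k) and C_next(k) (eq. (7))
    c_previous: C_previous from eq. (11); used only if the previous subframe carried over
    """
    base = c_previous if previous_was_carry else 0.0    # steps ST11 to ST13
    numerators, modes = [], []
    for cur, nxt in zip(c_current, c_next):
        if nxt > threshold:                              # ST15: extend the evaluation
            numerators.append(base + cur + nxt)          # ST16, ST17: eq. (12)
            modes.append(True)                           # "add" (carry-over) mode
        else:
            numerators.append(base + cur)                # ST18, ST19: assumed eq. (10)
            modes.append(False)
    return numerators, modes
```

Once the best k has been found, its entry in `modes` is the mode information that is sent to the switch 28 and the multiplexing unit 29.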

If the mode information output from the minimum error search unit with mode selection function 66 indicates addition, the switch 28 of the speech encoding apparatus turns ON. It then supplies the adder 24 with the signal C_previous, which is the impulse response component of the fixed excitation codebook 9 in the previous subframe stored in the carry component storage memory 23. If the mode information indicates no addition, the switch 28 turns OFF and does not supply C_previous to the adder 24.

Next, the minimum error search unit with mode selection function 66 calculates the error evaluation R(k) using equation (2) (step ST20). It searches all k for the value that maximizes R(k) (step ST21).
Having found that value of k, it outputs the corresponding indices (pitch information, pulse information and gain information) to the multiplexing unit 29 (step ST22).
The multiplexing unit 29 multiplexes the spectrum information, pitch information, pulse information and gain information together with the mode information output from the minimum error search unit with mode selection function 66. It then transmits the multiplexed signal to the speech decoding apparatus.
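Steps ST20 and ST21 reduce to an argmax over candidates. Equation (2) is not reproduced in this section, so the sketch below assumes the criterion commonly used in ACELP searches: R(k) = C(k)^2 / E(k), where E(k) is the energy of the filtered candidate. The function name is hypothetical.

```python
def search_best_k(numerators, energies):
    """Return (k, R(k)) for the k that maximizes R(k) = C(k)**2 / E(k).

    The form of R(k) is an assumption standing in for equation (2).
    """
    best_k, best_r = None, float("-inf")
    for k, (c, e) in enumerate(zip(numerators, energies)):
        if e <= 0.0:
            continue                  # skip degenerate candidates
        r = c * c / e
        if r > best_r:
            best_k, best_r = k, r
    return best_k, best_r
```

Maximizing this ratio is equivalent to minimizing the weighted squared error when the optimal gain is used for each candidate. This is why the flowchart searches for a maximum rather than a minimum.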

The demultiplexing unit 46 of the speech decoding apparatus of FIG. 17 receives the multiplexed signal transmitted from the speech encoding apparatus and demultiplexes it. It outputs the spectrum information, pitch information, pulse information, gain information and mode information.
If the mode information output from the demultiplexing unit 46 indicates that the impulse response component of the fixed excitation codebook in the previous subframe is to be added, the switch 47 turns ON. It then supplies the adder 43 with the signal C_previous, which is the impulse response component of the fixed excitation codebook 33 in the previous subframe stored in the carry component storage memory 42. If the mode information indicates no addition, the switch 47 turns OFF and does not supply C_previous to the adder 43.
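The decoder-side carry-over can be sketched as follows. This is an illustration under these assumptions:

- The fixed excitation is formed by convolving the decoded pulses with the impulse response `h` of the filter that follows the pulse codebook.
- The part that spills past the end of the subframe is the tail held in memory 42.
- len(h) - 1 does not exceed the subframe length.
- The tail is always stored, and the switch only decides whether it is added.

All names are hypothetical.

```python
def decode_fixed_excitation(pulses, h, carry_memory, add_carry):
    """Form one subframe of fixed excitation with an optional carried-over tail.

    pulses: pulse amplitudes for the subframe (length = subframe length)
    h: impulse response of the filter following the pulse codebook
    carry_memory: tail stored from the previous subframe (stand-in for memory 42)
    add_carry: decoded mode flag controlling switch 47
    Returns (excitation for this subframe, new tail to store).
    """
    n_sub = len(pulses)
    full = [0.0] * (n_sub + len(h) - 1)
    for i, amp in enumerate(pulses):
        if amp:
            for j, hj in enumerate(h):
                full[i + j] += amp * hj       # each pulse becomes a short wave packet
    current = full[:n_sub]
    if add_carry:                             # switch 47 ON: adder 43 adds the tail
        for n, v in enumerate(carry_memory[:n_sub]):
            current[n] += v
    return current, full[n_sub:]
```

A pulse near the end of the subframe leaves most of its wave packet in the returned tail. That tail is the component the mode information decides whether to add in the next subframe.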

According to Embodiment 10, the contribution of the extended error-evaluation interval determines whether to carry over into the subframe immediately following the current subframe (the (N+1)th subframe). As a result, the decision to extend the evaluation interval and the decision to carry over can no longer mismatch, and sound quality improves.

Embodiment 11.
FIG. 27 is a block diagram showing a speech encoding apparatus according to Embodiment 11 of the present invention. In the figure, the same reference numerals as those in FIG. 25 denote the same or corresponding parts, and their description is omitted.
Like the minimum error search unit with mode selection function 66 of FIG. 25, the mode-specific minimum error search unit 67 determines whether to include the input speech of the next subframe, as well as that of the current subframe, in the quantization error evaluation. When it does, it outputs mode information indicating that the impulse response component of the fixed excitation codebook in the previous subframe is to be added. Unlike the unit 66 of FIG. 25, however, this mode information is only a mode information candidate (mode information not yet finally determined). The candidate is output to the switch 28 and to the carry component error evaluation mode determination unit 71.

The multiplexing unit 68 multiplexes three items and outputs the multiplexed signal to the buffer 69: the spectrum information output from the LPC quantization/inverse quantization unit 6, the pitch information of the excitation signal output from the adaptive codebook 8, and the gain information indicating the gain output from the gain codebook 10. The pitch and gain information are those in effect when the synthesized speech with the minimum quantization error is obtained.
The buffer 69 is a memory that temporarily stores the multiplexed signal output from the multiplexing unit 68.
The buffer 70 is a memory that temporarily stores the mode information candidates output from the mode-specific minimum error search unit 67. It also temporarily stores, as pulse information candidates (pulse information not yet finally determined), the pulse information of the excitation signal output from the fixed excitation codebook 9 when the synthesized speech with the minimum quantization error is obtained.

The carry component error evaluation mode determination unit 71 uses the quantization error evaluation in the next subframe to decide whether the input speech of the next subframe is included in the quantization error evaluation. On that basis it finally determines the pulse information and the mode information.
The mode-specific minimum error search unit 67, the multiplexing unit 68, the buffers 69 and 70, and the carry component error evaluation mode determination unit 71 constitute parameter search means.
The multiplexing unit 72 multiplexes the pulse information and mode information determined by the carry component error evaluation mode determination unit 71 with the multiplexed signal output from the buffer 69. It then transmits the result to the speech decoding apparatus.

FIG. 28 is a flowchart outlining the processing of the speech encoding apparatus according to Embodiment 11 of the present invention.
The speech decoding apparatus according to Embodiment 11 has, for example, the same configuration as the speech decoding apparatus of FIG. 17.

Next, the operation will be described.
First, the processing in the Nth subframe will be described.
The mode-specific minimum error search unit 67 determines whether the previous subframe, i.e., the (N−1)th subframe, was in "carry-over mode" (step ST31).
That is, it determines whether, in the previous subframe, the switch 28 was ON. If it was, the signal C_previous (the impulse response component of the fixed excitation codebook 9 stored in the carry component storage memory 23) was supplied to the adder 24.

When the previous subframe was in carry-over mode, the mode-specific minimum error search unit 67 calculates the signal C_previous stored in the carry component storage memory 23 using equation (11) above (step ST32). It adds that signal to the numerator component C(k) of the error evaluation R(k) (see equation (10)).
When the previous subframe was not in carry-over mode, it sets C_previous = 0 (step ST33) and does not add C_previous to the numerator component C(k) of the error evaluation R(k).

Next, the mode-specific minimum error search unit 67 calculates C_next(k) of equation (7), which corresponds to the quantization error evaluation parameter of the next subframe, i.e., the (N+1)th subframe (step ST34).
After calculating C_next(k), it outputs to the switch 28 a mode information candidate indicating that the impulse response component of the fixed excitation codebook in the previous subframe is to be added. It stores that candidate in the buffer 70 (step ST35). It then calculates the numerator component C(k) of the error evaluation R(k) by substituting C_next(k) into equation (12) (step ST36).
On receiving from the mode-specific minimum error search unit 67 the mode information candidate indicating addition, the switch 28 turns ON. It supplies the adder 24 with the signal C_previous, which is the impulse response component of the fixed excitation codebook 9 in the previous subframe stored in the carry component storage memory 23.

Next, the mode-specific minimum error search unit 67 calculates the error evaluation R(k) using equation (2) (step ST37). It searches all k for the value that maximizes R(k) (step ST38).
Having found that value of k, it saves it as the index k0 (step ST39).
The pulse information of the excitation signal output from the fixed excitation codebook 9 when R(k) is maximized is stored in the buffer 70 as a pulse information candidate.

Next, the mode-specific minimum error search unit 67 outputs to the switch 28 a mode information candidate indicating that the impulse response component of the fixed excitation codebook in the previous subframe is not to be added. It stores that candidate in the buffer 70 (step ST40). It then calculates the numerator component C(k) of the error evaluation R(k) using equation (10) (step ST41).
On receiving from the mode-specific minimum error search unit 67 the mode information candidate indicating no addition, the switch 28 turns OFF. It then does not supply the adder 24 with the signal C_previous, which is the impulse response component of the fixed excitation codebook 9 in the previous subframe stored in the carry component storage memory 23.

Next, the mode-specific minimum error search unit 67 calculates the error evaluation R(k) using equation (2) (step ST42). It searches all k for the value that maximizes R(k) (step ST43).
Having found that value of k, it saves it as the index k1 (step ST44).
The pulse information of the excitation signal output from the fixed excitation codebook 9 when R(k) is maximized is stored in the buffer 70 as a pulse information candidate.
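The two searches run in the Nth subframe (steps ST34 to ST44) can be sketched as a single function that returns both winning indices. This sketch assumes the following:

- The correlation terms are precomputed scalars.
- C_previous is already 0.0 when the (N−1)th subframe was not in carry-over mode.
- Equation (10) has the form C(k) = C_previous + C_current(k).
- R(k) = C(k)^2 / E(k), standing in for equation (2), which is not shown here.

The names are hypothetical.

```python
def two_candidate_search(c_previous, c_current, c_next, energies):
    """Run the Nth-subframe search twice and keep both winning indices.

    Returns (k0, k1): k0 assumes carry-over (eq. (12)), k1 assumes none (eq. (10)).
    """
    def argmax_r(numerators):
        # Assumed criterion R(k) = C(k)**2 / E(k).
        best_k, best_r = None, float("-inf")
        for k, (c, e) in enumerate(zip(numerators, energies)):
            if e > 0.0 and c * c / e > best_r:
                best_k, best_r = k, c * c / e
        return best_k

    k0 = argmax_r([c_previous + cur + nxt for cur, nxt in zip(c_current, c_next)])  # ST36 to ST39
    k1 = argmax_r([c_previous + cur for cur in c_current])                           # ST41 to ST44
    return k0, k1
```

Both indices and their pulse information are kept, because which one is transmitted is decided only in the (N+1)th subframe.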

Next, the processing in the (N+1)th subframe will be described.
The mode-specific minimum error search unit 67 calculates the signal C_previous stored in the carry component storage memory 23 (step ST51). It compares C_previous with a predetermined threshold (step ST52).
When C_previous is larger than the threshold, the mode-specific minimum error search unit 67 outputs mode information for the Nth subframe to the carry component error evaluation mode determination unit 71 (step ST53). This mode information indicates that the impulse response component of the fixed excitation codebook in the Nth subframe is to be carried over. It also outputs the previously saved index k0 to the carry component error evaluation mode determination unit 71 (step ST54).

When C_previous is smaller than the threshold, the mode-specific minimum error search unit 67 outputs mode information for the Nth subframe to the carry component error evaluation mode determination unit 71 (step ST55). This mode information indicates that the impulse response component of the fixed excitation codebook in the Nth subframe is not to be carried over. It also outputs the previously saved index k1 to the carry component error evaluation mode determination unit 71 (step ST56).

Suppose the carry component error evaluation mode determination unit 71 receives Nth-subframe mode information indicating carry-over from the mode-specific minimum error search unit 67. It then retrieves two items from the buffer 70: the mode information (final value) indicating that the impulse response component of the fixed excitation codebook is to be added, and the pulse information (final value) corresponding to the value of k indicated by the index k0. It outputs both to the multiplexing unit 72.
If it instead receives Nth-subframe mode information indicating no carry-over, it retrieves from the buffer 70 the mode information (final value) indicating that the component is not to be added, and the pulse information (final value) corresponding to the value of k indicated by the index k1. It outputs both to the multiplexing unit 72.
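The delayed decision made in the (N+1)th subframe can be sketched with a small buffer object. This illustration assumes that the two candidates from the Nth subframe are held together, as in the buffer 70, and that a C_previous exactly equal to the threshold counts as "not larger". `CandidateBuffer` and its fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CandidateBuffer:
    """Stand-in for buffer 70: both Nth-subframe candidates, kept until decided."""
    k_carry: int
    pulses_carry: list
    k_no_carry: int
    pulses_no_carry: list

    def finalize(self, c_previous, threshold):
        """Decide the Nth subframe's mode once C_previous is known in subframe N+1.

        Returns (carry_over_flag, chosen index, chosen pulse information).
        """
        if c_previous > threshold:                                 # ST52 -> ST53, ST54
            return True, self.k_carry, self.pulses_carry
        return False, self.k_no_carry, self.pulses_no_carry        # ST55, ST56
```

The returned flag and pulse information are the final values that the multiplexing unit 72 combines with the contents of buffer 69.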

The multiplexing unit 72 multiplexes the mode information (final value) and pulse information (final value) output from the carry component error evaluation mode determination unit 71 with the multiplexed signal output from the buffer 69. It then transmits the result to the speech decoding apparatus.

According to Embodiment 11, the decision whether to carry over into the subframe immediately following the current subframe (the (N+1)th subframe) is deferred to the processing of the (N+1)th subframe (delayed decision). This prevents mismatches between the decision to extend the evaluation interval and the decision to carry over. In addition, the need for carry-over is judged only after the carry-over component has been evaluated, so sound quality can be improved further.

Embodiment 12
FIG. 29 is a block diagram showing part of a speech encoding apparatus according to Embodiment 12 of the present invention. In the figure, the same reference numerals as those in FIG. 8 denote the same or corresponding parts, so their description is omitted.
The gain storage memory 73 stores the gain output from the gain codebook 10 (the gain applied to the excitation signal of the fixed excitation codebook 9).
The gain multiplier 74 multiplies the signal C_previous, which is the impulse response component of the fixed excitation codebook 9 in the previous subframe stored in the carry-over component storage memory 23, by the gain stored in the gain storage memory 73.
The gain storage memory 73 and the gain multiplier 74 constitute the adding means.

FIG. 30 is a block diagram showing part of a speech decoding apparatus according to Embodiment 12 of the present invention. In the figure, the same reference numerals as those in FIG. 9 denote the same or corresponding parts, so their description is omitted.
The gain storage memory 85 stores the gain output from the gain codebook 34 (the gain applied to the excitation signal of the fixed excitation codebook 33).
The gain multiplier 86 multiplies the signal C_previous, which is the impulse response component of the fixed excitation codebook 33 in the previous subframe stored in the carry-over component storage memory 42, by the gain stored in the gain storage memory 85.
The gain storage memory 85 and the gain multiplier 86 constitute the adding means.

Next, the operation will be described.
In the speech encoding apparatus, as in Embodiment 4, the carry-over component storage memory 23 stores the impulse response component of the fixed excitation codebook 9 in the previous subframe.
For example, when pulse A is selected in the (N−1)th subframe (see FIG. 10), the impulse response component carried over to the Nth subframe (section A in FIG. 10, i.e., C_previous) is known as soon as pulse A is selected during processing of the (N−1)th subframe, so the carry-over component storage memory 23 stores the signal C_previous.
The gain storage memory 73 also stores the gain output from the gain codebook 10 to the gain multiplier 12 (the gain applied to the excitation signal of the fixed excitation codebook 9).
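How the carried-over tail (section A in FIG. 10) arises can be illustrated with a small sketch. It assumes the fixed excitation codebook's internal filter is an FIR filter with a known impulse response `h` (the values below are made up), that a single signed unit pulse sits at position `pos` in a subframe of length `L`, and that the part of the filtered pulse extending past the subframe boundary is what gets stored as C_previous. The function name `split_filtered_pulse` is hypothetical. Because the decoder holds the same `h`, running the same computation on the received pulse information reproduces an identical tail, which is why no extra data needs to be transmitted for it.

```python
def split_filtered_pulse(pos, sign, h, L):
    """Filter a single signed pulse at `pos` through FIR impulse response `h`.

    Returns (in_subframe, tail):
      in_subframe -- the L samples that fall inside the current subframe
      tail        -- the len(h) - 1 samples after the subframe boundary
                     (zero where the response does not reach); this is the
                     component carried into the next subframe, C_previous.
                     If len(h) - 1 > L it also spans later subframes.
    """
    if not 0 <= pos < L:
        raise ValueError("pulse position must lie inside the subframe")
    full = [0.0] * (L + len(h) - 1)
    for i, coef in enumerate(h):
        full[pos + i] += sign * coef
    return full[:L], full[L:]


h = [1.0, 0.5, 0.25, 0.125]   # hypothetical impulse response
inside, tail = split_filtered_pulse(pos=6, sign=1, h=h, L=8)
print(tail)  # [0.25, 0.125, 0.0]
```

A pulse placed early in the subframe yields an all-zero tail, so carry-over only matters for pulses near the end of the subframe.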

When processing moves from the (N−1)th subframe to the Nth subframe, the gain multiplier 74 reads the signal C_previous from the carry-over component storage memory 23, multiplies it by the gain stored in the gain storage memory 73, and outputs the product to the adder 24.
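The effect of the stored gain can be shown numerically. In this simplified sketch the tail is assumed to end up scaled by the quantized gain of the subframe that produced it (ĝ_prev, held in the gain storage memory), while a configuration without the gain memory is assumed to scale it by the current subframe's gain ĝ_curr, which is the mismatch this embodiment is meant to avoid. The exact placement of the multipliers and adders in FIG. 29 is not reproduced; the function name `fixed_contribution` and all numbers are hypothetical.

```python
def fixed_contribution(c_k, tail, g_curr_hat, g_prev_hat, use_stored_gain=True):
    """Fixed-codebook contribution for one subframe (simplified model).

    c_k        -- codevector of the current subframe (length L)
    tail       -- C_previous, carried over from the previous subframe
    g_curr_hat -- quantized gain of the current subframe
    g_prev_hat -- quantized gain stored for the previous subframe
    With use_stored_gain=True the tail keeps the gain it was produced with;
    otherwise it is scaled by the current gain. Tail samples beyond this
    subframe are ignored in this sketch.
    """
    tail_gain = g_prev_hat if use_stored_gain else g_curr_hat
    out = [g_curr_hat * x for x in c_k]
    for i, t in enumerate(tail[: len(out)]):
        out[i] += tail_gain * t
    return out


# Gain drops abruptly from 2.0 (previous subframe) to 0.25 (current subframe).
c_k, tail = [0.0, 0.0, 1.0, 0.0], [0.25, 0.125]
print(fixed_contribution(c_k, tail, 0.25, 2.0))                         # [0.5, 0.25, 0.25, 0.0]
print(fixed_contribution(c_k, tail, 0.25, 2.0, use_stored_gain=False))  # [0.0625, 0.03125, 0.25, 0.0]
```

With the stored gain, the first two samples continue the previous subframe's pulse response at the level at which it was actually synthesized. Scaling by the current gain instead shrinks them eightfold.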

In Embodiment 12, when the minimum error search unit 20 searches for the value of k that maximizes the error evaluation R(k) of Equation (2), it calculates the numerator C(k) of R(k) using Equation (12) below.

Here, g_curr is the gain (unquantized gain) of the fixed excitation codebook 9 for the current frame.

C_previous, which appears in the second term of C(k), can be obtained using Equation (11).
C_previous is the carry-over component of the fixed excitation codebook 9 vector from the (N−1)th subframe, and is the signal stored in the carry-over component storage memory 23.
ĝ_prev (g_prev hat) is the gain (quantized gain) of the fixed excitation codebook 9 for the previous frame, and is the value stored in the gain storage memory 73.
ĝ_curr (g_curr hat) is the gain (quantized gain) of the fixed excitation codebook 9 for the current frame, and is searched in the codebook search together with c_k.

In the speech decoding apparatus, pulse information indicating the position of pulse A is transmitted from the speech encoding apparatus during processing of the (N−1)th subframe. Because the fixed excitation codebook 33 has the same filter as the fixed excitation codebook 9 of the encoder (the impulse response information of the same internal filter), the decoder can derive on its own the impulse response component to be carried over to the Nth subframe, and the carry-over component storage memory 42 stores this impulse response component as the signal C_previous.
At the same time, the gain storage memory 85 stores the gain for the excitation signal of the fixed excitation codebook 33, as decoded by the gain codebook 34.

When processing moves from the (N−1)th subframe to the Nth subframe, the gain multiplier 86 reads the signal C_previous from the carry-over component storage memory 42, multiplies it by the gain stored in the gain storage memory 85, and outputs the product to the adder 43.

With this configuration, the gain applied to the carry-over component correctly reflects the residual impulse response component from the previous subframe. In particular, when the gain changes abruptly from the immediately preceding subframe (the (N−1)th subframe) to the current subframe, the mismatch caused by applying the current subframe's gain to the component carried over from the previous subframe is avoided, which improves sound quality.

In Embodiment 12, however, the quantization of the fixed excitation (waveform) and the quantization of the gain must proceed simultaneously, so the search may become very large. If, as in Equation (13) below, the fixed excitation codebook is instead searched using as an approximation, in place of the quantized gain of the current subframe, the gain g_curr of the fixed excitation component before quantization (which is determined by the input signal), then the fixed excitation codebook search and the gain codebook search can be separated. This reduces the search effort at the cost of a slight degradation in sound quality.

Here, g_curr is the gain (unquantized gain) of the fixed excitation codebook for the current frame.
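The saving from separating the two searches can be sketched by comparing a joint search over (k, gain index) with a two-stage search. Equation (13) itself is not reproduced in this text, so the sketch does not implement it. Instead it assumes a plain squared-error criterion against a target vector `x` (the patent's search maximizes R(k) of Equation (2)), with the carried-over tail scaled by the stored previous gain ĝ_prev. Stage 1 picks the pulse index using the unquantized gain g_curr as a stand-in for ĝ_curr, and stage 2 then searches the gain table with that index fixed. The joint search evaluates K·G candidates, whereas the two-stage search evaluates K+G. All function names and values are hypothetical.

```python
from itertools import product


def sq_err(x, c, tail, g, g_prev_hat):
    """Squared error between target x and g*c + g_prev_hat*tail (tail zero-padded)."""
    err = 0.0
    for i, xi in enumerate(x):
        t = tail[i] if i < len(tail) else 0.0
        err += (xi - (g * c[i] + g_prev_hat * t)) ** 2
    return err


def joint_search(x, codebook, gains, tail, g_prev_hat):
    """Search every (k, j) pair: len(codebook) * len(gains) evaluations."""
    return min(product(range(len(codebook)), range(len(gains))),
               key=lambda kj: sq_err(x, codebook[kj[0]], tail, gains[kj[1]], g_prev_hat))


def two_stage_search(x, codebook, gains, tail, g_prev_hat, g_curr):
    """Stage 1: choose k with unquantized g_curr; stage 2: choose gain j for that k."""
    k = min(range(len(codebook)),
            key=lambda idx: sq_err(x, codebook[idx], tail, g_curr, g_prev_hat))
    j = min(range(len(gains)),
            key=lambda idx: sq_err(x, codebook[k], tail, gains[idx], g_prev_hat))
    return k, j


x = [1.0, 0.0, 2.0, 0.0]
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
gains = [0.5, 1.0, 2.0]
print(joint_search(x, codebook, gains, tail=[1.0], g_prev_hat=1.0))                  # (1, 2)
print(two_stage_search(x, codebook, gains, tail=[1.0], g_prev_hat=1.0, g_curr=1.8))  # (1, 2)
```

Here both searches agree. When g_curr is a poor estimate of the eventual quantized gain, stage 1 can pick a different k than the joint search would, which corresponds to the slight quality loss mentioned above.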

FIG. 1 is a block diagram showing a speech encoding apparatus according to Embodiment 1 of the present invention.
FIG. 2 is a block diagram showing a speech decoding apparatus according to Embodiment 1 of the present invention.
FIG. 3 is an explanatory diagram showing the waveform of input speech, the frames and subframes of the input speech, reference signals, and the like.
FIG. 4 is an explanatory diagram showing the interval over which the quantization error is evaluated.
FIG. 5 is a block diagram showing a speech encoding apparatus according to Embodiment 2 of the present invention.
FIG. 6 is an explanatory diagram showing differences in the weighting applied to the quantization error.
FIG. 7 is an explanatory diagram showing the waveform of input speech, the frames and subframes of the input speech, reference signals, and the like.
FIG. 8 is a block diagram showing a speech encoding apparatus according to Embodiment 4 of the present invention.
FIG. 9 is a block diagram showing a speech decoding apparatus according to Embodiment 4 of the present invention.
FIG. 10 is an explanatory diagram showing the impulse response component carried over to the Nth subframe, and the like.
FIG. 11 is a block diagram showing part of a speech encoding apparatus according to Embodiment 5 of the present invention.
FIG. 12 is a block diagram showing part of a speech decoding apparatus according to Embodiment 5 of the present invention.
FIG. 13 is a block diagram showing part of a speech encoding apparatus according to Embodiment 6 of the present invention.
FIG. 14 is a block diagram showing part of a speech decoding apparatus according to Embodiment 6 of the present invention.
FIG. 15 is an explanatory diagram showing the waveform of input speech, the frames and subframes of the input speech, reference signals, and the like.
FIG. 16 is a block diagram showing a speech encoding apparatus according to Embodiment 7 of the present invention.
FIG. 17 is a block diagram showing a speech decoding apparatus according to Embodiment 7 of the present invention.
FIG. 18 is a flowchart outlining the processing performed by the speech encoding apparatus according to Embodiment 7 of the present invention.
FIG. 19 is a block diagram showing part of a speech encoding apparatus according to Embodiment 8 of the present invention.
FIG. 20 is a block diagram showing part of a speech decoding apparatus according to Embodiment 8 of the present invention.
FIG. 21 is a block diagram showing an example of the internal configuration of the pitch emphasis filter 63 of FIG. 19.
FIG. 22 is a block diagram showing a speech encoding apparatus according to Embodiment 9 of the present invention.
FIG. 23 is a block diagram showing a speech decoding apparatus according to Embodiment 9 of the present invention.
FIG. 24 is a block diagram showing an example of the internal configuration of the pitch emphasis filter 65 of FIG. 22.
FIG. 25 is a block diagram showing a speech encoding apparatus according to Embodiment 10 of the present invention.
FIG. 26 is a flowchart outlining the processing performed by the speech encoding apparatus according to Embodiment 10 of the present invention.
FIG. 27 is a block diagram showing a speech encoding apparatus according to Embodiment 11 of the present invention.
FIG. 28 is a flowchart outlining the processing performed by the speech encoding apparatus according to Embodiment 11 of the present invention.
FIG. 29 is a block diagram showing part of a speech encoding apparatus according to Embodiment 12 of the present invention.
FIG. 30 is a block diagram showing part of a speech decoding apparatus according to Embodiment 12 of the present invention.
FIG. 31 is a block diagram showing a conventional speech encoding apparatus.
FIG. 32 is a block diagram showing a conventional speech decoding apparatus.
FIG. 33 is an explanatory diagram showing the waveform of input speech and the frames and subframes of the input speech.
FIG. 34 is a block diagram showing a fixed excitation codebook using a pulse excitation codebook.
FIG. 35 is a block diagram showing a fixed excitation codebook using a pulse-spreading codebook.
FIG. 36 is a block diagram showing a fixed excitation codebook using a pulse-spreading codebook.
FIG. 37 is an explanatory diagram showing a wave packet shape having a predetermined time length.
FIG. 38 is an explanatory diagram illustrating the problem that arises when CELP encoding is performed using a wave packet having a predetermined time length.

Explanation of symbols

1 buffer, 2 preprocessing unit, 3 spectrum analysis unit (spectrum analysis means), 4 linear prediction analysis unit, 5 LSP codebook, 6 LSP quantization/inverse quantization unit, 7 drive excitation generation unit (drive excitation generation means), 8 adaptive codebook, 9 fixed excitation codebook, 10, 65c gain codebook, 11, 12 gain multiplier, 13, 63a, 65a adder, 14 synthesized speech generation unit (synthesized speech generation means), 15 LSP/LPC conversion unit, 16 synthesis filter, 17 reference vector assembly buffer (parameter search means), 18 subtractor (parameter search means), 19 perceptual weighting filter (parameter search means), 20 minimum error search unit (parameter search means), 21, 29, 72 multiplexing unit, 22 weighted minimum error search unit (parameter search means), 23, 42 carry-over component storage memory (adding means), 24, 43 adder (adding means), 25, 44, 74, 86 gain multiplier (adding means), 26, 28, 45, 47 switch (adding means), 27, 64, 66 minimum error search unit with mode selection function (parameter search means), 31, 46 demultiplexing unit (information receiving means), 32 adaptive codebook (drive excitation generation means), 33 fixed excitation codebook (drive excitation generation means), 34 gain codebook (drive excitation generation means), 35, 36 gain multiplier (drive excitation generation means), 37 adder (drive excitation generation means), 38 LSP codebook (synthesized speech decoding means), 39 LSP/LPC conversion unit (synthesized speech decoding means), 40 synthesis filter (synthesized speech decoding means), 41 post filter (synthesized speech decoding means), 61, 81 pitch stability evaluation unit (adding means), 62, 63, 65, 82, 83, 84 pitch emphasis filter (adding means), 63b, 65b inverse filter, 63c, 65d filter coefficient multiplier, 67 mode-specific minimum error search unit (parameter search means), 68 multiplexing unit (parameter search means), 69, 70 buffer (parameter search means), 71 carry-over component error evaluation mode determination unit (parameter search means), 73, 85 gain storage memory (adding means).

Claims

A speech encoding apparatus comprising: spectrum analysis means for performing speech spectrum analysis for each frame of input speech; drive excitation generation means for generating a plurality of drive excitations by combining the excitation signals, in units of subframes, output from an adaptive codebook and a fixed excitation codebook; synthesized speech generation means for forming a synthesis filter according to the spectrum analysis result of the spectrum analysis means and passing the plurality of drive excitations generated by the drive excitation generation means through the synthesis filter to generate a plurality of synthesized speech signals; and parameter search means for evaluating the quantization error between each of the plurality of synthesized speech signals generated by the synthesized speech generation means and the input speech, and searching for the coding parameters of the synthesized speech with the smallest quantization error, wherein the adaptive codebook and the fixed excitation codebook output not only the excitation signal of the current subframe but also the excitation signal of the next subframe, and the parameter search means includes not only the input speech of the current subframe but also the input speech of the next subframe in the quantization error evaluation.

The speech encoding apparatus according to claim 1, wherein the parameter search means weights the quantization error for the input speech of the current subframe more heavily than the quantization error for the input speech of the next subframe, and searches for the coding parameters of the synthesized speech with the smallest quantization error.

The speech encoding apparatus according to claim 1, wherein the parameter search means excludes the input speech of the next subframe from the quantization error evaluation when the next subframe belongs to a frame different from that of the current subframe.

A speech encoding apparatus comprising: spectrum analysis means for performing speech spectrum analysis for each frame of input speech; adding means for adding the excitation signal of the current subframe output from a fixed excitation codebook and the response component of the fixed excitation codebook in the previous subframe; drive excitation generation means for generating a plurality of drive excitations by combining the excitation signal of the fixed excitation codebook to which the response component has been added by the adding means and the excitation signal of the current subframe output from an adaptive codebook; synthesized speech generation means for forming a synthesis filter according to the spectrum analysis result of the spectrum analysis means and passing the plurality of drive excitations generated by the drive excitation generation means through the synthesis filter to generate a plurality of synthesized speech signals; and parameter search means for evaluating the quantization error between each of the plurality of synthesized speech signals generated by the synthesized speech generation means and the input speech, and searching for the coding parameters of the synthesized speech with the smallest quantization error.

The speech encoding apparatus according to claim 4, wherein the adding means multiplies the response component of the fixed excitation codebook in the previous subframe by a gain of at least 0 and less than 1 before adding the response component to the excitation signal of the current subframe.

The speech encoding apparatus according to claim 4, wherein, when the previous subframe belongs to a frame different from that of the current subframe, the adding means does not add the response component of the fixed excitation codebook in the previous subframe to the excitation signal of the current subframe output from the fixed excitation codebook.

The speech encoding apparatus according to claim 4, wherein the parameter search means determines, on the basis of the quantization error evaluation result, whether to add the response component of the fixed excitation codebook in the previous subframe, and the adding means adds the response component to the excitation signal of the current subframe when the determination result of the parameter search means indicates addition.

The speech encoding apparatus according to claim 4, wherein, when the fixed excitation codebook includes a pitch emphasis filter, the adding means adjusts the filter effect of the pitch emphasis filter on the response component of the fixed excitation codebook in the previous subframe according to the stability of the pitch period in past subframes.

The speech encoding apparatus according to claim 4, wherein, when the fixed excitation codebook includes a pitch emphasis filter, the parameter search means adjusts the filter effect of the pitch emphasis filter on the response component of the fixed excitation codebook in the previous subframe and searches for the coding parameters of the synthesized speech with the smallest quantization error.

The speech encoding apparatus according to claim 4, wherein the adding means adds the response component of the fixed excitation codebook in the previous subframe to the excitation signal of the current subframe output from the fixed excitation codebook only when the parameter search means includes not only the input speech of the current subframe but also the input speech of the next subframe in the quantization error evaluation.

The speech encoding apparatus according to claim 10, wherein the parameter search means excludes the input speech of the next subframe from the quantization error evaluation when the minimum quantization error obtained with the input speech of the next subframe included in the evaluation is larger than a predetermined value.

The speech encoding apparatus according to claim 10, wherein the parameter search means determines, on the basis of the quantization error evaluation result in the next subframe, whether to include the input speech of the next subframe in the quantization error evaluation.

The speech encoding apparatus according to claim 4, wherein the adding means stores the gain of the fixed excitation codebook for its excitation signal and multiplies the response component of the fixed excitation codebook in the previous subframe by that gain.

A speech decoding apparatus comprising: information receiving means for receiving, from a speech encoding apparatus, spectrum information indicating the result of speech spectrum analysis, pitch information of the excitation signal output from an adaptive codebook when the synthesized speech with the smallest quantization error was obtained, and pulse information of the excitation signal output from a fixed excitation codebook when the synthesized speech with the smallest quantization error was obtained; adding means for adding the excitation signal of the fixed excitation codebook in the current subframe corresponding to the pulse information received by the information receiving means and the response component of the fixed excitation codebook in the previous subframe; drive excitation generation means for generating a drive excitation from the excitation signal of the fixed excitation codebook to which the response component has been added by the adding means and the excitation signal of the adaptive codebook in the current subframe corresponding to the pitch information received by the information receiving means; and synthesized speech decoding means for forming a synthesis filter according to the spectrum analysis result indicated by the spectrum information received by the information receiving means, passing the drive excitation generated by the drive excitation generation means through the synthesis filter, and decoding the synthesized speech.

The speech decoding apparatus according to claim 14, wherein the adding means multiplies the response component of the fixed excitation codebook in the previous subframe by a gain of at least 0 and less than 1 before adding the response component to the excitation signal of the fixed excitation codebook in the current subframe.

The speech decoding apparatus according to claim 14, wherein, when the previous subframe belongs to a frame different from that of the current subframe, the adding means does not add the response component of the fixed excitation codebook in the previous subframe to the excitation signal of the fixed excitation codebook in the current subframe.

The speech decoding apparatus according to claim 14, wherein the information receiving means receives, from the speech encoding apparatus, mode information indicating whether to add the response component of the fixed excitation codebook in the previous subframe, and the adding means adds the response component to the excitation signal of the fixed excitation codebook in the current subframe when the mode information received by the information receiving means indicates addition.

The speech decoding apparatus according to claim 14, wherein, when the fixed excitation codebook includes a pitch emphasis filter, the adding means adjusts the filter effect of the pitch emphasis filter on the response component of the fixed excitation codebook in the previous subframe according to the stability of the pitch period in past subframes.

The speech decoding apparatus according to claim 14, wherein, when the fixed excitation codebook includes a pitch emphasis filter, the information receiving means receives, from the speech encoding apparatus, mode information indicating the state of the pitch emphasis filter when the synthesized speech with the smallest quantization error was obtained, and the adding means adjusts the filter effect of the pitch emphasis filter on the response component of the fixed excitation codebook in the previous subframe according to the mode information received by the information receiving means.

The speech decoding apparatus according to claim 14, wherein the adding means stores the gain of the fixed excitation codebook for its excitation signal and multiplies the response component of the fixed excitation codebook in the previous subframe by that gain.