JPH09297597A

JPH09297597A - High-efficiency speech transmission system and high-efficiency speech transmission device

Info

Publication number: JPH09297597A
Application number: JP8164702A
Authority: JP
Inventors: Mitsuru Tsuboi; 満坪井; Fumiaki Nishida; 文昭西田; Osahide Eguchi; 修英江口; Takashi Ota; 恭士大田; Masanao Suzuki; 政直鈴木
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1996-03-06
Filing date: 1996-06-25
Publication date: 1997-11-18

Abstract

PROBLEM TO BE SOLVED: To reduce the hardware scale and enable wideband, low-bit-rate high-efficiency speech transmission by encoding the input speech signal using speech information parameters obtained by linear prediction analysis of the input speech signal of past frames. SOLUTION: Before high-efficiency coding of the input speech signal of the Nth frame, linear prediction analysis is performed on the input speech signal of past frames including the (N-1)th frame, and the resulting speech information parameters are used to encode the input speech signal of the Nth frame. Speech information parameters such as filter coefficients and the pitch period are obtained from the first and second adaptors 22 and 32 and a pitch period extraction section 31. In this device the short-term prediction filter coefficients are generated by linear prediction analysis of the synthesized signal output by a short-term prediction filter 42, so the encoder no longer needs to transfer the short-term prediction filter coefficients to the decoder.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a high-efficiency speech transmission method and a high-efficiency speech transmission device used, for example, in multimedia communication.

[0002] In recent years, with the development of ISDN networks and multimedia information equipment, multimedia communication of images, speech, documents and the like has become increasingly common. Within multimedia communication, video conferencing systems and visual telephone systems in particular call for wideband speech (50 Hz to 7 kHz) that is more natural and of higher quality than telephone-band speech, because wideband speech improves naturalness and the sense of presence.

[0003] The conventional speech coding scheme with 16 kHz sampling and a 7 kHz band is standardized as ITU-T G.722. Because its bit rate is 64 to 48 kb/s, however, a line with a 64 kHz transmission band can carry only one channel of speech.

[0004] Lowering the bit rate of the speech signal narrows the transmission band it needs, so that other media information can also be carried in the above transmission band and communication costs can be reduced. To this end, it is necessary to reduce the hardware scale and to provide a wideband, low-bit-rate high-efficiency speech transmission method and high-efficiency speech transmission device.

[0005]

[Prior Art] FIG. 21 is an example of the configuration of the main part of a speech signal generation model, FIG. 22 is an explanatory diagram of vectors and frames, and FIG. 23 is an explanatory diagram of the processing timing of a conventional encoder.

[0006] First, the ITU-T G.722 speech coding scheme uses sub-band ADPCM. When a speech signal is processed, the 7 kHz band is split into a high band (3.4 kHz and above) and a low band (3.4 kHz and below), and each of the split signals is encoded with 16 kHz sampling.

[0007] The high band is encoded with 2 bits and the low band with 6 bits; the two are multiplexed and sent to the line at an 8 kHz rate, giving an overall bit rate of 64 kb/s. A line with a 64 kHz band can therefore carry only one channel of speech.
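The 64 kb/s figure follows directly from the bit allocation just described. A minimal Python sketch of that arithmetic is given here; the function and constant names are illustrative and do not come from the patent.

```python
# Sub-band ADPCM bit budget, using the values stated in the text.
HIGH_BAND_BITS = 2       # bits per high-band code word
LOW_BAND_BITS = 6        # bits per low-band code word
OUTPUT_RATE_HZ = 8_000   # multiplexed code words sent per second


def subband_bit_rate(high_bits: int, low_bits: int, rate_hz: int) -> int:
    """Return the multiplexed bit rate in bit/s."""
    return (high_bits + low_bits) * rate_hz


g722_rate = subband_bit_rate(HIGH_BAND_BITS, LOW_BAND_BITS, OUTPUT_RATE_HZ)
print(g722_rate)
```

With 2 + 6 = 8 bits per multiplexed word and 8000 words per second, the whole line capacity is consumed by a single speech channel.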

[0008] For telephone-band communication (300 Hz to 3.4 kHz), on the other hand, there is the international standard G.728 (LD-CELP), which achieves a low required transmission rate of 16 kb/s. In a video conferencing system, for example, images and speech share a limited transmission rate, so raising the rate used for speech affects the quality of the image communication. Conversely, giving priority to image quality impairs the naturalness of the speech.

[0009] In the CELP scheme, a plurality of speech samples are grouped into vectors and frames. Speech information parameters are varied so that the error between a speech signal synthesized with the speech signal generation model described below and the actual input speech signal is minimized, and the parameters that give the minimum error are quantized and transmitted.

[0010] The speech signal generation model is described next. As shown in FIG. 22, the input speech is sampled and converted to PCM (hereinafter, the input speech signal); a group of several samples of this signal is called a vector, and a group of n vectors (n being a positive integer) is called a frame.
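The vector and frame grouping can be pictured with a short Python sketch. The function name, the choice of 5 samples per vector and 4 vectors per frame, and the policy of dropping a trailing partial frame are assumptions made for illustration only.

```python
def split_into_frames(samples, samples_per_vector, vectors_per_frame):
    """Group PCM samples into frames; each frame is a list of vectors."""
    frame_len = samples_per_vector * vectors_per_frame
    n_frames = len(samples) // frame_len  # a trailing partial frame is dropped
    frames = []
    for f in range(n_frames):
        start = f * frame_len
        frame = [
            list(samples[start + v * samples_per_vector:
                         start + (v + 1) * samples_per_vector])
            for v in range(vectors_per_frame)
        ]
        frames.append(frame)
    return frames


pcm = list(range(40))  # stand-in for 40 PCM samples
frames = split_into_frames(pcm, samples_per_vector=5, vectors_per_frame=4)
print(len(frames), frames[1][0])
```

Here `frames[k][j]` is vector #j of frame #k, which matches the numbering style (#0, #1, ...) used in the description.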

[0011] In FIG. 21, one frame of the input speech signal is held temporarily (for example, 30 to 40 ms) in the buffer 81 and then supplied to the linear prediction analysis section (LPC) 85 and the delay section 82.

[0012] The linear prediction analysis section 85 performs linear prediction analysis on one frame of the input speech signal, extracts speech information parameters such as filter coefficients and a gain value, and sends them to the speech synthesis filter 88 and the amplifier 87.

[0013] The speech information parameters output by the linear prediction analysis section 85 are updated every time T₀ and are held fixed in between. The vectorization section 83 extracts, from the one-frame input speech signal received through the delay section 82, the input speech vector that is about to be encoded (for example, input speech vector #0 in FIG. 22) and supplies it to the comparison section 84.

[0014] The vector quantization codebook (hereinafter, VQ codebook) 86 stores Q kinds of noise signal patterns (hereinafter, sound source patterns), each one vector long.

[0015] A sound source pattern taken from the codebook, for example pattern #0, is passed through the amplifier 87 and the speech synthesis filter 88 configured with the above filter coefficients to synthesize a speech vector, and the synthesized speech vector is supplied to the comparison section 84.

[0016] The comparison section 84 compares the synthesized speech vector with the actual input speech vector to obtain an error. This is repeated Q times, up to sound source pattern #(Q-1), and the gain value, filter coefficients and other parameters at the point of minimum error are quantized.
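The search loop formed by the codebook, amplifier, synthesis filter and comparison section can be sketched as follows. This is a simplified illustration rather than the patent's implementation: it assumes an all-pole synthesis filter with zero initial state, a fixed gain, and a plain squared-error measure, and all names are hypothetical.

```python
def synthesize(excitation, gain, lpc):
    """All-pole synthesis with zero initial state:
    y[n] = gain * x[n] + sum_k lpc[k-1] * y[n-k]."""
    out = []
    for n, x in enumerate(excitation):
        y = gain * x
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                y += a * out[n - k]
        out.append(y)
    return out


def search_codebook(target, codebook, gain, lpc):
    """Try every pattern and return (index, squared_error) of the best match."""
    best_index, best_error = -1, float("inf")
    for index, pattern in enumerate(codebook):
        synthesized = synthesize(pattern, gain, lpc)
        error = sum((t - s) ** 2 for t, s in zip(target, synthesized))
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error


# Toy codebook with Q = 3 patterns of 5 samples each (illustrative values).
codebook = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, -1.0, 1.0, -1.0, 1.0],
]
target = synthesize(codebook[2], gain=0.5, lpc=[0.9])
best_index, best_error = search_codebook(target, codebook, gain=0.5, lpc=[0.9])
print(best_index, best_error)
```

Because the target here was itself synthesized from pattern #2, the search recovers that index with zero error. A practical coder would also carry filter state across vectors, which this sketch omits.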

[0017] If the codebook holds Q = 1024 kinds of sound source patterns, 10 bits are needed to specify a pattern. With 5 samples per vector and a sampling frequency of 8 kHz (a sample period of 125 μs), the bit rate is 10 bits / (5 × 125 μs) = 16 kb/s.

[0018] If, however, the number of samples per vector is doubled to n = 10, the same calculation gives 8 kb/s. In other words, increasing the number of samples per vector reduces the number of bits allocated per sample and so lowers the bit rate.
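The two bit-rate figures above can be reproduced with a few lines of Python. The function name is illustrative; the sketch counts only the bits of the codebook index, as the text does.

```python
def index_bit_rate(codebook_size, samples_per_vector, sample_rate_hz):
    """Bit rate (bit/s) needed to send one codebook index per vector."""
    bits_per_index = (codebook_size - 1).bit_length()  # 1024 patterns -> 10 bits
    return bits_per_index * sample_rate_hz / samples_per_vector


rate_5 = index_bit_rate(1024, 5, 8000)    # 5 samples per vector
rate_10 = index_bit_rate(1024, 10, 8000)  # 10 samples per vector
print(rate_5, rate_10)
```

Doubling the vector length halves the rate because the same 10-bit index now covers twice as many samples.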

[0019] Next, the processing timing of the encoder is described with reference to FIG. 23 (see also FIG. 21). As shown in FIG. 23, before the input speech signal of frame #N can be encoded, linear prediction analysis (hereinafter, LPC analysis) must be performed. LPC analysis needs the input speech signal of frame #N, which is about to be encoded, together with several samples before and after it (the hatched portion in FIG. 23), and this portion is stored in the buffer 81.

[0020] LPC analysis is therefore carried out from the first half to the second half of frame #(N+1), and encoding is carried out from the second half of frame #(N+1) to the second half of frame #(N+2).

[0021] The speech coded information obtained by encoding the speech information parameters for frame #N is thus sent out during frame #(N+2), so that the coded information is obtained with a delay of about three frames.

[0022]

[Problems to Be Solved by the Invention] As described above, LPC analysis is performed on the input speech signal of the frame that is about to be encoded, so a memory (the buffer in FIG. 21) is needed to hold the three frames of input speech signal required for the analysis. This increases the hardware scale of the encoder.

[0023] Furthermore, in a video conferencing system, for example, a speech coding method whose reproduced speech signal is wideband yet low in bit rate must be provided, so that naturalness and a sense of presence are obtained in speech communication without affecting the quality of image communication.

[0024] An object of the present invention is to reduce the hardware scale and to provide a wideband, low-bit-rate high-efficiency speech transmission method and high-efficiency speech transmission device.

[0025]

[Means for Solving the Problems] FIG. 1 is a functional diagram of the main part of the speech encoder of the first to third inventions, FIG. 2 is a functional diagram of the main part of the speech encoder of the fourth invention, FIG. 3 is an explanatory diagram of the processing timing of FIG. 1, and FIG. 4 is an explanatory diagram of the LPC analysis window.

[0026] FIG. 5 is a functional diagram of the main part of the speech encoder and decoder of the seventh to thirteenth inventions. In the first invention, the input speech signal of the Nth frame is encoded using speech information parameters that were generated, before high-efficiency coding of that frame begins, by linear prediction analysis of the input speech signal of past frames including the (N-1)th frame.

[0027] The second invention uses a linear prediction analysis window that weights the input speech signal more heavily the closer a past frame is to the Nth frame.

[0028] The third invention is a high-efficiency speech encoder comprising: speech information generating means that performs linear prediction analysis of the input speech signal to generate speech information parameters; vector quantization codebook means that stores a plurality of noise signal patterns serving as sound sources and outputs an extracted noise signal pattern with a gain applied; speech synthesis filter means that generates a synthesized speech signal from the output of the vector quantization codebook means, using a long-term prediction filter and a short-term prediction filter whose filter coefficients are the applied speech information parameters; and minimum squared error calculating means that repeats the error calculation between the input speech signal and the generated synthesized speech signal and determines the sound source number that gives the minimum error. The encoder is configured to encode the input speech signal of the Nth frame using speech information parameters generated by linear prediction analysis of the input speech signal of past frames including the (N-1)th frame.

[0029] The fourth invention is configured to use, as the filter coefficients of the above short-term prediction filter, speech information parameters generated by linear prediction analysis of previously output synthesized speech signals.

[0030] The fifth invention is a high-efficiency speech decoder comprising: separating means that separates the received speech coded information and extracts the sound source number and speech information parameters; vector quantization codebook means that extracts the noise signal pattern corresponding to the applied sound source number and outputs it with a gain applied; a long-term prediction filter whose filter characteristics correspond to the applied speech information parameters; and a short-term prediction filter. The short-term prediction filter is configured to use, as its filter coefficients, speech information parameters generated by linear prediction analysis of previously output decoded speech signals.

[0031] The sixth invention is configured to encode and transmit an input speech signal using the high-efficiency speech encoder of claim 4, and to receive and decode the encoded speech signal using the high-efficiency speech decoder of claim 5.

[0032] In the seventh invention, when the synthesis filter convolves the impulse response matrix with a code vector, the computation starts from the most recent code vector and proceeds in order to past code vectors.

[0033] In the eighth invention, when the pitch search of the adaptive codebook is performed for each of the m vectors that make up a frame, the full range is searched for the first vector to obtain the optimum pitch P₁; for the second to mth vectors, the search is limited to a preset range centered on the optimum pitch found in the preceding pitch search.
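The restricted pitch search can be sketched in Python as below. The scoring function, the lag limits (20 to 147) and the half-width `delta` are assumptions chosen for illustration; this section of the patent does not give concrete values.

```python
def frame_pitch_search(score_per_vector, min_lag, max_lag, delta):
    """Return one pitch lag per vector.

    score_per_vector: list of callables (lag -> score), one per vector;
    a higher score means a better match. The first vector is searched over
    the full range, later vectors only within +/- delta of the previous lag.
    """
    pitches = []
    for i, score in enumerate(score_per_vector):
        if i == 0:
            lags = range(min_lag, max_lag + 1)
        else:
            centre = pitches[-1]
            lags = range(max(min_lag, centre - delta),
                         min(max_lag, centre + delta) + 1)
        pitches.append(max(lags, key=score))
    return pitches


# Toy scores whose unconstrained optima are at lags 50, 52 and 90.
scores = [lambda lag, p=p: -abs(lag - p) for p in (50, 52, 90)]
pitches = frame_pitch_search(scores, min_lag=20, max_lag=147, delta=3)
print(pitches)
```

In this toy run the third vector's unconstrained optimum (90) lies outside the window around 52, so the search returns the window edge instead. That illustrates the trade-off: far fewer lags are evaluated, at the cost of not following an abrupt pitch jump within a frame.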

[0034] The ninth invention is a high-efficiency speech coding method in which impulse response filter means obtains a zero-state response vector by convolving a summed code vector, obtained by adding the code vectors taken from adaptive codebook means and fixed codebook means, with the impulse response vector from impulse response vector generating means, and in which a search is made for the combination of code vectors that minimizes the error between the zero-state response vector from the impulse response filter means and the input target vector. In this method, the convolution is performed according to the seventh invention and the pitch search according to the eighth invention.

[0035] The tenth invention decodes a signal encoded by the speech coding method of the ninth invention and reproduces the original speech signal. The eleventh invention is configured to perform the convolution according to the seventh invention and the pitch search according to the eighth invention.

[0036] The twelfth invention is configured to decode a signal encoded by the high-efficiency speech encoder of the tenth invention and reproduce the original speech signal. The thirteenth invention is configured to encode and transmit an input speech signal using the high-efficiency speech encoder of the eleventh invention, and to receive and decode the encoded speech signal using the high-efficiency speech decoder of claim 12.

[0037] First, the means of the present invention are described with reference to FIGS. 1 to 4. In FIG. 1, as explained for the conventional example, the VQ codebook 51 holds Q kinds of sound source patterns prepared in advance, each one vector long.

[0038] A sound source pattern output from the VQ codebook 51 is given, in the gain section 52, a gain corresponding to the output of the gain adaptor 53, and passes through the synthesis filter 4 (made up of a long-term prediction filter and a short-term prediction filter) to yield a synthesized speech signal. One such synthesized speech signal is generated for each kind of sound source pattern in the VQ codebook.

[0039] The loop formed by the VQ codebook means 5, the synthesis filter 4 and the minimum squared error calculating means 6 computes the error between the synthesized speech signal and the input speech signal, and finds the number of the VQ codebook sound source pattern (hereinafter, the index) at which the error is minimum.

[0040] The sound source patterns in the VQ codebook 51 correspond to the vibration patterns of the human vocal cords and to the movements of the lips, tongue and so on when consonants are produced. In human speech, the vibration of the vocal cords produces various reverberations according to the shape of the palate; the speech synthesis filter corresponds to the shape of the palate, and changing its filter coefficients is equivalent to changing that shape.

[0041] The speech coded information therefore consists of the index selected from the VQ codebook and the filter coefficients of the synthesis filter; the filter coefficients are generated by LPC analysis of the input speech signal.

[0042] The signal flow in FIG. 1 is as follows. The input speech signal is stored temporarily in the vector buffer 12. The LPC analysis section 11 uses the input speech signal in the vector buffer to obtain reflection coefficients and LPC coefficients, and sends the former to the first adaptor 22 and the latter to the pitch period extraction section 31 and the perceptual weighting filter 62.
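The text does not say how the LPC analysis section derives the two coefficient sets. One common way to obtain reflection coefficients and LPC coefficients together is the autocorrelation method with the Levinson-Durbin recursion; the Python sketch below uses that method purely as an assumption, with illustrative function names.

```python
def autocorrelation(x, order):
    """Return r[0..order] with r[k] = sum_n x[n] * x[n-k]."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for the predictor x[n] ~ sum_k a[k] * x[n-k].

    Returns (lpc, reflection, error): predictor coefficients a[1..order],
    reflection coefficients k[1..order] and the final prediction error power.
    Assumes r[0] > 0.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    reflection = []
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        reflection.append(k)
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a[1:], reflection, error


# Autocorrelation of an ideal first-order process with correlation 0.5.
lpc, reflection, error = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(lpc, reflection, error)
```

For this input the recursion finds that a single coefficient of 0.5 explains the signal, so the second-order terms come out as zero. In practice `autocorrelation` would be applied to the windowed samples held in the vector buffer.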

[0043] The first adaptor 22 accesses the reflection coefficient codebook 21, retrieves the filter coefficients closest to the input reflection coefficients, and sends them to the short-term prediction filter 42 in the speech synthesis filter.
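A nearest-entry lookup of this kind might look like the following Python sketch. The table layout (pairs of stored reflection coefficients and filter coefficients), the squared Euclidean distance and every numeric value are assumptions for illustration; the patent does not specify them here.

```python
def nearest_entry(table, reflection):
    """Return (index, filter_coeffs) of the row whose stored reflection
    coefficients are closest (squared Euclidean distance) to `reflection`."""
    def distance(row):
        stored, _ = row
        return sum((s - r) ** 2 for s, r in zip(stored, reflection))

    index = min(range(len(table)), key=lambda i: distance(table[i]))
    return index, table[index][1]


# Hypothetical 3-entry table: (reflection coefficients, filter coefficients).
# The numbers are arbitrary placeholders.
table = [
    ([0.9, -0.2], [0.72, 0.2]),
    ([0.5, 0.0], [0.5, 0.0]),
    ([-0.3, 0.4], [-0.42, -0.4]),
]
index, coeffs = nearest_entry(table, [0.45, 0.05])
print(index, coeffs)
```

Only `index` needs to be transmitted; a decoder holding the same table can recover the filter coefficients from it.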

[0044] The reflection coefficient codebook 21 is assumed to hold a table relating various input reflection coefficients to the corresponding filter coefficients. The pitch period extraction section 31 obtains the LPC prediction residual and the pitch period from the input LPC coefficients and the input speech signal, and sends them to the second adaptor 32.
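How the pitch period extraction section works internally is not described at this point. As a hedged illustration, the Python sketch below computes the LPC prediction residual by inverse filtering and then picks the lag with the largest residual autocorrelation; both the autocorrelation criterion and the lag limits are assumptions.

```python
def lpc_residual(x, lpc):
    """Prediction residual e[n] = x[n] - sum_k lpc[k-1] * x[n-k]."""
    residual = []
    for n in range(len(x)):
        predicted = sum(a * x[n - k]
                        for k, a in enumerate(lpc, start=1) if n - k >= 0)
        residual.append(x[n] - predicted)
    return residual


def pitch_period(residual, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation."""
    def correlation(lag):
        return sum(residual[n] * residual[n - lag]
                   for n in range(lag, len(residual)))
    return max(range(min_lag, max_lag + 1), key=correlation)


# Toy input: an impulse train with a period of 10 samples.
signal = [1.0 if n % 10 == 0 else 0.0 for n in range(100)]
residual = lpc_residual(signal, lpc=[0.5])
period = pitch_period(residual, min_lag=5, max_lag=30)
print(period)
```

The residual is used instead of the raw signal because inverse filtering removes much of the short-term (formant) structure, which tends to make the periodic pitch pulses easier to locate.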

[0045] The second adaptor 32 similarly accesses the long-term prediction coefficient codebook 33, retrieves the filter coefficients corresponding to the input LPC prediction residual and pitch period, and sends them together with the pitch period to the long-term prediction filter 41 in the speech synthesis filter.

[0046] The long-term prediction coefficient codebook 33 is assumed to hold a table relating various combinations of LPC prediction residual and pitch period to the corresponding filter coefficients.

[0047] In addition, the perceptual weighting filter section 62 is given a masking characteristic corresponding to the input LPC coefficients. In this way, speech information parameters such as filter coefficients and the pitch period are obtained from the first adaptor 22, the second adaptor 32 and the pitch period extraction section 31.

[0048] Meanwhile, sound source pattern #0 (one vector long) is taken from the VQ codebook 51. The gain section 52 applies to it a gain corresponding to the output of the gain adaptor 53 and sends it to the speech synthesis filter 4, which consists of the long-term prediction filter 41 and the short-term prediction filter 42.

[0049] The sound source pattern passes through the speech synthesis filter 4 to produce one vector of synthesized speech signal, which is supplied to the error calculation section 61. The error calculation section 61 also receives the corresponding one vector of the input speech signal stored in the vector buffer 12. It computes the error between the two vectors and passes the result through the perceptual weighting filter 62 to the minimum squared error calculation section 63.

[0050] The minimum squared error calculation section 63 performs the minimum squared error calculation; this is repeated Q times to find the index of the VQ codebook 51 that minimizes the error power. The VQ codebook index, the long-term prediction coefficient codebook index, the pitch period and the reflection coefficient codebook index obtained in this way are sent to the decoder as speech information parameters.

[0051] In FIG. 1, the short-term prediction filter coefficients are generated from the input speech signal and are sent to the decoder. In FIG. 2, by contrast, the short-term prediction filter coefficients are generated from the synthesized speech signal and are not sent to the decoder; the decoder generates them itself from the decoded speech signal.

[0052] This reduces the amount of information that must be sent to the decoder and lowers the transmission rate further. Next, because speech samples are correlated with one another, a future speech waveform can be predicted from the past over a short interval (a few milliseconds).

[0053] In the present invention, therefore, as shown in FIG. 3, the input speech signal of frame #(N-1) and earlier (the hatched portion) is used as the input required for the LPC analysis that precedes encoding of the input speech signal of frame #N.

[0054] The LPC analysis uses an analysis window with the characteristic shown in FIG. 4. This characteristic is the same as the response waveform that appears when an impulse is applied to a filter with a certain transfer function, given by the following expression.

[0055] H(z) = 1 / (1 − αz⁻¹)², where 0 < α < 1. Giving the LPC analysis window the characteristic shown in FIG. 4 weights the waveform close to the next frame more heavily, so that the analysis is not influenced by the distant past.
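The impulse response of this transfer function has the closed form h[n] = (n + 1)·αⁿ, which can be checked numerically. The Python sketch below computes it both from the difference equation and from the closed form; the value α = 0.5 and the length of 8 samples are arbitrary choices for illustration, and how h[n] is mapped onto past samples is not stated in this passage.

```python
def recursive_window(alpha, length):
    """Impulse response of H(z) = 1 / (1 - alpha*z^-1)^2 from its difference
    equation y[n] = x[n] + 2*alpha*y[n-1] - alpha^2*y[n-2]."""
    h = []
    for n in range(length):
        x = 1.0 if n == 0 else 0.0  # unit impulse input
        y1 = h[n - 1] if n >= 1 else 0.0
        y2 = h[n - 2] if n >= 2 else 0.0
        h.append(x + 2 * alpha * y1 - alpha * alpha * y2)
    return h


def closed_form_window(alpha, length):
    """Closed form of the same response: h[n] = (n + 1) * alpha**n."""
    return [(n + 1) * alpha ** n for n in range(length)]


recursive = recursive_window(0.5, 8)
closed = closed_form_window(0.5, 8)
print(recursive[:4], closed[:4])
```

Because the response decays geometrically for 0 < α < 1, samples far from the start of the response receive very little weight, which is consistent with the stated aim of limiting the influence of old input.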

[0056] This allows LPC analysis to be performed stably. In FIG. 3, the input speech signal of frame #(N-1) and earlier has already been LPC-analyzed during frame #(N-1), and the necessary speech information parameters have been obtained.

[0057] When these already obtained #(N-1) speech information parameters are applied to the input speech signal of frame #N for encoding, the speech coded information is available at frame #(N+1), so the coding delay is one frame.

[0058] Because less memory is needed, an encoder with a small hardware scale can be provided. Next, the means of the seventh to twelfth inventions are described with reference to FIG. 5.

[0059] There are various speech coding techniques. They fall broadly into waveform coding, which aims to represent the waveform faithfully, and analysis-synthesis coding, which analyzes the signal according to a speech production model, converts it to parameters, and synthesizes speech from those parameters.

[0060] ITU-T G.722 mentioned above is a kind of waveform coding. A characteristic of waveform coding is that each bit carries a considerable amount of information: the more bits are removed, the larger the share of the original speech information that is lost, and it is known to be unsuitable for low bit rates.

[0061] Analysis-synthesis coding, in contrast, sends a certain amount of signal as a single block, and is known to retain high quality down to relatively low bit rates. The present invention is based on CELP (Code Excited Linear Prediction), a representative technique of this kind.

【００６２】In general, CELP is effective as a low-bit-rate coding method for the telephone band. Besides G.728 (16 kbps) mentioned above, it has been realized in very-low-rate coding methods such as VSELP (Vector Sum Excited Linear Prediction; 6.7 kbps), a Japanese standard for mobile communication, and PSI-CELP (Pitch Synchronous Innovation CELP; 3.45 kbps).

【００６３】In outline, CELP extracts the characteristic parameters of the input speech signal by linear prediction analysis and uses them to configure a synthesis filter. A set of excitation signals prepared in advance (sound sources stored as a codebook) is passed one at a time through the synthesis filter to convert each into the speech-signal domain, the error against the input speech signal is evaluated, and the index of the vector giving the minimum error is transmitted as the code. This is what makes the coding highly efficient.
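
The search loop described in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the patent: the function names `synthesize` and `search_codebook`, the sign convention of the all-pole synthesis filter, and the toy codebook are all hypothetical, and gains, perceptual weighting, and the adaptive codebook are left out.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis filter 1/A(z) with A(z) = 1 + sum_k lpc[k] * z^-(k+1).

    Assumed convention: y[t] = e[t] - sum_k lpc[k] * y[t-k-1], zero initial state.
    """
    out = []
    for t, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if t - k - 1 >= 0:
                acc -= a * out[t - k - 1]
        out.append(acc)
    return out


def search_codebook(target, codebook, lpc):
    """Return (index, squared error) of the code vector whose synthesized
    output is closest to the target vector."""
    best_index, best_err = -1, float("inf")
    for j, code in enumerate(codebook):
        synth = synthesize(code, lpc)
        err = sum((x - s) ** 2 for x, s in zip(target, synth))
        if err < best_err:
            best_index, best_err = j, err
    return best_index, best_err
```

Only the winning index is transmitted; the decoder, holding the same codebook and filter, can regenerate the same synthesized vector.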

【００６４】The present invention proposes the configuration needed to handle speech signals wider than the telephone band by exploiting the highly efficient information compression of CELP. A related invention entitled "High-efficiency speech transmission method and high-efficiency speech transmission device" has already been filed by the present applicant, and the present invention is an improvement on it. The description below refers to FIG. 5.

【００６５】In the figure, vectorizing means 101 groups the PCM-coded input speech signal into vectors, and linear prediction analysis means 103 performs linear prediction analysis on the input speech signal sequence at a fixed period and calculates the reflection coefficients at that time.

【００６６】Filter coefficient calculation means 104 quantizes the reflection coefficients calculated by linear prediction analysis means 103 using reflection coefficient table 105, and then calculates the filter coefficients used by synthesis filter 106 and perceptual weighting filter 107.

【００６７】Adaptive codebook 109 stores past gain-adjusted excitation signals, that is, the synthesis filter input signals judged optimal in the past, and gain means 111 adjusts their gain.

【００６８】Fixed codebook 112 stores the driving excitation sources, and gain means 114 adjusts their gain. Adding means 115 adds the vectors output from gain means 111 and gain means 114.

【００６９】Difference means 102 takes the difference between the input vector and the vector synthesized by synthesis filter 106. Synthesis filter means 106 synthesizes a signal from the gain-adjusted excitation signal.

【００７０】Perceptual weighting filter 107 applies perceptual weighting to the difference vector between the input vector and the synthesized vector. Minimum error calculation means 108 evaluates the weighted error vector.

【００７１】The means of the decoder in FIG. 5 perform the same processing as the corresponding means of the encoder, so their description is omitted. The signal flow in FIG. 5 is as follows.

【００７２】Vectorizing means 101 groups the PCM-coded input speech signal into vectors of n samples each; such a vector is called a target vector. Linear prediction analysis means 103 performs linear prediction analysis on the same input speech signal once per frame and sends the result to filter coefficient calculation means 104. Filter coefficient calculation means 104 uses the analysis result to calculate the perceptual weighting filter coefficients and the synthesis filter coefficients and supplies them to the corresponding filters.

【００７３】In calculating the coefficients, the L reflection coefficients of orders 1 to L obtained by linear prediction analysis are each quantized with reflection coefficient table 105, which provides the required number of levels for each order, and the quantized values are then converted into filter coefficients.

【００７４】Adding means 115 adds two vectors: the gain-applied adaptive code vector, obtained by giving each output vector of adaptive codebook 109 its optimum gain in gain means 111, and the gain-applied fixed code vector, obtained by giving each vector of fixed codebook 112 its optimum gain in gain means 114.

【００７５】The sum output by adding means 115 is passed through synthesis filter 106, whose filter coefficients are updated every frame, which converts it into a vector in the weighted domain (hereinafter called the weighted signal), and the result is applied to difference means 102.

【００７６】Since the output of vectorizing means 101 (hereinafter called the target vector) is also applied to difference means 102, the error between the weighted signal and the target vector is calculated there.

【００７７】In the error calculation, the difference is first passed through perceptual weighting filter 107, which reflects the characteristics of human hearing, and minimum error calculation means 108 then evaluates the error. The combination of adaptive code vector and fixed code vector that minimizes the error is determined. The indices of those code vectors (the adaptive codebook index and the fixed codebook index) and the indices into codebook gain tables 110 and 113 (the adaptive codebook gain index and the fixed codebook gain index) are transmitted to the decoder as encoded information.

【００７８】In addition, in every frame period the quantization table indices of the reflection coefficients, obtained from reflection coefficient table 105 as a result of linear prediction analysis, are transmitted to the decoder as encoded information.

【００７９】The decoder separates the reflection coefficient indices, which are transmitted once per frame period, from the adaptive/fixed codebook indices and the adaptive/fixed codebook gain indices, which are transmitted for each vector.

【００８０】Then, in every frame period, the quantized reflection coefficients corresponding to the reflection coefficient table indices are read from reflection coefficient table 121, and filter coefficient calculation means 120 generates and updates the coefficients of the synthesis filter.

【００８１】For each vector, the entries corresponding to the received indices, including the quantized gains, are read from adaptive/fixed codebooks 123 and 126 and adaptive/fixed codebook gain tables 125 and 128. The vectors are gain-adjusted by gain adjusting means 124 and 127 and passed through synthesis filter 122, as in the encoder, to reproduce the speech signal.

【００８２】When a wideband speech signal is handled, it contains more formants, so the order L of the synthesis filter that reproduces the frequency characteristic is set to 16 to 20. To achieve a low rate, both the number of samples per vector and the number of vectors per frame are increased.

【００８３】In the present invention, a suitable number of samples per vector n is 30 to 40, and the number of vectors per frame m is about 8 to 10. CELP generally requires a large amount of computation, and many methods for reducing it have been proposed.

【００８４】The present invention provides computing means that reduces the number of convolution operations between the impulse response matrix of the synthesis filter and the adaptive code vector, which account for a particularly large share of the computation.

【００８５】[0085]

【発明の実施の形態】[Embodiments of the Invention] FIG. 6 is a functional diagram of an embodiment of the first to third aspects of the present invention (encoder), FIG. 7 is a diagram of the processing procedure of FIG. 6, FIG. 8 is a functional diagram of the decoder corresponding to the encoder of FIG. 6, and FIG. 9 is a functional diagram of an embodiment of the sixth aspect of the present invention (decoder).

【００８６】FIG. 10 is a functional diagram of an embodiment of the seventh to ninth and eleventh aspects of the present invention (encoder), FIG. 11 is a diagram of the processing procedure of FIG. 10, FIG. 12 is a functional diagram of an embodiment of the tenth and twelfth aspects of the present invention (decoder), and FIG. 13 is a diagram of the processing procedure of FIG. 12.

【００８７】FIG. 14 illustrates the data storage order: (a) shows forward-order storage, as in the conventional example, and (b) shows reverse-order storage, as in the embodiment of the seventh aspect of the present invention. FIG. 15 and FIG. 16 illustrate the amount of computation for the case of FIG. 14(a) (parts 1 and 2), and FIG. 17 and FIG. 18 illustrate the amount of computation for the embodiment of the seventh aspect of the present invention (parts 1 and 2).

【００８８】FIG. 19 illustrates a preliminary pitch search method for the adaptive codebook in which pitch extraction and evaluation are performed for every vector: (a) shows the search range, and (b) and (c) show the processing procedure. FIG. 20 illustrates the embodiment of the eighth aspect of the present invention, in which evaluation is performed only over a range of a few samples on either side of the pitch information of the preceding vector: (a) shows the search method of the present invention, and (b) and (c) show the processing procedure.

【００８９】The same reference numerals denote the same items throughout the drawings. Parts already described in detail above are described only briefly, and the parts specific to the present invention are described in detail. FIGS. 6 to 9 and FIGS. 10 to 18 are described below.

【００９０】FIGS. 6 to 9 are described first. In FIG. 6, vector buffer part 12 generates a speech vector signal S(n) of m consecutive samples (m is a positive integer) from the input speech signal (for example, 16-bit linear PCM speech data sampled at 16 kHz).

【００９１】First perceptual weighting filter 62a generates speech vector V(n) by applying perceptual weighting to the speech vector signal S(n). Long-term prediction filter 41 is a comb filter that raises the level of the frequency components at K times the fundamental pitch frequency of the speech (K is a positive integer), and short-term prediction filter 42 is a filter whose frequency characteristic follows the spectral envelope of the speech.

【００９２】Long-term prediction filter 41, short-term prediction filter 42, and second perceptual weighting filter 62b are connected in cascade, and the output of second perceptual weighting filter 62b is calculated for an input of 0, 0, ..., 0 applied to long-term prediction filter 41. This output is called the zero-input response signal.

【００９３】First difference calculation part 61a then computes v(n) - r(n) from the input speech signal v(n) perceptually weighted by first perceptual weighting filter 62a and the zero-input response signal r(n) from second perceptual weighting filter 62b, giving the difference output x(n) (hereinafter called the target vector).
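
The zero-input response and target vector computation can be sketched as follows. As a simplifying assumption, the cascade of long-term prediction filter, short-term prediction filter, and perceptual weighting filter is replaced here by a single all-pole filter with stored past outputs; the function names and the filter convention are hypothetical.

```python
def zero_input_response(lpc, memory, n):
    """Output of an all-pole filter for n zero-valued inputs.

    `memory` holds past outputs, most recent first, and must have the same
    length as `lpc`. Assumed convention: y[t] = -sum_k lpc[k] * y[t-k-1].
    """
    hist = list(memory)
    out = []
    for _ in range(n):
        y = -sum(a * h for a, h in zip(lpc, hist))
        out.append(y)
        hist = [y] + hist[:-1]  # shift the newest output into the history
    return out


def target_vector(v, r):
    """x(n) = v(n) - r(n): weighted input minus zero-input response."""
    return [vi - ri for vi, ri in zip(v, r)]
```

Subtracting the ringing of the filters from the weighted input lets the codebook search compare each candidate against only the part of the signal that the current excitation can still influence.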

【００９４】The index into VQ codebook 51 is determined so as to minimize the error between this target vector x(n) and the synthesized speech signal Xj generated from an excitation pattern of VQ codebook 51.

【００９５】As stated above, VQ codebook 51 holds Q excitation patterns prepared in advance, and the VQ codebook index is searched by the following procedure. Let the one-vector excitation pattern taken from the j-th index of VQ codebook 51 (hereinafter called a code vector) be yj, where yj = (yj(0), yj(1), ..., yj(n-1)).

【００９６】In gain part 52, code vector yj is given a gain σ corresponding to the predicted amplitude of the input signal. It is then applied to impulse response filter 44, to which the output of impulse response vector part 43 has been applied, and is filtered there.

【００９７】Impulse response vector part 43 calculates an impulse response vector from the perceptual weighting filter coefficients, short-term prediction filter coefficients, long-term prediction filter coefficients, and pitch period obtained from the LPC analysis, and applies this response vector to impulse response filter 44.

【００９８】Impulse response filter 44 therefore has the same response as the cascade of long-term prediction filter 41, short-term prediction filter 42, and perceptual weighting filter 62b when an impulse is applied to it. Let the response signal for one vector be h(0), h(1), ..., h(n-1).

【００９９】Impulse response filter 44 can thus be represented as an FIR filter whose filter coefficients are h(0), h(1), ..., h(n-1).

【０１００】Let xj = (xj(0), xj(1), ..., xj(n-1)) be the response of impulse response filter 44 when the code vector with its optimum gain is applied. Then xj is given by equation (1).

【０１０１】xj = H・σyj    (1)
Here, the transfer function H (impulse response matrix H) is expressed by the following matrix.

【０１０２】[0102]

【数１】[Equation 1] The first row gives the filter output at time t₀, the second row the output at t₁, and so on; "σyj" denotes the input.

【０１０３】The index j of the VQ code vector is determined so that the squared error between the target vector x(n) and xj is minimized. If Dj denotes the squared error between x(n) and xj when the j-th code vector is selected, then
Dj = |x(n) - xj|² = |x(n) - H・σyj|²
This calculation is performed by second error calculation part 61b and least-squares error calculation part 63.
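
The calculation of xj = H·σyj and Dj can be sketched as follows. The matrix of Equation 1 is not reproduced in this text, so the sketch assumes the usual lower-triangular Toeplitz form for an FIR filter with coefficients h(0), ..., h(n-1), which is consistent with the statement that the first row gives the output at t₀. All function names are hypothetical.

```python
def impulse_response_matrix(h):
    """Assumed form of H: lower-triangular Toeplitz, H[row][col] = h(row - col)."""
    n = len(h)
    return [[h[row - col] if row >= col else 0.0 for col in range(n)]
            for row in range(n)]


def filtered_codevector(H, sigma, y):
    """xj = H * (sigma * yj), computed row by row."""
    return [sigma * sum(hrc * yc for hrc, yc in zip(row, y)) for row in H]


def best_index(x, H, sigma, codebook):
    """Return (j, Dj) minimizing Dj = |x - H * sigma * yj|^2."""
    errors = []
    for y in codebook:
        xj = filtered_codevector(H, sigma, y)
        errors.append(sum((a - b) ** 2 for a, b in zip(x, xj)))
    j = min(range(len(errors)), key=errors.__getitem__)
    return j, errors[j]
```

Because H depends only on the filter coefficients, it can be built once per coefficient update and reused for every code vector in the search.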

【０１０４】The error is calculated for every code vector, and the codebook index giving the smallest value (the optimum index) is determined. LPC analysis part 11 performs LPC analysis (linear prediction analysis) on the input speech signal that has passed through the LPC analysis window described above. It sends the resulting perceptual weighting filter coefficients to first and second perceptual weighting filters 62a and 62b, and sends as many reflection coefficients as the order of short-term prediction filter 42 to reflection coefficient quantization part 23 and pitch period extraction part 31 (when the short-term prediction filter is of order N, N reflection coefficients are generated).

【０１０５】Reflection coefficient quantization part 23 refers to reflection coefficient table 24, prepared in advance, quantizes each of the N applied reflection coefficients to the nearest table value, and sends the result to short-term prediction filter coefficient generation part 25. The index into reflection coefficient table 24 that was optimum at quantization becomes transmitted information.
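
The nearest-value quantization described here can be sketched as follows. The table contents and the function name are illustrative assumptions; the patent does not give the actual values of reflection coefficient table 24.

```python
def quantize_reflection(coeffs, tables):
    """Quantize each reflection coefficient to the nearest entry of its table.

    `tables[i]` is the (hypothetical) list of quantization levels for the
    coefficient of order i + 1. Returns (indices, quantized values); the
    indices are what would be transmitted.
    """
    indices, quantized = [], []
    for k, table in zip(coeffs, tables):
        idx = min(range(len(table)), key=lambda i: abs(table[i] - k))
        indices.append(idx)
        quantized.append(table[idx])
    return indices, quantized
```

The decoder holds the same tables, so it can recover the quantized coefficients from the indices alone.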

【０１０６】Short-term prediction filter coefficient generation part 25 converts the quantized reflection coefficients into short-term prediction filter coefficients, which are used in the encoding process. Meanwhile, pitch period extraction part 31 uses the input speech signal stored in vector buffer 12 and the analysis result from LPC analysis part 11 to extract the fundamental pitch period of the input speech signal and to generate the LPC prediction residual signal, and sends them to long-term prediction coefficient quantization part 35.
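
The conversion from reflection coefficients to short-term prediction filter coefficients is commonly done with the step-up recursion, sketched below. The patent does not name the conversion it uses, so this is an assumption, as are the function name and the sign convention A(z) = 1 + a1·z^-1 + ... + aN·z^-N.

```python
def reflection_to_lpc(reflection):
    """Step-up recursion: reflection coefficients k_1..k_N -> a_1..a_N.

    For each order m the update is
        a_j(m) = a_j(m-1) + k_m * a_{m-j}(m-1)   for j = 1..m-1
        a_m(m) = k_m
    The returned list holds a_1..a_N (the leading 1 of A(z) is omitted).
    """
    a = []
    for i, k in enumerate(reflection):
        a = [a[j] + k * a[i - 1 - j] for j in range(i)] + [k]
    return a
```

Quantizing reflection coefficients rather than the filter coefficients themselves is a common choice because the synthesis filter stays stable as long as every reflection coefficient has magnitude below 1.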

【０１０７】Long-term prediction coefficient quantization part 35 calculates the long-term prediction filter coefficients from the fundamental pitch period and the LPC prediction residual signal, and then quantizes the calculated coefficients to the nearest of the quantized values stored in advance in long-term prediction filter table 34.

【０１０８】The long-term prediction filter is represented by a third-order transfer function and has three filter coefficients. In the present invention the three filter coefficients are treated as one vector, and long-term prediction filter coefficient table 34 is searched for the entry whose error against this vector is smallest. The long-term prediction filter coefficients quantized here are used in the encoding process.
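
Treating the three long-term prediction coefficients as one vector and searching the table for the closest entry can be sketched as follows. The squared-error distance, the table values, and the function name are assumptions made for illustration.

```python
def quantize_ltp(coeffs, table):
    """Return (index, entry) of the table vector closest to `coeffs`
    in squared Euclidean distance (assumed error measure)."""
    def distance(entry):
        return sum((c - e) ** 2 for c, e in zip(coeffs, entry))

    idx = min(range(len(table)), key=lambda i: distance(table[i]))
    return idx, table[idx]
```

Quantizing the three taps jointly needs only one index per update, rather than one index per coefficient.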

【０１０９】The encoded speech information generated by this processing (the VQ codebook index, the reflection coefficient indices, and the long-term prediction filter coefficient index) is converted by the multiplexing part into the format used for sending it to the line side.

【０１１０】A detailed description of the processing procedure shown in FIG. 7 would duplicate the description of the operation of FIG. 6 above and is omitted. The following explains which part of FIG. 6 performs each step.

【０１１１】In the processing procedure of FIG. 7, one frame consists of a first vector and a second vector. Steps 1 (S1) to 8 (S8) are the processing applied to the first and second vectors, and step 10 (S10) is the preparation for the next round of processing (LPC analysis), performed once processing through the second vector has finished.

【０１１２】Accordingly, the procedure does not move to step 10 until the processing of the first and second vectors has finished. In step 1, impulse response vector part 43 performs initialization by calculating the impulse response from the supplied long-term prediction filter coefficients, short-term prediction filter coefficients, perceptual weighting filter coefficients, and pitch period.

【０１１３】In step 2, vector buffer part 12 vectorizes the input speech signal to generate the m-sample speech vector signal s(n). In step 3, perceptual weighting filter 62a applies perceptual weighting to the speech vector signal obtained in step 2.

【０１１４】In step 4, the zero-input response (ZIR) is calculated for the cascade of long-term prediction filter 41, short-term prediction filter 42, and perceptual weighting filter 62b. That is, 0, 0, 0, ... is applied to the long-term prediction filter, the filter characteristic (response state) of each filter is calculated, and the result is stored in the memory built into each filter.

【０１１５】In step 5, the target vector x(n) is obtained as the difference between the output V(n) of perceptual weighting filter 62a and the output r(n) of perceptual weighting filter 62b. In step 6, the code vector taken from VQ codebook 51 is passed through amplification part 52 and impulse response filter part 44 to produce the response vector xj, and the error between xj and the target vector x(n) is calculated to obtain the comparison vector.

【０１１６】In step 7, the VQ codebook index that gives the least-squares error is searched for using the loop VQ codebook 51 → impulse response filter 44 → error calculation part 61b → least-squares error calculation part 63 → VQ codebook 51.

【０１１７】In step 8, once the encoding process has found the optimum index of the VQ codebook, the code vector at that index is applied through gain part 52 to long-term prediction filter 41, short-term prediction filter 42, and perceptual weighting filter 62b, and the response of these filters is calculated. This yields the "zero-state response".

【０１１８】Before the calculation of step 8 is carried out, the result of the calculation in step 4 is temporarily saved and the memory contents are cleared to "0". The "zero-state response" result and the saved memory contents are then added together to update the memory contents of each filter. This update must be performed every time.
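
The memory update of steps 4 and 8 relies on the linearity of the filters: the state after a zero-input pass plus the state after a zero-state pass equals the state after filtering the excitation directly from the original memory. The sketch below checks this for a single all-pole filter, which stands in for the cascade as a simplifying assumption; the names `run_filter` and `update_memory` are hypothetical.

```python
def run_filter(lpc, memory, excitation):
    """All-pole filter with explicit state.

    `memory` holds past outputs, most recent first, with len(memory) == len(lpc).
    Assumed convention: y[t] = e[t] - sum_k lpc[k] * y[t-k-1].
    Returns (outputs, final memory).
    """
    hist = list(memory)
    out = []
    for e in excitation:
        y = e - sum(a * h for a, h in zip(lpc, hist))
        out.append(y)
        hist = [y] + hist[:-1]
    return out, hist


def update_memory(lpc, memory, excitation):
    """Step 4: zero-input pass from the current memory (state saved).
    Step 8: zero-state pass of the chosen excitation from cleared memory.
    The new filter memory is the sum of the two final states."""
    n = len(excitation)
    _, saved = run_filter(lpc, memory, [0.0] * n)
    _, zero_state = run_filter(lpc, [0.0] * len(memory), excitation)
    return [s + z for s, z in zip(saved, zero_state)]
```

With `lpc = [-0.5]`, `memory = [2.0]`, and excitation `[1.0, 0.0]`, `update_memory` gives the same final state as calling `run_filter` once with the original memory.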

【０１１９】In step 9, if processing has not yet finished through the second vector, the procedure returns to step 2; once it has finished, the procedure moves to step 10. In step 10:
a. "LPC analysis of the input data of the previous frame" and "adaptation of the perceptual weighting filter coefficients" are handled by LPC analysis part 11;
b. "adaptation of the short-term prediction filter coefficients" is handled by reflection coefficient quantization part 23, reflection coefficient table 24, and short-term prediction filter coefficient generation part 25;
c. "pitch period extraction" is handled by pitch period extraction part 31;
d. "adaptation of the long-term prediction filter coefficients" is handled by long-term prediction coefficient quantization part 35 and long-term prediction filter coefficient table 34;
e. "adaptation of the gain" is handled by gain adaptor 53;
f. "impulse response calculation" is handled by impulse response vector part 43.

【０１２０】In FIG. 8, separation part 71 separates the received encoded speech information into the index of VQ codebook 51, the index of reflection coefficient table 26, the index of long-term prediction filter coefficient table 36, and the pitch period.

【０１２１】Using the separated indices, the corresponding code vector is obtained from VQ codebook 51, the corresponding reflection coefficients from reflection coefficient table 26, and the corresponding long-term prediction filter coefficients from long-term prediction filter coefficient table 36.

【０１２２】Short-term prediction filter coefficient generation part 25 generates the short-term prediction filter coefficients from the supplied reflection coefficients. The short-term prediction filter coefficients are given to short-term prediction filter 42, and the long-term prediction filter coefficients are given to long-term prediction filter 41, to which the pitch period has also been supplied.

【０１２３】Meanwhile, the code vector is given, in gain part 52, the optimum gain indicated by gain adaptor 53, and then passes through long-term prediction filter 41 and short-term prediction filter 42 to produce the decoded speech that is output.

【０１２４】In FIG. 9, separation part 71 separates the received encoded speech information into the index of VQ codebook 51, the index of long-term prediction filter coefficient table 36, and the pitch period.

【０１２５】Using the separated indices, the corresponding code vector is obtained from VQ codebook 51 and the corresponding long-term prediction filter coefficients from long-term prediction filter coefficient table 36.

【０１２６】The filter coefficients of short-term prediction filter 42 are obtained by LPC analysis, in adaptor 46, of the decoded speech signal that this filter output in the past. Gain adaptor 53 predicts the gain to give the current code vector from the VQ code vectors previously output by VQ codebook 51 and the gains applied to them.
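
Deriving the short-term prediction coefficients from past decoded output, so that no coefficient information has to be transmitted, can be sketched with an autocorrelation analysis followed by the Levinson-Durbin recursion. The patent does not specify the analysis method or windowing used by adaptor 46, so the algorithm choice, the function names, and the convention A(z) = 1 + a1·z^-1 + ... are assumptions.

```python
def autocorrelation(signal, order):
    """r[lag] = sum_t signal[t] * signal[t - lag] for lag = 0..order."""
    return [sum(signal[t] * signal[t - lag] for t in range(lag, len(signal)))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for a_1..a_order.

    Predictor convention (assumed): x_hat[t] = -sum_j a_j * x[t-j].
    Returns (coefficients, final prediction error energy).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate input: keep remaining coefficients at zero
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err
```

Because encoder and decoder both have access to the same past synthesized signal, running the same analysis on both sides yields identical coefficients without sending them.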

【０１２７】The code vector taken from VQ codebook 51 is given its optimum gain in gain part 52, then passes through the long-term prediction filter, to which the pitch period and long-term prediction filter coefficients have been supplied, and the short-term prediction filter, and is output as the decoded speech signal.

【０１２８】In short, according to the present invention the frame length and vector length are short, so the delay caused by encoding can be shortened. The buffer memory for storing the various data needed for encoding therefore becomes smaller and the hardware scale can be reduced, which contributes substantially to lowering the cost and power consumption of the speech coding apparatus as a whole.

【０１２９】Furthermore, because the short-term prediction filter coefficients are generated by LPC analysis of the synthesized speech signal output by the short-term prediction filter, the encoder no longer needs to transmit short-term prediction filter coefficient information to the decoder, which opens the possibility of a lower bit rate.

【０１３０】In addition, with this scheme the quality of the processed speech depends on the number of code vectors in the VQ codebook. Accordingly, in exchange for eliminating the short-term prediction filter coefficient information, the number of code vectors in the VQ codebook can be increased, improving sound quality at the same bit rate.

【０１３１】Next, FIGS. 10 to 20 are described. The basic processing procedure of FIGS. 10 to 13 is the same as that of FIGS. 6 to 8, which was described in detail above, so only an outline is given. The major change, as described later, is that an adaptive codebook is provided in place of the long-term filter to reproduce the fundamental pitch component.

【０１３２】The operation of FIG. 10 is outlined with reference to the processing procedure of FIG. 11. First, the program is initialized by applying "0", and then the impulse response vector generation unit 116 calculates the impulse response.

【０１３３】The impulse response vector generation unit 116 consists of a synthesis filter and a perceptual weighting filter connected in cascade and supplied with the synthesis filter coefficients and the perceptual weighting filter coefficients. An impulse is applied to this cascade and its output h(n) is obtained.

【０１３４】The impulse response filter 117 then convolves the impulse response vector h(n) with the code vector to obtain the zero-state response vector y(n) (see S1 and S2).

【０１３５】The following steps are then carried out: the input speech signal is vectorized in the vector buffer 101 (see S3); the input vector is perceptually weighted by the perceptual weighting filter 107b (see S4); the zero-input response vector r(n) of the perceptually weighted synthesis filter (106, 107a) is calculated and subtracted from the target vector V(n) output by the perceptual weighting filter 107b (see S5); the target vector x(n) is calculated in the subtraction unit 102a (see S6); the adaptive codebook 109 is searched for the pitch p, and the adaptive codebook index and adaptive codebook gain index are determined from the calculation of the optimum gain β; the fixed codebook 112 is searched for the code c, and the fixed codebook index and fixed codebook gain index are determined from the calculation of the optimum gain γ; the adaptive codebook 109 is updated (see S7 to S9); and the filter memories of units 106, 107a, 116 and 117 are updated (see S10). The program then determines whether frame processing is due (see S11).

【０１３６】This check is needed because the linear prediction analysis 103, the reflection coefficient quantization unit 104a and the filter coefficient generation unit 104b operate once per frame, whereas the other parts operate once per vector, and the program must distinguish the two cases.

【０１３７】If frame processing is due, the linear prediction analysis unit 103 performs linear prediction analysis of the input speech signal, the reflection coefficient quantization unit 104a quantizes the reflection coefficients and determines the reflection coefficient table index from the reflection coefficient table 105, the filter coefficient generation unit 104b generates the filter coefficients of the synthesis filter 106 and the perceptual weighting filters 107a and 107b, and the impulse response vector generation unit 116 generates h(n). This sequence is repeated (see S11 to S15).

【０１３８】The operation of the decoder shown in FIG. 12 is outlined with reference to the processing procedure of FIG. 13. In FIG. 13, the program is initialized as in the encoder and the transmitted code is received (see S1 and S2). The optimum adaptive code vector and the optimum adaptive code vector gain corresponding to the received adaptive codebook index and adaptive codebook gain index are then read from the adaptive codebook 109 and the adaptive codebook gain table 110, and the gain is supplied to the gain means 111 (see S3).

【０１３９】Likewise, the optimum fixed code vector and the optimum fixed code vector gain corresponding to the received fixed codebook index and fixed codebook gain index are read from the fixed codebook 112 and the fixed codebook gain table 113, and the gain is supplied to the gain means 114 (see S4).

【０１４０】The adaptive codebook 109 and the memory in the synthesis filter 106 are then updated and the output speech signal is extracted (see S5 and S6). If frame processing is due, the reflection coefficient quantization unit 104a reads the reflection coefficients corresponding to the received reflection coefficient table index from the reflection coefficient table and sends them to the filter coefficient generation unit 104b (see S7 and S8).

【０１４１】The filter coefficient generation unit 104b generates the synthesis filter coefficients from the reflection coefficients it receives; this is repeated (see S9). The generated synthesis filter coefficients are sent to the synthesis filter 106.
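One conventional way to turn reflection coefficients into direct-form filter coefficients is the step-up recursion, sketched below. The source does not say which conversion the filter coefficient generation unit 104b uses, so this is an assumed technique; the table contents, the sign convention, and the function names are hypothetical.

```python
# Hypothetical reflection coefficient table: index -> reflection coefficients.
REFLECTION_TABLE = {
    0: [0.5, 0.2],
    1: [0.9, -0.3],
}


def reflection_to_predictor(reflection):
    """Step-up recursion: k_1..k_p -> a_1..a_p.

    Assumed convention: s_hat[n] = sum_j a_j * s[n-j], with a_i = k_i at order i
    and a_j(new) = a_j - k_i * a_(i-j) for j = 1..i-1.
    """
    a = []
    for i, k in enumerate(reflection, start=1):
        # a[0..i-2] hold the order-(i-1) coefficients a_1..a_(i-1).
        updated = [a[j] - k * a[i - 2 - j] for j in range(i - 1)]
        updated.append(k)
        a = updated
    return a


def decode_filter_coefficients(table_index):
    """Look up the received index, then generate synthesis filter coefficients."""
    return reflection_to_predictor(REFLECTION_TABLE[table_index])


coefficients = decode_filter_coefficients(0)
```

Only the table index crosses the channel in this sketch; the decoder rebuilds the coefficients locally once per frame.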

【０１４２】Speech reproduction rests on the premise that a speech signal repeats with a certain period. In the encoder configuration of FIG. 6, the pitch period extraction unit 31 therefore extracts the pitch period from the output of the vector buffer and supplies it to the long-term prediction filter 41, which periodically adds a pitch component to the output vector of the gain section 52. The hardest part of this method, however, is that a suitable pitch period is difficult to obtain at the extraction step.

【０１４３】An adaptive codebook, by contrast, accumulates information on what signals have been output in the past. If a suitable signal exists among those stored, the pitch component can be synthesized by reading that signal out again.

【０１４４】The encoder shown in FIG. 10 is therefore provided with an adaptive codebook 109 in place of the long-term filter. The adaptive codebook 109 stores the optimum code vectors generated in the past in a buffer memory of a certain size (the pitch extraction range plus one vector length). One vector is cut out of the stored past data, its gain is adjusted with the optimum gain, and it is passed through the synthesis filter and the perceptual weighting filter to generate a signal in the weighted domain, namely the weighted signal r(n), whose difference from the input vector V(n) is evaluated.

【０１４５】The next cut-out position is then shifted by one sample and the same processing is repeated. The candidate with the smallest error among the difference signals is found; the position of that vector gives the pitch, the vector is taken as the optimum code vector, and its index is transmitted as coded information.
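The sliding search just described can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the synthesis and weighting filters are represented by a single truncated impulse response `h`, that the optimum gain is the usual least-squares value, and that the stored data are held oldest first. All names and the toy data are hypothetical.

```python
def zero_state_response(h, c):
    """y[n] = sum_{k=0..n} h[k] * c[n-k]: filter output with zero initial state."""
    return [sum(h[k] * c[n - k] for k in range(min(n, len(h) - 1) + 1))
            for n in range(len(c))]


def search_adaptive_codebook(history, h, target, min_lag, max_lag):
    """Try every lag, one sample apart, and keep the one with the least error.

    `history` holds past excitation samples, oldest first. Requires
    len(target) <= min_lag and max_lag <= len(history), so that every
    cut-out vector lies entirely inside the stored data.
    """
    size = len(target)
    best = None
    for lag in range(min_lag, max_lag + 1):
        start = len(history) - lag
        p = history[start:start + size]              # one vector cut out
        y = zero_state_response(h, p)                # weighted-domain signal
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        beta = sum(t * v for t, v in zip(target, y)) / energy   # optimum gain
        err = sum((t - beta * v) ** 2 for t, v in zip(target, y))
        if best is None or err < best[0]:
            best = (err, lag, beta)
    return best   # (error, lag, gain), or None if every candidate was silent


history = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0, 5.0, 3.0]
h = [1.0, 0.5, 0.25, 0.125]
# Build a target that is exactly twice the filtered vector found at lag 6.
target = [2.0 * v for v in zero_state_response(h, history[4:8])]
err, lag, beta = search_adaptive_codebook(history, h, target, 4, 10)
```

In this toy case the search recovers the lag and gain used to build the target; with real speech the minimum error is nonzero and the winning lag is what gets transmitted as the index.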

【０１４６】As shown by adaptive codebook y0 in FIG. 15, the adaptive codebook 109 stores the optimum code vectors generated in the past in a single horizontal row. One vector is cut out starting at position b, and the next cut-out takes one vector starting at position a, which is shifted by one sample.

【０１４７】The algorithm implemented by the functional blocks in FIG. 10 is as follows. The transfer function F(z) of the synthesis filter 106 is as follows.

【０１４８】[0148]

【数２】[Equation 2: transfer function F(z) of the synthesis filter 106] The transfer function of the perceptual weighting filter 107 is as follows.

【０１４９】[0149]

【数３】[Equation 3: transfer function of the perceptual weighting filter 107] The evaluation criterion D_p of the adaptive codebook search is as follows.

【０１５０】
$$D_p = \min \left| v_n - \beta H p \right|^2$$

where $H$ is the impulse response matrix. The evaluation criterion $D_c$ of the fixed codebook search, taking $v_n = v_n - \beta_{\mathrm{opt}} H p_{\mathrm{opt}}$, is

$$D_c = \min \left| v_n - \gamma H c \right|^2$$

where $H$ is again the impulse response matrix. Here, $z$ in the transfer function of the perceptual weighting filter is the delay element of the filter and $\gamma$ is a parameter. The $v_n - \beta H p$ part of the adaptive codebook criterion computes the difference between the input signal (target vector) and the optimum value. The fixed codebook criterion takes the $v_n$ that gave the minimum in the adaptive codebook search as a fixed value and uses it to find its own minimum, so the search is a two-stage process (two equations).
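The two-stage structure of these criteria can be sketched as below. The sketch assumes the matrix H acts as a zero-state convolution with a truncated impulse response `h` and that each gain is chosen in closed form by least squares; the candidate lists and all names are hypothetical.

```python
def zero_state_response(h, c):
    """H applied to c: y[n] = sum_{k=0..n} h[k] * c[n-k]."""
    return [sum(h[k] * c[n - k] for k in range(min(n, len(h) - 1) + 1))
            for n in range(len(c))]


def best_gain(target, y):
    """Gain g minimizing |target - g*y|^2, plus the resulting error."""
    energy = sum(v * v for v in y)
    g = sum(t * v for t, v in zip(target, y)) / energy if energy else 0.0
    err = sum((t - g * v) ** 2 for t, v in zip(target, y))
    return g, err


def search(target, candidates, h):
    """Return (index, gain, filtered vector) of the least-error candidate."""
    best = None
    for idx, c in enumerate(candidates):
        y = zero_state_response(h, c)
        g, err = best_gain(target, y)
        if best is None or err < best[0]:
            best = (err, idx, g, y)
    _, idx, g, y = best
    return idx, g, y


def two_stage_search(v, adaptive_candidates, fixed_candidates, h):
    # Stage 1: D_p = min |v - beta*H*p|^2 over the adaptive candidates.
    p_idx, beta, hp = search(v, adaptive_candidates, h)
    # Target update: v = v - beta_opt * H * p_opt.
    residual = [t - beta * y for t, y in zip(v, hp)]
    # Stage 2: D_c = min |v - gamma*H*c|^2 on the updated target.
    c_idx, gamma, _ = search(residual, fixed_candidates, h)
    return p_idx, beta, c_idx, gamma


h = [1.0, 0.5]
adaptive = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
fixed = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
v = [0.0, 2.0, 1.0, 3.0]     # equals 2*H*adaptive[1] + 3*H*fixed[1]
result = two_stage_search(v, adaptive, fixed, h)
```

Searching the two codebooks in sequence rather than jointly keeps the cost at the sum, not the product, of the two codebook sizes, at the price of a possibly suboptimal pair.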

【０１５１】The impulse response matrix H is as follows.

【０１５２】[0152]

【数４】[Equation 4: impulse response matrix H]

$h_n = \{h_0, h_1, h_2, \ldots, h_{n-1}\}$: impulse response sequence

Note that this impulse response matrix and the transfer functions above express the same thing; the matrix form is used for the sake of computation.

【０１５３】Next, FIGS. 14 to 18 are described. In general, when time-series data are stored in array elements, the oldest data are stored first, starting from the smallest array index. This makes it easy to relate the array to the subscripts used in the theoretical formulas of the algorithm.

【０１５４】For example, when the vectorization of the input speech signal in FIG. 10 is expressed as the vector $s(n) = |s_0, s_1, s_2, \ldots, s_{n-1}|$, the data are stored sequentially from $s_0$ to $s_{n-1}$ in order of arrival (this is called forward-order storage; see FIG. 14(a)).

【０１５５】In the present invention, as shown in FIG. 14(b), the order on the time axis and the order of the array indices are reversed. This makes the correspondence with the theoretical formulas harder to see, but it helps reduce the amount of computation when implemented on an actual processor, as described later.

【０１５６】When the impulse response filter 117 of FIG. 10 synthesizes the zero-state response vector y(n) from a code vector, it convolves the impulse response matrix H with the code vector y0 read from the adaptive codebook 109, as shown in FIG. 15.

【０１５７】As described above, the adaptive codebook 109 cuts out one vector at a time while shifting through the data in the array one sample at a time. The convolution is performed on each of these vectors in turn, and to reduce the computation only the difference from the previous vector's calculation is computed.

【０１５８】When the conventional arrangement is used, in which the oldest data are stored first from the smallest array index so that the time series and the array elements have the same order, the convolution proceeds as shown in FIG. 16. Take one vector = 10 samples as an example and calculate in order over the range i = 0 to (n-1). When calculating i = (n-10), for instance, the part of i = (n-11) inside the dotted line can be reused, so the leftmost column is subtracted from each row (a difference calculation) and only the bottom row of i = (n-10) is calculated anew (a difference calculation).

【０１５９】Likewise, when calculating i = (n-9), the part of i = (n-10) inside the dotted line can be reused, so the leftmost column is subtracted from each row and only the bottom row of i = (n-9) is calculated. The operation count for this step is as follows. Leftmost column (difference calculation): 9 multiplications and 9 subtractions. Bottom row (difference calculation): 10 multiplications and 10 additions. A total of 38 arithmetic operations is therefore required. Moreover, the optimum codebook index selected at that point must be computed from the cut-out position of the vector as (n-i), where n is the current position, which is not intuitive.

【０１６０】In contrast, the storage order is reversed as shown in FIG. 17. In this case array element y0_0 is not used, and the calculation at the same position as above is considered. When part of i = 10 is reused in calculating i = 11, only the contribution of the leftmost column needs to be added, as shown in FIG. 18.

【０１６１】The operation count here is only 19: for the leftmost column, 10 multiplications and 9 additions. Also, because the current position is n = 0, the optimum adaptive codebook index is simply the cut-out position i of the vector, which is easy to understand intuitively.
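The 19-operation update can be illustrated with a short sketch. It assumes the convolution is the zero-state response y[n] = Σ h[k]·c[n-k] and that moving one step into the past prepends one older sample to the cut-out vector while dropping its last sample; under those assumptions each new output is the previous output shifted by one plus a single product. The function names and toy data are hypothetical, and the list here is held oldest first purely for readability.

```python
def zero_state_response(h, c):
    """Direct computation: y[n] = sum_{k=0..n} h[k] * c[n-k]."""
    return [sum(h[k] * c[n - k] for k in range(min(n, len(h) - 1) + 1))
            for n in range(len(c))]


def next_older_response(h, y_prev, x_new):
    """Update y when the cut-out window moves one sample into the past.

    The new code vector is [x_new] + previous[:-1], so
    y_new[0] = h[0] * x_new and y_new[n] = y_prev[n-1] + h[n] * x_new.
    For a vector of N samples this costs N multiplications and N-1 additions.
    """
    y = [h[0] * x_new]
    for n in range(1, len(y_prev)):
        y.append(y_prev[n - 1] + h[n] * x_new)
    return y


history = [3, -1, 4, 1, -5, 9, 2, -6]     # past excitation, oldest first
h = [1, 2, -1, 3]
size = 4
y = zero_state_response(h, history[-size:])         # most recent vector first
for lag in range(size + 1, len(history) + 1):
    y = next_older_response(h, y, history[-lag])    # then step into the past
```

With N = 10 the update costs 10 multiplications and 9 additions, matching the count of 19 given above, and the loop variable `lag` is directly the cut-out position, with no (n-i) conversion.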

【０１６２】Note that in FIG. 16 the calculation could instead proceed from large values of i to small values (from back to front), in the order (3) → (2) → (1) of FIG. 16. For example, the data inside the dotted line of (2) can be reused in processing (1), giving processing equivalent to FIG. 18. However, computing the optimum codebook index still requires the (n-i) calculation, so the same processing as in the conventional example remains necessary. Another way to reduce the computation of the adaptive codebook exploits the strong correlation between consecutive pitch values: instead of searching the whole pitch extraction range every time, the pitch position is predicted in advance and only a few samples around it are searched.

【０１６３】The commonly used method calculates the pitch from the correlation of the linear prediction residual waveform of the input signal (as for the long-term prediction filter) and uses it as a pitch candidate for the adaptive codebook.

【０１６４】In that case, the preliminary pitch search requires additional components, such as an inverse filter to obtain the residual waveform from the input signal and correlation calculation means, and the configuration becomes complicated. In the present invention, pitch candidates are set and successively updated within the adaptive codebook search itself, so no additional blocks are needed. This method is described with reference to FIGS. 19 and 20.

【０１６５】FIG. 19 shows the method in which the pitch is evaluated over the full range for every vector; the adaptive codebook search is performed for each vector. Let the number of vectors in one frame be m and number the vectors 1 to m. For each vector, the data in the adaptive codebook at that time are used to search the whole pitch extraction range, yielding the optimum pitches P1 to Pm. In this case the amount of search computation is the same for every vector, because the whole range is always searched (see FIGS. 19(a) to 19(c)).

【０１６６】It is known that the pitch period of speech is in general strongly correlated between consecutive segments and stays almost constant. That is, P1 and P2, P2 and P3, and so on often have nearly the same value. A method that exploits this property to limit the search range and reduce the computation is described next.

【０１６７】In FIG. 20, the first vector is searched over the whole pitch range as above to obtain P1. For the second vector, P2 is assumed to lie near P1, so the search is limited to the range P1 ± K and P2 is obtained.

【０１６８】K can be chosen, for example, from the average or maximum of the differences between consecutive values of P1 to Pm obtained as described for FIG. 19, evaluated over a number of speech samples.

【０１６９】For the third and subsequent vectors, the search is likewise performed within ± K of the previous vector's pitch, which reduces the computation (see FIGS. 20(a) to 20(c)). The full-range search within a frame need not be limited to once, at the first vector; it may be performed several times, for example once every two vectors or once every three vectors. This increases the number of operations but raises the accuracy of the pitch search and so improves performance.
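The restricted search can be sketched as follows. The per-vector error is passed in as a callable so the sketch stays independent of the filtering details; the lag range of 20 to 147, the value K = 3, the `full_every` parameter, and all names are assumptions for illustration, not values taken from the source.

```python
def track_pitch(error_fns, min_lag, max_lag, k, full_every=None):
    """Search the pitch for each vector of a frame.

    error_fns  : one callable per vector, mapping a lag to its search error.
    k          : half-width of the restricted range around the previous pitch.
    full_every : if given, repeat the full-range search every `full_every`
                 vectors; otherwise only the first vector is searched fully.
    Returns (pitches, number_of_lags_evaluated).
    """
    pitches = []
    evaluated = 0
    for i, err in enumerate(error_fns):
        full = i == 0 or (full_every is not None and i % full_every == 0)
        if full:
            lo, hi = min_lag, max_lag
        else:
            lo = max(min_lag, pitches[-1] - k)
            hi = min(max_lag, pitches[-1] + k)
        pitches.append(min(range(lo, hi + 1), key=err))
        evaluated += hi - lo + 1
    return pitches, evaluated


# Toy frame of m = 4 vectors whose true pitch drifts slowly.
true_pitch = [40, 41, 39, 40]
error_fns = [lambda lag, t=t: (lag - t) ** 2 for t in true_pitch]

restricted, cost_restricted = track_pitch(error_fns, 20, 147, 3)
full, cost_full = track_pitch(error_fns, 20, 147, 3, full_every=1)
```

In this toy frame both runs find the same pitches, but the restricted run evaluates far fewer lags; setting `full_every` to 2 or 3 trades some of that saving for robustness when the pitch jumps by more than K.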

【０１７０】The above keeps a single pitch candidate, but it is also possible to keep several (two or more) candidates and search within ± K of each. The search for each candidate is the same as in FIG. 20; it suffices to regard P1, for example, as existing in multiple instances P1-1, P1-2, and so on.

【０１７１】[0171]

[Effects of the Invention] As described in detail above, according to the present invention the delay caused by coding can be shortened, so the buffer memory that stores the various data needed for the coding process becomes smaller and the hardware scale can be reduced.

【０１７２】Furthermore, because the short-term prediction filter coefficients are generated by LPC analysis of the synthesized speech signal output by the short-term prediction filter, the encoder no longer needs to transmit short-term prediction filter coefficient information to the decoder. A lower bit rate or higher sound quality thus becomes possible, improving the performance of the high-efficiency speech transmission device.

【０１７３】In addition, adopting the coding scheme of the present invention makes it possible to provide wideband speech at a lower transmission rate than the conventional example, giving a more natural environment with a greater sense of presence. Also, reversing the processing order of the convolution reduces the amount of computation it requires, so the scale of the implementing hardware can be reduced.

[Brief description of drawings]

【図１】FIG. 1 is a functional diagram of the main part of the speech encoder according to the first to third aspects of the present invention.

【図２】FIG. 2 is a functional diagram of the main part of the speech encoder according to the fourth aspect of the present invention.

【図３】FIG. 3 is a diagram explaining the processing timing of FIG. 1.

【図４】FIG. 4 is a diagram explaining the LPC analysis window.

【図５】FIG. 5 is a functional diagram of the main part of the speech encoder and decoder according to the seventh to twelfth aspects of the present invention.

【図６】FIG. 6 is a functional diagram of an embodiment of the first to third aspects of the present invention (encoder).

【図７】FIG. 7 is a diagram explaining the processing procedure of FIG. 6.

【図８】FIG. 8 is a functional diagram of the decoder corresponding to the encoder of FIG. 6.

【図９】FIG. 9 is a functional diagram of an embodiment of the sixth aspect of the present invention (decoder).

【図１０】FIG. 10 is a functional diagram of an embodiment of the seventh to ninth and eleventh aspects of the present invention (encoder).

【図１１】FIG. 11 is a diagram explaining the processing procedure of FIG. 10.

【図１２】FIG. 12 is a functional diagram of an embodiment of the tenth and twelfth aspects of the present invention (decoder).

【図１３】FIG. 13 is a diagram explaining the processing procedure of FIG. 12.

【図１４】FIG. 14 is a diagram explaining the data storage order: (a) shows forward-order storage, the conventional case, and (b) shows reverse-order storage, the case of the seventh aspect of the present invention.

【図１５】FIG. 15 is a diagram (part 1) explaining the amount of computation in the case of FIG. 14(a).

【図１６】FIG. 16 is a diagram (part 2) explaining the amount of computation in the case of FIG. 14(a).

【図１７】FIG. 17 is a diagram (part 1) explaining the amount of computation in the embodiment of the seventh aspect of the present invention.

【図１８】FIG. 18 is a diagram (part 2) explaining the amount of computation in the embodiment of the seventh aspect of the present invention.

【図１９】FIG. 19 is a diagram explaining the preliminary pitch search method for the adaptive codebook: (a) explains the search method, and (b) and (c) explain the processing procedure.

【図２０】FIG. 20 is a diagram explaining an embodiment of the eighth aspect of the present invention: (a) explains the search method, and (b) and (c) explain the processing procedure.

【図２１】FIG. 21 is an example of the main-part configuration of a speech signal generation model.

【図２２】FIG. 22 is a diagram explaining vectors and frames.

【図２３】FIG. 23 is a diagram explaining the processing timing of a conventional encoder.

[Explanation of symbols]

11, 103  Linear prediction (LPC) analysis
12, 101  Vector buffer
21  Reflection coefficient codebook
22  Adaptor
23, 104a  Reflection coefficient quantization unit
24, 105  Reflection coefficient table
25  Short-term prediction filter coefficient generation
31  Pitch period extraction unit
32  Adaptor
33  Long-term prediction coefficient codebook
34  Long-term prediction filter coefficient table
35  Long-term prediction coefficient quantization
41  Long-term prediction filter
42  Short-term prediction filter
43  Impulse response vector
44, 117  Impulse response filter
51  VQ codebook
52, 111, 114  Gain unit
53  Gain adaptor
61  Error calculation section
62, 107  Perceptual weighting filter
63, 108  Minimum squared error calculation
109  Adaptive codebook
112  Fixed codebook
116  Impulse response vector generation unit
106  Synthesis filter
110  Adaptive codebook gain table
113  Fixed codebook gain table

Continuation of front page
(72) Inventor: Osahide Eguchi, c/o Fujitsu Kyushu Digital Technology Ltd., 3-22-8 Hakataekimae, Hakata-ku, Fukuoka-shi, Fukuoka
(72) Inventor: Takashi Ota, c/o Fujitsu Ltd., 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Masanao Suzuki, c/o Fujitsu Ltd., 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa

Claims

[Claims]

1. A high-efficiency speech transmission method characterized in that, before high-efficiency coding is performed on the input speech signal of an N-th frame (N is a positive integer), the input speech signal of the N-th frame is coded using speech information parameters generated by linear prediction analysis of the input speech signals of past frames including the (N-1)-th frame.

2. The high-efficiency speech transmission method characterized by using a linear prediction analysis window that weights the input speech signal more heavily for past frames that are closer to the N-th frame.

3. A high-efficiency speech encoder comprising: speech information generating means for generating speech information parameters by linear prediction analysis of an input speech signal; vector quantization codebook means that stores a plurality of noise signal patterns serving as sound sources and outputs an extracted noise signal pattern after applying a gain to it; speech synthesis filter means, having a long-term prediction filter and a short-term filter whose filter coefficients are the applied speech information parameters, for generating a synthesized speech signal from the output of the vector quantization codebook means; and minimum squared error calculation means that repeats the error calculation between the input speech signal and the generated synthesized speech signal to determine the sound source number giving the minimum error, characterized in that the input speech signal of the N-th frame is coded using speech information parameters generated by linear prediction analysis of the input speech signals of past frames including the (N-1)-th frame.

4. The high-efficiency speech encoder characterized in that speech information parameters generated by linear prediction analysis of the synthesized speech signal output in the past are used as the filter coefficients of the short-term prediction filter.

5. A high-efficiency speech decoder comprising: means for separating input coded speech information to extract a sound source number and a speech information parameter; vector quantization codebook means that extracts the noise signal pattern corresponding to the applied sound source number and outputs it with a gain applied; a long-term prediction filter having a filter characteristic corresponding to the applied speech information parameter; and a short-term prediction filter, the decoder being characterized in that the short-term prediction filter uses, as its filter coefficient, a speech information parameter generated by linear prediction analysis of a decoded speech signal output in the past.

6. A high-efficiency voice transmission device characterized in that it encodes and transmits an input speech signal using the high-efficiency speech coder according to claim 4, and receives and decodes an encoded speech signal using the high-efficiency speech decoder according to claim 5.

7. An arithmetic method characterized in that, when a synthesis filter performs a convolution operation of an impulse response matrix and a code vector, the operation sequence starts from the most recent code vector and is shifted sequentially toward past code vectors to carry out the convolution operation.

8. An arithmetic method characterized in that, when a pitch search of an adaptive codebook is performed in units of the m vectors (m is a positive integer) that constitute a frame, the first vector is searched over the entire range to obtain an optimum pitch P_1, and for each of the second through m-th vectors the optimum pitch is obtained by limiting the pitch search to a preset range centered on the optimum pitch obtained in the preceding pitch search.

9. A high-efficiency speech coding method that uses impulse response filter means for obtaining a zero-state response vector by convolving an added code vector, obtained by adding the code vectors taken from adaptive codebook means and fixed codebook means respectively, with an impulse response vector from impulse response vector generation means, and that searches for the combination of optimum code vectors minimizing the error between the zero-state response vector from the impulse response filter means and an input target vector, the method being characterized in that the convolution operation is performed using the arithmetic method according to claim 7 and the pitch search is performed using the arithmetic method according to claim 8.

10. A high-efficiency speech decoding method characterized by decoding a signal encoded by the high-efficiency speech coding method according to claim 9 to reproduce the original speech signal.

11. A high-efficiency speech coder characterized in that a convolution operation is performed using the arithmetic method according to claim 7 and a pitch search is performed using the arithmetic method according to claim 8.

12. A high-efficiency speech decoder configured to decode a signal encoded by the high-efficiency speech coding method according to claim 9 to reproduce the original speech signal.

13. A high-efficiency voice transmission device configured to encode and transmit an input speech signal using the high-efficiency speech coder according to claim 11, and to receive and decode an encoded speech signal using the high-efficiency speech decoder according to claim 12.
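
The restricted pitch search recited in claim 8 can be sketched as follows. This is only a minimal illustration under assumptions that do not come from the claims: the function names (`best_lag`, `search_frame`), the matching criterion (squared correlation divided by energy, with negative correlations rejected), the lag limits, and the half-width `delta` of the preset range are all hypothetical. The sketch also matches each vector directly against the past signal without passing it through a synthesis filter, and it assumes every candidate lag is at least one vector long. The first vector of the frame is searched over the entire lag range; each later vector is searched only within `delta` of the pitch found for the preceding vector.

```python
import numpy as np

def best_lag(target, history, lags):
    """Return the lag in `lags` whose delayed segment best matches `target`.

    Criterion (an assumption of this sketch): maximize corr^2 / energy,
    ignoring candidates with non-positive correlation.
    """
    n = len(target)
    best, best_score = lags[0], -np.inf
    for lag in lags:
        start = len(history) - lag
        cand = history[start:start + n]      # segment starting `lag` samples back
        energy = float(np.dot(cand, cand))
        corr = float(np.dot(target, cand))
        if energy <= 0.0 or corr <= 0.0:
            continue
        score = corr * corr / energy
        if score > best_score:
            best, best_score = lag, score
    return best

def search_frame(vectors, history, lag_min, lag_max, delta):
    """Pitch search over the m vectors of one frame.

    Vector 1 is searched over [lag_min, lag_max]; vectors 2..m are searched
    only within +/- delta of the pitch found for the preceding vector.
    """
    history = np.asarray(history, dtype=float)
    pitches = []
    for i, target in enumerate(vectors):
        if i == 0:
            lo, hi = lag_min, lag_max                  # full-range search
        else:
            lo = max(lag_min, pitches[-1] - delta)     # limited search
            hi = min(lag_max, pitches[-1] + delta)
        pitches.append(best_lag(target, history, range(lo, hi + 1)))
        # Assumption: the vector itself stands in for the updated excitation.
        history = np.concatenate([history, target])
    return pitches

# Hypothetical test signal: a random pattern repeating every 40 samples.
rng = np.random.default_rng(0)
signal = np.tile(rng.standard_normal(40), 5)           # 200 samples
past, frame = signal[:120], signal[120:]
vectors = [frame[k:k + 20] for k in range(0, 80, 20)]  # m = 4 vectors of 20 samples
pitches = search_frame(vectors, past, lag_min=20, lag_max=70, delta=5)
print(pitches)
```

With these hypothetical settings the first vector evaluates 51 candidate lags while each of the remaining three evaluates at most 11, which is where the reduction in computation comes from.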