JP2002244700A

JP2002244700A - Device and method for sound encoding and storage element

Info

Publication number: JP2002244700A
Application number: JP2001036406A
Authority: JP
Inventors: Atsushi Yamane; 淳山根
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2001-02-14
Filing date: 2001-02-14
Publication date: 2002-08-30

Abstract

PROBLEM TO BE SOLVED: To provide a speech encoding apparatus that encodes speech well regardless of whether background noise is present. SOLUTION: According to the determination result of a subframe discriminating means 107, an excitation codebook search means 105 selectively uses either an A-b-s excitation codebook search means 202 or a spectral-domain excitation codebook search means 203, so that the excitation codebook search is performed in the frequency domain when the speech contains much background noise.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

TECHNICAL FIELD OF THE INVENTION: The present invention relates to a speech encoding apparatus, a speech encoding method, and a storage element that stores an algorithm of the speech encoding method, and in particular to a speech encoding apparatus, a speech encoding method, and a storage element that are not affected by the presence or absence of background noise.

[0002]

DESCRIPTION OF THE RELATED ART: In recent years, as applications that use speech coding technology to encode speech signals at low bit rates, such as mobile phones, Internet telephony, and DSVD modems, have become widespread, the demand for higher-quality speech coding has grown. The mainstream speech coding scheme at 6 kbps to 16 kbps is CELP (Code Excited Linear Prediction coding), and schemes based on the CELP concept have come to be used in the various applications mentioned above and in various standards such as those for digital mobile phones.

[0003] CELP uses a source and vocal tract separation model of speech based on linear prediction. It quantizes the vocal tract information and vector-quantizes the excitation (sound source) information, thereby achieving low-bit-rate coding. Specifically, the input speech is first divided into frames, which are the processing units. Each frame is then analyzed, and formant parameters, which represent the formant components of the speech and correspond to the vocal tract information, are extracted and quantized.

[0004] Quantization of the excitation information is often performed on subframes, which are shorter than frames, and often in a two-stage process using two types of codebooks. The first stage is an adaptive codebook search using an adaptive codebook that represents the pitch information of the speech. The adaptive codebook is constructed adaptively for each subframe by generating, for each of a predetermined number of pitch values, a vector (adaptive code vector) obtained by repeating one pitch length of the excitation information up to the immediately preceding subframe until it reaches the subframe length. Each adaptive code vector is then passed through a vocal tract filter built from the quantized formant parameters to generate a synthesized vector. Next, the distance between the synthesized vector and the input subframe is measured, and the optimal adaptive code vector, that is, the optimal pitch, is determined as the one giving the shortest distance. This technique of analyzing by means of synthesized sound is called the "analysis-by-synthesis (A-b-s) method".

[0005] The second stage is an excitation codebook search (sometimes called a noise codebook search) using a fixed codebook, called an excitation codebook (sometimes called a noise codebook), that consists of a predetermined number of fixed vectors (excitation code vectors). Types of excitation codebook include trained codebooks built by training, random codebooks built from random numbers, algebraic codebooks built algebraically from a few pulses, and partial combinations of these. The A-b-s method is also generally used for the excitation codebook search. In the A-b-s method, the target signal for the excitation codebook search is first generated by subtracting, from the subframe vector, the synthesized vector obtained by passing the determined optimal adaptive code vector through the vocal tract filter. Next, each excitation code vector is passed through the vocal tract filter, in the same way as the adaptive code vectors, to generate a synthesized vector. The distance between the target signal for the excitation codebook search and each synthesized vector is then measured, and the optimal excitation code vector is determined as the one giving the shortest distance.

[0006] As described above, because CELP uses a source and vocal tract model of speech, it can encode human speech efficiently, but it has the drawback of handling signals other than human speech, such as music signals and background noise signals, poorly. Applications that use speech coding, such as the mobile phones mentioned above, are often used in real human environments, and countermeasures against background noise in particular are desired.

[0007] The following countermeasures against background noise in CELP have been proposed: (1) schemes that reduce the noise level of the input signal by noise suppression, (2) schemes that reduce the unnaturalness that arises when noise is CELP-encoded, and (3) schemes that switch the excitation codebook according to the nature of the input speech.

[0008] Examples of (1) include noise suppression using the SS (Spectrum Subtraction) method described in S. F. Boll, "Suppression of Acoustic Noise in Speech Using Spectral Subtraction", IEEE Trans. on ASSP, Vol. 27, No. 2, pp. 113-120 (1979), or using the Kalman filter method described in J. D. Gibson, B. Koo, S. D. Gray, "Filtering of Colored Noise for Speech Enhancement and Coding", IEEE Trans. on SP, Vol. 39, No. 8, pp. 1732-1741 (1991). These schemes are effective in reducing the noise level to some extent, but because they cannot remove the noise completely, they still cannot eliminate the unnaturalness caused by encoding noise with CELP.

[0009] Proposed examples of (2) include: a scheme described in Japanese Patent Application Laid-Open No. H11-242499 that uses the SS method to separate speech from background noise and encodes them separately; a scheme described in Omuro, Mano, "Improving the quality of speech with added background noise in low-bit-rate speech coding" (in Japanese), IEICE Technical Report, SP98-145 (1999), that estimates the background noise level and deliberately adds background noise to the decoded speech, thereby reducing the unnaturalness caused by encoding background noise with CELP; and a scheme described in Okazaki, Takahashi, "Improving the noise-segment quality of low-rate CELP with a post noise smoother" (in Japanese), Proceedings of the Acoustical Society of Japan, pp. 237-238 (March 1998), that smooths the reproduced speech in the spectral domain to reduce unnaturalness. These methods greatly reduce the unpleasant sound caused by CELP-encoding background noise and reduce the unnaturalness to some extent. However, because they approximate the background noise with random noise, they alter the background noise of the real environment, so that when the background noise carries meaning, for example for identifying a location, the necessary information is not conveyed.

[0010] An example of (3) is the scheme described in Japanese Patent Application Laid-Open No. H8-123493, which provides, as excitation codebooks, a multipulse-type codebook effective for voiced sounds and a random noise codebook effective for noise, unvoiced sounds, and the like, and which switches between them or varies their mixing ratio according to the nature of the input speech. In CELP, as described above, the codebook search is generally performed with the A-b-s method, which compares the synthesized sound with the target speech on the time axis. In the SS method, by contrast, the power of the estimated noise spectrum is subtracted from the magnitude of the discrete Fourier transform values of the input signal, and the phase information of the input signal is used as is, thereby removing the background noise. This suggests that, as a feature of the noise component, the spectral magnitude is more important than similarity on the time axis.
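The SS principle referred to above can be illustrated with a minimal sketch, which is not part of the original disclosure. It assumes a naive DFT, a per-bin noise magnitude estimate supplied by the caller, magnitude (rather than power) subtraction, and a floor at zero; the function names are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns complex samples."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def spectral_subtraction(noisy, noise_magnitude):
    """Subtract an estimated noise magnitude per bin, keeping the input phase."""
    cleaned = []
    for value, noise in zip(dft(noisy), noise_magnitude):
        magnitude = max(abs(value) - noise, 0.0)  # floor at zero (assumption)
        phase = cmath.phase(value)                # phase of the noisy input, unchanged
        cleaned.append(cmath.rect(magnitude, phase))
    return [sample.real for sample in idft(cleaned)]
```

Only the magnitudes are modified here, which is the property the paragraph above relies on when it argues that spectral magnitude matters more than time-axis similarity for noise.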

[0011]

PROBLEM TO BE SOLVED BY THE INVENTION: As described above, CELP using a trained excitation codebook is well suited to efficiently encoding human speech that contains no background noise or background music, but it suffers quality degradation when noise or music is added to the background. Conversely, when an untrained excitation codebook consisting of Gaussian noise is used, there is no degradation due to background noise or background music, but the quality is poor for speech coding in a noise-free environment. The present invention solves this problem by a relatively simple method. It uses a scheme that switches the excitation codebook according to the nature of the input speech: in the excitation codebook search, a trained excitation codebook is used when the background noise level is low, and when the background noise level is high the search is performed in the spectral domain without using the time-axis A-b-s method. The object of the invention is to provide a speech encoding method that encodes speech well both when background noise is absent and when it is present, and a speech encoding apparatus that implements the method.

[0012]

MEANS FOR SOLVING THE PROBLEM: To achieve the above object, the present invention provides a speech encoding apparatus that encodes a digital speech signal, comprising: frame dividing means for dividing the digital speech signal into a plurality of frames; subframe dividing means for dividing each frame into subframes shorter than the frame; formant parameter extracting means for extracting and encoding the formant parameters of the frame from the frame; adaptive codebook search means for extracting and encoding the pitch period of the subframe using the subframe and the formant parameters of the frame; excitation codebook search means for extracting and encoding the noise source component of the subframe using the subframe, the formant parameters of the frame, the pitch period, and an excitation codebook consisting of a plurality of excitation code vectors; and subframe discriminating means for determining the nature of the subframe. The excitation codebook search means further comprises analysis-by-synthesis (A-b-s) excitation codebook search means, which extracts and encodes the noise source component using the analysis-by-synthesis method, and spectral-domain excitation codebook search means, which extracts and encodes the excitation codebook search component in the spectral domain. The apparatus is characterized in that, according to the determination result of the subframe discriminating means, the excitation codebook search means selectively uses either the A-b-s excitation codebook search means or the spectral-domain excitation codebook search means. This makes it possible to provide a speech encoding apparatus that switches the excitation codebook search means by discriminating background noise and the like, and that encodes speech well both when background noise is absent and when it is present.

[0013] The invention also provides a speech encoding method for encoding a digital speech signal, comprising: a frame dividing step of dividing the digital speech signal into frames; a subframe dividing step of dividing each frame into subframes shorter than the frame; a formant parameter extracting step of extracting and encoding the formant parameters of the frame from the frame; an adaptive codebook search step of extracting and encoding the pitch period of the subframe using the subframe and the formant parameters of the frame; an excitation codebook search step of extracting and encoding the noise source component of the subframe using the subframe, the formant parameters of the frame, the pitch period, and an excitation codebook consisting of a plurality of excitation code vectors; and a subframe discriminating step of determining the nature of the subframe. The excitation codebook search step further comprises an A-b-s excitation codebook search step, which extracts and encodes the noise source component using the analysis-by-synthesis method, and a spectral-domain excitation codebook search step, which extracts and encodes the excitation codebook search component in the spectral domain. The method is characterized in that, according to the determination result of the subframe discriminating step, the excitation codebook search step selectively uses either the A-b-s excitation codebook search step or the spectral-domain excitation codebook search step. This makes it possible to provide a speech encoding method that switches the excitation codebook search step by discriminating background noise and the like, and that encodes speech well both when background noise is absent and when it is present.

[0014]

EMBODIMENTS OF THE INVENTION: A speech encoding apparatus and a speech encoding method according to the present invention are described in detail below with reference to the accompanying drawings.

[0015] FIG. 1 is a block diagram of an embodiment of the speech encoding apparatus according to claim 1 of the present invention. This speech encoding apparatus encodes a digital speech input signal based on CELP and comprises frame dividing means 101, subframe dividing means 102, formant parameter extracting means 103, adaptive codebook search means 104, excitation codebook search means 105, gain quantization means (gain codebook search means) 106, and subframe discriminating means 107. The excitation codebook search means 105 further comprises target signal calculating means 201, A-b-s excitation codebook search means 202, and spectral-domain excitation codebook search means 203.

[0016] The digital speech signal is divided by the frame dividing means 101 into processing units called frames. The frame length is, for example, 20 to 30 ms. The DC component may be removed by passing the signal through a high-pass filter before or after frame division. Each frame is further divided by the subframe dividing means 102 into processing units called subframes. The number of subframes per frame is, for example, 2 to 6.
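The frame and subframe division described above can be sketched as follows. This is a minimal illustration, not taken from the original disclosure; the 8 kHz sampling rate, the 20 ms frame, the choice of four subframes, and the function names are assumptions, and the optional high-pass filter is omitted.

```python
def split_frames(signal, frame_len):
    """Split a sample sequence into consecutive full-length frames (tail dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def split_subframes(frame, n_sub):
    """Split one frame into n_sub equal-length subframes."""
    if len(frame) % n_sub != 0:
        raise ValueError("frame length must be divisible by the subframe count")
    sub_len = len(frame) // n_sub
    return [frame[i:i + sub_len] for i in range(0, len(frame), sub_len)]


SAMPLE_RATE = 8000                    # Hz, assumed (not stated in the text)
FRAME_LEN = SAMPLE_RATE * 20 // 1000  # 20 ms -> 160 samples
signal = [0.0] * 400                  # placeholder input
frames = split_frames(signal, FRAME_LEN)
subframes = split_subframes(frames[0], 4)
```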

[0017] The frame is also analyzed by the formant parameter extracting means 103, and the formant parameters of the frame are extracted. Examples of formant parameters include LPC (Linear Prediction Coding) coefficients, LSP (Line Spectrum Pair) parameters, LSF (Line Spectrum Frequency) parameters, LPC cepstrum coefficients, and reflection coefficients. Linear prediction analysis is one method of extracting the formant parameters. One method of linear prediction analysis is to compute the autocorrelation function of the frame and obtain the LPC coefficients by the Levinson-Durbin recursion. A window function such as a Hamming window or a Hanning window may be applied to the frame before the autocorrelation function is computed.
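The autocorrelation method with the Levinson-Durbin recursion can be sketched as below. This is an illustrative sketch under stated assumptions, not the applicant's implementation: the sign convention A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p, the prediction order, and all function names are choices made here.

```python
import math


def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(x, max_lag):
    """Autocorrelation values r[0..max_lag] of the sequence x."""
    return [sum(x[t] * x[t - lag] for t in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for LPC coefficients.

    Returns (a, err) with a[0] == 1, where the prediction of x[n] is
    -sum(a[i] * x[n - i] for i >= 1) and err is the residual energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # i-th reflection coefficient in this sign convention
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err
```

A typical call would window the frame first, for example `levinson_durbin(autocorrelation([s * w for s, w in zip(frame, hamming(len(frame)))], 10), 10)`, where the order 10 is an assumed value.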

[0018] Next, the formant parameters of the frame are quantized and then transmitted or stored. Methods of quantizing the formant parameters of the frame include scalar quantization, vector quantization, multistage vector quantization, split vector quantization, and predictive quantization. When quantizing the formant parameters, it is preferable to use parameters with good quantization efficiency, such as LSP or LSF.

[0019] Next, the formant parameters of each subframe are calculated from the quantized formant parameters of the frame. One method of calculating the subframe formant parameters is to obtain them by interpolation from the quantized formant parameters of the current and past frames. Interpolation methods include linear interpolation and quadratic interpolation.
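The linear interpolation option can be sketched as follows. This is a hedged illustration: the weighting schedule, in which the last subframe takes the current frame's parameters exactly, and the function name are assumptions not specified in the text.

```python
def interpolate_subframe_params(prev_params, curr_params, n_sub):
    """Linearly interpolate per-subframe parameter vectors between two frames."""
    result = []
    for s in range(1, n_sub + 1):
        w = s / n_sub  # weight of the current frame; the last subframe gets w = 1
        result.append([(1.0 - w) * p + w * c
                       for p, c in zip(prev_params, curr_params)])
    return result
```

The parameter vectors would typically be quantized LSP or LSF values of the previous and current frames.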

[0020] Next, the adaptive codebook search means 104 performs, for each subframe, an adaptive codebook search, which corresponds to extracting the pitch period component of the subframe. For this adaptive codebook search, a predetermined number of pitch candidates are prepared in advance. The pitch candidates include integer pitches, which are integer multiples of the sampling interval, and non-integer pitches. The following processing is performed for every pitch candidate.

[0021] First, an adaptive code vector is generated. The adaptive code vector is generated by cutting one pitch length out of the excitation vector up to the immediately preceding subframe and repeating it until the subframe length is reached. Next, a synthesized speech vector is generated from the adaptive code vector and the formant parameters of the subframe. One way to generate the synthesized speech vector is to apply to the adaptive code vector a linear filter constructed from the formant parameters of the subframe.
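The two operations just described can be sketched as follows. This is a minimal sketch under assumptions: only integer pitches are handled, the linear filter is taken to be the all-pole filter 1/A(z) with A(z) = 1 + lpc[1] z^-1 + ..., the filter starts from zero state, and the function names are hypothetical.

```python
def adaptive_code_vector(past_excitation, pitch, subframe_len):
    """Repeat the last `pitch` samples of the past excitation up to subframe length."""
    if not 0 < pitch <= len(past_excitation):
        raise ValueError("pitch must be between 1 and the excitation history length")
    segment = past_excitation[-pitch:]
    return [segment[i % pitch] for i in range(subframe_len)]


def synthesize(excitation, lpc):
    """All-pole synthesis filter 1/A(z); lpc[0] is expected to be 1."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for i in range(1, len(lpc)):
            if n - i >= 0:
                acc -= lpc[i] * out[n - i]  # feedback from earlier outputs
        out.append(acc)
    return out
```

For example, `synthesize(adaptive_code_vector(history, 40, 40), lpc)` would give the synthesized speech vector for an assumed pitch of 40 samples.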

[0022] Next, the shortest distance is calculated, that is, the distance at which the synthesized speech vector, multiplied by an arbitrary gain, comes closest to the subframe. In calculating this shortest distance, perceptual weighting may be introduced so that the perceptual error is minimized. The above processing is performed for every pitch among the pitch candidates, and the adaptive code vector and pitch period that give the smallest shortest distance are taken as the adaptive code vector and pitch period of the subframe. The code assigned to the pitch period of the subframe is transmitted or stored.
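The shortest-distance calculation with an arbitrary gain has a closed form: for a target x and a synthesized vector y, the gain that minimizes the squared error is g = <x, y> / <y, y>. The sketch below assumes a squared Euclidean distance and omits perceptual weighting; the function names are illustrative only.

```python
def best_gain_and_distance(target, synth):
    """Return the gain minimizing ||target - gain * synth||^2 and that minimum."""
    energy = sum(s * s for s in synth)
    if energy == 0.0:
        return 0.0, sum(t * t for t in target)
    gain = sum(t * s for t, s in zip(target, synth)) / energy
    distance = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
    return gain, distance


def search_best(target, candidates):
    """Return (index, gain, distance) of the candidate with the smallest distance."""
    best = None
    for index, synth in enumerate(candidates):
        gain, distance = best_gain_and_distance(target, synth)
        if best is None or distance < best[2]:
            best = (index, gain, distance)
    return best
```

Here `candidates` would hold one synthesized speech vector per pitch candidate, and the returned index identifies the selected pitch.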

[0023] The subframe discriminating means 107 then determines the nature of the subframe. In this embodiment, it determines the degree to which background noise is contained. One method of making this determination is to compute the autocorrelation function of the subframe and to judge that much background noise is contained when the ratio between the zeroth-order autocorrelation value and the largest autocorrelation value excluding the zeroth order is at or below a predetermined value and the zeroth-order autocorrelation value (that is, the intensity of the subframe) is at or above a predetermined value.
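The determination above can be sketched as follows. The text does not state which value is the numerator of the ratio; this sketch assumes the ratio is the largest non-zero-lag autocorrelation divided by the zeroth-order value, so that a weakly periodic but energetic subframe is classified as noisy. Both thresholds and the function name are hypothetical.

```python
def has_much_background_noise(subframe, ratio_threshold, energy_threshold):
    """Classify a subframe as noisy from its autocorrelation function."""
    n = len(subframe)
    r = [sum(subframe[t] * subframe[t - lag] for t in range(lag, n))
         for lag in range(n)]
    r0 = r[0]  # zeroth-order autocorrelation, i.e. the subframe energy
    if n < 2 or r0 <= 0.0 or r0 < energy_threshold:
        return False
    peak = max(r[1:])  # largest autocorrelation excluding lag 0
    return peak / r0 <= ratio_threshold  # ratio direction is an assumption
```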

[0024] The excitation codebook search means 105 then extracts and encodes the noise source component using an excitation codebook composed of a plurality of excitation code vectors. Examples of excitation codebooks include random codebooks, codebooks composed of a small number of pulses as typified by algebraic codebooks, and trained codebooks built by training.

[0025] First, the target signal calculating means 201 generates the target signal vector for the noise source search by subtracting, from the subframe, the synthesized speech vector synthesized from the adaptive code vector of the subframe and the formant parameters of the subframe.

[0026] Next, based on the determination result of the subframe discriminating means 107, it is decided whether the codebook search is to be performed by the spectral-domain excitation codebook search means 203 or by the A-b-s excitation codebook search means 202. If it is determined that much background noise is contained, the target signal vector is sent to the spectral-domain excitation codebook search means 203, where the codebook search is performed. If it is determined that not much background noise is contained, the target signal vector is sent to the A-b-s excitation codebook search means 202, where the codebook search is performed.
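The switching logic can be sketched in a few lines. This is only an illustration: the two search routines are passed in as callables, and the names and signature are assumptions rather than part of the disclosure.

```python
from typing import Callable, Sequence

Search = Callable[[Sequence[float]], int]  # maps a target vector to a codebook index


def search_excitation(target: Sequence[float], noisy: bool,
                      abs_search: Search, spectral_search: Search) -> int:
    """Route the target vector to one of two search routines."""
    if noisy:
        return spectral_search(target)  # corresponds to means 203
    return abs_search(target)           # corresponds to means 202
```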

[0027] In the A-b-s excitation codebook search means 202, the same distance calculation as in the adaptive codebook search means 104 is first performed for the excitation code vectors: for each excitation code vector, the shortest distance is calculated at which its synthesized speech vector, multiplied by an arbitrary gain, comes closest to the target signal vector. In this case, the distance calculation may be performed after orthogonalizing the synthesized speech vector of the excitation code vector with respect to the synthesized speech vector of the adaptive code vector.
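The optional orthogonalization can be sketched as a single Gram-Schmidt step, which removes from the excitation synthesis its projection onto the adaptive synthesis. This is an assumed realization; the text does not specify how the orthogonalization is carried out, and the function name is hypothetical.

```python
def orthogonalize(vector, reference):
    """Remove from `vector` its projection onto `reference`."""
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0.0:
        return list(vector)  # nothing to project onto
    coeff = sum(v * r for v, r in zip(vector, reference)) / ref_energy
    return [v - coeff * r for v, r in zip(vector, reference)]
```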

[0028] Next, the excitation code vector giving the smallest shortest distance is determined. In the spectral-domain excitation codebook search means 203, on the other hand, the synthesized speech vectors of all the excitation code vectors contained in the excitation codebook are first computed, in the same way as in the adaptive codebook search means 104 and the A-b-s excitation codebook search means 202. The synthesized speech vectors and the target signal are then transformed into the frequency domain. Methods of transforming into the frequency domain include, but are not limited to, the discrete Fourier transform, the discrete cosine transform, and the wavelet transform.

[0029] The similarity between the target signal and each synthesized speech vector in the frequency domain is then calculated. Examples of this similarity include, but are not limited to, the smallness of the distance when both vectors are normalized, and the smallness of a distance weighted using masking frequencies that reflect human auditory characteristics. Next, the excitation code vector giving the greatest similarity is determined. The code assigned to the excitation code vector determined here is transmitted or stored.
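A minimal sketch of the spectral-domain search follows, assuming a naive DFT, comparison of magnitude spectra only, unit-norm normalization, and an unweighted squared distance as the (inverse) similarity. These are illustrative choices among the options the text allows, and the function names are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(x):
    """Magnitudes of DFT bins 0..N/2 of a real sequence (naive O(N^2) DFT)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def normalize(v):
    """Scale a vector to unit Euclidean norm (zero vectors are returned as is)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v] if norm > 0.0 else list(v)


def spectral_search(target, synthesized_candidates):
    """Return the index whose normalized magnitude spectrum is closest to the target's."""
    target_spec = normalize(magnitude_spectrum(target))
    best_index, best_distance = -1, float("inf")
    for index, synth in enumerate(synthesized_candidates):
        spec = normalize(magnitude_spectrum(synth))
        distance = sum((a - b) ** 2 for a, b in zip(target_spec, spec))
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index
```

Because only magnitudes are compared in this sketch, a candidate that has the same spectral shape as the target but a different phase or overall level is still selected, whereas a time-axis comparison of a sine against a cosine of the same frequency would find no correlation.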

[0030] The gain encoding means 106 then encodes the gain components. Methods of quantizing the gain components include scalar quantization, in which the gain component of the adaptive code vector and the gain component of the noise code vector are quantized separately, and vector quantization, in which the two are quantized so as to be optimized simultaneously. The gain components quantized here are transmitted or stored. In addition, when the speech encoding apparatus of the present invention determines that much background noise is contained, it may adaptively switch the codebook configuration according to the nature of the input subframe, for example by reducing the number of bits of the adaptive codebook, by omitting the adaptive codebook search, or by switching the excitation code vectors. The speech encoding apparatus of the present invention has been described above; the speech encoding method used in this apparatus and a computer-readable storage element that stores the algorithm of this speech encoding method are also subjects of the present invention.

[0031]

EFFECTS OF THE INVENTION: As described above, according to the invention of claim 1, in the speech encoding apparatus, the excitation codebook search means selectively uses either the A-b-s excitation codebook search means or the spectral-domain excitation codebook search means according to the determination result of the subframe discriminating means, and performs the excitation codebook search in the frequency domain for speech that contains much background noise. This provides a speech encoding apparatus that can encode speech containing background noise well.

[0032] According to the invention of claim 2, the subframe discriminating means estimates the background noise level of the subframe and uses it to discriminate the subframe, so that the intended speech encoding apparatus can be realized with a simpler configuration.

[0033] According to the invention of claim 3, the spectral domain excitation codebook search means converts the input target signal and the speech code vector into the frequency domain and determines the optimal excitation code vector based on similarity in the frequency spectrum, so that the intended speech encoding apparatus can be realized with a simpler configuration.
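
A search based on frequency-spectrum similarity might look like the following sketch. The use of a direct DFT, the comparison of magnitude spectra only, and the normalized correlation as the similarity measure are assumptions for illustration; the text states only that both signals are converted to the frequency domain and compared by spectral similarity.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Magnitudes of the DFT bins from 0 up to the Nyquist bin."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def spectral_similarity(a, b):
    """Normalized correlation between two magnitude spectra (0.0 to 1.0)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def spectral_codebook_search(target, codebook):
    """Return the index of the code vector whose spectrum best matches."""
    target_spec = magnitude_spectrum(target)
    scores = [spectral_similarity(target_spec, magnitude_spectrum(v))
              for v in codebook]
    return max(range(len(codebook)), key=lambda i: scores[i])
```

Because this sketch compares magnitudes only, a code vector can be selected even when its waveform is phase-shifted relative to the target, which a purely time-domain match would score poorly.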

[0034] According to the invention of claim 4, in the speech encoding apparatus, the excitation codebook search step selectively uses either the A-b-s excitation codebook search step or the spectral domain excitation codebook search step in accordance with the discrimination result of the subframe discriminating step, and performs the excitation codebook search in the frequency domain for speech containing a large amount of background noise. This realizes a speech encoding method that can satisfactorily encode speech containing background noise.

[0035] According to the invention of claim 5, the subframe discriminating step estimates the background noise level of the subframe and uses it to discriminate the subframe, so that the intended speech encoding apparatus can be realized with a simpler configuration.

[0036] According to the invention of claim 6, the spectral domain excitation codebook search step converts the input target signal and the speech code vector into the frequency domain and determines the optimal excitation code vector based on similarity in the frequency spectrum, so that the intended speech encoding method can be realized with a simpler configuration.

[0037] According to the invention of claim 7, the algorithm of the speech encoding method according to any one of claims 4 to 6 is stored in a computer-readable storage element, so that these methods can easily be implemented on a computer.

[Brief description of the drawings]

FIG. 1 is a block diagram of a speech encoding apparatus according to the present invention.

[Explanation of symbols]

101 frame dividing means
102 subframe dividing means
103 formant parameter extracting means
104 adaptive codebook search means
105 excitation codebook search means
106 gain quantization means (gain codebook search means)
107 subframe discriminating means
201 target signal calculating means
202 A-b-s excitation codebook search means
203 spectral domain excitation codebook search means

Claims

[Claims]

1. A speech encoding apparatus for encoding a digital speech signal, comprising: frame dividing means for dividing the digital speech signal into a plurality of frames; subframe dividing means for dividing the frame into subframes shorter than the frame; formant parameter extracting means for extracting and encoding formant parameters of the frame from the frame; adaptive codebook search means for extracting and encoding a pitch period of the subframe using the subframe and the formant parameters of the frame; excitation codebook search means for extracting and encoding a noise source component of the subframe using the subframe, the formant parameters of the frame, the pitch period, and an excitation codebook including excitation code vectors; and subframe discriminating means for discriminating properties of the subframe; wherein the excitation codebook search means comprises A-b-s excitation codebook search means for extracting and encoding the noise source component using an analysis-by-synthesis method, and spectral domain excitation codebook search means for extracting and encoding an excitation codebook search component in the spectral domain; and wherein the excitation codebook search means selectively uses either the A-b-s excitation codebook search means or the spectral domain excitation codebook search means in accordance with the discrimination result of the subframe discriminating means.

2. The speech encoding apparatus according to claim 1, wherein the subframe discriminating means estimates a background noise level of the subframe and uses the estimated background noise level for discriminating the subframe.

3. The speech encoding apparatus according to claim 1, wherein the spectral domain excitation codebook search means converts an input target signal and a speech code vector into the frequency domain and determines an optimal excitation code vector based on similarity in the frequency spectrum.

4. A speech encoding method for encoding a digital speech signal, comprising: a frame dividing step of dividing the digital speech signal into frames; a subframe dividing step of dividing the frame into subframes shorter than the frame; a formant parameter extracting step of extracting and encoding formant parameters of the frame from the frame; an adaptive codebook search step of extracting and encoding a pitch period of the subframe using the subframe and the formant parameters of the frame; an excitation codebook search step of extracting and encoding a noise source component of the subframe using the subframe, the formant parameters of the frame, the pitch period, and an excitation codebook including a plurality of excitation code vectors; and a subframe discriminating step of discriminating properties of the subframe; wherein the excitation codebook search step comprises an A-b-s excitation codebook search step of extracting and encoding the noise source component using an analysis-by-synthesis method, and a spectral domain excitation codebook search step of extracting and encoding an excitation codebook search component in the spectral domain; and wherein, in the excitation codebook search step, either the A-b-s excitation codebook search step or the spectral domain excitation codebook search step is selectively used in accordance with the discrimination result of the subframe discriminating step.

5. The speech encoding method according to claim 4, wherein said sub-frame discriminating step estimates a background noise level of said sub-frame and uses it for discriminating said sub-frame.

6. The speech encoding method according to claim 4 or claim 5, wherein the spectral domain excitation codebook search step converts an input target signal and a code vector into the frequency domain and determines an optimal excitation code vector according to similarity in the frequency spectrum.

7. A storage element readable by a computer, wherein the storage element stores an algorithm based on the speech encoding method according to claim 4.