JPH11305798A - Voice compressing and encoding device

Info

Publication number: JPH11305798A
Application number: JP10130977A
Authority: JP
Inventors: Atsushi Yamane; 淳山根
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1998-04-27
Filing date: 1998-04-27
Publication date: 1999-11-05

Abstract

PROBLEM TO BE SOLVED: To reduce the amount of computation in encoding a noise excitation source in CELP (code excited linear prediction) coding by encoding (quantizing) the target signal for noise source information extraction itself, instead of performing the conventional noise excitation code vector search. SOLUTION: A noise source extraction unit 206 consists of a target signal construction unit 301, a discrete cosine transform (DCT) unit 302, and a coefficient conversion unit 303. The coefficient conversion unit 303 is equipped with a coefficient selection unit 304 that selects DCT coefficients, an intensity quantization unit 305 that quantizes the intensity of the selected DCT coefficients, and a bit string output unit 306 that receives the processing results and outputs a bit string of a specified length. The intensity quantization unit 305 encodes the positions and values of the coefficients, a specified number of each, and encodes the coefficient values using only the sign of each coefficient and an intensity representing all the coefficients.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a voice compression encoding device applied to answering machines, voice response systems, voice mail, and the like. More specifically, it relates to a voice compression encoding device that takes an analog voice waveform as input, converts it into a digital voice signal, and then encodes that digital voice signal with high efficiency using a predetermined coding scheme of low bit rate and low computational cost, thereby reducing storage memory cost and processing cost at the same time.

[0002]

2. Description of the Related Art

In recent years, demand for practical low-bit-rate speech coding technology has been growing, driven by the expansion of channel capacity in mobile communications such as car phones and by the need to store and transmit enormous amounts of information in multimedia communications.

[0003] In addition, facsimile modems and data modems are expected to offer speech encoding/decoding as an additional function for answering-machine use, and the development of a low-bit-rate voice compression coding method for this encoding/decoding is desired.

[0004] At present, the mainstream low-bit-rate voice compression coding scheme at 10 kbps or below is CELP (Code Excited Linear Prediction coding system). CELP is a model-based compression coding scheme built on the AR (auto-regressive) model of speech, which is based on linear prediction.

[0005] Specifically, on the encoding side, speech is divided into units called frames. For each unit, parameters are extracted for the LPC (linear predictive coding) coefficients representing the spectral envelope, the pitch lag information representing the pitch, the noise (source) information serving as excitation information, and the gain information corresponding to the pitch lag information and to the noise source information. These parameters are encoded (quantized) and then stored or transmitted. The processing of the pitch lag information, the noise source information, and the gain information corresponding to each of them (parameter extraction, encoding, and so on) is sometimes performed on units called subframes, which are obtained by further dividing a frame.

[0006] On the decoding side, each piece of encoded information is restored, an excitation source signal is generated by applying the pitch information and gain information to the noise source information, and this excitation source signal is passed through a linear prediction synthesis filter formed from the LPC coefficients to obtain synthesized speech.
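
The decoding steps of paragraph [0006] can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names, the sign convention of the synthesis filter (A(z) = 1 + sum of a_k z^-k), and the periodic extension used when the pitch lag is shorter than the subframe are all assumptions made for illustration.

```python
def synthesis_filter(excitation, lpc):
    """All-pole filter y[n] = e[n] - sum_k lpc[k] * y[n-1-k] (assumed sign convention)."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]
        out.append(acc)
    return out


def decode_subframe(noise, noise_gain, past_excitation, lag, pitch_gain, lpc):
    """Rebuild one subframe; requires 1 <= lag <= len(past_excitation)."""
    # Pitch contribution: repeat the past excitation at the pitch lag
    # (periodic extension, an assumption for lags shorter than the subframe).
    history = list(past_excitation)
    pitch_part = []
    for _ in noise:
        sample = history[-lag]
        history.append(sample)
        pitch_part.append(sample)
    # Excitation = gain-scaled pitch part + gain-scaled noise source.
    excitation = [pitch_gain * p + noise_gain * c
                  for p, c in zip(pitch_part, noise)]
    # Synthesized speech = excitation passed through the LPC synthesis filter.
    return excitation, synthesis_filter(excitation, lpc)
```

For simplicity the filter starts from a zero state in every call; a practical decoder would carry the filter memory across subframes.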

[0007]

PROBLEMS TO BE SOLVED BY THE INVENTION

However, although the conventional CELP scheme described above has the advantage of delivering good speech quality at a low bit rate of 10 kbps, it has the problem that the large amount of computation in the encoding of each parameter is an obstacle to real-time processing.

[0008] In particular, pitch lag information and noise source information are encoded by passing each excitation code vector stored in an excitation codebook through the linear prediction synthesis filter to generate synthesized speech, comparing the result with the original speech, and selecting the vector that comes closest to the original. Because the filter operation is computationally expensive, passing every excitation code vector in the codebook through the filter for comparison requires an enormous amount of computation, and achieving real-time processing on a general-purpose processor has been very difficult.
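
To make the cost described in paragraph [0008] concrete, the following is a rough sketch of an exhaustive analysis-by-synthesis search. It is an illustration only: the names, the filter sign convention, and the use of an optimal per-vector gain with a squared-error measure are assumptions, not details given in the patent.

```python
def synthesis_filter(excitation, lpc):
    """All-pole filter y[n] = e[n] - sum_k lpc[k] * y[n-1-k] (assumed sign convention)."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]
        out.append(acc)
    return out


def exhaustive_search(target, codebook, lpc):
    """Filter every code vector and keep the one closest to the target."""
    best_index, best_error = -1, float("inf")
    for index, vector in enumerate(codebook):
        synth = synthesis_filter(vector, lpc)      # one full filter pass per vector
        energy = sum(s * s for s in synth)
        gain = sum(t * s for t, s in zip(target, synth)) / energy if energy > 0 else 0.0
        error = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error
```

With K code vectors, a subframe of N samples, and a filter of order P, this loop performs on the order of K × N × P multiply-accumulate operations per subframe, which is the load the paragraph refers to.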

[0009] For this reason, various improvements aimed at reducing the amount of computation have been made. One of them is the preselection method: instead of running the filter operation on every excitation code vector and comparing each result with the original speech, the excitation code vectors are first narrowed down to a small number using a parameter that is relatively cheap to compute and that approximates the comparison with the original speech.

[0010] The excitation codebook generally stores as many excitation code vectors as can be represented by the given number of bits, but methods that reduce the amount of computation by restructuring the codebook have also been proposed. One example is VSELP (Vector Sum Excited Linear Prediction coding), which holds only as many excitation code vectors as there are bits and represents the full set of excitation code vectors addressable by that number of bits through sums and differences of them, drastically reducing the number of filter computations.
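
The vector-sum idea in paragraph [0010] can be sketched as follows. This is a hypothetical illustration: the mapping of each index bit to a plus or minus sign and the function name are assumptions, not the definition used by any particular VSELP standard.

```python
def vselp_code_vector(basis, index):
    """Build one code vector as a signed sum of the basis vectors.

    Bit m of `index` selects +basis[m] (bit set) or -basis[m] (bit clear),
    so M basis vectors address 2**M code vectors.
    """
    length = len(basis[0])
    vector = [0.0] * length
    for bit, base in enumerate(basis):
        sign = 1.0 if (index >> bit) & 1 else -1.0
        for i in range(length):
            vector[i] += sign * base[i]
    return vector
```

Because the synthesis filter is linear, only the M basis vectors need to be filtered; the filtered version of any of the 2**M code vectors is then the same signed sum of the M filtered basis vectors.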

[0011] The present invention has been made in view of the above. Its object is to provide a voice compression encoding device that reduces the amount of computation needed to encode the noise excitation source in CELP coding by encoding (quantizing) the target signal for noise source information extraction itself, in place of the encoding (quantization) of noise source information that was conventionally performed by a noise excitation code vector search.

[0012]

MEANS FOR SOLVING THE PROBLEMS

To achieve the above object, the voice compression encoding device according to claim 1 comprises: A/D conversion means for digitizing an analog voice waveform into a digital voice signal; speech encoding means for encoding the digital voice signal by a predetermined coding scheme; storage means for storing the encoded digital voice signal; speech decoding means for retrieving and decoding the stored digital voice signal; and D/A conversion means for converting the decoded digital voice signal into an analog voice signal.

The speech encoding means comprises: frame dividing means for dividing the digital voice signal into processing units called frames; spectral envelope encoding means for extracting and encoding spectral envelope information representing the spectral envelope of each divided frame; subframe construction means for constructing processing units called subframes from the divided frames; pitch information extraction means for extracting and encoding the pitch information of the subframe; gain information extraction means for extracting and encoding the gain information corresponding to the pitch information of the subframe; and noise source information extraction means for extracting and encoding the noise source information, which is the excitation information of the subframe.

The speech decoding means comprises: spectral envelope information decoding means for decoding the encoded spectral envelope information; noise source information decoding means for decoding the encoded noise source information; pitch information decoding means for decoding the encoded pitch information; gain information decoding means for decoding the encoded gain information; excitation source signal generation means for generating an excitation source signal from the decoded noise source information, pitch information, and gain information; and synthesized signal generation means for generating a synthesized signal from the excitation source signal and the decoded spectral envelope information.

The noise source information extraction means comprises: target signal extraction means for extracting a target signal for noise source information extraction; discrete cosine transform means for transforming the extracted target signal into a sequence of discrete cosine transform coefficients; and coefficient conversion means for converting the discrete cosine transform coefficient sequence obtained by the discrete cosine transform means into a bit string of a predetermined length.

The coefficient conversion means further comprises: coefficient selection means for selecting discrete cosine transform coefficients from the discrete cosine transform coefficient sequence; intensity quantization means for quantizing the intensity of the discrete cosine transform coefficients selected by the coefficient selection means; and bit string output means for receiving the processing results of the coefficient selection means and the intensity quantization means and outputting a bit string of a predetermined length.

Furthermore, for the discrete cosine transform coefficients selected by the coefficient selection means, the intensity quantization means encodes the coefficient positions and coefficient values, a predetermined number of each, and encodes the coefficient values using only the sign of each coefficient and an intensity representing all the coefficients.

[0013] The voice compression encoding device according to claim 2 is the voice compression encoding device of claim 1 in which, additionally, the coefficient selection means divides the discrete cosine transform coefficients into a predetermined number of blocks and selects a predetermined number of coefficients from each block.

[0014]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

An embodiment of the voice compression encoding device of the present invention is described in detail below with reference to the accompanying drawings.

[0015] FIG. 1 is a schematic configuration diagram of the voice compression encoding device 100 of this embodiment. The voice compression encoding device 100 consists of: an A/D conversion unit 101, serving as the A/D conversion means, that digitizes an analog voice waveform into a digital voice signal; a speech encoding unit 102, serving as the speech encoding means, that receives the digital voice signal from the A/D conversion unit 101 and encodes it by a predetermined coding scheme; a storage unit 103, serving as the storage means, that stores the digital voice signal encoded by the speech encoding unit 102 (the encoded spectral envelope information, pitch information, gain information, and noise source information); a speech decoding unit 104, serving as the speech decoding means, that retrieves and decodes the stored digital voice signal; and a D/A conversion unit 105, serving as the D/A conversion means, that converts the decoded digital voice signal into an analog voice signal. Examples of the A/D conversion unit 101 include an A/D converter and a PC sound board. Examples of the D/A conversion unit 105 likewise include a D/A converter and a PC sound board.

[0016] FIG. 2 is a block diagram of the speech encoding unit 102. The speech encoding unit 102 consists of: a frame construction unit 201 that divides the input digital voice signal into units called frames, each with a predetermined number of samples (for example, 240 samples), and outputs frame signals; a spectral envelope extraction unit 202 that extracts and encodes, frame by frame, spectral envelope information representing the spectral envelope from the frames (frame signals) produced by the frame construction unit 201; a subframe construction unit 203 that further divides the frames produced by the frame construction unit 201 into subframes with a predetermined number of samples (for example, 60 samples) and outputs subframe signals; a pitch information extraction unit 204 that uses the spectral envelope information extracted by the spectral envelope extraction unit 202 to extract and encode pitch information (pitch lag information) from the subframes produced by the subframe construction unit 203; a gain extraction unit 205 that extracts and encodes the gain information corresponding to the pitch information of the subframe; and a noise source extraction unit 206 that extracts and encodes the noise source information, which is the excitation information of the subframe, from the spectral envelope information, the subframe, the pitch information, and the gain information.

[0017] FIG. 3 is a block diagram of the noise source extraction unit 206. The noise source extraction unit 206 consists of: a target signal construction unit 301 that extracts the target signal for noise source information extraction using the spectral envelope information extracted by the spectral envelope extraction unit 202, the subframe signal output from the subframe construction unit 203, the pitch information extracted by the pitch information extraction unit 204, and the gain information extracted by the gain extraction unit 205; a DCT unit 302 that applies a discrete cosine transform (hereinafter, DCT) to the target signal to obtain a DCT coefficient sequence; and a coefficient conversion unit 303 that converts the DCT coefficient sequence obtained by the DCT unit 302 into a bit string of a predetermined length.

[0018] The coefficient conversion unit 303 in turn has: a coefficient selection unit 304 that selects DCT coefficients from the DCT coefficient sequence; an intensity quantization unit 305 that quantizes the intensity of the DCT coefficients selected by the coefficient selection unit 304; and a bit string output unit 306 that receives the processing result of the intensity quantization unit 305 and outputs a bit string of a predetermined length.

[0019] FIG. 4 is a block diagram of the speech decoding unit 104. The speech decoding unit 104 receives the digital voice signal retrieved from the storage unit 103 (the encoded spectral envelope information, pitch information, gain information, and noise source information) and consists of: a spectral envelope decoding unit 401 that decodes the encoded spectral envelope information; a pitch information decoding unit 402 that decodes the encoded pitch information; a noise source decoding unit 403 that decodes the noise source information from the encoded noise source information; a gain decoding unit 404 that decodes the encoded gain information; and a speech synthesis unit 405 that generates synthesized speech from the decoded spectral envelope information and from the excitation source signal generated from the decoded pitch information, gain information, and noise source information.

[0020] The operation of the above configuration is described with reference to FIG. 5, a schematic flowchart of the voice compression encoding device 100 of this embodiment, and FIG. 6, a flowchart showing the operating procedure of the speech encoding unit. In FIG. 1, an analog voice signal (analog voice waveform) input from an analog audio input device (not shown) is converted into a digital voice signal by the A/D conversion unit 101 (S501). Examples of the analog audio input device include a microphone, a CD player, and a cassette deck.

[0021] Next, the speech encoding unit 102, having received the digital voice signal, encodes it by a predetermined coding scheme (S502). The speech encoding process performed by the speech encoding unit 102 is described in detail here with reference to the schematic flowchart of the speech encoding unit 102 in FIG. 6.

[0022] First, the frame construction process of the frame construction unit 201 divides the digital voice signal into units called frames, each with a predetermined number of samples (for example, 240 samples) (S601). Each frame is output as a frame signal to the spectral envelope extraction unit 202 and the subframe construction unit 203.

[0023] Next, the spectral envelope extraction process of the spectral envelope extraction unit 202 extracts spectral envelope information from the frame signal, encodes (quantizes) it, and outputs it to the pitch information extraction unit 204 and the noise source extraction unit 206 (S602). Examples of spectral envelope information include linear prediction coefficients obtained by linear prediction analysis, PARCOR coefficients, and LSP coefficients. The spectral envelope information can be encoded (quantized) by vector quantization, scalar quantization, split vector quantization, multistage vector quantization, or a combination of several of these.

[0024] Meanwhile, when the subframe construction unit 203 receives a frame signal from the frame construction unit 201, it executes the subframe construction process, dividing the frame signal into pieces with a predetermined number of samples (for example, 60 samples) and outputting them as subframe signals (S603). Depending on conditions, a subframe may coincide with a frame.
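
The frame and subframe division in steps S601 and S603 can be sketched in a few lines of Python. The helper name and the choice to drop a trailing partial block are assumptions made for illustration; the sizes 240 and 60 are the example values given in the text.

```python
FRAME_SIZE = 240      # example frame length from the description
SUBFRAME_SIZE = 60    # example subframe length from the description


def split_blocks(samples, size):
    """Split a sample list into consecutive blocks of `size` samples.

    A trailing partial block is dropped (an assumption; the text does not
    say how an incomplete final frame is handled).
    """
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def frames_and_subframes(samples):
    """Return a list of frames, each given as its list of subframes."""
    return [split_blocks(frame, SUBFRAME_SIZE)
            for frame in split_blocks(samples, FRAME_SIZE)]
```

With these example sizes each frame yields four subframes. Setting the subframe size equal to the frame size reproduces the case in which a subframe coincides with a frame.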

[0025] For each subframe signal, the pitch information extraction process of the pitch information extraction unit 204 extracts and encodes the pitch information, using the spectral envelope information extracted by the spectral envelope extraction unit 202 (S604). Pitch information can be extracted by, for example, the adaptive codebook search used in CELP, or by deriving it from spectral information obtained with a Fourier transform, a wavelet transform, or the like. In the case of an adaptive codebook search, a perceptual weighting filter may be used. The perceptual weighting filter can be constructed from the linear prediction coefficients.

[0026] The pitch information extracted by the pitch information extraction unit 204 is input to the gain extraction unit 205, whose gain extraction process extracts and encodes the gain information (gain component) (S605).

[0027] In the noise source extraction unit 206, the noise source extraction process is executed by the target signal construction unit 301, the DCT unit 302, and the coefficient conversion unit 303 (S606). Specifically, as shown in FIG. 3, the target signal construction unit 301 first receives the subframe signal, the spectral envelope information, the pitch information, and the gain information, and constructs the target signal for noise source information extraction. To do so, it constructs a pitch component residual signal from the residual signal up to the previous subframe, the pitch information extracted by the pitch information extraction unit 204, and the gain information extracted by the gain extraction unit 205. It then constructs a pitch component signal from that pitch component residual signal and the spectral envelope information, and subtracts the pitch component signal from the subframe signal to obtain the target signal for noise source information extraction. One way to obtain the pitch component signal from the pitch component residual signal and the spectral envelope information is to pass the residual signal through a synthesis filter derived from the spectral envelope information. The target signal constructed in this way is output to the DCT unit 302.
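
The construction of the target signal in paragraph [0027] can be sketched as below. This is a simplified illustration under stated assumptions: the function names are hypothetical, the past residual is extended periodically when the lag is shorter than the subframe, the synthesis filter uses the convention A(z) = 1 + sum of a_k z^-k, and the filter starts from a zero state.

```python
def synthesis_filter(excitation, lpc):
    """All-pole filter y[n] = e[n] - sum_k lpc[k] * y[n-1-k] (assumed sign convention)."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]
        out.append(acc)
    return out


def pitch_component_residual(past_residual, lag, gain, length):
    """Repeat the past residual at the pitch lag and scale it by the gain.

    Requires 1 <= lag <= len(past_residual).
    """
    history = list(past_residual)
    out = []
    for _ in range(length):
        sample = history[-lag]
        history.append(sample)
        out.append(gain * sample)
    return out


def build_target(subframe, past_residual, lag, gain, lpc):
    """Target = subframe signal minus the synthesized pitch component signal."""
    residual = pitch_component_residual(past_residual, lag, gain, len(subframe))
    pitch_signal = synthesis_filter(residual, lpc)
    return [s - p for s, p in zip(subframe, pitch_signal)]
```

What remains in the target is the part of the subframe that the pitch component cannot explain; it is this signal, rather than a codebook entry, that the following stages encode.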

[0028] Next, the DCT unit 302 receives the target signal, applies the DCT to it, and outputs the resulting DCT coefficients (that is, the DCT coefficient sequence) to the coefficient conversion unit 303.
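
As an illustration of the transform step, the sketch below implements an orthonormal DCT-II and its inverse directly from the definition. The patent does not state which DCT type or normalization is used, so both are assumptions, as are the function names.

```python
import math


def dct(signal):
    """Orthonormal DCT-II of a real sequence (assumed variant)."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        total = sum(signal[i] * math.cos(math.pi * (i + 0.5) * k / n)
                    for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * total)
    return coeffs


def idct(coeffs):
    """Inverse of `dct` above (orthonormal DCT-III)."""
    n = len(coeffs)
    signal = []
    for i in range(n):
        total = coeffs[0] * math.sqrt(1.0 / n)
        total += sum(coeffs[k] * math.sqrt(2.0 / n)
                     * math.cos(math.pi * (i + 0.5) * k / n)
                     for k in range(1, n))
        signal.append(total)
    return signal
```

This direct form costs O(N²) operations per subframe, which is modest for a 60-sample subframe; a fast DCT algorithm could be substituted without changing the result.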

[0029] In the coefficient conversion unit 303, the coefficient selection unit 304 receives the DCT coefficients, selects DCT coefficients from the DCT coefficient sequence, and outputs the selection result. One selection method is to pick a predetermined number of coefficients in descending order of the absolute value of their amplitude (that is, their intensity). Another is to divide the DCT coefficient sequence into blocks, evenly or unevenly, and select from each block so that the total number of selected coefficients equals a predetermined number.
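
Both selection methods in paragraph [0029] can be sketched as follows. The function names and the representation of blocks as a list of sizes are illustrative assumptions.

```python
def select_largest(coeffs, count):
    """Positions of the `count` coefficients with the largest absolute value."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    return sorted(order[:count])


def select_per_block(coeffs, block_sizes, per_block):
    """Split the sequence into consecutive blocks and pick the largest
    `per_block[b]` coefficients (by absolute value) inside block b.

    `block_sizes` may be uneven; sum(per_block) is the total number selected.
    """
    positions = []
    start = 0
    for size, count in zip(block_sizes, per_block):
        block = range(start, start + size)
        order = sorted(block, key=lambda i: abs(coeffs[i]), reverse=True)
        positions.extend(sorted(order[:count]))
        start += size
    return positions
```

The global method always keeps the strongest coefficients, while the block method guarantees that every part of the spectrum is represented and constrains where each selected position can lie.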

[0030] The intensity quantization unit 305 encodes the positions and the coefficient values (sign and intensity) of the DCT coefficients selected by the coefficient selection unit 304. In this encoding, the position and sign are encoded for each coefficient, whereas the intensity is not encoded per coefficient; instead, one or more values representing all the coefficients are calculated and encoded. One way to calculate a single representative intensity is as follows. Let the input subframe signal be a vector x whose dimension is the subframe length, and let a be the signal sequence obtained by applying the inverse DCT to a coefficient sequence in which the selected coefficients are given a predetermined intensity and the unselected coefficients are set to zero. The value g is then chosen so that |x − ga| is minimized.
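
Minimizing |x − ga| over g is a one-dimensional least-squares problem whose solution is g = (x·a)/(a·a). The sketch below applies this under stated assumptions: the orthonormal inverse DCT, the unit "predetermined intensity" of 1.0, and all names are illustrative choices, not details fixed by the patent.

```python
import math


def idct(coeffs):
    """Inverse orthonormal DCT-II (assumed DCT variant)."""
    n = len(coeffs)
    signal = []
    for i in range(n):
        total = coeffs[0] * math.sqrt(1.0 / n)
        total += sum(coeffs[k] * math.sqrt(2.0 / n)
                     * math.cos(math.pi * (i + 0.5) * k / n)
                     for k in range(1, n))
        signal.append(total)
    return signal


def representative_gain(x, positions, signs, unit=1.0):
    """Return (g, a): the shape signal a and the g minimizing |x - g*a|."""
    coeffs = [0.0] * len(x)
    for pos, sign in zip(positions, signs):
        coeffs[pos] = sign * unit          # same intensity for every selected coefficient
    a = idct(coeffs)
    energy = sum(v * v for v in a)
    g = sum(xi * ai for xi, ai in zip(x, a)) / energy if energy > 0 else 0.0
    return g, a
```

Only g needs to be quantized as the shared intensity; the positions and signs describe the shape a.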

[0031] The bit string output unit 306 receives the quantization result of the intensity quantization unit 305 and outputs a bit string of a predetermined bit length.

[0032] Returning to FIG. 5, the quantized signal (the encoded digital voice signal) output from the speech encoding unit 102 is stored by the storage unit 103 (S503).

[0033] Next, the quantized signal (the encoded digital voice signal) stored in the storage unit 103 is read out and decoded by the speech decoding unit 104 as needed (S504). In the speech decoding unit 104, as shown in FIG. 4, the spectral envelope decoding unit 401 decodes the spectral envelope information, the pitch information decoding unit 402 decodes the pitch information, the noise source decoding unit 403 decodes the noise source information, and the gain decoding unit 404 decodes the gain information. The decoded pitch information, noise source information, and gain information make up the residual signal (excitation source signal). The speech synthesis unit 405 generates decoded speech (synthesized speech), which is a digital voice signal, from the decoded spectral envelope information and the residual signal, and outputs it to the D/A conversion unit 105.

[0034] Next, the digital voice signal output from the speech synthesis unit 405 (that is, from the speech decoding unit 104) is converted into an analog voice signal (analog voice waveform) by the D/A conversion unit 105, as shown in FIG. 1 (S505).

[0035] The selected DCT coefficients may be encoded by encoding the coefficient positions (that is, the frequency components) and the coefficient values (sign and intensity) into a predetermined bit length. The target signal for noise source information extraction carries the phase-related components of the speech that remain after components such as the spectral envelope and pitch have been removed. Perceived sound quality is therefore better when emphasis is placed on the phase characteristics of the characteristic frequency components than when the intensity at particular frequencies is encoded faithfully. For this reason, the position and sign are encoded for each coefficient, while for the intensity a value representing all the coefficients is calculated and encoded. This reduces the number of bits needed to encode the intensity and allows the number of selected coefficients to be increased accordingly.

【００３６】[0036]

A specific description is now given with reference to FIGS. 7(a) and 7(b). In the figure, (a) shows the method that encodes the intensity for each coefficient, and (b) shows the method that encodes a single value representative of all the coefficients. As the figure makes clear, method (a) requires (a + b + c) × n bits, and method (b) requires (a + b) × n + d bits. Method (b) therefore needs c × n − d fewer bits (d can be regarded as roughly the same size as c). By allocating the saved bits to an increase in the number of coefficients (n), the phase component of the target signal can be encoded more accurately, and better sound quality than with method (a) can be expected at a comparable bit rate.
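The bit counts above can be checked with a few lines of arithmetic. In this sketch a and b are taken to be the per-coefficient position and sign bits, c the per-coefficient intensity bits, and d the bits of the shared intensity; that reading follows from the two formulas, but the concrete numbers below are assumed purely for illustration.

```python
def bits_per_coefficient_intensity(a, b, c, n):
    """Method (a): every coefficient carries its own c-bit intensity."""
    return (a + b + c) * n


def bits_shared_intensity(a, b, d, n):
    """Method (b): one d-bit intensity shared by all n coefficients."""
    return (a + b) * n + d


def coefficients_within_budget(a, b, d, budget):
    """Largest n that method (b) can afford within `budget` bits."""
    return max((budget - d) // (a + b), 0)


# Assumed example values, not taken from the patent.
a, b, c, d, n = 6, 1, 4, 4, 8
bits_a = bits_per_coefficient_intensity(a, b, c, n)
bits_b = bits_shared_intensity(a, b, d, n)
saving = bits_a - bits_b
n_b = coefficients_within_budget(a, b, d, bits_a)
```

With these assumed values the saving equals c × n − d, and spending method (a)'s full budget on method (b) raises the number of coded coefficients from 8 to 12.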

【００３７】[0037]

Also, when the coefficient selection unit 304 divides the DCT coefficients into a predetermined number of blocks and selects a predetermined number of coefficients from each block, as in the speech compression encoding apparatus according to claim 2, the number of bits spent on selection can be reduced greatly without degrading sound quality. One way to do this is to analyze in advance, using training data, how often each coefficient is selected, and to divide the coefficients into blocks appropriately according to that frequency.
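Why block division saves position bits can be sketched as follows. The helper names, the equal block sizes, and the rule of picking the largest-magnitude coefficient in each block are all assumptions made for illustration; the text suggests deriving the block boundaries from selection frequencies in training data, which this sketch does not attempt.

```python
def index_bits(count):
    """Bits needed to address one item out of `count`."""
    return (count - 1).bit_length()


def position_bits_unrestricted(total, n_select):
    """Each selected coefficient is addressed within the whole sequence."""
    return n_select * index_bits(total)


def position_bits_blocked(block_sizes, per_block=1):
    """Each selected coefficient is addressed only within its own block."""
    return sum(per_block * index_bits(size) for size in block_sizes)


def select_per_block(coeffs, block_sizes):
    """Pick the largest-magnitude coefficient of each block (assumed rule)."""
    picks, start = [], 0
    for size in block_sizes:
        picks.append(max(range(start, start + size), key=lambda k: abs(coeffs[k])))
        start += size
    return picks


free_bits = position_bits_unrestricted(40, 8)      # 8 positions among 40
blocked_bits = position_bits_blocked([5] * 8)      # 1 position in each of 8 blocks
picks = select_per_block([0.1, -3.0, 0.5, 2.0, 1.0, 0.2, -0.3, 4.0], [4, 4])
```

In this hypothetical layout, selecting 8 of 40 coefficients costs 48 position bits without blocks and 24 with blocks, at the price of restricting which combinations of positions can be represented.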

【００３８】[0038]

As described above, the present embodiment is a speech compression encoding apparatus that uses a speech compression encoding method belonging to CELP speech coding.

【００３９】[0039]

The conventional CELP method has a codebook that serves as the excitation source. Each code vector in the codebook is passed through a linear prediction filter representing the spectral envelope of the speech, the result is compared with the target signal for noise source information encoding, and the code that gives the closest synthesized signal is obtained. A perceptual weighting filter can be used in this search. However, although CELP is a speech compression encoding technique that offers high sound quality at a low bit rate, the large amount of computation needed to encode the noise source information is a problem.
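The source of that computational load can be seen in a minimal sketch of an analysis-by-synthesis codebook search. This is a generic illustration, not the procedure of any particular CELP standard: the function names are hypothetical, the filter starts from a zero state, the gain is chosen per code vector by least squares, and the perceptual weighting filter mentioned above is omitted.

```python
def synthesize(excitation, lpc):
    """All-pole filter 1/A(z) with A(z) = 1 + sum(lpc[k] * z^-(k+1)), zero state."""
    out = []
    for n, x in enumerate(excitation):
        y = x
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                y -= a * out[n - 1 - k]
        out.append(y)
    return out


def search_codebook(codebook, lpc, target):
    """Filter every code vector and keep the one closest to the target."""
    best_index, best_error = -1, float("inf")
    for index, code in enumerate(codebook):
        synth = synthesize(code, lpc)              # one filtering pass per entry
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue
        gain = sum(s * t for s, t in zip(synth, target)) / energy
        error = sum((t - gain * s) ** 2 for s, t in zip(synth, target))
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error


codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, -1.0, 0.0, 0.0]]
lpc = [-0.5]
target = [2.0 * s for s in synthesize(codebook[2], lpc)]
best_index, best_error = search_codebook(codebook, lpc, target)
```

Every entry requires its own filtering and correlation pass, so the work grows with codebook size × subframe length × filter order, which is the cost the following paragraph sets out to avoid.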

【００４０】[0040]

In contrast, the speech compression encoding apparatus of the present embodiment encodes the target signal for noise source information extraction without relying on a codebook or on filter computation: the target signal is subjected to a discrete cosine transform (DCT), and the resulting DCT coefficients are converted into a bit string of a predetermined length. As described above, the DCT coefficients are sent to the coefficient conversion unit 303, selected by the coefficient selection unit 304, passed through the intensity quantization unit 305, and converted into a bit string of a predetermined length by the bit string output unit 306. That is, because the apparatus has no codebook and performs no codebook search using filter computation, it can achieve a lower computational load than the conventional CELP method while maintaining high quality and a low bit rate.
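The path from target signal to fixed-length bit string can be sketched end to end. Everything below is an assumption made for illustration: the unnormalised DCT-II, the rule of selecting the largest-magnitude coefficients, the mean absolute value as the shared intensity, the uniform intensity quantizer with a hypothetical `gain_max` range, and the bit layout (position, then sign, per coefficient, followed by the intensity index).

```python
import math


def dct2(x):
    """Unnormalised DCT-II of a list of samples."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]


def encode_target(target, n_select, pos_bits, gain_bits, gain_max):
    """DCT -> coefficient selection -> intensity quantization -> bit string."""
    coeffs = dct2(target)
    by_magnitude = sorted(range(len(coeffs)), key=lambda k: -abs(coeffs[k]))
    picked = sorted(by_magnitude[:n_select])           # selected positions
    mean_abs = sum(abs(coeffs[k]) for k in picked) / n_select
    levels = (1 << gain_bits) - 1
    gain_index = min(levels, round(mean_abs / gain_max * levels))
    bits = ""
    for k in picked:
        bits += format(k, "0{}b".format(pos_bits))     # position
        bits += "1" if coeffs[k] < 0 else "0"          # sign
    bits += format(gain_index, "0{}b".format(gain_bits))  # shared intensity
    return bits


# A target made of a single DCT basis function (index 2) for checking.
target = [math.cos(math.pi * (t + 0.5) * 2 / 8) for t in range(8)]
bits = encode_target(target, n_select=1, pos_bits=3, gain_bits=4, gain_max=8.0)
```

The output length is always n_select × (pos_bits + 1) + gain_bits, and no codebook entry is filtered or compared at any point; the only heavy step is one transform per subframe.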

【００４１】[0041]

The embodiment described above gives an example in which an input analog speech waveform is digitized into a digital speech signal, encoded (compressed), and stored, and in which the stored digital speech signal (the encoded digital speech signal) is retrieved, decoded, converted into an analog speech signal, and output. The essence of the present invention, however, lies in the encoding and decoding means of the speech compression encoding apparatus. A case in which a signal encoded by the apparatus of the present invention is transmitted via transmission means such as a network or a communication device and then decoded by the apparatus of the present invention therefore clearly also falls within the scope of the present invention.

【００４２】[0042]

[Effects of the Invention] As described above, in the speech compression encoding apparatus of the present invention (claim 1), the noise source information extraction means comprises target signal extraction means for extracting a target signal, discrete cosine transform means for transforming the extracted target signal into a discrete cosine transform coefficient sequence, and coefficient conversion means for converting the discrete cosine transform coefficient sequence obtained by the discrete cosine transform means into a bit string of a predetermined length. The coefficient conversion means in turn comprises coefficient selection means for selecting discrete cosine transform coefficients from the discrete cosine transform coefficient sequence, intensity quantization means for quantizing the intensity of the discrete cosine transform coefficients selected by the coefficient selection means, and bit string output means for receiving the processing results of the coefficient selection means and the intensity quantization means and outputting a bit string of a predetermined length. For the discrete cosine transform coefficients selected by the coefficient selection means, the intensity quantization means encodes the coefficient positions and coefficient values a predetermined number at a time, and encodes the coefficient values using only the sign of each coefficient and an intensity representative of all the coefficients. Consequently, the encoding (quantization) of the noise source information, which in the CELP encoding process was performed by a noise excitation source code vector search, is instead performed by encoding (quantizing) the target signal for noise source information extraction itself. This makes it possible to provide a speech compression encoding apparatus in which the amount of computation for encoding the noise excitation source is reduced.

【００４３】[0043]

Further, in the speech compression encoding apparatus according to claim 2, which is the speech compression encoding apparatus according to claim 1, the coefficient selection means divides the discrete cosine transform coefficients into a predetermined number of blocks and selects a predetermined number of coefficients from each block. Therefore, for example, by analyzing in advance, using training data, how often each coefficient is selected and dividing the coefficients into blocks appropriately according to that frequency, the number of bits spent on selection can be reduced greatly without degrading sound quality.

[Brief description of the drawings]

FIG. 1 is a schematic configuration diagram of the speech compression encoding apparatus according to the present embodiment.

FIG. 2 is a block configuration diagram of the speech encoding unit according to the present embodiment.

FIG. 3 is a schematic block diagram of the noise source extraction unit according to the present embodiment.

FIG. 4 is a block diagram showing a partial configuration of the speech decoding unit according to the present embodiment.

FIG. 5 is a schematic flowchart of the speech compression encoding apparatus according to the present embodiment.

FIG. 6 is a flowchart showing the operating procedure of the speech encoding unit according to the present embodiment.

FIG. 7 is an explanatory diagram showing the effect, in the present embodiment, of reducing the number of bits required to encode the intensity and increasing the number of selected coefficients correspondingly.

[Explanation of symbols]

100 speech compression encoding apparatus
101 A/D conversion unit
102 speech encoding unit
103 storage unit
104 speech decoding unit
105 D/A conversion unit
201 frame configuration unit
202 spectral envelope extraction unit
203 subframe configuration unit
204 pitch information extraction unit
205 gain extraction unit
206 noise source extraction unit
301 target signal configuration unit
302 DCT conversion unit
303 coefficient conversion unit
304 coefficient selection unit
305 intensity quantization unit
306 bit string output unit
401 spectral envelope decoding unit
402 pitch information decoding unit
403 noise source decoding unit
404 gain decoding unit
405 speech synthesis unit

Claims

[Claims]

1. A speech compression encoding apparatus comprising: A/D conversion means for digitizing an analog speech waveform into a digital speech signal; speech encoding means for encoding the digital speech signal by a predetermined encoding method; storage means for storing the encoded digital speech signal; speech decoding means for retrieving and decoding the stored digital speech signal; and D/A conversion means for converting the decoded digital speech signal into an analog speech signal, wherein the speech encoding means comprises: frame encoding means for dividing the digital speech signal into processing units called frames; spectral envelope encoding means for extracting and encoding, for the divided frames, spectral envelope information representing a spectral envelope; subframe constructing means for constructing processing units called subframes from the divided frames; pitch information extraction means for extracting and encoding pitch information of the subframe; gain information extraction means for extracting and encoding gain information corresponding to the pitch information of the subframe; and noise source information extraction means for extracting and encoding noise source information as excitation information of the subframe, wherein the speech decoding means comprises: spectral envelope information decoding means for decoding the encoded spectral envelope information; noise source information decoding means for decoding the encoded noise source information; pitch information decoding means for decoding the encoded pitch information; gain information decoding means for decoding the encoded gain information; excitation source signal generation means for generating an excitation source signal from the decoded noise source information, pitch information, and gain information; and synthesized signal generation means for generating a synthesized signal from the excitation source signal and the decoded spectral envelope information, wherein the noise source information extraction means comprises: target signal extraction means for extracting a target signal for noise source information extraction; discrete cosine transform means for transforming the extracted target signal into a discrete cosine transform coefficient sequence; and coefficient conversion means for converting the discrete cosine transform coefficient sequence obtained by the discrete cosine transform means into a bit string of a predetermined length, wherein the coefficient conversion means comprises: coefficient selection means for selecting discrete cosine transform coefficients from the discrete cosine transform coefficient sequence; intensity quantization means for quantizing the intensity of the discrete cosine transform coefficients selected by the coefficient selection means; and bit string output means for receiving the processing results of the coefficient selection means and the intensity quantization means and outputting a bit string of a predetermined length, and wherein the intensity quantization means encodes, for the discrete cosine transform coefficients selected by the coefficient selection means, the coefficient positions and coefficient values a predetermined number at a time, the coefficient values being encoded using only the sign of each coefficient and an intensity representative of all the coefficients.

2. The speech compression encoding apparatus according to claim 1, wherein the coefficient selection means divides the discrete cosine transform coefficients into a predetermined number of blocks and selects a predetermined number of coefficients from each block.