JPH07248799A

JPH07248799A - Method and device for coding/decoding voice

Info

Publication number: JPH07248799A
Application number: JP6042530A
Authority: JP
Inventors: Mikio Mizutani; 幹男水谷; Hiroyuki Nemoto; 博幸根本; Keiji Egashira; 慶治江頭
Original assignee: Matsushita Graphic Communication Systems Inc
Current assignee: Panasonic System Solutions Japan Co Ltd
Priority date: 1994-03-14
Filing date: 1994-03-14
Publication date: 1995-09-26
Anticipated expiration: 2016-09-17
Also published as: JP3210165B2

Abstract

PURPOSE: To realize a speech coding/decoding method and device that can perform precise speech coding using only a small number of codebooks. CONSTITUTION: First, voice data sampled within a prescribed time are converted by a discrete cosine transform from time-axis data into frequency-axis data. The frequency-axis data are then divided into odd-numbered and even-numbered components. Next, the optimal frequency pattern, i.e. the one that most closely approximates the divided frequency-axis data, is selected from the frequency patterns stored in advance in the codebook 5; this pattern is fitted to the frequency-axis data and subtracted from them. The subtraction using a frequency pattern is repeated until the maximum level of the frequency-axis data remaining after subtraction falls to a prescribed value.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a speech coding/decoding method and apparatus.

[0002]

2. Description of the Related Art
Conventionally, speech coding methods of this kind have worked as follows.

[0003] Codebooks corresponding to waveforms of voice data on the time axis are prepared in advance, the codebook closest to the actual time-axis data is selected from among them, and the number identifying the selected codebook is transmitted as the compression-encoded voice data.
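
The prior-art scheme in [0003] is essentially a nearest-neighbour search over stored time-domain waveforms. A minimal Python sketch for comparison, assuming a squared-error distance and a toy three-entry codebook (the source specifies neither); the function name is hypothetical:

```python
from typing import Sequence


def nearest_codebook(block: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the stored waveform closest to `block`.

    Closeness is the sum of squared differences, an assumed metric; the
    prior-art description only says "closest".
    """
    best_index, best_error = -1, float("inf")
    for index, waveform in enumerate(codebook):
        if len(waveform) != len(block):
            raise ValueError("codebook entries must match the block length")
        error = sum((x - w) ** 2 for x, w in zip(block, waveform))
        if error < best_error:
            best_index, best_error = index, error
    return best_index


# Only the returned index is transmitted for the block.
codebook = [[0, 0, 0, 0], [1, 2, 1, 0], [-1, -2, -1, 0]]
print(nearest_codebook([0.9, 1.8, 1.2, 0.1], codebook))  # prints 1
```

Every stored waveform is compared for every block, so both search time and memory grow with the size of the codebook, which is the problem raised in [0004].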

[0004]

Problems to Be Solved by the Invention
However, such a conventional method has the following problems.
(1) Because the codebooks correspond to time-axis waveforms, and voice data contain a wide variety of waveforms, an enormous number of codebooks (usually more than 1,000) must be prepared to achieve good-quality compression. The memory that stores the codebooks must therefore be large.
(2) Because such a huge number of codebooks is needed, finding the codebook that matches the waveform of the actual voice data requires fitting that waveform against every one of them, which takes correspondingly long processing time.

[0005] The present invention solves the above problems by realizing a speech coding/decoding method and apparatus that can perform highly accurate speech coding using only a small number of codebooks.

[0006]

Means for Solving the Problems
To solve the above problems, the present invention, first, encodes voice data as follows. The voice data are subjected to a discrete cosine transform in units of a predetermined time. The maximum level of the data obtained by the discrete cosine transform, and the position of that maximum level, are detected. From a codebook composed of frequency patterns, the codebook entry that approximates the transformed data around the maximum level is selected, and that entry is subtracted from the transformed data. The encoded voice data are then generated from the maximum level, the position information, information identifying the codebook entry used for the subtraction, and the number of times this processing was performed on the voice data within the predetermined time.

[0007] Second, when voice data are encoded by the above method, the number of times a codebook entry is subtracted from the data obtained by the discrete cosine transform can be set arbitrarily before the encoding is performed and the encoded voice data are generated.

[0008] Third, when voice data are encoded by the first method, the minimum level of the transformed data at which the subtraction is still performed can be set arbitrarily; codebook entries are then subtracted from the data obtained by the discrete cosine transform accordingly, and the encoded voice data are generated.
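
The first three aspects describe a greedy loop: find the largest coefficient, fit a stored pattern there, subtract it, and repeat until a preset count is used up or the largest remaining coefficient is at or below a threshold. The Python sketch below illustrates that loop under stated assumptions, not as the patented implementation: the DCT coefficients are taken as already computed, each pattern is normalised so its largest-magnitude sample is 1, the fit error is a sum of absolute differences over the whole block, and the pattern is scaled by the signed peak value. The function names and the toy codebook are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = Tuple[float, int, int]  # (signed peak level, peak position, code number)


def peak(coeffs: Sequence[float]) -> Tuple[float, int]:
    """Return the signed value and index of the largest-magnitude coefficient."""
    position = max(range(len(coeffs)), key=lambda i: abs(coeffs[i]))
    return coeffs[position], position


def placed(pattern: Sequence[float], level: float, position: int, n: int) -> List[float]:
    """Scale a unit-peak pattern by `level` and align its peak with `position`."""
    centre = max(range(len(pattern)), key=lambda i: abs(pattern[i]))
    out = [0.0] * n
    for i, value in enumerate(pattern):
        j = position - centre + i
        if 0 <= j < n:  # samples that fall outside the block are dropped
            out[j] = level * value
    return out


def encode_block(coeffs: Sequence[float], codebook: Sequence[Sequence[float]],
                 max_iterations: int, min_level: float) -> List[Frame]:
    """Greedy peak-fit-subtract loop; len(result) is the subtraction count."""
    residual = [float(c) for c in coeffs]
    frames: List[Frame] = []
    for _ in range(max_iterations):            # preset repetition count
        level, position = peak(residual)
        if abs(level) <= min_level:            # threshold reached: stop early
            break
        best_code, best_error = 0, float("inf")
        for code, pattern in enumerate(codebook):
            candidate = placed(pattern, level, position, len(residual))
            error = sum(abs(r - c) for r, c in zip(residual, candidate))
            if error < best_error:
                best_code, best_error = code, error
        chosen = placed(codebook[best_code], level, position, len(residual))
        residual = [r - c for r, c in zip(residual, chosen)]
        frames.append((level, position, best_code))
    return frames


codebook = [[1.0], [0.5, 1.0, 0.5]]            # toy unit-peak patterns
coeffs = [0.0, 2.0, 4.0, 2.0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0]
print(encode_block(coeffs, codebook, max_iterations=8, min_level=0.5))
# prints [(4.0, 2, 1), (-1.0, 7, 0)]
```

A block whose largest coefficient is already at or below `min_level` yields no frames at all, in line with the uniform treatment of silence in the fourth aspect; raising `max_iterations` or lowering `min_level` gives more frames and a closer reconstruction.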

[0009] Fourth, when voice data are encoded by the first method, a silent state requires no special processing: the encoded data for silence are generated by the same processing that is used when speech is present.

[0010] Fifth, when voice data are encoded by the first method, the subtraction is performed after the maximum level of the codebook entry is matched, in both magnitude and position, to the maximum level of the data obtained by the discrete cosine transform, and the encoded voice data are generated.

[0011] Sixth, when voice data are encoded by the first method, the length of the generated encoded data varies with the number of times codebook entries are subtracted from the data obtained by the discrete cosine transform.

[0012]

Operation
With the method and configuration described above, the present invention first applies a discrete cosine transform to voice data sampled within a predetermined time, converting the time-axis data into frequency-axis data. The frequency-axis data are then divided into odd-numbered and even-numbered components. Next, from the frequency patterns stored in advance in memory, the optimum frequency pattern that most closely approximates the divided frequency-axis data is selected, fitted to the frequency-axis data, and subtracted from them. This pattern subtraction is repeated until the maximum level of the frequency-axis data remaining after subtraction reaches a predetermined value.

[0013] When this processing is complete, compression-encoded data are generated and transmitted based on information indicating (1) the maximum level of the frequency-axis data before processing, (2) the number of the frequency pattern used for the subtraction, (3) the frequency position of the maximum level of the subtracted data, and (4) the number of subtractions performed.

[0014] Thus, when voice data are compression-encoded, they are converted by the discrete cosine transform into frequency-axis data, which are then divided into odd-numbered and even-numbered components. Because this division increases the correlation coefficient within each of the odd and even regions, the number of frequency patterns needed to fit the frequency-axis data, and hence the number that must be stored in advance, can be greatly reduced. The memory needed to store the frequency patterns is correspondingly much smaller, and because there are few patterns, the compression-encoding process is simpler and faster.

[0015] In addition to these effects, because the compression-encoded voice data are generated from the maximum value of the frequency-axis data before processing, the number of the subtracted pattern, the position of the subtracted data, and the number of subtractions, the compression efficiency of the voice data can be raised while their reproducibility is preserved.

[0016]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
An embodiment of the present invention is described below with reference to the drawings.

[0017] First, speech encoding is described. FIG. 1 is a block diagram of a speech encoding apparatus that implements the speech encoding process according to the present invention. In FIG. 1, reference numeral 1 denotes an A/D converter that samples voice data input as an analog signal at predetermined intervals and outputs the samples. Reference numeral 2 denotes a buffer that receives the voice data sampled by the A/D converter 1, accumulates them until a predetermined number has been stored, and then outputs them; in this embodiment, the buffer 2 accumulates 128 voice samples. Reference numeral 3 denotes discrete cosine transform means that applies a discrete cosine transform to the voice data output from the buffer 2; that is, as shown in FIG. 9, it converts the voice data from time-axis data (a) into frequency-axis data (b). FIG. 9 is described later. Reference numeral 4 denotes maximum level detection means that detects, for the voice data converted into frequency-axis data within the predetermined time, the value of the maximum level and the frequency position at which that maximum level occurs. Reference numeral 5 denotes a codebook that stores patterns selected, for their high frequency of occurrence, from among the patterns of frequency-axis data produced by the discrete cosine transform means 3. Frequency patterns such as those in FIG. 8, which shows representative examples, are used for the codebook 5. Although frequency patterns differ in level, patterns of similar shape recur, so selecting only the most frequently occurring patterns is enough to cover the frequency-axis data that actually appear. A characteristic feature of the present invention is that the codebook consists of frequency patterns, so only a small number of frequency patterns need be stored in the codebook 5; in this embodiment, 16 or fewer are sufficient. Reference numeral 6 denotes optimum codebook selection means that selects from the codebook 5 a frequency pattern that approximates the data at the frequency position of the maximum level output by the maximum level detection means 4. Reference numeral 7 denotes calculation means that compares the actual frequency-axis data with the frequency pattern selected from the codebook 5 and performs the subtraction. Reference numeral 8 denotes encoding means that generates and outputs compression-encoded voice data. The encoding means 8 generates the compressed code data from (1) the maximum level of the voice data converted into frequency-axis data within the predetermined time (hereinafter simply the maximum level) and (2) the frequency position at which that maximum level occurs (hereinafter the position information), both output from the maximum level detection means 4, and (3) the number of the frequency pattern output from the optimum codebook selection means 6 (hereinafter the code number).

[0018] The speech encoding process of the apparatus configured as above is described below with reference to FIGS. 2 to 10. FIG. 2 is a top-level flowchart of an embodiment of the voice data encoding process of the present invention. FIG. 3 is a flowchart of the maximum level detection process in step 4 of FIG. 2 (steps are hereinafter written "ST", so this is ST4). FIG. 4 is a flowchart of the optimum codebook selection process in ST6 of FIG. 2. FIG. 5 is a flowchart of the calculation (subtraction) process in ST7 of FIG. 2. FIG. 6 is a flowchart of encoding process A in ST8 of FIG. 2. FIG. 7 is a flowchart of encoding process B in ST11 of FIG. 2. FIG. 8 shows representative frequency patterns stored in the codebook 5 of FIG. 1. FIG. 9 shows the state of the data within the predetermined time when the voice data are converted by the discrete cosine transform from time-axis data into frequency-axis data: (a) shows the time-axis data; (b) shows the data of (a) after the discrete cosine transform into frequency-axis data; (c) shows, as time-axis data, the voice data reproduced from (a) on the receiving side; and (d) shows the frequency-axis data before the inverse discrete cosine transform that yields the time-axis data (c). FIG. 10 shows the structure of the compression-encoded data.

[0019] First, the outline of the speech encoding process of this embodiment is described with reference to FIG. 2. Before voice data are input, the number of times the calculation process of ST7 is to be repeated is set in a counter (ST1). As described later, the larger this number, the more faithfully the compression-encoded voice data can be reproduced; conversely, the smaller it is, the poorer the reproduction but the higher the compression efficiency.

[0020] When voice data are input, the A/D converter 1 samples them at predetermined intervals in units of 128 samples and outputs them one sample at a time. The buffer 2 accumulates the sampled voice data until 128 samples have been stored and then outputs them to the discrete cosine transform means 3. At this point the voice data are in the state shown in FIG. 9(a). The discrete cosine transform means 3 transforms the input time-axis voice data into frequency-axis data, shown in FIG. 9(b) (ST2), separates the result into odd-numbered and even-numbered components, and outputs them (ST3).
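
ST2 and ST3 can be sketched in plain Python as a 128-point DCT followed by a split of the coefficients. Two assumptions: the DCT type and normalisation (orthonormal DCT-II here) are not stated in the source, and "odd-numbered and even-numbered components" is read as odd- and even-indexed coefficients. The direct O(N^2) sum is for clarity; a practical encoder would use a fast transform.

```python
import math
from typing import List, Sequence, Tuple

BLOCK = 128  # samples per block, as in the embodiment


def dct_ii(x: Sequence[float]) -> List[float]:
    """Orthonormal DCT-II by direct summation (O(N^2), for clarity only)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)))
    return out


def split_even_odd(coeffs: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Assumed reading of ST3: even-indexed and odd-indexed coefficients."""
    return list(coeffs[0::2]), list(coeffs[1::2])


# A block that is exactly DCT basis function 5 puts all its energy in bin 5.
samples = [math.cos(math.pi * (i + 0.5) * 5 / BLOCK) for i in range(BLOCK)]
coeffs = dct_ii(samples)
even, odd = split_even_odd(coeffs)
print(len(even), len(odd), round(coeffs[5], 6))  # prints 64 64 8.0
```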

[0021] Separating the frequency-axis data into odd-numbered and even-numbered components in this way increases the correlation coefficient within each of the odd and even regions, and data such as those shown in FIG. 9(b) are generated.

[0022] Next, among the 128 frequency-axis values obtained for the predetermined time, the maximum level is detected (ST4), together with the frequency position of the value at that level. Here the maximum level refers to the magnitude of the data: the level that is largest in absolute value is detected.

[0023] If the maximum level is at or below a predetermined level (including 0) (ST5), the block is regarded as containing no speech, no computation with frequency patterns is performed, and the process proceeds to encoding process B in ST11. In this embodiment, even when processing branches for a silent state, there is no special processing for silence; the process simply proceeds to encoding process B in ST11, which is the same processing used when speech is present.

[0024] If, in ST5, the maximum level exceeds the predetermined level, the process proceeds to computation with frequency patterns. The maximum level detection means 4 outputs the maximum level (1) and the position information (2) to the optimum codebook selection means 6, which retrieves the frequency patterns from the codebook 5 one after another. Supposing the codebook 5 holds five frequency patterns, they are fitted in turn, starting with the first, to the actual frequency-axis data, and the one of the five that best approximates the actual data is selected. Each frequency pattern stored in the codebook 5 is normalized so that its maximum level is 1, as shown in FIG. 8(a). When a pattern is fitted to the actual frequency-axis data, it is therefore multiplied by n so that its maximum level matches the maximum level of the actual data. In this way the optimum codebook selection means 6 selects the optimum frequency pattern from the codebook 5 (ST6) and outputs its number, the code number (3).

[0025] In the calculation process of ST7, the optimum frequency pattern is retrieved from the codebook 5 according to the code number (3) output by the optimum codebook selection means 6 and subtracted from the actual frequency-axis data output by the discrete cosine transform means 3. After the subtraction, the calculation count (4) is output to the encoding means 8, and the frequency-axis data for the same predetermined time, after subtraction, are output to the maximum level detection means 4.
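
The selection of [0024] and the subtraction of [0025] can be written compactly as below. The symbols are introduced here for illustration only: X(k) is the frequency-axis data of the block, p the detected position of the maximum level, and P_c the c-th stored pattern, whose unit peak sits at index q_c and which is taken to be zero outside its stored length. Taking the scale factor n as the signed value X(p) and measuring the fit by a sum of absolute differences are assumptions; the text only says the pattern is multiplied by n to match the maximum level and that the differences are summed.

```latex
\begin{aligned}
n &= X(p), \qquad P_c(q_c) = 1, \\
\hat{X}_c(k) &= n \, P_c(k - p + q_c), \\
E_c &= \sum_{k} \bigl| X(k) - \hat{X}_c(k) \bigr|, \qquad c^{*} = \arg\min_{c} E_c, \\
X'(k) &= X(k) - \hat{X}_{c^{*}}(k).
\end{aligned}
```

Here c* corresponds to the code number (3), and the residual X' is what is passed back to the maximum level detection means 4 for the next iteration.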

[0026] In encoding process A of ST8, a frame of the compression-encoded data, as shown in FIG. 10(b), is generated from the maximum level (1) and position information (2) received from the maximum level detection means 4 and the code number (3) received from the optimum codebook selection means 6. Frame 1 is generated from the first calculation in the calculation means 7, frame 2 from the second, and so on; the more calculations are performed, the more frames there are.
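
Each frame carries items (1) to (3) for one subtraction. One possible byte layout is sketched below with Python's standard `struct` module; the field widths, the integer quantization of the level, and the big-endian byte order are all assumptions, since the source does not describe FIG. 10(b) at this level of detail. The function names are hypothetical.

```python
import struct
from typing import Tuple

# Hypothetical frame layout; the source does not specify field widths.
#   >   big-endian, no padding
#   h   signed 16-bit maximum level (item (1)), already quantized to an integer
#   B   unsigned 8-bit position of the maximum level (item (2)), 0..127
#   B   unsigned 8-bit code number of the subtracted pattern (item (3))
FRAME_FORMAT = ">hBB"


def pack_frame(level: int, position: int, code: int) -> bytes:
    """Build the frame produced by one pass of encoding process A (ST8)."""
    return struct.pack(FRAME_FORMAT, level, position, code)


def unpack_frame(frame: bytes) -> Tuple[int, int, int]:
    """Recover (level, position, code) on the decoding side."""
    return struct.unpack(FRAME_FORMAT, frame)


frames = [pack_frame(812, 5, 3), pack_frame(-240, 13, 0)]  # one per subtraction
print(struct.calcsize(FRAME_FORMAT), [f.hex() for f in frames])
# prints 4 ['032c0503', 'ff100d00']
```

With 128 coefficients the position fits in one byte, and a codebook of 16 or fewer patterns needs only four bits for the code number, so a tighter bit-packed layout is possible; byte-aligned fields simply keep the sketch readable.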

[0027] When encoding process A in ST8 is complete, the calculation count set in ST1 is decremented (ST9), and it is determined whether the counter value has reached 0 (ST10). If it has, the process proceeds to ST11. If not, the process returns to the maximum level detection (ST4), detects the maximum level in the frequency-axis data after subtraction, and repeats ST5 → ST6 → ST7 → ST8 → ST9 → ST10 until the counter value reaches 0 in ST10.

[0028] If a large counter value is set in ST1, the number of calculations with frequency patterns (ST7) increases, and voice data can be compressed with correspondingly higher sound quality. On the other hand, the larger number of calculations increases the number of frames in the compression-encoded data shown in FIG. 10(a), which lengthens the data and lowers the compression ratio of the voice data. The present invention thus allows the compression ratio of voice data to be set and changed as desired.
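
The trade-off in [0028] becomes simple arithmetic once sizes are fixed. The sketch below assumes, purely for illustration, 16-bit PCM input, a 2-byte header, and 4-byte frames; none of these sizes appear in the source, and the function names are hypothetical.

```python
# All sizes below are hypothetical; the source gives no field widths.
PCM_BYTES = 128 * 2   # one block of 128 samples, assuming 16-bit PCM
HEADER_BYTES = 2      # assumed header size
FRAME_BYTES = 4       # assumed size of one frame (level, position, code number)


def encoded_length(frames: int) -> int:
    """Length in bytes of one block's compression-encoded data."""
    return HEADER_BYTES + FRAME_BYTES * frames


def compression_ratio(frames: int) -> float:
    """Raw PCM size divided by encoded size for one block."""
    return PCM_BYTES / encoded_length(frames)


for frames in (0, 4, 16):
    print(frames, encoded_length(frames), round(compression_ratio(frames), 1))
# prints:
# 0 2 128.0
# 4 18 14.2
# 16 66 3.9
```

Under these assumptions each extra subtraction costs a fixed 4 bytes, so the data length grows linearly with the counter value set in ST1, as the sixth aspect describes.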

[0029] Furthermore, even if a large counter value is set in ST1, when the maximum level falls below the predetermined level in ST5, as happens when the frequency-axis data are almost identical to a frequency pattern in the codebook 5, the calculation is not repeated but stops at that point, even though the counter has not reached 0, and the process proceeds to encoding process B in ST11. Thus, when the actual frequency-axis data closely resemble the frequency patterns, fewer calculations are performed than the preset counter value; when they do not, calculation continues up to the preset counter value. The compression ratio can therefore be raised while the reproducibility of the voice data is maintained.

[0030] The predetermined level in ST5 can be set arbitrarily. Setting it high degrades the reproducibility of the voice data, but because the calculation is then stopped regardless of the counter value set in ST1, the compression ratio rises accordingly. Setting it low improves reproducibility, but because the calculation then continues until the counter value set in ST1 reaches 0, the compression ratio falls accordingly.

[0031] When the counter value reaches 0 in ST10, or when the maximum level falls to or below the predetermined level in ST5, the process proceeds to encoding process B in ST11. Encoding process B combines frames 1 to n, generated in encoding process A of ST8 from the results of each calculation, into a single unit of compression-encoded data. As shown in FIG. 10(a), header information indicating that the data are voice data for the predetermined time is placed at the head, followed by frames 1 through n in order, forming the compression-encoded data for that predetermined time. The compression-encoded data generated in this way are output externally.

[0032] In ST11, if the input voice data are silent, the compression-encoded data consist of the header portion of FIG. 10(a) alone. When the compression-encoded data are decoded, data consisting only of a header are judged to be silence. As described above, even when a silent state is detected, no processing specific to silence is performed; the compression-encoded data are generated by the same processing used when speech is present, which correspondingly reduces the amount of software required. Moreover, because less compressed information is produced when there is no speech, the compression ratio of the voice data rises accordingly. The compression-encoded data for silence may alternatively consist of a header with one frame appended.
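
A sketch of encoding process B and the decoder's silence test, using Python's standard `struct` module. The header layout (a marker byte followed by a frame count), the marker value 0xA5, and the 4-byte frame size are hypothetical; the source only says the header identifies the data as one predetermined time span and that a header-only block means silence. Carrying the frame count in the header is one possible way to transmit item (4), the number of subtractions.

```python
import struct
from typing import List, Sequence

# Hypothetical block layout; the source does not specify the header contents.
HEADER_FORMAT = ">BB"   # block marker, number of frames that follow
BLOCK_MARKER = 0xA5     # arbitrary marker value chosen for this sketch
FRAME_BYTES = 4         # assumed fixed frame size
HEADER_BYTES = struct.calcsize(HEADER_FORMAT)


def build_block(frames: Sequence[bytes]) -> bytes:
    """Encoding process B (ST11): header first, then frames 1..n in order."""
    return struct.pack(HEADER_FORMAT, BLOCK_MARKER, len(frames)) + b"".join(frames)


def is_silent(block: bytes) -> bool:
    """Decoder check: a block consisting of the header alone is silence."""
    return len(block) == HEADER_BYTES


def split_block(block: bytes) -> List[bytes]:
    """Decoder side: validate the header and return the individual frames."""
    marker, count = struct.unpack_from(HEADER_FORMAT, block)
    if marker != BLOCK_MARKER:
        raise ValueError("missing block marker")
    body = block[HEADER_BYTES:]
    if len(body) != count * FRAME_BYTES:
        raise ValueError("frame count does not match payload length")
    return [body[i:i + FRAME_BYTES] for i in range(0, len(body), FRAME_BYTES)]


silent = build_block([])  # no frames were produced for a silent block
voiced = build_block([bytes.fromhex("032c0503"), bytes.fromhex("ff100d00")])
print(silent.hex(), is_silent(silent))  # prints a500 True
print(len(voiced), is_silent(voiced))   # prints 10 False
```

The silent block needs no special branch in `build_block`: an empty frame list naturally produces the header-only form, mirroring the point that silence is handled by the ordinary encoding path.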

[0033] The above is an outline of the speech encoding process of this embodiment. The maximum level detection process of ST4 in FIG. 2 is now described in detail with reference to FIG. 3.

[0034] First, the counter value is set and the maximum level is set to 0 (ST1). Here the counter value is the number of samples within the predetermined time; since 128 samples of voice data are taken within the predetermined time in this embodiment, the counter value is 128. Next, the voice data converted into frequency-axis data are input from the discrete cosine transform means 3, and the following processing is repeated for each of the frequency-axis values, in order from the 1st sample to the 128th.

[0035] In ST2, the first sampled value is subtracted from the maximum level 0 set in ST1. If the first value is 0, the result of the subtraction is 0, so the process proceeds to ST5, where the counter value set in ST1 is decremented from 128 to 127.

[0036] In ST6, it is determined whether the counter value set in ST1 has reached 0. In this example it is 127, so the process returns to ST2. The cycle ST2 → ST3 → ST5 → ST6 repeats with the maximum level remaining 0, and suppose that at the 5th sample the data level is 50. In ST2, the level 50 is subtracted from the maximum level 0. The result, -50, is less than 0, so the process proceeds to ST4, where the maximum level is set to 50 and the position information to 5. The process then continues ST5 → ST6 → ST2.

[0037] Next, suppose that at the 6th sample the data level is -30. Because ST2 subtracts absolute values, the level 30 is subtracted from the previously updated maximum value 50. The result, 20, is greater than 0, so the process proceeds to ST5; that is, the maximum level is not updated. The process continues ST5 → ST6 → ST2.

[0038] Suppose the processing is repeated several times and the maximum level is still 50 when the counter value is 12. When the counter value becomes 13, the data level is -80. In ST2, the absolute level 80 is subtracted from the maximum level 50. The result, -30, is less than 0, so the process proceeds to ST4, where the maximum level is updated from 50 to 80 and the position information from 5 to 13.

[0039] The above processing is repeated up to the 128th sampled value. The maximum level (1) and position information (2) of the voice data converted into frequency-axis data within the predetermined time are then detected and output.
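ST1 to ST6 amount to a single pass that keeps the largest absolute level and its position. The Python sketch below illustrates this under stated assumptions and is not the patent's own implementation:

- the function name `detect_max_level` is hypothetical;
- positions are counted from 1 to match the worked example;
- a returned position of 0 signals that every value was zero.

```python
def detect_max_level(coeffs):
    """Return (max_level, position) for one frame of frequency-axis data.

    max_level is the largest absolute level; position is 1-based, as in the
    worked example. Position 0 (an assumption) means every value was zero.
    """
    max_level = 0   # ST1: maximum level starts at 0
    position = 0
    for index, value in enumerate(coeffs, start=1):
        # ST2/ST3: subtract the absolute level from the current maximum;
        # a negative result means this sample is a new maximum.
        if max_level - abs(value) < 0:
            max_level = abs(value)  # ST4: update the maximum level
            position = index        # ST4: update the position information
    return max_level, position


# Worked example from [0036]-[0038]: 128 samples, mostly zero.
frame = [0] * 128
frame[4] = 50    # position 5
frame[5] = -30   # position 6
frame[12] = -80  # position 13
print(detect_max_level(frame))  # (80, 13)
```

Ties keep the earlier position, because only a strictly negative difference triggers ST4.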

[0040] This completes the maximum level detection process. The optimum codebook selection process of ST6 in FIG. 2 is described in detail below with reference to FIG. 4.

[0041] First, the counter value is set, and the maximum level (1) and position information (2) are input from the maximum level detection means 4. An arbitrarily large value is entered as the minimum subtraction error (ST1). Here, the counter value indicates the number of frequency patterns stored in the codebook 5; in this embodiment there are five, shown as (a) to (e) of FIG. 8. The minimum subtraction error is initialized to a value larger than the largest sum that can result from subtracting a frequency pattern in the codebook 5 from the actual frequency-axis data.

[0042] Next, the first frequency pattern stored in the codebook 5 is retrieved (ST2); here, assume it is pattern (a) of FIG. 8. Pattern (a) is multiplied by n to match the maximum level (ST3). As described above, each frequency pattern is stored in the codebook 5 normalized so that its peak height is 1. Therefore, if the maximum level input from the maximum level detecting means 4 is 80, pattern (a) is multiplied by 80.

[0043] In ST4, the peak of the level-adjusted pattern (a) is aligned with the peak of the actual frequency-axis data, and the pattern is subtracted. The peak of the actual frequency-axis data is therefore completely cancelled by the subtraction. The sum of the levels remaining at each frequency position after the subtraction is then stored as the subtraction error.

[0044] In ST5, the subtraction error from ST4 is subtracted from the minimum subtraction error set in ST1. Suppose the minimum subtraction error is 1000 and the subtraction error is 20. The result, 980, is greater than 0 (ST6), so the process proceeds to ST7.

[0045] In ST7, the minimum subtraction error is updated from 1000 to 20, and 1 is stored as the code number of the frequency pattern.

[0046] In ST8, the counter value set in ST1 is decremented from 5 to 4. It is then determined whether the counter value is 0 (ST9). Since it is not, the process returns to ST2.

[0047] In ST2, the second frequency pattern is retrieved from the codebook 5; here it is pattern (b) of FIG. 8. Pattern (b) is again multiplied by n so that its peak matches the maximum level of the actual frequency-axis data as it was before the subtraction in the previous ST4 (ST3).

[0048] In ST4, the peak of pattern (b) is aligned with the peak of the actual frequency-axis data as it was before the previous ST4 subtraction, and the pattern is subtracted. The sum of the levels remaining at each frequency position is stored as the subtraction error.

[0049] In ST5, the subtraction error is subtracted from the minimum subtraction error. The minimum subtraction error was updated to 20 in the previous pass; suppose the subtraction error is now 10. The result, 10, is greater than 0 (ST6). The minimum subtraction error is therefore updated from 20 to 10, and the code number is updated from pattern 1 to pattern 2. The counter value is decremented, and the process proceeds ST9 → ST2. The third frequency pattern, pattern (c) of FIG. 8, is retrieved from the codebook 5 (ST2). It is again multiplied by n to match the maximum level of the original actual frequency-axis data (ST3).

[0050] In ST4, the peak of pattern (c) is aligned with the peak of the original actual frequency-axis data, and the pattern is subtracted. The sum of the levels remaining at each frequency position is stored as the subtraction error.

[0051] In ST5, the subtraction error is subtracted from the minimum subtraction error. The minimum subtraction error was updated to 10 in the previous pass; suppose the subtraction error is now 30. The result, -20, is less than 0 (ST6). The minimum subtraction error is therefore not updated and remains 10, and the process proceeds ST8 → ST9 → ST2.

[0052] This processing is repeated until the counter value reaches 0 in ST9. Since the codebook 5 holds five frequency patterns, it runs five times. Suppose that after the fifth pass the minimum subtraction error from ST7 is 10 and the code number is 2. The second frequency pattern (b) is then selected as the closest match among the patterns in the codebook 5.

Thus, in the present invention, the voice data is first converted by the discrete cosine transform from time-axis data to frequency-axis data, and frequency patterns are fitted to the result. The frequency patterns that appear usually have similar shapes even though their levels differ, so there is no need to store as many as 1,000 patterns in advance. The optimum codebook selection only has to pick one entry from a small number of frequency patterns. It is therefore simple to carry out and much faster than the conventional method.
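ST1 to ST9 of FIG. 4 can be sketched as a loop that fits every normalized pattern to the same data and keeps the one with the smallest residual. This Python sketch is illustrative only and rests on several assumptions:

- FIG. 8 is not reproduced here, so the five entries of `CODEBOOK` are invented shapes.
- Each pattern is a short list containing exactly one sample of height 1.0 (its peak).
- Positions are 0-based indices.
- The scale factor n is the signed data value at the peak, so a negative peak also cancels exactly.
- The residual is summed as absolute values; the text says only "sum of the levels".
- Pattern samples that fall outside the frame are dropped.

```python
# Hypothetical codebook: FIG. 8 is not available, so these five shapes are
# invented. Each pattern is peak-normalized: it contains exactly one 1.0.
CODEBOOK = [
    [1.0],
    [0.5, 1.0, 0.5],
    [0.25, 0.5, 1.0, 0.5, 0.25],
    [1.0, 0.5],
    [0.5, 1.0],
]


def subtract_pattern(coeffs, peak_index, pattern):
    """ST3-ST4: scale the pattern to the data peak, align peaks, subtract.

    Returns a new list (the input is not modified). Pattern samples that fall
    outside the frame are dropped, an assumption the text does not address.
    """
    scale = coeffs[peak_index]       # "multiply by n": n = signed peak value
    center = pattern.index(1.0)      # offset of the pattern's own peak
    residual = list(coeffs)
    for k, p in enumerate(pattern):
        j = peak_index + k - center  # frequency position hit by sample k
        if 0 <= j < len(residual):
            residual[j] -= scale * p
    return residual


def select_codebook(coeffs, peak_index, codebook):
    """Return (code_number, min_error); code numbers start at 1 as in FIG. 4."""
    min_error = float("inf")         # ST1: "an arbitrarily large value"
    best_code = 0
    for code, pattern in enumerate(codebook, start=1):   # ST2 ... ST8/ST9
        residual = subtract_pattern(coeffs, peak_index, pattern)
        error = sum(abs(r) for r in residual)  # ST4 (absolute values assumed)
        if min_error - error > 0:    # ST5/ST6: strictly smaller error wins
            min_error, best_code = error, code  # ST7
    return best_code, min_error


data = [0] * 16
data[5], data[6], data[7] = 40, 80, 40
print(select_codebook(data, 6, CODEBOOK))  # (2, 0.0)
```

Every candidate is fitted to a copy of the original data. This mirrors the text's requirement that patterns (b), (c), and so on are each matched against the data as it was before the previous ST4 subtraction.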

[0053] This completes the optimum codebook selection process. The calculation process (subtraction) of ST7 in FIG. 2 is described in detail below with reference to FIG. 5.

[0054] First, the calculation means 7 receives the maximum level (1) and position information (2) from the maximum level detecting means 4, and the code number (3) from the optimum codebook selecting means 6 (ST1). Next, the frequency pattern corresponding to the code number (3) is retrieved from the codebook 5. It is multiplied by n to match the maximum level of the actual frequency-axis data within the predetermined time.

[0055] In ST4, the frequency position of the pattern's peak is aligned with the frequency position of the peak of the actual frequency-axis data. The frequency pattern is then subtracted from the actual frequency-axis data. The levels remaining after the subtraction are returned to the maximum level detecting means 4.

[0056] The subtraction is performed with the peak frequency position of the frequency pattern aligned to the peak frequency position of the actual frequency-axis data. Each subtraction is therefore referenced to the maximum level of the data. As a result, even if the counter value in ST1 of FIG. 2 is set small so that subtraction stops after only a few passes, reasonable reproducibility is preserved and the compression rate can be increased.
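The calculation step of FIG. 5, and the hand-off back to maximum level detection, can be sketched as follows. The assumptions are:

- `subtract_aligned` and `largest_abs_index` are hypothetical helper names.
- Positions are 0-based.
- The pattern holds its peak as a single 1.0.
- The scale factor is the signed value of the data at the peak.

```python
def subtract_aligned(coeffs, peak_index, pattern):
    """FIG. 5, ST2-ST4: scale to the data peak, align peaks, subtract."""
    scale = coeffs[peak_index]       # assumed: n = signed value at the peak
    center = pattern.index(1.0)      # pattern is peak-normalized (one 1.0)
    residual = list(coeffs)
    for k, p in enumerate(pattern):
        j = peak_index + k - center
        if 0 <= j < len(residual):   # samples past the frame edge are dropped
            residual[j] -= scale * p
    return residual


def largest_abs_index(values):
    """Next maximum-level search on the residual (0-based index)."""
    return max(range(len(values)), key=lambda i: abs(values[i]))


data = [0] * 16
data[5], data[6], data[7] = 40, 80, 40   # dominant component around index 6
data[12] = -30                           # weaker component
residual = subtract_aligned(data, 6, [0.5, 1.0, 0.5])
print(residual[6], largest_abs_index(residual))  # 0.0 12
```

After one pass the original peak is exactly zero, so the next maximum-level search lands on the next-largest component. This is why a small iteration count still captures the dominant components first.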

[0057] This completes the calculation process. Encoding process A of ST8 in FIG. 2 is described in detail below with reference to FIG. 6.

[0058] First, the relative level (1) is input from the maximum level detecting means 4 (ST1). The relative level is the maximum level. However, the maximum level changes as the processing from ST4 to ST10 in FIG. 2 is repeated, so the relative level denotes the maximum level found in each pass.

[0059] Next, the position information (2) corresponding to the relative level from ST1 is input from the maximum level detecting means 4 (ST2).

[0060] Then, the code number (3) is input from the optimum codebook selecting means 6 (ST3). In ST4, a coded frame is generated from the relative level (1), the position information (2), and the code number (3), as shown in (b) of FIG. 10.
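A coded frame carries only three values. FIG. 10(b) does not survive in this text, so the byte layout below is purely hypothetical:

- big-endian signed 16-bit relative level;
- 8-bit position;
- 8-bit code number.

Rounding the level to an integer is likewise an assumption of this sketch, not something the patent specifies.

```python
import struct

# Hypothetical frame layout (FIG. 10(b) is not reproduced in this text):
# big-endian signed 16-bit relative level, unsigned 8-bit position (0-127),
# unsigned 8-bit code number.
FRAME_FORMAT = ">hBB"


def pack_frame(relative_level, position, code):
    """Encoding process A, ST4: turn (1), (2), (3) into one coded frame."""
    # Rounding the level to an integer is an assumption of this sketch.
    return struct.pack(FRAME_FORMAT, int(round(relative_level)), position, code)


def unpack_frame(frame):
    """Inverse of pack_frame, returning (relative_level, position, code)."""
    return struct.unpack(FRAME_FORMAT, frame)


frame = pack_frame(-80, 12, 2)
print(len(frame), unpack_frame(frame))  # 4 (-80, 12, 2)
```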

[0061] This completes encoding process A. Encoding process B of ST10 in FIG. 2 is described in detail below with reference to FIG. 7.

[0062] When the number of calculation passes set in ST1 of FIG. 2 has been completed, compression coding of the voice data within the predetermined time is finished. Processing then moves on to generating the compression-coded data for that period.

[0063] The first step is to generate the header information shown in (a) of FIG. 10. The header consists of a block identifier, the maximum level (1), and the number of calculations (4).

- The block identifier marks the boundary of each predetermined-time segment of voice data. By detecting it during reproduction, the decoder can tell that the voice data of one predetermined period has ended and that of the next has begun.
- The maximum level (1) is the maximum level of the frequency-axis voice data within the predetermined time, that is, the first maximum level detected by the maximum level detection process of ST4 in FIG. 2.
- The number of calculations (4) is the count value set in ST1 of FIG. 2. It indicates how many times the processing from ST3 to ST9 of FIG. 2 was repeated.

[0064] The encoding means 8 receives the block identifier (ST1), the maximum level (1) from the maximum level detecting means 4 (ST2), and the number of calculations (4) from the calculation means 7 (ST3). From these it generates the header information shown in (b) of FIG. 10.

[0065] Each time the processing of FIG. 2 is repeated, the frame generated by encoding process A in ST8 of FIG. 2 is input (ST4) and appended to the header information. This produces a coded block as shown in (a) of FIG. 10 (ST5), which is output externally. The number of frames appended to the header therefore grows with the number of times the processing of FIG. 2 is repeated.

If the processing is repeated many times, the coded block (compression-coded data) of FIG. 10(a) becomes long and the compression rate drops accordingly. If it is repeated only a few times, the block becomes short and the compression rate rises accordingly. Thus, by setting the counter value in ST1 of FIG. 2 as desired, the compression rate of the voice data can be changed.
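Encoding process B prepends a header to the accumulated frames. The field widths and values below are hypothetical illustrations; none come from the patent:

- a 1-byte block identifier with the value `0xA5`;
- a signed 16-bit maximum level;
- a 1-byte calculation count;
- frames of a signed 16-bit level, an 8-bit position, and an 8-bit code.

```python
import struct

BLOCK_ID = 0xA5          # hypothetical block identifier value
HEADER_FORMAT = ">BhB"   # block identifier, maximum level (1), count (4)
FRAME_FORMAT = ">hBB"    # relative level (1), position (2), code number (3)


def build_block(max_level, frames):
    """Encoding process B: header followed by one frame per iteration.

    frames is a list of (relative_level, position, code) tuples; its length
    is written as the number of calculations (4). All widths are assumed.
    """
    header = struct.pack(HEADER_FORMAT, BLOCK_ID, int(round(max_level)), len(frames))
    body = b"".join(
        struct.pack(FRAME_FORMAT, int(round(level)), position, code)
        for level, position, code in frames
    )
    return header + body


for n in (1, 4, 8):
    frames = [(80, 6, 2)] * n
    print(n, len(build_block(80, frames)))  # 4 + 4n bytes: 8, 20, 36
```

Under this assumed layout, a block with n iterations is 4 + 4n bytes, so each additional subtraction pass costs 4 bytes. This is the linear trade-off between iteration count and compression rate described in [0065].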

[0066] This completes encoding process B. As is apparent from the above, the present invention compression-codes voice data in three steps:

1. It applies the discrete cosine transform to convert the voice data from time-axis data to frequency-axis data.
2. It separates the result into odd and even components.
3. It fits and subtracts a codebook consisting of frequency patterns.

Because the data is separated this way before the frequency-pattern codebook is applied, the patterns that appear are roughly similar in shape even though their individual levels differ. Consequently only a small codebook needs to be prepared in advance. Selecting the entry that best fits the actual voice data is simple and needs few operations, so the selection also runs faster.
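The transform and the odd/even separation can be sketched in Python as below. This illustration makes two assumptions:

- The patent does not give the DCT normalization, so an orthonormal DCT-II is used.
- "Odd and even components" is read here as the odd-indexed and even-indexed coefficients. The abstract instead speaks of imaginary and real components, so other readings are possible.

A direct O(N²) sum is used for clarity rather than speed.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II: time-axis samples -> frequency-axis coefficients."""
    n = len(x)
    coeffs = []
    for k in range(n):
        total = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        coeffs.append(scale * total)
    return coeffs


def split_even_odd(coeffs):
    """One reading of the odd/even separation: even- and odd-indexed coefficients."""
    return coeffs[0::2], coeffs[1::2]


samples = [math.sin(2 * math.pi * 3 * t / 128) for t in range(128)]  # one 128-sample frame
even, odd = split_even_odd(dct_ii(samples))
print(len(even), len(odd))  # 64 64
```

With orthonormal scaling the transform preserves energy. As a result, levels measured on the frequency axis stay comparable to levels on the time axis.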

[0067] The compression-coded voice data is generated from values obtained during the subtraction process: the maximum level (1), position information (2), code number (3), and number of calculations (4). Its length grows in proportion to the number of subtraction passes. The length of the compression-coded data, that is, the compression rate of the voice data, can therefore be changed freely by setting the number of subtraction passes, and raised as desired. Even when the number of passes is set small to raise the compression rate, the subtraction is referenced to the maximum level of the data. Enough reproducibility is therefore preserved to recognize the content and characteristics of the speech.

[0068] The process of reproducing voice data compression-coded as described above is explained below with reference to the drawings.

[0069] FIG. 11 is a block diagram of a speech decoding apparatus that implements the speech decoding process according to the present invention. In FIG. 11:

- Reference numeral 11 denotes decoding means that extracts the maximum level (1), position information (2), code number (3), and number of calculations (4) from the input compression-coded data.
- Reference numeral 12 denotes a codebook that stores frequency patterns of the same types and number as those used in the speech encoding process, managed under the same code numbers.
- Reference numeral 13 denotes calculation means that reproduces the voice data as frequency-axis data by addition. It works from the maximum level (1), position information (2), code number (3), and number of calculations (4) supplied by the decoding means, and from the frequency patterns supplied by the codebook 12.
- Reference numeral 14 denotes inverse discrete cosine transform means that converts the frequency-axis data output from the calculation means 13 into time-axis data.
- Reference numeral 15 denotes a buffer that outputs the voice data received from the inverse discrete cosine transform means 14.
- Reference numeral 16 denotes a D/A converter that converts the data received from the buffer 15 one sample at a time and outputs it.

[0070] The speech decoding process of the apparatus configured as above is described below with reference to FIGS. 12 to 16.

- FIG. 12 is a top-level flowchart of one embodiment of the voice data decoding process of the present invention.
- FIG. 13 is a flowchart of decoding process A in ST1 of FIG. 12.
- FIG. 14 is a flowchart of decoding process B in ST3 of FIG. 12.
- FIG. 15 is a flowchart of the calculation process (addition) in ST4 of FIG. 12.
- FIG. 16 is a flowchart of the inverse discrete cosine transform process in ST7 of FIG. 12.

[0071] First, the outline of the speech decoding process of this embodiment is described with reference to FIG. 12. When compression-coded data is input, decoding process A is performed (ST1). Decoding process A splits the coded block of FIG. 10(a) into the header information and the frames. It then detects the maximum level (1) and the number of calculations (4) from the header.

[0072] Next, the counter value is set from the number of calculations (4) detected in the header (ST2). In other words, the addition of ST4 is performed the same number of times as the subtraction was performed in the encoding process of FIG. 2.

[0073] Then decoding process B is performed (ST3). Decoding process B detects the relative level (1), position information (2), and code number (3) stored in each frame n of the coded block of FIG. 10(a).

[0074] The process proceeds to ST4, where the calculation process (addition) is performed. The frequency pattern corresponding to the code number (3) from the decoding means 11 is retrieved from the codebook 12. Its peak is multiplied by n based on the relative level (1) from the decoding means 11. As in the encoding process, each frequency pattern in the codebook 12 is stored with its peak set to 1, as shown in FIG. 8(a).

The scaled pattern is then added at the position indicated by the position information (2). As in the encoding process, the voice data within the predetermined time is divided into 128 positions, and the position information (2) indicates which of these positions is meant.
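The addition in ST4 is the mirror image of the encoder's subtraction. A minimal sketch, under these assumptions:

- `add_pattern` is a hypothetical name.
- The position is a 0-based index.
- The pattern contains its peak as a single 1.0.
- Samples falling outside the frame are discarded.

```python
def add_pattern(spectrum, relative_level, position, pattern):
    """FIG. 15, ST3-ST4: scale a peak-normalized pattern and add it at position.

    position is a 0-based index here; samples falling outside the frame are
    dropped (assumption). Returns a new list so earlier results are kept.
    """
    center = pattern.index(1.0)      # where the pattern's peak (height 1) sits
    out = list(spectrum)
    for k, p in enumerate(pattern):
        j = position + k - center    # align the pattern peak with the position
        if 0 <= j < len(out):
            out[j] += relative_level * p   # "multiply by n" then add
    return out


spectrum = [0.0] * 16
spectrum = add_pattern(spectrum, 80, 6, [0.5, 1.0, 0.5])   # first frame
spectrum = add_pattern(spectrum, -30, 12, [1.0])           # second frame, on top
print(spectrum[5:8], spectrum[12])  # [40.0, 80.0, 40.0] -30.0
```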

[0075] When the calculation process is complete, the counter value is decremented (ST5). The sequence ST3 → ST4 → ST5 → ST6 is repeated until the counter value reaches 0 (ST6). On each pass, the relative level (1), position information (2), and code number (3) are read from the next frame. The corresponding frequency pattern is retrieved from the codebook 12, multiplied by n, and added at the position indicated by the position information (2).

[0076] When the counter value reaches 0, data such as that shown in FIG. 9(d) has been generated. An inverse discrete cosine transform converts this data into time-axis data such as that shown in FIG. 9(c) (ST7). The time-axis data is then converted into analog data by the D/A converter 16 and output.
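The decoding loop and ST7 can be combined as follows. The sketch accumulates every frame's scaled pattern into a spectrum, then applies an orthonormal DCT-III, which inverts an orthonormal DCT-II. The following are assumptions:

- the normalization;
- 0-based positions;
- code numbers starting at 1;
- the list-of-patterns codebook format.

The D/A conversion stage is omitted.

```python
import math


def idct(coeffs):
    """Orthonormal DCT-III, the inverse of an orthonormal DCT-II."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = math.sqrt(1 / n) * coeffs[0]
        total += sum(
            math.sqrt(2 / n) * coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(1, n)
        )
        out.append(total)
    return out


def decode_block(frames, codebook, frame_length=128):
    """Rebuild frequency-axis data from (level, position, code) frames, then ST7.

    The loop runs once per frame, i.e. as many times as the count (4) says.
    Code numbers start at 1; positions are 0-based (both assumptions).
    """
    spectrum = [0.0] * frame_length
    for level, position, code in frames:
        pattern = codebook[code - 1]
        center = pattern.index(1.0)
        for k, p in enumerate(pattern):
            j = position + k - center
            if 0 <= j < frame_length:
                spectrum[j] += level * p
    return idct(spectrum)


# A single DC coefficient of sqrt(8) decodes to eight samples of (about) 1.0.
print([round(v, 6) for v in decode_block([(math.sqrt(8), 0, 1)], [[1.0]], 8)])
```

An empty frame list decodes to all zeros. An encoder that stops immediately would produce such a list for a silent segment.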

[0077] This completes the outline of the speech decoding process of this embodiment. Decoding process A of ST1 in FIG. 12 is described in detail below with reference to FIG. 13.

[0078] First, the compression-coded data (coded block) is input to the decoding means 11 (ST1). The header information is extracted from the coded block, and the maximum level (1) and the number of calculations (4) are output (ST2, ST3).

[0079] This completes decoding process A. Decoding process B of ST3 in FIG. 12 is described in detail below with reference to FIG. 14.

[0080] First, the decoding means 11 parses the first frame in the compression-coded data (coded block) (ST1). It outputs the resulting relative level (1), position information (2), and code number (3) (ST2, ST3, ST4). Control then returns to the flowchart of FIG. 12 and proceeds ST4 → ST5 → ST6 → ST3. On returning to decoding process B, the same processing is applied to the second frame of the compression-coded data. This repeats until the counter value reaches 0 in ST6 of FIG. 12.
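Decoding processes A and B amount to reading the header and then one frame per counted iteration. The sketch assumes a hypothetical byte layout that the patent does not specify:

- header: 1-byte block identifier, signed 16-bit maximum level, 1-byte count;
- frames: signed 16-bit level, 8-bit position, 8-bit code.

```python
import struct

HEADER_FORMAT = ">BhB"   # hypothetical: block identifier, maximum level (1), count (4)
FRAME_FORMAT = ">hBB"    # hypothetical: relative level (1), position (2), code (3)


def parse_block(block):
    """Decoding process A (header) followed by decoding process B (frames)."""
    block_id, max_level, count = struct.unpack_from(HEADER_FORMAT, block, 0)
    offset = struct.calcsize(HEADER_FORMAT)
    frame_size = struct.calcsize(FRAME_FORMAT)
    frames = []
    for _ in range(count):           # counter value set from the count (4)
        frames.append(struct.unpack_from(FRAME_FORMAT, block, offset))
        offset += frame_size
    return block_id, max_level, count, frames


block = (struct.pack(HEADER_FORMAT, 0xA5, 80, 2)
         + struct.pack(FRAME_FORMAT, 80, 6, 2)
         + struct.pack(FRAME_FORMAT, -30, 12, 1))
print(parse_block(block))  # (165, 80, 2, [(80, 6, 2), (-30, 12, 1)])
```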

[0081] This completes decoding process B. The calculation process (addition) of ST4 in FIG. 12 is described in detail below with reference to FIG. 15.

[0082] The calculation process proceeds as follows:

1. The calculation means 13 receives the relative level (1), position information (2), and code number (3) from the decoding means 11 (ST1).
2. The frequency pattern corresponding to the code number (3) is retrieved from the codebook 12 (ST2).
3. The pattern's peak is multiplied by n based on the relative level (1) (ST3).
4. The scaled pattern is added with its peak aligned to the position given by the position information (2) (ST4).

When the addition is complete, control returns to the flowchart of FIG. 12. ST5 → ST6 → ST3 → ST4 is repeated until the counter value reaches 0 in ST6 of FIG. 12. On the next pass through ST4, the values obtained from the second frame of the compression-coded data are input, and the new frequency pattern is added on top of the previous result.

[0083] This completes the calculation process (addition). As is apparent from the above, the present invention reproduces compressed voice data as follows:

1. It detects the relative level (1), position information (2), code number (3), and number of calculations (4) from the compression-coded data.
2. It selects and adds a codebook entry using the information from the first frame.
3. It processes each following frame the same way, as many times as the number of calculations (4) specifies.

This decoding process reliably reproduces the compression-coded voice data.

[0084]

EFFECTS OF THE INVENTION
As is apparent from the above, the present invention compression-codes voice data in three steps:

1. It converts the voice data from time-axis data to frequency-axis data by the discrete cosine transform.
2. It separates the transformed data into odd and even components.
3. It fits and subtracts a codebook of frequency patterns from the resulting frequency-axis data.

Separating the data into odd and even components increases the correlation within each of the odd and even regions, so roughly similar patterns appear in the frequency-axis data. If the codebook is then built from frequency patterns, the patterns that appear are similar in shape even though their levels differ. The frequency-axis data can therefore still be matched even when the number of codebook entries prepared in advance is greatly reduced. Selecting the entry that best fits the actual voice data is simple and takes few processing steps, so the selection also runs faster.

[0085] The number of subtraction passes can also be set freely. A large number yields compressed voice data of higher sound quality. However, it also means more calculations, so the compression-coded data becomes longer and the compression rate falls. The compression rate of the voice data can thus be changed as desired.

[0086] A large number of subtraction passes may be set in advance. Even then, if the frequency-axis data and the selected frequency pattern have almost the same shape, the subtraction stops as soon as the maximum level of the actual data falls below the preset minimum level, even if the preset number of passes has not been reached. When the actual frequency-axis data and the frequency patterns are close, fewer passes than the preset number are performed. When they are not close, the subtraction continues up to the preset number. Consequently, even with a large preset number, processing stops early whenever the data and the selected pattern nearly coincide. The compression ratio can therefore be raised while the reproducibility of the voice data is preserved.
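Paragraphs [0085] and [0086] describe a loop with two stopping rules: a preset upper bound on the number of subtractions, and a minimum level below which subtraction stops early. A minimal Python sketch of that loop follows, under the following assumptions made for illustration:

- The codebook contents are invented for the example.
- Every pattern has odd length with a peak of 1.0 at its centre, so scaling by the detected peak value plays the role of the factor n in claim 6.
- The best pattern is chosen by least squares over a window around the peak.

The function names are hypothetical. In the scheme described, such a loop would be applied to the odd and the even coefficient sequences separately.

```python
def pattern_value(pattern, offset):
    """Pattern value `offset` bins from its centre (peak); 0.0 outside it."""
    j = offset + len(pattern) // 2
    return pattern[j] if 0 <= j < len(pattern) else 0.0


def encode_frame(spectrum, codebook, max_iters, min_level):
    """Peak-driven subtraction of codebook patterns (illustrative sketch).

    Assumes each pattern has odd length and a peak of 1.0 at its centre, so
    scaling a pattern by the detected peak value reproduces that peak exactly.
    Returns the list of (level, position, code) records and the final residual.
    """
    residual = [float(v) for v in spectrum]
    half_width = max(len(p) // 2 for p in codebook)
    records = []
    for _ in range(max_iters):  # preset upper bound on the number of subtractions
        # Maximum level (largest magnitude) and its position.
        pos = max(range(len(residual)), key=lambda i: abs(residual[i]))
        level = residual[pos]
        if abs(level) < min_level:  # stop early: residual is below the minimum level
            break
        window = range(max(0, pos - half_width), min(len(residual), pos + half_width + 1))

        def fit_error(code):
            # Squared error between the data around the peak and the scaled, aligned pattern.
            return sum((residual[i] - level * pattern_value(codebook[code], i - pos)) ** 2
                       for i in window)

        code = min(range(len(codebook)), key=fit_error)
        for i in window:  # subtract the scaled, peak-aligned pattern
            residual[i] -= level * pattern_value(codebook[code], i - pos)
        records.append((level, pos, code))
    return records, residual


if __name__ == "__main__":
    codebook = [[0.0, 1.0, 0.0], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0, 0.5, 0.25]]
    records, _ = encode_frame([0, 2, 4, 2, 0, 0, 3, 0], codebook, max_iters=10, min_level=0.5)
    print(records)  # [(4.0, 2, 1), (3.0, 6, 0)]
    print(encode_frame([0.0] * 8, codebook, max_iters=10, min_level=0.5)[0])  # []
```

With an all-zero (silent) frame, the first peak is already below `min_level`, so the same code path returns no records. This is consistent with the patent's statement that silence needs no special-case processing.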

[0087] When a silent state is detected, no special processing reserved for silence is performed. The compression-coded data are generated by the same processing used when speech is present, which reduces the amount of software required. Furthermore, when no speech is present, the amount of compressed information decreases, so the compression ratio of the voice data rises accordingly.

[0088] The number of subtraction passes may be set small to obtain a high compression ratio. Even then, the codebook entries are fitted relative to the maximum level of the actual data, which preserves enough reproducibility for the content and characteristics of the voice to remain recognizable.
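The fitting relative to the maximum level (see also claim 6) can be written as a formula. The notation is introduced here for illustration and is not taken from the patent:

- X[k] is the frequency-axis data.
- p is the position of its maximum level.
- c_m is the selected codebook pattern, with its own maximum at index q_m, treated as zero outside its support.
- n is the scale factor.

$$
p = \arg\max_k \lvert X[k] \rvert, \qquad
n = \frac{X[p]}{c_m[q_m]}, \qquad
X'[k] = X[k] - n\, c_m[k - p + q_m]
$$

At k = p this gives X'[p] = X[p] - n c_m[q_m] = 0. Each pass therefore removes the current peak, and the next pass is driven by the largest remaining one.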

[0089] The compression-coded data obtained by the subtraction process consist of the following, all obtained during the subtraction process:

(1) the maximum level
(2) the position information
(3) the code number
(4) the number of calculations

The data length of the compression-coded data grows in proportion to the number of subtraction passes. The data length, and hence the compression ratio of the voice data, can therefore be changed freely by setting the number of passes.
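The frame layout is shown in FIG. 10, which is not reproduced here, and no field widths are given in this section. The Python sketch below therefore assumes a hypothetical layout: a one-byte count of subtractions, then, for each subtraction, a 32-bit float level, a 16-bit position and an 8-bit code number. The layout and the names `pack_frame` and `unpack_frame` are illustrative only. The sketch shows why the encoded length grows linearly with the number of passes: under these assumptions a frame is 1 + 7 × count bytes.

```python
import struct

# Hypothetical field widths; the patent text here does not specify them.
HEADER = struct.Struct(">B")    # (4) number of subtractions, at most 255
RECORD = struct.Struct(">fHB")  # (1) maximum level (float32, lossy), (2) position, (3) code number


def pack_frame(records):
    """Serialize (level, position, code) records into one frame."""
    out = bytearray(HEADER.pack(len(records)))
    for level, pos, code in records:
        out += RECORD.pack(level, pos, code)
    return bytes(out)


def unpack_frame(data):
    """Inverse of pack_frame: read the count, then that many records."""
    (count,) = HEADER.unpack_from(data, 0)
    offset = HEADER.size
    records = []
    for _ in range(count):
        records.append(RECORD.unpack_from(data, offset))
        offset += RECORD.size
    return records


if __name__ == "__main__":
    for n in (0, 1, 5):
        print(n, len(pack_frame([(4.0, 2, 1)] * n)))  # prints 0 1, then 1 8, then 5 36
```

A silent frame with zero passes reduces to the one-byte count. This matches the observation in [0087] that the compressed information shrinks when there is no speech.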

[Brief description of drawings]

FIG. 1 is a block diagram of the voice encoding process according to the present invention.

FIG. 2 is a top-level flowchart showing an embodiment of the voice data encoding process of the present invention.

FIG. 3 is a flowchart showing the maximum level detection process in ST4 of FIG. 2.

FIG. 4 is a flowchart showing the optimum codebook selection process in ST6 of FIG. 2.

FIG. 5 is a flowchart showing the calculation process (subtraction) in ST7 of FIG. 2.

FIG. 6 is a flowchart showing encoding process A in ST8 of FIG. 2.

FIG. 7 is a flowchart showing encoding process B in ST11 of FIG. 2.

FIG. 8 is a diagram showing representative frequency patterns stored in the codebook 5 of FIG. 1.

FIG. 9 is a diagram showing the state of the data over a predetermined time after the voice data have been converted by a discrete cosine transform from data on the time axis into data on the frequency axis.

FIG. 10 is a data structure diagram showing the structure of the compression-coded data.

FIG. 11 is a block diagram for realizing the voice decoding process according to the present invention.

FIG. 12 is a top-level flowchart showing an embodiment of the voice data decoding process of the present invention.

FIG. 13 is a flowchart showing decoding process A in ST1 of FIG. 12.

FIG. 14 is a flowchart showing decoding process B in ST3 of FIG. 12.

FIG. 15 is a flowchart showing the calculation process (addition) in ST4 of FIG. 12.

[Explanation of symbols]

3 Discrete cosine transform means
4 Maximum level detection means
5 Codebook
6 Optimum codebook selection means
7 Calculation means
8 Encoding means

Claims

1. A speech encoding method comprising: a first process of performing a discrete cosine transform on voice data in units of a predetermined time; a second process of separating the data obtained by the discrete cosine transform into an odd component and an even component; a third process of detecting the maximum level of the separated data and the position information of this maximum level; a fourth process of selecting, from a codebook composed of frequency patterns, the codebook that approximates the data around the maximum level in the data obtained by the discrete cosine transform; a fifth process of subtracting the codebook from the data obtained by the discrete cosine transform; and a sixth process of generating encoded data of the voice data based on the maximum level, the position information, information indicating the codebook used for the subtraction, and the number of times the fifth process has been performed on the voice data within the predetermined time.

2. A voice encoding device comprising: means for performing a discrete cosine transform on voice data in units of a predetermined time; means for separating the transformed data into an odd component and an even component; means for detecting the maximum level of the separated data and the position information of this maximum level; means for storing a codebook composed of frequency patterns; means for selecting the codebook that approximates the data around the maximum level in the data obtained by the discrete cosine transform; means for subtracting the codebook from the data around the maximum level; and means for generating encoded data of the voice data based on the maximum level, the position information, information indicating the codebook used for the subtraction, and the number of subtractions performed on the voice data within the predetermined time.

3. A voice encoding device comprising: means for performing a discrete cosine transform on voice data in units of a predetermined time; means for separating the transformed data into an odd component and an even component; means for detecting the maximum level of the separated data and the position information of this maximum level; means for storing a codebook composed of frequency patterns; means for selecting the codebook that approximates the data around the maximum level in the data obtained by the discrete cosine transform; means for subtracting the codebook from the data around the maximum level; means for arbitrarily setting the number of subtractions; and means for generating encoded data of the voice data based on the maximum level, the position information, information indicating the codebook used for the subtraction, and the number of subtractions performed on the voice data within the predetermined time.

4. A speech coding apparatus comprising: means for performing a discrete cosine transform on voice data in units of a predetermined time; means for separating the transformed data into an odd component and an even component; means for detecting the maximum level of the separated data and the position information of this maximum level; means for storing a codebook composed of frequency patterns; means for selecting the codebook that approximates the data around the maximum level in the data obtained by the discrete cosine transform; means for subtracting the codebook from the data around the maximum level; means for arbitrarily setting a minimum level for the subtraction with respect to the level of the data obtained by the discrete cosine transform; and means for generating encoded data of the voice data based on the maximum level, the position information, information indicating the codebook used for the subtraction, and the number of subtractions performed on the voice data within the predetermined time.

5. A voice coding method comprising: a first process of performing a discrete cosine transform on voice data in units of a predetermined time; a second process of separating the data obtained by the discrete cosine transform into an odd component and an even component; a third process of detecting the maximum level of the separated data and the position information of this maximum level; a fourth process of selecting, from a codebook composed of frequency patterns, the codebook that approximates the data around the maximum level in the data obtained by the discrete cosine transform; a fifth process of subtracting the codebook from the data around the maximum level; and a sixth process of generating encoded data of the voice data based on the maximum level, the position information, information indicating the codebook used for the subtraction, and the number of times the fifth process has been performed on the voice data within the predetermined time; wherein, even in a silent state, encoded data of the silent data are generated by the sixth process.

6. The speech coding method according to claim 1, wherein, when the fifth process is performed, the codebook obtained by the fourth process is multiplied by n in accordance with the maximum level of the data obtained by the discrete cosine transform, the position of the maximum level of the codebook is aligned with the position of the maximum level of that data, and the codebook is subtracted from the data obtained by the discrete cosine transform.

7. The voice encoding method according to claim 1, wherein, when the sixth process is performed, the data length of the encoded data is varied according to the number of times the fifth process has been performed.

8. A speech decoding method comprising: a first process of inputting encoded data of voice data generated by the voice encoding method according to claim 1; a second process of extracting, from the encoded data, the maximum level, the position information of this maximum level, and information indicating the codebook, composed of frequency patterns, used in the encoding process; and a third process of selecting a codebook based on the information indicating the codebook and adding the codebook, scaled to the maximum level, at the position indicated by the position information, thereby reproducing the voice data from the encoded data.

9. A speech decoding apparatus comprising: means for inputting encoded data of voice data generated by the voice encoding device according to claim 2; means for extracting from the encoded data the number of operations performed on the voice data within the predetermined time, and for extracting, from the frames generated according to that number of operations, the maximum level, the position information of the maximum level, and information indicating the codebook, composed of frequency patterns, used in the encoding process; means for storing a codebook composed of frequency patterns; means for selecting a codebook based on the information indicating the codebook and adding the codebook, scaled to the maximum level, at the position indicated by the position information; and means for repeating the addition of the codebook as many times as the extracted number of operations, thereby reproducing voice data from the encoded data.
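The decoding side of claims 8 and 9 adds each selected codebook pattern back at its recorded position and level, once per recorded operation. It can be sketched in Python as follows, under these assumptions made for illustration:

- Patterns have odd length with a peak of 1.0 at their centre.
- The odd/even separation splits coefficients by index parity.
- The forward transform is an orthonormal DCT-II, inverted here by the corresponding orthonormal DCT-III.

None of these details are confirmed by the claims, and all function names are hypothetical.

```python
import math


def pattern_value(pattern, offset):
    """Pattern value `offset` bins from its centre (peak); 0.0 outside it."""
    j = offset + len(pattern) // 2
    return pattern[j] if 0 <= j < len(pattern) else 0.0


def decode_spectrum(records, codebook, n_bins):
    """Rebuild one coefficient sequence by adding back each scaled, aligned pattern."""
    spectrum = [0.0] * n_bins
    for level, pos, code in records:  # one addition per recorded operation
        pattern = codebook[code]
        half = len(pattern) // 2
        for i in range(max(0, pos - half), min(n_bins, pos + half + 1)):
            spectrum[i] += level * pattern_value(pattern, i - pos)
    return spectrum


def merge_odd_even(odd, even):
    """Re-interleave coefficients that were split by index parity."""
    merged = [0.0] * (len(odd) + len(even))
    merged[0::2] = even
    merged[1::2] = odd
    return merged


def idct2(coeffs):
    """Inverse of the orthonormal DCT-II (an orthonormal DCT-III)."""
    n_len = len(coeffs)
    samples = []
    for n in range(n_len):
        acc = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n_len) if k == 0 else math.sqrt(2.0 / n_len)
            acc += scale * c * math.cos(math.pi * (2 * n + 1) * k / (2 * n_len))
        samples.append(acc)
    return samples


if __name__ == "__main__":
    codebook = [[0.0, 1.0, 0.0], [0.5, 1.0, 0.5]]
    print(decode_spectrum([(4.0, 2, 1)], codebook, 6))  # [0.0, 2.0, 4.0, 2.0, 0.0, 0.0]
```

A full frame would be rebuilt in three steps. First, call `decode_spectrum` on the odd and even record lists separately. Then re-interleave the two results with `merge_odd_even`. Finally, return to the time axis with `idct2`.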