JP4736699B2 - Audio signal compression apparatus, audio signal restoration apparatus, audio signal compression method, audio signal restoration method, and program

Info

Publication number: JP4736699B2
Application number: JP2005299346A
Authority: JP
Inventors: 寧佐藤
Original assignee: Kenwood KK
Current assignee: Kenwood KK
Priority date: 2005-10-13
Filing date: 2005-10-13
Publication date: 2011-07-27
Anticipated expiration: 2025-10-13
Also published as: JP2007108440A

Description

The present invention relates to an audio signal compression device, an audio signal restoration device, an audio signal compression method, an audio signal restoration method, and a program.

In recent years, speech synthesis techniques that convert text data and the like into speech have come into use in fields such as car navigation.
In speech synthesis, for example, the words and phrases contained in the sentence represented by the text data, and the dependency relations between the phrases, are identified, and the reading of the sentence is determined from them. Then, based on the phonetic character string representing that reading, the waveforms, durations, and pitch (fundamental frequency) patterns of the phonemes making up the speech are determined. From these results, the waveform of the speech representing the entire sentence, written in mixed kanji and kana, is determined, and speech having that waveform is output.

In the speech synthesis technique described above, the speech waveform is identified by searching a speech dictionary, which is a collection of audio data representing speech waveforms. For the synthesized speech to sound natural, the speech dictionary must hold an enormous amount of audio data.

In addition, when this technique is applied to a device that must be small, such as a car navigation device, the storage device holding the speech dictionary generally has to be small as well. Reducing the physical size of a storage device generally means reducing its storage capacity too.

Therefore, so that a phoneme dictionary containing a sufficient amount of audio data can be stored even in a storage device with a small capacity, the audio data has been compressed to reduce the data size of each item of audio data (see, for example, Patent Document 1).
Patent Document 1: Japanese Translation of PCT Application (Tokuhyo) No. 2000-502539

However, consider compressing audio data by entropy coding (specifically, arithmetic coding, Huffman coding, and the like), a technique that compresses data by exploiting its regularity. Audio data representing speech uttered by a person shows a certain degree of regularity, so it can be compressed efficiently. Audio data that contains components not originating from human speech (for example, components representing the sound of a musical instrument) does not necessarily have clear periodicity as a whole, so its compression efficiency has been low.

Pitch fluctuation has also been a problem when entropy-coding audio data representing human speech. The pitch is easily affected by a person's emotions and state of mind. Although it is a period that can be regarded as constant to some extent, in reality it fluctuates subtly. Consequently, when the same speaker utters the same word (phoneme) over several pitch periods, the pitch intervals are usually not constant. The waveform representing a single phoneme therefore often lacks exact regularity, and this has often lowered the efficiency of compression by entropy coding.

The present invention has been made in view of the above circumstances. Its object is to provide an audio signal compression device, an audio signal compression method, and a program that make it possible to efficiently compress data containing components representing human speech, and to provide an audio signal restoration device, an audio signal restoration method, and a program for restoring data compressed by such an audio signal compression device and audio signal compression method.

To achieve the above object, an audio signal compression device according to a first aspect of the present invention comprises:
subband extraction means for generating a subband signal that represents the temporal change in intensity of the fundamental frequency component and the harmonic components of an audio signal to be compressed, the audio signal representing a speech waveform;
component separation means for, upon determining that the amplitude of a pitch signal obtained by filtering the audio signal has not reached a predetermined amount, separating the subband signal into a continuous component, whose periodicity meets a predetermined criterion, and a random component, which corresponds to the subband signal with the continuous component removed, and, upon determining that the amplitude of the pitch signal has reached the predetermined amount, treating the subband signal as the continuous component; and
encoding means for applying entropy coding or linear predictive coding to the continuous component.

An audio signal compression device according to a second aspect of the present invention comprises:
audio signal processing means for acquiring an audio signal to be compressed, the audio signal representing a speech waveform, and processing the audio signal into a pitch waveform signal by making substantially identical the phases of the sections obtained when the audio signal is divided into a plurality of sections each corresponding to a unit pitch of the speech;
subband extraction means for generating a subband signal that represents the temporal change in intensity of the fundamental frequency component and the harmonic components of the pitch waveform signal;
component separation means for, upon determining that the amplitude of a pitch signal obtained by filtering the audio signal has not reached a predetermined amount, separating the subband signal into a continuous component, whose periodicity meets a predetermined criterion, and a random component, which corresponds to the subband signal with the continuous component removed, and, upon determining that the amplitude of the pitch signal has reached the predetermined amount, treating the subband signal as the continuous component; and
encoding means for applying entropy coding or linear predictive coding to the continuous component.

The encoding means may apply entropy coding to the result of nonlinearly quantizing the continuous component and/or the result of nonlinearly quantizing the random component.

The encoding means may generate data indicating the quantization characteristic of the nonlinear quantization.

The encoding means may determine the quantization characteristic of the nonlinear quantization on the basis of the data amount of continuous components and/or random components that were entropy-coded in the past, and perform the nonlinear quantization so as to conform to the determined quantization characteristic.

An audio signal restoration device according to a third aspect of the present invention comprises:
decoding means for acquiring an input signal and restoring a continuous component by decoding the input signal, the input signal corresponding to the result of extracting, from a subband signal that represents the temporal change in intensity of the fundamental frequency component and the harmonic components of an audio signal to be compressed representing a speech waveform, the continuous component whose periodicity meets a predetermined criterion, and applying entropy coding or linear predictive coding to it;
a subband signal restoration unit that, when a random component corresponding to the subband signal with the restored continuous component removed exists, acquires the random component and restores the subband signal by adding the random component to the continuous component, and, when no such random component exists, treats the restored continuous component as the subband signal; and
audio signal restoration means for restoring the audio signal to be compressed on the basis of the restored subband signal.

An audio signal compression method according to a fourth aspect of the present invention is executed by an audio signal compression device having a processor, wherein:
the processor generates a subband signal that represents the temporal change in intensity of the fundamental frequency component and the harmonic components of an audio signal to be compressed, the audio signal representing a speech waveform;
the processor, upon determining that the amplitude of a pitch signal obtained by filtering the audio signal has not reached a predetermined amount, separates the subband signal into a continuous component, whose periodicity meets a predetermined criterion, and a random component, which corresponds to the subband signal with the continuous component removed, and, upon determining that the amplitude of the pitch signal has reached the predetermined amount, treats the subband signal as the continuous component; and
the processor applies entropy coding or linear predictive coding to the continuous component.

An audio signal compression method according to a fifth aspect of the present invention is executed by an audio signal compression device having a processor, wherein:
the processor acquires an audio signal to be compressed, the audio signal representing a speech waveform, and processes the audio signal into a pitch waveform signal by making substantially identical the phases of the sections obtained when the audio signal is divided into a plurality of sections each corresponding to a unit pitch of the speech;
the processor generates a subband signal that represents the temporal change in intensity of the fundamental frequency component and the harmonic components of the pitch waveform signal;
the processor, upon determining that the amplitude of a pitch signal obtained by filtering the audio signal has not reached a predetermined amount, separates the subband signal into a continuous component, whose periodicity meets a predetermined criterion, and a random component, which corresponds to the subband signal with the continuous component removed, and, upon determining that the amplitude of the pitch signal has reached a predetermined amount, treats the subband signal as the continuous component; and
the processor applies entropy coding or linear predictive coding to the continuous component.

An audio signal restoration method according to a sixth aspect of the present invention is executed by an audio signal restoration device having a processor, wherein:
the processor acquires an input signal and restores a continuous component by decoding the input signal, the input signal corresponding to the result of extracting, from a subband signal that represents the temporal change in intensity of the fundamental frequency component and the harmonic components of an audio signal to be compressed representing a speech waveform, the continuous component whose periodicity meets a predetermined criterion, and applying entropy coding or linear predictive coding to it;
the processor, when a random component corresponding to the subband signal with the restored continuous component removed exists, acquires the random component and restores the subband signal by adding the random component to the continuous component, and, when no such random component exists, treats the restored continuous component as the subband signal; and
the processor restores the audio signal to be compressed on the basis of the restored subband signal.

A program according to a seventh aspect of the present invention causes a computer to function as:
subband extraction means for generating a subband signal that represents the temporal change in intensity of the fundamental frequency component and the harmonic components of an audio signal to be compressed, the audio signal representing a speech waveform;
component separation means for, upon determining that the amplitude of a pitch signal obtained by filtering the audio signal has not reached a predetermined amount, separating the subband signal into a continuous component, whose periodicity meets a predetermined criterion, and a random component, which corresponds to the subband signal with the continuous component removed, and, upon determining that the amplitude of the pitch signal has reached the predetermined amount, treating the subband signal as the continuous component; and
encoding means for applying entropy coding or linear predictive coding to the continuous component.

A program according to an eighth aspect of the present invention causes a computer to function as:
audio signal processing means for acquiring an audio signal to be compressed, the audio signal representing a speech waveform, and processing the audio signal into a pitch waveform signal by making substantially identical the phases of the sections obtained when the audio signal is divided into a plurality of sections each corresponding to a unit pitch of the speech;
subband extraction means for generating a subband signal that represents the temporal change in intensity of the fundamental frequency component and the harmonic components of the pitch waveform signal;
component separation means for, upon determining that the amplitude of a pitch signal obtained by filtering the audio signal has not reached a predetermined amount, separating the subband signal into a continuous component, whose periodicity meets a predetermined criterion, and a random component, which corresponds to the subband signal with the continuous component removed, and, upon determining that the amplitude of the pitch signal has reached the predetermined amount, treating the subband signal as the continuous component; and
encoding means for applying entropy coding or linear predictive coding to the continuous component.

A program according to a ninth aspect of the present invention causes a computer to function as:
decoding means for acquiring an input signal and restoring a continuous component by decoding the input signal, the input signal corresponding to the result of extracting, from a subband signal that represents the temporal change in intensity of the fundamental frequency component and the harmonic components of an audio signal to be compressed representing a speech waveform, the continuous component whose periodicity meets a predetermined criterion, and applying entropy coding or linear predictive coding to it;
a subband signal restoration unit that, when a random component corresponding to the subband signal with the restored continuous component removed exists, acquires the random component and restores the subband signal by adding the random component to the continuous component, and, when no such random component exists, treats the restored continuous component as the subband signal; and
audio signal restoration means for restoring the audio signal to be compressed on the basis of the restored subband signal.

According to the present invention, an audio signal compression device, an audio signal compression method, and a program that make it possible to efficiently compress data containing components representing human speech are realized, as are an audio data restoration device, an audio data restoration method, and a program for restoring data compressed by such an audio signal compression device and audio signal compression method.

Embodiments of the present invention are described below with reference to the drawings.
(First Embodiment)
FIG. 1 is a diagram showing the configuration of an audio data compression system according to the first embodiment of the present invention. As shown, the audio data compression system comprises a recording medium drive device SMD (a flexible disk drive, a CD-ROM drive, or the like) that reads data recorded on a recording medium (for example, a flexible disk or a CD-R (Compact Disc-Recordable)), and a computer C1 connected to the recording medium drive device SMD.

As shown, the computer C1 comprises a processor consisting of a CPU (Central Processing Unit), a DSP (Digital Signal Processor), or the like; volatile memory such as RAM (Random Access Memory); non-volatile memory such as a hard disk device; an input unit such as a keyboard; a display unit such as a liquid crystal display; and a serial communication control unit, consisting of a USB (Universal Serial Bus) interface circuit or the like, that controls serial communication with external devices.

The computer C1 stores an audio data compression program in advance and performs the processing described below by executing this program.

(First Embodiment: Operation)
Next, the operation of this audio data compression system is described with reference to FIGS. 2 and 3, which show the flow of operation of the audio data compression system of FIG. 1.

When the user sets a recording medium holding audio data that represents a speech waveform in the recording medium drive device SMD and instructs the computer C1 to start the audio data compression program, the computer C1 begins the processing of the audio data compression program.

First, the computer C1 reads the audio data from the recording medium via the recording medium drive device SMD (FIG. 2, step S101). The audio data is assumed to be, for example, a PCM (Pulse Code Modulation) digital signal representing speech sampled at a constant period sufficiently shorter than the pitch of the speech.

Next, the computer C1 filters the audio data read from the recording medium to generate filtered audio data (a pitch signal) (step S102). The pitch signal is assumed to consist of digital data whose sampling interval is substantially the same as that of the audio data.

The computer C1 determines the characteristics of the filtering used to generate the pitch signal by feedback processing based on the pitch length, described below, and on the times at which the instantaneous value of the pitch signal becomes 0 (the zero-crossing times).

Specifically, the computer C1 identifies the fundamental frequency of the speech represented by the audio data by applying, for example, cepstrum analysis or analysis based on the autocorrelation function to the audio data it has read, and obtains the absolute value of the reciprocal of this fundamental frequency (that is, the pitch length) (step S103). (Alternatively, the computer C1 may identify two fundamental frequencies by performing both cepstrum analysis and autocorrelation-based analysis, and take the average of the absolute values of their reciprocals as the pitch length.)

In the cepstrum analysis, the intensity of the audio data is first converted to a value substantially equal to the logarithm of the original value (the base of the logarithm is arbitrary), and the spectrum of the converted audio data (that is, the cepstrum) is obtained by the fast Fourier transform (or by any other method that generates data representing the Fourier transform of a discrete variable). The lowest of the frequencies at which this cepstrum has a local maximum is then taken as the fundamental frequency.
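As a rough illustration of cepstrum-based pitch estimation, the following sketch uses the conventional real cepstrum (inverse transform of the log magnitude spectrum) and returns the pitch length in samples. This conventional definition may differ in detail from the procedure worded above, and the function names, the search range `min_lag` to `max_lag`, the offset `eps`, and the impulse-train test signal are all assumptions made for the example.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (the inverse is scaled by 1/N)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def cepstrum_pitch_length(samples, min_lag, max_lag, eps=1e-9):
    """Return the lag (in samples) of the strongest cepstral peak in [min_lag, max_lag]."""
    spectrum = dft(samples)
    # eps (an assumption) keeps the logarithm finite for empty frequency bins.
    log_mag = [math.log(abs(v) + eps) for v in spectrum]
    cep = [v.real for v in dft(log_mag, inverse=True)]
    return max(range(min_lag, max_lag + 1), key=lambda q: cep[q])


# Hypothetical test signal: an impulse train with a period of 40 samples.
period = 40
signal = [1.0 if t % period == 0 else 0.0 for t in range(240)]
pitch_length = cepstrum_pitch_length(signal, min_lag=20, max_lag=60)
```

Dividing the returned lag by the sampling frequency gives the pitch length in seconds, and its reciprocal gives the fundamental frequency.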

In the analysis based on the autocorrelation function, the autocorrelation function r(l) given by the right-hand side of Formula 1 is first determined from the audio data. Then, among the frequencies at which the function obtained by Fourier-transforming r(l) (the periodogram) has a local maximum, the lowest one exceeding a predetermined lower limit is taken as the fundamental frequency.
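The autocorrelation-based analysis can be sketched as follows. Formula 1 is not reproduced in this text, so the biased estimator used for r(l) is an assumption, as are the function names, the `rel_height` threshold (added only to ignore weak spectral ripples), and the synthetic two-harmonic test signal.

```python
import cmath
import math


def autocorrelation(x):
    """Biased estimate r(l) = (1/N) * sum_t x[t + l] * x[t] (assumed form)."""
    n = len(x)
    return [sum(x[t + l] * x[t] for t in range(n - l)) / n for l in range(n)]


def fundamental_from_autocorrelation(x, sample_rate, min_freq, rel_height=0.5):
    """Lowest periodogram peak above min_freq; rel_height is an added assumption."""
    n = len(x)
    r = autocorrelation(x)
    half = n // 2
    mag = []
    for k in range(half):
        acc = sum(r[l] * cmath.exp(-2j * math.pi * k * l / n) for l in range(n))
        mag.append(abs(acc))
    floor = rel_height * max(mag[1:])
    for k in range(1, half - 1):
        freq = k * sample_rate / n
        is_peak = mag[k] > mag[k - 1] and mag[k] >= mag[k + 1]
        if freq > min_freq and is_peak and mag[k] >= floor:
            return freq
    return None


# Hypothetical test signal: 200 Hz fundamental plus one harmonic at 8 kHz sampling.
sample_rate = 8000
x = [math.sin(2 * math.pi * t / 40) + 0.5 * math.sin(2 * math.pi * t / 20)
     for t in range(400)]
f0 = fundamental_from_autocorrelation(x, sample_rate, min_freq=50.0)
pitch_length_seconds = 1.0 / f0
```

When both analyses are run, the two resulting pitch lengths can simply be averaged, as the alternative in step S103 suggests.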

Meanwhile, the computer C1 identifies the times at which the pitch signal crosses zero (step S104). It then determines whether the zero-crossing period of the pitch signal and the pitch length differ from each other by a predetermined amount or more (step S105). If they do not, the filtering is performed with the characteristics of a band-pass filter whose center frequency is the reciprocal of the zero-crossing period (step S106). If they do, the filtering is performed with the characteristics of a band-pass filter whose center frequency is the reciprocal of the pitch length (step S107). In either case, the pass band is preferably narrow enough that its upper limit always stays within twice the fundamental frequency of the speech represented by the audio data.
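The selection of the band-pass center frequency in steps S104 to S107 can be sketched as below. The function names, the use of upward zero crossings only, the averaging of their spacing, and the `max_difference` parameter standing in for the "predetermined amount" are assumptions for illustration. The filter itself is not shown.

```python
import math


def zero_cross_period(pitch_signal, sample_rate):
    """Average spacing (in seconds) between upward zero crossings of the pitch signal."""
    crossings = [t for t in range(1, len(pitch_signal))
                 if pitch_signal[t - 1] < 0.0 <= pitch_signal[t]]
    if len(crossings) < 2:
        return None
    return (crossings[-1] - crossings[0]) / (len(crossings) - 1) / sample_rate


def choose_center_frequency(pitch_length, zc_period, max_difference):
    """Use 1/zc_period when it agrees with the pitch length, otherwise 1/pitch_length."""
    if zc_period is not None and abs(zc_period - pitch_length) < max_difference:
        return 1.0 / zc_period   # step S106
    return 1.0 / pitch_length    # step S107


# Hypothetical pitch signal: a 200 Hz sinusoid sampled at 8 kHz.
pitch_signal = [math.sin(2 * math.pi * (t + 0.5) / 40) for t in range(200)]
zc = zero_cross_period(pitch_signal, 8000)
center = choose_center_frequency(0.005, zc, max_difference=0.001)
```

Because the pitch signal is itself the output of this filter, the choice is re-evaluated as new zero crossings become available, which is the feedback the text describes.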

Next, the computer C1 divides the audio data read from the recording medium at the times at which a boundary of the unit period (for example, one period) of the generated pitch signal arrives (specifically, the times at which the pitch signal crosses zero) (step S108). For each resulting section, it computes the correlation between the pitch signal in that section and versions of the audio data in that section with variously changed phase, and takes the phase that yields the highest correlation as the phase of the audio data in that section (step S109). It then phase-shifts each section of the audio data so that all sections have substantially the same phase (step S110).

Specifically, for each section, the computer C1 computes, for example, the value cor given by the right-hand side of Formula 2 for various values of φ, which represents the phase (φ being an integer of 0 or more). It then takes the value Ψ of φ that maximizes cor as the value representing the phase of the audio data in that section. This determines, for the section, the phase at which the correlation with the pitch signal is highest. The computer C1 then phase-shifts the audio data in the section by (−Ψ).
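The phase search of steps S109 and S110 can be sketched as follows. Formula 2 is not reproduced in this text, so the circular-shift correlation used here, its sign convention, and the function names are assumptions. The sketch finds the shift that best lines a section up with the pitch signal and applies it.

```python
import math


def shifted(section, phi):
    """Circularly delay the section by phi samples."""
    n = len(section)
    return [section[(i - phi) % n] for i in range(n)]


def best_phase(section, pitch_section):
    """Return the shift phi in [0, n) that maximizes correlation with the pitch signal."""
    def cor(phi):
        return sum(a * b for a, b in zip(shifted(section, phi), pitch_section))
    return max(range(len(section)), key=cor)


def align_section(section, pitch_section):
    """Shift the section so that it correlates best with the pitch signal."""
    return shifted(section, best_phase(section, pitch_section))


# Hypothetical data: the audio section leads the pitch signal by 3 samples.
pitch_section = [math.sin(2 * math.pi * i / 16) for i in range(16)]
audio_section = [math.sin(2 * math.pi * (i + 3) / 16) for i in range(16)]
psi = best_phase(audio_section, pitch_section)
aligned = align_section(audio_section, pitch_section)
```

Applying `align_section` to every section gives all sections a common phase reference, which is what removes the effect of pitch fluctuation between neighboring sections.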

The data obtained by phase-shifting the audio data as described above is the pitch waveform data. FIG. 4(c) shows an example of the waveform represented by pitch waveform data. In the waveform of the audio data before phase shifting, shown in FIG. 4(a), the two sections labeled "#1" and "#2" have different phases because of pitch fluctuation, as shown in FIG. 4(b). In contrast, sections #1 and #2 of the waveform represented by the phase-shifted audio data (that is, the pitch waveform data) have matching phases, the effect of pitch fluctuation having been removed, as shown in FIG. 4(c). In addition, as shown in FIG. 4(a), the value at the start of each section is close to 0.

The length of each section in time is preferably about one pitch period. The longer the section, the more samples it contains and the larger the pitch waveform data becomes, or else the sampling interval increases and the speech represented by the pitch waveform data becomes inaccurate.

Next, the computer C1 interpolates the pitch waveform data (step S111). That is, it generates interpolation data representing values that interpolate between the samples of the pitch waveform data and adds this data to the pitch waveform data, thereby generating the interpolated pitch waveform data.

Next, the computer C1 resamples each section of the interpolated pitch waveform data. It also generates sample number data, which indicates the original number of samples in each section (step S112). The computer C1 performs the resampling so that the number of samples is approximately equal across the sections of the pitch waveform data and the samples within a section are equally spaced.
Provided that the sampling interval of the audio data read from the recording medium is known, the sample number data serves as information representing the original time length of each unit-pitch section of that audio data.
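Steps S111 and S112 can be sketched together as below. This is only an illustration under stated assumptions: linear interpolation is used in place of whatever interpolation the system applies, the first and last samples of a section are mapped onto the first and last resampled points, and the name `resample_section` is hypothetical.

```python
def resample_section(section, target_len):
    """Resample one section to target_len equally spaced samples.

    Returns (resampled, original_len); original_len plays the role of the
    sample number data for this section.
    """
    n = len(section)
    out = []
    for j in range(target_len):
        # Position of the j-th output sample on the original sample grid.
        pos = j * (n - 1) / (target_len - 1) if target_len > 1 else 0.0
        i = min(int(pos), n - 2) if n > 1 else 0
        frac = pos - i
        nxt = section[i + 1] if n > 1 else section[i]
        # Linear interpolation between neighbouring original samples (assumed).
        out.append(section[i] * (1 - frac) + nxt * frac)
    return out, n
```

Calling this with the same `target_len` for every section yields sections of equal sample count, while the returned original lengths preserve the information needed to restore the time length later.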

Next, the computer C1 generates a subband data group by applying an orthogonal transform such as the DCT (Discrete Cosine Transform) to the resampled pitch waveform data (step S113). The subband data group consists of data representing the temporal change in the intensity of the fundamental frequency component of the speech represented by the resampled pitch waveform data (the 0th subband data) and n pieces of data representing the temporal change in the intensity of n harmonic components of that speech, where n is a natural number (the 1st to n-th subband data). (Accordingly, when the intensity of the fundamental frequency component (or a harmonic component) of the speech does not change over time, the subband data represents that intensity in the form of a DC signal.)
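One way to picture step S113 is the sketch below, which applies an unnormalized DCT-II to each resampled section and collects the k-th coefficient of successive sections into the k-th series. The choice of DCT-II, the lack of normalization, and the direct use of coefficient index k as the subband index are all assumptions made for illustration; the names `dct` and `subbands` are hypothetical.

```python
import math


def dct(section):
    """Unnormalized DCT-II of one resampled section."""
    n = len(section)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(section))
        for k in range(n)
    ]


def subbands(sections, n_bands):
    """Series k holds the k-th DCT coefficient of each section, in time order."""
    coeffs = [dct(s) for s in sections]
    return [[c[k] for c in coeffs] for k in range(n_bands)]
```

Because every section has the same number of samples after resampling, each series describes how the intensity of one frequency component changes from section to section.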

Next, the computer C1 determines whether the amplitude of the pitch signal generated in step S102 has reached a predetermined amount (step S114). If it determines that the amplitude has not reached that amount, it filters each piece of subband data in the subband data group generated in step S113 to generate data representing the component of each piece of subband data that has at least a certain degree of periodicity (hereinafter called continuous component data), and also generates data representing the component that remains after the continuous component is removed from the subband data (hereinafter called random component data) (step S115). The processing then moves to step S117.
(In the following, the continuous component data separated from the k-th subband data, where k is an integer from 0 to n, is called the k-th continuous component data, and the random component data separated from the k-th subband data is called the k-th random component data.)

On the other hand, if it determines in step S114 that the amplitude of the pitch signal has reached the predetermined amount, the computer C1 decides to treat the k-th subband data as the k-th continuous component data without change (step S116), and moves the processing to step S117.

In general, speech uttered by a person contains many periodic pitch components, whereas other sounds (for example, sounds produced by musical instruments) do not contain many periodic components. The continuous component data can therefore be regarded as representing the component of the subband data that originates in speech uttered by a person, and the random component data can be regarded as representing the component that does not.
The processing performed by the computer C1 in step S114 thus corresponds to determining whether the components of the subband data group that do not originate in human speech may be ignored, so that all components of the subband data group can be treated as components of human speech. When it determines that those components cannot be ignored (specifically, when the amplitude of the pitch signal has not reached the predetermined amount), the subband data is separated into a component considered to originate in human speech and a component considered not to.
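The branch between steps S115 and S116 can be sketched as below. The text does not specify the filter, so a centered moving average is used here purely as an assumed stand-in for extracting the slowly varying (strongly periodic) part of a subband series; the function name `split_components` and the parameters `threshold` and `window` are hypothetical.

```python
def split_components(subband, pitch_amplitude, threshold, window=3):
    """Split one subband series into (continuous, random) parts.

    Returns (continuous, None) when the pitch amplitude reaches the threshold.
    """
    if pitch_amplitude >= threshold:
        # Step S116: the whole subband is treated as the continuous component.
        return list(subband), None
    # Step S115 (assumed filter): a centered moving average gives the
    # continuous part; the residual is the random part.
    half = window // 2
    continuous = []
    for t in range(len(subband)):
        lo, hi = max(0, t - half), min(len(subband), t + half + 1)
        continuous.append(sum(subband[lo:hi]) / (hi - lo))
    random_part = [s - c for s, c in zip(subband, continuous)]
    return continuous, random_part
```

By construction the two parts sum back to the original subband series, which is the property the restoration side relies on.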

Next, the computer C1 repeats the following processing until the compression ratio of the continuous component data and the random component data reaches a predetermined target value. First, using the (n+1) pieces of continuous component data obtained in step S115 or S116 and the (n+1) pieces of random component data obtained in step S115, it generates (n+1) pieces of nonlinear quantized continuous component data and (n+1) pieces of nonlinear quantized random component data (step S117). It then entropy-codes the data containing the (n+1) pieces of nonlinear quantized continuous component data and the data containing the (n+1) pieces of nonlinear quantized random component data, thereby generating the continuous component compressed data and random component compressed data described later (step S118). Finally, it compares the compression ratio (that is, the ratio of the total data amount of the continuous component compressed data and the random component compressed data to the total data amount of the (n+1) pieces of continuous component data and the (n+1) pieces of random component data) with the target value (step S119).

Specifically, for example, in step S117 the computer C1 first generates, as the (n+1) pieces of nonlinear quantized continuous component data, a total of (n+1) pieces of data that correspond to quantized versions of the values obtained by applying nonlinear compression to the instantaneous values of the waveforms represented by the (n+1) pieces of continuous component data obtained in step S115 or S116 (for example, the values obtained by substituting the instantaneous values into an upward-convex function). Also in step S117, the computer C1 generates, as the (n+1) pieces of nonlinear quantized random component data, a total of (n+1) pieces of data that correspond to quantized versions of the values obtained by applying the same nonlinear compression to the instantaneous values of the waveforms represented by the (n+1) pieces of random component data obtained in step S115.

The computer C1 determines the compression characteristic of the nonlinear compression performed in step S117 (that is, the correspondence between the value of an instantaneous value before compression and its value after compression) based on the result of the most recently executed step S119. Specifically, if it determined that the compression ratio obtained in step S119 is larger than the target value, it sets the compression characteristic so that the compression ratio becomes smaller than at present. If it determined that the compression ratio is smaller than the target value, it sets the compression characteristic so that the compression ratio becomes larger than at present. If step S119 has not yet been executed, it performs the compression using a predetermined initial characteristic as the compression characteristic.
Also in step S117, the computer C1 creates compression characteristic data indicating the compression characteristic it has determined.

As a more specific example of the procedure for determining the compression characteristic in step S117, the computer C1 determines the function global_gain(xi) contained in the right-hand side of Formula 3 based on the result of the most recently executed step S119. It then performs the nonlinear quantization by changing the instantaneous values of each piece of continuous component data and each piece of random component data after nonlinear compression so that they become substantially equal to quantized values of the function Xri(xi) given by the right-hand side of Formula 3. The computer C1 also creates data representing the determined function global_gain(xi) as the compression characteristic data.

(Formula 3) $X_{ri}(x_i) = \operatorname{sgn}(x_i) \cdot |x_i|^{4/3} \cdot 2^{\mathrm{global\_gain}(x_i)/4}$
(where sgn(α) = α/|α|, xi is the instantaneous value of the waveform of the continuous component data, and global_gain(xi) is a function of xi for setting the full scale)
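Formula 3 followed by quantization can be sketched as below. Two simplifications are assumed for illustration: global_gain is treated as a constant rather than a function of xi, and an input of exactly 0 (for which sgn is undefined) is mapped to 0. Rounding to the nearest integer is likewise an assumed choice of quantizer, and the names `xr` and `quantize` are hypothetical.

```python
import math


def xr(x, global_gain):
    """Formula 3 with a constant global_gain (an assumption)."""
    if x == 0:
        return 0.0  # sgn(0) is undefined in Formula 3; 0 is assumed here
    return math.copysign(abs(x) ** (4 / 3) * 2 ** (global_gain / 4), x)


def quantize(samples, global_gain):
    """Quantize the Formula 3 values by rounding to the nearest integer (assumed)."""
    return [round(xr(x, global_gain)) for x in samples]
```

Raising or lowering `global_gain` changes the number of distinct quantized levels, which is what lets the later steps steer the compression ratio.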

Next, in step S118, the computer C1 generates the continuous component compressed data by entropy-coding the (n+1) pieces of nonlinear quantized continuous component data obtained in step S117 together with the compression characteristic data (specifically, for example, by converting them into an arithmetic code or a Huffman code). Also in step S118, the computer C1 generates the random component compressed data by entropy-coding the (n+1) pieces of nonlinear quantized random component data obtained in step S117.

Next, in step S119, the computer C1 obtains, as the compression ratio, the ratio of the total data amount of the continuous component compressed data and the random component compressed data obtained in step S118 to the total data amount of the (n+1) pieces of continuous component data and the (n+1) pieces of random component data obtained in step S114. It then determines whether the obtained compression ratio is larger than, smaller than, or substantially equal to the target value (for example, 1/100). If it determines that the compression ratio is larger or smaller than the target value, it returns the processing to step S117.
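The loop over steps S117 to S119 can be sketched as below. Several stand-ins are assumed and are not taken from the text: the coded size is estimated from the empirical entropy of the quantized symbols instead of running an actual arithmetic or Huffman coder, the original data is assumed to use 16 bits per sample, global_gain is a single constant, and the compression characteristic is adjusted by bisection. All function and parameter names are hypothetical.

```python
import math
from collections import Counter


def quantize(samples, gain):
    # Formula 3 with a constant gain, followed by rounding (assumed quantizer).
    return [round(math.copysign(abs(x) ** (4 / 3) * 2 ** (gain / 4), x)) for x in samples]


def compression_ratio(samples, gain, bits_per_sample=16):
    # Estimated coded size (empirical entropy per symbol) over the original size.
    symbols = quantize(samples, gain)
    n = len(symbols)
    entropy = -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())
    return entropy / bits_per_sample


def fit_gain(samples, target, tol, lo=-64.0, hi=64.0, max_iter=50):
    # Steps S117 to S119 as a bisection on the gain (a swapped-in search strategy).
    for _ in range(max_iter):
        gain = (lo + hi) / 2
        ratio = compression_ratio(samples, gain)
        if abs(ratio - target) <= tol:
            return gain, ratio, True   # substantially equal to the target
        if ratio > target:
            hi = gain                  # ratio too large: compress harder
        else:
            lo = gain                  # ratio too small: compress less
    return gain, ratio, False
```

The tolerance `tol` plays the role of "substantially equal" in step S119; a real coder would replace the entropy estimate with the measured size of the coded output.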

On the other hand, if it determines that the obtained compression ratio is substantially equal to the target value, the computer C1 outputs the continuous component compressed data and the random component compressed data generated in step S118 and the sample number data generated in step S112 to the outside via its own serial communication control unit (step S120).

As a result of the processing described above, this audio data compression system separates the audio data to be compressed into continuous component data, which has periodicity to a degree that meets a predetermined criterion, and random component data, which represents the other components, and entropy-codes the continuous component data and the random component data separately. The component of the audio data that originates in human speech and the component that does not are therefore entropy-coded separately, and the audio data as a whole is compressed efficiently. This audio data compression system can thus efficiently compress, for example, voice mail that represents sound containing both human speech and background music.

In addition, processing the audio data into pitch waveform data normalizes the length and amplitude of each unit-pitch section and removes the influence of pitch fluctuation. The component of the pitch waveform data that originates in human speech therefore has strong periodicity, and the component separation unit E4 extracts it accurately as continuous component data. Because the extracted continuous component data has strong periodicity, it is entropy-coded efficiently.

The original time length of each section of the pitch waveform data generated by this audio data compression system can be identified from the sample number data. An external device that has acquired the continuous component compressed data and the random component data can therefore easily restore the original audio data by first restoring the pitch waveform data from them and then restoring the time length of each section of the restored pitch waveform data to its time length in the original audio data.

The configuration of this audio data compression system is not limited to the one described above.
For example, the computer C1 may acquire audio data transmitted serially from the outside via the serial communication control unit. It may also acquire audio data from the outside via a communication line such as a telephone line, a dedicated line, or a satellite line; in that case, the computer C1 need only include, for example, a modem or a DSU (Data Service Unit). If the audio data is acquired from a source other than the recording medium drive device SMD, the computer C1 does not necessarily need to include the recording medium drive device SMD.

The computer C1 may also include a sound collection device comprising a microphone, an AF amplifier, a sampler, an A/D (Analog-to-Digital) converter, a PCM encoder, and the like. The sound collection device acquires audio data by amplifying the audio signal representing the sound collected by its microphone, sampling and A/D-converting it, and then applying PCM modulation to the sampled audio signal. The audio data acquired by the computer C1 does not necessarily have to be a PCM signal.

The computer C1 may also write some or all of the continuous component compressed data, the random component compressed data, and the sample number data to a recording medium set in the recording medium drive device SMD via the recording medium drive device SMD. Alternatively, it may write them to an external storage device such as a hard disk device. In these cases, the computer C1 need only include the recording medium drive device or a control circuit such as a hard disk controller.

The computer C1 may also omit either the cepstrum analysis or the analysis based on the autocorrelation coefficient. In that case, the reciprocal of the fundamental frequency obtained by whichever of the two methods is used is treated directly as the pitch length.

The amount by which the computer C1 phase-shifts the audio data in each section does not have to be (−Ψ). For example, with δ denoting a real number that represents the initial phase and is common to all sections, the computer C1 may phase-shift the audio data in each section by (−Ψ+δ). Likewise, the positions at which the computer C1 divides the audio data do not have to be the timings at which the pitch signal crosses zero; they may be, for example, the timings at which the pitch signal takes a predetermined nonzero value.
However, if the initial phase δ is set to 0 and the audio data is divided at the timings at which the pitch signal crosses zero, the value at the start point of each section is close to 0, which reduces the amount of noise introduced into each section by dividing the audio data into sections.

The interpolation of the pitch waveform data does not have to be performed by Lagrange interpolation; for example, linear interpolation may be used, or the interpolation may be omitted altogether.
If the pitch fluctuation of the audio data to be compressed is negligible, the computer C1 does not need to phase-shift the audio data, and may treat the audio data as pitch waveform data and perform the processing from step S113 onward. Interpolation and resampling of the audio data are likewise not essential.

In step S114, instead of determining whether the amplitude of the pitch signal has reached a predetermined amount, the computer C1 may determine whether the ratio of the amplitude of the pitch signal to the amplitude of the pitch waveform signal has reached a predetermined amount. In this case, the computer C1 moves the processing to step S115 if it determines that the ratio has not reached the predetermined amount, and to step S116 if it determines that it has.

The computer C1 does not necessarily have to entropy-code the compression characteristic data. In that case, in step S118 the computer C1 generates the continuous component compressed data by entropy-coding, for example, only the nonlinear quantized continuous component data, and in step S120 it outputs the continuous component compressed data and the random component compressed data generated in step S118, the sample number data generated in step S112, and the compression characteristic data generated in step S117.

The computer C1 may also omit step S119. In that case, for example, in step S117 it generates the nonlinear quantized continuous component data and the nonlinear quantized random component data with a predetermined compression characteristic; in step S118 it generates the continuous component compressed data by entropy-coding the nonlinear quantized continuous component data and the compression characteristic data obtained in step S117, and generates the random component compressed data by entropy-coding the nonlinear quantized random component data obtained in step S117; and it then moves the processing to step S120. When the computer C1 generates the nonlinear quantized continuous component data and the nonlinear quantized random component data with a predetermined compression characteristic in step S117, it may store compression characteristic data indicating that predetermined compression characteristic in advance, or it may omit the entropy coding or external output of the compression characteristic data.

In step S118, instead of entropy-coding the nonlinear quantized continuous component data, the computer C1 may generate the continuous component compressed data by applying linear predictive coding to the (n+1) pieces of continuous component data obtained in step S115 or S116 (or to the (n+1) pieces of nonlinear quantized continuous component data obtained in step S117). When linear predictive coding is applied in this way to the continuous component data, which represents what remains after the random component data (the component considered not to originate in human speech) is removed from the subband data representing the speech, the data representing human speech is coded accurately and efficiently, substantially unaffected by components that do not originate in human speech.

The computer C1 does not have to be a dedicated system and may be a personal computer or the like. The audio data compression program may be installed on the computer C1 from a medium storing it (a CD-ROM, MO, flexible disk, or the like), or it may be uploaded to a bulletin board system (BBS) on a communication line and distributed via the communication line. Alternatively, a carrier wave may be modulated with a signal representing the audio data compression program and the resulting modulated wave transmitted, and a device that receives the modulated wave may demodulate it to restore the audio data compression program.

The audio data compression program can execute the processing described above when it is started under the control of the OS in the same way as other application programs and executed by the computer C1. If the OS handles part of the processing described above, the audio data compression program stored on the recording medium may omit the portion that controls that part of the processing.

(Second Embodiment)
Next, an audio data reproduction system according to the second embodiment of the present invention will be described.
This audio data reproduction system has substantially the same physical configuration as the audio data compression system shown in FIG. 1. However, the computer C1 constituting this audio data reproduction system stores an audio data reproduction program in advance and performs the processing described below by executing that program.

Next, the operation of this audio data reproduction system will be described with reference to FIG. 5. FIG. 5 shows the flow of operation of this audio data reproduction system.

When the user sets, in the recording medium drive device SMD, a recording medium on which, for example, the continuous component compressed data, random component compressed data, and sample number data of the first embodiment are recorded, and instructs the computer C1 to start the audio data reproduction program, the computer C1 starts the processing of the audio data reproduction program.

First, the computer C1 reads the continuous component compressed data, the random component compressed data, and the sample number data from the recording medium via the recording medium drive device SMD (FIG. 5, step S201).

Next, the computer C1 restores the (n+1) pieces of nonlinear quantized continuous component data and the compression characteristic data by decoding the continuous component compressed data it has read (step S202). Also in step S202, the computer C1 restores the (n+1) pieces of nonlinear quantized random component data by decoding the random component compressed data it has read.

Next, the computer C1 restores the (n+1) pieces of continuous component data and the (n+1) pieces of random component data as they were before nonlinear quantization, by changing the instantaneous values of the waveforms represented by the restored (n+1) pieces of nonlinear quantized continuous component data and (n+1) pieces of nonlinear quantized random component data in accordance with the characteristic that is the inverse of the compression characteristic indicated by the restored compression characteristic data (step S203).
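If the compression characteristic is Formula 3 with a constant global_gain (an assumption made only for this illustration), the inverse characteristic used in step S203 can be sketched as below. The quantization error itself cannot be undone; the name `expand` is hypothetical.

```python
import math


def expand(q, global_gain):
    """Invert Formula 3 for a constant global_gain: recover x from a value Xri."""
    # |Xri| = |x|^(4/3) * 2^(gain/4)  =>  |x| = (|Xri| * 2^(-gain/4))^(3/4)
    return math.copysign((abs(q) * 2 ** (-global_gain / 4)) ** (3 / 4), q)
```

Applying `expand` to every restored instantaneous value gives an approximation of the component data before nonlinear quantization.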

If the compression characteristic data could not be obtained from the continuous component compressed data in step S202, the computer C1 may restore the continuous component data and the random component data in step S203 by changing the instantaneous values of the waveforms represented by the nonlinear quantized continuous component data and the nonlinear quantized random component data in accordance with a predetermined characteristic. Alternatively, it may regard the nonlinear quantized continuous component data and the nonlinear quantized random component data as the continuous component data and the random component data, and move the processing directly from step S202 to step S204.

Next, for each integer k from 0 to n, the computer C1 generates a signal representing the sum of the instantaneous values indicated by the k-th random component data and the k-th continuous component data restored in step S203, where each sum is taken between instantaneous values at substantially the same time (step S204).

When the continuous component compressed data, random component compressed data, and sample number data read in step S201 were generated by, for example, the audio data compression system of the first embodiment, the signal generated in step S204, which represents the sum of the instantaneous values indicated by the k-th random component data and the k-th continuous component data, corresponds to the k-th subband data that the audio data compression system generated in step S113. If the k-th random component data does not exist, the computer C1 need only decide in step S204 to treat the k-th continuous component data as the k-th subband data without change.
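Step S204 amounts to a sample-wise sum, which can be sketched as below. Representing a missing random component as `None` and the function name `merge_components` are assumptions made for illustration.

```python
def merge_components(continuous, random_part=None):
    """Rebuild one subband series as the sample-wise sum of both components."""
    if random_part is None:
        # No random component: the continuous component is used as the subband data.
        return list(continuous)
    if len(continuous) != len(random_part):
        raise ValueError("components must cover the same instants")
    return [c + r for c, r in zip(continuous, random_part)]
```

Calling this once for each k from 0 to n yields the (n+1) pieces of subband data that the inverse transform then consumes.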

Next, the computer C1 applies a transform to the total of (n + 1) subband data generated in step S204, thereby restoring the pitch waveform data whose frequency component intensities are represented by these subband data (step S205).

The transform that the computer C1 applies to the subband data in step S205 is substantially the inverse of the transform that was applied to the audio data to generate the subband data. Accordingly, if the subband data was generated in step S113 described above, for example, the computer C1 may apply in step S205 the inverse of the transform applied to the pitch waveform data in step S113. Specifically, if the subband data was generated by applying a DCT to the pitch waveform data, for example, the computer C1 may apply an IDCT (Inverse DCT) to the subband data in step S205.
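The inverse relationship between the two transforms can be illustrated with a small, unoptimized sketch. It assumes an orthonormal DCT-II as the forward transform and its transpose (DCT-III) as the IDCT, and it transforms one section of pitch waveform data at a time. The description fixes neither the DCT variant nor the normalization, so these choices and the names `dct` and `idct` are assumptions.

```python
import math
from typing import List, Sequence


def dct(x: Sequence[float]) -> List[float]:
    """Orthonormal DCT-II: one coefficient per frequency component."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct(c: Sequence[float]) -> List[float]:
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for t in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n)
                 * math.cos(math.pi * (t + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


section = [0.0, 1.0, 0.5, -0.5, -1.0, 0.25]
restored = idct(dct(section))  # equals `section` up to rounding error
```

A production system would normally use a fast transform from a numerical library instead of these O(n^2) loops.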

Next, the computer C1 changes the time length of each section by changing the number of samples in each section of the pitch waveform data restored in step S205 to the number of samples indicated by the sample number data restored in step S202 (step S206).
Then, the computer C1 outputs the pitch waveform data whose section time lengths have been changed, that is, the restored audio data (step S207).
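A minimal sketch of step S206 is given below, under the assumption that the number of samples in a section is changed by linear interpolation between neighbouring samples. The description does not prescribe the resampling method, and the names `resample_section` and `restore_lengths` are hypothetical.

```python
from typing import List, Sequence


def resample_section(section: Sequence[float], target_len: int) -> List[float]:
    """Resample one section to target_len samples by linear interpolation."""
    if target_len < 1 or len(section) < 1:
        raise ValueError("section and target length must be non-empty")
    if len(section) == 1 or target_len == 1:
        return [section[0]] * target_len
    step = (len(section) - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * step
        left = min(int(pos), len(section) - 2)  # keep left + 1 in range
        frac = pos - left
        out.append(section[left] * (1.0 - frac) + section[left + 1] * frac)
    return out


def restore_lengths(sections: Sequence[Sequence[float]],
                    sample_counts: Sequence[int]) -> List[List[float]]:
    """Give every section the sample count recorded in the sample number data."""
    return [resample_section(s, n) for s, n in zip(sections, sample_counts)]


stretched = resample_section([0.0, 2.0, 4.0], 5)
```

Here `sample_counts` stands for the per-section values carried by the sample number data.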

The method by which the computer C1 outputs the audio data in step S207 is arbitrary. For example, the computer C1 may output the restored audio data to the outside via its own serial communication control unit, or may write it, via the recording medium drive device SMD, to a recording medium set in that device. If the computer C1 includes a control circuit such as a hard disk controller, it may write the data to an external storage device such as a hard disk device. The computer C1 may also hand the audio data over to another process that it is itself executing.

As a result of the processing described above, this audio data reproduction system restores audio data compressed by the audio data compression system of the first embodiment (or audio data compressed by the audio data compression system of the third embodiment described later, or audio data converted by any other technique into the continuous component compressed data, random component compressed data, and sample number data described above).

The configuration of this audio data reproduction system is likewise not limited to that described above.
For example, the computer C1 constituting this audio data reproduction system may also acquire, via the serial communication control unit, continuous component compressed data, random component compressed data, and sample number data transmitted serially from the outside. It may also acquire the continuous component compressed data, random component compressed data, and sample number data from the outside via a communication line, in which case the computer C1 only needs to include, for example, a modem or a DSU. Further, if the continuous component compressed data, random component compressed data, and sample number data are acquired from a source other than the recording medium drive device SMD, the computer C1 does not necessarily need to include the recording medium drive device SMD.

The computer C1 may also include an audio reproduction device comprising a D/A (Digital-to-Analog) converter, an AF amplifier, a speaker, and the like. In this case, in step S207 the audio reproduction device may D/A-convert the restored audio data to generate audio data in analog form, amplify this analog audio data, and drive its own speaker, thereby reproducing the sound represented by the audio data.

The computer C1 may also write the restored audio data, via the recording medium drive device SMD, to a recording medium set in that device. Alternatively, it may write the data to an external storage device such as a hard disk device. In these cases, the computer C1 only needs to include a recording medium drive device or a control circuit such as a hard disk controller.

In step S206, the computer C1 may also adjust the spacing between the samples within each section of the pitch waveform data restored in step S205, thereby changing the time length of that section to the time length specified by the sample number data restored in step S202.

The computer C1 constituting this audio data reproduction system likewise need not be a dedicated system and may be a personal computer or the like. The audio data reproduction program may be installed on the computer C1 from a medium storing the audio data compression program, or it may be uploaded to a bulletin board on a communication line and distributed via that line. Alternatively, a carrier wave may be modulated with a signal representing the audio data reproduction program and the resulting modulated wave transmitted, so that a device receiving the modulated wave demodulates it to restore the program. The audio data reproduction program can carry out the processing described above when it is started under the control of an OS in the same way as other application programs and executed by the computer C1. If the OS handles part of that processing, the audio data reproduction program stored on the recording medium may omit the portion that controls that part of the processing.

(Third embodiment)
Next, a third embodiment of the present invention will be described.
FIG. 6 is a diagram showing the configuration of an audio data compression system according to the third embodiment of the present invention. As shown in the figure, this audio data compression system comprises an audio input unit E1, a pitch waveform extraction unit E2, a subband analysis unit E3, a component separation unit E4, a data compression unit E5, and an output unit E6.

The audio input unit E1 comprises, for example, a recording medium drive device similar to the recording medium drive device SMD of the first embodiment.
The audio input unit E1 acquires audio data representing a speech waveform, for example by reading it from a recording medium on which it is recorded, and supplies it to the pitch waveform extraction unit E2. The audio data is assumed to take the form of a PCM-modulated digital signal and to represent speech sampled at a constant period sufficiently shorter than the pitch of the speech.

The pitch waveform extraction unit E2, the subband analysis unit E3, the component separation unit E4, and the data compression unit E5 each comprise a processor such as a DSP or a CPU, a memory that stores the program executed by the processor, and the like.
A single processor may perform some or all of the functions of the pitch waveform extraction unit E2, the subband analysis unit E3, the component separation unit E4, and the data compression unit E5.

The pitch waveform extraction unit E2 divides the audio data supplied from the audio input unit E1 into sections each corresponding to a unit pitch (for example, one pitch) of the speech represented by the audio data. It then phase-shifts and resamples each resulting section so that the time lengths and phases of the sections become substantially identical to one another. The audio data whose section phases and time lengths have been aligned in this way (pitch waveform data) is supplied to the subband analysis unit E3.
The pitch waveform extraction unit E2 also generates a pitch signal, described later, uses this pitch signal itself as described later, and supplies it to the component separation unit E4.
The pitch waveform extraction unit E2 further generates sample number data indicating the original number of samples in each section of the audio data and supplies it to the output unit E6.

Functionally, as shown for example in FIG. 7, the pitch waveform extraction unit E2 comprises a cepstrum analysis unit E201, an autocorrelation analysis unit E202, a weight calculation unit E203, a BPF (band-pass filter) coefficient calculation unit E204, a band-pass filter E205, a zero-cross analysis unit E206, a waveform correlation analysis unit E207, a phase adjustment unit E208, an interpolation unit E209, and a pitch length adjustment unit E210.

A single processor may perform some or all of the functions of the cepstrum analysis unit E201, the autocorrelation analysis unit E202, the weight calculation unit E203, the BPF coefficient calculation unit E204, the band-pass filter E205, the zero-cross analysis unit E206, the waveform correlation analysis unit E207, the phase adjustment unit E208, the interpolation unit E209, and the pitch length adjustment unit E210.

The pitch waveform extraction unit E2 determines the pitch length by combining cepstrum analysis with analysis based on the autocorrelation function.
That is, the cepstrum analysis unit E201 first applies cepstrum analysis to the audio data supplied from the audio input unit E1 to identify the fundamental frequency of the speech represented by the audio data, generates data indicating the identified fundamental frequency, and supplies it to the weight calculation unit E203.

Specifically, when audio data is supplied from the audio input unit E1, the cepstrum analysis unit E201 first converts the intensity of the audio data to a value substantially equal to the logarithm of the original value. (The base of the logarithm is arbitrary.)
Next, the cepstrum analysis unit E201 obtains the spectrum of the value-converted audio data (that is, the cepstrum) by the fast Fourier transform (or by any other technique that generates data representing the result of Fourier-transforming a discrete variable).
It then identifies, as the fundamental frequency, the lowest of the frequencies at which the cepstrum has a local maximum, generates data indicating the identified fundamental frequency, and supplies it to the weight calculation unit E203.

Meanwhile, when audio data is supplied from the audio input unit E1, the autocorrelation analysis unit E202 identifies the fundamental frequency of the speech represented by the audio data on the basis of the autocorrelation function of the audio data waveform, generates data indicating the identified fundamental frequency, and supplies it to the weight calculation unit E203.

Specifically, when audio data is supplied from the audio input unit E1, the autocorrelation analysis unit E202 first determines the autocorrelation function r(l) described above. Then, among the frequencies at which the periodogram obtained by Fourier-transforming the autocorrelation function r(l) has a local maximum, it identifies the lowest one exceeding a predetermined lower limit as the fundamental frequency, generates data indicating the identified fundamental frequency, and supplies it to the weight calculation unit E203.
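The procedure of the autocorrelation analysis unit E202 can be sketched as follows. Because the definition of r(l) lies outside this passage, the sketch substitutes a circular autocorrelation, whose discrete Fourier transform is the periodogram of the section. The relative floor `rel_floor`, which keeps rounding noise from being mistaken for a peak, and all function names are likewise assumptions of this example.

```python
import cmath
import math
from typing import List, Sequence


def circular_autocorrelation(x: Sequence[float]) -> List[float]:
    """r(l) computed with wrap-around indexing (an assumed definition)."""
    n = len(x)
    return [sum(x[t] * x[(t + lag) % n] for t in range(n)) / n
            for lag in range(n)]


def periodogram(r: Sequence[float]) -> List[float]:
    """Magnitude of the DFT of r for bins 0 .. n/2."""
    n = len(r)
    return [abs(sum(r[l] * cmath.exp(-2j * math.pi * k * l / n)
                    for l in range(n)))
            for k in range(n // 2 + 1)]


def fundamental_bin(x: Sequence[float], lower_bin: int = 0,
                    rel_floor: float = 0.1) -> int:
    """Lowest bin above lower_bin at which the periodogram peaks."""
    p = periodogram(circular_autocorrelation(x))
    floor = rel_floor * max(p[lower_bin + 1:])
    for k in range(lower_bin + 1, len(p) - 1):
        if p[k] >= floor and p[k] > p[k - 1] and p[k] >= p[k + 1]:
            return k
    raise ValueError("no periodogram peak found")


# Test tone: 4 cycles per 64 samples plus a weaker second harmonic.
tone = [math.sin(2 * math.pi * 4 * t / 64)
        + 0.5 * math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
k0 = fundamental_bin(tone)
```

A bin index k corresponds to a frequency of k times the sampling frequency divided by the section length.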

When the weight calculation unit E203 has been supplied with one piece of data indicating the fundamental frequency from each of the cepstrum analysis unit E201 and the autocorrelation analysis unit E202, two in total, it computes the average of the absolute values of the reciprocals of the fundamental frequencies indicated by these two pieces of data. It then generates data indicating the computed value (that is, the average pitch length) and supplies it to the BPF coefficient calculation unit E204.

When the BPF coefficient calculation unit E204 has been supplied with the data indicating the average pitch length from the weight calculation unit E203 and with a zero-cross signal, described later, from the zero-cross analysis unit E206, it determines from the supplied data and zero-cross signal whether the average pitch length and the zero-cross period differ from each other by a predetermined amount or more. If it determines that they do not, it controls the frequency characteristic of the band-pass filter E205 so that the reciprocal of the zero-cross period becomes the center frequency (the frequency at the center of the passband of the band-pass filter E205). If it determines that they differ by the predetermined amount or more, it controls the frequency characteristic of the band-pass filter E205 so that the reciprocal of the average pitch length becomes the center frequency.
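The averaging performed by the weight calculation unit E203 and the selection rule of the BPF coefficient calculation unit E204 amount to the following small sketch. The function names and the parameter `max_diff`, which stands for the predetermined amount, are assumptions of this example.

```python
def average_pitch_length(f_cepstrum: float, f_autocorr: float) -> float:
    """Mean of the absolute reciprocals of the two fundamental frequencies."""
    return (abs(1.0 / f_cepstrum) + abs(1.0 / f_autocorr)) / 2.0


def bpf_center_frequency(avg_pitch_len: float, zero_cross_period: float,
                         max_diff: float) -> float:
    """Choose the center frequency for the band-pass filter E205."""
    if abs(avg_pitch_len - zero_cross_period) >= max_diff:
        # The two estimates disagree: fall back on the average pitch length.
        return 1.0 / avg_pitch_len
    # The estimates agree: follow the zero-cross period.
    return 1.0 / zero_cross_period


avg_len = average_pitch_length(128.0, 64.0)
```

All lengths and periods are in seconds and all frequencies in hertz in this sketch.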

The band-pass filter E205 functions as an FIR (Finite Impulse Response) filter whose center frequency is variable.
Specifically, the band-pass filter E205 sets its center frequency to the value directed by the BPF coefficient calculation unit E204. It then filters the audio data supplied from the audio input unit E1 and supplies the filtered audio data (the pitch signal) to the zero-cross analysis unit E206, to the waveform correlation analysis unit E207, and to the continuous component extraction units E41-0 to E41-n, described later, of the component separation unit E4. The pitch signal is assumed to consist of digital data having a sampling interval substantially identical to that of the audio data.
The bandwidth of the band-pass filter E205 is desirably such that the upper limit of its passband always stays within twice the fundamental frequency of the speech represented by the audio data.

The zero-cross analysis unit E206 identifies the timing at which the instantaneous value of the pitch signal supplied from the band-pass filter E205 becomes 0 (the zero-cross time) and supplies a signal representing the identified timing (the zero-cross signal) to the BPF coefficient calculation unit E204. In this way the pitch length of the audio data is determined.
However, the zero-cross analysis unit E206 may instead identify the timing at which the instantaneous value of the pitch signal reaches a predetermined non-zero value and supply a signal representing that timing to the BPF coefficient calculation unit E204 in place of the zero-cross signal.
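A simple way to picture the zero-cross analysis is the sketch below. It records only upward crossings so that consecutive crossings are one full period apart, which is an assumption of this example, as are the function names. The `level` parameter covers the variant that uses a predetermined non-zero value.

```python
from typing import List, Optional, Sequence


def zero_cross_times(pitch_signal: Sequence[float],
                     level: float = 0.0) -> List[int]:
    """Sample indices at which the signal rises through `level`."""
    times = []
    for i in range(1, len(pitch_signal)):
        if pitch_signal[i - 1] < level <= pitch_signal[i]:
            times.append(i)
    return times


def mean_period(times: Sequence[int]) -> Optional[float]:
    """Average spacing of the crossings, in samples (None if fewer than two)."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps) if gaps else None


crossings = zero_cross_times([-1.0, 1.0, 1.0, -1.0] * 3)
period = mean_period(crossings)
```

Multiplying the period in samples by the sampling interval gives the zero-cross period in seconds.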

When the waveform correlation analysis unit E207 has been supplied with audio data from the audio input unit E1 and with the pitch signal from the band-pass filter E205, it divides the audio data at the timing of each boundary of a unit period (for example, one period) of the pitch signal. For each resulting section, it computes the correlation between variously phase-shifted versions of the audio data in the section and the pitch signal in the section, and identifies the phase of the audio data at which the correlation is highest as the phase of the audio data in that section. In this way the phase of the audio data is determined for each section.

Specifically, the waveform correlation analysis unit E207 determines, for example, the value Ψ described above for each section, generates data indicating the value Ψ, and supplies it to the phase adjustment unit E208 as phase data representing the phase of the audio data in that section. The temporal length of a section is desirably about one pitch.

When the phase adjustment unit E208 has been supplied with audio data from the audio input unit E1 and with data indicating the phase Ψ of each section of the audio data from the waveform correlation analysis unit E207, it shifts the phase of the audio data in each section by (−Ψ), thereby aligning the phases of the sections. It then supplies the phase-shifted audio data to the interpolation unit E209.
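The phase search and the subsequent shift by (−Ψ) can be sketched as below. The definition of Ψ lies outside this passage, so the sketch substitutes an integer sample offset found by circular cross-correlation. That substitution and the names `best_phase` and `align` are assumptions, not the disclosed formula.

```python
import math
from typing import List, Sequence


def best_phase(section: Sequence[float],
               pitch_section: Sequence[float]) -> int:
    """Offset (in samples) at which the section best matches the pitch signal."""
    n = len(section)

    def corr(shift: int) -> float:
        return sum(section[(t + shift) % n] * pitch_section[t]
                   for t in range(n))

    return max(range(n), key=corr)


def align(section: Sequence[float], psi: int) -> List[float]:
    """Undo an offset of psi samples, i.e. shift the phase by -psi."""
    n = len(section)
    return [section[(t + psi) % n] for t in range(n)]


pitch = [math.sin(2 * math.pi * t / 16) for t in range(16)]
section = [pitch[(t - 3) % 16] for t in range(16)]  # pitch delayed by 3 samples
psi = best_phase(section, pitch)
aligned = align(section, psi)
```

After alignment every section starts at the same point of the pitch cycle, which is what allows the sections to be compared coefficient by coefficient later on.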

The interpolation unit E209 applies Lagrange interpolation to the audio data supplied from the phase adjustment unit E208 (the phase-shifted audio data) and supplies the result to the pitch length adjustment unit E210.
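For reference, Lagrange interpolation evaluates the unique polynomial passing through a set of known samples at an intermediate position. The sketch below shows the textbook formula. How many neighbouring samples the interpolation unit E209 uses is not stated here, so the three-point usage is only an example.

```python
from typing import Sequence


def lagrange_interpolate(xs: Sequence[float], ys: Sequence[float],
                         x: float) -> float:
    """Value at x of the polynomial through the points (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Three samples of y = x^2; the interpolated value at 1.5 is 2.25.
midpoint = lagrange_interpolate([0.0, 1.0, 2.0], [0.0, 1.0, 4.0], 1.5)
```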

When the pitch length adjustment unit E210 has been supplied with the Lagrange-interpolated audio data from the interpolation unit E209, it resamples each section of the supplied audio data so that the time lengths of the sections become substantially identical to one another. It then supplies the audio data whose section time lengths have been aligned (that is, the pitch waveform data) to the subband analysis unit E3.

The pitch length adjustment unit E210 also generates sample number data indicating the original number of samples in each section of the audio data (the number of samples in each section at the time the audio data was supplied from the audio input unit E1 to the pitch length adjustment unit E210) and supplies it to the output unit E6.

The subband analysis unit E3 applies an orthogonal transform such as the DCT to the pitch waveform data supplied from the pitch length adjustment unit E210, thereby generating a subband data group consisting of a total of (n + 1) subband data, the 0th through the n-th, and supplies this subband data group to the component separation unit E4.

Functionally, as shown for example in FIG. 8, the component separation unit E4 comprises (n + 1) continuous component extraction units E41-0 to E41-n and (n + 1) random component extraction units E42-0 to E42-n.

Each of the continuous component extraction units E41-0 to E41-n functions, for example, as an LMS (Least Mean Square) filter or another adaptive filter.
The continuous component extraction unit E41-k (k being an integer from 0 to n inclusive) determines whether the amplitude of the pitch signal supplied from the band-pass filter E205 has reached a predetermined amount. During periods in which it determines that the amplitude has not reached the predetermined amount, it filters the k-th subband data included in the subband data group supplied from the subband analysis unit E3, thereby extracting the component of the k-th subband data whose periodicity is stronger than a certain degree (hereinafter, the k-th continuous component data), and supplies the extracted k-th continuous component data to the random component extraction unit E42-k and the data compression unit E5.
During periods in which it determines that the amplitude of the pitch signal has reached the predetermined amount, on the other hand, it does not filter the k-th subband data but supplies the k-th subband data as it is, as the k-th continuous component data, to the random component extraction unit E42-k and the data compression unit E5.

The random component extraction unit E42-k generates a signal (hereinafter, the k-th random component data) indicating the difference between the instantaneous value indicated by the k-th subband data supplied from the subband analysis unit E3 and the instantaneous value indicated by the k-th continuous component data supplied from the continuous component extraction unit E41-k, where the difference is taken between values at substantially the same time, and supplies it to the data compression unit E5.
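The cooperation of a continuous component extraction unit and its random component extraction unit can be sketched as follows. The adaptive filter is modelled here as an LMS linear predictor that estimates each sample from the preceding ones, so that its output follows the periodic part of the signal. This particular structure, the parameters `taps` and `mu`, and the function name `split_components` are all assumptions, since the description only states that an LMS filter or another adaptive filter is used.

```python
from typing import List, Sequence, Tuple


def split_components(subband: Sequence[float], pitch_amplitude: float,
                     threshold: float, taps: int = 4,
                     mu: float = 0.01) -> Tuple[List[float], List[float]]:
    """Return (continuous, random) component data for one subband."""
    if pitch_amplitude >= threshold:
        # Amplitude has reached the predetermined amount: no filtering.
        continuous = list(subband)
    else:
        weights = [0.0] * taps
        continuous = []
        for t in range(len(subband)):
            past = [subband[t - 1 - i] if t - 1 - i >= 0 else 0.0
                    for i in range(taps)]
            estimate = sum(w * p for w, p in zip(weights, past))
            error = subband[t] - estimate
            # LMS update: move the weights along the error gradient.
            weights = [w + 2.0 * mu * error * p
                       for w, p in zip(weights, past)]
            continuous.append(estimate)
    random = [s - c for s, c in zip(subband, continuous)]
    return continuous, random


data = [1.0, -1.0, 1.0, -1.0]
cont_bypass, rand_bypass = split_components(data, 1.0, 0.5)
cont_lms, rand_lms = split_components(data, 0.1, 0.5)
```

In the bypass case the random component is identically zero, so nothing needs to be encoded for it.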

The operation of the continuous component extraction unit E41-k corresponds to the following: when it is determined for the k-th subband data that components not originating from human speech (such as the sound of musical instruments) cannot be ignored, the component of the k-th subband data having periodicity of at least a certain strength is extracted as the k-th continuous component data; when it is determined that components not originating from human speech can be ignored, the k-th subband data is regarded as being the k-th continuous component data as it is (that is, components not originating from human speech are regarded as absent).
Accordingly, when it has been determined for the k-th subband data that components not originating from human speech cannot be ignored, the component of the k-th subband data whose periodicity does not reach the certain degree becomes the k-th random component data; when it has been determined that the k-th subband data may be regarded as consisting only of components of human speech, the intensity of the k-th random component data is substantially 0.

Functionally, as shown for example in FIG. 9, the data compression unit E5 comprises a nonlinear quantization unit E51, a compression rate setting unit E52, and an entropy encoding unit E53.

When the nonlinear quantization unit E51 has been supplied with (n + 1) continuous component data from the continuous component extraction units E41-0 to E41-n, it generates (n + 1) nonlinear quantized continuous component data, each corresponding to a quantized version of the values obtained by applying nonlinear compression to the instantaneous values of the waveform represented by the respective continuous component data (specifically, for example, the values obtained by substituting the instantaneous values into an upward-convex function). Likewise, when it has been supplied with (n + 1) random component data from the random component extraction units E42-0 to E42-n, it generates (n + 1) nonlinear quantized random component data, each corresponding to a quantized version of the values obtained by applying the same nonlinear compression to the instantaneous values of the waveform represented by the respective random component data.
The nonlinear quantization unit E51 then supplies the generated nonlinear quantized continuous component data and nonlinear quantized random component data to the entropy encoding unit E53. For a random component whose intensity is 0, however, nonlinear quantized random component data need not be generated.

The nonlinear quantization unit E51 acquires from the compression rate setting unit E52 compression characteristic data that specifies the correspondence between the values of the instantaneous values before and after compression, and performs the compression according to the correspondence specified by this data. Specifically, for example, the nonlinear quantization unit E51 acquires from the compression rate setting unit E52, as the compression characteristic data, data specifying the function global_gain(xi) described above. It then performs the nonlinear quantization by changing the instantaneous values of each continuous component data and each random component data after nonlinear compression to values substantially equal to the quantized value of the function Xri(xi) described above.
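The idea of compressing with an upward-convex function before quantizing can be sketched as follows. The functions global_gain(xi) and Xri(xi) are defined outside this passage, so the sketch replaces them with a signed power law and a single scalar `gain`. These stand-ins and the function names are assumptions and do not reproduce the disclosed formulas.

```python
import math


def compress_quantize(x: float, gain: float, power: float = 0.75) -> int:
    """Apply a concave (upward-convex) curve to |x|, scale, and round."""
    return int(round(math.copysign(abs(x) ** power, x) / gain))


def expand(q: int, gain: float, power: float = 0.75) -> float:
    """Approximate inverse used on the restoring side."""
    return math.copysign((abs(q) * gain) ** (1.0 / power), q)


code = compress_quantize(16.0, gain=1.0)
restored_value = expand(code, gain=1.0)
coarse_code = compress_quantize(16.0, gain=2.0)
```

A larger `gain` maps the same instantaneous value onto fewer quantization levels, which is the handle a rate controller can use to trade accuracy for size.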

The compression rate setting unit E52 generates the compression characteristic data described above, which specifies the correspondence between the values of the instantaneous values before and after compression by the nonlinear quantization unit E51 (hereinafter, the compression characteristic), and supplies it to the nonlinear quantization unit E51 and the entropy encoding unit E53. Specifically, for example, it generates compression characteristic data specifying the function global_gain(xi) described above and supplies it to the nonlinear quantization unit E51 and the entropy encoding unit E53.

To determine the compression characteristic, the compression rate setting unit E52 acquires, for example, the continuous component compressed data and random component compressed data, described later, from the entropy encoding unit E53. It then computes the ratio of the total data amount of the continuous component compressed data and random component compressed data acquired from the entropy encoding unit E53 to the total data amount of the (n + 1) continuous component data and (n + 1) random component data acquired from the component separation unit E4, and determines whether the computed ratio is larger than a predetermined target compression rate. If it determines that the computed ratio is larger than the target compression rate, the compression rate setting unit E52 determines the compression characteristic so that the compression rate becomes smaller than at present. If it determines that the computed ratio is equal to or smaller than the target compression rate, it determines the compression characteristic so that the compression rate becomes larger than at present.
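The feedback rule of the compression rate setting unit E52 can be sketched as below, taking the compression rate to be the compressed size divided by the original size, as in the paragraph above. Steering the rate through a single quantizer `gain` adjusted by a fixed factor `step` is an assumption of this example, as is the function name.

```python
def next_gain(gain: float, original_bytes: int, compressed_bytes: int,
              target_ratio: float, step: float = 1.25) -> float:
    """Return the quantizer gain to use for the next pass."""
    ratio = compressed_bytes / original_bytes
    if ratio > target_ratio:
        # Output too large: quantize more coarsely to lower the ratio.
        return gain * step
    # Output at or below the target: quantize more finely.
    return gain / step


coarser = next_gain(1.0, original_bytes=1000, compressed_bytes=600,
                    target_ratio=0.5)
finer = next_gain(1.0, original_bytes=1000, compressed_bytes=400,
                  target_ratio=0.5)
```

Repeating the encode-and-measure cycle with the returned gain makes the achieved ratio hover around the target.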

The entropy encoding unit E53 entropy-encodes the (n+1) pieces of nonlinearly quantized continuous component data supplied from the nonlinear quantization unit E51 together with the compression characteristic data supplied from the compression rate setting unit E52, and supplies the entropy-encoded data to the compression rate setting unit E52 and the output unit E6 as continuous component compressed data. The entropy encoding unit E53 also entropy-encodes the (n+1) pieces of nonlinearly quantized random component data supplied from the nonlinear quantization unit E51, and supplies the entropy-encoded data to the compression rate setting unit E52 and the output unit E6 as random component compressed data.

The output unit E6 comprises, for example, a control circuit that controls serial communication with the outside in conformity with a standard such as USB. A processor that performs some or all of the functions of the pitch waveform extraction unit E2, the subband analysis unit E3, the component separation unit E4, and the data compression unit E5 may also perform the function of the output unit E6.

When the output unit E6 is supplied with the continuous component compressed data and random component compressed data generated by the data compression unit E5 and with the sample number data generated by the pitch length adjustment unit E210 of the pitch waveform extraction unit E2, it outputs the continuous component compressed data, the random component compressed data, and the sample number data.

The audio data compression system of FIG. 6 likewise separates the audio data to be compressed into continuous component data, which has periodicity sufficient to meet a predetermined criterion, and random component data, which is the remainder, and entropy-encodes each of the two separately. As a result, the component of the audio data that originates from human speech and the component that does not are entropy-encoded separately, and the audio data as a whole is compressed efficiently.

In addition, because the audio data is processed into pitch waveform data, the length and amplitude of each section corresponding to a unit pitch are normalized and the influence of pitch fluctuation is removed. The component of the pitch waveform data that originates from human speech therefore has strong periodicity, and the component separation unit E4 extracts it accurately as the continuous component. Because the extracted continuous component has strong periodicity, it is entropy-encoded efficiently.

Furthermore, because the original time length of each section of the pitch waveform data can be identified from the sample number data, the original audio data can easily be restored by returning the time length of each section of the pitch waveform data to its time length in the original audio data.

The configuration of this audio data compression system is also not limited to that described above.
For example, the audio input unit E1 may acquire audio data from the outside via a communication line such as a telephone line, a dedicated line, or a satellite line. In this case, the audio input unit E1 need only include a communication control unit comprising, for example, a modem or a DSU.

The audio input unit E1 may also include a sound collection device comprising a microphone, an AF amplifier, a sampler, an A/D converter, a PCM encoder, and the like. The sound collection device may acquire audio data by amplifying the audio signal representing the sound picked up by its microphone, sampling and A/D-converting it, and then applying PCM modulation to the sampled audio signal. The audio data acquired by the audio input unit E1 need not be a PCM signal.

The pitch waveform extraction unit E2 need not include the cepstrum analysis unit E201 (or the autocorrelation analysis unit E202). In that case, the weight calculation unit E203 may treat the reciprocal of the fundamental frequency obtained by the remaining analysis unit directly as the average pitch length.

The zero-cross analysis unit E206 may supply the pitch signal received from the bandpass filter E205 to the BPF coefficient calculation unit E204 unchanged as the zero-cross signal.

The amount by which the phase adjustment unit E208 shifts the phase of the audio data in each section need not be (−Ψ), and the positions at which the waveform correlation analysis unit E207 divides the audio data need not be the timings at which the pitch signal crosses zero.
The interpolation unit E209 need not interpolate the phase-shifted audio data by Lagrange interpolation; it may use linear interpolation, for example. Alternatively, the interpolation unit E209 may be omitted, and the phase adjustment unit E208 may supply the audio data directly to the pitch length adjustment unit E210.

The output unit E6 may output the phoneme data and the sample number data to the outside via a communication line or the like. When outputting data via a communication line, the output unit E6 need only include a communication control unit comprising, for example, a modem or a DSU.
The output unit E6 may also include a recording medium drive device. In this case, the output unit E6 may write the continuous component compressed data, the random component compressed data, and the sample number data to the storage area of a recording medium set in the recording medium drive device.
A single modem, DSU, or recording medium drive device may serve as both the audio input unit E1 and the output unit E6.

Instead of determining whether the amplitude of the pitch signal has reached a predetermined amount, the continuous component extraction unit E41-k may determine whether the ratio of the amplitude of the pitch signal to the amplitude of the pitch waveform signal has reached a predetermined amount.

The entropy encoding unit E53 need not entropy-encode the compression characteristic data. In that case, for example, the compression rate setting unit E52 supplies the compression characteristic data it has generated to the nonlinear quantization unit E51 and the output unit E6, and the output unit E6 outputs the compression characteristic data supplied from the compression rate setting unit E52 in addition to the continuous component compressed data, the random component compressed data, and the sample number data.

The data compression unit E5 need not include the compression rate setting unit E52. In that case, for example, the nonlinear quantization unit E51 generates the nonlinearly quantized continuous component data and nonlinearly quantized random component data with a predetermined compression characteristic, and the entropy encoding unit E53 generates the continuous component compressed data by entropy-encoding compression characteristic data indicating that predetermined compression characteristic together with the nonlinearly quantized continuous component data generated by the nonlinear quantization unit E51. The entropy encoding unit E53 may also omit the entropy encoding of the compression characteristic data.

Instead of entropy-encoding the nonlinearly quantized continuous component data, the entropy encoding unit E53 may generate the continuous component compressed data by applying linear predictive coding to the continuous component data or to the nonlinearly quantized continuous component data.

(Fourth embodiment)
Next, a fourth embodiment of the present invention is described, taking an audio data reproduction system as an example.
As shown in FIG. 10, this audio data reproduction system comprises a data input unit D1, an entropy code decoding unit D2, a nonlinear inverse quantization unit D3, a component combination unit D4, a subband synthesis unit D5, an audio data restoration unit D6, and a speech synthesis unit D7.

The data input unit D1, the entropy code decoding unit D2, the nonlinear inverse quantization unit D3, the component combination unit D4, the subband synthesis unit D5, and the audio data restoration unit D6 each comprise a processor such as a DSP or CPU, a memory that stores the program executed by that processor, and the like. A single processor may perform some or all of the functions of the data input unit D1, the entropy code decoding unit D2, the nonlinear inverse quantization unit D3, the component combination unit D4, the subband synthesis unit D5, and the audio data restoration unit D6.

The data input unit D1 acquires the above-described continuous component compressed data, random component compressed data, and sample number data from the outside. Of the acquired data, it supplies the continuous component compressed data and random component compressed data to the entropy code decoding unit D2 and supplies the sample number data to the audio data restoration unit D6.

The data input unit D1 may acquire the continuous component compressed data, random component compressed data, and sample number data by any method. For example, it may read compressed phoneme data recorded on a computer-readable recording medium, or it may receive the data transmitted serially in conformity with a standard such as Ethernet (registered trademark), USB, IEEE 1394, or RS232C, or transmitted in parallel. The data input unit D1 may also acquire the data stored on an external server by a method such as downloading it via a network such as the Internet.

When the data input unit D1 reads the continuous component compressed data, random component compressed data, or sample number data from a recording medium, it need only further include, for example, a recording medium drive device that reads data from the recording medium in accordance with instructions from a processor or the like. When it receives the data by serial transmission, it need only further include a control circuit that controls serial communication with the outside in conformity with a standard such as Ethernet (registered trademark), USB, IEEE 1394, or RS232C.

The entropy code decoding unit D2 decodes the continuous component compressed data supplied from the data input unit D1 to restore the (n+1) pieces of nonlinearly quantized continuous component data and the compression characteristic data, and supplies the restored data to the nonlinear inverse quantization unit D3. The entropy code decoding unit D2 also decodes the random component compressed data supplied from the data input unit D1 to restore the (n+1) pieces of nonlinearly quantized random component data, and supplies these to the nonlinear inverse quantization unit D3 as well.

When the nonlinear inverse quantization unit D3 is supplied with the (n+1) pieces of nonlinearly quantized continuous component data, the (n+1) pieces of nonlinearly quantized random component data, and the compression characteristic data from the entropy code decoding unit D2, it changes the instantaneous values of the waveforms represented by the nonlinearly quantized continuous component data and nonlinearly quantized random component data according to a characteristic that is the inverse of the compression characteristic indicated by the compression characteristic data. This restores the (n+1) pieces of continuous component data and (n+1) pieces of random component data as they were before nonlinear quantization. It then supplies the restored continuous component data and random component data to the component combination unit D4.
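As an illustration of a compression characteristic and its inverse, the following Python sketch uses mu-law companding followed by uniform rounding. This is a swapped-in, well-known technique chosen for clarity; the source defines its characteristic through global_gain(xi) and does not specify mu-law. The names `MU`, `LEVELS`, `quantize`, and `dequantize` are hypothetical.

```python
import math

MU = 255.0    # hypothetical companding parameter (assumption)
LEVELS = 127  # hypothetical number of quantization steps per sign (assumption)


def quantize(x):
    """Nonlinearly quantize an instantaneous value x in [-1, 1] to an integer code."""
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return round(y * LEVELS)


def dequantize(code):
    """Apply the inverse characteristic to recover an approximate instantaneous value."""
    y = code / LEVELS
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
```

Because rounding discards information, `dequantize(quantize(x))` only approximates `x`; the point of the sketch is that the decoder must apply the inverse of the same characteristic the encoder used, which is why the compression characteristic data travels with the compressed data.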

If the entropy code decoding unit D2 cannot obtain compression characteristic data from the continuous component compressed data, the nonlinear inverse quantization unit D3 may restore the continuous component data and random component data by changing the instantaneous values of the waveforms represented by the nonlinearly quantized continuous component data and nonlinearly quantized random component data according to a predetermined characteristic. Alternatively, it may regard the nonlinearly quantized continuous component data and nonlinearly quantized random component data as the continuous component data and random component data and supply them unchanged to the component combination unit D4.

When the component combination unit D4 is supplied with the (n+1) pieces of continuous component data and (n+1) pieces of random component data from the nonlinear inverse quantization unit D3, it generates a signal representing the sum of the instantaneous values indicated by the k-th random component data and the k-th continuous component data (where the values summed are those at substantially the same time) and supplies it to the subband synthesis unit D5. This signal corresponds to the k-th subband data generated by the above-described subband analysis unit E3. If the k-th random component data does not exist, the component combination unit D4 may treat the k-th continuous component data unchanged as the k-th subband data.
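The combination rule of the component combination unit D4 can be sketched in a few lines of Python. The function name `combine_components` and the use of `None` to mark a missing random component are assumptions for illustration; each component is modeled as a list of instantaneous values aligned in time.

```python
def combine_components(continuous, random_parts):
    """Rebuild subband data from per-band continuous and random components.

    continuous:   list of per-band lists of instantaneous values
    random_parts: list of per-band lists, with None where no random
                  component exists for that band (hypothetical convention)
    """
    subbands = []
    for k, cont in enumerate(continuous):
        rand = random_parts[k] if k < len(random_parts) else None
        if rand is None:
            # No random component: the continuous component is the subband data.
            subbands.append(list(cont))
        else:
            # Sum the values that belong to the same time instant.
            subbands.append([c + r for c, r in zip(cont, rand)])
    return subbands
```

For example, `combine_components([[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], None])` sums the first band sample by sample and passes the second band through unchanged.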

When the subband synthesis unit D5 is supplied with a total of (n+1) pieces of subband data from the component combination unit D4, it transforms the subband data to restore the pitch waveform data whose frequency component intensities the subband data represents, and supplies the restored pitch waveform data to the audio data restoration unit D6.

The transform that the subband synthesis unit D5 applies to the subband data is substantially the inverse of the transform that was applied to the audio data to generate that subband data. Accordingly, if the subband data was generated by the above-described subband analysis unit E3 (or by the processing of step S113 described above), the subband synthesis unit D5 need only apply the inverse of the transform applied by the subband analysis unit E3 (or in step S113). Specifically, if the subband data was generated by applying a DCT to a phoneme, for example, the subband synthesis unit D5 may apply an IDCT (inverse DCT) to the subband data.
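The inverse relationship between the analysis and synthesis transforms can be checked with a small Python sketch. It assumes the orthonormal DCT-II for analysis and its inverse (DCT-III) for synthesis; the source says only "DCT" and "IDCT" and does not name a variant, and the function names `dct` and `idct` are hypothetical.

```python
import math


def _scale(k, n):
    # Orthonormal scaling so that idct(dct(x)) returns x.
    return math.sqrt((1 if k == 0 else 2) / n)


def dct(x):
    """Orthonormal DCT-II of a list of samples (assumed analysis transform)."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    return [
        sum(_scale(k, n) * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
        for i in range(n)
    ]
```

Applying `idct(dct(x))` to one section of pitch waveform data returns the original samples up to floating-point rounding, which is the property the subband synthesis unit relies on.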

The audio data restoration unit D6 adjusts the number of samples or the sample interval of each section of the pitch waveform data supplied from the subband synthesis unit D5 so that the time length of the section becomes the time length identified from the sample number data supplied from the data input unit D1.
The audio data restoration unit D6 then outputs the pitch waveform data whose section time lengths have been changed in this way, that is, the restored audio data.
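The time-length restoration performed by the audio data restoration unit D6 can be sketched as resampling each normalized section back to its original sample count. The sketch below uses linear interpolation, which is an assumption; the source does not state which resampling method D6 uses. The names `resample_section` and `restore_audio` are hypothetical, and each section is assumed to be non-empty.

```python
def resample_section(section, target_len):
    """Resample one non-empty pitch section to target_len samples by linear interpolation."""
    n = len(section)
    if n == 1 or target_len == 1:
        return [section[0]] * target_len
    out = []
    for i in range(target_len):
        pos = i * (n - 1) / (target_len - 1)  # position in the source section
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(section[lo] * (1 - frac) + section[hi] * frac)
    return out


def restore_audio(sections, sample_counts):
    """Concatenate sections after restoring each to its original sample count."""
    audio = []
    for section, count in zip(sections, sample_counts):
        audio.extend(resample_section(section, count))
    return audio
```

Here `sample_counts` plays the role of the sample number data: one original sample count per unit-pitch section.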

The audio data restoration unit D6 may output the audio data by any method. For example, it may reproduce the sound represented by the audio data via a D/A (digital-to-analog) converter and a speaker (not shown). It may also send the audio data to an external device or network via an interface circuit (not shown), or write it to a recording medium set in a recording medium drive device (not shown) via that drive device. The processor performing the function of the audio data restoration unit D6 may also hand the audio data over to another process that it is executing.

FIG. 1 is a block diagram showing the configuration of the audio data compression system according to the first embodiment of the present invention.
FIG. 2 is a diagram showing the first half of the operation flow of the audio data compression system of FIG. 1.
FIG. 3 is a diagram showing the second half of the operation flow of the audio data compression system of FIG. 1.
FIG. 4: (a) and (b) are graphs showing the waveform of audio data before phase shifting, and (c) is a graph showing the waveform of audio data after phase shifting.
FIG. 5 is a diagram showing the first half of the operation flow of the audio data reproduction system according to the second embodiment of the present invention.
FIG. 6 is a block diagram showing the configuration of the audio data compression system according to the third embodiment of the present invention.
FIG. 7 is a block diagram showing the configuration of the pitch waveform extraction unit of the audio data compression system of FIG. 6.
FIG. 8 is a block diagram showing the configuration of the component separation unit of the audio data compression system of FIG. 6.
FIG. 9 is a block diagram showing the configuration of the data compression unit of the audio data compression system of FIG. 6.
FIG. 10 is a block diagram showing the configuration of the audio data reproduction system according to the fourth embodiment of the present invention.

Explanation of symbols

C1: computer
SMD: recording medium drive device
E1: audio input unit
E2: pitch waveform extraction unit
E201: cepstrum analysis unit
E202: autocorrelation analysis unit
E203: weight calculation unit
E204: BPF coefficient calculation unit
E205: bandpass filter
E206: zero-cross analysis unit
E207: waveform correlation analysis unit
E208: phase adjustment unit
E209: interpolation unit
E210: pitch length adjustment unit
E3: subband analysis unit
E4: component separation unit
E41-0 to E41-n: continuous component extraction units
E42-0 to E42-n: random component extraction units
E5: data compression unit
E51: nonlinear quantization unit
E52: compression rate setting unit
E53: entropy encoding unit
E6: output unit
D1: data input unit
D2: entropy code decoding unit
D3: nonlinear inverse quantization unit
D4: component combination unit
D5: subband synthesis unit
D6: phoneme data restoration unit
D7: speech synthesis unit

Claims

An audio signal compression apparatus comprising:
subband extraction means for generating a subband signal representing temporal changes in the intensities of the fundamental frequency component and the harmonic components of an audio signal to be compressed that represents a speech waveform;
component separation means for, when it is determined that the amplitude of a pitch signal obtained by filtering the audio signal has not reached a predetermined amount, separating from the subband signal a continuous component having periodicity sufficient to meet a predetermined criterion and a random component corresponding to the subband signal with the continuous component removed, and for, when it is determined that the amplitude of the pitch signal has reached the predetermined amount, treating the subband signal as the continuous component; and
encoding means for performing entropy encoding or linear predictive encoding on the continuous component.

An audio signal compression apparatus comprising:
audio signal processing means for acquiring an audio signal to be compressed that represents a speech waveform and processing the audio signal into a pitch waveform signal by making substantially identical the phases of a plurality of sections obtained when the audio signal is divided into sections each corresponding to a unit pitch of the speech;
subband extraction means for generating a subband signal representing temporal changes in the intensities of the fundamental frequency component and the harmonic components of the pitch waveform signal;
component separation means for, when it is determined that the amplitude of a pitch signal obtained by filtering the audio signal has not reached a predetermined amount, separating from the subband signal a continuous component having periodicity sufficient to meet a predetermined criterion and a random component corresponding to the subband signal with the continuous component removed, and for, when it is determined that the amplitude of the pitch signal has reached the predetermined amount, treating the subband signal as the continuous component; and
encoding means for performing entropy encoding or linear predictive encoding on the continuous component.

The audio signal compression apparatus according to claim 1 or 2, wherein the encoding means performs entropy encoding on the result of nonlinearly quantizing the continuous component and/or the result of nonlinearly quantizing the random component.

The audio signal compression apparatus according to claim 3, wherein the encoding means generates data indicating the quantization characteristic of the nonlinear quantization.

The audio signal compression apparatus according to claim 3 or 4, wherein the encoding means determines the quantization characteristic of the nonlinear quantization on the basis of the data amount of continuous components and/or random components that were entropy-encoded in the past, and performs the nonlinear quantization so as to match the determined quantization characteristic.

An audio signal restoration apparatus comprising:
decoding means for acquiring an input signal corresponding to a signal obtained by extracting, from a subband signal representing temporal changes in the intensities of the fundamental frequency component and the harmonic components of an audio signal to be compressed that represents a speech waveform, a continuous component having periodicity sufficient to meet a predetermined criterion and applying entropy encoding or linear predictive encoding to it, and for restoring the continuous component by decoding the input signal;
subband signal restoration means for, when a random component corresponding to the subband signal with the restored continuous component removed exists, acquiring the random component and restoring the subband signal by adding the random component to the continuous component, and for, when the random component does not exist, treating the restored continuous component as the subband signal; and
audio signal restoration means for restoring the audio signal to be compressed on the basis of the restored subband signal.

An audio signal compression method executed by an audio signal compression apparatus having a processor, wherein:
the processor generates a subband signal representing temporal changes in the intensities of the fundamental frequency component and the harmonic components of an audio signal to be compressed that represents a speech waveform;
the processor, when it determines that the amplitude of a pitch signal obtained by filtering the audio signal has not reached a predetermined amount, separates from the subband signal a continuous component having periodicity sufficient to meet a predetermined criterion and a random component corresponding to the subband signal with the continuous component removed, and, when it determines that the amplitude of the pitch signal has reached the predetermined amount, treats the subband signal as the continuous component; and
the processor performs entropy encoding or linear predictive encoding on the continuous component.

An audio signal compression method executed by an audio signal compression apparatus having a processor, wherein:
the processor acquires an audio signal to be compressed that represents a speech waveform and processes the audio signal into a pitch waveform signal by making substantially identical the phases of a plurality of sections obtained when the audio signal is divided into sections each corresponding to a unit pitch of the speech;
the processor generates a subband signal representing temporal changes in the intensities of the fundamental frequency component and the harmonic components of the pitch waveform signal;
the processor, when it determines that the amplitude of a pitch signal obtained by filtering the audio signal has not reached a predetermined amount, separates from the subband signal a continuous component having periodicity sufficient to meet a predetermined criterion and a random component corresponding to the subband signal with the continuous component removed, and, when it determines that the amplitude of the pitch signal has reached the predetermined amount, treats the subband signal as the continuous component; and
the processor performs entropy encoding or linear predictive encoding on the continuous component.

An audio signal restoration method executed by an audio signal restoration device having a processor,
The processor extracts a continuous component having a periodicity enough to meet a predetermined standard from subband signals representing temporal changes in intensity of fundamental frequency components and harmonic components of a compression target speech signal representing a speech waveform. To obtain an input signal corresponding to the one subjected to entropy coding or linear prediction coding, and to restore the continuous component by decoding the input signal,
Wherein the processor, when the random component corresponding to the restored continuous component minus from said sub-band signal is present, acquires the random component, by adding the random component to the continuous component The subband signal is restored, and when the random component is not present, the restored continuous component is treated as the subband signal,
The processor restores the audio signal to be compressed based on the restored subband signal;
An audio signal restoration method.
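The decoding and recombination steps of the restoration method above can be sketched as follows. This is an illustration under stated assumptions: the claim does not fix the predictor, so a first-order predictor with a fixed coefficient of 1 (plain difference coding) is used as the simplest instance of linear predictive coding, and the names `lpc_encode`, `lpc_decode`, and `restore_subband` are hypothetical.

```python
from typing import List, Optional


def lpc_encode(samples: List[int]) -> List[int]:
    """Encode with a first-order predictor (prediction = previous sample).
    Assumed predictor; the claim does not specify its order or coefficients."""
    prev = 0
    residual = []
    for s in samples:
        residual.append(s - prev)
        prev = s
    return residual


def lpc_decode(residual: List[int]) -> List[int]:
    """Invert lpc_encode: add each residual to the running prediction."""
    prev = 0
    out = []
    for e in residual:
        prev = prev + e
        out.append(prev)
    return out


def restore_subband(
    continuous: List[int], random_part: Optional[List[int]] = None
) -> List[int]:
    """Rebuild the subband signal from the restored continuous component.

    When a random component is present it is added sample by sample;
    when it is absent the continuous component is used as the subband signal.
    """
    if random_part is None:
        return list(continuous)
    return [c + r for c, r in zip(continuous, random_part)]
```

For example, `restore_subband(lpc_decode(lpc_encode([10, 12, 13, 13])), [1, -1, 0, 2])` decodes the continuous component and then adds the random component to it.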

A program for causing a computer to function as:
Subband extraction means for generating a subband signal representing temporal changes in the intensity of the fundamental frequency component and the harmonic components of a compression-target audio signal representing a speech waveform;
Component separation means for separating, when it is determined that the amplitude of a pitch signal obtained by filtering the audio signal has not reached a predetermined amount, the subband signal into a continuous component whose periodicity meets a predetermined criterion and a random component corresponding to the subband signal with the continuous component removed, and for treating the subband signal as the continuous component when it is determined that the amplitude of the pitch signal has reached the predetermined amount; and
Encoding means for performing entropy coding or linear predictive coding on the continuous component.

A program for causing a computer to function as:
Audio signal processing means for acquiring a compression-target audio signal representing a speech waveform and, when the audio signal is divided into a plurality of sections each corresponding to a unit pitch of the speech, processing the audio signal into a pitch waveform signal by making the phases of these sections substantially the same;
Subband extraction means for generating a subband signal representing temporal changes in the intensity of the fundamental frequency component and the harmonic components of the pitch waveform signal;
Component separation means for separating, when it is determined that the amplitude of a pitch signal obtained by filtering the audio signal has not reached a predetermined amount, the subband signal into a continuous component whose periodicity meets a predetermined criterion and a random component corresponding to the subband signal with the continuous component removed, and for treating the subband signal as the continuous component when it is determined that the amplitude of the pitch signal has reached the predetermined amount; and
Encoding means for performing entropy coding or linear predictive coding on the continuous component.
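The phase alignment performed by the audio signal processing means above can be sketched as follows. This is a simplified illustration, not the claimed implementation: it assumes every section has the same length (`pitch_length`), uses the first section as the phase reference, and picks the circular shift that maximizes correlation with that reference. The names `best_shift` and `align_sections` are hypothetical.

```python
from typing import List


def best_shift(reference: List[float], section: List[float]) -> int:
    """Return the circular shift of `section` that best matches `reference`,
    measured by the sum of sample-wise products (circular correlation)."""
    n = len(section)

    def corr(shift: int) -> float:
        return sum(reference[i] * section[(i + shift) % n] for i in range(n))

    return max(range(n), key=corr)


def align_sections(signal: List[float], pitch_length: int) -> List[List[float]]:
    """Cut `signal` into sections of one pitch period each and rotate every
    section so that its phase matches the first section.

    Assumes a constant pitch period; trailing samples that do not fill a
    whole section are dropped in this sketch.
    """
    sections = [
        signal[i:i + pitch_length]
        for i in range(0, len(signal) - pitch_length + 1, pitch_length)
    ]
    if not sections:
        return []
    reference = sections[0]
    aligned = []
    for sec in sections:
        k = best_shift(reference, sec)
        aligned.append(sec[k:] + sec[:k])  # circular rotation by k samples
    return aligned
```

In real speech the pitch period varies from section to section, so an actual implementation would also have to determine the section boundaries; that step is outside this sketch.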

A program for causing a computer to function as:
Decoding means for acquiring an input signal corresponding to a signal obtained by extracting a continuous component whose periodicity meets a predetermined criterion from a subband signal representing temporal changes in the intensity of the fundamental frequency component and the harmonic components of a compression-target audio signal representing a speech waveform, and applying entropy coding or linear predictive coding to the extracted continuous component, and for restoring the continuous component by decoding the input signal;
Subband signal restoration means for acquiring, when a random component corresponding to the subband signal with the restored continuous component removed is present, the random component and restoring the subband signal by adding the random component to the continuous component, and for treating the restored continuous component as the subband signal when the random component is not present; and
Audio signal restoration means for restoring the compression-target audio signal based on the restored subband signal.