JP2648138B2

JP2648138B2 - Method for compressing speech patterns

Info

Publication number: JP2648138B2
Application number: JP59501357A
Authority: JP
Inventors: Bishnu Saroop Atal (サループアタル，ビシユニユ)
Original assignee: AT&T Corp
Current assignee: AT&T Corp
Priority date: 1983-04-12
Filing date: 1984-03-12
Publication date: 1997-08-27
Anticipated expiration: 2012-08-27
Also published as: CA1201533A; EP0138954B1; WO1984004194A1; EP0138954A1; EP0138954A4; DE3474873D1; JPS60501076A

Description

BACKGROUND OF THE INVENTION

The present invention relates to speech processing and, in particular, to the compression of speech patterns and the synthesis of speech patterns from such compressed patterns.

It is known that a bandwidth of at least 4 kHz is needed for a speech signal to be adequately intelligible. In digital speech processing systems such as speech synthesizers, recognizers, or coders, the channel capacity needed to store the full 4 kHz waveform digitally is very large. Many techniques have been devised to reduce the number of digital codes needed to represent a speech signal. Waveform coding techniques such as pulse code modulation (PCM), differential pulse code modulation (DPCM), delta modulation, and adaptive predictive coding yield natural-sounding, high-quality speech at bit rates of 16 to 64 kbps. The quality of speech obtained from waveform coders, however, degrades as the bit rate falls below 16 kbps.

Another speech coding approach, disclosed in U.S. Patent 3,624,302, uses a small number of slowly varying parameters, for example 12 to 16, which are processed to form a low-distortion replica of the speech pattern. Such parameters, for example linear prediction coefficients (LPC) or log area parameters generated by linear prediction analysis, can be band-limited to 50 Hz without significant band-limiting distortion. Encoding the LPC or log area parameters generally requires sampling at twice the bandwidth and quantizing each resulting frame of log area parameters. Each frame of log area parameters can be quantized using 48 bits. Twelve log area parameters, each with a 50 Hz bandwidth, are therefore sampled at 100 frames per second, which gives a bit rate of 4800 bits per second.
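The bit-rate arithmetic in the preceding paragraph can be checked with a short sketch. The function name and its arguments are illustrative assumptions and do not appear in the patent; the only inputs taken from the text are the 50 Hz parameter bandwidth and the 48 bits per frame.

```python
def parameter_bit_rate(bandwidth_hz: float, bits_per_frame: int) -> float:
    """Bit rate when parameter frames are sampled at twice their bandwidth."""
    frames_per_second = 2.0 * bandwidth_hz  # Nyquist rate for the parameter tracks
    return frames_per_second * bits_per_frame


# Twelve log area parameters share one 48-bit frame, each band-limited to 50 Hz.
rate = parameter_bit_rate(bandwidth_hz=50.0, bits_per_frame=48)
print(rate)  # 4800.0
```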

Further bandwidth reduction lowers the bit rate, but the resulting increase in distortion impairs the intelligibility of speech synthesized from the lower-bandwidth parameters. It is well known that sounds in a speech pattern do not occur at a uniform rate, and techniques have been devised that take such non-uniform occurrence into account. U.S. Patent 4,349,700 uses dynamic programming to allow recognition of speech exhibiting widely varying speech patterns. U.S. Patent 4,038,503 discloses a technique that non-linearly warps the time scale of a speech pattern so that speech features are represented more uniformly. These arrangements, however, still require storing and processing speech feature signals sampled at a rate corresponding to the most rapidly changing feature in the pattern. It is an object of the invention to provide an arrangement for speech representation and/or speech synthesis with reduced digital processing and storage requirements.

SUMMARY OF THE INVENTION

Sounds, or events, in human speech occur at an average rate that varies between 10 and 20 events per second. (A "speech event" here means a single sound in a sequence of sounds, for example one vowel.) It has been observed that such speech events occur at non-uniformly spaced time intervals and that the articulatory movements of the vocal tract differ widely from sound to sound. Substantial compression can therefore be achieved by converting the speech feature parameters into units associated with short speech events located at non-uniformly spaced times. Coding such speech event units efficiently yields high efficiency without degrading the accuracy of the pattern representation.

The invention is directed to a speech compression arrangement in which a speech pattern is analyzed to generate a set of signals representing acoustic features of the speech pattern at a first rate. Responsive to these speech feature signals, a sequence of signals representing the acoustic features of successive speech events in the speech pattern is generated, and digitally coded signals corresponding to the speech event representative signals are generated at a rate lower than the first rate.

According to one aspect of the invention, a speech pattern is synthesized by storing a predetermined set of speech element signals, combining the speech element signals to form a set of signals representing the acoustic features of the speech pattern, and generating the speech pattern responsive to the set of speech feature signals. The predetermined speech element signals are formed by analyzing a speech pattern to generate a set of speech feature signals at a first rate. A sequence of signals representing the acoustic features of successive speech events in the speech pattern is generated responsive to the sampled speech feature signals, and a sequence of digitally coded signals corresponding to the speech event representative signals is formed at a rate lower than the first rate.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flowchart illustrating the general method of the invention;
FIG. 2 is a block diagram of a speech pattern coding circuit illustrative of the invention;
FIGS. 3 to 8 are detailed flowcharts illustrating the operation of the circuit of FIG. 2;
FIG. 9 shows a speech synthesizer illustrative of the invention;
FIG. 10 is a flowchart showing the operation of the circuit of FIG. 9;
FIG. 11 shows a waveform of the speech event timing signal obtained in the circuit of FIG. 2; and
FIG. 12 shows waveforms illustrating a speech pattern and the speech event feature signals associated with it.

GENERAL DESCRIPTION

It is well known in the art to represent a speech pattern by a sequence of speech feature signals derived from linear prediction or other spectral analysis. Log area parameters sampled at closely spaced times have long been used in speech synthesis to obtain an efficient representation of a speech pattern. According to the invention, the log area parameters are converted into a sequence of individual speech event feature signals φ_k(n) such that

\[ y_i(n) = \sum_{k=1}^{m} a_{ik}\,\phi_k(n), \qquad i = 1, 2, \ldots, P \qquad (1) \]

The speech event feature signals φ_k(n) are sequential and occur at the speech event rate, which is substantially lower than the frame rate of the log area parameters. In equation (1), P is the total number of log area parameters y_i(n) determined by linear prediction analysis, m is the number of speech events in the pattern, n is the index of samples in the speech pattern at the sampling rate of the log area parameters, φ_k(n) is the k-th speech event signal at sample instant n, and a_ik is the combining coefficient corresponding to the contribution of the k-th speech event function to the i-th log area parameter. In matrix form, equation (1) is

\[ Y = A\Phi \qquad (2) \]

where Y is a P × N matrix whose (i, n) element is y_i(n), A is a P × m matrix whose (i, k) element is a_ik, and Φ is an m × N matrix whose (k, n) element is φ_k(n). Since each speech event k occupies only a small segment of the speech pattern, the signal φ_k(n) representing it is non-zero over only a small range of the sampling intervals of the whole pattern. Each log area parameter y_i(n) in equation (1) is a linear combination of the speech event functions φ_k(n), and the bandwidth of each y_i(n) is the largest bandwidth of any of the φ_k(n). It follows that coding y_i(n) directly takes more bits than coding the speech event signals φ_k(n) together with the combining coefficient signals of equation (1).
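As an illustration of the matrix relation Y = AΦ, the following minimal sketch rebuilds log area parameter tracks from two short, overlapping speech event signals. The numeric values, the helper name `combine`, and the use of nested lists are assumptions made for the example only.

```python
def combine(A, Phi):
    """Return Y = A * Phi for A (P x m) and Phi (m x N), given as nested lists."""
    m = len(Phi)
    N = len(Phi[0])
    return [[sum(row[k] * Phi[k][n] for k in range(m)) for n in range(N)]
            for row in A]


# Two speech events (m = 2), each non-zero over a short span of N = 6 frames.
Phi = [
    [0.0, 0.5, 1.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.5, 1.0, 0.5],
]
# Hypothetical combining coefficients a_ik for P = 3 log area parameters.
A = [
    [1.0, -1.0],
    [0.5, 2.0],
    [0.0, 1.0],
]
Y = combine(A, Phi)  # Y[i][n] plays the role of y_i(n)
```

Each row of `Y` is a weighted sum of the same two event signals, so only the event signals and the small coefficient matrix need to be coded.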

FIG. 1 is a flowchart illustrating the general method of the invention. According to the invention, a speech pattern is analyzed to form a sequence of signals representing log area parameter acoustic feature signals. LPC, partial correlation (PARCOR), or other speech features (see, for example, U.S. Patent 3,624,302) may be used in place of log area parameters. The feature signals are then converted into a set of speech event representative signals, which are coded at a lower bit rate for transmission or storage.

Referring to FIG. 1, in block 101 the electrical signal corresponding to a speech pattern is low-pass filtered to remove unwanted high-frequency speech and noise components, and the filtered signal is sampled at twice the cutoff frequency of the filter. The speech pattern samples are then converted in block 110 into a sequence of digitally coded signals corresponding to the pattern. Because the storage needed for the sample signals is too large for most practical applications, they are used in block 120 to generate log area parameter signals by linear prediction techniques well known in the art. The log area parameter signals y_i(n) are generated at a constant sampling rate high enough to represent accurately the fastest events expected in the speech pattern. A sampling interval of 2 to 5 milliseconds is typically chosen.

After the log area parameter signals have been stored, the times of occurrence of successive speech events in the pattern are detected, and signals representing the event timing are generated in block 130. This is done by dividing the pattern into predetermined short segments, for example of 0.25 second duration. For each successive interval, with beginning frame n_b and ending frame n_e, a matrix of log area parameter signals is formed from the log area parameters y_i(n) of the segment. The redundancy in the matrix is reduced by factoring out the first four principal components as follows.

[equation missing]

The first four principal components may be obtained by well-known methods such as that described in the article "An Efficient Linear Prediction Vocoder" by M. R. Sambur, Bell System Technical Journal, Vol. 54, No. 10, pp. 1693-1723 (December 1975). The resulting functions u_m(n) are then linearly combined to define the desired speech event signals, φ_k(n) = Σ_m b_km u_m(n), with the coefficients b_km chosen so that each φ_k(n) is as compact in time as possible.

In this way, the speech pattern is represented by a sequence of successive compact (small-spread) speech event signals φ_k(n), each of which can be coded efficiently. To obtain the shapes and locations of the speech event signals, a distance measure [equation missing] is minimized to select the optimum φ(n), and its location is obtained from the speech event timing signal [equation missing]. From equations (5), (6), and (7), the speech event signal φ_k(n) with minimum spread is centered at a negative-going zero crossing of ν(L).

After the ν(L) signal has been generated in block 130, block 140 is entered, in which the speech event signals φ_k(n) are accurately determined, using the process of block 130, from the speech event occurrence signals given by the negative-going zero crossings of ν(L). After the sequence of speech event representative signals has been generated, the combining coefficients a_ik of equations (1) and (2) are generated by minimizing the mean squared error

[equation missing]

where M is the total number of speech events within the range of index n over which the summation is performed. Setting the partial derivatives of E with respect to the coefficients a_ik to zero, the coefficients a_ik are obtained from the following set of simultaneous linear equations.

[equation missing]

DETAILED DESCRIPTION

FIG. 2 shows electroacoustic transducer 201, filter and sampler 203, analog-to-digital converter 205, and speech sample store 210, which operate to convert a speech pattern into a stored sequence of digital codes representing the pattern. Central processor 275 comprises a microprocessor such as the Motorola MC68000, controlled by instructions permanently stored in read-only memories (ROMs) 215, 220, 225, 230, and 235. Processor 275 directs the operation of arithmetic processor 280 and stores 210, 240, 245, 250, 255, and 260 so that the stored speech pattern codes are compressed into a compact set of speech event feature signals. The speech event feature signals are then supplied to utilization device 285 through input/output interface 265. The utilization device may be a digital communication facility, a storage arrangement for delayed transmission, or a store associated with a speech synthesizer. The Motorola MC68000 integrated circuit is described in the MC68000 16-Bit Microprocessor User's Manual, Motorola, Inc., 1980, and the arithmetic processor may comprise the TRW MPY-16HJ integrated circuit.

Referring to FIG. 2, a speech pattern is applied to electroacoustic transducer 201, and the resulting electrical signal is supplied to low-pass filter and sampler circuit 203, which limits the upper end of the signal band to 3.5 kHz and samples the filtered signal at 8 kHz. Analog-to-digital converter 205 converts the sampled signal from filter and sampler 203 into a sequence of digital codes, each representing the magnitude of a signal sample. The resulting digital codes are stored sequentially in speech sample store 210.

After the sampled speech pattern codes have been stored in store 210, central processor 275 causes the instructions stored in log area parameter program store 215 to be transferred to the random access memory associated with the central processor. The flowchart of FIG. 3 shows the sequence of operations performed by the controller responsive to the instructions from store 215.

Referring to FIG. 3, block 305 is entered first, and frame count index n is reset to 1. The speech samples of the current frame are then transferred from store 210 to the arithmetic processor via central processor 275, as indicated in block 310. The occurrence of an end-of-speech-sample signal is checked in decision block 315. Until the end of the speech pattern is detected, control passes to block 325, and LPC analysis is performed on the frame in processors 275 and 280. The LPC parameter signals of the current frame are then converted to log area parameter signals y_i(n) in block 330, and the log area parameter signals are stored in log area parameter store 240 (block 335). The frame count is incremented by 1 in block 345, and the speech samples of the next frame are read (block 310). When the end-of-speech-pattern signal occurs, control passes to block 320, and a signal corresponding to the number of frames in the pattern is stored in processor 275.
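The per-frame step of blocks 325 and 330 (LPC analysis followed by conversion to log area parameters) can be sketched as follows. This is not the patent's program code: it assumes the common autocorrelation method with the Levinson-Durbin recursion, and it assumes the log area ratio convention g_i = log((1 + k_i) / (1 - k_i)), whose sign varies between references. All function names are hypothetical.

```python
import math


def autocorrelation(frame, order):
    """Autocorrelation lags r[0..order] of one frame of speech samples."""
    return [sum(frame[t] * frame[t + lag] for t in range(len(frame) - lag))
            for lag in range(order + 1)]


def reflection_coefficients(r):
    """Levinson-Durbin recursion; returns PARCOR coefficients k_1..k_p."""
    order = len(r) - 1
    a = [0.0] * (order + 1)  # predictor coefficients, a[0] unused
    err = r[0]               # prediction error energy
    ks = []
    for i in range(1, order + 1):
        if err <= 0.0:       # silent or degenerate frame: pad with zeros
            ks.extend([0.0] * (order - i + 1))
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        ks.append(k)
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return ks


def log_area_parameters(ks):
    """Assumed convention: g_i = log((1 + k_i) / (1 - k_i)), valid for |k_i| < 1."""
    return [math.log((1.0 + k) / (1.0 - k)) for k in ks]


# Autocorrelation lags of a first-order process with coefficient 0.5.
ks = reflection_coefficients([1.0, 0.5, 0.25])
g = log_area_parameters(ks)
```

In a full analysis loop, `autocorrelation` would be applied to each windowed frame read from the sample store before the recursion is run.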

After the log area parameter storing operation is complete, central processor 275 transfers the instructions stored in ROM 220 into its random access memory. The instruction codes from store 220 correspond to the flowcharts of FIGS. 4 and 5. These instruction codes are used to generate the signal ν(L), from which the occurrences of speech events in the speech pattern are detected and located.

Referring to FIG. 4, the frame count of the log area parameters is initially reset by processor 275 in block 403, and the log area parameters y_i(n) for an initial time interval n₁ to n₂ of the speech pattern are transferred from log area parameter store 240 to processor 275 (block 410). After decision block 415 determines whether the end of the speech pattern has been reached, block 420 is entered, and the redundancy in the log area parameter signals is removed by factoring out the first four principal components u_i(n), i = 1, ..., 4, as described above.

The log area parameters of the current time interval are then represented by [equation missing], from which a set of signals [equation missing] is obtained. In block 425, the u_i(n) signals over the interval are combined using parameters b_i, i = 1, ..., 4, to obtain a signal [equation missing], with the parameters chosen so that φ_k is most compact over the range n₁ to n₂. This is accomplished using the function θ(L) of equation (6). The signal ν(L) representing the speech event timing of the speech pattern is then formed in block 430 according to equation (7), and the ν(L) signal is stored in timing parameter store 245. Frame counter n is incremented by a fixed amount, for example 5, chosen according to how closely adjacent speech event signals φ_k(n) are expected to occur (block 435), and block 410 is re-entered to generate the φ_k(n) and ν(L) signals for the next time interval of the speech pattern.

When the end of the speech pattern is detected in decision block 415, the frame count of the speech pattern is stored (block 440), and generation of the speech event timing parameter signal for the speech pattern is complete. FIG. 11 shows the speech event timing parameter signal for an example utterance. Each negative-going zero crossing in FIG. 11 corresponds to the center of a speech event feature signal φ_k(n).

Referring to FIG. 5, on entry into block 501, speech event index I is reset to 0 and frame index n is again set to 1. After indices I and n have been initialized, successive frames of the speech event timing parameter signal are read from store 245 (block 505), and zero crossings in it are detected by processor 275 in block 510. Whenever a zero crossing is found, speech event index I is incremented (block 515), and the speech event location frame is stored in speech event location store 250 (block 520). Frame index n is then incremented in block 525, and the end of the speech pattern frames is checked in block 530. Until the end-of-speech-pattern-frames signal is detected, control returns from block 530 to block 505, and the successive speech event location frames of the pattern are detected on each iteration.
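The loop of FIG. 5 amounts to scanning the timing signal for negative-going zero crossings and recording the frames at which they occur. A minimal sketch follows; the function name is hypothetical, frames are indexed from 0 rather than 1, and a crossing is assumed to be reported at the first negative frame.

```python
def negative_zero_crossings(v):
    """Return frame indices where v goes from non-negative to negative."""
    locations = []
    for n in range(1, len(v)):
        if v[n - 1] >= 0.0 and v[n] < 0.0:
            locations.append(n)  # counterpart of storing a location frame
    return locations


# Illustrative timing signal values, not taken from FIG. 11.
v = [0.4, 0.1, -0.2, -0.5, -0.1, 0.3, 0.6, 0.2, -0.3, -0.1]
print(negative_zero_crossings(v))  # [2, 8]
```

The length of the returned list corresponds to the final value of speech event index I.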

When the end-of-speech-pattern signal is detected in block 530, central processor 275 addresses speech event feature signal generation program store 225 and transfers its contents to the processor. Central processor 275 and arithmetic processor 280 are thereby adapted to generate a sequence of speech event feature signals responsive to the log area parameter signals in store 240 and the speech event location signals in store 250. The speech event feature signal generation instructions are illustrated in the flowchart of FIG. 6.

Initially, location index L is set to 1 in block 601, and the locations of the speech events in store 250 are transferred to central processor 275 (block 605). As indicated in block 610, the limit frames spanning a predetermined number of speech event locations, for example five, are determined. The log area parameters for the time interval of the speech pattern defined by the limit frames are placed in a section of the memory of central processor 275 (block 615). The redundancy in the log area parameters is reduced by factoring out a number of principal components corresponding to the predetermined number of events (block 620). Immediately thereafter, the speech event feature signal φ_L(n) for the current location L is generated.

The minimization of equation (6) that determines φ_L(n) is carried out by forming the following derivative.

[equation missing]

Here [equation missing], m is the predetermined number of events, and r is any of 1, 2, ..., m. Setting the derivative of equation (13) to zero to locate the minimum gives [equation missing]. From equation (14), [equation missing]. Equation (15) is therefore transformed into [equation missing]. φ(n) in equation (17) can be replaced by the right-hand side of equation (14), giving [equation missing], where [equation missing]. Rearranging equation (18) gives [equation missing]. Since the u_i(n) are the principal components of matrix Y, [equation missing]. Equation (20) then simplifies as follows.

[equation missing]

Here [equation missing]. In matrix notation, equation (22) becomes

\[ R\,\mathbf{b} = \lambda\,\mathbf{b} \qquad (24) \]

where λ = θ(L) (25). Equation (25) has exactly m solutions, and the solution that minimizes θ(L) is the one for which λ is smallest. The coefficients b₁, b₂, ..., b_m for which λ = θ(L) takes its minimum value yield the optimum speech event feature signal φ_L(n).
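Selecting the coefficients b that correspond to the smallest eigenvalue of R can be sketched numerically. The intermediate equations are missing from this text, so the sketch rests on a stated assumption: that the spread measure has the form Σ (n - L)² φ²(n) / Σ φ²(n) and that the components u_i(n) are orthonormal, which would give R_ij = Σ (n - L)² u_i(n) u_j(n). The function names and the shifted power iteration are illustrative choices, not the patent's procedure.

```python
import math


def spread_matrix(U, L):
    """Assumed form: R[i][j] = sum over n of (n - L)^2 * u_i(n) * u_j(n)."""
    m, N = len(U), len(U[0])
    return [[sum((n - L) ** 2 * U[i][n] * U[j][n] for n in range(N))
             for j in range(m)] for i in range(m)]


def smallest_eigenpair(R, iterations=500):
    """Power iteration on (c*I - R), which favors the smallest eigenvalue of R.

    Assumes R is symmetric positive semidefinite and that the uniform start
    vector is not orthogonal to the wanted eigenvector.
    """
    m = len(R)
    c = sum(R[i][i] for i in range(m))  # the trace bounds the largest eigenvalue
    b = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [c * b[i] - sum(R[i][j] * b[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        b = [x / norm for x in w]
    lam = sum(b[i] * sum(R[i][j] * b[j] for j in range(m)) for i in range(m))
    return lam, b


# Two orthonormal components: one concentrated at frame 0, one at frame 3.
U = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
lam, b = smallest_eigenpair(spread_matrix(U, L=0))
```

With the event location at frame 0, the weights select the component concentrated at that frame, which is the most compact combination around L.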

In FIG. 6, the speech event feature signal φ_L(n) is generated in block 625 and stored in store 255. The loop including blocks 605, 610, 615, 620, 625, and 630 is repeated until the end of the speech pattern is detected in decision block 635, so that the complete sequence of speech events for the speech pattern is formed.

FIG. 12 shows waveforms illustrating a speech pattern and the speech event feature signals generated from it according to the invention. Waveform 1201 corresponds to a portion of a speech pattern, and waveforms 1205-1 to 1205-n correspond to the sequence of speech event feature signals φ_L(n) obtained from that waveform in the circuit of FIG. 2. Each feature signal represents the acoustic characteristics of a speech event in the pattern of waveform 1201. The speech event feature signals can be combined using the coefficients a_ik of equation (1) to reproduce the log area parameter signals representing the acoustic features of the speech pattern.

On completion of the operations shown in FIG. 6, the sequence of speech event feature signals for the speech pattern is stored in store 255. Each speech event feature signal φ_I(n) is then coded and transferred to utilization device 285 as shown in the flowchart of FIG. 7. The central processor is adapted to receive the speech event signal coding program instruction set stored in ROM 235.

Referring to FIG. 7, speech event index I is reset to 1 in block 701, and speech event feature signal φ_I(n) is read from store 255. The sampling rate R_I for the current speech event feature signal is selected in block 710 by one of the many methods well known in the art. For example, the instruction codes may perform a Fourier analysis and generate a signal corresponding to the upper band limit of the feature signal, from which sampling rate R_I is determined. As is well known in the art, the sampling rate need only be high enough to represent the feature signal adequately. A slowly changing feature signal can therefore use a lower sampling rate than a rapidly changing one, and the sampling rate may differ from one feature signal to another.
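One way to realize the Fourier-analysis option mentioned for block 710 is sketched below. The patent does not specify how the upper band limit is found, so the 99 percent energy threshold, the direct DFT, and the function name are all assumptions made for illustration.

```python
import cmath
import math


def select_sampling_rate(phi, frame_rate_hz, energy_fraction=0.99):
    """Return twice the frequency below which `energy_fraction` of the energy lies."""
    N = len(phi)
    half = N // 2
    power = []
    for f in range(half + 1):
        X = sum(phi[n] * cmath.exp(-2j * math.pi * f * n / N) for n in range(N))
        # Bins other than DC and Nyquist also stand for their mirror image.
        weight = 1.0 if f == 0 or (N % 2 == 0 and f == half) else 2.0
        power.append(weight * abs(X) ** 2)
    total = sum(power)
    running = 0.0
    for f, p in enumerate(power):
        running += p
        if running >= energy_fraction * total:
            return 2.0 * f * frame_rate_hz / N  # twice the estimated band limit
    return float(frame_rate_hz)  # fallback: keep the original frame rate


# A feature signal that oscillates 3 times over 32 frames taken every 5 ms (200 Hz).
phi = [math.cos(2.0 * math.pi * 3 * n / 32) for n in range(32)]
rate = select_sampling_rate(phi, frame_rate_hz=200.0)
```

For this test signal the band limit is 3 × 200 / 32 = 18.75 Hz, so the selected rate is 37.5 Hz, well below the 200 Hz frame rate. An all-zero signal yields a rate of 0.0 under this sketch and would need separate handling.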

Once the sampling frequency signal has been determined for the speech event feature signal φ_I(n), the feature signal is encoded at frequency R_I in block 715. Any of the well-known coding schemes may be used. For example, each sample may be converted to a PCM, ADPCM, or delta-modulated code and concatenated with a signal representing the position of the feature signal in the speech pattern and a signal representing the sampling frequency R_I. The coded speech event feature signal is then transferred to utilization device 285 through input/output interface 265. The speech event index I is then incremented (block 720), and decision block 725 is entered to determine whether the last speech event signal has been coded. Blocks 705 through 725 are repeated until the last speech event signal has been coded (I > I_F), at which point the coding of the speech event feature signals is complete.
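The encoding loop can be sketched as follows in Python. This is an illustrative sketch only: uniform 8-bit PCM is just one of the schemes the text allows, and the record layout (a dictionary with `position`, `rate_hz`, and `codes` fields) and the function names are hypothetical.

```python
def pcm_quantize(samples, bits=8, peak=1.0):
    """Uniform PCM: map each sample in [-peak, peak] to an integer code."""
    levels = 2 ** bits
    step = 2.0 * peak / (levels - 1)
    codes = []
    for s in samples:
        clipped = max(-peak, min(peak, s))  # clip out-of-range samples
        codes.append(round((clipped + peak) / step))
    return codes


def encode_events(events):
    """Encode one record per speech event.

    `events` is a list of (position, rate_hz, samples) tuples, where
    `position` locates the feature signal in the speech pattern and
    `rate_hz` plays the role of R_I (assumed layout).
    """
    encoded = []
    index = 1                 # speech event index I, reset to 1
    last = len(events)        # I_F
    while index <= last:      # loop ends once I > I_F
        position, rate_hz, samples = events[index - 1]
        encoded.append({
            "position": position,
            "rate_hz": rate_hz,
            "codes": pcm_quantize(samples),
        })
        index += 1            # increment I
    return encoded


records = encode_events([
    (12, 20.0, [-1.0, 0.25, 1.0]),
    (40, 100.0, [2.0, -3.0]),   # out-of-range samples are clipped
])
```

Each record carries the position and rate alongside the sample codes, so a decoder can place and resample every feature signal independently.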

To form a replica of the log area parameter signals from the speech event feature signals, the feature signals must be combined in accordance with equation (1). Accordingly, the combining coefficients for the speech pattern are generated and coded as shown in the flowchart of FIG. 8. After the speech event feature signals have been coded, central processor 275 is conditioned to read the contents of ROM 225. The instruction codes permanently stored in this ROM control the formation and coding of the combining coefficients.

The combining coefficients are generated for the entire speech pattern by matrix processing in central processor 275 and arithmetic processor 280. Referring to FIG. 8, the log area parameters of the speech pattern are transferred to processor 275 in block 801. The speech event coefficient matrix G is generated in accordance with the following equation (block 805).

The correlation matrix C of Y and Φ is formed in block 810. The combining coefficient matrix is then generated in block 815 in accordance with the following equation.

A = G^{-1}C   (28)

The elements of matrix A are the combining coefficients a_ik of equation (1). These combining coefficients are coded in block 820 in a manner well known in the art, and the coded coefficients are transferred to utilization device 285.
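The matrix step A = G^{-1}C can be illustrated with a minimal Python sketch. The definitions of G and C used here are assumptions, since their defining equations are not reproduced in this text: Φ is taken as an m × N array of feature signals, Y as a p × N array of log area parameters, G = ΦΦ^T, and C = ΦY^T, which makes A the least-squares solution with A[k][i] = a_ik. All function names are hypothetical.

```python
def matmul_t(x, y):
    """Return x @ y^T for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(rx, ry)) for ry in y] for rx in x]


def solve(g, c):
    """Solve G A = C by Gauss-Jordan elimination with partial pivoting."""
    m = len(g)
    aug = [list(g[r]) + list(c[r]) for r in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("G is singular: feature signals are dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(m):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]


def combining_coefficients(phi, y):
    """Compute A = G^{-1} C under the assumed definitions of G and C."""
    g = matmul_t(phi, phi)   # G: m x m (assumed to be Phi Phi^T)
    c = matmul_t(phi, y)     # C: m x p (assumed to be Phi Y^T)
    return solve(g, c)


# Two feature signals and two parameters built as exact linear combinations.
phi = [[1.0, 2.0, 0.0, 1.0],
       [0.0, 1.0, 1.0, 3.0]]
y = [[2.0 * a + 3.0 * b for a, b in zip(*phi)],
     [-1.0 * a + 0.5 * b for a, b in zip(*phi)]]
coeffs = combining_coefficients(phi, y)
```

Because the example parameters are exact combinations of the feature signals, the recovered coefficients equal the weights used to build them, up to rounding.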

In accordance with the invention, linear prediction parameters sampled at a frequency corresponding to their most rapid change are converted into a sequence of speech event feature signals coded at the much lower frequency at which speech events occur. The speech pattern is thereby further compressed, reducing transmission and storage requirements without adversely affecting intelligibility. Utilization device 285 may be a communication facility connected to one of the many speech synthesis circuits using an LPC all-pole filter that are well known in the art.

The circuit of FIG. 2 is adapted to compress a spoken message into a sequence of coded speech event feature signals that are transmitted through utilization device 285 to a synthesizer. In the synthesizer, the speech event feature signals and the combining coefficients of the message are decoded and combined to form the log area parameter signals of the message. These log area parameter signals are then used to produce a replica of the original message.

FIG. 9 is a block diagram of a speech synthesis circuit illustrative of the invention, and FIG. 10 is a flowchart showing its operation. Store 915 of FIG. 9 is adapted to store the successive coded speech event feature signals and combining signals received from utilization device 285 of FIG. 2 via line 901 and interface circuit 904. Store 920 receives, via line 903, the sequence of excitation signals required for synthesis. The excitation signals may comprise a succession of pitch period and voiced/unvoiced signals generated responsive to the voice message in a manner well known in the art. Microprocessor 910 is adapted to control the operation of the synthesizer and may be the aforementioned Motorola type MC68000 integrated circuit. LPC feature signal store 925 is used to store the successive log area parameter signals of the spoken message, which are formed from the speech event feature signals and combining signals of store 915. A replica of the spoken message is formed by LPC synthesizer 930 under the control of microprocessor 910, responsive to the LPC feature signals from store 925 and the excitation signals from store 920.

The operation of the synthesizer is directed by microprocessor 910 under the control of permanently stored instruction codes resident in its associated read-only memory. The operation is described by the flowchart of FIG. 10. Referring to FIG. 10, the coded speech event feature signals, the corresponding combining signals, and the excitation signals of the spoken message are received by interface 904 and transferred to speech event feature signal and combining coefficient signal store 915 and to excitation signal store 920 (block 1010). The log area parameter index I is then reset to 1 in processor 910 (block 1020), which starts the reconstruction of the first log area feature signal y_1(n).

To form a log area signal, the speech event feature signals must be combined with the combining coefficients for index I in accordance with equation (1). The speech event feature signal position counter L is set to 1 in processor 910 by block 1025, and the current speech event feature signal samples are read from store 915 (block 1030). The sequence of signal samples is filtered in block 1035 to smooth the speech event feature signal, and the current log area parameter signal is partially formed in block 1040. The speech event position counter L is incremented to address the next speech event feature signal in store 915 (block 1045), and the occurrence of the last feature signal is tested in decision block 1050. The loop comprising blocks 1030 through 1050 is repeated until the last speech event feature signal has been processed, whereby the current log area parameter signal is generated and stored in LPC feature signal store 925 under the control of processor 910.
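The reconstruction loop can be sketched in Python as follows. This is a sketch under stated assumptions, not the synthesizer's instruction code: equation (1) is taken to be the linear combination y_i(n) = Σ_k a_ik φ_k(n), linear interpolation stands in for the smoothing filter of block 1035, and the storage layout (`start_frame`, integer upsampling `factor`, `samples`) and function names are hypothetical.

```python
def interpolate(samples, factor):
    """Linearly interpolate `samples` up by an integer `factor`
    (a stand-in for the smoothing filter)."""
    if len(samples) < 2:
        return list(samples)
    out = []
    for left, right in zip(samples, samples[1:]):
        for j in range(factor):
            out.append(left + (right - left) * j / factor)
    out.append(samples[-1])
    return out


def reconstruct(coeffs, events, n_frames):
    """Rebuild log area parameter signals from feature signals.

    coeffs[k][i] is the assumed a_ik; events[k] is a tuple
    (start_frame, factor, samples) for the k-th feature signal.
    """
    n_params = len(coeffs[0])
    y = [[0.0] * n_frames for _ in range(n_params)]
    for i in range(n_params):                    # parameter index I
        for k, (start, factor, samples) in enumerate(events):  # counter L
            phi = interpolate(samples, factor)   # smoothed feature signal
            for offset, value in enumerate(phi):
                n = start + offset
                if 0 <= n < n_frames:
                    y[i][n] += coeffs[k][i] * value  # partial formation
    return y


events = [(0, 2, [0.0, 1.0, 0.0]),   # low-rate event, upsampled by 2
          (2, 1, [1.0, 1.0])]        # event already at the frame rate
log_area = reconstruct([[2.0], [3.0]], events, 5)
```

Each pass of the inner loop adds one feature signal's weighted contribution, which corresponds to the partial formation of the current log area parameter signal described above.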

Once the log area feature signal has been stored in store 925, block 1055 is entered from block 1050, and the log area index signal is incremented to begin forming the next log area parameter signal (block 1055). The loop from block 1030 through block 1050 is then re-entered via decision block 1060. After the last log area parameter signal has been stored, processor 910 causes a replica of the spoken message to be formed in LPC synthesizer 930.

The synthesizer circuit of FIG. 9 may readily be modified, by techniques well known in the art, to store sequences of speech event feature signals corresponding to a plurality of spoken messages and to generate replicas of these messages selectively. In such an arrangement, the speech event feature signal generating circuit of FIG. 2 receives a predetermined sequence of spoken messages, and utilization device 285 may comprise an arrangement for permanently storing the speech event feature signals and corresponding combining coefficients of the messages, together with a read-only memory containing the speech event and combining signals of those spoken messages. The read-only memory containing the coded speech event and combining signals is incorporated as store 915 in the synthesis circuit of FIG. 9.

Continuation of the front page (56) References: JP-A-51-93105 (JP, A); JP-B-53-26761 (JP, B2); Proceedings of the Autumn Meeting of the Acoustical Society of Japan, 1981, I, 1-1-1, "An attempt at real-time data compression using the repetitiveness of speech forms"

Claims

(57) [Claims]

1. A method of compressing a speech pattern, including the step of analyzing the speech pattern to generate, at a first frequency, a set of acoustic feature signals representing the speech pattern (e.g., 210, 215, 275, 280), characterized by the steps of: generating an event time sequence of signals representing the times of occurrence of speech events by detecting the times at which negative-going zero crossings occur in a signal representing the timing of the speech events in the speech pattern (e.g., 220); linearly combining the acoustic feature signals on the basis of the event time sequence of signals to generate a sequence of speech event signals, each representing a speech event of the speech pattern (e.g., 225, 240); and sampling and coding the speech event signals to generate a sequence of coded speech event signals at a frequency that is lower than the first frequency and corresponds to the frequency of occurrence of the speech events (e.g., 235, 250).

2. The method of compressing a speech pattern according to claim 1, characterized in that the linear combination is controlled such that each speech event signal has a time spread centered on the time of occurrence of the corresponding speech event.

3. The method of compressing a speech pattern according to claim 2, characterized in that the step of generating the coded speech event signals comprises: generating a signal representing the band of each speech event signal; and sampling the speech event signal at a frequency corresponding to the signal representing the band.

4. The method of compressing a speech pattern according to claim 1, 2, or 3, characterized in that the acoustic feature signals are linear prediction parameter signals representing the speech pattern.

5. The method of compressing a speech pattern according to claim 4, characterized in that the linear prediction parameter signals are log area parameter signals representing the speech pattern.

6. The method of compressing a speech pattern according to claim 4, characterized in that the linear prediction parameter signals are PARCOR signals representing the speech pattern.