JPS62111300A

JPS62111300A - Voice analysis/synthesization circuit

Info

Publication number: JPS62111300A
Application number: JP60251408A
Authority: JP
Inventors: 明寿山田; 永井　清隆; 正宏浜田
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1985-11-08
Filing date: 1985-11-08
Publication date: 1987-05-22

Abstract

(57) [Summary] This publication contains application data filed before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

Industrial Application Field

The present invention relates to a speech analysis and synthesis circuit used for low-bit-rate digital conversion of speech signals, for example in the speech feature extraction section of a speech recognizer or in a digital signal converter for telephone speech (hereinafter referred to as a codec).

Prior Art

In recent years, the digitization of telephone equipment is expected to advance along with expanded equipment functionality and growing use of communication lines. However, a finite number of lines can serve only a limited number of subscribers, so the lines must be used efficiently. As a countermeasure, compression of the speech signal, that is, speech signal conversion at a low bit rate, has become necessary. Conventional low-bit-rate speech codec methods include: methods that exploit the spread of level fluctuations in the speech signal and apply a nonlinear curve to compress and expand the signal; methods that exploit the correlation of the speech signal and encode only the changing component of the waveform to raise the compression ratio and lower the bit rate; and methods that lower the bit rate by using speech spectral parameters and approximating the residual with a plurality of pulses. In speech recognition, features are extracted from the input speech and the input is compared with a dictionary by pattern matching. The large volume of input speech data must therefore be compressed and stored in the dictionary as parameters, and the input speech must be reduced to the same parameters. For this reason, as in the codecs above, vocal tract parameters obtained with linear prediction coefficients or the like have been used as the extracted features.

An example of the conventional speech analysis and synthesis circuit described above is explained below with reference to the drawings.

FIG. 5 shows the configuration of a conventional speech analysis and synthesis circuit. In FIG. 5, a speech synthesizer 62 and a pulse generation circuit 61 generate pseudo speech, and a subtractor 66 outputs the error component between the pseudo speech and the input speech. A squared-error minimization control circuit 63 controls the pulse positions and the signal level of each pulse of the pulse generation circuit 51 so that the square of this error component is minimized. The parameters used by the speech synthesizer 52 are obtained from the input speech by linear predictive analysis 64. The pulse positions, pulse signal levels, and linear prediction parameters obtained in this way are passed to a quantizer/multiplexer 66, and the features of the input speech are extracted in this manner. Such an example is shown in the specification of U.S. Patent No. 3263371 by B. S. Atal, dated December 1, 1981.

Problems to Be Solved by the Invention

In general, when speech or a non-speech sound production mechanism (for example, a wind instrument such as a trombone) is modeled, it can be represented by a pole-zero model. Sound is generated by driving a filter that has the coefficients of this pole-zero model with a noise source.
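As a rough illustration of the pole-zero source-filter idea described above, the following Python sketch drives a small pole-zero filter with white noise. The function name `pole_zero_filter` and the coefficient values are assumptions chosen for illustration; they are not taken from the patent.

```python
import random


def pole_zero_filter(x, b, a):
    """Direct-form difference equation with a[0] == 1:
    y[n] = sum_k b[k] * x[n-k] - sum_{k>=1} a[k] * y[n-k]
    b holds the numerator (zeros), a the denominator (poles)."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y


# Hypothetical coefficients: one zero at z = 0.5, one pole at z = 0.9.
b = [1.0, -0.5]
a = [1.0, -0.9]

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(8)]   # white-noise source
sound = pole_zero_filter(noise, b, a)             # filtered "sound"
```

Setting `b = [1.0]` reduces this to the all-pole case that ordinary linear prediction assumes.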

Even within speech, many factors are involved in producing consonants. For example, airflow passing through a constriction in part of the vocal tract can become turbulent and form a noise source. Conventional feature extraction methods treat the vocal tract as an all-pole model approximated by linear prediction coefficients. However, the residual component left after inverse filtering with the all-pole model still carries much information: nasal consonants and the section from the vocal cords to the lungs do not fit a pole model and are treated acoustically as zeros, and consonants produced by a constriction at the mouth do not fit the model at all. The residual component therefore contained much of the information in the speech signal.

Since introducing zeros into the model makes complete identification of the sound production system possible, using a model of vocal sound production is clearly effective for feature extraction in speech recognition, permits efficient compression, and improves the speech quality of speech codecs.

However, in such a conventional circuit, nothing other than the vocal tract parameter extraction corresponds to the speech production mechanism. Much speech information remains in the residual component after inverse filtering with the all-pole model, and this residual was simply replaced by a pulse generation circuit. As a result, particularly for unvoiced sounds and similar segments, efficient compression was not possible, speech quality deteriorated, and complete speech features could not be extracted.

In view of the above problems, the present invention aims to extract features by modeling speech or another acoustic signal source as a pole-zero model. The sound source is a simple impulse or noise source, and the coefficients of a synthesis filter based on the pole-zero model are optimized so as to minimize the error between the input acoustic signal and the output of the synthesis filter. This makes complete modeling of speech or other acoustic signals possible. Furthermore, by quantizing the coefficients of the modeled synthesis filter, the invention provides a speech analysis circuit that is effective for data compression in speech codecs and in the speech analysis used by speech recognition devices.

Means for Solving the Problems

To solve the above problems, the speech analysis and synthesis circuit of the present invention comprises a speech analysis device and a speech synthesis device.

The speech analysis device analyzes the input speech with a lattice filter to obtain linear prediction coefficients, takes outputs in parallel from each lattice point, weights each output with a weighting circuit, and sums the weighted outputs with an adder. A subtractor takes the difference between this sum and either the output signal of the lattice filter or white noise, and a squared-error minimization control circuit controls the coefficients of the weighting circuits so as to minimize the square of that error. The signal obtained from the lattice filter output or from the error output is passed through a correlator, which outputs an amplitude and the correlation time at which the correlation is highest. Parameters such as the lattice filter coefficients, the weighting circuit coefficients, and the amplitude and correlation time from the correlator are quantized and transmitted over a line or the like by a multiplexer.

The speech synthesis device has the inverse configuration of the lattice filter, with weighted white noise inserted at each lattice point from a white noise source through weighting circuits. The parameters obtained from the speech analysis device are set as the lattice filter coefficients and the weighting circuit coefficients, respectively. A pulse signal whose period corresponds to the correlation time and whose amplitude is controlled by the amplitude obtained above is inserted from the reverse side of the lattice filter, and the speech signal is restored at the output on the opposite side.
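The analysis structure described above (a lattice filter whose lattice points are tapped in parallel, weighted, and summed) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the names `lattice_analyze` and `weighted_sum` are hypothetical, the reflection coefficients are fixed rather than estimated, and the "lattice point outputs" are taken to be the backward prediction errors of each stage, which is one plausible reading of the text rather than something the patent states.

```python
def lattice_analyze(x, ks):
    """Run a fixed analysis lattice with reflection coefficients ks.

    Returns the forward residual and, for every sample, the vector of
    per-stage backward errors b_1..b_M (assumed to be the lattice taps)."""
    order = len(ks)
    b_prev = [0.0] * order            # b_m[n-1] for m = 0..M-1
    residual, taps = [], []
    for s in x:
        f = s                         # f_0[n] = x[n]
        b = [s]                       # b_0[n] = x[n]
        for m in range(order):
            f_next = f + ks[m] * b_prev[m]      # f_{m+1}[n]
            b.append(b_prev[m] + ks[m] * f)     # b_{m+1}[n]
            f = f_next
        residual.append(f)
        taps.append(b[1:])
        b_prev = b[:order]
    return residual, taps


def weighted_sum(taps, w):
    """Weight each lattice tap by w and add them, sample by sample."""
    return [sum(wi * ti for wi, ti in zip(w, t)) for t in taps]


# A first-order example: x[n] = 0.9**n is whitened exactly by k = -0.9.
residual, taps = lattice_analyze([1.0, 0.9, 0.81], [-0.9])
summed = weighted_sum(taps, [2.0])    # hypothetical weight value
```

In the circuit described in the text, the weights would not be fixed as here but adapted so that the weighted sum tracks the reference signal.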

Operation

With the above configuration, the present invention separates the input speech into vocal tract parameters obtained by the lattice filter and coefficients obtained from each lattice point through the weighting circuits, with a simplified pulse or noise as the sound source. Whatever part of the input signal cannot be identified by the vocal tract parameters remains as a residual component, and this component is handled by treating the weighting coefficients as parameters and controlling them with a squared-error minimization algorithm. Consequently, the circuit can also handle input sounds other than speech. In addition, when noise or the like is superimposed on the input speech, the identification accuracy of the speech production system degrades in conventional speech codecs and recognition feature extraction circuits built on linear prediction filters; the present invention remains effective in that case because it treats the signal with a pole-zero model.

Embodiments

A speech analysis and synthesis circuit according to one embodiment of the present invention is described below with reference to the drawings.

FIGS. 1 and 4 show the configuration of a speech analysis and synthesis circuit in one embodiment of the present invention. In FIG. 1, 1, 7, and 13 are correlators; 2, 3, 8, 9, 14, and 15 are multipliers; 4, 5, 10, 11, 16, 17, and 25 are adders; 6, 12, and 18 are unit delay elements; 19 is a correlator; 20, 21, 22, and 23 are multipliers that form the weighting circuit; 24 is an adder; and 26 is a squared-error minimization control circuit.

The operation of the speech analysis and synthesis circuit configured as described above is explained below with reference to FIG. 1.

First, in FIG. 1, the input speech is analyzed into vocal-tract-dependent parameters K by the lattice filter made up of correlators 1, 7, and 13, multipliers 2, 3, 8, 9, 14, and 15, unit delay elements 6, 12, and 18, and adders 4, 5, 10, 11, 16, and 17. One method for this analysis is, for example, the Burg method ("A New Analysis Technique for Time Series Data," Proc. NATO Advanced Study Institute on Signal Processing, Enschede, Netherlands, 1968). The output signal of adder 16 is connected to correlator 19, which yields the amplitude, period, and so on of the residual signal. In addition, the output of each lattice point is sent through multipliers 20, 21, 22, and 23 to adder 24. The output of adder 24 is applied to adder 25 to obtain its difference from white noise. That error is passed to the squared-error minimization control circuit 26, which controls the coefficients (weighting coefficients) of the multipliers (weighting circuits) 20, 21, 22, and 23.
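For readers unfamiliar with the Burg method cited above, the following Python sketch estimates lattice reflection coefficients stage by stage from the forward and backward prediction errors. It is a textbook-style sketch, not the circuit of FIG. 1; the function name `burg_reflection` and the test signal are assumptions made for illustration.

```python
def burg_reflection(x, order):
    """Estimate reflection coefficients k_1..k_order with Burg's recursion.

    Each stage picks the k that minimizes the summed forward and backward
    prediction error energy, which keeps |k| <= 1 (a stable lattice)."""
    f = list(x)   # forward prediction error
    b = list(x)   # backward prediction error
    ks = []
    for _ in range(order):
        ff, bb = f[1:], b[:-1]        # pair f[n] with b[n-1]
        num = sum(p * q for p, q in zip(ff, bb))
        den = sum(p * p + q * q for p, q in zip(ff, bb))
        k = -2.0 * num / den if den > 0.0 else 0.0
        f = [p + k * q for p, q in zip(ff, bb)]   # next-stage forward error
        b = [q + k * p for p, q in zip(ff, bb)]   # next-stage backward error
        ks.append(k)
    return ks, f


# Hypothetical test signal: a decaying exponential (one real pole at 0.9).
x = [0.9 ** n for n in range(50)]
ks, residual = burg_reflection(x, order=2)
```

The returned `residual` plays the role of the residual signal that the text routes to the correlator.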

The resulting vocal tract filter coefficients K, the weighting circuit coefficients, and the amplitude value and time period obtained from correlator 19 are sent to a quantizer and a multiplexer; together these constitute the speech analysis system. The parameters obtained by this analysis system are restored to the original speech by the synthesis system described next.

The synthesis system is explained below with reference to FIG. 4. In FIG. 4, the pulse oscillation amplitude control section 101 applies to adder 105 a pulse signal having the period and amplitude obtained from correlator 19 of the analysis system. Similarly, the K parameter input section 102 supplies the coefficients K of the synthesis lattice filter to multipliers 117, 118, 112, 113, 107, and 108, and the W parameter input section 103 supplies the weighting coefficients W to multipliers 123, 122, 121, and 120. The lattice filter of the synthesis system is the inverse of the lattice filter of the analysis system. Because a weighted white noise signal is added at each lattice point of the synthesis lattice filter, the constriction of the vocal tract during consonant production can be simulated.
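A synthesis lattice of the kind described above can be sketched in Python as follows. This is an illustrative sketch only: the name `lattice_synthesize`, the coefficient values, and in particular the choice to add the weighted noise to the forward path just before each stage are assumptions, since the exact injection points are defined by FIG. 4, which is not reproduced here.

```python
import random


def lattice_synthesize(excitation, ks, ws=None, noise=None):
    """All-pole lattice synthesis, the inverse of an analysis lattice with
    reflection coefficients ks. If ws and noise are given, ws[m] * noise[n]
    is added to the forward path ahead of stage m+1 (assumed placement)."""
    order = len(ks)
    b_prev = [0.0] * order            # b_m[n-1] for m = 0..M-1
    out = []
    for n, e in enumerate(excitation):
        f = e                         # f_M[n] = excitation
        b_new = [0.0] * (order + 1)
        for m in range(order, 0, -1):
            if ws is not None and noise is not None:
                f += ws[m - 1] * noise[n]
            f = f - ks[m - 1] * b_prev[m - 1]           # f_{m-1}[n]
            b_new[m] = b_prev[m - 1] + ks[m - 1] * f    # b_m[n]
        b_new[0] = f                  # b_0[n] = output sample
        b_prev = b_new[:order]
        out.append(f)
    return out


# Pulse excitation with a hypothetical period and amplitude, plus noise.
period, amplitude = 40, 1.0
pulses = [amplitude if n % period == 0 else 0.0 for n in range(200)]
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]
speech = lattice_synthesize(pulses, ks=[-0.5, 0.3], ws=[0.05, 0.02], noise=noise)
```

With `ws` omitted the function is an ordinary all-pole lattice synthesizer, which makes the contribution of the noise weights easy to compare.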

Thus, according to this embodiment, the vocal tract parameter analysis section and the transversal filter formed by the W parameters can be controlled independently. Given the stability of the lattice filter, controlling the coefficients of this transversal filter with an algorithm such as the learning identification method also ensures the stability of the overall system. More importantly, because features are extracted on the basis of the speech production mechanism, even when unwanted ambient noise enters together with the speech, the vocal tract parameters do not degrade as they do when relying only on the conventional vocal tract model (all-pole model). This improves speech quality in speech codecs and the accuracy of speech feature extraction in speech recognition.

A second embodiment of the present invention is described below with reference to the drawings.

FIG. 2 is a block diagram of a speech analysis and synthesis circuit showing a second embodiment of the present invention. Components identical to those of the first embodiment are given the same reference numerals.

In the figure, elements 1 to 17 are the same as the lattice-filter linear predictive analyzer of FIG. 1 (used for vocal tract parameter analysis). In elements 20 to 26, the signals from each lattice point, weighted by weighting coefficients W, are summed by adder 24, the sum is compared with white noise by adder 25, and the squared-error minimization control circuit 26 controls the weighting coefficients W so as to minimize the squared error. The distinguishing point of this embodiment is that the pulse amplitude A and the pulse period T are output from the output of adder 25 through correlator 19.

The operation of the speech analysis circuit configured as described above is explained below.

In FIG. 2, suppose the system from weighting circuit 20 to the squared-error minimization control circuit 26 is identified with an algorithm such as the steepest descent method or the learning identification method. When white noise is applied to the addition input of adder 25, its subtraction input, that is, the output of adder 24, comes to approximate white noise. Consequently, for voiced speech, a pulse component corresponding to the pitch, which the system could not identify, is detected at the output of adder 25. For unvoiced speech, the output error is minimized, and a series of parameters W corresponding to the constriction in the vocal tract during utterance is obtained. The parameters obtained from this speech analysis system are quantized and sent to the transmission system through a multiplexer, and the receiver restores the original speech signal with a configuration such as that of FIG. 4.

A third embodiment of the present invention is described below with reference to the drawings.

FIG. 3 is a block diagram of a speech analysis and synthesis circuit showing a third embodiment of the present invention. In FIG. 3 as well, components identical to those of the first embodiment are given the same reference numerals.

In the figure, elements 1 to 17 are the same as the lattice-filter linear predictive analyzer of FIG. 1 (used for vocal tract parameter analysis). In elements 20 to 26, the signals from each lattice point, weighted by weighting coefficients W, are summed by adder 24, the sum is compared with the output of adder 16 by adder 25, and the squared-error minimization control circuit 26 controls the weighting coefficients W so as to minimize the squared error. The distinguishing point of this embodiment is that the pulse amplitude A, the pulse period TP, and a voiced/unvoiced decision for the input speech are output from the output of adder 25 through correlator 19.

The operation of the speech analysis and synthesis circuit configured as described above is explained below.

In FIG. 3, suppose the system from weighting circuit 20 to the squared-error minimization control circuit 26 is identified with an algorithm such as the steepest descent method or the learning identification method. When the residual signal (white noise) that the lattice filter outputs for the input speech is applied to the addition input of adder 25, its subtraction input, that is, the output of adder 24, comes to approximate a pulse signal or white noise according to whether the input speech is voiced or unvoiced. Consequently, for voiced speech, a pulse component corresponding to the pitch, which the lattice filter system could not identify, is detected at the output of adder 25. For unvoiced speech, the output error is minimized, and a series of parameters W corresponding to the constriction in the vocal tract during utterance is obtained. Passing this error signal through correlator 19 yields the pulse amplitude and period, and comparing the output error value with a given threshold yields a voiced/unvoiced decision signal for the input speech. The parameters obtained from this speech analysis system are quantized and sent to the transmission system through a multiplexer, and the receiver restores the original speech signal with a configuration such as that of FIG. 4.
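The role of the correlator and the threshold decision described above can be sketched in Python as follows. Several details are assumptions: the name `correlate_pitch`, the use of the normalized autocorrelation peak (rather than the raw error value) as the quantity compared with the threshold, the threshold value 0.5, and the use of the peak absolute sample as the pulse amplitude.

```python
import random


def correlate_pitch(signal, min_lag, max_lag, threshold=0.5):
    """Return (amplitude, best_lag, voiced) for an error/residual signal.

    best_lag is the lag in [min_lag, max_lag] with the highest normalized
    autocorrelation; voiced is True when that peak reaches the threshold."""
    energy = sum(s * s for s in signal)
    best_lag, best_r = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))
        r = r / energy if energy > 0.0 else 0.0
        if r > best_r:
            best_lag, best_r = lag, r
    amplitude = max(abs(s) for s in signal)
    return amplitude, best_lag, best_r >= threshold


# Voiced-like input: a pulse train with a hypothetical period of 40 samples.
pulses = [2.0 if n % 40 == 0 else 0.0 for n in range(400)]
voiced_result = correlate_pitch(pulses, 20, 100)

# Unvoiced-like input: white noise has no strong periodic peak.
rng = random.Random(0)
hiss = [rng.gauss(0.0, 1.0) for _ in range(400)]
unvoiced_result = correlate_pitch(hiss, 20, 100)
```

The voiced flag is the kind of signal that could drive the pulse/noise switch in the synthesizer, while the lag and amplitude set the pulse period and level.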

In the first, second, and third embodiments, the quantizer may also use an optimal bit allocation according to the frequency of occurrence of each parameter. Furthermore, in FIG. 4, the pulse oscillator and the white noise generator may be configured so that, using the voiced/unvoiced decision on the input speech described for FIG. 3, the outputs of the two generators are switched by a switch or the like.

Effects of the Invention

As described above, the present invention can fit a complete model to speech and other acoustic input signals and can extract speech features efficiently. It further provides a circuit that is effective for improving speech quality in low-bit-rate speech codecs and for extracting features of input speech in speech recognition.

[Brief explanation of drawings]

FIGS. 1 and 4 are block diagrams showing the configuration of the speech analysis and synthesis circuit in the first embodiment of the present invention; FIG. 2 is a block diagram showing the configuration of the speech analysis circuit in the second embodiment; FIG. 3 is a block diagram of the speech analysis circuit in the third embodiment; and FIG. 5 is a block diagram of a conventional speech analysis and synthesis circuit.

1, 7, 13: correlators; 2, 3, 8, 9, 14, 15: multipliers; 4, 5, 10, 11, 16, 17, 25: adders; 6, 12, 18: unit delay elements; 19: correlator; 20 to 23: multipliers; 24, 25: adders; 26: squared-error minimization control circuit.

Claims

[Claims]

(1) A speech analysis and synthesis circuit comprising: a speech analysis device that analyzes input speech with a lattice filter to obtain linear prediction coefficients, takes outputs in parallel from each lattice point, weights each output with a weighting circuit, sums the weighted outputs with an adder, takes with a subtractor the difference between the sum and either the output signal of the lattice filter or white noise, has a squared-error minimization control circuit that controls the coefficients of the weighting circuits so as to minimize the square of the output error, passes a signal obtained from the lattice filter output or the error output through a correlator to output an amplitude and the correlation time showing the highest correlation, and quantizes parameters such as the lattice filter coefficients, the weighting circuit coefficients, and the amplitude and correlation time obtained from the correlator and transmits them to a line or the like by a multiplexer; and a speech synthesis device that has the inverse configuration of the lattice filter with weighted white noise inserted at each lattice point from a white noise source through weighting circuits, in which the parameters obtained from the speech analysis device are set as the lattice filter coefficients and the weighting circuit coefficients, respectively, and a pulse signal having a period corresponding to the correlation time and an amplitude controlled by the amplitude obtained above is inserted from the reverse side of the lattice filter and output from the opposite side to restore the speech signal.

(2) The speech analysis and synthesis circuit according to claim 1, wherein the input signal is subjected to linear predictive analysis by a lattice filter, the output signal from each lattice point is weighted through a weighting circuit, the difference between the sum of the weighted outputs and a white noise signal is obtained, a squared-error minimization control circuit controls the coefficients of each weighting circuit so as to minimize the square of that difference, and the error signal is passed through a correlator to obtain its amplitude and correlation time.

(3) The speech analysis and synthesis circuit according to claim 1, wherein the input signal is subjected to linear predictive analysis by a lattice filter, the output signal from each lattice point is weighted through a weighting circuit, the difference between the sum of the weighted outputs and the output signal of the lattice filter is obtained as an error signal, a squared-error minimization control circuit controls the coefficients of each weighting circuit so as to minimize the square of that error signal, and the error signal is passed through a correlator to obtain its amplitude and correlation time.