JPS61275799A

JPS61275799A - Voice recognition equipment

Info

Publication number: JPS61275799A
Application number: JP60117320A
Authority: JP
Inventors: 平岩　篤信; 雅男渡; 曜一郎佐古; 誠赤羽
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1985-05-30
Filing date: 1985-05-30
Publication date: 1986-12-05

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] The invention is described in the following order.

A. Field of industrial application
B. Summary of the invention
C. Prior art
D. Problems to be solved by the invention
E. Means for solving the problems
F. Operation
G. Embodiment
G1. Acoustic analysis circuit (Fig. 1)
G2. Time normalization processing (Figs. 1 and 2)
G3. Pattern matching processing (Fig. 1)
G4. Preliminary selection (Figs. 3 to 5)
G5. Another example of preliminary selection (Figs. 6 to 8)
H. Effects of the invention

A. Field of Industrial Application

This invention relates to an apparatus that performs speech recognition by pattern matching between standard patterns of the words to be recognized, created and stored in advance, and the input pattern of the word to be recognized.

B. Summary of the Invention

In a pattern-matching speech recognition apparatus, this invention stores at registration time not only the regular standard pattern of each word to be recognized but also a standard pattern of the binarized acoustic parameter sequence of that word. At recognition time, a binarized pattern is derived from the acoustic parameter sequence of the input word, and the distance between this binarized pattern and each binarized standard pattern is calculated. This serves as a preliminary selection that narrows down the standard patterns whose distance to the input pattern is then calculated. The amount of distance computation during pattern matching is thereby reduced, and the recognition response time is shortened.

C. Prior Art

Speech is a phenomenon that varies along the time axis; individual words are produced by uttering sound so that its spectrum pattern changes from moment to moment. Speech recognition is the technology of automatically recognizing the words a person utters, but achieving recognition comparable to human hearing is at present extremely difficult. For this reason, most speech recognition in practical use today works, under fixed conditions of use, by pattern matching between the standard patterns of the words to be recognized and the input pattern.

Fig. 9 outlines such a speech recognition apparatus. Speech input from the microphone (1) is supplied to the acoustic analysis circuit (2), which extracts acoustic parameters representing the features of the input speech pattern. Various acoustic analysis methods can be used to extract these parameters. In one known method, a bandpass filter and a rectifier circuit form one channel, a number of such channels with different passbands are arranged in parallel, and the temporal change of the spectrum pattern is extracted as the output of this bandpass filter bank. In this case the acoustic parameters can be expressed as the time series Pi(n) (i = 1, 2, ..., I, where I is, for example, the number of bandpass filter channels; n = 1, 2, ..., N, where N is the number of frames used for recognition within the interval found by speech interval determination).

The acoustic parameter time series Pi(n) from the acoustic analysis circuit (2) is supplied to a mode switching circuit (3) consisting of, for example, a switch. When the switch of this circuit (3) is set to terminal A, the apparatus is in registration mode, and the acoustic parameter time series Pi(n) is stored in the standard pattern memory (4) as recognition parameters. That is, prior to recognition, the speaker's speech patterns are stored in this memory (4) as standard patterns. At registration, the number of frames generally differs from one registered standard pattern to another because of variations in speaking rate and differences in word length.

When the switch (3) is set to terminal B, the apparatus is in recognition mode. In this mode, the acoustic parameter time series of the current input speech from the acoustic analysis circuit (2) is supplied to the input speech pattern memory (5) and temporarily stored. The distance calculation circuit (6) then calculates the magnitude of the difference between this input pattern and each of the standard patterns of the words to be recognized, read out from the standard pattern memory (4). The minimum value determination circuit (7) detects the word whose standard pattern differs least from the input pattern, and the input word is thereby recognized.
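The recognition-mode flow of Fig. 9 (distance calculation followed by minimum detection) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the sum-of-absolute-differences measure, and the requirement that both patterns have the same number of frames are introduced here and are not taken from the patent.

```python
from typing import Dict, Sequence, Tuple

Pattern = Sequence[Sequence[float]]  # frames x channels


def pattern_distance(a: Pattern, b: Pattern) -> float:
    """Sum of absolute differences (an assumed measure for this sketch)."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same number of frames")
    return sum(abs(x - y) for fa, fb in zip(a, b) for x, y in zip(fa, fb))


def recognize(input_pattern: Pattern,
              templates: Dict[str, Pattern]) -> Tuple[str, float]:
    """Compare the input with every stored template and keep the closest."""
    scores = {word: pattern_distance(input_pattern, t)
              for word, t in templates.items()}
    best = min(scores, key=scores.get)  # role of the minimum value circuit (7)
    return best, scores[best]
```

Note that every registered template is visited once per input, which is the cost the later sections set out to reduce.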

Input speech is thus recognized by pattern matching between the registered standard patterns and the input pattern. It must be taken into account, however, that even when the same word is uttered in the same way, its spectrum pattern shifts or stretches and contracts along the time axis. For example, when recognizing the word "hai" with the standard pattern registered as "hai", an input stretched along the time axis to "haai" yields a very different distance, is treated as an entirely different word, and is not recognized correctly. Pattern matching for speech recognition therefore requires time normalization to correct such shifts and stretching along the time axis, and this time normalization is an important process for improving recognition accuracy.

One method of time normalization is the technique called DP (Dynamic Programming) matching (see, for example, Japanese Patent Laid-Open No. Sho 50-96104).

DP matching does not prepare a large number of standard patterns that allow for time-axis shifts. Instead, it uses a warping function to generate many time-normalized versions of a standard pattern, finds the distance between each of these and the input pattern, and recognizes the speech by detecting the minimum.

When DP matching is used, however, the number of frames in each registered standard pattern is not fixed, and DP matching must be carried out between the input pattern and every registered standard pattern. The drawback is that the amount of computation grows dramatically as the vocabulary increases.
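To make the cost of DP matching concrete, here is a minimal sketch of textbook dynamic time warping. It is not necessarily the formulation of the cited publication; the function names, the local distance, and the three-way step pattern are assumptions chosen for illustration.

```python
from typing import Sequence

Pattern = Sequence[Sequence[float]]  # frames x channels


def frame_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Local distance between two frames (assumed: sum of absolute differences)."""
    return sum(abs(p - q) for p, q in zip(x, y))


def dp_matching_distance(a: Pattern, b: Pattern) -> float:
    """Accumulated distance along the best time alignment of a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(a[i - 1], b[j - 1])
            # Allow a frame of either pattern to be repeated, or both to advance.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

The table has one cell per pair of frames, so the work per template is proportional to the product of the two frame counts, and it is repeated for every registered word.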

In addition, because DP matching emphasizes the stationary portions of speech (portions where the spectrum pattern does not change with time), misrecognition could occur between partially similar patterns.

The present applicant has previously proposed a time normalization method that avoids these drawbacks (for example, Japanese Patent Application No. Sho 59-106177).

That is, the acoustic parameter time series Pi(n) traces a sequence of points in its parameter space. For example, if the word to be recognized is "HAI" and there are two bandpass filters for acoustic analysis, so that Pi(n) = (P1 P2), the acoustic parameter time series of the input speech traces a point sequence in the two-dimensional parameter space as shown in Fig. 10. As the figure shows, the points are sparsely distributed in the non-stationary portions of the speech and densely distributed in the quasi-stationary portions. If the speech were perfectly stationary, the parameters would not change and the point sequence would stay at a single point in the parameter space. Even when a person produces the same sound, however, fluctuations in the voice keep it from being perfectly stationary, and the effect of these fluctuations appears as the quasi-stationary portions in the figure.

From this it can be concluded that most of the time-axis shift caused by variations in speaking rate stems from differences in point density in the quasi-stationary portions, and that the duration of the non-stationary portions has little effect. If, therefore, a trajectory is estimated from the point sequence of the input parameter time series Pi(n) as a continuous curve passing approximately through the whole sequence, as shown in Fig. 11, this trajectory is almost invariant to variations in speaking rate.

On this basis the applicant proposed the following time-axis normalization method. First, a trajectory is estimated as a continuous curve P̃i(s) running from the start point Pi(1) to the end point Pi(N) of the input parameter time series Pi(n). The trajectory is estimated, for example, by linear approximation of the acoustic parameter time series, as shown in Fig. 12. The length S of the trajectory is obtained from this estimated curve P̃i(s). The trajectory is then resampled at a fixed interval T along its length, as indicated by the circles in Fig. 12.

For example, when resampling to M points, the trajectory is resampled using the length

T = S / (M - 1)   ... (1)

as the unit.

If the parameter time series describing this resampled point sequence is denoted Qi(m) (i = 1, 2, ..., I; m = 1, 2, ..., M), then Qi(m) carries the essential information of the trajectory and is almost invariant to variations in speaking rate. In other words, it is a recognition parameter time series whose time axis has been normalized.

Accordingly, if this parameter time series Qi(m) is registered as the standard pattern, the input pattern is likewise obtained as a parameter time series Qi(m), the distance between the two patterns is computed from these series, and the pattern with the minimum distance is detected, then speech recognition is always performed with time-axis shifts normalized away.

Moreover, with this processing the number of frames in the recognition parameter time series Qi(m) is always M, regardless of variations in speaking rate or differences in word length at registration. Because Qi(m) is also time-normalized, good results can be expected even when the distance between the input pattern and a registered standard pattern is computed with the simplest measure, the Chebyshev distance.

This method is also a time normalization technique that places more weight on the non-stationary portions of speech, so it produces fewer misrecognitions between partially similar patterns than DP matching does.

Furthermore, the normalized parameter time series Qi(m) contains no speaking-rate variation information. This makes it easier to handle, for example, global features of the parameter transition structure laid out in the parameter space, and allows various methods that are also effective for speaker-independent recognition to be applied.

In the following, the time normalization processing described above is called NAT (Normalization Along Trajectory) processing.

D. Problems to Be Solved by the Invention

The difference in computation between a speech recognition apparatus using NAT processing and one using DP matching can be explained as follows.

Let α be the average amount of computation in the DP matching distance calculation unit per standard pattern for one input pattern, let γ be the average amount of computation in the NAT processing unit, and let β be the average amount of computation in the Chebyshev distance calculation unit (with NAT processing, a Chebyshev distance calculator can serve as the distance calculation unit). The amount of computation C1 for DP matching against J standard patterns is then

C1 = α·J

and the amount of computation C2 for NAT processing against J standard patterns is

C2 = β·J + γ

In general, α is much larger than β (α >> β). Consequently, as the number of words to be recognized increases, C1 becomes much larger than C2 (C1 >> C2), and a speech recognition apparatus using NAT processing can greatly reduce the amount of computation.
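The two cost expressions can be compared numerically. The values of α, β and γ below are purely hypothetical placeholders chosen so that α is much larger than β; the patent gives no figures.

```python
def dp_cost(alpha: float, num_templates: int) -> float:
    """C1 = alpha * J: DP matching against J standard patterns."""
    return alpha * num_templates


def nat_cost(beta: float, gamma: float, num_templates: int) -> float:
    """C2 = beta * J + gamma: one NAT pass plus J simple distance calculations."""
    return beta * num_templates + gamma


# Hypothetical per-unit costs (illustrative only).
ALPHA, BETA, GAMMA = 100.0, 1.0, 50.0

for j in (1, 10, 100):
    print(j, dp_cost(ALPHA, j), nat_cost(BETA, GAMMA, j))
```

With these placeholder values the gap widens as J grows, because γ is paid once per input while α is paid once per template.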

In addition, the recognition parameter time series Qi(m) obtained from the NAT processing unit can be set to a fixed number of parameters in the time-series direction, so the storage area of the standard pattern memory (4) can be used efficiently and its capacity can be kept comparatively small.

Thus, compared with an apparatus using DP matching, a speech recognition apparatus using NAT processing requires less computation as the number of words to be recognized increases, owing to the difference in average computation per standard pattern.

Even a speech recognition apparatus using NAT processing, however, must calculate the distance between the input pattern and every standard pattern, just as with DP matching. The absolute amount of computation is still large, and the recognition response is therefore comparatively slow.

E. Means for Solving the Problems

The invention comprises: acoustic analysis means (2) for obtaining an acoustic parameter sequence of the input speech signal; trajectory length calculation means (91) for estimating the trajectory that the acoustic parameter sequence from the acoustic analysis means (2) traces in its parameter space and obtaining the length of that trajectory; a standard pattern memory (4) storing the recognition parameter sequences of the standard patterns of the words to be recognized; first distance calculation means (6) for calculating the difference between the recognition parameter sequence of the input pattern, formed from the acoustic parameter sequence, and the recognition parameter sequence of a standard pattern from the standard pattern memory; minimum value determination means (7) for obtaining the recognition output by detecting the word of the standard pattern for which the value calculated by the distance calculation means (6) is smallest; means (41) for forming a binarized pattern from the acoustic parameter sequence; a binarized standard pattern memory (43) storing the binarized standard patterns of the words to be recognized; second distance calculation means (44) for calculating the distance between the binarized pattern and each binarized standard pattern from the binarized standard pattern memory (43); and preliminary selection means (45) for selecting, on the basis of the output of the second distance calculation means (44), from among all the standard patterns stored in the standard pattern memory (4), those whose distance to the input pattern is to be calculated by the first distance calculation means (6).

F. Operation

By calculating the distance between the binarized pattern of the input and the binarized standard patterns, those regular standard patterns of all the registered words that lie close to the input pattern are preselected, and the distance to the input pattern is calculated only for these preselected standard patterns.

Because the preliminary selection using binarized patterns narrows down in advance the number of standard patterns that the distance calculation means (6) must compare with the input pattern, the absolute amount of computation is reduced accordingly.
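The two-stage idea can be sketched as below. The binarization rule is not given in this part of the text, so the sketch assumes a sign threshold (1 for values at or above zero) and a Hamming distance between binarized patterns; these choices, the function names, and the `keep` parameter are all hypothetical.

```python
from typing import Dict, List, Sequence

Pattern = Sequence[Sequence[float]]  # frames x channels
BinaryPattern = List[List[int]]


def binarize(pattern: Pattern) -> BinaryPattern:
    """Assumed rule: 1 where the parameter is >= 0, else 0."""
    return [[1 if v >= 0 else 0 for v in frame] for frame in pattern]


def hamming(a: BinaryPattern, b: BinaryPattern) -> int:
    """Number of differing bits (assumed second-stage distance)."""
    return sum(x != y for fa, fb in zip(a, b) for x, y in zip(fa, fb))


def full_distance(a: Pattern, b: Pattern) -> float:
    """Assumed first-stage distance: sum of absolute differences."""
    return sum(abs(x - y) for fa, fb in zip(a, b) for x, y in zip(fa, fb))


def preselect(input_pattern: Pattern,
              binary_templates: Dict[str, BinaryPattern],
              keep: int) -> List[str]:
    """Keep the `keep` words whose binarized templates are closest."""
    bits = binarize(input_pattern)
    ranked = sorted(binary_templates,
                    key=lambda w: hamming(bits, binary_templates[w]))
    return ranked[:keep]


def recognize_with_preselection(input_pattern: Pattern,
                                templates: Dict[str, Pattern],
                                binary_templates: Dict[str, BinaryPattern],
                                keep: int) -> str:
    """Run the full distance only on the preselected candidates."""
    candidates = preselect(input_pattern, binary_templates, keep)
    return min(candidates, key=lambda w: full_distance(input_pattern, templates[w]))
```

The cheap bit comparison runs over the whole vocabulary, while the more expensive distance runs over only `keep` candidates.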

G. Embodiment

Fig. 1 shows an embodiment of the speech recognition apparatus according to the invention; in this example a bank of 16 bandpass filter channels is used for acoustic analysis.

G1. Acoustic Analysis Circuit (2)

In the acoustic analysis circuit (2), the speech signal from the microphone (1) is supplied through an amplifier (211) and a band-limiting low-pass filter (212) to an A/D converter (213), where it is converted into a 12-bit digital speech signal at a sampling frequency of, for example, 12.5 kHz. This digital speech signal is supplied to the digital bandpass filters (221₁), (221₂), ..., (221₁₆) of the respective channels of a 16-channel bandpass filter bank (22). These digital bandpass filters are, for example, fourth-order Butterworth digital filters, and their passbands are the bands obtained by dividing the range from 250 Hz to 5.5 kHz at equal intervals on a logarithmic axis. The output signals of the digital bandpass filters (221₁), (221₂), ..., (221₁₆) are supplied to rectifier circuits (222₁), (222₂), ..., (222₁₆), respectively, and the outputs of these rectifier circuits are supplied to digital low-pass filters (223₁), (223₂), ..., (223₁₆), respectively. The digital low-pass filters are, for example, FIR low-pass filters with a cutoff frequency of 52.8 Hz.

The output signals of the digital low-pass filters (223₁), (223₂), ..., (223₁₆), which form the output of the acoustic analysis circuit (2), are supplied to a sampler (231) that is part of the feature extraction circuit (23). The sampler (231) samples the output signals of the digital low-pass filters every frame period of 5.12 msec. This yields the sample time series Ai(n) (i = 1, 2, ..., 16; n is the frame number, n = 1, 2, ..., N).
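As a rough illustration of "equal intervals on a logarithmic axis", the band edges of such a filter bank can be computed as below. The sketch assumes that the 16 passbands are contiguous and share edges, which the text does not state; the function name is hypothetical.

```python
from typing import List


def log_spaced_band_edges(f_low: float, f_high: float, channels: int) -> List[float]:
    """Return channels + 1 edge frequencies with a constant ratio between neighbours."""
    ratio = (f_high / f_low) ** (1.0 / channels)
    return [f_low * ratio ** k for k in range(channels + 1)]


edges = log_spaced_band_edges(250.0, 5500.0, 16)
for k in range(16):
    print(f"channel {k + 1:2d}: {edges[k]:7.1f} Hz to {edges[k + 1]:7.1f} Hz")
```

Each band is wider in hertz than the one below it, but every band spans the same frequency ratio.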

The output of the sampler (231), i.e. the sample time series Ai(n), is supplied to a sound source information normalization circuit (232), which removes speaker-dependent differences in glottal source characteristics from the speech to be recognized.

Specifically, the sample time series Ai(n) supplied from the sampler (231) every frame period undergoes the logarithmic transformation

A'i(n) = log(Ai(n) + B)   ... (2)

In equation (2), B is a bias, set to a value large enough to mask the noise level.
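The logarithmic transformation with a bias can be sketched in a few lines. The base of the logarithm is not specified in the text; the natural logarithm is assumed here, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def log_compress(frame: Sequence[float], bias: float) -> List[float]:
    """Apply log(A + B) to each channel value of one frame (natural log assumed)."""
    return [math.log(a + bias) for a in frame]
```

Because the bias is added before the logarithm, channel values far below the bias all map to nearly the same output, so low-level noise has little influence on the result.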

If the glottal source characteristic is approximated by yi = a·i + b, the coefficients a and b are determined by equations (3) and (4), with I = 16 (the bodies of equations (3) and (4) are not reproduced in this text).

Letting Pi(n) denote the source-normalized parameter, when a(n) < 0 the parameter Pi(n) is expressed as

Pi(n) = A'i(n) - {a(n)·i + b(n)}   ... (5)

When a(n) ≥ 0, only level normalization is performed, and the parameter Pi(n) is expressed by equation (6) (not reproduced in this text).
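Since equations (3), (4) and (6) are not legible here, the following is only a sketch under explicit assumptions: the line yi = a·i + b is fitted by ordinary least squares over the channel index i = 1, ..., I, and "level normalization" is taken to mean subtracting the mean over channels. Neither assumption is confirmed by the text, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_line(values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of values[i - 1] ~ a * i + b for i = 1..I (needs I >= 2)."""
    n = len(values)
    idx = range(1, n + 1)
    s_i = sum(idx)
    s_ii = sum(i * i for i in idx)
    s_v = sum(values)
    s_iv = sum(i * v for i, v in zip(idx, values))
    denom = n * s_ii - s_i * s_i
    a = (n * s_iv - s_i * s_v) / denom
    b = (s_ii * s_v - s_i * s_iv) / denom
    return a, b


def normalize_source(log_frame: Sequence[float]) -> List[float]:
    """Remove the fitted spectral tilt when a < 0, else only the mean level."""
    a, b = fit_line(log_frame)
    if a < 0:
        # Corresponds to equation (5): subtract the fitted line a*i + b.
        return [v - (a * i + b) for i, v in enumerate(log_frame, start=1)]
    # Assumed form of level-only normalization: subtract the channel mean.
    mean = sum(log_frame) / len(log_frame)
    return [v - mean for v in log_frame]
```

Either branch produces values centred around zero, which is consistent with the statement that Pi(n) takes both positive and negative values.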

The acoustic parameter time series Pi(n), from which differences in glottal source characteristics have thus been normalized out, is obtained from the sound source information normalization circuit (232).

This acoustic parameter time series Pi(n) takes both positive and negative values.

The acoustic parameters Pi(n) from the sound source information normalization circuit (232) are supplied to the intra-speech-interval parameter memory (8). On receiving the speech interval determination signal from the speech interval determination circuit (24), this memory (8) stores the parameters Pi(n) for each determined speech interval.

The speech interval determination circuit (24) consists of a zero-cross counter (241), a power calculation circuit (242), and a speech interval decision circuit (243). The digital speech signal from the A/D converter (213) is supplied to the zero-cross counter (241) and the power calculation circuit (242). Every frame period of 5.12 msec, the zero-cross counter (241) counts the number of zero crossings in the 64 samples of the digital speech signal within that frame period, and the count is supplied to the first input of the speech interval decision circuit (243). Every frame period, the power calculation circuit (242) obtains the power, i.e. the sum of squares, of the digital speech signal within that frame period, and its output power signal is supplied to the second input of the speech interval decision circuit (243). The sound source normalization information from the sound source information normalization circuit (232) is supplied to the third input of the speech interval decision circuit (243). The speech interval decision circuit (243) processes the zero-cross count, the in-frame power, and the sound source normalization information in combination, classifies the signal as silence, unvoiced sound, or voiced sound, and determines the speech interval.
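The two per-frame measurements fed to the decision circuit can be sketched as follows. The sketch covers only the zero-cross count and the sum of squares over 64-sample frames; the decision logic that combines them is not described in the text and is not attempted. Function names and the sign-change definition of a zero crossing are assumptions.

```python
from typing import List, Sequence, Tuple

FRAME_LEN = 64  # samples per frame: 64 / 12.5 kHz = 5.12 msec


def zero_crossings(samples: Sequence[float]) -> int:
    """Count sign changes between consecutive samples (assumed definition)."""
    return sum(1 for prev, cur in zip(samples, samples[1:])
               if (prev < 0) != (cur < 0))


def frame_power(samples: Sequence[float]) -> float:
    """Power of one frame, taken as the sum of squares."""
    return sum(s * s for s in samples)


def frame_features(signal: Sequence[float],
                   frame_len: int = FRAME_LEN) -> List[Tuple[int, float]]:
    """Return (zero-cross count, power) for each complete frame."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        feats.append((zero_crossings(frame), frame_power(frame)))
    return feats
```

A high zero-cross count with low power is typical of unvoiced sounds, and low counts with high power of voiced sounds, which is presumably why the circuit uses both.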

The speech interval determination signal indicating the interval decided by the speech interval decision circuit (243) is supplied, as the output of the speech interval determination circuit (24), to the intra-speech-interval parameter memory (200).

The acoustic parameter time series Pi(n) stored in the memory (200) within the determined speech interval is then supplied to the NAT processing circuit (9).

G2. Time Normalization Processing

The NAT processing circuit (9) consists of a trajectory length calculation circuit (91), an interpolation interval calculation circuit (92), and an interpolation point extraction circuit (93).

The parameter time series Pi(n) (i = 1, 2, ..., 16; n = 1, 2, ..., N) from the parameter memory (200) is supplied to the trajectory length calculation circuit (91).

The trajectory length calculation circuit (91) calculates the length of the trajectory, obtained by linear approximation, that the acoustic parameter time series Pi(n) traces in its parameter space as shown in Fig. 11 above.

Here, the Euclidean distance D(ai, bi) between two I-dimensional vectors ai and bi is

D(a_i, b_i) = \sqrt{\sum_{i=1}^{I} (a_i - b_i)^2}   ... (7)

Accordingly, when the trajectory is estimated by linear approximation from the I-dimensional acoustic parameter time series Pi(n), the distance S(n) between parameters adjacent in the time-series direction is

S(n) = D(P_i(n+1), P_i(n))   (n = 1, ..., N-1)   ... (8)

The distance SL(n) from the first parameter Pi(1) to the n-th parameter Pi(n) in the time-series direction is

SL(n) = \sum_{k=1}^{n-1} S(k)

with SL(1) = 0.

The total trajectory length SL is

SL = \sum_{n=1}^{N-1} S(n)

The trajectory length calculation circuit (91) performs the signal processing given by equations (11), (12) and (13).
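The trajectory length computation is a running sum of Euclidean distances between consecutive frames. A minimal sketch, with hypothetical function names:

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cumulative_lengths(frames: Sequence[Sequence[float]]) -> List[float]:
    """Distance along the piecewise-linear trajectory from frame 1 to each frame.

    The first entry is 0 and the last entry is the total trajectory length.
    """
    lengths = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        lengths.append(lengths[-1] + euclidean(prev, cur))
    return lengths
```

Frames that repeat the same parameter values add nothing to the length, which is why dwelling longer on a quasi-stationary sound changes the trajectory length very little.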

The signal indicating the trajectory length SL obtained by the trajectory length calculation circuit (91) is supplied to the interpolation interval calculation circuit (92), which calculates the resampling interval T used when resampling along the trajectory. For resampling to M points, the resampling interval T is obtained as

T = SL / (M - 1)   ... (11)

A signal indicating the resampling interval T from the interpolation interval calculation circuit (92) is supplied to the interpolation point extraction circuit (93). The acoustic parameter time series Pi(n) from the parameter memory (8) is also supplied to the interpolation point extraction circuit (93). The interpolation point extraction circuit (93) resamples at the interval T along the trajectory of the acoustic parameter time series Pi(n) in its parameter space, for example the trajectory obtained by linear approximation between the parameters, as indicated by the circle marks in FIG. 12, and forms the recognition parameter time series Qi(m) from the new sequence of points obtained by this sampling.

In the interpolation point extraction circuit (93), processing follows the flowchart shown in FIG. 2, and the recognition parameter time series Qi(m) is formed.

First, in step (101), the variable J indicating the number of the resampling point in the time-series direction is set to 1, and the variable IC indicating the frame number of the acoustic parameter time series Pi(n) is set to 1, for initialization. Next, in step (102), the variable J is incremented, and in step (103) it is determined whether the variable J is (M-1) or less, that is, whether the number of the current resampling point has reached the last number that needs to be sampled. If it is the last number, the process proceeds to step (104) and the resampling ends.

If it is not the last number, in step (105) the resampling distance DL from the first resampling point (which is always a silent portion) to the J-th resampling point is calculated. Next, in step (106), the variable IC is incremented. Then, in step (107), it is judged whether the resampling distance DL is smaller than the distance SL(IC) from the first parameter Pi(1) to the IC-th parameter Pi(IC) of the acoustic parameter time series Pi(n), that is, whether the resampling point lies on the start-point side of the parameter Pi(IC) on the trajectory. If it does not, the process returns to step (106), the variable IC is incremented, and the comparison of step (107) is made again. When the resampling point is judged to lie on the start-point side of the parameter Pi(IC), the process proceeds to step (108) and the recognition parameter Qi(J) is formed.

That is, the distance SL(IC-1) to the (IC-1)-th parameter Pi(IC-1), which lies on the start-point side of the J-th resampling point, is subtracted from the resampling distance DL of the J-th resampling point to obtain the distance SS from the (IC-1)-th parameter Pi(IC-1) to the J-th resampling point. Next, this distance SS is divided by the distance S(IC-1) between the parameters Pi(IC-1) and Pi(IC) located on either side of the J-th resampling point on the trajectory (this distance is obtained by the signal processing of equation (8)). The quotient SS/S(IC-1) is multiplied by the difference (Pi(IC) - Pi(IC-1)) between those two parameters to give the interpolation amount measured from the (IC-1)-th parameter Pi(IC-1), which adjoins the J-th resampling point on its start-point side. Adding this interpolation amount to Pi(IC-1) forms the new recognition parameter Qi(J) along the trajectory, that is, Qi(J) = Pi(IC-1) + (Pi(IC) - Pi(IC-1)) × SS / S(IC-1).

In this way, excluding the start point and the end point (when these are silent, Qi(1) = Pi(0) = 0 and Qi(M) = Pi(N) = 0), the recognition parameter time series Qi(m) is formed by resampling (M-2) points.
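The resampling procedure above can be sketched as follows. This is an illustration under stated assumptions rather than a transcription of the FIG. 2 flowchart: the function name `resample_along_trajectory` is hypothetical, the resampling distance is taken as DL = T × (point index), the first and last frames are simply copied as the end points, and the segment index is only advanced when the new point has actually passed the end of the current segment, so that several new points may fall inside one long segment.

```python
import math

def resample_along_trajectory(frames, M):
    """Resample a frame sequence to M points equally spaced along its
    piecewise-linear trajectory (first and last frames are kept as-is)."""
    N = len(frames)
    # Distances between adjacent frames and cumulative lengths.
    S = [math.dist(frames[k + 1], frames[k]) for k in range(N - 1)]
    cum = [0.0]
    for s in S:
        cum.append(cum[-1] + s)
    T = cum[-1] / (M - 1)  # resampling interval T = SL / (M - 1)

    out = [tuple(frames[0])]
    seg = 0  # current segment runs from frames[seg] to frames[seg + 1]
    for j in range(1, M - 1):
        DL = T * j  # distance of the j-th new point from the start
        # Advance until the new point lies before the end of the segment.
        while seg < N - 2 and DL >= cum[seg + 1]:
            seg += 1
        SS = DL - cum[seg]  # distance past the segment's first frame
        ratio = SS / S[seg] if S[seg] > 0 else 0.0
        a, b = frames[seg], frames[seg + 1]
        # Linear interpolation: a + (b - a) * SS / S
        out.append(tuple(x + (y - x) * ratio for x, y in zip(a, b)))
    out.append(tuple(frames[-1]))
    return out

frames = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0)]
resampled = resample_along_trajectory(frames, 5)
print(resampled)
# [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (4.0, 4.0)]
```

Because the new points are spaced by arc length rather than by time, a word spoken slowly or quickly traces the same trajectory and yields roughly the same M points, which is the purpose of the time normalization.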

G3 Description of Pattern Matching Process
The recognition parameter time series Qi(m) from the NAT processing circuit (9) is supplied to the mode switching circuit (3) and also to the preliminary selection circuit (40). A signal indicating the trajectory length calculated by the trajectory length calculation circuit (91) is also supplied to the preliminary selection circuit (40).

At registration time, the recognition parameter time series is stored in the standard pattern memory (4).

The recognition parameter time series Qi(m) supplied to the preliminary selection circuit (40) is converted into a binarized pattern, as described later, and this is registered in a memory. The trajectory length supplied to the preliminary selection circuit (40) is used, as described later, to create a trajectory length dictionary for preliminary selection.

Next, at recognition time, the recognition parameter time series Qi(m) from the NAT processing circuit (9) is supplied via the mode switching circuit (3) to the distance calculation circuit (6), where its distance to the parameter time series of the standard patterns from the regular standard pattern memory (4) is calculated. The standard patterns compared with the input parameters in the distance calculation circuit (6) are only those that, by the output of the preliminary selection means (40) and as described later, have been selected from all the standard patterns in the memory (4) as being close to the input pattern.

The distance calculated by the distance calculation circuit (6) is, for example, a simplified Chebyshev distance. From the distances between each standard pattern and the input pattern output by the distance calculation circuit (6), the standard pattern giving the minimum value is determined, and the recognition result for the input speech signal is obtained at the output terminal (70) from this determination.

G4 Description of Preliminary Selection
FIG. 3 is a block diagram of an example of the preliminary selection circuit (40). All of its functions can also be realized by computer processing.

First, the recognition parameter time series Qi(m) from the interpolation point extraction circuit (93) of the NAT processing circuit (9) is supplied to the binarized pattern generation circuit (41). As described earlier, the recognition parameter time series Qi(m) takes positive and negative values because of the normalization performed in the sound source information normalization circuit (232), so the binarized pattern generation circuit (41) generates a binarized pattern by assigning, for example, "1" to positive values and "0" to negative values.
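The sign-based binarization just described can be sketched as below. The function name `binarize` and the choice to pack each frame into one integer word are assumptions made for illustration; the text does not say how a value of exactly zero is treated, and this sketch maps it to 0.

```python
def binarize(pattern):
    """Map each value of a recognition pattern to 1 if positive, else 0,
    and pack the bits of each frame into one integer word."""
    words = []
    for frame in pattern:
        word = 0
        for value in frame:
            word = (word << 1) | (1 if value > 0 else 0)
        words.append(word)
    return words

# Two frames of four parameters each (the embodiment uses 16 per frame).
pattern = [(0.5, -1.2, 3.0, -0.1), (-2.0, 0.7, 0.0, 1.5)]
binary_pattern = binarize(pattern)
print(binary_pattern)  # [10, 5]  i.e. 0b1010 and 0b0101
```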

In the registration mode, this binarized pattern is written into the binarized standard pattern memory (43) via the mode switching circuit (42).

Further, in this example, in the registration mode the trajectory length output SL from the trajectory length calculation circuit (91) is temporarily stored in the trajectory length memory (47) via the mode switching circuit (46).

In this example, once the regular standard patterns, binarized standard patterns, and trajectory lengths of all the words to be recognized have been registered, the trajectory length dictionary creation means (48) uses the trajectory lengths stored in the memory (47) to create a trajectory length dictionary, which specifies, for each trajectory length range of a given width, which words' standard patterns are to be read from the binarized standard pattern memory (43).

In this example, the dictionary is created as follows.

That is, the maximum and minimum values are found among all the trajectory lengths stored in the memory (47), the interval between them is divided into, for example, n equal parts to form n trajectory length ranges, and the words whose trajectory lengths belong to each range are registered to create the dictionary. For example, if the maximum is 600, the minimum is 200, and the range length is 50, the trajectory length ranges ①, ②, ... are determined as shown in FIG. 4 (with these values, 400 / 50 = 8 ranges), and the data of each range, together with data for the words A, B, C, D, ... belonging to each range (for example, the address of each word in the binarized standard pattern memory (43)), are stored in the trajectory length memory (47).
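A trajectory length dictionary of this kind could be built roughly as follows. This is a sketch under assumptions: the function name `build_length_dictionary`, the tuple layout `(low, high, words)`, and the use of word labels in place of memory addresses are all illustrative, and a length equal to the maximum is placed in the last range.

```python
def build_length_dictionary(lengths, n_ranges):
    """lengths: dict mapping word -> registered trajectory length.
    Returns a list of (low, high, [words]) for n_ranges equal-width ranges
    between the smallest and largest registered length."""
    lo, hi = min(lengths.values()), max(lengths.values())
    width = (hi - lo) / n_ranges
    if width == 0:  # all lengths identical: a single degenerate range
        return [(lo, hi, list(lengths))]
    table = [(lo + k * width, lo + (k + 1) * width, []) for k in range(n_ranges)]
    for word, length in lengths.items():
        # The maximum value would index one past the end, so clamp it.
        k = min(int((length - lo) / width), n_ranges - 1)
        table[k][2].append(word)
    return table

# Minimum 200, maximum 600, eight ranges of width 50, as in the text's example.
table = build_length_dictionary({"A": 200, "B": 230, "C": 410, "D": 600}, 8)
print(table[0])  # (200.0, 250.0, ['A', 'B'])
print(table[4])  # (400.0, 450.0, ['C'])
print(table[7])  # (550.0, 600.0, ['D'])
```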

Next, at speech recognition time, preliminary selection using the trajectory length dictionary is performed as follows, preliminary selection by binarized pattern is then performed, and after that the pattern matching process is carried out.

That is, FIG. 5 is a flowchart of the preliminary selection of binarized patterns by trajectory length. The trajectory length SL of the acoustic parameters of the input word from the trajectory length calculation circuit (91) is supplied to the preliminary selection means (49) via the mode switching circuit (46) (step (201)). Next, the n-th trajectory length range is read from the memory (47) (step (202)); at the start, range ① is read. It is then determined whether the input trajectory length SL lies within this range (step (203)). If it does not, it is determined whether this range is the last one (step (204)); if it is not the last, the process returns to step (202) and it is determined whether SL lies within the next range, for example range ②.

When step (203) finds that SL lies within the n-th trajectory length range, the address data of the words registered for that range are read from the memory (47) and supplied via the preliminary selection circuit (49) to the binarized standard pattern memory (43). Only the binarized patterns of the words registered for that range are then read out (step (205)) and supplied to the distance calculation circuit (44). Meanwhile, the binarized pattern of the input pattern from the binarized pattern generation circuit (41) is supplied to the distance calculation circuit (44) via the mode switching circuit (42), and its distance to each binarized standard pattern read out is calculated.

The distance in this case is a binarized Chebyshev distance, which can be computed with an exclusive OR. For example, if the normalized pattern, that is, the recognition parameter Qi(m), has 16-bit precision, a normalized pattern of 16 × 18 words can be compressed into a binarized pattern of 18 words, and the distance can be obtained as the sum of the bit counts of the exclusive-OR outputs. The amount of computation is about 1/10 to 1/16 of that required with the 16-bit normalized pattern.
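The exclusive-OR distance can be illustrated with a short sketch. It assumes that each binarized pattern is a list of integer words and that the distance counts the bits in which the two patterns differ (the 1 bits of each XOR result); the function name `binary_distance` is hypothetical.

```python
def binary_distance(input_words, standard_words):
    """Sum, over corresponding words, of the number of differing bits
    (exclusive OR followed by a count of the 1 bits)."""
    return sum(bin(i ^ s).count("1") for i, s in zip(input_words, standard_words))

# 0b1010 ^ 0b1000 = 0b0010 (1 bit), 0b0101 ^ 0b0111 = 0b0010 (1 bit)
d = binary_distance([0b1010, 0b0101], [0b1000, 0b0111])
print(d)  # 2
```

One XOR and one bit count per 16-bit word replace sixteen subtractions and absolute values on the full-precision pattern, which is consistent with the reduction in computation the text describes.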

That is, the distance T is obtained by the following equation.

$$T = \sum_{j} \mathrm{bitcount}(I_j \oplus S_j)$$

If, in step (203), the input trajectory length SL falls within none of the trajectory length ranges from the first to the last, this is detected in step (204); the input word is judged to be absent from the registered words, or to differ greatly from them, and a reject signal is obtained at the terminal (50). The distance between the binarized standard patterns and the input pattern is then not calculated, and, for example, an indication that recognition is not possible is displayed. Naturally, in this case the output of the preliminary selection means (45), described later, also prevents any distance calculation in the distance calculation circuit (6).
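The range lookup of FIG. 5, including the reject case, might look like the sketch below. The table layout `(low, high, words)`, the inclusive range boundaries, and the use of `None` to stand for the reject signal are assumptions for illustration; `preselect_by_length` is a hypothetical name.

```python
def preselect_by_length(table, SL):
    """table: list of (low, high, words) trajectory length ranges.
    Return the words of the first range containing SL, or None (reject)
    when SL lies outside every range."""
    for low, high, words in table:
        if low <= SL <= high:
            return words
    return None

table = [(200, 250, ["A", "B"]), (250, 300, []), (300, 350, ["C"])]
print(preselect_by_length(table, 320))  # ['C']
print(preselect_by_length(table, 700))  # None
```

Only the words returned here would go on to the binarized-pattern distance calculation; a `None` result corresponds to skipping all distance calculations and reporting that recognition is not possible.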

The binarized standard patterns preselected by trajectory length in this way, and thus narrowed down from all the binarized standard patterns, have their distances to the binarized input pattern calculated in the distance calculation circuit (44) as described above, and the calculated distances are supplied to the binarized pattern preliminary selection circuit (45). The preliminary selection circuit (45) obtains a signal SR that selects only the k regular standard patterns with the smallest calculated distances (k being smaller than the total number of registered patterns) for distance calculation against the input pattern in the distance calculation circuit (6). In this example the signal is supplied to the distance calculation circuit (6), where it selects the standard patterns whose distance to the input pattern from the mode switching circuit (3) is calculated.
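Selecting the k closest candidates from the binarized distances can be sketched as follows. The dictionary of distances and the function name `select_top_k` are illustrative assumptions; only the idea of keeping the k smallest distances comes from the text.

```python
def select_top_k(binary_distances, k):
    """binary_distances: dict mapping word -> distance between its binarized
    standard pattern and the binarized input. Return the k closest words."""
    return sorted(binary_distances, key=binary_distances.get)[:k]

candidates = select_top_k({"A": 12, "B": 3, "C": 7, "D": 20}, 2)
print(candidates)  # ['B', 'C']
```

Only the regular, full-precision standard patterns of the returned words would then be compared with the input pattern.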

Alternatively, the preliminary selection circuit (45) may output the memory addresses of the standard patterns whose distance to the input pattern should be calculated, as determined by the binarized-pattern preliminary selection, and this output may control readout from the standard pattern memory (4) to effect the preliminary selection.

Also, when the preliminary selection circuit (45) obtains the signal selecting the k standard patterns with the smallest distances, it may select only those standard patterns for which the output value of the circuit (44) is within a fixed distance.

In that case, if none of the binarized pattern distances falls within the fixed distance and all of them are large, a reject signal indicating that recognition is not possible can be obtained at the terminal (51), as with the preliminary selection by trajectory length.

In the example above, the preliminary selection by trajectory length is made at the time of the distance calculation on binarized patterns, but it may instead be made at the time of the distance calculation on regular standard patterns in the distance calculation circuit (6), in parallel with the preliminary selection by binarized pattern. In that case, the standard patterns compared with the input pattern in the distance calculation circuit (6) may be the logical product (AND) or the logical sum (OR) of those selected by trajectory length and those selected by binarized pattern.
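As a small illustration of combining the two preliminary selections when they run in parallel, the candidate sets can be intersected or united. The word labels and variable names are hypothetical.

```python
# Candidates chosen independently by each preliminary selection.
by_length = {"A", "B", "C"}   # selected by trajectory length
by_binary = {"B", "C", "D"}   # selected by binarized-pattern distance

both = sorted(by_length & by_binary)    # logical product: stricter, fewer patterns
either = sorted(by_length | by_binary)  # logical sum: looser, fewer missed words
print(both)    # ['B', 'C']
print(either)  # ['A', 'B', 'C', 'D']
```

The logical product minimizes the number of full distance calculations, while the logical sum lowers the risk of discarding the correct word.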

In this example, preliminary selection by trajectory length is performed in addition to preliminary selection by binarized pattern, so the number of standard patterns requiring distance calculation, already reduced by the binarized-pattern selection, is reduced further, and a marked improvement in the response speed of speech recognition can be expected.

If the trajectory length ranges are fixed before registration, and the write address of each standard pattern is decided according to the range into which its trajectory length falls, the trajectory length dictionary can be created at the same time as the standard patterns are registered.

G5 Description of Another Example of Preliminary Selection
FIGS. 6 to 8 are diagrams for explaining another example of preliminary selection in the device of this invention, in which the method of preliminary selection by trajectory length differs from the previous example.

FIG. 6 is a block diagram of its main part. As in the previous example, at recognition time the distance between the recognition parameter time series from the NAT processing circuit (9) and the parameter time series of the standard patterns preselected as described below is calculated in the distance calculation circuit (6), the minimum of those distances is determined in the minimum value determination circuit (7), and the recognition output is obtained at the output terminal (70).

In this example, one word is input several times at registration, and an integrated pattern formed by taking the OR (logical sum) of the inputs is registered as the standard pattern. Of course, all of the inputs may be registered instead.

The trajectory length SL calculated by the trajectory length calculation circuit (91) is supplied via the mode switching circuit (46) to the maximum/minimum detection circuit (52), which detects the maximum value Max and the minimum value Min of the trajectory length over the repeated inputs of one word. Max and Min are written into the memory (53), as shown in FIG. 7, in association with the address of each word registered in the standard pattern memory (4).

At recognition time, preliminary selection is made according to the flowchart of FIG. 8.

That is, first the address of the memory (53) is initialized (step (301)). Then the trajectory length SL of the input word calculated by the trajectory length calculation circuit (91) is supplied to the preliminary selection circuit (49) via the mode switching circuit (46) (step (302)).

Meanwhile, the address of the memory (53) is incremented (step (303)); the first address is designated initially, and the stored maximum value Max and minimum value Min of the trajectory length are read out and supplied to the preliminary selection circuit (49). It is determined whether the calculated trajectory length SL is greater than the minimum value Min (step (305)); if so, the process proceeds to step (306), where it is determined whether SL is smaller than the maximum value Max. If it is, that is, if Min ≤ SL ≤ Max, the process proceeds to step (307), and the binarized standard pattern of the registered word having that maximum and minimum is read from the memory (43) and supplied to the distance calculation circuit (44).

Next, the address is incremented (step (308)), and until the address reaches the last one in the memory (53), standard patterns whose maximum and minimum values satisfy Min ≤ SL ≤ Max are detected and read from the memory (43) (steps (305) to (307)).

When the address is found to have reached the end of the memory (53) (step (309)), the preliminary selection ends.

On the other hand, if step (305) finds that the calculated trajectory length SL is smaller than the minimum value Min, or if SL ≥ Min but step (306) finds that SL is greater than the maximum value Max, the process returns to step (303). The maximum value Max and minimum value Min of the trajectory length of the registered standard pattern at the next address are read out, and it is determined in the same way whether Max ≥ SL ≥ Min. If so, that standard pattern is read from the memory (43) (steps (305) to (309)); if not, the process returns to step (303), and this is repeated.

If Max and Min have been read from every address without Min ≤ SL ≤ Max ever being satisfied, this is detected in step (304), pattern matching is not performed, and a reject signal indicating this is output from the preliminary selection circuit (49) to the terminal (50).
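The Min/Max variant of the preliminary selection can be sketched as below. The function names, the dictionary layout, and the use of `None` for the reject signal are assumptions made for illustration; the comparison Min ≤ SL ≤ Max follows the text.

```python
def register_length_bounds(utterance_lengths):
    """utterance_lengths: dict mapping word -> trajectory lengths measured
    over repeated utterances. Returns dict mapping word -> (Min, Max)."""
    return {w: (min(v), max(v)) for w, v in utterance_lengths.items()}

def preselect_by_bounds(bounds, SL):
    """Return the words whose [Min, Max] interval contains SL,
    or None (reject) if no registered word qualifies."""
    hits = [w for w, (lo, hi) in bounds.items() if lo <= SL <= hi]
    return hits or None

bounds = register_length_bounds(
    {"A": [210, 250, 230], "B": [240, 300], "C": [500, 560]}
)
print(bounds["A"])                       # (210, 250)
print(preselect_by_bounds(bounds, 245))  # ['A', 'B']
print(preselect_by_bounds(bounds, 400))  # None
```

Unlike the fixed-width ranges of the previous example, each word here carries its own interval, so a word whose length varies widely between utterances is not excluded by an unlucky range boundary.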

In this way, the preliminary selection reads from the memory (43) only the binarized standard patterns of words whose trajectory length could match that of the input, so the amount of computation is smaller than when distances to all the binarized standard patterns are calculated.

As in the previous example, the preliminary selection by binarized pattern narrows down the standard patterns whose distance to the input pattern is calculated in the distance calculation circuit (6).

Also as in the previous example, the preliminary selection by trajectory length may be applied to the regular standard patterns from the memory (4) rather than to the binarized standard patterns.

Needless to say, this example provides the same effects as the previous example.

The embodiments above describe calculating the trajectory length in the parameter space from the acoustic parameter time series Pi(n), but the trajectory length in the parameter space may instead be calculated from an acoustic parameter frequency series.

The embodiments above calculate the trajectory length by linear approximation, but the trajectory length may also be calculated by circular-arc approximation, spline approximation, or the like.

Furthermore, the embodiments above supply the acoustic parameter time series Pi(n) from the acoustic analysis section (2) to the NAT processing circuit (9) and use the trajectory length calculated by its trajectory length calculation circuit (91) for preliminary selection. Instead, a separate trajectory length calculation circuit may be provided in addition to the circuit (91), the new recognition parameter time series Qi(m) from the NAT processing circuit (9) may be supplied to it, and preliminary selection may be based on the trajectory length it calculates in the parameter space.

Of course, preliminary selection by trajectory length may be omitted.

The invention can of course also be applied to a speech recognition device that has no NAT processing circuit. For example, in a speech recognition device that performs DP matching, the acoustic parameter series from the acoustic analysis section (2) can be supplied to a binarized pattern generation circuit, and the standard patterns can be selected from the calculated distances between the resulting binarized pattern and the binarized standard patterns, which reduces the amount of computation needed for the DP matching.

In the examples above, the preliminary selection by trajectory length controls readout of the binarized standard patterns from the binarized standard pattern memory, thereby narrowing down the binarized standard patterns supplied to the distance calculation means (44). Alternatively, as with the distance calculation circuit (6), the standard patterns may be gated by the preliminary selection output at the input stage of the distance calculation means (44), or the standard patterns for which the preliminary selection shows no distance calculation is needed may simply be excluded when distances to the input pattern are calculated.

Conversely, readout from the memory (4) may be controlled in order to preselect the standard patterns supplied to the distance calculation circuit (6).

In the examples above, the acoustic parameter time series is binarized by exploiting the fact that the output of the sound source information normalization circuit takes positive and negative values. Alternatively, a slice level may be set according to the tendency of the power spectrum of the acoustic parameter time series, and the series may be binarized by assigning "1" to values above the slice level and "0" to values below it.

Ｈ発明の効果以上のようにこの発明によれば入力語の音響パラメータ
系列の２値化パターンを生成して登録しておき、認識時
、入力語の２値化パターンと登録２値化標準パターンと
の距離算出し、その算出出力により入力パターンと距離
算出する正規の標準パターンを予備選択し、この予備選
択によりすべての登録標準パターンより絞り込まれた標
準パターンを距離算出回路に供給して、入力パターンと
の距離算出をなすようにしたので、距離算出時の演算量
を低減することができる。この場合に２値化パターンの
距離演算が加わるがその演算量は前述したように非常に
少ないので、全体として演算量の大幅な低減をすること
ができる。H Effects of the Invention As described above, according to this invention, a binarized pattern of the acoustic parameter sequence of an input word is generated and registered, and during recognition, the binarized pattern of the input word and the registered binarized standard pattern are The distance between the input pattern and the regular standard pattern to be calculated is preliminarily selected based on the calculation output, and the standard pattern narrowed down from all the registered standard patterns by this preliminary selection is supplied to the distance calculation circuit and input. Since the distance to the pattern is calculated, the amount of calculation when calculating the distance can be reduced. In this case, the distance calculation for the binarized pattern is added, but the amount of calculation is very small as described above, so the overall amount of calculation can be significantly reduced.

Consequently, the recognition response time is shortened in proportion to the reduction in computation.

Furthermore, when the input word is not one of the recognition target words, this can be detected at the preliminary selection stage and a reject signal obtained, which has the added advantage of a quick response.
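One way such a reject decision could arise at the preliminary selection stage is sketched below. The threshold rule, the Hamming distance, and the names `preselect_or_reject` and `max_distance` are assumptions for illustration; this section does not specify how the reject signal is generated.

```python
def hamming(a, b):
    """Number of differing bits between two equal-length binary patterns."""
    return sum(x != y for x, y in zip(a, b))


def preselect_or_reject(input_bin, binary_standards, max_distance):
    """Return (candidates, rejected).

    A word survives preselection if its binarized standard pattern lies
    within max_distance of the input's binarized pattern (assumed rule).
    If nothing survives, the input is rejected without running any full
    distance calculation.
    """
    candidates = [word for word, pattern in binary_standards.items()
                  if hamming(input_bin, pattern) <= max_distance]
    return candidates, not candidates


if __name__ == "__main__":
    binary_standards = {"up": [1, 1, 0, 0], "down": [0, 0, 1, 1]}
    print(preselect_or_reject([1, 1, 0, 1], binary_standards, 1))
    print(preselect_or_reject([0, 1, 0, 1], binary_standards, 1))
```

Because the rejection is decided from the binary comparison alone, an out-of-vocabulary input never reaches the more expensive regular pattern matching.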

[Brief explanation of the drawing]

FIG. 1 is a block diagram of an embodiment of the device of this invention; FIG. 2 is a flowchart explaining part of its operation; FIG. 3 is a block diagram of an example of its main part; FIGS. 4 and 5 are diagrams explaining the example of FIG. 3; FIG. 6 is a block diagram of another example of the main part of the device of this invention; FIGS. 7 and 8 are diagrams explaining that example; FIG. 9 is a block diagram showing the basic configuration of a speech recognition device; and FIGS. 10 to 12 are diagrams explaining NAT processing.

(2) is the acoustic analysis circuit, (4) is the standard pattern memory, (6) is the circuit that calculates the distance between a standard pattern and the input pattern, (7) is the minimum value judgment circuit, (9) is the NAT processing circuit, (41) is the binarized pattern generation circuit, (43) is the binarized standard pattern memory, and (45) is the binarized preliminary selection circuit.

Claims

[Scope of Claims] A speech recognition device comprising:
(a) acoustic analysis means for obtaining an acoustic parameter sequence of an input speech signal;
(b) standard pattern memory means in which an acoustic parameter sequence of a standard pattern of a recognition target word is stored;
(c) first distance calculation means for calculating the difference between the acoustic parameter sequence of the input pattern and the acoustic parameter sequence of the standard pattern read from the standard pattern memory means;
(d) the value calculated by the first distance calculation means;
(e) means for forming a binarized pattern from the acoustic parameter sequence;
(f) binarized standard pattern memory means in which a binarized standard pattern of the recognition target word is stored;
(g) second distance calculation means for calculating the distance between the binarized pattern and the binarized standard pattern from the binarized standard pattern memory means; and
(h) preliminary selection means for selecting, from all of the standard patterns in the standard pattern memory means and based on the output of the second distance calculation means, the standard patterns whose distance to the input pattern is to be calculated by the first distance calculation means.