JPS6232799B2

Info

Publication number: JPS6232799B2
Application number: JP55041681A
Authority: JP
Inventors: Hiroaki Sakoe
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1980-03-31
Filing date: 1980-03-31
Publication date: 1987-07-16
Also published as: JPS56138798A

Description

[Detailed description of the invention]

The present invention relates to improvements in speech recognition devices and, in particular, contributes to higher input speed.

Speech recognition devices are effective as a means of inputting data or control commands from humans to machines. In recent years they have come to be used as a means of entering control commands for automatic sorting machines. Pages 153 to 156 of "Pattern Recognition and Its Applications" (edited by Kazuo Nakata, published by Corona Publishing in September 1978) describe a wide range of further application fields.

Speech recognition devices are broadly classified into discrete-input and continuous-input types. The former requires a pause (silent interval) between input words. The pause is used to determine the start and end of each word, the word interval is detected, and recognition is performed. However, in an utterance such as the digit "6" (roku), a pause occurs immediately before the k. Such a word-internal pause can sometimes reach 200 ms or more. The pause between words therefore has to be about 300 ms or longer. Consequently, with a conventional discrete-input speech recognition device, words had to be entered with pauses of 300 ms or more between them, and the input speed was low.

A continuous-input speech recognition device, on the other hand, can perform recognition without pauses between words and achieves a high input speed. The specification of Japanese Patent Application No. Sho 50-29891 shows the configuration of a continuous word recognition device, and its principle has been put to practical use in the DP-100 voice input device manufactured by NEC. In such a continuous word recognition method, however, the boundaries between words are unknown, so the comparison must be repeated many times with every time point in the speech assumed to be a word boundary. The required amount of computation becomes enormous, and the device becomes large and expensive.

The present invention does not recognize completely continuous word strings. It realizes a discrete-input device that, provided a pause exists between words, however short, recognizes correctly without being affected by pauses within words.

That is, the object of the present invention is to provide a speech recognition device that is far faster than conventional discrete-input devices and substantially smaller and less expensive than continuous-input devices.

The high-speed speech recognition device according to the present invention comprises: an analysis unit for analyzing an input speech waveform and converting it into an input pattern expressed as a vector sequence; a standard pattern memory for storing standard patterns; means for examining the amplitude of the input speech to determine rest points within the input pattern; a pattern matching unit for calculating the distance between a partial pattern, defined as the interval between one rest point and another, and each standard pattern; a minimum value detection unit for comparing the distances calculated for the standard patterns and obtaining the partial distance, which is their minimum, and the partial judgment, which is the name of the word giving that minimum; means for determining a group of partial patterns that are delimited by the rest points in the input pattern, do not overlap, and cover the entire input pattern, such that the sum of the partial distances corresponding to the partial patterns is minimized; and a determination unit for outputting, as the recognition result, the partial judgments corresponding to the partial patterns so determined.

With this configuration, partial distances and partial judgments need to be calculated only at rest points, so speech recognition requires far less computation than the configuration of the aforementioned Japanese Patent Application No. Sho 50-29891, which considered every time point. When using this device, the speaker must leave a pause between words, but this pause may be as short as, or shorter than, the pauses that occur within words. The device is therefore faster than conventional discrete-input speech recognition devices, which required sufficiently long pauses.

The words to be recognized by the speech recognition device according to the present invention are not particularly limited, but in what follows the digits 0 to 9 are taken as an example. A digit is denoted generally by n.

$$n = 0, 1, 2, \ldots, 9 \tag{1}$$

A standard pattern

$$B^n = \mathbf{b}_1^n, \mathbf{b}_2^n, \ldots, \mathbf{b}_j^n, \ldots, \mathbf{b}_{J^n}^n \tag{2}$$

is prepared for each digit n. The (unknown) input pattern is written

$$A = \mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_i, \ldots, \mathbf{a}_I \tag{3}$$

Here the vectors $\mathbf{b}_j^n$ and $\mathbf{a}_i$ in (2) and (3) represent the features of the speech at times j and i, respectively.

The input pattern A contains several spoken digits (in special cases, one). It is assumed that a pause always lies between one digit and the next, as shown in FIG. 1a. Digits such as "1", "6", and "8" also contain word-internal pauses. Naturally, it is not known whether a given pause is a pause between words or a pause within a word.

A pause carries no particularly useful information other than whether it is present, so its length may be compressed to 1. That is, the input is compressed as shown in FIG. 1b, and the input pattern A is assumed to have been compressed in this way. A zero vector is assumed to be present as the vector $\mathbf{a}_i$ corresponding to each pause. That is, in a pause,

$$\mathbf{a}_i = (0, 0, \ldots, 0) \tag{4}$$

So that comparison with an input pattern containing such zero vectors is carried out accurately, the last vector $\mathbf{b}_{J^n}^n$ of each standard pattern is also assumed to be a zero vector.

$$\mathbf{b}_{J^n}^n = (0, 0, \ldots, 0) \tag{5}$$

The principle of the recognition operation performed between the input pattern and the standard patterns described above is explained below. A point at which a zero vector has been inserted, as in FIG. 1b, is called a rest point. The rest points are numbered in order, with the first one, the beginning of the word string, numbered 0:

$$k = 0, 1, 2, \ldots, L \tag{6}$$

Some of these rest points are word boundaries, and the others are merely pauses within words. When the unknown input pattern A is given, however, this distinction is not known. Let the time i of the k-th rest point be given by

$$i = p(k) \tag{7}$$

A partial pattern of the input pattern A is defined as

$$A(x, y) = \mathbf{a}_{l+1}, \mathbf{a}_{l+2}, \ldots, \mathbf{a}_m \tag{8}$$

where

$$l = p(x), \quad m = p(y) \tag{9}$$

That is, the partial pattern A(x, y) is the vector sequence that starts immediately after the x-th rest point and runs up to the y-th rest point, including the zero vector of the y-th rest point. The partial pattern may contain rest points in its interior; the end y need not be the rest point immediately following the start x. In general,

$$x < y \tag{10}$$

Now suppose that the partial pattern A(x, y) is a true word. It can then be recognized by a known pattern matching method. Writing d(i, j) for the distance between the vectors $\mathbf{a}_i$ and $\mathbf{b}_j$, the distance between the partial pattern and the pattern B introduced above (the superscript n is omitted for generality) is defined as follows.
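The display for equation (11) is not reproduced in this text. A form consistent with the definitions of d(i, j), l, and m above, and with the description of (11) as a minimization over a monotonically increasing function j(i), might read as follows. This is a reconstruction under those assumptions, not the original wording.

```latex
D\bigl(A(x, y), B\bigr) = \min_{j(i)} \left[ \sum_{i=l+1}^{m} d\bigl(i, j(i)\bigr) \right] \tag{11}
```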

This is the same as the definition of distance in equation (4) of, for example, the specification of Japanese Patent Application No. Sho 54-66589. The function j(i) is monotonically increasing and satisfies the boundary conditions

$$j(l+1) = 1, \quad j(m) = J \tag{12}$$

According to the above specification or the specification of Japanese Patent Application No. Sho 50-132003, the minimization problem of equation (11) is solved by the following dynamic programming (DP) method.

Initial condition:

$$g(m, J) = d(m, J) \tag{13}$$

Recurrence relation (14).

Distance:

$$D(A(x, y), B) = g(l+1, 1) \tag{15}$$

The above distance definition and calculation method are widely known as the DP matching method, as described, for example, on page 108 of "Speech Recognition" by Yasunaga Niimi (Kyoritsu Shuppan, published October 10, 1979).
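The backward DP described above can be sketched in Python as follows. The recurrence (14) itself is not reproduced in this text, so the local constraint used here (the reference index may advance by 0, 1, or 2 per input frame) is an assumption, as are the function name `dp_distance`, the Euclidean frame distance, and the 0-based indexing. The sketch starts from the end of both patterns, as in the initial condition (13), and reads the result at the start, as in (15).

```python
import math

def dp_distance(a, b):
    """Backward DP matching distance between frame sequences a and b.

    a, b: non-empty lists of equal-dimension feature vectors (tuples).
    Assumed local constraint: stepping one input frame forward, the
    reference index j advances by 0, 1 or 2.
    Returns g[0][0], the 0-based analogue of g(l+1, 1).
    """
    n_in, n_ref = len(a), len(b)

    def d(i, j):
        # Frame distance d(i, j); Euclidean distance is an assumption.
        return math.dist(a[i], b[j])

    g = [[math.inf] * n_ref for _ in range(n_in)]
    g[n_in - 1][n_ref - 1] = d(n_in - 1, n_ref - 1)  # initial condition
    for i in range(n_in - 2, -1, -1):
        for j in range(n_ref - 1, -1, -1):
            best = min(g[i + 1][jj] for jj in (j, j + 1, j + 2) if jj < n_ref)
            if best < math.inf:
                g[i][j] = d(i, j) + best
    return g[0][0]
```

Because the recursion runs from the end of the partial pattern toward its start, one pass yields g(i, 1) for several candidate start times i at once, which is what allows a single end rest point to be paired with several start rest points.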

Let D(x, y, n) abbreviate the distance $D(A(x, y), B^n)$ obtained by substituting $B^n$ as the standard pattern in the above procedure. Once this distance has been obtained for all the digit words n = 0, 1, ..., 9, the partial pattern A(x, y) can be recognized by finding the minimum. The result is taken as the partial judgment

$$\hat{N}(x, y) = \operatorname*{argmin}_{n} \, [D(x, y, n)]$$

and the partial distance

$$\hat{D}(x, y) = \min_{n} \, [D(x, y, n)]$$

Here the symbol argmin means selecting the parameter n that gives the minimum of the value in [ ].
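As a small illustration of the partial judgment and partial distance for a single rest-point pair, the following sketch takes the distances to each word and returns the word with the smallest distance together with that distance. The function name `partial_judgment` and the dictionary layout are hypothetical.

```python
def partial_judgment(distances):
    """Return (n_hat, d_hat) for one rest-point pair (x, y).

    distances: dict mapping word n -> D(x, y, n).
    n_hat is the word with the smallest distance (argmin over n),
    d_hat is that smallest distance (min over n).
    """
    n_hat = min(distances, key=distances.get)
    return n_hat, distances[n_hat]
```

For example, `partial_judgment({0: 3.5, 1: 1.25, 2: 2.0})` selects word 1 with distance 1.25.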

The partial judgment $\hat{N}(x, y)$ and the partial distance $\hat{D}(x, y)$ are obtained for every pair of rest points (x, y). This is called first-stage processing.

Next, the sum of the partial distances $\hat{D}(x, y)$ is calculated over the entire input pattern A, and the sequence of partial patterns that minimizes it is found. As shown in FIG. 2, the partial patterns in this sequence must not overlap one another and must cover the entire input pattern A. This requirement means that, in a continuously uttered word string, no part is shared between words and no extra part is left over that belongs to no word. In addition, every boundary between partial patterns in the sequence must coincide with one of the rest points. This is expressed mathematically as follows.
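The display for equation (16) is not reproduced in this text. Based on the description that follows it (choosing the number of words M and the word boundaries x(k) so that the sum of the corresponding partial distances is smallest), a consistent form might be the following. This is a reconstruction under that assumption, with x(0) and x(M) fixed at the start and end of the word string.

```latex
\min_{M,\; x(1), \ldots, x(M-1)} \left[ \sum_{k=1}^{M} \hat{D}\bigl(x(k-1), x(k)\bigr) \right] \tag{16}
```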

That is, the number of words M and the word boundaries x(0), x(1), ..., x(k), ..., x(M) are chosen optimally so as to minimize the sum of the partial distances corresponding to these word boundaries.

Solving the minimization problem of equation (16) gives the optimal parameters $M = \hat{M}$ and $x(k) = \hat{x}(k)$, $k = 0, 1, 2, \ldots, \hat{M}$. (Since x(0) is the start of the entire word string, x(0) = 0, and since x(M) is the end of the entire word string, x(M) = I; both are self-evident.) Referring to the partial judgments $\hat{N}(x, y)$ then fixes the recognition result as

$$\hat{n}(k) = \hat{N}(\hat{x}(k-1), \hat{x}(k)), \quad k = 1, 2, \ldots, \hat{M} \tag{17}$$

The minimization problem of equation (16) can be solved by a dynamic programming method such as that shown in equation (24) of the specification of Japanese Patent Application No. Sho 50-132003. In the present application, however, word boundaries need only be chosen from among the rest points, so the problem is small and can also be solved by exhaustive search. In the case of FIG. 2, there are five rest points excluding the start and end of the word string as a whole. The sum of the partial distances is calculated for every combination of each of these being or not being a word boundary, and the minimum is found. The total number of combinations in this case is only 2^5 = 32 (there are five independent binary events, each rest point being a word boundary or not). Even with 10 rest points there are only 1024 combinations, and if one sum takes 100 μs to compute, the whole calculation finishes within 102.4 ms.
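The dynamic programming alternative mentioned above can be sketched as a shortest-path recursion over rest points. This is one possible formulation and is not claimed to be equation (24) of the cited application; the function name `segment_dp` and the dictionary `d_hat`, keyed by rest-point pairs, are hypothetical.

```python
import math

def segment_dp(d_hat, last):
    """Minimise the sum of partial distances over rest points 0..last.

    d_hat: dict mapping (x, y) with x < y to the partial distance;
           missing pairs are treated as unavailable (infinite cost).
    Returns (total, boundaries); boundaries starts at 0 and ends at last,
    or is empty when no segmentation is possible.
    """
    best = [math.inf] * (last + 1)   # best[y]: cheapest cover of 0..y
    prev = [None] * (last + 1)       # prev[y]: boundary preceding y
    best[0] = 0.0
    for y in range(1, last + 1):
        for x in range(y):
            cost = best[x] + d_hat.get((x, y), math.inf)
            if cost < best[y]:
                best[y], prev[y] = cost, x
    if best[last] == math.inf:
        return math.inf, []
    bounds = [last]
    while bounds[-1] != 0:
        bounds.append(prev[bounds[-1]])
    return best[last], bounds[::-1]
```

With K interior rest points this visits on the order of K² pairs instead of 2^K subsets, although for the small K considered here either approach is cheap.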

The minimization of equation (16) is called second-stage processing, and the calculation of equation (17) is called determination processing.

FIG. 3 is a block diagram showing one configuration example of a high-speed speech recognition device that operates on the above principle. The speech signal input through signal line IS is frequency-analyzed, time-multiplexed, sampled, and digitized by frequency analysis means 20, such as that shown in FIG. 2 on page 929 of the September 1979 issue of Electronics magazine (Ohmsha), and is sent to input pattern buffer 40 as a time series of vectors as in equation (3). Level detector 30, meanwhile, measures the amplitude level of the input speech signal and sends it to voice detector 50 as level signal L. Based on the level signal L, voice detector 50 generates the signals q₁, q₂, bg, and en shown in FIG. 4. That is, the start detection pulse bg is generated at the start of the speech (the start of the word string as a whole), and the end detection pulse en at the end of the speech (the end of the word string as a whole). The pause detection pulse q₁ is generated at the start of each pause, and the pause continuation signal q₂ is generated during the pause.

The output m₁ of the frame counter built into control unit 10 and the output k₁ of the rest point counter are set to m₁ = 1 and k₁ = 1 when the start detection pulse bg is generated. All entries of the rest point table are reset to −1, and 0 is then written at address 0. Thereafter, each time one input pattern vector $\mathbf{a}_m$ (time i = m) is sent from frequency analysis unit 20, the frame counter signal is incremented by one. Addressed by this frame counter signal, the vector is written into input pattern buffer 40 as the m-th vector $\mathbf{a}_m$. When the pause detection pulse q₁ is generated at the beginning of a pause, the rest point counter signal k₁ is written at address m₁ of the rest point table, and a zero vector as in equation (4) is written into the input pattern buffer as the vector $\mathbf{a}_m$. The rest point counter signal k₁ is then incremented by one. After that, while the pause continuation signal q₂ is being generated, the frame counter signal m₁ is kept from increasing. Under this control, even when speech containing pauses as in FIG. 1a is input, an input pattern with the pauses compressed as in FIG. 1b is obtained.
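The pause compression and rest point table described above can be sketched in software as follows. This is a minimal illustration, not the hardware of FIG. 3: the function name `compress_pauses`, the per-frame silence flags, and the growing list used for the table are assumptions. Each run of silent frames becomes one zero vector, and the table records the rest point number at that time and −1 elsewhere.

```python
def compress_pauses(frames, is_silent):
    """Collapse each run of silent frames into a single zero vector.

    frames: non-empty list of feature vectors (tuples of equal length).
    is_silent: parallel list of booleans, True for frames inside a pause.
    Returns (pattern, rest_table). rest_table[i] holds the rest point
    number at 1-based time i, or -1 for speech frames; rest_table[0] = 0
    marks the start of the word string.
    """
    dim = len(frames[0])
    pattern, rest_table = [], [0]
    k = 1                  # rest point counter (k1 in the text)
    in_pause = False
    for vec, silent in zip(frames, is_silent):
        if silent:
            if not in_pause:           # first frame of a pause (pulse q1)
                pattern.append((0,) * dim)
                rest_table.append(k)
                k += 1
            in_pause = True            # pause continues (signal q2): skip
        else:
            pattern.append(vec)
            rest_table.append(-1)
            in_pause = False
    return pattern, rest_table
```

The length of `pattern` plays the role of the frame counter: it does not advance while a pause continues, so a long pause and a short one produce the same compressed pattern.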

Suppose a pause begins when the frame counter signal is m₁ = m and the rest point counter signal is k₁ = y. When the pause detection signal q₁ is sent to pattern matching unit 70, the first-stage processing described above begins. For this purpose, the word designation signal n₁ from the control unit is stepped through 0, 1, 2, ..., 9 as in the time chart of FIG. 5, so that the standard patterns $B^n$ in standard pattern memory 130 are designated in turn. The operation of pattern matching unit 70 is now described for the general case n₁ = n. In the cycle in which n₁ = n, equations (13) and (14) are calculated. In particular, as described in the aforementioned Japanese Patent Application No. Sho 50-132003, equation (14) is calculated within the matching window

$$j + m - J^n - \gamma \le i \le j + m - J^n + \gamma \tag{18}$$

As a result, the recurrence value g(i, 1) is obtained in the range

$$1 + m - J^n - \gamma \le i \le 1 + m - J^n + \gamma \tag{19}$$

The distance D(x, y, n) is therefore calculated for the partial patterns whose start is a rest point x satisfying $m - J^n - \gamma \le l \le m - J^n + \gamma$, that is,

$$p(y) - J^n - \gamma \le p(x) \le p(y) - J^n + \gamma \tag{20}$$

When several x satisfy condition (20), the distance D(x, y, n) is calculated for each of the partial patterns A(x, y) starting from these x. Such a pattern matching unit 70 can be realized with a configuration similar to that of FIG. 6 of the specification of Japanese Patent Application No. Sho 50-29891, which is also referred to in the specification of Japanese Patent Application No. Sho 50-132003.
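With l = p(x) and m = p(y), the matching window at j = 1 restricts the start time p(x) to within γ frames of p(y) − Jⁿ. The following sketch lists the start rest points that pass this test for a given end rest point and standard pattern length. The function and parameter names (`candidate_starts`, `ref_len` for Jⁿ, `gamma` for the window half-width) are hypothetical.

```python
def candidate_starts(p, y, ref_len, gamma):
    """Rest points x < y whose time p[x] lies inside the matching window.

    p: list where p[k] is the time of the k-th rest point.
    Keeps x when p[y] - ref_len - gamma <= p[x] <= p[y] - ref_len + gamma.
    """
    lo = p[y] - ref_len - gamma
    hi = p[y] - ref_len + gamma
    return [x for x in range(y) if lo <= p[x] <= hi]
```

For rest point times `p = [0, 4, 9, 15]` and end rest point `y = 3`, a standard pattern of length 10 with `gamma = 2` admits only `x = 1`, while one of length 6 admits only `x = 2`.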

The distance D(x, y, n) calculated in this way is sent to minimum value detection unit 80 via signal line D₁. A feature of the present application is that only rest points can be word boundaries. For this reason, rest point table 60 is addressed by the address signal m₂ = i within the range of equation (19), and the content c(i) of address i is read out via signal line c. When c(i) is −1, the corresponding recurrence value g(i, 1) is not output. When c(i) is a non-negative number x, this x is a rest point number, so the recurrence value g(i, 1) is output as the distance D(x, y, n). The rest point number c(i) is also sent to partial distance memory 90 and partial judgment memory 100 via signal line k₂. This operation is repeated while the word designation signal n₁ changes from 0 to 9.

Minimum value detection unit 80 compares the distances D(x, y, n) output from pattern matching unit 70 via signal line D₁. For each rest point pair (x, y), it finds the minimum of D(x, y, n) over the words n and takes it as the partial distance $\hat{D}(x, y)$; the word giving that minimum is taken as the partial judgment $\hat{N}(x, y)$. These are written into partial distance memory 90 and partial judgment memory 100 via the signals $\hat{D}$ and $\hat{N}$, respectively. The address in this case is specified by the rest point number y, given by the rest point counter signal k₁ from control unit 10, and the rest point number x, given from pattern matching unit 70 through signal line k₂. The overall configuration of minimum value detection unit 80, partial judgment memory 100, and partial distance memory 90 involved in this operation may, for example, be the same as the configuration and connection of the first comparison circuit 16, the partial judgment result table 18, and the partial similarity table 17 shown in FIG. 2 of the specification of Japanese Patent Application No. Sho 51-18346.

The above procedure is repeated each time input pattern vectors have been received and a pause is found (that is, each time the pause detection pulse q₁ is generated). Suppose that the rest point counter signal k₁ is (K + 2) when the end detection pulse en is finally generated. The number of the last rest point (the end of the word string) is then (K + 1). Therefore, for every combination of rest points x and y in the range

$$0 \le x < y \le K + 1 \tag{21}$$

the partial judgment $\hat{N}(x, y)$ and the partial distance $\hat{D}(x, y)$ have been recorded in partial judgment memory 100 and partial distance memory 90.

When the end detection pulse en is given to second-stage processing unit 110, the calculation of equation (16) begins. The second-stage processing unit consists of a well-known microprocessor and operates as follows. To calculate equation (16) by exhaustive search, every combination of each of the K rest points (excluding the start and end of the word string as a whole) being or not being a true word boundary must be examined. Rest point pairs (x, y) are therefore generated according to the scheme of FIG. 6. A K-bit counter 1101 is assumed; it starts from an initial value of 1 and is thereafter incremented by one. Its K-bit output γ masks address table 1102 so that only the addresses whose γ bit is 1 are output. The address table holds the integers 0 to (K + 1), and address 0 and address (K + 1) are always output. This corresponds to the fact that the start and end of the word string are treated as rest points. These addresses are scanned in pairs by a scanner and sent to partial distance memory 90 as the address pair k₃ = (x, y), with the lower address x = x(k − 1) and the higher address y = x(k). The partial distance $\hat{D}(x(k-1), x(k))$ is thereby read out through signal line $\hat{D}_1$. The sum of the partial distances is calculated while the scan of FIG. 6 proceeds with this addressing and reading. The content of counter 1101 is then incremented by one, and the same procedure is followed to calculate the next sum. The sum is calculated repeatedly in this way until the bits of counter 1101 are all 1. In parallel with the calculation of these sums, their minimum is found. The addresses selected when the minimum was obtained (the addresses with γ = 1 in the address table of FIG. 6), taken from the lowest upward, are $\hat{x}(k)$, $k = 0, 1, 2, \ldots, \hat{M}$ (where $\hat{M}$ is the total number of addresses with γ = 1). This completes the minimization of equation (16), that is, the second-stage processing.
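The counter-and-mask search described above can be sketched in software as follows. The function name `segment_bruteforce` and the dictionary `d_hat` of partial distances are hypothetical, and as a simplifying assumption the loop enumerates all 2^K masks, including the all-zero mask in which the whole utterance is a single word. Each mask plays the role of the K-bit counter output, and rest points 0 and K + 1 are always included.

```python
import math

def segment_bruteforce(d_hat, k_inner):
    """Try every subset of the k_inner interior rest points as boundaries.

    Rest points are numbered 0..k_inner+1; 0 and k_inner+1 are always
    boundaries. d_hat maps (x, y) with x < y to the partial distance;
    missing pairs count as infinitely costly.
    Returns (best_total, best_boundaries).
    """
    last = k_inner + 1
    best_total, best_bounds = math.inf, None
    for mask in range(1 << k_inner):                 # K-bit counter values
        inner = [k for k in range(1, last) if (mask >> (k - 1)) & 1]
        bounds = [0] + inner + [last]
        # Scan adjacent boundary pairs and add up their partial distances.
        total = sum(d_hat.get((x, y), math.inf)
                    for x, y in zip(bounds, bounds[1:]))
        if total < best_total:
            best_total, best_bounds = total, bounds
    return best_total, best_bounds
```

The number of masks is `1 << k_inner`, which gives 32 for five interior rest points and 1024 for ten.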

Once the optimal parameters $\hat{x}(k)$ of equation (16) (the rest point numbers corresponding to word boundaries) have been found, determination processing is performed by determination unit 120 executing equation (17). That is, the address signal k₄ consisting of the pair $\hat{x} = \hat{x}(k-1)$, $\hat{y} = \hat{x}(k)$ is sent to partial judgment memory 100 and $\hat{N}(\hat{x}, \hat{y})$ is read out as signal N₁; this is repeated for $k = 1, 2, \ldots, \hat{M}$. The results $\hat{n}(k)$ are output to the outside via signal line $\hat{n}$. Because this determination processing is simple, determination unit 120 may be the same microprocessor as second-stage processing unit 110.
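The determination step amounts to a table lookup along the chosen boundaries. A minimal sketch, assuming a hypothetical dictionary `n_hat` that maps each rest-point pair to its partial judgment and a list `bounds` of the optimal boundaries:

```python
def read_out(n_hat, bounds):
    """Look up the recognised word for each adjacent pair of boundaries.

    n_hat: dict mapping (x, y) to the partial judgment for that pair.
    bounds: optimal boundaries in increasing order, from 0 to the last
            rest point.
    Returns the recognised word sequence.
    """
    return [n_hat[(x, y)] for x, y in zip(bounds, bounds[1:])]
```

With `n_hat = {(0, 2): 6, (2, 3): 1}` and `bounds = [0, 2, 3]`, the recognized sequence is `[6, 1]`.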

The configuration of the present invention has been described above on the basis of an embodiment, but this description does not limit the scope of the invention. In particular, the configuration and operation of the second-stage processing unit may follow those described in the specifications of Japanese Patent Application Nos. Sho 50-29891, Sho 50-132003, Sho 50-132004, and Sho 51-18346. Furthermore, although the distance between vectors is used as the measure of similarity in the present application, the inner product between vectors may be used instead, as in Japanese Patent Application No. Sho 50-132003. In that case, every minimum value detection operation in the present application must be replaced by a maximum value detection operation.

[Brief explanation of the drawing]

FIGS. 1a, 1b, and 2 are diagrams for explaining the operating principle of the present invention; FIG. 3 is a block diagram showing one embodiment of the present invention; FIGS. 4 and 5 are time charts; and FIG. 6 is a block diagram for explaining part of the configuration of the block diagram of FIG. 3. In the figures, 10 is the control unit, 20 the analysis unit, 30 the level detector, 40 the input pattern buffer, 50 the voice detector, 60 the rest point table, 70 the pattern matching unit, 80 the minimum value detection unit, 90 the partial distance memory, 100 the partial judgment memory, 110 the second-stage processing unit, 120 the determination unit, 130 the standard pattern memory, 1101 the counter, and 1102 the address table.

Claims

[Claims]

1. A high-speed speech recognition device comprising: an analysis unit for analyzing an input speech waveform and converting it into an input pattern expressed as a vector sequence; a standard pattern memory for storing standard patterns; means for examining the amplitude of the input speech to determine rest points within said input pattern; a pattern matching unit for calculating the distance between a partial pattern, defined as the interval between one rest point and another, and said standard patterns; a minimum value detection unit for comparing the distances calculated for the standard patterns and obtaining a partial distance, which is their minimum, and a partial judgment, which is the name of the word giving that minimum; means for determining a group of partial patterns that are delimited by the rest points in said input pattern, do not overlap, and cover the entire input pattern, such that the sum of said partial distances corresponding to the partial patterns is minimized; and a determination unit for determining and outputting, as the recognition result, said partial judgments corresponding to the partial patterns so determined.