JPS6140120B2


Info

Publication number: JPS6140120B2
Application number: JP53073693A
Authority: JP
Inventors: Ryuichi Oka
Original assignee: Agency of Industrial Science and Technology
Current assignee: National Institute of Advanced Industrial Science and Technology AIST
Priority date: 1978-06-20
Filing date: 1978-06-20
Publication date: 1986-09-08
Also published as: JPS552205A

Description

DETAILED DESCRIPTION OF THE INVENTION
The present invention relates to a speech recognition device that automatically recognizes human speech word by word and makes the result available, for example as printed text. In particular, it relates to a real-time continuous speech recognition device that performs this recognition continuously and in real time.

As discussed below, conventional speech recognition devices cannot recognize continuous speech in real time. At most ten words, and usually only four or five, can be uttered at one time. The amount of computation needed to produce a recognition result is also enormous. The devices are therefore complex and large, the computation takes a long time, and the size of the recognizable vocabulary is limited.

In view of the above, the main object of the present invention is to provide a real-time continuous speech recognition device with the following properties:
- it places no limit on the number of words spoken at one time;
- it minimizes the computation needed to output recognition results;
- its configuration is simple, so the device can be made smaller while handling a larger vocabulary.

First, FIG. 1 shows the general configuration of a speech recognition device of this type, which is explained below.

Speech enters the speech input section A, such as a microphone. The analog-to-digital converter B turns it into a digital signal, which is passed to the analysis section C, such as a bank of band-pass filters or a correlator. In practice, the analysis section generally includes converter B as well. Section C analyzes the signal and converts it into a feature pattern.

Analysis is generally performed on segments of the digital signal about 20 msec long, with successive segments shifted by about 10 msec. The filter bank usually has about 10 to 20 bands, or the correlator is of order 10 to 20. The output of analysis section C is therefore a time series of 10- to 20-dimensional vectors, one every 10 msec.
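The following sketch illustrates the frame timing only. It computes where successive 20 msec analysis windows start when each window is shifted by 10 msec from the previous one. The 10 kHz sample rate and the function name `frame_starts` are assumptions made for the example. The band-pass filtering itself is not modeled.

```python
def frame_starts(num_samples: int, sample_rate: int,
                 win_ms: float = 20.0, hop_ms: float = 10.0) -> list[int]:
    """Start index of every full analysis window (about 20 ms long, shifted about 10 ms)."""
    win = int(sample_rate * win_ms / 1000)   # samples per analysis window
    hop = int(sample_rate * hop_ms / 1000)   # samples between successive window starts
    if num_samples < win:
        return []
    return list(range(0, num_samples - win + 1, hop))


# One second of audio at an assumed 10 kHz rate: one feature vector per 10 ms.
starts = frame_starts(num_samples=10_000, sample_rate=10_000)
print(len(starts), starts[:3])   # 99 [0, 100, 200]
```

Each window would then be reduced to one L-dimensional vector (for example, L band energies). This yields the time series described above.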

Switch S routes the output of analysis section C in one of two ways:
- If the word to be recognized has not yet been registered in the device, the output is stored in the standard pattern storage section D.
- If the vocabulary to be recognized is already registered in section D, the output goes to the word recognition section E, where recognition is performed.

The output of analysis section C for the input speech is written

$$\{ f_{t,x} : 1 \le x \le L \} \tag{1}$$

Here $t = 1, 2, \ldots$, and the spacing of $t$ is the analysis interval, for example 10 msec. If analysis section C is a band-pass filter bank, $x$ is the band number, and $L$ is usually 10 to 20, as noted above. In that case $f_{t,x}$ is the power (magnitude) of band $x$ at time $t$.

Next, consider the word with word name $i$, one of the words registered in the standard pattern storage section D. It is represented as

$$\{ Z_{j,x} : 1 \le j \le T_i,\ 1 \le x \le L \} \tag{2}$$

This is called the standard pattern of word $i$, and $T_i$ is its pattern length.

Conventionally, when one word or a few consecutive words are uttered, the beginning and end of the utterance are specified. The input pattern between those two points is the object of recognition.

Recognition consists of computing the distance between each standard pattern and this input pattern, and choosing the word name with the smallest distance. Dynamic programming (DP) is considered the most effective way to compute this distance.

As is well known, the input is usually spoken at a different speed from the standard pattern. The input pattern is therefore stretched or compressed in time so that corresponding parts match as closely as possible, and DP performs this alignment. The operation is commonly called time normalization.
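For contrast with the continuous method introduced later, the sketch below shows the conventional approach. It computes a DP time-normalized distance between one standard pattern and an input whose start and end points are already known. The following are common textbook choices assumed here, not details taken from this document:
- the symmetric step pattern;
- the city-block local distance;
- normalization by the combined length.

```python
def city_block(a, b):
    """Local distance between two feature vectors (city-block form assumed)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def endpoint_dtw(inp, pattern):
    """Conventional DP matching. It needs the complete input, from the specified
    start point to the specified end point, before it can return a distance."""
    n, m = len(inp), len(pattern)
    inf = float("inf")
    g = [[inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = city_block(inp[i - 1], pattern[j - 1])
            g[i][j] = min(g[i - 1][j] + d,          # input advances alone
                          g[i - 1][j - 1] + 2 * d,  # both advance together
                          g[i][j - 1] + d)          # pattern advances alone
    return g[n][m] / (n + m)  # with these weights every path totals n + m


pattern = [[1], [2], [3]]
# The same word spoken twice as slowly still matches perfectly after time normalization.
print(endpoint_dtw([[1], [1], [2], [2], [3], [3]], pattern))  # 0.0
```

The final cell `g[n][m]` is only reachable once the end point is known. This is the limitation the next paragraph describes.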

The drawback of the conventional approach is that the distance computation starts only when the utterance has ended. The computation therefore proceeds on a timeline separate from that of the pattern being recognized.

As a result, true real-time processing has not been possible, even though speech recognition essentially requires it. Real-time here means a method in which recognition of each input is finished before the input for the next time step arrives. What has been called "real-time processing" so far has meant only "a short time after" the utterance ends. Moreover, even producing a result shortly after the utterance requires an enormous amount of computation, which leads to the drawbacks described above.

In view of these deficiencies, the inventor sought a specific insight on which to base the present invention.

Namely, the above deficiencies can be overcome if a suitable distance to word $i$ can be computed simply at every time $t$ of the input pattern (1). This distance must be time-normalized and must take into account all input up to time $t$.

Suppose such a distance, denoted $A_i(t)$, is available at each time, and fix a threshold $\lambda$ in advance. Determine $i^*(t)$ by

$$i^*(t):\ \min_i A_i(t) = A_{i^*(t)}(t) \le \lambda \tag{3}$$

If such an $i^*(t)$ exists, it can be decided automatically that the word $i^*(t)$ is recognized at time $t$.

If no $i^*(t)$ satisfies (3) at time $t$, then one of two cases holds at that time:
- no word has yet finished being uttered, i.e. a word is still in progress; or
- no word is being uttered at all.

Once $i^*(t)$ is determined, its word name is already known. It can then be displayed, for example, by output section F in FIG. 1.

The present invention is based on the insight that obtaining $A_i(t)$ with the role described above is sufficient. The word recognition section E is accordingly constructed to compute $A_i(t)$ automatically.

As the detailed description below makes clear, the method may fittingly be called continuous DP. The DP described above is carried out at every time step, taking the past into account. This makes it possible to recognize a word name at every time step, and in that sense to recognize continuous words.

FIG. 2 shows one embodiment of the word recognition section of the device. It consists broadly of six circuit systems, 1 to 6. The configuration of FIG. 2 handles a single word name $i$; for simplicity, the index $i$ is omitted below.

Circuit 1 is the partial distance calculation circuit. It computes $T$ partial distances $Q_{t,j}$, $j = 1, \ldots, T$, according to equation (4). These are the building blocks of the total distance: the distances from the analyzed speech input at a given time to each point of the standard pattern.

Circuit 1 consists of $T$ parallel units, each evaluating equation (4). This $T$ corresponds to the pattern length $T_i$ of the standard pattern in equation (2). The $Z$ supplied to circuit 1 comes from the output of the standard pattern storage section D in FIG. 1.
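Equation (4) itself did not survive in this text. The sketch below therefore assumes the city-block distance (sum of absolute differences), a common choice for filter-bank features. Under that assumption, circuit 1 computes one partial distance per standard-pattern point for each incoming frame. The $T$ values do not depend on one another, which is why they can be evaluated in parallel. The function name is hypothetical.

```python
def partial_distances(frame, pattern):
    """Q_{t,j} for j = 1..T: distance from the current input frame f_{t,.}
    to every point Z_{j,.} of one standard pattern (city-block form assumed)."""
    return [sum(abs(f - z) for f, z in zip(frame, point)) for point in pattern]


pattern = [[0, 0], [1, 1], [2, 2]]           # toy standard pattern, T = 3, L = 2
print(partial_distances([1, 0], pattern))    # [1, 1, 3]
```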

The output of circuit 1 is thus the $T$ values $Q_{t,j}$. They are computed every time a new input $\{ f_{t,x} : 1 \le x \le L \}$ arrives, and each can be computed independently of the others.

The $T$ outputs of circuit 1 then enter the partial distance optimal integration circuit 2. Circuit system 2 and the associated circuit system 4 (described later) together determine $2T$ register values in total:
- a value $P_{t,j}$ for each $Q_{t,j}$, $j = 1, 2, \ldots, T$;
- a value $C_{t,j}$ that always accompanies each $P_{t,j}$.

Two inputs enter this computation: the output of circuit system 1, and the optimal integrated partial distances from up to two time units in the past. The latter are held in circuit system 3, described later.

The operation of this circuit is defined by the simple formulas (5) and (6).

The meaning of these formulas is as follows. To understand formula (5), one must also understand what circuit 3 holds.

Circuit 3 stores the optimal integrated values from the past, in this case for up to two time units before time $t$. At time $t$, circuit 2 combines these stored values with the output $Q_{t,j}$ of circuit 1 to determine the optimal integrated values for time $t$.

The optimal integrated value at time $t$ is obtained as follows:
1. Three optimal integrated values from one or two time units earlier are selected.
2. The multiplier 9 weights the output of circuit 1, and the adder 8 adds the weighted value to each of the three. This yields three candidates for the optimal integrated value at time $t$.
3. The comparator 10 selects the smallest of the three candidates.

Together, these elements form the integration decision circuit 7. This is the meaning of formula (5).

The smallest of the three is taken because the optimal distance is defined as the one that matches the standard pattern most closely. This choice allows the speaker to talk at a different speed from the one used when the standard pattern was recorded.

This is why circuit system 2 computes $P_{t,j}$, $j = 1, \ldots, T$. Why does this alone yield the optimal distance? At time $t$ there are $T$ optimal integrated values, $P_{t,j}$, $j = 1, \ldots, T$. Each is newly determined for time $t$, yet each also takes the past optimal integrated values into account. The reason is that circuit system 3 is formed by delaying the values $P_{t,j}$ produced by circuit system 2, through delay circuits G and similar elements.

Stated precisely, the two circuit systems have the following roles:
- Circuit system 2 optimally integrates the partial distances computed by circuit system 1, obtaining an optimal integrated value for each point of the standard pattern.
- Circuit system 3 stores the past optimal integrated values, produced by circuit system 2, that are needed to compute these values.

Now consider the meaning of $P_{t,T}$, shown at the right end of FIG. 2. It is determined from

$$S = \{ P_{t-2,T-1},\ P_{t-1,T-1},\ P_{t-1,T-2} \}$$

and $Q_{t,T}$.

Take any one element of $S$, for example $P_{t-1,T-1}$. At time $t' = t-1$, this value occupied position $T-1$ of circuit system 2. Going back to time $t'$, $P_{t',T-1}$ was likewise formed from

$$S' = \{ P_{t'-2,T-2},\ P_{t'-1,T-2},\ P_{t'-1,T-3} \}$$

and $Q_{t',T-1}$.

Next take any element of $S'$, for example $P_{t'-2,T-2}$, and go back to time $t'-2$. This value was in turn formed from

$$S'' = \{ P_{t'-4,T-3},\ P_{t'-3,T-3},\ P_{t'-3,T-4} \}$$

and $Q_{t'-2,T-2}$.

Repeating this step until $P_{\cdot,1}$ appears leads to the leftmost element of circuit system 2. At every link of the chain $S \to S' \to S'' \to \cdots$, the optimal distance is formed from the past distances and $Q_{\cdot,j}$. Taken as a whole, therefore, $P_{t,T}$ is the optimal distance to the standard pattern (2).

That is, the distance that also takes into account the past input up to time $t$ is produced as $P_{t,T}$ at time $t$.

Moreover, each $P_{t,j}$, $j = 1, \ldots, T$, is determined at its own time using only the contents of circuit systems 1 and 3. In other words, the values needed at each time step are always supplied locally, and this suffices.

From the above, it should be clear that $P_{t,T}$ represents the optimal distance.

$P_{t,T}$ is determined by equation (7). During the computation, however, each $Q_{t,j}$ receives a weight $K_j$ (here 2 to 3, from formula (5)). In addition, the standard pattern length $T$ generally differs from word to word.

Two comparisons of raw $P_{t,T}$ values are therefore meaningless:
- across words of different lengths, i.e. $P_{t,T_l}$ against $P_{t,T_m}$ ($l \ne m$);
- across different times, i.e. $P_{t_1,T}$ against $P_{t_2,T}$ ($t_1 \ne t_2$).

In both cases the sums of the weights $K_j$ that make up the two values differ.

Both problems are solved by computing the sum of the weights $K_j$ used to form $P_{t,T}$ and dividing $P_{t,T}$ by it. $C_{t,j}$ in formula (6) is defined for this purpose. From formulas (5) and (6), $C_{t,j}$ is the sum of the weights $K_j$ used in forming $P_{t,j}$.

Circuit system 4 in the figure produces these weight sums; it is an optimal weight sum calculation circuit. Its operation, for each point of the standard pattern, is as follows:
1. Candidate optimal integrated values for the current time are formed by adding the past optimal integrated value to the current partial distance multiplied by a predetermined weight.
2. The optimal integrated value is chosen from among those candidates.
3. Circuit system 4 computes the sum of the weights needed to obtain the chosen value. For this it uses the past optimal weight sums, which it computed earlier and which are stored in circuit system 5.

Dividing $P_{t,T}$ by the corresponding $C_{t,T}$ gives

$$A(t) = \frac{P_{t,T}}{C_{t,T}} \tag{8}$$

This value can meaningfully be compared across different words and different times.

The computation is performed by circuit system 6. Stated precisely, circuit system 6 takes the optimal integrated value at the point equal to the pattern length of the standard pattern. It divides this by the optimal weight sum at the same point and outputs the result as the distance to the standard pattern.

A practical question about formulas (5) and (6) is how to set the contents of circuit system 3 at $t = 1$, i.e. when the device starts operating. This means choosing $P_{0,j}$, $P_{-1,j}$, $C_{0,j}$ and $C_{-1,j}$ for $j = 1, \ldots, T$. Suppose they are set to

$$P_{0,j} = P_{-1,j} = M, \qquad C_{0,j} = C_{-1,j} = 0, \qquad j = 1, 2, \ldots, T \tag{9}$$

with $M$ chosen sufficiently large compared with the threshold $\lambda$. Then any value $A(t)$ influenced by these initial conditions is much larger than $\lambda$, and it plays no role in recognition.

As is clear from the above, the configuration of FIG. 2 corresponds to one standard pattern. With $N$ standard patterns, it is preferable to have $N$ copies of this configuration. The copies can then compute $P_{t,j}$ and $C_{t,j}$ in parallel for each input $f_{t,x}$, producing the $N$ values $A(t)$ simultaneously. With only one copy, the $N$ values $A(t)$ can still be produced serially, but this takes $N$ times as long.

In either case, $N$ standard patterns yield, in principle, $N$ values $A(t)$.

The object of the present invention is thus achieved. It remains to explain how a word is determined from these values, written $A_i(t)$, $i = 1, 2, \ldots, N$.

Let $t_F$ be the time at which the utterance of word $i$ ends, whether the word is spoken alone or as one of several continuously spoken words. $t_F$ is introduced only for this explanation; recognition does not require it to be known in advance in any sense.

When word $i$ has just been uttered, $A_i(t_F)$ is smaller than $A_k(t_F)$ for every $k \ne i$. It is clearly also smaller than $A_i(t)$ at three other kinds of time:
- during the utterance ($t < t_F$);
- after it ($t > t_F$);
- at times when nothing is being said.

To restate: fix $\lambda$ in advance and determine $i^*(t)$ from equation (3),

$$i^*(t):\ \min_i A_i(t) = A_{i^*(t)}(t) \le \lambda \tag{3}$$

If no $i^*(t)$ satisfies (3), set $i^*(t) = \varphi$ (the empty result). Then $i^*(t)$ gives the recognition result at each time.

Whenever $i^*(t) \ne \varphi$, it identifies one of the $N$ words and, at the same time, the time at which that word was recognized. In this sense, the device of the present invention enables recognition of continuous words.

However, $i^*(t)$ is generally not non-empty at a single time with $\varphi$ everywhere around it. Because the end of an utterance is ambiguous, the non-empty result persists for several time steps. In practice, therefore, a word would be declared recognized only when the same word is recognized over at least $H$ consecutive time steps:

$$i^*(t) = i^*(t+1) = \cdots = i^*(t+H)$$
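The confirmation rule can be sketched as a small filter over the per-frame decisions. The sketch follows the equation literally: the same word must hold from $t$ through $t+H$, i.e. for $H+1$ frames. Each run is reported once, at the frame where it first qualifies. The function name and the example decision sequence are hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple


def confirm_runs(decisions: Iterable[Optional[str]], h: int) -> Iterator[Tuple[int, str]]:
    """Yield (t, word) once the same non-empty decision has held for h + 1
    consecutive frames, ending at frame t. Each run is reported only once."""
    run_word, run_len = None, 0
    for t, word in enumerate(decisions):
        if word is not None and word == run_word:
            run_len += 1
        else:
            run_word, run_len = word, 1   # start a new run (or a run of "empty")
        if word is not None and run_len == h + 1:
            yield t, word


decisions = [None, "one", "one", "one", None, "two", None]
print(list(confirm_runs(decisions, h=2)))   # [(3, 'one')]
```

The isolated "two" is suppressed. A larger $H$ rejects more spurious single-frame hits but delays the confirmed result by the same number of frames.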

FIG. 3 improves recognition accuracy over the configuration of FIG. 2. To determine $P_{t,j}$, it uses not only $Q_{t,j}$ but also $Q_{t-1,j}$ and $Q_{t,j-1}$. That is, the partial distances from one time step earlier are also stored, in storage circuit 11, and used.

Its computation differs from that of FIG. 2, but its basic configuration is covered by the functions of the FIG. 2 circuit systems. Only the circuit diagram is therefore given, with the same reference numerals for the same components.

As described in detail above, the present invention can recognize an unlimited number of words spoken continuously at one time, although the vocabulary itself remains limited. The recognition section is simple, being built from uniform elements of identical structure. The invention is therefore highly useful in applications such as voice typewriters.

Furthermore, as the configurations of FIGS. 2 and 3 show, computation and decision can be synchronized with a master clock. The computation per clock consists only of formulas (5), (6) and (3), so it is very small. It can be completed in real time, that is, within the ordinary input interval (usually about 10 msec, as noted above). When words are spoken continuously, a result is therefore available the moment each word ends. Compared with the best conventional method, the amount of computation is reduced to about 1/5000.

[Brief explanation of the drawing]

FIG. 1 is a schematic configuration diagram of a speech recognition device. FIG. 2 is a schematic configuration diagram of one embodiment of the word recognition section of the device of the present invention. FIG. 3 is a schematic configuration diagram of a second embodiment of the same.

Reference numerals in the figures:
A: speech input section
C: speech input analysis section
D: standard pattern storage section
E: word recognition section
1: partial distance calculation circuit
2: partial distance optimal integration circuit
3: storage circuit for past optimal integrated values
4: optimal weight sum calculation circuit
5: storage circuit for past optimal weight sums
6: output circuit for the distance to the standard pattern
7: integration decision circuit
8: adder
9: multiplier
10: comparator
11: storage circuit for the partial distances of one time step earlier

Claims

[Claims] 1. A real-time continuous speech recognition device having a speech input section; an analysis section for the speech input; a standard pattern storage section; a word recognition section that computes the distance between the analyzed speech input and the standard patterns and identifies the word name of the input speech; and an output section that outputs the recognition result. The device is characterized in that the word recognition section comprises at least:
(a) a circuit that computes the partial distance from the analyzed speech input at a given time to each point of a standard pattern;
(b) a partial distance optimal integration circuit that optimally integrates the partial distances to each point of the standard pattern, obtaining an optimal integrated value corresponding to each point;
(c) a circuit that stores the past optimal integrated values, obtained by the partial distance optimal integration circuit, that are needed to compute said optimal integrated values;
(d) means for determining candidates for the optimal integrated value at said given time for each point of the standard pattern, by adding the past optimal integrated value for that point to the partial distance at said given time to that point multiplied by a predetermined weight;
(e) an optimal weight sum calculation circuit that, when the optimal integrated value for each point of the standard pattern is chosen from among the candidates, computes the sum of the weights required to obtain that value;
(f) a circuit that stores the past optimal weight sums needed to compute the optimal weight sums at said given time; and
(g) a circuit that outputs, as the distance to the standard pattern, the optimal integrated value from the integration circuit at the point equal to the pattern length of the standard pattern, divided by the optimal weight sum from the weight sum circuit at that same point.