JPS61148496A

JPS61148496A - Continuous voice recognition equipment

Info

Publication number: JPS61148496A
Application number: JP59269919A
Authority: JP
Inventors: 誠夫亘理
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1984-12-21
Filing date: 1984-12-21
Publication date: 1986-07-07
Also published as: JPH0346840B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a continuous speech recognition device, and in particular to an improvement of a device that recognizes sentence speech uttered continuously in accordance with a grammar.

(Prior Art) Among speech recognition devices, those that recognize sentence speech uttered in accordance with a grammar can recognize computer programs, sentences for restricted business use, and commands for air traffic control or for controlling various kinds of equipment. They therefore have a wide range of applications. It is known in principle that, when grammatical constraints are given, misrecognition can be prevented by exploiting the grammar rules. In particular, in continuous digit recognition, when the number of digits in the input speech is constrained, the recognition rate can be improved by expressing that constraint as a rule.

A method for recognizing sentence speech uttered continuously in accordance with such a grammar is described in the specification of Japanese Patent Application No. 59-68015, "Continuous Speech Recognition Device," filed by the inventor of the present application.

The principle of this method, diagonal blockwise DP matching, is roughly as follows. The grammar is expressed by an automaton α, which is defined as follows:

$$\alpha = \langle K, \Sigma, \Delta, p_0, F \rangle \qquad (1)$$

where

K: the set of states p (p = 1, 2, ..., π)
Σ: the set of input words n (n = 1, 2, ..., N)
Δ: the set of state-transition rules (p, q, n), where (p, q, n) denotes the state transition $p \xrightarrow{n} q$ (from state p to state q on word n)
$p_0$: the initial state, hereafter written p = 0

F: the set of final states, F ⊂ K

Next, the speech pattern A obtained by continuously uttering words n ∈ Σ in accordance with the automaton α is written

$$A = a_1, a_2, \ldots, a_m, \ldots, a_I \qquad (2)$$

and is called the (unknown) input pattern. For each word n, a standard pattern

$$B^n = b^n_1, b^n_2, \ldots, b^n_j, \ldots, b^n_{J_n} \qquad (3)$$

is prepared; this is called the word standard pattern. The word standard patterns are concatenated in accordance with the automaton α to form continuous-speech standard patterns $C = B^{n_1} \oplus B^{n_2} \oplus \cdots \oplus B^{n_k}$, and each C is DP-matched against the input pattern A. A quantity expressing how much the two patterns differ (hereafter called the dissimilarity) is computed, and the word sequence that gives the minimum dissimilarity is taken as the recognition result.
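The automaton α can be written down directly as data. The sketch below builds Δ, $p_0$ and F for the kind of digit-count constraint mentioned under (Prior Art). It then checks whether a word sequence is accepted. The helper names and the choice of a three-digit grammar are illustrative assumptions and are not taken from the patent.

```python
from typing import List, Set, Tuple

# A transition (p, q, n) means: from state p, word n leads to state q.
Transition = Tuple[int, int, int]


def digit_string_automaton(num_digits: int) -> Tuple[List[Transition], int, Set[int]]:
    """Hypothetical grammar: exactly `num_digits` spoken digits (words 0-9).

    Returns (Delta, p0, F) with states 0..num_digits, p0 = 0, F = {num_digits}.
    """
    delta = [(p, p + 1, n) for p in range(num_digits) for n in range(10)]
    return delta, 0, {num_digits}


def accepts(delta: List[Transition], p0: int, final: Set[int], words: List[int]) -> bool:
    """True if the word sequence can drive the automaton from p0 into a final state."""
    states = {p0}
    for n in words:
        states = {q for (p, q, w) in delta if p in states and w == n}
        if not states:  # no transition for this word: the sequence is not allowed
            return False
    return bool(states & final)


delta, p0, final = digit_string_automaton(3)
print(accepts(delta, p0, final, [4, 0, 7]))     # True
print(accepts(delta, p0, final, [4, 0]))        # False: too few digits
print(accepts(delta, p0, final, [4, 0, 7, 1]))  # False: too many digits
```

A continuous standard pattern C corresponds to one accepted word sequence. Recognition searches over all of them at once rather than enumerating them.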

The minimum dissimilarity is obtained by dynamic programming as follows. The initial conditions are

$$T(0, 0) = 0, \qquad T(m, q) = \infty \ \text{ for all other } (m, q) \qquad (4)$$

$$G(p, n, j) = \infty$$

Then, for blocks i = 1 to I/IL in turn, the recurrence of Eq. (7) is computed on the basis of the boundary conditions (5) and (6). For simplicity, I is assumed to be divisible by IL. The recurrence is computed for every pair (p, n) such that (p, q, n) ∈ Δ.

That is, the boundary condition (5) is set. For each standard-pattern time j = 1, ..., $J_n$, the block then covers the input times

$$m_{sj} = m_s + \left\lfloor \frac{j - 1}{a_{BL}} \right\rfloor, \qquad m_{ej} = m_{sj} + IL - 1$$

Here 1/$a_{BL}$ is the slope of the block, and ⌊X⌋ is the largest integer not exceeding X. With the boundary condition (6), the recurrence is computed from time m = $m_{sj}$ to m = $m_{ej}$. The boundary values $g(m_{ej}, j)$ and $h(m_{ej}, j)$ are stored in the tables G(p, n, j) and H(p, n, j), respectively.

After the computation of Eq. (7) finishes at standard-pattern time $j = J_n$, the minimization at the word boundary, Eq. (8), is carried out. As shown in Fig. 2, the recurrence (7) is computed by grouping IL frames of the input pattern into one block and tilting each block diagonally. The computation then proceeds one diagonal block at a time.
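The following sketch makes the block geometry of Fig. 2 concrete. For each standard-pattern frame j, it lists the input-frame range [$m_{sj}$, $m_{ej}$] covered by one diagonal block. It assumes that block number b starts at $m_s$ = (b − 1)·IL + 1 and that the start is shifted by ⌊(j − 1)/$a_{BL}$⌋ frames. Both are readings of the partly legible boundary definitions in the text, so treat them as assumptions.

```python
from typing import Dict, Tuple


def diagonal_block(b: int, IL: int, J: int, a_bl: int) -> Dict[int, Tuple[int, int]]:
    """Input-frame range [m_sj, m_ej] of diagonal block b (1-based) for j = 1..J.

    Assumptions: block b starts at m_s = (b - 1) * IL + 1 and slopes by 1/a_bl.
    """
    m_s = (b - 1) * IL + 1
    ranges = {}
    for j in range(1, J + 1):
        m_sj = m_s + (j - 1) // a_bl       # floor((j - 1) / a_BL)
        ranges[j] = (m_sj, m_sj + IL - 1)  # m_ej = m_sj + IL - 1
    return ranges


for j, (lo, hi) in diagonal_block(b=1, IL=5, J=10, a_bl=2).items():
    print(j, lo, hi)  # the block drifts one input frame to the right every a_BL rows
```

With IL = 5, J = 10 and $a_{BL}$ = 2, the first block spans input frames 1 to 5 at j = 1 and frames 5 to 9 at j = 10.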

Finally, the recognition result for the input pattern is obtained by the following decision procedure. The initial conditions are

$$q = \arg\min_{q \in F} T(I, q) \qquad (9)$$

$$m = I \qquad (10)$$

Eq. (11) is then evaluated. If m > 0, q and m are updated to the preceding state and word-boundary time, and Eq. (11) is repeated. If m = 0, the procedure ends.
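The overall matching and decision can be sketched without the block decomposition. The Python below is a plain frame-synchronous version that works in three stages:

1. For every pair (p, n), it runs a word-level recurrence.
2. It applies the word-boundary minimization, taking T(m, q) as the minimum over (p, q, n) ∈ Δ of g(m, $J_n$).
3. It traces stored back-pointers from the state q ∈ F that minimizes T(I, q) back to m = 0.

The exact forms of Eqs. (5) to (8), (10) and (11) are not given explicitly in the text. The following are therefore all assumptions:

- the local path constraint (steps from j, j − 1 and j − 2);
- scalar features with an absolute-difference distance;
- the back-pointer layout (n, p, start frame).

A blockwise implementation is intended to produce the same values while computing them in diagonal block order, saving g and h at block edges in G and H.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple

INF = math.inf
Transition = Tuple[int, int, int]  # (p, q, n): state p --word n--> state q


def automaton_dp(inp: Sequence[float], refs: Dict[int, Sequence[float]],
                 delta: List[Transition], p0: int):
    """Frame-synchronous DP over all (p, n) pairs; returns T and back-pointers.

    back[(m, q)] = (n, p, m0): the best word n ending in state q at frame m came
    from state p and started right after input frame m0.
    """
    states = {p0} | {p for p, _, _ in delta} | {q for _, q, _ in delta}
    T = {(0, q): (0.0 if q == p0 else INF) for q in states}  # initial condition
    back = {}
    pairs = {(p, n) for p, _, n in delta}
    # g[(p, n)][j]: accumulated distance at the previous frame; slot j = 0 is the word entry
    g = {pn: [INF] * (len(refs[pn[1]]) + 1) for pn in pairs}
    h = {pn: [0] * (len(refs[pn[1]]) + 1) for pn in pairs}  # word start frames
    for m in range(1, len(inp) + 1):
        for p, n in pairs:
            ref, prev, prev_h = refs[n], g[(p, n)], h[(p, n)]
            prev[0], prev_h[0] = T[(m - 1, p)], m - 1  # enter word n from state p
            cur, cur_h = [INF] * len(prev), [0] * len(prev)
            for j in range(1, len(ref) + 1):
                # assumed local paths: from (m-1, j), (m-1, j-1) or (m-1, j-2)
                best, start = min((prev[k], prev_h[k]) for k in (j, j - 1, j - 2) if k >= 0)
                cur[j], cur_h[j] = best + abs(inp[m - 1] - ref[j - 1]), start
            g[(p, n)], h[(p, n)] = cur, cur_h
        # word-boundary minimization over all transitions (p, q, n) entering q
        for q in states:
            T[(m, q)] = INF
        for p, q, n in delta:
            if g[(p, n)][-1] < T[(m, q)]:
                T[(m, q)] = g[(p, n)][-1]
                back[(m, q)] = (n, p, h[(p, n)][-1])
    return T, back


def decide(T, back, I: int, final: Set[int]) -> Optional[List[int]]:
    """Trace back from the best final state at frame I down to frame 0."""
    q, m = min(final, key=lambda s: T[(I, s)]), I
    if T[(I, q)] == INF:
        return None  # no word sequence allowed by the grammar fits the input
    words = []
    while m > 0:
        n, p, m0 = back[(m, q)]
        words.append(n)
        q, m = p, m0
    return words[::-1]


delta = [(0, 1, 0), (0, 1, 1), (1, 2, 0), (1, 2, 1)]  # hypothetical: exactly two words
refs = {0: [1.0, 1.0, 1.0], 1: [9.0, 9.0, 9.0]}
inp = [1.0, 1.0, 1.0, 9.0, 9.0, 9.0]
T, back = automaton_dp(inp, refs, delta, p0=0)
print(decide(T, back, len(inp), final={2}))  # [0, 1]
```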

(Problems of the Prior Art) In the method of the aforementioned Japanese Patent Application No. 59-68015, the standard patterns and the intermediate results G(p, n, j) and H(p, n, j) are read and written a number of times that is inversely proportional to the block width IL. The larger IL is, the shorter the memory access time.

On the other hand, taking 1/$a_{BL}$ as the slope of the block, the block width IL is subject to the constraint

$$\min_n \frac{J_n}{a_{BL}} \geq IL \qquad (12)$$

The maximum block width therefore depends on the minimum number of frames among the standard patterns. For example, $a_{BL}$ equals the maximum slope of the DP matching path, which is usually 2. If the shortest standard pattern has 10 frames, IL can therefore be set to 5. With IL = 5, however, standard patterns shorter than 10 frames cannot be processed.
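Constraint (12) and the cost argument above reduce to simple arithmetic. The sketch assumes integer frame counts and $a_{BL}$ = 2 by default. `max_block_width` returns the largest IL allowed by (12). `block_count` gives the number of diagonal blocks, I/IL rounded up, which the text relates to the number of table accesses. Both helper names are hypothetical.

```python
import math
from typing import Sequence


def max_block_width(frame_counts: Sequence[int], a_bl: int = 2) -> int:
    """Largest integer IL with min_n(J_n) / a_BL >= IL, i.e. constraint (12)."""
    return min(frame_counts) // a_bl


def block_count(num_input_frames: int, IL: int) -> int:
    """Number of diagonal blocks I / IL (rounded up if I is not a multiple of IL)."""
    return math.ceil(num_input_frames / IL)


print(max_block_width([10, 14, 23]))             # 5, as in the example above
print(max_block_width([7, 10, 14]))              # 3: a single 7-frame word forces a smaller IL
print(block_count(100, 5), block_count(100, 3))  # 20 34
```

The second call shows the drawback described in the text: one short word lowers IL for the whole vocabulary. The lower IL increases the number of block passes over a 100-frame input from 20 to 34.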

Thus, with the conventional method, if any standard pattern is shorter than IL × $a_{BL}$ frames, the block width IL must be reduced. This increases memory access time and lengthens the response time until the recognition result is obtained.

Moreover, the block width IL may be fixed by the control unit for the recurrence computation so that the user cannot change it. In that case, the user has to repeat registration until every standard pattern is longer than IL × $a_{BL}$ frames. Building the device so that IL can be changed, on the other hand, makes the circuitry more complex.

(Object of the Invention) An object of the present invention is to provide an improved continuous speech recognition device. When a standard pattern is shorter than the pattern length allowed at registration, the device expands that short pattern and replaces it with a standard pattern at least as long as the allowable length. This removes the drawbacks described above, so that recognition results are always output within a short, constant response time without changing the block width IL.

(Structure of the Invention) The continuous speech recognition device of the present invention comprises the following units:

- a pattern length verification unit that, when a standard pattern is registered, checks whether the standard pattern length is longer than the allowable pattern length;
- a pattern expansion unit that, when the pattern length verification unit finds the standard pattern length too short, expands the standard pattern to at least the allowable pattern length;
- a diagonal blockwise DP matching unit that, for all combinations specified by the finite state automaton, uses dynamic programming to compute the minimum distance between the input pattern and a continuous standard pattern formed by concatenating word standard patterns. The computation proceeds one diagonal block at a time, where each block has a predetermined time width on the input pattern and is tilted with respect to the standard pattern axis;
- a recognition output unit that outputs, as the recognition result, the combination of words that gives the minimum distance found by the diagonal blockwise DP matching unit.

(Embodiment) The present invention will now be described in detail with reference to the drawings.

FIG. 1 is a block diagram of one embodiment of the present invention. When speech is input through microphone 1, speech analysis unit 2 analyzes it, for example by frequency analysis, and converts it into a time series of feature vectors. At the same time, it performs speech detection to determine the time interval in which speech is present.
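Speech detection only has to deliver the speech interval, and hence its length l, to pattern length verification unit 3. The patent does not specify the detector. The sketch below assumes a simple per-frame energy threshold, which is one common choice, and returns the first and last frames above that threshold. The function name and threshold value are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def detect_interval(energies: Sequence[float], threshold: float) -> Optional[Tuple[int, int]]:
    """Return (first, last) frame indices whose energy exceeds `threshold`, or None.

    An energy threshold is an assumed, simplified detector; the patent does not specify one.
    """
    voiced = [i for i, e in enumerate(energies) if e > threshold]
    if not voiced:
        return None
    return voiced[0], voiced[-1]


interval = detect_interval([0.1, 0.2, 3.0, 4.5, 5.0, 0.3], threshold=1.0)
start, end = interval
print(interval, "length l =", end - start + 1)  # (2, 4) length l = 3
```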

The user first utters each word of the predetermined word set Σ = {n}, and the resulting patterns are registered as standard patterns in standard pattern memory 6. During registration, switch S1 is set to the lower side. Speech analysis unit 2 obtains the speech interval length l and sends it to pattern length verification unit 3, where a comparator compares it with the allowable pattern length $l_{mt}$. If $l \geq l_{mt}$, switches S2 and S3 are set to side A via control line C. The time series of feature vectors obtained by speech analysis unit 2 is then stored in standard pattern memory 6 as a standard pattern.

If instead $l < l_{mt}$, switches S2 and S3 are set to side B via control line C. The feature vector time series obtained by speech analysis unit 2 is then sent to pattern expansion unit 4. In pattern expansion unit 4, $l_{mt} - l$ feature vectors are inserted at equal intervals, so that the l-frame pattern is linearly expanded into an $l_{mt}$-frame pattern. Each inserted feature vector is identical to an adjacent feature vector.
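The registration path through pattern length verification unit 3 and pattern expansion unit 4 can be sketched as follows. The frame mapping ⌊k·l / $l_{mt}$⌋ is one way, and an assumption here, to spread the $l_{mt} - l$ duplicated frames evenly. Since l ≤ $l_{mt}$, every original frame is kept and each inserted frame copies an adjacent one, as the text requires. The function names are hypothetical.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")  # a feature vector; any object works for this sketch


def linear_stretch(pattern: Sequence[Frame], l_mt: int) -> List[Frame]:
    """Stretch an l-frame pattern to l_mt frames by repeating frames at even spacing."""
    l = len(pattern)
    if l == 0 or l > l_mt:
        raise ValueError("expected 0 < l <= l_mt")
    # output frame k copies input frame floor(k * l / l_mt); every input frame is used
    return [pattern[k * l // l_mt] for k in range(l_mt)]


def register(pattern: Sequence[Frame], l_mt: int) -> List[Frame]:
    """Pattern length verification: store as-is (side A) or stretch first (side B)."""
    if len(pattern) >= l_mt:
        return list(pattern)
    return linear_stretch(pattern, l_mt)


print(register(["a", "b", "c"], 5))        # ['a', 'a', 'b', 'b', 'c']
print(len(register(list(range(12)), 10)))  # 12: already long enough, kept as-is
```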

The pattern expanded by pattern expansion unit 4 is stored in standard pattern memory 6.

After all the words have been registered in this way, switch S1 is set to the R side and recognition begins. As during registration, speech analysis unit 2 converts the unknown input speech into a time series of feature vectors, which is stored in input pattern memory 5.

Next, diagonal blockwise DP matching unit 7 performs DP matching between the input pattern and the standard patterns. It computes Eqs. (5), (6), (7) and (8), using Eq. (4) as the initial condition. The embodiment described in Japanese Patent Application No. 59-68015 can be used as diagonal blockwise DP matching unit 7.

Finally, to output the recognition result, decision unit 8 computes Eqs. (9), (10) and (11) and outputs the recognized words n. The decision unit described in Japanese Patent Application No. 59-68015 can also be used as decision unit 8.

Although the present invention has been described above on the basis of an embodiment, this description does not limit the scope of the invention. The pattern expansion unit of this embodiment expands patterns linearly, but nonlinear expansion is also conceivable. For example, the pattern to be expanded can be DP-matched against a pattern already registered for the same category. The resulting DP matching path gives the time correspondence between the two patterns, and the pattern is expanded according to that correspondence.
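The nonlinear alternative can be sketched with an ordinary DTW alignment between the short pattern and an already registered pattern of the same word. The following are all assumptions:

- the standard three-step DTW (diagonal, vertical and horizontal moves);
- an absolute-difference distance;
- the rule of taking the first short-pattern frame aligned to each registered frame.

The result takes the registered pattern's length, so it meets the length requirement whenever that pattern does.

```python
import math
from typing import List, Sequence, Tuple


def dtw_path(x: Sequence[float], r: Sequence[float]) -> List[Tuple[int, int]]:
    """Minimum-cost alignment path between x and r as 0-based index pairs."""
    n, m = len(x), len(r)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - r[j - 1])
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # backtrack from (n, m) to (1, 1), always moving to the cheapest predecessor
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
    return path[::-1]


def nonlinear_stretch(short: Sequence[float], registered: Sequence[float]) -> List[float]:
    """Re-time `short` onto the time axis of an already registered pattern of the same word."""
    first_match = {}
    for i, j in dtw_path(short, registered):
        first_match.setdefault(j, i)  # first short-pattern frame aligned to registered frame j
    return [short[first_match[j]] for j in range(len(registered))]


print(nonlinear_stretch([1, 5, 9], [1, 1, 5, 5, 9, 9]))  # [1, 1, 5, 5, 9, 9]
```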

(Effects of the Invention) As explained above, the continuous speech recognition device of the present invention verifies the pattern length at registration and expands any pattern shorter than the allowable length. Every standard pattern is therefore always at least the allowable pattern length. This removes the need to reduce the block width IL, the unit of computation of the diagonal blockwise DP matching unit, so recognition results are always obtained within a short, constant response time. With the conventional method, a short standard pattern had to be re-registered until it became long enough. The present invention makes re-registration unnecessary and thus improves usability.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention. FIG. 2 is a diagram showing the computation procedure of diagonal blockwise DP matching. The reference numerals in the figures are:

1: microphone
2: speech analysis unit
3: pattern length verification unit
4: pattern expansion unit
5: input pattern memory
6: standard pattern memory
7: diagonal blockwise DP matching unit
8: decision unit

Claims

[Claims]

A continuous speech recognition device that recognizes, by DP matching against standard patterns, speech in which a word string specified by a finite state automaton is uttered continuously, the device being characterized by comprising: a pattern length verification unit that, when a standard pattern is registered, verifies whether the standard pattern length is shorter than an allowable pattern length; a pattern expansion unit that, when the pattern length verification unit determines that the standard pattern length is short, expands the standard pattern to at least the allowable pattern length; a diagonal blockwise DP matching unit that, for all combinations specified by the finite state automaton, uses dynamic programming to compute the minimum distance between the input pattern and a continuous standard pattern formed by concatenating word standard patterns, one diagonal block at a time, each block having a predetermined time width on the input pattern and being tilted with respect to the standard pattern axis; and a recognition output unit that outputs, as the recognition result, the combination of words that yields the minimum distance determined by the diagonal blockwise DP matching unit.