JPH0577080B2

Info

Publication number: JPH0577080B2
Application number: JP62298594A
Authority: JP
Inventors: Hiromi Fujii
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1987-11-25
Filing date: 1987-11-25
Publication date: 1993-10-25
Also published as: JPH01138596A

Description

DETAILED DESCRIPTION OF THE INVENTION
(Field of Industrial Application) The present invention relates to an improvement in a speech recognition device that recognizes uttered speech at high speed.

(Prior Art) Speech recognition is an important technology for realizing good man-machine interfaces, and speech recognition devices are already in use in various fields. Most current devices employ recognition based on pattern matching. In this method, the time series of feature vectors $B^n = b^n_1 \dots b^n_j \dots b^n_J$ of each uttered word n to be recognized is held in advance as a standard pattern. The input utterance pattern $A = a_1 \dots a_i \dots a_I$ (hereinafter, the input pattern) is compared with the stored standard patterns, and the word name of the most similar standard pattern is taken as the recognition result. To obtain the inter-pattern distance while aligning the time axis i of the input pattern with the time axis j of the standard pattern, DP matching is used, which performs a nonlinear alignment by dynamic programming. DP matching solves the optimization problem of minimizing the sum of the vector distances $d^n(i, j)$ between $a_i$ and $b^n_j$, using a recurrence such as equation (1).

$$g^n(i, j) = d^n(i, j) + \min_{p = 0, 1, 2} g^n(i-1, j-p) \tag{1}$$

Here, $g^n(i, j)$ is the cumulative value of $d^n(i, j)$ from (1, 1) to (i, j). Details of DP matching are described in "Two-stage DP matching for efficiently recognizing continuously uttered word speech," Nikkei Electronics, November 7, 1983 issue, pp. 171 to 208 (hereinafter, Document 1).
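As an illustration of recurrence (1), the following is a minimal sketch for a single word, not the patent's implementation. It assumes a precomputed local distance table `d` (0-based indices, whereas the text is 1-based) and assumes the path must start at the first frames of both patterns; the function name `dp_distance` is hypothetical.

```python
import math

def dp_distance(d):
    """Cumulative distance g(I, J) for one word, following recurrence (1).

    d[i][j] is the local distance between input frame i and
    standard-pattern frame j (0-based here).
    """
    num_frames, num_ref = len(d), len(d[0])
    inf = math.inf
    # Boundary assumption: the path starts at (1, 1), so g(1, 1) = d(1, 1).
    prev = [inf] * num_ref
    prev[0] = d[0][0]
    for i in range(1, num_frames):
        cur = [inf] * num_ref
        for j in range(num_ref):
            # min over p = 0, 1, 2 of g(i-1, j-p), skipping j-p < 0
            best = min(prev[j - p] for p in range(3) if j - p >= 0)
            cur[j] = d[i][j] + best
        prev = cur
    return prev[num_ref - 1]
```

For scalar features, `d` can be built as `[[abs(a - b) for b in ref] for a in inp]`. Only the previous column of `g` is kept, which is what makes frame-synchronous processing possible.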

Recognition algorithms based on DP matching are currently the most widely used, but their large amount of calculation has been a problem. One method for reducing the amount of calculation is the clockwise DP method described in Japanese Patent Application Laid-Open No. 58-98796. This method improves real-time performance by carrying out a recurrence calculation such as equation (1) in synchronization with the time axis i of the input pattern. Processing synchronized with the time axis is realized by obtaining, at time i of the input pattern, the cumulative distance $g^n(i, j)$ for every word n and every time j of its standard pattern. As a further speed-up, Japanese Patent Application No. 62-61732 describes a method that introduces pruning into the clockwise DP method. This method is briefly explained below.

As the recurrence calculation proceeds, this method performs it only for those (n, j) that satisfy a certain condition and omits it for those (n, j) that do not. This pruning reduces the number of recurrence calculations. One possible pruning condition is to take the minimum value $g_{\min}$ of $g^n(i, j)$ at time i, add a margin α (the beam width factor), and use the sum as the threshold θ(i) at time i. In this case, only those (n, j) with $\theta(i) > g^n(i, j)$ remain subject to the recurrence calculation, and all other (n, j) are excluded from it.
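The pruning condition can be sketched for a single input frame as follows. This is an illustrative sketch only: the function name `prune` and the dictionary layout are assumptions, and α is assumed to be positive (with α = 0 the strict inequality would discard even the minimum).

```python
def prune(g, alpha):
    """Return the set of (n, j) kept at one input frame i.

    g maps (n, j) -> cumulative distance g^n(i, j) at the current frame.
    alpha is the beam margin; the threshold is theta(i) = g_min + alpha.
    """
    g_min = min(g.values())
    theta = g_min + alpha
    # Keep only cells with theta(i) > g^n(i, j); the rest are pruned.
    return {nj for nj, value in g.items() if value < theta}
```

For example, with `g = {(0, 0): 1.0, (0, 1): 2.5, (1, 0): 4.0}` and `alpha = 2.0`, the threshold is 3.0 and the cell `(1, 0)` is pruned.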

When the input pattern frame $a_i$ is input, the recurrence calculation is carried out by the clockwise method while pruning is performed. FIG. 2 shows the algorithm. In FIG. 2, the obtained $g^n(i, j)$ is compared with the threshold θ(i), and if $g^n(i, j) > \theta(i)$, the recurrence calculation is omitted and processing moves on with j = j + 1. In this way, processing runs from i = 1 to I - 1 in synchronization with the input of the input pattern. When the last feature vector $a_I$ of the input pattern is input, the matching unit performs the processing of equation (2) at i = I and obtains, for each word n, the distance $D(A, B^n)$ to the input pattern.

$$D(A, B^n) = g^n(I, J) = g^n(J) + d^n(I, J) \tag{2}$$

where $g^n(J) = \min\{\, g^n(I-1, J),\ g^n(I-1, J-1),\ g^n(I-1, J-2) \,\}$.

Next, the obtained values $D(A, B^n)$ are compared in turn, the minimum is found, and the corresponding n is output as the recognition result.
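The final step at i = I can be sketched as two small helpers. The names `final_distance` and `best_word` are hypothetical, and the list and dictionary layouts are assumptions made for illustration.

```python
def final_distance(g_prev, d_last):
    """D(A, B^n) per equation (2).

    g_prev is the list g^n(I-1, j) over j (0-based), and d_last is d^n(I, J).
    """
    num_ref = len(g_prev)
    # g^n(I-1, J-2), g^n(I-1, J-1), g^n(I-1, J), clipped at the first frame
    candidates = g_prev[max(0, num_ref - 3):]
    return min(candidates) + d_last

def best_word(distances):
    """Return the word n with the smallest D(A, B^n); distances maps n -> D."""
    return min(distances, key=distances.get)
```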

(Problems to be Solved by the Invention) In conventional recognition with pruning, it can happen that the (n, j) left after pruning at some time i in the middle of a word contain only one value of n. In that case, no word other than n can become the recognition result, however far the recurrence calculation is continued from i + 1. Nevertheless, a conventional speech recognition device has to compute the inter-pattern distance $D(A, B^n)$ to the standard pattern over the whole input pattern, from its start to its end, before it obtains a recognition result. Since the result is already uniquely determined at the moment pruning leaves a single n, the calculations from i + 1 to I are redundant as far as obtaining the result is concerned, and they lowered the recognition speed.

An object of the present invention is to provide a speech recognition device that eliminates the wasted calculation described above and obtains the recognition result more quickly.

(Means for Solving the Problems) The speech recognition device according to the present invention comprises the following units: a standard pattern storage unit that holds, for each word n, the time series of speech feature vectors $B^n = b^n_1 \dots b^n_j \dots b^n_J$ as a standard pattern; a threshold storage unit that stores the pruning threshold; a matching unit that sequentially reads the feature vector $a_i$ of the input speech at time i, obtains at each time i the cumulative distance $g^n(i, j)$ of the distances $d^n(i, j)$ between $a_i$ and the standard patterns in the standard pattern storage unit for those (n, j) that satisfy the condition determined by the threshold in the threshold storage unit, and, when the (n, j) satisfying the condition contain only one value of n, outputs word n as the recognition result; a cumulative distance storage unit that stores the cumulative distances obtained by the matching unit; and a determination unit that outputs, as the recognition result, the word n giving the minimum of the cumulative distances $g^n(I, J)$ obtained by the matching unit at time I.

(Operation) The speech recognition device according to the present invention is characterized in that, as the recurrence calculation proceeds, if the unpruned (n, j) contain only one value of n, it outputs word n as the recognition result without waiting for the end of the input pattern. That the unpruned (n, j) contain only one n means that no word other than n can become the recognition result, however much further the recurrence calculation is carried. Therefore, by outputting n as the recognition result at the moment only one n remains, the subsequent recurrence calculations can be omitted and the recognition processing can be sped up.

A speech recognition device with this function can be realized by adding a test of whether the unpruned (n, j) contain only one value of n. If only one n remains, n is output as the result at that point. If two or more remain, the recurrence processing continues at i + 1.
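The test described here can be sketched as a small helper that inspects the set of unpruned (n, j) pairs. The name `unique_surviving_word` and the set-of-tuples representation are assumptions for illustration.

```python
def unique_surviving_word(survivors):
    """Return word n if every unpruned (n, j) shares the same n, else None.

    survivors is an iterable of (n, j) pairs left after pruning at time i.
    """
    words = {n for n, _ in survivors}
    if len(words) == 1:
        return next(iter(words))
    return None  # zero or several candidate words: keep processing at i + 1
```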

(Embodiment) An embodiment of the present invention is described in detail below with reference to the drawings. FIG. 1 is a block diagram showing one embodiment of the present invention.

In the standard pattern storage unit 1 of FIG. 1, the time-series data of each word n to be recognized (1 ≤ n ≤ N), uttered in advance, is stored as the standard pattern $B^n = b^n_1 \dots b^n_j \dots b^n_J$, and the threshold storage unit 2 is assumed to hold in advance the pruning threshold θ(i) for time i. The uttered input pattern A is analyzed in real time and fed sequentially to the matching unit 3 as a time series of feature vectors $a_i$.

For each input $a_i$, the matching unit 3 performs the recurrence calculation with pruning over n and j and obtains $g^n(i, j)$. This processing uses the conventional method that introduces pruning into the clockwise DP method (the above-mentioned Japanese Patent Application No. 62-61732). The matching unit 3 reads the threshold θ(i) from the threshold storage unit 2 and performs the recurrence calculation for those (n, j) that satisfy $g^n(i, j) < \theta(i)$. The conventional processing shown in FIG. 2 can be used for this recurrence calculation with pruning. The cumulative distances $g^n(i, j)$ obtained at time i are held in the cumulative distance storage unit 4, and the matching unit 3 reads them out and uses them in the recurrence calculation at i + 1.

In addition to the above processing, the matching unit 3 tests whether the (n, j) contain only one value of n and, if so, outputs n as the recognition result. FIG. 3 shows the algorithm of the matching unit 3 including this processing. The part inside the double frame is the test of whether only one n remains. n′ is a flag indicating whether word n has any j that is not pruned; if such a j exists, n′ takes the value of the word number n. nn is the number of distinct n among the (n, j) that are not pruned at time i. After the processing for time i is finished, n′ is output as the recognition result if the test nn < 2 is satisfied.
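The flow described for the matching unit can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the flowchart of FIG. 3: features are scalars with absolute difference as the local distance, the threshold is computed as θ(i) = g_min + α rather than read from a stored table, α is assumed positive, and the name `recognize_early` is hypothetical. The set `alive` plays the role of nn and n′ in the text.

```python
import math

def recognize_early(frames, templates, alpha):
    """Frame-synchronous DP with beam pruning and early decision.

    frames: input feature values a_i (scalars, for brevity).
    templates: dict mapping word n -> list of standard-pattern values b^n_j.
    Returns (word, 0-based frame index at which the decision was made).
    """
    inf = math.inf
    g = {n: [inf] * len(b) for n, b in templates.items()}  # g^n(i-1, j)
    last = len(frames) - 1
    for i, a in enumerate(frames):
        new_g = {}
        for n, b in templates.items():
            row = [inf] * len(b)
            for j, bj in enumerate(b):
                if i == 0:
                    best = 0.0 if j == 0 else inf  # paths start at (1, 1)
                else:
                    best = min(g[n][j - p] for p in range(3) if j - p >= 0)
                if best < inf:  # skip cells with no unpruned predecessor
                    row[j] = abs(a - bj) + best
            new_g[n] = row
        g = new_g
        if i == last:
            break  # final frame: decide by g^n(I, J), as in equation (2)
        theta = min(min(row) for row in g.values()) + alpha
        alive = set()  # words that still have an unpruned j
        for n, row in g.items():
            for j, value in enumerate(row):
                if value < theta:
                    alive.add(n)
                else:
                    row[j] = inf  # pruned
        if len(alive) == 1:  # corresponds to the nn < 2 test
            return alive.pop(), i
    return min(g, key=lambda n: g[n][-1]), last
```

With a narrow beam the decision can come at the first frame, while words that share a prefix keep both candidates alive until the frames that distinguish them arrive.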

If, under this algorithm, the unpruned (n, j) always contain two or more values of n, the recurrence calculation up to time I - 1 and the processing of equation (2) at time I are carried out as in the conventional method, and the inter-pattern distances between the input pattern A and all standard patterns $B^n$ are obtained. The determination unit 5 outputs as the result the standard pattern that gives the minimum among the inter-pattern distances, obtained by the matching unit 3, between the input pattern A and all standard patterns $B^n$, n = 1 ... N.

(Effects of the Invention) As described above, according to the present invention, when the unpruned (n, j) contain only one value of n, n is output immediately as the result. This reduces the amount of calculation and makes it possible to realize a speech recognition device with a higher recognition processing speed.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing one embodiment of the present invention, FIG. 2 is a flowchart showing the processing algorithm in the matching unit of the conventional method, and FIG. 3 is a flowchart showing the processing algorithm in the matching unit of FIG. 1.
1 ... standard pattern storage unit, 2 ... threshold storage unit, 3 ... matching unit, 4 ... cumulative distance storage unit, 5 ... determination unit.

Claims

[Claims]

1. A speech recognition device characterized by comprising: a standard pattern storage unit that holds, for each word n, the time series of speech feature vectors $B^n = b^n_1 \dots b^n_j \dots b^n_J$ as a standard pattern; a threshold storage unit that stores a pruning threshold; a matching unit that sequentially reads the feature vector $a_i$ of the input speech at time i, obtains at each time i the cumulative distance $g^n(i, j)$ of the distances $d^n(i, j)$ between the input speech feature $a_i$ and the standard patterns in the standard pattern storage unit for those values of (n, j) that satisfy the condition determined by the threshold in the threshold storage unit, and, when the (n, j) satisfying the condition contain only one value of n, outputs word n as the recognition result; a cumulative distance storage unit that stores the cumulative distances obtained by the matching unit; and a determination unit that outputs, as the recognition result, the word n that gives the minimum of the cumulative distances $g^n(I, J)$ obtained by the matching unit at time I.