JPH0449719B2

Info

Publication number: JPH0449719B2
Application number: JP58048103A
Authority: JP
Inventors: Hidekazu Tsuboka
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1983-03-22
Filing date: 1983-03-22
Publication date: 1992-08-12
Also published as: JPS59172698A

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of Industrial Application

The present invention relates to a speech recognition device, and more particularly to a speech recognition device capable of recognizing both monosyllabic speech and word speech.

Structure of the Conventional Example and Its Problems

Speech-input word processors have already been commercialized in which the kana keyboard of a word processor with a kana-kanji conversion function is replaced by a monosyllable speech recognition device that recognizes monosyllables uttered one at a time with pauses between them.

In such products, speech recognition is of the so-called speaker-dependent type, in which the user must register his or her own voice as standard patterns in advance. To enter function commands such as "line feed", "delete", and "convert", or other frequently used words, by voice as spoken words, sequences of feature vectors must be registered as standard patterns not only for the monosyllables but also for the words. As a result, the required storage capacity grows as the number of words increases, and the amount of computation also grows considerably, because distances must be calculated between the feature vectors of the input pattern and the feature vector sequences of both the monosyllables and the words.

Object of the Invention

An object of the present invention is to solve the above problems and to provide a speech recognition device that can recognize both monosyllables and words while suppressing increases in storage capacity and amount of computation.

Structure of the Invention

In the speech recognition device of the present invention, monosyllable standard patterns are registered in advance and monosyllables uttered separately are recognized. In addition, each word standard pattern is formed as a concatenation of monosyllable standard patterns, and word recognition is performed using the inter-vector distances already obtained for each monosyllable, so that both monosyllable recognition and word recognition can be performed.

Description of the Embodiments

Before the embodiments of the present invention are described, a word speech recognition device based on pattern matching is explained. Such a device generally has the following configuration.

The device consists of: feature extraction means that converts the input speech signal into a sequence of feature vectors by means of a filter bank, frequency analysis, LPC analysis, or the like; standard pattern storage means in which the feature vector sequences of utterances made in advance, extracted by the feature extraction means, are registered as standard patterns for all of the words to be recognized; pattern comparison means that calculates the similarity or distance, as sequences of feature vectors, between the input pattern (extracted by the feature extraction means from an utterance made for recognition) and every standard pattern stored in the standard pattern storage means; and determination means that outputs, as the recognition result, the word corresponding to the standard pattern with the highest similarity (smallest distance) found by the pattern comparison.

Even when the same speaker utters the same word, the duration of the utterance differs each time. When the pattern comparison means compares a standard pattern with an input pattern, the time axes of the two must therefore be expanded or compressed so that the pattern lengths match. Because the change in duration does not occur uniformly across the parts of the uttered word, the parts must be expanded or compressed non-uniformly.

Fig. 1 illustrates this. In Fig. 1(a), the horizontal axis is the $i$ coordinate corresponding to the input pattern $T = a_1 a_2 \cdots a_I$ (where $a_i$ is the feature vector of the $i$-th frame of the input pattern), and the vertical axis is the $j$ coordinate corresponding to the standard pattern $R^n = b^n_1 b^n_2 \cdots b^n_{J^n}$ (where $b^n_j$ is the feature vector of the $j$-th frame of the standard pattern $R^n$). Matching the input pattern $T$ against the standard pattern $R^n$ with nonlinear expansion and compression of the time axis means finding, on this lattice graph, the path 1 that indicates the correspondence between the feature vectors of the two patterns, under the criterion that the distance between the two patterns as sequences is minimized. The distance obtained in this way is taken as the distance between the two patterns. A well-known method for performing this calculation efficiently uses dynamic programming and is called DP matching.

When this path is determined, constraints are imposed that take the nature of speech into account. Fig. 1(b) shows an example of such a path selection condition, called a slope constraint. In this example, the path reaching point $(i, j)$ must be one of the following three: the path from point $(i-2, j-1)$ through point $(i-1, j)$, the path from point $(i-1, j-1)$, or the path from point $(i-1, j-2)$ through point $(i, j-1)$. If the additional condition is imposed that the start points and end points of the input pattern and the standard pattern must correspond, the matching path is confined to the shaded region in Fig. 1(a). The purpose of this constraint is to prevent excessively extreme correspondences, given that, however much the time axis may expand or contract, it should not do so to an extreme degree for the same word.

The distance between the two sequences is defined as the weighted average, along the path, of the inter-vector distances $d^n(i, j)$ between the input vector $a_i$ and the standard pattern vector $b^n_j$.

If the weights are chosen so that their sum along the path is constant regardless of which path is selected, the DP matching technique can be applied.
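The role of the constant weight sum can be written out explicitly. The following is only a sketch in notation that does not appear in the original: $F = \{(i_k, j_k)\}_{k=1}^{K}$ is assumed to denote an admissible matching path, $w_k$ the weight attached to its $k$-th point, and $W$ the common value of the weight sum.

```latex
D(T, R^n) \;=\; \min_{F}\;
  \frac{\sum_{k=1}^{K} w_k \, d^n(i_k, j_k)}{\sum_{k=1}^{K} w_k},
\qquad
\sum_{k=1}^{K} w_k = W \ \text{for every admissible } F
\;\Longrightarrow\;
D(T, R^n) \;=\; \frac{1}{W}\, \min_{F} \sum_{k=1}^{K} w_k \, d^n(i_k, j_k).
```

Once the denominator no longer depends on the path, minimizing the weighted average is the same as minimizing the weighted sum. That sum is additive along the path, so its minimum can be built up point by point with a recurrence, which is what dynamic programming requires. For instance, if every step that advances the input frame index by one carries a total weight of 1, then $W = I$ for every path.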

Fig. 2 illustrates the matching between an input pattern and a word standard pattern constructed by concatenating monosyllable speech standard patterns. In the figure, $R^{q(1)}$, $R^{q(2)}$, and $R^{q(3)}$ denote the standard patterns of the monosyllables $q(1)$, $q(2)$, and $q(3)$, and the example shows the input pattern being matched against the standard pattern of a word consisting of the monosyllables $q(1)$, $q(2)$, and $q(3)$. In accordance with the above explanation, the matching path becomes, for example, the path 2.

An embodiment of the present invention that uses the pattern matching technique described above is explained below.

Fig. 3 is a block diagram showing one embodiment of the present invention. In the figure, 3 is an input terminal for the speech signal, and 4 is a feature extraction unit, composed of a filter bank or the like, that converts the input speech signal into a sequence of feature vectors. 5 is a monosyllable standard pattern storage unit, which stores the pattern of each monosyllable converted into a sequence of feature vectors. 6 is an inter-vector distance calculation unit, which calculates the distance $d^n(i, j)$ between a vector $b^n_j$ of a standard pattern $R^n$ in the monosyllable standard pattern storage unit 5 and a vector $a_i$ of the input pattern. With $a_i = (a_{i1}, a_{i2}, \ldots, a_{il})$ and $b^n_j = (b^n_{j1}, b^n_{j2}, \ldots, b^n_{jl})$, the distance $d^n(i, j)$ is given most simply by

$$d^n(i, j) = \sum_{k=1}^{l} \left| a_{ik} - b^n_{jk} \right|.$$
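The simplest form of the inter-vector distance above is a sum of absolute differences over the vector components. A minimal Python sketch follows; the function names `city_block_distance` and `distance_table` are illustrative assumptions, not names from the original, and feature vectors are assumed to be plain lists of numbers.

```python
def city_block_distance(a, b):
    """Return sum_k |a[k] - b[k]| for two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same dimension")
    return sum(abs(x - y) for x, y in zip(a, b))


def distance_table(input_pattern, standard_pattern):
    """Return d[i][j] for every input frame i and standard-pattern frame j.

    Indices are 0-based here, whereas the text counts frames from 1.
    """
    return [[city_block_distance(a, b) for b in standard_pattern]
            for a in input_pattern]
```

A table like this, computed once per monosyllable standard pattern, plays the role of the contents held by the inter-vector distance storage unit 7.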

7 is an inter-vector distance storage unit, which stores the results calculated by the inter-vector distance calculation unit 6.

8 is a monosyllable cumulative distance calculation unit, which obtains, for each monosyllable, the cumulative distance from the first frame to the current frame (the weighted sum of $d^n(i, j)$ along the matching path). If Fig. 1(b) is adopted as the constraint on the matching path and the weighting coefficients along each path are the values marked on the paths in that figure, the cumulative distance $D^n(i, j)$ for the standard pattern $R^n$ at coordinates $(i, j)$ is given by

$$D^n(i, j) = \min \begin{cases} D^n(i-2, j-1) + d^n(i-1, j) + d^n(i, j) \\ D^n(i-1, j-1) + d^n(i, j) \\ D^n(i-1, j-2) + 0.5\, d^n(i, j-1) + 0.5\, d^n(i, j) \end{cases} \tag{1}$$

9 is a monosyllable determination unit: letting $\hat{n}$ be the value of $n$ that minimizes $D^n(I, J^n)$, it outputs the monosyllable corresponding to $R^{\hat{n}}$ as the recognition result.
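The recurrence of equation (1) and the decision rule of the monosyllable determination unit can be sketched in Python as follows. This is an illustration under stated assumptions rather than the patented implementation: the three predecessor paths and their weights (1, 1 and 0.5) are taken to be those of the slope constraint of Fig. 1(b), the initial value $D(1,1) = d(1,1)$ is assumed, indices are 0-based, and all function names are hypothetical.

```python
import math


def frame_distance(a, b):
    """City-block distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dp_distance(d):
    """Cumulative distance at the end point under the three-path constraint.

    d[i][j] is the distance between input frame i and standard-pattern
    frame j. Returns math.inf when no permitted path joins the end points.
    """
    n_in, n_ref = len(d), len(d[0])
    D = [[math.inf] * n_ref for _ in range(n_in)]
    D[0][0] = d[0][0]  # assumed initial value: start points must correspond
    for i in range(1, n_in):
        for j in range(1, n_ref):
            # path from (i-1, j-1)
            best = D[i - 1][j - 1] + d[i][j]
            if i >= 2:
                # path from (i-2, j-1) through (i-1, j)
                best = min(best, D[i - 2][j - 1] + d[i - 1][j] + d[i][j])
            if j >= 2:
                # path from (i-1, j-2) through (i, j-1)
                best = min(best,
                           D[i - 1][j - 2] + 0.5 * d[i][j - 1] + 0.5 * d[i][j])
            D[i][j] = best
    return D[-1][-1]


def recognize_monosyllable(input_pattern, standard_patterns):
    """Return (label, scores): the label whose pattern gives the smallest D."""
    scores = {}
    for label, ref in standard_patterns.items():
        d = [[frame_distance(a, b) for b in ref] for a in input_pattern]
        scores[label] = dp_distance(d)
    return min(scores, key=scores.get), scores
```

With these weights every step that advances the input frame by one contributes a total weight of 1, so the weight sum equals the number of input frames for every path, and the scores for different standard patterns can be compared directly.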

10 is a word dictionary, in which each word to be recognized is prepared by entering it, from a keyboard or the like, as a string of the monosyllable symbols that represent it.

11 is a word cumulative distance calculation unit. For each word to be matched, it reads out the already calculated inter-vector distances stored in the inter-vector distance storage unit 7, in the order of the monosyllables specified in the word dictionary 10, and calculates the cumulative distance up to point $(i, j)$ for the word. For example, in Fig. 2, at the $i$-th frame the inter-vector distances $d^n(i, j)$ between each vector $b^n_j$ of the monosyllable standard patterns $R^n = b^n_1 b^n_2 \cdots b^n_{J^n}$, for $n = 1, 2, \ldots, N$ ($N$ being the number of monosyllables), and the vector $a_i$ of the $i$-th frame of the input pattern $T = a_1 a_2 \cdots a_I$ have already been calculated during monosyllable recognition. The inter-vector distances between $a_i$ and the concatenated pattern of $R^{q(1)}$, $R^{q(2)}$, and $R^{q(3)}$,

$$R^{q(1)} R^{q(2)} R^{q(3)} = b^{q(1)}_1 b^{q(1)}_2 \cdots b^{q(1)}_{J^{q(1)}}\; b^{q(2)}_1 b^{q(2)}_2 \cdots b^{q(2)}_{J^{q(2)}}\; b^{q(3)}_1 b^{q(3)}_2 \cdots b^{q(3)}_{J^{q(3)}},$$

therefore do not need to be calculated anew.

12 is a word determination unit. After the input is complete, it selects, from the final cumulative distances obtained for each word by the word cumulative distance calculation unit 11, the word that gives the minimum value as the word recognition result.

13 is a monosyllable/word discrimination unit, which discriminates whether the input speech was a monosyllable or a word. This can be specified from the keyboard, but it can also be decided automatically from the length of the speech interval: since a word takes longer to utter than a monosyllable, the input can be judged to be a word or a monosyllable according to whether the speech interval, obtained from the power of the input signal by an ordinary method, exceeds a predetermined threshold.

14 is a recognition result switching unit, which switches between the output of the monosyllable determination unit 9 and that of the word determination unit 12 according to the result of the monosyllable/word discrimination unit 13. 15 is an output terminal for the recognition result.
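The reuse of stored inter-vector distances for word matching can be sketched as follows. The code assumes the same three-path recurrence as equation (1), 0-based indices, and a word dictionary given as a mapping from a word to its list of monosyllable symbols; the function names and the romanized syllable labels in the usage note are illustrative assumptions.

```python
import math


def frame_distance(a, b):
    """City-block distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dp_distance(d):
    """End-point cumulative distance under the three-path slope constraint."""
    n_in, n_ref = len(d), len(d[0])
    D = [[math.inf] * n_ref for _ in range(n_in)]
    D[0][0] = d[0][0]
    for i in range(1, n_in):
        for j in range(1, n_ref):
            best = D[i - 1][j - 1] + d[i][j]
            if i >= 2:
                best = min(best, D[i - 2][j - 1] + d[i - 1][j] + d[i][j])
            if j >= 2:
                best = min(best,
                           D[i - 1][j - 2] + 0.5 * d[i][j - 1] + 0.5 * d[i][j])
            D[i][j] = best
    return D[-1][-1]


def recognize_word(input_pattern, syllable_patterns, word_dictionary):
    """Return (word, scores) using only per-monosyllable distance tables."""
    # Inter-vector distances are computed once per monosyllable and stored.
    stored = {
        s: [[frame_distance(a, b) for b in ref] for a in input_pattern]
        for s, ref in syllable_patterns.items()
    }
    scores = {}
    for word, syllables in word_dictionary.items():
        # Row i of the word's table is the stored rows of its monosyllables,
        # concatenated in dictionary order; nothing is recomputed.
        d = [[v for s in syllables for v in stored[s][i]]
             for i in range(len(input_pattern))]
        scores[word] = dp_distance(d)
    return min(scores, key=scores.get), scores
```

For example, with `syllable_patterns = {"ka": ..., "na": ...}` and `word_dictionary = {"kana": ["ka", "na"], "naka": ["na", "ka"]}`, both words are scored from the same two stored tables. Adding a word to the dictionary adds one DP pass but no new standard pattern and no new inter-vector distance calculations.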

If, as in this embodiment, the calculation of the inter-vector distances and cumulative distances for all monosyllables and words is completed for each input frame, the recognition result becomes available as soon as the input ends. If the path selection constraint is that of Fig. 1(b), then, as is clear from equation (1), the inter-vector distance storage unit 7 needs to hold only the inter-vector distances required for processing frame $i$, namely those of the $i$-th and $(i-1)$-th frames. As for the cumulative distances, for both monosyllables and words only those of the $(i-1)$-th and $(i-2)$-th frames need to be stored. The storage for these cumulative distances is included in the monosyllable cumulative distance calculation unit 8 and the word cumulative distance calculation unit 11 and is not shown in the figure.
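The storage argument can be illustrated with a frame-synchronous matcher that keeps only two columns of cumulative distances and one previous column of inter-vector distances. This is a sketch under assumptions: it uses the three-path recurrence of equation (1) with 0-based indices, and the class name `FrameSyncMatcher` and its `push` method are hypothetical.

```python
import math


def frame_distance(a, b):
    """City-block distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


class FrameSyncMatcher:
    """Processes one input frame at a time against one standard pattern."""

    def __init__(self, standard_pattern):
        self.ref = standard_pattern
        self.prev_d = None    # inter-vector distances of frame i-1
        self.prev_D = None    # cumulative distances of frame i-1
        self.prev2_D = None   # cumulative distances of frame i-2

    def push(self, a):
        """Consume input frame a; return the cumulative distance at the
        last standard-pattern frame (math.inf if it is not yet reachable)."""
        n_ref = len(self.ref)
        d = [frame_distance(a, b) for b in self.ref]  # frame i
        D = [math.inf] * n_ref
        if self.prev_D is None:
            D[0] = d[0]  # first frame: start points must correspond
        else:
            for j in range(1, n_ref):
                best = self.prev_D[j - 1] + d[j]
                if self.prev2_D is not None:
                    best = min(best,
                               self.prev2_D[j - 1] + self.prev_d[j] + d[j])
                if j >= 2:
                    best = min(best,
                               self.prev_D[j - 2] + 0.5 * d[j - 1] + 0.5 * d[j])
                D[j] = best
        # Shift the rolling buffers; anything older is discarded.
        self.prev2_D, self.prev_D, self.prev_d = self.prev_D, D, d
        return D[-1]
```

Calling `push` once per incoming frame means the value returned for the final frame is already the end-point distance, so a decision can be made as soon as the utterance ends. The same rolling buffers would apply to a concatenated word pattern.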

In this embodiment, separate determination units are provided for monosyllables and for words (the monosyllable determination unit 9 and the word determination unit 12), the monosyllable/word discrimination unit 13 discriminates whether the input speech was a monosyllable or a word, and the recognition result switching unit 14 outputs the result of one of the two determination units. As another embodiment, only one determination unit may be provided, which outputs as the recognition result the monosyllable or word corresponding to the standard pattern with the smallest distance, without distinguishing between the outputs of the monosyllable cumulative distance calculation unit 8 and the word cumulative distance calculation unit 11. In that case, the monosyllable/word discrimination unit 13 and the recognition result switching unit 14 are unnecessary.
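The two decision arrangements described above can be contrasted in a short sketch. It assumes the cumulative distances have already been computed and are passed in as dictionaries from label to distance, and that the length of the speech interval is given in frames; the function names and the threshold parameter are hypothetical.

```python
def decide_switched(monosyllable_scores, word_scores,
                    n_speech_frames, threshold_frames):
    """Arrangement with discrimination and switching: choose the candidate
    set by utterance length, then take the minimum within that set."""
    is_word = n_speech_frames > threshold_frames
    scores = word_scores if is_word else monosyllable_scores
    return min(scores, key=scores.get)


def decide_pooled(monosyllable_scores, word_scores):
    """Arrangement with a single determination unit: take the minimum over
    monosyllables and words together, with no length-based switch."""
    candidates = [(dist, "monosyllable", label)
                  for label, dist in monosyllable_scores.items()]
    candidates += [(dist, "word", label)
                   for label, dist in word_scores.items()]
    dist, kind, label = min(candidates)
    return kind, label
```

Pooling is only meaningful if monosyllable and word distances are on the same scale. With a recurrence such as equation (1), where the weight sum depends only on the number of input frames, that condition holds for a given utterance.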

The functions of the components of the embodiments described above can also be realized by software.

Effects of the Invention

In the speech recognition device of the present invention, monosyllable standard patterns are registered in advance and monosyllables uttered separately are recognized, while each word standard pattern is formed as a concatenation of monosyllable standard patterns and the word cumulative distance is obtained using the inter-vector distances calculated for each monosyllable. Both monosyllable speech recognition and word speech recognition are therefore possible. When a Japanese word processor or voice typewriter is implemented, there is no need to register word utterances as standard patterns in addition to the monosyllable utterances. Simply by registering monosyllable utterances, words corresponding to function keys, frequently occurring words, and the like can also be entered by word recognition, which is of great value.

Brief Description of the Drawings

Figs. 1(a) and 1(b) are diagrams explaining the principle of DP matching, Fig. 2 is a diagram explaining the principle of the present invention, and Fig. 3 is a block diagram showing the configuration of a speech recognition device according to one embodiment of the present invention.

4: feature extraction unit; 5: monosyllable standard pattern storage unit; 6: inter-vector distance calculation unit; 7: inter-vector distance storage unit; 8: monosyllable cumulative distance calculation unit; 9: monosyllable determination unit; 10: word dictionary; 11: word cumulative distance calculation unit; 12: word determination unit.

Claims

1. A speech recognition device characterized by comprising: feature extraction means for converting an input speech signal into an input pattern $T$ consisting of a sequence of feature vectors $a_1 a_2 \cdots a_i \cdots a_I$; standard pattern storage means for storing standard patterns $R^n$, each obtained by converting the monosyllabic speech signal of the $n$-th monosyllable into a feature vector sequence $b^n_1 b^n_2 \cdots b^n_{J^n}$ (where $n = 1, 2, \ldots, N$); inter-vector distance calculation means for calculating the inter-vector distance $d^n(i, j)$ between the feature vector $b^n_j$ ($j = 1, 2, \ldots, J^n$) and the feature vector $a_i$ of the $i$-th frame of the input pattern $T$; monosyllable cumulative distance calculation means for calculating, from the inter-vector distances $d^n(i, j)$, the distance between the input pattern $T$ and the standard pattern $R^n$ as sequences; word cumulative distance calculation means for calculating the distance, as sequences, between a word standard pattern expressed as a concatenation of the monosyllable standard patterns and the input pattern; and determination means for determining, from the results of the monosyllable cumulative distance calculation means or the word cumulative distance calculation means, the monosyllable or word standard pattern that is closest in distance to the input pattern $T$.