JPH0480398B2

Info

Publication number: JPH0480398B2
Application number: JP58050698A
Authority: JP
Inventors: Jungo Kito
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1983-03-25
Filing date: 1983-03-25
Publication date: 1992-12-18
Also published as: JPS59176794A

Description

DETAILED DESCRIPTION OF THE INVENTION

<Technical Field> The present invention relates to a word speech recognition device designed for improved performance against unknown input speech.

<Background> In a word speech recognition device, the uttered word speech is generally amplified by an amplifier such as a microphone amplifier and then analyzed in an acoustic analysis unit into feature parameters that can represent the characteristics of speech, for example a power spectrum obtained with a BPF bank (band-pass filter bank), an autocorrelation function, or a zero-crossing count. A feature extraction unit then determines the word interval and creates an input pattern by a predetermined algorithm, for example by compressing the feature parameter time series within the word interval along the time axis in order to reduce the pattern memory size and the computation time for matching.

The input pattern is then matched against standard patterns that were either registered beforehand by the same method (for speaker-dependent devices) or created from many speakers (for speaker-independent devices), and the most similar standard pattern is output as the recognition result.

In this kind of recognition, sounds other than the words to be recognized are often misrecognized, in particular speech uttered inadvertently by the user, throat clearing and the like, as well as sudden noises arising in the surroundings.

<Purpose of the Invention> The present invention remedies these drawbacks by adding duration information for each word to the standard patterns and performing pattern matching under a word length restriction. Depending on how expressively the features represent speech, when the features discriminate phonemes poorly, very similar patterns can be produced even for non-target words whose duration differs, and this causes misrecognition. The invention reduces such misrecognition.

When the standard pattern length is fixed to a constant value regardless of word length in order to reduce standard pattern storage, the partial loss of features grows as words become longer and performance degrades. The invention reduces the misrecognition arising from this as well.

In addition, when the word lengths of the recognition target words registered as standard patterns vary widely, that is, when many of the target words differ in length, the matching targets can be limited to those standard patterns whose word length tolerance contains the input word length. Restricting the word length in this way is a preliminary selection that narrows the standard patterns subject to matching down to a small number, and it helps reduce the matching processing time.

<Embodiment> An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment of the present invention.

Word speech uttered into the microphone 1 is amplified by the preprocessing unit 2 and, where necessary, subjected to processing such as pre-emphasis. If the subsequent stages are digital, the speech waveform is converted into a digital signal in this preprocessing unit 2.

Next, the feature analysis unit 3 analyzes quantities that can represent the characteristics of the speech, for example the short-time power spectrum, the autocorrelation function, or the zero-crossing count.

From this feature time series, the word interval extraction unit 4 extracts the feature time series 101 of the word interval and the duration information 102. The pattern creation unit 5 compresses the feature time series 101 of the word interval along the time axis to produce the pattern 103.
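The text does not say how the pattern creation unit 5 compresses the series along the time axis. As a rough illustration only, the sketch below assumes the simplest scheme: the series is split into a fixed number of nearly equal segments and each segment is averaged. The function name `compress` and the averaging rule are assumptions, not taken from the source.

```python
def compress(series, n_out):
    """Reduce a feature time series to n_out values by segment averaging.

    Assumption: equal-width segments and a plain mean; the patent only
    states that information is compressed along the time axis.
    """
    if not series or n_out <= 0:
        raise ValueError("series must be non-empty and n_out positive")
    n = len(series)
    out = []
    for k in range(n_out):
        lo = k * n // n_out
        # Guarantee at least one sample per segment when n < n_out.
        hi = max(lo + 1, (k + 1) * n // n_out)
        segment = series[lo:hi]
        out.append(sum(segment) / len(segment))
    return out
```

With a scheme like this every pattern has the same length whatever the word duration, which is why the duration 102 has to be carried separately if it is to be used later.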

The pattern 103 and the duration 102 are sent to either the standard pattern storage unit 6 or the input pattern storage unit 7, depending on the setting of the changeover switches SW1 and SW2.

In a recognition device for a specific speaker, S1 is normally closed first so that the recognition target words are registered in advance in the standard pattern storage unit 6. For recognition, S2 is closed and the pattern of the input word is temporarily stored in the input pattern storage unit 7.

In a device for unspecified speakers, the changeover switches SW1 and SW2 are absent, and the standard pattern storage unit 6 holds, stored in advance, feature patterns extracted from words uttered by many speakers together with the standard word durations. In this case the storage unit 6 is implemented with a ROM or the like.

During recognition, the pattern matching unit 8 compares the patterns stored in the standard pattern storage unit 6 with the current input pattern in the input pattern storage unit 7 and outputs, for example, the number of the most similar standard pattern as the recognition result 104.

The control unit 9 controls the individual units and the interaction between them. It has an external input unit 10 through which the tolerance range, which is computed from the duration of each word in the standard patterns, can be adjusted from outside. The tolerance setting code 105 entered through the external input unit 10 expresses the tolerance either as a time or as upper and lower limit ratios (%) relative to the standard duration.
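The two ways of expressing the tolerance can be illustrated with a small sketch. The `Tolerance` record and its field names are hypothetical; the source does not describe how the tolerance setting code 105 is actually encoded, only that it represents either a time width or upper and lower percentage ratios of the standard duration.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerance:
    """Hypothetical decoded form of the tolerance setting code 105."""
    kind: str     # "time": absolute width; "ratio": percent of standard duration
    lower: float  # allowed shortfall below the standard duration
    upper: float  # allowed excess above the standard duration


def length_bounds(standard_duration: float, tol: Tolerance) -> tuple[float, float]:
    """Return (lower limit, upper limit) of the allowed input word length."""
    if tol.kind == "time":
        return standard_duration - tol.lower, standard_duration + tol.upper
    if tol.kind == "ratio":
        return (standard_duration * (1 - tol.lower / 100),
                standard_duration * (1 + tol.upper / 100))
    raise ValueError(f"unknown tolerance kind: {tol.kind!r}")
```

For example, a standard duration of 40 frames with a ±25% ratio gives limits of 30 and 50 frames, while a time-based tolerance of 5 frames below and 10 above gives 35 and 50.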

FIG. 2 shows the portion enclosed by the dotted line in FIG. 1 implemented with a one-chip microcomputer 11. The microcomputer 11 includes a CPU 12, a ROM 13, and a RAM 14; in a device for unspecified speakers, the standard patterns are stored in the ROM 13.

For speaker-dependent word speech recognition, recent advances in digital signal processing and LSI technology have led to the development of boards and LSIs aimed at lower cost, whereas speaker-independent recognition remains confined to large and expensive equipment.

In speaker-independent word speech recognition, no feature parameter with small speaker-to-speaker variation is applicable to all phonemes. For coarse classification, however, the zero-crossing count is known as a parameter with relatively small individual differences. Taking the zero-crossing count as the parameter and examining it in various ways showed that, by adopting the short-time constant level crossing count (level crossing) analysis method, a speaker-independent speech recognition device can be realized as a single-chip CMOS LSI, excluding the analog circuit section, as described above.

FIG. 3 shows the definition of the short-time constant level crossing count. This feature is obtained by counting, for each short-time frame (period τ), the number of times the speech waveform v crosses a fixed threshold level t_h. Setting the threshold level slightly above the steady ambient noise makes it possible to distinguish word speech intervals from silent intervals.
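The definition can be sketched in a few lines. The version below is a minimal illustration under stated assumptions: the waveform is a list of samples, the frame period τ is given as a number of samples (`frame_len`), crossings in both directions are counted, and crossings that fall exactly on a frame boundary are ignored. None of these details are fixed by the text.

```python
def level_crossings(samples, threshold, frame_len):
    """Count threshold crossings of the waveform in each short-time frame.

    Assumptions: both upward and downward crossings count, and only
    complete frames are analyzed.
    """
    counts = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        above = [v > threshold for v in frame]
        # A crossing occurs wherever consecutive samples lie on opposite
        # sides of the threshold.
        counts.append(sum(a != b for a, b in zip(above, above[1:])))
    return counts
```

Because the threshold sits above the steady noise floor, frames that contain only background noise never reach it and yield a count of zero, which is what lets the same series separate speech from silence.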

This feature can represent the relative intensity of the speech spectrum. FIG. 4 shows the feature time series obtained by constant level crossing count analysis when the word "stop" is uttered: (a) is the speech waveform and (b) is the constant level crossing series. As the figure shows, the value is low for voiced sounds and high for unvoiced sounds, particularly fricatives such as "S" and "SH" and affricates such as "TH". In the zero-level crossing analysis method, the feature analysis unit 3 extracts this per-frame constant level crossing count as the feature.

In FIG. 4(b), point A is the beginning of the word and point B is its end. Based on these points, the word interval extraction unit 4 extracts the word duration information 102 and gates the feature time series before supplying it to the pattern creation unit 5. In a device using the one-chip microcomputer 11 as in FIG. 2, this processing is performed inside the one-chip microcomputer 11.
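How points A and B are located is not spelled out in the text. A simple possibility, offered here only as an assumption, is to take the first and last frames whose level crossing count is nonzero, since frames of pure background noise stay below the threshold. The function name `cut_word` is illustrative.

```python
def cut_word(counts):
    """Locate the word interval in a per-frame level crossing series.

    Assumption: point A is the first frame with a nonzero count and
    point B the last one. Returns (A, B, duration in frames, gated series),
    or None if no frame contains speech.
    """
    active = [i for i, c in enumerate(counts) if c > 0]
    if not active:
        return None
    a, b = active[0], active[-1]
    return a, b, b - a + 1, counts[a:b + 1]
```

Silent frames inside the word, such as the closure of a stop consonant, remain part of the gated series under this rule; a practical detector would likely also need a minimum-duration or hangover criterion to ignore isolated clicks.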

FIG. 5 shows an example of the standard patterns stored in the standard pattern storage unit 6 or in the ROM 13 of the one-chip microcomputer 11. For each word, a feature series X and a standard word duration Y are stored. As illustrated, an entry may also contain information such as a result output code Z that is output when the pattern is selected.

FIG. 6 is a flowchart of the main part of the recognition procedure; the portion inside the dotted line is the matching procedure based on the word length tolerance.

After the start, the input state of the tolerance setting code 105 from the external input unit 10 is checked first, and the code is read and stored for the upper and lower limit calculation described later. Next, the beginning of the input word, such as point A in FIG. 4, is detected. Once it has been detected, the time up to the end of the word at point B in FIG. 4, that is, the duration of the uttered word, is measured. An input pattern is then created from the feature time series of this word interval and stored in the input pattern storage unit 7 (the RAM 14 of the one-chip microcomputer 11 in FIG. 2). The input pattern includes the word length information measured above. Processing then enters the matching based on the word length tolerance, shown inside the dotted line of the flowchart.

First, the standard word duration Y of a standard pattern stored in the standard pattern storage unit 6 (the ROM 13 of the one-chip microcomputer 11 in FIG. 2) is fetched, and the upper and lower limits of the word length tolerance are calculated from it and the tolerance setting code 105. These limits are used to determine whether the input word falls within the tolerance; if it does, the inter-pattern distance to that standard pattern is calculated. After the distance has been calculated, or with the distance calculation skipped when the input word is outside the tolerance, it is checked whether all standard patterns have been processed, and the standard word duration Y of the next standard pattern to be compared is fetched. This is repeated word by word until the standard patterns to be compared are exhausted.
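The loop just described can be sketched as follows. This is an illustration under several assumptions, not the patented implementation: the tolerance is a symmetric percentage, the inter-pattern distance is a city-block distance over equal-length feature series, and the smallest distance is tracked inside the loop rather than in a separate search step. The names `StandardPattern`, `distance`, and `match` are hypothetical; the fields mirror the feature series X, standard duration Y, and output code Z of FIG. 5.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class StandardPattern:
    features: Sequence[float]  # feature series X
    duration: float            # standard word duration Y
    code: str                  # result output code Z


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """City-block distance; the actual measure is not given in the text."""
    return sum(abs(x - y) for x, y in zip(a, b))


def match(input_features: Sequence[float], input_duration: float,
          patterns: Sequence[StandardPattern],
          ratio_percent: float) -> Optional[tuple[float, str]]:
    """Return (distance, code) of the closest length-compatible pattern."""
    best = None
    for p in patterns:
        lower = p.duration * (1 - ratio_percent / 100)
        upper = p.duration * (1 + ratio_percent / 100)
        if not (lower <= input_duration <= upper):
            continue  # outside the tolerance: skip the distance calculation
        d = distance(input_features, p.features)
        if best is None or d < best[0]:
            best = (d, p.code)
    return best
```

Note that a standard pattern whose features are identical to the input is still excluded when the durations disagree, and that an input whose length fits no standard pattern produces no candidate at all.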

When the features discriminate phonemes poorly, very similar patterns are produced even for non-target words of different duration. With the word length restriction described above, however, the inter-pattern distance calculation is omitted for standard patterns outside the tolerance, so misrecognition is reduced.

Likewise, when the standard patterns are fixed to a constant length regardless of word length in order to reduce their storage size, the partial loss of features grows as the word length increases and performance degrades. By using the word length itself as information for comparison under the above restriction, the resulting misrecognition can be reduced.

Furthermore, restricting the word length allows a preliminary selection that narrows the standard patterns subject to matching down to a small number, for example by omitting inter-pattern distance calculations, which helps reduce the matching processing time.

After this comparison, the standard pattern with the minimum distance is searched for among those for which a distance was calculated. Here a predetermined threshold is additionally applied: only candidates at or above the predetermined value are judged valid and output as the result, and those below the predetermined value are output as rejects.

Processing then returns to reading the tolerance setting code 105 from the external input unit 10. Any tolerance setting code 105 can be entered through the external input unit 10, and if the code has changed, the changed code is read. This allows the optimal tolerance width to be set according to the conditions of use and the speaker. In practice, it was confirmed that even with a restriction of, for example, ±20 to 30%, performance against unknown input speech improved while the effect on the recognition rate remained small.

The above embodiment searches for similar patterns using the inter-pattern distance as the measure. In a device whose pattern creation unit expresses patterns as symbols, yielding a sequence of transitions along the time axis, and which takes as the recognition result the standard pattern that matches this transition sequence exactly, the algorithm can equally well be arranged as follows: find the match first, then calculate the word length tolerance; if the input word length is within the tolerance, output the number of the matched pattern, and otherwise reject the input word.
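This variant reverses the order of the two checks, which the following sketch tries to make concrete. It assumes that each standard pattern is a (symbol sequence, standard duration) pair, that symbol sequences are unique among the standard patterns, and that the tolerance is a symmetric percentage; the function name `match_symbols` and the symbol alphabet are illustrative only.

```python
def match_symbols(input_symbols, input_duration, patterns, ratio_percent):
    """Exact symbol-sequence match followed by a word length check.

    patterns: list of (symbol sequence, standard duration) pairs.
    Returns the index of the matched pattern, or None to reject the input.
    """
    for number, (symbols, duration) in enumerate(patterns):
        if tuple(symbols) != tuple(input_symbols):
            continue
        # A match was found; only now is the length tolerance computed.
        lower = duration * (1 - ratio_percent / 100)
        upper = duration * (1 + ratio_percent / 100)
        if lower <= input_duration <= upper:
            return number
        return None  # sequence matched but the length is out of tolerance
    return None  # no standard pattern has this transition sequence
```

Here the length check acts purely as a rejection test after matching, so it improves rejection of unknown input but does not save any matching time.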

<Effects of the Invention> As described above, according to the present invention, in a word speech recognition device using the constant level crossing count analysis method, duration information for each word is added to the standard patterns, and a means is provided for entering from outside the tolerance setting information for the word duration, so that the word length tolerance can be changed according to conditions such as ambient noise. A calculation means derives the upper and lower limits of the tolerance range from the entered tolerance setting information and the duration information of each word; it is then determined whether the input word length falls within the tolerance, and pattern matching is performed only for the standard patterns for which it does. As a result, when ambient noise increases and the extracted input word length becomes longer, for example, the tolerance setting information can be changed to widen the tolerance range and prevent a drop in the recognition rate. Likewise, adjusting the tolerance range by changing the tolerance setting information to suit the speaking rate of the user prevents a drop in the recognition rate and also reduces the matching processing time.

This sufficiently remedies a weakness of the constant level crossing count analysis method, namely its poor ability to reject unknown speech that is not a recognition target word but has a similar phoneme sequence. The constant level crossing count analysis method, which is well suited to integrated circuit implementation and has relatively small speaker variation, can therefore be adopted in a speaker-independent speech recognition device, and the word speech recognition device can be realized as a single-chip CMOS LSI.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention; FIG. 2 is a block diagram for the case where a one-chip microcomputer is used; FIG. 3 is a diagram showing a speech waveform; FIG. 4 is a diagram showing a speech waveform (a) alongside the feature series (b); FIG. 5 is a memory map of the standard patterns; and FIG. 6 is a flowchart explaining the operation of the main part.

1: microphone, 2: preprocessing unit, 3: feature analysis unit, 4: word interval extraction unit, 5: pattern creation unit, 6: standard pattern storage unit, 7: input pattern storage unit, 8: pattern matching unit, 9: control unit, 10: external input unit, 11: one-chip microcomputer, 12: CPU, 13: ROM, 14: RAM, X: feature series, Y: standard word duration.

Claims

[Scope of Claims] 1. A word speech recognition device comprising: storage means for storing, for each word, a feature sequence pattern X obtained by a constant level crossing count analysis method and standard word duration information Y, as standard patterns to be compared; feature analysis means for analyzing the features of uttered input word speech by the constant level crossing count analysis method; measurement means for measuring the duration of the uttered input word speech; input pattern creation means for creating an input pattern from the feature time series of the uttered input word speech obtained by the feature analysis means; external input means for inputting tolerance setting information for adjusting the tolerance on the word durations in the standard patterns; means for calculating an upper limit and a lower limit of the word length tolerance from each item of standard word duration information Y in the standard patterns and the input tolerance setting information; means for determining whether the input word length falls within the tolerance; and means for performing pattern matching with the input pattern only for those standard patterns that are within the tolerance according to the result of the determination.