JPH04204899A

JPH04204899A - Speech recognition device

Info

Publication number: JPH04204899A
Application number: JP2337944A
Authority: JP
Inventors: Tatsuya Kimura; 達也木村
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1990-11-30
Filing date: 1990-11-30
Publication date: 1992-07-27

Abstract

PURPOSE: To obtain good-quality section similarities and to improve the recognition rate of a word recognition system built on them, by applying a weighting computation in the time direction to the frame similarity before determining the section similarity. CONSTITUTION: Frame similarity calculation means 2 outputs the frame similarity obtained by calculating the similarity between the feature parameter time series obtained by speech analysis means 1 and the standard pattern prepared for each phoneme and stored in phoneme standard pattern 3. Weighting computation means 4 obtains the weighted frame similarity by referring to the weighting function stored in weighting function storage means 5. Section similarity calculation means 6 calculates the section similarity for all phonemes, and word similarity calculation means 7 obtains the word similarity by optimally summing the section similarities according to the phoneme sequence of each word, which is obtained by referring to word dictionary 8. The recognition rate is improved in this way.

Description

Detailed Description of the Invention

Field of Industrial Application

The present invention relates to a speech recognition device that automatically recognizes speech uttered by a human.

Prior Art

Conventionally, one way to realize a device that recognizes human speech is to use phonemes or syllables as the basic unit of recognition. Because this approach requires no speech registration, the word dictionary is easy to change. As prior art, the following describes one such phoneme-based method, in which a phoneme similarity is obtained for each frame and these similarities are then integrated to obtain the final recognition result.

FIG. 2 shows the configuration of a conventional speech recognition system.

In FIG. 2, input speech is fed to speech analysis means 11. Speech analysis means 11 analyzes the input speech and outputs a time series of feature parameters representing the characteristics of the speech to frame similarity calculation means 12 for each frame, the basic unit of analysis. Frame similarity calculation means 12 calculates the similarity between this feature parameter time series and the standard pattern prepared for each phoneme and stored in phoneme standard pattern 13, and outputs the resulting frame similarity to section similarity calculation means 14.

Section similarity calculation means 14 at the next stage calculates, for all phonemes, a quantity called the "section similarity", which is the similarity of a phoneme over a partial section of the input speech.

The section similarity is the value obtained by accumulating the frame similarities over the partial section and normalizing the sum by the section length. Word similarity calculation means 15 obtains the word similarity by optimally summing the section similarities according to the phoneme sequence of each word, which is obtained by referring to word dictionary 16.
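The definition of the conventional section similarity can be sketched in a few lines of Python. This is only an illustration: the function name, the 0-based frame indexing and the sample values are assumptions, not part of the patent.

```python
def section_similarity(frame_sims, j, L):
    """Unweighted section similarity SIM(p, j, L) for one phoneme.

    frame_sims: frame similarities of a single phoneme, indexed by frame
                number (0-based here, which is an assumption).
    j:          final frame of the partial section (inclusive).
    L:          section length in frames.
    """
    start = j - L + 1
    if L <= 0 or start < 0 or j >= len(frame_sims):
        raise ValueError("section lies outside the input")
    # Accumulate the frame similarities over the section, then
    # normalize the sum by the section length.
    return sum(frame_sims[start:j + 1]) / L


# Hypothetical frame similarities for one phoneme over five frames.
sims = [0.1, 0.4, 0.9, 0.5, 0.1]
sim_value = section_similarity(sims, j=3, L=3)  # mean of frames 1, 2 and 3
print(sim_value)
```

Every frame in the section contributes equally here, which is the property the invention later changes.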

Next, the method of calculating the word similarity from the section similarities is described.

Let the phoneme sequence of a word in the word dictionary be p(1) p(2) p(3) ... p(N). Let $l_{\min}(i)$ and $l_{\max}(i)$ be the lower and upper limits of the duration of the i-th phoneme p(i) in the word. Let R(i, j) be the frame similarity of phoneme p(i) at the j-th frame, and SCORE(i, j) the cumulative similarity of phoneme p(i) at frame number j. Let SIM(p, j, L) be the section similarity of phoneme p over the partial section with final frame number j and length L. Let len(i, j) be the length (number of frames) from the start of the input speech to frame j at the time SCORE(i, j) is obtained. The following symbols are also defined.

SS(i): first frame number of the start range of p(i)
SE(i): last frame number of the start range of p(i)
ES(i): first frame number of the end range of p(i)
EE(i): last frame number of the end range of p(i)

The procedure for calculating the word similarity is as follows.

(1) Initialization

SS(1) ← first frame number of the range in which the word start may exist
SE(1) ← last frame number of the range in which the word start may exist
SCORE(0, ·) = 0
len(0, ·) = 0

(2) Execute (3) and (4) for i = 1 to i = N.

(3) Determination of the phoneme end range

ES(i) ← SS(i) + $l_{\min}(i)$
EE(i) ← SE(i) + $l_{\max}(i)$

(4) Calculation of the cumulative similarity in the phoneme end range

For j = ES(i) to EE(i), execute the following recurrence:

$$\mathrm{SCORE}(i, j) = \max_{l_{\min}(i) \le L \le l_{\max}(i)} \left[ \frac{\alpha \times \mathrm{SIM}(p(i), j, L) + \beta \times \mathrm{SCORE}(i-1, j-L)}{\gamma} \right]$$

where α, β, and γ are weighting coefficients for time normalization:

$$\alpha = L, \qquad \beta = \mathrm{len}(i-1, j-L), \qquad \gamma = \alpha + \beta$$

(5) Determination of the word similarity

The value R obtained by the following equation is taken as the word similarity:

$$R = \max_{j \,\in\, \text{word-end range}} \mathrm{SCORE}(N, j)$$

(6) End

Following the procedure above, the word similarity R is obtained for every word stored in the word dictionary.
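Steps (1) to (5) can be sketched as a small dynamic program. This is a minimal illustration under stated assumptions, not the patent's implementation: frame positions are treated as 0-based boundaries (a section of length L ending at boundary j covers frames j-L to j-1), the end range of one phoneme is handed on as the start range of the next (the text does not say how SS(i+1) and SE(i+1) are set), and all names and the toy data are hypothetical.

```python
def word_similarity(sim, phonemes, lmin, lmax, start_range, end_range, num_frames):
    """Duration-constrained dynamic programming over section similarities.

    sim(p, j, L) returns SIM(p, j, L) for the L frames ending at boundary j.
    Every lower duration limit is assumed to be at least 1, so gamma > 0.
    """
    ss, se = start_range
    # Step (1): SCORE(0, .) = 0 and len(0, .) = 0 over the word-start range.
    score = {j: 0.0 for j in range(ss, se + 1)}
    length = {j: 0 for j in range(ss, se + 1)}

    # Step (2): loop over the phonemes p(1) .. p(N).
    for p, lo, hi in zip(phonemes, lmin, lmax):
        # Step (3): end range of this phoneme, clipped to the input length.
        es, ee = ss + lo, min(se + hi, num_frames)
        new_score, new_length = {}, {}
        # Step (4): recurrence over every end position and admissible duration.
        for j in range(es, ee + 1):
            for L in range(lo, hi + 1):
                prev = j - L
                if prev not in score:
                    continue  # no valid path ends at this predecessor
                alpha, beta = L, length[prev]
                gamma = alpha + beta
                cand = (alpha * sim(p, j, L) + beta * score[prev]) / gamma
                if j not in new_score or cand > new_score[j]:
                    new_score[j], new_length[j] = cand, gamma
        score, length = new_score, new_length
        ss, se = es, ee  # assumed hand-over of the range to the next phoneme

    # Step (5): best cumulative similarity inside the word-end range.
    finals = [score[j] for j in range(end_range[0], end_range[1] + 1) if j in score]
    return max(finals) if finals else float("-inf")


# Toy frame similarities for two hypothetical phonemes over six frames.
R = {"a": [1, 1, 1, 0, 0, 0], "k": [0, 0, 0, 1, 1, 1]}


def mean_sim(p, j, L):
    # Conventional section similarity: plain mean over the section.
    return sum(R[p][j - L:j]) / L


score_ak = word_similarity(mean_sim, ["a", "k"], [2, 2], [4, 4], (0, 0), (6, 6), 6)
score_ka = word_similarity(mean_sim, ["k", "a"], [2, 2], [4, 4], (0, 0), (6, 6), 6)
print(score_ak, score_ka)
```

In the toy data the phoneme order "a", "k" matches the input, so it scores higher than the reversed order. Passing `sim` as a function keeps the search independent of how the section similarity itself is computed.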

Finally, word determination means 17 in FIG. 2 outputs the word with the largest word similarity as the recognition result.

Problems to Be Solved by the Invention

As described above, the conventional speech recognition method calculates the word similarity from the accumulated per-frame phoneme similarities. However, this method, that is, simply accumulating the per-frame phoneme similarities and taking the result as the phoneme similarity of the section, has a problem for the reasons given below.

The phoneme similarity for the section in which a phoneme actually exists does not necessarily match the value obtained by simply accumulating the per-frame phoneme similarities over that section. The reason is that the phoneme standard patterns used to calculate the frame similarity are generally created by cutting out of speech data a time series of feature parameters spanning a section that has some width along the time axis and includes the neighborhood of the frame at which the features of the phoneme are most prominent (referred to as the epoch frame).

When a standard pattern with a time width created in this way is shifted one frame at a time to obtain the phoneme similarity for each frame, the frame similarity at the epoch frame of the phoneme is clearly meaningful, but there is no guarantee that the frame similarity at other frames, such as those at phoneme boundaries, is a meaningful value.
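The sliding comparison described above can be illustrated with a toy sketch. The patent does not specify the similarity measure, so the negative mean squared distance used here is purely an assumption, as are the function name, the one-dimensional features and the handling of the input edges.

```python
def frame_similarities(features, template):
    """Slide a time-width template over the input, one frame at a time.

    features: list of T feature vectors (one per frame).
    template: list of W vectors, assumed to be centred on the epoch frame.
    Returns one similarity per frame; higher means more similar.
    The measure (negative mean squared distance) is an assumption.
    """
    T, W = len(features), len(template)
    half = W // 2
    sims = []
    for t in range(T):
        total, count = 0.0, 0
        for k in range(W):
            idx = t - half + k  # input frame aligned with template row k
            if 0 <= idx < T:
                total += sum((a - b) ** 2 for a, b in zip(features[idx], template[k]))
                count += 1
        sims.append(-total / count)
    return sims


# Hypothetical 1-D features with a phoneme-like bump centred on frame 3.
features = [[0.0], [0.0], [1.0], [2.0], [1.0], [0.0], [0.0]]
template = [[1.0], [2.0], [1.0]]
sims = frame_similarities(features, template)
best_frame = sims.index(max(sims))
print(best_frame, sims)
```

In this example the similarity peaks where the template is aligned with the bump. Away from that frame the template straddles unrelated material, so the values there say less about the phoneme itself.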

The conventional method, however, obtains the section similarity by treating the frame similarity at the epoch frame and the frame similarities at all other frames equally. The quality of the resulting section similarity is therefore poor, which is one cause of a lower recognition rate.

In view of the above problem, the object of the present invention is to improve the recognition rate by obtaining good-quality section similarities and building a word recognition system on them.

Means for Solving the Problems

To achieve this object, the present invention comprises: speech analysis means for analyzing input speech frame by frame, the frame being the unit of analysis, to obtain feature parameters; standard pattern storage means for storing standard patterns representing the features of the basic units of recognition; frame similarity calculation means for calculating, for each frame, the similarity between the feature parameters and the standard patterns; a word dictionary storing information on the pronunciation of words, written in symbols that represent the basic units of recognition; weighting function storage means for storing or generating a weighting function in the time direction; and weighting computation means for applying a weighting computation, using the weighting function, to the frame similarity calculated by the frame similarity calculation means. The word similarity is calculated by accumulating the weighted frame similarities of each basic unit of recognition according to the notation in the word dictionary.

Operation

With the above configuration, the present invention applies a weighting computation in the time direction to the frame similarity and then accumulates the weighted values to obtain the section similarity. This yields section similarities of better quality than the conventional method, and building a word recognition system on these section similarities improves the recognition rate.

Embodiment

An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram of a speech recognition device according to an embodiment of the present invention.

In FIG. 1, input speech is fed to speech analysis means 1. Speech analysis means 1 analyzes the input speech and outputs a time series of feature parameters representing the characteristics of the speech to frame similarity calculation means 2 for each frame, the basic unit of analysis.

Frame similarity calculation means 2 calculates the similarity between the feature parameter time series obtained by speech analysis means 1 and the standard pattern prepared for each phoneme and stored in phoneme standard pattern 3, and outputs the resulting frame similarity.

Phoneme standard pattern 3 stores standard patterns representing the features of the basic units of recognition, such as phonemes or syllables.

Weighting computation means 4 refers to the weighting function stored in weighting function storage means 5 and applies a weighting computation to the frame similarity obtained in the preceding stage, yielding the "weighted frame similarity".

The weighting computation is performed on partial sections set within the input speech section. A weighting function is therefore prepared for each partial-section length. It is chosen so that the frame similarities at and near the center of the partial section contribute strongly, and those at and near both ends of the partial section contribute weakly.
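One possible shape for such a weighting function is sketched below. The patent only requires large weights near the center and small weights near the ends. The triangular shape, the `floor` parameter and the function name are assumptions chosen for illustration.

```python
def weighting_function(L, floor=0.2):
    """Return L weights, largest at the centre and smallest at both ends.

    The triangular shape and the minimum weight `floor` are assumptions;
    any centre-heavy window would satisfy the description.
    """
    if L <= 0:
        raise ValueError("section length must be positive")
    if L == 1:
        return [1.0]
    center = (L - 1) / 2
    # Weight falls linearly from 1.0 at the centre to `floor` at the ends.
    return [floor + (1.0 - floor) * (1.0 - abs(k - center) / center)
            for k in range(L)]


# One function per partial-section length, stored ahead of time
# (it could equally be generated on demand).
weight_table = {L: weighting_function(L) for L in range(1, 9)}
w5 = weight_table[5]
print(w5)
```

Storing the table corresponds to the "storing" option of the weighting function storage means, and calling `weighting_function` on demand corresponds to the "generating" option.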

Section similarity calculation means 6 at the next stage calculates, for all phonemes, a quantity called the "section similarity", which is the similarity of a phoneme over a partial section of the input speech.

The section similarity is the value obtained by accumulating the "weighted frame similarity" over the partial section and normalizing the sum by the section length.
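The weighted section similarity can be sketched as follows. The weights and sample values are hypothetical. Normalizing by the section length L follows the text, while the 0-based indexing is an assumption.

```python
def weighted_section_similarity(frame_sims, j, L, weights):
    """Section similarity from weighted frame similarities.

    frame_sims: frame similarities of one phoneme (0-based frame index).
    j, L:       final frame (inclusive) and length of the partial section.
    weights:    L time-direction weights for this section length.
    """
    start = j - L + 1
    if len(weights) != L or start < 0 or j >= len(frame_sims):
        raise ValueError("weights or section do not fit the input")
    # Weight each frame similarity, accumulate, normalize by section length.
    weighted = [w * r for w, r in zip(weights, frame_sims[start:j + 1])]
    return sum(weighted) / L


# Hypothetical centre-heavy weights for a five-frame section.
weights = [0.2, 0.6, 1.0, 0.6, 0.2]
# Two similarity profiles with the same plain sum (1.5):
peaked = [0.1, 0.2, 0.9, 0.2, 0.1]      # high similarity at the centre
edge_heavy = [0.9, 0.2, 0.1, 0.2, 0.1]  # high similarity at one end
sim_peaked = weighted_section_similarity(peaked, 4, 5, weights)
sim_edge = weighted_section_similarity(edge_heavy, 4, 5, weights)
print(sim_peaked, sim_edge)
```

The two profiles are indistinguishable to the unweighted mean, but the weighted version favors the one whose high similarity sits at the center of the section.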

Word similarity calculation means 7 obtains the word similarity by optimally summing the section similarities according to the phoneme sequence of each word, which is obtained by referring to word dictionary 8.

Next, the method of calculating the word similarity from the section similarities is described. Let the phoneme sequence of a word in word dictionary 8 be p(1) p(2) p(3) ... p(N). Let $l_{\min}(i)$ and $l_{\max}(i)$ be the lower and upper limits of the duration of the i-th phoneme p(i) in the word. Let R(i, j) be the frame similarity of phoneme p(i) at the j-th frame, and SCORE(i, j) the cumulative similarity of phoneme p(i) at frame number j. Let SIM(p, j, L) be the section similarity of phoneme p over the partial section with final frame number j and length L. Let len(i, j) be the length (number of frames) from the start of the input speech to frame j at the time SCORE(i, j) is obtained. The following symbols are also defined.

SS(i): first frame number of the start range of p(i)
SE(i): last frame number of the start range of p(i)
ES(i): first frame number of the end range of p(i)
EE(i): last frame number of the end range of p(i)

The procedure for calculating the word similarity is as follows.

(1) Initialization

SS(1) ← first frame number of the range in which the word start may exist
SE(1) ← last frame number of the range in which the word start may exist
SCORE(0, ·) = 0
len(0, ·) = 0

(2) Execute (3) and (4) for i = 1 to i = N.

(3) Determination of the phoneme end range

ES(i) ← SS(i) + $l_{\min}(i)$
EE(i) ← SE(i) + $l_{\max}(i)$

(4) Calculation of the cumulative similarity in the phoneme end range

For j = ES(i) to EE(i), execute the following recurrence:

$$\mathrm{SCORE}(i, j) = \max_{l_{\min}(i) \le L \le l_{\max}(i)} \left[ \frac{\alpha \times \mathrm{SIM}(p(i), j, L) + \beta \times \mathrm{SCORE}(i-1, j-L)}{\gamma} \right]$$

where α, β, and γ are weighting coefficients for time normalization:

$$\alpha = L, \qquad \beta = \mathrm{len}(i-1, j-L), \qquad \gamma = \alpha + \beta$$

(5) Determination of the word similarity

The value R obtained by the following equation is taken as the word similarity:

$$R = \max_{j \,\in\, \text{word-end range}} \mathrm{SCORE}(N, j)$$

(6) End

Following the procedure above, the word similarity R is obtained for every word stored in the word dictionary.
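A short worked derivation may help show why α, β, and γ act as time-normalization coefficients. It assumes, beyond what the text states explicitly, that len(i, j) is set to γ for the maximizing L, which matches its definition as the number of frames consumed up to frame j. Write $L_k$ for the duration chosen for phoneme p(k) and $j_k$ for its final frame along the selected path, so that $j_{i-1} = j_i - L_i$:

```latex
\begin{aligned}
\mathrm{SCORE}(1, j_1)
  &= \frac{L_1\,\mathrm{SIM}(p(1), j_1, L_1) + 0 \cdot 0}{L_1 + 0}
   = \mathrm{SIM}(p(1), j_1, L_1), \\[4pt]
\mathrm{SCORE}(i, j_i)
  &= \frac{L_i\,\mathrm{SIM}(p(i), j_i, L_i)
         + \mathrm{len}(i-1, j_{i-1})\,\mathrm{SCORE}(i-1, j_{i-1})}
          {L_i + \mathrm{len}(i-1, j_{i-1})} \\[4pt]
  &= \frac{\sum_{k=1}^{i} L_k\,\mathrm{SIM}(p(k), j_k, L_k)}
          {\sum_{k=1}^{i} L_k}.
\end{aligned}
```

Under that assumption the cumulative similarity is a duration-weighted mean of the section similarities along the selected path, so a word is not favored merely for having more or longer phonemes.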

Finally, word determination means 9 in FIG. 1 outputs the word with the largest word similarity as the recognition result.
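The final decision step amounts to taking the maximum over the dictionary. A minimal sketch follows, in which the function name, the dictionary entries and the similarity values are all hypothetical.

```python
def decide_word(word_scores):
    """Return the dictionary word with the largest word similarity R."""
    if not word_scores:
        raise ValueError("word dictionary is empty")
    return max(word_scores, key=word_scores.get)


# Hypothetical word similarities for a three-word dictionary.
word_scores = {"akai": 0.82, "aoi": 0.64, "kuroi": 0.41}
recognized = decide_word(word_scores)
print(recognized)
```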

The embodiment above has been described for the specific case in which the basic unit of recognition is the phoneme, but the present invention is of course also applicable when the basic unit of recognition is something other than a phoneme, such as a syllable.

Effects of the Invention

As described above, according to the present invention, the section similarity is obtained after a weighting computation in the time direction has been applied to the frame similarity. This emphasizes the frame similarity values at meaningful frames, so that good-quality section similarities are obtained, and building a word recognition system on these section similarities improves the recognition rate.

Brief Description of the Drawings

FIG. 1 is a block diagram of a speech recognition device according to an embodiment of the present invention, and FIG. 2 is a block diagram of a conventional speech recognition device.

1: speech analysis means
2: frame similarity calculation means
3: phoneme standard pattern
4: weighting computation means
5: weighting function storage means
6: section similarity calculation means
7: word similarity calculation means
8: word dictionary
9: word determination means

Claims

A speech recognition device comprising: speech analysis means for analyzing input speech frame by frame, the frame being the unit of analysis, to obtain feature parameters; standard pattern storage means for storing standard patterns representing the features of the basic units of recognition; frame similarity calculation means for calculating, for each frame, the similarity between the feature parameters and the standard patterns; a word dictionary storing information on the pronunciation of words, written in symbols that represent the basic units of recognition; weighting function storage means for storing or generating a weighting function in the time direction; and weighting computation means for applying a weighting computation, using the weighting function, to the frame similarity calculated by the frame similarity calculation means, the device being characterized in that the word similarity is calculated by accumulating the frame similarities of each basic unit of recognition obtained by the weighting computation according to the notation in the word dictionary.