JPH02293899A

JPH02293899A - Voice recognizing device

Info

Publication number: JPH02293899A
Application number: JP1115314A
Authority: JP
Inventors: Takeshi Norimatsu; 武志則松
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1989-05-09
Filing date: 1989-05-09
Publication date: 1990-12-05

Abstract

PURPOSE: To improve recognition performance by providing a silent section detecting section, which detects silent sections and stores their frame positions, and a similarity calculation control section, which computes cumulative distances in accordance with a matching route. CONSTITUTION: The device is provided with a silent section detecting section 11, which detects silent sections from the energy series output by a speech analyzing section 10 and stores their start and final frames, and a similarity calculation control section 12, which executes the pattern matching computation using, for the input pattern and each standard pattern, the start point obtained by speech section detection and the first silent section point as start-point candidates. Taking possible loss of the word ending into consideration, the pattern matching calculation is also controlled so that the start frame of the final silent section on the time series of each speech pattern is treated as an end-point candidate together with the ends of the standard patterns and the input speech pattern, and the similarity between the two is thereby determined. Accurate recognition is thus achieved even when the end of a word is missing.

Description

Detailed Description of the Invention

Field of Industrial Application

The present invention relates to a speech recognition device that derives a recognition result by pattern matching between an input speech pattern and each standard pattern.

Prior Art

In devices of this type, the similarity between the input speech pattern and each standard pattern stored in a dictionary is calculated, and the standard pattern with the highest similarity is taken as the recognition result. To calculate the similarity between two speech patterns, pattern matching based on dynamic programming (hereinafter, DP matching) is used, which nonlinearly expands and contracts the time axes of the two patterns. Word speech recognition devices in particular achieve high recognition rates with this DP matching method, as described, for example, in the following document.

H. Sakoe and S. Chiba, "Dynamic programming optimization for spoken word recognition," IEEE Trans. Acoustics, Speech, Signal Processing, vol. ASSP-27, pp. 3311i-349, 1979.

An example of a conventional speech recognition device is described below with reference to the drawings.

In FIG. 7, reference numeral 1 denotes a speech analysis unit, which converts the input speech signal into a time series of feature vectors and detects the uttered speech section. 2 denotes a word dictionary, in which the time series of feature vectors of each word to be recognized is stored in advance as a standard pattern. 3 denotes a similarity calculation unit, which calculates the similarity between the input speech pattern and each standard pattern by pattern matching. 4 denotes a determination unit, which selects the standard pattern with the highest similarity as the recognition candidate.

Next, the operation of this speech recognition device is described.

First, a speech signal input through a microphone or the like is converted by the speech analysis unit 1 into a time series of feature parameters, such as linear prediction coefficients and cepstrum coefficients, and into an energy value series; the portion in which the energy value is at or above a certain value is detected as the speech section. It is assumed that, for each word to be recognized, the time series of feature vectors of the speech section obtained by the speech analysis unit 1 has already been stored in the word dictionary 2 as a standard pattern. Next, the similarity calculation unit 3 performs the DP matching calculation between the time series of feature vectors of the input speech and that of each standard pattern stored in the word dictionary 2, and stores each resulting similarity. Finally, the determination unit 4 selects, from the similarities obtained by the similarity calculation unit 3, the standard pattern that gives the highest similarity as the recognition candidate and outputs it externally as the recognition result.
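The energy-threshold section detection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the device's actual implementation: the function name `detect_speech_section`, the list-of-floats energy representation, and the threshold value are all hypothetical.

```python
def detect_speech_section(energy, threshold):
    """Return (start, end) frame indices, inclusive, spanning the first to the
    last frame whose energy is at or above the threshold, or None if no frame
    qualifies. Frame indices are 0-based (an assumption of this sketch)."""
    voiced = [t for t, e in enumerate(energy) if e >= threshold]
    if not voiced:
        return None
    return voiced[0], voiced[-1]


# Hypothetical energy contour: frames 0-2 stand in for a weak word onset.
energy = [1, 2, 2, 9, 8, 1, 9, 9, 7, 1]
print(detect_speech_section(energy, threshold=5))  # (3, 8)
```

In this example the three low-energy onset frames fall below the threshold and are left out of the detected section, which is the kind of truncation that a plain threshold detector can produce.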

Problems to be Solved by the Invention

With the speech recognition device described above, however, misrecognition often occurs when the section of the input speech is detected incorrectly, that is, when the section is detected with the word-initial or word-final syllable missing. A typical example of a missing word beginning is speech that starts with a voiceless consonant (Fukuoka, Fukushima, Fukui, and so on). When "FUKUOKA" is uttered, for instance, the vowel "U" of the initial "FU" is often devoiced, so the energy level at the beginning of the word becomes low and speech section detection frequently picks up only the "KUOKA" portion. On the other hand, people tend to articulate deliberately and clearly in environments with high ambient noise, in which case the vowel "U" of the initial "FU" may be voiced; the detected section is then sometimes "FUKUOKA" and sometimes "KUOKA". When the beginning of a word may be lost in this way depending on the speaking conditions, and the speech section detection results differ between the standard pattern registered in the dictionary and the input speech pattern (for example, one is detected as "FUKUOKA" and the other as "KUOKA"), the conventional DP matching method cannot cope, and misrecognition results. The same applies to missing word endings.

In addition, because a conventional speech recognition device based on DP matching performs pattern matching over the entire speech pattern, an extreme (excessively warped) match can cause misrecognition.

For example, consider Oita (OOITA) and Osaka (OOSAKA). The former has two energy peaks in its speech pattern and the latter three, so the two patterns clearly differ in terms of energy; because their phonemes are similar, however, misrecognition can occur. One countermeasure is to detect the peaks and valleys of the energy in the speech pattern and restrict the pattern matching path accordingly, but a detection error then itself becomes a cause of misrecognition.
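As a rough illustration of what "counting energy peaks" might look like, the sketch below counts contiguous runs of frames at or above a threshold. The function name, the threshold, and the toy energy contours are assumptions made for this example; the source does not specify how peaks and valleys are detected.

```python
def count_energy_peaks(energy, threshold):
    """Count contiguous runs of frames whose energy is at or above threshold.
    Each such run is treated as one energy peak (an assumption of this sketch)."""
    peaks = 0
    above = False
    for e in energy:
        if e >= threshold and not above:
            peaks += 1  # a new run above the threshold starts here
        above = e >= threshold
    return peaks


# Toy contours standing in for a two-peak and a three-peak word.
two_peak_word = [9, 9, 1, 8, 8]
three_peak_word = [9, 9, 1, 8, 1, 7]
print(count_energy_peaks(two_peak_word, 5), count_energy_peaks(three_peak_word, 5))  # 2 3
```

A single frame that wrongly crosses the threshold changes the count, which illustrates why a path restriction that relies on this detection is sensitive to detection errors.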

In view of the above problems with conventional speech recognition devices, an object of the present invention is to provide a speech recognition device that recognizes speech accurately even when the word beginning or ending of the target speech may be missing, that prevents misrecognition between similar words as far as possible, and that improves rejection performance when a word outside the vocabulary is input.

The present invention aims to solve these problems of the prior art.

Means for Solving the Problems

The speech recognition device of the first invention (claim 1) comprises: a speech analysis unit that outputs, from input speech, a time series of feature vectors including an energy series; a silent section detection unit that detects silent sections from the energy series output by the speech analysis unit and stores their start and end frames; and a similarity calculation control unit that executes the pattern matching operation using, for each of the input pattern and the standard pattern, the start point obtained by speech section detection and the first silent section point as start-point candidates.

The speech recognition device of the second invention (claim 2) comprises: a speech analysis unit that outputs, from input speech, a time series of feature vectors including an energy series; a silent section detection unit that detects silent sections from the energy series output by the speech analysis unit and stores their start and end frames; a first similarity calculation control unit that executes, for the many possible paths, a pattern matching operation in which the matching path is constrained at the intersections of the end frames of the silent sections of each standard pattern and of the input speech pattern; and a second similarity calculation control unit that executes the pattern matching operation using, for each of the input pattern and the standard pattern, the end point obtained by speech section detection and the start frame of the last silent section as end-point candidates.

The speech recognition device of the fourth invention (claim 4) comprises: a speech analysis unit that outputs, from input speech, a time series of feature vectors including an energy series; a silent section detection unit that detects silent sections from the energy series output by the speech analysis unit and stores their start and end frames; a first similarity calculation control unit that executes, for the many possible paths, a pattern matching operation in which the matching path is constrained at the intersections of the end frames of the silent sections of each standard pattern and of the input speech pattern; a second similarity calculation control unit that executes the pattern matching operation using, for each of the input pattern and the standard pattern, the end point obtained by speech section detection and the start frame of the last silent section as end-point candidates; and a third similarity calculation control unit that executes the pattern matching operation using, for each of the input pattern and the standard pattern, the start point and the end frame of the last silent section as start-point candidates.

Operation

With the configuration described above, the first invention takes possible loss of the word ending into account: in addition to the ends of each standard pattern and of the input speech pattern, the start frame of the last silent section on the time series of each speech pattern is also treated as an end-point candidate when the pattern matching calculation is controlled and the similarity between the two patterns is obtained. Speech can therefore be recognized accurately even when the word ending is missing.

With the configuration described above, the second invention adds the following to the operation of the first invention: pattern matching takes into account the possible matching paths constrained by the intersections of the silent sections, so that even when silent section detection is wrong, rejection performance for out-of-vocabulary input improves and misrecognition between similar words is suppressed as far as possible.

With the configuration described above, the fourth invention adds the following to the operation of the second invention: to allow for a missing word beginning, the end frame of the first silent section on the time series of each speech pattern is treated as a start-point candidate, together with the beginnings of each standard pattern and of the input speech pattern, when the similarity calculation is controlled. Speech can therefore be recognized accurately even when the word beginning is missing.

Embodiments

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a block diagram of a speech recognition device according to a first embodiment of the present invention. In FIG. 1, 10 denotes a speech analysis unit, which has the same function as the speech analysis unit 1 in FIG. 7. 11 denotes a silent section detection unit, which detects silent portions from the time series of speech energy values and stores their start frames. 12 denotes a similarity calculation control unit, which controls the processing so that the similarity is calculated only at the multiple end-point candidates of the speech patterns. 13 denotes a pattern matching control unit, which controls the overall pattern matching calculation.

14 denotes a word dictionary, which stores the time series of feature vectors of the words to be recognized and the frame positions of their silent sections. 15 denotes a determination unit, which selects the standard pattern with the highest similarity as the recognition candidate. FIG. 2 is a pattern diagram for explaining the device shown in FIG. 1.

Next, the operation of the speech recognition device of the first embodiment is described.

First, a speech signal input through a microphone or the like is analog-to-digital converted by the speech analysis unit 10 and transformed into a time series of speech feature vectors and an energy value series, and the uttered speech portion is detected as the speech section. Next, from the energy value series obtained by the speech analysis unit 10, the silent section detection unit 11 detects as a silent section, for example, any section in which the energy value stays below a predetermined threshold for longer than a certain time, and stores the start frame of that silent section. It is assumed that the word dictionary 14 already stores the time series of feature vectors of the speech to be recognized together with the silent section information.
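The silent section detection just described (energy below a threshold for a sufficiently long run of frames) might be sketched as below. The function name `detect_silent_sections`, the `min_frames` parameter, and the convention that the energy series has already been trimmed to the detected speech section are assumptions of this example rather than details given in the source.

```python
def detect_silent_sections(energy, threshold, min_frames):
    """Return a list of (start_frame, end_frame) pairs, inclusive and 0-based,
    for every run of frames with energy below `threshold` that lasts at least
    `min_frames` frames and is followed by a louder frame.

    Assumption: `energy` covers only the detected speech section, so a
    low-energy run that reaches the end of the list is not reported."""
    sections = []
    run_start = None
    for t, e in enumerate(energy):
        if e < threshold:
            if run_start is None:
                run_start = t  # a low-energy run begins
        else:
            if run_start is not None and t - run_start >= min_frames:
                sections.append((run_start, t - 1))
            run_start = None
    return sections


energy = [9, 8, 1, 1, 1, 7, 9, 2, 8, 1, 1, 6]
print(detect_silent_sections(energy, threshold=5, min_frames=2))
# [(2, 4), (9, 10)]
```

The single low frame at index 7 is too short to count as a silent section, which mirrors the duration condition in the text. The first embodiment stores only the start frame of each section, while the later embodiments also use the end frame.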

The following recurrence is used here for pattern matching.

$$g(1, 1) = d(1, 1) \tag{1}$$

$$g(i, j) = \min\bigl(g(i-1, j),\ g(i-1, j-1),\ g(i-1, j-2)\bigr) + d(i, j) \tag{2}$$

Here $g(i, j)$ and $d(i, j)$ denote, respectively, the cumulative distance and the distance between feature vectors at frame $i$ of the input speech pattern and frame $j$ of the standard pattern. A recurrence other than the one above may also be used. For simplicity, restriction of the matching path by a matching window is not considered here. As an example, consider pattern matching between an input speech pattern and a standard pattern such as those shown in FIG. 2, where the envelope drawn for each speech pattern shows how its energy changes over time. Let $I_{OE}$ be the start frame of the last silent section of the input speech pattern and $J_{OE}$ the start frame of the last silent section of the standard pattern.
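Recurrences (1) and (2) can be written out as a short dynamic-programming routine. This is a sketch under assumptions: frames are indexed from 0 rather than 1, feature vectors are simplified to scalars in the example, and the names `dp_cumulative_distances` and `dist` are hypothetical.

```python
import math


def dp_cumulative_distances(inp, ref, dist):
    """Fill the cumulative-distance table g for input pattern `inp` and
    standard pattern `ref`, using
        g(0, 0) = d(0, 0)
        g(i, j) = min(g(i-1, j), g(i-1, j-1), g(i-1, j-2)) + d(i, j)
    Cells that cannot be reached from (0, 0) stay at infinity."""
    n_in, n_ref = len(inp), len(ref)
    g = [[math.inf] * n_ref for _ in range(n_in)]
    g[0][0] = dist(inp[0], ref[0])  # equation (1)
    for i in range(1, n_in):
        for j in range(n_ref):
            best = g[i - 1][j]
            if j >= 1:
                best = min(best, g[i - 1][j - 1])
            if j >= 2:
                best = min(best, g[i - 1][j - 2])
            if best < math.inf:
                g[i][j] = best + dist(inp[i], ref[j])  # equation (2)
    return g


# Scalar "feature vectors" and absolute difference, purely for illustration.
g = dp_cumulative_distances([1, 2, 3], [1, 2, 3], lambda a, b: abs(a - b))
print(g[2][2])  # 0
```

Because every step advances the input frame by exactly one, the whole table is filled in a single pass over the input frames, which is what allows several end points to be read from the same table afterwards.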

When the speech to be recognized ends with, for example, a voiceless consonant, detection of the word-final section is especially likely to go wrong, and this is equally true of the standard pattern and the input speech pattern.

When the word ending may be missing in both patterns in this way, three end points of the pattern matching path are possible, as shown in FIG. 2: $(I, J)$, $(I_{OE}, J)$, and $(I, J_{OE})$. Selecting the optimal path from the three matching paths that end at these points yields pattern matching that can cope with a missing word ending.

The pattern matching control unit 13 starts the cumulative distance calculation according to recurrence (2). First, with $i = 1$, the cumulative distance is calculated in the vertical direction starting from the first frame of the standard pattern, $j = 1$. When this is finished, $i$ is incremented and the same processing is repeated. When processing up to $i = I$ is complete, the similarity calculation control unit 12 takes the smaller of the cumulative distances at $(I, J)$ and $(I, J_{OE})$ as the similarity between the two patterns.

Because a slope constraint such as that of the nonlinear recurrence (2) is used here, the cumulative distance obtained at point $(I, J)$ is already the result of selecting the better of paths a and b in FIG. 2. The similarity obtained in this way is therefore the inter-pattern distance for the optimal path among the three kinds of matching path that take the three end-point candidates of FIG. 2 into account. Alternatively, the cumulative distances at the three points $(I, J)$, $(I, J_{OE})$, and $(I_{OE}, J)$ may each be normalized by the length of the input speech pattern, and the smallest of the normalized values taken as the similarity between the patterns.
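The normalized variant of the end-point selection could look like the sketch below. It assumes a precomputed cumulative-distance table `g` with 0-based indices, and it reads "length of the input speech pattern" as the number of input frames consumed up to each candidate end point; that reading, the function name, and the toy table are assumptions of this example.

```python
def similarity_from_end_candidates(g, i_oe, j_oe):
    """Return (normalized_distance, end_point) for the best of the three
    end-point candidates.

    g    : cumulative-distance table, g[i][j] for input frame i and
           standard-pattern frame j (0-based)
    i_oe : start frame of the last silent section of the input pattern
    j_oe : start frame of the last silent section of the standard pattern
    """
    last_i, last_j = len(g) - 1, len(g[0]) - 1
    candidates = [(last_i, last_j), (last_i, j_oe), (i_oe, last_j)]
    # Assumed normalization: divide by the number of input frames consumed.
    return min((g[i][j] / (i + 1), (i, j)) for i, j in candidates)


# Hand-made table for illustration only (4 input frames, 3 standard frames).
g = [
    [0, 5, 9],
    [1, 2, 6],
    [4, 2, 3],
    [6, 3, 8],
]
score, end_point = similarity_from_end_candidates(g, i_oe=2, j_oe=1)
print(score, end_point)  # 0.75 (3, 1)
```

In this toy case the best match ends at the start of the standard pattern's last silent section, which corresponds to a standard pattern whose tail has no counterpart in the input.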

After the similarities between the input speech pattern and all the standard patterns stored in the word dictionary 14 have been calculated, the determination unit 15 selects the standard pattern with the highest similarity as the recognition candidate and outputs it externally.

As described above, this embodiment provides the silent section detection unit 11, which detects silent sections of the speech, and the similarity calculation control unit 12, which calculates the similarity between patterns from the cumulative distances at multiple end-point candidates that allow for a missing word ending. Whether the word ending is missing in the standard pattern or in the input speech pattern, a single pattern matching operation covers all the possibilities, so speech with an unstable word ending can be recognized accurately without increasing the amount of computation.

Next, a speech recognition device according to a second embodiment of the present invention is described with reference to the drawings.

FIG. 3 is a block diagram of a speech recognition device according to a second embodiment of the present invention. In FIG. 3, 20 denotes a speech analysis unit that extracts the time series of speech feature vectors and of energy; 21, a silent section detection unit that detects the start and end frames of the silent sections in a speech pattern; 22, a first similarity calculation control unit that controls the pattern matching calculation in which the matching path is restricted by the silent sections; 23, a back pointer storage unit that stores, along with the cumulative distance at each point, which intersection of silent section points the matching path has previously passed through; 24, a second similarity calculation control unit that controls the similarity calculation in consideration of multiple end-point candidates; 25, a pattern matching control unit that controls the overall cumulative distance calculation; 26, a word dictionary that stores the feature vectors of the standard patterns and the frame positions of their silent sections; and 27, a determination unit that selects the standard pattern with the highest similarity as the recognition candidate. FIG. 4 is a pattern diagram for explaining the device shown in FIG. 3.

Next, the operation of the speech recognition device of the second embodiment is described.

First, the input speech signal is converted by the speech analysis unit 20 into a time series of feature vectors and a time series of energy values. Next, from the energy value series obtained by the speech analysis unit 20, the silent section detection unit 21 detects as silent sections, for example, the portions in which the energy value is at or below a certain value, and stores their start and end frames. It is assumed that the word dictionary 26 already stores the time series of feature vectors of the speech to be recognized together with the start and end frames of each silent section. The first similarity calculation control unit 22 then carries out the pattern matching calculation between the input speech pattern and each standard pattern, with the matching path constrained at the end frames of the silent sections detected by the silent section detection unit 21. The second similarity calculation control unit 24 controls the cumulative distance calculation using the ends of the input speech pattern and the standard pattern and the start frames of their respective last silent sections as end-point candidates. For simplicity, the pattern matching operation is described below for the case in which the input pattern contains two silent sections and the standard pattern contains one.

Let the start and end of the input pattern be $I_0\ (= 1)$ and $I_3\ (= I)$, and the start and end of the standard pattern be $J_0\ (= 1)$ and $J_2\ (= J)$, respectively.

Let the end frames of the silent sections of the input pattern be $I_1$ and $I_2$, in order from the start, and let the start frame of its last silent section be $I_2'$; likewise, let the corresponding frames of the standard pattern be $J_1$ and $J_1'$. These are shown in FIG. 4. For simplicity, restriction by a matching window is again not considered.

Pattern matching is carried out with the matching path constrained at the intersections where end frames of silent sections detected in the input pattern and in the standard pattern are paired with each other. Allowing for the case in which silent section detection is not performed correctly, the four paths a, b, c, and d shown in FIG. 4 must be considered. The recurrence used for pattern matching is the same as equations (1) and (2) of the first embodiment. Let $b(i, j)$ be the back pointer at point $(i, j)$, with initial value $b(1, 1) = (1, 1)$.

First, the pattern matching control unit 25 sets $i = 1$ and $j = 1$ and calculates the cumulative distance $g$ according to the recurrence while incrementing $j$; when processing up to $j = J$ is finished, it increments $i$ and repeats the same processing.

The back pointer at each point $(i, j)$ is determined as follows.

When $i = I_k$ and $j = J_l$:

$$b(i, j) = (I_k, J_l)$$

When $i \ne I_k$ or $j \ne J_l$: if point $(i, j)$ lies in the range $I_k < i \le I_{k+1}$ and $J_l < j \le J_{l+1}$, then $b(i, j)$ is set to the content of the back pointer belonging to the $g(\cdot)$ that gives the minimum in the first term on the right-hand side of recurrence (2). A point $(I_m, J_n)$ is admissible as $b(i, j)$, however, only if $m = k$ and $n \le l$, or $n = l$ and $m \le k$.

The back pointers $b(i, j)$ obtained in this way are stored in the back pointer storage unit 23.
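The admissibility condition on back pointers can be isolated into two small helpers, sketched below. This is only an illustration of the stated index condition, not the full constrained matching: the boundary lists, the frame numbers, and the names `cell_index` and `backpointer_allowed` are hypothetical, and the index reading ($m = k$ with $n \le l$, or $n = l$ with $m \le k$) is itself a reconstruction of the condition.

```python
from bisect import bisect_left


def cell_index(bounds, frame):
    """Return k such that bounds[k] < frame <= bounds[k + 1].

    `bounds` is the ascending list of boundary frames of one pattern:
    its start, the end frame of each silent section, and its end.
    For the start frame itself the result is -1 (it lies in no cell)."""
    return bisect_left(bounds, frame) - 1


def backpointer_allowed(m, n, k, l):
    """True if intersection (I_m, J_n) may be the back pointer of a point
    lying in cell (k, l), following the condition stated in the text."""
    return (m == k and n <= l) or (n == l and m <= k)


# Hypothetical boundaries: input with two silent sections, standard with one.
input_bounds = [1, 10, 20, 30]   # I_0, I_1, I_2, I_3
standard_bounds = [1, 12, 25]    # J_0, J_1, J_2

k = cell_index(input_bounds, 15)     # 10 < 15 <= 20, so k = 1
l = cell_index(standard_bounds, 14)  # 12 < 14 <= 25, so l = 1
print(k, l, backpointer_allowed(0, 0, k, l), backpointer_allowed(1, 0, k, l))
# 1 1 False True
```

In a full implementation, a predecessor whose stored back pointer fails this check would be excluded from the minimum in recurrence (2); that integration step is left out here.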

When processing up to $i = I$ is complete, the second similarity calculation control unit 24 and the determination unit 27 perform the same operations as the similarity calculation control unit and the determination unit of the first embodiment, respectively. The similarity between the input pattern and the standard pattern obtained here is the similarity for the optimal path among the four paths in FIG. 4.

As described above, this embodiment provides the silent section detection unit 21, which detects silent sections of the speech; the first similarity calculation control unit 22, which controls the calculation of the similarity between the input pattern and the standard pattern by pattern matching in which the path is restricted by the intersections of the silent sections; the back pointer storage unit 23, which stores the silent section intersections passed so far; and the second similarity calculation control unit 24, which performs the pattern matching operation in consideration of multiple end-point candidates. A single pattern matching operation can therefore cover all the possibilities for path restriction. Because the similarity is at the same time calculated at multiple end-point candidates that allow for a missing word ending, accurate recognition is possible even when silent section detection is wrong, and speech is recognized correctly whether the pattern with the missing word ending is the input pattern or the standard pattern.

Next, a speech recognition device according to a third embodiment of the present invention will be described with reference to the drawings.

FIG. 5 is a block diagram of the speech recognition device according to the third embodiment of the present invention. In FIG. 5, the speech analysis unit 30, silent section detection unit 31, back pointer storage unit 33, first similarity calculation control unit 32, second similarity calculation control unit 34, pattern matching unit 36, word dictionary 37, and determination unit 38 have the same functions as the correspondingly named units in FIG. 3, the block diagram of the second embodiment. Reference numeral 35 denotes a third similarity calculation control unit, which controls the pattern matching operation using, as start-point candidates, the start of each of the input pattern and the standard pattern and the end frame of the first silent section of each. FIG. 6 is a pattern diagram for explaining the device shown in FIG. 5.

Next, the operation of the speech recognition device according to the third embodiment will be described.

First, the input speech signal is converted by the speech analysis unit 30 into a time series of feature vectors and a time series of energy values. Next, the silent section detection unit 31 detects silent sections from the energy value series obtained by the speech analysis unit 30, for example by treating any portion in which the energy value is at or below a certain fixed value as a silent section, and stores the start frame and end frame of each. It is assumed that the time series of feature vectors of the speech to be recognized, together with the start frame and end frame of each of its silent sections, is stored in the word dictionary 37 in advance.
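The threshold-based detection described above can be sketched as follows. This is only a minimal illustration, not the implementation of the silent section detection unit 31: the function name `detect_silent_sections`, the 1-based frame numbering, and the sample threshold are all assumptions introduced here.

```python
from typing import List, Tuple


def detect_silent_sections(energy: List[float],
                           threshold: float) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) pairs, 1-based and inclusive, for each
    run of consecutive frames whose energy is at or below `threshold`."""
    sections: List[Tuple[int, int]] = []
    start = None  # start frame of the run currently being tracked, if any
    for frame, value in enumerate(energy, start=1):
        if value <= threshold:
            if start is None:
                start = frame          # a new silent run begins here
        elif start is not None:
            sections.append((start, frame - 1))  # run ended on the previous frame
            start = None
    if start is not None:              # run extends to the last frame
        sections.append((start, len(energy)))
    return sections


# Hypothetical energy series with two low-energy runs.
example = detect_silent_sections([5, 5, 0.1, 0.2, 6, 7, 0.1, 5], threshold=1.0)
```

Each returned pair corresponds to the start frame and end frame that the text says are stored for a silent section.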

Next, the control of the pattern matching operation will be described.

Here, as shown in FIG. 6, consider the case where the input pattern and the standard pattern each contain two silent sections. Let the start and end of the input pattern be $I_s$ (= 1) and $I$, respectively, and likewise let those of the standard pattern be $J_s$ (= 1) and $J$. Further, let the end frames of the silent sections in chronological order, followed by the start frame of the last silent section, be $I_1$, $I_2$, $I_2'$ for the input pattern and $J_1$, $J_2$, $J_2'$ for the standard pattern.

When the speech to be recognized begins with an unvoiced consonant or the like, the section at the beginning of the word may be detected incorrectly in the standard pattern and the input speech pattern alike.

When the beginning of a word may thus be missing in either pattern, three points can serve as the start of the pattern matching path: $(1, 1)$, $(I_1, 1)$, and $(1, J_1)$. Selecting the optimal path from among all paths that start at these three points yields pattern matching that can cope with a missing word beginning. Equations (1) and (2) of the first embodiment are used as the recurrence equations for pattern matching.

First, the third similarity calculation control unit 35 makes the following settings as initialization.

$d(1, j) = \infty$, where $j \neq 1, J_1$
$d(i, 1) = \infty$, where $i > I_1$

A path starting from point $(I_1, 1)$ is treated as a pseudo-path by permitting a straight line from point $(1, 1)$ to point $(I_1, 1)$ on the matching path.

These settings restrict the pattern matching path so that it cannot start from any point other than the three start-point candidates above. The calculation of the cumulative distance then begins from i = 1 according to recurrence equation (2). After that, the first similarity calculation control unit 32 executes the pattern matching operation in which the matching path is restricted at the intersections of the silent sections, taking silent section detection errors into account.
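The effect of restricting the path to a small set of start points can be sketched as below. This is an illustration under stated assumptions, not the patent's procedure: equations (1) and (2) of the first embodiment are not reproduced in this passage, so a generic three-predecessor dynamic-programming recurrence is substituted; the start candidates are treated as free entry points rather than through the pseudo-path along the first row; indices are 0-based; and the name `dtw_multi_start` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dtw_multi_start(dist: Sequence[Sequence[float]],
                    starts: Sequence[Tuple[int, int]]) -> List[List[float]]:
    """Cumulative-distance table for a local-distance matrix dist[i][j]
    (0-based), where a path may begin only at one of the points in `starts`.
    Assumed recurrence: g(i,j) = d(i,j) + min(g(i-1,j), g(i-1,j-1), g(i,j-1))."""
    n_i, n_j = len(dist), len(dist[0])
    start_set = set(starts)
    g = [[math.inf] * n_j for _ in range(n_i)]
    for i in range(n_i):
        for j in range(n_j):
            best_prev = math.inf
            if i > 0:
                best_prev = min(best_prev, g[i - 1][j])
            if j > 0:
                best_prev = min(best_prev, g[i][j - 1])
            if i > 0 and j > 0:
                best_prev = min(best_prev, g[i - 1][j - 1])
            if (i, j) in start_set:
                best_prev = min(best_prev, 0.0)   # a path may begin here at no cost
            elif i == 0 or j == 0:
                best_prev = math.inf              # other border cells are blocked
            g[i][j] = dist[i][j] + best_prev
    return g


# Hypothetical 3x2 local distances: the cheap path starts at (1, 0), not (0, 0).
local = [[5, 5], [1, 5], [5, 1]]
multi = dtw_multi_start(local, [(0, 0), (1, 0), (0, 1)])
single = dtw_multi_start(local, [(0, 0)])
```

In this toy case the cumulative distance at the final cell is lower when the additional start candidates are permitted, which is the behavior the multiple start-point candidates are meant to provide when the beginning of a word is missing from one pattern.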

In addition, the second similarity calculation control unit 34 controls the pattern matching using, as end-point candidates, the ends of the input pattern and the standard pattern and the start frame of the last silent section of each.
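Selecting among several end-point candidates can be sketched as follows. This is a hedged illustration only: the text does not state here which combinations of end points are evaluated or whether the cumulative distance is normalized by path length, so the candidate list is left as a parameter and no normalization is applied. The function name `best_end_point` and the 0-based indices are assumptions.

```python
from typing import Sequence, Tuple


def best_end_point(g: Sequence[Sequence[float]],
                   end_candidates: Sequence[Tuple[int, int]]
                   ) -> Tuple[Tuple[int, int], float]:
    """Return the candidate (i, j) with the smallest cumulative distance g[i][j],
    together with that distance. Indices are 0-based."""
    best = min(end_candidates, key=lambda p: g[p[0]][p[1]])
    return best, g[best[0]][best[1]]


# Hypothetical cumulative-distance table and three end-point candidates.
table = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
point, cost = best_end_point(table, [(2, 2), (1, 2), (2, 1)])
```

Because every candidate is read from the same cumulative-distance table, comparing them requires no additional matching pass.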

When multiple start-point and end-point candidates and silent section detection errors are taken into account in this way, many possible matching paths arise, as shown in FIG. 6.

The subsequent processing performed by the first similarity calculation control unit 32 and the second similarity calculation control unit 34 is the same as that of the first similarity calculation control unit 22 and the second similarity calculation control unit 24 of the second embodiment, respectively, so its description is omitted here.

After the similarity to every standard pattern stored in the word dictionary 37 has been calculated in this way, the determination unit 38 determines the standard pattern with the highest similarity to be the recognition candidate and outputs it externally.

As described above, the third embodiment provides a silent section detection unit 31 that detects silent sections of the speech; a first similarity calculation control unit 32 that executes pattern matching in which the matching path is restricted at the intersections of the silent sections; a second similarity calculation control unit 34 that performs the pattern matching operation while taking multiple end-point candidates into account; and a third similarity calculation control unit 35 that performs the pattern matching operation from multiple start-point candidates that allow for a missing word beginning. As a result, speech whose beginning or ending is missing can still be recognized accurately. Moreover, even when silent section detection is in error, a single pattern matching operation can consider all possibilities of path restriction, which improves recognition performance between similar words and allows out-of-vocabulary speech input to be rejected as far as possible.

Effects of the Invention

As described above, the first invention comprises a silent section detection unit that detects silent sections in a speech pattern and stores their frame positions, and a similarity calculation control unit that executes the cumulative distance calculation along matching paths ending at multiple pattern matching end-point candidates that allow for a missing word ending. Consequently, when the speech to be recognized includes speech with an unstable word ending, all pattern matching that takes a missing word beginning into account can be performed in a single calculation, whether the word ending is missing from the input speech pattern or from the standard pattern. It is therefore possible to provide a speech recognition device that improves recognition performance with almost no increase in the amount of calculation.

The second invention comprises a first similarity calculation control unit that executes, over the many paths made possible by silent section detection errors, a pattern matching operation in which the matching path is restricted at the intersections of the final frames of the silent sections of the standard pattern and the input speech pattern, and a back pointer that stores, when the cumulative distance at each point is calculated, which intersection of silent-section end frames the matching path passed through before reaching that point. Consequently, even when a silent section detection error occurs, the pattern matching operation along the matching paths that may fit the energy envelope can be executed in a single calculation. It is therefore possible to provide a speech recognition device that improves rejection performance for out-of-vocabulary word input and suppresses misrecognition between similar words as far as possible, without increasing the amount of calculation.

The fourth invention comprises a third similarity calculation control unit that, allowing for a missing word beginning, executes the pattern matching operation using the start point of the speech pattern and the end frame of the first silent section as start-point candidates. Consequently, all pattern matching operations that take a missing word ending into account can be executed in a single calculation, whether the word beginning is missing from the standard pattern or from the input speech pattern. It is therefore possible to provide a speech recognition device that improves recognition performance with almost no increase in the amount of calculation.

[Brief explanation of drawings]

FIG. 1 is a block diagram of the speech recognition device according to the first embodiment of the present invention; FIG. 2 is a diagram explaining the operation of that device; FIG. 3 is a block diagram of the speech recognition device according to the second embodiment; FIG. 4 is a diagram explaining the operation of that device; FIG. 5 is a block diagram of the speech recognition device according to the third embodiment; FIG. 6 is a diagram explaining its operation; and FIG. 7 is a block diagram of a conventional speech recognition device.

30: speech analysis unit; 31: silent section detection unit; 32: first similarity calculation control unit; 33: back pointer storage unit; 34: second similarity calculation control unit; 35: third similarity calculation control unit; 36: pattern matching control unit; 37: word dictionary; 38: determination unit.

Name of agent: Patent attorney Shigetaka Awano and one other

Claims

[Claims]

(1) A speech recognition device comprising: a speech analysis unit that outputs, from the input speech signal, a time series of feature vectors including an energy sequence and detects the uttered speech section; a silent section detection unit that detects silent portions and stores their start frames; and a similarity calculation control unit that calculates the similarity by pattern matching between the standard pattern and the input speech pattern, using, as end-point candidate points of each speech pattern, for each of the standard patterns stored in advance as speech to be recognized and the input speech pattern, the end of the speech section detected by the speech analysis unit and the start frame of the last silent section in the speech pattern time series detected by the silent section detection unit.

(2) A speech recognition device comprising: a speech analysis unit that outputs, from the input speech signal, a time series of feature vectors including an energy sequence and detects the uttered speech section; a silent section detection unit that detects silent portions and stores their start frames and end frames; a first similarity calculation control unit that performs the pattern matching distance calculation, in which the pattern matching path is restricted at the intersections of the silent-section end frames obtained by the silent section detection unit for the input pattern and for each standard pattern stored in the dictionary in advance, on matching paths restricted by the intersections of the end frames of all possible silent sections in consideration of silent section detection errors; and a second similarity calculation control unit that controls the similarity calculation between the standard pattern and the input speech pattern, using, as end-point candidate points of each speech pattern, for each standard pattern and the input speech pattern, the end of the speech section detected by the speech analysis unit and the start frame of the last silent section in the speech pattern time series detected by the silent section detection unit.

(3) The speech recognition device according to claim 2, wherein the first similarity calculation control unit comprises a back pointer that stores, when the cumulative distance at each point is calculated, which intersection of silent-section end frames the pattern matching path passed through before reaching that point.

(4) A speech recognition device comprising: a speech analysis unit that outputs, from the input speech signal, a time series of feature vectors including an energy sequence and detects the uttered speech section; a silent section detection unit that detects silent portions and stores their start frames and end frames; a third similarity calculation control unit that controls the similarity calculation between the standard pattern and the input speech pattern, using, as start-point candidate points of each speech pattern, for each standard pattern and the input speech pattern, the start point of the pattern and the end frame of the first silent section in the speech pattern time series detected by the silent section detection unit; a first similarity calculation control unit that performs the pattern matching distance calculation, in which the pattern matching path is restricted at the intersections of the silent-section end frames obtained by the silent section detection unit for the input pattern and for each standard pattern, on matching paths restricted by the intersections of the end frames of all possible silent sections in consideration of silent section detection errors; and a second similarity calculation control unit that controls the similarity calculation between the standard pattern and the input speech pattern, using, as end-point candidate points of each speech pattern, for each standard pattern and the input speech pattern, the end of the speech section detected by the speech analysis unit and the start frame of the last silent section in the speech pattern time series detected by the silent section detection unit.