JPS6312000A

JPS6312000A - Voice recognition equipment

Info

Publication number: JPS6312000A
Application number: JP61156635A
Authority: JP
Inventors: 武志則松
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1986-07-03
Filing date: 1986-07-03
Publication date: 1988-01-19
Anticipated expiration: 2009-12-12
Also published as: JPH06100919B2

Abstract

(57) [Abstract] This publication consists of application data filed before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of Industrial Application

The present invention relates to a speaker-independent speech recognition device that derives a recognition candidate speech.

Prior Art

In general, a speaker-independent speech recognition device groups a large number of speech patterns collected from many speakers by a clustering method, registers the representative pattern of each group as a standard pattern, calculates the similarity between the input speech pattern and every standard pattern stored in the dictionary, and takes the standard pattern with the maximum similarity as the recognition candidate speech. The similarity between two speech patterns is calculated by pattern matching that non-linearly expands and contracts the time axes of the two patterns using dynamic programming (hereinafter, DP matching). Word speech recognition devices in particular achieve high recognition rates with DP matching (see, for example, H. Sakoe and S. Chiba, "Dynamic programming optimization for spoken word recognition", IEEE Trans. Acoustics, Speech, Signal Processing, Vol. ASSP-27, pp. 336-349, 1979).
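As a rough illustration of DP matching, the following Python sketch computes a dynamic time warping distance between two sequences of feature vectors and converts it to a similarity score. It is a minimal sketch under stated assumptions, not the cited algorithm: the Euclidean frame distance, the simple three-way step pattern, the normalisation by the summed lengths, and the names `dp_matching_distance` and `similarity` are all illustrative choices that do not come from the patent.

```python
import math


def dp_matching_distance(a, b):
    """Return a DP-matching (dynamic time warping) distance between two
    sequences of feature vectors, normalised by len(a) + len(b)."""
    if not a or not b:
        raise ValueError("both patterns must contain at least one frame")

    def frame_distance(u, v):
        # Euclidean distance between two feature vectors (an assumption).
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

    n, m = len(a), len(b)
    inf = float("inf")
    # g[i][j] = minimum accumulated distance aligning a[:i] with b[:j].
    g = [[inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(a[i - 1], b[j - 1])
            # Allow a step along either time axis or along both at once,
            # which is what stretches and compresses the time axes.
            g[i][j] = d + min(g[i - 1][j], g[i][j - 1], g[i - 1][j - 1])
    return g[n][m] / (n + m)


def similarity(a, b):
    """Map the distance to a score in (0, 1]; larger means more similar."""
    return 1.0 / (1.0 + dp_matching_distance(a, b))
```

With this sketch, a pattern and a time-stretched copy of it (for example, one frame repeated) have distance 0 and similarity 1, which is the property that makes DP matching tolerant of speaking-rate differences.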
Problems to be Solved by the Invention

However, when a pattern whose word-initial or word-final portion is missing is input to the above speech recognition device, because of the manner of utterance, individual differences, speech section detection errors and the like, the input is matched against patterns that have no missing portion. The similarity is therefore low and misrecognition is likely.

Consider, for example, the utterance "FUKUOKA" (福岡). Depending on the manner of utterance and individual differences, the word-initial FU may be voiced or devoiced. When it is devoiced, the energy of the FU portion becomes very small, and a speech recognition device that detects the speech section mainly from the energy sequence of the speech is likely to drop the FU portion by mistake and detect only "KUOKA" as the speech section. Pattern matching against the standard pattern "FUKUOKA" then gives a low similarity, and misrecognition is likely. The problem with conventional speech recognition devices has thus been how to prevent the recognition rate from falling when the speech section is detected incorrectly.

In view of the above problem, the present invention provides a speech recognition device that can accurately recognize patterns whose word-initial or word-final portion may be missing depending on the manner of utterance, even when the speech section is detected incorrectly.

Means for Solving the Problems

To achieve the above object, the speech recognition device of the present invention comprises: speech section detecting means for detecting a speech section from the energy sequence of the input speech; standard pattern determining means for selecting a plurality of representative patterns for each recognition target speech from a large number of speech patterns of many speakers and determining them as standard patterns; standard pattern management means for managing the address at which each standard pattern is stored and its pattern length; partial pattern management means for managing a pattern whose word-initial or word-final portion may be missing, owing to the manner of utterance or individual differences, as a part of the standard pattern that has no missing portion; and pattern matching means for performing pattern matching between the input speech and each pattern managed by the standard pattern management means and the partial pattern management means, and taking the pattern with the maximum similarity as the recognition candidate speech.

Operation

With the configuration described above, for patterns whose word-initial or word-final portion may be missing, the present invention manages in advance the pattern with the missing portion as a part of the standard pattern that has no missing portion, and derives the recognition candidate speech by pattern matching between the input speech and both the complete standard patterns and the partial patterns of those representative patterns. Patterns whose word-initial or word-final portion is difficult to detect can therefore be recognized accurately even when the speech section is detected incorrectly. In addition, because a pattern with a missing portion is managed as a part of a complete standard pattern, the memory capacity required for the standard patterns does not increase.

Embodiment

A speech recognition device according to one embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram of a speech recognition device according to one embodiment of the present invention. In FIG. 1, 1 is a speech input section, to which the speaker's voice is input through a microphone or the like. 2 is speech analysis means, which extracts a time series of feature vectors and an energy sequence from the input speech signal. 3 is speech section detecting means, which detects the speech section from the energy sequence of the speech.

4 is standard pattern determining means, which analyzes a large number of speech patterns of many speakers and determines their representative patterns as standard patterns. 5 is standard pattern management means, which manages the memory location and pattern length of each standard pattern. 6 is partial pattern management means, which manages a pattern whose word-initial or word-final portion is missing as a part of a standard pattern managed by the standard pattern management means 5. 7 is pattern matching means, which performs pattern matching between the input pattern and each standard pattern and each partial pattern. 8 is a recognition result output section, which notifies the speaker of the recognition candidate speech derived from the result of the pattern matching means 7, by speech synthesis or the like.

FIG. 2 is a circuit diagram showing the configuration of this embodiment, in which the speech section detecting means 3, standard pattern management means 5, partial pattern management means 6 and pattern matching means 7 described above are implemented by a microcomputer 23. In FIG. 2, 11 is a microphone for speech input; 12 is an analog/digital converter (hereinafter, A/D converter) that converts the speech signal input from the microphone 11 from analog to digital; 13 is a speech analysis section; 14 is a speech section detection section; 15 is an input pattern memory that stores the time series of feature vectors of the input speech; 17 is a partial pattern management table which, for standard patterns whose word-initial or word-final portion may be missing, manages the pattern with the missing portion as a partial pattern of the standard pattern; 18 is a standard pattern management table that manages each standard pattern determined by the standard pattern determining means 4; 19 is a standard pattern memory that stores the time series of feature vectors of all standard patterns; 20 is a recognition result determination section; 21 is a speech synthesis section that synthesizes the recognition candidate speech obtained; and 22 is a speaker that outputs the synthesized speech obtained by the speech synthesis section 21.

FIG. 3 is a flowchart of the main part, explaining the operation of the microcomputer of this embodiment.

The operation of this embodiment configured as above is described in detail below, following the flowchart of FIG. 3.

First, in step 31, speech is input from the microphone 11 and converted from analog to digital by the A/D converter 12, after which the speech analysis section 13 obtains the time series of feature vectors of the speech pattern (for example, 10-dimensional linear prediction coefficients) and the energy sequence. In step 32, using the energy sequence obtained by the speech analysis section 13, a section in which the energy value stays above the threshold A0 for longer than a fixed time T0 is detected as the speech section, provided that sections in which the energy stays below the threshold A0 for at least fixed times T1 and T2 exist before the word beginning and after the word end, respectively. In step 33, the time series of feature vectors of that section is stored in the input pattern memory 15.
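The energy-based detection of step 32 can be sketched as follows. This is an illustrative reading only: the parameter names `a0`, `t0`, `t1` and `t2` stand for the energy threshold and the three durations in the text as interpreted here, measuring durations in frames is an assumption, and the function name `detect_speech_section` is hypothetical. The sketch uses a single threshold and does not merge runs separated by short pauses.

```python
def detect_speech_section(energy, a0, t0, t1, t2):
    """Return (start, end) frame indices (end exclusive) of the first run of
    frames whose energy exceeds a0 for more than t0 frames and that is
    preceded by at least t1 and followed by at least t2 frames at or below
    a0. Return None if no such run exists."""
    n = len(energy)
    i = 0
    while i < n:
        if energy[i] <= a0:
            i += 1
            continue
        # Find the end of this run of above-threshold frames.
        j = i
        while j < n and energy[j] > a0:
            j += 1
        # Count the quiet frames immediately before the run.
        before = 0
        k = i - 1
        while k >= 0 and energy[k] <= a0:
            before += 1
            k -= 1
        # Count the quiet frames immediately after the run.
        after = 0
        k = j
        while k < n and energy[k] <= a0:
            after += 1
            k += 1
        if j - i > t0 and before >= t1 and after >= t2:
            return (i, j)
        i = j
    return None


# Toy energy sequence: silence, two weak frames standing in for a devoiced
# "FU", six strong frames standing in for "KUOKA", then silence.
energy = [0, 0, 0, 1, 1, 9, 9, 9, 9, 9, 9, 0, 0, 0]
section = detect_speech_section(energy, a0=2, t0=3, t1=2, t2=2)
```

In this toy sequence the two weak frames fall below the threshold, so the detected section starts at the first strong frame. This is the failure mode described for "FUKUOKA": the detector itself behaves as specified, yet the word-initial portion never reaches the matcher.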

The standard pattern determining means 4 determines in advance, for each recognition target speech, a plurality of representative patterns from a large number of speech patterns of many speakers, and these patterns are stored in the standard pattern memory 19. The standard pattern management table 18 stores the address and pattern length used to manage each pattern in the standard pattern memory 19. For the partial pattern management table 17, the standard patterns whose word-initial or word-final portion may be missing are identified in advance, and, so that the pattern that results when the portion is missing can be managed as a partial pattern of the complete standard pattern, the table stores its address in the standard pattern memory 19 and its pattern length. In other words, the standard pattern memory 19 stores only the time series of feature vectors of the standard patterns, which serve as complete representative patterns; when a partial pattern with a missing word-initial or word-final portion is needed, only the corresponding part is read from the standard pattern memory 19 according to the partial pattern management table 17.
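One way to picture the two management tables is as lists of (address, length) entries over a single shared frame memory. The sketch below is an assumption-laden illustration: the `PatternEntry` type, the `fetch` and `partial_entry` helpers, and the idea of expressing a missing portion as a frame count are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatternEntry:
    label: str    # recognition target word this pattern stands for
    address: int  # index of the first frame in the standard pattern memory
    length: int   # number of frames


def fetch(memory, entry):
    """Return the frames an entry refers to. A partial pattern is only a
    narrower window on the same memory, so no second template is stored."""
    return memory[entry.address:entry.address + entry.length]


def partial_entry(full, head_missing=0, tail_missing=0):
    """Build a partial-pattern entry for `full`, assuming its first
    `head_missing` and last `tail_missing` frames are dropped."""
    length = full.length - head_missing - tail_missing
    if head_missing < 0 or tail_missing < 0 or length <= 0:
        raise ValueError("missing frame counts must leave at least one frame")
    return PatternEntry(full.label, full.address + head_missing, length)
```

For instance, if a "FUKUOKA" template occupies frames 2 to 7 of the memory and the "FU" portion is assumed to span two frames, `partial_entry(fukuoka, head_missing=2)` yields an entry with address 4 and length 4. The extra cost is one table row rather than a second stored template.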

In step 34, the first pattern in the standard pattern memory 19 is loaded into the memory of the DP matching section 16 according to the standard pattern management table 18. In step 35, DP matching is performed between the input pattern stored in the input pattern memory 15 and the standard pattern loaded in step 34. In step 36, the standard pattern management table 18 is consulted to check whether steps 34 and 35 have been completed for all standard patterns; if not, the process returns to step 34 and the same processing continues.

When the condition of step 36 is satisfied, step 37 loads the first partial pattern from the standard pattern memory 19 into the memory of the DP matching section 16 according to the partial pattern management table 17, and step 38 performs DP matching. Step 39 then checks, according to the partial pattern management table 17, whether steps 37 and 38 have been completed for all partial patterns; if not, the process returns to step 37.

When DP matching with all standard patterns and partial patterns is complete, the process proceeds to step 40, where the recognition result determination section 20 selects, from the similarities obtained by the DP matching section 16 for each standard pattern and partial pattern, the pattern that gives the maximum value and determines it to be the recognition candidate speech. In step 41, the speech synthesis section 21 is started, the recognition candidate speech obtained by the recognition result determination section 20 is synthesized, and it is output to the speaker 22 to notify the speaker of the recognition candidate speech.
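The matching loop of steps 34 to 40 can be sketched end to end as below. Everything here is illustrative: a compact DP-matching distance over scalar frames is included so the block runs on its own, the tables are plain (label, address, length) tuples, the competing word "OTHER" and all numeric values are made up, and the mapping from distance to similarity is an assumption.

```python
def dtw(a, b):
    """Compact DP-matching distance over scalar frames, length-normalised."""
    inf = float("inf")
    g = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    g[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            g[i][j] = abs(a[i - 1] - b[j - 1]) + min(
                g[i - 1][j], g[i][j - 1], g[i - 1][j - 1])
    return g[-1][-1] / (len(a) + len(b))


def recognize(input_pattern, memory, standard_table, partial_table):
    """Match against every standard pattern, then every partial pattern,
    and return (label, similarity) for the best match."""
    best_label, best_score = None, float("-inf")
    for table in (standard_table, partial_table):  # steps 34-36, then 37-39
        for label, address, length in table:
            reference = memory[address:address + length]
            score = 1.0 / (1.0 + dtw(input_pattern, reference))
            if score > best_score:                 # step 40: keep the maximum
                best_label, best_score = label, score
    return best_label, best_score


# Toy data with scalar "feature vectors"; values are illustrative only.
memory = [1, 1, 5, 7, 5, 8,   # "FUKUOKA": two weak FU frames, then KUOKA
          5, 6, 5, 8]         # "OTHER": a hypothetical competing word
standard_table = [("FUKUOKA", 0, 6), ("OTHER", 6, 4)]
partial_table = [("FUKUOKA", 2, 4)]  # same memory, first two frames skipped

truncated_input = [5, 7, 5, 8]       # the detector dropped the FU frames
with_partial = recognize(truncated_input, memory, standard_table, partial_table)
without_partial = recognize(truncated_input, memory, standard_table, [])
```

In this toy setup the truncated input is closest to the competing word when only complete templates are searched, and closest to "FUKUOKA" once the partial pattern is included, which mirrors the effect the embodiment describes.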

In this embodiment the standard pattern management table and the partial pattern management table are held separately, but if the partial pattern management table is regarded as a part of the standard pattern management table, the same processing can be performed with a single management table.

As described above, this embodiment has standard pattern management means for managing the standard patterns, and partial pattern management means which, for patterns whose word-initial or word-final portion may be missing, manages the pattern with the missing portion as a part of the complete standard pattern. Even when the word beginning or word end is detected incorrectly, correct recognition is possible through pattern matching with the partial patterns.

Furthermore, for standard patterns whose word-initial or word-final portion is unstable, the pattern with the missing portion can be managed with the single complete representative pattern, so the number of templates need not be increased and memory is used efficiently.

Effects of the Invention

As described above, the present invention has: standard pattern determining means for selecting a plurality of representative patterns for each recognition target speech from a large number of speech patterns of many speakers and determining them as standard patterns; standard pattern management means for managing the memory address and pattern length of each standard pattern; and partial pattern management means which, for standard patterns whose word-initial or word-final portion may be missing, manages the address and pattern length of the pattern with the missing portion as a part of the complete standard pattern, using that single standard pattern. Pattern matching is performed between the input pattern and each standard pattern and each partial pattern, and the pattern with the maximum similarity is taken as the recognition candidate speech. Consequently, even when a pattern whose word-initial or word-final portion has been dropped by mistake during speech section detection is input, accurate recognition is possible through pattern matching with the partial patterns managed by the partial pattern management means.

Furthermore, because a pattern with a missing portion is managed through a single representative pattern, namely the complete standard pattern, the invention provides a speech recognition device that recognizes correctly even when the speech section is detected incorrectly, without increasing the number of templates.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of a speech recognition device according to one embodiment of the present invention, FIG. 2 is a circuit block diagram showing the configuration of the same device, and FIG. 3 is a flowchart of the main part explaining the operation of the same device.

2 ... speech analysis means, 3 ... speech section detecting means, 4 ... standard pattern determining means, 5 ... standard pattern management means, 6 ... partial pattern management means, 7 ... pattern matching means, 11 ... microphone, 15 ... input pattern memory, 17 ... partial pattern management table, 18 ... standard pattern management table, 19 ... standard pattern memory, 22 ... speaker, 23 ... microcomputer.

Claims

[Claims]

A speech recognition device characterized by comprising: speech analysis means for extracting, from input speech, a time series of feature vectors including an energy sequence; speech section detecting means for detecting a speech section from the energy sequence obtained by the speech analysis means; standard pattern determining means for selecting representative patterns and determining a plurality of them as standard patterns for each recognition target speech; standard pattern management means for managing the memory address at which each standard pattern determined by the standard pattern determining means is stored and its pattern length; partial pattern management means which, for standard patterns of recognition target speech whose word-initial or word-final portion may be missing depending on the manner of utterance or individual differences, takes the pattern with no missing portion as a representative pattern and manages, as a part of that representative pattern, the memory address and pattern length of the pattern with the missing portion; and pattern matching means for performing pattern matching between the input speech pattern and each standard pattern managed by the standard pattern management means and each partial pattern of the standard patterns managed by the partial pattern management means, and taking the pattern with the maximum similarity as the recognition candidate speech.