JPH04305700A

JPH04305700A - Pattern recognition device

Info

Publication number: JPH04305700A
Application number: JP3062118A
Authority: JP
Inventors: Hiroshi Kanazawa; 博史金澤; Yoichi Takebayashi; 洋一竹林; Hiroyuki Tsuboi; 宏之坪井
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1991-02-13
Filing date: 1991-03-26
Publication date: 1992-10-28
Anticipated expiration: 2016-02-19
Also published as: JP3135594B2

Abstract

PURPOSE: To allow recognition dictionary learning and estimation of the parameters for likelihood calculation to be carried out separately, using recognition dictionary learning data and likelihood calculation parameter estimation data that are prepared separately. CONSTITUTION: A feature vector is extracted from feature parameters obtained by analyzing input data and is matched against the recognition dictionary 4 to obtain a similarity; a likelihood corresponding to the similarity is calculated and used to obtain the recognition result for the input data. The recognition dictionary 4 is trained using feature vectors extracted from recognition dictionary learning data 7 belonging to the categories to be recognized, together with the recognition results for those feature vectors. The parameters for likelihood calculation are estimated from the similarities obtained by matching feature vectors extracted from the likelihood calculation parameter estimation data 8 against the trained recognition dictionary 4, and are used in the next recognition process.

Description

[Detailed description of the invention]

[0001]

[Field of Industrial Application] The present invention relates to a pattern recognition device for recognizing characters, speech, and the like.

[0002]

[Prior Art] The following kind of pattern recognition device for recognizing characters, speech, and the like is conventionally known. For example, a pattern recognition device for speech extracts a speech pattern from the input speech data, matches this pattern against the contents of the recognition dictionary for each recognition target category to obtain a similarity, and outputs the category showing the maximum similarity as the recognition result.

[0003] In this case, the similarity obtained by matching the pattern against the recognition dictionary is a measure of how closely the pattern resembles the contents of the recognition dictionary. In other words, the similarity only indicates which of the recognition target categories the pattern most resembles, and it has been difficult to use it to determine whether or not the extracted pattern actually belongs to a given category.

[0004] Conventionally, therefore, as a means of identifying the category, a method has been considered in which a predetermined threshold is set for the similarity: if the similarity obtained by matching the pattern against the recognition dictionary is smaller than the threshold, rejection processing is executed, and if it is larger than the threshold, the pattern is identified as belonging to that category.

[0005] However, the threshold set for the similarity tends to depend on the recognition conditions. For example, when recognizing speech data with superimposed noise, the threshold is set low in anticipation of an overall drop in similarity, whereas when recognizing noise-free speech data it is set high. It has therefore been difficult to determine the threshold uniquely.
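The threshold-based rejection described in paragraphs [0004] and [0005] can be sketched as follows. This is a minimal illustration only: the function name, the category labels, and the numeric thresholds are hypothetical, and treating a similarity exactly equal to the threshold as accepted is an assumption, since the text does not cover that case.

```python
from typing import Dict, Optional


def recognize_with_threshold(similarities: Dict[str, float],
                             threshold: float) -> Optional[str]:
    """Return the most similar category, or None when it is rejected.

    `similarities` maps each category name to its similarity value.
    Equality with the threshold is treated as acceptance (an assumption).
    """
    best = max(similarities, key=similarities.get)
    return best if similarities[best] >= threshold else None


# Hypothetical similarity values for two categories.
scores = {"yes": 0.82, "no": 0.41}
clean_result = recognize_with_threshold(scores, threshold=0.9)  # high threshold for clean speech
noisy_result = recognize_with_threshold(scores, threshold=0.7)  # lower threshold for noisy speech
```

The same scores are rejected under one threshold and accepted under the other, which is the dependence on recognition conditions that the text points out.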

[0006] Therefore, in any case, using such a similarity only tells how closely the extracted pattern resembles the recognition dictionary; it is not an effective way of knowing with what probability a given similarity value yields a correct result.

[0007] Conventionally, therefore, a method of converting the similarity value into a likelihood has been considered. In this method, the similarity value is converted into a likelihood and the similarity is treated as a probabilistic event, which makes it possible to express certainty and to handle as a clear relationship what had previously been an unclear relationship between one category and another. (Reference: Teruhiko Ukita et al., "A Speaker Independent Recognition Algorithm for Connected Word Using Word Boundary Hypothesizer", Proceedings of ICASSP 86, pp. 1077-1080.) Even with this method, however, the likelihood conversion parameters obtained from the data set used to train the recognition dictionary differ greatly from those obtained from other data sets. Consequently, no improvement in recognition performance can be expected if recognition dictionary learning and estimation of the likelihood conversion parameters are performed on the same data set, and even if they are performed on different data sets, there is no guideline as to how the recognition dictionary should actually be trained, so an improvement in recognition performance has been difficult to expect.

[0008]

[Problems to be Solved by the Invention] As described above, in conventional pattern recognition devices, when the similarity value obtained by matching the input pattern against the recognition dictionary is converted into a likelihood, there is no clear guideline as to how recognition dictionary learning and estimation of the likelihood conversion parameters should be carried out, and it has therefore been difficult to expect pattern recognition with excellent recognition performance.

[0009] The present invention has been made in view of the above circumstances, and its object is to provide a pattern recognition device that can efficiently perform recognition dictionary learning and estimation of the parameters for likelihood calculation, and that can be expected to improve recognition performance.

[0010]

[Means for Solving the Problems] The pattern recognition device of the present invention extracts a feature vector of fixed dimension from feature parameters obtained by analyzing input data, matches this feature vector against a recognition dictionary to obtain a similarity, calculates a likelihood corresponding to the similarity, and uses the calculated likelihood to obtain a recognition result for the input data. The recognition dictionary is trained using feature vectors extracted from recognition dictionary learning data belonging to the recognition target categories together with the recognition results for those feature vectors, and the parameters for likelihood calculation are estimated from the similarities obtained by matching feature vectors extracted from likelihood calculation parameter estimation data against the trained recognition dictionary, and are used in the next recognition process.

[0011] Further, the pattern recognition device of the present invention extracts a feature vector of fixed dimension from feature parameters obtained by analyzing input data, matches this feature vector against a recognition dictionary to obtain a similarity, calculates a likelihood corresponding to the similarity, and uses this likelihood to obtain a recognition result for the input data, and comprises data set selection means capable of selecting, from a plurality of data sets prepared in advance, the data sets to be used for recognition dictionary learning and for likelihood calculation parameter estimation. Recognition dictionary learning is executed using the recognition results for feature vectors from the data set that the data set selection means has selected for recognition dictionary learning, and the parameters for likelihood calculation are estimated by matching feature vectors from the data set selected for likelihood calculation parameter estimation against the trained recognition dictionary. For the next round of recognition dictionary learning, the selected data sets include the data set previously selected for likelihood calculation parameter estimation, while for likelihood calculation parameter estimation a data set that has not been used for recognition dictionary learning is selected.

[0012]

[Operation] As a result, according to the present invention, recognition dictionary learning and parameter estimation for likelihood calculation can be performed efficiently, and pattern recognition with excellent recognition performance can be realized.

[0013]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

[0014] FIG. 1 shows the schematic configuration of the present invention as applied to a speech recognition device. Reference numeral 1 denotes a speech input unit, which converts a speech signal input through a microphone or the like (not shown) into a digital signal and outputs it.

[0015] In this case, the speech input unit 1 consists of a low-pass filter (LPF) that removes high-frequency noise components and an A/D converter that converts the input speech signal taken in through the LPF into a digital signal with, for example, a sampling frequency of 12 kHz and 12 quantization bits. The specifications of this digitization can be chosen as appropriate according to, for example, the recognition performance required for the input speech; a digital signal with a sampling frequency of 8 kHz and 16 quantization bits may be obtained instead. The speech input unit 1 can also receive recognition dictionary learning data 7, which is data belonging to the recognition target categories, and likelihood calculation parameter estimation data 8, which is used to estimate the parameters for likelihood calculation.

[0016] The output converted into a digital signal by the speech input unit 1 is sent to the speech analysis unit 2. The speech analysis unit 2 analyzes the speech data supplied from the speech input unit 1, basically using a technique such as FFT analysis or LPC analysis, and obtains feature parameters, for example, every 8 msec.
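To make the frame-by-frame analysis of paragraph [0016] concrete, the following is a rough sketch that cuts a 12 kHz signal into 8 msec frames and computes a coarse power spectrum per frame. Several details are assumptions not stated in the text: frames do not overlap (frame length equals the 8 msec interval), a naive DFT stands in for the FFT, no window function is applied, and the spectrum is grouped into 16 equal-width bands. All names are hypothetical.

```python
import cmath
import math
from typing import List

SAMPLE_RATE_HZ = 12_000   # sampling frequency given in the text
FRAME_SHIFT_MS = 8        # analysis interval given in the text
FRAME_LEN = SAMPLE_RATE_HZ * FRAME_SHIFT_MS // 1000  # 96 samples; no overlap assumed


def power_spectrum(frame: List[float]) -> List[float]:
    """Naive DFT power spectrum of one frame (bins 0 .. n/2 - 1)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def band_energies(spectrum: List[float], n_bands: int = 16) -> List[float]:
    """Sum adjacent DFT bins into n_bands equal-width bands (assumed grouping)."""
    width = len(spectrum) // n_bands
    return [sum(spectrum[b * width:(b + 1) * width]) for b in range(n_bands)]


def analyze(signal: List[float]) -> List[List[float]]:
    """Return one 16-band feature parameter vector per 8 msec frame."""
    starts = range(0, len(signal) - FRAME_LEN + 1, FRAME_LEN)
    return [band_energies(power_spectrum(signal[s:s + FRAME_LEN])) for s in starts]


# Hypothetical input: 80 msec of a 1 kHz tone.
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE_HZ) for t in range(960)]
features = analyze(tone)
```

With 96-sample frames the DFT bins are 125 Hz apart, so each of the 16 bands covers three bins.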

[0017] The time series of feature parameters obtained by the speech analysis unit 2 is sent to the recognition processing unit 3 and used for recognition processing. The recognition processing unit 3 consists of a speech feature vector extraction unit 31 that extracts a speech feature vector using the feature parameters obtained by the speech analysis unit 2, a similarity calculation unit 32 that performs pattern matching between the speech feature vector extracted by the speech feature vector extraction unit 31 and the recognition dictionary 4 of the recognition target categories and calculates the similarity, and a likelihood calculation unit 33 that obtains a likelihood from the similarity value obtained by the similarity calculation unit 32.

[0018] In this case, the speech feature vector extraction unit 31 detects the start and end points of the input speech using the speech feature parameters, resamples the feature parameters of the speech segment between the start and end points, and obtains a speech feature vector of fixed dimension, for example 16 dimensions in the frequency direction by 16 dimensions in the time direction as shown in FIG. 2. The speech feature vector obtained by the speech feature vector extraction unit 31 is sent to the similarity calculation unit 32 and the recognition dictionary learning unit 6. The recognition dictionary learning unit 6 trains the recognition dictionary using the speech feature vector and the recognition result of the recognition result output unit 5, described later. The similarity calculation unit 32 performs pattern matching between the speech feature vector supplied from the speech feature vector extraction unit 31 and the recognition target categories registered in the recognition dictionary 4, using a technique such as multiple (composite) similarity calculation, for example. The matching result of the similarity calculation unit 32 is also sent to the likelihood calculation parameter estimation unit 9 and used to estimate the likelihood calculation parameters 10. The likelihood calculation unit 33 converts the similarity value obtained by the similarity calculation unit 32 into a likelihood using the likelihood calculation parameters 10.
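The resampling step in paragraph [0018] turns a speech segment of arbitrary length into a vector of fixed dimension. A minimal sketch follows, assuming that resampling means picking equally spaced frames (nearest frame, no interpolation) between the detected start and end points; the text does not specify the resampling rule, and the function name is hypothetical.

```python
from typing import List


def resample_segment(frames: List[List[float]], start: int, end: int,
                     n_time: int = 16) -> List[float]:
    """Pick n_time equally spaced frames from frames[start..end] (inclusive)
    and concatenate them into one fixed-dimension feature vector.

    With 16 frequency channels per frame and n_time = 16, the result has
    16 x 16 = 256 elements. Nearest-frame selection is an assumption.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    vector: List[float] = []
    for i in range(n_time):
        position = start + (end - start) * i / (n_time - 1)
        vector.extend(frames[int(round(position))])
    return vector


# Hypothetical input: 40 frames of 16 channels, each frame filled with its index.
frames = [[float(t)] * 16 for t in range(40)]
feature_vector = resample_segment(frames, start=5, end=35)
```

Segments of different durations all map to the same 256-element layout, which is what allows them to be matched against a single recognition dictionary.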

[0019] Here, obtaining a posterior probability as the likelihood means obtaining the probability P(Ci|Si) that an input actually belongs to category Ci when it has the similarity value Si for category Ci. Because the posterior probability P(Ci|Si) cannot be obtained directly, it is instead obtained from Bayes' theorem using the following expression.

[0020]

[Math 1] (equation not reproduced in this text)

[0021] The posterior probability is obtained by this expression.

[0022] Here, P(Si|Ci) is the probability of obtaining the similarity Si when the category is Ci, and P(Ci) is the probability that category Ci occurs, which can be assumed equal for all categories. This makes it possible to derive P(Ci|Si) from P(Si|Ci). (Reference: Institute of Electronics, Information and Communication Engineers, Pattern Recognition and Understanding Study Group, material PRU87-18 (1987).) In FIG. 1, P(Si|Ci) can be obtained from the distribution of the similarity values calculated by the similarity calculation unit 32 after the likelihood parameter estimation speech data 8 is input to the speech input unit 1, speech feature parameters are extracted by the analysis in the speech analysis unit 2, and speech feature vectors are extracted. In practice, it is obtained for each category by using the similarity between the speech feature vector and the recognition dictionary 4 of the category to which the input speech belongs, finding the distribution function that the similarity distribution follows, and finding the several parameters that determine that distribution function.
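For orientation, the generic form of Bayes' theorem in the notation of paragraphs [0019] to [0022] is written out here. This is the textbook identity, not a reconstruction of the patent's own [Math 1], which may be written differently; in particular, expanding the denominator over all M categories and cancelling equal priors P(C_j) = 1/M is an assumption.

```latex
P(C_i \mid S_i)
  = \frac{P(S_i \mid C_i)\, P(C_i)}{P(S_i)}
  = \frac{P(S_i \mid C_i)\, P(C_i)}{\sum_{j=1}^{M} P(S_i \mid C_j)\, P(C_j)}
  \;\overset{P(C_j) = 1/M}{=}\;
  \frac{P(S_i \mid C_i)}{\sum_{j=1}^{M} P(S_i \mid C_j)}
```

Under this reading, only the similarity distributions need to be estimated from data, which matches the text's statement that P(Ci|Si) can be derived from P(Si|Ci) once the priors are taken to be equal.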

[0023] In addition, the term

[0024]

[Math 2] (equation not reproduced in this text)

[0025] can be obtained in the same way as P(Si|Ci), by using the similarities between the speech feature vectors and the recognition dictionaries of all categories to be recognized and finding the resulting similarity distribution. Obtaining P(Ci|Si) from these quantities makes it possible to convert the similarity into a likelihood (posterior probability).
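The conversion from similarity to posterior probability can be sketched as follows. The text says only that a distribution function is fitted to the similarity values and that its parameters become the likelihood calculation parameters; the choice of a Gaussian, the two-way split into "input belongs to Ci" and "input does not belong to Ci", and all names here are assumptions made for illustration.

```python
import math
from statistics import mean, pstdev
from typing import List, Tuple

Params = Tuple[float, float]  # (mean, standard deviation) of a similarity distribution


def fit_gaussian(similarities: List[float]) -> Params:
    """Estimate likelihood calculation parameters as a Gaussian (assumed form)."""
    return mean(similarities), max(pstdev(similarities), 1e-6)


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior(similarity: float, in_params: Params, out_params: Params,
              prior: float) -> float:
    """P(Ci | Si) by Bayes' theorem, with the denominator written as the
    mixture of the in-category and out-of-category densities."""
    p_in = gaussian_pdf(similarity, *in_params) * prior
    p_out = gaussian_pdf(similarity, *out_params) * (1.0 - prior)
    total = p_in + p_out
    return p_in / total if total > 0.0 else 0.0


# Hypothetical similarity values against the dictionary of one category.
in_category = fit_gaussian([0.80, 0.85, 0.90, 0.95])   # inputs that belong to it
out_category = fit_gaussian([0.20, 0.30, 0.40, 0.50])  # inputs that do not
high = posterior(0.90, in_category, out_category, prior=0.1)
low = posterior(0.35, in_category, out_category, prior=0.1)
```

Because the result is a probability rather than a raw similarity, values from different categories become directly comparable.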

[0026] The result of converting the similarity into a likelihood in the likelihood calculation unit 33 is sent to the recognition result output unit 5, which outputs the recognition result. The recognition result of the recognition result output unit 5 is also sent to the recognition dictionary learning unit 6.

[0027] The recognition processing unit 3 has been described here with the speech feature vector extraction unit 31 and the similarity calculation unit 32 as separate units, but it is also possible to combine them and obtain the similarity for each category by a word spotting method based on continuous pattern matching, without detecting the start and end points of the input speech.

[0028] Next, the operation of the embodiment configured in this way is described.

[0029] First, matching of continuous speech patterns as shown in FIG. 3 is explained. Basically, from the sequence of feature parameters obtained by analyzing the input speech from the speech input unit 1 in the speech analysis unit 2, each analysis frame for which feature parameters were obtained is provisionally set as an end point E, and, with each end point E as a reference, a start-point candidate interval S consisting of a plurality of start points that satisfy certain speech-segment conditions is provisionally set. The feature parameter sequences of the provisional speech segments defined by these start and end points are then resampled in the time direction, and feature vectors of a predetermined dimension, each corresponding to a different speech segment, are obtained with the end point E as a reference. Such sets of feature vectors are extracted successively while the end point E is shifted along the time axis, the similarity between each feature vector and the recognition dictionary 4 is obtained, and, from the resulting similarity sequence, the speech feature vector showing the maximum similarity and its start and end point information are obtained for each category. Doing so reduces errors in start and end point detection.
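The continuous matching of paragraph [0029] can be sketched as a search over end points and start-point candidates. This is an illustrative outline under stated assumptions: the "speech-segment conditions" are reduced to a minimum and maximum segment length, resampling picks the nearest equally spaced frames, and the similarity function is supplied by the caller. None of these names or choices come from the text.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def resample(frames: List[Vector], start: int, end: int, n_time: int) -> Vector:
    """Concatenate n_time equally spaced frames from frames[start..end]."""
    out: Vector = []
    for i in range(n_time):
        out.extend(frames[int(round(start + (end - start) * i / (n_time - 1)))])
    return out


def spot_words(frames: List[Vector],
               dictionaries: Dict[str, Vector],
               similarity: Callable[[Vector, Vector], float],
               min_len: int, max_len: int,
               n_time: int = 16) -> Dict[str, Tuple[float, int, int]]:
    """For each category, return (best similarity, start frame, end frame)."""
    best: Dict[str, Tuple[float, int, int]] = {}
    for end in range(len(frames)):                      # every frame is a candidate end point E
        first = max(0, end - max_len + 1)
        for start in range(first, end - min_len + 2):   # start-point candidate interval S
            vector = resample(frames, start, end, n_time)
            for category, reference in dictionaries.items():
                s = similarity(vector, reference)
                if category not in best or s > best[category][0]:
                    best[category] = (s, start, end)
    return best


def neg_sq_distance(a: Vector, b: Vector) -> float:
    """Stand-in similarity for the demo: larger (closer to 0) is more similar."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical one-channel signal containing the pattern 1, 2, 3, 4 at frames 5..8.
signal = [[0.0]] * 5 + [[1.0], [2.0], [3.0], [4.0]] + [[0.0]] * 5
hits = spot_words(signal, {"ramp": [1.0, 2.0, 3.0, 4.0]}, neg_sq_distance,
                  min_len=4, max_len=6, n_time=4)
```

The search returns the start and end frames together with the best similarity, so no separate start and end point detection is needed beforehand.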

[0030] Next, the operation of the device of this embodiment is described with reference to FIG. 4.

[0031] First, the recognition dictionary learning data 7 is analyzed, and an initial recognition dictionary 4 is created from the speech feature vectors extracted after start and end point detection (step A1).

[0032] Next, similarities are obtained from the speech feature vectors extracted from the likelihood calculation parameter estimation data 8, the likelihood calculation parameter estimation unit 9 estimates the parameters for likelihood calculation from the similarity distribution, and the initial likelihood calculation parameters 10 are created (step A2).

[0033] Then, using the initial recognition dictionary 4 created in step A1, the similarity with each speech feature vector extracted from the recognition dictionary learning data 7 is obtained (step A3), and the similarity is converted into a likelihood using the initial likelihood calculation parameters 10 (step A4). The recognition result output unit 5 then executes recognition processing based on the likelihood obtained in step A4 (step A5).

[0034] Next, step A6 determines whether the data has been exhausted. Steps A3 to A5 are repeated until the determination is YES, that is, until the end of the data is reached.

[0035] When the end of the data is determined in step A6, the recognition dictionary learning unit 6 updates the recognition dictionary 4 using the speech feature vectors used in the similarity calculation and the recognition results of the recognition result output unit 5 (step A7).

[0036] The learning that accompanies the update of the recognition dictionary 4 is carried out by updating the covariance matrix and performing principal component analysis, as described below. Specifically, the covariance matrix is updated by equation (1).

[0037]

[Math 3] (equation not reproduced in this text)

[0038] The update coefficient α is varied according to the recognition result.

[0039] Then, from the covariance matrix updated in this way, a plurality of eigenvalues and eigenvectors are obtained by K-L expansion, a form of principal component analysis, and these are used as the recognition dictionary for multiple similarity. (Reference: Proceedings of the Acoustical Society of Japan, pp. 109-110 (October 1986).) Next, using the updated recognition dictionary 4, the similarity with each speech feature vector extracted from the likelihood calculation parameter estimation data 8 is obtained (step A8). Step A9 then determines whether the data has been exhausted, and step A8 is repeated until the determination is YES.
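The dictionary update of paragraphs [0036] to [0039] can be sketched in three steps: update the covariance matrix, extract its leading eigenvalues and eigenvectors, and score an input against them. Since equation (1) is not reproduced in this text, the rank-one update K + α·x·xᵀ is an assumption, as are the power-iteration eigen solver and the commonly cited form of the multiple similarity, the sum over m of (λm/λ1)(x·φm)² divided by |x|². All function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def update_covariance(K: Matrix, x: Vector, alpha: float) -> Matrix:
    """Assumed update form: K + alpha * x x^T, with alpha set per recognition result."""
    n = len(x)
    return [[K[i][j] + alpha * x[i] * x[j] for j in range(n)] for i in range(n)]


def mat_vec(A: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def top_eigenpairs(K: Matrix, count: int, iters: int = 200) -> List[Tuple[float, Vector]]:
    """Leading eigenpairs of a symmetric positive semidefinite matrix,
    found by power iteration with deflation."""
    n = len(K)
    A = [row[:] for row in K]
    pairs: List[Tuple[float, Vector]] = []
    for _ in range(count):
        v = [1.0 + 0.1 * i for i in range(n)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
        for _step in range(iters):
            w = mat_vec(A, v)
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0.0:
                break
            v = [c / norm for c in w]
        lam = sum(a * b for a, b in zip(v, mat_vec(A, v)))
        pairs.append((lam, v))
        # Remove the component just found so the next pass finds the next one.
        A = [[A[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def multiple_similarity(x: Vector, pairs: List[Tuple[float, Vector]]) -> float:
    """Eigenvalue-weighted sum of squared projections, normalized by |x|^2."""
    lam1 = pairs[0][0]
    norm2 = sum(c * c for c in x)
    if lam1 <= 0.0 or norm2 == 0.0:
        return 0.0
    total = sum((lam / lam1) * sum(a * b for a, b in zip(x, phi)) ** 2
                for lam, phi in pairs)
    return total / norm2


# Hypothetical 2-dimensional example: two training vectors with different weights.
K = [[0.0, 0.0], [0.0, 0.0]]
K = update_covariance(K, [1.0, 0.0], alpha=1.0)
K = update_covariance(K, [0.0, 1.0], alpha=0.25)
dictionary = top_eigenpairs(K, count=2)
```

In this toy case the dictionary holds the eigenvalues 1.0 and 0.25, so an input along the first axis scores higher than one along the second.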

[0040] When the end of the data is determined in step A9, the parameters for likelihood calculation are re-estimated using the similarity values obtained in step A8 and made available for the next likelihood calculation (step A10).

[0041] Thereafter, step A11 determines whether learning of the recognition dictionary is complete. The operations from step A3 onward are repeated until the determination is YES, at which point the processing ends.
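Steps A1 to A11 alternate between updating the dictionary on one data set and re-estimating the likelihood parameters on another. The skeleton below is a sketch of that control flow only: every processing step is passed in as a caller-supplied function with a hypothetical name, and the end-of-learning test in step A11, which the text leaves open, is replaced by a fixed number of rounds.

```python
from typing import Any, Callable, List, Tuple


def train(dict_data: List[Any], param_data: List[Any],
          build_dictionary: Callable, similarity: Callable,
          estimate_params: Callable, to_likelihood: Callable,
          decide: Callable, update_dictionary: Callable,
          n_rounds: int) -> Tuple[Any, Any]:
    """Alternate dictionary learning and likelihood parameter estimation."""
    dictionary = build_dictionary(dict_data)                                   # A1
    params = estimate_params([similarity(dictionary, x) for x in param_data])  # A2
    for _ in range(n_rounds):                                                  # A11 (assumed stop rule)
        results = []
        for x in dict_data:                                                    # A6: loop over learning data
            sims = similarity(dictionary, x)                                   # A3
            likelihoods = to_likelihood(sims, params)                          # A4
            results.append((x, decide(likelihoods)))                           # A5
        dictionary = update_dictionary(dictionary, results)                    # A7
        sims_all = [similarity(dictionary, x) for x in param_data]             # A8, A9
        params = estimate_params(sims_all)                                     # A10
    return dictionary, params


# Toy stand-ins that only exercise the control flow.
final_dictionary, final_params = train(
    dict_data=[1, 2, 3], param_data=[1, 2],
    build_dictionary=lambda data: 0,
    similarity=lambda d, x: d * x,
    estimate_params=lambda sims: sum(sims),
    to_likelihood=lambda sims, params: sims,
    decide=lambda likelihoods: likelihoods,
    update_dictionary=lambda d, results: d + 1,
    n_rounds=3,
)
```

The parameters are always re-estimated after the dictionary update and on data the dictionary was not trained on, so they describe the similarity distribution the updated dictionary produces on unseen input.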

[0042] In this way, the recognition dictionary 4 is trained using the feature vectors extracted from the recognition dictionary learning data 7 and the recognition results for those feature vectors, and the parameters for likelihood calculation are estimated from the similarities obtained by matching the feature vectors extracted from the likelihood calculation parameter estimation data 8 against the trained recognition dictionary 4 and are used in the next recognition process. The recognition dictionary 4 used for similarity calculation can therefore be enriched, the likelihood calculation parameters best suited to that recognition dictionary can be estimated, and an improvement in recognition performance can be expected. In this embodiment, moreover, keeping the recognition dictionary learning data 7 and the likelihood calculation parameter estimation data 8 separate means that a highly reliable recognition dictionary 4 can be expected to be learned.

[0043] If the recognition dictionary learning data 7 and the likelihood calculation parameter estimation data 8 were the same, the resulting similarity distribution would shift toward higher values as learning of the recognition dictionary 4 progresses, as shown in FIG. 5(a), and would deviate greatly from the similarity distribution of speech data other than the recognition dictionary learning data, shown in FIG. 5(b). Consequently, if the likelihood calculation parameters were estimated from the similarity distribution obtained from the recognition dictionary learning speech data and then used to recognize unknown input speech, no improvement in recognition performance could be expected. This result, too, shows that separating the recognition dictionary learning data 7 from the likelihood calculation parameter estimation data 8 increases the reliability of the recognition dictionary.

[0044] Next, FIG. 6 shows another embodiment of the present invention.

[0045] In FIG. 6, the recognition dictionary learning data 7 and the likelihood calculation parameter estimation data 8 of FIG. 1 are replaced by a data set unit 12 from which various data are selected by a data set selection unit 11. The rest is the same as FIG. 1; identical parts are given the same reference numerals and their description is omitted.

[0046] Here, the data set selection unit 11 selects from the data set unit 12 the data to be used for the recognition dictionary learning process and the likelihood calculation parameter estimation described above. The data set unit 12 is configured as follows.

[0047] FIG. 7 is a schematic diagram for explaining the data set unit 12. The data set unit 12 has a plurality of data sets 121 (n sets, #1 to #n, in the illustrated example), each storing, for example, 50 data items per recognition target category. Through the operation of the data set selection unit 11, the data sets 121 of the data set unit 12 are selected as recognition dictionary learning data 122 when the recognition dictionary is trained and as likelihood calculation parameter estimation data 123 when the parameters for likelihood calculation are estimated, and are sent to the speech input unit 1.

[0048] Here, the data set selection unit 11 selects the data sets 121 so that the same data set is never used at the same time as both recognition dictionary learning data 122 and likelihood calculation parameter estimation data 123. A data set 121 selected as likelihood calculation parameter estimation data 123 is selected as recognition dictionary learning data 122 at the next round of recognition dictionary learning.
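The selection rule in paragraph [0048] can be sketched as a schedule of rounds. The sketch assumes that the learning data accumulates from round to round (each round learns on every data set used so far) and that the schedule stops when no unused data set is left for parameter estimation; the function name is hypothetical.

```python
from typing import List, Tuple


def selection_schedule(n_sets: int) -> List[Tuple[List[int], int]]:
    """Return (learning data set numbers, estimation data set number) per round.

    In round n the dictionary is trained on data sets #1..#n and the likelihood
    calculation parameters are estimated on #(n+1), so the two roles never share
    a data set, and each estimation set joins the learning data in the next round.
    """
    schedule = []
    for n in range(1, n_sets):
        learning = list(range(1, n + 1))
        estimation = n + 1
        schedule.append((learning, estimation))
    return schedule


rounds = selection_schedule(5)
```

For five data sets this yields four rounds, starting with learning on #1 and estimation on #2, and ending with learning on #1 to #4 and estimation on #5.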

[0049] Next, FIG. 8 shows the procedure by which the data set selection unit 11 selects data sets 121 in the data set unit 12. Here the data set unit 12 is assumed to have five (= N) data sets 121, each storing, for example, 50 data items per recognition target category.

[0050] First, n is set to 1 (step B1). With n = 1, data set #1 is selected as the recognition dictionary learning data 122 (step B2) and sent to the voice input unit 1. Learning of the recognition dictionary 4 is then executed using data set #1 in accordance with the flowchart of FIG. 4 described above (step B3). In this case, because data set #1 is selected, the number of data items for each recognition target word used in learning the recognition dictionary 4 is 50. It is assumed that the initial dictionary and the likelihood calculation parameters have been obtained in advance from a separate data set.

[0051] Next, j is set to n plus 1 (step B4). With j = 2, data set #2 is selected as the likelihood calculation parameter estimation data 123 (step B5) and sent to the voice input unit 1. The likelihood calculation parameter estimation process is then executed using data set #2 in accordance with the flowchart of FIG. 4 described above (step B6).

[0052] Next, 1 is added to n to give n = 2 (step B7), and in the next step B8 n is compared with the number N (= 5) of data sets 121. Here, n is smaller than N and the result is NO, so the process returns to step B2. This time, with n = 2, data sets #1 and #2 are selected as the recognition dictionary learning data 122 (step B2), and learning of the recognition dictionary 4 is executed using these data sets #1 and #2 (step B3). In this case, because data sets #1 and #2 are selected, the number of data items used in learning the recognition dictionary 4 increases from 50 to 100.

[0053] Next, j is set to n plus 1 (step B4). With j = 3, data set #3 is selected as the likelihood calculation parameter estimation data 123 (step B5), and the likelihood calculation parameter estimation process is executed using this data set #3 (step B6).

[0054] Then, 1 is added to n to give n = 3 (step B7), and in the next step B8 n is compared with the number N (= 5) of data sets 121.

[0055] Thereafter, as long as n is smaller than N and the result in step B8 is NO, the above processing is repeated; when n becomes larger than N (= 5) and the result is YES, the processing ends.

[0056] In this case, in the third recognition dictionary learning process, data sets #1 to #3 are selected as the recognition dictionary learning data 122, and learning of the recognition dictionary 4 is executed using these data sets #1 to #3. The number of data items used in learning the recognition dictionary 4 is then 150. Data set #4 is selected as the likelihood calculation parameter estimation data 123.

[0057] In the fourth recognition dictionary learning process, data sets #1 to #4 are selected as the recognition dictionary learning data 122, and learning of the recognition dictionary 4 is executed using these data sets #1 to #4. The number of data items used in learning the recognition dictionary 4 is then 200. Data set #5 is selected as the likelihood calculation parameter estimation data 123.
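
The selection procedure of FIG. 8 walked through in [0050] to [0057] can be sketched as a short Python generator. This is an illustrative sketch only, not the patent's implementation: the names `selection_schedule` and `ITEMS_PER_SET` are hypothetical, the learning and estimation steps themselves (B3 and B6) are left as comments, and the stop condition is an assumption. The text says processing ends when n becomes larger than N, while the walk-through ends after the fourth round with N = 5; the sketch stops once n reaches N, since no data set #(N+1) would remain for estimation.

```python
from typing import Iterator, List, Tuple

ITEMS_PER_SET = 50  # data items per category in each data set ([0049])


def selection_schedule(num_sets: int) -> Iterator[Tuple[List[int], int]]:
    """Yield (learning data set numbers, estimation data set number)."""
    n = 1                                   # step B1
    # Step B8 is folded into the loop condition (assumed: stop when
    # n reaches N, because data set #(n+1) must exist for estimation).
    while n < num_sets:
        learning = list(range(1, n + 1))    # step B2: data sets #1..#n
        # step B3: learning of the recognition dictionary would run here
        j = n + 1                           # step B4
        estimation = j                      # step B5: data set #j
        # step B6: likelihood parameter estimation would run here
        yield learning, estimation
        n += 1                              # step B7


for learning, estimation in selection_schedule(5):
    count = ITEMS_PER_SET * len(learning)
    print(f"learn with {learning} ({count} items per category), "
          f"estimate with #{estimation}")
```

With five data sets this produces four rounds, whose learning data sizes per category (50, 100, 150, 200) and estimation data sets (#2 to #5) agree with the figures stated in the walk-through.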

[0058] In this way, data sets 121 for recognition dictionary learning and for likelihood calculation parameter estimation are selected separately from among the plurality of data sets 121. The recognition dictionary 4 is learned using the feature vectors from the data sets 121 selected for recognition dictionary learning and the recognition results for those feature vectors, and the likelihood calculation parameters are estimated by matching the feature vectors from the data set 121 selected for likelihood calculation parameter estimation against the recognition dictionary after learning. Furthermore, for the next learning of the recognition dictionary, data sets 121 are selected so as to include the data set 121 previously used for likelihood calculation parameter estimation, and for likelihood calculation parameter estimation, a data set 121 not selected for recognition dictionary learning is chosen from the remaining data sets 121. As a result, the learning process of the recognition dictionary 4 can be carried out in fine detail, the recognition dictionary used for similarity calculation can be made more complete, the estimation of the likelihood calculation parameters for that recognition dictionary can likewise be made more complete, and a further improvement in recognition performance can be expected.

[0059] The present invention is not limited to the embodiments described above. For example, the embodiments above use voice data as the input data and describe recognition and learning processing for that voice data, but recognition and learning processing can also be performed on character data instead of voice data. In that case, the number of dimensions of the feature vectors used in the recognition and learning processing and the pattern matching method are not particularly limited either.

[0060] The key point of the present invention is that the data used for learning the recognition dictionary and the data for estimating the likelihood calculation parameters are held separately, and that recognition processing and learning processing of input data are performed using the recognition dictionary and the likelihood calculation parameters generated from the respective data. The invention can be implemented with various modifications without departing from its gist.

[0061]

[Effects of the Invention] According to the pattern recognition device of the present invention, learning of the recognition dictionary and parameter estimation for the likelihood calculation can be performed efficiently and with high reliability, and pattern recognition with excellent recognition performance can be realized.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the schematic configuration of an embodiment of the present invention.

FIG. 2 is a diagram showing an example of a voice feature vector in the embodiment of FIG. 1.

FIG. 3 is a diagram for explaining continuous voice pattern matching in the embodiment of FIG. 1.

FIG. 4 is a flowchart for explaining the operation of the embodiment of FIG. 1.

FIG. 5 is a diagram for explaining the situation in which the recognition dictionary learning data and the likelihood calculation parameter estimation data are the same in the embodiment of FIG. 1.

FIG. 6 is a block diagram showing the schematic configuration of another embodiment of the present invention.

FIG. 7 is a schematic diagram for explaining the data set unit used in the other embodiment of FIG. 6.

FIG. 8 is a flowchart for explaining the procedure by which the data set selection unit selects data sets in the data set unit in the other embodiment of FIG. 6.

[Explanation of Reference Numerals]

1...voice input unit, 2...voice analysis unit, 3...recognition processing unit, 31...voice feature vector extraction unit, 32...similarity calculation unit, 33...likelihood calculation unit, 4...recognition dictionary, 5...recognition result output unit, 6...recognition dictionary learning unit, 7...recognition dictionary learning data, 8...likelihood calculation parameter estimation data, 9...likelihood calculation parameter estimation unit, 10...likelihood calculation parameters, 11...data set selection unit, 12...data set unit, 121...data set.

Claims

[Claims]

[Claim 1] A pattern recognition device that extracts a feature vector of a certain dimension from feature parameters obtained by analyzing input data, matches this feature vector against a recognition dictionary to obtain a similarity, calculates a likelihood corresponding to the similarity, and uses the likelihood to obtain a recognition result for the input data, characterized in that learning of the recognition dictionary is executed using feature vectors extracted from recognition dictionary learning data and the recognition results for that data, and the likelihood calculation parameters are estimated from the similarity obtained by matching feature vectors extracted from likelihood calculation parameter estimation data against the recognition dictionary after learning, and are used for the next recognition process.

[Claim 2] A pattern recognition device that extracts a feature vector of a certain dimension from feature parameters obtained by analyzing input data, matches this feature vector against a recognition dictionary to obtain a similarity, calculates a likelihood corresponding to the similarity, and uses the likelihood to obtain a recognition result for the input data, characterized by comprising data set selection means for selecting, from a plurality of data sets prepared in advance, data sets for the recognition dictionary learning process and for likelihood calculation parameter estimation, wherein learning of the recognition dictionary is executed using feature vectors from the data set selected for recognition dictionary learning by the data set selection means and the recognition results for that data, the likelihood calculation parameters are estimated by matching feature vectors from the data set selected for likelihood calculation parameter estimation against the recognition dictionary after learning, and, for the next learning of the recognition dictionary, data sets are selected so as to include the data set currently selected for likelihood calculation parameter estimation, while a data set not used for recognition dictionary learning is selected for likelihood calculation parameter estimation.

[Claim 3] The pattern recognition device according to claim 1 or 2, wherein learning of the recognition dictionary and estimation of the likelihood calculation parameters are executed repeatedly.