JPS62114082A

JPS62114082A - Pattern recognition learning system

Info

Publication number: JPS62114082A
Application number: JP60254093A
Authority: JP
Inventors: Hiroshi Matsuura (松浦 博); Yoichi Takebayashi (竹林 洋一)
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1985-11-13
Filing date: 1985-11-13
Publication date: 1987-05-25
Anticipated expiration: 2011-06-12
Also published as: JP2507308B2

Abstract

PURPOSE: To perform recognition processing efficiently by controlling the number of axes of a recognition dictionary according to the number of dimensions of the feature extracted from the input pattern submitted for recognition. CONSTITUTION: Input speech entered through a speech input part 1 is supplied to a feature extracting part 2, which extracts the feature vector of the input speech pattern. The feature vector obtained by the feature extracting part 2 is supplied to a recognizing part 3 during recognition of the input speech and to a learning part 4 during learning. The learning part 4 generates a recognition dictionary (standard pattern) from the learning patterns and stores it in a standard pattern dictionary memory 6. A control part 5 controls the learning method of the recognition dictionary in the learning part 4 according to the number of input speech patterns used to generate the recognition dictionary, and also controls the number of axes of the recognition dictionary used by the recognizing part 3 to recognize the input speech pattern according to the number of dimensions of the feature vector of the input speech.

Description

[Detailed description of the invention]

[Technical field of the invention]

The present invention relates to a highly practical pattern recognition learning system capable of, for example, recognizing input speech with high accuracy.

[Technical background of the invention and its problems]

In recent years, pattern recognition technology for speech has advanced and is now widely applied to product management in factories, various telephone services, and the like. Application to speech word processors is also progressing, and demands for a wider range of recognition targets and for higher recognition performance are growing ever stronger.

In a speaker-dependent recognition device, in which the speakers to be recognized are restricted, creating standard patterns (a speech recognition dictionary) from speech uttered by the specific speaker makes it relatively easy to achieve large-vocabulary recognition and to improve recognition performance for continuously uttered speech.

In other words, because the speaker to be recognized is known, the speech recognition dictionary can easily be tuned to that speaker and the recognition performance raised. Specifically, when input speech is recognized by, for example, DP matching or the simple similarity method, the performance of the recognition dictionary can be improved by taking the average of the speech patterns input for dictionary learning as the standard pattern. In the extreme case, the standard pattern can be created from a single learning pattern, and input speech can be recognized using this standard pattern.

However, when a speech pattern that deviates slightly from the learning patterns is input because of pattern deformation, the similarity value to the standard pattern, for example, decreases, so that no great improvement in the recognition rate can be expected. Increasing the number of learning patterns does improve the standard pattern obtained as their average, and thereby the recognition rate.

Even so, the problem of the deviation of the input speech pattern from this average pattern remains. Consequently, however much the number of learning patterns is increased, the recognition rate cannot be expected to improve beyond a certain level.

On the other hand, recognition methods that use a covariance matrix, such as the composite similarity method and the quadratic discriminant function, have been proposed as methods for effectively recognizing input speech patterns that deviate from the average pattern. With these methods, the distribution of deviations from the average pattern is reflected in the covariance matrix, so that increasing the number of learning patterns yields a far higher recognition rate than the simple similarity method and the like described above.

On the other hand, obtaining a covariance matrix that correctly reflects the distribution of deviations of the patterns to be recognized, and thus creating a high-performance recognition dictionary, has required collecting and learning an enormous number of learning patterns.

[Purpose of the invention]

The present invention has been made in view of these circumstances. Its object is to provide a highly versatile pattern recognition learning system that can recognize an input pattern with high accuracy according to the extent to which learning patterns have been collected, or according to the nature of the feature vector extracted from the input pattern.

[Summary of the invention]

In a pattern recognition device that recognizes an input pattern by collating the feature vector of the input pattern, obtained by analyzing the input pattern, with a recognition dictionary registered in advance, the present invention controls the number of axes of the recognition dictionary used for collation with the feature vector according to the number of learning patterns used to create the recognition dictionary, or according to the number of dimensions of the feature vector extracted from the input pattern submitted for recognition, so that recognition is performed efficiently while recognition performance is kept sufficiently high.

[Effect of the invention]

Thus, according to the present invention, the number of axes of the recognition dictionary used for collation with the feature vector is controlled according to the number of learning patterns used to create the recognition dictionary, or according to the number of dimensions of the feature vector extracted from the input pattern.

Accordingly, the input pattern can be recognized with high accuracy by a recognition procedure suited, for example, to the degree of learning of the recognition dictionary or to the nature of the input pattern (the number of dimensions of its feature vector). Even without great effort being spent on learning the recognition dictionary, recognition processing suited to the degree of learning improves both the recognition rate and the efficiency of recognition processing. Moreover, because the number of axes of the recognition dictionary used for collation is set according to the number of dimensions of the feature vector extracted from the input pattern, the versatility of the recognition device can be increased. These are considerable practical benefits.

[Embodiments of the invention]

One embodiment of the system of the present invention is described below with reference to the drawings.

Although input speech uttered word by word is described here as the pattern to be recognized, the invention is equally applicable to pattern recognition devices that recognize characters, figures, and the like. The unit of recognition may likewise be chosen according to the specification: for example, phonemes, syllables, or continuously uttered speech.

FIG. 1 is a schematic configuration diagram of a speech recognition device to which the system of the embodiment is applied.

Input speech entered through a speech input section 1 consisting of a microphone or the like is supplied to a feature extraction section 2, which extracts the feature vector of the input speech pattern. The feature extraction section 2 passes the input speech through a low-pass filter with a cutoff frequency of, for example, 5.6 kHz, and then quantizes it into a 12-bit digital signal at a sampling frequency of 12 kHz. The digitized input speech data is analyzed by, for example, an 8-channel band-pass filter bank, then smoothed and logarithmically transformed, and output as a time series of feature parameters of the input speech every 10 msec, for example.

Instead of this digital filter analysis, linear predictive analysis (LPC analysis) or the like may be performed to obtain the time series of feature parameters.

From the time series of feature parameters obtained by this analysis, the feature vector of the input speech pattern is obtained by, for example, detecting the speech interval and cutting out the feature parameters on the basis of the detection result. Specifically, the feature parameters (8 points along the frequency axis) at each of the division points obtained by dividing the speech interval into seven equal parts (8 points along the time axis) are extracted from the time series and output as the feature vector of the input speech (for example, word speech) pattern.

In this case, therefore, the feature vector of the input speech pattern is extracted as a 64-dimensional vector (8 time points × 8 frequency channels).
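The sampling of the speech interval described above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function name `extract_feature_vector`, the rounding to the nearest frame, and the synthetic frame data are all assumptions, since the text only states that 8 division points of 8 channel values each are concatenated.

```python
from typing import List


def extract_feature_vector(frames: List[List[float]], n_points: int = 8) -> List[float]:
    """Concatenate the channel values at n_points equally spaced frames.

    frames holds one list of channel values per analysis frame, covering
    only the detected speech interval.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames in the speech interval")
    last = len(frames) - 1
    vector: List[float] = []
    for k in range(n_points):
        # Division points of the interval split into (n_points - 1) equal parts.
        # Rounding to the nearest frame is an assumption; the source does not
        # say how fractional positions are handled.
        idx = round(k * last / (n_points - 1))
        vector.extend(frames[idx])
    return vector


# Hypothetical speech interval: 50 frames (one per 10 msec), 8 channels each.
frames = [[float(t + c) for c in range(8)] for t in range(50)]
feature = extract_feature_vector(frames)
print(len(feature))  # 8 time points x 8 channels
```

Because the interval is always reduced to the same number of points, utterances of different lengths yield feature vectors of the same dimension.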

The feature vector of the input speech pattern obtained by the feature extraction section 2 is supplied to a recognition section 3 during recognition of the input speech and to a learning section 4 during learning. In a word recognition device for a specific speaker, speech patterns for learning are uttered and input before the speech to be recognized is uttered. Under the control of a control section 5, the learning section 4 then creates a recognition dictionary (standard patterns) from these learning patterns. The recognition dictionary created by the learning section 4 is stored in a standard pattern dictionary memory 6 and used by the recognition section 3 to recognize input speech patterns.

Reference numeral 7 in the figure denotes a display section that displays, among other things, the result of recognition of the input speech pattern by the recognition section 3.

The control section 5 controls the learning method of the recognition dictionary in the learning section 4 according to the number of input speech patterns used by the learning section 4 to create the recognition dictionary. The control section 5 also controls the number of axes of the recognition dictionary used by the recognition section 3 to recognize the input speech pattern, using at least one of the number of dimensions of the feature vector of the input speech extracted by the feature extraction section 2 and the number of feature vectors used by the learning section 4 to create the recognition dictionary.

In creating the recognition dictionary (standard patterns), the learning section 4 proceeds as follows. When, for example, only one learning speech pattern has been input for a given recognition category, that pattern is used as the recognition dictionary φ1 of the first axis. When the second and third learning patterns are input, they are orthogonalized against the first-axis recognition dictionary φ1 by Schmidt orthogonalization to give the recognition dictionary φ2 of the second axis and the recognition dictionary φ3 of the third axis. In the same way, each time a learning pattern is input, the recognition dictionary φm of the m-th axis is created in turn by Schmidt orthogonalization.

The recognition dictionary is created by Schmidt orthogonalization in the following way: when, for example, the m-th new learning pattern fm is input, the standard pattern φm of the m-th axis is calculated by orthogonalizing fm against the axes already obtained.
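The equation for this step does not survive in the present text, so the following sketch assumes the textbook Gram-Schmidt step: subtract from the new pattern its projections onto the existing axes, then normalize the residual to unit length. The helper names `dot` and `add_axis`, the tolerance `eps`, and the three-dimensional sample patterns are hypothetical. The loop uses the numerically more stable "modified" variant, which projects the running residual rather than the original pattern.

```python
import math
from typing import List, Optional

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def add_axis(axes: List[Vector], f: Vector, eps: float = 1e-10) -> Optional[Vector]:
    """Orthogonalize learning pattern f against the existing orthonormal axes.

    Appends and returns the new unit-length axis, or returns None when f lies
    (numerically) in the span of the existing axes and adds nothing new.
    """
    residual = list(f)
    for phi in axes:
        c = dot(residual, phi)  # component of the residual along this axis
        residual = [r - c * p for r, p in zip(residual, phi)]
    norm = math.sqrt(dot(residual, residual))
    if norm < eps:
        return None
    new_axis = [r / norm for r in residual]
    axes.append(new_axis)
    return new_axis


# Three hypothetical learning patterns for one category, fed in one at a time.
axes: List[Vector] = []
for pattern in ([1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]):
    add_axis(axes, pattern)

# A pattern that is a multiple of an earlier one contributes no further axis.
redundant = add_axis(axes, [2.0, 2.0, 0.0])
```

Each call adds at most one axis, which matches the incremental behaviour described in the text: one learning pattern gives φ1, the second gives φ2, and so on.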

The standard patterns of the first through m-th axes, obtained in this way according to the number of input speech patterns submitted for learning, are stored in the standard pattern dictionary memory 6 as the recognition dictionary.

In a word speech recognition device for unspecified speakers, the input speech patterns vary more than in the specific-speaker case, so the feature vector is extracted as, for example, a 256-dimensional vector consisting of 16 points along the time axis by 16 points along the frequency axis.

When the number of learning patterns for creating the recognition dictionary is 1 to 10, the covariance matrix of these learning patterns is calculated and subjected to KL expansion to obtain its eigenvalues and eigenvectors, which give, for example, the standard patterns (recognition dictionary) φ1, φ2, ..., φ4 of the first through fourth axes. When the learning patterns increase further and their number reaches 11 to 30, the covariance matrix of all the learning patterns, including those input earlier, is obtained in the same way and subjected to KL expansion, this time yielding new standard patterns (recognition dictionary) φ1, φ2, ..., φ6 of the first through sixth axes, which replace the previously obtained recognition dictionary.

Alternatively, the characteristic kernel of the previously obtained recognition dictionary may be updated using the characteristic kernel of the newly input learning patterns, and the result subjected to KL expansion to update the recognition dictionary.

Similarly, when the number of learning patterns increases beyond 30, the recognition dictionary (standard patterns) φ1, φ2, ..., φ10 of the first through tenth axes is obtained by KL expansion of the covariance matrix.

In this way, a recognition dictionary with up to a predetermined number of axes is created according to the number of learning patterns used, and is stored in the standard pattern dictionary memory 6.
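The KL expansion used in the preceding paragraphs amounts to finding the leading eigenvalues and eigenvectors of a matrix built from the learning patterns. The sketch below is one way to do this with the standard library only, using power iteration with deflation. Several points are assumptions rather than statements of the text: the matrix is formed as the average outer product without mean subtraction, the function names are hypothetical, and a production system would normally use a numerical library's symmetric eigensolver instead.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def correlation_matrix(patterns: List[Vector]) -> Matrix:
    """Average outer product of the learning patterns (no mean subtraction; an assumption)."""
    n = len(patterns[0])
    k = [[0.0] * n for _ in range(n)]
    for f in patterns:
        for i in range(n):
            for j in range(n):
                k[i][j] += f[i] * f[j] / len(patterns)
    return k


def mat_vec(a: Matrix, v: Vector) -> Vector:
    return [sum(row[j] * v[j] for j in range(len(v))) for row in a]


def top_eigenpairs(k: Matrix, n_axes: int, iters: int = 500) -> List[Tuple[float, Vector]]:
    """Largest n_axes (eigenvalue, eigenvector) pairs by power iteration with deflation."""
    n = len(k)
    a = [row[:] for row in k]
    pairs: List[Tuple[float, Vector]] = []
    for _axis in range(n_axes):
        # Deterministic, non-uniform start vector (a hypothetical choice).
        v = [1.0 / (i + 1) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _step in range(iters):
            w = mat_vec(a, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, mat_vec(a, v)))
        pairs.append((lam, v))
        # Deflation: remove the component just found before the next axis.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


# Three hypothetical learning patterns in a 3-dimensional feature space.
patterns = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [2.0, 0.0, 0.0]]
pairs = top_eigenpairs(correlation_matrix(patterns), n_axes=2)
```

The eigenvectors, ordered by decreasing eigenvalue, play the role of the standard patterns φ1, φ2, ..., and the `n_axes` argument corresponds to the number of axes retained for the dictionary.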

The recognition section 3 then collates the input speech pattern f to be recognized with the standard patterns φi (i = 1, 2, ..., m) stored in the standard pattern dictionary memory 6 by calculating the following composite similarity S:

$$S = \sum_{i=1}^{m} \frac{\lambda_i}{\lambda_1}\,\frac{(\varphi_i, f)^2}{\|\varphi_i\|^2\,\|f\|^2} = \sum_{i=1}^{m} \frac{\lambda_i}{\lambda_1}\,\frac{(\varphi_i, f)^2}{\|f\|^2}$$

where ‖φi‖ is normalized to 1 and λi is a coefficient.
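The composite similarity can be computed directly from its definition. The sketch below assumes unit-length axes and takes the coefficients λi as a list ordered from the first axis; the function name `composite_similarity`, the optional `n_axes` argument, and the sample values are hypothetical.

```python
from typing import List, Optional

Vector = List[float]


def composite_similarity(
    f: Vector,
    axes: List[Vector],
    coefficients: List[float],
    n_axes: Optional[int] = None,
) -> float:
    """Sum over the first n_axes of (lambda_i / lambda_1) * (phi_i, f)^2 / ||f||^2.

    axes are assumed to be unit-length; coefficients[0] is lambda_1.
    """
    m = len(axes) if n_axes is None else n_axes
    f_norm_sq = sum(x * x for x in f)
    if f_norm_sq == 0.0:
        raise ValueError("input pattern must not be the zero vector")
    total = 0.0
    for lam, phi in list(zip(coefficients, axes))[:m]:
        projection = sum(p * x for p, x in zip(phi, f))
        total += (lam / coefficients[0]) * projection * projection
    return total / f_norm_sq


# Hypothetical two-axis dictionary for one category and one input pattern.
axes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
coefficients = [2.0, 1.0]
f = [3.0, 4.0, 0.0]

s_two_axes = composite_similarity(f, axes, coefficients)           # uses both axes
s_one_axis = composite_similarity(f, axes, coefficients, n_axes=1)  # first axis only
```

With a single axis the expression reduces to the squared cosine between f and φ1, which is the simple similarity; further axes add weighted contributions from the directions in which the learning patterns vary.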

The number of axes M of the recognition dictionary (standard patterns) used in this recognition processing is controlled by the control section 5 according to the number of learning patterns used to create the recognition dictionary or the number of dimensions of the feature vector extracted by the feature extraction section 2.

For example, when the number of learning patterns used to create the recognition dictionary is 1 to 10, only the standard patterns up to the fourth axis have been obtained, so the control section 5 instructs the recognition section 3 to calculate the similarity between the feature vector of the input speech and the standard patterns of the first through fourth axes. When the number of learning patterns used is 11 to 30, it instructs the recognition section 3 to calculate the composite similarity using the recognition dictionary up to the sixth axis, and when the number is 31 or more, using the recognition dictionary up to the tenth axis.
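The rule described in this paragraph is a simple threshold lookup. A minimal sketch, using the thresholds given in the text (1 to 10 patterns: 4 axes; 11 to 30: 6 axes; 31 or more: 10 axes); the function name `axes_for_recognition` is hypothetical:

```python
def axes_for_recognition(num_learning_patterns: int) -> int:
    """Number of dictionary axes the control section selects for recognition."""
    if num_learning_patterns < 1:
        raise ValueError("at least one learning pattern is required")
    if num_learning_patterns <= 10:
        return 4
    if num_learning_patterns <= 30:
        return 6
    return 10


for count in (1, 10, 11, 30, 31, 200):
    print(count, axes_for_recognition(count))
```

Keeping the rule in one function makes it easy to substitute different thresholds when the dictionary is learned by another method.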

In this way, the control section 5 controls the number of axes of the recognition dictionary used in the composite similarity calculation with the feature vector of the input speech according to the number of learning patterns used to learn (create) the recognition dictionary, thereby eliminating wasted recognition computation.

The number of axes of the recognition dictionary used for recognition may be controlled individually for each recognition category, or may be controlled uniformly to match the smallest number of axes among the recognition dictionaries obtained for the categories.

FIG. 5 shows an example, for word categories, of the number of learning patterns learned to create each recognition dictionary and the number of axes of the recognition dictionary created by that learning. In this case, the number of axes of the recognition dictionary differs from category to category.

Specifically, for the category "Akita" the recognition dictionary has been obtained only up to the fourth axis, whereas for the categories "Tokyo" and "Osaka" the recognition dictionaries have been obtained up to the tenth axis.

The number of axes of the recognition dictionary used for recognition could be varied for each category, but this risks making the control needlessly complicated. Instead, the recognition dictionary with the smallest number of axes may be found, and the composite similarity for every recognition category calculated with that number of axes. This unifies the conditions under which similarity values are evaluated across categories, so that highly accurate recognition is possible.

In this example, that is, recognition is performed using only the recognition dictionaries up to the fourth axis.
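Unifying the axis count across categories comes down to taking the minimum. A short sketch using the category values from the example above; the function name `common_axis_count` and the dictionary layout are hypothetical:

```python
from typing import Dict


def common_axis_count(axes_per_category: Dict[str, int]) -> int:
    """Smallest number of axes available across all recognition categories."""
    if not axes_per_category:
        raise ValueError("no categories registered")
    return min(axes_per_category.values())


# Axis counts from the example in the text.
axes_per_category = {"Akita": 4, "Tokyo": 10, "Osaka": 10}
shared = common_axis_count(axes_per_category)
print(shared)
```

Every category's similarity is then computed with the same number of terms, so the resulting values are compared on equal footing.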

When speech recognition is performed, the feature vector of the vowel component and the feature vector of the consonant component of the input speech pattern are in some cases extracted separately and recognized separately.

For example, as shown in FIG. 2, a segmentation processing section 8 cuts out each monosyllable of the input speech from the time series (for example, 16 channels) of feature parameters extracted by the feature extraction section 2. For the vowel component, one frame of the 16-channel analysis output in the vowel portion is cut out as the feature pattern for vowel recognition. For the consonant component, the 16-channel analysis output is combined in pairs of adjacent channels into 8-channel feature parameters, and eight frames including the transition from the consonant to the vowel are cut out as the consonant pattern.

In this case, the feature vector of the vowel pattern is 16-dimensional, and the feature vector of the consonant pattern is 64-dimensional.
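The two cut-outs and their dimensions can be illustrated as follows. This is a sketch under assumptions: the text says adjacent channels are combined in pairs but not how, so averaging is assumed here; the function names, frame indices, and synthetic data are hypothetical.

```python
from typing import List

Frame = List[float]


def vowel_pattern(frames: List[Frame], vowel_index: int) -> List[float]:
    """One 16-channel frame taken from the vowel portion."""
    return list(frames[vowel_index])


def merge_adjacent_channels(frame: Frame) -> List[float]:
    """Combine channels in adjacent pairs (averaging is an assumption)."""
    return [(frame[i] + frame[i + 1]) / 2.0 for i in range(0, len(frame), 2)]


def consonant_pattern(frames: List[Frame], start: int, n_frames: int = 8) -> List[float]:
    """n_frames consecutive frames, each reduced from 16 to 8 channels."""
    segment = frames[start:start + n_frames]
    if len(segment) != n_frames:
        raise ValueError("not enough frames for the consonant pattern")
    pattern: List[float] = []
    for frame in segment:
        pattern.extend(merge_adjacent_channels(frame))
    return pattern


# Hypothetical monosyllable: 20 frames of 16-channel analysis output.
frames = [[float(c) for c in range(16)] for _ in range(20)]
vowel = vowel_pattern(frames, vowel_index=15)
consonant = consonant_pattern(frames, start=2)
print(len(vowel), len(consonant))
```

The two patterns therefore differ in dimension (16 and 64), which is why the dictionaries for vowels and consonants are handled separately.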

The control section 5 controls the creation of the recognition dictionaries for these vowel patterns and consonant patterns by the learning section 4 independently of each other. For example, when the recognition dictionary for vowel patterns is created, the number of axes created is controlled according to the number of learning patterns as shown in FIG. 3, and when the recognition dictionary for consonant patterns is created, the number of axes created is controlled according to the number of learning patterns as shown in FIG. 4. These recognition dictionaries may be created by Schmidt orthogonalization as described above, but it is preferable to subject the covariance matrix of the learning patterns to KL expansion, obtain its eigenvalues and eigenvectors, and use these as the recognition dictionary.

When the recognition dictionary is updated, a covariance matrix for unspecified speakers may, for example, be updated with the learning patterns of a specific speaker and the updated covariance matrix subjected to KL expansion, so that the recognition dictionary is progressively adapted to that speaker.
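The text does not give the update rule for the covariance matrix. One common choice, assumed here purely for illustration, is to blend the existing matrix with the outer product of each new learning pattern; the function name `update_matrix` and the parameter `weight` are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def update_matrix(k: Matrix, f: Vector, weight: float = 0.1) -> Matrix:
    """Blend matrix k with the outer product of a new learning pattern f.

    weight controls how strongly the new pattern pulls the matrix toward
    the specific speaker (an assumed scheme, not taken from the source).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie between 0 and 1")
    n = len(f)
    return [
        [(1.0 - weight) * k[i][j] + weight * f[i] * f[j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical speaker-independent matrix and one pattern from the specific speaker.
k_independent = [[1.0, 0.0], [0.0, 1.0]]
k_adapted = update_matrix(k_independent, [1.0, 0.0], weight=0.5)
```

The adapted matrix would then be subjected to KL expansion again to obtain the updated dictionary axes.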

When input speech is recognized, the number of dimensions of the feature vector differs according to whether the feature pattern is a vowel feature vector or a consonant feature vector. The number of axes of the recognition dictionary used in the composite similarity calculation is therefore controlled according to the number of dimensions of the feature vector submitted to the recognition section 3.

As a result, recognition using a recognition dictionary with the optimum number of axes for the feature vector of the input pattern is performed both efficiently and accurately.

Even when the number of axes of the recognition dictionary is selected according to the number of dimensions of the feature vector, the number of axes used for recognition may also be controlled according to the number of learning patterns used to create the recognition dictionary. That is, the number of axes of the recognition dictionary used for recognition may be controlled on the basis of both the number of dimensions of the feature vector and the number of learning patterns used to create the recognition dictionary.

As described above, according to this device the number of axes of the recognition dictionary used for collation with the feature vector of the input pattern is controlled according to the number of learning patterns used to create the recognition dictionary or the number of dimensions of the feature vector of the input pattern, and recognition is performed with the optimum number of axes. Effective recognition using a recognition dictionary with an appropriate number of axes is therefore possible without burdening the user of the recognition device. In addition, whether few or many learning patterns were used to create the recognition dictionary, recognition appropriate to the amount of learning is performed.

Accordingly, pattern recognition can be performed simply and effectively without great effort being spent on collecting learning data, and in accordance with the number of dimensions of the feature vector of the input pattern to be recognized. These are considerable practical benefits.

The present invention is not limited to the embodiment described above. For example, the criteria for selecting the number of axes of the recognition dictionary may be set according to the learning method used for the dictionary. Also, although the standard patterns here were created using only the user's learning patterns, a standard pattern for unspecified speakers may be prepared in advance and used as the first-axis standard pattern, with the standard patterns of the second and subsequent axes created in turn as learning patterns are input. Standard patterns that have already been created may also be updated so that they are progressively adapted to a specific speaker.

The learning of the recognition dictionary may also be developed further to build up a high-performance recognition dictionary for unspecified speakers. Furthermore, although the embodiment has been described for speech recognition, the invention can equally be applied to the recognition of characters and figures. In short, the present invention can be implemented with various modifications without departing from its gist.

[Brief explanation of drawings]

FIG. 1 is a schematic configuration diagram of a speech recognition device to which a system according to one embodiment of the present invention is applied; FIG. 2 is a diagram showing the schematic configuration of a device according to another embodiment; and FIGS. 3 and 4 are diagrams each showing an example of the number of learning patterns used to create a recognition dictionary and the number of axes of that recognition dictionary.

1 ... speech input section, 2 ... feature extraction section, 3 ... recognition section, 4 ... learning section, 5 ... selection section, 6 ... standard pattern dictionary memory, 7 ... display section, 8 ... segmentation processing section.

Claims

[Claims]

(1) A pattern recognition learning system for a pattern recognition device that analyzes an input pattern to extract a feature vector of a predetermined number of dimensions and collates this feature vector with a recognition dictionary registered in advance to recognize the input pattern, characterized in that the number of axes of the recognition dictionary used for collation with the feature vector is controlled according to the number of dimensions of the feature vector or the number of learning patterns used to create the recognition dictionary.

(2) The pattern recognition learning system according to claim 1, wherein recognition of the input pattern is performed by the subspace method or the composite similarity method, and the recognition dictionary is created by averaging the learning patterns, by KL expansion of the characteristic kernel, or by Schmidt orthogonalization.