JPH0934862A

JPH0934862A - Pattern learning method and device

Info

Publication number: JPH0934862A
Application number: JP7182430A
Authority: JP
Inventors: Kazuki Nakajima; 和樹中島; Hiroshi Shinjo; 広新庄; Katsumi Marukawa; 勝美丸川; Yoshihiro Shima; 好博嶋; Kazumi Suzuki; 和美鈴木
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1995-07-19
Filing date: 1995-07-19
Publication date: 1997-02-07

Abstract

PROBLEM TO BE SOLVED: To learn a large number of patterns efficiently by using recognition results to remove unnecessary dictionaries or to assign them penalties. SOLUTION: The device comprises an input control section 101, a display control section 102, a clustering pre-processing control section 105, a clustering control section 106, a clustering post-processing control section 107, a pattern recognition control section 108, a dictionary (clustering result) 111, and learning patterns 112. Learning patterns with very high similarity are grouped before learning so that the number of learning patterns is reduced in advance. The pattern space is divided into radial regions centered on the representative point of a cluster, and in each region only the pattern farthest from the representative point is stored as a pattern in the cluster. A restriction is imposed so that highly similar clusters are not merged consecutively. Finally, unnecessary dictionaries are removed, or penalties are set, using the recognition results.

Description

Detailed Description of the Invention

[0001]

[Industrial field of application] The present invention relates, in the field of pattern recognition, to a method and apparatus that learn patterns by clustering (classifying) them and thereby create a dictionary (the learned clusters) for pattern recognition, and in particular to a learning method and apparatus that enable efficient learning of a large number of patterns.

[0002]

[Prior art] A conventional method of learning patterns regards each pattern, or each set of patterns, as one cluster (group), quantifies the similarity between clusters according to some measure, and classifies the patterns by repeatedly merging into one cluster the clusters with the maximum similarity or, more generally, the clusters whose similarity is at or above a given value. This is the "hierarchical clustering" described on page 85 of "Recognition Engineering: Pattern Recognition and Its Applications" (Junichiro Toriwaki, Corona Publishing, 1993).

[0003]

[Problems to be solved by the invention] When the conventional "hierarchical clustering" described above is applied to learning a huge number of patterns, all patterns are used for learning at once, so the number of iterations becomes very large and an enormous processing time is required. Reducing the number of learning patterns efficiently is therefore an issue.

[0004] When three or more highly similar clusters exist, the order in which they should be merged has not been concretely specified.

[0005] When the merging of highly similar clusters is repeated, a drawback generally called the chain effect appears. When the number of learning patterns is huge, the patterns are often densely packed and the chain effect becomes pronounced, which is a problem.

[0006] When a large number of patterns are learned by clustering, many clusters (dictionaries) consisting of only a few patterns are generated, and among them are many dictionaries that do not contribute to recognition accuracy and are, so to speak, unnecessary. Creating an efficient dictionary, for example by excluding these, is an issue.

[0007]

[Means for solving the problems] The method or apparatus of the present invention comprises: a process or means for reducing the number of learning patterns in advance, for example by grouping learning patterns of extremely high similarity before learning; a process or means for dividing the pattern space into radial regions centered on the representative point of a cluster and storing, in each region, only the pattern farthest from the representative point as a pattern in the cluster; a process or means for imposing a restriction that makes the merging of highly similar clusters non-continuous; and a process or means for using recognition results to remove unnecessary dictionaries or to set penalties.

[0008]

[Operation] Reducing the number of learning patterns shortens the learning time. Restricting cluster merging disperses the clusters, particularly in the early stage of learning, so the chain effect is less likely to occur and a huge number of patterns can be learned. A concrete merging procedure is also given. In addition, using recognition results makes it easy to identify unnecessary dictionaries.

[0009]

[Embodiment] Clustering is described first. When a finite number of pattern samples are given in a pattern space, a location in that space where the patterns are densely distributed is called a cluster (pattern set) (see FIG. 2). FIG. 2 shows the case of two feature axes. The operation of finding such clusters is called clustering. A typical clustering technique repeatedly merges similar patterns (generally called the nearest neighbor method).

[0010] Measures of the similarity between patterns include distances and similarity scores. Distances include the city-block distance, the Euclidean distance, the Minkowski distance and the Mahalanobis distance; similarity scores include the simple similarity, the directional similarity, the compound similarity and the weighted similarity. With a similarity score, a high value means the patterns are alike and a low value means they are not. With a distance it is the reverse: a short distance means the patterns are alike and a long distance means they are not.
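As a minimal sketch of the two kinds of measure, the following computes a Minkowski distance (order 1 gives the city-block distance, order 2 the Euclidean distance) and a normalized inner product. The function names are hypothetical, and the cosine-style score is only an assumed stand-in for the similarity scores named in the text, whose exact definitions the source does not give.

```python
import math

def minkowski(a, b, p):
    """Minkowski distance of order p (p=1: city-block, p=2: Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def cosine_similarity(a, b):
    """Normalized inner product; an assumed stand-in for a similarity score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Small distance and large similarity both indicate "alike".
d_city = minkowski((0, 0), (3, 4), 1)
d_euclid = minkowski((0, 0), (3, 4), 2)
s_same = cosine_similarity((1, 0), (2, 0))
```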

[0011] There are two representative approaches to merging patterns or clusters: (1) approaches that do not preserve the positions of the merged patterns in the pattern space, and (2) approaches that do preserve the positions of the merged patterns in the pattern space.

[0012] In approach (1), the feature values are accumulated along each feature axis of the pattern space and then averaged, reduced to a centroid, or normalized by some threshold, to create a pattern that represents the cluster. Subsequent merging compares similarity against this representative pattern. Once the representative pattern has been created, the positions of the merged patterns in the pattern space are lost.
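Approach (1) can be sketched as a cluster that keeps only a per-axis running sum and a count, so the mean is available as the representative pattern while the individual positions are discarded. The class and method names are hypothetical, and the mean is only one of the options the text lists.

```python
class Cluster:
    """Cluster that stores accumulated feature values, not member positions."""

    def __init__(self, pattern):
        self.total = list(pattern)  # per-axis accumulated feature values
        self.count = 1              # number of patterns merged so far

    def merge(self, other):
        """Absorb another cluster; its individual positions are not kept."""
        self.total = [a + b for a, b in zip(self.total, other.total)]
        self.count += other.count

    def representative(self):
        """Mean pattern used for all later similarity comparisons."""
        return [s / self.count for s in self.total]

c = Cluster([0.0, 0.0])
c.merge(Cluster([2.0, 4.0]))
rep = c.representative()
```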

[0013] In approach (2), by contrast, no representative pattern is created; similarity is compared against the individual patterns in the cluster and merging is repeated. Because approach (2) must preserve the positions of the merged patterns in the pattern space, it is unsuitable for learning a large number of patterns.

[0014] In the following, "merging" refers to approach (1). With advances in computers, however, adopting approach (2) is not impossible.

[0015] The conventional method, "hierarchical clustering", is now described. Let the set of n patterns to be clustered be P = {p0, p1, ..., pn-1}. The matrix whose element in row i and column j is the similarity or the distance between patterns pi and pj is called the similarity matrix or the distance matrix, respectively. This matrix may be built in advance on a computer as a table (list), but when a large number of patterns are to be learned the table becomes huge, so its elements may instead be computed each time they are needed. In the following, the word "distance" covers the similarity scores described above as well as these distances, and is used as a general term for a measure of the similarity between patterns.

[0016] FIG. 5 illustrates the conventional "hierarchical clustering". Initially, each learning pattern is its own cluster (process 601). The threshold T in process 602 is the distance threshold that decides whether clusters are merged. In process 604, djk is the distance between cluster Cj and cluster Ck. If djk is at or below the threshold T in process 604, cluster Ck is merged into cluster Cj (process 605). In process 606, k is incremented (increased by 1), and the processing is repeated as long as k does not exceed the number of clusters (process 607). Next, in process 608, k is reset to zero and j is incremented, and these processes are repeated as long as j does not exceed the number of clusters (process 609). The threshold T is then increased by a preset increment (process 610), and the processing is repeated as long as T does not exceed its final value (process 611). Through this repetition, small clusters gradually grow into larger ones and learning is carried out.
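The loop structure of FIG. 5 can be sketched as follows. This is an illustrative reading, not the patent's implementation: it assumes Euclidean distance between mean representative patterns (approach (1)), and the function and parameter names are hypothetical. The comments map each step to the process numbers in the text.

```python
import math

def centroid(cluster):
    total, count = cluster
    return [s / count for s in total]

def hierarchical_clustering(patterns, t_start, t_step, t_final):
    clusters = [(list(p), 1) for p in patterns]      # 601: one cluster per pattern
    t = t_start                                      # 602: initial threshold T
    while t <= t_final:                              # 611: until T passes its final value
        j = 0
        while j < len(clusters):                     # 609: loop over Cj
            k = 0
            while k < len(clusters):                 # 607: loop over Ck
                if k != j and math.dist(centroid(clusters[j]),
                                        centroid(clusters[k])) <= t:   # 604
                    (tj, nj), (tk, nk) = clusters[j], clusters[k]
                    clusters[j] = ([a + b for a, b in zip(tj, tk)], nj + nk)  # 605
                    del clusters[k]
                    if k < j:
                        j -= 1                       # Cj shifted left by the deletion
                    # k is not advanced: the next cluster now sits at index k
                else:
                    k += 1                           # 606
            j += 1                                   # 608
        t += t_step                                  # 610
    return [centroid(c) for c in clusters]

result = hierarchical_clustering([(0, 0), (0, 1), (10, 10), (10, 11)], 1, 1, 2)
```

Note that after a merge Cj keeps absorbing further neighbours within the same pass, which is the behaviour the text later associates with the chain effect.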

[0017] It was stated above that approach (2) is unsuitable for learning a large number of patterns because it must preserve the positions of the merged patterns in the pattern space. Learning nevertheless becomes possible with the method described below. Because this method enables high-performance clustering with the so-called nearest neighbor, farthest neighbor, Ward and centroid methods, it is indispensable for learning a large number of patterns.

[0018] The method is explained with reference to FIG. 3, taking a two-dimensional pattern space as an example. FIG. 3 shows, enlarged, only the neighborhood of a cluster representative point (center, mean or centroid) 301. The points of learning patterns in the pattern space are marked with an x, as at points 302 and 303. When the number of learning patterns is huge, patterns on the order of a thousand, ten thousand or more may concentrate around a cluster representative point. The storage needed to keep all of these points is enormous and in some cases exceeds the capacity of the computer, so the learning process may become infeasible.

[0019] This method makes clustering equivalent to the conventional one possible without storing every position. In FIG. 3, lines such as 306 are drawn from the representative point of the cluster, the space is divided into regions by angle (the direction of the vector from the cluster center), and within each region only the pattern farthest from the cluster representative point is stored. For example, dividing at every one degree gives 360 equal sectors, so the positions of at most 360 patterns in the pattern space need to be remembered.

[0020] In FIG. 3, pattern position 302 is stored because its distance 305 is the largest in this region, whereas pattern position 303, whose distance 304 puts it closer to the cluster representative point 301, need not be stored. Patterns that need not be stored may simply be discarded and not learned, or may be merged into the representative point of the cluster. With this method, even when thousands, tens of thousands or more patterns concentrate in one cluster and learning would otherwise be impossible, learning can be carried out by processing only a few hundred patterns.
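For the two-dimensional case of FIG. 3, the sector rule might be sketched as below. The function name and the default of 360 sectors are illustrative, and a pattern that coincides with the representative point is simply skipped, which is an assumption; the text allows such patterns to be discarded or merged.

```python
import math

def farthest_per_sector(center, points, sectors=360):
    """Keep, for each angular sector around center, only the farthest point."""
    width = 2 * math.pi / sectors
    kept = {}  # sector index -> (distance, point)
    for p in points:
        dx, dy = p[0] - center[0], p[1] - center[1]
        r = math.hypot(dx, dy)
        if r == 0:
            continue  # assumption: a point on the representative point has no direction
        angle = math.atan2(dy, dx) % (2 * math.pi)
        idx = min(int(angle / width), sectors - 1)
        if idx not in kept or r > kept[idx][0]:
            kept[idx] = (r, p)
    return {idx: p for idx, (r, p) in kept.items()}

# With 4 sectors, (1, 1) and (2, 2) share a sector and only (2, 2) survives.
boundary = farthest_per_sector((0, 0), [(1, 1), (2, 2), (-1, 1), (-3, -3)], sectors=4)
```

However many patterns fall in a cluster, the stored set never exceeds the number of sectors.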

[0021] For handwritten characters and omni-font text (printed characters in many typefaces), for example, the character shapes vary greatly, so a huge number of learning patterns must be prepared and learned. Learning all of the patterns in the conventional way takes an enormous processing time. The learning patterns therefore need to be reduced by some process before learning.

[0022] Of course, it is not enough merely to reduce the number of patterns; patterns that ought to be learned must not simply be excluded. Reducing the learning patterns reduces the number of iterations as a power of the number of patterns. In FIG. 5 the loops are nested three deep, so the reduction is cubic.

[0023] Methods for reducing the learning patterns include:
(1) dividing them by category;
(2) simply dividing the learning patterns;
(3) grouping in advance the patterns that lie extremely close to one another;
(4) among patterns that lie extremely close to one another, keeping only one or a few as representative patterns and excluding the rest;
(5) dividing them using the results of structural analysis.
Here a category means, for example, "0" through "9" for digits, or "あ" (a) through "ん" (n) for hiragana.

[0024] Dividing the learning patterns as in (1), (2) and (5) above has the further advantage that the learning process can be distributed over several computers and run in parallel, shortening the learning time. If the patterns are divided in advance, each part is clustered, and the resulting clusters are then gathered and clustered again, a result equivalent to clustering without division is obtained in a short processing time.

[0025] For (3) above, an ordinary clustering technique may be used; in that case the processing is, in effect, stopped at a very early stage. Method (4) rests on the idea that when there is a huge number of learning patterns, many patterns of almost the same shape are statistically likely to exist. In other words, using many nearly identical patterns for learning is wasteful, and using just one of them suffices. That one pattern may be chosen as the center, mean or centroid pattern, or chosen at random.
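Method (4) could be sketched as a single greedy pass that keeps a pattern only if no already-kept pattern lies within a small radius. The function name, the Euclidean distance and the radius `eps` are assumptions, and keeping the first pattern encountered is just one arbitrary choice of representative among those the text permits.

```python
import math

def reduce_near_duplicates(patterns, eps):
    """Keep one representative for each group of patterns closer than eps."""
    kept = []
    for p in patterns:
        # Keep p only if it is not almost identical to a pattern already kept.
        if all(math.dist(p, q) > eps for q in kept):
            kept.append(p)
    return kept

reduced = reduce_near_duplicates([(0, 0), (0, 0.01), (5, 5), (5.01, 5)], eps=0.1)
```

The reduced set is then passed to the clustering step in place of the full set.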

[0026] When a large number of patterns are learned, the patterns become densely packed in the pattern space and the so-called chain effect shown in FIG. 4 appears. FIG. 4 shows how the center point (or representative point) 501 of a cluster in the pattern space (marked x in the figure) is merged with another cluster in its vicinity, so that the cluster center moves to 502. In the same way, the cluster center moves on to 503. Reference numeral 504 indicates the trajectory of the cluster center. In conventional "hierarchical clustering", the chain effect arises from the series of repeated processes 604, 605, 606 and 607 in FIG. 5. The chain effect is undesirable because the cluster center moves away from where it should be, causing merges with clusters that should never have been merged. A clustering technique that avoids the chain effect as far as possible is needed.

[0027] The chain effect can be prevented by restricting the merging process so that merges are not performed consecutively, as shown for example in FIG. 6. The processes in FIG. 5 and FIG. 6 largely correspond; they differ only in what happens after process 605 and process 705. That is, once a cluster has been merged in process 705, merging with that cluster is suspended and processing moves on to merging with another cluster. Because merges are not performed consecutively, tiny clusters become scattered through the space, particularly in the early stage of learning, and clusters gradually grow with these as seeds (centers). As a result, a dictionary is created that absorbs pattern deformation to an appropriate degree, is rich in variations, and is effective for recognition accuracy.
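One reading of the FIG. 6 restriction is sketched below: as soon as a cluster Cj has absorbed one neighbour, the inner loop is abandoned and processing moves to the next cluster. FIG. 6 itself is not reproduced in the text, so this control flow is an interpretation; the Euclidean distance, the mean representative and all names are assumptions.

```python
import math

def centroid(cluster):
    total, count = cluster
    return [s / count for s in total]

def clustering_with_merge_limit(patterns, t_start, t_step, t_final):
    clusters = [(list(p), 1) for p in patterns]   # one cluster per pattern
    t = t_start
    while t <= t_final:
        j = 0
        while j < len(clusters):
            for k in range(len(clusters)):
                if k != j and math.dist(centroid(clusters[j]),
                                        centroid(clusters[k])) <= t:
                    (tj, nj), (tk, nk) = clusters[j], clusters[k]
                    clusters[j] = ([a + b for a, b in zip(tj, tk)], nj + nk)
                    del clusters[k]
                    if k < j:
                        j -= 1   # Cj shifted left by the deletion
                    break        # after 705: stop merging into Cj for now
            j += 1               # move on to another cluster
        t += t_step
    return [centroid(c) for c in clusters]

result = clustering_with_merge_limit([(0, 0), (0, 1), (10, 10), (10, 11)], 1, 1, 2)
```

Each cluster absorbs at most one neighbour per visit, so a single center cannot drift across a dense region within one pass.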

[0028] When patterns are collected in large numbers, patterns containing various deformations inevitably become numerous, but their absolute number falls as the degree of deformation rises. That is, patterns of standard shape with little deformation are many, and heavily deformed patterns are few. Consequently, when a large number of patterns are clustered, many clusters (dictionaries) consisting of only a few patterns are generated. A certain number of such dictionaries is needed to raise recognition accuracy, but many of them are, so to speak, unnecessary. It is important to identify these unnecessary dictionaries accurately and, for example, exclude them, so as to create an efficient dictionary that is small in absolute number yet capable of highly accurate recognition.

[0029] The following methods are effective for this purpose:
(1) deleting dictionaries with few correct answers and many incorrect answers (FIG. 7);
(2) assigning a penalty to dictionaries with few correct answers and many incorrect answers (FIG. 8);
(3) deleting dictionaries judged unnecessary because their shapes are similar.

[0030] In (1) above, as shown in FIG. 7, recognition processing 902 is performed after learning processing 901 (any learning method may be used, including the method of the present invention). Any recognition method may be used, but normally the similarity or distance between separately prepared test patterns and every learned dictionary is computed, and in recognition decision 905 the dictionary showing the greatest similarity is taken as the recognition result. The recognition result may be correct or incorrect. For example, when the digit 0 is to be recognized, the recognition result must be 0 (zero); if the result is 6, it is incorrect. Correct and incorrect answers are counted separately for each dictionary, dictionaries with few correct answers and many incorrect answers are deleted (904), and this processing is repeated.

[0031] Specifically, dictionaries with zero correct answers and one or more incorrect answers are deleted and recognition is run again, after which dictionaries with zero correct answers and one or more incorrect answers are again deleted. If there is no dictionary to delete, dictionaries with at most one correct answer and one or more incorrect answers are deleted. This processing is repeated, raising the permitted number of correct answers by one each time, until, for example, the condition reaches ten or fewer correct answers and one or more incorrect answers. This is a very careful procedure. Alternatively, the step of deleting dictionaries with ten or fewer correct answers and one or more incorrect answers may simply be repeated from the outset.
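The careful deletion schedule can be sketched as follows, assuming a nearest-prototype recognizer over (label, prototype) pairs with Euclidean distance; these data structures and names are hypothetical. The accuracy-based stopping rule (903) is left out for brevity, so the loop runs until the permitted number of correct answers passes `max_correct`.

```python
import math
from collections import Counter

def recognize(dictionaries, x):
    """Index of the dictionary entry (label, prototype) nearest to x."""
    return min(range(len(dictionaries)),
               key=lambda i: math.dist(dictionaries[i][1], x))

def count_answers(dictionaries, tests):
    """Count correct and incorrect answers per dictionary entry (902, 905)."""
    correct, wrong = Counter(), Counter()
    for label, x in tests:
        i = recognize(dictionaries, x)
        (correct if dictionaries[i][0] == label else wrong)[i] += 1
    return correct, wrong

def prune(dictionaries, tests, max_correct=10):
    dictionaries = list(dictionaries)
    limit = 0  # currently permitted number of correct answers
    while limit <= max_correct and dictionaries:
        correct, wrong = count_answers(dictionaries, tests)
        bad = {i for i in range(len(dictionaries))
               if correct[i] <= limit and wrong[i] >= 1}
        if bad:   # 904: delete, then recognize again with the same limit
            dictionaries = [d for i, d in enumerate(dictionaries) if i not in bad]
        else:     # nothing to delete: relax the condition by one
            limit += 1
    return dictionaries

dicts = [("0", (0, 0)), ("6", (1, 0)), ("6", (10, 10))]
tests = [("0", (0.1, 0)), ("0", (0.9, 0)), ("6", (10, 9))]
pruned = prune(dicts, tests, max_correct=2)
```

In this toy data the second entry only ever attracts a "0" test pattern, so it is the one removed.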

[0032] When the limit of recognition accuracy is reached, the iteration ends (903). Since recognition accuracy can be computed each time recognition processing 902 is performed, the limit can be taken to be, for example, the point at which the rise in accuracy slows, the point at which accuracy falls, or the point at which the rate of improvement drops below some reference value.

[0033] In (2) above, the dictionary itself is not deleted; instead a penalty is attached so that the dictionary that should be the correct answer comes up as the recognition result. A penalty is provided for each dictionary.

[0034] Penalties can be applied by methods such as (a) a similarity-subtraction type and (b) a similarity-multiplication type.

[0035] In (a), the initial value of the penalty is zero. After recognition processing, a penalty is added on the basis of the numbers of correct and incorrect answers counted for each dictionary. For example, (number of incorrect answers) / (number of correct answers) / k is added. In the next recognition pass, this penalty is subtracted from the actually computed similarity in recognition decision 1005. Because similarity normally takes a fractional value between 0 and 1, k is a constant such as one hundred or one thousand. k need not be constant and may be varied as learning progresses.

[0036] In (b), the initial value of the penalty is 1. After recognition processing, the penalty is updated by multiplication on the basis of the numbers of correct and incorrect answers counted for each dictionary. For example, the penalty is multiplied by (number of correct answers) / (number of correct answers + number of incorrect answers) × k. k is a constant or a varying value as described above. The more incorrect answers there are, the further the penalty falls below 1, and multiplying the actually computed similarity by this penalty in the next recognition pass lowers the similarity of unnecessary dictionaries relative to the others. Dictionaries whose penalty has become sufficiently heavy may be removed.
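The two penalty types can be sketched side by side. The function names are hypothetical, and two details are assumptions because the text leaves them open: a correct count of zero is treated as one in type (a) to avoid division by zero, and a dictionary that was never selected leaves its type (b) penalty unchanged.

```python
def update_subtractive(penalty, n_correct, n_wrong, k=100.0):
    """Type (a): starts at 0; grows by (incorrect / correct) / k per pass."""
    return penalty + (n_wrong / max(n_correct, 1)) / k  # max(...) is an assumption

def update_multiplicative(penalty, n_correct, n_wrong, k=1.0):
    """Type (b): starts at 1; is multiplied by correct / (correct + incorrect) * k."""
    total = n_correct + n_wrong
    if total == 0:
        return penalty  # assumption: unused dictionaries are left unchanged
    return penalty * (n_correct / total) * k

def penalized_similarity(similarity, penalty, kind):
    """Score used in the next recognition pass (decision 1005)."""
    return similarity - penalty if kind == "subtract" else similarity * penalty

# A dictionary with 2 correct and 4 incorrect answers after one pass:
pa = update_subtractive(0.0, 2, 4)       # (4 / 2) / 100
pb = update_multiplicative(1.0, 2, 4)    # 2 / 6
score_a = penalized_similarity(0.9, pa, "subtract")
score_b = penalized_similarity(0.9, pb, "multiply")
```

In both cases a dictionary that mostly produces incorrect answers ends up with a lower effective similarity than its raw score.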

[0037] For the judgment in (3) above, the similarity or distance between dictionaries is computed; among dictionaries judged to be highly similar to one another, it may be enough to keep only one or a few. As stated in (3), the others may simply be deleted, or recognition processing may be rerun after each single deletion to examine its effect, so that only dictionaries whose removal has no adverse effect are selected and deleted.

[0038]

[Effect of the invention] As described above, the present invention has the effect of enabling a large number of patterns to be learned efficiently.

[Brief description of drawings]

FIG. 1 is a block diagram showing an example of a system configuration that implements the present invention.

FIG. 2 is a conceptual illustration of clustering.

FIG. 3 illustrates the method of reducing the learning patterns.

FIG. 4 illustrates the chain effect.

FIG. 5 is a flow chart of the conventional "hierarchical clustering".

FIG. 6 is a flow chart of an embodiment of a clustering technique that eliminates the chain effect.

FIG. 7 is a flow chart of the technique for removing unnecessary dictionaries.

FIG. 8 is a flow chart of the technique for setting penalties on unnecessary dictionaries.

[Explanation of symbols]

101: input control unit; 102: display control unit; 105: clustering pre-processing control unit; 106: clustering control unit; 107: clustering post-processing control unit; 108: pattern recognition control unit; 111: dictionary (clustering result); 112: learning patterns; 302, 303: positions of learning patterns in the pattern space; 304, 305: distances from the representative point of the cluster; 501, 502, 503: positions of learning patterns in the pattern space; 504: trajectory illustrating the chain effect.

Continuation of the front page
(72) Inventor: Yoshihiro Shima, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji, Tokyo
(72) Inventor: Kazumi Suzuki, c/o Storage Systems Division, Hitachi, Ltd., 2880 Kozu, Odawara, Kanagawa

Claims

[Claims]

1. A pattern learning device that learns by clustering patterns, comprising: means for inputting patterns for learning; means for storing the patterns; means for storing a dictionary that is a clustering result; clustering pre-processing means for adjusting the number of patterns before clustering; clustering means for clustering the patterns; clustering post-processing means for adjusting the dictionary that is the clustering result created by the clustering means; and pattern recognition means for recognizing a pattern using the dictionary.

2. The pattern learning device according to claim 1, wherein the clustering pre-processing means divides the learning patterns by category.

3. The pattern learning device according to claim 1, wherein the clustering pre-processing means simply divides the learning patterns.

4. The pattern learning device according to claim 1, wherein the clustering pre-processing means divides the learning patterns based on a structural analysis result of the patterns.

5. The pattern learning device according to claim 1, wherein the clustering pre-processing means groups highly similar patterns together in advance to reduce the number of learning patterns.

6. The pattern learning device according to claim 1, wherein the clustering pre-processing means reduces the number of learning patterns by selecting, from among highly similar patterns, a representative pattern to be used as a learning pattern.

7. The pattern learning device according to claim 2, claim 3 or claim 4, wherein the clustering results learned after dividing in advance are combined and clustered again, thereby obtaining a clustering result equivalent to that obtained without dividing.

8. The pattern learning device according to claim 1, further comprising means for dispersing cluster merging.

9. The pattern learning device according to claim 1, wherein the clustering means divides the pattern space into radial regions centered on the representative point of a cluster and stores, in each region, only the pattern farthest from the representative point of the cluster as a pattern in the cluster.

10. The pattern learning device according to claim 1, wherein the clustering post-processing means uses the result of the pattern recognition means to delete a dictionary determined to be unnecessary.

11. The pattern learning device according to claim 1, wherein the clustering post-processing means uses the result of the pattern recognition means to set a penalty on a dictionary determined to be unnecessary.

12. A pattern learning method comprising: a process of inputting patterns for learning; a process of storing the patterns; a process of storing a dictionary that is a clustering result; a clustering pre-processing process of adjusting the number of patterns before clustering; a process of clustering the patterns; a clustering post-processing process of adjusting the dictionary that is the created clustering result; and a pattern recognition process of recognizing a pattern using the dictionary.