JPH0740289B2

JPH0740289B2 - High-speed classification method

Info

Publication number: JPH0740289B2
Application number: JP61287633A
Authority: JP
Inventors: 裕勝山
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1986-12-04
Filing date: 1986-12-04
Publication date: 1995-05-01
Anticipated expiration: 2010-05-01
Also published as: JPS63141193A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Outline]

The dictionary space is divided into dictionary subspaces, each centered on a dictionary node expressed as a vector, and an affiliation table is provided that indicates the affiliation of each point in these divided dictionary spaces. In character recognition and classification, the affiliation of the feature vector of an unknown character pattern is looked up in this table to classify the pattern, so classification can be performed at high speed without distance calculations or the like.

[Industrial application field]

The present invention relates to a high-speed classification method capable of classifying unknown character patterns at high speed in character recognition and classification.

In character recognition and classification systems that handle kanji, the number of target character types is very large, so the processing time for classifying an unknown character pattern becomes long. Accurate, high-speed classification is therefore desired.

[Conventional technology]

In conventional methods of recognizing and classifying character patterns input from an optical character reader, a handwritten character input device, or the like, a relatively common approach is to convert the input unknown character pattern into a feature vector determined from features such as its line segments and directions, calculate the distance between this feature vector and each dictionary, and take the dictionary at the shortest distance from the feature vector as the classification destination.

For example, as shown in FIG. 5, in a two-dimensional dictionary space, the distances indicated by the arrows between the feature vector (x₀, y₀) of the unknown character pattern, marked ×, and all dictionaries are calculated. The calculated distances are then compared, and the dictionary at the shortest distance, (x₁, y₁), is taken as the classification destination.
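The conventional procedure described above can be sketched as a brute-force nearest-dictionary search. This is a minimal illustration, not the patent's implementation: the Euclidean metric, the function name `nearest_dictionary`, and the sample coordinates are all assumptions made for the example.

```python
import math

def nearest_dictionary(x, dictionaries):
    """Return the index of the dictionary vector closest to x.

    Assumes Euclidean distance; the source does not name the metric.
    """
    best_index, best_dist = -1, math.inf
    for i, d in enumerate(dictionaries):
        dist = math.dist(x, d)  # one distance calculation per dictionary
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index

# Hypothetical two-dimensional dictionaries and an unknown feature vector.
dictionaries = [(1.0, 1.0), (4.0, 0.0), (0.0, 5.0)]
unknown = (1.5, 0.5)
result = nearest_dictionary(unknown, dictionaries)
```

Every classification visits every dictionary once, so the cost grows with the number of dictionaries.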

[Problems to be solved by the invention]

As described above, the conventional character recognition and classification method calculates the distance to every dictionary each time one character's worth of unknown character pattern is input, and searches for the dictionary at the shortest distance. In character recognition and classification that includes kanji, the number of dictionaries is very large, so the distance calculation takes a long time.

An object of the present invention is to enable classification in a short time without performing such distance calculations.

[Means for solving problems]

The high-speed classification method of the present invention classifies using an affiliation table for each point in the divided dictionary space, and is described with reference to FIG. 1.

A divided dictionary space is formed by partitioning the space between vector-represented dictionary nodes with perpendicular bisecting hyperplanes, and an affiliation table 3 indicating the affiliation of each point in that dictionary space is provided. A character pattern input from an input unit 1, such as an optical character reader or a handwriting input device, is converted into a feature vector by a feature vector conversion unit 2 based on features such as its line segments and directions. The affiliation table 3 is searched using this feature vector, and the dictionary node at the center of the divided dictionary space to which the feature vector belongs is output as the classification result.

[Action]

The input unknown character pattern is converted into a feature vector, and the affiliation table 3 is accessed using that feature vector as an address. The dictionary node at the center of the divided dictionary space to which the feature vector belongs is read out from the affiliation table 3. That is, the unknown character pattern can be classified immediately, simply by looking up the affiliation table 3, without any distance calculation.
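The lookup described above amounts to a single indexed memory read. The sketch below assumes a two-dimensional feature vector whose components are already quantized to the integers 0 to 3; the table contents, the labels "A", "B", "C", and the name `classify` are hypothetical and chosen only for illustration.

```python
# Hypothetical precomputed affiliation table: AFFILIATION_TABLE[y][x]
# holds the dictionary node to which the point (x, y) belongs.
AFFILIATION_TABLE = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
    ["C", "C", "B", "B"],
    ["C", "C", "C", "B"],
]

def classify(feature_vector):
    """Use the feature vector itself as the address into the table."""
    x, y = feature_vector
    return AFFILIATION_TABLE[y][x]  # no distance is computed here

label = classify((2, 1))
```

The cost of `classify` does not depend on how many dictionary nodes exist, which is the property the method relies on.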

[Embodiment]

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is an explanatory diagram of an embodiment of the present invention. A character pattern input by an input unit 1, such as an optical character reader (OCR) or a tablet-based handwritten character input device, is converted by a feature vector conversion unit 2 into a feature vector based on features such as line segments and directions. The more dimensions this feature vector has, the more accurate the classification can be; for example, 256 dimensions are used. The affiliation table 3 is accessed using this feature vector as an address.

The affiliation table 3 indicates the affiliation of each point in the divided dictionary spaces centered on the dictionary nodes. FIG. 2 is an explanatory diagram of a divided dictionary space, FIG. 3 is an explanatory diagram of dictionary space division, and FIG. 4 is an explanatory diagram of the affiliation table. In FIG. 2, the space enclosed by the perpendicular bisectors of the straight lines a, b, and c that join the dictionary node of interest A (marked ●) to all the other dictionary nodes B, C, and D (marked ○) is the divided dictionary space centered on the dictionary node of interest A. Every point in this divided dictionary space is regarded as a point close to the dictionary node of interest A and is treated as belonging to A.
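The region bounded by perpendicular bisectors can be tested directly: a point belongs to the region around A if it lies on A's side of the bisector between A and every other node. The following is a sketch under assumptions: the function name `in_cell`, the coordinates, and the choice to count boundary points as inside are all illustrative and not taken from the source.

```python
def in_cell(point, center, others):
    """True if point lies on center's side of every perpendicular bisector."""
    for other in others:
        # The bisector's normal points from center toward the other node.
        normal = [o - c for o, c in zip(other, center)]
        # The bisector passes through the midpoint of the joining line.
        midpoint = [(o + c) / 2 for o, c in zip(other, center)]
        # Signed offset of point from the bisector, measured along the normal.
        offset = sum(n * (p - m) for n, p, m in zip(normal, point, midpoint))
        if offset > 0:  # point is on the other node's side
            return False
    return True  # boundary points (offset == 0) are counted as inside

# Hypothetical coordinates for nodes A, B, C, D.
A = (2.0, 2.0)
B, C, D = (6.0, 2.0), (2.0, 6.0), (-2.0, -2.0)
inside = in_cell((3.0, 3.0), A, [B, C, D])
outside = in_cell((5.0, 2.0), A, [B, C, D])
```

Because the test uses only dot products, the same function works unchanged for vectors of any dimension, where the bisecting lines become hyperplanes.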

When multidimensional dictionary nodes are used, the entire dictionary space is divided using perpendicular bisecting hyperplanes, as shown in FIG. 3, which forms a divided dictionary space centered on each dictionary node. An affiliation table is then formed for the points of these divided dictionary spaces.

In the affiliation table formed in this way, as shown for example in FIG. 4, for each point in the divided dictionary space centered on dictionary node A = (a₀, a₁, a₂, ..., aₙ) and in the divided dictionary space centered on dictionary node B = (b₀, b₁, b₂, ..., bₙ), the central dictionary node A, B, ... is stored as the affiliation. Accordingly, when the feature vector of an input unknown character pattern is X = (a₀′, a₁′, a₂′, ..., aₙ′), accessing the affiliation table with X as the address reads out dictionary node A as its affiliation. This is the case where the feature vector X of the unknown character pattern lies within the divided dictionary space centered on dictionary node A, and X is classified to dictionary node A immediately, without any distance calculation.
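One way such a table could be prepared ahead of time is to visit every quantized point once and record its nearest dictionary node, so that all distance work is done before any character is classified. This is a sketch under assumptions: the source does not say how the table is filled, and the name `build_affiliation_table`, the 8-level quantization, and the node coordinates are hypothetical.

```python
from itertools import product

def build_affiliation_table(nodes, levels, dims):
    """Map every quantized point to the name of its nearest dictionary node."""
    table = {}
    for point in product(range(levels), repeat=dims):
        # Squared Euclidean distance is enough for comparison purposes.
        table[point] = min(
            nodes,
            key=lambda name: sum((p - q) ** 2 for p, q in zip(point, nodes[name])),
        )
    return table

# Hypothetical dictionary nodes in an 8 x 8 two-dimensional space.
nodes = {"A": (1, 1), "B": (6, 2), "C": (3, 6)}
table = build_affiliation_table(nodes, levels=8, dims=2)

# Classification afterwards is a plain lookup keyed by the feature vector.
label = table[(5, 3)]
```

The table holds `levels ** dims` entries, so a direct table of this kind grows quickly with the number of dimensions.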

As shown in FIG. 4, the affiliation table 3 can store each affiliation by associating each point of the divided dictionary space with a memory address. Alternatively, it can be configured so that an address decoder that decodes the feature vector of the unknown character pattern accesses an area storing the affiliation of the points in a divided dictionary space. In that case the address decoder becomes more complex, but the memory space can be made smaller.
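The source does not describe how the address decoder works. Purely as an illustration of the trade-off between decoder complexity and memory size, the sketch below stores one entry per run of identical affiliations and uses a binary search as a stand-in for the decoder. The run-length scheme and the names `compress` and `decode_and_lookup` are assumptions, not the patent's design.

```python
from bisect import bisect_right

def compress(flat_labels):
    """Keep one (start address, label) pair per run of identical labels."""
    starts, labels = [], []
    for address, label in enumerate(flat_labels):
        if not labels or labels[-1] != label:
            starts.append(address)
            labels.append(label)
    return starts, labels

def decode_and_lookup(feature_vector, width, starts, labels):
    """Decode the feature vector to a linear address, then find its run."""
    x, y = feature_vector
    address = y * width + x
    return labels[bisect_right(starts, address) - 1]

# Hypothetical 4 x 4 direct table, flattened row by row (16 entries).
flat = ["A", "A", "B", "B",
        "A", "A", "B", "B",
        "C", "C", "B", "B",
        "C", "C", "C", "B"]
starts, labels = compress(flat)
label = decode_and_lookup((2, 1), 4, starts, labels)
```

Here the 16-entry direct table shrinks to 8 stored runs, at the price of a search step in place of a single indexed read.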

[Effect of the invention]

As described above, the present invention searches the affiliation table 3, which indicates the affiliation of each point in the divided dictionary spaces centered on vector-represented dictionary nodes, using the feature vector of an unknown character pattern, and can thereby output the dictionary node at the center of the divided dictionary space in which the feature vector lies as the classification result. Unlike the conventional example, it does not need to calculate distances to the dictionary nodes and search for the dictionary node at the shortest distance, which has the advantage that classification can be executed at high speed. It can therefore be applied to character recognition and classification to speed up processing.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of an embodiment of the present invention, FIG. 2 is an explanatory diagram of a divided dictionary space, FIG. 3 is an explanatory diagram of the division of the dictionary space, FIG. 4 is an explanatory diagram of the affiliation table, and FIG. 5 is an explanatory diagram of a conventional example.

Reference numeral 1 denotes the input unit, 2 the feature vector conversion unit, and 3 the affiliation table.

Claims

[Claims]

1. A high-speed classification method characterized in that: the space between dictionary nodes expressed as vectors is divided using perpendicular bisecting hyperplanes to form divided dictionary spaces each centered on a dictionary node; an affiliation table (3) indicating the affiliation of each point in the divided dictionary spaces is provided; a feature vector of a character pattern input from an input unit (1) is obtained in a feature vector conversion unit (2); the affiliation table (3) is searched using the feature vector; and the dictionary node at the center of the divided dictionary space to which the feature vector belongs is output as the classification result.