JPH0550785B2


Info

Publication number: JPH0550785B2
Application number: JP61033190A
Authority: JP
Inventors: Fumio Yoda; Yoji Maeda
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1986-02-18
Filing date: 1986-02-18
Publication date: 1993-07-29
Also published as: JPS62190574A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of industrial application]

The present invention relates to a character pattern cutting device that cuts out character patterns from a character string written on a document.

[Prior art]

Recognizing characters requires several steps. The characters written on paper are converted photoelectrically and binarized into a pattern of 1 and 0 signals, corresponding to the character and background portions. Character patterns must then be cut out of that pattern one character at a time.

FIG. 2 shows the configuration of a conventional device of this kind. It was described in "An English Document Input System Using Character Recognition," Proceedings of the 28th National Convention of the Information Processing Society of Japan, pp. 885-886. In the figure:

- 1 is a sheet of paper.
- 2 is a scanning means that optically scans the character string written on the sheet 1 and converts it photoelectrically.
- 3 is a character string pattern storage means that stores the photoelectrically converted pattern of the character string (hereinafter the "character string pattern").
- 4 is a basic pattern area detecting means. It scans the character string pattern in the direction perpendicular to the character string to obtain marginal distribution values. Based on the continuity of these values, it divides the character string pattern into patterns (hereinafter "basic patterns"). It then detects each basic pattern area by determining the coordinates of its left and right ends.
- 5 is a character recognition means. It uses the position information of the basic pattern areas found by the detecting means 4. For each single basic pattern, and for each pattern formed by joining a plurality of consecutive basic pattern areas (hereinafter a "combined pattern"), it calculates a value quantifying how closely that pattern resembles a single character (hereinafter the "similarity").
- 6 is a similarity table that stores the similarities obtained by the character recognition means 5.
- 9 is a character cutting position determining means that determines the positions of the character patterns to be cut out, based on the magnitudes of the similarities stored in the similarity table.
- 10 is a cutting means that cuts out character patterns one character at a time from the character string pattern storage means 3, based on the cutting positions determined by the character cutting position determining means 9.
- 11 is a cut-out pattern buffer that stores the cut-out character patterns.

FIG. 3 shows an example of the processing performed by the basic pattern area detecting means 4 of FIG. 2. In the figure, 12 is a character string pattern, 13 is its marginal distribution, and 14 denotes the basic patterns. The basic pattern area corresponding to each basic pattern is shown enclosed in a rectangle.

FIG. 4 shows an example configuration of the similarity table 6 of FIG. 2. In the figure, 15 is a similarity value. 16 is the basic or combined pattern corresponding to that similarity, shown for ease of explanation. The symbol "*" indicates that no pattern corresponds to that table cell.

FIG. 5 shows an example of the combinations of basic and combined patterns that the character cutting position determining means 9 of FIG. 2 generates in order to cut out characters.

Next, the operation of the conventional device shown in FIG. 2 will be explained with reference to FIGS. 2 to 5.

First, the character string on the sheet 1 is photoelectrically converted by the scanning means 2 and stored in the character string pattern storage means 3. The character string pattern "解決" 12 in the storage means 3 is then passed to the basic pattern area detecting means 4.

The basic pattern area detecting means 4 scans the character string pattern in the vertical direction, orthogonal to the character string, to create the marginal distribution 13. It divides the character string pattern "解決" 12 according to the continuity of the regions where the marginal distribution exceeds a predetermined threshold. It then detects the coordinates of the left and right ends of each resulting basic pattern "角", "〓", "〓", "〓" 14 as the basic pattern areas. Here 〓 stands for a character fragment that cannot be represented as text.
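The projection-and-threshold step can be sketched as follows. This is a minimal illustration, not the patented circuit. It assumes that the binarized character string pattern is a list of rows of 0/1 pixels and that the string runs horizontally, so the perpendicular scan is a column sum. The function names and the `threshold` default are hypothetical.

```python
from typing import List, Optional, Tuple


def marginal_distribution(image: List[List[int]]) -> List[int]:
    """Count the 1-pixels in each column (a scan perpendicular to a horizontal string)."""
    return [sum(column) for column in zip(*image)]


def basic_pattern_areas(image: List[List[int]], threshold: int = 0) -> List[Tuple[int, int]]:
    """Return the (left, right) column coordinates of every run of consecutive
    columns whose marginal distribution value exceeds the threshold."""
    areas: List[Tuple[int, int]] = []
    left: Optional[int] = None
    for x, value in enumerate(marginal_distribution(image)):
        if value > threshold and left is None:
            left = x                      # a run of ink columns starts here
        elif value <= threshold and left is not None:
            areas.append((left, x - 1))   # the run ended on the previous column
            left = None
    if left is not None:                  # a run that touches the right edge
        areas.append((left, len(image[0]) - 1))
    return areas


image = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 1, 0],
]
print(basic_pattern_areas(image))  # [(0, 1), (3, 4)]
```

Raising `threshold` above 0 lets isolated noise pixels in an otherwise blank column fall below the cut, which is one reading of the "predetermined threshold" in the text.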

Next, the character recognition means 5 uses known character recognition techniques to calculate a similarity for every pattern. This covers every single basic pattern and every combined pattern formed by joining a plurality of consecutive basic patterns. The similarity quantifies how closely each pattern resembles a single character, and the calculated values are stored in the similarity table 6. For example, let $X = [X_1, X_2, \dots, X_K]$ be the feature vector extracted from an input pattern $P$. Let $F_C = [f_{C1}, f_{C2}, \dots, f_{CK}]$ be the reference pattern vector of character $C$. The similarity $S(P)$ of the input pattern is then calculated from equations (1) and (2).

$$S(P) = \max_C S_C(P) \tag{1}$$

$$S_C(P) = \frac{(X, F_C)}{|X|\,|F_C|} \tag{2}$$

Here $(X, F_C) = X F_C^{\mathsf{T}}$ denotes the inner product and $|X| = \sqrt{(X, X)}$ denotes the norm.
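Equations (1) and (2) amount to a cosine similarity against each reference vector, followed by a maximum over characters. The sketch below shows that computation under stated assumptions. The patent does not specify the feature extraction, so the feature vector and the `references` dictionary of reference vectors are hypothetical inputs, and both vectors are assumed to be non-zero.

```python
import math
from typing import Dict, List, Tuple


def similarity_to_class(x: List[float], f_c: List[float]) -> float:
    """Equation (2): S_C = (X, F_C) / (|X| |F_C|), i.e. the cosine of the angle."""
    inner = sum(a * b for a, b in zip(x, f_c))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_f = math.sqrt(sum(b * b for b in f_c))
    return inner / (norm_x * norm_f)


def similarity(x: List[float], references: Dict[str, List[float]]) -> Tuple[float, str]:
    """Equation (1): S(P) = max over characters C of S_C; also report the best C."""
    best = max(references, key=lambda c: similarity_to_class(x, references[c]))
    return similarity_to_class(x, references[best]), best


# Hypothetical two-class reference set, for illustration only.
refs = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
print(similarity([3.0, 4.0], refs))
```

With non-negative feature values, every $S_C$ lies between 0.0 and 1.0.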

The similarity $S(P)$ takes values from 0.0 to 1.0. The more closely the input pattern resembles a character, the larger its value.

The results obtained by the character recognition means 5 are stored in the similarity table 6 shown in FIG. 4. Each similarity is stored in the cell addressed by two indices:

- i is the index of the pattern's first basic pattern, with basic patterns numbered sequentially from the left.
- M is the number of basic patterns that make up the pattern.

For example, the similarity 0.29 (15) of the combined pattern "解〓" (16) is stored in the cell at address i = 1, M = 3.
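The (i, M) addressing can be made concrete with a small sketch. It assumes, hypothetically, that one character spans at most `max_m` basic patterns. `score` is a placeholder for whatever recognizer fills the table; neither is specified in the patent. In this sketch, cells whose pattern would run past the last basic pattern hold no entry and print as "*". That is one plausible reading of the "*" cells in the figure.

```python
from typing import Callable, Dict, List, Tuple

Address = Tuple[int, int]  # (i, M): starts at basic pattern i (1-based), spans M of them


def table_addresses(n: int, max_m: int) -> List[Address]:
    """Every (i, M) whose pattern fits inside n basic patterns."""
    return [(i, m) for m in range(1, max_m + 1) for i in range(1, n - m + 2)]


def build_table(n: int, max_m: int, score: Callable[[int, int], float]) -> Dict[Address, float]:
    """Fill one cell per existing pattern; score(i, m) stands in for the recognizer."""
    return {(i, m): score(i, m) for (i, m) in table_addresses(n, max_m)}


def render(table: Dict[Address, float], n: int, max_m: int) -> str:
    """One row per M, one column per i; missing cells are shown as '*'."""
    rows = []
    for m in range(1, max_m + 1):
        cells = [f"{table[(i, m)]:.2f}" if (i, m) in table else "*" for i in range(1, n + 1)]
        rows.append(f"M={m}: " + " ".join(cells))
    return "\n".join(rows)


print(render(build_table(4, 3, lambda i, m: 0.0), 4, 3))
```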

The character cutting position determining means 9 treats the boundary points between the basic pattern areas as candidate cutting positions of the character string pattern. By taking every possible combination of these candidate positions, it obtains all possible combinations of cut-out patterns, as shown in FIG. 5.

For each set of candidate cutting positions, it then looks up the similarities stored in the similarity table for the basic or combined patterns cut out at those positions. From these it calculates a cutting evaluation value, and it determines the optimal set of character cutting positions from the magnitude of that value. The cutting evaluation value can be obtained, for example, as the arithmetic mean of the similarities of the individual patterns.
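The exhaustive search over candidate cutting positions can be sketched as follows. This is an illustration, not the patented circuitry. It assumes the table is a dictionary keyed by (i, M) addresses and introduces a hypothetical cap `max_m` on how many basic patterns one character may span. Each subset of the n − 1 boundaries is one candidate set of cutting positions, and the set whose pieces have the largest arithmetic mean wins.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # (i, M): starts at basic pattern i (1-based), spans M of them


def segmentations(n: int, max_m: int) -> List[List[Segment]]:
    """Every way to cut n basic patterns at a subset of the n - 1 boundaries,
    keeping only those whose pieces span at most max_m basic patterns."""
    result = []
    for k in range(n):                                # number of cuts
        for cuts in combinations(range(1, n), k):     # cut after basic pattern c
            bounds = [0, *cuts, n]
            segs = [(a + 1, b - a) for a, b in zip(bounds, bounds[1:])]
            if all(m <= max_m for _, m in segs):
                result.append(segs)
    return result


def best_segmentation(table: Dict[Segment, float], n: int, max_m: int) -> Tuple[float, List[Segment]]:
    """Pick the segmentation whose arithmetic mean of table values is largest."""
    scored = [(sum(table[s] for s in segs) / len(segs), segs)
              for segs in segmentations(n, max_m)]
    return max(scored, key=lambda pair: pair[0])


print(len(segmentations(4, 3)))  # 7
```

With four basic patterns and `max_m = 3`, there are seven candidate sets. This matches the seven combinations evaluated for "明治" in the embodiment. The number of candidate sets grows as 2^(n-1), so plain enumeration suits only short strings.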

Next, the character cutting means 10 cuts out character patterns one character at a time from the character string pattern storage means 3. It uses the set of character cutting positions determined by the character cutting position determining means 9, and outputs the patterns to the cut-out pattern buffer 11.
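Once the cutting positions are fixed, cutting out a character reduces to slicing columns of the stored string pattern. The following is a minimal sketch under these assumptions: the image is a list of rows of 0/1 pixels, each basic pattern area is a (left, right) column pair, and the chosen pieces are given as (i, M) addresses. The function name `cut_out` is hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]


def cut_out(image: Image, areas: List[Tuple[int, int]],
            segments: List[Tuple[int, int]]) -> List[Image]:
    """Return one sub-image per chosen (i, M) segment.

    areas:    (left, right) column span of each basic pattern, left to right
    segments: chosen pieces; i is the 1-based index of the first basic pattern
    """
    patterns = []
    for i, m in segments:
        left = areas[i - 1][0]          # left end of the first basic pattern
        right = areas[i + m - 2][1]     # right end of the last basic pattern
        patterns.append([row[left:right + 1] for row in image])
    return patterns


image = [[1, 1, 0, 0, 1, 0],
         [1, 0, 0, 1, 1, 0]]
print(cut_out(image, [(0, 1), (3, 4)], [(1, 1), (2, 1)]))
```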

[Problems to be solved by the invention]

As described above, the conventional character pattern cutting device cuts out character areas according to a cutting evaluation value. That value is calculated from the similarities obtained by recognizing the patterns. The device can therefore cut out individual characters from a free-pitch character string.

Some kanji cause difficulty, however. In a character such as "明", the left-hand radical (hen) "日" and the right-hand component (tsukuri) "月" also exist as the characters "日" and "月" in their own right. Such a character is hereinafter called a "separable significant character". For it, the similarities of the patterns "明", "日", and "月" are of about the same magnitude. As a result, the conventional device frequently cuts out separable significant characters such as "明" incorrectly.

The present invention was made to solve this problem. Its object is to provide a device that correctly cuts out individual characters from a free-pitch Japanese character string containing separable significant characters. These are characters whose left-hand radical and right-hand component each also exist as a single character.

[Means for solving the problems]

The character pattern cutting device according to the present invention provides two means:

- A differential similarity calculating means. From the differences between the similarity values calculated by the character recognition means, it calculates a value quantifying how closely a pattern resembles a single character (hereinafter the "differential similarity").
- A character cutting position determining means. It obtains the combinations of candidate cutting positions produced by dividing the character string pattern. For each combination, it calculates as a cutting evaluation value the average of the differential similarities of the patterns cut out at those candidate positions. It then determines the optimal combination of character cutting positions based on this evaluation value.

[Operation]

This invention exploits two properties:

- In a meaningful Japanese character string, a single-character pattern may be joined with an adjacent character, or with part of one. The resulting pattern forms a single character only with a probability close to zero.
- The left-hand radical or right-hand component of a separable significant character always combines with its adjacent pattern to form a single-character pattern.

Based on these properties, the differential similarity calculating means computes a differential similarity for a pattern of interest. It takes the difference between the similarity of that pattern and the similarity of the patterns that contain it. The result expresses quantitatively how character-like the pattern of interest is.

Consequently, the differential similarity of a separable significant character does not decrease, whereas that of its radical or component patterns becomes small. Using a cutting evaluation value calculated from the differential similarities therefore lets separable significant characters such as "明" be cut out correctly.

[Embodiments of the invention]

The present invention will now be explained in detail with reference to the drawings.

FIG. 1 shows the configuration of an embodiment of the present invention. In the figure, 1 to 6 and 9 to 11 are the same as in the conventional device shown in FIG. 2. 7 is a differential similarity calculating means that calculates differential similarities from the similarity values stored in the similarity table. 8 is a differential similarity table that stores the differential similarities calculated by the differential similarity calculating means 7.

FIG. 6 illustrates the operation of the present invention. In the figure, 17 is a character string pattern containing separable significant characters, which the conventional device has difficulty cutting out correctly. 18 is its marginal distribution, and 19 denotes the basic patterns.

FIG. 7 shows an example configuration of the similarity table 6 of FIG. 1. In the figure, 20 is an example similarity value. 21 is an example of the basic or combined pattern corresponding to that similarity, shown for ease of explanation. As in FIG. 4, the symbol "*" indicates that no pattern corresponds to that table cell.

FIG. 8 shows an example configuration of the differential similarity table 8 of FIG. 1. In the figure, 22 is an example differential similarity value. 23 is an example of the basic or combined pattern corresponding to that differential similarity, shown for ease of explanation. The symbol "*" again indicates that no pattern corresponds to that table cell.

FIG. 9 shows an example configuration of the differential similarity calculating means 7 of FIG. 1. In the figure, 6 and 8 are the same as in FIG. 1. 24 is a pattern similarity detector, 25 is an inclusive pattern similarity detector, 26 is a maximum element detector, 27 is a subtractor, and 28 to 34 are signal lines.

The operation of the embodiment shown in FIG. 1 will now be explained with reference to FIGS. 6 to 9.

First, the character string written on the sheet 1 is photoelectrically converted by the scanning means 2. The resulting one-line character string pattern "明治" 17 is stored in the character string pattern storage means 3.

The pattern "明治" 17 is then passed to the basic pattern area detecting means 4, which computes its marginal distribution 18. From this marginal distribution, the detecting means extracts the basic patterns "日", "月", "〓", "台" 19. It transfers the coordinates of the left and right ends of each basic pattern to the character recognition means 5 as the position information of the basic pattern areas.

As in the conventional device, the character recognition means 5 performs pattern recognition on every single basic pattern and on every combined pattern formed by joining a plurality of consecutive basic patterns. It stores the similarity of each pattern in the similarity table 6 shown in FIG. 7.

Next, the operation of the differential similarity calculating means 7 will be explained, beginning with a brief explanation of the differential similarity itself.

Consider first an ordinary character. In a meaningful Japanese character string, let $P_0$ be the pattern of one character and $P_1$ the pattern of an adjacent character, or of part of one. It is generally known that the joined pattern $P_A$ forms a single character only with a probability close to zero. There is therefore a large difference between the similarity $S(P_0)$ of the single-character pattern $P_0$ and the similarity $S(P_A)$ of the pattern $P_A$ that contains $P_0$.

Now consider a separable significant character. Its left-hand radical or right-hand component pattern $P_2$ and the pattern $P_3$ of the whole character are both single characters. There is therefore no large difference between the similarity $S(P_2)$ of the radical or component and the similarity $S(P_3)$ of the separable significant character that contains it.

Accordingly, the magnitude of the differential similarity $\Delta S(P)$ of a pattern $P$, defined by equation (3), accurately expresses how closely $P$ resembles a single character.

$$\Delta S(P) = S(P) - \max_i S(P_i) \tag{3}$$

Here $P_i$ ranges over the patterns that contain the pattern $P$.

FIG. 8 shows an example of the differential similarity table. The differential similarity calculating means 7 computes the differential similarity of each basic and combined pattern from the similarity values stored in the similarity table of FIG. 7, and stores it in this table. For example, the similarities of the patterns "月", "明", and "月〓" are 0.89, 0.90, and 0.71 respectively. The differential similarity of the pattern "月" is therefore $\Delta S(\text{月}) = 0.89 - \max(0.90, 0.71) = -0.01$.
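Equation (3) can be applied to a whole similarity table as sketched below. The sketch makes three assumptions. First, table entries are keyed by the (i, M) addresses of the similarity table. Second, the maximum is taken over every pattern in the table that contains P. Third, equation (3) does not say what happens when no containing pattern exists, so the sketch assumes ΔS(P) = S(P) in that case. The function names are illustrative.

```python
from typing import Dict, Tuple

Address = Tuple[int, int]  # (i, M): starts at basic pattern i (1-based), spans M of them


def contains(outer: Address, inner: Address) -> bool:
    """True if pattern `outer` strictly contains pattern `inner`."""
    (oi, om), (ii, im) = outer, inner
    return outer != inner and oi <= ii and oi + om >= ii + im


def differential_table(sim: Dict[Address, float]) -> Dict[Address, float]:
    """Equation (3): dS(P) = S(P) - max S(Pi) over table patterns Pi containing P."""
    diff = {}
    for p, s in sim.items():
        containers = [sim[q] for q in sim if contains(q, p)]
        # Assumption: with no containing pattern in the table, keep S(P) unchanged.
        diff[p] = s - max(containers) if containers else s
    return diff


sim = {(2, 1): 0.89, (1, 2): 0.90, (2, 2): 0.71}  # 月, 明, 月〓 from the text
print(round(differential_table(sim)[(2, 1)], 2))   # -0.01
```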

FIG. 9 shows an example configuration for calculating the differential similarity. The calculation proceeds as follows:

1. Pattern index signals i, M, designating the pattern whose differential similarity is to be calculated, are input to the pattern similarity detector 24 via signal line 28.
2. The pattern similarity detector 24 loads the similarity of the pattern designated by i, M from the similarity table via signal line 29. It transfers this value to the subtractor 27 via signal line 30.
3. The inclusive pattern similarity detector 25 receives the pattern index signals i, M via signal line 28. It generates index signals i′, M′ for the patterns that contain the pattern designated by i, M.
4. Based on the index signals i′, M′, the detector 25 loads the corresponding similarities from the similarity table 6 via signal line 31. It transfers them to the maximum element detector 26 via signal line 32.
5. The maximum element detector 26 detects the largest of the similarities sent from the detector 25. It transfers that value to the subtractor 27 via signal line 33.
6. The subtractor 27 takes the difference between the similarity received via signal line 30 and the similarity received via signal line 33. Via signal line 34, it stores the result in the cell of the differential similarity table 8 designated by the pattern index signals i, M.
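The dataflow of FIG. 9 can be mirrored in software, with one step per component. This sketch focuses on the index generation of the inclusive pattern similarity detector 25. It produces every (i′, M′) whose span covers (i, M) within a string of `n` basic patterns. The cap `max_m` on pattern length is hypothetical. The value 0.0 used when no containing pattern exists is also an assumption, because the patent does not describe that case.

```python
from typing import Dict, Iterator, Tuple

Address = Tuple[int, int]  # (i, M)


def containing_indices(i: int, m: int, n: int, max_m: int) -> Iterator[Address]:
    """Role of detector 25: generate (i', M') of every pattern containing (i, M)."""
    last = i + m - 1                                   # last basic pattern of (i, M)
    for i2 in range(max(1, last - max_m + 1), i + 1):  # start at or before i
        for m2 in range(last - i2 + 1, min(max_m, n - i2 + 1) + 1):
            if (i2, m2) != (i, m):
                yield (i2, m2)


def differential_similarity(sim: Dict[Address, float], i: int, m: int,
                            n: int, max_m: int) -> float:
    own = sim[(i, m)]                                              # detector 24
    others = [sim[a] for a in containing_indices(i, m, n, max_m)]  # detector 25
    largest = max(others) if others else 0.0                      # detector 26 (assumed 0.0 if none)
    return own - largest                                           # subtractor 27


print(list(containing_indices(2, 1, 4, 3)))  # [(1, 2), (1, 3), (2, 2), (2, 3)]
```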

Next, the process of cutting characters out of the character string pattern using the differential similarities stored in the differential similarity table 8 will be described.

The character cutting position determining means 9 treats the boundary points between the basic pattern areas as candidate cutting positions of the character string pattern, and obtains every combination of these candidate positions. For each combination, it looks up the differential similarities stored in the differential similarity table 8 for the patterns cut out at those positions. From these it calculates a cutting evaluation value. It then detects the combination of cut-out patterns that gives the highest evaluation value and transfers this information to the character cutting means 10.

The cutting evaluation value is obtained by looking up the differential similarity of each pattern and, as in the conventional device, taking for example their arithmetic mean. For the character string pattern of FIG. 6, the cutting evaluation value V is computed as the arithmetic mean of the differential similarities stored in the differential similarity table of FIG. 8. This gives:

$$
\begin{aligned}
V(\text{日}+\text{月}+\text{〓}+\text{台}) &= (0.01 - 0.01 - 0.02 + 0.02)/4 = 0.000\\
V(\text{日}+\text{月}+\text{治}) &= (0.01 - 0.01 + 0.24)/3 = 0.080\\
V(\text{日}+\text{月〓}+\text{台}) &= (0.01 + 0.01 + 0.02)/3 = 0.013\\
V(\text{明}+\text{〓}+\text{台}) &= (0.20 - 0.02 + 0.02)/3 = 0.067\\
V(\text{明}+\text{治}) &= (0.20 + 0.24)/2 = 0.220\\
V(\text{明〓}+\text{台}) &= (0.04 + 0.02)/2 = 0.030\\
V(\text{日}+\text{月〓台}) &= (0.01 + 0.02)/2 = 0.015
\end{aligned}
$$

The evaluation value of the combination "明" + "治", V(明 + 治) = 0.220, is the largest. The character cutting position determining means 9 therefore judges the patterns "明" and "治" to be the optimal cutting result, and transfers this combination to the character cutting means 10.
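The worked example can be checked mechanically. The sketch recomputes V for the seven candidate combinations, using only the differential similarities quoted in the calculation above. Patterns are keyed by (i, M) addresses, following the similarity-table convention. The dictionary names are illustrative. For "明" + "〓" + "台", the script computes 0.20 / 3 ≈ 0.067.

```python
from typing import Dict, List, Tuple

# Differential similarities quoted in the worked example, keyed by (i, M).
DELTA_S: Dict[Tuple[int, int], float] = {
    (1, 1): 0.01,   # 日
    (2, 1): -0.01,  # 月
    (3, 1): -0.02,  # 〓
    (4, 1): 0.02,   # 台
    (1, 2): 0.20,   # 明
    (2, 2): 0.01,   # 月〓
    (3, 2): 0.24,   # 治
    (1, 3): 0.04,   # 明〓
    (2, 3): 0.02,   # 月〓台
}

# The seven candidate combinations of cut-out patterns for "明治".
CANDIDATES: Dict[str, List[Tuple[int, int]]] = {
    "日+月+〓+台": [(1, 1), (2, 1), (3, 1), (4, 1)],
    "日+月+治": [(1, 1), (2, 1), (3, 2)],
    "日+月〓+台": [(1, 1), (2, 2), (4, 1)],
    "明+〓+台": [(1, 2), (3, 1), (4, 1)],
    "明+治": [(1, 2), (3, 2)],
    "明〓+台": [(1, 3), (4, 1)],
    "日+月〓台": [(1, 1), (2, 3)],
}


def evaluation(segments: List[Tuple[int, int]]) -> float:
    """Cutting evaluation value: arithmetic mean of the differential similarities."""
    return sum(DELTA_S[s] for s in segments) / len(segments)


scores = {name: round(evaluation(segs), 3) for name, segs in CANDIDATES.items()}
best = max(scores, key=scores.get)
print(best, scores[best])  # 明+治 0.22
```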

Finally, the character cutting means 10 cuts out character patterns one character at a time from the character string pattern storage means 3. It uses the set of character cutting positions determined by the character cutting position determining means 9, and outputs the patterns to the cut-out pattern buffer 11.

The above embodiment describes the case where the character recognition means calculates similarities. The invention is not limited to this and may also be used when the character recognition means calculates dissimilarities. In that case, the invention can be implemented by replacing the circuit that detects the maximum value with a circuit that detects the minimum value.
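The following derivation illustrates why a minimum detector is the natural counterpart. It assumes, and the patent does not state this, that the dissimilarity is defined as $D(P) = 1 - S(P)$. Under that assumption, the minimum-based difference is exactly the negative of equation (3). The mean of these values is then the negative of the cutting evaluation value, so the best combination would be the one with the smallest mean:

$$
\Delta D(P) = D(P) - \min_i D(P_i) = \bigl(1 - S(P)\bigr) - \Bigl(1 - \max_i S(P_i)\Bigr) = -\Bigl(S(P) - \max_i S(P_i)\Bigr) = -\Delta S(P)
$$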

[Effect of the invention]

As described above, the present invention cuts out characters one at a time based on the differences between the similarities obtained by recognizing the patterns. This yields a device that cuts out individual characters with high accuracy from free-pitch character strings containing separable significant characters, whose left-hand radical and right-hand component each also exist as a single character.

[Brief explanation of drawings]

- FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.
- FIG. 2 is a block diagram of a conventional character pattern cutting device.
- FIG. 3 illustrates an example of the processing of the basic pattern area detecting means.
- FIG. 4 shows an example configuration of a similarity table.
- FIG. 5 illustrates an example of the combinations of patterns evaluated by the character cutting position determining means.
- FIG. 6 shows an example of a character string pattern containing separable significant characters.
- FIG. 7 shows an example configuration of the similarity table.
- FIG. 8 shows an example configuration of the differential similarity table 8.
- FIG. 9 shows an example configuration of the differential similarity calculating means.

In the figures, 1 is a sheet of paper, 2 is scanning means, 3 is character string pattern storage means, 4 is basic pattern area detecting means, 5 is character recognition means, 6 is a similarity table, 7 is differential similarity calculating means, 8 is a differential similarity table, 9 is character cutting position determining means, 10 is character cutting means, and 11 is a cut-out pattern buffer. The same reference numerals denote the same or corresponding parts throughout the drawings.

Claims

[Claims]

1. A character pattern cutting device that cuts out character patterns from a character string written on a sheet of paper or the like, comprising:
scanning means for optically scanning the character string on the sheet and converting it photoelectrically;
character string pattern storage means for storing the photoelectrically converted pattern of the character string;
basic pattern area detecting means for dividing the character string pattern based on the continuity of the marginal distribution values obtained by scanning the character string pattern, and for determining basic pattern areas from the coordinates of the ends of the basic patterns obtained by the division;
character recognition means for recognizing the basic patterns and the combined patterns, each combined pattern being formed by joining a plurality of consecutive basic patterns, and for calculating the similarity of each basic pattern and each combined pattern;
differential similarity calculating means for calculating the differential similarity of a basic pattern or combined pattern of interest, as the difference between the similarity of that pattern itself and the similarity of a basic pattern or combined pattern that contains it;
character cutting position determining means for obtaining possible combinations of candidate cutting positions based on the position information of the basic pattern areas, for calculating as a cutting evaluation value the average of the differential similarities of the patterns cut out at the candidate cutting positions of each combination, and for determining the optimal combination of character cutting positions based on this cutting evaluation value; and
character cutting means for cutting out and outputting character patterns from the character string pattern stored in the character string pattern storage means, based on the result of the character cutting position determining means.