JPH10222607A - Method for recognizing separation of graphic element from character element - Google Patents

Method for recognizing separation of graphic element from character element

Info

Publication number
JPH10222607A
JPH10222607A JP9021140A JP2114097A JPH10222607A JP H10222607 A JPH10222607 A JP H10222607A JP 9021140 A JP9021140 A JP 9021140A JP 2114097 A JP2114097 A JP 2114097A JP H10222607 A JPH10222607 A JP H10222607A
Authority
JP
Japan
Prior art keywords
character
contour
circle
graphic
judged
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP9021140A
Other languages
Japanese (ja)
Inventor
Masahito Hori
雅人 堀
Tetsuya Yasuda
哲也 安田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Meidensha Corp
Meidensha Electric Manufacturing Co Ltd
Original Assignee
Meidensha Corp
Meidensha Electric Manufacturing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Meidensha Corp, Meidensha Electric Manufacturing Co Ltd
Priority to JP9021140A
Publication of JPH10222607A
Legal status: Pending

Landscapes

  • Character Input (AREA)

Abstract

PROBLEM TO BE SOLVED: To reliably separate and recognize a graphic element and a character element even when the two are in contact. SOLUTION: In S1, a number circle is processed as a contour element group. In S2, θ is set to 0°, and in S3 it is judged whether θ is smaller than 180°. If so (Y), the two tangent lines to the contour element group at angle θ are calculated in S4. In S5 it is judged whether the width between the two calculated tangent lines is outside the standard for a number circle. If the width is within the standard, S6 judges whether contour elements lie within distance d of the two tangent lines. If so (Y), those contour elements are marked in S7, and S3 to S7 are repeated until θ reaches 180°. Once θ is no longer smaller than 180°, S8 judges whether any part of the contour elements is marked. If so, the contour element group is taken as a circle element in S9 and recognized as a circle in S10. If no mark exists, the contour element group is taken as a character element in S11 and character recognition is executed in S12.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention: The present invention relates to a method for separating and recognizing graphic elements and character elements, used in an automatic drawing input device that optically reads and recognizes drawings containing characters and figures. The method separates a graphic element, for example a circle, from a character element, for example a number drawn inside the circle.

[0002]

2. Description of the Related Art: When a computer reads characters written in a document or drawing, the subject is first raster-scanned by an imaging means such as an image scanner. The data of the target characters is then extracted from the binary image data corresponding to the black and white of the input pattern, and recognition is performed by comparing this data against a dictionary. When the subject is a drawing or table in which characters, symbols, figures, and the like are mixed, the character portions must first be separated from the graphic portions and then cut out one character at a time for recognition. Characters in drawings often vary in size, which makes both extraction and recognition very difficult.

For this reason, before the characters are cut out, the input image is converted into contour vectors and the elements are separated. First, character candidates are determined from the size of the circumscribed rectangle of each isolated contour vector loop. As shown in FIG. 5, with the width of the vector denoted $L_1$, its height $L_2$, and the threshold defining the maximum character size $L_0$, the condition for a character candidate is $\max\{L_1, L_2\} < L_0$.

Among the character candidates, one whose height is about the same as that of the character string being searched for is taken as the core of a character string candidate, and character candidates whose center coordinates lie in the range shown in FIG. 6(a) are searched for. The procedure is as follows. First, the center points of other character candidates are searched for to the right of the string core, within the range shown in FIG. 6(b). Of those detected, the one farthest from the core becomes the starting point of the next search range. If none is detected, the rightward search ends, and the same search is performed to the left. From the character string candidates detected by these searches, as shown in FIG. 6(c), candidates i and j are extracted as a character string when the core height $h_c$ and the distance $d$ between candidates satisfy $d(i, j) < h_c \times \text{constant}$, where $1/2 \le \text{constant} \le 1$.
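The two prior-art conditions, max{L1, L2} < L0 and d(i, j) < hc × constant, can be illustrated with a minimal Python sketch. The `Box` type, the function names, and the reading of d as the horizontal gap between two circumscribed rectangles are assumptions made for illustration; the source does not define how d is measured.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Circumscribed rectangle of an isolated contour vector loop (hypothetical type)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width, L1 in the text
    h: float  # height, L2 in the text


def is_character_candidate(box: Box, l0: float) -> bool:
    """Apply the size condition max{L1, L2} < L0."""
    return max(box.w, box.h) < l0


def belongs_to_string(a: Box, b: Box, core_height: float, constant: float = 0.75) -> bool:
    """Apply d(i, j) < hc * constant, with 1/2 <= constant <= 1.

    Assumption: d is the horizontal gap between the two rectangles.
    """
    if not 0.5 <= constant <= 1.0:
        raise ValueError("constant must lie in [1/2, 1]")
    gap = max(b.x - (a.x + a.w), a.x - (b.x + b.w), 0.0)
    return gap < core_height * constant
```

With a core height of 12 and the default constant of 0.75, two candidates are joined only if the gap between them is under 9 units.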

[0003]

PROBLEMS TO BE SOLVED BY THE INVENTION: For a conventional character recognition device to separate characters correctly, two constraints had to hold. (1) The perimeter of a contour vector constituting a character must be distinguishable from the perimeter of a contour vector that is not a character. Normally, a contour vector whose perimeter exceeds a predetermined value is judged to belong to a non-character figure, and one within that value is judged to belong to a character. (2) The contour vector of a character must not touch any non-character figure, such as a line or a symbol. However, when characters are written inside a frame, as in a framed character string in a table, the contour-vectorized character and the frame often touch, as with the digits "1", "4", and "6" shown in FIG. 7. This degraded character recognition performance in such applications.

[0004] In addition, when a graphic element consisting of a circle and a character element consisting of digits are mixed, one of two methods is used. In the first method, the circle is detected first and the characters (digits) inside it are then recognized. In the second method, numbers of, for example, three digits are registered in a dictionary, and symbol recognition or character recognition is performed.

[0005] However, with the first method, the circle graphic element and the character element become difficult to separate when they touch or when the circle is distorted. In addition, when part of the circle is missing because of faint printing, it cannot be recognized as a circle graphic element. With the second method, every pattern must be registered in advance, which is a major problem in terms of memory and time.

[0006] The present invention has been made in view of the above circumstances. Its object is to provide a method for separating and recognizing graphic elements and character elements that works reliably even when the graphic element and the character element are in contact, and even when the graphic element is faint.

[0007]

MEANS FOR SOLVING THE PROBLEMS: To achieve the above object, the present invention first determines the continuous contour elements and then calculates parallel tangent lines to the contour elements so as to sandwich them. An identifier is attached to each contour element lying within a fixed distance of the tangent lines. The parts with and without the identifier are then detected. The contour elements of the parts with the identifier are taken as graphic elements and recognized as a figure, and the contour elements of the parts without it are taken as character elements and recognized as characters.

[0008]

DESCRIPTION OF THE EMBODIMENTS: An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a flowchart describing the embodiment. In FIG. 1, step S1 determines the continuous contour elements as one group. As shown for example in FIG. 2, this step processes a figure consisting of a number circle, that is, a circle with the digit "2" drawn inside it. In FIG. 2, the contour element group {ア, イ, ウ, エ, オ, カ} is divided at the points labeled A, B, and C. Next, to obtain the tangent lines, step S2 sets θ = 0°, defined as the case where the tangent line is vertical as shown in FIG. 3, and step S3 judges whether θ is smaller than 180°. If the result is "Y", step S4 calculates the two tangent lines to the contour element group at angle θ, as shown in FIG. 3. The two tangent lines are parallel to each other and sandwich the number circle.
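Step S4 can be sketched in Python as follows. This is a minimal illustration, assuming the contour element group is available as a list of (x, y) points and that θ rotates the line's normal away from the x axis, so that θ = 0° gives vertical tangent lines as in FIG. 3. The function name `tangent_pair` and the point representation are hypothetical, not taken from the source.

```python
import math


def tangent_pair(points, theta_deg):
    """Return the offsets (lo, hi) of the two parallel tangent lines.

    Each line is described by n . p = offset, where n = (cos θ, sin θ) is
    the common unit normal. At θ = 0 the normal is horizontal, so both
    lines are vertical. The lines touch the outermost points of the group
    and therefore sandwich it.
    """
    t = math.radians(theta_deg)
    nx, ny = math.cos(t), math.sin(t)
    # Project every contour point onto the normal direction.
    proj = [x * nx + y * ny for x, y in points]
    return min(proj), max(proj)
```

The difference hi − lo is the width between the two tangent lines. For a circle it equals the diameter at every angle.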

[0009] Step S5 judges whether the width between the two calculated tangent lines is outside the standard for a number circle. If it is outside the standard ("Y"), the contour element group {ア, イ, ウ, エ, オ, カ} is treated as noise. If it is within the standard ("N"), step S6 judges whether there are contour elements within distance d of the two tangent lines. If the result of step S6 is "N", the group is processed as noise. If it is "Y", step S7 attaches a mark (identifier) to contour elements {ア, イ} as shown in FIG. 3 and increases θ by Δθ (θ = θ + Δθ). Steps S3 to S7 are repeated until θ reaches 180°. FIG. 4 shows the case of θ = 90°, in which contour element {ウ} has also been marked.
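The loop over steps S3 to S7 can be sketched as below. This is a simplified illustration under stated assumptions: the contour element group is a list of (x, y) points, the "standard" for a number circle is a hypothetical width range `[min_width, max_width]`, and marks are stored per point. Because the outermost points always lie on the tangent lines, the "N" branch of step S6 never fires in this point-based version.

```python
import math


def mark_contour_points(points, min_width, max_width, d, step_deg=10.0):
    """Sweep θ from 0° up to (not including) 180° and mark near-tangent points.

    Returns a list of booleans (True = marked), or None when the group is
    rejected as noise because its width is outside the assumed standard.
    """
    marked = [False] * len(points)
    theta = 0.0
    while theta < 180.0:                               # S3
        t = math.radians(theta)
        nx, ny = math.cos(t), math.sin(t)
        proj = [x * nx + y * ny for x, y in points]
        lo, hi = min(proj), max(proj)                  # S4: two tangent lines
        if not (min_width <= hi - lo <= max_width):    # S5: width check
            return None                                # treated as noise
        for i, p in enumerate(proj):                   # S6: within d of a tangent?
            if p - lo <= d or hi - p <= d:
                marked[i] = True                       # S7: attach the mark
        theta += step_deg                              # θ = θ + Δθ
    return marked
```

For a circle of radius 10 with a few digit points near its center, a width range of 18 to 22 and d = 0.5 mark every circle point and leave the inner points unmarked.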

[0010] When step S3 determines that θ is no longer smaller than 180° (stated in the original as θ > 180°), that is, when the result of step S3 is "N", step S8 is performed. Step S8 judges whether any part of the contour elements is marked. If so ("Y"), step S9 takes the contour element group as a circle element, and step S10 recognizes it as a circle. If step S8 finds no mark ("N"), step S11 takes the contour element group as a character element, and step S12 performs character recognition. The circle element and the character element recognized in steps S10 and S12 are combined into one group in step S13.
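Steps S8 to S12 can be read as a split of the contour elements by mark, following the claim's wording that marked parts are graphic elements and unmarked parts are character elements. The sketch below assumes the marks are available as one boolean per element. The function name and the data layout are hypothetical, and the actual circle and character recognizers are outside its scope.

```python
def split_by_mark(elements, marked):
    """Split contour elements into (circle_elements, character_elements).

    `marked[i]` is True when element i received the identifier during the
    tangent sweep. Marked elements go to the circle (graphic) side and
    unmarked elements to the character side.
    """
    if len(elements) != len(marked):
        raise ValueError("elements and marked must have the same length")
    circle = [e for e, m in zip(elements, marked) if m]
    characters = [e for e, m in zip(elements, marked) if not m]
    return circle, characters
```

For example, if ア, イ, and ウ are marked and the remaining elements are assumed, purely for illustration, to stay unmarked, the call `split_by_mark(["ア", "イ", "ウ", "エ", "オ", "カ"], [True, True, True, False, False, False])` puts the first three elements on the circle side and the rest on the character side.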

[0011] In a drawing consisting of a graphic element such as a number circle and a character element, the tangent lines always touch the graphic element and never touch the character element. The graphic element and the character element can therefore be separated using tangent lines as described above.

[0012]

EFFECTS OF THE INVENTION: As described above, according to the present invention, the two elements can be reliably separated even when a character element touches the graphic element, and likewise even when the graphic element is faint. Furthermore, the two elements can be separated quickly without using much memory. In addition, even if recognition of the graphic element or the character element fails at the recognition stage, the result is easy to correct in subsequent processing.

[Brief description of the drawings]

FIG. 1 is a flowchart describing an embodiment of the present invention.

FIG. 2 is an explanatory diagram showing a number circle.

FIG. 3 is an explanatory diagram of a number circle with tangent lines drawn.

FIG. 4 is an explanatory diagram of the case where the tangent angle is 90°.

FIG. 5 is an explanatory diagram of character recognition processing.

FIG. 6 is an explanatory diagram of character recognition processing.

FIG. 7 is an explanatory diagram of contour vectors.

Claims (1)

[Claims]

1. A method for separating and recognizing graphic elements and character elements, characterized by: determining continuous contour elements; calculating parallel tangent lines to the contour elements so as to sandwich them; attaching an identifier to each contour element lying within a fixed distance of the tangent lines; detecting the parts with and without the identifier; recognizing the contour elements of the parts with the identifier as graphic elements forming a figure; and recognizing the contour elements of the parts without the identifier as character elements forming characters.
JP9021140A 1997-02-04 1997-02-04 Method for recognizing separation of graphic element from character element Pending JPH10222607A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP9021140A JPH10222607A (en) 1997-02-04 1997-02-04 Method for recognizing separation of graphic element from character element

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP9021140A JPH10222607A (en) 1997-02-04 1997-02-04 Method for recognizing separation of graphic element from character element

Publications (1)

Publication Number Publication Date
JPH10222607A true JPH10222607A (en) 1998-08-21

Family

ID=12046602

Family Applications (1)

Application Number Title Priority Date Filing Date
JP9021140A Pending JPH10222607A (en) 1997-02-04 1997-02-04 Method for recognizing separation of graphic element from character element

Country Status (1)

Country Link
JP (1) JPH10222607A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113901541A (en) * 2021-09-08 2022-01-07 长沙泛一参数信息技术有限公司 Identification method for elevation marking of building and structure diagram

Similar Documents

Publication Publication Date Title
EP1909215B1 (en) Image region detection method, recording medium, and device therefor
US4903311A (en) Character region extracting method and apparatus capable of implementing the method
US5077805A (en) Hybrid feature-based and template matching optical character recognition system
US5048113A (en) Character recognition post-processing method
JPS62254282A (en) Method and apparatus for separating overlapped pattern
JPH10222607A (en) Method for recognizing separation of graphic element from character element
JP2917427B2 (en) Drawing reader
JPH05242294A (en) Drawing reader
JPH0785221A (en) Method for separating and recognizing character and symbol in automatic drawing recognizing device
JP3197441B2 (en) Character recognition device
JP2746345B2 (en) Post-processing method for character recognition
JP2925270B2 (en) Character reader
JP3151866B2 (en) English character recognition method
JP3193573B2 (en) Character recognition device with brackets
JP3541093B2 (en) Document image inclination detection method and apparatus
JP4580520B2 (en) Character recognition method and character recognition apparatus
JPH11126216A (en) Automatic drawing input device
JP3428504B2 (en) Character recognition device
JP2002279344A (en) Character recognition device and method, and recording medium
JP2974396B2 (en) Image processing method and apparatus
JP2004013704A (en) Original direction distinguishing method for character recognition processing
JPH031711B2 (en)
JPH0554071A (en) Digital translation device
JPH0757047A (en) Character segmentation system
JPH0816720A (en) Character recognition device