JPH0212390A - Character-string area extracting device - Google Patents

Character-string area extracting device

Info

Publication number
JPH0212390A
Authority
JP
Japan
Prior art keywords
character
character string
string
area
characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP63161680A
Other languages
Japanese (ja)
Inventor
Kinji Hashimoto
橋本 欽司
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd filed Critical Matsushita Electric Industrial Co Ltd
Priority to JP63161680A priority Critical patent/JPH0212390A/en
Publication of JPH0212390A publication Critical patent/JPH0212390A/en
Pending legal-status Critical Current

Landscapes

  • Character Input (AREA)

Abstract

PURPOSE: To extract character string regions correctly even when horizontal and vertical character strings are written adjacent to each other, by using information on the valid range of the character string associated with each symbol or each connection line and on the types of character string that may be written there. CONSTITUTION: The device comprises a character string region candidate extraction means 1, a character recognition means 2, a character connection relationship extraction means 3, a character information storage unit 4, a character string information extraction means 5, a character string information storage unit 6, and a character string region extraction means 7. Using the information on the valid range and the permitted string types for each symbol or each connection line, the device determines character string regions alternately, in the order horizontal then vertical or vertical then horizontal, so that a character contained in two character string regions at once is reliably assigned to a single character string region. As a result, character string regions can be extracted correctly even when horizontal and vertical character strings are written adjacent to each other.

Description

DETAILED DESCRIPTION OF THE INVENTION

Industrial Field of Application

The present invention relates to a character string region extraction device that extracts character string regions from an image, such as a drawing, that contains symbols, connection lines (line segments representing the connection relationships between symbols), and character strings written in the horizontal and vertical directions, and that follows the rule that the types of character string written within a character string's valid range (a predetermined range located near a symbol or connection line) are restricted for each symbol or connection line.

Prior Art

Conventional methods for extracting character string regions from an image, such as a drawing, that contains character strings written in the horizontal and vertical directions include the following:

(1) A method that sets a priority direction for extraction, first extracts character string regions in that priority direction, and then extracts character string regions in the other direction from the remaining characters.
(2) A method that extracts character string region candidates in both the horizontal and vertical directions and, for each character contained in candidates of both directions, decides which character string region the character belongs to from the results of recognizing it in both directions.

FIG. 3 is a block diagram of the processing in the first conventional example, and FIG. 4 is a pattern diagram illustrating its flow. Only the case where the priority direction is horizontal is shown, but the case where it is vertical is analogous. FIG. 5 is a block diagram of the processing in the second conventional example, and FIG. 6 is a pattern diagram illustrating its flow. In FIG. 6, the characters "C" and "F" are at first included in both a horizontal and a vertical character string region; each is then assigned to the character string region of the direction whose recognition result has the higher confidence.
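
Method (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `extract_with_priority`, the representation of a candidate as a list of character names, and the sample layout are all hypothetical and are not taken from FIG. 3 or FIG. 4.

```python
def extract_with_priority(priority_candidates, other_candidates):
    """Conventional method (1), sketched: accept every candidate in the
    priority direction, then build strings in the other direction from
    whatever characters are left over."""
    taken = set()
    strings = []
    for chars in priority_candidates:
        strings.append(list(chars))
        taken.update(chars)
    for chars in other_candidates:
        rest = [c for c in chars if c not in taken]
        if len(rest) > 1:  # a lone leftover character is not treated as a string
            strings.append(rest)
    return strings


# Hypothetical layout: "a" really belongs to the vertical string a-d-e,
# but it also sits next to the horizontal string b-c.
horizontal = [["a", "b", "c"]]
vertical = [["a", "d", "e"]]
result = extract_with_priority(horizontal, vertical)
```

With horizontal priority, the shared character `a` always goes to the horizontal string, and the vertical string is left with only `d` and `e`, whichever string the character actually belongs to.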

Problems to Be Solved by the Invention

In the first conventional example, however, character string regions cannot be extracted correctly when horizontal and vertical character strings are written close to each other.

The second conventional example, in turn, produces extraction errors in cases such as when, for a character contained in both a horizontal and a vertical character string region, the candidate characters obtained in the two directions show no difference in confidence.

In view of these problems, the present invention provides a character string region extraction device that uses, for each symbol or each connection line, information on the valid range of the character string and on the types of character string that may be written there, and that can thereby extract character string regions correctly even when horizontal and vertical character strings are written close to each other.

Means for Solving the Problems

To solve the above problems, the character string region extraction device of the present invention comprises a character string region candidate extraction means, a character recognition means, a character connection relationship extraction means, a character information storage unit, a character string information storage unit, and a character string region extraction means.
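
As a rough illustration of the first of these means, candidates can be formed by chaining character bounding boxes that follow one another along the chosen direction with a small enough gap. The sketch below is an assumption-laden example, not the device's actual procedure: the `Box` type, the `group_candidates` function, the gap threshold, and the coordinates are all made up, and the labels simply borrow the character region names C1 to C9 of FIG. 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box of one character region (hypothetical representation)."""
    name: str
    x: int
    y: int
    w: int
    h: int


def span(box, axis):
    """(start, end) of the box along axis 0 (horizontal) or 1 (vertical)."""
    return (box.x, box.x + box.w) if axis == 0 else (box.y, box.y + box.h)


def group_candidates(boxes, direction, gap_threshold):
    """Chain boxes along `direction` ("h" or "v") when the gap to the previous
    box is at most gap_threshold and the two overlap on the other axis."""
    main = 0 if direction == "h" else 1
    cross = 1 - main
    chains = []
    for box in sorted(boxes, key=lambda b: span(b, main)[0]):
        for chain in chains:
            last = chain[-1]
            gap = span(box, main)[0] - span(last, main)[1]
            a0, a1 = span(last, cross)
            b0, b1 = span(box, cross)
            if 0 <= gap <= gap_threshold and a0 < b1 and b0 < a1:
                chain.append(box)
                break
        else:
            chains.append([box])
    # A single isolated character is not reported as a string candidate.
    return [[b.name for b in chain] for chain in chains if len(chain) > 1]


# Made-up coordinates: a horizontal row C1..C4, a column C1, C8, C9, C5,
# and a second horizontal row C5..C7. Every box is 10 x 10.
layout = {"C1": (0, 0), "C2": (12, 0), "C3": (24, 0), "C4": (36, 0),
          "C8": (0, 12), "C9": (0, 24),
          "C5": (0, 36), "C6": (12, 36), "C7": (24, 36)}
boxes = [Box(n, x, y, 10, 10) for n, (x, y) in layout.items()]
horizontal = group_candidates(boxes, "h", gap_threshold=3)
vertical = group_candidates(boxes, "v", gap_threshold=3)
```

In this layout the corner characters C1 and C5 end up in both a horizontal and a vertical candidate, which is exactly the overlap the remaining means have to resolve.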

Operation

With the above configuration, the present invention uses, for each symbol or each connection line, the information on the valid range of the character string and on the types of character string that may be written there, and determines character string regions alternately, in the order horizontal then vertical or vertical then horizontal. A character that is contained in two character string regions at once can thus be reliably assigned to a single character string region.

Embodiment

A character string region extraction device according to one embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram of the processing of the character string region extraction device in this embodiment. The input is an image, such as a drawing, that contains symbols, connection lines (line segments representing the connection relationships between symbols), and character strings written in the horizontal and vertical directions, and that follows the rule that the types of character string written within a character string's valid range (a predetermined range located near a symbol or connection line) are restricted for each symbol or connection line. The components shown in FIG. 1 are:

  • 1: character string region candidate extraction means, which extracts from the image, for each of the horizontal and vertical directions, sequences of characters whose regions are adjacent within a certain threshold
  • 2: character recognition means, which recognizes the characters in the extracted character string region candidates
  • 3: character connection relationship extraction means, which extracts the connection relationships between characters
  • 4: character information storage unit, which stores the character recognition results and the connection relationships between characters
  • 5: character string information extraction means, which uses the type and position information of each symbol or each connection line to extract the valid range of the character string and the types of character string that may be written
  • 6: character string information storage unit, which stores the character string information
  • 7: character string region extraction means, which uses the character information and the character string information to determine character string regions alternately, in the order horizontal then vertical or vertical then horizontal

In FIG. 1, solid lines indicate the flow of control and broken lines indicate the flow of data.

FIG. 2 is a pattern diagram illustrating the flow of character string region extraction in this embodiment. In FIG. 2, R1, R2, and R3 are symbols; SC1, SC2, and SC3 are character string region candidates; C1 to C9 are the character regions that make up the character strings; A1, A2, and A3 are the valid ranges of the character strings for R1, R2, and R3, respectively; and S1, S2, and S3 are the extracted character string regions. The operation is described below with reference to FIGS. 1 and 2.

First, the character string region candidate extraction means 1 extracts the candidates SC1 and SC2 by extracting sequences of characters whose regions are adjacent in the horizontal direction, and then extracts the candidate SC3 by the same method in the vertical direction. Next, the character recognition means 2 recognizes the characters in each candidate in the direction of that candidate and stores the recognition results in the character information storage unit 4. The character connection relationship extraction means 3 uses the position information of the recognized characters to extract the connection relationships between characters and stores that information in the character information storage unit 4. Here, candidate SC1 contains character regions C1, C2, C3, and C4; candidate SC2 contains character regions C5, C6, and C7; and candidate SC3 contains character regions C1, C8, C9, and C5.

The character string information extraction means 5 then uses each symbol or each connection line and its position information to extract the valid range of the character string and the types of character string that may be written. Here, the valid ranges A1, A2, and A3 and the permitted type "digit string + unit" are extracted. This character string information is stored in the character string information storage unit 6.

Finally, the character string region extraction means 7 extracts the character string regions using the character information and the character string information. In the first horizontal pass, the candidates under consideration are SC1 and SC2. SC1 lies within the valid range A3 for R3, but the recognition result for its first character C1 is not a digit, so it does not match the permitted type and is not extracted as a character string region. SC2, on the other hand, lies within the valid range A1 for R1, and its type, "digit string (10) + unit (Ω)", is a permitted one. The character string region S1 containing character regions C5, C6, and C7 is therefore extracted.
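
The test applied to each candidate in this pass, namely whether it lies inside a valid range and whether its recognized text is of the permitted type, might be sketched as follows. The names `inside` and `matching_symbol`, the `(x0, y0, x1, y1)` box format, the coordinates, the regular-expression encoding of "digit string + unit", and the `?` used as a placeholder for the non-digit recognition result of C1 are all assumptions made for illustration.

```python
import re


def inside(box, area):
    """True if box (x0, y0, x1, y1) lies entirely within area."""
    return (area[0] <= box[0] and area[1] <= box[1]
            and box[2] <= area[2] and box[3] <= area[3])


def matching_symbol(candidate_box, recognized_text, string_info):
    """Return the symbol whose valid range contains the candidate and whose
    permitted string type matches the recognized text, or None."""
    for symbol, (area, pattern) in string_info.items():
        if inside(candidate_box, area) and re.fullmatch(pattern, recognized_text):
            return symbol
    return None


# Assumed encoding of the permitted type "digit string + unit".
DIGITS_PLUS_UNIT = r"[0-9]+Ω"

# Made-up valid ranges for the symbols R1 and R3.
string_info = {
    "R1": ((0, 40, 40, 60), DIGITS_PLUS_UNIT),
    "R3": ((0, 0, 50, 20), DIGITS_PLUS_UNIT),
}

sc1 = ((2, 5, 46, 15), "?30Ω")   # first character not recognized as a digit
sc2 = ((2, 45, 34, 55), "10Ω")

sc1_owner = matching_symbol(*sc1, string_info)
sc2_owner = matching_symbol(*sc2, string_info)
```

Under these assumptions SC1 is rejected although it lies inside A3, because its text is not of the permitted type, while SC2 is accepted for R1.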

Character region C5, which had been included in character string region candidate SC3, is now removed from it, so SC3 consists of character regions C1, C8, and C9. In the first vertical pass, the candidate under consideration is SC3. SC3 lies within the valid range A2 for R2, and its type, "digit string (20) + unit (Ω)", is a permitted one.
5 will be omitted, and SC3 will consist of character areas CI, CB, and CB. In the first extraction of a character string area in the vertical direction, SC3 is a target character string area candidate. SC3 is written in the effective area person 2 of the character string related to R2, and the type of character string that can be written is "number string (20) in thousands (Ω)".

The character string region S2 containing character regions C1, C8, and C9 is therefore extracted. Character region C1, which had been included in character string region candidate SC1, is removed from it, so SC1 consists of character regions C2, C3, and C4. In the second horizontal pass, the candidate under consideration is SC1. SC1 lies within the valid range A3 for R3, and its type, "digit string (30) + unit (Ω)", is a permitted one. The character string region S3 containing character regions C2, C3, and C4 is therefore extracted. In this way, the character string regions S1, S2, and S3 are extracted.
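
The alternating passes of this walkthrough can be reproduced with a short Python sketch. It is a simplified illustration rather than the device itself: the valid-range test is left out and assumed to pass for every candidate, the permitted type is encoded as an assumed regular expression, the per-direction recognition results are invented to match the narrative (`?` stands for any non-digit result), and the function name `extract_alternating` is hypothetical.

```python
import re

PERMITTED = re.compile(r"[0-9]+Ω")  # assumed encoding of "digit string + unit"

# Invented recognition results per character region and reading direction.
recognized = {
    ("C1", "h"): "?", ("C2", "h"): "3", ("C3", "h"): "0", ("C4", "h"): "Ω",
    ("C5", "h"): "1", ("C6", "h"): "0", ("C7", "h"): "Ω",
    ("C1", "v"): "2", ("C8", "v"): "0", ("C9", "v"): "Ω", ("C5", "v"): "?",
}

# Candidates of FIG. 2: direction and member character regions.
candidates = {
    "SC1": ("h", ["C1", "C2", "C3", "C4"]),
    "SC2": ("h", ["C5", "C6", "C7"]),
    "SC3": ("v", ["C1", "C8", "C9", "C5"]),
}


def extract_alternating(candidates, recognized, first="h"):
    """Alternate horizontal and vertical passes. In each pass, accept the
    candidates of that direction whose text is of the permitted type, then
    drop their characters from every candidate that is still pending."""
    remaining = {n: (d, list(chars)) for n, (d, chars) in candidates.items()}
    extracted = []
    direction, idle_passes = first, 0
    while remaining and idle_passes < 2:  # stop once neither direction progresses
        accepted = [
            name for name, (d, chars) in remaining.items()
            if d == direction
            and PERMITTED.fullmatch("".join(recognized[(c, d)] for c in chars))
        ]
        used = set()
        for name in accepted:
            _, chars = remaining.pop(name)
            extracted.append((name, chars))
            used.update(chars)
        for _, chars in remaining.values():
            chars[:] = [c for c in chars if c not in used]
        idle_passes = 0 if accepted else idle_passes + 1
        direction = "v" if direction == "h" else "h"
    return extracted


regions = extract_alternating(candidates, recognized)
```

With this data the three passes accept SC2, then SC3 without C5, then SC1 without C1, which corresponds to the regions S1, S2, and S3 of the embodiment. The loop stops after two consecutive passes without an accepted candidate, a termination rule added here for the sketch.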

Effects of the Invention

As described above, by providing a character string region candidate extraction means, a character recognition means, a character connection relationship extraction means, a character information storage unit, a character string information storage unit, and a character string region extraction means, the present invention uses the information on the valid range of the character string and on the permitted types of character string for each symbol or each connection line, and can thereby extract character string regions correctly, even when horizontal and vertical character strings are written close to each other, from an image, such as a drawing, that contains symbols, connection lines (line segments representing the connection relationships between symbols), and character strings written in the horizontal and vertical directions, and that follows the rule that the types of character string written within a character string's valid range (a predetermined range located near a symbol or connection line) are restricted for each symbol or connection line.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing the overall configuration of the character string region extraction device in one embodiment of the present invention. FIG. 2 is a pattern diagram illustrating the flow of character string region extraction in one embodiment of the present invention. FIG. 3 is a block diagram of the processing in the first conventional example. FIG. 4 is a pattern diagram illustrating the flow of the first conventional example. FIG. 5 is a block diagram of the processing in the second conventional example. FIG. 6 is a pattern diagram illustrating the flow of the second conventional example.

  • 1: character string region candidate extraction means
  • 2: character recognition means
  • 3: character connection relationship extraction means
  • 4: character information storage unit
  • 5: character string information extraction means
  • 6: character string information storage unit
  • 7: character string region extraction means

Claims (1)

A character string region extraction device comprising: a character string region candidate extraction means for extracting, for each of the horizontal and vertical directions, sequences of characters whose regions are adjacent, from an image, such as a drawing, that contains symbols, connection lines (line segments representing the connection relationships between the symbols), and character strings written in the horizontal and vertical directions, and that follows the rule that the types of character string written within a character string's valid range (a predetermined range located near the symbol or the connection line) are restricted for each symbol or connection line; a character recognition means for recognizing the characters in the extracted character string region candidates; a character connection relationship extraction means for extracting the connection relationships between characters; a character information storage unit for storing the results of the character recognition and the connection relationships between the characters; a character string information storage unit in which the valid range of the character string and the types of character string that may be written are stored for each symbol or each connection line; and a character string region extraction means for determining character string regions alternately, in the order horizontal then vertical or vertical then horizontal, using the character information and the character string information; whereby a character contained in both a horizontal and a vertical character string region candidate can be assigned to a single character string region.
JP63161680A 1988-06-29 1988-06-29 Character-string area extracting device Pending JPH0212390A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP63161680A JPH0212390A (en) 1988-06-29 1988-06-29 Character-string area extracting device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP63161680A JPH0212390A (en) 1988-06-29 1988-06-29 Character-string area extracting device

Publications (1)

Publication Number Publication Date
JPH0212390A true JPH0212390A (en) 1990-01-17

Family

ID=15739802

Family Applications (1)

Application Number Title Priority Date Filing Date
JP63161680A Pending JPH0212390A (en) 1988-06-29 1988-06-29 Character-string area extracting device

Country Status (1)

Country Link
JP (1) JPH0212390A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2358195A (en) * 2000-01-13 2001-07-18 Atofina Electrolytic synthesis of tetramethylammonium hydroxide

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS593593A (en) * 1982-06-30 1984-01-10 Fujitsu Ltd Separating system of character data
JPS62134767A (en) * 1985-12-06 1987-06-17 Fujitsu Ltd Automatic extracting device for symbol name and segment name

Similar Documents

Publication Publication Date Title
JPS63182793A (en) Character segmenting system
JPH0212390A (en) Character-string area extracting device
JPH0520794B2 (en)
JPH1063744A (en) Method and system for analyzing layout of document
JPH0247788B2 (en)
JP2618018B2 (en) Character recognition device
Hwang et al. Segmentation of a text printed in Korean and English using structure information and character recognizers
JP2856409B2 (en) Character recognition apparatus and method
JPH11203405A (en) Character recognition device, its method and program recording medium
JPH0586585B2 (en)
JP2976445B2 (en) Character recognition device
JPH0217575A (en) Drawing automatic recognizing system
JP3151866B2 (en) English character recognition method
JPH04199274A (en) Filing system
JPS6389990A (en) Character reading system
JPS61131091A (en) Character reader
JP2004280530A (en) System and method for processing form
JPH04115384A (en) Japanese ocr having word checking function
JPH0353392A (en) Character recognizing device
JPH03116392A (en) Pattern recognition postprocessing system
JP2870640B2 (en) Figure recognition method
JPH06203201A (en) Method and device for recognizing optical handwritten character
JP2925270B2 (en) Character reader
JPS61163477A (en) Character recognition device
JPS62236088A (en) Recognizing device of character with sonant mark and p-sonant mark