JPH0324681A - Character recognizing device - Google Patents

Character recognizing device

Info

Publication number
JPH0324681A
JPH0324681A
Authority
JP
Japan
Prior art keywords
character
image data
line
area
projection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP1156737A
Other languages
Japanese (ja)
Other versions
JP2933947B2 (en)
Inventor
Tetsuomi Tanaka
哲臣 田中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to JP1156737A priority Critical patent/JP2933947B2/en
Publication of JPH0324681A publication Critical patent/JPH0324681A/en
Application granted granted Critical
Publication of JP2933947B2 publication Critical patent/JP2933947B2/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Links

Landscapes

  • Character Input (AREA)

Abstract

PURPOSE: To segment a skewed character string by dividing the image data of a character line in the main scanning direction, extracting the regions where characters exist, detecting the projection information of the characters in the sub-scanning direction, cutting out character regions, and executing character recognition processing.

CONSTITUTION: The image data of a character string in a document image obtained from a reader 20 is input through an input unit 2 and stored in a RAM 7. A CPU 5 divides the image data into four equal divisions in the line direction and determines, for each division, which scanning lines contain image data. The image data of character lines other than the target line is removed, and the projection region of each division of the target character line is extracted. The projection of the character line in the height direction is then obtained. The character width within a projection region is determined, a character region of one character is cut out from the height and width, and the character is recognized. This processing is applied to all characters.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application] The present invention relates to a character recognition device and, for example, to a character recognition device that segments characters from a skewed character string.

[Prior Art] Conventionally, in this type of device, for horizontally (or vertically) written text, lines are first segmented by taking the projection in the horizontal (or vertical) direction, and then, for each line, characters are segmented by taking the projection in the vertical (or horizontal) direction.

An example of this is explained with reference to FIGS. 5(a) and 5(b).

First, in these figures, ℓ denotes the line length, h′ the line spacing, x the skew limit angle for the line-direction projection, h the character height, ℓ′ the character spacing, and y the skew limit angle in the character-height direction. With these symbols, x and y satisfy:

x = tan⁻¹(h′/ℓ)   (1)
y = tan⁻¹(ℓ′/h)   (2)

[Problems to be Solved by the Invention] However, in the above conventional example, when a skewed character string is segmented, as shown in FIGS. 5(a) and 5(b), the ratio ℓ:h′ of the line length to the line spacing is very large compared with the ratio h:ℓ′ of the character height to the character spacing. Consequently, the limit angle x tolerated by the line projection is very small compared with the limit angle y of the inter-character projection, and characters can be segmented only from character strings with a small skew in the document.

The present invention has been made in view of the above drawback of the conventional example, and its object is to provide a character recognition device capable of segmenting characters even from a character string with a large skew in a document.

[Means for Solving the Problems] In order to solve the above problems and achieve the object, a character recognition device according to the present invention comprises: input means for inputting image data including character line information of at least one line; dividing means for dividing the image data input by the input means in the main scanning direction; extracting means for extracting, for each piece of image data divided by the dividing means, a region in which the character line information exists; detecting means for detecting projection information of characters in the sub-scanning direction in each region extracted for each division by the extracting means; character cutting means for cutting out a character region based on the projection information of the characters detected by the detecting means; and recognizing means for performing character recognition processing on each character region cut out by the character cutting means.

[Operation] With this configuration, the input means inputs image data including character line information of at least one line; the dividing means divides the image data input by the input means in the main scanning direction; the extracting means extracts, for each piece of divided image data, a region in which the character line information exists; the detecting means detects projection information of the characters in the sub-scanning direction in each region extracted for each division; the character cutting means cuts out a character region based on the detected projection information of the characters; and the recognizing means performs character recognition processing on each cut-out character region.

[Embodiments] Preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram illustrating the configuration of the character recognition device of this embodiment. In the figure, 1 denotes the character recognition device of this embodiment, 20 denotes a reading device that sends image data optically read from a document to the device 1, and 30 denotes a display device that receives and displays the recognition results produced by the device 1. The recognition results may be output to a personal computer instead of the display device 30.

Within the device 1, 2 denotes an input unit that receives dot data from the reading device 20, and 8 denotes an output unit that outputs the recognition results, i.e., data, to the display device 30. 3 denotes an operation unit for issuing instructions such as reading a document and displaying the character recognition results. 4 denotes a character recognition unit that performs character recognition on the regions cut out from the read image data, and 5 denotes a CPU that controls the overall operation of the device 1 using the various programs in a ROM 6. The ROM 6 stores a control program, an error-handling program, the program of the flowchart shown in FIG. 2, and the like. 7 denotes a RAM used as a work area for these programs and as a temporary save area during error handling. Finally, 9 denotes a bus line that carries data, address signals, and control signals within the device 1.

Next, the operation of this embodiment will be described.

FIG. 2 is a flowchart illustrating the character recognition operation of this embodiment, and FIG. 3 is a diagram illustrating the process of cutting out the region of one character.

First, the image data of the document image read by the reading device 20 is input through the input unit 2 and stored in the RAM 7 (step S1). Then, as shown in the first stage (I) of FIG. 3, the image data is divided in the line direction into four equal divisions A to D (step S2). In FIG. 3, the middle character line is the line targeted for character segmentation.

To confirm where the image data of the target character line exists, the scanning lines that subdivide the sub-scanning direction (numbered 1 to 12 in the figure) are examined, and the scanning-line numbers at which image data exists are determined for each division, as in the second stage (II) of FIG. 3. In this way, as shown in the second stage (II) of FIG. 3, the image data of the target character line in each of the four divisions is confirmed to lie between the 6th and 12th lines in division A, the 4th and 10th lines in division B, the 3rd and 10th lines in division C, and the 2nd and 8th lines in division D.
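The per-division scan-line check of stage (II) can be sketched as follows. Here `bitmap` is assumed to be a row-major list of 0/1 pixel rows; the function and its name are illustrative, not taken from the specification:

```python
def line_extents(bitmap, n_divisions=4):
    """For each vertical division of a line image, return the (first, last)
    scanning-line (row) indices that contain any black pixel, or None if
    the division is empty."""
    height, width = len(bitmap), len(bitmap[0])
    step = width // n_divisions
    extents = []
    for d in range(n_divisions):
        x0 = d * step
        x1 = width if d == n_divisions - 1 else (d + 1) * step
        rows = [y for y in range(height)
                if any(bitmap[y][x] for x in range(x0, x1))]
        extents.append((rows[0], rows[-1]) if rows else None)
    return extents

# Toy 12-row image of a line rising from lower left to upper right,
# mimicking the skewed target line of FIG. 3 (pixel positions invented).
page = [[0] * 8 for _ in range(12)]
for row, col in [(6, 0), (7, 1), (4, 2), (5, 3), (3, 4), (4, 5), (2, 6), (3, 7)]:
    page[row][col] = 1
```

Run on the toy page, the extents descend division by division, just as the 6-12 / 4-10 / 3-10 / 2-8 ranges do in the embodiment.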

Thereafter, the image data of character lines other than the target line is removed, and, as shown in the third stage (III) of FIG. 3, the target character line is extracted as the projection regions of the respective divisions, i.e., regions A, B, C, and D. First, region A is extracted (step S4), and, as shown in the third stage (III) of FIG. 3, the projection of the character line in the height direction is obtained (step S5). The processing of steps S4 and S5 is repeated for regions A to D (step S6).

Next, characters are cut out. For example, as shown in the fourth stage (IV) of FIG. 3, in the case of a character such as 「じ」 that straddles regions A and B, the projection S1 belonging to region A and the projection S2 belonging to region B are contiguous, so the character regions C1 and C2 of the image data corresponding to the respective projections are combined. From the combined region, the character height h and the character width W are determined, and the rectangular region formed by the height h and the width W is cut out as one character (step S7). Character recognition is then performed on the cut-out character (step S8), and the recognition result is sent to the display device 30 (step S9). In this way, character segmentation from the current target line is repeated through steps S7 to S9 (step S10). Furthermore, the processing of steps S3 to S10 is repeated until character recognition of all character strings in the document image has been completed (step S11).

As described above, according to this embodiment, characters can be reliably segmented even from a skewed character string.
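The combination of contiguous projections across a division boundary, as in the 「じ」 example with S1 and S2, might look like the following sketch. The merging rule — a run ending exactly at the boundary column joined to one starting there — is an assumption made for illustration:

```python
def merge_adjacent_runs(runs_a, runs_b, boundary):
    """Merge character-span runs from two adjacent divisions: a run in
    division A that ends exactly at the boundary column is joined with a
    run in division B that starts there (one character straddling both),
    as with projections S1 and S2 in the example above."""
    merged, leftovers = [], list(runs_b)
    for a0, a1 in runs_a:
        partner = next(((b0, b1) for (b0, b1) in leftovers
                        if a1 == boundary and b0 == boundary), None)
        if partner is not None:
            leftovers.remove(partner)
            merged.append((a0, partner[1]))  # combined region C1 + C2
        else:
            merged.append((a0, a1))
    merged.extend(leftovers)
    return sorted(merged)
```

The width W of each merged span, together with the height range found earlier, gives the rectangle cut out in step S7.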

In the embodiment described above, characters are cut out by obtaining projections in the line direction (main scanning direction) and the height direction (sub-scanning direction). However, the present invention is not limited to this; as a modification of the above embodiment, a method may be used in which the optimum skew of the character string is determined and characters are cut out accordingly, which is effective particularly when the skew of the character string is large. This modification is described below.

FIG. 4 is a diagram illustrating the modification.

In this modification, as shown in FIG. 4, the image data of the read document image is divided into four equal divisions A to D in the same manner as in the above embodiment, and once the character string to be recognized has been determined by the line-direction projection, the skew x of the character string is calculated. Since the method of calculating the skew x is a well-known technique, its description is omitted.

Then, based on the skew x, the height-direction projections of the characters in regions A to D are taken as shown in FIG. 4. As in the embodiment described above, for contiguous projections, one rectangular region is determined from the character regions corresponding to the respective projections, and that rectangular region is cut out and subjected to character recognition.
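One way to take a height-direction projection along the calculated skew x is to shear each row before accumulating the column counts. The following is a sketch under that assumption — the patent does not specify how the skew-aligned projection is implemented:

```python
import math

def sheared_projection(bitmap, skew_deg):
    """Height-direction projection taken along an estimated skew: every
    row is shifted horizontally by round(row * tan(skew)) before the
    column counts are accumulated, so the strokes of a skewed character
    line up in the same projection bins."""
    height, width = len(bitmap), len(bitmap[0])
    t = math.tan(math.radians(skew_deg))
    proj = [0] * width
    for y, row in enumerate(bitmap):
        shift = round(y * t)  # compensate the skew of this scanning line
        for x, px in enumerate(row):
            if px and 0 <= x + shift < width:
                proj[x + shift] += 1
    return proj

# A single vertical stroke skewed by 45 degrees (invented test pattern):
stroke = [[0] * 8 for _ in range(4)]
for y in range(4):
    stroke[y][3 - y] = 1
```

After shearing, all four pixels of the skewed stroke fall into one projection bin, so adjacent characters no longer overlap in the projection.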

According to this modification, characters can be accurately segmented even when the character string is skewed so much that the height-direction projections of adjacent characters overlap.

In this modification, if the character strings in all regions have a constant skew, there is no need to determine the skew for each character string; characters may be cut out from all character strings using the skew calculated for the character string of the first line.

[Effects of the Invention] As described above, according to the present invention, characters can be reliably segmented even from a skewed character string.

[Brief Description of the Drawings]

FIG. 1 is a block diagram illustrating the configuration of the character recognition device of this embodiment; FIG. 2 is a flowchart illustrating the character recognition operation of this embodiment; FIG. 3 is a diagram illustrating the process of cutting out the region of one character; FIG. 4 is a diagram illustrating the modification; and FIGS. 5(a) and 5(b) are diagrams illustrating a conventional character segmentation method.

In the figures: 1, character recognition device; 2, input unit; 3, operation unit; 4, character recognition unit; 5, CPU; 6, ROM; 7, RAM; 8, output unit; 20, reading device; 30, display device.

Claims (2)

[Claims]

(1) A character recognition device comprising: input means for inputting image data including character line information of at least one line; dividing means for dividing the image data input by the input means in the main scanning direction; extracting means for extracting, for each piece of image data divided by the dividing means, a region in which the character line information exists; detecting means for detecting projection information of characters in the sub-scanning direction in each region extracted by the extracting means; character cutting means for cutting out a character region based on the projection information of the characters detected by the detecting means; and recognizing means for performing character recognition processing on each character region cut out by the character cutting means.
(2) A character recognition device comprising: input means for inputting image data including character line information of at least one line; dividing means for dividing the image data input by the input means substantially in the main scanning direction; extracting means for extracting, for each piece of image data divided by the dividing means, a region in which the character line information exists; skew detecting means for detecting the skew of the character line with respect to the main scanning direction based on the regions extracted by the extracting means; projection detecting means for detecting projection information of characters in the regions of the respective divisions based on the skew detected by the skew detecting means; character cutting means for cutting out a character region based on the projection information of the characters detected by the projection detecting means; and recognizing means for performing character recognition processing on each character region cut out by the character cutting means.
JP1156737A 1989-06-21 1989-06-21 Image processing method and apparatus Expired - Fee Related JP2933947B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP1156737A JP2933947B2 (en) 1989-06-21 1989-06-21 Image processing method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP1156737A JP2933947B2 (en) 1989-06-21 1989-06-21 Image processing method and apparatus

Publications (2)

Publication Number Publication Date
JPH0324681A true JPH0324681A (en) 1991-02-01
JP2933947B2 JP2933947B2 (en) 1999-08-16

Family

ID=15634219

Family Applications (1)

Application Number Title Priority Date Filing Date
JP1156737A Expired - Fee Related JP2933947B2 (en) 1989-06-21 1989-06-21 Image processing method and apparatus

Country Status (1)

Country Link
JP (1) JP2933947B2 (en)

Also Published As

Publication number Publication date
JP2933947B2 (en) 1999-08-16

Similar Documents

Publication Publication Date Title
US5613016A (en) Area discrimination system for text image
JPH05242292A (en) Separating method
JPH05233873A (en) Area dividing method
JPH07105312A (en) Method and device for eliminating dirt from character image in optical character reader
JPH0950527A (en) Frame extracting device and rectangle extracting device
JP2890306B2 (en) Table space separation apparatus and table space separation method
JPH0410087A (en) Base line extracting method
JPH0324681A (en) Character recognizing device
JP3187895B2 (en) Character area extraction method
JP3019897B2 (en) Line segmentation method
JPH07230525A (en) Method for recognizing ruled line and method for processing table
JPH0679348B2 (en) Line cutting method
JPH117493A (en) Character recognition processor
JPH0728933A (en) Character recognition device
JPH10507014A (en) Automatic determination of landscape scan in binary image
JP2954218B2 (en) Image processing method and apparatus
JPH0573718A (en) Area attribute identifying system
JP2962525B2 (en) Text block recognition method
JPH05266250A (en) Character string detector
JPH08339424A (en) Device and method for image processing
JPH05210759A (en) Character recognizing device
JPH0934992A (en) On-line handwritten character string segmenting device
JPH05114047A (en) Device for segmenting character
JPH0433074B2 (en)
JPH05274472A (en) Image recognizing device

Legal Events

Date Code Title Description
LAPS Cancellation because of no payment of annual fees