JPH0863545A

JPH0863545A - Method for deciding direction of character and line in character recognition processor

Info

Publication number: JPH0863545A
Application number: JP6201149A
Authority: JP
Inventors: Shiori Ooaku; 志緒理大阿久
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1994-08-25
Filing date: 1994-08-25
Publication date: 1996-03-08

Abstract

PURPOSE: To accurately decide the character direction and line direction that form the linguistically most natural word string, by generating character lattices and judging the character and line directions from the constituent word rate. CONSTITUTION: As the result of recognition processing in a character recognition processing part 4, a character lattice, i.e. a matrix of multiple recognition candidate characters, is generated for the characters constituting each circumscribed line. A language processing part 6 switches its processing according to whether an information storing part 5 holds character direction and line direction information for the target area. A character direction and line direction judgement part 7 then judges the linguistically most probable character and line directions from the language information that the language processing part 6 extracts for the character string of each circumscribed line. In short, character recognition is executed for every character direction of all the circumscribed lines of a specified area, language processing is applied to the generated character lattices, and the character and line directions of the specified area are decided from the constituent word rate of the lattices.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a method for determining the character direction and the line direction, in units of the character areas to be recognized, in a character recognition processing device (OCR) that inputs a document original as image data and performs character recognition processing.

[0002]

[Prior Art] Conventionally, when character recognition is performed on documents such as newspapers and magazines in which vertically and horizontally written characters are mixed, it has been difficult for the character recognition processing device to accurately identify the character direction and line direction of each area to be processed. It has therefore been common, before character recognition, to display the input image on a display device and have the operator specify the character direction and line direction of the target area in advance, or to have the operator correct erroneous recognition results.

[0003] To reduce the operator's burden of specification and correction work, a method is sometimes adopted that estimates the character direction and line direction from area information such as the size and aspect ratio of each area in the document. However, determining the character direction from area information alone leaves many exceptions and has not achieved sufficient accuracy.

[0004] Japanese Laid-Open Patent Publication No. 4-312162 describes a technique that creates both a vertically written and a horizontally written character string for each area separated by ruled lines, performs morphological analysis on each string, and, from the segmentation results, judges the string with fewer unknown words in the character line to be the correct one. A method that estimates the line direction from linguistic information in this way is presumed to be more accurate than one that judges from area information alone.

[0005] However, the method of the above publication presupposes areas separated by ruled lines in an image whose top-bottom orientation was input correctly, so problems may arise when judging the line direction of a document image that is not divided by ruled lines. Moreover, it gives no consideration to determining the character direction.

[0006] Furthermore, even when a character string is generated with the wrong character direction and line direction, morphological analysis often leaves few substrings judged to be unknown words, so erroneous judgments can occur.

[0007]

[Problem to Be Solved by the Invention] In view of these circumstances, an object of the present invention is to provide a character direction and line direction determination method for a character recognition processing device that reduces the operator's burden of specification and correction work, and that determines the character direction and line direction of a specified area with high accuracy from the language information obtained from the Japanese character area specified as the processing target.

[0008]

[Means for Solving the Problem] To achieve the above object, the invention according to claim 1 is configured so that, in a character recognition processing device that specifies a character area in an input image, cuts out the character lines in the specified area, and performs character recognition processing on the character image data of the specified area, character recognition processing is performed for each character direction of all the circumscribed lines of the specified area, language processing is applied to the generated character lattices, and the character direction and line direction of the specified area are determined based on the constituent word rate of the character lattices.

[0009] The invention according to claim 2 is configured so that, in a character recognition processing device that specifies a character area in an input image, cuts out the character lines in the specified area, and performs character recognition processing on the character image data of the specified area, character recognition processing is performed for each character direction of all the circumscribed lines of the specified area, language processing is applied to the generated character lattices, and the character direction and line direction of the specified area are determined based on the independent word content rate of the character lattices.

[0010]

[Operation] According to the present invention, the circumscribed lines of the character area specified as the processing target are extracted, and candidate character directions are generated for each circumscribed line. Recognition processing is performed for each combination of circumscribed line and character direction to generate a character lattice of recognition candidate characters, and language processing is performed on each character lattice to determine a character string and extract its language information. Based on the constituent word rate or the independent word content rate of each character string, the linguistically most plausible character direction and line direction are adopted as those of the processing area.

[0011]

[Embodiment] An embodiment of the present invention is described in detail below with reference to the drawings. FIG. 1 is a block diagram in which the present invention is applied to a character recognition device. The document to be processed is input as an image, and an area identification unit 1 divides the input image into rectangular areas, distinguishing character areas from other areas such as photograph areas and table areas.

[0012] This area identification is performed, for example, by the method described in Japanese Patent Application Laid-Open No. 5-81475 by the present applicant: black runs (connected components of black pixels) are extracted from the continuity of black pixels in the input document image, character lines are extracted by a compression process that merges adjacent black runs, and the extracted character lines are in turn merged to distinguish character areas from other areas. Instead of such processing means, the area to be processed may also be designated with an area designation means such as a mouse.
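The idea of extracting black runs and merging adjacent ones can be illustrated on a single scanline. The following Python sketch is only an illustration under stated assumptions: it is not the procedure of JP 5-81475, the function names `black_runs` and `merge_runs` are hypothetical, and the `max_gap` threshold is an invented parameter.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # (start, end) column indices, end exclusive


def black_runs(row: List[int]) -> List[Run]:
    """Return the maximal runs of black pixels (value 1) in one scanline."""
    runs: List[Run] = []
    start = None
    for x, pixel in enumerate(row):
        if pixel and start is None:
            start = x                      # a new run begins
        elif not pixel and start is not None:
            runs.append((start, x))        # the current run ends
            start = None
    if start is not None:                  # run reaches the end of the row
        runs.append((start, len(row)))
    return runs


def merge_runs(runs: List[Run], max_gap: int) -> List[Run]:
    """Merge neighbouring runs whose gap is at most max_gap (assumed threshold)."""
    merged: List[Run] = []
    for start, end in runs:
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged
```

Applying the same merge along the other axis would, under the same assumptions, group runs into line-shaped blocks.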

[0013] The circumscribed line extraction unit 2 takes the area to be processed, which the area identification unit 1 has identified as a rectangular character area, and extracts the four circumscribed lines lying at the outermost positions of that character area: the top line, the bottom line, the rightmost line, and the leftmost line, as shown in FIG. 2. This extraction uses the character line extraction results obtained by the extraction and merging of black runs in the area identification unit 1.

[0014] The character direction generation unit 3 generates, for each of the four circumscribed lines extracted by the circumscribed line extraction unit 2 (top, bottom, rightmost, and leftmost), four candidate character directions: up, down, right, and left, as shown in FIG. 3.

[0015] The character recognition processing unit 4 first refers to the information storage unit 5 to check the initial values for the character direction and line direction of the area to be processed, that is, whether character direction and line direction information for that area exists. If the information storage unit 5 holds no such information, character recognition by matching against a character recognition dictionary is performed on each circumscribed line extracted by the circumscribed line extraction unit 2, in each of the four directions (up, down, right, left) generated by the character direction generation unit 3. In the matching process of the present invention, multiple candidate characters are output for each character contained in each circumscribed line.

[0016] As the result of the recognition processing in the character recognition processing unit 4, a character lattice, a matrix of multiple recognition candidate characters, is generated for the characters forming each circumscribed line. A character lattice is the matrix of candidates obtained when a character string is recognized: the several recognition candidates held for each character position are linked in a grid, and the lattice is not yet segmented into words.
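A character lattice as described above can be pictured as one column of candidates per character position. The Python sketch below is a minimal illustration, assuming each candidate carries a matching score; the `Candidate` class, the score field, and the sample values are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    char: str
    score: float  # hypothetical matching score; higher means a better match


# One inner list per character position; no word boundaries are stored.
CharLattice = List[List[Candidate]]


def first_choice_string(lattice: CharLattice) -> str:
    """Join the best-scoring candidate of every position (no language processing)."""
    return "".join(max(column, key=lambda c: c.score).char for column in lattice)


lattice: CharLattice = [
    [Candidate("日", 0.90), Candidate("目", 0.70)],
    [Candidate("本", 0.80), Candidate("木", 0.75)],
    [Candidate("語", 0.60), Candidate("話", 0.55)],
]
print(first_choice_string(lattice))
```

Language processing would later choose among the candidates in each column instead of simply taking the top score.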

[0017] Because the character recognition processing unit 4 recognizes the characters of each circumscribed line of the specified area in all four directions (up, down, left, right) and thus generates 16 character lattices in total, the discrimination accuracy when determining the character direction and line direction is improved.
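The count of 16 follows from pairing each of the four circumscribed lines with each of the four character directions. A short Python sketch of that enumeration, with label strings chosen here only for illustration:

```python
from itertools import product

# Labels are illustrative; the patent names the lines and directions only in prose.
CIRCUMSCRIBED_LINES = ("top", "bottom", "rightmost", "leftmost")
CHARACTER_DIRECTIONS = ("up", "down", "right", "left")


def recognition_jobs():
    """Every (circumscribed line, character direction) pair to be recognized."""
    return list(product(CIRCUMSCRIBED_LINES, CHARACTER_DIRECTIONS))


jobs = recognition_jobs()
print(len(jobs))  # 4 lines x 4 directions = 16
```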

[0018] In the language processing unit 6, processing is switched according to whether the information storage unit 5 holds character direction and line direction information for the target area. If it holds none, the 16 character lattices generated by the character recognition processing unit 4 are subjected to language processing by matching against a predetermined language dictionary containing word information and part-of-speech information. This processing selects, from the character lattice (a sequence of recognition candidates), the candidates that form a linguistically correct character string. As its result, the character string length, the number of words forming the string, the breakdown of the parts of speech of the words, and the written length of each word are extracted as the language information of the character lattice.
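The four items of language information listed above can be gathered in a small record. The Python sketch below assumes language processing has already produced a list of (surface, part of speech) pairs; the `LanguageInfo` class, the `summarize` function, and the English part-of-speech labels are hypothetical names used for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple

Word = Tuple[str, str]  # (surface form, part of speech)


@dataclass
class LanguageInfo:
    string_length: int             # character string length
    word_count: int                # number of words forming the string
    pos_breakdown: Dict[str, int]  # how many words of each part of speech
    word_lengths: List[int]        # written length of each word


def summarize(words: List[Word]) -> LanguageInfo:
    """Collect the language information for one segmented character string."""
    lengths = [len(surface) for surface, _ in words]
    breakdown = dict(Counter(pos for _, pos in words))
    return LanguageInfo(sum(lengths), len(words), breakdown, lengths)


info = summarize([("文字", "noun"), ("を", "particle"),
                  ("認識", "noun"), ("する", "verb")])
print(info)
```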

[0019] The character direction / line direction determination unit 7 judges the linguistically most probable character direction and line direction from the language information extracted by the language processing unit 6 for the character string of each circumscribed line. FIG. 4 shows a flowchart of this determination process.

[0020] <Process 1> First, the constituent word rate is calculated for the character strings of all the circumscribed lines from the language information obtained from the language processing unit. The constituent word rate is the quotient of the number of words forming the character string divided by the character string length. The character direction and line direction of the character lattice with the smallest value are judged to be the correct ones for the target area. That is, the character direction and line direction in which the characters of the circumscribed-line string are most fully assembled into words, so that the fewest words are needed per character, are judged to be linguistically valid.
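Process 1 amounts to computing a ratio per lattice and keeping the minimum. The Python sketch below is a minimal illustration under stated assumptions: the dictionary keys, the function names, and the sample counts are invented, and exact fractions are used so that ties are detected reliably.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (circumscribed line, character direction); labels assumed


def constituent_word_rate(word_count: int, string_length: int) -> Fraction:
    """Number of words forming the string divided by the string length."""
    if string_length <= 0:
        raise ValueError("string_length must be positive")
    return Fraction(word_count, string_length)


def lowest_rate_keys(counts: Dict[Key, Tuple[int, int]]) -> List[Key]:
    """Return every key whose constituent word rate equals the minimum.

    counts maps a key to (word_count, string_length). More than one key is
    returned when the minimum is shared, which is the case Process 2 handles.
    """
    rates = {key: constituent_word_rate(w, n) for key, (w, n) in counts.items()}
    lowest = min(rates.values())
    return [key for key, rate in rates.items() if rate == lowest]


sample = {
    ("top", "up"): (4, 10),      # 4 words over 10 characters
    ("top", "right"): (8, 10),
    ("leftmost", "up"): (9, 10),
}
print(lowest_rate_keys(sample))
```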

[0021] <Process 2> If several character strings share the minimum constituent word rate, the content rate of independent words with a written length of 1 is calculated for those strings. The independent word content rate is the quotient of the number of independent words whose written length is 1 divided by the character string length. The character direction and line direction of the character lattice with the smallest value are judged to be the correct ones for the target area. That is, the character direction and line direction that minimize the number of linguistically isolated one-character independent words are judged to be linguistically valid.
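Applying Process 2 only to the strings tied in Process 1 is equivalent to taking a lexicographic minimum over the pair (constituent word rate, independent word content rate). The Python sketch below shows that equivalence; the `Stats` record, the key labels, and the sample numbers are assumptions made for illustration.

```python
from fractions import Fraction
from typing import Dict, List, NamedTuple, Tuple

Key = Tuple[str, str]  # (circumscribed line, character direction); labels assumed


class Stats(NamedTuple):
    word_count: int               # words forming the string
    single_char_independent: int  # independent words of written length 1
    string_length: int            # characters in the string


def rank_key(s: Stats) -> Tuple[Fraction, Fraction]:
    """(constituent word rate, independent word content rate), compared in order."""
    return (Fraction(s.word_count, s.string_length),
            Fraction(s.single_char_independent, s.string_length))


def select(stats: Dict[Key, Stats]) -> List[Key]:
    """Keys that survive Process 1 followed by Process 2; may still hold ties."""
    best = min(rank_key(s) for s in stats.values())
    return [key for key, s in stats.items() if rank_key(s) == best]


sample = {
    ("top", "up"): Stats(4, 1, 10),
    ("bottom", "up"): Stats(4, 3, 10),    # same word rate, more isolated words
    ("top", "right"): Stats(8, 0, 10),    # loses in Process 1 already
}
print(select(sample))
```

If `select` still returns more than one key, neither rate separates the candidates.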

[0022] One-character independent words are extracted by looking up the part-of-speech information of each word, obtained by applying language processing to the character lattice, in the independent word part-of-speech table 8. The table 8 lists parts of speech that are highly likely to be independent words meaningful on their own, such as Sino-Japanese nouns, native Japanese nouns, and proper nouns. Independent words with a written length of 1 are extracted using the part-of-speech information and word length produced by language processing for each word.
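The table lookup can be sketched as a set membership test combined with a length check. In the Python sketch below the three table entries are the parts of speech named in the text; the English labels, the function name, and the sample words are assumptions, and the real table 8 may contain further entries.

```python
from typing import FrozenSet, Iterable, Tuple

Word = Tuple[str, str]  # (surface form, part of speech)

# Entries named in the description; the actual table may list more.
INDEPENDENT_POS: FrozenSet[str] = frozenset(
    {"Sino-Japanese noun", "native Japanese noun", "proper noun"}
)


def count_single_char_independent(
    words: Iterable[Word], table: FrozenSet[str] = INDEPENDENT_POS
) -> int:
    """Count words of written length 1 whose part of speech is in the table."""
    return sum(1 for surface, pos in words if len(surface) == 1 and pos in table)


words = [
    ("山", "native Japanese noun"),  # one character, independent: counted
    ("は", "particle"),              # one character, not in the table
    ("東京", "proper noun"),         # independent, but two characters
    ("本", "Sino-Japanese noun"),    # one character, independent: counted
]
print(count_single_char_independent(words))
```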

[0023] When the document to be recognized is Japanese text mixing kanji and hiragana, many of the kanji character codes obtained as the result of language processing tend to be recognized as one-character native Japanese nouns or proper nouns, and many hiragana tend to be identified as particles. If a character string is constructed with the wrong character direction, the number of one-character independent words becomes larger than in the correct string, and the number of constituent words also tends to exceed the number of words forming the correct string. The present invention exploits this characteristic to select the linguistically most probable character string.

[0024] The character direction and line direction determined by the character direction / line direction determination unit 7 are held in the information storage unit 5 and the previous area information storage unit 9, and are input to the line cutout unit 10 and the character cutout unit 11. In accordance with the line direction and character direction stored in the information storage unit 5, the line cutout unit 10 and the character cutout unit 11 cut out the character lines and the characters of the entire character area to be processed and output the results to the character recognition processing unit 4. The character recognition processing unit 4 performs character recognition by matching against the character recognition dictionary on the entire character area to generate character lattices; the language processing unit 6 selects the linguistically most probable character string from the lattices; and the recognition result for the target area is then output from an output device such as a CRT or printer.

[0025] When the information storage unit 5 holds character direction and line direction information for the target area, the character recognition processing unit 4 and the language processing unit 6 perform character recognition and language processing on the entire target area according to the stored character direction and line direction, and output the resulting character string as the recognition result. After the recognition result of the target area has been output, the information storage unit 5 is initialized, and the same processing is repeated for the next area to be processed.

[0026] When the character area to be processed contains characters of a size or in a font outside the specification, several candidates may still remain after <Process 2> has been executed. In that case the character direction / line direction determination unit 7 refers to the previous area information storage unit 9, judges that the area currently being processed has the same character direction and line direction as the area processed immediately before, stores this information in the information storage unit 5, and proceeds to the line cutout unit and the character cutout unit. It is desirable to set an initial value suited to the usage conditions in the previous area information storage unit 9.
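The fallback to the previously processed area can be sketched as follows. This Python sketch is illustrative only: the `PreviousAreaStore` class, the `decide` function, and the direction labels are hypothetical, and the initial value stands for the usage-dependent default mentioned above.

```python
from typing import List, Tuple

Direction = Tuple[str, str]  # (character direction, line direction); labels assumed


class PreviousAreaStore:
    """Holds the directions decided for the most recently processed area."""

    def __init__(self, initial: Direction) -> None:
        self.value = initial  # usage-dependent initial value


def decide(candidates: List[Direction], store: PreviousAreaStore) -> Direction:
    """Use the single surviving candidate, else reuse the previous area's result."""
    chosen = candidates[0] if len(candidates) == 1 else store.value
    store.value = chosen  # the decision becomes the "previous area" for the next one
    return chosen


store = PreviousAreaStore(("up", "horizontal"))
first = decide([("up", "vertical")], store)                         # unambiguous
second = decide([("up", "vertical"), ("down", "vertical")], store)  # still tied
print(first, second)
```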

[0027] In this embodiment the independent word part-of-speech table 8 is provided separately, but the language dictionary of the language processing unit 6 may instead be configured to include the information on independent-word parts of speech. In that case, the output of the language processing unit 6 includes the independent-word part-of-speech determination.

[0028]

[Effects of the Invention] In the invention of claim 1, character lattices are generated for every character direction and line direction that is a candidate for the correct answer on the circumscribed lines of the target area, language processing is performed, and the character direction and line direction are judged from the constituent word rate. This makes it possible to determine with high accuracy the character direction and line direction that form the linguistically most natural word string.

[0029] In the invention of claim 2, character lattices are likewise generated for every candidate character direction and line direction on the circumscribed lines of the target area, language processing is performed, and the character direction and line direction are judged from the independent word content rate. Because the language information is thereby specified more strictly, the character direction and line direction can be judged with high accuracy.

[0030] Furthermore, because the present invention executes the character direction and line direction determination with only the circumscribed lines of the target area as the object of language processing, the character direction and line direction of the target area can be determined with high accuracy from a small processing region. Processing time is thereby shortened, which makes the method particularly suitable as preprocessing for character recognition.

[Brief description of drawings]

FIG. 1 is a block diagram in which the present invention is applied to a character recognition device.

FIG. 2 is a diagram illustrating the circumscribed line extraction process of the present invention.

FIG. 3 is a diagram illustrating character direction generation in the present invention.

FIG. 4 is a flowchart of the character direction / line direction determination process of the present invention.

FIG. 5 is a diagram showing an example of the independent word part-of-speech table.

[Explanation of symbols]

1: area identification unit
2: circumscribed line extraction unit
3: character direction generation unit
4: character recognition processing unit
5: information storage unit
6: language processing unit
7: character direction / line direction determination unit
8: independent word part-of-speech table
9: previous area information storage unit
10: line cutout unit
11: character cutout unit

Claims


1. A method for determining a character direction and a line direction in a character recognition processing device that specifies a character area in an input image, cuts out character lines in the specified area, and performs character recognition processing on character image data of the specified area, the method characterized in that character recognition processing is performed for each character direction of all circumscribed lines of the specified area, language processing is applied to the generated character lattices, and the character direction and line direction of the specified area are determined based on the constituent word rate of the character lattices.

2. A method for determining a character direction and a line direction in a character recognition processing device that specifies a character area in an input image, cuts out character lines in the specified area, and performs character recognition processing on character image data of the specified area, the method characterized in that character recognition processing is performed for each character direction of all circumscribed lines of the specified area, language processing is applied to the generated character lattices, and the character direction and line direction of the specified area are determined based on the independent word content rate of the character lattices.