JPH0944605A - Document image analyzer

Info

Publication number: JPH0944605A
Application number: JP7212806A
Authority: JP
Inventors: Kiyoshi Tashiro; 潔田代
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1995-07-28
Filing date: 1995-07-28
Publication date: 1997-02-14
Anticipated expiration: 2015-07-28
Also published as: JP3546553B2

Abstract

PROBLEM TO BE SOLVED: To separate character portions from other portions precisely. Connected black pixel blocks are extracted and recognized, and character codes are obtained as their recognition candidates. The shapes and positions of the already-recognized black pixel blocks are then used together with the word-connection information held in a grammar dictionary or similar resource. SOLUTION: A circumscribing rectangle generating means 101 finds the rectangle circumscribing each connected black pixel block in an image, and a character recognizing means 102-1 assigns it a certainty factor. A separated character candidate extracting means 104 and a separated character integrating means 105 estimate and merge separated characters, and each rectangle merged as a separated character is confirmed as a character rectangle. Next, a character string estimating means 107 estimates character strings as sets of character rectangles. A character block estimating means 108 then estimates character blocks, and a non-character rectangle estimating means 106 estimates non-character rectangles as sets of rectangles. While any unconfirmed rectangle remains, it is estimated again. When none remains, a character block order estimating means 111 outputs the estimated document structure.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention: The present invention relates to a document image analysis device for document images having various layouts.

[0002]

2. Description of the Related Art: Several conventional methods analyze the layout of a document image. One of them extracts character lines and character blocks by exploiting various image features. It is described, for example, in "Trends in Document Image Processing Technology", Journal of the Institute of Image Electronics Engineers of Japan, Vol. 17, No. 5, pp. 258 to 266. A second method is described in the same publication, pp. 267 to 277, "Understanding Document Images Using a Format Definition Language". It stores knowledge about the layout of the specific target documents in advance and applies that knowledge to the input image.

[0003] The former method can handle a wide variety of documents. However, when the spacing between character lines or between columns is narrow, it may estimate character lines or character blocks incorrectly. The character recognition performed afterwards may then fail to read the characters in the correct order. In addition, characters located inside figures either cannot be passed to character recognition at all, or they require an additional step such as character/graphics separation.

[0004] The latter method extracts character lines and character blocks more accurately. It can also extract logical structure such as the title portion. However, the documents it can handle are limited to the types covered by the knowledge stored in advance.

[0005] FIG. 21 is a flowchart showing the operation of these conventional document image analysis devices.
(S210) Obtain the circumscribing rectangle of every connected black pixel block in the image.
(S211) Estimate character strings, each as a set of one or more circumscribing rectangles.
(S212) Estimate character blocks, each as a set of one or more character strings.
(S213) If code information for each character is required, perform character recognition on the character strings or character blocks.
(S214) Output the estimated document structure and end the process.

[0006] In both of these methods, as in other conventional document image analysis methods, character recognition runs only after structures such as character blocks have been extracted. Structure extraction can therefore use only information obtained from the image itself. It cannot use information about whether each partial image is plausible as a character image. Nor can it use information derived from the content of the text, such as the semantic connections within a sentence.

[0007] Likewise, when these techniques serve as preprocessing for a character recognition device, an operator typically has to specify the extent of each character block and the order of the blocks. There have been attempts to extract the character blocks to be recognized automatically using the techniques above. These attempts have not reached practical use because of accuracy problems.

[0008]

SUMMARY OF THE INVENTION: An object of the present invention is to provide a document image analysis device that works on document images having various layouts. The device accurately separates character portions from other portions, and accurately extracts structures such as character lines and character blocks. It also accurately estimates the order of the character blocks.

[0009] Another object of the present invention is to provide a character recognition function that recognizes the characters in a document image and outputs their character codes in the correct order, without instructions from an operator.

[0010]

Means for Solving the Problems: The document image analysis device of the present invention comprises the following means:
(1) pixel block extracting means for extracting connected black pixel blocks from an image;
(2) first character recognizing means for performing character recognition on each connected black pixel block extracted by the pixel block extracting means and outputting at least one character code;
(3) a grammar dictionary holding words and connection information between words; and
(4) character string estimating means for estimating the direction of a character string. It does so on the basis of the character codes determined by the character recognizing means and the words and inter-word connection information held in the grammar dictionary.

[0011] The device may further comprise:
(5) separated character candidate extracting means for extracting separated character candidates;
(2') second character recognizing means for performing character recognition on the separated character candidates and determining at least one character code; and
(6) separated character integrating means for determining separated characters from among the separated character candidates.

[0012] The device may further comprise:
(7) contact character candidate extracting means for extracting contact (touching) character candidates;
(8) contact character dividing means for determining contact characters from among the contact character candidates and dividing each of them into a plurality of images; and
(2'') third character recognizing means for performing character recognition on the images obtained by the contact character dividing means and outputting at least one character code.

[0013] In addition to the above configuration, the document image analysis device of the present invention further comprises (9) character block estimating means for estimating character blocks. The estimate is based on three inputs: the character strings estimated by the character string estimating means, the character codes output by the character recognizing means, and the words and inter-word connection information held in the grammar dictionary.

[0014] In addition to the above configuration, the device further comprises (10) non-character estimating means for estimating non-character rectangles, that is, rectangles that are not characters. The estimate is based on the positions and shapes of the connected black pixel blocks, the character codes output by the character recognizing means, and the words and inter-word connection information held in the grammar dictionary.

[0015] In addition to the configuration including (10), the device further comprises the following means:
(11) character block continuity estimating means for estimating the order of the character blocks, based on:
- the character blocks estimated by the character block estimating means;
- the character strings contained in those blocks;
- the character codes output by the character recognizing means; and
- the words and inter-word connection information held in the grammar dictionary.
(12) character code output means for outputting the character codes contained in the character blocks in that order, using:
- the character blocks estimated by the character block estimating means;
- the character strings they contain;
- the character codes output by the character recognizing means; and
- the block order estimated by the character block continuity estimating means.

[0016] In addition to the configuration including (10), the device may comprise (13) first logical structure analyzing means. This means assigns a logical structure to each character block and outputs it. It uses the character blocks estimated by the character block estimating means, the character strings contained in each block, the character codes output by the character recognizing means for those strings, and the information in the grammar dictionary.

[0017] In addition to the configuration including (10), the device may comprise (13') second logical structure analyzing means. This means assigns a logical structure to each non-character rectangle and outputs it. It uses:
- the character blocks, estimated by the character block estimating means, that lie near the non-character rectangle;
- the character strings contained in those blocks;
- the character codes output by the character recognizing means for those strings; and
- the information in the grammar dictionary.

[0018]

Operation: Connected black pixel blocks are extracted from the input document image. Each black pixel block is then recognized, and at least one character code is obtained as its recognition candidate. Next, character strings are estimated. The estimate uses the shapes and positions of the black pixel blocks that have already been recognized, the character codes for those blocks, and the words and inter-word connection information held in the grammar dictionary.

[0019] Character blocks are then estimated on the basis of the estimated character strings, the character codes, and the words and inter-word connection information held in the grammar dictionary.

[0020] The same information also allows non-character rectangles to be estimated accurately: the estimated character strings, the character codes, and the words and inter-word connection information in the grammar dictionary.

[0021] Furthermore, the character recognition results can be used to determine the order of the character blocks and to analyze the logical structure of the document image.

[0022] In this way, character recognition is performed before the character strings, character blocks, block order, and logical structure are estimated. The recognition results are therefore available when each of these is estimated.

[0023] Separated characters and contact characters are also recognized, which makes character recognition more accurate. This in turn makes the following estimates more accurate: non-character rectangles, character strings, character blocks, block order, and logical structure.

[0024]

DESCRIPTION OF THE PREFERRED EMBODIMENTS: FIG. 1 shows the configuration of a document image analysis device according to an embodiment of the present invention. The device comprises:
- circumscribing rectangle generating means 101, which obtains the circumscribing rectangle of each connected black pixel block in the input image;
- first character recognizing means 102-1, which performs character recognition on the circumscribing rectangles obtained by the circumscribing rectangle generating means 101;
- second character recognizing means 102-2, which performs character recognition on the separated character candidate images obtained by the separated character candidate extracting means described below;
- third character recognizing means 102-3, which performs character recognition on the images obtained by the contact character dividing means described below;
- a grammar dictionary 103 holding words and inter-word connection information;
- separated character candidate extracting means 104;
- separated character integrating means 105;
- non-character rectangle estimating means 106;
- character string estimating means 107;
- character block estimating means 108;
- contact character candidate extracting means 109;
- contact character dividing means 110;
- character block order estimating means 111;
- character code output means 112; and
- logical structure analyzing means 113.

Means 104 to 113 can be combined, connected, or omitted as the application requires. The character recognizing means is described here as three separate parts. All three perform the same processing and differ only in their input, so they may of course be implemented as a single means or a single software module. Hereafter, 102-1, 102-2, and 102-3 are all referred to simply as the character recognizing means, omitting "first", "second", and "third".

[0025] FIG. 2 shows an example of the structure of the grammar dictionary 103.
- Table 201 is an independent word table that stores independent words together with their parts of speech.
- Table 202 is an adjunct word table that stores adjunct words together with their pre-connection and post-connection.
- Table 203 is a conjugation table that stores, for each conjugation type, the conjugation endings together with their post-connection.
- Table 204 is a connection table that stores the connection score between a post-connection and a pre-connection.

In the table, 0-, 1-, and so on denote the headword numbers for pre-connections, and -0, -1, and so on denote the headword numbers for post-connections. For example, the connection score for 0-1 is 100. A high connection score indicates that the two headwords are likely to connect.

In the following description, such tables are used in two ways. They are used to detect words in a sequence of character codes by consulting the independent word table. They are also used to detect phrases (bunsetsu) and similar units by morphological analysis. Conventional morphological analysis techniques can be used, for example those described on pages 140 to 142 of "Basics of Natural Language Analysis" (Sangyo Tosho Co., Ltd.).

To detect a phrase, for example, take two adjacent headwords. The post-connection of the front word and the pre-connection of the rear word are obtained from the independent word table, the adjunct word table, and the conjugation table. The connection table is then consulted with this post-connection and pre-connection to check whether the two headwords can connect. If the connection score is nonzero, the two headwords are regarded as belonging to the same phrase.
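The phrase check in the last paragraph can be sketched in Python as follows. This is a minimal illustration under stated assumptions. The dictionary entries, connection numbers, and function names are hypothetical, and the conjugation table 203 is left out. Keying the connection table by (post-connection of the front word, pre-connection of the rear word) is also an assumption, not the layout of FIG. 2.

```python
from typing import Dict, Optional, Tuple

# Hypothetical miniature grammar dictionary (contents are illustrative only).
# Independent word table: word -> (part of speech, post-connection number).
INDEPENDENT_WORDS: Dict[str, Tuple[str, int]] = {"ベクトル": ("noun", 0)}
# Adjunct word table: word -> (pre-connection number, post-connection number).
ADJUNCT_WORDS: Dict[str, Tuple[int, int]] = {"化": (1, 2)}
# Connection table: (post-connection of front word, pre-connection of rear word) -> score.
CONNECTION_SCORES: Dict[Tuple[int, int], int] = {(0, 1): 100}


def post_connection(word: str) -> Optional[int]:
    """Look up the post-connection number of a headword, if it is in the dictionary."""
    if word in INDEPENDENT_WORDS:
        return INDEPENDENT_WORDS[word][1]
    if word in ADJUNCT_WORDS:
        return ADJUNCT_WORDS[word][1]
    return None


def pre_connection(word: str) -> Optional[int]:
    """Look up the pre-connection number; only adjunct words carry one in this sketch."""
    if word in ADJUNCT_WORDS:
        return ADJUNCT_WORDS[word][0]
    return None


def in_same_phrase(front: str, rear: str) -> bool:
    """Two adjacent headwords belong to one phrase if their connection score is nonzero."""
    post, pre = post_connection(front), pre_connection(rear)
    if post is None or pre is None:
        return False
    return CONNECTION_SCORES.get((post, pre), 0) != 0
```

Only a nonzero score is tested here, matching the rule in the text. A fuller version might also rank competing analyses by the score value.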

[0026] FIG. 3 is a flowchart showing the operation of an embodiment of the document image analysis device according to the present invention.

[0027] (S30) The circumscribing rectangle generating means 101 obtains the circumscribing rectangle of every connected black pixel block in the image.

[0028] (S31) For each generated circumscribing rectangle, the character recognizing means 102-1 obtains character codes and a certainty factor. The certainty factor indicates how closely the image matches a standard pattern, so it can be used to estimate whether the recognized image is a character image. Rectangles that were not rejected, or whose certainty factor exceeds a predetermined value, are confirmed as character rectangles. The obtained character codes and certainty factor are stored in association with each rectangle. A concrete example of this processing is described later with reference to FIGS. 4, 5, and 6.

[0029] (S32) The separated character candidate extracting means 104 and the separated character integrating means 105 estimate and integrate separated characters. Each rectangle integrated as a separated character is confirmed as a character rectangle. This step is described in detail later with reference to FIG. 7.

[0030] (S33) The contact character candidate extracting means 109 and the contact character dividing means 110 estimate and divide contact characters. Each rectangle obtained by dividing a contact character is confirmed as a character rectangle. This step is described in detail later with reference to FIGS. 9 and 10.

[0031] (S34) The character string estimating means 107 estimates character strings, each as a set of one or more character rectangles. An example of a detailed flowchart is shown in FIG. 12, and the specific processing is described later with reference to FIGS. 11 and 13.

[0032] (S35) The character block estimating means 108 estimates character blocks, each as a set of one or more character strings. An example of a detailed flowchart is shown in FIG. 15, which is also used for the detailed description later.

[0033] (S36) The non-character rectangle estimating means 106 estimates non-character rectangles, each as a set of one or more rectangles that are not character rectangles.

[0034] (S37) Check whether any rectangle remains that has been confirmed neither as a character rectangle nor as a non-character rectangle.

[0035] (S38) If unconfirmed rectangles remain, they are re-estimated. This is done either by the separated character candidate extracting means 104 and the separated character integrating means 105, or by the contact character candidate extracting means 109 and the contact character dividing means 110. This time the estimation uses the positional relationships with the character strings, character blocks, and non-character rectangles estimated so far. The conditions for confirming a character rectangle are also changed. As a result, some rectangles that were not confirmed as character rectangles in the previous pass may now be confirmed. A detailed example of this re-estimation is described later with reference to FIG. 19. Processing then repeats from the character string estimation in (S34), now including the newly confirmed character rectangles.

[0036] (S39) If no unconfirmed rectangle remains, output means such as the character block order estimating means 111, the character code output means 112, and the logical structure analyzing means 113 output the required character codes and the estimated document structure. Processing then ends.
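The control flow of (S30) to (S39), in particular the re-estimation loop, can be summarized in a short Python sketch. The callables and their signatures are hypothetical placeholders for the estimating means, and string and block estimation (S34, S35) are reduced to a comment. Two rules are assumptions added so that the loop always terminates: the `max_passes` limit and stopping when a pass makes no progress.

```python
from typing import Callable, Set, Tuple

# Hypothetical signature: confirm(undetermined, characters, pass_no) -> newly confirmed rectangles.
Confirm = Callable[[Set[int], Set[int], int], Set[int]]


def analyze(rects: Set[int], confirm_characters: Confirm,
            confirm_non_characters: Confirm,
            max_passes: int = 5) -> Tuple[Set[int], Set[int], Set[int]]:
    """Repeat confirmation until every rectangle is classified (S37) or no progress is made."""
    characters: Set[int] = set()
    non_characters: Set[int] = set()
    for pass_no in range(max_passes):
        undetermined = rects - characters - non_characters
        if not undetermined:  # S37: nothing left, go on to output (S39)
            break
        before = len(characters) + len(non_characters)
        # S31-S33 on the first pass, S38 afterwards; pass_no lets the callee relax its conditions.
        characters |= confirm_characters(undetermined, characters, pass_no)
        # S34/S35 (strings and blocks) would be re-estimated from `characters` here.
        # S36: non-character estimation over what is still undetermined.
        undetermined = rects - characters - non_characters
        non_characters |= confirm_non_characters(undetermined, characters, pass_no)
        if len(characters) + len(non_characters) == before:
            break  # no progress in this pass; stop instead of looping
    return characters, non_characters, rects - characters - non_characters
```

The third returned set holds any rectangles that are still undetermined when the loop stops.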

[0037] FIG. 4 shows an example of the circumscribing rectangles obtained by the circumscribing rectangle generating means 101 in (S30) of FIG. 3. Reference numeral 401 denotes part of a document image, and 411 to 420 denote the generated circumscribing rectangles. Various methods have been proposed for detecting connected black pixel blocks in an image and obtaining their circumscribing rectangles, and any of them may be used. For example, one technique for detecting connected black pixel blocks is described on pages 115 to 116 of "Image Engineering" (Corona Publishing, first edition, first printing, August 25, 1989).
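As one example of the many possible methods, the sketch below finds 8-connected black pixel blocks with a breadth-first flood fill and records each block's circumscribing rectangle. It assumes a binary image given as a list of rows, with 1 for black. The choice of 8-connectivity and the function name are illustrative assumptions, not taken from the cited reference.

```python
from collections import deque
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def circumscribing_rectangles(image: List[List[int]]) -> List[Rect]:
    """Return the bounding box of every 8-connected block of black (1) pixels."""
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    rects: List[Rect] = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited black pixel.
            seen[y][x] = True
            queue = deque([(x, y)])
            left, top, right, bottom = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                # Visit all 8 neighbours.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            rects.append((left, top, right, bottom))
    return rects
```

Rectangles are returned in the order their first pixel is met in a row-major scan.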

[0038] FIG. 5 shows an example of the configuration of the character recognizing means 102-1, 102-2, and 102-3. An image input to the character recognizing means goes through four units:
1. The image normalization unit 501 normalizes its size.
2. The feature extraction unit 502 extracts its features.
3. The distance calculation unit 503 calculates the distance between those features and the standard features of each character.
4. The result output unit 504 outputs the character codes of at least one character, sorted in ascending order of distance, together with a certainty factor.

For example, the reciprocal of the smallest distance can be used as the certainty factor.
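A toy version of the FIG. 5 pipeline in Python, under heavy simplifying assumptions:
- nearest-neighbour resampling stands in for the image normalization unit 501;
- the flattened bitmap stands in for the features of unit 502;
- Euclidean distance (`math.dist`, Python 3.8+) is used in unit 503;
- the certainty factor is the reciprocal of the smallest distance, as suggested in the text.

The function names and the 8 x 8 size are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Bitmap = List[List[int]]


def normalize(bitmap: Bitmap, size: int = 8) -> Bitmap:
    """Resample a bitmap to size x size by nearest neighbour (stand-in for unit 501)."""
    h, w = len(bitmap), len(bitmap[0])
    return [[bitmap[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def extract_features(bitmap: Bitmap) -> List[float]:
    """Toy feature vector: the flattened normalized bitmap (stand-in for unit 502)."""
    return [float(v) for row in bitmap for v in row]


def recognize(bitmap: Bitmap, templates: Dict[str, List[float]],
              top_k: int = 3) -> Tuple[List[str], float]:
    """Return up to top_k codes in ascending distance order, plus a certainty factor."""
    features = extract_features(normalize(bitmap))
    # Distance to each character's standard features (stand-in for unit 503).
    ranked = sorted((math.dist(features, ref), code) for code, ref in templates.items())
    # Unit 504: certainty = reciprocal of the smallest distance (exact match -> infinity).
    best = ranked[0][0]
    certainty = math.inf if best == 0 else 1.0 / best
    return [code for _, code in ranked[:top_k]], certainty
```

The templates must be built with the same `size` as the one passed to `normalize`, so that feature vectors have equal length.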

[0039] FIG. 6 shows an example of the character codes and certainty factors that the character recognizing means 102-1 obtains for each circumscribing rectangle produced by the circumscribing rectangle generating means 101. A certainty factor smaller than a predetermined value indicates that the image in the rectangle does not resemble any character. The image can then be inferred to be either part of a separated character or a non-character image such as a figure, so such rectangles are rejected. The remaining rectangles, whose certainty factors exceed the predetermined value, are confirmed as character rectangles having the obtained character codes.
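The accept/reject rule can be sketched as below. The rectangle IDs, codes, and certainty values used with it are made up. The text only covers the strictly smaller and strictly larger cases, so treating a certainty exactly equal to the threshold as a rejection is an assumption.

```python
from typing import Dict, List, Tuple


def classify_by_certainty(results: Dict[int, Tuple[str, float]],
                          threshold: float) -> Tuple[Dict[int, str], List[int]]:
    """Split recognition results into confirmed character rectangles and rejected ones."""
    confirmed: Dict[int, str] = {}
    rejected: List[int] = []
    for rect_id, (code, certainty) in sorted(results.items()):
        if certainty > threshold:
            confirmed[rect_id] = code  # character rectangle with its recognized code
        else:
            # Possibly part of a separated character, or a non-character image such as a figure.
            rejected.append(rect_id)
    return confirmed, rejected
```

For example, with a threshold of 100, a rectangle with certainty 250 is confirmed, while rectangles at 40 or exactly 100 are rejected and left for the later steps.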

[0040] FIG. 7 shows an example of candidate extraction in (S32). The separated character candidate extracting means 104 extracts candidates that may form an image corresponding to one character when several adjacent circumscribing rectangles are integrated. In this example, a separated character candidate is extracted when the rectangle obtained by integrating two or more rectangles with certainty factors below 100 meets both of these conditions:
- it is no more than three times the width and three times the height of a nearby rectangle; and
- it either does not overlap any other circumscribing rectangle or is completely contained in one.

In the example of FIG. 6, the rectangle obtained by integrating circumscribing rectangles 413 and 414 satisfies the size condition. However, it overlaps rectangle 412, so it does not become a separated character candidate. The overlap condition avoids generating unnecessary separated character candidates, based on the assumption that a character rectangle does not overlap other character rectangles or non-character rectangles. A character may nevertheless be completely contained in a non-character rectangle, as with characters inside a table. Integrated rectangles completely contained in another circumscribing rectangle are therefore still extracted as separated character candidates. When processing document images in which character rectangles may overlap one another, such as English italic text, the non-overlap condition may be dropped.

As shown in FIG. 7, separated character candidate 711 is obtained from circumscribing rectangles 412, 413, and 414 in FIG. 6. Similarly, candidate 712 is obtained from 417 and 418, 713 from 418 and 419, and 714 from 419 and 420.
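The candidate conditions of this paragraph can be sketched in Python as follows, with these assumptions:
- Text runs horizontally, so only runs of up to `max_group` rectangles that are adjacent in left-to-right order are tried.
- A single reference width and height (`ref_w`, `ref_h`) stands in for "a nearby rectangle".
- The overlap test is read strictly. A merged rectangle that partially overlaps any other rectangle is rejected, even if it also lies inside a third one.

All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def union(rects: Sequence[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def overlaps(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def separated_candidates(rects: Sequence[Rect], certainty: Sequence[float],
                         ref_w: int, ref_h: int, threshold: float = 100.0,
                         max_group: int = 3) -> List[Tuple[int, ...]]:
    """Index groups of adjacent low-certainty rectangles that may form one character."""
    order = sorted(range(len(rects)), key=lambda i: rects[i][0])  # assumes horizontal text
    candidates: List[Tuple[int, ...]] = []
    for start in range(len(order)):
        for size in range(2, max_group + 1):
            group = order[start:start + size]
            if len(group) < size:
                break
            if any(certainty[i] >= threshold for i in group):
                continue  # every member must have certainty below the threshold
            box = union([rects[i] for i in group])
            if box[2] - box[0] + 1 > 3 * ref_w or box[3] - box[1] + 1 > 3 * ref_h:
                continue  # larger than 3x the reference width or height
            others = [rects[j] for j in range(len(rects)) if j not in group]
            if any(overlaps(box, o) and not contains(o, box) for o in others):
                continue  # overlaps another rectangle without lying fully inside it
            candidates.append(tuple(sorted(group)))
    return candidates
```

Each candidate is returned as a tuple of indices into `rects`, which can then be merged with `union` and passed to recognition.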

[0041] Reference numeral 701 shows an example of the character codes and certainty factors obtained by the character recognizing means 102-2 in this embodiment. They are computed for each of the separated character candidates produced by the separated character candidate extracting means 104.

[0042] FIG. 8 shows an example in which the separated character integrating means 105 estimates which separated character candidates are actual separated characters and integrates the corresponding circumscribing rectangles.

[0043] In integration, the most appropriate combination of separated characters is determined from cues such as:
- the uniformity of the rectangles' widths and heights;
- the uniformity of the distances between rectangles;
- the magnitude of the certainty factors corresponding to the rectangles; and
- the connectivity within and between words of the rectangles' character codes, checked with the grammar dictionary.

[0044] In this example, the rectangles agree on several cues:
- Widths: rectangle 412 in FIG. 4 and rectangles 711, 712, and 714 in FIG. 7 have almost equal widths.
- Heights: rectangles 415, 416, 419, and 420 in FIG. 4 and rectangle 714 in FIG. 7 have almost equal heights.
- Spacing: the distances between the centers of rectangles 412 (or 711), 415, 416, 712, and 714 are almost equal.
- Dictionary: the character codes corresponding to rectangles 412 (or 711), 415, 416, and 712 match the character order of the noun "ベクトル" (vector) stored in the grammar dictionary. A noun can connect to the suffix "化" (-ization), so the character code for rectangle 714 also connects to the codes for rectangles 412 (or 711), 415, 416, and 712.

[0045] In addition, rectangles 412, 413, and 414, rectangles 712 and 713, and rectangles 713 and 714 overlap one another, so they are unlikely to be valid characters at the same time. From this information, the separated character integration means 105 estimates that separated character candidates 711, 712, and 714 are separated characters and integrates them. As a result, the rectangles 811 to 816 after separated character integration shown in FIG. 8 are obtained.
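The selection described in [0043] to [0045] can be illustrated with a minimal sketch. It assumes rectangles are `(x0, y0, x1, y1)` tuples laid out along a horizontal line and that each alternative reading (for example, rectangle 412 on its own versus candidate 711) is supplied as a list of rectangles. The cost used here, the coefficient of variation of widths plus that of centre-to-centre gaps, is a hypothetical stand-in for "uniformity"; the certainty-factor and grammar-dictionary cues are left out.

```python
from statistics import mean, pstdev
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), y grows downward

def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def cv(values: List[float]) -> float:
    """Coefficient of variation; 0.0 means perfectly uniform."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0

def uniformity_cost(rects: List[Rect]) -> float:
    """Spread of widths plus spread of centre-to-centre gaps along the line."""
    widths = [r[2] - r[0] for r in rects]
    centres = [(r[0] + r[2]) / 2 for r in rects]
    gaps = [b - a for a, b in zip(centres, centres[1:])]
    return cv(widths) + (cv(gaps) if gaps else 0.0)

def best_combination(alternatives: List[List[Rect]]) -> Optional[List[Rect]]:
    """Drop readings whose rectangles overlap each other, then return the
    remaining reading with the most uniform layout (None if none remain)."""
    valid = [alt for alt in alternatives
             if not any(overlaps(a, b) for i, a in enumerate(alt) for b in alt[i + 1:])]
    return min(valid, key=uniformity_cost, default=None)
```

Rejecting overlapping readings outright mirrors the observation in [0045] that overlapping rectangles are unlikely to be characters at the same time; a fuller version would add the certainty and dictionary terms to the cost.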

[0046] If, for some portion, the available information is insufficient to determine an appropriate combination of separated characters from multiple candidates, processing of that portion by the separated character integration means 105 is deferred and the separated character candidates are stored. The integration accuracy can then be further improved by running the separated character integration means 105 again later, for example after the character string estimation means 107 has run, using the sizes of the characters in the character string and the direction of the string.

[0047] Next, the processing of step (S33) in FIG. 3, that is, the operation of the contact character candidate extraction means 109 and the contact character division means 110 in FIG. 1, is described. From the position and shape of each circumscribed rectangle, its corresponding character code and certainty factor, and information in the grammar dictionary, the contact character candidate extraction means 109 extracts contact character candidates: circumscribed rectangles which, when divided, may yield several rectangles each corresponding to the image of one character or to a separated character candidate.

[0048] FIG. 9 shows an example in which the contact character candidate extraction means 109 extracts contact character candidates. 901 is part of a document image, 911 to 917 are circumscribed rectangles obtained by the circumscribed rectangle generation means, and 902 shows the character code and certainty factor obtained by the character recognition means 102-1 for each circumscribed rectangle. As an example, a rectangle is extracted as a contact character candidate if its certainty factor is less than 100 and either its width or its height is at least 1.5 times that of the neighboring rectangles. In this example, circumscribed rectangle 917 is obtained as a contact character candidate.
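The example criterion of [0048] can be written as a small predicate. This is a sketch under stated assumptions: "the neighboring rectangles" is interpreted as the mean width and mean height of a caller-supplied list of nearby rectangles, and the parameter names are hypothetical.

```python
from statistics import mean

def is_contact_candidate(width, height, confidence, neighbour_sizes,
                         conf_limit=100, ratio=1.5):
    """Flag a rectangle that may hold several touching characters.

    neighbour_sizes: list of (width, height) for nearby rectangles.
    Comparing against the neighbours' mean size is an assumption of this sketch.
    """
    if confidence >= conf_limit or not neighbour_sizes:
        return False
    avg_w = mean(w for w, _ in neighbour_sizes)
    avg_h = mean(h for _, h in neighbour_sizes)
    return width >= ratio * avg_w or height >= ratio * avg_h
```

A wide, poorly recognised box such as 917 passes both tests, while a normally sized box or a confidently recognised one does not.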

[0049] FIG. 10 shows an example in which the contact character division means 110 estimates which contact character candidates actually consist of touching characters, divides each into several rectangles, and has the character recognition means 102-3 recognize each resulting rectangle. Various techniques for estimating the positions at which touching characters should be split are known, for example the one disclosed in Japanese Patent Application Laid-Open No. 5-128308, and any of them can be used. 1011 and 1012 are the rectangles obtained by dividing rectangle 917, which the contact character candidate extraction means 109 extracted as a contact character candidate, and 1001 shows the character code and certainty factor obtained by the character recognition means for each of them. Because the widths of rectangles 1011 and 1012 are almost equal to those of the neighboring rectangles, their certainty factors are high, and the sequence of their character codes matches the noun “年代” (“era”) in the grammar dictionary, the contact character division means 110 infers that contact character candidate 917 really is a pair of touching characters and registers rectangles 1011 and 1012 as character rectangles.
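The three cues that lead to accepting the split of 917 can be combined into one check. A minimal sketch, assuming the dictionary is a plain set of words, that "almost equal width" means within a hypothetical 20 % tolerance, and that "high certainty" means at least a hypothetical value of 100 on the same scale as [0048]:

```python
def accept_split(part_widths, part_confidences, part_codes, neighbour_width,
                 dictionary, width_tol=0.2, min_conf=100):
    """Accept a division of a contact character candidate only if every part
    looks like one character and the parts spell a dictionary word.
    width_tol and min_conf are hypothetical tuning values."""
    widths_ok = all(abs(w - neighbour_width) <= width_tol * neighbour_width
                    for w in part_widths)
    conf_ok = all(c >= min_conf for c in part_confidences)
    word_ok = "".join(part_codes) in dictionary  # exact match: a simplification
    return widths_ok and conf_ok and word_ok
```

If any cue fails, the candidate stays undivided and can be reconsidered on a later pass.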

[0050] Also in this embodiment, at step (S36), the non-character rectangle estimation means 106 estimates that rectangle 411 is a non-character rectangle because its certainty factor is small, it does not form a contact character candidate, and its width is large compared with the average of the neighboring rectangles. Estimation accuracy can be improved by making these conditions strict at first and relaxing them gradually each time the processing of (S36) is repeated. For example, for the condition that the certainty factor is small, the threshold is set to 10 in the first estimation and increased by 10 in each subsequent estimation. Accuracy improves because, as described later with reference to FIG. 19, character rectangles are re-estimated among the rectangles that have been determined neither as character rectangles nor as non-character rectangles, so a rectangle that was not accepted as a character rectangle in the first pass because of its low certainty factor may later be determined to be one. Strict initial conditions reduce the error of mistakenly determining a rectangle that is actually a character to be a non-character rectangle.
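The relaxation schedule of [0050] (threshold 10 on the first pass, then +10 per pass) and the three non-character conditions can be sketched as follows. Reading "width large compared with the neighbours' average" as a plain `>` comparison is a simplification assumed here, and the function names are hypothetical.

```python
def low_certainty_threshold(pass_index: int, start: int = 10, step: int = 10) -> int:
    """Threshold below which certainty counts as 'small' on pass 0, 1, 2, ..."""
    return start + step * pass_index

def is_non_character(confidence, in_contact_candidate, width,
                     mean_neighbour_width, pass_index):
    """Non-character test from [0050]: strict on early passes, looser later."""
    return (confidence < low_certainty_threshold(pass_index)
            and not in_contact_candidate
            and width > mean_neighbour_width)
```

With certainty 15, a wide rectangle is left undetermined on the first pass (15 is not below 10) and only classed as non-character on the second pass (threshold 20), giving the character-rectangle re-estimation a chance to claim it first.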

[0051] Next, the processing of the character string estimation means 107 is described. FIG. 11 shows an example of how the character string estimation means 107 estimates character strings. 1101 is part of a document image, and 1102 shows the rectangles after separated character integration. In this example, if only the shapes and positions of the rectangles are used, they could be grouped both vertically and horizontally, so the character string estimation means 107 uses the character codes corresponding to the rectangles and refers to the grammar dictionary to estimate the direction of the text lines. That is, from the line-breaking (kinsoku) rules, the positions within a string of the rectangles whose character codes are “、” or “。”, the number of phrases (bunsetsu) appearing in the character code sequences, and so on, it can be inferred that the text is written vertically. The details of this estimation are described later with reference to FIGS. 12 and 13. As a result, the character strings shown in 1103 are obtained.

[0052] FIG. 12 is a flowchart showing the operation of one embodiment of the character string estimation means 107 of the present invention, and details step (S34) of FIG. 3. An outline of the flowchart is given below; the details of each step are described later.

[0053] (S120) For the rectangles determined to be character rectangles in the document image, character string candidates are generated in both the vertical and the horizontal direction.

[0054] (S121) A score is calculated for each character string candidate.

[0055] (S122) Check whether there is any set of mutually contradictory character string candidates. Mutually contradictory candidates are, for example, several candidates that contain the same character rectangle, or candidates that cross each other.

[0056] (S123) If there is a set of mutually contradictory character string candidates, the candidate with the lower score is deleted from it. The process then returns to the check for contradictory candidates and repeats until no such set remains.

[0057] (S124) If there is no set of mutually contradictory candidates, the remaining candidates are output as the character string estimation result and the process ends.
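Steps (S122) to (S124) amount to a greedy elimination loop. A minimal sketch, assuming each candidate is given as the set of character-rectangle IDs it contains and that only the shared-rectangle conflict is checked (crossing candidates are omitted); removing the lowest-scoring candidate involved in any conflict is one reasonable way to order the deletions, not necessarily the patent's.

```python
from itertools import combinations

def resolve_conflicts(candidates, scores):
    """candidates: dict id -> set of character-rectangle ids.
    scores: dict id -> final score (T2 in the FIG. 13 example).
    Repeats (S122)/(S123) until no two surviving candidates share a rectangle."""
    alive = dict(candidates)
    while True:
        conflicts = [(a, b) for a, b in combinations(sorted(alive), 2)
                     if alive[a] & alive[b]]
        if not conflicts:                       # (S124): output what is left
            return sorted(alive)
        involved = {c for pair in conflicts for c in pair}
        del alive[min(involved, key=lambda c: scores[c])]  # (S123)
```

In a 2x2 grid read both ways, the two horizontal readings are dropped one after the other when the vertical readings score higher, which mirrors how candidates (13) to (22) survive in [0069].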

[0058] Conventional techniques can be used to generate the character string candidates in (S120). For example, sequences of rectangles whose sizes and shapes are similar and whose distances and positional relationships satisfy predetermined conditions are extracted as character string candidates. Although this step uses a conventional technique, the document image analysis device of this embodiment can generate candidates more accurately and more efficiently because it uses the results of the character recognition means and the other preceding means.

[0059] A conventional document image analysis device generates character string candidates from the circumscribed rectangles of raw connected black pixel groups. Consequently, for a separated character such as “川” (“kawa”, river) or for characters that touch one another, several rectangles may correspond to one character, or one rectangle may correspond to several characters. Therefore, even when sequences of rectangles of similar size and shape are to be generated as candidates, the size and shape corresponding to a single character are not accurately known. If the generation conditions are strict, true character strings cannot be extracted as candidates; if they are loose, many extra candidates are generated. Either way, accuracy or efficiency suffers.

[0060] In contrast, in the document image analysis device of this embodiment, as shown in FIG. 3, character rectangles are determined before the character string candidates are generated in (S34), by the character recognition means 102-1 in (S31), the separated character integration means 105 in (S32), the contact character division means 110 in (S33), and so on. The character string estimation means 107 then generates candidates only from the rectangles determined to be character rectangles. Because each rectangle corresponds to one character, runs of characters of similar size and shape can be detected accurately. This makes the generation conditions easy to set, reduces the generation of extra candidates, and ensures that the true character strings are included among the candidates.

[0061] FIG. 13 shows an example of the score calculation for character string candidates in (S121) of FIG. 12. Candidates (1) to (22) in FIG. 13 are the vertical and horizontal character string candidates generated for the character rectangles shown in 1102 of FIG. 11: (1) to (12) are horizontal candidates and (13) to (22) are vertical candidates. Each candidate is represented by the character codes corresponding to the character rectangles it contains.

[0062] Column S1 of the table is a score based on the number of characters in each candidate; the more characters, the larger the score.

[0063] S2 is a score for line-breaking (kinsoku) characters. A string that begins with a character such as “。” or “」”, or ends with a character such as “「”, is unnatural, so a negative score is given. In this example, candidates (8) and (11) fall into this case and are given a score of -10. The prohibited characters and the kinsoku rules may be stored in the character string estimation means or in the grammar dictionary.
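Score S2 is a simple boundary check. The sketch below uses only the characters named in [0063] plus the comma “、”, which is also conventionally barred from the start of a line; a production kinsoku table is much longer, and applying the -10 once per candidate is an assumption consistent with the FIG. 13 example.

```python
LINE_START_FORBIDDEN = set("。、」")  # partial table, for illustration only
LINE_END_FORBIDDEN = set("「")

def kinsoku_score(codes: str, penalty: int = -10) -> int:
    """S2: penalise a candidate that starts or ends with a forbidden character."""
    if not codes:
        return 0
    bad = codes[0] in LINE_START_FORBIDDEN or codes[-1] in LINE_END_FORBIDDEN
    return penalty if bad else 0
```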

[0064] S3 is a score for characters whose orientation or position differs between vertical and horizontal text. For example, the full stop “。” is placed at a different position within the string in vertical and in horizontal writing, and the closing bracket “」” has a different orientation. In the example of FIG. 13, horizontal candidates (3), (8), (9), and (11) each contain “。” or “、”, but the corresponding character rectangles lie at the top of the horizontal string, which is unnatural for horizontal writing; candidates (3), (8), (9), and (11) are therefore each given a score of -5.
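The punctuation part of S3 can be sketched as a position test. In horizontal Japanese text “。” and “、” sit low in the line, while in vertical text they sit toward the right of the column; this sketch checks only that cross-axis position, uses the line's bounding box as reference, and applies the -5 of the FIG. 13 example once per candidate (an assumption). Bracket orientation is not handled.

```python
PUNCT = set("。、")

def punctuation_position_score(codes, rects, horizontal, penalty=-5):
    """S3 sketch. rects: (x0, y0, x1, y1) per character, y grows downward."""
    x0 = min(r[0] for r in rects)
    x1 = max(r[2] for r in rects)
    y0 = min(r[1] for r in rects)
    y1 = max(r[3] for r in rects)
    for code, (rx0, ry0, rx1, ry1) in zip(codes, rects):
        if code not in PUNCT:
            continue
        if horizontal:
            misplaced = (ry0 + ry1) / 2 < (y0 + y1) / 2  # in the upper half
        else:
            misplaced = (rx0 + rx1) / 2 < (x0 + x1) / 2  # in the left half
        if misplaced:
            return penalty
    return 0
```

A “。” found in the upper half of a horizontal candidate is exactly the situation of candidates (3), (8), (9), and (11): the same mark is well placed if the text is really vertical.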

[0065] S4 is a score from a grammatical test. For the sequence of character codes corresponding to the character rectangles in each candidate, a higher score is given the more natural the sequence is as text, judged for example by the number of words detected. As an example criterion, if the code sequence contains a run of two or more characters that forms a phrase (bunsetsu), a score of 5 is given; in the example of FIG. 13, “もし” (moshi, “if”) in (4) and “込んだ” (konda) in (17) fall into this case. In addition, if such a phrase is five or more characters long or contains three or more kanji, 5 more points are added; in FIG. 13, “戦闘機に” (sentōki ni, “to the fighter plane”) in (13) and “パイロットの” (pairotto no, “the pilot's”) in (14) fall into this case. If the code sequence begins with a run of two or more characters that can be regarded as a phrase whose front part is missing, a score of 5 is given. Likewise, a score of 5 is given if the sequence ends with a run of two or more characters that can be regarded as a phrase whose rear part is missing. In FIG. 13, “地で” (chi de) in (13) and “ンディなど” (ndi nado) in (16) fall into this case.
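Assuming a morphological analyser has already segmented a candidate, the S4 rules of [0065] reduce to simple counting. In this sketch the segmentation is passed in by the caller (no real analyser is called), awarding the points once per complete phrase is one reading of the text, and kanji are detected only in the CJK Unified Ideographs block.

```python
def is_kanji(ch: str) -> bool:
    """True for characters in the CJK Unified Ideographs block only."""
    return "\u4e00" <= ch <= "\u9fff"

def grammar_score(phrases, head_fragment="", tail_fragment=""):
    """S4 sketch.
    phrases: complete phrases (bunsetsu) found inside the candidate.
    head_fragment / tail_fragment: leading / trailing runs that look like a
    phrase cut off at the front / back ("" if none)."""
    score = 0
    for p in phrases:
        if len(p) >= 2:
            score += 5
            if len(p) >= 5 or sum(map(is_kanji, p)) >= 3:
                score += 5
    if len(head_fragment) >= 2:
        score += 5
    if len(tail_fragment) >= 2:
        score += 5
    return score
```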

[0066] T1 is the first total, the sum of scores S1 to S4.

[0067] S5 is a score for each candidate based on the other candidates that contradict it. For example, candidate (1) contradicts candidates (13), (14), and (16) to (22) because it shares character rectangles with each of them. The sum of the first totals T1 of candidates (13), (14), and (16) to (22), multiplied by a weight of -0.1, is given to candidate (1) as its score S5.

[0068] T2 is the second total, obtained by adding score S5 to the first total T1. T2 is used as the final score of each candidate when candidates are deleted.
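Paragraphs [0066] to [0068] define S5(i) = -0.1 × Σ T1(j), summed over the candidates j that contradict candidate i, and T2(i) = T1(i) + S5(i). A direct transcription, with the candidate IDs and the conflict map as hypothetical inputs:

```python
def final_scores(t1, conflicts, weight=-0.1):
    """t1: dict id -> first total T1 (S1 + S2 + S3 + S4).
    conflicts: dict id -> ids of the candidates that contradict it.
    Returns dict id -> T2 = T1 + S5, with S5 = weight * (sum of rivals' T1)."""
    return {c: t1[c] + weight * sum(t1[r] for r in conflicts.get(c, ()))
            for c in t1}
```

The penalty makes a candidate that collides with many strong rivals lose ground before the deletion loop runs.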

[0069] In the example described with reference to FIGS. 11 and 13, repeatedly deleting the lower-scoring candidate from each contradictory set leaves candidates (13) to (22), which are output as the character string estimation result. The score weights and threshold values above may be determined experimentally so that appropriate estimates are obtained for the document images expected as input; values other than those given in this embodiment may be used.

[0070] FIG. 14 shows an example in which the character block estimation means 108 estimates a character block. In this example, for the sequence of character strings 1103 obtained by the character string estimation means 107, all six lines have nearly constant line widths, top-edge heights, and distances between horizontally adjacent lines; the shape and size of the rectangles in each line and the distance between vertically adjacent rectangles are nearly constant and common to all lines; and there are character code sequences forming Japanese phrases that span line boundaries, such as “ハンディなど” (handi nado, “handy, etc.”), “ロシアの” (roshia no, “Russian”), “舞い上がり” (maiagari, “soaring”), and “見事な” (migoto na, “splendid”). From these it can be inferred that the six lines can be merged into one character block, and the character block shown in 1401 is obtained.

[0071] FIG. 15 is a flowchart showing the operation of one embodiment of the character block estimation means of the present invention.

[0072] (S150) First, for the character strings in the document image, inter-string connection candidates are generated, each being a pair of character strings that may be connected.

[0073] (S151) A score is calculated for each connection candidate.

[0074] (S152) The score of each connection candidate is compared with a predetermined threshold, and connection candidates whose score is below the threshold are deleted.

[0075] (S153) The character strings are merged according to the remaining connection candidates to generate character blocks.

[0076] (S154) Finally, the generated character blocks are output as the estimation result and the process ends.
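Steps (S152) to (S154) can be sketched with a union-find structure: connections that survive the threshold merge their two strings, and each resulting set is a character block. This assumes strings are numbered 0 to n-1 and the scores from (S151) are already attached to each connection; the function name is hypothetical.

```python
def estimate_blocks(n_strings, connections, threshold):
    """connections: list of (i, j, score) for string indices i and j.
    Returns the character blocks as sorted lists of string indices."""
    parent = list(range(n_strings))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, score in connections:
        if score >= threshold:          # (S152): drop candidates below threshold
            parent[find(i)] = find(j)   # (S153): merge the two strings
    blocks = {}
    for s in range(n_strings):
        blocks.setdefault(find(s), []).append(s)
    return sorted(blocks.values())     # (S154)
```

Union-find keeps the merge step correct even when connection candidates form a general graph rather than a simple chain.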

[0077] This operation flow may also be used in conventional techniques. The main feature of the character block estimation means 108 in this embodiment is the way the score of each candidate is calculated. The details of each step are described next, focusing on this score calculation and comparing it with the conventional method.

[0078] Conventional techniques can be used to generate inter-string connection candidates. For example, the following conditions are used to decide whether to generate a connection candidate between a pair of character strings.

[0079] (a) The two character strings have the same orientation (both vertical or both horizontal).
(b1) For horizontal strings, the vertical projections of the two strings (their extents on the horizontal axis) overlap.
(b2) For vertical strings, the horizontal projections of the two strings (their extents on the vertical axis) overlap.
(c1) For horizontal strings, there is no other character string that satisfies all of the following:
(c1-1) it lies above one of the two strings and below the other;
(c1-2) its vertical projection overlaps the common part of the two strings' vertical projections.
(c2) For vertical strings, there is no other character string that satisfies all of the following:
(c2-1) it lies to the right of one of the two strings and to the left of the other;
(c2-2) its horizontal projection overlaps the common part of the two strings' horizontal projections.
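Conditions (b1) and (c1) for horizontal strings can be sketched with bounding boxes. The sketch assumes condition (a) has already been checked (all inputs are horizontal strings), represents each string by an `(x0, y0, x1, y1)` box with y growing downward, and treats "lies between" as lying entirely between the bottom of the upper string and the top of the lower one. The vertical case (b2)/(c2) is the same with the axes swapped.

```python
from itertools import combinations

def x_overlap(a, b):
    """Common part of two boxes' horizontal extents, or None if disjoint."""
    lo, hi = max(a[0], b[0]), min(a[2], b[2])
    return (lo, hi) if lo < hi else None

def horizontal_connection_candidates(lines):
    """Return index pairs of horizontal strings that satisfy (b1) and (c1)."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(lines), 2):
        common = x_overlap(a, b)
        if common is None:                                   # (b1) fails
            continue
        upper, lower = (a, b) if a[1] <= b[1] else (b, a)
        blocked = any(
            k not in (i, j)
            and upper[3] <= c[1] and c[3] <= lower[1]                    # (c1-1)
            and x_overlap(c, (common[0], 0, common[1], 0)) is not None   # (c1-2)
            for k, c in enumerate(lines))
        if not blocked:
            pairs.append((i, j))
    return pairs
```

Condition (c1) is what restricts candidates to immediately adjacent lines: in three stacked lines, the first and third are never paired because the middle one sits between them.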

[0080] FIG. 16 shows examples of inter-string connection candidates, the scores given to them by the conventional technique and by the embodiment of the present invention, and an example of the character block estimation result of the embodiment. In the figure, 1601 is part of a document image, 1602 is a non-character rectangle, and 1603, 1604, 1605, 1606, and 1607 are character strings (1), (2), (3), (4), and (5), respectively. In this example, string (1) is the caption of the non-character rectangle and strings (2) to (5) are body text, so they should preferably belong to different character blocks.

[0081] Using the above conditions, the connection candidates obtained are the connection between strings (1) and (2) (abbreviated below as (1)-(2)), (2)-(3), (3)-(4), and (4)-(5).

[0082] 1608 is an example of connection candidate scores according to the conventional technique.

[0083] Column S1 of the table is a score for the similarity in size of the characters contained in the two strings; the closer the sizes, the larger the score given to the connection candidate. In this example, all the characters in strings (1) to (5) are almost the same size, so every connection candidate is given the same score of 10.

[0084] S2 is a score for the spacing between strings. For horizontal strings, the smaller the vertical gap between the strings, the larger the score. In this example, the strings are evenly spaced, so every connection candidate is given the same score of 6.

[0085] S3 is a score for the positions of both ends of the strings. For horizontal strings, the closer the horizontal positions of the corresponding ends of the two strings, the larger the score. In this example, for candidates (1)-(2) and (3)-(4) both the start and the end positions differ, and a score of -10 is given. For (2)-(3), the start positions are nearly equal but the end positions differ, and a score of -5 is given. For (4)-(5), the end positions are nearly equal but the start positions differ, and a score of -2 is given.

[0086] T is the sum of scores S1 to S3; in one example of the conventional technique, T is used to delete connection candidates. However, with a threshold of 0, no candidate is deleted and strings (1) to (5) are all merged, even though string (1) is really a caption attached to the non-character rectangle and should not be in the same character block. Conversely, if the threshold is set above the score 6 of candidate (1)-(2), for example to 7, so that string (1) becomes an independent block, candidate (3)-(4) is deleted together with (1)-(2), and three character blocks are produced: string (1), strings (2) and (3), and strings (4) and (5). This is undesirable because strings (2) to (5), which should form one character block, are split apart.
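The arithmetic of [0083] to [0086] can be checked directly. Using the S1, S2, and S3 values quoted from table 1608, T is 6, 11, 6, and 14 for links (1)-(2), (2)-(3), (3)-(4), and (4)-(5). Because the candidates here form a simple chain, blocks can be obtained by cutting the chain wherever T falls below the threshold:

```python
# S1, S2, S3 per connection candidate, as quoted from table 1608
LINKS = {(1, 2): (10, 6, -10), (2, 3): (10, 6, -5),
         (3, 4): (10, 6, -10), (4, 5): (10, 6, -2)}

def totals():
    """Conventional total T = S1 + S2 + S3 for each link."""
    return {link: sum(parts) for link, parts in LINKS.items()}

def blocks_for_threshold(threshold):
    """Split the chain of strings 1..5 wherever T < threshold."""
    blocks, current = [], [1]
    for (_, b), parts in sorted(LINKS.items()):
        if sum(parts) >= threshold:
            current.append(b)
        else:
            blocks.append(current)
            current = [b]
    blocks.append(current)
    return blocks
```

No single threshold separates the caption (1) from the body text (2) to (5): the link (3)-(4) scores exactly as low as (1)-(2), which is the shortcoming stated in [0087].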

[0087] The problem with the conventional technique, as illustrated by 1608, is that the positions and shapes of the strings and of the characters they contain are often not enough to distinguish accurately between strings that should be connected and strings that should not.

[0088] 1609 is an example of connection candidate scores computed by the character block estimation means 108 in this embodiment.

[0089] S1 and S2 are the same as the S1 and S2 described for the conventional technique 1608.

[0090] S3, as in the prior-art example 1608, is a score based on the positions of both ends of the character strings. In the present invention, however, character recognition has already been performed, so its results are used and the score is assigned differently. For horizontal character strings, for example, the closer the horizontal positions of the corresponding ends of the two strings, the larger the score; this is the same as in 1608. As an exception, when the character code of the last character rectangle of a string is "。" or "．", a large score is given if the tail of that string lies to the left of the tail of the other string it connects to. In this example, the last character rectangle of character string (3) has the character code "。", and the tail of string (3) lies to the left of the tails of strings (2) and (4). Connection candidates (2)-(3) and (3)-(4) are therefore given a score of 0, which is larger than the score they receive in the prior-art case in 1608.
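The S3 rule, including the sentence-final-period exception, can be sketched as follows. This is a minimal illustration under stated assumptions: the patent gives no formula for S3, so the coordinate model (horizontal lines described by the x positions of their head and tail), the one-point-per-10-pixels penalty, and the names `TextLine`, `misalignment`, and `s3_score` are all hypothetical.

```python
from dataclasses import dataclass

SENTENCE_END = {"。", "．"}  # ideographic full stop and full-width period

@dataclass
class TextLine:
    text: str    # recognized character codes, in reading order
    head: float  # x coordinate of the first character rectangle (horizontal writing)
    tail: float  # x coordinate of the end of the last character rectangle

def misalignment(a: float, b: float, unit: float = 10.0) -> int:
    """Hypothetical penalty: minus one point per `unit` pixels of offset (0 is best)."""
    return -int(abs(a - b) // unit)

def s3_score(first: TextLine, second: TextLine) -> int:
    """End-position score for the connection candidate first -> second."""
    head_term = misalignment(first.head, second.head)
    tail_term = misalignment(first.tail, second.tail)
    # Exception: a line ending in a sentence-final period may legitimately stop
    # short, so a tail lying LEFT of the other line's tail is not penalized.
    for line, other in ((first, second), (second, first)):
        if line.text and line.text[-1] in SENTENCE_END and line.tail < other.tail:
            tail_term = 0
    return head_term + tail_term
```

For example, a full line spanning x = 100 to 400 followed by a short line "できない。" spanning 100 to 220 scores 0, whereas the same short line without the final "。" would be penalized for its 180-pixel tail offset. Because the exception checks both directions, it covers both the (2)-(3) and (3)-(4) cases described above.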

[0091] S4 is a score for grammatical continuity between character strings. The character codes corresponding to the character rectangles of the two strings are checked grammatically, and the higher the grammatical continuity, the larger the score. For example, morphological analysis is applied to the character codes of the character rectangles in the two strings of a connection candidate. A score of 20 is given when a word is detected spanning the two strings; in the example of FIG. 16, "意味" ("meaning") and "レベル" ("level") in connection candidates (2)-(3) and (4)-(5) correspond to this case. A score of 10 is given when a phrase (bunsetsu) spans the two strings but no word does; FIG. 16 contains no such case, but a connection of "…意味" followed by "がなく…" ("…without meaning…") would be one. A score of 5 is given when a phrase boundary coincides exactly with the junction of the two strings; in FIG. 16, the connection of "…できない。" ("…cannot.") followed by "そこで…" ("Therefore…") in connection candidate (3)-(4) is such a case. Finally, a score of -10 is given when the text at the junction cannot be analyzed or an unknown word is detected there. In FIG. 16 this applies to connection candidate (1)-(2), where analyzing "…分類る」といった…" detects the unknown word "る".
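A toy version of the S4 score is sketched below. It replaces real morphological analysis with greedy longest-match segmentation over a tiny hypothetical lexicon, and it treats a phrase as continuing across the junction when the first token after it is an adjunct word (such as the particle "が"). The lexicon, its word classes, and the function names are illustrative assumptions, not the contents of grammar dictionary 103; the sketch also simplifies the -10 case by failing on an unknown segment anywhere in the joined text, not only at the junction.

```python
# Hypothetical toy lexicon (word -> class); grammar dictionary 103 is far richer.
LEXICON = {
    "意味": "independent", "レベル": "independent", "高い": "independent",
    "でき": "independent", "そこで": "independent", "分類": "independent",
    "ない": "adjunct", "が": "adjunct", "なく": "adjunct", "の": "adjunct",
    "。": "adjunct",  # punctuation attaches to the preceding phrase
}
MAX_LEN = max(len(w) for w in LEXICON)

def segment(text):
    """Greedy longest-match segmentation into (start, end) spans; None if stuck."""
    spans, i = [], 0
    while i < len(text):
        for n in range(min(MAX_LEN, len(text) - i), 0, -1):
            if text[i:i + n] in LEXICON:
                spans.append((i, i + n))
                i += n
                break
        else:
            return None  # no lexicon entry starts at position i: unknown word
    return spans

def s4_score(first, second):
    """Grammatical-continuity score for joining line `first` to line `second`."""
    if not first or not second:
        return 0  # assumption: nothing to evaluate
    joined, cut = first + second, len(first)
    spans = segment(joined)
    if spans is None:
        return -10  # unanalyzable text or unknown word
    if any(s < cut < e for s, e in spans):
        return 20   # a word straddles the line junction
    s, e = next(span for span in spans if span[0] == cut)
    # An adjunct word right after the junction continues the phrase (bunsetsu);
    # an independent word starts a new phrase exactly at the junction.
    return 10 if LEXICON[joined[s:e]] == "adjunct" else 5
```

With this lexicon, "高い意" + "味のレベル" scores 20, "意味" + "がなく" scores 10, "できない。" + "そこで" scores 5, and "分類" + "る" scores -10, mirroring the four cases in the text.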

[0092] T is the sum of the scores S1 to S4. Setting the threshold to 0 deletes connection candidate (1)-(2), and merging the character strings according to the remaining connections yields character blocks 1610 and 1611. The score weights and the threshold described above may be determined experimentally so that appropriate estimates are obtained for the document images expected as input; values other than those given in this embodiment may be used.
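Deleting low-scoring candidates and merging the rest amounts to finding connected components among the character strings. The sketch below does this with a union-find structure. It assumes that a candidate is deleted when T falls below the threshold, and the T values in the usage example are invented for illustration (the actual values appear in FIG. 16); `merge_strings` is a hypothetical name.

```python
def merge_strings(labels, candidates, threshold):
    """Group character strings into character blocks.

    labels: iterable of character-string labels.
    candidates: dict mapping a connection candidate (a, b) to its total score T.
    Assumption: a candidate is kept when T >= threshold, deleted otherwise.
    """
    parent = {lab: lab for lab in labels}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for (a, b), total in candidates.items():
        if total >= threshold:
            parent[find(a)] = find(b)  # merge the two strings' blocks

    blocks = {}
    for lab in parent:
        blocks.setdefault(find(lab), []).append(lab)
    return sorted(sorted(block) for block in blocks.values())
```

With invented scores such as `{(1, 2): -4, (2, 3): 12, (3, 4): 9, (4, 5): 25}` and a threshold of 0, only (1)-(2) is deleted and the result is `[[1], [2, 3, 4, 5]]`, the two-block outcome described above.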

[0093] FIG. 17 shows an example of how the character block order estimating means 111 estimates the order of character blocks. 1701 is part of a document image. 1711 to 1719 are examples of the non-character rectangles and character blocks obtained by the processing of the non-character rectangle estimating means 106 and the character block estimating means 108. In this example, as shown in 1702, 1712, 1715, and 1719 are estimated to be non-character rectangles, and 1711, 1713, 1714, 1716, 1717, and 1718 are estimated to be vertically written character blocks.

[0094] The following is an example of the rules used by the character block order estimating means. A character block whose characters have an average width and height more than twice the average size of the character rectangles in the other character blocks is treated as a main heading and placed first. A horizontally written character block adjacent to the top or bottom of a non-character rectangle, or a vertically written character block adjacent to the left or right of a non-character rectangle, that differs from its neighboring character blocks in line direction or in average character-rectangle size is treated as a caption attached to a figure or the like and placed last. The remaining character blocks are treated as body text and arranged top to bottom and right to left for vertical writing, or left to right and top to bottom for horizontal writing. Under these rules, character block 1713 is placed first. For the remaining character blocks, two orders are possible: 1717, 1716, 1711, 1718, 1714 and 1717, 1716, 1718, 1711, 1714.
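These ordering rules can be sketched as a three-way classification followed by a sort. In this hypothetical version, the caption test (adjacency to a non-character rectangle plus a differing line direction or character size) is assumed to have been computed elsewhere and is passed in as `is_caption`, and body blocks are grouped into tiers (vertical writing) or columns (horizontal writing) using an assumed `band` size. `Block` and `order_blocks` are illustrative names.

```python
from dataclasses import dataclass

@dataclass
class Block:
    name: str
    x: float                  # left edge of the block
    y: float                  # top edge of the block
    char_size: float          # average size of the block's character rectangles
    vertical: bool            # True for vertically written text
    is_caption: bool = False  # hypothetical: result of an adjacency test done elsewhere

def order_blocks(blocks, band=100.0):
    """Order blocks as: large headings first, body text next, captions last."""
    headings, body, captions = [], [], []
    for b in blocks:
        others = [o.char_size for o in blocks if o is not b]
        if others and b.char_size > 2 * sum(others) / len(others):
            headings.append(b)  # characters over twice the others' average size
        elif b.is_caption:
            captions.append(b)
        else:
            body.append(b)

    def reading_key(b):
        if b.vertical:
            # Tiers from top to bottom, then right to left within a tier.
            return (int(b.y // band), -b.x)
        # Columns from left to right, then top to bottom within a column.
        return (int(b.x // band), b.y)

    body.sort(key=reading_key)
    return [b.name for b in headings + body + captions]
```

A fixed band size is the weak point of this sketch: blocks whose tops fall near a band boundary can land in either tier, which is one way the two competing orders in the example above can arise and why a second, text-based criterion is useful.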

[0095] Next, the character codes corresponding to the character rectangles in each character block are examined. Grammatical continuity between character blocks can be evaluated with the same technique described above for grammatical continuity between character strings. In the example of FIG. 17, the connections from 1716 to 1711 and from 1718 to 1714 are given a score of 20; the connections from 1717 to 1716 and from 1711 to 1718 a score of 5; and the connections from 1716 to 1718, from 1718 to 1711, and from 1711 to 1714 a score of -10. For the block sequence 1717, 1716, 1711, 1718, 1714, the total score is 50; for the sequence 1717, 1716, 1718, 1711, 1714, it is -25. From this, the order 1711, 1718, 1717, 1716, 1715 is estimated to be correct, and the final order 1712, 1711, 1718, 1717, 1716, 1715, 1714 is obtained.
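The comparison of candidate orders can be reproduced from the pairwise scores listed above. In this sketch, a candidate order's score is the sum of the scores of its consecutive block pairs, pairs not listed in the text are assumed to score 0, and the higher-scoring candidate is selected; `order_score` and `best_order` are hypothetical names.

```python
# Pairwise grammatical-continuity scores from the FIG. 17 example (block -> next block).
PAIR_SCORES = {
    (1716, 1711): 20, (1718, 1714): 20,
    (1717, 1716): 5, (1711, 1718): 5,
    (1716, 1718): -10, (1718, 1711): -10, (1711, 1714): -10,
}

def order_score(order, scores=PAIR_SCORES, default=0):
    """Sum the scores of consecutive pairs; unlisted pairs get `default` (assumed 0)."""
    return sum(scores.get(pair, default) for pair in zip(order, order[1:]))

def best_order(candidates):
    """Pick the candidate order with the highest total score."""
    return max(candidates, key=order_score)
```

The first candidate, 1717, 1716, 1711, 1718, 1714, sums to 5 + 20 + 5 + 20 = 50, and the second, 1717, 1716, 1718, 1711, 1714, sums to 5 - 10 - 10 - 10 = -25, matching the totals given in the text.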

[0096] FIG. 18 shows an example of the character code output result of the character code output means 112.

[0097] FIG. 19 shows an example of the character rectangle re-estimation performed in step S38 of FIG. 3. 1901 is part of a document image, and 1902 shows the estimated character rectangles and character strings before step S38 of FIG. 3 is executed for the first time. In 1902, the rectangle 1911 corresponding to the character "超" received a certainty factor of 82 from the character recognition in step S31 of FIG. 3, as shown in 1903; because this certainty is not high enough, the rectangle was not confirmed as a character rectangle. In the re-estimation of step S38, for example, 20 is added to the certainty factor of an undetermined rectangle when it satisfies the following conditions:
(1) It lies between two vertical character strings or between two horizontal character strings.
(2) The center lines of the two character strings substantially coincide.
(3) The character spacing in the two character strings is approximately equal.
(4) The average character size in the two character strings is approximately equal.
(5) The gap between the undetermined rectangle and the nearest character rectangle of each character string is approximately equal to the character spacing in both strings.
(6) The size of the undetermined rectangle is approximately equal to the average character size in both strings.

[0098] In the example of FIG. 19, rectangle 1911 satisfies these conditions, so 20 is added to its certainty factor, giving 102, as shown in 1904. As a result, rectangle 1911 is confirmed as a character rectangle. The processing from step S34 of FIG. 3 is then executed again, and, as shown in 1905, the character rectangles and character strings are estimated correctly.
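The re-estimation test of conditions (1) to (6) can be sketched as follows. The tolerances are assumptions: "approximately equal" is modeled as within 15% of the larger value, "center lines coincide" as an offset of at most 20% of the mean character size, and the acceptance threshold of 100 is hypothetical, chosen only because it lies between the rejected 82 and the accepted 102. Condition (1) is passed in as a precomputed boolean; `Line`, `roughly_equal`, and `re_estimate` are illustrative names.

```python
from dataclasses import dataclass

ACCEPT = 100  # hypothetical acceptance threshold (82 rejected, 102 accepted)

@dataclass
class Line:
    center: float     # coordinate of the character string's center line
    pitch: float      # average spacing between adjacent characters
    char_size: float  # average character size
    gap: float        # gap from the undetermined rectangle to this line's nearest character

def roughly_equal(a, b, rel_tol=0.15):
    """Hypothetical 'approximately equal' test: within 15% of the larger magnitude."""
    return abs(a - b) <= rel_tol * max(abs(a), abs(b))

def re_estimate(certainty, rect_size, lies_between, a, b, bonus=20):
    """Return the certainty after re-estimation using conditions (1)-(6)."""
    mean_size = (a.char_size + b.char_size) / 2
    conditions = (
        lies_between,                                    # (1) between two same-direction lines
        abs(a.center - b.center) <= 0.2 * mean_size,     # (2) center lines coincide
        roughly_equal(a.pitch, b.pitch),                 # (3) similar character spacing
        roughly_equal(a.char_size, b.char_size),         # (4) similar character size
        roughly_equal(a.gap, a.pitch) and roughly_equal(b.gap, b.pitch),  # (5) gaps match spacing
        roughly_equal(rect_size, mean_size),             # (6) rectangle matches the text size
    )
    return certainty + bonus if all(conditions) else certainty
```

For two aligned strings with a character pitch of 40 and a character size of 36, an undetermined rectangle of size 37 with gaps of 42 and 40 passes all six checks, so a certainty of 82 becomes 102 and clears the assumed threshold, as in the FIG. 19 example.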

[0099] Next, the logical structure analysis means 113 will be described. Various methods for analyzing a logical structure from a layout structure have been proposed; for example, the method described in "Understanding of Document Images Using a Format Definition Language," Journal of the Institute of Image Electronics Engineers of Japan, Vol. 17, No. 5, pp. 267-277, can be used.

[0100] In the present invention, character recognition is performed first and its results are used to estimate blocks. By the time the logical structure is analyzed, the character codes of the characters in each character block (one of the layout elements) and their order within the block are therefore already known. A distinguishing feature of the invention is that the logical structure can then be determined more accurately by using grammatical information or information about words.

[0101] In this example, an estimation method similar to that described for the embodiment of the character block order estimating means shows that blocks 1711, 1718, and 1717 are body-text areas that continue in this order, so they can be estimated to form part of one large paragraph.

[0102] As a result of the estimation by the character block order estimating means, the logical structure analysis shown by 2001 in FIG. 20 is obtained.

[0103] As another example of logical structure estimation, when determining the logical block in which an author's name is written, the accuracy of the determination can be improved by checking whether the character block contains words used in personal names. Likewise, when determining the logical block that holds a figure caption, the accuracy can be improved by checking the positional relationship (for example, whether the block lies within a certain threshold distance of a non-character rectangle) and whether the character block contains a character sequence such as "図" ("figure") or "Ｆｉｇ．"; this also clarifies the logical structure of the non-character rectangle.
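The figure-caption check can be sketched by combining the two cues named above. In this hypothetical version, rectangles are `(x0, y0, x1, y1)` tuples, `max_gap` is an assumed threshold, both cues must hold, and the marker test uses `str.startswith` rather than a plain substring search, so that words such as "意図" ("intention") inside ordinary body text do not trigger it. The marker list and function names are illustrative.

```python
import math

CAPTION_MARKERS = ("図", "Fig.", "Ｆｉｇ．")  # half-width and full-width forms

def rect_distance(a, b):
    """Gap between two axis-aligned rectangles (x0, y0, x1, y1); 0 if they touch or overlap."""
    dx = max(b[0] - a[2], a[0] - b[2], 0)
    dy = max(b[1] - a[3], a[1] - b[3], 0)
    return math.hypot(dx, dy)

def looks_like_figure_caption(block_rect, block_text, figure_rect, max_gap=30.0):
    """True if the block lies near the non-character rectangle and starts with a caption marker."""
    near = rect_distance(block_rect, figure_rect) <= max_gap
    marked = block_text.lstrip().startswith(CAPTION_MARKERS)
    return near and marked
```

Requiring both cues is a conservative design choice; a fuller system might instead weight them, or use the word information in the grammar dictionary to confirm that the marker begins a caption.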

[0104]

[Effect of the Invention] In the document image analyzing apparatus according to the present invention, character recognition is first performed on each rectangle, and structures such as character lines and character blocks are then estimated using the character codes obtained as recognition results, together with the sequence of those character codes checked against a grammar dictionary. This improves analysis accuracy and makes it possible to analyze document images with more complex structures.

[Brief description of drawings]

FIG. 1 is a diagram showing the configuration of an embodiment of the document image analysis apparatus according to the present invention.

FIG. 2 is a diagram showing an example of the grammar dictionary.

FIG. 3 is a flowchart showing an example of the operation of the document image analysis apparatus in the embodiment of the present invention.

FIG. 4 is a diagram showing an example of circumscribed rectangles obtained by the circumscribed rectangle generating means 101.

FIG. 5 is a diagram showing an example of the configuration of the character recognition means 102-1, 102-2, and 102-3 in this embodiment.

FIG. 6 is a diagram showing an example of the character codes and certainty factors obtained by the character recognition means 102 for each circumscribed rectangle obtained by the circumscribed rectangle generating means 101.

FIG. 7 is a diagram showing examples of separated character candidates obtained by the separated character candidate extracting means 104, and the character codes and certainty factors obtained by the character recognition means 102-2 for each of them.

FIG. 8 is a diagram showing an example in which the separated character integrating means 105 determines separated characters from the separated character candidates and integrates a plurality of circumscribed rectangles.

FIG. 9 is a diagram showing an example of an image containing contact character candidates obtained by the contact character candidate extracting means 109.

FIG. 10 is a diagram showing an example in which the contact character dividing means 110 determines contact characters from the contact character candidates and divides them into a plurality of circumscribed rectangles.

FIG. 11 is a diagram showing an example of character string estimation by the character string estimating means 107.

FIG. 12 is a flowchart detailing step S34 of FIG. 3, showing an example of the operation of the character string estimating means 107.

FIG. 13 is a diagram showing an example of the scores used by the character string estimating means 107 to estimate character strings.

FIG. 14 is a diagram showing an example of character block estimation by the character block estimating means 108.

FIG. 15 is a flowchart detailing step S35 of FIG. 3, showing an example of the operation of the character block estimating means 108.

FIG. 16 is a diagram showing an example of the scores used to estimate character blocks in the prior art and in the character block estimating means 108 of the present invention.

FIG. 17 is a diagram showing an example of character block order estimation by the character block order estimating means 111.

FIG. 18 is a diagram showing an example of the character code output result of the character code output means 112.

FIG. 19 is a diagram showing an example of the re-estimation of character rectangles in step S38 of FIG. 3.

FIG. 20 is a diagram showing an example in which the logical structure analysis means 113 assigns a logical structure to each character block and non-character rectangle.

FIG. 21 is a flowchart showing an example of the operation of a document image analysis apparatus in the prior art.

[Explanation of symbols]

101: circumscribed rectangle generating means
102-1, 102-2, 102-3: character recognition means
103: grammar dictionary
104: separated character candidate extracting means
105: separated character integrating means
106: non-character rectangle estimating means
107: character string estimating means
108: character block estimating means
109: contact character candidate extracting means
110: contact character dividing means
111: character block order estimating means
112: character code output means
113: logical structure analysis means
201: independent word table
202: adjunct word table
203: conjugation table
204: connection table
401: part of document image
411-420: circumscribed rectangles
501: image normalization unit
502: feature extraction unit
503: distance calculation unit
504: result output unit
601: character codes and certainty factors
701: character codes and certainty factors for separated character candidates
711-714: separated character candidates
811-816: rectangles after separated character integration
901: part of document image
902: character codes and certainty factors
911-917: circumscribed rectangles
1001: character codes and certainty factors
1011, 1012: rectangles obtained by dividing contact character candidates
1101: part of document image
1102: character rectangles
1103: character line estimation result
1401: character block estimation result
1601: part of document image
1602: non-character rectangle
1603-1607: character strings
1608: example of prior-art scores
1609: example of scores of the present invention
1610, 1611: character blocks
1701: part of document image
1702: character block estimation result and non-character rectangle estimation result
1711, 1712, 1714-1718: character blocks
1713: non-character rectangle
1801: character code output result
1901: part of character image
1902, 1903: character rectangle and character string estimation results
1903, 1904: character codes and certainty factors
1911: partial enlarged view of character rectangle
2001: logical structure estimation result

Claims

[Claims]

1. A document image analyzing apparatus comprising: pixel block extracting means for extracting connected black pixel blocks in an image; first character recognition means for performing character recognition processing on each connected black pixel block extracted by the pixel block extracting means and determining at least one character code; a grammar dictionary holding words and connection information between words; and character string estimating means for estimating character strings and their direction on the basis of the character codes determined by the character recognition means and the words and connection information between words held in the grammar dictionary.

2. The document image analyzing apparatus according to claim 1, further comprising: separated character candidate extracting means for extracting separated character candidates, each being a plurality of connected black pixel blocks that may be integrated into one character; second character recognition means for performing character recognition processing on the separated character candidates and determining at least one character code; and separated character integrating means for determining separated characters from the separated character candidates.

3. The document image analyzing apparatus according to claim 1, further comprising: contact character candidate extracting means for extracting contact character candidates, each being a single connected black pixel block that may be divided into a plurality of characters or parts of characters; contact character dividing means for determining contact characters from the contact character candidates and dividing each into a plurality of images; and third character recognition means for performing character recognition processing on the images obtained by the contact character dividing means and outputting at least one character code.

4. The document image analyzing apparatus according to claim 1, further comprising non-character estimating means for estimating non-character rectangles, which are not characters, on the basis of at least one character code output by the first character recognition means for the connected black pixel block and the words and connection information between words held in the grammar dictionary.

5. The document image analyzing apparatus according to claim 1, further comprising character block estimating means for estimating a character block as a set of character strings on the basis of the character strings estimated by the character string estimating means, the character codes output by the first, second, or third character recognition means, and the words and connection information between words held in the grammar dictionary.

6. The document image analyzing apparatus according to claim 5, further comprising: character block order estimating means for estimating the order of the character blocks on the basis of the character blocks estimated by the character block estimating means, the character strings included in the character blocks, the character codes output by the first, second, or third character recognition means, and the words and connection information between words held in the grammar dictionary; and character code output means for outputting the character codes included in the character blocks, in the estimated order, on the basis of the character blocks estimated by the character block estimating means, the character strings included in the character blocks, the character codes output by the first, second, or third character recognition means, and the order of the character blocks estimated by the character block order estimating means.

7. The document image analyzing apparatus according to claim 5, further comprising first logical structure analysis means for assigning a logical structure to each character block estimated by the character block estimating means and outputting it, on the basis of the character block, each character string included in the character block, the character codes of the character strings output by the first, second, or third character recognition means, and the words and connection information between words held in the grammar dictionary.

8. The document image analyzing apparatus according to claim 5, further comprising second logical structure analysis means for outputting a logical structure for a non-character rectangle on the basis of the character blocks estimated by the character block estimating means that lie near the non-character rectangle, each character string included in those character blocks, the character codes included in those character strings and output by the first, second, or third character recognition means, and the words and connection information between words held in the grammar dictionary.