JPH10222612A - Document recognizing device

Info

Publication number: JPH10222612A
Application number: JP9024810A
Authority: JP
Inventors: Yoshifumi Sato (里 佳史); Takuya Okamoto (岡本 卓哉)
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1997-02-07
Filing date: 1997-02-07
Publication date: 1998-08-21

Abstract

PROBLEM TO BE SOLVED: To recognize a document using the most suitable character recognition method for each partial area of a document image. SOLUTION: A character recognizing part 102 recognizes the characters of the document image 101 and outputs a character recognition result a103. A logical structure recognizing part 104 performs logical structure recognition on the character recognition result a103 and decides which element each character string in the recognition result corresponds to. It then refers to the character area of each character in the character recognition result a, derives the character string area occupied by each character string associated with an element, and outputs a logical structure recognition result 105 that includes at least the correspondence between character string areas and elements. A control part 108 refers to element-recognizing means correspondence knowledge 107 and, for each character string area in the logical structure recognition result 105, assigns from a recognizing means managing part 109 the recognizing means appropriate to the corresponding element. A recognizing means applying part 110 applies the recognizing means selected by the control part 108 to each character string area and generates a character recognition result b111.

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a method and an apparatus for performing character recognition on document image information generated by, for example, capturing a paper document with a scanner.

【０００２】[0002]

2. Description of the Related Art
Research and development of character recognition technology for document image information has long been active, and printed documents can now be recognized at a recognition rate of about 99% (Kurosawa et al., "Prospects of Character Recognition Technology," Toshiba Review, Vol. 47, No. 2, pp. 111-114, 1992; hereinafter "Document 1"). However, character recognition systems that handle important information such as legal documents and family register information often require higher accuracy (for example, about 99.9%), and many problems remain before such high-accuracy recognition can be achieved. One of them is that when the document image to be recognized mixes various typefaces (hereinafter "fonts") and character types (alphanumerics, kanji, handwritten characters, and so on), the recognition rate for character strings in particular fonts or character types tends to drop. This is because each recognition method, and each character pattern dictionary used for recognition, handles some fonts and character types well and others poorly.

【０００３】One technique studied for solving this problem assigns, to each partial area of the document image that is composed of characters of a specific font or character type (hereinafter a "field"), a character recognition means appropriate to that area. Three prior arts relating to this technique are described below.

【０００４】Japanese Patent Application Laid-Open No. S61-16378 describes assigning an appropriate character recognition means to each field of documents whose field positions are fixed, such as forms and fixed-format documents (Prior Art 1).

【０００５】Japanese Patent Application Laid-Open No. S61-290579 describes dividing a document into fields manually in advance and assigning an appropriate character recognition means to each field (Prior Art 2).

【０００６】Japanese Patent Application Laid-Open No. S63-27990 describes, once the image has been divided into lines, assigning an appropriate character recognition method by referring to a previously prepared correspondence between line layout types and the recognition means used to recognize lines of each layout (Prior Art 3).

【０００７】[0007]

Problems to Be Solved by the Invention
First, the problems of the prior art are described. Prior Art 1 is intended for documents whose field positions are fixed, so it has no effect on document types in which the physical position and number of fields vary from document to document.

【０００８】Because Prior Art 2 divides fields manually, the labor required grows in proportion to the number and size of the documents. It is therefore difficult to apply when the documents are numerous or large.

【０００９】Prior Art 3 overcomes the problems of Prior Arts 1 and 2. In practice, however, it is difficult to divide fields appropriately using only per-line layout information. For example, fields in different fonts may be mixed within one line. FIG. 2 is an example of a legal document image: the first three characters of line 5 are in Gothic and the remaining characters are in Mincho. Different recognition means should be assigned to the Gothic and the Mincho characters, but Prior Art 3, which divides fields line by line, cannot do this. Likewise, line 4 of FIG. 2 is set in Gothic and line 6 in Mincho. Even when different recognition means should be assigned to these two lines, Prior Art 3 assigns the same recognition means whenever the two lines share the same layout (here, the same leading indent).

【００１０】The problem to be solved by the present invention is to realize a function that automatically divides fields even for document types in which the physical position and number of fields vary from document to document, that can divide one line into several fields, and that can assign different recognition means according to content even when line layouts are identical.

【００１１】[0011]

Means for Solving the Problems
When dividing a document image into fields, the present invention does not focus on line layout as Prior Art 3 does. Instead, it exploits the following properties of documents and divides fields according to the components of the document's logical structure (hereinafter "elements"):
(1) Fonts are often set per element, such as "title", "chapter title", and "article number".
(2) For some elements, such as "date" or "XX number", the character types that make up the string can be restricted.

【００１２】Dividing fields by element makes it possible to split one line into several fields and, even when line layouts are identical, to assign different recognition means according to content.

【００１３】The configuration of the present invention is as follows. The invention comprises: character recognition means that extracts characters from the image information of a document to be recognized and recognizes the extracted characters; logical structure recognition means that performs logical structure recognition on the character recognition result and generates a logical structure recognition result containing the relationship between partial areas of the image information corresponding to at least some character strings (character string areas) and the types of the logical elements of the document (logical structure elements), for example area A with logical element a and area B with logical element b; and re-character recognition means that, for each character string area, derives a character recognition means suited to that area, using at least the type of the logical structure element associated with it in the logical structure recognition result as one criterion, and performs character recognition again.

【００１４】The re-character recognition means may, for example, assign an appropriate recognition means to each character string area in the image information to be recognized, and perform character recognition again, by referring to a previously prepared binary relation between logical structure element types and the recognition means suited to the features of the characters expected to appear in the character string areas of each element type.

【００１５】Alternatively, the re-character recognition means may comprise a previously prepared binary relation between logical structure element types and information indicating the type of character features expected to appear in the character string areas of each element type, together with means for deriving, from a character feature type, a recognition means suited to that feature. It then assigns an appropriate recognition means to each character string area and performs character recognition again.
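
The two variants in [0014] and [0015] differ only in how a recognizer is looked up: directly from the element type, or via an intermediate character-feature type. A minimal Python sketch, assuming recognizers are identified by name and that an element with no entry falls back to a multi-font recognizer (the patent does not say what happens in that case). Every element, feature-type, and recognizer name below is hypothetical:

```python
from typing import Dict

# Hypothetical names throughout; none of these identifiers come from the patent.
DEFAULT_RECOGNIZER = "multifont_recognizer"  # assumed fallback for unknown elements

# [0014] variant: direct binary relation, element type -> recognizer.
ELEMENT_TO_RECOGNIZER: Dict[str, str] = {
    "title": "gothic_recognizer",
    "article_provision": "mincho_recognizer",
}

# [0015] variant: element type -> expected character feature type ...
ELEMENT_TO_FEATURE: Dict[str, str] = {
    "title": "font:gothic",
    "article_provision": "font:mincho",
    "promulgation_date": "charset:date",
}
# ... and a separate mapping from feature type -> recognizer suited to it.
FEATURE_TO_RECOGNIZER: Dict[str, str] = {
    "font:gothic": "gothic_recognizer",
    "font:mincho": "mincho_recognizer",
    "charset:date": "digits_and_date_recognizer",
}


def recognizer_direct(element: str) -> str:
    """Look up the recognizer directly from the element type."""
    return ELEMENT_TO_RECOGNIZER.get(element, DEFAULT_RECOGNIZER)


def recognizer_via_feature(element: str) -> str:
    """Derive the recognizer in two steps: element -> feature type -> recognizer."""
    feature = ELEMENT_TO_FEATURE.get(element)
    if feature is None:
        return DEFAULT_RECOGNIZER
    return FEATURE_TO_RECOGNIZER.get(feature, DEFAULT_RECOGNIZER)
```

The indirect form lets several elements share one feature type, so swapping in a better Gothic recognizer means editing one `FEATURE_TO_RECOGNIZER` entry rather than every Gothic element.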

【００１６】[0016]

DESCRIPTION OF THE PREFERRED EMBODIMENTS
In this embodiment, fields are divided by focusing on elements, and each field is assigned a recognition means that performs well on a specific font.

【００１７】First, the handling of recognition rates in this embodiment is described. In general, a recognition method specialized for a target font (hereinafter the "font-specific method") is known to achieve a higher recognition rate than a method that recognizes characters of any font (hereinafter the "multi-font method"). Document 1 gives about 99% for the multi-font method (recognition of "omni-font printed kanji" in Document 1) and 99.7% or more for the font-specific method (recognition of "single-font printed kanji"), but actual rates depend on the individual recognition method. This embodiment therefore discusses recognition rates only through the qualitative property that the font-specific method achieves a higher recognition rate than the multi-font method. In the character recognition examples of this embodiment, to illustrate clearly the effect of the invention that arises from this difference, concrete results are shown assuming a recognition rate of about 96% for the multi-font method and about 99% for the font-specific method.

【００１８】An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the document recognition technique of this embodiment. Document image 101 is data obtained by capturing a paper document with a scanner or similar means and converting it into image information. Character recognition unit 102 takes document image 101 as input, performs character recognition on the character regions of document image 101, and outputs character recognition result a103. Because no per-character font information is available to character recognition unit 102, it recognizes the entire character region in a uniform way. This recognition may be implemented by a single recognition means or by combining several. Character recognition result a103 contains at least the correspondence between the area each character occupies on the document image (hereinafter the "character area") and the character code obtained by recognizing that character.

【００１９】Logical structure recognition unit 104 performs logical structure recognition on character recognition result a103 and determines which component of the document's logical structure (hereinafter an "element") each character string in the recognition result corresponds to. (For example, one string is associated with the document's "title" element and another with a "chapter title" element.) Referring to the character area of each character in character recognition result a, it then derives the area occupied by each string associated with an element (hereinafter the "character string area") and outputs logical structure recognition result 105, which contains at least the correspondence between character string areas and elements.

【００２０】Re-character recognition unit 106 refers to logical structure recognition result 105 and to element-recognition means correspondence knowledge 107, which is prepared in advance for each document type, selects a recognition means suited to each character string area, performs character recognition again, and outputs the high-accuracy character recognition result b111.

【００２１】The processing of re-character recognition unit 106 is described in more detail. Re-character recognition unit 106 consists of control unit 108, recognition means management unit 109, and recognition means applying unit 110. Recognition means management unit 109 manages several recognition means, each suited to particular fonts or character types. Control unit 108 refers to element-recognition means correspondence knowledge 107 and assigns to each character string area in logical structure recognition result 105 the recognition means appropriate to its element. Recognition means applying unit 110 applies the recognition means selected by control unit 108 to each character string area and generates character recognition result b111.
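
The division of labour among units 108, 109, and 110 can be sketched in Python as follows. This is an illustrative sketch, not the patent's implementation: recognizers are modelled as hypothetical callables that take a rectangle and return text, the knowledge table is a plain dict, and the `default` name used for elements missing from the knowledge is an assumption:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Rect = Tuple[int, int, int, int]    # (x1, y1, x2, y2): upper-left and lower-right corners
Recognizer = Callable[[Rect], str]  # hypothetical: recognizes the image inside a rect


@dataclass
class StringArea:
    rects: List[Rect]  # one circumscribing rectangle per text line
    element: str       # element name from logical structure recognition result 105


class RecognizerManager:
    """Stand-in for recognition means management unit 109."""

    def __init__(self) -> None:
        self._by_name: Dict[str, Recognizer] = {}

    def register(self, name: str, recognizer: Recognizer) -> None:
        self._by_name[name] = recognizer

    def get(self, name: str) -> Recognizer:
        return self._by_name[name]


def control(areas: List[StringArea], knowledge: Dict[str, str], default: str) -> List[str]:
    """Control unit 108: choose a recognizer name for each area from knowledge 107."""
    return [knowledge.get(area.element, default) for area in areas]


def apply_recognizers(areas: List[StringArea], names: List[str],
                      manager: RecognizerManager) -> List[str]:
    """Applying unit 110: run each chosen recognizer on its area (result b111)."""
    results = []
    for area, name in zip(areas, names):
        recognize = manager.get(name)
        # A multi-line area is recognized line by line and the pieces concatenated.
        results.append("".join(recognize(rect) for rect in area.rects))
    return results
```

Keeping recognizer selection (`control`) separate from execution (`apply_recognizers`) mirrors the split between units 108 and 110, so the knowledge table can change without touching the recognizers themselves.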

【００２２】The processing of each component in FIG. 1 is described in detail below.

【００２３】FIG. 2 shows an example of document image 101 to be recognized. It is image information created by scanning a printed legal document; it consists mainly of Mincho printed characters, with some Gothic strings mixed in. FIG. 3 marks the Gothic and Mincho strings in the document image of FIG. 2: the boxed parts are Gothic strings and the rest are Mincho strings. That is, all strings on lines 1, 2, 3, 4, 15, and 17, together with "第１条" (Article 1) on line 5 and "第２条" (Article 2) on line 14, are Gothic, and all other strings are Mincho. In particular, lines 5 and 14 each mix Gothic and Mincho strings within a single line. Lines 4 and 6 have the same layout (the same leading indent) but use different fonts.

【００２４】Character recognition unit 102 recognizes all characters in the document image uniformly using the multi-font method, because per-character font information such as that shown in FIG. 3 is not available when character recognition is first performed. This recognition may be implemented by a single recognition means or by combining several. Character recognition of document image information is an existing technology; see, for example, Sun et al., "A Document Recognition System Using Highly Accurate Character Segmentation Based on Character Structure Information," IPSJ Transactions, Vol. 33, No. 9, pp. 1083-1091 (1992).

【００２５】FIG. 4 shows an example of character recognition result a103 obtained when character recognition unit 102 recognizes the document image of FIG. 2. As in FIG. 4, character recognition result a103 contains at least the binary relation between a "character area", representing the region of the document image where a character lies, and a "character code", the recognition result for the character in that area. FIG. 5 shows how character areas are written in FIG. 4. Each character is represented by the rectangle circumscribing it on the document image (a rectangle whose four sides are parallel to the x and y axes set for the image), recorded as the pair of its upper-left and lower-right corner coordinates. In the example of FIG. 5, the character area of the character "昭" is "(x1, y1)-(x2, y2)".
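
The character-area notation of FIG. 5 amounts to an axis-aligned bounding box. A minimal Python sketch, assuming image coordinates with y increasing downward and a character supplied as a set of black-pixel coordinates; the `CharResult` record layout is hypothetical, not a format defined in the patent:

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


def circumscribing_rect(pixels: Iterable[Tuple[int, int]]) -> Tuple[int, int, int, int]:
    """Axis-aligned rectangle enclosing all (x, y) pixel coordinates of one character."""
    pts = list(pixels)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return min(xs), min(ys), max(xs), max(ys)


@dataclass(frozen=True)
class CharResult:
    """One entry of character recognition result a103: character area plus code."""
    x1: int    # upper-left x
    y1: int    # upper-left y
    x2: int    # lower-right x
    y2: int    # lower-right y
    code: str  # recognized character

    def area_str(self) -> str:
        """Render the character area as '(x1, y1)-(x2, y2)', as in FIG. 5."""
        return f"({self.x1}, {self.y1})-({self.x2}, {self.y2})"
```

For example, `CharResult(*circumscribing_rect([(3, 5), (10, 2), (7, 9)]), code="昭").area_str()` yields `"(3, 2)-(10, 9)"`.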

【００２６】FIG. 6 shows the characters corresponding to the character codes in character recognition result a103, each placed at its original position. The circled characters in FIG. 6 are misrecognized. As stated at the beginning of this embodiment, the multi-font method is assumed to have a recognition rate of about 96%, and FIG. 6 contains 12 misrecognized characters out of 312.
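
The figures above can be checked with simple arithmetic. In the small Python sketch below, the observed rate follows from the 12-in-312 count of FIG. 6. The second number is only the error count one would expect if the font-specific method's assumed 99% rate from [0017] applied to the same 312 characters; it is not a result reported in the patent:

```python
def recognition_rate(total_chars: int, misrecognized: int) -> float:
    """Fraction of characters recognized correctly."""
    return (total_chars - misrecognized) / total_chars


def expected_errors(total_chars: int, rate: float) -> float:
    """Expected number of misrecognized characters at a given recognition rate."""
    return total_chars * (1.0 - rate)


observed = recognition_rate(312, 12)        # 300 / 312, roughly 0.96
font_specific = expected_errors(312, 0.99)  # roughly 3.1 characters
print(f"multi-font rate: {observed:.4f}")
print(f"expected errors at 99%: {font_specific:.1f}")
```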

【００２７】Next, logical structure recognition unit 104 performs logical structure recognition on character recognition result a103. In logical structure recognition, a logical structure such as that in FIG. 7 is defined in advance for each document type, and each partial character string of an input document is analyzed to determine what it means logically within the document, that is, which component (hereinafter an "element") of the structure in FIG. 7 it corresponds to. The analysis uses information such as the positions of the characters and the character codes of the recognition result.

【００２８】FIG. 7 is an example of the logical structure defined for the legal document of FIG. 2. In this example, the whole document corresponds to the element "条例" (ordinance), whose content consists of the sub-elements "題名" (title), "公布" (promulgation), "本則" (main provisions), and "附則" (supplementary provisions). The "promulgation" element in turn consists of sub-elements such as "公布年月日" (promulgation date), "例規番号" (regulation number), and "公布文" (promulgation text). In the figure, a "?" mark means that the element may or may not appear in an actual document, and a "+" mark means that the element may appear more than once. An element whose content is "#PCDATA" contains character string data itself rather than sub-elements. Such elements are called "character string corresponding elements".
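
The "?", "+", and "#PCDATA" notation of FIG. 7 resembles an SGML/XML DTD content model. To illustrate what these marks constrain, the Python sketch below compiles a sequence model into a regular expression over child-element names. The English element names, and the choice of which elements carry "?" or "+", are assumptions for illustration only, since FIG. 7 itself is not reproduced here:

```python
import re
from typing import Dict, List

# Hypothetical content models loosely following FIG. 7; marks are assumed.
CONTENT_MODELS: Dict[str, List[str]] = {
    "ordinance": ["title", "promulgation", "main_provisions", "supplementary?"],
    "promulgation": ["promulgation_date", "regulation_number", "promulgation_text"],
    "main_provisions": ["article+"],
}


def model_to_regex(model: List[str]) -> "re.Pattern[str]":
    """Compile tokens like 'x', 'x?' (optional), 'x+' (repeatable) into a regex."""
    parts = []
    for token in model:
        if token[-1] in "?+":
            name, suffix = token[:-1], token[-1]
        else:
            name, suffix = token, ""
        # Each child name is followed by a space so 'article' cannot match a prefix.
        parts.append(f"(?:{re.escape(name)} ){suffix}")
    return re.compile("".join(parts))


def children_match(parent: str, children: List[str]) -> bool:
    """Check whether a sequence of child element names satisfies the parent's model."""
    pattern = model_to_regex(CONTENT_MODELS[parent])
    return pattern.fullmatch("".join(c + " " for c in children)) is not None
```

Under these assumed models, an ordinance with or without supplementary provisions is accepted, while one whose title comes after the promulgation is rejected.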

【００２９】Logical structure recognition unit 104 analyzes which character string corresponding element in the structure of FIG. 7 each partial string of character recognition result a103, as illustrated in FIG. 6, corresponds to. Then, from the sequence of character string corresponding elements, it builds an element tree rooted at the "ordinance" element, as shown in FIG. 7. Logical structure recognition of character recognition results is an existing technology, described for example in Japanese Patent Application No. H7-223017, previously filed by the present applicant.

【００３０】Misrecognized characters in the character recognition result can cause logical structure recognition to fail. Re-recognition by re-character recognition unit 106 is applied only to documents whose logical structure was recognized successfully. However, Japanese Patent Application Laid-Open No. H7-257431, for example, proposes a method that makes logical structure recognition succeed even on recognition results containing misrecognized characters, by repeating the procedure of manually correcting the characters that caused the failure and performing logical structure recognition again.
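
The control flow implied here can be sketched as below: re-recognition runs only after logical structure recognition succeeds, optionally preceded by the manual-correction loop of JP H7-257431. All callables are hypothetical placeholders, a failed parse is assumed to return `None`, and both the round limit and the fallback to the (possibly corrected) first-pass result are assumptions not stated in the patent:

```python
from typing import Callable, List, Optional, TypeVar

Chars = List[str]   # hypothetical first-pass result: recognized characters
S = TypeVar("S")    # whatever the logical structure parser returns


def recognize_document(
    first_pass: Chars,
    parse_structure: Callable[[Chars], Optional[S]],
    correct_manually: Callable[[Chars], Chars],
    rerecognize: Callable[[S], Chars],
    max_rounds: int = 3,
) -> Chars:
    chars = first_pass
    for _ in range(max_rounds):
        structure = parse_structure(chars)
        if structure is not None:
            # Structure recognized: run the element-specific second pass.
            return rerecognize(structure)
        # Structure not recognized: an operator fixes the offending characters.
        chars = correct_manually(chars)
    # Still no structure: skip re-recognition and keep the (corrected) first pass.
    return chars
```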

【００３１】FIG. 8 shows an example of logical structure recognition result 105 generated by logical structure recognition unit 104. The result consists of partial strings of character recognition result a103, information representing the area each partial string occupies on the document image (the character string area), and the name of the character string corresponding element assigned to each partial string. Of these three items, later processing uses only two: the information representing the character string area and the corresponding element. Logical structure recognition result 105 therefore needs to contain at least the binary relation between character string area information and element names.

【００３２】FIG. 9 shows how character string areas are derived and written. As shown in FIG. 9(a), a character string area is generated as the rectangle circumscribing the character areas of the characters that make up the string, and is written as the pair of that rectangle's upper-left and lower-right coordinates. If a string spans several lines, a character string area is extracted for each line and the whole area is expressed as their combination. In FIG. 9(b), for example, the character string area corresponding to one element spans two lines. In that case, the rectangle circumscribing the character areas on each line is written as its upper-left and lower-right coordinates, and the rectangles are listed separated by "," to express their combination.
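
The per-line circumscribing-rectangle rule of FIG. 9 can be sketched in Python as follows. The sketch assumes that each character already carries the index of the text line it was found on during the first recognition pass; the patent does not say how lines are identified:

```python
from itertools import groupby
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x1, y1, x2, y2): upper-left and lower-right corners


def union_rect(rects: List[Rect]) -> Rect:
    """Smallest axis-aligned rectangle circumscribing all given rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def string_area(chars: List[Tuple[int, Rect]]) -> List[Rect]:
    """One circumscribing rectangle per text line, in line order.

    `chars` holds (line_no, char_rect) pairs for the characters of one string;
    the line number is assumed to come from the first recognition pass.
    """
    ordered = sorted(chars, key=lambda c: c[0])  # stable: keeps in-line order
    return [union_rect([rect for _, rect in group])
            for _, group in groupby(ordered, key=lambda c: c[0])]


def format_area(rects: List[Rect]) -> str:
    """Render rectangles in the FIG. 9 notation '(x1,y1)-(x2,y2),...'."""
    return ",".join(f"({x1},{y1})-({x2},{y2})" for x1, y1, x2, y2 in rects)
```

With invented character boxes, two on line 5 at (240, 240, 290, 290) and (2350, 240, 2400, 290) and two on line 6 at (60, 300, 110, 350) and (810, 300, 860, 350), `format_area(string_area(...))` gives `(240,240)-(2400,290),(60,300)-(860,350)`, the two-rectangle form quoted for character string area 1203 in [0036].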

【００３３】Next, the processing of re-character recognition unit 106 is described in detail. FIG. 10 shows an example of element-recognition means correspondence knowledge 107. This information is set in advance according to the type of document to be recognized and is a set of binary relations between elements of the logical structure and the recognition means best suited to recognizing the character strings corresponding to each element. In FIG. 10, for example, the "Gothic recognition means" is applied to the characters of the element "題名" (title) (1001), and the "Mincho recognition means" is applied to the characters of the element "条規定" (article provision) (1002).

【００３４】Control unit 108 refers to element-recognition means correspondence knowledge 107 and assigns to each character string area in logical structure recognition result 105 a recognition means appropriate to its element. It does so using the binary relations between character string areas and element names in logical structure recognition result 105, and the binary relations between element names and recognition means in knowledge 107. Let "area A" be one of the character string areas produced by logical structure recognition. First, the element name for area A is obtained from the area/element-name relation that contains area A. Next, knowledge 107 is searched for the element-name/recognition-means relation that contains that element name. The recognition means in the relation found is assigned as the means for recognizing area A.
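
The two-step lookup for area A is effectively a join of the two binary relations on the element name. A Python sketch follows. It assumes that knowledge 107 holds one recognizer per element, so it can be represented as a dict. The element and recognizer names are hypothetical stand-ins for "題名", "Gothic recognition means", and so on:

```python
from typing import Dict, List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x1, y1, x2, y2)
Area = Tuple[Rect, ...]           # one rectangle per line of the string


def assign_recognizers(
    area_element: List[Tuple[Area, str]],
    element_recognizer: Dict[str, str],
) -> Dict[Area, Optional[str]]:
    """Join the two binary relations on element name.

    area_element:       (character string area, element name) pairs from result 105
    element_recognizer: element name -> recognition means, standing in for knowledge 107
    An element with no entry maps to None; the patent leaves this case open.
    """
    assignment: Dict[Area, Optional[str]] = {}
    for area, element in area_element:                      # step 1: area -> element
        assignment[area] = element_recognizer.get(element)  # step 2: element -> recognizer
    return assignment
```

With the FIG. 11 example, `assign_recognizers([(((780, 0, 1380, 50),), "title")], {"title": "gothic_recognizer"})` returns `{((780, 0, 1380, 50),): "gothic_recognizer"}`.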

[0035] For example, in FIG. 11 the element name "title" corresponds to the character string area "(780,0)-(1380,50)" in the logical structure recognition result. To assign a recognition means to this area, the element-recognition means correspondence knowledge is searched for the binary relation containing the element name "title". The recognition means in the relation found, namely the "Gothic type recognition means", is then assigned as the means for recognizing the character string area "(780,0)-(1380,50)".
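The two-step search described in [0034] and [0035] can be sketched as follows. This is a minimal illustration, not the patent's implementation: binary relations are assumed to be lists of Python tuples, a character string area is assumed to be a pair of corner coordinates (x1, y1)-(x2, y2), and the function name `assign_means` is hypothetical. Only the "title" / "Gothic type recognition means" pair and the area coordinates come from the text.

```python
from typing import List, Optional, Tuple

# Assumed representation of a character string area: ((x1, y1), (x2, y2)).
Region = Tuple[Tuple[int, int], Tuple[int, int]]

def assign_means(region: Region,
                 region_element: List[Tuple[Region, str]],
                 element_means: List[Tuple[str, str]]) -> Optional[str]:
    """Return the recognition means for `region`, or None if no relation matches."""
    # Step 1: find the element name paired with this area in the
    # logical structure recognition result.
    element = next((e for r, e in region_element if r == region), None)
    if element is None:
        return None
    # Step 2: search the element-recognition means knowledge for the
    # relation that contains that element name.
    return next((m for e, m in element_means if e == element), None)

# Data from the FIG. 11 example in [0035].
title_area: Region = ((780, 0), (1380, 50))
structure_result = [(title_area, "title")]
knowledge = [("title", "Gothic type recognition means")]

print(assign_means(title_area, structure_result, knowledge))
# prints: Gothic type recognition means
```

A linear search mirrors the wording "search the knowledge for the relation containing the element name"; if each element maps to exactly one recognition means, a dictionary would give the same result with constant-time lookup.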

[0036] Applying this procedure to every character string area assigns an appropriate recognition means to each one. FIG. 12 shows an example of the recognition means assigned, as a result of this processing, to each character area in FIG. 8. In this example, the character string area 1202 and the first half of the character string area 1203 ("(240,240)-(2400,290)") belong to the same fifth line of the document image in FIG. 2, yet they are assigned different recognition means; that is, one line is divided into several fields.
Likewise, the character area 1201 corresponds to the fourth line of FIG. 2, and the second half of the character string area 1203 ("(60,300)-(860,350)") corresponds to the sixth line. Although these lines have the same layout (line-head indentation), they are assigned different recognition means.
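Because recognition means are chosen per element rather than per text line, one line can hold several fields and one character string area can span several lines. The sketch below assumes an area is a list of rectangles (left, top, right, bottom) and that a text line is a vertical band [top, bottom]. The two rectangles of area 1203 are taken from [0036]. The rectangle for area 1202, the element names `element_P` and `element_Q`, and their recognition means are hypothetical placeholders, because FIG. 12 is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Rect:
    left: int
    top: int
    right: int
    bottom: int

@dataclass
class StringRegion:
    label: str
    element: str
    rects: List[Rect] = field(default_factory=list)  # may span several lines

def fields_on_line(regions: List[StringRegion], means_for: Dict[str, str],
                   top: int, bottom: int) -> List[Tuple[str, str]]:
    """List (label, recognition means) for every area that has a rectangle
    vertically overlapping the text line band [top, bottom]."""
    return [(r.label, means_for[r.element])
            for r in regions
            if any(rect.top < bottom and rect.bottom > top for rect in r.rects)]

regions = [
    # Hypothetical rectangle and element for area 1202.
    StringRegion("1202", "element_P", [Rect(20, 240, 220, 290)]),
    # Rectangles of area 1203 as given in [0036].
    StringRegion("1203", "element_Q", [Rect(240, 240, 2400, 290),
                                       Rect(60, 300, 860, 350)]),
]
means_for = {"element_P": "Mincho type recognition means",  # hypothetical
             "element_Q": "Gothic type recognition means"}  # hypothetical

print(fields_on_line(regions, means_for, 240, 290))  # line 5: two fields
print(fields_on_line(regions, means_for, 300, 350))  # line 6: only part of 1203
```

Nothing in the lookup inspects indentation or line position, so two lines with identical layout still receive different means whenever their elements differ.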

[0037] The recognition means application unit 110 calls the recognition means selected by the control unit 108 from the recognition means management unit 109, applies it to each character string area, and generates the character recognition result b111. Each character string area is recognized with a font-specialized method matched to the font of that area. The recognition rate in each area can therefore be expected to be higher than with the multi-font method, and the recognition rate for the document as a whole can likewise be expected to exceed that of the character recognition result a103. FIG. 13 shows an example of the character recognition result b111 obtained when the character re-recognition unit 106 performs character recognition on the document image 101 illustrated in FIG. 2. As stated at the beginning, this embodiment assumes a recognition rate of about 99% for the font-specialized method; in FIG. 13, 3 of the 312 characters are misrecognized.
The number of misrecognized characters is lower than in the character recognition result a103 shown in FIG. 6.
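The application step and the recognition-rate arithmetic in [0037] can be sketched as below. The `registry` dictionary stands in for the recognition means management unit 109. The recognizers are placeholder functions (real ones would run font-specialized OCR on an image crop). The names `apply_recognizers` and `recognition_rate` are hypothetical. The counts of 312 characters and 3 errors come from the FIG. 13 example.

```python
from typing import Callable, Dict, List, Tuple

Recognizer = Callable[[str], str]  # placeholder: takes an image crop, returns text

def apply_recognizers(assignments: List[Tuple[str, str]],
                      registry: Dict[str, Recognizer],
                      crops: Dict[str, str]) -> Dict[str, str]:
    """Run the recognition means assigned to each area on that area's crop."""
    result_b = {}
    for region_id, means_name in assignments:
        recognizer = registry[means_name]  # fetch the selected means from the registry
        result_b[region_id] = recognizer(crops[region_id])
    return result_b

def recognition_rate(total_chars: int, misrecognized: int) -> float:
    """Fraction of characters recognized correctly."""
    if total_chars <= 0 or not 0 <= misrecognized <= total_chars:
        raise ValueError("invalid character counts")
    return (total_chars - misrecognized) / total_chars

# Placeholder recognizers that only tag their input.
registry: Dict[str, Recognizer] = {
    "Gothic type recognition means": lambda crop: f"gothic:{crop}",
    "Mincho type recognition means": lambda crop: f"mincho:{crop}",
}
assignments = [("r1", "Gothic type recognition means"),
               ("r2", "Mincho type recognition means")]
crops = {"r1": "title-image", "r2": "body-image"}

print(apply_recognizers(assignments, registry, crops))
print(f"{recognition_rate(312, 3):.2%}")  # FIG. 13: 3 errors in 312 characters
```

With 3 errors in 312 characters the rate is 309/312, about 99.04%, which agrees with the roughly 99% assumed for the font-specialized method.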

[0038] Another example of the document recognition method is described below. It differs from the method described above only in the information referred to by the control unit 108 in the character re-recognition unit 106. FIG. 14 shows a configuration example of the character re-recognition unit 106 for this method.

[0039] The element-character feature type correspondence knowledge 1401 is a set of binary relations. Each relation pairs the element name of an element that corresponds to a character string with the feature type, in the document image, of the characters that express that element's content. The character feature type-recognition means correspondence knowledge 1402 is a set of binary relations, each pairing a character string feature type with a recognition means suited to recognizing characters of that type. It is one example of a means that derives, from a character feature type, a recognition means suited to that feature.
The element-character feature type correspondence knowledge 1401 must be changed for each type of target document. The character feature type-recognition means correspondence knowledge 1402, by contrast, needs updating only when the recognition means managed by the recognition means storage unit 109 change.
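Splitting the knowledge into two tables means that only the element-feature table has to be rewritten for a new document type, while the feature-means table is shared. The following minimal sketch assumes both tables are plain dictionaries. The chain "registration number" / "pattern B" / "alphanumeric recognition means" comes from [0040]. The feature type "pattern G", the "title" entry, the second document type, and its element "order code" are hypothetical. Composing the two tables yields an ordinary element-to-means table like the one used by the first method.

```python
from typing import Dict

# Shared table: changes only when the recognizers in unit 109 change.
feature_to_means: Dict[str, str] = {
    "pattern B": "alphanumeric recognition means",
    "pattern G": "Gothic type recognition means",  # hypothetical feature type
}

# One table per document type: must be prepared for each kind of document.
element_to_feature_by_doc: Dict[str, Dict[str, str]] = {
    "doc_type_1": {"registration number": "pattern B",
                   "title": "pattern G"},          # "title" entry hypothetical
    "doc_type_2": {"order code": "pattern B"},     # hypothetical document type
}

def compose(element_to_feature: Dict[str, str],
            feature_to_means: Dict[str, str]) -> Dict[str, str]:
    """Chain element -> feature type -> recognition means into one table.
    Elements whose feature type has no known means are left out."""
    return {element: feature_to_means[feature]
            for element, feature in element_to_feature.items()
            if feature in feature_to_means}

for doc_type, table in element_to_feature_by_doc.items():
    print(doc_type, compose(table, feature_to_means))
```

Adding a recognizer touches only `feature_to_means`, and supporting a new document type touches only `element_to_feature_by_doc`. This mirrors the maintenance split described in [0039].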

[0040] FIG. 15 shows an example of the processing in which the control unit 108 assigns an appropriate recognition means to each character string area in the logical structure recognition result 105. In the logical structure recognition result 105, the element name "registration number" corresponds to the character string area "(20,0)-(100,20)". The element-character feature type correspondence knowledge 1401 is searched for the binary relation containing the element name "registration number" (1501). This shows that the characters in the area "(20,0)-(100,20)" have the feature "pattern B".
The character feature type-recognition means correspondence knowledge 1402 is then searched for the binary relation containing the character feature type "pattern B" (1502). As a result, the recognition means called "alphanumeric recognition means" is assigned to the character string area "(20,0)-(100,20)".
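The two searches labeled 1501 and 1502 in FIG. 15 can be written as two successive lookups over lists of binary relations. The following minimal sketch assumes relations are lists of tuples and that an area is a pair of corner coordinates (x1, y1)-(x2, y2). The function names `lookup` and `derive_means`, and the use of `KeyError` for a missing relation, are hypothetical choices. The area, element, feature type, and recognition means are those of the [0040] example.

```python
from typing import List, Tuple

# Assumed representation of a character string area: ((x1, y1), (x2, y2)).
Region = Tuple[Tuple[int, int], Tuple[int, int]]

def lookup(relations: List[Tuple[str, str]], key: str, what: str) -> str:
    """Return the partner of `key` in a list of binary relations."""
    for left, right in relations:
        if left == key:
            return right
    raise KeyError(f"no {what} relation for {key!r}")

def derive_means(region: Region,
                 region_element: List[Tuple[Region, str]],
                 element_feature: List[Tuple[str, str]],
                 feature_means: List[Tuple[str, str]]) -> str:
    element = dict(region_element)[region]                 # logical structure result 105
    feature = lookup(element_feature, element, "feature")  # search 1501 (knowledge 1401)
    return lookup(feature_means, feature, "means")         # search 1502 (knowledge 1402)

area: Region = ((20, 0), (100, 20))
print(derive_means(area,
                   [(area, "registration number")],
                   [("registration number", "pattern B")],
                   [("pattern B", "alphanumeric recognition means")]))
# prints: alphanumeric recognition means
```

Raising an error when a relation is missing makes gaps in either knowledge table visible. A real system might instead fall back to the multi-font result a103 for that area.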

[0041] Applying the above processing to all character string areas assigns an appropriate recognition means to each area, also in the configuration in which the character re-recognition unit 106 is modified as shown in FIG. 14. The recognition means application unit 110 then executes the recognition means assigned to each character string area. This yields a character recognition result b111 that is more accurate than the character recognition result a103 in FIG. 1.

[0042]

EFFECTS OF THE INVENTION

As described above, according to the present invention, an appropriate recognition means is associated with each element in advance and applied to each character string area delimited by the logical structure recognition result. The field division that sets the area assigned to each recognition means can therefore be performed automatically.
Because the field division is based on elements, it copes with fields whose physical position and number vary from document to document. It can also divide a single line into several fields, and it can assign different recognition means to lines with the same layout according to their content.

[Brief description of the drawings]

FIG. 1 is a block diagram illustrating the configuration of a document recognition method according to an embodiment of the present invention (corresponding to claims 1 and 2).

FIG. 2 is a diagram illustrating an example of a document image.

FIG. 3 is a diagram distinguishing the Gothic character strings from the Mincho character strings in the document shown in FIG. 2.

FIG. 4 is an example in which the result of character recognition by the multi-font method is expressed as the correspondence between character codes and character areas.

FIG. 5 is a diagram showing how a character area is represented.

FIG. 6 is a diagram illustrating misrecognized characters in an example recognition result of the multi-font method.

FIG. 7 is a diagram illustrating an example definition of the logical structure set for the document shown in FIG. 1.

FIG. 8 is a diagram showing an example of a logical structure recognition result.

FIG. 9 is a diagram showing how a character string area is derived and represented.

FIG. 10 is an example of element-recognition means correspondence knowledge.

FIG. 11 is an example of the process of assigning a recognition means to a character string area according to the element name corresponding to that area.

FIG. 12 is an example of the result of assigning recognition means to character string areas.

FIG. 13 is a diagram illustrating an example result of character recognition by the font-specialized method.

FIG. 14 is a diagram in which the configuration of the character re-recognition unit is modified so that the configuration of FIG. 1 corresponds to claims 1 and 3.

FIG. 15 is an example of the process in which the control unit in the configuration of FIG. 14 assigns an appropriate recognition means to each character string area in the logical structure recognition result.

Claims


1. A document recognition apparatus comprising: character recognition means for extracting characters from image information of a document to be recognized and recognizing the extracted characters; logical structure recognition means for performing logical structure recognition on the character recognition result and generating a logical structure recognition result that includes at least a binary relation between a region of the image information corresponding to a partial character string (a character string region) and the type of a logical element of the document to be recognized (a logical structure element); and character re-recognition means for deriving, for each character string region in the logical structure recognition result, a character recognition means suitable for that region, using as at least one criterion the type of the logical structure element associated with the region in the logical structure recognition result, and performing character recognition again.

2. The document recognition apparatus according to claim 1, wherein the character re-recognition means refers to at least a pre-created binary relation between types of logical structure elements and recognition means suited to the features of the characters expected to appear in the character string region corresponding to each type of logical structure element, thereby assigns an appropriate recognition means to each character string region in the image information to be recognized, and performs character recognition again.

3. The document recognition apparatus according to claim 1, wherein the character re-recognition means includes a pre-created binary relation between types of logical structure elements and information indicating the type of character feature expected to appear in the character string region corresponding to each type of logical structure element, and means for deriving, from a type of character feature, a recognition means suited to that feature, and thereby assigns an appropriate recognition means to each character string region and performs character recognition again.