JP2008033960A

JP2008033960A - Apparatus and method for document image processing

Info

Publication number: JP2008033960A
Application number: JP2007260130A
Authority: JP
Inventors: Hiroshi Kamata; 洋鎌田; Katsuto Fujimoto; 克仁藤本; Koji Kurokawa; 浩司黒川
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2007-10-03
Filing date: 2007-10-03
Publication date: 2008-02-14
Anticipated expiration: 2017-11-18
Also published as: JP4589370B2

Abstract

PROBLEM TO BE SOLVED: To reduce the user's workload when checking and correcting the results of document image recognition and editing, and to enable efficient operation. SOLUTION: A document image input from an input part 21 undergoes area identification by an identification part 22 and recognition by a recognition part 23. The type code of each area and the individual codes of the recognition results are stored in a storage part 24 and displayed on a display part 25. The user inputs a correction instruction through a correction part 26 and thereby corrects the results of both area identification and recognition in a single operation. In addition, the original image is displayed near the recognition results on the display part 25, which reduces movement of the user's line of sight. If the recognition candidates do not include the correct answer, a code is assigned to the original image so that the image itself is treated as the recognition result; the correction can thus be made while keeping the result editable. COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a document image processing apparatus that takes a document image as input, determines the images of document image components such as characters, tables, figures, and frames, and encodes those components by recognizing them.

With the spread of personal computers and the development of communication networks in recent years, electronic documents have come into wide circulation. However, paper documents are still the main medium of information distribution, and a large number of existing paper documents remain. Demand is therefore growing for document image recognition/editing apparatuses that convert paper documents into electronic documents and edit the conversion results.

A document image recognition/editing apparatus takes a document image as input, determines the images of document image components such as characters, tables, figures, and frames, and encodes those components by recognizing them. In the encoding process, character images in particular are converted into character codes.

However, recognition in a document image processing apparatus never reaches a 100% accuracy rate, so the handling of incorrect recognition results is an issue. In particular, a mechanism that allows efficient correction is desired.

FIG. 38 is a block diagram of a conventional document image recognition/editing apparatus. A document image input unit 1 inputs the document image to be processed, and an area identification unit 2 identifies the individual areas in the image and stores the result in an area identification result storage unit 3. A display unit 8 then displays the area identification result on the screen, and the user corrects it as necessary. At this time, the data in the area identification result storage unit 3 is corrected by a first correction unit 6.

Next, an individual area recognition unit 4 recognizes the characters in each individual area and stores the result in a recognition result storage unit 5. The display unit 8 then displays the recognition result on the screen, and the user corrects it as necessary. At this time, the data in the recognition result storage unit 5 is corrected by a second correction unit 7.

In such a document image recognition/editing apparatus, recognition results whose accuracy rate falls short of 100% are handled and corrected as follows.
(1) In the area identification process performed by the area identification unit 2, the attribute of the document image component in each individual area (text, table, figure, frame, and so on) is determined, with correction if necessary. The individual area recognition unit 4 then recognizes the individual document image components according to that attribute. For a text area, the individual character images are determined and character recognition is performed. For a table area, ruled lines are extracted, the character area in each cell is determined, and character recognition is performed. The recognition results are corrected as necessary.

(2) As shown in FIG. 39, the result of character recognition includes a sequence of candidate character codes arranged in descending order of likelihood. The first-ranked candidate character code is the initial value of the recognition result. The second correction unit 7 displays the candidate character codes ranked second and lower, and the user can select one of them. While a character recognition result is being corrected, the corresponding character image is displayed at its original position P1 in the input image.

However, the conventional document image recognition/editing apparatus has the problem that correcting recognition results takes a great deal of labor, as described below.
(1) Conventional document image processing consists of two stages, area identification and intra-area recognition, and each stage includes correction by the user. The user must therefore perform two separate correction operations, which is cumbersome. Moreover, even when no identification error occurs at the area identification stage, the user must still check for one. If this check is skipped, an identification error cannot be corrected after intra-area recognition. In that case, obtaining a correct result requires restarting the processing from the beginning and correcting the identification error at the area identification stage.

(2) As shown in FIG. 39, the recognition result display for a document image component contains only code information. To let the user check whether a character recognition result is correct, the conventional apparatus therefore marks the position P1 of the corresponding document image component in the input image, for example by enclosing it in a frame, when a character is designated in the recognition result display. However, comparing the code information in the recognition result display with the character image in the input image requires large movements of the user's line of sight, and this collation work is a burden on the user.

In addition, when a correction is made by selecting among candidate character codes, the candidates may not include the correct character. In that case the correct character code must be entered from scratch, and this input work is a burden on the user.

An object of the present invention is to provide a document image processing apparatus and method that reduce the user's burden and enable efficient operation when checking and correcting the processing results of a document image recognition/editing apparatus.

FIG. 1 is a configuration diagram of the document image processing apparatus of the present invention. The document image processing apparatus of FIG. 1 includes identification means 11, recognition means 12, output means 13, correction means 14, extraction means 15, code addition means 16, and editing means 17, and performs recognition processing on an input image.

According to the first principle of the present invention, the identification means 11, the recognition means 12, the output means 13, and the correction means 14 operate as follows.
The identification means 11 identifies a pattern area of the input image and determines the type of the pattern area. The recognition means 12 performs recognition processing on the patterns contained in the pattern area. The output means 13 outputs type information indicating the type of the pattern area and individual information indicating the pattern as recognition result candidates for the image components that make up the input image. The correction means 14 corrects the recognition result candidates.

A pattern area is an image area contained in the input image, such as text, a table, a figure, an enclosing frame, or a character, and one pattern area may contain other pattern areas. For example, a text pattern area usually consists of the pattern areas of multiple characters. An image component is a partial image of the input image and corresponds to a pattern area or to a pattern within a pattern area.

The identification means 11 determines whether the type of the pattern area to be recognized is text, table, figure, enclosing frame, character, or another type. The recognition means 12 performs recognition processing such as character recognition and ruled-line recognition on pattern areas that have an internal structure, such as text, tables, and enclosing frames.

The output means 13 then outputs, all at once, the type information of the pattern area and individual information representing the recognized pattern, such as a character font, as recognition result candidates. While viewing the output, the user can use the correction means 14 to correct the type information and the individual information together.

Thus, according to the first principle, area identification and intra-area recognition of a document image are performed together, and their results can be corrected together. The conventional two-stage correction is no longer necessary, and the user's correction burden is reduced.

According to the second principle of the present invention, the output means 13, the extraction means 15, the code addition means 16, and the editing means 17 operate as follows.
The extraction means 15 extracts from the input image the image components that make it up. The code addition means 16 assigns new code information to an image component. The output means 13 outputs document information in which image data corresponding to image components is mixed with character patterns corresponding to existing code information. The editing means 17 edits the document information using the new code information and the existing code information.

Assigning code information to an image component extracted from the input image makes it possible to handle that component in the same way as a character pattern corresponding to existing code information. A document in which partial images of the input image are mixed with character patterns given as code information can therefore be displayed and edited.

According to the second principle, the code information assigned to an image component can also be used to display the original image near the character recognition result candidates, which reduces the viewpoint movement needed to compare the recognition result with the input image.

According to the third principle of the present invention, the recognition means 12, the output means 13, and the extraction means 15 operate as follows.
The extraction means 15 extracts from the input image the image components that make it up. The recognition means 12 performs recognition processing on an image component. The output means 13 separates the image data corresponding to the image component from the input image and outputs it together with one or more candidates from the recognition result for that image component.

According to the third principle, the image of an image component extracted from the input image can be displayed on the screen near its recognition result candidates, which reduces the viewpoint movement needed to compare the recognition result with the input image. Furthermore, when the recognition result candidates do not include the correct answer, the correction can be made by selecting the original image, so there is no need to re-enter a character code for the correction.

For example, the identification means 11, the recognition means 12, and the correction means 14 in FIG. 1 correspond respectively to the area identification unit 22, the individual area recognition unit 23, and the correction unit 26 in FIG. 2, described later. The extraction means 15, the code addition means 16, and the editing means 17 in FIG. 1 correspond respectively to the document image component extraction unit 42, the code addition unit 43, and the editing unit 49 in FIG. 15, described later. Also, for example, the output means 13 in FIG. 1 corresponds to the display unit 25 in FIG. 2 and the display unit 46 in FIG. 15.

According to the present invention, area identification and intra-area recognition of a document image are performed together and their results can be corrected together, which reduces the user's correction burden. In the conventional two-stage correction, the user was tied up between the first-stage and second-stage corrections; this waiting time is eliminated.

In addition, because the original image is displayed as one of the recognition result candidates in intra-area recognition, the viewpoint movement needed to compare the recognition result with the input image is reduced, and the user's burden is lightened. Furthermore, when the recognition result candidates do not include the correct answer, the correction can be made by selecting the original image, so there is no need to re-enter a character code for the correction.

Embodiments of the present invention are described in detail below with reference to the drawings.
In the present invention, the following measures are taken to address the conventional problems (1) and (2) described above.

(1) The identification result candidates from area identification are retained through the intra-area recognition stage, so that the identification result candidates and the intra-area recognition result candidates can be selected and corrected at the same time. Area identification and intra-area recognition can thus be performed together, and the processing results can be corrected together.

The document image processing apparatus takes a document image as input and identifies areas such as text, tables, figures, and enclosing frames. For areas with an internal structure, such as text, tables, and enclosing frames, it performs recognition inside the area, such as character recognition and ruled-line recognition. The recognition result candidate codes of a document image component, which may be an area or an area nested inside another area, include both area type codes such as "text", "table", "figure", and "frame" and individual codes such as character codes. This allows the results of area identification and intra-area recognition to be corrected together.

Correcting a recognition code requires a means of designating a document image component. A document image component is a partial image of the document image and means an element, such as a character, that makes up the document image. A document image component is normally designated by a coordinate position within the document image. For this purpose, the area of a document image component is defined by one of the following methods (a) and (b).
(a) The area of a document image component is defined as the pixel area corresponding to the characters or graphics of the component.
(b) The area of a document image component is defined as the interior of the circumscribed rectangle of the characters or graphics of the component.
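
The difference between definitions (a) and (b) can be illustrated with a minimal Python sketch. The pixel-set representation and the names `circumscribed_rectangle`, `in_pixel_region`, and `in_rectangle` are assumptions made for illustration; they do not appear in the specification.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]            # (x, y) of one foreground pixel
Rect = Tuple[int, int, int, int]   # (left, top, right, bottom), inclusive


def circumscribed_rectangle(pixels: Set[Pixel]) -> Rect:
    """Definition (b): smallest axis-aligned rectangle enclosing the pixels."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return (min(xs), min(ys), max(xs), max(ys))


def in_pixel_region(pixels: Set[Pixel], x: int, y: int) -> bool:
    """Definition (a): the point must fall on a pixel of the component."""
    return (x, y) in pixels


def in_rectangle(rect: Rect, x: int, y: int) -> bool:
    """Definition (b): the point only has to fall inside the rectangle."""
    left, top, right, bottom = rect
    return left <= x <= right and top <= y <= bottom


# Hypothetical L-shaped stroke, used only to exercise the functions.
stroke = {(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)}
rect = circumscribed_rectangle(stroke)
```

With this stroke, the point (2, 0) lies inside the circumscribed rectangle but not on any stroke pixel, so it designates the component under definition (b) but not under definition (a).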

Given the document image component areas defined above and a designated coordinate position within the document image, the following methods (a), (b), and (c) are conceivable for the initial designation of a document image component.
(a) The innermost document image component containing the designated coordinates is taken as the designated component.
(b) The largest document image component containing the designated coordinates is taken as the designated component.
(c) The document image component whose circumscribing frame is closest to the designated coordinates is taken as the designated component.
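
The three designation methods can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each component is represented by its circumscribed rectangle, "innermost" and "largest" are decided by rectangle area, and "closest frame" is measured as the distance from the point to the rectangle outline. The class `Component` and the `pick_*` functions are hypothetical names, not part of the specification.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Component:
    """A document image component, represented by its circumscribed rectangle."""
    name: str
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom

    def area(self) -> int:
        return (self.right - self.left) * (self.bottom - self.top)

    def frame_distance(self, x: int, y: int) -> float:
        """Distance from (x, y) to the rectangle outline (assumed metric)."""
        dx = max(self.left - x, 0, x - self.right)
        dy = max(self.top - y, 0, y - self.bottom)
        if dx or dy:  # point lies outside the rectangle
            return math.hypot(dx, dy)
        # point lies inside: distance to the nearest edge
        return float(min(x - self.left, self.right - x,
                         y - self.top, self.bottom - y))


def pick_innermost(comps: List[Component], x: int, y: int) -> Optional[Component]:
    """Method (a): smallest component containing the point."""
    hits = [c for c in comps if c.contains(x, y)]
    return min(hits, key=lambda c: c.area()) if hits else None


def pick_largest(comps: List[Component], x: int, y: int) -> Optional[Component]:
    """Method (b): largest component containing the point."""
    hits = [c for c in comps if c.contains(x, y)]
    return max(hits, key=lambda c: c.area()) if hits else None


def pick_nearest_frame(comps: List[Component], x: int, y: int) -> Optional[Component]:
    """Method (c): component whose frame is closest to the point."""
    return min(comps, key=lambda c: c.frame_distance(x, y)) if comps else None


# Hypothetical layout: a character inside a text line inside a table.
layout = [
    Component("table", 0, 0, 100, 100),
    Component("text", 10, 10, 90, 30),
    Component("character", 12, 12, 20, 28),
]
```

For the point (15, 20), method (a) selects the character and method (b) selects the table. For a point outside every rectangle, such as (200, 200), only method (c) returns a component.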

With methods (a) and (b) above, some document image components may be impossible to designate. Such components, including in the case of method (c) above, can be designated through their relationship to an already designated document image component, as in (a) and (b) below.
(a) The innermost document image component containing the already designated document image component is taken as the designated component.
(b) Among the document image components inside the already designated document image component, the largest one containing the designated coordinates is taken as the designated component.
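
Designation relative to an already designated component amounts to moving one level outward or inward in the nesting of components. The following Python sketch illustrates this under the assumption that components are stored as named circumscribed rectangles; the functions `enclosing_component` and `largest_inside` are hypothetical names.

```python
from typing import Dict, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def area(r: Rect) -> int:
    return (r[2] - r[0]) * (r[3] - r[1])


def contains_rect(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def contains_point(r: Rect, x: int, y: int) -> bool:
    return r[0] <= x <= r[2] and r[1] <= y <= r[3]


def enclosing_component(comps: Dict[str, Rect], current: str) -> Optional[str]:
    """Method (a): innermost component that contains the current one."""
    cur = comps[current]
    outer = [n for n, r in comps.items()
             if n != current and contains_rect(r, cur)]
    return min(outer, key=lambda n: area(comps[n])) if outer else None


def largest_inside(comps: Dict[str, Rect], current: str,
                   x: int, y: int) -> Optional[str]:
    """Method (b): largest component inside the current one containing (x, y)."""
    cur = comps[current]
    inner = [n for n, r in comps.items()
             if n != current and contains_rect(cur, r) and contains_point(r, x, y)]
    return max(inner, key=lambda n: area(comps[n])) if inner else None


# Hypothetical layout: a character inside a text line inside a table.
layout = {
    "table": (0, 0, 100, 100),
    "text": (10, 10, 90, 30),
    "character": (12, 12, 20, 28),
}
```

Starting from the character, repeated use of method (a) reaches the text line and then the table. Starting from the table, method (b) with the point (15, 20) reaches the text line, a component that neither the innermost nor the largest rule could designate directly at that point.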

The recognition result candidate codes of a document image component are presented to the user, for example, as in (a) and (b) below.
(a) When the first-ranked recognition result candidate is "character", character recognition is performed. The candidate character codes from character recognition become the higher-ranked recognition result candidate codes, and the type codes of document image components other than "character", such as "table", "figure", and "frame", become the lower-ranked recognition result candidate codes.
(b) When the first-ranked recognition result candidate is a document image component type other than "character", such as "text", "table", "figure", or "frame", the type code of that document image component becomes the first-ranked recognition result candidate code, and the type codes of the other document image components become the lower-ranked recognition result candidate codes.
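
The two presentation rules can be sketched as one list-building function. This is a minimal Python illustration; the tuple representation, the fixed order of the remaining type codes, and the name `build_candidates` are assumptions, since the specification does not state how the lower-ranked type codes are ordered among themselves.

```python
from typing import List, Tuple

# ("code", character) for a character code, ("type", name) for a type code.
Candidate = Tuple[str, str]

# Assumed fixed order of type codes for the lower-ranked candidates.
TYPE_CODES = ["text", "table", "figure", "frame", "character"]


def build_candidates(top_type: str, char_candidates: List[str]) -> List[Candidate]:
    """Order the candidate codes shown to the user for one component."""
    if top_type == "character":
        # Rule (a): character codes first, then every non-character type code.
        codes = [("code", c) for c in char_candidates]
        types = [("type", t) for t in TYPE_CODES if t != "character"]
        return codes + types
    # Rule (b): the identified type first, then all the other type codes.
    others = [("type", t) for t in TYPE_CODES if t != top_type]
    return [("type", top_type)] + others
```

For a component identified as a character with recognition candidates "B" and "8", the list starts with the two character codes and ends with the type codes. For a component identified as a table, the list starts with "table" and still offers "character" as a lower-ranked alternative.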

The operation when a correction instruction is given is as in (a) and (b) below.
(a) When the "character" type code is selected as the correction among the recognition result candidate codes, character recognition is performed on the corresponding input image, and the "character" entry among the recognition result candidate codes is replaced with the one or more candidate character codes from the character recognition result.
(b) When a type code of a document image component other than "character", such as "table", "figure", or "frame", is selected as the correction among the recognition result candidate codes, the corresponding input image is recognized as a document image component of the selected type.
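
The behavior on a correction instruction can be sketched as follows. The recognizers are passed in as callables because the specification does not define their interfaces; `apply_correction`, `recognize_characters`, and `recognize_as` are hypothetical names, and leaving the candidate list unchanged in case (b) is an assumption.

```python
from typing import Callable, List, Tuple

# ("code", character) for a character code, ("type", name) for a type code.
Candidate = Tuple[str, str]


def apply_correction(
    candidates: List[Candidate],
    chosen: Candidate,
    recognize_characters: Callable[[], List[str]],
    recognize_as: Callable[[str], None],
) -> List[Candidate]:
    """Return the candidate list after the user selects `chosen`."""
    if chosen not in candidates:
        raise ValueError("chosen candidate is not in the list")
    kind, value = chosen
    if kind == "type" and value == "character":
        # Case (a): run character recognition and splice its candidate
        # character codes in where the "character" type code used to be.
        codes = [("code", c) for c in recognize_characters()]
        i = candidates.index(chosen)
        return candidates[:i] + codes + candidates[i + 1:]
    if kind == "type":
        # Case (b): re-recognize the image as the selected component type.
        recognize_as(value)
    # Assumption: the list itself is unchanged in case (b) and when a
    # character code is selected.
    return list(candidates)
```

A caller would supply its own recognizers, for example a function that runs character recognition on the selected image region and returns the candidate characters in rank order.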

(2) The character image is displayed inside the candidate character display area of the character recognition result, which reduces the viewpoint movement needed to compare the recognition result with the input image. The document image processing apparatus takes a document image as input, determines the images of document image components such as characters, tables, figures, and frames, and encodes those components by recognizing them. In the recognition result candidate display, the image of the document image component is displayed together with the recognition result candidate codes. This reduces the viewpoint movement needed to compare the recognition result with the input image.

In addition, the document image component image in the recognition result candidate display can itself be selected as a correction. When it is selected, a new code is assigned to the document image component image so that it can be edited. As a result, even when the recognition result candidate characters do not include the correct character, selecting the character image always yields a correct result that remains editable.

So that a document image component image selected as a correction can be edited under its new code, the display information of the new code is displayed on the display unit, which makes a document containing both existing codes and new codes editable. More generally, when a document image is taken as input and the image of a document image component is determined, a new code is assigned to the document image component image and the display information of the new code is displayed on the display unit, which makes a document containing both existing codes and new codes editable.
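
One way to picture a document that mixes existing codes and new codes is to give each image its own code point and keep a table from code point to image data. The Python sketch below is an illustration only: drawing new codes from the Unicode Private Use Area (U+E000 to U+F8FF) is an assumption chosen for the example and is not stated in the specification, and `NewCodeTable` is a hypothetical class.

```python
from typing import Dict, List, Tuple


class NewCodeTable:
    """Assigns a new code to each image so it can sit in ordinary text."""

    PUA_START = 0xE000  # assumed source of new codes: Private Use Area
    PUA_END = 0xF8FF

    def __init__(self) -> None:
        self._next = self.PUA_START
        self._images: Dict[str, bytes] = {}

    def register(self, image: bytes) -> str:
        """Assign the next unused code to `image` and return it."""
        if self._next > self.PUA_END:
            raise RuntimeError("no new codes left")
        code = chr(self._next)
        self._next += 1
        self._images[code] = image
        return code

    def render(self, text: str) -> List[Tuple[str, object]]:
        """Expand text into display items: characters or image data."""
        items: List[Tuple[str, object]] = []
        for ch in text:
            if ch in self._images:
                items.append(("image", self._images[ch]))
            else:
                items.append(("char", ch))
        return items


table = NewCodeTable()
glyph = table.register(b"bitmap-of-unrecognized-character")
document = "AB" + glyph + "C"       # existing codes mixed with a new code
edited = document.replace("B", "")  # ordinary string editing still works
```

Because the image is represented by a single code, insertion, deletion, and copying treat it exactly like a character, and the display step substitutes the stored image for the code.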

The display information of a new code assigned to a document image component image may take forms such as (a) and (b) below.
(a) A reduced or enlarged version of the document image component image, including the same-size image, is used.
(b) Data obtained by vectorizing the outline of the document image component image is used.

When a new code is assigned to a document image component image selected as a correction so that it can be edited, also giving that image a document image component type attribute such as "character", "table", "figure", or "frame" adds value to the later editing process. One conceivable method is to assign the attribute of the first-ranked recognition candidate code automatically as this type attribute. Providing an interface unit for instructing corrections to the type attribute of the document image component image allows still more flexible correction.

When editing a document that contains both existing codes and new codes, a certainty factor (reliability) may also be attached to each recognition result candidate code. In this case, so that the document image component image can be treated as a recognition result candidate, the image is also given a certainty factor of a fixed value, and the recognition result candidate codes and the document image component image are ranked together in descending order of certainty factor. Treating the document image component image as a recognition result candidate makes the correction operation considerably smoother.
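
The ranking described above can be sketched in a few lines of Python. The tuple layout and the name `rank_candidates` are assumptions; when a code and the image have equal certainty, this sketch places the code first, which the specification does not prescribe.

```python
from typing import List, Optional, Tuple

# (kind, code, certainty): kind is "code" or "image"; code is None for the image.
Ranked = Tuple[str, Optional[str], float]


def rank_candidates(
    code_candidates: List[Tuple[str, float]],
    image_certainty: float,
) -> List[Ranked]:
    """Rank candidate codes and the component image by certainty factor."""
    entries: List[Ranked] = [("code", c, s) for c, s in code_candidates]
    # The component image enters the ranking with a fixed certainty value.
    entries.append(("image", None, image_certainty))
    # sorted() is stable, so on a tie the code stays ahead of the image.
    return sorted(entries, key=lambda e: e[2], reverse=True)
```

For example, `rank_candidates([("B", 0.9), ("8", 0.4)], 0.6)` places "B" first, the image second, and "8" third.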

In addition, providing an interface unit through which the certainty factor given to document image component images can be entered from outside makes it possible to adapt the recognition results to the user's purpose. When the certainty factor given to the document image component image is high, the image ranks first in a larger proportion of recognition results, so fewer results are converted to existing codes, but the misrecognition rate is also lower. When the certainty factor given to the document image component image is low, the image ranks first in a smaller proportion of recognition results, so more results are converted to existing codes, but the misrecognition rate is also higher.
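
The trade-off can be made concrete with a small Python sketch. The input values below are made up purely to exercise the function and are not measurements from the specification; `summarize` is a hypothetical name, and a result counts as converted to an existing code when the certainty of its top candidate code is at least the certainty given to the image.

```python
from typing import List, Tuple


def summarize(
    results: List[Tuple[float, bool]],
    image_certainty: float,
) -> Tuple[float, float]:
    """Return (coded ratio, misrecognition rate) for one image certainty.

    Each result is (certainty of the top candidate code, whether that code
    is actually correct).
    """
    coded = [ok for conf, ok in results if conf >= image_certainty]
    coded_ratio = len(coded) / len(results)
    # A misrecognition is a result that was coded with a wrong code.
    error_rate = sum(1 for ok in coded if not ok) / len(results)
    return coded_ratio, error_rate


# Hypothetical recognition results for five characters.
sample = [(0.95, True), (0.80, True), (0.55, False), (0.40, True), (0.30, False)]

high = summarize(sample, 0.9)  # image wins often: few codes, few errors
mid = summarize(sample, 0.5)
low = summarize(sample, 0.2)   # image rarely wins: many codes, more errors
```

On this sample, raising the image certainty from 0.2 to 0.9 lowers the coded ratio from 1.0 to 0.2 and the misrecognition rate from 0.4 to 0.0, which matches the direction of the trade-off described above.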

Furthermore, providing a means that accepts the certainty factor of the document image component image interactively, has a window displaying only the first-ranked recognition result candidates, and updates that display each time the value changes enables interactive adjustment for each document to be recognized.

Next, the configuration of the document image processing apparatus and specific examples of the processing described above are explained in order.
FIG. 2 is a first configuration diagram of the document image processing apparatus of the present invention. The document image processing apparatus of FIG. 2 includes a document image input unit 21, an area identification unit 22, an individual area recognition unit 23, a recognition result storage unit 24, a display unit 25, and a correction unit 26, and performs the processing shown in FIG. 3.

First, the document image input unit 21 inputs a digital image obtained by digitizing a document (step S1). A scanner that digitizes paper documents, for example, is used as the document image input unit 21. Next, the area identification unit 22 takes the document image as input and identifies individual areas such as text, tables, figures, and enclosing frames (step S2). The individual area recognition unit 23 then performs recognition inside the area, such as character recognition and ruled-line recognition, for areas with an internal structure, such as text, tables, and enclosing frames (step S3).

The processing results of the region identification unit 22 and the individual region recognition unit 23 are stored in the recognition result storage unit 24. Specifically, region type codes such as “text”, “table”, “figure”, and “enclosing frame” are stored together with individual codes such as “character code” as the recognition result candidate codes of the document image components, which consist of regions and the regions inside them.

FIG. 4 shows examples of region type codes and individual codes. In FIG. 4, “text”, “table”, “figure”, “enclosing frame”, and “character” are type codes, and “character code” is the individual code corresponding to “character”. No individual codes are defined here for “text”, “table”, “figure”, or “enclosing frame”.

The data stored in the recognition result storage unit 24 is displayed on the screen of the display unit 25 (step S4) and, at the same time, is corrected according to the user's correction instructions input from the correction unit 26 (step S5). Specifically, the recognition result candidate code data of the document image components is corrected.

To correct the recognition result candidate code data with the correction unit 26, the user must select the document image component to be corrected. A document image component is generally selected by specifying coordinates on the display screen with a pointing device or the like. The image range (region) of each document image component must therefore be determined in advance.

The black pixel connected region of a document image component can be used as its image range. For the document image component of FIG. 5, for example, the black pixels that make up the image “メ” form the image range.

The circumscribed rectangle of a document image component can also be used as its image range. For the document image component of FIG. 5, as shown in FIG. 6, the circumscribed rectangle of the black pixel connected region of the image “メ” becomes the image range. With this image range, the corresponding document image component can be designated even when the user points at a white pixel surrounding the black pixels, which makes the component easier to designate than when only the black pixels are used.
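The difference between the two image ranges can be illustrated with a minimal Python sketch. The bitmap representation (a list of rows with 1 for black), the function names, and the toy glyph are assumptions made for illustration; the patent does not specify a data format.

```python
from typing import List, Tuple

# (left, top, right, bottom) in inclusive pixel coordinates; an assumed layout.
Rect = Tuple[int, int, int, int]


def circumscribed_rectangle(bitmap: List[List[int]]) -> Rect:
    """Return the smallest rectangle that encloses every black pixel (value 1)."""
    black = [(x, y) for y, row in enumerate(bitmap) for x, v in enumerate(row) if v == 1]
    if not black:
        raise ValueError("bitmap contains no black pixels")
    xs = [x for x, _ in black]
    ys = [y for _, y in black]
    return min(xs), min(ys), max(xs), max(ys)


def hits_black_pixels(bitmap: List[List[int]], x: int, y: int) -> bool:
    """Image range = black pixels only: the pointer must land on a black pixel."""
    return 0 <= y < len(bitmap) and 0 <= x < len(bitmap[y]) and bitmap[y][x] == 1


def hits_rectangle(rect: Rect, x: int, y: int) -> bool:
    """Image range = circumscribed rectangle: surrounding white pixels also count."""
    left, top, right, bottom = rect
    return left <= x <= right and top <= y <= bottom


# Toy cross-shaped glyph standing in for a character image.
glyph = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
rect = circumscribed_rectangle(glyph)
```

In this toy glyph the point (2, 1) is a white pixel between two strokes, so it misses under the black pixel range but hits under the circumscribed rectangle.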

Document image components include ones that are conceptually in a hierarchical relationship, such as a text region and the character regions it contains, so a single designated coordinate on the image may not uniquely determine the corresponding document image component. In general, when two document image components are in a conceptual hierarchical relationship, their regions are in an inclusion relationship. In the document image components of FIG. 7, for example, the text region contains character regions, and the table region contains ruled line regions and character regions.

To determine the document image component uniquely from a single designated coordinate in such cases, the innermost of the document image components that contain the designated coordinate is regarded as the one designated.

For example, in the image of FIG. 8, the text “メディア” (“media”), itself a document image component, contains four further document image components: “メ”, “デ”, “ィ”, and “ア”. The image range of each document image component is assumed to be defined by its circumscribed rectangle. In this example, when the user designates the coordinates of a point inside the circumscribed rectangle 31 of the document image component “メ”, the component “メ” is detected as the designated target.

Alternatively, the outermost document image component that contains the designated coordinate can be regarded as the one designated. In the example of FIG. 8, “メディア” is detected as the designated target no matter which point inside its circumscribed rectangle 32 the user designates. Thus “メディア” is the designated target even when a point inside the circumscribed rectangle 31 of “メ” is designated.

The document image component whose circumscribed frame is nearest to the designated coordinate can also be regarded as the one designated. In the example of FIG. 8, a perpendicular is dropped from the designated point onto a side of the circumscribed rectangle of each of the five document image components, such as “メディア” and “メ”, and its length is taken as the distance between the designated point and that rectangle. The document image component whose circumscribed rectangle is at the shortest distance from the designated point is detected as the designated target. With this method, any document image component can become the designated target regardless of the inclusion relationship.
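The three designation rules described above (innermost, outermost, and nearest circumscribed frame) can be sketched in Python as follows. The `Component` class, the function names, and the rectangle coordinates are hypothetical and only loosely modeled on FIG. 8. For points outside a rectangle, the sketch uses the distance to the closest point of the rectangle, which is an assumption; the text only states that the length of a perpendicular to a side is used.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional


@dataclass(frozen=True)
class Component:
    label: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom

    def area(self) -> float:
        return (self.right - self.left) * (self.bottom - self.top)

    def border_distance(self, x: float, y: float) -> float:
        if self.contains(x, y):
            # Inside: length of the shortest perpendicular to one of the four sides.
            return min(x - self.left, self.right - x, y - self.top, self.bottom - y)
        # Outside: distance to the closest point of the rectangle (assumption).
        dx = max(self.left - x, 0.0, x - self.right)
        dy = max(self.top - y, 0.0, y - self.bottom)
        return hypot(dx, dy)


def pick_innermost(components: List[Component], x: float, y: float) -> Optional[Component]:
    """Smallest component whose rectangle contains the point."""
    hits = [c for c in components if c.contains(x, y)]
    return min(hits, key=Component.area, default=None)


def pick_outermost(components: List[Component], x: float, y: float) -> Optional[Component]:
    """Largest component whose rectangle contains the point."""
    hits = [c for c in components if c.contains(x, y)]
    return max(hits, key=Component.area, default=None)


def pick_nearest_frame(components: List[Component], x: float, y: float) -> Optional[Component]:
    """Component whose circumscribed rectangle border is closest to the point."""
    return min(components, key=lambda c: c.border_distance(x, y), default=None)


# Hypothetical layout: one text region containing four character regions.
layout = [
    Component("メディア", 0, 0, 100, 30),
    Component("メ", 2, 3, 24, 27),
    Component("デ", 27, 3, 49, 27),
    Component("ィ", 54, 10, 70, 27),
    Component("ア", 75, 3, 97, 27),
]
```

With this layout, the point (10, 15) yields “メ” under the innermost rule and “メディア” under the outermost rule. The point (25, 15) lies in the gap between “メ” and “デ”, so the innermost rule yields “メディア”, while the nearest-frame rule still yields “メ”.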

With the designation methods described above, one document image component can be selected from a plurality of components whose regions are in an inclusion relationship, but the other components cannot be selected directly. A process is therefore needed for selecting another document image component indirectly, starting from the one already selected.

A selection operation is therefore provided that designates the innermost of the other document image components that contain the already designated component. In the example of FIG. 8, when the user performs this selection operation while “メ” is designated, the component outside it, “メディア”, is designated.

A selection operation may also be provided that designates, among the other document image components inside the already designated component, the largest one that contains the already designated coordinate. In the example of FIG. 8, suppose that the document image component “メディア” has been designated by designating a coordinate inside the region of “メ”. When the user performs this selection operation in this state, “メ”, the largest document image component that contains the designated coordinate inside the region of “メ”, is designated.
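The two indirect selection operations can be sketched as follows, again in Python. Rectangles are plain `(left, top, right, bottom)` tuples keyed by a label; the function names and coordinates are assumptions for illustration, not part of the patent.

```python
from typing import Dict, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom)


def rect_area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def rect_contains_rect(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def rect_contains_point(r: Rect, x: float, y: float) -> bool:
    return r[0] <= x <= r[2] and r[1] <= y <= r[3]


def select_enclosing(components: Dict[str, Rect], current: str) -> Optional[str]:
    """Move outward: innermost other component that contains the current one."""
    candidates = [name for name, r in components.items()
                  if name != current and rect_contains_rect(r, components[current])]
    return min(candidates, key=lambda n: rect_area(components[n]), default=None)


def select_enclosed(components: Dict[str, Rect], current: str,
                    x: float, y: float) -> Optional[str]:
    """Move inward: largest component inside the current one that contains (x, y)."""
    candidates = [name for name, r in components.items()
                  if name != current
                  and rect_contains_rect(components[current], r)
                  and rect_contains_point(r, x, y)]
    return max(candidates, key=lambda n: rect_area(components[n]), default=None)


# Hypothetical rectangles loosely modeled on FIG. 8.
regions = {
    "メディア": (0, 0, 100, 30),
    "メ": (2, 3, 24, 27),
    "デ": (27, 3, 49, 27),
}
outer = select_enclosing(regions, "メ")
inner = select_enclosed(regions, "メディア", 10, 15)
```

Here `outer` is “メディア” and `inner` is “メ”, matching the two FIG. 8 examples in the text.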

Next, one conceivable way to display the region type codes and individual codes of a document image component as recognition result candidates is the dual display method shown in FIG. 9, in which type codes and individual codes are displayed separately. In FIG. 9, the first-ranked recognition result candidate is “character”, so character recognition is performed and the individual codes of the character recognition result are displayed. With this display method, however, the relationship between the type codes and the individual codes is often hard to grasp intuitively.

Therefore, when the individual region recognition unit 23 recognizes the type of a document image component and the first-ranked recognition result candidate is “character”, character recognition is performed and the resulting candidate character codes are displayed as the upper-ranked recognition result candidate codes, as shown in FIG. 10. The type codes of document image components other than “character”, such as “table”, “figure”, and “frame”, are displayed as the lower-ranked recognition result candidate codes. Displaying the type codes and individual codes together as a single list (unified display) in this way lets the user see the recognition result candidates at a glance.

In this unified display method, when the first-ranked recognition result candidate is a type other than “character”, such as “table”, “figure”, or “frame”, the candidate character codes obtained by character recognition are displayed as lower-ranked recognition result candidate codes, as shown in FIG. 11. In this case the recognition result is unlikely to be “character”, so processing ends up being carried out to obtain a character recognition result that has a low likelihood of being correct.

Therefore, when the first-ranked recognition result candidate is a type other than “character”, that document image component type may be used as the first-ranked recognition result candidate code, and only the type codes of the other document image component types may be used as the lower-ranked recognition result candidate codes. With this method, as shown in FIG. 12, only document image component types are displayed as candidates, and the candidate characters from character recognition are not displayed.
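The construction of the unified candidate list can be sketched in Python as follows. The function name, the English type labels, and the stand-in character recognizer are assumptions for illustration. The sketch follows the FIG. 10 and FIG. 12 behavior: character recognition is run only when “character” is ranked first, and otherwise the type codes are listed as they are.

```python
from typing import Callable, List


def build_candidate_list(ranked_types: List[str],
                         recognize_characters: Callable[[], List[str]]) -> List[str]:
    """Merge type codes and individual codes into one ranked list.

    ranked_types: type codes ordered from most to least likely.
    recognize_characters: called lazily, only when "character" is ranked first.
    """
    if ranked_types and ranked_types[0] == "character":
        # FIG. 10 style: candidate characters on top, remaining type codes below.
        return recognize_characters() + list(ranked_types[1:])
    # FIG. 12 style: type codes only; character recognition is skipped.
    return list(ranked_types)


recognizer_calls = []


def fake_character_recognizer() -> List[str]:
    """Stand-in for a real character recognizer (hypothetical output)."""
    recognizer_calls.append(1)
    return ["情", "惰"]


character_first = build_candidate_list(
    ["character", "table", "figure", "frame"], fake_character_recognizer)
table_first = build_candidate_list(
    ["table", "figure", "frame", "character"], fake_character_recognizer)
```

Passing the recognizer as a callable keeps the skipped recognition step explicit: in this example it is invoked once, for the list in which “character” is ranked first.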

This display is adopted to make the display easier to read by omitting the detailed information for the second and lower ranks. Since the first-ranked recognition candidate is generally likely to be correct, the detailed information for the second and lower ranks is often unnecessary. Moreover, this method makes the recognition processing for second- and lower-ranked candidates such as “character” unnecessary, which speeds up processing.

The user can look at the recognition result candidates displayed in this way and correct them. For example, as shown in FIG. 13, when the user instructs that the first-ranked type code “table” be corrected to “character”, the ranks of the type codes other than “character” each shift down by one. Then, as necessary, character recognition is performed on the corresponding input character image, and “character” in the recognition result candidate codes is replaced by the one or more candidate character codes obtained.
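A minimal Python sketch of this correction step is shown below. The function name and the candidate values are hypothetical; the sketch assumes that the corrected type moves to the first rank, the other type codes keep their relative order, and “character” is then expanded into candidate character codes.

```python
from typing import Callable, List


def apply_type_correction(candidates: List[str], chosen_type: str,
                          recognize_characters: Callable[[], List[str]]) -> List[str]:
    """Promote the type chosen by the user to first rank and shift the rest down."""
    if chosen_type not in candidates:
        raise ValueError(f"unknown type code: {chosen_type}")
    reordered = [chosen_type] + [c for c in candidates if c != chosen_type]
    if chosen_type == "character":
        # Replace the "character" type code with the candidate character codes.
        return recognize_characters() + reordered[1:]
    return reordered


before = ["table", "figure", "frame", "character"]
after = apply_type_correction(before, "character", lambda: ["情", "惰"])
```

For the hypothetical list above, `after` holds the two candidate characters followed by “table”, “figure”, and “frame”, each one rank lower than before relative to the promoted entry.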

Likewise, when the user's correction instruction specifies a type code other than “character”, such as “table”, “figure”, or “frame”, as the recognition result candidate code, recognition processing for the specified document image component type is performed on the corresponding input image as necessary.

For example, as shown in FIG. 14, when the user instructs that the first-ranked type code “figure” be corrected to “table” or “enclosing frame”, which have an internal structure, the internal structure of the specified type is recognized. Recognition of the internal structure of a “table” involves extracting and vectorizing ruled lines, extracting the cells enclosed by the ruled lines, recognizing the characters in the cells, and so on. Recognition of the internal structure of an “enclosing frame” involves extracting and vectorizing the frame, identifying regions inside the frame, and so on.

FIG. 15 is a second configuration diagram of the document image processing apparatus of the present invention. The apparatus of FIG. 15 includes a document image input unit 41, a document image component extraction unit 42, a code addition unit 43, an edit data storage unit 44, a document data storage unit 45, a display unit 46, a code document input unit 47, an editing operation input unit 48, and an editing unit 49, and performs the processing shown in FIGS. 16 and 17.

First, the document image input unit 41 inputs a digital image obtained by digitizing a document (step S11). The document image input unit 41 is, for example, a digital scanner. Next, the document image component extraction unit 42 extracts the document image components that make up the document image (step S12).

The code addition unit 43 adds a new code to each extracted document image component (step S13) and stores the components with their new codes in the edit data storage unit 44 (step S14). If necessary, the code document input unit 47 inputs an existing electronic document (electronic code document) into the edit data storage unit 44 (step S15). The data of an existing electronic document is a set of existing codes, which correspond to existing character patterns. The edit data stored in the edit data storage unit 44 therefore contains the two kinds of data shown in FIG. 18.

In the edit data storage unit 44, a document image component is represented by image data such as a bitmap, and an external character (gaiji) code or the like is used as the new code. Because the new code is added automatically to the bitmap data of the document image component, the user does not need to design the character shape as in ordinary external character registration. Existing character patterns are represented by font data or the like.

Furthermore, if necessary, document data that has already been edited and stored in the document data storage unit 45 can be read into the edit data storage unit 44 and used (step S16).

Next, the display unit 46 displays the document to be edited on the screen using the data in the edit data storage unit 44 (step S17). Image data is used to display document image components, and font data is used to display existing character patterns.

When a document editing operation by the user is input from the editing operation input unit 48 to the editing unit 49 (step S18), the editing unit 49 edits the data in the edit data storage unit 44 (step S19). The set of new codes and existing codes is the direct target of the editing processing, while image data and font data are used for the display processing.

To copy or move a document image component, the user gives an instruction on the displayed image using a pointing device or the like. In response, the editing unit 49 performs the instructed editing processing on the new code that corresponds to the designated document image component. Because the new codes corresponding to document image components are processed automatically by the system in this way, the user does not need to handle them directly.
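The idea of editing a mixed sequence of new codes and existing codes can be sketched in Python as follows. The `EditBuffer` class and its methods are hypothetical. As an assumption, code points from the Unicode Private Use Area (U+E000 to U+F8FF) stand in for the external character codes mentioned above, and each word image is treated as a single component to keep the example short.

```python
from typing import Dict, List, Tuple

PRIVATE_USE_START = 0xE000  # assumption: Private Use Area stands in for gaiji codes
PRIVATE_USE_END = 0xF8FF


class EditBuffer:
    """Mixed sequence of existing character codes and automatically assigned new codes."""

    def __init__(self) -> None:
        self.codes: List[str] = []           # editing operates on this sequence only
        self.bitmaps: Dict[str, bytes] = {}  # new code -> image data used for display
        self._next_code = PRIVATE_USE_START

    def add_image_component(self, bitmap: bytes) -> str:
        """Assign the next free new code to a bitmap; no glyph design by the user."""
        if self._next_code > PRIVATE_USE_END:
            raise RuntimeError("no free new codes left")
        code = chr(self._next_code)
        self._next_code += 1
        self.bitmaps[code] = bitmap
        self.codes.append(code)
        return code

    def move(self, start: int, length: int, new_index: int) -> None:
        """Cut `length` codes at `start` and reinsert them at `new_index`."""
        block = self.codes[start:start + length]
        del self.codes[start:start + length]
        self.codes[new_index:new_index] = block

    def replace_with_text(self, start: int, length: int, text: str) -> None:
        """Replace codes with existing character codes from a code document."""
        self.codes[start:start + length] = list(text)

    def render(self) -> List[Tuple[str, object]]:
        """Display uses image data for new codes and font data for existing codes."""
        return [("image", self.bitmaps[c]) if c in self.bitmaps else ("font", c)
                for c in self.codes]


buf = EditBuffer()
multi = buf.add_image_component(b"bitmap of multi")
media = buf.add_image_component(b"bitmap of media")
system = buf.add_image_component(b"bitmap of system")

buf.move(1, 1, 0)                     # reorder two image components
reordered = list(buf.codes)
buf.replace_with_text(1, 1, "統合")   # mix in existing character codes
```

The editing methods touch only the code sequence, while `render` looks up the image data, which mirrors the separation between editing and display described in the text.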

When the editing operation is complete, the document data is stored from the edit data storage unit 44 into the document data storage unit 45 (step S20), and the processing ends. Document data stored in the document data storage unit 45 can be read back into the edit data storage unit 44 and edited again.

For example, suppose that a code document such as that shown in FIG. 19 is printed on paper, copied or sent by fax several times, and then input as an image from the document image input unit 41. Because the print quality degrades in the course of this process, the input image becomes a document image such as that shown in FIG. 20.

The document image component extraction unit 42 extracts document image components from this document image as shown in FIG. 21. Here, each extracted component is displayed on the screen with its circumscribed rectangle. The code addition unit 43 assigns a new code to each document image component, so that editing can be performed in units of document image components.

The user can change the order of the document image components in the displayed image to create an image such as that shown in FIG. 22. Here, the image “マルチメディアシステム” (“multimedia system”) has been edited into “メディアマルチシステム” (“media multi system”).

Code documents input from the code document input unit 47 can also be edited together with document image components. For example, replacing the document image component “マルチ” (“multi”) with the code document text “統合” (“integrated”) produces the document shown in FIG. 23. After all editing is finished, the circumscribed rectangles of the document image components are no longer displayed, as shown in FIG. 24. The edited document is printed or subjected to recognition processing according to the user's instructions.

By associating a new code with each document image component image and displaying the display information of that code on the display unit 46 in this way, a document in which existing codes and new codes are mixed can be edited.

As the display information of the new code associated with a document image component image, the display unit 46 uses a reduced or enlarged version of that image, which includes the image at its original size. Because document image component images vary in size, it is convenient to display them at a changed size when editing them together with existing document data.

Character images, the most common kind of document image component image, are displayed at the same size as the existing character patterns. To do this, the character images are reduced or enlarged for display. In FIG. 24, this method is used to display the character images “メディア” and “システム” (“system”) at the same size as the existing character pattern “統合”.

When the character images are reduced while being kept consistent in size with the existing character patterns, the display becomes as shown in FIG. 25; when they are reduced further, it becomes as shown in FIG. 26.

However, when a document image component image is enlarged or reduced as raw image data, its shape may be distorted in the display. Using outline data (an outline font) obtained by vectorizing the contour of the document image component image to display the component prevents the shape from being distorted even when it is enlarged or reduced.

For example, enlarging and reducing the outline data obtained by vectorizing the character image “メ” gives the display shown in FIG. 27. In FIG. 27, the character shapes at every size are similar to one another.

To perform recognition processing on an edited document image, the document data is input to the document image processing apparatus of FIG. 2. After the region identification and recognition processing described above are performed, the recognition result candidates are displayed and corrected.

In the conventional recognition result display method, as shown in FIG. 39, the character patterns of the recognition result candidate codes and the corresponding character image are displayed separately. The user must therefore move the line of sight a long way to check the result, which is a burden.

In this embodiment, therefore, as shown in FIG. 28, the corresponding document image component image is displayed together with the character patterns of the recognition result candidates in the display area (display window) 51 for the recognition result candidates. Separating the document image component from the input document image and displaying it near the recognition result candidates in this way greatly reduces line-of-sight movement, and the user can easily check the validity of the recognition result.

When the document image component is something other than a character, the original image can likewise be displayed in the display area for the recognition result candidates. For example, for the recognition result 52 shown in FIG. 29, the document image component image 53 is displayed in the display area 51 together with “table”, “enclosing frame”, and the like, which represent the type codes of the recognition result candidates. The user can thus easily check the validity of the recognition result 52.

The document image component image displayed in this way can also be used for correction. In an ordinary character recognition result display, the candidates may not include the correct answer. In such a case, when the user selects the document image component image, the candidate character code in the recognition result is replaced by the code of the document image component, and a valid document that partly uses image data is displayed on the screen.

For example, in the recognition result shown on the left side of FIG. 30, when the correct character “情” is not among the displayed candidate characters, the user selects the document image component image 54 and gives a correction instruction. The recognition result is then corrected and displayed using the image 54, as shown on the right side.

Similarly, in the table recognition result shown on the left side of FIG. 31, when the displayed recognition result 52 is incorrect, the user selects the document image component image 55 and gives a correction instruction. The recognition result is then corrected and displayed using the image 55, as shown on the right side.

In this way, the document image component image in the recognition result candidate display can be specified in a correction instruction, and when the user selects it, editing processing is performed using the code corresponding to the document image component image. At least as far as the display is concerned, therefore, a valid recognition result candidate is always included.

Having each document image component image hold a type attribute of the document image component, such as “character”, “table”, “figure”, or “frame”, makes it possible to handle the image according to its type attribute. The type attribute is stored in the recognition result storage unit 24 of FIG. 2 together with the code of the document image component image. Since the first-ranked recognition candidate usually has the highest certainty in a recognition result, the attribute corresponding to the type of that candidate's code is automatically added to the document image component image as its type attribute.

For example, in the recognition result shown on the left side of FIG. 32, the type attribute of the first-ranked candidate “惰” is “character”, so the attribute of the document image component image 54 is also “character”.

In such a recognition result display, the user can also select the document image component image and have re-recognition processing performed according to its type attribute. In the example of FIG. 32, when the user selects the document image component image 54 and gives a correction instruction, the corresponding character pattern is replaced by the document image component image 54 and character recognition is performed again, as shown on the right side.

Even if the type attribute corresponding to the first-ranked recognition candidate is given to the document image component image, that attribute may be wrong. An external interface unit for instructing correction of the type attribute of a document image component image is therefore provided in the correction unit 26 of FIG. 2. This external interface unit changes the type attribute of the document image component image according to the user's instruction. In FIG. 33, for example, the attribute of the document image component image “情” has been changed from “table” to “character”.

In recognition processing, a certainty factor is often given to each candidate code by calculating, for example, the distance value between each candidate code of the recognition result and the recognition dictionary. In this embodiment, a certainty factor of a fixed value is also given to the document image component image, and the candidate codes and the document image component image are ranked in descending order of certainty factor. With this ranking, when the certainty factors of the candidate codes that are the original recognition result candidates are low, the document image component image becomes the first-ranked recognition result candidate, so the document of the recognition result is displayed appropriately.

FIGS. 34 and 35 show examples in which the recognition result candidates are displayed together with their certainty factors. In FIG. 34, the certainty factor of the document image component image “情” is 60, and the image is displayed as the second-ranked candidate. In FIG. 35, by contrast, the certainty factor of “情” is 70, and the image is displayed as the first-ranked candidate.
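The ranking described above can be sketched in Python as follows. Only the image certainty factors 60 and 70 come from the text; the candidate characters and their certainty factors (65 and 40) are hypothetical values chosen so that the two cases reproduce the second-rank and first-rank outcomes. The function name and the image label are also assumptions.

```python
from typing import List, Tuple

Candidate = Tuple[str, float]  # (label, certainty factor)


def rank_candidates(code_candidates: List[Candidate],
                    image_label: str, image_certainty: float) -> List[Candidate]:
    """Rank candidate codes and the component image by descending certainty factor.

    sorted() is stable, so on a tie the candidate codes stay ahead of the image
    (a design choice of this sketch, not something stated in the text).
    """
    entries = code_candidates + [(image_label, image_certainty)]
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


# Hypothetical candidate codes from character recognition.
codes = [("惰", 65.0), ("清", 40.0)]

ranking_60 = rank_candidates(codes, "<image 情>", 60.0)
ranking_70 = rank_candidates(codes, "<image 情>", 70.0)
```

With an image certainty factor of 60 the image is ranked second, and with 70 it is ranked first.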

In this way, the ranking changes depending on the relative difference in certainty factor between the document image constituent element image and the other candidate codes. The certainty factor of the document image constituent element image can therefore be used as a threshold for replacing the original recognition result candidate with the document image constituent element image. Accordingly, an external interface unit for inputting the certainty factor of the document image constituent element image is provided in the correction unit 26 of FIG. 2, so that the user can adjust this threshold.

Each time a certainty factor for the document image constituent element image is input, the correction unit 26 compares it with the certainty factors of the other candidates and, if necessary, updates the display of the first-ranked recognition result candidate. The user can thus adjust the certainty threshold interactively while checking the result.
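The interactive update can be sketched as below. The class `FirstRankView` and its method are hypothetical names. The image certainty values 60 and 70 follow FIGS. 34 and 35, but the certainty factors of the candidate codes (65 and 50) and the candidate characters are assumptions chosen so that the image ranks second at 60 and first at 70. Letting the candidate code win a tie is also an assumption; the specification does not say how ties are resolved.

```python
class FirstRankView:
    """Tracks which candidate is shown in first place for one element."""

    IMAGE = "IMAGE"  # marker meaning the element image itself is in first place

    def __init__(self, code_candidates):
        # code_candidates: (code, certainty) pairs from the recognizer
        self.code_candidates = list(code_candidates)
        self.first = None

    def set_image_certainty(self, value):
        """Compare the entered value with the other candidates.

        Returns True when the first-place display has to be updated.
        """
        best_code, best_certainty = max(
            self.code_candidates,
            key=lambda pair: pair[1],
            default=(None, float("-inf")),
        )
        # Assumption: the image must strictly exceed the best code to win.
        new_first = self.IMAGE if value > best_certainty else best_code
        changed = new_first != self.first
        self.first = new_first
        return changed


view = FirstRankView([("惰", 65), ("清", 50)])  # assumed code certainties
view.set_image_certainty(60)            # image ranks second, as in FIG. 34
first_at_60 = view.first
changed = view.set_image_certainty(70)  # image ranks first, as in FIG. 35
first_at_70 = view.first
```

Only the first-place entry is recomputed on each input, which matches the option of showing a window that contains the first-ranked candidate alone.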

At this time, the display unit 25 of FIG. 2 may display a predetermined number of recognition result candidates, as shown in FIGS. 34 and 35, or may instead provide a window that displays only the first-ranked candidate. Displaying only the first-ranked candidate reduces the movement of the user's line of sight, which makes the threshold adjustment more efficient.

The document image processing apparatus of the present embodiment can be configured using, for example, an information processing apparatus (computer) such as that shown in FIG. 36. The information processing apparatus of FIG. 36 includes a CPU (central processing unit) 61, a memory 62, an input device 63, an output device 64, an external storage device 65, a medium driving device 66, a network connection device 67, and a photoelectric conversion device 68, which are connected to one another by a bus 69.

The memory 62 stores the programs and data used for the document image processing described above. A ROM (read only memory), a RAM (random access memory), or the like is used as the memory 62. The CPU 61 performs the necessary processing by executing the programs using the memory 62.

The input device 63 is, for example, a keyboard, a pointing device, or a touch panel, and is used to input instructions and necessary information from the user. The output device 64 is, for example, a display or a printer, and is used to output processing results and the like.

The external storage device 65 is, for example, a magnetic disk device, an optical disk device, or a magneto-optical disk device. The programs and data described above can be stored in the external storage device 65 and loaded into the memory 62 for use as necessary.

The medium driving device 66 drives a portable recording medium 70 and accesses its recorded contents. Any computer-readable recording medium, such as a memory card, a flexible disk, a CD-ROM (compact disk read only memory), an optical disk, or a magneto-optical disk, can be used as the portable recording medium 70. The programs and data described above can be stored on the portable recording medium 70 and loaded into the memory 62 for use as necessary.

The network connection device 67 communicates with external devices via an arbitrary network (line) such as a LAN (local area network) and performs the data conversion that accompanies the communication. The document image processing apparatus can thus receive the programs and data described above from an external device as necessary and load them into the memory 62 for use.

The photoelectric conversion device 68 is, for example, a digital scanner, and inputs an image of a document written on a paper medium.

FIG. 37 shows computer-readable recording media that can supply programs and data to the information processing apparatus of FIG. 36. The programs and data stored on the portable recording medium 70 or in an external database 71 are loaded into the memory 62. The CPU 61 then executes the programs using the data and performs the necessary processing.

FIG. 1 is a principle diagram of the document image processing apparatus of the present invention.
FIG. 2 is a first configuration diagram of the document image processing apparatus.
FIG. 3 is a flowchart of the first document image processing.
FIG. 4 is a diagram showing type codes and individual codes.
FIG. 5 is a diagram showing a first image range.
FIG. 6 is a diagram showing a second image range.
FIG. 7 is a diagram showing the hierarchy of document image constituent elements.
FIG. 8 is a diagram showing document image constituent elements in an inclusion relationship.
FIG. 9 is a diagram showing a first display of recognition results.
FIG. 10 is a diagram showing a second display of recognition results.
FIG. 11 is a diagram showing a third display of recognition results.
FIG. 12 is a diagram showing a fourth display of recognition results.
FIG. 13 is a diagram showing a first correction instruction.
FIG. 14 is a diagram showing a second correction instruction.
FIG. 15 is a second configuration diagram of the document image processing apparatus.
FIG. 16 is a flowchart (part 1) of the second document image processing.
FIG. 17 is a flowchart (part 2) of the second document image processing.
FIG. 18 is a diagram showing edit data.
FIG. 19 is a diagram showing a code document.
FIG. 20 is a diagram showing a document image.
FIG. 21 is a diagram showing extracted document image constituent elements.
FIG. 22 is a diagram showing a first display of editing results.
FIG. 23 is a diagram showing a second display of editing results.
FIG. 24 is a diagram showing a third display of editing results.
FIG. 25 is a diagram showing a first reduced display.
FIG. 26 is a diagram showing a second reduced display.
FIG. 27 is a diagram showing an outline display.
FIG. 28 is a diagram showing a fifth display of recognition results.
FIG. 29 is a diagram showing a sixth display of recognition results.
FIG. 30 is a diagram showing a third correction instruction.
FIG. 31 is a diagram showing a fourth correction instruction.
FIG. 32 is a diagram showing a fifth correction instruction.
FIG. 33 is a diagram showing a sixth correction instruction.
FIG. 34 is a diagram showing a seventh display of recognition results.
FIG. 35 is a diagram showing an eighth display of recognition results.
FIG. 36 is a configuration diagram of the information processing apparatus.
FIG. 37 is a diagram showing recording media.
FIG. 38 is a configuration diagram of a conventional document image recognition/editing apparatus.
FIG. 39 is a diagram showing a conventional recognition result display.

Explanation of symbols

1, 21, 41 Document image input unit
2, 22 Region identification unit
3 Region identification result storage unit
4, 23 Individual region recognition unit
5, 24 Recognition result storage unit
6 First correction unit
7 Second correction unit
8, 25, 46 Display unit
11 Identification means
12 Recognition means
13 Output means
14 Correction means
15 Extraction means
16 Code addition means
17 Editing means
26 Correction unit
31, 32 Circumscribed rectangle
42 Document image constituent element extraction unit
43 Code addition unit
44 Edit data storage unit
45 Document data storage unit
47 Code document input unit
48 Editing operation input unit
49 Editing unit
51 Display area
52 Recognition result
53, 54, 55 Document image constituent element image
61 CPU
62 Memory
63 Input device
64 Output device
65 External storage device
66 Medium driving device
67 Network connection device
68 Photoelectric conversion device
69 Bus
70 Portable recording medium
71 Database

Claims

1. An image processing apparatus comprising:
extraction means for extracting, from an input image, image components constituting the input image;
recognizing means for performing recognition processing on the image components; and
output means for separating image data corresponding to an image component from the input image and outputting the image data together with one or more candidates in the recognition result of the image component.

2. The image processing apparatus according to claim 1, further comprising:
selection means for selecting one of the one or more candidates and the image data; and
editing means for, when the image data is selected, adding new code information to the image component corresponding to the image data and editing a document in which image data and character patterns corresponding to existing code information are mixed.

3. The image processing apparatus according to claim 1, wherein the recognizing means adds type information to the image component.

4. The image processing apparatus according to claim 3, wherein the recognizing means adds, to the image component, type information corresponding to the first-ranked candidate in the recognition result.

5. The image processing apparatus according to claim 3, further comprising correcting means for correcting the type information of the image component.

6. The image processing apparatus according to claim 1, wherein the recognizing means obtains a certainty factor for each of the one or more candidates, and the output means gives a predetermined certainty factor to the image data corresponding to the image component, ranks the one or more candidates and the image data in descending order of certainty factor, and outputs them.

7. The image processing apparatus according to claim 6, further comprising input means for inputting the certainty factor of the image data corresponding to the image component.

8. The image processing apparatus according to claim 7, wherein the output means outputs the first-ranked information among the one or more candidates and the image data, and changes the first-ranked information according to the certainty factor input by the input means.

9. A computer-readable recording medium on which is recorded a program for causing a computer to realize:
a function of extracting, from an input image, image components constituting the input image;
a function of performing recognition processing on the image components; and
a function of separating image data corresponding to an image component from the input image and outputting the image data together with one or more candidates in the recognition result of the image component.

10. An image processing method comprising:
inputting an image into a computer;
extracting, from the input image, image components constituting the input image;
recognizing the image components; and
separating image data corresponding to an image component from the input image and displaying the image data together with one or more candidates in the recognition result of the image component.