
JP2006031129A - Document processing method and document processor

Info

Publication number: JP2006031129A
Application number: JP2004205361A
Authority: JP
Inventors: Takeshi Eisaki; 健永崎; Mariko Yamamoto; 真理子山本; Katsumi Marukawa; 勝美丸川; Hiroyuki Kuriyama; 裕之栗山; Shigeyuki Fujiwara; 茂之藤原
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2004-07-13
Filing date: 2004-07-13
Publication date: 2006-02-02
Anticipated expiration: 2024-07-13
Also published as: JP4466241B2

Abstract

PROBLEM TO BE SOLVED: Conventionally, a group of paper documents or document images is converted into text in a single batch using an OCR device, and document processing tasks are then performed on that text. With this approach it is difficult to deal with the increase in processing time caused by reading the entire surface of each document image, with OCR character-line extraction errors caused by the mixture of text, figures, and ruled lines, and with the drop in reading accuracy caused by the difficulty of preparing general-purpose notation knowledge suited to the document image.

SOLUTION: Character string blocks are extracted from the document image to be processed, and their two-dimensional arrangement structure is analyzed to estimate the description content category of each block. When the user designates a region to be read through an interactive operation, recognition is performed only on the character string blocks in the neighborhood of that region, which reduces processing time. In addition, character string recognition uses notation knowledge corresponding to the description content of the block, which improves recognition accuracy.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to an interactive document processing method using character recognition and to a recording medium on which a document processing program is recorded.

Even today, when computer-based digital information technology is widespread, paper documents are still widely used as a medium for conveying information. There is therefore a persistent demand for efficient processing of large volumes of paper documents, but paper documents take more time to reference, search, and modify than digital documents. In document processing work related to licensing and inspection in particular, being able to quickly look up business know-how concerning the contents of an application form, and to quickly convert only the necessary information into digital data, is important for improving work efficiency. Various methods for processing paper documents have been proposed to solve these problems.

The usual way to give paper documents the same processing capabilities as digital documents is batch processing: the paper documents are recognized with an OCR (optical reader), and all of their written content is converted into digital data. If every paper document is converted into digital data (text) by OCR in advance, subsequent steps can use the text to search for related business know-how or can encode the text, so the document processing work described above can be carried out efficiently. However, text converted by OCR generally contains errors, so there are cases that simple batch reading alone cannot handle.

One countermeasure is to improve recognition accuracy, in particular by using notation knowledge about character strings. Character string notation analysis is widely used to compensate for the uncertainty of character segmentation and character recognition and to convert a character string image into text. Common algorithms are those based on morphological analysis, RTN matching (recursive transition network matching), and bottom-up parsing.

For example, Japanese Patent Laid-Open No. 05-108891 (Patent Document 1) describes applying morphological analysis to OCR recognition results as a means of improving OCR reading accuracy. Knowledge processing such as morphological analysis can correct misreadings, but the dictionaries used in ordinary morphological analysis target general text such as newspapers. To proofread documents for specialized business use accurately, special dictionaries suited to the field must be defined in addition, which leaves problems of maintainability and computational cost. Furthermore, because morphological analysis covers a wide range of notation knowledge, the analysis takes time and requires an enormous amount of storage.

Japanese Patent Laid-Open No. 2002-117374 (Patent Document 2) proposes character string notation analysis using bottom-up parsing for handwritten digit strings. Bottom-up parsing is generally considered to require less computation than top-down parsing and is often applied to strings, such as digit strings, whose notation can be expressed by simple rules. However, because the algorithm does not guarantee optimality, its robustness against problems that can occur in character string recognition, such as misread or unreadable characters and noise, is limited. It is also possible to correct the OCR-converted text codes manually and search the corrected result, but correction with human intervention is hardly practical in terms of processing speed and cost. Even without manual correction, converting millions of paper documents into text by OCR incurs enormous processing time and system construction cost.

In document processing that performs character recognition interactively, by contrast, the user of the document processing system designates only the character strings to be recognized, and only those parts are converted into text, so processing time is reduced and no large-scale system needs to be built. The method proposed in the present invention is a document processing method based on interactive character recognition. It analyzes the arrangement structure of the character strings written in the document to estimate the description content category (a date, a monetary amount, a business term, etc.) of the character string designated for recognition, and improves recognition accuracy by reading the character string with notation knowledge that matches this category.

Japanese Patent Laid-Open No. 05-108891
Japanese Patent Laid-Open No. 2002-117374
Japanese Patent Laid-Open No. 09-319824
Japanese Patent Laid-Open No. 2000-251012
Japanese Patent Laid-Open No. 2001-014311

An object of the present invention is to provide, for a document processing apparatus that displays a document image on a screen and processes it, a document processing system that performs character recognition based on the user's interactive operations and, based on the result, retrieves and displays related items from a business database, or a document processing system that converts the document into data, together with the apparatus and a recording medium on which the document processing program is recorded.

In the conventional method, a group of document images is converted into text in a single batch using an OCR device, and document processing work is then performed on that text. With this method it has been difficult to deal with the increase in processing time caused by reading the entire surface of each document image, with OCR character-line extraction errors caused by the mixture of text, figures, and ruled lines, and with the drop in reading accuracy caused by the difficulty of preparing general-purpose notation knowledge suited to the document image. An object of the present invention is to propose a method that avoids the increase in processing time caused by OCR reading and the adverse effect of OCR reading errors on document processing work.

To achieve the above object, the present invention provides a system in which, in a document display and operation device, character recognition is started at the moment it is requested in response to the user's interactive operation, only the necessary part is recognized immediately, and the recognition result is used to retrieve information from a business database and to convert part of the target document into data. The invention also provides a mechanism for improving reading accuracy: character string blocks are extracted from the document image to be processed, their two-dimensional arrangement structure is analyzed to estimate the description content category (a date, a monetary amount, a business term, etc.) of each block, the block to be recognized is selected by the user's interactive operation, and the character string is read with the notation knowledge switched according to the estimated category of that block.

In the conventional method, groups of paper documents and document images are converted into text in a single batch using an OCR device, and document processing work is performed on that text. It has been difficult to deal with the increase in processing time caused by reading the entire surface of each document image, with OCR character-line extraction errors caused by the mixture of text, figures, and ruled lines, and with the drop in reading accuracy caused by the difficulty of preparing general-purpose notation knowledge suited to the document image. According to the present invention, by contrast, character recognition is started interactively during document processing work and only the parts needed for the work are converted into text, so processing time can be reduced. In addition, the two-dimensional arrangement structure of the character strings written in the document is analyzed to estimate the description content category (a date, a monetary amount, a business term, etc.) of the character string designated for recognition, and character string recognition is performed with notation knowledge that matches this category, so recognition accuracy can be improved.

First, the flow of document processing using click recognition is outlined with reference to FIG. 1. The document processing apparatus according to an embodiment of the present invention handles document images obtained by capturing a paper document with an OCR device, a scanner, a document camera, or the like and converting it into electronic image data. First, the document image to be processed is read from an external storage device, or from an external device over a communication line (0101). Next, document structure analysis such as ruled line extraction, frame structure analysis, and position estimation of the frames to be read is performed on the document image data (0102). Known techniques (Japanese Patent Laid-Open No. 09-319824 (Patent Document 3), Japanese Patent Laid-Open No. 2000-251012 (Patent Document 4), etc.) are used for this recognition processing. Document structure analysis generally uses a document structure dictionary that stores information such as the frame layout of the target documents (0108). This dictionary is generally stored in an external storage device. Document structure analysis takes as input the document image data and the document structure dictionary recorded in memory or in an external storage device, and outputs to memory or to an external storage device a set consisting of the ruled line position information, the frame layout information, and the estimated frame attribute information.

Next, based on the result of the document structure analysis, character string blocks that are candidates for recognition are extracted (0103). A character string block is a cluster of characters that contains no breaks such as blank space and appears to form a single semantic unit; a word, for example, corresponds to a character string block. A character line is made up of several character string blocks. Next, the two-dimensional arrangement structure of the extracted blocks is analyzed to estimate what attribute each block has. The two-dimensional arrangement structure comprises the position and size of each block in the document image, information on the ruled lines above, below, and to the left and right of it, and information on the positional relationships among blocks. This step is called arrangement structure analysis (0104), and the attribute of a character string block is also called its description content category. Description content categories include, for example, date strings, monetary amount strings, ID number strings, and general word strings. Arrangement structure analysis generally uses arrangement structure definition information stored in an external storage device (0109). For each document type, the arrangement structure definition information stores two-dimensional arrangement structure information in association with the description content category of the character string blocks that have that arrangement structure. The arrangement structure information includes at least coordinate information, size information, attribute information, information representing adjacency such as above/below and left/right, and description content category information.

The processing described so far is, so to speak, preprocessing for the recognition processing that is based on the region and recognition mode the user selects in the steps below, and it is performed non-selectively on the entire area that the user might select.
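The arrangement structure analysis step (0104) can be illustrated with a minimal sketch. It assumes, purely for illustration, that the arrangement structure definition is a list of page zones, each mapped to a description content category, and that a block takes the category of the zone containing its centre. The names `Block`, `ZoneRule`, `estimate_category`, and the zone coordinates are hypothetical; the definition information described above also covers size, ruled lines, and adjacency, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Circumscribed rectangle of a character string block, in pixels."""
    x: int
    y: int
    w: int
    h: int


@dataclass
class ZoneRule:
    """Hypothetical definition entry: a page zone and its category."""
    x0: int
    y0: int
    x1: int
    y1: int
    category: str


def estimate_category(block, rules, default="general_word"):
    """Return the category of the first zone that contains the block centre."""
    cx = block.x + block.w / 2
    cy = block.y + block.h / 2
    for rule in rules:
        if rule.x0 <= cx < rule.x1 and rule.y0 <= cy < rule.y1:
            return rule.category
    return default


# Made-up definition for a single document type.
RULES = [
    ZoneRule(0, 0, 400, 50, "date"),
    ZoneRule(400, 0, 800, 50, "id_number"),
    ZoneRule(0, 50, 800, 600, "amount"),
]

category = estimate_category(Block(10, 10, 100, 20), RULES)
```

Because this step runs once per document before any user interaction, the per-click cost later reduces to looking up a precomputed category.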

After the document structure analysis, character string block extraction, and arrangement structure analysis described above are complete, the document image is displayed to the user of the document processing system through the display device (0110). The user performs business processing (search, inspection, data registration) on the document through an information input device such as a keyboard (0111), a mouse (0112), an electronic pen (0113), or a touch pad (0110). The document display and operation control unit (0105) handles the interaction with the user and mediates between it and the document processing and character recognition processing. In response to events (user actions) such as mouse clicks, pen drags, and cursor movement, this unit performs character recognition of the necessary part of the document (0106) and document processing such as business database search and result display (0107). The character recognition unit (0106) is described with reference to FIG. 2, and the document processing unit (0107) with reference to FIG. 3.

FIG. 2 shows the internal flow of the character recognition unit. Above the character recognition unit is the document display and operation control unit (0105), which supplies the document image, the arrangement structure information, and the recognition request information. The recognition request information is data describing, among other things, the mode in which recognition is to be performed. Based on these inputs, the image region selection unit first determines the region (group of character string blocks) to be recognized (0201). Character segmentation is then performed on the character lines in the determined region (0202), and each segmented character pattern is identified (0203). The result is a candidate character network, described later with reference to FIGS. 8 and 9. A candidate character network represents the character line image to be recognized as a directed graph whose edges are the character patterns with their identification results and whose nodes are the cut points between character patterns. The character identification unit generally identifies character patterns using a character identification dictionary (0206) stored in an external storage device or in memory. Next, notation analysis is performed on the candidate character network obtained from the character segmentation unit (0202) and the character identification unit (0203) (0204). Notation analysis generally uses a notation knowledge dictionary (0207) stored in an external storage device or in memory to check which words make up the character string and how the words are ordered, and determines the text of the character string from the candidate character network.
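A candidate character network and a very simple form of notation analysis can be sketched as follows. This is an illustration under stated assumptions, not the method of the embodiment: the network is a dictionary from cut point to outgoing edges, each edge carries a hypothetical identification score where higher is better, the notation knowledge is reduced to a plain set of allowed strings, and every path is enumerated exhaustively, which is only practical for short lines. The names `enumerate_paths` and `notation_analysis` are invented for this example.

```python
def enumerate_paths(edges, start, end):
    """Yield (text, score) for every path from start to end.

    edges maps a cut point (node) to a list of (next_node, char, score).
    Cut points only move forward along the line, so the graph is acyclic.
    """
    stack = [(start, "", 0.0)]
    while stack:
        node, text, score = stack.pop()
        if node == end:
            yield text, score
            continue
        for next_node, char, char_score in edges.get(node, []):
            stack.append((next_node, text + char, score + char_score))


def notation_analysis(edges, start, end, lexicon):
    """Pick the best-scoring path whose text is allowed by the lexicon."""
    allowed = [(score, text)
               for text, score in enumerate_paths(edges, start, end)
               if text in lexicon]
    if not allowed:
        return None
    return max(allowed)[1]


# Two segmentation hypotheses: two narrow characters, or one wide character.
NETWORK = {
    0: [(1, "1", 0.6), (1, "l", 0.7), (2, "W", 0.3)],
    1: [(2, "0", 0.5), (2, "O", 0.6)],
}

text = notation_analysis(NETWORK, 0, 2, {"10", "W"})
```

Here the individually best-scoring characters would spell "lO", but the lexicon rules that reading out, which is the kind of correction notation knowledge is meant to provide.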

At this point, the notation knowledge dictionary (0207) is switched using the description content category information of the character string block to be recognized, which is contained in the arrangement structure information supplied by the document display and operation control unit (0105). A notation knowledge dictionary is prepared for each description content category. This improves the recognition accuracy for the character string block. A block may have several candidate categories; in that case, character string recognition is performed with the notation knowledge dictionary for each candidate category, and multiple results are output in order of likelihood. Finally, the recognition result integration unit uses the text codes obtained from character identification and the arrangement structure information to finalize the set of recognition results (0205). This completes the flow of the character recognition unit, and the result is returned to the upper-level processing (0105) as the reading result.
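The switching of notation knowledge by category, including the case of several candidate categories, might look like the sketch below. It makes several assumptions that are not in the source: each notation knowledge dictionary is stood in for by a regular expression, each candidate category comes with a hypothetical prior weight, and the likelihood of a result is the product of the reading score and that weight. The names `NOTATION_KNOWLEDGE` and `read_with_categories` are illustrative.

```python
import re

# Stand-ins for per-category notation knowledge dictionaries (assumed format).
NOTATION_KNOWLEDGE = {
    "date": re.compile(r"\d{4}/\d{2}/\d{2}"),
    "amount": re.compile(r"\d{1,3}(?:,\d{3})*"),
}


def read_with_categories(readings, category_candidates):
    """Filter candidate readings with each category's notation knowledge.

    readings: list of (text, score) from character identification.
    category_candidates: list of (category, weight) for the clicked block.
    Returns (likelihood, category, text) tuples, most likely first.
    """
    results = []
    for category, weight in category_candidates:
        pattern = NOTATION_KNOWLEDGE[category]
        for text, score in readings:
            if pattern.fullmatch(text):
                results.append((score * weight, category, text))
    results.sort(reverse=True)
    return results


readings = [("2004/07/13", 0.8), ("2OO4/O7/13", 0.9), ("2,004", 0.4)]
ranked = read_with_categories(readings, [("date", 0.7), ("amount", 0.3)])
```

The raw top-scoring reading contains the letter O in place of zeros and is rejected by both patterns, so the ranked output starts with the well-formed date.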

FIG. 3 shows the internal flow of the document processing unit. Above the document processing unit is the document display and operation control unit (0105). This unit displays the document image to the user through the display device and, through the control input device, carries out processing of the document, encoding of image data, document search, browsing of business data, and so on. When character recognition is needed, it sends information about the region to be recognized to the character recognition unit (0106; see 0201 to 0205 for details) and receives the recognition result. When document processing such as search or business data browsing is to be performed on the recognition result, it passes the necessary data to the document processing unit (0107) and delegates the processing to it. The document processing unit receives from the upper level the document image, the arrangement structure information, the recognition result, and the document processing request information. The document image need not be the image data itself; an ID number that uniquely identifies the image may be used instead. The document processing request information contains the information necessary and sufficient to specify the user's request, that is, what kind of document processing is to be performed with the recognition result. On receiving these inputs, the business-related processing control unit starts operating (0301).

The document processing control unit searches the document database, the business database, and so on according to the user's request. For example, to search for documents containing a recognized word, it first performs ambiguity handling (0302), which absorbs the uncertainty and instability of the recognition result, and then performs the requested document search over the data stored in the document database (0304) (0303). To search for data associated with a recognized word (for example, in medical literature, to find the names of drugs that must not be used together with it), it likewise performs ambiguity handling (0305) and then retrieves the necessary information from the business database (0307) using the recognized keyword (0306). The results of the document search and the information search are returned to the document processing control unit and from there to the document display and operation control unit (0105) as document search results or information search results.
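The text does not say how the ambiguity handling steps (0302, 0305) work. One plausible reading, sketched here as an assumption, is approximate string matching of the recognized keyword against the keys of the business database, so that a single misrecognized character does not prevent a hit. The sketch uses `difflib.get_close_matches` from the Python standard library; the function name `ambiguity_tolerant_lookup`, the database contents, and the cutoff value are all made up.

```python
import difflib


def ambiguity_tolerant_lookup(recognized, database, cutoff=0.75):
    """Return (key, record) pairs whose key is close to the recognized text.

    database is a plain dict standing in for the business database.
    cutoff is the minimum similarity ratio (0 to 1) for a key to count.
    """
    keys = difflib.get_close_matches(recognized, list(database),
                                     n=3, cutoff=cutoff)
    return [(key, database[key]) for key in keys]


# Placeholder records; real entries would hold the associated business data.
BUSINESS_DB = {
    "aspirin": "record-001",
    "warfarin": "record-002",
    "ibuprofen": "record-003",
}

hits = ambiguity_tolerant_lookup("asp1rin", BUSINESS_DB)
```

A lower cutoff tolerates more recognition errors at the price of more false hits, so in practice it would need tuning against the error characteristics of the recognizer.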

FIG. 4 shows a concrete configuration example of the document processing apparatus described above. In the document display device in the upper part of FIG. 4, an image input device (0401) converts a paper document into electronic data, which is stored in an external storage device (0404) and memory (0405) and read by a central processing unit (0406). Of the processes shown in FIGS. 1 to 3, the central processing unit (0406) performs processes 0101 to 0109, 0201 to 0207, and 0301 to 0307. When a document processing apparatus exists externally, as shown in FIG. 4, the central processing unit (0406) may perform the document image related processes 0101 to 0105, or only the display and operation related process 0105, and leave the remaining processes to the external document processing apparatus; sharing the work in this way makes a high-speed document processing function possible. Definitions such as document formats are stored in the external storage device (0404) and are referenced during document structure analysis. A person can operate these processes through the operation terminal device (0402), and processing results and the like are displayed through the display terminal device (0403).

Event information such as processing results and recognition activations is, as needed, accumulated in the external storage device or sent to an external device through the communication device (0407). The user views document images and performs business processing through the display terminal device (0403) and the operation terminal device (0402). When a character recognition result is needed during business processing, the user designates the relevant location, for example with a mouse click, and the character recognition program starts. The character recognition program is stored in the external storage device (0404) or in memory (0405) and runs in response to the user's interactive input actions. This eliminates the large amount of computation time required by batch processing and allows document image processing with character recognition to be realized on a small-scale system. The devices above are connected by an internal bus (0408).

The business processing device in the lower part of FIG. 4 performs document search and business database search using the recognition results output by the document display device. It receives recognition results from the communication device (0413) and the external storage device (0410), loads them into memory (0411), has the central processing unit (0412) search for documents and for the data needed for business processing, and reports the results to the document display device through the communication device (0413) and the external communication line (0409). Of the processes shown in FIGS. 1 to 3, the central processing unit (0412) performs, for example, processes 0201 to 0207 and 0301 to 0307. When the central processing unit (0406) is dedicated to the display and operation related process 0105, the central processing unit (0412) additionally performs processes 0101 to 0104 and 0106 to 0109. These devices are connected by an internal bus (0414). Although this example describes the document display device, which handles interactive operation, and the business processing device, which searches the business database, as separate devices, they may be combined into a single device.

FIG. 5 illustrates the concept of click recognition. The starting point is a document image (a, 0501); a medical document is used as the example here. First, character string blocks are extracted from the document image (b). 0502 denotes an extracted ruled line, and 0503 denotes the circumscribed rectangle of a character string block. Next, arrangement structure analysis is performed using the placement information of the character string blocks (c). In this medical document, the analysis classifies the blocks into four broad categories, and 0504 denotes one of them. The analysis is generally performed using an arrangement structure definition. When the user clicks a specific location with a mouse or similar device (0505), click recognition uses the arrangement structure analysis information to recognize the character string block near the click with notation knowledge matched to that block's category. When arrangement structure analysis is not available, the character string to be recognized is identified from the character string blocks alone, and recognition is performed either with general-purpose notation knowledge or with no notation knowledge. In general, using arrangement structure analysis makes it possible to apply notation knowledge restricted to a business category, which reduces misreads and rejects in character string recognition.
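The click recognition flow described for FIG. 5 can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the `Block` class, the `NOTATION_KNOWLEDGE` table, its category names, and the 20-pixel distance threshold are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Block:
    """Circumscribed rectangle of a character string block (cf. 0503)."""
    x0: float
    y0: float
    x1: float
    y1: float
    # Category tag from arrangement structure analysis; None if unavailable.
    category: Optional[str] = None


# Hypothetical notation knowledge per category (illustrative entries only).
NOTATION_KNOWLEDGE: Dict[str, List[str]] = {
    "test_name": ["blood chemistry test", "urinalysis"],
    "unit": ["mg/dl", "g/dl"],
}


def distance(block: Block, x: float, y: float) -> float:
    """Euclidean distance from a point to the rectangle (0 if inside)."""
    dx = max(block.x0 - x, 0.0, x - block.x1)
    dy = max(block.y0 - y, 0.0, y - block.y1)
    return (dx * dx + dy * dy) ** 0.5


def block_near_click(blocks: List[Block], x: float, y: float,
                     max_distance: float = 20.0) -> Optional[Block]:
    """Return the block closest to the click, or None if none is close enough."""
    if not blocks:
        return None
    nearest = min(blocks, key=lambda b: distance(b, x, y))
    return nearest if distance(nearest, x, y) <= max_distance else None


def plan_recognition(blocks: List[Block], x: float, y: float
                     ) -> Optional[Tuple[Block, Optional[List[str]]]]:
    """Choose the block to recognize and the notation knowledge to apply.

    The knowledge is None when the block has no category-specific entry,
    meaning the caller falls back to general-purpose or no notation knowledge.
    """
    block = block_near_click(blocks, x, y)
    if block is None:
        return None
    return block, NOTATION_KNOWLEDGE.get(block.category)
```

Only the block near the click is passed on to recognition, which is where the saving over whole-page batch recognition comes from.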

FIG. 6 shows various user actions (user interfaces) for starting recognition. In recognition designation interface 1 (a), recognition is triggered by a mouse click or an electronic pen tap at the location indicated by 0601. In recognition designation interface 2 (b), the user drags or moves the mouse horizontally (in the direction shown by 0602), or slides the electronic pen horizontally, and the character string (character string block) spanning the width given by the horizontal movement is recognized. As feedback, an underline indicating the designated range is displayed on the GUI, as shown by 0603. In recognition designation interface 3 (c), the user similarly drags, moves, or slides the mouse or electronic pen downward, and the character strings (or character string blocks) belonging to the lines within the designated vertical range are recognized.

Recognition designation interface 4 (d) encloses a rectangular region by dragging or sliding the mouse or electronic pen diagonally and recognizes the character strings or character string blocks inside it. 0605 shows the enclosed rectangular region as displayed on the GUI. Recognition designation interface 5 (e) recognizes the character string blocks that lie inside, or largely overlap, a region that the user circles with the mouse or electronic pen. 0606 denotes the line drawn when the enclosed region is displayed on the GUI. Recognition designation interface 6 (f) designates the recognition region by a gesture. For example, if a check mark such as 0607 is defined to mean that all character string blocks in the vertical direction at the position of the mark are to be recognized, then everything inside the region shown by 0608 becomes a recognition target. This is useful in form recognition when every entry in a particular column is to be recognized, because it saves the effort of enclosing the whole region. This interface is possible because the arrangement information has been analyzed in advance.
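As a rough sketch of how interfaces 4 and 6 could map a user action to a set of character string blocks, the example below selects blocks by overlap with a dragged rectangle and by column membership at a gesture position. The `Block` class, its `column` tag, and the 0.5 overlap threshold are assumptions for illustration; the source does not specify how "largely overlap" is measured.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Block:
    """Circumscribed rectangle of a character string block."""
    x0: float
    y0: float
    x1: float
    y1: float
    # Hypothetical column tag produced by prior arrangement analysis.
    column: Optional[str] = None


def overlap_ratio(block: Block, rx0: float, ry0: float,
                  rx1: float, ry1: float) -> float:
    """Fraction of the block's area that lies inside the region rectangle."""
    w = min(block.x1, rx1) - max(block.x0, rx0)
    h = min(block.y1, ry1) - max(block.y0, ry0)
    if w <= 0 or h <= 0:
        return 0.0
    area = (block.x1 - block.x0) * (block.y1 - block.y0)
    return (w * h) / area if area > 0 else 0.0


def select_in_rectangle(blocks: List[Block], rx0: float, ry0: float,
                        rx1: float, ry1: float,
                        min_ratio: float = 0.5) -> List[Block]:
    """Interface 4: blocks mostly covered by a diagonally dragged rectangle."""
    # Normalize so the drag may run in any diagonal direction.
    rx0, rx1 = sorted((rx0, rx1))
    ry0, ry1 = sorted((ry0, ry1))
    return [b for b in blocks
            if overlap_ratio(b, rx0, ry0, rx1, ry1) >= min_ratio]


def select_column_by_gesture(blocks: List[Block],
                             gx: float, gy: float) -> List[Block]:
    """Interface 6: all blocks in the column of the block under the gesture."""
    hit = next((b for b in blocks
                if b.x0 <= gx <= b.x1 and b.y0 <= gy <= b.y1), None)
    if hit is None or hit.column is None:
        return []
    return [b for b in blocks if b.column == hit.column]
```

The gesture variant depends on the column tags being available beforehand, which mirrors the remark that this interface relies on arrangement information analyzed in advance.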

The recognition result for the target designated through these interfaces is fed back to the user of the document processing system, for example by displaying it on the spot in a pop-up window or by displaying related information retrieved from the business database. When there are multiple recognition results, possible feedback methods include showing a different result at each time interval while the mouse or electronic pen is held still and asking the user to select the correct one, or displaying side by side the related information retrieved from the business database for each of the results.

FIG. 7 shows how, with arrangement structure information, the same action leads to different recognition results depending on the recognition mode. Assume that the arrangement structure information (a) consists of the four types shown in 0701. When a region is designated by click recognition on an image whose arrangement structure has already been analyzed (b), changing the recognition mode changes the recognition result. In region selection 1 (b1), for example, the recognition mode is specified as "Get(Column(x), Row(x))", as shown by 0707. Here x denotes the clicked location (0706), Column denotes the column containing that location, and Row denotes the row containing it; Get forms a command to fetch and recognize the location given by its arguments. In this case, the shaded portion of the figure is selected and submitted to recognition.

In region selection 2 (b2), the recognition mode is specified as "Get(ABCD, Row(x))", as shown by 0708. This selects categories A, B, C, and D as the columns and the row of the clicked location as the row. Categories A, B, C, and D are the arrangement structure information in 0701 and correspond to the vertical series shown by 0702, 0703, 0704, and 0705, respectively. In (b2), therefore, the shaded region is selected and recognition is performed on each block.

In region selection 3 (b3), the recognition mode selects the column of the clicked location and all rows, as shown by 0709. In this case, the entire B series shown shaded (the portion 0703) is selected and becomes the recognition target. The same behavior applies not only to region designation by clicking but also to designation by dragging or sliding (c). In 0710, two blocks are designated by a line. With a recognition mode such as 0711, the rows and columns of the designated blocks are selected as the recognition target region (c1). The recognition mode can be selected in advance to suit the business workflow of the document processing. For example, to compute a column total for a particular item, selecting recognition mode 0709 lets the user recognize the desired item in the vertical direction all at once with a simple operation.

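The recognition modes described for FIG. 7 can be sketched as functions that turn the designated location(s) into a set of columns and a set of rows. This is only an illustration under stated assumptions: the `Block` class, the function names, and the representation of a mode as a Python callable are hypothetical, and the source gives no literal notation for the mode shown at 0709.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass(frozen=True)
class Block:
    column: str  # arrangement category, e.g. "A" to "D" (0702 to 0705)
    row: int     # row index within the analyzed layout


Selection = Tuple[Set[str], Set[int]]
Mode = Callable[[List[Block], List[Block]], Selection]


def mode_column_row(designated: List[Block], blocks: List[Block]) -> Selection:
    """Get(Column(x), Row(x)): columns and rows of the designated locations."""
    return {b.column for b in designated}, {b.row for b in designated}


def mode_abcd_row(designated: List[Block], blocks: List[Block]) -> Selection:
    """Get(ABCD, Row(x)): categories A to D in the designated rows."""
    return {"A", "B", "C", "D"}, {b.row for b in designated}


def mode_column_all_rows(designated: List[Block], blocks: List[Block]) -> Selection:
    """Column of the designated location and every row (cf. 0709)."""
    return {b.column for b in designated}, {b.row for b in blocks}


def get(blocks: List[Block], designated: List[Block], mode: Mode) -> List[Block]:
    """Return the blocks to recognize for the designated location(s) and mode."""
    columns, rows = mode(designated, blocks)
    return [b for b in blocks if b.column in columns and b.row in rows]
```

Passing a list of designated blocks lets the same functions cover both a single click and a drag that touches several blocks, as in case (c1).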

The character string notation analysis process and the character string hypothesis are outlined in FIGS. 8 and 9. FIG. 8 illustrates the flow of character string recognition using a character string hypothesis and notation knowledge. FIG. 9 shows the concept of the character string hypothesis and the details of its data.

FIG. 8 is described first. From the character line to be read (a), the portions presumed to be character patterns are cut out in various ways to produce character pattern candidates, and each candidate is subjected to character identification; the result is the character string hypothesis (b). A character string hypothesis holds, at a minimum, the character pattern candidates, the ranked group of identified character codes obtained from character identification, and information on the connection relations between character pattern candidates within the hypothesis. The character string hypothesis is thus represented as a graph and is therefore also called a candidate character network. Next, the character string notation knowledge (c) is used to compute a character string path (d) from the character string hypothesis (candidate character network). A character string path is a uniquely determined character code string (text) together with the sequence of character patterns corresponding to each character code. In this example, the character string notation knowledge is expressed as words separated by the OR symbol (|); that is, the words delimited by | are the ones designated as notation knowledge. Besides this representation, character string notation knowledge can also be expressed using a trie, a context-free grammar, or similar methods (described in JP 2001-014311 A (Patent Document 5) and elsewhere).

FIG. 9 shows the character string hypothesis (candidate character network) in detail. The hypothesis is represented as a directed graph in which each character pattern candidate is an arc (0901) and each character pattern boundary is a node (0902). Each character pattern carries the boundary ID numbers of its left and right nodes (upper and lower for vertical writing), the character identification candidates (0903), and the identification similarities (0904). Knowledge processing takes the character string hypothesis and the character string notation knowledge as input and finds the words, and their pattern sequences, that can be contained in the hypothesis. For example, the word "blood chemistry test" in the character string notation knowledge can be found by following the circled character codes and character patterns (0905) in the character string hypothesis of FIG. 3(b). When the notation of the character string written in a field is fixed in advance, this processing determines the character string code.
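The knowledge processing over the candidate character network can be sketched as a search for paths whose identification candidates spell a word from the notation knowledge. This is a minimal sketch, not the method of the patent: the `Arc` class and function names are hypothetical, and ranking matches by mean identification similarity is an assumption, since the source does not state how competing paths are scored.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Arc:
    """One character pattern candidate (graph arc, cf. 0901)."""
    left: int    # boundary ID of the left node (cf. 0902)
    right: int   # boundary ID of the right node; must be greater than left
    # (character, similarity) pairs: identification candidates (0903, 0904).
    candidates: Tuple[Tuple[str, float], ...]


def match_word(arcs: List[Arc], word: str,
               start: int, goal: int) -> Optional[float]:
    """Best mean similarity over paths from start to goal spelling `word`.

    Returns None when no path through the network spells the word.
    """
    if not word:
        return None
    by_left: Dict[int, List[Arc]] = {}
    for arc in arcs:
        by_left.setdefault(arc.left, []).append(arc)

    def walk(node: int, i: int) -> Optional[float]:
        # Best total similarity for spelling word[i:] from `node` to `goal`.
        if i == len(word):
            return 0.0 if node == goal else None
        best: Optional[float] = None
        for arc in by_left.get(node, []):
            for char, similarity in arc.candidates:
                if char != word[i]:
                    continue
                rest = walk(arc.right, i + 1)
                if rest is not None and (best is None or similarity + rest > best):
                    best = similarity + rest
        return best

    total = walk(start, 0)
    return None if total is None else total / len(word)


def recognize(arcs: List[Arc], knowledge: str,
              start: int, goal: int) -> List[Tuple[str, float]]:
    """Match every word of OR-separated notation knowledge, best score first."""
    results = []
    for word in knowledge.split("|"):
        score = match_word(arcs, word, start, goal)
        if score is not None:
            results.append((word, score))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

Because alternative cut-outs of the same stretch of the line appear as parallel arcs, a word can match even when the top-ranked segmentation or the top-ranked identification candidate is wrong.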

With the processing described above, character recognition is started interactively during document processing work and text conversion is limited to the portions the work actually needs, which reduces processing time. In addition, analyzing the two-dimensional arrangement structure of the character strings on the document makes it possible to estimate the content category of the character string designated for recognition (a date, a monetary amount, a business term, and so on) and to recognize the string with notation knowledge that matches that category, which improves recognition accuracy.

FIG. 1 is a document processing flow diagram using click recognition.
FIG. 2 is a processing flow diagram of the character recognition unit.
FIG. 3 is a processing flow diagram of the document processing unit.
FIG. 4 shows a configuration example of the document display device and the document processing device.
FIG. 5 is a conceptual diagram of click recognition using arrangement structure analysis.
FIG. 6 shows the click recognition interfaces.
FIG. 7 shows mode designation of the click recognition range.
FIG. 8 is a conceptual diagram of notation knowledge processing using a character string hypothesis.
FIG. 9 is a conceptual diagram of the character string hypothesis.

Explanation of symbols

0101: image input unit, 0102: document structure analysis unit, 0103: character line and character block extraction unit, 0104: arrangement structure analysis unit, 0105: document display and operation control unit, 0106: character recognition unit, 0107: document processing unit, 0108: document structure dictionary, 0109: arrangement structure definition, 0110: display device, 0111: keyboard, 0112: mouse, 0113: electronic pen,
0201: recognition target region (character string block group) selection unit, 0202: character cut-out unit, 0203: character identification unit, 0204: notation analysis unit, 0205: recognition result integration unit, 0206: character identification dictionary, 0207: notation knowledge dictionary,
0301: business-related processing control unit, 0302: ambiguity handling unit, 0303: document search unit, 0304: document database, 0305: ambiguity handling unit, 0306: information search unit, 0307: business database,
0401: image input device of the document display device, 0402: operation terminal device of the document display device, 0403: display terminal device of the document display device, 0404: external storage device of the document display device, 0405: memory of the document display device, 0406: central processing unit of the document display device, 0407: communication device of the document display device, 0408: internal bus of the document display device, 0409: data communication line, 0410: external storage device of the business processing device, 0411: memory of the business processing device, 0412: central processing unit of the business processing device, 0413: communication device of the business processing device, 0414: internal bus of the business processing device,
0501: example of a document image to be processed, 0502: ruled line extracted from the document image, 0503: character string block extracted from the document image, 0504: character string block tagged as a result of arrangement structure analysis, 0505: cursor of a mouse, electronic pen, or similar device, 0506: click recognition result,
0601: cursor of a mouse, electronic pen, or similar device, 0602: arrow indicating cursor movement, 0603: horizontal line showing the selected region on the GUI, 0604: vertical line showing the selected region on the GUI, 0605: enclosing line showing the selected region on the GUI as a circumscribed rectangle, 0606: enclosing line showing the selected region on the GUI as a circle, 0607: stroke traced when a gesture is made, 0608: enclosing line showing the region selected as a result of the gesture,
0701: arrangement structure information, 0702: category A of the arrangement structure information, 0703: category B of the arrangement structure information, 0704: category C of the arrangement structure information, 0705: category D of the arrangement structure information, 0706: cursor action (click at that location), 0707: region selection 1 determined by the recognition mode designation and the cursor click position, 0708: region selection 2 determined by the recognition mode designation and the cursor click position, 0709: region selection 3 determined by the recognition mode designation and the cursor click position, 0710: cursor action (drag line at that location), 0711: region selection determined by the recognition mode designation and the cursor drag line position,
0901: cut-out character pattern and identification candidates (graph arc), 0902: character cut-out boundary (graph node), 0903: group of character identification candidates, 0904: group of identification similarities corresponding to the character identification candidates, 0905: character identification candidates selected as a result of knowledge processing.

Claims

An interactive document processing device comprising: an image input device that accepts input of a document image in which characters are written; a central processing unit; a storage device that holds an arrangement structure definition and notation knowledge prepared for each description content category; an operation terminal device having a display device and a user input device; and a communication device, wherein the central processing unit extracts a plurality of character string blocks from the input document image, analyzes the two-dimensional arrangement structure of the character string blocks on the document, estimates, by referring to the arrangement structure definition on the basis of that arrangement structure, a description content category representing the described content of each character string block, and, when the input unit of the operation terminal device receives from a user of the interactive document processing device an operation instructing the start of recognition processing, selects the designated character string block, extracts character cut-out candidates from the character string block, performs character identification on the character cut-out candidates, and recognizes the character string block by referring to the character identification results for the character cut-out candidates and to the notation knowledge for the description content category corresponding to the selected character string block.

The document processing apparatus according to claim 1, wherein the central processing unit accepts, as an instruction to start recognition processing, an event such as cursor movement, a click, a drag, or a gesture made through a display operation terminal device comprising a user input device such as a mouse, an electronic pen, or a touch pad and a display device such as a cathode ray tube display, a liquid crystal display, or a portable display terminal; switches the method of selecting the character string block to be recognized according to the recognition mode designation stored in the storage device; switches the notation knowledge applied to the character string block according to the description content category of that block and recognizes the character string; and switches the output format of the reading result on the display device according to the recognition mode.

The document processing apparatus according to claim 1, wherein the central processing unit further: lists a plurality of description content category candidates when estimating the described content of the character string block and calculates a likelihood for each description content category; calculates, through the extraction of the character cut-out candidates and the character identification, character identification results and the similarity of each result; takes as the reading result of the character string block the likelihood of the estimated description content category, the similarity of the character identification results, and the text obtained by applying the notation knowledge corresponding to the description content category; arranges the reading results of the character string block in order of likelihood; and stores the plurality of reading results in a data storage device or sends them to another document processing device through the communication device.

A document processing system comprising the document processing apparatus according to claim 1, which includes an operation terminal device, a storage device, a central processing unit, and a communication device, and a business processing device, which includes a central processing unit, a storage device, and a communication device, wherein: the central processing unit of the interactive document processing device transmits the reading result of a group of character string blocks designated as a recognition target to the business processing device through the communication device; the central processing unit of the business processing device retrieves information related to the reading result from a business database stored in its storage device and transmits the retrieved information from the communication device of the business processing device; and the interactive document processing device receives the information and presents it to the user through the display operation terminal device as data related to the character string blocks designated as the recognition target, thereby performing interactive document processing.

A program executed by a computer comprising an operation terminal device, a storage device, and a central processing unit, the program causing the central processing unit to execute: a step of receiving input of an image in which characters are written; a step of extracting character string blocks from the image; a step of estimating, from the two-dimensional arrangement structure of the character string blocks and with reference to an arrangement structure definition stored in the storage device, a description content category representing the described content of each character string block; a step of displaying the image on a display screen of the operation terminal device and receiving a user operation input from an input unit of the operation terminal device; a step of determining from the user operation that recognition is to be started; a step of, upon receiving the recognition start, selecting from the image the character string block to be recognized and extracting character cut-out candidates from that character string block; a step of performing character identification on the character cut-out candidates and performing character string recognition on data comprising the character identification results, the description content category corresponding to the character string block, and the character cut-out candidates, using notation knowledge held in the storage device that matches the description content category; a step of retrieving related information from a business database according to the reading result; and a step of displaying the related information on a display terminal.