JPH11102414A

JPH11102414A - Method and device for correcting optical character recognition by using bitmap selection and computer-readable record medium record with series of instructions to correct ocr output error

Info

Publication number: JPH11102414A
Application number: JP10110882A
Authority: JP
Inventors: J Mcnaney Michael; ジェイ．マキナニーマイケル
Original assignee: KURARITEC CORP
Current assignee: KURARITEC CORP
Priority date: 1997-07-25
Filing date: 1998-04-21
Publication date: 1999-04-13
Also published as: US6453079B1

Abstract

PROBLEM TO BE SOLVED: To allow a user to correct errors that occur while an original text is converted into OCR output text without having to look at the original paper document. SOLUTION: A document image 210 that is the source of optical character recognition (OCR) output is displayed. A recognition likelihood parameter is determined for each area of the document image that corresponds to a word in the OCR output. The areas are displayed in a manner that indicates their respective recognition likelihood parameters (for instance, highlighted in various colors). Preferably, the user can select an area in the displayed document image. When an area is selected, the text of the OCR output that corresponds to the selected area is shown in a pop-up menu.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to optical character recognition, and more particularly to a method and apparatus for finding errors in optical character recognition output, and to a computer-readable recording medium on which a series of instructions for finding errors in OCR output is recorded.

[0002]

[Prior Art] Acquiring text and graphics from paper documents is a significant problem for many industries. For example, a publishing company may print hundreds or thousands of scholarly articles over the course of a year. Often the publishing company starts from a paper document, which must be entered into the publishing company's computer system. One conventional approach is to hire keyboard operators to read the paper document and type it into the computer system. However, keying in documents is time-consuming and costly.

[0003] Optical character recognition (hereinafter, OCR) is a technology that promises to be beneficial to the publishing industry and other industries, because the input processing speed of an OCR device far exceeds the input speed of a keyboard operator. Accordingly, employees of a publishing company often start from a scanned document that has been converted by an OCR device into a computer-readable text format such as ASCII.

[0004] However, even the high recognition rates attainable with recent OCR devices (often exceeding 95%) are not sufficient for industries, such as the publishing industry, that require a high degree of accuracy. Publishing companies therefore usually employ proofreaders to correct the OCR output by hand.

[0005]

[Problems to be Solved by the Invention] However, proofreading OCR output by hand is very time-consuming and difficult for a person to do. The proofreader must look back and forth between the original paper document and a printout or screen display of the OCR output and compare them word by word. Even when the recognition rate is high, a person proofreading OCR output tends to become complacent and overlook errors.

[0006] Another conventional option is to spell-check the resulting computer-readable text. However, a spell check does not recognize every misspelled word. In addition, an input word may be so garbled that the proofreader must refer back to the paper text throughout the spell check. Once the proofreader has looked at the paper text and determined the correct word, the correct word is keyed into the OCR output text. This approach has been found to be time-consuming and somewhat error-prone.

[0007] An object of the present invention is to spare the user from having to look at the original paper document when correcting errors that occurred while the original text was converted into OCR output text.

[0008]

[Means for Solving the Problems] There is a need to make it easier for a person to proofread OCR output. In particular, there is a need to reduce the time spent proofreading OCR output.

[0009] These and other needs are met by the present invention. Characters in a document image obtained from an original paper document are recognized (for example, via OCR) to generate document text. Regions of the document image that correspond to words of the document text are determined, and a recognition accuracy parameter is determined for each region. The regions of the document image can be displayed in a manner that indicates their respective recognition accuracy parameters. As the user moves through the document text, the document image is displayed correspondingly.

[0010] Preferably, the user can select a position in the document image. The selected word is determined from the region of the document image that contains the selected position, and is displayed, for example, in a pop-up menu. In addition, the recognition accuracy parameter may be compared with one or more thresholds, and the region may be displayed in a color corresponding to the threshold that was exceeded.

[0011] Additional objects, advantages, and novel features of the invention are set forth in part in the detailed description that follows, and in part will become apparent upon examination of that description or may be learned by practice of the invention. The objects and advantages of the invention may be realized and attained by means of the methods and apparatus pointed out in the appended claims.

[0012]

[Embodiments of the Invention] A method and apparatus for correcting optical character recognition using a heat map according to the present invention, and a computer-readable recording medium on which a series of instructions for finding errors in OCR output is recorded, are described below by way of the specific examples shown in the accompanying drawings, although the present invention is not limited to them. In the drawings, elements bearing the same reference numerals denote similar elements.

[0013] A method and apparatus for finding errors in OCR output are described. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.

[0014] (Hardware Overview) FIG. 1 is a block diagram of a computer system 100 on which an embodiment of the present invention may be implemented. Computer system 100 includes a bus 110 or other communication mechanism for communicating information, and a processor 112 coupled to bus 110 for processing information. Computer system 100 further includes a random access memory (RAM) or other dynamic storage device 114 (referred to as main memory), coupled to bus 110 for storing information and instructions to be executed by processor 112. Main memory 114 may also be used to store temporary variables or other intermediate information while processor 112 executes instructions. Computer system 100 also includes a read-only memory (ROM) and/or other static storage device 116, coupled to bus 110, for storing static information and instructions for processor 112. A data storage device 118, such as a magnetic disk or optical disk and its corresponding disk drive, may be coupled to bus 110 for storing information and instructions.

[0015] Input and output devices may also be coupled to computer system 100 via bus 110. For example, computer system 100 uses a display device 120, such as a cathode ray tube (CRT), to display information to a computer user. Computer system 100 further uses a keyboard 122 and a cursor control 124, such as a mouse. In addition, computer system 100 may use a scanner 126 to convert paper documents into a computer-readable format. Furthermore, computer system 100 may use an OCR device 128 to recognize characters in a document image produced by scanner 126 or stored in main memory 114 or storage device 118. Alternatively, the functionality of OCR device 128 may be implemented in software, by processor 112 executing instructions stored in main memory 114. In yet another example, scanner 126 and OCR device 128 may be combined into a single device designed to scan a paper document and recognize the characters on it.

[0016] The present invention relates to the use of computer system 100 to find errors in OCR output. According to one embodiment, finding errors in OCR output is performed by computer system 100 in response to processor 112 executing a sequence of instructions stored in main memory 114. Such instructions may be read into main memory 114 from another computer-readable medium, such as data storage device 118. Executing the sequence of instructions stored in main memory 114 causes processor 112 to perform the process steps described below. In alternative examples, hard-wired circuitry may be used in place of, or in combination with, software instructions to implement the present invention. Thus, the present invention is not limited to any specific combination of hardware circuitry and software.

[0017] (Composite Document Architecture) A composite document contains multiple representations of a document and treats the multiple representations as a logical whole. As shown in FIG. 2, a composite document 200 is stored in a memory, such as main memory 114 or storage device 118 of computer system 100.

[0018] Composite document 200 includes a document image 210, which is a bitmap representation of a document (for example, a TIFF file produced by scanner 126). For example, a copy of the United States Constitution may be scanned by scanner 126 to produce an image of the Constitution in the form of document image 210.

[0019] A bitmap representation is an array of pixels, which may be monochrome (for example, black and white) or multicolor (for example, red, blue, green, and so on). The position of a rectangular region of document image 210 can therefore be specified, for example, by the pair of coordinates of the rectangle's upper-left and lower-right corners. In the example of scanning the United States Constitution, the first letter of the word "form" in the Preamble may be located in a rectangle with upper-left coordinates (16, 110) and lower-right coordinates (31, 119). Similarly, the last letter of the same word may be located in a rectangle with upper-left coordinates (16, 140) and lower-right coordinates (31, 149).

[0020] Composite document 200 also includes document text 220 and a correlation table 230, which may be produced by the method shown in the flowchart of FIG. 3. Document text 220 consists of a sequence of 8-bit or 16-bit bytes that encode characters in an encoding such as ASCII, EBCDIC, or Unicode. Characters in document text 220 can therefore be located by their offsets into document text 220. In the example, as shown in the offset column of correlation table 230, the first character of the word "form" in the Preamble is at offset 57 and the last character of the same word is at offset 60.

[0021] Referring to FIG. 3, in step S250 the characters in document image 210 are recognized by OCR device 128 or an equivalent, and in step S252 they are saved in document text 220. OCR device 128 is also designed to output, in step S250, the coordinates in the document image of each recognized character. A character recognized at a known offset in document text 220 can therefore be associated with a region of document image 210. In the example, the character at offset 57 is associated with the region defined by the coordinates (16, 110) and (31, 119).

[0022] In addition, some implementations of OCR device 128 are designed, as is well known in the art, to output a recognition accuracy parameter that indicates the probability that the recognition result is correct. For example, in a certain font the characters "rn" in document image 210 might be recognized as the character "m" with a determinable probability. In this case, OCR device 128 may output a recognition accuracy parameter of, for example, 60% for that character pair.

[0023] In step S254, the words of document text 220 are identified, for example by treating the characters between whitespace as a word. Also in step S254, the regions of document image 210 that correspond to the characters of a word are merged into a single region that corresponds to the complete word of document text 220. In one embodiment, the region is specified as a rectangle whose corners are the most upper-left and the most lower-right of the coordinates of the regions corresponding to the individual characters. For example, the region corresponding to the word "form" in the Preamble is specified by the rectangle with coordinates (16, 110) and (31, 149). Alternatively, particularly for documents with characters of varying sizes, a list of the coordinates of all the underlying characters may be saved.
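The merging of character regions into one word region can be sketched as follows. This is only an illustration under assumptions: the `Box` type and the function name `merge_boxes` are hypothetical, and each coordinate pair is treated simply as two integers without assuming which one is the horizontal axis.

```python
from typing import Iterable, Tuple

Point = Tuple[int, int]
Box = Tuple[Point, Point]  # (upper-left corner, lower-right corner)


def merge_boxes(char_boxes: Iterable[Box]) -> Box:
    """Return the smallest rectangle that encloses every character box."""
    boxes = list(char_boxes)
    if not boxes:
        raise ValueError("a word needs at least one character box")
    # Most upper-left corner: smallest value of each component.
    upper_left = (min(b[0][0] for b in boxes), min(b[0][1] for b in boxes))
    # Most lower-right corner: largest value of each component.
    lower_right = (max(b[1][0] for b in boxes), max(b[1][1] for b in boxes))
    return (upper_left, lower_right)


# First and last letters of "form", using the coordinates from the example.
first_char = ((16, 110), (31, 119))
last_char = ((16, 140), (31, 149))
word_box = merge_boxes([first_char, last_char])
```

With the two character rectangles from the example, `word_box` is the rectangle (16, 110) to (31, 149) given in the text for the whole word.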

[0024] Once a word has been identified, a recognition accuracy parameter for the word is calculated from the recognition accuracy parameters of all of its characters or character pairs. Preferably, the recognition accuracy parameter for a word is calculated by multiplying together the recognition accuracy parameters of the individual characters. In recognizing the word "form", the letters "f" and "o" may have very high recognition accuracy parameters (for example, 95% and 90%), while the character pair "rm" may have a recognition accuracy parameter of only 60%. Multiplying these recognition accuracy parameters together gives an overall recognition accuracy parameter of 51.3%. Alternatively, another calculation may be used, for example taking the lowest recognition accuracy parameter in the word (for example, 60%).
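The two calculations mentioned above (product and minimum) can be written out as a short sketch. The function names are hypothetical, and the parameters are represented as fractions between 0 and 1 rather than percentages, which is an assumption made for convenience.

```python
import math
from typing import Sequence


def word_confidence_product(char_confidences: Sequence[float]) -> float:
    """Multiply the per-character (or per-character-pair) parameters."""
    return math.prod(char_confidences)


def word_confidence_min(char_confidences: Sequence[float]) -> float:
    """Alternative calculation: use the lowest parameter in the word."""
    return min(char_confidences)


# "f" = 95%, "o" = 90%, character pair "rm" = 60%, as in the example.
parts = [0.95, 0.90, 0.60]
by_product = word_confidence_product(parts)  # approximately 0.513
by_minimum = word_confidence_min(parts)      # 0.60
```

The product penalizes words with several uncertain characters more heavily than the minimum does, so it tends to flag more words for a given threshold.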

[0025] In step S256, information about each word of document text 220 is saved in correlation table 230, so that regions of document image 210 can be correlated with words of document text 220. Specifically, correlation table 230 stores a coordinate pair 232 that specifies a region in document image 210, an offset pair 234 that specifies the position of the word in document text 220, and a recognition accuracy parameter 236 for the word. In the example, the word "form" has a coordinate pair 232 of (16, 110) and (31, 149), an offset pair 234 of 57 and 60, and a recognition accuracy parameter 236 of 51.3%.

[0026] Using correlation table 230, an offset 234 in document text 220 corresponds to the region of document image 210 specified by coordinates 232, and vice versa. For example, given the coordinates (23, 127), the coordinates 232 column of correlation table 230 can be searched to determine that the given coordinates fall within the word located at offsets 57-60. The word located at those offsets in document text 220, in this example the word "form", can then be retrieved.

[0027] In the other direction, correlation table 230 is searched for a given offset (for example, 58), and as a result the rectangle with coordinates (16, 110) and (31, 149) is identified. The composite document architecture described here thus provides one way of correlating the position of a word in document text 220 with the corresponding region of document image 210.
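A correlation table with lookups in both directions might be sketched as below. The `Entry` class, its field names, and the linear search are assumptions made for illustration; the description does not prescribe a particular data layout or search strategy.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[int, int]


@dataclass
class Entry:
    upper_left: Point    # coordinate pair 232, first corner
    lower_right: Point   # coordinate pair 232, second corner
    first_offset: int    # offset pair 234, first character of the word
    last_offset: int     # offset pair 234, last character (inclusive)
    confidence: float    # recognition accuracy parameter 236, as a fraction


def find_by_point(table: List[Entry], point: Point) -> Optional[Entry]:
    """Image to text: find the entry whose rectangle contains the point."""
    for e in table:
        if (e.upper_left[0] <= point[0] <= e.lower_right[0]
                and e.upper_left[1] <= point[1] <= e.lower_right[1]):
            return e
    return None


def find_by_offset(table: List[Entry], offset: int) -> Optional[Entry]:
    """Text to image: find the entry whose offset range contains the offset."""
    for e in table:
        if e.first_offset <= offset <= e.last_offset:
            return e
    return None


# The single entry for the word "form" from the example.
table = [Entry((16, 110), (31, 149), 57, 60, 0.513)]
hit = find_by_point(table, (23, 127))   # entry with offsets 57-60
region = find_by_offset(table, 58)      # entry with the (16, 110)-(31, 149) box
```

A linear scan is enough for a sketch; a real table covering a long document would probably want an index on each column.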

[0028] (Indicating Words Likely to Be Misrecognized) To reduce the time spent referring to the original paper document, the scanned image of the original paper document (that is, document image 210) is displayed to the proofreader. In the example of scanning the United States Constitution, the scanned image of the Preamble may be displayed in an image display 300, as shown in FIG. 4.

[0029] In image display 300, the words most likely to have been misrecognized are displayed in a distinct manner, for example by highlighting them in a different color, changing the font, flashing, or underlining. These words can be determined by comparing the corresponding recognition accuracy parameter 236 with a prescribed threshold. For example, words whose recognition accuracy parameter 236 is 60% or less may be displayed in red, drawing the user's attention to words that may be wrong.

[0030] In the example, suppose the original word "form" was misrecognized as "fonn" with a recognition accuracy parameter 236 of 51.3%. In this case, the black pixels in the region of image display 300 that corresponds to the word "fonn" are displayed as red pixels. In a preferred example, the color of the background pixels surrounding the image of a character is changed instead of the color of the pixels that make up the character image.

[0031] In a preferred example, recognition accuracy parameter 236 is further compared with a plurality of thresholds to determine a display color for each region of document image 210, forming a "heat map" of the recognized words. A heat map is a chart that uses multiple colors to show the value of a parameter (for example, frequency, temperature, or recognition accuracy) at various points in a spectrum. The resulting heat map helps guide the user to the parts of the document image that are most likely to be problematic in the OCR output.

[0032] Referring to FIG. 5, the heat map is generated when document image 210 is displayed in image display 300, by a loop controlled by step S310. Step S310 loops over every region to be displayed in image display 300. In step S320, correlation table 230 is searched to find the recognition accuracy parameter 236 that corresponds to the displayed region. This parameter 236 is then compared in turn with a plurality of thresholds, for example 60%, 80%, and 90%.

[0033] Steps S322-S334 show the generation of the heat map display for thresholds of, for example, 60%, 80%, and 90%. First, the lowest threshold, 60%, is used as the comparison threshold. If recognition accuracy parameter 236 is below that threshold, the color of the displayed region is set to red (step S324). In the example, the word "form" is displayed in red because its recognition accuracy parameter 236 is 51.3%. The other words set to red in FIG. 4 are "general" and "Constitution".

[0034] Next, in step S326, the next lowest threshold, 80%, is used as the comparison threshold. If recognition accuracy parameter 236 is below that threshold, the color of the displayed region is set to green (step S328). In the example, the word "Union" has a recognition accuracy parameter 236 of 75% and is therefore displayed in green. The other words set to green in FIG. 4 are "ensure" and "secure".

[0035] In step S330, the last threshold, 90%, is used as the comparison threshold. If recognition accuracy parameter 236 is below that threshold, the color of the displayed region is set to blue (step S332). The words set to blue in FIG. 4 are "more", "Tranquility" (partially hidden by pop-up menu 304), and "establish". If, on the other hand, recognition accuracy parameter 236 is above all of the thresholds, the color of the displayed region is black, the default color (step S334). Once the color has been set, the region is displayed in that color (step S336).
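The cascade of comparisons in steps S322 to S334 can be sketched as a chain of branches. The function name `heat_map_color` is hypothetical, the parameter is assumed to be a fraction between 0 and 1, and the strict "below the threshold" comparison follows the wording of these steps; how a value exactly equal to a threshold is treated is an assumption.

```python
def heat_map_color(confidence: float) -> str:
    """Choose a display color by comparing against 60%, 80%, and 90% in turn."""
    if confidence < 0.60:   # lowest threshold: red (step S324)
        return "red"
    if confidence < 0.80:   # next threshold: green (step S328)
        return "green"
    if confidence < 0.90:   # last threshold: blue (step S332)
        return "blue"
    return "black"          # default color (step S334)


form_color = heat_map_color(0.513)   # "red", as for the word "form"
union_color = heat_map_color(0.75)   # "green", as for the word "Union"
```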

[0036] It will be appreciated that the number of thresholds and the colors assigned to them may vary from embodiment to embodiment without departing from the spirit of the present invention. For example, there may be one, two, three, or even ten thresholds. As another example, the choice of colors may differ (for example, red, orange, and yellow). Indeed, display attributes other than color, such as blinking or underlining, may be employed. It will also be understood that, rather than hard-coding the branches as shown in the flowchart of FIG. 5, the thresholds and the display colors or other display attributes may be entered in a table and examined in sequence in a single loop.
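The table-driven alternative described above might look like the following sketch. The names `THRESHOLDS`, `DEFAULT_ATTRIBUTE`, and `display_attribute` are hypothetical, and the table is assumed to be sorted by ascending threshold so that the first match wins.

```python
from typing import List, Tuple

DEFAULT_ATTRIBUTE = "black"

# (threshold, display attribute) pairs in ascending threshold order.
THRESHOLDS: List[Tuple[float, str]] = [
    (0.60, "red"),
    (0.80, "green"),
    (0.90, "blue"),
]


def display_attribute(confidence: float,
                      thresholds: List[Tuple[float, str]] = THRESHOLDS,
                      default: str = DEFAULT_ATTRIBUTE) -> str:
    """Examine the table in sequence and return the first matching attribute."""
    for limit, attribute in thresholds:
        if confidence < limit:
            return attribute
    return default


# A different embodiment: one threshold and a non-color attribute.
underline_only = [(0.70, "underline")]
attr = display_attribute(0.513, underline_only, default="plain")
```

Changing the number of thresholds or swapping colors for other attributes then only requires editing the table, not the control flow.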

[0037] Correcting errors can be made still easier by letting the user position a cursor 302 over a highlighted word in document image 210 and displaying the corresponding recognized text of document text 220 nearby (for example, in a pop-up menu). For example, when the user positions cursor 302 over the red word "form" in document image 210, a pop-up menu 304 may be displayed, showing that the word was misrecognized as "fonn". When the user corrects the word, the recognition accuracy parameter 236 of the corrected word is reset to 100%, and the region of document image 210 that corresponds to the corrected word is displayed in black again.
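The effect of a correction on the stored data can be sketched as below. Everything here is illustrative: the dictionary layout and the function `apply_correction` are hypothetical, and the shifting of later offsets when the corrected word has a different length is an assumption that the description does not address.

```python
from typing import Dict, List


def apply_correction(text: str, table: List[Dict], index: int,
                     new_word: str) -> str:
    """Replace one word in the text and reset its parameter to 100%.

    The table entries are updated in place; the corrected text is returned.
    """
    entry = table[index]
    first, last = entry["offsets"]              # inclusive offsets
    delta = len(new_word) - (last - first + 1)  # change in length
    entry["offsets"] = (first, first + len(new_word) - 1)
    entry["confidence"] = 1.0                   # corrected word: 100%
    # Assumption: words after the corrected one shift by the length change.
    for other in table:
        if other is not entry and other["offsets"][0] > last:
            a, b = other["offsets"]
            other["offsets"] = (a + delta, b + delta)
    return text[:first] + new_word + text[last + 1:]


text = "to fonn a more"
table = [
    {"offsets": (3, 6), "confidence": 0.513},  # "fonn"
    {"offsets": (8, 8), "confidence": 0.97},   # "a"
]
text = apply_correction(text, table, 0, "form")
```

Once the parameter is 1.0, any threshold comparison of the kind described for the heat map falls through to the default color, which matches the region returning to black.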

[0038] After the loop controlled by step S310 has executed, document image 210 is displayed as image display 300 on a display device 120, such as a high-resolution monitor. In addition, cursor 302 is displayed over image display 300. The user may then position cursor 302 over any part of image display 300 using cursor control 124, such as a mouse, trackball, or joystick.

[0039] In step S340, the error-finding apparatus receives input that selects a position in image display 300. This input may be generated automatically by cursor control 124 whenever cursor 302 is positioned over image display 300, or only when the user operates a button. In the latter case, when the user operates the button, cursor control 124 sends the current position of cursor 302 as input.

[0040] The position associated with the input received in step S340 is converted from the coordinate system of image display 300 to the coordinate system of document image 210 by mapping techniques well known in the art. The coordinate conversion is often necessary because the document image 210 of a large document does not fit within the smaller image display 300. In the example shown in FIG. 4, the position of cursor 302 in image display 300 corresponds to coordinates (23, 127) of document image 210.
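The description leaves the mapping technique open. One common possibility, shown here purely as an assumption, is a display that shows a scrolled and optionally zoomed window onto the document image; the function name, the `scroll_origin` and `zoom` parameters, and the sample cursor positions are all hypothetical.

```python
from typing import Tuple

Point = Tuple[int, int]


def display_to_document(display_point: Point,
                        scroll_origin: Point,
                        zoom: float = 1.0) -> Point:
    """Map a point in the image display to document image coordinates.

    scroll_origin is the document coordinate shown at the display's origin;
    zoom is the number of display pixels per document pixel.
    """
    if zoom <= 0:
        raise ValueError("zoom must be positive")
    return (scroll_origin[0] + int(display_point[0] / zoom),
            scroll_origin[1] + int(display_point[1] / zoom))


# A hypothetical cursor position that maps to the document coordinates
# (23, 127) used in the example.
doc_point = display_to_document((3, 27), scroll_origin=(20, 100))
```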

【００４１】ステップＳ３４２では、ステップＳ３４０
で受け取られた入力から得られた座標２３２を含む領域
を指定する記載を求めて、相関テーブル２３０が調べら
れる。上記例では、座標（２３，１２７）は、（１６，
１１０）−（３１，１４９）の座標範囲によって決まる
領域に含まれる。文書テキスト２２０に対するオフセッ
ト対２３４は、相関テーブル２３０の記載から取り出さ
れ、そして文書テキスト２２０において選択された語を
決めるために使用される。上記例では、対応するオフセ
ット対は５７−６０である。このオフセット対は、オフ
セット対２３４の範囲内のオフセットにて文書テキスト
２２０内に置かれた文字列を抜き出すのに使用される。
上記例では前文の「form」という元の単語が「fonn」と
して誤認識されたと仮定したが、その場合そのオフセッ
ト対２３４の範囲で選択された語は「fonn」となるであ
ろう。In step S342, the correlation table 230 is searched for an entry whose coordinates 232 define a region containing the coordinates obtained from the input received in step S340. In the above example, the coordinates (23, 127) fall within the region defined by the coordinate range (16, 110)-(31, 149). The offset pair 234 into the document text 220 is taken from that entry of the correlation table 230 and used to determine the selected word in the document text 220. In the above example, the corresponding offset pair is 57-60. This offset pair is used to extract the character string located in the document text 220 at the offsets within the range of the offset pair 234. Assuming, as in the above example, that the original word "form" in the preceding sentence was misrecognized as "fonn", the word selected by the offset pair 234 would be "fonn".
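
The lookup in step S342 can be sketched as follows. This is an illustrative reading under assumptions, not the implementation of the invention: the `Entry` class, the functions `find_entry` and `selected_word`, the linear scan, and the treatment of the offset pair as zero-based and inclusive are all assumptions made here, and the document text is padded artificially so that "fonn" sits at offsets 57-60.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """One hypothetical row of correlation table 230."""
    left: int     # coordinates 232: region bounds in document image 210
    top: int
    right: int
    bottom: int
    start: int    # offset pair 234: first character offset (assumed inclusive)
    end: int      # offset pair 234: last character offset (assumed inclusive)


def find_entry(table: list[Entry], x: int, y: int) -> Optional[Entry]:
    """Return the first entry whose region contains (x, y), or None."""
    for entry in table:
        if entry.left <= x <= entry.right and entry.top <= y <= entry.bottom:
            return entry
    return None


def selected_word(text: str, table: list[Entry], x: int, y: int) -> Optional[str]:
    """Extract the word of document text 220 that corresponds to (x, y)."""
    entry = find_entry(table, x, y)
    if entry is None:
        return None
    return text[entry.start:entry.end + 1]


# Values from the example: region (16, 110)-(31, 149), offset pair 57-60.
table = [Entry(16, 110, 31, 149, 57, 60)]
document_text = "x" * 57 + "fonn" + " and further text"  # artificial padding
word = selected_word(document_text, table, 23, 127)
```

A linear scan is enough for a sketch; a table covering a long document might instead be indexed spatially, which the source does not discuss.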

【００４２】ステップＳ３４４で、選択された語は、カ
ーソル３０２の近くでポップアップメニュー３０４内に
表示され、そのため、ユーザは認識された語が何である
かを容易に判断することができる。従って上記例では、
ポップアップメニュー３０４は「fonn」という選択され
た語を表示し、そのためポップアップメニューが表示さ
れると、ユーザは、文書イメージ２１０のイメージ表示
３００を見るだけで、選択された語が正しくないことを
判断することができる。In step S344, the selected word is displayed in the pop-up menu 304 near the cursor 302, so that the user can easily determine what the recognized word is. In the above example, therefore, the pop-up menu 304 displays the selected word "fonn", so that when the pop-up menu is displayed, the user can determine that the selected word is incorrect merely by looking at the image display 300 of the document image 210.

【００４３】一実施の形態によれば、カーソル３０２が
イメージ表示３００のある語の上にある場合には、その
カーソルの位置が自動的に入力され、そのためポップア
ップメニュー３０４が自動的に表示される。従ってユー
ザは、イメージ表示３００のテキストの表示列の上にカ
ーソル３０２を動かすことができ、ポップアップメニュ
ー３０４内の基準位置に自動的に表示される選択された
テキストを比較することができる。従ってユーザは、あ
る文字がＯＣＲ装置１２８によって誤認識されたか否か
を判断するために元の紙を見るのに要する時間を費やさ
ずに済む。もしその語が違っている場合には、ユーザは
上述したようにしてそのテキストを訂正することができ
る。According to one embodiment, when the cursor 302 is over a word in the image display 300, the position of the cursor is input automatically, so that the pop-up menu 304 is displayed automatically. The user can thus move the cursor 302 over the displayed lines of text in the image display 300 and compare them with the selected text that is automatically displayed at a reference position in the pop-up menu 304. The user therefore does not have to spend time looking at the original paper to determine whether a character has been misrecognized by the OCR device 128. If the word is wrong, the user can correct the text as described above.

【００４４】本発明は、特定の実施の形態について言及
しながら特に詳細に説明され、また図示されたが、本発
明の趣旨または範囲から逸脱することなく、形態または
細部について上記説明または図における変形がなされて
もよいということは、当業者により理解されるであろ
う。Although the invention has been particularly described and illustrated with reference to specific embodiments, those skilled in the art will understand that changes in form or detail may be made to the above description or figures without departing from the spirit or scope of the invention.

【００４５】[0045]

【発明の効果】以上、説明したとおり、この発明に係る
ヒートマップを用いて光学式文字認識の訂正を行うため
の方法および装置、並びに、ＯＣＲ出力の誤りを発見す
るための一連の命令を記録したコンピュータ読み取り可
能な記録媒体によれば、元のテキストをＯＣＲ出力のテ
キストに変換している間に起こった間違いを正す際に、
ユーザが元の紙の文書を見なくても済むという効果を奏
する。As described above, the method and apparatus for correcting optical character recognition using a heat map according to the present invention, and the computer-readable recording medium on which a series of instructions for finding errors in OCR output is recorded, have the effect that the user does not have to look at the original paper document when correcting errors that occurred while the original text was converted into OCR output text.

[Brief description of the drawings]

【図１】本発明が実施され得るコンピュータシステムを
示す上位ブロック図である。FIG. 1 is a high-level block diagram illustrating a computer system on which the present invention can be implemented.

【図２】合成文書アーキテクチャを示すブロック図であ
る。FIG. 2 is a block diagram illustrating a composite document architecture.

【図３】合成文書の生成処理を示すフローチャートであ
る。FIG. 3 is a flowchart illustrating a process of generating a composite document.

【図４】本発明の一実施の形態によるスクリーン表示の
一例を示す図である。FIG. 4 is a diagram showing an example of a screen display according to an embodiment of the present invention.

【図５】一実施の形態によるＯＣＲ出力における誤りの
発見および訂正処理を示すフローチャートである。FIG. 5 is a flowchart showing a process of finding and correcting an error in an OCR output according to one embodiment.

[Explanation of symbols]

１１２プロセッサ１１８データ記憶装置（コンピュータ読み取り可能な
記録媒体）１２０表示装置１２４カーソル制御手段１２８光学式文字認識装置２１０文書イメージ２２０文書テキスト２３６認識確度パラメータ３０４ポップアップメニュー112: processor; 118: data storage device (computer-readable recording medium); 120: display device; 124: cursor control means; 128: optical character recognition device; 210: document image; 220: document text; 236: recognition accuracy parameter; 304: pop-up menu

Claims

[Claims]

1. A method for correcting optical character recognition using a heat map, the method being a method for finding errors in an OCR output and comprising the steps of: recognizing characters in a document image to generate document text; determining regions of the document image corresponding to words of the document text; determining a recognition accuracy parameter for each region of the document image; and displaying the regions of the document image so as to represent their respective recognition accuracy parameters.

2. The method for correcting optical character recognition using a heat map according to claim 1, further comprising the steps of: receiving an input that selects a position in the document image; determining a selected word corresponding to the region of the document image that contains the position; and displaying the selected word.

3. The method for correcting optical character recognition using a heat map according to claim 2, wherein the step of displaying the selected word comprises displaying the selected word in a pop-up menu.

4. The method for correcting optical character recognition using a heat map according to claim 1, wherein the step of displaying the regions of the document image comprises the steps of: comparing the recognition accuracy parameter with a plurality of recognition accuracy thresholds; setting a color of each region based on the recognition accuracy threshold exceeded by the recognition accuracy parameter for that region; and displaying the regions of the document image in their respective set colors.

5. An apparatus for correcting optical character recognition using a heat map, the apparatus being an apparatus for finding errors in an OCR output and comprising: an OCR device that recognizes characters in a document image to generate document text; means for determining regions of the document image corresponding to words of the document text; means for determining a recognition accuracy parameter for each region of the document image; and a display device that displays the regions of the document image so as to represent their respective recognition accuracy parameters.

6. The apparatus for correcting optical character recognition using a heat map according to claim 5, further comprising: cursor control means for receiving an input that selects a position in the document image; and means for determining a selected word corresponding to the region of the document image that contains the position, wherein the display device displays the selected word.

7. The apparatus for correcting optical character recognition using a heat map according to claim 6, wherein the display device displays the selected word in a pop-up menu.

8. The apparatus for correcting optical character recognition using a heat map according to claim 5, further comprising: a comparator that compares the recognition accuracy parameter with a plurality of recognition accuracy thresholds; and means for setting a color of each region based on the recognition accuracy threshold exceeded by the recognition accuracy parameter for that region, wherein the display device displays the regions of the document image in their respective set colors.

9. A computer-readable recording medium on which a series of instructions for finding errors in an OCR output is recorded, the series of instructions comprising instructions for performing the steps of: recognizing characters in a document image to generate document text; determining regions of the document image corresponding to words of the document text; determining a recognition accuracy parameter for each region of the document image; and displaying the regions of the document image so as to represent their respective recognition accuracy parameters.

10. The computer-readable recording medium according to claim 9, wherein the series of instructions further comprises instructions for performing the steps of: receiving an input that selects a position in the document image; determining a selected word corresponding to the region of the document image that contains the position; and displaying the selected word.

11. The computer-readable recording medium according to claim 10, wherein the step of displaying the selected word comprises displaying the selected word in a pop-up menu.

12. The computer-readable recording medium according to claim 9, wherein the step of displaying the regions of the document image comprises the steps of: comparing the recognition accuracy parameter with a plurality of recognition accuracy thresholds; setting a color of each region based on the recognition accuracy threshold exceeded by the recognition accuracy parameter for that region; and displaying the regions of the document image in their respective set colors.