JPH0520492A

JPH0520492A - Document recognizing/correcting device

Info

Publication number: JPH0520492A
Application number: JP3200037A
Authority: JP
Inventors: Noboru Shimizu; 昇清水
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1991-07-15
Filing date: 1991-07-15
Publication date: 1993-01-29

Abstract

PURPOSE: To obtain a document recognizing/correcting device that enables efficient correction by automatically applying to identical characters the same correction an operator has made, without wrongly auto-correcting correctly recognized characters, so that only misrecognized characters are corrected automatically and accurately. CONSTITUTION: After character images are recognized by a character recognizing means 100, the operator corrects wrong characters through a 1st correction means 200. A retrieving means 300 then searches the characters recognized by the means 100 for a character string of a prescribed number of characters that includes the character corrected by the means 200. A 2nd correction means 400 replaces the character in the retrieved string that is identical to the misrecognized character corrected by the means 200 with the character to which the means 200 corrected it.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a document recognition/correction device that performs automatic correction in a document recognition device for recognizing paper documents.

[0002]

[Prior Art] Research has been conducted on document recognition devices that recognize characters and figures printed on paper documents and input them into a document editing device such as a word processor. However, achieving a 100% recognition rate in character recognition is very difficult, and at present the operator must check the recognition results and correct any misrecognized characters (Image Processing Handbook, Shokodo, 20.3 Character Recognition Devices (OCR), pp. 482-490).

[0003]

[Problems to Be Solved by the Invention] This work must be performed on every recognition result, which places a heavy burden on the operator. In character recognition, moreover, the document images are input from the same image input device under the same conditions, so the same misrecognition tends to recur for the same character. It is easy to see that the operator's burden could be reduced by exploiting this: for characters identical to one the operator has corrected, the same correction is applied automatically. However, if this automatic correction is driven by comparing only a single character of the recognition result, correctly recognized characters are also altered, which is harmful.

[0004] The present invention has been made in view of the above. Its object is to provide a document recognition/correction device and method that enable efficient correction of recognition results: to reduce the operator's burden, the same correction the operator made is applied automatically to identical characters, while no erroneous automatic correction is applied to correctly recognized characters, so that only misrecognized characters are corrected automatically and accurately.

[0005]

[Means for Solving the Problems] To solve the above problems, the document recognition/correction device of the present invention comprises, as shown in FIG. 1: character recognition means 100 for recognizing character images; first correction means 200 for correcting the recognition result produced by the character recognition means 100 in accordance with an operator's instruction; search means 300 for searching, among the characters recognized by the character recognition means 100, for a character string of a predetermined number of characters that includes the character corrected by the first correction means 200; and second correction means 400 for correcting, in the character string found by the search means 300, the character identical to the misrecognized character corrected by the first correction means 200 into the corrected character produced by the first correction means 200.

[0006] Further, in the document recognition/correction device of the present invention, when the second correction means 400 corrects a character located before the position of the character corrected by the first correction means 200, the correction is made only after confirmation is obtained from the operator.

[0007]

[Operation] When character images are recognized by the character recognition means 100, not every character is necessarily recognized correctly, so the operator first corrects the wrong characters using the first correction means 200. Next, the search means 300 searches the characters recognized by the character recognition means 100 for a character string of a predetermined number of characters that includes the character corrected by the first correction means 200, and the second correction means 400 corrects the character in the found string that is identical to the misrecognized character corrected by the first correction means 200 into the corrected character. Because the search for identical misrecognized characters uses a character string that includes neighboring characters, a character that merely matches as a single character but was recognized correctly and needs no correction is left untouched, and only misrecognized characters are corrected accurately.

[0008] Further, when the second correction means 400 corrects a character located before the position of the character corrected by the first correction means 200, the correction is made after obtaining the operator's confirmation, which prevents erroneous changes to the already-corrected portion.

[0009]

[Embodiment] FIG. 2 shows an outline of the entire document recognition device, which consists of an image input unit 1, an image memory 2, a character image extraction unit 3, a character recognition unit (OCR) 4, a recognition result storage memory 5, a correction unit 6, a storage unit 7, a document file storage device 8, and a control/operation unit 9. A paper document is digitized by the image input unit 1, and the original image is stored in the image memory 2. The input document image is displayed on a display device 91 such as a CRT via the character image extraction unit 3 and the control/operation unit 9. While viewing this original image, the operator extracts only the character image regions using a pointing device 93 such as a mouse. FIG. 3(a) shows an actual document 31 on which only the character image regions have been designated; the regions enclosed by dotted rectangles are those designated by the operator. The character image region information designated in this way is stored in a table 32 as shown in FIG. 3(b). The first and second columns of the table hold the upper-left coordinates of each rectangular character image region, and the third and fourth columns hold its width and height. Besides extraction by the operator as described above, the character image regions can also be extracted by a method that automatically separates characters from figures by extracting features of black pixel clusters, as disclosed in Japanese Patent Laid-Open No. 2-159690. The character recognition unit (OCR) 4 recognizes the designated character image regions using the image memory 2 and the table 32 of character image regions extracted by the character image extraction unit 3. The recognition result for each character image region is output to the recognition result storage memory 5 as a recognition result storage table 51 in the tabular form shown in FIG. 4.
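The four-column layout of table 32 can be illustrated with a minimal Python sketch. The names `CharRegion` and `crop`, the binary list-of-rows image representation, and the sample page are assumptions made for illustration; the source specifies only the column meanings (upper-left coordinates, width, height).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CharRegion:
    """One row of a region table like table 32 (hypothetical field names)."""
    x: int       # column 1: left edge of the rectangle
    y: int       # column 2: top edge of the rectangle
    width: int   # column 3: rectangle width
    height: int  # column 4: rectangle height


def crop(image: List[List[int]], region: CharRegion) -> List[List[int]]:
    """Return the pixels of `image` (rows of 0/1 values) inside `region`."""
    rows = image[region.y:region.y + region.height]
    return [row[region.x:region.x + region.width] for row in rows]


# Hypothetical 4x6 binary page image and one operator-designated region.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
region_table = [CharRegion(x=1, y=1, width=2, height=2)]

# Each cropped patch is what would be handed to the recognizer.
patches = [crop(page, region) for region in region_table]
```

Storing the regions as plain rectangles keeps extraction independent of recognition: the recognizer only needs the image memory and the table, which matches the division of roles between units 3 and 4.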

[0010] Next, the operation of the correction unit 6 is described with reference to the flowchart of FIG. 6. First, the operator corrects the recognition results in the recognition result storage memory 5 (step 612) until the correction process is finished (until step 611 of FIG. 6 yields Y). This is because, with current character recognition technology, the recognition rate of the character recognition unit 4 does not reach a full 100%, so confirmation and correction by the operator are unavoidable. To make a correction, the recognition results in the recognition result storage memory 5 are displayed on the display device 91 such as a CRT via the correction unit 6 and the control/operation unit 9, and the operator makes corrections with the pointing device 93 such as a mouse and the keyboard 92 while viewing the results. Each recognition result is displayed at the position of its original character image, reproducing the original image as closely as possible so that misrecognized characters are easy to spot. In the operator correction unit 61, when the operator finds a misrecognized character in this display, the operator uses the pointing device 93 to move the cursor 94 onto the misrecognized character and select it, as shown in FIG. 5. The operator then enters the correct character from the keyboard 92, using kana-kanji conversion or the like. The entered character replaces the misrecognized character, completing the correction. At this time, the "position of the corrected character", the "corrected character" (the misrecognized character that was replaced), and the "replacement character" are recorded (step 612).
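The three items recorded in step 612 could be held in a small record, sketched below in Python. The names `CorrectionRecord` and `apply_operator_correction`, the use of a flat list of characters for the recognition result, and the sample sentence are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CorrectionRecord:
    """What step 612 records about one operator correction."""
    position: int    # "position of the corrected character"
    wrong_char: str  # "corrected character" (the misrecognized one)
    right_char: str  # "replacement character" entered by the operator


def apply_operator_correction(chars: List[str], position: int,
                              right_char: str) -> CorrectionRecord:
    """Replace chars[position] in place and return the record of the change."""
    record = CorrectionRecord(position, chars[position], right_char)
    chars[position] = right_char
    return record


# Hypothetical recognition result in which "問" was misread as "間".
recognized = list("この間題を解く")
record = apply_operator_correction(recognized, 2, "問")
```

Keeping the misrecognized character in the record matters because the text itself no longer contains it at that position once the operator's replacement has been applied.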

[0011] Next, after the operator's correction, the automatic correction unit 62 searches the recognized character string of the uncorrected portion (the string from the "position of the corrected character" to the last character in the recognition result storage memory 5) for the two-character string consisting of the "corrected character" and the one character following it (step 621). If an identical string is found (Y in step 621), the same operation as the operator's correction (replacing the "corrected character" with the "replacement character") is applied to the character identical to the "corrected character" (step 622). If the string is not identical (N in step 621) and the uncorrected string has not yet been searched in full (N in step 623), the same search (step 621) and correction (step 622) are repeated for the next part of the uncorrected string. Once the whole uncorrected string has been searched (Y in step 623), processing moves to step 624. At this point, the same correction the operator made has been applied throughout the uncorrected string.
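Steps 621 to 623 can be sketched as a single forward scan. This is a minimal Python illustration under stated assumptions: the function name `auto_correct_after`, the flat list-of-characters representation, and the sample text are hypothetical, and the search key is the misrecognized character plus the character that follows the operator's correction.

```python
from typing import List


def auto_correct_after(chars: List[str], position: int,
                       wrong_char: str, right_char: str) -> List[int]:
    """Apply the operator's correction to the uncorrected part of `chars`.

    `position` is where the operator already replaced `wrong_char` with
    `right_char`. Returns the indices that were corrected automatically.
    """
    if position + 1 >= len(chars):
        return []  # no following character, so no two-character key exists
    key = (wrong_char, chars[position + 1])
    fixed = []
    # Steps 621-623: visit every two-character window after the correction.
    for i in range(position + 1, len(chars) - 1):
        if (chars[i], chars[i + 1]) == key:
            chars[i] = right_char  # step 622: same replacement as the operator
            fixed.append(i)
    return fixed


# Hypothetical text: index 2 was already corrected by the operator to "問".
text = list("この問題は難問だ。次の間題に進む。その間に休む。")
fixed_positions = auto_correct_after(text, 2, "間", "問")
```

In this sample the "間" followed by "題" is replaced, while the "間" followed by "に" is left alone because its two-character window does not match the key.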

[0012] The same processing is then applied to the recognized character string of the already-corrected portion (the string from the first character to the "position of the corrected character" in the recognition result storage memory 5). That is, the already-corrected portion is searched for the two-character string consisting of the "corrected character" and the one character following it (step 624). If an identical string is found (Y in step 624), a prompt asks the operator to confirm whether the correction may be made (step 625); if the operator approves (Y in step 626), the same operation as the operator's correction (replacing the "corrected character" with the "replacement character") is applied to the character identical to the "corrected character" (step 627). If the operator does not approve (N in step 626), no correction is made and processing returns to step 624. If the string is not identical (N in step 624) and the already-corrected string has not yet been searched in full (N in step 628), the same search (step 624) and correction (steps 625, 626, 627) are repeated for the next part of the already-corrected string. Once the whole already-corrected string has been searched (Y in step 628), processing returns to step 611 of the operator correction unit 61.

[0013] Because the already-corrected string is a portion the operator has already gone over, confirmation is requested (steps 625 and 626) before any automatic correction is made there.
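The confirmation loop of steps 624 to 628 can be sketched with a callback standing in for the operator prompt. The function name `correct_before_with_confirmation`, the `confirm` callback signature, and the sample text are assumptions for illustration; the source states only that the operator is asked before each change in the already-corrected portion.

```python
from typing import Callable, List


def correct_before_with_confirmation(chars: List[str], position: int,
                                     wrong_char: str, right_char: str,
                                     confirm: Callable[[int], bool]) -> List[int]:
    """Scan the part of `chars` before `position`, asking before each change.

    `confirm(i)` stands in for the prompt of steps 625/626 and returns True
    when the operator approves correcting index i.
    """
    if position + 1 >= len(chars):
        return []  # no following character, so no two-character key exists
    key = (wrong_char, chars[position + 1])
    fixed = []
    for i in range(position):  # step 624 over the already-corrected part
        if (chars[i], chars[i + 1]) == key and confirm(i):
            chars[i] = right_char  # step 627
            fixed.append(i)
    return fixed


# Hypothetical text: the operator corrected index 11 and had skipped two
# earlier occurrences. The stand-in operator declines the one at index 4.
asked = []

def operator(i: int) -> bool:
    asked.append(i)
    return i != 4

text = list("間題一と間題二、そして問題三")
fixed_positions = correct_before_with_confirmation(text, 11, "間", "問", operator)
```

Passing the prompt as a callback keeps the scan itself free of user-interface code, so the same loop could run with a dialog, a console prompt, or an always-approve policy.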

[0014] As an example of the processing, FIG. 7 shows how the automatic correction unit 62 propagates the operator's correction shown in FIG. 5. In FIG. 7(a), when the misrecognized character "間" on the second line is corrected (assume the operator did not notice the misrecognized "間" on the first line), the result is as shown in FIG. 7(b). The automatic correction unit 62 then runs. For the uncorrected portion (from just after the character "間" corrected by the operator to the last character), it searches for the string "間題", which consists of the target character and the one character following it, and applies the operator's correction wherever the identical string occurs. Here the third line contains the identical string "間題", so the correction ("間" → "問") is applied to it. The fourth line also contains the character "間", but since it does not form the string "間題", no correction is made. Next, for the already-corrected portion (from the first character to the character just before the "間" corrected by the operator), the string "間題" is searched for in the same way as for the uncorrected portion. Here the first line contains the identical string "間題", so the operator is asked whether to correct it, and if the operator approves, the correction ("間" → "問") is applied. In this way only the misrecognized characters (the "間" on lines 1 and 3) are corrected, and the identical character that must not be changed (the "間" on line 4) is left alone. This correction process exploits the fact that the same words tend to recur within a document or page.
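The difference between single-character matching and the two-character matching of this example can be seen in a short Python comparison. The four sample lines are hypothetical stand-ins for FIG. 7 (which is not reproduced here), the function names are illustrative, and the operator confirmation for line 1 is omitted for brevity.

```python
from typing import List


def naive_fix(lines: List[str], wrong: str, right: str) -> List[str]:
    """Replace every occurrence of the single character `wrong`."""
    return [line.replace(wrong, right) for line in lines]


def context_fix(lines: List[str], wrong: str, right: str,
                next_char: str) -> List[str]:
    """Replace `wrong` only where it is followed by `next_char`."""
    return [line.replace(wrong + next_char, right + next_char)
            for line in lines]


# Hypothetical four-line page; line 2 was already corrected by the operator.
lines = ["間題の整理", "問題の分析", "間題の解決", "時間の管理"]

naive = naive_fix(lines, "間", "問")
contextual = context_fix(lines, "間", "問", "題")
```

The single-character version damages the correctly recognized "時間" on line 4, whereas the two-character version changes only lines 1 and 3.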

[0015] Through the above correction processing, the document that was the original goal is produced. The storage unit 7 converts the corrected document into a document format that an existing document editing device such as a word processor can handle, and stores it as a document file in the document file storage device 8.

[0016] Besides the processing in the automatic correction unit 62 described in the above embodiment, the following modifications are possible.
(1) The automatic correction unit 62 of the embodiment searches for an identical string using two characters, the target character plus the one character after it. Instead, the search may use the one character before the target, one character on each side, or a string of two or more characters that includes several preceding and following characters.
(2) The automatic correction unit 62 of the embodiment searches only once; it may instead search several times with several strings. For example, after searching with the two characters formed by the target and the following character, a second search may use the two characters formed by the target and the preceding character.
(3) In the embodiment the search string simply includes the characters adjacent to the target; the characters to include may instead be chosen adaptively according to character type (kana, kanji, alphanumeric, and so on). For example, if the target is a kanji, the preceding character is kana, and the following character is a kanji, the search string includes the following character. This makes the search more effective than in the embodiment (the hit rate of the search increases).
(4) The automatic correction unit 62 of the embodiment asks the operator for confirmation when correcting the already-corrected string; this step may be omitted so that the correction is made automatically. Alternatively, confirmation may also be requested when correcting the uncorrected string.
(5) The embodiment covers only the case where the operator corrects a single character, but a correction of several consecutive characters can be handled with the same configuration.
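Modification (3) could be realized along the following lines. This Python sketch is one possible reading, not the method of the source: the rule "prefer the neighbor whose character type matches the target" generalizes the single kanji example given in the text, the Unicode ranges cover only the basic kana and kanji blocks, and the names `char_type` and `search_key` are hypothetical.

```python
from typing import List, Tuple


def char_type(ch: str) -> str:
    """Classify a character coarsely by Unicode block (assumed ranges)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x30FF:
        return "kana"   # hiragana and katakana blocks
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"  # CJK Unified Ideographs, basic block only
    if ch.isascii() and ch.isalnum():
        return "alnum"
    return "other"


def search_key(chars: List[str], position: int) -> Tuple[str, int]:
    """Pick a two-character search string around chars[position].

    chars[position] is the misrecognized character before correction.
    Returns (key, offset), where offset is the target's index inside key.
    """
    target = chars[position]
    prev_ch = chars[position - 1] if position > 0 else None
    next_ch = chars[position + 1] if position + 1 < len(chars) else None
    kind = char_type(target)
    # Prefer the neighbor of the same type, trying the following one first.
    if next_ch is not None and char_type(next_ch) == kind:
        return target + next_ch, 0
    if prev_ch is not None and char_type(prev_ch) == kind:
        return prev_ch + target, 1
    # Otherwise fall back to the embodiment's default (following character).
    if next_ch is not None:
        return target + next_ch, 0
    if prev_ch is not None:
        return prev_ch + target, 1
    return target, 0
```

With the text's own case (kana before, kanji after), `search_key(list("この間題"), 2)` selects the following character; for a kanji preceded by a kanji and followed by kana, as in `list("質間に")` at position 1, it selects the preceding character instead.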

[0017]

[Effects of the Invention] As described above, according to the present invention, identical misrecognized characters among the recognized characters are corrected in accordance with the correction made by the operator, which reduces the operator's burden during correction. Because the search for identical misrecognized characters uses a string that includes neighboring characters, a character that merely matches as a single character but was recognized correctly and needs no correction is left untouched, and only misrecognized characters are corrected accurately. Separating the uncorrected portion from the already-corrected portion makes it possible to treat the two differently: where a string in the uncorrected portion matches the corrected string, the correction is made automatically, whereas the already-corrected portion has already been inspected by the operator, so a correction there is made only after confirmation. Dividing the processing in this way prevents erroneous changes to the already-corrected portion.

[Brief description of drawings]

FIG. 1 is a configuration diagram showing an outline of the present invention.

FIG. 2 is a block diagram showing an outline of the entire document recognition device.

FIG. 3 shows an example of character image region extraction.

FIG. 4 shows a table of recognition results from the character recognition unit (OCR).

FIG. 5 is an example of correction in the operator correction unit.

FIG. 6 is a flowchart showing the algorithm of the correction part.

FIG. 7 is an example of correction in the correction unit.

[Explanation of symbols]

1 ... Image input unit, 2 ... Image memory, 3 ... Character image extraction unit, 4 ... Character recognition unit (OCR), 5 ... Recognition result storage memory, 6 ... Correction unit, 7 ... Storage unit, 8 ... Document file storage device

Claims

1. A document recognition/correction device comprising: character recognition means for recognizing character images; first correction means for correcting a recognition result produced by the character recognition means in accordance with an operator's instruction; search means for searching, among the characters recognized by the character recognition means, for a character string of a predetermined number of characters that includes the character corrected by the first correction means; and second correction means for correcting, in the character string found by the search means, the character identical to the misrecognized character corrected by the first correction means into the corrected character produced by the first correction means.

2. The document recognition/correction device according to claim 1, further comprising confirmation means by which, when the second correction means corrects a character located before the position of the character corrected by the first correction means, the correction is made after confirmation is obtained from the operator.