WO2001013325A1 - Document input method, recording medium recording document input program and document input device - Google Patents

Info

Publication number
WO2001013325A1
Authority
WO
WIPO (PCT)
Prior art keywords
character
correction
document
recognition
output
Prior art date
Application number
PCT/JP2000/002484
Other languages
French (fr)
Japanese (ja)
Inventor
Masaki Nakagawa
Original Assignee
Japan Science And Technology Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Japan Science And Technology Corporation
Publication of WO2001013325A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/98: Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
    • G06V10/987: Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns with the intervention of an operator

Definitions

  • the present invention relates to a document input method, a recording medium recording a document input program, and a document input device.
  • the present invention particularly relates to a document input method usable in the field of information devices used in offices and the like, and capable of correcting recognized characters, a recording medium storing a document input program, and a document input device.
  • BACKGROUND ART Conventionally, there are document input devices that input a document using an optical character reader (OCR) or the like and correct the read result without using a keyboard or display (see, for example, Japanese Unexamined Patent Application Publication No. Sho 63-220383).
  • FIG. 13 shows a configuration diagram of a conventional document input device.
  • This document input device includes photoelectric conversion means 101, character recognition means 102, document file storage means 103, printing means 104, correction symbol / character recognition means 106, associating means 107, and correction means 108.
  • an input document 110 to be read is converted into electronic data by a photoelectric conversion unit 101, read by a character recognition unit 102, and the result is stored in a document file storage unit 103.
  • the document file stored in the document file storage means 103 is printed by the printing means 104, and an output document 105 (recognition output 111) is output.
  • To correct errors in the recognition output 111, symbols and characters are written on it to create a correction input 112.
  • The correction input 112 is converted into electronic data again by the photoelectric conversion means 101, and the written symbols and characters are read by the correction symbol / character recognition means 106. The associating means 107 then matches the positions where they were written to positions in the document file, and the correction means 108 performs the processing indicated by the written correction symbols, yielding a correction result 113. In this way, even people who cannot use a keyboard and display can create and correct electronic documents.
  • An object of the present invention is to store intermediate processing results, such as character images and the candidate characters of the reading result, and to use them together with the recognition means and correction means so that misrecognized characters can be corrected without necessarily writing in the correct characters. A further object is to remain applicable to corrections in which characters are written in, as in the conventional approach.
  • a recording medium storing a document input program.
  • FIG. 1 is a configuration diagram of a document input device according to the present invention.
  • FIG. 2 is a flowchart of a first embodiment of a document input method according to the present invention.
  • FIG. 3 is an explanatory diagram of the document input process.
  • FIG. 4 is an explanatory diagram of word processing.
  • FIG. 5 is an explanatory diagram of the combining process.
  • FIG. 6 is an explanatory diagram of the separation process.
  • FIG. 7 is a flowchart of a second embodiment of the document input method according to the present invention.
  • FIG. 8 is a flowchart of a third embodiment of the document input method according to the present invention.
  • FIG. 9 is a flowchart of the extraction and identification of correction symbols when the printout uses dot characters or gray color.
  • FIG. 10 is an explanatory diagram of the document processing of FIG. 9.
  • FIG. 11 is an explanatory diagram of printout character position detection.
  • FIG. 12 is an explanatory diagram of correction symbol recognition.
  • FIG. 13 is a configuration diagram of a conventional document input device.
  • FIG. 14 is an explanatory diagram of the conventional document input process.
  • BEST MODE FOR CARRYING OUT THE INVENTION FIG. 1 shows a configuration diagram of a document input device according to the present invention.
  • This document input device includes photoelectric conversion means 1, character recognition means 2, document file storage means 3, output means 4, input means 5, correction recognition means 6, associating means 7, correction means 8, and intermediate processing file storage means 9.
  • the photoelectric conversion means 1 is composed of, for example, an OCR, reads a document to be read, and converts it into electronic data.
  • the character recognition means 2 recognizes a character or a document from the electronic data read by the photoelectric conversion means 1.
  • the document file storage means 3 stores the document file created by the character recognition means 2.
  • the output means 4 displays or outputs the document file stored in the document file storage means 3.
  • the output unit 4 is a device for printing out the output document by the printing unit, a device for displaying the output document on a display, or the like.
  • As the output means 4, a device for interfacing with an external device, a transmission device, a drive device for various recording media, or the like can be used.
  • the input means 5 is a suitable input device such as a pointing device such as a mouse and a touch pen, a keyboard and the like.
  • the operator refers to the recognition result output by the output unit 4 and instructs the content of the correction.
  • the correction content may be indicated, for example when the output means 4 is a printer, by writing a correction symbol on the printout. In this case, the written correction symbol is cut out by the correction recognition means 6.
  • the instruction of the content of the correction may be made by pointing the correction content and the correction position by the pointing device or the like of the input means 5 while watching the screen.
  • the correction recognizing means 6 recognizes correction contents (correction symbols, characters, etc.) specified for the document or display screen output by the output means 4.
  • the correction recognizing means 6 can recognize not only the correction symbols but also the above-described conventional correction characters and correction portions.
  • the correction recognition means 6 can recognize, for example, correction symbols and correction characters that request word processing (recognizing a plurality of characters as a word), combining processing (recognizing the character patterns of a plurality of characters as one character pattern), or separation processing (recognizing the character pattern of one character as a plurality of character patterns).
  • the associating means 7 determines the correspondence between the correction content recognized by the correction recognizing means 6 and the character, or character position, to be corrected. That is, it determines which output character each written correction symbol or character applies to.
  • the correction means 8, based on the determination made by the associating means 7 at the time of re-recognition, refers to the intermediate processing file storage means 9 and corrects the document file stored in the document file storage means 3 according to the correction content.
  • For example, the correction means 8 replaces the character indicated by the position information of the document file, as determined by the associating means 7, with another candidate character chosen from the candidates stored in the intermediate processing file storage means 9 at the time of the earlier recognition.
  • the correction means 8 can also execute the above-described conventional correction processing.
  • For example, in word processing the correction means 8 recognizes the designated plurality of characters as one word; in combining processing it recognizes the character patterns of a plurality of characters as one character pattern; and in separation processing it separates the character pattern of one character into a plurality of character patterns and recognizes each. It executes the corresponding correction processing in each case.
  • the present invention can be appropriately applied to other instructions for correction.
  • the intermediate processing file storage means 9 stores processing data in the middle, that is, a character pattern and its position, a line position, a recognition result including a recognition candidate character, and the like.
  • the character recognition means 2 stores the information in the intermediate processing file storage means 9 during the character recognition processing.
  • the correction means 8 extracts the position information and the candidate character for the corresponding character or line from the intermediate processing file storage means 9.
  • FIG. 2 shows a flowchart of the first embodiment of the document input method according to the present invention.
  • FIG. 3 is an explanatory diagram related to the document input process. Here, a case where the output result is printed out on paper or the like will be described as an example.
  • the input document 10 is image-input from the photoelectric conversion means 1 (S101), and the data is stored in the intermediate processing file storage means 9.
  • the character recognizing means 2 cuts out characters from the input image (S103), and stores character or line position information, recognition candidates and the like in the intermediate processing file storage means 9.
  • the character recognition means 2 performs character recognition (S105), and stores the recognition result in the document file storage means 3.
  • the output means 4 outputs the recognition result (S107).
  • Assume here that the recognition output 11 has been output.
  • In the case of paper output, the operator marks correction instructions on characters or text that were not recognized adequately (S111).
  • the correction content is indicated here, for example, by the correction symbols of the first symbol 12a, the second symbol 12b, and the third symbol 12c in the correction input 12 on which symbols and characters have been written.
  • the first symbol 12a, the second symbol 12b, and the third symbol 12c are symbols that request word processing, combining of character patterns, and separation of character patterns, respectively.
  • the correction recognizing means 6 cuts out the written correction symbol (S113). The correction symbol can be cut out, for example, by detecting the line spacing and character spacing of the printed type and finding the symbols written in those gaps. When the correction symbol is written in a color such as blue or red, it can be detected by adding to the photoelectric conversion means 1 a light-receiving element or filter that detects only that color. Conversely, when the characters printed by the output means 4 are in color, the correction symbol can be detected in the same way.
  • In general, character recognition processes each character pattern individually and outputs the top recognition candidate as the result. When two characters, for example, are instead judged together as a word, misrecognition decreases because only some combinations of first and second characters form valid words. Suppose that recognizing the first character yields four candidates: first candidate "持", second candidate "特", third candidate "侍", and fourth candidate "稼". Suppose also that recognizing the second character yields three candidates: first candidate "微", second candidate "徴", and third candidate "働". Per-character recognition selected the first candidate of each, "持" and "微".
  • In word processing, however, the operator has instructed that these two characters be recognized as a word, so the combinations of the candidates for each character are examined.
  • For example, the combination of the second candidate "特" and the second candidate "徴" has a priority sum of 4 (2 + 2), and the combination of the fourth candidate "稼" and the third candidate "働" has a priority sum of 7 (4 + 3). As the word processing result, therefore, the first candidate "特徴" (feature) and the second candidate "稼働" (operation) are determined as recognition candidates in descending order of priority (that is, in ascending order of the priority sum).
  • Here, the first candidate "特徴" is recognized as the word processing result. If the correction symbol is recognized as requesting the combining of character patterns, the corresponding character patterns are merged and combining processing that recognizes them as one character is executed (S119).
  • FIG. 5 is an explanatory diagram of the combining process.
  • In combining processing, the character patterns (stored in the intermediate processing file storage means 9) corresponding to the two or more characters specified by the correction symbol are recognized again as one character pattern.
  • Here, an instruction was given to combine "T" and "由" in the recognition result, and re-recognition yielded "抽". If the correction symbol is recognized as requesting the separation of a character pattern, separation processing is executed, which redoes recognition from character extraction for the corresponding character pattern (S117).
  • FIG. 6 is an explanatory diagram of the separation process.
  • FIG. 7 shows a flowchart of a document input method according to a second embodiment of the present invention.
  • FIG. 8 shows a flowchart of the third embodiment of the document input method according to the present invention.
  • steps S101 to S107 and S115 to S127 are the same as those in FIG. 2. Further, the processing in step S115 is the same as that in FIG. 7.
  • Here, step S109 is added to determine whether the recognition output 11 from the output means 4 is paper output or display output.
  • In the case of paper output, processing similar to that of the first embodiment is executed; in the case of display output, processing similar to that of the second embodiment is executed.
  • Fig. 9 shows a flowchart for cutting out and identifying modified symbols in the case of dot characters or gray color.
  • the output means 4 outputs a printout of the document file stored in the document file storage means 3 in a dot character or gray color image (S201).
  • the operator enters a symbol for correction and characters as necessary on this printout, and creates a correction input document 14 (S203).
  • the corrected input document 14 is image-input by the photoelectric conversion means 1, converted into electronic data, and stored in the electronic data storage means 6 (S205).
  • the correction recognizing means 6 detects the position of the character in which the correction symbol is entered in the document output by the output means 4 (S207). Further, the correction recognizing means 6 detects lines and character positions from the digitized document image, noting that the printed characters are printed in dots. Details will be described later.
  • the correction recognizing means 6 recognizes the correction symbols and characters entered in the correction input document 14 (S211).
  • the correct character to be corrected can be recognized by, for example, the character recognition means 2 or the like.
  • the associating means 7 associates the character position detected by the correction recognizing means 6 with the correction symbol recognized by the correction recognizing means 6 and identifies the character to be corrected (S213). That is, it determines which output character each written correction symbol or character corresponds to. In this way, a symbol detection / recognition result 15 is obtained.
  • the correction means 8 corrects the corresponding character by the correction processing corresponding to the correction symbol in accordance with the correspondence of the correspondence means 7 (S215).
  • the correction means 8 corrects the document file stored in the document file storage means 3 according to the correction symbol recognized by the correction recognition means 6.
  • Here, the hatched mark in FIG. 10 denotes, for example, a symbol that replaces the corresponding printed character with the character written above it, so the correction means 8, in accordance with the written character, corrects "埋" to "理" in the document file stored in the document file storage means 3.
  • Here, "理" is recognized by the character recognition means 2, the correction recognition means 6, or the like as the correct character to substitute.
  • the correction means 8 can also perform, for example, word recognition processing, combining processing, and division processing to correct the document file.
  • the output means 4 reads the corrected document file from the document file storage means 3 and outputs a correction result 16 (S217).
  • Fig. 11 is an explanatory diagram of printout character position detection.
  • the correction recognizing means 6 detects character positions by accumulating pixel counts in the horizontal and vertical directions of the edge image of the input document. Specifically, for example, a contour image is first created and emphasized; on this contour image, the number of black pixels is counted in the horizontal direction to detect the row positions, and then, for each row, the number of black pixels is counted in the vertical direction to detect the character positions. This example describes how the correction recognition means 6 detects the positions of the characters printed as "本論文では" on the first line and "取り装置の" on the second line. First, a character image 41 on which correction symbols have been written is input.
  • the contour of each pixel is detected from this character image to obtain a contour image 42. For the detected contour pixels 42, pixels are accumulated (summed) in the horizontal direction to obtain a horizontal marginal distribution 43, from which the row positions can be detected. Next, for each row obtained, the detected contour pixels 44 are likewise cut out and accumulated in the vertical direction to obtain a vertical marginal distribution 45. From the vertical marginal distribution 45, the position (horizontal position) of each character in the row can be obtained. In this way the row and column of each character are identified and a character position detection image 46 is obtained. Each character can further be cut out by detecting its outermost pixels.
  • FIG. 12 is an explanatory diagram of correction symbol recognition.
  • In extracting the correction symbols (detecting the written characters), the correction recognizing means 6 finds the connected components of pixels in the correction input document 14 and, based on the pixel count of each component, removes black connected components no larger than a predetermined size, so that only the correction symbols remain; each remaining correction symbol is then cut out.
  • When cutting out the correction symbols, the correction recognition means 6 applies, for example, a shrinking process that treats black pixels adjacent to white pixels as edges and removes those edges from the original image. This example describes the correction symbol written on the character "装" in the printed text "取り装置の".
  • First, a character image 51 on which a correction symbol has been written is input.
  • Next, each pixel is subjected to the shrinking process to obtain a shrunken image 52.
  • This shrinking process is performed an appropriate number of times to erase the dot characters, yielding a correction symbol extracted image 53 from which the correction symbol has been extracted.
  • As one way of erasing the dot characters, for example, the connected components (regions of connected black pixels) are found and the number of black pixels in each is obtained.
  • the correction symbols can then be extracted by keeping the connected components whose black pixel count is equal to or larger than a predetermined threshold.
  • the correction recognizing means 6 then cuts out each correction symbol and recognizes what kind of correction instruction it represents, yielding a correction symbol recognition result 54.
  • Communication such as the Internet may be used for input and output between the processing steps.
  • INDUSTRIAL APPLICABILITY According to the present invention, as described above, intermediate processing results such as character images and the candidate characters of the reading result are stored, and by using them together with the recognition means and correction means, misrecognized characters can be corrected without necessarily writing in the correct characters. The present invention can also be applied to corrections in which characters are written in, as in the related art.
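
The correction symbol extraction described in the excerpts above (shrinking, then keeping only connected components at or above a size threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `erode` and `keep_large_components`, the 4-neighbour connectivity, the treatment of pixels outside the image as white, and the toy image are all hypothetical choices made for the example.

```python
# Binary images are lists of rows; 1 = black pixel, 0 = white pixel.
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # assumed 4-connectivity


def erode(image):
    """One shrinking pass: drop every black pixel that touches a white pixel.

    Pixels outside the image are treated as white (an assumption).
    """
    h, w = len(image), len(image[0])

    def is_black(y, x):
        return 0 <= y < h and 0 <= x < w and image[y][x] == 1

    return [
        [
            1 if image[y][x] == 1
            and all(is_black(y + dy, x + dx) for dy, dx in NEIGHBOURS)
            else 0
            for x in range(w)
        ]
        for y in range(h)
    ]


def connected_components(image):
    """Return a list of components, each a list of (row, col) black pixels."""
    h, w = len(image), len(image[0])
    seen, components = set(), []
    for y in range(h):
        for x in range(w):
            if image[y][x] != 1 or (y, x) in seen:
                continue
            stack, component = [(y, x)], []
            seen.add((y, x))
            while stack:
                cy, cx = stack.pop()
                component.append((cy, cx))
                for dy, dx in NEIGHBOURS:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and image[ny][nx] == 1 and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            components.append(component)
    return components


def keep_large_components(image, min_pixels):
    """Keep only components with at least min_pixels black pixels."""
    out = [[0] * len(image[0]) for _ in image]
    for component in connected_components(image):
        if len(component) >= min_pixels:
            for y, x in component:
                out[y][x] = 1
    return out


# Toy page: isolated dots stand in for dot-printed type,
# the solid 3x5 block stands in for a handwritten stroke.
page = [
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 1, 1, 1, 0],
]
symbols_only = keep_large_components(page, min_pixels=4)
```

In this toy case the size filter alone removes the dots. On a real scan, where printed dots may touch the pen stroke, one or more `erode` passes would plausibly be applied first, as the excerpts suggest.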

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Discrimination (AREA)

Abstract

Erroneously recognized characters are corrected without necessarily writing in replacement characters. An input document (10) is image-input (S101), characters are cut out (S103), characters are recognized (S105), and the recognition result is output (S107). For paper output, the operator marks correction instructions on characters or text that were not recognized adequately (S111), and the written correction symbols are cut out (S113). Word processing (S117), combining (S119), or separating (S117) is executed according to the extracted correction symbols, the document file is corrected accordingly (S123), and the correction result is output (S125). If the result is not satisfactory (S129), processing returns to step S111 and repeats; it ends once the corrections are sufficient.
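
The word processing step (S117) can be illustrated with the two-character example used in the description, where candidate combinations that form a valid word are ranked by the sum of their per-character candidate ranks. The sketch below is an assumption-laden illustration: the function name `rank_word_candidates` and the use of a plain set as the word lexicon are hypothetical, and the patent does not specify how valid words are looked up.

```python
from itertools import product


def rank_word_candidates(per_char_candidates, lexicon):
    """Rank valid words built from per-character recognition candidates.

    per_char_candidates: one list per character position, best candidate first.
    lexicon: set of valid words (a stand-in for whatever dictionary is used).
    Returns (word, rank_sum) pairs, smallest rank sum (highest priority) first.
    """
    ranked_positions = [
        list(enumerate(candidates, start=1))  # (rank, character) pairs
        for candidates in per_char_candidates
    ]
    scored = []
    for combination in product(*ranked_positions):
        word = "".join(char for _, char in combination)
        if word in lexicon:
            scored.append((word, sum(rank for rank, _ in combination)))
    return sorted(scored, key=lambda pair: pair[1])


# Candidates from the description's example.
candidates = [["持", "特", "侍", "稼"], ["微", "徴", "働"]]
lexicon = {"特徴", "稼働"}  # assumed lexicon containing the two valid words
result = rank_word_candidates(candidates, lexicon)
print(result)  # [('特徴', 4), ('稼働', 7)]
```

The per-character result would have been "持微", which is not a word; restricting the search to lexicon entries is what lets the second-ranked characters win.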

Description

Specification: Document input method, recording medium recording document input program, and document input device

Technical field

The present invention relates to a document input method, a recording medium recording a document input program, and a document input device. In particular, it relates to a document input method, a recording medium recording a document input program, and a document input device that are usable in the field of information devices used in offices and the like and that can correct recognized characters.

Background art

Conventionally, there are document input devices that input a document using an optical character reader (OCR) or the like and correct the read result without using a keyboard or display (see, for example, Japanese Unexamined Patent Application Publication No. Sho 63-220383). FIG. 13 shows a configuration diagram of a conventional document input device, and FIG. 14 is an explanatory diagram of the conventional document input process.

This document input device includes photoelectric conversion means 101, character recognition means 102, document file storage means 103, printing means 104, correction symbol / character recognition means 106, associating means 107, and correction means 108. In this device, an input document 110 to be read is first converted into electronic data by the photoelectric conversion means 101 and read by the character recognition means 102, and the result is stored in the document file storage means 103. The document file stored in the document file storage means 103 is printed by the printing means 104, and an output document 105 (recognition output 111) is output. To correct errors in the recognition output 111, symbols and characters are written on it to create a correction input 112. The correction input 112 is converted into electronic data again by the photoelectric conversion means 101, and the written symbols and characters are read by the correction symbol / character recognition means 106. The associating means 107 then matches the positions where they were written to positions in the document file, and the correction means 108 performs the processing indicated by the written correction symbols, yielding a correction result 113. In this way, even people who cannot use a keyboard and display can create and correct electronic documents.

Disclosure of the invention
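However, the conventional approach presupposes that characters such as kanji, handwritten on the output document for correction, are recognized correctly. In other words, it merely re-recognizes, with the same recognition process, what the original recognition process failed to recognize accurately, and accurate recognition of handwritten characters is still quite difficult to achieve at present.

In view of the above, an object of the present invention is to store intermediate processing results, such as character images and the candidate characters of the reading result, and to use them together with the recognition means and correction means so that misrecognized characters can be corrected without necessarily writing in the correct characters. A further object is to remain applicable to corrections in which characters are written in, as in the conventional approach.

According to a first solution of the present invention, there are provided a document input method, and a recording medium recording a document input program, comprising: an output step of displaying or outputting a document file created by character recognition; a correction recognition step of recognizing correction content indicated on the document output in the output step; an association step of determining the correspondence between the correction content recognized in the correction recognition step and the character to be corrected; and a correction step of correcting the document file according to the correction content, based on the result of that determination.

According to a second solution of the present invention, there is provided a document input device comprising: document file storage means for storing a document file created by character recognition; output means for displaying or outputting the document file stored in the document file storage means; correction recognition means for recognizing correction content indicated on the document output by the output means; associating means for determining the correspondence between the correction content recognized by the correction recognition means and the character to be corrected; and correction means for correcting the document file stored in the document file storage means according to the correction content, based on the determination made by the associating means.

Brief description of the drawings

FIG. 1 is a configuration diagram of a document input device according to the present invention. FIG. 2 is a flowchart of a first embodiment of a document input method according to the present invention. FIG. 3 is an explanatory diagram of the document input process. FIG. 4 is an explanatory diagram of word processing. FIG. 5 is an explanatory diagram of the combining process. FIG. 6 is an explanatory diagram of the separation process. FIG. 7 is a flowchart of a second embodiment of the document input method according to the present invention. FIG. 8 is a flowchart of a third embodiment of the document input method according to the present invention. FIG. 9 is a flowchart of the extraction and identification of correction symbols when the printout uses dot characters or gray color. FIG. 10 is an explanatory diagram of the document processing of FIG. 9. FIG. 11 is an explanatory diagram of printout character position detection. FIG. 12 is an explanatory diagram of correction symbol recognition. FIG. 13 is a configuration diagram of a conventional document input device. FIG. 14 is an explanatory diagram of the conventional document input process.

Best mode for carrying out the invention

FIG. 1 shows a configuration diagram of a document input device according to the present invention. This document input device includes photoelectric conversion means 1, character recognition means 2, document file storage means 3, output means 4, input means 5, correction recognition means 6, associating means 7, correction means 8, and intermediate processing file storage means 9.

The photoelectric conversion means 1 is composed of, for example, an OCR; it reads the document to be read and converts it into electronic data. The character recognition means 2 recognizes characters and text from the electronic data read by the photoelectric conversion means 1. The document file storage means 3 stores the document file created by the character recognition means 2. The output means 4 displays or outputs the document file stored in the document file storage means 3. The output means 4 is, for example, a device that prints the output document or a device that shows it on a display. A device for interfacing with an external device, a transmission device, a drive for various recording media, or the like can also be used as the output means 4. The input means 5 is a suitable input device such as a pointing device (a mouse, a touch pen, and so on) or a keyboard.

The operator refers to the recognition result output by the output means 4 and indicates the correction content. When the output means 4 is a printer or the like and the output is a printout, the correction content may be indicated, for example, by writing correction symbols on the printout. In this case, the written correction symbols are cut out by the correction recognition means 6. When the output means 4 is a display or the like, the correction content and correction position may be indicated with the pointing device or the like of the input means 5 while viewing the screen.

The correction recognition means 6 recognizes the correction content (correction symbols, characters, and so on) indicated on the document or display screen output by the output means 4. When the document resulting from recognition has been printed by the output means 4, the correction content is input through the photoelectric conversion means 1. When it has been shown on a display, the correction content and position are input as appropriate through the input means 5. The correction recognition means 6 can recognize not only correction symbols but also correction characters and correction locations as in the conventional approach described above. It can recognize, for example, correction symbols and correction characters that request word processing (recognizing a plurality of characters as a word), combining processing (recognizing the character patterns of a plurality of characters as one character pattern), or separation processing (recognizing the character pattern of one character as a plurality of character patterns).

The associating means 7 determines the correspondence between the correction content recognized by the correction recognition means 6 and the character, or character position, to be corrected. That is, it determines which output character each written correction symbol or character applies to.

The correction means 8, based on the determination made by the associating means 7 at the time of re-recognition, refers to the intermediate processing file storage means 9 and corrects the document file stored in the document file storage means 3 according to the correction content. For example, it replaces the character indicated by the position information of the document file, as determined by the associating means 7, with another candidate character chosen from the candidates stored in the intermediate processing file storage means 9 at the time of the earlier recognition. The correction means 8 can also execute the conventional correction processing described above. For example, in word processing it recognizes the designated plurality of characters as one word; in combining processing it recognizes the character patterns of a plurality of characters as one character pattern; and in separation processing it separates the character pattern of one character into a plurality of character patterns and recognizes each. It executes the corresponding correction processing in each case. The present invention can also be applied, as appropriate, to other kinds of correction instruction.

The intermediate processing file storage means 9 stores the intermediate data produced while the character recognition means 2 performs recognition, namely the character patterns and their positions, the line positions, the recognition results including recognition candidate characters, and so on. The character recognition means 2 stores this information in the intermediate processing file storage means 9 during character recognition. When re-recognizing and correcting characters, the correction means 8 retrieves the position information and candidate characters for the relevant character or line from the intermediate processing file storage means 9.

FIG. 2 shows a flowchart of the first embodiment of the document input method according to the present invention, and FIG. 3 is an explanatory diagram of the document input process. The case in which the output result is printed on paper or the like is described here as an example.

First, the input document 10 is image-input from the photoelectric conversion means 1 (S101), and the data is stored in the intermediate processing file storage means 9. The character recognition means 2 cuts out characters from the input image (S103) and stores the position information of characters or lines, the recognition candidates, and so on in the intermediate processing file storage means 9. The character recognition means 2 performs character recognition (S105) and stores the recognition result in the document file storage means 3. The output means 4 outputs the recognition result (S107). Assume here that the recognition output 11 has been output.

In the case of paper output, the operator marks correction instructions on characters or text that were not recognized adequately (S111). The correction content is indicated here, for example, by the correction symbols of the first symbol 12a, the second symbol 12b, and the third symbol 12c in the correction input 12 on which symbols and characters have been written. In this example, the first symbol 12a, the second symbol 12b, and the third symbol 12c request word processing, combining of character patterns, and separation of character patterns, respectively.

The correction recognition means 6 cuts out the written correction symbols (S113). The correction symbols can be cut out, for example, by detecting the line spacing and character spacing of the printed type and finding the symbols written in those gaps. When the correction symbol is written in a color such as blue or red, it can be detected by adding to the photoelectric conversion means 1 a light-receiving element or filter that detects only that color. Conversely, when the characters printed by the output means 4 are in color, the correction symbol can be detected in the same way. When character position information is printed on the printout from the output means 4, anything written elsewhere can be recognized as a correction symbol. Further, when the printout from the output means 4 is printed in dot characters or gray color, correction symbols can be identified by pixel continuity, stroke thickness, and the like. This point is described later.

If the extracted correction symbol is recognized as requesting word processing, word processing is performed: among the recognition candidate characters for the set of characters covered by the symbol, the combination of candidates most likely to form a word is found (S117).

FIG. 4 is an explanatory diagram of word processing. In general, character recognition processes each character pattern individually and outputs the top recognition candidate as the result. When two characters, for example, are instead judged together as a word, misrecognition decreases because only some combinations of first and second characters form valid words. Suppose that recognizing the first character yields four candidates: first candidate "持", second candidate "特", third candidate "侍", and fourth candidate "稼". Suppose also that recognizing the second character yields three candidates: first candidate "微", second candidate "徴", and third candidate "働". Per-character recognition selected the first candidate of each, "持" and "微". In word processing, however, the operator has instructed that these two characters be recognized as a word, so the combinations of the candidates for each character are examined. For example, the combination of the second candidate "特" and the second candidate "徴" has a priority sum of 4 (2 + 2), and the combination of the fourth candidate "稼" and the third candidate "働" has a priority sum of 7 (4 + 3). As the word processing result, therefore, the first candidate "特徴" (feature) and the second candidate "稼働" (operation) are determined as recognition candidates in descending order of priority (that is, in ascending order of the priority sum). Here, the first candidate "特徴" is recognized as the word processing result.

If the correction symbol is recognized as requesting the combining of character patterns, the corresponding character patterns are merged and combining processing that recognizes them as one character is executed (S119).

FIG. 5 is an explanatory diagram of the combining process. In combining processing, the character patterns (stored in the intermediate processing file storage means 9) corresponding to the two or more characters specified by the correction symbol are recognized again as one character pattern. Here, an instruction was given to combine "T" and "由" in the recognition result, and re-recognition yielded "抽".

If the correction symbol is recognized as requesting the separation of a character pattern, separation processing is executed, which redoes recognition from character extraction for the corresponding character pattern (S117).

FIG. 6 is an explanatory diagram of the separation process. In separation processing, the character pattern (stored in the intermediate processing file storage means 9) corresponding to the character specified by the correction symbol is cut out as two character patterns and recognized again. Here, an instruction was given to separate "和" in the recognition result, and re-recognition yielded "f" and "e".

The correction means 8 corrects the document file storage means 3 according to each process (S123) and outputs the correction result 13 (S125). If a satisfactory result is not obtained (S129), processing returns to step S111 and repeats. Processing ends once the corrections are sufficient.

Next, FIG. 7 shows a flowchart of the second embodiment of the document input method according to the present invention. It shows the operation when the recognition output 11 is output on a display serving as the output means 4. Steps S101 to S107 and S117 to S127 are the same as in FIG. 2. Steps S111 and S113 of FIG. 2 are replaced by step S115 in this figure.

In the case of display output, the correction content and the relevant character position are indicated using the input means 5, such as a mouse (S115). The correction recognition means 6 and the associating means 7 can therefore obtain the correction content and position directly from the mouse or similar instruction. The correction character itself can also be entered from the keyboard or the like of the input means 5.

FIG. 8 shows a flowchart of the third embodiment of the document input method according to the present invention. Steps S101 to S107 and S115 to S127 are the same as in FIG. 2, and the processing in step S115 is the same as in FIG. 7. Here, step S109 is added to determine whether the recognition output 11 from the output means 4 is paper output or display output. In the case of paper output, processing similar to that of the first embodiment is executed; in the case of display output, processing similar to that of the second embodiment is executed.

The extraction and identification of correction symbols when the printout from the output means 4 uses dot characters or gray color is described below. FIG. 9 shows a flowchart of this extraction and identification, and FIG. 10 is an explanatory diagram of the document processing of FIG. 9.

The output means 4 prints the document file stored in the document file storage means 3 as a dot-character or gray-color image (S201). The operator writes correction symbols and, where necessary, characters on this printout to create a correction input document 14 (S203). The correction input document 14 is then image-input by the photoelectric conversion means 1, converted into electronic data, and stored in the electronic data storage means 6 (S205).

The correction recognition means 6 detects the positions of the characters on which correction symbols have been written in the document output by the output means 4 (S207). It detects the line and character positions from the digitized document image by exploiting the fact that the type is printed in dots. Details are given later. The correction recognition means 6 also recognizes the correction symbols and characters written on the correction input document 14 (S211). The correct character to substitute can also be recognized by, for example, the character recognition means 2.

The associating means 7 associates the character position detected by the correction recognition means 6 with the correction symbol recognized by the correction recognition means 6 and identifies the character to be corrected (S213). That is, it determines which output character each written correction symbol or character corresponds to. In this way, a symbol detection / recognition result 15 is obtained.

The correction means 8, following the association made by the associating means 7, corrects the relevant character with the correction processing that corresponds to the correction symbol (S215). It corrects the document file stored in the document file storage means 3 according to the correction symbol recognized by the correction recognition means 6. Here, the hatched mark in FIG. 10 denotes, for example, a symbol that replaces the corresponding printed character with the character written above it, so the correction means 8, in accordance with the written character, corrects "埋" to "理" in the document file storage means 3. Here, "理" is recognized by the character recognition means 2, the correction recognition means 6, or the like as the correct character to substitute. The correction means 8 can also perform, for example, word recognition processing, combining processing, and division processing to correct the document file.

The output means 4 reads the corrected document file from the document file storage means 3 and outputs a correction result 16 (S217). The result need not be printed; it may instead be shown on a display for checking.

FIG. 11 is an explanatory diagram of printout character position detection. The correction recognition means 6 detects character positions by accumulating pixel counts in the horizontal and vertical directions of the edge image of the input document. Specifically, for example, a contour image is first created and emphasized; on this contour image, the number of black pixels is counted in the horizontal direction to detect the row positions, and then, for each row, the number of black pixels is counted in the vertical direction to detect the character positions.

This example describes how the correction recognition means 6 detects the positions of the characters printed as "本論文では" on the first line and "取り装置の" on the second line. First, a character image 41 on which correction symbols have been written is input. The contour of each pixel is detected from this character image to obtain a contour image 42. For the detected contour pixels 42, pixels are accumulated (summed) in the horizontal direction to obtain a horizontal marginal distribution 43, from which the row positions can be detected. Next, for each row obtained, the detected contour pixels 44 are likewise cut out and accumulated in the vertical direction to obtain a vertical marginal distribution 45. From the vertical marginal distribution 45, the position (horizontal position) of each character in the row can be obtained. In this way the row and column of each character are identified and a character position detection image 46 is obtained. Each character can further be cut out by detecting its outermost pixels.

FIG. 12 is an explanatory diagram of correction symbol recognition. In extracting the correction symbols (detecting the written characters), the correction recognition means 6 finds the connected components of pixels in the correction input document 14 and, based on the pixel count of each component, removes black connected components no larger than a predetermined size, so that only the correction symbols remain; each remaining correction symbol is then cut out. When cutting out the correction symbols, the correction recognition means 6 applies, for example, a shrinking process that treats black pixels adjacent to white pixels as edges and removes those edges from the original image.

This example describes the correction symbol written on the character "装" in the printed text "取り装置の". First, a character image 51 on which a correction symbol has been written is input. Next, each pixel is subjected to the shrinking process to obtain a shrunken image 52. This shrinking process is performed an appropriate number of times to erase the dot characters, yielding a correction symbol extracted image 53 from which the correction symbol has been extracted. As one way of erasing the dot characters, for example, the connected components (regions of connected black pixels) are found and the number of black pixels in each is obtained. The correction symbols can then be extracted by keeping the connected components whose black pixel count is equal to or larger than a predetermined threshold. The correction recognition means 6 then cuts out each correction symbol and recognizes what kind of correction instruction it represents, yielding a correction symbol recognition result 54.

Communication such as the Internet may be used for input and output between the processing steps.

Industrial applicability

According to the present invention, as described above, intermediate processing results such as character images and the candidate characters of the reading result are stored, and by using them together with the recognition means and correction means, misrecognized characters can be corrected without necessarily writing in the correct characters. The present invention can also be applied to corrections in which characters are written in, as in the conventional approach.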
Its object is to correct misrecognized characters without requiring the correct characters to be written in. A further object of the present invention is to remain applicable to the conventional kind of correction in which characters are written in.

According to a first solution of the present invention, there are provided a document input method, and a recording medium recording a document input program, comprising: an output step of displaying or outputting a document file created by character recognition; a correction recognition step of recognizing correction content indicated for the document output by the output step; an association step of determining the correspondence between the correction content recognized in the correction recognition step and the character to be corrected; and a correction step of correcting the document file in accordance with the correction content, based on the result of that determination.

According to a second solution of the present invention, there is provided a document input device comprising: document file storage means for storing a document file created by character recognition; output means for displaying or outputting the document file stored in the document file storage means; correction recognition means for recognizing correction content indicated for the document output by the output means; association means for determining the correspondence between the correction content recognized by the correction recognition means and the character to be corrected; and correction means for correcting the document file stored in the document file storage means in accordance with the correction content, based on the determination result of the association means.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a configuration diagram of a document input device according to the present invention.
FIG. 2 is a flowchart of a first embodiment of the document input method according to the present invention.
FIG. 3 is an explanatory diagram of the document input process.
FIG. 4 is an explanatory diagram of word processing.
FIG. 5 is an explanatory diagram of combining processing.
FIG. 6 is an explanatory diagram of separation processing.
FIG. 7 is a flowchart of a second embodiment of the document input method according to the present invention.
FIG. 8 is a flowchart of a third embodiment of the document input method according to the present invention.
FIG. 9 is a flowchart of cutting out and identifying correction symbols in the case of dot characters or gray.
FIG. 10 is an explanatory diagram of the document processing of FIG. 9.
FIG. 11 is an explanatory diagram of detecting character positions on the printout.
FIG. 12 is an explanatory diagram of recognizing correction symbols.
FIG. 13 is a configuration diagram of a conventional document input device.
FIG. 14 is an explanatory diagram of the conventional document input process.

BEST MODE FOR CARRYING OUT THE INVENTION

FIG. 1 shows a configuration diagram of a document input device according to the present invention. The document input device comprises photoelectric conversion means 1, character recognition means 2, document file storage means 3, output means 4, input means 5, correction recognition means 6, association means 7, correction means 8, and intermediate processing file storage means 9.

The photoelectric conversion means 1 is, for example, an OCR device; it reads the document to be read and converts it into electronic data. The character recognition means 2 recognizes characters or text from the electronic data read by the photoelectric conversion means 1. The document file storage means 3 stores the document file created by the character recognition means 2.

The output means 4 displays or outputs the document file stored in the document file storage means 3. The output means 4 is, for example, a device that prints out the output document by printing means, or a device that shows the output document on a display.
A device that interfaces with an external device, a transmission device, a drive for various recording media, or the like can also be used as the output means 4. The input means 5 is a suitable input device, for example a pointing device such as a mouse or a touch pen, or a keyboard.

The operator refers to the recognition result output by the output means 4 and indicates the correction content. For a printout produced by output means 4 such as a printer, the correction content may be indicated by writing correction symbols on the printed result. In this case, the written correction symbols are cut out by the correction recognition means 6. For a screen displayed by output means 4 such as a display, the correction content and the correction position may be indicated with the pointing device or the like of the input means 5 while viewing the screen.

The correction recognition means 6 recognizes the correction content (correction symbols, characters, and so on) indicated on the document or display screen output by the output means 4. When the document containing the recognition result has been printed out by the output means 4, the correction content is input through the photoelectric conversion means 1. When the document containing the recognition result has been shown on a display, the correction content and position are input through the input means 5 as appropriate. The correction recognition means 6 can recognize not only correction symbols but also correction characters and correction locations of the conventional kind described above. For example, it can recognize the correction symbols or correction characters used to request word processing, which recognizes several characters as a word; combining processing, which recognizes the character patterns of several characters as a single character pattern; or separation processing, which recognizes the character pattern of one character as several character patterns.

The association means 7 determines the correspondence between the correction content recognized by the correction recognition means 6 and the character to be corrected or its position. That is, it determines which of the output characters each written correction symbol or character applies to.

On re-recognition, the correction means 8 refers to the intermediate processing file storage means 9 and, based on the determination made by the association means 7, corrects the document file stored in the document file storage means 3 in accordance with the correction content. For example, following the association means 7, the correction means 8 changes the character designated by the position information of the document file to another candidate character taken from the candidates stored in the intermediate processing file storage means 9 during the earlier recognition. The correction means 8 can also execute the conventional correction processing described above. In word processing, it recognizes the designated characters as one word; in combining processing, it recognizes the character patterns of several characters as one character pattern; and in separation processing, it separates the character pattern of one character into several character patterns and recognizes them. In each case it executes the corresponding correction processing. The present invention can also be applied, as appropriate, to other correction instructions.
While the character recognition means 2 performs recognition, the intermediate processing file storage means 9 stores the intermediate processing data, that is, the character patterns and their positions, the line positions, the recognition results including recognition candidate characters, and the like. The character recognition means 2 stores this information in the intermediate processing file storage means 9 during character recognition. When it performs character re-recognition and correction, the correction means 8 retrieves the position information and candidate characters for the relevant character or line from the intermediate processing file storage means 9.

FIG. 2 shows a flowchart of the first embodiment of the document input method according to the present invention. FIG. 3 is an explanatory diagram of the document input process. The case in which the output result is printed out on paper or the like is described here as an example.

First, the input document 10 is input as an image from the photoelectric conversion means 1 (S101), and the data is stored in the intermediate processing file storage means 9. The character recognition means 2 cuts out characters from the input image (S103) and stores character or line position information, recognition candidates, and the like in the intermediate processing file storage means 9. The character recognition means 2 then performs character recognition (S105) and stores the recognition result in the document file storage means 3. The output means 4 outputs the recognition result (S107). Assume that the recognition output 11 is output here.

In the case of paper output, the operator indicates the correction content on the characters or text that were not recognized adequately (S111).
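The role of the intermediate processing file described above can be illustrated with a small sketch. It assumes a hypothetical record layout (`CharRecord`, with fields for line, bounding box, and ranked candidates); the specification only states that patterns, positions, and candidate characters are stored, so the names and structure here are illustrative rather than the method of the invention.

```python
from dataclasses import dataclass


@dataclass
class CharRecord:
    """One entry of a hypothetical intermediate processing file."""
    line: int          # index of the text line on the page
    box: tuple         # (left, top, right, bottom) of the character pattern
    candidates: list   # recognition candidates, best first
    chosen: int = 0    # index of the candidate currently used in the document


def current_text(records):
    """Rebuild the document text from the currently chosen candidates."""
    return "".join(r.candidates[r.chosen] for r in records)


def use_next_candidate(record):
    """Switch a flagged character to its next stored candidate, if one remains."""
    if record.chosen + 1 < len(record.candidates):
        record.chosen += 1
        return True
    return False


# Illustrative data: two characters on line 0, the first with four candidates.
records = [
    CharRecord(0, (0, 0, 10, 10), ["持", "特", "侍", "稼"]),
    CharRecord(0, (10, 0, 20, 10), ["徴"]),
]
print(current_text(records))      # text before correction
use_next_candidate(records[0])    # operator flagged the first character
print(current_text(records))      # text after switching to the next candidate
```

Because the candidates are kept from the first recognition pass, the correction needs no new handwriting recognition; it only moves to another stored candidate.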
The correction content is indicated here, for example, by the correction symbols in the correction input 12, on which symbols and characters have been written: the first symbol 12a, the second symbol 12b, and the third symbol 12c. In this example, the first symbol 12a, the second symbol 12b, and the third symbol 12c instruct word processing, combining of character patterns, and separation of character patterns, respectively.

The correction recognition means 6 cuts out the written correction symbols (S113). Correction symbols can be cut out, for example, by detecting the line spacing and character spacing of the printed type and then detecting the symbols written between lines or between characters. When the correction symbols are in a color such as blue or red, they can be detected by, for example, providing the photoelectric conversion means 1 with a light-receiving element that detects only that color, or by adding a filter. Conversely, correction symbols can be detected in the same way when the characters printed by the output means 4 are in color. When character position information is printed on the printout from the output means 4, anything written at other positions can be recognized as a correction symbol. Furthermore, when the printout from the output means 4 is printed in dot characters or in gray, correction symbols can be distinguished by pixel continuity, stroke thickness, and the like. This point is described later.

When a cut-out correction symbol is recognized as word processing, word processing is performed (S117): among the recognition candidate characters for the group of characters to which the symbol applies, the combination of candidates most likely to form a word is determined. FIG. 4 is an explanatory diagram of word processing.
In general, character recognition recognizes each character pattern individually and outputs the first recognition candidate as the recognition result. If the characters are instead judged as a word, for example as a combination of two characters, misrecognition decreases, because only some combinations of a first and a second character form valid words. Suppose, for example, that recognizing the first character yields four recognition candidates: first candidate 「持」, second candidate 「特」, third candidate 「侍」, and fourth candidate 「稼」. Suppose also that recognizing the second character yields three recognition candidates: first candidate 「微」, second candidate 「徴」, and third candidate 「働」. Character-by-character recognition selected the first candidates, 「持」 and 「微」. In word processing, however, the two characters are to be recognized as a word, so the combinations of the candidates for each character are examined. For example, the combination of the second candidate 「特」 and the second candidate 「徴」 has a priority sum of 4 (2 + 2), and the combination of the fourth candidate 「稼」 and the third candidate 「働」 has a priority sum of 7 (4 + 3). The word processing result therefore ranks, in descending order of priority (that is, in ascending order of priority sum), 「特徴」 ("feature") as the first candidate and 「稼働」 ("operation") as the second candidate. Here the first candidate, 「特徴」, is recognized as the result of word processing.

When a correction symbol is recognized as combining of character patterns, combining processing is executed (S119): the corresponding character patterns are merged and recognized as a single character. FIG. 5 is an explanatory diagram of combining processing. In combining processing, the character patterns corresponding to the two or more characters designated by the correction symbol (stored in the intermediate processing file storage means 9) are recognized again as a single character pattern. Here, an instruction was given to combine 「T」 and 「由」 in the recognition result, and re-recognition yielded 「抽」.

When a correction symbol is recognized as separation of a character pattern, separation processing is executed (S117), in which the corresponding character pattern is processed again starting from character cutting. FIG. 6 is an explanatory diagram of separation processing. In separation processing, the character pattern corresponding to the character designated by the correction symbol (stored in the intermediate processing file storage means 9) is cut out as two character patterns and recognized again. Here, an instruction was given to separate 「和」 in the recognition result, and re-recognition yielded 「f」 and 「e」.

The correction means 8 corrects the document file in the document file storage means 3 according to each process (S123) and outputs the correction result 13 (S125). If a satisfactory result is not obtained (S129), the process returns to step S111 and repeats. Once the corrections are sufficient, processing ends.

Next, FIG. 7 shows a flowchart of the second embodiment of the document input method according to the present invention. It shows the operation when the recognition output 11 is output by output means 4 that is a display. Steps S101 to S107 and S117 to S127 are the same as in FIG. 2. Steps S111 and S113 of FIG. 2 are replaced by step S115 in this figure. In the case of display output, the correction content and the relevant character position are indicated using the input means 5, such as a mouse (S115).
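The word processing described for FIG. 4 can be sketched as follows. The sketch assumes a word list is available as a plain set (`dictionary`, a hypothetical stand-in; the specification does not say how valid words are looked up) and scores each valid combination by the sum of its candidate ranks, as in the 2 + 2 = 4 and 4 + 3 = 7 example.

```python
from itertools import product


def rank_word_candidates(per_char_candidates, dictionary):
    """Return (rank_sum, word) pairs for valid words, smallest rank sum first.

    per_char_candidates: one list of candidates per character, best first.
    dictionary: set of valid words (an assumed stand-in for a real lexicon).
    """
    ranked_lists = [list(enumerate(c, start=1)) for c in per_char_candidates]
    scored = []
    for combo in product(*ranked_lists):
        word = "".join(ch for _, ch in combo)
        if word in dictionary:
            scored.append((sum(rank for rank, _ in combo), word))
    scored.sort(key=lambda pair: pair[0])  # a smaller sum means higher priority
    return scored


first = ["持", "特", "侍", "稼"]   # candidates for the first character
second = ["微", "徴", "働"]        # candidates for the second character
words = {"特徴", "稼働"}           # illustrative word list

print(rank_word_candidates([first, second], words))
# [(4, '特徴'), (7, '稼働')]
```

The per-character first candidates 「持」 and 「微」 do not form a word in this list, so the combination 「特徴」 with the smallest rank sum is chosen instead.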
The correction recognition means 6 and the association means 7 can therefore obtain the correction content and position directly from the instruction given with the mouse or the like. The correction character itself can also be input from the keyboard or the like of the input means 5.

FIG. 8 shows a flowchart of the third embodiment of the document input method according to the present invention. Steps S101 to S107 and S115 to S127 are the same as in FIG. 2, and the processing in step S115 is the same as in FIG. 7. Here, step S109 is added to determine whether the recognition output 11 from the output means 4 is paper output or display output. For paper output, the same processing as in the first embodiment is executed; for display output, the same processing as in the second embodiment is executed.

The following describes how correction symbols are cut out and identified when the printout from the output means 4 is in dot characters or in gray. FIG. 9 shows a flowchart of cutting out and identifying correction symbols in the case of dot characters or gray. FIG. 10 is an explanatory diagram of the document processing of FIG. 9.

The output means 4 prints out the document file stored in the document file storage means 3 as a dot-character or gray image (S201). The operator writes correction symbols, and characters where necessary, on this printout to create the correction input document 14 (S203). The correction input document 14 is then input as an image by the photoelectric conversion means 1, converted into electronic data, and stored in the electronic data storage means 6 (S205).

The correction recognition means 6 detects the positions of the characters on which correction symbols have been written in the document output by the output means 4 (S207).
The correction recognition means 6 also detects the line and character positions from the digitized document image, exploiting the fact that the type is printed in dots. Details are given later. The correction recognition means 6 further recognizes the correction symbols and characters written on the correction input document 14 (S211). The correct character to be substituted can also be recognized by, for example, the character recognition means 2.

The association means 7 associates the character positions detected by the correction recognition means 6 with the correction symbols recognized by the correction recognition means 6 and identifies the characters to be corrected (S213). That is, it determines which output character each written correction symbol or character corresponds to. The symbol detection and recognition result 15 is obtained in this way.

Following the association made by the association means 7, the correction means 8 corrects each such character by the correction processing corresponding to its correction symbol (S215). The correction means 8 corrects the document file stored in the document file storage means 3 according to the correction symbols recognized by the correction recognition means 6. In FIG. 10, the diagonal stroke is, for example, a symbol meaning that the printed character is to be replaced by the character written above it, so the correction means 8 changes 「埋」 to 「理」 in the document file storage means 3 in response to the written character. Here, 「理」 is recognized as the correct character by the character recognition means 2, the correction recognition means 6, or the like. The correction means 8 can also correct the document file by performing, for example, word recognition processing, combining processing, or division processing.
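The association in step S213 can be pictured with a short sketch. The specification does not state how a written mark is matched to a printed character, so the rule used here, picking the character box with the largest overlap with the mark's bounding box, is purely an assumption for illustration, as are the function names.

```python
def overlap_area(a, b):
    """Intersection area of two boxes given as (left, top, right, bottom)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def associate(mark_box, char_boxes):
    """Return the index of the character box the mark overlaps most, or None."""
    best_index, best_area = None, 0
    for index, box in enumerate(char_boxes):
        area = overlap_area(mark_box, box)
        if area > best_area:
            best_index, best_area = index, area
    return best_index


# Three printed characters side by side, each 10 x 10 pixels.
chars = [(0, 0, 10, 10), (10, 0, 20, 10), (20, 0, 30, 10)]
stroke = (12, -3, 19, 8)   # a diagonal stroke drawn over the second character

print(associate(stroke, chars))   # 1
```

A mark that touches no character box returns `None`, which a caller could treat as a symbol written between lines or in the margin.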
The output means 4 reads the corrected document file from the document file storage means 3 and outputs the correction result 16 (S217). The result may be checked on a display rather than only on a printout.

FIG. 11 is an explanatory diagram of detecting character positions on the printout. The correction recognition means 6 detects character positions by accumulating the number of pixels of the edge image of the input document in the horizontal and vertical directions. Specifically, for example, it first creates a contour image and enhances it; on this contour image it counts the black pixels in the horizontal direction to detect the line positions, and then counts the black pixels in the vertical direction for each line to detect the character positions.

This example describes how the correction recognition means 6 detects the positions of the characters printed as 「本論文では」 on the first line and 「取り装置の」 on the second line. First, the character image 41, on which correction symbols have already been written, is input. Contours are detected from this character image to obtain the contour image 42. The detected contour pixels 42 are then summed in the horizontal direction to obtain the horizontal marginal distribution 43, from which the line positions can be detected. Next, for each line found, the detected contour pixels 44 are cut out in the same way and summed in the vertical direction to obtain the vertical marginal distribution 45, which gives the position (horizontal position) of each character in the line. In this way the row and column of each character are determined and the character position detection image 46 is obtained. Each character can then be cut out by detecting its outermost pixels.

FIG. 12 is an explanatory diagram of recognizing correction symbols. To extract correction symbols (detect written characters), the correction recognition means 6 finds the connected components of pixels in the correction input document 14 and, based on the number of pixels in each component, removes black connected components no larger than a predetermined size, so that only the correction symbols remain; it then cuts out each remaining correction symbol. When cutting out correction symbols, the correction recognition means 6 uses, for example, a shrinking process that treats black pixels adjacent to white pixels as edges and removes those edges from the original image.

This example describes the correction symbol written on 「装」 in the printed text 「取り装置の」. First, the character image 51, on which the correction symbol has already been written, is input. Next, shrinking is applied to obtain the shrunk image 52. The shrinking is performed an appropriate number of times to erase the dot characters, giving the correction symbol extraction image 53. One way to erase the dot characters is to find the connected components, that is, the regions of connected black pixels, and to count the black pixels in each; correction symbols can then be extracted by keeping the connected components whose black pixel count is at or above a predetermined threshold. Next, the correction recognition means 6 cuts out each correction symbol and recognizes which correction it instructs, giving the correction symbol recognition result 54.
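The two image operations described for FIG. 11 and FIG. 12 can be sketched in plain Python. The sketch works directly on a binary image given as rows of 0/1 values (the specification uses a contour image; skipping that step is a simplification), and the function names and the use of 4-connectivity are assumptions made for illustration.

```python
def runs(profile):
    """Return (start, end) spans of consecutive non-zero entries; end is exclusive."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def locate_characters(image):
    """Map each text-line span (rows) to the character spans (columns) inside it."""
    row_profile = [sum(row) for row in image]            # horizontal accumulation
    result = {}
    for top, bottom in runs(row_profile):
        band = image[top:bottom]
        col_profile = [sum(col) for col in zip(*band)]   # vertical accumulation
        result[(top, bottom)] = runs(col_profile)
    return result


def keep_large_components(image, min_pixels):
    """Erase 4-connected black components that have fewer than min_pixels pixels."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not image[y][x] or seen[y][x]:
                continue
            stack, component = [(y, x)], []
            seen[y][x] = True
            while stack:                                  # flood fill one component
                cy, cx = stack.pop()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if len(component) >= min_pixels:              # keep only large strokes
                for cy, cx in component:
                    out[cy][cx] = 1
    return out


page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
]
print(locate_characters(page))
print(keep_large_components(page, 3))
```

On a real printout the isolated printed dots would form small components and the handwritten strokes large ones, so a suitable `min_pixels` would leave only the correction symbols; the threshold value itself is not given in the specification.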
Communication such as the Internet may be used for input and output between the processes.

Industrial Applicability

According to the present invention, as described above, intermediate processing results such as character images and the candidate characters of the reading result are stored, and by using these results together with the recognition means and correction means, misrecognized characters can be corrected without necessarily writing in the correct characters. The present invention can also be applied to conventional correction in which characters are written in.

Claims

The scope of the claims
1. A document input method comprising: an output step of displaying or outputting a document file created by character recognition; a correction recognition step of recognizing correction content indicated for the document output by the output step; an association step of determining the correspondence between the correction content recognized by the correction recognition step and the character to be corrected; and a correction step of correcting the document file in accordance with the correction content, based on the determination result of the association means.
2. The document input method according to claim 1, further comprising a character recognition step of storing, in an intermediate processing file, position information and candidate characters for the characters or lines recognized from an input document, wherein the correction step refers to the intermediate processing file stored by the character recognition step and corrects the character designated by the position information of the document file to another candidate character.
3. The document input method according to claim 1 or 2, further comprising a symbol cutting step of, when the recognition result has been printed out by the output step, cutting out correction symbols that indicate correction content by being written on the recognition result.
4. The document input method according to any one of claims 1 to 3, further comprising an input step of, when the recognition result has been displayed on a display by the output step, indicating correction content and a correction position on the displayed screen.
5. The document input method according to any one of claims 1 to 4, wherein the correction recognition step recognizes one or more of: word processing for recognizing a plurality of characters as a word; combining processing for recognizing the character patterns of a plurality of characters as a single character pattern; and separation processing for recognizing the character pattern corresponding to one character as a plurality of character patterns.
6. The document input method according to any one of claims 1 to 5, wherein the correction step: when the indicated correction content is word processing, recognizes the indicated plurality of characters as one word; when the indicated correction content is combining processing, recognizes the character patterns of a plurality of characters as a single character pattern; and when the indicated correction content is separation processing, recognizes the character pattern corresponding to one character as a plurality of separated character patterns.
7. A recording medium recording a document input program comprising: an output step of displaying or outputting a document file created by character recognition; a correction recognition step of recognizing correction content indicated for the document output by the output step; an association step of determining the correspondence between the correction content recognized by the correction recognition step and the character to be corrected; and a correction step of correcting the document file in accordance with the correction content, based on the determination result of the association means.
8. A document input device comprising: document file storage means for storing a document file created by character recognition; output means for displaying or outputting the document file stored in the character file storage means; correction recognition means for recognizing correction content instructed for the document output by the output means; associating means for determining the correspondence between the correction content recognized by the correction recognition means and the character to be corrected; and correction means for correcting the document file stored in the document file storage means in accordance with the correction content, based on the determination result of the associating means.
9. The document input device according to claim 8, further comprising intermediate processing file storage means for storing position information and candidate characters for characters or lines recognized from the input document, wherein the correction means refers to the intermediate processing file storage means and corrects the character designated by the position information of the document file to another candidate character.
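The correction operations recited in claims 5, 6 and 9 can be illustrated with a minimal Python sketch. It is not the implementation disclosed in this application. It assumes that each entry of the intermediate processing file holds a bounding box (the position information) and a ranked list of candidate characters, and that a re-recognition callback is available. The names `CharSlot`, `use_next_candidate`, `combine`, `separate` and the `recognize` callback are hypothetical and introduced only for illustration. Word processing is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]          # (x0, y0, x1, y1) position information
Recognizer = Callable[[Box], List[str]]  # hypothetical: region -> ranked candidates


@dataclass
class CharSlot:
    """One assumed intermediate-file entry: a position and its ranked candidates."""
    box: Box
    candidates: List[str]
    chosen: int = 0  # index of the candidate currently used in the document

    @property
    def char(self) -> str:
        return self.candidates[self.chosen]


def document_text(slots: List[CharSlot]) -> str:
    """Build the document text from the currently chosen candidates."""
    return "".join(slot.char for slot in slots)


def use_next_candidate(slots: List[CharSlot], index: int) -> None:
    """Claim 9 style correction: switch the designated character to another candidate."""
    slot = slots[index]
    if slot.chosen + 1 >= len(slot.candidates):
        raise ValueError("no further candidate for this character")
    slot.chosen += 1


def combine(slots: List[CharSlot], first: int, last: int,
            recognize: Recognizer) -> None:
    """Combining: treat patterns first..last (inclusive) as one pattern and re-recognize."""
    group = slots[first:last + 1]
    if len(group) < 2:
        raise ValueError("combining needs at least two character patterns")
    # The merged pattern covers the union of the individual bounding boxes.
    box = (min(s.box[0] for s in group), min(s.box[1] for s in group),
           max(s.box[2] for s in group), max(s.box[3] for s in group))
    slots[first:last + 1] = [CharSlot(box, recognize(box))]


def separate(slots: List[CharSlot], index: int, cut_x: int,
             recognize: Recognizer) -> None:
    """Separation: treat one pattern as two patterns divided at the vertical line cut_x."""
    x0, y0, x1, y1 = slots[index].box
    if not x0 < cut_x < x1:
        raise ValueError("cut position must lie inside the character pattern")
    left, right = (x0, y0, cut_x, y1), (cut_x, y0, x1, y1)
    slots[index:index + 1] = [CharSlot(left, recognize(left)),
                              CharSlot(right, recognize(right))]
```

In this sketch, a pair of patterns misread as "r" and "n" could be merged with `combine` and re-recognized as "m", while `use_next_candidate` swaps a misread character for the next entry in its stored candidate list without running recognition again.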
PCT/JP2000/002484 1999-08-11 2000-04-17 Document input method, recording medium recording document input program and document input device WO2001013325A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP22731999A JP3954247B2 (en) 1999-08-11 1999-08-11 Document input method, recording medium recording document input program, and document input device
JP11/227319 1999-08-11

Publications (1)

Publication Number Publication Date
WO2001013325A1 (en) 2001-02-22

Family

ID=16858951

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2000/002484 WO2001013325A1 (en) 1999-08-11 2000-04-17 Document input method, recording medium recording document input program and document input device

Country Status (2)

Country Link
JP (1) JP3954247B2 (en)
WO (1) WO2001013325A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6102374B2 (en) * 2013-03-15 2017-03-29 オムロン株式会社 Reading character correction program and character reading device
JP2016218638A (en) * 2015-05-18 2016-12-22 京セラドキュメントソリューションズ株式会社 Electronic apparatus

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS63220383A (en) * 1987-03-10 1988-09-13 Mitsubishi Electric Corp Character input device
JPH06325214A (en) * 1993-05-14 1994-11-25 Sanyo Electric Co Ltd Character recognition post processor
JPH0744655A (en) * 1993-08-03 1995-02-14 Sony Corp Handwritten input display device

Also Published As

Publication number Publication date
JP2001052111A (en) 2001-02-23
JP3954247B2 (en) 2007-08-08

Similar Documents

Publication Publication Date Title
JPH02502679A (en) Apparatus and method for encoding and decoding barcodes
JP4909311B2 (en) Character recognition device
WO2001071649A1 (en) Method and system for searching form features for form identification
JP2010157107A (en) Business document processor
JP2007005950A (en) Image processing apparatus and network system
JP3954246B2 (en) Document processing method, recording medium storing document processing program, and document processing apparatus
JP5357711B2 (en) Document processing device
WO2001013325A1 (en) Document input method, recording medium recording document input program and document input device
JP2017116974A (en) Image processing device and image processing program
JP3955467B2 (en) Image processing program and image processing apparatus
US20060023236A1 (en) Method and arrangement for copying documents
JPS63146187A (en) Character recognizing device
JP4552822B2 (en) Image processing apparatus, image processing method, and image processing program
JPH0991385A (en) Character recognition dictionary adding method and terminal ocr device using same
JP4083723B2 (en) Image processing device
JP3077580B2 (en) Character reader
JP4736595B2 (en) Teaching material processing apparatus, teaching material processing method, and teaching material processing program
JP2924356B2 (en) Optical character reader
WO2003038739A1 (en) Apparatus and method for determining selection data from pre-printed forms
US20100134849A1 (en) Image processing apparatus, image processing method and computer readable medium
JPH04309B2 (en)
JP2578768B2 (en) Image processing method
JP2003085477A (en) Character recognizing device and correcting method of character recognition result
JPH10187880A (en) Character reader and storage medium storing character read processing
JP2005175565A (en) Image processing apparatus

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): DE FR GB IT

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase