WO2001013325A1 - Document input method, recording medium storing a document input program, and document input device - Google Patents


Info

Publication number
WO2001013325A1
Authority
WO
WIPO (PCT)
Prior art keywords
character
correction
document
recognition
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2000/002484
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
Masaki Nakagawa
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Japan Science and Technology Agency
Original Assignee
Japan Science and Technology Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1999-08-11
Filing date
2000-04-17
Publication date
2001-02-22
Application filed by Japan Science and Technology Corp
Publication of WO2001013325A1
Anticipated expiration
Ceased (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/98 Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
    • G06V10/987 Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns with the intervention of an operator

Definitions

  • the present invention relates to a document input method, a recording medium recording a document input program, and a document input device.
  • the present invention particularly relates to a document input method usable in the field of information devices used in offices and the like, and capable of correcting recognized characters, a recording medium storing a document input program, and a document input device.
  • BACKGROUND ART Conventionally, there is a document input device that inputs a document using an optical character reader (OCR) or the like and corrects the read result without using a keyboard and display (see, for example, Japanese Unexamined Patent Publication No. 63-220383).
  • FIG. 13 shows a configuration diagram of a conventional document input device.
  • This document input device includes photoelectric conversion means 101, character recognition means 102, document file storage means 103, printing means 104, correction symbol/character recognition means 106, associating means 107, and correction means 108.
  • an input document 110 to be read is converted into electronic data by the photoelectric conversion means 101 and read by the character recognition means 102, and the result is stored in the document file storage means 103.
  • the document file stored in the document file storage means 103 is printed by the printing means 104, and an output document 105 (recognition output 111) is output.
  • the operator enters correction symbols and characters on the output document 105, creating a correction input 112.
  • the correction input 112 is converted into electronic data again by the photoelectric conversion means 101, and the correction symbol/character recognition means 106 reads the entered symbols and characters. The associating means 107 then associates the positions where these were entered with positions in the document file, the correction means 108 performs the processing corresponding to each entered correction symbol, and a correction result 113 is obtained. In this way, people who cannot use a keyboard and display can create and modify electronic documents.
  • DISCLOSURE OF THE INVENTION An object of the present invention is to store intermediate processing results, such as character images and the candidate characters of the reading result, and to use these together with recognition means and correction means so that misrecognized characters can be corrected without necessarily writing in the correct characters. Another object of the present invention is to also allow correction when characters are written in, as in the conventional art.
  • the present invention also provides a recording medium storing a document input program.
  • FIG. 1 is a configuration diagram of a document input device according to the invention.
  • FIG. 2 is a flowchart of a first embodiment of a document input method according to the present invention.
  • FIG. 3 is an explanatory diagram relating to the document input process.
  • FIG. 4 is an explanatory diagram of word processing.
  • FIG. 5 is an explanatory diagram of the combining process.
  • FIG. 6 is an explanatory diagram of the separation processing.
  • FIG. 7 is a flowchart of the document input method according to the second embodiment of the present invention.
  • FIG. 8 is a flowchart of a third embodiment of the document input method according to the present invention.
  • FIG. 9 is a flowchart of the extraction and identification of correction symbols when characters are output as dot characters or in gray.
  • FIG. 10 is an explanatory diagram relating to the document processing of FIG. 9.
  • FIG. 11 is an explanatory diagram of detecting the positions of printed-out characters.
  • FIG. 12 is an explanatory diagram of correction symbol recognition.
  • FIG. 13 is a configuration diagram of a conventional document input device.
  • FIG. 14 is an explanatory diagram of the conventional document input process.
  • BEST MODE FOR CARRYING OUT THE INVENTION FIG. 1 shows a configuration diagram of a document input device according to the present invention.
  • This document input device includes photoelectric conversion means 1, character recognition means 2, document file storage means 3, output means 4, input means 5, correction recognition means 6, associating means 7, correction means 8, and intermediate processing file storage means 9.
  • the photoelectric conversion means 1 is composed of, for example, an OCR, reads a document to be read, and converts it into electronic data.
  • the character recognition means 2 recognizes a character or a document from the electronic data read by the photoelectric conversion means 1.
  • the document file storage means 3 stores the document file created by the character recognition means 2.
  • the output means 4 displays or outputs the document file stored in the document file storage means 3.
  • the output means 4 is, for example, a device that prints out the output document, a device that displays the output document on a display, or the like.
  • As the output means 4, a device for interfacing with an external device, a transmission device, a drive device for various recording media, or the like can be used.
  • the input means 5 is any suitable input device, such as a pointing device (a mouse, a touch pen, or the like) or a keyboard.
  • the operator refers to the recognition result output by the output unit 4 and instructs the content of the correction.
  • the correction content may be indicated by entering a correction symbol on the printed result. In this case, the entered correction symbol is cut out by the correction recognition means 6.
  • the instruction of the content of the correction may be made by pointing the correction content and the correction position by the pointing device or the like of the input means 5 while watching the screen.
  • the correction recognizing means 6 recognizes correction contents (correction symbols, characters, etc.) specified for the document or display screen output by the output means 4.
  • the correction recognizing means 6 can recognize not only the correction symbols but also the above-described conventional correction characters and correction portions.
  • the correction recognition means 6 can recognize the correction symbols or correction characters used, for example, for word processing (recognizing a plurality of characters as one word), combining processing (recognizing the character patterns of a plurality of characters as one character pattern), and separation processing (recognizing the character pattern of one character as a plurality of character patterns).
  • the associating means 7 determines the correspondence between the correction content recognized by the correction recognizing means 6 and the character, or character position, to be corrected. That is, the entered correction symbols and characters are associated with the corresponding output characters.
  • based on the association determined by the associating means 7, the correction means 8 re-recognizes the relevant characters with reference to the intermediate processing file storage means 9 and corrects the document file stored in the document file storage means 3 according to the correction content.
  • for example, the correction means 8 may replace the character at the position indicated by the associating means 7 in the document file with another character selected from the candidates stored in the intermediate processing file storage means 9 during the previous recognition.
  • the correction means 8 can also execute the above-described conventional correction processing.
  • in the case of word processing, the correction means 8 recognizes a designated plurality of characters as one word; in the case of combining processing, it recognizes the character patterns of a plurality of characters as one character pattern; and in the case of separation processing, it separates the character pattern of one character and recognizes it as a plurality of character patterns. The corresponding correction processing is executed for each.
  • the present invention can be appropriately applied to other instructions for correction.
  • the intermediate processing file storage means 9 stores processing data in the middle, that is, a character pattern and its position, a line position, a recognition result including a recognition candidate character, and the like.
  • the character recognition means 2 stores the information in the intermediate processing file storage means 9 during the character recognition processing.
  • the correction means 8 extracts the position information and the candidate character for the corresponding character or line from the intermediate processing file storage means 9.
  • FIG. 2 shows a flowchart of the first embodiment of the document input method according to the present invention.
  • FIG. 3 is an explanatory diagram related to the document input process. Here, a case where the output result is printed out on paper or the like will be described as an example.
  • the input document 10 is input as an image by the photoelectric conversion means 1 (S101), and the data is stored in the intermediate processing file storage means 9.
  • the character recognizing means 2 cuts out characters from the input image (S103), and stores character or line position information, recognition candidates and the like in the intermediate processing file storage means 9.
  • the character recognition means 2 performs character recognition (S105), and stores the recognition result in the document file storage means 3.
  • the output means 4 outputs the recognition result (S107).
  • the recognition output 11 is thereby obtained.
  • the operator indicates the correction content for characters or parts of the document that were not recognized correctly (S111).
  • the instruction for the correction content here is made, for example, by using the correction symbol of the first symbol 12a, the second symbol 12b, and the third symbol 12c in the correction input 12 in which the symbol characters are entered.
  • the first symbol 12a, the second symbol 12b, and the third symbol 12c are symbols for instructing word processing, combining character patterns, and separating character patterns, respectively.
  • the correction recognizing means 6 cuts out the entered correction symbols (S113). The correction symbols can be extracted, for example, by detecting the line spacing and character spacing of the printed positions; alternatively, a correction symbol inserted between characters can be detected. When the correction symbols are written in a color such as blue or red, they can be detected by adding to the photoelectric conversion means 1 a light-receiving element or filter that detects only that color. Conversely, when the output means 4 prints the characters in color, the correction symbols can be detected in the same way.
  • the number of erroneous recognitions is reduced because only certain combinations of the first and second characters form valid words. For example, suppose that recognizing the first character yields four candidates: the first candidate is "mochi", the second "special", the third "samurai", and the fourth "earn". Suppose also that recognizing the second character yields three candidates: the first candidate is "fine", the second "sign", and the third "work". In the per-character recognition results, the first candidates, "mochi" and "fine", were selected.
  • because the user instructed that these two characters be recognized as a word, combinations of the candidates for each character are considered.
  • the combination of the second candidate "special" and the second candidate "sign" has a candidate-rank sum of 4 (2 + 2), and the combination of the fourth candidate "earn" and the third candidate "work" has a sum of 7 (4 + 3). Therefore, the word processing yields "feature" as the first candidate and "operation" as the second, ranked in order of priority (that is, in increasing order of the rank sum).
  • the first candidate, "feature", is thus recognized. If the correction symbol indicates that character patterns are to be combined, the corresponding character patterns are combined and a combining process that recognizes them as one character is executed (S119).
  • FIG. 5 is an explanatory diagram of the combining process.
  • the character patterns (stored in the intermediate processing file storage means 9) corresponding to the two or more characters specified by the correction symbol are recognized again as one character pattern.
  • in this example, an instruction was given to combine "T" and "Why" in the recognition result, and re-recognition yielded "Extract". If the correction symbol indicates that a character is to be separated, the separation process is performed again, starting from character extraction, for the corresponding character pattern (S117).
  • Figure 6 shows an illustration of the separation process.
  • FIG. 7 shows a flowchart of a document input method according to a second embodiment of the present invention.
  • FIG. 8 shows a flowchart of the third embodiment of the document input method according to the present invention.
  • steps S101 to S107 and S115 to S127 are the same as those in FIG. Further, the processing in step S115 is the same as that in FIG.
  • a step S109 is added that determines whether the recognition output 11 from the output means 4 is paper output or display output.
  • in the case of paper output, processing similar to that of the first embodiment is executed, and in the case of display output, processing similar to that of the second embodiment is executed.
  • FIG. 9 shows a flowchart for extracting and identifying correction symbols when characters are output as dot characters or in gray.
  • the output means 4 prints out the document file stored in the document file storage means 3 as a dot-character or gray image (S201).
  • the operator enters a symbol for correction and characters as necessary on this printout, and creates a correction input document 14 (S203).
  • the corrected input document 14 is image-input by the photoelectric conversion means 1, converted into electronic data, and stored in the electronic data storage means 6 (S205).
  • the correction recognizing means 6 detects the position of the character in which the correction symbol is entered in the document output by the output means 4 (S207). Further, the correction recognizing means 6 detects lines and character positions from the digitized document image, noting that the printed characters are printed in dots. Details will be described later.
  • the correction recognizing means 6 recognizes the correction symbols and characters entered in the correction input document 14 (S211).
  • the correct character to be substituted can be recognized by, for example, the character recognition means 2.
  • the associating means 7 associates the character positions detected by the correction recognizing means 6 with the correction symbols it recognized, and identifies the characters to be corrected (S213). That is, it determines which output character each entered correction symbol or character corresponds to. In this way, a symbol detection/recognition result 15 is obtained.
  • the correction means 8 corrects the corresponding character by the correction processing corresponding to the correction symbol in accordance with the correspondence of the correspondence means 7 (S215).
  • the correction means 8 corrects the document file stored in the document file storage means 3 according to the correction symbol recognized by the correction recognition means 6.
  • a hatched mark, for example, is a symbol indicating that the corresponding character is to be replaced by the character written on it; accordingly, the correction means 8 performs, on the document file storage means 3, the process of correcting "filled" to "physical".
  • the character “rea” is recognized by the character recognition means 2 or the correction recognition means 6 as a correct character to be corrected.
  • the correction means 8 can also correct the document file using, for example, word processing, combining processing, and separation processing.
  • the output means 4 reads the corrected document file from the document file storage means 3 and outputs a correction result 16 (S217).
  • FIG. 11 is an explanatory diagram of detecting the positions of printed-out characters.
  • the correction recognizing means 6 detects character positions by summing pixel counts in the horizontal and vertical directions over a contour image of the input document. Specifically, for example, a contour image is first created to emphasize the strokes; on this contour image, black pixels are counted along the horizontal direction to detect row positions, and then, within each row, black pixels are counted along the vertical direction to detect character positions. In this example, position detection by the correction recognizing means 6 is described for characters printed as "this paper" on the first line and "capturing device" on the second line. First, a character image 41 in which a correction symbol has been entered is input.
  • contour pixels are detected from this character image, giving a contour image 42. The detected contour pixels 42 are then summed along the horizontal direction to obtain a horizontal marginal distribution 43, from which the row positions can be detected. Next, for each row obtained, the detected contour pixels 44 are cut out and summed along the vertical direction to obtain a vertical marginal distribution 45. From the vertical marginal distribution 45, the character positions (horizontal positions) in each line can be obtained. In this way, the row and column of each character are identified, and a character position detection image 46 is obtained. Furthermore, each character can be cut out by detecting its outermost pixels.
  • FIG. 12 is an explanatory diagram of correction symbol recognition.
  • to extract the correction symbols (that is, to detect the entered marks), the correction recognizing means 6 computes the connected components of pixels in the correction input document 14 and removes black connected components whose pixel count is at or below a predetermined size. Only the correction symbols then remain, and each remaining correction symbol is cut out.
  • the correction recognition means 6 can also use, for example, a contraction process that treats black pixels adjacent to white pixels as edges and removes these edges from the original image. In this example, the correction symbol written over "package" in the characters printed out as "of the collecting device" is described.
  • a character image 51 in which a correction symbol has been entered is input.
  • contraction processing is applied to each pixel, giving a contracted image 52.
  • this contraction process is repeated an appropriate number of times to erase the dot characters, yielding a correction symbol extracted image 53 in which only the correction symbol remains.
  • connected components, that is, regions of connected black pixels, are obtained, and the number of black pixels in each connected component is counted.
  • the correction symbols can be extracted by keeping only the connected components whose black-pixel count is at or above a predetermined threshold.
  • the correction recognizing means 6 cuts out each correction symbol and recognizes its type, obtaining a correction symbol recognition result 54.
  • communication such as the Internet may be used.
  • INDUSTRIAL APPLICABILITY According to the present invention, as described above, intermediate processing results such as character images and the candidate characters of the reading result are stored, and by using these intermediate results together with the recognition means and the correction means, misrecognized characters can be corrected without necessarily writing in the correct characters. Furthermore, the present invention also allows correction when characters are written in, as in the conventional art.
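The intermediate processing file described above (character patterns and their positions, line positions, and recognition candidates) can be pictured with a minimal Python sketch. The class and method names are hypothetical. The rule of advancing to the next stored candidate when the operator marks a character as wrong, without writing in the correct one, is only an assumed policy: the source does not say how the replacement candidate is chosen.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CharRecord:
    """One entry of a hypothetical intermediate processing file."""
    line: int                        # line (row) index of the character
    box: tuple[int, int, int, int]   # (top, left, bottom, right) of the character pattern
    candidates: list[str]            # recognition candidates, best first
    chosen: int = 0                  # index of the candidate currently in the document file


@dataclass
class IntermediateFile:
    records: list[CharRecord] = field(default_factory=list)

    def document_text(self) -> str:
        # The document file holds the currently chosen candidate of each character.
        return "".join(r.candidates[r.chosen] for r in self.records)

    def replace_with_next_candidate(self, index: int) -> str:
        """Correct a marked character using only the stored candidates.

        Assumed policy: move to the next-ranked candidate, wrapping around.
        """
        rec = self.records[index]
        rec.chosen = (rec.chosen + 1) % len(rec.candidates)
        return rec.candidates[rec.chosen]
```

Because the candidates are kept from the first recognition pass, a correction needs no second scan of the character itself; only the chosen index changes.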
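The color-based extraction of correction symbols in step S113 is described as a hardware measure: a light-receiving element or filter that passes only the mark color. A software analogue, assuming an RGB scan and thresholds chosen purely for illustration, could mask the pixels dominated by a single color channel:

```python
def color_mark_mask(pixels, channel=0, min_level=150, min_margin=60):
    """Return a boolean mask of pixels dominated by one color channel.

    pixels: rows of (r, g, b) tuples with values 0-255.
    channel: 0 for red marks, 2 for blue marks.
    A pixel counts as a mark if the chosen channel is at least min_level and
    exceeds each other channel by at least min_margin (both values are assumptions).
    """
    mask = []
    for row in pixels:
        out = []
        for px in row:
            level = px[channel]
            others = [px[i] for i in range(3) if i != channel]
            out.append(level >= min_level and all(level - o >= min_margin for o in others))
        mask.append(out)
    return mask
```

Black printed text and white paper both fail the margin test, so only the colored marks survive, which mirrors the effect the source attributes to an optical color filter.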
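The word processing example (FIG. 4) combines the per-character candidates, keeps only combinations that form words, and ranks them by the sum of the candidates' ranks. A minimal sketch, assuming a hypothetical lexicon that maps character tuples to words and reusing the English glosses from the example as stand-ins for the actual characters:

```python
from itertools import product


def rank_word_candidates(per_char_candidates, lexicon):
    """Rank multi-character words built from per-character candidate lists.

    per_char_candidates: one list per character position, best candidate first,
        so the candidate at index i has rank i + 1.
    lexicon: maps a tuple of characters to the word it spells; only these
        combinations are treated as valid words.
    Returns (word, rank_sum) pairs, smallest rank sum (highest priority) first.
    """
    scored = []
    for combo in product(*(list(enumerate(c, start=1)) for c in per_char_candidates)):
        ranks = [rank for rank, _ in combo]
        chars = tuple(ch for _, ch in combo)
        if chars in lexicon:
            scored.append((lexicon[chars], sum(ranks)))
    scored.sort(key=lambda item: item[1])
    return scored


first = ["mochi", "special", "samurai", "earn"]
second = ["fine", "sign", "work"]
lexicon = {("special", "sign"): "feature", ("earn", "work"): "operation"}
print(rank_word_candidates([first, second], lexicon))
# [('feature', 4), ('operation', 7)]
```

The lexicon lookup is what prunes the per-character first choices ("mochi" plus "fine"), which do not form a word, so the lowest valid rank sum wins.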
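The association step (S213) is described only as linking each detected correction symbol to a printed character position. One plausible rule, which is an assumption here rather than the source's method, is to pick the character box with the largest overlap and fall back to the nearest box center when a mark overlaps nothing:

```python
def associate_marks(char_boxes, mark_boxes):
    """Map each correction mark to the index of the printed character it targets.

    Boxes are (top, left, bottom, right) with inclusive pixel coordinates.
    Assumed rule: largest overlap wins; with no overlap, nearest center wins.
    """
    def overlap(a, b):
        h = min(a[2], b[2]) - max(a[0], b[0]) + 1
        w = min(a[3], b[3]) - max(a[1], b[1]) + 1
        return max(h, 0) * max(w, 0)

    def center(box):
        return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

    result = []
    for mark in mark_boxes:
        best = max(range(len(char_boxes)), key=lambda i: overlap(mark, char_boxes[i]))
        if overlap(mark, char_boxes[best]) == 0:
            my, mx = center(mark)
            best = min(range(len(char_boxes)),
                       key=lambda i: (center(char_boxes[i])[0] - my) ** 2
                       + (center(char_boxes[i])[1] - mx) ** 2)
        result.append(best)
    return result
```

A mark written directly over a character resolves by overlap; a mark written in the margin or between lines resolves to the closest character, which is one way to handle symbols inserted between characters.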
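The character position detection of FIG. 11 is a projection-profile method: build a contour image, project it horizontally to find rows, then project each row vertically to find characters. A minimal sketch on a 0/1 image (1 = black), assuming a contour pixel is a black pixel with at least one white 4-neighbor (pixels outside the image count as white); the function names are hypothetical:

```python
NEIGHBORS_4 = ((-1, 0), (1, 0), (0, -1), (0, 1))


def contour(img):
    """Keep black pixels that touch a white pixel or the image border (4-neighbors)."""
    h, w = len(img), len(img[0])

    def black(y, x):
        return 0 <= y < h and 0 <= x < w and img[y][x] == 1

    return [[1 if img[y][x] == 1 and not all(black(y + dy, x + dx) for dy, dx in NEIGHBORS_4) else 0
             for x in range(w)] for y in range(h)]


def runs(profile):
    """Return (start, end) index pairs of consecutive non-zero profile entries."""
    spans, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(profile) - 1))
    return spans


def char_positions(img):
    """Rows from the horizontal projection, then characters from each row's vertical projection."""
    edges = contour(img)
    boxes = []
    for top, bottom in runs([sum(row) for row in edges]):
        band = edges[top:bottom + 1]
        for left, right in runs([sum(col) for col in zip(*band)]):
            boxes.append((top, left, bottom, right))
    return boxes


img = [[1, 1, 0, 0, 1, 0],
       [1, 1, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 1, 1, 1, 0, 0]]
print(char_positions(img))
# [(0, 0, 1, 1), (0, 4, 1, 4), (3, 1, 3, 3)]
```

Each returned box is (top, left, bottom, right), i.e. the row band from the horizontal distribution combined with a column span from that row's vertical distribution. Real printouts would also need a noise threshold on the profiles, which this sketch omits.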
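The correction symbol extraction of FIG. 12 combines contraction (morphological erosion) with a connected-component size filter: thin dot-printed text disappears after a few contraction passes while the thicker handwritten mark survives. A minimal sketch on a 0/1 image, assuming 4-neighbor erosion, 8-connected components, and illustrative values for the pass count and size threshold:

```python
from collections import deque

NEIGHBORS_4 = ((-1, 0), (1, 0), (0, -1), (0, 1))


def erode(img):
    """One contraction pass: remove every black pixel with a white 4-neighbor.

    Pixels outside the image are treated as white (an assumption).
    """
    h, w = len(img), len(img[0])

    def black(y, x):
        return 0 <= y < h and 0 <= x < w and img[y][x] == 1

    return [[1 if img[y][x] == 1 and all(black(y + dy, x + dx) for dy, dx in NEIGHBORS_4) else 0
             for x in range(w)] for y in range(h)]


def components(img):
    """Return the 8-connected black components as lists of (y, x) pixels."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue, comp = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(comp)
    return comps


def extract_marks(img, passes=1, min_pixels=2):
    """Erode to erase dot characters, then keep components of at least min_pixels."""
    for _ in range(passes):
        img = erode(img)
    return [c for c in components(img) if len(c) >= min_pixels]
```

With `passes=0` the function reduces to the size filter alone, which corresponds to the connected-component removal described for FIG. 12 without the contraction step. For an image containing isolated single-pixel dots and one solid 4 x 4 block, one pass leaves only the 2 x 2 core of the block.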

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Discrimination (AREA)
PCT/JP2000/002484 1999-08-11 2000-04-17 Procede d'entree de document, support d'impression, programme d'entree de document a imprimer et dispositif d'entree de document Ceased WO2001013325A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP22731999A JP3954247B2 (ja) 1999-08-11 1999-08-11 文書入力方法、文書入力プログラムを記録した記録媒体及び文書入力装置
JP11/227319 1999-08-11

Publications (1)

Publication Number Publication Date
WO2001013325A1 true WO2001013325A1 (fr) 2001-02-22

Family

ID=16858951

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2000/002484 Ceased WO2001013325A1 (fr) 1999-08-11 2000-04-17 Procede d'entree de document, support d'impression, programme d'entree de document a imprimer et dispositif d'entree de document

Country Status (2)

Country Link
JP (1) JP3954247B2
WO (1) WO2001013325A1

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6102374B2 (ja) * 2013-03-15 2017-03-29 オムロン株式会社 読取文字訂正用のプログラムおよび文字読取装置
JP2016218638A (ja) * 2015-05-18 2016-12-22 京セラドキュメントソリューションズ株式会社 電子機器

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS63220383A (ja) * 1987-03-10 1988-09-13 Mitsubishi Electric Corp 文字入力装置
JPH06325214A (ja) * 1993-05-14 1994-11-25 Sanyo Electric Co Ltd 文字認識後処理装置
JPH0744655A (ja) * 1993-08-03 1995-02-14 Sony Corp 手書き入力表示装置


Also Published As

Publication number Publication date
JP3954247B2 (ja) 2007-08-08
JP2001052111A (ja) 2001-02-23

Similar Documents

Publication Publication Date Title
JPH02502679A (ja) バーコードをコード化し又解読する装置と方法
JP2010157107A (ja) 業務文書処理装置
JP2022074466A (ja) 画像処理装置および画像形成装置
JP3954246B2 (ja) 文書処理方法、文書処理プログラムを記録した記録媒体及び文書処理装置
JP5357711B2 (ja) 文書処理装置
WO2001013325A1 (fr) Procede d'entree de document, support d'impression, programme d'entree de document a imprimer et dispositif d'entree de document
JP2017116974A (ja) 画像処理装置および画像処理プログラム
JP3955467B2 (ja) 画像処理プログラム及び画像処理装置
US20060023236A1 (en) Method and arrangement for copying documents
JPH0991385A (ja) 文字認識辞書追加方法及びこれを用いた端末ocr装置
JP2007005950A (ja) 画像処理装置及びネットワークシステム
JPS63146187A (ja) 文字認識装置
JP4083723B2 (ja) 画像処理装置
JP2003085477A (ja) 文字認識装置および文字認識結果の訂正方法
JP3077580B2 (ja) 文字読取装置
JP4552822B2 (ja) 画像処理装置、画像処理方法、および画像処理プログラム
JP3187182B2 (ja) 光学的手書き文字列認識方法および装置
JP2924356B2 (ja) 光学文字読取装置
JP4633246B2 (ja) 認識文字修正方法および認識文字修正プログラムを記録したコンピュータ読み取り可能な記録媒体
JPH10187880A (ja) 文字読取装置およびその文字読取処理を記憶した記憶媒体
JPH04309B2
JP2578768B2 (ja) 画像処理方法
JP3239965B2 (ja) 文字認識装置
JP2006092324A (ja) 文字認識装置及び文字認識方法
JP2001127974A (ja) 画像読取装置及びシステム

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): DE FR GB IT

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase