JP2004252881A - Text data correction method - Google Patents

Text data correction method

Publication number
JP2004252881A
Authority
JP
Japan
Prior art keywords
correction
text data
step
correction method
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2003044798A
Other languages
Japanese (ja)
Inventor
Shinichi Inoue
信一 井上
Original Assignee
Mitsubishi Paper Mills Ltd
三菱製紙株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Paper Mills Ltd (三菱製紙株式会社)
Priority to JP2003044798A
Publication of JP2004252881A
Application status is Pending

Abstract

PROBLEM TO BE SOLVED: To provide a text data correction method by which text data can be corrected systematically or automatically, efficiently and with good accuracy.

SOLUTION: The text data correction method comprises: a process in which a plurality of correctors separately correct the same uncorrected text data; a process for storing the correction portion and correction contents made by each corrector as correction information; and a process for preparing an index for each correction portion by analyzing the plural pieces of correction information.

COPYRIGHT: (C)2004, JPO&NCIPI

Description

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a text data correction method, and more specifically, to a text data correction method capable of correcting text data systematically, efficiently, accurately, and automatically.
[0002]
[Prior art]
Correction of text data is widely performed in word processing, computer program development, database updating, and the like.
[0003]
Generally, correction of text data is an action performed based on an arbitrary intention of a corrector, and it is difficult for anyone other than the corrector to determine how to correct the text data.
[0004]
For this reason, the technology of text data correction is intended to manage the correction history, to redo an incorrect correction, and to perform similar corrections collectively. (See, for example, Patent Documents 1 to 4).
[0005]
[Patent Document 1]
JP-A-5-324425
[Patent Document 2]
JP-A-8-221304
[Patent Document 3]
JP-A-9-198384
[Patent Document 4]
JP-A-11-143983
[0006]
[Problems to be solved by the invention]
Generally, correction of text data is an action performed based on an arbitrary intention of a corrector, and it is difficult for anyone other than the corrector to determine how to correct the text data. On the other hand, for some text data, the character strings to be corrected are assumed in advance.
[0007]
For example, text data read from an image by an OCR (Optical Character Reader: character recognition device) and converted to text often includes misread character strings. In correcting these misread character strings, the character strings to be corrected are fixed.
[0008]
Further, for example, for the purpose of learning or the like, text data including an incorrect character string (character string to be corrected) may be intentionally provided and corrected. These corrections have fixed character strings to be corrected.
[0009]
However, even in text data correction in which a character string to be corrected is assumed, the corrector is a human, and the correction is not always completely accurate because of oversights or mistakes. For this reason, a human must further judge how well the correction has been made, and this judgment is itself repeated, so it has been difficult to systematize and automate the work.
[0010]
SUMMARY OF THE INVENTION It is an object of the present invention to provide a text data correction method for accurately and systematically correcting text data in which a character string to be corrected is assumed.
[0011]
Another object of the present invention is to provide a text data correction method for automatically correcting text data in which a character string to be corrected is assumed.
[0012]
It is a further object of the present invention to provide a text data correction method for correcting text data read by an OCR accurately, efficiently, or automatically.
[0013]
[Means for Solving the Problems]
The present inventor has made intensive studies in view of the above, and as a result, has come to invent the text data correcting method of the present invention.
That is, the text data correction method of the present invention includes a step in which a plurality of correctors separately correct the same pre-correction text data, a step of storing, as correction information, the location and content of the correction made by each corrector, and a step of analyzing the plurality of pieces of correction information and creating an index for each correction location.
[0014]
The text data correction method further includes a step of automatically evaluating the index created by the text data correction method based on a given rule, and automatically correcting the pre-correction text data.
[0015]
The text data correction method further includes a step of displaying both the index created by the text data correction method and the correction content in correspondence with the correction location of the pre-correction text data.
[0016]
Further, the text data correction method includes a step of converting each correction in the correction information of each corrector created by the text data correction method into an evaluation value based on the corresponding index, totaling these evaluation values for each corrector, and calculating the score of each corrector.
[0017]
In the above invention, the pre-correction text data is the data generated by an OCR (Optical Character Reader: character recognition device).
[0018]
In the above invention, the step of creating the index is for calculating the probability of occurrence of the same correction content at the same correction location, and the index is the occurrence probability.
[0019]
Further, in the above invention, the step of performing the correction is performed from an arbitrary place at an arbitrary time via the Internet.
[0020]
Further, in the above invention, the text data correction in the text data correction method is a desired sentence search.
[0021]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, the text data correcting method of the present invention will be described in detail with reference to the drawings.
[0022]
FIG. 1 is a block diagram showing a configuration example for implementing the text data correcting method of the present invention. In FIG. 1, a terminal 2 used by a corrector 1 and a server 3 are connected via the Internet 4. There are a plurality of correctors 1. They may share the same terminal 2, but in general there are a plurality of terminals 2 corresponding to the correctors 1. There are also a plurality of pieces of correction information 6, described later, corresponding to the correctors 1. In FIG. 1, these multiple entities are represented as 1a, 1b, and 1c, such as "corrector 1a".
[0023]
The pre-correction text data 5 is stored in the server 3, and the corrector 1 corrects the pre-correction text data 5 using a given program. This program allows corrections to be made without rewriting the original pre-correction text data 5. In the correction work at each correction location, at least the correction location and the correction content are recorded as the correction information 6. The correction location can be recorded as the correction start character position and the number of corrected characters. Alternatively, it can be recorded as the correction start character position and the correction end character position. The correction content consists of the corrected character string. The correction information 6 is a computer file, or it may be data in a database. An example of the correction information 6 is shown in FIG. 3. FIG. 3 is a diagram showing an example of the correction information 6 of the embodiment in the text data correction method of the present invention. In FIG. 3, the start character position and the number of characters are counted with one half-width (8-bit) character as one character. Alternatively, they can be counted in full-width (16-bit) units or according to a predetermined rule.
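As a minimal sketch of the record described above, the following Python fragment stores each correction as a start position, a character count, and a corrected string, and applies the records to a copy so that the pre-correction text is left untouched. The names `Correction` and `apply_corrections` are hypothetical, positions are 0-based and counted in Python characters rather than half-width bytes, and the corrections are assumed not to overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    """One entry of correction information (hypothetical layout)."""
    start: int        # correction start character position (0-based here)
    length: int       # number of characters replaced
    replacement: str  # corrected character string


def apply_corrections(original: str, corrections: list[Correction]) -> str:
    """Return a corrected copy; the original string is never modified."""
    result = original
    # Apply from the rightmost location so earlier positions stay valid.
    # Assumes the corrections do not overlap.
    for c in sorted(corrections, key=lambda c: c.start, reverse=True):
        result = result[:c.start] + c.replacement + result[c.start + c.length:]
    return result


pre_correction = "The qu1ck brown f0x"
info = [
    Correction(start=6, length=1, replacement="i"),
    Correction(start=17, length=1, replacement="o"),
]
corrected = apply_corrections(pre_correction, info)
print(corrected)
```

A start/end representation would work equally well; `length` is simply `end - start`.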
[0024]
It is also possible to obtain the correction information 6 by a different method. The pre-correction text data 5 is separately recorded, and the pre-correction text data 5 and the post-correction text data are compared, whereby the correction portion and the correction content can be extracted. A method of recording this as the correction information 6 may be used.
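The comparison approach could be sketched as follows, assuming Python's standard `difflib` module is an acceptable way to compare the two texts (the patent does not name a comparison algorithm). The function name `extract_corrections` and the tuple layout `(start, length, corrected string)` are illustrative assumptions.

```python
from difflib import SequenceMatcher


def extract_corrections(before: str, after: str) -> list[tuple[int, int, str]]:
    """Derive (start, length, corrected string) tuples by comparing two texts."""
    matcher = SequenceMatcher(None, before, after, autojunk=False)
    found = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # i1..i2 is the span in the pre-correction text;
            # after[j1:j2] is what the corrector wrote in its place.
            found.append((i1, i2 - i1, after[j1:j2]))
    return found


extracted = extract_corrections("The qu1ck brown f0x", "The quick brown fox")
print(extracted)
```

Pure insertions appear with a length of 0, and pure deletions with an empty corrected string.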
[0025]
Each corrector 1 can perform the correction work from an arbitrary place at an arbitrary time via the Internet 4. This makes it possible to proceed efficiently with the text data correction work, which has a high business effect. For example, the work can be expected to be done from home, or at night as a side job.
[0026]
The correction information 6 of each corrector 1 is collected and analyzed. In the analysis, for example, the pieces of correction information 6 on the same pre-correction text data 5 are combined into one and sorted in order of correction location. The correction contents for the same correction location are grouped by identical content, and the number in each group is recorded. This number of identical corrections is used as an index. Alternatively, a ratio is calculated by dividing the number of identical corrections by the number of pieces of correction information (or the number of correctors 1 who have corrected the same pre-correction text data 5), and this ratio is used as an index. It can be expressed as a percentage or as a fraction. An example of the correction information analysis result 7 is shown in FIG. 4. FIG. 4 is a diagram showing a correction information analysis result 7 of the embodiment in the text data correction method of the present invention. As a more advanced analysis, there is a method of grouping the corrected character strings for the same correction location according to whether they are semantically close to each other.
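The counting and ratio calculation described above might look like the following sketch. The function name `build_index` and the tuple layout `(start, length, corrected string)` are assumptions made for illustration; the ratio is the number of identical corrections divided by the number of correctors.

```python
from collections import Counter


def build_index(all_info):
    """all_info: one list of (start, length, corrected string) per corrector.

    Returns {(start, length): {corrected string: ratio}}, sorted by location.
    """
    n_correctors = len(all_info)
    counts = Counter()
    for info in all_info:
        # set() ensures a corrector counts at most once per identical correction.
        counts.update(set(info))
    result = {}
    for (start, length, replacement), n in counts.items():
        result.setdefault((start, length), {})[replacement] = n / n_correctors
    return dict(sorted(result.items()))


all_info = [
    [(6, 1, "i"), (17, 1, "o")],
    [(6, 1, "i"), (17, 1, "o")],
    [(6, 1, "i")],
    [(6, 1, "l"), (17, 1, "o")],
]
index = build_index(all_info)
print(index)
```

With four correctors, three identical corrections at the same location give a ratio of 0.75, that is, 75%.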
[0027]
In this way, a group of indices, which are the results of the analysis of the correction location, the correction content, and the like, is obtained as the correction information analysis result 7. By using the correction information analysis result 7, text data correction can be accurately and systematically performed.
[0028]
For example, if a given rule is applied to the index of the correction information analysis result 7, systematic automatic correction can be performed with high accuracy. For example, when the index is a ratio (%), a rule can adopt only those corrections whose index is 80% or more. This means that, for example, when ten correctors 1 have made corrections, only corrections made by eight or more of them are adopted. Since the accuracy of the correction is determined by the given rule, it can be adjusted according to the situation, such as the ability and the number of the correctors 1.
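A rule of this kind can be sketched as a simple threshold test. The function name `auto_correct`, the index layout `{(start, length): {corrected string: ratio}}`, and the sample ratios below are illustrative assumptions; the 80% default follows the example in the text.

```python
def auto_correct(text, index, threshold=0.8):
    """Apply the most common correction at each location if it meets the rule."""
    accepted = []
    for (start, length), candidates in index.items():
        # Pick the candidate with the highest ratio at this location.
        replacement, ratio = max(candidates.items(), key=lambda kv: kv[1])
        if ratio >= threshold:
            accepted.append((start, length, replacement))
    # Apply right to left so that earlier positions remain valid.
    for start, length, replacement in sorted(accepted, reverse=True):
        text = text[:start] + replacement + text[start + length:]
    return text


sample_index = {(6, 1): {"i": 0.9, "l": 0.1}, (17, 1): {"o": 0.7}}
print(auto_correct("The qu1ck brown f0x", sample_index))
```

Here the correction at position 17 is supported by only 70% of correctors, so it is left for human judgment; lowering `threshold` to 0.7 would accept it.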
[0029]
Also, when a separate judge 9 makes the final decision, it is very efficient to display the correction information analysis result 7 in correspondence with the correction location. Displaying in correspondence with the correction location means, for example, that the character string at the correction location is displayed in a different color (for example, red) from the other character strings, or is displayed underlined, in italics, or in bold. Further, the replacement character strings, which are the correction contents, are listed together with their indices near the correction location or in an area indicating the correction location. An example is shown in FIG. 5. FIG. 5 is a diagram showing an example of a screen display of the terminal 2 when the judge 9 of the embodiment of the text data correction method of the present invention performs correction. In FIG. 5, one of the entries in the correction information analysis result 7 can be selected to execute the correction. It is also possible to decide that no correction is made, or to make a correction not included in the correction information analysis result 7.
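As a rough plain-text stand-in for such a screen, the sketch below brackets each correction location and lists the candidate strings with their indices, highest first. The function name `render_for_judge` and the bracket notation are hypothetical; an actual display would use color, underlining, or similar emphasis as described above.

```python
def render_for_judge(text, index):
    """List each correction location with its candidates and ratios."""
    lines = []
    for (start, length), candidates in sorted(index.items()):
        original = text[start:start + length]
        # Mark the correction location in context with square brackets.
        lines.append(text[:start] + "[" + original + "]" + text[start + length:])
        ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
        for replacement, ratio in ranked:
            lines.append(f"    {original!r} -> {replacement!r}  ({ratio:.0%})")
    return "\n".join(lines)


screen = render_for_judge("The qu1ck", {(6, 1): {"i": 0.75, "l": 0.25}})
print(screen)
```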
[0030]
The judge 9 is not shown in FIG. 1, but the judgment can be carried out by accessing the server 3 from a terminal 2 in the same way as the corrector 1 in FIG. 1.
[0031]
These correction methods are particularly often used in the following manner. This is a case where a character string written, whether printed or handwritten, is recorded as an image and converted to text data.
[0032]
The source of the character string image is generally printed matter, such as a newspaper, a book, a magazine, or a catalog. Recently, it may also be a document created on a computer and then printed out. Handwritten sources include handwritten documents, notebooks, memos, and the like, and recently also character images input by hand into a computer. These are read as image data by a scanner or the like, and then the character string portion is converted into text data. The hardware and software for this are widely known as OCR.
[0033]
In the conversion to text data by the OCR, incorrect conversions may occur. In particular, the conversion accuracy drops markedly for special characters, kanji, text with pictures, old-style kanji and characters, mixed languages, and idiosyncratic handwriting.
[0034]
As described above, the text data converted by the OCR has improved conversion accuracy but cannot guarantee completeness. Therefore, it is necessary to correct the text data while comparing the image before conversion with the converted text data. The text data correction method of the present invention is a method that can efficiently or automatically correct text data converted by OCR.
[0035]
The text data correction work is performed by a plurality of correctors 1. This is a human task, so it is important to know the ability and performance of each corrector 1. According to the text data correction method of the present invention, each correction in the correction information 6 of each corrector 1 is converted into an evaluation value based on the index of the correction information analysis result 7, and these evaluation values are then totaled, for example, for each piece of correction information 6 of each corrector 1 to calculate a score. This makes it possible to know the abilities and achievements of the corrector 1.
[0036]
The conversion into the evaluation value can be performed such that, for example, one point is given if a correction in the correction information 6 of a corrector 1 has an index in the correction information analysis result 7 at or above a certain value (for example, 80% or more).
[0037]
The conversion into the evaluation value can also be performed, for example, such that one point is given when a correction in the correction information 6 of a corrector 1 matches the correction finally adopted by the judge 9 or the like.
[0038]
The conversion into the evaluation value can also take incorrect corrections into account, or can combine added points and deducted points.
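The point rules above can be sketched as follows. The function name `score_corrector`, the `penalty` parameter, and the sample index are assumptions for illustration: a correction earns one point when its index meets the threshold, and an optional deduction is applied otherwise.

```python
def score_corrector(info, index, threshold=0.8, penalty=0):
    """Total the evaluation values of one corrector's corrections."""
    score = 0
    for start, length, replacement in info:
        # A correction absent from the index is treated as having ratio 0.
        ratio = index.get((start, length), {}).get(replacement, 0.0)
        if ratio >= threshold:
            score += 1        # added point for a widely shared correction
        else:
            score -= penalty  # optional deducted point
    return score


sample_index = {(6, 1): {"i": 0.9, "l": 0.1}, (17, 1): {"o": 0.8}}
info_a = [(6, 1, "i"), (17, 1, "o")]
info_b = [(6, 1, "l"), (17, 1, "o")]
print(score_corrector(info_a, sample_index))
print(score_corrector(info_b, sample_index, penalty=1))
```

Scoring against the corrections finally adopted by the judge would replace the threshold test with a membership check in the set of adopted corrections.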
[0039]
These evaluation values can be used for ranking the correctors 1 by correction ability. The ranking can be used to decide whether to allow a corrector 1 to perform subsequent correction work, and, when creating an index in the analysis, to weight the corrections of higher-ranked correctors with a larger coefficient.
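One way to apply such coefficients, sketched here under the assumption that each corrector has a positive numeric weight, is to add each corrector's share of the total weight instead of a plain count. The function name `weighted_index` and the weight values are hypothetical; the patent does not specify how the coefficient is chosen.

```python
def weighted_index(all_info, weights):
    """Build {(start, length): {corrected string: weighted ratio}}.

    all_info: one list of (start, length, corrected string) per corrector.
    weights:  one positive coefficient per corrector (larger = higher rank).
    """
    total = sum(weights)
    result = {}
    for info, weight in zip(all_info, weights):
        for start, length, replacement in set(info):
            by_content = result.setdefault((start, length), {})
            by_content[replacement] = by_content.get(replacement, 0.0) + weight / total
    return result


ranked_info = [[(6, 1, "i")], [(6, 1, "i")], [(6, 1, "l")]]
# The first corrector is ranked higher and counts twice as much.
weighted = weighted_index(ranked_info, weights=[2, 1, 1])
print(weighted)
```

With equal weights the "i" correction would have a ratio of 2/3; doubling the first corrector's weight raises it to 0.75.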
[0040]
In addition, the evaluation value can be linked to a reward if the text data correction work is performed as a work involving wages. Also, the evaluation value can be linked to the grade when the text data correction work is performed as a performance test.
[0041]
When the text data correction work is performed as a business, as described above, this work needs to be performed by a plurality of correctors 1, and expansion of the scale requires joint work by many people.
[0042]
The text data correction method of the present invention can be implemented via the Internet. Thereby, the corrector 1 can perform the correction work from an arbitrary place at an arbitrary time. It is difficult for a business to gather a large number of correctors 1 in one place, given the required ability of the correctors 1 and the location, equipment, and cost involved. According to the present invention, it is possible to provide a text data correction method in which people all over the world can freely participate.
[0043]
So far, text data correction in which a character string to be corrected is assumed has mainly been described. However, the present invention is also effective as a text data correction method in which the text data correction is a desired sentence search. In this method, for purposes such as investigation and improvement, certain text data is given to a plurality of correctors 1 to correct, and the tendencies and preferences in their corrections are examined. In this case, although the character string to be corrected is not fixed, a correction common to many correctors 1 can be regarded as a character string to be corrected. In addition, the tendencies in the correction contents of a plurality of correctors 1 can be investigated statistically.
[0044]
FIG. 2 is a flowchart showing the operation of the embodiment of the text data correcting method according to the present invention.
[0045]
In step 1, the corrector 1 corrects the pre-correction text data 5.
[0046]
In step 2, the correction information 6 is generated and stored.
[0047]
Steps 1 and 2 are each performed by a plurality of correctors 1. Thereby, as many pieces of correction information 6 as there are correctors 1 are generated and stored, one for each corrector 1.
[0048]
In step 3, a plurality of pieces of correction information 6 corresponding to the pre-correction text data 5 are analyzed, and a correction information analysis result 7 is generated.
[0049]
In step 4, the post-correction text data 8 is generated by performing the correction. The correction can be performed either automatically or by judgment of the judge 9.
[0050]
In step 5, the score of the corrector is calculated. Step 5 may or may not be performed.
[0051]
[Effect of the Invention]
According to the present invention, text data correction can be systematically, efficiently, accurately, or automatically performed, and thereby text data correction can be performed by an organization as a business.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration for implementing a text data correction method of the present invention.
FIG. 2 is a flowchart showing the operation of the embodiment of the text data correction method of the present invention.
FIG. 3 is a diagram showing an example of correction information 6 according to the embodiment of the text data correction method of the present invention.
FIG. 4 is a diagram showing a correction information analysis result 7 of the embodiment in the text data correction method of the present invention.
FIG. 5 is a diagram showing an example of a screen display of the terminal 2 when the correction is performed by the judge 9 in the embodiment of the text data correction method of the present invention.
[Explanation of symbols]
1 Corrector 2 Terminal 3 Server 4 Internet 5 Text data before correction 6 Correction information 7 Correction information analysis result 8 Text data after correction 9 Judge

Claims (8)

  1. A text data correction method comprising: a step in which a plurality of correctors separately correct the same pre-correction text data; a step of storing, as correction information, the correction location and the correction content of each corrector; and a step of analyzing the plurality of pieces of correction information to create an index for each correction location.
  2. A text data correction method comprising a step of automatically evaluating, based on a given rule, the index created by the text data correction method according to claim 1, and automatically correcting the pre-correction text data.
  3. A text data correction method comprising a step of displaying both the index created by the text data correction method according to claim 1 and the correction content in correspondence with the correction location of the pre-correction text data.
  4. A text data correction method comprising a step of converting each correction in the correction information of each corrector created by the text data correction method according to claim 1 into an evaluation value based on the corresponding index, totaling these evaluation values for each corrector, and calculating a score of each corrector.
  5. The text data correction method according to claim 1, wherein the pre-correction text data is data generated by an OCR (Optical Character Reader: character recognition device).
  6. The text data correction method according to claim 1, wherein the step of creating the index obtains the probability of occurrence of the same correction content at the same correction location, and the index is the occurrence probability.
  7. The text data correction method according to claim 1, wherein the step of performing the correction is performed from an arbitrary place at an arbitrary time via the Internet.
  8. The text data correction method according to claim 1, wherein the text data correction in the text data correction method is a desired sentence search.
JP2003044798A 2003-02-21 2003-02-21 Text data correction method Pending JP2004252881A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2003044798A JP2004252881A (en) 2003-02-21 2003-02-21 Text data correction method

Publications (1)

Publication Number Publication Date
JP2004252881A true JP2004252881A (en) 2004-09-09

Family

ID=33027396

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2003044798A Pending JP2004252881A (en) 2003-02-21 2003-02-21 Text data correction method

Country Status (1)

Country Link
JP (1) JP2004252881A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012178692A (en) * 2011-02-25 2012-09-13 Mitsubishi Electric Information Systems Corp Faxocr system and faxocr program
JP2014126926A (en) * 2012-12-25 2014-07-07 Nippon Telegr & Teleph Corp <Ntt> Error compilation device, method, and program
US10366170B2 (en) 2013-02-08 2019-07-30 Mz Ip Holdings, Llc Systems and methods for multi-user multi-lingual communications
US10346543B2 (en) 2013-02-08 2019-07-09 Mz Ip Holdings, Llc Systems and methods for incentivizing user feedback for translation processing
US10204099B2 (en) 2013-02-08 2019-02-12 Mz Ip Holdings, Llc Systems and methods for multi-user multi-lingual communications
US10146773B2 (en) 2013-02-08 2018-12-04 Mz Ip Holdings, Llc Systems and methods for multi-user mutli-lingual communications
US9881007B2 (en) 2013-02-08 2018-01-30 Machine Zone, Inc. Systems and methods for multi-user multi-lingual communications
US9836459B2 (en) 2013-02-08 2017-12-05 Machine Zone, Inc. Systems and methods for multi-user mutli-lingual communications
US10417351B2 (en) 2013-02-08 2019-09-17 Mz Ip Holdings, Llc Systems and methods for multi-user mutli-lingual communications
JP2016514393A (en) * 2013-02-13 2016-05-19 オートデスク,インコーポレイテッド Serialization for differential encoding
US9659020B2 (en) 2013-02-13 2017-05-23 Autodesk, Inc. Serialization for delta encoding
JP2016524234A (en) * 2013-06-03 2016-08-12 マシーン・ゾーン・インコーポレイテッドMachine Zone, Inc. System and method for multi-user multilingual communication
CN105408891A (en) * 2013-06-03 2016-03-16 机械地带有限公司 Systems and methods for multi-user multi-lingual communications
US10162811B2 (en) 2014-10-17 2018-12-25 Mz Ip Holdings, Llc Systems and methods for language detection
JP2016152026A (en) * 2015-02-19 2016-08-22 株式会社富士通アドバンストエンジニアリング Reliability calculation program, system, method and device
