CN102591850A - Method and system for error text statement correction based on conditional statements - Google Patents

Info

Publication number
CN102591850A
Authority
CN
China
Prior art keywords
data
text
score value
sentence
replacement
Prior art date
2011-12-28
Legal status
Pending
Application number
CN2011104465786A
Other languages
Chinese (zh)
Inventor
兰荣春
Current Assignee
Founder International Co Ltd
Original Assignee
Founder International Co Ltd
Priority date
2011-12-28
Filing date
2011-12-28
Publication date
2012-07-18
2011-12-28 Application filed by Founder International Co Ltd
2011-12-28 Priority to CN2011104465786A
2012-07-18 Publication of CN102591850A
Current legal status: Pending

Abstract

The invention discloses a method and system for correcting erroneous text sentences based on conditional statements, relating to the technical field of computer information processing. In the prior art, character errors in a text can be corrected only manually or by semantic correction; manual correction involves no technique at all, and semantic correction relies on a large database and is inefficient. Instead of complex techniques such as semantic analysis, the method and system use simple logical detection, with a device that automatically detects and corrects the errors. They are efficient, fast and highly controllable, and are easy to apply and to learn.

Description

Method and system for correcting erroneous text sentences based on conditional statements
Technical field
The present invention relates to the technical field of computer information processing, and in particular to a method and system for correcting erroneous text sentences based on conditional statements.
Background art
With the continuous development of computer and digitization technology, traditional paper books, documents, newspapers and the like need to be converted into electronic form. When these physical materials are converted into electronic data, that is, into electronic files in formats such as TXT, WORD and PDF, character recognition technology (OCR, optical character recognition) is inevitably used. Existing character recognition technology cannot achieve fully accurate recognition, so further manual keying is also required. Both factors cause differences between the recognized electronic text and the actual image data; these differences are typically wrong characters, such as look-alike characters and full-width/half-width errors. Similar errors also occur frequently in everyday text input and editing.
Wrong characters in a text arise mainly in stream text produced by OCR or entered manually, where a recognition error or an input error substitutes a similar-looking character for the correct one. For example, when OCR recognizes a "-", it cannot distinguish it from similar-looking glyphs such as the Chinese character "一" ("one") or "_". During manual input, full-width and half-width punctuation marks are also often confused. In addition, with Wubi (five-stroke) input or handwriting input, the wrong similar character or candidate word is often selected.
In the prior art, such character and symbol errors can be corrected only manually or by semantic correction. Manual correction involves no technique at all, while semantic correction relies on a large database and is inefficient. A method is therefore needed that can automatically detect and correct the wrong characters produced during OCR recognition or manual input.
Summary of the invention
To address these deficiencies of the prior art, the object of the present invention is to provide a method and system for correcting erroneous text sentences based on conditional statements. The manual operations are replaced by a fast, automated process in which a simple logical detection device automatically detects and corrects character and symbol errors in text sentences. This improves efficiency and speed, saves labor costs, and is easy to use and to learn.
The technical solution adopted by the present invention to solve the above technical problem is as follows:
A method for correcting erroneous text sentences based on conditional statements comprises the following steps:
(1) inputting the text data to be processed and converting the data;
(2) inputting and editing detection conditions;
(3) a logical condition detection and matching step: detecting and matching the degree of match between the input data and the input detection conditions, replacing and modifying the original data, and giving a match-replacement score;
(4) a scoring step: judging the reasonableness of the replacement data and scores given by the logical condition detection and matching device, and selecting the most reasonable group of results as the qualified data;
(5) a qualified data output step: converting the qualified data into an electronic file in the same format as the original input data and outputting it.
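The five steps can be illustrated with a minimal Python sketch. It assumes plain-text input and output, a hypothetical `Rule` record (source character, replacement, condition on the neighbouring characters, score) and a single hypothetical rule that turns a full-width comma between two digits into a half-width comma; the patent does not specify rule syntax or data structures at this level.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    """Hypothetical condition-table entry: replace `src` with `dst` when `when(left, right)` holds."""
    src: str
    dst: str
    when: Callable[[str, str], bool]
    score: float


def is_ascii_digit(ch: str) -> bool:
    return ch.isascii() and ch.isdigit()


def correct(text: str, rules: List[Rule]) -> str:
    # Step (1): convert the stream text into a list of single-character units.
    units = list(text)
    out = []
    for i, ch in enumerate(units):
        left = units[i - 1] if i > 0 else ""
        right = units[i + 1] if i + 1 < len(units) else ""
        # Step (3): collect every rule whose condition matches at this position.
        candidates = [r for r in rules if r.src == ch and r.when(left, right)]
        # Step (4): keep the highest-scoring candidate, otherwise the original character.
        best = max(candidates, key=lambda r: r.score, default=None)
        out.append(best.dst if best is not None else ch)
    # Step (5): reassemble the output in the same (plain-text) format as the input.
    return "".join(out)


# Step (2): a one-entry, hypothetical condition table.
RULES = [
    # Full-width comma between two ASCII digits -> half-width comma.
    Rule("，", ",", lambda left, right: is_ascii_digit(left) and is_ascii_digit(right), 1.0),
]

print(correct("共1，000元", RULES))  # 共1,000元
```

Picking the single highest-scoring candidate per position is a simplification of step (4), which selects the most reasonable group of results; conditions here also look only at the uncorrected neighbours.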
Further, in step (1), the text data to be processed is stream text data.
The stream text data is WORD, PDF or plain text.
In step (1), during data conversion, each character in the input text data is converted into a corresponding processable storage unit, so that a stream of N characters becomes a list of N storage units, N being a positive integer.
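The conversion in step (1) can be pictured as follows. The `Unit` record, its `result` field for the corrected character, and the helper names are hypothetical illustrations, not structures defined in the patent.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Unit:
    """Hypothetical storage unit: one character plus an optional corrected result."""
    index: int
    char: str
    result: Optional[str] = None  # corrected character written back after processing


def to_units(text: str) -> List[Unit]:
    # A stream of N characters becomes a list of N storage units.
    return [Unit(i, ch) for i, ch in enumerate(text)]


def neighbours(units: List[Unit], i: int) -> Tuple[str, str]:
    # In linear (stream) text each unit has at most one preceding and one following unit.
    left = units[i - 1].char if i > 0 else ""
    right = units[i + 1].char if i + 1 < len(units) else ""
    return left, right


def from_units(units: List[Unit]) -> str:
    # Reassemble the text, preferring any result written back into a unit.
    return "".join(u.char if u.result is None else u.result for u in units)


units = to_units("a，b")
units[1].result = ","
print(len(units), neighbours(units, 1), from_units(units))  # 3 ('a', 'b') a,b
```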
Further, in step (3), there are multiple groups of detection conditions, and multiple groups of modified data are produced.
Further, in step (3), the detection step comprises:
1) matching and scoring the replacement statements one by one: each replacement condition is assigned a score, and after replacement matching has been performed for all conditions, a condition score is obtained;
2) determining the score of the replacement text itself from the content of the text;
3) determining whether the text is Chinese or English, and where the symbol appears, to obtain a position score;
4) determining a left/right matching score for paired replacements;
5) adding the individual scores to obtain a total score, and then performing the text replacement directly according to the total score.
Further, in step 5), if a symbol admits several possible replacements, the highest-scoring option is applied directly and lower-scoring options are discarded; if scores are close or identical, a manual-check warning is raised.
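The score combination in sub-steps 1) to 5) and the tie rule above might look like this. The `Candidate` fields, the equal weighting of the four scores, and the `margin` threshold for "close" scores are assumptions; in this sketch a close or tied result leaves the character unchanged and only raises the warning.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    replacement: str
    condition_score: float   # 1) from matching the replacement conditions
    text_score: float        # 2) from the content of the replacement text itself
    position_score: float    # 3) from the Chinese/English context and symbol position
    pair_score: float = 0.0  # 4) from left/right matching of paired replacements

    @property
    def total(self) -> float:
        # 5) the total score is the sum of the individual scores (equal weights assumed).
        return self.condition_score + self.text_score + self.position_score + self.pair_score


def choose(candidates: List[Candidate], margin: float = 0.1) -> Tuple[Optional[str], bool]:
    """Return (replacement or None, needs_manual_check); `margin` is a hypothetical threshold."""
    if not candidates:
        return None, False
    ranked = sorted(candidates, key=lambda c: c.total, reverse=True)
    # Close or identical top scores: raise a manual-check warning instead of replacing.
    if len(ranked) > 1 and ranked[0].total - ranked[1].total <= margin:
        return None, True
    # Otherwise apply the highest-scoring option and discard the rest.
    return ranked[0].replacement, False


print(choose([Candidate(",", 1.0, 0.5, 0.5), Candidate(".", 0.5, 0.2, 0.1)]))  # (',', False)
print(choose([Candidate(",", 1.0, 0.0, 0.0), Candidate(".", 1.0, 0.0, 0.0)]))  # (None, True)
```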
A system for correcting erroneous text sentences based on conditional statements comprises the following devices:
(1) input device 11: for inputting the text to be processed and converting the data;
(2) detection condition input and editing device 12: for inputting and editing detection conditions;
(3) logical condition detection and matching device 13: for detecting and matching the degree of match between the input data and the input detection conditions, replacing and modifying the original data, and giving a match-replacement score;
(4) scoring device 14: for judging the replacement data and scores produced by the logical condition detection and matching device;
(5) qualified data output device 15: for converting the qualified data into an electronic file in the same format as the original input data and outputting it.
Effect of the invention: the invention does not use complex techniques such as semantic analysis; instead, a simple logical detection device automatically detects and corrects character and symbol errors in text sentences. It is efficient, fast and highly controllable, easy to use and to learn, and saves labor costs to a certain extent.
Brief description of the drawings
Fig. 1 is a structural diagram of the system for correcting erroneous text sentences based on conditional statements;
Fig. 2 is a flowchart of the method for correcting erroneous text sentences based on conditional statements.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and a specific embodiment.
As shown in Fig. 1, a system for correcting erroneous text sentences based on conditional statements comprises the following devices:
(1) input device 11: for inputting the text to be processed and converting the data;
(2) detection condition input and editing device 12: for inputting and editing detection conditions;
(3) logical condition detection and matching device 13: for detecting and matching the degree of match between the input data and the input detection conditions, replacing and modifying the original data, and giving a match-replacement score;
(4) scoring device 14: for judging the replacement data and scores produced by the logical condition detection and matching device;
(5) qualified data output device 15: for converting the qualified data into an electronic file in the same format as the original input data and outputting it.
In this system:
Input device 11 provides the function of inputting stream text and converts the raw data into data that the invention can process. Stream text means a flow of characters arranged in a definite order with a linear relationship. The simplest example is a plain text file, which is a linear set of characters; the text in WORD, PDF and XML documents is also linear text. In other words, each character has exactly one preceding unit and one following unit.
Logical condition detection and matching device 13 detects and matches the degree of match between the input data and the input detection conditions, replaces and modifies the original data, and gives a match-replacement score. Because there are multiple groups of matching conditions, multiple groups of modified data are produced.
Scoring device 14 judges the reasonableness of the replacement data and scores produced by the logical condition detection and matching device, and selects the most reasonable group of results.
Qualified data output device 15 converts the qualified data into an electronic file in the same format as the original input data and outputs the qualified processing result.
A method for correcting erroneous text sentences based on conditional statements comprises the following steps:
(1) inputting the text to be processed and converting the data;
The text to be processed is stream text data, including WORD, PDF and plain text files, and the raw data is converted into data that the invention can process. Stream text means a flow of characters arranged in a definite order with a linear relationship. The simplest example is a plain text file, which is a linear set of characters; the text in WORD, PDF and XML documents is also linear text. In other words, each character has exactly one preceding unit and one following unit.
After conversion, each input character has become a corresponding processable storage unit, so a stream of N characters has become a list of N storage units, and the processing result for each storage unit is also written back into the list.
(2) inputting and editing detection conditions;
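The detection condition table used in this embodiment is given below. Each entry has the form [X change Y]: [condition], meaning that symbol X is replaced by symbol Y when the condition holds; entries marked "becomes exchange" replace a pair of symbols together. In the conditions, "left" and "right" denote the adjacent preceding and following characters, "left sentence" and "right sentence" denote the text on either side, "middle sentence" denotes a Chinese sentence (as opposed to an "English sentence"), && denotes logical AND, and ‖ denotes logical OR.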
[, change ,]: [(left side===&& is right!==&& left side sentence!=middle sentence left side sentence=middle sentence) (left side belongs to " .. " right side to ‖!=Chinese) ‖ (the English sentence of the English sentence of left side sentence 1==&& right sentence 1==&& left side sentence 2! The English sentence of the right sentence of=middle sentence 2==) ‖ (the English sentence of the English sentence of left side sentence 1==right sentence 1==English sentence left side sentence 2==right==letter)]
[. change .]: [(left side==letter is right!==&& left side sentence!=middle sentence left side sentence=middle sentence) (left side belongs to " .. " right side to ‖!=Chinese) ‖ (the English sentence of the English sentence of left side sentence 1==&& right sentence 1==&& left side sentence 2! The English sentence of the right sentence of=middle sentence 2==) ‖ (the English sentence of the English sentence of left side sentence 1==right sentence 1==English sentence left side sentence 2==right==letter)]
[? Change? ]: [left side sentence==middle sentence]
[! Change! ]: [left side sentence==middle sentence]
[: change :]: [(left side==the=&& right side==a=&& left side sentence!=middle sentence) ‖ (left side==the right sentence of numeral==English sentence) ‖ (left side==letter left side sentence=middle sentence left side sentence=middle sentence) ‖ (left side==numeral is right==numeral left side sentence=middle sentence left side sentence=middle sentence)]
[: change :]: [((left side sentence==the right sentence of English sentence ‖==English sentence) (left side==numeral is right==numeral)) ‖ ((left side sentence==the right sentence of middle sentence ‖==middle sentence) (left side==numeral is right==numeral)) ‖ (left side sentence==a middle sentence left side==numeral is right==numeral)]
[; Change; ]: [(left side==the=&& right side==a=&& left side sentence!=middle sentence) ‖ (left side==the right sentence of numeral==English sentence) ‖ (left side==letter left side sentence=middle sentence left side sentence=middle sentence) ‖ (left side==numeral is right==numeral) ‖ (left side==letter left side sentence=middle sentence left side sentence=middle sentence)]
[; Change; ]: [a left side!==&& the right side==Chinese]
[+change+]: [1]
[(|) become exchange (|)]: [(comprising Chinese) ‖ (left side, left side sentence==middle sentence) ‖ (the right sentence in right side==middle sentence) ‖ (left side, left side sentence Sentence among the sentence 2==of a left side ,=middle sentence left side) ‖ (among the left side sentence 2==of sentence 1==English sentence left side, a left side, left side among the sentence 3==of a left side, a sentence left side sentence) left side, ‖ left side belongs to " [] ' ' " "..; : :+-* ÷ () right side, () " && right side belongs to " [] ' ' " "..;!?:、∶+-×÷\(\)()]}″]
[) changes)]: [(left side==Chinese is right==Chinese) ‖ (left side==the right sentence of numeral==middle sentence) a ‖ (left side belongs to " ..*i ii iii iv v vi vii viii ix x I II III IV V VI VII VIII IX X" right sentence==middle sentence) ‖ (left side==the right sentence of letter==middle sentence)]
[(change (]: [a left side!=English ‖ is right!=English]
[' change ']: [left side===&& is right==a=&& left side sentence!=middle sentence]
[, change ,]: [(right==the right sentence of middle sentence ‖==middle sentence) (left side==the numeral right side==numeral)]
[. change .]: [((left==the right sentence of middle sentence ‖==middle sentence) (left side==the alphabetical right side==letter)) ‖ ((left side sentence! The right sentence of=middle sentence ‖==middle sentence) (left side==letter))]
[≡ changes]: [1]
[~change~]: [1]
[% changes %]: [1]
[/ change /]: [1]
[change]: [1]
[=change=]: [1]
[[change []: [1]
[] changes]]: [1]
[{ change {]: [1]
[} changes }]: [1]
[<change <]: [1]
[>change >]: [1]
[< |>become exchange<|>]: [left side, left side sentence==the right sentence in English sentence right side==an English sentence ‖ comprises English]
[>change >]: [(a left side!=English) ‖ (left side==the right sentence of numeral==middle sentence) a ‖ (left side belongs to " ..*i ii iii iv v vi vii viii ix x I II III IV V VI VII VIII IX X" right sentence==middle sentence) ‖ (left side==the right sentence of letter==middle sentence)]
[<change <] a: [left side!=English ‖ is right!=English]
[(|) becomes exchange; Individual character |, change ,]: [left side left side sentence==middle sentence do not contain Chinese character]
[(|) becomes exchange; Individual character |. change .]: [(left side left side sentence==middle sentence comprise letter) ‖ (not containing Chinese character)]
[{ | } becomes exchange; Individual character |. change .]: [(left side left side sentence==middle sentence comprise letter) ‖ (not containing Chinese character)]
[{ | } becomes exchange; Individual character |, change ,]: [left side left side sentence==middle sentence do not contain Chinese character]
[[|] becomes exchange; Individual character |, change ,]: [left side left side sentence==middle sentence do not contain Chinese character]
[[|] becomes exchange; Individual character |. change .]: [(left side left side sentence==middle sentence comprise letter) ‖ (not containing Chinese character)]
[" | " becomes exchange; Individual character |, change ,]: [left side left side sentence==middle sentence do not contain Chinese character]
[" | " becomes exchange; Individual character |. change .]: [(left side left side sentence==middle sentence comprise letter) ‖ (not containing Chinese character)]
[(|) becomes exchange; Individual character |, change ,]: [left side left side sentence==middle sentence comprise Chinese character]
[{ | } becomes exchange; Individual character |. change .]: [(left side left side sentence==middle sentence comprise letter) ‖ (not containing Chinese character)]
The above detection condition table can be further edited and modified as needed.
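Because the table is meant to be edited, one option is to keep each condition as a string and evaluate it at run time. The evaluator below is a hypothetical sketch: it supports only `left`/`right` character atoms compared with `==` or `!=` against a class (`digit`, `letter`, `Chinese`) or a literal character, combined with `&&` and `‖`, and it treats a bare `1` as always true (an assumption about entries such as `[+change+]: [1]`). It does not handle parentheses or the sentence-level terms used in the table.

```python
from typing import Dict


def char_class(ch: str) -> str:
    """Hypothetical character classes: digit, letter, Chinese, other (or none at a text edge)."""
    if not ch:
        return "none"
    if ch.isascii() and ch.isdigit():
        return "digit"
    if ch.isascii() and ch.isalpha():
        return "letter"
    if "\u4e00" <= ch <= "\u9fff":  # basic CJK Unified Ideographs block
        return "Chinese"
    return "other"


def eval_atom(atom: str, ctx: Dict[str, str]) -> bool:
    # An atom looks like "left==digit" or "right!=Chinese"; a bare "1" is always true.
    atom = atom.strip()
    if atom == "1":
        return True
    op = "!=" if "!=" in atom else "=="
    if op not in atom:
        raise ValueError(f"unsupported atom: {atom!r}")
    name, expected = (part.strip() for part in atom.split(op, 1))
    ch = ctx.get(name, "")
    # Compare against a character class or, failing that, a literal character.
    matches = char_class(ch) == expected or ch == expected
    return matches if op == "==" else not matches


def eval_condition(cond: str, ctx: Dict[str, str]) -> bool:
    # "‖" separates OR-alternatives; "&&" separates AND-terms within an alternative.
    return any(
        all(eval_atom(term, ctx) for term in alt.split("&&"))
        for alt in cond.split("‖")
    )


print(eval_condition("left==digit && right==digit", {"left": "1", "right": "0"}))  # True
```

Keeping conditions as data rather than code means the table can be edited without changing the program, which matches the statement that the table may be modified as needed.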
(3) Logical condition detection and matching step: detecting and matching the degree of match between the input data and the input detection conditions, replacing and modifying the original data, and giving a match-replacement score. Because there are multiple groups of matching conditions, multiple groups of modified data are produced.
The detection step in step (3) comprises:
1) matching and scoring the replacement statements one by one;
2) scoring the replacement text itself;
3) scoring the replacement position;
4) scoring the left/right matching of paired replacements such as "()" and quotation marks;
5) performing the text replacement directly once the scores are obtained;
6) raising an alarm for manual intervention when scores are close or identical.
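Sub-step 4) can be approximated by checking whether a candidate opening bracket has a matching closing bracket of the same kind. The `PAIRS` table and the 1.0/0.0 scores are hypothetical; the example shows how a full-width opening parenthesis followed by a half-width closing one favours replacing the opener with its half-width form.

```python
from typing import Dict

# Hypothetical table of paired symbols (opening -> closing).
PAIRS: Dict[str, str] = {"(": ")", "（": "）", "[": "]", "{": "}", "“": "”", "‘": "’"}


def pair_score(text: str, i: int, opening: str) -> float:
    """Score 1.0 if treating text[i] as `opening` leaves it with a matching close, else 0.0."""
    closing = PAIRS[opening]
    depth = 0
    for ch in text[i + 1:]:
        if ch == opening:
            depth += 1          # a nested opening of the same kind
        elif ch == closing:
            if depth == 0:
                return 1.0      # found the partner of the symbol at position i
            depth -= 1
    return 0.0


# A full-width "（" closed by a half-width ")" favours replacing the opener with "(".
print(pair_score("（abc)", 0, "("), pair_score("（abc)", 0, "（"))  # 1.0 0.0
```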
In this embodiment, the scores are statistics compiled in advance. Each replacement condition has a score, and after replacement matching has been performed for all conditions, a condition score is obtained. A further score is derived from the content of the text itself, for example whether the symbol is a semicolon or a comma, and this determines whether a replacement is kept or discarded. It is then determined whether the passage is Chinese or English and where the symbol appears, which gives a position score. These scores are added to obtain the total score.
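The Chinese-or-English judgement and position score could be sketched like this. The character window, the CJK range check (basic CJK Unified Ideographs only), the subset of full-width punctuation, and the +1/-1 scoring are all assumptions made for illustration, not values given in the patent.

```python
def is_chinese(ch: str) -> bool:
    # Basic CJK Unified Ideographs block only (an assumption; extensions are ignored).
    return "\u4e00" <= ch <= "\u9fff"


def segment_language(segment: str) -> str:
    # Classify the surrounding segment by counting Chinese characters vs ASCII letters.
    chinese = sum(is_chinese(c) for c in segment)
    latin = sum(c.isascii() and c.isalpha() for c in segment)
    return "Chinese" if chinese >= latin else "English"


# Hypothetical subset of full-width punctuation.
FULL_WIDTH = set("，。；：？！（）")


def position_score(text: str, i: int, candidate: str, window: int = 5) -> float:
    """+1.0 if the candidate's width suits the language around position i, else -1.0."""
    segment = text[max(0, i - window):i] + text[i + 1:i + 1 + window]
    wants_full_width = segment_language(segment) == "Chinese"
    return 1.0 if (candidate in FULL_WIDTH) == wants_full_width else -1.0


print(position_score("你好,世界", 2, "，"), position_score("hello，world", 5, "，"))  # 1.0 -1.0
```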
If a symbol admits several possible replacements, the highest-scoring option is applied directly and lower-scoring options are discarded; if scores are close, a manual-check warning is raised.
(4) Scoring step: judging the reasonableness of the replacement data and scores produced by the logical condition detection and matching device, and selecting the most reasonable group of results as the qualified data;
(5) Qualified data output step: converting the qualified data into an electronic file in the same format as the original input data, and outputting the qualified processing result.
Those skilled in the art will understand that the above detailed description is intended only to explain the invention and not to limit it. The scope of protection of the invention is defined by the claims and their equivalents.

Claims (8)

1. A method for correcting erroneous text sentences based on conditional statements, comprising the following steps:
(1) inputting the text data to be processed and converting the data;
(2) inputting and editing detection conditions;
(3) a logical condition detection and matching step: detecting and matching the degree of match between the input data and the input detection conditions, replacing and modifying the original data, and giving a match-replacement score;
(4) a scoring step: judging the reasonableness of the replacement data and scores given by the logical condition detection and matching device, and selecting the most reasonable group of results as the qualified data;
(5) a qualified data output step: converting the qualified data into an electronic file in the same format as the original input data and outputting it.
2. The method for correcting erroneous text sentences based on conditional statements according to claim 1, characterized in that in step (1), the text data to be processed is stream text data.
3. The method for correcting erroneous text sentences based on conditional statements according to claim 2, characterized in that the stream text data is WORD, PDF or plain text.
4. The method for correcting erroneous text sentences based on conditional statements according to claim 3, characterized in that in step (1), during data conversion, each character in the input text data is converted into a corresponding processable storage unit, so that a stream of N characters becomes a list of N storage units, N being a positive integer.
5. The method for correcting erroneous text sentences based on conditional statements according to claim 4, characterized in that in step (3), there are multiple groups of detection conditions, and multiple groups of modified data are produced.
6. The method for correcting erroneous text sentences based on conditional statements according to any one of claims 1 to 5, characterized in that in step (3), the detection step comprises:
1) matching and scoring the replacement statements one by one: each replacement condition is assigned a score, and after replacement matching has been performed for all conditions, a condition score is obtained;
2) determining the score of the replacement text itself from the content of the text;
3) determining whether the text is Chinese or English, and where the symbol appears, to obtain a position score;
4) determining a left/right matching score for paired replacements;
5) adding the individual scores to obtain a total score, and then performing the text replacement directly according to the total score.
7. The method for correcting erroneous text sentences based on conditional statements according to claim 6, characterized in that in step 5), if a symbol admits several possible replacements, the highest-scoring option is applied directly and lower-scoring options are discarded; if scores are close or identical, a manual-check warning is raised.
8. A system for correcting erroneous text sentences based on conditional statements, comprising the following devices:
(1) input device 11: for inputting the text to be processed and converting the data;
(2) detection condition input and editing device 12: for inputting and editing detection conditions;
(3) logical condition detection and matching device 13: for detecting and matching the degree of match between the input data and the input detection conditions, replacing and modifying the original data, and giving a match-replacement score;
(4) scoring device 14: for judging the replacement data and scores produced by the logical condition detection and matching device;
(5) qualified data output device 15: for converting the qualified data into an electronic file in the same format as the original input data and outputting it.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011104465786A CN102591850A (en) 2011-12-28 2011-12-28 Method and system for error text statement correction based on conditional statements

Publications (1)

Publication Number Publication Date
CN102591850A true CN102591850A (en) 2012-07-18

Family

ID=46480519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011104465786A Pending CN102591850A (en) 2011-12-28 2011-12-28 Method and system for error text statement correction based on conditional statements

Country Status (1)

Country Link
CN (1) CN102591850A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250354A (en) * 2015-06-09 2016-12-21 富士通株式会社 Process the information processor of document, information processing method and program
CN107168941A (en) * 2017-05-12 2017-09-15 掌阅科技股份有限公司 Content of text modification method, electronic equipment, computer-readable storage medium
CN110633461A (en) * 2019-09-10 2019-12-31 北京百度网讯科技有限公司 Document detection processing method and device, electronic equipment and storage medium

Citations (1)

Publication number Priority date Publication date Assignee Title
CN101639830A (en) * 2009-09-08 2010-02-03 西安交通大学 Chinese term automatic correction method in input process

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN106250354A (en) * 2015-06-09 2016-12-21 富士通株式会社 Process the information processor of document, information processing method and program
CN106250354B (en) * 2015-06-09 2020-09-18 富士通株式会社 Information processing apparatus, information processing method, and program for processing document
CN107168941A (en) * 2017-05-12 2017-09-15 掌阅科技股份有限公司 Content of text modification method, electronic equipment, computer-readable storage medium
CN110633461A (en) * 2019-09-10 2019-12-31 北京百度网讯科技有限公司 Document detection processing method and device, electronic equipment and storage medium
CN110633461B (en) * 2019-09-10 2024-01-16 北京百度网讯科技有限公司 Document detection processing method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandonment: 2012-07-18

C20 Patent right or utility model deemed to be abandoned or is abandoned