CN105279149A - Chinese text automatic correction method - Google Patents

Info

Publication number: CN105279149A
Application number: CN201510688403.4A
Authority: CN (China)
Other languages: Chinese (zh)
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Inventors: 刘云翔, 杜杰, 李晓丹, 郑力, 杜俊, 刘续博
Current Assignee: Shanghai Institute of Technology (the listed assignees may be inaccurate)
Original Assignee: Shanghai Institute of Technology
Prior art keywords: word, text, error, Chinese text, words
Landscapes: Machine Translation (AREA)

Abstract

The invention discloses an automatic correction method for Chinese text. The method comprises the following steps: a) inputting the Chinese text to be proofread and performing word segmentation preprocessing on it sentence by sentence; b) searching, sentence by sentence, for one-character words, two-character words, and scattered strings of three or more characters in the segmented text; c) evaluating the scattered strings in the segmented text with an N-gram model and, in combination with the probability that each character forms a word on its own, checking each sentence for word-level errors; and d) constructing an error correction knowledge base to generate error correction candidate text. Because the scattered strings are located sentence by sentence, evaluated with the N-gram model to identify errors, and corrected with candidates generated from the knowledge base, error detection and error correction are closely combined, and the method offers fast error detection and efficient error correction.

Description

An automatic correction method for Chinese text
Technical field
The present invention relates to a text correction method, and in particular to an automatic correction method for Chinese text.
Background art
With the rapid development of modern laser phototypesetting and the electronic publishing industry, ensuring that information is conveyed correctly has become an important research topic. People now write, edit, and typeset on computers, and errors in the text are inevitable: extra characters, missing characters, transposed characters, misspelled English words, nonstandard punctuation, and so on. A dedicated proofreading system is therefore needed to check manuscripts. In the long run, informatization is the trend of social development, and the volume of electronic information and manuscripts keeps growing. Traditional manual proofreading requires proofreaders to read and check the text word by word and sentence by sentence, and it cannot keep up with the rapid growth of electronic text in terms of either cost or efficiency. The demand for an automatic proofreading system with high accuracy and high efficiency is therefore increasingly urgent.
Automatic proofreading has significant practical value and a wide range of applications. In publishing, automatic proofreading can greatly reduce staff workload, free staff from tedious work, speed up publication, and promote the development of the whole industry. In character recognition, error detection and correction techniques are needed to revise the results of speech recognition, OCR text recognition, and similar systems. In text editing, many editing systems such as Word provide automatic error detection and flag errors in the input text. In human-machine interfaces, such as database query and natural language interfaces, a certain degree of fault tolerance is required. In systems such as computer-aided instruction, input sentences must be analyzed to find the errors in them and to offer possible correct answers.
Automatic proofreading is also of theoretical significance. As a discipline it belongs to natural language understanding and involves many of its basic areas, such as automatic word segmentation, part-of-speech tagging, and syntactic analysis, which makes it a research topic of considerable academic value. Research in natural language processing has now reached the stage of processing large-scale real text, and real text may contain errors. Automatic proofreading studies how to find and handle these errors, so its development should improve the fault tolerance of other natural language processing tasks and further promote natural language processing research as a whole.
Summary of the invention
The technical problem to be solved by the present invention is to provide an automatic correction method for Chinese text that can automatically analyze electronic text, find and mark errors, and correct them, so that error detection and error correction are closely combined and the method offers fast error detection and efficient error correction.
To solve the above technical problem, the present invention provides an automatic correction method for Chinese text, comprising the following steps: a) inputting the Chinese text to be proofread and performing word segmentation preprocessing on it sentence by sentence; b) searching, sentence by sentence, for the single-character, two-character, and three-or-more-character scattered strings that occur in the segmented text; c) evaluating the scattered strings in the segmented text with an N-gram model and, in combination with the probability that each character forms a word on its own, checking each sentence for word-level errors; d) constructing an error correction knowledge base and generating error correction candidate text.
In the above method, step a) accepts the Chinese text to be proofread through speech or keyboard input, and the preprocessing includes tidying up grammatical errors in the input text and performing a pattern-matching check on it.
In the above method, speech input of the Chinese text to be proofread in step a) proceeds as follows: the speech input from a microphone is received and converted into a speech stream that the computer can accept; pattern matching is performed on the speech stream to generate candidate character and word combinations; and a language model is used to identify the candidate combinations.
In the above method, keyboard input of the Chinese text to be proofread in step a) proceeds as follows: characters and words are encoded in advance, keystroke signals are converted into a code sequence that the computer accepts, and the code sequence is associated with the character and word codes.
In the above method, step c) evaluates scattered strings of three or more characters as follows: the probability that each character in the scattered string forms a word on its own is evaluated to determine a first error constant; a bigram adjacency model is applied to successive pairs of adjacent characters to evaluate the probability that they form a word, determining a second error constant; a trigram adjacency model is applied to successive triples of adjacent characters to evaluate the probability that they form a word, determining a third error constant; and all error constants are added to determine the final word-level error coefficient of the text.
In the above method, step c) evaluates a scattered string of four consecutive characters $W_k W_{k+1} W_{k+2} W_{k+3}$ as follows:
c1) Evaluate the probability that each of the characters $W_k$, $W_{k+1}$, $W_{k+2}$, $W_{k+3}$ forms a word on its own. If the probability $P$ that a character occurs alone is 0, that position is erroneous and the error constant $K_1$ is increased by 1.5.
c2) With $W_{k-2}$ as the start position and $W_{k+4}$ as the end position, apply the bigram adjacency model, using the co-occurrence frequency $R$ of two consecutive words as the criterion. If $R = 0$, the error constant $K_4$ is increased by 0.2; if $R \ge 1$, $K_2$ is decreased by 1.0.
c3) With $W_{k-1}$ as the start position and $W_{k+4}$ as the end position, apply the bigram adjacency model, using the co-occurrence frequency $R$ of two consecutive words as the criterion. If $R = 0$, the error constant $K_3$ is increased by 0.5; if $1 < R < 2$, $K_3$ is increased by 0.2; if $R \ge 2$, $K_3$ is decreased by 1.0.
c4) With the first of the two characters preceding $W_k$ as the start position and the second character after $W_{k+3}$ as the end position, apply the trigram model, using the co-occurrence frequency $R$ of three consecutive characters as the criterion. If $R = 0$, the error constant $K_4$ is increased by 0.2; if $R \ge 1$, $K_4$ is decreased by 1.0.
c5) With the character preceding $W_k$ as the start position and the character following $W_{k+3}$ as the end position, apply the bigram model, using the co-occurrence frequency $R$ of two consecutive characters as the criterion. If $R = 0$, the error constant $K_5$ is increased by 0.8; if $1 < R < 3$, $K_5$ is increased by 0.5; if $R \ge 3$, $K_5$ is decreased by 1.0.
c6) For a given character under examination, add the error constants obtained, i.e. $K = K_1 + K_2 + K_3 + K_4 + K_5$. If $K \ge 1.5$, that position is erroneous and the erroneous text is marked.
In the above method, step d) ranks the generated error correction candidate text as follows: each candidate replaces the original erroneous text, steps b) and c) are repeated on the resulting sentence to run error detection again and obtain the corresponding error constant, and the candidates are ranked by the size of their error constants.
In the above method, step d) constructs various error correction knowledge bases from the error characteristics of the text and a likelihood matching method; the knowledge bases include a wrongly-written-character dictionary, a dictionary of easily confused words, a similar-code dictionary, and/or a character-driven bidirectional dictionary.
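The candidate generation and ranking described above can be illustrated with a minimal Python sketch. It assumes the knowledge base is a plain mapping from an erroneous string to its candidate replacements and that error detection is available as a scoring callable; the names `rank_candidates`, `confusion_dict`, and `score` are hypothetical and do not come from the patent, and ascending order is used on the assumption that a smaller error constant indicates a better candidate.

```python
from typing import Callable, Dict, List, Tuple


def rank_candidates(
    sentence: str,
    error_span: Tuple[int, int],
    confusion_dict: Dict[str, List[str]],
    score: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Rank correction candidates for one flagged span of a sentence.

    error_span is (start, end) with end exclusive. score is a stand-in
    for re-running error detection (steps b and c) on the corrected
    sentence; it returns the resulting error constant.
    """
    start, end = error_span
    wrong = sentence[start:end]
    ranked = []
    for candidate in confusion_dict.get(wrong, []):
        # Substitute the candidate for the flagged text, then re-score.
        corrected = sentence[:start] + candidate + sentence[end:]
        ranked.append((candidate, score(corrected)))
    # Assumption: a smaller error constant means a more plausible fix.
    ranked.sort(key=lambda pair: pair[1])
    return ranked
```

In practice `score` would wrap the full detection pipeline; any function that maps a sentence to a number is enough to exercise the ranking logic.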
Compared with the prior art, the present invention has the following beneficial effects: the method searches, sentence by sentence, for the single-character, two-character, and three-or-more-character scattered strings in the segmented text, evaluates these scattered strings with an N-gram model to identify errors, and constructs an error correction knowledge base to generate error correction candidate text. Error detection and error correction are thus closely combined, and the method offers fast error detection and efficient error correction.
Brief description of the drawings
Fig. 1 is a flow chart of the automatic Chinese text correction of the present invention;
Fig. 2 is a schematic diagram of the preprocessing of the Chinese text to be corrected;
Fig. 3 is a schematic diagram of obtaining the Chinese text to be corrected through keyboard input;
Fig. 4 is a schematic diagram of obtaining the Chinese text to be corrected through speech input;
Fig. 5 is a schematic diagram of knowledge-base-driven recognition of Chinese characters from speech signals;
Fig. 6 is a detailed flow chart of the automatic error correction of Chinese text.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and embodiments.
Fig. 1 is a flow chart of the automatic Chinese text correction of the present invention.
Referring to Fig. 1, the automatic correction method for Chinese text provided by the present invention comprises the following steps:
a) Input the Chinese text to be proofread and perform word segmentation preprocessing on it sentence by sentence. The text may be entered by speech or by keyboard, and the preprocessing includes tidying up grammatical errors in the input text and performing a pattern-matching check on it. Keyboard input is shown in Fig. 3: characters and words are encoded in advance, keystroke signals are converted into a code sequence that the computer accepts, and the code sequence is associated with the character and word codes. Speech input is shown in Fig. 4 and Fig. 5: the speech input from a microphone is received and converted into a speech stream that the computer can accept, pattern matching is performed on the speech stream to generate candidate character and word combinations, and a language model is used to identify the candidate combinations.
b) Search, sentence by sentence, for the single-character, two-character, and three-or-more-character scattered strings that occur in the segmented text.
c) Evaluate the scattered strings in the segmented text with an N-gram model and, in combination with the probability that each character forms a word on its own, check each sentence for word-level errors. Scattered strings of three or more characters are evaluated as follows: the probability that each character in the scattered string forms a word on its own is evaluated to determine a first error constant; a bigram adjacency model is applied to successive pairs of adjacent characters to evaluate the probability that they form a word, determining a second error constant; a trigram adjacency model is applied to successive triples of adjacent characters to evaluate the probability that they form a word, determining a third error constant; and all error constants are added to determine the final word-level error coefficient of the text. N-gram is a language model commonly used in large-vocabulary continuous speech recognition; for Chinese it is referred to as the Chinese Language Model (CLM).
d) Construct an error correction knowledge base and generate error correction candidate text. Specifically, various knowledge bases can be constructed from the error characteristics of the text and a likelihood matching method, including a wrongly-written-character dictionary, a dictionary of easily confused words, a similar-code dictionary, and/or a character-driven bidirectional dictionary. To make selection easier for the user, the invention can also rank the generated candidates as follows: each candidate replaces the original erroneous text, steps b) and c) are repeated on the resulting sentence to run error detection again and obtain the corresponding error constant, and the candidates are ranked by the size of their error constants.
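Step b) can be illustrated with a minimal Python sketch. It assumes that a scattered string is a maximal run of consecutive single-character tokens in the segmenter output, which is one common reading and is not spelled out in the text; the function name `find_scattered_strings` and the token-list input format are hypothetical.

```python
from typing import List, Tuple


def find_scattered_strings(tokens: List[str], min_len: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) token-index ranges, end exclusive, of maximal
    runs of single-character tokens that are at least min_len long.

    Note: punctuation tokens are also one character long; a real
    pipeline would probably filter them out before calling this.
    """
    runs: List[Tuple[int, int]] = []
    start = None  # index where the current run began, if any
    for i, token in enumerate(tokens):
        if len(token) == 1:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the sentence.
    if start is not None and len(tokens) - start >= min_len:
        runs.append((start, len(tokens)))
    return runs


# Tokens as a segmenter might return them for one sentence (made-up example).
tokens = ["我们", "在", "学", "校", "里", "学习"]
print(find_scattered_strings(tokens, min_len=3))
```

Each returned range marks a possible error source that would then be passed to the N-gram evaluation of step c).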
Referring further to Fig. 6, a specific embodiment is given below. Its steps are as follows:
Step1: Input the text to be proofread and perform word segmentation preprocessing on it using the Peking University word segmentation software;
Step2: Search for the single-character, two-character, and three-or-more-character scattered strings that occur in the segmented text and treat all of these positions as possible error sources. Suppose the scattered string of four consecutive characters $W_k W_{k+1} W_{k+2} W_{k+3}$ is found in the text; error detection is then carried out on it as an error source. Here k is a natural number giving the position of the found text within the sentence.
Step3: Evaluate the probability that each of the characters $W_k$, $W_{k+1}$, $W_{k+2}$, $W_{k+3}$ forms a word on its own. If the probability $P$ that a character occurs alone is 0, that position is erroneous and the error constant $K_1$ is increased by 1.5.
Step4: With $W_{k-2}$ as the start position and $W_{k+4}$ as the end position, apply the bigram adjacency model, using the co-occurrence frequency $R$ of two consecutive words as the criterion. If $R = 0$, the error constant $K_4$ is increased by 0.2; if $R \ge 1$, $K_2$ is decreased by 1.0.
Step5: With $W_{k-1}$ as the start position and $W_{k+4}$ as the end position, apply the bigram adjacency model, using the co-occurrence frequency $R$ of two consecutive words as the criterion. If $R = 0$, the error constant $K_3$ is increased by 0.5; if $1 < R < 2$, $K_3$ is increased by 0.2; if $R \ge 2$, $K_3$ is decreased by 1.0.
Step6: With the first of the two characters preceding $W_k$ as the start position and the second character after $W_{k+3}$ as the end position, apply the trigram model, using the co-occurrence frequency $R$ of three consecutive characters as the criterion. If $R = 0$, the error constant $K_4$ is increased by 0.2; if $R \ge 1$, $K_4$ is decreased by 1.0.
Step7: With the character preceding $W_k$ as the start position and the character following $W_{k+3}$ as the end position, apply the bigram model, using the co-occurrence frequency $R$ of two consecutive characters as the criterion. If $R = 0$, the error constant $K_5$ is increased by 0.8; if $1 < R < 3$, $K_5$ is increased by 0.5; if $R \ge 3$, $K_5$ is decreased by 1.0.
Step8: For a given character under examination, add the error constants obtained from each module, i.e. $K = K_1 + K_2 + K_3 + K_4 + K_5$. If $K \ge 1.5$, that position is erroneous and the erroneous text is marked.
Step9: End.
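The threshold logic of Step3 through Step8 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the patented implementation: it takes the five observed values as inputs and leaves out how the windows are scanned; it assumes the Step4 increment belongs to $K_2$ although the text writes $K_4$ there; and it treats the middle bands as $1 \le R < 2$ and $1 \le R < 3$ so that integer frequencies are fully covered. The names `banded` and `error_constant` are hypothetical.

```python
from typing import Tuple

THRESHOLD = 1.5  # Step8: a position is flagged when K >= 1.5


def banded(r: int, zero_penalty: float, mid_penalty: float, mid_upper: int) -> float:
    """Map a co-occurrence frequency R to an error-constant contribution.

    R == 0 adds zero_penalty, 1 <= R < mid_upper adds mid_penalty
    (assumed reading of the middle band), and R >= mid_upper subtracts 1.0.
    """
    if r == 0:
        return zero_penalty
    if r < mid_upper:
        return mid_penalty
    return -1.0


def error_constant(p_single: float, r_step4: int, r_step5: int,
                   r_step6: int, r_step7: int) -> Tuple[float, bool]:
    """Combine the five partial constants and apply the Step8 threshold."""
    k1 = 1.5 if p_single == 0 else 0.0   # Step3: character never stands alone
    k2 = banded(r_step4, 0.2, 0.0, 1)    # Step4 (assumed to update K2)
    k3 = banded(r_step5, 0.5, 0.2, 2)    # Step5
    k4 = banded(r_step6, 0.2, 0.0, 1)    # Step6 (trigram model)
    k5 = banded(r_step7, 0.8, 0.5, 3)    # Step7
    # Round to suppress floating-point noise before comparing to the threshold.
    k = round(k1 + k2 + k3 + k4 + k5, 2)
    return k, k >= THRESHOLD


# Made-up observations: the character can stand alone but no n-gram was seen.
print(error_constant(0.01, 0, 0, 0, 0))
```

Keeping the thresholds in one small helper makes it easy to change the band boundaries if a different reading of the middle bands is preferred.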
In summary, the automatic correction method of the present invention mainly comprises two parts, automatic error detection and error correction. It combines multi-model error detection based on a hybrid algorithm with error correction, and provides an automatic checking model for character- and word-level errors. Based on an analysis of how character- and word-level errors are distributed in text, an N-gram model is used to evaluate the scattered strings that occur in the text. The invention checks word-level errors in combination with the probability that each character forms a word on its own and, on the basis of the constructed error correction knowledge bases, implements an algorithm for generating correction suggestions. Various knowledge bases are built from the error characteristics of the text and a likelihood matching method, including a wrongly-written-character dictionary, a dictionary of easily confused words, a similar-code dictionary, and a character-driven bidirectional dictionary, and these generate the candidate correction suggestions. The candidate suggestions are then ranked, and the ranking is carried out by running error detection on each suggestion: during correction, each candidate suggestion replaces the original error, error detection is run at that position to obtain the corresponding error constant, and the suggestion with the smallest error constant is the most probable correction. This completes the ranking of the correction suggestions. The method brings research on error correction and error detection together and applies error detection directly within the correction process. Its specific advantages are as follows:
1. The proposed character- and word-level automatic error detection based on the N-gram model reflects information about commonly used words well: comparing the tuples that occur most frequently in the statistics with a dictionary shows that the tuples corresponding to common Chinese words have high co-occurrence frequencies, so the adjacency matrix obtained from the statistics contains the set of commonly associated words.
2. It reflects the collocations of common function words well: in Chinese, some function words combine with certain other words and, although they carry no concrete meaning, serve a grammatical function; tuples such as "must very", "can not", and "one-tenth" have very high co-occurrence probabilities.
3. The N-gram character and word adjacency matrix reflects sentence-initial and sentence-final information well.
4. Many errors are found by the statistical method, which shows that the N-gram character and word adjacency matrix reflects some inherent regularities of natural language to a certain extent.
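The N-gram adjacency statistics discussed above can be approximated with a short Python sketch. It assumes a corpus that is already split into sentences and units (characters or words), and it stores counts in a `Counter` keyed by tuples instead of a dense matrix; the boundary markers `<s>` and `</s>` and the function name `count_ngrams` are illustrative choices that do not appear in the patent.

```python
from collections import Counter
from typing import Iterable, List


def count_ngrams(sentences: Iterable[List[str]], n: int) -> Counter:
    """Count co-occurrence frequencies of n consecutive units.

    Each sentence is padded with boundary markers so that
    sentence-initial and sentence-final patterns are counted too.
    """
    counts: Counter = Counter()
    for units in sentences:
        padded = ["<s>"] + list(units) + ["</s>"]
        for i in range(len(padded) - n + 1):
            counts[tuple(padded[i:i + n])] += 1
    return counts


# Tiny made-up corpus, split into characters.
corpus = [list("我爱北京"), list("我爱上海")]
bigrams = count_ngrams(corpus, 2)
trigrams = count_ngrams(corpus, 3)
print(bigrams[("我", "爱")], trigrams[("我", "爱", "北")])
```

A lookup such as `bigrams[("我", "爱")]` then plays the role of the co-occurrence frequency R; tuples never seen in the corpus return 0.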
Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit it. Any person skilled in the art may make minor modifications and refinements without departing from the spirit and scope of the invention, so the scope of protection of the invention shall be as defined by the claims.

Claims (8)

1. An automatic correction method for Chinese text, characterized by comprising the following steps:
a) inputting the Chinese text to be proofread and performing word segmentation preprocessing on it sentence by sentence;
b) searching, sentence by sentence, for the single-character, two-character, and three-or-more-character scattered strings that occur in the segmented text;
c) evaluating the scattered strings in the segmented text with an N-gram model and, in combination with the probability that each character forms a word on its own, checking each sentence for word-level errors;
d) constructing an error correction knowledge base and generating error correction candidate text.
2. The automatic correction method for Chinese text according to claim 1, characterized in that step a) accepts the Chinese text to be proofread through speech or keyboard input, and the preprocessing includes tidying up grammatical errors in the input text and performing a pattern-matching check on it.
3. The automatic correction method for Chinese text according to claim 2, characterized in that speech input of the Chinese text to be proofread in step a) proceeds as follows: the speech input from a microphone is received and converted into a speech stream that the computer can accept; pattern matching is performed on the speech stream to generate candidate character and word combinations; and a language model is used to identify the candidate combinations.
4. The automatic correction method for Chinese text according to claim 2, characterized in that keyboard input of the Chinese text to be proofread in step a) proceeds as follows: characters and words are encoded in advance, keystroke signals are converted into a code sequence that the computer accepts, and the code sequence is associated with the character and word codes.
5. The automatic correction method for Chinese text according to claim 1, characterized in that step c) evaluates scattered strings of three or more characters as follows: the probability that each character in the scattered string forms a word on its own is evaluated to determine a first error constant; a bigram adjacency model is applied to successive pairs of adjacent characters to evaluate the probability that they form a word, determining a second error constant; a trigram adjacency model is applied to successive triples of adjacent characters to evaluate the probability that they form a word, determining a third error constant; and all error constants are added to determine the final word-level error coefficient of the text.
6. The automatic correction method for Chinese text according to claim 5, characterized in that step c) evaluates a scattered string of four consecutive characters $W_k W_{k+1} W_{k+2} W_{k+3}$ as follows:
c1) evaluating the probability that each of the characters $W_k$, $W_{k+1}$, $W_{k+2}$, $W_{k+3}$ forms a word on its own; if the probability $P$ that a character occurs alone is 0, that position is erroneous and the error constant $K_1$ is increased by 1.5;
c2) with $W_{k-2}$ as the start position and $W_{k+4}$ as the end position, applying the bigram adjacency model, using the co-occurrence frequency $R$ of two consecutive words as the criterion; if $R = 0$, the error constant $K_4$ is increased by 0.2; if $R \ge 1$, $K_2$ is decreased by 1.0;
c3) with $W_{k-1}$ as the start position and $W_{k+4}$ as the end position, applying the bigram adjacency model, using the co-occurrence frequency $R$ of two consecutive words as the criterion; if $R = 0$, the error constant $K_3$ is increased by 0.5; if $1 < R < 2$, $K_3$ is increased by 0.2; if $R \ge 2$, $K_3$ is decreased by 1.0;
c4) with the first of the two characters preceding $W_k$ as the start position and the second character after $W_{k+3}$ as the end position, applying the trigram model, using the co-occurrence frequency $R$ of three consecutive characters as the criterion; if $R = 0$, the error constant $K_4$ is increased by 0.2; if $R \ge 1$, $K_4$ is decreased by 1.0;
c5) with the character preceding $W_k$ as the start position and the character following $W_{k+3}$ as the end position, applying the bigram model, using the co-occurrence frequency $R$ of two consecutive characters as the criterion; if $R = 0$, the error constant $K_5$ is increased by 0.8; if $1 < R < 3$, $K_5$ is increased by 0.5; if $R \ge 3$, $K_5$ is decreased by 1.0;
c6) for a given character under examination, adding the error constants obtained, i.e. $K = K_1 + K_2 + K_3 + K_4 + K_5$; if $K \ge 1.5$, that position is erroneous and the erroneous text is marked.
7. The automatic correction method for Chinese text according to claim 5, characterized in that step d) ranks the generated error correction candidate text as follows: each candidate replaces the original erroneous text, steps b) and c) are repeated on the resulting sentence to run error detection again and obtain the corresponding error constant, and the candidates are ranked by the size of their error constants.
8. The automatic correction method for Chinese text according to claim 1, characterized in that step d) constructs various error correction knowledge bases from the error characteristics of the text and a likelihood matching method, the knowledge bases including a wrongly-written-character dictionary, a dictionary of easily confused words, a similar-code dictionary, and/or a character-driven bidirectional dictionary.
Priority, publication, and family information

Application number: CN201510688403.4A
Title: Chinese text automatic correction method
Priority date: 2015-10-21
Filing date: 2015-10-21
Publication number: CN105279149A (en)
Publication date: 2016-01-27
Status: Pending
Family ID: 55148178
Country status: CN (1), CN105279149A (en)

Cited By (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105550173A (en) * 2016-02-06 2016-05-04 北京京东尚科信息技术有限公司 Text correction method and device
CN105824804A (en) * 2016-03-31 2016-08-03 长安大学 English spelling error correction tool and method based on word bank
CN105869634A (en) * 2016-03-31 2016-08-17 重庆大学 Field-based method and system for feeding back text error correction after speech recognition
CN106547741A (en) * 2016-11-21 2017-03-29 江苏科技大学 A kind of Chinese language text auto-collation based on collocation
WO2017161899A1 (en) * 2016-03-24 2017-09-28 华为技术有限公司 Text processing method, device, and computing apparatus
CN107506413A (en) * 2017-08-11 2017-12-22 江苏科技大学 A kind of querying method based on Lucene wrong words
CN107633250A (en) * 2017-09-11 2018-01-26 畅捷通信息技术股份有限公司 A kind of Text region error correction method, error correction system and computer installation
CN107656627A (en) * 2017-09-28 2018-02-02 百度在线网络技术(北京)有限公司 Data inputting method and device
CN107729316A (en) * 2017-10-12 2018-02-23 福建富士通信息软件有限公司 The identification of wrong word and the method and device of error correction in the interactive question and answer text of Chinese
CN108038098A (en) * 2017-11-28 2018-05-15 苏州市东皓计算机系统工程有限公司 A kind of computword correcting method
CN108132917A (en) * 2017-12-04 2018-06-08 昆明理工大学 A kind of document error correction flag method
WO2018153295A1 (en) * 2017-02-27 2018-08-30 腾讯科技(深圳)有限公司 Text entity extraction method, device, apparatus, and storage media
TWI635406B (en) * 2016-11-25 2018-09-11 英業達股份有限公司 Method for string recognition and machine learning
CN108595410A (en) * 2018-03-19 2018-09-28 小船出海教育科技(北京)有限公司 The automatic of hand-written composition corrects method and device
CN108628826A (en) * 2018-04-11 2018-10-09 广州视源电子科技股份有限公司 Candidate word appraisal procedure, device, computer equipment and storage medium
CN108647681A (en) * 2018-05-08 2018-10-12 重庆邮电大学 A kind of English text detection method with text orientation correction
CN108647202A (en) * 2018-04-11 2018-10-12 广州视源电子科技股份有限公司 Candidate word appraisal procedure, device, computer equipment and storage medium
CN108664467A (en) * 2018-04-11 2018-10-16 广州视源电子科技股份有限公司 Candidate word appraisal procedure, device, computer equipment and storage medium
CN108694167A (en) * 2018-04-11 2018-10-23 广州视源电子科技股份有限公司 Candidate word appraisal procedure, candidate word sort method and device
CN108717412A (en) * 2018-06-12 2018-10-30 北京览群智数据科技有限责任公司 Chinese check and correction error correction method based on Chinese word segmentation and system
CN108733646A (en) * 2018-04-11 2018-11-02 广州视源电子科技股份有限公司 Candidate word appraisal procedure, device, computer equipment and storage medium
CN108845984A (en) * 2018-05-22 2018-11-20 广州视源电子科技股份有限公司 Wrongly-written characters detection method, device and computer readable storage medium, terminal device
CN109062888A (en) * 2018-06-04 2018-12-21 昆明理工大学 A kind of self-picketing correction method when there is Error Text input
CN109213998A (en) * 2018-08-17 2019-01-15 汇智容大(北京)信息技术有限公司 Chinese wrongly written character detection method and system
CN109460552A (en) * 2018-10-29 2019-03-12 朱丽莉 Rule-based and corpus Chinese faulty wording automatic testing method and equipment
CN110046350A (en) * 2019-04-12 2019-07-23 百度在线网络技术(北京)有限公司 Grammatical bloopers recognition methods, device, computer equipment and storage medium
CN110110969A (en) * 2019-04-10 2019-08-09 中国科学院国家空间科学中心 A kind of space environment forecast product gross examines appraisal procedure and system automatically
CN110110334A (en) * 2019-05-08 2019-08-09 郑州大学 A kind of remote medical consultation with specialists recording text error correction method based on natural language processing
CN110134950A (en) * 2019-04-28 2019-08-16 北京百分点信息科技有限公司 A kind of text auto-collation that words combines
CN110135879A (en) * 2018-11-17 2019-08-16 华南理工大学 Customer service quality automatic scoring method based on natural language processing
CN110929514A (en) * 2019-11-20 2020-03-27 北京百分点信息科技有限公司 Text proofreading method and device, computer readable storage medium and electronic equipment
CN110991166A (en) * 2019-12-03 2020-04-10 中国标准化研究院 Chinese wrongly-written character recognition method and system based on pattern matching
CN111079768A (en) * 2019-12-23 2020-04-28 北京爱医生智慧医疗科技有限公司 Character and image recognition method and device based on OCR
CN111079415A (en) * 2019-11-12 2020-04-28 中国标准化研究院 Chinese automatic error checking method based on collocation conflict
CN111144101A (en) * 2019-12-26 2020-05-12 北大方正集团有限公司 Wrongly written character processing method and device
CN111310447A (en) * 2020-03-18 2020-06-19 科大讯飞股份有限公司 Grammar error correction method, grammar error correction device, electronic equipment and storage medium
CN111339755A (en) * 2018-11-30 2020-06-26 中国移动通信集团浙江有限公司 Automatic error correction method and device for office data
CN111626049A (en) * 2020-05-27 2020-09-04 腾讯科技(深圳)有限公司 Title correction method and device for multimedia information, electronic equipment and storage medium
CN111783458A (en) * 2020-08-20 2020-10-16 支付宝(杭州)信息技术有限公司 Method and device for detecting overlapping character errors
CN112711943A (en) * 2020-12-17 2021-04-27 厦门市美亚柏科信息股份有限公司 Uygur language identification method, device and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1387650A (en) * 1999-11-05 2002-12-25 微软公司 Language input architecture for converting one text form to another text form with minimized typographical errors and conversion errors
CN101655837A (en) * 2009-09-08 2010-02-24 北京邮电大学 Method for detecting and correcting error on text after voice recognition

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
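张仰森, 俞士汶: "Survey of Automatic Text Proofreading Techniques", 《计算机应用研》 *
潘昊, 颜军: "Automatic Text Proofreading Algorithm Based on Chinese Word Segmentation", 《武汉理工大学学报》 *
郇政永: "Research on OCR-Based Chinese Text Proofreading", 《中国优秀硕士学位论文全文数据库 信息科技集》 *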

Cited By (63)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105550173A (en) * 2016-02-06 2016-05-04 北京京东尚科信息技术有限公司 Text correction method and device
WO2017161899A1 (en) * 2016-03-24 2017-09-28 华为技术有限公司 Text processing method, device, and computing apparatus
CN107229627A (en) * 2016-03-24 2017-10-03 华为技术有限公司 Text processing method, device, and computing apparatus
CN105824804A (en) * 2016-03-31 2016-08-03 长安大学 English spelling error correction tool and method based on word bank
CN105869634A (en) * 2016-03-31 2016-08-17 重庆大学 Field-based method and system for feeding back text error correction after speech recognition
CN105869634B (en) * 2016-03-31 2019-11-19 重庆大学 Field-based method and system for feeding back text error correction after speech recognition
CN106547741A (en) * 2016-11-21 2017-03-29 江苏科技大学 Automatic Chinese text proofreading method based on collocation
TWI635406B (en) * 2016-11-25 2018-09-11 英業達股份有限公司 Method for string recognition and machine learning
WO2018153295A1 (en) * 2017-02-27 2018-08-30 腾讯科技(深圳)有限公司 Text entity extraction method, device, apparatus, and storage media
US11222178B2 (en) 2017-02-27 2022-01-11 Tencent Technology (Shenzhen) Company Ltd Text entity extraction method for extracting text from target text based on combination probabilities of segmentation combination of text entities in the target text, apparatus, and device, and storage medium
CN107506413B (en) * 2017-08-11 2020-03-20 江苏科技大学 Lucene wrongly written character based query method
CN107506413A (en) * 2017-08-11 2017-12-22 江苏科技大学 Query method for wrongly written characters based on Lucene
CN107633250A (en) * 2017-09-11 2018-01-26 畅捷通信息技术股份有限公司 Character recognition error correction method, error correction system and computer device
CN107633250B (en) * 2017-09-11 2023-04-18 畅捷通信息技术股份有限公司 Character recognition error correction method, error correction system and computer device
CN107656627B (en) * 2017-09-28 2021-07-23 百度在线网络技术(北京)有限公司 Information input method and device
CN107656627A (en) * 2017-09-28 2018-02-02 百度在线网络技术(北京)有限公司 Information input method and device
CN107729316A (en) * 2017-10-12 2018-02-23 福建富士通信息软件有限公司 Method and device for identifying and correcting wrong words in Chinese interactive question-and-answer text
CN108038098A (en) * 2017-11-28 2018-05-15 苏州市东皓计算机系统工程有限公司 Computer word correcting method
CN108132917A (en) * 2017-12-04 2018-06-08 昆明理工大学 Document error correction marking method
CN108132917B (en) * 2017-12-04 2021-12-17 昆明理工大学 Document error correction marking method
CN108595410A (en) * 2018-03-19 2018-09-28 小船出海教育科技(北京)有限公司 Method and device for automatically correcting handwritten compositions
CN108733646B (en) * 2018-04-11 2022-09-06 广州视源电子科技股份有限公司 Candidate word evaluation method and device, computer equipment and storage medium
CN108733646A (en) * 2018-04-11 2018-11-02 广州视源电子科技股份有限公司 Candidate word evaluation method, device, computer equipment and storage medium
CN108628826B (en) * 2018-04-11 2022-09-06 广州视源电子科技股份有限公司 Candidate word evaluation method and device, computer equipment and storage medium
CN108664467A (en) * 2018-04-11 2018-10-16 广州视源电子科技股份有限公司 Candidate word evaluation method, device, computer equipment and storage medium
CN108694167A (en) * 2018-04-11 2018-10-23 广州视源电子科技股份有限公司 Candidate word evaluation method, candidate word ranking method and device
CN108647202A (en) * 2018-04-11 2018-10-12 广州视源电子科技股份有限公司 Candidate word evaluation method, device, computer equipment and storage medium
CN108628826A (en) * 2018-04-11 2018-10-09 广州视源电子科技股份有限公司 Candidate word evaluation method, device, computer equipment and storage medium
CN108647681B (en) * 2018-05-08 2019-06-14 重庆邮电大学 English text detection method with text orientation correction
CN108647681A (en) * 2018-05-08 2018-10-12 重庆邮电大学 English text detection method with text orientation correction
CN108845984A (en) * 2018-05-22 2018-11-20 广州视源电子科技股份有限公司 Wrongly written character detection method and device, computer readable storage medium and terminal equipment
CN108845984B (en) * 2018-05-22 2022-04-22 广州视源电子科技股份有限公司 Wrongly written character detection method and device, computer readable storage medium and terminal equipment
CN109062888B (en) * 2018-06-04 2023-03-31 昆明理工大学 Self-correcting method for input of wrong text
CN109062888A (en) * 2018-06-04 2018-12-21 昆明理工大学 Self-correcting method for input of wrong text
CN108717412A (en) * 2018-06-12 2018-10-30 北京览群智数据科技有限责任公司 Chinese proofreading and error correction method and system based on Chinese word segmentation
CN109213998B (en) * 2018-08-17 2023-06-23 上海蜜度信息技术有限公司 Chinese character error detection method and system
CN109213998A (en) * 2018-08-17 2019-01-15 汇智容大(北京)信息技术有限公司 Chinese wrongly written character detection method and system
CN109460552A (en) * 2018-10-29 2019-03-12 朱丽莉 Automatic detection method and equipment for Chinese faulty wording based on rules and corpus
CN110135879A (en) * 2018-11-17 2019-08-16 华南理工大学 Customer service quality automatic scoring method based on natural language processing
CN110135879B (en) * 2018-11-17 2024-01-16 华南理工大学 Customer service quality automatic scoring method based on natural language processing
CN111339755A (en) * 2018-11-30 2020-06-26 中国移动通信集团浙江有限公司 Automatic error correction method and device for office data
CN110110969A (en) * 2019-04-10 2019-08-09 中国科学院国家空间科学中心 Automatic inspection and evaluation method and system for space environment forecast products
CN110046350A (en) * 2019-04-12 2019-07-23 百度在线网络技术(北京)有限公司 Grammatical error recognition method, device, computer equipment and storage medium
CN110134950A (en) * 2019-04-28 2019-08-16 北京百分点信息科技有限公司 Automatic text proofreading method combining words
CN110134950B (en) * 2019-04-28 2022-12-06 北京百分点科技集团股份有限公司 Automatic text proofreading method combining words
CN110110334A (en) * 2019-05-08 2019-08-09 郑州大学 Remote consultation record text error correction method based on natural language processing
CN110110334B (en) * 2019-05-08 2022-09-13 郑州大学 Remote consultation record text error correction method based on natural language processing
CN111079415A (en) * 2019-11-12 2020-04-28 中国标准化研究院 Chinese automatic error checking method based on collocation conflict
CN110929514B (en) * 2019-11-20 2023-06-27 北京百分点科技集团股份有限公司 Text collation method, text collation apparatus, computer-readable storage medium, and electronic device
CN110929514A (en) * 2019-11-20 2020-03-27 北京百分点信息科技有限公司 Text proofreading method and device, computer readable storage medium and electronic equipment
CN110991166A (en) * 2019-12-03 2020-04-10 中国标准化研究院 Chinese wrongly-written character recognition method and system based on pattern matching
CN110991166B (en) * 2019-12-03 2021-07-30 中国标准化研究院 Chinese wrongly-written character recognition method and system based on pattern matching
CN111079768A (en) * 2019-12-23 2020-04-28 北京爱医生智慧医疗科技有限公司 Character and image recognition method and device based on OCR
CN111144101A (en) * 2019-12-26 2020-05-12 北大方正集团有限公司 Wrongly written character processing method and device
CN111144101B (en) * 2019-12-26 2021-12-03 北大方正集团有限公司 Wrongly written character processing method and device
CN111310447A (en) * 2020-03-18 2020-06-19 科大讯飞股份有限公司 Grammar error correction method, grammar error correction device, electronic equipment and storage medium
CN111310447B (en) * 2020-03-18 2024-02-02 河北省讯飞人工智能研究院 Grammar error correction method, grammar error correction device, electronic equipment and storage medium
CN111626049A (en) * 2020-05-27 2020-09-04 腾讯科技(深圳)有限公司 Title correction method and device for multimedia information, electronic equipment and storage medium
CN111626049B (en) * 2020-05-27 2022-12-16 深圳市雅阅科技有限公司 Title correction method and device for multimedia information, electronic equipment and storage medium
CN111783458A (en) * 2020-08-20 2020-10-16 支付宝(杭州)信息技术有限公司 Method and device for detecting overlapping character errors
CN111783458B (en) * 2020-08-20 2024-05-03 支付宝(杭州)信息技术有限公司 Method and device for detecting character overlapping errors
CN112711943A (en) * 2020-12-17 2021-04-27 厦门市美亚柏科信息股份有限公司 Uygur language identification method, device and storage medium
CN112711943B (en) * 2020-12-17 2023-11-24 厦门市美亚柏科信息股份有限公司 Uygur language identification method, device and storage medium

Similar Documents

Publication Publication Date Title
CN105279149A (en) Chinese text automatic correction method
CN110489760B (en) Text automatic correction method and device based on deep neural network
US20210157975A1 (en) Device, system, and method for extracting named entities from sectioned documents
CN103885938B (en) Industry spelling mistake checking method based on user feedback
CN113495900B (en) Method and device for obtaining structured query language statement based on natural language
US11055327B2 (en) Unstructured data parsing for structured information
Li et al. Spelling error correction using a nested RNN model and pseudo training data
CN111062376A (en) Text recognition method based on optical character recognition and error correction tight coupling processing
Mehmood et al. An unsupervised lexical normalization for Roman Hindi and Urdu sentiment analysis
CN113076739A (en) Method and system for realizing cross-domain Chinese text error correction
CN110110334B (en) Remote consultation record text error correction method based on natural language processing
CN111651978A (en) Entity-based lexical examination method and device, computer equipment and storage medium
CN112417823B (en) Chinese text word order adjustment and word completion method and system
CN112101010A (en) Telecom industry OA office automation manuscript auditing method based on BERT
JP2018206262A (en) Word linking identification model learning device, word linking detection device, method and program
KR20230009564A (en) Learning data correction method and apparatus thereof using ensemble score
CN113673228A (en) Text error correction method, text error correction device, computer storage medium and computer program product
Hládek et al. Learning string distance with smoothing for OCR spelling correction
Uthayamoorthy et al. Ddspell-a data driven spell checker and suggestion generator for the tamil language
CN116306600A (en) MacBert-based Chinese text error correction method
Chaudhuri Reversed word dictionary and phonetically similar word grouping based spell-checker to Bangla text
Mittra et al. A bangla spell checking technique to facilitate error correction in text entry environment
Juan et al. Handwritten text recognition for ancient documents
Hocking et al. Optical character recognition for South African languages
CN114580391A (en) Chinese error detection model training method, device, equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160127