CN105045778B - Automatic proofreading method for Chinese homonym errors - Google Patents

Automatic proofreading method for Chinese homonym errors

Info

Publication number
CN105045778B
CN105045778B CN201510354692.4A
Authority
CN
China
Prior art keywords
homonym
word
chinese
adjacent
mistake
Prior art date
Legal status
Active
Application number
CN201510354692.4A
Other languages
Chinese (zh)
Other versions
CN105045778A (en)
Inventor
吴健康
严熙
刘亮亮
Current Assignee
Jiangsu University of Science and Technology
Original Assignee
Jiangsu University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Jiangsu University of Science and Technology
Priority to CN201510354692.4A
Publication of CN105045778A
Application granted
Publication of CN105045778B
Active legal status (current)
Anticipated expiration

Links

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses an automatic proofreading method for Chinese homonym errors. The method first generates a homonym confusion set for Chinese words; it then trains left-adjacent bigram, right-adjacent bigram and adjacent trigram models on a large-scale Web corpus and, using the homonym confusion set and a probability estimation algorithm, obtains local adjacent N-gram models. A weighted combination method is then used to compute, for each word in a sentence and for every homonym in its confusion set, the contextual support of that candidate within the sentence; from these scores the method decides whether a homonym error is present, marks the error and produces a list of correction suggestions, thereby realising automatic proofreading of Chinese homonyms. The homonym error proofreading method provided by the invention responds quickly, its precision meets the requirements of practical applications, and its validity and accuracy are high.

Description

Automatic proofreading method for Chinese homonym errors
Technical field
The present invention relates to natural language processing in the field of artificial intelligence and computing, and more particularly to the field of automatic proofreading of Chinese texts.
Background technology
With the rapid development of information technology and the Internet, traditional paper-based text work has been almost entirely replaced by computers. Electronic texts such as e-books, electronic newspapers, e-mails, office documents, blogs and microblogs have become part of people's daily life, but errors in these texts are also increasingly common, which poses a great challenge to proofreading. Traditional manual proofreading is inefficient, labour-intensive and time-consuming, and clearly cannot meet the demand for text proofreading.
Automatic text proofreading is one of the main applications of natural language processing and is also a problem of natural language understanding. Chinese is entered into computers through input methods; as more and more people use pinyin input methods, which can enter both single characters and whole words, more and more homonym errors appear in texts. Homonym errors belong to the category of real-word errors. Automatic proofreading of Chinese real-word errors faces the following problems:
1) A word containing a real-word error is still a valid word in the dictionary, which is the central difficulty of automatic proofreading of Chinese texts.
2) A real-word error disturbs the syntax and semantics of the whole sentence, so finding real-word errors requires much knowledge and many resources.
3) Data sparseness is a major obstacle to automatic proofreading of real-word errors.
4) Automatic proofreading of homonyms includes automatic error detection and automatic error correction: detection finds the homonym errors in a sentence, while correction proofreads those errors and provides correction suggestions. At present, many methods treat detection and correction as two separate stages.
To address the above problems, the present invention proposes and implements a method for automatic detection and automatic proofreading of Chinese homonym errors.
Content of the invention
Purpose of the invention: in order to overcome the deficiencies of the prior art, the present invention provides an automatic proofreading method for Chinese homonym errors, a method that integrates automatic error detection and automatic proofreading.
Technical scheme:
In order to solve the above technical problems, the present invention provides an automatic proofreading method for Chinese homonym errors. The method performs automatic proofreading of Chinese homonym errors based on a homonym confusion set and a weighted combination decision over local adjacent N-gram models, and comprises the following steps:
1) Using the pinyin of Chinese characters, build a homonym confusion set for Chinese words;
2) Build local adjacent N-gram models of the left bigram, right bigram and trigram; based on the homonym confusion set obtained in step 1), estimate the probabilities of these local adjacent N-gram models with a probability estimation algorithm, and train them on a large-scale corpus to obtain the local adjacent N-gram models;
3) Based on the local adjacent N-gram models obtained in step 2), use a weighted combination method to compute, for each word in a sentence and for each homonym in its confusion set, the contextual support of that candidate within the sentence; decide whether a homonym error exists, mark the homonym error, and provide a list of correction suggestions.
Preferably, step 1) includes: using a pinyin annotation table of Chinese characters and a Chinese dictionary, generate the homonym confusion set
CSet(Wi) = {Wi^1, Wi^2, ..., Wi^m},
where Wi is a Chinese word and each Wi^k is a homonym of Wi.
Preferably, the homonym confusion set in step 1) is built from two parts: an automatic identification part and a manual proofreading part.
The automatic identification part comprises the following steps:
Step 11) Read the Chinese dictionary, loading the Chinese words of the dictionary into a Chinese word structure;
Step 12) Read the pinyin annotations of the pinyin annotation table into a pinyin structure;
Step 13) Combine the Chinese words obtained in step 11) with the pinyin annotations obtained in step 12), convert the Chinese words of the dictionary into pinyin, and place them into a homonym word structure to generate the homonym dictionary structure, i.e. the homonym confusion set.
The manual proofreading part includes: manually proofreading the homonym confusion set obtained in step 13) and updating the homonym confusion set.
The structure of the homonym confusion set is
CSet(Wi) = {Wi^1, Wi^2, ..., Wi^m},
where Wi is a word and each Wi^k is a homonym of Wi.
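To make steps 11)-13) concrete, the following Python sketch groups dictionary words by their pinyin so that words sharing a pronunciation become each other's confusion set. It is a minimal sketch under assumed inputs: the word list stands in for the Chinese dictionary and the word_to_pinyin mapping stands in for the pinyin annotation table; it is not the patented implementation.

    from collections import defaultdict

    def build_confusion_sets(words, word_to_pinyin):
        # Group dictionary words by pinyin; words sharing a pronunciation
        # form each other's homonym confusion set.
        by_pinyin = defaultdict(set)
        for w in words:
            by_pinyin[word_to_pinyin[w]].add(w)
        cset = {}
        for w in words:
            homonyms = by_pinyin[word_to_pinyin[w]] - {w}
            if homonyms:
                cset[w] = sorted(homonyms)
        return cset

    # Illustrative entries only (not the actual dictionary or pinyin table):
    words = ["权利", "权力", "全力", "决议", "决意"]
    word_to_pinyin = {"权利": "quan li", "权力": "quan li", "全力": "quan li",
                      "决议": "jue yi", "决意": "jue yi"}
    print(build_confusion_sets(words, word_to_pinyin))
    # e.g. CSet(权利) = [全力, 权力], CSet(决议) = [决意]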
Preferably, step 2) comprises the following steps:
Step 21) Based on a large-scale Web corpus, build the local adjacent N-gram models of the left-adjacent bigram, the right-adjacent bigram and the adjacent trigram. Each sentence of the corpus is segmented into words; for example, segmenting a sentence L gives L = W1W2...Wi-1WiWi+1...Wn, and for a word Wi:
the left-adjacent bigram is LeftBiGram(Wi) = Wi-1Wi;
the right-adjacent bigram is RightBiGram(Wi) = WiWi+1;
the adjacent trigram is TriGram(Wi) = Wi-1WiWi+1.
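As an illustration of step 21), the sketch below extracts the left-adjacent bigram, right-adjacent bigram and adjacent trigram around one position of an already segmented sentence; how sentence boundaries are handled is an assumption, since the patent does not specify it.

    def local_ngrams(words, i):
        # Return the left-adjacent bigram, right-adjacent bigram and adjacent
        # trigram around position i of a segmented sentence (None at boundaries).
        left = (words[i - 1], words[i]) if i > 0 else None
        right = (words[i], words[i + 1]) if i + 1 < len(words) else None
        tri = (words[i - 1], words[i], words[i + 1]) if left and right else None
        return left, right, tri

    # Toy segmented sentence W1 ... W5:
    sentence = ["我们", "维护", "自己", "的", "权利"]
    print(local_ngrams(sentence, 2))
    # (('维护', '自己'), ('自己', '的'), ('维护', '自己', '的'))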
Step 22) Based on the large-scale Web corpus and the homonym confusion sets CSet(Wi), count, for every word in the confusion sets, its left-adjacent bigrams and their co-occurrence frequencies, its right-adjacent bigrams and their co-occurrence frequencies, and its adjacent trigrams and their co-occurrence frequencies, where each Wi^k is a homonym of Wi;
Step 23) Based on the homonym confusion sets CSet(Wi), carry out probability estimation for the local adjacent N-gram models of the left-adjacent bigram, the right-adjacent bigram and the adjacent trigram, so as to generate local adjacent N-gram models containing probability estimates; where
the probability estimate of the left-adjacent bigram is
Pleft(Wi|Wi-1) = Count(Wi-1Wi) / Σ_Wk Count(Wi-1Wk)    (1);
the probability estimate of the right-adjacent bigram is
Pright(Wi|Wi+1) = Count(WiWi+1) / Σ_Wk Count(WkWi+1)    (2);
the probability estimate of the adjacent trigram is
Ptri(Wi|Wi-1Wi+1) = Count(Wi-1WiWi+1) / Σ_Wk Count(Wi-1WkWi+1)    (3);
where the summation index Wk ranges over Wi together with all homonyms Wi^k in CSet(Wi); Count(Wi-1Wi) denotes the co-occurrence frequency of Wi-1Wi in the corpus, Count(WiWi+1) that of WiWi+1, Count(Wi-1WiWi+1) that of Wi-1WiWi+1, Count(Wi-1Wi^k) that of Wi-1Wi^k, Count(Wi^kWi+1) that of Wi^kWi+1, and Count(Wi-1Wi^kWi+1) that of Wi-1Wi^kWi+1.
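To illustrate steps 22) and 23), the following sketch counts the adjacent bigrams and trigrams of a segmented corpus and normalises each count over a candidate list (the word together with its confusion set), which is the reading of formulas (1)-(3) adopted above. The corpus handling and the absence of smoothing are simplifying assumptions, not the patent's training procedure.

    from collections import Counter

    def count_ngrams(corpus_sentences):
        # Count all adjacent bigrams and trigrams over a word-segmented corpus.
        bi, tri = Counter(), Counter()
        for words in corpus_sentences:
            for i in range(len(words) - 1):
                bi[(words[i], words[i + 1])] += 1
            for i in range(len(words) - 2):
                tri[(words[i], words[i + 1], words[i + 2])] += 1
        return bi, tri

    def p_left(w, prev, candidates, bi):
        # Formula (1): count of the left bigram, normalised over all candidates.
        denom = sum(bi[(prev, c)] for c in candidates)
        return bi[(prev, w)] / denom if denom else 0.0

    def p_right(w, nxt, candidates, bi):
        # Formula (2): count of the right bigram, normalised over all candidates.
        denom = sum(bi[(c, nxt)] for c in candidates)
        return bi[(w, nxt)] / denom if denom else 0.0

    def p_tri(w, prev, nxt, candidates, tri):
        # Formula (3): count of the trigram, normalised over all candidates.
        denom = sum(tri[(prev, c, nxt)] for c in candidates)
        return tri[(prev, w, nxt)] / denom if denom else 0.0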
Preferably, step 3) comprises the following steps:
Step 31) Segment the sentence S to be proofread into words, traverse each word Wi in the segmented sentence S, and check whether a homonym confusion set CSet(Wi) exists for it; for every word that has a homonym confusion set, carry out the processing of step 32), until every word in the sentence has been traversed;
Step 32) If Wi has a CSet(Wi), then, based on the local adjacent N-gram models obtained in step 2) and using the weighted combination method, compute the contextual support within the sentence of the word and of every homonym in its confusion set, decide whether a homonym error exists, mark the homonym error and provide a list of correction suggestions. Specifically:
Step 32-1) Compute the contextual support of the word Wi in sentence S with the combined scoring function Score:
Score(Wi) = α1*Pleft(Wi|Wi-1) + α2*Pright(Wi|Wi+1) + α3*Ptri(Wi|Wi-1Wi+1)    (4);
where α1+α2+α3 = 1, α1 > 0, α2 > 0, α3 > 0, and α1, α2, α3 denote the weights of the left-adjacent bigram, the right-adjacent bigram and the adjacent trigram respectively;
Step 32-2) Compute, with the same combined scoring function Score, the contextual support in the sentence of every homonym Wi^k in the confusion set of Wi:
Score(Wi^k) = α1*Pleft(Wi^k|Wi-1) + α2*Pright(Wi^k|Wi+1) + α3*Ptri(Wi^k|Wi-1Wi+1)    (5);
where Pleft(Wi^k|Wi-1), Pright(Wi^k|Wi+1) and Ptri(Wi^k|Wi-1Wi+1) are obtained from formulas (1)-(3) with Wi replaced by Wi^k;
Step 32-3) Sort Wi and every homonym of CSet(Wi) by their context support Score;
Step 32-4) If Score(Wi) = 0, mark Wi as erroneous and list the homonyms Wi^k with Score(Wi^k) > 0, in the order of the Score ranking, as the correction suggestion list; otherwise go to step 32-5);
Step 32-5) If Score(Wi) > 0 and some homonym satisfies β*Score(Wi^k) > Score(Wi), mark Wi as erroneous and list the corresponding homonyms Wi^k, in the order of the Score ranking, as the correction suggestion list; otherwise mark Wi as a correct word, where β is the probability that a word is mistyped as one of its homonyms.
Preferably, in the above steps 32-1) and 32-2), the weight of the left-adjacent bigram is α1 = 0.25, the weight of the right-adjacent bigram is α2 = 0.25, and the weight of the adjacent trigram is α3 = 0.5.
Preferably, in step 32-5) the probability β that a word is mistyped as one of its homonyms satisfies β ≤ 0.01.
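The scoring and decision procedure of steps 32-1) to 32-5) can be sketched as follows, using the preferred weights α1 = α2 = 0.25, α3 = 0.5 and β = 0.01 given above. The decision rule "flag Wi when β*Score(Wi^k) > Score(Wi) for some homonym" is this sketch's assumed reading of step 32-5); the probability inputs are taken as precomputed so the sketch stands on its own.

    ALPHA = (0.25, 0.25, 0.5)  # preferred weights: left bigram, right bigram, trigram
    BETA = 0.01                # preferred probability of mistyping a word as a homonym

    def score(p_left, p_right, p_tri, alpha=ALPHA):
        # Formulas (4)/(5): weighted combination of the three local probabilities.
        a1, a2, a3 = alpha
        return a1 * p_left + a2 * p_right + a3 * p_tri

    def judge(word, word_probs, homonym_probs, beta=BETA):
        # word_probs:    (Pleft, Pright, Ptri) of the word as written
        # homonym_probs: {homonym: (Pleft, Pright, Ptri)} over its confusion set
        s_w = score(*word_probs)
        ranked = sorted(((score(*p), h) for h, p in homonym_probs.items()),
                        reverse=True)
        if s_w == 0:
            # Step 32-4): no context support -> flag, suggest supported homonyms
            return "error", [h for s, h in ranked if s > 0]
        # Step 32-5), assumed reading: flag only when a homonym is at least
        # 1/beta times better supported than the written word
        suggestions = [h for s, h in ranked if beta * s > s_w]
        return ("error", suggestions) if suggestions else ("correct", [])

    # Hypothetical probabilities: the written word is weakly supported and one
    # homonym dominates, so the word is flagged and the homonym suggested.
    print(judge("权力", (0.0004, 0.0001, 0.0),
                {"权利": (0.06, 0.05, 0.04), "全力": (0.0, 0.0, 0.0)}))
    # -> ('error', ['权利'])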
Beneficial effects: the present invention proposes an automatic proofreading method for Chinese homonym errors. The method uses a homonym confusion set together with a weighted combination of the left-adjacent bigram, right-adjacent bigram and adjacent trigram to judge the homonyms in a sentence, recognises homonym errors, and provides correction suggestions for them, integrating automatic error detection and automatic proofreading. Experiments show that the homonym error proofreading method provided by the present invention reaches a recall of 81.2% and a precision of 75.6%; the system responds quickly, its precision meets the requirements of practical applications, and its validity and accuracy are high, so the method has good practicality.
Brief description of the drawings
Fig. 1 is a flow chart of the automatic proofreading of homonym errors.
Embodiment
The present invention is further described with reference to the accompanying drawings and examples.
The automatic proofreading method for Chinese homonym errors provided by the present invention performs automatic proofreading of Chinese homonym errors based on a homonym confusion set and a weighted combination decision over local adjacent N-gram models, and comprises the following steps:
1) Build the homonym confusion set: using the pinyin of Chinese characters, build a homonym confusion set for Chinese words.
As shown in Fig. 1, the homonym confusion set is generated using a pinyin annotation table of Chinese characters and a Chinese dictionary:
CSet(Wi) = {Wi^1, Wi^2, ..., Wi^m},
where Wi is a Chinese word and each Wi^k is a homonym of Wi.
In this embodiment the homonym confusion set is built from two parts: an automatic identification part and a manual proofreading part.
The automatic identification part comprises the following steps:
Step 11) Read the Chinese dictionary, loading the Chinese words of the dictionary into a Chinese word structure;
Step 12) Read the pinyin annotations of the pinyin annotation table into a pinyin structure;
Step 13) Combine the Chinese words obtained in step 11) with the pinyin annotations obtained in step 12), convert the Chinese words of the dictionary into pinyin, and place them into a homonym word structure to generate the homonym dictionary structure, i.e. the homonym confusion set.
The manual proofreading part includes: manually proofreading the homonym confusion set obtained in step 13) and updating the homonym confusion set.
2) Build the local adjacent N-gram models of the left bigram, right bigram and trigram; based on the homonym confusion set obtained in step 1), estimate the probabilities of these local adjacent N-gram models with a probability estimation algorithm, and train them on a large-scale corpus to obtain the local adjacent N-gram models. Specifically:
Step 21) Based on a large-scale Web corpus, build the local adjacent N-gram models of the left-adjacent bigram, the right-adjacent bigram and the adjacent trigram. Each sentence of the corpus is segmented into words; for example, segmenting a sentence L gives L = W1W2...Wi-1WiWi+1...Wn, and for a word Wi:
the left-adjacent bigram is LeftBiGram(Wi) = Wi-1Wi;
the right-adjacent bigram is RightBiGram(Wi) = WiWi+1;
the adjacent trigram is TriGram(Wi) = Wi-1WiWi+1.
Step 22) Based on the large-scale Web corpus and the homonym confusion sets CSet(Wi), count, for every word in the confusion sets, its left-adjacent bigrams and their co-occurrence frequencies, its right-adjacent bigrams and their co-occurrence frequencies, and its adjacent trigrams and their co-occurrence frequencies, where each Wi^k is a homonym of Wi;
Step 23) Based on the homonym confusion sets CSet(Wi), carry out probability estimation for the local adjacent N-gram models of the left-adjacent bigram, the right-adjacent bigram and the adjacent trigram, so as to generate local adjacent N-gram models containing probability estimates; where
the probability estimate of the left-adjacent bigram is
Pleft(Wi|Wi-1) = Count(Wi-1Wi) / Σ_Wk Count(Wi-1Wk)    (1);
the probability estimate of the right-adjacent bigram is
Pright(Wi|Wi+1) = Count(WiWi+1) / Σ_Wk Count(WkWi+1)    (2);
the probability estimate of the adjacent trigram is
Ptri(Wi|Wi-1Wi+1) = Count(Wi-1WiWi+1) / Σ_Wk Count(Wi-1WkWi+1)    (3);
where the summation index Wk ranges over Wi together with all homonyms Wi^k in CSet(Wi), and Count(·) denotes the co-occurrence frequency of the corresponding word sequence in the corpus.
3) Based on the local adjacent N-gram models obtained in step 2), use the weighted combination method to compute, for each word in the sentence and for each homonym in its confusion set, the contextual support of that candidate within the sentence; decide whether a homonym error exists, mark the homonym error and provide a list of correction suggestions. As shown in Fig. 1, the procedure is as follows:
Step 31) Segment the sentence S to be proofread into words, traverse each word Wi in the segmented sentence S, and check whether a homonym confusion set CSet(Wi) exists for it; for every word that has a homonym confusion set, carry out the processing of step 32), until every word in the sentence has been traversed;
Step 32) If Wi has a CSet(Wi), then, based on the local adjacent N-gram models obtained in step 2) and using the weighted combination method, compute the contextual support within the sentence of the word and of every homonym in its confusion set, decide whether a homonym error exists, mark the homonym error and provide a list of correction suggestions. Specifically:
Step 32-1) Compute the contextual support of the word Wi in sentence S with the combined scoring function Score:
Score(Wi) = α1*Pleft(Wi|Wi-1) + α2*Pright(Wi|Wi+1) + α3*Ptri(Wi|Wi-1Wi+1)    (4);
where α1+α2+α3 = 1, α1 > 0, α2 > 0, α3 > 0, and α1, α2, α3 denote the weights of the left-adjacent bigram, the right-adjacent bigram and the adjacent trigram respectively. In this embodiment α1 = α2 = 0.25 and α3 = 0.5; these weights can of course be adjusted appropriately according to actual needs.
Step 32-2) Compute, with the same combined scoring function Score, the contextual support in the sentence of every homonym Wi^k in the confusion set of Wi:
Score(Wi^k) = α1*Pleft(Wi^k|Wi-1) + α2*Pright(Wi^k|Wi+1) + α3*Ptri(Wi^k|Wi-1Wi+1)    (5);
where Pleft(Wi^k|Wi-1), Pright(Wi^k|Wi+1) and Ptri(Wi^k|Wi-1Wi+1) are obtained from formulas (1)-(3) with Wi replaced by Wi^k;
Step 32-3) Sort Wi and every homonym of CSet(Wi) by their context support Score;
Step 32-4) If Score(Wi) = 0, mark Wi as erroneous and list the homonyms Wi^k with Score(Wi^k) > 0, in the order of the Score ranking, as the correction suggestion list; otherwise go to step 32-5);
Step 32-5) If Score(Wi) > 0 and some homonym satisfies β*Score(Wi^k) > Score(Wi), mark Wi as erroneous and list the corresponding homonyms Wi^k, in the order of the Score ranking, as the correction suggestion list; otherwise mark Wi as a correct word. Here β is the probability that a word is mistyped as one of its homonyms; usually β ≤ 0.01, and in this embodiment β = 0.01.
Experiment:
The method went through several rounds of open testing. The experiment used a test corpus of 10,000 sentences, into which 600 homonym errors were manually inserted, and the parameters given in the embodiment were used as the experimental parameters. The experiment shows that the homonym error proofreading method provided by the present invention reaches a recall of 81.2% and a precision of 75.6%. This precision exceeds the prior art and meets the needs of practical applications, with good validity and accuracy.
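For reference, the recall and precision above are the usual detection metrics: recall is the fraction of the 600 seeded errors that were flagged, and precision is the fraction of flagged positions that were real errors. The snippet below only illustrates the computation; the counts are made up to reproduce the reported percentages and are not the actual experimental tallies.

    def precision_recall(true_positives, flagged, seeded_errors):
        # Standard detection metrics computed from raw counts.
        precision = true_positives / flagged if flagged else 0.0
        recall = true_positives / seeded_errors if seeded_errors else 0.0
        return precision, recall

    # Illustrative counts only, chosen to match the reported 75.6% / 81.2%:
    print(precision_recall(true_positives=487, flagged=644, seeded_errors=600))
    # -> (0.7562..., 0.8116...)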
The above is only a preferred embodiment of the present invention and does not limit the present invention. Any modifications, equivalent substitutions and improvements made by those skilled in the art without departing from the technical idea of the present invention fall within the scope of protection of the present invention.

Claims (5)

1. An automatic proofreading method for Chinese homonym errors, characterised in that it performs automatic proofreading of Chinese homonym errors based on a homonym confusion set and a weighted combination decision over local adjacent N-gram models, the method comprising the following steps:
1) using the pinyin of Chinese characters, building a homonym confusion set for Chinese words;
2) building local adjacent N-gram models of the left bigram, right bigram and trigram; based on the homonym confusion set obtained in step 1), estimating the probabilities of these local adjacent N-gram models with a probability estimation algorithm, and training them on a large-scale corpus to obtain the local adjacent N-gram models;
3) based on the local adjacent N-gram models obtained in step 2), using a weighted combination method to compute, for each word in a sentence and for each homonym in its confusion set, the contextual support of that candidate within the sentence; deciding whether a homonym error exists, marking the homonym error and providing a list of correction suggestions;
wherein said step 2) comprises the following steps:
Step 21) based on a large-scale Web corpus, building the local adjacent N-gram models of the left-adjacent bigram, the right-adjacent bigram and the adjacent trigram: each sentence of the corpus is segmented into words, e.g. segmenting a sentence L gives L = W1W2...Wi-1WiWi+1...Wn, and for a word Wi,
the left-adjacent bigram is LeftBiGram(Wi) = Wi-1Wi;
the right-adjacent bigram is RightBiGram(Wi) = WiWi+1;
the adjacent trigram is TriGram(Wi) = Wi-1WiWi+1;
Step 22) based on the large-scale Web corpus and the homonym confusion sets CSet(Wi), counting, for every word in the confusion sets, its left-adjacent bigrams and their co-occurrence frequencies, its right-adjacent bigrams and their co-occurrence frequencies, and its adjacent trigrams and their co-occurrence frequencies, where each Wi^k is a homonym of Wi;
Step 23) based on the homonym confusion sets CSet(Wi), carrying out probability estimation for the local adjacent N-gram models of the left-adjacent bigram, the right-adjacent bigram and the adjacent trigram, so as to generate local adjacent N-gram models containing probability estimates; wherein the probability estimate of the left-adjacent bigram is
Pleft(Wi|Wi-1) = Count(Wi-1Wi) / Σ_Wk Count(Wi-1Wk)    (1);
the probability estimate of the right-adjacent bigram is
Pright(Wi|Wi+1) = Count(WiWi+1) / Σ_Wk Count(WkWi+1)    (2);
the probability estimate of the adjacent trigram is
Ptri(Wi|Wi-1Wi+1) = Count(Wi-1WiWi+1) / Σ_Wk Count(Wi-1WkWi+1)    (3);
wherein the summation index Wk ranges over Wi together with all homonyms Wi^k in CSet(Wi); Count(Wi-1Wi) denotes the co-occurrence frequency of Wi-1Wi in the corpus, Count(WiWi+1) that of WiWi+1, Count(Wi-1WiWi+1) that of Wi-1WiWi+1, Count(Wi-1Wi^k) that of Wi-1Wi^k, Count(Wi^kWi+1) that of Wi^kWi+1, and Count(Wi-1Wi^kWi+1) that of Wi-1Wi^kWi+1;
wherein said step 3) comprises the following steps:
Step 31) segmenting the sentence S to be proofread into words, traversing each word Wi in the segmented sentence S, and checking whether a homonym confusion set CSet(Wi) exists for it; for every word that has a homonym confusion set, carrying out the processing of step 32), until every word in the sentence has been traversed;
Step 32) if Wi has a CSet(Wi), then, based on the local adjacent N-gram models obtained in step 2) and using the weighted combination method, computing the contextual support within the sentence of the word and of every homonym in its confusion set, deciding whether a homonym error exists, marking the homonym error and providing a list of correction suggestions, specifically including:
Step 32-1) computing the contextual support of the word Wi in sentence S with the combined scoring function Score:
Score(Wi) = α1*Pleft(Wi|Wi-1) + α2*Pright(Wi|Wi+1) + α3*Ptri(Wi|Wi-1Wi+1)    (4);
wherein α1+α2+α3 = 1, α1 > 0, α2 > 0, α3 > 0, and α1, α2, α3 respectively denote the weights of the left-adjacent bigram, the right-adjacent bigram and the adjacent trigram;
Step 32-2) computing, with the same combined scoring function Score, the contextual support in the sentence of every homonym Wi^k in the confusion set of Wi:
Score(Wi^k) = α1*Pleft(Wi^k|Wi-1) + α2*Pright(Wi^k|Wi+1) + α3*Ptri(Wi^k|Wi-1Wi+1)    (5);
wherein Pleft(Wi^k|Wi-1), Pright(Wi^k|Wi+1) and Ptri(Wi^k|Wi-1Wi+1) are obtained from formulas (1)-(3) with Wi replaced by Wi^k;
Step 32-3) sorting Wi and every homonym of CSet(Wi) by their context support Score;
Step 32-4) if Score(Wi) = 0, marking Wi as erroneous and listing the homonyms Wi^k with Score(Wi^k) > 0, in the order of the Score ranking, as the correction suggestion list; otherwise going to step 32-5);
Step 32-5) if Score(Wi) > 0 and some homonym satisfies β*Score(Wi^k) > Score(Wi), marking Wi as erroneous and listing the corresponding homonyms Wi^k, in the order of the Score ranking, as the correction suggestion list; otherwise marking Wi as a correct word, wherein β is the probability that a word is mistyped as one of its homonyms.
2. The automatic proofreading method for Chinese homonym errors according to claim 1, characterised in that step 1) includes: using a pinyin annotation table of Chinese characters and a Chinese dictionary, generating the homonym confusion set
CSet(Wi) = {Wi^1, Wi^2, ..., Wi^m},
wherein Wi is a Chinese word and each Wi^k is a homonym of Wi.
3. The automatic proofreading method for Chinese homonym errors according to claim 1, characterised in that the homonym confusion set in step 1) is built from two parts: an automatic identification part and a manual proofreading part;
wherein the automatic identification part comprises the following steps:
Step 11) reading the Chinese dictionary and loading the Chinese words of the dictionary into a Chinese word structure;
Step 12) reading the pinyin annotations of the pinyin annotation table into a pinyin structure;
Step 13) combining the Chinese words obtained in step 11) with the pinyin annotations obtained in step 12), converting the Chinese words of the dictionary into pinyin, and placing them into a homonym word structure to generate the homonym dictionary structure, i.e. the homonym confusion set;
wherein the manual proofreading part includes: manually proofreading the homonym confusion set obtained in step 13) and updating the homonym confusion set;
the structure of the homonym confusion set being
CSet(Wi) = {Wi^1, Wi^2, ..., Wi^m},
wherein Wi is a word and each Wi^k is a homonym of Wi.
4. The automatic proofreading method for Chinese homonym errors according to claim 1, characterised in that in steps 32-1) and 32-2) the weight of the left-adjacent bigram is α1 = 0.25, the weight of the right-adjacent bigram is α2 = 0.25, and the weight of the adjacent trigram is α3 = 0.5.
5. The automatic proofreading method for Chinese homonym errors according to claim 1, characterised in that in step 32-5) the probability β that a word is mistyped as one of its homonyms satisfies β ≤ 0.01.
CN201510354692.4A 2015-06-24 2015-06-24 Automatic proofreading method for Chinese homonym errors Active CN105045778B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510354692.4A CN105045778B (en) 2015-06-24 2015-06-24 Automatic proofreading method for Chinese homonym errors

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510354692.4A CN105045778B (en) 2015-06-24 2015-06-24 A kind of Chinese homonym mistake auto-collation

Publications (2)

Publication Number Publication Date
CN105045778A CN105045778A (en) 2015-11-11
CN105045778B true CN105045778B (en) 2017-10-17

Family

ID=54452334

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510354692.4A Active CN105045778B (en) 2015-06-24 2015-06-24 Automatic proofreading method for Chinese homonym errors

Country Status (1)

Country Link
CN (1) CN105045778B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105573979B (en) * 2015-12-10 2018-05-22 江苏科技大学 A kind of wrongly written character word knowledge generation method that collection is obscured based on Chinese character
CN106528616B (en) * 2016-09-30 2019-12-17 厦门快商通科技股份有限公司 Language error correction method and system in human-computer interaction process
CN109388252B (en) * 2017-08-14 2022-10-04 北京搜狗科技发展有限公司 Input method and device
CN110083819B (en) * 2018-01-26 2024-02-09 北京京东尚科信息技术有限公司 Spelling error correction method, device, medium and electronic equipment
CN108563634A (en) * 2018-03-29 2018-09-21 广州视源电子科技股份有限公司 Recognition methods, system, computer equipment and the storage medium of word misspelling
CN108519973A (en) * 2018-03-29 2018-09-11 广州视源电子科技股份有限公司 Detection method, system, computer equipment and the storage medium of word spelling
CN108563632A (en) * 2018-03-29 2018-09-21 广州视源电子科技股份有限公司 Modification method, system, computer equipment and the storage medium of word misspelling
CN108491392A (en) * 2018-03-29 2018-09-04 广州视源电子科技股份有限公司 Modification method, system, computer equipment and the storage medium of word misspelling
CN108845984B (en) * 2018-05-22 2022-04-22 广州视源电子科技股份有限公司 Wrongly written character detection method and device, computer readable storage medium and terminal equipment
CN108984515B (en) * 2018-05-22 2022-09-06 广州视源电子科技股份有限公司 Wrongly written character detection method and device, computer readable storage medium and terminal equipment
CN108874770B (en) * 2018-05-22 2022-04-22 广州视源电子科技股份有限公司 Wrongly written character detection method and device, computer readable storage medium and terminal equipment
CN108829665B (en) * 2018-05-22 2022-05-31 广州视源电子科技股份有限公司 Wrongly written character detection method and device, computer readable storage medium and terminal equipment
CN110600011B (en) * 2018-06-12 2022-04-01 中国移动通信有限公司研究院 Voice recognition method and device and computer readable storage medium
CN110619119B (en) * 2019-07-23 2022-06-10 平安科技(深圳)有限公司 Intelligent text editing method and device and computer readable storage medium
CN110717021B (en) * 2019-09-17 2023-08-29 平安科技(深圳)有限公司 Input text acquisition and related device in artificial intelligence interview
CN110851599B (en) * 2019-11-01 2023-04-28 中山大学 Automatic scoring method for Chinese composition and teaching assistance system
CN110991166B (en) * 2019-12-03 2021-07-30 中国标准化研究院 Chinese wrongly-written character recognition method and system based on pattern matching
CN111161739B (en) * 2019-12-28 2023-01-17 科大讯飞股份有限公司 Speech recognition method and related product
CN111312209A (en) * 2020-02-21 2020-06-19 北京声智科技有限公司 Text-to-speech conversion processing method and device and electronic equipment
CN111709228B (en) * 2020-06-22 2023-11-21 中国标准化研究院 Automatic identification method for word repetition errors
CN112668328A (en) * 2020-12-25 2021-04-16 广东南方新媒体科技有限公司 Media intelligent proofreading algorithm


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8364487B2 (en) * 2008-10-21 2013-01-29 Microsoft Corporation Speech recognition system with display information
CN102375807A (en) * 2010-08-27 2012-03-14 汉王科技股份有限公司 Method and device for proofing characters
CN104166462A (en) * 2013-05-17 2014-11-26 北京搜狗科技发展有限公司 Input method and system for characters

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Automatic recognition and proofreading of Chinese homonyms based on decision lists; 石敏 et al.; 《电子设计工程》 (Electronic Design Engineering); 2015-05-31; Vol. 23, No. 9; pp. 39-41 *
Research on construction methods of Chinese-character seed confusion sets; 施恒利 et al.; 《计算机科学》 (Computer Science); 2014-08-31; Vol. 41, No. 8; pp. 230-231 *

Also Published As

Publication number Publication date
CN105045778A (en) 2015-11-11

Similar Documents

Publication Publication Date Title
CN105045778B (en) Automatic proofreading method for Chinese homonym errors
CN104991889B (en) A kind of non-multi-character word error auto-collation based on fuzzy participle
Shaalan et al. Arabic word generation and modelling for spell checking.
CN103970765A (en) Error correcting model training method and device, and text correcting method and device
CN105824800B (en) A kind of true word mistake auto-collation of Chinese
Zhang et al. HANSpeller++: A unified framework for Chinese spelling correction
CN105512110B (en) A kind of wrongly written character word construction of knowledge base method based on fuzzy matching with statistics
KR101633556B1 (en) Apparatus for grammatical error correction and method using the same
Ljubešić et al. Predicting the level of text standardness in user-generated content
Richter et al. Korektor–a system for contextual spell-checking and diacritics completion
CN108280065B (en) Foreign text evaluation method and device
CN104346326A (en) Method and device for determining emotional characteristics of emotional texts
CN106528533A (en) Dynamic sentiment word and special adjunct word-based text sentiment analysis method
JP6626917B2 (en) Readability evaluation method and system based on English syllable calculation method
Schneider et al. Comparing rule-based and SMT-based spelling normalisation for English historical texts
CN106202037B (en) Vietnamese phrase tree constructing method based on chunking
CN112115701B (en) News reading text readability evaluation method and system
CN107894977A (en) With reference to the Vietnamese part of speech labeling method of conversion of parts of speech part of speech disambiguation model and dictionary
Geyken et al. On-the-fly Generation of Dictionary Articles for the DWDS Website
Schottmüller et al. Issues in translating verb-particle constructions from german to english
CN111027314A (en) Character attribute extraction method based on language fragment
US10755594B2 (en) Method and system for analyzing a piece of text
Ji et al. Analysis and repair of name tagger errors
Lu et al. Language model for Mongolian polyphone proofreading
CN104978311B (en) A kind of Vietnamese segmenting method based on condition random field

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20151111

Assignee: JIANGSU KEDA HUIFENG SCIENCE AND TECHNOLOGY Co.,Ltd.

Assignor: JIANGSU University OF SCIENCE AND TECHNOLOGY

Contract record no.: X2020980007325

Denomination of invention: An automatic correction method for Chinese homonym errors

Granted publication date: 20171017

License type: Common License

Record date: 20201029

EC01 Cancellation of recordation of patent licensing contract
EC01 Cancellation of recordation of patent licensing contract

Assignee: JIANGSU KEDA HUIFENG SCIENCE AND TECHNOLOGY Co.,Ltd.

Assignor: JIANGSU University OF SCIENCE AND TECHNOLOGY

Contract record no.: X2020980007325

Date of cancellation: 20201223