CN102368271A - Chinese content spelling correcting system and method with fault-tolerant capability - Google Patents

Publication number
CN102368271A
Authority
CN
China
Prior art keywords
chinese
module
fault
content
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011103399344A
Other languages
Chinese (zh)
Inventor
陈淮琰
陈国强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inventec Besta Xian Co Ltd
Original Assignee
Inventec Besta Xian Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2011-11-01
Filing date
2011-11-01
Publication date
2012-03-07
Application filed by Inventec Besta Xian Co Ltd
Priority to CN2011103399344A
Publication of CN102368271A

Abstract

The invention provides a Chinese content spelling correcting system with fault-tolerant capability, comprising a Pinyin or Chinese content obtaining module, a searching module for comparison searching, a storage module for storing information of a comparison table, a processing module for judging matching information, and an output module for outputting matching content, wherein the Pinyin or Chinese content obtaining module is connected with the searching module, the processing module is connected with the searching module, the output module is connected with the processing module, and the storage module is connected with the searching module. The Chinese content spelling correcting system and method with fault-tolerant capability, provided by the invention, have the advantages of wide application range and strong practical applicability.

Description

A Chinese content spelling correction system and method with fault-tolerant capability
Technical field
The present invention relates to a method for correcting Chinese content, and in particular to a Chinese content spelling correction system and method with fault-tolerant capability.
Background art
Many electronic products today, such as electronic dictionaries, PDAs and PCs, provide Chinese query or input functions and are used by a large number of users. When querying or entering Chinese content, however, a user may get one of the characters wrong. For example, a user who intends to enter 吹毛求疵 (chuimaoqiuci, "to find fault") may instead enter a phrase pronounced "chuimaoqiubi" (literally "blowing the ball top ratio"). In the prior art, the erroneous input "chuimaoqiubi" cannot be used to query or match the 吹毛求疵 that the user actually intended, which is a significant limitation.
Summary of the invention
To solve the technical problem described in the background art, the present invention proposes a Chinese content spelling correction system and method with fault-tolerant capability that has a wide range of application and is highly practical.
The technical solution of the present invention is a Chinese content spelling correction system with fault-tolerant capability, characterized in that the system comprises an acquisition module for obtaining pinyin or Chinese content, a lookup module for comparison searching, a storage module for storing lookup-table information, a processing module for judging match information, and an output module for outputting the matched content. The acquisition module is connected with the lookup module, the processing module is connected with the lookup module, the output module is connected with the processing module, and the storage module is connected with the lookup module.
The system further comprises a conversion module for converting Chinese content, and the conversion module is connected with the acquisition module.
The storage module comprises an omitted-character pinyin database of common words and phrases, a character-pinyin lookup table of common words and phrases, and a database that maps each Chinese character to its pinyin sequence.
A Chinese content spelling correction method with fault-tolerant capability, characterized in that the method comprises the following steps:
1) establishing an omitted-character pinyin database of common words and phrases;
2) establishing a character-pinyin lookup table of common words and phrases;
5) obtaining a pinyin sequence;
6) querying the character-pinyin lookup table of common words and phrases for words that match the obtained pinyin sequence;
7) judging whether a qualifying word has been matched and, if so, proceeding to step 8);
8) outputting the matched word.
In step 7), if no qualifying word is matched, step 7.1) is carried out: the omitted-character pinyin database of common words and phrases is searched for words that match after the pinyin of some characters is omitted.
After step 2), the method further comprises step 3): establishing a database that maps each Chinese character to its pinyin sequence.
After step 3), the method further comprises step 4): obtaining Chinese content and, using the database that maps each Chinese character to its pinyin sequence, obtaining the pinyin sequence corresponding to the Chinese content.
The present invention is a method for correcting Chinese content based on the pinyin spelling combinations and pronunciation rules of Chinese characters, combined with the rules by which common characters and words are combined. With this method a user can enter or query more near-homophone Chinese vocabulary, which makes it quicker and easier for users who need to learn or look up more words. The invention satisfies a wider range of user input and query needs, and the vocabulary available for query and input is extensible.
Description of drawings
Fig. 1 is a structural diagram of the present invention;
Fig. 2 is a flow chart of the method for obtaining spelling-corrected content from an input pinyin sequence;
Fig. 3 is a flow chart of the method for obtaining spelling-corrected content from Chinese content.
Embodiment
Referring to Fig. 1, the Chinese content spelling correction system with fault-tolerant capability of the present invention comprises an acquisition module 1 for obtaining pinyin or Chinese content, a lookup module 2 for comparison searching, a storage module 3 for storing lookup-table information, a processing module 4 for judging match information, and an output module 5 for outputting the matched content. The acquisition module 1 is connected with the lookup module 2, the processing module 4 is connected with the lookup module 2, the output module 5 is connected with the processing module 4, and the storage module 3 is connected with the lookup module 2. The system further comprises a conversion module 6 for converting Chinese content, which is connected with the acquisition module 1. The storage module 3 comprises an omitted-character pinyin database of common words and phrases, a character-pinyin lookup table of common words and phrases, and a database that maps each Chinese character to its pinyin sequence.
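The module connections described above can be illustrated in code. The following is a minimal, hypothetical Python sketch rather than the patented implementation: the class and method names, the `is_chinese` flag and the `(position, sequence)` key used for the omission table are all assumptions made for this example, and the tables hold only the single word from the worked example.

```python
from dataclasses import dataclass, field

@dataclass
class StorageModule:  # module 3: holds the three tables
    char_pinyin: dict = field(default_factory=dict)     # character -> pinyin syllable
    word_table: dict = field(default_factory=dict)      # full pinyin sequence -> word
    omission_table: dict = field(default_factory=dict)  # (omitted position, sequence) -> word

@dataclass
class ConversionModule:  # module 6: converts Chinese text to pinyin syllables
    storage: StorageModule
    def convert(self, text):
        return [self.storage.char_pinyin[ch] for ch in text]

@dataclass
class AcquisitionModule:  # module 1: obtains pinyin syllables or Chinese content
    converter: ConversionModule
    def acquire(self, user_input, is_chinese=False):
        return self.converter.convert(user_input) if is_chinese else list(user_input)

@dataclass
class LookupModule:  # module 2: searches the tables held by the storage module
    storage: StorageModule
    def exact(self, syllables):
        return self.storage.word_table.get("".join(syllables))
    def with_omission(self, syllables):
        for pos in range(len(syllables)):
            key = "".join(syllables[:pos] + syllables[pos + 1:])
            word = self.storage.omission_table.get((pos, key))
            if word is not None:
                return word
        return None

@dataclass
class ProcessingModule:  # module 4: judges whether a match was found
    lookup: LookupModule
    def resolve(self, syllables):
        word = self.lookup.exact(syllables)
        return word if word is not None else self.lookup.with_omission(syllables)

@dataclass
class OutputModule:  # module 5: outputs the matched content
    processing: ProcessingModule
    def emit(self, syllables):
        return self.processing.resolve(syllables)

# Wire the modules together using the single entry from the worked example.
storage = StorageModule(
    char_pinyin={"吹": "chui", "毛": "mao", "求": "qiu", "疵": "ci"},
    word_table={"chuimaoqiuci": "吹毛求疵"},
    omission_table={(3, "chuimaoqiu"): "吹毛求疵"},
)
acquisition = AcquisitionModule(ConversionModule(storage))
output = OutputModule(ProcessingModule(LookupModule(storage)))
print(output.emit(acquisition.acquire(["chui", "mao", "qiu", "bi"])))
```

The sketch assumes pinyin input arrives already split into one syllable per character; splitting a raw pinyin string into syllables is outside its scope.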
Referring to Fig. 2 and Fig. 3, the Chinese content spelling correction method with fault-tolerant capability of the present invention comprises the following steps:
1) establishing an omitted-character pinyin database of common words and phrases;
2) establishing a character-pinyin lookup table of common words and phrases;
3) establishing a database that maps each Chinese character to its pinyin sequence;
4) obtaining Chinese content and, using the database that maps each Chinese character to its pinyin sequence, obtaining the pinyin sequence corresponding to the Chinese content;
5) obtaining a pinyin sequence;
6) querying the character-pinyin lookup table of common words and phrases for words that match the obtained pinyin sequence;
7) judging whether a qualifying word has been matched and, if so, proceeding to step 8); if no qualifying word is matched, carrying out step 7.1): searching the omitted-character pinyin database of common words and phrases for words that match after the pinyin of some characters is omitted;
8) outputting the matched word.
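Steps 5) to 8), including the fallback of step 7.1), might be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the patent's own implementation: the names `WORD_TABLE`, `EXACT`, `OMISSION`, `drop` and `correct` are hypothetical, the input is assumed to be already split into one pinyin syllable per character, only four-character words with a fault tolerance of one character are handled, and each omission key is assumed to map to a single index.

```python
# Table 2 (hypothetical in-memory form): index -> (word, pinyin syllables).
WORD_TABLE = {100: ("吹毛求疵", ["chui", "mao", "qiu", "ci"])}
WORD_LENGTH = 4  # the example covers four-character words only

def drop(syllables, pos):
    """Join the syllables after omitting the one at position pos."""
    return "".join(syllables[:pos] + syllables[pos + 1:])

# Step 2: full pinyin sequence -> index in WORD_TABLE.
EXACT = {"".join(syls): idx for idx, (_, syls) in WORD_TABLE.items()}

# Step 1: one table per omitted position (the role of Tables 3.1 to 3.4).
OMISSION = [
    {drop(syls, pos): idx
     for idx, (_, syls) in WORD_TABLE.items() if len(syls) == WORD_LENGTH}
    for pos in range(WORD_LENGTH)
]

def correct(syllables):
    """Steps 5) to 8): exact lookup, then the one-character-omission fallback."""
    idx = EXACT.get("".join(syllables))                # step 6
    if idx is None and len(syllables) == WORD_LENGTH:  # step 7 failed, so step 7.1
        for pos in range(WORD_LENGTH):
            idx = OMISSION[pos].get(drop(syllables, pos))
            if idx is not None:
                break
    return WORD_TABLE[idx][0] if idx is not None else None  # step 8

print(correct(["chui", "mao", "qiu", "bi"]))
```

Keeping one table per omitted position means a query with its i-th syllable removed is only compared against stored words with their i-th syllable removed, which mirrors the position-by-position search described in the worked examples.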
The method of the present invention is highly flexible. If the user's input is pinyin, the corresponding word is first looked up directly in the lookup table; if it is not found there, the omitted-character pinyin database is searched.
If the input is Chinese content, the pinyin sequence corresponding to the Chinese characters is first found in the database, and the resulting pinyin sequence is then substituted using the pinyin substitution table.
Finally, the corresponding word is looked up in the table using the substituted pinyin sequence; if it is not found there, the omitted-character pinyin database is searched. The invention satisfies a wider range of user input and query needs, and the vocabulary available for query and input is extensible.
One concrete form of the table of Chinese characters and their pinyin sequences is shown in Table 1:
Table 1
Character    Pinyin
吹 (blow)     chui
毛 (hair)     mao
求 (seek)     qiu
疵 (flaw)     ci
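The conversion of Chinese content to a pinyin sequence through a table such as Table 1 can be illustrated with a short Python sketch. The names `CHAR_PINYIN` and `to_pinyin_syllables` are hypothetical, the table contains only the four entries of the example, and characters with more than one reading are not handled; these are simplifying assumptions, not details taken from the patent.

```python
# Table 1 as a dictionary: character -> pinyin (the four entries of the example only).
CHAR_PINYIN = {"吹": "chui", "毛": "mao", "求": "qiu", "疵": "ci"}

def to_pinyin_syllables(text, table=CHAR_PINYIN):
    """Return one pinyin syllable per character; raise if a character has no entry."""
    missing = [ch for ch in text if ch not in table]
    if missing:
        raise KeyError(f"no pinyin entry for: {''.join(missing)}")
    return [table[ch] for ch in text]

syllables = to_pinyin_syllables("吹毛求疵")
print("".join(syllables))  # chuimaoqiuci
```

Returning a list of syllables rather than a single joined string keeps the character boundaries, which a later step needs in order to omit one character's pinyin at a time.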
One concrete form of the character-pinyin lookup table of common words and phrases is shown in Table 2:
Table 2
Index    Chinese                     Pinyin sequence
100      吹毛求疵 (to find fault)     chuimaoqiuci
Table 3 shows an omitted-character pinyin database of common words and phrases built for a fault tolerance of one character. Tables 3.1 to 3.4 hold the pinyin sequences of a four-character word after its 1st to 4th character, respectively, has been omitted:
Table 3
Table 3.1 (1st character omitted)
100    maoqiuci
Table 3.2 (2nd character omitted)
100    chuiqiuci
Table 3.3 (3rd character omitted)
100    chuimaoci
Table 3.4 (4th character omitted)
100    chuimaoqiu
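How the entries of Tables 3.1 to 3.4 could be derived from a Table 2 entry is sketched below in Python. The function name `omission_keys` and the list-of-dictionaries layout are assumptions made for illustration; the patent does not specify how the database is built or stored.

```python
def omission_keys(syllables):
    """Return the pinyin sequences obtained by omitting each character in turn."""
    return ["".join(syllables[:i] + syllables[i + 1:]) for i in range(len(syllables))]

# Table 2 entry from the example: index 100, pinyin chui-mao-qiu-ci.
entry_index, entry_syllables = 100, ["chui", "mao", "qiu", "ci"]

# tables[i] plays the role of Table 3.(i+1): omitted-pinyin sequence -> index in Table 2.
tables = [{key: entry_index} for key in omission_keys(entry_syllables)]

for number, table in enumerate(tables, start=1):
    print(f"Table 3.{number}: {table}")
```

For a fault tolerance of one character, a word of n characters contributes n entries, one per omitted position.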
Specific embodiments of the main steps of the present invention are as follows:
I. Selecting Chinese content directly:
1. The user first selects Chinese content, for example the mistyped phrase "blowing the ball top ratio";
2. Its pinyin sequence "chuimaoqiubi" is obtained from the table of Chinese characters and their pinyin sequences;
3. No corresponding word is found in the character-pinyin lookup table of common words and phrases. The first to fourth characters are then omitted in turn, giving the four pinyin sequences "maoqiubi", "chuiqiubi", "chuimaobi" and "chuimaoqiu". These are searched for in Tables 3.1 to 3.4 of the database, respectively, and Table 3.4 returns the result 100, which is an index number in Table 2;
4. The entry at that index in Table 2, 吹毛求疵 ("finding fault"), is output for the user to select.
II. Input through pinyin:
1. The user first enters pinyin, for example "chuimaoqiubi";
2. Variant spellings are obtained through the pinyin substitution table; for example, the spelling "pinyin" is replaced to give "pingying", "pingyin" and "pinying";
3. No corresponding word is found in the character-pinyin lookup table of common words and phrases. The first to fourth characters are then omitted in turn, giving the four pinyin sequences "maoqiubi", "chuiqiubi", "chuimaobi" and "chuimaoqiu". These are searched for in Tables 3.1 to 3.4 of the database, respectively, and Table 3.4 returns the result 100, which is an index number in Table 2;
4. The entry at that index in Table 2, 吹毛求疵 ("finding fault"), is output for the user to select.
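The pinyin substitution in step 2 of the second example can be illustrated with a small Python sketch. The patent does not list the contents of its pinyin substitution table, so the single in/ing pair below is an assumption inferred from the "pinyin" example, and the names `CONFUSABLE_FINALS`, `syllable_variants` and `sequence_variants` are hypothetical.

```python
from itertools import product

# Hypothetical substitution table: a syllable final and the final it is confused with.
CONFUSABLE_FINALS = {"in": "ing", "ing": "in"}

def syllable_variants(syllable):
    """Return the syllable plus the variant produced by swapping a confusable final."""
    variants = [syllable]
    # Check longer finals first so that "ing" is not mistaken for "in".
    for final in sorted(CONFUSABLE_FINALS, key=len, reverse=True):
        if syllable.endswith(final):
            variants.append(syllable[: -len(final)] + CONFUSABLE_FINALS[final])
            break
    return variants

def sequence_variants(syllables):
    """Return every pinyin sequence obtainable by substituting syllables independently."""
    options = [syllable_variants(s) for s in syllables]
    return {"".join(combo) for combo in product(*options)}

print(sorted(sequence_variants(["pin", "yin"])))
```

Each variant sequence can then be looked up in turn, first in the full-sequence table and then, if nothing matches, in the omission tables.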

Claims (7)

1. A Chinese content spelling correction system with fault-tolerant capability, characterized in that: the system comprises an acquisition module for obtaining pinyin or Chinese content, a lookup module for comparison searching, a storage module for storing lookup-table information, a processing module for judging match information, and an output module for outputting the matched content; the acquisition module is connected with the lookup module, the processing module is connected with the lookup module, the output module is connected with the processing module, and the storage module is connected with the lookup module.
2. The Chinese content spelling correction system with fault-tolerant capability according to claim 1, characterized in that: the system further comprises a conversion module for converting Chinese content, and the conversion module is connected with the acquisition module.
3. The Chinese content spelling correction system with fault-tolerant capability according to claim 2, characterized in that: the storage module comprises an omitted-character pinyin database of common words and phrases, a character-pinyin lookup table of common words and phrases, and a database that maps each Chinese character to its pinyin sequence.
4. A Chinese content spelling correction method with fault-tolerant capability, characterized in that the method comprises the following steps:
1) establishing an omitted-character pinyin database of common words and phrases;
2) establishing a character-pinyin lookup table of common words and phrases;
5) obtaining a pinyin sequence;
6) querying the character-pinyin lookup table of common words and phrases for words that match the obtained pinyin sequence;
7) judging whether a qualifying word has been matched and, if so, proceeding to step 8);
8) outputting the matched word.
5. The Chinese content spelling correction method with fault-tolerant capability according to claim 4, characterized in that: in step 7), if no qualifying word is matched, step 7.1) is carried out: the omitted-character pinyin database of common words and phrases is searched for words that match after the pinyin of some characters is omitted.
6. The Chinese content spelling correction method with fault-tolerant capability according to claim 4 or 5, characterized in that: after step 2), the method further comprises step 3): establishing a database that maps each Chinese character to its pinyin sequence.
7. The Chinese content spelling correction method with fault-tolerant capability according to claim 6, characterized in that: after step 3), the method further comprises step 4): obtaining Chinese content and, using the database that maps each Chinese character to its pinyin sequence, obtaining the pinyin sequence corresponding to the Chinese content.
CN2011103399344A 2011-11-01 2011-11-01 Chinese content spelling correcting system and method with fault-tolerant capability Pending CN102368271A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011103399344A CN102368271A (en) 2011-11-01 2011-11-01 Chinese content spelling correcting system and method with fault-tolerant capability

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011103399344A CN102368271A (en) 2011-11-01 2011-11-01 Chinese content spelling correcting system and method with fault-tolerant capability

Publications (1)

Publication Number Publication Date
CN102368271A true CN102368271A (en) 2012-03-07

Family

ID=45760835

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011103399344A Pending CN102368271A (en) 2011-11-01 2011-11-01 Chinese content spelling correcting system and method with fault-tolerant capability

Country Status (1)

Country Link
CN (1) CN102368271A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104199954A (en) * 2012-06-26 2014-12-10 北京奇虎科技有限公司 Recommendation system and method for search input
CN109063062A (en) * 2018-07-20 2018-12-21 澳通(大连)科技发展有限公司 The method and apparatus of Chinese character information inquiry

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1339754A (en) * 2000-08-22 2002-03-13 英业达集团(上海)电子技术有限公司 Chinese character identifying method and system with correcting function
CN1684481A (en) * 2004-04-16 2005-10-19 英华达(南京)科技有限公司 Method for realizing Chinese input of fuzzy phonetic transcription in mobile phone
CN101133411A (en) * 2004-08-25 2008-02-27 Google公司 Fault-tolerant romanized input method for non-roman characters
CN101819469A (en) * 2009-11-06 2010-09-01 无敌科技(西安)有限公司 Method for correcting Chinese content spelling

Similar Documents

Publication Publication Date Title
CN106326303B (en) A kind of spoken semantic analysis system and method
US10216725B2 (en) Integration of domain information into state transitions of a finite state transducer for natural language processing
CN106202153B (en) A kind of the spelling error correction method and system of ES search engine
Liu et al. Insertion, deletion, or substitution? Normalizing text messages without pre-categorization nor supervision
CN104142915B (en) A kind of method and system adding punctuate
EP2715566A1 (en) Method and system for text message normalization based on character transformation and unsupervised of web data
CN101794307A (en) Vehicle navigation POI (Point of Interest) search engine based on internetwork word segmentation idea
CN102768681A (en) Recommending system and method used for search input
CN101441527A (en) Method and apparatus for prompting right pronunciation in phonetic input
CN102867511A (en) Method and device for recognizing natural speech
CN103594085A (en) Method and system providing speech recognition result
CN106205613B (en) A kind of navigation audio recognition method and system
CN101359339A (en) Enquiry method for auto expanding key words and apparatus thereof
CN104142974A (en) Voice file querying method and device
CN101539433A (en) Searching method with first letter of pinyin and intonation in navigation system and device thereof
CN103810161A (en) Method for converting Cyril Mongolian into traditional Mongolian
CN101477565A (en) Method and apparatus for confirming correctness of input character string in search engine
KR20230079729A (en) Method for converting natural language query to sql and device thereof
CN102970618A (en) Video on demand method based on syllable identification
CN105512121A (en) Address query method based on keyword
CN102368271A (en) Chinese content spelling correcting system and method with fault-tolerant capability
CN102385597B (en) The fault-tolerant searching method of a kind of POI
CN100403239C (en) Tibetan input method based on English keyboard
CN112328773A (en) Knowledge graph-based question and answer implementation method and system
CN105373568A (en) Method and device for automatically learning question answers

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120307