CN101158942A - Translation method capable of correcting Chinese characters phonetic error and system thereof - Google Patents

Translation method capable of correcting Chinese characters phonetic error and system thereof

Info

Publication number
CN101158942A
Authority
CN
China
Prior art keywords
phonetic
phrase
module
chinese character
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2007100190400A
Other languages
Chinese (zh)
Inventor
陈淮琰
王秦秦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inventec Besta Xian Co Ltd
Original Assignee
Inventec Besta Xian Co Ltd
Application filed by Inventec Besta Xian Co Ltd
Priority to CNA2007100190400A
Publication of CN101158942A

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

The invention is a translation method, and a system thereof, capable of correcting Chinese character pinyin errors. The method comprises the following steps: 210) converting the Chinese characters in a target sentence into pinyin syllables; 220-1) analyzing the structure of the pinyin syllables to generate pinyin phrases; 220-2) replacing the pinyin phrases with correct Chinese character phrases; 230) translating the new sentence formed by the correct Chinese character phrases. The invention addresses the technical problem in the background art that, when translating Chinese characters, a translation program cannot judge whether the target sentence contains wrong characters. By converting the target sentence into pinyin and then converting it back into Chinese characters, the translation is performed on the correct Chinese characters.

Description

Translation method capable of correcting Chinese character pinyin errors, and system thereof
Technical field
The present invention relates to a translation method and a system thereof, and in particular to a translation method and system that convert a target sentence into pinyin phrases in order to correct Chinese character pinyin errors.
Background art
Many electronic devices and web pages that can run a translation program are now available. These programs rely on the concept of fuzzy translation: a fuzzy-logic algorithm proposes the more likely collocations for ambiguous words, and a fuzzy-matching algorithm chooses the correct word segmentation for sentences that can be segmented in several ways. On the whole, users still find the output of current translation programs acceptable.
However, most current programs that translate Chinese translate the target sentence directly. That is, after analyzing the target sentence, the program translates each phrase it has identified and assembles the translation result. If, for whatever reason, the user mistakenly enters the homophone rendered here as "stolen and hold" instead of the idiom meaning "run in the opposite direction", an existing translation program may output an English result such as "to steal holding", which is certainly not the result "to run in the opposite direction" that the user wanted to see.
Summary of the invention
To solve the technical problem in the background art, namely that a program translating Chinese characters cannot judge whether the target sentence contains wrong characters, the present invention provides a translation system and method capable of correcting Chinese character pinyin errors. The target sentence is converted into pinyin syllables and then converted back into Chinese characters, so that the translation is performed on the correct Chinese characters.
The technical solution of the present invention is a translation method capable of correcting Chinese character pinyin errors, characterized in that the method comprises the following steps:
210) converting the Chinese characters in the target sentence into pinyin syllables;
220-1) analyzing the structure of the plurality of pinyin syllables to produce pinyin phrases;
220-2) replacing the pinyin phrases with correct Chinese character phrases;
230) translating the new sentence formed by the correct Chinese character phrases.
In step 220-1), the structure of the pinyin syllables is analyzed by comparing the pinyin syllables with the pinyin phrases in the pinyin phrase column of a phrase data table, so that the pinyin syllables are grouped into pinyin phrases.
In step 220-2), the correct Chinese character phrase corresponding to each pinyin phrase is looked up in the Chinese phrase column of the phrase data table and replaces that pinyin phrase.
Step 220-2) is followed by step 220-3): replacing each single-syllable pinyin word that remains with the corresponding single character from the original target sentence.
A translation system capable of correcting Chinese character pinyin errors is characterized in that it comprises: a conversion module 110 for converting a target sentence containing a wrong Chinese character into a plurality of pinyin syllables; an analysis module 120 for analyzing the structure of the pinyin syllables to produce at least one pinyin phrase; a storage module 130 for storing in advance at least one correct Chinese character phrase corresponding to each pinyin phrase; a replacement module 140 for replacing the pinyin phrase with the correct Chinese character phrase; and a translation module 150 for translating the new sentence formed by the correct Chinese character phrases. The conversion module 110 is connected to the analysis module 120, the analysis module 120 is connected to the replacement module 140, the storage module 130 is connected to both the analysis module 120 and the replacement module 140, and the replacement module 140 is connected to the translation module 150.
The system further comprises a selection module 190 for selecting one of the correct Chinese character phrases when more than one correct Chinese character phrase corresponds to a pinyin phrase. The replacement module 140 is connected to the selection module 190, and the selection module 190 is connected to the translation module 150.
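The connections between the modules can be pictured as a simple data flow. The following is a minimal sketch only, assuming each module is modeled as a plain Python callable; the function name build_system and all parameter names are hypothetical and do not appear in the patent.

```python
from typing import Callable, Dict, List

# Assumed shape of the stored data: pinyin phrase -> candidate Chinese phrases.
PhraseTable = Dict[str, List[str]]


def build_system(
    convert: Callable[[str], List[str]],                           # conversion module 110
    analyze: Callable[[List[str], PhraseTable], List[str]],        # analysis module 120
    storage: PhraseTable,                                          # storage module 130
    replace: Callable[[List[str], PhraseTable], List[List[str]]],  # replacement module 140
    select: Callable[[List[str]], str],                            # selection module 190
    translate: Callable[[str], str],                               # translation module 150
) -> Callable[[str], str]:
    """Wire the modules together in the order the description connects them."""

    def run(sentence: str) -> str:
        syllables = convert(sentence)             # 110 feeds 120
        phrases = analyze(syllables, storage)     # 120 consults 130
        candidates = replace(phrases, storage)    # 140 consults 130
        chosen = [select(c) for c in candidates]  # 140 feeds 190
        return translate("".join(chosen))         # 190 feeds 150

    return run
```

The storage is passed to both the analysis and the replacement callables, mirroring the statement that the storage module 130 is connected to both modules. The behavior of each stage is left to the caller.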
The present invention converts a target sentence containing wrong Chinese characters into pinyin syllables, analyzes the structure of the pinyin syllables to produce pinyin phrases, converts those pinyin phrases into correct Chinese character phrases, and finally translates the new sentence formed from the correct phrases. This solves the following problem: when the user enters homophonic wrong characters in the sentence to be translated, an existing translation program cannot judge whether the input contains wrong characters and translates it directly, which wastes system resources and provides no fault tolerance.
Description of the drawings
Fig. 1 is a system architecture diagram of the translation system capable of correcting Chinese character pinyin errors according to the present invention;
Fig. 2 is a flowchart of the translation method capable of correcting Chinese character pinyin errors according to the present invention;
Fig. 3 is a detailed flowchart of analyzing and replacing the target sentence in pinyin form according to the present invention;
Fig. 4 is the phrase correspondence table of an embodiment of the present invention.
Embodiments
The translation system of the present invention, which can correct wrong Chinese characters, is shown in Fig. 1. The system comprises a conversion module 110, an analysis module 120, a replacement module 140, a storage module 130 and a translation module 150. The conversion module 110 converts a target sentence containing wrong Chinese characters into a plurality of pinyin syllables. The analysis module 120 analyzes the structure of the pinyin syllables produced by the conversion module 110 and produces at least one pinyin phrase. The storage module 130 stores the correct Chinese character phrases corresponding to the pinyin phrases. The replacement module 140 reads from the storage module 130 the correct Chinese character phrase corresponding to each pinyin phrase and uses it to replace that pinyin phrase. The translation module 150 translates the new sentence formed by the correct Chinese character phrases.
The operation of the system and method of the present invention is explained below with an embodiment, with reference to Fig. 2, the flowchart of the translation method, and Fig. 3, the detailed flowchart of analyzing and replacing the target sentence in pinyin form.
When the user uses an electronic device 100 equipped with the present invention to translate a target sentence, suppose the user intends to enter "this runs in the opposite direction to the democratic spirit" but mistakenly enters the homophonic sentence rendered as "this is spiritual stolen and hold with democracy", and then starts the translation. The conversion module 110 converts the target sentence into a plurality of pinyin syllables (step 210). This embodiment uses Hanyu Pinyin as an example, so the conversion module 110 converts the target sentence into "zhe shi tong min zhu jing shen bei dao er chi de"; the conversion module 110 is not, however, limited to Hanyu Pinyin. The conversion itself can use well-known means such as a correspondence table and is not described in detail here, and the present invention is not limited to using a correspondence table.
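Step 210 with a correspondence table can be sketched as a dictionary lookup. This is an illustration only: the table contents, the name to_pinyin and the Chinese characters chosen for the mistyped sentence are assumptions, since the text gives only the pinyin and an English gloss. The sketch also ignores tones and characters with more than one reading.

```python
from typing import List

# Hypothetical correspondence table: one toneless pinyin syllable per character.
# The characters are illustrative guesses for the mistyped sentence.
CHAR_TO_PINYIN = {
    "这": "zhe", "是": "shi", "同": "tong", "民": "min", "主": "zhu",
    "精": "jing", "神": "shen", "被": "bei", "盗": "dao", "而": "er",
    "持": "chi", "的": "de",
}


def to_pinyin(sentence: str) -> List[str]:
    """Step 210: map every Chinese character to its pinyin syllable."""
    syllables = []
    for ch in sentence:
        if ch not in CHAR_TO_PINYIN:
            raise KeyError(f"no pinyin entry for {ch!r}")
        syllables.append(CHAR_TO_PINYIN[ch])
    return syllables


print(" ".join(to_pinyin("这是同民主精神被盗而持的")))
# prints: zhe shi tong min zhu jing shen bei dao er chi de
```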
Next, the analysis module 120 analyzes the structure of the pinyin syllables to find each pinyin phrase (token) in the target sentence. That is, it groups the syllables into pinyin phrases such as "zhe shi" (this is), "tong" (with), "min zhu jing shen" (democratic spirit), "bei dao er chi" (entered as "stolen and hold") and "de" (a particle) (step 220-1). One way to perform this analysis is to use a phrase data table 300, although the present invention is not limited to this approach.
Referring to Fig. 4, the analysis module 120 compares the pinyin syllables with each pinyin phrase in the pinyin phrase column 310 of the phrase data table 300. Take the eighth syllable, "bei": because the phrase data table 300 stores "bei dao er chi", the analysis module 120 appends the ninth syllable "dao", the tenth syllable "er" and the eleventh syllable "chi" in turn, as long as each comparison still matches. The comparison with "bei dao er chi" shows that this string is recorded in the pinyin phrase column 310, so "bei dao er chi" becomes a pinyin phrase. String comparison is a known technique and is described only briefly here. The replacement module 140 then reads, from the Chinese phrase column 320 of the phrase data table 300, the correct Chinese character phrase "run in the opposite direction" that corresponds to the recorded pinyin phrase "bei dao er chi", and uses it to replace the pinyin phrase (step 220-2).
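The comparison described above, in which syllables are appended one at a time while the growing string still matches a stored entry, can be sketched as follows. This is one possible reading of steps 220-1 and 220-2, not the patent's definitive algorithm. The table contents, the Chinese characters and the name segment_and_replace are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical slice of phrase data table 300:
# pinyin phrase column 310 -> Chinese phrase column 320.
PHRASE_TABLE: Dict[str, str] = {
    "zhe shi": "这是",
    "min zhu jing shen": "民主精神",
    "bei dao er chi": "背道而驰",
}


def segment_and_replace(
    syllables: List[str], table: Dict[str, str]
) -> List[Tuple[str, Optional[str]]]:
    """Group syllables into pinyin phrases and attach the stored Chinese phrase.

    Unmatched syllables are returned with None as their replacement.
    """
    keys = [tuple(k.split()) for k in table]
    result: List[Tuple[str, Optional[str]]] = []
    i = 0
    while i < len(syllables):
        matched = 0  # length of the longest stored phrase starting at i
        n = 1
        while i + n <= len(syllables):
            window = tuple(syllables[i:i + n])
            # Stop extending once no stored phrase starts with the window.
            if not any(k[:n] == window for k in keys):
                break
            if window in keys:
                matched = n
            n += 1
        if matched:
            phrase = " ".join(syllables[i:i + matched])
            result.append((phrase, table[phrase]))  # step 220-2
            i += matched
        else:
            result.append((syllables[i], None))     # left for step 220-3
            i += 1
    return result
```

With the example sentence, the eighth syllable "bei" starts a window that grows through "dao", "er" and "chi" before it is found in the table, which matches the walk-through in the text.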
After the replacement module 140 has replaced each pinyin phrase with its correct Chinese character phrase, the sentence to be translated has been corrected from "this is spiritual stolen and hold with democracy" to "this runs in the opposite direction to the democratic spirit". The translation module 150 can then translate the new sentence (step 230). This embodiment translates into English as an example, although the present invention is not limited to English. After the translation module 150 has translated the new target sentence, the user obtains the correct translation result "This ran in the opposite direction with democratic spirit." Because sentence translation is a known technique, it is not described here. In this way the present invention solves the problem that current translation programs cannot correct Chinese characters entered wrongly through pinyin.
To save storage space and improve lookup performance, the storage module 130 may omit pinyin entries that correspond to a single Chinese character, such as "tong" and "de" in the embodiment above. In that case, after reading the pinyin syllables (step 221), the analysis module 120 cannot find a matching pinyin phrase in the phrase data table 300 held by the storage module 130 (step 222), so the replacement module 140 reads no correct Chinese character phrase for "tong" or "de" and performs no replacement for them (step 223). After the replacement module 140 has run, the sentence to be translated would therefore read "this is tong democratic spirit run in the opposite direction de". The replacement module 140 must therefore also replace the unreplaced "tong" and "de" with the Chinese characters originally entered by the user, namely the character for "with" and the particle "de" (step 220-3). The translation module 150 can then correctly translate the new sentence "this runs in the opposite direction to the democratic spirit".
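Step 220-3 can be sketched as a positional fallback to the user's original input. The sketch assumes that every pinyin syllable corresponds to exactly one character of the original sentence, so character positions can be recovered by counting syllables. The function name restore_single_characters and the Chinese characters in the usage example are hypothetical.

```python
from typing import List, Optional, Tuple


def restore_single_characters(
    original: str, segments: List[Tuple[str, Optional[str]]]
) -> str:
    """Build the corrected sentence from (pinyin phrase, replacement) pairs.

    Where no replacement was found, the characters the user originally
    typed at the same positions are kept (step 220-3).
    """
    out = []
    pos = 0
    for pinyin, replacement in segments:
        length = len(pinyin.split())  # one syllable per original character
        if replacement is not None:
            out.append(replacement)
        else:
            out.append(original[pos:pos + length])
        pos += length
    return "".join(out)


segments = [
    ("zhe shi", "这是"),
    ("tong", None),
    ("min zhu jing shen", "民主精神"),
    ("bei dao er chi", "背道而驰"),
    ("de", None),
]
print(restore_single_characters("这是同民主精神被盗而持的", segments))
# prints: 这是同民主精神背道而驰的
```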
In addition, correct Chinese character phrases may be complete homophones of one another. For example, when the replacement module 140 is about to replace "ce shi" (step 220-2), the phrase data table 300 may record more than one corresponding correct Chinese character phrase, such as "plotter" and "test". The present invention therefore also includes a selection module 190, which uses fuzzy translation to select one of "plotter" and "test". If the selection module 190 cannot judge which of the candidates is more suitable, it can prompt the user to choose. Once the selection module 190 has selected a correct Chinese character phrase, the translation module 150 translates the new sentence (step 230).
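The behavior of the selection module 190 can be sketched as "pick automatically when one candidate clearly wins, otherwise ask the user". The patent does not specify how fuzzy translation ranks the candidates, so the score callable below is a hypothetical stand-in, as are the margin threshold, the name select_phrase and the Chinese characters used in the example.

```python
from typing import Callable, List


def select_phrase(
    candidates: List[str],
    score: Callable[[str], float],          # hypothetical stand-in for fuzzy translation
    ask_user: Callable[[List[str]], str],   # prompt shown when no clear winner exists
    margin: float = 0.1,                    # assumed threshold for a "clear" winner
) -> str:
    """Choose one correct Chinese phrase among homophonic candidates."""
    if len(candidates) == 1:
        return candidates[0]
    ranked = sorted(candidates, key=score, reverse=True)
    if score(ranked[0]) - score(ranked[1]) >= margin:
        return ranked[0]
    # The module cannot judge which candidate fits better: defer to the user.
    return ask_user(candidates)


# Illustrative candidates for "ce shi"; the characters are assumptions.
scores = {"测试": 0.9, "策士": 0.2}
print(select_phrase(["策士", "测试"], lambda c: scores[c], lambda cs: cs[0]))
# prints: 测试
```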
The translation method of the present invention can be implemented in hardware, software, or a combination of hardware and software. It can be realized in a centralized manner in a single computer system, or in a distributed manner in which different components are spread across several interconnected computer systems.

Claims (6)

1. A translation method capable of correcting Chinese character pinyin errors, characterized in that the method comprises the following steps:
210) converting the Chinese characters in a target sentence into pinyin syllables;
220-1) analyzing the structure of the plurality of pinyin syllables to produce pinyin phrases;
220-2) replacing the pinyin phrases with correct Chinese character phrases;
230) translating the new sentence formed by the correct Chinese character phrases.
2. The translation method capable of correcting Chinese character pinyin errors according to claim 1, characterized in that in step 220-1) the structure of the pinyin syllables is analyzed by comparing the pinyin syllables with the pinyin phrases in the pinyin phrase column of a phrase data table, so that the pinyin syllables are grouped into pinyin phrases.
3. The translation method capable of correcting Chinese character pinyin errors according to claim 2, characterized in that in step 220-2) the correct Chinese character phrase corresponding to the pinyin phrase is looked up in the Chinese phrase column of the phrase data table and replaces the pinyin phrase.
4. The translation method capable of correcting Chinese character pinyin errors according to claim 1, 2 or 3, characterized in that step 220-2) is followed by step 220-3): replacing each single-syllable pinyin word among the pinyin syllables with the corresponding single character from the original target sentence.
5. A translation system capable of correcting Chinese character pinyin errors, characterized in that the system comprises: a conversion module (110) for converting a target sentence containing a wrong Chinese character into a plurality of pinyin syllables; an analysis module (120) for analyzing the structure of the plurality of pinyin syllables to produce at least one pinyin phrase; a storage module (130) for storing in advance at least one correct Chinese character phrase corresponding to the pinyin phrase; a replacement module (140) for replacing the pinyin phrase with the correct Chinese character phrase; and a translation module (150) for translating the new sentence formed by the correct Chinese character phrases; wherein the conversion module (110) is connected to the analysis module (120), the analysis module (120) is connected to the replacement module (140), the storage module (130) is connected to both the analysis module (120) and the replacement module (140), and the replacement module (140) is connected to the translation module (150).
6. The translation system capable of correcting Chinese character pinyin errors according to claim 5, characterized in that the system further comprises a selection module (190) for selecting one of the correct Chinese character phrases when more than one correct Chinese character phrase corresponds to the pinyin phrase, wherein the replacement module (140) is connected to the selection module (190) and the selection module (190) is connected to the translation module (150).
CNA2007100190400A 2007-11-09 2007-11-09 Translation method capable of correcting Chinese characters phonetic error and system thereof Pending CN101158942A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2007100190400A CN101158942A (en) 2007-11-09 2007-11-09 Translation method capable of correcting Chinese characters phonetic error and system thereof

Publications (1)

Publication Number Publication Date
CN101158942A (en) 2008-04-09

Family

ID=39307045

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2007100190400A Pending CN101158942A (en) 2007-11-09 2007-11-09 Translation method capable of correcting Chinese characters phonetic error and system thereof

Country Status (1)

Country Link
CN (1) CN101158942A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102411595A (en) * 2010-09-25 2012-04-11 英业达股份有限公司 Input correction system and method translation inquiry
CN102541837A (en) * 2010-12-22 2012-07-04 张家港市赫图阿拉信息技术有限公司 Method for correcting inputted Chinese characters
CN104750672A (en) * 2013-12-27 2015-07-01 重庆新媒农信科技有限公司 Chinese word error correction method used in search and device thereof
CN104750672B (en) * 2013-12-27 2017-11-21 重庆新媒农信科技有限公司 A kind of Chinese vocabulary error correction method and its device being applied in search
CN106980390A (en) * 2016-01-18 2017-07-25 富士通株式会社 Supplementary translation input method and supplementary translation input equipment
CN111079417A (en) * 2019-12-17 2020-04-28 米哈游科技(上海)有限公司 Wrongly written character checking method, wrongly written character checking device, server and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication