CN103164395B - Chinese-Kirgiz electronic dictionary and method for automatic Chinese-Kirgiz translation - Google Patents

Chinese-Kirgiz electronic dictionary and method for automatic Chinese-Kirgiz translation

Info

Publication number
CN103164395B
Authority
CN
China
Prior art keywords
language
word
chinese
input
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201110426747.XA
Other languages
Chinese (zh)
Other versions
CN103164395A (en
Inventor
尼加提·纳吉米
买合木提·买买提
帕肉克·司地克
马斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
XINJIANG INFORMATION INDUSTRY Co Ltd
Original Assignee
XINJIANG INFORMATION INDUSTRY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by XINJIANG INFORMATION INDUSTRY Co Ltd filed Critical XINJIANG INFORMATION INDUSTRY Co Ltd
Priority to CN201110426747.XA priority Critical patent/CN103164395B/en
Publication of CN103164395A publication Critical patent/CN103164395A/en
Application granted granted Critical
Publication of CN103164395B publication Critical patent/CN103164395B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention discloses a Chinese-Kirgiz electronic dictionary and a method for automatically translating between Chinese and Kirgiz. The dictionary comprises a language identification module, a retrieval module, a retrieval-combination output module, a display module, a speech recognition module and a voice output module. After the language of the input text has been identified, the retrieval module matches the input text against the entries in the base corpus. Based on the word to be translated that the retrieval module retrieves from the base corpus, the retrieval-combination output module obtains the Chinese explanation sentence and the Kirgiz explanation sentence corresponding to the meaning of that word. The speech recognition module then recognizes these explanation sentences (the Kirgiz sentence after a syllable-splitting step), calls the recorded human voice bank or the synthesized Kirgiz voice bank, and reads the input text, and the loudspeaker plays the speech for the input text in sequence. The electronic dictionary is rationally structured. The method changes the existing dictionary technology for Chinese-Kirgiz mutual translation, improves the efficiency of mutual translation between Chinese and Kirgiz, and improves the spoken playback of Chinese and Kirgiz text.

Description

Chinese-Kirgiz electronic dictionary and method for automatic Chinese-Kirgiz translation
Technical field
The invention belongs to the technical field of machine translation. It relates to language conversion technology that uses computer software and hardware to translate between Chinese and the Kirgiz language, and in particular to a Chinese-Kirgiz electronic dictionary and a method for automatic Chinese-Kirgiz translation.
Background art
In today's information society, people demand ever faster and better ways to obtain, query and translate language information of all kinds, and a wide range of electronic dictionary products has been developed in response. These range from large electronic multimedia encyclopedias containing hundreds of thousands of entries and tens of thousands of media items to small handheld instant translators containing several thousand entries, all of which have been well received by users. The electronic dictionary has become an aid for language learning, translation and quick lookup. Abroad, as machine translation systems and natural language processing systems have moved into practical use, the machine dictionary has become a focus of development, and a growing number of language translation specialists regard the scale and quality of the machine dictionary as the key to the success or failure of such systems. As early as 1986, Japan's MITI provided 100 million US dollars to fund a nine-year development plan for the electronic dictionary (EDR). The European Community likewise funded several machine dictionary research projects, including ACQUILEX (The Acquisition of Lexical Knowledge), whose goal was to acquire lexical knowledge automatically from several machine-readable dictionaries (MRD) in order to build a multilingual lexical knowledge base (LKB, Lexical Knowledge Base) supporting natural language processing, and on that basis to develop several large machine dictionaries for each language, including basic dictionaries, terminology dictionaries, collocation dictionaries, concept classification dictionaries, concept description dictionaries and grammar dictionaries. Many kinds of commercial electronic dictionary are now available, such as Encyclopaedia Britannica, Ke General encyclopedia and ENCARTA.
In China, research on machine translation dictionaries began in the 1950s and 1960s and received full attention after the reform and opening-up. In the late 1980s, experts in Chinese information processing began research on machine dictionaries, and in the early 1990s research on machine dictionaries for information processing was formally listed in the national Seventh, Eighth and Ninth Five-Year Plans. Basic research projects such as "Research on Modern Chinese Vocabulary for Information Processing", "Valence-Based Chinese Semantic Dictionary" and "Grammatical Information Dictionary of Modern Chinese" were carried out, and on this basis relatively mature information products such as "Encyclopedia of China", "Kingsoft PowerWord" and "Dongfang Dadian" were developed and were well received by users.
In recent years, with the sustained and rapid development of minority-language informatization, electronic dictionaries for minority languages in Xinjiang, China have also developed considerably. Most of them, however, are based on existing ordinary Chinese-Uyghur electronic dictionaries. They do not meet the actual needs of a broader range of users, and their support for the translation of other minority languages is seriously deficient.
Summary of the invention
One object of the invention is to provide a Chinese-Kirgiz electronic dictionary that is rationally structured and highly versatile.
This object is achieved as follows. A Chinese-Kirgiz electronic dictionary consists of a language identification module, a retrieval module, a retrieval-combination output module, a display module, a speech recognition module and a voice output module. The language identification module is connected through its corresponding interfaces to the interface of the display module and to the interface of the retrieval module. The output interface of the retrieval module is connected to the input interface of the retrieval-combination output module, the output interface of the retrieval-combination output module is connected to the input interface of the speech recognition module, and the output interface of the speech recognition module is connected to the input interface of the voice output module.
A further object of the invention is to provide a method by which the Chinese-Kirgiz electronic dictionary automatically translates between Chinese and Kirgiz. The method changes the existing traditional, ordinary dictionary technology for mutual translation between Chinese and the Kirgiz language, improves the efficiency of that translation, and improves the spoken playback of Chinese and Kirgiz text (in the original, the Kirgiz language is abbreviated as "Ke language" or "Ke script").
This object is achieved as follows. A method by which the Chinese-Kirgiz electronic dictionary automatically translates between Chinese and Kirgiz comprises the following steps, carried out in sequence:
(I) The display module 1 displays the input text and builds a word-capture window. Using the word-capture window and the screen word-capture method, the language identification module 2 obtains the character code region corresponding to the input text displayed by the display module 1, compares the input text with the code characters in the stored UNICODE standard character set, determines whether the language of the input text is Chinese or Kirgiz, and then passes the language-identified input text to the retrieval module 3;
(II) The retrieval module 3 obtains the retrieval mode and compares the language-identified input text with the characters stored in the Chinese-Kirgiz corpus and the Kirgiz-Chinese corpus, which are stored side by side in the base corpus held in memory, so as to retrieve from the base corpus a character combination identical or corresponding to the characters of the input text. This confirms that the input text is a known character or word stored in the base corpus, or further auto-completes it into a complete Chinese word or Kirgiz letter combination. If no character combination (Chinese word or Kirgiz word) identical or corresponding to the input text can be retrieved from the Chinese-Kirgiz corpus and the Kirgiz-Chinese corpus, the retrieval module 3 judges the input text to be unknown, and it cannot be confirmed or accepted by the language identification module 2;
(III) The language identification module 2 receives the character combination retrieved by the retrieval module 3 and calls up, from the Chinese-Kirgiz corpus and the Kirgiz-Chinese corpus stored in the base corpus, the character combination in the other language that corresponds in meaning to the retrieved character combination and differs in language from the input text, that is, the translated Chinese character, Chinese word or Kirgiz word. The input text and/or the other-language character combination called up by the language identification module 2 is then passed, via the retrieval module 3 or directly, to the retrieval-combination output module 4;
(IV) Based on the input text and/or the other-language character combination called up from the base corpus by the language identification module 2, the retrieval-combination output module 4 obtains, from the Chinese-Chinese corpus and the Kirgiz-Kirgiz corpus stored side by side in the base corpus, a Chinese explanation sentence explaining the meaning of the character combination retrieved by the retrieval module 3. Using the mapping table between Cyrillic-script Kirgiz words and Arabic-script Kirgiz words, it also obtains a Kirgiz explanation sentence, expressed in Cyrillic or Arabic letters, corresponding in meaning to the other-language character combination. These sentences explain the meaning of the character combination called up from the base corpus by the language identification module 2. The retrieval-combination output module 4 then outputs the explanation sentences it has retrieved to the speech recognition module 5;
(V) When the speech recognition module 5 judges that the explanation sentence it has received is a Chinese explanation sentence, it uses the recorded human Chinese voice bank stored in the speech database held in memory to match, one by one and in Chinese pronunciation order, a pronunciation to each Chinese word in the received sentence. The buffered Chinese pronunciation signals matched in order to those words are passed in turn to the voice output module 6. Once the pronunciation signals for all the Chinese words in the sentence have been detected in order, the voice output module 6 reads them one by one, and the loudspeaker in the voice output module 6 plays in turn the Chinese speech corresponding to each Chinese word in the received sentence;
When the speech recognition module 5 judges that the explanation sentence it has received is a Kirgiz explanation sentence consisting of Kirgiz words written in Arabic or Cyrillic letters, it uses the recorded human Kirgiz voice bank stored in the speech database to match, one by one and in Kirgiz pronunciation order, a pronunciation to each Kirgiz word in the received sentence. The buffered Kirgiz pronunciation signals matched in order to those words are passed in turn to the voice output module 6. Once the pronunciation signals for all the Kirgiz words in the sentence have been detected in order, the voice output module 6 reads them one by one, and the loudspeaker in the voice output module 6 plays in turn the Kirgiz speech matched to each Kirgiz word in the sentence. If the speech recognition module 5 judges that the received explanation sentence is a Kirgiz explanation sentence but cannot match speech to it, the sentence is treated as Kirgiz text written in Arabic or Cyrillic letters, and the synthesized Kirgiz voice bank stored in the speech database is called to perform syllable-based speech synthesis on the text. A sentence-to-word and word-to-syllable splitting method cuts the Kirgiz text into Kirgiz words known to the synthesized speech database. The recorded human Kirgiz voice bank and/or the synthesized Kirgiz voice bank is then used to match, one by one and in Kirgiz pronunciation order, a pronunciation to each Kirgiz word of the text. The buffered Kirgiz pronunciation signals matched to the words cut in order from the text are passed in turn to the voice output module 6. Once they have been detected in order, the voice output module 6 reads them one by one, and the loudspeaker in the voice output module 6 plays in turn the Kirgiz speech matched to each Kirgiz word in the text.
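The language check in step (I) can be sketched in a few lines. This is only an illustration under stated assumptions, not the patented implementation: it assumes that Chinese input falls in the CJK Unified Ideographs block and that Kirgiz input falls in the Cyrillic or Arabic blocks of Unicode, and the function name `identify_language` and the labels `"zh"`, `"ky"` and `"unknown"` are hypothetical.

```python
def identify_language(text: str) -> str:
    """Classify text as Chinese ('zh'), Kirgiz ('ky') or 'unknown'.

    Assumption: the decision is made purely from Unicode code-point ranges.
    """
    chinese = 0
    kirgiz = 0
    for ch in text:
        cp = ord(ch)
        if 0x4E00 <= cp <= 0x9FFF:      # CJK Unified Ideographs
            chinese += 1
        elif 0x0400 <= cp <= 0x04FF:    # Cyrillic block
            kirgiz += 1
        elif 0x0600 <= cp <= 0x06FF:    # Arabic block
            kirgiz += 1
    if chinese == 0 and kirgiz == 0:
        return "unknown"                # digits, Latin letters, punctuation only
    return "zh" if chinese >= kirgiz else "ky"
```

Counting characters rather than testing only the first one keeps the sketch usable on mixed input, such as a Kirgiz word followed by a digit.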
The invention is a bidirectional Chinese-Kirgiz multimedia electronic dictionary based on computational linguistics, ethnology, sociology, pragmatics, translation studies and computer information processing technology. It uses a bilingual Chinese-Kirgiz encoding format based on the UNICODE international standard and provides bidirectional (Chinese-to-Kirgiz and Kirgiz-to-Chinese) text input and the reading aloud of Chinese and Kirgiz words and text. It can capture Chinese and Kirgiz characters from the screen under different operating systems and convert between the Kirgiz character encodings used inside and outside China. It also offers a multilingual Chinese and Kirgiz interface, fast retrieval and fuzzy retrieval of Chinese and Kirgiz words, direct input of Kirgiz script, management of the dictionary's word banks, and auxiliary functions such as dictionary settings, dictionary tools, dictionary attachments and online upgrading.
The invention provides an Arabic-script input method for the Kirgiz language and so does not depend on other Kirgiz input methods, which improves usability. It provides real-time bidirectional Chinese-Kirgiz translation by screen word capture, which is convenient for users of Chinese and Kirgiz. It provides standard read-aloud pronunciation of Chinese and Kirgiz words and phrases, making it a powerful tool for learning Chinese and Kirgiz. It has a large Kirgiz corpus, explanations of words and phrases, and conversion and display between Kirgiz in Cyrillic script (Kyrgyzstan) and Kirgiz in Arabic script (Xinjiang, China). This makes it easier for non-Kirgiz speakers to learn the Kirgiz language, history and folk customs, and provides many examples that help them understand the geography, regions and character of Xinjiang and Kyrgyzstan.
The invention addresses the language barrier that Kirgiz people in China and abroad whose mother tongue is Kirgiz face in acquiring modern knowledge and in daily life, and lets Kirgiz-language learners in China and abroad translate quickly and so obtain information of all kinds. It helps Kirgiz people learn Chinese and helps Han Chinese and foreigners learn Kirgiz. It is a learning and translation tool for users of Kirgiz and Chinese, and it is of far-reaching significance for improving the spoken and written Chinese of the Kirgiz people. It also lays a solid foundation for building a future Chinese-Kirgiz machine translation dictionary base and for developing Uzbek-Chinese and Turkish-Chinese bidirectional electronic dictionaries and computer-aided translation systems.
The technical features of the invention are:
① It provides bidirectional word translation between Chinese and Kirgiz: entering a word in either language into the Chinese-Kirgiz electronic dictionary returns its definition in the other language.
② It provides a built-in Kirgiz input method that supports the international UNICODE standard, so that a user with no Kirgiz input method installed can still enter standard Kirgiz text correctly.
③ It implements screen word capture for Kirgiz on the current mainstream Windows operating systems (Windows XP, Windows Server, Windows Vista, Windows 7).
④ It reads Kirgiz words and text aloud using statistical and phonetic methods. The spoken output is standard and clear, which is a comparatively advanced technical feature.
⑤ It provides additional functions such as online dictionary upgrading, dictionary settings, dictionary tools and dictionary attachments, which can be configured to the user's needs.
⑥ It provides a user-friendly multilingual dictionary interface, whose settings select the interface language and the translation direction.
⑦ It automatically identifies the language of the input text: it analyzes the input, determines its language and translates it.
⑧ The Chinese-Kirgiz dictionary contains nearly 100,000 entries, together with a recorded human voice bank and a synthesized read-aloud voice bank based on syllable splitting.
⑨ It converts and displays between Kirgiz in Cyrillic script (Central Asia, Kyrgyzstan) and Kirgiz in Arabic script (Xinjiang, China), showing both written forms together in the definition window, which effectively widens the invention's range of use.
The electronic dictionary of the invention is rationally structured and highly versatile. Its method changes the existing traditional, ordinary dictionary technology for mutual translation between Chinese and the Kirgiz language, improves the efficiency of that translation, and improves the spoken playback of Chinese and Kirgiz text.
Brief description of the drawings
Figure 1 is a schematic diagram of the module connections of the invention and of the overall flow of its method for automatic Chinese-Kirgiz translation.
Detailed description of the embodiments
As shown in Figure 1, a Chinese-Kirgiz electronic dictionary consists of a language identification module 2, a retrieval module 3, a retrieval-combination output module 4, a display module 1, a speech recognition module 5 and a voice output module 6. The language identification module 2 is connected through its corresponding interfaces to the interface of the display module 1 and to the interface of the retrieval module 3. The output interface of the retrieval module 3 is connected to the input interface of the retrieval-combination output module 4, the output interface of the retrieval-combination output module 4 is connected to the input interface of the speech recognition module 5, and the output interface of the speech recognition module 5 is connected to the input interface of the voice output module 6.
As shown in Figure 1, a method by which the Chinese-Kirgiz electronic dictionary automatically translates between Chinese and Kirgiz comprises the following steps, carried out in sequence:
(I) The display module 1 displays the input text and builds a word-capture window. Using the word-capture window and the screen word-capture method, the language identification module 2 obtains the character code region corresponding to the input text displayed by the display module 1, compares the input text with the code characters in the stored UNICODE standard character set, determines whether the language of the input text is Chinese or Kirgiz, and then passes the language-identified input text to the retrieval module 3;
(II) The retrieval module 3 obtains the retrieval mode and compares the language-identified input text with the characters stored in the Chinese-Kirgiz corpus and the Kirgiz-Chinese corpus, which are stored side by side in the base corpus held in memory, so as to retrieve from the base corpus a character combination identical or corresponding to the characters of the input text. This confirms that the input text is a known character or word stored in the base corpus, or further auto-completes it into a complete Chinese word or Kirgiz letter combination. If no character combination (Chinese word or Kirgiz word) identical or corresponding to the input text can be retrieved from the Chinese-Kirgiz corpus and the Kirgiz-Chinese corpus, the retrieval module 3 judges the input text to be unknown, and it cannot be confirmed or accepted by the language identification module 2;
(III) The language identification module 2 receives the character combination retrieved by the retrieval module 3 and calls up, from the Chinese-Kirgiz corpus and the Kirgiz-Chinese corpus stored in the base corpus, the character combination in the other language that corresponds in meaning to the retrieved character combination and differs in language from the input text, that is, the translated Chinese character, Chinese word or Kirgiz word. The input text and/or the other-language character combination called up by the language identification module 2 is then passed, via the retrieval module 3 or directly, to the retrieval-combination output module 4;
(IV) Based on the input text and/or the other-language character combination called up from the base corpus by the language identification module 2, the retrieval-combination output module 4 obtains, from the Chinese-Chinese corpus and the Kirgiz-Kirgiz corpus stored side by side in the base corpus, a Chinese explanation sentence explaining the meaning of the character combination retrieved by the retrieval module 3. Using the mapping table between Cyrillic-script Kirgiz words and Arabic-script Kirgiz words, it also obtains a Kirgiz explanation sentence, expressed in Cyrillic or Arabic letters, corresponding in meaning to the other-language character combination. These sentences explain the meaning of the character combination called up from the base corpus by the language identification module 2. The retrieval-combination output module 4 then outputs the explanation sentences it has retrieved to the speech recognition module 5;
(V) When the speech recognition module 5 judges that the explanation sentence it has received is a Chinese explanation sentence, it uses the recorded human Chinese voice bank stored in the speech database held in memory to match, one by one and in Chinese pronunciation order, a pronunciation to each Chinese word in the received sentence. The buffered Chinese pronunciation signals matched in order to those words are passed in turn to the voice output module 6. Once the pronunciation signals for all the Chinese words in the sentence have been detected in order, the voice output module 6 reads them one by one, and the loudspeaker in the voice output module 6 plays in turn the Chinese speech corresponding to each Chinese word in the received sentence;
When the speech recognition module 5 judges that the explanation sentence it has received is a Kirgiz explanation sentence consisting of Kirgiz words written in Arabic or Cyrillic letters, it uses the recorded human Kirgiz voice bank stored in the speech database to match, one by one and in Kirgiz pronunciation order, a pronunciation to each Kirgiz word in the received sentence. The buffered Kirgiz pronunciation signals matched in order to those words are passed in turn to the voice output module 6. Once the pronunciation signals for all the Kirgiz words in the sentence have been detected in order, the voice output module 6 reads them one by one, and the loudspeaker in the voice output module 6 plays in turn the Kirgiz speech matched to each Kirgiz word in the sentence. If the speech recognition module 5 judges that the received explanation sentence is a Kirgiz explanation sentence but cannot match speech to it, the sentence is treated as Kirgiz text written in Arabic or Cyrillic letters, and the synthesized Kirgiz voice bank stored in the speech database is called to perform syllable-based speech synthesis on the text. A sentence-to-word and word-to-syllable splitting method cuts the Kirgiz text into Kirgiz words known to the synthesized speech database. The recorded human Kirgiz voice bank and/or the synthesized Kirgiz voice bank is then used to match, one by one and in Kirgiz pronunciation order, a pronunciation to each Kirgiz word of the text. The buffered Kirgiz pronunciation signals matched to the words cut in order from the text are passed in turn to the voice output module 6. Once they have been detected in order, the voice output module 6 reads them one by one, and the loudspeaker in the voice output module 6 plays in turn the Kirgiz speech matched to each Kirgiz word in the text.
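The fallback in step (V), from recorded whole-word speech to syllable-based synthesis, might look roughly like the sketch below. Everything in it is an assumption made for illustration: the two voice banks are modelled as plain dictionaries mapping text to audio clip names, the syllable splitter is passed in as a function, and the names `build_playlist`, `recorded_bank` and `syllable_bank` do not come from the patent.

```python
from typing import Callable, Dict, List


def build_playlist(
    sentence: str,
    recorded_bank: Dict[str, str],
    syllable_bank: Dict[str, str],
    split_syllables: Callable[[str], List[str]],
) -> List[str]:
    """Return audio clips, in pronunciation order, for a Kirgiz sentence.

    Each word is first looked up in the recorded human voice bank. Only if
    that fails is the word split into syllables and assembled from the
    synthesized voice bank. Syllables missing from both banks are skipped.
    """
    playlist: List[str] = []
    for word in sentence.split():
        clip = recorded_bank.get(word)
        if clip is not None:
            playlist.append(clip)            # whole-word recording found
            continue
        for syllable in split_syllables(word):
            syllable_clip = syllable_bank.get(syllable)
            if syllable_clip is not None:
                playlist.append(syllable_clip)
    return playlist
```

The clips are returned as an ordered list so that a separate output stage, corresponding to the voice output module 6, can play them one after another.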
The retrieval mode is head retrieval, tail retrieval or contains retrieval. Head retrieval works as follows: A. the retrieval module 3 enters the characters of the input text one by one from left to right; B. the character combinations stored in the base corpus (the Chinese-Kirgiz corpus and the Kirgiz-Chinese corpus) are compared with the character combination entered so far. If a character combination identical to the entered one is found in the base corpus, retrieval stops, and the exact match of the input text is complete. If head retrieval cannot find in the base corpus a character combination identical to the input text, retrieval continues with tail retrieval, described next;
Tail retrieval works as follows: ① the retrieval module 3 enters the characters of the input text one by one from right to left (left and right as seen by the user); ② step B of head retrieval is then applied. If tail retrieval cannot find in the base corpus the characters identical to the input text, retrieval continues with contains retrieval, described next;
Contains retrieval matches the character combination of the input text from any direction and subsumes head retrieval and tail retrieval. Using contains retrieval, the retrieval module 3 should find in the base corpus the characters identical to the input text, finally completing the exact match of the input text.
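One plausible reading of the three retrieval modes is a cascade of prefix, suffix and substring matching. The sketch below takes that reading as an assumption. The function name `search`, the mode labels and the sample entries are hypothetical, and a real corpus would use an index rather than a linear scan.

```python
from typing import List, Tuple


def search(query: str, entries: List[str]) -> Tuple[str, List[str]]:
    """Try head, then tail, then contains retrieval; return (mode, hits)."""
    if not query:
        return "none", []
    modes = [
        ("head", lambda entry: entry.startswith(query)),   # match from the left
        ("tail", lambda entry: entry.endswith(query)),     # match from the right
        ("contains", lambda entry: query in entry),        # match anywhere
    ]
    for name, matches in modes:
        hits = [entry for entry in entries if matches(entry)]
        if hits:
            return name, hits    # stop at the first mode that finds something
    return "none", []
```

Because every head or tail match is also a contains match, the last mode acts as the catch-all, which agrees with the statement that contains retrieval subsumes the other two.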
The retrieval flow of the invention involves the language identification module 2, the retrieval module 3, the retrieval-combination output module 4 and the base corpus. Its main steps are: 1) the user enters the Chinese or Kirgiz text to be looked up using a Chinese or Kirgiz input method, and the language (Chinese or Kirgiz) of the input text (source-language word or text) is determined from the UNICODE encoding of the input data; 2) according to the retrieval mode set by the user and the language determined for the input text, the retrieval module 3 retrieves the Chinese and/or Kirgiz words or text that match the input text; 3) according to the result of that search, the identical input text, or the corresponding Chinese and/or Kirgiz word, is matched from the base corpus together with the Chinese and Kirgiz explanation example sentences of the same meaning, and these are combined to generate the data to be output.
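The three steps of this flow can be illustrated with a toy corpus. The sketch assumes that the base corpus can be modelled as two dictionaries, one per translation direction. The `Entry` fields, the function `look_up` and the single sample word pair are hypothetical, and the explanation strings are placeholders, not real dictionary content.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Entry:
    translation: str        # headword in the other language
    zh_explanation: str     # Chinese explanation sentence
    ky_explanation: str     # Kirgiz explanation sentence


# Hypothetical two-direction base corpus with one illustrative word pair.
ZH_TO_KY = {"书": Entry("китеп", "(Chinese explanation)", "(Kirgiz explanation)")}
KY_TO_ZH = {"китеп": Entry("书", "(Chinese explanation)", "(Kirgiz explanation)")}


def is_chinese(text: str) -> bool:
    """Step 1: decide the language from the code points of the input."""
    return any(0x4E00 <= ord(ch) <= 0x9FFF for ch in text)


def look_up(text: str) -> Optional[Dict[str, str]]:
    """Steps 2 and 3: search the matching corpus and combine the output."""
    source = "zh" if is_chinese(text) else "ky"
    corpus = ZH_TO_KY if source == "zh" else KY_TO_ZH
    entry = corpus.get(text)
    if entry is None:
        return None                     # unknown input text
    return {
        "input": text,
        "source_language": source,
        "translation": entry.translation,
        "zh_explanation": entry.zh_explanation,
        "ky_explanation": entry.ky_explanation,
    }
```

Returning one combined record mirrors the role of the retrieval-combination output module 4, which hands the translation and both explanation sentences on together.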
The screen word-capture and translation flow of the invention involves the language identification module 2, the display module 1, the retrieval module 3 and the word-capture database (basic corpus). Its main flow is: 1) the user inputs text (the word or text to be translated); 2) the language identification module 2 determines the language (Chinese or Kirgiz) of the input text (source-language word or text) from the UNICODE encoding of the input data; 3) according to the language determined by the language identification module 2, the retrieval module 3 obtains from the word-capture Chinese lexicon or the word-capture Kirgiz lexicon (the Chinese-Kirgiz corpus and/or the Kirgiz-Chinese corpus) the words and text that match the input text; 4) according to the final match obtained by the retrieval module 3, the display module 1 uses mixed-text layout and mixed text-and-image layout techniques to build the screen word-capture translation interface and display the final translation result (Chinese words and sentences or Kirgiz words and sentences).
The speech read-aloud flow of the invention involves the language identification module 2, the voice output module 6, the retrieval combination output module 4 and the speech database. Its main flow is: 1) the language identification module 2 determines the language of the Chinese or Kirgiz explanation sentence sent to it by the retrieval combination output module 4 (the text input in the screen word-capture step). If the explanation sentence is Chinese, the Chinese characters are matched from the human-voice Chinese speech library. If the explanation sentence is Kirgiz, it is further determined whether the Kirgiz explanation sentence received by the language identification module 2 is a single Kirgiz word. If it is a Kirgiz word, the identical or corresponding Kirgiz word is matched directly from the human-voice Kirgiz speech library; if the voice output module 6 cannot find a matching Kirgiz word, the flow passes to text processing. If the explanation sentence is Kirgiz text, Kirgiz sentence and syllable segmentation is applied: the Kirgiz text is segmented into Kirgiz words according to the features of the Kirgiz language, each Kirgiz word in the text is segmented into syllables according to the characteristics of Kirgiz, all syllables of each Kirgiz word are matched from the synthesized Kirgiz speech library, and the complete Kirgiz speech for the text is finally assembled; 2) the computer's speech device then detects the above Kirgiz text, reads it out, and outputs and plays it.
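The choice between the human-voice library and the syllable-based synthesized library can be sketched as follows. This is an illustration under stated assumptions: both libraries are modelled as plain dictionaries from text to an audio clip name, words are separated by whitespace, and the syllable splitter is passed in as a hypothetical callable because the patent does not spell out the Kirgiz syllabification rules.

```python
from typing import Callable, Dict, List


def plan_playback(
    text: str,
    human_bank: Dict[str, str],
    syllable_bank: Dict[str, str],
    split_syllables: Callable[[str], List[str]],
) -> List[str]:
    """Return the ordered list of audio clips to play for a Kirgiz text."""
    clips: List[str] = []
    for word in text.split():
        clip = human_bank.get(word)
        if clip is not None:
            # Whole word found in the human-voice library: use it directly.
            clips.append(clip)
            continue
        # Otherwise fall back to syllable-level clips from the synthesized library.
        for syllable in split_syllables(word):
            syllable_clip = syllable_bank.get(syllable)
            if syllable_clip is not None:
                clips.append(syllable_clip)
    return clips


# Hypothetical data: one recorded word, two synthesized syllables.
human = {"китеп": "human/kitep.wav"}
synth = {"мек": "synth/mek.wav", "теп": "synth/tep.wav"}
toy_splitter = lambda w: {"мектеп": ["мек", "теп"]}.get(w, [w])
print(plan_playback("китеп мектеп", human, synth, toy_splitter))
```

Syllables missing from the synthesized library are silently skipped here; a real system would need an explicit policy for that case.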
The user enters the text to be looked up (source-language word or text) with a keyboard input method in the input box shown on the screen. After the language identification step has identified the language of the input text (Chinese or Kirgiz), the retrieval module 3 uses any one of pinyin retrieval, head retrieval, tail retrieval, containing retrieval and exact-match retrieval to match the input text against the entries of the pinyin corpus, the Chinese-Kirgiz corpus and the Kirgiz-Chinese corpus, and retrieves from the basic corpus the word to be translated that corresponds to or is identical to the input text. Then, based on the word to be translated that the retrieval module 3 has retrieved from the basic corpus, the retrieval combination output module 4 obtains the Chinese explanation sentence and the Kirgiz explanation sentence corresponding to the meaning of that word, edits them using mixed-text layout and mixed text-and-image layout techniques, combines the translated Chinese or Kirgiz explanation sentence into the text data to be output, and displays it in the result display area of the screen.
The user selects the text (word or text) to be translated and explained by positioning the cursor. After the input text has passed the language identification step, the language identification module 2 retrieves, from the common word-capture Chinese lexicon and the common word-capture Kirgiz lexicon (the Chinese-Kirgiz corpus and/or the Kirgiz-Chinese corpus), the word of the other language (the translation data) whose meaning is identical or corresponding to the input text (target-language or source-language word or text). Mixed-text layout and mixed text-and-image layout techniques then combine the translation data (result) into output data, a display interface sized to fit the output data is built dynamically, and the final translation result is shown.
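The direction-dependent lookup in this word-capture step can be sketched as below. It assumes the language has already been identified and that the two corpora behave like simple word-to-word dictionaries; the function names, the result format and the sample entries are hypothetical.

```python
from typing import Dict, Optional


def lookup_translation(
    word: str,
    language: str,
    zh_to_kg: Dict[str, str],
    kg_to_zh: Dict[str, str],
) -> Optional[str]:
    """Pick the corpus by source language and return the other-language word."""
    if language == "chinese":
        return zh_to_kg.get(word)
    if language == "kirgiz":
        return kg_to_zh.get(word)
    return None  # unidentified language: nothing to look up


def format_result(word: str, translation: Optional[str]) -> str:
    """Build the one-line text shown in the word-capture window."""
    if translation is None:
        return f"{word}: (no entry)"
    return f"{word}: {translation}"


# Hypothetical miniature corpora.
zh_to_kg = {"书": "китеп", "学校": "мектеп"}
kg_to_zh = {"китеп": "书", "мектеп": "学校"}
print(format_result("书", lookup_translation("书", "chinese", zh_to_kg, kg_to_zh)))
```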
After the user inputs text (source-language word or text), the input text passes through the language identification step, the text retrieval and confirmation step, the Chinese and Kirgiz translation step, the Kirgiz word and syllable segmentation step and so on. The human-voice Chinese speech library, the human-voice Kirgiz speech library and the synthesized Kirgiz speech library are then called to generate the corresponding Chinese or Kirgiz speech file for the input text. The speech recognition module 5 (speech detection device) reads the input text and emits its speech, syllable by syllable and in order, through its loudspeaker.

Claims (2)

1. A method for automatically translating between Chinese and Kirgiz using a Chinese-Kirgiz ("Chinese Ke") electronic dictionary, where "Ke's language" denotes the Kirgiz language, the steps of which, performed in order, are as follows:
(I) The display module (1) displays the input text and builds a word-capture window. Through this window the language identification module (2), using the screen word-capture method, obtains the input character code region corresponding to the input text displayed by the display module (1), compares the input text with the code characters in the stored UNICODE standard coded character set, determines whether the language of the input text is Chinese or Kirgiz, and then passes the input text, with its language identified, to the retrieval module (3);
(II) The retrieval module (3) obtains the retrieval mode and compares the language-identified input text with the characters stored in the Chinese-Kirgiz corpus and the Kirgiz-Chinese corpus, which are stored side by side in the basic corpus held in memory, in order to retrieve from the basic corpus a character combination identical or corresponding to the characters of the language-identified input text, thereby confirming that the input text is a known character or word stored in the basic corpus, or further actively completing the Chinese character combination or the letter combination of the word. If no character combination (Chinese word or Kirgiz word) identical or corresponding to the input text can be retrieved from the Chinese-Kirgiz corpus and the Kirgiz-Chinese corpus, the retrieval module (3) judges the language-identified input text to be unknown, and it cannot be confirmed or accepted by the language identification module (2);
(III) The language identification module (2) receives the character combination retrieved by the retrieval module (3) and calls up, from the Chinese-Kirgiz corpus and the Kirgiz-Chinese corpus stored in the basic corpus, the character combination of the other language, different from the language of the input text, whose meaning corresponds to the character combination retrieved by the retrieval module (3), i.e. the translated Chinese character, Chinese word or Kirgiz word. The input text and/or the other-language character combination corresponding in meaning to the input text, called up from the basic corpus by the language identification module (2), is then transferred to the retrieval combination output module (4), either via the retrieval module (3) or directly;
(IV) Based on the input text and/or the other-language character combination corresponding in meaning to the input text that the language identification module (2) has called up from the basic corpus, the retrieval combination output module (4) obtains, from the Chinese-Chinese corpus and the Kirgiz-Kirgiz corpus stored side by side in the basic corpus, the Chinese explanation sentence that explains the meaning of the character combination retrieved by the retrieval module (3); according to the mapping table between Cyrillic-script Kirgiz letters and Arabic-script Kirgiz letters, it obtains the Kirgiz explanation sentence, expressed in Cyrillic or Arabic letters, that corresponds to the meaning of the other-language character combination, so as to explain the meaning of the character combination called up from the basic corpus by the language identification module (2); the retrieval combination output module (4) then outputs the explanation sentences it has retrieved to the speech recognition module (5);
(V) When the speech recognition module (5) determines that the explanation sentence it has received is a Chinese explanation sentence, it uses the human-voice Chinese speech library stored in the speech database held in memory to match speech, one by one and in Chinese pronunciation order, to each Chinese character in the received Chinese explanation sentence; the buffered Chinese pronunciation signals matched in order to the Chinese characters of the received explanation sentence are then passed in sequence to the voice output module (6); after the voice output module (6) has detected and read, one by one and in order, the pronunciation signal of each Chinese character in the Chinese explanation sentence, the loudspeaker in the voice output module (6) emits in sequence the Chinese speech corresponding to each Chinese character of the received Chinese explanation sentence;
When the speech recognition module (5) determines that the explanation sentence it has received is a Kirgiz explanation sentence and that this sentence consists of Kirgiz words expressed in Arabic or Cyrillic letters, it uses the human-voice Kirgiz speech library stored in the speech database to match speech, one by one and in Kirgiz pronunciation order, to each Kirgiz word of the received Kirgiz explanation sentence; the buffered Kirgiz pronunciation signals matched in order to the Kirgiz words of the received explanation sentence are then passed in sequence to the voice output module (6); after the voice output module (6) has detected and read, one by one and in order, the Kirgiz pronunciation signal of each Kirgiz word in the received explanation sentence, the loudspeaker in the voice output module (6) emits in sequence the Kirgiz speech matching each Kirgiz word of the Kirgiz explanation sentence. If the speech recognition module (5) determines that the explanation sentence it has received is a Kirgiz explanation sentence but cannot match speech to it, the sentence is presumed to be Kirgiz text expressed in Arabic or Cyrillic letters, and the synthesized Kirgiz speech library stored in the speech database is called to perform syllable-based speech synthesis on the Kirgiz text: by the Kirgiz sentence word-and-syllable segmentation method, the Kirgiz text is segmented into Kirgiz words known to and stored in the synthesized speech database; the human-voice Kirgiz speech library and/or the synthesized Kirgiz speech library is then used to match speech, one by one and in Kirgiz pronunciation order, to each Kirgiz word of the Kirgiz text; the buffered Kirgiz pronunciation signals matched to the Kirgiz words segmented in order from the Kirgiz text are passed in sequence to the voice output module (6); after the voice output module (6) has detected and read the Kirgiz pronunciation signals one by one and in order, the loudspeaker in the voice output module (6) emits in sequence the Kirgiz speech matching each Kirgiz word of the Kirgiz text.
2. The method for automatically translating between Chinese and Kirgiz using a Chinese-Kirgiz electronic dictionary according to claim 1, characterized in that the retrieval mode is the head retrieval mode, the tail retrieval mode or the containing retrieval mode;
The head retrieval mode is: A. the retrieval module (3) enters the characters of the input text one by one, in order, from left to right; B. the character combinations stored in the basic corpus are compared with the entered character combination; if a character combination identical to the entered one is found in the basic corpus, retrieval stops, i.e. the input text has been exactly matched; if the head retrieval mode cannot find in the basic corpus a character combination identical to the input text, retrieval of the input text continues with the tail retrieval mode described below;
The tail retrieval mode is: the retrieval module (3) enters the characters of the input text one by one, in order, from right to left (left and right as seen by the user); the character combinations stored in the basic corpus are compared with the entered character combination; if a character combination identical to the entered one is found in the basic corpus, retrieval stops, i.e. the input text has been exactly matched; if the tail retrieval mode cannot find in the basic corpus a character combination identical to the input text, retrieval of the input text continues with the containing retrieval mode;
The containing retrieval mode matches the character combination of the input text from any direction and includes the head retrieval mode and the tail retrieval mode above.
CN201110426747.XA 2011-12-19 2011-12-19 The method of Chinese Ke e-dictionary and its automatic translation Chinese Ke's language Active CN103164395B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110426747.XA CN103164395B (en) 2011-12-19 2011-12-19 The method of Chinese Ke e-dictionary and its automatic translation Chinese Ke's language

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110426747.XA CN103164395B (en) 2011-12-19 2011-12-19 The method of Chinese Ke e-dictionary and its automatic translation Chinese Ke's language

Publications (2)

Publication Number Publication Date
CN103164395A CN103164395A (en) 2013-06-19
CN103164395B true CN103164395B (en) 2017-06-23

Family

ID=48587491

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110426747.XA Active CN103164395B (en) 2011-12-19 2011-12-19 The method of Chinese Ke e-dictionary and its automatic translation Chinese Ke's language

Country Status (1)

Country Link
CN (1) CN103164395B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104298661A (en) * 2013-12-29 2015-01-21 新疆信息产业有限责任公司 Method for using Kirgiz translation engine for self-service electric fee payment terminal
CN105095192A (en) * 2014-05-05 2015-11-25 武汉传神信息技术有限公司 Double-mode translation equipment
CN106202147A (en) * 2016-06-16 2016-12-07 塔里木大学 A kind of Ke Han intertranslation electronic dictionary based on Android
CN112818212B (en) * 2020-04-23 2023-10-13 腾讯科技(深圳)有限公司 Corpus data acquisition method, corpus data acquisition device, computer equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6219646B1 (en) * 1996-10-18 2001-04-17 Gedanken Corp. Methods and apparatus for translating between languages
CN101329667A (en) * 2008-08-04 2008-12-24 深圳市大正汉语软件有限公司 Intelligent translation apparatus of multi-language voice mutual translation and control method thereof
CN102103625A (en) * 2009-12-17 2011-06-22 艾利和电子科技(中国)有限公司 System for automatically searching electronic dictionary according to input language and method thereof

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
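Design and Implementation of a Processing System for Chinese-Uyghur-Kazakh-Kirgiz Bilingual Parallel Corpora; Wu Xiaochuan et al.; Computer Knowledge and Technology; 2011-09-30; Vol. 7, No. 27; pp. 6680-6681, 6693 *
Research on Automatic Identification of Uyghur, Kazakh and Kirgiz Text in Electronic Dictionary Software Systems; Mairidan Wushouer et al.; Journal of Xinjiang University (Natural Science Edition); 2011-02-28; Vol. 28, No. 1; pp. 88-92 *
Design and Implementation of a Multilingual Processing Platform for Uyghur, Kazakh, Kirgiz, Chinese and English; Miao Cheng et al.; Computer Engineering; 2004-05-31; Vol. 30, No. 10; pp. 71-73 *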

Also Published As

Publication number Publication date
CN103164395A (en) 2013-06-19

Similar Documents

Publication Publication Date Title
CN100511215C (en) Multilingual translation memory and translation method thereof
Pettersson et al. A multilingual evaluation of three spelling normalisation methods for historical text
CN103164397B (en) The Chinese breathes out the method that e-dictionary and its automatic translation Chinese breathe out language
CN110717341B (en) Method and device for constructing old-Chinese bilingual corpus with Thai as pivot
CN103164398B (en) Utilize the method that Chinese dimension language translated automatically by Chinese dimension e-dictionary
CN103164395B (en) The method of Chinese Ke e-dictionary and its automatic translation Chinese Ke's language
WO2012159558A1 (en) Natural language processing method, device and system based on semantic recognition
CN106528731A (en) Sensitive word filtering method and system
CN106383814A (en) Word segmentation method of English social media short text
CN103164396B (en) Use the method that Han Weihake language translated automatically by Han Weihake e-dictionary
Wehrmeyer A corpus for signed language<? br?> interpreting research
CN103680503A (en) Semantic identification method
CN113761377B (en) False information detection method and device based on attention mechanism multi-feature fusion, electronic equipment and storage medium
CN101441626A (en) Multimedia retrieval system and method
CN114239579A (en) Electric power searchable document extraction method and device based on regular expression and CRF model
CN109523992A (en) Tibetan dialect speech processing system
Del Arco et al. Share: A lexicon of harmful expressions by spanish speakers
Rosmorduc Computational linguistics in egyptology
JPH08212216A (en) Natural language processor and natural language processing method
CN111597827A (en) Method and device for improving machine translation accuracy
KR100463376B1 (en) A Translation Engine Apparatus for Translating from Source Language to Target Language and Translation Method thereof
Gupta et al. A new approach towards bibliographic reference identification, parsing and inline citation matching
JPS61248160A (en) Document information registering system
WO2008017188A1 (en) System and method for making teaching material of language class
CN1553381A (en) Multi-language correspondent list style language database and synchronous computer inter-transtation and communication

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant