CN103164395B - Chinese-Kirgiz electronic dictionary and method for automatically translating between Chinese and Kirgiz
- Publication number
- CN103164395B (application CN201110426747.XA)
- Authority
- CN
- China
- Prior art keywords
- language
- word
- chinese
- input
- module
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Machine Translation (AREA)
- Document Processing Apparatus (AREA)
Abstract
The invention discloses a Chinese-Kirgiz electronic dictionary and a method for automatically translating between Chinese and Kirgiz. The dictionary comprises a language identification module, a retrieval module, a retrieval-combination output module, a display module, a speech recognition module and a voice output module. Once the language of the input text has been identified, the retrieval module matches the input text against the words in the base corpus. For the word to be translated that the retrieval module retrieves from the base corpus, the retrieval-combination output module obtains a Chinese explanation sentence and a Kirgiz explanation sentence corresponding to its meaning. The speech recognition module recognizes these sentences (the Kirgiz one after a syllable-segmentation step), calls the recorded human voice library or the synthesized Kirgiz voice library, reads the input text, and plays its speech in sequence through the loudspeaker. The electronic dictionary is soundly structured; its method improves on existing dictionary technology for Chinese-Kirgiz translation, raises the efficiency of translation between Chinese and Kirgiz, and improves the spoken playback of Chinese and Kirgiz words.
Description
Technical field
The invention belongs to the technical field of machine translation. It relates to language conversion technology that uses computer software and hardware to translate between Chinese and Kirgiz, and in particular to a Chinese-Kirgiz electronic dictionary and its method for automatically translating between Chinese and Kirgiz.
Background
In today's information society, people demand ever faster and better ways to obtain, look up and translate language information of all kinds, and electronic dictionary products have been developed in response. They range from large multimedia electronic encyclopedias with hundreds of thousands of entries and tens of thousands of media items down to palm-sized instant translators with a few thousand entries. They are popular with users and have become aids for language learning, translation and quick lookup. Abroad, as machine translation systems and natural language processing systems have moved into practical use, machine dictionaries have become a focus of development, and a growing number of language translation specialists regard the scale and quality of the machine dictionary as the key to the success or failure of such systems. As early as 1986, Japan's MITI provided 100 million dollars to support a nine-year electronic dictionary (EDR) development plan. The European Community has also funded several machine dictionary research projects, including ACQUILEX (The Acquisition of Lexical Knowledge), whose goal is to acquire lexical knowledge automatically from multiple machine-readable dictionaries, MRD (Machine Readable Dictionary), in order to build a multilingual lexical knowledge base, LKB (Lexical Knowledge Base), that supports natural language processing, and on that basis to develop large machine dictionaries for each language. These include basic dictionaries, terminology dictionaries, collocation dictionaries, concept classification dictionaries, concept description dictionaries and grammar dictionaries. At present there are many kinds of commercial electronic dictionaries, such as Encyclopedia Britannica, Ke General encyclopedia and ENCARTA.
In China, research on machine translation dictionaries began in the 1950s and 1960s and received full attention after the reform and opening-up. In the late 1980s, experts in Chinese information processing began research on machine dictionaries, and in the early 1990s machine dictionary research for information processing was formally included in the national Seventh, Eighth and Ninth Five-Year Plans. Basic research projects such as 《Research on Modern Chinese Vocabulary for Information Processing》, 《Valence-Based Chinese Semantic Dictionary》 and 《Modern Chinese Grammatical Information Dictionary》 were carried out, and on that basis relatively mature information products such as 《Encyclopedia of China》, 《Kingsoft Powerword》 and 《Dongfang Dadian》 were developed and were well received by users.
In recent years, with the sustained and rapid development of minority-language informatization, electronic dictionaries for minority languages in Xinjiang, China have also developed considerably. Most of them, however, are based on existing ordinary Chinese-Uyghur electronic dictionaries. They do not meet the practical needs of a wider range of users, and their level of support for minority-language translation technology remains seriously deficient.
Summary of the invention
An object of the invention is to provide a Chinese-Kirgiz electronic dictionary that is soundly structured and highly versatile.
This object is achieved as follows. A Chinese-Kirgiz electronic dictionary consists of a language identification module, a retrieval module, a retrieval-combination output module, a display module, a speech recognition module and a voice output module. The language identification module is connected through its corresponding interfaces to the interface of the display module and to the interface of the retrieval module. The output interface of the retrieval module is connected to the input interface of the retrieval-combination output module; the output interface of the retrieval-combination output module is connected to the input interface of the speech recognition module; and the output interface of the speech recognition module is connected to the input interface of the voice output module.
A further object of the invention is to provide a method by which the Chinese-Kirgiz electronic dictionary automatically translates between Chinese and Kirgiz. The method improves on conventional dictionary technology for Chinese-Kirgiz translation, raises the efficiency of translation between the two languages, and improves the spoken playback of Chinese and Kirgiz text (Kirgiz is also referred to as "Ke language" or "Ke script").
This object is achieved as follows. In a method by which the Chinese-Kirgiz electronic dictionary automatically translates between Chinese and Kirgiz, the processing steps, in order, are:
(I) The input text is shown by display module 1 and a word-capture window is built. Using the word-capture window, language identification module 2 obtains by on-screen word capture the character-code region corresponding to the input text shown by display module 1. The input text is compared with the code characters in the stored UNICODE standard character set to determine whether its language is Chinese or Kirgiz, and the language-identified input text is then passed to retrieval module 3;
(II) Retrieval module 3 obtains the retrieval mode and compares the language-identified input text with the characters stored in the Chinese-Kirgiz corpus and the Kirgiz-Chinese corpus, which are stored side by side in the base corpus held in memory. It retrieves from the base corpus the character combination that is identical or corresponds to the characters of the input text, thereby confirming that the input text is a known character or word stored in the base corpus, or further a complete combination of Chinese characters and words or of word letters. If no character combination (Chinese word or Kirgiz word) identical or corresponding to the input text can be retrieved from the Chinese-Kirgiz corpus and the Kirgiz-Chinese corpus, retrieval module 3 judges the input text to be unknown, and it cannot be confirmed or accepted by language identification module 2;
(III) Language identification module 2 receives the character combination retrieved by retrieval module 3 and calls up, from the Chinese-Kirgiz corpus and the Kirgiz-Chinese corpus stored in the base corpus, the character combination in the other language that corresponds in meaning to the retrieved combination and differs in language from the input text, that is, the translated Chinese character, Chinese word or Kirgiz word. The input text and/or the other-language character combination called up from the base corpus by language identification module 2 is then transferred to retrieval-combination output module 4, either through retrieval module 3 or directly;
(IV) Based on the input text and/or the other-language character combination called up from the base corpus by language identification module 2, retrieval-combination output module 4 obtains, from the Chinese-Chinese corpus and the Kirgiz-Kirgiz corpus stored side by side in the base corpus, a Chinese explanation sentence explaining the meaning of the character combination retrieved by retrieval module 3. Using the mapping table between Cyrillic-script Kirgiz words and Arabic-script Kirgiz words, it also obtains a Kirgiz explanation sentence, written in Cyrillic or Arabic letters, that corresponds in meaning to the other-language character combination. These sentences explain the meaning of the character combination called up from the base corpus by language identification module 2. Retrieval-combination output module 4 then outputs the explanation sentences it has retrieved to speech recognition module 5;
(V) When speech recognition module 5 determines that the explanation sentence it received is a Chinese explanation sentence, it uses the recorded human Chinese voice library stored in the speech database held in memory to match speech, one by one and in Chinese pronunciation order, to each Chinese word in the sentence. The buffered Chinese pronunciation signals matched in order to those Chinese words are passed in turn to voice output module 6. After the pronunciation signal for each Chinese word in the sentence has been detected in order, voice output module 6 reads the signals one by one, and its loudspeaker emits in turn the Chinese speech corresponding to each Chinese word in the received explanation sentence;
When speech recognition module 5 determines that the explanation sentence it received is a Kirgiz explanation sentence consisting of Kirgiz words written in Arabic or Cyrillic letters, it uses the recorded human Kirgiz voice library stored in the speech database to match speech, one by one and in Kirgiz pronunciation order, to each Kirgiz word of the sentence. The buffered Kirgiz pronunciation signals matched in order to those words are passed in turn to voice output module 6. After the pronunciation signal for each Kirgiz word has been detected in order, voice output module 6 reads the signals one by one, and its loudspeaker emits in turn the Kirgiz speech matching each Kirgiz word in the explanation sentence. If speech recognition module 5 determines that the received sentence is a Kirgiz explanation sentence but cannot match speech to it, the sentence is presumed to be Kirgiz text written in Arabic or Cyrillic letters, and the synthesized Kirgiz voice library stored in the speech database is called to perform syllable-based speech synthesis on the text. Using the sentence-to-word-to-syllable segmentation method, the Kirgiz text is cut into Kirgiz words known to be stored in the synthesis speech database. The recorded human Kirgiz voice library and/or the synthesized Kirgiz voice library is then used to match speech, one by one and in Kirgiz pronunciation order, to each Kirgiz word of the text. The buffered Kirgiz pronunciation signals matched to the words cut in order from the text are passed in turn to voice output module 6; after the signals have been detected in order, voice output module 6 reads them one by one, and its loudspeaker emits in turn the Kirgiz speech matching each Kirgiz word in the text.
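The recorded-voice lookup with a syllable-based fallback in step (V) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `split_syllables` and `playback_plan`, the vowel set, and the one-vowel-per-syllable splitting rule are all hypothetical, since the text does not spell out its segmentation rule, and real Kirgiz syllabification is likely more nuanced.

```python
from typing import List, Set, Tuple

# Cyrillic-script Kirgiz vowels (an assumption made for this sketch).
VOWELS = set("аеёиоөуүыэюя")


def split_syllables(word: str) -> List[str]:
    """Naively split a word so that each syllable holds exactly one vowel.

    Assumed rule: the single consonant right before a vowel opens the next
    syllable; any earlier consonants close the previous one.
    """
    w = word.lower()
    vowel_positions = [i for i, ch in enumerate(w) if ch in VOWELS]
    if len(vowel_positions) < 2:
        return [w] if w else []
    parts, start = [], 0
    for prev, nxt in zip(vowel_positions, vowel_positions[1:]):
        # Adjacent vowels are split between them; otherwise cut one
        # consonant before the next vowel.
        cut = nxt if nxt - prev == 1 else nxt - 1
        parts.append(w[start:cut])
        start = cut
    parts.append(w[start:])
    return parts


def playback_plan(sentence: str, recorded: Set[str],
                  syllable_bank: Set[str]) -> List[Tuple[str, str]]:
    """Return (source, unit) pairs in pronunciation order.

    Words found in the recorded library are played whole; other words fall
    back to syllables looked up in the synthesis library.
    """
    plan: List[Tuple[str, str]] = []
    for word in sentence.split():
        if word in recorded:
            plan.append(("recorded", word))
            continue
        for syllable in split_syllables(word):
            source = "synthesized" if syllable in syllable_bank else "missing"
            plan.append((source, syllable))
    return plan
```

For instance, with only `китеп` in the recorded library and `мек`, `теп` in the syllable library, `playback_plan("китеп мектеп", {"китеп"}, {"мек", "теп"})` yields one recorded clip followed by two synthesized syllables.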
The invention is a Chinese-Kirgiz bidirectional multimedia electronic dictionary based on computational linguistics, ethnology, sociology, pragmatics, translation studies and computer information processing technology. It uses a bilingual Chinese-Kirgiz encoding format based on the UNICODE international standard. It provides bidirectional text input (Chinese to Kirgiz and Kirgiz to Chinese), reads Chinese and Kirgiz words and text aloud, captures Chinese and Kirgiz characters by on-screen word capture under different operating systems, and converts between the Kirgiz character encodings used in China and abroad. It also offers a multilingual Chinese and Kirgiz interface, quick search and fuzzy search of Chinese and Kirgiz words, direct input of Kirgiz script, management of the dictionary's word libraries, and auxiliary functions such as dictionary settings, dictionary tools, dictionary accessories and online upgrade.
The invention provides an Arabic-script Kirgiz input method and does not depend on any other Kirgiz input method, which improves availability. It provides bidirectional real-time Chinese-Kirgiz translation through on-screen word capture, which is convenient for users of Chinese and Kirgiz. It provides standard read-aloud of Chinese and Kirgiz words and phrases, making it a powerful tool for learning Chinese and Kirgiz. It has a large Kirgiz corpus with word and phrase explanations, and it can convert and display between Kirgiz Cyrillic script (Kyrgyzstan) and Kirgiz Arabic script (Xinjiang, China). This makes it easier for people who do not speak Kirgiz to learn the Kirgiz language, history and customs, and it gives them many examples for understanding the geography, regions and local character of Xinjiang and Kyrgyzstan.
The invention addresses the language barrier that Kirgiz people in China and abroad whose mother tongue is Kirgiz face in acquiring modern knowledge and in daily life. It lets Kirgiz learners in China and abroad translate quickly and so obtain all kinds of information. It helps not only Kirgiz people learning Chinese but also Han Chinese and foreigners learning Kirgiz, serving as a translation tool for Kirgiz and Chinese users learning either language, and it is of far-reaching significance for improving the spoken and written Chinese of the Kirgiz people. It also builds a dictionary base for future Chinese-Kirgiz machine translation and lays a solid foundation for developing Uzbek-Chinese and Turkish-Chinese bidirectional electronic dictionaries and machine-aided translation systems.
The technical features of the invention are:
① It provides two-way word translation between Chinese and Kirgiz: entering a word in either language into the Chinese-Kirgiz electronic dictionary returns its definition in the other language.
② It provides a component-based Kirgiz input method that supports the international UNICODE standard, so that even when the user has no Kirgiz input method installed, the dictionary can still enter standard Kirgiz text correctly.
③ On the current mainstream Windows operating systems (Windows XP\Windows Server\Windows Vista\Windows 7), it supports on-screen word capture for Kirgiz.
④ It reads Kirgiz words and text aloud using statistics and phonetics; the reading is standard and clear.
⑤ It provides additional functions such as online dictionary upgrade, dictionary settings, dictionary tools and dictionary accessories, which can be configured to the user's needs.
⑥ It provides a friendly multilingual dictionary interface; user-friendly settings give dictionary interfaces in different languages and translation directions.
⑦ It recognizes the language of the input automatically: it analyzes the input text, determines its language and translates it.
⑧ The Chinese-Kirgiz dictionary collects nearly 100,000 entries, and both a recorded human voice library and a read-aloud synthesis library based on syllable segmentation have been built.
⑨ It converts and displays between Kirgiz Cyrillic script (Central Asia, Kyrgyzstan) and Kirgiz Arabic script (Xinjiang, China), showing both written forms in the definition window at the same time, which widens the range of use of the invention.
The electronic dictionary of the invention is soundly structured and highly versatile. Its method improves on conventional dictionary technology for Chinese-Kirgiz translation, raises the efficiency of translation between Chinese and Kirgiz, and improves the spoken playback of Chinese and Kirgiz text.
Brief description of the drawings
Figure 1 is a schematic diagram of the module connections of the invention and of the overall flow of its method for automatically translating between Chinese and Kirgiz.
Detailed description
A Chinese-Kirgiz electronic dictionary, as shown in Figure 1, consists of language identification module 2, retrieval module 3, retrieval-combination output module 4, display module 1, speech recognition module 5 and voice output module 6. Language identification module 2 is connected through its corresponding interfaces to the interface of display module 1 and to the interface of retrieval module 3. The output interface of retrieval module 3 is connected to the input interface of retrieval-combination output module 4; the output interface of retrieval-combination output module 4 is connected to the input interface of speech recognition module 5; and the output interface of speech recognition module 5 is connected to the input interface of voice output module 6.
In the method by which the Chinese-Kirgiz electronic dictionary automatically translates between Chinese and Kirgiz, as shown in Figure 1, processing proceeds in order through steps (I) to (V) exactly as set out above.
The retrieval mode is head retrieval, tail retrieval or containing retrieval. Head retrieval works as follows: A. retrieval module 3 enters each character of the input text one by one, from left to right; B. the character-combination data stored in the base corpus (the Chinese-Kirgiz corpus and the Kirgiz-Chinese corpus) are compared with the entered character combination. If a character combination identical to the entered one is found in the base corpus, retrieval stops and the exact match of the input text is complete. If head retrieval cannot find a character combination identical to the input text in the base corpus, the search for the input text continues with tail retrieval, described next;
Tail retrieval works as follows: ① retrieval module 3 enters each character of the input text one by one, from right to left (left and right as seen by the person facing the screen); ② the comparison is the same as step B of head retrieval. If tail retrieval cannot find characters identical to the input text in the base corpus, the search for the input text continues with containing retrieval, described next;
Containing retrieval matches the character combination of the input text from any direction and includes the head and tail retrieval modes above. Using containing retrieval, retrieval module 3 should find characters identical to the input text in the base corpus, which finally completes the exact match of the input text.
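The cascade of head, tail and containing retrieval described above can be sketched as follows. This is one possible reading, offered as an assumption rather than as the patented implementation: head retrieval is modelled as a prefix match, tail retrieval as a suffix match and containing retrieval as a substring match, and the name `cascade_lookup` is hypothetical.

```python
from typing import Iterable, List, Tuple


def cascade_lookup(query: str, entries: Iterable[str]) -> Tuple[str, List[str]]:
    """Try head, then tail, then containing retrieval; stop at the first hit.

    Returns the name of the mode that matched and the matching entries,
    or ("unknown", []) when no mode finds anything.
    """
    if not query:
        return "unknown", []
    entry_list = list(entries)
    modes = (
        ("head", lambda entry: entry.startswith(query)),   # left to right
        ("tail", lambda entry: entry.endswith(query)),     # right to left
        ("contains", lambda entry: query in entry),        # any position
    )
    for name, matches in modes:
        hits = [entry for entry in entry_list if matches(entry)]
        if hits:
            return name, hits
    return "unknown", []
```

With the entries `["学习", "学校", "图书馆", "中学生"]`, the query `学` is resolved by head retrieval, `校` falls through to tail retrieval, and `书` is only found by containing retrieval.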
The retrieval flow of the invention involves language identification module 2, retrieval module 3, retrieval-combination output module 4 and the base corpus. Its main flow is: 1) the user enters the Chinese or Kirgiz text to be queried using a Chinese or Kirgiz input method, and the language (Chinese or Kirgiz) of the input text (source-language word or text) is determined from the UNICODE encoding of the input data; 2) according to the retrieval mode set by the user and the language determined for the input text, retrieval module 3 retrieves the Chinese and/or Kirgiz words and text that match the input text (source-language word or text); 3) based on the result of retrieval module 3's search, the input text itself, or the corresponding Chinese word and/or Kirgiz word, together with the Chinese and Kirgiz explanation example sentences of the same meaning, is matched from the base corpus and combined to generate the data to be output.
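The language judgement from UNICODE code points and the choice of corpus direction in this flow can be sketched as follows. This is a simplified illustration under stated assumptions: the text does not list the code ranges it checks, so the three Unicode blocks used here (CJK Unified Ideographs, Cyrillic and Arabic) are assumed, and the names `detect_language` and `translate` as well as the dictionary-based corpora are hypothetical.

```python
from typing import Dict, Optional


def detect_language(text: str) -> Optional[str]:
    """Classify text as Chinese ("zh") or Kirgiz ("ky") from its first
    recognizable character, or return None if nothing is recognized."""
    for ch in text:
        cp = ord(ch)
        if 0x4E00 <= cp <= 0x9FFF:        # CJK Unified Ideographs block
            return "zh"
        if 0x0400 <= cp <= 0x04FF:        # Cyrillic block
            return "ky"
        if 0x0600 <= cp <= 0x06FF:        # Arabic block
            return "ky"
    return None


def translate(text: str, zh_to_ky: Dict[str, str],
              ky_to_zh: Dict[str, str]) -> Optional[str]:
    """Pick the corpus direction from the detected language and look up
    the input text; None means unknown language or unknown word."""
    language = detect_language(text)
    if language == "zh":
        return zh_to_ky.get(text)
    if language == "ky":
        return ky_to_zh.get(text)
    return None
```

A production system would likely need further ranges (for example Arabic presentation forms) and a way to handle mixed-script input; this sketch ignores both.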
The screen word-capture translation flow of the invention involves language identification module 2, display module 1, retrieval module 3 and the word-capture database (basic corpus). Its main flow is: 1) the user inputs text (the word or text to be translated); 2) language identification module 2 determines the language (Chinese or Kirgiz) of the input text (source-language word or text) from the UNICODE encoding of the input data; 3) according to the language that language identification module 2 has determined for the input text, retrieval module 3 obtains from the word-capture Chinese dictionary or the word-capture Kirgiz dictionary (Chinese-Kirgiz corpus and/or Kirgiz-Chinese corpus) the words or text that match the input text; 4) according to the final match that retrieval module 3 obtains for the input text, display module 1 builds the screen word-capture translation interface using mixed-text typesetting and mixed text-and-image typesetting, and shows the final translation result (Chinese words and sentences or Kirgiz words and sentences).
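Step 3) of this flow, choosing the corpus by the detected language, might look like the sketch below. The function `translate`, the language labels and the two toy dictionaries are hypothetical; the patent's corpora are far larger and their storage format is not described here.

```python
from typing import Dict, Optional


def translate(text: str, lang: str,
              zh_to_ky: Dict[str, str],
              ky_to_zh: Dict[str, str]) -> Optional[str]:
    """Look the text up in the corpus that matches its source language."""
    if lang == "zh":
        corpus = zh_to_ky          # Chinese-Kirgiz corpus
    elif lang == "ky":
        corpus = ky_to_zh          # Kirgiz-Chinese corpus
    else:
        return None                # language not recognised: nothing to look up
    return corpus.get(text.strip())


# Toy corpora for illustration only (assumed entries).
ZH_TO_KY = {"书": "китеп", "水": "суу"}
KY_TO_ZH = {ky: zh for zh, ky in ZH_TO_KY.items()}
```

For example, `translate("书", "zh", ZH_TO_KY, KY_TO_ZH)` returns the Kirgiz entry stored for that word, and an unrecognised language label yields `None` so the caller can report an unknown input.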
The voice read-aloud flow of the invention involves language identification module 2, voice output module 6, retrieval-combination output module 4 and the speech database. Its main flow is: 1) language identification module 2 receives the Chinese or Kirgiz explanation sentence sent to it by retrieval-combination output module 4 (the text entered in the screen word-capture step) and determines its language. If the input explanation sentence consists of Chinese words and sentences, the input Chinese words are matched in the recorded (real-speaker) Chinese voice bank. If the input explanation sentence consists of Kirgiz words and sentences, it is further determined whether the Kirgiz explanation sentence received by language identification module 2 is a single Kirgiz word. If the input is a Kirgiz word, the identical or corresponding Kirgiz word is matched directly in the recorded Kirgiz voice bank; if voice output module 6 cannot find a matching Kirgiz word, processing passes to the text-processing procedure. If the input explanation sentence is Kirgiz text, Kirgiz sentence and syllable segmentation is applied: the Kirgiz text is segmented into Kirgiz words according to the linguistic features of Kirgiz, each Kirgiz word in the text is segmented into syllables according to the characteristics of Kirgiz, all syllables of every Kirgiz word in the text are matched in the synthesized Kirgiz voice bank, and the complete Kirgiz speech for the text is finally assembled; 2) the computer's speech device detects the above Kirgiz text, reads it aloud, and outputs and plays it.
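The word and syllable segmentation mentioned above can be illustrated with a deliberately simplified rule set for Cyrillic-script Kirgiz. The rules are assumptions made for this sketch and are not taken from the patent: every syllable has one vowel nucleus, a doubled vowel letter counts as one long nucleus, and the single consonant immediately before a nucleus opens that syllable. Loanwords and Arabic-script text are not handled.

```python
from typing import List, Tuple

# Vowel letters of the Kirgiz Cyrillic alphabet (used to find syllable nuclei).
KY_VOWELS = set("аеёиоөуүыэюя")


def split_syllables(word: str) -> List[str]:
    """Split one Cyrillic-script Kirgiz word into syllables (simplified rule)."""
    w = word.lower()
    # A nucleus is a vowel that is not the second half of a doubled (long) vowel.
    nuclei = [i for i, ch in enumerate(w)
              if ch in KY_VOWELS and not (i > 0 and w[i - 1] == ch)]
    if len(nuclei) <= 1:
        return [w] if w else []
    cuts = []
    for nxt in nuclei[1:]:
        # If a consonant precedes the nucleus, it opens the new syllable.
        cuts.append(nxt - 1 if w[nxt - 1] not in KY_VOWELS else nxt)
    bounds = [0] + cuts + [len(w)]
    return [w[a:b] for a, b in zip(bounds, bounds[1:])]


def segment_text(text: str) -> List[Tuple[str, List[str]]]:
    """Cut text into words on whitespace, then each word into syllables."""
    return [(w.lower(), split_syllables(w)) for w in text.split()]
```

Under these assumptions a word with a consonant cluster between two vowels, such as "мектеп", is cut so that only the last consonant of the cluster moves to the following syllable.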
The user enters text (the source-language word or text to be looked up) by keyboard into the input box shown on the screen. After the language identification step has identified the language of the input text (Chinese or Kirgiz), retrieval module 3 uses any one of the pinyin retrieval method, the stem retrieval method, the tail retrieval method, the containing retrieval method and the exact-match retrieval method to match the input text against the words in the pinyin corpus, the Chinese-Kirgiz corpus and the Kirgiz-Chinese corpus, and retrieves from the basic corpus the word to be translated that corresponds to or is identical to the input text. Then, according to the word to be translated that retrieval module 3 has retrieved from the basic corpus, retrieval-combination output module 4 obtains the Chinese explanation sentence and the Kirgiz explanation sentence corresponding to the meaning of that word, edits them using mixed-text typesetting and mixed text-and-image typesetting, combines the translated Chinese explanation sentence or Kirgiz explanation sentence into the text data to be output, and displays it in the results display area of the screen.
When the user selects the text (word or passage) to be translated and explained by positioning the cursor, the input text first passes through the language identification step; language identification module 2 then retrieves, from the common word-capture Chinese dictionary and the common word-capture Kirgiz dictionary (Chinese-Kirgiz corpus and/or Kirgiz-Chinese corpus), the text in the other language (the translation data) whose meaning is identical or corresponding to the input text (target-language or source-language word or text). Mixed-text typesetting and mixed text-and-image typesetting are then used to combine the translation data (result) into output data, a display interface sized to fit the output data is built dynamically, and the final translation result is shown.
After the user inputs text (source-language word or text), the input text passes through the language identification step, the text retrieval and confirmation step, the Chinese and Kirgiz translation step, the Kirgiz word and syllable segmentation step and so on; the recorded Chinese voice bank, the recorded Kirgiz voice bank and the synthesized Kirgiz voice bank are then called to generate the Chinese or Kirgiz voice file corresponding to the input text. Speech recognition module 5 (the speech detection device) reads the input text, and its loudspeaker utters the speech of the input text syllable by syllable in order.
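The fallback from recorded voice clips to syllable-based synthesis can be sketched as below. Everything named here is hypothetical: `build_playlist`, the dictionary-based voice banks, the clip file names and the `syllabify` callback are illustrative stand-ins, and the sketch simply skips syllables that have no clip.

```python
from typing import Callable, Dict, List


def build_playlist(words: List[str],
                   recorded: Dict[str, str],
                   synthesized: Dict[str, str],
                   syllabify: Callable[[str], List[str]]) -> List[str]:
    """Return audio clip names in reading order.

    A whole-word clip from the recorded bank is preferred; otherwise the
    word is split into syllables and each syllable is looked up in the
    synthesized bank.
    """
    playlist: List[str] = []
    for word in words:
        if word in recorded:
            playlist.append(recorded[word])       # recorded whole-word clip
            continue
        for syllable in syllabify(word):          # fall back to syllable clips
            clip = synthesized.get(syllable)
            if clip is not None:
                playlist.append(clip)
    return playlist


# Toy data for illustration only (assumed names and paths).
RECORDED = {"китеп": "rec/kitep.wav"}
SYNTHESIZED = {"мек": "syn/mek.wav", "теп": "syn/tep.wav"}
SYLLABLES = {"мектеп": ["мек", "теп"]}
playlist = build_playlist(["китеп", "мектеп"], RECORDED, SYNTHESIZED,
                          lambda w: SYLLABLES.get(w, [w]))
```

Passing the syllable splitter in as a parameter keeps the matching logic independent of whichever segmentation rules are actually used.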
Claims (2)
1. A method for automatically translating between Chinese and Kirgiz using a Chinese-Kirgiz electronic dictionary, wherein the Ke language referred to is the Kirgiz language, the method comprising the following steps performed in sequence:
(I) the input text is shown by display module (1) and a word-capture window is built; using the word-capture window and the screen word-capture method, language identification module (2) obtains the input character code region corresponding to the input text shown by display module (1), compares the input text with the code characters in the stored UNICODE standard coded character set, determines whether the language of the input text is Chinese or Kirgiz, and then passes the language-identified input text to retrieval module (3);
(II) retrieval module (3) obtains the retrieval mode and compares the language-identified input text with the characters stored in the Chinese-Kirgiz corpus and the Kirgiz-Chinese corpus, which are stored side by side in the basic corpus held in memory, so as to retrieve from the basic corpus the character combination identical or corresponding to the characters of the language-identified input text, and confirms that the language-identified input text is a known single character or word stored in the basic corpus, or further actively completes the Chinese character combination or the word's letter combination; if no character combination (Chinese word or Kirgiz word) identical or corresponding to the input text can be retrieved from the Chinese-Kirgiz corpus and the Kirgiz-Chinese corpus, retrieval module (3) judges the language-identified input text to be unknown, so that it cannot be confirmed and accepted by language identification module (2);
(III) language identification module (2) receives the character combination retrieved by retrieval module (3), and calls up from the Chinese-Kirgiz corpus and the Kirgiz-Chinese corpus stored in the basic corpus the character combination in the other language, different from the language of the input text, whose meaning corresponds to the character combination retrieved by retrieval module (3), that is, the translated Chinese character, Chinese word or Kirgiz word; the input text and/or the other-language character combination corresponding in meaning to the input text, as called up from the basic corpus by language identification module (2), is transferred to retrieval-combination output module (4) either through retrieval module (3) or directly;
(IV) according to the input text and/or the other-language character combination corresponding in meaning to the input text that language identification module (2) has called up from the basic corpus, retrieval-combination output module (4) obtains, from the Chinese-Chinese corpus and the Kirgiz-Kirgiz corpus stored side by side in the basic corpus, the Chinese explanation sentence that explains the meaning of the character combination retrieved by retrieval module (3), and, according to the mapping table between Cyrillic-script Kirgiz text and Arabic-script Kirgiz text, obtains the Kirgiz explanation sentence, expressed in Cyrillic letters or Arabic letters, corresponding to the meaning of the above other-language character combination, thereby explaining the meaning of the character combination called up from the basic corpus by language identification module (2); retrieval-combination output module (4) then outputs the explanation sentences it has retrieved to speech recognition module (5);
(V) when speech recognition module (5) determines that the explanation sentence it has received is a Chinese explanation sentence, speech recognition module (5) uses the recorded Chinese voice bank stored in the speech database held in memory to match speech, one by one and in Chinese pronunciation order, to each Chinese word in the received Chinese explanation sentence, and then passes the buffered Chinese pronunciation signals matched in order to the Chinese words of the received explanation sentence to voice output module (6) in sequence; after the Chinese pronunciation signals corresponding to each Chinese word in the Chinese explanation sentence have been detected and read in order, one by one, by voice output module (6), the loudspeaker in voice output module (6) utters in sequence the Chinese speech corresponding to each Chinese word in the received Chinese explanation sentence;
when speech recognition module (5) determines that the explanation sentence it has received is a Kirgiz explanation sentence and that the received Kirgiz explanation sentence consists of Kirgiz words expressed in Arabic letters or Cyrillic letters, speech recognition module (5) uses the recorded Kirgiz voice bank stored in the speech database to match speech, one by one and in Kirgiz pronunciation order, to each Kirgiz word of the received Kirgiz explanation sentence, and then passes the buffered Kirgiz pronunciation signals matched in order to the Kirgiz words of the received explanation sentence to voice output module (6) in sequence; after the Kirgiz pronunciation signals corresponding to each Kirgiz word in the received Kirgiz explanation sentence have been detected and read in order, one by one, by voice output module (6), the loudspeaker in voice output module (6) utters in sequence the Kirgiz speech matching each Kirgiz word in the Kirgiz explanation sentence; if speech recognition module (5) determines that the explanation sentence it has received is a Kirgiz explanation sentence but cannot match speech to that sentence, the Kirgiz explanation sentence is presumed to be Kirgiz text expressed in Arabic letters or Cyrillic letters, and the synthesized Kirgiz voice bank stored in the speech database is called to perform syllable-based speech synthesis on the Kirgiz text: the Kirgiz sentence, word and syllable segmentation method is used to cut the Kirgiz text into Kirgiz words known to be stored in the synthesized speech database, then the recorded Kirgiz voice bank and/or the synthesized Kirgiz voice bank is used to match speech, one by one and in Kirgiz pronunciation order, to each Kirgiz word of the Kirgiz text, the buffered Kirgiz pronunciation signals matched to the Kirgiz words cut in order from the Kirgiz text are passed to voice output module (6) in sequence, and after the Kirgiz pronunciation signals have been detected and read in order, one by one, by voice output module (6), the loudspeaker in voice output module (6) utters in sequence the Kirgiz speech matching each Kirgiz word in the Kirgiz text.
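Step (IV) of claim 1 relies on a mapping table between Kirgiz written in Cyrillic letters and Kirgiz written in Arabic letters. The sketch below shows how such a table could be applied letter by letter. The table is a small, partial subset chosen for illustration and is an assumption, not the patent's table; a usable table would cover the full alphabet and should be checked against an authoritative chart. Characters without an entry are passed through unchanged.

```python
from typing import Dict

# Partial, illustrative Cyrillic -> Arabic letter table (assumed subset).
CYRILLIC_TO_ARABIC: Dict[str, str] = {
    "а": "\u0627",  # ARABIC LETTER ALEF
    "б": "\u0628",  # ARABIC LETTER BEH
    "д": "\u062f",  # ARABIC LETTER DAL
    "з": "\u0632",  # ARABIC LETTER ZAIN
    "л": "\u0644",  # ARABIC LETTER LAM
    "м": "\u0645",  # ARABIC LETTER MEEM
    "н": "\u0646",  # ARABIC LETTER NOON
    "п": "\u067e",  # ARABIC LETTER PEH
    "р": "\u0631",  # ARABIC LETTER REH
    "с": "\u0633",  # ARABIC LETTER SEEN
    "т": "\u062a",  # ARABIC LETTER TEH
    "ш": "\u0634",  # ARABIC LETTER SHEEN
}


def transliterate(text: str, table: Dict[str, str] = CYRILLIC_TO_ARABIC) -> str:
    """Replace each lower-cased letter by its table entry, if it has one."""
    return "".join(table.get(ch, ch) for ch in text.lower())
```

A plain one-to-one table ignores any context-dependent spelling rules, so this shows the lookup mechanism only.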
2. The method for automatically translating between Chinese and Kirgiz using a Chinese-Kirgiz electronic dictionary according to claim 1, characterized in that the retrieval mode is the stem retrieval mode, the tail retrieval mode or the containing retrieval mode;
the stem retrieval mode is: A. retrieval module (3) enters each character of the input text one by one in order from left to right; B. the character combination data stored in the basic corpus are compared with the entered character combination of the input text; if a character combination identical to the entered character combination can be found in the basic corpus, retrieval stops, that is, the exact match of the input text is complete; if the stem retrieval mode cannot find in the basic corpus a character combination identical to the input text, retrieval of the input text continues with the following tail retrieval mode;
the tail retrieval mode is: retrieval module (3) enters each character of the input text one by one in order from right to left, left and right being as seen by a person facing the text, and compares the character combination data stored in the basic corpus with the entered character combination of the input text; if a character combination identical to the entered character combination can be found in the basic corpus, retrieval stops, that is, the exact match of the input text is complete; if the tail retrieval mode cannot find in the basic corpus a character combination identical to the input text, retrieval of the input text continues with the containing retrieval mode;
the containing retrieval mode is a retrieval mode that matches the character combination of the input text from any direction, and includes the above stem retrieval mode and tail retrieval mode.
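The fallback order of the three retrieval modes in claim 2 can be sketched as follows. This is one interpretation, stated as an assumption: stem retrieval is treated as a prefix match, tail retrieval as a suffix match and containing retrieval as a substring match. The function name `retrieve` and the sample corpus are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def retrieve(query: str, corpus: Iterable[str]) -> Tuple[str, List[str]]:
    """Try stem, then tail, then containing retrieval; stop at the first hit."""
    entries = list(corpus)
    if not query:
        return "none", []
    modes: List[Tuple[str, Callable[[str, str], bool]]] = [
        ("stem", lambda entry, q: entry.startswith(q)),   # left to right
        ("tail", lambda entry, q: entry.endswith(q)),     # right to left
        ("contains", lambda entry, q: q in entry),        # any position
    ]
    for name, matches in modes:
        hits = [entry for entry in entries if matches(entry, query)]
        if hits:
            return name, hits        # retrieval stops once a mode succeeds
    return "none", []


# Toy corpus for illustration only.
CORPUS = ["китеп", "мектеп", "китепкана"]
```

Returning the name of the mode that succeeded makes it easy to see which fallback stage produced the result.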
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110426747.XA CN103164395B (en) | 2011-12-19 | 2011-12-19 | The method of Chinese Ke e-dictionary and its automatic translation Chinese Ke's language |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103164395A (en) | 2013-06-19 |
CN103164395B (en) | 2017-06-23 |
Family
ID=48587491
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110426747.XA Active CN103164395B (en) | 2011-12-19 | 2011-12-19 | The method of Chinese Ke e-dictionary and its automatic translation Chinese Ke's language |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103164395B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104298661A (en) * | 2013-12-29 | 2015-01-21 | 新疆信息产业有限责任公司 | Method for using Kirgiz translation engine for self-service electric fee payment terminal |
CN105095192A (en) * | 2014-05-05 | 2015-11-25 | 武汉传神信息技术有限公司 | Double-mode translation equipment |
CN106202147A (en) * | 2016-06-16 | 2016-12-07 | 塔里木大学 | A kind of Ke Han intertranslation electronic dictionary based on Android |
CN112818212B (en) * | 2020-04-23 | 2023-10-13 | 腾讯科技(深圳)有限公司 | Corpus data acquisition method, corpus data acquisition device, computer equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6219646B1 (en) * | 1996-10-18 | 2001-04-17 | Gedanken Corp. | Methods and apparatus for translating between languages |
CN101329667A (en) * | 2008-08-04 | 2008-12-24 | 深圳市大正汉语软件有限公司 | Intelligent translation apparatus of multi-language voice mutual translation and control method thereof |
CN102103625A (en) * | 2009-12-17 | 2011-06-22 | 艾利和电子科技(中国)有限公司 | System for automatically searching electronic dictionary according to input language and method thereof |
Non-Patent Citations (3)
Title |
---|
汉维哈柯双语平行语料库加工处理系统的设计与实现;吴小川 等;《电脑知识与技术》;20110930;第7卷(第27期);第6680-6681,6693页 * |
电子词典软件系统中对维、哈、柯文进行自动判别技术的研究;买日旦·吾守尔 等;《新疆大学学报(自然科学版)》;20110228;第28卷(第1期);第88-92页 * |
维、哈、柯、汉、英多文种处理平台的设计与实现;缪成 等;《计算机工程》;20040531;第30卷(第10期);第71-73页 * |
Also Published As
Publication number | Publication date |
---|---|
CN103164395A (en) | 2013-06-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||