CN103164395A - Chinese-Kirgiz language electronic dictionary and automatic translating Chinese-Kirgiz language method thereof - Google Patents

Info

Publication number
CN103164395A
Authority
CN
China
Prior art keywords
language
chinese
word
module
retrieval
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 201110426747
Other languages
Chinese (zh)
Other versions
CN103164395B (en)
Inventor
尼加提·纳吉米
买合木提·买买提
帕肉克·司地克
马斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
XINJIANG INFORMATION INDUSTRY Co Ltd
Original Assignee
XINJIANG INFORMATION INDUSTRY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by XINJIANG INFORMATION INDUSTRY Co Ltd
Priority to CN201110426747.XA
Publication of CN103164395A
Application granted
Publication of CN103164395B
Active
Anticipated expiration

Abstract

The invention discloses a Chinese-Kirgiz language electronic dictionary and an automatic translating Chinese-Kirgiz language method of the electronic dictionary. The Chinese-Kirgiz language electronic dictionary comprises a language recognition module, a searching module, a searching combination output module, a display module, a voice recognition module and a voice output module. After the language of input characters is recognized, the searching module matches the input characters with words of a basic language database, then the voice recognition module effectively recognizes Chinese explaining sentences and Kirgiz language explaining sentences (through a syllable segmentation link), wherein the Chinese explaining sentences and the Kirgiz language explaining sentences are obtained by the searching combination output module and correspond to words to be translated in meaning, then a human voice library is used or a Kirgiz language voice library is composed, the voice recognition module reads the input characters, and voices of the input characters are successively given out through a loudspeaker of the voice recognition module. The electronic dictionary is reasonable in structure, the prior dictionary technology of Chinese-Kirgiz language translation is improved, the efficiency of the Chinese-Kirgiz language translation is improved, and the performance of broadcasting voices of Chinese-Kirgiz language characters is improved.

Description

Chinese-Kirgiz electronic dictionary and method for automatic Chinese-Kirgiz translation
Technical field
The invention belongs to the technical field of machine translation and relates to language conversion technology that uses computer software and hardware to translate between Chinese and Kirgiz, and in particular to a Chinese-Kirgiz electronic dictionary and a method for automatic Chinese-Kirgiz translation.
Background art
In today's information society, people demand ever faster and better acquisition, lookup, and translation of information in all kinds of languages, and a wide range of electronic dictionary products has been developed in response. These range from multimedia electronic encyclopedias containing hundreds of thousands of entries and tens of thousands of media items down to handheld instant translators containing a few thousand entries. They are popular with users, who treat electronic dictionaries as aids for language learning, translation, and quick lookup. Abroad, as machine translation systems and natural language processing systems have moved toward practical use, machine dictionaries have become a focus of development, and a growing number of language translation specialists regard the scale and quality of the machine dictionary as the key to the success or failure of such systems. As early as 1986, Japan's MITI funded a nine-year, 100-million-dollar development plan in support of electronic dictionaries (EDR). The European Community has also subsidized a number of machine dictionary research projects, including ACQUILEX (The Acquisition of Lexical Knowledge), whose goal is to acquire lexical knowledge automatically from multiple machine-readable dictionaries (MRD) in order to build a multilingual lexical knowledge base (LKB) that supports natural language processing. Large-scale machine dictionaries for various languages have been developed on this basis, including basic dictionaries, terminology dictionaries, collocation dictionaries, concept classification dictionaries, concept description dictionaries, and grammar dictionaries. At present there are many kinds of commercial electronic reference works, such as Encyclopaedia Britannica, Compton's Encyclopedia, and ENCARTA.
In China, research on machine translation dictionaries began in the 1950s and 1960s and received full attention after the reform and opening-up. In the late 1980s, experts in the field of Chinese information processing began research on machine dictionaries. In the early 1990s, research on machine dictionaries for information processing was formally included in the national Seventh, Eighth, and Ninth Five-Year Plans, and basic research projects were carried out such as "Research on Modern Chinese Vocabulary for Information Processing", "Chinese Semantic Dictionary Based on Valence", and "Modern Chinese Grammatical Information Dictionary". On this basis, relatively mature information products such as "Encyclopedia of China", "Kingsoft Powerword", and "Dongfang Dadian" were developed and have been well received by users.
In recent years, with the sustained and rapid development of minority-language informatization, electronic dictionaries for minority languages have also advanced considerably in Xinjiang, China. Most of them, however, are ordinary Chinese-Uyghur electronic dictionaries, which do not meet the actual needs of a wider range of users, and the technology for supporting translation of more minority languages remains seriously deficient.
Summary of the invention
One object of the present invention is to provide a Chinese-Kirgiz electronic dictionary that is reasonable in structure and highly versatile.
This object is achieved as follows. A Chinese-Kirgiz electronic dictionary consists of a language identification module, a retrieval module, a retrieval combination output module, a display module, a voice recognition module, and a voice output module. The language identification module is connected through its corresponding interfaces to the interface of the display module and to the interface of the retrieval module. The output interface of the retrieval module is connected to the input interface of the retrieval combination output module; the output interface of the retrieval combination output module is connected to the input interface of the voice recognition module; and the output interface of the voice recognition module is connected to the input interface of the voice output module.
A further object of the present invention is to provide a method by which the Chinese-Kirgiz electronic dictionary automatically translates between Chinese and Kirgiz. The method improves on conventional dictionary technology for Chinese-Kirgiz translation, raises the efficiency of translation between the two languages, and improves the performance of voice playback of Chinese and Kirgiz text (the Kirgiz language is abbreviated as "Ke").
This object is achieved as follows. A method by which the Chinese-Kirgiz electronic dictionary automatically translates between Chinese and Kirgiz comprises the following steps, performed in order:
(I) The display module displays the input text, and a word-capture window is constructed. Using the word-capture window and the screen word-capture method, the language identification module obtains the character code region corresponding to the input text shown by the display module and compares the input text with the coded characters of the stored UNICODE standard coded character set (Universal Multiple-Octet Coded Character Set). It thereby determines whether the input text is Chinese or Kirgiz and then passes the language-identified input text to the retrieval module.
(II) Using a retrieval mode, the retrieval module compares the language-identified input text with the characters stored in the Chinese-Kirgiz (Han-Ke) corpus and the Kirgiz-Chinese (Ke-Han) corpus, which are stored side by side in the base corpus held in memory, in order to retrieve from the base corpus a character combination identical or corresponding to the input text. A match confirms that the input text is a known character or word already stored in the base corpus; the retrieval module may further actively complete a Chinese character combination or the letter combination of a word. If no identical or corresponding character combination (a Chinese word or a Kirgiz word) can be retrieved from the Han-Ke and Ke-Han corpora, the retrieval module judges the language-identified input text to be unknown, and it cannot be confirmed and received by the language identification module.
(III) The language identification module receives the character combination retrieved by the retrieval module and fetches, from the Han-Ke and Ke-Han corpora stored in the base corpus, the character combination in the other language (the language different from that of the input text) whose meaning corresponds to the retrieved character combination. In other words, the input is translated into a Chinese character, a Chinese word, or a Kirgiz word. The input text and/or the corresponding other-language character combination fetched by the language identification module is then passed to the retrieval combination output module, either via the retrieval module or directly.
(IV) Based on the input text and/or the corresponding other-language character combination fetched from the base corpus by the language identification module, the retrieval combination output module obtains, from the Chinese-Chinese (Han-Han) corpus and the Kirgiz-Kirgiz (Ke-Ke) corpus stored side by side in the base corpus, a Chinese explanatory sentence that explains the meaning of the character combination retrieved by the retrieval module. Using a mapping table between Kirgiz words in Cyrillic (Slavic) script and Kirgiz words in Arabic script, it also obtains a Kirgiz explanatory sentence, expressed in Cyrillic or Arabic letters, whose meaning corresponds to the other-language character combination. These sentences explain the meaning of the character combination fetched from the base corpus by the language identification module. The retrieval combination output module then outputs the explanatory sentences it has retrieved to the voice recognition module.
(V) When the voice recognition module determines that the explanatory sentence it has received is a Chinese explanatory sentence, it uses the recorded human-voice Chinese speech library stored in the speech database held in memory to match speech to each Chinese character in the sentence, one by one, in Chinese reading order. It temporarily stores the Chinese pronunciation signals matched to those characters and passes them in sequence to the voice output module. The voice output module checks and reads the pronunciation signals one by one in order, and the loudspeaker in the voice output module then plays, in sequence, the Chinese speech corresponding to each character of the received sentence.
When the voice recognition module determines that the explanatory sentence it has received is a Kirgiz explanatory sentence consisting of Kirgiz words written in Arabic or Cyrillic letters, it uses the recorded human-voice Kirgiz speech library stored in the speech database to match speech to each Kirgiz word in the sentence, one by one, in Kirgiz reading order. It temporarily stores the matched Kirgiz pronunciation signals and passes them in sequence to the voice output module, which checks and reads them one by one in order; the loudspeaker in the voice output module then plays, in sequence, the Kirgiz speech matching each word of the sentence. If the voice recognition module determines that the received sentence is a Kirgiz explanatory sentence but cannot match speech to it, it treats the sentence as Kirgiz text written in Arabic or Cyrillic letters and calls the synthetic Kirgiz speech library stored in the speech database to perform syllable-based speech synthesis on the text. Using a method that maps the words of the Kirgiz sentence to syllable segmentation, the text is cut into Kirgiz units already stored in the synthesis speech database. The recorded Kirgiz speech library and/or the synthetic Kirgiz speech library is then used to match speech to each Kirgiz word of the text, one by one, in Kirgiz reading order. The temporarily stored pronunciation signals matching the words cut from the text are passed in sequence to the voice output module, which checks and reads them one by one in order, and the loudspeaker in the voice output module plays, in sequence, the Kirgiz speech matching each word of the text.
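The language check in step (I) can be illustrated with a minimal sketch. It assumes that comparison against the UNICODE coded character set amounts to testing which Unicode block each code point falls in; the function name `identify_language`, the majority-vote rule, and the choice of blocks (CJK Unified Ideographs for Chinese, Cyrillic and Arabic for Kirgiz) are illustrative assumptions, not details stated in the patent.

```python
def identify_language(text: str) -> str:
    """Guess whether text is Chinese or Kirgiz from Unicode code point ranges.

    Assumption: Chinese is detected via the CJK Unified Ideographs block,
    Kirgiz via the Cyrillic or Arabic blocks. Other characters are ignored.
    """
    chinese = 0
    kirgiz = 0
    for ch in text:
        cp = ord(ch)
        if 0x4E00 <= cp <= 0x9FFF:          # CJK Unified Ideographs
            chinese += 1
        elif 0x0400 <= cp <= 0x04FF:        # Cyrillic
            kirgiz += 1
        elif 0x0600 <= cp <= 0x06FF:        # Arabic
            kirgiz += 1
    if chinese == 0 and kirgiz == 0:
        return "unknown"
    # Hypothetical tie-break: prefer Chinese when counts are equal.
    return "chinese" if chinese >= kirgiz else "kirgiz"


if __name__ == "__main__":
    for sample in ["词典", "сөз", "123"]:
        print(sample, "->", identify_language(sample))
```

A real implementation would also need to handle mixed input and the pinyin (Latin-letter) case, which this sketch reports as "unknown".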
The present invention is a bidirectional Chinese-Kirgiz multimedia electronic dictionary grounded in computational linguistics, ethnology, sociology, pragmatics, translation studies, and computer information processing technology. It uses a bilingual Chinese-Kirgiz encoding format based on the international UNICODE standard to provide bidirectional Chinese-Kirgiz and Kirgiz-Chinese word input and the reading aloud of Chinese and Kirgiz words and text. It can capture Chinese and Kirgiz characters by the screen word-capture method and can convert between the Kirgiz character encodings used at home and abroad under different operating systems. It also offers a multilingual Chinese and Kirgiz interface, quick search and fuzzy search of Chinese and Kirgiz words, direct input of Kirgiz script, management of the dictionary word banks, and additional functions such as dictionary settings, dictionary tools, dictionary appendices, and online upgrading.
The invention provides an input method for Kirgiz in Arabic script that does not depend on any other Kirgiz input method, which improves usability. It provides real-time bidirectional Chinese-Kirgiz translation by screen word capture, which is convenient for users of Chinese and Kirgiz. It provides standard read-aloud pronunciation of Chinese and Kirgiz words and phrases, making it a powerful tool for learning Chinese and Kirgiz. It has a large Kirgiz corpus, word and phrase explanation functions, and a conversion display between Kirgiz in Cyrillic (Slavic) script (Kirghizstan) and Kirgiz in Arabic script (Xinjiang, China). This makes it easier for speakers of other languages to learn the Kirgiz language and Kirgiz history, folkways, and customs, and it offers many examples that help them understand the geography, regions, and character of Xinjiang and Kirghizstan.
The invention addresses the language barrier that Kirgiz people at home and abroad whose mother tongue is Kirgiz face in acquiring modern knowledge and in daily life. It lets Kirgiz language learners at home and abroad translate quickly and thereby obtain all kinds of information. It helps Kirgiz people learn Chinese and helps Han Chinese and foreigners learn Kirgiz. As a translation tool for Kirgiz and Chinese users, it is of great significance for raising the Chinese speaking and writing level of the Kirgiz people. It also lays a solid foundation for the future construction of a Chinese-Kirgiz machine translation dictionary base and for the development of Uzbek-Chinese and Turkish-Chinese bidirectional electronic dictionaries and computer-aided translation systems.
The technical features of the present invention are as follows:
1. It provides bidirectional word translation between Chinese and Kirgiz: entering a word in either language into the Chinese-Kirgiz electronic dictionary returns its definition in the other language.
2. It provides a built-in, component-style Kirgiz input method that supports the international UNICODE standard, so that standard Kirgiz words can be entered correctly even when the user has not installed any Kirgiz input method.
3. It supports screen word capture for Kirgiz in the current mainstream Windows operating systems (Windows XP, Windows Server, Windows Vista, Windows 7).
4. It uses statistical and phonetic techniques to read Kirgiz words and text aloud, with standard, clear voice output, which is a comparatively advanced technical feature.
5. It provides additional functions such as online dictionary upgrading, dictionary settings, dictionary tools, and dictionary appendices, which can be configured to the user's needs.
6. It provides a friendly multilingual dictionary interface; the interface language and the translation direction can be selected through user-friendly settings.
7. It automatically identifies the language of the input text: the input text is analyzed, its language is determined automatically, and the word is translated.
8. The Chinese-Kirgiz dictionary contains nearly 100,000 vocabulary entries, together with a recorded human-voice speech library and a synthetic read-aloud library based on syllable segmentation.
9. It converts and displays Kirgiz in Cyrillic (Slavic) script (Central Asia, Kirghizstan) and Kirgiz in Arabic script (Xinjiang, China); both written forms are shown together in the definition window, which effectively widens the range of use of the invention.
The electronic dictionary of the present invention is reasonable in structure and highly versatile. Its method improves on conventional dictionary technology for Chinese-Kirgiz translation, raises the efficiency of translation between Chinese and Kirgiz, and improves the performance of voice playback of Chinese and Kirgiz text.
Description of drawings
The accompanying drawing is a schematic diagram of the module connections of the present invention and of the overall flow of its method for automatic Chinese-Kirgiz translation.
Embodiment
A Chinese-Kirgiz electronic dictionary, as shown in the accompanying drawing, consists of a language identification module 2, a retrieval module 3, a retrieval combination output module 4, a display module 1, a voice recognition module 5, and a voice output module 6. The language identification module 2 is connected through its corresponding interfaces to the interface of the display module 1 and to the interface of the retrieval module 3. The output interface of the retrieval module 3 is connected to the input interface of the retrieval combination output module 4; the output interface of the retrieval combination output module 4 is connected to the input interface of the voice recognition module 5; and the output interface of the voice recognition module 5 is connected to the input interface of the voice output module 6.
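The module chain described above can be written down as a small directed graph, which makes the data path easy to check. This is only a sketch: the numbers are the reference numerals from the description, while the direction of the link between the display module and the language identification module, and the names `MODULES`, `LINKS`, and `trace`, are assumptions made for illustration.

```python
# Reference numerals as used in the description.
MODULES = {
    1: "display",
    2: "language identification",
    3: "retrieval",
    4: "retrieval combination output",
    5: "voice recognition",
    6: "voice output",
}

# Directed links between modules (output interface -> input interface).
# Assumption: input flows from the display module to language identification.
LINKS = {1: [2], 2: [3], 3: [4], 4: [5], 5: [6], 6: []}


def trace(start: int) -> list:
    """Follow the first outgoing link from each module until the chain ends."""
    path = [start]
    while LINKS[path[-1]]:
        path.append(LINKS[path[-1]][0])
    return path


if __name__ == "__main__":
    print(" -> ".join(MODULES[n] for n in trace(1)))
```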
A method by which the Chinese-Kirgiz electronic dictionary automatically translates between Chinese and Kirgiz, as shown in the accompanying drawing, comprises the following steps, performed in order:
(I) The display module 1 displays the text entered (by keyboard), laying out the input text in turn as mixed text and as mixed text and graphics, and a word-capture window is constructed. Using the word-capture window and the screen word-capture method, the language identification module 2 obtains the character code region corresponding to the input text shown by the display module 1 and compares the input text with the coded characters of the stored UNICODE standard coded character set (Universal Multiple-Octet Coded Character Set). It thereby determines whether the input text is Chinese or Kirgiz and passes the language-identified input text to the retrieval module 3. Note: if the language identification module 2 determines that the input text it has received is Chinese pinyin, it first compares the letter combination of the input pinyin, one by one, with all letter combinations of the pinyin corpus in the base corpus (the word-capture database) held in memory. If the input letter combination is not identical or corresponding to any letter combination stored in the pinyin corpus, no Chinese character with the same pronunciation can be obtained from the pinyin corpus. If it is identical or corresponding to a stored letter combination, the Chinese characters corresponding to the input pinyin pronunciation can be obtained from the pinyin corpus; that is, a list of candidate Chinese characters with the same pronunciation as the input pinyin is fetched from the pinyin corpus. The user selects a candidate from this list, the selected candidate is passed to the display module 1 and displayed, and the selected Chinese character is then sent to the retrieval module 3. The pinyin corpus stores (as an index) the Chinese characters and Chinese words whose pronunciation matches each pinyin letter combination. If the language identification module 2 determines that the input text it has directly received is written Chinese, it passes that Chinese text directly to the retrieval module 3.
(II) Using a retrieval mode, the retrieval module 3 compares the language-identified input text with the characters stored in the Chinese-Kirgiz (Han-Ke) corpus and the Kirgiz-Chinese (Ke-Han) corpus, which are stored side by side in the base corpus held in memory (the characters being Chinese words or Kirgiz words), in order to retrieve from the base corpus a character combination identical or corresponding to the input text. A match confirms that the input text is a known character or word already stored in the base corpus; the retrieval module 3 may further actively complete a Chinese character combination or the letter combination of a word. If no identical or corresponding character combination (a Chinese word or a Kirgiz word) can be retrieved from the Han-Ke and Ke-Han corpora, the retrieval module 3 judges the language-identified input text to be unknown, and it cannot be confirmed and received by the language identification module 2. The Han-Ke corpus stores the Kirgiz word corresponding to each Chinese character or Chinese word, and the Ke-Han corpus stores the Chinese character or Chinese word corresponding to each Kirgiz word.
(III) The language identification module 2 receives the character combination retrieved by the retrieval module 3 and fetches, from the Han-Ke and Ke-Han corpora stored in the base corpus, the character combination in the other language (the language different from that of the input text) whose meaning corresponds to the retrieved character combination. That is, a Kirgiz word is translated into a Chinese character or Chinese word, or a Chinese character or Chinese word is translated into a Kirgiz word. The input text and/or the corresponding other-language character combination fetched by the language identification module 2 is then passed to the retrieval combination output module 4, either via the retrieval module 3 or directly.
(IV) Based on the input text and/or the corresponding other-language character combination fetched from the base corpus by the language identification module 2, the retrieval combination output module 4 obtains, from the Chinese-Chinese (Han-Han) corpus and the Kirgiz-Kirgiz (Ke-Ke) corpus stored side by side in the base corpus, a Chinese explanatory sentence that explains the meaning of the character combination retrieved by the retrieval module 3. Using a mapping table between Kirgiz words in Cyrillic (Slavic) script and Kirgiz words in Arabic script, it also obtains a Kirgiz explanatory sentence, expressed in Cyrillic or Arabic letters, whose meaning corresponds to the other-language character combination (that is, text conversion is performed). The explanatory sentence produced for a word in a given language must be an explanatory sentence written in the language of the input text. The sentences explain the meaning of the character combination fetched from the base corpus by the language identification module 2. For example, a Kirgiz word may be explained with a Chinese explanatory sentence corresponding to its meaning; a Chinese character or word may be explained with a Kirgiz explanatory sentence, written in Arabic or Cyrillic letters, corresponding to its meaning; a Kirgiz word may be explained with such a Kirgiz explanatory sentence; or a Chinese character or word may be explained with a Chinese explanatory sentence corresponding to its meaning. The retrieval combination output module 4 then outputs the explanatory sentences it has retrieved (Chinese explanatory sentences and Kirgiz explanatory sentences) to the voice recognition module 5. The Han-Han corpus stores the Chinese words and sentences that explain each Chinese character or word, and the Ke-Ke corpus stores the Kirgiz words and sentences that explain each Kirgiz word.
(V) When the voice recognition module 5 determines that the explanatory sentence it has received is a Chinese explanatory sentence, it uses the recorded human-voice Chinese speech library stored in the speech database held in memory to match speech to each Chinese character in the sentence, one by one, in Chinese reading order. It temporarily stores the Chinese pronunciation signals matched to those characters and passes them in sequence to the voice output module 6. The voice output module 6 checks and reads the pronunciation signals one by one in order, and the loudspeaker in the voice output module 6 then plays, in sequence, the Chinese speech corresponding to each character of the received sentence.
When the voice recognition module 5 determines that the explanatory sentence it has received is a Kirgiz explanatory sentence consisting of Kirgiz words written in Arabic or Cyrillic letters, it uses the recorded human-voice Kirgiz speech library stored in the speech database to match speech to each Kirgiz word in the sentence, one by one, in Kirgiz reading order. It temporarily stores the matched Kirgiz pronunciation signals and passes them in sequence to the voice output module 6, which checks and reads them one by one in order; the loudspeaker in the voice output module 6 then plays, in sequence, the Kirgiz speech matching each word of the sentence. If the voice recognition module 5 determines that the received sentence is a Kirgiz explanatory sentence but cannot match speech to it, it treats the sentence as Kirgiz text written in Arabic or Cyrillic letters (that is, it switches to text processing) and calls the synthetic Kirgiz speech library stored in the speech database to perform syllable-based speech synthesis on the text. Using a method that maps the words of the Kirgiz sentence to syllable segmentation, the text is cut into Kirgiz units already stored in the synthesis speech database. The recorded Kirgiz speech library and/or the synthetic Kirgiz speech library is then used to match speech to each Kirgiz word of the text, one by one, in Kirgiz reading order. The temporarily stored pronunciation signals matching the words cut from the text are passed in sequence to the voice output module 6, which checks and reads them one by one in order, and the loudspeaker in the voice output module 6 plays, in sequence, the Kirgiz speech matching each word of the text.
The retrieval mode is the head retrieval mode, the tail retrieval mode or the containing retrieval mode. The head retrieval mode is: A, retrieval module 3 reads each character of the input text one by one from left to right; B, the character combinations stored in the basic corpus (the Chinese-Kirgiz corpus and the Kirgiz-Chinese corpus) are compared with the character combination read from the input text. If a character combination identical to the characters read in can be found in the basic corpus, retrieval stops and the input text has been matched exactly. If the head retrieval mode cannot find a character combination identical to the input text in the basic corpus, the tail retrieval mode below is used to continue retrieving the input text.
The tail retrieval mode is: 1. retrieval module 3 reads each character of the input text one by one from right to left (left and right as seen by the reader); 2. the same as step B of the head retrieval mode above. If the tail retrieval mode cannot find a character combination identical to the input text in the basic corpus, the containing retrieval mode below is used to continue retrieving the input text.
The containing retrieval mode matches the character combination of the input text from any direction and therefore covers both the head retrieval mode and the tail retrieval mode above. Using this mode, retrieval module 3 finds in the basic corpus the characters identical to the input text and thereby completes the exact matching of the input text.
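The fallback order of the three retrieval modes can be sketched as follows. This is only one plausible reading of the text above, assuming that the head mode behaves like a prefix match, the tail mode like a suffix match, and the containing mode like a substring match. The function names and the sample entries are hypothetical and do not come from the patent.

```python
from typing import Iterable, List, Tuple


def head_match(query: str, entries: List[str]) -> List[str]:
    """Head mode (assumed): entries that begin with the query."""
    return [e for e in entries if e.startswith(query)]


def tail_match(query: str, entries: List[str]) -> List[str]:
    """Tail mode (assumed): entries that end with the query."""
    return [e for e in entries if e.endswith(query)]


def contains_match(query: str, entries: List[str]) -> List[str]:
    """Containing mode (assumed): entries with the query anywhere inside."""
    return [e for e in entries if query in e]


def retrieve(query: str, corpus: Iterable[str]) -> Tuple[str, List[str]]:
    """Try head, then tail, then containing mode; stop at the first hit."""
    entries = list(corpus)
    modes = (
        ("head", head_match),
        ("tail", tail_match),
        ("contains", contains_match),
    )
    for name, matcher in modes:
        matches = matcher(query, entries)
        if matches:
            return name, matches
    return "none", []


# Illustrative entries only; the real basic corpus is not given in the source.
SAMPLE = ["китеп", "китепкана", "мектеп"]
print(retrieve("кит", SAMPLE))
print(retrieve("теп", SAMPLE))
print(retrieve("епка", SAMPLE))
```

Returning the name of the mode that succeeded makes it easy to see which fallback step produced the result.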
The retrieval flow of the present invention involves language identification module 2, retrieval module 3, retrieval combination output module 4 and the basic corpus. Its main steps are: 1) the user enters the Chinese or Kirgiz text to be looked up using a Chinese or Kirgiz input method, and the language (Chinese or Kirgiz) of the input text (source-language word or text) is determined from the UNICODE encoding of the input data; 2) according to the retrieval mode set by the user and the identified language of the input text, retrieval module 3 retrieves the Chinese and/or Kirgiz words or text that match the input text; 3) based on the result retrieved by retrieval module 3, the Chinese and Kirgiz explanatory example sentences whose meaning matches the identical or corresponding Chinese and/or Kirgiz word are fetched from the basic corpus and combined into the data to be output.
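Step 1 of this flow, deciding the language from the UNICODE encoding, could look roughly like the sketch below. It assumes that only Chinese and Kirgiz input is in scope, so CJK ideographs are taken as Chinese and Cyrillic or Arabic letters as Kirgiz. The function name and the majority-vote rule are assumptions made for illustration; script detection alone cannot distinguish Kirgiz from other languages written in the same scripts.

```python
def detect_language(text: str) -> str:
    """Classify text as 'chinese', 'kirgiz' or 'unknown' by Unicode block."""
    chinese = 0
    kirgiz = 0
    for ch in text:
        cp = ord(ch)
        if 0x4E00 <= cp <= 0x9FFF:  # CJK Unified Ideographs block
            chinese += 1
        elif 0x0400 <= cp <= 0x04FF:  # Cyrillic block
            kirgiz += 1
        elif 0x0600 <= cp <= 0x06FF:  # Arabic block
            kirgiz += 1
    if chinese == 0 and kirgiz == 0:
        return "unknown"
    # Assumed tie-break: prefer Chinese when the counts are equal.
    return "chinese" if chinese >= kirgiz else "kirgiz"


print(detect_language("你好"))
print(detect_language("китеп"))
print(detect_language("123"))
```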
The screen word-capture and translation flow of the present invention involves language identification module 2, display module 1, retrieval module 3 and the word-capture database (basic corpus). Its main steps are: 1) the user enters text (the word or text to be translated); 2) language identification module 2 determines the language (Chinese or Kirgiz) of the input text (source-language word or text) from the UNICODE encoding of the input data; 3) according to the language determined by language identification module 2, retrieval module 3 obtains the word or text matching the input text from the Chinese word-capture dictionary or the Kirgiz word-capture dictionary (the Chinese-Kirgiz corpus and/or the Kirgiz-Chinese corpus); 4) according to the final match found by retrieval module 3, display module 1 builds the screen word-capture translation interface using mixed-text layout and mixed text-and-image layout techniques and shows the final translation result (Chinese words and sentences or Kirgiz words and sentences).
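Step 3, choosing the dictionary by the detected language, can be illustrated with a small sketch. The two dictionaries below stand in for the Chinese-Kirgiz and Kirgiz-Chinese corpora; their contents, the names `CHINESE_KIRGIZ`, `KIRGIZ_CHINESE` and `look_up`, and the language labels are assumptions for illustration, not data or identifiers from the patent.

```python
from typing import Dict, Optional

# Illustrative sample data only; the real corpora are not given in the source.
CHINESE_KIRGIZ: Dict[str, str] = {"书": "китеп"}
KIRGIZ_CHINESE: Dict[str, str] = {"китеп": "书"}


def look_up(word: str, language: str) -> Optional[str]:
    """Pick the corpus for the detected source language and look the word up."""
    if language == "chinese":
        corpus = CHINESE_KIRGIZ
    elif language == "kirgiz":
        corpus = KIRGIZ_CHINESE
    else:
        return None  # language not recognised, nothing to search
    return corpus.get(word)  # None when the word is not in the corpus


print(look_up("书", "chinese"))
print(look_up("китеп", "kirgiz"))
```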
The read-aloud flow of the present invention involves language identification module 2, voice output module 6, retrieval combination output module 4 and the speech database. Its main steps are: 1) language identification module 2 determines the language of the Chinese or Kirgiz explanation statement (the text entered in the screen word-capture step) that it receives from retrieval combination output module 4. If the explanation statement is Chinese, the Chinese words are matched against the human-recorded Chinese speech library. If the explanation statement is Kirgiz, the module further checks whether the received Kirgiz explanation statement is a single Kirgiz word. If it is a Kirgiz word, the identical or corresponding Kirgiz word is matched directly from the human-recorded Kirgiz speech library. If voice output module 6 cannot find a matching Kirgiz word, the statement is handed over to text processing: the explanation statement is treated as Kirgiz text, the Kirgiz sentence and syllable splitting technique is used to cut the text into Kirgiz words according to the features of the Kirgiz language and to cut each word into syllables, all syllables of each Kirgiz word are matched from the synthetic Kirgiz speech library, and the complete Kirgiz speech for the text is finally assembled; 2) the computer's speech device detects, reads, outputs and plays the Kirgiz text.
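The fallback from recorded words to syllable-based synthesis can be sketched as below. The sketch only plans which audio clips to play. The syllable splitter is passed in as a parameter because the source does not give the actual Kirgiz syllabification rules; the toy splitter, the clip names and the function name `plan_playback` are all hypothetical.

```python
from typing import Callable, Dict, List


def plan_playback(
    words: List[str],
    human_bank: Dict[str, str],
    synth_bank: Dict[str, str],
    syllabify: Callable[[str], List[str]],
) -> List[str]:
    """Return clip identifiers in playback order.

    A word found in the human-recorded bank is played as one clip.
    Otherwise the word is split into syllables and each syllable is
    taken from the synthetic bank.
    """
    plan: List[str] = []
    for word in words:
        if word in human_bank:
            plan.append(human_bank[word])
            continue
        for syllable in syllabify(word):
            if syllable not in synth_bank:
                raise KeyError(f"no clip for syllable {syllable!r}")
            plan.append(synth_bank[syllable])
    return plan


# Toy data and a toy splitter, for illustration only.
HUMAN = {"китеп": "human_kitep.wav"}
SYNTH = {"мек": "synth_mek.wav", "теп": "synth_tep.wav"}
TOY_SPLITS = {"мектеп": ["мек", "теп"]}


def toy_syllabify(word: str) -> List[str]:
    return TOY_SPLITS.get(word, [word])


print(plan_playback(["китеп", "мектеп"], HUMAN, SYNTH, toy_syllabify))
```

Raising an error for a missing syllable is a design choice of this sketch; the source does not say what happens when a syllable is absent from the synthetic library.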
The user enters the text to be looked up (source-language word or text) in the on-screen input box using a keyboard input method. After the language identification step has determined the language of the input (Chinese or Kirgiz), retrieval module 3 uses any one of the pinyin retrieval method, the head retrieval method, the tail retrieval method, the containing retrieval method or the exact-match retrieval method to match the input text against the words of the pinyin corpus, the Chinese-Kirgiz corpus and the Kirgiz-Chinese corpus, and retrieves from the basic corpus the word to be translated that is identical or corresponding to the input text. Based on the word retrieved by retrieval module 3, retrieval combination output module 4 obtains the Chinese explanation statement and the Kirgiz explanation statement that correspond to the meaning of that word. The translated Chinese or Kirgiz explanation statement is then edited with mixed-text layout and mixed text-and-image layout techniques, combined into the text data to be output, and shown in the result display area of the screen.
The user selects the text (word or text) to be translated and explained by positioning the cursor. After the input text has passed through the language identification step, language identification module 2 retrieves, from the commonly used Chinese word-capture dictionary and the commonly used Kirgiz word-capture dictionary (the Chinese-Kirgiz corpus and/or the Kirgiz-Chinese corpus), the text in the other language (the translation data) whose meaning is identical or corresponding to the input text. Mixed-text layout and mixed text-and-image layout techniques are then used to combine the translation data (the result) into the output data, a display interface sized to fit the output data is built dynamically, and the final translation result is shown.
After the user enters text (source-language word or text), the input passes through the language identification step, the word retrieval step, the Chinese and Kirgiz translation confirmation step, the Kirgiz word and syllable segmentation step and so on. The human-recorded Chinese speech library, the human-recorded Kirgiz speech library and the synthetic Kirgiz speech library are then called to generate the corresponding Chinese or Kirgiz speech file for the input text. Speech recognition module 5 (the speech detection device) reads the input text, and its loudspeaker plays the speech of the input text syllable by syllable in order.

Claims (3)

1. A Chinese-Kirgiz electronic dictionary, characterized in that it consists of a language identification module (2), a retrieval module (3), a retrieval combination output module (4), a display module (1), a speech recognition module (5) and a voice output module (6); the language identification module (2) is connected through its corresponding interfaces to the interface of the display module (1) and the interface of the retrieval module (3); the output interface of the retrieval module (3) is connected to the input interface of the retrieval combination output module (4); the output interface of the retrieval combination output module (4) is connected to the input interface of the speech recognition module (5); and the output interface of the speech recognition module (5) is connected to the input interface of the voice output module (6).
2. A method by which a Chinese-Kirgiz electronic dictionary automatically translates between Chinese and Kirgiz, the steps of which are carried out in the following order:
(I) The display module (1) shows the input text and builds a word-capture window. Using the word-capture window and the screen word-capture method, the language identification module (2) obtains the character code region corresponding to the input text shown by the display module (1), compares the input text with the coded characters of the stored UNICODE standard coded character set, determines whether the language of the input text is Chinese or Kirgiz, and then passes the input text, with its language identified, to the retrieval module (3);
(II) The retrieval module (3) obtains the retrieval mode and compares the input text, whose language has been identified, with the characters stored in the Chinese-Kirgiz corpus and the Kirgiz-Chinese corpus, which are stored side by side in the basic corpus held in memory, so as to retrieve from the basic corpus a character combination identical or corresponding to the characters of the input text. This confirms that the input text is a known character or word already stored in the basic corpus, or allows the Chinese word combination or letter combination to be completed automatically. If no character combination (Chinese word or Kirgiz word) identical or corresponding to the input text can be retrieved from the Chinese-Kirgiz corpus and the Kirgiz-Chinese corpus, the retrieval module (3) judges the input text to be unknown, and it cannot be confirmed or received by the language identification module (2);
(III) The language identification module (2) receives the character combination retrieved by the retrieval module (3) and fetches, from the Chinese-Kirgiz corpus and the Kirgiz-Chinese corpus stored in the basic corpus, the character combination in the other language that corresponds in meaning to the retrieved character combination, that is, its translation as a Chinese character, Chinese word or Kirgiz word. The input text and/or the other-language character combination fetched from the basic corpus by the language identification module (2) is then passed to the retrieval combination output module (4), either through the retrieval module (3) or directly;
(IV) According to the input text and/or the other-language character combination fetched from the basic corpus by the language identification module (2), the retrieval combination output module (4) obtains, from the Chinese-Chinese corpus and the Kirgiz-Kirgiz corpus stored side by side in the basic corpus, the Chinese explanation statement that explains the meaning of the character combination retrieved by the retrieval module (3). According to the mapping table between Cyrillic-script Kirgiz words and Arabic-script Kirgiz words, it obtains the Kirgiz explanation statement, written in Cyrillic or Arabic letters, that corresponds in meaning to the other-language character combination and explains the meaning of the character combination fetched from the basic corpus by the language identification module (2). The retrieval combination output module (4) then outputs the explanation statements it has retrieved to the speech recognition module (5);
(V) When the speech recognition module (5) determines that the explanation statement it has received is a Chinese explanation statement, it uses the human-recorded Chinese speech library held in the speech database in memory to match, one by one and in pronunciation order, each Chinese word in the received statement against its recorded pronunciation. The matched Chinese pronunciation signals are buffered and passed in order to the voice output module (6), which detects and reads them one by one; the loudspeaker in the voice output module (6) then plays, in order, the Chinese speech corresponding to each Chinese word in the received statement;
When the speech recognition module (5) determines that the received explanation statement is a Kirgiz explanation statement made up of Kirgiz words written in Arabic or Cyrillic letters, it uses the human-recorded Kirgiz speech library stored in the speech database to match, one by one and in pronunciation order, each Kirgiz word in the statement. The matched Kirgiz pronunciation signals are buffered and passed in order to the voice output module (6), which detects and reads them one by one; the loudspeaker in the voice output module (6) then plays, in order, the Kirgiz speech matching each Kirgiz word in the statement. If the speech recognition module (5) determines that the received explanation statement is Kirgiz but cannot match it against recorded speech, it treats the statement as Kirgiz text written in Arabic or Cyrillic letters and calls the synthetic Kirgiz speech library stored in the speech database to perform syllable-based speech synthesis on the text. Using the Kirgiz word segmentation and syllable-splitting method, the text is cut into Kirgiz words known to the synthesis speech database; the human-recorded Kirgiz speech library and/or the synthetic Kirgiz speech library is then used to match, one by one and in pronunciation order, each Kirgiz word of the text. The buffered Kirgiz pronunciation signals that match the segmented words are passed in order to the voice output module (6), which detects and reads them one by one, and the loudspeaker in the voice output module (6) plays, in order, the Kirgiz speech matching each Kirgiz word in the text.
3. The method by which a Chinese-Kirgiz electronic dictionary automatically translates between Chinese and Kirgiz according to claim 2, characterized in that the retrieval mode is the head retrieval mode, the tail retrieval mode or the containing retrieval mode;
The head retrieval mode is: A, the retrieval module (3) reads each character of the input text one by one from left to right; B, the character combinations stored in the basic corpus are compared with the character combination read from the input text. If a character combination identical to the characters read in can be found in the basic corpus, retrieval stops and the input text has been matched exactly. If the head retrieval mode cannot find a character combination identical to the input text in the basic corpus, the tail retrieval mode below is used to continue retrieving the input text;
The tail retrieval mode is: 1. the retrieval module (3) reads each character of the input text one by one from right to left (left and right as seen by the reader); 2. the same as step B of the head retrieval mode above. If the tail retrieval mode cannot find a character combination identical to the input text in the basic corpus, the containing retrieval mode below is used to continue retrieving the input text;
The containing retrieval mode matches the character combination of the input text from any direction and covers both the head retrieval mode and the tail retrieval mode above.
CN201110426747.XA 2011-12-19 2011-12-19 Chinese-Kirgiz electronic dictionary and method thereof for automatically translating between Chinese and Kirgiz Active CN103164395B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110426747.XA CN103164395B (en) 2011-12-19 2011-12-19 The method of Chinese Ke e-dictionary and its automatic translation Chinese Ke's language

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110426747.XA CN103164395B (en) 2011-12-19 2011-12-19 The method of Chinese Ke e-dictionary and its automatic translation Chinese Ke's language

Publications (2)

Publication Number Publication Date
CN103164395A (en) 2013-06-19
CN103164395B (en) 2017-06-23

Family

ID=48587491

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110426747.XA Active CN103164395B (en) 2011-12-19 2011-12-19 Chinese-Kirgiz electronic dictionary and method thereof for automatically translating between Chinese and Kirgiz

Country Status (1)

Country Link
CN (1) CN103164395B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104298661A (en) * 2013-12-29 2015-01-21 新疆信息产业有限责任公司 Method for using Kirgiz translation engine for self-service electric fee payment terminal
CN105095192A (en) * 2014-05-05 2015-11-25 武汉传神信息技术有限公司 Double-mode translation equipment
CN106202147A (en) * 2016-06-16 2016-12-07 塔里木大学 A kind of Ke Han intertranslation electronic dictionary based on Android
CN112818212A (en) * 2020-04-23 2021-05-18 腾讯科技(深圳)有限公司 Corpus data acquisition method and device, computer equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6085162A (en) * 1996-10-18 2000-07-04 Gedanken Corporation Translation system and method in which words are translated by a specialized dictionary and then a general dictionary
CN101329667A (en) * 2008-08-04 2008-12-24 深圳市大正汉语软件有限公司 Intelligent translation apparatus of multi-language voice mutual translation and control method thereof
KR20110069488A (en) * 2009-12-17 2011-06-23 주식회사 아이리버 System for automatic searching of electronic dictionary according input language and method thereof

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104298661A (en) * 2013-12-29 2015-01-21 新疆信息产业有限责任公司 Method for using Kirgiz translation engine for self-service electric fee payment terminal
CN105095192A (en) * 2014-05-05 2015-11-25 武汉传神信息技术有限公司 Double-mode translation equipment
CN106202147A (en) * 2016-06-16 2016-12-07 塔里木大学 A kind of Ke Han intertranslation electronic dictionary based on Android
CN112818212A (en) * 2020-04-23 2021-05-18 腾讯科技(深圳)有限公司 Corpus data acquisition method and device, computer equipment and storage medium
CN112818212B (en) * 2020-04-23 2023-10-13 腾讯科技(深圳)有限公司 Corpus data acquisition method, corpus data acquisition device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN103164395B (en) 2017-06-23

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant