CN103984420B - A kind of Tibetan language intelligent input method based on phonetic - Google Patents

A kind of Tibetan language intelligent input method based on phonetic

Info

Publication number
CN103984420B
Authority
CN
China
Prior art keywords
tibetan language
input method
phonetic
input
syllable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410142863.2A
Other languages
Chinese (zh)
Other versions
CN103984420A (en)
Inventor
程卫军
洛桑旦增
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Minzu University of China
Original Assignee
Minzu University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Minzu University of China filed Critical Minzu University of China
Priority to CN201410142863.2A priority Critical patent/CN103984420B/en
Publication of CN103984420A publication Critical patent/CN103984420A/en
Application granted granted Critical
Publication of CN103984420B publication Critical patent/CN103984420B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a pinyin-based (phonetic spelling) intelligent input method for Tibetan. The method comprises: 1) assigning a key-position code to each Tibetan consonant and each Tibetan vowel; 2) assigning each syllable a corresponding pinyin code according to the spelling order of Tibetan syllables, and saving it in an input method character library; 3) building a relational tree of pinyin codes and key-position codes; 4) building an input method engine on the character library, where the engine traverses the relational tree according to the input key-position codes to obtain the corresponding pinyin code, then queries the character library with that pinyin code and returns the corresponding Tibetan text. Compared with the prior art, the invention has few duplicate codes, is easy to implement, makes the lexicon easy to build and extend, and matches the natural thinking of Tibetan writing, so that Tibetan input becomes more convenient, faster and more flexible.

Description

A kind of Tibetan language intelligent input method based on phonetic
Technical field
The present invention relates to an input method, and in particular to a pinyin-based intelligent input method for Tibetan.
Background art
Since its creation, the Tibetan script has served as the main carrier of the nation's cultural heritage, as the main tool for spreading scientific and technical knowledge in Tibetan areas today, and, in the information society, as a principal marker of national identity. Its unique cultural value and the role it plays across the Tibetan areas are immeasurable.
In recent decades Tibetan has entered the information age. Computer processing of Tibetan has developed considerably and produced many results, progressing from Tibetan typing to Tibetan typesetting, Tibetan e-mail, Tibetan websites, Tibetan application software, Tibetan courseware and so on.
Tibetan is an alphabetic script that has both a horizontal and a vertical writing structure. A phrase or sentence is composed of syllables (also called characters); one syllable corresponds to one sound and is itself composed of several Tibetan letters, so at first glance it looks much like English, for example […]. Within a syllable, however, writing starts from a base letter, and the syllable is built by stacking and combining a superscript letter, a subscript letter, a prefix and a suffix, so the script also has the character of a two-dimensional (planar) script. The structure of a Tibetan syllable takes one letter as its core, called the "base letter"; the remaining letters are attached before and after it or stacked above and below it to form a complete syllable, and each letter is named for its position relative to the base letter, as shown in Fig. 1.
All 30 Tibetan consonants can serve as the base letter, but the letters that can serve as prefix, suffix, superscript or subscript are fixed by the grammar and limited in number.
Tibetan pronunciation is likewise centered on the base consonant. A syllable has only one vowel (the vowel a may be omitted), so one syllable corresponds to one sound. When a Tibetan syllable is spelled out, spelling starts from the leftmost consonant, in the order: 1) prefix, 2) superscript, 3) base letter, 4) subscript, 5) vowel, 6) suffix, 7) second suffix.
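The seven-position spelling order above can be sketched in a few lines of Python. This is only an illustration: the slot names, the `spell` function and the Latin stand-in letters are assumptions of the sketch and do not come from the patent.

```python
# Slot names for the seven positions of a Tibetan syllable, in the spelling
# order given in the text. The English slot names are this sketch's own labels.
SPELLING_ORDER = (
    "prefix", "superscript", "base", "subscript",
    "vowel", "suffix", "second_suffix",
)


def spell(components):
    """Return the letters of one syllable in spelling order."""
    unknown = set(components) - set(SPELLING_ORDER)
    if unknown:
        raise ValueError(f"unknown slots: {sorted(unknown)}")
    if "base" not in components:
        raise ValueError("a syllable needs a base letter")
    return [components[slot] for slot in SPELLING_ORDER if slot in components]


# Latin letters stand in for Tibetan letters here (hypothetical example data).
full = {
    "base": "g", "vowel": "u", "prefix": "b", "superscript": "s",
    "subscript": "r", "suffix": "b", "second_suffix": "s",
}
print(spell(full))           # letters come out in slots 1) to 7)
print(spell({"base": "k"}))  # a bare base letter is a valid syllable
```

Only the base letter is mandatory in this sketch; every other slot is optional, which mirrors the statement that the remaining letters are attached around the base letter.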
Tibetan is written syllable by syllable, horizontally from left to right, with syllables separated by a dot, for example […]. The writing order of a syllable is the same as its spelling order, and most input methods enter Tibetan codes in this order. Their specific input schemes, however, are fairly complex and have serious drawbacks, because some letters must change shape when written as a superscript or subscript. Tibetan is therefore assigned 211 characters in the international Unicode encoding, including base characters, stacking characters, digits, astrological and calendrical symbols and so on. OpenType font features are then used to combine the base characters with the stacking characters. This capability belongs to the font itself, not to the input method: the input method forms character codes from the user's input, and the font uses its OpenType features to render the Tibetan syllable from those codes.
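The split between base characters and stacking characters can be seen directly in Unicode. The sketch below is an illustration rather than part of the patent: the helper names `to_subjoined` and `stack` are invented here, and the fixed code point offset is assumed only for the three letters used in the example.

```python
import unicodedata

# Distance from TIBETAN LETTER KA (U+0F40) to TIBETAN SUBJOINED LETTER KA
# (U+0F90). Assumed here only for the ordinary letters used below.
SUBJOINED_OFFSET = 0x0F90 - 0x0F40


def to_subjoined(letter):
    """Map a base consonant to its stacking (subjoined) counterpart."""
    return chr(ord(letter) + SUBJOINED_OFFSET)


def stack(top, *below):
    """Encode a vertical stack: the top letter, then subjoined forms."""
    return top + "".join(to_subjoined(ch) for ch in below)


SA, GA, RA = "\u0f66", "\u0f42", "\u0f62"
cluster = stack(SA, GA, RA)

# The input method only emits these code points; drawing them as one
# stacked glyph is left to the font, as the paragraph above explains.
for ch in cluster:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```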
At present, Tibetan input speed still lags noticeably behind that of some other languages such as Chinese, especially on mobile terminals, mainly for lack of an efficient intelligent input method. Among existing input methods only a few support phrase input. Foreign Tibetan input methods such as Microsoft's Himalaya input method do not support phrase input or intelligent input. The domestic Banzhida input method, which does support phrase input, uses a phrase encoding scheme of base letter plus suffix, but it is unnatural, hard to memorize and use, produces many duplicate codes, and lets the user enter arbitrary character combinations that violate Tibetan grammar. There is therefore an urgent need for an intelligent input scheme that is easy to use, natural, versatile and low in duplicate codes, in order to raise Tibetan input speed.
Summary of the invention
To overcome the technical problems in the prior art, the object of the invention is to provide a Tibetan input method based on pinyin search. Based on the script structure, pronunciation features and spelling method of Tibetan, the invention uses certain letters as pinyin characters to identify a specific syllable, without considering how the stacking of the syllable is represented, and thereby achieves pinyin input; this is the basis of the pinyin input method proposed here. Specifically, it makes full use of the spelling rules of Tibetan: the pinyin of each Tibetan syllable and the correspondence between the two are stored in a character library, the input method forms the pinyin code, and the input method engine then returns the set of target words.
The invention therefore has few duplicate codes, is easy to implement, makes the lexicon easy to build and extend, and matches the natural thinking of Tibetan writing, so it is easy to understand and use.
The object of the invention is achieved by the following technical solution:
A pinyin-based intelligent input method for Tibetan, comprising the steps of:
1) assigning a key-position code to each Tibetan consonant and each Tibetan vowel;
2) assigning each syllable a corresponding pinyin code according to the spelling order of Tibetan syllables, and saving it in an input method character library;
3) building a relational tree of pinyin codes and key-position codes;
4) building an input method engine based on the input method character library; the engine traverses the relational tree according to the input key-position codes to obtain the corresponding pinyin code, then queries the character library with that pinyin code and returns the corresponding Tibetan text.
Further, the pinyin of each syllable is assigned as follows: for a single-character Tibetan syllable, its pinyin is the single-character syllable itself; for a multi-character Tibetan syllable without vertical stacking, its pinyin is the multi-character syllable itself; for a multi-character Tibetan syllable with vertical stacking, its pinyin is set according to its spelling order.
Further, the same pinyin code corresponds to one or more syllables.
Further, the input method engine looks up the matching pinyin according to the pinyin code and displays, in the candidate area of the input method, all Tibetan text that matches this pinyin or whose pinyin begins with it, sorted by word frequency.
Further, on mobile devices a full-keyboard mode or a nine-grid mode is used as the input interface for Tibetan consonants and vowels.
Further, on a PC the key layout of the Himalaya input method is used as the input interface for Tibetan consonants and vowels.
The flow of the pinyin-based Tibetan intelligent input method of the invention is shown in Fig. 2. The specific steps are as follows:
First, the 30 Tibetan consonants and 4 vowels are used as the pinyin characters, and are combined in Tibetan spelling order to form the pinyin corresponding to each syllable.
Table 1 lists the Tibetan consonants:
Table 1. Tibetan consonants
Table 2 lists the Tibetan vowels:
Table 2. Tibetan vowels
The pinyin of each syllable is assigned according to the Tibetan spelling order, as follows:
1. The pinyin of a single-character syllable is the syllable itself, as in Table 3:
Table 3. Single-character syllables
2. The pinyin of a multi-character syllable with no vertical stacking (other than a stacked vowel sign) is the syllable itself, as in Table 4:
Table 4. Multi-character syllables without vertical stacking
3. The pinyin of a multi-character syllable with vertical stacking is determined by the spelling order of the syllable, as in Table 5:
Table 5. Multi-character syllables with vertical stacking
From these three rules the pinyin of every syllable can be determined; it also becomes apparent that the same pinyin can correspond to several syllables.
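One possible reading of the three rules is sketched below. Since Tables 3 to 5 are not reproduced here, the sketch assumes that the pinyin of a stacked syllable is simply its letters written out in spelling order with every stacked (subjoined) letter replaced by its ordinary form; the function name `pinyin_of` and the code point range are assumptions of the sketch, not the patent's definition.

```python
# Unicode range of ordinary subjoined consonants and their distance from the
# corresponding base consonants (assumption: fixed-form variants are ignored).
SUBJOINED_FIRST, SUBJOINED_LAST = 0x0F90, 0x0FB9
SUBJOINED_OFFSET = 0x50


def pinyin_of(syllable):
    """Return an illustrative pinyin string for one Tibetan syllable.

    Rules 1 and 2: a syllable without vertical stacking is its own pinyin.
    Rule 3: a stacked syllable is flattened into base letters, which keeps
    the spelling order because Unicode stores a stack from top to bottom.
    """
    flat = []
    for ch in syllable:
        cp = ord(ch)
        if SUBJOINED_FIRST <= cp <= SUBJOINED_LAST:
            flat.append(chr(cp - SUBJOINED_OFFSET))
        else:
            flat.append(ch)  # base letters and vowel signs pass through
    return "".join(flat)


single = "\u0f40"               # one consonant
plain = "\u0f56\u0f51"          # two consonants, no stacking
stacked = "\u0f66\u0f92\u0fb2"  # a three-letter vertical stack

print(pinyin_of(single) == single)   # rule 1
print(pinyin_of(plain) == plain)     # rule 2
print([f"U+{ord(c):04X}" for c in pinyin_of(stacked)])  # rule 3
```

Under this reading a stacked syllable and an unstacked letter sequence can share one pinyin, which is one way the one-to-many relation noted above can arise.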
Second, create the character library and add to it each pinyin together with its corresponding syllables.
Target syllables are stored in the character library together with their pinyin. In terms of data structure, the two are related in one of two ways: one-to-one, where a pinyin string represents exactly one syllable, or one-to-many, where one pinyin string represents several syllables. Owing to the particular nature of Tibetan, one pinyin corresponds to at most three syllables.
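The two storage relations can be modeled with a mapping from a pinyin string to a short list of syllables. The class below is a minimal sketch under that assumption; `PinyinLibrary`, its methods and the placeholder strings are hypothetical names, and the limit of three comes from the sentence above.

```python
from collections import defaultdict

MAX_SYLLABLES_PER_PINYIN = 3  # upper bound stated in the text


class PinyinLibrary:
    """Maps one pinyin string to one syllable or to a few syllables."""

    def __init__(self):
        self._by_pinyin = defaultdict(list)

    def add(self, pinyin, syllable):
        bucket = self._by_pinyin[pinyin]
        if syllable in bucket:
            return  # already stored, nothing to do
        if len(bucket) >= MAX_SYLLABLES_PER_PINYIN:
            raise ValueError(f"{pinyin!r} already maps to three syllables")
        bucket.append(syllable)

    def lookup(self, pinyin):
        # Return a copy so callers cannot modify the stored list.
        return list(self._by_pinyin.get(pinyin, ()))


# Placeholder strings stand in for Tibetan pinyin and syllables.
library = PinyinLibrary()
library.add("ka", "syllable-A")   # one-to-one
library.add("sgr", "syllable-B")  # one-to-many
library.add("sgr", "syllable-C")
print(library.lookup("ka"))
print(library.lookup("sgr"))
```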
Third, the input method engine.
The input method engine is the core of intelligent input. It provides an adapter for the input method: the adapter receives the code values entered by the user, finds the pinyin code corresponding to those code values, searches the character library with that pinyin code, and returns the search results to the user, which completes the input.
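The adapter's three duties (receive code values, find the pinyin code, search the library) can be sketched as follows. The class name `EngineAdapter`, the numeric key codes and the placeholder library contents are all invented for the illustration; the patent does not specify them.

```python
class EngineAdapter:
    """Sits between raw key codes and the character library."""

    def __init__(self, key_map, library):
        self.key_map = key_map  # key code -> pinyin character
        self.library = library  # pinyin string -> list of syllables

    def to_pinyin(self, key_codes):
        """Translate the key codes typed so far into a pinyin string."""
        try:
            return "".join(self.key_map[code] for code in key_codes)
        except KeyError as exc:
            raise ValueError(f"unmapped key code: {exc.args[0]!r}") from None

    def query(self, key_codes):
        """Return the syllables stored under the typed pinyin."""
        return list(self.library.get(self.to_pinyin(key_codes), ()))


# Hypothetical key codes and placeholder library entries.
adapter = EngineAdapter(
    key_map={30: "k", 31: "a", 32: "b"},
    library={"ka": ["syllable-A"], "kab": ["syllable-B", "syllable-C"]},
)
print(adapter.to_pinyin([30, 31]))
print(adapter.query([30, 31, 32]))
```

This version performs only exact lookups; matching on the beginning of a pinyin and ordering by frequency are separate concerns.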
Compared with existing input methods, the invention has the following beneficial effects:
1) Versatility
The character encoding of the invention is based on the international Unicode standard, so it is easy to implement on different devices.
2) Flexible key layout
Because the invention uses few pinyin characters, only 34 key positions are needed. On a PC it can therefore simply use the key layout of the widely used Himalaya input method, and on terminals such as phones and tablets it can readily be implemented in full-keyboard mode or nine-grid mode; existing mobile Tibetan input methods do not yet offer a nine-grid keyboard mode.
3) High input speed and few duplicate codes
Search-based pinyin input is used, and few pinyin map to more than one word, so duplicate codes are rare and the preferred word can be determined accurately, which raises input speed. For example, for the phrase […], a common input method needs 11 keystrokes, whereas this input method needs only 8 or even fewer. For shorter words, entering the first character is enough to determine the preferred word accurately.
4) Consistent with the way Tibetan is written, easy to learn and use
The invention is a pinyin-based input method, and Tibetan is itself an alphabetic script. Writing the pinyin fully matches the thinking behind writing Tibetan, so anyone with a basic ability to write Tibetan can come to understand this input method step by step.
5) Easy to build and extend the lexicon
The invention defines a basic system character library containing all Tibetan syllables, on which the user lexicon is built. The lexicon needs not only a good data structure but also good extensibility and compatibility. With the encoding scheme of the invention, all kinds of Tibetan vocabulary found on the network can be inserted into the lexicon, so extending it is convenient.
6) Word-frequency recording for faster input
Like current mainstream Chinese input methods, this input method automatically counts and adjusts word frequencies so that it adapts to each user's habits. It can also remember user phrases, which makes Tibetan input more convenient, faster and more flexible.
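Automatic counting and adjustment of word frequency can be as simple as incrementing a counter whenever a candidate is chosen. The sketch below assumes that scheme; the class `FrequencyRanker`, the tie-breaking rule and the placeholder words are illustrative choices, not details given in the patent.

```python
from collections import Counter


class FrequencyRanker:
    """Orders candidates by how often the user has picked them."""

    def __init__(self, base_frequency=None):
        # Start from the frequencies shipped with the library, if any.
        self.frequency = Counter(base_frequency or {})

    def rank(self, candidates):
        # Most frequent first; ties fall back to plain string order.
        return sorted(candidates, key=lambda w: (-self.frequency[w], w))

    def select(self, word):
        """Record that the user chose this candidate."""
        self.frequency[word] += 1


ranker = FrequencyRanker({"word-A": 3, "word-B": 5})
print(ranker.rank(["word-A", "word-B", "word-C"]))
for _ in range(3):
    ranker.select("word-A")  # the user keeps choosing word-A
print(ranker.rank(["word-A", "word-B", "word-C"]))
```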
Brief description of the drawings
The invention is described in further detail below with reference to the accompanying drawings:
Fig. 1 is a schematic diagram of the structure of a Tibetan syllable;
Fig. 2 is a flow chart of the method of the invention;
Fig. 3 is a block diagram of the input method principle in a specific embodiment of the invention.
Embodiment
The invention is further described below with reference to the drawings. The pinyin-based intelligent input method is implemented by the following steps:
1. Build the input method character library according to the given pinyin rules. Every efficient input method necessarily has a character library with its own specific structure. In the invention, cross-platform Unicode is used for encoding, and the character library is structured as shown in Table 6: it consists of four parts, the code (ID), the syllable (Value), the pinyin code (PinyinCode) and the frequency (Frequency).
Table 6. Input method character library
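The four-field structure can be sketched as a small SQLite table. The column names follow the four parts named above; the table name, the index, the use of SQLite and the placeholder rows are assumptions made only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE library (
        ID         INTEGER PRIMARY KEY,
        Value      TEXT    NOT NULL,           -- the syllable or word
        PinyinCode TEXT    NOT NULL,           -- its pinyin code
        Frequency  INTEGER NOT NULL DEFAULT 0  -- usage count
    )
    """
)
# An index on the pinyin code keeps lookups fast as the library grows.
conn.execute("CREATE INDEX idx_pinyin ON library (PinyinCode)")

# Placeholder rows; a real library would hold Tibetan syllables.
rows = [("word-A", "ka", 5), ("word-B", "kab", 9), ("word-C", "kha", 7)]
conn.executemany(
    "INSERT INTO library (Value, PinyinCode, Frequency) VALUES (?, ?, ?)",
    rows,
)


def candidates(typed):
    """Entries whose pinyin starts with the typed text, most frequent first."""
    cursor = conn.execute(
        "SELECT Value FROM library WHERE PinyinCode LIKE ? "
        "ORDER BY Frequency DESC, ID",
        (typed + "%",),
    )
    return [row[0] for row in cursor]


print(candidates("ka"))
print(candidates("k"))
```

The `LIKE` pattern in this sketch does not escape `%` or `_`, which is acceptable only because the placeholder pinyin strings never contain them.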
The character library is built in the following steps:
A. Collect syllable material for the character library by consulting relevant vocabularies, dictionaries and similar sources. Because all pinyin characters are specific to the invention, the pinyin must be obtained manually or entered by a purpose-written program.
B. Integrate the collected pinyin and syllables into the input method character library; a dedicated program module can be designed for this task.
2. Build the input method engine based on the character library.
The engine mainly provides an adapter between the user and the character library, as shown in Fig. 3. Its functions include matching the key-position codes entered by the user to the corresponding pinyin value, outputting the pinyin value and querying the character library with it, and returning the query results.
The input method defines a relational tree of all pinyin values and key-position codes. The adapter traverses this tree to obtain the pinyin value that matches the key-position codes the user has entered so far, searches the character library with that pinyin value, and returns the results. As soon as the user starts typing, all words that match the current pinyin or whose pinyin begins with it are shown in the candidate area of the input method, sorted by word frequency. The user may then either pick the target word from the candidate area or type the complete pinyin and then select the word. The input method can also automatically append the required ending mark according to Tibetan grammar; Tibetan has four kinds of ending marks, each governed by the corresponding grammatical rule.
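A prefix tree is one natural way to realize such a relational tree, although the patent does not say which tree structure is used. In the sketch below each edge is one key-position code (taken to be one pinyin character), and the class names, method names and placeholder entries are assumptions of the illustration.

```python
class Node:
    def __init__(self):
        self.children = {}  # key-position code -> child node
        self.entries = []   # (word, frequency) pairs whose pinyin ends here


class RelationTree:
    """Prefix tree from key-position codes to library entries."""

    def __init__(self):
        self.root = Node()

    def insert(self, pinyin, word, frequency):
        node = self.root
        for code in pinyin:
            node = node.children.setdefault(code, Node())
        node.entries.append((word, frequency))

    def candidates(self, typed):
        """Words whose pinyin equals or starts with the typed codes."""
        node = self.root
        for code in typed:
            node = node.children.get(code)
            if node is None:
                return []  # nothing in the library starts this way
        found, pending = [], [node]
        while pending:  # collect every entry in the subtree
            current = pending.pop()
            found.extend(current.entries)
            pending.extend(current.children.values())
        found.sort(key=lambda entry: (-entry[1], entry[0]))
        return [word for word, _ in found]


# Placeholder pinyin strings and words.
tree = RelationTree()
for pinyin, word, freq in [("k", "w0", 1), ("ka", "w1", 5),
                           ("kab", "w2", 9), ("kha", "w3", 7)]:
    tree.insert(pinyin, word, freq)

print(tree.candidates("k"))   # every entry, most frequent first
print(tree.candidates("ka"))  # the candidate list narrows as keys arrive
```

Each additional keystroke moves one level down the tree, so the candidate list can only shrink, which matches the step-by-step screening described in the text.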
An example of the Tibetan input process:
A. Suppose the keyboard layout is a full keyboard and the word […] is to be entered, whose pinyin is […]. When the first character […] is typed, the input method returns to the user interface all words whose pinyin begins with […], such as […]. When the user types the second character […], the input method returns the words whose pinyin begins with […], such as […], and so on; the candidates are narrowed step by step until the desired word is obtained.
B. In the invention the relation between pinyin and syllable (word) is mainly one-to-one. As the search process shows, the result is usually found without typing the whole pinyin, and during input the engine returns the best results to the user, who only has to choose among them.

Claims (5)

1. A pinyin-based intelligent input method for Tibetan, comprising the steps of:
1) assigning a key-position code to each Tibetan consonant and each Tibetan vowel, the key-position code being the pinyin character value;
2) assigning each syllable a corresponding pinyin code according to the spelling order of Tibetan syllables, and saving it in an input method character library;
3) building a relational tree of pinyin codes and key-position codes;
4) building an input method engine based on the input method character library, the engine traversing the relational tree according to the input key-position codes to obtain the corresponding pinyin code, then querying the input method character library with the pinyin code and returning the corresponding Tibetan text;
wherein the pinyin code of each syllable is assigned as follows: for a single-character Tibetan syllable, its pinyin is the single-character syllable itself; for a multi-character Tibetan syllable without vertical stacking, its pinyin is the multi-character syllable itself; for a multi-character Tibetan syllable with vertical stacking, its pinyin is set according to its spelling order.
2. The input method of claim 1, characterized in that the same pinyin code corresponds to one or more syllables.
3. The input method of claim 1 or 2, characterized in that the input method engine looks up the matching pinyin according to the pinyin code and displays, in the candidate area of the input method, all Tibetan text that matches this pinyin or whose pinyin begins with it, sorted by word frequency.
4. The input method of claim 1, characterized in that on a mobile device a full-keyboard mode or a nine-grid mode is used as the input interface for Tibetan consonants and vowels.
5. The input method of claim 1, characterized in that on a PC the key layout of the Himalaya input method is used as the input interface for Tibetan consonants and vowels.
CN201410142863.2A 2014-04-10 2014-04-10 A kind of Tibetan language intelligent input method based on phonetic Expired - Fee Related CN103984420B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410142863.2A CN103984420B (en) 2014-04-10 2014-04-10 A kind of Tibetan language intelligent input method based on phonetic

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410142863.2A CN103984420B (en) 2014-04-10 2014-04-10 A kind of Tibetan language intelligent input method based on phonetic

Publications (2)

Publication Number Publication Date
CN103984420A CN103984420A (en) 2014-08-13
CN103984420B true CN103984420B (en) 2017-11-14

Family

ID=51276430

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410142863.2A Expired - Fee Related CN103984420B (en) 2014-04-10 2014-04-10 A kind of Tibetan language intelligent input method based on phonetic

Country Status (1)

Country Link
CN (1) CN103984420B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104408037A (en) * 2014-12-05 2015-03-11 才智杰 Tibetan text vector model representation method
CN104615269B (en) * 2015-02-04 2018-01-16 史晓东 A kind of Tibetan language Latin simple double spelling coding method and its intelligent input system entirely

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1696880A (en) * 2005-05-08 2005-11-16 卢亚军 General keyboard layout of Tibetan computer, and input method
CN1737739A (en) * 2005-07-16 2006-02-22 西北民族大学 Tibetan input method based on English keyboard
CN101751140A (en) * 2008-12-22 2010-06-23 青海师范大学 Input method leading modern Tibetan scripts to correspond to fingerboard key maps one by one

Also Published As

Publication number Publication date
CN103984420A (en) 2014-08-13

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20171114

Termination date: 20190410