Intelligent input method for Uyghur, Kazakh and Kyrgyz (Wei Hakewen) based on syllable splitting
Technical field
The present invention relates to the field of letter-based input methods. It improves intelligent input methods for phonetic scripts such as Uyghur, Kazakh and Kyrgyz, and in particular concerns an intelligent input method for Uyghur, Kazakh and Kyrgyz (collectively abbreviated Wei Hakewen) based on syllable splitting.
Background technology
In recent years, as information technology for ethnic minorities has advanced, software for the minority languages of Xinjiang has also developed considerably. In the field of Uyghur, Kazakh and Kyrgyz input methods, however, there is still no intelligent input method that improves both the input efficiency and the input quality of these minority scripts. Some simple intelligent input methods have appeared that follow the Unicode standard and present candidate words sorted by rule, but they remain inconvenient to use and do not meet the user's need to find words quickly.
Summary of the invention
The object of the present invention is to provide a Wei Hakewen intelligent input method based on syllable splitting that reduces the number of Wei Hakewen letters the user has to type, greatly increases the speed of entering words, reduces spelling errors in Wei Hakewen input, and better serves the user's need to look up Wei Hakewen words, so that the required Wei Hakewen text can be entered quickly and accurately.
The object of the present invention is achieved as follows: a Wei Hakewen intelligent input method based on syllable splitting, comprising the following steps:
(I) Index building: for each Uyghur, Kazakh or Kyrgyz word, take the initial letter of every syllable of the word, in order, to form a letter string. Collect these ordered letter strings into a total syllable-initial index. Form a word index that corresponds to the letter strings of the total syllable-initial index, so that every word in the word index corresponds to a letter string of the total syllable-initial index: the letter string of each word contains the same letters as some letter string of the total syllable-initial index, and the order of those letters is also the same as in that letter string;
(II) Retrieve the target word entered by the user with the above indexes, as follows:
1. Determine all syllables of the target word entered by the user and record the initial of each syllable in turn, obtaining a character string made up of the syllable initials of the target word in order; alternatively, obtain directly the character string of syllable initials typed by the user. Then compare the first syllable initial of the character string with the first letter of every letter string collected in the total syllable-initial index. If the first syllable initial differs from the first letter of all of those letter strings, location fails: stop querying the target word or character string and carry out step 2 below;
If the first syllable initial obtained from the character string is identical to the first letter of some letter string in the total syllable-initial index, location succeeds. Then determine whether the character string contains a second syllable initial after the first. If it does not, the first syllable initial is confirmed to be the last letter, and step 3 below is carried out directly;
If the character string contains a second syllable initial after the first, build a secondary syllable-initial index whose set consists of the syllable-initial strings that begin with the matched first syllable initial, and locate the second syllable initial in this secondary index by comparing it with the second letter of every letter string in that index. If the second syllable initial differs from the second letter of all of those letter strings, location fails: stop querying the target word or character string and carry out step 2 below;
If the second syllable initial obtained from the character string is identical to the second letter of some letter string in the secondary syllable-initial index, location succeeds. Then determine whether the character string contains a third syllable initial after the second. If it does not, the second syllable initial is confirmed to be the last letter, and step 3 below is carried out directly;
If the character string contains a third syllable initial after the second, build a tertiary syllable-initial index whose set consists of the syllable-initial strings that begin with the matched first and second syllable initials in order, and locate the third syllable initial in this index by comparing it with the third letter of every letter string in that index, following the same procedure used for the first and second syllable initials. If location of the third syllable initial fails, stop querying the target word or character string and carry out step 2 below. If it succeeds, determine whether the character string contains a fourth syllable initial after the third; if it does not, the third syllable initial is confirmed to be the last letter, and step 3 below is carried out directly. The remaining letters of the character string are retrieved by analogy, in order, until retrieval of the character string is complete: either location of the letter at some position fails, in which case querying of the character string stops and step 2 below is carried out, or the last letter of the character string is located successfully and confirmed, in which case step 3 below is carried out directly;
2. Attempt to retrieve, directly from the word database, the character string that step 1 failed to retrieve. If no word identical to the character string can be found in the word database, enter the user's target word into the word database and/or the word index; determine the syllables of the target word in order and obtain all of its syllable initials, so that the new syllable-initial string formed from them is entered into the total syllable-initial index; and calculate and record the past input frequency of the target word. This completes the registration and supplementary recording of the target word entered by the user;
3. According to the total syllable-initial index, obtain from the word index a temporary index related to the character string; the ordered syllable-initial combination of every candidate word listed in this temporary index is identical to the character string. Locate the first candidate word of the temporary index in the word database and obtain its information, then determine whether the temporary index contains a second candidate word after the first. If it does, the first candidate word is not the last word: locate the second candidate word in the word database and obtain its information. If it does not, the first candidate word is the last word. Repeat this determination, retrieving each word of the temporary index from the word database and obtaining its information one by one, until the last word listed in the temporary index has been retrieved and its information obtained. Then arrange all candidate words of the temporary index in order of their usage frequency and display them. After the user selects and inputs a word from the temporary index, recalculate and record the past input frequency of the selected candidate word, that is, of the target word or character string entered by the user. This completes the retrieval of the word or character string entered by the user.
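The level-by-level location in step 1 can be pictured as a walk down nested levels of initials. The following is only a minimal sketch under stated assumptions: the nested-dictionary layout, the function names `build_levels` and `locate`, and the Latin-transliterated keys are illustrative and are not taken from the invention. It also assumes that location counts as successful only when the typed string matches a complete initial string.

```python
from typing import List, Tuple

END = ""  # hypothetical sentinel key: a complete initial string ends at this node


def build_levels(keys: List[str]) -> dict:
    """Nest the initial strings so that level n holds the n-th initials."""
    root: dict = {}
    for key in keys:
        node = root
        for letter in key:
            node = node.setdefault(letter, {})
        node[END] = True
    return root


def locate(root: dict, typed: str) -> Tuple[bool, int]:
    """Return (success, number of initials located before stopping)."""
    node = root
    for position, letter in enumerate(typed):
        if letter not in node:
            # Location fails at this position: the caller falls back to step 2.
            return False, position
        node = node[letter]
    # The last letter was located; succeed only if a full string ends here.
    return END in node, len(typed)


# Illustrative keys only: "kt" (two syllables), "ktxn" (four), "mt" (two).
levels = build_levels(["kt", "ktxn", "mt"])
print(locate(levels, "kt"))   # located, proceed to step 3
print(locate(levels, "kx"))   # fails at the second initial, proceed to step 2
```

Because each level is keyed by a single letter, a failed comparison is detected at the first position that does not match, which mirrors the early stop described in step 1.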
The present invention splits every Uyghur, Kazakh and Kyrgyz word into its syllables and forms a level-one index (the total syllable-initial index) from the sequence of syllable initials of each word; the index of the word sequences that correspond to the level-one index forms the (level-two) word index. When the user enters a sequence of syllable initials, the (level-two) word index is looked up through the level-one index, the word sequence is obtained from the (level-two) word index, and a list (index) of candidate entries is generated in order of the usage frequency of the indexed words for the user to choose from. The word frequencies are updated dynamically according to how often the user inputs each word.
The characteristics of the present invention are as follows. 1) It relies on the feature shared by Uyghur, Kazakh and Kyrgyz (collectively abbreviated Wei Hakewen) that every word is composed of syllables: candidate words are retrieved by the sequence of syllable initials of the word and offered to the user as an index to choose from. The invention therefore significantly reduces the number of characters the user has to type, increases the speed of entering words, and reduces the number of misspelled words entered. 2) The query stage uses the (level-one) total syllable-initial index and the (level-two) word index; this two-level index greatly increases query speed and shortens the time the user has to wait. 3) A memory function is used: words are presented according to how often the user has typed them, with more frequently used words shown first, which further improves the degree of intelligence. 4) A user dictionary function is used, and word usage statistics are applied to the user dictionary. The invention effectively improves the speed and quality of Uyghur, Kazakh and Kyrgyz input and avoids spelling errors, and it is particularly suitable for intelligent terminals with restricted input devices. In short, the required Wei Hakewen text can be entered quickly and accurately with fewer keystrokes and fewer spelling errors.
Description of drawings
Fig. 1 is a flow chart of the candidate-word generation principle of the present invention;
Fig. 2 is a schematic diagram of the relationship between the indexes and the dictionary of the present invention;
Fig. 3 is a flow chart of the lookup and supplementary recording of a newly added word entered by the user.
Embodiment
A Wei Hakewen intelligent input method based on syllable splitting, comprising the following steps:
(I) Index building: for each Uyghur, Kazakh or Kyrgyz word, take the initial letter of every syllable of the word, in order, to form a letter string. Collect these ordered letter strings into a total syllable-initial index. Form a word index that corresponds to the letter strings of the total syllable-initial index, so that every word in the word index corresponds to a letter string of the total syllable-initial index: the letter string of each word contains the same letters as some letter string of the total syllable-initial index, and the order of those letters is also the same as in that letter string. The total syllable-initial index serves as the level-one index and is associated with the word index; the word index is the level-two index subordinate to the total syllable-initial index;
(II) Retrieve the target word entered by the user with the above indexes, as follows:
1. Determine all syllables of the target word entered by the user and record the initial of each syllable in turn, obtaining a character string made up of the syllable initials of the target word in order; alternatively, obtain directly the character string of syllable initials typed by the user. Then compare the first syllable initial of the character string with the first letter of every letter string collected in the total syllable-initial index. If the first syllable initial differs from the first letter of all of those letter strings, location fails: stop querying the target word or character string and carry out step 2 below;
If the first syllable initial obtained from the character string is identical to the first letter of some letter string in the total syllable-initial index, location succeeds. Then determine whether the character string contains a second syllable initial after the first. If it does not, the first syllable initial is confirmed to be the last letter, and step 3 below is carried out directly;
If the character string contains a second syllable initial after the first, build a secondary syllable-initial index whose set consists of the syllable-initial strings that begin with the matched first syllable initial, and locate the second syllable initial in this secondary index by comparing it with the second letter of every letter string in that index. If the second syllable initial differs from the second letter of all of those letter strings, location fails: stop querying the target word or character string and carry out step 2 below;
If the second syllable initial obtained from the character string is identical to the second letter of some letter string in the secondary syllable-initial index, location succeeds. Then determine whether the character string contains a third syllable initial after the second. If it does not, the second syllable initial is confirmed to be the last letter, and step 3 below is carried out directly;
If the character string contains a third syllable initial after the second, build a tertiary syllable-initial index whose set consists of the syllable-initial strings that begin with the matched first and second syllable initials in order, and locate the third syllable initial in this index by comparing it with the third letter of every letter string in that index, following the same procedure used for the first and second syllable initials. If location of the third syllable initial fails, stop querying the target word or character string and carry out step 2 below. If it succeeds, determine whether the character string contains a fourth syllable initial after the third; if it does not, the third syllable initial is confirmed to be the last letter, and step 3 below is carried out directly. The remaining letters of the character string are retrieved by analogy, in order, until retrieval of the character string is complete: either location of the letter at some position fails, in which case querying of the character string stops and step 2 below is carried out, or the last letter of the character string is located successfully and confirmed, in which case step 3 below is carried out directly;
2. Attempt to retrieve, directly from the word database, the character string that step 1 failed to retrieve. If no word identical to the character string can be found in the word database, enter the user's target word into the word database and/or the word index; determine the syllables of the target word in order and obtain all of its syllable initials, so that the new syllable-initial string formed from them is entered into the total syllable-initial index; and calculate and record the past input frequency of the target word. This completes the registration and supplementary recording of the target word entered by the user;
3. According to the total syllable-initial index, obtain from the word index a temporary index (list) related to the character string; the ordered syllable-initial combination of every candidate word listed in this temporary index is identical to the character string. Locate the first candidate word of the temporary index in the word database and obtain its information, then determine whether the temporary index contains a second candidate word after the first. If it does, the first candidate word is not the last word: locate the second candidate word in the word database and obtain its information. If it does not, the first candidate word is the last word. Repeat this determination, retrieving each word of the temporary index from the word database and obtaining its information one by one, until the last word listed in the temporary index has been retrieved and its information obtained. Then arrange all candidate words of the temporary index in order of their usage frequency and display them. After the user selects and inputs a word from the temporary index, recalculate and record the past input frequency of the candidate word selected from the temporary index, that is, of the target word or character string entered by the user. The target word entered by the user, the candidate word selected from the temporary index and the character string are all associated with one another. This completes the retrieval of the word or character string entered by the user.
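Steps (I) and (II) taken together can be sketched in a few lines. This is an illustrative sketch only, not the implementation of the invention: the class name `InitialIndexIME`, its methods, the in-memory dictionaries, and the Latin-transliterated sample words with hand-supplied syllable splits are all assumptions made for the example. Syllable splitting itself is taken as given.

```python
from collections import defaultdict
from typing import Dict, List


class InitialIndexIME:
    """Two-level lookup: initial string -> words, plus a usage count per word."""

    def __init__(self) -> None:
        # Level one and level two combined: initial string -> word list.
        self.word_index: Dict[str, List[str]] = defaultdict(list)
        # Stand-in for the word database: word -> usage frequency.
        self.frequency: Dict[str, int] = {}

    @staticmethod
    def key_of(syllables: List[str]) -> str:
        """Join the first letter of every syllable, in order."""
        return "".join(syllable[0] for syllable in syllables)

    def add_word(self, syllables: List[str], frequency: int = 0) -> None:
        """Step (I), also used by step 2 to register a new word."""
        word = "".join(syllables)
        if word not in self.frequency:
            self.frequency[word] = frequency
            self.word_index[self.key_of(syllables)].append(word)

    def candidates(self, typed: str) -> List[str]:
        """Step 3: temporary index for the typed initials, most used first."""
        words = self.word_index.get(typed, [])
        return sorted(words, key=lambda word: -self.frequency[word])

    def choose(self, word: str) -> None:
        """Record one more use of the candidate the user selected."""
        self.frequency[word] += 1


ime = InitialIndexIME()
ime.add_word(["ki", "tab"], frequency=5)   # illustrative entry, key "kt"
ime.add_word(["ka", "tip"], frequency=2)   # illustrative entry, key "kt"
ime.add_word(["mek", "tep"], frequency=3)  # illustrative entry, key "mt"
print(ime.candidates("kt"))
```

In this sketch the user types two letters and receives every stored word whose syllable initials are exactly those two letters, ordered by the recorded frequency; calling `choose` changes the order seen the next time the same initials are typed.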
Within the total syllable-initial index, the letter strings are classified into subindexes according to how far they are identical or similar from head to tail, and all letter strings within each subindex are arranged in order of their total number of letters.
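One possible reading of this arrangement is sketched below. The text does not specify how similarity from head to tail is measured, so the sketch assumes the simplest case, grouping by the first letter, and the function name `build_subindexes` is hypothetical. Every key is assumed to be non-empty.

```python
from collections import defaultdict
from typing import Dict, List


def build_subindexes(keys: List[str]) -> Dict[str, List[str]]:
    """Group initial strings by first letter, then order each group by length."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in sorted(set(keys)):      # alphabetical order breaks length ties
        groups[key[0]].append(key)     # assumption: similarity = same first letter
    return {head: sorted(group, key=len) for head, group in groups.items()}


print(build_subindexes(["ktxn", "kt", "mt", "kpt", "k"]))
```

Ordering each subindex by length places shorter initial strings first, so a short typed string can be matched before longer strings in the same group are examined.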
The user enters a sequence of letters. From this sequence the computer first generates the total syllable-initial index entry (level-one index) and from it the level-two word index, and presents this word index (level-two index) to the user so that the user can conveniently select a word from it. When the user selects a candidate word from the word index, the computer automatically updates the usage-frequency record of the word entered, and the next time the same sequence is entered it displays the candidate words related or corresponding to the entered sequence according to the updated usage frequency of each word. When the computer fails to find in the word index a candidate identical to the word entered by the user, the input is treated as a word and is looked up in the word database; the usage frequency of this word is recorded in the word database and the usage frequency of the word entered by the user is updated. When the computer also fails to find in the word database a candidate identical to the word entered by the user, the word is treated as a newly added word and is added to the user dictionary (word database and word index).
As shown in Fig. 1, the user enters the syllable-initial sequence of a word. According to the entered initial sequence, the (level-two) word index is retrieved from the level-one index (initial index) of the dictionary, and the word data are taken out through the (level-two) word index; these word data comprise the word itself (its meaning) and its usage frequency. Finally, the candidate words related to the user's input are arranged in order of usage frequency and displayed.
As shown in Fig. 2, the dictionary comprises three parts: the level-one index part (initial index), which contains the sequences of syllable initials of Uyghur, Kazakh and Kyrgyz words; the level-two index part (word index), which contains the index of the word sequences that the level-one index points to; and the data part (dictionary, or word data), which stores existing and newly added words together with their probability information.
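The three-part layout of Fig. 2 could be modelled as follows. This is a sketch under assumptions: the names `WordRecord` and `Dictionary`, the use of integer slots and record numbers as the links between the parts, and a plain usage count standing in for the probability information are all illustrative choices, not details given in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WordRecord:
    word: str
    frequency: int = 0  # stands in for the stored probability information


@dataclass
class Dictionary:
    # Level-one index: initial string -> slot in the word index.
    initial_index: Dict[str, int] = field(default_factory=dict)
    # Level-two index: slot -> record numbers in the data part.
    word_index: List[List[int]] = field(default_factory=list)
    # Data part: the word records themselves.
    records: List[WordRecord] = field(default_factory=list)

    def insert(self, key: str, word: str, frequency: int = 0) -> None:
        self.records.append(WordRecord(word, frequency))
        slot = self.initial_index.setdefault(key, len(self.word_index))
        if slot == len(self.word_index):   # first word stored under this key
            self.word_index.append([])
        self.word_index[slot].append(len(self.records) - 1)

    def lookup(self, key: str) -> List[WordRecord]:
        slot = self.initial_index.get(key)
        if slot is None:
            return []
        return [self.records[number] for number in self.word_index[slot]]


dictionary = Dictionary()
dictionary.insert("kt", "kitab", 5)   # illustrative entries only
dictionary.insert("kt", "katip", 2)
print([record.word for record in dictionary.lookup("kt")])
```

Keeping the word records in a separate data part means the two index parts store only keys and record numbers, so a frequency update touches a single record.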
As shown in Fig. 3, when the user has successfully entered a word, it is first determined whether the word was entered through the intelligent candidate mode (that is, by offering the level-one and level-two indexes related to the user's input). If it was, the usage frequency of the entered word is updated. If it was not, the entered word is looked up: if it is present in the word database, its usage frequency is likewise updated; if it is not present in the word database, the word is added to the word database and its usage frequency is recorded (as one).
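The decision flow of Fig. 3 is short enough to write out directly. The sketch below is illustrative: the function name `commit_word`, its parameters, the returned labels, and the plain dictionaries used for the word index and the word database are assumptions for the example.

```python
from typing import Dict, List


def commit_word(
    word: str,
    key: str,
    via_candidates: bool,
    word_index: Dict[str, List[str]],
    frequency: Dict[str, int],
) -> str:
    """Apply the Fig. 3 flow after the user has entered `word`."""
    if via_candidates or word in frequency:
        # Chosen from the candidate list, or typed in full but already known:
        # only the usage frequency changes.
        frequency[word] = frequency.get(word, 0) + 1
        return "updated"
    # Unknown word: add it and record its first use.
    frequency[word] = 1
    word_index.setdefault(key, []).append(word)
    return "added"


frequency = {"kitab": 5}                 # illustrative starting state
word_index = {"kt": ["kitab"]}
print(commit_word("kitab", "kt", True, word_index, frequency))
print(commit_word("katip", "kt", False, word_index, frequency))
```

A newly added word becomes available as a candidate under its initial string from the next lookup onward, with a recorded frequency of one.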