Uyghur-Kazakh-Kyrgyz intelligent input method based on syllable splitting
Technical field
The present invention relates to the field of text input methods. It is an improvement on intelligent input methods for phonetic languages such as Uyghur, Kazakh and Kyrgyz, and in particular concerns a Uyghur-Kazakh-Kyrgyz intelligent input method based on syllable splitting.
Background art
In recent years, with the continuous progress of information technology for ethnic minority languages, software for the minority languages of Xinjiang has also developed considerably. In the field of input methods for Uyghur, Kazakh and Kyrgyz, however, there is still no intelligent input method that improves both the input efficiency and the input quality of these minority scripts. Some simple intelligent input methods have appeared that follow the Unicode standard and present candidate words in a rule-based order, but most are inconvenient to use and fail to meet users' need to find words quickly.
Summary of the invention
The object of the invention is to provide a Uyghur-Kazakh-Kyrgyz intelligent input method based on syllable splitting that reduces the number of letters a user must key in, greatly increases the speed of word entry, reduces misspellings, and better meets the user's need to look up Uyghur, Kazakh and Kyrgyz words, so that the required word can be entered quickly and accurately.
The object of the invention is achieved as follows: a Uyghur-Kazakh-Kyrgyz intelligent input method based on syllable splitting, comprising the following steps:
(I) Building the indexes: for each Uyghur, Kazakh or Kyrgyz word, the initial letters of its syllables are taken in order to form a letter-string combination. These ordered letter-string combinations are collected into a total syllable initial index. A word index is then formed in correspondence with the letter-string combinations of the total syllable initial index, so that every word in the word index corresponds to a letter-string combination in the total syllable initial index: the letter-string combination of a word in the word index contains the same letters as a particular letter-string combination in the total syllable initial index, and those letters occur in the same order;
(II) Retrieving the user's target word with the above indexes:
① Determine all syllables of the user's target word and record the initial of each syllable in turn, obtaining a character string made up of the target word's syllable initials in order; alternatively, obtain directly the character string of syllable initials typed by the user. Compare the first syllable initial of the character string with the first letter of every letter-string combination in the total syllable initial index. If it differs from the first letter of all of them, positioning fails: stop the query for the target word or character string and perform step ② below;
If the first syllable initial of the character string is identical to the first letter of some letter-string combination in the total syllable initial index, positioning succeeds. Determine whether the character string has a second syllable initial after the first. If it does not, the first syllable initial is confirmed as the last letter, and step ③ below is performed directly;
If the character string has a second syllable initial after the first, build a secondary syllable initial index whose members are the syllable-initial strings that begin with that same first syllable initial, and position the second syllable initial within it: compare the second syllable initial of the character string with the second letter of every letter-string combination in the secondary syllable initial index. If it differs from the second letter of all of them, positioning fails: stop the query for the target word or character string and perform step ② below;
If the second syllable initial of the character string is identical to the second letter of some letter-string combination in the secondary syllable initial index, positioning succeeds. Determine whether the character string has a third syllable initial after the second. If it does not, the second syllable initial is confirmed as the last letter, and step ③ below is performed directly;
If the character string has a third syllable initial after the second, build a tertiary syllable initial index whose members are the syllable-initial strings that begin with the first and second syllable initials in that order, and position the third syllable initial within it by comparing it with the third letter of every letter-string combination in the tertiary syllable initial index, following the same procedure as for the first and second syllable initials. If positioning of the third syllable initial fails, stop the query for the target word or character string and perform step ② below. If it succeeds, determine whether the character string has a fourth syllable initial after the third; if it does not, the third syllable initial is confirmed as the last letter and step ③ below is performed directly. Retrieval continues in the same way through the rest of the character string until either positioning fails at some letter, in which case the query is stopped and step ② below is performed, or the last letter of the character string is positioned successfully and confirmed as the last letter, in which case step ③ below is performed directly;
② Attempt to retrieve directly from the word database the character string that could not be retrieved in step ①. If no word identical to the character string is found in the word database, enter the user's target word into the word database and/or the word index, determine the syllables of the target word in order, and obtain all of its syllable initials, so that the new syllable-initial string formed from them is entered into the total syllable initial index. Calculate and record the past usage frequency of the target word. This completes the registration and supplementary recording of the target word;
③ According to the total syllable initial index, obtain from the word index a temporary index associated with the character string; the ordered syllable initials of every candidate word listed in this temporary index are identical to the character string. Locate the first candidate word of the temporary index in the word database and obtain its information, then determine whether the temporary index contains a second candidate word after the first. If it does, the first candidate word is not the last word: locate the second candidate word in the word database and obtain its information. If it does not, the first candidate word is the last word. In the same way, retrieve every word of the temporary index from the word database one by one and obtain its information, until the last word listed in the temporary index has been confirmed and its information obtained. Arrange all candidate words of the temporary index in order of usage frequency and display them. After the user selects and enters a word from the temporary index, recalculate and record the past usage frequency of the selected candidate word and of the user's target word or character string. This completes the retrieval of the user's word or character string.
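Steps (I) and ① can be illustrated with a minimal Python sketch. It is not the claimed implementation: the syllable boundaries are assumed to be supplied already (the text does not give splitting rules), the sample words are hand-split Latin transliterations used only for illustration, and the names `initials_key`, `build_indexes` and `locate` are hypothetical.

```python
from collections import defaultdict

# Illustrative, hand-split sample words (assumption: syllables are given).
LEXICON = [["mek", "tep"], ["men", "tiq"], ["ki", "tab"], ["mu", "el", "lim"]]

def initials_key(syllables):
    """Join the first letter of each syllable, in order."""
    return "".join(s[0] for s in syllables)

def build_indexes(lexicon):
    """Step (I): return (total syllable initial index, word index)."""
    word_index = defaultdict(list)
    for syllables in lexicon:
        word_index[initials_key(syllables)].append("".join(syllables))
    # The total index is the ordered set of all letter-string combinations.
    return sorted(word_index), dict(word_index)

def locate(initial_index, typed):
    """Step 1: narrow the index one position at a time.

    Returns the matching key, or None when positioning fails at some letter
    or when no combination is exactly equal to the typed string.
    """
    candidates = initial_index
    for pos, letter in enumerate(typed):
        # Sub-index of combinations that agree with the input up to `pos`.
        candidates = [k for k in candidates if len(k) > pos and k[pos] == letter]
        if not candidates:
            return None  # positioning failed: fall through to step 2
    return typed if typed in candidates else None
```

Returning `None` for an input that is only a prefix of longer combinations is a design choice of this sketch, made because step ③ requires the candidates' initials to be identical to the typed string.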
The invention splits each Uyghur, Kazakh or Kyrgyz word into syllables and forms a first-level index (the total syllable initial index) from the sequence of syllable initials of each word. A (second-level) word index is formed from the indexes of the word sequences that correspond to the first-level index. When the user inputs a sequence of syllable initials, the (second-level) word index is found through the first-level index, and the word sequence is obtained from the (second-level) word index. A candidate word list (index) is generated according to the usage frequency of the indexed words and offered to the user for selection, and word frequencies are updated dynamically according to how often the user enters each word.
The invention has the following features: 1) It relies on a property common to Uyghur, Kazakh and Kyrgyz (abbreviated here as Uyghur-Kazakh-Kyrgyz), namely that words are composed of syllables. Candidate words are looked up by the combined sequence of a word's syllable initials and offered to the user as an index, so the invention markedly reduces the number of characters the user must enter, increases word entry speed, and also reduces the number of misspelled words entered. 2) The query stage uses the (first-level) total syllable initial index and the (second-level) word index; this two-level index greatly increases query speed and shortens the user's waiting time. 3) A memory function is used: words are presented according to how frequently the user keys them in, with more frequently used words shown first, which further improves the degree of intelligence. 4) A user lexicon is used, and usage-frequency statistics are applied to the user lexicon.
The invention effectively improves the speed and quality of Uyghur, Kazakh and Kyrgyz input and avoids misspellings, and it is particularly suitable for intelligent terminals with restricted input devices. It reduces the number of letters the user must key in, greatly increases the speed of word entry, reduces misspellings, and better meets the user's need to look up Uyghur, Kazakh and Kyrgyz words, so that the required word can be entered quickly and accurately.
Brief description of the drawings
Fig. 1 is a flow chart of the candidate word generation principle of the invention;
Fig. 2 is a schematic diagram of the positional relationship between the indexes and the dictionary of the invention;
Fig. 3 is a flow chart of how the invention searches for and records a new word entered by the user.
Detailed description of the invention
A Uyghur-Kazakh-Kyrgyz intelligent input method based on syllable splitting comprises the following steps:
(I) Building the indexes: for each Uyghur, Kazakh or Kyrgyz word, the initial letters of its syllables are taken in order to form a letter-string combination. These ordered letter-string combinations are collected into a total syllable initial index. A word index is then formed in correspondence with the letter-string combinations of the total syllable initial index, so that every word in the word index corresponds to a letter-string combination in the total syllable initial index: the letter-string combination of a word in the word index contains the same letters as a particular letter-string combination in the total syllable initial index, and those letters occur in the same order. The total syllable initial index serves as the first-level index and is associated with the word index; the word index is the second-level index subordinate to the total syllable initial index;
(II) Retrieving the user's target word with the above indexes:
① Determine all syllables of the user's target word and record the initial of each syllable in turn, obtaining a character string made up of the target word's syllable initials in order; alternatively, obtain directly the character string of syllable initials typed by the user. Compare the first syllable initial of the character string with the first letter of every letter-string combination in the total syllable initial index. If it differs from the first letter of all of them, positioning fails: stop the query for the target word or character string and perform step ② below;
If the first syllable initial of the character string is identical to the first letter of some letter-string combination in the total syllable initial index, positioning succeeds. Determine whether the character string has a second syllable initial after the first. If it does not, the first syllable initial is confirmed as the last letter, and step ③ below is performed directly;
If the character string has a second syllable initial after the first, build a secondary syllable initial index whose members are the syllable-initial strings that begin with that same first syllable initial, and position the second syllable initial within it: compare the second syllable initial of the character string with the second letter of every letter-string combination in the secondary syllable initial index. If it differs from the second letter of all of them, positioning fails: stop the query for the target word or character string and perform step ② below;
If the second syllable initial of the character string is identical to the second letter of some letter-string combination in the secondary syllable initial index, positioning succeeds. Determine whether the character string has a third syllable initial after the second. If it does not, the second syllable initial is confirmed as the last letter, and step ③ below is performed directly;
If the character string has a third syllable initial after the second, build a tertiary syllable initial index whose members are the syllable-initial strings that begin with the first and second syllable initials in that order, and position the third syllable initial within it by comparing it with the third letter of every letter-string combination in the tertiary syllable initial index, following the same procedure as for the first and second syllable initials. If positioning of the third syllable initial fails, stop the query for the target word or character string and perform step ② below. If it succeeds, determine whether the character string has a fourth syllable initial after the third; if it does not, the third syllable initial is confirmed as the last letter and step ③ below is performed directly. Retrieval continues in the same way through the rest of the character string until either positioning fails at some letter, in which case the query is stopped and step ② below is performed, or the last letter of the character string is positioned successfully and confirmed as the last letter, in which case step ③ below is performed directly;
② Attempt to retrieve directly from the word database the character string that could not be retrieved in step ①. If no word identical to the character string is found in the word database, enter the user's target word into the word database and/or the word index, determine the syllables of the target word in order, and obtain all of its syllable initials, so that the new syllable-initial string formed from them is entered into the total syllable initial index. Calculate and record the past usage frequency of the target word. This completes the registration and supplementary recording of the target word;
③ According to the total syllable initial index, obtain from the word index a temporary index (list) associated with the character string; the ordered syllable initials of every candidate word listed in this temporary index are identical to the character string. Locate the first candidate word of the temporary index in the word database and obtain its information, then determine whether the temporary index contains a second candidate word after the first. If it does, the first candidate word is not the last word: locate the second candidate word in the word database and obtain its information. If it does not, the first candidate word is the last word. In the same way, retrieve every word of the temporary index from the word database one by one and obtain its information, until the last word listed in the temporary index has been confirmed and its information obtained. Arrange all candidate words of the temporary index in order of usage frequency and display them. After the user selects and enters a word from the temporary index, recalculate and record the past usage frequency of the selected candidate word and of the user's target word or character string (that is, the past usage frequencies of the candidate word selected from the temporary index and of the user's target word or character string are recalculated and recorded). The user's target word, the candidate word selected from the temporary index, and the character string are all associated with one another. This completes the retrieval of the user's word or character string.
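The ranking and frequency update of step ③ might look like the following Python sketch. It assumes the word index is a mapping from a syllable-initial string to a list of words and that usage frequencies are plain integer counts; `rank_candidates` and `record_selection` are hypothetical names, and the sample words are illustrative transliterations.

```python
def rank_candidates(word_index, frequencies, typed):
    """Build the temporary index for `typed` and order it by usage frequency.

    The sort is stable, so words with equal frequency keep their index order.
    """
    temporary_index = word_index.get(typed, [])
    return sorted(temporary_index, key=lambda w: -frequencies.get(w, 0))

def record_selection(frequencies, word):
    """Recalculate and record the usage frequency of the selected candidate."""
    frequencies[word] = frequencies.get(word, 0) + 1

# Usage: the order of candidates changes as the user keeps choosing one word.
word_index = {"mt": ["mektep", "mentiq"]}
frequencies = {"mektep": 2, "mentiq": 5}
first_ranking = rank_candidates(word_index, frequencies, "mt")
for _ in range(4):
    record_selection(frequencies, "mektep")
second_ranking = rank_candidates(word_index, frequencies, "mt")
```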
In the total syllable initial index, the letter-string combinations are classified into subindexes according to how far they are the same or similar when read from head to tail, and within each subindex the letter-string combinations are arranged in order of their total number of letters.
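One possible reading of this grouping is sketched below in Python. The text does not define the similarity measure, so the sketch assumes the simplest case, grouping by shared first letter, and orders each subindex from fewest to most letters; both choices, and the name `build_subindexes`, are assumptions.

```python
from collections import defaultdict

def build_subindexes(keys):
    """Group letter-string combinations by first letter (assumed criterion),
    then order each group by total number of letters, ties alphabetically."""
    groups = defaultdict(list)
    for key in keys:
        groups[key[0]].append(key)
    return {
        head: sorted(group, key=lambda k: (len(k), k))
        for head, group in sorted(groups.items())
    }

subindexes = build_subindexes(["mel", "kt", "mt", "m", "ktxn"])
```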
The user inputs a letter sequence. From this sequence the computer first generates the total syllable initial index (first-level index) and then generates the second-level word index from it, so that this word index (second-level index) can be offered to the user for selecting a word. When the user selects a candidate word from the word index, the computer automatically updates the usage frequency record of that word, and the next time the same sequence is input, the candidate words related or corresponding to the input sequence are displayed according to the updated usage frequencies. When the computer fails to find in the word index a candidate word identical to the word entered by the user, the input is treated as a word and looked up in the word data (database); the usage frequency of this word is recorded in the word data (database) and the usage frequency of the user's word is updated. When the computer fails to find the word even in the word data (database), the word is treated as a new word and is added to the user dictionary (word database and word index).
As shown in Fig. 1, the user inputs the syllable-initial sequence of a word. Using this sequence, the (second-level) word index is retrieved through the first-level index (initial index) of the dictionary, and the word data are fetched via the (second-level) word index. The word data comprise the word itself and its usage frequency. Finally, the candidate words associated with the user's input are arranged by usage frequency and displayed.
As shown in Fig. 2, the dictionary comprises three parts: the first-level index (initial index), which contains the sequences of syllable initials of Uyghur, Kazakh and Kyrgyz words; the second-level index (word index), which contains the index of the word sequences that the first-level index points to; and the data part (word data), which stores the existing and newly added words together with their probability information.
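The three-part layout of Fig. 2 could be modelled as follows. This Python sketch is only one way to hold the structure in memory: the class name `Dictionary`, its methods, and the use of an integer count for the "probability information" are assumptions, not details given in the text.

```python
import bisect
from dataclasses import dataclass, field

@dataclass
class Dictionary:
    initial_index: list = field(default_factory=list)  # first level: sorted keys
    word_index: dict = field(default_factory=dict)     # second level: key -> words
    word_data: dict = field(default_factory=dict)      # data part: word -> frequency

    def add(self, word, key, frequency=0):
        """Store a word under its syllable-initial key in all three parts."""
        if key not in self.word_index:
            bisect.insort(self.initial_index, key)  # keep first level ordered
            self.word_index[key] = []
        if word not in self.word_data:
            self.word_index[key].append(word)
            self.word_data[word] = frequency

    def entries_for(self, key):
        """Follow first level -> second level -> data part for one key."""
        return [(w, self.word_data[w]) for w in self.word_index.get(key, [])]

d = Dictionary()
d.add("mektep", "mt", 3)
d.add("kitab", "kt", 1)
d.add("mentiq", "mt")
```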
As shown in Fig. 3, when the user has successfully entered a word, it is first determined whether the word was entered in intelligent candidate mode (that is, by presenting the first- and second-level indexes associated with the user's input). If so, the usage frequency of the word is updated. If not, the word is looked up: if it exists in the word data (database), its usage frequency is likewise updated; if it does not, the word is added to the word data (database) and its usage frequency is recorded (as one).
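The decision flow of Fig. 3 can be sketched in Python as below. The function name `commit_word`, its parameters, and the returned labels are hypothetical; the syllable-initial `key` of the word is assumed to be computed elsewhere, and adding the new word to the word index follows the user-dictionary description rather than the figure itself.

```python
def commit_word(word, key, via_candidates, word_data, word_index):
    """Update or create the frequency record after a word has been entered.

    word_data maps word -> usage frequency; word_index maps key -> words.
    """
    if via_candidates or word in word_data:
        # Chosen from the candidate list, or typed in full but already known.
        word_data[word] = word_data.get(word, 0) + 1
        return "updated"
    # Unknown word: register it with a usage frequency of one.
    word_data[word] = 1
    word_index.setdefault(key, []).append(word)
    return "added"

word_data = {"mektep": 3}
word_index = {"mt": ["mektep"]}
r1 = commit_word("mektep", "mt", True, word_data, word_index)
r2 = commit_word("mentiq", "mt", False, word_data, word_index)
r3 = commit_word("mentiq", "mt", False, word_data, word_index)
```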