Summary of the invention
In view of the deficiencies of the prior art, the first technical problem to be solved by the present invention is to provide a method and system for storing a word library that reduce the storage space required by a language database.
The second technical problem to be solved by the present invention is to provide a method and system for searching words stored in a tree structure that improve the efficiency of data access.
To solve the first of the above problems, the present invention adopts the following technical solution: a method of storing a word library, comprising the following steps:
A. establishing the root node of a tree structure used for storing the word library;
B. creating the subtree of the root node according to the words in the word library, wherein each child node in the subtree corresponds to one letter, each node stores several words that begin with the directory path of that node, and the number of words stored in each node is greater than twice the storage size of the node.
Specifically, step B comprises the following steps:
B1. initializing the current depth variable d to 0, pointing the current node to the root node, setting all word entries of the word library to the unvisited state, and setting the word library as the current vocabulary;
B2. calling the node creation step B3:
B31. obtaining an unvisited word from the current vocabulary; if there is no unvisited word, storing the remainder of each word in the waiting state, after its first d characters, in the corresponding node, where the directory path of the corresponding node is identical to the first d characters of the word to be stored, and then ending;
B32. finding all words in the current vocabulary whose d-th character equals the d-th character of the currently obtained word, and building a sub-vocabulary containing these words;
B33. judging whether the number of words in the sub-vocabulary is greater than twice the node storage size;
B34. if it is greater than twice the tree node storage size, creating, as a child node of the current node, a node whose letter is the d-th character of the currently obtained word, and setting all words in the sub-vocabulary to the processed state; otherwise, setting all words in the sub-vocabulary to the waiting state in the current vocabulary and returning to B2;
B35. increasing the current depth variable d by 1, pointing the current node to the newly created child node, pointing the current vocabulary to the sub-vocabulary, and recursively calling the node creation step B3 to create the data structure of the child node.
To solve the first of the above problems, the present invention also provides a system for storing a word library, comprising:
a dictionary memory, used for storing all the words of the dictionary;
a root node creation module, used for establishing the root node of the tree structure used for storing the word library;
a subtree node creation module, used for creating the subtree of the root node according to the words in the word library, wherein each child node in the subtree corresponds to one letter, each node stores several words that begin with the directory path of that node, and the number of words stored in each node is greater than twice the storage size of the node.
To solve the second of the above problems, the present invention provides a method of searching words, comprising:
E. pointing the tree node pointer to the root node as the current node, the user inputting a letter sequence;
F. moving the tree node pointer to visit the nodes of the tree structure, wherein each node in the tree structure corresponds to one letter and stores several words that begin with the directory path of that node, and matching the path of the current node against the letter sequence input by the user;
G. if the match succeeds, extracting the words stored in the node and matching each extracted word against the letter sequence input by the user; if the word matches, adding it to the candidates; if it does not match, returning to step F; if the path match fails, outputting all words of the current node as candidates.
Specifically, step F comprises:
F1. setting the initial value of a flag to 0;
F2. judging whether the current node has a child node and the flag value is 0;
F3a. if so, moving the tree node pointer to the first child node of the current node and matching its path against the letter sequence input by the user; if they do not match, setting the flag value to 1 and returning to F2; if they match, returning success;
F3b. if not, judging whether the depth of the current node is greater than 0; if it is not greater than 0, returning failure; if it is greater than 0, judging whether the current node is the last child node of its parent node: if it is not the last child node of its parent node, moving the tree node pointer to the next sibling of the current node and matching its path against the letter sequence input by the user; if it is the last child node of its parent node, moving the tree node pointer to the parent node of the current node and returning to F3b.
To solve the second of the above problems, the present invention also provides a system for searching words, used for searching the words stored in a tree structure in which each node corresponds to one letter and stores several words that begin with the directory path of that node. The system comprises: an input module, used for inputting a letter sequence; an initialization module, used for pointing the tree node pointer to the root node as the current node; a node access module, used for moving the tree node pointer to visit the nodes of the tree structure; a matching module, used for matching the path of the current node visited by the node access module against the letter sequence input by the user, wherein, if the match succeeds, the words stored in the node are extracted and each extracted word is matched against the letter sequence input by the user, a matching word being output and a non-matching word causing a return to the node access module, and, if the path match fails, all words of the current node are output; and a display module, used for displaying the words output by the matching module for the user to select.
Compared with the prior art:
The method and system for storing a word library of the present invention use the nodes of a tree structure to store words. Each node stores several words that begin with the directory path of that node, and the number of words stored in each node is greater than twice the storage size of the node, so that the storage space of the word library is markedly reduced.
The method and system for searching words of the present invention move the tree node pointer to visit the nodes of the tree structure and match the path of the current node against the letter sequence input by the user. If they do not match, the node is skipped and the next node is visited, so the words in every node need not be visited. This saves considerable time and speeds up word lookup.
Embodiment
To make the present invention easier to understand, it is further described below with reference to the accompanying drawings; the embodiments shown in the drawings do not limit the invention in any way.
The present invention uses the nodes of a tree structure to store a language database in small communication devices such as embedded devices. Each node stores several words that begin with the directory path of that node, and the number of words stored in each node is greater than twice the storage size of the node, so that the storage space of the word library is markedly reduced. When a word needs to be looked up in the word library, the tree node pointer is moved to visit the nodes of the tree structure, and the path of the current node is matched against the letter sequence input by the user; if they do not match, the node is skipped and the next node is visited, so the words in every node need not be visited, which speeds up data access.
With reference to Fig. 1, an embodiment of the method of storing a word library of the present invention comprises the steps:
S01. establishing the root node (root), setting the current depth (depth) to 0, and pointing the current node to the root node;
S02. setting the word library as the current vocabulary (wordTable), and setting all entries of the current vocabulary to the unvisited state;
S03. calling the node creation procedure with the parameters (root, wordTable, depth) to create the data structure of the root node root; the node creation procedure takes three parameters: the current tree node (curNode), the current vocabulary (curTable) and the current depth (depth).
With reference to Fig. 2, the node creation procedure of step S03 comprises the following steps:
S31. obtaining an unvisited word from the current vocabulary;
S32a. if a word is obtained, denoting it curWord and saving the depth-th character of curWord in curChar;
S32b. if there is no unvisited word, storing the remainder of each word in the waiting state, after its first depth characters, in the corresponding node, where the directory path of the corresponding node is identical to the first depth characters of the word to be stored, and then ending;
S33. finding all words in the current vocabulary whose depth-th character equals curChar, building a sub-vocabulary sunTable containing these words, and saving the number of words in the sub-vocabulary in count;
S34. judging whether count is greater than twice the node storage size. The word text and frequency data associated with a tree node can generally be compressed with some compression method (such as Huffman coding), which typically reduces the word data to about 50% of its original size. If a newly created node requires N bytes of storage, at least 2N words should begin with that node's path, so that the 2N bytes become N bytes after compression, which is comparable to the size of the node itself. In practice the node storage size may vary; it may be chosen as 4 bytes or as 8 bytes;
S35a. if count is greater than twice the tree node storage size, creating, as a child node of the current node, a node sunNode whose letter is curChar, and setting all words in the sub-vocabulary to the processed state;
S35b. otherwise, setting all words in the sub-vocabulary to the waiting state in the current vocabulary, and returning to S31;
S36. setting the current depth variable to depth+1, pointing the current node to the newly created child node, pointing the current vocabulary to the sub-vocabulary, and recursively calling the node creation procedure with the parameters (sunNode, sunTable, depth+1) to create the data structure of the child node sunNode.
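Steps S31 to S36 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the names Node, create_node and node_size are hypothetical, the visited/waiting/processed states are modelled with plain lists, compression is omitted, and a word whose length equals the current depth is simply kept in the current node with an empty tail, a case the text does not describe.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    letter: str = ""                               # letter of the node; empty for the root
    tails: list = field(default_factory=list)      # word remainders stored in this node
    children: list = field(default_factory=list)   # child nodes in creation order

def create_node(cur_node, cur_table, depth, node_size=4):
    """Recursive sketch of S31-S36; node_size is the assumed node storage size."""
    waiting = []                  # words that will stay in cur_node
    unvisited = list(cur_table)   # every word starts in the unvisited state
    while unvisited:
        cur_word = unvisited[0]                        # S31: take an unvisited word
        if len(cur_word) == depth:                     # assumption: word equals the path
            waiting.append(unvisited.pop(0))
            continue
        cur_char = cur_word[depth]                     # S32a
        sub_table = [w for w in unvisited
                     if len(w) > depth and w[depth] == cur_char]   # S33
        unvisited = [w for w in unvisited if w not in sub_table]
        if len(sub_table) > 2 * node_size:             # S34
            child = Node(letter=cur_char)              # S35a: create the child node
            cur_node.children.append(child)
            create_node(child, sub_table, depth + 1, node_size)    # S36
        else:
            waiting.extend(sub_table)                  # S35b: words wait in this node
    cur_node.tails = [w[depth:] for w in waiting]      # S32b: store only the remainders
    return cur_node
```

Calling create_node(Node(), words, 0, node_size=2) on the six example words back, background, backward, backbone, bachelor and bacilli builds the chain b, a, c and leaves all six remainders in the node whose path is bac, because no deeper group exceeds the threshold of four words.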
With reference to Fig. 3, the present invention uses a tree structure to record English words. The directory tree is defined as follows:
1. Except for the root node (N0), every node represents one letter; the root node represents no letter. The letter of each node is denoted D(Nx)=c (x>0, where c is an English letter or a punctuation mark);
2. For a node Na, the path from the root node N0 to Na is the sequence of nodes traversed from N0 to Na, and the directory path P(Na) is the string formed by concatenating, in order, the letters D(Nx) of the nodes on that path.
In particular, P(N0) is the empty string;
3. Each node Nx can store several words that begin with P(Nx), denoted C(Nx)=A (where A is a set of words). When a word w is stored on Nx, only the part of its string after P(Nx) needs to be stored; for example, when administration is stored on the tree node whose path is admi, only nistration actually needs to be stored.
In Fig. 3, each node is drawn in three parts: the node letter (with the node path), the words held by the node, and the data the node actually stores. As the figure shows, P(N4) equals "ab", P(N5) equals "ba" and P(N6) equals "bac". Node N6 holds six words, namely back, background, backward, backbone, bachelor and bacilli, yet only the following data actually needs to be stored: k, kground, kward, kbone, helor and illi. The information stored in a node comprises the letter, the priorities of the words, and the tree structure information.
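The relation between a word, the directory path of its node and the stored remainder can be illustrated with a short Python sketch. The function names tail_for and restore are hypothetical and serve only to make the rule concrete.

```python
def tail_for(path, word):
    """Return the part of word that a node with directory path `path` must store."""
    if not word.startswith(path):
        raise ValueError("word does not begin with the node path")
    return word[len(path):]

def restore(path, tail):
    """Rebuild the full word from the node path and the stored remainder."""
    return path + tail
```

For example, tail_for("admi", "administration") gives "nistration", and restore("bac", "kground") gives "background".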
Given the tree structure described above, words should be stored in directory nodes that are as deep as possible, so that the data actually stored in the nodes is minimized. If storing one tree node takes x bytes, then a node Nx is worth creating only if at least x+1 words begin with P(Nx); only then is the total amount of data to be stored smaller than it would be without the node. Since the actual word text can be compressed with some compression method (such as Huffman coding), typically to about 50% of its original size, a newly created node that requires N bytes of storage should have at least 2N words beginning with its path.
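The break-even reasoning above can be written as a small Python check. It is a sketch under two assumptions that go slightly beyond the text: each word moved under a new node drops exactly one character of one byte, and the compression ratio applies uniformly to that saving. The function name saves_space is hypothetical. The comparison is strict, in line with the "greater than twice" test of step S34.

```python
def saves_space(word_count, node_bytes, compression_ratio=0.5):
    """True if creating a node for word_count words is expected to shrink the data.

    Assumption: each word saves one byte before compression, and compression
    scales that saving by compression_ratio.
    """
    saved_bytes = word_count * compression_ratio
    return saved_bytes > node_bytes
```

With a 4-byte node and 50% compression, 8 words only break even and 9 words save space; without compression (ratio 1.0) the threshold falls to x+1 = 5 words.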
Fig. 4 is a structural diagram of an embodiment of the system for storing a word library of the present invention. Corresponding to the method above, the system comprises:
Dictionary memory 1, used for storing all the words of the dictionary;
Root node creation module 2, used for establishing the root node of the tree structure used for storing the word library;
Subtree node creation module 3, used for creating the subtree of the root node according to the words in the word library, wherein each child node in the subtree corresponds to one letter, each node stores several words that begin with the directory path of that node, and the number of words stored in each node is greater than twice the storage size of the node.
The subtree node creation module 3 comprises:
Initialization unit 31, used for initializing the current depth variable depth to 0, pointing the current node to the root node, setting all word entries of the word library to the unvisited state, and setting the dictionary in the dictionary memory as the current vocabulary;
Word acquiring unit 32, used for obtaining an unvisited word from the current vocabulary;
Sub-vocabulary building unit 33, used for finding all words in the current vocabulary whose depth-th character equals the depth-th character of the currently obtained word, and building a sub-vocabulary containing these words;
Storage unit 34, used for storing the sub-vocabulary;
Judging unit 35, used for judging whether the number of words in the sub-vocabulary is greater than twice the node storage size;
Child node creating unit 36, used for acting on the result of the judging unit: if the number is greater than twice the tree node storage size, creating, as a child node of the current node, a node whose letter is the depth-th character of the currently obtained word, setting all words in the sub-vocabulary to the processed state, then increasing the current depth variable depth by 1, pointing the current node to the newly created child node, and pointing the current vocabulary to the sub-vocabulary; otherwise, setting all words in the sub-vocabulary to the waiting state in the current vocabulary.
Fig. 5 is a flow chart of a method of searching words according to an embodiment of the invention. The method comprises:
S11. the user enters a keystroke sequence on the mobile phone keypad, and the keystroke sequence is converted into the corresponding letter sequence through the keyboard mapping table;
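The keyboard mapping of step S11 can be illustrated as follows. The text does not give the contents of the mapping table, so the sketch assumes the common 12-key phone layout; the names KEY_MAP, letter_groups and compatible are hypothetical. Because one key stands for several letters, a keystroke sequence corresponds to a sequence of letter groups, and a string is treated here as compatible when each compared character belongs to the group at the same position.

```python
# Assumed keypad layout: the common 12-key phone mapping of digits to letters.
KEY_MAP = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

def letter_groups(keys):
    """Map each pressed key to the letters it can stand for."""
    return [KEY_MAP[k] for k in keys]

def compatible(text, keys):
    """True if each compared character of text is on the key at the same position.

    The comparison stops at the shorter of text and keys (an assumption).
    """
    return all(ch in group for ch, group in zip(text, letter_groups(keys)))
```

Under this layout the keystroke sequence 2225 is compatible with back but not with bachelor, whose fourth letter h is not on key 5.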
S12. pointing the tree node pointer to the root node as the current node, and setting the current word index to 0;
S13. increasing the current word index by 1;
S14. judging whether the current word index is greater than the number of words in the tree node;
S15. if it is greater, moving the tree node pointer to visit the next matching node in the tree structure, matching the path of the current node against the letter sequence input by the user; if it is not greater, jumping to S18;
S16. judging whether the match succeeded;
S17. if it succeeded, setting the current word index to 1; if it failed, outputting all candidates;
S18. decompressing and extracting the tail data of the current word, and prepending the path of the current tree node to form the current word;
S19. comparing the extracted word with the keystroke sequence according to the keyboard mapping table;
S20. judging whether they match;
S21. if they match, adding the extracted word to the candidates and returning to S13; if they do not match, returning to S13 without adding it.
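The overall effect of steps S11 to S21 can be sketched in Python as below. Several simplifications are assumptions of this sketch rather than features of the embodiment: the pointer movement of S15 is replaced by a plain recursive traversal that skips any node whose path does not match, decompression in S18 is omitted, the keypad layout is the common 12-key one, and a word counts as a match when it is at least as long as the keystroke sequence and agrees with it key by key. The names KEY_MAP, Node, compatible and search are hypothetical.

```python
from dataclasses import dataclass, field

# Assumed keypad layout (common 12-key phone mapping).
KEY_MAP = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

@dataclass
class Node:
    letter: str = ""
    tails: list = field(default_factory=list)      # stored word remainders
    children: list = field(default_factory=list)

def compatible(text, keys):
    """Key-by-key comparison, stopping at the shorter of text and keys."""
    return all(ch in KEY_MAP.get(k, "") for ch, k in zip(text, keys))

def search(node, keys, path=""):
    """Collect candidate words for a keystroke sequence."""
    candidates = []
    if not compatible(path, keys):      # S16: path mismatch, skip node and subtree
        return candidates
    for tail in node.tails:             # S13-S14: step through the words of the node
        word = path + tail              # S18: prepend the node path to the tail
        if len(word) >= len(keys) and compatible(word, keys):   # S19-S20
            candidates.append(word)     # S21
    for child in node.children:         # S15: move on to further nodes
        candidates += search(child, keys, path + child.letter)
    return candidates
```

For a tree in which the node with path bac stores k, kground, kward, kbone, helor and illi, the keystroke sequence 2225 yields back, background, backward and backbone; the words of any node whose path fails the comparison are never reconstructed.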
Moving the tree node pointer in S15 visits the next matching node, matching the path of the current node against the letter sequence input by the user; in this way many node visits are avoided and the search speed is greatly improved. Fig. 6 is a detailed flow chart of step S15, which comprises:
S151. setting the initial value of a flag moveSibling to 0;
S152. judging whether the current node has a child node and the flag value is 0; if so, performing S153; if not, performing S154;
S153:
S1531. moving the tree node pointer to the first child node of the current node;
S1532. matching the path of this child node against the letter sequence input by the user;
S1533. if they do not match, setting the flag moveSibling to 1 and returning to S152; if they match, returning success;
S154:
S1541. judging whether the depth of the current node is greater than 0;
S1542. if it is not greater than 0, returning failure; if it is greater than 0, judging whether the current node is the last child node of its parent node;
S1543a. if it is not the last child node of its parent node, moving the tree node pointer to the next sibling of the current node and matching its path against the letter sequence input by the user; if they match, returning success; if they do not match, returning to S1541;
S1543b. if it is the last child node of its parent node, moving the tree node pointer to the parent node of the current node and returning to S1541.
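The pointer movement of steps S151 to S1543b can be sketched in Python as follows. The class Node, the function next_matching_node and the callable matches (which decides whether a directory path agrees with the user's input) are hypothetical names introduced for illustration; the parent link and the computed depth and path are assumptions about how a node might be represented.

```python
class Node:
    def __init__(self, letter="", parent=None):
        self.letter = letter
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    @property
    def path(self):
        """Directory path: letters from the root down to this node."""
        return "" if self.parent is None else self.parent.path + self.letter

def next_matching_node(cur, matches):
    """Return the next node whose path matches, or None on failure."""
    move_sibling = 0                                   # S151
    while True:
        if cur.children and move_sibling == 0:         # S152
            cur = cur.children[0]                      # S1531
            if matches(cur.path):                      # S1532
                return cur                             # S1533: success
            move_sibling = 1                           # S1533: mismatch, back to S152
            continue
        while True:                                    # S154
            if cur.parent is None:                     # S1541-S1542: depth is 0
                return None                            # failure
            siblings = cur.parent.children
            i = siblings.index(cur)
            if i + 1 < len(siblings):                  # S1543a: next sibling
                cur = siblings[i + 1]
                if matches(cur.path):
                    return cur
            else:                                      # S1543b: climb to the parent
                cur = cur.parent
```

Because a mismatching node is left through its sibling or its parent and never through its children, the whole subtree below it is skipped, which is where the saving in node visits comes from.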
With reference to Fig. 7, corresponding to the above method of searching the tree structure for data access, the embodiment of the invention also provides a system for searching words, comprising:
An input module, used for inputting the letter sequence; it comprises: key input unit 41, used for storing the key numbers entered by the user; mapping table storage unit 42, used for storing the key mapping table; mapping unit 43, used for looking up, in the key mapping table, the letter sequence corresponding to the key numbers entered by the user; storage unit 44, used for storing the letter sequence found by the mapping unit;
Initialization module 5, used for pointing the tree node pointer to the root node as the current node;
Node access module 6, used for moving the tree node pointer to visit the nodes of the tree structure;
Matching module 7, used for matching the path of the current node visited by the node access module against the letter sequence input by the user; if the match succeeds, the words stored in the node are extracted and each extracted word is matched against the letter sequence input by the user, a matching word being output and a non-matching word causing a return to the node access module; if the path match fails, all words of the current node are output;
Display module 8, used for displaying the words output by the matching module for the user to select.
The node access module 6 comprises:
An initialization unit, used for setting the initial value of a flag to 0;
A judging unit, used for judging whether the current node has a child node and the flag value is 0;
A pointer moving unit, used for moving the tree node pointer to the corresponding node according to the result of the judging unit:
if the judging unit finds that a child node exists and the flag value is 0, the tree node pointer is moved to the first child node of the current node;
if the judging unit finds that no child node exists or the flag value is not 0,
it is judged whether the depth of the current node is greater than 0; if it is not greater than 0, failure is returned; if it is greater than 0, it is judged whether the current node is the last child node of its parent node:
if it is not the last child node of its parent node, the tree node pointer is moved to the next sibling of the current node; if it is the last child node of its parent node, the tree node pointer is moved to the parent node of the current node, and the judgment of whether the depth of the current node is greater than 0 is repeated.
The whole tree acts like a directory for the engine: when the engine looks up a word, it first checks whether the path of each directory matches the current keystroke sequence, and accesses the contents of a node only after its path matches the keystroke sequence. This greatly reduces the number of words visited. Experiments show that when 18677 English words with an average word length of 7.15 are stored in the tree structure of the present invention, the raw data size is 133K and the size after compression is 50KB. On a 26 MHz CPU, the average search time is 0.11 second.
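As a rough consistency check on the reported figures, the raw size and the overall compression ratio can be recomputed. The sketch assumes one byte per character and takes the reported sizes of 133K and 50KB in the same unit; both are assumptions, and the variable names are illustrative.

```python
word_count = 18677
average_length = 7.15

# Raw text size if every character takes one byte (assumption).
raw_chars = word_count * average_length

# Reported compressed size divided by reported raw size (same unit assumed).
overall_ratio = 50 / 133
```

The product is roughly 133.5 thousand characters, in line with the stated raw size, and the overall ratio of about 0.38 is below the 50% assumed for text compression alone, which is consistent with the tree structure removing shared prefixes in addition to compression.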