Summary of the invention
In view of the deficiencies of the prior art, the first technical problem to be solved by the present invention is to provide a method and system for storing a word library that reduce the storage space required by a language database.
The second technical problem to be solved by the present invention is to provide a method and system for searching for words stored in a tree structure that improve the efficiency of data access.
To solve the first problem, the present invention adopts the following technical scheme: a method for storing a word library, comprising the following steps:
A, establishing the root node of a tree structure used to store the word library;
B, creating the subtree of the root node according to the words in the word library, wherein each node in the subtree corresponds to one letter, each node stores several words that begin with the directory path of that node, and the number of words stored for each node is greater than twice the storage size of the node.
Specifically, step B comprises the following steps:
B1, initializing the current depth variable d to 0, pointing the current node to the root node, setting all word items of the word library to the unvisited state, and setting the word library as the current word table;
B2, calling the node creation step B3:
B31, obtaining an unvisited word from the current word table; if there is no unvisited word, storing, for every word in the word library that is in the pending state, the characters after its d-th character into the corresponding node, the directory path of the corresponding node being identical to the character string before the d-th character of the word to be stored, and then ending;
B32, finding all words in the current word table whose d-th character equals the d-th character of the currently obtained word, and building a sub-table containing these words;
B33, judging whether the number of words in the sub-table is greater than twice the node storage size;
B34, if it is greater than twice the tree node storage size, creating a node whose letter is the d-th character of the currently obtained word as a child node of the current node, and setting all words in the sub-table to the processed state; otherwise, setting all words in the sub-table to the pending state in the current word table and returning to B2;
B35, increasing the current depth variable d by 1, pointing the current node to the newly created child node, pointing the current word table to the sub-table, and recursively calling the node creation step B3 to create the data structure of the child node.
To solve the first problem, the present invention also provides a system for storing a word library, comprising:
a dictionary memory, used for storing all words of the dictionary;
a root node creation module, used for establishing the root node of the tree structure used to store the word library;
a subtree node creation module, used for creating the subtree of the root node according to the words in the word library, wherein each node in the subtree corresponds to one letter, each node stores several words that begin with the directory path of that node, and the number of words stored for each node is greater than twice the storage size of the node.
To solve the second problem, the present invention provides a method for searching for words, comprising:
E, pointing the tree node pointer to the root node as the current node, the user inputting a letter sequence;
F, moving the tree node pointer to visit the nodes in the tree structure, wherein each node in the tree structure corresponds to one letter and stores several words that begin with the directory path of that node, and matching the path of the current node against the letter sequence input by the user;
G, if the match succeeds, taking out the words stored in the node and matching each word taken out against the letter sequence input by the user; if a word matches, adding it to the candidates; if it does not match, returning to step F; if the path match fails, outputting all words of the current node and adding them to the candidates.
Specifically, step F comprises:
F1, setting a flag with an initial value of 0;
F2, judging whether the current node has a child node and the flag value is 0;
F3a, if so, moving the tree node pointer to the first child node of the current node and matching its path against the letter sequence input by the user; if they do not match, setting the flag value to 1 and returning to F2; if they match, returning success;
F3b, if not, judging whether the depth of the current node is greater than 0; if it is not greater than 0, returning failure; if it is greater than 0, judging whether the current node is the last child node of its parent node: if it is not the last child node of its parent node, moving the tree node pointer to the next sibling node of the current node and matching its path against the letter sequence input by the user; if it is the last child node of its parent node, moving the tree node pointer to the parent node of the current node and returning to F3b.
To solve the second problem, the present invention also provides a system for searching for words, used for searching for the words stored in a tree structure in which each node corresponds to one letter and stores several words that begin with the directory path of that node, the system comprising:
an input module, used for inputting a letter sequence;
an initialization module, used for pointing the tree node pointer to the root node as the current node;
a node visit module, used for moving the tree node pointer to visit the nodes in the tree structure;
a matching module, used for matching the path of the current node visited by the node visit module against the letter sequence input by the user; if the match succeeds, taking out the words stored in the node and matching each word taken out against the letter sequence input by the user; if a word matches, outputting it; if it does not match, returning to the node visit module; if the path match fails, outputting all words of the current node;
a display module, used for displaying the words output by the matching module for the user to select.
Compared with the prior art:
The method and system for storing a word library of the present invention use the nodes of a tree structure to store words; each node stores several words that begin with the directory path of that node, and the number of words stored for each node is greater than twice the storage size of the node, so that the storage space of the word library is markedly reduced.
The method and system for searching for words of the present invention move the tree node pointer to visit the nodes in the tree structure and match the path of the current node against the letter sequence input by the user; if the path does not match, the node is skipped and the next node is visited. The words in every node therefore need not be visited, which saves considerable time and speeds up word lookup.
Embodiment
To make the present invention easier to understand, it is further described below with reference to the accompanying drawings; the embodiments shown in the drawings do not limit the present invention in any way.
The present invention uses the nodes of a tree structure to store a language database in small communication devices such as embedded devices. Each node stores several words that begin with the directory path of that node, and the number of words stored for each node is greater than twice the storage size of the node, so that the storage space of the word library is markedly reduced. When a word in the word library needs to be looked up, the tree node pointer is moved to visit the nodes in the tree structure, and the path of the current node is matched against the letter sequence input by the user; if the path does not match, the node is skipped and the next node is visited. The words in every node need not be visited, which speeds up data access.
With reference to Fig. 1, an embodiment of the method for storing a word library of the present invention comprises the steps of:
S01, establishing the root node (root), setting the current depth (depth) to 0, and pointing the current node to the root node;
S02, setting the word library as the current word table (wordTable), and setting all items of the current word table to the unvisited state;
S03, calling the node creation procedure with the parameters (root, wordTable, depth) to create the data structure of the root node root; the node creation procedure takes three parameters: the current tree node (curNode), the current word table (curTable) and the current depth (depth).
With reference to Fig. 2, the node creation procedure described in step S03 specifically comprises the following steps:
S31, obtaining an unvisited word from the current word table;
S32a, if a word is obtained successfully, denoting the obtained word as curWord and saving the depth-th character of curWord in curChar;
S32b, if there is no unvisited word, storing, for every word in the word library that is in the pending state, the characters after its depth-th character into the corresponding node, the directory path of the corresponding node being identical to the character string before the depth-th character of the word to be stored, and then ending;
S33, finding all words in the current word table whose depth-th character equals curChar, building a sub-table sunTable containing these words, and storing the number of words in the sub-table in count;
S34, judging whether count is greater than twice the node storage size; the word text and frequency data corresponding to a tree node can generally be compressed by some compression method (for example, Huffman coding), and such methods typically reduce the word data to about 50% of its original size; therefore, if a new node requires N bytes of storage, at least 2N words must begin with that node, so that the 2N bytes saved become N bytes after compression, which is comparable to the size of the node; in actual use the node storage size may vary and can be chosen as 4 bytes or as 8 bytes;
S35a, if count is greater than twice the tree node storage size, creating a node sunNode whose letter is curChar as a child node of the current node, and setting all words in the sub-table to the processed state;
S35b, otherwise, setting all words in the sub-table to the pending state in the current word table, and returning to S31;
S36, setting the current depth variable to depth+1, pointing the current node to the newly created child node, pointing the current word table to the sub-table, and recursively calling the node creation procedure with the parameters (sunNode, sunTable, depth+1) to create the data structure of the child node sunNode.
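The node creation procedure of steps S31 to S36 can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the implementation of the invention: the names Node, create_node, build_tree and all_words are hypothetical, depth is counted from 0, compression is omitted, a word that is exactly as long as the current path is stored in the current node with an empty tail, and the unvisited/pending/processed states are replaced by grouping the current table once by its depth-th character.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    letter: str = ""                               # letter of the node; empty for the root
    tails: list = field(default_factory=list)      # word remainders after the node path
    children: list = field(default_factory=list)


def create_node(cur_node, cur_table, depth, node_size):
    """Distribute the words of cur_table below cur_node (sketch of S31 to S36)."""
    # Group the table by the depth-th character (plays the role of S31 to S33).
    groups = {}
    for word in cur_table:
        key = word[depth] if depth < len(word) else None  # None: word equals the path
        groups.setdefault(key, []).append(word)

    for cur_char, sub_table in groups.items():
        # S34: a child is created only if count > 2 * node storage size.
        if cur_char is not None and len(sub_table) > 2 * node_size:
            child = Node(letter=cur_char)                            # S35a
            cur_node.children.append(child)
            create_node(child, sub_table, depth + 1, node_size)      # S36
        else:
            # S35b / S32b: pending words stay here, stored without the path.
            cur_node.tails.extend(word[depth:] for word in sub_table)


def build_tree(word_library, node_size):
    """S01 to S03: create the root and start the recursion at depth 0."""
    root = Node()
    create_node(root, sorted(word_library), 0, node_size)
    return root


def all_words(node, path=""):
    """Rebuild every stored word by prefixing the tails with the node path."""
    path += node.letter
    words = [path + tail for tail in node.tails]
    for child in node.children:
        words.extend(all_words(child, path))
    return words


library = ["back", "background", "backward", "backbone",
           "bachelor", "bacilli", "bad", "cat"]
tree = build_tree(library, node_size=1)   # node_size=1 is an illustrative value only
```

With the illustrative node_size of 1, a letter receives its own node only when more than two words share the prefix, so "cat" stays in the root while the words beginning with "bac" are pushed down into deeper nodes.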
With reference to Fig. 3, the present invention uses a tree structure to record English words. The directory tree is defined as follows:
1, except for the root node (N0), each node represents one letter; the root node does not represent any letter. The letter of each node is denoted D(Nx)=c (x>0, c is an English letter or a punctuation mark);
2, for a node Na, the letters D(Nx) of the nodes on the path from the root node N0 to Na, concatenated in order, form the directory path of Na, denoted P(Na); in particular, P(N0) is the empty string;
3, each node Nx can store several words that begin with P(Nx), denoted C(Nx)=A (A is a set of several words). When a word w is stored on Nx, only the part of its character string other than P(Nx) needs to be stored; for example, when administration is stored on the tree node whose path is admi, only nistration actually needs to be stored.
In Fig. 3, each node is represented by three parts: the node letter (node path), the words stored by the node, and the data actually stored by the node. As can be seen from the figure, P(N4) equals "ab", P(N5) equals "ba" and P(N6) equals "bac". C(N6) holds six words: back, background, backward, backbone, bachelor and bacilli, while only the following data actually need to be stored: k, kground, kward, kbone, helor, illi. The information stored by a node comprises the letter, the priority of the words and the tree structure information.
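The tail storage described for node N6 can be illustrated with a short Python sketch. The function names stored_tails and restore_words are hypothetical and only demonstrate the idea of dropping the directory path P(Nx) from each stored word; priority data and compression are left out.

```python
def stored_tails(path, words):
    """Return what remains of each word once the node path is removed."""
    tails = []
    for word in words:
        if not word.startswith(path):
            raise ValueError(f"{word!r} does not begin with {path!r}")
        tails.append(word[len(path):])
    return tails


def restore_words(path, tails):
    """Rebuild the full words by prefixing each tail with the node path."""
    return [path + tail for tail in tails]


words_n6 = ["back", "background", "backward", "backbone", "bachelor", "bacilli"]
tails_n6 = stored_tails("bac", words_n6)
```

Calling restore_words("bac", tails_n6) gives back the original six words, so no information is lost by omitting the path.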
According to the tree structure described above, words should be stored in directory nodes that are as deep as possible, so that the data actually stored by the nodes is minimized. If storing one tree node requires x bytes, then for a node Nx to be worth creating, at least x+1 words must begin with P(Nx); only then is the total amount of data to be stored smaller than it would be without the node. Since the actual word text can be compressed by some compression method (for example, Huffman coding), which typically reduces the word data to about 50% of its original size, if a new node requires N bytes of storage, at least 2N words must begin with that node.
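The break-even reasoning above can be written as a small check. This is an illustrative Python sketch, not part of the described method: the function name should_create_node is hypothetical, each removed character is assumed to occupy one byte before compression, and the strict comparison follows the "greater than twice the node storage size" test of step S34.

```python
def should_create_node(word_count, node_size_bytes, compression_ratio=0.5):
    """Return True if a new node saves more bytes than it costs.

    Creating a node removes one character from every word that begins
    with its path. After compression each removed character is assumed
    to save compression_ratio bytes, so the total saving is
    word_count * compression_ratio, to be compared with the node size.
    """
    saved_bytes = word_count * compression_ratio
    return saved_bytes > node_size_bytes


# With 50% compression and a 4-byte node, more than 8 words are needed.
decision_for_9_words = should_create_node(9, 4)
decision_for_8_words = should_create_node(8, 4)
```

Setting compression_ratio to 1.0 reproduces the uncompressed case, in which x+1 words are enough for a node of x bytes.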
Fig. 4 is a structural diagram of an embodiment of the system for storing a word library of the present invention. Corresponding to the method described above, the system comprises:
a dictionary memory 1, used for storing all words of the dictionary;
a root node creation module 2, used for establishing the root node of the tree structure used to store the word library;
a subtree node creation module 3, used for creating the subtree of the root node according to the words in the word library, wherein each node in the subtree corresponds to one letter, each node stores several words that begin with the directory path of that node, and the number of words stored for each node is greater than twice the storage size of the node.
The subtree node creation module 3 comprises:
an initialization unit 31, used for initializing the current depth variable depth to 0, pointing the current node to the root node, setting all word items of the word library to the unvisited state, and setting the dictionary in the dictionary memory as the current word table;
a word acquiring unit 32, used for obtaining an unvisited word from the current word table;
a sub-table building unit 33, used for finding all words in the current word table whose depth-th character equals the depth-th character of the currently obtained word, and building a sub-table containing these words;
a storage unit 34, used for storing the sub-table;
a judging unit 35, used for judging whether the number of words in the sub-table is greater than twice the node storage size;
a child node creation unit 36, used for acting on the result of the judging unit: if the number is greater than twice the tree node storage size, creating a node whose letter is the depth-th character of the currently obtained word as a child node of the current node, setting all words in the sub-table to the processed state, then increasing the current depth variable depth by 1, pointing the current node to the newly created child node and pointing the current word table to the sub-table; otherwise, setting all words in the sub-table to the pending state in the current word table.
Fig. 5 is a flowchart of a method for searching for words according to an embodiment of the present invention. The method comprises:
S11, the user inputs a key sequence through the mobile phone keypad, and the corresponding letter sequence is obtained from the key sequence through the keyboard mapping table;
S12, pointing the tree node pointer to the root node as the current node, and setting the current word index to 0;
S13, increasing the current word index by 1;
S14, judging whether the current word index is greater than the number of words of the tree node;
S15, if it is greater, moving the tree node pointer to visit the next matching node in the tree structure, the path of the current node being matched against the letter sequence input by the user; if it is not greater, jumping to S18;
S16, judging whether the match succeeded;
S17, if it succeeded, setting the current word index to 1; if it failed, outputting all candidates;
S18, decompressing and taking out the tail data of the current word, and prefixing it with the path of the current tree node to obtain the current word;
S19, comparing the word taken out with the key sequence according to the keyboard mapping table;
S20, judging whether they match;
S21, if they match, adding the word taken out to the candidates and returning to S13; if they do not match, returning to S13 directly.
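The overall search of steps S11 to S21 can be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than the flow of Fig. 5 itself: the keypad layout is the common 2=abc ... 9=wxyz assignment, words are lowercase letters only, a node is written as a hypothetical tuple (letter, tails, children) with uncompressed tails, a word counts as a match when its first letters correspond to the whole key sequence, and plain recursion replaces the pointer movement of S15.

```python
# Keyboard mapping table (assumed layout), used in S11 and S19.
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def to_keys(text):
    """Translate lowercase letters into the keys that produce them."""
    return "".join(LETTER_TO_KEY[ch] for ch in text)


def path_matches(path, keys):
    """The path and the key sequence must agree on their common length."""
    n = min(len(path), len(keys))
    return to_keys(path[:n]) == keys[:n]


def word_matches(word, keys):
    """Assumed rule: the word must begin with letters typed by keys."""
    return len(word) >= len(keys) and to_keys(word[:len(keys)]) == keys


def search(node, keys, path=""):
    """Collect candidates, skipping every node whose path does not match."""
    letter, tails, children = node
    path += letter
    if not path_matches(path, keys):
        return []                      # words of this subtree are never read
    candidates = [path + tail for tail in tails
                  if word_matches(path + tail, keys)]   # S18 to S21
    for child in children:
        candidates.extend(search(child, keys, path))
    return candidates


# Small hand-built tree: root holds "cat", node "ba" holds "bad",
# node "bac" holds "back", "background" and "bachelor".
node_bac = ("c", ["k", "kground", "helor"], [])
node_ba = ("a", ["d"], [node_bac])
node_b = ("b", [], [node_ba])
root = ("", ["cat"], [node_b])
```

For the key sequence "228" the node with path "bac" is rejected by its path alone, so none of its words has to be taken out and compared.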
In S15, the tree node pointer is moved to the next matching node, and the path of the current node is matched against the letter sequence input by the user; in this way the visiting of many nodes is avoided and the search speed is greatly improved. Fig. 6 is a detailed flowchart of step S15, which specifically comprises:
S151, setting a flag moveSibling with an initial value of 0;
S152, judging whether the current node has a child node and the flag value is 0; if so, carrying out S153; if not, carrying out S154;
S153:
S1531, moving the tree node pointer to the first child node of the current node;
S1532, matching the path of this child node against the letter sequence input by the user;
S1533, if they do not match, setting the flag moveSibling to 1 and returning to S152; if they match, returning success;
S154:
S1541, judging whether the depth of the current node is greater than 0;
S1542, if it is greater than 0, judging whether the current node is the last child node of its parent node; if it is not greater than 0, returning failure;
S1543a, if it is not the last child node of its parent node, moving the tree node pointer to the next sibling node of the current node and matching its path against the letter sequence input by the user; if they match, returning success; if they do not match, returning to S1541;
S1543b, if it is the last child node of its parent node, moving the tree node pointer to the parent node of the current node and returning to S1541.
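Steps S151 to S1543b amount to a pre-order walk that never descends below a node whose path fails to match. The following Python sketch follows those steps; the class TreeNode, the functions next_matching_node and collect_matching_paths, and the prefix-based matching rule in the example are hypothetical choices made for illustration, and the matching rule is passed in as a function because the exact comparison depends on the keyboard mapping table.

```python
class TreeNode:
    """Tree node with a parent link, as needed for S1543a and S1543b."""

    def __init__(self, letter="", parent=None):
        self.letter = letter
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Directory path: letters from the root down to this node."""
        return "" if self.parent is None else self.parent.path() + self.letter


def next_matching_node(cur, path_matches):
    """Return the next node whose path matches, or None on failure."""
    move_sibling = False                          # S151
    while True:
        if cur.children and not move_sibling:     # S152
            cur = cur.children[0]                 # S1531
            if path_matches(cur.path()):          # S1532
                return cur                        # S1533: success
            move_sibling = True                   # S1533: try siblings next
            continue
        while True:                               # S154
            if cur.parent is None:                # S1541: depth is 0
                return None                       # S1542: failure
            siblings = cur.parent.children
            index = siblings.index(cur)
            if index + 1 < len(siblings):         # S1543a: next sibling
                cur = siblings[index + 1]
                if path_matches(cur.path()):
                    return cur
            else:                                 # S1543b: back to the parent
                cur = cur.parent


def collect_matching_paths(root, path_matches):
    """Call next_matching_node repeatedly and record the visited paths."""
    paths, cur = [], next_matching_node(root, path_matches)
    while cur is not None:
        paths.append(cur.path())
        cur = next_matching_node(cur, path_matches)
    return paths


# Example tree with the paths "a", "ab", "b", "ba" and "bac".
root = TreeNode()
n_a = TreeNode("a", root)
n_b = TreeNode("b", root)
n_ab = TreeNode("b", n_a)
n_ba = TreeNode("a", n_b)
n_bac = TreeNode("c", n_ba)

typed = "bac"
visited = collect_matching_paths(
    root, lambda p: typed.startswith(p) or p.startswith(typed))
```

In this example the node with path "a" fails to match, so its child "ab" is never visited.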
With reference to Fig. 7, corresponding to the above method of accessing data through the tree structure, an embodiment of the present invention also provides a system for searching for words, comprising:
an input module, used for inputting a letter sequence, comprising: a key input unit 41, used for storing the key numbers input by the user; a mapping table storage unit 42, used for storing the key mapping table; a mapping unit 43, used for finding, according to the key mapping table, the letter sequence corresponding to the key numbers input by the user; and a storage unit 44, used for storing the letter sequence found by the mapping unit;
an initialization module 5, used for pointing the tree node pointer to the root node as the current node;
a node visit module 6, used for moving the tree node pointer to visit the nodes in the tree structure;
a matching module 7, used for matching the path of the current node visited by the node visit module against the letter sequence input by the user; if the match succeeds, taking out the words stored in the node and matching each word taken out against the letter sequence input by the user; if a word matches, outputting it; if it does not match, returning to the node visit module; if the path match fails, outputting all words of the current node;
a display module 8, used for displaying the words output by the matching module for the user to select.
The node visit module 6 comprises:
an initialization unit, used for setting a flag with an initial value of 0;
a judging unit, used for judging whether the current node has a child node and the flag value is 0;
a pointer moving unit, used for moving the tree node pointer to the corresponding node according to the result of the judging unit:
if the result of the judging unit is that a child node exists and the flag value is 0, the tree node pointer is moved to the first child node of the current node;
if the result of the judging unit is that no child node exists or the flag value is not 0,
it is judged whether the depth of the current node is greater than 0; if it is not greater than 0, failure is returned; if it is greater than 0, it is judged whether the current node is the last child node of its parent node:
if it is not the last child node of its parent node, the tree node pointer is moved to the next sibling node of the current node; if it is the last child node of its parent node, the tree node pointer is moved to the parent node of the current node, and it is judged again whether the depth of the current node is greater than 0.
For the engine, the whole tree works like a directory: when the engine looks up words, it first checks whether the path of each directory entry matches the current key sequence, and accesses the content of a node only after the path of that entry matches the key sequence. This greatly reduces the number of words that have to be accessed. Experimental tests show that when the tree structure of the present invention is used to store 18677 English words with an average word length of 7.15, the raw data size is 133 KB and the size after compression is 50 KB. On a 26 MHz CPU, the average search time is 0.11 seconds.
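The reported figures can be cross-checked with a few lines of Python. This is only an arithmetic sketch: it assumes one byte per character and 1 K = 1000 bytes, and the text does not state whether the compressed size includes the tree structure itself.

```python
word_count = 18677          # number of English words, as reported
average_length = 7.15       # average word length in characters, as reported

# Raw text size, assuming one byte per character and 1 K = 1000 bytes.
raw_chars = word_count * average_length
raw_kilobytes = raw_chars / 1000

# Ratio of the reported compressed size to the reported raw size.
compressed_ratio = 50 / 133

print(f"raw size: about {raw_kilobytes:.1f} K")
print(f"compressed / raw: about {compressed_ratio:.2f}")
```

Under these assumptions the raw size agrees with the reported figure of about 133 K, and the overall ratio is lower than the 50% compression figure used when deciding whether to create a node.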