CN103823814B - An information processing method and device - Google Patents
An information processing method and device
- Publication number: CN103823814B (application CN201210468061.1A)
- Authority
- CN
- China
- Prior art keywords
- phonetic
- even numbers
- word
- strings
- dictionary tree
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/374—Thesaurus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
- Document Processing Apparatus (AREA)
Abstract
Embodiments of the invention disclose an information processing method and device for quickly looking up, in a large-capacity lexicon, the words corresponding to the pinyin entered by a user. The method includes: generating a double-array trie according to the correspondence between pinyin syllables and pinyin IDs, where the double-array trie includes a base array and a check array and the pinyin ID is the state transition offset of the double-array trie; receiving a pinyin ID string to be looked up, where the pinyin ID string is the sequence of pinyin IDs obtained by segmenting the pinyin characters entered by the user; looking up, in the pinyin lexicon and according to the double-array trie, the words corresponding to the pinyin ID string; and outputting the words found.
Description
Technical field
The present invention relates to the field of communication technology, and in particular to an information processing method and device.
Background
As computer hardware performance and software intelligence continue to improve, people increasingly expect computers to offer more natural modes of human-computer interaction, for example: (1) more intelligent Chinese character input methods; (2) more accurate speech recognition. Implementing these modes of interaction requires the underlying support of a large and complete pinyin lexicon, so the query efficiency of a large-scale pinyin lexicon directly affects the execution speed of such interactive software and therefore determines its quality. Take pinyin input methods as an example: accuracy and speed are their lifeline. To improve accuracy, current input method systems all use very large lexicons, and while the user types, the program must perform a large number of frequent lexicon lookups based on the pinyin entered in order to offer accurate candidate words.
In the prior art, most pinyin lexicon systems use a storage and query method that groups words by pinyin and word length. That is, the lexicon is indexed by word length and by the first N pinyin syllables of each word. For a given pinyin string, the system first obtains its first N syllables and its word length, goes to the pinyin group table for that word length in the lexicon, finds the group corresponding to those syllables, traverses all the words in the group, and returns the words whose pinyin matches the pinyin string being looked up.
In this prior art, however, lexicon lookup is inefficient because all the words in the same group must be traversed, and the lexicon scales poorly: as the lexicon keeps growing, query time increases many times over, to the point that the software can no longer work normally.
Summary of the invention
Embodiments of the present invention provide an information processing method and device for quickly looking up, in a pinyin lexicon, the words corresponding to the pinyin characters entered by a user.
The information processing method provided by an embodiment of the present invention includes: generating a double-array trie according to the correspondence between pinyin syllables and pinyin identifiers (IDs), where the double-array trie includes a base array and a check array, and the pinyin ID is the state transition offset of the double-array trie; receiving a pinyin ID string to be looked up, where the pinyin ID string is the sequence of pinyin IDs obtained by segmenting the pinyin characters entered by the user; looking up, in the pinyin lexicon and according to the double-array trie, the words corresponding to the pinyin ID string; and outputting the words found.
Preferably, before the double-array trie is generated according to the correspondence between pinyin syllables and pinyin IDs, the method includes: setting the correspondence between pinyin IDs and pinyin syllables.
Further, looking up the words corresponding to the pinyin ID string in the pinyin lexicon according to the double-array trie includes: starting from the root node of the double-array trie, looking up the words corresponding to the pinyin ID string according to the pinyin IDs in the pinyin ID string to be looked up and the values of the base array; and, if the pinyin ID corresponds to the end marker and the first bit of the current base array element of the double-array trie is 1, outputting the words currently found.
Further, after the lookup starts from the root node of the double-array trie according to the pinyin IDs in the pinyin ID string and the values of the base array, the method includes: if the pinyin ID does not correspond to the end marker, judging whether the current check array value equals the number of the node before the state transition in the current lookup node sequence; and if so, continuing to the next node according to the sum of the next pinyin ID in the pinyin ID string to be looked up and the current base array value.
Further, before the pinyin ID string to be looked up is received, the method includes: segmenting the pinyin characters entered by the user into syllables, and concatenating the syllables in order into the pinyin ID string.
The information processing device provided by an embodiment of the present invention includes: a generation unit, configured to generate a double-array trie according to the correspondence between pinyin syllables and pinyin identifiers (IDs), where the double-array trie includes a base array and a check array, and the pinyin ID is the state transition offset of the double-array trie; a receiving unit, configured to receive a pinyin ID string to be looked up, where the pinyin ID string is the sequence of pinyin IDs obtained by segmenting the pinyin characters entered by the user; a lookup unit, configured to look up, in the pinyin lexicon and according to the double-array trie, the words corresponding to the pinyin ID string received by the receiving unit; and an output unit, configured to output the words found by the lookup unit.
Preferably, the device further includes: a setting unit, configured to set the correspondence between pinyin IDs and pinyin syllables.
Further, the lookup unit is also configured to start from the root node of the double-array trie and look up the words corresponding to the pinyin ID string according to the pinyin IDs in the pinyin ID string to be looked up and the values of the base array;
the output unit is also configured to output the words currently found if the pinyin ID corresponds to the end marker and the first bit of the current base array element of the double-array trie is 1.
Further, the device also includes: a judging unit, configured to judge, if the pinyin ID does not correspond to the end marker, whether the current check array value equals the number of the node before the state transition in the current lookup node sequence;
the lookup unit is also configured to continue, if so, to the next node according to the sum of the next pinyin ID in the pinyin ID string to be looked up and the current base array value.
Further, the device also includes: a converting unit, configured to segment the pinyin characters entered by the user into syllables and to concatenate the syllables in order into the pinyin ID string.
As can be seen from the above technical solutions, the embodiments of the present invention have the following advantage: because a double-array trie is generated from pinyin syllables and pinyin IDs, a lookup of the pinyin characters entered by the user follows a single branch of the double-array trie, syllable by syllable, without traversing all the words in a pinyin group. The query workload is small and queries are faster.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention or of the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a trie of Chinese phrases;
Fig. 2 is a schematic diagram of one embodiment of the information processing method in the embodiments of the present invention;
Fig. 3 is a schematic diagram of another embodiment of the information processing method in the embodiments of the present invention;
Fig. 4 is a schematic diagram of an example double-array trie structure generated in the embodiments of the present invention;
Fig. 5 is a flow chart of word lookup in the information processing method in the embodiments of the present invention;
Fig. 6 is a schematic diagram of an example of the information processing method in the embodiments of the present invention;
Fig. 7 is a schematic diagram of one embodiment of the information processing device in the embodiments of the present invention;
Fig. 8 is a schematic diagram of another embodiment of the information processing device in the embodiments of the present invention.
Detailed description
The technical solutions of the embodiments of the present invention are further described below with reference to the accompanying drawings and specific embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Embodiments of the present invention provide an information processing method and device for quickly looking up, in a large-capacity lexicon, the words or phrases corresponding to the pinyin entered by a user.
A trie is a kind of search tree. It provides an effective structure for organizing data for retrieval and supports algorithms for looking up words in a lexicon. It is essentially a deterministic finite automaton (DFA, Deterministic Finite Automaton) in which each node represents one state of the automaton. The states in a lexicon include "word prefix", "complete word", and so on.
A double-array trie (Double Array Trie) is a simple and efficient implementation of a trie. It consists of two integer arrays. With array index i, where i is an integer greater than or equal to 1, one of the two arrays is the base array base[i] and the other is the check array check[i]. Each branch is a state transition from one state to another upon meeting a specific character. For example, for the state transition in which state s meets character c and reaches state t, the double array satisfies:
check[base[s]+c]=s
base[s]+c=t
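The two relations above can be read as a single transition function. The following Python fragment is a minimal sketch, not the patent's implementation: the function name `transition`, the use of 0 as the "empty" check value, and the tiny hand-built arrays are assumptions made for illustration.

```python
def transition(base, check, s, c):
    """Return the state t reached from state s on symbol c, or None.

    Implements t = base[s] + c, accepted only when check[t] == s.
    """
    t = base[s] + c
    if 0 <= t < len(check) and check[t] == s:
        return t
    return None


# Hand-built arrays (hypothetical): the root is state 1 with base[1] = 1,
# and it has two outgoing symbols, 2 and 5, so states 3 and 6 belong to it.
base = [0, 1, 0, 0, 0, 0, 0]
check = [0, 0, 0, 1, 0, 0, 1]

print(transition(base, check, 1, 2))  # state reached on symbol 2
print(transition(base, check, 1, 3))  # no such branch
```

Because a transition costs one addition and one comparison, the cost of a lookup grows with the length of the key rather than with the size of the lexicon.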
In this embodiment, the purpose of lexicon lookup is to provide the corresponding candidate words for a given segmented pinyin. The first step is to add a trie index to the lexicon. The branches of the trie are in units of syllables; a syllable is the basic unit of pronunciation, the smallest unit that cannot be segmented further. Suppose the lexicon currently holds only three words: "China" (中国), "Chinese" (中国人) and "harmony" (和谐). The trie index built for this lexicon is shown in Fig. 1. To handle the case where one word is a prefix of another, such as "China" in "Chinese", an end marker "$" is appended to every word, so that every node holding word data is an independent leaf node. Because each branch of each node of the trie is unique, a lookup only needs to follow the branches in turn, and the number of comparisons equals the length of the word. If no branch exists for some syllable, the word is not in the lexicon.
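The syllable-level trie with an end marker can be sketched with nested dictionaries. This is an illustrative sketch only: the names `build_trie` and `lookup` and the nested-dict representation are assumptions, and the three-word lexicon is the example from the paragraph above.

```python
END = "$"  # end marker appended to every word


def build_trie(lexicon):
    """Build a nested-dict trie from (syllable list, word) pairs."""
    root = {}
    for syllables, word in lexicon:
        node = root
        for syllable in syllables:
            node = node.setdefault(syllable, {})
        # The END branch is a leaf holding the words with this exact pinyin.
        node.setdefault(END, []).append(word)
    return root


def lookup(root, syllables):
    """Follow one branch per syllable; return the words at the END leaf."""
    node = root
    for syllable in syllables:
        node = node.get(syllable)
        if node is None:  # no branch for this syllable: word not in lexicon
            return []
    return node.get(END, [])


trie = build_trie([
    (["zhong", "guo"], "中国"),
    (["zhong", "guo", "ren"], "中国人"),
    (["he", "xie"], "和谐"),
])
print(lookup(trie, ["zhong", "guo"]))
```

The end marker is what lets "zhong guo" be a word in its own right while also being a prefix of "zhong guo ren".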
The information processing method in the embodiments of the present invention is described below. Referring to Fig. 2, one embodiment of the information processing method includes:
101. Generate a double-array trie according to the correspondence between pinyin syllables and pinyin IDs, the double-array trie including a base array and a check array;
In this embodiment, the double-array trie is generated according to the correspondence between the pinyin syllables in the lexicon and pinyin identifiers (ID, IDentity). The elements of the base array and the check array of the double-array trie correspond one to one. Each element of the base array corresponds to a node of the double-array trie, and its value serves as the base value for state transitions; the value at the corresponding position of the check array serves as a check value for testing whether the state after a transition exists.
A state transition offset in a double-array trie is the offset for moving from one state to another, that is, from one node to the next. The state transition offset is determined by the actual needs of the double-array trie. In this embodiment, the pinyin ID is the state transition offset of the double-array trie.
102. Receive the pinyin ID string to be looked up;
The pinyin ID string is the sequence of pinyin IDs obtained by segmenting the pinyin characters entered by the user;
103. Look up, in the pinyin lexicon and according to the double-array trie, the words corresponding to the pinyin ID string;
The pinyin lexicon includes a word list and the double-array trie. According to the pinyin ID string to be looked up, the double-array trie is used to find, in the word list, the words corresponding to the pinyin ID string. A word found may consist of a single character or of two or more characters.
104. Output the words found.
The words found are output through a peripheral device.
In this embodiment, because a double-array trie is generated from pinyin syllables and pinyin IDs, a lookup of the pinyin characters entered by the user follows a single branch of the double-array trie, syllable by syllable, without traversing all the words in a pinyin group. The query workload is small and queries are faster. In addition, when the lexicon needs to be expanded, adding syllables and the corresponding words to a double-array trie of this structure is straightforward, so expanding the lexicon is simple and efficient.
For ease of understanding, the information processing method is described in detail below with another embodiment. Referring to Fig. 3, another embodiment of the information processing method includes:
201. Set the correspondence between pinyin syllables and pinyin IDs;
The correspondence between pinyin IDs and pinyin syllables is set; the pinyin ID is the state transition offset of the double-array trie.
Note that, for the state transition in which state s meets character c (in this embodiment, c is a pinyin ID) and reaches state t, the double-array trie satisfies:
check[base[s]+c]=s
base[s]+c=t
The value of base[s] is therefore chosen according to the particular lexicon being queried. If base[s] is too large, the base array may become too sparse; if it is too small, collisions may become frequent. A collision means that, after computing base[s]+c=t for the next state, base[base[s]+c] is not empty, in which case base[s] must be chosen again. When a collision occurs, a suitable base[s] is selected so that every next state reachable from the current state can find an empty slot in the base array. In this embodiment base[s]=1 is used as an example; the actual choice of base[s] depends on the application and is not specifically limited here.
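The collision rule can be illustrated with a simple first-fit search for base[s]. This is a sketch under stated assumptions rather than the method prescribed by the patent: the function name `find_base`, the `occupied` set that tracks used slots, and the linear search starting at 1 are all hypothetical.

```python
def find_base(occupied, symbols, start=1):
    """Return the smallest base value >= start such that base + c is a
    free slot for every outgoing symbol c of the current state."""
    b = start
    # A collision occurs when any target slot b + c is already in use;
    # in that case try the next candidate base value.
    while any(b + c in occupied for c in symbols):
        b += 1
    return b


# Slots 1, 3 and 6 are already taken (hypothetical layout).
occupied = {1, 3, 6}
print(find_base(occupied, [3]))     # slot 1 + 3 = 4 is free
print(find_base(occupied, [2, 5]))  # base 1 collides at slot 3, so move on
```

Starting the search at a small value keeps the arrays dense at the price of more retries, which is the trade-off between sparseness and collisions described above.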
Referring to Fig. 4, the double-array trie in this embodiment uses pinyin IDs to represent pinyin syllables; a pinyin ID is an integer. Continuing the earlier example, the pinyin IDs correspond to pinyin syllables as follows:
1: end marker "$", 2: zhong, 3: guo, 4: ren, 5: he, 6: xie.
The double-array trie structure generated for the example shown in Fig. 1 is then as shown in Fig. 4.
202. Generate a double-array trie according to the correspondence between pinyin syllables and pinyin IDs, the double-array trie including a base array and a check array;
Since the values of base[s] and c have been determined, the double-array trie can be generated from the correspondence between pinyin syllables and pinyin IDs. The first element of the check array, check[0], holds the number of elements in the double array. The first element of the base array, base[0], is the data block identifier: its value is the ASCII-encoded string "SDAT", 4 characters occupying 4 bytes. When the first bit of a base array element is 0, the element holds a base value; when it is 1, the element represents a word group, in which case the next 22 bits are the offset of the group's first word in the word list and the last 9 bits give the number of words in the group.
The first element of the word list holds the number of elements in the word list. This data structure can support more than 4 million (2^22 - 1) words, with at most 511 (2^9 - 1) homophones per group.
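The 1 + 22 + 9 bit layout of a base array element can be sketched with a pair of packing helpers. The sketch assumes a 32-bit element whose "first bit" is the most significant bit; that reading, and the names `pack_group` and `unpack`, are assumptions for illustration and are not confirmed by the text.

```python
OFFSET_BITS = 22      # offset of the group's first word in the word list
COUNT_BITS = 9        # number of words (homophones) in the group
GROUP_FLAG = 1 << 31  # assumed position of the "first bit"


def pack_group(offset, count):
    """Encode a word group as flag bit | 22-bit offset | 9-bit count."""
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset does not fit in 22 bits")
    if not 0 <= count < (1 << COUNT_BITS):
        raise ValueError("count does not fit in 9 bits")
    return GROUP_FLAG | (offset << COUNT_BITS) | count


def unpack(element):
    """Decode an element into ("base", value) or ("group", offset, count)."""
    if (element & GROUP_FLAG) == 0:
        return ("base", element)
    offset = (element >> COUNT_BITS) & ((1 << OFFSET_BITS) - 1)
    count = element & ((1 << COUNT_BITS) - 1)
    return ("group", offset, count)


print(unpack(pack_group(5, 2)))
print((1 << OFFSET_BITS) - 1, (1 << COUNT_BITS) - 1)  # capacity limits
```

The two limits printed on the last line are the figures quoted above: 2^22 - 1 = 4,194,303 addressable words and 2^9 - 1 = 511 words per group.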
For the example in step 201, the generated double-array trie is shown in Fig. 4, where PG_1, PG_2 and PG_3 are the words represented by the leaf nodes of the respective branches.
203. Segment the pinyin characters entered by the user into syllables, and concatenate the syllables in order into the pinyin ID string;
The pinyin characters entered by the user are received through a peripheral device, which may be a keyboard, a touch screen, a voice input device, or the like. The pinyin characters are segmented into syllables, and the syllables are concatenated in order into the pinyin ID string.
204. Receive the pinyin ID string to be looked up;
The pinyin ID string is the sequence of pinyin IDs obtained by segmenting the pinyin characters entered by the user. For example, with the correspondence between pinyin IDs and pinyin syllables from step 201, if the user enters the pinyin characters hexie, the pinyin IDs after segmentation are 5 and 6, and the resulting pinyin ID string is 561, where 1 represents the end marker "$".
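The conversion from typed pinyin to a pinyin ID string can be sketched as follows. The text does not say how the segmentation is performed, so the greedy longest-match strategy here is an assumption, as are the names `SYLLABLE_ID`, `END_ID` and `to_id_string`; the ID values are the ones listed in step 201.

```python
SYLLABLE_ID = {"zhong": 2, "guo": 3, "ren": 4, "he": 5, "xie": 6}
END_ID = 1  # ID of the end marker "$"


def to_id_string(pinyin):
    """Segment pinyin by greedy longest match and return the list of
    pinyin IDs, terminated by the end marker ID."""
    ids = []
    i = 0
    while i < len(pinyin):
        # Try the longest remaining substring first, then shorter ones.
        for j in range(len(pinyin), i, -1):
            syllable = pinyin[i:j]
            if syllable in SYLLABLE_ID:
                ids.append(SYLLABLE_ID[syllable])
                i = j
                break
        else:
            raise ValueError(f"cannot segment {pinyin!r} at position {i}")
    ids.append(END_ID)
    return ids


print(to_id_string("hexie"))
```

A production segmenter would also have to deal with ambiguous inputs that admit more than one segmentation; this sketch simply takes the first greedy split.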
205. Starting from the root node of the double-array trie, look up the words corresponding to the pinyin ID string according to the pinyin IDs in the pinyin ID string to be looked up and the values of the base array;
For example, if the pinyin ID string to be looked up is 561, then, using the double-array trie shown in Fig. 4, the words corresponding to the pinyin ID string "561" are looked up according to 5, 6 and 1 in turn and the corresponding values in the base array. The words obtained may be "harmony" (和谐) or "river crab" (河蟹), and the user selects the desired word from the output.
206. If the pinyin ID in the pinyin ID string to be looked up corresponds to the end marker, and the first bit of the current base array element of the double-array trie is 1, output the words currently found;
When a pinyin ID in the pinyin ID string to be looked up corresponds to the end marker and the first bit of the current base array element of the double-array trie is 1, the current lookup result is a word group, and the words contained in that word group are output.
207. If the pinyin ID does not correspond to the end marker, judge whether the current check array value equals the number of the node before the state transition in the current lookup node sequence;
If the pinyin ID does not correspond to the end marker, the lookup has not finished, and the current value of the check array is used to judge whether the lookup is on a valid path. Specifically, the current check array value is compared with the number of the node before the state transition in the current lookup node sequence. For example, in the double-array trie of Fig. 4 from step 201, the 5th value in the check array is "3" and the corresponding node number is 4; the previous node, that is, the node before the state transition, is numbered "3", so the lookup is on a valid path. Likewise, the 8th value in the check array is "6" and the corresponding node number is 7; the previous node, that is, the node before the state transition, is numbered "6", so the lookup is on a valid path. Otherwise the path is invalid.
208. If the current check array value equals the number of the node before the state transition in the current lookup node sequence, continue to the next node according to the sum of the next pinyin ID in the pinyin ID string to be looked up and the current base array value.
If the current check array value does not equal the number of the node before the state transition in the current lookup node sequence, this branch does not exist in the double-array trie, that is, no word corresponds to the pinyin.
Specifically, referring to Fig. 5, the lookup starts from the root node i of the double-array trie, where i is the number of the root node; in this example, i=1. According to the pinyin syllable being queried, the lookup jumps to the node numbered j, where j=base[i]+c and c is the state transition offset, that is, the pinyin ID. It is then judged whether the pinyin ID is 1; among the pinyin IDs, "1" corresponds to the end marker, so this judges whether the state transition offset c is the end marker. If c is the end marker, it is judged whether the first bit of the current base array element is 1: if so, a matching word has been found and is output; if not, no matching word has been found. If c is not the end marker, it is judged whether the current check array value equals the number of the previous node of the current query node, that is, whether check[j] equals i. If not, this branch does not exist in the double-array trie, that is, no word corresponds to the pinyin. If so, the lookup continues downward with the next syllable, until c is the end marker and the first bit of the current base array element is 1, which means a matching word has been found.
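The lookup flow of Fig. 5 can be sketched end to end in Python. This is a simplified illustration, not the patent's implementation: the builder, the function names, the first-fit choice of base values, and storing a plain group index behind the flag bit (instead of the 22-bit offset and 9-bit count) are all assumptions, and the arrays it produces are not claimed to match Fig. 4. As a precaution the sketch also tests check[j] on the end-marker step, which the flow described above does not state.

```python
END = 1               # pinyin ID of the end marker "$"
ROOT = 1              # number of the root node
GROUP_FLAG = 1 << 31  # assumed "first bit" marking a word group


def build(entries):
    """Build (base, check, groups) from (ID list, word list) pairs."""
    trie = {}
    for ids, words in entries:
        node = trie
        for c in ids:
            node = node.setdefault(c, {})
        node[END] = words
    base, check, groups = {}, {ROOT: 0}, []
    stack = [(ROOT, trie)]
    while stack:
        s, node = stack.pop()
        b = 1
        while any(b + c in check for c in node):  # collision: try next base
            b += 1
        base[s] = b
        for c, child in node.items():
            t = b + c
            check[t] = s
            if c == END:  # leaf: store a flagged index into the group list
                base[t] = GROUP_FLAG | len(groups)
                groups.append(child)
            else:
                stack.append((t, child))
    size = max(check) + 1
    return ([base.get(k, 0) for k in range(size)],
            [check.get(k, 0) for k in range(size)], groups)


def lookup(base, check, groups, ids):
    """Follow j = base[i] + c from the root; ids must end with END."""
    i = ROOT
    for c in ids:
        j = base[i] + c
        if j >= len(check) or check[j] != i:
            return []  # branch does not exist
        if c == END:
            is_group = (base[j] & GROUP_FLAG) != 0
            return groups[base[j] & ~GROUP_FLAG] if is_group else []
        i = j
    return []


base, check, groups = build([
    ([2, 3], ["中国"]),
    ([2, 3, 4], ["中国人"]),
    ([5, 6], ["和谐", "河蟹"]),
])
print(lookup(base, check, groups, [5, 6, 1]))
```

Each step of `lookup` performs one addition and one comparison, so the work per query depends only on the number of syllables, not on how many words share a group or how large the lexicon is.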
For ease of understanding, the information processing method is described in detail below using a lookup of "zhongguo" in the lexicon as an example. Referring to Fig. 6 and continuing with the double-array trie of the earlier example: with base=1, the first state transition offset is 2; again with base=1, the second state transition offset is 3; again with base=1, the third state transition offset is 1, which corresponds to the end marker. In the check array, check[4]=3, so it can be determined that "zhongguo" has been found, and the homophone candidates (rendered in this translation as "China", "middle mistake" and "kind") are output from the word list for the user to choose from.
Note that the information processing method in the embodiments of the present invention can be applied in speech recognition systems and in the execution modules of pinyin input methods to increase the speed of information processing. It can also be applied in any IT product that relies on a large-scale lexicon to improve the product's information processing efficiency.
In this embodiment, the correspondence between pinyin syllables and pinyin IDs is set and a double-array trie is generated. Starting from the root node of the double-array trie, the words corresponding to the pinyin ID string are looked up according to the pinyin IDs in the pinyin ID string to be looked up and the values of the base array. When a pinyin ID corresponds to the end marker and the first bit of the current base array element of the double-array trie is 1, the words currently found are output, which increases the speed of phrase queries in a large-capacity pinyin lexicon. When the lexicon needs to be expanded, adding syllables and the corresponding words to a double-array trie of this structure is straightforward, so expanding the lexicon is simple and efficient.
The information processing device in the embodiments of the present invention is described below. Referring to Fig. 7, one embodiment of the information processing device includes:
a generation unit 301, configured to generate a double-array trie according to the correspondence between pinyin syllables and pinyin IDs, where the double-array trie includes a base array and a check array, and the pinyin ID is the state transition offset of the double-array trie;
a receiving unit 302, configured to receive the pinyin ID string to be looked up, where the pinyin ID string is the sequence of pinyin IDs obtained by segmenting the pinyin characters entered by the user;
a lookup unit 303, configured to look up, in the pinyin lexicon and according to the double-array trie, the words corresponding to the pinyin ID string received by the receiving unit 302;
an output unit 304, configured to output the words found by the lookup unit 303.
For the detailed process by which each unit of the information processing device performs its function, see the description of each step in the embodiment shown in Fig. 2; it is not repeated here.
In this embodiment, because a double-array trie is generated from pinyin syllables and pinyin IDs, a lookup of the pinyin characters entered by the user follows a single branch of the double-array trie, syllable by syllable, without traversing all the words in a pinyin group. The query workload is small and queries are faster. In addition, when the lexicon needs to be expanded, adding syllables and the corresponding words to a double-array trie of this structure is straightforward, so expanding the lexicon is simple and efficient.
For ease of understanding, the information processor in the embodiment of the present invention is described in detail with another embodiment below, please be join
Read Fig. 8, another embodiment of the information processor in the embodiment of the present invention includes:
Generation unit 401, for generating even numbers group according to the corresponding relation of pinyin syllable and phonetic ID identity numbers
Dictionary tree, the even numbers group dictionary tree include:Base value array and verification array, the phonetic ID are the even numbers group dictionary tree
State transfer amount;
Receiving unit 402, for receiving the phonetic ID required to look up strings, the phonetic word that the phonetic ID strings input for user
The sequence that corresponding phonetic ID is formed after symbol cutting;
Searching unit 403, connect for searching the receiving unit 402 according to the even numbers group dictionary tree in pinyin lexicon
Word corresponding to the phonetic ID strings received;
Output unit 404, the word found for exporting the searching unit 403.
It should be noted that the information processor in the embodiment of the present invention can further include:
Setting unit 405, for setting phonetic ID and pinyin syllable corresponding relation.
Further, the searching unit 403 is also configured to, starting from the root node of the double-array trie, look up the word corresponding to the pinyin ID string according to the pinyin IDs in the pinyin ID string to be looked up and the values of the base array;
The output unit 404 is also configured to output the currently found word if a pinyin ID in the pinyin ID string to be looked up corresponds to an end mark and the first bit of the current base array element of the double-array trie is 1.
The information processing device in the embodiment of the present invention may further include:
Judging unit 406, configured to, if the pinyin ID does not correspond to an end mark, judge whether the value of the current check array is equal to the number of the node before the state transition in the current lookup node sequence;
Further, the searching unit 403 is also configured to, if the value of the current check array is equal to the number of the node before the state transition in the current lookup node sequence, continue to search for the next node according to the sum of the next pinyin ID in the pinyin ID string to be looked up and the value of the current base array.
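The lookup performed by the searching, judging and output units can be sketched as below. The arrays are hand-built toy data, and several details are assumptions for illustration: the "first bit" is taken to be the most significant bit of a 32-bit base element, ID 0 is taken as the end mark, the pinyin IDs ni=1, hao=2, men=3 are hypothetical, and the found words are kept in a separate table `WORDS` indexed by node number.

```python
WORD_FLAG = 1 << 31          # assumed position of the "first bit" of a base element
OFFSET_MASK = WORD_FLAG - 1  # remaining bits: the value added to the next pinyin ID
END_MARK = 0                 # assumed pinyin ID reserved for the end mark
ROOT = 0

# Toy arrays for the ID strings (1), (1, 2) and (1, 3).
BASE = [0, WORD_FLAG | 1, 0, WORD_FLAG, WORD_FLAG]
CHECK = [-1, 0, -1, 1, 1]    # CHECK[n] is the node from which n was reached
WORDS = {1: ["你"], 3: ["你好"], 4: ["你们"]}  # hypothetical node -> words table


def lookup(id_string, base=BASE, check=CHECK, words=WORDS):
    """Walk the double-array trie; return the words found, or [] on failure."""
    node = ROOT
    for pid in id_string:
        if pid == END_MARK:
            # End mark: output only if the current base element has its flag bit set.
            return words.get(node, []) if base[node] & WORD_FLAG else []
        nxt = (base[node] & OFFSET_MASK) + pid   # base value plus pinyin ID
        # The transition is valid only if check[nxt] names the node we came from.
        if nxt >= len(check) or check[nxt] != node:
            return []
        node = nxt
    return []  # the ID string was not terminated by an end mark


print(lookup([1, 2, 0]))  # ['你好']
```

Each step costs one addition and one comparison, which is why no scan over the words of a pinyin group is needed.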
The information processing device in the embodiment of the present invention may further include:
Converting unit 407, configured to segment the pinyin characters input by the user into syllables and connect the syllables in order into the pinyin ID string.
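The conversion step can be sketched as follows. Greedy longest-match segmentation is an assumption chosen for brevity, not a method stated in this passage, and it does not resolve ambiguous inputs such as "xian" (xi + an versus xian). The table `SYLLABLE_TO_ID` and the function name `to_pinyin_id_string` are hypothetical.

```python
# Hypothetical syllable -> pinyin ID table (truncated for illustration).
SYLLABLE_TO_ID = {"ni": 1, "hao": 2, "zhong": 3, "guo": 4}

MAX_SYLLABLE_LEN = 6  # the longest pinyin syllables, e.g. "zhuang", have six letters


def to_pinyin_id_string(text, table=SYLLABLE_TO_ID):
    """Cut the input into syllables (longest match first) and map each to its ID."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(MAX_SYLLABLE_LEN, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if piece in table:
                ids.append(table[piece])
                pos += length
                break
        else:
            raise ValueError(f"no syllable matches at position {pos}: {text[pos:]!r}")
    return ids


print(to_pinyin_id_string("nihao"))     # [1, 2]
print(to_pinyin_id_string("zhongguo"))  # [3, 4]
```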
For the detailed process by which each unit of the information processing device in the embodiment of the present invention implements its function, see the detailed description of each step in the embodiments shown in Fig. 2 and Fig. 3 above; it is not repeated here.
In the embodiment of the present invention, the correspondence between pinyin syllables and pinyin IDs is set and a double-array trie is generated from it. Starting from the root node of the double-array trie, the word corresponding to the pinyin ID string is looked up according to the pinyin IDs in the pinyin ID string to be looked up and the values of the base array. When a pinyin ID corresponds to an end mark and the first bit of the current base array element of the double-array trie is 1, the currently found word is output, which improves the speed of querying words in a large-capacity pinyin dictionary. When the dictionary needs to be expanded, adding syllables and their corresponding words to a double-array trie of this structure is relatively easy, so dictionary expansion is simple and efficient.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The information processing method and device provided by the present invention have been described in detail above. For those skilled in the art, the specific embodiments and the scope of application may vary according to the idea of the embodiments of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (8)
- 1. An information processing method, characterized by comprising: generating a double-array trie according to the correspondence between pinyin syllables and pinyin IDs, the double-array trie comprising a base array and a check array, the pinyin ID being the state transition amount of the double-array trie; segmenting the pinyin characters input by a user into syllables, and connecting the syllables in order into a pinyin ID string; receiving the pinyin ID string to be looked up; looking up, in a pinyin dictionary according to the double-array trie, the word corresponding to the pinyin ID string; and outputting the word found.
- 2. The method according to claim 1, characterized in that, before the generating of the double-array trie according to the correspondence between pinyin syllables and pinyin IDs, the method comprises: setting the correspondence between the pinyin IDs and the pinyin syllables.
- 3. The method according to claim 1 or 2, characterized in that the looking up, in a pinyin dictionary according to the double-array trie, of the word corresponding to the pinyin ID string comprises: starting from the root node of the double-array trie, looking up the word corresponding to the pinyin ID string according to the pinyin IDs in the pinyin ID string to be looked up and the values of the base array; and, if a pinyin ID in the pinyin ID string to be looked up corresponds to an end mark and the first bit of the current base array element of the double-array trie is 1, outputting the currently found word.
- 4. The method according to claim 3, characterized in that, after the looking up, starting from the root node of the double-array trie, of the word corresponding to the pinyin ID string according to the pinyin IDs in the pinyin ID string to be looked up and the values of the base array, the method comprises: if the pinyin ID in the pinyin ID string to be looked up does not correspond to an end mark, judging whether the value of the current check array is equal to the number of the node before the state transition in the current lookup node sequence; and, if so, continuing to search for the next node according to the sum of the next pinyin ID in the pinyin ID string to be looked up and the value of the current base array.
- 5. An information processing device, characterized by comprising: a generation unit, configured to generate a double-array trie according to the correspondence between pinyin syllables and pinyin IDs, the double-array trie comprising a base array and a check array, the pinyin ID being the state transition amount of the double-array trie; a converting unit, configured to segment the pinyin characters input by a user into syllables and connect the syllables in order into a pinyin ID string; a receiving unit, configured to receive the pinyin ID string to be looked up; a searching unit, configured to look up, in a pinyin dictionary according to the double-array trie, the word corresponding to the pinyin ID string received by the receiving unit; and an output unit, configured to output the word found by the searching unit.
- 6. The device according to claim 5, characterized in that the device further comprises: a setting unit, configured to set the correspondence between the pinyin IDs and the pinyin syllables.
- 7. The device according to claim 6, characterized in that: the searching unit is also configured to, starting from the root node of the double-array trie, look up the word corresponding to the pinyin ID string according to the pinyin IDs in the pinyin ID string to be looked up and the values of the base array; and the output unit is also configured to output the currently found word if a pinyin ID in the pinyin ID string to be looked up corresponds to an end mark and the first bit of the current base array element of the double-array trie is 1.
- 8. The device according to claim 7, characterized in that the device further comprises: a judging unit, configured to, if the pinyin ID in the pinyin ID string to be looked up does not correspond to an end mark, judge whether the value of the current check array is equal to the number of the node before the state transition in the current lookup node sequence; and the searching unit is also configured to, if so, continue to search for the next node according to the sum of the next pinyin ID in the pinyin ID string to be looked up and the value of the current base array.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210468061.1A CN103823814B (en) | 2012-11-19 | 2012-11-19 | A kind of information processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103823814A CN103823814A (en) | 2014-05-28 |
CN103823814B true CN103823814B (en) | 2017-12-01 |
Family ID: 50758884
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210468061.1A Active CN103823814B (en) | 2012-11-19 | 2012-11-19 | A kind of information processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103823814B (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107153647B (en) * | 2016-03-02 | 2021-12-07 | 北京字节跳动网络技术有限公司 | Method, apparatus, system and computer program product for data compression |
CN105955986A (en) * | 2016-04-18 | 2016-09-21 | 乐视控股(北京)有限公司 | Character converting method and apparatus |
CN106484684B (en) * | 2016-10-11 | 2019-04-05 | 语联网(武汉)信息技术有限公司 | Data in a kind of pair of database carry out the matched method of term |
CN106649286B (en) * | 2016-10-15 | 2019-07-02 | 语联网(武汉)信息技术有限公司 | One kind carrying out the matched method of term based on even numbers group dictionary tree |
CN109426358B (en) * | 2017-09-01 | 2023-04-07 | 百度在线网络技术(北京)有限公司 | Information input method and device |
CN109933774A (en) * | 2017-12-15 | 2019-06-25 | 腾讯科技(深圳)有限公司 | Method for recognizing semantics, device storage medium and electronic device |
CN110597800A (en) * | 2018-05-23 | 2019-12-20 | 杭州海康威视数字技术股份有限公司 | Method and device for determining annotation information and constructing prefix tree |
CN109065016B (en) * | 2018-08-30 | 2021-04-13 | 出门问问信息科技有限公司 | Speech synthesis method, speech synthesis device, electronic equipment and non-transient computer storage medium |
CN109377980B (en) * | 2018-08-31 | 2022-06-07 | 众安信息技术服务有限公司 | Syllable segmentation method and device |
CN111967248A (en) * | 2020-07-09 | 2020-11-20 | 深圳价值在线信息科技股份有限公司 | Pinyin identification method and device, terminal equipment and computer readable storage medium |
CN112035597B (en) * | 2020-09-04 | 2023-11-21 | 常州新途软件有限公司 | Vehicle-mounted input method |
CN112185356A (en) * | 2020-09-29 | 2021-01-05 | 北京百度网讯科技有限公司 | Speech recognition method, speech recognition device, electronic device and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20010108845A (en) * | 2000-05-31 | 2001-12-08 | 기민호 | Term-based cluster management system and method for query processing in information retrieval |
CN1786962A (en) * | 2005-12-21 | 2006-06-14 | 中国科学院计算技术研究所 | Method for managing and searching dictionary with perfect even numbers group TRIE Tree |
CN101075262A (en) * | 2007-06-12 | 2007-11-21 | 腾讯科技(深圳)有限公司 | Method and system for inputting Chinese character by computer |
CN101079060A (en) * | 2007-03-26 | 2007-11-28 | 腾讯科技(深圳)有限公司 | Chinese character input simple 'pinyin' implementation method and system |
CN101710258A (en) * | 2009-12-23 | 2010-05-19 | 福州星网视易信息系统有限公司 | Embedded device simple input method |
CN102737105A (en) * | 2012-03-31 | 2012-10-17 | 北京小米科技有限责任公司 | Dict-tree generation method and searching method |
Also Published As
Publication number | Publication date |
---|---|
CN103823814A (en) | 2014-05-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||