CN103823814B - Information processing method and device - Google Patents

Information processing method and device

Info

Publication number
CN103823814B
Authority
CN
China
Prior art keywords
phonetic
even numbers
word
strings
dictionary tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210468061.1A
Other languages
Chinese (zh)
Other versions
CN103823814A (en)
Inventor
李鑫
李东华
刘廷超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201210468061.1A
Publication of CN103823814A
Application granted
Publication of CN103823814B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The embodiments of the invention disclose an information processing method and device for quickly finding, in a large-capacity lexicon, the words corresponding to the Chinese pinyin input by a user. The method includes: generating a double-array trie (Double Array Trie) according to the correspondence between pinyin syllables and pinyin IDs, the double-array trie including a base array and a check array, the pinyin ID being the state transition amount of the double-array trie; receiving a pinyin ID string to be looked up, the pinyin ID string being the sequence of pinyin IDs corresponding to the syllables obtained by segmenting the pinyin characters input by the user; looking up, in a pinyin lexicon and according to the double-array trie, the words corresponding to the pinyin ID string; and outputting the words found.

Description

Information processing method and device
Technical field
The present invention relates to the field of communication technology, and in particular to an information processing method and device.
Background
With the continuous improvement of computer hardware performance and of software intelligence, people increasingly expect computers to offer more natural modes of human-computer interaction, for example: (1) more intelligent Chinese character input methods; (2) more accurate speech recognition. Both rely on a large and complete pinyin lexicon at the bottom layer, so the query efficiency of a large-scale pinyin lexicon directly affects the execution speed of such interactive software and thereby determines its quality. Take a pinyin input method as an example: accuracy and speed are its lifeline. To improve accuracy, current input method systems all use very large lexicons, and while the user is typing, the program must perform a large number of frequent lexicon lookups based on the input pinyin in order to provide accurate candidate words.
In the prior art, most pinyin lexicon systems use a storage and query method based on grouping by pinyin and word length. The lexicon is indexed by word length and by the first N pinyin syllables of each word. For a given pinyin string, the system first obtains its first N syllables and its word length, goes to the pinyin group table for that word length in the lexicon, finds the group corresponding to those syllables, traverses all words in the group, and returns the words whose pinyin matches the pinyin string being searched.
However, lexicon lookup in this prior art is inefficient, because all words in the same group must be traversed, and the lexicon scales poorly: as the lexicon keeps growing, query time grows many times over, which can prevent the software from working properly.
Summary of the invention
The embodiments of the invention provide an information processing method and device for quickly finding, in a pinyin lexicon, the words corresponding to the pinyin characters input by a user.
The information processing method provided by the embodiments of the invention includes: generating a double-array trie (Double Array Trie) according to the correspondence between pinyin syllables and pinyin identity numbers (IDs), the double-array trie including a base array and a check array, the pinyin ID being the state transition amount of the double-array trie; receiving a pinyin ID string to be looked up, the pinyin ID string being the sequence of pinyin IDs corresponding to the syllables obtained by segmenting the pinyin characters input by the user; looking up, in a pinyin lexicon and according to the double-array trie, the words corresponding to the pinyin ID string; and outputting the words found.
Preferably, before generating the double-array trie according to the correspondence between pinyin syllables and pinyin IDs, the method includes: setting the correspondence between pinyin IDs and pinyin syllables.
Further, looking up the words corresponding to the pinyin ID string in the pinyin lexicon according to the double-array trie includes: starting from the root node of the double-array trie, searching for the words corresponding to the pinyin ID string according to the pinyin IDs in the pinyin ID string to be looked up and the values of the base array; and, if the pinyin ID is the end marker and the first bit of the current base array element of the double-array trie is 1, outputting the words currently found.
Further, after the search from the root node according to the pinyin IDs in the pinyin ID string and the values of the base array, the method includes: if the pinyin ID is not the end marker, judging whether the current check array value equals the number of the node before the state transition in the current lookup node sequence; and, if so, continuing to search for the next node according to the sum of the next pinyin ID in the pinyin ID string to be looked up and the current base array value.
Further, before receiving the pinyin ID string to be looked up, the method includes: segmenting the pinyin characters input by the user into syllables, and joining the syllables in order to form the pinyin ID string.
The information processing device provided by the embodiments of the invention includes: a generation unit, configured to generate a double-array trie (Double Array Trie) according to the correspondence between pinyin syllables and pinyin identity numbers (IDs), the double-array trie including a base array and a check array, the pinyin ID being the state transition amount of the double-array trie; a receiving unit, configured to receive a pinyin ID string to be looked up, the pinyin ID string being the sequence of pinyin IDs corresponding to the syllables obtained by segmenting the pinyin characters input by the user; a searching unit, configured to look up, in a pinyin lexicon and according to the double-array trie, the words corresponding to the pinyin ID string received by the receiving unit; and an output unit, configured to output the words found by the searching unit.
Preferably, the device also includes: a setting unit, configured to set the correspondence between pinyin IDs and pinyin syllables.
Further, the searching unit is also configured to search, starting from the root node of the double-array trie, for the words corresponding to the pinyin ID string according to the pinyin IDs in the pinyin ID string to be looked up and the values of the base array;
the output unit is also configured to output the words currently found if the pinyin ID is the end marker and the first bit of the current base array element of the double-array trie is 1.
Further, the device also includes: a judging unit, configured to judge, if the pinyin ID is not the end marker, whether the current check array value equals the number of the node before the state transition in the current lookup node sequence;
the searching unit is also configured to continue, if so, to search for the next node according to the sum of the next pinyin ID in the pinyin ID string to be looked up and the current base array value.
Further, the device also includes: a converting unit, configured to segment the pinyin characters input by the user into syllables and to join the syllables in order to form the pinyin ID string.
As can be seen from the above technical solutions, the embodiments of the invention have the following advantage: because the double-array trie is generated according to pinyin syllables and pinyin IDs, a lookup of the pinyin characters input by the user can follow the pinyin syllables along a single branch of the double-array trie, without traversing all words in a pinyin group, so the query workload is small and queries are faster.
Brief description of the drawings
In order to illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a dictionary tree of Chinese phrases;
Fig. 2 is a schematic diagram of one embodiment of the information processing method in the embodiments of the invention;
Fig. 3 is a schematic diagram of another embodiment of the information processing method in the embodiments of the invention;
Fig. 4 is a schematic diagram of an example of the double-array trie structure generated in the embodiments of the invention;
Fig. 5 is a flow chart of word lookup in the information processing method in the embodiments of the invention;
Fig. 6 is a schematic diagram of an example of the information processing method in the embodiments of the invention;
Fig. 7 is a schematic diagram of one embodiment of the information processing device in the embodiments of the invention;
Fig. 8 is a schematic diagram of another embodiment of the information processing device in the embodiments of the invention.
Detailed description
The technical solutions of the embodiments of the invention are further described below with reference to the drawings and specific embodiments. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
The embodiments of the invention provide an information processing method and device for quickly finding, in a large-capacity lexicon, the words or phrases corresponding to the Chinese pinyin input by a user.
A trie is a kind of search tree that provides an effective structure for organizing data for retrieval and supports looking up words in a lexicon. It is essentially a deterministic finite automaton (DFA, Deterministic Finite Automaton), in which each node represents one state of the automaton. The states include "word prefix", "complete word" and so on in the lexicon.
A double-array trie (Double Array Trie) is a simple and effective implementation of a trie, composed of two integer arrays. Let the array index be i, where i is an integer greater than or equal to 1; one array is the base array base[i] and the other is the check array check[i]. Each branch is a state transition from one state to another upon encountering a specific character. For example, for a state transition in which state s encounters character c and reaches state t, the double array satisfies:
check[base[s]+c]=s
base[s]+c=t
In this embodiment, the purpose of lexicon lookup is to provide candidate words for a given segmented pinyin. The first thing to do is to add a dictionary-tree index to the lexicon. The branches of the dictionary tree are syllables; a syllable is the basic unit of pronunciation, the smallest unit that cannot be divided further in pronunciation. Suppose the lexicon currently has only three words: China, Chinese, harmony. The dictionary-tree index built for this lexicon is shown in Fig. 1. To handle the case where one word is a prefix of another, such as "China" in "Chinese", an end marker "$" is appended to each word, so that every node holding word data is an independent leaf node. Because each branch of every node in the dictionary tree is unique, a lookup only needs to follow the branches in turn, and the number of comparisons equals the length of the word. If some syllable has no branch, the word is not in the lexicon.
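The syllable-level dictionary tree described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the class `Node` and the functions `build_trie` and `lookup` are hypothetical names, and the three entries simply mirror the example lexicon (China, Chinese, harmony).

```python
END = "$"  # end marker appended to every word's syllable sequence


class Node:
    """One trie node: outgoing syllable branches plus any words stored here."""

    def __init__(self):
        self.children = {}
        self.words = []


def build_trie(entries):
    """Build a syllable trie from (syllables, word) pairs."""
    root = Node()
    for syllables, word in entries:
        node = root
        for syl in list(syllables) + [END]:
            node = node.children.setdefault(syl, Node())
        node.words.append(word)  # only "$" leaf nodes hold word data
    return root


def lookup(root, syllables):
    """Follow one branch per syllable; a missing branch means no such word."""
    node = root
    for syl in list(syllables) + [END]:
        if syl not in node.children:
            return []
        node = node.children[syl]
    return node.words


trie = build_trie([
    (["zhong", "guo"], "China"),
    (["zhong", "guo", "ren"], "Chinese"),
    (["he", "xie"], "harmony"),
])
print(lookup(trie, ["zhong", "guo"]))
```

Because every word ends in "$", the entry for "zhong guo" lives in its own leaf even though it is a prefix of "zhong guo ren", and a lookup of the bare prefix "zhong" finds nothing.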
The information processing method in the embodiments of the invention is described below. Referring to Fig. 2, one embodiment of the information processing method includes:
101. Generate a double-array trie (Double Array Trie) according to the correspondence between pinyin syllables and pinyin IDs, the double-array trie including a base array and a check array;
In the embodiments of the invention, the double-array trie is generated according to the correspondence between the pinyin syllables in the lexicon and pinyin identity numbers (ID, IDentity). The elements of its base array and check array correspond one to one. Each element of the base array is equivalent to one node of the double-array trie, and its value serves as the base value for state transitions; the value at the corresponding position of the check array serves as a check value, used to verify whether the state after a transition exists.
The state transition amount in the double-array trie is the offset for moving from one state to another, i.e., from one node to the next node. It is determined according to the actual needs of the double-array trie. In this embodiment, the pinyin ID is the state transition amount of the double-array trie.
102. Receive the pinyin ID string to be looked up;
The pinyin ID string is the sequence of pinyin IDs corresponding to the syllables obtained by segmenting the pinyin characters input by the user;
103. Look up, in the pinyin lexicon and according to the double-array trie, the words corresponding to the pinyin ID string;
The pinyin lexicon includes a word list and the double-array trie. According to the pinyin ID string to be looked up, the double-array trie is used to find, in the word list, the words corresponding to the pinyin ID string. A word found may consist of a single character or of two or more characters.
104. Output the words found.
The words found are output through a peripheral device.
In the embodiments of the invention, because the double-array trie is generated according to pinyin syllables and pinyin IDs, a lookup of the pinyin characters input by the user can follow the pinyin syllables along a single branch of the double-array trie, without traversing all words in a pinyin group, so the query workload is small and queries are faster. In addition, when the lexicon needs to be expanded, adding syllables and their corresponding words to a double-array trie of this structure is comparatively easy, so expanding the lexicon is simple and efficient.
For ease of understanding, the information processing method in the embodiments of the invention is described in detail below with another embodiment. Referring to Fig. 3, another embodiment of the information processing method includes:
201. Set the correspondence between pinyin syllables and pinyin IDs;
The correspondence between pinyin IDs and pinyin syllables is set; the pinyin ID is the state transition amount of the double-array trie (Double Array Trie).
It should be noted that, for a state transition in which state s encounters character c (in this embodiment, c is a pinyin ID) and reaches state t, the double-array trie satisfies:
check[base[s]+c]=s
base[s]+c=t
The value of base[s] is therefore chosen according to the specifics of the lexicon being queried. A base[s] that is too large may make the base array too sparse, while a base[s] that is too small may cause many collisions. A collision means that, after the transition t=base[s]+c is computed, base[base[s]+c] is not empty; in that case base[s] must be chosen again. When a collision occurs, a suitable base[s] is selected so that all next states of the current state can find free slots in base. The embodiments of the invention take base[s]=1 as an example; the specific value of base[s] is chosen according to the actual application and is not limited here.
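One way to pick base[s] so that no collision remains is a first-fit search, sketched below. The patent does not prescribe a particular selection strategy, so the first-fit loop, the breadth-first placement order and the name `build_double_array` are all assumptions made for illustration. Unused slots are marked with 0 and the root is state 1, as in the text's example.

```python
from collections import deque

ROOT = 1  # root state number, as in the text's example


def build_double_array(sequences):
    """Build base/check lists for sequences of positive integer IDs."""
    # Step 1: build an ordinary trie; children[n] maps an ID to a child node.
    children = [{}]
    for seq in sequences:
        node = 0
        for c in seq:
            if c not in children[node]:
                children.append({})
                children[node][c] = len(children) - 1
            node = children[node][c]

    # Step 2: place trie nodes into the double array, breadth first.
    base, check = {}, {}
    used = {ROOT}
    queue = deque([(0, ROOT)])
    while queue:
        node, s = queue.popleft()
        ids = sorted(children[node])
        if not ids:
            continue  # leaf: no outgoing transitions to place
        b = 1
        while any(b + c in used for c in ids):
            b += 1  # collision: try the next candidate base value
        base[s] = b
        for c in ids:
            t = b + c  # base[s] + c = t
            check[t] = s  # check[base[s] + c] = s
            used.add(t)
            queue.append((children[node][c], t))

    # Step 3: convert to lists; 0 marks an unused slot.
    size = max(used) + 1
    return ([base.get(i, 0) for i in range(size)],
            [check.get(i, 0) for i in range(size)])


# ID strings for "zhong guo $", "zhong guo ren $" and "he xie $"
base, check = build_double_array([[2, 3, 1], [2, 3, 4, 1], [5, 6, 1]])
print(base)
print(check)
```

With these three ID strings most states can keep base[s]=1, but the states that would send the end marker into an already occupied slot are pushed to larger base values, which is the collision handling the paragraph above describes.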
Referring to Fig. 4, the double-array trie in this embodiment uses pinyin IDs to represent pinyin syllables, where a pinyin ID is an integer. Continuing the earlier example, the pinyin IDs and pinyin syllables have the following correspondence:
1: end marker "$", 2: zhong, 3: guo, 4: ren, 5: he, 6: xie.
The double-array trie structure generated for the example shown in Fig. 1 is then as shown in Fig. 4.
202. Generate a double-array trie according to the correspondence between pinyin syllables and pinyin IDs, the double-array trie including a base array and a check array;
Since the values of base[s] and c of the double-array trie have been determined, the double-array trie can be generated according to the correspondence between pinyin syllables and pinyin IDs. The first element of the check array, check[0], gives the number of elements in the double array. The first element of the base array, base[0], is the data block identifier; its value is the ASCII-encoded string "SDAT", 4 characters occupying 4 bytes. When the first bit of a base array element is 0, the element represents a base value; when it is 1, the element represents a word group, in which the following 22 bits are the state transition amount (offset) of the group's first word in the word list and the last 9 bits give the number of words in the group.
The first element of the word list gives the number of elements in the word list. This data structure can support more than 4 million (2^22-1) words, and up to 511 (2^9-1) homophones per group.
According to the example in step 201, the generated double-array trie is as shown in Fig. 4, where PG_1, PG_2 and PG_3 are the words represented by the leaf nodes of the respective branches.
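The 1 + 22 + 9 bit layout of a base array element can be sketched with plain integer arithmetic. This is an illustration under stated assumptions: the element is taken to be 32 bits wide with the "first bit" being the most significant bit, which the text implies but does not state outright, and `pack_group` and `unpack_group` are hypothetical helper names.

```python
OFFSET_BITS = 22  # offset of the group's first word in the word list
COUNT_BITS = 9    # number of words in the group
GROUP_FLAG = 1 << (OFFSET_BITS + COUNT_BITS)  # assumed: top bit of 32


def pack_group(offset, count):
    """Encode a word group as a base element whose first bit is 1."""
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset does not fit in 22 bits")
    if not 0 <= count < (1 << COUNT_BITS):
        raise ValueError("count does not fit in 9 bits")
    return GROUP_FLAG | (offset << COUNT_BITS) | count


def unpack_group(element):
    """Return (offset, count) for a word group, or None for a base value."""
    if not element & GROUP_FLAG:
        return None  # first bit is 0: the element is an ordinary base value
    offset = (element >> COUNT_BITS) & ((1 << OFFSET_BITS) - 1)
    count = element & ((1 << COUNT_BITS) - 1)
    return offset, count


element = pack_group(offset=5, count=3)
print(unpack_group(element))                           # (5, 3)
print((1 << OFFSET_BITS) - 1, (1 << COUNT_BITS) - 1)   # 4194303 511
```

The two limits printed at the end are the capacity figures quoted above: just over 4 million addressable words and at most 511 words in one group.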
203. Segment the pinyin characters input by the user into syllables, and join the syllables in order to form the pinyin ID string;
The pinyin characters input by the user are received through a peripheral device, which may be a keyboard, a touch screen, a voice input device, or the like. The pinyin characters are segmented into syllables, and the syllables are joined in order to form the pinyin ID string.
204. Receive the pinyin ID string to be looked up;
The pinyin ID string is the sequence of pinyin IDs corresponding to the syllables obtained by segmenting the pinyin characters input by the user. For example, with the correspondence between pinyin IDs and pinyin syllables given in step 201, if the user inputs the pinyin characters hexie, the pinyin IDs after segmentation are 5 and 6, and the resulting pinyin ID string is 561, where 1 represents the end marker "$".
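The conversion from raw pinyin input to a pinyin ID string can be sketched as follows. The text does not say how the input is segmented, so the greedy longest-match strategy here is an assumption, and `to_id_string` is a hypothetical name; the ID table is the one from step 201.

```python
# Pinyin ID table from step 201; ID 1 is reserved for the end marker "$".
PINYIN_ID = {"$": 1, "zhong": 2, "guo": 3, "ren": 4, "he": 5, "xie": 6}
END_ID = PINYIN_ID["$"]
SYLLABLES = {s for s in PINYIN_ID if s != "$"}
MAX_LEN = max(len(s) for s in SYLLABLES)


def to_id_string(pinyin):
    """Segment pinyin greedily (longest syllable first) into pinyin IDs.

    Returns the ID list terminated by the end marker ID, or None when the
    input cannot be segmented with the known syllables.
    """
    ids, pos = [], 0
    while pos < len(pinyin):
        for n in range(min(MAX_LEN, len(pinyin) - pos), 0, -1):
            syllable = pinyin[pos:pos + n]
            if syllable in SYLLABLES:
                ids.append(PINYIN_ID[syllable])
                pos += n
                break
        else:
            return None  # no known syllable starts at this position
    ids.append(END_ID)
    return ids


print(to_id_string("hexie"))     # [5, 6, 1]
print(to_id_string("zhongguo"))  # [2, 3, 1]
```

Greedy matching is enough for this six-entry table; a full pinyin syllable inventory contains ambiguous splits, which a real segmenter would have to resolve in some other way.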
205. Starting from the root node of the double-array trie (Double Array Trie), search for the words corresponding to the pinyin ID string according to the pinyin IDs in the pinyin ID string to be looked up and the values of the base array;
For example, if the pinyin ID string to be looked up is 561, then, using the double-array trie shown in Fig. 4, the words corresponding to the pinyin ID string "561" are found by following 5, 6 and 1 in turn together with the corresponding values in the base array. The words obtained may be "harmony" or "river crab", and the user selects the desired word from the output.
206. If the pinyin ID in the pinyin ID string to be looked up is the end marker, and the first bit of the current base array element of the double-array trie is 1, output the words currently found;
When the ID in the pinyin ID string to be looked up is the end marker and the first bit of the current base array element of the double-array trie is 1, the current result is a word group, and the words contained in that word group are output.
207. If the pinyin ID is not the end marker, judge whether the current check array value equals the number of the node before the state transition in the current lookup node sequence;
If the pinyin ID is not the end marker, the current lookup has not finished, and the current value of the check array is used to judge whether the lookup is on the right path; specifically, it is judged whether the current check array value equals the number of the node before the state transition in the current lookup node sequence. For example, in the double-array trie of Fig. 4 built from the example of step 201, the 5th value of the check array is "3" and the corresponding node number is 4; the previous node, i.e., the node before the state transition, is numbered "3", so the current lookup is on the right path. As another example, the 8th value of the check array is "6" and the corresponding node number is 7; the previous node, i.e., the node before the state transition, is numbered "6", so the current lookup is on the right path. Otherwise the lookup is incorrect.
208. If the current check array value equals the number of the node before the state transition in the current lookup node sequence, continue to search for the next node according to the sum of the next pinyin ID in the pinyin ID string to be looked up and the current base array value.
If the current check array value does not equal the number of the node before the state transition in the current lookup node sequence, this branch does not exist in the double-array trie, i.e., no word corresponds to the pinyin.
Specifically, referring to Fig. 5, the lookup starts from the root node i of the double-array trie (Double Array Trie), where i is the number of the root node; in this example, i=1. According to the pinyin syllable being queried, the lookup jumps to the node numbered j, where j=base[i]+c and c is the state transition amount, i.e., the pinyin ID. It is then judged whether the pinyin ID is 1; "1" corresponds to the end marker among the pinyin IDs, so this judges whether the state transition amount c is the end marker. If c is the end marker, it is judged whether the first bit of the current base array element is 1: if so, a matching word has been found and is output; if not, no matching word has been found. If c is not the end marker, it is judged whether the current check array value equals the number of the previous node of the node currently being queried, i.e., whether check[j] equals i. If not, this branch does not exist in the double-array trie, i.e., no word corresponds to the pinyin. If so, the lookup continues downward with the next syllable, until c is the end marker and the first bit of the current base array element is 1, which indicates that a matching word has been found.
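The lookup flow can be sketched over a small hand-filled double array. The array contents below are illustrative values that are consistent with the transition equations and with check[4]=3 from the text, not a transcription of Fig. 4; the word list, the 32-bit group encoding (first bit taken as the most significant bit) and the names `group` and `lookup` are assumptions. As a deliberate, slightly more conservative variant of the described flow, this sketch verifies check[j] on every transition, including the end marker.

```python
ROOT, END_ID = 1, 1
GROUP_FLAG = 1 << 31  # assumed: "first bit" is the top bit of 32


def group(offset, count):
    """Base element for a word group: flag, 22-bit offset, 9-bit count."""
    return GROUP_FLAG | (offset << 9) | count


# Illustrative word list and arrays for the IDs
# 1:"$", 2:zhong, 3:guo, 4:ren, 5:he, 6:xie.
WORDS = ["China", "harmony", "river crab", "Chinese"]
#        idx: 0  1  2            3  4  5  6  7  8            9
BASE = [0, 1, group(0, 1), 1, 1, 8, 1, 7, group(1, 2), group(3, 1)]
CHECK = [0, 0, 4, 1, 3, 4, 1, 6, 7, 5]


def lookup(ids):
    """Return the candidate words for a pinyin ID string, or []."""
    i = ROOT
    for c in ids:
        if BASE[i] & GROUP_FLAG:
            return []  # already at a word group: nothing can follow
        j = BASE[i] + c  # j = base[i] + c
        if j >= len(CHECK) or CHECK[j] != i:
            return []  # check[j] != i: this branch does not exist
        i = j
    if ids and ids[-1] == END_ID and BASE[i] & GROUP_FLAG:
        offset = (BASE[i] >> 9) & 0x3FFFFF
        count = BASE[i] & 0x1FF
        return WORDS[offset:offset + count]
    return []


print(lookup([2, 3, 1]))  # zhong guo $
print(lookup([5, 6, 1]))  # he xie $
print(lookup([2, 1]))     # zhong $ is not in the lexicon
```

Each lookup touches one array slot per syllable, so the cost depends on the length of the pinyin ID string rather than on the size of the lexicon.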
For ease of understanding, the information processing method in the embodiments of the invention is described in detail below, taking the lookup of "zhongguo" in the lexicon as an example. Referring to Fig. 6 and continuing the earlier example: in the double-array trie, starting from base=1, the first state transition amount is 2; continuing from base=1, the second state transition amount is 3; continuing from base=1, the third state transition amount is 1, which corresponds to the end marker; and in the check array, check[4]=3. At this point it can be determined that "zhongguo" has been found, and the matching words (rendered in this translation as "China", "middle mistake" and "kind") are output from the word list for the user to choose from.
It should be noted that the information processing method in the embodiments of the invention can be applied in the execution modules of speech recognition systems and pinyin input methods to increase the speed of information processing. It can also be applied in any IT product that relies on a large-scale lexicon, improving the information processing efficiency of the product.
In the embodiments of the invention, the correspondence between pinyin syllables and pinyin IDs is set and a double-array trie (Double Array Trie) is generated from it. Starting from the root node of the double-array trie, the words corresponding to the pinyin ID string are searched for according to the pinyin IDs in the pinyin ID string to be looked up and the values of the base array; when the pinyin ID is the end marker and the first bit of the current base array element of the double-array trie is 1, the words currently found are output. This increases the speed of phrase lookup in a large-capacity pinyin lexicon. When the lexicon needs to be expanded, adding syllables and their corresponding words to a double-array trie of this structure is comparatively easy, so expanding the lexicon is simple and efficient.
The information processing device in the embodiments of the invention is described below. Referring to Fig. 7, one embodiment of the information processing device includes:
a generation unit 301, configured to generate a double-array trie (Double Array Trie) according to the correspondence between pinyin syllables and pinyin IDs, the double-array trie including a base array and a check array, the pinyin ID being the state transition amount of the double-array trie;
a receiving unit 302, configured to receive the pinyin ID string to be looked up, the pinyin ID string being the sequence of pinyin IDs corresponding to the syllables obtained by segmenting the pinyin characters input by the user;
a searching unit 303, configured to look up, in the pinyin lexicon and according to the double-array trie, the words corresponding to the pinyin ID string received by the receiving unit 302;
an output unit 304, configured to output the words found by the searching unit 303.
For the detailed process by which each unit of the information processing device realizes its function, see the detailed description of each step in the embodiment shown in Fig. 2; it is not repeated here.
In the embodiments of the invention, because the double-array trie is generated according to pinyin syllables and pinyin IDs, a lookup of the pinyin characters input by the user can follow the pinyin syllables along a single branch of the double-array trie, without traversing all words in a pinyin group, so the query workload is small and queries are faster. In addition, when the lexicon needs to be expanded, adding syllables and their corresponding words to a double-array trie of this structure is comparatively easy, so expanding the lexicon is simple and efficient.
For ease of understanding, the information processing device in the embodiments of the invention is described in detail below with another embodiment. Referring to Fig. 8, another embodiment of the information processing device includes:
Generation unit 401, for generating even numbers group according to the corresponding relation of pinyin syllable and phonetic ID identity numbers Dictionary tree, the even numbers group dictionary tree include:Base value array and verification array, the phonetic ID are the even numbers group dictionary tree State transfer amount;
Receiving unit 402, for receiving the phonetic ID required to look up strings, the phonetic word that the phonetic ID strings input for user The sequence that corresponding phonetic ID is formed after symbol cutting;
Searching unit 403, connect for searching the receiving unit 402 according to the even numbers group dictionary tree in pinyin lexicon Word corresponding to the phonetic ID strings received;
Output unit 404, the word found for exporting the searching unit 403.
It should be noted that the information processing device in this embodiment of the present invention may further include:
Setting unit 405, configured to set the correspondence between pinyin IDs and pinyin syllables.
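As a rough illustration of how the setting unit's correspondence and the generation unit's double-array trie could fit together, the following sketch assigns IDs to a three-syllable sample and packs a small lexicon into a base array and a check array. Everything named here is hypothetical: the sample lexicon, the function names, the use of bit 31 of a base element as the "first bit" word flag, the value -1 for an unused check slot, and the first-fit placement strategy are assumptions made for the example, not details given in this description.

```python
WORD_FLAG = 1 << 31          # assumed "first bit" of a base element: a word ends here
FREE = -1                    # assumed marker for an unused check slot
ROOT = 0                     # state number of the root node

def assign_ids(syllables):
    """Give each distinct syllable a positive ID (assumption: IDs start at 1)."""
    return {s: i for i, s in enumerate(sorted(set(syllables)), start=1)}

def build_double_array(entries):
    """entries maps a tuple of pinyin IDs to the list of words it spells."""
    # Step 1: an ordinary trie, one dict of {pinyin ID: child node} per node.
    trie, words_at = [{}], {}
    for ids, words in entries.items():
        node = 0
        for pid in ids:
            child = trie[node].get(pid)
            if child is None:
                child = len(trie)
                trie.append({})
                trie[node][pid] = child
            node = child
        words_at[node] = list(words)

    # Step 2: place every node so that child state = base offset + pinyin ID.
    base, check, words = [0], [FREE], {}
    pending = [(0, ROOT)]                      # (trie node, array state)
    while pending:
        node, state = pending.pop()
        kids = trie[node]
        offset = 0
        if kids:
            # First fit: smallest offset whose child slots are all unused.
            while any(offset + pid < len(check) and check[offset + pid] != FREE
                      for pid in kids):
                offset += 1
            grow = offset + max(kids) + 1 - len(check)
            if grow > 0:
                base.extend([0] * grow)
                check.extend([FREE] * grow)
            for pid, child in kids.items():
                check[offset + pid] = state    # remember the parent state
                pending.append((child, offset + pid))
        if node in words_at:
            words[state] = words_at[node]
            offset |= WORD_FLAG                # mark "a word ends at this state"
        base[state] = offset
    return base, check, words

ids = assign_ids(["ni", "hao", "ren"])         # hao=1, ni=2, ren=3
lexicon = {
    (ids["ni"], ids["hao"]): ["你好"],
    (ids["hao"], ids["ren"]): ["好人"],
    (ids["ni"],): ["你"],
}
base, check, words = build_double_array(lexicon)
```

In this sample the state reached by "ni" receives base offset 3, because slots 1 to 3 are already occupied when it is placed, so "ni hao" ends in state 3 + 1 = 4.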
Further, the searching unit 403 is also configured to, starting from the root node of the double-array trie, look up the word corresponding to the pinyin ID string according to the pinyin IDs in the pinyin ID string to be looked up and the values of the base array;
The output unit 404 is also configured to output the currently found word if the pinyin ID in the pinyin ID string to be looked up corresponds to the end mark and the first bit of the current base array element of the double-array trie is 1.
The information processing device in this embodiment of the present invention may further include:
Judging unit 406, configured to, if the pinyin ID does not correspond to the end mark, judge whether the value of the current check array element is equal to the number of the node before the state transition in the current lookup node sequence;
Further, the searching unit 403 is also configured to, if the value of the current check array element is equal to the number of the node before the state transition in the current lookup node sequence, continue to look up the next node according to the sum of the next pinyin ID in the pinyin ID string to be looked up and the value of the current base array element.
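The lookup, the end-mark test and the check-array verification described for units 403, 404 and 406 can be sketched as follows. The arrays are hand-built for a three-entry sample lexicon with the hypothetical IDs hao=1, ni=2, ren=3. Several points are assumptions for illustration only: bit 31 of a base element plays the role of the "first bit", reaching the end of the Python list stands in for the end mark, and the `words` table keyed by state number is an invented way of storing the lexicon entries.

```python
WORD_FLAG = 1 << 31           # assumed "first bit" of a base element
OFFSET_MASK = WORD_FLAG - 1   # the remaining bits hold the transition offset
ROOT = 0                      # state number of the root node

# Hand-built sample (hao=1, ni=2, ren=3); the list index is the state number.
base = [0, 0, 3 | WORD_FLAG, WORD_FLAG, WORD_FLAG]
check = [-1, 0, 0, 1, 2]      # check[s] = state from which s was reached
words = {2: ["你"], 3: ["好人"], 4: ["你好"]}   # hypothetical word storage

def lookup(id_string):
    """Return the words spelled by a list of pinyin IDs, or [] if none."""
    state = ROOT
    for pid in id_string:
        # Candidate next state: current base offset plus the pinyin ID.
        nxt = (base[state] & OFFSET_MASK) + pid
        # The check array must name the state we are coming from.
        if not (0 < nxt < len(check)) or check[nxt] != state:
            return []
        state = nxt
    # End of the ID string: output only if the word flag is set.
    if base[state] & WORD_FLAG:
        return words[state]
    return []
```

For example, `lookup([2, 1])` walks root, state 2, state 4 and yields the word for "ni hao", while `lookup([3])` is rejected because `check[3]` names state 1 rather than the root.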
The information processing device in this embodiment of the present invention may further include:
Converting unit 407, configured to segment the pinyin characters input by the user into syllables and to concatenate the syllables in order into the pinyin ID string.
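A minimal sketch of what the converting unit does is shown below. The description does not state which segmentation rule is used, so greedy longest-match is an assumption here, and the three-syllable table with IDs hao=1, ni=2, ren=3 and the function names are hypothetical.

```python
# Hypothetical syllable-to-ID table; a real table would cover all pinyin syllables.
SYLLABLE_TO_ID = {"hao": 1, "ni": 2, "ren": 3}
MAX_LEN = max(len(s) for s in SYLLABLE_TO_ID)

def segment(pinyin):
    """Split a pinyin string into syllables by greedy longest match (assumed rule)."""
    syllables = []
    i = 0
    while i < len(pinyin):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(MAX_LEN, len(pinyin) - i), 0, -1):
            piece = pinyin[i:i + n]
            if piece in SYLLABLE_TO_ID:
                syllables.append(piece)
                i += n
                break
        else:
            raise ValueError(f"cannot segment {pinyin!r} at position {i}")
    return syllables

def to_id_string(pinyin):
    """Map the segmented syllables, in order, to their pinyin IDs."""
    return [SYLLABLE_TO_ID[s] for s in segment(pinyin)]
```

Greedy matching is only one possible choice. Real pinyin input can be ambiguous (for instance "xian" can be read as one syllable or as "xi" followed by "an"), so a production segmenter would likely need to consider more than one split.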
For the detailed process by which each unit of the information processing device in this embodiment of the present invention performs its function, see the detailed description of each step in the embodiments shown in Fig. 2 and Fig. 3 above; it is not repeated here.
In this embodiment of the present invention, the correspondence between pinyin syllables and pinyin IDs is set and the double-array trie is generated from it. Starting from the root node of the double-array trie, the word corresponding to the pinyin ID string is looked up according to the pinyin IDs in the pinyin ID string to be looked up and the values of the base array. When a pinyin ID corresponds to the end mark and the first bit of the current base array element of the double-array trie is 1, the currently found word is output. This increases the speed of phrase queries against a large-capacity pinyin lexicon. When the lexicon needs to be expanded, adding syllables and their corresponding words to a double-array trie of this structure is comparatively easy, so lexicon expansion is simple and efficient.
Those skilled in the art will understand that all or some of the steps of the methods in the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The information processing method and device provided by the present invention have been described in detail above. For those skilled in the art, the specific implementations and the scope of application may vary according to the ideas of the embodiments of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (8)

  1. An information processing method, characterized by comprising:
    generating a double-array trie according to the correspondence between pinyin syllables and pinyin IDs, the double-array trie comprising a base array and a check array, the pinyin ID being the state transition amount of the double-array trie;
    segmenting the pinyin characters input by a user into syllables, and concatenating the syllables in order into a pinyin ID string;
    receiving the pinyin ID string to be looked up;
    looking up, in a pinyin lexicon and according to the double-array trie, the word corresponding to the pinyin ID string;
    outputting the word found.
  2. The method according to claim 1, characterized in that, before the generating of the double-array trie according to the correspondence between pinyin syllables and pinyin IDs, the method comprises:
    setting the correspondence between the pinyin IDs and the pinyin syllables.
  3. The method according to claim 1 or 2, characterized in that the looking up, in the pinyin lexicon and according to the double-array trie, of the word corresponding to the pinyin ID string comprises:
    starting from the root node of the double-array trie, looking up the word corresponding to the pinyin ID string according to the pinyin IDs in the pinyin ID string to be looked up and the values of the base array;
    if the pinyin ID in the pinyin ID string to be looked up corresponds to the end mark, and the first bit of the current base array element of the double-array trie is 1, outputting the currently found word.
  4. The method according to claim 3, characterized in that, after the looking up, starting from the root node of the double-array trie, of the word corresponding to the pinyin ID string according to the pinyin IDs in the pinyin ID string to be looked up and the values of the base array, the method comprises:
    if the pinyin ID in the pinyin ID string to be looked up does not correspond to the end mark, judging whether the value of the current check array element is equal to the number of the node before the state transition in the current lookup node sequence;
    if so, continuing to look up the next node according to the sum of the next pinyin ID in the pinyin ID string to be looked up and the value of the current base array element.
  5. An information processing device, characterized by comprising:
    a generation unit, configured to generate a double-array trie according to the correspondence between pinyin syllables and pinyin IDs, the double-array trie comprising a base array and a check array, the pinyin ID being the state transition amount of the double-array trie;
    a converting unit, configured to segment the pinyin characters input by a user into syllables, and to concatenate the syllables in order into a pinyin ID string;
    a receiving unit, configured to receive the pinyin ID string to be looked up;
    a searching unit, configured to look up, in a pinyin lexicon and according to the double-array trie, the word corresponding to the pinyin ID string received by the receiving unit;
    an output unit, configured to output the word found by the searching unit.
  6. The device according to claim 5, characterized in that the device further comprises:
    a setting unit, configured to set the correspondence between the pinyin IDs and the pinyin syllables.
  7. The device according to claim 6, characterized in that
    the searching unit is further configured to, starting from the root node of the double-array trie, look up the word corresponding to the pinyin ID string according to the pinyin IDs in the pinyin ID string to be looked up and the values of the base array;
    the output unit is further configured to output the currently found word if the pinyin ID in the pinyin ID string to be looked up corresponds to the end mark and the first bit of the current base array element of the double-array trie is 1.
  8. The device according to claim 7, characterized in that
    the device further comprises:
    a judging unit, configured to, if the pinyin ID in the pinyin ID string to be looked up does not correspond to the end mark, judge whether the value of the current check array element is equal to the number of the node before the state transition in the current lookup node sequence;
    the searching unit is further configured to, if so, continue to look up the next node according to the sum of the next pinyin ID in the pinyin ID string to be looked up and the value of the current base array element.
CN201210468061.1A 2012-11-19 2012-11-19 A kind of information processing method and device Active CN103823814B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210468061.1A CN103823814B (en) 2012-11-19 2012-11-19 A kind of information processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210468061.1A CN103823814B (en) 2012-11-19 2012-11-19 A kind of information processing method and device

Publications (2)

Publication Number Publication Date
CN103823814A CN103823814A (en) 2014-05-28
CN103823814B true CN103823814B (en) 2017-12-01

Family

ID=50758884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210468061.1A Active CN103823814B (en) 2012-11-19 2012-11-19 A kind of information processing method and device

Country Status (1)

Country Link
CN (1) CN103823814B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107153647B (en) * 2016-03-02 2021-12-07 北京字节跳动网络技术有限公司 Method, apparatus, system and computer program product for data compression
CN105955986A (en) * 2016-04-18 2016-09-21 乐视控股(北京)有限公司 Character converting method and apparatus
CN106484684B (en) * 2016-10-11 2019-04-05 语联网(武汉)信息技术有限公司 Data in a kind of pair of database carry out the matched method of term
CN106649286B (en) * 2016-10-15 2019-07-02 语联网(武汉)信息技术有限公司 One kind carrying out the matched method of term based on even numbers group dictionary tree
CN109426358B (en) * 2017-09-01 2023-04-07 百度在线网络技术(北京)有限公司 Information input method and device
CN109933774A (en) * 2017-12-15 2019-06-25 腾讯科技(深圳)有限公司 Method for recognizing semantics, device storage medium and electronic device
CN110597800A (en) * 2018-05-23 2019-12-20 杭州海康威视数字技术股份有限公司 Method and device for determining annotation information and constructing prefix tree
CN109065016B (en) * 2018-08-30 2021-04-13 出门问问信息科技有限公司 Speech synthesis method, speech synthesis device, electronic equipment and non-transient computer storage medium
CN109377980B (en) * 2018-08-31 2022-06-07 众安信息技术服务有限公司 Syllable segmentation method and device
CN111967248A (en) * 2020-07-09 2020-11-20 深圳价值在线信息科技股份有限公司 Pinyin identification method and device, terminal equipment and computer readable storage medium
CN112035597B (en) * 2020-09-04 2023-11-21 常州新途软件有限公司 Vehicle-mounted input method
CN112185356A (en) * 2020-09-29 2021-01-05 北京百度网讯科技有限公司 Speech recognition method, speech recognition device, electronic device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20010108845A (en) * 2000-05-31 2001-12-08 기민호 Term-based cluster management system and method for query processing in information retrieval
CN1786962A (en) * 2005-12-21 2006-06-14 中国科学院计算技术研究所 Method for managing and searching dictionary with perfect even numbers group TRIE Tree
CN101075262A (en) * 2007-06-12 2007-11-21 腾讯科技(深圳)有限公司 Method and system for inputting Chinese character by computer
CN101079060A (en) * 2007-03-26 2007-11-28 腾讯科技(深圳)有限公司 Chinese character input simple 'pinyin' implementation method and system
CN101710258A (en) * 2009-12-23 2010-05-19 福州星网视易信息系统有限公司 Embedded device simple input method
CN102737105A (en) * 2012-03-31 2012-10-17 北京小米科技有限责任公司 Dict-tree generation method and searching method

Also Published As

Publication number Publication date
CN103823814A (en) 2014-05-28

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant