WO2014176959A1 - Method and device for providing input candidate terms based on a local lexicon - Google Patents
- Publication number
- WO2014176959A1 (application PCT/CN2014/074856)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- input
- term
- user
- local
- entry
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/02—Input arrangements using manually operated switches, e.g. using keyboards or dials
- G06F3/023—Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
- G06F3/0233—Character input methods
- G06F3/0237—Character input methods using prediction or retrieval techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/174—Form filling; Merging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/274—Converting codes to words; Guess-ahead of partial word inputs
Definitions
- The present invention relates to the field of input method technologies, and more particularly to a technique for providing input candidate terms based on a local lexicon.
Background art
- An existing input method generally learns only the individual content a user commits to the screen, and does not learn the context across content committed in segments. For example, the user types the input string ab and commits the corresponding input term a1, then types the input string cd and commits the corresponding input term a2; the existing input method does not learn the context relationship between the separately committed terms a1 and a2. Only when the user types the input string abcd in one go and commits the term a1a2 does the existing input method learn the term a1a2.
- Prior-art input methods therefore do not offer predicted input candidate terms based on the record of input terms the user has historically committed in segments, which limits the input candidate terms offered to the user and degrades the user's input experience.
- A method for providing input candidate terms based on a local lexicon comprises the following steps:
- A user equipment for providing input candidate terms based on a local lexicon, wherein the user equipment comprises:
- a first obtaining device configured to acquire an input character string entered by a user;
- a first matching device configured to perform a matching query in the local lexicon according to the input character string to obtain a corresponding candidate term, wherein the local lexicon is established or updated according to the record of input terms the user has historically committed to the screen in segments;
- a second matching device configured to perform a matching query in the local lexicon on the last sub-term contained in the candidate term, to determine a following term corresponding to the last sub-term;
- a merging device configured to combine the candidate term with the following term to obtain an input candidate term to be provided to the user;
- According to the input character string entered by the user, the present invention performs a matching query in a local lexicon that is established or updated from the record of input terms the user has historically committed in segments, and obtains a corresponding candidate term. It then performs a further matching query in the local lexicon on the last sub-term contained in the candidate term to obtain the corresponding following term, merges the candidate term with the following term to obtain an input candidate term, and provides it to the user. This accurately and effectively widens the range of input candidate terms offered, so that they better match the user's input needs and improve the user's input experience.
- The present invention learns the record of input terms the user commits in segments and uses the context relationships in that record: based on the preceding term the user has just committed and the relationship weight between two historically committed terms, it determines the predicted input candidate terms, thereby increasing the recall rate of the current input and of the prediction.
- The present invention can also learn a complete input term record at a finer or merged granularity: if the user commits a long input term record, the invention splits the record at reasonable granularity and learns each resulting term granularity, so that predicting input candidate terms does not produce terms of unreasonable length.
- FIG. 1 shows a schematic diagram of a device for providing input candidate terms based on a local lexicon in accordance with one aspect of the present invention;
- FIG. 2 shows a schematic diagram of a device for providing input candidate terms based on a local lexicon in accordance with a preferred embodiment of the present invention;
- FIG. 3 shows a flow chart of a method for providing input candidate terms based on a local lexicon in accordance with another aspect of the present invention;
- FIG. 4 shows a flow chart of a method for providing input candidate terms based on a local lexicon in accordance with a preferred embodiment of the present invention.
- FIG. 1 shows a schematic diagram of a device for providing input candidate terms based on a local lexicon in accordance with one aspect of the present invention.
- The user equipment 1 includes a first obtaining device 101, a first matching device 102, a second matching device 103, a merging device 104 and a providing device 105.
- The user equipment 1 includes, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote controller, a touch pad or a handwriting device, such as a computer, a mobile phone, a PDA, a tablet, a game console or an IPTV.
- The first obtaining device 101 acquires the input character string entered by the user. Specifically, the user enters an input character string by interacting with the user equipment 1, and the first obtaining device 101 acquires that string by calling, one or more times, an application program interface (API) provided by the user equipment 1.
- The first matching device 102 performs a matching query in the local lexicon according to the input character string to obtain a corresponding candidate term, wherein the local lexicon is established or updated according to the record of input terms the user has historically committed in segments. Specifically, the first matching device 102 queries the local lexicon with the input character string acquired by the first obtaining device 101 and obtains the candidate terms corresponding to it, for example by using the mapping between pronunciations and terms stored in the local lexicon, or by traversing the terms stored in a tree structure in the local lexicon, to find the candidate terms whose pronunciation matches the input string.
- For example, the user enters the input string abcdef by interacting with the user equipment 1, where ab is the pronunciation of a1, cd is the pronunciation of a2, and ef is the pronunciation of a3, and the first obtaining device 101 obtains the input string abcdef. The first matching device 102 then searches the local lexicon according to the input string and either directly finds the candidate term a1a2a3 whose pronunciation matches the input string; or separately finds a1 matching ab, a2 matching cd and a3 matching ef, and splices the three to obtain the candidate term a1a2a3; or finds, according to a1, the following term a2 of a1, whose pronunciation matches the cd in abcdef, splices it to form a1a2, and then in the same way finds the following term a3 of a2 and splices it to obtain a1a2a3.
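The second strategy in the example above (match each pronunciation segment separately, then splice) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the table `PRONUNCIATION_TO_TERMS` and the function `match_candidates` are hypothetical names, and the table contents simply mirror the ab/cd/ef example.

```python
from typing import Dict, List

# Hypothetical pronunciation-to-term mapping; entries mirror the example above.
PRONUNCIATION_TO_TERMS: Dict[str, List[str]] = {
    "ab": ["a1"],
    "cd": ["a2"],
    "ef": ["a3"],
}


def match_candidates(input_string: str, table: Dict[str, List[str]]) -> List[str]:
    """Return every spliced term whose pronunciation equals input_string."""
    if not input_string:
        return [""]  # nothing left to match: one empty continuation
    results: List[str] = []
    # Try each prefix of the remaining input as the pronunciation of one term.
    for end in range(1, len(input_string) + 1):
        prefix = input_string[:end]
        for term in table.get(prefix, []):
            for rest in match_candidates(input_string[end:], table):
                results.append(term + rest)
    return results


candidates = match_candidates("abcdef", PRONUNCIATION_TO_TERMS)
```

An input that cannot be fully segmented, such as abx with this table, yields no candidates.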
- Here, the local lexicon is established or updated according to the record of input terms the user has historically committed in segments, that is, input terms the user committed to the screen one after another.
- For example, the user enters the input string ab and commits the input term a1, then enters the input string cd and commits the input term a2; the input terms a1 and a2 are then an input term record committed in segments.
- The user equipment 1 stores the input term records committed in segments in the local lexicon, thereby establishing or updating the local lexicon.
- The specific manner of establishing or updating is described in detail in the embodiment corresponding to FIG. 2.
- The second matching device 103 performs a matching query in the local lexicon on the last sub-term contained in the candidate term, to determine the following term corresponding to that last sub-term. Specifically, taking the candidate term obtained by the first matching device 102, the second matching device 103 uses its last sub-term, such as a3 in the example above, to query the local lexicon, for example by using the context relationships between terms stored in the local lexicon to find the terms that follow the last sub-term.
- The second matching device 103 may continue the matching query in the local lexicon from a following term it has found, obtaining the following term of that following term, that is, a lower-order following term of the last sub-term.
- The following terms obtained in this way can be spliced with the candidate term obtained by the first matching device 102 into a complete input candidate term to be provided to the user.
- The merging device 104 combines the candidate term with the following term to obtain an input candidate term to be provided to the user. Specifically, the merging device 104 merges the candidate term matched by the first matching device 102 with the following term matched by the second matching device 103, for example by splicing the following term after the candidate term, and takes the merged result as an input candidate term. For example, the merging device 104 combines the candidate term a1a2a3 matched by the first matching device 102 with the following term b1 matched by the second matching device 103 to obtain the input candidate term a1a2a3b1.
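The lookup of following terms for the last sub-term and the splicing step can be sketched as below. The dictionary `FOLLOWING` and the function `merge_with_following` are hypothetical names chosen for illustration, and the stored relation a3 to b1 is taken from the example above.

```python
from typing import Dict, List

# Hypothetical following-term relations learned from earlier segmented commits.
FOLLOWING: Dict[str, List[str]] = {"a3": ["b1"]}


def merge_with_following(sub_terms: List[str],
                         following: Dict[str, List[str]]) -> List[str]:
    """Splice each following term of the last sub-term after the candidate term."""
    if not sub_terms:
        return []
    candidate = "".join(sub_terms)   # e.g. a1 + a2 + a3
    last_sub_term = sub_terms[-1]    # the query key for the second matching step
    return [candidate + nxt for nxt in following.get(last_sub_term, [])]


input_candidates = merge_with_following(["a1", "a2", "a3"], FOLLOWING)
```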
- The providing device 105 provides the input candidate term to the user. Specifically, the providing device 105 provides the input candidate terms obtained by the merging device 104 to the user by calling a page technology such as ASP, JSP or PHP, or by another agreed display mode. This operation may use any known means by which a computer presents human-readable information, such as screen display or speaker playback. Taking screen display as an example, the providing device 105 presents the input candidate terms obtained by the merging device 104 to the user in a certain order and format for selection. Specifically, when they are shown in an input window column of the display, the input candidate terms and the input character string may be displayed in separate columns, with all of the input candidate terms placed in the next column for the user to select.
- The specific function keys may be, for example, "+" and "-".
- The devices of the user equipment 1 work continuously with one another. Specifically:
- the first obtaining device 101 acquires the input character string entered by the user;
- the first matching device 102 performs a matching query in the local lexicon according to the input character string and obtains a corresponding candidate term, wherein the local lexicon is established or updated according to the record of input terms the user has historically committed in segments;
- the second matching device 103 performs a matching query in the local lexicon on the last sub-term contained in the candidate term and determines the following term corresponding to the last sub-term;
- the merging device 104 combines the candidate term with the following term to obtain an input candidate term to be provided to the user;
- the providing device 105 provides the input candidate term to the user.
- Here, "continuously" means that the devices of the user equipment 1 keep acquiring the input character string, obtaining the candidate term and the following term, merging them, and providing the input candidate term, according to a set or real-time adjusted working mode, until the user equipment 1 has stopped acquiring input character strings from the user for a long time.
- The present invention performs a matching query in a local lexicon established or updated from the record of input terms the user has historically committed in segments, obtains a corresponding candidate term, performs a further matching query in the local lexicon on the last sub-term contained in the candidate term to obtain the corresponding following term, and merges the candidate term with the following term to obtain the input candidate term provided to the user. This accurately and effectively widens the range of input candidate terms offered, so that they better match the user's input needs and improve the user's input experience.
- The local lexicon stores terms in a tree structure, and the matching queries performed by the first matching device 102 and the second matching device 103 include traversing the tree structure in the local lexicon with a depth-first traversal algorithm.
- Specifically, each node of the tree structure stores information such as the pronunciation, the term, the pronunciation segmentation, the term segmentation, the following pronunciation and the following term.
- For example, the user enters the input string abcdef by interacting with the user equipment 1, where ab is the pronunciation of a1, cd is the pronunciation of a2, and ef is the pronunciation of a3, and the first obtaining device 101 obtains the input string abcdef. The first matching device 102 queries the local lexicon with the prefixes a, ab, abc, abcd, abcde... of the input string to find matching prefix terms.
- The first matching device 102 then traces the following terms of each prefix term depth-first. For example, from a1 it finds the following term a2; since the pronunciation of a2 matches the cd in abcdef, it splices a2 after a1 to form a1a2, and a following term whose pronunciation does not match is skipped. This continues until a term whose pronunciation is consistent with the input string, such as a1a2a3, has been spliced, which is taken as a candidate term. Alternatively, the first matching device 102 directly finds in the local lexicon a term A whose pronunciation matches the input string abcdef, and takes it as a candidate term.
- The order in which the following terms of the prefix terms are traced may follow the user's input history; for example, for the prefix terms a1 and ax, if the user has entered a1 more recently, the first matching device 102 first traces the following terms of a1.
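The prefix lookup, recency ordering and depth-first tracing of following terms described above can be sketched as follows. All names and data here (`PRONUNCIATION`, `NEXT`, `LAST_USED`, `trace`, `first_match`) are assumptions made for illustration; the patent text does not specify these structures, and the integer recency stamps are invented sample values.

```python
from typing import Dict, List, Optional

# Hypothetical lexicon contents mirroring the example (a1 and ax share pronunciation ab).
PRONUNCIATION: Dict[str, str] = {"a1": "ab", "ax": "ab", "a2": "cd", "a3": "ef"}
NEXT: Dict[str, List[str]] = {"a1": ["a2"], "a2": ["a3"], "ax": []}
LAST_USED: Dict[str, int] = {"a1": 20, "ax": 10}  # larger value = entered more recently


def trace(term: str, remaining: str) -> Optional[List[str]]:
    """Depth-first: extend term with following terms whose pronunciation
    matches the start of the remaining input; mismatches are skipped."""
    if not remaining:
        return [term]
    for following in NEXT.get(term, []):
        pron = PRONUNCIATION.get(following, "")
        if pron and remaining.startswith(pron):
            tail = trace(following, remaining[len(pron):])
            if tail is not None:
                return [term] + tail
    return None


def first_match(input_string: str) -> Optional[str]:
    """Find prefix terms, try the most recently used first, splice a full match."""
    prefix_terms = [t for t, p in PRONUNCIATION.items() if input_string.startswith(p)]
    prefix_terms.sort(key=lambda t: LAST_USED.get(t, 0), reverse=True)
    for term in prefix_terms:
        chain = trace(term, input_string[len(PRONUNCIATION[term]):])
        if chain is not None:
            return "".join(chain)
    return None


candidate = first_match("abcdef")
```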
- Similarly, the second matching device 103 also traverses the tree structure in the local lexicon depth-first.
- For example, taking the candidate term a1a2a3 matched by the first matching device 102, the second matching device 103 traces the following terms from the last sub-term a3 depth-first and finds, for example, the following term b1; the merging device 104 then combines the candidate term a1a2a3 and the following term b1 into the merged result a1a2a3b1 as an input candidate term, and the providing device 105 provides that input candidate term to the user.
- The second matching device 103 may continue traversing from the following term b1, for example to obtain the following term b2 of b1, which is a lower-order following term of a3, and may traverse further from b2 to obtain b3. Assuming a1a2a3b1b2b3 is a complete term, the merging device 104 may combine a1a2a3 with b1, b2 and b3 to obtain a1a2a3b1b2b3 as an input candidate term, which the providing device 105 then provides to the user.
- The first matching device 102 and the second matching device 103 may stop traversing once they have found as many input candidate terms as the required number of results.
- The number of results may be preset by the system or set by the user.
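The depth-first prediction from the last sub-term, with early termination once the required number of results is reached, might look like the sketch below. `NEXT_TERMS`, `predict` and `max_results` are hypothetical names; the links a3, b1, b2, b3 come from the example, and the cycle guard is an added safety assumption rather than something the text describes.

```python
from typing import Dict, FrozenSet, List

# Hypothetical following-term links mirroring the a3 -> b1 -> b2 -> b3 example.
NEXT_TERMS: Dict[str, List[str]] = {"a3": ["b1"], "b1": ["b2"], "b2": ["b3"], "b3": []}


def predict(candidate: str, last_sub_term: str,
            next_terms: Dict[str, List[str]], max_results: int) -> List[str]:
    """Depth-first walk over following terms, stopping at max_results candidates."""
    results: List[str] = []

    def visit(term: str, spliced: str, seen: FrozenSet[str]) -> bool:
        for following in next_terms.get(term, []):
            if following in seen:            # guard against cyclic links (assumption)
                continue
            merged = spliced + following     # merging step: splice after the candidate
            results.append(merged)
            if len(results) >= max_results:
                return True                  # required result count reached
            if visit(following, merged, seen | {following}):
                return True
        return False

    if max_results > 0:
        visit(last_sub_term, candidate, frozenset({last_sub_term}))
    return results


top_two = predict("a1a2a3", "a3", NEXT_TERMS, max_results=2)
```

In this sketch every intermediate splice is emitted as a candidate; a real lexicon would presumably emit only splices that form complete terms.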
- The matching query includes traversing the tree structure in the local lexicon depth-first according to the context relationships of the terms stored in the nodes of the tree structure.
- Specifically, the first matching device 102 and the second matching device 103 traverse the tree structure depth-first according to the context relationships of the terms stored in its nodes, such as a term's relationship to its preceding term.
- For example, if the user has historically committed, in segments, a1, a2, a3; a2, a4; and a2, a5, then the a2 node of the tree structure in the local lexicon has the preceding term a1 and the following terms a3, a4 and a5; that is, the term a3 has a preceding-context relationship with the term a1.
- When the two devices query the following terms of a2 and find a3, a4 and a5, they preferentially splice the term a3, because a3 also corresponds to the preceding term a1, and continue traversing from the a3 node.
- The context relationships of the terms stored in the nodes may be given priority first, with the chronological order of the user's historical input considered next.
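One way to picture the preference for a following term that shares the same preceding term is the sketch below. The nested-dictionary layout and the names `learn` and `ordered_following` are assumptions for illustration; the patent only states that context relationships are stored in the tree nodes.

```python
from typing import Dict, List, Set


def learn(sequences: List[List[str]]) -> Dict[str, Dict[str, Set[str]]]:
    """following[term][next_term] = preceding terms seen before `term`
    on the occasions when next_term followed it."""
    following: Dict[str, Dict[str, Set[str]]] = {}
    for seq in sequences:
        for i in range(len(seq) - 1):
            contexts = following.setdefault(seq[i], {}).setdefault(seq[i + 1], set())
            if i > 0:
                contexts.add(seq[i - 1])
    return following


def ordered_following(following: Dict[str, Dict[str, Set[str]]],
                      term: str, preceding: str) -> List[str]:
    """Following terms of `term`, those seen after `preceding` first.
    sorted() is stable, so ties keep their learned order."""
    options = following.get(term, {})
    return sorted(options, key=lambda nxt: preceding not in options[nxt])


# Commit history from the example, deliberately listed with a1, a2, a3 last.
history = [["a2", "a4"], ["a2", "a5"], ["a1", "a2", "a3"]]
lexicon = learn(history)
preferred = ordered_following(lexicon, "a2", "a1")
```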
- The user equipment 1 further includes a priority determining device (not shown), which determines the priority of the input candidate terms according to the user's historical input order; the providing device 105 then provides the input candidate terms to the user according to that priority.
- Specifically, the priority determining device determines the priority of the input candidate terms according to the user's historical input order, for example by sorting the input candidate terms by how recently the user entered them, so that the input candidate term entered most recently has the highest priority; the providing device 105 then provides the input candidate terms to the user according to the priority so determined.
- The priority determining device determines the priority of an input candidate term according to the user's historical input order together with the term attributes of the input candidate term, wherein the term attributes include at least one of the following:
- the probability attribute of the input candidate term in the local lexicon
- Specifically, the priority determining device combines the user's historical input order with the term attributes of the input candidate term, such as its probability attribute in the local lexicon, the number of times the user has historically entered it, the transition probability between the sub-terms it contains, and its corresponding prediction length, to determine the priority of the input candidate term.
- For example, the historical input order and each term attribute may each be given a score and a weight; the priority determining device obtains a score for each input candidate term by weighted calculation and then determines each term's priority from its score. The weights may be preset by the system or set by the user.
- The probability attribute of the input candidate term in the local lexicon is, for example, the probability with which the term occurs in the local lexicon, calculated from the number of occurrences of the input candidate term in the local lexicon and the number of occurrences of all terms in the local lexicon.
- The number of times the user has historically entered the input candidate term can be obtained by counting.
- The transition probability between the sub-terms contained in the input candidate term can be calculated from the transition probabilities of a language model; for the input candidate term ab, for example, it is the probability that the following term is b when the preceding term is a.
- The prediction length corresponding to the input candidate term is, for example, the maximum number of sub-terms an input candidate term may contain, which may be preset by the system or set by the user.
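Two of the attributes above lend themselves to a short worked sketch: the occurrence probability of a term and the transition probability between sub-terms. The text attributes the latter to a language model without specifying it, so the bigram relative-frequency estimate below is a stand-in assumption, and the function names and counts are hypothetical.

```python
from collections import Counter
from typing import Tuple


def occurrence_probability(term: str, counts: Counter) -> float:
    """Occurrences of `term` divided by occurrences of all terms."""
    total = sum(counts.values())
    return counts[term] / total if total else 0.0


def transition_probability(preceding: str, following: str, bigrams: Counter) -> float:
    """Relative frequency of `following` after `preceding` (bigram estimate;
    a stand-in for whatever language model is actually used)."""
    total = sum(n for (p, _), n in bigrams.items() if p == preceding)
    return bigrams[(preceding, following)] / total if total else 0.0


# Hypothetical counts gathered from the user's commit history.
term_counts: Counter = Counter({"a": 3, "b": 1})
bigram_counts: Counter = Counter({("a", "b"): 2, ("a", "c"): 2})

p_term = occurrence_probability("a", term_counts)
p_transition = transition_probability("a", "b", bigram_counts)
```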
- The priority determining device may determine the priority of the input candidate term according to the user's historical input order and any number of the term attributes of the input candidate term; for example, the term attributes to be considered can be filtered by setting the weight of a term attribute to zero.
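The weighted calculation, including the use of a zero weight to leave an attribute out, can be illustrated as below. The attribute names, scores and weights are invented sample values, and `weighted_score` and `rank` are hypothetical helper names.

```python
from typing import Dict, List


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum of score * weight; an attribute with weight 0 (or no weight) is ignored."""
    return sum(weights.get(name, 0.0) * value for name, value in scores.items())


def rank(candidates: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> List[str]:
    """Order candidate terms from highest to lowest weighted score."""
    return sorted(candidates,
                  key=lambda term: weighted_score(candidates[term], weights),
                  reverse=True)


# Hypothetical per-candidate scores for two attributes.
candidates = {
    "a1a2a3":   {"recency": 1.0, "probability": 0.5},
    "a1a2a3b1": {"recency": 0.0, "probability": 1.0},
}

by_both = rank(candidates, {"recency": 2.0, "probability": 1.0})
by_probability_only = rank(candidates, {"recency": 0.0, "probability": 1.0})
```

With both weights active the recently entered term ranks first; zeroing the recency weight lets the probability attribute alone decide.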
- The priority determining device may also determine the priority of the input candidate terms as follows: for example, the exactly matching input candidate term and the complete input candidate term are ordered by the user's historical input order, and an input candidate term that predicts only one following term is placed immediately after the exactly matching input candidate term.
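The ordering rule just stated can be sketched in a few lines. The function `order_candidates` and the `last_input` stamps (larger meaning more recent) are assumptions for illustration; the term strings are sample values in the style of the document's examples.

```python
from typing import Dict, List


def order_candidates(exact: str, one_following: str, complete: str,
                     last_input: Dict[str, int]) -> List[str]:
    """Order exact and complete candidates by recency, then place the
    one-following-term prediction right after the exact match."""
    ranked = sorted([exact, complete],
                    key=lambda term: last_input.get(term, -1),
                    reverse=True)
    ranked.insert(ranked.index(exact) + 1, one_following)
    return ranked


# Hypothetical recency stamps: the exact match was entered after the complete term.
stamps = {"a1a2a3": 2, "a1a2a3b1b2b3": 1}
ordering = order_candidates("a1a2a3", "a1a2a3b1", "a1a2a3b1b2b3", stamps)
```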
- For example, the user enters the input string abcdef by interacting with the user equipment 1, where ab is the pronunciation of a1, cd is the pronunciation of a2, and ef is the pronunciation of a3, and the a1a2a3 matched by the first matching device 102 according to the input string is the exactly matching input candidate term. The second matching device 103 traces depth-first from the last sub-term a3 and finds the following term b1, and the a1a2a3b1 spliced by the merging device 104 is an input candidate term that predicts only one following term. The second matching device 103 continues traversing from b1 to obtain its following term b2, and from b2 to obtain b3; assuming a1a2a3b1b2b3 is a complete term, the merging device 104 combines a1a2a3 with b1, b2 and b3 to obtain a1a2a3b1b2b3, which is the complete input candidate term. If the user entered the exactly matching input candidate term a1a2a3 more recently than the complete input candidate term a1a2a3b1b2b3, the priority determining device ranks the input candidate terms, from high to low, as a1a2a3, a1a2a3b1, a1a2a3b1b2b3.
- FIG. 2 shows a schematic diagram of a device for providing input candidate terms based on a local lexicon in accordance with a preferred embodiment of the present invention.
- The user equipment 1 further includes a second obtaining device 206 and an updating device 207. The preferred embodiment is described in detail below with reference to FIG. 2. Specifically:
- the second obtaining device 206 acquires the record of input terms the user has historically committed in segments;
- the updating device 207 establishes or updates the local lexicon according to the context relationships between the input terms in that record;
- the first obtaining device 201 acquires the input character string entered by the user;
- the first matching device 202 performs a matching query in the local lexicon according to the input character string and obtains a corresponding candidate term, wherein the local lexicon is established or updated according to the record of input terms the user has historically committed in segments;
- the second matching device 203 performs a matching query in the local lexicon on the last sub-term contained in the candidate term, to determine the following term corresponding to the last sub-term;
- the merging device 204 combines the candidate term with the following term to obtain an input candidate term to be provided to the user;
- the providing device 205 provides the input candidate term to the user.
- The first obtaining device 201, the first matching device 202, the second matching device 203, the merging device 204 and the providing device 205 are the same as or substantially the same as the corresponding devices shown in FIG. 1; they are therefore not described again here and are incorporated by reference.
- The second obtaining device 206 acquires the record of input terms the user has historically committed in segments. Specifically, the user commits input terms by interacting with the user equipment 1, and the second obtaining device 206 acquires the record of the terms committed in segments by calling an application program interface (API) provided by the user equipment 1, or in another agreed manner. For example, if the user previously entered the input string ab and committed the input term a1, the second obtaining device 206 acquires the committed input term a1 through interaction with the user equipment 1 as part of the user's historical commit record; the user then enters the input string cd and commits the input term a2, and the second obtaining device 206 likewise acquires the committed input term a2. Since the input terms a1 and a2 were committed one after the other, a1 and a2 form a record of input terms the user has historically committed in segments.
- The updating device 207 establishes or updates the local lexicon according to the context relationships between the input terms in the historical segmented commit record. Specifically, taking the input term records a1 and a2 acquired by the second obtaining device 206, the updating device 207 establishes or updates the local lexicon according to the context relationship between the two records and their input frequencies: for example, it stores the input term records and their context relationship in the local lexicon, recording a2 as the following term of a1 in a vector structure whose attribute name is nextentry.
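A minimal sketch of such an update is given below. Only the attribute name nextentry comes from the text; the `Entry` dataclass, the `frequency` field and the `update_lexicon` function are hypothetical, and a flat dictionary stands in for the tree structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    term: str
    frequency: int = 0                                   # how often the term was committed
    nextentry: List[str] = field(default_factory=list)   # following terms, name from the text


def update_lexicon(lexicon: Dict[str, Entry], committed: List[str]) -> None:
    """Record one sequence of segmented commits: bump frequencies and
    store each term's successor in its nextentry vector."""
    for i, term in enumerate(committed):
        entry = lexicon.setdefault(term, Entry(term))
        entry.frequency += 1
        if i + 1 < len(committed) and committed[i + 1] not in entry.nextentry:
            entry.nextentry.append(committed[i + 1])


lexicon: Dict[str, Entry] = {}
update_lexicon(lexicon, ["a1", "a2"])   # the user committed a1, then a2
```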
- The user equipment 1 further includes a word segmentation device (not shown), which performs word segmentation on the input term record to obtain at least one term granularity; the updating device 207 then establishes or updates the local lexicon according to the context relationships between the at least one term granularity.
- Specifically, a length threshold may be preset; when the length of an input term record exceeds the length threshold, the record is judged to be long and in need of word segmentation, and the word segmentation device segments the input term record acquired by the second obtaining device 206 to obtain at least one term granularity. The updating device 207 then stores the term granularities obtained by segmentation, together with the context relationships between them, in the local lexicon, for example in its tree structure, thereby establishing or updating the local lexicon.
- For example, the user has historically committed two long input term records, A and B, in segments, and the second obtaining device 206 acquires both; the word segmentation device segments them, dividing the input term record A into the three terms a1, a2 and a3, and the input term record B into the three terms b1, b2 and b3. The updating device 207 first learns A, B, a1, a2, a3, b1, b2 and b3 each as a complete term and then saves the context relationships between them, for example using nextentry.
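The segmentation-then-learning flow can be sketched as follows. The text does not say how the word segmentation device works, so forward maximum matching against a set of known terms is used here purely as a stand-in; `segment`, `learn_record`, `known_terms` and `length_threshold` are hypothetical names, and the lexicon is simplified to a mapping from each term to its following terms.

```python
from typing import Dict, List, Set


def segment(record: str, known_terms: Set[str]) -> List[str]:
    """Forward maximum matching (a stand-in segmenter): take the longest
    known term at each position, falling back to a single character."""
    pieces: List[str] = []
    i = 0
    while i < len(record):
        for j in range(len(record), i, -1):
            if record[i:j] in known_terms or j == i + 1:
                pieces.append(record[i:j])
                i = j
                break
    return pieces


def learn_record(record: str, known_terms: Set[str],
                 lexicon: Dict[str, List[str]], length_threshold: int) -> None:
    """Learn the whole record as a term; if it is long, also learn its
    segments and the following-term relation between adjacent segments."""
    lexicon.setdefault(record, [])
    if len(record) <= length_threshold:
        return
    pieces = segment(record, known_terms)
    for current, following in zip(pieces, pieces[1:]):
        successors = lexicon.setdefault(current, [])
        if following not in successors:
            successors.append(following)
    lexicon.setdefault(pieces[-1], [])


lexicon: Dict[str, List[str]] = {}
learn_record("a1a2a3", {"a1", "a2", "a3"}, lexicon, length_threshold=4)
```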
- The updating device 207 determines whether a term obtained by merging at least two term granularities that have a context relationship is a high-frequency term; if it is, the updating device 207 establishes or updates the local lexicon according to that term.
- Specifically, based on the term granularities produced by the word segmentation device, the updating device 207 determines whether the term obtained by merging at least two of them is a high-frequency term, for example by checking whether the probability with which the merged term occurs in the local lexicon exceeds a predetermined probability threshold; if so, the merged term is judged to be a high-frequency term and is stored in the local lexicon, for example in the corresponding node of its tree structure, thereby establishing or updating the local lexicon.
- The term granularities merged into a high-frequency term have a context relationship with one another.
- For example, the input term records acquired by the second obtaining device 206 are A and B, and the word segmentation device segments them into the term granularities a1, a2, a3, b1, b2 and b3. The updating device 207 determines that the term a2a3, formed by merging the term granularities a2 and a3, is a high-frequency term, where a2 and a3 have a context relationship; it therefore merges a2 and a3 into the term a2a3 and stores a2a3 in the local lexicon, thereby establishing or updating the local lexicon.
- FIG. 3 shows a flow chart of a method for providing input candidate terms based on a local vocabulary in accordance with another aspect of the present invention.
- In step S301, the user equipment 1 acquires an input character string entered by the user. Specifically, the user enters the input character string through interaction with the user equipment 1, and in step S301 the user equipment 1 obtains it by calling, one or more times, an application program interface (API) provided by the user equipment 1.
- In step S302, the user equipment 1 performs a matching query in the local lexicon according to the input character string and obtains a corresponding candidate term, where the local lexicon is established or updated according to the input term records that the user has historically committed to the screen in segments.
- Specifically, the user equipment 1 queries the local lexicon with the input character string obtained in step S301 and obtains a candidate term whose pronunciation matches it, for example according to the mapping between pronunciations and terms stored in the local lexicon.
- For example, the user enters the input string abcdef through interaction with the user equipment 1, where ab is the pronunciation of a1, cd is the pronunciation of a2, and ef is the pronunciation of a3. In step S301 the user equipment 1 acquires the input string abcdef; in step S302 it searches the local lexicon and directly finds the candidate term a1a2a3, whose pronunciation matches the input string.
- Alternatively, in step S302 the user equipment 1 separately finds a1 matching the pronunciation ab, a2 matching cd, and a3 matching ef, and then splices the three to obtain the candidate term a1a2a3, whose pronunciation matches the input string abcdef.
- Alternatively, the user equipment 1 starts from a1 and finds its following term a2; because the pronunciation of a2 matches cd in the input string abcdef, a2 is spliced onto a1 to form a1a2. It then finds a3, the following term of a2; because the pronunciation of a3 matches ef in the input string, a3 is spliced onto a1a2 to form a1a2a3, which serves as the candidate term matching the input string.
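As an illustration only, the second strategy above (finding one term per pronunciation segment and splicing the results) can be sketched as follows. The function name `splice_by_pronunciation` and the flat `pron_to_term` dictionary are assumptions made for this sketch; the source does not say how the mapping between pronunciations and terms is stored.

```python
from typing import Dict, List, Optional


def splice_by_pronunciation(input_string: str,
                            pron_to_term: Dict[str, str]) -> Optional[List[str]]:
    """Return terms whose pronunciations tile input_string, or None if none do."""
    if not input_string:
        return []
    # Try every prefix of the remaining input as the pronunciation of one term.
    for end in range(1, len(input_string) + 1):
        head = input_string[:end]
        if head in pron_to_term:
            rest = splice_by_pronunciation(input_string[end:], pron_to_term)
            if rest is not None:
                return [pron_to_term[head]] + rest
    return None


# Hypothetical lexicon content taken from the example in the text.
pron_to_term = {"ab": "a1", "cd": "a2", "ef": "a3"}
print("".join(splice_by_pronunciation("abcdef", pron_to_term)))  # a1a2a3
```

The sketch returns the first tiling it finds; a real input method would normally keep several alternatives and rank them.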
- Here, the local lexicon is established or updated according to the input term records that the user has historically committed to the screen in segments; such a record consists of input terms that the user committed to the screen one after another.
- For example, the user previously entered the input string ab and selected the input term a1 to commit; immediately afterwards, the user entered the input string cd and selected the input term a2 to commit. The input terms a1 and a2 are then an input term record committed in segments, and the user equipment 1 stores this record in the local lexicon to establish or update the local lexicon.
- The specific manner of establishing or updating is described in detail in the embodiment corresponding to FIG. 2.
- In step S303, the user equipment 1 performs a matching query in the local lexicon for the last sub-term included in the candidate term, to determine the following term corresponding to that last sub-term. Specifically, based on the candidate term obtained in step S302, the user equipment 1 takes its last sub-term, such as a3 in the example above, and queries the local lexicon for the corresponding following term, for example by using the stored context relationships between terms to find a following term that has a context relationship with the last sub-term.
- The user equipment 1 may continue the matching query in the local lexicon with the following term just obtained, to get the following term of that following term, that is, the term two positions after the last sub-term.
- The terms finally matched can be spliced with the candidate term obtained by the user equipment 1 in step S302 into one complete input candidate term to be provided to the user.
- In step S304, the user equipment 1 merges the candidate term with the following term to obtain an input candidate term to be provided to the user. Specifically, the user equipment 1 merges the candidate term matched in step S302 with the following term matched in step S303, for example by splicing the following term after the candidate term, and takes the merged result as the input candidate term. For example, in step S304 the user equipment 1 merges the candidate term a1a2a3 obtained in step S302 with the following term b1 obtained in step S303 to obtain the input candidate term a1a2a3b1.
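A minimal sketch of steps S303 and S304 taken together is given below: it looks up the following terms of the last sub-term and splices each one after the candidate term. The name `merge_with_following` and the dictionary `following` are hypothetical stand-ins for the context relationships held in the local lexicon.

```python
from typing import Dict, List


def merge_with_following(candidate_sub_terms: List[str],
                         following: Dict[str, List[str]]) -> List[str]:
    """Splice each following term of the last sub-term onto the candidate term."""
    candidate = "".join(candidate_sub_terms)
    last_sub_term = candidate_sub_terms[-1]
    return [candidate + nxt for nxt in following.get(last_sub_term, [])]


# Hypothetical context relationship: b1 was committed right after a3.
following = {"a3": ["b1"]}
print(merge_with_following(["a1", "a2", "a3"], following))  # ['a1a2a3b1']
```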
- In step S305, the user equipment 1 provides the input candidate term to the user. Specifically, the user equipment 1 provides the input candidate term obtained by merging in step S304 to the user by calling page technologies such as ASP, JSP or PHP, or through another agreed display mode. This operation may use any known means by which a computer provides human-readable information, such as screen display or speaker playback. Taking screen display as an example, in step S305 the user equipment 1 presents the input candidate terms obtained in step S304 to the user in a certain order and format for selection.
- When they are shown to the user in an input window bar of the display, the input candidate terms and the input character string may be displayed in separate bars, with all input candidate terms listed in the next bar for the user to select.
- Preferably, only one row of input candidate terms is displayed in the term bar. The number of input candidate terms in the row may be a default or may be set by the user, and the user displays the previous or next row of input candidate terms by pressing a specific function key.
- The specific function key may be, for example, "+" or "-".
- The steps of the user equipment 1 operate continuously. Specifically, in step S301, the user equipment 1 acquires an input character string entered by the user; in step S302, it performs a matching query in the local lexicon according to the input character string and obtains a corresponding candidate term, where the local lexicon is established or updated according to the input term records that the user has historically committed to the screen in segments; in step S303, it performs a matching query in the local lexicon for the last sub-term included in the candidate term, to determine the following term corresponding to that last sub-term; in step S304, it merges the candidate term with the following term to obtain an input candidate term to be provided to the user; in step S305, it provides the input candidate term to the user.
- Here, "continuously" means that the steps of the user equipment 1 acquire the input character string, obtain and merge the candidate term and the following term, and provide the input candidate term according to a set or real-time adjusted working mode, until the user equipment 1 stops acquiring input character strings from the user for a long time.
- Here, according to the input character string entered by the user, the present invention performs a matching query in a local lexicon established or updated from the input term records that the user has historically committed to the screen in segments, and obtains the corresponding candidate term. It then queries the local lexicon for the following term corresponding to the last sub-term included in the candidate term, merges the candidate term with that following term, and provides the resulting input candidate term to the user. This accurately and effectively expands the range of input candidate terms provided, so that they better meet the user's input needs and improve the user's input experience.
- The local lexicon stores terms in a tree structure, and the matching queries executed in step S302 and in step S303 include traversing the tree structure in the local lexicon with a depth-first traversal algorithm.
- Specifically, the local lexicon stores terms in a tree structure, and each node stores the pronunciation, the term, the pronunciation segmentation, the term segmentation, the following pronunciations, the following terms, and the like.
- The user equipment 1 traverses the tree structure using the depth-first traversal algorithm.
- For example, the user enters the input string abcdef through interaction with the user equipment 1, where ab is the pronunciation of a1, cd is the pronunciation of a2, and ef is the pronunciation of a3. In step S301 the user equipment 1 acquires the input string abcdef; in step S302 it queries the local lexicon with a, ab, abc, abcd, abcde... in turn to find terms with matching pronunciations. Suppose the prefix words a1 and ax are found, where a prefix word is a term whose pronunciation matches part of the input string.
- The user equipment 1 then uses the depth-first traversal algorithm to trace the following terms of each prefix word. For example, from a1 it finds the following term a2; because the pronunciation of a2 matches cd in the input string abcdef, a2 is spliced onto a1 to form a1a2. Terms that do not match are skipped, and splicing continues until a term whose pronunciation equals the input string, such as a1a2a3, is formed and taken as the candidate term. Alternatively, in step S302 the user equipment 1 directly finds in the local lexicon a term A whose pronunciation matches the input string abcdef and takes it as the candidate term.
- After the prefix words are matched, their following terms may be traced in the order in which the user historically entered the prefix words. For example, for the prefix words a1 and ax, if the user most recently entered a1, the user equipment 1 traces the following terms starting from a1.
- In step S303, the user equipment 1 likewise uses the depth-first traversal algorithm to traverse the tree structure in the local lexicon.
- Continuing the example above, based on the candidate term a1a2a3 obtained in step S302, the user equipment 1 takes the last sub-term a3 and traces its following terms by depth-first traversal, finding, for example, the following term b1. In step S304, the user equipment 1 merges the candidate term a1a2a3 with the following term b1 and takes the merged result a1a2a3b1 as an input candidate term; in step S305, it provides the input candidate term to the user.
- The user equipment 1 may continue traversing along the following term b1, for example obtaining b2, the following term of b1, which is the term two positions after a3. It may then traverse along b2 to obtain b3. Assuming that a1a2a3b1b2b3 is a complete term, in step S304 the user equipment 1 may merge a1a2a3 with b1, b2 and b3 to obtain a1a2a3b1b2b3 as an input candidate term, which is provided to the user in step S305.
- Because depth-first traversal has a relatively high time complexity, the user equipment 1 may terminate the traversal once it has found as many input candidate terms as the required number of results.
- The number of results may be preset by the system or set by the user.
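The traversal described above can be sketched roughly as follows. This is not the patented implementation: the `Node` class, the flat `lexicon` dictionary keyed by term, and the parameter name `max_results` are assumptions for illustration, and every pronunciation is assumed to be non-empty so that each step consumes input.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    pron: str                                            # pronunciation of the term
    following: List[str] = field(default_factory=list)   # following terms


def find_candidates(input_string: str, lexicon: Dict[str, Node],
                    max_results: int = 5) -> List[str]:
    """Depth-first search for spliced terms whose pronunciation equals the input."""
    results: List[str] = []

    def dfs(path: List[str], consumed: int) -> None:
        if len(results) >= max_results:      # stop once enough results are found
            return
        if consumed == len(input_string):    # pronunciation matches the whole input
            results.append("".join(path))
            return
        for name in lexicon[path[-1]].following:
            node = lexicon.get(name)
            # Skip following terms whose pronunciation does not match the next part.
            if node and input_string.startswith(node.pron, consumed):
                dfs(path + [name], consumed + len(node.pron))

    # Prefix words: terms whose pronunciation matches the start of the input.
    for name, node in lexicon.items():
        if input_string.startswith(node.pron):
            dfs([name], len(node.pron))
    return results


lexicon = {
    "a1": Node("ab", ["a2"]),
    "ax": Node("abc"),
    "a2": Node("cd", ["a3", "a4"]),
    "a3": Node("ef", ["b1"]),
    "a4": Node("gh"),
}
print(find_candidates("abcdef", lexicon))  # ['a1a2a3']
```

Here the prefix word ax is tried but abandoned because it has no following term that matches the rest of the input, which corresponds to the "skip on mismatch" behaviour in the text.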
- The matching query includes traversing the tree structure in the local lexicon with the depth-first traversal algorithm according to the context relationships of the terms stored in the nodes of the tree structure.
- Specifically, the user equipment 1 traverses the tree structure by depth-first traversal according to the context relationships stored in the nodes of the tree structure in the local lexicon, such as a term's preceding-term relationship.
- For example, the user has historically committed in segments a1, a2, a3; a2, a4; and a2, a5. The a2 node of the tree structure in the local lexicon then has the preceding term a1 and the following terms a3, a4 and a5, and the term a3 has a preceding-term relationship with the term a1. Therefore, if the user equipment 1 gives priority to the preceding-term relationship during matching in step S302 or in step S303, then when it queries the following terms of a2 in either step and finds several following terms at once, it preferentially splices the term a3 and continues traversing from the a3 node.
- The context relationships of the terms stored in the nodes may be considered first, followed by the chronological order in which the terms appear in the user's history.
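One way to picture the preference for the preceding-term relationship is the small sketch below. The tuple representation `(following_term, recorded_preceding_term)` and the function name `order_following` are hypothetical; the source only states that a3 should be preferred when the preceding-term relationship is given priority.

```python
from typing import List, Optional, Tuple


def order_following(entries: List[Tuple[str, Optional[str]]],
                    preceding_term: Optional[str]) -> List[str]:
    """Put following terms recorded with the current preceding term first.

    The sort is stable, so the remaining terms keep their stored order.
    """
    ranked = sorted(entries, key=lambda entry: entry[1] != preceding_term)
    return [following for following, _ in ranked]


# Hypothetical content of the a2 node after the history a1,a2,a3; a2,a4; a2,a5.
a2_following = [("a4", None), ("a5", None), ("a3", "a1")]
print(order_following(a2_following, "a1"))  # ['a3', 'a4', 'a5']
```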
- The method further includes a step S308 (not shown), in which the user equipment 1 determines the priority of the input candidate terms according to the user's historical input order; in step S305, the user equipment 1 then provides the input candidate terms to the user according to that priority.
- Specifically, the user equipment 1 determines the priority of the input candidate terms according to the user's historical input order, for example by sorting them by when the user historically entered them, so that the input candidate term the user entered most recently has the highest priority. In step S305, the user equipment 1 then provides the input candidate terms to the user according to the priority determined in step S308.
- The user equipment 1 may determine the priority of an input candidate term according to the user's historical input order together with the term attributes of the input candidate term.
- The term attributes include at least one of the following: the probability attribute of the input candidate term in the local lexicon; the number of times the user has historically entered the input candidate term; the transition probability between the sub-terms included in the input candidate term; and the predicted length corresponding to the input candidate term.
- Specifically, the user equipment 1 determines the priority of the input candidate term by combining the user's historical input order with the term attributes of the input candidate term, such as its probability attribute in the local lexicon, the number of times the user has historically entered it, the transition probability between its sub-terms, and its predicted length.
- The historical input order and each term attribute may each correspond to a score and a weight. The user equipment 1 computes a weighted score for each input candidate term and determines the priority of each input candidate term from that score. The weights may be preset by the system or set by the user.
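The weighted calculation is not spelled out in the source; a plain weighted sum is one plausible reading, sketched below. The feature names (`recency`, `probability`), the example values and the weights are all made up for illustration.

```python
from typing import Dict, List


def score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-feature scores; features without a weight count as zero."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def rank(candidates: Dict[str, Dict[str, float]],
         weights: Dict[str, float]) -> List[str]:
    """Return candidate terms ordered from highest to lowest weighted score."""
    return sorted(candidates, key=lambda term: score(candidates[term], weights),
                  reverse=True)


# Hypothetical per-candidate scores for historical input order and one attribute.
candidates = {
    "a1a2a3": {"recency": 1.0, "probability": 0.2},
    "a1a2a3b1": {"recency": 0.5, "probability": 0.6},
}
print(rank(candidates, {"recency": 0.7, "probability": 0.3}))
# ['a1a2a3', 'a1a2a3b1']
```

Setting a weight to zero removes the corresponding feature from consideration, so the same function covers rankings that use only some of the attributes.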
- The probability attribute of the input candidate term in the local lexicon may be calculated from the number of occurrences of the input candidate term in the local lexicon and the number of occurrences of all terms in the local lexicon.
- The number of times the user has historically entered the input candidate term can be obtained by counting.
- The transition probability between the sub-terms included in the input candidate term can be calculated from the transition probabilities of a language model. For example, for the input candidate term ab, the transition probability is the probability that the following term is b given that the preceding term is a.
- The predicted length corresponding to the input candidate term is, for example, the maximum number of sub-terms that an input candidate term may contain; it may be preset by the system or set by the user.
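As a rough illustration of the first and third attributes, the sketch below treats the probability attribute as a simple ratio of occurrence counts and estimates the transition probability from counts of adjacent term pairs. Both formulas are assumptions: the source says only that the probability attribute is calculated from the two counts and that the transition probability comes from a language model.

```python
from typing import Dict, Tuple


def probability_attribute(term: str, occurrences: Dict[str, int]) -> float:
    """Occurrences of the term divided by occurrences of all terms."""
    total = sum(occurrences.values())
    return occurrences.get(term, 0) / total if total else 0.0


def transition_probability(prev: str, nxt: str,
                           pair_counts: Dict[Tuple[str, str], int]) -> float:
    """Estimate P(nxt | prev) from counts of adjacent (preceding, following) pairs."""
    total = sum(n for (p, _), n in pair_counts.items() if p == prev)
    return pair_counts.get((prev, nxt), 0) / total if total else 0.0


# Hypothetical counts.
print(probability_attribute("a1", {"a1": 3, "a2": 1}))                    # 0.75
print(transition_probability("a", "b", {("a", "b"): 3, ("a", "c"): 1}))   # 0.75
```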
- The user equipment 1 may determine the priority of the input candidate term from the user's historical input order and any subset of the term attributes above, for example by setting the weight of a term attribute to zero so that only the term attributes to be considered are retained.
- The user equipment 1 may also determine the priority of the input candidate terms as follows: sort the exactly matched input candidate term and the complete input candidate term according to the user's historical input order, and then place the input candidate term that predicts only one following term immediately after the exactly matched input candidate term.
- For example, the user enters the input string abcdef through interaction with the user equipment 1, where ab is the pronunciation of a1, cd is the pronunciation of a2, and ef is the pronunciation of a3. In step S302, the term a1a2a3 that the user equipment 1 matches against the input string is the exactly matched input candidate term. In step S303, the user equipment 1 uses depth-first traversal from the last sub-term a3 and finds the following term b1; the resulting a1a2a3b1 is the input candidate term that predicts only one following term. Still in step S303, the user equipment 1 continues traversing along the following term b1 to obtain the terms that follow b1, namely b2 and b3. In step S304, the user equipment 1 merges a1a2a3 with b1, b2 and b3 to obtain a1a2a3b1b2b3, which is the complete input candidate term. If the user entered the exactly matched input candidate term a1a2a3 more recently than the complete input candidate term a1a2a3b1b2b3, then in step S308 the priorities determined by the user equipment 1, from high to low, are: a1a2a3, a1a2a3b1, a1a2a3b1b2b3.
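The ordering rule in this example can be sketched as follows. The function name `order_candidates` and the `last_used` mapping (a larger number meaning a more recent input) are assumptions for illustration.

```python
from typing import Dict, List


def order_candidates(exact: str, one_step: str, complete: str,
                     last_used: Dict[str, int]) -> List[str]:
    """Sort exact and complete candidates by recency, then slot in the one-step one."""
    ranked = sorted([exact, complete],
                    key=lambda term: last_used.get(term, 0), reverse=True)
    # The candidate predicting only one following term goes right after the exact match.
    ranked.insert(ranked.index(exact) + 1, one_step)
    return ranked


# Hypothetical timestamps: a1a2a3 was entered after a1a2a3b1b2b3.
last_used = {"a1a2a3": 20, "a1a2a3b1b2b3": 10}
print(order_candidates("a1a2a3", "a1a2a3b1", "a1a2a3b1b2b3", last_used))
# ['a1a2a3', 'a1a2a3b1', 'a1a2a3b1b2b3']
```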
- In step S406, the user equipment 1 acquires the input term records that the user has historically committed to the screen in segments; in step S407, the user equipment 1 establishes or updates the local lexicon according to the context relationships between those input term records.
- In step S401, the user equipment 1 acquires an input character string entered by the user; in step S402, the user equipment 1 performs a matching query in the local lexicon according to the input character string and obtains a corresponding candidate term, where the local lexicon is established or updated according to the input term records that the user has historically committed to the screen in segments; in step S403, the user equipment 1 performs a matching query in the local lexicon for the last sub-term included in the candidate term, to determine the following term corresponding to that last sub-term; in step S404, the user equipment 1 merges the candidate term with the following term to obtain an input candidate term to be provided to the user; in step S405, the user equipment 1 provides the input candidate term to the user.
- Steps S401 to S405 are the same or substantially the same as the corresponding steps shown in FIG. 3; they are therefore not described again here and are incorporated herein by reference.
- In step S406, the user equipment 1 acquires the input term records that the user has historically committed to the screen in segments. Specifically, the user commits input terms in segments through interaction with the user equipment 1, and in step S406 the user equipment 1 obtains these input term records by calling an application program interface (API) provided by the user equipment 1, or in another agreed manner. For example, if the user previously entered the input string ab and selected the input term a1 to commit, the user equipment 1 obtains the committed input term a1 in step S406.
- In step S406, the user equipment 1 then goes on to obtain the input term a2 committed by the user. Because the input terms a1 and a2 were committed by the user one after another, they constitute an input term record that the user has historically committed in segments.
- In step S407, the user equipment 1 establishes or updates the local lexicon according to the context relationships between the input term records historically committed in segments.
- Specifically, based on the input term records a1 and a2 acquired in step S406, the user equipment 1 establishes or updates the local lexicon according to the context relationship between the two records, combined with how frequently the user has entered them. For example, it stores the segmented input term records and their context relationships in the local lexicon, such as by recording a2 as the following term of a1 in a vector structure under the attribute name nextentry.
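A minimal sketch of this learning step, assuming the lexicon can be modelled as a mapping from each term to a counter of its following terms, is shown below. The `Counter` is a stand-in for the nextentry vector plus an input frequency; the actual storage layout is not given in the source.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List


def learn_commits(lexicon: DefaultDict[str, Counter],
                  committed_terms: List[str]) -> None:
    """Record every adjacent pair of committed terms as term -> following term."""
    for prev, nxt in zip(committed_terms, committed_terms[1:]):
        lexicon[prev][nxt] += 1


lexicon: DefaultDict[str, Counter] = defaultdict(Counter)
learn_commits(lexicon, ["a1", "a2"])   # user committed a1, then a2
learn_commits(lexicon, ["a1", "a2"])
learn_commits(lexicon, ["a1", "ax"])
# Following terms of a1, most frequently committed first.
print([term for term, _ in lexicon["a1"].most_common()])  # ['a2', 'ax']
```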
- The method further includes a step S409 (not shown), in which the user equipment 1 performs word segmentation on the input term record to obtain at least one term granularity; in step S407, the user equipment 1 then establishes or updates the local lexicon according to the context relationships between the term granularities.
- A length threshold may be preset; when the length of an input term record exceeds this threshold, the record is considered long and word segmentation is required.
- In step S409, the user equipment 1 performs word segmentation on the input term record acquired in step S406 to obtain at least one term granularity. In step S407, the user equipment 1 then stores the term granularities obtained from segmentation, together with the context relationships between them, in the local lexicon, for example in the tree structure of the local lexicon, to establish or update the local lexicon.
- For example, the user has historically committed two long input term records, A and B, to the screen in separate segments. In step S406, the user equipment 1 acquires both records; in step S409, it segments them, splitting input term record A into the three term granularities a1, a2 and a3, and input term record B into the three term granularities b1, b2 and b3.
- The user equipment 1 first learns A, B, a1, a2, a3, b1, b2 and b3 each as a complete term, and then saves the context relationships between them, recording following terms in a vector structure under the attribute named nextentry. For instance, a2 is added as a following term of a1, a3 is added as a following term of a2, and B is added, in addition to b1, as a following term of a3, with B placed before b1.
- In step S407, the user equipment 1 may also record the preceding-term relationship. For example, when a3 is added as a following term of the term granularity a2, it is also recorded that, for the following term a3, the preceding term of a2 is a1, for instance by recording "a3\ra1" in the nextentry vector of a2, where "\r" separates the following term from the preceding term.
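The "a3\ra1" convention can be illustrated with the short sketch below. Only the "\r" separator and the attribute name nextentry come from the text; the helper names `add_following` and `parse_entry` and the use of a plain dictionary of lists are assumptions.

```python
from typing import Dict, List, Optional, Tuple

SEP = "\r"  # separates the following term from the recorded preceding term


def add_following(nextentry: Dict[str, List[str]], term: str,
                  following: str, preceding: Optional[str] = None) -> None:
    """Append a following term (optionally tagged with a preceding term) once."""
    entry = following if preceding is None else f"{following}{SEP}{preceding}"
    vector = nextentry.setdefault(term, [])
    if entry not in vector:
        vector.append(entry)


def parse_entry(entry: str) -> Tuple[str, Optional[str]]:
    """Split a stored entry back into (following term, preceding term or None)."""
    following, _, preceding = entry.partition(SEP)
    return following, (preceding or None)


nextentry: Dict[str, List[str]] = {}
add_following(nextentry, "a1", "a2")
add_following(nextentry, "a2", "a3", preceding="a1")
print(parse_entry(nextentry["a2"][0]))  # ('a3', 'a1')
```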
- In step S407, the user equipment 1 determines whether the term obtained by merging at least two term granularities is a high-frequency term, where the at least two term granularities have a context relationship; if the term is a high-frequency term, the local lexicon is established or updated according to that term.
- Specifically, in step S407, based on the term granularities produced by word segmentation in step S409, the user equipment 1 determines whether the term obtained by merging at least two of them is a high-frequency term, for example by checking whether the occurrence probability of the merged term in the local lexicon exceeds a predetermined threshold. If it does, the merged term is judged to be a high-frequency term and is stored in the local lexicon, for example in the corresponding node of its tree structure, to establish or update the local lexicon.
- The term granularities merged into a high-frequency term have a context relationship with one another.
- For example, the user equipment 1 acquires the input term records A and B and segments them into the term granularities a1, a2, a3, b1, b2 and b3. In step S407, the user equipment 1 determines that the term a2a3, merged from the term granularities a2 and a3 (which have a context relationship), is a high-frequency term. It then merges a2 and a3 into the term a2a3 and stores a2a3 in the local lexicon, thereby establishing or updating the local lexicon.
- As a result, when a later matching query traverses the tree structure of the local lexicon, the following term a2 or a2a3 can be obtained.
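The high-frequency check could look like the sketch below, which counts adjacent term granularities across segmented records and keeps the merged terms whose share of all adjacent pairs exceeds a threshold. Measuring the occurrence probability as a share of adjacent pairs is an assumption; the source only requires comparison against a predetermined threshold.

```python
from collections import Counter
from typing import List, Set


def high_frequency_merges(segmented_records: List[List[str]],
                          threshold: float) -> Set[str]:
    """Return merged terms whose adjacent-pair frequency exceeds the threshold."""
    pairs: Counter = Counter()
    for terms in segmented_records:
        # Adjacent granularities in one record have a context relationship.
        pairs.update(zip(terms, terms[1:]))
    total = sum(pairs.values())
    return {a + b for (a, b), n in pairs.items() if total and n / total > threshold}


# Hypothetical segmented history: a2 followed by a3 appears twice out of five pairs.
records = [["a1", "a2", "a3"], ["a2", "a3"], ["b1", "b2", "b3"]]
print(high_frequency_merges(records, threshold=0.3))  # {'a2a3'}
```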
- the present invention can be implemented in software and/or a combination of software and hardware, for example, using an application specific integrated circuit (ASIC), a general purpose computer, or any other similar hardware device.
- the software program of the present invention may be executed by a processor to implement the steps or functions described above.
- the software program (including related data structures) of the present invention can be stored in a computer readable recording medium such as a RAM memory, a magnetic or optical drive or a floppy disk and the like.
- some of the steps or functions of the present invention may be implemented in hardware, for example, as a circuit that cooperates with a processor to perform various steps or functions.
- a portion of the present invention can be applied as a computer program product, such as computer program instructions, which, when executed by a computer, can invoke or provide a method and/or solution in accordance with the present invention.
- The program instructions for invoking the method of the present invention may be stored in a fixed or removable recording medium, and/or transmitted as a data stream in a broadcast or other signal-bearing medium, and/or stored in the working memory of the computer device on which the program instructions run.
- An embodiment in accordance with the present invention includes a device comprising a memory for storing computer program instructions and a processor for executing them, where the computer program instructions, when executed by the processor, trigger the device to operate according to the methods and/or technical solutions of the embodiments of the present invention described above.
Abstract
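A method and device for providing input candidate terms based on a local lexicon. The method comprises: acquiring an input character string entered by a user; performing a matching query in a local lexicon according to the input character string to obtain a corresponding candidate term, wherein the local lexicon is established or updated according to input term records that the user has historically committed to the screen in segments; querying the local lexicon for the following term corresponding to the last sub-term included in the candidate term; and merging the candidate term with the following term to obtain an input candidate term, which is provided to the user. Compared with the prior art, the present invention accurately and effectively expands the range of input candidate terms provided, so that they better meet the user's input needs and improve the user's input experience.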
Description
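A method and device for providing input candidate terms based on a local lexicon
Technical Field
The present invention relates to the field of input method technology, and in particular to a technique for providing input candidate terms based on a local lexicon.
Background Art
In the prior art, an input method generally learns only from content that the user enters completely in one go, and does not learn the context relationships between segments committed to the screen separately. For example, suppose the user enters the input string ab and commits the corresponding input term a1, then immediately enters the input string cd and commits the corresponding input term a2. An existing input method does not learn the context relationship between the two separately committed input terms a1 and a2; it learns the input term a1a2 only when the user enters the input string abcd in one go and commits the input term a1a2.
Clearly, this approach hinders the updating of the local lexicon, limits the input candidate terms that can be matched, and degrades the user's input experience.
Moreover, prior-art input methods do not offer predicted input candidate terms based on the input term records that the user has historically committed in segments, which further limits the input candidate terms provided to the user and degrades the input experience.
Therefore, how to provide input candidate terms effectively based on a local lexicon, and so improve the user's input experience, has become a problem that those skilled in the art urgently need to solve.
Summary of the Invention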
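The object of the present invention is to provide a method and device for providing input candidate terms based on a local lexicon.
According to one aspect of the present invention, a method for providing input candidate terms based on a local lexicon is provided, wherein the method comprises the following steps:
a. acquiring an input character string entered by a user;
b. performing a matching query in a local lexicon according to the input character string to obtain a corresponding candidate term, wherein the local lexicon is established or updated according to input term records that the user has historically committed to the screen in segments;
c. performing a matching query in the local lexicon for the last sub-term included in the candidate term, to determine the following term corresponding to the last sub-term;
d. merging the candidate term with the following term to obtain an input candidate term to be provided to the user;
e. providing the input candidate term to the user.
According to another aspect of the present invention, user equipment for providing input candidate terms based on a local lexicon is also provided, wherein the equipment comprises:
first obtaining means for acquiring an input character string entered by a user;
first matching means for performing a matching query in a local lexicon according to the input character string to obtain a corresponding candidate term, wherein the local lexicon is established or updated according to input term records that the user has historically committed to the screen in segments;
second matching means for performing a matching query in the local lexicon for the last sub-term included in the candidate term, to determine the following term corresponding to the last sub-term;
merging means for merging the candidate term with the following term to obtain an input candidate term to be provided to the user;
providing means for providing the input candidate term to the user.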
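Compared with the prior art, the present invention, according to the input character string entered by the user, performs a matching query in a local lexicon established or updated from the input term records that the user has historically committed in segments, and obtains the corresponding candidate term. It then queries the local lexicon for the following term corresponding to the last sub-term included in the candidate term, merges the candidate term with that following term, and provides the resulting input candidate term to the user. This accurately and effectively expands the range of input candidate terms provided, so that they better meet the user's input needs and improve the user's input experience.
Further, the present invention learns from the input term records that the user commits in segments. Using the context relationships of these records, it can determine the predicted input candidate terms from the preceding term the user has just committed and the relationship weight between two committed terms in the input history, thereby improving recall both for the current input and for prediction.
Further, the present invention can also learn from a completely entered input term record by splitting it into finer granularities or merged granularities. That is, if the user commits a long input term record, the invention segments it at a reasonable granularity and learns each reasonably sized term granularity, so that prediction does not produce an input candidate term of unreasonable length.
Brief Description of the Drawings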
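Other features, objects and advantages of the present invention will become more apparent from the detailed description of non-limiting embodiments given with reference to the following drawings:
FIG. 1 is a schematic diagram of a device for providing input candidate terms based on a local lexicon according to one aspect of the present invention;
FIG. 2 is a schematic diagram of a device for providing input candidate terms based on a local lexicon according to a preferred embodiment of the present invention;
FIG. 3 is a flow chart of a method for providing input candidate terms based on a local lexicon according to another aspect of the present invention;
FIG. 4 is a flow chart of a method for providing input candidate terms based on a local lexicon according to a preferred embodiment of the present invention.
The same or similar reference numerals in the drawings denote the same or similar components.
Detailed Description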
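The present invention is described in further detail below with reference to the drawings.
FIG. 1 is a schematic diagram of a device for providing input candidate terms based on a local lexicon according to one aspect of the present invention. The user equipment 1 includes first obtaining means 101, first matching means 102, second matching means 103, merging means 104 and providing means 105.
The user equipment 1 includes, but is not limited to, any electronic product capable of human-computer interaction with the user via a keyboard, mouse, remote control, touchpad, handwriting device or the like, such as a computer, mobile phone, PDA, tablet computer, game console or IPTV. Those skilled in the art should understand that the above user equipment is only an example; other existing or future user equipment, if applicable to the present invention, should also fall within the scope of protection of the present invention and is incorporated herein by reference.
The first obtaining means 101 acquires an input character string entered by the user. Specifically, the user enters the input character string in the user equipment 1 through interaction with it, and the first obtaining means 101 obtains the string by calling, one or more times, an application program interface (API) provided by the user equipment 1.
Those skilled in the art should understand that the above ways of entering and acquiring an input character string are only examples; other existing or future ways of entering or acquiring an input character string, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.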
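The first matching means 102 performs a matching query in the local lexicon according to the input character string and obtains a corresponding candidate term, where the local lexicon is established or updated according to the input term records that the user has historically committed to the screen in segments. Specifically, the first matching means 102 queries the local lexicon with the input character string acquired by the first obtaining means 101 and obtains a candidate term whose pronunciation matches the input character string, for example according to the mapping between pronunciations and terms stored in the local lexicon, or by traversing the terms stored in a tree structure in the local lexicon. For example, the user enters the input string abcdef through interaction with the user equipment 1, where ab is assumed to be the pronunciation of a1, cd the pronunciation of a2, and ef the pronunciation of a3. The first obtaining means 101 acquires the input string abcdef through interaction with the user equipment 1. The first matching means 102 searches the local lexicon according to the input string and directly finds the candidate term a1a2a3, whose pronunciation matches the input string. Alternatively, the first matching means 102 separately finds a1 matching the pronunciation ab, a2 matching cd, and a3 matching ef, and splices the three to obtain the candidate term a1a2a3, whose pronunciation matches the input string abcdef. Alternatively, the first matching means 102 starts from a1 and finds its following term a2; because the pronunciation of a2 matches cd in the input string abcdef, a2 is spliced onto a1 to form a1a2. It then finds a3, the following term of a2; because the pronunciation of a3 matches ef in the input string, a3 is spliced onto a1a2 to form a1a2a3, which serves as the candidate term matching the input string.
Here, the local lexicon is established or updated according to the input term records that the user has historically committed in segments; such a record consists of input terms that the user committed to the screen one after another. For example, the user previously entered the input string ab and selected the input term a1 to commit; immediately afterwards, the user entered the input string cd and selected the input term a2 to commit. The input terms a1 and a2 are then an input term record committed in segments, and the user equipment 1 stores this record in the local lexicon to establish or update the local lexicon. The specific manner of establishing or updating is described in detail in the embodiment corresponding to FIG. 2.
Those skilled in the art should understand that the above ways of matching candidate terms are only examples; other existing or future ways of matching candidate terms, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.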
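The second matching means 103 performs a matching query in the local lexicon for the last sub-term included in the candidate term, to determine the following term corresponding to that last sub-term. Specifically, based on the candidate term matched by the first matching means 102, the second matching means 103 takes the last sub-term included in the candidate term, such as the last matched sub-term a3 in the example above, and queries the local lexicon to determine the corresponding following term, for example by using the context relationships between terms stored in the local lexicon to find a following term that has a context relationship with the last sub-term.
Preferably, the second matching means 103 may continue the matching query in the local lexicon with the following term just obtained, to get the following term of that following term, that is, the term two positions after the last sub-term. The terms finally matched can be spliced with the candidate term matched by the first matching means 102 into one complete input candidate term to be provided to the user.
Those skilled in the art should understand that the above ways of matching following terms are only examples; other existing or future ways of matching following terms, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.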
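The merging means 104 merges the candidate term with the following term to obtain an input candidate term to be provided to the user. Specifically, the merging means 104 merges the candidate term matched by the first matching means 102 with the following term matched by the second matching means 103, for example by splicing the following term after the candidate term, and takes the merged result as the input candidate term. For example, the merging means 104 merges the candidate term a1a2a3 matched by the first matching means 102 with the following term b1 matched by the second matching means 103 to obtain the input candidate term a1a2a3b1.
Those skilled in the art should understand that the above way of merging is only an example; other existing or future ways of merging, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.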
提供装置 105将所述输入候选词条提供给所述用户。 具体地, 提供 装置 105将该合并装置 104合并所获得的输入候选词条,通过调用 ASP、 JSP或 PHP等页面技术, 或通过其他约定的显示方式, 提供给该用户。 这一操作可以采用任何已知的计算机提供人可读信息的技术手段, 典 型的例子如屏幕显示、扬声器播放等。以屏幕显示为例,提供装置 105 将合并装置 104合并处理所获得的输入候选词条,按一定顺序和格式 提供给所述用户, 供其选择以作具体输入。 具体地, 通过在显示器的 一个输入窗口栏中显示给用户时, 可将多个输入候选词条与输入字符 串分栏显示, 多个输入候选词条可全部列入下一栏中供用户选择。 优 选地, 可以在词条栏中仅显示一行输入候选词条, 该行输入候选词条 数目可以是缺省的也可由用户设定, 通过由用户按动特定功能键显示 上一行或下一行输入候选词条, 该特定功能键例如可以是" +"和" -"。
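Those skilled in the art should understand that the above ways of providing input candidate entries are only examples; other existing or future ways of providing input candidate entries, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.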
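Preferably, the means of the user equipment 1 operate continuously. Specifically, the first acquiring means 101 acquires the input string entered by the user; the first matching means 102 performs a matching query in the local lexicon according to the input string and obtains the corresponding candidate entry, wherein the local lexicon is built or updated from records of input entries that the user has historically committed in segments; the second matching means 103 performs a matching query in the local lexicon for the last sub-entry included in the candidate entry and determines the following entry corresponding to the last sub-entry; the merging means 104 merges the candidate entry with the following entry to obtain the input candidate entry to be provided to the user; and the providing means 105 provides the input candidate entry to the user. Here, those skilled in the art should understand that "continuously" means that the means of the user equipment 1 respectively carry out the acquisition of input strings, the acquisition and merging of candidate entries and following entries, and the provision of input candidate entries according to a set or real-time adjusted working mode, until the user equipment 1 stops acquiring input strings entered by the user for a relatively long time.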
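Here, according to the input string entered by the user, the present invention performs a matching query in a local lexicon that is built or updated from records of input entries the user has historically committed in segments, obtains the corresponding candidate entry, further obtains the corresponding following entry by a matching query in the local lexicon based on the last sub-entry included in the candidate entry, merges the candidate entry with the following entry, and provides the resulting input candidate entry to the user. This accurately and effectively widens the range of input candidate entries provided, makes them better fit the user's input needs, and improves the user's input experience.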
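Preferably, the local lexicon stores entries in a tree structure, and the matching queries performed by the first matching means 102 and the second matching means 103 include traversing the tree structure in the local lexicon using a depth-first traversal algorithm. Specifically, the local lexicon stores entries in a tree structure in which each node stores the pronunciation, the entry, the pronunciation segmentation, the entry segmentation, the following pronunciations, the following entries, and so on. The first matching means 102 and the second matching means 103 traverse this tree structure using the depth-first traversal algorithm. For example, the user enters the input string abcdef through interaction with the user equipment 1, where ab is assumed to be the pronunciation of a1, cd the pronunciation of a2, and ef the pronunciation of a3; the first acquiring means 101 acquires the input string abcdef through interaction with the user equipment 1. According to the input string, the first matching means 102 performs matching queries in the local lexicon with a, ab, abc, abcd, abcde, ... respectively, to find words with matching pronunciations; suppose the prefix words a1 and ax are found, where a prefix word is an entry whose pronunciation matches part of the input string. Next, the first matching means 102 uses the depth-first traversal algorithm to trace the following entries of each prefix word; for example, from a1 it finds the following entry a2 of a1, and since the pronunciation of a2 matches cd in the input string abcdef, a2 is appended to a1 to form a1a2. Entries that do not match are skipped, and concatenation continues until a word whose pronunciation is identical to the input string is obtained, such as a1a2a3, which serves as the candidate entry. Alternatively, the first matching means 102 directly finds in the local lexicon a word A whose pronunciation matches the input string abcdef and uses it as the candidate entry. Preferably, after the first matching means 102 has matched the prefix words, it may trace their following entries one by one in the order in which the user historically entered those prefix words; for example, for the prefix words a1 and ax, if the user most recently entered a1, the first matching means 102 traces the following entries starting from a1.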
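The prefix-word search followed by depth-first tracing can be sketched as below. The `Node` fields, the sample lexicon (including the entry a4 and the reading assigned to ax) and the function names are assumptions for illustration; the specification lists more per-node fields than are modelled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Simplified lexicon node: an entry, its pronunciation, its following entries."""
    entry: str
    reading: str
    next_entries: List[str] = field(default_factory=list)


def build_lexicon() -> Dict[str, Node]:
    # Hypothetical contents; a4 and the reading of ax are invented for the demo.
    nodes = [
        Node("a1", "ab", ["a2"]),
        Node("ax", "abc", ["a3"]),
        Node("a2", "cd", ["a4", "a3"]),
        Node("a3", "ef"),
        Node("a4", "zz"),
    ]
    return {node.entry: node for node in nodes}


def trace(node: Node, remaining: str, lexicon: Dict[str, Node],
          path: List[str]) -> Optional[List[str]]:
    """Depth-first: consume node.reading from remaining, then try following entries."""
    if not remaining.startswith(node.reading):
        return None  # pronunciation does not match: skip this branch
    path = path + [node.entry]
    rest = remaining[len(node.reading):]
    if not rest:
        return path  # the whole input string has been matched
    for name in node.next_entries:
        found = trace(lexicon[name], rest, lexicon, path)
        if found is not None:
            return found
    return None


def first_match(input_string: str, lexicon: Dict[str, Node]) -> Optional[List[str]]:
    # Prefix words: entries whose pronunciation is a prefix of the input string.
    prefix_words = [n for n in lexicon.values() if input_string.startswith(n.reading)]
    for node in prefix_words:
        found = trace(node, input_string, lexicon, [])
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    print(first_match("abcdef", build_lexicon()))
```

Testing `input_string.startswith(reading)` for every entry is equivalent to querying the lexicon with a, ab, abc, and so on; a real implementation would more likely index the pronunciations.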
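Here, the second matching means 103 likewise traverses the tree structure in the local lexicon using the depth-first traversal algorithm. Continuing the example above, based on the candidate entry a1a2a3 matched by the first matching means 102 and on its last sub-entry a3, the second matching means 103 uses the depth-first traversal algorithm to trace the following entries of a3 and finds, for example, the following entry b1; the merging means 104 then merges the candidate entry a1a2a3 with the following entry b1 to obtain the merged result a1a2a3b1 as the input candidate entry; the providing means 105 then provides this input candidate entry to the user.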
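Preferably, the second matching means 103 may continue traversing along the following entry b1, for example obtaining the following entry b2 of b1, so that b2 is the entry two positions after a3; the second matching means 103 may traverse further along b2 to obtain b3. Assuming that a1a2a3b1b2b3 is a complete entry, the merging means 104 may merge a1a2a3 with b1, b2 and b3 to obtain a1a2a3b1b2b3 as the input candidate entry; the providing means 105 then provides this input candidate entry to the user.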
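Preferably, because the time complexity of the depth-first traversal algorithm is relatively high, the first matching means 102 and the second matching means 103 may terminate the traversal as soon as the required number of input candidate entries has been found. Here, the required number of input candidate entries may be preset by the system or set by the user.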
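Early termination can be sketched as a bounded depth-first walk over following entries. The function name `predict`, the parameter `max_results` (standing in for the required result count) and `max_depth` (a guard against cycles, loosely analogous to a prediction length) are assumptions, as is the sample data.

```python
from typing import Dict, List


def predict(last_sub_entry: str, next_entries: Dict[str, List[str]],
            max_results: int, max_depth: int = 3) -> List[str]:
    """Collect chains of following entries depth-first, stopping at max_results."""
    results: List[str] = []

    def visit(entry: str, chain: str, depth: int) -> None:
        if len(results) >= max_results or depth == max_depth:
            return
        for following in next_entries.get(entry, []):
            if len(results) >= max_results:
                return  # enough candidates: stop the traversal early
            extended = chain + following
            results.append(extended)
            visit(following, extended, depth + 1)

    visit(last_sub_entry, "", 0)
    return results


if __name__ == "__main__":
    sample = {"a3": ["b1", "c1"], "b1": ["b2"], "b2": ["b3"]}
    print(predict("a3", sample, max_results=2))
```

With `max_results=2` the branch through c1 is never visited, which is where the saving comes from.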
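More preferably, the matching query includes traversing the tree structure in the local lexicon using the depth-first traversal algorithm according to the context relationships of the entries stored in the nodes of the tree structure. Specifically, the first matching means 102 and the second matching means 103 traverse the tree structure using the depth-first traversal algorithm according to the context relationships of the entries stored in its nodes, such as an entry's relationship to the entry two positions before it (its "preceding-preceding" relationship). For example, suppose the user has historically committed in segments a1, a2, a3; then a2, a4; then a2, a5. The a2 node of the tree structure in the local lexicon then corresponds to the preceding entry a1 and to the following entries a3, a4 and a5; that is, entry a3 has a preceding-preceding relationship with entry a1. Therefore, if the first matching means 102 or the second matching means 103 gives priority to the preceding-preceding relationship during matching, then when the two means query the following entries of a2 and find a3, a4 and a5 at the same time, they preferentially append entry a3, because a3 additionally corresponds to the preceding-preceding entry a1, and continue the traversal from node a3. Preferably, when traversing the nodes of each level depth-first, the two means may first consider the context relationships of the entries stored in the nodes and then consider the chronological order in which the user historically entered those entries.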
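The preference for following entries that were seen after the same preceding entry can be sketched like this. Representing each following entry as a pair of the entry and the entry that came before the current node is an assumption made for the sketch, and `order_following` is a hypothetical name.

```python
from typing import List, Optional, Tuple

# Following entries of node a2, each paired with the entry that preceded a2
# when that following entry was recorded (None when no such context exists).
A2_FOLLOWING: List[Tuple[str, Optional[str]]] = [
    ("a4", None),
    ("a3", "a1"),
    ("a5", None),
]


def order_following(following: List[Tuple[str, Optional[str]]],
                    entry_before_node: Optional[str]) -> List[str]:
    """Put following entries whose recorded context matches the current one first."""

    def rank(item: Tuple[str, Optional[str]]) -> int:
        _, recorded = item
        matches = entry_before_node is not None and recorded == entry_before_node
        return 0 if matches else 1

    ordered: List[str] = []
    # sorted() is stable, so entries with equal rank keep their stored order.
    for entry, _ in sorted(following, key=rank):
        if entry not in ordered:
            ordered.append(entry)
    return ordered


if __name__ == "__main__":
    print(order_following(A2_FOLLOWING, "a1"))
```

Because the sort is stable, a secondary ordering such as recency of input can be expressed simply by keeping the stored list in that order.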
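Preferably, the user equipment 1 further includes a priority determining means (not shown), which determines the priorities of the input candidate entries according to the user's historical input order; the providing means 105 then provides the input candidate entries to the user according to those priorities. Specifically, the priority determining means determines the priorities of the input candidate entries according to the user's historical input order, for example by sorting them by how recently the user entered each input candidate entry, with the most recently entered input candidate entry having the highest priority; the providing means 105 then provides the input candidate entries to the user according to the priorities determined by the priority determining means.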
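More preferably, the priority determining means determines the priorities of the input candidate entries according to the user's historical input order in combination with entry attributes of the input candidate entries, wherein the entry attributes include at least one of the following:
- the probability attribute of the input candidate entry with respect to the local lexicon;
- the number of times the user has historically entered the input candidate entry;
- the transition probabilities between the sub-entries included in the input candidate entry;
- the prediction length corresponding to the input candidate entry.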
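Specifically, the priority determining means determines the priorities of the input candidate entries according to the user's historical input order in combination with the entry attributes of the input candidate entries, such as the probability attribute of the input candidate entry with respect to the local lexicon, the number of times the user has historically entered the input candidate entry, the transition probabilities between the sub-entries included in the input candidate entry, and the prediction length corresponding to the input candidate entry. For example, the historical input order and each entry attribute may each be assigned a score and a weight; the priority determining means obtains a score for each input candidate entry by weighted calculation and then determines the priority of each input candidate entry according to that score. The weights may be preset by the system or set by the user.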
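A weighted combination of this kind might look like the sketch below. The feature names (`recency`, `count`), the numeric values and the weights are invented for illustration; the specification does not prescribe a particular scoring formula.

```python
from typing import Dict, List


def score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-attribute scores; attributes without a weight count as 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def rank(candidates: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> List[str]:
    """Order input candidate entries from highest to lowest weighted score."""
    return sorted(candidates, key=lambda c: score(candidates[c], weights), reverse=True)


if __name__ == "__main__":
    # Hypothetical per-candidate attribute scores, normalised to the range 0..1.
    demo = {
        "a1a2a3": {"recency": 1.0, "count": 0.2},
        "a1a2a3b1": {"recency": 0.5, "count": 0.9},
    }
    print(rank(demo, {"recency": 0.7, "count": 0.3}))
    # Setting a weight to zero removes that attribute from consideration.
    print(rank(demo, {"recency": 0.0, "count": 1.0}))
```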
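Here, the probability attribute of the input candidate entry with respect to the local lexicon is, for example, the probability of occurrence of the input candidate entry in the local lexicon, which can be calculated from the number of occurrences of the input candidate entry in the local lexicon and the number of occurrences of all entries in the local lexicon. The number of times the user has historically entered the input candidate entry can be obtained by counting. The transition probabilities between the sub-entries included in the input candidate entry can be calculated from the transition probabilities of a language model; for the input candidate entry ab, for example, the transition probability is the probability that the following entry is b given that the preceding entry is a. The prediction length corresponding to the input candidate entry is, for example, the maximum number of sub-entries that one input candidate entry may include; it may be preset by the system or set by the user.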
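The two probabilities can be estimated from simple counts, as in the sketch below. Estimating the transition probability from bigram counts is an assumption that stands in for the unspecified language model; the function names and sample counts are hypothetical.

```python
from collections import Counter
from typing import Tuple


def occurrence_probability(entry: str, counts: "Counter[str]") -> float:
    """Occurrences of the entry divided by occurrences of all entries."""
    total = sum(counts.values())
    return counts[entry] / total if total else 0.0


def transition_probability(preceding: str, following: str,
                           bigrams: "Counter[Tuple[str, str]]") -> float:
    """Estimate P(following | preceding) from counts of adjacent entry pairs."""
    total = sum(c for (prev, _), c in bigrams.items() if prev == preceding)
    return bigrams[(preceding, following)] / total if total else 0.0


if __name__ == "__main__":
    counts = Counter({"a": 3, "b": 1})
    bigrams = Counter({("a", "b"): 2, ("a", "c"): 2})
    print(occurrence_probability("a", counts))
    print(transition_probability("a", "b", bigrams))
```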
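Preferably, the priority determining means may determine the priorities of the input candidate entries according to the user's historical input order in combination with any number of the above entry attributes; for example, the entry attributes to be considered can be selected by setting the weights of the other entry attributes to zero.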
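Those skilled in the art should understand that the above entry attributes are only examples; other existing or future ways of providing input candidate entries, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.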
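Preferably, the priority determining means may also determine the priorities of the input candidate entries as follows: for example, the exactly matched input candidate entry and the complete input candidate entry are sorted according to the user's historical input order, and the input candidate entry that predicts only one following entry is placed immediately after the exactly matched input candidate entry. For example, the user enters the input string abcdef through interaction with the user equipment 1, where ab is assumed to be the pronunciation of a1, cd the pronunciation of a2, and ef the pronunciation of a3; the entry a1a2a3 matched by the first matching means 102 according to the input string is then the exactly matched input candidate entry. After the second matching means 103, based on the last sub-entry a3, finds the following entry b1 using the depth-first traversal algorithm, the entry a1a2a3b1 concatenated by the merging means 104 is the input candidate entry that predicts only one following entry. The second matching means 103 then continues traversing along the following entry b1, obtains the following entry b2 of b1, and traverses along b2 to obtain b3; assuming that a1a2a3b1b2b3 is a complete entry, the entry a1a2a3b1b2b3 obtained by the merging means 104 by merging a1a2a3 with b1, b2 and b3 is the complete input candidate entry. If the user entered the exactly matched input candidate entry a1a2a3 more recently than the complete input candidate entry a1a2a3b1b2b3, the priorities determined by the priority determining means are, from high to low: a1a2a3, a1a2a3b1, a1a2a3b1b2b3.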
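FIG. 2 is a schematic diagram of a device for providing input candidate entries based on a local lexicon according to a preferred embodiment of the present invention. The user equipment 1 further includes a second acquiring means 206 and an updating means 207. The preferred embodiment is described in detail below with reference to FIG. 2. Specifically, the second acquiring means 206 acquires the records of input entries that the user has historically committed in segments; the updating means 207 builds or updates the local lexicon according to the context relationships between those records; the first acquiring means 201 acquires the input string entered by the user; the first matching means 202 performs a matching query in the local lexicon according to the input string and obtains the corresponding candidate entry, wherein the local lexicon is built or updated from records of input entries that the user has historically committed in segments; the second matching means 203 performs a matching query in the local lexicon for the last sub-entry included in the candidate entry and determines the following entry corresponding to the last sub-entry; the merging means 204 merges the candidate entry with the following entry to obtain the input candidate entry to be provided to the user; and the providing means 205 provides the input candidate entry to the user. Here, the first acquiring means 201, the first matching means 202, the second matching means 203, the merging means 204 and the providing means 205 are identical or substantially identical to the corresponding means shown in FIG. 1; they are therefore not described again here and are incorporated herein by reference.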
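The second acquiring means 206 acquires the records of input entries that the user has historically committed in segments. Specifically, the user commits input entries in segments through interaction with the user equipment 1, and the second acquiring means 206 acquires the record of those input entries by calling an application programming interface (API) provided by the user equipment 1 or in another agreed manner. For example, the user previously entered the input string ab and selected the input entry a1 to commit; the second acquiring means 206 acquires, through interaction with the user equipment 1, the committed input entry a1 as a record of an input entry historically committed by the user. Immediately afterwards, the user entered the input string cd and selected the input entry a2 to commit; the second acquiring means 206 again acquires, through interaction with the user equipment 1, the committed input entry a2 as a record of an input entry historically committed by the user. Because the input entries a1 and a2 were committed by the user one right after the other, they constitute a record of input entries that the user has historically committed in segments.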
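The updating means 207 builds or updates the local lexicon according to the context relationships between the records of input entries historically committed in segments. Continuing the example above, based on the input entry records a1 and a2 acquired by the second acquiring means 206, the updating means 207 builds or updates the local lexicon according to the context relationship between the two records in combination with their input frequencies, for example by storing the records and their context relationship in the local lexicon; for instance, a2 is recorded as a following entry of a1 in a vector structure whose attribute name is nextentry, thereby building or updating the local lexicon.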
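Preferably, the user equipment 1 further includes a word segmentation means (not shown), which segments the input entry records to obtain at least one entry granule; the updating means 207 then builds or updates the local lexicon according to the context relationships between the entry granules. Specifically, when an input entry record committed by the user is long (for example, a length threshold may be preset, and the record is judged to be long and in need of segmentation when its length exceeds that threshold), the word segmentation means segments the input entry record acquired by the second acquiring means 206 to obtain at least one entry granule. The updating means 207 then stores the entry granules obtained by segmentation, together with the context relationships between them, in the local lexicon, for example in the tree structure of the local lexicon, thereby building or updating the local lexicon.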
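For example, the user has historically committed in segments two long input entry records A and B, and the second acquiring means 206 acquires these two records; the word segmentation means segments them, splitting record A into the three entry granules a1, a2 and a3, and record B into the three entry granules b1, b2 and b3. The updating means 207 then first learns A, B, a1, a2, a3, b1, b2 and b3 each as a complete entry and then saves the context relationships between them. Following entries are recorded in a vector structure whose attribute name is nextentry: a2 is added to the following entries of a1, a3 is added to the following entries of a2, and both b1 and B are added to the following entries of a3, with B placed before b1. Next, the updating means 207 records the preceding-preceding relationships: for example, when a3 is added as a following entry of the entry granule a2, it is also recorded that the preceding entry of a2 is a1 in that case, for instance by adding the record "a3\ra1" to the nextentry vector of a2, where "\r" separates the following entry from the preceding-preceding entry.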
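Learning the nextentry records from one segmented input can be sketched as follows. Storing the annotated record in place of the plain following entry whenever a preceding granule exists is an assumption, since the text leaves open whether both forms are kept; linking across consecutive records (B and b1 after a3) is left out of the sketch, and the names `learn`, `SEP` and `nextentry` as a dictionary of lists are illustrative.

```python
from collections import defaultdict
from typing import DefaultDict, List

SEP = "\r"  # separates the following entry from the preceding-preceding entry


def learn(granules: List[str], nextentry: DefaultDict[str, List[str]]) -> None:
    """Record, for each granule, its following granule and, if any, its own predecessor."""
    for i in range(len(granules) - 1):
        current, following = granules[i], granules[i + 1]
        if i == 0:
            record = following  # no preceding granule to annotate
        else:
            record = following + SEP + granules[i - 1]
        if record not in nextentry[current]:
            nextentry[current].append(record)


if __name__ == "__main__":
    store: DefaultDict[str, List[str]] = defaultdict(list)
    learn(["a1", "a2", "a3"], store)
    print({key: [repr(r) for r in records] for key, records in store.items()})
```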
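More preferably, the updating means 207 judges whether an entry obtained by merging at least two entry granules is a high-frequency entry, wherein the at least two entry granules have a context relationship; if the entry is a high-frequency entry, the local lexicon is built or updated according to that entry. Specifically, based on the entry granules produced by the word segmentation means, the updating means 207 judges whether an entry obtained by merging at least two entry granules is a high-frequency entry, for example by judging whether the probability of occurrence of the merged entry in the local lexicon is greater than a predetermined probability threshold; if it is, the merged entry is judged to be a high-frequency entry and is then stored in the local lexicon, for example in the corresponding node of the tree structure of the local lexicon, so as to build or update the local lexicon. Here, the two entry granules merged into the high-frequency entry have a context relationship. Continuing the previous example, the input entry records acquired by the second acquiring means 206 are A and B; the word segmentation means segments the two records to obtain the entry granules a1, a2, a3, b1, b2 and b3; the updating means 207 judges that the entry a2a3 obtained by merging the entry granules a2 and a3 is a high-frequency entry, where a2 and a3 have a context relationship, then merges a2 and a3 to obtain the entry a2a3 and stores it in the local lexicon, thereby building or updating the local lexicon. In this way, the next time a matching query traverses the tree structure of the local lexicon, tracing from a1 yields the following entry a2 or a2a3.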
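The high-frequency check on merged granules can be sketched as follows, assuming occurrence counts for merged entries are already available; the counts, the threshold value and the function name `merge_high_frequency` are hypothetical, and only adjacent pairs of granules are considered.

```python
from collections import Counter
from typing import List


def merge_high_frequency(granules: List[str], counts: "Counter[str]",
                         threshold: float) -> List[str]:
    """Return merged adjacent granule pairs whose occurrence probability exceeds threshold."""
    total = sum(counts.values())
    merged: List[str] = []
    for left, right in zip(granules, granules[1:]):
        combined = left + right
        if total and counts[combined] / total > threshold:
            merged.append(combined)
    return merged


if __name__ == "__main__":
    # Hypothetical occurrence counts in the local lexicon.
    counts = Counter({"a2a3": 30, "a1a2": 1, "a1": 69})
    print(merge_high_frequency(["a1", "a2", "a3"], counts, threshold=0.1))
```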
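FIG. 3 is a flowchart of a method for providing input candidate entries based on a local lexicon according to another aspect of the present invention.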
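In step S301, the user equipment 1 acquires the input string entered by the user. Specifically, the user enters an input string into the user equipment 1 through interaction with it, and in step S301 the user equipment 1 acquires that input string by calling, once or several times, an application programming interface (API) provided by the user equipment 1.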
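Those skilled in the art should understand that the above ways of entering and acquiring an input string are only examples; other existing or future ways of entering or acquiring an input string, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.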
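In step S302, the user equipment 1 performs a matching query in the local lexicon according to the input string and obtains the corresponding candidate entry, wherein the local lexicon is built or updated from records of input entries that the user has historically committed in segments. Specifically, in step S302, according to the input string acquired in step S301, the user equipment 1 performs a matching query in the local lexicon and obtains the candidate entry corresponding to that input string, for example by using the mapping between pronunciations and entries stored in the local lexicon, or by traversing the entries stored in a tree structure in the local lexicon, so as to match a candidate entry whose pronunciation matches the input string. For example, the user enters the input string abcdef through interaction with the user equipment 1, where ab is assumed to be the pronunciation of a1, cd the pronunciation of a2, and ef the pronunciation of a3; in step S301, the user equipment 1 acquires the input string abcdef. In step S302, the user equipment 1 performs a matching lookup in the local lexicon according to the input string and directly finds the candidate entry a1a2a3 whose pronunciation matches the input string; or, in step S302, the user equipment 1 separately finds a1 matching the pronunciation ab, a2 matching the pronunciation cd, and a3 matching the pronunciation ef, and concatenates the three to obtain the candidate entry a1a2a3 whose pronunciation matches the input string abcdef; or, in step S302, the user equipment 1 finds, starting from a1, the following entry a2 of a1, and since the pronunciation of a2 matches cd in the input string abcdef, appends a2 to a1 to form a1a2; then, starting from a2, it finds the following entry a3 of a2, and since the pronunciation of a3 matches ef in the input string abcdef, appends a3 to a1a2 to form a1a2a3, which serves as the candidate entry matching the input string.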
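Here, the local lexicon is built or updated from records of input entries that the user has historically committed in segments; such a record consists of input entries that the user committed to the screen one right after another. For example, the user previously entered the input string ab and selected the input entry a1 to commit, and immediately afterwards entered the input string cd and selected the input entry a2 to commit; the input entries a1 and a2 then constitute a record of input entries committed in segments, and the user equipment 1 stores this record in the local lexicon so as to build or update the local lexicon. The specific way of building or updating is described in detail in the embodiment corresponding to FIG. 2.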
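Those skilled in the art should understand that the above ways of matching candidate entries are only examples; other existing or future ways of matching candidate entries, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.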
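In step S303, the user equipment 1 performs a matching query in the local lexicon for the last sub-entry included in the candidate entry and determines the following entry corresponding to that last sub-entry. Specifically, in step S303, based on the candidate entry obtained in step S302 and on the last sub-entry included in it (such as the last matched sub-entry a3 in the example above), the user equipment 1 performs a matching query in the local lexicon and determines the following entry corresponding to the last sub-entry, for example by using the context relationships between entries stored in the local lexicon to find a following entry that has a context relationship with the last sub-entry.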
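Preferably, in step S303, the user equipment 1 may, based on the following entry thus matched, continue the matching query in the local lexicon to obtain the following entry of that following entry, that is, the entry two positions after the last sub-entry. The entries finally matched and the candidate entry matched by the user equipment 1 in step S302 can be concatenated into one complete input candidate entry to be provided to the user.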
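Those skilled in the art should understand that the above ways of matching following entries are only examples; other existing or future ways of matching following entries, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.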
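In step S304, the user equipment 1 merges the candidate entry with the following entry to obtain the input candidate entry to be provided to the user. Specifically, in step S304, the user equipment 1 merges the candidate entry matched in step S302 with the following entry matched in step S303, for example by appending the following entry matched in step S303 to the candidate entry matched in step S302, and takes the merged result as the input candidate entry. For example, in step S304, the user equipment 1 merges the candidate entry a1a2a3 matched in step S302 with the following entry b1 matched in step S303 to obtain the input candidate entry a1a2a3b1.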
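Those skilled in the art should understand that the above way of merging is only an example; other existing or future ways of merging, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.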
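In step S305, the user equipment 1 provides the input candidate entry to the user. Specifically, in step S305, the user equipment 1 provides the input candidate entry obtained by merging in step S304 to the user by invoking page technologies such as ASP, JSP or PHP, or through another agreed display method. This operation may use any known technical means by which a computer presents human-readable information, typical examples being on-screen display and speaker playback. Taking on-screen display as an example, in step S305 the user equipment 1 provides the input candidate entries obtained by merging in step S304 to the user in a certain order and format, for the user to select for actual input. Specifically, when they are shown to the user in an input window bar on the display, the input candidate entries and the input string may be displayed in separate bars, and all of the input candidate entries may be listed in the lower bar for the user to choose from. Preferably, only one row of input candidate entries is displayed in the entry bar; the number of input candidate entries in that row may be a default value or may be set by the user, and the previous or next row of input candidate entries is displayed when the user presses specific function keys, which may be, for example, "+" and "-".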
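Those skilled in the art should understand that the above ways of providing input candidate entries are only examples; other existing or future ways of providing input candidate entries, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.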
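Preferably, the steps of the user equipment 1 are carried out continuously. Specifically, in step S301, the user equipment 1 acquires the input string entered by the user; in step S302, the user equipment 1 performs a matching query in the local lexicon according to the input string and obtains the corresponding candidate entry, wherein the local lexicon is built or updated from records of input entries that the user has historically committed in segments; in step S303, the user equipment 1 performs a matching query in the local lexicon for the last sub-entry included in the candidate entry and determines the following entry corresponding to the last sub-entry; in step S304, the user equipment 1 merges the candidate entry with the following entry to obtain the input candidate entry to be provided to the user; in step S305, the user equipment 1 provides the input candidate entry to the user. Here, those skilled in the art should understand that "continuously" means that the steps of the user equipment 1 respectively carry out the acquisition of input strings, the acquisition and merging of candidate entries and following entries, and the provision of input candidate entries according to a set or real-time adjusted working mode, until the user equipment 1 stops acquiring input strings entered by the user for a relatively long time.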
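Here, according to the input string entered by the user, the present invention performs a matching query in a local lexicon that is built or updated from records of input entries the user has historically committed in segments, obtains the corresponding candidate entry, further obtains the corresponding following entry by a matching query in the local lexicon based on the last sub-entry included in the candidate entry, merges the candidate entry with the following entry, and provides the resulting input candidate entry to the user. This accurately and effectively widens the range of input candidate entries provided, makes them better fit the user's input needs, and improves the user's input experience.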
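Preferably, the local lexicon stores entries in a tree structure, and the matching queries performed in step S302 and step S303 include traversing the tree structure in the local lexicon using a depth-first traversal algorithm. Specifically, the local lexicon stores entries in a tree structure in which each node stores the pronunciation, the entry, the pronunciation segmentation, the entry segmentation, the following pronunciations, the following entries, and so on. In step S302 and step S303, the user equipment 1 traverses this tree structure using the depth-first traversal algorithm. For example, the user enters the input string abcdef through interaction with the user equipment 1, where ab is assumed to be the pronunciation of a1, cd the pronunciation of a2, and ef the pronunciation of a3; in step S301, the user equipment 1 acquires the input string abcdef. In step S302, according to the input string, the user equipment 1 performs matching queries in the local lexicon with a, ab, abc, abcd, abcde, ... respectively, to find words with matching pronunciations; suppose the prefix words a1 and ax are found, where a prefix word is an entry whose pronunciation matches part of the input string. Next, in step S302, the user equipment 1 uses the depth-first traversal algorithm to trace the following entries of each prefix word; for example, from a1 it finds the following entry a2 of a1, and since the pronunciation of a2 matches cd in the input string abcdef, a2 is appended to a1 to form a1a2. Entries that do not match are skipped, and concatenation continues until a word whose pronunciation is identical to the input string is obtained, such as a1a2a3, which serves as the candidate entry. Alternatively, in step S302, the user equipment 1 directly finds in the local lexicon a word A whose pronunciation matches the input string abcdef and uses it as the candidate entry. Preferably, in step S302, after the user equipment 1 has matched the prefix words, it may trace their following entries one by one in the order in which the user historically entered those prefix words; for example, for the prefix words a1 and ax, if the user most recently entered a1, the user equipment 1 traces the following entries starting from a1.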
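Here, in step S303, the user equipment 1 likewise traverses the tree structure in the local lexicon using the depth-first traversal algorithm. Continuing the example above, in step S303, based on the candidate entry a1a2a3 matched in step S302 and on its last sub-entry a3, the user equipment 1 uses the depth-first traversal algorithm to trace the following entries of a3 and finds, for example, the following entry b1; then, in step S304, the user equipment 1 merges the candidate entry a1a2a3 with the following entry b1 to obtain the merged result a1a2a3b1 as the input candidate entry; then, in step S305, the user equipment 1 provides this input candidate entry to the user.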
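Preferably, in step S303, the user equipment 1 may continue traversing along the following entry b1, for example obtaining the following entry b2 of b1, so that b2 is the entry two positions after a3; the user equipment 1 may traverse further along b2 to obtain b3. Assuming that a1a2a3b1b2b3 is a complete entry, in step S304 the user equipment 1 may merge a1a2a3 with b1, b2 and b3 to obtain a1a2a3b1b2b3 as the input candidate entry, which is then provided to the user in step S305.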
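Preferably, because the time complexity of the depth-first traversal algorithm is relatively high, in step S302 and step S303 the user equipment 1 may terminate the traversal as soon as the required number of input candidate entries has been found. Here, the required number of input candidate entries may be preset by the system or set by the user.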
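More preferably, the matching query includes traversing the tree structure in the local lexicon using the depth-first traversal algorithm according to the context relationships of the entries stored in the nodes of the tree structure. Specifically, in step S302 and step S303, the user equipment 1 traverses the tree structure using the depth-first traversal algorithm according to the context relationships of the entries stored in its nodes, such as an entry's relationship to the entry two positions before it (its "preceding-preceding" relationship). For example, suppose the user has historically committed in segments a1, a2, a3; then a2, a4; then a2, a5. The a2 node of the tree structure in the local lexicon then corresponds to the preceding entry a1 and to the following entries a3, a4 and a5; that is, entry a3 has a preceding-preceding relationship with entry a1. Therefore, if in step S302 or step S303 the user equipment 1 gives priority to the preceding-preceding relationship during matching, then when it queries the following entries of a2 in these two steps and finds a3, a4 and a5 at the same time, it preferentially appends entry a3, because a3 additionally corresponds to the preceding-preceding entry a1, and continues the traversal from node a3. Preferably, when traversing the nodes of each level depth-first in these two steps, the user equipment 1 may first consider the context relationships of the entries stored in the nodes and then consider the chronological order in which the user historically entered those entries.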
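Preferably, the method further includes step S308 (not shown), in which the user equipment 1 determines the priorities of the input candidate entries according to the user's historical input order; in step S305, the user equipment 1 then provides the input candidate entries to the user according to those priorities. Specifically, in step S308, the user equipment 1 determines the priorities of the input candidate entries according to the user's historical input order, for example by sorting them by how recently the user entered each input candidate entry, with the most recently entered input candidate entry having the highest priority; then, in step S305, the user equipment 1 provides the input candidate entries to the user according to the priorities determined in step S308.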
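More preferably, in step S308, the user equipment 1 determines the priorities of the input candidate entries according to the user's historical input order in combination with entry attributes of the input candidate entries;
wherein the entry attributes include at least one of the following:
- the probability attribute of the input candidate entry with respect to the local lexicon;
- the number of times the user has historically entered the input candidate entry;
- the transition probabilities between the sub-entries included in the input candidate entry;
- the prediction length corresponding to the input candidate entry.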
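Specifically, in step S308, the user equipment 1 determines the priorities of the input candidate entries according to the user's historical input order in combination with the entry attributes of the input candidate entries, such as the probability attribute of the input candidate entry with respect to the local lexicon, the number of times the user has historically entered the input candidate entry, the transition probabilities between the sub-entries included in the input candidate entry, and the prediction length corresponding to the input candidate entry. For example, the historical input order and each entry attribute may each be assigned a score and a weight; in step S308, the user equipment 1 obtains a score for each input candidate entry by weighted calculation and then determines the priority of each input candidate entry according to that score. The weights may be preset by the system or set by the user.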
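Here, the probability attribute of the input candidate entry with respect to the local lexicon is, for example, the probability of occurrence of the input candidate entry in the local lexicon, which can be calculated from the number of occurrences of the input candidate entry in the local lexicon and the number of occurrences of all entries in the local lexicon. The number of times the user has historically entered the input candidate entry can be obtained by counting. The transition probabilities between the sub-entries included in the input candidate entry can be calculated from the transition probabilities of a language model; for the input candidate entry ab, for example, the transition probability is the probability that the following entry is b given that the preceding entry is a. The prediction length corresponding to the input candidate entry is, for example, the maximum number of sub-entries that one input candidate entry may include; it may be preset by the system or set by the user.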
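Preferably, in step S308, the user equipment 1 may determine the priorities of the input candidate entries according to the user's historical input order in combination with any number of the above entry attributes; for example, the entry attributes to be considered can be selected by setting the weights of the other entry attributes to zero.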
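Those skilled in the art should understand that the above entry attributes are only examples; other existing or future ways of providing input candidate entries, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.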
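Preferably, in step S308, the user equipment 1 may also determine the priorities of the input candidate entries as follows: for example, the exactly matched input candidate entry and the complete input candidate entry are sorted according to the user's historical input order, and the input candidate entry that predicts only one following entry is placed immediately after the exactly matched input candidate entry. For example, the user enters the input string abcdef through interaction with the user equipment 1, where ab is assumed to be the pronunciation of a1, cd the pronunciation of a2, and ef the pronunciation of a3; the entry a1a2a3 matched by the user equipment 1 in step S302 according to the input string is then the exactly matched input candidate entry. After the user equipment 1, in step S303, finds the following entry b1 from the last sub-entry a3 using the depth-first traversal algorithm, the entry a1a2a3b1 concatenated by the user equipment 1 in step S304 is the input candidate entry that predicts only one following entry. In step S303, the user equipment 1 continues traversing along the following entry b1, obtains the following entry b2 of b1, and traverses along b2 to obtain b3; assuming that a1a2a3b1b2b3 is a complete entry, the entry a1a2a3b1b2b3 obtained by the user equipment 1 in step S304 by merging a1a2a3 with b1, b2 and b3 is the complete input candidate entry. If the user entered the exactly matched input candidate entry a1a2a3 more recently than the complete input candidate entry a1a2a3b1b2b3, the priorities determined by the user equipment 1 in step S308 are, from high to low: a1a2a3, a1a2a3b1, a1a2a3b1b2b3.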
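FIG. 4 is a flowchart of a method for providing input candidate entries based on a local lexicon according to a preferred embodiment of the present invention. The preferred embodiment is described in detail below with reference to FIG. 4. Specifically, in step S406, the user equipment 1 acquires the records of input entries that the user has historically committed in segments; in step S407, the user equipment 1 builds or updates the local lexicon according to the context relationships between those records; in step S401, the user equipment 1 acquires the input string entered by the user; in step S402, the user equipment 1 performs a matching query in the local lexicon according to the input string and obtains the corresponding candidate entry, wherein the local lexicon is built or updated from records of input entries that the user has historically committed in segments; in step S403, the user equipment 1 performs a matching query in the local lexicon for the last sub-entry included in the candidate entry and determines the following entry corresponding to the last sub-entry; in step S404, the user equipment 1 merges the candidate entry with the following entry to obtain the input candidate entry to be provided to the user; in step S405, the user equipment 1 provides the input candidate entry to the user. Here, steps S401 to S405 are identical or substantially identical to the corresponding steps shown in FIG. 3; they are therefore not described again here and are incorporated herein by reference.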
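In step S406, the user equipment 1 acquires the records of input entries that the user has historically committed in segments. Specifically, the user commits input entries in segments through interaction with the user equipment 1, and in step S406 the user equipment 1 acquires the record of those input entries by calling an application programming interface (API) provided by the user equipment 1 or in another agreed manner. For example, the user previously entered the input string ab and selected the input entry a1 to commit; in step S406, the user equipment 1 acquires the committed input entry a1 as a record of an input entry historically committed by the user. Immediately afterwards, the user entered the input string cd and selected the input entry a2 to commit; in step S406, the user equipment 1 again acquires the committed input entry a2 as a record of an input entry historically committed by the user. Because the input entries a1 and a2 were committed by the user one right after the other, they constitute a record of input entries that the user has historically committed in segments.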
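In step S407, the user equipment 1 builds or updates the local lexicon according to the context relationships between the records of input entries historically committed in segments. Continuing the example above, in step S407, based on the input entry records a1 and a2 acquired in step S406, the user equipment 1 builds or updates the local lexicon according to the context relationship between the two records in combination with their input frequencies, for example by storing the records and their context relationship in the local lexicon; for instance, a2 is recorded as a following entry of a1 in a vector structure whose attribute name is nextentry, thereby building or updating the local lexicon.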
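Preferably, the method further includes step S409 (not shown), in which the user equipment 1 segments the input entry records to obtain at least one entry granule; in step S407, the user equipment 1 then builds or updates the local lexicon according to the context relationships between the entry granules. Specifically, when an input entry record committed by the user is long (for example, a length threshold may be preset, and the record is judged to be long and in need of segmentation when its length exceeds that threshold), in step S409 the user equipment 1 segments the input entry record acquired in step S406 to obtain at least one entry granule. Then, in step S407, the user equipment 1 stores the entry granules obtained by segmentation, together with the context relationships between them, in the local lexicon, for example in the tree structure of the local lexicon, thereby building or updating the local lexicon.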
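For example, the user has historically committed in segments two long input entry records A and B; in step S406, the user equipment 1 acquires these two records; in step S409, the user equipment 1 segments them, splitting record A into the three entry granules a1, a2 and a3, and record B into the three entry granules b1, b2 and b3. Then, in step S407, the user equipment 1 first learns A, B, a1, a2, a3, b1, b2 and b3 each as a complete entry and then saves the context relationships between them. Following entries are recorded in a vector structure whose attribute name is nextentry: a2 is added to the following entries of a1, a3 is added to the following entries of a2, and both b1 and B are added to the following entries of a3, with B placed before b1. Next, in step S407, the user equipment 1 records the preceding-preceding relationships: for example, when a3 is added as a following entry of the entry granule a2, it is also recorded that the preceding entry of a2 is a1 in that case, for instance by adding the record "a3\ra1" to the nextentry vector of a2, where "\r" separates the following entry from the preceding-preceding entry.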
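More preferably, in step S407, the user equipment 1 judges whether an entry obtained by merging at least two entry granules is a high-frequency entry, wherein the at least two entry granules have a context relationship; if the entry is a high-frequency entry, the local lexicon is built or updated according to that entry. Specifically, in step S407, based on the entry granules produced by the segmentation in step S409, the user equipment 1 judges whether an entry obtained by merging at least two entry granules is a high-frequency entry, for example by judging whether the probability of occurrence of the merged entry in the local lexicon is greater than a predetermined probability threshold; if it is, the merged entry is judged to be a high-frequency entry and is then stored in the local lexicon, for example in the corresponding node of the tree structure of the local lexicon, so as to build or update the local lexicon. Here, the two entry granules merged into the high-frequency entry have a context relationship. Continuing the previous example, in step S406 the input entry records acquired by the user equipment 1 are A and B; in step S409, the user equipment 1 segments the two records to obtain the entry granules a1, a2, a3, b1, b2 and b3; in step S407, the user equipment 1 judges that the entry a2a3 obtained by merging the entry granules a2 and a3 is a high-frequency entry, where a2 and a3 have a context relationship, then merges a2 and a3 to obtain the entry a2a3 and stores it in the local lexicon, thereby building or updating the local lexicon. In this way, the next time a matching query traverses the tree structure of the local lexicon, tracing from a1 yields the following entry a2 or a2a3.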
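It should be noted that the present invention may be implemented in software and/or in a combination of software and hardware, for example using an application-specific integrated circuit (ASIC), a general-purpose computer or any other similar hardware device. In one embodiment, the software program of the present invention may be executed by a processor to implement the steps or functions described above. Likewise, the software program of the present invention (including related data structures) may be stored in a computer-readable recording medium, for example a RAM memory, a magnetic or optical drive, a floppy disk or a similar device. In addition, some steps or functions of the present invention may be implemented in hardware, for example as a circuit that cooperates with a processor to perform the respective steps or functions.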
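In addition, part of the present invention may be applied as a computer program product, for example computer program instructions which, when executed by a computer, can invoke or provide the method and/or technical solution according to the present invention through the operation of that computer. The program instructions that invoke the method of the present invention may be stored in a fixed or removable recording medium, and/or transmitted via a data stream in a broadcast or other signal-bearing medium, and/or stored in the working memory of a computer device that runs according to the program instructions. Here, one embodiment according to the present invention includes an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the apparatus is triggered to run the methods and/or technical solutions based on the foregoing embodiments of the present invention.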
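It is evident to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments and can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in all respects as exemplary and non-limiting; the scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalency of the claims are intended to be embraced by the present invention. No reference sign in the claims should be construed as limiting the claim concerned. Furthermore, the word "comprising" clearly does not exclude other units or steps, and the singular does not exclude the plural. Several units or means recited in a device claim may also be implemented by one unit or means through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.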
Claims
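1. A method for providing input candidate entries based on a local lexicon, wherein the method comprises the following steps:
a. acquiring an input string entered by a user;
b. performing a matching query in a local lexicon according to the input string to obtain a corresponding candidate entry, wherein the local lexicon is built or updated from records of input entries that the user has historically committed to the screen in segments;
c. performing a matching query in the local lexicon for the last sub-entry included in the candidate entry to determine the following entry corresponding to the last sub-entry;
d. merging the candidate entry with the following entry to obtain an input candidate entry to be provided to the user;
e. providing the input candidate entry to the user.
2. The method according to claim 1, wherein the local lexicon stores entries in a tree structure, and wherein the matching queries in step b and step c comprise:
- traversing the tree structure in the local lexicon using a depth-first traversal algorithm.
3. The method according to claim 2, wherein the matching query comprises:
- traversing the tree structure in the local lexicon using the depth-first traversal algorithm according to the context relationships of the entries stored in the nodes of the tree structure.
4. The method according to any one of claims 1 to 3, wherein the method further comprises:
x. determining the priority of the input candidate entry according to the user's historical input order;
wherein step e comprises:
- providing the input candidate entry to the user according to the priority.
5. The method according to claim 4, wherein step x comprises:
- determining the priority of the input candidate entry according to the user's historical input order in combination with entry attributes of the input candidate entry;
wherein the entry attributes include at least one of the following:
- the probability attribute of the input candidate entry with respect to the local lexicon;
- the number of times the user has historically entered the input candidate entry;
- the transition probabilities between the sub-entries included in the input candidate entry;
- the prediction length corresponding to the input candidate entry.
6. The method according to any one of claims 1 to 5, wherein the method further comprises:
- acquiring the records of input entries that the user has historically committed in segments;
r. building or updating the local lexicon according to the context relationships between the records of input entries historically committed in segments.
7. The method according to claim 6, wherein the method further comprises:
- segmenting the input entry records to obtain at least one entry granule;
wherein step r comprises:
- building or updating the local lexicon according to the context relationships between the at least one entry granule.
8. The method according to claim 7, wherein step r comprises:
- judging whether an entry obtained by merging at least two entry granules is a high-frequency entry, wherein the at least two entry granules have a context relationship;
- if the entry is a high-frequency entry, building or updating the local lexicon according to the entry.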
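9. A user equipment for providing input candidate entries based on a local lexicon, wherein the equipment comprises:
a first acquiring means for acquiring an input string entered by a user;
a first matching means for performing a matching query in a local lexicon according to the input string to obtain a corresponding candidate entry, wherein the local lexicon is built or updated from records of input entries that the user has historically committed to the screen in segments;
a second matching means for performing a matching query in the local lexicon for the last sub-entry included in the candidate entry to determine the following entry corresponding to the last sub-entry;
a merging means for merging the candidate entry with the following entry to obtain an input candidate entry to be provided to the user;
a providing means for providing the input candidate entry to the user.
10. The user equipment according to claim 9, wherein the local lexicon stores entries in a tree structure, and wherein the matching queries performed by the first matching means and the second matching means comprise:
- traversing the tree structure in the local lexicon using a depth-first traversal algorithm.
11. The user equipment according to claim 10, wherein the matching query comprises:
- traversing the tree structure in the local lexicon using the depth-first traversal algorithm according to the context relationships of the entries stored in the nodes of the tree structure.
12. The user equipment according to any one of claims 9 to 11, wherein the equipment further comprises:
a priority determining means for determining the priority of the input candidate entry according to the user's historical input order;
wherein the providing means is configured to:
- provide the input candidate entry to the user according to the priority.
13. The user equipment according to claim 12, wherein the priority determining means is configured to:
- determine the priority of the input candidate entry according to the user's historical input order in combination with entry attributes of the input candidate entry;
wherein the entry attributes include at least one of the following:
- the probability attribute of the input candidate entry with respect to the local lexicon;
- the number of times the user has historically entered the input candidate entry;
- the transition probabilities between the sub-entries included in the input candidate entry;
- the prediction length corresponding to the input candidate entry.
14. The user equipment according to any one of claims 9 to 13, wherein the equipment further comprises:
a second acquiring means for acquiring the records of input entries that the user has historically committed in segments;
an updating means for building or updating the local lexicon according to the context relationships between the records of input entries historically committed in segments.
15. The user equipment according to claim 14, wherein the equipment further comprises:
a word segmentation means for segmenting the input entry records to obtain at least one entry granule;
wherein the updating means is configured to:
- build or update the local lexicon according to the context relationships between the at least one entry granule.
16. The user equipment according to claim 15, wherein the updating means is configured to:
- judge whether an entry obtained by merging at least two entry granules is a high-frequency entry, wherein the at least two entry granules have a context relationship;
- if the entry is a high-frequency entry, build or update the local lexicon according to the entry.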
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310157069.0A CN103984688B (zh) | 2013-04-28 | 2013-04-28 | 一种基于本地词库提供输入候选词条的方法与设备 |
CN201310157069.0 | 2013-04-28 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014176959A1 true WO2014176959A1 (zh) | 2014-11-06 |
Family
ID=51276664
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2014/074856 WO2014176959A1 (zh) | 2013-04-28 | 2014-04-04 | 一种基于本地词库提供输入候选词条的方法与设备 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN103984688B (zh) |
WO (1) | WO2014176959A1 (zh) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104281649B (zh) * | 2014-09-09 | 2017-04-19 | 北京搜狗科技发展有限公司 | 一种输入方法、装置及电子设备 |
CN104268166B (zh) * | 2014-09-09 | 2017-04-19 | 北京搜狗科技发展有限公司 | 一种输入方法、装置和电子设备 |
CN105868113B (zh) * | 2016-03-31 | 2019-05-31 | 广州华多网络科技有限公司 | 字符串查找方法及装置 |
CN107665206B (zh) * | 2016-07-27 | 2023-04-07 | 北京搜狗科技发展有限公司 | 清理用户词库的方法、系统和用于清理用户词库的装置 |
CN106484135B (zh) * | 2016-09-23 | 2019-03-19 | 百度在线网络技术(北京)有限公司 | 一种用于提供输入候选项的方法与装置 |
CN106557178B (zh) * | 2016-11-29 | 2021-03-09 | 百度国际科技(深圳)有限公司 | 用于更新输入法词条的方法及装置 |
CN106909232A (zh) * | 2017-02-28 | 2017-06-30 | 百度在线网络技术(北京)有限公司 | 用于展示候选词条的方法和装置 |
CN106873801A (zh) * | 2017-02-28 | 2017-06-20 | 百度在线网络技术(北京)有限公司 | 用于生成输入法词库中的词条组合的方法和装置 |
CN108572953B (zh) * | 2017-03-07 | 2023-06-20 | 上海颐为网络科技有限公司 | 一种词条结构的合并方法 |
CN110019656A (zh) * | 2017-07-26 | 2019-07-16 | 上海颐为网络科技有限公司 | 一种新建词条相关内容智能推送方法和系统 |
CN107844580A (zh) * | 2017-11-10 | 2018-03-27 | 北京酷我科技有限公司 | 一种搜索词匹配方法 |
CN111522448B (zh) * | 2019-02-02 | 2024-04-30 | 北京搜狗科技发展有限公司 | 一种提供输入候选项的方法、装置和设备 |
CN112445347A (zh) * | 2019-08-27 | 2021-03-05 | 北京搜狗科技发展有限公司 | 一种输入方法、装置和用于输入的装置 |
CN113703588B (zh) * | 2020-05-20 | 2024-10-29 | 北京搜狗科技发展有限公司 | 一种输入方法、装置和用于输入的装置 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101419531A (zh) * | 2008-12-12 | 2009-04-29 | 腾讯科技(深圳)有限公司 | 在计算机中进行文字输入的方法及装置 |
CN101458694A (zh) * | 2008-10-09 | 2009-06-17 | 浙江大学 | 一种基于树形词库的中文分词方法 |
CN102629160A (zh) * | 2012-03-16 | 2012-08-08 | 华为终端有限公司 | 一种输入法、输入装置及终端 |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005070856A (ja) * | 2003-08-27 | 2005-03-17 | Seiko Instruments Inc | 電子辞書における漢字熟語検索機能 |
CN102445994B (zh) * | 2010-09-30 | 2018-05-04 | 北京搜狗科技发展有限公司 | 一种智能输入方法及输入法系统 |
CN102360250A (zh) * | 2011-10-13 | 2012-02-22 | 广东步步高电子工业有限公司 | 一种记忆式输入法、系统及其应用的移动手持设备 |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109961791A (zh) * | 2017-12-22 | 2019-07-02 | 北京搜狗科技发展有限公司 | 一种语音信息处理方法、装置及电子设备 |
CN109961791B (zh) * | 2017-12-22 | 2021-10-22 | 北京搜狗科技发展有限公司 | 一种语音信息处理方法、装置及电子设备 |
Also Published As
Publication number | Publication date |
---|---|
CN103984688B (zh) | 2015-11-25 |
CN103984688A (zh) | 2014-08-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 14791178 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 14791178 Country of ref document: EP Kind code of ref document: A1 |