WO2013143252A1

WO2013143252A1 - Method and system for prompting input candidate words based on context scenario

Info

Publication number: WO2013143252A1
Application number: PCT/CN2012/079960
Authority: WO
Inventors: 李静
Original assignee: 百度在线网络技术（北京）有限公司
Priority date: 2012-03-28
Filing date: 2012-08-10
Publication date: 2013-10-03
Also published as: CN103365833A; CN103365833B

Abstract

Provided is a method for prompting input candidate words based on a context scenario, comprising: receiving an entry input by a user; generating a first candidate word set based on the user's input history for the entry and any non-local context scenario related to the entry; and providing the first candidate word set to the user. Also provided is a system for carrying out the method. The present invention makes full use of the context scenario when recommending candidate words, which effectively increases the hit rate of the first candidate word during input.

Description

Input candidate word prompting method and system based on context scene

[0001] The present application claims priority to Chinese patent application No. 201210086810.4, filed on March 28, 2012 and entitled "A Context-Based Input Candidate Prompt Method and System", the entire contents of which are incorporated herein by reference.

Technical field

The present invention relates to the field of input methods, and in particular to a method and system for prompting input candidate words based on a context scenario.

Background

[0003] Different people enter different content because their interests, hobbies, and habits differ. Existing input-method candidate prompts ignore these differences: for the same entry, the candidates do not reflect the user's personal information, so the user cannot find the desired candidate quickly and conveniently. Existing input methods rely mainly on large-scale word-frequency statistics, combined with the local context, to compute candidate word probabilities and generate prompts. Current mainstream input methods can also track what the user has entered most recently and most frequently, and give weighted priority to that vocabulary when displaying candidates.

[0004] Commonly used input methods are mainly divided into the following categories:

1. Input methods for smart phones, which exploit the characteristics of the phone to prompt names and numbers from the phone book as candidate words, or let the user specify fixed candidate words.

2. Input methods based on vocabulary classification, which provide the specific terms of a particular field, such as a fast input tool for stock codes.

3. Candidate word prompts combined with device attributes, usually used in Internet search, in which candidate words are presented with reference to the model of the device carrying the input method and its corresponding functions.

4. Candidate word prompts that mine user characteristics from the personalized information of multiple users, for example by collecting statistics on the vocabulary of user clients, identifying users with the same interests and hobbies, establishing similarity relationships, and recommending the vocabulary of users with similar interests to other users.

[0005] However, commonly used input methods consider only the context of the user's own input. They ignore the fact that the device used by the user is also a receiver of information, and that the received information changes the behavior of the device user. Taking a mobile phone as an example: when a user receives short messages, the user may reply differently to different messages; when the user browses web pages, the user may use input functions such as replying and searching on different pages. The vocabulary used therefore varies with the situation at the time, and in these cases existing input methods do not provide good candidate words.

Summary of the invention

The present invention provides a method for prompting input candidate words based on a context scenario. The method computes statistics on the history of the short messages input by the user and at the same time considers the context scenario input by non-local users, so as to make up for the deficiencies of existing input methods, improve the "first-candidate hit rate" and the "candidate hit rate", and make the input truly "personalized".

According to one aspect of the present invention, a method for prompting input candidate words based on a context scenario is provided, comprising the following steps:

a) receiving the entry entered by the user;

b) generating a first set of candidate words based on the user's input history for the entry and any non-local context scenario related to the entry;

c) providing the first set of candidate words to the user.

According to another aspect of the present invention, an input candidate word prompting system based on a context scenario is provided, which includes:

a receiving device, configured to receive an entry input by a user;

[0010] a generating device, configured to generate a first candidate word set based on the user's input history for the entry and any non-local context scenario related to the entry;

[0011] a providing device, configured to provide the first candidate word set to the user.

The present invention can be applied to various input platforms that have context scenarios, where the context scenarios include any context scenario input by non-local users. Based on the entry input by the user, and on analysis of both the user's input history and the context scenario, a candidate word set is generated and provided to the user. The present invention makes full use of the information resources of non-local users, particularly on mobile communication devices. For example, when a mobile phone is used for a short-message conversation, regenerating the candidate word set in light of the context scenario greatly facilitates the user's input and thereby improves the first-candidate hit rate of mobile phone input.

Description of the drawings

Other features, objects, and advantages of the present invention will become more apparent from the detailed description read with reference to the following drawings:

FIG. 1 is a flow chart of a specific embodiment of the method for prompting input candidate words based on a context scenario according to the present invention;

FIG. 2 is a schematic structural diagram of a specific embodiment of the system for prompting input candidate words based on a context scenario according to the present invention;

FIG. 3 is a schematic structural diagram of a specific embodiment of the generating device in the system for prompting input candidate words based on a context scenario according to the present invention.

The same or similar reference numerals in the drawings denote the same or similar components.

Detailed description

The embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

The embodiments of the present invention are described in detail below, and the examples of the embodiments are illustrated in the drawings, wherein the same or similar reference numerals are used to refer to the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are intended to be illustrative of the invention and are not to be construed as limiting.

[0020] The following disclosure provides many different embodiments or examples for implementing different structures of the present invention. To simplify the disclosure, the components and arrangements of specific examples are described below. They are merely examples and are not intended to limit the invention. Furthermore, the present invention may repeat reference numerals and/or letters in different examples. This repetition is for the purpose of simplicity and clarity and does not in itself indicate a relationship between the various embodiments and/or arrangements discussed. Note that the components illustrated in the drawings are not necessarily drawn to scale. Descriptions of well-known components, processing techniques, and processes are omitted to avoid unnecessarily limiting the invention.

FIG. 1 is a schematic flowchart of a specific embodiment of the method for prompting input candidate words based on a context scenario according to the present invention. The method includes steps S101 to S103 and is described below with reference to FIG. 1 and a specific embodiment.

[0022] Step S101: receive an entry input by a user. The method of the present invention can be applied to any device that can load an input method, including but not limited to terminals such as a PC, a notebook computer, a PDA (personal digital assistant), a mobile phone, or a tablet computer, and preferably a mobile phone capable of loading an input method. The following description therefore takes a mobile phone as an example.

[0023] The entry input by the user may consist of characters of any language, pinyin, or a combination thereof, for example "Baidu", "woxihuan", or "Baidu ditu".

[0024] Step S102: Generate a first candidate word set based on an input history of the entry by the user and an arbitrary non-local context scenario related to the entry.

[0025] Preferably, a second candidate word set may first be generated based on the user's input history for the entry. After the entry is received, it is analyzed, for example with respect to its part of speech and its input history, to determine candidate words. For example, if the entry input by the user is "open", analysis of the entry shows that it usually appears as a verb and is therefore likely to be followed by a noun, such as "machine"; according to the input history, words such as "start" and "guide" also often appear after the entry "open".

[0026] Analysis of the input history for a given entry draws on large-scale analysis of the input histories of most users, but the order of the candidate words also needs to be adjusted according to how the local input method has been used, so that personalized user needs can be matched more flexibly. For example, if the user is a psychological counselor, then after the entry "open" is input the first candidate is "guide", followed by "computer", "machine", "start", and so on; if the user is someone who attends meetings frequently, then after the entry "open" is input the first candidate is "meeting", followed by "computer", "machine", "start", "guide", and so on. The usage history of the local input method can be analyzed by common means such as the local user's account information and cookies. The candidate word set generated by this analysis of the user's input history for the entry is the second candidate word set.
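By way of illustration only, the reordering described above can be sketched as follows. The function name, the count tables, the `local_boost` factor, and the additive scoring rule are assumptions made for this sketch and are not specified in the description; "meeting" stands for the candidate preferred by a user who attends meetings frequently.

```python
from collections import Counter


def second_candidate_set(entry, global_counts, local_counts, local_boost=5.0, top_k=5):
    """Rank words that followed `entry`, favouring the local user's own history.

    global_counts and local_counts map an entry to {candidate: count}.
    local_boost is a hypothetical factor that weights local usage more heavily.
    """
    scores = Counter()
    for word, count in global_counts.get(entry, {}).items():
        scores[word] += count
    for word, count in local_counts.get(entry, {}).items():
        scores[word] += local_boost * count
    # Highest score first; ties are broken alphabetically so the order is stable.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_k]]


# Hypothetical counts: aggregate statistics versus one counselor's local history.
GLOBAL = {"open": {"machine": 50, "start": 40, "meeting": 30, "guide": 10}}
COUNSELOR = {"open": {"guide": 20}}

print(second_candidate_set("open", GLOBAL, COUNSELOR))
print(second_candidate_set("open", GLOBAL, {}))
```

With these assumed counts the counselor sees "guide" first, while a user with no local history sees the aggregate order.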

[0027] Furthermore, to bring the candidate words closer to the user's needs, analysis of the context scenario is also important. With the development of the Internet and wireless communication, information interaction is becoming more and more important, so it is crucial to analyze non-local context scenarios related to the entry. A third candidate word set is generated based on any non-local context scenario related to the entry. For example, when the user inputs the entry "section", the candidate words are normally "skills", "learning", "goals", "length", "room", and so on; but when the user is browsing a web page about an NBA player and inputs the entry "section" while replying, searching, or the like, the first candidate is "ratio". As another example, when the user inputs the entry "Bei" while browsing a soccer website, the candidate words may be "Lee", "Kenbauer", "Takashi", and so on. The candidate word sets in these examples are third candidate word sets.
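As a rough, non-limiting sketch of this idea, words found in the browsed page or received message that begin with the entry can be offered ahead of the ordinary candidates. The function name, the prefix-matching rule, and the Latin-script sample data below are hypothetical and chosen only to keep the example readable.

```python
from collections import Counter


def third_candidate_set(entry, context_words, default_candidates, top_k=5):
    """Put completions found in the context ahead of the default candidates."""
    # Count the continuations of every context word that starts with the entry.
    counts = Counter(
        word[len(entry):]
        for word in context_words
        if word.startswith(entry) and len(word) > len(entry)
    )
    from_context = [
        completion
        for completion, _ in sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    ]
    rest = [c for c in default_candidates if c not in from_context]
    return (from_context + rest)[:top_k]


# Hypothetical page content and default candidates for the entry "key".
page_words = ["keyboard", "keyboard", "keynote", "mouse"]
print(third_candidate_set("key", page_words, ["hole", "word"]))
```

When the context contains no word beginning with the entry, the sketch simply falls back to the default candidates.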

[0028] The following is an example of editing a short message by a smartphone.

[0029] User A receives a text message from User B: "I went to Shangdi Software Park today, it is very good!" Since User A does not know where Shangdi Software Park is, User A wants to reply to User B to ask. However, because "上地" (Shangdi) is not a common word, that is, it is an unregistered word, when User A enters "up" (上, the first character of "上地") the existing candidate prompting methods cannot offer "ground" (地) as a candidate at all, and User A has to enter "up" and "ground" separately. The method of the present invention, in contrast, prompts candidate words based on non-local context scenarios related to the entry, so the input method of the present invention can offer "ground" as a candidate word.

[0030] First, a common vocabulary of the user is generated from the information input by each user, and the probability a of a word beginning with "up" is calculated. Second, the text information input by the non-local user (User B) is received and segmented to form at least one word. That is, the content of the original text message is segmented using the reverse maximum matching method, giving the result "today \ I \ go \ up \ ground \ software park". The words are then stored in the pre-stored vocabulary; in addition, each pair of consecutive single-character words in the segmentation result is stored in the pre-stored vocabulary as one word, such as "I go", "go up", and "上地". The probability of such consecutive occurrences can be calculated using an n-gram model; assume that the probability of occurrence of "上地" is b. Third, based on the entry input by the user, a third candidate word set is generated from the pre-stored vocabulary. Since User A's reply input contains "up", the values of α·a and β·b are compared, where α and β are trained parameters, so that the candidate box gives priority to "ground" rather than to conventional candidates such as "go to work", "on the train", and "online". That is, the third candidate word set may be "ground", "class", "car", "net", and so on. Preferably, new words appearing in the context are given a higher weight, so that words appearing in the context appear preferentially in the candidate word set for the user's input.
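The segmentation and storage steps can be illustrated with the following minimal sketch of reverse (backward) maximum matching. The Chinese sentence, the toy dictionary, and the `max_len` limit are assumptions for this example; the sentence is an assumed rendering of the first clause of User B's message.

```python
def backward_max_match(text, dictionary, max_len=4):
    """Segment `text` from right to left, always taking the longest dictionary word."""
    words = []
    end = len(text)
    while end > 0:
        for size in range(min(max_len, end), 0, -1):
            piece = text[end - size:end]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in dictionary:
                words.append(piece)
                end -= size
                break
    words.reverse()
    return words


def merge_single_characters(words):
    """Join each pair of adjacent single-character words into one new word."""
    return [a + b for a, b in zip(words, words[1:]) if len(a) == 1 and len(b) == 1]


DICTIONARY = {"今天", "软件园"}  # assumed toy dictionary: "today", "software park"
segments = backward_max_match("今天我去上地软件园", DICTIONARY)
print(segments)
print(merge_single_characters(segments))
```

Because "上地" is not in the dictionary, it is split into two single characters, and the merging step is what brings it back into the pre-stored vocabulary as one word.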

[0031] The first candidate word set is generated from the second candidate word set and the third candidate word set. Preferably, the second candidate word set and the third candidate word set are weighted to generate the first candidate word set. The weights of the second candidate word set and the third candidate word set may be set by the user as required. Preferably, the weight of the third candidate word set is higher than that of the second candidate word set; in that case, the first candidate word of the third candidate word set is usually the first candidate word of the first candidate word set.
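One possible way to weight the two sets is sketched below. The description does not fix a weighting formula, so the reciprocal-rank scoring, the default weights, and the function name are assumptions made for illustration.

```python
def merge_candidate_sets(second, third, w_second=1.0, w_third=3.0, top_k=5):
    """Combine two ranked candidate lists using weighted reciprocal-rank scores."""
    scores = {}
    for weight, ranked in ((w_second, second), (w_third, third)):
        for rank, word in enumerate(ranked, start=1):
            # A candidate earns more the higher it ranks and the heavier its set.
            scores[word] = scores.get(word, 0.0) + weight / rank
    ordered = sorted(scores, key=lambda word: (-scores[word], word))
    return ordered[:top_k]


history_based = ["class", "car", "net"]   # second candidate word set
context_based = ["ground", "class"]       # third candidate word set
print(merge_candidate_sets(history_based, context_based))
```

With the third set weighted more heavily, its first candidate leads the merged list in this example, which matches the usual case described above; other weight choices can change that outcome.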

[0032] Step S103: provide the first candidate word set to the user. After the above steps are completed, the first candidate word set most relevant to the entry input by the user is obtained and provided to the user for selection. Usually the first candidate word is displayed differently from the other candidate words, for example in reverse video or in a different color.
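Steps S101 to S103 can be tied together as in the following simplified sketch. The callables passed in, the context-first merging rule, and the square-bracket marking of the first candidate are hypothetical stand-ins for whatever an actual input method would use.

```python
def prompt_candidates(entry, history_ranker, context_ranker, merge):
    """S101: receive `entry`; S102: build the first candidate set; S103: return it."""
    second = history_ranker(entry)   # second candidate word set (input history)
    third = context_ranker(entry)    # third candidate word set (non-local context)
    return merge(second, third)


def context_first(second, third):
    """Assumed merging rule: context candidates first, then history, no duplicates."""
    seen = set()
    merged = []
    for word in third + second:
        if word not in seen:
            seen.add(word)
            merged.append(word)
    return merged


def render(candidates):
    """Mark the first candidate so that it is displayed differently from the rest."""
    return " ".join(f"[{c}]" if i == 0 else c for i, c in enumerate(candidates))


first_set = prompt_candidates(
    "up",
    lambda entry: ["class", "car", "net"],  # stand-in history ranker
    lambda entry: ["ground"],               # stand-in context ranker
    context_first,
)
print(render(first_set))  # prints: [ground] class car net
```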

FIG. 2 is a schematic structural diagram of a specific embodiment of the system 10 for prompting input candidate words based on a context scenario according to the present invention. The system 10 includes a receiving device 11, a generating device 12, and a providing device 13.

[0034] The receiving device 11 is configured to receive an entry input by a user. The system of the present invention can be applied to any device that can load an input method, including but not limited to terminals such as a PC, a notebook computer, a PDA (personal digital assistant), a mobile phone, or a tablet computer, and preferably a mobile phone capable of loading an input method. The following description therefore takes a mobile phone as an example.

[0035] The entry input by the user may consist of characters of any language, pinyin, or a combination thereof, for example "Baidu", "woxihuan", or "Baidu ditu".

[0036] The generating device 12 is configured to generate a first candidate word set based on the user's input history for the entry and any non-local context scenario related to the entry.

[0037] Preferably, the generating device 12 may be further configured to first generate a second candidate word set based on the user's input history for the entry. After the entry is received, it is analyzed, for example with respect to its part of speech and its input history, to determine candidate words. For example, if the entry input by the user is "open", analysis of the entry shows that it usually appears as a verb and is therefore likely to be followed by a noun, such as "machine"; according to the input history, words such as "start" and "guide" also often appear after the entry "open".

[0038] Analysis of the input history for a given entry draws on large-scale analysis of the input histories of most users, but the order of the candidate words also needs to be adjusted according to how the local input method has been used, so that personalized user needs can be matched more flexibly. For example, if the user is a counselor, then after the entry "open" is input the first candidate is "guide", followed by "computer", "machine", "start", "meeting", and so on; if the user is someone who attends meetings frequently, then after the entry "open" is input the first candidate is "meeting", followed by "computer", "machine", "start", "guide", and so on. The usage history of the local input method can be analyzed by common means such as the local user's account information and cookies. The candidate word set generated by this analysis of the user's input history for the entry is the second candidate word set.

[0039] Furthermore, to bring the candidate words closer to the user's needs, analysis of the context scenario is also important. With the development of the Internet and wireless communication, information interaction is becoming more and more important, so it is crucial to analyze non-local context scenarios related to the entry. The generating device 12 is therefore also configured to generate a third candidate word set based on any non-local context scenario related to the entry. For example, when the user inputs the entry "section", the candidate words are normally "skills", "learning", "goals", "length", "room", and so on; but when the user is browsing a web page about an NBA player and inputs the entry "section" while replying, searching, or the like, the first candidate is "ratio". As another example, when the user inputs the entry "Bei" while browsing a soccer website, the candidate words may be "Lee", "Kenbauer", "Takashi", and so on. The candidate word sets in these examples are third candidate word sets.

[0040] The following is an example of editing a short message by a smartphone.

[0041] User A receives a text message from User B: "I went to Shangdi Software Park today, it is very good!" Since User A does not know where Shangdi Software Park is, User A wants to reply to User B to ask. However, because "上地" (Shangdi) is not a common word, that is, it is an unregistered word, when User A enters "up" (上, the first character of "上地") the existing candidate prompting methods cannot offer "ground" (地) as a candidate at all, and User A has to enter "up" and "ground" separately. The present invention, in contrast, prompts candidate words based on non-local context scenarios related to the entry, so the input method of the present invention can offer "ground" as a candidate word.

[0042] The system 10 generates a common vocabulary of the user from the information input by each user, for example calculating the probability a of a word beginning with "up". Referring to FIG. 3, the generating device 12 further includes a word generating module 121, a storage module 122, and a generating module 123. The word generating module 121 is configured to receive the text information input by a non-local user (User B) and to segment the text information to form at least one word. That is, the content of the original short message is segmented using the reverse maximum matching method, giving the result "today \ I \ go \ up \ ground \ software park". The storage module 122 is configured to store the words in the pre-stored vocabulary; in addition, each pair of consecutive single-character words in the segmentation result is stored in the pre-stored vocabulary as one word, such as "I go", "go up", and "上地". The probability of such consecutive occurrences can be calculated using an n-gram model; assume that the probability of occurrence of "上地" is b. The generating module 123 is configured to generate, based on the entry input by the user, a third candidate word set from the pre-stored vocabulary. Since User A's reply input contains "up", the values of α·a and β·b are compared, where α and β are trained parameters, so that the candidate box gives priority to "ground" rather than to conventional candidates such as "go to work", "on the train", and "online". That is, the third candidate word set may be "ground", "class", "car", "net", and so on. Preferably, new words appearing in the context are given a higher weight, so that words appearing in the context appear preferentially in the candidate word set for the user's input.
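The comparison of the two weighted probabilities can be sketched as follows, under stated assumptions: the bigram probability is a simple maximum-likelihood estimate, each candidate is scored by the larger of its two weighted values, and the probabilities and the values of `alpha` and `beta` are invented for illustration rather than trained.

```python
from collections import Counter


def bigram_probability(words, first, second):
    """Maximum-likelihood estimate of P(second | first) from a word sequence."""
    pair_counts = Counter(zip(words, words[1:]))
    first_count = sum(1 for word in words[:-1] if word == first)
    return pair_counts[(first, second)] / first_count if first_count else 0.0


def rank_by_weighted_probability(user_probs, context_probs, alpha=1.0, beta=1.0):
    """Score each candidate as max(alpha * a, beta * b) and sort in descending order."""
    scores = {}
    for word in set(user_probs) | set(context_probs):
        a = user_probs.get(word, 0.0)      # from the user's common vocabulary
        b = context_probs.get(word, 0.0)   # from the received text (n-gram model)
        scores[word] = max(alpha * a, beta * b)
    return sorted(scores, key=lambda word: (-scores[word], word))


message = ["today", "I", "go", "up", "ground", "software park"]
b = bigram_probability(message, "up", "ground")
user_probs = {"class": 0.4, "car": 0.3, "net": 0.2}  # hypothetical values of a
print(rank_by_weighted_probability(user_probs, {"ground": b}))
```

Raising `beta` relative to `alpha` corresponds to giving words from the context a higher weight.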

Further, the generating device 12 is configured to generate the first candidate word set from the second candidate word set and the third candidate word set. Preferably, the second candidate word set and the third candidate word set are weighted to generate the first candidate word set. The weights of the second candidate word set and the third candidate word set may be set by the user as required. Preferably, the weight of the third candidate word set is higher than that of the second candidate word set; in that case, the first candidate word of the third candidate word set is usually the first candidate word of the first candidate word set.

[0044] The providing device 13 is configured to provide the first candidate word set to the user. After the above processing is completed, the first candidate word set most relevant to the entry input by the user is obtained and provided to the user for selection. Usually the first candidate word is displayed differently from the other candidate words, for example in reverse video or in a different color.

[0045] With the method and system of the present invention, non-local context scenarios can be fully utilized for candidate word recommendation, and the candidate word hit rate during input can be effectively improved.

[0046] It is apparent to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments, and that the present invention can be embodied in other specific forms without departing from its spirit or essential characteristics. The embodiments are therefore to be considered illustrative and not restrictive, and the scope of the invention is defined by the appended claims. All changes that come within the meaning and range of equivalents of the claims are intended to be included in the present invention. Any reference signs in the claims should not be construed as limiting the claims. In addition, the word "comprising" does not exclude other modules or steps, and the singular does not exclude the plural.

Claims

1. A method for prompting input candidate words based on a context scenario, comprising the steps of:

a) receiving an entry input by a user;

b) generating a first set of candidate words based on the user's input history for the entry and any non-local context scenario related to the entry;

c) providing the first set of candidate words to the user.

2. The method according to claim 1, wherein step b) further comprises the steps of: generating a second candidate word set based on the user's input history for the entry; generating a third candidate word set based on any non-local context scenario related to the entry; and generating the first candidate word set according to the second candidate word set and the third candidate word set.

3. The method according to claim 2, wherein step b) further comprises the steps of:

receiving text information input by a non-local user, and segmenting the text information to form at least one type of word;

storing the words in a pre-stored vocabulary;

and generating, according to the entry input by the user, the third candidate word set based on the pre-stored vocabulary.

4. The method according to claim 2 or 3, wherein the second candidate word set and the third candidate word set are weighted to generate the first candidate word set.

5. The method according to any one of claims 1 to 4, wherein the entry consists of characters of any language, pinyin, or a combination thereof.

6. The method according to claim 1 or 2, wherein the context scenario is a text message received by the user or context information of a browsed webpage.

7. The method according to claim 6, wherein an entry appearing in the context information appears preferentially in the first candidate word set for the user's input.

8. A system for prompting input candidate words based on a context scenario, comprising:

a receiving device, configured to receive an entry input by a user;

generating means, configured to generate a first candidate word set based on the user's input history for the entry and any non-local context scenario related to the entry;

providing means, configured to provide the first candidate word set to the user.

9. The system according to claim 8, wherein the generating means is further configured to: generate a second candidate word set based on the user's input history for the entry; generate a third candidate word set based on any non-local context scenario related to the entry; and generate the first candidate word set according to the second candidate word set and the third candidate word set.

10. The system according to claim 9, wherein the generating means further comprises:

a word generating module, configured to receive text information input by a non-local user, and perform word cutting on the text information to form at least one type of word;

a storage module, configured to store the word in a pre-stored vocabulary;

and a generating module, configured to generate, according to the entry input by the user, the third candidate word set based on the pre-stored vocabulary.

11. The system according to claim 9 or 10, wherein the generating means is configured to weight the second candidate word set and the third candidate word set to generate the first candidate word set.

12. The system according to any one of claims 8 to 11, wherein the entry consists of characters of any language, pinyin, or a combination thereof.

13. The system according to claim 8 or 9, wherein the context scenario is a text message received by the user or context information of a browsed webpage.

14. The system according to claim 8, wherein the term appearing in the context information preferentially appears in a first candidate word set input by a user.