WO2016037519A1 - Input method, device and electronic device (一种输入方法、装置及电子设备) - Google Patents

Input method, device and electronic device - Download PDF

Info

Publication number
WO2016037519A1
Authority
WO
WIPO (PCT)
Prior art keywords
keyword
queue
upper screen
language model
input cursor
Prior art date
Application number
PCT/CN2015/087050
Other languages
English (en)
French (fr)
Inventor
崔欣
任尚昆
唐拯
张扬
Original Assignee
Beijing Sogou Technology Development Co., Ltd. (北京搜狗科技发展有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co., Ltd. (北京搜狗科技发展有限公司)
Priority to US 15/521,299, published as US 10,496,687 B2
Publication of WO2016037519A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3322Query formulation using system suggestions
    • G06F16/3323Query formulation using system suggestions using document space presentation or visualization, e.g. category, hierarchy or range presentation and selection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/338Presentation of query results
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/018Input/output arrangements for oriental characters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/02Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F3/023Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F3/0233Character input methods
    • G06F3/0236Character input methods using selection techniques to select from displayed items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/02Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F3/023Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F3/0233Character input methods
    • G06F3/0237Character input methods using prediction or retrieval techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/04812Interaction techniques based on cursor appearance or behaviour, e.g. being affected by the presence of displayed objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/0482Interaction with lists of selectable items, e.g. menus
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/126Character encoding
    • G06F40/129Handling non-Latin characters, e.g. kana-to-kanji conversion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/232Orthographic correction, e.g. spell checking or vowelisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/263Language identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/274Converting codes to words; Guess-ahead of partial word inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/048Indexing scheme relating to G06F3/048
    • G06F2203/04801Cursor retrieval aid, i.e. visual aspect modification, blinking, colour changes, enlargement or other visual cues, for helping user do find the cursor in graphical user interfaces

Definitions

  • the present application relates to the field of communications technologies, and in particular, to an input method, device, and electronic device.
  • Pinyin input is one of the simplest Chinese character input methods and has developed very quickly. The first generation was character-based: the user could enter only one Chinese character at a time. The second generation was word-based and added an intelligent frequency-adjustment function, relying mainly on the input method's dictionary. By the third generation the user can enter whole sentences, including sentences that are not in the input method dictionary, and this sentence-composition capability has a great impact on the input experience.
  • the input method association function is an extension of active Pinyin input; its appearance reduces the number of active inputs and keystrokes and makes the input method more intelligent.
  • this association function is implemented by first obtaining the entry the user last committed to the screen (the last "upper-screen" entry), querying pre-built lexicons such as the system bigram database according to that entry to obtain a queue of upper-screen candidate words, and then outputting that queue.
  • however, because the upper-screen candidate word queue in this input method must depend on the last committed entry, no reliable committed entry can be obtained once the input cursor changes position, and therefore no upper-screen candidate word queue can be predicted at the input cursor.
  • a technical problem that urgently needs to be solved by those skilled in the art is therefore how to obtain a reliable upper-screen candidate word queue when the input cursor moves.
  • the technical problem to be solved by the embodiments of the present application is to provide an input method capable of obtaining a reliable upper screen candidate queue when the input cursor moves.
  • the embodiment of the present application further provides an input device and an electronic device to ensure implementation and application of the foregoing method.
  • an input method including:
  • acquiring text information at the input cursor, the text information including preceding text information located before the input cursor and/or following text information located after the input cursor;
  • extracting keywords from the text information;
  • searching an association candidate lexicon of the keywords to obtain an upper-screen candidate word queue at the input cursor; and
  • outputting the upper-screen candidate word queue.
  • the obtaining the text information at the input cursor includes:
  • when it is detected that the input cursor is located in a text box and the time during which text input has stopped exceeds a time threshold, the text information at the input cursor is acquired.
  • the obtaining the text information at the input cursor includes:
  • the text information at the input cursor is obtained by using a whole sentence dividing point or a text box boundary where the input cursor is located as a length boundary of the text information.
  • the searching of the association candidate lexicon of the keyword to obtain the upper-screen candidate word queue at the input cursor comprises: determining a language model corresponding to the keyword according to a distance relationship between the keyword and the input cursor and/or an application attribute to which the keyword belongs; and searching the association candidate lexicon of the language model to obtain the upper-screen candidate word queue at the input cursor.
  • the determining of a language model corresponding to the keyword according to the distance relationship between the keyword and the input cursor comprises:
  • if the keyword is one, when the distance relationship between the keyword and the input cursor is an adjacency relationship, determining that the language model corresponding to the keyword is an adjacent binary (bigram) language model; when the distance relationship is a non-adjacent relationship, determining that the language model corresponding to the keyword is a long-distance binary language model;
  • if the keywords are two, determining that the language model corresponding to the keywords is a ternary (trigram) language model.
  • before the language model corresponding to the keyword is determined, the method further includes:
  • establishing a language model and its association candidate lexicon, the language model comprising an adjacent binary language model, a long-distance binary language model and a ternary language model;
  • the establishing of a language model and its association candidate lexicon includes:
  • collecting a training corpus;
  • extracting training candidate words and training keywords from the training corpus, the distance relationship between the training keywords and the training candidate words including an adjacency relationship and a non-adjacent relationship, and there being at least one training keyword; and
  • performing model training on the training candidate words and the training keywords to obtain the language model and its association candidate lexicon.
  • the determining, according to the application attribute to which the keyword belongs, of the language model corresponding to the keyword includes: determining a user model corresponding to the keyword according to the usage-habit characteristics of the user to which the keyword relates; or determining a vertical model corresponding to the keyword according to the application domain to which the keyword belongs; or determining a common-word language model corresponding to the keyword according to the common vocabulary to which the keyword belongs; or determining a scenario model corresponding to the keyword according to the topic scenario to which the keyword belongs.
  • the searching of the association candidate lexicon of the language model to obtain the upper-screen candidate word queue at the input cursor comprises:
  • when there are at least two language models, separately determining the upper-screen candidate words in the association candidate lexicon of each language model;
  • linearly superimposing and merging the upper-screen candidate words by weight according to the preset weight of each language model; and
  • sorting the merged upper-screen candidate words by weight from high to low to obtain the upper-screen candidate word queue at the input cursor.
  • before the outputting of the upper-screen candidate word queue, the method further includes: reordering the upper-screen candidate word queue according to the topic scenario at the input cursor;
  • the outputting of the upper-screen candidate word queue then includes: outputting the reordered upper-screen candidate word queue.
  • the reordering of the upper-screen candidate word queue according to the topic scenario at the input cursor includes:
  • determining a feature score of each scene feature tag according to the number of keywords that hit the scene feature tag and the sum of the probabilities of the keywords hitting the scene feature tag;
  • sorting the scene feature tags from high to low according to their feature scores; and
  • reordering the upper-screen candidate word queue according to the order of the scene feature tags, wherein the upper-screen candidate words in the upper-screen candidate word queue each have respective scene feature tags.
  • the application also discloses an input device, including:
  • a text obtaining unit configured to acquire text information at the input cursor, the text information including preceding text information located before the input cursor and/or following text information located after the input cursor;
  • a keyword extracting unit configured to extract keywords in the text information
  • a queue obtaining unit configured to search for a prediction candidate dictionary of the keyword, and obtain an upper screen candidate word queue at the input cursor
  • a queue output unit configured to output the upper screen candidate queue.
  • the text obtaining unit is configured to acquire text information at the input cursor when detecting that the input cursor is located in the text box and the time for stopping the text input exceeds a time threshold.
  • the text obtaining unit is specifically configured to obtain the text information at the input cursor by using a whole sentence dividing point or a text box boundary where the input cursor is located as a length boundary of the text information.
  • the queue obtaining unit includes:
  • a model determining subunit configured to determine a language model corresponding to the keyword according to a distance relationship between the keyword and the input cursor and/or an application attribute to which the keyword belongs;
  • the queue obtaining subunit is configured to search for a prediction candidate dictionary of the language model, and obtain an upper screen candidate queue at the input cursor.
  • the model determining subunit is specifically configured to: if there is one keyword, determine that the language model corresponding to the keyword is an adjacent binary language model when the distance relationship between the keyword and the input cursor is an adjacency relationship, and determine that the language model corresponding to the keyword is a long-distance binary language model when the distance relationship is a non-adjacent relationship; and if there are two keywords, determine that the language model corresponding to the keywords is a ternary language model.
  • the queue obtaining unit further includes:
  • a model establishing subunit configured to establish a language model and its association candidate lexicon before the model determining subunit determines the language model corresponding to the keyword, the language model including an adjacent binary language model, a long-distance binary language model and a ternary language model;
  • the model building subunit includes:
  • an extraction subunit configured to extract training candidate words and training keywords from the training corpus, the distance relationship between the training keywords and the training candidate words including an adjacency relationship and a non-adjacent relationship, and there being at least one training keyword;
  • a training subunit configured to perform model training on the training candidate words and the training keywords to obtain the language model and its association candidate lexicon.
  • the model determining subunit is specifically configured to determine a user model corresponding to the keyword according to the usage-habit characteristics of the user to which the keyword relates; or determine a vertical model corresponding to the keyword according to the application domain to which the keyword belongs; or determine a common-word language model corresponding to the keyword according to the common vocabulary to which the keyword belongs; or determine a scenario model corresponding to the keyword according to the topic scenario to which the keyword belongs.
  • the queue obtaining subunit includes:
  • a determining subunit configured to separately determine the upper-screen candidate words in the association candidate lexicon of each language model when there are at least two language models;
  • a merging subunit configured to linearly superimpose and merge the upper-screen candidate words by weight according to the preset weight of each language model;
  • a sorting sub-unit configured to sort the merged upper screen candidate words according to a weight from high to low to obtain an upper screen candidate word queue at the input cursor.
  • the device further includes:
  • a queue sequencing unit configured to reorder the upper-screen candidate word queue according to the topic scenario at the input cursor before the queue output unit outputs the upper-screen candidate word queue;
  • the queue output unit is configured to output the reordered upper-screen candidate word queue.
  • the queue sequencing unit includes:
  • a score calculation subunit configured to determine a feature score of each of the scene feature tags according to a number of the keywords that hit each of the scene feature tags and a probability sum of the keywords hitting the respective scene feature tags;
  • a context sorting sub-unit configured to sort the scene feature tags from high to low according to feature scores of the respective scene feature tags
  • the sequencing subunit is configured to reorder the upper-screen candidate word queue according to the order of the scene feature tags, wherein the upper-screen candidate words in the upper-screen candidate word queue each have a respective scene feature tag.
  • the present application also discloses an electronic device including a memory and a processor, the memory being configured to store computer instructions or code, and the processor, coupled to the memory, being configured to execute the computer instructions or code in the memory to implement the following method:
  • acquiring text information at the input cursor, the text information including preceding text information located before the input cursor and/or following text information located after the input cursor;
  • extracting keywords from the text information; searching the association candidate lexicon of the keywords to obtain an upper-screen candidate word queue at the input cursor; and outputting the upper-screen candidate word queue.
  • the present application also discloses a computer program comprising computer readable code that, when executed on a mobile terminal, causes the mobile terminal to perform an input method as previously described.
  • the present application also discloses a computer readable medium in which a computer program as previously described is stored.
  • the embodiment of the present application at least includes the following advantages:
  • by acquiring the text information at the input cursor and determining the upper-screen candidate word queue from the keywords in that text information, the embodiments solve the prior-art problem that, once the input cursor changes position, no reliable committed entry can be obtained and therefore no upper-screen candidate words can be given by association.
  • the method not only obtains reliable upper-screen candidate words when the input cursor moves; because it does not rely on associating from the previously committed entry, it can use the text information before and after the input cursor, including long-distance text information, to predict the upper-screen candidate word queue.
  • the method can therefore understand the user's input intention more comprehensively and correctly and can give a more reliable upper-screen candidate word queue.
  • FIG. 1 is a schematic flow chart of an input method according to an embodiment of the present application.
  • FIG. 2 is a flowchart of a method for obtaining an upper screen candidate word queue at an input cursor in an embodiment of the present application
  • FIG. 3 is a flow chart of a system model and a method for establishing a prediction candidate dictionary in the embodiment of the present application;
  • FIG. 4 is a flowchart of a method for obtaining an upper screen candidate word queue at an input cursor according to a prediction candidate lexicon corresponding to a language model in the embodiment of the present application;
  • FIG. 5 is a flowchart of a method for ordering an upper screen candidate word queue according to a topic scenario at an input cursor in an embodiment of the present application
  • FIG. 6 is a schematic structural diagram of an input device according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a queue obtaining unit according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a model establishing subunit according to an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a queue obtaining subunit according to an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of another input device according to an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a queue sequencing unit according to an embodiment of the present application.
  • FIG. 1 is a schematic flowchart of an input method according to an embodiment of the present application.
  • the method of the embodiment of the present application may directly replace, or be combined with, the existing method that predicts upper-screen candidates from the previously committed entry, in order to give the upper-screen candidate word queue at the input cursor.
  • the method of the embodiments of the present application may be performed under certain conditions. Specifically, when the input device detects that the input cursor is located in a text box and the time during which text input has stopped exceeds a time threshold, the upper-screen candidate word queue at the input cursor may be given according to the method of this embodiment; for example, when the user moves the input cursor within the text box in order to modify or add text, the input cursor remains in the text box while text input is paused. A minimal sketch of this trigger condition follows.
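  • In the sketch below (Python), the field names cursor_in_text_box and last_input_time and the threshold value are illustrative assumptions; the embodiment only requires detecting that the cursor is in a text box and that text input has paused for longer than a time threshold.
```python
import time

IDLE_THRESHOLD_SECONDS = 2.0  # assumed value; the embodiment only says "a time threshold"

def should_trigger_association(cursor_in_text_box: bool, last_input_time: float,
                               now: float | None = None) -> bool:
    """Return True when the cursor sits in a text box and text input has been
    paused for longer than the threshold, the condition under which candidates
    are predicted at the cursor rather than after a committed entry."""
    if not cursor_in_text_box:
        return False
    now = time.time() if now is None else now
    return (now - last_input_time) > IDLE_THRESHOLD_SECONDS
```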
  • the method can include the following steps:
  • Step 101 Acquire text information at the input cursor.
  • specifically, the input device first reads the text information at the input cursor through the system API, using a whole-sentence division point or the boundary of the text box in which the input cursor is located as the length boundary of the text information.
  • the text information may include the preceding text information before the input cursor and/or the following text information after the input cursor; of course, if text exists both before and after the input cursor, the preceding and following text information may be acquired at the same time. A sketch of this boundary rule is given below.
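  • A minimal sketch of the boundary rule (Python). The set of whole-sentence division characters is an assumption, and the text is assumed to have already been read from the text box through the system API; the function returns the preceding and following text at the cursor, bounded by the nearest division point or by the text-box boundary.
```python
SENTENCE_DIVIDERS = set("。！？!?；;\n")  # assumed whole-sentence division points

def text_around_cursor(box_text: str, cursor: int) -> tuple[str, str]:
    """Return (preceding_text, following_text) at the cursor, bounded by the
    nearest sentence divider or by the boundary of the text box itself."""
    start = cursor
    while start > 0 and box_text[start - 1] not in SENTENCE_DIVIDERS:
        start -= 1
    end = cursor
    while end < len(box_text) and box_text[end] not in SENTENCE_DIVIDERS:
        end += 1
    return box_text[start:cursor], box_text[cursor:end]
```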
  • Step 102 Extract keywords in the text information.
  • a keyword meta-word list may be set in advance, and the keyword meta-word list is a set containing entries that can be used as keywords. It can be agreed that any entry that appears in the keyword meta-word list can be used as a keyword, and entries that are not in the keyword meta-word list are not used as keywords.
  • all the terms in the text information that belong to the keyword meta-word list can be extracted as keywords.
  • the keywords in the preceding text information and the keywords in the following text information may be stored in different sets, or may be labelled to distinguish them, to facilitate the subsequent search for upper-screen candidates. For details, refer to the description of the subsequent embodiments.
  • one or more keywords may be extracted from the text information in this way; the keywords may be located in the preceding text information, in the following text information, or in both. A sketch of this extraction is given below.
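  • A minimal sketch of the extraction (Python). The word-segmentation step is an assumption, since the embodiment does not name a tokenizer; only terms found in the keyword meta-word list are kept, and each keyword is labelled with the side of the cursor it came from.
```python
from typing import Callable, Iterable

def extract_keywords(preceding: str, following: str, meta_word_list: set[str],
                     segment: Callable[[str], Iterable[str]]) -> list[tuple[str, str]]:
    """Return (keyword, side) pairs, where side is 'preceding' or 'following'.

    `segment` is any word-segmentation callable (e.g. a Chinese tokenizer);
    a term is extracted as a keyword only if it appears in the meta-word list."""
    keywords = []
    for side, text in (("preceding", preceding), ("following", following)):
        for term in segment(text):
            if term in meta_word_list:
                keywords.append((term, side))
    return keywords
```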
  • Step 103 Search the association candidate lexicon of the keywords, and obtain the upper-screen candidate word queue at the input cursor.
  • specifically, the corresponding association candidate lexicon can be searched according to each keyword, thereby obtaining the upper-screen candidate words at the input cursor.
  • each keyword may correspond to one association candidate lexicon, in which the upper-screen candidate words are sorted by usage probability from high to low.
  • when there are several keywords, their association candidate lexicons are very likely to contain repeated upper-screen candidate words; the candidates in the lexicons can then be ranked from high to low by how often they repeat, thereby obtaining the upper-screen candidate word queue at the input cursor, as sketched below.
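  • A minimal sketch of this repetition-based ranking (Python); it assumes each keyword's lexicon has already been retrieved as a probability-ordered list of candidate words.
```python
from collections import Counter

def merge_by_repetition(candidate_lists: list[list[str]]) -> list[str]:
    """Rank candidates that repeat across several keywords' lexicons first.

    Ties are broken by the best (earliest) position any lexicon gave the
    candidate, which preserves the per-lexicon probability ordering."""
    repetition: Counter[str] = Counter()
    best_rank: dict[str, int] = {}
    for candidates in candidate_lists:
        for rank, word in enumerate(candidates):
            repetition[word] += 1
            best_rank[word] = min(best_rank.get(word, rank), rank)
    return sorted(repetition, key=lambda w: (-repetition[w], best_rank[w]))
```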
  • in addition, a language model may be established based on the various distance relationships between keywords and the input cursor, or based on the application attribute to which a keyword belongs, where the application attribute may be the user's usage habits for the keyword, the application domain to which the keyword belongs (such as time, geographic location or holiday greetings), the common vocabulary to which the keyword belongs, or the topic scenario to which the keyword belongs.
  • in this case, only one language model corresponding to the keywords extracted in the previous step may be determined, and the upper-screen candidate word queue at the input cursor is then obtained from the association candidate lexicon of that language model;
  • alternatively, several language models corresponding to the keywords may be determined, and the association candidate lexicons of these language models are then merged to determine the final upper-screen candidate words at the input cursor.
  • Step 104 Output an upper screen candidate queue.
  • specifically, the upper-screen candidate word queue may be output directly for the user to select, or the queue may first be reordered and the reordered upper-screen candidate word queue then output; the reordering can be done in many ways.
  • by acquiring the text information at the input cursor and determining the upper-screen candidate word queue from the keywords in that text information, this embodiment solves the prior-art problem that, once the input cursor changes position, no reliable committed entry can be obtained and therefore no upper-screen candidate words can be given by association.
  • the method not only obtains reliable upper-screen candidate words when the input cursor moves; because it does not rely on associating from the previously committed entry, it can use the text information before and after the input cursor, including long-distance text information, to predict the upper-screen candidate word queue, and can therefore understand the user's input intention more fully and correctly and give a more reliable upper-screen candidate word queue.
  • when step 103 is performed to search the association candidate lexicon of the keywords and obtain the upper-screen candidate word queue at the input cursor, one implementation may be as shown in FIG. 2 and includes the following steps:
  • Step 201 Establish a language model and its association candidate vocabulary.
  • this step does not need to be repeated every time the upper screen candidate queue at the input cursor is obtained, and it can be executed only once in the initial state.
  • the language models may include a system model, a user model, a vertical model, a common-word language model and a scenario model.
  • the system model is a language model established for the distance relationship between the keyword and the input cursor; the user model, the vertical model, the common word language model, and the scenario model are language models established for the application attributes to which the keyword belongs.
  • the user model is a model built from the user's usage habits for keywords; the vertical model is a model built for the application domain to which a keyword belongs, such as time, geographic location or holiday greetings; the common-word language model is a model built for the common vocabulary to which a keyword belongs; and the scenario model is a model built for the topic scenario to which a keyword belongs. The models are described separately below.
  • the system model includes an adjacent binary (bigram) language model, a long-distance binary language model and a ternary (trigram) language model.
  • the process of establishing the system model and its association candidate lexicon, as shown in FIG. 3, may include:
  • Step 301 Collect a training corpus.
  • Step 302 Extract training candidates and training keywords in the training corpus.
  • specifically, keywords are extracted from the training corpus according to the keyword meta-word list and used as training keywords, and a term at a given position in the training corpus is taken as a training candidate word. In order to train the different system models, different distance relationships between the training keywords and the training candidate word are required, including both the adjacency relationship and the non-adjacent relationship, and there is at least one training keyword.
  • the adjacency relationship means that there is nothing, or only stop words, between the training keyword and the training candidate word; the non-adjacent relationship is the opposite.
  • a stop word is an auxiliary word, such as the modal particles "ha", "ah", "hmm" and the like.
  • Step 303 Perform model training on the training candidate words and the training keywords to obtain a language model and a corresponding association candidate vocabulary.
  • the training process of these models is similar to that of an adjacent binary language model in the prior art and is not described here.
  • the adjacent binary language model, the long-distance binary language model, the ternary language model, and the association candidate lexicon of each model can be obtained.
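  • A minimal sketch of the counting behind these three system models (Python). The window size for the long-distance relationship, the placeholder stop-word set and the restriction of the ternary counts to two preceding keywords are simplifying assumptions; the embodiment itself only distinguishes adjacent from non-adjacent keyword-candidate pairs.
```python
from collections import Counter, defaultdict

STOP_WORDS = {"ha", "ah", "hmm"}  # placeholder modal particles, per the description

def train_system_models(sentences: list[list[str]], meta_word_list: set[str],
                        window: int = 5):
    """Count adjacent-bigram, long-distance-bigram and trigram evidence.

    Returns three mappings from a keyword (or keyword pair) to a Counter of
    candidate words; normalising each Counter gives the lexicon probabilities,
    to be stored sorted from high to low."""
    adjacent = defaultdict(Counter)       # keyword -> adjacent candidate counts
    long_distance = defaultdict(Counter)  # keyword -> non-adjacent candidate counts
    trigram = defaultdict(Counter)        # (keyword, keyword) -> candidate counts

    for words in sentences:
        # Stop words do not break adjacency, so drop them before counting.
        content = [w for w in words if w not in STOP_WORDS]
        for i, candidate in enumerate(content):
            for j in range(max(0, i - window), min(len(content), i + window + 1)):
                if j == i or content[j] not in meta_word_list:
                    continue
                bucket = adjacent if abs(j - i) == 1 else long_distance
                bucket[content[j]][candidate] += 1
            # Only the case of two preceding keywords is counted here, for brevity.
            if i >= 2 and content[i - 1] in meta_word_list and content[i - 2] in meta_word_list:
                trigram[(content[i - 2], content[i - 1])][candidate] += 1
    return adjacent, long_distance, trigram
```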
  • the adjacent binary language model models the bigram relationship between an adjacent keyword and an upper-screen candidate word. The adjacency relationship may therefore hold between a keyword in the preceding text information and the upper-screen candidate, or between the upper-screen candidate and a keyword in the following text information; for example, in a phrase meaning "hold a dinner party", "hold" is the upper-screen candidate and "dinner party" is a keyword in the following text information.
  • the adjacent binary language model is a highly deterministic language model; its disadvantage is that it carries little information, so too many candidates may be predicted and it is difficult to select the one the user wants.
  • the long-distance binary language model models the bigram relationship between a keyword in a non-adjacent relationship and the upper-screen candidate word; the long-distance relationship may hold between a keyword in the preceding text information and the upper-screen candidate, or between the upper-screen candidate and a keyword in the following text information.
  • the long-distance binary does not require two meta-words to be adjacent; for example, the keyword "apple” and the upper-screen candidate "pear".
  • the long-distance binary language model is an embodiment of the co-occurrence relationship between two meta-words, which often represents the degree of association between two meta-words.
  • the ternary language model models the trigram relationship between two keywords and the upper-screen candidate word, that is, two keywords jointly predict the upper-screen candidate word.
  • the two keywords that predict the upper-screen candidate may both be keywords in the preceding text information, or both be keywords in the following text information.
  • the user model includes a user binary model, a user ternary model and a long-distance user binary model.
  • the user binary model models the user bigram relationship between the entry the user committed previously and the entry committed next; the user ternary model models the user trigram relationship among three consecutively committed entries; and the long-distance user binary model models the long-distance bigram relationship between committed entries that lie within a certain distance of each other.
  • each user model is obtained by statistics over the user's usage habits, and each such model has its own association candidate lexicon.
  • the vertical model includes language models for multiple vertical domains, classified according to the domain to which an entry belongs. For example, in a time-related domain language model, the association candidate lexicon of the vertical model corresponding to "evening" contains "9 o'clock, 10 o'clock, 11 o'clock", and the association candidate lexicon of the vertical model corresponding to "Saturday" contains "morning, afternoon"; in a location-related domain language model, the association candidate lexicon of the vertical model corresponding to "Wudaokou" contains "Tsinghua Tongfang, Richang, Hualian", and so on. Other examples include a quantifier-related domain language model, a recommendation-related domain language model, a domain language model for the app environment in which input occurs, a domain language model related to titles and personal names, and a festival-related greeting domain language model. Each vertical model is obtained by statistics over the domain to which entries belong, and each such model has its own association candidate lexicon.
  • the common-word language model (system-word language model) covers the case in which an entity word has been entered incompletely and predicts the suffix that completes the full term.
  • the model is obtained by statistics over common terms; for example, if the keyword in the preceding text information is "smile" (the first half of a common term), the upper-screen candidate given is "jianghu" (its completion).
  • the scenario model is a model established for the topic scenario to which a keyword belongs, for example a meeting scenario or a dinner-party scenario. Each keyword has one or more scene feature tags, each scene feature tag corresponds to a scenario model, and each scenario model has its own association candidate lexicon.
  • Step 202 can be performed after the above language model is established in advance.
  • Step 202 Determine a language model corresponding to the keyword according to a distance relationship between the keyword and the input cursor and/or an application attribute to which the keyword belongs.
  • specifically, the system model corresponding to the keyword may be determined according to the distance relationship between the keyword and the input cursor: if one keyword is extracted, the language model corresponding to the keyword is determined to be the adjacent binary language model when the distance relationship between the keyword and the input cursor is an adjacency relationship, and to be the long-distance binary language model when the distance relationship is non-adjacent; if two keywords are extracted, the language model corresponding to the keywords is determined to be the ternary language model.
  • the language model corresponding to the keyword may also be determined according to the application attribute to which the keyword belongs: the user model corresponding to the keyword is determined according to the usage-habit characteristics of the user; or the vertical model corresponding to the keyword is determined according to the application domain to which the keyword belongs; or the common-word language model corresponding to the keyword is determined according to the common vocabulary to which the keyword belongs; or the scenario model corresponding to the keyword is determined according to the topic scenario to which the keyword belongs. A sketch of this selection logic follows.
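  • In this sketch (Python), the distance-based selection returns one of the system models, and the attribute-based branches are shown only as membership tests against hypothetical lookup tables, since the embodiment leaves the internals of the user, vertical, common-word and scenario models to the individual models.
```python
def select_system_model(keywords: list[tuple[str, bool]]) -> str | None:
    """Pick the system model from the keyword count and distance relationship.

    `keywords` is a list of (keyword, is_adjacent_to_cursor) pairs."""
    if len(keywords) == 2:
        return "ternary"
    if len(keywords) == 1:
        _, adjacent = keywords[0]
        return "adjacent_binary" if adjacent else "long_distance_binary"
    return None

def select_attribute_models(keyword: str, user_habits: set[str], vertical_domains: set[str],
                            common_words: set[str], scenario_tags: set[str]) -> list[str]:
    """Add the user / vertical / common-word / scenario models whose tables know the keyword."""
    models = []
    if keyword in user_habits:
        models.append("user")
    if keyword in vertical_domains:
        models.append("vertical")
    if keyword in common_words:
        models.append("common_word")
    if keyword in scenario_tags:
        models.append("scenario")
    return models
```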
  • Step 203 Obtain an upper screen candidate word queue at the input cursor according to the association candidate vocabulary corresponding to the language model.
  • to speed up the search, an index may be established in each association candidate lexicon, such as a left-element index and a right-element index.
  • when the keyword comes from the preceding text information, the left-element index of the language model's association candidate lexicon may be used to search for the upper-screen candidate word queue at the input cursor; when the keyword comes from the following text information, the right-element index may be used. When keywords come from both the preceding and the following text information, searches in both directions must be taken into account, and queries that take the middle element as the search target are also needed; for this purpose, two secondary indexes are established in the association candidate lexicon of the ternary model so that the middle element can be searched from either direction.
  • the upper-screen candidate word queue at the input cursor can then be obtained by prefix matching, as sketched below.
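  • A minimal sketch of such indexes (Python; plain dictionaries stand in for whatever on-disk index structure an input method would actually use) keys a bigram lexicon by its left element for keywords from the preceding text and by its right element for keywords from the following text, with optional prefix matching on the returned candidates.
```python
from collections import defaultdict

class BigramIndex:
    """Toy left-/right-element indexes over (left_word, right_word) bigram pairs."""

    def __init__(self, pairs: list[tuple[str, str]]):
        self.by_left = defaultdict(list)   # keyword before the cursor -> candidates after it
        self.by_right = defaultdict(list)  # keyword after the cursor -> candidates before it
        for left, right in pairs:
            self.by_left[left].append(right)
            self.by_right[right].append(left)

    def lookup(self, keyword: str, keyword_precedes_cursor: bool,
               prefix: str = "") -> list[str]:
        """Return candidate words for the keyword, optionally prefix-matched."""
        bucket = self.by_left[keyword] if keyword_precedes_cursor else self.by_right[keyword]
        return [c for c in bucket if c.startswith(prefix)] if prefix else list(bucket)
```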
  • when at least two language models correspond to the keywords, the process of obtaining the upper-screen candidate word queue at the input cursor, as shown in FIG. 4, may further include the following steps:
  • Step 401 Determine an upper screen candidate in the association candidate vocabulary of each language model.
  • Step 402 Linearly superimpose and merge the upper-screen candidate words by weight according to the preset weight of each language model.
  • Step 403 Sort the merged upper screen candidate words according to the weight from high to low to obtain an upper screen candidate word queue at the input cursor.
  • a more ideal and reliable upper-screen candidate queue can be obtained by combining multiple language models corresponding to keywords.
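  • A minimal sketch of steps 401 to 403 (Python). The per-model candidate scores and the preset model weights are assumed inputs; the embodiment only states that the candidates are linearly superimposed by weight and sorted from high to low.
```python
from collections import defaultdict

def merge_candidates(per_model: dict[str, dict[str, float]],
                     model_weights: dict[str, float]) -> list[str]:
    """Linearly combine per-model candidate scores and sort from high to low.

    per_model: model name -> {candidate word: score within that model}
    model_weights: model name -> preset weight of that model"""
    combined: dict[str, float] = defaultdict(float)
    for model, candidates in per_model.items():
        weight = model_weights.get(model, 0.0)
        for word, score in candidates.items():
            combined[word] += weight * score
    return sorted(combined, key=lambda w: combined[w], reverse=True)
```
  • With illustrative weights, a candidate supported by several of the keyword's language models accumulates a higher combined weight than one supported by a single model, which is what pushes it to the front of the queue.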
  • for example, if the text information at the input cursor is "I am going to Dalian tomorrow, I want to go to Discovery [cursor]", the user's input intention is the "Discovery Kingdom" amusement park. The keywords "Dalian" and "Discovery" are extracted from the preceding text information; "Dalian" indicates the user's destination and belongs to a vertical model, so combining it with the keyword "Discovery" yields the reliable upper-screen candidate "Kingdom".
  • in this method, the upper-screen candidate word queue may be output directly for the user to select, or the queue may be reordered before the upper-screen candidate word queue is output, and the reordered queue is then output.
  • one method for reordering the upper-screen candidate word queue according to the topic scenario at the input cursor, as shown in FIG. 5, may include:
  • Step 501 Determine a feature score of each scene feature tag according to the number of keywords hitting each scene feature tag and the probability sum of the keyword hitting each scene feature tag.
  • each keyword may hit one or more scene feature tags, each scene feature tag corresponds to a topic scenario, and the probability that a keyword hits a given scene feature tag in the final committed result can be obtained from statistics. The feature score of scene feature tag i can therefore be expressed as feature_i = Σ_{j=1..N_i} word_j, where N_i is the number of keywords that hit scene feature tag i and word_j is the probability that the j-th keyword hits scene feature tag i in the final committed result, j = 1, ..., N_i.
  • Step 502 Sort the scene feature tags from high to low according to the feature score of the scene feature tag.
  • the topic scenario corresponding to the scenario feature tag with the higher score is most likely the topic scenario to which the final screen word belongs.
  • Step 503 Reorder the upper-screen candidate word queue according to the order of the scene feature tags.
  • specifically, the upper-screen candidate words in the upper-screen candidate word queue each have a respective scene feature tag.
  • the upper-screen candidate words can therefore be reordered according to the order of the scene feature tags, thereby obtaining the final upper-screen candidate word queue; a sketch of steps 501 to 503 follows.
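  • In the sketch below (Python), the keyword-to-tag hit probabilities and the tag attached to each candidate are assumed inputs; each tag's feature score is the sum of the hit probabilities of the keywords that hit it, the tags are sorted by that score, and the candidate queue is reordered accordingly.
```python
from collections import defaultdict

def reorder_by_scenario(candidates: list[str],
                        candidate_tags: dict[str, str],
                        keyword_tag_probs: dict[str, dict[str, float]]) -> list[str]:
    """Reorder candidates by the feature scores of their scene feature tags.

    keyword_tag_probs: keyword -> {scene feature tag: probability that the
    keyword hits that tag in the final committed result}."""
    # Step 501: feature_i is the sum of the hit probabilities of the keywords hitting tag i.
    score: dict[str, float] = defaultdict(float)
    for tag_probs in keyword_tag_probs.values():
        for tag, prob in tag_probs.items():
            score[tag] += prob
    # Step 502: rank the tags from high to low by feature score.
    tag_rank = {tag: rank for rank, tag in
                enumerate(sorted(score, key=lambda t: score[t], reverse=True))}
    # Step 503: stable-sort candidates by the rank of their own scene feature tag.
    return sorted(candidates,
                  key=lambda w: tag_rank.get(candidate_tags.get(w, ""), len(tag_rank)))
```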
  • by incorporating scenario awareness and reordering the upper-screen candidate words, the embodiment of the present application ranks the most suitable candidate words first and gives a more reliable upper-screen candidate word queue.
  • for example, the keywords extracted from the text information are "arrange", "garden", "hotel", "evening", "grand" and "banquet". According to the distance relationship between each keyword and the input cursor, the language models corresponding to the keywords are determined to be the adjacent binary model, the long-distance binary model and the ternary model. The upper-screen candidate word queue at the input cursor obtained from the association candidate lexicons corresponding to these language models is: (evening) sleep, (evening) date, hold (banquet), (arrange) delivery, (garden) gate, and the ternary-model candidate for (evening)(banquet). After the upper-screen candidate queue is reordered, the final upper-screen candidates are: hold, sleep, date, gate, delivery.
  • the text information at the input cursor is "Going to Korea in the Mid-Autumn Festival last year, I want to go to [Cursor] this year.”
  • the user's input intention is the upper screen "Japan”.
  • "go” and “want to go” will be used for the upper screen candidate search.
  • "Korea” and “Japan” are a candidate candidate vocabulary of a well-drawn long-distance binary language model
  • “go” and “Japan” are associations of a extracted adjacent binary language model.
  • in another example, according to the time-domain vertical model and the user model corresponding to the keyword, the current system time and the time data committed in the user's input history are used for prediction.
  • if the keyword in the preceding text information is "Wudaokou", then, according to the location-domain vertical model and the user model corresponding to the keyword, the historical data in the user's input history and the location information obtained at that moment are used for prediction.
  • in another example, the user wants to express a phrase meaning "the autumn of the old capital" and has already committed the first three characters.
  • the user may have committed those characters in various forms;
  • in every case the same candidate, "autumn", is used for the association.
  • the method of this application extracts the keyword from the committed text and then, according to the language model corresponding to the keyword, for example the common-word language model, predicts the upper-screen candidate word, so that the candidate "autumn" can be obtained.
  • the method disclosed in the above embodiments can understand the user input intention more comprehensively and correctly.
  • the above embodiments can be applied not only to Chinese input scenes, but also to other language input scenes such as English, Japanese, and Korean.
  • FIG. 6 is a schematic structural diagram of an input device according to an embodiment of the present application.
  • the device can include the following units:
  • the text obtaining unit 601 is configured to acquire text information at the input cursor, where the text information includes preceding text information located before the input cursor and/or following text information located after the input cursor.
  • the keyword extracting unit 602 is configured to extract keywords in the text information.
  • the queue obtaining unit 603 is configured to search for a prediction candidate dictionary of the keyword, and obtain an upper screen candidate queue at the input cursor.
  • the queue output unit 604 is configured to output the upper screen candidate queue.
  • the device acquires the text information at the input cursor and determines the upper-screen candidate word queue from the keywords in that text information, which solves the prior-art problem that, once the input cursor changes position, no reliable committed entry can be obtained and therefore no upper-screen candidate words can be given by association.
  • the device not only obtains reliable upper-screen candidate words when the input cursor moves; because it does not rely on associating from the previously committed entry, it can use the text information before and after the input cursor, including long-distance text information, to predict the upper-screen candidate word queue, and can therefore understand the user's input intention more fully and correctly and give a more reliable upper-screen candidate word queue.
  • the text obtaining unit 601 may be specifically configured to: when detecting that the input cursor is located in the text box, and the time for stopping the text input exceeds a time threshold, acquiring the text information at the input cursor .
  • the text obtaining unit may be further configured to obtain the text information at the input cursor by using a whole sentence dividing point or a text box boundary where the input cursor is located as a length boundary of the text information.
  • the queue obtaining unit 603 may further include:
  • a model establishing sub-unit 701 configured to establish a language model and a prediction candidate vocabulary before the model determining sub-unit 702 determines a language model corresponding to the keyword, the language model including a neighboring binary language model, and a long distance Binary language model and ternary language model.
  • the model determining sub-unit 702 is configured to determine a language model corresponding to the keyword according to a distance relationship between the keyword and the input cursor and/or an application attribute to which the keyword belongs.
  • the queue obtaining sub-unit 703 is configured to search for a prediction candidate dictionary of the language model, and obtain an upper screen candidate queue at the input cursor.
  • model establishing subunit 701 may further include:
  • a collection subunit 801 is provided for collecting training corpora.
  • the extraction sub-unit 802 is configured to extract a training candidate word and a training keyword in the training corpus, and the distance relationship between the training keyword and the training candidate word includes an adjacency relationship and a non-contiguous relationship, and the training key The word is at least one.
  • the training sub-unit 803 is configured to perform model training on the training candidate words and the training keywords to obtain the language model and its association candidate vocabulary.
  • the model determining sub-unit 702 is specifically configured to: if there is one keyword, determine that the language model corresponding to the keyword is the adjacent binary language model when the distance relationship between the keyword and the input cursor is an adjacency relationship, and determine that it is the long-distance binary language model when the distance relationship is a non-adjacent relationship; and if there are two keywords, determine that the language model corresponding to the keywords is the ternary language model.
  • the model determining sub-unit 702 is further configured to determine a user model corresponding to the keyword according to the usage-habit characteristics of the user to which the keyword relates; or determine a vertical model corresponding to the keyword according to the application domain to which the keyword belongs; or determine a common-word language model corresponding to the keyword according to the common vocabulary to which the keyword belongs; or determine a scenario model corresponding to the keyword according to the topic scenario to which the keyword belongs.
  • the queue obtaining subunit 703 may further include:
  • the determining subunit 901 is configured to separately determine the upper-screen candidate words in the association candidate lexicon of each language model when there are at least two language models.
  • the merging subunit 902 is configured to linearly superimpose and merge the upper-screen candidate words by weight according to the preset weight of each language model.
  • the sorting sub-unit 903 is configured to sort the merged upper screen candidate words according to the weight from high to low to obtain an upper screen candidate word queue at the input cursor.
  • FIG. 10 is a schematic structural diagram of another input device according to an embodiment of the present application.
  • the device may include, in addition to the text obtaining unit 601, the keyword extracting unit 602, the queue obtaining unit 603, and the queue output unit 604, the following:
  • the queue sequencing unit 1001 is configured to reorder the upper-screen candidate word queue according to the topic scenario at the input cursor before the queue output unit 604 outputs the upper-screen candidate word queue.
  • the queue output unit 604 is configured to output the reordered upper-screen candidate word queue.
  • the queue sequencing unit 1001 may further include:
  • the score calculation sub-unit 1101 is configured to determine a feature score of each of the scene feature tags according to the number of the keywords that hit each scene feature tag and the probability sum of the keywords hitting the respective scene feature tags.
  • the context sorting sub-unit 1102 is configured to sort the scene feature tags from high to low according to the feature scores of the respective scene feature tags.
  • the sequencing sub-unit 1103 is configured to reorder the upper-screen candidate word queue according to the order of the scene feature tags, wherein the upper-screen candidate words in the upper-screen candidate word queue each have a respective scene feature tag.
  • by incorporating scenario awareness and reordering the upper-screen candidate words, the device ranks the most suitable candidate words first and gives a more reliable upper-screen candidate word queue.
  • the embodiment of the present application further provides an electronic device, including a memory and a processor, where the memory is configured to store computer instructions or code and the processor, coupled to the memory, is configured to execute the computer instructions or code in the memory to implement the following method:
  • acquiring text information at the input cursor, the text information including preceding text information located before the input cursor and/or following text information located after the input cursor; extracting keywords from the text information; searching the association candidate lexicon of the keywords to obtain an upper-screen candidate word queue at the input cursor; and
  • outputting the upper-screen candidate word queue.
  • the present application also discloses a computer program comprising computer readable code that, when executed on a mobile terminal, causes the mobile terminal to perform the input method described above.
  • a computer-readable recording medium on which the above computer program is recorded is also disclosed.
  • the computer readable recording medium includes any mechanism for storing or transmitting information in a form readable by a computer (eg, a computer).
  • a machine-readable medium includes read only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash storage media, electrical, optical, acoustic, or other forms of propagation signals (eg, carrier waves) , infrared signals, digital signals, etc.).
  • the description of the device embodiments is relatively brief; for relevant details, refer to the description of the method embodiments.
  • embodiments of the embodiments of the present application can be provided as a method, apparatus, or computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, embodiments of the present application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • Embodiments of the present application are described with reference to flowcharts and/or block diagrams of the method, terminal device (system) and computer program product according to the embodiments of the present application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions.
  • These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing terminal device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • the computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device,
  • and the instruction device implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An input method, an input apparatus, and an electronic device. The input method includes: acquiring text information at an input cursor (101), the text information including preceding text information located before the input cursor and/or following text information located after the input cursor; extracting keywords from the text information (102); searching association candidate lexicons of the keywords to obtain a queue of on-screen candidate words at the input cursor (103); and outputting the queue of on-screen candidate words (104). By acquiring the text information at the input cursor and determining the on-screen candidate word queue on the basis of the keywords in that text information, the method solves the prior-art problem that, once the input cursor changes position, no reliable previously committed entry is available and therefore no associated on-screen candidate words can be suggested.

Description

一种输入方法、装置及电子设备
本申请要求于2014年9月9日提交中国专利局、申请号为201410455924.0、申请名称为“一种输入方法、装置及电子设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及通信技术领域,尤其涉及一种输入方法、装置及电子设备。
背景技术
拼音输入是一种最简单的汉字输入方法,它的发展非常快,从第一代的以字输入为主,即用户一次只能输入一个汉字,发展到第二代以词输入为主并具有智能调频功能,这个时候主要依赖的是输入法的词典,发展到第三代,用户可以进行语句的输入,输入法词典中没有的句子也可以进行输入,组词功能对输入的体验有着很大的影响。
输入法联想功能是拼音输入法主动输入的一种扩展,它的出现减少了用户主动输入的次数、按键的次数,并增加了输入法的智能性。该输入法的实现过程是首先获取用户上一次上屏的词条,并根据该词条查询系统二元库等预建词库来获取上屏候选词队列,然后输出该上屏候选词队列。
然而,该输入法中的上屏候选词队列由于必须依赖于上一次上屏的词条,当输入光标改变位置时,就无法获取可靠的上屏词条,进而也无法联想给出该输入光标处的上屏候选词队列。因此,目前需要本领域技术人员迫切解决的一个技术问题就是:如何在输入光标移动时获得可靠的上屏候选词队列。
发明内容
本申请实施例所要解决的技术问题是提供一种输入方法,能够在输入光标移动时获得可靠的上屏候选词队列。
相应的,本申请实施例还提供了一种输入装置及电子设备,用以保证上述方法的实现及应用。
为了解决上述问题,本申请公开了一种输入方法,包括:
获取输入光标处的文本信息,所述文本信息包括位于所述输入光标之前的上文文本信息和/或位于所述输入光标之后的下文文本信息;
提取所述文本信息中的关键词;
查找所述关键词的联想候选词库,获得所述输入光标处的上屏候选词队列;
输出所述上屏候选词队列。
进一步,所述获取输入光标处的文本信息,包括:
当检测到所述输入光标位于文本框内,且停止文本输入的时间超过时间阈值时,获取所述输入光标处的文本信息。
进一步,所述获取输入光标处的文本信息,包括:
以所述输入光标所在的整句分割点或文本框边界作为所述文本信息的长度边界,获取所述输入光标处的文本信息。
进一步,所述查找所述关键词的联想候选词库,获得所述输入光标处的上屏候选词队列,包括:
根据所述关键词与所述输入光标之间的距离关系和/或所述关键词所属的应用属性,确定所述关键词对应的语言模型;
查找所述语言模型的联想候选词库,获得所述输入光标处的上屏候选词队列。
进一步,所述根据所述关键词与所述输入光标之间的距离关系确定所述关键词对应的语言模型,包括:
若所述关键词为一个,则当所述关键词与所述输入光标之间的距离关系为邻接关系时,确定所述关键词对应的语言模型为邻近二元语言模型;当所述距离关系为非邻接关系时确定所述关键词对应的语言模型为远距离二元语言模型;
若所述关键词为两个,则确定所述关键词对应的语言模型为三元语言模型。
进一步,在所述根据所述关键词与所述输入光标之间的距离关系确定所述关键词对应的语言模型之前,还包括:
建立语言模型及其联想候选词库,所述语言模型包括邻近二元语言模型,远距离二元语言模型及三元语言模型;
所述建立语言模型及其联想候选词库,包括:
收集训练语料;
提取所述训练语料中的训练候选词及训练关键词,所述训练关键词与所述训练候选词之间的距离关系包括邻接关系和非邻接关系,所述训练关键词至少为一个;
对所述训练候选词及所述训练关键词进行模型训练,获得所述语言模型及其联想候选词库。
进一步,所述根据所述关键词所属的应用属性确定所述关键词对应的语言模型,包括:
根据所述关键词所属的用户使用习惯特征确定所述关键词对应的用户模型;或者,
根据所述关键词所属的应用领域确定所述关键词对应的垂直模型;或者,
根据所述关键词所属的常用词汇确定所述关键词对应的常见词语言模型;或者,
根据所述关键词所属的话题情景确定所述关键词对应的情景模型。
进一步,所述查找所述语言模型的联想候选词库,获得所述输入光标处的上屏候选词队列,包括:
当所述语言模型至少有两个时,分别确定各所述语言模型的联想候选词库中的上屏候选词;
根据各所述语言模型的预设权重,按照权重线性叠加合并所述上屏候选词;
对合并后的上屏候选词按照权重由高到低进行排序获得所述输入光标处的上屏候选词队列。
进一步,在所述输出所述上屏候选词队列之前,还包括:
根据所述输入光标处的话题情景对所述上屏候选词队列进行调序;
所述输出所述上屏候选词队列,包括:
输出调序后的上屏候选词队列。
进一步,所述根据所述输入光标处的话题情景对所述上屏候选词队列进行调序,包括:
根据命中各情景特征标签的所述关键词的个数及所述关键词命中所述各情景特征标签的概率和,确定所述各情景特征标签的特征得分;
按照所述各情景特征标签的特征得分，由高到低对所述情景特征标签进行排序；
按照所述情景特征标签的顺序对所述上屏候选词队列进行调序,其中,所述上屏候选词队列中的上屏候选词均具有各自的情景特征标签。
本申请还公开了一种输入装置,包括:
文本获取单元,用于获取输入光标处的文本信息,所述文本信息包括位于所述输入光标之前的上文文本信息和/或位于所述输入光标之后的下文文本信息;
关键词提取单元,用于提取所述文本信息中的关键词;
队列获取单元,用于查找所述关键词的联想候选词库,获得所述输入光标处的上屏候选词队列;
队列输出单元,用于输出所述上屏候选词队列。
进一步,所述文本获取单元,具体用于当检测到所述输入光标位于文本框内,且停止文本输入的时间超过时间阈值时,获取所述输入光标处的文本信息。
进一步,所述文本获取单元,具体用于以所述输入光标所在的整句分割点或文本框边界作为所述文本信息的长度边界,获取所述输入光标处的文本信息。
进一步,所述队列获取单元包括:
模型确定子单元,用于根据所述关键词与所述输入光标之间的距离关系和/或所述关键词所属的应用属性,确定所述关键词对应的语言模型;
队列获取子单元,用于查找所述语言模型的联想候选词库,获得所述输入光标处的上屏候选词队列。
进一步，所述模型确定子单元，具体用于若所述关键词为一个，则当所述关键词与所述输入光标之间的距离关系为邻接关系时，确定所述关键词对应的语言模型为邻近二元语言模型；当所述距离关系为非邻接关系时确定所述关键词对应的语言模型为远距离二元语言模型；若所述关键词为两个，则确定所述关键词对应的语言模型为三元语言模型。
进一步,所述队列获取单元还包括:
模型建立子单元,用于在所述模型确定子单元确定所述关键词对应的语言模型之前,建立语言模型及其联想候选词库,所述语言模型包括邻近二元语言模型,远距离二元语言模型及三元语言模型;
所述模型建立子单元包括:
收集子单元,用于收集训练语料;
提取子单元,用于提取所述训练语料中的训练候选词及训练关键词,所述训练关键词与所述训练候选词之间的距离关系包括邻接关系和非邻接关系,所述训练关键词至少为一个;
训练子单元,用于对所述训练候选词及所述训练关键词进行模型训练,获得所述语言模型及其联想候选词库。
进一步,所述模型确定子单元,具体用于根据所述关键词所属的用户使用习惯特征确定所述关键词对应的用户模型;或者,根据所述关键词所属的应用领域确定所述关键词对应的垂直模型;或者,根据所述关键词所属的常用词汇确定所述关键词对应的常见词语言模型;或者,根据所述关键词所属的话题情景确定所述关键词对应的情景模型。
进一步,所述队列获取子单元包括:
确定子单元,用于当所述语言模型至少有两个时,分别确定各所述语言模型的联想候选词库中的上屏候选词;
合并子单元,用于根据各所述语言模型的预设权重,按照权重线性叠加合并所述上屏候选词;
排序子单元,用于对合并后的上屏候选词按照权重由高到低进行排序获得所述输入光标处的上屏候选词队列。
进一步,所述装置还包括:
队列调序单元,用于在所述队列输出单元输出所述上屏候选词队列之前,根据所述输入光标处的话题情景对所述上屏候选词队列进行调序;
所述队列输出单元,用于输出调序后的上屏候选词队列。
进一步,所述队列调序单元包括:
得分计算子单元,用于根据命中各情景特征标签的所述关键词的个数及所述关键词命中所述各情景特征标签的概率和,确定所述各情景特征标签的特征得分;
情景排序子单元，用于按照所述各情景特征标签的特征得分，由高到低对所述情景特征标签进行排序；
调序子单元,用于按照所述情景特征标签的顺序对所述上屏候选词队列进行调序,其中,所述上屏候选词队列中的上屏候选词均具有各自的情景特征标签。
本申请还公开了一种电子设备,包括存储器和处理器,所述存储器用于存储计算机指令或代码,所述处理器和所述存储器耦合,用于执行所述存储器中的计算机指令或代码,实现以下方法:
获取输入光标处的文本信息,所述文本信息包括位于所述输入光标之前的上文文本信息和/或位于所述输入光标之后的下文文本信息;
提取所述文本信息中的关键词;
查找所述关键词的联想候选词库,获得所述输入光标处的上屏候选词队列;
输出所述上屏候选词队列。
本申请还公开了一种计算机程序,包括计算机可读代码,当所述计算机可读代码在移动终端上运行时,导致所述移动终端执行如前所述的输入方法。
本申请还公开了一种计算机可读介质,其中存储了如前所述的计算机程序。
与现有技术相比,本申请实施例至少包括以下优点:
本申请实施例通过获取输入光标处的文本信息，并基于该文本信息中的关键词确定出上屏词候选队列，解决了现有技术中输入光标改变位置后由于无法获取可靠上屏词条而无法联想给出上屏候选词的问题。该方法不仅能够在输入光标移动时获得可靠的上屏候选词，而且，该输入方法不单单依靠上一次的上屏词条联想给出上屏候选词队列，而是可以利用输入光标前、后的文本信息，以及远距离的文本信息来联想给出上屏候选词队列，该方法可以更全面、更正确的理解用户的输入意图，从而可以给出更可靠的上屏候选词队列。
附图说明
图1为本申请实施例一种输入方法的流程示意图;
图2是本申请实施例中一种获得输入光标处的上屏候选词队列的方法流程图;
图3是本申请实施例中一种系统模型及其联想候选词库的建立方法流程图;
图4是本申请实施例中一种根据语言模型对应的联想候选词库获得输入光标处的上屏候选词队列的方法流程图;
图5是本申请实施例中一种根据输入光标处的话题情景对上屏候选词队列进行调序的方法流程图;
图6为本申请实施例一种输入装置的结构示意图;
图7为本申请实施例中一种队列获取单元的结构示意图;
图8为本申请实施例中一种模型建立子单元的结构示意图;
图9为本申请实施例中一种队列获取子单元的结构示意图;
图10为本申请实施例另一种输入装置的结构示意图;
图11为本申请实施例中一种队列调序单元的结构示意图。
具体实施方式
为使本申请的上述目的、特征和优点能够更加明显易懂,下面结合附图和具体实施方式对本申请作进一步详细的说明。
参照图1,为本申请实施例一种输入方法的流程示意图。
在用户进行文本输入的过程中，可以采用本申请实施例的方法来直接代替或结合现有的根据上一次上屏词条预测上屏候选词的方法给出输入光标处的上屏候选词队列，也可以是在某种条件下执行本申请实施例的方法。具体的，当输入装置检测到输入光标位于文本框内，且停止文本输入的时间超过时间阈值时，可以按照本实施例方法给出输入光标处的上屏候选词队列，例如，当用户需要修改或增加文本信息而改变输入光标在文本框内的位置时，在该场景下输入光标始终位于文本框内，且会暂停文本输入。该方法可以包括如下步骤：
步骤101,获取输入光标处的文本信息。
在本步骤中,输入装置首先通过系统API接口读取输入光标处的文本信息,可以以输入光标所在的整句分割点或文本框边界作为该文本信息的长度边界。
其中,该文本信息可以包括位于输入光标之前的上文文本信息,或者位于输入光标之后的下文文本信息,当然,如果输入光标的前后都存在文本信息,也可以同时获取上文文本信息和下文文本信息。
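A minimal Python sketch of step 101's boundary handling, assuming the text-box content and the cursor offset are already available through the system API; the delimiter set used as sentence-split points is an assumption, not something the patent specifies.

```python
import re

SENTENCE_SPLIT = re.compile(r"[。！？!?；;\n]")  # assumed sentence-split delimiters

def text_around_cursor(full_text, cursor):
    """Step 101: return the preceding / following context at the cursor, clipped at
    the nearest sentence-split point (or the text-box boundary when none is found)."""
    before = full_text[:cursor]
    after = full_text[cursor:]
    splits = [m.end() for m in SENTENCE_SPLIT.finditer(before)]
    before = before[splits[-1]:] if splits else before
    m = SENTENCE_SPLIT.search(after)
    after = after[:m.start()] if m else after
    return before, after

text = "去年中秋去了韩国，今年想去。明天出发"
print(text_around_cursor(text, text.index("。")))  # ('去年中秋去了韩国，今年想去', '')
```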
步骤102,提取文本信息中的关键词。
在本实施例中可以预先设置关键词元词表,该关键词元词表是一个集合,其中包含可以作为关键词的词条。可以约定凡是出现在该关键词元词表中的词条都可以作为关键词,不在这个关键词元词表中的词条都不作为关键词。
在本步骤中即可提取文本信息中所有属于该关键词元词表中的词条作为关键词。具体的，对于上文文本信息，可以从输入光标处开始，向前到整句的分割点或者文本框边界结束，利用动态规划算法进行遍历得到上文文本信息中的关键词，对于下文文本信息，可以从输入光标处开始，向后到整句的分割点或者文本框边界结束，利用动态规划算法进行遍历得到下文文本信息中的关键词。上文文本信息中的关键词和下文文本信息中的关键词可以分别存入不同的集合，或者进行区分标注，以便于后续搜索上屏候选词，具体请参见后续实施例的描述。
按照上述方法提取的文本信息中的关键词可以有一个也可能有多个,可能关键词均位于上文文本信息,也可能关键词均位于下文文本信息,还可能上文文本信息和下文文本信息中均存在关键词。
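A minimal sketch of step 102, assuming a plain Python set stands in for the keyword meta-word table and a greedy longest-match scan stands in for the dynamic-programming traversal; tagging each keyword with a before/after label is one way of keeping the two keyword groups separate, as the paragraph above suggests.

```python
def extract_keywords(before_text, after_text, keyword_vocab, max_len=4):
    """Collect vocabulary entries found in the text before and after the cursor.

    before_text / after_text are already clipped at the sentence-split point or
    text-box boundary; keyword_vocab is the keyword meta-word table.
    """
    def scan(text, side):
        found, i = [], 0
        while i < len(text):
            for length in range(min(max_len, len(text) - i), 0, -1):
                piece = text[i:i + length]
                if piece in keyword_vocab:
                    found.append((piece, side))   # tag the keyword with its side
                    i += length
                    break
            else:
                i += 1                            # no vocabulary entry starts here
        return found

    return scan(before_text, "before") + scan(after_text, "after")

vocab = {"会议", "晚上", "宴会", "举行"}                     # hypothetical vocabulary
print(extract_keywords("会议在晚上", "盛大的宴会", vocab))
# [('会议', 'before'), ('晚上', 'before'), ('宴会', 'after')]
```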
步骤103,查找关键词的联想候选词库,获得输入光标处的上屏候选词队列。
在获得文本信息的关键词后,可以根据该关键词查找对应的联想候选词库,从而获得输入光标处的上屏候选词队列。
其中一种方式，可以是每个关键词对应一个联想候选词库，该联想候选词库内的上屏候选词按照使用概率由大到小进行排序，查询多个关键词的联想候选词库时，极有可能存在重复的上屏候选词，可以将各词库中的上屏候选词按照重复率由高到低排列，从而获得输入光标处的上屏候选队列。
另一种方式,还可以是预先建立语言模型及其联想候选词库。该语言模型可以是基于关键词与输入光标之间的多种距离关系建立的,也可以是基于关键词所属的应用属性建立的,其中,应用属性可以是用户对关键词的使用习惯,也可以是关键词所属的应用领域,如时间、地理位置、节日祝福语等领域,也可以是关键词所属的常用词汇,还可以是关键词所属的话题情景等。在执行本步骤时,可以只确定上步骤提取的关键词对应的一种语言模型,然后根据该确定出的语言模型的联想候选词库获得输入光标处的上屏候选词队列;也可以确定出所提取关键词对应的多种语言模型,然后将多种语言模型的联想候选词库进行合并,确定出最终的输入光标处的上屏候选词队列。具体请参见后续实施例的描述。
当然还可以存在其它方式,此处不再一一列举。
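For the first approach above (one association lexicon per keyword, with candidates that recur across more keyword lexicons ranked higher), a minimal sketch; the dictionary layout and the tie-breaking by per-lexicon rank are assumptions rather than the patent's concrete data structures.

```python
from collections import defaultdict

def merge_by_repetition(keywords, lexicons):
    """lexicons: {keyword: [candidates ordered by usage probability, high -> low]}.

    Candidates appearing in more keyword lexicons (higher repetition rate) come
    first; ties keep the better per-lexicon position."""
    hits, best_rank = defaultdict(int), defaultdict(lambda: float("inf"))
    for kw in keywords:
        for rank, cand in enumerate(lexicons.get(kw, [])):
            hits[cand] += 1
            best_rank[cand] = min(best_rank[cand], rank)
    return sorted(hits, key=lambda c: (-hits[c], best_rank[c]))

lexicons = {"晚上": ["举行", "睡觉", "约会"], "宴会": ["举行", "取消"]}
print(merge_by_repetition(["晚上", "宴会"], lexicons))  # ['举行', '睡觉', '取消', '约会']
```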
步骤104,输出上屏候选词队列。
在获得上屏候选词队列后,可以直接输出该上屏候选词队列供用户选择,也可以首先对该上屏候选词队列进行调序后再输出调序后的上屏候选词队列,调序方法有多种。
本申请实施例通过获取输入光标处的文本信息,并基于该文本信息中的关键词确定出上屏词候选队列,解决了现有技术中输入光标改变位置后由于无法获取可靠上屏词条而无法联想给出上屏候选词的问题。该方法不仅能够在输入光标移动时获得可靠的上屏候选词,而且,该输入方法不单单依靠上一次的上屏词条联想给出上屏候选词队列,而是可以利用输入光标前、后的文本信息,以及远距离的文本信息来联想给出上屏候选词队列,该方法可以更全面、更正确的理解用户的输入意图,从而可以给出更可靠的上屏候选词队列。
在本申请的另一实施例中,如前所述,在执行步骤103查找关键词的联想候选词库,获得输入光标处的上屏候选词队列时,其中一种方式可以如图2所示,包括以下步骤:
步骤201,建立语言模型及其联想候选词库。
首先,本步骤无需在每次获得输入光标处的上屏候选词队列时重复执行,只在初始状态执行一次即可。
本步骤中建立的语言模型可以有多种,本实施例中,可以包括系统模型,用户模型,垂直模型,常见词语言模型,情景模型。
系统模型是针对关键词与输入光标之间的距离关系建立的语言模型;用户模型,垂直模型,常见词语言模型,情景模型均是针对关键词所属的应用属性建立的语言模型。其中,用户模型是针对用户对关键词的使用习惯建立的模型;垂直模型是针对关键词所属的应用领域,如时间、地理位置、节日祝福语等领域建立的模型;常见词语言模型是针对关键词所属的常用词汇建立的模型;情景模型是针对关键词所属的话题情景建立的模型。下面对各模型分别进行介绍。
1)系统模型包括邻近二元语言模型,远距离二元语言模型及三元语言模型。该系统模型及其联想候选词库的建立过程,如图3所示,可以包括:
步骤301,收集训练语料。
步骤302,提取训练语料中的训练候选词及训练关键词。
针对每一个训练语料均按照关键词元词表来提取关键词,作为训练关键词,并设定训练语料中的某一位置的词条作为训练候选词,其中,为了训练得到不同的系统模型,需要训练关键词与训练候选词之间的距离关系,包括邻接关系和非邻接关系,且训练关键词至少为一个。
其中，邻接关系是指训练关键词与训练候选词之间没有间隔或者是仅间隔停用字，非邻接关系反之。停用字是指辅助用户的字词，例如语气词“哈”、“了”、“嗯”等。
步骤303,对训练候选词及训练关键词进行模型训练,获得语言模型及其对应的联想候选词库。
该模型训练的过程与现有技术中邻近二元语言模型的训练过程类似,此处不再赘述。
在模型训练后即可获得邻近二元语言模型,远距离二元语言模型,三元语言模型,及各模型的联想候选词库。
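A toy sketch of the training flow in steps 301–303, assuming a whitespace-pre-segmented corpus and counting every in-sentence word pair as a long-distance bigram; a real implementation would add stop-character handling, frequency thresholds, and probability estimation with smoothing.

```python
from collections import Counter
from itertools import combinations

def train_ngram_lexicons(corpus):
    """corpus: iterable of pre-segmented sentences, e.g. ['会议 在 晚上 召开', ...].
    Returns raw counts for the adjacent-bigram, long-distance-bigram and trigram models."""
    adjacent, long_distance, trigram = Counter(), Counter(), Counter()
    for line in corpus:
        words = line.split()
        for a, b in zip(words, words[1:]):
            adjacent[(a, b)] += 1                 # neighbouring pairs
        for a, b in combinations(words, 2):
            long_distance[(a, b)] += 1            # any ordered pair within the sentence
        for a, b, c in zip(words, words[1:], words[2:]):
            trigram[(a, b, c)] += 1
    return adjacent, long_distance, trigram

adj, far, tri = train_ngram_lexicons(["会议 在 晚上 召开", "举行 盛大 的 宴会"])
print(far[("会议", "召开")], tri[("会议", "在", "晚上")])  # 1 1
```

The counts can then be normalized into the three association lexicons that the text goes on to describe.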
其中，邻近二元语言模型用来解决邻接的关键词与上屏候选词的二元关系，该邻接关系可能是上文文本信息中的关键词与上屏候选词之间的关系，也可能是上屏候选词与下文文本信息中的关键词之间的关系。例如，举行~晚宴，举行是上屏候选词，晚宴是下文文本信息中的关键词。邻近二元语言模型是确定性较高的语言模型，缺点是信息量较少，能够预测出的候选过多，难以从中选择用户想要的。
远距离二元语言模型用来解决非邻接关系的关键词与上屏候选词之间的二元关系,该远距离关系可能是上文文本信息中的关键词与上屏候选词之间的关系,也可能是上屏候选词与下文文本信息中的关键词之间的关系。与邻近二元语言模型不同的是,远距离二元不需要两个元词是相邻的;例如,关键词“苹果”和上屏候选词“梨”。远距离二元语言模型是两个元词共现关系的一种体现,它往往表征着两个元词之间的关联度。
三元语言模型用来解决两个关键词与上屏候选词之间的三元关系,给出两个关键词对上屏候选词的预测。该两个关键词与上屏候选词之间的预测关系可能是两个上文文本信息中的关键词对上屏候选词的预测,或者,两个下文文本信息中的关键词对上屏候选词之间的预测,还或者,上、下文文本信息中各出现一个关键词以两边夹的方式对中间上屏候选词的预测。两个上文文本信息中的关键词对上屏候选词的预测,例如:“会议在晚上(召开)”,“召开”为上屏候选词,“会议~召开”是一个比较显著的远距离二元,“召开”作为上屏候选词的排位靠前;“在晚上~召开”虽然二元关系显著,但是排位在百位以后;如果只根据现有技术中的邻接二元关系,“召开”这个上屏候选词很可能被遗漏掉,而通过引进该三元语言模型“A~B~C”,A表示远距离上文文本信息中的某一个关键词,B表示近距离/邻接的关键词,C则为上屏候选词,即可获得可靠的上屏候选词。另外一种情况,如果输入光标前后各提出关键词“脚本”和“指南”,那么“脚本~学习~指南”则会被利用来预测上屏候选词“学习”。
2)用户模型包括用户二元模型，用户三元模型，远距离用户二元模型。其中，用户二元模型用于解决前一次用户上屏与下一次用户上屏存在的用户二元关系；用户三元模型用于解决连续三次用户上屏存在的用户三元关系；远距离用户二元模型用于解决一定距离内用户上屏词与本次用户上屏词存在的远距离二元关系。该用户模型是基于对用户对词条的使用习惯进行统计所获得的模型，每种模型对应统计有各自的联想候选词库。
3)垂直模型包括诸多垂直领域的语言模型，这些语言模型与词条所属领域的分类有关，例如，时间相关的领域系统二元语言模型，例如，“晚上”对应的垂直模型的联想候选词库中包含“9点、10点、11点”，“周六”对应的垂直模型的联想候选词库中包含“上午、下午”；位置相关的领域语言模型，例如，“五道口”对应的垂直模型的联想候选词库中包含“清华同方、日昌、华联”等；量词相关的领域语言模型；推荐相关的领域语言模型；输入app环境的领域语言模型；称谓、人名相关的领域语言模型；节日相关祝福语领域语言模型等，各垂直模型是基于词条所属领域进行统计所获得的模型，每种模型对应统计有各自的联想候选词库。
4)常见词语言模型(系统词语言模型)用来覆盖对一个实体词不完整输入的情况,完成其对完整词条后缀的预测,该模型是基于对常见词条进行统计所获得的模型;例如,上文文本信息中的关键词是“笑傲”,则给出的上屏候选词为“江湖”。
5)情景模型是针对关键词所属的话题情景所建立的模型。例如,会议情景,聚餐情景等,每个关键词都具有一个或多个情景特征标签,每种情景特征标签对应一个情景模型,每种情景模型都具有各自的联想候选词库。
在预先建立上述语言模型后即可执行步骤202。
步骤202,根据关键词与输入光标之间的距离关系和/或关键词所属的应用属性确定关键词对应的语言模型。
本步骤中可以根据关键词与输入光标之间的距离关系确定关键词对应的系统模型，若提取的关键词为一个，则当关键词与输入光标之间的距离关系为邻接关系时，确定关键词对应的语言模型为邻近二元语言模型；当距离关系为非邻接关系时确定关键词对应的语言模型为远距离二元语言模型；当关键词为两个时，确定关键词对应的语言模型为三元语言模型。
也可以根据关键词所属的某一应用属性确定关键词对应的语言模型，例如，根据关键词所属的用户使用习惯特征确定关键词对应的用户模型；或者，根据关键词所属的应用领域确定关键词对应的垂直模型；或者，根据所述关键词所属的常用词汇确定所述关键词对应的常见词语言模型；或者，根据关键词所属的话题情景确定关键词对应的情景模型等。
还可以同时确定关键词对应的多种语言模型,例如远距离二元模型、邻接二元模型、三元模型、用户二元模型,以及常见词语言模型等。
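A sketch of the selection rule in step 202 for the distance criterion only (one keyword adjacent to the cursor selects the adjacent bigram model; one non-adjacent keyword selects the long-distance bigram model; two keywords select the trigram model). Only the keyword-before-cursor case is shown, and the stop-character set and returned model names are assumptions.

```python
STOP_CHARS = set("了哈嗯啊吧的")   # assumed stop characters

def is_adjacent(keyword_end, cursor, text):
    """Adjacent = nothing but stop characters between the keyword and the cursor."""
    return all(ch in STOP_CHARS for ch in text[keyword_end:cursor])

def choose_language_model(keywords, cursor, text):
    """keywords: list of (word, start, end) spans extracted from the cursor context."""
    if len(keywords) >= 2:
        return "trigram"
    _, _, end = keywords[0]
    return "adjacent_bigram" if is_adjacent(end, cursor, text) else "long_distance_bigram"

text = "会议在晚上"
print(choose_language_model([("晚上", 3, 5)], 5, text))                  # adjacent_bigram
print(choose_language_model([("会议", 0, 2)], 5, text))                  # long_distance_bigram
print(choose_language_model([("会议", 0, 2), ("晚上", 3, 5)], 5, text))  # trigram
```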
步骤203,根据语言模型对应的联想候选词库获得输入光标处的上屏候选词队列。
对于系统模型、用户模型、垂直模型，为了便于查找语言模型的联想候选词库中的上屏候选队列，还可以在各联想候选词库中按照常规方式建立索引，例如左元索引和右元索引。当关键词来源于上文文本信息时，可以利用语言模型的联想候选词库中的左元索引查找输入光标处的上屏候选词队列；当关键词来源于下文文本信息时，可以利用语言模型的联想候选词库中的右元索引查找输入光标处的上屏候选词队列；当关键词来源于上、下文文本信息时，则会兼顾两个方向的搜索，除此之外，还会增加以中间元为搜索目标的查询，为此，三元模型的联想候选词库中会建立两个二级索引，以便在两个方向上搜索中间元。对于常见词模型，与现有联想方式类似，可以采用匹配前缀的方式获得输入光标处的上屏候选词队列。
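The left-element / right-element indexes mentioned above can be pictured as two dictionaries over the same bigram entries; a minimal sketch, where the class name, the weights, and the before/after convention are assumptions rather than the patent's concrete storage format (the trigram middle-element lookup would add secondary indexes in the same spirit).

```python
from collections import defaultdict

class BigramLexicon:
    """Association lexicon with a left-element and a right-element index, so the
    candidate can be searched from a keyword on either side of the cursor."""

    def __init__(self):
        self.left_index = defaultdict(dict)    # keyword before cursor -> {candidate: weight}
        self.right_index = defaultdict(dict)   # keyword after cursor  -> {candidate: weight}

    def add(self, left, right, weight):
        self.left_index[left][right] = weight
        self.right_index[right][left] = weight

    def candidates(self, keyword, side):
        index = self.left_index if side == "before" else self.right_index
        return sorted(index[keyword].items(), key=lambda kv: -kv[1])

lex = BigramLexicon()
lex.add("举行", "宴会", 0.9)
lex.add("取消", "宴会", 0.4)
print(lex.candidates("宴会", "after"))   # [('举行', 0.9), ('取消', 0.4)]
```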
当上步骤确定出的语言模型至少有两个时,该获得输入光标处的上屏候选词队列的过程,如图4所示,还可以进一步包括以下步骤:
步骤401,确定各语言模型的联想候选词库中的上屏候选词。
步骤402,根据各语言模型的预设权重,按照权重线性叠加合并上屏候选词。
步骤403,对合并后的上屏候选词按照权重由高到低进行排序获得输入光标处的上屏候选词队列。
通过结合关键词对应的多种语言模型可以获得更理想更可靠的上屏候选词队列。例如,输入光标处的文本信息为“明天我到大连,我想去发现[光标]”,用户的输入意图是他想去发现王国这个游乐场。提取上文文本信息中的关键词“大连”和“发现”,其中“大连”提示了用户目的地的位置,该关键词属于垂直模型,那么结合关键词“发现”,即可获得可靠的上屏候选词“王国”。
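A sketch of the merge described in steps 401–403: each language model contributes scored candidates, the scores are combined by a weighted linear superposition using preset per-model weights, and the merged list is sorted from high to low. The model names, weights, and score values below are placeholders.

```python
from collections import defaultdict

def merge_candidates(per_model_candidates, model_weights):
    """per_model_candidates: {model_name: {candidate: score}};
    model_weights: preset weight per model. Returns the on-screen candidate queue."""
    combined = defaultdict(float)
    for model, candidates in per_model_candidates.items():
        w = model_weights.get(model, 1.0)
        for cand, score in candidates.items():
            combined[cand] += w * score          # linear superposition by weight
    return sorted(combined, key=combined.get, reverse=True)

queue = merge_candidates(
    {"long_distance_bigram": {"王国": 0.8, "广场": 0.3},
     "vertical_location":    {"王国": 0.9, "星海": 0.5}},
    {"long_distance_bigram": 0.6, "vertical_location": 0.4},
)
print(queue)  # ['王国', '星海', '广场']
```

Candidates supported by several models accumulate weight and therefore rise to the front, which is what makes the "大连 + 发现 → 王国" example above work.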
在本申请的另一实施例中，在基于上述方式获得上屏候选队列后可以直接输出该上屏候选队列供用户选择，也还可以在输出该上屏候选队列之前，对上屏候选队列进行调序，然后再输出调序后的上屏候选词队列。
调序方式有多种,其中一种可以根据输入光标处的话题情景对上屏候选词队列进行调序的方法,如图5所示,可以包括:
步骤501,根据命中各情景特征标签的关键词的个数及关键词命中各情景特征标签的概率和,确定各情景特征标签的特征得分。
每一关键词可能命中一个或多个情景特征标签，每个情景特征标签对应一种话题情景，而最终上屏结果中关键词命中某一情景特征标签的概率可以根据统计获得，因此，每个情景特征标签 i 的特征得分 feature_i 即可表示为：
feature_i = N_i · Σ_{j=1}^{N_i} word_j
其中，N_i 为命中情景特征标签 i 的关键词的个数；word_j 是最终上屏结果中第 j 个关键词命中该情景特征标签 i 的概率，j = 1, …, N_i。
步骤502，按照情景特征标签的特征得分，由高到低对各情景特征标签进行排序。
得分越高的情景特征标签，其对应的话题情景越可能是最终上屏词所属的话题情景。
步骤503,根据情景特征标签的顺序对上屏候选词队列进行调序。
在经过前述实施例的方法获得上屏候选词队列后,该上屏候选词队列中的上屏候选词均具有各自的情景特征标签。本步骤中,即可根据情景特征标签的顺序对上屏候选词进行调序,进而获得最终的上屏候选队列。
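A sketch of the reordering in steps 501–503, assuming the feature score of a scenario tag is the number of keywords hitting it multiplied by the sum of their hit probabilities (the reading given above), and that each on-screen candidate carries a single scenario feature tag; the tag and probability tables are hypothetical.

```python
from collections import defaultdict

def score_tags(keyword_tag_probs):
    """keyword_tag_probs: {keyword: {scenario_tag: hit probability}}.
    feature_i = N_i * sum of probabilities of the keywords hitting tag i."""
    count, prob_sum = defaultdict(int), defaultdict(float)
    for tag_probs in keyword_tag_probs.values():
        for tag, p in tag_probs.items():
            count[tag] += 1
            prob_sum[tag] += p
    return {tag: count[tag] * prob_sum[tag] for tag in count}

def reorder_queue(queue, candidate_tags, keyword_tag_probs):
    """Stable-sort the candidate queue by the score of each candidate's scenario tag."""
    scores = score_tags(keyword_tag_probs)
    return sorted(queue, key=lambda c: -scores.get(candidate_tags.get(c, ""), 0.0))

kw_probs = {"宴会": {"聚餐": 0.7, "会议": 0.2}, "晚上": {"聚餐": 0.4}}
tags = {"举行": "会议", "睡觉": "日常", "约会": "聚餐"}
print(reorder_queue(["举行", "睡觉", "约会"], tags, kw_probs))  # ['约会', '举行', '睡觉']
```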
本申请实施例结合了情景感知功能,通过对上屏候选词进行排序调整,将理想候选词排位靠前,给出了更可靠的上屏候选词队列。
下面通过具体实例进行说明。
例如，输入光标处的文本信息为“我们安排在花园酒店，在晚上[光标]盛大的宴会”。按照本申请实施例的方法，提取文本信息中的关键词：“安排”、“花园”、“酒店”、“晚上”、“盛大”、“宴会”；根据关键词与输入光标之间的距离关系确定关键词对应的语言模型为：邻接二元模型、远距离二元模型、三元模型；根据语言模型对应的联想候选词库获得输入光标处的上屏候选词队列为：(晚上)睡觉、(晚上)约会、举行(宴会)、(安排)发货、(花园)门口、(晚上)举行(宴会)；对上屏候选词队列进行调序后获得最终的上屏候选队列为：举行、睡觉、约会、门口、发货。
在这个例子中,支持“举行”这个上屏候选词出现的技术点有两个:第一,支持输入光标下文文本信息的理解;第二,需要远距离的触发上屏候选词的过程支持。“在晚上”和“举行”是存在一定二元关系,但是其关系极弱,一般联想预测结果将这个例子提前会略显突兀。在获取输入光标后的下文文本信息中,邻接下文为“盛大”,必定不能对“举行”这个预测候选做出任何贡献。而“举行~宴会”是一对极为强烈的远距离二元语言模型,这对“举行”候选的预测起着至关重要的作用。
再例如,输入光标处的文本信息为“去年中秋去了韩国,今年想去[光标]”。用户的输入意图是上屏“日本”。输入光标所在处为“想去”的后面,那么按照传统的联想策略,会利用“去”和“想去”进行上屏候选词搜索。按照本申请实施例方法,“韩国”和“日本”是一个抽取好的远距离二元语言模型的联想候选词库,“去”和“日本”是一个抽取好的邻近二元语言模型的联想候选词库;那么在二者协同关系的作用下,“日本”这个上屏候选词即会在上屏候选词队列中极为靠前,能够产生的类似上屏候选词的还可能有“泰国”、“新加坡”。
再例如，如果上文文本信息的关键词是“晚上”，那么根据该关键词对应的时间领域的垂直模型以及用户模型，利用当前的系统时间以及用户输入历史中上屏过的时间数据进行预测，即可给出上屏候选词队列：{10点、9点、11点}；如果用户选择了其中的某个上屏候选词，则会继续输出上屏候选词队列：{半、一刻、三刻}。
再例如,如果上文文本信息的关键词是“五道口”,那么根据该关键词对应的地理领域的垂直模型以及用户模型,利用用户输入历史中的地名输入历史数据以及即时获取到的位置信息,即可给出附近以及相关地名名词作为上屏候选词队列:{清华同方、日昌、华联};那么,该方式在用户输入了五道口以后,系统提供的上屏候选词除了“城铁”之外,又提供了“清华同方”,让用户眼前为之一亮。
再例如，用户想表达“故都的秋”这个意思，用户已经完成对前三字的输入，但用户的上屏形式可能是多种多样的，“故都~的~秋”，“故~都~的~秋”，“故都的~秋”；这样同样是对“秋”这个上屏候选词进行联想，上一次上屏信息却差异很大，能够预测出“秋”这个候选也许只能通过最后一种用户输入时的断句方式；而本申请方法通过提取关键词“故都”进而根据该关键词对应的语言模型，例如常见词语言模型，进行上屏候选词的预测，即可获得上屏候选词“秋”。
上述实施例公开的方法可以更全面、更正确的理解用户输入意图。上述实施例不仅可以应用于中文输入的场景,还可以应用于英文、日文、韩文等其它语言输入场景。
需要说明的是,对于方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请实施例并不受所描述的动作顺序的限制,因为依据本申请实施例,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作并不一定是本申请实施例所必须的。
参照图6,为本申请实施例一种输入装置的结构示意图。
该装置可以包括如下单元:
文本获取单元601,用于获取输入光标处的文本信息,所述文本信息包括位于所述输入光标之前的上文文本信息和/或位于所述输入光标之后的下文文本信息。
关键词提取单元602,用于提取所述文本信息中的关键词。
队列获取单元603,用于查找所述关键词的联想候选词库,获得所述输入光标处的上屏候选词队列。
队列输出单元604,用于输出所述上屏候选词队列。
该装置通过获取输入光标处的文本信息，并基于该文本信息中的关键词确定出上屏词候选队列，解决了现有技术中输入光标改变位置后由于无法获取可靠上屏词条而无法联想给出上屏候选词的问题。该装置不仅能够在输入光标移动时获得可靠的上屏候选词，而且，该输入方法不单单依靠上一次的上屏词条联想给出上屏候选词队列，而是可以利用输入光标前、后的文本信息，以及远距离的文本信息来联想给出上屏候选词队列，该装置可以更全面、更正确的理解用户的输入意图，从而可以给出更可靠的上屏候选词队列。
在本申请另一实施例中,文本获取单元601,具体可以用于当检测到所述输入光标位于文本框内,且停止文本输入的时间超过时间阈值时,获取所述输入光标处的文本信息。文本获取单元,还可以具体用于以所述输入光标所在的整句分割点或文本框边界作为所述文本信息的长度边界,获取所述输入光标处的文本信息。
在本申请另一实施例中,如图7所示,队列获取单元603可以进一步包括:
模型建立子单元701,用于在所述模型确定子单元702确定所述关键词对应的语言模型之前,建立语言模型及其联想候选词库,所述语言模型包括邻近二元语言模型,远距离二元语言模型及三元语言模型。
模型确定子单元702,用于根据所述关键词与所述输入光标之间的距离关系和/或所述关键词所属的应用属性确定所述关键词对应的语言模型。
队列获取子单元703,用于查找所述语言模型的联想候选词库,获得所述输入光标处的上屏候选词队列。
其中,如图8所示,模型建立子单元701又进一步可以包括:
收集子单元801,用于收集训练语料。
提取子单元802,用于提取所述训练语料中的训练候选词及训练关键词,所述训练关键词与所述训练候选词之间的距离关系包括邻接关系和非邻接关系,所述训练关键词至少为一个。
训练子单元803,用于对所述训练候选词及所述训练关键词进行模型训练,获得所述语言模型及其联想候选词库。
其中,模型确定子单元702,具体用于若所述关键词为一个,则当所述关键词与所述输入光标之间的距离关系为邻接关系时,确定所述关键词对应的语言模型为邻近二元语言模型;当所述距离关系为非邻接关系时确定所述关键词对应的语言模型为远距离二元语言模型;当所述关键词为两个时,确定所述关键词对应的语言模型为三元语言模型。
模型确定子单元702，还具体可以用于根据所述关键词所属的用户使用习惯特征确定所述关键词对应的用户模型；或者，根据所述关键词所属的应用领域确定所述关键词对应的垂直模型；或者，根据所述关键词所属的常用词汇确定所述关键词对应的常见词语言模型；或者，根据所述关键词所属的话题情景确定所述关键词对应的情景模型。
如图9所示,队列获取子单元703又进一步可以包括:
确定子单元901,用于当所述语言模型至少有两个时,分别确定各所述语言模型的联想候选词库中的上屏候选词。
合并子单元902,用于根据各所述语言模型的预设权重,按照权重线性叠加合并所述上屏候选词。
排序子单元903,用于对合并后的上屏候选词按照权重由高到低进行排序获得所述输入光标处的上屏候选词队列。
参见图10,为本申请实施例另一种输入装置的结构示意图。
该装置除了可以包括上述文本获取单元601,关键词提取单元602,队列获取单元603,队列输出单元604之外,还可以包括:
队列调序单元1001,用于在所述队列输出单元604输出所述上屏候选词队列之前,根据所述输入光标处的话题情景对所述上屏候选词队列进行调序。
队列输出单元604,用于输出调序后的上屏候选词队列。
其中,如图11所示,队列调序单元1001可以进一步包括:
得分计算子单元1101,用于根据命中各情景特征标签的所述关键词的个数及所述关键词命中所述各情景特征标签的概率和,确定所述各情景特征标签的特征得分。
情景排序子单元1102，用于按照所述各情景特征标签的特征得分，由高到低对所述情景特征标签进行排序。
调序子单元1103,用于按照所述情景特征标签的顺序对所述上屏候选词队列进行调序,其中,所述上屏候选词队列中的上屏候选词均具有各自的情景特征标签。
该装置结合了情景感知功能,通过对上屏候选词进行排序调整,将理想候选词排位靠前,给出了更可靠的上屏候选词队列。
本申请实施例还提供了一种电子设备,包括存储器和处理器,所述存储器用于存储计算机指令或代码,所述处理器和所述存储器耦合,用于执行所述存储器中的计算机指令或代码,实现以下方法:
获取输入光标处的文本信息,所述文本信息包括位于所述输入光标之前的上文文本信息和/或位于所述输入光标之后的下文文本信息;
提取所述文本信息中的关键词;
查找所述关键词的联想候选词库,获得所述输入光标处的上屏候选词队列;
输出所述上屏候选词队列。
本申请还公开了一种计算机程序,包括计算机可读代码,当所述计算机可读代码在移动终端上运行时,导致所述移动终端执行上述输入方法。
本申请还公开了一种计算机可读记录介质，其上记录有上述计算机程序。所述计算机可读记录介质包括用于以机器（例如计算机）可读的形式存储或传送信息的任何机制。例如，机器可读介质包括只读存储器（ROM）、随机存取存储器（RAM）、磁盘存储介质、光存储介质、闪速存储介质、电、光、声或其他形式的传播信号（例如，载波、红外信号、数字信号等）等。
对于装置实施例而言,由于其与方法实施例基本相似,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
本说明书中的各个实施例均采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,各个实施例之间相同相似的部分互相参见即可。
本领域内的技术人员应明白,本申请实施例的实施例可提供为方法、装置、或计算机程序产品。因此,本申请实施例可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请实施例可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本申请实施例是参照根据本申请实施例的方法、终端设备（系统）、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理终端设备的处理器以产生一个机器，使得通过计算机或其他可编程数据处理终端设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理终端设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理终端设备上,使得在计算机或其他可编程终端设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程终端设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
尽管已描述了本申请实施例的优选实施例,但本领域内的技术人员一旦得知了基本创造性概念,则可对这些实施例做出另外的变更和修改。所以,所附权利要求意欲解释为包括优选实施例以及落入本申请实施例范围的所有变更和修改。
最后,还需要说明的是,在本文中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者终端设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者终端设备所固有的要素。在没有更多限制的情况下,由语句“包括一个......”限定的要素,并不排除在包括所述要素的过程、方法、物品或者终端设备中还存在另外的相同要素。
以上对本申请所提供的一种输入方法、装置和电子设备,进行了详细介绍,本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本申请的方法及其核心思想;同时,对于本领域的一般技术人员,依据本申请的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本申请的限制。

Claims (23)

  1. 一种输入方法,其特征在于,包括:
    获取输入光标处的文本信息,所述文本信息包括位于所述输入光标之前的上文文本信息和/或位于所述输入光标之后的下文文本信息;
    提取所述文本信息中的关键词;
    查找所述关键词的联想候选词库,获得所述输入光标处的上屏候选词队列;
    输出所述上屏候选词队列。
  2. 根据权利要求1所述的方法,其特征在于,所述获取输入光标处的文本信息,包括:
    当检测到所述输入光标位于文本框内,且停止文本输入的时间超过时间阈值时,获取所述输入光标处的文本信息。
  3. 根据权利要求1所述的方法,其特征在于,所述获取输入光标处的文本信息,包括:
    以所述输入光标所在的整句分割点或文本框边界作为所述文本信息的长度边界,获取所述输入光标处的文本信息。
  4. 根据权利要求1所述的方法,其特征在于,所述查找所述关键词的联想候选词库,获得所述输入光标处的上屏候选词队列,包括:
    根据所述关键词与所述输入光标之间的距离关系和/或所述关键词所属的应用属性,确定所述关键词对应的语言模型;
    查找所述语言模型的联想候选词库,获得所述输入光标处的上屏候选词队列。
  5. 根据权利要求4所述的方法,其特征在于,所述根据所述关键词与所述输入光标之间的距离关系确定所述关键词对应的语言模型,包括:
    若所述关键词为一个,则当所述关键词与所述输入光标之间的距离关系为邻接关系时,确定所述关键词对应的语言模型为邻近二元语言模型;当所述距离关系为非邻接关系时确定所述关键词对应的语言模型为远距离二元语言模型;
    若所述关键词为两个，则确定所述关键词对应的语言模型为三元语言模型。
  6. 根据权利要求5所述的方法,其特征在于,在所述根据所述关键词与所述输入光标之间的距离关系确定所述关键词对应的语言模型之前,还包括:
    建立语言模型及其联想候选词库,所述语言模型包括邻近二元语言模型,远距离二元语言模型及三元语言模型;
    所述建立语言模型及其联想候选词库,包括:
    收集训练语料;
    提取所述训练语料中的训练候选词及训练关键词,所述训练关键词与所述训练候选词之间的距离关系包括邻接关系和非邻接关系,所述训练关键词至少为一个;
    对所述训练候选词及所述训练关键词进行模型训练,获得所述语言模型及其联想候选词库。
  7. 根据权利要求4所述的方法,其特征在于,所述根据所述关键词所属的应用属性确定所述关键词对应的语言模型,包括:
    根据所述关键词所属的用户使用习惯特征确定所述关键词对应的用户模型;或者,
    根据所述关键词所属的应用领域确定所述关键词对应的垂直模型;或者,
    根据所述关键词所属的常用词汇确定所述关键词对应的常见词语言模型;或者,
    根据所述关键词所属的话题情景确定所述关键词对应的情景模型。
  8. 根据权利要求4所述的方法,其特征在于,所述查找所述语言模型的联想候选词库,获得所述输入光标处的上屏候选词队列,包括:
    当所述语言模型至少有两个时,分别确定各所述语言模型的联想候选词库中的上屏候选词;
    根据各所述语言模型的预设权重,按照权重线性叠加合并所述上屏候选词;
    对合并后的上屏候选词按照权重由高到低进行排序获得所述输入光标处的上屏候选词队列。
  9. 根据权利要求1至8中任意一项所述的方法，其特征在于，在所述输出所述上屏候选词队列之前，还包括：
    根据所述输入光标处的话题情景对所述上屏候选词队列进行调序;
    所述输出所述上屏候选词队列,包括:
    输出调序后的上屏候选词队列。
  10. 根据权利要求9所述的方法,其特征在于,所述根据所述输入光标处的话题情景对所述上屏候选词队列进行调序,包括:
    根据命中各情景特征标签的所述关键词的个数及所述关键词命中所述各情景特征标签的概率和,确定所述各情景特征标签的特征得分;
    按照所述各情景特征标签的特征得分，由高到低对所述情景特征标签进行排序；
    按照所述情景特征标签的顺序对所述上屏候选词队列进行调序,其中,所述上屏候选词队列中的上屏候选词均具有各自的情景特征标签。
  11. 一种输入装置,其特征在于,包括:
    文本获取单元,用于获取输入光标处的文本信息,所述文本信息包括位于所述输入光标之前的上文文本信息和/或位于所述输入光标之后的下文文本信息;
    关键词提取单元,用于提取所述文本信息中的关键词;
    队列获取单元,用于查找所述关键词的联想候选词库,获得所述输入光标处的上屏候选词队列;
    队列输出单元,用于输出所述上屏候选词队列。
  12. 根据权利要求11所述的装置,其特征在于,
    所述文本获取单元,具体用于当检测到所述输入光标位于文本框内,且停止文本输入的时间超过时间阈值时,获取所述输入光标处的文本信息。
  13. 根据权利要求11所述的装置,其特征在于,
    所述文本获取单元,具体用于以所述输入光标所在的整句分割点或文本框边界作为所述文本信息的长度边界,获取所述输入光标处的文本信息。
  14. 根据权利要求11所述的装置,其特征在于,所述队列获取单元包括:
    模型确定子单元,用于根据所述关键词与所述输入光标之间的距离关系和/或所述关键词所属的应用属性,确定所述关键词对应的语言模型;
    队列获取子单元,用于查找所述语言模型的联想候选词库,获得所述输入光标处的上屏候选词队列。
  15. 根据权利要求14所述的装置,其特征在于,
    所述模型确定子单元,具体用于若所述关键词为一个,则当所述关键词与所述输入光标之间的距离关系为邻接关系时,确定所述关键词对应的语言模型为邻近二元语言模型;当所述距离关系为非邻接关系时确定所述关键词对应的语言模型为远距离二元语言模型;若所述关键词为两个,则确定所述关键词对应的语言模型为三元语言模型。
  16. 根据权利要求15所述的装置,其特征在于,所述队列获取单元还包括:
    模型建立子单元,用于在所述模型确定子单元确定所述关键词对应的语言模型之前,建立语言模型及其联想候选词库,所述语言模型包括邻近二元语言模型,远距离二元语言模型及三元语言模型;
    所述模型建立子单元包括:
    收集子单元,用于收集训练语料;
    提取子单元,用于提取所述训练语料中的训练候选词及训练关键词,所述训练关键词与所述训练候选词之间的距离关系包括邻接关系和非邻接关系,所述训练关键词至少为一个;
    训练子单元,用于对所述训练候选词及所述训练关键词进行模型训练,获得所述语言模型及其联想候选词库。
  17. 根据权利要求14所述的装置,其特征在于,
    所述模型确定子单元,具体用于根据所述关键词所属的用户使用习惯特征确定所述关键词对应的用户模型;或者,根据所述关键词所属的应用领域确定所述关键词对应的垂直模型;或者,根据所述关键词所属的常用词汇确定所述关键词对应的常见词语言模型;或者,根据所述关键词所属的话题情景确定所述关键词对应的情景模型。
  18. 根据权利要求14所述的装置,其特征在于,所述队列获取子单元包括:
    确定子单元，用于当所述语言模型至少有两个时，分别确定各所述语言模型的联想候选词库中的上屏候选词；
    合并子单元,用于根据各所述语言模型的预设权重,按照权重线性叠加合并所述上屏候选词;
    排序子单元,用于对合并后的上屏候选词按照权重由高到低进行排序获得所述输入光标处的上屏候选词队列。
  19. 根据权利要求11至18中任意一项所述的装置,其特征在于,所述装置还包括:
    队列调序单元,用于在所述队列输出单元输出所述上屏候选词队列之前,根据所述输入光标处的话题情景对所述上屏候选词队列进行调序;
    所述队列输出单元,用于输出调序后的上屏候选词队列。
  20. 根据权利要求19所述的装置,其特征在于,所述队列调序单元包括:
    得分计算子单元,用于根据命中各情景特征标签的所述关键词的个数及所述关键词命中所述各情景特征标签的概率和,确定所述各情景特征标签的特征得分;
    情景排序子单元，用于按照所述各情景特征标签的特征得分，由高到低对所述情景特征标签进行排序；
    调序子单元,用于按照所述情景特征标签的顺序对所述上屏候选词队列进行调序,其中,所述上屏候选词队列中的上屏候选词均具有各自的情景特征标签。
  21. 一种电子设备,其特征在于,包括存储器和处理器,所述存储器用于存储计算机指令或代码,所述处理器和所述存储器耦合,用于执行所述存储器中的计算机指令或代码,实现以下方法:
    获取输入光标处的文本信息,所述文本信息包括位于所述输入光标之前的上文文本信息和/或位于所述输入光标之后的下文文本信息;
    提取所述文本信息中的关键词;
    查找所述关键词的联想候选词库,获得所述输入光标处的上屏候选词队列;
    输出所述上屏候选词队列。
  22. 一种计算机程序，包括计算机可读代码，当所述计算机可读代码在移动终端上运行时，导致所述移动终端执行权利要求1-10中的任一项所述的输入方法。
  23. 一种计算机可读介质,其中存储了如权利要求22所述的计算机程序。
PCT/CN2015/087050 2014-09-09 2015-08-14 一种输入方法、装置及电子设备 WO2016037519A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/521,299 US10496687B2 (en) 2014-09-09 2015-08-14 Input method, device, and electronic apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410455924.0 2014-09-09
CN201410455924.0A CN104281649B (zh) 2014-09-09 2014-09-09 一种输入方法、装置及电子设备

Publications (1)

Publication Number Publication Date
WO2016037519A1 true WO2016037519A1 (zh) 2016-03-17

Family

ID=52256522

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/087050 WO2016037519A1 (zh) 2014-09-09 2015-08-14 一种输入方法、装置及电子设备

Country Status (3)

Country Link
US (1) US10496687B2 (zh)
CN (1) CN104281649B (zh)
WO (1) WO2016037519A1 (zh)

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104281649B (zh) * 2014-09-09 2017-04-19 北京搜狗科技发展有限公司 一种输入方法、装置及电子设备
CN104731361B (zh) * 2015-03-04 2018-06-19 百度在线网络技术(北京)有限公司 一种确定候选词条的可选择区域的方法与装置
CN105302335B (zh) * 2015-10-28 2018-11-02 小米科技有限责任公司 词汇推荐方法和装置及计算机可读存储介质
US20190332663A1 (en) * 2016-07-22 2019-10-31 Huawei Technologies Co., Ltd. Candidate-item presentation method and terminal device
CN106446186A (zh) * 2016-09-28 2017-02-22 北京金山安全软件有限公司 联想词检索方法以及装置、终端
CN109002183B (zh) * 2017-06-07 2022-11-29 北京搜狗科技发展有限公司 一种信息输入的方法及装置
CN109032374B (zh) * 2017-06-09 2023-06-20 北京搜狗科技发展有限公司 一种用于输入法的候选展示方法、装置、介质及设备
CN109144286B (zh) * 2017-06-27 2022-08-02 北京搜狗科技发展有限公司 一种输入方法及装置
CN109388252B (zh) * 2017-08-14 2022-10-04 北京搜狗科技发展有限公司 一种输入方法及装置
CN107704097A (zh) * 2017-08-23 2018-02-16 合肥龙图腾信息技术有限公司 一种自动检测组词输入法
CN109471538B (zh) * 2017-09-08 2022-07-05 北京搜狗科技发展有限公司 一种输入方法、装置和用于输入的装置
CN109725736B (zh) * 2017-10-27 2023-02-28 北京搜狗科技发展有限公司 一种候选排序方法、装置及电子设备
CN108377289B (zh) * 2018-01-22 2021-02-09 努比亚技术有限公司 已发送信息修改方法、装置及计算机可读存储介质
CN108304530B (zh) * 2018-01-26 2022-03-18 腾讯科技(深圳)有限公司 知识库词条分类方法和装置、模型训练方法和装置
CN110096165A (zh) * 2018-01-31 2019-08-06 北京搜狗科技发展有限公司 一种联想方法、装置和电子设备
CN110244861B (zh) * 2018-03-09 2024-02-02 北京搜狗科技发展有限公司 数据处理方法和装置
CN110858100B (zh) * 2018-08-22 2023-10-20 北京搜狗科技发展有限公司 联想候选词生成方法及装置
CN111381685B (zh) * 2018-12-29 2024-03-22 北京搜狗科技发展有限公司 一种句联想方法和装置
CN111752397B (zh) * 2019-03-29 2024-06-04 北京搜狗科技发展有限公司 一种候选词确定方法及装置
CN111160347B (zh) * 2019-08-14 2023-04-18 广东小天才科技有限公司 一种基于相似字符识别的文本识别方法及电子设备
CN111078028B (zh) * 2019-12-09 2023-11-21 科大讯飞股份有限公司 输入方法、相关设备及可读存储介质
CN111400484B (zh) * 2020-03-20 2023-06-02 支付宝(杭州)信息技术有限公司 一种关键词提取方法和系统
CN113589955B (zh) * 2020-04-30 2024-07-26 北京搜狗科技发展有限公司 一种数据处理方法、装置和电子设备
CN112000233B (zh) * 2020-07-29 2024-09-03 北京搜狗科技发展有限公司 联想候选的处理方法、装置和用于处理联想候选的装置
CN112181167A (zh) * 2020-10-27 2021-01-05 维沃移动通信有限公司 输入法侯选词处理方法和电子设备
CN112464656B (zh) * 2020-11-30 2024-02-13 中国科学技术大学 关键词抽取方法、装置、电子设备和存储介质
CN112684909B (zh) * 2020-12-29 2024-05-31 科大讯飞股份有限公司 输入法联想效果评测方法、装置、电子设备及存储介质
CN113449515A (zh) * 2021-01-27 2021-09-28 心医国际数字医疗系统(大连)有限公司 一种医学文本的预测方法、预测装置及电子设备
CN112836021B (zh) * 2021-02-24 2022-04-26 南京乐图软件技术有限公司 一种图书馆智能化搜索系统
CN115437510A (zh) * 2022-09-23 2022-12-06 联想(北京)有限公司 数据显示方法和装置

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101158969A (zh) * 2007-11-23 2008-04-09 腾讯科技(深圳)有限公司 一种整句生成方法及装置
US20140136970A1 (en) * 2011-07-14 2014-05-15 Tencent Technology (Shenzhen) Company Limited Text inputting method, apparatus and system
CN103869999A (zh) * 2012-12-11 2014-06-18 百度国际科技(深圳)有限公司 对输入法所产生的候选项进行排序的方法及装置
CN104281649A (zh) * 2014-09-09 2015-01-14 北京搜狗科技发展有限公司 一种输入方法、装置及电子设备

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1011447A (ja) * 1996-06-21 1998-01-16 Ibm Japan Ltd パターンに基づく翻訳方法及び翻訳システム
CN1159661C (zh) * 1999-04-08 2004-07-28 肯特里奇数字实验公司 用于中文的标记和命名实体识别的系统
US6568516B2 (en) * 2000-08-03 2003-05-27 U.S. Reel - Missouri, L.L.C. Switchable clutch
JP2002108858A (ja) * 2000-09-20 2002-04-12 Internatl Business Mach Corp <Ibm> 機械翻訳方法、機械翻訳装置および記録媒体
US6925460B2 (en) * 2001-03-23 2005-08-02 International Business Machines Corporation Clustering data including those with asymmetric relationships
JP3765799B2 (ja) * 2003-05-28 2006-04-12 沖電気工業株式会社 自然言語処理装置、自然言語処理方法及び自然言語処理プログラム
US20060048055A1 (en) * 2004-08-25 2006-03-02 Jun Wu Fault-tolerant romanized input method for non-roman characters
US7590626B2 (en) * 2006-10-30 2009-09-15 Microsoft Corporation Distributional similarity-based models for query correction
JP2008305167A (ja) * 2007-06-07 2008-12-18 Toshiba Corp 原言語文を目的言語文に機械翻訳する装置、方法およびプログラム
US7917355B2 (en) * 2007-08-23 2011-03-29 Google Inc. Word detection
US8321802B2 (en) * 2008-11-13 2012-11-27 Qualcomm Incorporated Method and system for context dependent pop-up menus
CN101751202A (zh) * 2008-12-17 2010-06-23 爱思开电讯投资(中国)有限公司 一种基于环境信息进行文字关联输入的方法和装置
CN101634905B (zh) * 2009-07-01 2011-07-06 广东国笔科技股份有限公司 一种智能联想输入系统及方法
US20120016671A1 (en) * 2010-07-15 2012-01-19 Pawan Jaggi Tool and method for enhanced human machine collaboration for rapid and accurate transcriptions
CN102902362B (zh) * 2011-07-25 2017-10-31 深圳市世纪光速信息技术有限公司 文字输入方法及系统
CN103902535B (zh) * 2012-12-24 2019-02-22 腾讯科技(深圳)有限公司 获取联想词的方法、装置及系统
CN103984688B (zh) * 2013-04-28 2015-11-25 百度在线网络技术(北京)有限公司 一种基于本地词库提供输入候选词条的方法与设备
CN103440299B (zh) * 2013-08-20 2016-12-28 陈喜 一种基于焦点上下文联想词的信息快速输入方法
US9899019B2 (en) * 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101158969A (zh) * 2007-11-23 2008-04-09 腾讯科技(深圳)有限公司 一种整句生成方法及装置
US20140136970A1 (en) * 2011-07-14 2014-05-15 Tencent Technology (Shenzhen) Company Limited Text inputting method, apparatus and system
CN103869999A (zh) * 2012-12-11 2014-06-18 百度国际科技(深圳)有限公司 对输入法所产生的候选项进行排序的方法及装置
CN104281649A (zh) * 2014-09-09 2015-01-14 北京搜狗科技发展有限公司 一种输入方法、装置及电子设备

Also Published As

Publication number Publication date
US20170316086A1 (en) 2017-11-02
CN104281649B (zh) 2017-04-19
US10496687B2 (en) 2019-12-03
CN104281649A (zh) 2015-01-14

Similar Documents

Publication Publication Date Title
WO2016037519A1 (zh) 一种输入方法、装置及电子设备
US20220214775A1 (en) Method for extracting salient dialog usage from live data
WO2016037520A1 (zh) 一种输入方法、装置和电子设备
US10803391B2 (en) Modeling personal entities on a mobile device using embeddings
US9672818B2 (en) Updating population language models based on changes made by user clusters
US9336298B2 (en) Dialog-enhanced contextual search query analysis
US10783885B2 (en) Image display device, method for driving the same, and computer readable recording medium
US20150074112A1 (en) Multimedia Question Answering System and Method
WO2016008452A1 (zh) 高效输入的预测方法和装置
CN106537370A (zh) 在存在来源和翻译错误的情况下对命名实体鲁棒标记的方法和系统
US9916396B2 (en) Methods and systems for content-based search
CN105956053B (zh) 一种基于网络信息的搜索方法及装置
KR20210000326A (ko) 모바일 비디오 서치 기법
JP2018504727A (ja) 参考文書の推薦方法及び装置
CN111984749B (zh) 一种兴趣点排序方法和装置
WO2017012222A1 (zh) 时效需求识别方法、装置、设备及非易失性计算机存储介质
US11954097B2 (en) Intelligent knowledge-learning and question-answering
CN106095912B (zh) 用于生成扩展查询词的方法和装置
US20180143760A1 (en) Sequence expander for data entry/information retrieval
CN103020049A (zh) 搜索方法及搜索系统
JPWO2013146736A1 (ja) 同義関係判定装置、同義関係判定方法、及びそのプログラム
WO2024036616A1 (zh) 一种基于终端的问答方法及装置
WO2016041428A1 (zh) 一种英文的输入方法和装置
CN108803890A (zh) 一种输入方法、输入装置和用于输入的装置
CN111274428B (zh) 一种关键词的提取方法及装置、电子设备、存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15839665

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 15521299

Country of ref document: US

122 Ep: pct application non-entry in european phase

Ref document number: 15839665

Country of ref document: EP

Kind code of ref document: A1