CN110673748B - Method and device for providing candidate long sentences in input method

Info

Publication number: CN110673748B
Application number: CN201910927584.XA
Authority: CN (China)
Prior art keywords: candidate, prediction model, long sentence, word, words
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Other versions: CN110673748A
Inventor: 龚建
Current Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority: CN201910927584.XA
Publication of application: CN110673748A; application granted; publication of grant: CN110673748B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/02 Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F 3/023 Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F 3/0233 Character input methods
    • G06F 3/0237 Character input methods using prediction or retrieval techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Abstract

The method includes: acquiring a current input sequence entered by a user in an input method application; acquiring a candidate word matched with the current input sequence; obtaining a corresponding candidate long sentence by combining a pre-trained long sentence prediction model with the candidate word; and displaying the candidate long sentence alongside the candidate word in the input method application. A matched candidate long sentence is thus obtained quickly with the pre-trained long sentence prediction model and provided to the user, so that the user can conveniently and quickly complete long sentence input according to the candidate long sentence, which reduces the user's input cost and improves the user experience.

Description

Method and device for providing candidate long sentences in input method
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a method and a device for providing candidate long sentences in an input method.
Background
At present, an input method application can provide, according to a pinyin sequence input by a user, candidate words corresponding to that sequence, namely relatively short text such as characters, words, and phrases. In practical application, however, when a user needs to input a complete sentence through the input method application, the user must enter the corresponding pinyin sequences multiple times to finish the sentence. The input cost of a complete sentence is therefore high, and the user's input method experience is not ideal.
Disclosure of Invention
The present application aims to solve, at least to some extent, one of the technical problems in the related art.
Therefore, a first object of the present application is to provide a method for providing candidate long sentences in an input method.
The second object of the present application is to provide a device for providing candidate long sentences in an input method.
A third object of the present application is to propose an electronic device.
A fourth object of the present application is to propose a computer readable storage medium.
A fifth object of the present application is to propose a computer program product.
In order to achieve the above objective, an embodiment of a first aspect of the present application provides a method for providing a candidate long sentence in an input method, including: acquiring a current input sequence input by a user in an input method application; acquiring candidate words matched with the current input sequence; obtaining a candidate long sentence matched with the candidate word according to a pre-trained long sentence prediction model; and displaying the candidate words and the candidate long sentences on the input method application.
According to the method for providing candidate long sentences in an input method of the embodiment of the present application, the current input sequence entered by the user in the input method application is acquired, a candidate word matched with the current input sequence is acquired, a corresponding candidate long sentence is obtained by combining a pre-trained long sentence prediction model with the candidate word, and the candidate long sentence is displayed alongside the candidate word in the input method application. A matched candidate long sentence is thus obtained quickly with the pre-trained long sentence prediction model and provided to the user, so that the user can conveniently and quickly complete long sentence input according to the candidate long sentence, which reduces the user's input cost and improves the user experience.
To achieve the above objective, an embodiment of a second aspect of the present application provides a device for providing a candidate long sentence in an input method, including: a first obtaining module, configured to acquire a current input sequence entered by a user in an input method application; a second obtaining module, configured to acquire a candidate word matched with the current input sequence; a third obtaining module, configured to obtain a candidate long sentence matched with the candidate word according to a pre-trained long sentence prediction model; and a display module, configured to display the candidate word and the candidate long sentence on the input method application.
According to the device for providing candidate long sentences in an input method of the embodiment of the present application, the current input sequence entered by the user in the input method application is acquired, a candidate word matched with the current input sequence is acquired, a corresponding candidate long sentence is obtained by combining the pre-trained long sentence prediction model with the candidate word, and the candidate long sentence is displayed alongside the candidate word in the input method application. A matched candidate long sentence is thus obtained quickly with the pre-trained long sentence prediction model and provided to the user, so that the user can conveniently and quickly complete long sentence input according to the candidate long sentence, which reduces the user's input cost and improves the user experience.
To achieve the above object, an embodiment of a third aspect of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements a method for providing candidate long sentences in an input method as described above when the processor executes the program.
In order to achieve the above object, a fourth aspect of the present application proposes a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for providing a candidate long sentence in an input method as described above.
To achieve the above object, an embodiment of a fifth aspect of the present application proposes a computer program product which, when the instructions in the computer program product are executed by a processor, implements the method for providing a candidate long sentence in an input method as described above.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow chart of a method for providing candidate long sentences in an input method according to one embodiment of the present application;
FIG. 2 is an exemplary diagram of a user interface containing candidate long sentences;
FIG. 3 is a first schematic diagram of the refinement flow of step 103 in the embodiment shown in FIG. 1;
FIG. 4 is a second schematic diagram of the refinement flow of step 103 in the embodiment shown in FIG. 1;
FIG. 5 is a flow chart of a method for providing candidate long sentences in an input method according to another embodiment of the present application;
FIG. 6 is a schematic structural diagram of a device for providing candidate long sentences in an input method according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a device for providing candidate long sentences in an input method according to another embodiment of the present application;
FIG. 8 is a schematic structural diagram of a device for providing candidate long sentences in an input method according to another embodiment of the present application;
FIG. 9 is a schematic diagram of a device for providing candidate long sentences in an input method according to another embodiment of the present application;
Fig. 10 is a schematic structural view of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary and intended for the purpose of explaining the present application and are not to be construed as limiting the present application.
The method, the device and the electronic equipment for providing the candidate long sentences in the input method of the embodiment of the application are described below with reference to the accompanying drawings.
Fig. 1 is a flow chart of a method for providing candidate long sentences in an input method according to an embodiment of the present application. It should be noted that the execution body of the method provided in this embodiment is a device for providing candidate long sentences in an input method, and the device may be configured in an electronic device or a cloud server; this embodiment is not specifically limited in this respect.
As shown in fig. 1, the method for providing the candidate long sentence in the input method may include:
step 101, acquiring a current input sequence input by a user in an input method application.
Step 102, obtaining candidate words matched with the current input sequence.
As an exemplary embodiment, when a user needs to input information through an input method application, the terminal device may acquire a current input sequence input by the user in the input method application, and upload the current input sequence to the cloud server, so that the cloud server may convert the current input sequence to obtain candidate words matched with the current input sequence.
It may be understood that obtaining the candidate word matched with the current input sequence may be performed by the cloud server or by the terminal device. For example, the terminal device may convert the current input sequence entered by the user in the input method application, in combination with a key model, to obtain the candidate word matched with the current input sequence, and then upload the candidate word to the cloud server.
The terminal device may include, but is not limited to, a personal computer, a tablet computer, a mobile phone, a smart phone, and other hardware devices with an input method application, which is not limited in particular.
For example, if the current input sequence entered by the user in the input method application is nihaozai, converting the sequence "nihaozai" yields a matched candidate word such as "hello, are you there".
For another example, if the current input sequence entered by the user is guonianhao, converting the sequence "guonianhao" yields the matched candidate word "Happy New Year".
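As a toy illustration of this conversion step, the sketch below maps a pinyin sequence to candidate words with a fixed lookup table. A real input method relies on a key model and statistical conversion rather than a table, so the mapping and names here are purely hypothetical.

```python
# Hypothetical pinyin-to-candidate lookup; a real input method uses a key
# model and statistical conversion, not a fixed table.
PINYIN_TO_CANDIDATES = {
    "nihaozai": ["hello, are you there"],
    "guonianhao": ["Happy New Year"],
}

def candidates_for(input_sequence: str) -> list[str]:
    # Return the candidate words matched with the current input sequence.
    return PINYIN_TO_CANDIDATES.get(input_sequence, [])

print(candidates_for("nihaozai"))  # ['hello, are you there']
```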
And step 103, obtaining candidate long sentences matched with the candidate words according to a pre-trained long sentence prediction model.
Specifically, after the candidate word matched with the current input sequence is obtained, in order to enable the user to quickly input a complete long sentence through the input method application, a pre-trained long sentence prediction model can be combined with the candidate word to obtain a matched candidate long sentence.
It should be noted that, in different application scenarios, the manner of obtaining the candidate long sentence matched with the candidate word from the long sentence prediction model differs. For example, the candidate word may be input directly into the long sentence prediction model, which then directly outputs the matched candidate long sentence; in this case the long sentence prediction model has learned the correspondence between candidate words and candidate long sentences.
Other ways of obtaining a candidate long sentence matching a candidate term according to the long sentence prediction model will be described in the following embodiments.
It is to be understood that the candidate long sentence includes the candidate word and the suffix word after the candidate word.
In practical application, the sentence usage habits of different users may differ. Therefore, to make the provided candidate long sentences better match user requirements, as an exemplary implementation, after the candidate long sentence matched with the candidate word is obtained, the sentence preference characteristics of the user may be acquired, the obtained candidate long sentence may be adjusted in combination with those preference characteristics, and the adjusted candidate long sentence may be fed back to the terminal device.
And 104, displaying the candidate words and the candidate long sentences on the input method application.
In this embodiment, the candidate long sentence matched with the candidate word may be one or more.
In this embodiment, to prevent sensitive words, for example uncivil expressions, from appearing in a candidate long sentence, after the candidate long sentence is obtained it may be determined whether it contains any word in a blacklist vocabulary; if it does, the candidate long sentence is filtered out, and if it does not, the candidate long sentence is retained.
The blacklist vocabulary stores preset uncivil words, illegal words, and the like.
As an exemplary embodiment, when it is determined that multiple candidate long sentences match the candidate word, in order to provide the candidate long sentence to the user accurately, the score of each candidate long sentence may be obtained, and the candidate long sentence with the highest score may be displayed on the input method application.
For example, if it is determined that each candidate long sentence does not include a word in the blacklist vocabulary, at this time, the candidate long sentence with the highest score may be obtained and the candidate long sentence with the highest score may be fed back to the terminal device for display.
As another exemplary embodiment, after determining that the number of candidate long sentences matched with the candidate words is multiple, a score corresponding to each candidate long sentence can be obtained, the candidate long sentences are ranked according to the order of the scores from high to low, and the ranked candidate long sentences are displayed on the input method application.
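The filtering and ranking described in the last few paragraphs can be sketched as follows, assuming each candidate long sentence arrives as a (sentence, score) pair and that the blacklist vocabulary is a plain set of strings; these data structures are illustrative assumptions, not the patent's actual implementation.

```python
# Sketch of blacklist filtering plus score-based ranking of candidate
# long sentences; data structures are illustrative assumptions.
BLACKLIST_VOCABULARY = {"some-uncivil-word", "some-illegal-word"}  # preset

def filter_and_rank(candidates):
    """candidates: list of (sentence, score) pairs."""
    # Filter out any candidate long sentence containing a blacklisted word.
    kept = [(sentence, score) for sentence, score in candidates
            if not any(word in sentence for word in BLACKLIST_VOCABULARY)]
    # Order the survivors from highest score to lowest for display.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

ranked = filter_and_rank([("candidate a", 0.42), ("candidate b", 0.87)])
# ranked[0] is the highest-scoring candidate long sentence, shown first.
```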
In this embodiment, so as not to interfere with the user entering information through the input method, the candidate long sentence may be displayed in the upper left or upper right corner of the input method application; the display position of the candidate long sentence is not specifically limited in this embodiment.
For example, when the user enters the current input sequence "nihaozai" in the input method application, the input method application displays the candidate word together with the matched candidate long sentence, such as "hello, are you there, I am looking for you about something". An example of the corresponding user interface is shown in fig. 2; note that fig. 2 illustrates the case where the candidate long sentence is displayed in the upper right corner.
According to the method for providing candidate long sentences in an input method of the embodiment of the present application, the current input sequence entered by the user in the input method application is acquired, a candidate word matched with the current input sequence is acquired, a corresponding candidate long sentence is obtained by combining the pre-trained long sentence prediction model with the candidate word, and the candidate long sentence is displayed alongside the candidate word in the input method application. A matched candidate long sentence is thus obtained quickly with the pre-trained long sentence prediction model and provided to the user, so that the user can conveniently and quickly complete long sentence input according to the candidate long sentence, which reduces the user's input cost and improves the user experience.
As shown in fig. 3, in one embodiment, the specific implementation procedure of step 103 may include:
Step 301, taking the candidate words as the current input of the long sentence prediction model.
In this embodiment, in order to accurately predict the next word appearing after each word, the long sentence prediction model may be trained in combination with training corpus data before the current input is fed into it to obtain the current output.
The specific process of training the long sentence prediction model is as follows:
step a, obtaining training corpus data, wherein the training corpus data comprises prefix sample words and suffix sample words corresponding to the prefix sample words.
Wherein the suffix sample word is a word that appears after the prefix sample word.
In this embodiment, a large amount of chat corpora in the instant messaging chat scene can be combined to construct training corpus data.
As an exemplary embodiment, to ensure that there is enough preamble information, when selecting a chat sentence, a sentence with the number of input words greater than or equal to a preset word number threshold may be selected as the chat corpus.
The preset word number threshold is a predetermined value; if the number of words in a chat sentence is equal to or greater than the threshold, the sentence can be used for constructing the training corpus. For example, if the preset word number threshold is 7 and the chat sentence is "where shall we go to eat tonight", it can be determined that the sentence meets the threshold, so it can be used for constructing the training corpus.
The general process of constructing the training corpus data from the chat corpus is as follows: the chat sentences in the chat corpus are separated by preset separators, and the training corpus data is constructed according to the separation results, where the words before each preset separator in a chat sentence are a prefix sample word and the words after that separator are the corresponding suffix sample word.
Wherein the preset separator is preset, for example, the preset separator may be "|".
For example, suppose the chat sentence is "where shall we go to eat tonight", rendered word by word from the Chinese as "we | evening | to which | eat" once the separator "|" is inserted. For the first separator, the prefix sample word is "we" and the suffix sample word is "evening"; for the second separator, the prefix sample word is "we evening" and the suffix sample word is "to which"; and for the third separator, the prefix sample word is "we evening to which" and the suffix sample word is "eat".
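A minimal sketch of this corpus-construction step, under the assumptions above: the separator is "|", the word number threshold is 7, and whitespace tokenization stands in for whatever segmentation the real corpus uses.

```python
# Build (prefix sample word, suffix sample word) training pairs from a
# separator-marked chat sentence. The "|" separator and the threshold of 7
# follow the description; whitespace tokenization is an assumption so the
# English gloss runs (a Chinese corpus would count characters instead).
SEPARATOR = "|"
WORD_THRESHOLD = 7

def build_training_pairs(marked_sentence: str):
    segments = marked_sentence.split(SEPARATOR)
    tokens = " ".join(segments).split()
    if len(tokens) < WORD_THRESHOLD:
        return []  # not enough preamble information; skip this sentence
    pairs = []
    for i in range(1, len(segments)):
        prefix = " ".join(segments[:i])  # words before the i-th separator
        suffix = segments[i]             # words right after that separator
        pairs.append((prefix, suffix))
    return pairs

# "we|evening|to which|eat" has too few English tokens to pass the
# threshold here, though its Chinese original would; lowering
# WORD_THRESHOLD to 4 yields ("we", "evening"), ("we evening", "to which"),
# ("we evening to which", "eat").
```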
And b, training the long sentence prediction model according to the prefix sample words and the suffix sample words.
Specifically, the prefix sample word is used as an input characteristic of the long sentence prediction model, and the suffix sample word is used as an output characteristic of the long sentence prediction model, so that the long sentence prediction model is trained.
For example, the long sentence prediction model may be trained in combination with a recurrent neural network (RNN) and the prefix and suffix sample words.
The RNN may use an LSTM, GRNN, or similar structure; the input feature is a Chinese character and the output feature is the next Chinese character. The input first passes through an embedding layer and is then modeled by the RNN layer, and the output is produced through a tiered hierarchical Softmax network structure, from which the corresponding Chinese character is selected.
Because a hierarchical Softmax output network structure is adopted, the amount of classification computation is reduced and the efficiency of training the model is improved.
In this embodiment, the long sentence prediction model used has the following advantages: it outputs candidate long sentences efficiently, and it requires little storage space, reducing the storage resources occupied by the model.
In the process of training the model, the parameters in the model can be optimized using the back-propagation (BP) algorithm to obtain the final long sentence prediction model.
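A minimal PyTorch sketch of the architecture just described: an embedding layer, an RNN (LSTM) layer, and an efficient tiered softmax over the character vocabulary. PyTorch's AdaptiveLogSoftmaxWithLoss is used here only as a stand-in for the hierarchical Softmax named in the text, and every size, cutoff, and name is an illustrative assumption rather than the patent's configuration.

```python
# Sketch of the embedding -> RNN -> tiered-softmax prediction network;
# AdaptiveLogSoftmaxWithLoss approximates the hierarchical Softmax idea
# (cheaper classification over a large character vocabulary).
import torch
import torch.nn as nn

class LongSentencePredictor(nn.Module):
    def __init__(self, vocab_size=6000, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Frequent characters sit in the head cluster, rarer ones in the
        # tail clusters, which reduces the classification computation.
        self.out = nn.AdaptiveLogSoftmaxWithLoss(
            hidden_dim, vocab_size, cutoffs=[500, 2000])

    def forward(self, char_ids, next_char_ids):
        # char_ids, next_char_ids: (batch, seq_len) character indices.
        hidden, _ = self.rnn(self.embedding(char_ids))
        flat = hidden.reshape(-1, hidden.size(-1))
        return self.out(flat, next_char_ids.reshape(-1))

model = LongSentencePredictor()
x = torch.randint(0, 6000, (4, 10))   # input characters
y = torch.randint(0, 6000, (4, 10))   # next-character targets
_, loss = model(x, y)
loss.backward()                        # parameters optimized by BP
```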
It should be understood that the trained long sentence prediction model can accurately predict the next word appearing after each word input into it.
Step 302, inputting the current input into the long sentence prediction model to obtain the current output of the long sentence prediction model, wherein the current output includes the next word after the current input.
Step 303, when it is determined that the next word does not match with the preset sentence termination information, updating the current input of the long sentence prediction model according to the current output and the current input, and obtaining the current output corresponding to the current input through the long sentence prediction model until the current output of the long sentence prediction model matches with the preset sentence termination information.
And 304, when the current output of the long sentence prediction model is matched with the preset sentence termination information, generating a candidate long sentence matched with the candidate word according to the current input of the long sentence prediction model.
That is, in this embodiment, the next word appearing after the candidate word is predicted by the long sentence prediction model, the candidate word and that next word together are used as the input of the model's next prediction to predict the following word, and the long sentence prediction model is applied repeatedly in this way until it outputs the sentence terminator.
The statement termination information is information indicating termination of the statement. The statement termination information is preset. For example, the statement termination information may be a statement terminator, which may be NULL.
For example, suppose the sentence termination information is NULL and that, according to the current input sequence entered by the user in the input method application, the obtained candidate word is "we evening" (a word-by-word rendering of a Chinese phrase meaning "we, tonight"). The candidate word "we evening" is taken as the current input of the long sentence prediction model. If, after "we evening" is input, the current output of the model is "to which", i.e., the next word appearing after "we evening" is "to which", it can be determined that the current output is not the sentence termination information, and the current output is spliced after the current input to obtain the updated current input "we evening to which". Correspondingly, the current output of the long sentence prediction model becomes "eat". The current output is again spliced after the current input to obtain the updated current input "we evening to which eat". At this point the current output of the long sentence prediction model is NULL, and the current input of the model, "we evening to which eat" (literally, "where shall we go to eat tonight"), is the candidate long sentence matched with the candidate word.
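The loop of steps 301 to 304 can be sketched as below, assuming a predict_next(text) callable that wraps the trained long sentence prediction model and returns the next word, or None when the model emits the NULL sentence terminator; the function names and the safety bound are illustrative assumptions.

```python
# Decode-until-terminator loop of steps 301-304. predict_next wraps the
# trained model; None stands for the NULL sentence terminator. MAX_STEPS
# is an assumed safety bound, not part of the described method.
MAX_STEPS = 20

def generate_candidate_long_sentence(candidate_word, predict_next):
    current_input = candidate_word                     # step 301
    for _ in range(MAX_STEPS):
        current_output = predict_next(current_input)   # step 302
        if current_output is None:                     # step 304: terminator
            return current_input
        current_input += " " + current_output          # step 303: splice
    return current_input

# A toy model reproduces the example above:
toy = iter(["to which", "eat", None])
sentence = generate_candidate_long_sentence(
    "we evening", lambda text: next(toy))
# sentence == "we evening to which eat"
```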
As shown in fig. 4, in another embodiment, the specific implementation procedure of step 103 may include:
Step 401, determining the suffix word matched with the candidate word through the long sentence prediction model, where the long sentence prediction model has learned the correspondence between candidate words and suffix words.
In this embodiment, in order to accurately predict the suffix word matched with the candidate word through the long sentence prediction model, the long sentence prediction model may be trained in combination with training corpus data before the current input is fed into it to obtain the current output.
The specific process of training the long sentence prediction model is as follows:
step a, obtaining training corpus data, wherein the training corpus data comprises prefix sample words and suffix sample words corresponding to the prefix sample words, and the prefix sample words and the suffix sample words can form long sentences.
In this embodiment, a large amount of chat corpora in the instant messaging chat scene can be combined to construct training corpus data.
As an exemplary embodiment, to ensure that there is enough preamble information, when selecting a chat sentence, a sentence with the number of input words greater than or equal to a preset word number threshold may be selected as the chat corpus.
The preset word number threshold is a predetermined value; if the number of words in a chat sentence is equal to or greater than the threshold, the sentence can be used for constructing the training corpus. For example, if the preset word number threshold is 7 and the chat sentence is "where shall we go to eat tonight", it can be determined that the sentence meets the threshold, so it can be used for constructing the training corpus.
The general process of constructing the training corpus data from the chat corpus is as follows: the chat sentences in the chat corpus are separated by preset separators, and the training corpus data is determined according to the separation results. The words before the preset separator in a chat sentence are the prefix sample word, and the words after the separator are the corresponding suffix sample word.
Wherein the preset separator is preset, for example, the preset separator may be "|".
For example, if the chat sentence "where shall we go to eat tonight" is divided by the separator "|" as "we evening | to which eat", the prefix sample word is "we evening" and the suffix sample word is "to which eat".
For another example, if the same sentence is divided as "we | evening to which eat", the prefix sample word is "we" and the suffix sample word is "evening to which eat".
And b, training the long sentence prediction model according to the prefix sample words and the suffix sample words.
In this embodiment, the long sentence prediction model may be trained in combination with a typical sequence-to-sequence neural network translation model and the prefix and suffix sample words.
Step 402, generating a candidate long sentence according to the candidate word and the suffix word.
In this implementation, after the candidate word and the suffix word are obtained, the suffix word may be spliced after the candidate word to generate the candidate long sentence.
For example, assuming the candidate word is "we evening", if the suffix word after the candidate word predicted by the long sentence prediction model is "to which eat", the candidate long sentence generated from the candidate word and the suffix word is "we evening to which eat".
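In contrast to the word-by-word loop of fig. 3, this scheme needs a single model call. A sketch, assuming a predict_suffix(candidate_word) callable that wraps the trained sequence-to-sequence model; the names are illustrative.

```python
# Second scheme: one sequence-to-sequence call predicts the whole suffix,
# which is then spliced after the candidate word. predict_suffix is an
# assumed wrapper around the trained translation-style model.
def generate_by_suffix(candidate_word, predict_suffix):
    suffix_word = predict_suffix(candidate_word)
    return candidate_word + " " + suffix_word  # splice suffix after candidate

# e.g. generate_by_suffix("we evening", lambda w: "to which eat")
# -> "we evening to which eat"
```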
It may be appreciated that in practical application, before the user enters the current input sequence through the input method application, the user may already have committed on-screen words to the text editing box of the corresponding user interface through the input method. Therefore, in order to provide the candidate long sentence accurately, on the basis of any of the above embodiments, the on-screen words and the current input sequence entered by the user in the input method application may be combined to determine the candidate long sentence.
The method for providing the candidate long sentence in the input method of this embodiment is further described below with reference to fig. 5.
Fig. 5 is a flowchart of a method for providing candidate long sentences in an input method according to another embodiment of the present application.
As shown in fig. 5, the method for providing the candidate long sentence in the input method may include:
step 501, a current input sequence input by a user in an input method application is obtained.
Step 502, a candidate word matching the current input sequence is obtained.
It should be noted that the foregoing explanation of the steps 101 to 102 is also applicable to the steps 501 to 502 of the embodiment, and will not be repeated here.
Step 503, the on-screen words preceding the current input sequence are obtained.
It should be noted that the execution order of step 502 and step 503 in this embodiment is not limited; they may be performed in either order.
As an exemplary implementation, text information input by a user in a text editing box in a user interface can be obtained, wherein the text information is an on-screen word.
And step 504, obtaining candidate long sentences matched with the candidate words and the on-screen words by adopting a pre-trained long sentence prediction model.
In different application scenarios, the manner of obtaining the candidate long sentence matched with the candidate word and the on-screen words using the pre-trained long sentence prediction model differs, as exemplified below:
In a first implementation scenario, the on-screen words and the candidate word are used as the current input of the long sentence prediction model; the current input is fed into the long sentence prediction model to obtain the current output of the long sentence prediction model, where the current output includes the next word after the current input; when it is determined that the next word does not match the preset sentence termination information, the current input of the long sentence prediction model is updated according to the current output and the current input, and the current output corresponding to the current input is obtained through the long sentence prediction model, until the current output of the long sentence prediction model matches the preset sentence termination information; and when the current output of the long sentence prediction model matches the preset sentence termination information, a candidate long sentence matched with the candidate word is generated according to the current input of the long sentence prediction model.
As an example, the candidate word may be spliced after the on-screen word, and the spliced word is taken as the current input of the long sentence prediction model.
For example, suppose the on-screen word is "we" and the candidate word corresponding to the current input sequence is "evening". The on-screen word and the candidate word may be spliced to obtain the current input "we evening" for the long sentence prediction model. After "we evening" is input, if the current output of the model is "to which", i.e., the next word appearing after "we evening" is "to which", it can be determined that the current output is not the sentence terminator, and the current output is spliced after the current input to obtain the updated current input "we evening to which". Correspondingly, the current output of the long sentence prediction model becomes "eat". The current output is again spliced after the current input to obtain the updated current input "we evening to which eat". At this point the current output of the long sentence prediction model is NULL, and the current input of the model, "we evening to which eat", is the candidate long sentence matched with the candidate word.
In a second implementation scenario, the suffix word matched with the candidate word and the on-screen words is determined through the long sentence prediction model, where the long sentence prediction model has learned the correspondence between candidate words and suffix words; the candidate long sentence is then generated according to the candidate word and the suffix word.
Specifically, the candidate word can be spliced after the on-screen words to obtain a spliced word, the spliced word is input into the long sentence prediction model to obtain the suffix word corresponding to the spliced word, and the suffix word is then spliced after the spliced word to obtain the candidate long sentence.
For example, assuming the on-screen word is "we" and the candidate word corresponding to the current input sequence is "evening", the on-screen word and the candidate word may be spliced to obtain the spliced word "we evening". If the suffix word after the spliced word predicted by the long sentence prediction model is "to which eat", that is, "to which eat" is the suffix word matching the on-screen word and the candidate word, the generated candidate long sentence is "we evening to which eat".
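Both scenarios reduce to splicing the candidate word after the on-screen word and then running the chosen prediction scheme; a sketch reusing the helpers from the earlier examples (names remain illustrative assumptions):

```python
# Combine the on-screen word with the candidate word before prediction,
# then reuse either generation scheme sketched earlier.
def generate_with_on_screen(on_screen_word, candidate_word, predict_next):
    spliced = on_screen_word + " " + candidate_word   # e.g. "we evening"
    return generate_candidate_long_sentence(spliced, predict_next)

# e.g. on-screen "we" + candidate "evening" decodes to
# "we evening to which eat", as in the example above.
```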
In practical application, the sentence usage habits of different users may differ. Therefore, to make the provided candidate long sentences better match user requirements, as an exemplary implementation, after the candidate long sentence matched with the candidate word is obtained, the sentence preference characteristics of the user may be acquired, the obtained candidate long sentence may be adjusted in combination with those characteristics, and the adjusted candidate long sentence may be fed back to the terminal device.
Step 505, displaying the candidate words and the candidate long sentences on the input method application.
According to the method for providing candidate long sentences in an input method of this embodiment, the current input sequence entered by the user in the input method application is acquired, a candidate word matched with the current input sequence is acquired, the on-screen words entered before the current input sequence are acquired, and the candidate long sentence corresponding to the candidate word and the on-screen words is obtained by combining the pre-trained long sentence prediction model; the candidate long sentence is displayed alongside the candidate word in the input method application. A matched candidate long sentence is thus obtained quickly with the pre-trained long sentence prediction model and provided to the user, so that the user can conveniently and quickly complete long sentence input according to the candidate long sentence, which reduces the user's input cost and improves the user experience.
Fig. 6 is a schematic structural diagram of a device for providing candidate long sentences in an input method according to an embodiment of the present application.
As shown in fig. 6, the device for providing candidate long sentences in the input method includes a first obtaining module 110, a second obtaining module 120, a third obtaining module 130 and a display module 140, where:
the first obtaining module 110 is configured to obtain a current input sequence input by a user in the input method application.
A second obtaining module 120, configured to obtain a candidate word that matches the current input sequence.
And a third obtaining module 130, configured to obtain, according to a pre-trained long sentence prediction model, a candidate long sentence matched with the candidate word.
And the display module 140 is used for displaying the candidate words and the candidate long sentences on the input method application.
In one embodiment of the present application, the third obtaining module 130 is specifically configured to: take the candidate word as the current input of the long sentence prediction model; input the current input into the long sentence prediction model to obtain the current output of the long sentence prediction model, where the current output includes the next word after the current input; when the next word does not match the preset sentence termination information, update the current input of the long sentence prediction model according to the current output and the current input, and obtain the current output corresponding to the current input through the long sentence prediction model, until the current output of the long sentence prediction model matches the preset sentence termination information; and when the current output of the long sentence prediction model matches the preset sentence termination information, generate a candidate long sentence matched with the candidate word according to the current input of the long sentence prediction model.
In one embodiment of the present application, based on the embodiment of the apparatus shown in fig. 6, as shown in fig. 7, the apparatus may include:
the fourth obtaining module 150 is configured to obtain training corpus data, where the training corpus data includes prefix sample words and suffix sample words corresponding to the prefix sample words, and the suffix sample words are words that appear after the prefix sample words.
The first training module 160 is configured to train the long sentence prediction model according to the prefix sample word and the suffix sample word.
In one embodiment of the present application, the third obtaining module 130 is specifically configured to: determine the suffix word matched with the candidate word through the long sentence prediction model, where the long sentence prediction model has learned the correspondence between candidate words and suffix words; and generate the candidate long sentence according to the candidate word and the suffix word.
In one embodiment of the present application, based on the embodiment of the apparatus shown in fig. 6, as shown in fig. 8, the apparatus further includes:
the fifth obtaining module 170 is configured to obtain training corpus data, where the training corpus data includes prefix sample words and suffix sample words corresponding to the prefix sample words, and the prefix sample words and the suffix sample words may form long sentences.
The second training module 180 is configured to train the long sentence prediction model according to the prefix sample word and the suffix sample word.
In one embodiment of the present application, based on the embodiment of the apparatus shown in fig. 6, as shown in fig. 9, the apparatus may further include:
a sixth obtaining module 190 is configured to obtain the on-screen word before the current input sequence.
The third obtaining module 130 is specifically configured to: and obtaining candidate long sentences matched with the candidate words and the on-screen words by adopting a pre-trained long sentence prediction model.
It should be understood that the sixth obtaining module 190 of the apparatus embodiment shown in fig. 9 may also be included in the apparatus embodiments shown in fig. 7 or fig. 8; this embodiment is not limited in this respect.
The foregoing explanation of the embodiment of the method for providing the candidate long sentence in the input method is also applicable to the device for providing the candidate long sentence in the input method of this embodiment, and the implementation principle is similar and will not be repeated here.
According to the device for providing candidate long sentences in an input method of the embodiment of the present application, the current input sequence entered by the user in the input method application is acquired, a candidate word matched with the current input sequence is acquired, a corresponding candidate long sentence is obtained by combining the pre-trained long sentence prediction model with the candidate word, and the candidate long sentence is displayed alongside the candidate word in the input method application. A matched candidate long sentence is thus obtained quickly with the pre-trained long sentence prediction model and provided to the user, so that the user can conveniently and quickly complete long sentence input according to the candidate long sentence, which reduces the user's input cost and improves the user experience.
Fig. 10 is a schematic structural view of an electronic device according to an embodiment of the present application. The electronic device includes:
memory 1001, processor 1002, and a computer program stored on memory 1001 and executable on processor 1002.
The processor 1002 implements the method for providing the candidate long sentence in the input method provided in the above embodiment when executing the program.
Further, the electronic device further includes:
a communication interface 1003 for communication between the memory 1001 and the processor 1002.
Memory 1001 for storing computer programs that may be run on processor 1002.
Memory 1001 may include high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
The processor 1002 is configured to implement the method for providing the candidate long sentence in the input method according to the above embodiment when executing the program.
If the memory 1001, the processor 1002, and the communication interface 1003 are implemented independently, the communication interface 1003, the memory 1001, and the processor 1002 may be connected to one another through a bus and communicate with one another. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, among others. Buses may be divided into address buses, data buses, control buses, and so on. For ease of illustration, only one thick line is shown in fig. 10, but this does not mean there is only one bus or one type of bus.
Alternatively, in a specific implementation, if the memory 1001, the processor 1002, and the communication interface 1003 are integrated on a chip, the memory 1001, the processor 1002, and the communication interface 1003 may complete communication with each other through internal interfaces.
The processor 1002 may be a central processing unit (Central Processing Unit, abbreviated as CPU), or an application specific integrated circuit (Application Specific Integrated Circuit, abbreviated as ASIC), or one or more integrated circuits configured to implement embodiments of the present application.
The present embodiment also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for providing a candidate long sentence in an input method as described above.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "plurality" is at least two, such as two, three, etc., unless explicitly defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps of the process. Additional implementations are included within the scope of the preferred embodiments of the present application, in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art of the embodiments of the present application.
Logic and/or steps represented in the flowcharts or otherwise described herein, for example, an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). Additionally, the computer-readable medium may even be paper or another suitable medium upon which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. As with the other embodiments, if implemented in hardware, may be implemented using any one or combination of the following techniques, as is well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable Gate Arrays (PGAs), field Programmable Gate Arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like. Although embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the application, and that variations, modifications, alternatives, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.

Claims (12)

1. A method for providing a candidate long sentence in an input method, characterized by comprising the following steps:
acquiring a current input sequence input by a user in an input method application;
acquiring candidate words matched with the current input sequence;
obtaining a candidate long sentence matched with the candidate word according to a pre-trained long sentence prediction model, wherein the candidate long sentence comprises the candidate word and a suffix word behind the candidate word;
displaying candidate words and the candidate long sentences on the input method application;
the obtaining the candidate long sentence matched with the candidate word according to the pre-trained long sentence prediction model comprises the following steps:
taking the candidate words as the current input of the long sentence prediction model;
inputting the current input into the long sentence prediction model to obtain the current output of the long sentence prediction model, wherein the current output comprises the next word after the current input;
when the fact that the next word is not matched with the preset sentence termination information is determined, updating the current input of the long sentence prediction model according to the current output and the current input, and acquiring the current output corresponding to the current input through the long sentence prediction model until the current output of the long sentence prediction model is matched with the preset sentence termination information;
and when the current output of the long sentence prediction model is matched with the preset sentence termination information, generating a candidate long sentence matched with the candidate word according to the current input of the long sentence prediction model.
2. The method of claim 1, further comprising, prior to inputting the current input into the long sentence prediction model to obtain a current output of the long sentence prediction model:
obtaining training corpus data, wherein the training corpus data comprises prefix sample words and suffix sample words corresponding to the prefix sample words, and the suffix sample words are words appearing after the prefix sample words;
And training the long sentence prediction model according to the prefix sample words and the suffix sample words.
3. The method of claim 1, wherein the obtaining the candidate long sentence matching the candidate word according to the pre-trained long sentence prediction model comprises:
determining a suffix word matched with the candidate word through the long sentence prediction model, wherein the long sentence prediction model has learned the correspondence between the candidate word and the suffix word;
and generating the candidate long sentence according to the candidate word and the suffix word.
4. The method of claim 3, further comprising, prior to the determining, through the long sentence prediction model, the suffix word matching the candidate word:
acquiring training corpus data, wherein the training corpus data comprises prefix sample words and suffix sample words corresponding to the prefix sample words, and the prefix sample words and the suffix sample words are capable of forming a long sentence; and
training the long sentence prediction model according to the prefix sample words and the suffix sample words.
5. The method of any one of claims 1-4, further comprising:
acquiring an on-screen word entered before the current input sequence;
wherein the obtaining the candidate long sentence matching the candidate word according to the pre-trained long sentence prediction model comprises:
obtaining, by using the pre-trained long sentence prediction model, a candidate long sentence matching both the candidate word and the on-screen word.
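A hedged sketch of claim 5's context-aware variant, reusing the hypothetical model interface from the claim 1 sketch. Prepending the on-screen word to the model's initial input is one plausible reading, stated here as an assumption: the claim does not say how the on-screen word is incorporated.

```python
def generate_with_context(model, on_screen_word, candidate_word, max_len=20):
    """Condition generation on the on-screen word plus the candidate word."""
    current_input = [on_screen_word, candidate_word]
    for _ in range(max_len):
        next_word = model.predict_next_word(current_input)
        if next_word == "</s>":  # same hypothetical termination marker
            break
        current_input.append(next_word)
    # The displayed candidate long sentence starts at the candidate word;
    # the on-screen word served only as conditioning context.
    return "".join(current_input[1:])
```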
6. A device for providing candidate long sentences in an input method, the device comprising:
the first acquisition module is used for acquiring a current input sequence entered by a user in the input method application;
the second acquisition module is used for acquiring a candidate word matching the current input sequence;
the third acquisition module is used for acquiring a candidate long sentence matching the candidate word according to a pre-trained long sentence prediction model, wherein the candidate long sentence comprises the candidate word and a suffix word following the candidate word;
the display module is used for displaying the candidate word and the candidate long sentence in the input method application;
the third acquisition module is specifically configured to:
taking the candidate word as a current input of the long sentence prediction model;
inputting the current input into the long sentence prediction model to obtain a current output of the long sentence prediction model, wherein the current output comprises a next word following the current input;
when it is determined that the next word does not match preset sentence termination information, updating the current input of the long sentence prediction model according to the current output and the current input, and obtaining, through the long sentence prediction model, the current output corresponding to the updated current input, until the current output of the long sentence prediction model matches the preset sentence termination information; and
when the current output of the long sentence prediction model matches the preset sentence termination information, generating the candidate long sentence matching the candidate word according to the current input of the long sentence prediction model.
7. The device of claim 6, further comprising:
a fourth acquisition module, configured to acquire training corpus data, wherein the training corpus data comprises prefix sample words and suffix sample words corresponding to the prefix sample words, and the suffix sample words are words appearing after the prefix sample words;
and the first training module is used for training the long sentence prediction model according to the prefix sample words and the suffix sample words.
8. The device of claim 6, wherein the third acquisition module is specifically configured to:
determining, through the long sentence prediction model, a suffix word matching the candidate word, wherein the long sentence prediction model has learned a correspondence between candidate words and suffix words;
and generating the candidate long sentence according to the candidate word and the suffix word.
9. The device of claim 8, further comprising:
a fifth acquisition module, configured to acquire training corpus data, wherein the training corpus data comprises prefix sample words and suffix sample words corresponding to the prefix sample words, and the prefix sample words and the suffix sample words are capable of forming a long sentence;
and the second training module is used for training the long sentence prediction model according to the prefix sample words and the suffix sample words.
10. The device according to any one of claims 6-9, further comprising:
a sixth acquisition module, configured to acquire an on-screen word entered before the current input sequence;
the third acquisition module is specifically configured to:
obtaining, by using the pre-trained long sentence prediction model, a candidate long sentence matching both the candidate word and the on-screen word.
11. An electronic device, comprising:
a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method for providing a candidate long sentence in an input method according to any one of claims 1-5.
12. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method for providing a candidate long sentence in an input method according to any one of claims 1-5.
CN201910927584.XA 2019-09-27 2019-09-27 Method and device for providing candidate long sentences in input method Active CN110673748B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910927584.XA CN110673748B (en) 2019-09-27 2019-09-27 Method and device for providing candidate long sentences in input method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910927584.XA CN110673748B (en) 2019-09-27 2019-09-27 Method and device for providing candidate long sentences in input method

Publications (2)

Publication Number Publication Date
CN110673748A CN110673748A (en) 2020-01-10
CN110673748B true CN110673748B (en) 2023-04-28

Family

ID=69079711

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910927584.XA Active CN110673748B (en) 2019-09-27 2019-09-27 Method and device for providing candidate long sentences in input method

Country Status (1)

Country Link
CN (1) CN110673748B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113589954A (en) * 2020-04-30 2021-11-02 北京搜狗科技发展有限公司 Data processing method and device and electronic equipment
CN112052649A (en) * 2020-10-12 2020-12-08 腾讯科技(深圳)有限公司 Text generation method and device, electronic equipment and storage medium
CN112506359B (en) * 2020-12-21 2023-07-21 北京百度网讯科技有限公司 Method and device for providing candidate long sentences in input method and electronic equipment
CN112527127B (en) * 2020-12-23 2022-01-28 北京百度网讯科技有限公司 Training method and device for input method long sentence prediction model, electronic equipment and medium
CN113449515A (en) * 2021-01-27 2021-09-28 心医国际数字医疗系统(大连)有限公司 Medical text prediction method and device and electronic equipment
CN113655893A (en) * 2021-07-08 2021-11-16 华为技术有限公司 Word and sentence generation method, model training method and related equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011128958A (en) * 2009-12-18 2011-06-30 Chiteki Mirai:Kk Device, method and program for inputting sentence
CN102866782A (en) * 2011-07-06 2013-01-09 哈尔滨工业大学 Input method and input method system for improving sentence generating efficiency
CN110187780A (en) * 2019-06-10 2019-08-30 北京百度网讯科技有限公司 Long text prediction technique, device, equipment and storage medium
CN110286778A (en) * 2019-06-27 2019-09-27 北京金山安全软件有限公司 Chinese deep learning input method and device and electronic equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10235548B4 (en) * 2002-03-25 2012-06-28 Agere Systems Guardian Corp. Method and device for the prediction of a text message input
JP2007034871A (en) * 2005-07-29 2007-02-08 Sanyo Electric Co Ltd Character input apparatus and character input apparatus program
CN105718070A (en) * 2016-01-16 2016-06-29 上海高欣计算机系统有限公司 Pinyin long sentence continuous type-in input method and Pinyin long sentence continuous type-in input system
CN105759984B (en) * 2016-02-06 2019-07-02 上海触乐信息科技有限公司 The method and apparatus of secondary input text
CN105929979B (en) * 2016-06-29 2018-09-11 百度在线网络技术(北京)有限公司 Long sentence input method and device
CN107688398B (en) * 2016-08-03 2019-09-17 中国科学院计算技术研究所 It determines the method and apparatus of candidate input and inputs reminding method and device


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
IUY. A comprehensive test of Pinyin input method lexicon breadth and word-selection accuracy. 网络与信息 (Network & Information), 2009, No. 10, pp. 10-11. *
袁哲 (Yuan Zhe). Application of artificial intelligence in Pinyin input methods. 软件导刊 (Software Guide), 2010, No. 6, pp. 10-12. *

Also Published As

Publication number Publication date
CN110673748A (en) 2020-01-10

Similar Documents

Publication Publication Date Title
CN110673748B (en) Method and device for providing candidate long sentences in input method
CN110377716B (en) Interaction method and device for conversation and computer readable storage medium
CN107731228B (en) Text conversion method and device for English voice information
CN106534548B (en) Voice error correction method and device
CN109471915B (en) Text evaluation method, device and equipment and readable storage medium
CN111507099A (en) Text classification method and device, computer equipment and storage medium
CN104573099B (en) The searching method and device of topic
CN108920644B (en) Method, device, equipment and computer readable medium for judging conversation continuity
JP6677419B2 (en) Voice interaction method and apparatus
CN110750993A (en) Word segmentation method, word segmentation device, named entity identification method and system
CN108304376B (en) Text vector determination method and device, storage medium and electronic device
CN110187780B (en) Long text prediction method, long text prediction device, long text prediction equipment and storage medium
CN116797695A (en) Interaction method, system and storage medium of digital person and virtual whiteboard
CN113516972B (en) Speech recognition method, device, computer equipment and storage medium
CN110188327B (en) Method and device for removing spoken language of text
CN112559725A (en) Text matching method, device, terminal and storage medium
CN112712121A (en) Image recognition model training method and device based on deep neural network and storage medium
CN108829896B (en) Reply information feedback method and device
CN113569581B (en) Intention recognition method, device, equipment and storage medium
CN115587173A (en) Dialog text prediction method, device, equipment and storage medium
CN111368553B (en) Intelligent word cloud image data processing method, device, equipment and storage medium
CN109597884B (en) Dialog generation method, device, storage medium and terminal equipment
CN110970030A (en) Voice recognition conversion method and system
CN111899738A (en) Dialogue generating method, device and storage medium
CN111161737A (en) Data processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant