CN107422872B - Input method, input device and input device - Google Patents

Input method, input device and input device

Info

Publication number
CN107422872B
Authority
CN
China
Prior art keywords
word
sequence
vector
score
multivariate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610350134.5A
Other languages
Chinese (zh)
Other versions
CN107422872A (en)
Inventor
崔欣
张扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN201610350134.5A
Publication of CN107422872A
Application granted
Publication of CN107422872B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/02 Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F3/023 Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F3/0233 Character input methods

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the invention provides an input method, an input device and a device for input. The input method specifically comprises the following steps: acquiring a first vector sequence corresponding to an input string; calculating a first multivariate relation score corresponding to the first vector sequence according to a preset multivariate relation calculation rule; and determining a candidate item corresponding to the input string according to the first multivariate relation score. The embodiment of the invention needs to store only the vectors used to obtain the first vector sequence, and need not store all multivariate relations of order greater than or equal to 2, thereby saving a large amount of storage space.

Description

Input method, input device and input device
Technical Field
The present invention relates to the field of input methods, and in particular, to an input method, an input device, and a device for input.
Background
For users of Chinese, Japanese, Korean and similar languages, it is generally necessary to interact with a computer through an input method system. For example, a user types an input string on a keyboard; the input method system then converts the input string into candidate items in the corresponding language according to a preset standard mapping rule and displays them, and the candidate item selected by the user is committed to the screen.
With the continuous development of input method technology and the continuous improvement of input experience, users increasingly need to input long words or sentences, for example: "crab is grabbed by seaside", "daily decline in accumulated money", "Dezhou of America", and "today is really sunny". To meet this need under the traditional n-gram storage mode, triples or larger tuples such as "seaside | crab |" must be stored in the system lexicon.
However, in practical applications, when n is greater than or equal to 3, the number of n-grams to be stored grows geometrically. Input devices with limited memory, such as mobile phones and tablet computers, clearly cannot hold a complete n-gram storage structure, so the system lexicon usually stores only 2-gram relations. The existing n-gram storage mode therefore cannot provide multivariate relations when storage space is limited.
Disclosure of Invention
In view of the above problems, embodiments of the present invention provide an input method, an input device, and a device for input that overcome or at least partially solve the above problems, and that can simplify the input process of mathematical expressions and improve input efficiency.
In order to solve the above problem, an embodiment of the present invention discloses an input method, including:
acquiring a first vector sequence corresponding to an input string;
calculating to obtain a first multivariate relation score corresponding to the first vector sequence according to a preset multivariate relation calculation rule;
and determining a candidate item corresponding to the input string according to the first multivariate relation score.
Optionally, the step of acquiring a first vector sequence corresponding to the input string includes:
segmenting the input string of the user according to meta-words to obtain a first character segmentation result;
acquiring a first meta-word sequence corresponding to the first character segmentation result;
querying an established word vector library to obtain a vector corresponding to each meta-word in the first meta-word sequence;
and sequentially concatenating the vectors corresponding to the meta-words in the first meta-word sequence to obtain the first vector sequence corresponding to the input string.
Optionally, the word vector library is established by:
acquiring a meta-word number corresponding to each meta-word in a lexicon;
generating a corresponding vector for each meta-word in the lexicon;
and establishing the word vector library according to the mapping relation between the meta-word numbers and the vectors.
Optionally, the method further comprises:
acquiring a system word sequence corresponding to the input string;
determining a second multivariate relation score corresponding to the system word sequence;
the step of determining the candidate item corresponding to the input string according to the first multivariate relationship score includes:
and determining the candidate item corresponding to the input string according to the ranking of the first multivariate relation score and the second multivariate relation score.
Optionally, the step of obtaining the system word sequence corresponding to the input string includes:
segmenting the input string according to the system words to obtain a second character segmentation result;
and acquiring a system word sequence corresponding to the second character segmentation result.
Optionally, the step of determining a second multivariate relationship score corresponding to the system word sequence includes:
looking up, in a system lexicon, the word frequency corresponding to each system word in the system word sequence, and calculating a unary word-formation score corresponding to the system word sequence;
when a binary relation exists in the system word sequence, calculating a binary word-formation score corresponding to the system word sequence according to the binary relation;
and determining the second multivariate relation score corresponding to the system word sequence according to the unary word-formation score and the binary word-formation score.
Optionally, the method further comprises:
acquiring a second meta-word sequence corresponding to the preceding text and/or the following text of the input string;
querying an established word vector library to obtain a vector corresponding to each meta-word in the second meta-word sequence;
sequentially concatenating the vectors corresponding to the meta-words in the second meta-word sequence to obtain a second vector sequence;
and calculating a third multivariate relation score between the first vector sequence and the second vector sequence, and adjusting the ranking of the candidate items corresponding to the input string according to the third multivariate relation score.
Optionally, the method further comprises:
acquiring association candidate items corresponding to the input according to the upper text and/or the lower text of the input string;
acquiring a third vector sequence corresponding to the association candidate item;
and calculating a fourth multivariate relation score between the second vector sequence and the third vector sequence, and performing ranking display on the association candidate items according to the fourth multivariate relation score.
In another aspect, an embodiment of the present invention discloses an input device, including:
the first vector sequence acquisition module is used for acquiring a first vector sequence corresponding to the input string;
the first multivariate relation calculation module is used for calculating a first multivariate relation score corresponding to the first vector sequence according to a preset multivariate relation calculation rule; and
the candidate item determining module is used for determining the candidate item corresponding to the input string according to the first multivariate relation score.
In yet another aspect, an embodiment of the present invention discloses a device for input, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for:
acquiring a first vector sequence corresponding to an input string;
calculating to obtain a first multivariate relation score corresponding to the first vector sequence according to a preset multivariate relation calculation rule;
and determining a candidate item corresponding to the input string according to the first multivariate relation score.
The embodiment of the invention has the following advantages:
In the embodiment of the present invention, a multivariate relation score corresponding to the first vector sequence of the input string can be calculated according to a preset multivariate relation calculation rule, and the candidate item corresponding to the input string is determined according to that score, so that the obtained candidate item can reflect the multivariate relation in the input string. Because the multivariate relation score is obtained by calculation on the first vector sequence rather than by lookup in the lexicon, the embodiment of the invention needs to store only the vectors used to obtain the first vector sequence and need not store all multivariate relations of order greater than or equal to 2, thereby saving a large amount of storage space.
Drawings
FIG. 1 is a flow chart of the steps of a first embodiment of an input method of the present invention;
FIG. 2 is a flow chart of the steps of an embodiment of a method of generating a word vector library of the present invention;
FIG. 3 is a flow chart of the steps of a third embodiment of an input method of the present invention;
FIG. 4 is a flow chart of the steps of a fourth embodiment of an input method of the present invention;
FIG. 5 is a flow chart of the steps of a fifth embodiment of an input method of the present invention;
FIG. 6 is a block diagram of an input device according to an embodiment of the present invention;
FIG. 7 is a block diagram of an apparatus 800 for input of the present invention; and
FIG. 8 is a schematic diagram of a server according to the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
One of the core concepts of the embodiments of the present invention is that, during input with an input method, a multivariate relation score is obtained by calculation according to a preset multivariate relation calculation rule, and a candidate item corresponding to the input string is determined according to that score, so that the obtained candidate item can reflect the multivariate relation in the input string. Because the multivariate relation score is obtained by calculation on the first vector sequence rather than by lookup in the lexicon, the embodiment of the present invention needs to store only the vectors used to obtain the first vector sequence and need not store all multivariate relations of order greater than or equal to 2, so that a large amount of storage space can be saved.
Method embodiment one
Referring to fig. 1, a flowchart illustrating steps of a first embodiment of an input method according to the present invention is shown, which may specifically include the following steps:
Step 101, acquiring a first vector sequence corresponding to an input string;
Step 102, calculating a first multivariate relation score corresponding to the first vector sequence according to a preset multivariate relation calculation rule;
Step 103, determining a candidate item corresponding to the input string according to the first multivariate relation score.
The embodiment of the invention can be applied to input method systems of various input modes, such as pinyin input, English input, stroke input, voice input, handwriting input and the like. The user can complete the input of the input string by any input mode, namely, the user can input by a physical keyboard, a virtual keyboard, a handwriting board, a touch screen, a sound acquisition device and the like. The input string may be composed of any one or several of numbers, symbols, pinyin, English letters, etc. For convenience of description, the pinyin string is used as an input string in the embodiments of the present invention, and other types of input strings may be referred to each other.
During input with an input method, obtaining more n-gram relations (n greater than or equal to 2) requires considerable storage space for the n-gram structure. A terminal device with limited storage space (such as a mobile phone) can hardly store all n-gram relations with n greater than or equal to 2, and usually stores only 2-gram structures, that is, binary relations. Binary relations alone, however, can hardly meet a user's need to input long words or sentences. To solve this problem, embodiments of the present invention obtain the n-gram relation by calculation on the vectors corresponding to the input string. This meets the user's need to input long words or sentences without storing all multivariate relations of order greater than or equal to 2, thereby saving storage space. Here, "all multivariate relations of order greater than or equal to 2" may specifically comprise complete n-gram structures, such as 2-gram, 3-gram, 4-gram and so on up to n-gram structures, or long words or sentences that carry multivariate relations, such as "we have a meal together" and "today is really sunny".
In an optional embodiment of the present invention, the step of obtaining the first vector sequence corresponding to the input string may specifically include:
Substep S11, segmenting the input string of the user according to meta-words to obtain a first character segmentation result;
Substep S12, acquiring a first meta-word sequence corresponding to the first character segmentation result;
Substep S13, querying the established word vector library to obtain the vector corresponding to each meta-word in the first meta-word sequence;
Substep S14, sequentially concatenating the vectors corresponding to the meta-words in the first meta-word sequence to obtain the first vector sequence corresponding to the input string.
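Substeps S11 to S14 can be sketched as follows. This is a minimal illustration, not the patented implementation: the greedy longest-match segmentation, the table `SEGMENT_CANDIDATES`, the 2-dimensional vectors in `WORD_VECTORS`, and all function names are assumptions introduced here, since the text does not fix a segmentation algorithm or a vector dimension.

```python
from itertools import product

# Hypothetical data: meta-words that each pinyin segment can map to, and a toy
# word vector library keyed by meta-word (2 dimensions for readability).
SEGMENT_CANDIDATES = {
    "mantian": ["wandering sky", "full sky"],
    "daxue": ["big snow", "university"],
}
WORD_VECTORS = {
    "wandering sky": [0.9, 0.1],
    "full sky": [0.6, 0.3],
    "big snow": [0.8, 0.2],
    "university": [0.1, 0.9],
}


def segment(input_string, known_segments):
    """S11: greedy longest-match segmentation (an assumed strategy)."""
    result, i = [], 0
    while i < len(input_string):
        match = max(
            (s for s in known_segments if input_string.startswith(s, i)),
            key=len,
            default=None,
        )
        if match is None:
            raise ValueError(f"cannot segment at position {i}")
        result.append(match)
        i += len(match)
    return result


def meta_word_sequences(segments):
    """S12: every combination of candidate meta-words, one per segment."""
    return [list(c) for c in product(*(SEGMENT_CANDIDATES[s] for s in segments))]


def vector_sequence(meta_words):
    """S13 and S14: look up each vector and concatenate them in order."""
    sequence = []
    for word in meta_words:
        sequence.extend(WORD_VECTORS[word])
    return sequence


segments = segment("mantiandaxue", SEGMENT_CANDIDATES)
sequences = meta_word_sequences(segments)
first = vector_sequence(sequences[0])
```

Each meta-word sequence yields one concatenated vector sequence, which is then scored in Step 102.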
In the embodiment of the invention, a meta-word is a word that expresses a concept that is both independent and unitary. Independence means that the concept expressed by the word has an independent and complete meaning. Unitarity means that the concept expressed by the meta-word is a most basic concept unit, that is, the word cannot be split further either in meaning or in form. For example, "mathematics" is a meta-word: it expresses an independent, unitary concept and cannot be split further into its constituent characters. "Mathematical model" is not a meta-word: although it expresses an independent concept, it can be further split into the two meta-words "mathematics" and "model".
The embodiment of the invention expresses the meta-word by using the vector, establishes a word vector library, can store the corresponding relation between the meta-word and the vector in the word vector library, and can obtain the vector corresponding to the meta-word by inquiring the word vector library. According to the embodiment of the invention, the element words are represented by the vectors, the correlation among the element words can be obtained through numerical calculation under the condition that the n-element relation of an n-gram structure is not stored, and the n-element relation among the element words can be further obtained through model calculation.
In an application example of the present invention, assume that the received input string is "mantiandaxue". Segmenting the input string according to meta-words gives the following character segmentation result: [mantian] [daxue]. The meta-word sequences corresponding to the character segmentation result may include: wandering sky | big snow, full sky | big snow, and the like. Querying the established word vector library gives the vector corresponding to each meta-word: the vector corresponding to the meta-word "wandering sky" is V1, the vector corresponding to "full sky" is V2, the vector corresponding to "big snow" is V3, and the vector corresponding to "university" is V4. The first vector sequences corresponding to the input string "mantiandaxue" may therefore include: (V1, V3), (V1, V4), (V2, V3), (V2, V4). A first multivariate relation score, here a binary relation score, is then calculated for each first vector sequence according to the preset multivariate relation calculation rule. For example, the binary relation score of (V1, V3) is 90, that of (V1, V4) is 10, that of (V2, V3) is 60, and that of (V2, V4) is 2. The binary relation score of (V1, V3) is the highest, that is, the connection between the meta-words "wandering sky" and "big snow" is the strongest, so "wandering sky big snow" can be output as a candidate.
Alternatively, the candidates whose multivariate relation scores satisfy a preset threshold may be sorted and output. In the above application example, if the preset threshold is set to 55, then "wandering sky big snow" and "full sky big snow" are used as candidates and are output in order of their binary relation scores.
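The thresholded ranking in this example can be sketched as follows. The scores are the ones quoted in the example; reading "satisfy a preset threshold" as "greater than or equal to the threshold", and the names `EXAMPLE_SCORES` and `rank_candidates`, are assumptions made for illustration.

```python
# Binary relation scores quoted in the example above. In a real system they
# would be produced by the preset model, not read from a table.
EXAMPLE_SCORES = {
    ("wandering sky", "big snow"): 90,
    ("wandering sky", "university"): 10,
    ("full sky", "big snow"): 60,
    ("full sky", "university"): 2,
}


def rank_candidates(scores, threshold):
    """Keep sequences scoring at least `threshold`, highest score first."""
    kept = [(words, score) for words, score in scores.items() if score >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [" ".join(words) for words, _ in kept]


candidates = rank_candidates(EXAMPLE_SCORES, threshold=55)
print(candidates)  # ['wandering sky big snow', 'full sky big snow']
```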
In an embodiment of the present invention, a first multivariate relationship score of the first vector sequence may be calculated using a preset model. The preset model may be a multilayer neural network. The preset model may have an input of a vector sequence and an output of a probability value representing a multivariate relationship score. When the preset model is trained, the existing context and the corresponding candidate items can be used as a training set to obtain the vector expression of each meta-word and the expected output (0 or 1) for training, and finally the parameters of all the nodes in the multilayer neural network are obtained.
For example, after vectors corresponding to three meta-words are connected in series, a vector sequence (V1, V2, V3) is obtained, the vector sequence (V1, V2, V3) can be used as the input of the model, the output of the model is a probability value, and the larger the probability value is, the stronger the ternary relationship among the three meta-words is; conversely, the weaker the ternary relationship. It can be understood that the above-mentioned vector sequence obtained by means of vector concatenation is only an application example of the present invention, and the embodiment of the present invention does not limit the specific way of obtaining the vector sequence. It is also possible to operate, for example, in a convolution-like manner to obtain a sequence of vectors of fixed window size. Specifically, a CNN (Convolutional Neural Network) method may be used to process a plurality of vectors, and for a CNN model, regardless of the number and size of vectors input thereto, the CNN model has the capability of integrating the input vectors and outputting a vector sequence with a fixed dimension.
In another application example of the present invention, suppose the user wants to input "today is really sunny". The meta-word sequence corresponding to the input string is obtained, and a sliding window of size 3 is moved to the right over it to obtain the ternary relation between every three adjacent meta-words in the input string. The meta-word sequence in the first window may be "today | weather | really". The vector corresponding to each meta-word is obtained by querying the word vector library, the vectors are concatenated end to end to obtain the corresponding vector sequence, and the vector sequence is used as the input of the preset model; the output of the preset model is the calculated ternary relation score. The window then slides to the right, the ternary relation score of "weather | really | sunny" is calculated, and the sequence with the higher ternary relation score is output as a candidate.
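The sliding-window procedure can be sketched as follows. The function names and the scores in `HYPOTHETICAL_SCORES` are illustrative assumptions; in the described scheme the score of each window would come from the preset model applied to the concatenated vectors.

```python
def sliding_windows(meta_words, size=3):
    """Return every run of `size` adjacent meta-words, left to right."""
    return [
        tuple(meta_words[i:i + size])
        for i in range(len(meta_words) - size + 1)
    ]


def score_windows(meta_words, score_fn, size=3):
    """Pair each window with the score returned by `score_fn`."""
    return [(w, score_fn(w)) for w in sliding_windows(meta_words, size)]


# Hypothetical ternary relation scores standing in for the model output.
HYPOTHETICAL_SCORES = {
    ("today", "weather", "really"): 0.7,
    ("weather", "really", "sunny"): 0.9,
}

words = ["today", "weather", "really", "sunny"]
scored = score_windows(words, HYPOTHETICAL_SCORES.get)
best_window, best_score = max(scored, key=lambda item: item[1])
```

A sequence shorter than the window size yields no windows, so a caller would need a fallback such as a smaller window.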
In summary, in the embodiment of the present invention, a multivariate relation score corresponding to the first vector sequence of the input string can be calculated according to a preset multivariate relation calculation rule, and the candidate item corresponding to the input string is determined according to that score, so that the obtained candidate item can reflect the multivariate relation in the input string. Because the multivariate relation score is obtained by calculation on the first vector sequence rather than by lookup in the lexicon, the embodiment of the invention needs to store only the vectors used to obtain the first vector sequence and need not store all multivariate relations of order greater than or equal to 2, thereby saving a large amount of storage space.
Method embodiment two
The present embodiment describes in detail a specific process of generating a word vector library on the basis of the first embodiment. Referring to fig. 2, a flowchart illustrating steps of an embodiment of a method for generating a word vector library according to the present invention is shown, which may specifically include:
Step 201, acquiring a meta-word number corresponding to each meta-word in the lexicon;
Step 202, generating a corresponding word vector for each meta-word in the lexicon;
Step 203, establishing a word vector library according to the mapping relation between the meta-word numbers and the word vectors.
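Steps 201 to 203 can be sketched as follows. The random vectors are only placeholders for whatever vector generation method is actually used; the meta-word numbers, the dimension of 4, and the function name are likewise assumptions made for illustration.

```python
import random


def build_word_vector_library(meta_word_numbers, dim=4, seed=0):
    """Map each meta-word number to a vector (Steps 201 to 203).

    The vectors here are random placeholders; a real system would use
    trained word vectors instead.
    """
    rng = random.Random(seed)
    library = {}
    for _meta_word, number in meta_word_numbers.items():
        library[number] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return library


# Hypothetical meta-words and their numbers.
META_WORD_NUMBERS = {
    "full sky": 39,
    "wandering sky": 23,
    "university": 67,
    "big snow": 89,
}
LIBRARY = build_word_vector_library(META_WORD_NUMBERS)
```

Keying the library by meta-word number rather than by the word itself keeps each record compact, which matches the storage-saving goal described in the text.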
The lexicon in the embodiment of the present invention may specifically include: a system lexicon, a user lexicon, a system n-gram library, and a word vector library. The system lexicon can be a unary vocabulary of frequently input words obtained from corpus statistics. The user lexicon is a unary or multivariate vocabulary that is collected from the user's input behavior and matches the user's input habits. The system n-gram library can be an n-gram multivariate vocabulary obtained by counting the connection relations of two or more words in the corpus, and is usually a 2-gram (binary) vocabulary. The word vector library can be a vector vocabulary obtained by representing the meta-words in the system lexicon as vectors.
It can be understood that the word vector library established in the embodiment of the present invention may be established according to any one of the word libraries, and for convenience of description, in the embodiment of the present invention, the word vector library established according to the meta-word in the system word library is described as an example, and the scenes of the word vector library established according to the meta-word in the other word libraries may be referred to each other.
To save the storage space occupied by the lexicon, the system n-gram library used in the embodiment of the invention may store only binary relations; ternary and higher-order relations are then obtained by vector calculation. Of course, in practical applications the stored multivariate relations can be chosen according to the processing or storage capability of the system. For example, ternary and higher-order relations can also be stored, and the vector calculation of the invention can be used for the multivariate relations that are not stored. In short, the embodiment of the present invention does not limit the specific content stored in the system n-gram library.
In a specific application, the n-gram multivariate vocabulary can be used to represent the connection relation of two or more words. Taking the binary relation "wandering sky | big snow" as an example, the binary frequency of the two words in the n-gram multivariate vocabulary represents the strength of the connection between them. The embodiment of the invention instead represents the meta-words in the system lexicon by vectors and computes a score from two vectors to represent the strength of the connection between the two meta-words. The multivariate relation can therefore be obtained by calculation from the stored word vectors alone, without storing the actual n-gram relations, which saves a large amount of storage space.
In the embodiment of the present invention, the system lexicon may store each system word together with its word frequency and meta-word number, in the following format: system entry i | word frequency i | meta-word number i. The meta-word number may be a positive integer, that is, an integer represents a system entry. For example, the system lexicon stores the following system words: full sky|506|39, wandering sky|501|23, university|701|67, big snow|302|89, and so on. The word frequency of the system word "full sky" is 506; "full sky" is itself a meta-word, and its meta-word number is 39. In another example, the system lexicon stores the following system word: the United States of America|368|0. The word frequency of this system word is 368; because it is not a meta-word, its meta-word number may be set to 0.
In the embodiment of the invention, the multivariate relations among meta-word numbers can be stored in the system n-gram library. Taking a binary relation as an example, the binary relation between meta-word numbers may be stored in the following format: meta-word number i | meta-word number j | binary frequency. The binary frequency represents the strength of the connection between meta-word i and meta-word j. For example, the system n-gram library stores the following binary relations: 23|89|8 and 23|67|1. Querying the system lexicon shows that meta-word number 23 corresponds to "wandering sky" and meta-word number 89 corresponds to "big snow", so the binary frequency between the meta-words "wandering sky" and "big snow" is 8. Likewise, meta-word number 67 corresponds to "university", so the binary frequency between "wandering sky" and "university" is 1. The connection between "wandering sky" and "big snow" is therefore stronger than the connection between "wandering sky" and "university".
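The two storage formats described above can be parsed and queried as in the following sketch. The pipe-delimited records follow the formats given in the text; the function names, the in-memory dictionaries, and returning 0 for a pair that is not stored are assumptions made for illustration.

```python
SYSTEM_LEXICON_LINES = [
    "full sky|506|39",
    "wandering sky|501|23",
    "university|701|67",
    "big snow|302|89",
]
SYSTEM_NGRAM_LINES = ["23|89|8", "23|67|1"]


def parse_lexicon(lines):
    """Parse 'system entry | word frequency | meta-word number' records."""
    lexicon = {}
    for line in lines:
        entry, frequency, number = (field.strip() for field in line.split("|"))
        lexicon[entry] = (int(frequency), int(number))
    return lexicon


def parse_bigrams(lines):
    """Parse 'meta-word number i | meta-word number j | binary frequency'."""
    bigrams = {}
    for line in lines:
        i, j, frequency = (int(field) for field in line.split("|"))
        bigrams[(i, j)] = frequency
    return bigrams


LEXICON = parse_lexicon(SYSTEM_LEXICON_LINES)
BIGRAMS = parse_bigrams(SYSTEM_NGRAM_LINES)


def binary_frequency(word_i, word_j):
    """Return the stored binary frequency, or 0 if the pair is not stored."""
    number_i = LEXICON[word_i][1]
    number_j = LEXICON[word_j][1]
    return BIGRAMS.get((number_i, number_j), 0)
```

With these records, `binary_frequency("wandering sky", "big snow")` returns 8 and `binary_frequency("wandering sky", "university")` returns 1, matching the comparison in the text.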
In specific application, due to the limitation of storage space, a system n-gram library usually only stores binary relations, and in order to obtain more n-gram relations, the embodiment of the invention represents the meta-words in the system word library by using vectors, and obtains the n-gram relations among the meta-words by calculating the vectors. Specifically, the vector corresponding to the meta-word may be stored according to the following format: the meta word number i | vector < v1, v2, …, vd >. The vector may be a multidimensional vector, for example, the vector < v1, v2, …, vd > is a d-dimensional vector.
In an application example of the present invention, the following vectors are stored in the word vector library: 39 | <0.5, 0.97, …, 0.65> and 89 | <0.43, 0.67, …, 0.12>. Querying the system lexicon shows that meta-word number 39 corresponds to "full sky", so the meta-word "full sky" can be expressed as the vector <0.5, 0.97, …, 0.65>; meta-word number 89 corresponds to "big snow", so the meta-word "big snow" can be expressed as the vector <0.43, 0.67, …, 0.12>. Calculating on the two vectors gives the strength of the binary relation between the meta-words "full sky" and "big snow".
The vector corresponding to a meta-word can be obtained by a distributed representation method for vocabulary, that is, a multi-dimensional vector is used to represent the word. For example, in the above example, the word "full sky" is represented by the vector <0.5, 0.97, …, 0.65>.
After the vocabularies are expressed by vectors, the strength of the connection relation among the vocabularies can be obtained by calculating the vectors corresponding to a plurality of vocabularies. Specifically, the calculation of the plurality of vectors may specifically adopt a calculation manner of an inner product of the vectors or a calculation manner of other model types, and it is understood that the calculation manner of the vectors is not limited in the embodiment of the present invention.
In an alternative embodiment of the present invention, the following vector calculation may be provided:
Mode one
The calculation is performed by means of the vector inner product. For example, for the vectors $d_1 = (d_{11}, d_{12}, d_{13}, \ldots, d_{1n})$ and $d_2 = (d_{21}, d_{22}, d_{23}, \ldots, d_{2n})$, the multivariate relationship score is calculated as follows:
$$\mathrm{result} = d_{11} \times d_{21} + d_{12} \times d_{22} + d_{13} \times d_{23} + d_{14} \times d_{24} + \cdots + d_{1n} \times d_{2n} \tag{1}$$
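Formula (1) can be sketched in a few lines of Python. The function name `inner_product_score` is an illustrative assumption, and the dimension check is an added safeguard not stated in the text.

```python
from typing import Sequence


def inner_product_score(d1: Sequence[float], d2: Sequence[float]) -> float:
    """Return d11*d21 + d12*d22 + ... + d1n*d2n, as in formula (1)."""
    if len(d1) != len(d2):
        raise ValueError("both vectors must have the same dimension")
    return sum(a * b for a, b in zip(d1, d2))
```

A larger result is read here as a stronger relation between the two meta-words.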
Mode two
The calculation is performed by an NNLM (Neural Network Language Model). Specifically, each NNLM fixes the number of words accepted by its input layer. For example, if the input word window of the NNLM is 3, the number of nodes in the input layer is 3 × D, where D is the dimension of a word vector. The vectors V1, V2 and V3 of the three most recent words are concatenated end to end to obtain the vector sequence V = (V1, V2, V3); V is then input into the NNLM, and the output is the multivariate relationship score of the vectors V1, V2 and V3.
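The input-layer sizing described above can be illustrated with a toy feed-forward scorer. This is only a sketch under stated assumptions: the class name `TinyNNLM`, the single tanh hidden layer, the sigmoid output and the randomly initialised (untrained) weights are all hypothetical, since the text does not specify the network architecture or its training.

```python
import math
import random
from typing import List, Sequence


def concatenate(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Join word vectors end to end: three D-dim vectors give one 3*D input."""
    return [x for vector in vectors for x in vector]


class TinyNNLM:
    """Toy scorer with a fixed input window; weights are random, not trained."""

    def __init__(self, window: int, dim: int, hidden: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.input_size = window * dim  # number of input-layer nodes
        self.w_hidden = [
            [rng.uniform(-0.5, 0.5) for _ in range(self.input_size)]
            for _ in range(hidden)
        ]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]

    def score(self, vectors: Sequence[Sequence[float]]) -> float:
        x = concatenate(vectors)
        if len(x) != self.input_size:
            raise ValueError("input does not match window * dim")
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)))
             for row in self.w_hidden]
        z = sum(w * hi for w, hi in zip(self.w_out, h))
        return 1.0 / (1.0 + math.exp(-z))  # squash to a score in (0, 1)
```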
Mode three
The calculation is performed by an RNN (Recurrent Neural Network). Specifically, the number of words in the input word window is not limited. The vector of each word is input into the RNN to obtain a hidden layer representation, and this hidden layer representation is combined with the next input vector as the input to the next step of the RNN. The number of neurons in the output layer equals the size of the meta-word list, and the output of each neuron is the predicted probability of the corresponding word.
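The recurrence described above can be sketched as follows. The class name `TinyRNN`, the tanh activation, the softmax output and the random untrained weights are assumptions made for illustration; the text only states that the hidden representation is carried forward and that the output layer has one neuron per meta-word.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TinyRNN:
    """Toy recurrent scorer; accepts any number of input word vectors."""

    def __init__(self, dim: int, hidden: int, vocab_size: int, seed: int = 0):
        rng = random.Random(seed)

        def matrix(rows: int, cols: int) -> List[List[float]]:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.hidden = hidden
        self.w_in = matrix(hidden, dim)
        self.w_rec = matrix(hidden, hidden)
        self.w_out = matrix(vocab_size, hidden)

    def predict(self, word_vectors: Sequence[Sequence[float]]) -> List[float]:
        h = [0.0] * self.hidden
        for x in word_vectors:
            # Combine the previous hidden state with the next input vector.
            h = [
                math.tanh(
                    sum(a * b for a, b in zip(self.w_in[i], x))
                    + sum(a * b for a, b in zip(self.w_rec[i], h))
                )
                for i in range(self.hidden)
            ]
        logits = [sum(a * b for a, b in zip(row, h)) for row in self.w_out]
        return softmax(logits)  # one probability per meta-word
```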
In an optional embodiment of the present invention, a meta-word may correspond to a plurality of different vector representations, so that the multivariate relationship obtained from vector calculation is more accurate in different input scenarios. For example, different vector representations may be employed for different input scenarios such as QQ (instant messaging), maps, games, word (word processor), etc. For example, a certain meta-word may represent a certain place name in a map, but have a different meaning in other scenarios. Therefore, a plurality of different vectors are set for the meta-word, and the vectors corresponding to the input scenes can be obtained in different input scenes, so that the accuracy of vector representation is improved.
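One way to picture scene-dependent vectors is a lookup keyed by both meta-word number and input scene. In this sketch the scene labels, the vectors and the fallback to a default vector are assumptions; the text does not say what happens when a scene has no dedicated vector.

```python
from typing import Dict, List, Tuple

DEFAULT_SCENE = "default"

# Hypothetical contents: meta-word 39 has a general vector and a map-specific one.
scene_vectors: Dict[Tuple[int, str], List[float]] = {
    (39, DEFAULT_SCENE): [0.5, 0.97, 0.65],
    (39, "map"): [0.1, 0.2, 0.9],
}


def lookup_vector(word_number: int, scene: str) -> List[float]:
    """Return the scene-specific vector, or the default one if none exists."""
    key = (word_number, scene)
    if key in scene_vectors:
        return scene_vectors[key]
    return scene_vectors[(word_number, DEFAULT_SCENE)]
```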
It can be understood that, in practical applications, as for a specific use method of the word vector library, the embodiment of the present invention is not limited, for example, in an input process, the word vector library may be used alone to calculate the multivariate relationship; alternatively, a complex relation or the like may be obtained by comprehensive calculation using a combination of a system word library, a system n-ary library, and a word vector library.
According to the embodiment of the invention, the vector corresponding to each meta-word can be obtained by querying the word vector library, so that the multivariate relationship score among meta-words can be calculated to express the strength of the connection among them. Because the word vector library only needs to store the vector corresponding to each meta-word and need not store the actual n-gram relationships, more n-gram relationships can be obtained when the size of the word library is limited, and the coverage of n-gram relationships is wider.
Method embodiment three
On the basis of the second embodiment, the candidate item corresponding to the input string may be determined by combining the multivariate relationship in the system n-gram library and the multivariate relationship obtained by vector calculation, so as to utilize the advantages of the system n-gram library in terms of high-frequency vocabulary and the advantages of the word vector library in terms of multivariate relationship coverage, so that the word formation result is more accurate.
Referring to fig. 3, a flowchart illustrating steps of a third embodiment of an input method according to the present invention is shown, which may specifically include the following steps:
step 301, acquiring a first vector sequence corresponding to an input string;
step 302, calculating to obtain a first multivariate relation score corresponding to the first vector sequence according to a preset multivariate relation calculation rule;
step 303, acquiring a system word sequence corresponding to the input string;
step 304, determining a second multivariate relation score corresponding to the system word sequence;
step 305, determining the candidate item corresponding to the input string according to the ranking of the first multivariate relationship score and the second multivariate relationship score.
In the embodiment of the invention, after an input string of a user is received, a first vector sequence and a system word sequence corresponding to the input string are obtained; then a first multivariate relationship score corresponding to the first vector sequence and a second multivariate relationship score corresponding to the system word sequence are calculated respectively; finally, the candidates corresponding to the input string are determined according to the ranking of the first multivariate relationship score and the second multivariate relationship score. The step of obtaining the system word sequence corresponding to the input string may specifically include:
step S21, segmenting the input string according to the system words to obtain a second character segmentation result;
and step S22, acquiring a system word sequence corresponding to the second character segmentation result.
The system words may be those stored in an existing system word library. In an application example of the present invention, the input string received from the user is "meilijianhezhongguodezhou". By querying the system word library, the input string may be segmented according to system words, and the corresponding system word sequence may be "united states of america | texas". In the system word library, a system word may be a meta-word or a compound word; because the compound word "united states of america" is a common proper noun whose whole has a special meaning, it may also be stored as a system word. If the input string is instead segmented according to meta-words, the following meta-word sequence can be obtained: "meridia | republic | texas".
The embodiment of the invention segments the input string by two segmentation modes to obtain the meta word sequence and the system word sequence corresponding to the input string, respectively calculates the multivariate relationship score among the meta words in the meta word sequence and the multivariate relationship score among the system words in the system word sequence, and finally determines the candidate item corresponding to the input string according to the score sorting so as to ensure that the obtained candidate item is more accurate. Specifically, the second multivariate relationship score corresponding to the system word sequence may be determined by:
step S31, searching in a system word library to obtain the word frequency corresponding to each system word in the system word sequence, and calculating to obtain the unary word-forming score corresponding to the system word sequence;
in the specific application, system words and word frequencies corresponding to the system words are stored in a system word bank, the word frequencies corresponding to the system words in the system word sequence can be obtained by querying the system word bank, and unary word-forming scores corresponding to the system word sequence can be obtained by calculating the product of the word frequencies.
In an application example of the present invention, it is assumed that the input string received from the user is "gongjijintiantianjiang" (in Chinese, roughly "the accumulation fund falls every day"). The input string is segmented according to the system words and can be segmented into a plurality of syllable sequences, such as "gongji | jintian | tianjiang", "gongjijin | tiantian | jiang", "gong | jijin | tiantian | jiang", etc. Each syllable sequence can correspond to one or more system word sequences. For example, "gongji | jintian | tianjiang" may correspond to system word sequences including "attack | today | day down", "cock | today | day down", etc.; "gongjijin | tiantian | jiang" may correspond to a system word sequence such as "accumulation fund | every day | fall".
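Enumerating the possible syllable sequences can be sketched as a recursive search over a lexicon. The function name `segmentations` and the small lexicon are assumptions, and the input is taken to be the concatenation of the syllables "gongjijin", "tiantian" and "jiang" from the example; a production input method would typically use a lattice or dynamic programming rather than plain recursion.

```python
from typing import List, Set


def segmentations(text: str, lexicon: Set[str]) -> List[List[str]]:
    """Return every way to split `text` into consecutive lexicon entries."""
    if not text:
        return [[]]
    results: List[List[str]] = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            for tail in segmentations(text[end:], lexicon):
                results.append([head] + tail)
    return results


# Hypothetical lexicon of pinyin strings that map to system words.
lexicon = {"gong", "gongji", "gongjijin", "jijin", "jintian",
           "tiantian", "tianjiang", "jiang"}
all_splits = segmentations("gongjijintiantianjiang", lexicon)
```

With this lexicon, `all_splits` contains the three syllable sequences listed in the example.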
The unary word-forming score corresponding to each system word sequence is calculated from the word frequency of each system word in that sequence. For example, for the system word sequence "accumulation fund | every day | fall", querying the system word library yields the word frequencies p(accumulation fund), p(every day) and p(fall). The unary word-forming score scoreA of the system word sequence is the product of these word frequencies, that is, scoreA = p(accumulation fund) × p(every day) × p(fall).
Step S32, when a binary relation exists in the system word sequence, calculating a binary word-forming score corresponding to the system word sequence according to the binary relation;
After the unary word-forming score is calculated, it may further be determined whether a binary relationship exists in the system word sequence. Specifically, the system n-gram library may be queried using the system words in the system word sequence. For example, if the system words "every day" and "fall" have a binary relationship whose score is found to be scoreB, and no binary relationship exists between the system words "accumulation fund" and "every day", the binary word-forming score corresponding to the system word sequence may be scoreB.
Step S33, determining a second multivariate relationship score corresponding to the system word sequence according to the unary word-forming score and the binary word-forming score.
In an alternative embodiment of the present invention, the second multivariate relationship score corresponding to a system word sequence may be determined as the product of the unary word-forming score and the binary word-forming score, that is, score = scoreA × scoreB. Through the above steps, the second multivariate relationship scores score1, score2, …, scoreN corresponding to all system word sequences can be calculated.
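Steps S31 to S33 can be sketched as below. The function names, the pinyin keys and the frequency values are illustrative assumptions. The sketch also assumes that an adjacent pair without a stored binary relation contributes a neutral factor of 1.0, which the text does not specify.

```python
from math import prod
from typing import Dict, Sequence, Tuple


def unary_score(sequence: Sequence[str], word_freq: Dict[str, float]) -> float:
    """scoreA: product of the word frequencies of the system words."""
    return prod(word_freq[word] for word in sequence)


def binary_score(sequence: Sequence[str],
                 bigram_scores: Dict[Tuple[str, str], float]) -> float:
    """scoreB: product of stored scores over adjacent pairs that have one."""
    score = 1.0  # assumption: pairs with no binary relation are neutral
    for pair in zip(sequence, sequence[1:]):
        if pair in bigram_scores:
            score *= bigram_scores[pair]
    return score


def second_multivariate_score(sequence: Sequence[str],
                              word_freq: Dict[str, float],
                              bigram_scores: Dict[Tuple[str, str], float]) -> float:
    """score = scoreA * scoreB."""
    return unary_score(sequence, word_freq) * binary_score(sequence, bigram_scores)
```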
In the above application example, the input string may correspond to the following first meta-word sequences: "attack | today | day down", "cock | today | day down" and "accumulation fund | every day | fall". The vector corresponding to each meta-word is obtained by querying the word vector library, giving the first vector sequence corresponding to each first meta-word sequence, and the first multivariate relationship scores score1', score2', …, scoreN' corresponding to the first vector sequences are then obtained by calculation.
Finally, the calculated first multivariate relationship scores and second multivariate relationship scores are ranked together, and the candidates corresponding to the input string are output in score order. For example, if the first meta-word sequence "accumulation fund | every day | fall" has the highest first multivariate relationship score and the system word sequence "accumulation fund | every day speech" has the highest second multivariate relationship score, then "accumulation fund every day fall" and "accumulation fund every day speech" may both be output as candidates, and the candidate "accumulation fund every day fall" may be ranked ahead of the candidate "accumulation fund every day speech".
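The joint ranking can be sketched as a merge of two score tables. The function name `rank_candidates` is hypothetical, and the sketch assumes that both kinds of score are already on a comparable scale and that a candidate appearing in both tables keeps its higher score; the text does not state how the two scores are normalised.

```python
from typing import Dict, List


def rank_candidates(first_scores: Dict[str, float],
                    second_scores: Dict[str, float],
                    top_k: int = 5) -> List[str]:
    """Rank candidates from both score tables together, highest score first."""
    best: Dict[str, float] = {}
    for table in (first_scores, second_scores):
        for candidate, score in table.items():
            if candidate not in best or score > best[candidate]:
                best[candidate] = score
    return sorted(best, key=lambda c: best[c], reverse=True)[:top_k]
```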
Therefore, the process of combining the system word library, the system n-element library and the word vector library to carry out word combination is completed, and the advantage of the system n-element library in the aspect of high-frequency words and the advantage of the word vector library in the aspect of multi-element relation coverage are utilized, so that the word combination result is more accurate.
It can be understood that segmenting the input string in two ways, by meta-word and by system word, calculating the corresponding first and second multivariate relationship scores respectively, and then determining the candidates, as in the above application example, is only one example. In the actual input process, the system word library, the system n-gram library and the word vector library can be used flexibly as required. For example, the input string may first be segmented according to the system words to obtain a system word sequence, and a second multivariate relationship score may be obtained by querying the system n-gram library. If the second multivariate relationship score is high enough, for example greater than a preset threshold, the candidates may be determined from the system word sequence alone, and the processes of segmenting the input string according to meta-words and calculating the first multivariate relationship score are not performed. This saves part of the computation and further improves input efficiency.
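The threshold shortcut described above can be sketched as follows. The function name `choose_candidates` and the two scorer callbacks are hypothetical stand-ins for the system word path and the word vector path; each callback is assumed to return a mapping from candidate to score.

```python
from typing import Callable, Dict, List

Scores = Dict[str, float]


def choose_candidates(input_string: str,
                      system_scorer: Callable[[str], Scores],
                      vector_scorer: Callable[[str], Scores],
                      threshold: float) -> List[str]:
    """Use the system word path alone when its best score exceeds the threshold."""
    system_scores = system_scorer(input_string)
    if system_scores and max(system_scores.values()) > threshold:
        # Good enough: skip meta-word segmentation and vector scoring.
        return sorted(system_scores, key=system_scores.get, reverse=True)
    merged = dict(system_scores)
    for candidate, score in vector_scorer(input_string).items():
        merged[candidate] = max(score, merged.get(candidate, score))
    return sorted(merged, key=merged.get, reverse=True)
```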
In summary, in the embodiment of the present invention, after an input string of a user is received, a first vector sequence and a system word sequence corresponding to the input string are first obtained; then a first multivariate relationship score corresponding to the first vector sequence and a second multivariate relationship score corresponding to the system word sequence are calculated respectively; finally, the candidates corresponding to the input string are determined according to the ranking of the first multivariate relationship score and the second multivariate relationship score. The embodiment of the invention can therefore combine the system n-gram library with the word vector library to calculate the multivariate relationship score comprehensively, exploiting the advantage of the system n-gram library for high-frequency words and the advantage of the word vector library in multivariate relationship coverage, so that the word-forming result is more accurate.
Method embodiment four
In this embodiment, on the basis of the second embodiment, the process of frequency adjustment (adjusting candidate ranking) using the established word vector library during input is described in detail. Referring to fig. 4, a flowchart of the steps of a fourth embodiment of an input method according to the present invention is shown, which may specifically include the following steps:
step 401, obtaining a first vector sequence corresponding to an input string;
step 402, calculating to obtain a first multivariate relation score corresponding to the first vector sequence according to a preset multivariate relation calculation rule;
step 403, determining a candidate item corresponding to the input string according to the first multivariate relationship score;
step 404, acquiring a second meta-word sequence corresponding to the preceding and/or following text of the input string;
step 405, querying an established word vector library to obtain a vector corresponding to each meta-word in the second meta-word sequence;
step 406, sequentially connecting the vectors corresponding to each meta-word in the second meta-word sequence in series to obtain a second vector sequence;
step 407, calculating a third multivariate relationship score between the first vector sequence and the second vector sequence, and adjusting the ranking of the candidate items corresponding to the input string according to the third multivariate relationship score.
The embodiment of the invention can also adjust the ranking of candidates according to the established word vector library. In an application example of the present invention, the current input string is "px" and the preceding text of the input string is "go to sea scratch". The input string "px" may be looked up in word libraries such as the system word library and the user word library, and the candidates obtained for the input string may include "sorting", "leather shoes", "training", "crab", etc.
First, the second meta-word sequence corresponding to the preceding text "go to sea scratch" is obtained as "go | seaside | scratch". By querying the word vector library, the vector corresponding to each meta-word in the second meta-word sequence is obtained: the vector corresponding to "go" is V1, the vector corresponding to "seaside" is V2, and the vector corresponding to "scratch" is V3. The vectors corresponding to the meta-words in the second meta-word sequence are then connected in series in order to obtain the second vector sequence (V1, V2, V3).
It is to be understood that concatenating the vectors to obtain the second vector sequence is only one application example of the present invention. In practical applications, the second vector sequence may also be obtained in other manners; for example, a Recurrent Neural Network (RNN) model or the like may be used to turn the vectors corresponding to the meta-words into a second vector sequence representing the whole preceding text.
Then, a first vector sequence corresponding to the input string "px" is obtained. Because each candidate corresponding to the input string "px" is itself a meta-word, no meta-word segmentation is needed. By querying the word vector library, the vector corresponding to "sorting" is obtained as V4, the vector corresponding to "leather shoes" as V5, the vector corresponding to "training" as V6, and the vector corresponding to "crab" as V7. That is, the first vector sequence may be V4, V5, V6, V7, etc.
And then, calculating a third multivariate relation score between the first vector sequence and the second vector sequence, and adjusting the ranking of the candidate items corresponding to the input string according to the third multivariate relation score. Specifically, a binary relationship score between V4 and (V1, V2, V3), a binary relationship score between V5 and (V1, V2, V3), a binary relationship score between V6 and (V1, V2, V3), and the like are calculated. Assuming that the binary relationship between V7 and (V1, V2, V3) scores the highest, the candidate "crab" corresponding to V7 may be ranked the top.
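The re-ranking step can be sketched as below. The preset model is not specified in the text, so this sketch substitutes a simple stand-in: the inner product between each candidate vector and the element-wise mean of the context vectors. The function names and the two-dimensional toy vectors are assumptions.

```python
from typing import Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of the context vectors (stand-in for the preset model)."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def rerank_by_context(candidates: Dict[str, Sequence[float]],
                      context_vectors: Sequence[Sequence[float]]) -> List[str]:
    """Order candidates by their score against the context, highest first."""
    context = mean_vector(context_vectors)
    return sorted(candidates,
                  key=lambda word: dot(candidates[word], context),
                  reverse=True)
```

With toy vectors in which "crab" lies closest to the context, `rerank_by_context` places "crab" first, mirroring the outcome described in the example.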
In the embodiment of the present invention, a third multivariate relation score between the first vector sequence and the second vector sequence may be calculated by using a preset model, and the word frequency of the candidate items may be modified according to the third multivariate relation score, so as to reorder the candidate list.
In the above example, "go to sea scratch" is used as the preceding text. It is understood that in practical applications the embodiment of the present invention does not limit the length of the preceding text. For example, only "scratch" may be used as the preceding text, and the binary relationship scores between "scratch" and the candidates "sorting", "leather shoes", "training" and "crab" may be calculated respectively; alternatively, "sea scratch" may be used as the preceding text, and so on. The embodiment of the present invention preferably concatenates a plurality of meta-words as the preceding text, such as "go to sea scratch" in the above example, which is composed of three meta-words, so that the preceding text is considered more comprehensively during frequency adjustment and candidates that better fit the context are ranked first. For example, in the prior art, when only "grab" is taken as the preceding text, the candidate words having a binary relationship with "grab" are likely to include "crab", "thief", etc., and the binary relationship score of "grab | thief" may even be higher than that of "grab | crab". If "go to the sea and grab" as a whole is taken as the preceding text, "go to the sea and grab crab" is obviously more reasonable than "go to the sea and grab thief", and its binary relationship score is likely to be higher, so that the candidate "crab" can be ranked first.
The embodiment of the invention can acquire the first vector sequence corresponding to the input string and the second vector sequence corresponding to the preceding and/or following text of the input string, calculate the binary relationship score between the first vector sequence and the second vector sequence according to the preset multivariate relationship calculation rule, and adjust the ranking of the candidates corresponding to the input string according to the binary relationship score.
Method embodiment five
In this embodiment, based on the second embodiment, a process of associating with an established word vector library in an input process is described in detail. Referring to fig. 5, a flowchart illustrating steps of a fifth embodiment of the input method of the present invention is shown, which may specifically include the following steps:
step 501, acquiring association candidate items corresponding to the input according to the context of the input string;
step 502, obtaining a third vector sequence corresponding to the association candidate item;
step 503, calculating a fourth multivariate relationship score between the second vector sequence and the third vector sequence, and displaying the association candidate items in a ranking manner according to the fourth multivariate relationship score.
In an application example of the present invention, assume that the current input string received from the user is "p" and the preceding text is "go to sea scratch"; the most likely candidates can be associated from the input string "p" entered so far and the corresponding preceding text. Specifically, first, the second meta-word sequence corresponding to the preceding text is obtained: "go | seaside | grab". By querying the word vector library, the second vector sequence (V1, V2, V3) corresponding to the preceding text is obtained. Then, the target candidate set, which may be the system word library, the user word library and the like, is traversed to find the association candidates matching the input string "p". The vector corresponding to each association candidate is looked up in the word vector library to obtain the corresponding third vector sequence, denoted U_i. Finally, (V1, V2, V3) and U_i are input to the preset model, the binary relationship score between (V1, V2, V3) and U_i is calculated, and the association candidates are ranked using these binary relationship scores.
In practical applications, when the preceding text is "go to sea scratch" and the user has not yet entered any input string, the association function can likewise be used to obtain the most likely candidates following "go to sea scratch".
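The association flow can be sketched as a prefix filter followed by scoring against the context. Everything named here is an assumption made for illustration: the function `associate`, the candidate-set layout (word mapped to a pinyin string and a vector), the toy vectors, and the use of an inner product with the summed context vectors in place of the unspecified preset model.

```python
from typing import Dict, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def associate(prefix: str,
              context_vectors: Sequence[Sequence[float]],
              candidate_set: Dict[str, Tuple[str, Sequence[float]]]) -> List[str]:
    """Keep entries whose pinyin starts with `prefix`, ranked against the context."""
    context = [sum(column) for column in zip(*context_vectors)]
    scored = []
    for word, (pinyin, vector) in candidate_set.items():
        if pinyin.startswith(prefix):
            scored.append((dot(context, vector), word))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [word for _, word in scored]
```

Passing an empty prefix ranks every entry of the candidate set, which corresponds to the case where the user has not yet typed anything.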
During the user's input process, the binary relationship score between the preceding text and each association candidate can be obtained from the second vector sequence corresponding to the preceding text and the third vector sequence corresponding to the association candidate, so that the association candidates can be determined according to the scores. The embodiment of the invention can thus take multivariate relationships beyond 2-grams into account during association, making the obtained association candidates more accurate.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Device embodiment
Referring to fig. 6, a block diagram of an embodiment of an input device according to the present invention is shown, which may specifically include the following modules:
a first vector sequence obtaining module 601, configured to obtain a first vector sequence corresponding to an input string;
a first multivariate relation calculation module 602, configured to calculate a first multivariate relation score corresponding to the first vector sequence according to a preset multivariate relation calculation rule; and
a candidate determining module 603, configured to determine a candidate corresponding to the input string according to the first multivariate relationship score.
In an optional embodiment of the present invention, the first vector sequence obtaining module 601 may specifically include:
the first segmentation sub-module is used for segmenting an input string of a user according to the meta word to obtain a first character segmentation result;
the first word sequence obtaining sub-module is used for obtaining a first word sequence corresponding to the first character segmentation result;
the first query submodule is used for querying the established word vector library to obtain a vector corresponding to each element word in the first element word sequence;
and the first vector sequence determining submodule is used for sequentially connecting vectors corresponding to each element word in the first element word sequence in series to obtain a first vector sequence corresponding to the input string.
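The query and concatenation performed by these sub-modules can be sketched as follows, assuming the input string has already been segmented into meta-words. The function name, the word-to-number mapping and the shortened vectors are hypothetical, and the sketch reads "connecting in series" as flat end-to-end concatenation.

```python
from typing import Dict, List, Sequence


def first_vector_sequence(meta_words: Sequence[str],
                          word_numbers: Dict[str, int],
                          vector_library: Dict[int, List[float]]) -> List[float]:
    """Look up each meta-word's vector and concatenate the vectors in order."""
    sequence: List[float] = []
    for word in meta_words:
        number = word_numbers[word]  # meta-word -> meta-word number
        sequence.extend(vector_library[number])
    return sequence
```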
In another alternative embodiment of the present invention, the word vector library may be created by:
acquiring a meta word number corresponding to a meta word in a word bank;
generating corresponding vectors for the element words in the word stock;
and establishing a word vector library according to the mapping relation between the meta word number and the vector.
In yet another optional embodiment of the present invention, the apparatus may further comprise:
the system word sequence acquisition module is used for acquiring a system word sequence corresponding to the input string;
the second multivariate relation score determining module is used for determining a second multivariate relation score corresponding to the system word sequence;
the candidate determining module 603 may specifically include:
and the candidate determining submodule is used for determining the candidate corresponding to the input string according to the ranking of the first multivariate relationship score and the second multivariate relationship score.
In yet another optional embodiment of the present invention, the system word sequence obtaining module may specifically include:
the second segmentation submodule is used for segmenting the input string according to the system words to obtain a second character segmentation result;
and the system word sequence determining submodule is used for acquiring the system word sequence corresponding to the second character segmentation result.
In yet another optional embodiment of the present invention, the second multivariate relationship score determining module may specifically include:
the unary word-forming score calculation sub-module is used for inquiring in a system word library to obtain the word frequency corresponding to each system word in the system word sequence and calculating to obtain an unary word-forming score corresponding to the system word sequence;
the binary word-forming score calculation submodule is used for calculating a binary word-forming score corresponding to the system word sequence according to the binary relation when the binary relation exists in the system word sequence;
and the second multivariate relation score calculating submodule is used for determining a second multivariate relation score corresponding to the system word sequence according to the unary word-forming score and the binary word-forming score.
In yet another alternative embodiment of the present invention, the apparatus may further include:
the second word sequence acquisition module is used for acquiring a second word sequence corresponding to the upper text and/or the lower text of the input string;
the second query module is used for querying the established word vector library to obtain vectors corresponding to each element word in the second element word sequence;
the second vector sequence determining module is used for sequentially connecting vectors corresponding to each element word in the second element word sequence in series to obtain a second vector sequence;
and the ranking adjusting module is used for calculating a third multivariate relation score between the first vector sequence and the second vector sequence and adjusting the ranking of the candidate items corresponding to the input strings according to the third multivariate relation score.
In yet another alternative embodiment of the present invention, the apparatus may further include:
the association candidate item acquisition module is used for acquiring an association candidate item corresponding to the input according to the upper text and/or the lower text of the input string;
a third vector sequence obtaining module, configured to obtain a third vector sequence corresponding to the association candidate item;
and the association candidate sorting module is used for calculating a fourth multivariate relation score between the second vector sequence and the third vector sequence and sorting and displaying the association candidate items according to the fourth multivariate relation score.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
FIG. 7 is a block diagram illustrating an apparatus 800 for input according to an example embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 7, the apparatus 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power components 806 provide power to the various components of device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the apparatus 800. For example, the sensor assembly 814 may detect the open/closed state of the apparatus 800 and the relative positioning of components, such as the display and keypad of the apparatus 800. The sensor assembly 814 may also detect a change in position of the apparatus 800 or of a component of the apparatus 800, the presence or absence of user contact with the apparatus 800, the orientation or acceleration/deceleration of the apparatus 800, and a change in temperature of the apparatus 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer readable storage medium having instructions therein, which when executed by a processor of a mobile terminal, enable the mobile terminal to perform an input method, the method comprising: acquiring a first vector sequence corresponding to an input string; calculating to obtain a first multivariate relation score corresponding to the first vector sequence according to a preset multivariate relation calculation rule; and determining a candidate item corresponding to the input string according to the first multivariate relation score.
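The three steps of this input method (acquire a vector sequence, score it, determine candidates) can be illustrated with a minimal sketch. The contents of the word vector library, the romanized placeholder words, and the scoring rule used here (mean cosine similarity of adjacent vectors) are all assumptions made for illustration only; the preset multivariate relation calculation rule itself is the one described in the method embodiments and is not reproduced by this sketch.

```python
import math

# Hypothetical word vector library: word -> toy 2-dimensional vector.
WORD_VECTORS = {
    "ni": [1.0, 0.0],
    "hao": [0.8, 0.6],
    "ma": [0.0, 1.0],
}


def vector_sequence(words, library):
    """Look up the vector of each word and join the vectors in order."""
    return [library[w] for w in words]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def relation_score(vectors):
    """Stand-in scoring rule (an assumption): mean similarity of neighbours."""
    if len(vectors) < 2:
        return 0.0
    sims = [cosine(a, b) for a, b in zip(vectors, vectors[1:])]
    return sum(sims) / len(sims)


def rank_candidates(candidates, library):
    """Order candidate word sequences from highest to lowest score."""
    scored = [(relation_score(vector_sequence(c, library)), c) for c in candidates]
    return [c for _, c in sorted(scored, key=lambda item: item[0], reverse=True)]


ranked = rank_candidates([["ni", "ma"], ["ni", "hao"]], WORD_VECTORS)
```

With these toy vectors, the sequence whose adjacent vectors are more similar is ranked first; any other scoring rule could be substituted for `relation_score` without changing the surrounding steps.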
Fig. 8 is a schematic structural diagram of a server in an embodiment of the present invention. The server 1900 may vary widely by configuration or performance and may include one or more Central Processing Units (CPUs) 1922 (e.g., one or more processors) and memory 1932, one or more storage media 1930 (e.g., one or more mass storage devices) storing applications 1942 or data 1944. Memory 1932 and storage medium 1930 can be, among other things, transient or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instructions operating on a server. Still further, a central processor 1922 may be provided in communication with the storage medium 1930 to execute a series of instruction operations in the storage medium 1930 on the server 1900.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input-output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is only limited by the appended claims.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
The input method, the input device and the device for inputting provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and the implementation of the present invention, and the description of the above examples is only intended to help in understanding the method of the present invention and its core idea. Meanwhile, for a person skilled in the art, the specific embodiments and the scope of application may vary according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (22)

1. An input method, comprising:
acquiring a first vector sequence corresponding to an input string;
calculating to obtain a first multivariate relation score corresponding to the first vector sequence according to a preset multivariate relation calculation rule;
determining a candidate item corresponding to the input string according to the first multivariate relation score;
the step of obtaining a first vector sequence corresponding to the input string includes:
segmenting an input string of a user according to the element words to obtain a first character segmentation result;
acquiring a first element word sequence corresponding to the first character segmentation result;
inquiring an established word vector library to obtain a vector corresponding to each element word in the first element word sequence;
and sequentially connecting vectors corresponding to each element word in the first element word sequence in series to obtain a first vector sequence corresponding to the input string.
2. The method of claim 1, wherein the library of word vectors is created by:
acquiring an element word number corresponding to an element word in a word library;
generating corresponding vectors for the element words in the word library;
and establishing a word vector library according to the mapping relation between the element word number and the vector.
3. The method of claim 1, further comprising:
acquiring a system word sequence corresponding to the input string;
determining a second multivariate relation score corresponding to the system word sequence;
the step of determining the candidate item corresponding to the input string according to the first multivariate relationship score includes:
and determining the candidate item corresponding to the input string according to the ranking of the first multivariate relation score and the second multivariate relation score.
4. The method according to claim 3, wherein the step of obtaining the system word sequence corresponding to the input string comprises:
segmenting the input string according to the system words to obtain a second character segmentation result;
and acquiring a system word sequence corresponding to the second character segmentation result.
5. The method of claim 3, wherein the step of determining the second multivariate relationship score corresponding to the system word sequence comprises:
searching in a system word library to obtain the word frequency corresponding to each system word in the system word sequence, and calculating to obtain a unary word-forming score corresponding to the system word sequence;
when a binary relation exists in the system word sequence, calculating to obtain a binary word group score corresponding to the system word sequence according to the binary relation;
and determining a second multivariate relation score corresponding to the system word sequence according to the unary word-forming score and the binary word group score.
6. The method of claim 1, further comprising:
acquiring a second element word sequence corresponding to the upper text and/or the lower text of the input string;
inquiring an established word vector library to obtain a vector corresponding to each element word in the second element word sequence;
sequentially connecting vectors corresponding to each element word in the second element word sequence in series to obtain a second vector sequence;
and calculating a third multivariate relation score between the first vector sequence and the second vector sequence, and adjusting the ranking of the candidate items corresponding to the input string according to the third multivariate relation score.
7. The method of claim 6, further comprising:
acquiring association candidate items corresponding to the input according to the upper text and/or the lower text of the input string;
acquiring a third vector sequence corresponding to the association candidate item;
and calculating a fourth multivariate relation score between the second vector sequence and the third vector sequence, and performing ranking display on the association candidate items according to the fourth multivariate relation score.
8. An input device, comprising:
the first vector sequence acquisition module is used for acquiring a first vector sequence corresponding to the input string;
the first multivariate relation calculation module is used for calculating to obtain a first multivariate relation score corresponding to the first vector sequence according to a preset multivariate relation calculation rule; and
the candidate item determining module is used for determining a candidate item corresponding to the input string according to the first multivariate relationship score;
the first vector sequence acquisition module comprises:
the first segmentation submodule is used for segmenting an input string of a user according to the element words to obtain a first character segmentation result;
the first element word sequence obtaining submodule is used for obtaining a first element word sequence corresponding to the first character segmentation result;
the first query submodule is used for querying the established word vector library to obtain a vector corresponding to each element word in the first element word sequence;
and the first vector sequence determining submodule is used for sequentially connecting vectors corresponding to each element word in the first element word sequence in series to obtain a first vector sequence corresponding to the input string.
9. The apparatus of claim 8, further comprising:
the establishing module is used for establishing the word vector library;
the establishing module comprises:
the element word number acquisition module is used for acquiring an element word number corresponding to an element word in a word library;
the vector generation module is used for generating corresponding vectors for the element words in the word library;
and the library establishing module is used for establishing a word vector library according to the mapping relation between the element word number and the vector.
10. The apparatus of claim 8, further comprising:
the system word sequence acquisition module is used for acquiring a system word sequence corresponding to the input string;
the second multivariate relation score determining module is used for determining a second multivariate relation score corresponding to the system word sequence;
the candidate determination module includes:
and the candidate determining submodule is used for determining the candidate corresponding to the input string according to the ranking of the first multivariate relationship score and the second multivariate relationship score.
11. The apparatus of claim 10, wherein the system word sequence obtaining module comprises:
the second segmentation submodule is used for segmenting the input string according to the system words to obtain a second character segmentation result;
and the system word sequence determining submodule is used for acquiring the system word sequence corresponding to the second character segmentation result.
12. The apparatus of claim 10, wherein the second multivariate relationship score determination module comprises:
the unary word-forming score calculation submodule is used for searching in a system word library to obtain the word frequency corresponding to each system word in the system word sequence and calculating to obtain a unary word-forming score corresponding to the system word sequence;
the binary word group score calculation submodule is used for calculating and obtaining a binary word group score corresponding to the system word sequence according to the binary relation when the binary relation exists in the system word sequence;
and the second multivariate relation score calculating submodule is used for determining a second multivariate relation score corresponding to the system word sequence according to the unary word-forming score and the binary word group score.
13. The apparatus of claim 8, further comprising:
the second element word sequence acquisition module is used for acquiring a second element word sequence corresponding to the upper text and/or the lower text of the input string;
the second query module is used for querying the established word vector library to obtain vectors corresponding to each element word in the second element word sequence;
the second vector sequence determining module is used for sequentially connecting vectors corresponding to each element word in the second element word sequence in series to obtain a second vector sequence;
and the ranking adjusting module is used for calculating a third multivariate relation score between the first vector sequence and the second vector sequence and adjusting the ranking of the candidate items corresponding to the input strings according to the third multivariate relation score.
14. The apparatus of claim 13, further comprising:
the association candidate item acquisition module is used for acquiring an association candidate item corresponding to the input according to the upper text and/or the lower text of the input string;
a third vector sequence obtaining module, configured to obtain a third vector sequence corresponding to the association candidate item;
and the association candidate sorting module is used for calculating a fourth multivariate relation score between the second vector sequence and the third vector sequence and sorting and displaying the association candidate items according to the fourth multivariate relation score.
15. An apparatus for input, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for:
acquiring a first vector sequence corresponding to an input string;
calculating to obtain a first multivariate relation score corresponding to the first vector sequence according to a preset multivariate relation calculation rule;
determining a candidate item corresponding to the input string according to the first multivariate relation score;
the obtaining of the first vector sequence corresponding to the input string includes:
segmenting an input string of a user according to the element words to obtain a first character segmentation result;
acquiring a first element word sequence corresponding to the first character segmentation result;
inquiring an established word vector library to obtain a vector corresponding to each element word in the first element word sequence;
and sequentially connecting vectors corresponding to each element word in the first element word sequence in series to obtain a first vector sequence corresponding to the input string.
16. The apparatus of claim 15, wherein the library of word vectors is created by:
acquiring an element word number corresponding to an element word in a word library;
generating corresponding vectors for the element words in the word library;
and establishing a word vector library according to the mapping relation between the element word number and the vector.
17. The apparatus of claim 15, wherein the one or more programs configured to be executed by the one or more processors further comprise instructions for:
acquiring a system word sequence corresponding to the input string;
determining a second multivariate relation score corresponding to the system word sequence;
determining a candidate item corresponding to the input string according to the first multivariate relationship score, including:
and determining the candidate item corresponding to the input string according to the ranking of the first multivariate relation score and the second multivariate relation score.
18. The apparatus of claim 17, wherein the obtaining of the system word sequence corresponding to the input string comprises:
segmenting the input string according to the system words to obtain a second character segmentation result;
and acquiring a system word sequence corresponding to the second character segmentation result.
19. The apparatus of claim 17, wherein the determining of a second multivariate relationship score corresponding to the system word sequence comprises:
searching in a system word library to obtain the word frequency corresponding to each system word in the system word sequence, and calculating to obtain a unary word-forming score corresponding to the system word sequence;
when a binary relation exists in the system word sequence, calculating to obtain a binary word group score corresponding to the system word sequence according to the binary relation;
and determining a second multivariate relation score corresponding to the system word sequence according to the unary word-forming score and the binary word group score.
20. The apparatus of claim 15, wherein the one or more programs configured to be executed by the one or more processors further comprise instructions for:
acquiring a second element word sequence corresponding to the upper text and/or the lower text of the input string;
inquiring an established word vector library to obtain a vector corresponding to each element word in the second element word sequence;
sequentially connecting vectors corresponding to each element word in the second element word sequence in series to obtain a second vector sequence;
and calculating a third multivariate relation score between the first vector sequence and the second vector sequence, and adjusting the ranking of the candidate items corresponding to the input string according to the third multivariate relation score.
21. The apparatus of claim 20, wherein the one or more programs configured to be executed by the one or more processors further comprise instructions for:
acquiring association candidate items corresponding to the input according to the upper text and/or the lower text of the input string;
acquiring a third vector sequence corresponding to the association candidate item;
and calculating a fourth multivariate relation score between the second vector sequence and the third vector sequence, and performing ranking display on the association candidate items according to the fourth multivariate relation score.
22. A machine-readable medium having stored thereon instructions, which when executed by one or more processors, cause an apparatus to perform an input method as recited in one or more of claims 1-7.
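As a non-limiting illustration of the system word scoring recited in claims 5, 12 and 19, the following sketch combines a unary score derived from word frequencies with a binary score that applies only when a binary relation is found. The lexicon contents, the English placeholder words, the log-frequency unary score, the `log1p` bonus for a known pair, and the weighted-sum combination are all assumptions chosen for illustration; the claims do not prescribe these formulas.

```python
import math

# Hypothetical system word library: word -> frequency.
UNIGRAM_FREQ = {"today": 50, "weather": 30, "whether": 20}
# Hypothetical binary relations: (previous word, word) -> co-occurrence count.
BIGRAM_FREQ = {("today", "weather"): 10}
TOTAL = sum(UNIGRAM_FREQ.values())


def unary_score(words):
    """Sum of log relative frequencies of the words (assumed formula)."""
    return sum(math.log(UNIGRAM_FREQ[w] / TOTAL) for w in words)


def binary_score(words):
    """Bonus for each adjacent pair that has a recorded binary relation."""
    score = 0.0
    for prev, cur in zip(words, words[1:]):
        count = BIGRAM_FREQ.get((prev, cur))
        if count:
            score += math.log1p(count)
    return score


def second_relation_score(words, bigram_weight=1.0):
    """Combine both scores; the weighted sum is an assumption."""
    return unary_score(words) + bigram_weight * binary_score(words)


with_relation = second_relation_score(["today", "weather"])
without_relation = second_relation_score(["today", "whether"])
```

In this toy lexicon the sequence containing a recorded binary relation receives the higher score, so it would be ranked ahead of the sequence that relies on word frequencies alone.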
CN201610350134.5A 2016-05-24 2016-05-24 Input method, input device and input device Active CN107422872B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610350134.5A CN107422872B (en) 2016-05-24 2016-05-24 Input method, input device and input device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610350134.5A CN107422872B (en) 2016-05-24 2016-05-24 Input method, input device and input device

Publications (2)

Publication Number Publication Date
CN107422872A CN107422872A (en) 2017-12-01
CN107422872B true CN107422872B (en) 2021-11-30

Family

ID=60422811

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610350134.5A Active CN107422872B (en) 2016-05-24 2016-05-24 Input method, input device and input device

Country Status (1)

Country Link
CN (1) CN107422872B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110110292B (en) * 2018-01-29 2023-11-14 北京搜狗科技发展有限公司 Data processing method and device for data processing
CN110244861B (en) * 2018-03-09 2024-02-02 北京搜狗科技发展有限公司 Data processing method and device
CN111752397B (en) * 2019-03-29 2024-06-04 北京搜狗科技发展有限公司 Candidate word determining method and device
CN112684909B (en) * 2020-12-29 2024-05-31 科大讯飞股份有限公司 Input method association effect evaluation method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101013443A (en) * 2007-02-13 2007-08-08 北京搜狗科技发展有限公司 Intelligent word input method and input method system and updating method thereof
CN101644961A (en) * 2009-08-14 2010-02-10 北京搜狗科技发展有限公司 Encoded string sequencing method, device and character input method and device
CN101697109A (en) * 2009-10-26 2010-04-21 北京搜狗科技发展有限公司 Method and system for acquiring candidates of input method
CN102455845A (en) * 2010-10-14 2012-05-16 北京搜狗科技发展有限公司 Character entry method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6934683B2 (en) * 2001-01-31 2005-08-23 Microsoft Corporation Disambiguation language model

Also Published As

Publication number Publication date
CN107422872A (en) 2017-12-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant