CN108009150B - Input method and device based on recurrent neural network - Google Patents


Info

Publication number
CN108009150B
CN108009150B (application CN201711217459.7A)
Authority
CN
China
Prior art keywords
word
matrix
vector
sparse representation
subscript
Prior art date
Legal status
Active
Application number
CN201711217459.7A
Other languages
Chinese (zh)
Other versions
CN108009150A (en)
Inventor
阮翀
Current Assignee
Beijing Xinmei Hutong Technology Co ltd
Original Assignee
Beijing Xinmei Hutong Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Xinmei Hutong Technology Co ltd
Priority to CN201711217459.7A
Publication of CN108009150A
Application granted
Publication of CN108009150B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks

Abstract

The invention discloses an input method and device based on a recurrent neural network. A word vector matrix is decomposed into an overcomplete basis matrix, a sparse representation subscript matrix, and a sparse representation coefficient matrix, from which the word vector of any word can be recovered. When candidate words need to be recommended, the word vectors of the required words are obtained from these three matrices, and the candidate words are then obtained from the word vectors and the recurrent neural network model. Because the combined capacity of the overcomplete basis matrix, the sparse representation subscript matrix, and the sparse representation coefficient matrix is smaller than that of the word vector matrix, the scheme occupies less space on the terminal device and requires less computation, reducing the demands on the storage and computing capacity of the terminal device, so the method and device can be applied to a wide range of terminal devices.

Description

Input method and device based on recurrent neural network
Technical Field
The present application relates to the field of information input, and in particular, to an input method and device based on a recurrent neural network.
Background
With the development of science and technology, various terminal devices, such as mobile phones, smart televisions, and computers, have emerged to meet users' work and entertainment needs. While using a terminal device, the user sometimes needs to input information so that the terminal device can perform the corresponding operations. To improve input efficiency and optimize the user experience, existing input methods usually provide a candidate word recommendation function, which guesses the next word the user wants to input from the words input so far and recommends that word to the user as a candidate word.
At present, to implement candidate word recommendation, most input methods employ an n-gram language model: the occurrence frequencies of n-gram phrases are counted in advance over a large-scale corpus to build the model, and the next word to recommend (i.e., the candidate word) is then determined from the previous n-1 words the user has input and the n-gram language model. However, because of the limited model size of n-gram language models and the data sparseness problem, this scheme can only guess candidate words from the previous n-1 (or fewer) input words and cannot take longer context into account, so the recommended candidate words can be inaccurate. For example, if the user inputs "in two days" and the n-gram model is a trigram model, the candidate word is determined from the "two days" just input; since large-scale corpus statistics identify "two days ago" as a common phrase, the trigram model may guess that the candidate word is "ago". However, "in two days ago" does not conform to English grammar, so "ago" is not a proper candidate word, and the recommendation is inaccurate.
To address the inaccuracy of recommended candidate words, an input method using a recurrent neural network can be adopted. In this method, a recurrent neural network model is built, each word is represented as a multi-dimensional (usually several-hundred-dimensional) vector, and the vectors of all words form a word vector matrix: if the number of words used to construct it is n and the word vector dimension of each word is e, the word vector matrix is an n × e matrix. When a candidate word needs to be recommended, the word vectors of the historical words the user has input and of the current word are obtained from the word vector matrix and fed into the recurrent neural network, which outputs the candidate word. Because the candidate word is inferred from the word vectors of both the historical words and the current input word, the accuracy of the recommendation is improved.
However, in the course of the research behind the present application, the inventor found that when this recurrent-neural-network input method is applied, each word is usually represented as a vector of several hundred dimensions (i.e., e is several hundred), and the vocabulary often contains tens of thousands of words (i.e., n is tens of thousands). The word vector matrix therefore contains tens of thousands of several-hundred-dimensional vectors; its capacity is extremely large, and the amount of computation needed to determine candidate words from the word vectors is large. This places high demands on the storage and computing capacity of the terminal device and makes the method difficult to apply on some terminal devices (e.g., mobile phones).
Disclosure of Invention
The embodiment of the invention discloses an input method and device based on a recurrent neural network, aiming to solve the prior-art problems that the capacity of the word vector matrix is extremely large, the amount of computation when determining candidate words from word vectors is large, and the demands on the storage and computing capacity of the terminal device are high.
In a first aspect of the present invention, an input method based on a recurrent neural network is disclosed, which includes:
obtaining a word vector matrix, wherein the word vector matrix is an n × e matrix, n is the number of words, and e is the word vector dimension of each word;
obtaining the word vectors of b words in the word vector matrix and constructing an overcomplete basis matrix from the word vectors of the b words, wherein the overcomplete basis matrix is a b × e matrix, each basis vector in the overcomplete basis matrix is the word vector of one of the b words, and b is a positive integer less than n;
constructing a sparse representation subscript matrix and a sparse representation coefficient matrix from the word vector of each word and the overcomplete basis matrix, wherein the sparse representation subscript matrix and the sparse representation coefficient matrix are both n × s matrices and s is a preset sparse effect parameter;
when a candidate word needs to be recommended, obtaining the target basis vectors of the required word according to the overcomplete basis matrix, the sparse representation subscript matrix, and the sparse representation coefficient matrix, determining the word vector of the required word from the target basis vectors, and obtaining the candidate word according to the word vector of the required word and the recurrent neural network model;
wherein the target basis vectors of each word are used to combine into the word vector of that word; the sparse representation subscript matrix contains, for each word, a subscript vector giving the positions in the overcomplete basis matrix of the target basis vectors required to form that word's word vector; and the sparse representation coefficient matrix contains, for each word, a coefficient vector giving the coefficients corresponding to those target basis vectors.
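As an illustration of the data layout described above (not part of the claims), the following is a minimal numpy sketch of the three matrices and of how a word vector is recovered from them; the dimensions follow the 20000 × 400 example used later in the description, and all names and the stand-in data are illustrative:

    import numpy as np

    n, e, b, s = 20000, 400, 2000, 10       # example sizes from the description

    E = np.random.randn(n, e).astype(np.float32)  # word vector matrix (n x e), stand-in data
    B = E[:b].copy()                         # overcomplete basis matrix (b x e)
    D = np.zeros((n, s), dtype=np.int32)     # sparse representation subscript matrix (n x s)
    C = np.zeros((n, s), dtype=np.float32)   # sparse representation coefficient matrix (n x s)

    def reconstruct_word_vector(i):
        """Weighted combination of word i's s target basis vectors."""
        return C[i] @ B[D[i]]                # (s,) @ (s, e) -> (e,)

Only B, D, and C need to be stored on the device; the full matrix E can be discarded once D and C have been computed.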
Optionally, obtaining the word vectors of b words in the word vector matrix includes:
sorting the words by word frequency;
selecting the b words with the highest word frequency according to the sorting result;
and obtaining the word vectors of the b words by querying the word vector matrix.
Optionally, constructing a sparse representation subscript matrix and a sparse representation coefficient matrix from the word vector of each word and the overcomplete basis matrix includes:
letting the word vector of one of the words be $x_i$ and the overcomplete basis matrix be $B$, and creating the following LASSO regression problem:

$$\min_{w_i}\ \left\| B^{\top} w_i - x_i \right\|_2^2 + \lambda \left\| w_i \right\|_1$$

wherein $\lambda$ is a real-valued hyper-parameter and $w_i$ is a vector parameter whose dimension equals the number of rows of the overcomplete basis matrix;
solving the LASSO regression problem to obtain, when the number of non-zero components in $w_i$ is s, the subscripts of the non-zero components in $w_i$ and the values of the non-zero components;
determining the sparse representation subscript vector corresponding to the word from the subscripts of the non-zero components in $w_i$, and determining the sparse representation coefficient vector corresponding to the word from the values of the non-zero components;
and constructing the sparse representation subscript matrix from the sparse representation subscript vectors corresponding to the words, and constructing the sparse representation coefficient matrix from the sparse representation coefficient vectors corresponding to the words.
Optionally, solving the LASSO regression problem to obtain, when the number of non-zero components in $w_i$ is s, the subscripts of the non-zero components in $w_i$ and the values of the non-zero components includes:
41) setting $\lambda$ to a preset initial value;
42) substituting the value of $\lambda$ into the LASSO regression problem and solving it to obtain $w_i$;
43) obtaining the number of non-zero components in $w_i$ and judging whether it equals s; if so, obtaining the subscripts of the non-zero components in $w_i$ and the values of the non-zero components; if not, performing step 44);
44) if the number of non-zero components in $w_i$ is greater than s, enlarging $\lambda$ by a preset first adjustment amplitude; if the number of non-zero components in $w_i$ is less than s, reducing $\lambda$ by a preset second adjustment amplitude; then returning to step 42).
Optionally, when a candidate word needs to be recommended, obtaining the target basis vectors of the required word according to the overcomplete basis matrix, the sparse representation subscript matrix, and the sparse representation coefficient matrix, and determining the word vector of the required word from the target basis vectors, includes:
searching the sparse representation subscript matrix to obtain the sparse representation subscript vector corresponding to the required word, wherein the sparse representation subscript vector indicates the positions of the target basis vectors of the required word in the overcomplete basis matrix;
searching the sparse representation coefficient matrix to obtain the sparse representation coefficient vector corresponding to the required word, wherein the sparse representation coefficient vector indicates the coefficients corresponding to the target basis vectors of the required word;
searching the overcomplete basis matrix according to the sparse representation subscript vector to obtain the target basis vectors of the required word;
and performing a weighted combination of the target basis vectors with their corresponding coefficients to calculate the word vector of the required word.
In a second aspect of the present invention, an input device based on a recurrent neural network is disclosed, comprising:
a first matrix obtaining module, used for obtaining a word vector matrix, wherein the word vector matrix is an n × e matrix, n is the number of words, and e is the word vector dimension of each word;
a second matrix obtaining module, configured to obtain the word vectors of b words in the word vector matrix and construct an overcomplete basis matrix from the word vectors of the b words, wherein the overcomplete basis matrix is a b × e matrix, each basis vector in the overcomplete basis matrix is the word vector of one of the b words, and b is a positive integer less than n;
a third matrix obtaining module, used for constructing a sparse representation subscript matrix and a sparse representation coefficient matrix from the word vector of each word and the overcomplete basis matrix, wherein the sparse representation subscript matrix and the sparse representation coefficient matrix are both n × s matrices and s is a preset sparse effect parameter;
a candidate word obtaining module, used for obtaining, when a candidate word needs to be recommended, the target basis vectors of the required word according to the overcomplete basis matrix, the sparse representation subscript matrix, and the sparse representation coefficient matrix, determining the word vector of the required word from the target basis vectors, and obtaining the candidate word according to the word vector of the required word and the recurrent neural network model;
wherein the target basis vectors of each word are used to combine into the word vector of that word; the sparse representation subscript matrix contains, for each word, a subscript vector giving the positions in the overcomplete basis matrix of the target basis vectors required to form that word's word vector; and the sparse representation coefficient matrix contains, for each word, a coefficient vector giving the coefficients corresponding to those target basis vectors.
Optionally, the second matrix obtaining module includes:
the word sorting unit is used for sorting the words according to the word frequency;
the word selecting unit is used for selecting the b words with the highest word frequency according to the sorting result;
and the word vector acquisition unit is used for acquiring the word vectors of the b words by querying the word vector matrix.
Optionally, the third matrix obtaining module includes:
a regression problem creation unit, used for letting the word vector of one of the words be $x_i$ and the overcomplete basis matrix be $B$, and creating the following LASSO regression problem:

$$\min_{w_i}\ \left\| B^{\top} w_i - x_i \right\|_2^2 + \lambda \left\| w_i \right\|_1$$

wherein $\lambda$ is a real-valued hyper-parameter and $w_i$ is a vector parameter whose dimension equals the number of rows of the overcomplete basis matrix;
a regression problem solving unit, used for solving the LASSO regression problem to obtain, when the number of non-zero components in $w_i$ is s, the subscripts of the non-zero components in $w_i$ and the values of the non-zero components;
a vector determination unit, used for determining the sparse representation subscript vector corresponding to the word from the subscripts of the non-zero components in $w_i$, and determining the sparse representation coefficient vector corresponding to the word from the values of the non-zero components;
and a matrix construction unit, used for constructing the sparse representation subscript matrix from the sparse representation subscript vectors corresponding to the words, and constructing the sparse representation coefficient matrix from the sparse representation coefficient vectors corresponding to the words.
Optionally, the regression problem solving unit includes: an initial value determination subunit, a regression problem calculation subunit, a non-zero component counting subunit, and a value adjustment subunit;
the initial value determination subunit is used for setting $\lambda$ to a preset initial value;
the regression problem calculation subunit is used for substituting the value of $\lambda$ into the LASSO regression problem and solving it to obtain $w_i$;
the non-zero component counting subunit is used for obtaining the number of non-zero components in $w_i$ and judging whether it equals s; if so, obtaining the subscripts of the non-zero components in $w_i$ and the values of the non-zero components; if not, triggering the value adjustment subunit to perform the corresponding operation;
the value adjustment subunit is used for enlarging $\lambda$ by a preset first adjustment amplitude if the number of non-zero components in $w_i$ is greater than s, or reducing $\lambda$ by a preset second adjustment amplitude if the number of non-zero components in $w_i$ is less than s, and then triggering the regression problem calculation subunit to perform the corresponding operation.
Optionally, the candidate word obtaining module includes:
a first searching unit, configured to search the sparse representation subscript matrix and obtain the sparse representation subscript vector corresponding to the required word, wherein the sparse representation subscript vector indicates the positions of the target basis vectors of the required word in the overcomplete basis matrix;
a second searching unit, used for searching the sparse representation coefficient matrix and obtaining the sparse representation coefficient vector corresponding to the required word, wherein the sparse representation coefficient vector indicates the coefficients corresponding to the target basis vectors of the required word;
a target basis vector obtaining unit, configured to search the overcomplete basis matrix according to the sparse representation subscript vector and obtain the target basis vectors of the required word;
and a weighted combination unit, used for performing a weighted combination of the target basis vectors with their corresponding coefficients to calculate the word vector of the required word.
The embodiment of the invention discloses an input method and device based on a recurrent neural network. After the word vector matrix is obtained, an overcomplete basis matrix is constructed from it, and a sparse representation subscript matrix and a sparse representation coefficient matrix are constructed from the overcomplete basis matrix. When candidate words need to be recommended, the word vectors of the required words can be obtained from the constructed overcomplete basis matrix, sparse representation subscript matrix, and sparse representation coefficient matrix, and the candidate words are then obtained from those word vectors and the recurrent neural network model.
That is to say, according to the scheme disclosed in the embodiment of the present invention, the word vector matrix is decomposed into an overcomplete basis matrix, a sparse representation subscript matrix, and a sparse representation coefficient matrix, and the word vector of each word can be obtained from these three matrices. Because the combined capacity of these three matrices is smaller than that of the word vector matrix, compared with the prior art the method disclosed in the embodiment of the invention occupies less space on the terminal device and requires less computation, reducing the demands on the storage and computing capacity of the terminal device, so it can be applied to a wide range of terminal devices (e.g., mobile phones).
Furthermore, because the overcomplete basis matrix, the sparse representation subscript matrix, and the sparse representation coefficient matrix have a small combined capacity, the installation package of the terminal device application is smaller, which facilitates distribution and downloading in application stores and thus popularization. In addition, the small amount of computation improves computational efficiency, so that text input using the scheme disclosed in the embodiment of the invention has shorter program response times and less perceptible lag on the terminal device, improving the user experience.
Drawings
In order to illustrate the technical solution of the present invention more clearly, the drawings needed in the embodiments are briefly described below; for those of ordinary skill in the art, other drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a schematic workflow diagram of an input method based on a recurrent neural network according to an embodiment of the present invention;
Fig. 2 is a schematic diagram illustrating a matrix comparison in an input method based on a recurrent neural network according to an embodiment of the present invention;
fig. 3 is a schematic view of a workflow for constructing a sparse representation subscript matrix and a sparse representation coefficient matrix in the input method based on the recurrent neural network disclosed in the embodiment of the present invention;
fig. 4 is a schematic diagram of a workflow for obtaining subscripts and values of non-zero components in an input method based on a recurrent neural network according to an embodiment of the present invention;
fig. 5 is a schematic view of a workflow for obtaining word vectors of required words in an input method based on a recurrent neural network according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an input device based on a recurrent neural network according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described below with reference to the accompanying drawings.
The embodiment of the invention discloses an input method and device based on a recurrent neural network, aiming to solve the prior-art problems that the capacity of the word vector matrix is extremely large, the amount of computation when determining candidate words from word vectors is large, and the demands on the storage and computing capacity of the terminal device are high.
The first embodiment of the invention discloses an input method based on a recurrent neural network. Referring to the workflow diagram shown in fig. 1, the method comprises the following steps:
step S11, obtaining a word vector matrix, where the word vector matrix is an n × e matrix, n is the number of words, and e is the word vector dimension of each word.
In this case, each row in the word vector matrix represents a word vector for the word corresponding to that row.
Step S12, obtaining the word vectors of b words in the word vector matrix and constructing an overcomplete basis matrix from the word vectors of the b words, where the overcomplete basis matrix is a b × e matrix, each basis vector in the overcomplete basis matrix is the word vector of one of the b words, and b is a positive integer less than n.
Here b is a positive integer less than n so that the capacity of the overcomplete basis matrix does not become too large. At the same time, b is usually greater than e: the overcomplete basis matrix contains the basis vectors corresponding to b words, and the number b of basis vectors is usually greater than the word vector dimension, ensuring that the overcomplete basis matrix contains enough basis vectors to meet the subsequent input requirements.
In practical applications, the value of b is usually set to 4 to 5 times the value of e. For example, if the size of the word vector matrix is 20000 × 400, i.e., the word vector dimension e is 400, b can be set to 2000, and the size of the overcomplete basis matrix is then 2000 × 400. Of course, b may also be set to other values, which is not limited in the embodiment of the present invention.
In the embodiment of the invention, since the number of words is usually in the tens of thousands while the word vector dimension is in the hundreds, the number of words far exceeds the word vector dimension, so the word vectors of the words can be expressed in terms of one another; this is what makes it possible to construct an overcomplete basis matrix. Each vector in the overcomplete basis matrix is a basis vector, i.e., the overcomplete basis matrix contains b basis vectors, and the word vector of each word can be formed by a weighted combination of some of these basis vectors.
Step S13, constructing a sparse representation subscript matrix and a sparse representation coefficient matrix from the word vector of each word and the overcomplete basis matrix, where the sparse representation subscript matrix and the sparse representation coefficient matrix are both n × s matrices and s is a preset sparse effect parameter.
The specific value of s is a preset positive integer; for example, s may be 10, in which case, if the size of the word vector matrix is 20000 × 400, the sizes of the sparse representation subscript matrix and the sparse representation coefficient matrix are both 20000 × 10. Of course, s may also take other values, which is not limited in the embodiment of the present invention.
The target basis vectors of each word are used to combine into the word vector of that word; the sparse representation subscript matrix contains, for each word, a subscript vector giving the positions in the overcomplete basis matrix of the target basis vectors required to form that word's word vector; and the sparse representation coefficient matrix contains, for each word, a coefficient vector giving the coefficients corresponding to those target basis vectors.
In the embodiment of the invention, the word vector of a word is obtained by a weighted combination of s basis vectors in the overcomplete basis matrix, and these s basis vectors are the target basis vectors of the word. That is, the target basis vectors of each word are used to combine into the word vector of that word.
Here s is the sparse effect parameter: for any word, s basis vectors (the target basis vectors of the word) are selected from the overcomplete basis matrix, and the word vector of the word is obtained by a weighted combination of these s basis vectors.
In this case, for a word it is not necessary to store the word vector itself, but only which basis vectors its word vector can be formed from by weighted combination. The positions of the word's target basis vectors in the overcomplete basis matrix are determined through the sparse representation subscript matrix, and the target basis vectors themselves are then determined by querying the overcomplete basis matrix. In addition, the coefficient of each target basis vector is determined through the sparse representation coefficient matrix; the coefficient is the weight of each target basis vector in the weighted combination that yields the word vector of the word.
Since each word's word vector is obtained by a weighted combination of that word's target basis vectors, n × s subscripts need to be recorded in total, and these recorded n × s subscripts form the sparse representation subscript matrix. Likewise, n × s coefficients need to be recorded in total, and these recorded n × s coefficients form the sparse representation coefficient matrix.
Each row vector of the sparse representation subscript matrix is called a subscript vector; each subscript vector gives the positions in the overcomplete basis matrix of the target basis vectors required to form the word vector of one word. For example, if the s target basis vectors of a word are located in rows a1, a2, …, as of the overcomplete basis matrix, the corresponding subscript vector may be (a1, a2, …, as).
In addition, each row vector of the sparse representation coefficient matrix is called a coefficient vector; each coefficient vector gives the coefficients corresponding to the target basis vectors required to form the word vector of one word.
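In symbols, the relationship described above can be written compactly as follows (notation added here for clarity; $d_{ij}$ and $c_{ij}$ denote the entries of word $i$'s subscript vector and coefficient vector):

$$x_i \approx \sum_{j=1}^{s} c_{ij}\, B_{d_{ij}}$$

where $B_{d_{ij}}$ denotes row $d_{ij}$ of the overcomplete basis matrix.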
Step S14, when a candidate word needs to be recommended, obtaining the target basis vectors of the required word according to the overcomplete basis matrix, the sparse representation subscript matrix, and the sparse representation coefficient matrix, determining the word vector of the required word from the target basis vectors, and obtaining the candidate word according to the word vector of the required word and the recurrent neural network model.
In this step, the word vector of the required word is obtained by a weighted combination of the target basis vectors of the required word. The required word may be the current word input by the user, a previously input historical word, and so on.
When a candidate word needs to be recommended, the target basis vectors of the required word and their corresponding coefficients are determined through the overcomplete basis matrix, the sparse representation subscript matrix, and the sparse representation coefficient matrix constructed in the above steps, and the word vector of the required word is obtained by weighted combination. The candidate word is then obtained from the word vector of the required word and the recurrent neural network: specifically, the word vector of the required word is input into the recurrent neural network model, and the model outputs the candidate word, thereby meeting the need to recommend candidate words to the user.
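Continuing the numpy sketch given after the first aspect, the following shows one way the reconstructed word vectors could feed a recurrent network to rank candidates. The patent does not fix a particular architecture, so a plain Elman-style RNN with untrained, randomly initialized weights stands in for the trained recurrent neural network model; all sizes and names are illustrative:

    rng = np.random.default_rng(0)
    h_dim = 128                                  # hidden size, illustrative

    W_xh = rng.standard_normal((h_dim, e)).astype(np.float32) * 0.01
    W_hh = rng.standard_normal((h_dim, h_dim)).astype(np.float32) * 0.01
    W_ho = rng.standard_normal((n, h_dim)).astype(np.float32) * 0.01  # scores every word

    def recommend(word_ids, top_k=3):
        """Run the RNN over the history plus current word; return top-k candidate indices."""
        h = np.zeros(h_dim, dtype=np.float32)
        for i in word_ids:
            x = reconstruct_word_vector(i)       # word vector rebuilt from B, D, C
            h = np.tanh(W_xh @ x + W_hh @ h)
        logits = W_ho @ h
        return np.argsort(logits)[-top_k:][::-1]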
The first embodiment of the invention thus discloses an input method based on a recurrent neural network in which, after the word vector matrix is obtained, an overcomplete basis matrix is constructed from it, and a sparse representation subscript matrix and a sparse representation coefficient matrix are constructed from the overcomplete basis matrix. When candidate words need to be recommended, the word vectors of the required words are obtained from the constructed overcomplete basis matrix, sparse representation subscript matrix, and sparse representation coefficient matrix, and the candidate words are then obtained from those word vectors and the recurrent neural network model.
That is to say, according to the scheme disclosed in the embodiment of the present invention, the word vector matrix is decomposed into an overcomplete basis matrix, a sparse representation subscript matrix, and a sparse representation coefficient matrix, and the word vector of each word can be obtained from these three matrices. Because the combined capacity of these three matrices is smaller than that of the word vector matrix, compared with the prior art the method occupies less space on the terminal device and requires less computation, reducing the demands on the storage and computing capacity of the terminal device, so it can be applied to a wide range of terminal devices (e.g., mobile phones).
Furthermore, because the overcomplete basis matrix, the sparse representation subscript matrix, and the sparse representation coefficient matrix have a small combined capacity, the installation package of the terminal device application is smaller, which facilitates distribution and downloading in application stores and thus popularization. In addition, the small amount of computation improves computational efficiency, so that text input using the scheme disclosed in the embodiment of the invention has shorter program response times and less perceptible lag on the terminal device, improving the user experience.
To clarify the difference in capacity between the word vector matrix on the one hand and the overcomplete basis matrix, the sparse representation subscript matrix, and the sparse representation coefficient matrix on the other, fig. 2 is provided.
Referring to the schematic diagram of matrix comparison shown in fig. 2, the leftmost rectangle is the word vector matrix, of size 20000 × 400. The three rectangles on the right are the overcomplete basis matrix, the sparse representation subscript matrix, and the sparse representation coefficient matrix; as established in the above steps, their sizes are 2000 × 400, 20000 × 10, and 20000 × 10, respectively. The area of each rectangle represents the size of the corresponding matrix and thus the number of its elements. Fig. 2 shows the word vector matrix split into these three matrices.
Accordingly, the compression ratio before and after the split can be calculated: (2000 × 400 + 20000 × 10 + 20000 × 10) / (20000 × 400) = 15%. That is, the method disclosed in the embodiment of the invention compresses the word vector matrix to about 15% of its original capacity, a reduction of roughly 6-7 times, which is a considerable amount of compression; the method effectively reduces the capacity of the word vector matrix.
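The ratio above can be checked directly (a trivial sketch using the example sizes):

    n, e, b, s = 20000, 400, 2000, 10
    ratio = (b * e + n * s + n * s) / (n * e)   # (800000 + 200000 + 200000) / 8000000
    print(ratio)                                 # 0.15 -> the matrices shrink to ~15%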
Further, the embodiment of the invention discloses the step of obtaining the word vectors of b words in the word vector matrix, so as to construct the overcomplete basis matrix from the word vectors of those b words. Obtaining the word vectors of b words in the word vector matrix includes the following steps:
first, sorting the words by word frequency;
then, selecting the b words with the highest word frequency according to the sorting result;
and finally, obtaining the word vectors of the b words by querying the word vector matrix.
When constructing the overcomplete basis matrix, b words could in principle be selected arbitrarily from the word vector matrix and the overcomplete basis matrix built from their word vectors. However, to improve the accuracy of the obtained candidate words, the overcomplete basis matrix is usually constructed from the word vectors of the words with the highest word frequency. In that case, following the steps above, the words are sorted by word frequency, the b words with the highest word frequency are selected according to the sorting result, and the word vectors of those b words are then obtained from the word vector matrix, so that the overcomplete basis matrix is built from the word vectors of the b highest-frequency words.
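A minimal sketch of this frequency-based selection, assuming the word frequencies are available as an array aligned with the rows of the word vector matrix (the array name and its source are assumptions):

    def build_basis(E, word_freq, b):
        """Stack the word vectors of the b most frequent words as the basis matrix."""
        top_b = np.argsort(word_freq)[-b:]   # indices of the b highest-frequency words
        return E[top_b]                      # overcomplete basis matrix, shape (b, e)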
Further, the embodiment of the invention discloses the operation of constructing the sparse representation subscript matrix and the sparse representation coefficient matrix from the word vector of each word and the overcomplete basis matrix. Referring to the workflow diagram shown in fig. 3, this construction includes the following steps:
Step S21, let the word vector of one of the words be $x_i$ and the overcomplete basis matrix be $B$, and create the following LASSO regression problem:

$$\min_{w_i}\ \left\| B^{\top} w_i - x_i \right\|_2^2 + \lambda \left\| w_i \right\|_1$$

wherein $\lambda$ is a real-valued hyper-parameter and $w_i$ is a vector parameter whose dimension equals the number of rows of the overcomplete basis matrix.
Here $\left\| B^{\top} w_i - x_i \right\|_2^2$ is the error between the word vector reconstructed from the overcomplete basis matrix, the sparse representation subscript matrix, and the sparse representation coefficient matrix, and the word vector taken from the word vector matrix.
For example, if the overcomplete basis matrix has 5 rows, i.e., b is 5, then the dimension of $w_i$ is 5.
Each row of the n × e word vector matrix is the word vector of one word, of dimension e; the word vectors can be denoted $x_1$ to $x_n$, the word vector of any one word being $x_i$, with i a positive integer not less than 1 and not greater than n. For $x_i$, an s-dimensional sparse representation subscript vector and an s-dimensional sparse representation coefficient vector can be constructed, so that the word vector $x_i$ can be determined from them.
Step S22, solving the LASSO regression problem to obtain, when the number of non-zero components in $w_i$ is s, the subscripts of the non-zero components in $w_i$ and the values of the non-zero components.
The LASSO regression problem may be solved by cyclic coordinate descent, the LARS method, and the like; of course, other methods may also be adopted, which is not limited in the embodiment of the present invention.
Step S23, determining the sparse representation subscript vector corresponding to the word from the subscripts of the non-zero components in $w_i$, and determining the sparse representation coefficient vector corresponding to the word from the values of the non-zero components.
In the embodiment of the invention, when the number of non-zero components in $w_i$ is s, the dimensions at which the non-zero components occur in $w_i$ give the sparse representation subscript vector of the word, and the values of the non-zero components give its sparse representation coefficient vector.
For example, suppose the preset sparse effect parameter s is 2, b is 5, and the number of words n is 10000; then the dimension of $w_i$ is 5. If solving the LASSO regression problem when the number of non-zero components in $w_i$ is s (i.e., 2) yields $w_i$ = (0, 0.3, 0, 0, 0.6), the non-zero components lie in the 2nd and 5th dimensions, so the sparse representation subscript vector of the word is (2, 5); since the values of the non-zero components are 0.3 and 0.6, the sparse representation coefficient vector is (0.3, 0.6).
Step S24, constructing the sparse representation subscript matrix from the sparse representation subscript vectors corresponding to the words, and constructing the sparse representation coefficient matrix from the sparse representation coefficient vectors corresponding to the words.
Each subscript vector is one row of the sparse representation subscript matrix, and each coefficient vector is one row of the sparse representation coefficient matrix.
Since each word vector in the word vector matrix yields one s-dimensional sparse representation subscript vector and one s-dimensional sparse representation coefficient vector, n s-dimensional subscript vectors and n s-dimensional coefficient vectors are constructed in total; the former form the n × s sparse representation subscript matrix and the latter the n × s sparse representation coefficient matrix.
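One possible realization of steps S21-S24 with an off-the-shelf coordinate-descent LASSO solver (scikit-learn); the patent does not prescribe a particular solver, and the alpha scaling noted in the docstring reflects scikit-learn's objective, not the patent's text:

    import numpy as np
    from sklearn.linear_model import Lasso

    def sparse_code(x_i, B, lam):
        """Solve min_w ||B^T w - x_i||_2^2 + lam * ||w||_1; return subscripts and values.

        scikit-learn's Lasso minimizes (1/(2m)) * ||y - X w||_2^2 + alpha * ||w||_1,
        so with X = B^T (m = e rows) we pass alpha = lam / (2 * e).
        """
        e_dim = B.shape[1]
        reg = Lasso(alpha=lam / (2 * e_dim), fit_intercept=False, max_iter=10000)
        reg.fit(B.T, x_i)
        w = reg.coef_                        # the vector parameter w_i, shape (b,)
        idx = np.flatnonzero(w)              # subscripts of the non-zero components
        return idx, w[idx]                   # sparse subscript / coefficient vectors

Running sparse_code for every row $x_i$ of the word vector matrix and stacking the results row by row would yield the n × s subscript and coefficient matrices of step S24, provided the number of non-zero components equals s; tuning $\lambda$ to achieve exactly that is the subject of steps S31-S37 below.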
In the above steps, the subscripts and values of the non-zero components of $w_i$ when their number is s are obtained by solving the LASSO regression problem, and the sparse representation subscript matrix and the sparse representation coefficient matrix are constructed from them. In the LASSO regression problem, $\left\| B^{\top} w_i - x_i \right\|_2^2$ is the error between the word vector reconstructed from the overcomplete basis matrix, the sparse representation subscript matrix, and the sparse representation coefficient matrix, and the word vector taken from the word vector matrix. The smaller this error, the closer the reconstructed word vector is to the word vector taken from the word vector matrix. However, the smaller the error, the larger the number of non-zero components in $w_i$, and thus the larger the capacity of the resulting sparse representation subscript matrix and sparse representation coefficient matrix. To avoid this, the hyper-parameter $\lambda$ is introduced into the LASSO regression problem, and the number of non-zero components in $w_i$ is adjusted through $\lambda$.
In this case, referring to fig. 4, in the embodiment of the invention, solving the LASSO regression problem to obtain, when the number of non-zero components in $w_i$ is s, the subscripts of the non-zero components in $w_i$ and the values of the non-zero components includes the following steps:
Step S31, setting $\lambda$ to a preset initial value.
In the embodiment of the present invention, $\lambda$ is a real-valued hyper-parameter, and the initial value in this step is preset as required. For example, $\lambda$ may be set to 15. Of course, $\lambda$ may also be set to other values, which is not limited in the embodiment of the present invention.
Step S32, substituting the value of $\lambda$ into the LASSO regression problem and solving it to obtain $w_i$.
Step S33, obtaining the number of non-zero components in $w_i$ and judging whether it equals s; if so, performing step S34; if not, performing step S35.
For different values of $\lambda$, the number of non-zero components in $w_i$ is usually different. Therefore, after $w_i$ is calculated, it is necessary to judge whether the number of non-zero components in $w_i$ equals s and, if not, to adjust the specific value of $\lambda$ accordingly.
Step S34, if the number of non-zero components in $w_i$ equals s, obtaining the subscripts of the non-zero components in $w_i$ and the values of the non-zero components.
Step S35, if the number of non-zero components in $w_i$ does not equal s, judging whether it is greater than s; if so, performing step S36; if not, performing step S37.
Step S36, if the number of non-zero components in $w_i$ is greater than s, enlarging $\lambda$ by a preset first adjustment amplitude and then returning to step S32.
The preset first adjustment amplitude may be 1.5, in which case the enlarged $\lambda$ is 1.5 times the $\lambda$ before enlargement. Of course, the preset first adjustment amplitude may also be other values, which is not limited in the embodiment of the present invention.
Step S37, if the number of non-zero components in $w_i$ is less than s, reducing $\lambda$ by a preset second adjustment amplitude and then returning to step S32.
The preset second adjustment amplitude may be 0.9, in which case the reduced $\lambda$ is 0.9 times the $\lambda$ before reduction. Of course, the preset second adjustment amplitude may also be other values, which is not limited in the embodiment of the present invention.
The larger $\lambda$ is, the fewer non-zero components $w_i$ has; the smaller $\lambda$ is, the more non-zero components $w_i$ has. Accordingly, in the embodiment of the invention, the value of $\lambda$ is adjusted, and after each adjustment of $\lambda$ the number of non-zero components in $w_i$ is recalculated: if the number of non-zero components in $w_i$ is greater than s, $\lambda$ is enlarged; if it is less than s, $\lambda$ is reduced; and so on, until the number of non-zero components in $w_i$ is exactly s.
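A minimal sketch of this tuning loop (steps S31-S37), reusing the sparse_code helper above; the initial value 15 and the adjustment amplitudes 1.5 and 0.9 are the examples from the text, while the iteration cap is an added safeguard not mentioned in the patent:

    def sparse_code_with_s(x_i, B, s, lam=15.0, grow=1.5, shrink=0.9, max_iter=100):
        """Adjust lam until w_i has exactly s non-zero components."""
        for _ in range(max_iter):
            idx, vals = sparse_code(x_i, B, lam)         # S32: solve with current lam
            k = len(idx)                                 # S33: count non-zero components
            if k == s:                                   # S34: done
                return idx, vals
            lam = lam * grow if k > s else lam * shrink  # S36 / S37: adjust lam
        raise RuntimeError("lam tuning did not converge to exactly s non-zeros")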
Step S14 discloses the operation of, when a candidate word needs to be recommended, obtaining the target basis vectors of the required word according to the overcomplete basis matrix, the sparse representation subscript matrix, and the sparse representation coefficient matrix, and determining the word vector of the required word from the target basis vectors. Referring to the workflow diagram shown in fig. 5, this operation includes the following steps:
Step S41, searching the sparse representation subscript matrix to obtain the sparse representation subscript vector corresponding to the required word.
The sparse representation subscript vector indicates the positions of the target basis vectors of the required word in the overcomplete basis matrix.
The sparse representation subscript matrix is an n × s matrix composed of the sparse representation subscript vectors of the n words; when the target basis vectors of the required word are to be obtained, the sparse representation subscript vector corresponding to the required word must first be obtained.
Step S42, searching the sparse representation coefficient matrix to obtain the sparse representation coefficient vector corresponding to the required word.
The sparse representation coefficient vector indicates the coefficients corresponding to the target basis vectors of the required word.
The sparse representation coefficient matrix is an n × s matrix composed of the sparse representation coefficient vectors of the n words; when the target basis vectors of the required word are to be obtained, the sparse representation coefficient vector corresponding to the required word must also be obtained.
Step S43, searching the overcomplete basis matrix according to the sparse representation subscript vector to obtain the target basis vectors of the required word.
For example, if the sparse representation subscript vector of the required word is (2, 5), the vectors in the second and fifth rows of the overcomplete basis matrix are the target basis vectors of the required word.
Step S44, performing a weighted combination of the target basis vectors with their corresponding coefficients to calculate the word vector of the required word.
For example, if the sparse representation coefficient vector corresponding to the required word is (0.3, 0.6), the coefficients of the word's two target basis vectors in the weighted combination are 0.3 and 0.6, respectively; the word vector of the required word is then obtained by the weighted combination.
After the word vector of the required word is obtained, it is input into the recurrent neural network model, which outputs the corresponding candidate words.
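Plugging the running example into the reconstruct_word_vector logic sketched earlier (the 0-based indices correspond to the 1-based rows 2 and 5 in the text; B here is assumed to be the 5-row basis from the $w_i$ example):

    D_i = np.array([1, 4])        # subscript vector (2, 5), converted to 0-based rows
    C_i = np.array([0.3, 0.6])    # coefficient vector
    x_hat = C_i @ B[D_i]          # steps S41-S44: look up target basis vectors, combine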
The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention. For details which are not disclosed in the embodiments of the apparatus of the present invention, reference is made to the embodiments of the method of the present invention.
The embodiment of the invention discloses an input device based on a recurrent neural network. Referring to the schematic structural diagram shown in fig. 6, the input device based on the recurrent neural network includes: a first matrix obtaining module 100, a second matrix obtaining module 200, a third matrix obtaining module 300, and a candidate word obtaining module 400.
The first matrix obtaining module 100 is configured to obtain a word vector matrix, where the word vector matrix is an n × e matrix, n is the number of words, and e is the word vector dimension of each word.
The second matrix obtaining module 200 is configured to obtain the word vectors of b words in the word vector matrix and construct an overcomplete basis matrix from the word vectors of the b words, where the overcomplete basis matrix is a b × e matrix, each basis vector in the overcomplete basis matrix is the word vector of one of the b words, and b is a positive integer less than n.
Here b is a positive integer less than n so that the capacity of the overcomplete basis matrix does not become too large. At the same time, b is usually greater than e: the overcomplete basis matrix contains the basis vectors corresponding to b words, and the number b of basis vectors is usually greater than the word vector dimension, ensuring that the overcomplete basis matrix contains enough basis vectors to meet the subsequent input requirements.
In practical applications, the value of b is usually set to 4 to 5 times the value of e. For example, if the size of the word vector matrix is 20000 × 400, i.e., the word vector dimension e is 400, b can be set to 2000, and the size of the overcomplete basis matrix is then 2000 × 400.
Of course, b may also be set to other values, which are not limited in the embodiment of the present invention.
The third matrix obtaining module 300 is configured to construct a sparse representation subscript matrix and a sparse representation coefficient matrix according to the word vector of each word and the overcomplete basis matrix, where the sparse representation subscript matrix and the sparse representation coefficient matrix are both n × s matrices, and s is a preset sparse effect parameter.
The specific value of s is a preset positive integer, for example, s may be 10, in which case, if the size of the word vector matrix is 20000 × 400, the size of the sparse representation index matrix and the size of the sparse representation coefficient matrix are both 20000 × 10. Of course, s may also be other values, which is not limited in the embodiment of the present invention.
Wherein the target base vector of each term is used to combine into a term vector for the term; the sparse representation subscript matrix comprises subscript vectors representing the positions of the required target substrate vectors in the overcomplete substrate matrix if the word vectors forming the words are formed; the sparse representation coefficient matrix contains coefficient vectors representing coefficients corresponding to the required target basis vectors if the word vectors of the words are formed.
In the embodiment of the present invention, it is considered that the word vector of a word can be obtained by performing weighted combination on s basis vectors in the overcomplete basis matrix, and the s basis vectors are the target basis vectors of the word. That is, the target base vector for each word is used to combine the word vectors for that word.
Because each word can be weighted and combined through the target basis vector corresponding to the word to obtain the word vector of the word, n × s subscripts need to be recorded in total, and the recorded n × s subscripts are the sparse representation subscript matrix. In addition, n × s coefficients need to be recorded in total, and the recorded n × s coefficients are a sparse representation coefficient matrix.
The candidate word obtaining module 400 is configured to, when a candidate word needs to be recommended, obtain a target base vector of the required word according to the overcomplete base matrix, the sparse representation subscript matrix, and the sparse representation coefficient matrix, determine a word vector of the required word according to the target base vector, and obtain the candidate word according to the word vector of the required word and the recurrent neural network model.
Wherein the target base vector of each term is used to combine into a term vector for the term; the sparse representation subscript matrix comprises subscript vectors representing the positions of the required target substrate vectors in the overcomplete substrate matrix if the word vectors forming the words are formed; the sparse representation coefficient matrix contains coefficient vectors representing coefficients corresponding to the required target basis vectors if the word vectors of the words are formed.
When candidate words need to be recommended, the target basis vectors of the needed words and the coefficients corresponding to those target basis vectors can be determined through the overcomplete basis matrix, the sparse representation subscript matrix, and the sparse representation coefficient matrix, and weighted combination is carried out to obtain the word vectors of the needed words. Candidate words are then acquired from the word vectors of the needed words and the recurrent neural network: specifically, the word vectors of the needed words are input into the recurrent neural network model, and the model outputs the candidate words, thereby meeting the requirement of recommending candidate words to the user.
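As an illustration of this lookup-and-combine step, here is a minimal sketch in Python/NumPy. The array names `basis`, `subscripts`, and `coefficients` and the word index `i` are hypothetical; the embodiment does not prescribe any particular implementation.

```python
import numpy as np

def reconstruct_word_vector(i, basis, subscripts, coefficients):
    """Rebuild the word vector of word i by weighted combination.

    basis:        (b, e) overcomplete basis matrix
    subscripts:   (n, s) integer matrix; row i holds the positions of
                  word i's target basis vectors within the basis matrix
    coefficients: (n, s) float matrix; row i holds the matching weights
    """
    target_vectors = basis[subscripts[i]]    # (s, e) target basis vectors
    return coefficients[i] @ target_vectors  # weighted combination -> (e,)
```

The reconstructed vector is then fed to the recurrent neural network model, whose output is used to rank candidate words.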
According to the scheme disclosed in the embodiment of the present invention, the word vector matrix is decomposed into the overcomplete basis matrix, the sparse representation subscript matrix, and the sparse representation coefficient matrix, and the word vectors of all words can be recovered from these three matrices. Since the sum of the capacities of the overcomplete basis matrix, the sparse representation subscript matrix, and the sparse representation coefficient matrix is smaller than the capacity of the word vector matrix, the scheme occupies less space on the terminal device and requires less computation than the prior art, thereby reducing the requirements on the storage and computing capacity of the terminal device, so that it can be applied to various terminal devices (such as mobile phones).
Furthermore, because the combined capacity of the overcomplete basis matrix, the sparse representation subscript matrix, and the sparse representation coefficient matrix is small, the installation package of the terminal device application program is reduced in size, which facilitates distribution and downloading in an application store and thus popularization. In addition, the small amount of computation in the scheme improves computing efficiency, so that when the scheme disclosed in the embodiment of the present invention is adopted for text input, the program response time is shortened, lag on the terminal device is reduced, and the user's input experience is improved.
Further, in the input device based on the recurrent neural network disclosed in the embodiment of the present invention, the second matrix obtaining module includes:
the word sorting unit is used for sorting the words according to the word frequency;
the word selecting unit is used for selecting b words with the highest word frequency according to the sequencing result;
and the word vector acquisition unit is used for acquiring the word vectors of the b words by inquiring the word vector matrix.
Through the above units, the words are sorted according to word frequency, the b words with the highest word frequency are selected according to the sorting result, the word vectors of these b words are then obtained from the word vector matrix, and the overcomplete basis matrix is constructed from the word vectors of the b highest-frequency words, which improves the accuracy of the obtained candidate words.
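A minimal sketch of these units in Python/NumPy follows; the input names `word_vectors` and `freq` are hypothetical, and word frequencies are assumed to be available, e.g. counted from a training corpus.

```python
import numpy as np

def build_overcomplete_basis(word_vectors, freq, b):
    """Select the vectors of the b most frequent words as the basis.

    word_vectors: (n, e) word vector matrix
    freq:         (n,) word frequency of each word
    """
    top_b = np.argsort(freq)[::-1][:b]  # indices of the b highest-frequency words
    return word_vectors[top_b]          # (b, e) overcomplete basis matrix
```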
In the input device based on the recurrent neural network disclosed in the embodiment of the present invention, the third matrix acquisition module includes:
a regression problem creation unit, configured to set the word vector of one of the words as x_i and the overcomplete basis matrix as B, and create the following LASSO regression problem:

$$\min_{w_i} \left\| x_i - B^{\top} w_i \right\|_2^2 + \lambda \left\| w_i \right\|_1$$

where λ is a real-valued hyperparameter and w_i is the vector parameter, the dimension of w_i being equal to the number of rows of the overcomplete basis matrix;
a regression problem solving unit, configured to solve the LASSO regression problem and obtain, when the number of non-zero components in w_i is s, the dimensions at which the non-zero components occur in w_i and the values of those non-zero components;
a vector determination unit, configured to determine the sparse representation subscript vector corresponding to the word from the dimensions of the non-zero components in w_i, and determine the sparse representation coefficient vector corresponding to the word from the values of the non-zero components;
and the matrix construction unit is used for constructing a corresponding sparse representation subscript matrix according to the sparse representation subscript vector corresponding to each word, and constructing a corresponding sparse representation coefficient matrix according to the sparse representation coefficient vector corresponding to each word.
In the embodiment of the present invention, when the number of non-zero components in w_i is s, the dimensions at which the non-zero components occur in w_i indicate the sparse representation subscript vector corresponding to the word, and the values of the non-zero components indicate the sparse representation coefficient vector corresponding to the word.
For example, suppose the preset sparse effect parameter s is 2, b is 5, and the number of words n is 10000; in this case, the dimension of w_i is 5. Solving the LASSO regression problem until the number of non-zero components in w_i is s (namely 2) may yield w_i = (0, 0.3, 0, 0, 0.6), where the non-zero components occupy the 2nd and 5th dimensions, so the sparse representation subscript vector corresponding to the word is (2, 5). Since the values of the non-zero components are 0.3 and 0.6 respectively, the sparse representation coefficient vector is (0.3, 0.6).
Because each word vector in the word vector matrix yields one s-dimensional sparse representation subscript vector and one s-dimensional sparse representation coefficient vector, n s-dimensional subscript vectors and n s-dimensional coefficient vectors are constructed in total; the former form the sparse representation subscript matrix and the latter form the sparse representation coefficient matrix, both of size n × s.
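Assembling the two n × s matrices from per-word solutions can be sketched as follows. The `solver` callable stands in for the regression problem solving unit, for example the λ-adjustment routine sketched after the subunit description below; all names here are illustrative.

```python
import numpy as np

def build_sparse_matrices(word_vectors, B, s, solver):
    """Build the n x s subscript and coefficient matrices.

    solver(x_i, B, s) must return, for one word, its subscript vector
    (positions of the target basis vectors in B) and the matching
    coefficient vector, e.g. obtained by LASSO as described in the text.
    """
    n = word_vectors.shape[0]
    subscripts = np.empty((n, s), dtype=np.int32)
    coefficients = np.empty((n, s), dtype=np.float32)
    for i, x_i in enumerate(word_vectors):
        subscripts[i], coefficients[i] = solver(x_i, B, s)
    return subscripts, coefficients
```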
In the input device based on the recurrent neural network disclosed in the embodiment of the present invention, the regression problem solving unit includes: an initial value determining subunit, a regression problem calculation subunit, a non-zero component counting subunit, and a value adjusting subunit.
The initial value determining subunit is configured to set the value of λ to a preset initial value.
The regression problem calculation subunit is configured to substitute the value of λ into the LASSO regression problem and solve it to obtain w_i.
The non-zero component counting subunit is configured to obtain the number of non-zero components in w_i and determine whether that number equals s; if so, it obtains the dimensions of the non-zero components in w_i and the values of the non-zero components; if not, it triggers the value adjusting subunit to perform the corresponding operation.
The value adjusting subunit is configured to enlarge λ according to a preset first adjustment amplitude if the number of non-zero components in w_i is greater than s, and to reduce λ according to a preset second adjustment amplitude if the number of non-zero components in w_i is less than s, and then trigger the regression problem calculation subunit to perform the corresponding operation.
The preset first adjustment amplitude may be, for example, 1.5, in which case the enlarged λ is 1.5 times the value of λ before enlargement. Of course, the preset first adjustment amplitude may also be other values, which is not limited in the embodiment of the present invention.
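A minimal sketch of this λ-adjustment loop, using scikit-learn's `Lasso` as the inner solver: the initial value `lam=0.1`, the iteration cap, and reusing the factor 1.5 for both adjustment amplitudes are assumptions. Note also that scikit-learn scales the squared-error term by 1/(2m), so its `alpha` differs from the λ above by a constant factor, which does not change the adjustment logic.

```python
import numpy as np
from sklearn.linear_model import Lasso

def solve_for_s_nonzeros(x_i, B, s, lam=0.1, factor=1.5, max_rounds=50):
    """Adjust lambda until the LASSO solution w_i has exactly s non-zeros.

    Solves min_w ||x_i - B^T w||^2 + lam * ||w||_1, enlarging lam when
    there are more than s non-zero components and reducing it when
    there are fewer, as described in the text.
    """
    for _ in range(max_rounds):
        model = Lasso(alpha=lam, fit_intercept=False)
        model.fit(B.T, x_i)              # design matrix B^T has shape (e, b)
        w = model.coef_
        nonzero = np.flatnonzero(w)
        if len(nonzero) == s:
            return nonzero, w[nonzero]   # subscript vector, coefficient vector
        lam = lam * factor if len(nonzero) > s else lam / factor
    raise RuntimeError("did not reach exactly s non-zero components")
```

In practice the loop may oscillate around s for ill-conditioned words, which is why the round cap (an assumption, not part of the embodiment) is included.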
Further, in the input device based on the recurrent neural network disclosed in the embodiment of the present invention, the candidate word acquiring module includes:
a first searching unit, configured to search the sparse representation subscript matrix, and obtain a sparse representation subscript vector corresponding to the required word in the sparse representation subscript matrix, where the sparse representation subscript vector is used to indicate a position of a target base vector of the required word in the overcomplete base matrix;
the second searching unit is used for searching the sparse representation coefficient matrix and acquiring a sparse representation coefficient vector corresponding to a required word in the sparse representation coefficient matrix, wherein the sparse representation coefficient vector is used for indicating a coefficient corresponding to a target base vector of the required word;
a target base vector obtaining unit, configured to search the overcomplete base matrix according to the sparse representation subscript vector, and obtain a target base vector of the required word;
and the weighted combination unit is used for carrying out weighted combination according to the target base vector and the coefficient corresponding to the target base vector, and calculating the word vector of the required word.
After the word vector of the required word is obtained, it is input into the recurrent neural network model, which outputs the corresponding candidate words.
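For the final step, the following is a minimal sketch assuming a PyTorch LSTM language model; the architecture, hidden size, and top-k selection are illustrative assumptions, since the embodiment only requires that the reconstructed word vector be fed into a recurrent neural network model that outputs candidate words.

```python
import torch
import torch.nn as nn

class CandidatePredictor(nn.Module):
    """Toy RNN language model: word vector in, candidate-word scores out."""
    def __init__(self, e, hidden, n):
        super().__init__()
        self.rnn = nn.LSTM(input_size=e, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, n)  # one score per vocabulary word

    def forward(self, word_vector, state=None):
        # word_vector: (e,) vector reconstructed from the three matrices;
        # add batch and time dimensions before the recurrent step
        output, state = self.rnn(word_vector.view(1, 1, -1), state)
        return self.out(output[:, -1]), state

model = CandidatePredictor(e=400, hidden=512, n=20000)
scores, state = model(torch.randn(400))         # stand-in reconstructed vector
candidates = scores.topk(5).indices.squeeze(0)  # top-5 candidate word ids
```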
Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus a required general purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The same and similar parts in the various embodiments in this specification may be referred to each other. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is simple, and the relevant points can be referred to the description in the method embodiment.
The above-described embodiments of the present invention should not be construed as limiting the scope of the present invention.

Claims (8)

1. An input method based on a recurrent neural network, comprising:
obtaining a word vector matrix, wherein the word vector matrix is an n × e matrix, n is the number of words, and e is the word vector dimension of each word;
obtaining word vectors of b words in the word vector matrix, and constructing an overcomplete basis matrix according to the word vectors of the b words, wherein the overcomplete basis matrix is a b × e matrix, each basis vector in the overcomplete basis matrix is the word vector of one of the b words, and b is a positive integer less than n;
constructing a sparse representation subscript matrix and a sparse representation coefficient matrix according to the word vector of each word and the overcomplete basis matrix, wherein the sparse representation subscript matrix and the sparse representation coefficient matrix are both n × s matrices, and s is a preset sparse effect parameter;
when a candidate word needs to be recommended, acquiring a target base vector of the required word according to the overcomplete base matrix, the sparse representation subscript matrix and the sparse representation coefficient matrix, determining a word vector of the required word through the target base vector, and acquiring the candidate word according to the word vector of the required word and the recurrent neural network model;
wherein the target basis vectors of each word are used to combine into the word vector of that word; the sparse representation subscript matrix contains, for each word, a subscript vector indicating the positions in the overcomplete basis matrix of the target basis vectors needed to form that word's word vector; the sparse representation coefficient matrix contains, for each word, a coefficient vector indicating the coefficients applied to those target basis vectors;
when a candidate word needs to be recommended, acquiring a target base vector of a required word according to the overcomplete base matrix, the sparse representation subscript matrix and the sparse representation coefficient matrix, and determining a word vector of the required word according to the target base vector, wherein the method comprises the following steps:
searching the sparse representation subscript matrix, and acquiring a sparse representation subscript vector corresponding to the required word in the sparse representation subscript matrix, wherein the sparse representation subscript vector is used for indicating the position of a target base vector of the required word in the overcomplete base matrix;
searching the sparse representation coefficient matrix, and acquiring a sparse representation coefficient vector corresponding to the required word in the sparse representation coefficient matrix, wherein the sparse representation coefficient vector is used for indicating a coefficient corresponding to a target base vector of the required word;
searching the overcomplete base matrix according to the sparse representation subscript vector to obtain a target base vector of the required word;
and performing weighted combination according to the target base vector and the coefficient corresponding to the target base vector to calculate the word vector of the required word.
2. The recurrent neural network-based input method of claim 1, wherein said obtaining a word vector of b words in said word vector matrix comprises:
ordering the words according to word frequency;
b words with the highest word frequency are selected according to the sequencing result;
and obtaining the word vectors of the b words by inquiring the word vector matrix.
3. The recurrent neural network-based input method of claim 1, wherein said constructing a sparse representation subscript matrix and a sparse representation coefficient matrix from the word vector of each word and the overcomplete basis matrix comprises:
setting the word vector of one of the words as x_i and the overcomplete basis matrix as B, and creating the following LASSO regression problem:

$$\min_{w_i} \left\| x_i - B^{\top} w_i \right\|_2^2 + \lambda \left\| w_i \right\|_1$$

wherein λ is a real-valued hyperparameter and w_i is the vector parameter, the dimension of w_i being equal to the number of rows of the overcomplete basis matrix;
solving the LASSO regression problem to obtain, when the number of non-zero components in w_i is s, the dimensions at which the non-zero components occur in w_i and the values of the non-zero components;
determining the sparse representation subscript vector corresponding to the word according to the dimensions of the non-zero components in w_i, and determining the sparse representation coefficient vector corresponding to the word according to the values of the non-zero components;
and constructing a corresponding sparse representation subscript matrix according to the sparse representation subscript vector corresponding to each word, and constructing a corresponding sparse representation coefficient matrix according to the sparse representation coefficient vector corresponding to each word.
4. The recurrent neural network-based input method of claim 3, wherein said solving the LASSO regression problem to obtain, when the number of non-zero components in w_i is s, the dimensions at which the non-zero components occur in w_i and the values of the non-zero components comprises:
41) determining the value of λ as a preset initial value;
42) substituting the value of λ into the LASSO regression problem and calculating to obtain w_i;
43) obtaining the number of non-zero components in w_i and determining whether that number equals s; if so, obtaining the dimensions of the non-zero components in w_i and the values of the non-zero components; if not, performing the operation of step 44);
44) if the number of non-zero components in w_i is greater than s, enlarging λ according to a preset first adjustment amplitude; if the number of non-zero components in w_i is less than s, reducing λ according to a preset second adjustment amplitude; then returning to perform the operation of step 42).
5. An input device based on a recurrent neural network, comprising:
the first matrix obtaining module is used for obtaining a word vector matrix, wherein the word vector matrix is an n × e matrix, n is the number of words, and e is the word vector dimension of each word;
a second matrix obtaining module, configured to obtain word vectors of b words in the word vector matrix and construct an overcomplete basis matrix according to the word vectors of the b words, wherein the overcomplete basis matrix is a b × e matrix, each basis vector in the overcomplete basis matrix is the word vector of one of the b words, and b is a positive integer smaller than n;
the third matrix acquisition module is used for constructing a sparse representation subscript matrix and a sparse representation coefficient matrix according to the word vector of each word and the overcomplete basis matrix, wherein the sparse representation subscript matrix and the sparse representation coefficient matrix are both n × s matrices, and s is a preset sparse effect parameter;
the candidate word acquisition module is used for acquiring a target base vector of a required word according to the overcomplete base matrix, the sparse representation subscript matrix and the sparse representation coefficient matrix when the candidate word needs to be recommended, determining a word vector of the required word according to the target base vector, and acquiring the candidate word according to the word vector of the required word and the recurrent neural network model;
wherein the target basis vectors of each word are used to combine into the word vector of that word; the sparse representation subscript matrix contains, for each word, a subscript vector indicating the positions in the overcomplete basis matrix of the target basis vectors needed to form that word's word vector; the sparse representation coefficient matrix contains, for each word, a coefficient vector indicating the coefficients applied to those target basis vectors;
the candidate word acquisition module comprises:
a first searching unit, configured to search the sparse representation subscript matrix, and obtain a sparse representation subscript vector corresponding to the required word in the sparse representation subscript matrix, where the sparse representation subscript vector is used to indicate a position of a target base vector of the required word in the overcomplete base matrix;
the second searching unit is used for searching the sparse representation coefficient matrix and acquiring a sparse representation coefficient vector corresponding to the required word in the sparse representation coefficient matrix, wherein the sparse representation coefficient vector is used for indicating a coefficient corresponding to a target base vector of the required word;
a target base vector obtaining unit, configured to search the overcomplete base matrix according to the sparse representation subscript vector, and obtain a target base vector of the required word;
and the weighted combination unit is used for carrying out weighted combination according to the target base vector and the coefficient corresponding to the target base vector, and calculating the word vector of the required word.
6. The recurrent neural network-based input device of claim 5, wherein said second matrix acquisition module comprises:
the word sorting unit is used for sorting the words according to the word frequency;
the word selecting unit is used for selecting b words with the highest word frequency according to the sequencing result;
and the word vector acquisition unit is used for acquiring the word vectors of the b words by inquiring the word vector matrix.
7. The recurrent neural network-based input device of claim 5, wherein said third matrix acquisition module comprises:
a regression problem creation unit, configured to set the word vector of one of the words as x_i and the overcomplete basis matrix as B, and create the following LASSO regression problem:

$$\min_{w_i} \left\| x_i - B^{\top} w_i \right\|_2^2 + \lambda \left\| w_i \right\|_1$$

wherein λ is a real-valued hyperparameter and w_i is the vector parameter, the dimension of w_i being equal to the number of rows of the overcomplete basis matrix;
a regression problem solving unit, configured to solve the LASSO regression problem and obtain, when the number of non-zero components in w_i is s, the dimensions at which the non-zero components occur in w_i and the values of the non-zero components;
a vector determination unit, configured to determine the sparse representation subscript vector corresponding to the word according to the dimensions of the non-zero components in w_i, and determine the sparse representation coefficient vector corresponding to the word according to the values of the non-zero components;
and the matrix construction unit is used for constructing a corresponding sparse representation subscript matrix according to the sparse representation subscript vector corresponding to each word, and constructing a corresponding sparse representation coefficient matrix according to the sparse representation coefficient vector corresponding to each word.
8. The recurrent neural network-based input device of claim 7, wherein the regression problem solving unit comprises: an initial value determining subunit, a regression problem calculation subunit, a non-zero component counting subunit, and a value adjusting subunit;
the initial value determining subunit is configured to determine the value of λ as a preset initial value;
the regression problem calculation subunit is configured to substitute the value of λ into the LASSO regression problem and calculate to obtain w_i;
the non-zero component counting subunit is configured to obtain the number of non-zero components in w_i and determine whether that number equals s; if so, obtain the dimensions of the non-zero components in w_i and the values of the non-zero components; if not, trigger the value adjusting subunit to perform the corresponding operation;
the value adjusting subunit is configured to enlarge λ according to a preset first adjustment amplitude if the number of non-zero components in w_i is greater than s, reduce λ according to a preset second adjustment amplitude if the number of non-zero components in w_i is less than s, and then trigger the regression problem calculation subunit to perform the corresponding operation.
CN201711217459.7A 2017-11-28 2017-11-28 Input method and device based on recurrent neural network Active CN108009150B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711217459.7A CN108009150B (en) 2017-11-28 2017-11-28 Input method and device based on recurrent neural network

Publications (2)

Publication Number Publication Date
CN108009150A CN108009150A (en) 2018-05-08
CN108009150B true CN108009150B (en) 2021-01-05

Family

ID=62054262

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711217459.7A Active CN108009150B (en) 2017-11-28 2017-11-28 Input method and device based on recurrent neural network

Country Status (1)

Country Link
CN (1) CN108009150B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109522052B (en) * 2018-11-27 2020-05-08 中科寒武纪科技股份有限公司 Computing device and board card
CN112614500A (en) * 2019-09-18 2021-04-06 北京声智科技有限公司 Echo cancellation method, device, equipment and computer storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002059772A2 (en) * 2000-11-09 2002-08-01 Hrl Laboratories, Llc Blind decomposition using fourier and wavelet transforms
CN103280221B (en) * 2013-05-09 2015-07-29 北京大学 A kind of audio lossless compressed encoding, coding/decoding method and system of following the trail of based on base
CN106874292B (en) * 2015-12-11 2020-05-05 北京国双科技有限公司 Topic processing method and device
CN106875280A (en) * 2017-03-02 2017-06-20 深圳大图科创技术开发有限公司 Integrated community service platform

Also Published As

Publication number Publication date
CN108009150A (en) 2018-05-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant