CN111274359B - Query recommendation method and system based on improved VHRED and reinforcement learning - Google Patents
- Publication number
- CN111274359B (application CN202010067232.4A; also published as CN111274359A)
- Authority
- CN
- China
- Prior art keywords
- query
- user
- network
- target
- session
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F16/3338 Query expansion (information retrieval of unstructured textual data; querying; query processing; query translation)
- G06F16/36 Creation of semantic tools, e.g. ontology or thesauri (information retrieval of unstructured textual data)
- G06N3/045 Combinations of networks (computing arrangements based on biological models; neural networks; architecture)
- G06N3/084 Backpropagation, e.g. using gradient descent (neural network learning methods)
Abstract
The invention relates to a query recommendation method and system based on improved VHRED and reinforcement learning, wherein the method comprises the following steps: step A: collecting user query log records of a search engine, preprocessing the user query log record data, and constructing a user query log training set TS; step B: training a query recommendation deep learning network model based on VHRED with time characteristics and reinforcement learning by using the user query log training set TS; step C: the query recommendation system receives a query sentence input by the user, inputs the query sentence into the trained query recommendation deep learning network model, and outputs matched query recommendations. The method and system facilitate generating query recommendations that meet user needs.
Description
Technical Field
The invention relates to the field of natural language processing, in particular to a query recommendation method and system based on improved VHRED and reinforcement learning.
Background
Query suggestion provides suggested queries for the session entered by a user. Query suggestions enable a search engine to better understand the user's query intent and thus better optimize the user's query. This task has therefore received considerable attention over the last decade.
Cao et al. proposed a context-aware query suggestion framework that considers the entire sequence of queries in a session rather than just the last query. They constructed a concept sequence suffix tree from query clusters for efficient and effective context-aware query suggestion. The query sequence can also be modeled with a mixture of variable-memory Markov models. Context-aware query suggestion takes more user actions in the session into account and thereby better models the information need; the idea has also proved effective for query classification and ranking. Ozertem et al. developed a ranking framework that learns to suggest queries directly from search behavior in user search logs. It uses large-scale search logs and avoids the need for manual labeling. Supervised suggestion systems are generally more accurate and flexible than unsupervised ones, and their suggestions can also improve diversified and personalized search. Sordoni et al. developed a hierarchical encoder-decoder model (HRED) for context-aware query suggestion. The encoder first encodes query terms into a query embedding using a two-level recurrent neural network (RNN) and then encodes the query sequence into a session embedding; the decoder then decodes the session embedding into the target suggestion. HRED avoids sparsity by using smooth distributed representations, better exploiting the large-scale training data available in search logs. A recent study upgraded the sequence-to-sequence model of HRED, using attention and copy mechanisms to model the different importance of queries and the repeated terms in a session.
In the aforementioned work, only the HRED model is used as the generation model, and there is still room to improve its effectiveness. Moreover, most models ignore the time characteristics of queries, which have a great influence on the quality of the generated suggestions. If queries are generated by the generator model alone, the generated queries cannot be guaranteed to be close to queries formulated by real users, so they carry obvious machine-generation traces and fail to express the user's query intent well.
Disclosure of Invention
The invention aims to provide a query recommendation method and system based on improved VHRED and reinforcement learning, which facilitate generating query recommendations that meet user needs.
In order to achieve the above purpose, the invention adopts the following technical solution: a query recommendation method based on improved VHRED and reinforcement learning, comprising the following steps:
step A: collecting user query log records of a search engine, preprocessing the user query log record data, and constructing a user query log training set TS;
step B: training a query recommendation deep learning network model based on VHRED with time characteristics and reinforcement learning by using the user query log training set TS;
step C: the query recommendation system receives a query sentence input by the user, inputs the query sentence into the trained query recommendation deep learning network model, and outputs matched query recommendations.
Further, the step A specifically includes the following steps:
step A1: collecting user query log records of a search engine to obtain an original query log set; wherein each query log of the search engine is represented by a triplet (u, q, t), u representing a user, q representing a query, and t representing a query time;
step A2: dividing the original query log set by user and sorting by query time to obtain the query log subsets of different users;
step A3: setting a time interval T according to the following rule: query logs with query time intervals larger than T belong to different sessions, queries in different sessions are not related to each other, the last query in the same session is a target query containing a query intention of a user, a query log subset of each user is further divided into a plurality of sessions to obtain a session set of each user, and the session sets of all the users form a user query log training set TS;
one session of a user u in TS is represented as $S_u = \{(q_1, t_1), (q_2, t_2), \ldots, (q_{k_u+1}, t_{k_u+1})\}$, wherein $q_i$ denotes the i-th query in the session and $t_i$ denotes the query time corresponding to $q_i$; the session contains $k_u + 1$ queries, and the last query $q_{k_u+1}$ is the user's real target query;
after word segmentation and stop-word removal, $q_i$ is further represented as $q_i = (w^i_1, w^i_2, \ldots, w^i_{L(q_i)})$, wherein $w^i_j$ denotes the j-th word in $q_i$, $j = 1, 2, \ldots, L(q_i)$, and $L(q_i)$ denotes the number of words in $q_i$; the query time $t_i$ corresponding to $q_i$ is represented as $t_i = (x_i, y_i, z_i, d_i)$, wherein $x_i$ denotes the hour, $y_i$ the minute, $z_i$ the second, and $d_i$ the day of the week.
Further, the query recommendation deep learning network model comprises a generator network based on a variable hierarchical recurrent encoder-decoder (VHRED) with time characteristics and a discriminator network based on a hierarchical autoencoder, wherein the hierarchical autoencoder encodes words, sentences and paragraphs with separate layers of GRUs so as to capture semantic structure information at different levels; the step B specifically includes the following steps:
step B1: inputting query text and query time pairs in a user query log training set TS into a generator network by taking a user session as a unit, and outputting a target query predicted by the generator network;
step B2: calculating the gradient of each parameter in the generator network by a back-propagation method according to the target loss function loss, and updating the parameters by a stochastic gradient descent method;
step B3: inputting the target query predicted in the step B1 and the real target query of the user in the user session into a discriminator network, outputting category probability, and judging whether the input target query is the target query predicted by the generator network or the real target query of the user according to the category probability;
step B4: taking the category probability output by the discriminator network in step B3 as the reward of the generator network, and performing reinforcement learning training using a policy gradient method so as to maximize the expected return;
step B5: when the change of the loss value of the query recommendation deep learning network model between iterations is smaller than a set threshold, or the maximum number of iterations is reached, terminating the training of the query recommendation deep learning network model.
Further, the step B1 of inputting query text and query time pairs in the user query log training set TS into the generator network, in units of user sessions, and outputting the target query predicted by the generator network specifically includes the following steps:
step B11: taking the user session as the unit, encoding each query text and query time pair $(q_i, t_i)$ in the user session except the target query to obtain its characterization vector $v_i$;
for the user session $S_u = \{(q_1, t_1), \ldots, (q_{k_u+1}, t_{k_u+1})\}$, the characterization vector of a query text and query time pair $(q_i, t_i)$ is represented as $v_i = [v_{q_i}; v_{t_i}]$, $i = 1, 2, \ldots, k_u$, the concatenation of the characterization vector $v_{q_i}$ of $q_i$ and the characterization vector $v_{t_i}$ of $t_i$;
wherein the encoding formula of $q_i$ is $v_{q_i} = \mathrm{GRU}\big(e(w^i_1), e(w^i_2), \ldots, e(w^i_{L(q_i)})\big)$, wherein GRU denotes a gated recurrent neural network and $e(w^i_j)$, $j = 1, 2, \ldots, L(q_i)$, is the word vector representation of the j-th word $w^i_j$ in $q_i$, obtained by lookup in a pre-trained word vector matrix $E \in \mathbb{R}^{d \times |D|}$, wherein d denotes the dimension of the word vectors and |D| is the number of words in the lexicon D;
the characterization vector $v_{t_i}$ of $t_i$ is obtained analogously by encoding the time tuple $(x_i, y_i, z_i, d_i)$.
Step B12: taking the user session as the unit, the characterization vectors of the query text and query time pairs in the user session except the target query form a sequence $(v_1, v_2, \ldots, v_{k_u})$, which is input into the GRU-based encoder module of the generator network for encoding, obtaining the characterization vector $v_{S_u}$ of the user session $S_u$, which contains all query information in the user session except the target query;
wherein the encoding formula of $S_u$ is $v_{S_u} = \mathrm{GRU}(v_1, v_2, \ldots, v_{k_u})$.
Step B13: first, $v_{S_u}$ is input into the feedforward neural network module of the generator network to obtain the mean $\mu_u \in \mathbb{R}^{d_z}$, computed by applying the feedforward neural network $f_{FNN}$ together with the hyperbolic tangent function tanh to $v_{S_u}$, wherein $d_z$ is the dimension of the hidden variable $z_u$;
the mean $\mu_u$ is then input into a softplus function to calculate the covariance $\Sigma_u$:
$\Sigma_u = \mathrm{softplus}(f(\mu_u))$
then, according to the mean $\mu_u$ and the covariance $\Sigma_u$, the latent variable $z_u$ is obtained by random sampling:
$z_u = \mu_u + \Sigma_u \odot \mathrm{samples}$
wherein samples is a random number vector, $\mathrm{samples} \in \mathbb{R}^{d_z}$, formed by drawing $d_z$ random numbers from a standard normal distribution, and $\Sigma_u \odot \mathrm{samples}$ denotes the Hadamard product of the vector $\Sigma_u$ with samples, thereby obtaining the hidden variable $z_u$ of the user session.
Step B14: $v_{S_u}$ obtained in step B12 and $z_u$ obtained in step B13 are input into the GRU-based decoder module of the generator network, which decodes them and outputs the target query predicted by the generator network;
first, the initial hidden state $h_0$ of the decoder is calculated from the characterization vector $v_{S_u}$ of the user session and the hidden variable $z_u$;
then $h_0$ is decoded through the GRU network, the hidden state vector obtained at each decoding step is decoded again through the GRU network, and decoding is repeated $K_{target}$ times to generate a target query $q_s$ containing $K_{target}$ words; the word probabilities at each decoding step are computed by a fully connected layer f with parameters $W_2$, $W_3$ and bias term $b_{prob}$, wherein $W_3 \in \mathbb{R}^{d \times d}$ and $b_{prob} \in \mathbb{R}^d$;
the score $\mathrm{score}(q'_s)$ of a candidate target query $q'_s$ is the product of the decoding probabilities; taking the logarithm, the score becomes the sum of the log-probabilities of the decoded words:
$\log \mathrm{score}(q'_s) = \sum_{j=1}^{K_{target}} \log p(w^s_j \mid w^s_{<j}, v_{S_u}, z_u)$
Further, in the step B2, the target loss function loss is defined as follows:
loss=loss1+loss2
wherein loss1 measures, using the KL divergence, the difference between the distribution of the hidden variable and the unit Gaussian distribution:
$loss_1 = \mathrm{KL}\big(\mathcal{N}(\mu_u, \Sigma_u) \,\|\, \mathcal{N}(0, I)\big)$
wherein $\mu_u$ and $\Sigma_u$ are the mean and covariance obtained in step B13;
loss2 is the loss between the target query $q_s$ predicted by the generator network and the target query $q_{k_u+1}$ of session $S_u$ representing the user's real query intent, calculated by the cross-entropy loss function:
$loss_2 = \mathrm{CrossEntropyLoss}(q_s, q_{k_u+1})$
wherein CrossEntropyLoss is the cross-entropy loss function;
the learning rate is updated by the gradient optimization algorithm AdaGrad, and the model parameters are updated by back-propagation iterations, so that the model is trained by minimizing the loss function.
Further, the step B3 of inputting the target query predicted in step B1 and the user's real target query in the user session into the discriminator network, outputting the category probability, and judging from the category probability whether the input target query is the target query predicted by the generator network or the user's real target query specifically includes the following steps:
step B31: the text of each query $q_i$, $i = 1, 2, \ldots, k_u$, in the user session $S_u$ obtained in step B11, excluding the user's real target query $q_{k_u+1}$, is input into the discriminator network for encoding, obtaining the characterization vector of each $q_i$, $i = 1, 2, \ldots, k_u$;
step B32: the characterization vectors of the queries $q_i$ obtained in step B31 form a sequence, which is input into the GRU network module of the discriminator network for encoding, obtaining the characterization vector of the query sequence;
step B33: the user's real target query $q_{k_u+1}$ in the user session $S_u$ and the target query $q_s$ predicted in step B14 are separately input into the GRU network module of the discriminator network for encoding, obtaining the characterization vectors of $q_{k_u+1}$ and of $q_s$; wherein the word vector representations of the j-th word of $q_{k_u+1}$ and of the k-th word of $q_s$, $k = 1, 2, \ldots, L(q_s)$, are obtained by lookup in the pre-trained word vector matrix $E \in \mathbb{R}^{d \times |D|}$;
step B34: the sequence characterization obtained in step B32 and the characterization vector of $q_{k_u+1}$ obtained in step B33 are input into the GRU network module of the discriminator network for encoding; likewise, the sequence characterization obtained in step B32 and the characterization vector of $q_s$ obtained in step B33 are input into the GRU network module of the discriminator network for encoding;
step B35: the two codes obtained in step B34 are respectively input into the softmax layer of the discriminator network, which outputs the category probability with which the discriminator network considers the input to be the target query predicted by the generator network or the user's real target query.
Further, the step B4 specifically includes the following steps:
step B41: the process by which the model generates the query recommendation is regarded as an action sequence, the generator network based on the improved VHRED is regarded as the policy, and the probability obtained in step B35 is regarded as the reward R of the generator network; the loss value is calculated as:
J(θ) = E(R - b | θ)
wherein E denotes the expectation of the reward, b is a baseline value, a balancing term that stabilizes training, and θ denotes the parameters of the generator network;
step B42: from the loss-value formula of step B41, the update gradient is obtained by the likelihood-ratio approximation:
$\nabla_\theta J(\theta) = E\big[(R - b)\,\nabla_\theta \log p_\theta(q_s)\big]$
and the parameters of the generator network are retrained on the basis of the update gradient; through repeated iterative updates, the target query predicted by the generator network becomes closer to the user's real target query, thereby obtaining the trained query recommendation deep learning network model.
The invention also provides a query recommendation system adopting the above method, which comprises:
the data collection module is used for collecting all user query log records in the search engine;
the preprocessing module is used for preprocessing the collected user query log record data, extracting the time characteristic information and text characteristic information of the queries, and constructing a user query log training set TS;
the network training module is used for training the VHRED model with time characteristics using the obtained user query log training set: it generates target query recommendations from the time characteristic information and text characteristic information of all queries in a session, calculates the corresponding loss values, and trains the whole VHRED model with time characteristics with the goal of minimizing the loss value, obtaining the trained VHRED model; the recommended query generated by the trained VHRED model and the query generated by the real user are then passed through the discriminator network to obtain a probability value R as the reward; the reward R is used to continuously modify the learning rate learning_rate and control the gradient descent direction, so that the parameters of the generator network based on the improved VHRED are retrained; through repeated iterative updates, the required trained query recommendation deep learning network model is finally obtained; and
and the query recommendation module is used for receiving the query sentence input by the user, inputting the query sentence into the trained query recommendation deep learning network model and outputting the matched query recommendation.
Compared with the prior art, the invention has the following beneficial effects: by constructing and training a query recommendation deep learning network model based on VHRED with time characteristics and reinforcement learning, the method and system generate query recommendations quickly and robustly, accurately capture the user's query intent, produce query recommendations that meet user needs, and have good practicability and high application value.
Drawings
Fig. 1 is a flowchart of a method implementation according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a system according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the embodiments.
The invention provides a query recommendation method based on improved VHRED and reinforcement learning, which comprises the following steps:
Step A: collecting user query log records of a search engine, preprocessing the user query log record data, and constructing a user query log training set TS. This specifically includes the following steps:
step A1: collecting user query log records of a search engine to obtain an original query log set; wherein each query log of the search engine is represented by a triplet (u, q, t), u representing the user, q representing the query, and t representing the query time.
Step A2: dividing the original query log set by user and sorting by query time to obtain the query log subsets of different users.
Step A3: setting a time interval T according to the following rule: query logs with query time intervals larger than T belong to different sessions, queries in different sessions are not related to each other, the last query in the same session is a target query containing a query intention of a user, a query log subset of each user is further divided into a plurality of sessions to obtain a session set of each user, and the session sets of all the users form a user query log training set TS;
one session of a user u in TS is represented as $S_u = \{(q_1, t_1), (q_2, t_2), \ldots, (q_{k_u+1}, t_{k_u+1})\}$, wherein $q_i$ denotes the i-th query in the session and $t_i$ denotes the query time corresponding to $q_i$; the session contains $k_u + 1$ queries, and the last query $q_{k_u+1}$ is the user's real target query;
after word segmentation and stop-word removal, $q_i$ is further represented as $q_i = (w^i_1, w^i_2, \ldots, w^i_{L(q_i)})$, wherein $w^i_j$ denotes the j-th word in $q_i$, $j = 1, 2, \ldots, L(q_i)$, and $L(q_i)$ denotes the number of words in $q_i$; the query time $t_i$ corresponding to $q_i$ is represented as $t_i = (x_i, y_i, z_i, d_i)$, wherein $x_i$ denotes the hour, $y_i$ the minute, $z_i$ the second, and $d_i$ the day of the week.
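To make the session-splitting rule of step A3 concrete, the following sketch groups one user's time-ordered query log into sessions whenever the gap between consecutive queries exceeds T, and extracts the (hour, minute, second, weekday) time tuple. It is a minimal illustration: the toy data, the threshold value and the helper names are assumptions, not part of the patented implementation.

```python
from datetime import datetime, timedelta

# A query log record is a triplet (u, q, t): user, query text, query time.
# Assumed toy data; in practice these come from the search engine's logs.
logs = [
    ("u1", "gru tutorial", datetime(2020, 1, 20, 9, 0, 12)),
    ("u1", "gru vs lstm", datetime(2020, 1, 20, 9, 3, 40)),
    ("u1", "vhred paper", datetime(2020, 1, 20, 15, 22, 5)),
]

T = timedelta(minutes=30)  # assumed time-interval threshold between sessions

def split_sessions(user_logs, gap=T):
    """Split one user's time-ordered (q, t) pairs into sessions: a gap
    larger than `gap` starts a new session; the last query of each
    session is taken as the target query."""
    sessions, current = [], []
    for _, q, t in sorted(user_logs, key=lambda r: r[2]):
        if current and t - current[-1][1] > gap:
            sessions.append(current)
            current = []
        current.append((q, t))
    if current:
        sessions.append(current)
    return sessions

for s in split_sessions(logs):
    *context, target = s
    # t_i is further represented as (hour, minute, second, weekday)
    feats = [(t.hour, t.minute, t.second, t.weekday()) for _, t in s]
    print("context:", [q for q, _ in context],
          "| target:", target[0], "| time feats:", feats)
```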
Step B: training a query recommendation deep learning network model based on VHRED with time characteristics and reinforcement learning by using the user query log training set TS.
The query recommendation deep learning network model comprises a generator network based on a Variable Hierarchical Recurrent Encoder-Decoder (VHRED) with time characteristics and a discriminator network based on a Hierarchical AutoEncoder, wherein the hierarchical autoencoder encodes words, sentences and paragraphs with separate layers of GRUs to capture semantic structure information at different levels (word level, sentence level and paragraph level). Step B specifically includes the following steps:
Step B1: inputting query text and query time pairs in the user query log training set TS into the generator network, in units of user sessions, and outputting the target query predicted by the generator network. This specifically includes the following steps:
step B11: taking the user session as the unit, encoding each query text and query time pair $(q_i, t_i)$ in the user session except the target query to obtain its characterization vector $v_i$;
for the user session $S_u = \{(q_1, t_1), \ldots, (q_{k_u+1}, t_{k_u+1})\}$, the characterization vector of a query text and query time pair $(q_i, t_i)$ is represented as $v_i = [v_{q_i}; v_{t_i}]$, $i = 1, 2, \ldots, k_u$, the concatenation of the characterization vector $v_{q_i}$ of $q_i$ and the characterization vector $v_{t_i}$ of $t_i$;
wherein the encoding formula of $q_i$ is $v_{q_i} = \mathrm{GRU}\big(e(w^i_1), e(w^i_2), \ldots, e(w^i_{L(q_i)})\big)$, wherein GRU denotes a gated recurrent neural network and $e(w^i_j)$, $j = 1, 2, \ldots, L(q_i)$, is the word vector representation of the j-th word $w^i_j$ in $q_i$, obtained by lookup in a pre-trained word vector matrix $E \in \mathbb{R}^{d \times |D|}$, wherein d denotes the dimension of the word vectors and |D| is the number of words in the lexicon D;
the characterization vector $v_{t_i}$ of $t_i$ is obtained analogously by encoding the time tuple $(x_i, y_i, z_i, d_i)$.
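The pair encoding of step B11 can be sketched in PyTorch as follows. The patent fixes only that the query words pass through a GRU over pre-trained word vectors and that the pair vector is the concatenation of the query and time characterizations, so the module names, the dimensions and the linear-plus-tanh time encoder here are assumptions.

```python
import torch
import torch.nn as nn

class QueryTimePairEncoder(nn.Module):
    def __init__(self, vocab_size, d=128, d_time=16):
        super().__init__()
        # E in R^{d x |D|}: pre-trained word vector matrix (randomly
        # initialized here for the sketch; loaded from file in practice).
        self.embed = nn.Embedding(vocab_size, d)
        self.query_gru = nn.GRU(d, d, batch_first=True)
        # time tuple (hour, minute, second, weekday) -> d_time vector
        self.time_fc = nn.Linear(4, d_time)

    def forward(self, word_ids, time_tuple):
        # word_ids: (1, L(q_i)) word indices of query q_i
        # time_tuple: (1, 4) tensor (x_i, y_i, z_i, d_i)
        _, h = self.query_gru(self.embed(word_ids))   # h: (1, 1, d)
        v_q = h.squeeze(0)                            # query characterization
        v_t = torch.tanh(self.time_fc(time_tuple))    # time characterization
        return torch.cat([v_q, v_t], dim=-1)          # v_i = [v_q ; v_t]

enc = QueryTimePairEncoder(vocab_size=10000)
v_i = enc(torch.tensor([[3, 17, 256]]), torch.tensor([[9.0, 3.0, 40.0, 0.0]]))
print(v_i.shape)  # torch.Size([1, 144])
```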
Step B12: taking the user session as the unit, the characterization vectors of the query text and query time pairs in the user session except the target query form a sequence $(v_1, v_2, \ldots, v_{k_u})$, which is input into the GRU-based encoder module of the generator network for encoding, obtaining the characterization vector $v_{S_u}$ of the user session $S_u$, which contains all query information in the user session except the target query;
wherein the encoding formula of $S_u$ is $v_{S_u} = \mathrm{GRU}(v_1, v_2, \ldots, v_{k_u})$.
Step B13: first, $v_{S_u}$ is input into the feedforward neural network module of the generator network to obtain the mean $\mu_u \in \mathbb{R}^{d_z}$, computed by applying the feedforward neural network $f_{FNN}$ together with the hyperbolic tangent function tanh to $v_{S_u}$, wherein $d_z$ is the dimension of the hidden variable $z_u$;
the mean $\mu_u$ is then input into a softplus function to calculate the covariance $\Sigma_u$:
$\Sigma_u = \mathrm{softplus}(f(\mu_u))$
then, according to the mean $\mu_u$ and the covariance $\Sigma_u$, the latent variable $z_u$ is obtained by random sampling:
$z_u = \mu_u + \Sigma_u \odot \mathrm{samples}$
wherein samples is a random number vector, $\mathrm{samples} \in \mathbb{R}^{d_z}$, formed by drawing $d_z$ random numbers from a standard normal distribution, and $\Sigma_u \odot \mathrm{samples}$ denotes the Hadamard product of the vector $\Sigma_u$ with samples, thereby obtaining the hidden variable $z_u$ of the user session.
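The sampling in step B13 is a reparameterization: the session vector is mapped to a mean and a softplus-constrained covariance, and $z_u$ is formed with a Hadamard product against standard-normal draws. A minimal PyTorch sketch, with the shapes of the feedforward modules assumed:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SessionLatent(nn.Module):
    """Maps the session characterization v_Su to (mu_u, Sigma_u) and
    samples the hidden variable z_u = mu_u + Sigma_u (Hadamard) samples."""
    def __init__(self, d_session=144, d_z=64):
        super().__init__()
        self.f_mu = nn.Linear(d_session, d_z)   # feedforward module f_FNN
        self.f_sigma = nn.Linear(d_z, d_z)      # the f inside softplus(f(mu))

    def forward(self, v_su):
        mu = self.f_mu(torch.tanh(v_su))        # mean mu_u in R^{d_z}
        sigma = F.softplus(self.f_sigma(mu))    # diagonal covariance Sigma_u
        samples = torch.randn_like(sigma)       # d_z draws from N(0, 1)
        z = mu + sigma * samples                # Hadamard product + shift
        return z, mu, sigma

z_u, mu_u, sigma_u = SessionLatent()(torch.randn(1, 144))
print(z_u.shape)  # torch.Size([1, 64])
```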
Step B14: $v_{S_u}$ obtained in step B12 and $z_u$ obtained in step B13 are input into the GRU-based decoder module of the generator network, which decodes them and outputs the target query predicted by the generator network;
first, the initial hidden state $h_0$ of the decoder is calculated from the characterization vector $v_{S_u}$ of the user session and the hidden variable $z_u$;
then $h_0$ is decoded through the GRU network, the hidden state vector obtained at each decoding step is decoded again through the GRU network, and decoding is repeated $K_{target}$ times to generate a target query $q_s$ containing $K_{target}$ words; the word probabilities at each decoding step are computed by a fully connected layer f with parameters $W_2$, $W_3$ and bias term $b_{prob}$, wherein $W_3 \in \mathbb{R}^{d \times d}$ and $b_{prob} \in \mathbb{R}^d$;
the score $\mathrm{score}(q'_s)$ of a candidate target query $q'_s$ is the product of the decoding probabilities; taking the logarithm, the score becomes the sum of the log-probabilities of the decoded words:
$\log \mathrm{score}(q'_s) = \sum_{j=1}^{K_{target}} \log p(w^s_j \mid w^s_{<j}, v_{S_u}, z_u)$
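A sketch of the decoding loop of step B14 with the summed log-probability score. The $W_2$/$W_3$/$b_{prob}$ projection shapes follow the description, while the initial-state layer and the greedy word choice are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TargetQueryDecoder(nn.Module):
    def __init__(self, vocab_size, d=128, d_session=144, d_z=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)
        self.init_h = nn.Linear(d_session + d_z, d)   # h_0 from [v_Su ; z_u]
        self.gru_cell = nn.GRUCell(d, d)
        self.W2 = nn.Linear(d, d, bias=False)
        self.W3 = nn.Linear(d, d, bias=False)          # W3 in R^{d x d}
        self.out = nn.Linear(d, vocab_size)            # carries a b_prob-like bias

    def forward(self, v_su, z_u, k_target=5, bos_id=1):
        h = torch.tanh(self.init_h(torch.cat([v_su, z_u], dim=-1)))
        w = torch.tensor([bos_id])
        words, log_score = [], 0.0
        for _ in range(k_target):                      # repeat K_target decodes
            h = self.gru_cell(self.embed(w), h)
            logits = self.out(torch.tanh(self.W2(h) + self.W3(self.embed(w))))
            log_p = F.log_softmax(logits, dim=-1)
            w = log_p.argmax(dim=-1)                   # greedy choice (assumption)
            log_score = log_score + log_p[0, w]        # sum of log-probabilities
            words.append(int(w))
        return words, log_score

dec = TargetQueryDecoder(vocab_size=10000)
q_s, s = dec(torch.randn(1, 144), torch.randn(1, 64))
print(q_s, float(s))
```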
Step B2: calculating the gradient of each parameter in the generator network by a back-propagation method according to the target loss function loss, and updating the parameters by a stochastic gradient descent method.
Wherein the target loss function loss is defined as follows:
loss=loss1+loss2
wherein loss1 measures, using the KL divergence, the difference between the distribution of the hidden variable and the unit Gaussian distribution:
$loss_1 = \mathrm{KL}\big(\mathcal{N}(\mu_u, \Sigma_u) \,\|\, \mathcal{N}(0, I)\big)$
wherein $\mu_u$ and $\Sigma_u$ are the mean and covariance obtained in step B13;
loss2 is the loss between the target query $q_s$ predicted by the generator network and the target query $q_{k_u+1}$ of session $S_u$ representing the user's real query intent, calculated by the cross-entropy loss function:
$loss_2 = \mathrm{CrossEntropyLoss}(q_s, q_{k_u+1})$
wherein CrossEntropyLoss is the cross-entropy loss function;
the learning rate is updated by the gradient optimization algorithm AdaGrad, and the model parameters are updated by back-propagation iterations, so that the model is trained by minimizing the loss function.
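The combined objective of step B2 can be written directly in PyTorch. The closed-form diagonal-Gaussian KL term and per-word cross entropy below are the standard realizations of the description; treating $\Sigma_u$ as a vector of standard deviations is an assumption:

```python
import torch
import torch.nn.functional as F

def vhred_loss(mu_u, sigma_u, word_logits, target_ids):
    """loss = loss1 + loss2.
    loss1: KL( N(mu_u, diag(sigma_u^2)) || N(0, I) ), closed form.
    loss2: cross entropy between the decoder's word logits and the
           user's real target query q_{k_u+1}."""
    var = sigma_u ** 2
    loss1 = 0.5 * torch.sum(var + mu_u ** 2 - 1.0 - torch.log(var))
    loss2 = F.cross_entropy(word_logits, target_ids)   # CrossEntropyLoss
    return loss1 + loss2

mu, sigma = torch.zeros(1, 64), torch.ones(1, 64)
logits = torch.randn(5, 10000, requires_grad=True)     # K_target = 5 decode steps
target = torch.randint(0, 10000, (5,))                 # real target query word ids
loss = vhred_loss(mu, sigma, logits, target)
loss.backward()
# parameters would then be updated with torch.optim.Adagrad, as named above
print(float(loss))
```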
Step B3: inputting the target query predicted in the step B1 and the real target query of the user in the user session into the discriminator network, outputting the category probability, and judging whether the input target query is the target query predicted by the generator network or the real target query of the user according to the category probability. The method specifically comprises the following steps:
step B31: the text of each query $q_i$, $i = 1, 2, \ldots, k_u$, in the user session $S_u$ obtained in step B11, excluding the user's real target query $q_{k_u+1}$, is input into the discriminator network for encoding, obtaining the characterization vector of each $q_i$, $i = 1, 2, \ldots, k_u$;
step B32: the characterization vectors of the queries $q_i$ obtained in step B31 form a sequence, which is input into the GRU network module of the discriminator network for encoding, obtaining the characterization vector of the query sequence;
step B33: the user's real target query $q_{k_u+1}$ in the user session $S_u$ and the target query $q_s$ predicted in step B14 are separately input into the GRU network module of the discriminator network for encoding, obtaining the characterization vectors of $q_{k_u+1}$ and of $q_s$; wherein the word vector representations of the j-th word of $q_{k_u+1}$ and of the k-th word of $q_s$, $k = 1, 2, \ldots, L(q_s)$, are obtained by lookup in the pre-trained word vector matrix $E \in \mathbb{R}^{d \times |D|}$;
step B34: the sequence characterization obtained in step B32 and the characterization vector of $q_{k_u+1}$ obtained in step B33 are input into the GRU network module of the discriminator network for encoding; likewise, the sequence characterization obtained in step B32 and the characterization vector of $q_s$ obtained in step B33 are input into the GRU network module of the discriminator network for encoding;
step B35: the two codes obtained in step B34 are respectively input into the softmax layer of the discriminator network, which outputs the category probability with which the discriminator network considers the input to be the target query predicted by the generator network or the user's real target query.
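A sketch of the hierarchical discriminator of steps B31-B35. The GRU modules and the two-way softmax follow the description, while the dimensions, the batch handling and the weight sharing between the sequence-level GRUs are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalDiscriminator(nn.Module):
    def __init__(self, vocab_size, d=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)
        self.word_gru = nn.GRU(d, d, batch_first=True)   # B31/B33: query encoding
        self.query_gru = nn.GRU(d, d, batch_first=True)  # B32/B34: sequence encoding
        self.cls = nn.Linear(d, 2)                       # B35: softmax layer

    def encode_query(self, word_ids):
        _, h = self.word_gru(self.embed(word_ids))
        return h.squeeze(0)                              # (1, d)

    def forward(self, context_queries, candidate_query):
        # B31: encode each context query text (target query excluded)
        q_vecs = torch.stack([self.encode_query(q) for q in context_queries], dim=1)
        # B32: encode the sequence of query vectors into a session code
        _, h_ctx = self.query_gru(q_vecs)
        # B33: encode the candidate (predicted or real) target query
        h_cand = self.encode_query(candidate_query).unsqueeze(1)
        # B34: run the session code and the candidate code through the GRU module
        _, h_final = self.query_gru(h_cand, h_ctx)
        # B35: class probability: predicted-by-generator vs. real user query
        return F.softmax(self.cls(h_final.squeeze(0)), dim=-1)

disc = HierarchicalDiscriminator(vocab_size=10000)
ctx = [torch.tensor([[3, 17]]), torch.tensor([[5, 9, 2]])]
probs = disc(ctx, torch.tensor([[11, 4, 8]]))
print(probs)  # e.g. tensor([[p_generated, p_real]], grad_fn=...)
```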
Step B4: taking the category probability output by the discriminator network in step B3 as the reward of the generator network, and performing reinforcement learning training using a policy gradient method so as to maximize the expected return. This specifically includes the following steps:
step B41: the process by which the model generates the query recommendation is regarded as an action sequence, the generator network based on the improved VHRED is regarded as the policy, and the probability obtained in step B35 is regarded as the reward R of the generator network; the loss value is calculated as:
J(θ) = E(R - b | θ)
wherein E denotes the expectation of the reward, b is a baseline value, a balancing term that stabilizes training, and θ denotes the parameters of the generator network;
step B42: from the loss-value formula of step B41, the update gradient is obtained by the likelihood-ratio approximation:
$\nabla_\theta J(\theta) = E\big[(R - b)\,\nabla_\theta \log p_\theta(q_s)\big]$
if the reward R for an action is large, the probability of generating that sequence next time increases; for sequences with lower reward R, generation is relatively suppressed; the baseline value b is therefore subtracted so that R - b can be positive or negative.
In brief, the probability R is used as the reward to modify the learning rate learning_rate and control the gradient descent direction; the parameters of the generator network are retrained on the basis of the update gradient, and through repeated iterative updates the target query predicted by the generator network becomes closer to the user's real target query, thereby obtaining the trained query recommendation deep learning network model.
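Steps B41-B42 amount to a REINFORCE policy-gradient update with a baseline. The sketch below uses the discriminator's probability as the reward R as described, while the stand-in log-probability and the Adagrad optimizer wiring are assumptions:

```python
import torch
import torch.nn.functional as F

def policy_gradient_step(log_prob_qs, reward, baseline, optimizer):
    """One REINFORCE update: the generator is the policy, the
    discriminator's class probability is the reward R, and the baseline b
    is subtracted so that R - b can be positive or negative. Minimizing
    -(R - b) * log p_theta(q_s) ascends the likelihood-ratio gradient
    (R - b) * grad_theta log p_theta(q_s)."""
    loss = -((reward - baseline) * log_prob_qs)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)

# Toy usage with a stand-in differentiable log-probability; in the full
# model, log_prob_qs is the decoder's log score of the sampled query q_s.
theta = torch.nn.Parameter(torch.tensor(0.0))
optimizer = torch.optim.Adagrad([theta], lr=0.1)
log_prob_qs = -F.softplus(theta)
print(policy_gradient_step(log_prob_qs, reward=0.8, baseline=0.5,
                           optimizer=optimizer))
```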
Step B5: when the change of the loss value of the query recommendation deep learning network model between iterations is smaller than a set threshold, or the maximum number of iterations is reached, the training of the query recommendation deep learning network model is terminated.
Step C: the query recommendation system receives the query sentence input by the user, inputs the query sentence into the trained query recommendation deep learning network model, and outputs the matched query recommendations.
The invention also provides a query recommendation system adopting the method, as shown in fig. 2, comprising:
the data collection module is used for collecting all user query log records in the search engine;
the preprocessing module is used for preprocessing the collected user query log record data, extracting the time characteristic information and text characteristic information of the queries, and constructing a user query log training set TS;
the network training module is used for training the VHRED model with time characteristics using the obtained user query log training set: it generates target query recommendations from the time characteristic information and text characteristic information of all queries in a session, calculates the corresponding loss values, and trains the whole VHRED model with time characteristics with the goal of minimizing the loss value, obtaining the trained VHRED model; the recommended query generated by the trained VHRED model and the query generated by the real user are then passed through the discriminator network to obtain a probability value R as the reward; the reward R is used to continuously modify the learning rate learning_rate and control the gradient descent direction, so that the parameters of the generator network based on the improved VHRED are retrained; through repeated iterative updates, the required trained query recommendation deep learning network model is finally obtained; and
and the query recommendation module is used for receiving the query sentence input by the user, inputting the query sentence into the trained query recommendation deep learning network model and outputting the matched query recommendation.
The above are preferred embodiments of the present invention; all changes made according to the technical solution of the present invention that produce equivalent effects, without departing from the scope of the technical solution of the present invention, fall within the protection scope of the present invention.
Claims (7)
1. A query recommendation method based on improved VHRED and reinforcement learning is characterized by comprising the following steps:
step A: collecting user query log records of a search engine, preprocessing the user query log record data, and constructing a user query log training set TS;
step B: training a query recommendation deep learning network model based on VHRED with time characteristics and reinforcement learning by using the user query log training set TS;
step C: a query recommendation system receives a query sentence input by the user, inputs the query sentence into the trained query recommendation deep learning network model, and outputs matched query recommendations;
wherein the query recommendation deep learning network model comprises a generator network based on a variable hierarchical recurrent encoder-decoder (VHRED) with time characteristics and a discriminator network based on a hierarchical autoencoder, the hierarchical autoencoder encoding words, sentences and paragraphs with separate layers of GRUs so as to capture semantic structure information at different levels; the step B specifically comprises the following steps:
step B1: inputting query text and query time pairs in a user query log training set TS into a generator network by taking a user session as a unit, and outputting a target query predicted by the generator network;
step B2: calculating the gradient of each parameter in the generator network by a back-propagation method according to the target loss function loss, and updating the parameters by a stochastic gradient descent method;
step B3: inputting the target query predicted in the step B1 and the real target query of the user in the user session into a discriminator network, outputting category probability, and judging whether the input target query is the target query predicted by the generator network or the real target query of the user according to the category probability;
step B4: taking the category probability output by the discriminator network in step B3 as the reward of the generator network, and performing reinforcement learning training using a policy gradient method so as to maximize the expected return;
step B5: when the change of the loss value of the query recommendation deep learning network model between iterations is smaller than a set threshold, or the maximum number of iterations is reached, terminating the training of the query recommendation deep learning network model.
2. The method of claim 1, wherein the step A specifically comprises the following steps:
step A1: collecting user query log records of a search engine to obtain an original query log set; wherein each query log of the search engine is represented by a triplet (u, q, t), u representing a user, q representing a query, and t representing a query time;
step A2: dividing the original query log set by user and sorting by query time to obtain the query log subsets of different users;
step A3: setting a time interval T according to the following rule: query logs with query time intervals larger than T belong to different sessions, queries in different sessions are not related to each other, the last query in the same session is a target query containing a query intention of a user, a query log subset of each user is further divided into a plurality of sessions to obtain a session set of each user, and the session sets of all the users form a user query log training set TS;
one session of a user u in TS is represented as $S_u = \{(q_1, t_1), (q_2, t_2), \ldots, (q_{k_u+1}, t_{k_u+1})\}$, wherein $q_i$ denotes the i-th query in the session and $t_i$ denotes the query time corresponding to $q_i$; the session contains $k_u + 1$ queries, and the last query $q_{k_u+1}$ is the user's real target query;
after word segmentation and stop-word removal, $q_i$ is further represented as $q_i = (w^i_1, w^i_2, \ldots, w^i_{L(q_i)})$, wherein $w^i_j$ denotes the j-th word in $q_i$, $j = 1, 2, \ldots, L(q_i)$, and $L(q_i)$ denotes the number of words in $q_i$; the query time $t_i$ corresponding to $q_i$ is represented as $t_i = (x_i, y_i, z_i, d_i)$, wherein $x_i$ denotes the hour, $y_i$ the minute, $z_i$ the second, and $d_i$ the day of the week.
3. The method for query recommendation based on VHRED and reinforcement learning of claim 1, wherein said step B1 of inputting query text and query time pairs in the user query log training set TS into the generator network, in units of user sessions, and outputting the target query predicted by the generator network comprises the following steps:
step B11: taking the user session as the unit, encoding each query text and query time pair $(q_i, t_i)$ in the user session except the target query to obtain its characterization vector $v_i$;
for the user session $S_u = \{(q_1, t_1), \ldots, (q_{k_u+1}, t_{k_u+1})\}$, the characterization vector of a query text and query time pair $(q_i, t_i)$ is represented as $v_i = [v_{q_i}; v_{t_i}]$, $i = 1, 2, \ldots, k_u$, the concatenation of the characterization vector $v_{q_i}$ of $q_i$ and the characterization vector $v_{t_i}$ of $t_i$;
wherein the encoding formula of $q_i$ is $v_{q_i} = \mathrm{GRU}\big(e(w^i_1), e(w^i_2), \ldots, e(w^i_{L(q_i)})\big)$, wherein GRU denotes a gated recurrent neural network and $e(w^i_j)$, $j = 1, 2, \ldots, L(q_i)$, is the word vector representation of the j-th word $w^i_j$ in $q_i$, obtained by lookup in a pre-trained word vector matrix $E \in \mathbb{R}^{d \times |D|}$, wherein d denotes the dimension of the word vectors and |D| is the number of words in the lexicon D;
the characterization vector $v_{t_i}$ of $t_i$ is obtained analogously by encoding the time tuple $(x_i, y_i, z_i, d_i)$;
step B12: taking the user session as the unit, the characterization vectors of the query text and query time pairs in the user session except the target query form a sequence $(v_1, v_2, \ldots, v_{k_u})$, which is input into the GRU-based encoder module of the generator network for encoding, obtaining the characterization vector $v_{S_u}$ of the user session $S_u$, which contains all query information in the user session except the target query;
wherein the encoding formula of $S_u$ is $v_{S_u} = \mathrm{GRU}(v_1, v_2, \ldots, v_{k_u})$;
step B13: first, $v_{S_u}$ is input into the feedforward neural network module of the generator network to obtain the mean $\mu_u \in \mathbb{R}^{d_z}$, computed by applying the feedforward neural network $f_{FNN}$ together with the hyperbolic tangent function tanh to $v_{S_u}$, wherein $d_z$ is the dimension of the hidden variable $z_u$;
the mean $\mu_u$ is then input into a softplus function to calculate the covariance $\Sigma_u$:
$\Sigma_u = \mathrm{softplus}(f(\mu_u))$
then, according to the mean $\mu_u$ and the covariance $\Sigma_u$, the latent variable $z_u$ is obtained by random sampling:
$z_u = \mu_u + \Sigma_u \odot \mathrm{samples}$
wherein samples is a random number vector, $\mathrm{samples} \in \mathbb{R}^{d_z}$, formed by drawing $d_z$ random numbers from a standard normal distribution, and $\Sigma_u \odot \mathrm{samples}$ denotes the Hadamard product of the vector $\Sigma_u$ with samples, thereby obtaining the hidden variable $z_u$ of the user session;
step B14: $v_{S_u}$ obtained in step B12 and $z_u$ obtained in step B13 are input into the GRU-based decoder module of the generator network, which decodes them and outputs the target query predicted by the generator network;
first, the initial hidden state $h_0$ of the decoder is calculated from the characterization vector $v_{S_u}$ of the user session and the hidden variable $z_u$;
then $h_0$ is decoded through the GRU network, the hidden state vector obtained at each decoding step is decoded again through the GRU network, and decoding is repeated $K_{target}$ times to generate a target query $q_s$ containing $K_{target}$ words; the word probabilities at each decoding step are computed by a fully connected layer f with parameters $W_2$, $W_3$ and bias term $b_{prob}$, wherein $W_3 \in \mathbb{R}^{d \times d}$ and $b_{prob} \in \mathbb{R}^d$;
the score $\mathrm{score}(q'_s)$ of a candidate target query $q'_s$ is the product of the decoding probabilities; taking the logarithm, the score becomes the sum of the log-probabilities of the decoded words:
$\log \mathrm{score}(q'_s) = \sum_{j=1}^{K_{target}} \log p(w^s_j \mid w^s_{<j}, v_{S_u}, z_u)$
4. The method for recommending queries based on VHRED and reinforcement learning of claim 3, wherein in said step B2, the objective loss function loss is defined as follows:
loss=loss1+loss2
wherein loss1 measures, using the KL divergence, the difference between the distribution of the hidden variable and the unit Gaussian distribution:
$loss_1 = \mathrm{KL}\big(\mathcal{N}(\mu_u, \Sigma_u) \,\|\, \mathcal{N}(0, I)\big)$
wherein $\mu_u$ and $\Sigma_u$ are the mean and covariance obtained in step B13;
loss2 is the loss between the target query $q_s$ predicted by the generator network and the target query $q_{k_u+1}$ of session $S_u$ representing the user's real query intent, calculated by the cross-entropy loss function:
$loss_2 = \mathrm{CrossEntropyLoss}(q_s, q_{k_u+1})$
wherein CrossEntropyLoss is the cross-entropy loss function;
the learning rate is updated by the gradient optimization algorithm AdaGrad, and the model parameters are updated by back-propagation iterations, so that the model is trained by minimizing the loss function.
5. The method of claim 4, wherein the step B3 of inputting the target query predicted in the step B1 and the real target query of the user in the user session into the network of discriminators, outputting a category probability, and determining whether the input target query is the target query predicted by the generator network or the real target query of the user according to the category probability comprises the following steps:
step B31: the text of each query $q_i$, $i = 1, 2, \ldots, k_u$, in the user session $S_u$ obtained in step B11, excluding the user's real target query $q_{k_u+1}$, is input into the discriminator network for encoding, obtaining the characterization vector of each $q_i$, $i = 1, 2, \ldots, k_u$;
step B32: the characterization vectors of the queries $q_i$ obtained in step B31 form a sequence, which is input into the GRU network module of the discriminator network for encoding, obtaining the characterization vector of the query sequence;
step B33: the user's real target query $q_{k_u+1}$ in the user session $S_u$ and the target query $q_s$ predicted in step B14 are separately input into the GRU network module of the discriminator network for encoding, obtaining the characterization vectors of $q_{k_u+1}$ and of $q_s$; wherein the word vector representations of the j-th word of $q_{k_u+1}$ and of the k-th word of $q_s$, $k = 1, 2, \ldots, L(q_s)$, are obtained by lookup in the pre-trained word vector matrix $E \in \mathbb{R}^{d \times |D|}$;
step B34: the sequence characterization obtained in step B32 and the characterization vector of $q_{k_u+1}$ obtained in step B33 are input into the GRU network module of the discriminator network for encoding; likewise, the sequence characterization obtained in step B32 and the characterization vector of $q_s$ obtained in step B33 are input into the GRU network module of the discriminator network for encoding;
step B35: the two codes obtained in step B34 are respectively input into the softmax layer of the discriminator network, which outputs the category probability with which the discriminator network considers the input to be the target query predicted by the generator network or the user's real target query.
6. The method for recommending queries based on VHRED and reinforcement learning of claim 5, wherein said step B4 specifically comprises the following steps:
step B41: the process by which the model generates the query recommendation is regarded as an action sequence, the generator network based on the improved VHRED is regarded as the policy, and the probability obtained in step B35 is regarded as the reward R of the generator network; the loss value is calculated as:
J(θ) = E(R - b | θ)
wherein E denotes the expectation of the reward, b is a baseline value, a balancing term that stabilizes training, and θ denotes the parameters of the generator network;
step B42: from the loss-value formula of step B41, the update gradient is obtained by the likelihood-ratio approximation:
$\nabla_\theta J(\theta) = E\big[(R - b)\,\nabla_\theta \log p_\theta(q_s)\big]$
and the parameters of the generator network are retrained on the basis of the update gradient; through repeated iterative updates, the target query predicted by the generator network becomes closer to the user's real target query, thereby obtaining the trained query recommendation deep learning network model.
7. A query recommendation system employing the method of any of claims 1-6, comprising:
the data collection module is used for collecting all user query log records in the search engine;
the preprocessing module is used for preprocessing the collected user query log record data, extracting the time characteristic information and text characteristic information of the queries, and constructing a user query log training set TS;
the network training module is used for training the VHRED model with time characteristics using the obtained user query log training set: it generates target query recommendations from the time characteristic information and text characteristic information of all queries in a session, calculates the corresponding loss values, and trains the whole VHRED model with time characteristics with the goal of minimizing the loss value, obtaining the trained VHRED model; the recommended query generated by the trained VHRED model and the query generated by the real user are then passed through the discriminator network to obtain a probability value R as the reward; the reward R is used to continuously modify the learning rate learning_rate and control the gradient descent direction, so that the parameters of the generator network based on the improved VHRED are retrained; through repeated iterative updates, the required trained query recommendation deep learning network model is finally obtained;
and the query recommendation module is used for receiving the query sentence input by the user, inputting the query sentence into the trained query recommendation deep learning network model and outputting the matched query recommendation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010067232.4A CN111274359B (en) | 2020-01-20 | 2020-01-20 | Query recommendation method and system based on improved VHRED and reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010067232.4A CN111274359B (en) | 2020-01-20 | 2020-01-20 | Query recommendation method and system based on improved VHRED and reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111274359A CN111274359A (en) | 2020-06-12 |
CN111274359B true CN111274359B (en) | 2022-06-14 |
Family
ID=70998997
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010067232.4A Active CN111274359B (en) | 2020-01-20 | 2020-01-20 | Query recommendation method and system based on improved VHRED and reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111274359B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114077783A (en) * | 2020-08-18 | 2022-02-22 | 中国电信股份有限公司 | Interactive simulation method and device for recommendation system based on reinforcement learning |
CN113360497B (en) * | 2021-05-26 | 2022-04-05 | 华中科技大学 | Multi-load-oriented automatic recommendation method and system for secondary indexes of cloud database |
CN115070753A (en) * | 2022-04-28 | 2022-09-20 | 同济大学 | Multi-target reinforcement learning method based on unsupervised image editing |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102609433A (en) * | 2011-12-16 | 2012-07-25 | 北京大学 | Method and system for recommending query based on user log |
CN106557563A (en) * | 2016-11-15 | 2017-04-05 | 北京百度网讯科技有限公司 | Query statement based on artificial intelligence recommends method and device |
CN107122469A (en) * | 2017-04-28 | 2017-09-01 | 中国人民解放军国防科学技术大学 | Sort method and device are recommended in inquiry based on semantic similarity and timeliness resistant frequency |
CN109145213A (en) * | 2018-08-22 | 2019-01-04 | 清华大学 | Inquiry recommended method and device based on historical information |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170185673A1 (en) * | 2015-12-25 | 2017-06-29 | Le Holdings (Beijing) Co., Ltd. | Method and Electronic Device for QUERY RECOMMENDATION |
- 2020-01-20: application CN202010067232.4A filed in China; granted as CN111274359B (status: active)
Non-Patent Citations (1)
Title |
---|
查询推荐研究综述 (A survey of query recommendation research); Zhang Xiaojuan et al.; 情报学报 (Journal of the China Society for Scientific and Technical Information); 2019-04-24; Vol. 38, No. 4; full text *
Also Published As
Publication number | Publication date |
---|---|
CN111274359A (en) | 2020-06-12 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |