CN111274359B - Query recommendation method and system based on improved VHRED and reinforcement learning - Google Patents
- Publication number
- CN111274359B (application CN202010067232.4A; also published as CN111274359A)
- Authority
- CN
- China
- Prior art keywords
- query
- user
- network
- target
- session
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F16/3338 Query expansion (information retrieval of unstructured textual data; querying; query processing; query translation)
- G06F16/36 Creation of semantic tools, e.g. ontology or thesauri (information retrieval of unstructured textual data)
- G06N3/045 Combinations of networks (computing arrangements based on biological models; neural networks; architecture)
- G06N3/084 Backpropagation, e.g. using gradient descent (neural network learning methods)
Abstract
The invention relates to a query recommendation method and system based on improved VHRED and reinforcement learning, wherein the method comprises the following steps: step A: collecting user query log records of a search engine, preprocessing the user query log record data, and constructing a user query log training set TS; step B: training a query recommendation deep learning network model based on VHRED with time characteristics and reinforcement learning by using the user query log training set TS; step C: the query recommendation system receives a query sentence input by the user, inputs the query sentence into the trained query recommendation deep learning network model, and outputs matched query recommendations. The method and system facilitate generating query recommendations that meet user needs.
Description
Technical Field
The invention relates to the field of natural language processing, in particular to a query recommendation method and system based on improved VHRED and reinforcement learning.
Background
Query suggestion provides suggested queries for the session entered by a user. Query suggestions enable a search engine to better understand the user's query intent and thus better optimize the user's query. This task has therefore received considerable attention over the last decade.
Cao et al. proposed a context-aware query suggestion framework that considers the entire sequence of queries in a session rather than just the last query. They constructed a concept sequence suffix tree from query clusters for efficient and effective context-aware query suggestion. The query sequence can also be modeled with a mixture of variable-memory Markov models. Context-aware query suggestion takes more user actions in the session into account and thereby better models the information need; the idea has also proved effective for query classification and ranking. Ozertem et al. developed a ranking framework that learns to suggest queries directly from search behavior in user search logs. It uses large-scale search logs and avoids the need for manual labeling. Supervised suggestion systems are generally more accurate and flexible than unsupervised ones, and their suggestions can also improve diversified and personalized search. Sordoni et al. developed a hierarchical encoder-decoder model (HRED) for context-aware query suggestion. The encoder first encodes query terms into a query embedding using a two-level recurrent neural network (RNN) and then encodes the query sequence into a session embedding; the decoder then decodes the session embedding into the target suggestion. HRED avoids sparsity by using smooth distributed representations, better exploiting the large-scale training data available in search logs. A recent study upgraded the sequence-to-sequence model of HRED, using attention and copy mechanisms to model the different importance of queries and the repeated terms in a session.
In the aforementioned work, only the HRED model is used as the generation model, and there is still room to improve its effectiveness. Moreover, most models ignore the time characteristics of queries, which have a great influence on the quality of the generated suggestions. If queries are generated by the generator model alone, the generated queries cannot be guaranteed to be close to queries formulated by real users, so they carry obvious machine-generation traces and fail to express the user's query intent well.
Disclosure of Invention
The invention aims to provide a query recommendation method and system based on improved VHRED and reinforcement learning, which facilitate generating query recommendations that meet user needs.
In order to achieve the above purpose, the invention adopts the following technical solution: a query recommendation method based on improved VHRED and reinforcement learning, comprising the following steps:
step A: collecting user query log records of a search engine, preprocessing the user query log record data, and constructing a user query log training set TS;
step B: training a query recommendation deep learning network model based on VHRED with time characteristics and reinforcement learning by using the user query log training set TS;
step C: the query recommendation system receives a query sentence input by the user, inputs the query sentence into the trained query recommendation deep learning network model, and outputs matched query recommendations.
Further, the step A specifically includes the following steps:
step A1: collecting user query log records of a search engine to obtain an original query log set; wherein each query log of the search engine is represented by a triplet (u, q, t), u representing a user, q representing a query, and t representing a query time;
step A2: dividing the original query log set by user and sorting by query time to obtain the query log subsets of different users;
step A3: setting a time interval T according to the following rule: query logs with query time intervals larger than T belong to different sessions, queries in different sessions are not related to each other, the last query in the same session is a target query containing a query intention of a user, a query log subset of each user is further divided into a plurality of sessions to obtain a session set of each user, and the session sets of all the users form a user query log training set TS;
one session of a user u in TS is represented as $S_u = \{(q_1, t_1), (q_2, t_2), \ldots, (q_{k_u+1}, t_{k_u+1})\}$, wherein $q_i$ denotes the i-th query in the session and $t_i$ denotes the query time corresponding to $q_i$; the session contains $k_u + 1$ queries, and the last query $q_{k_u+1}$ is the user's real target query;
after word segmentation and stop-word removal, $q_i$ is further represented as $q_i = (w^i_1, w^i_2, \ldots, w^i_{L(q_i)})$, wherein $w^i_j$ denotes the j-th word in $q_i$, $j = 1, 2, \ldots, L(q_i)$, and $L(q_i)$ denotes the number of words in $q_i$; the query time $t_i$ corresponding to $q_i$ is represented as $t_i = (x_i, y_i, z_i, d_i)$, wherein $x_i$ denotes the hour, $y_i$ the minute, $z_i$ the second, and $d_i$ the day of the week.
Further, the query recommendation deep learning network model comprises a generator network based on a variable hierarchical recurrent encoder-decoder (VHRED) with time characteristics and a discriminator network based on a hierarchical autoencoder, wherein the hierarchical autoencoder encodes words, sentences and paragraphs with separate layers of GRUs so as to capture semantic structure information at different levels; the step B specifically includes the following steps:
step B1: inputting query text and query time pairs in a user query log training set TS into a generator network by taking a user session as a unit, and outputting a target query predicted by the generator network;
step B2: calculating the gradient of each parameter in the generator network by a back-propagation method according to the target loss function loss, and updating the parameters by a stochastic gradient descent method;
step B3: inputting the target query predicted in the step B1 and the real target query of the user in the user session into a discriminator network, outputting category probability, and judging whether the input target query is the target query predicted by the generator network or the real target query of the user according to the category probability;
step B4: taking the category probability output by the discriminator network in step B3 as the reward of the generator network, and performing reinforcement learning training using a policy gradient method so as to maximize the expected return;
step B5: when the change of the loss value of the query recommendation deep learning network model between iterations is smaller than a set threshold, or the maximum number of iterations is reached, terminating the training of the query recommendation deep learning network model.
Further, the step B1 of inputting query text and query time pairs in the user query log training set TS into the generator network, in units of user sessions, and outputting the target query predicted by the generator network specifically includes the following steps:
step B11: taking the user session as the unit, encoding each query text and query time pair $(q_i, t_i)$ in the user session except the target query to obtain its characterization vector $v_i$;
for the user session $S_u = \{(q_1, t_1), \ldots, (q_{k_u+1}, t_{k_u+1})\}$, the characterization vector of a query text and query time pair $(q_i, t_i)$ is represented as $v_i = [v_{q_i}; v_{t_i}]$, $i = 1, 2, \ldots, k_u$, the concatenation of the characterization vector $v_{q_i}$ of $q_i$ and the characterization vector $v_{t_i}$ of $t_i$;
wherein the encoding formula of $q_i$ is $v_{q_i} = \mathrm{GRU}\big(e(w^i_1), e(w^i_2), \ldots, e(w^i_{L(q_i)})\big)$, wherein GRU denotes a gated recurrent neural network and $e(w^i_j)$, $j = 1, 2, \ldots, L(q_i)$, is the word vector representation of the j-th word $w^i_j$ in $q_i$, obtained by lookup in a pre-trained word vector matrix $E \in \mathbb{R}^{d \times |D|}$, wherein d denotes the dimension of the word vectors and |D| is the number of words in the lexicon D;
the characterization vector $v_{t_i}$ of $t_i$ is obtained analogously by encoding the time tuple $(x_i, y_i, z_i, d_i)$.
Step B12: taking the user session as the unit, the characterization vectors of the query text and query time pairs in the user session except the target query form a sequence $(v_1, v_2, \ldots, v_{k_u})$, which is input into the GRU-based encoder module of the generator network for encoding, obtaining the characterization vector $v_{S_u}$ of the user session $S_u$, which contains all query information in the user session except the target query;
wherein the encoding formula of $S_u$ is $v_{S_u} = \mathrm{GRU}(v_1, v_2, \ldots, v_{k_u})$.
Step B13: first, $v_{S_u}$ is input into the feedforward neural network module of the generator network to obtain the mean $\mu_u \in \mathbb{R}^{d_z}$, computed by applying the feedforward neural network $f_{FNN}$ together with the hyperbolic tangent function tanh to $v_{S_u}$, wherein $d_z$ is the dimension of the hidden variable $z_u$;
the mean $\mu_u$ is then input into a softplus function to calculate the covariance $\Sigma_u$:
$\Sigma_u = \mathrm{softplus}(f(\mu_u))$
then, according to the mean $\mu_u$ and the covariance $\Sigma_u$, the latent variable $z_u$ is obtained by random sampling:
$z_u = \mu_u + \Sigma_u \odot \mathrm{samples}$
wherein samples is a random number vector, $\mathrm{samples} \in \mathbb{R}^{d_z}$, formed by drawing $d_z$ random numbers from a standard normal distribution, and $\Sigma_u \odot \mathrm{samples}$ denotes the Hadamard product of the vector $\Sigma_u$ with samples, thereby obtaining the hidden variable $z_u$ of the user session.
Step B14: $v_{S_u}$ obtained in step B12 and $z_u$ obtained in step B13 are input into the GRU-based decoder module of the generator network, which decodes them and outputs the target query predicted by the generator network;
first, the initial hidden state $h_0$ of the decoder is calculated from the characterization vector $v_{S_u}$ of the user session and the hidden variable $z_u$;
then $h_0$ is decoded through the GRU network, the hidden state vector obtained at each decoding step is decoded again through the GRU network, and decoding is repeated $K_{target}$ times to generate a target query $q_s$ containing $K_{target}$ words; the word probabilities at each decoding step are computed by a fully connected layer f with parameters $W_2$, $W_3$ and bias term $b_{prob}$, wherein $W_3 \in \mathbb{R}^{d \times d}$ and $b_{prob} \in \mathbb{R}^d$;
the score $\mathrm{score}(q'_s)$ of a candidate target query $q'_s$ is the product of the decoding probabilities; taking the logarithm, the score becomes the sum of the log-probabilities of the decoded words:
$\log \mathrm{score}(q'_s) = \sum_{j=1}^{K_{target}} \log p(w^s_j \mid w^s_{<j}, v_{S_u}, z_u)$
Further, in the step B2, the target loss function loss is defined as follows:
loss=loss1+loss2
wherein loss1 measures, using the KL divergence, the difference between the distribution of the hidden variable and the unit Gaussian distribution:
$loss_1 = \mathrm{KL}\big(\mathcal{N}(\mu_u, \Sigma_u) \,\|\, \mathcal{N}(0, I)\big)$
wherein $\mu_u$ and $\Sigma_u$ are the mean and covariance obtained in step B13;
loss2 is the loss between the target query $q_s$ predicted by the generator network and the target query $q_{k_u+1}$ of session $S_u$ representing the user's real query intent, calculated by the cross-entropy loss function:
$loss_2 = \mathrm{CrossEntropyLoss}(q_s, q_{k_u+1})$
wherein CrossEntropyLoss is the cross-entropy loss function;
the learning rate is updated by the gradient optimization algorithm AdaGrad, and the model parameters are updated by back-propagation iterations, so that the model is trained by minimizing the loss function.
Further, the step B3 of inputting the target query predicted in step B1 and the user's real target query in the user session into the discriminator network, outputting the category probability, and judging from the category probability whether the input target query is the target query predicted by the generator network or the user's real target query specifically includes the following steps:
step B31: the text of each query $q_i$, $i = 1, 2, \ldots, k_u$, in the user session $S_u$ obtained in step B11, excluding the user's real target query $q_{k_u+1}$, is input into the discriminator network for encoding, obtaining the characterization vector of each $q_i$, $i = 1, 2, \ldots, k_u$;
step B32: the characterization vectors of the queries $q_i$ obtained in step B31 form a sequence, which is input into the GRU network module of the discriminator network for encoding, obtaining the characterization vector of the query sequence;
step B33: the user's real target query $q_{k_u+1}$ in the user session $S_u$ and the target query $q_s$ predicted in step B14 are separately input into the GRU network module of the discriminator network for encoding, obtaining the characterization vectors of $q_{k_u+1}$ and of $q_s$; wherein the word vector representations of the j-th word of $q_{k_u+1}$ and of the k-th word of $q_s$, $k = 1, 2, \ldots, L(q_s)$, are obtained by lookup in the pre-trained word vector matrix $E \in \mathbb{R}^{d \times |D|}$;
step B34: the sequence characterization obtained in step B32 and the characterization vector of $q_{k_u+1}$ obtained in step B33 are input into the GRU network module of the discriminator network for encoding; likewise, the sequence characterization obtained in step B32 and the characterization vector of $q_s$ obtained in step B33 are input into the GRU network module of the discriminator network for encoding;
step B35: the two codes obtained in step B34 are respectively input into the softmax layer of the discriminator network, which outputs the category probability with which the discriminator network considers the input to be the target query predicted by the generator network or the user's real target query.
Further, the step B4 specifically includes the following steps:
step B41: the process by which the model generates the query recommendation is regarded as an action sequence, the generator network based on the improved VHRED is regarded as the policy, and the probability obtained in step B35 is regarded as the reward R of the generator network; the loss value is calculated as:
J(θ) = E(R - b | θ)
wherein E denotes the expectation of the reward, b is a baseline value, a balancing term that stabilizes training, and θ denotes the parameters of the generator network;
step B42: from the loss-value formula of step B41, the update gradient is obtained by the likelihood-ratio approximation:
$\nabla_\theta J(\theta) = E\big[(R - b)\,\nabla_\theta \log p_\theta(q_s)\big]$
and the parameters of the generator network are retrained on the basis of the update gradient; through repeated iterative updates, the target query predicted by the generator network becomes closer to the user's real target query, thereby obtaining the trained query recommendation deep learning network model.
The invention also provides a query recommendation system adopting the above method, which comprises:
the data collection module is used for collecting all user query log records in the search engine;
the preprocessing module is used for preprocessing the collected user query log record data, extracting the time characteristic information and text characteristic information of the queries, and constructing a user query log training set TS;
the network training module is used for training the VHRED model with time characteristics using the obtained user query log training set: it generates target query recommendations from the time characteristic information and text characteristic information of all queries in a session, calculates the corresponding loss values, and trains the whole VHRED model with time characteristics with the goal of minimizing the loss value, obtaining the trained VHRED model; the recommended query generated by the trained VHRED model and the query generated by the real user are then passed through the discriminator network to obtain a probability value R as the reward; the reward R is used to continuously modify the learning rate learning_rate and control the gradient descent direction, so that the parameters of the generator network based on the improved VHRED are retrained; through repeated iterative updates, the required trained query recommendation deep learning network model is finally obtained; and
and the query recommendation module is used for receiving the query sentence input by the user, inputting the query sentence into the trained query recommendation deep learning network model and outputting the matched query recommendation.
Compared with the prior art, the invention has the following beneficial effects: by constructing and training a query recommendation deep learning network model based on VHRED with time characteristics and reinforcement learning, the method and system generate query recommendations quickly and robustly, accurately capture the user's query intent, produce query recommendations that meet user needs, and have good practicability and high application value.
Drawings
Fig. 1 is a flowchart of a method implementation according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a system according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the embodiments.
The invention provides a query recommendation method based on improved VHRED and reinforcement learning, which comprises the following steps:
Step A: collecting user query log records of a search engine, preprocessing the user query log record data, and constructing a user query log training set TS. This specifically includes the following steps:
step A1: collecting user query log records of a search engine to obtain an original query log set; wherein each query log of the search engine is represented by a triplet (u, q, t), u representing the user, q representing the query, and t representing the query time.
Step A2: dividing the original query log set by user and sorting by query time to obtain the query log subsets of different users.
Step A3: setting a time interval T according to the following rule: query logs with query time intervals larger than T belong to different sessions, queries in different sessions are not related to each other, the last query in the same session is a target query containing a query intention of a user, a query log subset of each user is further divided into a plurality of sessions to obtain a session set of each user, and the session sets of all the users form a user query log training set TS;
one session of a user u in TS is represented as $S_u = \{(q_1, t_1), (q_2, t_2), \ldots, (q_{k_u+1}, t_{k_u+1})\}$, wherein $q_i$ denotes the i-th query in the session and $t_i$ denotes the query time corresponding to $q_i$; the session contains $k_u + 1$ queries, and the last query $q_{k_u+1}$ is the user's real target query;
after word segmentation and stop-word removal, $q_i$ is further represented as $q_i = (w^i_1, w^i_2, \ldots, w^i_{L(q_i)})$, wherein $w^i_j$ denotes the j-th word in $q_i$, $j = 1, 2, \ldots, L(q_i)$, and $L(q_i)$ denotes the number of words in $q_i$; the query time $t_i$ corresponding to $q_i$ is represented as $t_i = (x_i, y_i, z_i, d_i)$, wherein $x_i$ denotes the hour, $y_i$ the minute, $z_i$ the second, and $d_i$ the day of the week.
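To make the session-splitting rule of step A3 concrete, the following sketch groups one user's time-ordered query log into sessions whenever the gap between consecutive queries exceeds T, and extracts the (hour, minute, second, weekday) time tuple. It is a minimal illustration: the toy data, the threshold value and the helper names are assumptions, not part of the patented implementation.

```python
from datetime import datetime, timedelta

# A query log record is a triplet (u, q, t): user, query text, query time.
# Assumed toy data; in practice these come from the search engine's logs.
logs = [
    ("u1", "gru tutorial", datetime(2020, 1, 20, 9, 0, 12)),
    ("u1", "gru vs lstm", datetime(2020, 1, 20, 9, 3, 40)),
    ("u1", "vhred paper", datetime(2020, 1, 20, 15, 22, 5)),
]

T = timedelta(minutes=30)  # assumed time-interval threshold between sessions

def split_sessions(user_logs, gap=T):
    """Split one user's time-ordered (q, t) pairs into sessions: a gap
    larger than `gap` starts a new session; the last query of each
    session is taken as the target query."""
    sessions, current = [], []
    for _, q, t in sorted(user_logs, key=lambda r: r[2]):
        if current and t - current[-1][1] > gap:
            sessions.append(current)
            current = []
        current.append((q, t))
    if current:
        sessions.append(current)
    return sessions

for s in split_sessions(logs):
    *context, target = s
    # t_i is further represented as (hour, minute, second, weekday)
    feats = [(t.hour, t.minute, t.second, t.weekday()) for _, t in s]
    print("context:", [q for q, _ in context],
          "| target:", target[0], "| time feats:", feats)
```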
Step B: training a query recommendation deep learning network model based on VHRED with time characteristics and reinforcement learning by using the user query log training set TS.
The query recommendation deep learning network model comprises a generator network based on a Variable Hierarchical Recurrent Encoder-Decoder (VHRED) with time characteristics and a discriminator network based on a Hierarchical AutoEncoder, wherein the hierarchical autoencoder encodes words, sentences and paragraphs with separate layers of GRUs to capture semantic structure information at different levels (word level, sentence level and paragraph level). Step B specifically includes the following steps:
Step B1: inputting query text and query time pairs in the user query log training set TS into the generator network, in units of user sessions, and outputting the target query predicted by the generator network. This specifically includes the following steps:
step B11: taking the user session as the unit, encoding each query text and query time pair $(q_i, t_i)$ in the user session except the target query to obtain its characterization vector $v_i$;
for the user session $S_u = \{(q_1, t_1), \ldots, (q_{k_u+1}, t_{k_u+1})\}$, the characterization vector of a query text and query time pair $(q_i, t_i)$ is represented as $v_i = [v_{q_i}; v_{t_i}]$, $i = 1, 2, \ldots, k_u$, the concatenation of the characterization vector $v_{q_i}$ of $q_i$ and the characterization vector $v_{t_i}$ of $t_i$;
wherein the encoding formula of $q_i$ is $v_{q_i} = \mathrm{GRU}\big(e(w^i_1), e(w^i_2), \ldots, e(w^i_{L(q_i)})\big)$, wherein GRU denotes a gated recurrent neural network and $e(w^i_j)$, $j = 1, 2, \ldots, L(q_i)$, is the word vector representation of the j-th word $w^i_j$ in $q_i$, obtained by lookup in a pre-trained word vector matrix $E \in \mathbb{R}^{d \times |D|}$, wherein d denotes the dimension of the word vectors and |D| is the number of words in the lexicon D;
the characterization vector $v_{t_i}$ of $t_i$ is obtained analogously by encoding the time tuple $(x_i, y_i, z_i, d_i)$.
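The pair encoding of step B11 can be sketched in PyTorch as follows. The patent fixes only that the query words pass through a GRU over pre-trained word vectors and that the pair vector is the concatenation of the query and time characterizations, so the module names, the dimensions and the linear-plus-tanh time encoder here are assumptions.

```python
import torch
import torch.nn as nn

class QueryTimePairEncoder(nn.Module):
    def __init__(self, vocab_size, d=128, d_time=16):
        super().__init__()
        # E in R^{d x |D|}: pre-trained word vector matrix (randomly
        # initialized here for the sketch; loaded from file in practice).
        self.embed = nn.Embedding(vocab_size, d)
        self.query_gru = nn.GRU(d, d, batch_first=True)
        # time tuple (hour, minute, second, weekday) -> d_time vector
        self.time_fc = nn.Linear(4, d_time)

    def forward(self, word_ids, time_tuple):
        # word_ids: (1, L(q_i)) word indices of query q_i
        # time_tuple: (1, 4) tensor (x_i, y_i, z_i, d_i)
        _, h = self.query_gru(self.embed(word_ids))   # h: (1, 1, d)
        v_q = h.squeeze(0)                            # query characterization
        v_t = torch.tanh(self.time_fc(time_tuple))    # time characterization
        return torch.cat([v_q, v_t], dim=-1)          # v_i = [v_q ; v_t]

enc = QueryTimePairEncoder(vocab_size=10000)
v_i = enc(torch.tensor([[3, 17, 256]]), torch.tensor([[9.0, 3.0, 40.0, 0.0]]))
print(v_i.shape)  # torch.Size([1, 144])
```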
Step B12: taking the user session as the unit, the characterization vectors of the query text and query time pairs in the user session except the target query form a sequence $(v_1, v_2, \ldots, v_{k_u})$, which is input into the GRU-based encoder module of the generator network for encoding, obtaining the characterization vector $v_{S_u}$ of the user session $S_u$, which contains all query information in the user session except the target query;
wherein the encoding formula of $S_u$ is $v_{S_u} = \mathrm{GRU}(v_1, v_2, \ldots, v_{k_u})$.
Step B13: first, $v_{S_u}$ is input into the feedforward neural network module of the generator network to obtain the mean $\mu_u \in \mathbb{R}^{d_z}$, computed by applying the feedforward neural network $f_{FNN}$ together with the hyperbolic tangent function tanh to $v_{S_u}$, wherein $d_z$ is the dimension of the hidden variable $z_u$;
the mean $\mu_u$ is then input into a softplus function to calculate the covariance $\Sigma_u$:
$\Sigma_u = \mathrm{softplus}(f(\mu_u))$
then, according to the mean $\mu_u$ and the covariance $\Sigma_u$, the latent variable $z_u$ is obtained by random sampling:
$z_u = \mu_u + \Sigma_u \odot \mathrm{samples}$
wherein samples is a random number vector, $\mathrm{samples} \in \mathbb{R}^{d_z}$, formed by drawing $d_z$ random numbers from a standard normal distribution, and $\Sigma_u \odot \mathrm{samples}$ denotes the Hadamard product of the vector $\Sigma_u$ with samples, thereby obtaining the hidden variable $z_u$ of the user session.
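The sampling in step B13 is a reparameterization: the session vector is mapped to a mean and a softplus-constrained covariance, and $z_u$ is formed with a Hadamard product against standard-normal draws. A minimal PyTorch sketch, with the shapes of the feedforward modules assumed:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SessionLatent(nn.Module):
    """Maps the session characterization v_Su to (mu_u, Sigma_u) and
    samples the hidden variable z_u = mu_u + Sigma_u (Hadamard) samples."""
    def __init__(self, d_session=144, d_z=64):
        super().__init__()
        self.f_mu = nn.Linear(d_session, d_z)   # feedforward module f_FNN
        self.f_sigma = nn.Linear(d_z, d_z)      # the f inside softplus(f(mu))

    def forward(self, v_su):
        mu = self.f_mu(torch.tanh(v_su))        # mean mu_u in R^{d_z}
        sigma = F.softplus(self.f_sigma(mu))    # diagonal covariance Sigma_u
        samples = torch.randn_like(sigma)       # d_z draws from N(0, 1)
        z = mu + sigma * samples                # Hadamard product + shift
        return z, mu, sigma

z_u, mu_u, sigma_u = SessionLatent()(torch.randn(1, 144))
print(z_u.shape)  # torch.Size([1, 64])
```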
Step B14: $v_{S_u}$ obtained in step B12 and $z_u$ obtained in step B13 are input into the GRU-based decoder module of the generator network, which decodes them and outputs the target query predicted by the generator network;
first, the initial hidden state $h_0$ of the decoder is calculated from the characterization vector $v_{S_u}$ of the user session and the hidden variable $z_u$;
then $h_0$ is decoded through the GRU network, the hidden state vector obtained at each decoding step is decoded again through the GRU network, and decoding is repeated $K_{target}$ times to generate a target query $q_s$ containing $K_{target}$ words; the word probabilities at each decoding step are computed by a fully connected layer f with parameters $W_2$, $W_3$ and bias term $b_{prob}$, wherein $W_3 \in \mathbb{R}^{d \times d}$ and $b_{prob} \in \mathbb{R}^d$;
the score $\mathrm{score}(q'_s)$ of a candidate target query $q'_s$ is the product of the decoding probabilities; taking the logarithm, the score becomes the sum of the log-probabilities of the decoded words:
$\log \mathrm{score}(q'_s) = \sum_{j=1}^{K_{target}} \log p(w^s_j \mid w^s_{<j}, v_{S_u}, z_u)$
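A sketch of the decoding loop of step B14 with the summed log-probability score. The $W_2$/$W_3$/$b_{prob}$ projection shapes follow the description, while the initial-state layer and the greedy word choice are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TargetQueryDecoder(nn.Module):
    def __init__(self, vocab_size, d=128, d_session=144, d_z=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)
        self.init_h = nn.Linear(d_session + d_z, d)   # h_0 from [v_Su ; z_u]
        self.gru_cell = nn.GRUCell(d, d)
        self.W2 = nn.Linear(d, d, bias=False)
        self.W3 = nn.Linear(d, d, bias=False)          # W3 in R^{d x d}
        self.out = nn.Linear(d, vocab_size)            # carries a b_prob-like bias

    def forward(self, v_su, z_u, k_target=5, bos_id=1):
        h = torch.tanh(self.init_h(torch.cat([v_su, z_u], dim=-1)))
        w = torch.tensor([bos_id])
        words, log_score = [], 0.0
        for _ in range(k_target):                      # repeat K_target decodes
            h = self.gru_cell(self.embed(w), h)
            logits = self.out(torch.tanh(self.W2(h) + self.W3(self.embed(w))))
            log_p = F.log_softmax(logits, dim=-1)
            w = log_p.argmax(dim=-1)                   # greedy choice (assumption)
            log_score = log_score + log_p[0, w]        # sum of log-probabilities
            words.append(int(w))
        return words, log_score

dec = TargetQueryDecoder(vocab_size=10000)
q_s, s = dec(torch.randn(1, 144), torch.randn(1, 64))
print(q_s, float(s))
```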
Step B2: calculating the gradient of each parameter in the generator network by a back-propagation method according to the target loss function loss, and updating the parameters by a stochastic gradient descent method.
Wherein the target loss function loss is defined as follows:
loss=loss1+loss2
wherein loss1 measures, using the KL divergence, the difference between the distribution of the hidden variable and the unit Gaussian distribution:
$loss_1 = \mathrm{KL}\big(\mathcal{N}(\mu_u, \Sigma_u) \,\|\, \mathcal{N}(0, I)\big)$
wherein $\mu_u$ and $\Sigma_u$ are the mean and covariance obtained in step B13;
loss2 is the loss between the target query $q_s$ predicted by the generator network and the target query $q_{k_u+1}$ of session $S_u$ representing the user's real query intent, calculated by the cross-entropy loss function:
$loss_2 = \mathrm{CrossEntropyLoss}(q_s, q_{k_u+1})$
wherein CrossEntropyLoss is the cross-entropy loss function;
the learning rate is updated by the gradient optimization algorithm AdaGrad, and the model parameters are updated by back-propagation iterations, so that the model is trained by minimizing the loss function.
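The combined objective of step B2 can be written directly in PyTorch. The closed-form diagonal-Gaussian KL term and per-word cross entropy below are the standard realizations of the description; treating $\Sigma_u$ as a vector of standard deviations is an assumption:

```python
import torch
import torch.nn.functional as F

def vhred_loss(mu_u, sigma_u, word_logits, target_ids):
    """loss = loss1 + loss2.
    loss1: KL( N(mu_u, diag(sigma_u^2)) || N(0, I) ), closed form.
    loss2: cross entropy between the decoder's word logits and the
           user's real target query q_{k_u+1}."""
    var = sigma_u ** 2
    loss1 = 0.5 * torch.sum(var + mu_u ** 2 - 1.0 - torch.log(var))
    loss2 = F.cross_entropy(word_logits, target_ids)   # CrossEntropyLoss
    return loss1 + loss2

mu, sigma = torch.zeros(1, 64), torch.ones(1, 64)
logits = torch.randn(5, 10000, requires_grad=True)     # K_target = 5 decode steps
target = torch.randint(0, 10000, (5,))                 # real target query word ids
loss = vhred_loss(mu, sigma, logits, target)
loss.backward()
# parameters would then be updated with torch.optim.Adagrad, as named above
print(float(loss))
```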
Step B3: inputting the target query predicted in the step B1 and the real target query of the user in the user session into the discriminator network, outputting the category probability, and judging whether the input target query is the target query predicted by the generator network or the real target query of the user according to the category probability. The method specifically comprises the following steps:
step B31: the text of each query $q_i$, $i = 1, 2, \ldots, k_u$, in the user session $S_u$ obtained in step B11, excluding the user's real target query $q_{k_u+1}$, is input into the discriminator network for encoding, obtaining the characterization vector of each $q_i$, $i = 1, 2, \ldots, k_u$;
step B32: the characterization vectors of the queries $q_i$ obtained in step B31 form a sequence, which is input into the GRU network module of the discriminator network for encoding, obtaining the characterization vector of the query sequence;
step B33: the user's real target query $q_{k_u+1}$ in the user session $S_u$ and the target query $q_s$ predicted in step B14 are separately input into the GRU network module of the discriminator network for encoding, obtaining the characterization vectors of $q_{k_u+1}$ and of $q_s$; wherein the word vector representations of the j-th word of $q_{k_u+1}$ and of the k-th word of $q_s$, $k = 1, 2, \ldots, L(q_s)$, are obtained by lookup in the pre-trained word vector matrix $E \in \mathbb{R}^{d \times |D|}$;
step B34: the sequence characterization obtained in step B32 and the characterization vector of $q_{k_u+1}$ obtained in step B33 are input into the GRU network module of the discriminator network for encoding; likewise, the sequence characterization obtained in step B32 and the characterization vector of $q_s$ obtained in step B33 are input into the GRU network module of the discriminator network for encoding;
step B35: the two codes obtained in step B34 are respectively input into the softmax layer of the discriminator network, which outputs the category probability with which the discriminator network considers the input to be the target query predicted by the generator network or the user's real target query.
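A sketch of the hierarchical discriminator of steps B31-B35. The GRU modules and the two-way softmax follow the description, while the dimensions, the batch handling and the weight sharing between the sequence-level GRUs are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalDiscriminator(nn.Module):
    def __init__(self, vocab_size, d=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)
        self.word_gru = nn.GRU(d, d, batch_first=True)   # B31/B33: query encoding
        self.query_gru = nn.GRU(d, d, batch_first=True)  # B32/B34: sequence encoding
        self.cls = nn.Linear(d, 2)                       # B35: softmax layer

    def encode_query(self, word_ids):
        _, h = self.word_gru(self.embed(word_ids))
        return h.squeeze(0)                              # (1, d)

    def forward(self, context_queries, candidate_query):
        # B31: encode each context query text (target query excluded)
        q_vecs = torch.stack([self.encode_query(q) for q in context_queries], dim=1)
        # B32: encode the sequence of query vectors into a session code
        _, h_ctx = self.query_gru(q_vecs)
        # B33: encode the candidate (predicted or real) target query
        h_cand = self.encode_query(candidate_query).unsqueeze(1)
        # B34: run the session code and the candidate code through the GRU module
        _, h_final = self.query_gru(h_cand, h_ctx)
        # B35: class probability: predicted-by-generator vs. real user query
        return F.softmax(self.cls(h_final.squeeze(0)), dim=-1)

disc = HierarchicalDiscriminator(vocab_size=10000)
ctx = [torch.tensor([[3, 17]]), torch.tensor([[5, 9, 2]])]
probs = disc(ctx, torch.tensor([[11, 4, 8]]))
print(probs)  # e.g. tensor([[p_generated, p_real]], grad_fn=...)
```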
Step B4: taking the category probability output by the discriminator network in step B3 as the reward of the generator network, and performing reinforcement learning training using a policy gradient method so as to maximize the expected return. This specifically includes the following steps:
step B41: the process by which the model generates the query recommendation is regarded as an action sequence, the generator network based on the improved VHRED is regarded as the policy, and the probability obtained in step B35 is regarded as the reward R of the generator network; the loss value is calculated as:
J(θ) = E(R - b | θ)
wherein E denotes the expectation of the reward, b is a baseline value, a balancing term that stabilizes training, and θ denotes the parameters of the generator network;
step B42: from the loss-value formula of step B41, the update gradient is obtained by the likelihood-ratio approximation:
$\nabla_\theta J(\theta) = E\big[(R - b)\,\nabla_\theta \log p_\theta(q_s)\big]$
if the reward R for an action is large, the probability of generating that sequence next time increases; for sequences with lower reward R, generation is relatively suppressed; the baseline value b is therefore subtracted so that R - b can be positive or negative.
In brief, the probability R is used as the reward to modify the learning rate learning_rate and control the gradient descent direction; the parameters of the generator network are retrained on the basis of the update gradient, and through repeated iterative updates the target query predicted by the generator network becomes closer to the user's real target query, thereby obtaining the trained query recommendation deep learning network model.
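Steps B41-B42 amount to a REINFORCE policy-gradient update with a baseline. The sketch below uses the discriminator's probability as the reward R as described, while the stand-in log-probability and the Adagrad optimizer wiring are assumptions:

```python
import torch
import torch.nn.functional as F

def policy_gradient_step(log_prob_qs, reward, baseline, optimizer):
    """One REINFORCE update: the generator is the policy, the
    discriminator's class probability is the reward R, and the baseline b
    is subtracted so that R - b can be positive or negative. Minimizing
    -(R - b) * log p_theta(q_s) ascends the likelihood-ratio gradient
    (R - b) * grad_theta log p_theta(q_s)."""
    loss = -((reward - baseline) * log_prob_qs)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)

# Toy usage with a stand-in differentiable log-probability; in the full
# model, log_prob_qs is the decoder's log score of the sampled query q_s.
theta = torch.nn.Parameter(torch.tensor(0.0))
optimizer = torch.optim.Adagrad([theta], lr=0.1)
log_prob_qs = -F.softplus(theta)
print(policy_gradient_step(log_prob_qs, reward=0.8, baseline=0.5,
                           optimizer=optimizer))
```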
Step B5: when the change of the loss value of the query recommendation deep learning network model between iterations is smaller than a set threshold, or the maximum number of iterations is reached, the training of the query recommendation deep learning network model is terminated.
Step C: the query recommendation system receives the query sentence input by the user, inputs the query sentence into the trained query recommendation deep learning network model, and outputs the matched query recommendations.
The invention also provides a query recommendation system adopting the method, as shown in fig. 2, comprising:
the data collection module is used for collecting all user query log records in the search engine;
the preprocessing module is used for preprocessing the collected user query log record data, extracting the time characteristic information and text characteristic information of the queries, and constructing a user query log training set TS;
the network training module is used for training the VHRED model with time characteristics using the obtained user query log training set: it generates target query recommendations from the time characteristic information and text characteristic information of all queries in a session, calculates the corresponding loss values, and trains the whole VHRED model with time characteristics with the goal of minimizing the loss value, obtaining the trained VHRED model; the recommended query generated by the trained VHRED model and the query generated by the real user are then passed through the discriminator network to obtain a probability value R as the reward; the reward R is used to continuously modify the learning rate learning_rate and control the gradient descent direction, so that the parameters of the generator network based on the improved VHRED are retrained; through repeated iterative updates, the required trained query recommendation deep learning network model is finally obtained; and
and the query recommendation module is used for receiving the query sentence input by the user, inputting the query sentence into the trained query recommendation deep learning network model and outputting the matched query recommendation.
The above are preferred embodiments of the present invention; all changes made according to the technical solution of the present invention that produce equivalent effects, without departing from the scope of the technical solution of the present invention, fall within the protection scope of the present invention.
Claims (7)
1. A query recommendation method based on improved VHRED and reinforcement learning is characterized by comprising the following steps:
step A: collecting user query log records of a search engine, preprocessing the user query log record data, and constructing a user query log training set TS;
step B: training a query recommendation deep learning network model based on VHRED with time characteristics and reinforcement learning by using the user query log training set TS;
step C: a query recommendation system receives a query sentence input by the user, inputs the query sentence into the trained query recommendation deep learning network model, and outputs matched query recommendations;
wherein the query recommendation deep learning network model comprises a generator network based on a variable hierarchical recurrent encoder-decoder (VHRED) with time characteristics and a discriminator network based on a hierarchical autoencoder, the hierarchical autoencoder encoding words, sentences and paragraphs with separate layers of GRUs so as to capture semantic structure information at different levels; the step B specifically comprises the following steps:
step B1: inputting query text and query time pairs in a user query log training set TS into a generator network by taking a user session as a unit, and outputting a target query predicted by the generator network;
step B2: calculating the gradient of each parameter in the generator network by a back-propagation method according to the target loss function loss, and updating the parameters by a stochastic gradient descent method;
step B3: inputting the target query predicted in the step B1 and the real target query of the user in the user session into a discriminator network, outputting category probability, and judging whether the input target query is the target query predicted by the generator network or the real target query of the user according to the category probability;
step B4: taking the category probability output by the discriminator network in step B3 as the reward of the generator network, and performing reinforcement learning training using a policy gradient method so as to maximize the expected return;
step B5: when the change of the loss value of the query recommendation deep learning network model between iterations is smaller than a set threshold, or the maximum number of iterations is reached, terminating the training of the query recommendation deep learning network model.
2. The method of claim 1, wherein the step A specifically comprises the following steps:
step A1: collecting user query log records of a search engine to obtain an original query log set; wherein each query log of the search engine is represented by a triplet (u, q, t), u representing a user, q representing a query, and t representing a query time;
step A2: dividing the original query log set by user and sorting by query time to obtain the query log subsets of different users;
step A3: setting a time interval T according to the following rule: query logs with query time intervals larger than T belong to different sessions, queries in different sessions are not related to each other, the last query in the same session is a target query containing a query intention of a user, a query log subset of each user is further divided into a plurality of sessions to obtain a session set of each user, and the session sets of all the users form a user query log training set TS;
one session of a user u in TS is represented as $S_u = \{(q_1, t_1), (q_2, t_2), \ldots, (q_{k_u+1}, t_{k_u+1})\}$, wherein $q_i$ denotes the i-th query in the session and $t_i$ denotes the query time corresponding to $q_i$; the session contains $k_u + 1$ queries, and the last query $q_{k_u+1}$ is the user's real target query;
after word segmentation and stop-word removal, $q_i$ is further represented as $q_i = (w^i_1, w^i_2, \ldots, w^i_{L(q_i)})$, wherein $w^i_j$ denotes the j-th word in $q_i$, $j = 1, 2, \ldots, L(q_i)$, and $L(q_i)$ denotes the number of words in $q_i$; the query time $t_i$ corresponding to $q_i$ is represented as $t_i = (x_i, y_i, z_i, d_i)$, wherein $x_i$ denotes the hour, $y_i$ the minute, $z_i$ the second, and $d_i$ the day of the week.
3. The method for query recommendation based on VHRED and reinforcement learning of claim 1, wherein said step B1 of inputting query text and query time pairs in the user query log training set TS into the generator network, in units of user sessions, and outputting the target query predicted by the generator network comprises the following steps:
step B11: taking the user session as the unit, encoding each query text and query time pair $(q_i, t_i)$ in the user session except the target query to obtain its characterization vector $v_i$;
for the user session $S_u = \{(q_1, t_1), \ldots, (q_{k_u+1}, t_{k_u+1})\}$, the characterization vector of a query text and query time pair $(q_i, t_i)$ is represented as $v_i = [v_{q_i}; v_{t_i}]$, $i = 1, 2, \ldots, k_u$, the concatenation of the characterization vector $v_{q_i}$ of $q_i$ and the characterization vector $v_{t_i}$ of $t_i$;
wherein the encoding formula of $q_i$ is $v_{q_i} = \mathrm{GRU}\big(e(w^i_1), e(w^i_2), \ldots, e(w^i_{L(q_i)})\big)$, wherein GRU denotes a gated recurrent neural network and $e(w^i_j)$, $j = 1, 2, \ldots, L(q_i)$, is the word vector representation of the j-th word $w^i_j$ in $q_i$, obtained by lookup in a pre-trained word vector matrix $E \in \mathbb{R}^{d \times |D|}$, wherein d denotes the dimension of the word vectors and |D| is the number of words in the lexicon D;
the characterization vector $v_{t_i}$ of $t_i$ is obtained analogously by encoding the time tuple $(x_i, y_i, z_i, d_i)$;
step B12: taking the user session as the unit, the characterization vectors of the query text and query time pairs in the user session except the target query form a sequence $(v_1, v_2, \ldots, v_{k_u})$, which is input into the GRU-based encoder module of the generator network for encoding, obtaining the characterization vector $v_{S_u}$ of the user session $S_u$, which contains all query information in the user session except the target query;
wherein the encoding formula of $S_u$ is $v_{S_u} = \mathrm{GRU}(v_1, v_2, \ldots, v_{k_u})$;
step B13: first, $v_{S_u}$ is input into the feedforward neural network module of the generator network to obtain the mean $\mu_u \in \mathbb{R}^{d_z}$, computed by applying the feedforward neural network $f_{FNN}$ together with the hyperbolic tangent function tanh to $v_{S_u}$, wherein $d_z$ is the dimension of the hidden variable $z_u$;
the mean $\mu_u$ is then input into a softplus function to calculate the covariance $\Sigma_u$:
$\Sigma_u = \mathrm{softplus}(f(\mu_u))$
then, according to the mean $\mu_u$ and the covariance $\Sigma_u$, the latent variable $z_u$ is obtained by random sampling:
$z_u = \mu_u + \Sigma_u \odot \mathrm{samples}$
wherein samples is a random number vector, $\mathrm{samples} \in \mathbb{R}^{d_z}$, formed by drawing $d_z$ random numbers from a standard normal distribution, and $\Sigma_u \odot \mathrm{samples}$ denotes the Hadamard product of the vector $\Sigma_u$ with samples, thereby obtaining the hidden variable $z_u$ of the user session;
step B14: $v_{S_u}$ obtained in step B12 and $z_u$ obtained in step B13 are input into the GRU-based decoder module of the generator network, which decodes them and outputs the target query predicted by the generator network;
first, the initial hidden state $h_0$ of the decoder is calculated from the characterization vector $v_{S_u}$ of the user session and the hidden variable $z_u$;
then $h_0$ is decoded through the GRU network, the hidden state vector obtained at each decoding step is decoded again through the GRU network, and decoding is repeated $K_{target}$ times to generate a target query $q_s$ containing $K_{target}$ words; the word probabilities at each decoding step are computed by a fully connected layer f with parameters $W_2$, $W_3$ and bias term $b_{prob}$, wherein $W_3 \in \mathbb{R}^{d \times d}$ and $b_{prob} \in \mathbb{R}^d$;
the score $\mathrm{score}(q'_s)$ of a candidate target query $q'_s$ is the product of the decoding probabilities; taking the logarithm, the score becomes the sum of the log-probabilities of the decoded words:
$\log \mathrm{score}(q'_s) = \sum_{j=1}^{K_{target}} \log p(w^s_j \mid w^s_{<j}, v_{S_u}, z_u)$
4. The method for recommending queries based on VHRED and reinforcement learning of claim 3, wherein in said step B2, the objective loss function loss is defined as follows:
loss=loss1+loss2
wherein loss1 measures, using the KL divergence, the difference between the distribution of the hidden variable and the unit Gaussian distribution:
$loss_1 = \mathrm{KL}\big(\mathcal{N}(\mu_u, \Sigma_u) \,\|\, \mathcal{N}(0, I)\big)$
wherein $\mu_u$ and $\Sigma_u$ are the mean and covariance obtained in step B13;
loss2 is the loss between the target query $q_s$ predicted by the generator network and the target query $q_{k_u+1}$ of session $S_u$ representing the user's real query intent, calculated by the cross-entropy loss function:
$loss_2 = \mathrm{CrossEntropyLoss}(q_s, q_{k_u+1})$
wherein CrossEntropyLoss is the cross-entropy loss function;
the learning rate is updated by the gradient optimization algorithm AdaGrad, and the model parameters are updated by back-propagation iterations, so that the model is trained by minimizing the loss function.
5. The method of claim 4, wherein the step B3 of inputting the target query predicted in the step B1 and the real target query of the user in the user session into the network of discriminators, outputting a category probability, and determining whether the input target query is the target query predicted by the generator network or the real target query of the user according to the category probability comprises the following steps:
step B31: the text of each query $q_i$, $i = 1, 2, \ldots, k_u$, in the user session $S_u$ obtained in step B11, excluding the user's real target query $q_{k_u+1}$, is input into the discriminator network for encoding, obtaining the characterization vector of each $q_i$, $i = 1, 2, \ldots, k_u$;
step B32: the characterization vectors of the queries $q_i$ obtained in step B31 form a sequence, which is input into the GRU network module of the discriminator network for encoding, obtaining the characterization vector of the query sequence;
step B33: the user's real target query $q_{k_u+1}$ in the user session $S_u$ and the target query $q_s$ predicted in step B14 are separately input into the GRU network module of the discriminator network for encoding, obtaining the characterization vectors of $q_{k_u+1}$ and of $q_s$; wherein the word vector representations of the j-th word of $q_{k_u+1}$ and of the k-th word of $q_s$, $k = 1, 2, \ldots, L(q_s)$, are obtained by lookup in the pre-trained word vector matrix $E \in \mathbb{R}^{d \times |D|}$;
step B34: the sequence characterization obtained in step B32 and the characterization vector of $q_{k_u+1}$ obtained in step B33 are input into the GRU network module of the discriminator network for encoding; likewise, the sequence characterization obtained in step B32 and the characterization vector of $q_s$ obtained in step B33 are input into the GRU network module of the discriminator network for encoding;
step B35: the two codes obtained in step B34 are respectively input into the softmax layer of the discriminator network, which outputs the category probability with which the discriminator network considers the input to be the target query predicted by the generator network or the user's real target query.
6. The method for recommending queries based on VHRED and reinforcement learning of claim 5, wherein said step B4 specifically comprises the following steps:
step B41: the process by which the model generates the query recommendation is regarded as an action sequence, the generator network based on the improved VHRED is regarded as the policy, and the probability obtained in step B35 is regarded as the reward R of the generator network; the loss value is calculated as:
J(θ) = E(R - b | θ)
wherein E denotes the expectation of the reward, b is a baseline value, a balancing term that stabilizes training, and θ denotes the parameters of the generator network;
step B42: from the loss-value formula of step B41, the update gradient is obtained by the likelihood-ratio approximation:
$\nabla_\theta J(\theta) = E\big[(R - b)\,\nabla_\theta \log p_\theta(q_s)\big]$
and the parameters of the generator network are retrained on the basis of the update gradient; through repeated iterative updates, the target query predicted by the generator network becomes closer to the user's real target query, thereby obtaining the trained query recommendation deep learning network model.
7. A query recommendation system employing the method of any of claims 1-6, comprising:
the data collection module is used for collecting all user query log records in the search engine;
the preprocessing module is used for preprocessing the collected user query log record data, extracting the time characteristic information and text characteristic information of the queries, and constructing a user query log training set TS;
the network training module is used for training the VHRED model with time characteristics using the obtained user query log training set: it generates target query recommendations from the time characteristic information and text characteristic information of all queries in a session, calculates the corresponding loss values, and trains the whole VHRED model with time characteristics with the goal of minimizing the loss value, obtaining the trained VHRED model; the recommended query generated by the trained VHRED model and the query generated by the real user are then passed through the discriminator network to obtain a probability value R as the reward; the reward R is used to continuously modify the learning rate learning_rate and control the gradient descent direction, so that the parameters of the generator network based on the improved VHRED are retrained; through repeated iterative updates, the required trained query recommendation deep learning network model is finally obtained;
and the query recommendation module is used for receiving the query sentence input by the user, inputting the query sentence into the trained query recommendation deep learning network model and outputting the matched query recommendation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010067232.4A CN111274359B (en) | 2020-01-20 | 2020-01-20 | Query recommendation method and system based on improved VHRED and reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010067232.4A CN111274359B (en) | 2020-01-20 | 2020-01-20 | Query recommendation method and system based on improved VHRED and reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111274359A CN111274359A (en) | 2020-06-12 |
CN111274359B true CN111274359B (en) | 2022-06-14 |
Family
ID=70998997
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010067232.4A Active CN111274359B (en) | 2020-01-20 | 2020-01-20 | Query recommendation method and system based on improved VHRED and reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111274359B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114077783A (en) * | 2020-08-18 | 2022-02-22 | 中国电信股份有限公司 | Interactive simulation method and device for recommendation system based on reinforcement learning |
CN113360497B (en) * | 2021-05-26 | 2022-04-05 | 华中科技大学 | Multi-load-oriented automatic recommendation method and system for secondary indexes of cloud database |
CN115070753A (en) * | 2022-04-28 | 2022-09-20 | 同济大学 | Multi-target reinforcement learning method based on unsupervised image editing |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102609433A (en) * | 2011-12-16 | 2012-07-25 | 北京大学 | Method and system for recommending query based on user log |
CN106557563A (en) * | 2016-11-15 | 2017-04-05 | 北京百度网讯科技有限公司 | Query statement based on artificial intelligence recommends method and device |
CN107122469A (en) * | 2017-04-28 | 2017-09-01 | 中国人民解放军国防科学技术大学 | Sort method and device are recommended in inquiry based on semantic similarity and timeliness resistant frequency |
CN109145213A (en) * | 2018-08-22 | 2019-01-04 | 清华大学 | Inquiry recommended method and device based on historical information |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170185673A1 (en) * | 2015-12-25 | 2017-06-29 | Le Holdings (Beijing) Co., Ltd. | Method and Electronic Device for QUERY RECOMMENDATION |
- 2020-01-20: application CN202010067232.4A filed in China; granted as CN111274359B (status: active)
Non-Patent Citations (1)
Title |
---|
查询推荐研究综述 (A survey of query recommendation research); Zhang Xiaojuan et al.; 情报学报 (Journal of the China Society for Scientific and Technical Information); 2019-04-24; Vol. 38, No. 4; full text *
Also Published As
Publication number | Publication date |
---|---|
CN111274359A (en) | 2020-06-12 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |