CN110110372B

CN110110372B - Automatic segmentation prediction method for user time sequence behavior

Info

Publication number: CN110110372B
Application number: CN201910279004.0A
Authority: CN
Inventors: 张伟; 梁文伟
Original assignee: East China Normal University
Current assignee: East China Normal University
Priority date: 2019-04-09
Filing date: 2019-04-09
Publication date: 2023-04-18
Anticipated expiration: 2039-04-09
Also published as: CN110110372A

Abstract

Short-session based recommendations have been a hot problem in recommendation systems. Short-session based recommendations mean that the user's future is predicted from the user's continuous behavior over a short time window. The conventional method generally divides the time sequence behavior of a user into a plurality of short sessions according to a time window with a fixed size, and the division mode has the disadvantages that 1) if the time window is too large, the short sessions contain too much user behavior, and if the time window is too small, the short sessions cannot cover complete user stage behavior; 2) It is difficult to set a time window suitable for all user behaviors, and the like. Therefore, the invention provides the automatic segmentation prediction method for the user time sequence behavior based on the deep sequence reinforcement learning, the user sequence does not need to be artificially divided, and the defects are effectively overcome.

Description

Automatic segmentation prediction method for user time sequence behavior

Technical Field

The invention relates to the technical field of computer science, in particular to a user time sequence behavior automatic segmentation prediction method based on a hierarchical recurrent neural network and reinforcement learning.

Background

Short-session based recommendations have been a hot issue in the field of machine learning and recommendation systems. Short-session based recommendations mean that the user's future is predicted from the user's continuous behavior over a short time window. For example, a user checks in to 5 places on a social networking application a day; the user has clicked on 8 items, etc. within a period of time while logging on to an e-commerce web site. The traditional approach is to model such short sessions through a recurrent neural network.

However, the conventional method usually divides the complete time sequence behavior of the user into a plurality of short sessions according to a time window with a fixed size, and such a division manner has the disadvantages that 1) for one user, if the time window is set to be too large, the short sessions may contain a plurality of segments of user behaviors which are mutually independent, and if the time window is set to be too small, the short sessions cannot cover a complete user stage behavior; 2) It is difficult to set a time window suitable for all user behaviors, and the like. Therefore, the invention provides the automatic segmentation and prediction method of the user time sequence behavior based on the hierarchical recurrent neural network and the reinforcement learning, the user sequence does not need to be divided artificially, and the defects are effectively solved.

Disclosure of Invention

The invention innovatively provides a user time sequence behavior automatic segmentation prediction method based on a hierarchical recurrent neural network and reinforcement learning for the first time, which is characterized in that a strategy network is used for learning segmentation of user time sequence data, a hierarchical recurrent neural network is used for modeling divided conversations, and behaviors of future users are predicted. Through searching, no prior art or report related to the invention is found. According to the invention, the hierarchical recurrent neural network is adopted to model the user time sequence, the user behavior expressions of different hierarchies are considered, and important sequence information can be effectively extracted.

The invention provides a user time sequence behavior automatic segmentation prediction method based on a hierarchical recurrent neural network and reinforcement learning, which comprises the following steps:

the method comprises the following steps: selecting a data set, preprocessing the data, and then segmenting the data into a training set, a verification set and a test set;

step two: representing the user and the time sequence behavior by using high-dimensional one-hot coding, and converting the user and the time sequence behavior into low-dimensional dense vectors by using an embedding technology to be used as the input of a model;

step three: modeling time sequence data by using a hierarchical recurrent neural network, generating an action of each time step by using a strategy network to guide whether to segment the sequence, and then completing prediction of the next time step action of the sequence by using a classification network;

step four: training model parameters, namely using training samples to optimize parameters of the network model in stages according to different target functions and using a verification set to adjust and optimize the model parameters;

step five: and predicting the next possible behaviors of the users in the test set by using a network model based on the hierarchical recurrent neural network and the reinforcement learning.

In the invention, the user time sequence behaviors comprise user sign-in behaviors, user commodity purchasing behaviors, user webpage clicking behaviors and user music listening behaviors, which are all behavior data commonly adopted in the field.

In the present invention, the data set includes: one or more of the Gowalla dataset, the Foursquare dataset, and the Amazon dataset, all of which are public behavior datasets commonly employed in the art.

In the first step, the data preprocessing comprises the following steps:

a1. sequencing the behavior sequence data from far to near according to the time stamp by the user;

a2. filtering the infrequent data, deleting users with less than 10 behaviors, and deleting articles with less than 5 user behaviors;

a3. selecting a time window, and recording the sequence segmentation mode based on the time window as the initial strategy pi of the strategy network ₀ 。

In the first step, the data is segmented into a training set, and the verification set and the test set refer to that for each user, the last place of the time sequence data is used as the test set, the penultimate of the time sequence data is used as the verification set, and the rest of the time sequence data are used as the training set.

In the second step, obtaining the input of the model comprises the following steps:

b1. data coding: the method comprises the steps of recording N users and M places, adopting independent hot coding, namely representing a user set by using an N-dimensional sparse vector, recording the unique characteristic dimension of each user as 1, and taking the rest as 0, and applying to the places in the same way;

b2. data embedding: embedding technique for N-dimensionThe user vector is mapped to another low-dimensional numerical vector space, and as the input of the subsequent model, the transformed user vector set is represented as U = { U = ₁ ,u ₂ ,…,u _N Represents a set of location vectors as P = { P = } ₁ ,p ₂ ,…,p _M }。

In the third step, the recurrent neural network refers to but is not limited to a gated recurrent unit network, and can be replaced by a long-term and short-term memory network, taking time step t as an example, note x _t The specific calculation process for the input at time step t comprises the following steps:

c1. compute update gate z _t ：

z _t ＝σ(W _z ·[h _t-1 ,x _t ]+b _z ),

c2. Calculate reset gate r _t ：

r _t ＝σ(W _r ·[h _t-1, x _t ]+b _r ),

c3. Computing hidden memory states

c4. Calculating the hidden state h _t ：

Wherein σ is a sigmoid function, and [ means ] Hadamard product]Represents concatenation of vectors, represents matrix multiplication,

are all parameters that the model can learn.

In the third step, the time sequence data is modeled by utilizing a hierarchical recurrent neural network, and u is expressed by a user _k For example, the method comprises the following steps:

d1. sequence level recurrent neural network:

d11. sequence of places of length L of input sequence, denoted

d12. Calculated by recurrent neural networks

The output of each time step of the sequence level is obtained and recorded as->

d2. Session level recurrent neural network:

d21. according to a segmentation strategy pi, a result corresponding to the selected segmentation time step is output from the sequence level recurrent neural network and is used as the input of the session level recurrent neural network, the length is | pi |, and is recorded as

d22. Via recurrent neural network computation

Get the output of the recurrent neural network of the conversation level, and mark as &>

d3. And (3) expanding and outputting according to the time step: from an output of length | π |)

And expanding the output with the length of L according to the segmentation strategy pi.

In the third step, the action of each time step is generated by using the strategy network, and the action a generated at the time of the time step t _t The method comprises the following steps:

e1. defining a state function s _t ：

Wherein

Represents a concatenation of vectors, is>

And &>

Respectively outputting the sequence level and the conversation level of the recurrent neural network at a time step t;

e2. defining an action space a _t ：

a _t ＝{1,0},

Wherein 1 represents that the current behavior belongs to the current session, and 0 represents that the current behavior does not belong to the current session;

e3. defining a policy function pi:

π(a _t |s _t ；Θ)＝σ(W _π *s _t +b _π ),

wherein W _π ,b _π Are parameters of the policy network. During the training process, action a _t Is sampled by a strategy pi probability value, and when testing, the action a depends on

In the third step, the behavior of the next time step of the sequence is predicted by using the classification network, and the time step t +1 is predicted by using the time step t as an example, and the method comprises the following steps:

f1. sum of output of the concatenated user representation and the hierarchical recurrent neural network:

f2. adding a full connection layer thereon:

wherein W _o ,b _o Is a parameter of the classification network, the dimension of which is consistent with the number of places as M,

the predicted location where the user arrived at time step t +1 is represented by a one-hot code.

In the fourth step, according to different objective functions, the method comprises the following steps:

g1. when the whole strategy network completes the generation of sequence actions, the segmentation of the whole sequence is also completed, firstly, a delay reward function of the whole sequence strategy network is defined as:

wherein y is _t Is input of X _L The real place mark at time step t is expressed by one-hot coding, L' represents the number of sessions in the sequence, gamma is a hyper-parameter for measuring two-part rewards, and Q is a certain constant. Assuming that the length of a session is moderate, a unimodal function f (x) = x + Q/x is proposed, which is in

Takes the minimum value in time>

Artificially wishing for the length of a session

It would be better. Different limits can be put on the length of the session by replacing the second item of the reward function;

g2. the gradient update formula defining a sequence in the policy network is:

wherein

Is a parameter in the policy network;

g3. defining a cross entropy function as an objective function of the training classification network:

where Θ represents all parameters in the classification network and β is a hyperparameter that trades off the loss of both parts.

In the fourth step, the parameters of the staged training network model include the following steps:

h1. pre-training a classification network: applying an initial policy π ₀ And training samples, utilizing back propagation, taking the cross entropy loss function defined in the step (3) as a minimization target, and updating parameters in the classification network;

h2. pre-training the strategy network: keeping the parameters in the classification network unchanged, and training the parameters in the strategy network by updating the gradient updating formula defined in the step (2);

h3. performing combined training: parameters in the entire network are jointly trained until the loss converges.

In the fifth step, the next possible behavior of the user in the test set is predicted by using the network model based on the hierarchical recurrent neural network and the reinforcement learning, and the method comprises the following steps:

i1. the sum of the concatenated user representation and the output of the last time step of the hierarchical recurrent neural network:

i2. adding a full connection layer prediction target:

wherein

For the predicted location distribution, represented by one-hot coding, W _o ,b _o Are parameters of the classification network.

Compared with the prior art, the invention has the beneficial effects that:

(1) The hierarchical recurrent neural network is adopted to model the user time sequence, the user behavior representation of different levels is considered, and important sequence information can be effectively extracted;

(2) The method comprises the steps of utilizing strategy network learning to segment user time sequence data, reasonably dividing sequences by considering the relation between the front and the back of the sequences, and meanwhile considering various constraints into a reward function;

(3) The method effectively solves the defect that the artificial division sequence is short conversation, such as the problem that the window size suitable for all users cannot be provided.

Drawings

Fig. 1 is a schematic flow chart of a user time series behavior segmentation prediction method according to the present invention.

FIG. 2 is a block diagram of the overall network model in one embodiment of the invention.

Detailed Description

The present invention will be described in further detail with reference to the following specific examples and the accompanying drawings. The procedures, conditions, experimental methods and the like for carrying out the present invention are general knowledge and common general knowledge in the art except for the contents specifically mentioned below, and the present invention is not particularly limited. It should be noted that variations and modifications can be made by persons skilled in the art without departing from the spirit of the invention. All falling within the scope of the present invention.

The invention provides a user time sequence behavior automatic segmentation prediction method based on a hierarchical recurrent neural network and reinforcement learning, a flow chart is shown as figure 1, and the method comprises the following steps:

step two: expressing the user and the time sequence behavior by using high-dimensional single hot codes, and converting the user and the time sequence behavior into low-dimensional dense vectors by using an embedding technology to serve as the input of a model;

step three: modeling the time sequence data by using a hierarchical recurrent neural network, generating an action of each time step by using a strategy network to guide whether the sequence is segmented, and then completing prediction of the next time step action of the sequence by using a classification network;

In more detail, first, a data set is selected, taking the Gowalla data set as an example, and then the data set is processed by Python according to the following steps:

a1. sorting the place sequence data from far to near according to the time stamps by the user;

a3. selecting a time window, and recording the sequence segmentation mode based on the time window as the initial strategy pi of the strategy network ₀ ；

a4. For each user, the last place of the time series data is taken as a test set, the second last of the time series data is taken as a verification set, and the rest are taken as training sets.

The processing of the input of the model is done by calling some of the packages in Tensorflow and Python, including the following steps:

b2. data embedding: mapping the N-dimensional user vector to another low-dimensional numerical vector space by using an embedding technology, and using the N-dimensional user vector as an input of a subsequent model, wherein a transformed user vector set is represented as U = { U = { (U) } ₁ ,u ₂ ,…,u _N Represents a set of location vectors as P = { P = } ₁ ,p ₂ ,…,p _M }。

Next, the change of GRU modules and tensors in the tensors is used to complete the construction of the hierarchical recurrent neural network, which comprises the following steps:

c1. sequence level recurrent neural network:

c11. inputting a sequence of locations of length L, noted

c12. Calculated by recurrent neural networks

c2. Session level recurrent neural network:

c21. according to a segmentation strategy pi, a result corresponding to the selected segmentation time step is output from the sequence level recurrent neural network and is used as the input of the session level recurrent neural network, the length is | pi |, and is recorded as

c22. Calculated by recurrent neural networks

c3. And (3) expanding and outputting according to the time step: from an output of length | π |)

And according to the segmentation strategy pi, expanding the output with the length of L. Specifically, the output of the subsequent session stage is developed from the hidden state of the final time step of the previous session, and the output of the first session is an all-zero vector.

Constructing a strategy network by using built-in functions of Tensorflow, and generating actions of each time step by using the strategy network so as to generate action a at time step t _t By way of example, the method comprises the following steps:

d1. calculating a state function s _t ：

Wherein

Represents a concatenation of vectors, is>

And &>

d2. calculating a policy function pi:

π(a _t |s _t ；Θ)＝σ(W _π *s _t +b _π ),

wherein W _π ,b _π Are parameters of the policy network. During the training process, action a _t Is sampled by a strategic pi probability value, and during testing, the action a depends on

The method comprises the following steps of constructing the behavior of the next time step of a classification network prediction sequence by utilizing a full connection layer in Tensorflow and the like, and predicting the time step t +1 by using the time step t as an example:

e1. sum of output of the concatenated user representation and the hierarchical recurrent neural network:

e2. adding a full connection layer thereon:

The method for training the parameters of the network model according to different target functions by calling optimization functions such as back propagation in Tensorflow comprises the following steps:

f1. pre-training a classification network: applying an initial policy π ₀ And training samples, wherein the cross entropy function is taken as an objective function of the training classification network by utilizing back propagation:

/>

where Θ represents all parameters in the classification network and β is a hyperparameter that trades off the loss of both parts. Updating parameters in the network by minimizing the above equation;

f2. pre-training the strategy network: keeping the parameters in the classification network unchanged, wherein the delay reward function of the whole sequence strategy network is as follows:

wherein y is _t Is inputting X _L The real place mark at time step t is expressed by one-hot coding, L' represents the number of sessions in the sequence, gamma is a hyper-parameter for measuring two-part rewards, and Q is a certain constant. In practice Q =100 may be set and changing the second term of the reward function can put different restrictions on the length of the session. The gradient updating formula of one sequence in the updating strategy network is as follows:

wherein

Is a parameter in the strategy network, so as to train the strategy network;

f3. performing combined training: parameters in the entire network are jointly trained until the loss converges.

Using trained parameters W in a classification network _o ,b _o And predicting the next possible behavior of the user in the test set, comprising the following steps:

g1. the sum of the concatenated user representation and the output of the last time step of the hierarchical recurrent neural network:

g2. adding a full connection layer prediction target:

wherein

For the predicted location distribution, we express it in a one-hot code, and Wo, bo are parameters of the classification network.

In practice, between the model layers, the following steps may also be optionally included: during model training, the parameters are limited by using a dropout network and a two-norm rule of the parameters, and the condition of overfitting is prevented.

In an embodiment of the present invention, a frame diagram of a whole network model is shown in fig. 2:

h1. hierarchical recurrent neural networks: for sequence input of a user, extracting sequence information representation of different levels by using a sequence level recurrent neural network and a session level recurrent neural network respectively;

h2. policy network: receiving user representation and output of a hierarchical recurrent neural network, calculating delay reward of the whole sequence by using a full connection layer, and updating parameters in the network by using gradient;

h3. classifying the network: receiving user representation and output of the hierarchical recurrent neural network, completing behavior prediction of a user in future time step by using a full connection layer, and updating parameters in the network by using a back propagation algorithm.

The method of the invention can also be applied to other user time sequence behaviors, such as a commodity purchasing sequence of a user and a music listening sequence of the user, the implementation is basically the same as the embodiment, and the specific process is not explained in detail.

The parameters in the embodiments of the present invention are determined according to the experimental results, that is, different parameter combinations are tested, a group of parameters with better evaluation indexes on the verification set is selected, and the results are obtained by evaluating on the test set. In actual tests, the purpose of the invention can be achieved by properly adjusting the parameters according to requirements.

The protection of the present invention is not limited to the above embodiments. Variations and advantages that may occur to those skilled in the art are intended to be included within the present invention without departing from the spirit and scope of the inventive concept and are intended to be protected by the following claims.

Claims

1. A user time sequence behavior automatic segmentation prediction method based on a hierarchical recurrent neural network and reinforcement learning is characterized by comprising the following steps:

in the third step, the time sequence data is modeled by utilizing a hierarchical recurrent neural network, and u is represented by a user _k For example, the method comprises the following steps:

d1. sequence level recurrent neural network:

d11. inputting a sequence of locations of length L, noted

d12. Calculating by a recurrent neural network to obtain the output of each time step, and recording as

d2. Session level recurrent neural network:

d21. according to the segmentation strategy pi, the result corresponding to the selection segmentation time step is output from the sequence level recurrent neural network and is used as the input of the session level recurrent neural network, the length is | pi |, and the result is recorded as

d22. Calculating by the recurrent neural network to obtain the output of the recurrent neural network at the conversation level, and recording as

d3. According to the time stepAnd (3) unfolding and outputting: from an output of length | π |)

Expanding the output into the output with the length of L according to the segmentation strategy pi;

wherein, the recurrent neural network refers to a gated recurrent unit network or a long-short term memory network, taking time step t as an example, note x _t The specific calculation process for the input at time step t comprises the following steps:

c1. compute update gate z _t ：

z _t ＝σ(W _z ·[h _t-1 ,x _t ]+b _z ),

c2 calculating reset gate r _t ：

r _t ＝σ(W _r ·[h _t-1 ,x _t ]+b _r ),

c3. Computing hidden memory states

c4. Calculating the hidden state h _t ：

Wherein σ is a sigmoid function, and [ means ] Hadamard product]Representing concatenation of vectors, a matrix multiplication, W _z ,W _r ,

b _z ,b _r ,b _h Are all model learnable parameters;

2. The automatic segmentation prediction method for user time series behaviors based on hierarchical recurrent neural network and reinforcement learning as claimed in claim 1, wherein the user time series behaviors include user sign-in behavior, user commodity purchasing behavior, user webpage clicking behavior, and user music listening behavior, all of which are behavior data commonly adopted in the field.

3. The method for automatic segmentation prediction of user time-series behavior based on hierarchical recurrent neural network and reinforcement learning of claim 1, wherein in the first step, the data set comprises: one or more of the Gowalla dataset, the Foursquare dataset, and the Amazon dataset, all of which are public behavior datasets commonly employed in the art.

4. The method according to claim 1, wherein the step one, preprocessing the data comprises the following steps:

a2. filtering the infrequent data;

5. The method for automatically segmenting and predicting the user time-series behaviors based on the hierarchical recurrent neural network and the reinforcement learning of claim 1, wherein in the first step, the segmentation data is a training set, and the verification set and the test set are: for each user, the last place of the time series data is used as a test set, the second last of the time series data is used as a verification set, and the rest of the time series data are used as training sets.

6. The method for automatic segmentation prediction of user time-series behavior based on hierarchical recurrent neural network and reinforcement learning as claimed in claim 1, wherein in the second step, the input of the model comprises the following steps:

b2. data embedding: mapping the N-dimensional user vector to another low-dimensional numerical vector space by using an embedding technology, and using the N-dimensional user vector as an input of a subsequent model, wherein a transformed user vector set is represented as U = { U = { (U) } ₁ ,u ₂ ,…,u _N Is represented as P { P } by the set of location vectors ₁ ,p ₂ ,…,p _M }。

7. The method according to claim 1, wherein the step three is to generate the action of each time step by using a policy network, so as to generate the action a at the time step t _t By way of example, the method comprises the following steps:

e1. defining a state function s _t ：

Wherein

Represents a concatenation of vectors, is>

And &>

e2. defining an action space a _t ：

a _t ＝{1,0},

e3. defining a policy function pi:

π(a _t |s _t ；Θ)＝σ(W _π *s _t +b _π ),

wherein W _π ,b _π Parameters of the policy network; during the training process, action a _t Is sampled by a strategy pi probability value, and when testing, the action a depends on

8. The automatic segmentation prediction method for user time sequence behaviors based on hierarchical recurrent neural networks and reinforcement learning according to claim 1, wherein in the third step, the classification network is used to predict the behavior of the next time step in the sequence, and the time step t +1 is predicted by using the time step t as an example, and the method comprises the following steps:

f2. adding a full connection layer thereon:

and representing the predicted position where the user arrives at the time step t +1 by using M-dimensional sparse one-hot coding.

9. The method for automatically segmenting and predicting the user time-series behaviors based on the hierarchical recurrent neural network and the reinforcement learning of claim 1, wherein the fourth step comprises the following steps according to different objective functions:

g1. when the whole strategy network completes the generation of the sequence action, the delay reward function of the whole sequence strategy network is defined as follows:

wherein y is _t Is inputting X _L The real place mark at the time step t is represented by single hot code, L' represents the number of sessions in the sequence, gamma is a hyper-parameter for measuring two parts of rewards, and Q is a certain constant;

g2. the gradient update formula defining a sequence in the policy network is:

wherein

Is a parameter in the policy network;

10. The method for automatic segmentation prediction of user temporal behavior based on hierarchical recurrent neural networks and reinforcement learning of claim 9, wherein the step four, the step of training the parameters of the network model in stages, comprises the following steps:

h1. pre-training a classification network: applying an initial policy π ₀ And training samples, updating parameters in the classification network by using back propagation and taking the function in the step g3 as a minimization target;

h2. pre-training the strategy network: keeping the parameters in the classification network unchanged, and training the parameters of the strategy network by updating the gradient updating formula in the step g 2;

11. The method for automatically segmenting and predicting the user time-series behaviors based on the hierarchical recurrent neural network and the reinforcement learning according to claim 1, wherein in the fifth step, the next possible behaviors of the user in the test set are predicted by using a network model based on the hierarchical recurrent neural network and the reinforcement learning, and the method comprises the following steps:

i1. the sum of the output of the last time step of the hierarchical recurrent neural network and the concatenated user representation:

i2. adding a full connection layer prediction target:

wherein

For the predicted location distribution, expressed by M-dimensional sparse one-hot coding,W _o ,b _o are parameters of the classification network. />