CN110110372B - Automatic segmentation prediction method for user time sequence behavior - Google Patents


Info

Publication number
CN110110372B
Authority
CN
China
Prior art keywords
user
network
time
recurrent neural
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910279004.0A
Other languages
Chinese (zh)
Other versions
CN110110372A (en)
Inventor
张伟 (Zhang Wei)
梁文伟 (Liang Wenwei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Normal University
Priority to CN201910279004.0A
Publication of CN110110372A
Application granted
Publication of CN110110372B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

Short-session-based recommendation has long been a hot problem in recommendation systems: a user's future behavior is predicted from the user's continuous behavior within a short time window. Conventional methods usually divide a user's time-sequence behavior into several short sessions according to a fixed-size time window. This division has two drawbacks: 1) if the time window is too large, a short session contains too much user behavior, while if it is too small, a short session cannot cover a complete stage of user behavior; 2) it is difficult to set one time window suitable for all users' behavior. The invention therefore provides an automatic segmentation prediction method for user time-sequence behavior based on deep sequence reinforcement learning; the user sequence does not need to be divided artificially, which effectively overcomes the above drawbacks.

Description

Automatic segmentation prediction method for user time sequence behavior
Technical Field
The invention relates to the technical field of computer science, and in particular to an automatic segmentation prediction method for user time-sequence behavior based on a hierarchical recurrent neural network and reinforcement learning.
Background
Short-session-based recommendation has long been a hot issue in the fields of machine learning and recommendation systems: a user's future behavior is predicted from the user's continuous behavior within a short time window. For example, a user checks in at 5 places on a social networking application in one day, or clicks on 8 items within one login session on an e-commerce web site. The traditional approach is to model such short sessions with a recurrent neural network.
However, the conventional method usually divides a user's complete time-sequence behavior into several short sessions according to a fixed-size time window. This division has two drawbacks: 1) for a given user, if the time window is set too large, a short session may contain several mutually independent segments of user behavior, while if it is set too small, a short session cannot cover a complete stage of user behavior; 2) it is difficult to set one time window suitable for all users' behavior. The invention therefore provides an automatic segmentation and prediction method for user time-sequence behavior based on a hierarchical recurrent neural network and reinforcement learning; the user sequence does not need to be divided artificially, which effectively overcomes the above drawbacks.
Disclosure of Invention
The invention provides, for the first time, an automatic segmentation prediction method for user time-sequence behavior based on a hierarchical recurrent neural network and reinforcement learning: a policy network learns how to segment the user's time-series data, a hierarchical recurrent neural network models the resulting sessions, and the user's future behavior is predicted. A search found no prior art or report related to the invention. The hierarchical recurrent neural network models the user's time sequence while taking user behavior representations at different levels into account, so important sequence information can be effectively extracted.
The invention provides a user time sequence behavior automatic segmentation prediction method based on a hierarchical recurrent neural network and reinforcement learning, which comprises the following steps:
step one: selecting a data set, preprocessing the data, and then segmenting the data into a training set, a verification set and a test set;
step two: representing the user and the time sequence behavior by using high-dimensional one-hot coding, and converting the user and the time sequence behavior into low-dimensional dense vectors by using an embedding technology to be used as the input of a model;
step three: modeling the time-series data with a hierarchical recurrent neural network, using a policy network to generate an action at each time step that guides whether to segment the sequence, and then using a classification network to predict the behavior of the next time step of the sequence;
step four: training model parameters, namely using training samples to optimize parameters of the network model in stages according to different target functions and using a verification set to adjust and optimize the model parameters;
step five: and predicting the next possible behaviors of the users in the test set by using a network model based on the hierarchical recurrent neural network and the reinforcement learning.
In the invention, the user time sequence behaviors comprise user sign-in behaviors, user commodity purchasing behaviors, user webpage clicking behaviors and user music listening behaviors, which are all behavior data commonly adopted in the field.
In the present invention, the data set includes: one or more of the Gowalla dataset, the Foursquare dataset, and the Amazon dataset, all of which are public behavior datasets commonly employed in the art.
In the first step, the data preprocessing comprises the following steps:
a1. for each user, sort the behavior sequence data by timestamp from earliest to latest;
a2. filter infrequent data: delete users with fewer than 10 behaviors, and delete items with fewer than 5 user interactions;
a3. select a time window, and record the time-window-based sequence segmentation as the initial policy π₀ of the policy network.
In the first step, segmenting the data into training, verification, and test sets means that, for each user, the last place in the time-series data is used as the test set, the second-to-last as the verification set, and the rest as the training set.
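The sorting, filtering, and splitting steps above can be sketched in Python (a minimal sketch; function and variable names are my own, `events` is assumed to be a list of (user, item, timestamp) tuples, and the time-window initial segmentation π₀ of step a3 is omitted):

```python
from collections import Counter

def preprocess(events, min_user=10, min_item=5):
    """Sort, filter, and split user behavior sequences.
    events: list of (user, item, timestamp) tuples (an assumed format)."""
    # a1: sort each user's behavior sequence by timestamp, earliest first
    events = sorted(events, key=lambda e: (e[0], e[2]))
    # a2: drop infrequent users and items
    user_cnt = Counter(u for u, _, _ in events)
    item_cnt = Counter(i for _, i, _ in events)
    events = [e for e in events
              if user_cnt[e[0]] >= min_user and item_cnt[e[1]] >= min_item]
    # split: last item -> test, second-to-last -> validation, rest -> train
    seqs = {}
    for u, i, _ in events:
        seqs.setdefault(u, []).append(i)
    train, val, test = {}, {}, {}
    for u, seq in seqs.items():
        if len(seq) < 3:          # too short to split
            continue
        train[u], val[u], test[u] = seq[:-2], seq[-2], seq[-1]
    return train, val, test
```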
In the second step, obtaining the input of the model comprises the following steps:
b1. data coding: suppose there are N users and M places; one-hot coding is adopted, i.e., the user set is represented by N-dimensional sparse vectors in which each user's unique characteristic dimension is 1 and the rest are 0, and the same applies to the places;
b2. data embedding: an embedding technique maps the N-dimensional user vector to a low-dimensional dense vector space, used as the input of the subsequent model; the transformed user vector set is denoted U = {u₁, u₂, …, u_N} and the place vector set P = {p₁, p₂, …, p_M}.
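Steps b1 and b2 can be illustrated as follows (a sketch with assumed sizes N, M, d; in the real model the embedding tables are learned during training, whereas here they are random matrices, and multiplying a one-hot vector by the table is shown to be equivalent to a row lookup):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, d = 1000, 500, 64           # illustrative sizes, not from the patent

# b1: one-hot coding — user n is an N-dim sparse vector with a single 1
def one_hot(idx, dim):
    v = np.zeros(dim)
    v[idx] = 1.0
    return v

# b2: embedding — an N x d table maps a one-hot vector to a dense d-dim
# vector; the matrix product simply selects the corresponding row
U_emb = rng.normal(size=(N, d))   # user embedding table, U = {u_1..u_N}
P_emb = rng.normal(size=(M, d))   # place embedding table, P = {p_1..p_M}

u_42 = one_hot(42, N) @ U_emb     # equivalent to the lookup U_emb[42]
```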
In the third step, the recurrent neural network refers to, but is not limited to, a gated recurrent unit (GRU) network, and can be replaced by a long short-term memory network. Taking time step t as an example, denote by x_t the input at time step t; the specific calculation comprises the following steps:
c1. compute the update gate z_t:
z_t = σ(W_z · [h_{t−1}, x_t] + b_z),
c2. compute the reset gate r_t:
r_t = σ(W_r · [h_{t−1}, x_t] + b_r),
c3. compute the candidate hidden state h̃_t:
h̃_t = tanh(W_h · [r_t ⊙ h_{t−1}, x_t] + b_h),
c4. compute the hidden state h_t:
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t,
where σ is the sigmoid function, ⊙ denotes the Hadamard product, [·, ·] denotes concatenation of vectors, · denotes matrix multiplication, and W_z, W_r, W_h, b_z, b_r, b_h are all learnable parameters of the model.
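The GRU computation of steps c1 to c4 can be written out directly (a NumPy sketch; parameter names follow the gate equations, and the interpolation convention in c4 is the standard GRU form, assumed here because the original formula image is not reproduced):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h_prev, x_t, params):
    """One gated-recurrent-unit step following c1-c4.
    params holds W_z, W_r, W_h of shape (d, d+k) and b_z, b_r, b_h of
    shape (d,), where d is the hidden size and k the input size."""
    hx = np.concatenate([h_prev, x_t])                     # [h_{t-1}, x_t]
    z = sigmoid(params["W_z"] @ hx + params["b_z"])        # update gate
    r = sigmoid(params["W_r"] @ hx + params["b_r"])        # reset gate
    rhx = np.concatenate([r * h_prev, x_t])                # [r ⊙ h_{t-1}, x_t]
    h_cand = np.tanh(params["W_h"] @ rhx + params["b_h"])  # candidate state
    return (1.0 - z) * h_prev + z * h_cand                 # hidden state h_t
```

With all-zero parameters both gates equal 0.5 and the candidate state is zero, so the new hidden state is half the previous one, which is a quick sanity check on the gating arithmetic.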
In the third step, the time-series data is modeled by a hierarchical recurrent neural network; taking user u_k as an example, the method comprises the following steps:
d1. sequence-level recurrent neural network:
d11. input a place sequence of length L, denoted X_L;
d12. compute through the recurrent neural network to obtain the output of each time step of the sequence level;
d2. session-level recurrent neural network:
d21. according to the segmentation policy π, the sequence-level outputs at the selected segmentation time steps are taken as the input of the session-level recurrent neural network, of length |π|;
d22. compute through the recurrent neural network to obtain the session-level outputs;
d3. expand the outputs by time step: the |π| session-level outputs are expanded into outputs of length L according to the segmentation policy π.
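Step d3's expansion, combined with the rule stated later in the embodiment (each step of a session receives the session-level hidden state that closed the previous session, and the first session receives an all-zero vector), can be sketched as follows (the encoding `actions[t] == 0` marking a segmentation point, and the function name, are assumptions for illustration):

```python
import numpy as np

def expand_session_outputs(session_outputs, actions):
    """Expand |pi| session-level outputs back to sequence length L.
    session_outputs: (|pi|, d) array, one row per closed session.
    actions: length-L list; actions[t] == 0 ends a session at step t.
    Each step of session j receives the output that closed session j-1;
    the first session receives zeros."""
    L = len(actions)
    d = session_outputs.shape[1]
    expanded = np.zeros((L, d))
    sess = 0                       # index of the current session
    for t in range(L):
        if sess > 0:
            expanded[t] = session_outputs[sess - 1]
        if actions[t] == 0:        # this step closes the current session
            sess += 1
    return expanded
```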
In the third step, the policy network generates an action at each time step; taking the action a_t generated at time step t as an example, the method comprises the following steps:
e1. define the state s_t at time step t as the concatenation of the sequence-level output and the session-level output of the recurrent neural network at time step t;
e2. define the action space:
a_t ∈ {1, 0},
where 1 indicates that the current behavior belongs to the current session and 0 indicates that it does not;
e3. define the policy function π:
π(a_t | s_t; Θ) = σ(W_π · s_t + b_π),
where W_π, b_π are parameters of the policy network. During training, the action a_t is sampled according to the probability given by the policy π; during testing, the action a_t is chosen greedily from the policy probability.
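Step e3 can be sketched as follows (the greedy test-time rule, thresholding the policy probability at 0.5, is an assumed reading of the lost formula image; names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def policy_action(s_t, W_pi, b_pi, training=True):
    """pi(a_t = 1 | s_t) = sigmoid(W_pi . s_t + b_pi), per step e3.
    W_pi is a 1-D weight vector and b_pi a scalar (assumed shapes).
    Training: sample a_t from the Bernoulli probability (exploration);
    testing: pick the action greedily."""
    p = sigmoid(W_pi @ s_t + b_pi)
    if training:
        return int(rng.random() < p), p   # stochastic during training
    return int(p >= 0.5), p               # deterministic at test time
```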
In the third step, the classification network predicts the behavior of the next time step of the sequence; taking prediction of time step t+1 from time step t as an example, the method comprises the following steps:
f1. concatenate the user representation with the sum of the outputs of the hierarchical recurrent neural network, denoted o_t;
f2. add a fully connected layer on top:
ŷ_{t+1} = softmax(W_o · o_t + b_o),
where W_o, b_o are parameters of the classification network, whose output dimension equals the number of places M, and ŷ_{t+1} represents the predicted place the user arrives at in time step t+1, represented by one-hot coding.
In the fourth step, according to the different objective functions, the method comprises the following steps:
g1. when the policy network finishes generating the actions of a sequence, the segmentation of the whole sequence is also finished; first, define the delayed reward of the whole sequence for the policy network as:
R = R_pred − γ · f(L / L′),
where R_pred denotes the prediction part of the reward, computed from the true place labels y_t of the input X_L at each time step t (represented by one-hot coding); L′ represents the number of sessions in the sequence; γ is a hyper-parameter weighing the two parts of the reward; and Q is a constant. Assuming that a session of moderate length is preferred, the unimodal function f(x) = x + Q/x is proposed; it takes its minimum at x = √Q, i.e., an average session length of √Q is considered best. Different restrictions can be placed on the session length by replacing the second term of the reward function;
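The session-length term of the reward can be checked numerically (the function f is exactly as defined above; the default Q = 100, giving a preferred average session length of 10, follows the value suggested later in the embodiment):

```python
def length_penalty(avg_len, Q=100.0):
    """The unimodal term f(x) = x + Q/x from step g1; it attains its
    minimum 2*sqrt(Q) at x = sqrt(Q), so an average session length of
    sqrt(Q) is penalized least (Q = 100 -> preferred length 10)."""
    return avg_len + Q / avg_len
```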
g2. define the gradient update formula of one sequence in the policy network as:
∇_Θπ J = Σ_t R · ∇_Θπ log π(a_t | s_t; Θπ),
where Θπ denotes the parameters of the policy network;
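The update in g2 has the REINFORCE policy-gradient form; for the Bernoulli policy of e3, ∇ log π works out to (a_t − p_t) · s_t, which the following sketch uses (this closed form is my own derivation for illustration, not quoted from the patent):

```python
import numpy as np

def reinforce_grad(states, actions, probs, R):
    """Delayed-reward policy gradient for one sequence:
    grad = sum_t R * grad log pi(a_t | s_t).
    For pi(a=1|s) = sigmoid(w . s), d log pi / d w = (a_t - p_t) * s_t,
    where p_t is the probability assigned to a_t = 1."""
    g = np.zeros_like(states[0])
    for s_t, a_t, p_t in zip(states, actions, probs):
        g += R * (a_t - p_t) * s_t
    return g
```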
g3. define the cross-entropy function as the objective function for training the classification network:
L(Θ) = −Σ_t y_t · log ŷ_t + β · Ω(Θ),
where Θ represents all parameters in the classification network, Ω(Θ) is a regularization term on the parameters, and β is a hyper-parameter trading off the two parts of the loss.
In the fourth step, training the parameters of the network model in stages comprises the following steps:
h1. pre-train the classification network: apply the initial policy π₀ and the training samples, and update the parameters in the classification network by back propagation, minimizing the cross-entropy loss defined in g3;
h2. pre-train the policy network: keep the parameters in the classification network fixed, and train the parameters of the policy network with the gradient update formula defined in g2;
h3. joint training: jointly train all parameters in the network until the loss converges.
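The three training stages h1 to h3 can be outlined as a skeleton (the step callables and the convergence test are placeholders of my own; the patent's embodiment trains with TensorFlow optimizers):

```python
def train_staged(clf_step, pol_step, joint_step, epochs=10, tol=1e-4):
    """Skeleton of the three stages: clf_step, pol_step, and joint_step
    are assumed callables that run one optimization epoch; joint_step
    returns the current loss."""
    for _ in range(epochs):        # h1: pre-train the classification network
        clf_step()
    for _ in range(epochs):        # h2: pre-train the policy network
        pol_step()
    prev = float("inf")            # h3: joint training until loss converges
    while True:
        loss = joint_step()
        if abs(prev - loss) < tol:
            return loss
        prev = loss
```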
In the fifth step, the network model based on the hierarchical recurrent neural network and reinforcement learning is used to predict the next possible behavior of each user in the test set, comprising the following steps:
i1. concatenate the user representation with the sum of the outputs of the last time step of the hierarchical recurrent neural network, denoted o;
i2. add a fully connected layer to predict the target:
ŷ = softmax(W_o · o + b_o),
where ŷ is the predicted place distribution, represented by one-hot coding, and W_o, b_o are parameters of the classification network.
Compared with the prior art, the invention has the following beneficial effects:
(1) a hierarchical recurrent neural network models the user time sequence, taking user behavior representations at different levels into account, so important sequence information can be effectively extracted;
(2) the policy network learns to segment the user's time-series data, dividing the sequence reasonably by considering the context before and after each step, while various constraints are incorporated into the reward function;
(3) the drawbacks of artificially dividing the sequence into short sessions, such as the impossibility of one window size suiting all users, are effectively overcome.
Drawings
Fig. 1 is a schematic flow chart of a user time series behavior segmentation prediction method according to the present invention.
FIG. 2 is a block diagram of the overall network model in one embodiment of the invention.
Detailed Description
The present invention will be described in further detail with reference to the following specific examples and the accompanying drawings. Except for the contents specifically mentioned below, the procedures, conditions, and experimental methods for carrying out the present invention are general and common knowledge in the art, and the present invention is not particularly limited thereto. It should be noted that variations and modifications can be made by persons skilled in the art without departing from the spirit of the invention, all of which fall within the scope of the present invention.
The invention provides a user time sequence behavior automatic segmentation prediction method based on a hierarchical recurrent neural network and reinforcement learning, a flow chart is shown as figure 1, and the method comprises the following steps:
the method comprises the following steps: selecting a data set, preprocessing the data, and then segmenting the data into a training set, a verification set and a test set;
step two: representing the users and their time-sequence behaviors with high-dimensional one-hot codes, and converting them into low-dimensional dense vectors with an embedding technique as the input of the model;
step three: modeling the time-series data with a hierarchical recurrent neural network, using a policy network to generate an action at each time step that guides whether to segment the sequence, and then using a classification network to predict the behavior of the next time step of the sequence;
step four: training model parameters, namely using training samples to optimize parameters of the network model in stages according to different target functions and using a verification set to adjust and optimize the model parameters;
step five: and predicting the next possible behaviors of the users in the test set by using a network model based on the hierarchical recurrent neural network and the reinforcement learning.
In more detail, first, a data set is selected, taking the Gowalla data set as an example, and then the data set is processed by Python according to the following steps:
a1. for each user, sort the place sequence data by timestamp from earliest to latest;
a2. filter infrequent data: delete users with fewer than 10 behaviors, and delete items with fewer than 5 user interactions;
a3. select a time window, and record the time-window-based sequence segmentation as the initial policy π₀ of the policy network;
a4. for each user, take the last place of the time-series data as the test set, the second-to-last as the verification set, and the rest as the training set.
The input of the model is processed by calling packages in TensorFlow and Python, comprising the following steps:
b1. data coding: suppose there are N users and M places; one-hot coding is adopted, i.e., the user set is represented by N-dimensional sparse vectors in which each user's unique characteristic dimension is 1 and the rest are 0, and the same applies to the places;
b2. data embedding: an embedding technique maps the N-dimensional user vector to a low-dimensional dense vector space as the input of the subsequent model; the transformed user vector set is denoted U = {u₁, u₂, …, u_N} and the place vector set P = {p₁, p₂, …, p_M}.
Next, the GRU modules and tensor operations in TensorFlow are used to build the hierarchical recurrent neural network, comprising the following steps:
c1. sequence-level recurrent neural network:
c11. input a place sequence of length L, denoted X_L;
c12. compute through the recurrent neural network to obtain the output of each time step of the sequence level;
c2. session-level recurrent neural network:
c21. according to the segmentation policy π, the sequence-level outputs at the selected segmentation time steps are taken as the input of the session-level recurrent neural network, of length |π|;
c22. compute through the recurrent neural network to obtain the session-level outputs;
c3. expand the outputs by time step: the |π| session-level outputs are expanded into outputs of length L according to the segmentation policy π. Specifically, the output for each subsequent session is expanded from the hidden state of the final time step of the previous session, and the output for the first session is an all-zero vector.
A policy network is constructed with built-in functions of TensorFlow and used to generate the action of each time step; taking the action a_t generated at time step t as an example, the method comprises the following steps:
d1. compute the state s_t at time step t as the concatenation of the sequence-level output and the session-level output of the recurrent neural network at time step t;
d2. compute the policy function π:
π(a_t | s_t; Θ) = σ(W_π · s_t + b_π),
where W_π, b_π are parameters of the policy network. During training, the action a_t is sampled according to the probability given by the policy π; during testing, the action a_t is chosen greedily from the policy probability.
A classification network is constructed with the fully connected layers in TensorFlow to predict the behavior of the next time step of the sequence; taking prediction of time step t+1 from time step t as an example, the method comprises the following steps:
e1. concatenate the user representation with the sum of the outputs of the hierarchical recurrent neural network, denoted o_t;
e2. add a fully connected layer on top:
ŷ_{t+1} = softmax(W_o · o_t + b_o),
where W_o, b_o are parameters of the classification network, whose output dimension equals the number of places M, and ŷ_{t+1} represents the predicted place the user arrives at in time step t+1, represented by one-hot coding.
The parameters of the network model are trained according to the different objective functions by calling optimization functions such as back propagation in TensorFlow, comprising the following steps:
f1. pre-train the classification network: apply the initial policy π₀ and the training samples, and use back propagation with the cross-entropy function as the objective of training the classification network:
L(Θ) = −Σ_t y_t · log ŷ_t + β · Ω(Θ),
where Θ represents all parameters in the classification network, Ω(Θ) is a regularization term on the parameters, and β is a hyper-parameter trading off the two parts of the loss; the parameters in the network are updated by minimizing the above formula;
f2. pre-train the policy network: keep the parameters in the classification network fixed; the delayed reward of the whole sequence for the policy network is:
R = R_pred − γ · f(L / L′),
where R_pred denotes the prediction part of the reward, computed from the true place labels y_t of the input X_L at each time step t (represented by one-hot coding); L′ represents the number of sessions in the sequence; γ is a hyper-parameter weighing the two parts of the reward; and Q is the constant in f(x) = x + Q/x. In practice Q = 100 may be set, and different restrictions can be placed on the session length by changing the second term of the reward function. The gradient update formula of one sequence in the policy network is:
∇_Θπ J = Σ_t R · ∇_Θπ log π(a_t | s_t; Θπ),
where Θπ denotes the parameters of the policy network, which are thereby trained;
f3. joint training: jointly train all parameters in the network until the loss converges.
Using the trained parameters W_o, b_o of the classification network, the next possible behavior of each user in the test set is predicted, comprising the following steps:
g1. concatenate the user representation with the sum of the outputs of the last time step of the hierarchical recurrent neural network, denoted o;
g2. add a fully connected layer to predict the target:
ŷ = softmax(W_o · o + b_o),
where ŷ is the predicted place distribution, represented by one-hot coding, and W_o, b_o are parameters of the classification network.
In practice, the following optional step may also be included between the model layers: during model training, dropout and a two-norm regularization of the parameters are used to constrain the parameters and prevent overfitting.
In an embodiment of the present invention, a frame diagram of a whole network model is shown in fig. 2:
h1. hierarchical recurrent neural network: for a user's sequence input, the sequence-level and session-level recurrent neural networks extract sequence information representations at different levels;
h2. policy network: receives the user representation and the output of the hierarchical recurrent neural network, computes the delayed reward of the whole sequence with a fully connected layer, and updates its parameters by gradient;
h3. classification network: receives the user representation and the output of the hierarchical recurrent neural network, predicts the user's behavior at the future time step with a fully connected layer, and updates its parameters with the back propagation algorithm.
The method of the invention can also be applied to other user time sequence behaviors, such as a commodity purchasing sequence of a user and a music listening sequence of the user, the implementation is basically the same as the embodiment, and the specific process is not explained in detail.
The parameters in the embodiments of the present invention are determined from experimental results: different parameter combinations are tested, the group with the best evaluation metrics on the verification set is selected, and the reported results are obtained by evaluating on the test set. In actual use, the parameters can be adjusted appropriately as required to achieve the purpose of the invention.
The protection of the present invention is not limited to the above embodiments. Variations and advantages that may occur to those skilled in the art are intended to be included within the present invention without departing from the spirit and scope of the inventive concept and are intended to be protected by the following claims.

Claims (11)

1. A user time sequence behavior automatic segmentation prediction method based on a hierarchical recurrent neural network and reinforcement learning is characterized by comprising the following steps:
step one: selecting a data set, preprocessing the data, and then segmenting the data into a training set, a verification set and a test set;
step two: representing the user and the time sequence behavior by using high-dimensional one-hot coding, and converting the user and the time sequence behavior into low-dimensional dense vectors by using an embedding technology to be used as the input of a model;
step three: modeling time sequence data by using a hierarchical recurrent neural network, generating an action of each time step by using a strategy network to guide whether to segment the sequence, and then completing prediction of the next time step action of the sequence by using a classification network;
in the third step, the time-series data is modeled with a hierarchical recurrent neural network; taking a user u_k as an example, the method comprises the following steps:
d1. sequence-level recurrent neural network:
d11. input a location sequence of length L, denoted X_L = {x_1, x_2, …, x_L};
d12. compute the output at each time step with a recurrent neural network, denoted {h_1^seq, h_2^seq, …, h_L^seq};
d2. session-level recurrent neural network:
d21. according to the segmentation policy π, select from the sequence-level outputs the results at the segmentation time steps as the input of the session-level recurrent neural network; this input has length |π|;
d22. compute the session-level outputs with a recurrent neural network, denoted {h_1^ses, h_2^ses, …, h_{|π|}^ses};
d3. expand the output by time step: expand the length-|π| session-level output into a length-L output according to the segmentation policy π;
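The select-and-expand bookkeeping of steps d21 and d3 can be sketched in plain Python (a hypothetical illustration: the function names, the use of action value 0 as a segmentation marker, and the repeat-per-segment expansion rule are assumptions, not taken from the patent):

```python
def select_segment_steps(seq_outputs, actions):
    """d21: keep the sequence-level outputs at the time steps where the
    policy marks a segmentation boundary (assumed here to be action 0,
    i.e. the behavior does not belong to the current session)."""
    return [h for h, a in zip(seq_outputs, actions) if a == 0]

def expand_session_outputs(session_outputs, session_lengths):
    """d3: expand a length-|pi| session-level output back to length L by
    repeating each session's output across that session's time steps."""
    expanded = []
    for h, n in zip(session_outputs, session_lengths):
        expanded.extend([h] * n)
    return expanded
```

With this sketch, a 4-step sequence whose policy actions are [1, 0, 1, 0] yields two session-level inputs, and two session outputs covering two steps each expand back to a length-4 sequence.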
wherein the recurrent neural network refers to a gated recurrent unit (GRU) network or a long short-term memory (LSTM) network; taking time step t as an example, with x_t denoting the input at time step t, the calculation comprises the following steps:
c1. compute the update gate z_t:
z_t = σ(W_z · [h_{t-1}, x_t] + b_z),
c2. compute the reset gate r_t:
r_t = σ(W_r · [h_{t-1}, x_t] + b_r),
c3. compute the candidate hidden state h̃_t:
h̃_t = tanh(W_h · [r_t ⊙ h_{t-1}, x_t] + b_h),
c4. compute the hidden state h_t:
h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t,
where σ is the sigmoid function, ⊙ denotes the Hadamard (element-wise) product, [·, ·] denotes vector concatenation, · denotes matrix multiplication, and W_z, W_r, W_h, b_z, b_r, b_h are all learnable model parameters;
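The gating computations c1–c4 correspond to a standard gated recurrent unit; a minimal pure-Python single-step sketch (the weight layout and helper names are illustrative, not from the patent):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply matrix W (a list of rows) by vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gru_step(x_t, h_prev, Wz, Wr, Wh, bz, br, bh):
    """One GRU step: update gate z_t, reset gate r_t,
    candidate state h~_t, and new hidden state h_t (c1-c4)."""
    concat = h_prev + x_t  # [h_{t-1}, x_t]
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, concat), bz)]
    r = [sigmoid(a + b) for a, b in zip(matvec(Wr, concat), br)]
    reset_concat = [ri * hi for ri, hi in zip(r, h_prev)] + x_t
    h_tilde = [math.tanh(a + b) for a, b in zip(matvec(Wh, reset_concat), bh)]
    # h_t = (1 - z_t) (.) h_{t-1} + z_t (.) h~_t
    return [(1 - zi) * hi + zi * hti for zi, hi, hti in zip(z, h_prev, h_tilde)]
```

With all weights and biases zero, both gates evaluate to 0.5 and the candidate state to 0, so the hidden state decays toward zero by half each step.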
step four: train the model parameters: optimize the parameters of the network model in stages with the training samples according to the different objective functions, and tune the model parameters with the verification set;
step five: predict the next possible behavior of each user in the test set with the network model based on the hierarchical recurrent neural network and reinforcement learning.
2. The automatic segmentation prediction method for user time series behaviors based on hierarchical recurrent neural network and reinforcement learning as claimed in claim 1, wherein the user time series behaviors include user sign-in behavior, user commodity purchasing behavior, user webpage clicking behavior, and user music listening behavior, all of which are behavior data commonly adopted in the field.
3. The method for automatic segmentation prediction of user time-series behavior based on hierarchical recurrent neural network and reinforcement learning of claim 1, wherein in the first step, the data set comprises: one or more of the Gowalla dataset, the Foursquare dataset, and the Amazon dataset, all of which are public behavior datasets commonly employed in the art.
4. The method according to claim 1, wherein in the first step, preprocessing the data comprises the following steps:
a1. for each user, sort the behavior sequence data by timestamp from earliest to latest;
a2. filter out infrequent data;
a3. select a time window, and record the time-window-based segmentation of the sequence as the initial policy π_0 of the policy network.
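The time-window-based initial policy π_0 of step a3 can be sketched as follows (a hypothetical illustration: the gap rule, the boundary handling, and the 1 = same session / 0 = new session convention from claim 7 are one plausible reading, and the function name is an assumption):

```python
def initial_policy(timestamps, window_seconds):
    """a3: initial segmentation pi_0 - open a new session whenever the
    gap to the previous behavior exceeds the chosen time window.
    Returns one action per time step: 1 = same session, 0 = new session."""
    actions = []
    for i, t in enumerate(timestamps):
        if i == 0 or t - timestamps[i - 1] > window_seconds:
            actions.append(0)  # segment: this behavior opens a new session
        else:
            actions.append(1)  # this behavior stays in the current session
    return actions
```

For example, with a one-hour window, check-ins at seconds [0, 10, 4000, 4010] produce two sessions.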
5. The method for automatically segmenting and predicting the user time-series behaviors based on the hierarchical recurrent neural network and the reinforcement learning of claim 1, wherein in the first step, segmenting the data into a training set, a verification set and a test set means: for each user, the last location of the time-series data is used as the test set, the second-to-last location as the verification set, and the rest as the training set.
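The per-user leave-one-out split of claim 5 can be sketched as (the function name is illustrative):

```python
def split_user_sequence(seq):
    """For one user's time-ordered behaviors: last item -> test,
    second-to-last item -> validation, the rest -> training."""
    if len(seq) < 3:
        raise ValueError("need at least 3 behaviors per user")
    return seq[:-2], seq[-2], seq[-1]
```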
6. The method for automatic segmentation prediction of user time-series behavior based on hierarchical recurrent neural network and reinforcement learning of claim 1, wherein in the second step, preparing the input of the model comprises the following steps:
b1. data coding: given N users and M locations, adopt one-hot coding, i.e. represent the user set with N-dimensional sparse vectors in which each user's unique feature dimension is 1 and the rest are 0; the same applies to the locations;
b2. data embedding: map the N-dimensional user vectors into a low-dimensional dense vector space with an embedding technique as the input of the subsequent model; the transformed user vector set is denoted U = {u_1, u_2, …, u_N} and the location vector set is denoted P = {p_1, p_2, …, p_M}.
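Steps b1 and b2 amount to one-hot coding followed by an embedding lookup: multiplying a one-hot vector by an N × d embedding matrix simply selects the corresponding row. A minimal sketch (names and shapes are illustrative):

```python
def one_hot(index, dim):
    """b1: a dim-dimensional sparse vector with a single 1."""
    v = [0.0] * dim
    v[index] = 1.0
    return v

def embed(one_hot_vec, embedding_matrix):
    """b2: one-hot times embedding matrix = the row for that index,
    i.e. the low-dimensional dense vector fed to the model."""
    return [sum(x * row[j] for x, row in zip(one_hot_vec, embedding_matrix))
            for j in range(len(embedding_matrix[0]))]
```

In practice the multiplication is never materialized; frameworks index the embedding table directly, which is the same operation.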
7. The method according to claim 1, wherein in the third step, the policy network generates the action of each time step; taking the generation of action a_t at time step t as an example, the method comprises the following steps:
e1. define the state s_t:
s_t = [h_t^seq, h_t^ses],
where [·, ·] denotes vector concatenation, and h_t^seq and h_t^ses are the outputs of the sequence-level and session-level recurrent neural networks at time step t, respectively;
e2. define the action space a_t:
a_t ∈ {1, 0},
where 1 indicates that the current behavior belongs to the current session and 0 indicates that it does not;
e3. define the policy function π:
π(a_t | s_t; Θ) = σ(W_π · s_t + b_π),
where W_π and b_π are parameters of the policy network; during training, action a_t is sampled according to the probability given by the policy π, while at test time a_t is taken as the action with the larger probability under π.
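The single-unit sigmoid policy of step e3 can be sketched as follows (a hypothetical illustration: the function names, the Bernoulli sampling, and the 0.5 threshold reading of "the larger probability" are assumptions):

```python
import math
import random

def policy_prob(state, w_pi, b_pi):
    """e3: pi(a_t = 1 | s_t) = sigmoid(W_pi . s_t + b_pi)
    for a single-unit policy head."""
    z = sum(w * s for w, s in zip(w_pi, state)) + b_pi
    return 1.0 / (1.0 + math.exp(-z))

def choose_action(state, w_pi, b_pi, training, rng=random):
    p = policy_prob(state, w_pi, b_pi)
    if training:                  # training: sample a_t from the policy
        return 1 if rng.random() < p else 0
    return 1 if p >= 0.5 else 0   # test: take the more probable action
```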
8. The automatic segmentation prediction method for user time-series behaviors based on hierarchical recurrent neural networks and reinforcement learning of claim 1, wherein in the third step, the classification network is used to predict the behavior of the next time step in the sequence; taking the prediction of time step t+1 from time step t as an example, the method comprises the following steps:
f1. concatenate the user representation with the sum of the outputs of the hierarchical recurrent neural network:
o_t = [u_k, h_t^seq + h_t^ses],
f2. add a fully connected layer on top:
ŷ_{t+1} = softmax(W_o · o_t + b_o),
where W_o and b_o are parameters of the classification network whose output dimension equals the number of locations M, and ŷ_{t+1} represents the predicted location the user arrives at in time step t+1, expressed over the M-dimensional sparse one-hot coding of locations.
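Steps f1–f2 can be sketched as follows, under the assumption (one plausible reading of the claim) that the user vector is concatenated with the element-wise sum of the two recurrent outputs before the fully connected softmax layer; all names are illustrative:

```python
import math

def softmax(logits):
    m = max(logits)                     # subtract max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def predict_next(user_vec, h_seq, h_ses, W_o, b_o):
    """f1: o_t = [u_k, h_t_seq + h_t_ses];
    f2: softmax(W_o . o_t + b_o) over the M locations."""
    o = user_vec + [a + b for a, b in zip(h_seq, h_ses)]
    logits = [sum(w * x for w, x in zip(row, o)) + bo
              for row, bo in zip(W_o, b_o)]
    return softmax(logits)
```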
9. The method for automatically segmenting and predicting the user time-series behaviors based on the hierarchical recurrent neural network and the reinforcement learning of claim 1, wherein the fourth step, according to the different objective functions, comprises the following steps:
g1. when the policy network has generated the actions of the whole sequence, define the delayed reward function of the whole sequence for the policy network as:
R = [equation image in the original],
where y_t is the true location label at time step t of the input X_L, represented by one-hot coding, L' denotes the number of sessions in the sequence, γ is a hyper-parameter weighing the two parts of the reward, and Q is a constant;
g2. define the gradient update for one sequence in the policy network as the policy-gradient update:
∇_{Θ_π} J = Σ_{t=1}^{L} R · ∇_{Θ_π} log π(a_t | s_t; Θ_π),
where Θ_π denotes the parameters of the policy network;
g3. define a cross-entropy function as the objective for training the classification network:
L(Θ) = [equation image in the original],
where Θ represents all parameters in the classification network and β is a hyper-parameter that trades off the two parts of the loss.
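The patent's exact reward and loss formulas are given only as equation images; as a generic, hypothetical illustration, a REINFORCE-style gradient for the Bernoulli sigmoid policy of claim 7 accumulates R · ∇ log π(a_t|s_t) over the sequence (all names and the specific update form here are assumptions, not the patent's formula):

```python
import math

def reinforce_grad(states, actions, reward, w_pi, b_pi):
    """Generic REINFORCE sketch for a single-unit sigmoid policy
    pi(a_t = 1 | s_t) = sigmoid(w . s_t + b): accumulate
    R * d log pi(a_t | s_t) / d(w, b) over the whole sequence."""
    gw = [0.0] * len(w_pi)
    gb = 0.0
    for s, a in zip(states, actions):
        z = sum(w * x for w, x in zip(w_pi, s)) + b_pi
        p = 1.0 / (1.0 + math.exp(-z))
        # For a Bernoulli(sigmoid(z)) policy, d log pi / dz = (a - p).
        coeff = reward * (a - p)
        gw = [g + coeff * x for g, x in zip(gw, s)]
        gb += coeff
    return gw, gb
```

The delayed reward is only available after the whole sequence is segmented, which is why the gradient is accumulated per sequence rather than per step.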
10. The method for automatic segmentation prediction of user temporal behavior based on hierarchical recurrent neural networks and reinforcement learning of claim 9, wherein in the fourth step, training the parameters of the network model in stages comprises the following steps:
h1. pre-train the classification network: with the initial policy π_0 and the training samples, update the parameters in the classification network by back propagation, taking the function of step g3 as the minimization objective;
h2. pre-train the policy network: keeping the parameters of the classification network fixed, train the policy-network parameters with the gradient update formula of step g2;
h3. joint training: jointly train all parameters in the network until the loss converges.
11. The method for automatically segmenting and predicting the user time-series behaviors based on the hierarchical recurrent neural network and the reinforcement learning of claim 1, wherein in the fifth step, predicting the next possible behavior of each user in the test set with the network model based on the hierarchical recurrent neural network and reinforcement learning comprises the following steps:
i1. concatenate the user representation with the sum of the outputs of the hierarchical recurrent neural network at the last time step:
o = [u_k, h_L^seq + h_L^ses],
i2. add a fully connected layer to predict the target:
ŷ = softmax(W_o · o + b_o),
where ŷ is the predicted location distribution, expressed over the M-dimensional sparse one-hot location coding, and W_o, b_o are the parameters of the classification network.
CN201910279004.0A 2019-04-09 2019-04-09 Automatic segmentation prediction method for user time sequence behavior Active CN110110372B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910279004.0A CN110110372B (en) 2019-04-09 2019-04-09 Automatic segmentation prediction method for user time sequence behavior


Publications (2)

Publication Number Publication Date
CN110110372A CN110110372A (en) 2019-08-09
CN110110372B true CN110110372B (en) 2023-04-18

Family

ID=67483968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910279004.0A Active CN110110372B (en) 2019-04-09 2019-04-09 Automatic segmentation prediction method for user time sequence behavior

Country Status (1)

Country Link
CN (1) CN110110372B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160484B (en) * 2019-12-31 2023-08-29 腾讯科技(深圳)有限公司 Data processing method, data processing device, computer readable storage medium and electronic equipment
CN112001536B (en) * 2020-08-12 2023-08-11 武汉青忆辰科技有限公司 High-precision discovery method for point defect minimum sample of mathematical ability of middle and primary schools based on machine learning
CN112525213B (en) * 2021-02-10 2021-05-14 腾讯科技(深圳)有限公司 ETA prediction method, model training method, device and storage medium
CN114417817B (en) * 2021-12-30 2023-05-16 中国电信股份有限公司 Session information cutting method and device

Citations (4)

Publication number Priority date Publication date Assignee Title
CN105787100A (en) * 2016-03-18 2016-07-20 浙江大学 User session recommendation method based on deep neural network
CN108595602A (en) * 2018-04-20 2018-09-28 昆明理工大学 The question sentence file classification method combined with depth model based on shallow Model
CN108647251A (en) * 2018-04-20 2018-10-12 昆明理工大学 The recommendation sort method of conjunctive model is recycled based on wide depth door
CN109241431A (en) * 2018-09-07 2019-01-18 腾讯科技(深圳)有限公司 A kind of resource recommendation method and device

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20180342004A1 (en) * 2017-05-25 2018-11-29 Microsoft Technology Licensing, Llc Cumulative success-based recommendations for repeat users


Non-Patent Citations (4)

Title
Balázs Hidasi, "Session-based Recommendations with Recurrent Neural Networks", ICLR 2016, pp. 1-9. *
Dongyang Zhao, "Deep Hierarchical Reinforcement Learning Based Recommendations via Multi-goals Abstraction", arXiv:1903.09374v1 [cs.LG], 2019-03-22, pp. 1-11. *
Minmin Chen, "Top-K Off-Policy Correction for a REINFORCE Recommender System", Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, 2019-02-11, pp. 456-464. *
Wang Boli, "A Recurrent-Neural-Network-Based Method for Sentence Segmentation of Classical Chinese" (一种基于循环神经网络的古文断句方法), Journal of Peking University (Natural Science Edition), 2017-03-31, pp. 255-261. *


Similar Documents

Publication Publication Date Title
CN110110372B (en) Automatic segmentation prediction method for user time sequence behavior
CN111275521B (en) Commodity recommendation method based on user comment and satisfaction level embedding
CN109345302A (en) Machine learning model training method, device, storage medium and computer equipment
Chau Application of a PSO-based neural network in analysis of outcomes of construction claims
CN111581966B (en) Context feature-fused aspect-level emotion classification method and device
CN110619044B (en) Emotion analysis method, system, storage medium and equipment
CN110083700A (en) A kind of enterprise's public sentiment sensibility classification method and system based on convolutional neural networks
CN109766557B (en) Emotion analysis method and device, storage medium and terminal equipment
CN110287477A (en) Entity emotion analysis method and relevant apparatus
CN112883714B (en) ABSC task syntactic constraint method based on dependency graph convolution and transfer learning
CN110909125B (en) Detection method of media rumor of news-level society
CN115082147A (en) Sequence recommendation method and device based on hypergraph neural network
CN111582538A (en) Community value prediction method and system based on graph neural network
CN112967088A (en) Marketing activity prediction model structure and prediction method based on knowledge distillation
CN114443899A (en) Video classification method, device, equipment and medium
Wu et al. Optimized deep learning framework for water distribution data-driven modeling
CN111538841B (en) Comment emotion analysis method, device and system based on knowledge mutual distillation
Chen et al. A new approach for mobile advertising click-through rate estimation based on deep belief nets
CN116402352A (en) Enterprise risk prediction method and device, electronic equipment and medium
Zou et al. Deep field relation neural network for click-through rate prediction
CN110569355A (en) Viewpoint target extraction and target emotion classification combined method and system based on word blocks
Jenny Li et al. Evaluating deep learning biases based on grey-box testing results
CN116975686A (en) Method for training student model, behavior prediction method and device
Akram et al. A comprehensive survey on Pi-Sigma neural network for time series prediction
CN116258504A (en) Bank customer relationship management system and method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant