CN110110372A

CN110110372A - A kind of user's timing behavior automatic segmentation prediction technique

Info

Publication number: CN110110372A
Application number: CN201910279004.0A
Authority: CN
Inventors: 张伟; 梁文伟
Original assignee: East China Normal University
Current assignee: East China Normal University
Priority date: 2019-04-09
Filing date: 2019-04-09
Publication date: 2019-08-09
Anticipated expiration: 2039-04-09
Also published as: CN110110372B

Abstract

Recommendation based on short session is always a hot issue in recommender system.Recommendation based on short session means that the Continuous behavior according to user in a bit of time window will predict user future.Time window of traditional method generally according to fixed size, the timing behavior of user is divided into multiple short sessions, such division mode is there is including excessive user behavior in the excessive then short session of 1) time window, and too small, short session can not cover complete user's stage behavior；2) the problems such as being difficult the time window that setting one is suitable for all user behaviors.Therefore, the present invention provides a kind of user's timing behavior automatic segmentation prediction technique based on the study of depth serial reinforcement, user's sequence is divided without artificial, effective solution drawbacks described above.

Description

A kind of user's timing behavior automatic segmentation prediction technique

Technical field

The present invention relates to computer science and technology fields, and in particular to is a kind of based on level Recognition with Recurrent Neural Network and reinforcing User's timing behavior automatic segmentation prediction technique of study.

Background technique

Recommendation based on short session is always a hot issue in machine learning and recommender system field.Based on short session Recommendation mean that the Continuous behavior according to user in a bit of time window will predict user future.For example, user It registers in certain social networking application in one day 5 places；User clicks within a period of time for logging in certain e-commerce website 8 commodity etc..Traditional method is to be modeled by Recognition with Recurrent Neural Network to such short session.

However, time window of traditional method generally according to fixed size, the complete timing behavior of user is divided into Multiple short sessions, there is 1) for a user, time window is arranged can in excessive then short session for such division mode Can comprising multistage mutually to independent user behavior, in contrast, time window setting it is too small, short session can not cover one it is complete Whole user's stage behavior；2) the problems such as being difficult the time window that setting one is suitable for all user behaviors.Therefore, this hair It is bright to provide a kind of user's timing behavior automatic segmentation prediction technique based on level circulation neural network and intensified learning, it is not necessarily to Artificial divides user's sequence, effective solution drawbacks described above.

Summary of the invention

The present invention innovates for the first time provides a kind of user's timing behavior based on level Recognition with Recurrent Neural Network and intensified learning Automatic segmentation prediction technique, core is cutting of the Utilization strategies e-learning to user's time series data, and is followed using level Ring neural network models the session of division, predicts the behavior of future customer.Through retrieving, there is not yet it is any with The relevant prior art of the present invention or report.The present invention models user's time series using level Recognition with Recurrent Neural Network, The user behavior for considering different levels indicates, can efficiently extract out important sequence information.

User's timing behavior automatic segmentation prediction proposed by the present invention based on level Recognition with Recurrent Neural Network and intensified learning Method, comprising the following steps:

Step 1: choosing data set, and cutting data are training set, verifying collection and test set after pre-processing to data；

Step 2: the one-hot coding of user and timing behavior higher-dimension are indicated, are converted to low-dimensional using embedded technology Input of the dense vector as model；

Step 3: modeling time series data using level Recognition with Recurrent Neural Network, when Utilization strategies network generates each The action director of spacer step whether to sequence carry out cutting, then using sorter network complete to sequence future time walking be it is pre- It surveys；

Step 4: training pattern parameter, using training sample, according to the optimization network model of different target function stage Parameter, and utilize verifying the set pair analysis model parameter carry out tuning；

Step 5: using based on the user in the network model of level Recognition with Recurrent Neural Network and intensified learning prediction test set Next probable behavior.

In the present invention, user's timing behavior include user register behavior, user buy commodity behavior, user click Music behavior is listened in webpage behavior, user, all behavioral datas generallyd use for this field.

In the present invention, the data set includes: Gowalla data set, Foursquare data set, Amazon data set In one or more data sets, all overt behavior data sets generallyd use for this field.

In the step 1, data are pre-processed the following steps are included:

A1. behavior sequence data are temporally stabbed by user and is sorted from the distant to the near；

A2. operation is filtered to data wherein infrequently, deletes the user for occurring to be less than 10 behaviors, deleted and occur Less than the article of 5 user behaviors；

A3. a time window is selected, will be recorded as the sequence slit mode of cutting foundation as policy network The initial policy π of network₀。

In the step 1, cutting data are training set, and verifying collection and test set refer to each user, by time series data The last one place as test set, for the penultimate of time series data as verifying collection, remaining is accordingly to be regarded as training set.

In the step 2, obtain the input of model the following steps are included:

B1. data encoding: note shares N number of user and M place, using one-hot coding, i.e., is indicated with N-dimensional sparse vector User's set, the unique characteristic dimension of user are denoted as 1, remaining is all 0, are similarly applied to place；

B2. data insertionization: the numerical value vector that N-dimensional user vector is mapped to another low-dimensional using embedded technology is empty Between, as the input of model later, remember that transformed user vector set expression is U={ u₁,u₂,…,u_N, place vector set Conjunction is expressed as P={ p₁,p₂,…,p_M}。

In the step three, Recognition with Recurrent Neural Network refers to but is not limited to gating cycle unit networks, is remembered using shot and long term Recall network replacement also can, by taking time step t as an example, remember x_tInput when for time step t, specific calculating process includes following step It is rapid:

C1. it calculates and updates door z_t:

z_t=σ (W_z·[h_t-1,x_t]+b_z),

C2. resetting door r is calculated_t:

r_t=σ (W_r·[h_t-1,x_t]+b_r),

C3. cryptomnesia state is calculated

C4. hidden state h is calculated_t:

Wherein, σ is sigmoid function, indicates that Hadamard is long-pending, the splicing of [] expression vector, * representing matrix multiplication,It is all the parameter that model can learn.

In the step 3, time series data is modeled using level Recognition with Recurrent Neural Network, u is indicated with user_kFor, Include the following steps:

D1. sequence level Recognition with Recurrent Neural Network:

D11. list entries length is the location sequence of L, is denoted as

D12. it is calculated by Recognition with Recurrent Neural NetworkObtain the defeated of each time step of sequence level Out, it is denoted as

D2. session-level Recognition with Recurrent Neural Network:

D21. according to cutting strategy π, from sequence level Recognition with Recurrent Neural Network export selection cutting time step it is corresponding as a result, As the input of session-level Recognition with Recurrent Neural Network, length is | π |, it is denoted as

D22. it is calculated by Recognition with Recurrent Neural NetworkObtain session-level Recognition with Recurrent Neural Network Output, is denoted as

D3. according to time step be unfolded export: be by length | π | outputAccording to cutting strategy π, exhibition Open the output for being L for length.

In the step three, Utilization strategies network generates the movement of each time step, dynamic with what is generated in time step t Make a_tCiting, includes the following steps:

E1. definition status function s_t:

WhereinIndicate the splicing of vector,WithRespectively Recognition with Recurrent Neural Network is in sequence level and session-level Output in time step t；

E2. motion space a is defined_t:

a_t={ 1,0 },

Wherein 1 expression current behavior belongs to current session, and 0 expression is not belonging to current session；

E3. definition strategy function π:

π(a_t|s_t；Θ)=σ (W_π*s_t+b_π),

Wherein W_π,b_πFor the parameter of tactful network.In the training process, a is acted_tValue by tactful π probability value sample institute , in test, movement a is depended on

In the step three, the behavior walked using sorter network forecasting sequence future time, with time step t predicted time Walk t+1 citing, comprising the following steps:

F1. splicing user indicates the sum of the output with level Recognition with Recurrent Neural Network:

F2. full articulamentum is added on it:

Wherein W_o,b_oIt is the parameter of sorter network, dimension consistent with place number is M,Indicate that the user of prediction exists The place that time step t+1 is gone to, is indicated with one-hot coding.

In the step four, according to different objective functions, comprising the following steps:

G1. when entire tactful network completes the generation acted to sequence, the cutting of entire sequence is also just completed, fixed first The delay reward function of adopted entire sequence strategy network are as follows:

Wherein y_tIt is input X_LIt in the true place marks of time step t, is indicated with one-hot coding, L' indicates session in sequence Number, γ be measure two parts reward hyper parameter, Q is a certain constant.It is assumed that the moderate length of one section of session, therefore propose Unimodal function f (x)=x+Q/x,When get minimum valueThe length of artificial one section of session of hopeIt can be relatively good.Section 2 by replacing reward function can propose different limitations to the length of session；

G2. in definition strategy network a sequence gradient updating formula are as follows:

WhereinIt is the parameter in tactful network；

G3. definition intersects the objective function that entropy function is training sorter network:

Wherein Θ represents parameter all in sorter network, and β is the hyper parameter for weighing two parts loss.

In the step four, the parameter of interim training network model, comprising the following steps:

H1. pre-training sorter network: initial policy π is applied₀And training sample, using backpropagation, with fixed in aforementioned (3) The cross entropy loss function function of justice is to minimize target, updates the parameter in sorter network；

H2. pre-training strategy network: keeping the parameter constant in sorter network, passes through and updates ladder defined in aforementioned (2) Spend more new formula, the parameter in Training strategy network；

H3. joint training: the parameter in joint training whole network, until loss restrains.

In the step five, test set is predicted using the network model based on level Recognition with Recurrent Neural Network and intensified learning In the lower probable behavior of user, comprising the following steps:

I1. splicing user indicates the sum of the output with last time step of level Recognition with Recurrent Neural Network:

I2. full articulamentum prediction target is added:

WhereinIt is distributed, is indicated with one-hot coding, W for the place of prediction_o,b_oIt is the parameter of sorter network.

Compared with prior art, the present invention includes: with beneficial effect

(1) user's time series is modeled using level Recognition with Recurrent Neural Network, it is contemplated that user's row of different levels To indicate, important sequence information can be efficiently extracted out；

(2) it is reasonable to consider that the connection between sequence front and back comes for cutting of the Utilization strategies e-learning to user's time series data Dividing sequence, while various constraints being taken into account in reward function；

(3) existing defect when the artificial dividing sequence of effective solution is short session is suitable for institute as that can not provide one There is the problem of window size of user.

Detailed description of the invention

Fig. 1 is the flow diagram of user's timing behavior cutting prediction technique of the present invention.

Fig. 2 is the frame diagram of whole network model in one embodiment of the invention.

Specific embodiment

In conjunction with following specific embodiments and attached drawing, the present invention is described in further detail.Implement process of the invention, Condition, experimental method etc. are among the general principles and common general knowledge in the art, this hair in addition to what is specifically mentioned below It is bright that there are no special restrictions to content.It should be pointed out that those skilled in the art, not departing from present inventive concept Under the premise of, various modifications and improvements can be made.These are all within the scope of protection of the present invention.

It is pre- that the present invention provides a kind of user's timing behavior automatic segmentation for recycling neural network and intensified learning based on level Survey method, flow chart as shown in Figure 1, method includes the following steps:

It is finer, it is carried out as follows by taking Gowalla data set as an example using Python firstly, choosing data set Processing:

A1. location sequence data are temporally stabbed by user and is sorted from the distant to the near；

A3. a time window is selected, will be recorded as the sequence slit mode of cutting foundation as policy network The initial policy π of network₀；

A4. to each user, using the last one place of time series data as test set, the penultimate of time series data Collect as verifying, remaining is accordingly to be regarded as training set.

By calling some packets in Tensorflow and Python, the processing of the input of model, including following step are completed It is rapid:

Next, being completed using the variation of GRU module and tensor in Tensorflow to level Recognition with Recurrent Neural Network Building, comprising the following steps:

C1. sequence level Recognition with Recurrent Neural Network:

C11. list entries length is the location sequence of L, is denoted as

C12. it is calculated by Recognition with Recurrent Neural NetworkObtain the defeated of each time step of sequence level Out, it is denoted as

C2. session-level Recognition with Recurrent Neural Network:

C21. according to cutting strategy π, from sequence level Recognition with Recurrent Neural Network export selection cutting time step it is corresponding as a result, As the input of session-level Recognition with Recurrent Neural Network, length is | π |, it is denoted as

C22. it is calculated by Recognition with Recurrent Neural NetworkObtain session-level Recognition with Recurrent Neural Network Output, is denoted as

C3. according to time step be unfolded export: be by length | π | outputAccording to cutting strategy π, exhibition Open the output for being L for length.Specifically, the output of latter session stage is unfolded by the hidden state of the final time step of previous session It obtains, the output of first session is full null vector.

Using the built-in function construction strategy network of Tensorflow, and generate with it movement of each time step, with The movement a generated when time step t_tCiting, includes the following steps:

D1. function of state s is calculated_t:

D2. calculative strategy function π:

π(a_t|s_t；Θ)=σ (W_π*s_t+b_π),

Using the full articulamentum etc. in Tensorflow, the behavior of sorter network forecasting sequence future time step is constructed, with Time step t predicted time walks t+1 citing, comprising the following steps:

E1. splicing user indicates the sum of the output with level Recognition with Recurrent Neural Network:

E2. full articulamentum is added on it:

By calling the majorized functions such as the backpropagation in Tensorflow, according to different objective functions, training network The parameter of model, comprising the following steps:

F1. pre-training sorter network: initial policy π is applied₀And training sample, using backpropagation, to intersect entropy function For training sorter network objective function:

Wherein Θ represents parameter all in sorter network, and β is the hyper parameter for weighing two parts loss.On minimizing Formula updates the parameter in network；

F2. the parameter constant in sorter network, the delay reward of entire sequence strategy network pre-training strategy network: are kept Function are as follows:

Wherein y_tIt is input X_LIt in the true place marks of time step t, is indicated with one-hot coding, L' indicates session in sequence Number, γ be measure two parts reward hyper parameter, Q is a certain constant.Q=100, replacement reward can be set in practice The Section 2 of function can propose different limitations to the length of session.Update the gradient updating formula of a sequence in tactful network Are as follows:

WhereinIt is the parameter in tactful network, with this Training strategy network；

F3. joint training: the parameter in joint training whole network, until loss restrains.

Utilize parameter W trained in sorter network_o,b_oDeng, predict the lower probable behavior of user in test set, including Following steps:

G1. splicing user indicates the sum of the output with last time step of level Recognition with Recurrent Neural Network:

G2. full articulamentum prediction target is added:

WhereinIt is distributed, is indicated with one-hot coding, Wo, bo is the parameter of sorter network for the place of prediction.

In practice, between model layer, it is also an option that property the following steps are included: during model training, use The case where two norm canonicals of dropout network and parameter limit parameter, prevent over-fitting.

The frame diagram of whole network model in one embodiment of the invention, as shown in Figure 2:

H1. to the sequence inputting of user, sequence level Recognition with Recurrent Neural Network and meeting level Recognition with Recurrent Neural Network: are utilized respectively The sequence information table that words rank Recognition with Recurrent Neural Network extracts different levels rank shows；

H2. tactful network: receiving user indicates and the output of level Recognition with Recurrent Neural Network, is calculated using full articulamentum whole The delay of sequence is rewarded, and the parameter in gradient updating network is utilized；

H3. sorter network: receiving user indicates and the output of level Recognition with Recurrent Neural Network, using full articulamentum complete to The behavior prediction of family future time step updates the parameter in network using back-propagation algorithm.

The method of the present invention can be applicable to other users timing behavior, as user buys commodity sequence, user listens to sound Happy sequence, implements and present embodiment is essentially identical, and detailed process is no longer described in detail.

Parameter in the above embodiment of the present invention is determined according to experimental result, that is, tests different parameter combinations, choosing It takes and collects upper evaluation index preferably one group of parameter in verifying, evaluation obtains result on test set.It, can root in actual test The purpose of the present invention can also be realized by carrying out appropriate adjustment to above-mentioned parameter according to demand.

Protection content of the invention is not limited to above embodiments.Under the spirit and scope without departing substantially from present inventive concept, Various changes and advantages that will be apparent to those skilled in the art are all included in the present invention, and are with appended claims Protection scope.

Claims

1. a kind of user's timing behavior automatic segmentation prediction technique based on level circulation neural network and intensified learning, feature It is, described method includes following steps:

Step 2: indicating the one-hot coding of user and timing behavior higher-dimension, is converted to the dense of low-dimensional using embedded technology Input of the vector as model；

Step 3: modeling time series data using level Recognition with Recurrent Neural Network, and Utilization strategies network generates each time step Action director whether to sequence carry out cutting, then using sorter network complete be to sequence future time walking prediction；

Step 4: training pattern parameter, using training sample, according to the ginseng of the interim optimization network model of different target function Number, and tuning is carried out using verifying the set pair analysis model parameter；

Step 5: using based under the user in the network model of level Recognition with Recurrent Neural Network and intensified learning prediction test set one Probable behavior.

2. user's timing behavior automatic segmentation according to claim 1 based on level circulation neural network and intensified learning Prediction technique, which is characterized in that user's timing behavior include user register behavior, user buy commodity behavior, user Music behavior is listened in webpage clicking behavior, user, all behavioral datas generallyd use for this field.

3. user's timing behavior automatic segmentation according to claim 1 based on level circulation neural network and intensified learning Prediction technique, which is characterized in that in the step 1, the data set includes: Gowalla data set, Foursquare data One or more data sets in collection, Amazon data set, all overt behavior data sets generallyd use for this field.

4. user's timing behavior automatic segmentation according to claim 1 based on level circulation neural network and intensified learning Prediction technique, which is characterized in that in the step 1, it is described data are pre-processed the following steps are included:

A2. operation is filtered to data wherein infrequently；

A3. a time window is selected, will be recorded as the sequence slit mode of cutting foundation as tactful network Initial policy π₀。

5. user's timing behavior automatic segmentation according to claim 1 based on level circulation neural network and intensified learning Prediction technique, which is characterized in that in the step one, cutting data are training set, verifying collection and test set are as follows: to each use Family, using the last one place of time series data as test set, as verifying collection, remaining makees the penultimate of time series data For training set.

6. user's timing behavior automatic segmentation according to claim 1 based on level circulation neural network and intensified learning Prediction technique, which is characterized in that in the step two, the input for obtaining model the following steps are included:

B1. data encoding: note shares N number of user and M place, using one-hot coding, i.e., indicates user with N-dimensional sparse vector Set, the unique characteristic dimension of user are denoted as 1, remaining is all 0, are similarly applied to place；

N-dimensional user vector: being mapped to the numerical value vector space of another low-dimensional using embedded technology by b2. data insertionization, is made The input of model after for it remembers that transformed user vector set expression is U={ u₁, u₂..., u_N, place vector collection table It is shown as P={ p₁, p₂..., p_M}。

7. user's timing behavior automatic segmentation according to claim 1 based on level circulation neural network and intensified learning Prediction technique, which is characterized in that in the step three, Recognition with Recurrent Neural Network refers to but is not limited to gating cycle unit networks, Using shot and long term memory network replacement also can, by taking time step t as an example, remember x_tInput when for time step t, specific calculating process The following steps are included:

C1. it calculates and updates door z_t:

z_t=σ (W_z·[h_t-1, x_t]+b_z),

C2 calculates resetting door r_t:

r_t=σ (W_r·[h_t-1, x_t]+b_r),

C3. cryptomnesia state is calculated

C4. hidden state h is calculated_t:

Wherein, σ is sigmoid function, indicates Hadamard product, and [] indicates the splicing of vector, * representing matrix multiplication, W_z, W_r,b_z, b_r, b_hIt is all the parameter that model can learn.

8. user's timing behavior automatic segmentation according to claim 1 based on level circulation neural network and intensified learning Prediction technique, which is characterized in that in the step three, time series data is modeled using level Recognition with Recurrent Neural Network, with User indicates u_kFor, include the following steps:

D1. sequence level Recognition with Recurrent Neural Network:

D11. list entries length is the location sequence of L, is denoted as

D12. it is calculated by the Recognition with Recurrent Neural Network in claim 6, obtains the output of each time step, be denoted as

D2. session-level Recognition with Recurrent Neural Network:

D21. according to cutting strategy π, select cutting time step corresponding from the output of sequence level Recognition with Recurrent Neural Network as a result, as The input of session-level Recognition with Recurrent Neural Network, length is | π |, it is denoted as

D22. it is calculated by the Recognition with Recurrent Neural Network in claim 6, obtains the output of session-level Recognition with Recurrent Neural Network, be denoted as

D3. according to time step be unfolded export: be by length | π | outputAccording to cutting strategy π, expand into Length is the output of L.

9. user's timing behavior automatic segmentation according to claim 1 based on level circulation neural network and intensified learning Prediction technique, which is characterized in that in the step three, Utilization strategies network generates the movement of each time step, in the time The movement a generated when walking t_tCiting, includes the following steps:

E1. definition status function s_t:

WhereinIndicate the splicing of vector,WithRespectively Recognition with Recurrent Neural Network is in sequence level and session-level in the time Walk output when t；

E2. motion space a is defined_t:

a_t={ 1,0 },

E3. definition strategy function π:

π(a_t|s_t；Θ)=σ (W_π*s_t+b_π),

Wherein W_π, b_πFor the parameter of tactful network；In the training process, a is acted_tValue by tactful π probability value sample gained, In test, movement a is depended on

10. user's timing behavior according to claim 1 based on level circulation neural network and intensified learning is cut automatically Divide prediction technique, which is characterized in that in the step three, the behavior walked using sorter network forecasting sequence future time, with Time step t predicted time walks t+1 citing, comprising the following steps:

F2. full articulamentum is added on it:

Wherein W_o, b_oIt is the parameter of sorter network, dimension consistent with place number is M,Indicate the user of prediction in the time The place that step t+1 is gone to, tieing up sparse one-hot coding with M indicates.

11. user's timing behavior according to claim 1 based on level circulation neural network and intensified learning is cut automatically Divide prediction technique, which is characterized in that in the step four, according to different objective functions, comprising the following steps:

G1. when the generation that entire tactful network completion acts sequence, the delay reward function of entire sequence strategy network is defined Are as follows:

Wherein y_tIt is input X_LIt in the true place marks of time step t, is indicated with one-hot coding, the number of session in L ' expression sequence Mesh, γ are the hyper parameter for measuring two parts reward, and Q is a certain constant；

WhereinIt is the parameter in tactful network；

12. user's timing behavior according to claim 11 based on level circulation neural network and intensified learning is cut automatically Divide prediction technique, which is characterized in that in the step four, the parameter of interim training network model, comprising the following steps:

H1. pre-training sorter network: initial policy π is applied₀And training sample, using backpropagation, with function described in step g3 To minimize target, the parameter in sorter network is updated；

H2. pre-training strategy network: keeping the parameter constant in sorter network, passes through that update gradient updating described in step g2 public Formula, the parameter of Training strategy network；

13. user's timing behavior according to claim 1 based on level circulation neural network and intensified learning is cut automatically Divide prediction technique, which is characterized in that in the step five, utilize the network based on level Recognition with Recurrent Neural Network and intensified learning The lower probable behavior of user in model prediction test set, comprising the following steps:

I2. full articulamentum prediction target is added:

WhereinIt is distributed for the place of prediction, tieing up sparse one-hot coding with M indicates, W_o, b_oIt is the parameter of sorter network.