Background
With the continuing growth of online shopping platforms, recommendation systems have become an indispensable component of e-commerce. A recommendation system learns the preference information hidden in a user's historical behavior in order to predict the user's future shopping behavior, help customers select satisfactory commodities, and increase the revenue of the e-commerce platform. How to provide users with personalized commodity recommendations efficiently and accurately has therefore long been an important research problem in both academia and industry.
Currently, there are two main categories of research on recommendation systems:
1) Recommendation systems based on static user preferences
Content-based, collaborative-filtering and hybrid algorithms all work in a similar way: they treat the commodity information associated with a user as static features and use clustering, matrix factorization and similar methods to mine the commodity similarities and personalized user preferences hidden in those features, and then recommend similar content, or content matching similar preferences, to the user. Under this model, the user's historical behavior data is regarded as a static feature of the user, and the user's preference is assumed to be stable over the long term and to influence the user's future decisions. On that basis, the recommendation system only needs to learn the user's historical preference and recommend similar commodities accordingly.
2) Short-session-based sequence recommendation method
Some online platforms, especially small retail platforms and multimedia content providers, lack sufficient user history, but their back ends accumulate a large amount of short-session data. Because long-term preference features are unavailable in this setting, researchers have proposed short-session-based sequence recommendation methods. These methods usually take the user's short-term operation behaviors as input and build a deep neural network to model how the user's behavior changes dynamically in the short term, then use that model to predict and recommend the commodity the user will be interested in next.
Of the two approaches above, a recommendation system based on static user preferences can learn a user's stable preferences well and can recommend goods or services the user likes fairly accurately. However, the method is static: it treats the user's preference as unchanging over the long term, does not consider how that preference changes dynamically, and does not consider the user's current needs, so the recommended goods or services may match what the user likes without matching what the user needs. The short-session-based sequence recommendation method records the user's interactions over a short period and analyzes the sequential features of the current decision process from those short-term behaviors in order to determine the commodity or service the user will be interested in next. It can model short-term dynamic changes in user behavior with a deep neural network, but it ignores the user's preferences, so its recommendations tend to match what the user needs without being the type the user likes. In addition, neither method models in depth how the user changes over the whole decision process, and neither specifically analyzes the different degrees of preference expressed by different user behaviors. Conventional recommendation methods therefore struggle to model the complete decision process by which a user selects goods or services, and they cannot combine the user's needs with the user's preferences, so the recommended content fails to meet the user's expectations.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a sequence recommendation method based on user behavior difference modeling, which mainly comprises the following steps as shown in figure 1:
Step 1, obtaining historical behavior information of a user.
Each user leaves a series of log records in the back end when browsing the online platform. These records have a definite temporal order and cover commodity-related operations such as browsing, clicking, adding to the shopping cart, collecting (adding to favorites) and purchasing. This data can be collected directly from the online shopping platform or online service provider.
In the embodiment of the present invention, the acquired historical behavior information of the user takes the form of an interactive commodity sequence. The interactive commodity sequence of user u is represented as:
$$S_u^b = \{(x_1, b_1), (x_2, b_2), \ldots, (x_N, b_N)\}$$
where x denotes a commodity and its subscript is the commodity's serial number in the sequence; b denotes the user operation behavior and is a one-hot vector whose length equals the number of interaction types.
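As an illustration of the data format described above, the following minimal sketch turns time-ordered log records into (commodity, one-hot behavior) pairs. The behavior labels, the `BEHAVIORS` ordering and the function names are hypothetical choices made for this example, not part of the embodiment.

```python
# Hypothetical vocabulary of interaction types; the order is an arbitrary assumption.
BEHAVIORS = ["browse", "click", "cart", "collect", "purchase"]

def one_hot(behavior):
    """Return a one-hot list whose length equals the number of interaction types."""
    vec = [0] * len(BEHAVIORS)
    vec[BEHAVIORS.index(behavior)] = 1
    return vec

def build_sequence(log):
    """Turn time-ordered (commodity_id, behavior) records into (x, b) pairs."""
    return [(item, one_hot(behavior)) for item, behavior in log]

# Toy log for one user, already sorted by time.
log = [("x1", "click"), ("x2", "cart"), ("x2", "purchase")]
sequence = build_sequence(log)
```

Each element of `sequence` pairs a commodity identifier with a behavior vector that has exactly one non-zero entry.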
Step 2, calculating commodity feature vectors according to the historical behavior information of the user.
In the embodiment of the invention, a Skip-gram model with negative sampling is constructed by modeling the sequential relationship between commodities in the user's behavior, and a feature vector is generated for each commodity. The main process is as follows:
According to the interactive commodity sequence $S_u^b$ of user u, the commodity feature vectors are learned so as to maximize the following objective:
$$\frac{1}{N}\sum_{i=1}^{N}\sum_{j \ne i}^{N} \log p(x_j \mid x_i)$$
where N is the length of the interactive commodity sequence $S_u^b$, and $p(x_j \mid x_i)$ is the probability that commodities $x_j$ and $x_i$ are related. It takes the form known in the field as the softmax function:
$$p(x_j \mid x_i) = \frac{\exp(w_j^{\top} v_i)}{\sum_{k'=1}^{N} \exp(w_{k'}^{\top} v_i)}$$
where $w_i$ and $v_i$ are the latent vector and target vector corresponding to commodity $x_i$; $w_j$ is the latent vector corresponding to commodity $x_j$; $w_{k'}$ is the latent vector corresponding to commodity $x_{k'}$; and k' takes values from 1 to N.
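The softmax probability described above can be sketched as follows. The sketch assumes that the relatedness score of two commodities is the inner product of the latent vector of one and the target vector of the other. That scoring choice, the helper names and the toy numbers are assumptions made for illustration.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax_prob(j, v_i, latent_vectors):
    """p(x_j | x_i): exp(w_j . v_i) normalized over all latent vectors w_k."""
    scores = [dot(w_k, v_i) for w_k in latent_vectors]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    return exps[j] / sum(exps)

# Toy latent vectors w_1..w_3 and a target vector v_i (arbitrary values).
W = [[0.1, 0.2], [0.4, -0.3], [0.0, 0.5]]
v_i = [0.3, 0.7]
probs = [softmax_prob(j, v_i, W) for j in range(len(W))]
```

The denominator sums over every commodity, which is why the cost of this form grows with the size of the catalogue.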
To reduce the computational cost of the gradient, the softmax form above is replaced by negative sampling:
$$p(x_j \mid x_i) = \sigma(w_j^{\top} v_i) \prod_{e=1}^{E} \sigma(-w_e^{\top} v_i)$$
where $\sigma(r) = 1/(1 + \exp(-r))$ is the sigmoid function, $w_e$ is the latent vector of the e-th negative sample, and E is the number of negative samples drawn per positive sample. A positive sample is a commodity that appears in the context of $x_i$, a negative sample is an unrelated commodity, and E can be set by the user according to actual conditions or experience.
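A minimal sketch of the negative-sampling replacement, assuming the usual Skip-gram form in which the positive pair contributes a sigmoid of its inner product and each of the E negative samples contributes a sigmoid of the negated inner product. The function names and the way negatives are passed in are assumptions for illustration. How negatives are drawn is not shown.

```python
import math

def sigmoid(r):
    """sigma(r) = 1 / (1 + exp(-r))."""
    return 1.0 / (1.0 + math.exp(-r))

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def neg_sampling_prob(w_pos, v_i, w_negs):
    """sigma(w_pos . v_i) times the product of sigma(-w_e . v_i) over E negatives."""
    p = sigmoid(dot(w_pos, v_i))
    for w_e in w_negs:
        p *= sigmoid(-dot(w_e, v_i))
    return p

# With all-zero latent vectors every factor equals 0.5, so E = 2 gives 0.5 ** 3.
p_zero = neg_sampling_prob([0.0, 0.0], [1.0, 1.0], [[0.0, 0.0], [0.0, 0.0]])
```

Only E + 1 inner products are needed per training pair, instead of one per commodity as in the full softmax.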
Different commodities occur with different frequencies, which introduces noise into the negative sampling process. The formula above is therefore redefined by weighting each commodity by its number of occurrences:
where $\theta(x_i)$ is the occurrence count of commodity $x_i$ in the interactive commodity sequence. The objective of the commodity embedding representation is to maximize the following function:
Then the commodity feature vectors $P_u = \{v_1, v_2, \ldots, v_N\}$ are obtained by gradient descent, where $v_j$ is the d-dimensional feature vector of commodity $x_j$.
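The occurrence count of each commodity, which the weighting step relies on, could be gathered as in the sketch below. It only counts occurrences and does not implement the weighted objective itself. The function name and the toy sequences are hypothetical.

```python
from collections import Counter

def commodity_frequencies(sequences):
    """Count how often each commodity appears across interactive commodity sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(item for item, _behavior in seq)
    return counts

# Two toy sequences of (commodity, behavior) pairs.
sequences = [
    [("x1", "click"), ("x2", "cart")],
    [("x2", "purchase")],
]
theta = commodity_frequencies(sequences)
```

A `Counter` returns 0 for commodities that never appear, so unseen items need no special handling.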
Step 3, performing sequence modeling with the behavior difference modeling method in combination with the commodity feature vectors, and obtaining the user's current demand and historical preference through two different neural network architectures.
After the commodity feature vectors are obtained, behavior difference modeling takes the user's consecutive behaviors as prior knowledge and aims to recommend the items that the target user is most likely to access next. A user's decision process is mainly influenced by two factors: current motivation and historical preference. More specifically, the user's current consumption motivation is dynamic in the short term, and recent fluctuations are important for reflecting short-term characteristics. Since every type of recent action (e.g., click, collect, add to shopping cart, purchase) may signal the user's current short-term motivation, the present invention uses all types of recent actions to represent the current consumption motivation. For historical preference, on the other hand, not every type of behavior describes what the user prefers. To model the user's long-term preference, the present invention retains from the interaction history only those behaviors that clearly express the user's underlying preference, such as purchasing. In effect, the user's interaction process is a series of implicit feedback signals over time. Thus, unlike conventional recommendation systems, which explore user-item interactions statically, the present invention treats next-item recommendation as a sequence modeling problem. Specifically, two distinct behavior modeling processes are designed, session behavior modeling and preference behavior modeling, which separately learn the user's current consumption motivation and long-term stable preference. On this basis, two LSTM-based deep recurrent neural networks jointly learn the ordering of the motivation behaviors and the preference behaviors.
First, session behavior modeling is performed. Given the commodity feature vectors $P_u = \{v_1, v_2, \ldots, v_N\}$ and the corresponding interactive commodity sequence $S_u^b$, the following indicator function is defined to determine whether commodity $x_i$ falls within the scope of the current session behavior:
$$D_{SBL}(x_i, x_N) = \Phi((N - i) \le T_s)$$
where $\Phi(a)$ is a Boolean function whose value is 1 when a is true and 0 otherwise; $T_s$ is the time-step threshold of the session behavior and controls the length of the session; and $x_N$ is the last commodity in the current interactive commodity sequence $S_u^b$.
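The session indicator above can be sketched as a simple window over the most recent items. Positions are 1-based to match the subscripts in the text. The function names are hypothetical.

```python
def d_sbl(i, n, t_s):
    """Indicator: 1 if the item at position i is within T_s steps of the last item N."""
    return 1 if (n - i) <= t_s else 0

def session_window(sequence, t_s):
    """Keep only the items that satisfy the session indicator."""
    n = len(sequence)
    return [item for i, item in enumerate(sequence, start=1) if d_sbl(i, n, t_s)]

# With N = 5 and T_s = 2, positions 3, 4 and 5 are kept.
recent = session_window(["a", "b", "c", "d", "e"], t_s=2)
```

Because the last item always satisfies N - i = 0, the window is never empty for a non-empty sequence and a non-negative threshold.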
After the LSTM weight matrices are initialized, in the t-th iteration step the hidden state $h_t$ depends on the hidden state $h_{t-1}$ of the previous time step, the currently input commodity feature vector $v_t$ and the behavior vector $b_t$. The update steps are as follows:
$$i_t = \sigma(W_{vi} v_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + W_{bi} b_t + \beta_i)$$
$$f_t = \sigma(W_{vf} v_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + W_{bf} b_t + \beta_f)$$
$$c_t = f_t c_{t-1} + i_t \tanh(W_{vc} v_t + W_{hc} h_{t-1} + W_{bc} b_t + \beta_c)$$
$$o_t = \sigma(W_{vo} v_t + W_{ho} h_{t-1} + W_{co} c_{t-1} + W_{bo} b_t + \beta_o)$$
$$h_t = o_t \tanh(c_t)$$
where $i_t$, $f_t$ and $o_t$ are the input gate, forget gate and output gate in the t-th iteration step; $c_t$ is the memory module of the network unit; $b_t$ is the user operation behavior corresponding to the t-th commodity, input in the t-th iteration step; $W_{vi}$, $W_{hi}$, $W_{ci}$ and $W_{bi}$ are the weights of $v_t$, $h_{t-1}$, $c_{t-1}$ and $b_t$ in the input gate $i_t$; $W_{vf}$, $W_{hf}$, $W_{cf}$ and $W_{bf}$ are the weights of $v_t$, $h_{t-1}$, $c_{t-1}$ and $b_t$ in the forget gate $f_t$; $W_{vc}$, $W_{hc}$ and $W_{bc}$ are the weights of $v_t$, $h_{t-1}$ and $b_t$ in the memory module; $W_{vo}$, $W_{ho}$, $W_{co}$ and $W_{bo}$ are the weights of $v_t$, $h_{t-1}$, $c_{t-1}$ and $b_t$ in the output gate $o_t$; $\beta_i$, $\beta_f$, $\beta_o$ and $\beta_c$ are the bias terms of the input gate $i_t$, forget gate $f_t$, output gate $o_t$ and memory module $c_t$, respectively; $h_t$ is the output of the current state; $\sigma$ is the sigmoid function; and tanh is the hyperbolic tangent function.
The current purchase demand of the user is expressed as:
$$\Psi_{SBL} = h_N$$
In the above process, the number of iterations equals the number of commodities in the interactive commodity sequence, i.e., t = 1, 2, ..., N. $h_N$ is the output of the N-th iteration step, obtained after the last commodity $x_N$ of the sequence is input.
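The following is a minimal pure-Python sketch of one LSTM update that is consistent with the weight descriptions above: each gate receives the commodity vector, the previous hidden state, the previous memory state and the behavior vector, while the memory module omits the memory-state weight. The sigmoid gate activations, the tanh candidate, the use of full matrices for the memory-state weights, the random initialization and all function names are assumptions made for illustration.

```python
import math
import random

def sigmoid(r):
    return 1.0 / (1.0 + math.exp(-r))

def matvec(matrix, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in matrix]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def init_params(d, n_b, hidden, rng):
    """Create weights keyed like W_vi, W_hi, W_ci, W_bi for gates i, f, c, o."""
    p = {}
    for gate in "ifco":
        p["v" + gate] = rand_matrix(hidden, d, rng)
        p["h" + gate] = rand_matrix(hidden, hidden, rng)
        p["b" + gate] = rand_matrix(hidden, n_b, rng)
        p["bias" + gate] = [0.0] * hidden
        if gate != "c":  # the memory module has no memory-state weight
            p["c" + gate] = rand_matrix(hidden, hidden, rng)
    return p

def pre_activation(p, gate, v_t, b_t, h_prev, c_prev):
    """Weighted sum of the inputs of one gate before its nonlinearity."""
    total = [a + b + c + d for a, b, c, d in zip(
        matvec(p["v" + gate], v_t),
        matvec(p["h" + gate], h_prev),
        matvec(p["b" + gate], b_t),
        p["bias" + gate])]
    if gate != "c":
        peephole = matvec(p["c" + gate], c_prev)
        total = [t + x for t, x in zip(total, peephole)]
    return total

def lstm_step(p, v_t, b_t, h_prev, c_prev):
    """One update: gates, memory module, then h_t = o_t * tanh(c_t)."""
    i_t = [sigmoid(z) for z in pre_activation(p, "i", v_t, b_t, h_prev, c_prev)]
    f_t = [sigmoid(z) for z in pre_activation(p, "f", v_t, b_t, h_prev, c_prev)]
    cand = [math.tanh(z) for z in pre_activation(p, "c", v_t, b_t, h_prev, c_prev)]
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, cand)]
    o_t = [sigmoid(z) for z in pre_activation(p, "o", v_t, b_t, h_prev, c_prev)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

def session_representation(p, pairs, hidden):
    """Run the LSTM over (v_t, b_t) pairs and return the last hidden state h_N."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for v_t, b_t in pairs:
        h, c = lstm_step(p, v_t, b_t, h, c)
    return h

rng = random.Random(0)
params = init_params(d=3, n_b=2, hidden=4, rng=rng)
pairs = [([0.1, 0.2, 0.3], [1, 0]), ([0.0, -0.1, 0.4], [0, 1])]
psi_sbl = session_representation(params, pairs, hidden=4)
```

In practice a deep-learning framework would replace these loops. The sketch only makes explicit that the behavior vector enters every gate alongside the commodity vector.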
Second, the user's historical preference is modeled. For each commodity-behavior pair $(v_i, b_i) \in S_u^b$ of the user, the indicator function is expressed as:
$$D_{PBL}(v_i, b_i) = \Phi(b_i \in P)$$
where P is the set of preference behaviors, mainly comprising the purchasing, collecting and add-to-shopping-cart operations.
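The preference indicator can be sketched as a filter that keeps only pairs whose behavior belongs to the preference set. For readability the sketch uses string labels instead of one-hot vectors. The labels and function names are hypothetical.

```python
# The preference set P; the label strings are hypothetical.
PREFERENCE_BEHAVIORS = {"purchase", "collect", "cart"}

def d_pbl(behavior):
    """Indicator: 1 if the behavior belongs to the preference set P, else 0."""
    return 1 if behavior in PREFERENCE_BEHAVIORS else 0

def preference_pairs(sequence):
    """Keep the (commodity, behavior) pairs that express a preference."""
    return [(item, behavior) for item, behavior in sequence if d_pbl(behavior)]

history = [("x1", "click"), ("x2", "cart"), ("x3", "purchase"), ("x4", "browse")]
preferred = preference_pairs(history)
```

The original time order of the retained pairs is preserved, which matters because they are later fed to a sequence model.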
A bidirectional LSTM network is used to learn the user's preference representation, so historical preference modeling has two hidden-layer outputs at each time step. For the s-th time step, the forward output $\overrightarrow{h}_s$ is determined by the forward output $\overrightarrow{h}_{s-1}$ of the previous time step and the current commodity-behavior pair $(v_s, b_s)$; the backward output $\overleftarrow{h}_s$ is determined by the backward output $\overleftarrow{h}_{s+1}$ of the next time step and the current commodity-behavior pair $(v_s, b_s)$. The corresponding formulas are as follows:
$$i_s = \sigma(W_{vi}' v_s + W_{hi}' h_{s-1} + W_{ci}' c_{s-1} + W_{bi}' b_s + \beta_i')$$
$$f_s = \sigma(W_{vf}' v_s + W_{hf}' h_{s-1} + W_{cf}' c_{s-1} + W_{bf}' b_s + \beta_f')$$
$$c_s = f_s c_{s-1} + i_s \tanh(W_{vc}' v_s + W_{hc}' h_{s-1} + W_{bc}' b_s + \beta_c')$$
$$o_s = \sigma(W_{vo}' v_s + W_{ho}' h_{s-1} + W_{co}' c_{s-1} + W_{bo}' b_s + \beta_o')$$
$$h_s = o_s \tanh(c_s)$$
where $i_s$, $f_s$ and $o_s$ are the input gate, forget gate and output gate of the s-th time step; $c_s$ is the memory module of the network unit; $b_s$ is the user operation behavior corresponding to the s-th commodity, input at the s-th time step; $W_{vi}'$, $W_{hi}'$, $W_{ci}'$ and $W_{bi}'$ are the weights of $v_s$, $h_{s-1}$, $c_{s-1}$ and $b_s$ in the input gate $i_s$; $W_{vf}'$, $W_{hf}'$, $W_{cf}'$ and $W_{bf}'$ are the weights of $v_s$, $h_{s-1}$, $c_{s-1}$ and $b_s$ in the forget gate $f_s$; $W_{vc}'$, $W_{hc}'$ and $W_{bc}'$ are the weights of $v_s$, $h_{s-1}$ and $b_s$ in the memory module; $W_{vo}'$, $W_{ho}'$, $W_{co}'$ and $W_{bo}'$ are the weights of $v_s$, $h_{s-1}$, $c_{s-1}$ and $b_s$ in the output gate $o_s$; $\beta_i'$, $\beta_f'$, $\beta_o'$ and $\beta_c'$ are the bias terms of the input gate $i_s$, forget gate $f_s$, output gate $o_s$ and memory module $c_s$, respectively; and $\sigma$ is the sigmoid function. $h_s$ is the output of the current state: in the forward pass $h_s$ is $\overrightarrow{h}_s$, and in the backward pass, where the preceding state is that of time step s+1, $h_s$ is $\overleftarrow{h}_s$.
Through the bidirectional LSTM network, a preference representation vector of the current user is obtained at each time step from the forward and backward outputs:
The historical preference of the user, $\Psi_{PBL}$, is then obtained by average pooling over these vectors:
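The sketch below illustrates one way the per-step outputs of a bidirectional network could be combined and average-pooled. It assumes that the forward and backward outputs of the same time step are concatenated and that pooling is a plain element-wise mean over time steps. Both choices, and the function names, are assumptions for illustration.

```python
def concat_states(forward, backward):
    """Concatenate forward and backward outputs of the same time step."""
    return [f + b for f, b in zip(forward, backward)]

def average_pool(vectors):
    """Element-wise mean over a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Toy outputs for two time steps; the backward list is already aligned by time step.
forward = [[1.0, 2.0], [3.0, 4.0]]
backward = [[0.0, 1.0], [2.0, 3.0]]
states = concat_states(forward, backward)
psi_pbl = average_pool(states)
```

Average pooling gives every retained preference behavior the same weight, regardless of where it occurs in the history.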
Step 4, predicting the commodity the user will be interested in next through joint learning based on the user's current purchase demand and historical preference, matching the prediction in the commodity vector space, finding the several commodities most similar to the prediction result, and generating a commodity recommendation sequence.
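The matching step can be sketched as a nearest-neighbor search in the commodity vector space. The sketch assumes cosine similarity as the similarity measure, since the text only says "most similar". The function names and toy vectors are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def recommend(prediction, item_vectors, k):
    """Return the k commodity ids whose vectors are most similar to the prediction."""
    ranked = sorted(
        item_vectors,
        key=lambda item: cosine(prediction, item_vectors[item]),
        reverse=True,
    )
    return ranked[:k]

item_vectors = {"x1": [1.0, 0.0], "x2": [0.9, 0.1], "x3": [0.0, 1.0]}
top = recommend([1.0, 0.05], item_vectors, k=2)
```

For a large catalogue an approximate nearest-neighbor index would normally replace the full sort.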
In the embodiment of the invention, the user's current purchase demand $\Psi_{SBL}$ and historical preference $\Psi_{PBL}$ are combined through a fully connected layer to compute the prediction vector of the commodity the user will be interested in next:
where the two weights correspond to the current purchase demand and the historical preference, respectively, and bias denotes the model bias.
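A minimal sketch of the combination step, assuming the fully connected layer is a plain linear map: one weight matrix applied to the current purchase demand, another applied to the historical preference, plus a bias vector, with no activation. The linear form, the matrix shapes and the names `w_sbl` and `w_pbl` are assumptions for illustration.

```python
def matvec(matrix, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in matrix]

def predict_next(psi_sbl, psi_pbl, w_sbl, w_pbl, bias):
    """Combine demand and preference into a d-dimensional prediction vector."""
    demand_part = matvec(w_sbl, psi_sbl)
    preference_part = matvec(w_pbl, psi_pbl)
    return [a + b + c for a, b, c in zip(demand_part, preference_part, bias)]

# Toy example with d = 2: a 2-dimensional demand and a 3-dimensional preference.
w_sbl = [[1.0, 0.0], [0.0, 1.0]]
w_pbl = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
pred = predict_next([1.0, 2.0], [1.0, 2.0, 3.0], w_sbl, w_pbl, [0.5, -0.5])
```

The output has the same dimension as the commodity feature vectors, so it can be compared directly with them.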
During model training, the vector of the commodity the user is actually interested in next is denoted $v_{T+1} = (y_1, y_2, \ldots, y_d)$. The loss function of the model can be defined as:
where d is the dimension of the vector.
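For illustration only, the sketch below uses a sum of squared differences over the d dimensions as the training loss. This specific loss is an assumption chosen because the prediction and the true next commodity are both d-dimensional vectors. It is not taken from the embodiment.

```python
def squared_error(prediction, target):
    """Sum of squared differences over the d dimensions (an assumed loss)."""
    return sum((p - y) ** 2 for p, y in zip(prediction, target))

# (1 - 0)^2 + (2 - 4)^2 = 5
loss = squared_error([1.0, 2.0], [0.0, 4.0])
```

The loss is zero only when the predicted vector coincides with the true next-commodity vector.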
In the scheme of the embodiment of the invention, the historical behavior records of users are divided into per-user sequences in time order. The scheme is embodied in the method for constructing the commodity feature vectors and the method for modeling user behavior differences: commodity feature vectors are generated with a commodity embedding representation, differential sequence modeling is applied to the different behaviors users perform on commodities, the user's current demand and historical preference are learned separately, and the commodity the user will be interested in next is predicted. The method combines the user's historical preference with the user's current demand, models the different degrees of preference that different user behaviors express toward commodities, learns the user's decision process dynamically through recurrent neural networks, and generates personalized sequence recommendations for the user. It thereby overcomes the lack of dynamics and of genuine personalization in existing methods.
From the above description of the embodiments, it is clear to those skilled in the art that the embodiments can be implemented in software, or in software together with a necessary general-purpose hardware platform. On this understanding, the technical solutions of the embodiments can be embodied as a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) and which includes instructions that cause a computer device (such as a personal computer, a server or a network device) to execute the methods of the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.