CN116304299A - Personalized recommendation method integrating user interest evolution and gradient boosting algorithm - Google Patents

Personalized recommendation method integrating user interest evolution and gradient boosting algorithm

Info

Publication number
CN116304299A
Authority
CN
China
Prior art keywords
commodity
model
user
interest
gru
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310003507.1A
Other languages
Chinese (zh)
Inventor
蔡世民
刘一龙
宗雨欣
周洲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202310003507.1A priority Critical patent/CN116304299A/en
Publication of CN116304299A publication Critical patent/CN116304299A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention discloses a personalized recommendation method integrating user interest evolution and a gradient boosting algorithm, belonging to the field of recommendation system research within machine learning. The products recommended by the model show a high overlap with the products actually purchased by customers within 7 days after the end of the training data. In the experiments, preprocessing of the raw data and feature engineering improve the ranking and prediction accuracy of the model, and a feature engineering method useful in this scenario is provided. Several important features affecting the ranking results of the recommendation system are identified, providing a reference for further improving the accuracy of the algorithm and the recommendation system. The method has a wide field of application: beyond personalized clothing recommendation, it can be ported to various recommendation domains such as music recommendation and book recommendation.

Description

Personalized recommendation method integrating user interest evolution and gradient boosting algorithm
Technical Field
The application belongs to the field of recommendation system research within machine learning.
Background
Keyword term definition:
neural network: a mathematical or computational model that mimics the structure and function of a biological neural network and is used to estimate or approximate functions. A neural network is composed of a large number of interconnected artificial neurons. In most cases, an artificial neural network can change its internal structure based on external information and is therefore an adaptive system.
Gradient boosting tree: the gradient boosting decision tree (GBDT, Gradient Boosting Decision Tree) is a member of the boosting family in ensemble learning. It is an iterative decision tree algorithm consisting of multiple decision trees, where the conclusions of all trees are accumulated to form the final answer. When first proposed, it was regarded, together with the SVM, as an algorithm with strong generalization ability. In recent years, it has attracted attention as a machine learning model used for search ranking.
Evolution of user interest: in most non-search e-commerce scenarios, users do not express their current interest preferences in real time. Capturing the dynamically changing interests of users through model design is therefore key to improving recommendation effectiveness.
The Internet has changed people's way of life, making daily life and study more convenient. In recent years, with the wide spread of the Internet and the rapid development of electronic commerce, online shopping has become an indispensable part of life. Websites are filled with a huge amount of product information, which often leaves customers disoriented and unable to find the desired product. A recommendation system that classifies commodities can quickly and proactively help customers find favored commodities and potential buyers, increasing sales volume and bringing great economic benefits. At the same time, it saves customers shopping time and improves shopping efficiency.
Commodity data, transaction data and customer data hide much unmined information that affects customers' choice of commodities, and it is difficult to determine which factors are important through subjective experience alone. Machine learning algorithms have therefore been widely applied in research on commodity recommendation and ranking. Li et al. built a recommendation system based on sentiment analysis that analyzes customer reviews and recommends the products of greatest interest to customers. Sun et al. converted the commodity recommendation problem into a density-based clustering problem, and the results show that the model can solve the problem to a certain extent.
The prior art has the following defects:
on the one hand, such algorithms can only describe the influence of a few features on the ranking and recommendation results, and the accuracy and efficiency of traditional machine learning algorithms drop sharply when facing massive unlabeled data; on the other hand, such algorithms cannot simulate the user's interest evolution path: users face ever more choices when shopping online, their interests change over time, and a user may hold several interests during the same period, i.e., user interests continuously evolve and intersect. How to express the dynamic change of user interests more accurately and capture long-term user interests is therefore crucial.
In this method, a recommendation model that integrates the simulation of user interest evolution with a gradient boosting algorithm is used to make personalized recommendations of the commodities a user will purchase in the coming week, using as background the past transaction data of H&M and the multi-modal data of customers and commodities provided by the Kaggle platform.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a recommendation model integrating the simulation of user interest evolution with a gradient boosting algorithm. The method can not only process massive data, but also accurately simulate the evolution of user interests. Specifically, for processing massive data, algorithms of the Boosting family such as GBDT, XGBoost and LightGBM have successively been proposed and applied, making it possible to recommend and rank products within massive data; a lambda gradient is added on the basis of the LightGBM algorithm, making it better suited to recommendation and ranking scenarios. For simulating user interest evolution, the attention mechanism that has been successful in the NLP field is used to screen out the focal points in the target commodity and user behavior data for the downstream task.
The technical scheme of the invention is a personalized recommendation method integrating user interest evolution and a gradient boosting algorithm, comprising the following steps:
step 1: acquiring user information, commodity information and scene information from a database;
wherein the user information includes: attribute information such as the user's ID, age, zip code, whether the user is an active club member, and whether news pushes are accepted; the commodity information includes: attribute information such as commodity ID, commodity code, commodity name, commodity type number, commodity type name, commodity color, release date and production department, visual information such as the commodity picture, and text information such as the commodity description; the scene information includes: attribute information such as user ID, commodity ID, transaction date and transaction channel;
step 2: dividing a data set;
taking the last week of samples as a test set and the previous samples as a training set;
step 3: constructing a model of user interest evolution;
the model comprises a behavior sequence layer, an interest extraction layer, an interest evolution layer and an MLP network:
behavior sequence layer: used to convert the user's original ID behavior sequence over n days into an embedding behavior sequence;
interest extraction layer: used to extract interests from the embedding data; a GRU unit is used to extract the interests:
$u_t=\sigma(W^u i_t+U^u h_{t-1}+b^u)$

$r_t=\sigma(W^r i_t+U^r h_{t-1}+b^r)$

$\tilde{h}_t=\tanh(W^h i_t+r_t\circ U^h h_{t-1}+b^h)$

$h_t=(1-u_t)\circ h_{t-1}+u_t\circ\tilde{h}_t$

wherein $u_t$ represents the output value of the update gate of the GRU at time t, $r_t$ represents the output value of the reset gate of the GRU at time t, $i_t$ represents the input of the GRU at time t, and $\tilde{h}_t$ represents the newly learned memory state of the GRU at time t; W represents the parameter weight applied to the GRU unit input, U represents the parameter weight applied to the hidden state of the GRU at the previous time, and b represents the parameter bias, the superscripts u, r, h denoting the update gate, the reset gate and the newly learned state, respectively; $\sigma$ represents the sigmoid function and $\circ$ the element-wise product; $i_t$, the input of the GRU, is the embedding vector of the user's t-th behavior output by the behavior sequence layer, and $h_t$ is the t-th hidden state of the GRU; after the GRU interest network, the user behavior vector b(t) is further abstracted to form the interest state vector h(t);
interest evolution layer: used for characterizing the evolution process of the user's interests, adding an attention mechanism with attention score:

$a_t=\dfrac{\exp(h_t W e_a)}{\sum_{j=1}^{T}\exp(h_j W e_a)}$

wherein $a_t$ represents the attention score of the attention mechanism at time t, W represents the parameter weight of the attention unit, $e_a$ represents the embedding vector of the target item, T represents the total number of time steps, and $h_t$ represents the output of the interest extraction layer at time t; through AUGRU (GRU with Attentional Update gate), a GRU structure based on an attention update gate, the attention score is added to the structure of the original update gate, in the specific form:

$\tilde{u}'_t=a_t\cdot u'_t$

$h'_t=(1-\tilde{u}'_t)\circ h'_{t-1}+\tilde{u}'_t\circ\tilde{h}'_t$

wherein $u'_t$ is the original update gate of the AUGRU, $\tilde{u}'_t$ is the attentional update gate designed for the AUGRU, and $h'_t$ is the hidden state of the AUGRU; the output $h'_t$ of the interest evolution layer serves as the input to the subsequent MLP network;
step 4: training a model simulating user interest evolution;
training a model of user interest evolution by utilizing the data set obtained in the step 2;
step 5: constructing a gradient lifting tree model for processing mass data;
step 5.1: the model score of each commodity is 0 initially, and N tree models are generated;
step 5.2: for the training of each tree, traverse the commodity pairs with different labels in the training data set to obtain the lambda value $\lambda_i$ of each sample;
The calculation method is as follows:

$\lambda_{i,j}=\dfrac{-|\Delta Z_{ij}|}{1+\exp(s_i-s_j)}$

$\lambda_i=\sum_{j:(i,j)}\lambda_{i,j}-\sum_{j:(j,i)}\lambda_{i,j}$

wherein $\lambda_{i,j}$ is the lambda value of commodity i when it is ranked before commodity j, $|\Delta Z_{ij}|$ denotes the change in the MAP metric caused by exchanging the positions of commodities i and j in the list, and $s_i$ denotes the output score of the gradient boosting tree model for commodity i;
step 5.3: calculate the derivative $\omega_i$ corresponding to each $\lambda_i$, used by the subsequent Newton step to solve for the values of the leaf nodes;
train a decision tree with the $\lambda_i$ of all documents as labels, splitting nodes by minimizing the sum of squared errors, i.e., for a selected feature, choose a value val, assign all samples less than or equal to val to the left child node and samples greater than val to the right child node; then compute the sums of squared errors of the lambda values for the left and right nodes and add them as the cost of the split, select the (feature, val) pair with the minimum cost as the current split point, and finally generate a decision tree with L leaf nodes;
for the generated decision tree, calculate the value of each leaf node with a Newton step, i.e., compute the output value of each leaf node over the set of documents falling into it;
step 5.4: update the model by adding the currently learned decision tree to the existing LightGBM model, regularized with the learning rate;
step 6: training the gradient boosting tree model for processing massive data;
training the gradient boosting tree model using the data set obtained in step 2;
Step 7: fusing the models;
taking the multidimensional matrix data as input, inputting it respectively into the user interest evolution model and the gradient boosting tree model for training and learning to obtain the commodity scores of the two models, linearly weighting the scores of the two models to obtain a total score, and sorting by the total score to obtain the final commodity recommendation list;
step 8: obtaining prediction data;
acquiring test set data from the database, and obtaining the user data to be tested through preprocessing and feature engineering;
step 9: obtaining a recommendation list through combined model prediction;
inputting the data to be tested into the fused model, where the output of the network is the prediction of the list of commodities the user will purchase in the next week.
Compared with the prior art, the invention has the beneficial effects that:
1. A ranking model fusing the DIEN and LightGBMRanker algorithms is provided to recommend H&M group products and improve the shopping experience of customers. Experimental results show that the products recommended by the model have a high overlap with the products purchased by customers within 7 days after the end of the training data.
2. In the experiments, preprocessing of the raw data and feature engineering improve the ranking and prediction accuracy of the model, and a feature engineering method useful in this scenario is provided.
3. Several important features affecting the ranking results of the recommendation system are identified, providing a reference for further improving the accuracy of the algorithm and the recommendation system.
4. The method has a wide field of application: it is not limited to personalized clothing recommendation and can be ported to various recommendation domains such as music recommendation and book recommendation.
Drawings
FIG. 1 is a flow diagram of the fusion of the DIEN and LightGBMRanker modules in one embodiment;
FIG. 2 is the histogram used by the LightGBM module in one embodiment;
FIG. 3 is a graph of transaction volume over time in one embodiment;
FIG. 4 is the age distribution of customers in one embodiment;
FIG. 5 is the DIEN module in one embodiment;
FIG. 6 is the pseudocode of the LightGBMRanker module for user behavior prediction in one embodiment;
FIG. 7 is the feature importance map of the top ten features in one example.
Detailed description of the preferred embodiments
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In one embodiment, as shown in FIG. 1, a personalized recommendation process for a user is provided by combining the DIEN and LightGBMRanker models, the method comprising the following steps:
step 1: data exploration, data preprocessing and feature engineering;
the acquired data set includes user information, merchandise information, and scene information.
The user information includes 7 attribute features such as user ID, profile number, whether the user is active, whether the club member receives company news pushes, age and zip code.
The commodity information includes 24 attribute features such as commodity ID, product code, product name, product type number, product type name and product group name, 1 picture information feature, namely the product picture, and 1 text information feature, namely the product description.
The scene information includes 4 attribute and time features: user ID, commodity ID, transaction time and transaction channel (online or offline).
The size of the dataset is inspected, together with the meaning and distribution of each field. With no prior notion of which information may be useful, statistical analysis of the data is necessary. As shown in FIG. 3, there are sudden increases in commodity transaction volume, and this trend recurs periodically over time, so the influence of factors such as promotions and holidays should be considered. As can be seen from FIG. 4, customer ages are mainly concentrated between 20 and 30, whose shopping habits differ from those of other age groups, so age distribution differences should be reflected during feature engineering.
Preprocessing consists of filling missing data in the dataset with the global mean and replacing outliers.
Feature engineering involves normalizing and bucketing continuous features, and one-hot encoding discrete features.
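As a minimal sketch of these two operations with pandas (the column names and bucket edges are hypothetical, for illustration only):

```python
import pandas as pd

df = pd.DataFrame({"age": [21, 34, 57], "channel": ["online", "offline", "online"]})

# Normalize and bucket a continuous feature.
df["age_norm"] = (df["age"] - df["age"].mean()) / df["age"].std()
df["age_bucket"] = pd.cut(df["age"], bins=[0, 20, 30, 40, 60, 100], labels=False)

# One-hot encode a discrete feature.
df = pd.get_dummies(df, columns=["channel"])
```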
Since the data volume is relatively large and it is unclear which data are useful, feature engineering must be performed before the experiment. First, memory compression is applied, for example compressing data into smaller floating-point (float) types, reducing the demand on computer resources and improving running speed; second, features are extracted, mainly including time feature extraction, user feature extraction and high-order feature combination; then statistical analysis of the features, such as maximum, minimum, median and correlation coefficient, is carried out; finally, features are selected through related algorithms such as RFECV and SFS, and some unimportant features are deleted.
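As a minimal illustration of the memory-compression step, the following sketch (the function name and file name are hypothetical) downcasts the numeric columns of a pandas DataFrame to smaller types:

```python
import pandas as pd

def downcast_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Compress numeric columns to the smallest dtype that holds their range."""
    for col in df.select_dtypes(include=["int64"]).columns:
        df[col] = pd.to_numeric(df[col], downcast="integer")
    for col in df.select_dtypes(include=["float64"]).columns:
        df[col] = pd.to_numeric(df[col], downcast="float")  # float64 -> float32
    return df

# Hypothetical usage on the H&M transactions table:
# transactions = downcast_numeric(pd.read_csv("transactions_train.csv"))
```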
Step 2: dividing the data set;
the model aims to predict the commodities to be purchased in the coming week; to avoid the information crossing caused by placing data from the predicted future time window into the training set when dividing the data set, a time-cutting method is used to divide the data set into a training set and a validation set by time;
the behavior data of each user is input as a sample, the acquired data set is two-year data, the aim is to predict a commodity list to be purchased by the user in the future for one week, and in order to avoid the problem of 'information crossing', the data set is divided as follows:
It is divided into 4 groups of data sets, where data0 is the validation set and data1, data2 and data3 are training sets. Validation set data0 takes the data of the last 1-week window as its target, with the data of earlier time windows as valid input; training set data1 takes the data of the second-to-last week window as its target, with earlier time window data as valid input; data2 takes the data of the third-to-last week window as its target, with earlier time window data as valid input; data3 takes the data of the fourth-to-last week window as its target, with earlier time window data as valid input.
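A minimal sketch of this time-window division, assuming the transactions sit in a pandas DataFrame with a date column named t_dat (the column name is an assumption):

```python
import pandas as pd

def split_by_week(trans: pd.DataFrame, date_col: str = "t_dat"):
    """Build (target, history) pairs for the last four one-week windows.

    k = 0 yields the validation set data0; k = 1..3 yield data1..data3.
    """
    trans = trans.copy()
    trans[date_col] = pd.to_datetime(trans[date_col])
    end = trans[date_col].max()
    splits = []
    for k in range(4):
        hi = end - pd.Timedelta(weeks=k)
        lo = hi - pd.Timedelta(weeks=1)
        target = trans[(trans[date_col] > lo) & (trans[date_col] <= hi)]
        history = trans[trans[date_col] <= lo]  # only earlier windows are valid input
        splits.append((target, history))
    return splits
```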
Step 3: constructing a model for simulating user interest evolution;
constructing a DIEN module in the combined deep learning network; DIEN is improved on the basis of the DIN model: a behavior sequence layer converts the original ID behavior sequence into an embedding behavior sequence; an interest extraction layer captures real-time interests from the user's historical sequence, and a loss function is proposed to supervise the learning of the user interest at each step; an interest evolution layer captures the interest evolution process related to the target item, introducing an attention mechanism into the sequence structure to strengthen the influence of related items during interest evolution. In this model, GRU and Attention are simply added on the basis of the classical Embedding&MLP model, and the loss function is modified;
the DIEN module herein serves as a benchmark model for capturing migration curves of shopping interests of the user. The DIEN model mainly comprises three layers:
behavior sequence layer: as shown in the light blue layer of FIG. 5, like an ordinary embedding layer, it is responsible for converting the user's original ID-class behavior sequence over n days into an embedding behavior sequence.
Interest extraction layer: as shown in the pale yellow layer of FIG. 5, a sequence model composed of GRUs simulates the user's interest migration process and extracts the user interest corresponding to each commodity node. The main goal of the interest extraction layer (Interest Extractor Layer) is to extract interests from the embedding data, using GRU units:
$u_t=\sigma(W^u i_t+U^u h_{t-1}+b^u)$

$r_t=\sigma(W^r i_t+U^r h_{t-1}+b^r)$

$\tilde{h}_t=\tanh(W^h i_t+r_t\circ U^h h_{t-1}+b^h)$

$h_t=(1-u_t)\circ h_{t-1}+u_t\circ\tilde{h}_t$

where $\sigma$ denotes the sigmoid function, $\circ$ the element-wise product, $i_t$ the input of the GRU, i.e., the embedding vector of the user's t-th behavior output by the behavior sequence layer, and $h_t$ the t-th hidden state of the GRU. After passing through the GRU interest network, the user behavior vector b(t) is further abstracted to form the interest state vector h(t).
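To make the four equations concrete, the following is a minimal NumPy sketch of a single GRU step (the parameter names and dict layout are assumptions for illustration, not the trained model):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(i_t, h_prev, P):
    """One GRU step following the equations for u_t, r_t, h~_t and h_t above.

    P is a dict of weights W*, U* and biases b* for the gates u, r and state h.
    """
    u_t = sigmoid(P["Wu"] @ i_t + P["Uu"] @ h_prev + P["bu"])   # update gate
    r_t = sigmoid(P["Wr"] @ i_t + P["Ur"] @ h_prev + P["br"])   # reset gate
    h_tilde = np.tanh(P["Wh"] @ i_t + r_t * (P["Uh"] @ h_prev) + P["bh"])
    return (1.0 - u_t) * h_prev + u_t * h_tilde                 # new hidden state h_t
```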
Interest evolution layer: as shown in the light red layer of FIG. 5, a sequence model composed of AUGRUs adds an attention mechanism on the basis of the interest extraction layer to simulate the interest evolution process related to the current target commodity; the output of the last state of the interest evolution layer is the user's current interest vector. The main goal of the interest evolution layer (Interest Evolution Layer) is to characterize the evolution process of the user's interests; an attention mechanism is added, with attention score:

$a_t=\dfrac{\exp(h_t W e_a)}{\sum_{j=1}^{T}\exp(h_j W e_a)}$

where $a_t$ is the attention score at time t, W the parameter weight of the attention unit, $e_a$ the embedding vector of the target item, T the total number of time steps, and $h_t$ the output of the interest extraction layer at time t.
Through AUGRU (GRU with Attentional Update gate), a GRU structure based on an attention update gate, the attention score is added to the structure of the original update gate, in the specific form:
$\tilde{u}'_t=a_t\cdot u'_t$

$h'_t=(1-\tilde{u}'_t)\circ h'_{t-1}+\tilde{u}'_t\circ\tilde{h}'_t$

where $u'_t$ is the original update gate of the AUGRU, $\tilde{u}'_t$ is the attentional update gate designed for the AUGRU, and $h'_t$ is the hidden state of the AUGRU. The output $h'_t$ of the interest evolution layer serves as the input to the subsequent MLP network.
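Building on the GRU sketch above, one AUGRU step can be illustrated as follows: the attention score $a_t$ rescales the update gate before the hidden state is mixed (an illustrative reading of the equations, not the authors' implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_scores(h_seq, W, e_a):
    """a_t = softmax over t of (h_t W e_a); h_seq: (T, d_h), W: (d_h, d_e), e_a: (d_e,)."""
    logits = h_seq @ W @ e_a
    logits = logits - logits.max()          # numerical stability
    expv = np.exp(logits)
    return expv / expv.sum()

def augru_step(i_t, h_prev, a_t, P):
    """One AUGRU step: the attentional update gate is a_t * u'_t."""
    u_t = sigmoid(P["Wu"] @ i_t + P["Uu"] @ h_prev + P["bu"])   # original update gate u'_t
    r_t = sigmoid(P["Wr"] @ i_t + P["Ur"] @ h_prev + P["br"])   # reset gate
    h_tilde = np.tanh(P["Wh"] @ i_t + r_t * (P["Uh"] @ h_prev) + P["bh"])
    u_att = a_t * u_t                                           # attentional update gate
    return (1.0 - u_att) * h_prev + u_att * h_tilde             # hidden state h'_t
```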
Step 4: train the DIEN deep learning network with the training set, verify the trained network with the validation set, and adjust the hyperparameters of the network until the preset conditions are met, obtaining a trained user behavior prediction model.
Step 5: constructing a gradient lifting tree model for processing mass data;
constructing a GBDT module in the combined deep learning network; compared with the XGBoost model, the LightGBM model of the GBDT family is lighter and faster while accuracy is guaranteed; the LightGBMRanker used in the invention adds a lambda gradient on the basis of LightGBM, making it better suited to ranking-based recommendation scenarios, and is a ListWise-type LTR (learning-to-rank) algorithm.
The continuous floating-point feature values are first discretized into integers, and a histogram of the corresponding width is constructed at the same time. When traversing the data, statistics are accumulated in the histogram using the discretized values as indices; after one pass over the data, the histogram has accumulated the required statistics, and the optimal split point is then found by traversing the discrete values of the histogram. Feature discretization has many advantages, such as convenient storage, faster computation, strong robustness and a more stable model. For this algorithm, the two most direct advantages are the following:
Lower memory footprint: the algorithm does not need to store additional pre-sorted results and can store only the discretized feature values, which can generally be held as 8-bit integers. That is, XGBoost needs 32-bit floating-point numbers to store feature values and 32-bit integers to store indices, while LightGBM only needs 8-bit bins to store the histograms, reducing memory to 1/8;
Lower computational cost: the pre-sorting algorithm of XGBoost needs to compute the split gain once for every feature value traversed, while the histogram algorithm of LightGBM only needs to compute it k times (k can be regarded as a constant, the number of bins).
GOSS is a sample-sampling algorithm that can exclude most samples with small gradients while preserving the basic distribution of the data, thereby reducing the data volume while guaranteeing accuracy. EFB is a method of reducing feature dimensionality (a dimension-reduction technique) by bundling features to improve computational efficiency; it allows two not completely mutually exclusive features to be bundled without affecting final accuracy. Efficient parallel processing, including feature parallelism, data parallelism and voting parallelism, further improves running speed and reduces resource usage.
The LightGBMRanker module serves as the benchmark gradient boosting tree model for processing massive data. LightGBM can be regarded as an improvement of the XGBoost algorithm, with faster speed, lower resource consumption and higher precision, including a decision-tree histogram algorithm, Gradient-based One-Side Sampling (GOSS), Exclusive Feature Bundling (EFB), categorical feature support (Categorical Feature), and support for efficient parallelism. The LightGBMRanker module adds a lambda gradient on the basis of the LightGBM model, making it better suited to ranking-based recommendation scenarios.
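For orientation, a minimal sketch of training such a ranker with the lightgbm package follows; the toy data and parameter values are illustrative assumptions, not the patented configuration:

```python
import lightgbm as lgb
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((1000, 16))                  # candidate features
y = rng.integers(0, 2, size=1000)           # 1 = purchased, 0 = not purchased
group = [20] * 50                           # 50 users x 20 candidate commodities each

ranker = lgb.LGBMRanker(
    objective="lambdarank",                  # lambda gradients on top of the GBDT
    n_estimators=100,
    learning_rate=0.05,
)
ranker.fit(X, y, group=group)

scores = ranker.predict(X[:20])              # scores for one user's candidates
top12 = np.argsort(-scores)[:12]             # top-12 list, matching MAP@12
```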
The continuous floating-point feature values are first discretized into integers, and a histogram of the corresponding width is constructed at the same time, as shown in FIG. 2, binning the continuous values. When traversing the data, statistics are accumulated in the histogram using the discretized values as indices; after one pass over the data, the required statistics have been accumulated, and the optimal split point is found by traversing the discrete values of the histogram. The lambda gradient is defined through ranking metrics such as MAP and NDCG; MAP is used as the experimental metric:
$\mathrm{MAP}=\dfrac{1}{U}\sum_{u=1}^{U}\dfrac{1}{\min(m,n)}\sum_{k=1}^{n}P(k)\cdot rel(k)$

where U is the number of customers, n is the number of recommended (ranked) products per customer, and m is the number of ground-truth items per customer. P(k) denotes the precision at cutoff k, and rel(k) is an indicator function equal to 1 if the item at rank k is a relevant (correct) label and 0 otherwise.
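For concreteness, a small Python sketch of this metric (a common MAP@k formulation consistent with the symbols above; the exact implementation used in the experiments is not given):

```python
def apk(actual, predicted, k=12):
    """Average precision at k for one customer: sum of P(k) * rel(k) / min(m, k)."""
    predicted = predicted[:k]
    hits, score = 0, 0.0
    for rank, item in enumerate(predicted, start=1):
        if item in actual and item not in predicted[:rank - 1]:
            hits += 1
            score += hits / rank
    return score / min(len(actual), k) if actual else 0.0

def mapk(actuals, predicteds, k=12):
    """MAP@k averaged over all customers."""
    return sum(apk(a, p, k) for a, p in zip(actuals, predicteds)) / len(actuals)
```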
Consider an ordered pair (i, j). To emphasize the importance of position in the ranking, a swap index $|\Delta Z_{ij}|$ is introduced, denoting the change in the MAP metric after exchanging the positions of item i and item j in the list; $s_i$ denotes the output score of the model for commodity i.

$\lambda_{i,j}=\dfrac{-|\Delta Z_{ij}|}{1+\exp(s_i-s_j)}$

Next, the lambda gradient $\lambda_i$ of commodity i is calculated:

$\lambda_i=\sum_{j:(i,j)}\lambda_{i,j}-\sum_{j:(j,i)}\lambda_{i,j}$

where (i, j) ranges over pairs in which i should be ranked before j, and (j, i) over pairs in which i should be ranked after j. After the lambda gradient is defined, the loss function $L_{ij}$ is derived in reverse:

$L_{ij}=\log\{1+\exp(s_i-s_j)\}\cdot|\Delta Z_{ij}|$
The optimization target during model training iterations is the loss function $L_{ij}$ with the lambda gradient added. FIG. 6 details the specific steps for constructing the LightGBMRanker module using lambda gradients; a code sketch of the gradient computation follows the steps below:
Initially there is no decision tree model, so the model score of each commodity is 0;
for the training of each tree, the algorithm traverses the commodity pairs with different labels in the training data set and computes the metric change $|\Delta Z_{ij}|$ and the value $\lambda_{i,j}$ caused by swapping the positions of the pair, thereby obtaining the lambda value $\lambda_i$ of each document;
the derivative $\omega_i$ of each $\lambda_i$ is calculated for the subsequent Newton step that solves for the leaf node values;
a decision tree is trained with the $\lambda_i$ of all documents as labels, splitting nodes by minimizing the sum of squared errors: for a selected feature, a value val is chosen, all samples less than or equal to val are assigned to the left child node and samples greater than val to the right child node; the sums of squared errors of the lambda values are then computed for the left and right nodes and added as the cost of the split; the (feature, val) pair with the minimum cost is selected as the current split point, finally generating a decision tree with L leaf nodes;
for the generated decision tree, the value of each leaf node is calculated with a Newton step, i.e., the output value of each leaf node is computed over the set of documents falling into it;
the model is updated by adding the currently learned decision tree to the existing model, regularized with the learning rate.
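As referenced above, a simplified single-list sketch of the gradient computation in these steps; delta_map is a hypothetical helper returning $|\Delta Z_{ij}|$, and the code illustrates the lambda and Newton-step quantities rather than the library's internal implementation:

```python
import numpy as np

def lambda_omega(scores, labels, delta_map):
    """Compute lambda_i and its derivative omega_i for one ranked list.

    scores: current model scores s_i; labels: 1 = relevant, 0 = not;
    delta_map(i, j): hypothetical helper returning |dZ_ij|, the MAP change
    caused by swapping items i and j in the current ranking.
    """
    n = len(scores)
    lam, omg = np.zeros(n), np.zeros(n)
    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue                         # keep pairs where i should outrank j
            dz = abs(delta_map(i, j))
            rho = 1.0 / (1.0 + np.exp(scores[i] - scores[j]))
            lam_ij = -dz * rho                   # lambda_{i,j} from the text
            lam[i] += lam_ij
            lam[j] -= lam_ij
            w = dz * rho * (1.0 - rho)           # omega term for the Newton step
            omg[i] += w
            omg[j] += w
    return lam, omg

# Newton-step output value of one leaf over the documents falling into it:
#   gamma_leaf = lam[leaf_idx].sum() / (omg[leaf_idx].sum() + eps)
```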
Step 6: train the LightGBMRanker tree model with the training set, verify the trained network with the validation set, and adjust the hyperparameters of the network until the preset conditions are met, obtaining a trained user behavior prediction model.
Step 7: fusing the models;
Model integration is carried out, and the training parameters of the fused model are fine-tuned. The recommendation scores obtained by the two models are linearly weighted, and the final scores are sorted to obtain the final list of recommended items. The weighting formula is:
Weighted=(LightGBMRanker+DIEN)/2
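A small sketch of this equal-weight score fusion (variable names are illustrative):

```python
import numpy as np

def fuse_and_rank(dien_scores, lgbm_scores, article_ids, k=12):
    """Average the two models' scores per candidate and return the top-k items."""
    weighted = (np.asarray(lgbm_scores) + np.asarray(dien_scores)) / 2.0
    top = np.argsort(-weighted)[:k]          # highest total score first
    return [article_ids[i] for i in top]
```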
step 8: and acquiring the behaviors of a plurality of users in n days, and preprocessing to obtain matrix data of the users to be detected.
Step 9: input the matrix data of the users to be tested into the combined model to obtain the list of commodities each user will purchase in the next week. The data series to be tested are input into the DIEN model and the LightGBMRanker model to obtain a user's prediction scores for different commodities; the predicted values are then linearly weighted, and the final weighted scores are sorted to obtain the final one-week recommended commodity list.
The verification experiment is carried out on the past transaction data of H&M and the multi-modal data of customers and commodities provided by the Kaggle platform. The performance of DIEN, LightGBM, LightGBMRanker and the combined deep learning model fusing the simulated user interest evolution and the gradient boosting algorithm is compared on the test set; the MAP@12 results of the models on the training and test sets are shown in Table 1. On the training set, the MAP@12 score of the DIEN model is 0.02256, that of the LightGBM model is 0.02321, that of the LightGBMRanker model is 0.02384, and that of the combined deep learning model is 0.02841, improvements of 25.9%, 22.4% and 19.1% over the former three; on the test set, the MAP@12 score of the DIEN model is 0.02239, that of the LightGBM model is 0.02298, that of the LightGBMRanker model is 0.02361, and that of the combined deep learning model is 0.02822, improvements of 26.0%, 22.8% and 19.5% over the former three. The recommendation results of the combined deep learning network are closer to customers' actual purchases, and the commodities the user will purchase in the coming week can be recommended more accurately. Compared with the DIEN, LightGBM and LightGBMRanker algorithms, the prediction accuracy of the combined deep learning algorithm is significantly improved.
Finally, the top 10 important features are given, as shown in the feature importance diagram of FIG. 7, providing a reference for further improving the accuracy of algorithms and recommendation systems in the future.
The above examples merely represent a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the invention. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application is to be determined by the claims appended hereto.
While the invention has been described in terms of specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the equivalent or similar purpose, unless expressly stated otherwise; all of the features disclosed, or all of the steps in a method or process, except for mutually exclusive features and steps, may be combined in any manner.
Table 1 Experimental results of the combined deep learning model fusing DIEN and LightGBMRanker in the examples
Model Training set MAP@12 Test set MAP@12
DIEN 0.02256 0.02239
LightGBM 0.02321 0.02298
LightGBMRanker 0.02384 0.02361
DIEN+LightGBMRanker 0.02841 0.02822

Claims (1)

1. A personalized recommendation method integrating user interest evolution and a gradient boosting algorithm, the method comprising:
step 1: acquiring user information, commodity information and scene information from a database;
wherein the user information includes: attribute information such as the user's ID, age, zip code, whether the user is an active club member, and whether news pushes are accepted; the commodity information includes: attribute information such as commodity ID, commodity code, commodity name, commodity type number, commodity type name, commodity color, release date and production department, visual information such as the commodity picture, and text information such as the commodity description; the scene information includes: attribute information such as user ID, commodity ID, transaction date and transaction channel;
step 2: dividing a data set;
taking the last week of samples as a test set and the previous samples as a training set;
step 3: constructing a model of user interest evolution;
the model comprises a behavior sequence layer, an interest extraction layer, an interest evolution layer and an MLP network:
behavior sequence layer: used to convert the user's original ID behavior sequence over n days into an embedding behavior sequence;
interest extraction layer: used to extract interests from the embedding data; a GRU unit is used to extract the interests:
$u_t=\sigma(W^u i_t+U^u h_{t-1}+b^u)$

$r_t=\sigma(W^r i_t+U^r h_{t-1}+b^r)$

$\tilde{h}_t=\tanh(W^h i_t+r_t\circ U^h h_{t-1}+b^h)$

$h_t=(1-u_t)\circ h_{t-1}+u_t\circ\tilde{h}_t$

wherein $u_t$ represents the output value of the update gate of the GRU at time t, $r_t$ represents the output value of the reset gate of the GRU at time t, $i_t$ represents the input of the GRU at time t, and $\tilde{h}_t$ represents the newly learned memory state of the GRU at time t; W represents the parameter weight applied to the GRU unit input, U represents the parameter weight applied to the hidden state of the GRU at the previous time, and b represents the parameter bias, the superscripts u, r, h denoting the update gate, the reset gate and the newly learned state, respectively; $\sigma$ represents the sigmoid function and $\circ$ the element-wise product; $i_t$, the input of the GRU, is the embedding vector of the user's t-th behavior output by the behavior sequence layer, and $h_t$ is the t-th hidden state of the GRU; after the GRU interest network, the user behavior vector b(t) is further abstracted to form the interest state vector h(t);
interest evolution layer: used for describing the evolution process of the user's interests, adding an attention mechanism with attention score:

$a_t=\dfrac{\exp(h_t W e_a)}{\sum_{j=1}^{T}\exp(h_j W e_a)}$

wherein $a_t$ represents the attention score of the attention mechanism at time t, W represents the parameter weight of the attention unit, $e_a$ represents the embedding vector of the target item, T represents the total number of time steps, and $h_t$ represents the output of the interest extraction layer at time t; through AUGRU (GRU with Attentional Update gate), a GRU structure based on an attention update gate, the attention score is added to the structure of the original update gate, in the specific form:

$\tilde{u}'_t=a_t\cdot u'_t$

$h'_t=(1-\tilde{u}'_t)\circ h'_{t-1}+\tilde{u}'_t\circ\tilde{h}'_t$

wherein $u'_t$ is the original update gate of the AUGRU, $\tilde{u}'_t$ is the attentional update gate designed for the AUGRU, and $h'_t$ is the hidden state of the AUGRU; the output $h'_t$ of the interest evolution layer serves as the input to the subsequent MLP network;
step 4: training a model simulating user interest evolution;
training a model of user interest evolution by utilizing the data set obtained in the step 2;
step 5: constructing a gradient lifting tree model for processing mass data;
step 5.1: the model score of each commodity is 0 initially, and N tree models are generated;
step 5.2: for the training of each tree, traverse the commodity pairs with different labels in the training data set to obtain the lambda value $\lambda_i$ of each sample;
The calculation method is as follows:

$\lambda_{i,j}=\dfrac{-|\Delta Z_{ij}|}{1+\exp(s_i-s_j)}$

$\lambda_i=\sum_{j:(i,j)}\lambda_{i,j}-\sum_{j:(j,i)}\lambda_{i,j}$

wherein $\lambda_{i,j}$ is the lambda value of commodity i when it is ranked before commodity j, $|\Delta Z_{ij}|$ denotes the change in the MAP metric caused by exchanging the positions of commodities i and j in the list, and $s_i$ denotes the output score of the gradient boosting tree model for commodity i;
step 5.3: calculate the derivative $\omega_i$ corresponding to each $\lambda_i$, used by the subsequent Newton step to solve for the values of the leaf nodes;
train a decision tree with the $\lambda_i$ of all documents as labels, splitting nodes by minimizing the sum of squared errors, i.e., for a selected feature, choose a value val, assign all samples less than or equal to val to the left child node and samples greater than val to the right child node; then compute the sums of squared errors of the lambda values for the left and right nodes and add them as the cost of the split, select the (feature, val) pair with the minimum cost as the current split point, and finally generate a decision tree with L leaf nodes;
for the generated decision tree, calculate the value of each leaf node with a Newton step, i.e., compute the output value of each leaf node over the set of documents falling into it;
step 5.4: update the model by adding the currently learned decision tree to the existing LightGBM model, regularized with the learning rate;
step 6: training the gradient boosting tree model for processing massive data;
training the gradient boosting tree model using the data set obtained in step 2;
Step 7: fusing the models;
taking the multidimensional matrix data as input, inputting it respectively into the user interest evolution model and the gradient boosting tree model for training and learning to obtain the commodity scores of the two models, linearly weighting the scores of the two models to obtain a total score, and sorting by the total score to obtain the final commodity recommendation list;
step 8: obtaining prediction data;
acquiring test set data from the database, and obtaining the user data to be tested through preprocessing and feature engineering;
step 9: obtaining a recommendation list through combined model prediction;
inputting the data to be tested into the fused model, where the output of the network is the prediction of the list of commodities the user will purchase in the next week.
CN202310003507.1A 2023-01-03 2023-01-03 Personalized recommendation method integrating user interest evolution and gradient boosting algorithm Pending CN116304299A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310003507.1A CN116304299A (en) 2023-01-03 2023-01-03 Personalized recommendation method integrating user interest evolution and gradient boosting algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310003507.1A CN116304299A (en) 2023-01-03 2023-01-03 Personalized recommendation method integrating user interest evolution and gradient boosting algorithm

Publications (1)

Publication Number Publication Date
CN116304299A true CN116304299A (en) 2023-06-23

Family

ID=86829479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310003507.1A Pending CN116304299A (en) Personalized recommendation method integrating user interest evolution and gradient boosting algorithm

Country Status (1)

Country Link
CN (1) CN116304299A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116881854A (en) * 2023-09-08 2023-10-13 国际关系学院 XGBoost-fused time sequence prediction method for calculating feature weights
CN116881854B (en) * 2023-09-08 2023-12-22 国际关系学院 XGBoost-fused time sequence prediction method for calculating feature weights
CN116977035A (en) * 2023-09-25 2023-10-31 临沂大学 Agricultural product recommendation method based on LightGBM and deep learning
CN117557306A (en) * 2024-01-09 2024-02-13 北京信索咨询股份有限公司 Management system for classifying consumers based on behaviors and characteristics
CN117557306B (en) * 2024-01-09 2024-04-19 北京信索咨询股份有限公司 Management system for classifying consumers based on behaviors and characteristics

Similar Documents

Publication Publication Date Title
US20210271975A1 (en) User tag generation method and apparatus, storage medium, and computer device
CN111222332B (en) Commodity recommendation method combining attention network and user emotion
CN108647251B (en) Recommendation sorting method based on wide-depth gate cycle combination model
CN111797321B (en) Personalized knowledge recommendation method and system for different scenes
Zheng et al. An optimized collaborative filtering recommendation algorithm
CN116304299A (en) Personalized recommendation method integrating user interest evolution and gradient boosting algorithm
CN111191092B (en) Label determining method and label determining model training method
CN109918563B (en) Book recommendation method based on public data
CN104063481A (en) Film individuation recommendation method based on user real-time interest vectors
CN109034960B (en) Multi-attribute inference method based on user node embedding
CN112085525A (en) User network purchasing behavior prediction research method based on hybrid model
Jonathan et al. Sentiment analysis of customer reviews in zomato bangalore restaurants using random forest classifier
CN112069320A (en) Span-based fine-grained emotion analysis method
Choudhary et al. SARWAS: Deep ensemble learning techniques for sentiment based recommendation system
CN112613953A (en) Commodity selection method, system and computer readable storage medium
Li Accurate digital marketing communication based on intelligent data analysis
CN112800109A (en) Information mining method and system
CN114942974A (en) E-commerce platform commodity user evaluation emotional tendency classification method
Agustyaningrum et al. Online shopper intention analysis using conventional machine learning and deep neural network classification algorithm
Tahiri et al. An intelligent shopping list based on the application of partitioning and machine learning algorithms.
Ahan et al. Social network analysis using data segmentation and neural networks
Ijaz Book recommendation system using machine learning
CN111104614A (en) Method for generating recall information for tourist destination recommendation system
CN115187312A (en) Customer loss prediction method and system based on deep learning
Urkude et al. Comparative analysis on machine learning techniques: a case study on Amazon product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination