CN115687757A - Recommendation method fusing hierarchical attention and feature interaction and application system thereof - Google Patents

Recommendation method fusing hierarchical attention and feature interaction and application system thereof Download PDF

Info

Publication number
CN115687757A
CN115687757A CN202211322085.6A CN202211322085A CN115687757A CN 115687757 A CN115687757 A CN 115687757A CN 202211322085 A CN202211322085 A CN 202211322085A CN 115687757 A CN115687757 A CN 115687757A
Authority
CN
China
Prior art keywords
layer
recommendation
user
attention
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211322085.6A
Other languages
Chinese (zh)
Inventor
黄发良
隆广庆
尹云飞
宋佩华
李林
戴智鹏
黄恩博
何广静
元昌安
龙连杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanning Normal University
Original Assignee
Nanning Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanning Normal University filed Critical Nanning Normal University
Priority to CN202211322085.6A priority Critical patent/CN115687757A/en
Publication of CN115687757A publication Critical patent/CN115687757A/en
Pending legal-status Critical Current

Links

Images

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a recommendation method for integrating hierarchical attention and feature interaction, which comprises the following steps of: s1, collecting user instance information; s2, according to the user instance information, training a recommendation model by utilizing the low-order to high-order implicit and explicit characteristics interaction by utilizing a second-order interaction operator and a level attention mechanism; s3, calculating the probability of recommending the target item for the user by using a trained recommendation model for the user instance information and the target item to be predicted; and S4, recommending the target item with the recommendation probability larger than or equal to the preset recommendation threshold value to the user. The method of the invention can better depict the potential interest of the user, improve the performance of the recommendation system, can be widely applied to the recommendation system and improve the active service quality of information.

Description

Recommendation method fusing hierarchical attention and feature interaction and application system thereof
Technical Field
The invention relates to the technical field of recommendation systems, in particular to a recommendation method fusing hierarchical attention and feature interaction and an application system thereof.
Background
With the rapid development of information technology, various online services such as e-commerce, news information, social platforms and the like are in a large number, so that people are in a predicament of 'mass data but poor knowledge' while bringing convenience to life of people, and information meeting the needs of people is difficult to find in a data sea. Search engines have been developed to alleviate this problem to some extent, but they require the user to provide a clear need because search engines use keyword matching techniques to feed back to the user a list of items in the search information base that match the user's needs most closely. In reality, people often have difficulty in accurately describing the needs of the people by using a few keywords. In addition, the search engine cannot meet the personalized demands of different users, since the feedback results for the same keyword are identical. Aiming at the difficult problem that the individual requirements of people are difficult to meet, information recommendation technology and a recommendation system are developed.
A core task in the information recommendation method is how to efficiently and accurately calculate the importance degree of each feature and interactive feature. In the existing recommendation method, the feature interaction mainly has two forms of explicit feature interaction and implicit feature interaction. The former mainly utilizes a well-designed interaction network, such as a factorization model, to promote obvious interaction among various features. The latter usually utilizes a deep neural network to implicitly interact with features, such as deep factorization model deep fm. Recently, some more advanced recommendation methods have attempted to fuse explicit and implicit feature interaction information. However, when the models fuse the information of the two aspects, the dimensions of low-order interaction and high-order interaction are not considered, and only the high-order interaction of the features is completely lost to the deep neural network for self-learning, so that the effect of the feature interaction cannot be effectively embodied.
Disclosure of Invention
In order to solve at least the above defects or problems, the present invention proposes an information recommendation method based on feature interaction and hierarchical attention, that is, the recommendation method of the present invention integrates hierarchical attention and feature interaction. According to the method, a factorization machine is used for carrying out explicit feature interaction, and according to an interaction result, a second-order interaction operator is used for extracting interaction information, and feature interaction from a low order to a high order is carried out implicitly through level attention. During which feature interactions from lower order to higher order are accomplished by optimizing the representation of the feature by attention mechanism to provide a feature representation of higher order information for the next explicit feature interaction. Therefore, the implicit and explicit characteristic interaction from low order to high order is realized by utilizing a second-order interaction operator and a hierarchical attention mechanism, and the performance of the recommendation system is effectively improved.
The invention provides a recommendation method based on feature interaction and hierarchical attention, which comprises the following steps:
s1, collecting user instance information, wherein the user instance information comprises user attributes, project attributes and user historical behavior attributes;
s2, according to the user instance information, carrying out interactive training recommendation models on implicit and explicit characteristics from a low order to a high order by utilizing a second-order interaction operator and a level attention mechanism;
s3, calculating the probability of recommending the target item for the user by using the trained recommendation model for the user instance information to be predicted and the target item;
and S4, recommending the target item with the recommendation probability larger than or equal to the preset recommendation threshold value to the user.
Preferably, S2 comprises:
s2-1, initializing a parameter set of the recommendation model;
s2-2, carrying out sparse coding in a one-hot mode or a multi-hot mode on discrete attributes and discretized continuous attributes in the user instance information through a sparse representation layer of the recommendation model to obtain sparse representation of the user instance information;
s2-3, converting the sparse representation of the user instance information into a low-dimensional dense vector through an embedding layer of the recommendation model to obtain embedded representation of the user and the project;
s2-4, collecting a small batch of samples from the conversion expression of all the example samples obtained in the S2-3;
s2-5, calculating a prediction score of the item recommended by the user by using a small batch of samples through a second-order interaction operator and a forward process of a hierarchical attention network;
s2-6, calculating the predicted Loss based on the cross entropy;
s2-7, optimizing cross entropy loss by adopting an Adam optimizer based on error back propagation, adjusting weight parameters of the recommendation model, and searching the learning rate between {0.0001,0.01 };
s2-8, performing loop from S2-4 to S2-7 until the number of iterations is equal to the number of specified iterations.
Preferably, S2-5 comprises:
the hierarchical attention network is formed by stacking L attention layers, wherein the L (L =0,1, …, L) attention layer comprises:
x (0) =[v 1 ,v 2 ,…,v m ]
Figure BDA0003913671440000021
Figure BDA0003913671440000022
Figure BDA0003913671440000031
Figure BDA0003913671440000032
Figure BDA0003913671440000033
Figure BDA0003913671440000034
Figure BDA0003913671440000035
Figure BDA0003913671440000036
wherein the superscript (l) indicates that the current attention layer is positioned at the level numbered l, x (0) =[v 1 ,v 2 ,…,v m ]Is the output of the embedding layer of the recommendation model, v i (i =1,2, …, m) is the vector of representations of the i-th feature of the embedded layer output, m is the number of attributes of the user instance,
Figure BDA0003913671440000037
embedding vector v corresponding to ith feature input into ith layer i The (c) th dimension of (a),
Figure BDA0003913671440000038
IS the k-dimension, IS, of the interactive feature vector of the l-th layer (l) Representing the interactive characteristic vector of the l-th layer, M is the layer number of the multilayer perceptron corresponding to the output module, W i (i =1,2, …, M) and b i (i =1,2, …, M) are the weight and offset, respectively, for the i-th layer of the multi-layer perceptron, and h and b are the weight and offset, respectively, of the linear transformation;
Figure BDA0003913671440000039
to input the embedding vector corresponding to the ith feature of the ith layer,
Figure BDA00039136714400000310
is the attention coefficient of the first layer of attention, W (l) And b (l) Respectively the connection weight and the bias of the l-th layer in the depth model,
Figure BDA00039136714400000311
is the normalized attention coefficient, x, of the ith attention layer (l+1) And x (l) Respectively, input to the l +1 th layer and the l layer in the depth model, delta is a nonlinear excitation function sigmoid,
Figure BDA00039136714400000312
is the output of the i-th layer in a multi-layer feedforward network, o (l) Representing the output of the last layer in the multi-layer feedforward network, and T representing a vector transposition operator;
then, all the output layers of the hierarchical attention network are linearly integrated, and the recommended prediction score of the item x is calculated by utilizing a softmax function
Figure BDA00039136714400000313
In particular O = [ O ] (0) ,o (1) ,…,o (L) ]And
Figure BDA00039136714400000314
wherein W p And b p Respectively, the weight and the bias of the linear transformation.
Preferably, the cross-entropy penalty in S2-6 is:
Figure BDA00039136714400000315
wherein Θ is a parameter set of the recommendation model, N is a total number of training examples, and λ is a weight of a regularization term; v. of i Is the training data label of whether the ith item was recommended,
Figure BDA00039136714400000316
is the probability that the model calculates that the ith item was recommended, i ranges from i =1,2 … … N.
Still another object of the present invention is to obtain a recommendation system using the above method, comprising:
an acquisition unit configured to acquire user instance information;
a presumption unit configured to input the user instance information into a recommendation model trained according to the method of any one of claims 1-5, and calculate the probability that the user recommends the target item by using the trained recommendation model for the user instance information to be predicted;
and the pushing unit is configured to recommend the target item with the recommendation probability larger than or equal to a preset recommendation threshold value to the terminal of the user.
The invention at least comprises the following beneficial effects:
compared with the traditional recommendation method, the method realizes the implicit and explicit characteristic interaction from low order to high order by utilizing the second-order interaction operator and the level attention mechanism, can better depict the potential interest of the user, can effectively improve the performances such as the accuracy and the like of the recommendation system, can be widely applied to the recommendation system, and improves the active service quality of information.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.
Drawings
FIG. 1 is a flow chart of a recommendation method of the present invention.
FIG. 2 is a schematic diagram of the training of the proposed model of the present invention.
FIG. 3 is a bar graph of the average run time per epoch for different models on the Criteo dataset.
FIG. 4 is a bar graph of the average run time per epoch for different models on an Avazu dataset.
FIG. 5 is a bar graph of the average run time per epoch for different models on the Movielens-1M dataset.
FIG. 6 is a bar graph of the average run time per epoch for different models on the Book-crosslinking dataset.
Detailed Description
The present invention is further described in detail below with reference to examples so that those skilled in the art can practice the invention with reference to the description.
It is to be noted that the experimental methods described in the following embodiments are all conventional methods unless otherwise specified, and the devices and materials described therein are commercially available without otherwise specified.
As shown in fig. 1-2, the present invention discloses a recommendation method based on feature interaction and hierarchical attention, which comprises: encoding input instance information into a sparse feature vector, and embedding the sparse feature vector in a low-dimensional manner; extracting interaction information between the embedded feature vectors by using a second-order interaction operator; realizing multi-level high-order characteristic interaction from front to back through a level attention mechanism; and (3) obtaining interest expression vectors of the users by linearly combining the outputs of different layers of the deep neural network, calculating the recommendation probability of each information item according to the interest expression vectors, and recommending the target item corresponding to the recommendation probability which is greater than or equal to a preset recommendation threshold value to the users. The specific operation is as follows:
step 1: inputting user instance information comprising user attributes, item attributes and user historical behavior attributes;
the user attributes mainly include, but are not limited to, age, gender and occupation, the project attributes mainly include, but are not limited to, category and price, and the user historical behavior attributes mainly include, but are not limited to, online user click behavior, browsing behavior, comment behavior and like behavior. For example, the user instance information may be [ user id:1994, male, hotel waiter; item id: dell notebook or applet notebook, 3800-6000 yuan range; and (4) behavior id: the browsing time of the Tan-Bao-De official flagship shop is more than 10 seconds, the browsing time of the Del-Jingdong-owned official flagship shop is more than 30 seconds, and the like.
Step 2: training a recommendation model according to the user instance information;
the step 2 further comprises the following steps:
step 2.1: initializing a parameter set of the recommendation model;
step 2.2: carrying out sparse coding in a single-hot mode or a multi-hot mode on discrete attributes and discretized continuous attributes in the user instance information through a sparse representation layer of the recommendation model to obtain a sparse representation vector of the user instance information;
step 2.3: converting the sparse representation vector of the user instance information into a low-dimensional dense vector through an embedding layer of the recommendation model to obtain an embedded representation vector of the user instance information;
step 2.4: collecting a set formed by embedded expression vectors of user instance information to obtain a small batch of samples;
step 2.5: calculating a prediction score of the item recommended by the user by using a second-order interaction operator and a forward process of a hierarchical attention network for a small batch of samples;
in said step 2.5, the prediction score may be calculated according to the following method: first, the output o of each layer l of the hierarchical attention network is calculated according to the following formula (l) Then, then
x (0) =[v 1 ,v 2 ,…,v m ]
Figure BDA0003913671440000051
Figure BDA0003913671440000052
Figure BDA0003913671440000053
Figure BDA0003913671440000054
Figure BDA0003913671440000061
Figure BDA0003913671440000062
Figure BDA0003913671440000063
Figure BDA0003913671440000064
Wherein the superscript (l) indicates that the current attention layer is positioned at the level numbered l, x (0) =[v 1 ,v 2 ,…,v m ]Is the output of the embedding layer of the recommendation model, v i (i =1,2, …, m) is the vector of representations of the i-th feature of the embedded layer output, m is the number of attributes of the user instance,
Figure BDA0003913671440000065
embedding vector v corresponding to ith feature input into ith layer i The (c) th dimension of (a),
Figure BDA0003913671440000066
IS the k-dimension, IS, of the interactive feature vector of the l-th layer (l) Representing the interactive characteristic vector of the l-th layer, M is the layer number of the multilayer perceptron corresponding to the output module, W i (i =1,2, …, M) and b i (i =1,2, …, M) are the weight and offset, respectively, for the i-th layer of the multi-layer perceptron, and h and b are the weight and offset, respectively, of the linear transformation;
Figure BDA0003913671440000067
to input the embedding vector corresponding to the ith feature of the ith layer,
Figure BDA0003913671440000068
is the attention coefficient of the first layer of attention, W (l) And b (l) Respectively the connection weight and the offset of the ith layer in the depth model,
Figure BDA0003913671440000069
is the normalized attention coefficient, x, of the ith attention layer (l+1) And x (l) Respectively, input to the l +1 th layer and the l layer in the depth model, delta is a nonlinear excitation function sigmoid,
Figure BDA00039136714400000610
is the output of the i-th layer in a multi-layer feedforward network, o (l) Representing the output of the last layer in a multi-layer feedforward network, T representing the vector transpose operator
Then, carrying out the operation; linearly integrating all the layer outputs of the hierarchical attention network and calculating the recommended prediction score of the item x by utilizing a softmax function
Figure BDA00039136714400000611
In particular O = [ O ] (0) ,o (1) ,…,o (L) ]And
Figure BDA00039136714400000612
wherein W p And b p Respectively, the weight and the bias of the linear transformation.
Step 2.6: the prediction Loss based on the cross entropy is calculated by the formula
Figure BDA00039136714400000613
Theta is a parameter set of the recommendation model, N is the total number of training examples, and lambda is the weight of the regularization term; y is i Is the training data label of whether the ith item was recommended,
Figure BDA00039136714400000614
is the probability that the ith item was recommended as calculated by the model, with i ranging from i =1,2 … … N;
step 2.7: optimizing cross entropy loss by adopting an Adam optimizer based on error back propagation, adjusting weight parameters of the recommendation model, and searching the learning rate between {0.0001,0.01 };
step 2.8: training the recommendation model in a manner that loops from step 2.4 to step 2.7 until the number of iterations equals a specified number of iterations; the number of given iterations is specifically determined by the error stability of the cross-entropy losses obtained by the loop execution, usually with the fluctuation of the cross-entropy losses obtained in the last 10 consecutive times being less than 10 -8 After training is completed, the trained guessed model is recorded as HAFM.
And step 3: calculating the probability of recommending the target item for the user by utilizing a trained recommendation model for the user instance information to be predicted and the target item;
and 4, step 4: recommending the target item with the recommendation probability larger than or equal to a preset recommendation threshold value to the user.
< testing of the effect of the trained inference model HAFM >
Experimental Environment
The experiments related to the invention all adopt a uniform experiment environment, and the main parameters of the hardware environment and the software environment used by the experiments are shown in table 1 based on the requirements of the depth model.
Table 1 basic information of experimental environment
Figure BDA0003913671440000071
Data set
To evaluate the performance and efficiency of the model, experiments were performed with 4 standard datasets for click rate prediction. Table 2 lists the statistics for these 4 data sets, each divided by a ratio of 8: 1, for training, validation and testing.
Table 2 basic statistics of four data sets
Figure BDA0003913671440000072
Baseline model
Baseline models can be roughly divided into three major categories: logistic regression models, FM-based models, and deep network-based models, are briefly described below.
(1) Logistic regression model
LR: the LR model is the most classical baseline model in click-through estimates.
(2) Model based on FM
FM: on the basis of LR, feature second-order interactive operation is introduced, and model expression is improved.
NFM: and after the FM second-order interaction, canceling the summation operation, and performing deeper feature interaction on the FM result by using the DNN.
FFM: the concept of the feature domain is introduced, and each feature domain learns one embedding for each feature, so that feature representation is enhanced.
AFM: a weight is learned for each interactive feature using the attention network.
Deep FM: the deep learning network and the FM are combined in parallel, and first-order features, explicit second-order feature interaction and implicit feature interaction are learned simultaneously.
xDeepFM: on the basis of deep FM, a CIN network is further designed to replace an explicit interaction module, and the estimation capability of the feature interaction promotion model is carried out at the vector level.
IFM: sample perception is proposed, and each input sample is given different weight, so that the embedded representation of the optimization features is updated.
DIFM: on the basis of IFM, the main idea in the Transformer is introduced, and the vector level interaction is learned through self-attention mechanics.
(3) Deep network based model
FNN: embedding by utilizing FM pre-training characteristics, and then carrying out subsequent deep characteristic interaction by using DNN.
AFN: a logarithmic neural network layer for adaptively adjusting interaction orders is designed, and the orders of characteristic interaction can be selected through the network, so that the performance of the model is improved.
WD: from the perspective of "generalization" and "memory," deep learning networks and LR are combined in parallel.
DCN: and (3) providing explicit interaction of cross network learning characteristics, implicit interaction of learning characteristics by using a neural network module, and finally integrating the output of the two modules.
IPNN: belongs to the PNN model and is in the same family with OPNN. After embedding the layer, performing explicit feature interaction in an inner product mode.
OPNN: belongs to the PNN model and is in the same family with IPNN. After embedding the layer, explicit feature interaction is performed in an outer product mode.
DRM: the importance of each embedding dimension to the model is calculated through an attention structure, so that the embedding quality of the features is improved.
MaskNet: the capability of DNN excavation of complex interaction features can be improved by using MaskBlock, maskNet families have a parallel model and a serial model due to different maskBlock stacking modes, the serial model is MaskNet, and the parallel model is MaskNetx.
Evaluation index
The experiment adopts two most commonly used evaluation indexes of AUC and Logloss in a click rate estimation task. The AUC is an area under an ROC curve, is an evaluation index for measuring the performance of the binary classification model, can represent the probability that a predicted positive case is arranged in front of a predicted negative case, and the higher the AUC value is, the better the performance is. Logloss is cross entropy loss, can represent the difference between real sample and prediction probability, and in the click rate estimation model, the smaller the Logloss, the more accurate the model prediction is represented. It is emphasized that 1% AUC boost and Logloss reduction are significant advances in offline prediction of click rate estimation tasks.
Details of the experiment
All models were trained using the PyTorch framework, and for comparative fairness, all models were learned using an Adam optimizer to optimize cross entropy loss, and the learning rate was searched between {0.0001,0.001,0.01 }. The embedding size is set to 16 and other hyper-parameters are kept consistent with default parameters in their paper reports or open source code. For the model containing the DNN module, the DNN hidden layer depth is uniformly set to be 2, and the number of neurons in each hidden layer is set to be 16. For the Criteo dataset, the Batch Size (Batch Size) is set to 4096, the Avazu dataset Batch Size is set to 2048, and for both the Movielens-1M and Book-Cross datasets, batch Size is set to 128. The regularization parameter λ is searched between {0.00001,0.0001,0.001 }. And respectively fixing a plurality of groups of random number seeds during model comparison, carrying out a plurality of groups of experiments, and averaging the results.
Next, the private hyper-parameters of the partial model are described:
the AFM model attention factor size was set to 16.
The number of cross layers of the DCN model is set to 3.
The CIN network layer parameters of the xDeepFM model are consistent with the default DNN parameters and are set to [16, 16].
The log neural network dimension of the AFN model was set to 1500.
In the multi-head attention module in the diff model, the number of attention heads is set to 4, the number of attention layers is 3, and the dimension size of attention is set to 16.
The maskbock number of the MaskNet model is set to 3.
The HAFM model employs a 3-layer attention network, i.e., L =3.
The maximum iteration period (epoch) of the model is set to 50, and an early stopping method (the maximum performance reduction step number is set to 2) is adopted to avoid overfitting of the model to the training data and shorten the training time. The specific operation is as follows:
(1) continuing to finish the training of the current period, judging whether the maximum training period is reached or not when the training is finished, and if the maximum training period is reached, terminating the training and turning to the step (4); otherwise, jumping to the step (2).
(2) Calculating the AUC of the current model by using the verification set, and if the AUC rises, carrying out next cycle training and skipping to the step (1); otherwise, recording the maximum decrease times of the AUC, and jumping to the step (3).
(3) Judging the maximum AUC (AUC) reduction times, if the times are more than 2, terminating the training, and skipping to the step (4); otherwise, jumping to the step (1).
(4) And terminating the model training and outputting the model result when the AUC of the verification set is the highest.
TABLE 3 AUC and Logloss results for click rate prediction on four datasets
Figure BDA0003913671440000101
Figure BDA0003913671440000111
Prediction accuracy analysis
Table 3 shows the overall performance of the different models on four different data sets. From the experimental results it can be observed that: 1) From the whole, compared with the most advanced models based on FM and deep networks, the HAFM model of the invention has great improvement on the indexes of AUC and Logloss, and achieves better performance on all four data sets. 2) Compared with the LR model, all the other feature-containing interaction models have better prediction accuracy, which shows the importance and effectiveness of feature interaction operation on CTR estimation. 3) Compared with an FM model, the NFM achieves performance improvement on both Criteo and Avazu data sets, but achieves opposite effects on small data sets MovieLens-1M and Book-Cross with a small number of features, and conversely achieves larger improvement on other methods for model interaction by utilizing DNN. This aspect illustrates that DNN is a contributing role for deep-level, high-order interactions and non-linear interactions of features, but simply stacking DNN after a second-order interaction of features seems to promote less and is no longer applicable for scenarios with a smaller number of features. On the other hand, it is shown that FM-based explicit second-order feature interaction is still more likely in scenarios with a smaller number of features. Therefore, the HAFM introduces DNN to carry out implicit high-order feature interaction, nonlinear interaction and explicit feature interaction on the features, and the click rate prediction accuracy can be effectively improved. 4) Compared with the FM model, the AFM and FFM models have improved performance except for the MovieLens-1M data set. This indicates that there is some improvement in performance whether feature attention is introduced or feature domain perceptual attention is introduced. This also illustrates that the attention mechanism has some promotion effect on the feature interaction of CTR prediction. The HAFM model introduces a hierarchical attention mechanism in linguistics, and the interaction of capturing features from low-order stratum to high-order stratum obtains remarkable improvement of effect. 5) Compared with other FM-based high-order feature interaction models and deep network advanced models, the HAFM model has the advantages of being remarkably superior to the other FM-based high-order feature interaction models and deep network advanced models. This also indicates the effectiveness of the hierarchical attention network proposed by the present invention.
Time efficiency analysis
FIGS. 3-6 show the run-time comparison of all models per iteration cycle (epoch) over four data sets. The abscissa of each column represents the current model, the ordinate represents the training time of one iteration cycle of the current model, and the darkened column represents the model HAFM provided by the invention. From the figure it can be observed that: 1) As can be seen from the entirety, the run time of the model on the Criteo and Avazu datasets is significantly higher than the MovieLens-1M and Book-Cross datasets due to the large number of features and samples in the first two datasets relative to the second two datasets, as shown in Table 2. 2) The time efficiency of LR is highest on all datasets, and FM is second order, since both models are relatively simple, FM adds second order feature interoperation on the basis of LR. Xdepfm is the least efficient model on Criteo and Avazu datasets, next to FFM. Since these two models have too many model parameters and computational operations. Each layer of CIN network calculation introduced by the xDepFM model needs Hadamard (Hadamard) product operation with all vectors of an input layer pairwise. The FFM model embeds independent expression vectors in different feature domains for each feature, and introduces too many parameters. On the MovieLens-1M and Book-Cross datasets, HAFM was the least efficient model. This is because for a small data set with a small number of features and samples, the number of feature fields is smaller than the feature embedding size, the computation overhead of the model occupies most of the runtime, and the HAFM model includes 3-layer attention computation, which reduces the efficiency of model operation, but it can be ignored for a large data set with a large number of features.
Through the test analysis, the implicit and explicit feature interaction from low order to high order is realized by utilizing a second-order interaction operator and a level attention mechanism, the accuracy, the running speed and other performances of the recommendation system are effectively improved, the recommendation system can be widely applied, and the active service quality of information is improved.
According to the recommendation method of the present invention, the present invention also provides a recommendation system, comprising:
an acquisition unit configured to acquire user instance information; for example, a background such as a mobile phone client or a computer client acquires data such as the age, sex, occupation, types and prices of favorite items, webpage clicking behavior, browsing behavior, comment behavior, praise behavior and the like of a user to obtain user instance information;
the presumption unit is configured to input the user instance information into a recommendation model trained according to the method of any one of claims 1-5, and calculate the probability of the user recommending the target item by using the trained recommendation model for the user instance information to be predicted;
and the pushing unit is configured to recommend the target item with the recommendation probability larger than or equal to a preset recommendation threshold value to the terminal of the user.
While embodiments of the invention have been disclosed above, it is not intended to be limited to the uses set forth in the specification and examples. It can be applied to all kinds of fields suitable for the present invention. Additional modifications will readily occur to those skilled in the art.

Claims (6)

1. The recommendation method for integrating the interaction between the hierarchical attention and the features is characterized by comprising the following steps of:
s1, collecting user instance information;
s2, according to the user instance information, carrying out interactive training recommendation models on implicit and explicit characteristics from a low order to a high order by utilizing a second-order interaction operator and a level attention mechanism;
s3, calculating the probability of recommending the target item for the user by using a trained recommendation model for the user instance information and the target item to be predicted;
and S4, recommending the target item with the recommendation probability larger than or equal to the preset recommendation threshold value to the user.
2. The method for recommending fused hierarchical attention and feature interaction of claim 1, wherein said user instance information comprises user attributes, item attributes, and user historical behavior attributes.
3. The recommendation method for fusing hierarchical attention and feature interaction according to claim 2, wherein the S2 comprises:
s2-1, initializing a parameter set of the recommendation model;
s2-2, carrying out sparse coding in a one-hot mode or a multi-hot mode on discrete attributes and discretized continuous attributes in the user instance information through a sparse representation layer of the recommendation model to obtain sparse representation of the user instance information;
s2-3, converting the sparse representation of the user instance information into a low-dimensional dense vector through an embedding layer of the recommendation model to obtain embedded representation of the user and the project;
s2-4, collecting a small batch of samples from the embedded representation obtained in the S2-3;
s2-5, calculating a prediction score of the item recommended by the user by using a small batch of samples through a second-order interaction operator and a forward process of a hierarchical attention network;
s2-6, calculating the predicted Loss based on cross entropy;
s2-7, optimizing cross entropy loss by adopting an Adam optimizer based on error back propagation, adjusting weight parameters of the recommendation model, and searching the learning rate between {0.0001,0.01 };
s2-8, performing loop from S2-4 to S2-7 until the number of iterations is equal to the number of specified iterations.
4. The method for recommending fusion level attention and feature interaction of claim 3, wherein said S2-5 comprises:
the hierarchical attention network is formed by stacking L attention layers, wherein the L (L =0,1, …, L) attention layer comprises:
x (0) =[v 1 ,v 2 ,…,v m ]
Figure FDA0003913671430000021
Figure FDA0003913671430000022
Figure FDA0003913671430000023
Figure FDA0003913671430000024
Figure FDA0003913671430000025
Figure FDA0003913671430000026
Figure FDA0003913671430000027
Figure FDA0003913671430000028
wherein the superscript (l) indicates that the current attention layer is positioned at the hierarchy number of l, x (0) =[v 1 ,v 2 ,…,v m ]Is the output of the embedding layer of the recommendation model, v i (i=1,2,…,m) Is the expression vector of the ith characteristic output by the embedding layer, m is the attribute number of the user instance,
Figure FDA0003913671430000029
embedding vector v corresponding to ith feature input into ith layer i The (d) dimension (k) of (c),
Figure FDA00039136714300000210
IS the k-dimension, IS, of the interactive feature vector of the l-th layer (l) Representing the interactive characteristic vector of the l-th layer, M is the layer number of the multilayer perceptron corresponding to the output module, W i (i =1,2, …, M) and b i (i =1,2, …, M) are the weight and offset, respectively, for the i-th layer of the multi-layer perceptron, and h and b are the weight and offset, respectively, of the linear transformation;
Figure FDA00039136714300000211
to input the embedding vector corresponding to the ith feature of the ith layer,
Figure FDA00039136714300000212
is the attention coefficient of the first layer of attention, W (l) And b (l) Respectively the connection weight and the offset of the ith layer in the depth model,
Figure FDA00039136714300000213
is the normalized attention coefficient, x, of the ith attention layer (l+1) And x (l) Respectively, input to the l +1 th layer and the l layer in the depth model, delta is a nonlinear excitation function sigmoid,
Figure FDA00039136714300000214
is the output of the i-th layer in a multi-layer feedforward network, o (l) Representing the output of the last layer in the multi-layer feedforward network, and T representing a vector transposition operator;
then, all the layer outputs of the hierarchical attention network are linearly integrated and the recommended prediction score of the item x is calculated by utilizing the softmax function
Figure FDA00039136714300000216
In particular O = [ O ] (0) ,o (1) ,…,o (L) ]And
Figure FDA00039136714300000215
wherein W p And b p Respectively, the weight and the bias of the linear transformation.
5. A recommendation method for fusing hierarchical attention and feature interaction according to claim 3 or 4, characterized in that the cross-entropy penalty in S2-6 is:
Figure FDA0003913671430000031
wherein, Θ is a parameter set of the recommendation model, N is a total number of training examples, and λ is a weight of the regularization term; y is i Is the training data label of whether the ith item was recommended,
Figure FDA0003913671430000032
is the probability that the model calculates that the ith item was recommended, i ranges from i =1,2 … … N.
6. A recommendation system, comprising:
an acquisition unit configured to acquire user instance information;
the presumption unit is configured to input the user instance information into a recommendation model trained according to the method of any one of claims 1-5, and calculate the probability of the user recommending the target item by using the trained recommendation model for the user instance information to be predicted;
and the pushing unit is configured to recommend the target item with the recommendation probability larger than or equal to a preset recommendation threshold value to the terminal of the user.
CN202211322085.6A 2022-10-28 2022-10-28 Recommendation method fusing hierarchical attention and feature interaction and application system thereof Pending CN115687757A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211322085.6A CN115687757A (en) 2022-10-28 2022-10-28 Recommendation method fusing hierarchical attention and feature interaction and application system thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211322085.6A CN115687757A (en) 2022-10-28 2022-10-28 Recommendation method fusing hierarchical attention and feature interaction and application system thereof

Publications (1)

Publication Number Publication Date
CN115687757A true CN115687757A (en) 2023-02-03

Family

ID=85099176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211322085.6A Pending CN115687757A (en) 2022-10-28 2022-10-28 Recommendation method fusing hierarchical attention and feature interaction and application system thereof

Country Status (1)

Country Link
CN (1) CN115687757A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117033948A (en) * 2023-10-08 2023-11-10 江西财经大学 Project recommendation method based on feature interaction information and time tensor decomposition

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117033948A (en) * 2023-10-08 2023-11-10 江西财经大学 Project recommendation method based on feature interaction information and time tensor decomposition
CN117033948B (en) * 2023-10-08 2024-01-09 江西财经大学 Project recommendation method based on feature interaction information and time tensor decomposition

Similar Documents

Publication Publication Date Title
La Gatta et al. Music recommendation via hypergraph embedding
Acilar et al. A collaborative filtering method based on artificial immune network
Luo et al. Personalized recommendation by matrix co-factorization with tags and time information
CN110781409B (en) Article recommendation method based on collaborative filtering
Sun et al. Self-attention network for session-based recommendation with streaming data input
CN111737578A (en) Recommendation method and system
Lv et al. AICF: Attention-based item collaborative filtering
Xian et al. Regnn: a repeat aware graph neural network for session-based recommendations
CN113918833A (en) Product recommendation method realized through graph convolution collaborative filtering of social network relationship
Aggarwal et al. Context-sensitive recommender systems
Zheng et al. Graph-convolved factorization machines for personalized recommendation
CN115687757A (en) Recommendation method fusing hierarchical attention and feature interaction and application system thereof
Nadee Modelling user profiles for recommender systems
Ibrahim et al. Improved Hybrid Deep Collaborative Filtering Approach for True Recommendations.
Xiao et al. LT4REC: A Lottery Ticket Hypothesis Based Multi-task Practice for Video Recommendation System
CN116757747A (en) Click rate prediction method based on behavior sequence and feature importance
Koniew Classification of the User's Intent Detection in Ecommerce systems-Survey and Recommendations.
Chen et al. Combine temporal information in session-based recommendation with graph neural networks
CN116738053A (en) Cross-domain news recommendation system and recommendation method based on text implication
CN114565436A (en) Vehicle model recommendation system, method, device and storage medium based on time sequence modeling
Shomalnasab et al. An optimal similarity measure for collaborative filtering using firefly algorithm
Rawat et al. An embedding-based deep learning approach for movie recommendation
Luo et al. User dynamic preference construction method based on behavior sequence
Chantamunee et al. Relation-aware collaborative autoencoder for personalized multiple facet selection
Osmanlı A singular value decomposition approach for recommendation systems

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination