CN110781409B - Article recommendation method based on collaborative filtering - Google Patents

Article recommendation method based on collaborative filtering

Info

Publication number
CN110781409B
CN110781409B
Authority
CN
China
Prior art keywords
attention
item
user
layer
recommendation
Prior art date
Legal status
Active
Application number
CN201911022328.2A
Other languages
Chinese (zh)
Other versions
CN110781409A (en)
Inventor
郑莹
吕艳霞
Current Assignee
Northeastern University Qinhuangdao Branch
Original Assignee
Northeastern University Qinhuangdao Branch
Priority date
Filing date
Publication date
Application filed by Northeastern University Qinhuangdao Branch filed Critical Northeastern University Qinhuangdao Branch
Priority to CN201911022328.2A priority Critical patent/CN110781409B/en
Publication of CN110781409A publication Critical patent/CN110781409A/en
Application granted granted Critical
Publication of CN110781409B publication Critical patent/CN110781409B/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9536Search customisation based on social or collaborative filtering

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an item recommendation method based on collaborative filtering, relating to the technical field of recommendation systems. A dedicated dynamic weight is introduced to better predict the preference of a user u for an item i; this dynamic weight is estimated with an attention mechanism. Recommendation performance is evaluated with recall and precision, improving the effectiveness and recommendation quality of the recommendation system. The attention mechanism is shown to help estimate the contribution of a user's historically interacted items to the representation of the user's preferences, making personalized recommendation more accurate. Attention scores are computed with pointwise attention and with self-attention, both with notable effect. In addition, the Transformer model is combined with the recommendation algorithm and compared with conventional embedding models, showing an improvement in recommendation effect.

Description

Article recommendation method based on collaborative filtering
Technical Field
The invention relates to the technical field of recommendation systems, in particular to an article recommendation method based on collaborative filtering.
Background
Collaborative Filtering (CF) is the earliest and best-known recommendation algorithm. Its main functions are prediction and recommendation; it has been studied in depth in academia and is widely applied in industry. The algorithm discovers user preferences by mining users' historical behavior data and recommends items of similar taste to users based on those preferences. Collaborative filtering recommendation algorithms fall into two main categories: User-based Collaborative Filtering (UserCF) and Item-based Collaborative Filtering (ItemCF). In brief: birds of a feather flock together, and people fall into groups of like minds. User-based collaborative filtering identifies, from historical behavior data, what a user likes about goods or content (such as purchasing, collecting, commenting on, or sharing), and measures and scores those preferences. It then computes relationships between users from their attitudes towards, and degrees of preference for, the same goods or content, and recommends goods among users with the same preferences. For example, if users A and B both purchased books x, y, and z and gave each a five-star review, then A and B belong to the same class, so a book w viewed by A can be recommended to user B. UserCF found application on some websites (e.g., Digg), but the algorithm has shortcomings. First, as the number of users of a website grows, computing the user interest similarity matrix becomes harder and harder: its time and space complexity grow roughly quadratically with the number of users. Second, user-based collaborative filtering makes it difficult to explain recommendation results. For these reasons, the well-known e-commerce company Amazon proposed another algorithm, item-based collaborative filtering.
An item-based collaborative filtering algorithm (ICF) recommends to users items similar to those they previously liked. For example, the algorithm may recommend a machine learning book because you purchased a data mining guide. However, the ICF algorithm does not compute item similarity from the content attributes of items; it computes similarity mainly by analyzing users' behavior records. ICF not only provides convincing explanations of its predictions in many recommendation scenarios, but also facilitates real-time personalization. In particular, the main computation, estimating the similarities between items, can be done offline, while the online recommendation module only needs to perform a series of lookups on similar items, which is easily done in real time.
The earliest item-based collaborative filtering method, ItemCF, decides whether to add a target item to a user's recommendation list by computing the similarity between the items the user has interacted with in the past and the current target item. That is, the predicted score of user u for a particular item i is the sum, over all items j that u has interacted with, of the similarity $s_{ij}$ between item j and item i multiplied by u's score $r_{uj}$ for j. The calculation formula is as follows:

$$\hat{r}_{ui} = \sum_{j \in R_u} s_{ij}\, r_{uj}$$

where $R_u$ denotes the set of items user u has interacted with.
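As a concrete illustration, a minimal numpy sketch of this prediction rule follows; the dense matrix layout and the function name are illustrative assumptions, not part of the patent.

```python
import numpy as np

def itemcf_score(sim, ratings, u, i):
    """Classic ItemCF prediction: sum of s_ij * r_uj over the items j that
    user u has interacted with. `sim` is a precomputed item-item similarity
    matrix; `ratings` is a dense user-item matrix with 0 for missing entries."""
    interacted = np.nonzero(ratings[u])[0]       # items j with r_uj > 0
    interacted = interacted[interacted != i]     # exclude the target item itself
    return float(sim[i, interacted] @ ratings[u, interacted])
```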
Early ItemCF methods used statistical measures, such as the Pearson coefficient and cosine similarity, to compute the similarity between a user's historical items and the target item. This approach is simple, but such heuristic estimates of item similarity lack any optimization tailored to the recommendation task and can therefore produce suboptimal performance. Moreover, under sparse data, cosine similarity effectively treats a user's unrated items as 0, and the set of items co-rated by users (needed for the Pearson coefficient) may be small. These methods therefore need adaptation and optimization to fit different data sets to the recommendation task. With the development of machine learning, a learning-based approach called SLIM was proposed. It customizes a recommendation objective function to learn adaptive item-item similarities directly from data; that is, it minimizes the loss between the original user-item interaction matrix and the interaction matrix reconstructed by the item-based CF model. Although SLIM can achieve better recommendation accuracy, it has two inherent limitations. First, offline training can be very time-consuming on large-scale data, since learning the similarity matrix S directly has time complexity on the order of O(|I|²). Second, it can only estimate the similarity between two items that have been purchased or rated together; it cannot estimate the similarity between unrelated items and therefore cannot capture transitive relationships between items. In actual recommendation tasks, particularly when data is sparse, SLIM's recommendation quality degrades.
FISM addresses these limitations well. It represents items as low-dimensional embedding vectors, so that the similarity $s_{ij}$ between items is parameterized as the inner product of the embedding vectors of items i and j. As the numbers of users and items grow and the interaction matrix becomes sparse, the effectiveness of existing Top-K recommendation methods declines; the FISM algorithm therefore provides an item-based method for generating Top-K recommendations, in which the item similarity matrix is learned as the product of two low-dimensional latent factor matrices. A full set of experiments on multiple data sets at several sparsity levels shows that the method proposed in the FISM algorithm handles sparse data sets efficiently. For this reason, FISM's recommendation accuracy is superior to that of other popular Top-K recommendation algorithms, and its advantage grows as the data set becomes sparser. Despite this superior performance, the assumption that all of a user's historically interacted items contribute equally to the representation of the user's preferences is clearly unreasonable. For example, basketballs and everyday household items should not carry the same weight when recommending basketball shoes. Therefore, a dedicated dynamic weight is introduced to better predict the preference of user u for item i, and this dynamic weight is estimated with an attention mechanism.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an article recommendation method based on collaborative filtering.
An article recommendation method based on collaborative filtering comprises the following steps:
Step 1: compute the prediction score of user u for target item i. Using one-hot coding, obtain the embedding vectors p and q through an embedding layer, where p denotes that an item plays the role of predicted item and q that it plays the role of historical interacted item, and obtain the item's predicted score. The attention-based ItemCF formula is defined as follows:

$$\hat{r}_{ui} = \frac{1}{|R_u \setminus \{i\}|^{\alpha}} \sum_{j \in R_u \setminus \{i\}} a_{ij}\, \mathbf{p}_i^{\mathsf{T}} \mathbf{q}_j$$

$$a_{ij} = f(\mathbf{p}_i, \mathbf{q}_j)$$

where i is the predicted target item, j a historically interacted item of the user, $a_{ij}$ the weight, computed by an attention network, of historical item j's contribution to the representation of the user's preference, $\mathbf{p}_i$ and $\mathbf{q}_j$ the embedding vectors of the predicted item and of the user-interacted items respectively, $R_u$ the positive-example set of user u, $R_u \setminus \{i\}$ that set with item i removed, and $\frac{1}{|R_u \setminus \{i\}|^{\alpha}}$ a normalization coefficient;
Step 1.1: concatenate the embedding vector $\mathbf{p}_i$ of the predicted item and the embedding vector $\mathbf{q}_j$ of a user-interacted item to obtain the concatenation vector

$$\mathbf{c} = [\mathbf{p}_i \,;\, \mathbf{q}_j]$$

The concatenation vector serves as the input of the pointwise attention model; this first attempt at an attention mechanism is named Dot;
Step 1.1.1: apply three independent linear transformations to the concatenation vector c, with coefficient matrices $W_Q$, $W_K$, and $W_V$ respectively, obtaining the attention network inputs Query, Key, and Value (Q, K, V);
Step 1.1.2: compute the dot product of Q with the transpose of K using a highly optimized matrix multiplication, apply softmax, then multiply by V to obtain the weight matrix. Expressing the attention function as Attention(Q, K, V), the calculation formula is as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\mathsf{T}}}{\sqrt{d_k}}\right)V$$

where $d_k$ denotes the dimension of K and the softmax function converts the values into a probability distribution; if Q, K, and V have the same dimensions, the output attention weight matrix has those same dimensions;
Step 1.2: feed the concatenation vector $\mathbf{c} = [\mathbf{p}_i \,;\, \mathbf{q}_j]$ into the network as input, repeat the preceding single dot-product attention h times, concatenate the h result matrices, and finally convert the result to the required dimensionality through a linear transformation. That is, the attention function is set as a self-attention model to compute the weight of historical item j's contribution to user u's predicted score for target item i; this variant is named Self;
Step 1.3: use the main framework of the Transformer model, which divides into an encoder module and a decoder module. The input of the first sub-module of the encoder module is the embedding vector $\mathbf{p}_i$ of the target item to be predicted; the input of each remaining sub-module is the output of the previous one. Each encoder sub-module consists of two layers: the first is a self-attention model layer and the second a feed-forward layer. After the attention operation, both the encoder and the decoder contain a fully connected forward network comprising two linear transformations and a ReLU activation, with the formula:

$$\mathrm{FFN}(x) = \max(0,\, xW_1 + b_1)\,W_2 + b_2$$

The input of the first sub-module of the decoder module is the set $\mathbf{q}_j$ of the user's historically interacted items; the input of each remaining sub-module is the output of the previous one. Each decoder sub-module consists of three layers: the first and second are self-attention layers, except that the second layer's input Q is the output of the previous layer while K and V are the encoder outputs; the third is a feed-forward layer. An "Add & Normalize" layer is added after each layer to prevent gradient vanishing or explosion while also preventing overfitting. The output of the model is converted to the required size by a fully connected layer and a softmax function to obtain the attention weight $a_{ij}$ for the subsequent work; this model is defined as Trans;
Step 1.4: customize an objective function. Treat observed user-item interactions as positive examples and draw negative examples from the remaining unobserved interactions, with $R^{+}$ and $R^{-}$ denoting the positive and negative example sets. Use the log loss as the loss term and penalize the embedding vectors and the coefficient and bias terms of each network with the L2 norm. The loss function is then:

$$L = -\frac{1}{N}\left[\sum_{(u,i) \in R^{+}} \log \sigma(\hat{r}_{ui}) + \sum_{(u,j) \in R^{-}} \log\left(1 - \sigma(\hat{r}_{uj})\right)\right] + \lambda \lVert \Theta \rVert^{2}$$

where N is the total number of training examples, σ is the sigmoid function converting predicted values into probability values, the hyperparameter λ controls the strength of the L2 penalty used to prevent overfitting, and $\Theta = \{\{\mathbf{p}_i\}, \{\mathbf{q}_j\}, W, b, h\}$ denotes all trainable parameters, where W, b, h and all parameters of the linear transformations carry the regularization penalty. A variant of stochastic gradient descent called Adagrad is used to optimize the objective function; it applies an adaptive learning rate to each parameter, draws random samples from the training examples, and updates the relevant parameters in the negative direction of the gradient. A mini-batch method randomly picks a user and then uses all of that user's interacted items as one small batch.
Step 2: conduct experiments on real item data sets using the evaluation indices, judge performance from the recommendation results, and compare the experimental results with other recommendation methods.
The invention has the beneficial effects that:
the method applies a machine translation attention mechanism transformer in natural language processing to a recommendation model, performs experiments on the method provided by the invention on two real data sets of a movie and a picture, and evaluates by using two common recommendation model evaluation indexes of recall ratio and precision ratio. Based on the recall ratio, the method realizes the improvement of 3.2 percent relatively, and based on the precision ratio, the method realizes the improvement of 4.3 percent relatively, so the method can generate a more accurate personalized recommendation list for the user. The efficient recommendation system can provide an efficient and intelligent information filtering technology for the user under the condition that the user lacks experience in related fields or cannot process massive data, explores potential consumption tendency of the user, and provides personalized services for numerous users. Through recommending articles to the user accurately, the interest of the user can be improved, the browsing amount of the website, the click rate and the purchase rate are improved, and great convenience is brought to the life and leisure of the user while income is brought to the website. The better recommendation method can bring business value to the enterprise entity, optimize sales boundary and profit, help the product to expand the boundary, provide more various and more intimate experience through scene construction, and finally improve profit and the like.
Drawings
FIG. 1 is the basic framework of the attention-based ItemCF model;
FIG. 2 is the structure of the pointwise attention model;
FIG. 3 is the basic framework of the Transformer model;
FIG. 4 compares the performance of the models FISM, Dot, Self, and Trans at an embedding size of 16;
panel (a) shows ML-1M HR, panel (b) ML-1M NDCG, panel (c) Pinterest-20 HR, and panel (d) Pinterest-20 NDCG.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments. The specific embodiments described herein merely illustrate the invention and are not intended to limit it.
An item recommendation method based on collaborative filtering comprises the following steps:
Step 1: as shown in FIG. 1, user u is represented by multi-hot coding, i.e., by all items the user has interacted with under implicit feedback. The user's multi-hot code passes through the embedding layer and generates a set of vectors, each representing a historical item associated with the user; the target item to be predicted obtains its embedding vector through the embedding layer using one-hot coding. To compute the prediction score of user u for target item i, obtain the embedding vectors p and q through the embedding layer using one-hot coding, where p denotes that an item plays the role of predicted item and q that it plays the role of historical interacted item. As shown in FIG. 1, the attention-based ItemCF formula is defined as follows:

$$\hat{r}_{ui} = \frac{1}{|R_u \setminus \{i\}|^{\alpha}} \sum_{j \in R_u \setminus \{i\}} a_{ij}\, \mathbf{p}_i^{\mathsf{T}} \mathbf{q}_j$$

$$a_{ij} = f(\mathbf{p}_i, \mathbf{q}_j)$$

where i is the predicted target item, j a historically interacted item of the user, $a_{ij}$ the weight, computed by an attention network, of historical item j's contribution to the representation of the user's preference, $\mathbf{p}_i$ and $\mathbf{q}_j$ the embedding vectors of the predicted item and of the user-interacted items respectively, $R_u$ the positive-example set of user u, $R_u \setminus \{i\}$ that set with item i removed, and $\frac{1}{|R_u \setminus \{i\}|^{\alpha}}$ a normalization coefficient;
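The following minimal Python sketch illustrates this prediction rule. The attention function f is left abstract, and the FISM-style exponent α on the normalization coefficient is an assumption for illustration, since the text only calls it "a coefficient".

```python
import numpy as np

def attentive_itemcf_score(p, q, attention, u_items, i, alpha=0.5):
    """Attention-based ItemCF prediction: a normalized, attention-weighted
    sum of inner products between the target item's embedding p[i] and the
    embeddings q[j] of the user's historical items."""
    js = [j for j in u_items if j != i]          # R_u \ {i}
    if not js:
        return 0.0
    coeff = len(js) ** (-alpha)                  # normalization coefficient
    return coeff * sum(attention(p[i], q[j]) * float(p[i] @ q[j]) for j in js)
```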
Step 1.1: concatenate the embedding vector $\mathbf{p}_i$ of the predicted item and the embedding vector $\mathbf{q}_j$ of a user-interacted item to obtain the concatenation vector $\mathbf{c} = [\mathbf{p}_i \,;\, \mathbf{q}_j]$, from which the interaction weights are learned. The concatenation vector serves as the input of the pointwise attention model, as shown in FIG. 2; this first attempt at an attention mechanism is named Dot;
Step 1.1.1: apply three independent linear transformations to the concatenation vector c, with coefficient matrices $W_Q$, $W_K$, and $W_V$ respectively, obtaining the attention network inputs Query, Key, and Value (Q, K, V);
Step 1.1.2: compute the dot product of Q with the transpose of K using a highly optimized matrix multiplication; dot products are faster and more space-efficient and lend themselves to such optimized implementations. The product is scaled by the factor $\frac{1}{\sqrt{d_k}}$ so that the inner products do not grow too large: otherwise the softmax outputs saturate towards 0 or 1, causing gradient vanishing or explosion, whereas scaling keeps the values in a region where the gradient is large. The softmax function converts the values into a probability distribution, which is friendly to gradient computation. After softmax, multiply by V to obtain the weight matrix. Expressing the attention function as Attention(Q, K, V), the calculation formula is as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\mathsf{T}}}{\sqrt{d_k}}\right)V$$

where $d_k$ denotes the dimension of K; if Q, K, and V have the same dimensions, the output attention weight matrix has those same dimensions;
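A minimal numpy sketch of this scaled dot-product attention follows; the max-subtraction is a standard numerical-stability detail, not part of the formula:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # scaled similarity logits
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V
```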
Step 1.2: feed the concatenation vector $\mathbf{c} = [\mathbf{p}_i \,;\, \mathbf{q}_j]$ into the network as input, repeat the preceding single dot-product attention h times, concatenate the h result matrices, and finally convert the result to the required dimensionality through a linear transformation. That is, the attention function is set as a self-attention model to compute the weight of historical item j's contribution to user u's predicted score for target item i; this variant is named Self.
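A sketch of this h-head construction, reusing the scaled_dot_product_attention function above; slicing shared projection matrices into per-head blocks is an assumption made for brevity:

```python
import numpy as np

def multi_head_self_attention(c, W_Q, W_K, W_V, W_O, h):
    """Run dot-product attention h times on inputs c, concatenate the h
    results, and map them to the required output dimension with W_O.
    Shapes: c (n, d); W_Q/W_K/W_V (d, d) with d divisible by h; W_O (d, d_out)."""
    d = c.shape[-1]
    d_h = d // h
    heads = []
    for k in range(h):
        block = slice(k * d_h, (k + 1) * d_h)        # this head's projection slice
        Q, K, V = c @ W_Q[:, block], c @ W_K[:, block], c @ W_V[:, block]
        heads.append(scaled_dot_product_attention(Q, K, V))
    return np.concatenate(heads, axis=-1) @ W_O      # final linear transformation
```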
Step 1.3: the main framework of the Transformer model, shown in FIG. 3, divides into an encoder module and a decoder module. The input of the first sub-module of the encoder module is the embedding vector $\mathbf{p}_i$ of the target item to be predicted; the input of each remaining sub-module is the output of the previous one. Each encoder sub-module consists of two layers: the first is a self-attention model layer and the second a feed-forward layer. After the attention operation, both the encoder and the decoder contain a fully connected forward network comprising two linear transformations and a ReLU activation, with the formula:

$$\mathrm{FFN}(x) = \max(0,\, xW_1 + b_1)\,W_2 + b_2$$

The input of the first sub-module of the decoder module is the set $\mathbf{q}_j$ of the user's historically interacted items, which greatly enhances the model's interpretability. The input of each remaining sub-module is the output of the previous one. Each decoder sub-module consists of three layers: the first and second are self-attention layers, except that the second layer's input Q is the output of the previous layer while K and V are the encoder outputs; the third is a feed-forward layer. An "Add & Normalize" layer is added after each layer to prevent gradient vanishing or explosion while also preventing overfitting. The output of the model is converted to the required size by a fully connected layer and a softmax function to obtain the attention weight $a_{ij}$ for the subsequent work; this model is defined as Trans;
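The two building blocks named here, the feed-forward network and the "Add & Normalize" step, can be sketched as follows; reading "Add & Normalize" as a residual connection plus layer normalization follows the standard Transformer design the text references:

```python
import numpy as np

def feed_forward(x, W1, b1, W2, b2):
    """FFN(x) = max(0, x W1 + b1) W2 + b2: two linear transformations
    around a ReLU activation."""
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

def add_and_normalize(x, sublayer_out, eps=1e-6):
    """Residual connection followed by layer normalization, applied after
    each attention or feed-forward layer."""
    y = x + sublayer_out                             # "Add": residual connection
    mean = y.mean(axis=-1, keepdims=True)
    std = y.std(axis=-1, keepdims=True)
    return (y - mean) / (std + eps)                  # "Normalize"
```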
Step 1.4: customize an objective function. Treat observed user-item interactions as positive examples and draw negative examples from the remaining unobserved interactions, with $R^{+}$ and $R^{-}$ denoting the positive and negative example sets. Use the cross-entropy loss as the objective and penalize the embedding vectors and the coefficient and bias terms of each network with the L2 norm. The objective function is then:

$$L = -\frac{1}{N}\left[\sum_{(u,i) \in R^{+}} \log \sigma(\hat{r}_{ui}) + \sum_{(u,j) \in R^{-}} \log\left(1 - \sigma(\hat{r}_{uj})\right)\right] + \lambda \lVert \Theta \rVert^{2}$$

where N is the total number of training examples, σ is the sigmoid function converting a predicted value into a probability representing the likelihood that user u interacts with item i, the hyperparameter λ controls the strength of the L2 penalty used to prevent overfitting, and $\Theta = \{\{\mathbf{p}_i\}, \{\mathbf{q}_j\}, W, b, h\}$ denotes all trainable parameters, where W, b, h and all parameters of the linear transformations carry the regularization penalty. A variant of stochastic gradient descent called Adagrad is used to optimize the objective function; it applies an adaptive learning rate to each parameter, draws random samples from the training examples, and updates the relevant parameters in the negative direction of the gradient.
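A numpy sketch of this objective on a batch of scored examples; the helper names and the batching interface are illustrative assumptions:

```python
import numpy as np

def bce_l2_loss(r_hat_pos, r_hat_neg, params, lam):
    """Log loss over positive predictions (R+) and sampled negative
    predictions (R-), plus a lambda-weighted L2 penalty on all trainable
    parameter arrays."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    n = len(r_hat_pos) + len(r_hat_neg)              # total training examples N
    log_loss = -(np.sum(np.log(sigmoid(r_hat_pos)))
                 + np.sum(np.log(1.0 - sigmoid(r_hat_neg)))) / n
    l2 = lam * sum(float(np.sum(w ** 2)) for w in params)
    return log_loss + l2
```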
The present embodiment implements all models using TensorFlow, which requires that all training instances in a batch have the same length; since some active users may have interacted with thousands of items, a sampled mini-batch training set can still be very large. To solve this problem, this embodiment uses a mini-batch method that randomly picks one user and then uses all of that user's interacted items as one small batch, instead of randomly drawing a fixed number of training examples as the mini-batch training set. This approach has two advantages: 1) no masking trick is needed, so it is faster; 2) no batch size needs to be specified, which avoids tuning the batch size. If the attention network and the item embedding vectors are trained simultaneously, the output of the attention network changes the item embeddings, so joint training easily causes co-adaptation and slows convergence. To solve this practical training problem, this embodiment pre-trains the model with the FISM algorithm proposed by Kabbur et al., initializing it with the item embedding vectors learned by FISM. Since the FISM algorithm has no co-adaptation problem, it can learn the similarity of the embedded items well. Initializing the model with FISM therefore greatly assists the learning of the attention network, yielding better performance and fast convergence.
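A sketch of this per-user batching strategy; the data-structure names and the negative-sampling ratio are assumptions for illustration:

```python
import random

def sample_user_minibatch(user_pos_items, all_items, neg_ratio=4):
    """Pick one user at random and use all of that user's interacted items
    as the batch (no padding or masking needed), pairing the positives with
    sampled unobserved items as negatives."""
    u = random.choice(list(user_pos_items))          # user_pos_items: dict user -> set
    positives = list(user_pos_items[u])
    candidates = list(all_items - user_pos_items[u])
    k = min(len(candidates), neg_ratio * len(positives))
    negatives = random.sample(candidates, k)
    return u, positives, negatives
```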
Step 2: conduct experiments on real item data sets using the evaluation indices, judge performance from the recommendation results, and compare the experimental results with other recommendation methods.
This embodiment assigns a weight to each item in the user's interaction history, so that when predicting the user's score for a target item, the set of historical items represents the user's preferences more accurately, improving the recommendation effect and making personalized recommendation more precise. These improvements are attributed to an effective attention mechanism introduced to distinguish the importance of historical items in the user representation. We performed comprehensive experiments on two real item data sets, ML-1M and Pinterest-20, using the evaluation indices HR and NDCG to evaluate Top-K recommendation. Performance is evaluated by the Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG) of the first 10 positions of the recommendation results. These two indicators are widely used for evaluating Top-K recommendation and in the information retrieval literature. HR@10 can be interpreted as a recall-based metric, the percentage of users served successfully (i.e., the positive example appears in the top 10), while NDCG@10 additionally accounts for the predicted position of the positive example; for both metrics, larger is better.
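A sketch of the two metrics under the leave-one-out protocol implied above, where each user holds out one positive item:

```python
import numpy as np

def hr_ndcg_at_k(ranked_items, positive_item, k=10):
    """HR@k is 1 when the held-out positive appears in the top k of the
    ranked list; NDCG@k discounts that hit by the logarithm of its rank."""
    topk = list(ranked_items[:k])
    if positive_item in topk:
        rank = topk.index(positive_item)             # 0-based position of the hit
        return 1.0, 1.0 / np.log2(rank + 2)          # (HR@k, NDCG@k)
    return 0.0, 0.0
```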
We compared the experimental results with several popular recommendation methods. For the embedding-based methods (MF, MLP, FISM, and the models herein), the embedding size controls modeling capacity, so we set it to 16 for all methods. As shown in Table 1, the attention-based models all achieve better results, and their final results are similar: they obtain the highest NDCG and HR scores on all data sets. On the ML-1M data set, all three models improve on FISM to some degree; the Self model improves on FISM by 3.1% in HR and 4.3% in NDCG. This may be because a model of relatively simple structure captures user features more fully on a less sparse data set, characterizing user preferences well. On Pinterest-20, Trans is better than the other two: it reaches the highest score and improves on FISM by 3.2% in NDCG, probably because a deeper network captures sparse data better. The learning-based collaborative filtering methods generally perform much better than heuristic ones such as Pop and ItemKNN; in particular, FISM scores much higher than ItemKNN. Considering that both methods use the same predictive model and differ only in how item similarity is estimated, the positive impact of customized optimization on recommendation is clearly visible.
Table 1 Comparison of experimental results
As shown in FIG. 4, with an item embedding size of 16, the performance of FISM and of the proposed Dot, Self, and Trans is tracked at each epoch. The three proposed models reach the highest HR and NDCG scores on both data sets, attain the same performance level, and achieve a significant improvement over FISM. We attribute these advantages to the efficient design of the attention network when learning item-to-item interactions. Even at the first epoch, our models already clearly exceed FISM, and as training proceeds the experimental results keep improving until convergence.
Based on the above discussion, this work on item-based collaborative filtering incorporates various attention models to improve the learning of the dynamic weight coefficients, implements and evaluates them, and achieves better results than the compared models. The main contributions are: (1) demonstrating that the attention mechanism helps capture the dynamic weight of a new item's contribution to the similarity computed against the set of historical items the user has interacted with, making personalized recommendation more accurate; (2) computing attention scores with both pointwise attention and self-attention, with good results; (3) combining the Transformer model with the recommendation algorithm and comparing it with conventional embedding models, showing an improvement in recommendation effect.
Finally, it should be noted that the above examples are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions as defined in the appended claims.

Claims (1)

1. An article recommendation method based on collaborative filtering, characterized in that the method comprises the following steps:

Step 1: compute the prediction score of user u for target item i. Using one-hot coding, obtain the embedding vectors p and q through an embedding layer, where p denotes that an item plays the role of predicted item and q that it plays the role of historical interacted item, and obtain the item's predicted score. The attention-based ItemCF formula is defined as follows:

$$\hat{r}_{ui} = \frac{1}{|R_u \setminus \{i\}|^{\alpha}} \sum_{j \in R_u \setminus \{i\}} a_{ij}\, \mathbf{p}_i^{\mathsf{T}} \mathbf{q}_j$$

$$a_{ij} = f(\mathbf{p}_i, \mathbf{q}_j)$$

where i is the predicted target item, j a historically interacted item of the user, $a_{ij}$ the weight, computed by an attention network, of historical item j's contribution to the representation of the user's preference, $\mathbf{p}_i$ and $\mathbf{q}_j$ the embedding vectors of the predicted item and of the user-interacted items respectively, $R_u$ the positive-example set of user u, $R_u \setminus \{i\}$ that set with item i removed, and $\frac{1}{|R_u \setminus \{i\}|^{\alpha}}$ a normalization coefficient;
Step 1.1: concatenate the embedding vector $\mathbf{p}_i$ of the predicted item and the embedding vector $\mathbf{q}_j$ of a user-interacted item to obtain the concatenation vector $\mathbf{c} = [\mathbf{p}_i \,;\, \mathbf{q}_j]$. The concatenation vector serves as the input of the pointwise attention model; this first attempt at an attention mechanism is named Dot;

Step 1.1.1: apply three independent linear transformations to the concatenation vector c, with coefficient matrices $W_Q$, $W_K$, and $W_V$ respectively, obtaining the attention network inputs Query, Key, and Value (Q, K, V);

Step 1.1.2: compute the dot product of Q with the transpose of K using a highly optimized matrix multiplication, apply softmax, then multiply by V to obtain the weight matrix. Expressing the attention function as Attention(Q, K, V), the calculation formula is as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\mathsf{T}}}{\sqrt{d_k}}\right)V$$

where $d_k$ denotes the dimension of K and the softmax function converts the values into a probability distribution; if Q, K, and V have the same dimensions, the output attention weight matrix has those same dimensions;
Step 1.2: feed the concatenation vector $\mathbf{c} = [\mathbf{p}_i \,;\, \mathbf{q}_j]$ into the network as input, repeat the preceding single dot-product attention h times, concatenate the h result matrices, and finally convert the result to the required dimensionality through a linear transformation; that is, the attention function is set as a self-attention model to compute the weight of historical item j's contribution to user u's predicted score for target item i, named Self;

Step 1.3: use the main framework of the Transformer model, which divides into an encoder module and a decoder module. The input of the first sub-module of the encoder module is the embedding vector $\mathbf{p}_i$ of the target item to be predicted; the input of each remaining sub-module is the output of the previous one. Each encoder sub-module consists of two layers: the first is a self-attention model layer and the second a feed-forward layer. After the attention operation, both the encoder and the decoder contain a fully connected forward network comprising two linear transformations and a ReLU activation, with the formula:

$$\mathrm{FFN}(x) = \max(0,\, xW_1 + b_1)\,W_2 + b_2$$

The input of the first sub-module of the decoder module is the set $\mathbf{q}_j$ of the user's historically interacted items; the input of each remaining sub-module is the output of the previous one. Each decoder sub-module consists of three layers: the first and second are self-attention layers, except that the second layer's input Q is the output of the previous layer while K and V are the encoder outputs; the third is a feed-forward layer. An "Add & Normalize" layer is added after each layer to prevent gradient vanishing or explosion while also preventing overfitting. The output of the model is converted to the required size by a fully connected layer and a softmax function to obtain the attention weight $a_{ij}$ for the subsequent work; this model is defined as Trans;
Step 1.4: customize an objective function. Treat observed user-item interactions as positive examples and draw negative examples from the remaining unobserved interactions, with $R^{+}$ and $R^{-}$ denoting the positive and negative example sets. Use the log loss as the loss term and penalize the embedding vectors and the coefficient and bias terms of each network with the L2 norm. The loss function is then:

$$L = -\frac{1}{N}\left[\sum_{(u,i) \in R^{+}} \log \sigma(\hat{r}_{ui}) + \sum_{(u,j) \in R^{-}} \log\left(1 - \sigma(\hat{r}_{uj})\right)\right] + \lambda \lVert \Theta \rVert^{2}$$

where N is the total number of training examples, σ is the sigmoid function converting predicted values into probability values, the hyperparameter λ controls the strength of the L2 penalty used to prevent overfitting, and $\Theta = \{\{\mathbf{p}_i\}, \{\mathbf{q}_j\}, W, b, h\}$ denotes all trainable parameters, where W, b, h and all parameters of the linear transformations carry the regularization penalty. A variant of stochastic gradient descent called Adagrad is used to optimize the objective function; it applies an adaptive learning rate to each parameter, draws random samples from the training examples, and updates the relevant parameters in the negative direction of the gradient. A mini-batch method randomly picks a user and uses all of that user's interacted items as one small batch;

Step 2: conduct experiments on real item data sets using the evaluation indices, judge performance from the recommendation results, and compare the experimental results with other recommendation methods.
CN201911022328.2A 2019-10-25 2019-10-25 Article recommendation method based on collaborative filtering Active CN110781409B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911022328.2A CN110781409B (en) 2019-10-25 2019-10-25 Article recommendation method based on collaborative filtering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911022328.2A CN110781409B (en) 2019-10-25 2019-10-25 Article recommendation method based on collaborative filtering

Publications (2)

Publication Number Publication Date
CN110781409A CN110781409A (en) 2020-02-11
CN110781409B true CN110781409B (en) 2022-02-01

Family

ID=69388037

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911022328.2A Active CN110781409B (en) 2019-10-25 2019-10-25 Article recommendation method based on collaborative filtering

Country Status (1)

Country Link
CN (1) CN110781409B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737573A (en) * 2020-06-17 2020-10-02 北京三快在线科技有限公司 Resource recommendation method, device, equipment and storage medium
CN112182156B (en) * 2020-09-28 2023-02-07 齐鲁工业大学 Aspect-level interpretable deep network scoring prediction recommendation method based on text processing
CN112328908B (en) * 2020-11-11 2022-10-28 北京工业大学 Personalized recommendation method based on collaborative filtering
CN112529414B (en) * 2020-12-11 2023-08-01 西安电子科技大学 Article scoring method based on multi-task neural collaborative filtering network
CN112784153B (en) * 2020-12-31 2022-09-20 山西大学 Tourist attraction recommendation method integrating attribute feature attention and heterogeneous type information
CN113158024B (en) * 2021-02-26 2022-07-15 中国科学技术大学 Causal reasoning method for correcting popularity deviation of recommendation system
CN112967101B (en) * 2021-04-07 2023-04-07 重庆大学 Collaborative filtering article recommendation method based on multi-interaction information of social users
CN115712828A (en) * 2021-08-18 2023-02-24 华为技术有限公司 Image classification method and related equipment thereof

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018212710A1 (en) * 2017-05-19 2018-11-22 National University Of Singapore Predictive analysis methods and systems
CN109087130A (en) * 2018-07-17 2018-12-25 深圳先进技术研究院 A kind of recommender system and recommended method based on attention mechanism
CN109299396A (en) * 2018-11-28 2019-02-01 东北师范大学 Merge the convolutional neural networks collaborative filtering recommending method and system of attention model

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018212710A1 (en) * 2017-05-19 2018-11-22 National University Of Singapore Predictive analysis methods and systems
CN109087130A (en) * 2018-07-17 2018-12-25 深圳先进技术研究院 A kind of recommender system and recommended method based on attention mechanism
CN109299396A (en) * 2018-11-28 2019-02-01 东北师范大学 Merge the convolutional neural networks collaborative filtering recommending method and system of attention model

Also Published As

Publication number Publication date
CN110781409A (en) 2020-02-11

Similar Documents

Publication Publication Date Title
CN110781409B (en) Article recommendation method based on collaborative filtering
Huang et al. A deep reinforcement learning based long-term recommender system
CN109299396B (en) Convolutional neural network collaborative filtering recommendation method and system fusing attention model
CN111538912B (en) Content recommendation method, device, equipment and readable storage medium
CN110717098B (en) Meta-path-based context-aware user modeling method and sequence recommendation method
CN111797321B (en) Personalized knowledge recommendation method and system for different scenes
Karatzoglou et al. Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering
CN111737578B (en) Recommendation method and system
CA2634020A1 (en) System and method for multi-level online learning
CN112364976A (en) User preference prediction method based on session recommendation system
CN114202061A (en) Article recommendation method, electronic device and medium based on generation of confrontation network model and deep reinforcement learning
CN111581520A (en) Item recommendation method and system based on item importance in session
Chen et al. Generative inverse deep reinforcement learning for online recommendation
CN114693397A (en) Multi-view multi-modal commodity recommendation method based on attention neural network
CN110727872A (en) Method and device for mining ambiguous selection behavior based on implicit feedback
Chen et al. Session-based recommendation: Learning multi-dimension interests via a multi-head attention graph neural network
Wang et al. Modeling uncertainty to improve personalized recommendations via Bayesian deep learning
CN116228368A (en) Advertisement click rate prediction method based on deep multi-behavior network
CN117216281A (en) Knowledge graph-based user interest diffusion recommendation method and system
CN113763031A (en) Commodity recommendation method and device, electronic equipment and storage medium
Gasmi et al. Context-aware based evolutionary collaborative filtering algorithm
CN110851705A (en) Project-based collaborative storage recommendation method and recommendation device thereof
CN115687757A (en) Recommendation method fusing hierarchical attention and feature interaction and application system thereof
CN110956528B (en) Recommendation method and system for e-commerce platform
Ren et al. A hybrid recommender approach based on widrow-hoff learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant