CN108363804A - Local model weighted fusion Top-N movie recommendation method based on user clustering
- Publication number
- CN108363804A (application CN201810169922.3A)
- Authority
- CN
- China
- Prior art keywords
- user
- model
- film
- document
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/73—Querying
- G06F16/735—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Multimedia (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Technical field
The invention relates to a method for recommending movies over a network.
Background
With the rapid development of information technology and social networks, the volume of data generated on the Internet has grown exponentially, ushering in the era of big data. As data volumes grow, it becomes increasingly difficult for people to find the information they actually want. This is where recommender systems are most valuable. Drawing on user profiles, item information, and users' historical behavior, a recommendation algorithm can predict user preferences and recommend items a user is likely to be interested in, which greatly reduces the cost of finding target information.
Recommendation algorithms can be divided into content-based recommendation and collaborative filtering. Modern recommender systems have two main tasks: rating prediction and Top-N recommendation, the latter being the most widely used in real business scenarios. A Top-N recommendation algorithm presents the user with a ranked list of N items from which to choose. Top-N recommendation models fall into two main types: neighborhood-based collaborative filtering and model-based collaborative filtering. The former can be subdivided into user-based neighborhood models (UserKNN) and item-based neighborhood models (ItemKNN); the latter is typified by latent factor models.
As the saying goes, "birds of a feather flock together": different user groups tend to develop their own distinctive behavior patterns, so the similarity between the same two items can differ from one group to another. A single recommendation model usually fails to capture these local differences in similarity, because it assumes that the similarity between two given items is the same in every context. Such a model cannot accurately capture users' true preferences, which lowers the quality of personalized recommendation. Algorithms that train several local recommendation models and then fuse them to improve overall performance address this problem to some extent, but they often fail to make full use of the data available in the recommendation scenario: they rely on a single kind of data, and the resulting recommendation quality is mediocre.
Summary of the invention
To overcome two problems of the prior art, namely that a single model cannot accurately capture user preferences and that multi-model fusion algorithms train on a single kind of data, the present invention provides a new local model weighted fusion movie recommendation algorithm based on user clustering for Top-N personalized movie recommendation.
The invention uses the textual content information of movies to compute semantic-level user feature vectors with the LDA topic model, and on that basis clusters users with a spectral clustering algorithm to form local user groups. The invention further uses users' movie ratings to build local recommendation models and a global recommendation model with a sparse linear model, and produces the final Top-N personalized movie recommendation through a linear weighted fusion of the local and global models.
The overall flow of the local model weighted fusion Top-N movie recommendation method based on user clustering is shown in Figure 1. It comprises the following steps:
Step 1: Data preprocessing. Clean the data by removing inactive users and movies with very low popularity; construct the user movie-tag documents; convert the explicit ratings into implicit feedback and construct the user-movie implicit feedback matrix A;
1.1 Clean the original dataset: remove users who have watched fewer than 20 movies and movies that have been rated fewer than 20 times, obtaining a new training dataset;
1.2 Collect all tags that users assigned to movies in the new dataset to build a tag dictionary. Represent each user by a document composed of the tags of all movies that user has watched; the documents of all users form a corpus. Compute the TF-IDF value of each word in each document with respect to the corpus. The term frequency (TF), inverse document frequency (IDF), and term frequency-inverse document frequency (TF-IDF) are computed as in formulas (1), (2), and (3);
TF_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}  (1)
IDF_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}  (2)
TFIDF_{i,j} = TF_{i,j} \times IDF_i  (3)
where TF_{i,j} denotes the term frequency of word t_i in document d_j, n_{i,j} denotes the number of times word t_i appears in document d_j, and \sum_k n_{k,j} denotes the total number of occurrences of all words in document d_j. IDF_i denotes the inverse document frequency of word t_i, |D| denotes the total number of documents in the corpus, and |\{j : t_i \in d_j\}| denotes the number of documents containing word t_i. TFIDF_{i,j} denotes the term frequency-inverse document frequency of word t_i in document d_j;
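As an illustration of formulas (1) to (3), the following minimal sketch computes TF-IDF weights for a toy corpus in which each user's document is a list of tags. The function name `tf_idf`, the dictionary layout, and the use of the natural logarithm are assumptions made for this example; the source does not state the logarithm base.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """corpus: dict mapping a user id to that user's list of tags (one document).

    Returns a dict mapping each user id to {tag: TF-IDF weight}.
    """
    n_docs = len(corpus)
    # Document frequency: number of user documents that contain each tag.
    doc_freq = Counter()
    for tags in corpus.values():
        doc_freq.update(set(tags))

    weights = {}
    for user, tags in corpus.items():
        counts = Counter(tags)
        total = sum(counts.values())
        weights[user] = {
            # TF (formula 1) times IDF (formula 2); natural log is an assumption.
            tag: (count / total) * math.log(n_docs / doc_freq[tag])
            for tag, count in counts.items()
        }
    return weights


# Hypothetical toy corpus: two users, three distinct tags.
corpus = {
    "u1": ["space", "space", "robot"],
    "u2": ["romance", "robot"],
}
weights = tf_idf(corpus)
```

A tag that appears in every user document, such as "robot" here, receives weight 0, which is the intended effect of the IDF factor.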
1.3 Convert the explicit ratings (for example, scores from 1 to 5) into implicit feedback represented by 0 and 1: an entry is 1 if the user has rated the movie, and 0 if the user has not rated it, in which case the movie is a candidate for recommendation. This yields an n×m user-movie implicit feedback matrix, where n is the number of users and m is the number of movies;
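Steps 1.1 and 1.3 can be sketched together as follows. This is a minimal example under stated assumptions: ratings arrive as `(user, movie, score)` tuples, the filter is applied in a single pass (the source does not say whether filtering is repeated until stable), and the name `build_implicit_matrix` and the `min_count` parameter are hypothetical.

```python
from collections import Counter


def build_implicit_matrix(ratings, min_count=20):
    """ratings: list of (user, movie, score) tuples.

    Drops users with fewer than min_count ratings and movies rated fewer
    than min_count times (one pass), then returns (users, movies, A) where
    A[i][j] is 1 if users[i] rated movies[j] and 0 otherwise.
    """
    user_counts = Counter(u for u, _, _ in ratings)
    movie_counts = Counter(m for _, m, _ in ratings)
    kept = [
        (u, m)
        for u, m, _ in ratings
        if user_counts[u] >= min_count and movie_counts[m] >= min_count
    ]
    users = sorted({u for u, _ in kept})
    movies = sorted({m for _, m in kept})
    row = {u: i for i, u in enumerate(users)}
    col = {m: j for j, m in enumerate(movies)}
    A = [[0] * len(movies) for _ in users]
    for u, m in kept:
        A[row[u]][col[m]] = 1  # the score itself is discarded
    return users, movies, A


# Toy data with a threshold of 2 instead of 20 so the filter is visible.
ratings = [
    ("a", "x", 5), ("a", "y", 3),
    ("b", "x", 1), ("b", "z", 4),
    ("c", "y", 2), ("c", "z", 5),
    ("d", "x", 4),  # user "d" has a single rating and is removed
]
users, movies, A = build_implicit_matrix(ratings, min_count=2)
```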
Step 2: User clustering. Using the movie tag information, train an LDA topic model to obtain user feature vectors, then cluster the users with a spectral clustering algorithm;
2.1 The LDA topic model is a three-layer document-topic-word Bayesian network. Given a corpus, the model infers the topic distribution of each document in the corpus and the word distribution of each topic. Its joint probability is given in formula (4);
p(\theta, z, w \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta) \, p(w_n \mid z_n, \beta)  (4)
where θ denotes the topic distribution of a document, z denotes a topic, w denotes a document, α denotes the Dirichlet prior parameter of the multinomial distribution over topics for each document, β denotes the Dirichlet prior parameter of the multinomial distribution over words for each topic, N denotes the number of words in the document, z_n denotes the topic of the n-th word in the document, and w_n denotes the n-th word of the document;
Each movie carries tags assigned to it by multiple users. A movie tag is mapped to a word w_n, the set of tags of all movies a user has watched is mapped to a document w, and a particular kind of movie that users prefer is mapped to a topic z. If the dataset contains n users, a corpus of n documents and a dictionary can be generated. Each document in the corpus is represented by a vector whose length equals the dictionary size, and each value in the vector is the TF-IDF value of the corresponding dictionary tag in that user's document with respect to the corpus;
To separate more distinctive user groups, the topics should differ from one another as much as possible. To determine the best number of topics, several LDA models are trained with different numbers of topics, the average similarity between the topic vectors of each trained model is computed, and the number of topics of the model with the smallest average topic similarity is taken as the best number of topics. LDA training yields the topic distribution θ of each document, which is used as the feature vector of the corresponding user;
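The topic-count selection rule described above can be sketched as follows, assuming the topic-word matrices of the candidate LDA models have already been trained elsewhere. Cosine similarity is an assumption here, since the source does not name the similarity measure, and the function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_topic_similarity(topic_word):
    """Average pairwise similarity between the topic vectors of one model.

    topic_word: list of topic vectors (at least two), one row per topic.
    """
    pairs = list(combinations(topic_word, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def pick_topic_count(models):
    """models: dict mapping a topic count to that model's topic-word matrix.

    Returns the topic count whose topics are least similar on average.
    """
    return min(models, key=lambda k: mean_topic_similarity(models[k]))


# Hypothetical topic-word matrices for two candidate models.
models = {
    2: [[0.9, 0.1, 0.0], [0.0, 0.1, 0.9]],
    3: [[0.5, 0.5, 0.0], [0.4, 0.6, 0.0], [0.0, 0.5, 0.5]],
}
best = pick_topic_count(models)
```

In this toy case the two-topic model is selected because its topics barely overlap, whereas two of the three topics in the other model are nearly identical.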
2.2 Using the n user feature vectors obtained in the previous step, cluster the users with a spectral clustering algorithm;
The number of clusters must be determined before clustering. Each dimension of a trained user vector represents the degree to which the user belongs to the corresponding topic. To determine the importance of each topic in the current user population, all user feature vectors are summed dimension by dimension and then averaged, giving a topic strength vector that represents the population as a whole. The best number of clusters is determined by examining the distribution of values in this vector. For example, in one LDA training run with 10 topics, this method gives a 10-dimensional topic strength vector, visualized in Figure 2 (the vertical axis shows topic strength and the horizontal axis shows the topic). Topics 2, 9, 3, 8, and 6 have the greatest strength in the current dataset, indicating that these kinds of movies have the largest audiences, so in this case it is appropriate to cluster users into 5 groups. The steps of the spectral clustering algorithm are as follows:
(1) Compute the n×n similarity matrix W and the degree matrix D;
(2) Compute the Laplacian matrix L = D - W;
(3) Compute the first k eigenvectors t_1, t_2, …, t_k of L (those associated with the k smallest eigenvalues);
(4) Form the matrix T ∈ R^{n×k} from the k column vectors t_1, t_2, …, t_k;
(5) For i = 1, …, n, let y_i ∈ R^k be the i-th row vector of T;
(6) Use the K-Means algorithm to cluster the users (y_i), i = 1, 2, …, n, into clusters C_1, C_2, …, C_k;
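Steps (1) and (2) of the procedure above can be sketched in plain Python as follows. The sketch covers only the construction of W, D, and L; the eigen-decomposition and K-Means steps are not shown and would normally be delegated to a numerical library. Using cosine similarity between user feature vectors and zeroing the diagonal of W are assumptions, as the source does not specify how W is built.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def graph_laplacian(features):
    """features: list of n user feature vectors (topic distributions).

    Returns (W, D, L): the n x n similarity matrix with a zero diagonal,
    the diagonal degree matrix, and the unnormalized Laplacian L = D - W.
    """
    n = len(features)
    W = [
        [cosine(features[i], features[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    D = [[sum(W[i]) if i == j else 0.0 for j in range(n)] for i in range(n)]
    L = [[D[i][j] - W[i][j] for j in range(n)] for i in range(n)]
    return W, D, L


# Three hypothetical users described by two-topic distributions.
features = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
W, D, L = graph_laplacian(features)
```

Each row of L sums to zero by construction, which is a quick sanity check before passing L to an eigen-solver.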
For each user cluster, the row vectors of the original implicit feedback training matrix A that belong to users outside the cluster are set to 0, so that each cluster yields a corresponding local implicit feedback training matrix. P_u denotes the index of the cluster to which user u belongs, with P_u ∈ {1, …, k};
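The construction of the local training matrices can be sketched as follows. The function name `local_matrices` is hypothetical, and cluster labels are 0-based here for convenience, whereas the text numbers clusters from 1 to k.

```python
def local_matrices(A, cluster_of, k):
    """A: n x m 0/1 matrix as a list of rows; cluster_of[u] in 0..k-1.

    Returns a list of k matrices; matrix c keeps the rows of users in
    cluster c and replaces every other row with zeros.
    """
    return [
        [
            row[:] if cluster_of[u] == c else [0] * len(row)
            for u, row in enumerate(A)
        ]
        for c in range(k)
    ]


# Three users, three movies; users 0 and 2 share cluster 0.
A = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
cluster_of = [0, 1, 0]
local = local_matrices(A, cluster_of, k=2)
```

Every local matrix keeps the full n×m shape, so the same training routine can be applied to the global matrix and to each local one.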
Step 3: Train the local recommendation models and the global recommendation model. The loss function of the sparse linear model (SLIM) is given in formula (5);
where A denotes the original user-movie implicit feedback matrix, and α and ρ control the weights of the L1 and L2 norms. Minimizing this loss function yields a sparse m×m movie similarity matrix W. In this model the L1 norm controls the sparsity of W, and the L2 norm controls the complexity of the model and prevents overfitting. The model is trained by stochastic gradient descent; each column w_j of W is trained in parallel to obtain the final W, as given in formula (6);
where a_j denotes the j-th column of matrix A. The predicted recommendation score of user i for movie j is computed as in formula (7);
SLIM is used as the base recommendation model for both the global and the local recommendation models. The global implicit feedback training matrix A is used to train the global movie similarity matrix W, and the local implicit feedback training matrix of each cluster is used to train the local movie similarity matrix for that cluster.
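The following sketch trains a single column w_j under several assumptions, because formulas (5) and (6) are not reproduced in this text. It assumes the usual SLIM column objective, 1/2 ||a_j - A w_j||^2 + (ρ/2) ||w_j||^2 + α ||w_j||_1 with w_j ≥ 0 and w_jj = 0, with α weighting the L1 term and ρ the L2 term. It also substitutes full-batch proximal gradient descent for the stochastic gradient descent named in the text, purely to keep the example short and deterministic. The function name and the default hyperparameters are hypothetical.

```python
def train_slim_column(A, j, alpha=0.1, rho=0.1, lr=0.01, epochs=200):
    """Learn column j of an item-item similarity matrix from a 0/1 matrix A.

    Assumed objective (not taken verbatim from the source):
        1/2 * ||a_j - A w||^2 + rho/2 * ||w||^2 + alpha * ||w||_1
        subject to w >= 0 and w[j] == 0.
    """
    n, m = len(A), len(A[0])
    w = [0.0] * m
    for _ in range(epochs):
        # Residual of the current reconstruction of column j.
        residual = [
            sum(A[i][l] * w[l] for l in range(m)) - A[i][j] for i in range(n)
        ]
        for l in range(m):
            if l == j:
                continue  # keep the diagonal entry at zero
            grad = sum(A[i][l] * residual[i] for i in range(n)) + rho * w[l]
            # Gradient step, then soft-threshold and clip at zero.
            w[l] = max(0.0, w[l] - lr * (grad + alpha))
    return w


# Movies 0 and 1 are watched by the same users; movie 2 mostly by others.
A = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [1, 1, 1]]
w0 = train_slim_column(A, j=0)
```

Because each column depends only on A and its own index, the columns can be trained independently, which is what makes the parallel training mentioned in the text possible.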
Step 4: Weighted model fusion and recommendation. The weighted fusion recommendation score of the local and global models is computed as in formula (8);
where the left-hand side denotes the weighted fusion recommendation score of movie j for user u, R_u is the set of all movies with which user u has interacted, w_lj is the similarity between movie l and movie j in the global model, the corresponding local term is the similarity between movie l and movie j in the local model of the cluster P_u to which user u belongs, and g is the weight of the global model. Adjusting g controls the relative weights of the global and local models in the fused model, and the best recommendation quality of the fused model is obtained with the optimal value of g, which can be determined experimentally for the current dataset. Once all model parameters are fixed, the weighted fusion recommendation score of every movie is computed for the current user u, the movies are sorted by score in descending order, movies with which the user has already interacted are removed, and the top N movies are recommended to the user;
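The fusion and ranking step can be sketched as follows. Formula (8) is not reproduced in this text, so the sketch assumes the score of movie j is the sum over l in R_u of g times the global similarity plus (1 - g) times the local similarity, which matches the description of g as the global model weight. The function name and the toy matrices are hypothetical.

```python
def recommend_top_n(user_items, W_global, W_local, g, n_top):
    """Rank unseen movies for one user with a fused global/local model.

    user_items: set of movie indices the user has interacted with (R_u).
    W_global, W_local: m x m similarity matrices; W_local belongs to the
    user's own cluster. g in [0, 1] is the weight of the global model.
    """
    m = len(W_global)
    scores = {}
    for j in range(m):
        if j in user_items:
            continue  # never recommend a movie the user already interacted with
        scores[j] = sum(
            g * W_global[l][j] + (1 - g) * W_local[l][j] for l in user_items
        )
    ranked = sorted(scores, key=lambda j: scores[j], reverse=True)
    return ranked[:n_top]


W_global = [
    [0.0, 0.2, 0.5, 0.1],
    [0.2, 0.0, 0.3, 0.4],
    [0.5, 0.3, 0.0, 0.2],
    [0.1, 0.4, 0.2, 0.0],
]
W_local = [
    [0.0, 0.1, 0.1, 0.9],
    [0.1, 0.0, 0.2, 0.3],
    [0.1, 0.2, 0.0, 0.1],
    [0.9, 0.3, 0.1, 0.0],
]
fused = recommend_top_n({0}, W_global, W_local, g=0.5, n_top=2)
global_only = recommend_top_n({0}, W_global, W_local, g=1.0, n_top=2)
```

With g = 1 the ranking follows the global model alone; lowering g lets the similarities learned inside the user's cluster reorder the list, which is the effect that tuning g is meant to exploit.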
Step 5: The effectiveness of the model can be verified by leave-one-out cross-validation. One movie is drawn at random from each user's set of rated movies and placed in the test set; the remaining movies form the training set. The trained model then recommends a Top-N movie list to each user, and it is checked whether that user's test movie appears in the list and, if so, at which position p_i. Finally, the recommendation quality of the model is measured by two metrics, the hit rate (HR) and the average reciprocal hit rank (ARHR), where #hits denotes the number of users whose test movie is hit and #users denotes the total number of users, as defined in formulas (9) and (10);
HR = \frac{\#hits}{\#users}  (9)
ARHR = \frac{1}{\#users} \sum_{i=1}^{\#hits} \frac{1}{p_i}  (10)
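The two evaluation metrics can be computed as in the following sketch, which assumes the standard definitions HR = #hits / #users and ARHR = (1 / #users) times the sum of 1 / p_i over the hits, with positions counted from 1. The function name and the input layout are hypothetical.

```python
def hit_rate_and_arhr(recommendations, held_out):
    """recommendations: dict user -> ranked list of recommended movies.
    held_out: dict user -> the single test movie withheld for that user.

    Returns (HR, ARHR) over all users in held_out.
    """
    hits = 0
    reciprocal_sum = 0.0
    for user, target in held_out.items():
        ranked = recommendations.get(user, [])
        if target in ranked:
            hits += 1
            # Position p_i is 1-based: a hit at the top of the list adds 1.
            reciprocal_sum += 1.0 / (ranked.index(target) + 1)
    n_users = len(held_out)
    return hits / n_users, reciprocal_sum / n_users


recommendations = {"u1": [3, 7, 9], "u2": [4, 5, 6], "u3": [1, 2, 8]}
held_out = {"u1": 7, "u2": 9, "u3": 1}
hr, arhr = hit_rate_and_arhr(recommendations, held_out)
```

HR only counts whether the withheld movie was recovered, while ARHR also rewards placing it near the top of the list.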
This concludes the steps of the recommendation method.
Combining the above techniques, the present invention proposes a local model weighted fusion Top-N movie recommendation algorithm based on user clustering. To address the problem that a traditional single recommendation model cannot accurately estimate local differences between items and therefore cannot accurately capture user preferences, the invention trains a global recommendation model and local recommendation models based on user clustering separately, and produces Top-N movie recommendations through a linear weighted fusion of these models. In addition, to make full use of the data available in the movie recommendation scenario and improve recommendation quality along several dimensions, the invention uses movie tag information and the LDA topic model to compute semantic-level feature vectors for users and to partition users into groups at the semantic level.
The advantages of the present invention are as follows. (1) The algorithmic idea is novel. A sparse linear model serves as the base recommendation model; a global recommendation model and local recommendation models based on user clustering are trained separately and then combined by linear weighted fusion into the final model. This approach handles differences in movie similarity across user groups and overcomes the inability of a single model to capture user preferences accurately. (2) Recommendation quality is improved along several dimensions. Besides using traditional rating data to train the recommendation models, the invention introduces movie tag data in the user clustering stage, uses the LDA topic model to analyze the semantic-level topic attributes of the user population, obtains user feature vectors, and clusters the users with a spectral clustering algorithm, which further improves recommendation quality. (3) The algorithm is simple and fast to implement. In the training stage the local and global models are independent of one another, and the columns of each model's similarity matrix are also independent, so training can be parallelized, which greatly reduces training time and improves training efficiency. (4) Recommendation quality is better. The proposed local model weighted fusion recommendation algorithm effectively combines content-based recommendation, neighborhood-based collaborative filtering, and model-based collaborative filtering. It exploits the strengths of each approach while compensating for their respective weaknesses, and it greatly improves recommendation quality compared with using any single recommendation algorithm.
Description of drawings
Figure 1 is the overall flowchart of the method of the present invention;
Figure 2 is the topic strength distribution chart of the method of the present invention.
Detailed description
Referring to the overall flowchart of the technical solution in Figure 1, the present invention has four stages: data preprocessing, user clustering, training of the global and local recommendation models, and linear weighted fusion of the recommendation models. The data preprocessing stage cleans the dataset by removing inactive users and unpopular movies, and constructs the corpus used to train the LDA topic model and the user-movie implicit feedback training matrix used to train the sparse linear model. The user clustering stage trains the LDA topic model on the user corpus from the first stage to obtain user feature vectors and clusters the users with a spectral clustering algorithm; each cluster yields a local implicit feedback training matrix. The model training stage trains the sparse linear model on the original implicit feedback matrix and on the local implicit feedback matrices to obtain the global model and the local models, respectively. The linear weighted fusion stage combines the global model and the local models from the previous stage by linear weighting to obtain the final recommendation model.
The inputs of the present invention are users' movie rating data and movie tag data; the output is a Top-N personalized movie recommendation list for each user.
The specific steps are as follows:
Step 1: Data preprocessing. Clean the data by removing inactive users and movies with very low popularity; construct the user movie-tag documents; convert the explicit ratings into implicit feedback and construct the user-movie implicit feedback matrix A;
1.1 Clean the original dataset: remove users who have watched fewer than 20 movies and movies that have been rated fewer than 20 times, obtaining a new training dataset;
1.2 Collect all tags that users assigned to movies in the new dataset to build a tag dictionary. Represent each user by a document composed of the tags of all movies that user has watched; the documents of all users form a corpus. Compute the TF-IDF value of each word in each document with respect to the corpus. The term frequency (TF), inverse document frequency (IDF), and term frequency-inverse document frequency (TF-IDF) are computed as in formulas (1), (2), and (3);
TF_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}  (1)
IDF_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}  (2)
TFIDF_{i,j} = TF_{i,j} \times IDF_i  (3)
where TF_{i,j} denotes the term frequency of word t_i in document d_j, n_{i,j} denotes the number of times word t_i appears in document d_j, and \sum_k n_{k,j} denotes the total number of occurrences of all words in document d_j. IDF_i denotes the inverse document frequency of word t_i, |D| denotes the total number of documents in the corpus, and |\{j : t_i \in d_j\}| denotes the number of documents containing word t_i. TFIDF_{i,j} denotes the term frequency-inverse document frequency of word t_i in document d_j;
1.3 Convert the explicit ratings (for example, scores from 1 to 5) into implicit feedback represented by 0 and 1: an entry is 1 if the user has rated the movie, and 0 if the user has not rated it, in which case the movie is a candidate for recommendation. This yields an n×m user-movie implicit feedback matrix, where n is the number of users and m is the number of movies;
Step 2: User clustering. Using the movie tag information, train an LDA topic model to obtain user feature vectors, then cluster the users with a spectral clustering algorithm;
2.1 The LDA topic model is a three-layer document-topic-word Bayesian network. Given a corpus, the model infers the topic distribution of each document in the corpus and the word distribution of each topic. Its joint probability is given in formula (4);
p(\theta, z, w \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta) \, p(w_n \mid z_n, \beta)  (4)
where θ denotes the topic distribution of a document, z denotes a topic, w denotes a document, α denotes the Dirichlet prior parameter of the multinomial distribution over topics for each document, β denotes the Dirichlet prior parameter of the multinomial distribution over words for each topic, N denotes the number of words in the document, z_n denotes the topic of the n-th word in the document, and w_n denotes the n-th word of the document;
Each movie carries tags assigned to it by multiple users. A movie tag is mapped to a word w_n, the set of tags of all movies a user has watched is mapped to a document w, and a particular kind of movie that users prefer is mapped to a topic z. If the dataset contains n users, a corpus of n documents and a dictionary can be generated. Each document in the corpus is represented by a vector whose length equals the dictionary size, and each value in the vector is the TF-IDF value of the corresponding dictionary tag in that user's document with respect to the corpus;
To separate more distinctive user groups, the topics should differ from one another as much as possible. To determine the best number of topics, several LDA models are trained with different numbers of topics, the average similarity between the topic vectors of each trained model is computed, and the number of topics of the model with the smallest average topic similarity is taken as the best number of topics. LDA training yields the topic distribution θ of each document, which is used as the feature vector of the corresponding user;
2.2 Using the n user feature vectors obtained in the previous step, cluster the users with a spectral clustering algorithm;
The number of clusters must be determined before clustering. Each dimension of a trained user vector represents the degree to which the user belongs to the corresponding topic. To determine the importance of each topic in the current user population, all user feature vectors are summed dimension by dimension and then averaged, giving a topic strength vector that represents the population as a whole. The best number of clusters is determined by examining the distribution of values in this vector. For example, in one LDA training run with 10 topics, this method gives a 10-dimensional topic strength vector, visualized in Figure 2 (the vertical axis shows topic strength and the horizontal axis shows the topic). Topics 2, 9, 3, 8, and 6 have the greatest strength in the current dataset, indicating that these kinds of movies have the largest audiences, so in this case it is appropriate to cluster users into 5 groups. The steps of the spectral clustering algorithm are as follows:
(1) Compute the n×n similarity matrix W and the degree matrix D;
(2) Compute the Laplacian matrix L = D - W;
(3) Compute the first k eigenvectors t_1, t_2, …, t_k of L (those associated with the k smallest eigenvalues);
(4) Form the matrix T ∈ R^{n×k} from the k column vectors t_1, t_2, …, t_k;
(5) For i = 1, …, n, let y_i ∈ R^k be the i-th row vector of T;
(6) Use the K-Means algorithm to cluster the users (y_i), i = 1, 2, …, n, into clusters C_1, C_2, …, C_k;
For each user cluster, the row vectors of the original implicit feedback training matrix A that belong to users outside the cluster are set to 0, so that each cluster yields a corresponding local implicit feedback training matrix. P_u denotes the index of the cluster to which user u belongs, with P_u ∈ {1, …, k};
步骤3确定局部推荐模型和进行全局推荐模型训练。稀疏线性模型SLIM的损失函数如公式(5)所示;Step 3 determines the local recommendation model and performs global recommendation model training. The loss function of the sparse linear model SLIM is shown in formula (5);
where A is the original user-movie implicit feedback matrix, and α and ρ control the weights of the L1 and L2 norm terms. Minimizing this loss function yields a sparse movie similarity matrix W of size m×m. In this model the L1 norm controls the sparsity of W, and the L2 norm controls model complexity to prevent overfitting. The model is trained by stochastic gradient descent, learning each column $w_j$ of W in parallel to obtain the final matrix W, as shown in formula (6);
where $a_j$ denotes the j-th column of matrix A. The predicted recommendation score of user i for movie j is computed as shown in formula (7);
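The images for formulas (5) to (7) are not reproduced in this text. For orientation only, the following is a plausible reading based on the standard SLIM formulation. The elastic-net style placement of α and ρ, the non-negativity and zero-diagonal constraints, and the symbol $\tilde{a}_{ij}$ for the predicted score are assumptions, not taken from the patent.

```latex
% (5) assumed form of the SLIM loss
\min_{W}\ \frac{1}{2}\lVert A - AW \rVert_F^2
  + \alpha\rho\,\lVert W \rVert_1
  + \frac{\alpha(1-\rho)}{2}\lVert W \rVert_F^2
\quad \text{s.t. } W \ge 0,\ \operatorname{diag}(W) = 0

% (6) the same problem solved independently for each column w_j
\min_{w_j}\ \frac{1}{2}\lVert a_j - A w_j \rVert_2^2
  + \alpha\rho\,\lVert w_j \rVert_1
  + \frac{\alpha(1-\rho)}{2}\lVert w_j \rVert_2^2
\quad \text{s.t. } w_j \ge 0,\ w_{jj} = 0

% (7) predicted score of user i for movie j
\tilde{a}_{ij} = \sum_{l=1}^{m} A_{il}\, W_{lj}
```

The column-wise form (6) is what makes parallel training possible: each $w_j$ depends only on A and $a_j$, not on the other columns of W.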
SLIM is used as the base recommendation model for both the global recommendation model and the local recommendation models: the global implicit feedback training matrix A is used to train the global movie similarity matrix W, and each local implicit feedback training matrix is used to train the local movie similarity matrix of the corresponding cluster.
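A compact way to experiment with SLIM-style training is sketched below. Note the substitutions: the text specifies stochastic gradient descent, whereas this sketch uses proximal gradient descent on the whole matrix for brevity. The weights `l1` and `l2` are hypothetical stand-ins for the L1 and L2 weights, and the non-negativity and zero-diagonal constraints come from the standard SLIM formulation rather than from the text.

```python
import numpy as np

def train_slim(A, l1=0.01, l2=0.1, n_iter=500):
    """Fit an m x m item similarity matrix W so that A @ W approximates A."""
    A = np.asarray(A, dtype=float)
    m = A.shape[1]
    G = A.T @ A  # m x m Gram matrix of the movie columns
    # Step size 1/Lipschitz for the smooth part (squared error + L2 term)
    step = 1.0 / (np.linalg.eigvalsh(G)[-1] + l2)
    W = np.zeros((m, m))
    for _ in range(n_iter):
        grad = G @ W - G + l2 * W  # gradient of the smooth part
        # Proximal step for the L1 penalty combined with W >= 0
        W = np.maximum(W - step * (grad + l1), 0.0)
        np.fill_diagonal(W, 0.0)  # a movie may not recommend itself
    return W
```

Calling `train_slim(A)` gives the global matrix, and calling it on each local training matrix gives the per-cluster matrices.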
Step 4: Weighted model fusion and recommendation. The local-model weighted fusion recommendation score is computed as shown in formula (8);
Formula (8) gives the weighted fusion recommendation score of movie j for user u, where $R_u$ is the set of all movies that user u has interacted with, $w_{lj}$ is the similarity between movie l and movie j in the global model, the local term is the similarity between movie l and movie j in the local model of the cluster $P_u$ to which user u belongs, and g is the weight of the global model. Adjusting g controls the relative weight of the global and local models in the fused model, and the best recommendation quality is obtained at the optimal value of g, which can be determined experimentally for the data set at hand. Once all model parameters are fixed, the weighted fusion recommendation score of every movie is computed for the current user u, the movies are sorted by score in descending order, movies the user has already interacted with are removed, and the top N movies are recommended to the user;
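The scoring and ranking procedure can be sketched as follows. Because the image of formula (8) is not reproduced here, the convex combination `g * W_global + (1 - g) * W_local` is an assumption consistent with g being described as the global model's weight; the function and argument names are hypothetical.

```python
import numpy as np

def recommend_top_n(A, W_global, W_local, labels, u, g=0.5, n=10):
    """Rank unseen movies for user u by the fused global/local score.

    A: users x movies 0/1 matrix; W_local: list of m x m matrices, one per
    cluster; labels[u]: 0-based cluster index of user u.
    """
    W_fused = g * W_global + (1.0 - g) * W_local[labels[u]]
    # For each movie j, sum the fused similarity over the movies l in R_u
    scores = (A[u] @ W_fused).astype(float)
    scores[A[u] > 0] = -np.inf  # drop movies the user already interacted with
    order = np.argsort(-scores, kind="stable")
    n_unseen = int((A[u] == 0).sum())
    return order[:min(n, n_unseen)].tolist()
```

Setting `g=1.0` reduces this to the global SLIM model and `g=0.0` to the purely local one, which is a convenient way to check both ends when tuning g.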
Step 5: The effectiveness of the model can be verified by leave-one-out cross-validation. For each user, one movie is drawn at random from that user's set of rated movies and placed in the test set; the remaining movies form the training set. The trained model then recommends a Top-N movie list to each user, and for each user one checks whether the held-out movie appears in the list and, if so, its position $p_i$ in the list. Finally, recommendation quality is measured by two metrics, hit rate (HR) and average reciprocal hit rank (ARHR), defined in formulas (9) and (10), where #hits is the number of recommendation hits and #users is the total number of users;
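The images for formulas (9) and (10) are not reproduced in this text. The sketch below assumes the definitions commonly used with leave-one-out evaluation: HR = #hits / #users, and ARHR = (1 / #users) times the sum of 1 / p_i over the hits. The dictionary-based interface is a hypothetical choice made for illustration.

```python
def hit_rate_and_arhr(recommendations, held_out):
    """Compute (HR, ARHR) for leave-one-out evaluation.

    recommendations: dict mapping user -> ranked Top-N list of movie ids
    held_out: dict mapping user -> the single movie id held out for testing
    """
    n_users = len(held_out)
    hits = 0
    reciprocal_sum = 0.0
    for user, movie in held_out.items():
        ranked = recommendations.get(user, [])
        if movie in ranked:
            hits += 1
            # p_i is the 1-based position of the held-out movie in the list
            reciprocal_sum += 1.0 / (ranked.index(movie) + 1)
    return hits / n_users, reciprocal_sum / n_users
```

ARHR rewards hits near the top of the list, so two models with equal HR can still be separated by how highly they rank the held-out movie.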
This concludes the steps of the recommendation method.
The content described in the embodiments of this specification merely enumerates forms in which the inventive concept may be implemented. The scope of protection of the present invention should not be regarded as limited to the specific forms set out in the embodiments; it also extends to equivalent technical means that a person skilled in the art could conceive of on the basis of the inventive concept.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810169922.3A CN108363804B (en) | 2018-03-01 | 2018-03-01 | Local model weighted fusion Top-N movie recommendation method based on user clustering |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108363804A (en) | 2018-08-03 |
CN108363804B (en) | 2020-08-21 |
Family
ID=63002919
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810169922.3A (Active; CN108363804B (en)) | Local model weighted fusion Top-N movie recommendation method based on user clustering | 2018-03-01 | 2018-03-01 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108363804B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103544216A (en) * | 2013-09-23 | 2014-01-29 | Tcl集团股份有限公司 | Information recommendation method and system combining image content and keywords |
US20150120742A1 (en) * | 2012-06-21 | 2015-04-30 | Tencent Technology (Shenzhen) Company Limited | Method and system for processing recommended target software |
CN107609201A (en) * | 2017-10-25 | 2018-01-19 | 广东工业大学 | A kind of recommended models generation method and relevant apparatus based on commending system |
Non-Patent Citations (2)
Title |
---|
EVANGELIA CHRISTAKOPOULOU: "Local item-item models for top-n recommendation", 《ACM》 * |
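LI QIAN: "Collaborative filtering recommendation algorithm based on spectral clustering and multi-factor fusion", 《Application Research of Computers》 * |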
Cited By (37)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109408702B (en) * | 2018-08-29 | 2021-07-16 | 昆明理工大学 | A Hybrid Recommendation Method Based on Sparse Edge Noise Reduction Autocoding |
CN109408702A (en) * | 2018-08-29 | 2019-03-01 | 昆明理工大学 | A kind of mixed recommendation method based on sparse edge noise reduction autocoding |
CN111309873A (en) * | 2018-11-23 | 2020-06-19 | 北京嘀嘀无限科技发展有限公司 | Data processing method and device, electronic equipment and storage medium |
CN111309874A (en) * | 2018-11-23 | 2020-06-19 | 北京嘀嘀无限科技发展有限公司 | Data processing method and device, electronic equipment and storage medium |
CN110008377A (en) * | 2019-03-27 | 2019-07-12 | 华南理工大学 | A method for movie recommendation using user attributes |
CN110008377B (en) * | 2019-03-27 | 2021-09-21 | 华南理工大学 | Method for recommending movies by using user attributes |
CN110084670A (en) * | 2019-04-15 | 2019-08-02 | 东北大学 | A kind of commodity on shelf combined recommendation method based on LDA-MLP |
CN110084670B (en) * | 2019-04-15 | 2022-03-25 | 东北大学 | Shelf commodity combination recommendation method based on LDA-MLP |
CN110069663A (en) * | 2019-04-29 | 2019-07-30 | 厦门美图之家科技有限公司 | Video recommendation method and device |
CN110069663B (en) * | 2019-04-29 | 2021-06-04 | 厦门美图之家科技有限公司 | Video recommendation method and device |
CN111984856A (en) * | 2019-07-25 | 2020-11-24 | 北京嘀嘀无限科技发展有限公司 | Information pushing method and device, server and computer readable storage medium |
CN110443502A (en) * | 2019-08-06 | 2019-11-12 | 合肥工业大学 | Crowdsourcing task recommendation method and system based on worker's capability comparison |
CN112395487B (en) * | 2019-08-14 | 2024-04-26 | 腾讯科技(深圳)有限公司 | Information recommendation method and device, computer readable storage medium and electronic equipment |
CN112395487A (en) * | 2019-08-14 | 2021-02-23 | 腾讯科技(深圳)有限公司 | Information recommendation method and device, computer-readable storage medium and electronic equipment |
CN110795570B (en) * | 2019-10-11 | 2022-06-17 | 上海上湖信息技术有限公司 | A method and device for extracting user timing behavior features |
CN110795570A (en) * | 2019-10-11 | 2020-02-14 | 上海上湖信息技术有限公司 | Method and device for extracting user time sequence behavior characteristics |
CN111008334B (en) * | 2019-12-04 | 2023-04-18 | 华中科技大学 | Top-K recommendation method and system based on local pairwise ordering and global decision fusion |
CN111008334A (en) * | 2019-12-04 | 2020-04-14 | 华中科技大学 | Top-K recommendation method and system based on local pairwise ordering and global decision fusion |
CN113111251A (en) * | 2020-01-10 | 2021-07-13 | 阿里巴巴集团控股有限公司 | Project recommendation method, device and system |
CN111860091A (en) * | 2020-01-22 | 2020-10-30 | 北京嘀嘀无限科技发展有限公司 | Face image evaluation method and system, server and computer-readable storage medium |
CN111428144A (en) * | 2020-02-27 | 2020-07-17 | 中国平安财产保险股份有限公司 | Recommendation method and device based on combination of DCN and L DA and computer equipment |
CN111460046A (en) * | 2020-03-06 | 2020-07-28 | 合肥海策科技信息服务有限公司 | Scientific and technological information clustering method based on big data |
CN111581522A (en) * | 2020-06-05 | 2020-08-25 | 预见你情感(北京)教育咨询有限公司 | Social analysis method based on user identity identification |
CN111897999B (en) * | 2020-07-27 | 2023-06-16 | 九江学院 | Deep learning model construction method for video recommendation and based on LDA |
CN111897999A (en) * | 2020-07-27 | 2020-11-06 | 九江学院 | A deep learning model construction method based on LDA for video recommendation |
CN112184391A (en) * | 2020-10-16 | 2021-01-05 | 中国科学院计算技术研究所 | A training method, medium, electronic device and recommendation model for a recommendation model |
CN112184391B (en) * | 2020-10-16 | 2023-10-10 | 中国科学院计算技术研究所 | Training method of recommendation model, medium, electronic equipment and recommendation model |
CN112348629A (en) * | 2020-10-26 | 2021-02-09 | 邦道科技有限公司 | Commodity information pushing method and device |
CN112364245B (en) * | 2020-11-20 | 2021-12-21 | 浙江工业大学 | Top-K Movie Recommendation Method Based on Heterogeneous Information Network Embedding |
CN112364245A (en) * | 2020-11-20 | 2021-02-12 | 浙江工业大学 | Top-K movie recommendation method based on heterogeneous information network embedding |
CN112925926A (en) * | 2021-01-28 | 2021-06-08 | 北京达佳互联信息技术有限公司 | Training method and device of multimedia recommendation model, server and storage medium |
CN113342963A (en) * | 2021-04-29 | 2021-09-03 | 山东大学 | Service recommendation method and system based on transfer learning |
CN113268670B (en) * | 2021-06-16 | 2022-09-27 | 中移(杭州)信息技术有限公司 | Latent factor hybrid recommendation method, device, equipment and computer storage medium |
CN113268670A (en) * | 2021-06-16 | 2021-08-17 | 中移(杭州)信息技术有限公司 | Latent factor hybrid recommendation method, device, equipment and computer storage medium |
CN113449147A (en) * | 2021-07-06 | 2021-09-28 | 乐视云计算有限公司 | Video recommendation method and device based on theme |
CN114418679A (en) * | 2022-01-13 | 2022-04-29 | 中电福富信息科技有限公司 | A Recommendation Method Based on Item Similarity and User Behavior |
CN114936314A (en) * | 2022-03-24 | 2022-08-23 | 阿里巴巴达摩院(杭州)科技有限公司 | Recommendation information generation method and device, storage medium and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN108363804B (en) | 2020-08-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108363804A (en) | Local model weighted fusion Top-N movie recommendation method based on user clustering | |
Kumar et al. | Movie recommendation system using sentiment analysis from microblogging data | |
US12190583B2 (en) | User tag generation method and apparatus, storage medium, and computer device | |
CN108763362A (en) | Method is recommended to the partial model Weighted Fusion Top-N films of selection based on random anchor point | |
Phorasim et al. | Movies recommendation system using collaborative filtering and k-means | |
CN107357793B (en) | Information recommendation method and device | |
Gao et al. | Personalized service system based on hybrid filtering for digital library | |
CN109214454B (en) | A Weibo-Oriented Emotional Community Classification Method | |
CN117708421B (en) | Dynamic recommendation method and system based on modularized neural network | |
Cong et al. | Hierarchical attention based neural network for explainable recommendation | |
Yao et al. | A personalized recommendation system based on user portrait | |
Gu | Research on precision marketing strategy and personalized recommendation method based on big data drive | |
CN104572915A (en) | User event relevance calculation method based on content environment enhancement | |
Hoang et al. | Academic event recommendation based on research similarity and exploring interaction between authors | |
Shen et al. | Modified similarity algorithm for collaborative filtering | |
Ye et al. | An interpretable mechanism for personalized recommendation based on cross feature | |
Sun et al. | Deep plot-aware generalized matrix factorization for collaborative filtering | |
Ma et al. | Book recommendation model based on wide and deep model | |
Bharadhwaj | Layer-wise relevance propagation for explainable recommendations | |
Shrivastava et al. | K-means clustering based solution of sparsity problem in rating based movie recommendation system | |
Gupta et al. | Multimodal graph-based recommendation system using hybrid filtering approach | |
Niu et al. | Tourism event knowledge graph for attractions recommendation | |
Ye et al. | A collaborative neural model for rating prediction by leveraging user reviews and product images | |
Tang et al. | Predicting total sales volume interval of an experiential product with short life cycle before production: similarity comparison in attribute relationship patterns | |
Xu et al. | xDeepFIG: An extreme deep model with feature interactions and generation for CTR prediction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||