CN114637909A - Film recommendation system and method based on improved deep structured semantic model - Google Patents
- Publication number
- CN114637909A (application number CN202210132815.XA)
- Authority
- CN
- China
- Prior art keywords
- user
- movie
- vector
- module
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/7867—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
A movie recommendation system based on an improved deep structured semantic model comprises a user behavior collection and processing module, an offline training module, and an online recall and ranking module. The user behavior collection and processing module collects users' interaction behavior; the offline training module receives the merged data output by the user behavior collection and processing module; and the online recall and ranking module looks up, by the user's attribute features, the user feature vector already obtained through training, and uses approximate nearest neighbor search to recall from the movie vector library a subset of movies to recommend to the user. A movie recommendation method based on the improved deep structured semantic model comprises the steps of user behavior collection and processing, offline training, and online recall and ranking. Based on the explicit features of users and movies and their implicit interaction information, the invention can effectively mine movies that match the user's interests, provide personalized recommendation services, and obtain the recommendation results for the user.
Description
Technical Field
The invention relates to the field of semantic models, and in particular to a movie recommendation system and method based on an improved deep structured semantic model.
Background
With the continuous development of the Internet, the online video market has grown year by year. While enjoying a feast of videos rich in form and content, Internet users are also constantly bombarded by large amounts of redundant and useless information. This volume of data far exceeds what users can absorb and seriously interferes with their ability to select the information they need, resulting in very low information utilization and even causing users frustration and annoyance.
As an effective means of addressing "information overload", recommendation systems have developed rapidly in recent years, and their role in Internet services keeps growing. Movies, as a carrier of rich information, have naturally become an important research subject in personalized recommendation. As the numbers of users and movies keep growing, how to mine movie information in depth, accurately match user interests, select suitable movies for users from a vast movie library, and provide precise personalized services has become a research hotspot in the industry.
The recommendation algorithm is the core of a movie recommendation system. The most widely used approach today is collaborative filtering, which uses viewing history to find users similar to the target user (those who have watched the same videos), then finds other videos those similar users like and recommends them to the target user. This approach has two problems. First, when a user has few movie rating records, collaborative filtering performs poorly; this is the cold-start problem. Second, as users and movies keep increasing, the collaborative filtering matrix becomes very large, and obtaining user and movie feature vectors by matrix factorization becomes increasingly expensive. Other approaches use, for example, deep neural networks or recurrent neural networks to learn users' historical preferences. Research on movie recommendation algorithms still has considerable room for development.
Summary of the Invention
The purpose of the invention is to provide a movie recommendation system and method based on an improved deep structured semantic model, so as to solve the problems in the prior art.
To achieve this purpose, the invention provides a movie recommendation system based on an improved deep structured semantic model, comprising a user behavior collection and processing module, an offline training module, and an online recall and ranking module.
The user behavior collection and processing module collects users' interaction behavior logs, search behavior logs, and play record lists through tracking points embedded in the front end, and stores them as user feature data in a feature library on the file system. A data warehouse tool is then used to clean the collected logs, yielding the raw data set of base samples.
Based on the cleaned raw data, the system preprocesses the users' behavior logs and merges them.
The offline training module receives the merged data samples output by the user behavior collection and processing module. After encoding the data and reducing its dimensionality, it re-weights the data, extracts and mines deep latent semantic features, and matches users with movies according to those features.
The online recall and ranking module looks up, by the user's attribute features, the user feature vector already obtained through training, performs vector retrieval in the movie vector library using approximate nearest neighbor search, and recalls the subset of movies to recommend to the user.
The offline training module comprises an input layer, a self-attention layer, a feature extraction layer, and a matching layer.
The user feature data output by the user behavior collection and processing module is merged and sent to the input layer. The input layer comprises an encoding module and a dimensionality reduction module, and feeds the user feature data into these two modules.
The self-attention layer uses a squeeze-and-excitation network (SENET) to re-weight the data from the input layer.
The feature extraction layer uses three fully connected layers to form a deep neural network, which extracts and mines deep latent semantic features from the input user and movie feature vectors.
The matching layer computes the cosine similarity between the extracted latent semantic feature vectors to obtain the matching score between a user and a movie.
The user feature data includes user dense features and user sparse features. The user dense features are input to the encoding module, and the user sparse features are input to the dimensionality reduction module.
The user sparse features include sparse features with definite values and variable-length sparse features. A sparse feature with a definite value is input to the encoding module and output as a low-dimensional vector.
The variable-length sparse features include the viewing history and the search history. The movie embedding sequence corresponding to the viewing history is averaged with weights in the dimensionality reduction module to obtain the viewing vector.
The search history is trained in the dimensionality reduction module to obtain embedding vectors, and the corresponding movie embedding sequence is averaged with weights to obtain the search vector.
The search history and the viewing history are interleaved and trained in correspondence. The input layer then concatenates the processed sparse features with the dense features, and uses the concatenated user and movie vectors as the initial embedding vectors.
The self-attention layer comprises a compression module and an excitation module. The compression module compresses and aggregates the embedding vector of each feature received from the input layer to form an initial weight vector.
The excitation module performs feature crossing on the initial weight vector output by the compression module and preserves the output dimension.
The matching layer of the offline training module computes the cosine similarity between the latent semantic feature vectors extracted by the feature extraction layer to obtain the matching score between a user and a movie.
The online recall and ranking module looks up, by the user's attribute features, the user feature vector already obtained through training, and uses approximate nearest neighbor search to recall from the movie vector library the subset of movies to recommend to the user. In the ranking stage it removes the movies the user has already watched, computes the similarity between each remaining movie and the user feature vector, ranks the movies by that similarity, and returns the list of recommendation results.
A movie recommendation method based on an improved deep structured semantic model comprises the following steps:
S1: User behavior collection and processing: collect users' interaction behavior logs, search behavior logs, and play record lists through tracking points embedded in the front end, store them in the file system, and use a data warehouse tool to clean the collected logs, yielding the raw data set;
based on the cleaned raw data, the system preprocesses the users' behavior logs and merges them;
S2: Offline training: receive the merged data output by the user behavior collection and processing of S1; after encoding the data and reducing its dimensionality, re-weight the data, extract and mine deep latent semantic features, and match users with movies according to those features;
S3: Online recall and ranking: look up, by the user's attribute features, the user feature vector already obtained through training, and use approximate nearest neighbor search to recall from the movie vector library the subset of movies to recommend to the user.
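The cleaning and merging in step S1 can be pictured with a small sketch. This is only an illustration under stated assumptions: the record fields (`user_id`, `ts`, `movie_id`, `query`) and the cleaning rules (drop records with a missing user id or timestamp, drop exact duplicates) are hypothetical, since the text does not specify the log schema or the data warehouse tool.

```python
from collections import defaultdict

def clean(records):
    """Drop records missing a user id or timestamp, and remove exact duplicates.

    The field names "user_id" and "ts" are hypothetical; the source does not
    define the log schema.
    """
    seen = set()
    cleaned = []
    for record in records:
        if record.get("user_id") is None or record.get("ts") is None:
            continue
        key = tuple(sorted(record.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned

def merge_logs(interactions, searches, plays):
    """Merge the three cleaned logs into one time-ordered record per user."""
    merged = defaultdict(lambda: {"interactions": [], "searches": [], "plays": []})
    named_logs = (("interactions", interactions),
                  ("searches", searches),
                  ("plays", plays))
    for name, log in named_logs:
        for record in clean(log):
            merged[record["user_id"]][name].append(record)
    # Sort each user's events chronologically so later stages see ordered histories.
    for user in merged.values():
        for events in user.values():
            events.sort(key=lambda r: r["ts"])
    return dict(merged)
```

Each user then carries three ordered histories, which is the shape the input layer of the offline training step would need for building viewing and search vectors.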
Step S2 further comprises the following steps:
A1: Feature input: after receiving the merged data output by the user behavior collection and processing, encode the user feature data and reduce its dimensionality;
A2: Feature learning: use a squeeze-and-excitation network (SENET) to re-weight the data from the input layer;
A3: Feature extraction: use three fully connected layers to form a deep neural network that extracts and mines deep latent semantic features from the input user and movie feature vectors;
A4: Feature matching: compute the cosine similarity between the extracted latent semantic feature vectors to obtain the matching score between a user and a movie.
In step A1, the user sparse features include sparse features with definite values and variable-length sparse features. After encoding, a sparse feature with a definite value is output as a low-dimensional vector.
The variable-length sparse features include the viewing history and the search history. The movie embedding sequence corresponding to the viewing history is reduced in dimensionality and averaged with weights to obtain the viewing vector, according to the following formula:
[formula not reproduced in the source text]
where $t$ denotes the current time, $t_0$ the viewing time, and $m_i$ the embedding vector of the $i$-th movie.
The search history is trained with dimensionality reduction to obtain embedding vectors, and the corresponding movie embedding sequence is averaged with weights to obtain the search vector, according to the following formula:
[formula not reproduced in the source text]
The search history and the viewing history are interleaved and trained in correspondence. The input layer then concatenates the processed sparse features with the dense features, and uses the concatenated user and movie vectors as the initial embedding vectors.
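Because the weighting formula for the viewing vector is not available in this text, the following is only a sketch of a time-weighted average of movie embeddings. It assumes each embedding $m_i$ is weighted by a decreasing function of the elapsed time $t - t_0$; the inverse-age weight used as the default is a hypothetical choice, not the patent's formula.

```python
def time_weighted_average(embeddings, watch_times, now,
                          decay=lambda age: 1.0 / (1.0 + age)):
    """Average movie embeddings, giving recently watched movies more weight.

    embeddings  : list of equal-length float lists (m_i in the text)
    watch_times : viewing time t0 of each movie, same order as embeddings
    now         : current time t
    decay       : hypothetical weight as a function of the age t - t0
    """
    if not embeddings:
        raise ValueError("viewing history is empty")
    weights = [decay(now - t0) for t0 in watch_times]
    total = sum(weights)
    dim = len(embeddings[0])
    return [
        sum(w * e[k] for w, e in zip(weights, embeddings)) / total
        for k in range(dim)
    ]

def concat_features(*vectors):
    """Concatenate dense features and pooled sparse features into one vector."""
    out = []
    for v in vectors:
        out.extend(v)
    return out
```

For example, `concat_features(dense, viewing_vector, search_vector)` would produce the kind of concatenated initial embedding that the input layer passes on; any other decreasing `decay` function can be substituted.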
Step A2 further comprises a compression stage and an excitation stage. The compression stage compresses and aggregates the embedding vector of each feature received from step A1 to form an initial weight vector, according to the following formula:
[formula not reproduced in the source text]
The excitation stage performs feature crossing on the initial weight vector output by the compression stage and preserves the output dimension. In the excitation stage, a two-layer MLP network with a relatively narrow middle layer is applied to the output vector $Z$ of the compression stage, according to the following formula:
$$S = F_{ex}(Z, W) = \delta\big(W_2\,\delta(W_1 Z)\big)$$
where $\delta$ is the activation function; the first MLP layer performs feature crossing, and the second preserves the output dimension.
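The two stages of step A2 can be sketched in a few lines. The excitation follows the formula $S = \delta(W_2\,\delta(W_1 Z))$ given in the text; two things are assumptions here because the text does not state them: the compression is taken to be mean pooling of each feature's embedding, and $\delta$ is taken to be ReLU. The final rescaling of each embedding by its weight is the usual squeeze-and-excitation re-weighting.

```python
def relu(v):
    """Element-wise ReLU; assumed choice for the activation delta."""
    return [max(0.0, x) for x in v]

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def squeeze(embeddings):
    """Compression stage: one scalar per feature (mean pooling is an assumption)."""
    return [sum(e) / len(e) for e in embeddings]

def excite(Z, W1, W2):
    """Excitation stage: S = delta(W2 * delta(W1 * Z)).

    W1 maps the f feature statistics to a narrower hidden layer;
    W2 maps back to f weights, so the output has one weight per feature.
    """
    return relu(matvec(W2, relu(matvec(W1, Z))))

def reweight(embeddings, W1, W2):
    """Scale each feature embedding by its learned weight."""
    S = excite(squeeze(embeddings), W1, W2)
    return [[s * x for x in e] for s, e in zip(S, embeddings)]
```

With three features and a hidden width of one, `W1` has shape 1×3 and `W2` has shape 3×1, which illustrates the "narrow middle layer" mentioned in the text.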
In step A3, three fully connected layers form a deep neural network that extracts and mines deep latent semantic features from the input user and movie feature vectors. The latent semantic feature vector $y$ is given by:
$$l_i = f(W_i\, l_{i-1} + b_i), \quad i = 2, \ldots, N-1$$
$$y = f(W_N\, l_{N-1} + b_N)$$
where $\{l_i,\ i = 1, 2, \ldots, N-1\}$ denote the outputs of the fully connected layers, $W_i$ and $b_i$ denote the weight matrix and bias term of layer $i$, and $f$ is the tanh activation function:
$$f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
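The layer recursion above amounts to repeatedly applying an affine map followed by tanh. A minimal sketch, assuming the weights are supplied as plain nested lists (the helper names `dense` and `tower` are illustrative, not from the source):

```python
import math

def dense(W, b, x):
    """One fully connected layer: f(W x + b) with f = tanh."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
        for row, bi in zip(W, b)
    ]

def tower(layers, x):
    """Apply the fully connected layers in order; the last output is y.

    layers : list of (W, b) pairs, e.g. three pairs for the three
             fully connected layers described in the text
    x      : input feature vector (user or movie)
    """
    for W, b in layers:
        x = dense(W, b, x)
    return x
```

The same function can serve the user side and the movie side, each with its own list of `(W, b)` pairs, so that both end up as latent vectors of the same length.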
Step A4 takes the latent semantic feature vectors extracted in step A3 and computes the cosine similarity between them to obtain the matching score between a user and a movie:
$$\cos(y_U, y_M) = \frac{y_U^{\top} y_M}{\lVert y_U \rVert\, \lVert y_M \rVert}$$
where $y_U$ and $y_M$ denote the final latent semantic feature vectors of the user and the movie, respectively, and $\lVert \cdot \rVert$ denotes the vector norm.
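The matching score of step A4 is the ordinary cosine similarity, which can be sketched as follows. Returning 0.0 for a zero vector is a convention chosen for this sketch, not something the text specifies.

```python
import math

def cosine_similarity(y_u, y_m):
    """Cosine of the angle between the user vector y_u and the movie vector y_m."""
    dot = sum(a * b for a, b in zip(y_u, y_m))
    norm_u = math.sqrt(sum(a * a for a in y_u))
    norm_m = math.sqrt(sum(b * b for b in y_m))
    if norm_u == 0.0 or norm_m == 0.0:
        return 0.0  # convention for this sketch: undefined similarity treated as 0
    return dot / (norm_u * norm_m)
```

The score lies in [-1, 1] and depends only on the direction of the two vectors, not on their length.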
During model training, a softmax function converts the cosine similarity between the final user and movie feature vectors into a posterior probability, according to the following formula:
[formula not reproduced in the source text]
where $\gamma$ denotes the smoothing factor of the softmax function. The loss function is minimized by maximum likelihood estimation; a time weight is added so that the objective is to maximize the user's viewing duration:
$$L(\Lambda) = -\log \prod_{(U, M^+)} T_j \cdot P(M^+ \mid U)$$
where $T_j$ denotes the duration of the $j$-th movie, $M$ denotes the set of candidate movies, $\Lambda$ denotes the model parameters, and $M^+$ denotes a positive sample among the candidate movies.
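The training objective can be sketched as follows. The posterior is written here in the standard deep structured semantic model form, a softmax over $\gamma$-scaled cosine similarities across the candidate set; that form is an assumption, since the text's own softmax formula is not available. The loss follows $L(\Lambda)$ literally as given in the text.

```python
import math

def posterior(scores, gamma):
    """Softmax over gamma-scaled similarity scores of one user vs. all candidates.

    scores : cosine similarities between the user and each candidate movie
    gamma  : smoothing factor of the softmax
    """
    top = max(scores)  # subtract the maximum before exponentiating
    exps = [math.exp(gamma * (s - top)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def loss(samples, gamma):
    """L = -log prod T_j * P(M+ | U), accumulated as a sum of logs.

    samples : list of (scores, positive_index, duration) triples, where
              duration is T_j for the positive movie (hypothetical layout)
    """
    total = 0.0
    for scores, positive_index, duration in samples:
        p = posterior(scores, gamma)[positive_index]
        total -= math.log(duration * p)
    return total
```

Written this way, $\log T_j$ enters as an additive term that does not depend on the model parameters. A variant that multiplies each log-probability by $T_j$ would let the duration influence the gradients, but that would be a different objective from the one stated in the text.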
Step S3 looks up, by the user's attribute features, the user feature vector already obtained through training, and uses approximate nearest neighbor search to recall from the movie vector library the subset of movies to recommend to the user. In the ranking stage it removes the movies the user has already watched, computes the similarity between each remaining movie and the user feature vector, ranks the movies by that similarity, and returns the list of recommendation results.
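The recall and ranking flow of step S3 can be sketched as below. To keep the example self-contained, the recall stage uses exact brute-force search as a stand-in for an approximate nearest neighbor index; the function name `recommend` and the parameters `recall_k` and `top_n` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def recommend(user_vec, movie_vecs, watched, recall_k=100, top_n=10):
    """Recall candidate movies, filter watched ones, and rank by similarity.

    user_vec   : trained user feature vector
    movie_vecs : dict mapping movie id -> trained movie feature vector
    watched    : set of movie ids the user has already seen
    """
    # Recall: exact search here; an ANN index would replace this step.
    scored = [(cosine(user_vec, vec), movie_id)
              for movie_id, vec in movie_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    recalled = scored[:recall_k]
    # Ranking: drop watched movies; the recalled list is already ordered
    # by similarity to the user vector, so that order is the ranking.
    ranked = [(movie_id, score) for score, movie_id in recalled
              if movie_id not in watched]
    return ranked[:top_n]
```

Because watched movies are removed after recall, `recall_k` should be comfortably larger than `top_n` so that enough unseen candidates remain.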
Compared with the prior art, the invention has the following beneficial effects:
1. The improved deep structured semantic recommendation model proposed in this patent can, based on the explicit features of users and movies and their implicit interaction information, effectively mine movies that match the user's interests and provide personalized recommendation services.
2. This patent uses the Faiss framework to store the user and movie feature vectors produced by model training, keyed by their attribute features. When a user visits the system again, the vector can be retrieved quickly and used for an ANN search to obtain the user's recommendation results.
3. The movie recommendation system proposed in this patent continuously collects user behavior logs and periodically updates these history records into the user's profile, ensuring that the system learns the user's recent interests in a timely manner.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of a movie recommendation system based on an improved deep structured semantic model according to the invention;
FIG. 2 is a structural diagram of the offline training model of the movie recommendation system based on an improved deep structured semantic model according to the invention;
FIG. 3 is a schematic diagram of how the movie recommendation system based on an improved deep structured semantic model according to the invention resolves the time-travel problem (leakage of future information into training);
FIG. 4 is a flowchart of an embodiment of a movie recommendation method based on an improved deep structured semantic model according to the invention.
Detailed Description
To make the purposes, technical solutions, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below. Where specific conditions are not stated in the embodiments, conventional conditions or those recommended by the manufacturer apply. Reagents or instruments whose manufacturer is not stated are conventional, commercially available products.
Embodiment: As shown in FIG. 1, a movie recommendation system based on an improved deep structured semantic model comprises a user behavior collection and processing module, an offline training module, and an online recall and ranking module.
The user behavior collection and processing module collects users' interaction behavior logs, search behavior logs, and play record lists through tracking points embedded in the front end, and stores them as user feature data in a feature library on the file system; in this embodiment the feature library uses the Faiss framework. A data warehouse tool is then used to clean the collected logs, yielding the raw data set of base samples.
Based on the cleaned raw data, the system preprocesses the users' behavior logs and merges them.
The offline training module receives the merged data samples output by the user behavior collection and processing module. After encoding the data and reducing its dimensionality, it re-weights the data and extracts and mines deep latent semantic features, completing model training; it then matches users with movies according to the data features, completing model prediction.
The online recall and ranking module looks up, by the user's attribute features, the user feature vector already obtained through training (the model's prediction data), and performs vector retrieval in the movie vector library; in this embodiment the vector retrieval uses approximate nearest neighbor (ANN) search. It recalls the subset of movies to recommend to the user.
As shown in FIG. 2, the offline training module comprises an input layer, a self-attention layer, a feature extraction layer, and a matching layer.
The user feature data output by the user behavior collection and processing module is merged and sent to the input layer. The input layer comprises an encoding module and a dimensionality reduction module, and feeds the user feature data into these two modules.
The self-attention layer uses a squeeze-and-excitation network (SENET) to re-weight the data from the input layer.
The feature extraction layer uses three fully connected layers to form a deep neural network, which extracts and mines deep latent semantic features from the input user and movie feature vectors.
The matching layer computes the cosine similarity between the extracted latent semantic feature vectors to obtain the matching score between a user and a movie.
The user feature data includes user dense features and user sparse features. The user dense features are input to the encoding module, and the user sparse features are input to the dimensionality reduction module.
The user sparse features include sparse features with definite values and variable-length sparse features. A sparse feature with a definite value is input to the encoding module and output as a low-dimensional vector.
The variable-length sparse features include the viewing history and the search history. The movie embedding sequence corresponding to the viewing history is averaged with weights in the dimensionality reduction module to obtain the viewing vector.
The search history is trained in the dimensionality reduction module to obtain embedding vectors, and the corresponding movie embedding sequence is averaged with weights to obtain the search vector.
The search history and the viewing history are interleaved and trained in correspondence. The input layer then concatenates the processed sparse features with the dense features, and uses the concatenated user and movie vectors as the initial embedding vectors.
The self-attention layer includes a compression (Squeeze) module and an excitation module. The compression module compresses and summarizes the embedding vector of each feature received from the input layer to form an initial weight vector;
The excitation module performs feature crossing on the initial weight vector output by the compression module and preserves the output dimension;
The matching layer of the offline training module computes the cosine similarity between the latent semantic feature vectors extracted by the feature extraction layer to obtain the matching score between a user and a movie.
The online recall and ranking module retrieves the user's already-trained feature vector according to the user's attribute features, and uses approximate nearest neighbor (ANN) search to recall from the movie vector library a subset of movies to recommend to the user. In the ranking stage, it removes the movies the user has already watched, computes the similarity between each remaining movie and the user feature vector, uses that similarity as the ranking criterion, and returns the recommendation list.
A movie recommendation method based on an improved deep structured semantic model includes the following steps:
S1: User behavior collection and processing: through front-end event tracking, collect the user's interaction behavior logs, search behavior logs and playback record lists, and store them in a file system; using a data warehouse tool, clean the collected logs to obtain the original data set;
Based on the cleaned original data, the system preprocesses the user's behavior logs and merges their data;
S2: Offline training: receive the merged data output by step S1, encode the data and reduce its dimensionality, re-weight it, extract and mine its deep latent semantic features, and match users with movies according to those features;
S3: Online recall and ranking: retrieve the user's already-trained feature vector according to the user's attribute features, and use approximate nearest neighbor search to recall from the movie vector library a subset of movies to recommend to the user.
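The merging part of step S1 can be illustrated with a minimal sketch. The record layout (`user_id`, `movie_id`, `query`, `watched_seconds`) and the function name `merge_user_logs` are assumptions made for illustration only; the patent does not specify a log schema or a particular data warehouse tool.

```python
from collections import defaultdict


def merge_user_logs(interaction_log, search_log, play_log):
    """Group cleaned log records by user ID into one merged record per user."""
    merged = defaultdict(lambda: {"interactions": [], "searches": [], "plays": []})
    sources = (("interactions", interaction_log),
               ("searches", search_log),
               ("plays", play_log))
    for key, log in sources:
        for record in log:
            # Cleaning step (assumed rule): drop records without a user ID.
            if record.get("user_id") is None:
                continue
            merged[record["user_id"]][key].append(record)
    return dict(merged)


# Hypothetical sample logs.
interaction_log = [{"user_id": 1, "movie_id": 10, "action": "like"}]
search_log = [{"user_id": 1, "query": "space opera"},
              {"user_id": None, "query": "broken record"}]
play_log = [{"user_id": 2, "movie_id": 11, "watched_seconds": 1200}]

merged = merge_user_logs(interaction_log, search_log, play_log)
```

Each user ends up with a single record holding all three behavior types, which is the kind of merged input that step S2 is described as consuming.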
Step S2 further includes the following steps:
A1: Feature input: after receiving the merged data output by user behavior collection and processing, encode the user feature data and reduce its dimensionality;
A2: Feature learning: use the squeeze-and-excitation network SENET to re-weight the data of the input layer;
A3: Feature extraction: use three fully connected layers to form a deep neural network that extracts and mines deep latent semantic features from the input user and movie feature vectors;
A4: Feature matching: compute the cosine similarity between the extracted latent semantic feature vectors to obtain the matching score between a user and a movie.
In step A1, the user sparse features include sparse features with definite values and variable-length sparse features; after encoding, sparse features with definite values are output as low-dimensional vectors;
One-hot encoding is applied to the user dense features; the user sparse features are embedded into a low-dimensional space.
The processing of user sparse features falls into two categories:
The first handles sparse features with definite values. These are mainly encoded features such as user ID, user age and user occupation, for which each user has exactly one definite value. With a machine learning library such as PyTorch, torch.nn.Embedding can be used to create an embedding model that encodes each such feature as a low-dimensional vector.
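The lookup behavior described above can be sketched without any dependency. The class `EmbeddingTable`, the vocabulary sizes and the embedding dimension below are hypothetical; the sketch only mimics how an embedding table maps an integer category ID to a row vector, and it does not train anything.

```python
import random


class EmbeddingTable:
    """Minimal lookup table: one vector per integer category ID."""

    def __init__(self, num_embeddings, embedding_dim, seed=0):
        rng = random.Random(seed)
        # Rows are randomly initialised; in a real model they would be learned.
        self.weight = [[rng.gauss(0.0, 1.0) for _ in range(embedding_dim)]
                       for _ in range(num_embeddings)]

    def __call__(self, index):
        return self.weight[index]


# Assumed vocabulary sizes: 21 occupations and 7 age buckets.
occupation_table = EmbeddingTable(num_embeddings=21, embedding_dim=4)
age_table = EmbeddingTable(num_embeddings=7, embedding_dim=4, seed=1)

# Concatenate the two low-dimensional vectors for one user (list concatenation).
user_vector = occupation_table(3) + age_table(2)
```

In PyTorch, the corresponding table would be created with `torch.nn.Embedding(num_embeddings, embedding_dim)`, whose rows are updated during training.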
The second handles variable-length sparse features, which include viewing history and search history. The movie embedding sequence corresponding to the viewing history is weighted-averaged to obtain the watch vector.
In the weighting formula, $t$ denotes the current time, $t_0$ the viewing time, and $m_i$ the embedding vector of the $i$-th movie;
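A time-weighted average of this kind can be sketched as follows. The exact weight formula is not reproduced in this text, so the exponential decay `exp(-decay * (t - t0))` and the `decay` parameter are assumptions chosen only to illustrate that more recent views receive larger weights.

```python
import math


def time_weighted_average(embeddings, event_times, now, decay=0.1):
    """Average embeddings with weights that shrink as (now - event time) grows."""
    # Assumed weight form: exponential decay in the elapsed time.
    weights = [math.exp(-decay * (now - t0)) for t0 in event_times]
    total = sum(weights)
    dim = len(embeddings[0])
    return [sum(w * e[d] for w, e in zip(weights, embeddings)) / total
            for d in range(dim)]


# Two hypothetical movie embeddings: one watched long ago, one watched recently.
movie_embeddings = [[1.0, 0.0], [0.0, 1.0]]
watch_vector = time_weighted_average(movie_embeddings,
                                     event_times=[0.0, 9.0], now=10.0)
```

Here the second movie was watched more recently, so the second component of `watch_vector` is the larger one.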
The user's search history is processed similarly to the viewing history. The historical search keywords are first segmented into tokens, and the embedding vectors of the tokens are obtained by training. The embedding vectors of the user's historical search tokens are then weighted-averaged to obtain the search vector, which captures the overall state of the user's search history; the shorter the interval to the current time, the higher the weight of the embedding vector in the average.
That is, the search history is trained with dimension reduction to obtain embedding vectors, and the corresponding embedding sequence is weighted-averaged to obtain the search vector.
For training on these two variable-length sequence features, the system penalizes overly active users: it sets an upper limit on the sequence features of each user, so that the model is not dominated by a small number of overly active users and recommends for every user on an equal footing.
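The per-user upper limit on sequence features can be sketched as a simple truncation. Keeping the most recent events, the `(timestamp, item)` layout and the name `cap_history` are assumptions; the patent only states that an upper limit is set.

```python
def cap_history(events, max_len):
    """Keep at most max_len of the most recent (timestamp, item) events."""
    if max_len <= 0:
        return []
    # Sort by timestamp, then keep the tail (the newest events).
    return sorted(events, key=lambda event: event[0])[-max_len:]


# A hypothetical, very active user with three events and a cap of two.
capped = cap_history([(3, "c"), (1, "a"), (2, "b")], max_len=2)
```

With a cap applied to every user, a heavy user contributes no more sequence items to training than the limit allows.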
As shown in Figure 3, users' viewing and searching are often sequential, and some consecutive viewing behaviors are even causally related. Traditional recommendation models take both the preceding and the following context as input, which amounts to leaking future information into training. To solve this time-travel problem, the present invention completely separates future information from the training samples: it trains on the search records at T-2 and the viewing history at T-1.
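One possible reading of this sample construction is sketched below. It assumes that events are bucketed into integer time steps, that the label is what the user watched at step T, and that "T-1" and "T-2" mean "up to and including" those steps; all of these, and the name `build_sample`, are interpretations rather than details stated in the patent.

```python
def build_sample(watches, searches, T):
    """Build one training sample for step T without using any future information.

    watches / searches: dicts mapping an integer time step to a list of items.
    """
    # Viewing history: only steps up to T-1 (assumed to be inclusive).
    watch_history = [m for step in sorted(watches) if step <= T - 1
                     for m in watches[step]]
    # Search history: only steps up to T-2 (assumed to be inclusive).
    search_history = [q for step in sorted(searches) if step <= T - 2
                      for q in searches[step]]
    return {"watch_history": watch_history,
            "search_history": search_history,
            "label": watches.get(T, [])}


watches = {1: ["a"], 2: ["b"], 3: ["c"]}
searches = {1: ["q1"], 2: ["q2"], 3: ["q3"]}
sample = build_sample(watches, searches, T=3)
```

Nothing from step T or later appears in the inputs, so the sample cannot reveal the behavior it is supposed to predict.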
The search history and viewing history are interleaved for training. The input layer then concatenates the processed sparse and dense features, and uses the concatenated user and movie vectors as the initial embedding vectors.
In step A2, the squeeze-and-excitation network SENET dynamically learns the importance of these features: it suppresses ineffective low-frequency features by reducing their weights and amplifies important features by increasing theirs, thereby re-weighting the initial embedding vectors of users and movies. It comprises a Squeeze (compression) stage and an Excitation stage. The Squeeze stage compresses and summarizes the embedding vector of each feature received from step A1 to form an initial weight vector:
$$z_i = \frac{1}{k}\sum_{n=1}^{k} u_i^{(n)}$$
Suppose the embedding vector of a feature $u_i$ has dimension $k$. Averaging its $k$ entries $u_i^{(n)}$ yields a value $z_i$ that summarizes the information of this feature;
Through the Squeeze stage, every feature $u_i$ is compressed into a single value $z_i$. If the concatenated embedding vector has $f$ features, the initial weight vector is $Z=\{z_1, z_2, \ldots, z_f\}$;
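The Squeeze stage described above amounts to mean pooling over each feature's embedding. A minimal sketch, with the function name `squeeze` and the toy embeddings chosen for illustration:

```python
def squeeze(feature_embeddings):
    """Compress each feature's k-dimensional embedding into its mean value z_i."""
    return [sum(vec) / len(vec) for vec in feature_embeddings]


# Two hypothetical features (f = 2), each with a 3-dimensional embedding (k = 3).
Z = squeeze([[1.0, 2.0, 3.0], [0.0, 0.5, 1.0]])
```

The result has one entry per feature, regardless of the embedding dimension.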
The Excitation stage performs feature crossing on the initial weight vector output by the Squeeze stage while preserving the output dimension. It introduces a two-layer MLP with a relatively narrow middle layer, applied to the output vector $Z$ of the Squeeze stage:
$$S = F_{ex}(Z, W) = \delta(W_2\,\delta(W_1 Z))$$
where $\delta$ is the activation function, the first MLP layer performs feature crossing, and the second MLP layer preserves the dimension of the output.
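The Excitation formula can be sketched directly. The patent does not name the activation δ, so ReLU is an assumption here, as are the toy weight matrices; the final `reweight` step, which scales each feature embedding by its learned weight, is the usual way SENET weights are applied and is likewise an assumption about this system.

```python
def relu(x):
    return max(0.0, x)


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def excitation(Z, W1, W2, act=relu):
    """S = act(W2 * act(W1 * Z)): a narrow hidden layer, then back to len(Z)."""
    hidden = [act(h) for h in matvec(W1, Z)]
    return [act(s) for s in matvec(W2, hidden)]


def reweight(feature_embeddings, S):
    """Scale each feature's embedding by its weight s_i (assumed application step)."""
    return [[s * x for x in vec] for vec, s in zip(feature_embeddings, S)]


Z = [2.0, 0.5, 1.0]              # f = 3 squeezed values
W1 = [[0.5, 0.5, 0.5]]           # 3 -> 1 (narrow middle layer)
W2 = [[1.0], [0.0], [2.0]]       # 1 -> 3 (back to f weights)
S = excitation(Z, W1, W2)
weighted = reweight([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]], S)
```

In this toy case the second feature receives weight 0 and is suppressed, while the third is amplified.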
In step A3, three fully connected layers form a deep neural network (DNN) that extracts and mines deep latent semantic features from the input user and movie feature vectors. The latent semantic feature vector $y$ is expressed as:
$$l_i = f(W_i\, l_{i-1} + b_i),\quad i = 2, \ldots, N-1$$
$$y = f(W_N\, l_{N-1} + b_N)$$
where $\{l_i,\ i = 1, 2, \ldots, N-1\}$ denotes the outputs of the fully connected layers, and $W_i$ and $b_i$ denote the weight matrix and bias term of the $i$-th layer, respectively;
$f$ denotes the tanh activation function:
$$f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
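The layer recursion above can be sketched as a forward pass through three tanh layers. The weights and layer sizes are toy values, and applying tanh in the first layer as well is an assumption, since the formulas in this text start at $i = 2$.

```python
import math


def dense_tanh(W, b, x):
    """One fully connected layer: tanh(W x + b)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def extract_latent(x, layers):
    """Apply the layers in order; the last output is the latent vector y."""
    out = x
    for W, b in layers:
        out = dense_tanh(W, b, out)
    return out


# Three hypothetical layers: 2 -> 2 -> 2 -> 1.
layers = [
    ([[0.5, -0.5], [0.25, 0.75]], [0.0, 0.1]),
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]
y = extract_latent([1.0, -1.0], layers)
```

In the described system the same kind of tower would be applied to both the user vector and the movie vector, producing $y_U$ and $y_M$.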
In step A4, the cosine similarity between the latent semantic feature vectors extracted in step A3 is computed to obtain the matching score between a user and a movie:
$$\cos(y_U, y_M) = \frac{y_U^{\top} y_M}{\lVert y_U \rVert\,\lVert y_M \rVert}$$
where $y_U$ and $y_M$ denote the final latent semantic feature vectors of the user and the movie, respectively, and $\lVert\cdot\rVert$ denotes the vector norm (modulus);
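The matching score can be computed with a few lines of code. Returning 0.0 for a zero vector is a convention chosen for this sketch, not something the patent specifies.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention for degenerate vectors
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Vectors pointing the same way score close to 1, and orthogonal vectors score 0.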
The model input consists of one user and a set of candidate movies. The movie set contains a certain proportion of positive and negative samples, divided according to the user's degree of preference for each movie: a preference greater than 0.5 marks a positive sample, and a preference less than 0.3 marks a negative sample. The degree of preference is computed from the following terms:
a viewing-duration score for user $u_i$ on movie $m_j$, which is 1 point if the viewing duration is less than 30% of the movie's length and 2 points if it is more than 70%; and the scores for user $u_i$ liking and reposting movie $m_j$, which this patent sets to 3 points and 4 points respectively.
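The thresholds stated above can be sketched as two small helpers. The text does not say how a viewing ratio between 30% and 70%, or a preference between 0.3 and 0.5, is handled, so returning `None` in those cases is an assumption; the function names are hypothetical, and the combination of the scores into a preference value is not reproduced here.

```python
def watch_score(watched_ratio):
    """Viewing-duration score: 1 below 30% of the movie, 2 above 70%."""
    if watched_ratio < 0.3:
        return 1
    if watched_ratio > 0.7:
        return 2
    return None  # middle range is not specified in the text


def label_sample(preference):
    """Positive (1) above 0.5, negative (0) below 0.3, otherwise unlabeled."""
    if preference > 0.5:
        return 1
    if preference < 0.3:
        return 0
    return None  # assumed: ambiguous samples are left out


labels = [label_sample(p) for p in (0.6, 0.1, 0.4)]
```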
During model training, the softmax function converts the cosine similarity between the final user and movie feature vectors into a posterior probability.
In the posterior, $\gamma$ denotes the smoothing factor of the softmax function. The loss function is minimized by maximum likelihood estimation, with a time weight added so that the objective is to maximize the user's viewing duration.
In the loss, $T_j$ denotes the duration of the $j$-th movie, $M$ the candidate movie set, $\Lambda$ the model parameters, and $M^{+}$ the positive samples among the candidate movies.
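The training objective can be sketched as a softmax over scaled similarities followed by a weighted negative log-likelihood. The exact loss is not reproduced in this text, so weighting each positive sample's log-probability by its duration $T_j$ is an assumed form, as are the value of `gamma` and the toy inputs.

```python
import math


def posterior(similarities, gamma):
    """Softmax over gamma-scaled cosine similarities of the candidate movies."""
    scaled = [gamma * s for s in similarities]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def time_weighted_nll(similarities, durations, positive_indices, gamma):
    """Assumed loss: -sum over positives of T_j * log P(movie_j | user)."""
    probs = posterior(similarities, gamma)
    return -sum(durations[j] * math.log(probs[j]) for j in positive_indices)


similarities = [0.9, 0.1, -0.3]     # one user against three candidate movies
durations = [120.0, 90.0, 100.0]    # hypothetical T_j values, in minutes
probs = posterior(similarities, gamma=10.0)
loss = time_weighted_nll(similarities, durations, positive_indices=[0], gamma=10.0)
```

A larger `gamma` sharpens the distribution, so the most similar movie takes more of the probability mass.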
In step S3, the user's already-trained feature vector is retrieved according to the user's attribute features, and approximate nearest neighbor search recalls from the movie vector library a subset of movies to recommend to the user. In the ranking stage, the movies the user has already watched are removed, the similarity between each remaining movie and the user feature vector is computed and used as the ranking criterion, and the recommendation list is returned.
The implementation flow is shown in Figure 4. The user first enters the system and logs in. The system determines whether the user is a new user. If not, the system already has the user's historical data and has obtained the user's feature vector in the latent semantic space, which is indexed by user ID and stored in the Faiss framework. The system retrieves this feature vector by user ID and uses it to perform an approximate nearest neighbor (ANN) search in the movie vector library, recalling a candidate set of movies the user may be interested in. Because the candidate set may contain movies the user has already watched, the system filters it again in the ranking stage to remove them, computes the similarity between each remaining movie and the user feature vector as the ranking criterion, and returns the top-N recommendation list.
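The recall, filter and rank flow can be sketched with an exact brute-force search standing in for the ANN index. This is not Faiss and not approximate; it only illustrates the order of operations. The function names, `recall_k`, `top_n` and the toy vectors are assumptions.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(user_vec, movie_vectors, watched, recall_k, top_n):
    """Recall the recall_k nearest movies, drop watched ones, rank the rest."""
    # Recall: exact nearest-neighbour search (stand-in for an ANN index).
    scored = sorted(((cosine(user_vec, vec), movie_id)
                     for movie_id, vec in movie_vectors.items()), reverse=True)
    candidates = [movie_id for _, movie_id in scored[:recall_k]]
    # Ranking stage: remove movies the user has already watched.
    remaining = [movie_id for movie_id in candidates if movie_id not in watched]
    remaining.sort(key=lambda movie_id: cosine(user_vec, movie_vectors[movie_id]),
                   reverse=True)
    return remaining[:top_n]


movie_vectors = {"m1": [1.0, 0.0], "m2": [0.9, 0.1],
                 "m3": [0.0, 1.0], "m4": [0.5, 0.5]}
top = recommend([1.0, 0.0], movie_vectors, watched={"m1"}, recall_k=3, top_n=2)
```

Here `m1` is the closest movie but has already been watched, so the list starts with `m2`.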
It will be apparent to those skilled in the art that the present invention is not limited to the details of the exemplary embodiments above, and that it can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments are therefore to be regarded in all respects as illustrative and not restrictive. The scope of the invention is defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced by the invention. No reference sign in the claims shall be construed as limiting the claim concerned.
In addition, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. The specification is written this way only for clarity. Those skilled in the art should treat the specification as a whole, and the technical solutions in the embodiments may be suitably combined to form other implementations that those skilled in the art can understand.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210132815.XA CN114637909A (en) | 2022-02-14 | 2022-02-14 | Film recommendation system and method based on improved deep structured semantic model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114637909A true CN114637909A (en) | 2022-06-17 |
Family
ID=81946036
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210132815.XA Pending CN114637909A (en) | 2022-02-14 | 2022-02-14 | Film recommendation system and method based on improved deep structured semantic model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114637909A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE202023104110U1 (en) | 2023-07-23 | 2023-07-28 | Upasana Adhikari | Intelligent encryption-based system for movie recommendations |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180067935A1 (en) * | 2017-08-24 | 2018-03-08 | Prakash Kumar | Systems and methods for digital media content search and recommendation |
CN110162706A (en) * | 2019-05-22 | 2019-08-23 | 南京邮电大学 | A kind of personalized recommendation method and system based on interaction data cluster |
CN113051468A (en) * | 2021-02-22 | 2021-06-29 | 山东师范大学 | Movie recommendation method and system based on knowledge graph and reinforcement learning |
Non-Patent Citations (2)
Title |
---|
XUGANG YE 等: "Enhancing Retrieval and Ranking Performance for Media Search Engine by Deep Learning", 2016 49TH HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES, 31 December 2016 (2016-12-31), pages 1 - 7 * |
常志 等: "基于深度学习的视频描述方法研究综述", 天津理工大学学报, vol. 36, no. 6, 31 December 2020 (2020-12-31), pages 1 - 7 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112765486B (en) | A Movie Recommendation Method Integrating the Attention Mechanism of Knowledge Graph | |
CN110442781B (en) | Pair-level ranking item recommendation method based on generation countermeasure network | |
CN109874053B (en) | Short video recommendation method based on video content understanding and user dynamic interest | |
WO2021139415A1 (en) | Data processing method and apparatus, computer readable storage medium, and electronic device | |
CN110377840A (en) | A kind of music list recommended method and system based on user's shot and long term preference | |
Dezfouli et al. | Deep neural review text interaction for recommendation systems | |
CN112464100B (en) | Information recommendation model training method, information recommendation method, device and equipment | |
CN113051468B (en) | Movie recommendation method and system based on knowledge graph and reinforcement learning | |
CN115618101A (en) | Streaming media content recommendation method, device and electronic equipment based on negative feedback | |
Mehta et al. | Movie recommendation systems using sentiment analysis and cosine similarity | |
Zarzour et al. | RecDNNing: a recommender system using deep neural network with user and item embeddings | |
Liu et al. | Building effective short video recommendation | |
CN117056609A (en) | Session recommendation method based on multi-layer aggregation enhanced contrast learning | |
CN112380451A (en) | Favorite content recommendation method based on big data | |
Fazelnia et al. | Variational user modeling with slow and fast features | |
Chakder et al. | Graph network based approaches for multi-modal movie recommendation system | |
CN116150487A (en) | Multi-mode information deviation-removing recommendation method for breaking through information cocoon houses | |
CN114637909A (en) | Film recommendation system and method based on improved deep structured semantic model | |
CN114117233B (en) | A conversational news recommendation method and recommendation system based on user implicit feedback | |
CN113468413B (en) | Multi-user sharing-oriented multimedia network video recommendation method | |
CN113688281B (en) | Video recommendation method and system based on deep learning behavior sequence | |
Mirhasani et al. | Alleviation of cold start in movie recommendation systems using sentiment analysis of multi-modal social networks | |
Naskar et al. | Implementation of Movie Recommendation System Using Hybrid Filtering Methods and Sentiment Analysis of Movie Reviews | |
CN116340647A (en) | Decoupling negative sampling method and system based on contrast learning | |
CN114090848A (en) | Data recommendation and classification method, feature fusion model and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||