CN112364245A - Top-K movie recommendation method based on heterogeneous information network embedding - Google Patents
- Publication number
- CN112364245A
- Authority
- CN
- China
- Prior art keywords
- node
- nodes
- data
- information network
- heterogeneous information
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The Top-K movie recommendation method based on heterogeneous information network embedding comprises the following steps: step 1, preprocessing the data; step 2, embedding learning of the heterogeneous information network; step 3, propagating information in the heterogeneous information network; step 4, aggregating node information and edge information; step 5, predicting scores; and step 6, Top-K evaluation. The invention improves heterogeneous information network learning by explicitly adding the edges between nodes to the learning process, and it applies the improved learning method to the recommendation task. The method fully captures the relations among the different types of nodes in movie data. Compared with a traditional homogeneous network, it obtains richer semantic information. Compared with existing heterogeneous information network learning methods, it also exploits edge information. This reduces information loss during learning and improves the utilization of the information in the heterogeneous information network.
Description
Technical Field
The invention relates to a movie recommendation method.
Background
The rapid development of the Internet has given people access to abundant information that meets their needs. As the amount of information grows explosively, however, people encounter more and more information in daily life, yet less and less of it is actually useful to them. This is the problem of information overload: faced with massive amounts of information, users cannot quickly find what they need because their knowledge and cognitive capacity are limited.
Initially, the main approaches to information overload were categorized directories and search engines, such as Yahoo and Google. As data volumes grew rapidly, these approaches could no longer meet people's needs, and recommender systems emerged. A recommender system infers a user's interests by analyzing the user's historical behavior. It then proactively pushes information the user is likely to find interesting.
Early research on recommendation algorithms focused mainly on collaborative filtering, which achieved good results. Collaborative filtering falls into two major categories: neighborhood-based and model-based. Neighborhood-based collaborative filtering is further divided into user-based and item-based collaborative filtering. Model-based methods mainly include SVM models, Bayesian network models, latent factor models, and the like.
These methods, however, consider only homogeneous networks, which cannot model the complex real world well. This motivated the introduction of heterogeneous information networks. A heterogeneous information network contains multiple types of nodes and relations (more than two types in total). It can therefore describe complex real-world relations well and improve recommendation accuracy.
Current research on heterogeneous information networks for recommendation focuses mainly on node embedding learning. It follows two general directions: meta-path methods and methods that apply graph neural networks directly. Both vectorize the nodes to capture the structural information of the heterogeneous information network. They then complete the recommendation task by combining the node vectors with a classical recommendation algorithm. Most of these methods, however, concentrate on processing nodes and ignore the edges between them. Because a heterogeneous information network has many node types, the edges between nodes are also of many types and carry a great deal of information, which these methods discard.
Disclosure of Invention
To overcome these shortcomings of the prior art and incorporate the rich edge information of a heterogeneous information network into the recommendation model, the invention provides a new recommendation method based on heterogeneous information networks.
The method proceeds in three stages. First, it embeds the nodes and edges of the heterogeneous information network with the TransR method to obtain initial vector representations of both. Next, it aggregates the node vectors and edge vectors to obtain vector representations of users and items. Finally, it completes the Top-K recommendation task.
The Top-K movie recommendation method based on heterogeneous information network embedding comprises the following specific steps:
step 1, preprocessing data, specifically comprising:
1.1 cleaning data; the raw data are cleaned by filtering invalid records out of the original data set. Invalid records are data of users whose number of viewings is smaller than a preset value and data of movies whose number of ratings is smaller than the preset value. The cleaned data are then used to construct the training data and test data;
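The cleaning rule in step 1.1 can be sketched as a single filtering pass over (user, movie) interaction pairs. This is a minimal sketch under several assumptions: the raw data are a list of such pairs, the counts are taken once on the raw data, and the function name `clean_ratings` and its parameter `min_count` are hypothetical. The default of 20 follows the preferred preset value.

```python
from collections import Counter
from typing import Hashable, List, Tuple

Pair = Tuple[Hashable, Hashable]  # (user_id, movie_id)


def clean_ratings(ratings: List[Pair], min_count: int = 20) -> List[Pair]:
    """Hypothetical helper: drop pairs whose user or movie has fewer than
    `min_count` interactions in the raw data (one filtering pass)."""
    user_counts = Counter(user for user, _ in ratings)
    movie_counts = Counter(movie for _, movie in ratings)
    return [
        (user, movie)
        for user, movie in ratings
        if user_counts[user] >= min_count and movie_counts[movie] >= min_count
    ]


if __name__ == "__main__":
    toy = [("u1", "m1"), ("u1", "m2"), ("u2", "m1"), ("u2", "m2"), ("u3", "m1")]
    print(clean_ratings(toy, min_count=2))  # u3 has only one rating and is dropped
```

Removing a user can push a movie below the threshold, and the reverse can also happen. An implementation may therefore repeat the pass until nothing changes; the source does not say whether it does.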
1.2 constructing heterogeneous information network data, training data, and test data; a heterogeneous information network is constructed from the cleaned data. The cleaned data are converted into triples that represent the network, in the following form:
$$(h, r, t) \tag{1}$$
wherein h represents a head node, t represents a tail node, and r represents the relationship between the head node h and the tail node t, i.e. the edge between the head node h and the tail node t;
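The triple construction can be sketched as follows. The sketch assumes that ratings arrive as (user, movie) pairs and that movie metadata is a dictionary of attribute lists. The relation names (`interact`, `director_of`, ...), the attribute-to-movie edge direction, and the helper `build_triples` are hypothetical choices that do not come from the source.

```python
from typing import Dict, Hashable, List, Tuple

Triple = Tuple[int, int, int]  # (head id, relation id, tail id)


def build_triples(
    ratings: List[Tuple[Hashable, Hashable]],
    movie_meta: Dict[Hashable, Dict[str, List[Hashable]]],
) -> Tuple[List[Triple], Dict[Tuple[str, Hashable], int], Dict[str, int]]:
    """Hypothetical helper: encode nodes and relations as integer ids and emit
    one triple per user-movie interaction and per movie attribute."""
    entity_id: Dict[Tuple[str, Hashable], int] = {}
    relation_id: Dict[str, int] = {}

    def eid(kind: str, name: Hashable) -> int:
        # Key on (node type, raw name) so that, e.g., a user and an actor
        # with the same raw name become different nodes.
        return entity_id.setdefault((kind, name), len(entity_id))

    def rid(name: str) -> int:
        return relation_id.setdefault(name, len(relation_id))

    triples: List[Triple] = []
    for user, movie in ratings:
        triples.append((eid("user", user), rid("interact"), eid("movie", movie)))
    for movie, attrs in movie_meta.items():
        for kind, values in attrs.items():  # e.g. "director", "actor"
            for value in values:
                triples.append((eid(kind, value), rid(kind + "_of"), eid("movie", movie)))
    return triples, entity_id, relation_id
```

Genre nodes could be added the same way by including a `"genre"` list in `movie_meta`.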
step 2, embedding learning of the heterogeneous information network, specifically comprising:
2.1 initializing the embeddings; the vectors of the nodes and edges in the heterogeneous information network are first initialized with the TransR model. Nodes and edges are represented with vectors of the same dimension: $E_h$, $E_t$, and $E_r$ for the head node, tail node, and edge, respectively. The nodes are then mapped according to the relation type. Each relation $r$ has a mapping matrix $M_r$ that maps the nodes into the vector space of relation $r$, as follows:
$$E_h^{r} = M_r E_h, \qquad E_t^{r} = M_r E_t$$
where $E_h^{r}$ and $E_t^{r}$ denote the vectors of nodes $h$ and $t$ after mapping into the space of $r$;
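The relation-specific mapping of step 2.1 is a matrix-vector product, E_h^r = M_r E_h. The sketch below uses plain Python lists. It assumes M_r is stored row-major with one row per output dimension. In the source all vectors share one dimension, so M_r is square, but the code also accepts a rectangular matrix. The names `matvec` and `project_pair` are illustrative.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # list of rows


def matvec(M: Matrix, v: Vector) -> Vector:
    """Return M @ v for a row-major matrix M."""
    if any(len(row) != len(v) for row in M):
        raise ValueError("matrix columns must match vector length")
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def project_pair(M_r: Matrix, e_h: Vector, e_t: Vector) -> Tuple[Vector, Vector]:
    """Map head and tail vectors into the space of relation r (TransR-style)."""
    return matvec(M_r, e_h), matvec(M_r, e_t)
```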
2.2 learning the heterogeneous information network; the node and edge vector representations obtained by initialization are used to learn the heterogeneous information network through a scoring function:
$$f(h, r, t) = \lVert E_h^{r} + E_r - E_t^{r} \rVert_2^2$$
where $f(h, r, t)$ denotes the scoring function. This function pulls connected nodes close to each other and pushes unconnected nodes apart. The loss function $L_1$ of the learning process is defined over pairs of positive and negative samples,
where $(h, r, t) \in G$ denotes a positive sample in the heterogeneous information network, $(h, r, t') \notin G$ is a negative sample, and $G$ denotes the heterogeneous information network;
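The block below sketches the TransR-style score together with one possible form of L1. The source does not give the loss formula in recoverable form. The pairwise form -ln σ(f(h, r, t') - f(h, r, t)) used here follows KGAT-style models and is an assumption, as are all helper names.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transr_score(e_h: Vector, e_r: Vector, e_t: Vector, M_r: Matrix) -> float:
    """f(h, r, t) = ||M_r e_h + e_r - M_r e_t||^2; lower means more plausible."""
    h_r, t_r = matvec(M_r, e_h), matvec(M_r, e_t)
    return sum((a + b - c) ** 2 for a, b, c in zip(h_r, e_r, t_r))


def softplus(z: float) -> float:
    """Numerically stable ln(1 + e^z)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def kg_pair_loss(pos_score: float, neg_score: float) -> float:
    """Assumed L1 term: -ln sigmoid(f(neg) - f(pos)) == softplus(f(pos) - f(neg)).
    Small when the true triple scores lower (better) than the corrupted one."""
    return softplus(pos_score - neg_score)
```

L1 would then be the sum of `kg_pair_loss` over sampled (h, r, t, t') tuples.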
step 3, propagating information in the heterogeneous information network, specifically comprising:
3.1 calculating the attention scores between nodes and their neighbors;
meta-path methods rely on path instances prepared in advance. The invention instead computes the attention scores between a node and its neighbors directly from the node's connectivity in the heterogeneous information network. For example, the attention score $\pi(h, r, t)$ between node $h$ and its neighbor $t$ is computed with $\tanh(\cdot)$ as the activation function. The more closely a node is associated with a neighbor, the greater the attention score. Because a node has multiple neighbors and therefore multiple attention scores, the scores are normalized:
$$\tilde{\pi}(h, r, t) = \frac{\exp(\pi(h, r, t))}{\sum_{(h, r', t') \in \mathcal{N}_h} \exp(\pi(h, r', t'))}$$
where $\mathcal{N}_h$ is the set of triples whose head node is $h$, the numerator $\exp(\pi(h, r, t))$ is the attention score between node $h$ and one neighbor $t$, and the denominator is the sum of the attention scores over all neighbors of node $h$;
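The source states only that π(h, r, t) uses tanh(·) as its activation. The sketch below assumes the KGAT-style form π(h, r, t) = (M_r E_t)ᵀ tanh(M_r E_h + E_r) and then applies the softmax normalization described above. The functional form and the names `attention_score` and `normalize` are assumptions.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def attention_score(e_h: Vector, e_r: Vector, e_t: Vector, M_r: Matrix) -> float:
    """Assumed score: (M_r e_t)^T tanh(M_r e_h + e_r)."""
    h_r, t_r = matvec(M_r, e_h), matvec(M_r, e_t)
    return sum(t * math.tanh(h + r) for t, h, r in zip(t_r, h_r, e_r))


def normalize(scores: List[float]) -> List[float]:
    """Softmax over one node's neighbors: exp(pi) / sum of exp(pi)."""
    m = max(scores)  # subtracting the max avoids overflow; the result is unchanged
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]
```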
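3.2 propagating information between nodes, in which information is aggregated from the neighbor nodes into the current node. Taking the head node $h$ of the triple $(h, r, t)$ as an example, its neighbor set is $\mathcal{N}_h = \{(h, r, t) \mid (h, r, t) \in G\}$. The neighbor representation of node $h$ is the weighted sum of its neighbors' vectors:
$$e_{\mathcal{N}_h} = \sum_{(h, r, t) \in \mathcal{N}_h} \tilde{\pi}(h, r, t)\, E_t$$
where $\tilde{\pi}(h, r, t)$ is the normalized attention score from step 3.1;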
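Step 3.2 can be sketched as follows, assuming the neighbor information is the attention-weighted sum of the neighbors' vectors E_t. The names `neighbors_of` and `aggregate_neighbors` are hypothetical. Triples are integer (h, r, t) tuples.

```python
from typing import List, Sequence, Tuple

Triple = Tuple[int, int, int]


def neighbors_of(h: int, triples: Sequence[Triple]) -> List[Tuple[int, int]]:
    """N_h: the (relation, tail) pairs of all triples whose head is h."""
    return [(r, t) for head, r, t in triples if head == h]


def aggregate_neighbors(weights: Sequence[float],
                        neighbor_vecs: Sequence[Sequence[float]]) -> List[float]:
    """Weighted sum of neighbor vectors using normalized attention weights."""
    if len(weights) != len(neighbor_vecs) or not neighbor_vecs:
        raise ValueError("need one weight per neighbor and at least one neighbor")
    out = [0.0] * len(neighbor_vecs[0])
    for w, vec in zip(weights, neighbor_vecs):
        for k, x in enumerate(vec):
            out[k] += w * x
    return out
```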
step 4, aggregating node information and edge information; the edges between node $h$ and its neighbors are first aggregated into a single edge representation. This edge representation, the initial representation $E_h$ of node $h$, and the neighbor information of node $h$ are then combined by an aggregation function with LeakyReLU(·) as the activation function. Aggregating the node and edge representations in this way fully mines the information in the heterogeneous information network;
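The source fixes only the activation (LeakyReLU) for step 4. One plausible reading is sketched here purely as an assumption. It averages the edge vectors E_r of node h's neighbors, adds E_h, that edge summary, and the neighbor information, and passes the sum through a trainable matrix W and LeakyReLU. The mean, the sum-then-linear form, the name W, and the slope of 0.2 are all illustrative choices.

```python
from typing import List, Sequence

Vector = List[float]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def mean_vector(vecs: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean, used here as an assumed edge aggregator."""
    n = len(vecs)
    return [sum(col) / n for col in zip(*vecs)]


def fuse_node(W: Sequence[Sequence[float]], e_h: Sequence[float],
              edge_vecs: Sequence[Sequence[float]], e_nbrs: Sequence[float]) -> Vector:
    """Hypothetical step-4 aggregator: LeakyReLU(W (E_h + mean(E_r) + e_N_h))."""
    e_edges = mean_vector(edge_vecs)
    summed = [a + b + c for a, b, c in zip(e_h, e_edges, e_nbrs)]
    return [leaky_relu(sum(w * x for w, x in zip(row, summed))) for row in W]
```

Concatenating the three vectors before W would be an equally plausible alternative. The sum keeps the output dimension equal to the input dimension.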
step 5, predicting scores; the above steps yield a representation $E_u$ of each user node and a representation $E_i$ of each item node;
the predicted score $\hat{y}(u, i)$ is the inner product of the user node vector and the item node vector:
$$\hat{y}(u, i) = E_u^{\top} E_i$$
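The inner-product prediction is a dot product between the final user and item vectors. A minimal sketch follows, with `predict` and `score_items` as illustrative names.

```python
from typing import Dict, Hashable, Sequence


def predict(e_u: Sequence[float], e_i: Sequence[float]) -> float:
    """y_hat(u, i) = E_u . E_i"""
    if len(e_u) != len(e_i):
        raise ValueError("user and item vectors must have the same dimension")
    return sum(a * b for a, b in zip(e_u, e_i))


def score_items(e_u: Sequence[float],
                item_vecs: Dict[Hashable, Sequence[float]]) -> Dict[Hashable, float]:
    """Predicted score of one user for every item."""
    return {item: predict(e_u, vec) for item, vec in item_vecs.items()}
```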
the loss function $L_2$ of score prediction is defined over the data set
$$D = \{(u, i, j) \mid (u, i) \in R^{+}, (u, j) \in R^{-}\} \tag{15}$$
where $D$ is the data set, $(u, i) \in R^{+}$ denotes a positive sample, and $(u, j) \in R^{-}$ denotes a negative sample; the total loss function $L_{total}$ is:
$$L_{total} = L_1 + L_2 \tag{16}$$
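The data set D of (u, i, j) triples suggests a pairwise ranking objective. The sketch below assumes the Bayesian personalized ranking (BPR) form L2 = Σ -ln σ(ŷ(u, i) - ŷ(u, j)), which the source does not state explicitly. It then adds L2 to L1 as in formula (16). All names are hypothetical.

```python
import math
from typing import Dict, Hashable, Iterable, Sequence, Tuple


def softplus(z: float) -> float:
    """Numerically stable ln(1 + e^z)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def bpr_loss(D: Iterable[Tuple[Hashable, Hashable, Hashable]],
             user_vecs: Dict[Hashable, Sequence[float]],
             item_vecs: Dict[Hashable, Sequence[float]]) -> float:
    """Assumed L2: sum over (u, i, j) in D of -ln sigmoid(y_ui - y_uj)."""
    total = 0.0
    for u, i, j in D:
        diff = dot(user_vecs[u], item_vecs[i]) - dot(user_vecs[u], item_vecs[j])
        total += softplus(-diff)  # -ln sigmoid(x) == softplus(-x)
    return total


def total_loss(l1: float, l2: float) -> float:
    """L_total = L1 + L2 (formula 16)."""
    return l1 + l2
```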
step 6, Top-K evaluation; the recommendation method is evaluated with two commonly used metrics, HR@K and NDCG@K:
$$HR@K = \frac{\text{Number of Hits}@K}{|GT|}, \qquad NDCG@K = Z_k \sum_{i=1}^{K} \frac{2^{rel_i} - 1}{\log_2(i + 1)}$$
where $K$ denotes the first $K$ items of the recommendation result; $GT$ denotes the test set data; $rel_i$ denotes the relevance at the $i$-th position, typically 1 if the item at the $i$-th position is in the test set and 0 otherwise; and $Z_k$ is the normalization coefficient.
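The two metrics can be sketched with their common definitions, which are assumed here. HR@K is the number of test items found in the top-K list divided by |GT|. NDCG@K uses the gain 2^{rel_i} - 1 and the discount log2(i + 1), with Z_k chosen so that an ideal ranking scores 1. The function names are hypothetical. Some papers instead define HR per user as 1 if any test item is hit; with a single test item per user the two definitions coincide.

```python
import math
from typing import Hashable, Sequence, Set


def hit_ratio_at_k(ranked: Sequence[Hashable], ground_truth: Set[Hashable], k: int) -> float:
    """HR@K: fraction of test items that appear in the top-k list."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    hits = sum(1 for item in ranked[:k] if item in ground_truth)
    return hits / len(ground_truth)


def ndcg_at_k(ranked: Sequence[Hashable], ground_truth: Set[Hashable], k: int) -> float:
    """NDCG@K with binary relevance rel_i in {0, 1}."""
    # Position i (1-based) is discounted by log2(i + 1); enumerate is 0-based, hence i + 2.
    dcg = sum(
        (2 ** (1 if item in ground_truth else 0) - 1) / math.log2(i + 2)
        for i, item in enumerate(ranked[:k])
    )
    # Z_k = 1 / IDCG, the DCG of an ideal list with all relevant items first.
    ideal_hits = min(len(ground_truth), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```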
Preferably, the preset value in step 1.1 is 20.
The invention builds on a recent heterogeneous information network learning approach. It fuses the relations between nodes into the learning of the heterogeneous information network and fully mines the information in the heterogeneous network. Its innovation lies in improving heterogeneous information network learning: the edges between nodes are explicitly added to the learning process, and the improved learning method is applied to the recommendation task. This fully captures the relations among the different types of nodes in movie data. Compared with a traditional homogeneous network, richer semantic information is obtained. Compared with existing heterogeneous information network learning methods, edge information is also obtained, which reduces information loss during learning and improves the utilization of the information in the heterogeneous information network.
Drawings
FIG. 1 is a general flow diagram of the process of the present invention.
Detailed Description
The input data of the proposed method consist of two parts: heterogeneous information graph data, i.e., triples, and rating data for training and testing. The output of the method is a list of the top K movies for each user.
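Each user's top-K list can be produced from the learned vectors by ranking candidate movies by the inner-product score. The sketch excludes movies the user already interacted with in the training data; this is a common practice that the source does not state, so it is an assumption. `top_k_movies` is a hypothetical name.

```python
import heapq
from typing import Dict, Hashable, List, Sequence, Set


def top_k_movies(user_vec: Sequence[float],
                 movie_vecs: Dict[Hashable, Sequence[float]],
                 seen: Set[Hashable],
                 k: int) -> List[Hashable]:
    """Return the ids of the k highest-scoring movies not in `seen`."""
    scores = {
        movie: sum(a * b for a, b in zip(user_vec, vec))
        for movie, vec in movie_vecs.items()
        if movie not in seen
    }
    # nlargest keeps the k best without sorting every candidate.
    return heapq.nlargest(k, scores, key=scores.get)
```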
As shown in fig. 1, the Top-K movie recommendation method based on heterogeneous information network embedding of the present invention includes the following steps:
step 1, preprocessing data, specifically:
1.1 cleaning data; users who have watched fewer than 20 movies and movies that have been watched fewer than 20 times are removed from the movie data, which completes the data cleaning;
1.2 constructing the heterogeneous information network and the rating data set; users, movies, directors, actors, and genres are encoded. The relations between users and movies, directors and movies, and actors and movies are also encoded, and these encodings are used to build the triples and the rating data set. The rating data set is randomly divided into training data and test data, and the training samples include positive and negative samples. The heterogeneous information network is constructed from the cleaned data, which are converted into triples that represent the network, in the following form:
$$(h, r, t) \tag{1}$$
wherein h represents a head node, t represents a tail node, and r represents the relationship between the head node h and the tail node t, i.e. the edge between the head node h and the tail node t;
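The random division of the rating data and the construction of positive and negative training samples can be sketched as follows. Several details are assumptions: the test ratio, the fixed seed, one uniformly sampled negative per positive pair, and the name `split_and_sample`. The source says only that the split is random and that the training samples include positives and negatives.

```python
import random
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]  # (user id, movie id), movie ids in range(n_items)


def split_and_sample(ratings: List[Pair], n_items: int,
                     test_ratio: float = 0.2, seed: int = 0
                     ) -> Tuple[List[Tuple[int, int, int]], List[Pair]]:
    """Randomly split ratings into train/test and pair each training positive
    (u, i) with a movie j the user never rated, giving (u, i, j) triples."""
    rng = random.Random(seed)
    shuffled = ratings[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    test, train = shuffled[:n_test], shuffled[n_test:]

    seen: Dict[int, Set[int]] = {}
    for u, i in ratings:  # every known interaction is excluded from negatives
        seen.setdefault(u, set()).add(i)

    samples: List[Tuple[int, int, int]] = []
    for u, i in train:
        if len(seen[u]) >= n_items:
            continue  # no unrated movie left to use as a negative
        j = rng.randrange(n_items)
        while j in seen[u]:
            j = rng.randrange(n_items)
        samples.append((u, i, j))
    return samples, test
```

Excluding test interactions from the negatives keeps held-out positives from being trained as negatives. Other variants are possible.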
step 2, embedding learning of the heterogeneous information network;
2.1 initializing the embeddings; the constructed triples are fed to the heterogeneous information network embedding learning in the form of an adjacency matrix. The nodes are initialized by the following formulas, where $M_r$ is the mapping matrix of relation $r$, and $E_h^{r}$ and $E_t^{r}$ are the vector representations of nodes $h$ and $t$ after mapping into the space of $r$:
$$E_h^{r} = M_r E_h, \qquad E_t^{r} = M_r E_t$$
2.2 learning the heterogeneous information network; the embedding of the heterogeneous information network is learned through a scoring function $f(h, r, t)$:
$$f(h, r, t) = \lVert E_h^{r} + E_r - E_t^{r} \rVert_2^2$$
This function pulls connected nodes close to each other and pushes unconnected nodes apart. The loss function of this learning process is defined over pairs of positive and negative samples,
where $(h, r, t) \in G$ denotes a positive sample triple in the heterogeneous information network, $(h, r, t') \notin G$ is a negative sample, and $G$ denotes the heterogeneous information network;
step 3, propagating information in the heterogeneous information network; this step computes how information propagates between a node and its neighbors. A node has multiple neighbors, and each neighbor matters differently to the node. The weights between the node and its different neighbors are therefore computed first, and information is then propagated between the node and its neighbors. Specifically:
3.1 calculating the attention scores between nodes and their neighbors;
different neighbors matter differently to a node. The degree of importance, i.e., the weight between a node and its neighbor, is therefore measured by $\pi(h, r, t)$, computed with $\tanh(\cdot)$ as the activation function;
after the weights between the node and all its neighbors are computed, these importance values are normalized:
$$\tilde{\pi}(h, r, t) = \frac{\exp(\pi(h, r, t))}{\sum_{(h, r', t') \in \mathcal{N}_h} \exp(\pi(h, r', t'))}$$
where $\mathcal{N}_h = \{(h, r, t) \mid (h, r, t) \in G\}$ denotes the neighbors of node $h$, the numerator $\exp(\pi(h, r, t))$ is the attention score between node $h$ and one neighbor $t$, and the denominator is the sum of the attention scores over all neighbors of node $h$;
3.2 propagating information between nodes; the information propagated by the neighbors of a node is aggregated using the computed weights, where $e_{\mathcal{N}_h}$ denotes the information forwarded from the neighbors:
$$e_{\mathcal{N}_h} = \sum_{(h, r, t) \in \mathcal{N}_h} \tilde{\pi}(h, r, t)\, E_t$$
step 4, aggregating node information and edge information;
three kinds of information are aggregated: the node information, the information propagated by the node's neighbors, and the information of the edges between the node and its neighbors. The edges between node $h$ and its neighbors are first aggregated into a single edge representation. The three are then aggregated with LeakyReLU(·) as the activation function;
step 5, predicting scores; the above steps finally yield vector representations of the user nodes and movie nodes, denoted $E_u$ and $E_i$, respectively;
the predicted score $\hat{y}(u, i)$ is the inner product of the user node vector and the item node vector:
$$\hat{y}(u, i) = E_u^{\top} E_i$$
the loss function $L_2$ of the score prediction process is defined over
$$D = \{(u, i, j) \mid (u, i) \in R^{+}, (u, j) \in R^{-}\} \tag{15}$$
where $D$ is the data set, $(u, i) \in R^{+}$ denotes a positive sample, and $(u, j) \in R^{-}$ denotes a negative sample;
the total loss function of the entire model is $L_{total}$:
$$L_{total} = L_1 + L_2 \tag{16}$$
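step 6, Top-K evaluation; after the whole learning process is completed, the output of the model is evaluated. The output of the model is a list of the top K movie IDs for each user, and the recommendation result is evaluated with the two metrics HR@K and NDCG@K:
$$HR@K = \frac{\text{Number of Hits}@K}{|GT|}, \qquad NDCG@K = Z_k \sum_{i=1}^{K} \frac{2^{rel_i} - 1}{\log_2(i + 1)}$$
where $GT$ is the test set, $rel_i$ is 1 if the item at position $i$ is in the test set and 0 otherwise, and $Z_k$ is the normalization coefficient.
At this point, all steps of the recommendation are complete.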
The embodiments described in this specification merely illustrate implementations of the inventive concept. The scope of protection of the present invention should not be regarded as limited to the specific forms set forth in the embodiments. It also covers equivalent technical means that those skilled in the art can conceive of based on the inventive concept.
Claims (2)
1. A Top-K movie recommendation method based on heterogeneous information network embedding, comprising the following specific steps:
step 1, preprocessing data, specifically comprising:
1.1 cleaning data; the raw data are cleaned by filtering invalid records out of the original data set. Invalid records are data of users whose number of viewings is smaller than a preset value and data of movies whose number of ratings is smaller than the preset value. The cleaned data are then used to construct the training data and test data;
1.2 constructing heterogeneous information network data, training data, and test data; a heterogeneous information network is constructed from the cleaned data. The cleaned data are converted into triples that represent the network, in the following form:
$$(h, r, t) \tag{1}$$
wherein h represents a head node, t represents a tail node, and r represents the relationship between the head node h and the tail node t, i.e. the edge between the head node h and the tail node t;
step 2, embedding learning of the heterogeneous information network, specifically comprising:
2.1 initializing the embeddings; the vectors of the nodes and edges in the heterogeneous information network are first initialized with the TransR model. Nodes and edges are represented with vectors of the same dimension: $E_h$, $E_t$, and $E_r$ for the head node, tail node, and edge, respectively. The nodes are then mapped according to the relation type. Each relation $r$ has a mapping matrix $M_r$ that maps the nodes into the vector space of relation $r$, as follows:
$$E_h^{r} = M_r E_h, \qquad E_t^{r} = M_r E_t$$
where $E_h^{r}$ and $E_t^{r}$ denote the vectors of nodes $h$ and $t$ after mapping into the space of $r$;
2.2 learning the heterogeneous information network; the node and edge vector representations obtained by initialization are used to learn the heterogeneous information network through a scoring function:
$$f(h, r, t) = \lVert E_h^{r} + E_r - E_t^{r} \rVert_2^2$$
where $f(h, r, t)$ denotes the scoring function. This function pulls connected nodes close to each other and pushes unconnected nodes apart. The loss function $L_1$ of the learning process is defined over pairs of positive and negative samples,
where $(h, r, t) \in G$ denotes a positive sample in the heterogeneous information network, $(h, r, t') \notin G$ is a negative sample, and $G$ denotes the heterogeneous information network;
step 3, propagating information in the heterogeneous information network, specifically comprising:
3.1 calculating the attention scores between nodes and their neighbors;
meta-path methods rely on path instances prepared in advance. The invention instead computes the attention scores between a node and its neighbors directly from the node's connectivity in the heterogeneous information network. For example, the attention score $\pi(h, r, t)$ between node $h$ and its neighbor $t$ is computed with $\tanh(\cdot)$ as the activation function. The more closely a node is associated with a neighbor, the greater the attention score. Because a node has multiple neighbors and therefore multiple attention scores, the scores are normalized:
$$\tilde{\pi}(h, r, t) = \frac{\exp(\pi(h, r, t))}{\sum_{(h, r', t') \in \mathcal{N}_h} \exp(\pi(h, r', t'))}$$
where the numerator $\exp(\pi(h, r, t))$ is the attention score between node $h$ and one neighbor $t$, and the denominator is the sum of the attention scores over all neighbors of node $h$;
3.2 propagating information between nodes, in which information is aggregated from the neighbor nodes into the current node. Taking the head node $h$ of the triple $(h, r, t)$ as an example, its neighbor set is $\mathcal{N}_h = \{(h, r, t) \mid (h, r, t) \in G\}$. The neighbor representation of node $h$ is the weighted sum of its neighbors' vectors:
$$e_{\mathcal{N}_h} = \sum_{(h, r, t) \in \mathcal{N}_h} \tilde{\pi}(h, r, t)\, E_t$$
step 4, aggregating node information and edge information; the edges between node $h$ and its neighbors are first aggregated into a single edge representation. This edge representation, the initial representation $E_h$ of node $h$, and the neighbor information of node $h$ are then combined by an aggregation function with LeakyReLU(·) as the activation function. Aggregating the node and edge representations in this way fully mines the information in the heterogeneous information network;
step 5, predicting scores; the above steps yield a representation $E_u$ of each user node and a representation $E_i$ of each item node;
the predicted score $\hat{y}(u, i)$ is the inner product of the user node vector and the item node vector:
$$\hat{y}(u, i) = E_u^{\top} E_i$$
the loss function $L_2$ of score prediction is defined over the data set
$$D = \{(u, i, j) \mid (u, i) \in R^{+}, (u, j) \in R^{-}\} \tag{15}$$
where $D$ is the data set, $(u, i) \in R^{+}$ denotes a positive sample, and $(u, j) \in R^{-}$ denotes a negative sample; the total loss function $L_{total}$ is:
$$L_{total} = L_1 + L_2 \tag{16}$$
step 6, Top-K evaluation; the recommendation method is evaluated with two commonly used metrics, HR@K and NDCG@K:
$$HR@K = \frac{\text{Number of Hits}@K}{|GT|}, \qquad NDCG@K = Z_k \sum_{i=1}^{K} \frac{2^{rel_i} - 1}{\log_2(i + 1)}$$
where $K$ denotes the first $K$ items of the recommendation result; $GT$ denotes the test set data; $rel_i$ denotes the relevance at the $i$-th position, typically 1 if the item at the $i$-th position is in the test set and 0 otherwise; and $Z_k$ is the normalization coefficient.
2. The Top-K movie recommendation method based on heterogeneous information network embedding according to claim 1, wherein the preset value in step 1.1 is 20.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011306020.3A CN112364245B (en) | 2020-11-20 | 2020-11-20 | Top-K movie recommendation method based on heterogeneous information network embedding |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112364245A | 2021-02-12 |
CN112364245B | 2021-12-21 |
Family
ID=74534351
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011306020.3A Active CN112364245B (en) | 2020-11-20 | 2020-11-20 | Top-K movie recommendation method based on heterogeneous information network embedding |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112364245B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107491540A (en) * | 2017-08-24 | 2017-12-19 | Jinan Junda Information Technology Co., Ltd. | Movie recommendation method combining a deep Bayesian model and collaborative heterogeneous information embedding |
US20180052994A1 (en) * | 2015-04-20 | 2018-02-22 | Splunk Inc. | User activity monitoring |
CN108363804A (en) * | 2018-03-01 | 2018-08-03 | Zhejiang University of Technology | Top-N movie recommendation method based on weighted fusion of local models with user clustering |
US20190080383A1 (en) * | 2017-09-08 | 2019-03-14 | NEC Laboratories Europe GmbH | Method and system for combining user, item and review representations for recommender systems |
CN110677284A (en) * | 2019-09-24 | 2020-01-10 | Beijing Technology and Business University | Heterogeneous network link prediction method based on meta path |
Non-Patent Citations (1)
Title |
---|
Tang Ying et al.: "Top-N movie recommendation algorithm based on weighted fusion of local models", Computer Science * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112861006A (en) * | 2021-02-22 | 2021-05-28 | Institute of Computing Technology, Chinese Academy of Sciences | Recommendation method and system fusing meta-path semantics |
CN112861006B (en) * | 2021-02-22 | 2023-06-23 | Institute of Computing Technology, Chinese Academy of Sciences | Recommendation method and system fusing meta-path semantics |
CN114238439A (en) * | 2021-12-14 | 2022-03-25 | Sichuan University | Task-driven relational data view recommendation method based on joint embedding |
CN114238439B (en) * | 2021-12-14 | 2023-03-28 | Sichuan University | Task-driven relational data view recommendation method based on joint embedding |
Also Published As
Publication number | Publication date |
---|---|
CN112364245B (en) | 2021-12-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110162706B (en) | Personalized recommendation method and system based on interactive data clustering | |
Luo et al. | Personalized recommendation by matrix co-factorization with tags and time information | |
CN110532471B (en) | Active learning collaborative filtering method based on gated cyclic unit neural network | |
CN109190030B (en) | Implicit feedback recommendation method fusing node2vec and deep neural network | |
CN112507246B (en) | Social recommendation method fusing global and local social interest influence | |
Anand et al. | Folksonomy-based fuzzy user profiling for improved recommendations | |
CN112364245B (en) | Top-K movie recommendation method based on heterogeneous information network embedding | |
CN113420221B (en) | Interpretable recommendation method integrating implicit article preference and explicit feature preference of user | |
CN112948625A (en) | Film recommendation method based on attribute heterogeneous information network embedding | |
CN114510653B (en) | Social group recommendation method, system, device and storage medium | |
CN110083766B (en) | Query recommendation method and device based on meta-path guiding embedding | |
CN115329215A (en) | Recommendation method and system based on self-adaptive dynamic knowledge graph in heterogeneous network | |
CN113590965B (en) | Video recommendation method integrating knowledge graph and emotion analysis | |
CN115712780A (en) | Information pushing method and device based on cloud computing and big data | |
Sridhar et al. | Content-Based Movie Recommendation System Using MBO with DBN. | |
CN108491477B (en) | Neural network recommendation method based on multi-dimensional cloud and user dynamic interest | |
CN113590976A (en) | Recommendation method of space self-adaptive graph convolution network | |
Zheng et al. | Incorporating price into recommendation with graph convolutional networks | |
CN115481325A (en) | Personalized news recommendation method and system based on user global interest migration perception | |
CN115840853A (en) | Course recommendation system based on knowledge graph and attention network | |
CN114329167A (en) | Hyper-parameter learning, intelligent recommendation, keyword and multimedia recommendation method and device | |
Hill et al. | A graph neural network recommendation model with knowledge graph and attention mechanism | |
Stanhope et al. | Group link prediction | |
Wang et al. | BERT-based aggregative group representation for group recommendation | |
Nie | Research on Personalized Recommendation Algorithm of Internet Platform Goods Based on Knowledge Graph |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||