CN114238750A - Interactive visual recommendation method based on heterogeneous network information embedding model - Google Patents


Publication number
CN114238750A
Authority
CN
China
Prior art keywords
model
path
data
user
node
Legal status
Pending
Application number
CN202111371845.8A
Other languages
Chinese (zh)
Inventor
汤颖
王攸妍
周元博
Current Assignee
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202111371845.8A
Publication of CN114238750A
Legal status: Pending

Classifications

    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/904 Browsing; Visualisation therefor
    • G06F16/906 Clustering; Classification
    • G06F18/253 Fusion techniques of extracted features
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The interactive visual recommendation method based on a heterogeneous network information embedding model comprises the following steps. Step 1: crawling and cleaning data; Douban user data and Douban movie data are collected from the real Douban Movie website and cleaned. Step 2: acquiring parameters and feature data during model training; a heterogeneous movie information network is constructed from the preprocessed movie data as the model input, the heterogeneous embedding models HetGNN, KGAT and NIRec are trained on it, and the relevant parameters and feature information produced during training are retained. Step 3: user-interaction-based model comparison and exploration; corresponding visual charts are designed for the different index data and model parameters, based on the retained path feature data. Step 4: interactive visual recommendation based on the heterogeneous embedding models, which learn the multiple types of objects and relations in the heterogeneous graph and mine its implicit rich structural and semantic information for the recommendation task. The invention addresses the black-box problem of recommendation and increases the interpretability of the recommendation results.

Description

Interactive visual recommendation method based on heterogeneous network information embedding model
Technical Field
The invention provides a visual analysis method for heterogeneous network embedding models. It systematically explores and compares three representative embedding models with respect to their performance on the downstream recommendation task, the patterns they share during network embedding, and the source information behind their recommendation results, thereby increasing the interpretability of the recommendations.
Background
To better retain the complex structural information and rich semantic information of objects in data mining, researchers fuse different types of objects and their interactions, even information from different data sources, into a single information network, which is called a heterogeneous information network (HIN).
In recent years, with the development of deep neural networks, researchers have applied deep models to heterogeneous network embedding. Compared with shallow models, deep models use neural networks to learn embeddings from node attributes and the interactions among nodes, so they capture nonlinear relations better and retain more structural and semantic information. However, embedding a heterogeneous network with meta-paths is considered limited because it requires domain knowledge, while modeling high-order relations with a message-passing mechanism is considered to introduce noise because it aggregates the features of all surrounding neighbors. Both approaches have theoretical advantages and disadvantages, but how do the user features learned in these different ways actually differ, and is the feature information effectively retained? In the recommendation task in particular, several questions are worth exploring: when different models achieve similar evaluation scores, are their recommendation results equally effective, and does the same model keep its recommendations personalized across different users?
However, most current model visualization work analyzes a single model, and most of the data it processes is directly observable image data. Heterogeneous network embedding maps high-dimensional features into low-dimensional vectors, so the features captured during learning cannot be compared directly or understood intuitively. In addition, although existing multi-model visual comparison methods can analyze a model's intrinsic characteristics and performance by visualizing neuron activations, they are not suited to heterogeneous network embedding models.
Disclosure of Invention
To overcome these shortcomings of the prior art, the invention provides a visual comparative analysis method for heterogeneous network embedding models.
The method first crawls and cleans the data used to evaluate the models, unifies the models' downstream tasks and evaluation metrics, feeds the processed data to the models to learn heterogeneous network embeddings, and retains a large number of parameters and learned feature information from the learning process. It then extracts the effective features from these data and designs data visualization views to display them. Finally, it provides interactive operations for mining anomalous data in the embedding results, so as to evaluate the effectiveness of the models and explore the patterns they share.
The interactive visual recommendation method based on the heterogeneous network information embedding model comprises the following steps:
Step 1: crawling and cleaning the data. Douban user data and Douban movie data are collected from the real Douban Movie website and cleaned according to certain rules to ensure the validity of the data;
1.1 Acquiring the original data set: randomly select a Douban user and randomly retain that user according to the percentage of movies the user has watched; then randomly select three users from the user's following list and repeat the process. Afterwards, count the viewing records of all retained users, keep the valid movie IDs, and crawl the movie information;
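The sampling procedure of step 1.1 can be sketched as a snowball traversal. The Python sketch below works on an in-memory follow graph rather than on the live website, and the names `follow_graph`, `watch_counts` and `snowball_sample` are hypothetical. It assumes one reading of "retaining the user according to the percentage of movies watched": a visited user is kept with probability equal to their watch count divided by the largest watch count.

```python
import random
from collections import deque


def snowball_sample(follow_graph, watch_counts, seed_user, max_users, rng=None):
    """Snowball-style sampling of users over an in-memory follow graph.

    follow_graph: hypothetical dict user -> list of followed users.
    watch_counts: dict user -> number of movies the user has watched.
    Assumption: a visited user is kept with probability
    watch_count / largest watch_count in the data.
    """
    rng = rng or random.Random(0)
    max_watch = max(watch_counts.values()) or 1
    kept, seen = [], {seed_user}
    queue = deque([seed_user])
    while queue and len(kept) < max_users:
        user = queue.popleft()
        if rng.random() < watch_counts.get(user, 0) / max_watch:
            kept.append(user)
        # Pick up to three not-yet-visited followees and continue from them
        candidates = [u for u in follow_graph.get(user, []) if u not in seen]
        for nxt in rng.sample(candidates, min(3, len(candidates))):
            seen.add(nxt)
            queue.append(nxt)
    return kept
```

Visited users are tracked in `seen` so the traversal never loops on mutual follows.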
1.2 Preprocessing the original data set: to make the model evaluation fairer and more effective, i.e., to exclude invalid data, the data are filtered so that each retained user has more than 50 viewing records and each retained movie has been watched more than 20 times by valid users. Because recommendation models usually suffer from the cold-start problem, 10% of the users with fewer than 50 viewing records are retained as cold-start data for evaluating this special case.
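A minimal sketch of the filtering rules in step 1.2, assuming the data are a list of `(user_id, movie_id)` viewing records; the function name and the fixed random seed are illustrative choices, not part of the method. Users with exactly 50 records fall into neither group, mirroring the "more than 50" and "fewer than 50" thresholds in the text.

```python
import random


def filter_records(records, min_user_records=50, min_movie_views=20,
                   cold_start_ratio=0.10, seed=0):
    """Split (user_id, movie_id) viewing records into a main set and a cold-start set."""
    by_user = {}
    for user, movie in records:
        by_user.setdefault(user, []).append(movie)

    # Valid users: more than 50 viewing records
    valid_users = {u for u, movies in by_user.items() if len(movies) > min_user_records}

    # Count how many times valid users watched each movie
    views = {}
    for user in valid_users:
        for movie in by_user[user]:
            views[movie] = views.get(movie, 0) + 1
    valid_movies = {m for m, count in views.items() if count > min_movie_views}

    # Keep a random 10% of users with fewer than 50 records as cold-start data
    cold_users = sorted(u for u, movies in by_user.items() if len(movies) < min_user_records)
    rng = random.Random(seed)
    cold_kept = set(rng.sample(cold_users, round(len(cold_users) * cold_start_ratio)))

    main = [(u, m) for u, m in records if u in valid_users and m in valid_movies]
    cold = [(u, m) for u, m in records if u in cold_kept]
    return main, cold
```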
1.3 Fair evaluation of the models: to make the comparative evaluation valid and fair, the downstream targets of the embedding models are unified into the recommendation task, and their evaluation metrics are recast as the metrics most commonly used in recommendation, namely precision, recall and AUC, computed as follows:
$$P = \frac{TP}{TP + FP} \tag{1}$$
$$R = \frac{TP}{TP + FN} \tag{2}$$
$$AUC = \frac{\sum_{i \in \text{positive}} rank_i - \frac{M(M+1)}{2}}{M \times N} \tag{3}$$
where TP is the number of positive samples correctly classified as positive, TN the number of negative samples correctly classified as negative, FP the number of negative samples incorrectly classified as positive, FN the number of positive samples incorrectly classified as negative, M the number of positive samples, N the number of negative samples, and rank_i the rank of positive sample i when all samples are sorted in ascending order of predicted score.
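Formulas (1) to (3) can be checked with the short sketch below. It assumes binary labels (1 for positive, 0 for negative) and computes AUC with the rank-sum form, giving tied scores their average rank; the function names are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Formulas (1) and (2); returns 0.0 when a denominator is empty."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def rank_auc(labels, scores):
    """Formula (3): rank-sum AUC with 1-based ranks in ascending score order."""
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the group of samples tied with order[i]
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied samples share their average rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    m = sum(labels)            # number of positive samples (M)
    n = len(labels) - m        # number of negative samples (N)
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - m * (m + 1) / 2) / (m * n)
```

For example, `rank_auc([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.4])` ranks both positives above both negatives and therefore yields 1.0.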
In addition, a personalization metric is added to evaluate the similarity of the recommendation results; it computes the cosine similarity between the recommendation lists of different users. The personalization score of user i is computed as in formula (4):
$$Pers_i = 1 - \frac{1}{|U| - 1} \sum_{j \in U,\, j \neq i} \frac{q_i \cdot q_j}{\lVert q_i \rVert \, \lVert q_j \rVert} \tag{4}$$
where U is the set of users and q_i is the vector representation of user i's recommendation list.
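The personalization metric can be sketched as follows. The sketch assumes a common definition: each recommendation list is encoded as a binary vector over the movie catalog, and user i's score is one minus the mean cosine similarity between q_i and every other user's vector, so higher values mean more personalized recommendations. The function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def personalization_scores(rec_lists, catalog):
    """rec_lists: dict user -> list of recommended item ids; catalog: all item ids."""
    index = {item: k for k, item in enumerate(catalog)}
    vectors = {}
    for user, items in rec_lists.items():
        vec = [0] * len(catalog)
        for item in items:
            vec[index[item]] = 1  # binary encoding of the recommendation list
        vectors[user] = vec
    scores = {}
    for u in vectors:
        sims = [cosine(vectors[u], vectors[v]) for v in vectors if v != u]
        scores[u] = 1 - sum(sims) / len(sims) if sims else 1.0
    return scores
```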
Step 2: acquiring parameters and feature data during model training;
A heterogeneous movie information network is constructed from the movie data preprocessed in the steps above and used as the model input; three representative heterogeneous network embedding models, HetGNN, KGAT and NIRec, are then trained, and the relevant parameters and feature information from training are retained. Processing each heterogeneous network embedding model and saving its data comprises the following steps:
2.1 Acquiring the meta-path features of the NIRec model;
To explore how the neighbor features learned by a meta-path-neighbor-based heterogeneous network embedding model differ in an end-to-end interaction modeling approach, the path instances that the model enumerates while computing its path-based attention weight parameters are retained as visualization data.
The path information consists of two parts: first, the node feature information based on the instance path,
Figure BDA0003362555770000041
and second, the path feature information based on the instance path. They are calculated as follows:
Figure BDA0003362555770000042
Figure BDA0003362555770000043
where $h_j^{\rho}$ denotes the instance-path information based on path ρ in the interaction matrix; W, α, β and b are trainable parameters; and $N_j^{\rho}$ denotes the meta-path-guided neighborhood, i.e., the aggregated information specific to a single meta-path and its semantics.
These paths are the feature paths of all positive and negative samples. To keep the model comparison fair, they are filtered with the final recommendation results, i.e., the positive samples, and only the instance paths along which a source node reaches a target node are retained as visualization data.
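This filtering amounts to keeping only instance paths whose end node is one of the source user's final (positive) recommendations. A minimal sketch, assuming each path is stored as a tuple of node IDs from source to target and the recommendations as a hypothetical dict from user to a set of movie IDs:

```python
def filter_instance_paths(paths, recommended):
    """Keep paths whose last node is a final recommendation for their first node.

    paths: list of node-id tuples ordered from source user to target movie.
    recommended: dict source user -> set of positively recommended movie ids.
    """
    kept, seen = [], set()
    for path in paths:
        if path[-1] in recommended.get(path[0], set()) and path not in seen:
            seen.add(path)  # drop verbatim duplicate paths, preserving order
            kept.append(path)
    return kept
```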
2.2 Acquiring the attention weight features of the KGAT model;
To explore how the neighbor features learned by a heterogeneous network embedding model based on a message-passing mechanism differ from the non-user-defined soft paths formed by its attention mechanism, the bidirectional attention weight parameter π(h, r, t) from model training is retained. It represents the importance of a neighbor node to the source node and is calculated as follows:
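$$\pi(h,r,t) = \frac{\exp\big((W_r e_t)^{\top}\tanh(W_r e_h + e_r)\big)}{\sum_{(h,r',t') \in N_h} \exp\big((W_{r'} e_{t'})^{\top}\tanh(W_{r'} e_h + e_{r'})\big)}$$
where h denotes the source node, t the tail node, r the relation from the tail node to the source node, e_h, e_t and e_r their embeddings, W_r the projection matrix of relation r, and N_h the set of all neighboring nodes around the source node.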
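The attention weight π(h, r, t) can be sketched in plain Python as below, assuming the form used in the original KGAT paper (a relation-projected tanh score normalized by a softmax over the head entity's neighbors). The dict-based containers `e`, `W` and `R` are hypothetical stand-ins for the trained embedding tables.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def kgat_attention(e, W, R, triples):
    """Softmax-normalised attention over the triples (h, r, t) in N_h.

    e: entity -> embedding, W: relation -> projection matrix,
    R: relation -> relation embedding. All triples share the same head h.
    """
    raw = []
    for h, r, t in triples:
        proj_h = matvec(W[r], e[h])
        proj_t = matvec(W[r], e[t])
        inner = [math.tanh(a + b) for a, b in zip(proj_h, R[r])]
        raw.append(sum(a * b for a, b in zip(proj_t, inner)))
    # Numerically stable softmax over the neighbours of the head entity
    top = max(raw)
    exps = [math.exp(s - top) for s in raw]
    z = sum(exps)
    return {triple: x / z for triple, x in zip(triples, exps)}
```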
Next, the average attention value between each pair of node types is computed from these data and used as the threshold for selecting the neighbor nodes that construct meta-paths: neighbors above the threshold are regarded as important nodes, and the others are discarded. Finally, a depth-first search over the important neighbors of each source node generates meta-paths of length less than 4 that reach the target node; these meta-paths carry the feature information from the source node to the target node and are therefore used as visualization data for model comparison.
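The threshold-and-search procedure above can be sketched as follows. Assumptions: attention weights are stored as a dict keyed by `(source, neighbor)` pairs, a neighbor must be strictly above the mean weight for its node-type pair to count as important, and "length less than 4" is read as at most three hops, so patterns such as UMUM and UMGM are reachable. All names are hypothetical.

```python
from collections import defaultdict


def type_thresholds(attn, node_type):
    """Mean attention weight for each (source type, neighbour type) pair."""
    sums, counts = defaultdict(float), defaultdict(int)
    for (s, d), w in attn.items():
        key = (node_type[s], node_type[d])
        sums[key] += w
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}


def important_graph(attn, node_type):
    """Adjacency list keeping only neighbours above their type-pair threshold."""
    thr = type_thresholds(attn, node_type)
    graph = defaultdict(list)
    for (s, d), w in attn.items():
        if w > thr[(node_type[s], node_type[d])]:
            graph[s].append(d)
    return graph


def dfs_paths(graph, source, target, max_hops=3):
    """Iterative depth-first search for simple paths of at most max_hops edges."""
    paths, stack = [], [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target and len(path) > 1:
            paths.append(path)
            continue
        if len(path) - 1 >= max_hops:
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:  # avoid cycles
                stack.append((nxt, path + [nxt]))
    return paths
```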
2.3 Acquiring the meta-path features of the HetGNN model;
To explore the feature differences learned by a meta-path-based heterogeneous network embedding model during neighbor aggregation, the path instances involved in model learning are retained as visualization data for model comparison.
During initialization, when the model generates the initial node embeddings with a random walk with restart, it samples in proportion to each node type's share of the total number of nodes, collecting the movie, user, genre, actor and director neighbors of each node; the instance paths formed by these neighbors are calculated by the following formulas:
Figure BDA0003362555770000051
Figure BDA0003362555770000052
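HetGNN's neighbor sampling can be illustrated with the sketch below: a random walk with restart from the start node, followed by grouping the visited nodes by type and keeping the most frequently visited ones per type. The restart probability, walk length and per-type counts are hypothetical parameters, and the sketch does not reproduce the patent's exact formulas.

```python
import random
from collections import Counter


def rwr_type_neighbors(adj, node_type, start, per_type_k, restart_p=0.5,
                       walk_len=200, seed=0):
    """Sample typed neighbours of `start` with a random walk with restart.

    adj: node -> list of neighbours; node_type: node -> type label;
    per_type_k: type label -> number of neighbours to keep for that type.
    """
    rng = random.Random(seed)
    visits = Counter()
    cur = start
    for _ in range(walk_len):
        # Restart with probability restart_p, or when stuck at a dead end
        if rng.random() < restart_p or not adj.get(cur):
            cur = start
            continue
        cur = rng.choice(adj[cur])
        if cur != start:
            visits[cur] += 1
    # Group by type and keep the most frequently visited nodes of each type
    grouped = {}
    for node, _count in visits.most_common():
        t = node_type[node]
        if len(grouped.get(t, [])) < per_type_k.get(t, 0):
            grouped.setdefault(t, []).append(node)
    return grouped
```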
The original data are learned by the three models: for NIRec, the node entity paths used during learning are retained as path fusion features; for KGAT, the attention weight of each node pair during learning is retained and entity paths are constructed from the network connections as fusion features; and for HetGNN, the entity path information sampled during learning is retained as path fusion features.
Step 3: user-interaction-based model comparison and exploration design;
Based on the path feature data retained in the steps above, corresponding visual charts are designed for the different index data and model parameters, as follows:
3.1 Data selection entry and detail overview entry of the visual design: to meet model researchers' need for an overview of the evaluation metrics, radial stacked bar charts show the evaluation metrics of the heterogeneous embedding models on the recommendation task, where each bar represents the recommendation evaluation result for one user and each color represents one model. To facilitate data filtering, further interactions are provided, such as box selection of a specified target. In addition, a Venn diagram shows the relationships among the recommendation results of the different models: each result is scattered randomly within the colored region of the model or models that produced it, and points that hit the user's viewing history are outlined with a white border to mark them as correctly predicted movies;
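The data behind the Venn diagram in step 3.1 can be prepared by assigning each recommended movie to the region formed by the set of models that recommended it and flagging whether it hits the user's viewing history (the points drawn with a white border). A minimal sketch with hypothetical names:

```python
def venn_regions(recs, watched):
    """Group recommended movies by the exact set of models that recommended them.

    recs: model name -> set of recommended movie ids.
    watched: set of movie ids in the user's viewing history.
    Returns frozenset(model names) -> list of (movie id, hit flag).
    """
    all_movies = set().union(*recs.values())
    regions = {}
    for movie in sorted(all_movies):
        owners = frozenset(name for name, items in recs.items() if movie in items)
        regions.setdefault(owners, []).append((movie, movie in watched))
    return regions
```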
3.2 Model detail view of the visual comparison design: based on the design in step 3.1 and model developers' need to inspect how an embedding model aggregates neighbors, a zoomable force-directed graph is designed to show the meta-paths between the selected target and its recommendation results. These paths mainly include UMUM (user-movie-user-movie), UMGM (user-movie-genre-movie), UMAM (user-movie-actor-movie) and UMDM (user-movie-director-movie). The width of a path represents its importance to the target node, and the radius of a node represents the number of entity nodes it stands for; the larger the radius, the more important the node.
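For the force-directed graph of step 3.2, instance paths can be collapsed into meta-path patterns such as UMUM or UMGM. The sketch below assumes path width is the share of paths following each pattern and node radius is the number of paths passing through the node; both are illustrative choices, since the text does not fix the exact encodings, and all names are hypothetical.

```python
from collections import Counter

# Single-letter codes used in the meta-path names (U, M, G, A, D)
TYPE_CODE = {"user": "U", "movie": "M", "genre": "G", "actor": "A", "director": "D"}


def metapath_widths(paths, node_type):
    """Share of instance paths that follow each meta-path pattern."""
    counts = Counter("".join(TYPE_CODE[node_type[n]] for n in p) for p in paths)
    total = sum(counts.values())
    return {pattern: c / total for pattern, c in counts.items()}


def node_radius(paths):
    """Number of distinct paths passing through each node."""
    return Counter(n for p in paths for n in set(p))
```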
Step 4: interactive visual recommendation based on the heterogeneous embedding models;
The heterogeneous network embedding models learn the multiple types of objects and relations in the heterogeneous graph and mine its implicit rich structural and semantic information for the recommendation task.
The meta-path-based heterogeneous embedding methods NIRec and HetGNN are applied to recommendation in three steps: first, experts design effective meta-paths based on domain knowledge; second, the heterogeneous embedding model uses neural networks to learn users' neighborhood features along these meta-paths and obtains embedding vectors for users and items, thereby mining user preferences from the user-item interaction graph; third, the similarity between each user's embedding vector and all item embedding vectors is computed, items in the training data are excluded, and the top K most similar items are taken as the recommendation results.
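The third step (similarity ranking with training items excluded) can be sketched as below, assuming cosine similarity as the similarity measure and plain Python lists as embedding vectors; the function name is hypothetical.

```python
import math


def top_k_recommend(user_vec, item_vecs, train_items, k):
    """Return the k items most cosine-similar to the user, skipping training items."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    scored = [(cos(user_vec, vec), item) for item, vec in item_vecs.items()
              if item not in train_items]
    # Highest similarity first; ties broken by item id for determinism
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [item for _, item in scored[:k]]
```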
The heterogeneous embedding method KGAT, based on the message-passing mechanism, exploits the natural connectivity of the heterogeneous graph to learn features from the tail nodes connected to a target node by edges. The specific steps are: first, obtain an embedding vector for each node in the heterogeneous graph with TransR, a method commonly used for knowledge graphs; second, fuse the features of the neighbors around the target node into the node's embedding through a propagation mechanism, using an attention mechanism to compute each neighbor's importance and thus how strongly each feature is fused; third, after iterative high-order propagation, take the inner product of the user's and the item's high-order embedding vectors as the final recommendation probability, and take the top K items with the highest probability as recommendations.
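Two pieces of this pipeline are sketched below under stated assumptions: the TransR plausibility score used in the first step, ‖W_r e_h + e_r − W_r e_t‖², where lower values indicate more plausible triples, and the final score as the inner product of the concatenated per-layer embeddings of a user and an item, which is how the original KGAT paper combines its propagation layers. Embeddings are plain lists and all names are hypothetical.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def transr_energy(W_r, e_h, e_r, e_t):
    """Squared L2 distance ||W_r e_h + e_r - W_r e_t||^2 (lower = more plausible)."""
    proj_h = matvec(W_r, e_h)
    proj_t = matvec(W_r, e_t)
    return sum((a + b - c) ** 2 for a, b, c in zip(proj_h, e_r, proj_t))


def kgat_score(user_layers, item_layers):
    """Concatenate per-layer embeddings, then take the inner product."""
    u = [x for layer in user_layers for x in layer]
    i = [x for layer in item_layers for x in layer]
    return sum(a * b for a, b in zip(u, i))


def rank_items(user_layers, items_layers, exclude, k):
    """Top-k item ids by kgat_score, skipping items in `exclude`."""
    scored = [(kgat_score(user_layers, layers), item)
              for item, layers in items_layers.items() if item not in exclude]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [item for _, item in scored[:k]]
```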
Finally, the recommendation tracing view shows the path information linking the different models' recommended movies to the target user. The meta-path-based methods make recommendations interpretable through meta-paths: the model follows paths to find which movies the target user has watched and who else has watched the same movies, derives recommendations from them, and thereby mines the preferences of similar users. The propagation-based method makes recommendations interpretable through the attention scores on neighbor relations: for example, if genre nodes such as "romance" and "drama" connected to a user receive high attention weights and the recommended movies contain these genres, the model's recommendations are consistent with the user's preferences.
Preferably, in step 1, 20,000 Douban user records and 20,000 Douban movie records are collected from the real Douban Movie website.
Preferably, in step 2.3, the numbers of movie, user, genre, actor and director neighbors sampled for each node are 25, 15 and 10, respectively.
The interaction methods of step 3.1 include slider-based range selection.
The invention mainly solves the following three problems: first, it systematically compares and analyzes three heterogeneous information network embedding models with different architectures, providing an understanding of how different neighbor-feature aggregation mechanisms affect the embedding results; second, through a bottom-up visualization method, it shows the relationship between the recommendation results and the model learning process and provides the model's context information, helping non-developers understand how recommendations are generated; third, by combining real Douban movie heterogeneous graph data with the feature data of the heterogeneous embedding models during recommendation, it addresses the black-box problem of recommendation and increases interpretability.
The advantages of the invention are: (1) Simple operation and strong expressiveness. The user only needs to perform simple selections in the visual interface; the system presents the recommendations with their rich contextual features and compares and evaluates them visually. (2) Strong novelty. At present, no visual comparison system for heterogeneous network embedding models exists. The method systematically compares and analyzes three embedding models with different architectures and helps developers understand how different neighbor-feature aggregation mechanisms affect the embedding results; case studies revealed a neighbor interaction learning problem in the NIRec model and a neighbor sampling fusion problem in the HetGNN model, demonstrating the system's effectiveness in examining how models fuse neighbors. (3) Practical significance. The recommendation tracing graph helps non-developers quickly understand where recommendations come from and how the model relates to them; combined with the viewing-history heat map and the correlation histogram, they can see the context of the recommended movies and how these relate to the user's preferences, which addresses the recommendation black-box problem and increases the interpretability of the recommendation results.
Drawings
FIG. 1 is a general flow diagram of the method of the present invention.
FIG. 2 is a general framework diagram of the KGAT model used for visual comparison in the present invention.
FIG. 3 is a general framework diagram of the NIRec model used for visual comparison in the present invention.
FIG. 4 is a general framework diagram of the HetGNN model used for visual comparison in the present invention.
FIG. 5 is a system implementation of the method of the present invention.
Detailed Description
The technical scheme of the invention is further explained by combining the attached drawings.
This embodiment provides an interactive visual recommendation method based on a heterogeneous network information embedding model, applied to online movie recommendation.
Referring to the general flow chart (FIG. 1), the technical scheme comprises four stages: raw data crawling and cleaning, data preprocessing, model training to obtain model parameters and feature data, and visual recommendation. In the data acquisition stage, data from the Douban Movie website are crawled and cleaned, removing users with short viewing histories and movies watched by few valid users. In the data preprocessing stage, the raw data are constructed into a heterogeneous information network, and the downstream tasks and evaluation metrics of all models are unified. In the model training stage, the constructed movie heterogeneous information network is used as input to train the HetGNN, KGAT and NIRec models, and the relevant parameters and feature information from training are retained. Finally, the relevant features and results from model training are visualized and interactive functions are added, enabling interpretable movie recommendation and comparison of recommendation results across models.
The interactive visual recommendation method based on the heterogeneous network information embedding model comprises the following specific steps:
Step 1: crawling and cleaning the data. 20,000 Douban user records and 20,000 Douban movie records are collected from the real Douban Movie website and cleaned according to certain rules to ensure the validity of the data;
1.1 Acquiring the original data set: randomly select a Douban user and randomly retain that user according to the percentage of movies the user has watched; then randomly select three users from the user's following list and repeat the process. Afterwards, count the viewing records of all retained users, keep the valid movie IDs, and crawl the movie information;
1.2 Preprocessing the original data set: to make the model evaluation fairer and more effective, i.e., to exclude invalid data, the data are filtered so that each retained user has more than 50 viewing records and each retained movie has been watched more than 20 times by valid users. Because recommendation models usually suffer from the cold-start problem, 10% of the users with fewer than 50 viewing records are retained as cold-start data for evaluating this special case.
1.3 Fair evaluation of the models: the downstream tasks of the three embedding models being compared are unified into recommendation, and their evaluation metrics are changed to the metrics most commonly used in recommendation, namely precision P, recall R and AUC. In addition, the scheme adds a personalization metric to evaluate the similarity of the recommendation results.
Step 2: acquiring parameters and feature data during model training. A heterogeneous movie information network is constructed from the preprocessed movie data as the model input; three representative heterogeneous embedding networks, HetGNN, KGAT and NIRec, are trained; the relevant parameters and feature information from training are retained; and data visualization views are designed to display the information;
2.1 Training the NIRec model to obtain parameters and feature data
NIRec designs a neighborhood-based interaction model to enhance object representations. For recommendation, it uses an integration method to guide the selection of neighbors of different hops and types, a heterogeneous interaction module to obtain rich interaction information, and a heterogeneous aggregation module to obtain rich object embeddings. Therefore, the path instance information that the model enumerates while computing its path-based attention weight parameters is retained as visualization data.
2.2 Training the KGAT model to obtain parameters and feature data
The KGAT model handles high-order relations through recursive embedding propagation: it updates each node's embedding from its neighbors' embeddings, captures high-order connectivity with linear time complexity, and learns the weight of each neighbor during propagation through attention-based aggregation. Therefore, the bidirectional attention weight parameters and local neighbor features from model training are retained as visualization data.
2.3 Training the HetGNN model to obtain parameters and feature data
The HetGNN model effectively captures both structural and content heterogeneity, considering both the structural information and the node attribute information of the heterogeneous network, and it decouples representation learning from downstream tasks so that it can serve various downstream tasks on HINs. Therefore, the path instances involved in learning when the model initializes its embedding vectors are retained as visualization data for model comparison.
The original data are learned by the three models: for NIRec, the node entity paths used during learning are retained as path fusion features; for KGAT, the attention weight of each node pair during learning is retained and entity paths are constructed from the network connections as fusion features; and for HetGNN, the entity path information sampled during learning is retained as path fusion features.
And step 3: comparing the exploration design based on the model of user interaction;
based on the path characteristic data reserved in the steps, the invention designs the corresponding visual chart according to different index data and model parameters. The specific steps are as follows:
3.1 Data selection entry and detail overview entry of the visualization design. To meet model researchers' need for an overview of the evaluation indexes, the invention uses a circular stacked bar chart to show the evaluation indexes of the heterogeneous embedding models on the recommendation task; each bar represents the recommendation evaluation result for one user, and each color represents one model. To make data screening easier, additional interaction methods are designed, such as slider-based range selection and box selection of specified targets. In addition, a Venn diagram is designed to show the relations among the recommendation results of the different models: each result is scattered randomly within the color region of the corresponding model(s), and scatter points that hit the user's viewing history are drawn with a white border to mark them as correctly predicted movies;
3.2 Visual comparison design for model details. Building on the design in step 3.1 and on model developers' need to inspect how the embedding models aggregate neighbors, a zoomable force-directed graph is designed that shows the meta-paths between the selected target and its recommendation results; these paths mainly include UMUM, UMGM, UMAM and UMDM. The width of a path represents its importance to the target node, and the radius of a node represents the number of entity nodes it contains: the more entity nodes, the more important the node.
Step 4: Interactive visual recommendation based on heterogeneous embedding models. The invention lets developers of heterogeneous embedding models visually compare the differences between the user features learned by different models in practical applications, thereby addressing the black-box problem of heterogeneous embedding models in recommendation tasks and improving the effectiveness of recommendation.
The meta-path-based heterogeneous embedding methods NIRec and HetGNN perform recommendation in three steps: first, experts design effective meta-paths based on domain knowledge; second, the heterogeneous embedding model uses neural networks to learn each user's neighborhood features along the meta-paths and obtains embedding vectors for users and items, thereby mining user preferences from the user-item interaction graph; third, the similarity between the user's embedding vector and every item embedding vector is computed, items in the training data are excluded, and the top K items with the highest similarity are taken as the recommendation results.
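The third step (similarity ranking with training items excluded) can be sketched in Python as follows. This is a minimal sketch, not the NIRec or HetGNN code: it assumes the user and item embeddings are already available as NumPy arrays, and `top_k_by_cosine` is a hypothetical helper name.

```python
import numpy as np

def top_k_by_cosine(user_vec, item_matrix, train_items, k):
    """Return indices of the k items most cosine-similar to the user embedding,
    skipping items that already appear in the user's training interactions."""
    # Normalize so that a plain dot product equals cosine similarity.
    u = user_vec / np.linalg.norm(user_vec)
    items = item_matrix / np.linalg.norm(item_matrix, axis=1, keepdims=True)
    scores = items @ u
    # Sort by descending similarity; a stable sort keeps ties in index order.
    ranked = [int(i) for i in np.argsort(-scores, kind="stable")]
    return [i for i in ranked if i not in train_items][:k]

# Usage: toy 2-D embeddings; item 0 is already in the training data.
user = np.array([1.0, 0.0])
items = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.1]])
print(top_k_by_cosine(user, items, train_items={0}, k=2))  # [3, 2]
```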
The KGAT heterogeneous embedding method, which is based on a message-passing mechanism, proceeds as follows: first, an embedding vector for each node in the heterogeneous graph is obtained with the TransR method; second, the features of the neighbors around the target node are fused into that node's embedding vector through a propagation mechanism, and an attention mechanism computes the importance of each neighbor; third, after iterative high-order propagation, the high-order embedding vectors of the user and the items are multiplied to obtain the final recommendation probability, and the top K items with the highest recommendation probability are taken as recommendations.
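A minimal sketch of KGAT-style scoring, assuming the per-layer embeddings have already been produced by propagation. Following the original KGAT formulation, the layer-wise representations are concatenated and scored by an inner product; the function names `kgat_style_scores` and `top_k` are hypothetical, and the toy numbers are made up for illustration.

```python
import numpy as np

def kgat_style_scores(user_layers, item_layers):
    """Score every item for one user from layer-wise embeddings.

    user_layers: list of 1-D arrays, one per propagation layer (layer 0 = TransR init).
    item_layers: list of 2-D arrays (n_items x layer_dim), aligned with user_layers.
    """
    # Concatenate the per-layer embeddings into one high-order representation.
    u = np.concatenate(user_layers)
    items = np.concatenate(item_layers, axis=1)
    # Inner product as the preference score; a sigmoid would not change the ranking.
    return items @ u

def top_k(scores, k, exclude=()):
    """Indices of the k highest scores, skipping excluded (e.g. training) items."""
    excluded = set(exclude)
    ranked = [int(i) for i in np.argsort(-scores, kind="stable")]
    return [i for i in ranked if i not in excluded][:k]

user_layers = [np.array([1.0, 0.0]), np.array([0.5])]
item_layers = [np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]),
               np.array([[1.0], [2.0], [0.0]])]
scores = kgat_style_scores(user_layers, item_layers)  # scores 1.5, 1.0, 0.5
print(top_k(scores, k=2, exclude=[0]))  # [1, 2]
```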
Finally, the path information linking each model's recommended movies to the target user is shown in the recommendation-result tracing view. Meta-path-based methods make recommendations interpretable through meta-paths; for example, the path UMUM mines the preference characteristics of similar users. Propagation-based methods make recommendations interpretable through the attention scores on neighbor relations.
The embodiments described in this specification merely illustrate ways of implementing the inventive concept. The scope of protection of the present invention should not be regarded as limited to the specific forms set forth in the embodiments; it also extends to equivalent technical means that those skilled in the art can conceive on the basis of the inventive concept.

Claims (4)

1. An interactive visual recommendation method based on a heterogeneous network information embedding model, comprising the following steps:
Step 1: Crawling and cleaning the data. Douban user data and Douban movie data are collected from the real Douban movie website and cleaned to ensure their validity;
1.1 Acquiring the original data set: a Douban user is selected at random and retained at random according to the proportion of movies that user has watched; three users are then selected at random from that user's following list, and the process is repeated. The viewing records of all collected users are then counted, valid movie IDs are retained, and the movie information is crawled;
1.2 Preprocessing the original data set: to make the evaluation of the models fairer and more effective, i.e., to exclude invalid data, the data are filtered under conditions such as each user having more than 50 viewing records and the viewing count of a valid user being greater than 20. Because recommendation models usually suffer from the cold-start problem, 10% of the users with fewer than 50 viewing records are retained as cold-start data for evaluating this special case;
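The filtering rules of step 1.2 can be sketched as below. This is an illustration under stated assumptions: records are `(user_id, movie_id)` pairs, the threshold of 20 is interpreted as a minimum number of views per movie (the text above does not make its unit fully clear), the random seed is arbitrary, and `filter_records` is a hypothetical name.

```python
import random
from collections import Counter

def filter_records(records, min_user=50, min_movie=20, cold_ratio=0.1, seed=0):
    """Split (user_id, movie_id) viewing records into a main set and a cold-start set."""
    user_counts = Counter(u for u, _ in records)
    movie_counts = Counter(m for _, m in records)
    # Main set: users with more than min_user records and, as an ASSUMPTION,
    # movies with more than min_movie views.
    warm_users = {u for u, c in user_counts.items() if c > min_user}
    valid_movies = {m for m, c in movie_counts.items() if c > min_movie}
    main = [(u, m) for u, m in records if u in warm_users and m in valid_movies]
    # Cold-start set: a random cold_ratio share of users with fewer than min_user records.
    cold_pool = sorted(u for u, c in user_counts.items() if c < min_user)
    rng = random.Random(seed)
    cold_users = set(rng.sample(cold_pool, int(len(cold_pool) * cold_ratio)))
    cold = [(u, m) for u, m in records if u in cold_users]
    return main, cold
```

With the real thresholds (50 and 20) a toy example would need thousands of rows, so the thresholds are parameters that can be lowered for testing.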
1.3 Fair evaluation of the models: to ensure that the comparative evaluation is valid and fair, the downstream targets of the embedding models are unified into the recommendation task, and their evaluation indexes are recast as the most common indexes in recommendation tasks, namely accuracy, recall and AUC, calculated as follows:
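$$\text{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}\tag{1}$$
$$\text{Recall}=\frac{TP}{TP+FN}\tag{2}$$
$$\text{AUC}=\frac{\sum_{i\in\text{positive}}\text{rank}_i-\frac{M(M+1)}{2}}{M\times N}\tag{3}$$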
where TP is the number of positive samples correctly classified as positive, TN the number of negative samples correctly classified as negative, FP the number of negative samples incorrectly classified as positive, FN the number of positive samples incorrectly classified as negative, M the number of positive samples, N the number of negative samples, and rank_i the rank of positive sample i when all samples are sorted by prediction score in ascending order;
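The three indexes can be checked with a small Python sketch. The AUC function uses the rank-based form: samples receive ascending 1-based ranks by predicted score, and tied scores share their average rank (a common convention that the text does not specify). The function names are hypothetical.

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def recall(tp, fn):
    return tp / (tp + fn)

def auc_by_rank(scores, labels):
    """AUC = (sum of positive ranks - M(M+1)/2) / (M*N); labels are 1 (positive) or 0."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    m = sum(labels)          # number of positive samples
    n = len(labels) - m      # number of negative samples
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - m * (m + 1) / 2) / (m * n)

print(auc_by_rank([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```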
In addition, a personalization index is added to evaluate the similarity of the recommendation results; it computes the cosine similarity between the recommendation lists of different users. The personalization score of user i is calculated as shown in formula (4):
(Formula (4): shown as an image in the original filing.)
where U is the set of users and q_i is the vector formed from the recommendation list of user i;
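Because formula (4) itself is not reproduced here, the following sketch adopts one common definition of personalization as an assumption: each q_i is a binary indicator vector over the item catalogue, and the score of user i is 1 minus the mean cosine similarity between q_i and the lists of all other users, so higher values mean more individualized recommendations. The function name is hypothetical, and every list is assumed to be non-empty.

```python
import numpy as np

def personalization_scores(rec_lists, n_items):
    """ASSUMED form of formula (4): 1 - mean cosine similarity to other users' lists.

    rec_lists: dict user -> list of recommended item indices (non-empty).
    """
    users = list(rec_lists)
    q = np.zeros((len(users), n_items))
    for row, u in enumerate(users):
        q[row, rec_lists[u]] = 1.0  # binary indicator vector q_i
    unit = q / np.linalg.norm(q, axis=1, keepdims=True)
    sim = unit @ unit.T  # pairwise cosine similarities
    n = len(users)
    # Drop each user's similarity with itself (the diagonal) before averaging.
    mean_other = (sim.sum(axis=1) - np.diag(sim)) / (n - 1)
    return {u: float(1.0 - s) for u, s in zip(users, mean_other)}

print(personalization_scores({"a": [0, 1], "b": [0, 1], "c": [2, 3]}, n_items=4))
```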
Step 2: Acquiring parameters and feature data from the model training process;
A heterogeneous movie information network is constructed from the movie data preprocessed in the previous step and used as the model input; three representative heterogeneous network embedding models, HetGNN, KGAT and NIRec, are then trained, and the relevant parameters and feature information from the training process are retained. Model processing and data saving comprise the following steps:
2.1 Acquiring the meta-path features of the NIRec model;
To explore how the neighbor features learned by a heterogeneous network embedding model based on meta-path neighbors differ within an end-to-end interaction modeling method, the path instance information that the model enumerates while computing its path-based attention weight parameters is retained as visualization data;
The path information consists of two parts: first, node feature information based on the instance paths; second, path feature information based on the instance paths. They are calculated as follows:
(Formulas for the instance-path node features and path features: shown as images in the original filing.)
These formulas represent the instance path information based on meta-path ρ in the interaction matrix, where W, α, β and b are all trainable parameters and $N_\rho^j$ denotes the meta-path-guided neighborhood, i.e., the aggregated information specific to a single meta-path and its semantics;
The path information covers the feature paths of all positive and negative samples. To keep the model comparison fair, the paths are screened with the final recommendation results, i.e., the positive samples, and only the instance paths along which the source node can reach the target node are retained as visualization data;
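The screening rule can be illustrated with a short sketch. It assumes each instance path is a tuple of node IDs that starts at the source user, and that `recommended` maps each user to the set of positive (recommended) movies; both data shapes and the function name are hypothetical.

```python
def filter_paths(paths, recommended):
    """Keep instance paths whose last node is a recommended movie of the path's source user."""
    kept = []
    for path in paths:
        src, dst = path[0], path[-1]
        # The path is kept only if the source can reach one of its positive targets.
        if dst in recommended.get(src, set()):
            kept.append(path)
    return kept

paths = [("u1", "m1", "u2", "m5"), ("u1", "m1", "g1", "m7"), ("u2", "m2", "a1", "m5")]
print(filter_paths(paths, {"u1": {"m5"}, "u2": {"m9"}}))  # [('u1', 'm1', 'u2', 'm5')]
```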
2.2 Acquiring the attention weight features of the KGAT model;
To explore how the neighbor features learned through non-user-defined soft paths, which arise from combining a message-passing-based heterogeneous network embedding model with an attention mechanism, differ from those of the other models, the bidirectional attention weight parameter π(h, r, t) from the model's training process is retained. It represents the importance of a neighbor node to the source node and is calculated as follows:
$$\pi(h,r,t)=\frac{\exp\big(\pi(h,r,t)\big)}{\sum_{(h,r',t')\in N_h}\exp\big(\pi(h,r',t')\big)}$$
where h denotes the source node, t the tail node, r the relation from the tail node to the source node, and $N_h$ the set of all neighbor nodes around the source node;
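For reference, the original KGAT paper computes the unnormalized score as $(W_r e_t)^\top \tanh(W_r e_h + e_r)$ and then normalizes it with a softmax over $N_h$. The sketch below follows that form with NumPy; the argument layout (a list of `(relation, tail_embedding)` pairs plus dictionaries of relation embeddings and projection matrices) and the function name are assumptions made for illustration.

```python
import numpy as np

def kgat_attention(e_h, neighbors, rel_emb, rel_proj):
    """Softmax-normalized attention pi(h, r, t) over the neighborhood N_h of head h.

    neighbors: list of (relation_id, tail_embedding) pairs forming N_h.
    rel_emb[r]: relation embedding e_r; rel_proj[r]: projection matrix W_r.
    """
    raw = []
    for r, e_t in neighbors:
        W_r = rel_proj[r]
        # Unnormalized score: (W_r e_t)^T tanh(W_r e_h + e_r)
        raw.append((W_r @ e_t) @ np.tanh(W_r @ e_h + rel_emb[r]))
    raw = np.array(raw)
    # Softmax over N_h; subtracting the max avoids overflow without changing the result.
    exp = np.exp(raw - raw.max())
    return exp / exp.sum()

rel_emb = {"watched": np.zeros(2)}
rel_proj = {"watched": np.eye(2)}
w = kgat_attention(np.array([1.0, 0.0]),
                   [("watched", np.array([1.0, 0.0])), ("watched", np.array([0.0, 1.0]))],
                   rel_emb, rel_proj)
print(w.round(3))  # [0.682 0.318]
```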
Next, the average attention value between each pair of node types is calculated from the obtained data and used as the screening threshold for neighbor nodes when constructing meta-paths: neighbor nodes above the threshold are kept as important nodes, and the rest are discarded. Finally, the important neighbor nodes of each source node are traversed with a depth-first search to generate meta-paths of length less than 4 that reach the target node; because these meta-paths carry the feature information from the source node to the target node, they are used as visualization data for model comparison;
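The threshold-and-traverse procedure can be sketched as follows. Assumptions: attention weights are given per directed node pair, the threshold is the mean weight of each (source type, neighbor type) pair, only neighbors strictly above it are kept, and "length less than 4" is read as at most three edges (enough for UMUM-style paths). Function names are hypothetical.

```python
from collections import defaultdict

def important_edges(attn, node_type):
    """attn: {(src, dst): weight}. Keep edges above the mean weight of their type pair."""
    by_pair = defaultdict(list)
    for (s, t), w in attn.items():
        by_pair[(node_type[s], node_type[t])].append(w)
    threshold = {p: sum(ws) / len(ws) for p, ws in by_pair.items()}
    adj = defaultdict(list)
    for (s, t), w in attn.items():
        if w > threshold[(node_type[s], node_type[t])]:
            adj[s].append(t)
    return adj

def paths_to_target(adj, source, target, max_hops=3):
    """Depth-first enumeration of simple paths from source to target with <= max_hops edges."""
    found = []

    def dfs(node, path):
        if node == target and len(path) > 1:
            found.append(tuple(path))
            return
        if len(path) - 1 == max_hops:
            return
        for nxt in adj.get(node, []):
            if nxt not in path:  # avoid cycles
                dfs(nxt, path + [nxt])

    dfs(source, [source])
    return found
```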
2.3 Acquiring the meta-path features of the HetGNN model;
To explore the features that a meta-path-based heterogeneous network embedding model learns while aggregating neighbor nodes, the path instances that take part in the model's learning process are retained as visualization data for model comparison;
During initialization of the heterogeneous network embedding model, a random walk with restart is used to generate the initial embeddings of the nodes. Sampling is done in equal proportion according to each node type's share of the total number of nodes: the movie, user, genre, actor and director neighbors of each node are sampled, and the instance paths formed by these neighbors are calculated by the following formulas:
(Formulas for the sampled instance paths: shown as images in the original filing.)
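A minimal sketch of the sampling step, assuming an adjacency-list graph and per-type neighbor quotas. It follows the general HetGNN idea of a random walk with restart whose most frequently visited nodes of each type are kept as neighbors; the restart probability, step count and helper names are illustrative assumptions, and `proportional_quotas` implements the "equal proportion" rule as a split of a neighbor budget by each type's share of all nodes.

```python
import random
from collections import Counter

def proportional_quotas(node_type, total):
    """Split a neighbor budget across node types in proportion to each type's share."""
    counts = Counter(node_type.values())
    n = sum(counts.values())
    return {t: max(1, round(total * c / n)) for t, c in counts.items()}

def rwr_neighbors(adj, node_type, start, quotas, restart_p=0.5, steps=2000, seed=0):
    """Random walk with restart from `start`; keep the most-visited nodes of each type."""
    rng = random.Random(seed)
    visits = Counter()
    cur = start
    for _ in range(steps):
        if rng.random() < restart_p or not adj.get(cur):
            cur = start  # jump back to the start node
            continue
        cur = rng.choice(adj[cur])
        if cur != start:
            visits[cur] += 1
    result = {}
    for t, q in quotas.items():
        same_type = [v for v, _ in visits.most_common() if node_type[v] == t]
        result[t] = same_type[:q]
    return result

adj = {"u1": ["m1"], "m1": ["u1", "u2", "g1"], "u2": ["m1"], "g1": ["m1"]}
types = {"u1": "U", "u2": "U", "m1": "M", "g1": "G"}
print(rwr_neighbors(adj, types, "u1", {"M": 1, "U": 1, "G": 1}))
```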
The original data are learned by the three models: the node entity paths used by the NIRec model during learning are retained as path fusion features; the attention weight of each node pair in the KGAT model during learning is retained, and entity paths are constructed through the network connection relations to serve as fusion features; and the entity path information sampled by the HetGNN model during learning is retained as path fusion features;
Step 3: Designing model comparison and exploration based on user interaction;
Based on the path feature data retained in the preceding steps, corresponding visual charts are designed for the different index data and model parameters. The specific steps are as follows:
3.1 Data selection entry and detail overview entry of the visualization design: to meet model researchers' need for an overview of the evaluation indexes, a circular stacked bar chart shows the evaluation indexes of the heterogeneous network embedding models on the recommendation task; each bar represents the recommendation evaluation result for one user, and each color represents one model. To make data screening easier, additional interaction methods are designed, and specified targets are selected by box selection. In addition, a Venn diagram is designed to show the relations among the recommendation results of the different models: each result is scattered randomly within the color region of the corresponding model(s), and scatter points that hit the user's viewing history are drawn with a white border to mark them as correctly predicted movies;
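The data behind the Venn view can be prepared by grouping each recommended movie under the exact set of models that recommended it and flagging hits against the viewing history (the hit flag is what would get the white border). A minimal sketch with hypothetical names; the drawing itself is left to whatever charting library is used.

```python
def venn_regions(recs, history):
    """Map each set of models to the movies that exactly those models recommended.

    recs: dict model_name -> list of recommended movie IDs.
    history: set of movie IDs in the user's viewing history.
    Returns {frozenset(models): [(movie_id, is_hit), ...]}.
    """
    movie_models = {}
    for model, movies in recs.items():
        for m in movies:
            movie_models.setdefault(m, set()).add(model)
    regions = {}
    for movie, models in movie_models.items():
        regions.setdefault(frozenset(models), []).append((movie, movie in history))
    return regions

recs = {"NIRec": ["m1", "m2"], "KGAT": ["m2", "m3"], "HetGNN": ["m3"]}
print(venn_regions(recs, history={"m2"}))
```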
3.2 Visual comparison design for model details: building on the design in step 3.1 and on model developers' need to inspect how the embedding models aggregate neighbors, a zoomable force-directed graph is designed that shows the meta-paths between the selected target and its recommendation results, including UMUM (user-movie-user-movie), UMGM (user-movie-genre-movie), UMAM (user-movie-actor-movie) and UMDM (user-movie-director-movie); the width of a path represents its importance to the target node, and the radius of a node represents the number of entity nodes it contains: the more entity nodes, the more important the node;
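One way to derive the force-directed graph's encodings from the retained instance paths is sketched below. This is an assumption, not the patent's formula: each path is labeled with its meta-path from its node types, a meta-path's width is its share of all instances, and a node's size is the number of instances passing through it. `TYPE_CODE` and `meta_path_summary` are hypothetical names.

```python
from collections import Counter

# Hypothetical one-letter codes matching the UMUM / UMGM / UMAM / UMDM labels.
TYPE_CODE = {"user": "U", "movie": "M", "genre": "G", "actor": "A", "director": "D"}

def meta_path_summary(paths, node_type):
    """Label instance paths by meta-path; return width shares and per-node instance counts."""
    label_counts = Counter()
    node_counts = Counter()
    for path in paths:
        label = "".join(TYPE_CODE[node_type[v]] for v in path)
        label_counts[label] += 1
        node_counts.update(path)  # each node counted once per path it appears in
    total = sum(label_counts.values())
    widths = {label: c / total for label, c in label_counts.items()}
    return widths, node_counts

paths = [("u1", "m1", "u2", "m5"), ("u1", "m2", "u3", "m5"), ("u1", "m1", "g1", "m5")]
types = {"u1": "user", "u2": "user", "u3": "user", "m1": "movie",
         "m2": "movie", "m5": "movie", "g1": "genre"}
widths, sizes = meta_path_summary(paths, types)
print(widths)  # UMUM gets 2/3 of the width budget, UMGM gets 1/3
```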
Step 4: Interactive visual recommendation based on heterogeneous network embedding models;
Heterogeneous network embedding models aim to learn the various types of objects and relations in a heterogeneous graph and to mine the rich structural and semantic information hidden in it for recommendation tasks;
The meta-path-based heterogeneous embedding methods NIRec and HetGNN perform recommendation in three steps: first, experts design effective meta-paths based on domain knowledge; second, the heterogeneous network embedding model uses neural networks to learn each user's neighborhood features along the meta-paths and obtains embedding vectors for users and items, thereby mining user preferences from the user-item interaction graph; third, the similarity between the user's embedding vector and every item embedding vector is computed, items in the training data are excluded, and the top K items with the highest similarity are taken as the recommendation results;
The KGAT heterogeneous embedding method, based on a message-passing mechanism, exploits the natural connectivity of the heterogeneous graph to learn features from the tail nodes connected to the target node by edges. Its specific steps are: first, an embedding vector for each node in the heterogeneous graph is obtained with TransR, a method commonly used for knowledge graphs; second, the features of the neighbors around the target node are merged into that node's embedding vector through a propagation mechanism, and an attention mechanism computes the importance of each neighbor node to control how strongly each feature is merged; third, after iterative high-order propagation, the high-order embedding vectors of the user and the items are multiplied to obtain the final recommendation probability, and the top K items with the highest recommendation probability are taken as recommendations;
Finally, the path information that each model produces for the recommended movies and the target user is displayed in the recommendation-result tracing view. Meta-path-based methods make recommendations interpretable through meta-paths: along such a path, the embedding model finds the movies watched by users who have watched the same movies as the target user, recommends those movies, and thereby mines the preference characteristics of similar users. Propagation-based methods make recommendations interpretable through the attention scores on neighbor relations; the recommended movies ultimately share genre features with the user's history, which indicates that the recommendations of the heterogeneous network embedding model match the user's preferences.
2. The interactive visual recommendation method based on a heterogeneous network information embedding model according to claim 1, characterized in that in step 1, 20,000 Douban user records and 20,000 Douban movie records are collected from the real Douban movie website.
3. The interactive visual recommendation method based on a heterogeneous network information embedding model according to claim 1, characterized in that in step 2.3, the numbers of movie neighbors, user neighbors, genre neighbors, actor neighbors and director neighbors sampled for each node are 25, 15 and 10, respectively.
4. The interactive visual recommendation method based on a heterogeneous network information embedding model according to claim 1, characterized in that the interaction methods of step 3.1 include slider-based range selection.
CN202111371845.8A 2021-11-18 2021-11-18 Interactive visual recommendation method based on heterogeneous network information embedding model Pending CN114238750A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111371845.8A CN114238750A (en) 2021-11-18 2021-11-18 Interactive visual recommendation method based on heterogeneous network information embedding model

Publications (1)

Publication Number Publication Date
CN114238750A true CN114238750A (en) 2022-03-25

Family

ID=80750050

Country Status (1)

Country Link
CN (1) CN114238750A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114707654A (en) * 2022-06-06 2022-07-05 浙江大学 Algorithm training reasoning performance visualization method and device based on artificial intelligence framework
CN114707654B (en) * 2022-06-06 2022-08-23 浙江大学 Algorithm training reasoning performance visualization method and device based on artificial intelligence framework
CN114756762A (en) * 2022-06-13 2022-07-15 腾讯科技(深圳)有限公司 Data processing method, device, equipment, storage medium and program product
CN114896514A (en) * 2022-07-14 2022-08-12 西安电子科技大学 Web API label recommendation method based on graph neural network
CN114896514B (en) * 2022-07-14 2022-09-30 西安电子科技大学 Web API label recommendation method based on graph neural network
CN116071119A (en) * 2022-08-16 2023-05-05 电子科技大学 Model-agnostic inverse fact interpretation method based on multi-behavior recommendation model
CN116071119B (en) * 2022-08-16 2023-12-08 电子科技大学 Model-agnostic inverse fact interpretation method based on multi-behavior recommendation model
CN116701706A (en) * 2023-07-29 2023-09-05 腾讯科技(深圳)有限公司 Data processing method, device, equipment and medium based on artificial intelligence
CN116701706B (en) * 2023-07-29 2023-09-29 腾讯科技(深圳)有限公司 Data processing method, device, equipment and medium based on artificial intelligence

Similar Documents

Publication Publication Date Title
CN114238750A (en) Interactive visual recommendation method based on heterogeneous network information embedding model
WO2016035072A2 (en) Sentiment rating system and method
Lin et al. Rclens: Interactive rare category exploration and identification
Ou et al. Object-relation reasoning graph for action recognition
Arendt et al. Towards rapid interactive machine learning: evaluating tradeoffs of classification without representation
CN113326384A (en) Construction method of interpretable recommendation model based on knowledge graph
Harakawa et al. Extracting hierarchical structure of web video groups based on sentiment-aware signed network analysis
Stracuzzi et al. Quantifying Uncertainty to Improve Decision Making in Machine Learning.
CN117216419B (en) Data analysis method based on AI technology
Gong et al. Fake news detection through graph-based neural networks: A survey
CN115481325A (en) Personalized news recommendation method and system based on user global interest migration perception
CN113704439B (en) Conversation recommendation method based on multi-source information heteromorphic graph
Quintana et al. Recommendation techniques in forensic data analysis: a new approach
Chen et al. Attention in reasoning: Dataset, analysis, and modeling
Umar et al. Comparing the Performance of Data Mining Algorithms in Predicting Sentiments on Twitter
Niture Predictive analysis of YouTube trending videos using Machine Learning
Stanhope et al. Group link prediction
Erfanian et al. Chameleon: Foundation models for fairness-aware multi-modal data augmentation to enhance coverage of minorities
Ntalianis et al. Social media content ranking based on social computing and user influence
Odeh Event detection in heterogeneous data streams
Ghozia et al. Intelligence Is beyond Learning: A Context‐Aware Artificial Intelligent System for Video Understanding
Unger et al. Hierarchical Latent Context Representation for CARS
Schaffer et al. Truth, lies, and data: Credibility representation in data analysis
Xu A DeepFM model-based personalized Restaurant Recommendation System
Jutte The Smart Annotation Tool: optimizing semi-automated behavioural annotation using an AutoML framework supported by classification correctness prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination