CN112966091A

CN112966091A - Knowledge graph recommendation system fusing entity information and heat

Info

Publication number: CN112966091A
Application number: CN201911272709.6A
Authority: CN
Inventors: 杜雨燕; 梁斌; 梁家卿; 肖仰华
Original assignee: Fudan University
Current assignee: Fudan University
Priority date: 2019-12-12
Filing date: 2019-12-12
Publication date: 2021-06-15
Anticipated expiration: 2039-12-12
Also published as: CN112966091B

Abstract

The invention aims to provide a recommendation system that makes full use of both the semantic information and the graph structure information in a knowledge graph. Its main features are: a method for calculating the semantic similarity of entities; an entity embedding calculation method tailored to entity recommendation applications; and entity recommendation that combines the semantic information in the graph with the graph structure information. The recommendation system specifically comprises: an entity similarity calculation module, which takes the text information of the knowledge graph as input and obtains the similarity of the entities; a graph embedded representation learning module, which takes the relations between the entities in the knowledge graph as input and obtains embedded representations of the entities; and an entity recommendation generation module, which, based on the click feedback data of the user, fuses the similarity obtained by the entity similarity calculation module and the embedded representations obtained by the graph embedded representation learning module into a final similarity score, so that the recommendation application recommends entities to the user.

Description

Knowledge graph recommendation system fusing entity information and heat

Technical Field

The invention belongs to the technical field of information, relates to an information content recommendation system, and particularly relates to a knowledge graph recommendation system fusing entity information and heat.

Background

With the rapid development of information technology, large amounts of data from every field are continuously crossed and fused, producing a volume of data that exceeds that of any previous era and causing the problem of information overload. At the same time, this information explosion lowers the utilization rate of information and makes it difficult for users to acquire information efficiently. How to obtain useful information effectively from massive data has therefore become a subject worthy of in-depth study.

A traditional search algorithm (i.e., a method that retrieves related objects based on the user's search terms) can only present the same ranked results to every user and cannot provide ranking results tailored to different users' interests and preferences.

Therefore, in prior-art information browsing means such as web browsing and APP browsing, a recommendation system is mostly adopted; that is, relevant information is recommended and pushed to the user according to the user's current browsing situation or browsing history. A recommendation system is an important means of information filtering and is currently one of the effective ways of addressing information overload: it can recommend information of interest to the user based on the relatively vague intention the user provides.

In the prior art, the recommendation methods adopted by recommendation systems can be divided into three categories:

1) Recommendation based on content similarity. This method makes recommendations according to the object the user has currently selected: it infers the user's interests from the current object and recommends other objects whose content is similar to that of the object currently being accessed.

2) Collaborative filtering recommendation. The core idea is to recommend using the historical behavior of other user groups whose behavior is similar to the user's. Users' historical behavior is generally expressed as a user-object relationship matrix; users with similar historical behavior are found by computation on this matrix, and content that may interest the user is recommended accordingly. This is the best-known method among recommendation algorithms and works well when a sufficient amount of user behavior data is available.

3) Recommendation based on a knowledge graph. A knowledge graph is a semantic network in which different types of entities are connected through various semantic relations, and the entities and relations together form a large-scale graph. Because a knowledge graph contains rich semantic association knowledge among entities, some recommendation systems in recent years have used knowledge from the knowledge graph, introducing auxiliary information in the graph, and have proposed recommendation methods based on entity paths in the graph and recommendation methods based on features.

The above conventional recommendation techniques mainly suffer from the data sparsity problem and, when user data is lacking, the cold start problem. In real application scenarios, a user usually interacts directly with only a small number of items, so the interaction information between users and items is generally very sparse. For example, in a movie recommendation scenario the total number of movies may be as many as ten thousand, whereas each user typically rates only a few dozen movies. When such sparse observations are used to predict a large amount of unknown information, the risk of model overfitting increases greatly. Collaborative filtering, currently the most popular recommendation algorithm, has difficulty providing effective recommendations when the system has accumulated only a small amount of user behavior; moreover, newly added users or items are difficult to model and recommend accurately because the system holds no corresponding interaction information. Conventional knowledge-graph-based recommendation uses only a small part of the knowledge in the graph and does not make full use of the semantic information and graph structure information it contains, so the recommendation effect remains deficient.

Disclosure of Invention

To solve the above problems, the invention provides a recommendation system that makes full use of the semantic information and graph structure information in a knowledge graph. Its main features are: 1) a method for calculating the semantic similarity of entities; 2) an entity embedding calculation method tailored to entity recommendation applications; 3) entity recommendation that combines the semantic information in the graph with the graph structure information.

Specifically, the invention provides a knowledge graph recommendation system fusing entity information and popularity. The system is arranged in a recommendation application that recommends entities to a user, and obtains similarity scores of the entities according to a knowledge graph established from the related information of a plurality of entities and the click feedback information produced when the user browses the related information of the entities, so that the recommendation application makes recommendations according to the similarity scores. The system is characterized by comprising: an entity similarity calculation module, which, after the text information of the knowledge graph is input, calculates the similarity of the entities using the entity names, the text description information and the triple semantic information of the entities in the knowledge graph; a graph embedded representation learning module, which comprises a TransE model improved for the recommendation application and performs machine learning with the relations among the entities in the knowledge graph as input to obtain embedded representations of the entities in the knowledge graph; and an entity recommendation generation module, which, based on the click feedback data of the user, fuses the similarity obtained by the entity similarity calculation module and the embedded representations obtained by the graph embedded representation learning module into a final similarity score, so that the recommendation application recommends entities to the user according to the similarity score.

The knowledge graph recommendation system fusing entity information and popularity provided by the invention may also have the technical feature that the entity similarity calculation module comprises a literal similarity calculation submodule for calculating the similarity between the names of the entities, a text similarity calculation submodule for calculating the similarity between the description texts of the entities, and a matrix generation submodule for generating, from the results of the two calculation submodules, an entity similarity matrix that serves as the similarity.

Furthermore, the above knowledge graph recommendation system fusing entity information and popularity may also have the technical feature that the literal similarity calculation submodule takes the entity names as input and calculates the literal similarity of the entities based on the longest common subsequence (LCS) and the Levenshtein edit distance. Let the entity names corresponding to entities e_i and e_j be s_i and s_j. The similarity SimLCS_ij of entities e_i and e_j based on the longest common subsequence is:

in the above formula, |LCS(s_i, s_j)| is the length of the longest common subsequence of s_i and s_j, and |s_i| and |s_j| are the lengths of s_i and s_j, respectively;

the similarity SimED_ij of entities e_i and e_j based on the Levenshtein edit distance is:

in the above formula, ldist(s_i, s_j) is the class edit distance between s_i and s_j, and |s_i| and |s_j| are the lengths of s_i and s_j, respectively.

Furthermore, the knowledge graph recommendation system fusing entity information and popularity may also have the technical feature that the text similarity calculation submodule takes the description text of each entity as an input document and calculates the text similarity of the entities using five indexes, namely the TF-IDF index, the LSI model, the LDA model, the HDP model and the Jaccard coefficient.

Further, the aforementioned knowledge graph recommendation system fusing entity information and popularity may also have the technical feature that, for entities e_i and e_j, the corresponding description texts are denoted d_i and d_j, respectively.

The TF-IDF-based similarity SimTFIDF_ij of entities e_i and e_j is:

SimTFIDF_ij = cos(v_i, v_j)

in the above formula, v_i is the TF-IDF weighted vector of d_i, each dimension w_ij of which is defined as the smoothed TF-IDF factor;

the LSI-based similarity SimLSI_ij of entities e_i and e_j is:

SimLSI_ij = cos(v_i, v_j)

in the above formula, v_i is the vector in D' corresponding to d_i, v_j is the vector in D' corresponding to d_j, and D' is the dimension-reduced vectorized representation of the documents;

the LDA-based similarity SimLDA_ij of entities e_i and e_j is:

SimLDA_ij = cos(v_i, v_j)

in the above formula, v_i is the (document-topic) distribution corresponding to d_i, and v_j is the (document-topic) distribution corresponding to d_j;

the HDP-based similarity SimHDP_ij of entities e_i and e_j is:

SimHDP_ij = cos(v_i, v_j)

the Jaccard-based similarity SimJaccard_ij of entities e_i and e_j is:

SimJaccard_ij = |Inter(d_i, d_j)| / |Union(d_i, d_j)|

in the above formula, Inter(d_i, d_j) is the intersection of the words in d_i and d_j, and Union(d_i, d_j) is the union of the words in d_i and d_j.

The knowledge graph recommendation system fusing entity information and popularity provided by the invention may also have the following technical features.

The graph embedded representation learning module learns with an improved method based on the TransE algorithm, and the improvement to the TransE algorithm is as follows:

during training, a weight w is introduced for each edge:

in the above formula, v_t is the number of accesses of the tail entity and measures the popularity of the entity; v_max and v_min are the maximum and minimum numbers of accesses over all edges of the head entity h and are used to normalize the weight of each entity; α is a hyper-parameter used to compensate the weight of low-popularity edges. For each edge <h, r, t> in the knowledge graph, the weight w is calculated to obtain a set of model inputs (h, r, t, w), and the final loss function is:

in the above formula, (h', r, t') denotes a negative sample, and γ is a margin hyper-parameter whose value is preset.

Further, the knowledge graph recommendation system fusing entity information and popularity may also have the technical feature that the value of γ is 1.

The knowledge graph recommendation system fusing entity information and popularity provided by the invention may also have the technical feature that the entity recommendation generation module adopts an improved RankNet model comprising a bottom layer, a middle layer, a main part and an output layer. The bottom layer has three inputs vec_i, vec_j and vec_k, which are the embedded representations of entities e_i, e_j and e_k, respectively; the bottom layer calculates the differences between the embedded representations of the entities. The middle layer has two further inputs sim_k,i and sim_k,j, which are the similarities of the entity pairs (e_i, e_k) and (e_j, e_k), respectively. The inputs of the middle layer are combined with the output of the bottom layer (i.e., the calculated differences of the embedded representations), and the main part calculates S_k,i and S_k,j. The output layer obtains prob_i,j as the final similarity score by the following formula:

in the above formula, P_ij is the probability that entity e_i is more relevant than e_j to the target entity e_k, i.e., prob_i,j.

Further, the knowledge graph recommendation system fusing entity information and popularity may also have the technical feature that the access data of each user is used as input when training the improved RankNet model, so that a trained improved RankNet model corresponding to each user is obtained.

Action and Effect of the invention

According to the knowledge graph recommendation system fusing entity information and popularity provided by the invention, the entity similarity calculation module calculates the similarity of the entities using the entity names, text description information and triple semantic information of the entities in the knowledge graph; the graph embedded representation learning module learns embedded representations of the entities in the knowledge graph with the relations between the entities as input; and the entity recommendation generation module uses the click feedback data of the user to fuse the outputs of the two preceding modules into a final similarity score. Therefore, when entity similarity ranking and entity recommendation are carried out based on the final similarity score, the semantic information and graph structure information in the knowledge graph are fully utilized and the user's intention is also taken into account, so that the recommendation results are more accurate and personalized.

Drawings

FIG. 1 is a diagram of a knowledge-graph recommendation system that fuses entity information and heat in accordance with an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of an entity similarity calculation module according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the core idea of the TransE algorithm;

FIG. 4 is a schematic diagram of a model framework of an entity recommendation generation module according to an embodiment of the invention.

Detailed Description

The following describes specific embodiments of the present invention.

Fig. 1 is a configuration diagram of a knowledge-graph recommendation system that merges entity information and heat according to an embodiment of the present invention.

As shown in fig. 1, a knowledge graph recommendation system (hereinafter referred to as recommendation system) 100 for fusing entity information and popularity in the present embodiment includes an entity similarity calculation module 10, a graph embedded representation learning module 20, and an entity recommendation generation module 30.

The recommendation system 100 of this embodiment performs entity recommendation according to a knowledge graph 200 once the knowledge graph 200 has been established for a plurality of recommendable objects (i.e., different entities, such as goods in online shopping). The recommendation system 100 may be arranged in recommendation applications that need to recommend entities to users, such as an online shopping website, a comprehensive content publishing website or a related APP. The knowledge graph 200 is obtained by linking different types of entities through various semantic relations according to the entity information relevant to the recommendation application (such as the names, attributes, categories and text introductions/descriptions of goods). For the specific construction method, reference may be made to prior-art knowledge graph construction methods, which are not described again here.

After the text information of the knowledge graph is input, the entity similarity calculation module 10 calculates the similarity of the entities using the entity names, text description information and triple semantic information of the entities in the knowledge graph 200, and thereby obtains a plurality of recommended entities. The graph embedded representation learning module 20 takes the relations between the entities in the knowledge graph 200 as input and learns embedded representations of the entities with a TransE model improved for the recommendation application, so that the rich graph structure information contained in the graph is fully utilized. The entity recommendation generation module 30 includes a recommendation ranking model based on RankNet and uses the click feedback data of the user to fuse the outputs of the entity similarity calculation module 10 and the graph embedded representation learning module 20 into a final similarity score, so that the recommendation application recommends entities to the user according to the similarity score (for example, information on entities with a high similarity score is displayed in a content browsing interface for the user to click and view).

The structure and specific working principle of each module are explained in turn below.

1. Entity similarity calculation module

The input of the entity similarity calculation module 10 of this embodiment is the entity names, text description information and triple semantic information obtained from the knowledge graph 200. The structured triple information <h, r, t> in the knowledge graph 200 is converted into unstructured text data and appended to the text description of the entity, so that an overall text similarity can be calculated.
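The conversion of triples into text can be sketched as follows. This is a minimal illustration only: the function names `triples_to_text` and `augment_description` and the sentence template are hypothetical and are not specified in the source, which states only that triples are turned into unstructured text and added to the description.

```python
def triples_to_text(triples, template="{h} {r} {t}."):
    """Render <h, r, t> triples as plain sentences.

    The template is an illustrative assumption; the source does not say
    how a triple is verbalized.
    """
    return " ".join(template.format(h=h, r=r, t=t) for h, r, t in triples)


def augment_description(description, triples):
    """Append the verbalized triples to an entity's description text."""
    return f"{description} {triples_to_text(triples)}".strip()


text = augment_description(
    "Actor and director.",
    [("Wu Jing", "nationality", "China")],
)
```

The augmented text can then be passed through the same preprocessing and similarity measures as any other description text.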

Fig. 2 is a schematic configuration diagram of an entity similarity calculation module according to an embodiment of the present invention.

As shown in fig. 2, the entity similarity calculation module 10 includes a literal similarity calculation submodule 11 for calculating the similarity between the names of entities, a text similarity calculation submodule 12 for calculating the similarity between the description texts of the entities, and a matrix generation submodule 13 for generating an entity similarity matrix from the results of the literal similarity calculation submodule 11 and the text similarity calculation submodule 12.

The name of an entity (i.e., the entity name) often carries a great deal of intuitive information. The literal similarity calculation submodule 11 of this embodiment takes the entity names as input and calculates the literal similarity of the entities based on the longest common subsequence (LCS) and the Levenshtein edit distance.

The LCS-based literal similarity is calculated as follows:

If a new sequence T is obtained by deleting any number of characters from a sequence S, T is called a subsequence of S. Among the common subsequences of two sequences X and Y, the one with the greatest length is defined as the longest common subsequence of X and Y. Let the entity names corresponding to entities e_i and e_j be s_i and s_j. Their similarity SimLCS_ij based on the longest common subsequence is:

where |LCS(s_i, s_j)| is the length of the longest common subsequence of s_i and s_j, and |s_i| and |s_j| are the lengths of s_i and s_j, respectively.
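A minimal sketch of an LCS-based name similarity is given below. The LCS length is computed with the standard dynamic program. The normalization is an assumption: the text names the quantities |LCS(s_i, s_j)|, |s_i| and |s_j| but the formula itself is not reproduced here, so the sketch divides by the longer name's length; other normalizations (for example by the sum of the lengths) would fit the same definitions.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-wise DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def sim_lcs(a: str, b: str) -> float:
    """LCS similarity in [0, 1]; dividing by max length is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty names are treated as identical
    return lcs_length(a, b) / longest
```

With this normalization, identical names score 1 and names with no shared characters score 0.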

The Levenshtein-based literal similarity is calculated as follows:

The Levenshtein distance, also called the edit distance, is the minimum number of edit operations required to convert one character string into another; it is a distance metric that measures similarity by the number of operation steps.

The edit operations permitted in this embodiment are: replacing one character with another; inserting a character; and deleting a character. This embodiment specifically uses the class edit distance, a variant of the edit distance in which a deletion or an insertion adds 1 to the number of operation steps, whereas a replacement adds 2.

For entity names s_i and s_j, their similarity SimED_ij based on the Levenshtein edit distance is:

where ldist(s_i, s_j) is the class edit distance between s_i and s_j, and |s_i| and |s_j| are the lengths of s_i and s_j, respectively.
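The class edit distance described above (insertion and deletion cost 1, replacement costs 2) can be sketched as follows. The similarity normalization is an assumption: the formula is not reproduced in this text, so the sketch uses the ratio (|s_i| + |s_j| - ldist) / (|s_i| + |s_j|), which is consistent with the quantities the text defines and maps the distance to the range [0, 1].

```python
def class_edit_distance(a: str, b: str) -> int:
    """Edit distance where insert/delete cost 1 and substitution costs 2."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if ch_a == ch_b else 2)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def sim_ed(a: str, b: str) -> float:
    """Similarity from the class edit distance (assumed normalization)."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # two empty names are treated as identical
    return (total - class_edit_distance(a, b)) / total
```

Because a replacement costs as much as a deletion followed by an insertion, the distance never exceeds |s_i| + |s_j|, so the assumed ratio cannot become negative.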

The text similarity calculation submodule 12 takes the description text of each entity as an input document and calculates the text similarity of the entities using five indexes, namely TF-IDF, the LSI model, the LDA model, the HDP model and the Jaccard coefficient. Each input document is first preprocessed by word segmentation, stop-word removal and vectorization (bag-of-words model); the similarity between every two preprocessed input documents is then calculated using four similarity models.
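The preprocessing steps can be sketched as follows. This is an illustration under stated assumptions: whitespace splitting stands in for word segmentation (Chinese text would need a dedicated segmenter), and the stop-word list `STOP_WORDS` is a small hypothetical example rather than the list used in the embodiment.

```python
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = frozenset({"the", "a", "an", "of", "and", "is"})


def preprocess(text, stop_words=STOP_WORDS):
    """Segment into words (whitespace split as a stand-in) and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in stop_words]


def bag_of_words(tokens):
    """Vectorize a token list as a sparse term -> count mapping."""
    return Counter(tokens)


tokens = preprocess("The Wandering Earth is a film")
vector = bag_of_words(tokens)
```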

The TF-IDF-based entity text similarity is calculated as follows:

The feature items obtained from preprocessing are represented with a vector space model and weighted by TF-IDF factors, and the similarity of documents is computed as the cosine of the angle between their vectors.

For entities e_i and e_j with corresponding description texts d_i and d_j, their TF-IDF-based similarity SimTFIDF_ij is:

SimTFIDF_ij = cos(v_i, v_j)

where v_i is the TF-IDF weighted vector of d_i, and each dimension w_ij is defined as the smoothed TF-IDF factor:

where tf_ij is the frequency of the given word in the document, N is the total number of documents in the corpus, and df_ij is the number of documents containing the word.
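A minimal sketch of TF-IDF weighting with cosine similarity follows. The smoothing is an assumption: the exact smoothed factor is not reproduced in this text, so the sketch uses tf · (ln((N + 1) / (df + 1)) + 1), one common smoothed variant. The function names are illustrative.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one sparse {term: weight} per doc."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Assumed smoothing: add-one inside the log, plus 1 outside.
            term: count * (math.log((n_docs + 1) / (df[term] + 1)) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


vecs = tfidf_vectors([["a", "b"], ["a", "b"], ["c"]])
sim_same = cosine(vecs[0], vecs[1])
sim_disjoint = cosine(vecs[0], vecs[2])
```

The same `cosine` helper applies to the LSI, LDA and HDP representations, since each of those similarities is also defined as cos(v_i, v_j) over a different document vector.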

The LSI-based entity text similarity is calculated as follows:

LSI captures the correlation between words by singular value decomposition (SVD), mapping document vectors into a new semantic vector space. The (word-document) matrix is decomposed by SVD and the k largest singular values are retained, giving the dimension-reduced vectorized representation D' of the documents.

For description texts d_i and d_j, their LSI-based similarity SimLSI_ij is:

SimLSI_ij = cos(v_i, v_j)

where v_i is the vector in D' corresponding to d_i, and v_j is the vector in D' corresponding to d_j.

The LDA-based entity text similarity is calculated as follows:

The LDA model groups similar words into topics, so that words under the same topic behave like near-synonyms, while a word that appears under different topics behaves like a polysemous word. When LDA is used for text similarity calculation, the goal is to find the topic distribution of each document and to use these topic distributions to calculate the similarity between texts.

For description texts d_i and d_j, their LDA-based similarity SimLDA_ij is:

SimLDA_ij = cos(v_i, v_j)

where v_i is the (document-topic) distribution corresponding to d_i, and v_j is the (document-topic) distribution corresponding to d_j.

The HDP-based entity text similarity is calculated as follows:

HDP is built on the Dirichlet process (DP) and can learn the number of topics automatically. It replaces the finite topic mixture of LDA with a probability measure whose number of components tends to infinity. A new DP is established for each document; the extracted topics are shared by all DPs, but different DPs have different mixing proportions, which are obtained by sampling from the upper-level DP.

As with LDA, for description texts d_i and d_j, their HDP-based similarity SimHDP_ij is:

SimHDP_ij = cos(v_i, v_j)

The Jaccard-based entity text similarity is calculated as follows:

Intuitively, the larger the shared part of two sentences, i.e., the more co-occurring words they have, the higher their similarity should be; the traditional Jaccard algorithm is based on this idea. The proportion of co-occurring words relative to all words reflects the similarity of the two sentences numerically.

For description texts d_i and d_j, their Jaccard-based similarity SimJaccard_ij is:

SimJaccard_ij = |Inter(d_i, d_j)| / |Union(d_i, d_j)|

where Inter(d_i, d_j) is the intersection of the words in d_i and d_j, and Union(d_i, d_j) is the union of the words in d_i and d_j.
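The Jaccard coefficient over the word sets of two description texts can be sketched as follows. The function name and the handling of two empty documents (returning 0.0) are illustrative choices, not taken from the source.

```python
def sim_jaccard(tokens_i, tokens_j):
    """Share of co-occurring words among all distinct words of both texts."""
    set_i, set_j = set(tokens_i), set(tokens_j)
    union = set_i | set_j
    if not union:
        return 0.0  # assumed convention for two empty documents
    return len(set_i & set_j) / len(union)


score = sim_jaccard(["a", "b", "c"], ["b", "c", "d"])
```

Here two of the four distinct words are shared, so the score is 0.5.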

After the above similarity calculations are completed, the matrix generation submodule 13 collects the various similarity results for each pair of entities, pairing the entities two by two, to form a similarity matrix over the different entity pairs.
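One possible layout for the similarity matrix is sketched below, assuming that every similarity measure is symmetric and that each entity pair is stored with one score per measure. The function `build_similarity_matrix` and its dictionary layout are hypothetical; the source does not specify the data structure.

```python
from itertools import combinations


def build_similarity_matrix(entities, measures):
    """Collect every measure's score for every unordered entity pair.

    entities: {entity_id: payload}, e.g. a name or a token list.
    measures: {measure_name: function(payload_a, payload_b) -> float}.
    Returns the ordered measure names and a {(id_a, id_b): [scores]} mapping.
    """
    names = sorted(measures)
    matrix = {}
    for a, b in combinations(sorted(entities), 2):
        scores = [measures[name](entities[a], entities[b]) for name in names]
        matrix[(a, b)] = scores
        matrix[(b, a)] = scores  # symmetric measures assumed
    return names, matrix


names, matrix = build_similarity_matrix(
    {"e1": "abc", "e2": "abd"},
    {"same_length": lambda x, y: float(len(x) == len(y))},
)
```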

2. Graph embedded representation learning module

The graph embedded representation learning module 20 aims to recommend entities by using the relations between the entities of the knowledge graph. It uses a graph embedding learning method to learn a low-dimensional vector for each entity and relation in the knowledge graph while preserving the original structural information of the graph. These vectors serve as an important input to the next module and influence the final entity recommendation output of the system.

In this embodiment, the graph embedded representation learning module 20 learns with an improved method based on the TransE algorithm.

Fig. 3 is a schematic diagram of the core idea of the TransE algorithm.

As shown in fig. 3, the TransE algorithm is a classical graph embedding learning method for knowledge graphs, in which, for each triple <head, relation, tail>, the embedding (h + r) should be as close as possible to t.
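The TransE idea can be illustrated with a short distance function: a triple fits well when the distance between h + r and t is small. The sketch uses the L2 norm as the dissimilarity; this is one common choice, and the norm used in the embodiment is not stated.

```python
import math


def transe_distance(h, r, t):
    """L2 distance between the translated head (h + r) and the tail t."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


# A triple whose tail equals head + relation has distance 0.
perfect = transe_distance([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])
far = transe_distance([0.0, 0.0], [0.0, 0.0], [3.0, 4.0])
```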

In the TransE algorithm, every triple is equally important, i.e., the weight of each <h, r, t> edge is the same. In many recommendation applications, however, users are more inclined toward recommendations of more popular entities. For example, the entity "Wu Jing" has several "representative work" edges in the knowledge graph 200: <Wu Jing, representative work, The Wandering Earth> and <Wu Jing, representative work, Shaolin Wuwang>. Clearly, when users receive recommendations of entities related to "Wu Jing", they would rather be recommended the popular entity "The Wandering Earth".

In this embodiment, the entity embedding learning method is modified on the basis of the TransE algorithm to suit the characteristics of the recommendation application. During training, a weight w is introduced for each edge:

where v_t is the number of accesses of the tail entity and measures the popularity of the entity; v_max and v_min are the maximum and minimum numbers of accesses over all edges of the head entity h and are used to normalize the weight of each entity; α is a hyper-parameter used to compensate the weight of low-popularity edges.
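The weighting can be sketched as follows. The exact formula is not reproduced in this text, so the form below is an assumption consistent with the symbol definitions: v_t is min-max normalized over the edges of the head entity and α is added so that low-popularity edges keep a non-zero weight. The function names, the default α and the visit counts are illustrative.

```python
def edge_weight(v_t, v_min, v_max, alpha=0.1):
    """Assumed form: w = (v_t - v_min) / (v_max - v_min) + alpha."""
    if v_max == v_min:
        return 1.0 + alpha  # all edges equally popular (assumed convention)
    return (v_t - v_min) / (v_max - v_min) + alpha


def weights_for_head(tail_visits, alpha=0.1):
    """tail_visits: {tail_entity: access count} over all edges of one head."""
    v_min = min(tail_visits.values())
    v_max = max(tail_visits.values())
    return {
        tail: edge_weight(v, v_min, v_max, alpha)
        for tail, v in tail_visits.items()
    }


# Hypothetical access counts for two tails of the head entity "Wu Jing".
w = weights_for_head({"The Wandering Earth": 1000, "Shaolin Wuwang": 100})
```

Under this assumed form the most-visited tail receives weight 1 + α and the least-visited tail receives α.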

The knowledge graph 200 contains some trivial entities with a high number of accesses, and in the application such edges should not be weighted too heavily. For example, in the triple <Wu Jing, nationality, China>, the number of accesses of the entity "China" is very high, yet the entity is not suitable for the recommendation task.

Considering that, if the in-degree (number of incoming edges) of an entity is too high, the relations having that entity as tail entity carry little information, the module introduces the arithmetic square root of the in-degree of the tail entity as a measure of trivial entities, and the updated weight w is:

For each edge <h, r, t> in the knowledge graph, its weight w is calculated, giving a set of model inputs (h, r, t, w), and the final loss function is:

where (h', r, t') denotes a negative sample, and γ is a margin hyper-parameter, set to 1 in this embodiment.
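A weighted margin-based ranking loss of the kind described can be sketched as follows. The loss formula is not reproduced in this text, so the placement of w is an assumption: each positive triple's hinge term max(0, γ + d(h + r, t) - d(h' + r, t')) from standard TransE is multiplied by the edge weight w. The L2 distance and all names are illustrative.

```python
import math


def distance(h, r, t):
    """L2 distance between h + r and t (the norm is an assumed choice)."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def weighted_margin_loss(batch, gamma=1.0):
    """batch: iterable of (positive, negative, w).

    positive = (h, r, t) and negative = (h', r, t') are tuples of vectors;
    w is the edge weight of the positive triple. Weighting each hinge term
    by w is an assumption about where the weight enters the loss.
    """
    total = 0.0
    for (h, r, t), (h_neg, r_neg, t_neg), w in batch:
        hinge = gamma + distance(h, r, t) - distance(h_neg, r_neg, t_neg)
        total += w * max(0.0, hinge)
    return total


positive = ([0.0], [1.0], [1.0])   # distance 0: fits perfectly
near_neg = ([0.0], [1.0], [1.5])   # distance 0.5: violates the margin
far_neg = ([0.0], [1.0], [3.0])    # distance 2: already separated
loss_violating = weighted_margin_loss([(positive, near_neg, 2.0)])
loss_separated = weighted_margin_loss([(positive, far_neg, 2.0)])
```

With γ = 1, a negative sample contributes to the loss only while it lies within the margin of its positive triple, and popular edges (larger w) contribute proportionally more.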

3. Entity recommendation generation module

The entity recommendation generation module 30 of this embodiment includes a ranking model. The model is trained with a pairwise training method, using user click data to train the parameters w of the model.

In the prior art, an entity similarity calculation module based on semantic information is used directly to provide an unsupervised entity recommendation scheme; that is, according to the content the user is attending to (for example, the entity being browsed), other entities with high similarity are ranked and recommended to the user. In such a recommendation system, however, the quality of the ranking depends entirely on the output of the model, and the ranking produced by an unsupervised method hardly reflects the user's preferences: it merely adds up the scores of the different similarity models and therefore cannot fit the behavior of users in a real environment. To integrate users' actual selection preferences into the recommendation system, the problem is converted into how to obtain the optimal model parameters w using user access data.

In this embodiment, the model is trained with a pairwise learning-to-rank method, so each training example in the training set is an entity pair with a partial order relationship. The training data set used by the model in the entity recommendation generation module 30 is constructed from user click data: from the users' click records on the links that the recommendation engine presents, the entities in the system's original recommendation list are ranked by their relevance to the target entity; an entity with a high click rate and an entity with a low click rate are then selected from this ranking to construct an entity pair with a partial order relationship, which serves as one training example. The specific method is as follows:

For a certain target entity e_k in the graph, the entities that the recommendation engine presents to the user for e_k are sorted by the number of times users click through the recommended links, and the first N and the last N entities of the sorted sequence form two sets A_k and B_k. Relative to the entities in set B_k, the entities in set A_k are considered more relevant to entity e_k. One entity is taken from each of A_k and B_k to form an entity pair (e_i, e_j): if e_i ∈ A_k and e_j ∈ B_k, the target Y is 1; otherwise Y is 0.
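The construction of pairwise training examples can be sketched as follows. The function name, the tie-breaking rule and the check that the two sets do not overlap are illustrative assumptions; the source specifies only the sorting by click count, the sets A_k and B_k, and the label Y.

```python
def build_training_pairs(click_counts, n):
    """Build (e_i, e_j, Y) examples for one target entity e_k.

    click_counts: {entity_id: clicks on the recommended link from e_k}.
    n: size of the top set A_k and of the bottom set B_k.
    """
    if 2 * n > len(click_counts):
        raise ValueError("A_k and B_k would overlap; choose a smaller n")
    # Sort by clicks, most clicked first; ties broken by id (assumption).
    ranked = sorted(click_counts, key=lambda e: (-click_counts[e], e))
    a_k, b_k = ranked[:n], ranked[-n:]
    pairs = []
    for e_top in a_k:
        for e_bottom in b_k:
            pairs.append((e_top, e_bottom, 1))  # e_i in A_k, e_j in B_k: Y = 1
            pairs.append((e_bottom, e_top, 0))  # reversed order: Y = 0
    return pairs


# Hypothetical click counts for four entities recommended from one target.
clicks = {"e1": 50, "e2": 30, "e3": 5, "e4": 1}
pairs = build_training_pairs(clicks, 1)
```

Emitting both orderings of each pair gives the model examples with Y = 1 and Y = 0 from the same evidence.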

As shown in FIG. 4, the entity recommendation generation module 30 uses a RankNet model to predict which of two entities is more relevant; that is, for any entity pair e_i and e_j, the probability P_ij that e_i is more relevant to the target entity e_k than e_j is:

Specifically, the model structure is shown in FIG. 4. The bottom layer of the model has three inputs, vec_i, vec_j, and vec_k, which are the embedded representations of entities e_i, e_j, and e_k, respectively, obtained by the graph embedded representation learning module 20. After vec_i, vec_j, and vec_k are input to the bottom layer, the difference between the embedded representations of two entities is obtained through the calculation formula of the bottom layer:

dis_k,i＝vec_k-vec_i

The middle layer of the model has two further inputs, sim_k,i and sim_k,j, which are the similarities of the entity pairs (e_i, e_k) and (e_j, e_k), respectively. The inputs of the middle layer are combined with the output of the bottom layer (i.e., the calculated differences of the embedded representations), and S_k,i and S_k,j are calculated by the main part of the model. prob_i,j can then be obtained by the above formula; prob_i,j is the probability P_ij that entity e_i is more relevant to the target entity e_k than e_j is, i.e., the final similarity score.

Thus, the model adds the distance features of entities in the knowledge graph 200, so that the distance relationship between two entities in the knowledge graph 200 is taken into account when the final similarity score is obtained. In addition, the main part of the model adopts the structure of the prior-art RankNet model, which comprises a plurality of DENSE modules; its specific structure and working principle are the same as those of the prior-art RankNet model and are not repeated here.
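The data flow through the bottom layer, middle layer, main part, and output layer can be illustrated with a small sketch. It rests on two assumptions that the text does not confirm: the output uses the standard RankNet logistic form P_ij = 1 / (1 + exp(-(S_k,i - S_k,j))), and a single linear scorer with hypothetical `weights` stands in for the DENSE modules of the main part.

```python
import math


def embedding_difference(vec_k, vec_x):
    """Bottom layer: dis_k,x = vec_k - vec_x, element by element."""
    return [a - b for a, b in zip(vec_k, vec_x)]


def score(vec_k, vec_x, sim_kx, weights, bias=0.0):
    """Middle layer plus main part (assumed linear stand-in for the DENSE modules).

    The embedding difference is concatenated with the similarity sim_k,x and
    mapped to one scalar score S_k,x. `weights` needs one entry per feature.
    """
    features = embedding_difference(vec_k, vec_x) + [sim_kx]
    return sum(w * f for w, f in zip(weights, features)) + bias


def pairwise_probability(s_ki, s_kj):
    """Output layer (assumed standard RankNet form): P_ij = sigmoid(S_k,i - S_k,j)."""
    return 1.0 / (1.0 + math.exp(-(s_ki - s_kj)))


# Illustrative values only; none of these numbers come from the embodiment.
vec_k, vec_i, vec_j = [1.0, 0.0], [0.5, 0.0], [0.0, 1.0]
weights = [-1.0, -1.0, 2.0]
s_ki = score(vec_k, vec_i, 0.9, weights)
s_kj = score(vec_k, vec_j, 0.1, weights)
prob_ij = pairwise_probability(s_ki, s_kj)
```

Because the same scorer is applied to both (e_i, e_k) and (e_j, e_k), swapping e_i and e_j yields 1 - prob_i,j, which matches the pairwise labels Y = 1 and Y = 0 used in training.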

Functions and effects of the embodiment

In this embodiment, the entity similarity calculation module calculates the similarity of entities from the entity names, text description information, and triple semantic information of the entities in the knowledge graph; the graph embedded representation learning module learns embedded representations of the entities in the knowledge graph by taking the relations between the entities as input; and the entity recommendation generation module uses the click feedback data of users to fuse the outputs of the entity similarity calculation module and the graph embedded representation learning module into a final similarity score.

In this embodiment, because the graph embedded representation learning module weights each triple according to the popularity (heat) of its tail entity and the in-degree information of its tail entity, higher-quality triples receive higher training weights than in the prior art, which improves the effect of the representation learning algorithm.

In this embodiment, the entity similarity calculation module includes a literal similarity calculation submodule for calculating the similarity between entity names and a text similarity calculation submodule for calculating the similarity between the description texts of entities. These two submodules use the large amount of knowledge held in the knowledge graph, such as triple information and entity description information, to calculate unsupervised entity similarity without any user click feedback data, and the result provides a similarity ranking between entities from which the recommendation application can offer the user a list of recommended entities. The recommendation system of this embodiment can therefore start without user click feedback, i.e., it is a cold-start system. Compared with prior-art hot-start systems, which need a certain amount of user click feedback data before they can recommend, it is subject to fewer environmental constraints in application.

The entity recommendation generation module of this embodiment uses users' click information and a model based on a learning-to-rank method to fuse the inter-entity similarity obtained by the entity similarity calculation module with the entity embedding vectors obtained by the graph embedded representation learning module. It thereby obtains the final similarity score of the entities and ranks similar entities by that score, so that the ranking better fits the real intent of users and achieves a better personalized recommendation effect.

Claims

1. A knowledge graph recommendation system fusing entity information and popularity, arranged in a recommendation application that recommends entities to a user, and used for obtaining similarity scores of the entities from a knowledge graph built from the related information of a plurality of entities and from the click feedback information generated when users browse the related information of the entities, so that the recommendation application makes recommendations according to the similarity scores, the system being characterized by comprising:

an entity similarity calculation module, which takes text information of the knowledge graph as input and calculates the similarity of the entities by using the entity names, text description information, and triple semantic information of the entities in the knowledge graph;

a graph embedded representation learning module, which comprises a TransE model improved for the recommendation application and performs machine learning with the relations between the entities in the knowledge graph as input, so as to obtain embedded representations of the entities in the knowledge graph; and

an entity recommendation generation module, which, based on the click feedback data of the user, fuses the similarity obtained by the entity similarity calculation module and the embedded representations obtained by the graph embedded representation learning module into a final similarity score, so that the recommendation application recommends entities to the user according to the similarity score.

2. The system according to claim 1, characterized in that:

the entity similarity calculation module comprises a literal similarity calculation submodule for calculating the similarity between the names of the entities, a text similarity calculation submodule for calculating the similarity between the description texts of the entities, and a matrix generation submodule for generating an entity similarity matrix, as the similarity, from the results of the literal similarity calculation submodule and the text similarity calculation submodule.

3. The system according to claim 2, characterized in that:

the literal similarity calculation submodule takes the entity names of the entities as input and calculates the text similarity of the entities based on the longest common subsequence (LCS) and the Levenshtein edit distance;

let the entity names corresponding to entities e_i and e_j be s_i and s_j;

the similarity SimLCS_ij of entities e_i and e_j based on the longest common subsequence (LCS) is:

in the above formula, |LCS(s_i, s_j)| is the length of the longest common subsequence of s_i and s_j, and |s_i| and |s_j| are the lengths of s_i and s_j, respectively;

the similarity SimED_ij of entities e_i and e_j based on the Levenshtein edit distance is:
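For illustration only, the two quantities named in this claim, the LCS length and the Levenshtein edit distance, can be computed with standard dynamic programming as sketched below. The normalizations in `sim_lcs` and `sim_ed` (dividing by the longer name's length) are assumptions chosen for the example; the claimed formulas themselves are not reproduced in this text.

```python
def lcs_length(s, t):
    """Length of the longest common subsequence of s and t."""
    prev = [0] * (len(t) + 1)
    for ch in s:
        curr = [0]
        for j, tj in enumerate(t, start=1):
            if ch == tj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def levenshtein(s, t):
    """Minimum number of insertions, deletions, and substitutions turning s into t."""
    prev = list(range(len(t) + 1))
    for i, ch in enumerate(s, start=1):
        curr = [i]
        for j, tj in enumerate(t, start=1):
            cost = 0 if ch == tj else 1
            curr.append(min(prev[j] + 1,          # delete ch
                            curr[j - 1] + 1,      # insert tj
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def sim_lcs(s_i, s_j):
    """Assumed normalization: |LCS(s_i, s_j)| / max(|s_i|, |s_j|)."""
    longest = max(len(s_i), len(s_j))
    return 1.0 if longest == 0 else lcs_length(s_i, s_j) / longest


def sim_ed(s_i, s_j):
    """Assumed normalization: 1 - ED(s_i, s_j) / max(|s_i|, |s_j|)."""
    longest = max(len(s_i), len(s_j))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s_i, s_j) / longest
```

Both similarities lie in [0, 1] and equal 1 for identical names, which makes them easy to combine into one entity similarity matrix.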

4. The system according to claim 2, characterized in that:

the text similarity calculation submodule takes the description text of each entity as an input document and calculates the text similarity of the entities using five measures: TF-IDF, the LSI model, the LDA model, the HDP model, and the Jaccard coefficient.

5. The system according to claim 4, characterized in that:

for entities e_i and e_j, let the corresponding description texts be d_i and d_j, respectively;

the similarity SimTFIDF_ij of entities e_i and e_j based on TF-IDF is:

SimTFIDF_ij＝cos(v_i,v_j)

the similarity SimLSI_ij of entities e_i and e_j based on LSI is:

SimLSI_ij＝cos(v_i,v_j)

in the above formula, v_i is the vector in D' corresponding to d_i, v_j is the vector in D' corresponding to d_j, and D' is the vectorized representation of the documents after dimension reduction;

the similarity SimLDA_ij of entities e_i and e_j based on LDA is:

SimLDA_ij＝cos(v_i,v_j)

the similarity SimHDP_ij of entities e_i and e_j based on HDP is:

SimHDP_ij＝cos(v_i,v_j)

the similarity SimJaccard_ij of entities e_i and e_j based on Jaccard is:
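As a hedged illustration of the two kinds of measure used in this claim, the sketch below computes a cosine similarity between two document vectors and a Jaccard coefficient between two description texts. The sparse-dictionary vector format and the whitespace tokenization are assumptions made for the example; how the vectors v_i and v_j are produced (TF-IDF, LSI, LDA, or HDP) is outside the sketch, and the Jaccard coefficient is shown in its textbook form |A ∩ B| / |A ∪ B| rather than as the claimed formula, which is not reproduced in this text.

```python
import math


def cosine(v_i, v_j):
    """cos(v_i, v_j) for sparse vectors given as {term_or_topic: weight} dicts."""
    dot = sum(w * v_j.get(key, 0.0) for key, w in v_i.items())
    norm_i = math.sqrt(sum(w * w for w in v_i.values()))
    norm_j = math.sqrt(sum(w * w for w in v_j.values()))
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0  # convention for an all-zero vector
    return dot / (norm_i * norm_j)


def jaccard(d_i, d_j):
    """Jaccard coefficient of the token sets of two texts (whitespace tokens assumed)."""
    a, b = set(d_i.split()), set(d_j.split())
    if not a and not b:
        return 1.0  # convention for two empty texts
    return len(a & b) / len(a | b)
```

For description texts in languages without whitespace word boundaries, a word segmenter would have to replace `split()`.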

6. The system according to claim 1, characterized in that:

the graph embedded representation learning module learns by an improved method based on the TransE algorithm, the improvement to the TransE algorithm being as follows:

during training, a weight w is introduced for each edge:

in the above formula, v_t is the number of accesses of the tail entity and measures the popularity of that entity; v_max and v_min are the maximum and minimum numbers of accesses over all edges of the head entity h and are used to normalize the weight of each entity; and α is a hyperparameter used to compensate the weights of low-popularity edges;

for each edge <h, r, t> in the knowledge graph, the weight w is calculated to obtain one group of inputs (h, r, t, w) of the model, and the final loss function is:
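The weight and loss formulas of this claim are not reproduced in this text. Purely as a sketch of how a per-edge weight can enter training, the example below assumes the usual TransE margin ranking loss with an L2 distance, scaled by w for each edge. The function names, the negative-sample format, and the exact form of the loss are assumptions; the weight w is taken as a given input rather than computed.

```python
import math


def transe_distance(h, r, t):
    """L2 distance between h + r and t (the usual TransE dissimilarity)."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def weighted_margin_loss(samples, gamma=1.0):
    """Assumed weighted margin loss: sum of w * max(0, gamma + d(pos) - d(neg)).

    samples: iterable of (h, r, t, w, h_neg, t_neg), where h, r, t are the
             embeddings of a true edge, w is its precomputed weight, and
             (h_neg, t_neg) are the embeddings of a corrupted edge.
    gamma:   margin.
    """
    total = 0.0
    for h, r, t, w, h_neg, t_neg in samples:
        positive = transe_distance(h, r, t)
        negative = transe_distance(h_neg, r, t_neg)
        total += w * max(0.0, gamma + positive - negative)
    return total
```

Under this assumed form, an edge with a larger w contributes a proportionally larger gradient whenever its margin is violated, which is one way for higher-quality triples to receive higher training weights.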

7. The system according to claim 6, characterized in that:

γ is 1.

8. The system according to claim 1, characterized in that:

the entity recommendation generation module adopts an improved RankNet model;

the improved RankNet model comprises a bottom layer, a middle layer, a main part, and an output layer;

the bottom layer has three inputs vec_i, vec_j, and vec_k, which are the embedded representations of entities e_i, e_j, and e_k, respectively, and the bottom layer calculates the difference between the embedded representations of the entities;

the middle layer has two further inputs sim_k,i and sim_k,j, which are the similarities of the entity pairs (e_i, e_k) and (e_j, e_k), respectively; the inputs of the middle layer are combined with the output of the bottom layer (i.e., the calculated differences of the embedded representations), and S_k,i and S_k,j are calculated by the main part;

the output layer calculates prob_i,j as the final similarity score using the following formula:

in the above formula, P_ij is the probability that entity e_i is more relevant to the target entity e_k than entity e_j is, i.e., prob_i,j.

9. The system according to claim 8, characterized in that:

when the improved RankNet model is trained, the access data of each user is used as input, so that a trained improved RankNet model corresponding to each user is obtained.