CN112966091A - Knowledge graph recommendation system fusing entity information and heat - Google Patents

Knowledge graph recommendation system fusing entity information and heat

Info

Publication number
CN112966091A
Authority
CN
China
Prior art keywords
entity
similarity
entities
information
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911272709.6A
Other languages
Chinese (zh)
Other versions
CN112966091B (en)
Inventor
杜雨燕
梁斌
梁家卿
肖仰华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University
Priority to CN201911272709.6A
Publication of CN112966091A
Application granted
Publication of CN112966091B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a recommendation system that makes full use of both the semantic information and the graph structure information in a knowledge graph. Its main features are: a method for calculating the semantic similarity of entities; an entity embedding calculation method tailored to entity recommendation applications; and the recommendation of entities by combining the semantic information in the graph with the graph structure information. The recommendation system specifically comprises: an entity similarity calculation module, which obtains the similarity of the entities after the text information of the knowledge graph is input; a graph embedded representation learning module, which takes the relationships between the entities in the knowledge graph as input and obtains embedded representations of the entities; and an entity recommendation generation module, which, based on the click feedback data of the user, fuses the similarity obtained by the entity similarity calculation module and the embedded representations obtained by the graph embedded representation learning module into a final similarity score, so that the recommendation application recommends entities to the user.

Description

Knowledge graph recommendation system fusing entity information and heat
Technical Field
The invention belongs to the technical field of information, relates to an information content recommendation system, and particularly relates to a knowledge graph recommendation system fusing entity information and heat.
Background
With the rapid development of information technology, large amounts of data from every field are continuously crossing and merging, producing a volume of data greater than that of any previous era and causing the problem of information overload. At the same time, this information explosion lowers the utilization rate of information and makes it difficult for users to obtain information efficiently. How to acquire useful information effectively from such a mass of information has therefore become a subject worth exploring in depth.
A traditional search algorithm (i.e., a method that searches for related objects based on the user's search terms) can only present the same ranking of objects to every user; it cannot provide rankings tailored to the different interests and preferences of individual users.
Therefore, the information browsing channels of the prior art, such as web browsing and APP browsing, mostly adopt a recommendation system, that is, relevant information is recommended and pushed to the user according to the user's current browsing situation or browsing record. A recommendation system is an important means of information filtering and one of the effective methods currently available for solving the problem of information overload; it can recommend information of interest to a user based on the relatively vague intent the user provides.
In the prior art, the recommendation methods adopted by recommendation systems can be divided into three categories: 1) Recommendation based on content similarity. This method makes recommendations according to the object the user has currently selected: it infers the user's interests from the current object and recommends other objects whose content is similar to that of the object being accessed. 2) Collaborative filtering recommendation. Its core idea is to make recommendations using the historical behavior of other user groups whose behavior resembles that of the user. The historical behavior of users is generally expressed as a user-to-object relationship matrix; users with similar historical behavior are found by computation on this matrix, and content likely to interest the user is recommended accordingly. This is the best-known recommendation algorithm and works well when there is a sufficient amount of user behavior data. 3) Recommendation based on a knowledge graph. A knowledge graph is a semantic network in which entities of different types are connected through various semantic relationships, the entities and relationships together forming a large-scale graph. Because a knowledge graph contains rich semantic associations among entities, some recommendation systems in recent years have drawn on knowledge-graph knowledge, introducing auxiliary information from the graph and proposing recommendation methods based on entity paths in the graph as well as feature-based recommendation methods.
The conventional recommendation techniques above mainly suffer from the data sparsity problem and from the cold start problem that arises when user data is lacking. In real application scenarios, a user usually interacts directly, in whatever form, with only a small number of items, so the interaction information between users and items is generally very sparse. For example, in a movie recommendation scenario the total number of movies may be as many as ten thousand, whereas each user typically rates only a few dozen. Predicting a large amount of unknown information from such sparse observations greatly increases the risk of model overfitting. Collaborative filtering, currently the most popular recommendation algorithm, struggles to provide effective recommendations when the system has accumulated only a small amount of user behavior; moreover, newly added users or items are difficult to model and recommend accurately because the system holds no interaction information for them. Conventional knowledge-graph-based recommendation uses only a small part of the knowledge in the knowledge graph and does not make full use of the semantic information and graph structure information in the graph, so its recommendation quality remains deficient.
Disclosure of Invention
To solve the above problems, the invention provides a recommendation system that makes full use of the semantic information and graph structure information in a knowledge graph. Its main features are: 1) a method for calculating the semantic similarity of entities; 2) an entity embedding calculation method tailored to entity recommendation applications; 3) the recommendation of entities by combining the semantic information in the graph with the graph structure information.
Specifically, the invention provides a knowledge graph recommendation system fusing entity information and popularity. The system is arranged in a recommendation application that recommends entities to a user, and obtains similarity scores of the entities from a knowledge graph built from the related information of a plurality of entities and from the click feedback information produced when the user browses the related information of the entities, so that the recommendation application makes recommendations according to the similarity scores. The system is characterized by comprising: an entity similarity calculation module which, after the text information of the knowledge graph is input, calculates the similarity of the entities using the entity names, the text description information and the triple semantic information of the entities in the knowledge graph; a graph embedded representation learning module, which comprises a TransE model improved for the recommendation application and performs machine learning with the relationships among the entities in the knowledge graph as input so as to obtain embedded representations of the entities in the knowledge graph; and an entity recommendation generation module which, based on the click feedback data of the user, fuses the similarity obtained by the entity similarity calculation module and the embedded representations obtained by the graph embedded representation learning module into a final similarity score, so that the recommendation application recommends entities to the user according to the similarity score.
The knowledge graph recommendation system fusing entity information and heat provided by the invention may further have the technical feature that the entity similarity calculation module comprises a literal similarity calculation submodule for calculating the similarity between the names of the entities, a text similarity calculation submodule for calculating the similarity between the description texts of the entities, and a matrix generation submodule for generating, from the results of the literal similarity calculation submodule and the text similarity calculation submodule, an entity similarity matrix that serves as the similarity.
Furthermore, the above knowledge graph recommendation system fusing entity information and heat may also have the technical feature that the literal similarity calculation submodule takes the entity names of the entities as input and calculates the literal similarity of the entities based on the longest common subsequence (LCS) and the Levenshtein edit distance. Let the entity names corresponding to entities $e_i$ and $e_j$ be $s_i$ and $s_j$. The similarity $\mathrm{SimLCS}_{ij}$ of $e_i$ and $e_j$ based on the longest common subsequence is:
[Formula for $\mathrm{SimLCS}_{ij}$: Figure BDA0002314636270000031]
In the above formula, $|\mathrm{LCS}(s_i, s_j)|$ is the length of the longest common subsequence of $s_i$ and $s_j$, and $|s_i|$ and $|s_j|$ are the lengths of $s_i$ and $s_j$ respectively.
The similarity $\mathrm{SimED}_{ij}$ of entities $e_i$ and $e_j$ based on the Levenshtein edit distance is:
[Formula for $\mathrm{SimED}_{ij}$: Figure BDA0002314636270000032]
In the above formula, $\mathrm{ldist}(s_i, s_j)$ is the class edit distance between $s_i$ and $s_j$, and $|s_i|$ and $|s_j|$ are the lengths of $s_i$ and $s_j$ respectively.
Furthermore, the knowledge graph recommendation system fusing entity information and popularity may further have the technical feature that the text similarity calculation submodule takes the description text of each entity as an input document and calculates the text similarity of the entities using five measures: the TF-IDF index, the LSI model, the LDA model, the HDP model and the Jaccard coefficient.
Further, the aforementioned knowledge graph recommendation system fusing entity information and popularity may further have the technical feature that, for entities $e_i$ and $e_j$, the description texts corresponding to the two entities are denoted $d_i$ and $d_j$ respectively.
The similarity $\mathrm{SimTFIDF}_{ij}$ of entities $e_i$ and $e_j$ based on TF-IDF is:
$\mathrm{SimTFIDF}_{ij} = \cos(v_i, v_j)$
In the above formula, $v_i$ is the TF-IDF weighted vector of $d_i$, each dimension $w_{ij}$ of which is defined as the smoothed TF-IDF factor.
The similarity $\mathrm{SimLSI}_{ij}$ of entities $e_i$ and $e_j$ based on LSI is:
$\mathrm{SimLSI}_{ij} = \cos(v_i, v_j)$
In the above formula, $v_i$ is the vector in $D'$ corresponding to $d_i$, $v_j$ is the vector in $D'$ corresponding to $d_j$, and $D'$ is the dimension-reduced vectorized representation of the documents.
The similarity $\mathrm{SimLDA}_{ij}$ of entities $e_i$ and $e_j$ based on LDA is:
$\mathrm{SimLDA}_{ij} = \cos(v_i, v_j)$
In the above formula, $v_i$ is the (document-topic) distribution corresponding to $d_i$, and $v_j$ is the (document-topic) distribution corresponding to $d_j$.
The similarity $\mathrm{SimHDP}_{ij}$ of entities $e_i$ and $e_j$ based on HDP is:
$\mathrm{SimHDP}_{ij} = \cos(v_i, v_j)$
In the above formula, $v_i$ is the (document-topic) distribution corresponding to $d_i$, and $v_j$ is the (document-topic) distribution corresponding to $d_j$.
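The similarity $\mathrm{SimJaccard}_{ij}$ of entities $e_i$ and $e_j$ based on the Jaccard coefficient is:
$\mathrm{SimJaccard}_{ij} = \dfrac{|\mathrm{Inter}(d_i, d_j)|}{|\mathrm{Union}(d_i, d_j)|}$
In the above formula, $\mathrm{Inter}(d_i, d_j)$ is the intersection of the words in $d_i$ and $d_j$, and $\mathrm{Union}(d_i, d_j)$ is the union of the words in $d_i$ and $d_j$.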
The knowledge graph recommendation system fusing entity information and heat provided by the invention may further have the technical feature that the graph embedded representation learning module learns using an improved method based on the TransE algorithm, the improvement to the TransE algorithm being as follows.
During training, a weight $w$ is introduced for each edge:
[Formulas for the edge weight $w$: Figures BDA0002314636270000051 and BDA0002314636270000052]
In the above formulas, $v_t$ is the number of accesses of the tail entity, used to measure the popularity of the entity; $v_{max}$ and $v_{min}$ are the maximum and minimum numbers of accesses over all edges of the head entity $h$, used to normalize the weight of each entity; $\alpha$ is a hyperparameter used to compensate the weight of low-popularity edges. For each edge $\langle h, r, t \rangle$ in the knowledge graph, the weight $w$ is calculated, giving a set of model inputs $(h, r, t, w)$. The final loss function is:
[Formula for the loss function: Figure BDA0002314636270000053]
In the above formula, $(h', r, t')$ denotes a negative sample, and $\gamma$ is a margin hyperparameter whose value is preset.
Further, the knowledge graph recommendation system fusing entity information and heat may further have the technical feature that $\gamma$ takes the value 1.
The knowledge graph recommendation system fusing entity information and heat provided by the invention may further have the technical feature that the entity recommendation generation module adopts an improved RankNet model comprising a bottom layer, a middle layer, a main body part and an output layer. The bottom layer has three inputs $vec_i$, $vec_j$, $vec_k$, which are the embedded representations of entities $e_i$, $e_j$, $e_k$ respectively; the bottom layer computes the differences between the embedded representations of the entities. The middle layer has two further inputs $sim_{k,i}$ and $sim_{k,j}$, which are the similarities of the entity pairs $(e_i, e_k)$ and $(e_j, e_k)$ respectively. The inputs of the middle layer are combined with the output of the bottom layer (i.e., the computed differences of the embedded representations), and the main body part calculates $S_{k,i}$ and $S_{k,j}$. The output layer then obtains $prob_{i,j}$, the final similarity score, by the following formula:
[Formula for $P_{ij}$: Figure BDA0002314636270000054]
In the above formula, $P_{ij}$ is the probability that entity $e_i$ is more relevant to the target entity $e_k$ than $e_j$ is, i.e., $prob_{i,j}$.
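The output-layer formula survives only as an image reference here. As a hedged illustration, the sketch below uses the standard RankNet form, in which the pairwise probability is the logistic function of the score difference. Whether the patent's formula has exactly this form (for instance, whether it includes a scaling factor) is an assumption, and the function name `pairwise_probability` is hypothetical.

```python
import math


def pairwise_probability(s_ki: float, s_kj: float) -> float:
    """Probability that e_i is more relevant to e_k than e_j is.

    Assumes the standard RankNet form P_ij = 1 / (1 + exp(-(S_ki - S_kj))).
    The two branches keep the computation numerically stable for
    large positive or negative score differences.
    """
    diff = s_ki - s_kj
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)
```

Under this assumed form, equal scores give a probability of 0.5, and swapping the two entities gives the complementary probability.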
Further, the knowledge graph recommendation system fusing entity information and heat may also have the technical feature that, when the improved RankNet model is trained, the access data of each user is used as input, so that a trained improved RankNet model corresponding to each user is obtained.
Action and Effect of the invention
According to the knowledge graph recommendation system fusing entity information and heat provided by the invention, the entity similarity calculation module calculates the similarity of the entities using the entity names, the text description information and the triple semantic information of the entities in the knowledge graph; the graph embedded representation learning module learns embedded representations of the entities in the knowledge graph with the relationships between the entities as input; and the entity recommendation generation module uses the click feedback data of the user to fuse the outputs of the entity similarity calculation module and the graph embedded representation learning module into a final similarity score. Therefore, when entity similarity ranking and entity recommendation are carried out based on the final similarity score, the semantic information and graph structure information in the knowledge graph are fully utilized and the intent of the user is also taken into account, so that the recommendation results are more accurate and personalized.
Drawings
FIG. 1 is a diagram of a knowledge-graph recommendation system that fuses entity information and heat in accordance with an embodiment of the present invention;
FIG. 2 is a schematic configuration diagram of an entity similarity calculation module according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the core idea of the TransE algorithm;
FIG. 4 is a schematic diagram of a model framework of an entity recommendation generation module according to an embodiment of the invention.
Detailed Description
The following describes specific embodiments of the present invention.
Fig. 1 is a configuration diagram of a knowledge-graph recommendation system that merges entity information and heat according to an embodiment of the present invention.
As shown in fig. 1, a knowledge graph recommendation system (hereinafter referred to as recommendation system) 100 for fusing entity information and popularity in the present embodiment includes an entity similarity calculation module 10, a graph embedded representation learning module 20, and an entity recommendation generation module 30.
The recommendation system 100 of the embodiment is configured to perform entity recommendation according to the knowledge graph 200 after establishing the knowledge graph 200 of a plurality of objects (i.e., different entities, such as online shopping goods) that can be recommended. The recommendation system 100 may be set in recommendation applications that need to perform entity recommendation on a user, where the recommendation applications may be an online shopping website, a comprehensive content publishing website, a related APP, and the like. The knowledge graph 200 is obtained by linking different types of entities together through various semantic relationships according to entity information (such as names, attributes, categories, text introduction/description, and the like of the goods) related to the recommended application, and a specific obtaining method can refer to a knowledge graph obtaining construction method in the prior art, which is not described herein again.
After the text information of the acquired knowledge graph is input, the entity similarity calculation module 10 calculates the similarity of the entities using the entity names, the text description information and the triple semantic information of the entities in the knowledge graph 200, and thereby obtains a plurality of candidate entities for recommendation. The graph embedded representation learning module 20 takes the relationships between the entities in the knowledge graph 200 as input and learns embedded representations of the entities in the knowledge graph 200 using a TransE model improved for the recommendation application, so that the rich graph structure information contained in the graph is fully utilized. The entity recommendation generation module 30 includes a recommendation ranking model based on RankNet and uses the click feedback data of the user to fuse the outputs of the entity similarity calculation module 10 and the graph embedded representation learning module 20 into a final similarity score, so that the recommendation application recommends entities to the user according to the similarity score (for example, entity information with a high similarity score is displayed in a content browsing interface so that the user can click and view it).
The structure and specific working principle of each module are explained in turn below.
1. Entity similarity calculation module
The input information of the entity similarity calculation module 10 of this embodiment is the entity name, the text description information, and the triple semantic information acquired in the knowledge graph 200, wherein the structured triple information < h, r, t > in the knowledge graph 200 is converted into unstructured text data, which is added to the text description of the entity, to calculate the overall text similarity.
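The conversion of triples into text can be pictured with a minimal sketch. The serialization format (head, relation and tail joined by spaces), the choice to attach only those triples in which the entity is the head, and the function names `triples_to_text` and `augment_description` are all assumptions made for illustration; the source does not specify them.

```python
def triples_to_text(triples, sep=" "):
    """Render (head, relation, tail) triples as plain text fragments."""
    return [f"{h}{sep}{r}{sep}{t}" for h, r, t in triples]


def augment_description(entity, description, triples):
    """Append the textual form of the entity's triples to its description.

    Assumption: only triples whose head is `entity` are attached.
    """
    own = [tr for tr in triples if tr[0] == entity]
    parts = [description] + triples_to_text(own)
    return ". ".join(p for p in parts if p)
```

The augmented description can then be passed to the text similarity measures as a single input document.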
Fig. 2 is a schematic configuration diagram of an entity similarity calculation module according to an embodiment of the present invention.
As shown in fig. 2, the entity similarity calculation module 10 includes a literal similarity calculation submodule 11 for calculating the similarity between the names of entities, a text similarity calculation submodule 12 for calculating the similarity between the description texts of the entities, and a matrix generation submodule 13 for generating an entity similarity matrix from the results of the literal similarity calculation submodule 11 and the text similarity calculation submodule 12.
The name of an entity (i.e., the entity name) often carries a great deal of intuitive information. The literal similarity calculation submodule 11 of this embodiment takes the entity names as input and calculates the literal similarity of the entities based on the longest common subsequence (LCS) and the Levenshtein edit distance.
The literal similarity calculation based on LCS adopts the following mode:
If a new sequence T is obtained by deleting any number of characters from a sequence S, T is called a subsequence of S. Among the common subsequences of two sequences X and Y, the one with the greatest length is defined as the longest common subsequence of X and Y. Let the entity names corresponding to entities $e_i$ and $e_j$ be $s_i$ and $s_j$; their similarity $\mathrm{SimLCS}_{ij}$ based on the longest common subsequence is:
[Formula for $\mathrm{SimLCS}_{ij}$: Figure BDA0002314636270000081]
where $|\mathrm{LCS}(s_i, s_j)|$ is the length of the longest common subsequence of $s_i$ and $s_j$, and $|s_i|$ and $|s_j|$ are the lengths of $s_i$ and $s_j$ respectively.
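A minimal sketch of an LCS-based name similarity follows. The dynamic-programming routine for the LCS length is standard. The normalization, however, is an assumption: because the formula itself is only available as an image, the sketch divides by the length of the longer name, and the patent may normalize differently. The names `lcs_length` and `sim_lcs` are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def sim_lcs(s_i: str, s_j: str) -> float:
    """LCS similarity of two entity names.

    Assumption: normalized by the longer name's length; the source
    formula is not recoverable, so another normalization may be intended.
    """
    longest = max(len(s_i), len(s_j))
    if longest == 0:
        return 1.0  # two empty names are treated as identical
    return lcs_length(s_i, s_j) / longest
```

With this normalization the score lies in [0, 1], reaching 1 only when the two names are identical.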
The literal similarity calculation based on Levenshtein adopts the following mode:
The Levenshtein distance, also called the edit distance, is the minimum number of edit operations required to convert one character string into another; it is a distance metric that measures similarity by counting operation steps.
The edit operations permitted in this embodiment are: replacing one character with another; inserting a character; and deleting a character. This embodiment specifically uses the class edit distance variant, in which a deletion or an insertion adds 1 to the operation count while a replacement adds 2.
For entity names $s_i$ and $s_j$, their similarity $\mathrm{SimED}_{ij}$ based on the Levenshtein edit distance is:
[Formula for $\mathrm{SimED}_{ij}$: Figure BDA0002314636270000082]
where $\mathrm{ldist}(s_i, s_j)$ is the class edit distance between $s_i$ and $s_j$, and $|s_i|$ and $|s_j|$ are the lengths of $s_i$ and $s_j$ respectively.
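The class edit distance described above (insertion and deletion cost 1, replacement costs 2) can be sketched as follows. The similarity ratio used here, (|s_i| + |s_j| - ldist) / (|s_i| + |s_j|), is an assumption: it is a common way to turn this distance into a score, but the patent's own formula is only available as an image. The function names are hypothetical.

```python
def class_edit_distance(a: str, b: str) -> int:
    """Edit distance with insert/delete cost 1 and replacement cost 2."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            replace = prev[j - 1] + (0 if ch_a == ch_b else 2)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, replace))
        prev = curr
    return prev[-1]


def sim_ed(s_i: str, s_j: str) -> float:
    """Similarity derived from the class edit distance.

    Assumption: ratio = (len sum - ldist) / len sum.
    """
    total = len(s_i) + len(s_j)
    if total == 0:
        return 1.0  # two empty names are treated as identical
    return (total - class_edit_distance(s_i, s_j)) / total
```

Because a replacement costs as much as a deletion plus an insertion, the distance never exceeds the combined length of the two names, so the assumed ratio stays in [0, 1].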
The text similarity calculation submodule 12 takes the description text of each entity as an input document and calculates the text similarity of the entities using five measures: TF-IDF, the LSI model, the LDA model, the HDP model and the Jaccard coefficient. Each input document is first preprocessed, that is, segmented into words, stripped of stop words and then vectorized (bag-of-words model); the similarity between every two preprocessed input documents is then calculated using four similarity models.
The entity text similarity calculation based on the TF-IDF adopts the following mode:
The feature items obtained after preprocessing are represented with a vector space model and weighted by TF-IDF factors, and the similarity of the documents is calculated as the cosine of the angle between their vectors.
For entities $e_i$ and $e_j$ with corresponding description texts $d_i$ and $d_j$, their similarity $\mathrm{SimTFIDF}_{ij}$ based on TF-IDF is:
$\mathrm{SimTFIDF}_{ij} = \cos(v_i, v_j)$
where $v_i$ is the TF-IDF weighted vector of $d_i$, each dimension $w_{ij}$ of which is defined as the smoothed TF-IDF factor:
[Formula for $w_{ij}$: Figure BDA0002314636270000091]
where $tf_{ij}$ is the frequency of the given word in the document, $N$ is the total number of documents in the corpus, and $df_{ij}$ is the number of documents containing the word.
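The TF-IDF cosine similarity can be sketched in plain Python as below. The documents are assumed to be already segmented and stripped of stop words (each is a list of tokens). The exact smoothing in the patent's formula is not recoverable, so the sketch assumes the weight tf * (ln((1 + N) / (1 + df)) + 1), the convention used by scikit-learn's `smooth_idf` option. The function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document.

    Assumed smoothing: w = tf * (ln((1 + N) / (1 + df)) + 1).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_u * norm_v)
```

The same `cosine` helper would apply unchanged to the LSI, LDA and HDP vectors, since those measures differ only in how the document vectors are obtained.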
The LSI-based calculation of the degree of similarity of entity texts is performed as follows:
LSI captures the correlation between words by means of SVD, mapping document vectors into a new semantic vector space. The (word-document) matrix is subjected to singular value decomposition and the largest k singular values are retained, which yields the dimension-reduced vectorized representation $D'$ of the documents.
For description texts $d_i$ and $d_j$, their similarity $\mathrm{SimLSI}_{ij}$ based on LSI is:
$\mathrm{SimLSI}_{ij} = \cos(v_i, v_j)$
where $v_i$ is the vector in $D'$ corresponding to $d_i$, and $v_j$ is the vector in $D'$ corresponding to $d_j$.
The LDA-based entity text similarity calculation adopts the following mode:
The LDA model groups similar words into a topic by a form of word clustering, so that words under the same topic behave like near-synonyms, while a word that appears under different topics behaves like a polysemous word. When LDA is used for text similarity calculation, the goal is to find the topic distribution of each document and to use these topic distributions to calculate the similarity between texts.
For description texts $d_i$ and $d_j$, their similarity $\mathrm{SimLDA}_{ij}$ based on LDA is:
$\mathrm{SimLDA}_{ij} = \cos(v_i, v_j)$
where $v_i$ is the (document-topic) distribution corresponding to $d_i$, and $v_j$ is the (document-topic) distribution corresponding to $d_j$.
The HDP-based entity text similarity calculation adopts the following mode:
HDP, which is built on the Dirichlet process (DP), can learn the number of topics automatically. It replaces the finite topic mixture of LDA with a probability measure whose number of components may tend to infinity. A new DP is established for each document; the extracted topics are shared by all DPs, but each DP has its own mixing proportions, which are obtained by sampling from the upper-layer DP.
Similar to LDA, for description texts $d_i$ and $d_j$, their similarity $\mathrm{SimHDP}_{ij}$ based on HDP is:
$\mathrm{SimHDP}_{ij} = \cos(v_i, v_j)$
where $v_i$ is the (document-topic) distribution corresponding to $d_i$, and $v_j$ is the (document-topic) distribution corresponding to $d_j$.
The similarity calculation of the entity text based on the Jaccard adopts the following method:
Intuitively, the larger the part two sentences have in common, that is, the more words they share, the higher their similarity should be; the traditional Jaccard algorithm is based on this idea. The proportion of shared words relative to all words reflects the similarity of the two sentences numerically.
For description texts $d_i$ and $d_j$, their similarity $\mathrm{SimJaccard}_{ij}$ based on the Jaccard coefficient is:
$\mathrm{SimJaccard}_{ij} = \dfrac{|\mathrm{Inter}(d_i, d_j)|}{|\mathrm{Union}(d_i, d_j)|}$
where $\mathrm{Inter}(d_i, d_j)$ is the intersection of the words in $d_i$ and $d_j$, and $\mathrm{Union}(d_i, d_j)$ is the union of the words in $d_i$ and $d_j$.
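The Jaccard coefficient over word sets is short enough to sketch directly. The inputs are assumed to be already segmented token lists; returning 0.0 for two empty documents is a convention chosen for this sketch, and the function name `sim_jaccard` is hypothetical.

```python
def sim_jaccard(tokens_i, tokens_j) -> float:
    """Jaccard similarity: shared words divided by all distinct words."""
    set_i, set_j = set(tokens_i), set(tokens_j)
    union = set_i | set_j
    if not union:
        return 0.0  # convention for two empty documents (assumption)
    return len(set_i & set_j) / len(union)
```

For example, the token lists `["a", "b", "c"]` and `["b", "c", "d"]` share two of four distinct words, giving a similarity of 0.5.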
After the different similarity calculations above are completed, the matrix generation submodule 13 collects the various similarity results for each pair of entities, forming a similarity matrix over the entity pairs.
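One way to picture the aggregation step is the sketch below, which applies a set of similarity measures to every pair of entities and stores the scores per pair. The data layout (a dictionary keyed by entity pair, holding one score per measure) and the function name `build_similarity_matrix` are assumptions; the source does not describe how the matrix is stored.

```python
from itertools import combinations


def build_similarity_matrix(entities, measures):
    """Collect every measure's score for every pair of entities.

    entities: dict mapping entity id -> data passed to the measures
    measures: dict mapping measure name -> callable(data_i, data_j) -> float
    Returns a dict mapping (id_i, id_j) -> {measure name: score}.
    Assumption: all measures are symmetric, so both orderings of a
    pair share the same scores.
    """
    matrix = {}
    for (id_i, data_i), (id_j, data_j) in combinations(entities.items(), 2):
        scores = {name: fn(data_i, data_j) for name, fn in measures.items()}
        matrix[(id_i, id_j)] = scores
        matrix[(id_j, id_i)] = scores
    return matrix
```

Any of the literal or text similarity functions could be passed in as `measures`; the per-pair score dictionaries then serve as the similarity inputs to the ranking model.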
2. Graph embedded representation learning module
The graph embedded representation learning module 20 is intended to recommend entities using the relationships between the entities of the knowledge graph. It uses a graph embedding learning method (graph embedding) to learn a low-dimensional vector for each entity and relationship in the knowledge graph while preserving the original structural information of the graph. These vectors are an important input to the next module and influence the final entity recommendation output of the system.
In this embodiment, the graph embedded representation learning module 20 learns using an improved method based on the TransE algorithm.
Fig. 3 is a schematic diagram of the core idea of the TransE algorithm.
As shown in fig. 3, the TransE algorithm is a classical graph embedding learning method for knowledge graphs, in which, for each triple < head, relation, tail >, the vector (h + r) should be as close as possible to t.
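The core idea can be written as a distance between h + r and t, where a smaller value means a more plausible triple. The sketch below uses the L2 norm, which is an assumption; TransE is also commonly used with the L1 norm, and the source does not say which one is chosen. The function name `transe_distance` is hypothetical.

```python
import math


def transe_distance(h, r, t) -> float:
    """L2 distance between (h + r) and t for equal-length embedding vectors.

    A triple that fits the TransE assumption well has a distance near 0.
    """
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))
```

For instance, with h = [1.0, 0.0], r = [0.0, 1.0] and t = [1.0, 1.0], the translation is exact and the distance is 0.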
In the TransE algorithm, every triple is equally important, i.e., the weight of each < h, r, t > edge is the same. In many recommendation applications, however, users are more inclined toward recommendations of more popular entities. For example, the entity "Wu Jing" has several "representative work" triples in the knowledge graph 200: < Wu Jing, representative work, The Wandering Earth >, < Wu Jing, representative work, Shaolin Wuwang >. Clearly, when users receive recommendations of entities related to "Wu Jing", they are more likely to want the popular entity "The Wandering Earth" recommended.
In this embodiment, the entity embedding learning method is modified on the basis of the TransE algorithm to suit the characteristics of the recommendation application. During training, a weight $w$ is introduced for each edge:
[Formula for the edge weight $w$: Figure BDA0002314636270000111]
where $v_t$ is the number of accesses of the tail entity, used to measure the popularity of the entity; $v_{max}$ and $v_{min}$ are the maximum and minimum numbers of accesses over all edges of the head entity $h$, used to normalize the weight of each entity; and $\alpha$ is a hyperparameter used to compensate the weight of low-popularity edges.
The knowledge graph 200 contains some trivial entities with very high access counts, and in the application such edges should not be weighted too heavily. For example, in the triple < Wu Jing, nationality, China >, the access count of the entity "China" is very high, yet the edge is of little use for the recommendation task.
Therefore, considering that when the number of incoming links of an entity is too high, the relations in which it appears as the tail entity carry little information, the module introduces the arithmetic square root of the number of incoming links of the tail entity as a measure of how trivial the entity is. The updated weight $w$ is:
[Formula for the updated edge weight $w$: Figure BDA0002314636270000112]
For each edge <h, r, t> in the knowledge graph, its weight w is calculated, giving a model input (h, r, t, w), and the final loss function is:
Figure BDA0002314636270000113
where (h', r, t') denotes a negative sample, and $\gamma$ is the margin hyperparameter, which is set to 1 in this embodiment.
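Because the updated weight and the loss function are equation images that are not reproduced in this text, the sketch below only shows one plausible reading: the weight is divided by the square root of the tail entity's in-degree, and each positive/negative pair contributes a weighted TransE margin (hinge) term. Both forms, and all function names, are assumptions rather than the patent's exact formulas.

```python
import math


def damped_weight(w, tail_in_degree):
    """Assumed form: divide the popularity weight by sqrt(in-degree of the tail)."""
    return w / math.sqrt(max(tail_in_degree, 1))


def distance(h, r, t):
    """L2 distance between h + r and t, as in TransE."""
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h, r, t)))


def weighted_margin_loss(samples, gamma=1.0):
    """Assumed loss: sum of w * max(0, gamma + d(h, r, t) - d(h', r, t')).

    samples is an iterable of (positive, negative, w), where positive is
    (h, r, t) and negative is (h', r, t').
    """
    total = 0.0
    for (h, r, t), (h_neg, r_neg, t_neg), w in samples:
        hinge = gamma + distance(h, r, t) - distance(h_neg, r_neg, t_neg)
        total += w * max(0.0, hinge)
    return total


positive = ([0.0, 0.0], [1.0, 0.0], [1.0, 0.0])       # distance 0
near_negative = ([0.0, 0.0], [1.0, 0.0], [1.0, 0.5])  # distance 0.5
w = damped_weight(2.0, tail_in_degree=4)              # 2 / sqrt(4)
print(weighted_margin_loss([(positive, near_negative, w)], gamma=1.0))
```

With gamma set to 1 as in this embodiment, a negative sample already more than one unit farther away than the positive sample contributes nothing to the loss.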
3. Entity recommendation generation module
The entity recommendation generation module 30 of this embodiment includes a ranking model. The model is trained with a pairwise training method, using user click data to learn the model parameters w.
In the prior art, an entity similarity calculation module based on semantic information is used directly to provide an unsupervised entity recommendation scheme: other entities with high similarity to the content the user is focused on (for example, the entity being browsed) are ranked and recommended to the user. In such a recommendation system, however, the ranking quality depends entirely on the output of the model. The ranking produced by an unsupervised method hardly reflects user preferences, because it merely adds up the scores of different similarity models and therefore cannot fit the behavior of users in a real environment. To integrate users' actual selection preferences into the recommendation system, the problem is recast as how to obtain the optimal model parameters w from user access data.
In this embodiment, the model is trained with a pairwise learning-to-rank method, so each training example in the training set is an entity pair with a partial order relationship. The training data set used by the model in the entity recommendation generation module 30 is constructed from user click data: using the records of users clicking the links that the recommendation engine presents to them, the entities in the original recommendation list are ranked by their relevance to the target entity; one entity with a high click rate and one with a low click rate are then selected from the ranking to form an entity pair with a partial order relationship, which serves as one training example. The specific procedure is as follows:
For a target entity $e_k$ in the graph, the entities that the recommendation engine presents to users for $e_k$ are sorted by the number of times users clicked their recommendation links, and the first N and the last N entities of the sorted sequence form two sets $A_k$ and $B_k$. The entities in $A_k$ are considered more relevant to $e_k$ than the entities in $B_k$. One entity is taken from each of $A_k$ and $B_k$ to form an entity pair $(e_i, e_j)$; if $e_i \in A_k$ and $e_j \in B_k$, the target Y is 1, otherwise Y is 0.
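The construction of training pairs described above can be sketched as follows. The function name `build_pairs`, the dictionary input format, and the click counts are hypothetical; emitting the reversed pair with Y = 0 is one reading of "otherwise Y is 0".

```python
from itertools import product


def build_pairs(click_counts, n):
    """Build (e_i, e_j, Y) examples for one target entity e_k.

    click_counts maps each recommended entity to its number of clicks.
    The top-n entities form A_k and the bottom-n entities form B_k.
    """
    if len(click_counts) < 2 * n:
        raise ValueError("need at least 2 * n entities so that A_k and B_k do not overlap")
    ranked = sorted(click_counts, key=click_counts.get, reverse=True)
    a_k, b_k = ranked[:n], ranked[-n:]
    pairs = []
    for e_a, e_b in product(a_k, b_k):
        pairs.append((e_a, e_b, 1))  # e_i in A_k, e_j in B_k -> Y = 1
        pairs.append((e_b, e_a, 0))  # reversed order -> Y = 0
    return pairs


# Made-up click counts for the entities recommended alongside one target entity.
clicks = {"a": 50, "b": 30, "c": 5, "d": 1}
print(build_pairs(clicks, n=1))
```

Each target entity contributes up to 2 * N * N examples under this reading, so N controls both the size of the training set and how clear-cut the preference in each pair is.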
FIG. 4 is a schematic diagram of a model framework of an entity recommendation generation module according to an embodiment of the invention.
As shown in FIG. 4, the entity recommendation generation module 30 uses a RankNet model to predict which of two entities is more relevant, i.e., for any entity pair $e_i$ and $e_j$, the predicted probability $P_{ij}$ that $e_i$ is more relevant to the target entity $e_k$ than $e_j$ is:
Figure BDA0002314636270000121
Specifically, the model structure is shown in FIG. 4. The bottom layer of the model has three inputs, $vec_i$, $vec_j$, and $vec_k$, which are the embedded representations of the entities $e_i$, $e_j$, and $e_k$ obtained by the graph embedded representation learning module 20. After $vec_i$, $vec_j$, and $vec_k$ are fed into the bottom layer, the difference between the embedded representations of two entities is computed there as:
$dis_{k,i} = vec_k - vec_i$
The middle layer of the model has two further inputs, $sim_{k,i}$ and $sim_{k,j}$, which are the similarities of the entity pairs $(e_i, e_k)$ and $(e_j, e_k)$, respectively. These middle-layer inputs are combined with the output of the bottom layer (i.e., the computed differences of the embedded representations), and the main part of the model computes $S_{k,i}$ and $S_{k,j}$. The formula above then gives $prob_{i,j}$, the probability $P_{ij}$ that entity $e_i$ is more relevant to the target entity $e_k$ than $e_j$, which is the final similarity score.
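The data flow through the layers can be sketched as below. The probability formula is an equation image not reproduced in this text, so the sketch assumes the standard RankNet form, a sigmoid of the score difference. A single linear unit stands in for the Dense modules of the main part, and the function names, weights, and input values are hypothetical.

```python
import math


def score(vec_k, vec_x, sim_kx, weights, bias=0.0):
    """Compute S_{k,x} for a candidate entity e_x and target entity e_k."""
    # Bottom layer: difference of the embedded representations, dis_{k,x}.
    dis = [a - b for a, b in zip(vec_k, vec_x)]
    # Middle layer: combine the difference with the similarity sim_{k,x}.
    features = dis + [sim_kx]
    # Main part: a single linear unit used here in place of the Dense modules.
    return sum(w * f for w, f in zip(weights, features)) + bias


def pairwise_probability(s_ki, s_kj):
    """Standard RankNet form (assumed): P_ij = 1 / (1 + exp(-(S_ki - S_kj)))."""
    return 1.0 / (1.0 + math.exp(-(s_ki - s_kj)))


def pairwise_loss(p_ij, y):
    """Cross-entropy between the predicted probability and the target Y."""
    return -(y * math.log(p_ij) + (1 - y) * math.log(1 - p_ij))


vec_k, vec_i, vec_j = [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]
weights = [-1.0, -1.0, 2.0]  # hypothetical parameters shared by both candidates
s_ki = score(vec_k, vec_i, sim_kx=0.9, weights=weights)
s_kj = score(vec_k, vec_j, sim_kx=0.2, weights=weights)
p_ij = pairwise_probability(s_ki, s_kj)
print(p_ij, pairwise_loss(p_ij, y=1))
```

The same scoring function, with the same parameters, is applied to both candidates, so only the score difference determines the predicted probability.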
Thus the model adds the distance features of entities in the knowledge graph 200, so that the distance relationship between two entities in the knowledge graph 200 is taken into account when the final similarity score is computed. In addition, the main part of the model adopts the structure of the prior-art RankNet model, which comprises several Dense modules; its specific structure and working principle are the same as those of the prior-art RankNet model and are not repeated here.
Functions and effects of the embodiment
In this embodiment, the entity similarity calculation module computes the similarity of entities from the entity names, text description information, and triple semantic information of the entities in the knowledge graph; the graph embedded representation learning module takes the relationships between entities in the knowledge graph as input and learns embedded representations of the entities; and the entity recommendation generation module uses users' click feedback data to fuse the outputs of the entity similarity calculation module and the graph embedded representation learning module into a final similarity score.
In this embodiment, because the graph embedded representation learning module weights triples by the popularity and the in-degree of the tail entity, higher-quality triples receive higher training weights than in the prior art, which improves the effect of the representation learning algorithm.
In this embodiment, the entity similarity calculation module includes a literal similarity calculation submodule, which calculates the similarity between entity names, and a text similarity calculation submodule, which calculates the similarity between the description texts of entities. These two submodules can use the large amount of knowledge held in the knowledge graph, such as triple information and entity description information, to compute unsupervised entity similarity without any user click feedback data, and the result provides a similarity ranking between entities from which the recommendation application can build a recommended entity list for the user. The recommendation system of this embodiment can therefore start without user click feedback, i.e., it is a cold-start system. Compared with prior-art hot-start systems, which need a certain amount of user click feedback data before they can recommend, it is subject to fewer environmental constraints in application.
The entity recommendation generation module of this embodiment uses users' click information and a model based on a learning-to-rank method to fuse the inter-entity similarity obtained by the entity similarity calculation module with the entity embedding vectors obtained by the graph embedded representation learning module. It then obtains the final similarity score of the entities and ranks similar entities by that score, so that the ranking better fits users' real intent and achieves a better personalized recommendation effect.

Claims (9)

1. A knowledge graph recommendation system fusing entity information and popularity, arranged in a recommendation application that recommends entities to a user, and used for obtaining similarity scores of the entities from a knowledge graph built from the related information of a plurality of entities and from the user's click feedback information when browsing the related information of the entities, so that the recommendation application recommends according to the similarity scores, the system being characterized by comprising:
an entity similarity calculation module, which acquires the text information of the knowledge graph as input and calculates the similarity of the entities using the entity names, text description information, and triple semantic information of the entities in the knowledge graph;
a graph embedded representation learning module, which comprises a TransE model improved for the recommendation application and performs machine learning with the relationships between the entities in the knowledge graph as input, so as to obtain embedded representations of the entities in the knowledge graph; and
an entity recommendation generation module, which, based on the click feedback data of the user, fuses the similarity obtained by the entity similarity calculation module and the embedded representations obtained by the graph embedded representation learning module into a final similarity score, so that the recommendation application recommends entities to the user according to the similarity score.
2. The system according to claim 1, characterized in that:
the entity similarity calculation module comprises a literal similarity calculation submodule for calculating the similarity between the names of the entities, a text similarity calculation submodule for calculating the similarity between the description texts of the entities, and a matrix generation submodule for generating an entity similarity matrix, as the similarity, from the results of the literal similarity calculation submodule and the text similarity calculation submodule.
3. The system of claim 2, characterized in that:
the literal similarity calculation submodule takes the entity names of the entities as input and calculates the text similarity of the entities based on the longest common subsequence (LCS) and the Levenshtein edit distance;
let the entity names corresponding to entities $e_i$ and $e_j$ be $s_i$ and $s_j$;
the similarity $SimLCS_{ij}$ of the entities $e_i$ and $e_j$ based on the longest common subsequence (LCS) is:
Figure FDA0002314636260000021
in the above formula, $|LCS(s_i, s_j)|$ is the length of the longest common subsequence of $s_i$ and $s_j$, and $|s_i|$ and $|s_j|$ are the lengths of $s_i$ and $s_j$, respectively;
the similarity $SimED_{ij}$ of the entities $e_i$ and $e_j$ based on the Levenshtein edit distance is:
Figure FDA0002314636260000022
in the above formula, $ldist(s_i, s_j)$ is the edit-like distance between $s_i$ and $s_j$, and $|s_i|$ and $|s_j|$ are the lengths of $s_i$ and $s_j$, respectively.
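For illustration, the two name-based measures can be sketched with standard dynamic programming. The similarity formulas are equation images not reproduced in this text, so the normalizations used here, 2·|LCS| / (|s_i| + |s_j|) and 1 − distance / max(|s_i|, |s_j|), are assumptions, as is the use of the plain Levenshtein distance with unit substitution cost.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def levenshtein(a, b):
    """Levenshtein distance with unit cost for insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def sim_lcs(a, b):
    """Assumed normalization: 2 * |LCS| / (|a| + |b|)."""
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


def sim_ed(a, b):
    """Assumed normalization: 1 - distance / max(|a|, |b|)."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))


print(sim_lcs("kitten", "sitting"), sim_ed("kitten", "sitting"))
```

Both measures return 1.0 for identical names and fall toward 0 as the names diverge, which makes them directly comparable entries for a similarity matrix.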
4. The system of claim 2, characterized in that:
the text similarity calculation submodule takes the description text of each entity as an input document and calculates the text similarity of the entities using five measures: TF-IDF, the LSI model, the LDA model, the HDP model, and the Jaccard coefficient.
5. The system according to claim 4, characterized in that:
for entities $e_i$ and $e_j$, let the description texts corresponding to the two entities be $d_i$ and $d_j$, respectively;
the similarity $SimTFIDF_{ij}$ of the entities $e_i$ and $e_j$ based on TF-IDF is:
$SimTFIDF_{ij} = \cos(v_i, v_j)$
in the above formula, $v_i$ is the TF-IDF weighted vector of $d_i$, each dimension $w_{ij}$ of which is defined as the smoothed TF-IDF factor;
the similarity $SimLSI_{ij}$ of the entities $e_i$ and $e_j$ based on LSI is:
$SimLSI_{ij} = \cos(v_i, v_j)$
in the above formula, $v_i$ is the vector in $D'$ corresponding to $d_i$, $v_j$ is the vector in $D'$ corresponding to $d_j$, and $D'$ is the vectorized representation of the documents after dimension reduction;
the similarity $SimLDA_{ij}$ of the entities $e_i$ and $e_j$ based on LDA is:
$SimLDA_{ij} = \cos(v_i, v_j)$
in the above formula, $v_i$ is the (document-topic) distribution corresponding to $d_i$, and $v_j$ is the (document-topic) distribution corresponding to $d_j$;
the similarity $SimHDP_{ij}$ of the entities $e_i$ and $e_j$ based on HDP is:
$SimHDP_{ij} = \cos(v_i, v_j)$
in the above formula, $v_i$ is the (document-topic) distribution corresponding to $d_i$, and $v_j$ is the (document-topic) distribution corresponding to $d_j$;
the similarity $SimJaccard_{ij}$ of the entities $e_i$ and $e_j$ based on the Jaccard coefficient is:
Figure FDA0002314636260000041
in the above formula, $Inter(d_i, d_j)$ is the intersection of the words in $d_i$ and $d_j$, and $Union(d_i, d_j)$ is the union of the words in $d_i$ and $d_j$.
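Two of the five measures, TF-IDF cosine similarity and the Jaccard coefficient, can be sketched in plain Python as follows. The claim does not define the smoothed TF-IDF factor, so the smoothing used here, idf = ln((1 + n) / (1 + df)) + 1, is an assumption, and the documents are assumed to be already segmented into words. The LSI, LDA, and HDP measures apply the same cosine step to topic-model vectors and are not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed smoothing: idf = ln((1 + n) / (1 + df)) + 1
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard(doc_a, doc_b):
    """|Inter(d_i, d_j)| / |Union(d_i, d_j)| over the word sets of two documents."""
    a, b = set(doc_a), set(doc_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Made-up, pre-segmented description texts.
docs = [["actor", "film", "director"], ["film", "director", "novel"], ["river", "city"]]
vectors = tfidf_vectors(docs)
print(cosine(vectors[0], vectors[1]), jaccard(docs[0], docs[1]))
```

Documents that share no words score 0 on both measures, while partially overlapping descriptions score strictly between 0 and 1.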
6. The system according to claim 1, characterized in that:
the graph embedded representation learning module learns with an improved method based on the TransE algorithm, the improvement to the TransE algorithm being as follows:
during training, a weight w is introduced for each edge:
Figure FDA0002314636260000042
Figure FDA0002314636260000043
in the above formulas, $v_t$ is the number of visits to the tail entity, used to measure the popularity of the entity; $v_{max}$ and $v_{min}$ are the maximum and minimum visit counts over all edges of the head entity h, used to normalize the weight of each entity; and $\alpha$ is a hyperparameter used to compensate the weight of low-popularity edges;
for each edge <h, r, t> in the knowledge graph, the weight w is calculated to obtain a model input (h, r, t, w), and the final loss function is:
Figure FDA0002314636260000044
in the above formula, (h', r, t') denotes a negative sample, and $\gamma$ is the margin hyperparameter, whose value is preset.
7. The system according to claim 6, characterized in that:
$\gamma$ is 1.
8. The system according to claim 1, characterized in that:
the entity recommendation generation module adopts an improved RankNet model;
the improved RankNet model comprises a bottom layer, a middle layer, a main part, and an output layer;
the bottom layer has three inputs $vec_i$, $vec_j$, and $vec_k$, which are the embedded representations of the entities $e_i$, $e_j$, and $e_k$, respectively, and the difference between the embedded representations of the entities is obtained through the calculation of the bottom layer;
the middle layer has two further inputs $sim_{k,i}$ and $sim_{k,j}$, which are the similarities of the entity pairs $(e_i, e_k)$ and $(e_j, e_k)$, respectively; the inputs of the middle layer are combined with the output of the bottom layer (i.e., the calculated differences of the embedded representations), and $S_{k,i}$ and $S_{k,j}$ are calculated by the main part;
the output layer calculates $prob_{i,j}$ as the final similarity score using the following formula:
Figure FDA0002314636260000051
in the above formula, $P_{ij}$ is the probability that the entity $e_i$ is more relevant to the target entity $e_k$ than $e_j$, i.e., $prob_{i,j}$.
9. The system according to claim 8, characterized in that:
when the improved RankNet model is trained, the access data of each user is used as input, and a trained improved RankNet model is obtained for each user respectively.
CN201911272709.6A 2019-12-12 2019-12-12 Knowledge map recommendation system fusing entity information and heat Active CN112966091B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911272709.6A CN112966091B (en) 2019-12-12 2019-12-12 Knowledge map recommendation system fusing entity information and heat

Publications (2)

Publication Number Publication Date
CN112966091A true CN112966091A (en) 2021-06-15
CN112966091B CN112966091B (en) 2022-10-14

Family

ID=76270986

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911272709.6A Active CN112966091B (en) 2019-12-12 2019-12-12 Knowledge map recommendation system fusing entity information and heat

Country Status (1)

Country Link
CN (1) CN112966091B (en)

Similar Documents

Publication Publication Date Title
CN110046304B (en) User recommendation method and device
CN111737495B (en) Middle-high-end talent intelligent recommendation system and method based on domain self-classification
CN108573411B (en) Mixed recommendation method based on deep emotion analysis and multi-source recommendation view fusion of user comments
CN112966091B (en) Knowledge map recommendation system fusing entity information and heat
CN111753098A (en) Teaching method and system based on cross-media dynamic knowledge graph
US7844592B2 (en) Ontology-content-based filtering method for personalized newspapers
CN112214685A (en) Knowledge graph-based personalized recommendation method
JP2020123318A (en) Method, apparatus, electronic device, computer-readable storage medium, and computer program for determining text relevance
Sang et al. Context-dependent propagating-based video recommendation in multimodal heterogeneous information networks
US20100205176A1 (en) Discovering City Landmarks from Online Journals
CN105139211B (en) Product brief introduction generation method and system
CN111143672B (en) Knowledge graph-based professional speciality scholars recommendation method
Huang et al. Expert as a service: Software expert recommendation via knowledge domain embeddings in stack overflow
JP2022035314A (en) Information processing unit and program
Sun et al. Graph force learning
Zhang et al. An interpretable and scalable recommendation method based on network embedding
Wei et al. Online education recommendation model based on user behavior data analysis
CN113610626A (en) Bank credit risk identification knowledge graph construction method and device, computer equipment and computer readable storage medium
CN117574915A (en) Public data platform based on multiparty data sources and data analysis method thereof
CN115481325A (en) Personalized news recommendation method and system based on user global interest migration perception
KR102454261B1 (en) Collaborative partner recommendation system and method based on user information
Lemoisson et al. ViewpointS: capturing formal data and informal contributions into an adaptive knowledge graph
Xu et al. Similarmf: a social recommender system using an embedding method
Zhang et al. A deep recommendation framework for completely new users in mashup creation
Blanco-Fernández et al. Automatically Assembling a Custom-Built Training Corpus for Improving the Learning of In-Domain Word/Document Embeddings

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant