CN114580638A - Knowledge graph representation learning method and system based on text graph enhancement - Google Patents

Knowledge graph representation learning method and system based on text graph enhancement

Info

Publication number
CN114580638A
Authority
CN
China
Prior art keywords
entity
text
knowledge graph
representation
graph
Prior art date
Legal status
Pending
Application number
CN202210133500.7A
Other languages
Chinese (zh)
Inventor
卢记仓
王凌
周刚
兰明敬
李珠峰
祝涛杰
吴建萍
陈静
Current Assignee
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force filed Critical Information Engineering University of PLA Strategic Support Force
Priority to CN202210133500.7A
Publication of CN114580638A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention belongs to the technical field of knowledge graphs, and particularly relates to a knowledge graph representation learning method and system based on text graph enhancement. Named entities are extracted by analyzing the text descriptions of knowledge graph entities, and a two-layer heterogeneous text graph composed of sentence-layer nodes and text-entity-layer nodes is constructed; connections are established between text graph entities and knowledge graph entities to obtain an enhanced knowledge graph, which is processed to obtain initialized node representations; a graph convolutional neural network is used to propagate semantics between entities, yielding entity text representations that fuse text content semantics and triple structure semantics; the entity text representation is combined with an entity structure representation that considers only the triples, and the model is updated and optimized through negative samples and a loss function. The method better integrates entity text content semantics into the knowledge graph, effectively alleviates the sparsity problem of the knowledge graph, improves the expression capability of knowledge graph representation learning, and has better applicability under few-sample or zero-sample conditions.

Description

Knowledge graph representation learning method and system based on text graph enhancement
Technical Field
The invention belongs to the technical field of knowledge graphs, and particularly relates to a knowledge graph representation learning method and system based on text graph enhancement.
Background
The origin of the Knowledge Graph dates back to the 1950s; in 2012, the concept of the knowledge graph was formally proposed by Google and applied to its search engine, greatly improving its performance. With the rapid development of artificial intelligence and the support of technologies such as big data, the Internet of Things and natural language processing, knowledge graphs play an important promoting role in social sectors such as security, finance, justice, transportation, science and technology and medical care, and are also commonly used as a core supporting technology in fields such as intelligent question answering and recommendation systems. Therefore, research on knowledge graph related technology is of great practical significance. With the rapid expansion of data and information, knowledge graph technology faces many problems in knowledge acquisition, representation, fusion and application; in particular, large-scale knowledge graphs often suffer from missing triples (missing links) or erroneous triples, and solving these problems manually is costly and inefficient. Knowledge reasoning refers to mining or completing missing and implicit knowledge by using the existing entities and relations in a knowledge graph, while also correcting possible errors or conflicts. Therefore, research on knowledge reasoning technologies and methods has important theoretical value and practical significance.
Existing knowledge graph reasoning technologies and methods can be roughly divided into rule-based reasoning and learning-based reasoning. In learning-based reasoning, a suitable knowledge graph representation learning method is generally adopted to map entities and relations into low-dimensional dense vectors, and the relations among different vectors are then predicted and reasoned over by a specific algorithm; this mainly includes structure-based reasoning, reasoning based on structure and text description, and reasoning that refers to external information. Structure-based reasoning usually considers only the triple information when building the vector representation and models entities and relations in a simple way, so its vector expression capability and model performance are relatively limited in cases of complex relations, uneven triple distribution and sparsity. To address this problem, various methods attempt to introduce external auxiliary information such as categories, time and images into the model, which improves the expression capability of the vectors to a certain extent. Text generally carries richer semantic information than auxiliary information such as categories, time and images; therefore, models that combine text information are an important direction of current and future knowledge graph representation learning research. However, when combining text information, such models often encode it independently, making it difficult to properly fuse the triple structure semantics with the content semantics of auxiliary information such as text, and their performance needs to be further improved.
Disclosure of Invention
Therefore, the invention provides a knowledge graph representation learning method and system based on text graph enhancement, which expand content semantic information described by an entity text into a knowledge graph in a text graph mode, and simultaneously realize propagation and joint representation between the structure semantics of the knowledge graph and the text description content semantics through a graph convolution network, thereby realizing the full fusion of the two, improving the vector representation accuracy and expression capability of the entity and relation and the knowledge reasoning performance in the embedding process of the knowledge graph, and facilitating the application of knowledge graph reasoning in the actual industry field.
According to the design scheme provided by the invention, the knowledge graph representation learning method based on the text graph enhancement comprises the following contents:
aiming at an original knowledge graph containing entity text description information, identifying and extracting named entities in the original knowledge graph text description information, and constructing a text graph of entity description based on an entity layer and a sentence layer;
expanding the original knowledge graph by establishing connection between entity nodes in the text graph and entity nodes in the original knowledge graph, taking the expanded original knowledge graph as an enhanced knowledge graph, and performing initialization vector representation on each node in the enhanced knowledge graph;
aiming at the enhanced knowledge graph represented by initialization, according to the adjacency relation type among nodes, aggregating the nodes of different types of adjacency relations into an entity through semantic propagation and aggregation in a graph convolution network, and acquiring entity text representation of a fusion structure and content semantics;
acquiring entity joint representation through entity text representation and entity structure representation in an original knowledge graph triple relation;
constructing a negative sample based on a self-confrontation negative sampling strategy, setting a loss function, training and optimizing each entity with a relationship in the entity joint representation by using the negative sample to obtain a knowledge graph representation learning model, and performing vector representation on the entity and the relationship in the target input by using the knowledge graph representation learning model.
As the learning method based on the text graph enhanced knowledge graph representation, further, in identifying and extracting the named entities of the original knowledge graph, firstly, analyzing and processing the entity text description information in the original knowledge graph, wherein the analyzing and processing process at least comprises the following steps: judging the word segmentation and the part of speech of the text description; then, named entities are identified and extracted from the analysis process, and each named entity category is determined according to the identification result.
As the learning method based on the enhanced knowledge graph representation of the text graph, further, in the text graph of the entity description constructed based on the entity layer and the sentence layer, firstly, the identified named entity and the sentence in which the named entity is located are used as nodes of different types to construct the nodes of the entity layer and the nodes of the sentence layer; then, connections are established between the named entity nodes and the sentence nodes where the named entity nodes are located, between the named entities appearing in the same sentence, and between the sentence nodes with the context relationship.
As the learning method for representation of the knowledge graph based on text graph enhancement, further, in the process of initializing and representing each node in the enhanced knowledge graph, firstly, a text entity node in the text graph is connected with a corresponding entity in an original knowledge graph from which the text entity node is derived, if the text entity appears in text description of a plurality of different entities in the original knowledge graph, the text entity is connected with each corresponding entity in the original knowledge graph, and the connection types are distinguished according to the text entity categories to obtain the enhanced knowledge graph; then, a pre-training model and vector representation dimension parameters for generating text vectors are set, text entity nodes, sentence nodes and entity nodes in the original knowledge graph are respectively used as pre-training model inputs, and the pre-training model is used for obtaining initialization vector representations corresponding to the inputs.
As the learning method based on the text graph enhanced knowledge graph representation, further, in obtaining the entity text representation, for the enhanced knowledge graph represented by initialization vectors, with the entity to be represented as the center and the relation path length as the distance, a graph convolution network model and a weighted aggregation method are used to sequentially aggregate and propagate to the entity to be represented the sentence nodes, text entity nodes and entity nodes whose distance from the entity to be represented satisfies the set condition, with corresponding weight parameters set for different relations.
As the learning method based on the text graph enhanced knowledge graph representation, further, the operation formula of the aggregation transfer is represented as follows:
$$h_i^{(l+1)} = \sigma\left(\sum_{r \in N_R}\sum_{j \in N_{i,r}} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)}\right)$$
where $N_{i,r}$ is the set of neighbor nodes of node $i$ under relation $r$; $c_{i,r}$ and $c_{i,R}$ are normalization constants, denoting the sizes of the relation set $N_R$ and the neighbor-node set $N_{i,r}$, respectively; $W_r^{(l)}$ is the weight parameter corresponding to relation $r$; $W_0^{(l)}$ is the weight parameter corresponding to the node itself; $h_i^{(l)}$ is the vector representation of node $i$ in the $l$-th layer graph convolution network model; and $\sigma$ denotes the activation function.
As the learning method based on the text graph enhanced knowledge graph representation, further, in the entity joint representation, for the entity text representation obtained from the enhanced knowledge graph and the entity structure representation obtained from the original knowledge graph, corresponding weight parameters are set for different entities and entity representation dimensions; the weight parameters are used as gate vectors, and the entity text representation and the entity structure representation are subjected to dimension-by-dimension weighted summation to obtain the entity joint representation.
As the learning method based on the text graph enhanced knowledge graph representation, further, the dimension-by-dimension weighted summation process is represented as $e = \sigma(g_e) \odot e_s + (1 - \sigma(g_e)) \odot e_d$, where $\odot$ denotes element-wise (bit-wise) multiplication, $\sigma$ is an activation function, and $g_e$ is the gate weight vector associated with entity $e$.
As the learning method based on the text graph enhanced knowledge graph representation, further, in the training optimization, negative samples are constructed by setting a sampling rate, and training optimization is carried out using the loss function
$$L = -\log\sigma\big(\gamma - f(h, t)\big) - \sum_{i=1}^{n} p(h'_i, r, t'_i)\,\log\sigma\big(f(h'_i, t'_i) - \gamma\big)$$
where $\gamma$ is a boundary hyper-parameter, $\sigma$ is an activation function, $f(h, t)$ is the score of a positive triple sample, $(h'_i, r, t'_i)$ is the $i$-th negative sample, $p(h'_i, r, t'_i)$ is the sampling probability of the $i$-th negative sample, $f(h'_i, t'_i)$ is the score of the sampled negative triple, and $n$ is the number of sampled negative triples.
Further, the present invention also provides a learning system based on the enhanced knowledge graph representation of the text graph, which comprises: a building module, an enhancement module, a combination module and an optimization module, wherein,
the construction module is used for identifying and extracting named entities in the original knowledge graph text description information aiming at the original knowledge graph containing entity text description information, and constructing a text graph of entity description based on an entity layer and a sentence layer;
the enhancement module is used for expanding the original knowledge graph by establishing connection between entity nodes in the text graph and entity nodes in the original knowledge graph, taking the expanded original knowledge graph as an enhanced knowledge graph and performing initialization vector representation on each node in the enhanced knowledge graph;
the joint module is used for aggregating the nodes of the adjacent relations of different types into an entity through semantic propagation and aggregation in a graph convolution network according to the adjacent relation types among the nodes aiming at the enhanced knowledge graph expressed by initialization, and acquiring entity text expression of a fusion structure and content semantics; acquiring entity joint representation through entity text representation and entity structure representation in an original knowledge graph triple relation;
and the optimization module is used for constructing a negative sample based on a self-confrontation negative sampling strategy, setting a loss function, training and optimizing each entity with a relation in the entity joint representation by using the negative sample to obtain a knowledge graph representation learning model, and performing vector representation on the entity and the relation in the target input by using the knowledge graph representation learning model.
The invention has the beneficial effects that:
1. The method adopts named entity recognition, two-layer heterogeneous graph construction and the like to obtain a text graph of the knowledge graph entity description information, and extends and connects it to the original knowledge graph; the resulting enhanced knowledge graph can simultaneously contain the text content semantics and the relational structure semantics of the original knowledge graph, and thus better supports subsequent knowledge graph representation learning and knowledge reasoning.
2. On the basis of the initialized representation of the enhanced knowledge graph, graph models such as graph convolution networks are adopted to represent entity nodes, so that semantic propagation between text content semantics and relational structure semantics can be realized; the resulting entity representation better integrates the content semantics and structure semantics of knowledge graph entities, giving knowledge graph representation learning stronger semantic expression capability.
3. A gate mechanism is adopted to combine the entity text representation obtained from the enhanced graph with the entity representation obtained from the original knowledge graph; by setting different parameters for different representation dimensions, finer-grained fusion can be achieved. Meanwhile, the self-confrontation negative sample construction strategy better ensures the quality of the training data set, so that the trained and optimized model has better performance, i.e., better knowledge graph representation results.
Description of the drawings:
FIG. 1 is a flow diagram of a knowledge graph representation learning method based on text graph enhancement in an embodiment;
FIG. 2 is an example knowledge graph representation learning overview framework illustration;
FIG. 3 is a schematic diagram of a multi-layer heterogeneous text graph of entity description information in the embodiment;
FIG. 4 is a schematic representation of the construction of an enhanced knowledge graph and its entity association in an embodiment;
FIG. 5 is a schematic representation of a knowledge graph representation learning system in an embodiment.
Detailed description of the embodiments:
in order to make the objects, technical solutions and advantages of the present invention clearer and more obvious, the present invention is further described in detail below with reference to the accompanying drawings and technical solutions.
Considering the high accuracy and reliability requirements that practical research and field applications place on knowledge graph representation learning and reasoning methods, and the difficulty that conventional knowledge graph representation learning methods have in sufficiently fusing the triple structure with external auxiliary information at the semantic level, the embodiment of the invention provides a knowledge graph representation learning method based on text graph enhancement, which is shown in fig. 1 and comprises the following contents:
s101, aiming at an original knowledge graph containing entity text description information, identifying and extracting named entities in the original knowledge graph text description information, and constructing a text graph of entity description based on an entity layer and a sentence layer;
s102, expanding the original knowledge graph by establishing connection between entity nodes in the text graph and entity nodes in the original knowledge graph, taking the expanded original knowledge graph as an enhanced knowledge graph, and performing initialization vector representation on each node in the enhanced knowledge graph;
s103, aggregating the nodes of the adjacent relations of different types into an entity through semantic propagation and aggregation in a graph convolution network according to the adjacent relation type among the nodes aiming at the enhanced knowledge graph expressed by initialization, and acquiring entity text expression of a fusion structure and content semantics;
s104, acquiring entity joint representation through entity text representation and entity structure representation in the original knowledge graph triple relation;
s105, constructing a negative sample based on a self-confrontation negative sampling strategy, setting a loss function, training and optimizing each entity with a relation in the entity joint representation by using the negative sample to obtain a knowledge graph representation learning model, and performing vector representation on the entity and the relation in the target input by using the knowledge graph representation learning model.
Referring to the framework shown in fig. 2, content semantic information described by an entity text is expanded into a knowledge graph in a text graph mode, and propagation and joint representation between the structure semantics of the knowledge graph and the text description content semantics are fused through a graph convolution network, so that the two are fully fused, and the expression capability and the knowledge inference performance of entity vector representation are improved; the method has the advantages that the entity text content semantics can be better fused into the knowledge graph, the problem of sparsity of the knowledge graph is effectively solved, the expression capability of learning represented by the knowledge graph is improved, and meanwhile, the method has better applicability under the condition of few samples or zero samples.
Referring to fig. 3, by using a suitable method, named entities are identified from text description information corresponding to knowledge graph entities, and a text graph of entity descriptions is constructed and obtained based on a multilayer heterogeneous representation method. Aiming at a knowledge graph entity containing text description information, selecting a proper method from a Jieba tool, a Conditional Random Field (CRF) algorithm, a BERT model or a Transformer framework and the like, carrying out analysis processing such as word segmentation, part of speech discrimination and the like on the entity text description, extracting and identifying the contained named entity from the entity text description on the basis, and then determining the category of each text entity according to the identification result, wherein the category can be described in a mode shown in a table 1;
TABLE 1 Named entity category division
Type          Description
PERSON        Persons (characters)
NORP          Nationalities, religions, etc.
FAC           Buildings, roads, etc.
ORG           Companies, institutions, etc.
GPE           Countries, cities, etc.
LOC           Mountains, bodies of water, etc.
PRODUCT       Objects, vehicles, food, etc.
EVENT         Famous battles, sports events, etc.
WORK_OF_ART   Titles of books, songs, etc.
LAW           Laws
LANGUAGE      Languages
On the basis of named entity recognition, the recognized text entities and the sentences contained in the text description are respectively taken as nodes of different types, constructing entity-layer nodes and sentence-layer nodes. On this basis, connections between different nodes are established, all of which are undirected: each text entity node is connected to the sentence node of the sentence in which it appears. For nodes in the same layer, if two text entities appear in the same sentence, i.e., they have a co-occurrence relationship, a connection is established between the two text entity nodes; and if two sentences are in a context relationship in the text description, a connection is established between the two sentence nodes. Through this processing, a two-layer heterogeneous text graph of the entity text description information is obtained.
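As an illustrative sketch (not the patent's reference implementation), the named entity recognition and two-layer heterogeneous text graph construction described above could look as follows, assuming spaCy (whose OntoNotes label set matches Table 1) for NER and networkx for the graph; the model name, node keys and relation labels used here are assumptions:

```python
import spacy
import networkx as nx

# Assumes the spaCy model "en_core_web_sm" is installed; Jieba, CRF or
# BERT-based NER (as listed in the description) could be substituted.
nlp = spacy.load("en_core_web_sm")

def build_text_graph(description: str) -> nx.Graph:
    """Two-layer heterogeneous text graph for one entity's text description."""
    doc = nlp(description)
    graph = nx.Graph()
    sentences = list(doc.sents)

    # Sentence-layer nodes; adjacent sentences are connected (context relation).
    for idx, sent in enumerate(sentences):
        graph.add_node(("sent", idx), text=sent.text, layer="sentence")
        if idx > 0:
            graph.add_edge(("sent", idx - 1), ("sent", idx), rel="context")

    # Entity-layer nodes; connect each entity to its sentence and to
    # co-occurring entities within the same sentence.
    for idx, sent in enumerate(sentences):
        ents = [(ent.text, ent.label_) for ent in sent.ents]
        for text, label in ents:
            graph.add_node(("ent", text), label=label, layer="entity")
            graph.add_edge(("ent", text), ("sent", idx), rel="in_sentence")
        for i in range(len(ents)):
            for j in range(i + 1, len(ents)):
                graph.add_edge(("ent", ents[i][0]), ("ent", ents[j][0]),
                               rel="co_occurrence")
    return graph
```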
As shown in fig. 4, based on the constructed heterogeneous text graph, the expanded enhanced knowledge graph is obtained by establishing connections between text entity nodes and entity nodes in the original knowledge graph, and on this basis a suitable pre-training model is adopted to give the initialized representation of each node of the enhanced knowledge graph. A specific implementation may be as follows. According to the text entity recognition result, connections are established between text entity nodes in the text graph and the corresponding entity nodes of the original knowledge graph, namely: if a text entity node in the text graph comes from the text description of an entity in the original knowledge graph, a connection is established between the text entity node and that entity; and if a text entity appears in the text descriptions of several different entities, a connection is established between the text entity and each such entity, i.e., a one-to-many connection. The connection types are distinguished according to the text entity categories; for example, if the text entity type is "PERSON", the connection is labeled "PERSON". A newly established connection between a text-graph text entity and an original knowledge graph entity is called an "enhanced connection", and the new graph obtained after these connections are established is called the "enhanced knowledge graph". For the constructed enhanced knowledge graph, a suitable pre-training model is selected, such as BERT, RoBERTa, GPT-3 or T5, appropriate vector representation dimension parameters are set, and the text entity nodes and sentence nodes in the text graph and the entity nodes in the original graph are respectively taken as input; the output of the pre-training model is computed and taken as the initialized distributed vector representation of each node of the enhanced knowledge graph.
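A hedged sketch of the "enhanced connection" step and the pre-training-based node initialization, assuming Hugging Face Transformers with a BERT checkpoint and the networkx text graph from the previous sketch; the helper names attach_text_graph and initial_vector are hypothetical:

```python
import networkx as nx
import torch
from transformers import AutoTokenizer, AutoModel

# Assumed checkpoint; the description also allows RoBERTa, GPT-3, T5, etc.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def attach_text_graph(kg: nx.MultiGraph, text_graph: nx.Graph,
                      kg_entity: str) -> nx.MultiGraph:
    """Merge one entity's text graph into the KG and add typed 'enhanced connections'."""
    enhanced = nx.MultiGraph(kg)
    enhanced.add_edges_from(text_graph.edges(data=True))
    for node, data in text_graph.nodes(data=True):
        enhanced.add_node(node, **data)
        if data.get("layer") == "entity":
            # Enhanced connection, labeled with the text entity category (e.g. PERSON).
            enhanced.add_edge(node, kg_entity, rel=data["label"])
    return enhanced

@torch.no_grad()
def initial_vector(text: str) -> torch.Tensor:
    """Initialized node representation: mean-pooled last hidden layer of BERT."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    return encoder(**inputs).last_hidden_state.mean(dim=1).squeeze(0)
```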
As the learning method based on the text graph enhanced knowledge graph representation in the embodiment of the invention, further, in the entity text representation, for the enhanced knowledge graph represented by initialized vectors, with the entity to be represented as the center and the relation path length as the distance, a suitable model usable for graph data analysis, such as a graph convolution network or a graph attention mechanism, together with a weighted aggregation method, can be selected; the sentence nodes, text entity nodes and entity nodes whose distance from the entity to be represented satisfies the set condition are sequentially aggregated and propagated to the entity to be represented, and corresponding weight parameters are set for different relations.
Taking the graph convolution network model RGCN as an example, for the entity to be represented, different weight parameters are set according to the types of adjacency relation between text entity nodes and sentence nodes, between text entity nodes and original-graph entity nodes, and between original-graph entity nodes, and the transformation and aggregation operation is carried out in the direction of the entity to be represented, expressed as:
$$h_i^{(l+1)} = \sigma\left(\sum_{r \in N_R}\sum_{j \in N_{i,r}} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)}\right)$$
where $N_{i,r}$ is the set of neighbor nodes of node $i$ under relation $r$; $c_{i,r}$ and $c_{i,R}$ are normalization constants, denoting the sizes of the relation set $N_R$ and the neighbor-node set $N_{i,r}$, respectively; $W_r^{(l)}$ is the weight parameter corresponding to relation $r$; $W_0^{(l)}$ is the weight parameter corresponding to the node itself; $h_i^{(l)}$ is the vector representation of node $i$ in the $l$-th layer of the RGCN model; and $\sigma$ denotes the activation function. Through this operation, the node representations under different types of adjacency relation are aggregated into the entity, yielding an entity text representation that fuses structure and content semantics and realizing semantic propagation between the original-graph entities carrying relational structure semantics and the text entities carrying textual content semantics.
Further, in the entity joint representation, aiming at the entity text representation obtained from the enhanced knowledge graph and the entity structure representation obtained from the original knowledge graph, corresponding weight parameters are respectively set for different entities and entity representation dimensions, the weight parameters are used as gate vectors, and the entity text representation and the entity structure representation are subjected to dimension-by-dimension weighted summation to obtain the entity joint representation.
A gate mechanism is adopted to jointly represent the entity text representation and the conventional entity structure representation that considers only the original knowledge graph triple relations. Referring to fig. 4, the general process may include the following. First, the entity structure representation considering only the original triple relations is obtained: for the original knowledge graph before expansion and enhancement, a suitable knowledge graph representation learning method that considers only the triple relation structure is selected, such as the typical representation learning models TransE, TransR, TransD, RotatE, QuatE or ConvE, to obtain the entity structure representation of the original knowledge graph. Second, the entity joint representation based on the gate mechanism is computed: for the entity text representation $e_d$ (denoted $h_d$ or $t_d$ for head or tail entities) obtained from the enhanced knowledge graph and the entity structure representation $e_s$ (denoted $h_s$ or $t_s$) obtained from the original knowledge graph, different weight parameters are set for different entities and for each dimension of the entity representation; these parameters serve as gate vectors, and the text representation and structure representation of each entity are weighted and summed dimension by dimension, giving the gate-based entity joint representation:
$$e = \sigma(g_e) \odot e_s + (1 - \sigma(g_e)) \odot e_d$$
where $\odot$ denotes element-wise (bit-wise) multiplication, $\sigma$ is the Sigmoid function, and $g_e$ is the gate weight vector associated with entity $e$, which constrains the proportions of the structure representation $e_s$ and the text representation $e_d$ in $e$. $g_e$ is initialized in the same way as $e_s$, adjusted dynamically during training, and kept fixed after training. Accordingly, the representations of the head entity $h$, the relation $r$ and the tail entity $t$ are given as:
$$h = \sigma(g_e) \odot h_s + (1 - \sigma(g_e)) \odot h_d,\qquad r = r_d,\qquad t = \sigma(g_e) \odot t_s + (1 - \sigma(g_e)) \odot t_d$$
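A short PyTorch sketch of the gate-based joint representation above, assuming one learnable gate vector per entity initialized from the structure representation as stated; the class name is an assumption:

```python
import torch
import torch.nn as nn

class GatedJointRepresentation(nn.Module):
    """e = sigmoid(g_e) * e_s + (1 - sigmoid(g_e)) * e_d, one gate vector per entity."""

    def __init__(self, structure_emb: torch.Tensor):
        super().__init__()
        # g_e is initialized the same way as e_s and updated during training.
        self.gate = nn.Parameter(structure_emb.clone())

    def forward(self, e_s: torch.Tensor, e_d: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate)
        # Dimension-by-dimension (element-wise) weighted sum of structure and text parts.
        return g * e_s + (1.0 - g) * e_d

# Example: e_s from a structure-only model (e.g. TransE/RotatE), e_d from the enhanced graph.
e_s = torch.randn(5, 16)   # 5 entities, 16-dim structure representation
e_d = torch.randn(5, 16)   # 16-dim text representation
joint = GatedJointRepresentation(e_s)(e_s, e_d)
# Relations keep their own embedding: r = r_d.
```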
In the training optimization process, a self-confrontation negative sampling strategy is adopted to construct high-quality negative samples for model training and parameter optimization; candidate negative samples are sampled according to the following probabilities:
$$p(h'_j, r, t'_j) = \frac{\exp\big(\alpha f_r(h'_j, t'_j)\big)}{\sum_i \exp\big(\alpha f_r(h'_i, t'_i)\big)}$$
where $\alpha$ is the sampling rate and $f_r(h'_j, t'_j)$ is the score of the sampled negative triple, calculated according to the confidence score function of the knowledge representation learning method selected in step 401. On this basis, a suitable loss function is constructed as follows:
$$L = -\log\sigma\big(\gamma - f(h, t)\big) - \sum_{i=1}^{n} p(h'_i, r, t'_i)\,\log\sigma\big(f(h'_i, t'_i) - \gamma\big)$$
where $\gamma$ is the boundary hyper-parameter, $\sigma$ is the Sigmoid function, $f(h, t)$ is the score of a positive triple sample, $(h'_i, r, t'_i)$ is the $i$-th negative sample, $p(h'_i, r, t'_i)$ is its sampling probability, $f(h'_i, t'_i)$ is its score, and $n$ is the number of sampled negative triples. Appropriate hyper-parameters, including the learning rate, parameter optimization algorithm, vector representation dimension, boundary hyper-parameter and sampling rate, are set by search, and the model is trained and optimized with the goal of minimizing the loss function, obtaining entity representations with better expression capability.
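A hedged PyTorch sketch of the self-confrontation (self-adversarial) sampling weights and loss above; the default values of gamma and alpha are placeholders, and the scoring function that produces pos_score and neg_scores is assumed to come from the chosen representation learning model:

```python
import torch
import torch.nn.functional as F

def self_adversarial_loss(pos_score: torch.Tensor, neg_scores: torch.Tensor,
                          gamma: float = 12.0, alpha: float = 1.0) -> torch.Tensor:
    """pos_score: f(h, t) per positive triple, shape (batch,);
    neg_scores: f(h'_i, t'_i) for n negatives per positive, shape (batch, n)."""
    # Self-adversarial weights p(h'_i, r, t'_i): softmax over the negatives' scores,
    # detached so they act as sampling weights rather than a gradient signal.
    neg_weights = F.softmax(alpha * neg_scores, dim=-1).detach()

    positive_term = F.logsigmoid(gamma - pos_score)                     # log sigma(gamma - f(h, t))
    negative_term = (neg_weights * F.logsigmoid(neg_scores - gamma)).sum(dim=-1)
    return (-positive_term - negative_term).mean()

# Example with random scores: batch of 4 positives, 8 negatives each.
loss = self_adversarial_loss(torch.randn(4), torch.randn(4, 8))
print(loss.item())
```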
Further, based on the foregoing method, referring to fig. 5, an embodiment of the present invention further provides a knowledge graph representation learning system based on text graph enhancement, including: a building module, an enhancement module, a combination module and an optimization module, wherein,
the construction module is used for identifying and extracting named entities in the original knowledge graph text description information aiming at the original knowledge graph containing entity text description information, and constructing a text graph of entity description based on an entity layer and a sentence layer;
the enhancement module is used for expanding the original knowledge graph by establishing connection between entity nodes in the text graph and entity nodes in the original knowledge graph, taking the expanded original knowledge graph as an enhanced knowledge graph and performing initialization vector representation on each node in the enhanced knowledge graph;
the joint module is used for aggregating the nodes of the adjacent relations of different types into an entity through semantic propagation and aggregation in a graph convolution network according to the adjacent relation types among the nodes aiming at the enhanced knowledge graph expressed by initialization, and acquiring entity text expression of a fusion structure and content semantics; and acquiring entity joint representation through the entity text representation and the entity structure representation in the original knowledge graph triple relation;
and the optimization module is used for constructing a negative sample based on a self-confrontation negative sampling strategy, setting a loss function, training and optimizing each entity with a relation in the entity joint representation by using the negative sample to obtain a knowledge graph representation learning model, and performing vector representation on the entity and the relation in the target input by using the knowledge graph representation learning model.
Unless specifically stated otherwise, the relative steps, numerical expressions, and values of the components and steps set forth in these embodiments do not limit the scope of the present invention.
Based on the foregoing method and/or system, an embodiment of the present invention further provides a server, including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described above.
Based on the above method and/or system, the embodiment of the invention further provides a computer readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the above method.
In all examples shown and described herein, any particular value should be construed as merely exemplary, and not as a limitation, and thus other examples of example embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present invention, which are used for illustrating the technical solutions of the present invention and not for limiting the same, and the protection scope of the present invention is not limited thereto, although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive the technical solutions described in the foregoing embodiments or equivalent substitutes for some technical features within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A knowledge graph representation learning method based on text graph enhancement is characterized by comprising the following contents:
recognizing and extracting named entities in the text description information of the original knowledge graph aiming at the original knowledge graph containing the text description information of the entities, and constructing a text graph of entity description based on an entity layer and a sentence layer;
expanding the original knowledge graph by establishing connection between entity nodes in the text graph and entity nodes in the original knowledge graph, taking the expanded original knowledge graph as an enhanced knowledge graph, and performing initialization vector representation on each node in the enhanced knowledge graph;
aiming at the enhanced knowledge graph represented by initialization, according to the adjacency relation type among nodes, aggregating the nodes of different types of adjacency relations into an entity through semantic propagation and aggregation in a graph convolution network, and acquiring entity text representation of a fusion structure and content semantics;
acquiring entity joint representation through entity text representation and entity structure representation in an original knowledge graph triple relation;
constructing a negative sample based on a self-confrontation negative sampling strategy, setting a loss function, training and optimizing each entity with a relation in entity joint representation by using the negative sample to obtain a knowledge graph representation learning model, and performing vector representation on the entity and the relation in target input by using the knowledge graph representation learning model.
2. The method as claimed in claim 1, wherein the named entities of the original knowledge graph are identified and extracted, and the entity text description information in the original knowledge graph is analyzed, the analyzing process at least comprises: judging the word segmentation and the part of speech of the text description; then, named entities are identified and extracted from the analysis process, and each named entity category is determined according to the identification result.
3. The learning method based on the knowledge graph representation enhanced by the text graph according to claim 1 or 2, wherein in the text graph of the entity description constructed based on the entity layer and the sentence layer, firstly, the identified named entity and the sentence in which the named entity is located are taken as different types of nodes to construct the entity layer node and the sentence layer node; then, connections are established between the named entity nodes and the sentence nodes where the named entity nodes are located, between the named entities appearing in the same sentence, and between the sentence nodes with the context relationship.
4. The knowledge graph representation learning method based on text graph enhancement as claimed in claim 1, wherein in initializing each node in the enhanced knowledge graph, firstly, a connection is established between a text entity node in the text graph and a corresponding entity in an original knowledge graph from which the text entity is derived, if the text entity appears in text description of a plurality of different entities in the original knowledge graph, the connection is established between the text entity and each corresponding entity in the original knowledge graph, and the connection types are distinguished according to text entity categories to obtain the enhanced knowledge graph; then, a pre-training model and vector representation dimension parameters for generating text vectors are set, text entity nodes, sentence nodes and entity nodes in the original knowledge graph are respectively used as pre-training model inputs, and the pre-training model is used for obtaining initialization vector representations corresponding to the inputs.
5. The knowledge graph representation learning method based on text graph enhancement as claimed in claim 1 or 4, wherein in the entity text representation, aiming at the enhanced knowledge graph represented by the initialized vector, the sentence nodes, the text entity nodes and the entity nodes which are in the distance from the entity to be represented and meet the set conditions are sequentially aggregated and transmitted to the entity to be represented by using a graph convolution network model and a weighted aggregation method by taking the entity to be represented as the center and taking the length of the relationship path as the distance, and corresponding weight parameters are set according to different relationships.
6. The knowledge graph representation learning method based on text graph enhancement according to claim 5, wherein the operation formula of the aggregation transfer is expressed as follows:
$$h_i^{(l+1)} = \sigma\left(\sum_{r \in N_R}\sum_{j \in N_{i,r}} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)}\right)$$
wherein $N_{i,r}$ is the set of neighbor nodes of node $i$ under relation $r$; $c_{i,r}$ and $c_{i,R}$ are normalization constants, denoting the sizes of the relation set $N_R$ and the neighbor-node set $N_{i,r}$, respectively; $W_r^{(l)}$ is the weight parameter corresponding to relation $r$; $W_0^{(l)}$ is the weight parameter corresponding to the node itself; $h_i^{(l)}$ is the vector representation of node $i$ in the $l$-th layer graph convolution network model; and $\sigma$ denotes the activation function.
7. The knowledge graph representation learning method based on text graph enhancement as claimed in claim 1, wherein in the obtaining of the entity joint representation, for the entity text representation obtained in the enhanced knowledge graph and the entity structure representation obtained in the original knowledge graph, corresponding weight parameters are respectively set for different entities and entity representation dimensions, the weight parameters are used as gate vectors, and the entity text representation and the entity structure representation are subjected to dimension-by-dimension weighted summation to obtain the entity joint representation.
8. The knowledge graph representation learning method based on text graph enhancement according to claim 7, wherein the dimension-by-dimension weighted summation process is represented as $e = \sigma(g_e) \odot e_s + (1 - \sigma(g_e)) \odot e_d$, where $\odot$ denotes element-wise (bit-wise) multiplication, $\sigma$ is an activation function, and $g_e$ is the gate weight vector associated with entity $e$.
9. The knowledge graph representation learning method based on text graph enhancement according to claim 1, wherein, in the training optimization, negative samples are constructed by setting a sampling rate, and training optimization is carried out using the loss function
$$L = -\log\sigma\big(\gamma - f(h, t)\big) - \sum_{i=1}^{n} p(h'_i, r, t'_i)\,\log\sigma\big(f(h'_i, t'_i) - \gamma\big)$$
where $\gamma$ is a boundary hyper-parameter, $\sigma$ is an activation function, $f(h, t)$ is the score of a positive triple sample, $(h'_i, r, t'_i)$ is the $i$-th negative sample, $p(h'_i, r, t'_i)$ is the sampling probability of the $i$-th negative sample, $f(h'_i, t'_i)$ is the score of the sampled negative triple, and $n$ is the number of sampled negative triples.
10. A system for learning a knowledge graph representation based on text graph enhancement, comprising: a building module, an enhancement module, a combination module and an optimization module, wherein,
the construction module is used for identifying and extracting named entities in the original knowledge graph text description information aiming at the original knowledge graph containing entity text description information, and constructing a text graph of entity description based on an entity layer and a sentence layer;
the enhancement module is used for expanding the original knowledge graph by establishing connection between entity nodes in the text graph and entity nodes in the original knowledge graph, taking the expanded original knowledge graph as an enhanced knowledge graph and performing initialization vector representation on each node in the enhanced knowledge graph;
the association module is used for aggregating the nodes of the adjacent relations of different types into an entity through semantic propagation and aggregation in a graph convolution network according to the adjacent relation type among the nodes aiming at the enhanced knowledge graph expressed by initialization, and acquiring entity text expression containing fusion structure and content semantics; acquiring entity joint representation through entity text representation and entity structure representation in an original knowledge graph triple relation;
and the optimization module is used for constructing a negative sample based on a self-confrontation negative sampling strategy, setting a loss function, training and optimizing each entity with a relation in the entity joint representation by using the negative sample to obtain a knowledge graph representation learning model, and performing vector representation on the entity and the relation in the target input by using the knowledge graph representation learning model.
CN202210133500.7A 2022-02-14 2022-02-14 Knowledge graph representation learning method and system based on text graph enhancement Pending CN114580638A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210133500.7A CN114580638A (en) 2022-02-14 2022-02-14 Knowledge graph representation learning method and system based on text graph enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210133500.7A CN114580638A (en) 2022-02-14 2022-02-14 Knowledge graph representation learning method and system based on text graph enhancement

Publications (1)

Publication Number Publication Date
CN114580638A true CN114580638A (en) 2022-06-03

Family

ID=81770086

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210133500.7A Pending CN114580638A (en) 2022-02-14 2022-02-14 Knowledge graph representation learning method and system based on text graph enhancement

Country Status (1)

Country Link
CN (1) CN114580638A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023246849A1 (en) * 2022-06-22 2023-12-28 青岛海尔电冰箱有限公司 Feedback data graph generation method and refrigerator
CN115080766A (en) * 2022-08-16 2022-09-20 之江实验室 Multi-modal knowledge graph characterization system and method based on pre-training model
CN115080766B (en) * 2022-08-16 2022-12-06 之江实验室 Multi-modal knowledge graph characterization system and method based on pre-training model
CN115861715A (en) * 2023-02-15 2023-03-28 创意信息技术股份有限公司 Knowledge representation enhancement-based image target relation recognition algorithm
CN117540035A (en) * 2024-01-09 2024-02-09 安徽思高智能科技有限公司 RPA knowledge graph construction method based on entity type information fusion
CN117540035B (en) * 2024-01-09 2024-05-14 安徽思高智能科技有限公司 RPA knowledge graph construction method based on entity type information fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination