WO2023273182A1 - Entity alignment method, device and system for multi-source knowledge graph fusion (面向多源知识图谱融合的实体对齐方法、装置与系统)

Publication number: WO2023273182A1
Authority: WO — WIPO (PCT)
Prior art keywords: entity, representation, sample set, embedding matrix, complete
Application number: PCT/CN2021/137139
Other languages: English (en), French (fr)
Inventors: 鄂海红, 林学渊, 宋文宇, 宋美娜
Original assignee: 北京邮电大学 (Beijing University of Posts and Telecommunications)
Application filed by 北京邮电大学
Publication of WO2023273182A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367 Ontology
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/08 Learning methods

Definitions

  • the present invention relates to the technical field of big data processing, and in particular to an entity alignment method, device and system for multi-source knowledge graph fusion.
  • A Knowledge Graph (KG), consisting of points (entities) and edges (relationships between entities, entity attributes), plays a pivotal role in many research areas and applications of artificial intelligence, and as a cornerstone of technology in other fields it has received extensive attention. It is widely used in knowledge-driven AI tasks such as question answering models, recommendation systems, and search engines.
  • General knowledge graphs and domain knowledge graphs are constructed by different organizations, experts, or automated and semi-automated systems, and there are overlaps and intersections of knowledge between them. Fusing them is of special significance for promoting downstream tasks.
  • Entity alignment is a key step in the automatic fusion (merging/integration) of multi-source knowledge graphs, and its quality directly determines the quality of automatic fusion, so the accuracy of entity alignment algorithms is particularly important. Because the expression of entity information differs greatly across knowledge graphs, existing entity alignment methods, which are mostly graph neural network (GNN) models, convolution-based models, and capsule network-based models that propagate alignment information by learning the knowledge graph's unique triple structure, have the following disadvantages:
  • the present invention aims to solve, at least to a certain extent, one of the technical problems in the related art.
  • the first purpose of the present invention is to propose an entity alignment method for multi-source knowledge graph fusion, which models the implicit interaction between entities and relationships and thereby improves that interaction;
  • the method uses an iterative strategy, the attribute-combined bidirectional global filtering strategy (ABGS), to generate high-quality semi-supervised data and further generate "aligned entity pairs" including positive and negative examples, in order to reduce the error rate of the generated data and improve the utilization of prediction results.
  • the second purpose of the present invention is to propose an entity alignment device for multi-source knowledge graph fusion.
  • the third purpose of the present invention is to propose a data service system for automatic integration of multi-source knowledge graphs.
  • a fourth object of the present invention is to provide a non-transitory computer-readable storage medium.
  • a fifth object of the present invention is to provide an electronic device.
  • the sixth object of the present invention is to provide a computer program product.
  • to this end, the embodiment of the first aspect of the present application proposes an entity alignment method for multi-source knowledge graph fusion, including:
  • extracting entity features of entities in the knowledge graph, generating an entity embedding matrix according to the entity features, and obtaining an entity representation of the knowledge graph according to the entity embedding matrix;
  • adopting a two-way global filtering strategy to generate a sample set, and iteratively training the neural network model according to the sample set, so that the trained neural network model has the ability to align and fuse multiple knowledge graphs, wherein the sample set includes an iterative positive sample set and an iterative negative sample set.
  • the entity alignment method for multi-source knowledge graph fusion proposed in the embodiment of the present application also includes a dropout network and a cross-layer highway network;
  • the highway network is used to mix two different entity embedding matrices: X^(out) = T ⊙ X^(a) + (1 − T) ⊙ X^(b), with T = σ(X^(a)W + b), where
  • X^(a) and X^(b) are the two entity embedding matrices,
  • X^(out) is the output of the highway network,
  • W and b are the weight matrix and bias vector of the linear layer, and
  • T is the gating weight vector;
  • the output X^(out) of the highway network is input to the dropout network to obtain a mixed feature, and the mixed feature is input to the graph attention network GAT, whose attention weights are α_ij = exp(LeakyReLU(a^T [x_i ∥ x_j])) / Σ_{k∈N_i} exp(LeakyReLU(a^T [x_i ∥ x_k])), where
  • α_ij represents the attention weight of the adjacent entity e_j with respect to the entity e_i,
  • a is a trainable parameter vector of dimension 2d_e × 1,
  • a^T represents the transpose of the parameter vector,
  • [· ∥ ·] represents the splicing (concatenation) operation,
  • exp(x) denotes e^x,
  • LeakyReLU is the activation function, LeakyReLU(x) = max(x, 0) + 0.01·min(x, 0), and
  • N_i represents the set of all adjacent entities of entity e_i.
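The highway mixing and GAT attention described above can be sketched as follows. This is an illustrative reconstruction based on the standard highway-network and GAT formulations, not the patent's reference implementation; the matrix shapes, the sigmoid gating, and the toy graph are assumptions.

```python
import numpy as np

def leaky_relu(x):
    # LeakyReLU(x) = max(x, 0) + 0.01 * min(x, 0), as defined in the text
    return np.maximum(x, 0) + 0.01 * np.minimum(x, 0)

def highway_mix(Xa, Xb, W, b):
    # Gate T decides, per element, how much of X^(a) vs X^(b) to keep:
    # X^(out) = T * X^(a) + (1 - T) * X^(b), with T = sigmoid(X^(a) W + b)
    T = 1.0 / (1.0 + np.exp(-(Xa @ W + b)))
    return T * Xa + (1.0 - T) * Xb

def gat_attention(X, a, neighbors, i):
    # alpha_ij proportional to exp(LeakyReLU(a^T [x_i || x_j])) over N_i
    scores = np.array([leaky_relu(a @ np.concatenate([X[i], X[j]]))
                       for j in neighbors])
    e = np.exp(scores - scores.max())   # numerically stable softmax
    alpha = e / e.sum()
    # Aggregated representation of entity e_i from its neighbors
    return alpha, sum(w * X[j] for w, j in zip(alpha, neighbors))

rng = np.random.default_rng(0)
n, d = 5, 4
Xa, Xb = rng.normal(size=(n, d)), rng.normal(size=(n, d))
W, bias = rng.normal(size=(d, d)), np.zeros(d)
X_out = highway_mix(Xa, Xb, W, bias)
alpha, h0 = gat_attention(X_out, rng.normal(size=2 * d), [1, 2, 3], 0)
print(X_out.shape, float(alpha.sum()))
```

Note that when both inputs are equal the highway gate is a no-op, which is one way such cross-layer mixing preserves earlier-layer features.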
  • the relationship information between the entity and its adjacent entities is calculated according to the entity representation, and the entity representation is enhanced according to the relationship information, so as to obtain a complete entity representation of the knowledge graph, including:
  • dividing the semantics of each relation r_k into two parts, the part related to the head entity and the part related to the tail entity;
  • representing the entity e_i as part of the head entity, using the list of relations related to the head entity e_i, where β_ik represents the attention weight of the relation r_k on the head entity e_i;
  • the complete entity embedding matrix is obtained according to the complete entity representation;
  • the final entity embedding matrix is obtained according to the complete entity embedding matrix, including:
  • the calculation formula of the loss function is: L = Σ_{(e_i, e_j) ∈ P⁺} Σ_{(e_i′, e_j′) ∈ P⁻} max(0, ‖x_i − x_j‖₁ − ‖x_i′ − x_j′‖₁ + γ), where
  • P⁺ is the positive sample set,
  • P⁻ is the negative sample set generated from the positive sample set,
  • γ is the margin hyperparameter,
  • x_i is the entity embedding vector of entity e_i, and
  • P⁺ consists of two parts: one part is the training set P of originally pre-aligned entities, and the other part is the iterative positive sample set generated by the bidirectional global filtering strategy combined with attributes.
  • the two-way global filtering strategy includes:
  • generating semi-supervised data using both local alignment and global alignment, to produce the iterative positive sample set and the iterative negative sample set.
  • the embodiment of the second aspect of the present application proposes an entity alignment device for multi-source knowledge graph fusion, including:
  • an original aggregation network module, used to extract the entity features of the entities in the knowledge graph, generate an entity embedding matrix according to the entity features, and obtain the entity representation of the knowledge graph according to the entity embedding matrix;
  • an echo network module, configured to calculate the relationship information between the entity and its adjacent entities according to the entity representation, and enhance the entity representation according to the relationship information, so as to obtain a complete entity representation of the knowledge graph;
  • a complete aggregation network module, configured to obtain a complete entity embedding matrix according to the complete entity representation, and obtain a final entity embedding matrix according to the complete entity embedding matrix;
  • an alignment loss function calculation module, used to calculate the loss function according to the final entity embedding matrix and the data set; and
  • a bidirectional global filtering strategy module combined with attributes, used to generate a sample set using a bidirectional global filtering strategy according to the loss function and the attribute information of the entities, and to iteratively train the neural network model according to the sample set, so that the trained neural network model has the ability to align and fuse multiple knowledge graphs, wherein the sample set includes an iterative positive sample set and an iterative negative sample set.
  • the embodiment of the third aspect of the present application proposes a data service system for automatic integration of multi-source knowledge graphs, including:
  • a to-be-aligned knowledge graph data source management module, used to save and manage multiple knowledge graph data sources;
  • a data management module, used to obtain the knowledge graph data to be aligned and convert it into a preset data format;
  • a knowledge fusion module, used to apply the neural network model trained by the entity alignment method for multi-source knowledge graph fusion described in the embodiment of the first aspect to the to-be-aligned knowledge graph data in the preset data format, so as to predict aligned entity pairs and merge the to-be-aligned knowledge graph data into one knowledge graph according to the aligned entity pairs;
  • an integrated knowledge graph management module, used to save and manage the fused knowledge graph and publish data services based on it.
  • the embodiment of the fourth aspect of the present application proposes a non-transitory computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the entity alignment method for multi-source knowledge graph fusion described in the embodiment of the first aspect of the application is implemented.
  • the embodiment of the fifth aspect of the present application proposes an electronic device, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions so as to realize the entity alignment method for multi-source knowledge graph fusion as described in the embodiment of the first aspect of the present application.
  • the embodiment of the sixth aspect of the present application proposes a computer program product, including a computer program; when the computer program is executed by a processor, it realizes the entity alignment method for multi-source knowledge graph fusion described in the embodiment of the first aspect.
  • In summary, the scheme first extracts the entity features of the entities in the knowledge graph, generates an entity embedding matrix according to those features, and obtains the entity representation of the knowledge graph according to the entity embedding matrix; it then calculates the relationship information between each entity and its adjacent entities according to the entity representation, and enhances the entity representation according to that relationship information to obtain a complete entity representation of the knowledge graph; it then obtains the final entity embedding matrix from the complete entity representation, and calculates the loss function from the final entity embedding matrix and the data set; finally, a bidirectional global filtering strategy processes the loss function and the attribute information of the entities to generate an iterative positive sample set and an iterative negative sample set, and the neural network model is iteratively trained on these sample sets, so that the trained model can align and fuse multiple knowledge graphs.
  • The above scheme models the implicit interaction between entities and relationships, improving that interaction; in addition, according to the loss function and the attribute information of entities, the iterative strategy of the attribute-combined bidirectional global filtering strategy (ABGS) generates high-quality semi-supervised data and further generates "aligned entity pairs" containing positive and negative examples, reducing the error rate of the generated data and improving the utilization of prediction results.
  • FIG. 1 is a flow chart of an entity alignment method for multi-source knowledge graph fusion provided by an embodiment of the present application
  • FIG. 2 is an overall flowchart of the entity alignment method in the embodiment of the present application.
  • FIG. 3 is a flow chart of the two-way global filtering strategy in the embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of an entity alignment device for multi-source knowledge graph fusion provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a data service system oriented towards automatic integration of multi-source knowledge graphs provided by an embodiment of the present application.
  • Entity alignment is a key step in the automatic fusion (merging/integration) of multi-source knowledge graphs, and its quality directly determines the quality of automatic fusion, so the accuracy of entity alignment algorithms is particularly important. Because the expression of entity information differs greatly across knowledge graphs, existing entity alignment methods are mostly graph neural network (GNN) models, convolution-based models, and capsule network-based models that propagate alignment information by learning the knowledge graph's unique triple structure.
  • the current mainstream entity alignment frameworks are based on knowledge graph embedding (KGE).
  • current KGE models fall into two types. The first type is relation-centered, emphasizing that the tail entity is derived from the relation acting on the head entity; this category includes the TransE series, rotation models, polar coordinate models, bilinear models, and so on, which have excelled in entity link prediction tasks but perform poorly in entity alignment tasks.
  • The second type is entity-centered, emphasizing that all entities are equal and that relations between entities are only one source of information for enhancing entity representations; such models include graph neural network (GNN) models, convolution-based models, and capsule network-based models, which are closely related to the fields of computer vision and natural language processing and whose interpretability is weak.
  • to address these problems, the embodiment of the present application proposes an entity alignment method, an entity alignment device, a data service system, and a readable storage medium for multi-source knowledge graph fusion.
  • the embodiment of this application designs a novel graph neural network model, Echo, to improve the implicit interaction between entities and relationships; it also proposes a superior iterative strategy, the attribute-combined bidirectional global filtering strategy (ABGS), to generate high-quality semi-supervised data ("aligned entity pairs" for the next round of training), where the generated "aligned entity pairs" include both positive and negative examples.
  • with this method, the top-1 accuracy of the model can be increased to 96%, far exceeding the 79% of the previous model.
  • the embodiment of the present application also provides an entity alignment device, a data service system, and a non-transitory computer-readable storage medium.
  • applicable scenarios include, for example: two financial event knowledge graphs, two medical knowledge graphs, or commonsense knowledge graphs generated by two different encyclopedias.
  • FIG. 1 is a flow chart of an entity alignment method for multi-source knowledge graph fusion provided by an embodiment of the present application.
  • FIG. 2 is an overall flowchart of the entity alignment method in the embodiment of the present application.
  • an entity alignment method for multi-source knowledge graph fusion includes the following steps 101 to 105:
  • Step 101: extract entity features of entities in the knowledge graph, generate an entity embedding matrix according to the entity features, and obtain entity representations of the knowledge graph according to the entity embedding matrix.
  • specifically, the embodiment of the present application proposes the original aggregation network module.
  • the original aggregation network module extracts the entity features of the entities in the knowledge graph by stacking multiple layers of GCN and GAT to generate the entity embedding matrix.
  • for example, the original aggregation network module can be designed so that the first layer is a GCN and the second and third layers are GATs.
  • the embodiment of the present application also inserts a dropout network and a cross-layer highway network into the original aggregation network module.
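A minimal sketch of such a stacked aggregation network follows. The symmetric normalization of the adjacency matrix, the tanh activation, the dropout rate, and the use of plain GCN layers where the patent's second and third layers would be GATs are all simplifying assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def gcn_layer(A, X, W):
    # Standard GCN propagation with self-loops and symmetric normalization
    # D^{-1/2} (A + I) D^{-1/2} X W (an assumed, conventional choice)
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    A_norm = A_hat / np.sqrt(np.outer(d, d))
    return np.tanh(A_norm @ X @ W)

def dropout(X, rate, rng):
    # Inverted dropout: zero out features and rescale the survivors
    mask = rng.random(X.shape) >= rate
    return X * mask / (1.0 - rate)

n, d_e = 6, 8
A = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]:
    A[i, j] = A[j, i] = 1.0          # toy chain graph
X = rng.normal(size=(n, d_e))        # initial entity features
W1, W2 = rng.normal(size=(d_e, d_e)), rng.normal(size=(d_e, d_e))

H = gcn_layer(A, X, W1)              # layer 1: GCN
H = dropout(H, 0.1, rng)             # dropout between layers
H = gcn_layer(A, H, W2)              # layers 2-3 would be GAT in the patent
print(H.shape)
```

In the patent's design the highway network of the previous section would additionally mix the outputs of different layers.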
  • specifically, the embodiment of the present application sets the entity embedding matrix X ∈ ℝ^{n×d_e}, where n is the number of entities and d_e is the embedding dimension;
  • the highway network is used to mix two different entity embedding matrices: X^(out) = T ⊙ X^(a) + (1 − T) ⊙ X^(b), with T = σ(X^(a)W + b), where
  • X^(a) and X^(b) are the two entity embedding matrices,
  • X^(out) is the output of the highway network,
  • W and b are the weight matrix and bias vector of the linear layer, and
  • T is the gating weight vector;
  • the graph attention network GAT computes attention weights α_ij = exp(LeakyReLU(a^T [x_i ∥ x_j])) / Σ_{k∈N_i} exp(LeakyReLU(a^T [x_i ∥ x_k])), where
  • α_ij represents the attention weight of the adjacent entity e_j with respect to the entity e_i,
  • a is a trainable parameter vector of dimension 2d_e × 1,
  • [· ∥ ·] represents the splicing (concatenation) operation,
  • exp(x) denotes e^x,
  • LeakyReLU is the activation function, LeakyReLU(x) = max(x, 0) + 0.01·min(x, 0), and
  • N_i represents the set of all adjacent entities of entity e_i.
  • the embodiment of the present application also uses the highway network in the echo network module.
  • Step 102: calculate the relationship information between each entity and its adjacent entities according to the entity representation, and enhance the entity representation according to the relationship information, so as to obtain a complete entity representation of the knowledge graph.
  • specifically, the embodiment of the present application calculates the relationship information between the entity and its adjacent entities based on the entity representation obtained in step 101, and enhances the entity representation according to the relationship information to obtain a complete entity representation of the knowledge graph, including:
  • dividing the semantics of each relation r_k into two parts, the part related to the head entity and the part related to the tail entity;
  • representing the entity e_i as part of the head entity, using the list of relations related to the head entity e_i, where β_ik represents the attention weight of the relation r_k on the head entity e_i.
  • the output of the echo network module in the embodiment of the present application is a complete entity representation of the knowledge graph, obtained in the following manner:
  • the complete entity representation is dynamically calculated from the two views of the neighbor relation representations, and the relation representations are in turn generated from the original entity representations; the echo network module design differs from previous models because it does not ignore the contribution of entities to relations, and it lets different parts of a relation play different roles.
  • the design of the echo network module follows the idea that relation information must be further used to enhance the entity representation, and that the contribution of entities to relations cannot be ignored.
  • to this end, the embodiment of the present application divides the semantics of each relation r_k into two parts, the part related to the head entity and the part related to the tail entity, so that each part depends only on its related entities.
  • the embodiment of this application uses GAT to propagate entity information to relations, as follows:
  • α_ijk represents the attention weight from the head entity e_i to the relation r_k, based on the head entity e_i and the tail entity e_j; the result is passed directly to the next GAT layer to output the head-entity part of the complete entity representation.
  • the entity e_i is represented as part of the head entity via the list of relations related to the head entity e_i; this is a list rather than a set, since relations specific to different tail entities are allowed to repeat, and β_ik represents the attention weight of the relation r_k with respect to the head entity e_i.
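The echo step above (entities feed the head/tail halves of each relation, which are then echoed back to enhance the entities) can be illustrated roughly as follows. Uniform averaging stands in for the patent's learned attention weights α_ijk and β_ik, and the concatenation of the original and echoed features is an assumed way of forming the "complete" representation.

```python
import numpy as np

rng = np.random.default_rng(2)
triples = [(0, 0, 1), (1, 0, 2), (2, 1, 0)]   # (head, relation, tail)
n, m, d = 3, 2, 4
X = rng.normal(size=(n, d))                    # original entity representations

# Step 1: build the head-related and tail-related halves of each relation
# from the entities it connects (uniform weights stand in for attention).
r_head, r_tail = np.zeros((m, d)), np.zeros((m, d))
cnt = np.zeros(m)
for h, r, t in triples:
    r_head[r] += X[h]; r_tail[r] += X[t]; cnt[r] += 1
r_head /= cnt[:, None]; r_tail /= cnt[:, None]

# Step 2: echo the relation halves back to enhance each entity.
echo, deg = np.zeros((n, d)), np.zeros(n)
for h, r, t in triples:
    echo[h] += r_head[r]; deg[h] += 1   # e_i as head receives the head half
    echo[t] += r_tail[r]; deg[t] += 1   # e_j as tail receives the tail half
complete = np.concatenate([X, echo / np.maximum(deg, 1)[:, None]], axis=1)
print(complete.shape)
```

The key design point mirrored here is that entities contribute to relation representations before relations contribute back, rather than treating relations as fixed edge labels.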
  • Step 103: obtain a complete entity embedding matrix according to the complete entity representation, and obtain a final entity embedding matrix according to the complete entity embedding matrix.
  • the embodiment of the present application obtains the complete entity embedding matrix according to the complete entity representation, and obtains the final entity embedding matrix according to the complete entity embedding matrix, including:
  • the entity representation after passing through the echo network is the complete entity representation.
  • the complete aggregation network module aggregates the information of neighboring entities again to obtain the final entity embedding matrix.
  • the specific acquisition method is as follows:
  • the embodiment of the present application continues to use the GAT layer to aggregate information from neighbors again.
  • the GAT in this layer is much more powerful than the GAT in the original aggregation layer, because this layer further obtains entity information from the deconstructed relations, while the original aggregation layer ignores the role of relations on entities; the complete aggregation network is therefore crucial for aggregating optimal entity representations.
  • Step 104: calculate the loss function according to the final entity embedding matrix and the data set, wherein the calculation formula of the loss function is: L = Σ_{(e_i, e_j) ∈ P⁺} Σ_{(e_i′, e_j′) ∈ P⁻} max(0, ‖x_i − x_j‖₁ − ‖x_i′ − x_j′‖₁ + γ), where
  • P⁺ is the positive sample set,
  • P⁻ is the negative sample set generated from the positive sample set,
  • γ is the margin hyperparameter,
  • x_i is the entity embedding vector of entity e_i, and
  • P⁺ consists of two parts: one part is the training set P of originally pre-aligned entities, and the other part is the iterative positive sample set generated by the bidirectional global filtering strategy combined with attributes.
  • the alignment loss function calculation module in the embodiment of the present application is responsible for calculating the loss from the entity representations and the data set for neural network training.
  • the loss function is a hinge loss using the Manhattan (L1) distance.
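A hinge loss with Manhattan distance can be computed as sketched below. Pairing each positive pair with a single corrupted pair and the toy embeddings are illustrative assumptions; in training, negatives are typically sampled per positive.

```python
import numpy as np

def manhattan(x, y):
    # L1 (Manhattan) distance between two embedding vectors
    return float(np.abs(x - y).sum())

def alignment_loss(emb, positives, negatives, gamma=1.0):
    # Hinge loss: push aligned pairs closer than corrupted pairs by margin gamma:
    # max(0, ||x_i - x_j||_1 - ||x_i' - x_j'||_1 + gamma)
    loss = 0.0
    for (i, j), (ci, cj) in zip(positives, negatives):
        loss += max(0.0, manhattan(emb[i], emb[j])
                         - manhattan(emb[ci], emb[cj]) + gamma)
    return loss

emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
P_pos = [(0, 1)]     # pre-aligned pair (e_0, e_1), already close
P_neg = [(0, 2)]     # corrupted pair, far apart
print(alignment_loss(emb, P_pos, P_neg, gamma=1.0))
```

With the distances above (0.1 for the positive, 10.0 for the negative) the margin of 1.0 is already satisfied, so the loss contribution is zero.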
  • Step 105: according to the loss function and the attribute information of the entities, adopt a bidirectional global filtering strategy to generate a sample set, and perform iterative training on the neural network model according to the sample set, so that the trained neural network model has the ability to align and fuse multiple knowledge graphs, wherein the sample set includes an iterative positive sample set and an iterative negative sample set.
  • the embodiment of the present application proposes a bidirectional global filtering strategy method combined with attributes, and a corresponding processing module.
  • the input of this strategy is the entity candidate sets E₁ and E₂ and the relation similarity matrix S_rel;
  • the output is the iterative positive sample set and the iterative negative sample set;
  • the task of the bidirectional global filtering strategy module is to continuously generate high-quality alignment data during model training for the next round of training. The module therefore combines the attribute information of the entities: by introducing attribute information, the accuracy of positive example generation is further improved, the errors in positive example generation are reduced, and the impact on the quality of the next round of iterative training is mitigated.
  • FIG. 3 is a flow chart of a two-way global filtering strategy in an embodiment of the present application.
  • the bidirectional global filtering strategy in this embodiment of the present application includes steps 201 to 204 .
  • Step 201: calculate the attribute similarity matrix and the attribute value similarity matrix, as follows:
  • attribute names in different languages can be translated into the same language in cross-language alignment, so
  • the similarity between two attribute names can be directly calculated, giving the attribute similarity matrix S_attr;
  • the value similarity of attribute a for entities e_i and e_j is computed from their value sets, where Value_a(e_i) is the value set of attribute a of entity e_i;
  • from these, the attribute-value-based similarity matrix S_attr_value is obtained, whose i-th row, j-th column element is the value similarity of entities e_i and e_j.
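As a concrete (assumed) instantiation of the value similarity, set-overlap measures such as Jaccard similarity over the value sets can be used; the exact formula is not recoverable from this extraction, so Jaccard is only an illustrative choice.

```python
def value_similarity(values_i, values_j):
    # Jaccard overlap of the value sets Value_a(e_i) and Value_a(e_j)
    # (an assumed instantiation; the text only names the sets involved)
    vi, vj = set(values_i), set(values_j)
    if not vi and not vj:
        return 0.0          # neither entity has this attribute
    return len(vi & vj) / len(vi | vj)

print(value_similarity({"1955", "Beijing"}, {"1955", "Peking"}))
```

For string-valued attributes, an edit-distance or token-level similarity could replace exact set membership when values are noisy.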
  • Step 202: calculate a final similarity matrix according to the attribute similarity matrix and the attribute value similarity matrix.
  • in step 202, the three similarity matrices S_attr_value, S_attr and S_rel are first obtained, and then the final alignment matrix is calculated using hyperparameters β₁, β₂, β₃ ∈ [0, 1]:
  • Step 203: calculate a local alignment result according to the final similarity matrix.
  • the local alignment result is calculated as follows:
  • S(e₁, e₂) represents the similarity between entity e₁ and entity e₂ in the final similarity matrix.
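Steps 202 and 203 can be sketched as a weighted combination of the three matrices followed by a row-wise argmax. The weighted-sum form and the β values below are assumptions consistent with the hyperparameters named above, not the patent's exact formula.

```python
import numpy as np

# Toy 2x2 similarity matrices between candidate sets E1 and E2
S_attr_value = np.array([[0.9, 0.1], [0.2, 0.8]])
S_attr       = np.array([[0.7, 0.3], [0.1, 0.6]])
S_rel        = np.array([[0.8, 0.2], [0.3, 0.9]])

b1, b2, b3 = 0.3, 0.3, 0.4            # beta_1, beta_2, beta_3 in [0, 1]
S = b1 * S_attr_value + b2 * S_attr + b3 * S_rel   # assumed combination

# Local alignment: each e1 independently picks its most similar e2
local = S.argmax(axis=1)
print(local.tolist())
```

Here entity 0 of E1 aligns to entity 0 of E2 and entity 1 to entity 1, since each row's maximum lies on the diagonal.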
  • Step 204: use local alignment and global alignment to generate semi-supervised data, producing an iterative positive sample set and an iterative negative sample set.
  • the embodiments of the present application use both local alignment and global alignment to generate semi-supervised data.
  • global alignment refers to selecting target entities from the candidate set without replacement;
  • local alignment refers to selecting target entities from the candidate set with replacement.
  • the strategy outputs the iterative positive sample set and the iterative negative sample set for use in neural network model training.
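The difference between the two selection modes can be illustrated as follows: local selection may map two source entities to the same target, while global selection consumes each target at most once. The greedy matching used for the global case is an assumption; the patent's exact global procedure may differ.

```python
import numpy as np

S = np.array([[0.9, 0.8],
              [0.9, 0.2]])   # both source entities prefer target 0

# Local alignment: with replacement, so duplicates are allowed
local = S.argmax(axis=1).tolist()

# Global alignment: without replacement, greedily take the best remaining pair
pairs = sorted(((S[i, j], i, j) for i in range(2) for j in range(2)),
               reverse=True)
used_i, used_j, global_align = set(), set(), {}
for s, i, j in pairs:
    if i not in used_i and j not in used_j:
        global_align[i] = j
        used_i.add(i); used_j.add(j)

print(local, global_align)
```

Local alignment maps both sources to target 0, whereas global alignment forces source 0 onto target 1 once target 0 is taken; comparing the two results is one way to spot low-confidence pairs and filter them out of the iterative positive set.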
  • the application scenarios of the foregoing embodiments are divided into direct graph applications and supporting applications.
  • direct graph applications use the fused knowledge graph directly, such as the querying shown in the figure;
  • supporting applications use the fused knowledge graph to provide support for downstream applications, such as recommendation, search, and question answering.
  • direct graph applications in this embodiment include: academic knowledge query, legal document query, enterprise relationship query, suspect relationship query, insurance claim case query, clinical case query, business process query, etc.
  • supporting applications in this embodiment include: intelligent question answering, intelligent search, intelligent recommendation, decision analysis systems, group portraits, holographic files, risk warning, public security clue mining, auxiliary clinical diagnosis and treatment, etc.
  • for example, intelligent search based on the knowledge graph can perform cross-media search over complex, multivariate objects such as text, pictures, and videos, and can also realize multi-granularity search at the chapter, paragraph, and sentence levels.
  • intelligent search allows computers to more accurately identify and understand users' deeper search intentions and needs, find target entities and their related content in multi-source knowledge graphs, sort and classify the results, and display them in natural language that conforms to human habits, thereby improving the search experience.
  • the present invention also proposes an entity alignment device for multi-source knowledge graph fusion.
  • FIG. 4 is a schematic structural diagram of an entity alignment device for multi-source knowledge graph fusion provided by an embodiment of the present application.
  • the embodiment of the present application provides an entity alignment device for multi-source knowledge graph fusion, including:
  • an original aggregation network module 10, used to extract the first entity features of the entities in the knowledge graph and the second entity features of their adjacent entities, generate a first entity embedding matrix according to the first entity features, generate a second entity embedding matrix according to the second entity features, and aggregate the first entity embedding matrix and the second entity embedding matrix to obtain an entity representation of the knowledge graph;
  • An echo network module 20 configured to calculate the relationship information between the entity and the adjacent entity according to the entity representation, and enhance the entity representation according to the relationship information, so as to obtain a complete entity representation of the knowledge graph;
  • a complete aggregation network module 30 configured to obtain a complete entity embedding matrix according to the complete entity representation, and obtain a final entity embedding matrix according to the complete entity embedding matrix;
  • Alignment loss function calculation module 40 used to calculate loss function according to the final entity embedding matrix and data set
  • the bidirectional global filtering strategy module 50 combined with attributes is used to generate a sample set using a bidirectional global filtering strategy according to the loss function and the attribute information of the entity, and iteratively trains the neural network model according to the sample set, so that the trained
  • the neural network model has the ability to align and fuse multiple knowledge graphs, wherein the sample set includes an iterative positive sample set and an iterative negative sample set.
  • The embodiment of the present application also proposes a data service system for automatic integration of multi-source knowledge graphs.
  • Fig. 5 is a schematic structural diagram of a data service system for automatic integration of multi-source knowledge graphs provided by an embodiment of the present application.
  • The embodiment of the present application provides a data service system for automatic integration of multi-source knowledge graphs, including:
  • The to-be-aligned knowledge graph data source management module 60, used to save and manage multiple knowledge graph data sources;
  • The data management module 70, used to obtain the knowledge graph data to be aligned and convert it into the to-be-aligned knowledge graph data in a preset data format;
  • The knowledge fusion module 80, used to predict the to-be-aligned knowledge graph data in the preset data format with the neural network model trained by the entity alignment method for multi-source knowledge graph fusion described in the embodiments of this application, so as to obtain aligned entity pairs, and to merge the to-be-aligned knowledge graph data into one knowledge graph according to the aligned entity pairs;
  • The integrated knowledge graph management module 90, configured to save and manage that knowledge graph and to publish data services based on it.
  • Assume there are two data sources: source A and source B, located on server A and server B respectively.
  • Server 1 runs the data source management module for the knowledge graphs to be aligned; the description data records it saves are similar to the following table.
  • The following table is an example description data table of that data source management module:

    Name      Address              Data format                  Data volume  ...
    Source A  123.123.123.1:8888   [id,name],[h_id,r_id,t_id]   8 MB         ...
    Source B  123.123.123.2:8888   [h_name,r_name,t_name]       1 GB         ...
  • Server 2 runs the data management module.
  • Server 3 runs the knowledge fusion module.
  • Server 4 runs the integrated knowledge graph management module.
  • A running example is as follows:
  • 1) Server 2 initiates a request to server 1 to obtain the description data of all data sources, which is used to dynamically assemble the data conversion module; it then delegates to server 1, which requests the data of source A and of source B in sequence.
  • 2) Server 1 initiates requests to 123.123.123.1:8888 and 123.123.123.2:8888 in turn and forwards the returned data to server 2.
  • 3) Server 2 receives the data from server 1, executes the data reading, data conversion, and data transmission modules, converts the original knowledge graph data into a standard format, and then forwards the multiple standard-format knowledge graphs to server 3.
  • 4) Server 3 receives the standard data from server 2; it first runs the training module so that the neural network model can fuse multiple knowledge graphs, then runs the prediction module so that the trained model predicts the intersection of the graphs, i.e. the aligned entity pairs, and finally runs the fusion module to fuse the multiple knowledge graphs into one knowledge graph and send it to server 4.
  • 5) Server 4 receives the unified knowledge graph from server 3 and publishes it as a data service. Third parties can subscribe to the service to pull the knowledge graph onto their own servers. Server 4 can also serve as a data source for the next data service system for automatic integration of multi-source knowledge graphs, so as to build an ever larger and more complete knowledge graph.
  • The embodiment of the present application also proposes a non-transitory computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it realizes the entity alignment method for multi-source knowledge graph fusion described in the embodiments of this application.
  • An embodiment of the present application further proposes an electronic device, including: a processor; and a memory for storing instructions executable by the processor; the processor is configured to execute the instructions to realize the entity alignment method for multi-source knowledge graph fusion described in the embodiments of this application.
  • The embodiment of the present application also proposes a computer program product, including a computer program; when the computer program is executed by a processor, it realizes the entity alignment method for multi-source knowledge graph fusion described in the embodiments of this application.
  • First, the graph neural network model Echo further strengthens the interaction between entities and relations, enabling entity representations to perceive the different parts of a relation; its structure and calculation process are novel and effective.
  • Second, the attribute-combined bidirectional global filtering strategy, which iteratively generates training data, can solve the lack of manually aligned seeds and greatly improve the accuracy of the model.
  • Third, the data service system for automatic integration of multi-source knowledge graphs, built on the above entity alignment device, abstracts knowledge graph data sources into description data, automatically runs the alignment device to fuse multi-source knowledge graphs, and automatically publishes the fused knowledge graph as a data service, enabling third parties to conveniently obtain unified large-scale knowledge graph data resources.
  • The terms "first" and "second" are used for descriptive purposes only and cannot be interpreted as indicating or implying relative importance or implicitly specifying the quantity of the indicated technical features.
  • Features defined as "first" and "second" may therefore explicitly or implicitly include at least one such feature.
  • "Plurality" means at least two, such as two, three, and so on, unless specifically defined otherwise.
  • A "computer-readable medium" may be any device that can contain, store, communicate, propagate, or transmit a program for use by, or in conjunction with, an instruction execution system, apparatus, or device.
  • More specific examples of computer-readable media include: an electrical connection with one or more wires (an electronic device), a portable computer diskette (a magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), fiber-optic devices, and portable compact disc read-only memory (CD-ROM).
  • The computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it as necessary, and then stored in computer memory.
  • Each functional unit in each embodiment of the present invention may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module.
  • The above integrated modules can be implemented in the form of hardware or in the form of software function modules; if an integrated module is implemented as a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
  • The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.


Abstract

一种面向多源知识图谱融合的实体对齐方法、装置与系统,涉及大数据处理技术领域,该方案包括:提取知识图谱中实体的实体特征,根据实体的实体特征生成实体嵌入矩阵,并根据实体嵌入矩阵获取知识图谱的实体表示;根据实体表示计算实体与相邻实体的关系信息,根据关系信息增强实体表示得到完整实体表示;依据完整实体表示获取最终实体嵌入矩阵;根据最终实体嵌入矩阵和数据集计算损失函数;采用双向全局过滤策略对损失函数和实体的属性信息进行处理生成迭代正样本集和迭代负样本集,通过样本集对神经网络模型进行迭代训练。

Description

面向多源知识图谱融合的实体对齐方法、装置与系统
相关申请的交叉引用
本申请基于申请号为202110726190.5、申请日为2021年06月29日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本申请作为参考。
技术领域
本发明涉及大数据处理技术领域,尤其涉及一种面向多源知识图谱融合的实体对齐方法、装置与系统。
背景技术
知识图谱(Knowledge Graph,简称KG),由点(实体)和边(实体间的关系,实体属性)组成,在人工智能的多项研究与应用中扮演着举足轻重的角色,它作为问答、推荐系统等领域的技术基石,受到了广泛关注。广泛应用于知识驱动的AI任务,如问答模型、推荐系统、搜索引擎等等。通用知识图谱及领域知识图谱由不同组织机构、专家或自动化及半自动化系统构建形成,彼此之间存在着知识的重叠和交叉,因此,合并(融合)两个知识图谱对扩大现有知识图谱、提升下游任务等有特殊意义。
实体对齐是多源知识图谱自动融合(合并/集成)的关键步骤,其效果和知识图谱自动融合(合并/集成)的效果直接相关,因此实体对齐算法的准确率尤为重要。由于不同的知识图谱对于实体各方面信息的表达差异性较大,现有针对实体对齐的方法,大多基于图神经网络模型(GNN)、基于卷积的模型、基于胶囊网络的模型等模型,通过知识图谱特有的三元组结构的学习来传播对齐信息,但是存在以下缺点:
(1)交互不足。对实体为中心的模型来说,实体和关系之间的隐性交互是很难建模的,和关系为中心的模型恰恰相反。而现有技术的工作集中在图的连通性上,忽略了关系类型、关系方向、实体信息对关系表示的贡献等等。
(2)低质量自举。为解决缺少预对齐种子数据的缺点提出的自举方法认为,如果模型对其预测的结果有信心,那这个结果应该视为正确,作为额外的数据也加入模型训练,这样模型效果还能提升,BootEA、MRAEA都是优秀且经典的自举方法,但都严重依赖于模型本身的效果,而且生成的数据错误率高,质量低,且只能生成正例,不能生成负例,这导致对预测结果利用率较低的问题。
发明内容
本发明旨在至少在一定程度上解决相关技术中的技术问题之一。
为此,本发明的第一个目的在于提出一种面向多源知识图谱融合的实体对齐方法,对实体和关系之间的隐性交互进行了建模,提高了实体与关系之间的交互;其次,根据损失函数和实体的属性信息,采用结合属性的双向全局过滤策略(ABGS)的迭代策略来生成高质量的半监督数据,且进一步生成包含正例和负例的"对齐实体对",以降低生成的数据错误率,提高对预测结果的利用率。
本发明的第二个目的在于提出一种面向多源知识图谱融合的实体对齐装置。
本发明的第三个目的在于提出一种面向多源知识图谱自动化集成的数据服务系统。
本发明的第四个目的在于提出一种非临时性计算机可读存储介质。
本发明的第五个目的在于提出一种电子设备。
本发明的第六个目的在于提出一种计算机程序产品。
为达上述目的,本申请第一方面实施例提出了一种面向多源知识图谱融合的实体对齐方法,包括:
提取知识图谱中实体的实体特征,根据所述实体的实体特征生成实体嵌入矩阵,并根据所述实体嵌入矩阵获取所述知识图谱的实体表示;
根据所述实体表示计算所述实体与所述相邻实体的关系信息,并根据所述关系信息增强所述实体表示,以得到所述知识图谱的完整实体表示;
根据所述完整实体表示获取完整实体嵌入矩阵,根据所述完整实体嵌入矩阵获取最终实体嵌入矩阵;
根据所述最终实体嵌入矩阵和数据集计算损失函数;
根据所述损失函数和实体的属性信息,采用双向全局过滤策略生成样本集,并根据所述样本集对神经网络模型进行迭代训练,使得训练后的神经网络模型具有对齐和融合多个知识图谱的能力,其中,所述样本集包括迭代正样本集和迭代负样本集。
可选的,本申请实施例提出的面向多源知识图谱融合的实体对齐方法,还包括:dropout网络和跨层highway网络;
其中,使用highway网络混合两种不同的实体嵌入矩阵,其中,
α=sigmoid(X^(a)W+b),
X^(out)=(1-α)X^(a)+αX^(b)
其中,X^(a)、X^(b)是两个实体嵌入矩阵,X^(out)是highway网络的输出,W和b分别是线性层的权重矩阵和偏置矢量,α是门控权重向量;
将所述highway网络的输出X (out)输入dropout网络,以得到混合特征,将所述混合特征输入到图注意力网络GAT,所述图注意力网络GAT输出为:
Figure PCTCN2021137139-appb-000001
Figure PCTCN2021137139-appb-000002
其中,
Figure PCTCN2021137139-appb-000003
是第l层GAT输出的实体e i的嵌入表示,
Figure PCTCN2021137139-appb-000004
是第l-1层GAT输出的实体e_j′的嵌入表示,α_ij表示实体e_i的相邻实体的注意力权重,a是可训练的参数向量,维数为2d_e×1,a^T表示参数向量的转置,[*||*]表示拼接运算,exp(x)=e^x,LeakyReLU是激活函数,LeakyReLU(x)=max(x,0)+0.01*min(x,0),N_i表示实体e_i的所有相邻实体组成的集合。
可选的,在本申请实施例中,根据所述实体表示计算所述实体与所述相邻实体的关系信息,并根据所述关系信息增强所述实体表示,以得到所述知识图谱的完整实体表示,包括:
将每个关系r k的语义分为两部分,与头实体相关的部分
Figure PCTCN2021137139-appb-000005
和与尾实体相关的部分
Figure PCTCN2021137139-appb-000006
每个实体x的表示可以拆分为x h=x (PAN)W h和x t=x (PAN)W t,其中W h,
Figure PCTCN2021137139-appb-000007
是权重矩阵,d r是关系嵌入维数,x (PAN)是来自原始聚合层输出的嵌入矩阵X (PAN)的实体嵌入;
采用所述图注意力网络GAT将实体信息传播到关系,
Figure PCTCN2021137139-appb-000008
Figure PCTCN2021137139-appb-000009
其中,
Figure PCTCN2021137139-appb-000010
是基于关系头语义
Figure PCTCN2021137139-appb-000011
的实体e i作为头实体的部分表示,
Figure PCTCN2021137139-appb-000012
是与头实体e i相关的关系列表,α ik表示关系r k关于头实体e i的注意力权重;
Figure PCTCN2021137139-appb-000013
计算出
Figure PCTCN2021137139-appb-000014
和从
Figure PCTCN2021137139-appb-000015
计算出
Figure PCTCN2021137139-appb-000016
使用所述Highway网络自动平衡
Figure PCTCN2021137139-appb-000017
Figure PCTCN2021137139-appb-000018
中的信息,并通过拼接获得e i的完整实体表示
Figure PCTCN2021137139-appb-000019
Figure PCTCN2021137139-appb-000020
可选的,在本申请实施例中,根据所述完整实体表示获取完整实体嵌入矩阵,根据所述完整实体嵌入矩阵获取最终实体嵌入矩阵,包括:
使用回响网络输出所述完整实体对应的完整实体嵌入矩阵X (EN),并输出所述最终实体嵌入矩阵
Figure PCTCN2021137139-appb-000021
Figure PCTCN2021137139-appb-000022
可选的,在本申请实施例中,所述损失函数计算公式是:
Figure PCTCN2021137139-appb-000023
其中,P +是正样本集,P -是从正样本集中生成的负样本集,
Figure PCTCN2021137139-appb-000024
是迭代策略生成的负样本集,λ是超参数,x i是来自
Figure PCTCN2021137139-appb-000025
实体嵌入向量,d(x i,x j)是距离函数d(x i,x j)=|x i-x j|,P +由两部分组成,一部分是原始的预对齐实体的训练集P,另一部分是结合属性的双向全局过滤策略生成的迭代正样本集
Figure PCTCN2021137139-appb-000026
Figure PCTCN2021137139-appb-000027
可选的,在本申请实施例中,所述双向全局过滤策略包括:
计算属性相似度矩阵和属性值相似度矩阵;
根据所述属性相似度矩阵和属性值相似度矩阵计算最终相似度矩阵;
根据所述最终相似度矩阵计算局部对齐的结果;
使用局部对齐和全局对齐来生成半监督数据,以生成迭代正样本集和迭代负样本集。
为达上述目的,本申请第二方面实施例提出了一种面向多源知识图谱融合的实体对齐装置,包括:
原始聚合网络模块,用于提取知识图谱中实体的实体特征,根据所述实体的实体特征生成实体嵌入矩阵,并根据所述实体嵌入矩阵获取所述知识图谱的实体表示;
回响网络模块,用于根据所述实体表示计算所述实体与所述相邻实体的关系信息,并根据所述关系信息增强所述实体表示,以得到所述知识图谱的完整实体表示;
完整聚合网络模块,用于根据所述完整实体表示获取完整实体嵌入矩阵,根据所述完整实体嵌入矩阵获取最终实体嵌入矩阵;
对齐损失函数计算模块,用于根据所述最终实体嵌入矩阵和数据集计算损失函数;
结合属性的双向全局过滤策略模块,用于根据所述损失函数和实体的属性信息,采用双向全局过滤策略生成样本集,并根据所述样本集对神经网络模型进行迭代训练,使得训练后的神经网络模型具有对齐和融合多个知识图谱的能力,其中,所述样本集包括迭代正样本集和迭代负样本集。
为达上述目的,本申请第三方面实施例提出了一种面向多源知识图谱自动化集成的数据服务系统,包括:
待对齐知识图谱数据源管理模块,用于保存和管理多个知识图谱数据源;
数据管理模块,用于获取待对齐知识图谱数据,将待对齐知识图谱数据转换成预设数据格式的待对齐知识图谱数据;
知识融合模块,用于使用如本申请第一方面实施例所述的面向多源知识图谱融合的实体对齐方法中训练后的神经网络模型对预设数据格式的待对齐知识图谱数据进行预测以得到对齐实体对,根据所述对齐实体对将待对齐知识图谱数据融合为知识图谱;
已融合知识图谱管理模块,用于保存和管理所述知识图谱,并根据所述知识图谱发布数据服务。
为达上述目的,本申请第四方面实施例提出了一种非临时性计算机可读存储介质,其上存储有计算机程序,所述计算机程序被处理器执行时实现如本申请第一方面实施例所述的面向多源知识图谱融合的实体对齐方法。
为达上述目的,本申请第五方面实施例提出了一种电子设备,包括:处理器;用于存储所述处理器可执行指令的存储器;其中,所述处理器被配置为执行所述指令,以实现如本申请第一方面实施例所述的面向多源知识图谱融合的实体对齐方法。
为达上述目的,本申请第六方面实施例提出了一种计算机程序产品,包括计算机程序,所述计算机程序被处理器执行时,实现如本申请第一方面实施例所述的面向多源知识图谱融合的实体对齐方法。
综上所述,本申请实施例的面向多源知识图谱融合的实体对齐方法、实体对齐装置、数据服务系统及可读存储介质,该方案首先提取知识图谱中实体的实体特征,根据所述实体的实体特征生成实体嵌入矩阵,并根据所述实体嵌入矩阵获取所述知识图谱的实体表示;然后依据得到的实体表示计算实体与相邻实体的关系信息,并根据实体与相邻实体的关系信息增强上述实体表示,以得到知识图谱的完整实体表示;随后依据得到的知识图谱的完整实体表示获取最终实体嵌入矩阵;再根据获取的最终实体嵌入矩阵和数据集计算损失函数;最后采用双向全局过滤策略对损失函数和实体的属性信息进行处理以生成迭代正样本集和迭代负样本集,进而通过样本集对神经网络模型进行迭代训练,以使得训练后的神经网络模型具有对齐和融合多个知识图谱的能力。由此,本申请实施例公开的上述方案实现了对实体和关系之间的隐性交互的建模,提高了实体与关系之间的交互;其次,根据损失函数和实体的属性信息,采用结合属性的双向全局过滤策略(ABGS)的迭代策略来生成高质量的半监督数据,且进一步生成包含正例和负例的"对齐实体对",以降低生成的数据错误率,提高对预测结果的利用率。
本发明附加的方面和优点将在下面的描述中部分给出,部分将从下面的描述中变得明显,或通过本发明的实践了解到。
附图说明
本发明上述的和/或附加的方面和优点从下面结合附图对实施例的描述中将变得明显和容易理解,其中:
图1为本申请实施例所提供的一种面向多源知识图谱融合的实体对齐方法的流程图;
图2为本申请实施例中实体对齐方法的总体流程图;
图3为本申请实施例中双向全局过滤策略的流程图;
图4为本申请实施例所提供的一种面向多源知识图谱融合的实体对齐装置的结构示意图;以及
图5为本申请实施例所提供的一种面向多源知识图谱自动化集成的数据服务系统的结构示意图。
具体实施方式
下面详细描述本发明的实施例,所述实施例的示例在附图中示出,其中自始至终相同或类似的标号表示相同或类似的元件或具有相同或类似功能的元件。下面通过参考附图描述的实施例是示例性的,旨在用于解释本发明,而不能理解为对本发明的限制。
知识图谱(Knowledge Graph,简称KG),由点(实体)和边(实体间的关系,实体属性)组成,在人工智能的多项研究与应用中扮演着举足轻重的角色,它作为问答、推荐系统等领域的技术基石,受到了广泛关注。广泛应用于知识驱动的AI任务,如问答模型、推荐系统、 搜索引擎等等。通用知识图谱及领域知识图谱由不同组织机构、专家或自动化及半自动化系统构建形成,彼此之间存在着知识的重叠和交叉,因此,合并(融合)两个知识图谱对扩大现有知识图谱、提升下游任务等有特殊意义。
实体对齐是多源知识图谱自动融合(合并/集成)的关键步骤,其效果和知识图谱自动融合(合并/集成)的效果直接相关,因此实体对齐算法的准确率尤为重要。由于不同的知识图谱对于实体各方面信息的表达差异性较大,现有针对实体对齐的方法,大多基于图神经网络模型(GNN)、基于卷积的模型、基于胶囊网络的模型等模型,通过知识图谱特有的三元组结构的学习来传播对齐信息。
目前主流的实体对齐框架是:
(1)首先使用知识图谱嵌入(Knowledge Graph Embedding,KGE)模型将实体表示嵌入到低维向量空间中;
(2)然后基于实体向量计算源实体与候选实体的相似度矩阵;
(3)最后根据相似度矩阵获得预测结果。
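The three-step mainstream framework above can be sketched as follows. This is a minimal illustration, not the patent's model: the toy embeddings and the Manhattan-distance scoring are assumptions made only for the sketch.

```python
import numpy as np

def predict_alignment(src_emb, cand_emb):
    """Top-1 prediction step of the generic alignment framework: score
    source/candidate pairs by negative Manhattan distance and take the
    best candidate per source entity."""
    # Pairwise Manhattan distances, shape (n_src, n_cand)
    dist = np.abs(src_emb[:, None, :] - cand_emb[None, :, :]).sum(axis=-1)
    return (-dist).argmax(axis=1)  # most similar candidate index per source entity

src = np.array([[0.0, 1.0], [1.0, 0.0]])   # embeddings from KG1 (assumed values)
cand = np.array([[1.1, 0.1], [0.1, 0.9]])  # embeddings from KG2 (assumed values)
pred = predict_alignment(src, cand)        # nearest-candidate index per source entity
```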
从信息流的角度,目前知识图谱嵌入KGE模型分为两类:第一类是以关系为中心,强调尾实体来自于关系作用于头实体。这类包括TransE系列、旋转模型、极坐标模型、双线性模型等等,已经在实体链接预测任务大放异彩。但是在实体对齐任务中,这些模型效果表现不佳;第二类是以实体为中心,强调所有实体一律平等,实体间的关系仅作为增强实体表示的信息源之一。这类模型有图神经网络模型(GNN)、基于卷积的模型、基于胶囊网络的模型等等,他们和计算机视觉领域和自然语言处理领域密切相关,可解释性较弱。
特别地,图神经网络的迅速发展推动了第二类以实体为中心的方法的应用,但是经实践这种方式存在着缺少预对齐种子数据的缺点,针对上述缺点,本领域技术人员提出了自举方法,自举方法也叫做自扩展方法,它是在每一轮迭代中,选择若干个置信度较高(全局最高)的"对齐实体对"添加到训练集中来迭代扩展生成"对齐实体对"数据。但是,现有的自扩展方法虽然在一定程度上解决了缺少预对齐种子数据的问题,实践证明,现有技术依旧存在下述缺点:
(1)交互不足。对实体为中心的模型来说,实体和关系之间的隐性交互是很难建模的,和关系为中心的模型恰恰相反。而现有技术的工作集中在图的连通性上,忽略了关系类型、关系方向、实体信息对关系表示的贡献等等。
(2)低质量自举。为解决缺少预对齐种子数据的缺点提出的自举方法认为,如果模型对其预测的结果有信心,那这个结果应该视为正确,作为额外的数据也加入模型训练,这样模型效果还能提升,BootEA、MRAEA都是优秀且经典的自举方法,但都严重依赖于模型本身的效果,而且生成的数据错误率高,质量低,且只能生成正例,不能生成负例,这导致对预测结果利用率较低的问题。
针对上述问题,本申请实施例提出一种面向多源知识图谱融合的实体对齐方法、实体对齐装置、数据服务系统和可读存储介质。
为了对实体和关系之间的隐性交互建模,本申请实施例设计了一个新颖的图神经网络模 型Echo,以提高实体与关系之间的隐性交互;其次本申请实施例还提出了一个更优异的迭代策略,结合属性的双向全局过滤策略(ABGS),来生成高质量的半监督数据(用于下一轮次训练的“对齐实体对”),且进一步生成的“对齐实体对”,既有正例,还包含负例。
通过本申请实施例提出的技术方案,在跨语言知识图谱数据集上经过测试,能够将模型top1准确率提升到96%,远远超过以往模型的79%。
另外,本申请实施例还包括实体对齐装置、数据服务系统和非临时性计算机可读存储介质,上述方案可以应用在各类场景中的多源知识图谱自动融合任务中,例如:两个金融事件知识图谱、两个医学知识图谱、两个不同百科生成的常识知识图谱。
下面参考附图描述本申请实施例的面向多源知识图谱融合的实体对齐方法、实体对齐装置、数据服务系统和非临时性计算机可读存储介质。
图1为本申请实施例所提供的一种面向多源知识图谱融合的实体对齐方法的流程图。
图2为本申请实施例中实体对齐方法的总体流程图。
如图1和图2所示,本申请实施例提供的一种面向多源知识图谱融合的实体对齐方法,包括以下步骤101至步骤105:
步骤101,提取知识图谱中实体的实体特征,根据实体的实体特征生成实体嵌入矩阵,并根据实体嵌入矩阵获取所述知识图谱的实体表示。
本申请实施例为了获得基础的实体表示,提出了原始聚合网络模块,具体而言,原始聚合网络模块通过采用堆积多层GCN和GAT来提取知识图谱中实体的实体特征,以生成实体嵌入矩阵,例如,在跨语言实体对齐场景中,其原始聚合网络模块可以设计为第一层是GCN,第二层和第三层是GAT。
进一步的,为了解决过平滑问题,本申请实施例在原始聚合网络模块中插入dropout网络和跨层highway网络。
具体而言,本申请实施例设实体嵌入矩阵
Figure PCTCN2021137139-appb-000028
其中|E|是KG的实体数,d e是实体嵌入维数。
则GCN层的输出
Figure PCTCN2021137139-appb-000029
是:
Figure PCTCN2021137139-appb-000030
其中σ(.)是激活函数,一般取为ReLU函数ReLU(x)=max(x,0),
Figure PCTCN2021137139-appb-000031
是每个实体具有自环的邻接矩阵(I为单位矩阵,M为图的邻接矩阵),
Figure PCTCN2021137139-appb-000032
是度矩阵,W是d^(in)×d^(out)维的权重矩阵,其中,d^(in)=d^(out)=d_e。
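The GCN layer just described can be sketched as follows. The exact normalization in the patent appears only as an image equation, so this sketch assumes the common symmetric form; the toy graph and weights are illustrative.

```python
import numpy as np

def gcn_layer(X, M, W):
    """One GCN propagation step, assumed symmetric normalization:
    ReLU(D^-1/2 (M + I) D^-1/2 X W), where M + I adds self-loops."""
    A_hat = M + np.eye(M.shape[0])                           # adjacency with self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))   # D^-1/2 from node degrees
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ X @ W, 0.0)  # ReLU activation

M = np.array([[0.0, 1.0], [1.0, 0.0]])   # 2-node toy graph
X = np.eye(2)                            # one-hot entity features
out = gcn_layer(X, M, np.eye(2))         # every entry is 0.5 for this toy graph
```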
在本申请实施例中使用highway网络混合两种不同的实体嵌入矩阵,其中,
α=sigmoid(X^(a)W+b),
X^(out)=(1-α)X^(a)+αX^(b)
其中,X^(a)、X^(b)是两个实体嵌入矩阵,X^(out)是highway网络的输出,W和b分别是线性层的权重矩阵和偏置矢量,α是门控权重向量;
将highway网络的输出X (out)输入dropout网络,以得到混合特征,将混合特征输入到图注意力网络GAT,图注意力网络GAT输出为:
Figure PCTCN2021137139-appb-000033
Figure PCTCN2021137139-appb-000034
其中,
Figure PCTCN2021137139-appb-000035
是第l层GAT输出的实体e i的嵌入表示,
Figure PCTCN2021137139-appb-000036
是第l-1层GAT输出的实体e_j′的嵌入表示,α_ij表示实体e_i的相邻实体的注意力权重,a是可训练的参数向量,维数为2d_e×1,a^T表示参数向量的转置,[*||*]表示拼接运算,exp(x)=e^x,LeakyReLU是激活函数,LeakyReLU(x)=max(x,0)+0.01*min(x,0),N_i表示实体e_i的所有相邻实体组成的集合。另外,在回响网络模块中本申请实施例也使用了highway网络。
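The highway gating used here can be sketched as follows; the dropout and GAT stages are omitted, and all parameter values are illustrative rather than the patent's.

```python
import numpy as np

def highway_mix(X_a, X_b, W, b):
    """Highway gate from the description: alpha = sigmoid(X_a W + b),
    X_out = (1 - alpha) * X_a + alpha * X_b; W and b are the linear
    layer's weight matrix and bias vector."""
    alpha = 1.0 / (1.0 + np.exp(-(X_a @ W + b)))  # gate weights in (0, 1)
    return (1.0 - alpha) * X_a + alpha * X_b

X_a = np.zeros((2, 3))                 # one entity-embedding matrix
X_b = np.ones((2, 3))                  # the other entity-embedding matrix
W, b = np.zeros((3, 3)), np.zeros(3)   # zero gate input -> alpha = 0.5 everywhere
X_out = highway_mix(X_a, X_b, W, b)    # an even 50/50 blend in this toy case
```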
步骤102,根据所述实体表示计算所述实体与所述相邻实体的关系信息,并根据所述关系信息增强所述实体表示,以得到所述知识图谱的完整实体表示。
进一步的,本申请实施例根据由步骤101得到的实体表示计算实体与相邻实体的关系信息,并根据关系信息增强实体表示,以得到知识图谱的完整实体表示,包括:
将每个关系r_k的语义分为两部分,与头实体相关的部分
Figure PCTCN2021137139-appb-000037
和与尾实体相关的部分
Figure PCTCN2021137139-appb-000038
每个实体x的表示可以拆分为x_h=x^(PAN)W_h和x_t=x^(PAN)W_t,其中W_h,
Figure PCTCN2021137139-appb-000039
是权重矩阵,d_r是关系嵌入维数,x^(PAN)是来自原始聚合层输出的嵌入矩阵X^(PAN)的实体嵌入;
采用所述图注意力网络GAT将实体信息传播到关系,
Figure PCTCN2021137139-appb-000040
Figure PCTCN2021137139-appb-000041
其中,
Figure PCTCN2021137139-appb-000042
是基于关系头语义
Figure PCTCN2021137139-appb-000043
的实体e_i作为头实体的部分表示,
Figure PCTCN2021137139-appb-000044
是与头实体e_i相关的关系列表,α_ik表示关系r_k关于头实体e_i的注意力权重;
Figure PCTCN2021137139-appb-000045
计算出
Figure PCTCN2021137139-appb-000046
和从
Figure PCTCN2021137139-appb-000047
计算出
Figure PCTCN2021137139-appb-000048
使用所述Highway网络自动平衡
Figure PCTCN2021137139-appb-000049
Figure PCTCN2021137139-appb-000050
中的信息,并通过拼接获得e_i的完整实体表示
Figure PCTCN2021137139-appb-000051
Figure PCTCN2021137139-appb-000052
由此可知,本申请实施例中的回响网络模块输出的是知识图谱的完整实体表示,具体通过下述方式得到知识图谱的完整实体表示:
本申请实施例中的完整实体表示由邻居关系表示的两个视图动态计算而成,且关系表示是基于原始实体表示生成的,其中,回响网络模块设计与以前的模型不同,因为它不忽略实体对关系的贡献,且使关系的不同部分发挥作用,换言之,本申请实施例中回响网络模块的设计遵循这样的思想,即必须进一步利用关系信息来增强实体表示,而不能忽略实体对关系的贡献。
由此,本申请实施例将每个关系r_k的语义分为两部分,与头实体相关的部分
Figure PCTCN2021137139-appb-000053
和与尾实体相关的部分
Figure PCTCN2021137139-appb-000054
也就是说,每个部分仅取决于相关实体。
类似地,每个实体x的表示可以拆分为x_h=x^(PAN)W_h和x_t=x^(PAN)W_t,其中W_h,
Figure PCTCN2021137139-appb-000055
Figure PCTCN2021137139-appb-000056
是权重矩阵,d_r是关系嵌入维数,x^(PAN)是来自原始聚合层输出的嵌入矩阵X^(PAN)的实体嵌入。
本申请实施例采用GAT将实体信息传播到关系,如下所示:
Figure PCTCN2021137139-appb-000057
Figure PCTCN2021137139-appb-000058
其中
Figure PCTCN2021137139-appb-000059
是和关系r_k相连的头实体集合,
Figure PCTCN2021137139-appb-000060
是和关系r_k的头实体e_i相连的尾实体集合,α_ijk表示基于头实体e_i和尾实体e_j的从头实体e_i到关系r_k的注意力权重。
Figure PCTCN2021137139-appb-000061
直接传递到下一层GAT,以输出完整实体表示中作为头实体的部分。
本申请实施例有:
Figure PCTCN2021137139-appb-000062
Figure PCTCN2021137139-appb-000063
其中
Figure PCTCN2021137139-appb-000064
是基于关系头语义
Figure PCTCN2021137139-appb-000065
的实体e_i作为头实体的部分表示,
Figure PCTCN2021137139-appb-000066
是与头实体e_i相关的关系列表,而不是集合,这里允许重复特定于不同尾实体的关系,α_ik表示关系r_k关于头实体e_i的注意力权重。以相同的方式,本申请实施例可以从
Figure PCTCN2021137139-appb-000067
计算出
Figure PCTCN2021137139-appb-000068
和从
Figure PCTCN2021137139-appb-000069
计算出
Figure PCTCN2021137139-appb-000070
然后,本申请实施例应用Highway网络自动平衡
Figure PCTCN2021137139-appb-000071
Figure PCTCN2021137139-appb-000072
中的信息,并通过拼接获得e_i的完整实体表示
Figure PCTCN2021137139-appb-000073
Figure PCTCN2021137139-appb-000074
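The head/tail decomposition of the echo network described above can be sketched as follows; the projection matrices below are illustrative, and the subsequent relation-level attention and highway balancing are omitted.

```python
import numpy as np

def split_entity(x_pan, W_h, W_t):
    """Echo-network entity split from the description: an entity
    representation x from the original aggregation layer is projected into
    a head-role part x_h = x W_h and a tail-role part x_t = x W_t, which
    the relation attention then consumes separately."""
    return x_pan @ W_h, x_pan @ W_t

x = np.array([[1.0, 2.0]])             # one entity embedding (d_e = 2)
W_h = np.array([[1.0], [0.0]])         # d_e x d_r head projection (assumed values)
W_t = np.array([[0.0], [1.0]])         # d_e x d_r tail projection (assumed values)
x_h, x_t = split_entity(x, W_h, W_t)   # head part [[1.0]], tail part [[2.0]]
```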
步骤103,根据所述完整实体表示获取完整实体嵌入矩阵,根据所述完整实体嵌入矩阵获取最终实体嵌入矩阵。
进一步的,本申请实施例根据完整实体表示获取完整实体嵌入矩阵,根据完整实体嵌入矩阵获取最终实体嵌入矩阵,包括:
使用回响网络输出完整实体对应的完整实体嵌入矩阵X^(EN),并输出最终实体嵌入矩阵
Figure PCTCN2021137139-appb-000075
Figure PCTCN2021137139-appb-000076
在本申请实施例中,经过回响网络后的实体表示才是完整的实体表示。完整聚合网络模块在此基础上,再次聚合邻居实体的信息,以获取最终实体嵌入矩阵,具体获取方式如下:
利用Echo网络输出的完整实体嵌入矩阵X^(EN),本申请实施例继续使用GAT层再次从邻居那里聚合信息。
尽管它们具有相同的结构,但该层中的GAT比原始聚合层中的GAT强大得多,因为该层进一步从解构的关系中获得实体信息,而原始聚合层忽略关系在实体上的作用,所以完整聚合网络对于聚合最佳实体表示至关重要。
为了简化问题,我们使用和原始聚合网络(PAN)模块相同的注意力层。
最后,最终输出实体嵌入矩阵
Figure PCTCN2021137139-appb-000077
为:
Figure PCTCN2021137139-appb-000078
步骤104,根据最终实体嵌入矩阵和数据集计算损失函数,其中,损失函数计算公式是:
Figure PCTCN2021137139-appb-000079
其中,P^+是正样本集,P^-是从正样本集中生成的负样本集,
Figure PCTCN2021137139-appb-000080
是迭代策略生成的负样本集,λ是超参数,x_i是来自
Figure PCTCN2021137139-appb-000081
实体嵌入向量,d(x_i,x_j)是距离函数d(x_i,x_j)=|x_i-x_j|,P^+由两部分组成,一部分是原始的预对齐实体的训练集P,另一部分是结合属性的双向全局过滤策略生成的迭代正样本集
Figure PCTCN2021137139-appb-000082
Figure PCTCN2021137139-appb-000083
具体而言,本申请实施例对齐损失函数计算模块负责根据实体表示和数据集,计算损失,用于神经网络的训练,损失函数计算公式是使用曼哈顿距离的Hinge损失。
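The Hinge loss with Manhattan distance can be sketched as follows. Because the exact formula appears in the document only as an image equation, the margin form and the hyper-parameter values below are assumptions for the sketch.

```python
import numpy as np

def manhattan(x, y):
    return np.abs(x - y).sum(axis=-1)   # d(x_i, x_j) = |x_i - x_j|

def hinge_alignment_loss(pos_l, pos_r, neg_l, neg_r, margin=3.0, lam=1.0):
    """Margin-based (Hinge) alignment loss sketch: aligned pairs are
    pulled together, negative pairs pushed past the margin; lam weighs
    the iteratively generated negatives (values illustrative)."""
    pos_term = manhattan(pos_l, pos_r).sum()
    neg_term = np.maximum(margin - manhattan(neg_l, neg_r), 0.0).sum()
    return pos_term + lam * neg_term

pos_l = np.array([[0.0, 0.0]]); pos_r = np.array([[0.0, 0.0]])  # identical aligned pair
neg_l = np.array([[0.0, 0.0]]); neg_r = np.array([[5.0, 0.0]])  # already far apart
loss = hinge_alignment_loss(pos_l, pos_r, neg_l, neg_r)         # both terms vanish here
```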
步骤105,根据所述损失函数和实体的属性信息,采用双向全局过滤策略生成样本集,并根据所述样本集对神经网络模型进行迭代训练,使得训练后的神经网络模型具有对齐和融合多个知识图谱的能力,其中,所述样本集包括迭代正样本集和迭代负样本集。
为了生成高质量样本,本申请实施例提出结合属性的双向全局过滤策略方法,及相应处理模块。该策略的输入是实体候选集E1,E2和关系相似度矩阵
Figure PCTCN2021137139-appb-000084
输出是迭 代正样本集
Figure PCTCN2021137139-appb-000085
和迭代负样本集
Figure PCTCN2021137139-appb-000086
其中,关系相似度矩阵的第i行第j列元素
Figure PCTCN2021137139-appb-000087
d(x_i,x_j)是距离函数d(x_i,x_j)=|x_i-x_j|。
|E1|和|E2|分别是集合E1和E2的元素个数。
在本申请实施例中,双向全局过滤策略模块的任务是:负责在模型训练的过程中,不断生成高质量的对齐数据,用于模型的下一轮训练。由此,本申请实施例提出的双向全局过滤策略模块结合了实体的属性信息,通过属性信息的引入,进一步提升了正例生成的精确度,降低了正例生成误差对下一轮迭代训练质量的影响。
图3为本申请实施例中双向全局过滤策略的流程图。
进一步的,如图3所示,本申请实施例中的双向全局过滤策略包括步骤201至步骤204。
步骤201计算属性相似度矩阵和属性值相似度矩阵,具体的计算方式如下:
(1)计算基于属性名称的相似度。
计算基于属性名称的相似度时,在跨语言对齐中可以是将不同语言(中文、法语、德语等)的属性名称翻译成相同的语言。在同语言场景下的多源实体对齐,可以直接计算两个属性名称的相似度。
这里以跨语言对齐为例。首先,将属性的名称翻译成相同的语言(英语),然后根据字符串匹配测度(Sorensen-Dice系数)作为相似度,按top1相似度大于给定阈值λ过滤出对齐属性对。接下来,使用这些可比较的属性,获得实体e_i的属性集Attr(e_i)。最后,可以计算基于属性的相似度矩阵S_attr,其中第i行第j列元素
Figure PCTCN2021137139-appb-000088
e_i,e_j是分别来自KG1和KG2的两个实体,
Figure PCTCN2021137139-appb-000089
表示两个集合A和B之间的Jaccard相似度。
(2)计算基于属性值的相似度
为了基于属性值计算e_i,e_j的相似度,首先获取公共属性集C_attr=Attr(e_i)∩Attr(e_j)。
对于C_attr中的每个属性,基于实体e_i和e_j的属性a的值相似度为
Figure PCTCN2021137139-appb-000090
其中Value_a(e_i)是实体e_i的属性a的值集。
通过平均C_attr中所有属性的值相似度,得到基于属性值的相似度矩阵S_attr_value,其中第i行第j列元素
Figure PCTCN2021137139-appb-000091
步骤202,根据所述属性相似度矩阵和属性值相似度矩阵计算最终相似度矩阵。
具体而言,本申请实施例先获得三个相似度矩阵S_attr_value、S_attr和S_rel,再使用超参数α_1,α_2,α_3∈[0,1]计算最终的对齐矩阵:
S=α_1·S_attr_value+α_2·S_attr+α_3·S_rel
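The attribute similarity of step 201 and the weighted combination of step 202 can be sketched as follows; the attribute sets, channel scores, and weights below are illustrative assumptions, not values from the patent.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two attribute sets."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Attribute sets of an entity pair from KG1/KG2 (illustrative values)
attrs_e1 = {"name", "birth_date", "nationality"}
attrs_e2 = {"name", "birth_date", "occupation"}
s_attr = jaccard(attrs_e1, attrs_e2)        # 2 shared of 4 total -> 0.5

# Weighted combination S = a1*S_attr_value + a2*S_attr + a3*S_rel;
# the channel scores and weights below are assumed for the sketch.
s_attr_value, s_rel = 0.8, 0.6
a1, a2, a3 = 0.3, 0.3, 0.4
S = a1 * s_attr_value + a2 * s_attr + a3 * s_rel
```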
步骤203,根据所述最终相似度矩阵计算局部对齐的结果。
具体而言,本申请实施例根据最终相似度矩阵S,计算局部对齐的结果如下:
Figure PCTCN2021137139-appb-000092
Figure PCTCN2021137139-appb-000093
Figure PCTCN2021137139-appb-000094
Figure PCTCN2021137139-appb-000095
其中S(e_1,e_2)表示最终相似度矩阵中实体e_1和实体e_2的相似度。
Figure PCTCN2021137139-appb-000096
指遍历集合E2中所有元素,取其中使S(e_1,e_2)最大的实体。
Figure PCTCN2021137139-appb-000097
为根据左边实体预测右边实体所得结果;类似地,
Figure PCTCN2021137139-appb-000098
为根据右边实体预测左边实体所得结果。
Figure PCTCN2021137139-appb-000099
分别是根据局部对齐结果所获得的正样本集和负样本集,这两个样本集在下一步中需要用到。
步骤204,使用局部对齐和全局对齐来生成半监督数据,以生成迭代正样本集和迭代负样本集。
具体而言,本申请实施例同时使用局部对齐和全局对齐来生成半监督数据。
全局对齐是指从候选集中选取目标实体的过程是不放回的。
与之相比,局部对齐是指从候选集中选取目标实体时是有放回的。
因为局部对齐生成的两个样本集包含了很多无法确保是正确还是错误的样本,我们用更严格的全局对齐来过滤它。
设全局对齐的结果为P_global。然后迭代正样本集
Figure PCTCN2021137139-appb-000100
和迭代负样本集
Figure PCTCN2021137139-appb-000101
计算如下:
Figure PCTCN2021137139-appb-000102
最终,该策略输出迭代正样本集
Figure PCTCN2021137139-appb-000103
和迭代负样本集
Figure PCTCN2021137139-appb-000104
供神经网络模型训练中使用。
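The bidirectional (local) alignment used in steps 203-204 can be sketched as follows; the intersection with the global, without-replacement alignment and the attribute channels are omitted, and the similarity values are illustrative.

```python
import numpy as np

def bidirectional_filter(S):
    """Bidirectional part of the filtering strategy, sketched: predict
    left->right and right->left by argmax over the final similarity
    matrix S; mutual best matches become candidate positives, while
    inconsistent predictions become candidate negatives."""
    l2r = S.argmax(axis=1)     # best right entity per left entity
    r2l = S.argmax(axis=0)     # best left entity per right entity
    pos, neg = [], []
    for i, j in enumerate(l2r):
        (pos if r2l[j] == i else neg).append((i, int(j)))  # mutual best -> positive
    return pos, neg

S = np.array([[0.9, 0.1, 0.2],
              [0.2, 0.8, 0.3],
              [0.4, 0.3, 0.1]])            # illustrative final similarities
pos, neg = bidirectional_filter(S)         # (0,0), (1,1) mutual; (2,0) inconsistent
```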
为了便于本领域技术人员更好的理解本申请实施例,现提供结合属性的双向全局过滤策略的伪代码如下:
Figure PCTCN2021137139-appb-000105
上述实施例的应用场景分为原图应用和支撑应用。其中,原图应用指使用融合好的知识图谱直接进行应用,如图查询;支撑应用指以融合好的知识图谱为下游应用提供支撑,如推荐、搜索、问答。
具体而言,本申请实施例中的原图应用包括:学术知识查询、法律案牍查询、企业关系查询、嫌疑人关系查询、保险理赔案例查询、临床病例查询、业务流程查询等。
例如,以学术知识查询为例,利用此发明融合多语言数据源的学术知识图谱,学术名词实体对齐后,有利于跨源的相关知识的搜索查询。如:查询中文的学术名词“牛顿力学”,其英语源知识图谱的对齐实体为“Newtonian Mechanics”,则可以将英语源知识图谱中关于“Newtonian Mechanics”的学术知识添加到“牛顿力学”的查询结果中。
具体而言,本申请实施例中的支撑应用包括:智能问答、智能搜索、智能推荐、决策分析系统、群体画像、全息档案、风险预警、公安线索挖掘、辅助临床诊疗等。
例如,以智能搜索为例,基于知识图谱的智能搜索能对文本、图片、视频等复杂多元对象进行跨媒体搜索,也能实现篇章级、段落级、语句级的多粒度搜索。智能搜索让计算机更准确地识别和理解用户深层的搜索意图和需求,在多源知识图谱中查找出目标实体及其相关内容,对结果内容进行实体排序和分类,并以符合人类习惯的自然语言的形式展示,从而提高搜索体验。
为了实现上述实施例,本发明还提出一种面向多源知识图谱融合的实体对齐装置。
图4为本申请实施例所提供的一种面向多源知识图谱融合的实体对齐装置的结构示意图。
如图4所示,本申请实施例提供的一种面向多源知识图谱融合的实体对齐装置,包括:
原始聚合网络模块10,用于提取知识图谱中实体的第一实体特征和相邻实体的第二实体特征,根据所述实体的第一实体特征生成第一实体嵌入矩阵,根据所述相邻实体的第二实体特征生成第二实体嵌入矩阵,并对所述第一实体嵌入矩阵和所述第二实体嵌入矩阵进行聚合以得到所述知识图谱的实体表示;
回响网络模块20,用于根据所述实体表示计算所述实体与所述相邻实体的关系信息,并根据所述关系信息增强所述实体表示,以得到所述知识图谱的完整实体表示;
完整聚合网络模块30,用于根据所述完整实体表示获取完整实体嵌入矩阵,根据所述完整实体嵌入矩阵获取最终实体嵌入矩阵;
对齐损失函数计算模块40,用于根据所述最终实体嵌入矩阵和数据集计算损失函数;
结合属性的双向全局过滤策略模块50,用于根据所述损失函数和实体的属性信息,采用双向全局过滤策略生成样本集,并根据所述样本集对神经网络模型进行迭代训练,使得训练后的神经网络模型具有对齐和融合多个知识图谱的能力,其中,所述样本集包括迭代正样本集和迭代负样本集。
为了实现上述实施例,本申请实施例提出了一种面向多源知识图谱自动化集成的数据服务系统。
图5为本申请实施例所提供的一种面向多源知识图谱自动化集成的数据服务系统的结 构示意图。
如图5所示,本申请实施例提供的一种面向多源知识图谱自动化集成的数据服务系统,包括:
待对齐知识图谱数据源管理模块60,用于保存和管理多个知识图谱数据源;
数据管理模块70,用于获取待对齐知识图谱数据,将待对齐知识图谱数据转换成预设数据格式的待对齐知识图谱数据;
知识融合模块80,用于使用本申请实施例所述的面向多源知识图谱融合的实体对齐方法中训练后的神经网络模型对预设数据格式的待对齐知识图谱数据进行预测以得到对齐实体对,根据所述对齐实体对将待对齐知识图谱数据融合为知识图谱;
已融合知识图谱管理模块90,用于保存和管理所述知识图谱,并根据所述知识图谱发布数据服务。
为了便于本领域技术人员更好的理解本申请实施例提出的面向多源知识图谱自动化集成的数据服务系统,现用下述运行实例进行说明。
假设有两个数据源:源A和源B,分别位于服务器A和服务器B。
服务器1运行待对齐知识图谱数据源管理模块,它保存的描述数据记录类似下表,下表为待对齐知识图谱数据源管理模块的一个描述数据表示例:
名称 地址 数据格式 数据量 ...
源A 123.123.123.1:8888 [id,name],[h_id,r_id,t_id] 8MB ...
源B 123.123.123.2:8888 [h_name,r_name,t_name] 1G ...
服务器2运行数据管理模块。
服务器3运行知识融合模块。
服务器4运行已融合知识图谱管理模块。
一个运行实例如下:
1)服务器2向服务器1发起请求,获取所有数据源的描述数据,用于动态组装数据转换模块。接着委托服务器1依次请求源A的数据和源B的数据。
2)服务器1将依次向123.123.123.1:8888和123.123.123.2:8888发起请求,并将数据转发给服务器2。
3)服务器2接收服务器1的数据,执行数据读取模块、数据转换模块、数据传输模块,将原始的知识图谱数据转换为标准格式,再将标准格式的多个知识图谱数据转发给服务器3。
4)服务器3接收服务器2的标准数据;首先运行训练模块,使神经网络模型具有融合多个知识图谱的能力;接着运行预测模块,让训练好的神经网络模型预测多个知识图谱的交集,即对齐实体对;最后运行融合模块,将多个知识图谱融合为一个知识图谱,发送给服务器4。
5)服务器4接收服务器3的统一知识图谱,将该知识图谱发布为数据服务。第三方可以订阅该服务,将该知识图谱拉取到他们的服务器上。服务器4可以作为数据源,继续为下一个面向多源知识图谱自动化集成的数据服务系统提供数据,以构建更大更完善的知识图谱。
为了实现上述实施例,本申请实施例还提出一种非临时性计算机可读存储介质,其上存储有计算机程序,所述计算机程序被处理器执行时实现如本申请实施例所述的面向多源知识图谱融合的实体对齐方法。
为了实现上述实施例,本申请实施例还提出一种电子设备,包括:处理器;用于存储所述处理器可执行指令的存储器;其中,所述处理器被配置为执行所述指令,以实现如本申请实施例所述的面向多源知识图谱融合的实体对齐方法。
为了实现上述实施例,本申请实施例还提出一种计算机程序产品,包括计算机程序,所述计算机程序被处理器执行时,实现如本申请实施例所述的面向多源知识图谱融合的实体对齐方法。
综上,本申请实施例提出的本申请实施例的的面向多源知识图谱融合的实体对齐方法、实体对齐装置、数据服务系统及计算机设备,上述方案具有以下优点:
一是,图神经网络模型Echo进一步加强了实体和关系之间的交互,使得实体表示能够感知关系的不同部分,其结构和计算过程具有新颖性和有效性。
二是,迭代生成训练数据的结合属性的双向全局过滤策略,能够解决缺乏人工对齐种子的问题,并能够大幅提高模型的准确率。
三是,利用上述实体对齐装置的面向多源知识图谱自动化集成的数据服务系统,将知识图谱数据源抽象化为描述数据,并自动化运行对齐装置来融合多源知识图谱,自动发布融合后的知识图谱为数据服务,使得第三方能方便获取统一的大规模知识图谱数据资源。
在本说明书的描述中,参考术语“一个实施例”、“一些实施例”、“示例”、“具体示例”、或“一些示例”等的描述意指结合该实施例或示例描述的具体特征、结构、材料或者特点包含于本发明的至少一个实施例或示例中。在本说明书中,对上述术语的示意性表述不必须针对的是相同的实施例或示例。而且,描述的具体特征、结构、材料或者特点可以在任一个或多个实施例或示例中以合适的方式结合。此外,在不相互矛盾的情况下,本领域的技术人员可以将本说明书中描述的不同实施例或示例以及不同实施例或示例的特征进行结合和组合。
此外,术语“第一”、“第二”仅用于描述目的,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括至少一个该特征。在本发明的描述中,“多个”的含义是至少两个,例如两个,三个等,除非另有明确具体的限定。
流程图中或在此以其他方式描述的任何过程或方法描述可以被理解为,表示包括一个或更多个用于实现定制逻辑功能或过程的步骤的可执行指令的代码的模块、片段或部分,并且本发明的优选实施方式的范围包括另外的实现,其中可以不按所示出或讨论的顺序,包括根据所涉及的功能按基本同时的方式或按相反的顺序,来执行功能,这应被本发明的实施例所属技术领域的技术人员所理解。
在流程图中表示或在此以其他方式描述的逻辑和/或步骤,例如,可以被认为是用于实现逻辑功能的可执行指令的定序列表,可以具体实现在任何计算机可读介质中,以供指令执行系统、装置或设备(如基于计算机的系统、包括处理器的系统或其他可以从指令执行系统、装置或设备取指令并执行指令的系统)使用,或结合这些指令执行系统、装置或设备而使用。就本说明书而言,"计算机可读介质"可以是任何可以包含、存储、通信、传播或传输程序以供指令执行系统、装置或设备或结合这些指令执行系统、装置或设备而使用的装置。计算机可读介质的更具体的示例(非穷尽性列表)包括以下:具有一个或多个布线的电连接部(电子装置),便携式计算机盘盒(磁装置),随机存取存储器(RAM),只读存储器(ROM),可擦除可编辑只读存储器(EPROM或闪速存储器),光纤装置,以及便携式光盘只读存储器(CDROM)。另外,计算机可读介质甚至可以是可在其上打印所述程序的纸或其他合适的介质,因为可以例如通过对纸或其他介质进行光学扫描,接着进行编辑、解译或必要时以其他合适方式进行处理来以电子方式获得所述程序,然后将其存储在计算机存储器中。
应当理解,本发明的各部分可以用硬件、软件、固件或它们的组合来实现。在上述实施方式中,多个步骤或方法可以用存储在存储器中且由合适的指令执行系统执行的软件或固件来实现。如,如果用硬件来实现和在另一实施方式中一样,可用本领域公知的下列技术中的任一项或他们的组合来实现:具有用于对数据信号实现逻辑功能的逻辑门电路的离散逻辑电路,具有合适的组合逻辑门电路的专用集成电路,可编程门阵列(PGA),现场可编程门阵列(FPGA)等。
本技术领域的普通技术人员可以理解实现上述实施例方法携带的全部或部分步骤是可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,该程序在执行时,包括方法实施例的步骤之一或其组合。
此外,在本发明各个实施例中的各功能单元可以集成在一个处理模块中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。所述集成的模块如果以软件功能模块的形式实现并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中。
上述提到的存储介质可以是只读存储器,磁盘或光盘等。尽管上面已经示出和描述了本发明的实施例,可以理解的是,上述实施例是示例性的,不能理解为对本发明的限制,本领域的普通技术人员在本发明的范围内可以对上述实施例进行变化、修改、替换和变型。

Claims (11)

  1. 一种面向多源知识图谱融合的实体对齐方法,其特征在于,包括:
    提取知识图谱中实体的实体特征,根据所述实体的实体特征生成实体嵌入矩阵,并根据所述实体嵌入矩阵获取所述知识图谱的实体表示;
    根据所述实体表示计算所述实体与所述相邻实体的关系信息,并根据所述关系信息增强所述实体表示,以得到所述知识图谱的完整实体表示;
    根据所述完整实体表示获取完整实体嵌入矩阵,根据所述完整实体嵌入矩阵获取最终实体嵌入矩阵;
    根据所述最终实体嵌入矩阵和数据集计算损失函数;
    根据所述损失函数和实体的属性信息,采用双向全局过滤策略生成样本集,并根据所述样本集对神经网络模型进行迭代训练,使得训练后的神经网络模型具有对齐和融合多个知识图谱的能力,其中,所述样本集包括迭代正样本集和迭代负样本集。
  2. 如权利要求1所述的面向多源知识图谱融合的实体对齐方法,其特征在于,还包括:
    dropout网络和跨层highway网络;
    其中,使用highway网络混合两种不同的实体嵌入矩阵,其中,
    α=sigmoid(X^(a)W+b),
    X^(out)=(1-α)X^(a)+αX^(b)
    其中,X^(a)、X^(b)是两个实体嵌入矩阵,X^(out)是highway网络的输出,W和b分别是线性层的权重矩阵和偏置矢量,α是门控权重向量;
    将所述highway网络的输出X (out)输入dropout网络,以得到混合特征,将所述混合特征输入到图注意力网络GAT,所述图注意力网络GAT输出为:
    Figure PCTCN2021137139-appb-100001
    Figure PCTCN2021137139-appb-100002
    其中,
    Figure PCTCN2021137139-appb-100003
    是第l层GAT输出的实体e i的嵌入表示,
    Figure PCTCN2021137139-appb-100004
    是第l-1层GAT输出的实体e_j′的嵌入表示,α_ij表示实体e_i的相邻实体的注意力权重,a是可训练的参数向量,维数为2d_e×1,a^T表示参数向量的转置,[*||*]表示拼接运算,exp(x)=e^x,LeakyReLU是激活函数,LeakyReLU(x)=max(x,0)+0.01*min(x,0),N_i表示实体e_i的所有相邻实体组成的集合。
  3. 如权利要求2所述的面向多源知识图谱融合的实体对齐方法,其特征在于,根据所述实体表示计算所述实体与所述相邻实体的关系信息,并根据所述关系信息增强所述实体表示,以得到所述知识图谱的完整实体表示,包括:
    将每个关系r k的语义分为两部分,与头实体相关的部分
    Figure PCTCN2021137139-appb-100005
    和与尾实体相关的部分
    Figure PCTCN2021137139-appb-100006
    每个实体x的表示可以拆分为x h=x (PAN)W h和x t=x (PAN)W t,其中W h,
    Figure PCTCN2021137139-appb-100007
    是权重矩阵,d r是关系嵌入维数,x (PAN)是来自原始聚合层输出的嵌入矩阵X (PAN)的实体嵌入;
    采用所述图注意力网络GAT将实体信息传播到关系,
    Figure PCTCN2021137139-appb-100008
    Figure PCTCN2021137139-appb-100009
    其中,
    Figure PCTCN2021137139-appb-100010
    是基于关系头语义
    Figure PCTCN2021137139-appb-100011
    的实体e i作为头实体的部分表示,
    Figure PCTCN2021137139-appb-100012
    是与头实体e i相关的关系列表,α ik表示关系r k关于头实体e i的注意力权重;
    Figure PCTCN2021137139-appb-100013
    计算出
    Figure PCTCN2021137139-appb-100014
    和从
    Figure PCTCN2021137139-appb-100015
    计算出
    Figure PCTCN2021137139-appb-100016
    使用所述Highway网络自动平衡
    Figure PCTCN2021137139-appb-100017
    Figure PCTCN2021137139-appb-100018
    中的信息,并通过拼接获得e i的完整实体表示
    Figure PCTCN2021137139-appb-100019
    Figure PCTCN2021137139-appb-100020
  4. 如权利要求3所述的面向多源知识图谱融合的实体对齐方法,其特征在于,根据所述完整实体表示获取完整实体嵌入矩阵,根据所述完整实体嵌入矩阵获取最终实体嵌入矩阵,包括:
    使用回响网络输出所述完整实体对应的完整实体嵌入矩阵X (EN),并输出所述最终实体嵌入矩阵
    Figure PCTCN2021137139-appb-100021
    Figure PCTCN2021137139-appb-100022
  5. The entity alignment method for multi-source knowledge graph fusion according to claim 4, wherein the loss function is computed as:
    L = Σ_{(x_i, x_j)∈P^+} Σ_{(x_{i'}, x_{j'})∈P^− ∪ P̃^−} max( 0, d(x_i, x_j) + λ − d(x_{i'}, x_{j'}) )
    where P^+ is the positive sample set, P^− is the negative sample set generated from the positive sample set, P̃^− is the negative sample set generated by the iterative strategy, λ is a hyperparameter, x_i is an entity embedding vector from the final entity embedding matrix, and d(x_i, x_j) = |x_i − x_j| is the distance function; P^+ consists of two parts, one being the original training set P of pre-aligned entities and the other being the iterative positive sample set P̃^+ generated by the attribute-combined bidirectional global filtering strategy:
    P^+ = P ∪ P̃^+
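The margin-based loss above can be sketched as follows. The exhaustive positive/negative pair enumeration and the function names are illustrative assumptions; d is the L1 distance defined in the claim.

```python
def l1_dist(x_i, x_j):
    # d(x_i, x_j) = |x_i - x_j|: L1 norm of the difference
    return sum(abs(a - b) for a, b in zip(x_i, x_j))

def alignment_loss(pos_pairs, neg_pairs, lam):
    # Sum of max(0, d(positive) + lambda - d(negative)) over all
    # combinations of positive and negative pairs
    return sum(max(0.0, l1_dist(pi, pj) + lam - l1_dist(ni, nj))
               for (pi, pj) in pos_pairs
               for (ni, nj) in neg_pairs)
```

A well-separated negative pair (distance exceeding d(positive) + λ) contributes nothing, while a hard negative contributes its margin violation, which is what pushes aligned entities together and unaligned ones apart.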
  6. The entity alignment method for multi-source knowledge graph fusion according to claim 1, wherein the bidirectional global filtering strategy comprises:
    computing an attribute similarity matrix and an attribute-value similarity matrix;
    computing a final similarity matrix from the attribute similarity matrix and the attribute-value similarity matrix;
    computing a local alignment result from the final similarity matrix;
    using local alignment and global alignment to generate semi-supervised data, so as to generate an iterative positive sample set and an iterative negative sample set.
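The steps above can be sketched as follows. The convex-combination weight and the mutual-nearest-neighbor filter are simplified stand-ins for the actual bidirectional global filtering strategy, and every name and parameter here is an assumption for illustration.

```python
def final_similarity(attr_sim, value_sim, beta=0.5):
    # Combine the attribute and attribute-value similarity matrices;
    # a convex combination with weight beta is assumed here.
    return [[(1 - beta) * a + beta * v for a, v in zip(ra, rv)]
            for ra, rv in zip(attr_sim, value_sim)]

def bidirectional_filter(sim):
    # Simplified bidirectional check: keep (i, j) only if entity i's best
    # match in KG2 is j AND entity j's best match in KG1 is i (mutual
    # nearest neighbours); survivors seed the iterative positive sample set.
    pairs = []
    for i, row in enumerate(sim):
        j = max(range(len(row)), key=row.__getitem__)
        back = max(range(len(sim)), key=lambda k: sim[k][j])
        if back == i:
            pairs.append((i, j))
    return pairs
```

For sim = [[0.9, 0.1], [0.2, 0.8]] the filter keeps (0, 0) and (1, 1); a pair that is a best match in only one direction is discarded, which is the point of filtering in both directions before adding semi-supervised pairs.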
  7. An entity alignment apparatus for multi-source knowledge graph fusion, comprising:
    an original aggregation network module, configured to extract entity features of entities in a knowledge graph, generate an entity embedding matrix from the entity features, and obtain an entity representation of the knowledge graph from the entity embedding matrix;
    an echo network module, configured to compute relation information between the entities and their neighboring entities from the entity representation, and enhance the entity representation with the relation information to obtain a complete entity representation of the knowledge graph;
    a complete aggregation network module, configured to obtain a complete entity embedding matrix from the complete entity representation, and obtain a final entity embedding matrix from the complete entity embedding matrix;
    an alignment loss function computation module, configured to compute a loss function from the final entity embedding matrix and a dataset;
    an attribute-combined bidirectional global filtering strategy module, configured to generate sample sets with a bidirectional global filtering strategy according to the loss function and the attribute information of the entities, and iteratively train a neural network model on the sample sets, so that the trained neural network model is capable of aligning and fusing multiple knowledge graphs, wherein the sample sets comprise an iterative positive sample set and an iterative negative sample set.
  8. A data service system for automated multi-source knowledge graph integration, comprising:
    a to-be-aligned knowledge graph data source management module, configured to store and manage multiple knowledge graph data sources;
    a data management module, configured to obtain to-be-aligned knowledge graph data and convert it into to-be-aligned knowledge graph data in a preset data format;
    a knowledge fusion module, configured to predict aligned entity pairs on the to-be-aligned knowledge graph data in the preset data format using the neural network model trained by the entity alignment method for multi-source knowledge graph fusion according to any one of claims 1-6, and fuse the to-be-aligned knowledge graph data into a knowledge graph according to the aligned entity pairs;
    a fused knowledge graph management module, configured to store and manage the knowledge graph and publish data services based on the knowledge graph.
  9. A non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the following steps:
    extracting entity features of entities in a knowledge graph, generating an entity embedding matrix from the entity features, and obtaining an entity representation of the knowledge graph from the entity embedding matrix;
    computing relation information between the entities and their neighboring entities from the entity representation, and enhancing the entity representation with the relation information to obtain a complete entity representation of the knowledge graph;
    obtaining a complete entity embedding matrix from the complete entity representation, and obtaining a final entity embedding matrix from the complete entity embedding matrix;
    computing a loss function from the final entity embedding matrix and a dataset;
    generating sample sets with a bidirectional global filtering strategy according to the loss function and the attribute information of the entities, and iteratively training a neural network model on the sample sets, so that the trained neural network model is capable of aligning and fusing multiple knowledge graphs, wherein the sample sets comprise an iterative positive sample set and an iterative negative sample set.
  10. An electronic device, comprising:
    a processor;
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to execute the instructions to implement the following steps:
    extracting entity features of entities in a knowledge graph, generating an entity embedding matrix from the entity features, and obtaining an entity representation of the knowledge graph from the entity embedding matrix;
    computing relation information between the entities and their neighboring entities from the entity representation, and enhancing the entity representation with the relation information to obtain a complete entity representation of the knowledge graph;
    obtaining a complete entity embedding matrix from the complete entity representation, and obtaining a final entity embedding matrix from the complete entity embedding matrix;
    computing a loss function from the final entity embedding matrix and a dataset;
    generating sample sets with a bidirectional global filtering strategy according to the loss function and the attribute information of the entities, and iteratively training a neural network model on the sample sets, so that the trained neural network model is capable of aligning and fusing multiple knowledge graphs, wherein the sample sets comprise an iterative positive sample set and an iterative negative sample set.
  11. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the following steps:
    extracting entity features of entities in a knowledge graph, generating an entity embedding matrix from the entity features, and obtaining an entity representation of the knowledge graph from the entity embedding matrix;
    computing relation information between the entities and their neighboring entities from the entity representation, and enhancing the entity representation with the relation information to obtain a complete entity representation of the knowledge graph;
    obtaining a complete entity embedding matrix from the complete entity representation, and obtaining a final entity embedding matrix from the complete entity embedding matrix;
    computing a loss function from the final entity embedding matrix and a dataset;
    generating sample sets with a bidirectional global filtering strategy according to the loss function and the attribute information of the entities, and iteratively training a neural network model on the sample sets, so that the trained neural network model is capable of aligning and fusing multiple knowledge graphs, wherein the sample sets comprise an iterative positive sample set and an iterative negative sample set.
PCT/CN2021/137139 2021-06-29 2021-12-10 Entity alignment method, apparatus and system for multi-source knowledge graph fusion WO2023273182A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110726190.5A CN113641826B (zh) 2021-06-29 2021-06-29 Entity alignment method, apparatus and system for multi-source knowledge graph fusion
CN202110726190.5 2021-06-29

Publications (1)

Publication Number Publication Date
WO2023273182A1 true WO2023273182A1 (zh) 2023-01-05

Family

ID=78416276

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/137139 WO2023273182A1 (zh) 2021-12-10 Entity alignment method, apparatus and system for multi-source knowledge graph fusion

Country Status (2)

Country Link
CN (1) CN113641826B (zh)
WO (1) WO2023273182A1 (zh)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115860152A (zh) * 2023-02-20 2023-03-28 南京星耀智能科技有限公司 一种面向人物军事知识发现的跨模态联合学习方法
CN116028853A (zh) * 2023-02-14 2023-04-28 华北电力大学 一种基于知识图谱的家庭电器识别方法与系统
CN116150405A (zh) * 2023-04-19 2023-05-23 中电科大数据研究院有限公司 一种多场景的异构数据处理方法
CN116227592A (zh) * 2023-05-06 2023-06-06 城云科技(中国)有限公司 一种多源知识图谱对齐模型及其构建方法、装置及应用
CN116561346A (zh) * 2023-07-06 2023-08-08 北京邮电大学 一种基于图卷积网络和信息融合的实体对齐方法及装置
CN116975256A (zh) * 2023-07-28 2023-10-31 三峡大学 抽水蓄能电站地下厂房施工过程多源信息的处理方法及系统
CN116992137A (zh) * 2023-07-31 2023-11-03 中国科学院地理科学与资源研究所 一种顾及空间异质性的可解释生态文明模式推荐方法
CN117149839A (zh) * 2023-09-14 2023-12-01 中国科学院软件研究所 一种面向开源软件供应链的跨生态软件检测方法及装置
CN117235281A (zh) * 2023-09-22 2023-12-15 武汉贝塔世纪科技有限公司 基于知识图谱技术的多元数据管理方法及系统
CN117390197A (zh) * 2023-10-23 2024-01-12 深圳市云鲸视觉科技有限公司 城市模型区域表示生成方法、装置、电子设备及介质
CN117407689A (zh) * 2023-12-14 2024-01-16 之江实验室 一种面向实体对齐的主动学习方法、装置和电子装置
CN117435935A (zh) * 2023-09-13 2024-01-23 广州大学 基于自监督图注意力网络的人员群体预测方法及装置
CN117556277A (zh) * 2024-01-12 2024-02-13 暨南大学 一种用于知识图谱实体对齐的初始对齐种子生成方法
CN117688247A (zh) * 2024-01-31 2024-03-12 云南大学 推荐方法、终端设备及存储介质
CN117743602A (zh) * 2024-02-06 2024-03-22 中国科学院国家空间科学中心 一种支持双侧悬空实体检测的实体对齐系统及方法
CN117788203A (zh) * 2024-02-28 2024-03-29 西安华联电力电缆有限公司 一种改进的交联聚乙烯绝缘电力电缆的高效生产制备方法
CN118095445A (zh) * 2024-04-24 2024-05-28 武汉纺织大学 一种基于知识图谱的少样本多跳推理优化方法
CN116204660B (zh) * 2023-03-28 2024-06-11 北京航空航天大学 一种多源异构数据驱动的领域知识图谱构建方法
CN118296976A (zh) * 2024-06-06 2024-07-05 浙江大学 微带滤波器设计迭代方法、系统、介质、产品及终端
CN118364906A (zh) * 2024-06-19 2024-07-19 安徽大学 应用可信度感知迭代训练策略实现实体对齐的方法、系统
CN118444620A (zh) * 2024-07-08 2024-08-06 青岛科技大学 一种面向终端设备的智能场景生成方法及智慧家庭系统
CN118520943A (zh) * 2023-12-08 2024-08-20 浙江大学 基于自交错注意力机制与扩散聚合的知识图谱嵌入方法
CN118626573A (zh) * 2024-08-12 2024-09-10 国网信通亿力科技有限责任公司 一种电网量测数据质量智能监测方法及系统

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113641826B (zh) * 2021-06-29 2024-03-26 北京邮电大学 面向多源知识图谱融合的实体对齐方法、装置与系统
CN114299128A (zh) * 2021-12-30 2022-04-08 咪咕视讯科技有限公司 多视角定位检测方法及装置
CN114357193B (zh) * 2022-01-10 2024-04-02 中国科学技术大学 一种知识图谱实体对齐方法、系统、设备与存储介质
CN114942998B (zh) * 2022-04-25 2024-02-13 西北工业大学 融合多源数据的知识图谱邻域结构稀疏的实体对齐方法
CN115329158B (zh) * 2022-10-17 2023-03-24 湖南能源大数据中心有限责任公司 一种基于多源异构电力数据的数据关联方法
CN115659985B (zh) * 2022-12-09 2023-03-31 南方电网数字电网研究院有限公司 电力知识图谱实体对齐方法、装置和计算机设备
CN116434976A (zh) * 2022-12-29 2023-07-14 之江实验室 一种融合多源知识图谱的药物重定位方法和系统
CN116432750B (zh) * 2023-04-13 2023-10-27 华中师范大学 一种基于盒嵌入的少样本知识图谱补全方法
CN116610820B (zh) * 2023-07-21 2023-10-20 智慧眼科技股份有限公司 一种知识图谱实体对齐方法、装置、设备及存储介质
CN118364428B (zh) * 2024-06-18 2024-08-20 安徽思高智能科技有限公司 面向rpa的多模态实体对齐自动融合方法、设备及介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190122111A1 (en) * 2017-10-24 2019-04-25 Nec Laboratories America, Inc. Adaptive Convolutional Neural Knowledge Graph Learning System Leveraging Entity Descriptions
CN110188206A (zh) * 2019-05-08 2019-08-30 北京邮电大学 基于翻译模型的协同迭代联合实体对齐方法及装置
CN111753024A (zh) * 2020-06-24 2020-10-09 河北工程大学 一种面向公共安全领域的多源异构数据实体对齐方法
CN111931505A (zh) * 2020-05-22 2020-11-13 北京理工大学 一种基于子图嵌入的跨语言实体对齐方法
CN112131395A (zh) * 2020-08-26 2020-12-25 浙江工业大学 一种基于动态阈值的迭代式知识图谱实体对齐方法
CN113641826A (zh) * 2021-06-29 2021-11-12 北京邮电大学 面向多源知识图谱融合的实体对齐方法、装置与系统

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472065B (zh) * 2019-07-25 2022-03-25 电子科技大学 基于gcn孪生网络的跨语言知识图谱实体对齐方法
CN112784065B (zh) * 2021-02-01 2023-07-14 东北大学 基于多阶邻域注意力网络的无监督知识图谱融合方法及装置


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG YOUMIN; LIU LI; FU SHUN; ZHONG FUJIN: "Entity Alignment Across Knowledge Graphs Based on Representative Relations Selection", 2018 5TH INTERNATIONAL CONFERENCE ON SYSTEMS AND INFORMATICS (ICSAI), 10 November 2018 (2018-11-10), pages 1056 - 1061, XP033489856, DOI: 10.1109/ICSAI.2018.8599288 *


Also Published As

Publication number Publication date
CN113641826B (zh) 2024-03-26
CN113641826A (zh) 2021-11-12


Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21948107

Country of ref document: EP

Kind code of ref document: A1