CN115858812A - Embedded alignment method constructed by computer - Google Patents

Embedded alignment method constructed by computer

Info

Publication number
CN115858812A
Authority
CN
China
Prior art keywords
entity
embedding
alignment
entities
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211615736.0A
Other languages
Chinese (zh)
Inventor
管仁初
张春雷
崔璐
丰小月
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin University
Original Assignee
Jilin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin University filed Critical Jilin University
Publication of CN115858812A publication Critical patent/CN115858812A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Optimization (AREA)
  • Artificial Intelligence (AREA)
  • Computational Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Algebra (AREA)
  • General Health & Medical Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides an embedded alignment method constructed by a computer. With this method, the computer embeds the entity names and attribute information of a knowledge graph using a large-scale pre-trained language model, so that the semantic information on the knowledge graph can be used comprehensively. Experimental results on different data sets show that the method is robust, and an ablation experiment is set up to verify the effectiveness of the invention.

Description

Embedded alignment method constructed by computer
Technical Field
The invention relates to the technical field of knowledge graphs, and in particular to an entity alignment technique that improves the use of semantic information and structural information on knowledge graphs.
Background
In recent years, with the development of knowledge graph technology, the scale of knowledge graphs has kept expanding, and new knowledge graphs are built in various fields according to their own needs. Knowledge graphs are now widely used in tasks such as retrieval, question answering and reasoning, and support applications in many industries. However, a single knowledge graph cannot meet practical application requirements, and problems such as information redundancy and heterogeneity exist among knowledge graphs, so knowledge fusion has become a topic of wide interest. Entity alignment is an important component of knowledge graph fusion; it aims to find, across different knowledge graphs, the nodes that point to the same entity in the real world.
Entity alignment for fusing different knowledge graphs is of great significance, but it also faces many challenges, chiefly computational complexity, data quality and the acquisition of prior knowledge.
The challenge of computational complexity comes from the vigorous development of knowledge bases in the big-data era, which makes the data scale enormous. Aligning the entities of two such knowledge graphs consumes large computing resources, so entity alignment methods must be designed to reduce computational complexity while preserving accuracy, so that they remain applicable to ever larger knowledge graphs.
The challenge of data quality comes from the fact that knowledge graphs are constructed without a uniform standard. For example, two knowledge graphs may focus on different domains; even when the domains are the same, the description languages may differ because the data sources differ, and the kinds of relations may differ as well. These problems, caused by different construction modes and purposes, are directly reflected in data quality: the same entity drawn from two knowledge graphs may carry different names; the same name may refer to different real-world entities, such as the company "Apple" and the fruit "apple"; the same entity may have different neighborhood structures and attributes in different knowledge graphs; entity names have no standardized format, so abbreviation, spacing and capitalization lead to inconsistencies such as "DeoxyriboNucleic Acid" versus "DNA"; and entities may differ in granularity, for example "lung cancer" may be classed as "disease" in one knowledge graph schema and as "cancer" in another.
The challenge of prior-knowledge acquisition comes from the alignment seed data set, i.e. the training set in supervised learning, which is difficult to obtain. In practice, expert manual labeling or crowdsourcing is often relied upon to enlarge the alignment seed set while keeping noisy data to a minimum. This challenge requires entity alignment research to achieve accuracy as high as possible with smaller training data.
Disclosure of Invention
The present invention is directed to a computer-implemented embedded alignment method that solves or partially solves the above-mentioned problems.
In order to achieve the effect of the technical scheme, the technical scheme of the invention is as follows: step one, a transformer-based bidirectional encoder representation unit is provided, which embeds the words of entity names into vectors with a mean of 0 and an identity covariance matrix. Word embedding uses a language model and representation learning from natural language processing to map the high-dimensional space of all words into a low-dimensional continuous vector space, so that each word or phrase is mapped to a vector over the real numbers. The large-scale pre-trained language model in the transformer-based bidirectional encoder representation unit then yields the word-embedding set of the N entity names of the two knowledge graphs, {x_j}, j = 1, ..., N, where N is a natural number, j is a natural number from 1 to N, and x_j denotes a single word embedding. The mean μ and the covariance matrix Σ of the word-embedding set are then computed, singular value decomposition is performed, and the matrix is truncated, i.e. a matrix slice is computed according to a value k preset as required, k being a natural number. The mean μ and the covariance matrix Σ are computed as in formula one:
μ = (1/N)·∑_{j=1}^{N} x_j,  Σ = (1/N)·∑_{j=1}^{N} (x_j − μ)(x_j − μ)^T   (formula one)
where T denotes the transpose. To reduce the influence of entity-name bias, drawing on the latent information learned by the large-scale pre-trained language model, different targeted weight files are adopted for different data sets;
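For illustration only, a minimal sketch of the whitening and truncation of step one follows, assuming NumPy and already-computed name embeddings; the function and variable names are illustrative and not taken from the patent.

    import numpy as np

    def whiten(X: np.ndarray, k: int) -> np.ndarray:
        """X: (N, d) word embeddings of N entity names; returns (N, k) whitened embeddings."""
        mu = X.mean(axis=0, keepdims=True)               # mean of formula one
        cov = (X - mu).T @ (X - mu) / X.shape[0]         # covariance matrix of formula one
        U, s, _ = np.linalg.svd(cov)                     # singular value decomposition
        W = U @ np.diag(1.0 / np.sqrt(s + 1e-9))         # whitening transform
        return ((X - mu) @ W)[:, :k]                     # matrix truncation: keep the first k slices

    X = np.random.randn(1000, 768)                       # stand-in for N = 1000 name embeddings
    Z = whiten(X, k=256)                                 # Z now has zero mean and near-identity covariance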
step two, when problems of polysemy (one word, several meanings) or of identical names arise, it is judged whether the word-embedding sets obtained through the transformer-based bidirectional encoder representation unit model are also identical; if, in such cases, no context information can be supplied to the transformer-based bidirectional encoder representation unit, step three begins, i.e. the structural information of the knowledge graph is used to further improve the effect of entity alignment;
step three, neighbors with a certain similarity among the aligned entities of the two knowledge graphs are extracted; on the basis of the structure embedding model in the representation unit of the transformer-based bidirectional encoder, a graph attention network is used to fuse relation information into that structure embedding model; it is judged whether semantic association information may occur among the direct neighbors or the remote neighbors of the aligned entities, and the semantic association information of the direct and remote neighbors is then aggregated; the graph attention network is a spatial graph convolution network, and its attention mechanism is used to determine the weights of a node's neighborhood when feature information is aggregated;
step four, in order to extend the effective neighborhood over more hops, a gating mechanism network is adopted in the representation unit of the transformer-based bidirectional encoder to combine first-order and second-order neighbors, thereby extending the neighborhood from second-order to multi-order neighbor entities; this skip connection over effective neighbors also accelerates the structure embedding model in the representation unit of the transformer-based bidirectional encoder, improving its training effect after the graph attention network is applied;
the function of the gating mechanism network is as follows: when the first-order neighbors of an aligned entity are not identical across different knowledge graphs, the structure embedding model in the representation unit of the transformer-based bidirectional encoder aggregates second-order neighbors in order to reduce the noise caused by the difference in first-order neighbors;
when different specific entities are, respectively, a central entity and a first-order neighbor entity of that central entity in the knowledge graph, a first attention weight is set to represent and compute the different associations between the central entity and its first-order neighbor entities; through a non-linear transformation, the first result output by the structure embedding model in the representation unit of the transformer-based bidirectional encoder can approximate a non-linear function and thus handle more complex tasks; so that the first attention weights can be compared between different entities, they are normalized before comparison;
when different specific entities are, respectively, a central entity and a second-order neighbor entity of that central entity in the knowledge graph, a second attention weight is set to represent and compute the different associations between the central entity and its second-order neighbor entities; through a non-linear transformation, the second result output by the structure embedding model in the representation unit of the transformer-based bidirectional encoder can approximate a non-linear function and thus handle more complex tasks; so that the second attention weights can be compared between different entities, they are normalized before comparison;
two matrices are used, one to transform the central entity and one to transform the neighbor entities; both matrices are initialized at the start, and the elements of both matrices change with the training process so that the input and output of the structure embedding model in the representation unit of the transformer-based bidirectional encoder meet the system requirements.
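A minimal sketch of this attention-based neighbor aggregation follows. The two transform matrices and the softmax normalization follow the text above; the particular scoring form (concatenation followed by a shared attention vector, as in a standard graph attention network) and all dimensions are assumptions for illustration.

    import torch
    import torch.nn.functional as F

    d = 64
    W_center = torch.nn.Parameter(torch.empty(d, d).uniform_(-0.1, 0.1))    # transform for the central entity
    W_neigh = torch.nn.Parameter(torch.empty(d, d).uniform_(-0.1, 0.1))     # transform for the neighbor entities
    a = torch.nn.Parameter(torch.empty(2 * d).uniform_(-0.1, 0.1))          # attention vector (assumed form)

    def aggregate(center: torch.Tensor, neighbors: torch.Tensor) -> torch.Tensor:
        """center: (d,), neighbors: (n, d) -> aggregated (d,) neighborhood representation."""
        c = center @ W_center                                   # transformed central entity
        nb = neighbors @ W_neigh                                # transformed neighbor entities
        scores = torch.tanh(torch.cat([c.expand_as(nb), nb], dim=-1)) @ a   # attention logits
        alpha = F.softmax(scores, dim=0)                        # normalize so weights are comparable
        return torch.relu((alpha.unsqueeze(-1) * nb).sum(dim=0))            # non-linear output

    h_first_order = aggregate(torch.randn(d), torch.randn(5, d))            # first-order aggregation
    h_second_order = aggregate(torch.randn(d), torch.randn(9, d))           # same form for second-order neighbors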
Further, the embedding h_u of an entity u that combines the information of its first-order and second-order neighbors is computed as in formula two:
h_u = g(h_{i,2}) · h_{i,1} + (1 − g(h_{i,2})) · h_{i,2}   (formula two)
where h_{i,1} and h_{i,2} are the representations obtained from the first-order neighbors and the second-order neighbors respectively, and the gate function is g(h_{i,2}) = σ(M·h_{i,2} + b); σ is an activation function that introduces a non-linear factor into the graph attention network, so that the network can approximate arbitrary non-linear functions; M and b are a weight matrix and a bias vector respectively; the weight matrix M is initialized using a uniform distribution, the bias vector b is initialized with all zeros, and the elements of both change with the training process so that the input and output of the structure embedding model in the representation unit of the transformer-based bidirectional encoder meet the requirements;
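A minimal sketch of the gate of formula two follows, assuming PyTorch; M and b follow the text (uniform and all-zero initialization), and the dimension is illustrative.

    import torch

    d = 64
    M = torch.empty(d, d).uniform_(-0.1, 0.1)      # weight matrix, uniform initialization
    b = torch.zeros(d)                             # bias vector, all-zero initialization

    def gated_combine(h1: torch.Tensor, h2: torch.Tensor) -> torch.Tensor:
        """h1: representation from first-order neighbors, h2: from second-order neighbors."""
        g = torch.sigmoid(h2 @ M + b)              # g(h_{i,2}) = sigma(M h_{i,2} + b)
        return g * h1 + (1.0 - g) * h2             # formula two

    h_u = gated_combine(torch.randn(d), torch.randn(d))   # entity embedding combining both orders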
step five, a first feature-wise linear modulation (FiLM) layer is adopted to combine word embedding and structure embedding; the first feature-wise linear modulation method is computed as in formula three:
h_{u,e} = FiLM(h_{u,s}) = h_{u,w}·W_1 ⊙ h_{u,s} + h_{u,w}·W_2   (formula three)
where h_{u,s} denotes the structure embedding of entity u, h_{u,w} denotes the word embedding of entity u, ⊙ denotes the Hadamard (element-wise) product of matrices, and W_1 and W_2 are two matrices initialized with a specific uniform distribution; this specific uniform distribution keeps the activation values and gradients of every layer of the graph attention network consistent during propagation; FiLM(h_{u,s}) denotes the feature-wise linear modulation method in the graph attention network; its final output h_{u,e} is the final entity embedding, and the aligned entity pairs are obtained by computing cosine similarity;
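A minimal sketch of the fusion of formula three follows; only the formula comes from the text, while the dimensions and the uniform initialization range are assumptions.

    import torch

    d_w, d_s = 256, 64
    W1 = torch.empty(d_w, d_s).uniform_(-0.1, 0.1)    # scale branch of the FiLM layer
    W2 = torch.empty(d_w, d_s).uniform_(-0.1, 0.1)    # shift branch of the FiLM layer

    def film(h_w: torch.Tensor, h_s: torch.Tensor) -> torch.Tensor:
        """h_w: word embedding of entity u, h_s: structure embedding of entity u."""
        return (h_w @ W1) * h_s + h_w @ W2            # Hadamard product, then additive shift

    h_e = film(torch.randn(d_w), torch.randn(d_s))    # final entity embedding h_{u,e}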
step six, in addition, for a given alignment-seed entity pair, the two entities of the pair come from different knowledge graphs and the neighborhood structures of those knowledge graphs differ. In order to use the information in the alignment-seed entity pairs as efficiently as possible, the neighborhood structures of the entities in each aligned pair are expanded before the alignment task starts: the triples of the two knowledge graphs that contain the aligned seed entities are supplemented with each other. This strengthens the ability of the graph attention network to recognize isomorphic subgraphs; through the more similar neighborhood structures, the entity pairs in the alignment seed set more easily obtain the same embedded representation, so that after processing and propagation by the graph attention network the favorable influence spreads into the embedding of the whole knowledge graph and the result is more accurate;
the precondition of step six is set as follows: an alignment loss function and a relation loss function are defined; through them, the difference between the forward-computation result of each iteration of the graph attention network and the true value is calculated so as to guide the next round of training in the correct direction. A sample is substituted into the large-scale pre-trained language model in the representation unit of the transformer-based bidirectional encoder to compute an output prediction; the alignment loss function and the relation loss function compute the error between the prediction and the true value; according to the derivatives of the two loss functions, the error is propagated back along the direction of steepest gradient descent, correcting the trainable matrices and vectors in the forward-computation formulas; iteration stops when the values of the alignment loss function and the relation loss function reach a satisfactory level, and the training of the large-scale pre-trained language model in the representation unit of the transformer-based bidirectional encoder is finished;
step seven, the alignment loss function L_a is designed by reducing the distance between positive-example entities and enlarging the distance between negative-example entities:
L_a = ∑_{(m,n)∈G+} ||h_m − h_n|| + μ ∑_{(m',n')∈G−} max(λ − ||h_{m'} − h_{n'}||, 0)
where G+ and G− denote the sets of positive-example entity pairs and negative-example entity pairs respectively. A positive-example entity pair is a pair of aligned entities in the two knowledge graphs; (m, n) denotes any positive pair, where m and n are entities from different knowledge graphs that represent the same thing in the real world. A negative-example entity pair is a pair of unaligned entities in the two knowledge graphs; (m', n') denotes any negative pair, where m' and n' come from different knowledge graphs and represent different things in the real world. Negative examples are generated by randomly replacing one entity of an aligned pair in the training set. μ and λ are hyper-parameters greater than 0, i.e. parameters set before the training of the large-scale pre-trained language model in the representation unit of the transformer-based bidirectional encoder; here μ = 0.1 and λ = 1.5, and λ pushes the distance of negative examples above a suitable value so that possibly aligned entities can be found more easily. ||·|| denotes the 2-norm of a vector, ||h_{m'} − h_{n'}|| is the 2-norm of the difference between the embeddings of the entities m' and n' from the two different knowledge graphs, and max(λ − ||h_{m'} − h_{n'}||, 0) takes the larger of λ − ||h_{m'} − h_{n'}|| and 0, where the value 0 corresponds to the case in which no negative-example entity exists and negative values are avoided during computation;
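For illustration, a minimal sketch of this margin-style alignment loss follows, assuming PyTorch and batched positive and negative pairs; the placement of μ as the weight of the negative term is an assumption consistent with the description above.

    import torch

    def alignment_loss(h_m, h_n, h_m_neg, h_n_neg, mu=0.1, lam=1.5):
        """h_m, h_n: (P, d) embeddings of positive pairs; h_m_neg, h_n_neg: (Q, d) negatives."""
        pos = torch.norm(h_m - h_n, dim=1).sum()                                       # pull aligned pairs together
        neg = torch.clamp(lam - torch.norm(h_m_neg - h_n_neg, dim=1), min=0.0).sum()   # push negatives beyond lambda
        return pos + mu * neg                                                          # negative-term weight is assumed

    loss_a = alignment_loss(torch.randn(8, 64), torch.randn(8, 64),
                            torch.randn(32, 64), torch.randn(32, 64))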
step eight, in order to introduce relation information into the graph attention network, a relation loss function is set:
L_r = ∑_{r∈R} (1/|T_r|) ∑_{(h,r,t)∈T_r} ||h_h + h_r − h_t||
where R denotes the set of all relations, T_r denotes the set of triples (h, r, t) formed by a relation r, |T_r| denotes the number of elements in the set T_r, h_h denotes the entity embedding of the head entity h of a triple and h_t denotes the entity embedding of the tail entity t. The relation embedding h_r is computed from the entity embeddings h_h and h_t:
h_r = (1/|T_r|) ∑_{(h,r,t)∈T_r} (h_t − h_h)
the relation loss function reduces the growth in parameters that introducing dedicated relation embeddings would cause, and lessens the undesirable alignment effect that an unsuitable relation-alignment method produces when the knowledge graphs used as the entity-alignment data set provide no relation-alignment information;
during training, the two functions, the alignment loss function and the relation loss function, need to be optimized simultaneously; they are combined through a hyper-parameter α into the final loss function: L = L_a + α·L_r.
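A minimal sketch of the relation loss and the combined objective follows, under one plausible, TransE-style reading of the description above (h_r approximated from the head and tail embeddings of the triples that use relation r); since the original formula figures are not recoverable, the exact form here is an assumption.

    import torch

    def relation_loss(triples_by_rel, ent_emb):
        """triples_by_rel: {r: [(head_idx, tail_idx), ...]}, ent_emb: (num_entities, d)."""
        loss = 0.0
        for r, pairs in triples_by_rel.items():
            h = ent_emb[[p[0] for p in pairs]]         # h_h for every triple with relation r
            t = ent_emb[[p[1] for p in pairs]]         # h_t for every triple with relation r
            h_r = (t - h).mean(dim=0)                  # relation embedding from entity embeddings
            loss = loss + torch.norm(h + h_r - t, dim=1).mean()
        return loss

    ent_emb = torch.randn(10, 64)
    loss_r = relation_loss({0: [(0, 1), (2, 3)], 1: [(4, 5)]}, ent_emb)
    alpha = 0.5                                        # hyper-parameter alpha, value assumed
    # final objective L = L_a + alpha * L_r (loss_a as in the earlier sketch)
    # loss = loss_a + alpha * loss_r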
The final objective of the entity-alignment task is to find a set of aligned entity pairs. Aligned entities are obtained by computing the similarity of the entity vectors; since the modulus of every entity vector is constrained to 1 during training, only an inner product is needed when computing cosine similarity, and a single matrix multiplication over the matrix formed by the entity vectors yields the cosine similarity of all candidate entity pairs. The specific computation is:
sim(h_1, h_2) = (h_1 · h_2) / (||h_1||·||h_2||) = h_1 · h_2
where sim(h_1, h_2) denotes the cosine similarity, · is the inner product of the entity vectors, h_1 and h_2 are entity embeddings from different knowledge graphs, and ||h_1||·||h_2|| denotes the product of the two moduli, which equals 1. A cosine similarity matrix is then constructed.
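A minimal sketch of this final step follows, assuming PyTorch; with unit-length entity vectors the full cosine-similarity matrix is a single matrix multiplication, and the greedy argmax retrieval at the end is only one possible way to read aligned pairs out of the matrix.

    import torch
    import torch.nn.functional as F

    def similarity_matrix(H1: torch.Tensor, H2: torch.Tensor) -> torch.Tensor:
        """H1: (n1, d), H2: (n2, d) entity embeddings of the two knowledge graphs."""
        H1 = F.normalize(H1, dim=1)        # enforce ||h|| = 1 as during training
        H2 = F.normalize(H2, dim=1)
        return H1 @ H2.T                   # (n1, n2) cosine similarities

    S = similarity_matrix(torch.randn(100, 64), torch.randn(120, 64))
    best_match = S.argmax(dim=1)           # most similar entity in the second graph for each entity in the first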
Detailed description of the invention
In order to make the technical problems, technical solutions and advantageous effects to be solved by the present invention more apparent, the present invention is described in detail below with reference to the embodiments. It should be noted that the specific embodiments described herein are only for illustrating the present invention and are not to be construed as limiting the present invention, and products that can achieve the same functions are included in the scope of the present invention. The specific method comprises the following steps:
example (b): this embodiment specifically describes the contents of an embedded alignment method constructed by a computer: in order to obtain the structural information and semantic information on the knowledge graph greatly, the invention works to obtain word embedding of entities, attributes or entity categories through a pre-trained language model, obtain the structural embedding of the knowledge graph through the combination of an attention power mechanism and an attention power network, and finally fuse the two into an iterative alignment model.
The invention adopts a transformer-based bidirectional encoder representation unit as the word-embedding method for entity names. However, the word vectors of such a unit are distributed in a cone in space: the representations of high-frequency words lie close to the origin while those of low-frequency words lie far from it. As a result, even when a high-frequency word and a low-frequency word have the same meaning, their different word frequencies give them different representations, and the distance between their word vectors does not express the semantic relatedness between the words well. The invention therefore transforms the representations so that they all share the same mean and variance, after which the similarity between vectors can be used to represent semantic similarity. For the general-domain and the biomedical knowledge graphs, the invention uses transformer-based bidirectional encoder representation unit models separately, so as to maximize the use of the rich semantic information in the pre-trained models. For a knowledge graph that provides attributes and entity classes, word embeddings are obtained in the same way as entity-name embeddings.
The invention designs a knowledge graph alignment method that combines word embedding and structure embedding: it expresses the semantic information in the knowledge graph with a transformer-based bidirectional encoder representation unit of a large-scale pre-trained language model to assist the entity-alignment task. The first-order and second-order neighbors of each node are then aggregated with the graph attention network, which gives the model the ability to capture second-order neighbors, lets it better exploit the information of remote entities, and is more favorable to the alignment of the central entity. By combining word embedding and structure embedding, the accuracy of entity alignment is improved.
Data on a knowledge graph often carries semantic information and appears in the form of words. From these words it can usually be inferred, by empirical judgment, whether certain entity pairs are aligned, for example "Acquired Immune Deficiency Syndrome" and "AIDS". Although alignment cannot be performed with high accuracy from words alone, using such information reasonably helps improve the effect of entity alignment. For this reason, the invention introduces word embedding of knowledge graph entity names.
The present invention uses a large-scale pre-trained language model, the transformer-based bidirectional encoder representation unit, for word embedding. However, long study has shown that such a model without fine-tuning performs poorly in tasks that compute text similarity, so fine-tuning is also performed when the transformer-based bidirectional encoder representation unit model is set up. The word embeddings of the transformer-based bidirectional encoder representation unit are distributed in a cone in space, with high-frequency words and low-frequency words located in different regions: high-frequency words lie close to the origin (the mean of all word embeddings) and low-frequency words lie far from it, so the similarity computed between a high-frequency word and a low-frequency word cannot represent their semantic similarity. Moreover, low-frequency words are insufficiently trained and sparsely distributed, their region contains places where semantics are incomplete, and the similarity computed there is also problematic.
Embedding-based entity alignment is a strong paradigm and has the ability to identify isomorphic subgraphs. However, in different knowledge graphs the corresponding entities often have non-isomorphic neighborhood structures, which easily leads to different representations. To solve this problem, the invention uses a graph attention network with a gating mechanism to capture more diverse neighborhood structure information and highlight useful neighbor nodes; direct and long-range neighborhood information is then aggregated by the gating mechanism. To make effective use of word embedding and structure embedding, the invention combines the two into entity embeddings with a feature-wise linear modulation method. The model also sets a relation loss function to improve the entity representations.
To put the model into practice, the invention carries out the fusion of three released biomedical knowledge graphs and publishes the fused knowledge graph through a knowledge graph application platform. The platform is a Web application developed with an open-source web application framework; it can be accessed directly through a browser and provides functions such as graph retrieval, intelligent question answering and entity recommendation.
The transformer-based bidirectional encoder representation unit is also a pre-trained language model. Its architecture is a stack of multi-layer Transformer encoders; each encoder has two sub-layers, one a multi-head attention layer that uses several self-attention mechanisms to learn the internal relations between the words of a sentence, and the other a feed-forward network layer comprising two linear transformations and an activation function. Each sub-layer has a residual connection module.
The core of the transducer-based bi-directional encoder representation unit is a multi-headed attention mechanism. Attention, as the name implies, is a focus on dealing with a problem. In practice, the attention mechanism has the characteristics of few parameters, parallelism and remarkable effect. The multi-head attention mechanism is characterized in that input matrixes Q, K and V formed by word vectors are linearly transformed through a plurality of groups of different matrixes, and different attention results are finally spliced together.
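A minimal sketch of the multi-head attention just described follows (scaled dot-product attention per head, results spliced together); the sizes and the per-head projection layout are assumptions for illustration.

    import torch
    import torch.nn.functional as F

    def multi_head_attention(X, Wq, Wk, Wv):
        """X: (seq, d); Wq/Wk/Wv: lists of per-head projection matrices of shape (d, d_head)."""
        heads = []
        for q_proj, k_proj, v_proj in zip(Wq, Wk, Wv):
            Q, K, V = X @ q_proj, X @ k_proj, X @ v_proj
            attn = F.softmax(Q @ K.T / K.shape[-1] ** 0.5, dim=-1)   # scaled dot-product attention
            heads.append(attn @ V)
        return torch.cat(heads, dim=-1)                              # splice the head outputs together

    d, d_head, n_heads = 64, 16, 4
    proj = lambda: [torch.randn(d, d_head) for _ in range(n_heads)]
    out = multi_head_attention(torch.randn(10, d), proj(), proj(), proj())   # output shape (10, 64)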
The input of the transformer-based bidirectional encoder representation unit model consists of three parts. The first is token embedding: the input sentence or sentence pair is split into tokens by the WordPiece algorithm, a "CLS" token is added at the beginning of the sentence for classification tasks and a "SEP" token is added between two sentences for sentence-pair tasks, and the split tokens and the added tokens are embedded together. The second is segment embedding, which distinguishes the two sentences in sentence-pair tasks. The third is position embedding, which in this model is generated by a set rule.
The transformer-based bidirectional encoder representation unit model is pre-trained with two unsupervised prediction tasks, the masked language model and next-sentence prediction. The masked language model is a task over single sentences: parts of the input words of a sentence are randomly masked, and those masked words are then predicted. During training, positions are randomly selected for prediction in each training sequence with a probability of 15%; a selected token is replaced by the "MASK" token with 80% probability, by a random other token with 10% probability, and kept unchanged with 10% probability. With this training method the model learns to predict occluded words from their context, so that the word vectors reflect the correlations between words and have different representations in different contexts.
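A minimal sketch of this masking rule follows; the token ids and vocabulary size are placeholders, not values from the patent.

    import random

    MASK_ID, VOCAB_SIZE = 103, 30522          # placeholder mask id and vocabulary size

    def mask_tokens(token_ids):
        inputs, labels = list(token_ids), [-100] * len(token_ids)   # -100: position not predicted
        for i, tok in enumerate(token_ids):
            if random.random() < 0.15:                              # select 15% of positions
                labels[i] = tok                                     # the model must recover this token
                r = random.random()
                if r < 0.8:
                    inputs[i] = MASK_ID                             # 80%: replace with "MASK"
                elif r < 0.9:
                    inputs[i] = random.randrange(VOCAB_SIZE)        # 10%: random other token
                # remaining 10%: keep the original token
        return inputs, labels

    masked, targets = mask_tokens([101, 7592, 2088, 102])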
Many natural language processing tasks, such as automatic question answering, require an understanding of the relationship between two sentences, and the next-sentence prediction task is designed as a sentence-level task for this purpose. The specific method is as follows: each training sample consists of a sentence A and a sentence B; in 50% of the samples sentence B is the actual next sentence of sentence A, and in the remaining 50% sentence B is a random sentence; the samples are then input into the model for binary classification.
By pre-training the transformer-based bidirectional encoder representation unit on large-scale corpora with these two tasks, a pre-trained language model with strong generalization ability is obtained. When it is used on different natural language processing tasks, good results can be achieved with simple fine-tuning.
The study of words is an emphasis in the field of natural language processing. Because the granularity of words is small, words form sentences, and sentences form paragraphs, chapters and documents, much research in natural language processing is built on words. When studying words, the problem of representing them must be solved first. Taking part-of-speech determination as an example, following the steps of traditional machine learning, a common solution is to represent a sample as (x, y), where x is a word and y is the part of speech of that word, and to construct a mapping y = f(x). However, the input of the mathematical model f used in this process (such as a neural network or a support vector machine) is numerical, whereas words are abstract symbols used by humans to express and transmit information; they cannot be fed into the model directly, so a suitable method is needed to convert words into numerical form, that is, to embed the words into a vector space. This method of embedding words into a dense vector space is called word embedding. The invention represents words as dense vectors in a low-dimensional space, which is convenient for various downstream tasks.
The word embedding model in the invention has two forms: one infers the middle word from its context, and the other infers the context words from the middle word. To reduce the amount of computation, the core idea is to fit the information in the co-occurrence matrix through an objective function so that the words are expressed as word vectors that contain the statistical information of the co-occurrence matrix. These vectors have certain semantic properties and can be used for natural language processing tasks, for example inferring the semantic similarity between words from the Euclidean distance or cosine similarity of two word vectors. Each word corresponds to one vector; such word vectors are static and cannot handle polysemous words. For example, the word "Apple" may refer to the fruit in one context and to the company in another.
A neural network model here uses a neural network to extract the features of entities and relations and then performs certain operations on those features to judge the credibility of fact triples. With the development of graph neural networks, the neural network models of traditional Euclidean space have been migrated to the modeling of graph data, learning and extracting the features of graph data automatically in an end-to-end manner. The graph convolution network is a popular research direction among graph neural networks; graph convolution models mainly comprise spectral methods based on the convolution theorem and spatial methods based on neighbor aggregation, and any graph convolution network can be written as a non-linear function. A knowledge graph is a more complex graph structure, usually with heterogeneous entity types and relation types, and its relations are directed. The graph convolution network was proposed on the basis of homogeneous graphs with only one node type and one relation type, so the traditional graph convolution network obviously cannot make full use of the information on a knowledge graph to represent its entities and relations.
In the invention, the most important part of the data can be highlighted by using the attention mechanism, and when the attention mechanism is applied to the graph attention network, the attention mechanism is used for calculating the weight of each neighbor node in the aggregation process, so that the information of the important node is highlighted. The graph attention network is a space-based graph convolution network, and the attention mechanism of the invention is used for determining the weight of a node neighborhood when feature information is aggregated. The graph attention network has the following advantages. 1. The calculation is efficient, and the calculation can be in parallel. 2. Compared with a graph convolution network, different weights are distributed to nodes in the same neighborhood, and the model scale can be expanded. 3. And sharing the model weight. 4. The algorithm can process the entire neighborhood without fixing the sample size. 5. The attention weight is calculated by using the node characteristics, but not the structural characteristics of the nodes, so that the attention weight can be calculated without knowing the structure of the graph.
Different knowledge graphs differ in the way knowledge is collected, in emphasis and in sources, so that real-world entities have different names, neighbor structures or attributes in different knowledge graphs. The purpose of entity alignment is to fuse these heterogeneous knowledge graphs. A transformer-based bidirectional encoder representation unit is provided, which embeds the words of entity names into vectors with a mean of 0 and an identity covariance matrix. Word embedding uses a language model and representation learning from natural language processing to map the high-dimensional space of all words into a low-dimensional continuous vector space, so that each word or phrase is mapped to a vector over the real numbers. The large-scale pre-trained language model in the transformer-based bidirectional encoder representation unit then yields the word-embedding set of the N entity names of the two knowledge graphs, {x_i}, i = 1, ..., N, where N is a natural number, i is a natural number from 1 to N, and x_i denotes a single word embedding. The mean μ and the covariance matrix Σ of the word-embedding set are then computed, singular value decomposition is performed, and the matrix is truncated according to a value k preset as required, k being a natural number. The mean μ and the covariance matrix Σ are computed as in formula one:
μ = (1/N)·∑_{i=1}^{N} x_i,  Σ = (1/N)·∑_{i=1}^{N} (x_i − μ)(x_i − μ)^T   (formula one)
where T denotes the transpose. To reduce the influence of entity-name bias, drawing on the latent information learned by the large-scale pre-trained language model, different targeted weight files are adopted for different data sets;
when problems of polysemy or identical names arise, it is judged whether the word-embedding sets obtained through the transformer-based bidirectional encoder representation unit model are also identical; if no context information can be supplied to the transformer-based bidirectional encoder representation unit in such cases, the structural information of the knowledge graph begins to be used to further improve the effect of entity alignment;
neighbors with a certain similarity among the aligned entities of the two knowledge graphs are extracted, a graph attention network is used to fuse relation information into the structure embedding model in the representation unit of the transformer-based bidirectional encoder, it is judged whether semantic association may occur among the direct or remote neighbors of the aligned entities, and the semantic association information of the direct and remote neighbors is then aggregated; the graph attention network is a spatial graph convolution network, and its attention mechanism is used to determine the weights of a node's neighborhood when feature information is aggregated;
in order to extend the effective neighborhood over more hops, a gating mechanism network is adopted in the representation unit of the transformer-based bidirectional encoder to combine first-order and second-order neighbors, thereby extending to higher-order neighbor entities; the skip connection accelerates the training of the structure embedding model in the representation unit of the transformer-based bidirectional encoder after the graph attention network is applied;
the function of the gating mechanism network is as follows: when the first-order neighbors of an aligned entity are not identical across different knowledge graphs, the structure embedding model in the representation unit of the transformer-based bidirectional encoder aggregates second-order neighbors in order to reduce the noise caused by the difference in first-order neighbors;
when different specific entities are, respectively, a central entity and a first-order neighbor entity of that central entity in the knowledge graph, a first attention weight is set to represent and compute the different associations between the central entity and its first-order neighbor entities; through a non-linear transformation, the first result output by the structure embedding model in the representation unit of the transformer-based bidirectional encoder can approximate a non-linear function and thus handle more complex tasks; so that the first attention weights can be compared between different entities, they are normalized before comparison;
when different specific entities are, respectively, a central entity and a second-order neighbor entity of that central entity in the knowledge graph, a second attention weight is set to represent and compute the different associations between the central entity and its second-order neighbor entities; through a non-linear transformation, the second result output by the structure embedding model in the representation unit of the transformer-based bidirectional encoder can approximate a non-linear function and thus handle more complex tasks; so that the second attention weights can be compared between different entities, they are normalized before comparison;
two matrices are used, one to transform the central entity and one to transform the neighbor entities; both matrices are initialized at the start, and the elements of both matrices change with the training process so that the input and output of the structure embedding model meet the system requirements.
The embedding h_u of an entity u that combines the information of its first-order and second-order neighbors is computed as in formula two:
h_u = g(h_{i,2}) · h_{i,1} + (1 − g(h_{i,2})) · h_{i,2}   (formula two)
where h_{i,1} and h_{i,2} are the representations obtained from the first-order neighbors and the second-order neighbors respectively, and the gate function is g(h_{i,2}) = σ(M·h_{i,2} + b); σ is an activation function that introduces a non-linear factor into the graph attention network, so that the network can approximate arbitrary non-linear functions; M and b are a weight matrix and a bias vector respectively; the weight matrix M is initialized using a uniform distribution, the bias vector b is initialized with all zeros, and the elements of both change with the training process so that the input and output of the structure embedding model meet the requirements;
the method adopts the first characteristic linear modulation layer combined word embedding and structure embedding, and the specific calculation mode of the first characteristic linear modulation method is shown as the following formula III:
h u,e =FiLM(h u,s )=h u,w W 1 e h u,s +h u,w W 2 a formula III;
wherein h is u,s Structural embeddings representing entity u, h u,w Word-in representing entity u,. Alpha.representing a Hadamard product operation, is an operation of a matrix, W 1 And W 2 The method is characterized in that the method comprises the following steps that two matrixes which are initialized in a distributed mode and use specific uniform distribution are used, and the specific uniform distribution is used for keeping the activation values and the gradients of all layers of a graph attention network consistent in the process of propagation; fiLM (h) u,s ) Characteristic linear modulation method in representative graph attention network, final output h of characteristic linear modulation method u,e Embedding the final entity, and calculating by cosine similarity to obtain an aligned entity pair;
in addition, for a given alignment-seed entity pair whose two entities have different neighborhood structures because they come from different knowledge graphs, and in order to use the information in the alignment-seed entity pairs as fully as possible, the neighborhood structures of the entities in each aligned pair are expanded before the alignment task starts: the triples of the two knowledge graphs that contain the aligned seed entities are supplemented with each other. This exploits as far as possible the ability of the graph attention network to recognize isomorphic subgraphs; through the more similar neighborhood structures the entity pairs in the alignment seed set more easily obtain the same embedded representation, so that after propagation by the graph attention network the favorable influence spreads into the embedding of the whole knowledge graph and the result is more accurate;
the precondition of step six is set as follows: an alignment loss function and a relation loss function are defined; through them, the difference between the forward-computation result of each iteration of the graph attention network and the true value is calculated so as to guide the next round of training in the correct direction. A sample is substituted into the large-scale pre-trained language model in the representation unit of the transformer-based bidirectional encoder to compute an output prediction; the alignment loss function and the relation loss function compute the error between the prediction and the true value; according to the derivatives of the two loss functions, the error is propagated back along the direction of steepest gradient descent, correcting the trainable matrices and vectors in the forward-computation formulas; iteration stops when the values of the alignment loss function and the relation loss function reach a satisfactory level, and the training of the large-scale pre-trained language model in the representation unit of the transformer-based bidirectional encoder is finished;
the alignment loss function L_a is designed by reducing the distance between positive-example entities and enlarging the distance between negative-example entities:
L_a = ∑_{(m,n)∈G+} ||h_m − h_n|| + μ ∑_{(m',n')∈G−} max(λ − ||h_{m'} − h_{n'}||, 0)
where G+ and G− denote the sets of positive-example entity pairs and negative-example entity pairs respectively. A positive-example entity pair is a pair of aligned entities in the two knowledge graphs; (m, n) denotes any positive pair, where m and n are entities from different knowledge graphs that represent the same thing in the real world. A negative-example entity pair is a pair of unaligned entities in the two knowledge graphs; (m', n') denotes any negative pair, where m' and n' come from different knowledge graphs and represent different things in the real world. Negative examples are generated by randomly replacing one entity of an aligned pair in the training set. μ and λ are hyper-parameters greater than 0, i.e. parameters set before the training of the large-scale pre-trained language model in the representation unit of the transformer-based bidirectional encoder; here μ = 0.1 and λ = 1.5, and λ pushes the distance of negative examples above a suitable value so that possibly aligned entities can be found more easily. ||·|| denotes the 2-norm of a vector, ||h_{m'} − h_{n'}|| is the 2-norm of the difference between the embeddings of the entities m' and n' from the two different knowledge graphs, and max(λ − ||h_{m'} − h_{n'}||, 0) takes the larger of λ − ||h_{m'} − h_{n'}|| and 0, where the value 0 corresponds to the case in which no negative-example entity exists and negative values are avoided during computation;
to introduce relation information into the graph attention network, a relation loss function is set:
L_r = ∑_{r∈R} (1/|T_r|) ∑_{(h,r,t)∈T_r} ||h_h + h_r − h_t||
where R denotes the set of all relations, T_r denotes the set of triples (h, r, t) formed by a relation r, |T_r| denotes the number of elements in the set T_r, h_h denotes the entity embedding of the head entity h of a triple and h_t denotes the entity embedding of the tail entity t. The relation embedding h_r is computed from the entity embeddings h_h and h_t:
h_r = (1/|T_r|) ∑_{(h,r,t)∈T_r} (h_t − h_h)
the relation loss function reduces the growth in parameters that introducing dedicated relation embeddings would cause, and lessens the undesirable alignment effect that an unsuitable relation-alignment method produces when the knowledge graphs used as the entity-alignment data set provide no relation-alignment information;
during training, the two functions, the alignment loss function and the relation loss function, need to be optimized simultaneously; they are combined through a hyper-parameter α into the final loss function: L = L_a + α·L_r.
The final objective of the entity-alignment task is to find a set of aligned entity pairs. Aligned entities are obtained by computing the similarity of the entity vectors; since the modulus of every entity vector is constrained to 1 during training, only an inner product is needed when computing cosine similarity, and a single matrix multiplication over the matrix formed by the entity vectors yields the cosine similarity of all candidate entity pairs. The specific computation is:
sim(h_1, h_2) = (h_1 · h_2) / (||h_1||·||h_2||) = h_1 · h_2
where sim(h_1, h_2) denotes the cosine similarity, · is the inner product of the entity vectors, h_1 and h_2 are entity embeddings from different knowledge graphs, and ||h_1||·||h_2|| denotes the product of the two moduli, which equals 1. A cosine similarity matrix is then constructed.
The above description is only for the preferred embodiment of the present invention, and should not be used to limit the scope of the claims of the present invention. While the foregoing description will be understood and appreciated by those skilled in the relevant art, other equivalents may be made thereto without departing from the scope of the claims.
Beneficial effects: the invention provides a computer-constructed embedded alignment method that alternately discovers new aligned entities and corrects existing aligned entities through an iterative process. Aligned entities are found by computing and comparing the similarity of two entity strings, and corrected through a greedy algorithm and a designed reasoning process; in practice, the three released biomedical knowledge graphs are thereby aligned and fused into a larger-scale knowledge graph, which is made available to other researchers through the knowledge graph application platform developed for it.

Claims (2)

1. A computer-implemented embedded alignment method, characterized in that: step one, a transformer-based bidirectional encoder representation unit is provided, which embeds the words of entity names into vectors with a mean of 0 and an identity covariance matrix; word embedding uses a language model and representation learning from natural language processing to map the high-dimensional space of all words into a low-dimensional continuous vector space, so that each word or phrase is mapped to a vector over the real numbers; the large-scale pre-trained language model in the transformer-based bidirectional encoder representation unit then yields the word-embedding set of the N entity names of the two knowledge graphs, {x_j}, j = 1, ..., N, where N is a natural number, j is a natural number from 1 to N, and x_j denotes a single word embedding; the mean μ and the covariance matrix Σ of the word-embedding set are then computed, singular value decomposition is performed, and the matrix is truncated, i.e. a matrix slice is computed according to a value k preset as required, k being a natural number; the mean μ and the covariance matrix Σ are computed as in formula one:
μ = (1/N)·∑_{j=1}^{N} x_j,  Σ = (1/N)·∑_{j=1}^{N} (x_j − μ)(x_j − μ)^T   (formula one)
where T denotes the transpose; to reduce the influence of entity-name bias, drawing on the latent information learned by the large-scale pre-trained language model, different targeted weight files are adopted for different data sets;
step two, when problems of polysemy (one word, several meanings) or of identical names arise, it is judged whether the word-embedding sets obtained through the transformer-based bidirectional encoder representation unit model are also identical; if, in such cases, no context information can be supplied to the transformer-based bidirectional encoder representation unit, step three begins, i.e. the structural information of the knowledge graph is used to further improve the effect of entity alignment;
extracting neighbors with certain similarity in the aligned entities in the two knowledge maps, fusing relationship information into the structure embedding model in the representation unit in the bidirectional encoder based on the converter by using a graph attention network based on the structure embedding model in the representation unit in the bidirectional encoder based on the converter, judging whether semantic association information possibly occurs in the direct neighbors or the remote neighbors of the aligned entities, and then beginning to aggregate the semantic association information of the direct neighbors and the remote neighbors; the graph attention network is a graph convolution network based on space, and an attention mechanism of the graph attention network is used for determining the weight of a node neighborhood in an attention mechanism characterization mode when characteristic information is aggregated;
step four, in order to expand the effective neighbors of multiple jumps, adopt the door system network to combine first order neighbor and second order neighbor in the presentation unit in the bidirectional encoder based on converter, thus expand the second order to the neighbor entity in the multistage range, and connect and accelerate the structure embedding model in the presentation unit in the bidirectional encoder based on converter through the effective neighbor way of the said jump, in order to improve the training effect after the structure embedding model in the presentation unit in the bidirectional encoder based on converter uses the network of attention of the drawing;
the function of the gate mechanism network is that when the first-order neighbors of the aligned entity are identical or not identical in different knowledge maps, in order to reduce noise caused by the difference of the first-order neighbors, a structural embedding model in a representation unit in a bidirectional encoder based on a converter carries out aggregation of the second-order neighbors;
when different specific entities are respectively a central entity and a first-order neighbor entity of the central entity in the knowledge graph, a first attention weight is set to represent and calculate the different associations between the central entity and its first-order neighbor entities; through a nonlinear transformation, the first output of the structure embedding model in the transformer-based bidirectional encoder representation unit can approximate a nonlinear function, so that more complex tasks can be handled; in order to allow the first attention weight to be compared between different entities, normalization is performed before the comparison;

when different specific entities are respectively a central entity and a second-order neighbor entity of the central entity in the knowledge graph, a second attention weight is set to represent and calculate the different associations between the central entity and its second-order neighbor entities; through a nonlinear transformation, the second output of the structure embedding model in the transformer-based bidirectional encoder representation unit can approximate a nonlinear function, so that more complex tasks can be handled; in order to allow the second attention weight to be compared between different entities, normalization is performed before the comparison;
two matrices are used, one for the transformation of the central entity and one for the transformation of the neighbor entities; the two matrices are initialized at the start, and their elements change with the training process so that the input and output of the structure embedding model in the transformer-based bidirectional encoder representation unit meet the system requirements.
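As a non-authoritative illustration of the attention weights and the two transformation matrices above, the following PyTorch sketch computes GAT-style attention over a central entity and its neighbors, with one trainable matrix for the central entity and one for the neighbors, a LeakyReLU nonlinearity, and softmax normalization so that the weights can be compared across entities; the layer structure, the scoring form, and the dimensions are assumptions.

```python
import torch
import torch.nn.functional as F

class NeighborAttention(torch.nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # separate trainable matrices for the central entity and for neighbor entities;
        # their elements change during training, as the claim describes
        self.w_center = torch.nn.Linear(dim, dim, bias=False)
        self.w_neighbor = torch.nn.Linear(dim, dim, bias=False)
        self.attn = torch.nn.Linear(2 * dim, 1, bias=False)

    def forward(self, center: torch.Tensor, neighbors: torch.Tensor) -> torch.Tensor:
        # center: (dim,); neighbors: (num_neighbors, dim)
        c = self.w_center(center).expand(neighbors.size(0), -1)
        n = self.w_neighbor(neighbors)
        # unnormalized attention scores passed through a nonlinear transformation
        scores = F.leaky_relu(self.attn(torch.cat([c, n], dim=-1))).squeeze(-1)
        # normalization so the attention weights are comparable between entities
        alpha = F.softmax(scores, dim=0)
        # aggregate neighbor feature information weighted by attention
        return (alpha.unsqueeze(-1) * n).sum(dim=0)

# hypothetical usage: one central entity with five neighbors, dimension 128
layer = NeighborAttention(dim=128)
out = layer(torch.randn(128), torch.randn(5, 128))
print(out.shape)  # torch.Size([128])
```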
2. The computer-implemented embedded alignment method of claim 1, wherein: the embedding h_u of an entity u, combining the information of its first-order and second-order neighbors, is calculated as formula two:
h_u = g(h_{i,2}) · h_{i,1} + (1 − g(h_{i,2})) · h_{i,2}    (formula two);
wherein h_{i,1} and h_{i,2} are the representations obtained from the first-order neighbors and the second-order neighbors respectively, and the function g(h_{i,2}) = σ(M·h_{i,2} + b), where σ is an activation function that introduces nonlinear factors into the graph attention network, so that the graph attention network can approximate arbitrary nonlinear functions; M and b are the weight matrix and the bias vector respectively, the weight matrix M is initialized with a uniform distribution, the bias vector b is initialized with all zeros, and the elements of the weight matrix and the bias vector change with the training process so that the input and output of the structure embedding model in the transformer-based bidirectional encoder representation unit meet the requirements;
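A minimal PyTorch sketch of formula two: the gate g(h_{i,2}) = σ(M·h_{i,2} + b) decides how much of the first-order representation is kept versus the second-order one; the initialization follows the claim (uniform for M, all zeros for b), while the dimension and the uniform range are assumptions.

```python
import torch

class NeighborGate(torch.nn.Module):
    """Combine first-order and second-order neighbor embeddings (formula two)."""

    def __init__(self, dim: int):
        super().__init__()
        self.m = torch.nn.Parameter(torch.empty(dim, dim))   # weight matrix M
        self.b = torch.nn.Parameter(torch.zeros(dim))         # bias vector b, all-zero init
        torch.nn.init.uniform_(self.m, -0.1, 0.1)              # uniform initialization for M (range assumed)

    def forward(self, h1: torch.Tensor, h2: torch.Tensor) -> torch.Tensor:
        # g(h_2) = sigma(M h_2 + b); sigma introduces the nonlinear factor
        g = torch.sigmoid(h2 @ self.m.T + self.b)
        # h_u = g(h_2) * h_1 + (1 - g(h_2)) * h_2
        return g * h1 + (1.0 - g) * h2

# hypothetical usage with first- and second-order representations of dimension 128
gate = NeighborGate(dim=128)
h_u = gate(torch.randn(128), torch.randn(128))
```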
step five, a first feature-wise linear modulation layer is adopted to combine word embedding and structure embedding; the specific calculation of the first feature-wise linear modulation method is given by formula three:
h_{u,e} = FiLM(h_{u,s}) = (h_{u,w} W_1) ⊙ h_{u,s} + h_{u,w} W_2    (formula three);
wherein h_{u,s} denotes the structure embedding of entity u, h_{u,w} denotes the word embedding of entity u, ⊙ denotes the Hadamard product, an element-wise matrix operation, and W_1 and W_2 are two matrices initialized with a specific uniform distribution; the specific uniform distribution keeps the activation values and gradients of all layers of the graph attention network consistent during propagation; FiLM(h_{u,s}) denotes the feature-wise linear modulation method in the graph attention network, the final output h_{u,e} of the feature-wise linear modulation method is the final entity embedding, and the aligned entity pairs are obtained by cosine-similarity calculation;
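A minimal sketch of formula three: the word embedding modulates the structure embedding through a multiplicative and an additive term; Xavier-style uniform initialization is used here as a stand-in for the "specific uniform distribution" in the claim, and the dimensions and names are assumptions.

```python
import torch

class FiLMFusion(torch.nn.Module):
    """Feature-wise linear modulation of the structure embedding by the word embedding."""

    def __init__(self, word_dim: int, struct_dim: int):
        super().__init__()
        self.w1 = torch.nn.Parameter(torch.empty(word_dim, struct_dim))
        self.w2 = torch.nn.Parameter(torch.empty(word_dim, struct_dim))
        # a uniform (Xavier-style) init keeps activations and gradients consistent across layers
        torch.nn.init.xavier_uniform_(self.w1)
        torch.nn.init.xavier_uniform_(self.w2)

    def forward(self, h_word: torch.Tensor, h_struct: torch.Tensor) -> torch.Tensor:
        # h_e = (h_w W1) ⊙ h_s + h_w W2, where ⊙ is the element-wise (Hadamard) product
        return (h_word @ self.w1) * h_struct + h_word @ self.w2

# hypothetical usage: word embedding of size 256, structure embedding of size 128
fusion = FiLMFusion(word_dim=256, struct_dim=128)
h_e = fusion(torch.randn(256), torch.randn(128))
print(h_e.shape)  # torch.Size([128])
```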
step six, in addition, for a given alignment seed set of entity pairs, when the two entities of an aligned entity pair come from different knowledge graphs and the neighborhood structures of the different knowledge graphs differ, in order to use the information in the alignment seed set as efficiently as possible, the neighborhood structures of the entities in the aligned entity pairs are expanded before the alignment task starts, and the triples containing the alignment seed entity pairs in the two knowledge graphs are supplemented with each other, so that the ability of the graph attention network to identify isomorphic subgraphs is enhanced and the entity pairs in the alignment seed set can more easily obtain the same embedded representation through their more similar neighborhood structures; after processing and propagation by the graph attention network, these favorable influence factors are spread into the embedding of the whole knowledge graph, making the effect more accurate;
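Purely for illustration, the mutual supplementation of triples described in step six might look like the following sketch, where each seed pair's triples are copied into the other graph with the seed entities swapped; the data layout and the exact copying rule are assumptions, not the claimed procedure.

```python
def supplement_triples(triples_g1, triples_g2, seed_pairs):
    """For each aligned seed pair (e1, e2), copy e1's triples into G2 with e1 replaced
    by e2, and vice versa, so that the two neighborhood structures become more similar."""
    g1_to_g2 = dict(seed_pairs)
    g2_to_g1 = {b: a for a, b in seed_pairs}

    extra_g2 = [(g1_to_g2.get(h, h), r, g1_to_g2.get(t, t))
                for (h, r, t) in triples_g1
                if h in g1_to_g2 or t in g1_to_g2]       # triples touching a seed entity of G1
    extra_g1 = [(g2_to_g1.get(h, h), r, g2_to_g1.get(t, t))
                for (h, r, t) in triples_g2
                if h in g2_to_g1 or t in g2_to_g1]       # triples touching a seed entity of G2
    return triples_g1 + extra_g1, triples_g2 + extra_g2

# hypothetical toy graphs and one seed pair
g1 = [("Paris", "capital_of", "France")]
g2 = [("Paris_fr", "located_in", "Europe")]
g1_new, g2_new = supplement_triples(g1, g2, [("Paris", "Paris_fr")])
```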
the precondition of step six is set as follows: an alignment loss function and a relation loss function are set, and the difference between the result of each forward computation of the graph attention network and the true value is calculated through the alignment loss function and the relation loss function, so as to guide the next round of training in the correct direction; a sample is substituted into the large-scale pre-trained language model in the transformer-based bidirectional encoder representation unit to calculate an output prediction value; the errors between the predicted value and the true value are calculated with the alignment loss function and the relation loss function; the error is propagated back along the direction of steepest descent according to the derivatives of the alignment loss function and the relation loss function, the trainable matrices and trainable vectors in the forward computation formulas are corrected, and when the loss values of the alignment loss function and the relation loss function reach a satisfactory value, the iteration stops and the training of the large-scale pre-trained language model in the transformer-based bidirectional encoder representation unit is finished;
step seven, designing an alignment loss function L_a by reducing the distance between positive-example entities and enlarging the distance between negative-example entities:

L_a = Σ_{(m,n)∈G+} ||h_m − h_n|| + μ · Σ_{(m',n')∈G−} max(λ − ||h_{m'} − h_{n'}||, 0)
wherein G+ and G− respectively denote the sets of positive-example entity pairs and negative-example entity pairs; a positive-example entity pair refers to two entities that are aligned across the two knowledge graphs, (m, n) denotes any positive-example entity pair, and m and n are entities from different knowledge graphs that represent the same real-world thing; a negative-example entity pair refers to two entities of the two knowledge graphs that are not aligned, (m', n') denotes any negative-example entity pair, and m' and n' are entities from different knowledge graphs that represent different real-world things; negative examples are generated by randomly replacing one entity of an aligned entity pair in the training set; μ and λ are hyperparameters greater than 0, namely parameters set before the training process of the large-scale pre-trained language model in the transformer-based bidirectional encoder representation unit, with μ = 0.1 and λ = 1.5; λ keeps the distance of negative examples larger than a suitable value, so that possibly aligned entities can be found more reliably; ||·|| denotes the 2-norm of a vector, ||h_{m'} − h_{n'}|| denotes the 2-norm of the difference of the entity embeddings of the two entities m' and n' from different knowledge graphs, and max(λ − ||h_{m'} − h_{n'}||, 0) denotes the maximum of the value λ − ||h_{m'} − h_{n'}|| and 0; when this value is 0 it represents the case in which no negative-example entity exists, and a negative value is avoided in the calculation;
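A hedged PyTorch sketch of this alignment loss, under the assumption that μ weights the negative-pair hinge term as written in the reconstructed formula above; the tensor layouts and names are illustrative.

```python
import torch

def alignment_loss(h, pos_pairs, neg_pairs, mu: float = 0.1, lam: float = 1.5):
    """Pull aligned (positive) pairs together and push negative pairs at least lam apart.

    h         : (num_entities, dim) entity embedding matrix
    pos_pairs : (P, 2) long tensor of aligned entity indices (m, n)
    neg_pairs : (Q, 2) long tensor of non-aligned entity indices (m', n')
    """
    pos_dist = torch.norm(h[pos_pairs[:, 0]] - h[pos_pairs[:, 1]], dim=1)
    neg_dist = torch.norm(h[neg_pairs[:, 0]] - h[neg_pairs[:, 1]], dim=1)
    # max(lam - d, 0): contributes nothing once a negative pair is farther apart than lam
    return pos_dist.sum() + mu * torch.clamp(lam - neg_dist, min=0).sum()
```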
step eight, in order to introduce relation information into the graph attention network, setting a relation loss function:

L_r = Σ_{r∈R} (1/|T_r|) · Σ_{(h,r,t)∈T_r} ||h_h − h_t − h_r||

wherein R denotes the set of all relations, T_r denotes the set of triples (h, r, t) formed by the relation r, |T_r| denotes the number of elements in the set T_r, h_h denotes the entity embedding of the entity h in a triple, h_t denotes the entity embedding of the entity t in a triple, and the relation embedding h_r is calculated from the entity embeddings h_h and h_t:

h_r = (1/|T_r|) · Σ_{(h,r,t)∈T_r} (h_h − h_t);
the relation loss function reduces the growth in parameters that introducing specific relation embeddings would cause, and, when the knowledge graphs used as the entity alignment data set provide no information related to relation alignment, it reduces the undesirable alignment effect that an improper relation alignment method would otherwise cause;
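A minimal sketch of the relation loss as reconstructed above: the relation embedding is approximated as the mean head-minus-tail offset over the relation's triples, so no extra relation parameters need to be trained; the data structures are assumptions.

```python
import torch

def relation_loss(h, triples_by_relation):
    """Relation loss without learning separate relation parameters.

    h                   : (num_entities, dim) entity embedding matrix
    triples_by_relation : dict mapping a relation id to a (k, 2) long tensor of (head, tail) indices
    """
    loss = torch.zeros(())
    for heads_tails in triples_by_relation.values():
        head_emb = h[heads_tails[:, 0]]
        tail_emb = h[heads_tails[:, 1]]
        # relation embedding approximated as the mean head-minus-tail offset of its triples
        h_r = (head_emb - tail_emb).mean(dim=0)
        # penalize triples whose offset deviates from the relation embedding
        loss = loss + torch.norm(head_emb - tail_emb - h_r, dim=1).mean()
    return loss
```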
in the training process, two functions, namely an alignment loss function and a relation loss function, need to be optimized simultaneously, and the two loss functions are combined by setting a hyper-parameter alpha to become a final loss function: l = L a +αL r
The final goal of the entity alignment task is to find a set of aligned entity pairs; the aligned entities are obtained by calculating the similarity of the entity vectors, and the modulus of each entity vector is constrained to 1 during training, so that only an inner-product operation is needed when the cosine similarity is calculated; in the matrix formed by the entity vectors, a matrix multiplication is required to obtain the cosine similarities of all aligned entity pairs; the specific calculation is as follows:

sim(h_1, h_2) = (h_1 · h_2) / (||h_1|| · ||h_2||)

wherein sim(h_1, h_2) denotes the cosine-similarity calculation, · is the inner product of the entity vectors, h_1 and h_2 denote entity embeddings from different knowledge graphs, and ||h_1||·||h_2|| denotes the product of the moduli of h_1 and h_2, which is 1; a cosine-similarity matrix is then constructed.
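A minimal sketch of this final similarity computation: with unit-norm embeddings, a single matrix multiplication yields the cosine similarities of all candidate pairs; the names and shapes are illustrative.

```python
import torch

def alignment_similarity(h1: torch.Tensor, h2: torch.Tensor) -> torch.Tensor:
    """Cosine-similarity matrix between the entities of two knowledge graphs.

    Because the embeddings are constrained to unit length during training,
    the inner product h1 @ h2.T already equals the cosine similarity.
    """
    h1 = torch.nn.functional.normalize(h1, dim=1)  # ensure modulus 1
    h2 = torch.nn.functional.normalize(h2, dim=1)
    return h1 @ h2.T                               # (n1, n2) similarity matrix

# hypothetical usage: 100 entities in KG1, 120 entities in KG2, dimension 128
sim = alignment_similarity(torch.randn(100, 128), torch.randn(120, 128))
best_match = sim.argmax(dim=1)  # most similar KG2 entity for each KG1 entity
```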
CN202211615736.0A 2022-05-16 2022-12-15 Embedded alignment method constructed by computer Pending CN115858812A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2022105330619 2022-05-16
CN202210533061.9A CN114840688A (en) 2022-05-16 2022-05-16 Embedded alignment method constructed by computer

Publications (1)

Publication Number Publication Date
CN115858812A true CN115858812A (en) 2023-03-28

Family

ID=82569778

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202210533061.9A Pending CN114840688A (en) 2022-05-16 2022-05-16 Embedded alignment method constructed by computer
CN202211615736.0A Pending CN115858812A (en) 2022-05-16 2022-12-15 Embedded alignment method constructed by computer

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202210533061.9A Pending CN114840688A (en) 2022-05-16 2022-05-16 Embedded alignment method constructed by computer

Country Status (1)

Country Link
CN (2) CN114840688A (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115934970B (en) * 2023-02-27 2023-06-02 中南大学 Interactive entity alignment method based on multi-view visualization
CN117149839B (en) * 2023-09-14 2024-04-16 中国科学院软件研究所 Cross-ecological software detection method and device for open source software supply chain

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117038105A (en) * 2023-10-08 2023-11-10 武汉纺织大学 Drug repositioning method and system based on information enhancement graph neural network
CN117038105B (en) * 2023-10-08 2023-12-15 武汉纺织大学 Drug repositioning method and system based on information enhancement graph neural network
CN117421595A (en) * 2023-10-25 2024-01-19 广东技术师范大学 System log anomaly detection method and system based on deep learning technology
CN117610662A (en) * 2024-01-19 2024-02-27 江苏天人工业互联网研究院有限公司 Knowledge graph embedding method for extracting representative sub-graph information through GAT

Also Published As

Publication number Publication date
CN114840688A (en) 2022-08-02

Similar Documents

Publication Publication Date Title
CN108733792B (en) Entity relation extraction method
CN115858812A (en) Embedded alignment method constructed by computer
CN109189925B (en) Word vector model based on point mutual information and text classification method based on CNN
Du et al. Convolution-based neural attention with applications to sentiment classification
Colombo et al. Infolm: A new metric to evaluate summarization & data2text generation
CN112015868B (en) Question-answering method based on knowledge graph completion
JP6738769B2 (en) Sentence pair classification device, sentence pair classification learning device, method, and program
Ahn et al. Descent steps of a relation-aware energy produce heterogeneous graph neural networks
CN113821635A (en) Text abstract generation method and system for financial field
CN116796744A (en) Entity relation extraction method and system based on deep learning
Tudisco et al. A nonlinear diffusion method for semi-supervised learning on hypergraphs
CN116680407A (en) Knowledge graph construction method and device
CN114662659B (en) Multi-stage transfer learning strategy synthesis-based crowdsourcing text integration method
CN113792144B (en) Text classification method of graph convolution neural network based on semi-supervision
CN116501864A (en) Cross embedded attention BiLSTM multi-label text classification model, method and equipment
Purpura et al. Probabilistic word embeddings in neural ir: A promising model that does not work as expected (for now)
CN111767388B (en) Candidate pool generation method
CN114818700A (en) Ontology concept matching method based on paired connectivity graph and graph neural network
Erciyes et al. Deep learning methods with pre-trained word embeddings and pre-trained transformers for extreme multi-label text classification
Li et al. Multi-model Fusion Attention Network for News Text Classification
Su et al. Convolution-enhanced bilingual recursive neural network for bilingual semantic modeling
Sun et al. GCNs-Based Context-Aware Short Text Similarity Model
Wu et al. A Text Emotion Analysis Method Using the Dual‐Channel Convolution Neural Network in Social Networks
Huang et al. DFS-NER: Description Enhanced Few-shot NER via Prompt Learning and Meta-Learning
Lu et al. Semantic similarity measurement using knowledge-augmented multiple-prototype distributed word vector

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination