LU503448B1

LU503448B1 - Alignment Method, Device and Storage Medium for Entity of Multimodal Knowledge Graph

Info

Publication number: LU503448B1
Application number: LU503448A
Authority: LU
Inventors: Jia Zhu
Original assignee: Univ Zhejiang Normal
Priority date: 2021-06-21
Filing date: 2022-06-16
Publication date: 2023-06-07
Also published as: CN113360673B; WO2022267976A1; CN113360673A

Abstract

The invention provides an alignment method, device and storage medium for entities of multimodal knowledge graphs. The method comprises the following steps: acquiring data of a first multimodal knowledge graph and a second multimodal knowledge graph, and extracting the entities to be aligned from the data; then processing the multimodal data of each entity to obtain the modal vectors of the entity, and performing early fusion and late fusion according to the modal vectors; then obtaining the multimodal embedding vector by combining the early fusion results with the late fusion results; finally, performing entity alignment according to the multimodal embedding vector. By using the method of the invention, entity alignment of multimodal knowledge graphs can be realized, and the problem of inconsistency among multimodal knowledge expressions is solved. The invention can be widely applied in the technical field of knowledge graphs.

Description

DESCRIPTION LU503448

ALIGNMENT METHOD, DEVICE AND STORAGE MEDIUM FOR ENTITY OF MULTIMODAL KNOWLEDGE GRAPH

TECHNICAL FIELD

The invention relates to the technical field of knowledge graphs, in particular to an alignment method, device and storage medium for entity of multimodal knowledge graphs.

BACKGROUND

Because most knowledge graphs are built for specific purposes and in a monolingual environment, the same concept is often expressed differently in different knowledge graphs. The purpose of entity alignment is to identify the entities that are expressed differently in two knowledge graphs but actually refer to the same thing, so that the different knowledge graphs can be integrated.

Because knowledge takes many forms, current embedding techniques cannot handle multimodal knowledge well. To overcome this challenge, researchers have recently proposed various models that fuse the multimodal information in knowledge graphs into a joint embedding, so that the alignment model can automatically adjust the modal weights. However, these studies do not consider the feature-level correlation between modalities, and when the correlation between multiple modalities is relatively large, satisfactory results are unlikely to be obtained. These problems in the prior art urgently need to be solved.

SUMMARY

The purpose of the present invention is to solve one of the technical problems existing in the prior art at least to some extent.

Therefore, an object of the present invention is to provide an alignment method, a device and a medium for entity of multimodal knowledge graph, which can realize entity alignment of multimodal knowledge graph by early and late fusion of multimodal knowledge graph, and solve the problem of inconsistency among multimodal knowledge expressions.

In order to achieve the above technical purpose, the technical scheme adopted by the embodiment of the present invention includes:

In the first aspect, the invention provides an alignment method for entity of multimodal knowledge graph.

The alignment method for entity of multimodal knowledge graph comprises: acquiring data of a first multimodal knowledge graph and a second multimodal knowledge graph; respectively extracting entities to be aligned from the first multimodal knowledge graph and the second multimodal knowledge graph; processing multimodal data of the entity to obtain modal vectors of the entity, wherein the multimodal data comprises image data, relationship data, attribute data and knowledge graph structure data; each modal vector comprises an image embedding vector, a relationship embedding vector, an attribute embedding vector and a knowledge graph structure vector; according to the modal vectors, carrying out early fusion through the fully connected neural network model; according to the modal vectors, carrying out late-stage fusion through a low-rank multimodal model; combining the early fusion results with the late fusion results to obtain the multi-modal embedding vector; performing entity alignment according to the multimodal embedding vector.

Further, processing the image data of the entity to obtain the image embedding vector of the entity specifically comprises: using a pre-trained RESNET model to extract features of the acquired image data; processing the extracted features by a first preset function to obtain an image embedding vector.

Further, processing the relationship data of the entity to obtain the relationship embedding vector of the entity specifically includes: converting the obtained relationship data into translation vectors through a TransE model; calculating the structural similarity of the translation vector through a second preset function to obtain a logistic regression loss function; the logistic regression loss function is converged to obtain the relationship embedding vector.

Further, processing the attribute data of the entity to obtain the attribute embedding vector of the entity specifically includes: mapping the obtained attribute data to a low-dimensional space through a feedforward network to obtain an attribute embedding vector.

Further, processing the knowledge graph structure data of the entity to obtain the structure embedding vector of the entity specifically includes: establishing a semi-supervised embedding model based on graph convolution network; setting the relationship vertex; processing the relationship vertices by the semi-supervised embedding model to obtain a structural embedding vector.

Further, the early fusion specifically includes: establishing a fully connected neural network model; fusing all features extracted by the RESNET model by the fully connected neural network model.

Further, the late fusion specifically includes: simplifying the vector representation of multimodal fusion by a low-rank multimodal fusion model; simplifying the vector representation in a preset way.

Further, combining the early fusion and the late fusion specifically includes: combining the early fusion and the late fusion through collaborative training according to a preset loss function.

In the second aspect, the invention provides an alignment device for entity of multimodal knowledge graph, comprising: at least one processor; at least one memory for storing at least one program; when the at least one program is executed by the at least one processor, the at least one processor realizes the alignment method for entity of the multimodal knowledge graph.

In the third aspect, the invention provides a computer-readable storage medium, storing instructions executable by a processor, the instructions executable by the processor are used to implement the alignment method for entity of the multimodal knowledge graph.

The alignment method for entity of multimodal knowledge graph has the following beneficial effects.

According to the invention, by acquiring the data of the first multimodal knowledge graph and the second multimodal knowledge graph, the entities to be aligned are extracted from them; then, the multimodal entity data composed of image data, relationship data, attribute data and knowledge graph structure data are processed to obtain the modal vectors composed of the image embedding vector, relationship embedding vector, attribute embedding vector and knowledge graph structure vector, and early fusion and late fusion are carried out according to each modal vector; then, the multimodal embedding vector is obtained by combining the early fusion results with the late fusion results; finally, entity alignment is performed according to the multimodal embedding vector. By using the method of the invention, the entity alignment of the multimodal knowledge graph can be realized, and the problem of inconsistency among multimodal knowledge expressions is solved.

BRIEF DESCRIPTION OF THE FIGURES

In order to more clearly explain the embodiments of the present invention or the technical solutions in the prior art, the following description is given to the drawings of the embodiments of the present invention or the related technical solutions in the prior art. It should be understood that the drawings in the following description are only for the convenience of clearly expressing some embodiments in the technical scheme of the present invention. For those skilled in the art, other drawings can be obtained according to these drawings without any creative labor.

Fig. 1 is a flowchart of an alignment method for entity of a multimodal knowledge graph of the present invention;

Fig. 2 is a flowchart of an alignment method for entity of a multimodal knowledge graph in the application process of the present invention;

Fig. 3 is a structural schematic diagram of an alignment device for entity of a multimodal knowledge graph of the present invention.

DESCRIPTION OF THE INVENTION

Embodiments of the present invention will be described in detail below, examples of which are shown in the accompanying drawings, in which the same or similar reference numerals refer to the same or similar elements or elements with the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary, only for explaining the present invention, and should not be construed as limiting the present invention. For the step numbers in the following examples, they are only set for convenience of explanation, and there is no restriction on the order of steps. The execution order of each step in the examples can be adjusted adaptively according to the understanding of those skilled in the art.

Entity alignment is a key task for integrating different knowledge graphs by arranging various entities of the same real-world prototype. Because most knowledge graphs are built for specific purposes and in a monolingual environment, different knowledge graphs describe even the same concept differently.

Most early research on entity alignment focused on attribute similarity. These studies are often hampered by attribute heterogeneity, which makes entity alignment error-prone. Recently, with the rapid development of knowledge graph embedding, many researchers have tried to apply embedding techniques to entity alignment through various models. However, these embedding techniques cannot handle multimodal knowledge well, because knowledge takes various forms, such as relational triples and images, even though these forms of knowledge strongly support entity alignment.

The influence of multi-modal knowledge on entity alignment is not insignificant, because the inevitable heterogeneity in different modes makes it difficult to learn and fuse knowledge expressions from different modes. It is not easy to identify the same target by using traditional technology and only using images or text information. To overcome this challenge, researchers have recently proposed various models to fuse multimodal information in knowledge graph and form joint embedding, so that the alignment model can automatically adjust modal weights. However, these studies do not consider the modal correlation of feature level, and when the correlation between multiple modes is relatively large, it is likely that satisfactory results will not be obtained.

Based on the above problems, this scheme proposes an alignment method for entity of multimodal knowledge graph. In this scheme, multimodal data composed of image data, relationship data, attribute data and knowledge graph structure data in entities are processed first to obtain modal vectors composed of image embedding vector, relational embedding vector, attribute embedding vector and knowledge graph structure vector.

Then, after early and late fusion are respectively carried out according to each modal vector, the results of early fusion and late fusion are combined to obtain multimodal embedding vector, so as to solve the influence caused when the correlation between multiple modalities is relatively large and improve the entity alignment.

Specifically, referring to fig. 1 and fig. 2, the alignment method for entity of multimodal knowledge graph provided by the embodiment of the present invention includes the following steps:

S101: Obtain the data of the first multimodal knowledge graph and the second multimodal knowledge graph. Knowledge graph is a modern theory that combines the theories and methods of applied mathematics, graphics, information visualization technology, information science and other disciplines with bibliometric citation analysis, co-occurrence analysis and other methods, and displays the core structure, development history, frontier fields and overall knowledge structure of the disciplines vividly by using visual map. Among them, the main difference between multi-modal knowledge graph and traditional knowledge graph is that traditional knowledge graph mainly focuses on the entities and relationships of texts and databases, while multi-modal knowledge graph, on the basis of traditional knowledge graph, constructs multi-modal entities and multi-modal semantic relationships among multi-modal entities.

S102: Extract the entities to be aligned from the multimodal knowledge graph. The multimodal knowledge graph in this step refers to the first multimodal knowledge graph and the second multimodal knowledge graph in S101, and the specific operation is to extract the entities to be aligned from the first multimodal knowledge graph and the second multimodal knowledge graph respectively. Entities are things that exist objectively and can be distinguished from each other, often referring to a collection of certain kinds of things.

S103: Process the multimodal data of the entity to obtain each modal vector of the entity, wherein the multimodal data includes image data, relationship data, attribute data and knowledge graph structure data; each modal vector includes an image embedding vector, a relationship embedding vector, an attribute embedding vector and a knowledge graph structure vector.

Among them, image embedding uses the pre-trained RESNET model as the feature extractor of the image and takes the output of the last layer as the image representation. The extracted features are then processed by the first preset function to obtain the image embedding vector emb_i. The RESNET model is a residual network, which is a kind of convolutional neural network. It is easy to optimize, and its accuracy can be improved by adding considerable depth. Its internal residual blocks use skip connections, which alleviate the vanishing gradient problem caused by increasing depth in deep neural networks. Compared with another classical convolutional neural network model, VGG16, RESNET can solve the degradation problem in deep networks.

The first preset function is as follows: $emb_i = W_i \cdot \mathrm{RESNET}(I) + b_i$. In the above formula, $W_i$ is the weight vector, $b_i$ is the offset vector, and $I$ represents the image.
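As a rough illustration of the first preset function, the following Python sketch applies the affine map to one feature vector. It assumes that the RESNET features have already been extracted (a random vector stands in for RESNET(I) here) and that W_i acts as a matrix; the sizes 2048 and 100 and the function name `image_embedding` are illustrative choices that do not come from the text.

```python
import numpy as np

def image_embedding(resnet_features, W_i, b_i):
    """Apply emb_i = W_i * RESNET(I) + b_i to one feature vector."""
    return W_i @ resnet_features + b_i

rng = np.random.default_rng(0)
feat_dim, emb_dim = 2048, 100            # assumed sizes, not given in the text
features = rng.normal(size=feat_dim)     # stand-in for the output of RESNET(I)
W_i = rng.normal(scale=0.01, size=(emb_dim, feat_dim))
b_i = np.zeros(emb_dim)

emb_i = image_embedding(features, W_i, b_i)
print(emb_i.shape)
```

In a real pipeline the stand-in vector would be replaced by the last-layer output of a pre-trained residual network, and W_i and b_i would be learned during training.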

Among them, relationship embedding specifically uses the TransE model to express all entities and relations in the multimodal knowledge graph as low-dimensional vectors.

The TransE model is used to translate triples into embedding vectors. A triple has the form (head entity, relation, tail entity), where the head entity and tail entity are collectively referred to as entities. For simplicity, a triple is written as (h, r, t), where h is the head entity, t is the tail entity, and r is the relationship between h and t. Then, the structural similarity is measured by the second preset function.

The second preset function is as follows:

$f_{rel}(h, r, t) = -\lVert h + r - t \rVert_2$

where $f_{rel}(h, r, t)$ is a function that calculates the similarity between entity h and entity t.

Then, the logistic regression loss function is obtained, and the relationship embedding vector emb_r is obtained by minimizing it until convergence, as follows:

$L_{emb_r} = \sum_{(h, r, t) \in X^{+} \cup X^{-}} \log\left(1 + \exp\left(-\alpha f_{rel}(h, r, t)\right)\right)$

In the above formula, $\alpha$ is the label of $f_{rel}(h, r, t)$, and its value is 1 or -1.

$X^{+}$ represents the positive facts in the source knowledge graph and the target knowledge graph, and $X^{-}$ represents a group of negative samples obtained by replacing the head or tail entities of the positive facts.
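The score and loss above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the score is taken to be the negative L2 distance between h + r and t, the loss is taken to be log(1 + exp(-label * score)) with label 1 for positive facts and -1 for negative samples, and the function names `f_rel` and `logistic_loss` are hypothetical.

```python
import numpy as np

def f_rel(h, r, t):
    """TransE-style score: negative L2 distance between h + r and t."""
    return -np.linalg.norm(h + r - t, axis=-1)

def logistic_loss(h, r, t, labels):
    """Sum of log(1 + exp(-label * f_rel)) over a batch of triples."""
    scores = f_rel(h, r, t)
    # np.logaddexp(0, x) evaluates log(1 + exp(x)) in a numerically stable way
    return np.sum(np.logaddexp(0.0, -labels * scores))

# One positive triple (h + r equals t) and one negative sample (corrupted tail).
h = np.array([[0.0, 0.0], [0.0, 0.0]])
r = np.array([[1.0, 0.0], [1.0, 0.0]])
t = np.array([[1.0, 0.0], [4.0, 4.0]])
labels = np.array([1.0, -1.0])

loss = logistic_loss(h, r, t, labels)
print(loss)
```

With this sign convention the loss falls when positive triples satisfy h + r ≈ t and when corrupted triples are pushed far apart.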

Among them, for attribute embedding, deep neural network models perform poorly because of the noise from neighboring nodes. Therefore, a simple feedforward network is used to map attribute features into a low-dimensional space, so as to obtain the attribute embedding vector: $emb_a = W_a \cdot A + b_a$

In the above formula, $emb_a$ is the attribute embedding vector, $W_a$ is the weight matrix, $b_a$ is the bias vector, and $A$ is the set of attributes.
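One way to read the attribute formula is sketched below in Python. The text does not say how the attribute set A is turned into a numeric vector, so the multi-hot encoding, the attribute names in `vocab`, and the function name `attribute_embedding` are all assumptions made for illustration.

```python
import numpy as np

def attribute_embedding(entity_attrs, vocab, W_a, b_a):
    """Encode an attribute set as a multi-hot vector A, then return W_a @ A + b_a."""
    A = np.zeros(len(vocab))
    for name in entity_attrs:
        A[vocab[name]] = 1.0
    return W_a @ A + b_a

# Hypothetical attribute vocabulary and hand-picked weights.
vocab = {"birthDate": 0, "height": 1, "population": 2}
W_a = np.array([[1.0, 0.0, 2.0],
                [0.0, 3.0, 0.0]])
b_a = np.array([0.5, -0.5])

emb_a = attribute_embedding({"birthDate", "population"}, vocab, W_a, b_a)
print(emb_a)
```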

Among them, the embedding of the knowledge graph structure includes establishing a semi-supervised embedding model based on a graph convolutional network (GCN), transforming the knowledge graph into an undirected graph, and reconstructing the structure of the original knowledge graph. For example, given a triple (e1, r, e2), e1 and e2 represent entities, and r represents the relationship between them. In this embodiment, the semi-supervised embedding model assigns different relationship vertices r1 and r2 to the triple, forming (e1, r1) and (e2, r2). Each relationship vertex is represented by a unique one-hot vector.

Based on this new undirected graph, the DeepWalk algorithm is used to represent the feature vector of each entity vertex, and the one-hot representation of each relationship vertex is input into the GCN. These relationship vertices can display the total number of neighbors with the same relationship information between two entity vertices.

After encoding by the convolution layers, the representations of the entity vertices and relationship vertices in the graph are obtained. Each layer in the GCN can be written as a nonlinear function:

$H^{(l+1)} = f(H^{(l)}, M)$

In the above formula, $H^{(0)}$ is the input matrix, $H^{(L)}$ is the output matrix, $L$ is the number of layers, and $M$ is the adjacency matrix of the knowledge graph. Then, set the following propagation rule:

$f(H^{(l)}, M) = \mathrm{ReLU}(M H^{(l)} W^{(l)})$

In the above formula, $W^{(l)}$ is the weight matrix of the $l$-th network layer, and ReLU is the activation function. Note that multiplying by $M$ only sums up the attributes of all adjacent vertices, not those of the vertex itself. Therefore, it is necessary to add the identity matrix $I$ to $M$, so the above equation is updated as follows:

$f(H^{(l)}, M) = \mathrm{ReLU}(\hat{D}^{-\frac{1}{2}} \hat{M} \hat{D}^{-\frac{1}{2}} H^{(l)} W^{(l)})$

In the formula, $\hat{M} = M + I$, and $\hat{D}$ is the diagonal degree matrix of $\hat{M}$. In this embodiment, the output of the last layer is used as the structural embedding vector emb_kg of the knowledge graph.
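The propagation rule with self-loops can be sketched as follows. This is a minimal Python sketch that assumes the symmetric degree normalization commonly used with graph convolutional networks; the function name `gcn_layer`, the three-vertex path graph and the all-ones weights are illustrative and not taken from the text.

```python
import numpy as np

def gcn_layer(H, M, W):
    """One propagation step: ReLU(D^-1/2 (M + I) D^-1/2 H W)."""
    M_hat = M + np.eye(M.shape[0])            # add self-loops so a vertex keeps its own features
    deg = M_hat.sum(axis=1)                   # diagonal entries of the degree matrix of M_hat
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    return np.maximum(0.0, D_inv_sqrt @ M_hat @ D_inv_sqrt @ H @ W)

# Path graph 0 - 1 - 2 with one-hot input features.
M = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
H0 = np.eye(3)
W0 = np.ones((3, 2))

H1 = gcn_layer(H0, M, W0)
print(H1)
```

Stacking several such layers and reading the output of the last one corresponds to taking the final layer as the structural embedding.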

S104: Early fusion is performed through the fully connected neural network model.

Early fusion refers to combining features before the data is sent into the model, so that the relationships among features are captured better. This scheme uses standard early fusion technology to fuse multiple features from different data modalities. In this embodiment, a simple fully connected neural network model is designed to concatenate all the features of each modality.
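A minimal Python sketch of this kind of early fusion is given below. It assumes a single fully connected layer with a ReLU activation applied to the concatenated modal vectors; the layer count, the activation, the toy vector sizes and the function name `early_fusion` are assumptions, since the text only states that a simple fully connected network is used.

```python
import numpy as np

def early_fusion(modal_vectors, W, b):
    """Concatenate the modal vectors and apply one fully connected ReLU layer."""
    x = np.concatenate(modal_vectors)
    return np.maximum(0.0, W @ x + b)

# Toy modal vectors standing in for emb_i, emb_r, emb_a and emb_kg.
emb_i = np.array([1.0, 2.0])
emb_r = np.array([3.0])
emb_a = np.array([4.0])
emb_kg = np.array([5.0, 6.0])

W = np.array([[1.0] * 6,
              [-1.0] * 6])
b = np.zeros(2)

h_early = early_fusion([emb_i, emb_r, emb_a, emb_kg], W, b)
print(h_early)
```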

S105: Perform late fusion through the low-rank multimodal model. $\{z_m\}_{m=1}^{M}$ is defined as the encodings of the single-modality information of $M$ different modalities. The goal of multimodal fusion is to integrate the single-modality representations into one compact multimodal representation. Tensor representation is considered an effective method for multimodal fusion. However, the number of parameters of the learned weight tensor increases exponentially. This not only adds a large amount of computation, but also exposes the model to the risk of over-fitting. In this embodiment, the weight is decomposed into a set of low-rank factors through a low-rank multimodal fusion model. The low-rank multimodal fusion model can simplify $Z = \bigotimes_{m=1}^{M} z_m$ into the output vector $h$:

$h = \Lambda_{m=1}^{M}\left[\sum_{i=1}^{r} w_m^{(i)} \cdot z_m\right]$

In the above formula, $\Lambda$ represents the element-wise product over a series of tensors, $r$ is the rank of the tensors, and $w_m^{(i)}$ is the corresponding low-rank factor of each modality $m$.

Compared with the existing methods, this calculation method simplifies the parallel decomposition of $Z$ and $W$, so that only $h$ needs to be calculated without creating the tensor $Z$, thus avoiding the calculation of the large input tensor $Z$. If $r$ is too large, the amount of calculation is still very large. In this case, the equation is updated as follows by exchanging the summation order and the element-wise multiplication:

$h = \sum_{i=1}^{r}\left[\Lambda_{m=1}^{M} w_m^{(i)} \cdot z_m\right]_{i,:}$

In the above formula, $i$ denotes the $i$-th slice of the matrix, and the newly added constraint condition ensures that the decomposition exists in an acceptable range while reducing the amount of calculation.
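The benefit of the low-rank factors can be checked numerically. The Python sketch below covers two modalities and uses the form in which each rank contributes the element-wise product of the per-modality projections, with the contributions then summed over the rank index; it compares that result with the explicit computation that builds the full tensors. The shapes, the random data and the names `low_rank_fusion`, `W_full` and `h_full` are assumptions made for illustration.

```python
import numpy as np

def low_rank_fusion(z_list, factors):
    """h = sum_i prod_m (w_m^(i) . z_m), computed without building the tensor Z.

    factors[m] has shape (r, d_m, d_h): r rank-specific factors for modality m.
    """
    prod = None
    for z, w in zip(z_list, factors):
        proj = np.einsum("rdh,d->rh", w, z)            # one projection per rank, shape (r, d_h)
        prod = proj if prod is None else prod * proj   # element-wise product over modalities
    return prod.sum(axis=0)                            # sum over the rank index i

rng = np.random.default_rng(0)
r, d_h = 3, 4
z1, z2 = rng.normal(size=5), rng.normal(size=6)
w1, w2 = rng.normal(size=(r, 5, d_h)), rng.normal(size=(r, 6, d_h))

h_low_rank = low_rank_fusion([z1, z2], [w1, w2])

# Reference computation that builds the full weight tensor and the input tensor Z.
W_full = np.einsum("rah,rbh->abh", w1, w2)   # sum over ranks of outer products of the factors
Z = np.outer(z1, z2)
h_full = np.einsum("abh,ab->h", W_full, Z)

print(np.allclose(h_low_rank, h_full))
```

The low-rank path only touches arrays of size r × d_m × d_h, whereas the reference path materializes a tensor whose size grows with the product of all modality dimensions.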

S106: Combine the early fusion result with the late fusion result to obtain a multimodal embedding vector. Specifically, the final multimodal embedding embF is obtained by combining the late fusion result with the output of the early fusion model through the following loss function. In this way, the advantages of the two kinds of fusion are combined: the output features of the early fusion can be combined easily, and the computation caused by building the input tensor is avoided, which reduces the computational complexity.

[Loss function formula not recoverable from the source text.]

S107: Perform entity alignment according to the multimodal embedding vector.

In some embodiments, the multimodal embedding vectors are obtained through multiple rounds of training. Specifically, L2 normalization is used to constrain the embeddings of all entities and adjust the embedding vectors. The parameters are initialized with the Xavier initializer, and the loss function is optimized by Adadelta to simplify the calculation. In addition to the embF of all entities, it is also necessary to calculate the similarity of all pairs of entities in the bipartite graph and rank them with the loss function $L_{ea}$, which is as follows:

$L_{ea} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{1}{\alpha} \log\left(1 + \sum_{m \neq i} e^{\alpha S_{mi}}\right) + \frac{1}{\alpha} \log\left(1 + \sum_{n \neq i} e^{\alpha S_{in}}\right) - \log\left(1 + \beta S_{ii}\right) \right)$

where $\alpha$ and $\beta$ are temperature scales, $N$ is the number of seeds, and $S$ is the similarity matrix of the entity pairs.
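The alignment loss can be sketched in Python as follows. The sketch assumes that S is an N × N similarity matrix over the seed pairs, with S[i, i] the similarity of the i-th aligned pair and the off-diagonal entries the similarities of non-matching pairs; the function name `alignment_loss` and the toy matrices are hypothetical.

```python
import numpy as np

def alignment_loss(S, alpha, beta):
    """Alignment loss over an N x N similarity matrix S of seed pairs."""
    N = S.shape[0]
    off_diag = ~np.eye(N, dtype=bool)
    exp_S = np.where(off_diag, np.exp(alpha * S), 0.0)
    col_term = np.log1p(exp_S.sum(axis=0)) / alpha   # sum over m != i of exp(alpha * S[m, i])
    row_term = np.log1p(exp_S.sum(axis=1)) / alpha   # sum over n != i of exp(alpha * S[i, n])
    pos_term = np.log1p(beta * np.diag(S))           # reward for the matching pair S[i, i]
    return np.mean(col_term + row_term - pos_term)

S_good = np.array([[1.0, 0.0],
                   [0.0, 1.0]])   # matching pairs are the most similar
S_bad = np.array([[0.0, 1.0],
                  [1.0, 0.0]])    # non-matching pairs are the most similar

loss_good = alignment_loss(S_good, alpha=1.0, beta=1.0)
loss_bad = alignment_loss(S_bad, alpha=1.0, beta=1.0)
print(loss_good, loss_bad)
```

The loss is lower when each seed pair is more similar to its counterpart than to the other entities, which is the behaviour the ranking objective is meant to encourage.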

When the whole training process converges, entity alignment is performed by a nearest neighbor search algorithm based on embF.
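A nearest neighbor search over the final embeddings could look like the Python sketch below. It assumes a brute-force search with cosine similarity; the text does not specify the search algorithm, and the function name `align_entities` and the toy embeddings are illustrative.

```python
import numpy as np

def align_entities(emb_src, emb_tgt):
    """For each source entity, return the index of the most similar target entity."""
    src = emb_src / np.linalg.norm(emb_src, axis=1, keepdims=True)
    tgt = emb_tgt / np.linalg.norm(emb_tgt, axis=1, keepdims=True)
    sim = src @ tgt.T                 # cosine similarity matrix, shape (n_src, n_tgt)
    return sim.argmax(axis=1), sim

# Toy embeddings: source entity 0 points the same way as target 1, and source 1 as target 0.
emb_src = np.array([[1.0, 0.0],
                    [0.0, 2.0]])
emb_tgt = np.array([[0.0, 1.0],
                    [3.0, 0.1]])

matches, sim = align_entities(emb_src, emb_tgt)
print(matches)
```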

The following is the specific experimental data of this embodiment:

The experiment evaluates the performance of this embodiment on two common multimodal data sets, FB15K-DB15K and FB15K-YAGO15K. This embodiment uses cosine similarity to calculate the similarity between entities, and uses Hits@n, MR and MRR as metrics to evaluate all models. Hits@n indicates the proportion of correct entities ranked in the top n by similarity. MR (mean rank) is the average rank of the correct entity. MRR (mean reciprocal rank) is the average reciprocal rank of the correct entity.

In the experiment, several types of recent models are selected for comparison with the framework of this embodiment (DFMKE): two typical translation-based methods, TransE and IPTransE; two simple late fusion methods, MMKG and MMEA; and two recent methods, MultiKE and EVA. For methods that use the same data sets as this embodiment, the reported results are adopted directly. For the other methods, the experiments are repeated with the same hyperparameter settings given in the original papers.
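The three metrics can be computed from the rank of the correct entity for each test query. The short Python sketch below assumes 1-based ranks; the function name `ranking_metrics` and the example ranks are hypothetical and are not experimental results.

```python
def ranking_metrics(ranks, n=10):
    """ranks[k] is the 1-based position of the correct entity for query k."""
    hits_at_n = sum(r <= n for r in ranks) / len(ranks)   # share of queries ranked in the top n
    mr = sum(ranks) / len(ranks)                          # mean rank, lower is better
    mrr = sum(1.0 / r for r in ranks) / len(ranks)        # mean reciprocal rank, higher is better
    return hits_at_n, mr, mrr

# Hypothetical ranks for four queries, evaluated with Hits@1.
hits1, mr, mrr = ranking_metrics([1, 2, 4, 1], n=1)
print(hits1, mr, mrr)
```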

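[Table: Hits@1, Hits@10, MR and MRR results of TransE, IPTransE, MMKG, MMEA, MultiKE, EVA and DFMKE on FB15K-DB15K and FB15K-YAGO15K. The cell values are not recoverable from the source text.]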

It can be seen from the above table that this embodiment (DFMKE) scores highest on the three metrics Hits@1, Hits@10 and MRR, and has the lowest MR (a lower MR is better). That is to say, this embodiment (DFMKE) achieves higher entity alignment accuracy than the other prior art, and effectively solves the problem of inconsistency among multimodal knowledge expressions.

Referring to fig. 3, an embodiment of the present invention provides an alignment device for entity of a multimodal knowledge graph, which includes:

At least one processor 201;

At least one memory 202 for storing at least one program;

When the at least one program is executed by the at least one processor 201, the at least one processor 201 realizes the alignment method for entity of the multimodal knowledge graph shown in fig. 1.

The contents in the above-mentioned method embodiments are all applicable to this device embodiment. The specific functions realized by this device embodiment are the same as those of the above-mentioned method embodiments, and the beneficial effects achieved are also the same as those achieved by the above-mentioned method embodiments.

An embodiment of the present invention also provides a storage medium in which instructions executable by a processor are stored, and the instructions executable by the processor are used to implement the alignment method for entity of the multimodal knowledge graph shown in FIG. 1 when executed by the processor.

In some alternative embodiments, the functions/operations mentioned in the block diagram may occur out of the order mentioned in the operation diagram. For example, depending on the functions/operations involved, two blocks shown in succession can actually be executed substantially simultaneously or the blocks can sometimes be executed in the reverse order. In addition, the embodiments presented and described in the flowchart of the present invention are provided by way of example, with the aim of providing a more comprehensive understanding of the technology. The disclosed method is not limited to the operation and logic flow presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of a larger operation are independently performed.

Further, although the present invention is described in the context of functional modules, it should be understood that unless otherwise stated, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in separate physical devices or software modules. It can also be understood that a detailed discussion about the actual implementation of each module is not necessary for understanding the present invention. More specifically, considering the attributes, functions and internal relations of various functional modules in the device disclosed herein, the actual implementation of this module will be known within the engineer's conventional technology. Therefore, those skilled in the art can realize the invention set forth in the claims without undue experimentation by applying ordinary techniques. It is also to be understood that the specific concepts disclosed are only illustrative and not intended to limit the scope of the invention, which is determined by the appended claims and their full scope of equivalents.

If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, which essentially contributes to the prior art or part of the technical solution, can be embodied in the form of a software product, which is stored in a storage medium and includes a number of instructions to make a computer device (such as a personal computer, a server, or a network device, etc.) perform all or part of the steps of the methods described in various embodiments of the present invention. The aforementioned storage media include: U disk, mobile hard disk, Read-Only Memory (ROM), Random Access Memory (RAM), magnetic disk or optical disk and other media that can store program codes.

The logic and/or steps shown in the flowchart or described in other ways herein, for example, can be considered as a sequence table of executable instructions for realizing logical functions, and can be embodied in any computer-readable medium for use by or in connection with the instruction execution system, device or equipment (such as computer-based systems, systems including processors, or other systems that can fetch and execute instructions from instruction execution systems, devices, or equipment). For the purposes of this specification, a "computer readable medium" can be any device that can contain, store, communicate, propagate or transport programs for use by or in connection with instruction execution systems, devices or equipment.

More specific examples (non-exhaustive list) of computer-readable media include the following: an electrical connection part (electronic device) with one or more wires, a portable computer diskette (magnetic device), a RAM, a ROM, an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable CD-ROM. In addition, the computer-readable medium can even be paper or other suitable media on which the program can be printed, because the program can be obtained electronically by, for example, optically scanning the paper or other media, then editing, interpreting or processing in other suitable ways if necessary, and then stored in the computer memory.

It should be understood that various parts of the present invention can be implemented by hardware, software, firmware or their combination. In the above embodiments, a plurality of steps or methods can be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system.

For example, if it is implemented by hardware, as in another embodiment, it can be implemented by any one or their combination of the following technologies known in the art: discrete logic circuit with logic gates for realizing logic functions on data signals, application specific integrated circuit with appropriate combinational logic gates, programmable gate array (PGA), field programmable gate array (FPGA), etc.

In the above description of this specification, the description referring to the terms "one embodiment/example", "another embodiment/example" or "some embodiments/examples" means that the specific features or characteristics described in connection with the embodiments or examples are included in at least one embodiment or example of the present invention. In this specification, the schematic expressions of the above terms do not necessarily refer to the same embodiment or example.

Furthermore, the specific features or characteristics described may be combined in any one or more embodiments or examples in a suitable manner.

Although the embodiments of the present invention have been shown and described, those skilled in the art can understand that many changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and purposes of the present invention, and the scope of the present invention is defined by the claims and their equivalents.

The preferred embodiment of the present invention has been specifically described above, but the present invention is not limited to the described embodiment. Those skilled in the art can make various equivalent modifications or substitutions without violating the spirit of the present invention, and these equivalent modifications or substitutions are included in the scope defined in the claims of this application.

Claims

CLAIMS

1. An alignment method for entity of multimodal knowledge graph, characterized by comprising: acquiring data of a first multimodal knowledge graph and a second multimodal knowledge graph; respectively extracting entities to be aligned from the first multimodal knowledge graph and the second multimodal knowledge graph; processing multimodal data of the entity to obtain modal vectors of the entity, wherein the multimodal data comprises image data, relationship data, attribute data and knowledge graph structure data; each modal vector comprises an image embedding vector, a relationship embedding vector, an attribute embedding vector and a knowledge graph structure vector; according to the modal vectors, carrying out early fusion through the fully connected neural network model; according to the modal vectors, carrying out late fusion through a low-rank multimodal model; combining the early fusion results with the late fusion results to obtain the multi-modal embedding vector; performing entity alignment according to the multimodal embedding vector.

2. The alignment method for entity of multimodal knowledge graph according to claim 1, characterized in that processing the image data of the entity to obtain the image embedding vector of the entity comprises: using a pre-trained RESNET model to extract features of the acquired image data; processing the extracted features by a first preset function to obtain an image embedding vector.

3. The alignment method for entity of multimodal knowledge graph according to claim 1, characterized in that processing the relationship data of the entity to obtain the relationship embedding vector of the entity includes: converting the obtained relationship data into translation vectors through a TransE model; calculating the structural similarity of the translation vector through a second preset function to obtain a logistic regression loss function; converging the logistic regression loss function to obtain the relationship embedding vector.

4. The alignment method for entity of multimodal knowledge graph according to claim 1, characterized in that processing the attribute data of the entity to obtain the attribute embedding vector of the entity includes: mapping the obtained attribute data to a low-dimensional space through a feedforward network to obtain an attribute embedding vector.

5. The alignment method for entity of multimodal knowledge graph according to claim 1, characterized in that processing the knowledge graph structure data of the entity to obtain the structure embedding vector of the entity includes: establishing a semi-supervised embedding model based on graph convolution network; setting the relationship vertex; processing the relationship vertices by the semi-supervised embedding model to obtain a structural embedding vector.

6. The alignment method for entity of multimodal knowledge graph according to claim 2, characterized in that the early fusion includes: establishing a fully connected neural network model; fusing all features extracted by the RESNET model by the fully connected neural network model.

7. The alignment method for entity of multimodal knowledge graph according to claim 1, characterized in that the late fusion includes: simplifying the vector representation of multimodal fusion by a low-rank multimodal fusion model; simplifying the vector representation in a preset way.

8. The alignment method for entity of multimodal knowledge graph according to claim 1, characterized in that combining the early fusion and the late fusion specifically includes: combining the early fusion and the late fusion through collaborative training according to a preset loss function.

9. An alignment device for entity of multimodal knowledge graph, characterized by comprising: at least one processor; at least one memory for storing at least one program; when the at least one program is executed by the at least one processor, the at least one processor implements the alignment method for entity of the multimodal knowledge graph according to any one of claims 1-8.

10. A computer-readable storage medium, storing instructions executable by a processor, characterized in that the instructions executable by the processor are used to implement the alignment method for entity of the multimodal knowledge graph according to claim 1.
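
The late fusion recited in claims 1 and 7 can be illustrated with a minimal sketch. The claims do not fix the exact formulation, so the sketch assumes the commonly used low-rank scheme in which each modal vector is extended with a constant 1, projected by rank-specific factor matrices, multiplied element-wise across modalities and summed over the rank. The function names, the factor layout and the toy values are hypothetical, and the random factors stand in for parameters that would normally be learned.

```python
import random


def low_rank_fusion(modal_vectors, factors):
    """Fuse modality vectors with rank-R factors instead of a full weight tensor.

    modal_vectors: list of M vectors (lists of floats), one per modality.
    factors: factors[m][r] is a matrix with len(modal_vectors[m]) + 1 rows
             and d_out columns (hypothetical layout chosen for this sketch).
    """
    rank = len(factors[0])
    d_out = len(factors[0][0][0])
    fused = [0.0] * d_out
    for r in range(rank):
        # Element-wise product over modalities of the projected vectors.
        prod = [1.0] * d_out
        for z, modality_factors in zip(modal_vectors, factors):
            z_aug = z + [1.0]  # the appended 1 keeps single-modality terms
            w = modality_factors[r]
            proj = [sum(z_aug[i] * w[i][o] for i in range(len(z_aug)))
                    for o in range(d_out)]
            prod = [p * q for p, q in zip(prod, proj)]
        fused = [f + p for f, p in zip(fused, prod)]
    return fused


def full_tensor_fusion(z1, z2, factors):
    """Reference for two modalities: apply the full weight tensor explicitly."""
    a, b = z1 + [1.0], z2 + [1.0]
    rank = len(factors[0])
    d_out = len(factors[0][0][0])
    out = [0.0] * d_out
    for i in range(len(a)):
        for j in range(len(b)):
            for o in range(d_out):
                # Entry (i, j, o) of the weight tensor rebuilt from its factors.
                w_ijo = sum(factors[0][r][i][o] * factors[1][r][j][o]
                            for r in range(rank))
                out[o] += w_ijo * a[i] * b[j]
    return out


def random_factors(dims, rank, d_out, rng):
    """Random stand-ins for learned factors; real factors would be trained."""
    return [[[[rng.uniform(-1, 1) for _ in range(d_out)]
              for _ in range(d + 1)]
             for _ in range(rank)]
            for d in dims]


rng = random.Random(0)
# Toy image, relationship, attribute and structure vectors (made-up values).
modal_vectors = [[0.2, 0.7], [0.5, 0.1, 0.3], [0.9], [0.4, 0.6]]
factors = random_factors([len(z) for z in modal_vectors], rank=3, d_out=4, rng=rng)
fused = low_rank_fusion(modal_vectors, factors)
print(len(fused))  # prints 4
```

Under these assumptions the fused vector is computed without ever building the full multimodal weight tensor, whose size grows with the product of the modality dimensions. The function full_tensor_fusion is included only to check, for two modalities, that the factorized computation gives the same result as the explicit tensor.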
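
The final step of claim 1, performing entity alignment according to the multimodal embedding vector, can be sketched as a nearest-neighbour search. The claims do not name a similarity measure, so cosine similarity is an assumption made here for illustration. The entity names and the three-dimensional embeddings are hypothetical toy values, not data from the invention.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def align_entities(emb_a, emb_b):
    """Map each entity of the first graph to its most similar entity in the second."""
    return {
        name_a: max(emb_b, key=lambda name_b: cosine(vec_a, emb_b[name_b]))
        for name_a, vec_a in emb_a.items()
    }


# Hypothetical multimodal embedding vectors for two small knowledge graphs.
emb_a = {"Paris_en": [0.9, 0.1, 0.0], "Berlin_en": [0.1, 0.8, 0.2]}
emb_b = {"Paris_fr": [1.0, 0.2, 0.1], "Berlin_de": [0.0, 0.9, 0.3]}

alignment = align_entities(emb_a, emb_b)
print(alignment)
```

This greedy matching may assign two entities of the first graph to the same entity of the second graph; a one-to-one constraint or a similarity threshold could be added where that matters.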