CN110765276A - Entity alignment method and device in knowledge graph - Google Patents

Entity alignment method and device in knowledge graph

Info

Publication number
CN110765276A
CN110765276A
Authority
CN
China
Prior art keywords
entity
entities
training
similarity
available information
Prior art date
Legal status
Pending
Application number
CN201911001804.2A
Other languages
Chinese (zh)
Inventor
姜旭
李嘉琛
Current Assignee
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd
Priority claimed from CN201911001804.2A
Publication of CN110765276A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367: Ontology
    • G06F 16/35: Clustering; Classification

Abstract

The invention provides a method and a device for aligning entities in a knowledge graph. The method comprises the following steps: acquiring a plurality of entities from a plurality of platforms as an entity training set; generating features for collaborative training from the available information related to each entity in the training set, wherein a feature indicates the similarity between the same type of available information across entities; and training a collaborative-training-based model on these features, and judging whether an entity pair to be processed is synonymous according to the trained model. The method and device solve the problem in the related art that knowledge-representation-learning-based methods depend on a large amount of labeled data and therefore align entities poorly.

Description

Entity alignment method and device in knowledge graph
Technical Field
The invention relates to the field of computers, in particular to a method and a device for aligning entities in a knowledge graph.
Background
In the task of constructing a large-scale knowledge base, a large amount of entity data from multi-source knowledge bases needs to be processed. At the beginning of construction, a knowledge description system is first established, and the entity data are mounted in that system. Entities with the same entry name may refer to the same real-world thing, or to two different things.
In the existing practical operation process, entity fusion is mainly carried out through two methods:
1) Traditional entity alignment
The traditional entity alignment method is mainly realized through attribute-similarity matching, using supervised machine learning models such as decision trees, support vector machines, and ensemble learning: the cross-platform entity alignment relationship is inferred from the entities' attribute information and attribute similarity. Because attribute types differ, different attribute-similarity functions must be designed, and they must be redesigned for each new domain. This alignment method therefore has the following disadvantages: a) it is labor-intensive; b) it is difficult to migrate between domains; c) because attributes are represented discretely, this calculation ignores the semantic similarity of attributes, which limits the effect of entity alignment.
2) Learning based on knowledge representation
By mapping the entities and relations in the knowledge graph to vectors in a low-dimensional space, the similarity between entities is calculated directly with a mathematical expression, as in the TransE method.
Knowledge representation learning is a method that models the entities and relations in a knowledge graph as low-dimensional vectors, on which computation and inference are then performed. TransE is the earliest knowledge representation learning model. It represents each triple (h, r, t) by treating the relation r as a translation vector from the head entity h to the tail entity t. TransE expects the tail entity t to be as close as possible to the sum of the head entity h and the relation r, i.e., h + r ≈ t; it defines the loss function ||h + r - t|| under the L1 or L2 norm and updates the model parameters by stochastic gradient descent. Traditional triple-modeling methods for training a knowledge base have too many parameters, making the models overly complex, hard to interpret, computationally expensive, and prone to over- or under-fitting. TransE, as a simple model that embeds entities and relations into a low-dimensional vector space, overcomes the complex training and excessive parameters of the traditional methods. Although the TransE model works well on large-scale datasets, it can only model one-to-one relations, not one-to-many, many-to-one, or many-to-many relations. Therefore, many improved variants of TransE have appeared, such as TransH, TransR, TransSparse, TransA, HTransA, and PTransE. Compared with TransE, these newer Trans-series models can model more complex entity relations in the knowledge base, such as one-to-many, many-to-one, and many-to-many relations.
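As a concrete illustration, the TransE score ||h + r - t|| and its margin-based ranking loss can be sketched in a few lines of NumPy. This is a minimal sketch of the scoring and loss only, not the full stochastic-gradient training loop; the toy vectors are illustrative:

```python
import numpy as np

def transe_score(h, r, t, norm=1):
    """TransE plausibility score ||h + r - t|| (lower = more plausible)."""
    return np.linalg.norm(h + r - t, ord=norm)

def margin_loss(pos_score, neg_score, margin=1.0):
    """Margin-based ranking loss over a true triple and a corrupted one."""
    return max(0.0, margin + pos_score - neg_score)

# Toy 3-d embeddings with t = h + r, so the triple scores ~0 (maximally plausible).
h = np.array([0.1, 0.2, 0.3])
r = np.array([0.4, 0.0, -0.1])
t = np.array([0.5, 0.2, 0.2])
print(transe_score(h, r, t))
```

During training, `neg_score` would be computed on a corrupted triple (head or tail replaced by a random entity), and the margin pushes true triples to score lower than corrupted ones.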
For example, the TransH, TransR, and TransSparse models project the head entity h and the tail entity t onto another hyperplane; TransA and HTransA obtain an optimal loss function adaptively from local features, without specifying a closed candidate set of parameter values in advance; PTransE is a path-based distributed representation method that represents entities, relations, and paths in a low-dimensional vector space.
The entity alignment relationship inference methods above are all single-network inference algorithms; in recent years, cross-network relationship inference algorithms based on knowledge representation learning have appeared. However, directly applying a knowledge representation learning algorithm to the entity alignment task as multi-network joint representation learning cannot achieve a satisfactory effect. Because entity alignment is a special cross-network relationship, a joint representation learning model oriented to entity alignment must be analyzed and designed according to the characteristics of the alignment relationship. Methods that perform entity alignment with knowledge representation learning, such as Cross-KG and SEEA, have therefore been proposed and achieve good results. Cross-KG was the first to propose jointly learning two knowledge graphs, so that the complementary information of the two data sources can be used for relationship inference. However, this approach has the following disadvantages: a) it models semantic information only through knowledge representation learning and ignores the structural attribute information of the knowledge graph; b) knowledge-representation-based learning methods depend on a large amount of labeled data; c) structured information such as attributes in the knowledge graph is not utilized, which limits the effect of entity alignment.
In view of the above problems in the related art, no effective solution exists at present.
Disclosure of Invention
The embodiment of the invention provides an entity alignment method and device in a knowledge graph, which at least solve the problem in the related art that knowledge-representation-learning methods depend on a large amount of labeled data and align entities poorly.
According to an embodiment of the invention, there is provided a method for aligning entities in a knowledge-graph, including: acquiring a plurality of entities from a plurality of platforms as entity training sets; generating a feature for performing collaborative training according to available information related to each entity in the entity training set, wherein the feature is used for indicating similarity between the same type of available information in a plurality of entities; and training the model based on the collaborative training according to the characteristics, and judging whether the entity pair to be processed is synonymous according to the model obtained by training.
According to another embodiment of the present invention, there is provided an entity alignment apparatus in a knowledge-graph, including: the acquisition module is used for acquiring a plurality of entities from a plurality of platforms as an entity training set; a generating module, configured to generate a feature for performing collaborative training according to available information related to each entity in the entity training set, where the feature is used to indicate a similarity between the same type of available information in multiple entities; and the alignment module is used for training the model based on the collaborative training according to the characteristics and judging whether the entity pair to be processed is synonymous according to the model obtained by training.
According to a further embodiment of the present invention, there is also provided a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
According to yet another embodiment of the present invention, there is also provided an electronic device, including a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.
According to the invention, a plurality of entities are acquired from a plurality of platforms as an entity training set, and features for collaborative training are generated from the available information related to each entity in the training set, the features indicating the similarity between the same type of available information across entities. A collaborative-training-based model is then trained on these features, and whether an entity pair to be processed is synonymous is judged with the trained model. This mines the connections between entities in different structured knowledge bases and performs semantic disambiguation between entities in the knowledge bases, solving the problem in the related art that knowledge-representation-learning-based methods depend on a large amount of labeled data and align entities poorly, and improving the effect of entity alignment.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
fig. 1 is a block diagram of a hardware structure of a terminal of an entity alignment method in a knowledge graph according to an embodiment of the present invention;
FIG. 2 is a flow diagram of a method of entity alignment in a knowledge-graph according to an embodiment of the invention;
FIG. 3 is a block diagram of an entity alignment apparatus in a knowledge-graph according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
First, the terms used in the present application are explained:
Knowledge graph: a knowledge graph is formalized as a triple KG = (E, R, F), where E = {e1, e2, …, eNe} is the set of entities, including instances and the values of their attributes; R = {r1, r2, …, rNr} is the set of binary relations describing the relationships between entities; and F ⊆ E × R × E is the set of fact triples.
Knowledge graph entity alignment: given two knowledge graphs KG1 and KG2, find all entities in KG1 (respectively KG2) that can be aligned to entities in KG2 (respectively KG1), i.e., Align(KG1, KG2) = {(e, e′) | e ∈ E1, e′ ∈ E2, e and e′ refer to the same object}.
The framework of the semi-supervised collaborative-training entity alignment method: a semi-supervised learning algorithm mainly comprises two key parts, model training and training-sample updating, which it performs iteratively.
In the training step, each labeled entity pair in the training set generates features for the model to learn from; in the sample-updating step, the learned model predicts whether unlabeled entity pairs are synonymous, and the pairs with high classification confidence, together with their predicted labels, are added to the training set. Semi-supervised learning iterates these two steps until a stopping condition is met, such as reaching the maximum number of iterations or exhausting the unlabeled data set.
Collaborative training (co-training) is one kind of semi-supervised method. Its core idea is: in the model training stage, divide the feature space into two relatively independent parts (views) and train a classification model on each view; in the sample-updating stage, add the samples classified with high confidence by each model to the training sample set of the other model.
Example 1
The method provided by the first embodiment of the present application may be executed in a terminal, a computer terminal, or a similar computing device. Taking the example of running on a terminal, fig. 1 is a hardware structure block diagram of the terminal of an entity alignment method in a knowledge graph according to an embodiment of the present invention. As shown in fig. 1, the terminal 10 may include one or more (only one shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data, and optionally may also include a transmission device 106 for communication functions and an input-output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the terminal. For example, the terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used for storing computer programs, for example, software programs and modules of application software, such as a computer program corresponding to the entity alignment method in the knowledge graph in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the computer programs stored in the memory 104, thereby implementing the above-mentioned methods. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
In this embodiment, there is provided an entity alignment method in a knowledge graph running on the terminal, and fig. 2 is a flowchart of the entity alignment method in the knowledge graph according to the embodiment of the present invention, as shown in fig. 2, the flowchart includes the following steps:
step S202, acquiring a plurality of entities from a plurality of platforms as entity training sets;
step S204, generating characteristics for performing collaborative training according to available information related to each entity in the entity training set, wherein the characteristics are used for indicating the similarity between the same type of available information in a plurality of entities;
and S206, training the model based on the collaborative training according to the characteristics, and judging whether the entity pair to be processed is synonymous according to the model obtained by training.
Through steps S202 to S206, a plurality of entities are acquired from a plurality of platforms as an entity training set, and features for collaborative training are generated from the available information related to each entity in the training set, the features indicating the similarity between the same type of available information across entities. A collaborative-training-based model is trained on these features, and whether an entity pair to be processed is synonymous is judged with the trained model. This mines the connections between entities in different structured knowledge bases and performs semantic disambiguation between entities in the knowledge bases, solving the problem in the related art that knowledge-representation-learning-based methods depend on a large amount of labeled data and align entities poorly, and improving the effect of entity alignment.
Optionally, the manner of acquiring a plurality of entities from a plurality of platforms as an entity training set in step S202 in this embodiment includes:
step S202-11, extracting available information of a plurality of entities of a plurality of platforms, wherein the available information at least comprises one of the following information: entity name, text contained in the entity, key discrete value and entity attribute;
and S202-12, taking the entity with the extracted available information as the entity in the entity training set.
Optionally, the manner for generating the feature for performing the collaborative training according to the available information related to each entity in the entity training set in step S204 in this embodiment includes:
step S204-11, determining similarity among the entity names of a plurality of entities; or, determining the similarity between the titles and the texts in the texts contained in the entities of the plurality of entities and the similarity between 2 texts in each combination after the attributes are combined; or, determining similarity between 2 key discrete values in the key discrete value sets of the plurality of entities; or, determining attributes of a plurality of entities, extracting 2-dimensional features, and determining the similarity of the attributes of the 2 entities;
and S204-12, taking the similarity as a characteristic for carrying out collaborative training.
Optionally, in this embodiment, the method for training the model based on collaborative training according to the features, which is involved in step S206, may include:
step S206-11, dividing the features into a text view and a key discrete value view, wherein the entity name and the text contained in the entity are divided into the text view; dividing the attribute and the key discrete value into the key discrete value view;
and S206-12, training the model based on the collaborative training based on the text view and the key discrete value view.
It should be noted that the model in this embodiment may be chosen as a binary classifier.
The present application is now illustrated with an alternative embodiment.
Entity alignment is modeled as a constrained binary classification problem; key information such as entity names, attributes, microblog text content, and the time and numeric values they contain is fully utilized, in combination, to generate multidimensional features. The features are divided into two relatively independent views, and through the collaborative training of classifiers on the two views, the distribution of synonymous entities is iteratively learned from unlabeled data.
To achieve the above object, the method steps of this alternative embodiment:
step S11, perform data preprocessing on the 2 platforms, including entity information extraction, attribute value normalization, extraction and normalization of time values and numerical values in the text, and the like.
Step S12, generating synonymous-entity candidate pairs based on an inverted index.
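Candidate generation with an inverted index, as in step S12, can be sketched as follows. This is a minimal illustration under assumptions: the entity data, the whitespace tokenization, and the token-overlap candidate rule are all hypothetical simplifications, not the patent's actual index:

```python
from collections import defaultdict

def build_inverted_index(entities):
    """Map each name token to the set of entity ids whose name contains it."""
    index = defaultdict(set)
    for eid, name in entities.items():
        for token in name.lower().split():
            index[token].add(eid)
    return index

def candidates(name, index):
    """Candidate synonymous entities: all entities sharing at least one token."""
    result = set()
    for token in name.lower().split():
        result |= index.get(token, set())
    return result

# Hypothetical target knowledge base D2.
d2 = {1: "Beijing Mininglamp Software", 2: "Shanghai Data Corp", 3: "Mininglamp System"}
idx = build_inverted_index(d2)
print(candidates("Mininglamp Software System", idx))
```

Looking up tokens instead of scanning all entities keeps candidate generation linear in the query's token count, which is why the inverted index is a natural fit here.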
And step S13, comprehensively utilizing each type of information to generate characteristic representation for the candidate entity pair.
Step S14, training the co-trained classifier and using the learned model to determine whether the candidate entity pair is synonymous.
The specific implementation of the above steps S11 to S14 is performed in the following order:
1. data pre-processing
1) Feature extraction
Taking a microblog account as an example of an entity: in microblog account data, each microblog user generally represents one entity, so the available information of the entity mainly comprises the entity name, attributes, microblog post texts, and the key discrete values (time values, numeric values, hyperlinks) of each part.
The name of an entity is an important identifier: matching on names alone already yields a strong reference result, but it cannot resolve the synonymy and ambiguity in names. Attributes are a structured description of some basic information of the entity and can usually be extracted from the microblog account information. The microblog text is a concrete description of the entity and generally comprises two parts, a title and a body. Finally, the times, numeric values, and hyperlinks appearing in each text (the so-called key discrete values) are highly discriminative when deciding whether entities are synonymous; the time and numeric values can be extracted by regular-expression matching, with normalization of time expressions and numeric units.
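The regular-expression extraction and normalization of time and numeric values mentioned above might look like the following sketch. The patterns here are illustrative assumptions (a date with `-`/`/` or Chinese 年月日 separators, and bare numbers), not the patent's actual rules:

```python
import re

# Hypothetical patterns: dates like 2019/10/21 or 2019年10月21日, and bare numbers.
DATE_PAT = re.compile(r"(\d{4})[-/年](\d{1,2})[-/月](\d{1,2})日?")
NUM_PAT = re.compile(r"\d+(?:\.\d+)?")

def extract_discrete_values(text):
    """Extract dates (normalized to YYYY-MM-DD) and standalone numbers."""
    dates = ["%04d-%02d-%02d" % tuple(map(int, m)) for m in DATE_PAT.findall(text)]
    # Remove date matches first so their digits are not re-counted as numbers.
    stripped = DATE_PAT.sub(" ", text)
    numbers = NUM_PAT.findall(stripped)
    return {"dates": dates, "numbers": numbers}

print(extract_discrete_values("Released 2019/10/21, price 35.5 yuan"))
```

Normalizing both date formats to a single canonical form is what makes the later set comparison between the two platforms meaningful.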
2) Feature engineering
Features are generated from four types of information (entity names, microblog text, key discrete values, and attributes) for subsequent model training.
The features related to the entity name are 2-dimensional: whether the names of the candidate entity pair match exactly, and the similarity between the names, which can be computed with the edit distance.
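The 2-dimensional name feature could be sketched as below. The normalization of the edit distance into a similarity is one plausible choice; the text does not fix a formula:

```python
def edit_distance(a, b):
    """Levenshtein edit distance via single-row dynamic programming."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            # prev holds dp[i-1][j-1]; dp[j] still holds dp[i-1][j].
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def name_features(n1, n2):
    """2-d name feature: exact-match flag + normalized edit-distance similarity."""
    sim = 1.0 - edit_distance(n1, n2) / max(len(n1), len(n2), 1)
    return [float(n1 == n2), sim]

print(name_features("mininglamp", "minninglamp"))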
The text of each entity comprises three types: title, body, and attribute text. Fully combining the three types across the two entities and computing the similarity of the two text segments in each combination yields 9-dimensional features; the similarity is computed with conventional cosine similarity.
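The 9-dimensional text feature can be sketched by pairing the three text types of both entities and scoring each pair with cosine similarity over bag-of-words counts. The whitespace tokenization and the dictionary layout of an entity's texts are simplifying assumptions:

```python
import math
from collections import Counter
from itertools import product

def cosine(a, b):
    """Cosine similarity between two texts as bag-of-words count vectors."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def text_features(e1, e2):
    """9-d feature: cosine similarity for every (title|body|attr) combination."""
    parts = ["title", "body", "attr"]
    return [cosine(e1[p], e2[q]) for p, q in product(parts, parts)]

e1 = {"title": "deep learning intro", "body": "neural networks", "attr": "ai course"}
e2 = {"title": "intro to deep learning", "body": "neural nets", "attr": "ai course"}
feats = text_features(e1, e2)
print(len(feats))  # 9
```

Computing all 3×3 combinations, rather than only same-type pairs, lets the model exploit cases where, say, one platform's title matches the other platform's body.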
The key discrete values come from different descriptive texts, so for each type of key value a feature is likewise generated for each combination of text parts: the discrete values in each text part form a set, and the similarity between the two key-discrete-value sets S1 and S2 is computed and used as a feature.
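The text does not specify which set similarity is used between S1 and S2; Jaccard similarity is a common choice and serves here purely as an illustration:

```python
def jaccard(s1, s2):
    """Jaccard similarity |S1 ∩ S2| / |S1 ∪ S2| of two discrete-value sets."""
    if not s1 and not s2:
        return 0.0
    return len(s1 & s2) / len(s1 | s2)

# Normalized dates and numbers extracted from two entities' texts.
s1 = {"2019-10-21", "35.5"}
s2 = {"2019-10-21", "40"}
print(jaccard(s1, s2))
```

Because the values are normalized strings, exact set intersection suffices; no fuzzy matching is needed at this stage.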
Finally, 2-dimensional features are extracted from the structured attributes, measuring respectively the attribute similarity and the attribute dissimilarity of the two entities.
2. Learning process
1) View partitioning of feature space
The extracted features are divided into a text view and a key discrete value view: the entity-name features and the text-description features are textual in both form and calculation method, so these two groups are assigned to the text view; the attribute features and the key-discrete-value features are uniformly assigned to the key discrete value view.
2) Collaborative training process
Given the two knowledge bases D1 and D2 to be aligned, a labeled set L = {(<ei, ej>, label) | ei ∈ D1, ej ∈ D2, label ∈ {0, 1}}, and an unlabeled set U = {<ei, ej> | ei ∈ D1, ej ∈ EC}, where EC is the set of candidate synonymous entities of ei in D2 (obtainable by retrieval with the inverted index), the collaborative-training-based entity alignment method is as follows.
Entity alignment algorithm based on cooperative training:
Input: U, L, the maximum number of iterations Nmax, the number Npos of positive samples added to the training set per iteration, and the ratio r of added positive to negative samples.
Output: classifier f1 (on the text view) and classifier f2 (on the key discrete value view).
Initialize: training sets L1 = L and L2 = L; iteration counter Niter = 0.
1) Train classifier f1 on the text view with training set L1, and classifier f2 on the key discrete value view with training set L2.
2) Classify the entity pairs in U with f1 and f2, respectively.
3) Take the entity pairs in U on which the classification labels of f1 and f2 agree to form a set S; select from S the Npos positive samples with the highest classification confidence and the Npos/r negative samples with the highest confidence, add them, with their predicted labels, to the training sets L1 and L2, and remove the selected samples from U.
4) Set Niter = Niter + 1; if Niter < Nmax and U is not empty, repeat steps 1)-3); otherwise, stop iterating.
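The four steps above can be sketched as a runnable loop. This is a minimal sketch under stated assumptions: a nearest-centroid model stands in for the unspecified binary classifiers, and synthetic 2-dimensional features stand in for the real text-view and key-discrete-value-view features:

```python
import numpy as np

class CentroidClassifier:
    """Minimal probabilistic binary classifier (stand-in for any real model)."""
    def fit(self, X, y):
        self.c0, self.c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
        return self
    def proba_pos(self, X):
        d0 = np.linalg.norm(X - self.c0, axis=1)
        d1 = np.linalg.norm(X - self.c1, axis=1)
        return d0 / (d0 + d1 + 1e-12)  # nearer the positive centroid -> higher

def co_train(X1, X2, y, labeled, unlabeled, n_max=5, n_pos=2, ratio=2):
    """Steps 1)-4): two classifiers on independent views label unlabeled pairs."""
    y = np.array(y)
    L1, L2, U = list(labeled), list(labeled), list(unlabeled)
    f1, f2 = CentroidClassifier(), CentroidClassifier()
    for _ in range(n_max):                        # step 4: until Nmax or U empty
        if not U:
            break
        f1.fit(X1[L1], y[L1])                     # step 1: train on each view
        f2.fit(X2[L2], y[L2])
        p1, p2 = f1.proba_pos(X1[U]), f2.proba_pos(X2[U])   # step 2: classify U
        lab1, lab2 = (p1 > 0.5).astype(int), (p2 > 0.5).astype(int)
        agree = [i for i in range(len(U)) if lab1[i] == lab2[i]]  # step 3: set S
        if not agree:
            break
        conf = (p1 + p2) / 2
        pos = sorted((i for i in agree if lab1[i] == 1), key=lambda i: -conf[i])[:n_pos]
        neg = sorted((i for i in agree if lab1[i] == 0), key=lambda i: conf[i])[:max(1, n_pos // ratio)]
        for i in pos + neg:
            y[U[i]] = lab1[i]                     # accept the predicted label
            L1.append(U[i]); L2.append(U[i])
        U = [u for i, u in enumerate(U) if i not in set(pos + neg)]
    return f1, f2

# Toy demo: two noisy 2-d views of the same binary alignment signal.
rng = np.random.default_rng(0)
y_true = np.tile([0, 1], 30)
X1 = y_true[:, None] + 0.3 * rng.standard_normal((60, 2))
X2 = y_true[:, None] + 0.3 * rng.standard_normal((60, 2))
y_init = np.where(np.arange(60) < 10, y_true, -1)   # only 10 labeled pairs
f1, f2 = co_train(X1, X2, y_init, list(range(10)), list(range(10, 60)))
acc = ((f1.proba_pos(X1) > 0.5).astype(int) == y_true).mean()
print(round(float(acc), 2))
```

Because the selected pairs must be labeled consistently by both views before being accepted (the agreement set S), mistakes of either single classifier are less likely to be propagated as pseudo-labels.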
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
In this embodiment, an entity alignment apparatus in a knowledge graph is further provided, and the apparatus is used to implement the foregoing embodiments and preferred embodiments, which have already been described and are not described again. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 3 is a block diagram of an entity alignment apparatus in a knowledge-graph according to an embodiment of the present invention, as shown in fig. 3, the apparatus includes: an obtaining module 32, configured to obtain a plurality of entities from a plurality of platforms as an entity training set; a generating module 34, configured to generate a feature for performing collaborative training according to available information related to each entity in the entity training set, where the feature is used to indicate a similarity between the same type of available information in multiple entities; and the alignment module 36 is configured to train the model based on collaborative training according to the features, and judge whether the entity pair to be processed is synonymous according to the model obtained through training.
Optionally, the obtaining module 32 in this embodiment may further include: an extracting unit, configured to extract available information of a plurality of entities of a plurality of platforms, where the available information includes at least one of: entity name, text contained in the entity, key discrete value and entity attribute; and the first processing unit is used for taking the entity with the extracted available information as an entity in the entity training set.
Optionally, the generating module 34 in this embodiment includes: a determining unit configured to determine the similarity between the entity names of the plurality of entities; or, after combining the titles, bodies, and attribute texts contained in the entities, to determine the similarity between the two text segments in each combination; or, to determine the similarity between the two key-discrete-value sets of the entities; or, to determine the attributes of the entities and extract 2-dimensional features measuring the attribute similarity of the two entities; and a second processing unit configured to use the similarities as features for the collaborative training.
Optionally, the alignment module 36 in this embodiment includes: the dividing unit is used for dividing the features into a text view and a key discrete value view, wherein the entity name and the text contained in the entity are divided into the text view; dividing the attribute and the key discrete value into a key discrete value view; and the training unit is used for training the model based on the collaborative training based on the text view and the key discrete value view.
Optionally, the model in this embodiment is a classifier.
It should be noted that the above modules may be implemented by software or hardware. In the latter case, the implementation may be, but is not limited to, the following: the modules are all located in the same processor; or the modules are located, in any combination, in different processors.
Example 3
Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
S1, acquiring a plurality of entities from a plurality of platforms as an entity training set;
S2, generating features for collaborative training according to the available information of each entity in the entity training set, wherein the features indicate the similarity between the same type of available information across entities;
and S3, training a collaborative-training-based model according to the features, and determining, according to the trained model, whether a pair of entities to be processed is synonymous.
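Steps S1 to S3 culminate in scoring candidate entity pairs across platforms. Below is a minimal sketch of that final scoring pass, using a stand-in name-similarity model in place of the trained classifier; the function names, threshold, and data are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product

def align(entities_a, entities_b, score, threshold=0.5):
    # Score every cross-platform candidate pair and keep those the model
    # judges synonymous (step S3). 'score' stands in for the trained classifier.
    return [(a["name"], b["name"])
            for a, b in product(entities_a, entities_b)
            if score(a, b) >= threshold]

# Hypothetical stand-in model: entity-name similarity only.
def name_score(a, b):
    return SequenceMatcher(None, a["name"], b["name"]).ratio()

platform_a = [{"name": "apple"}]
platform_b = [{"name": "apple"}, {"name": "banana"}]
pairs = align(platform_a, platform_b, name_score)
```

In practice the exhaustive `product` over all cross-platform pairs would be replaced by candidate blocking, since the number of pairs grows quadratically.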
Optionally, in this embodiment, the storage medium may include, but is not limited to: a USB flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media capable of storing a computer program.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, both of which are connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, acquiring a plurality of entities from a plurality of platforms as an entity training set;
S2, generating features for collaborative training according to the available information of each entity in the entity training set, wherein the features indicate the similarity between the same type of available information across entities;
and S3, training a collaborative-training-based model according to the features, and determining, according to the trained model, whether a pair of entities to be processed is synonymous.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations; details are not repeated here.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that described herein. Alternatively, they may be fabricated separately into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description covers only preferred embodiments of the present invention and is not intended to limit it; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention shall fall within its protection scope.

Claims (12)

1. A method for aligning entities in a knowledge graph, comprising:
acquiring a plurality of entities from a plurality of platforms as an entity training set;
generating features for collaborative training according to the available information of each entity in the entity training set, wherein the features indicate the similarity between the same type of available information across the entities;
and training a collaborative-training-based model according to the features, and determining, according to the trained model, whether a pair of entities to be processed is synonymous.
2. The method of claim 1, wherein obtaining a plurality of entities from a plurality of platforms as an entity training set comprises:
extracting available information of a plurality of entities of a plurality of platforms, wherein the available information comprises at least one of: an entity name, text contained in the entity, key discrete values, and entity attributes;
and using the entities whose available information has been extracted as the entities in the entity training set.
3. The method of claim 2, wherein generating features for collaborative training based on available information about each entity in the training set of entities comprises:
determining the similarity between the entity names of the plurality of entities; or determining the similarity between the titles and between the bodies of the texts contained in the plurality of entities, and the similarity between the two texts in each pairwise combination; or determining the similarity between every two key discrete values in the key discrete value sets of the plurality of entities; or determining the attributes of the plurality of entities, extracting two-dimensional features, and determining the similarity between the attributes of the two entities;
and using the similarities as features for the collaborative training.
4. The method of claim 3, wherein training the co-training based model according to the features comprises:
dividing the features into a text view and a key discrete value view, wherein the entity names and the texts contained in the entities are assigned to the text view, and the attributes and the key discrete values are assigned to the key discrete value view;
and training the collaborative-training-based model based on the text view and the key discrete value view.
5. The method of any one of claims 1 to 4, wherein the model is a classifier.
6. An apparatus for aligning entities in a knowledge graph, comprising:
the acquisition module is used for acquiring a plurality of entities from a plurality of platforms as an entity training set;
a generating module, configured to generate features for collaborative training according to the available information of each entity in the entity training set, where the features indicate the similarity between the same type of available information across the entities;
and an alignment module, configured to train a collaborative-training-based model according to the features, and to determine, according to the trained model, whether a pair of entities to be processed is synonymous.
7. The apparatus of claim 6, wherein the obtaining module comprises:
an extracting unit, configured to extract available information of a plurality of entities of a plurality of platforms, where the available information includes at least one of: an entity name, text contained in the entity, key discrete values, and entity attributes;
and a first processing unit, configured to use the entities whose available information has been extracted as the entities in the entity training set.
8. The apparatus of claim 7, wherein the generating module comprises:
a determining unit, configured to: determine the similarity between the entity names of the plurality of entities; or determine the similarity between the titles and between the bodies of the texts contained in the plurality of entities, and the similarity between the two texts in each pairwise combination; or determine the similarity between every two key discrete values in the key discrete value sets of the plurality of entities; or determine the attributes of the plurality of entities, extract two-dimensional features, and determine the similarity between the attributes of the two entities;
and a second processing unit, configured to use the similarities as features for the collaborative training.
9. The apparatus of claim 8, wherein the alignment module comprises:
a dividing unit, configured to divide the features into a text view and a key discrete value view, where the entity names and the texts contained in the entities are assigned to the text view, and the attributes and the key discrete values are assigned to the key discrete value view;
and a training unit, configured to train the collaborative-training-based model based on the text view and the key discrete value view.
10. The apparatus of any one of claims 6 to 9, wherein the model is a classifier.
11. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to carry out the method of any one of claims 1 to 5 when executed.
12. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 5.
CN201911001804.2A 2019-10-21 2019-10-21 Entity alignment method and device in knowledge graph Pending CN110765276A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911001804.2A CN110765276A (en) 2019-10-21 2019-10-21 Entity alignment method and device in knowledge graph


Publications (1)

Publication Number Publication Date
CN110765276A true CN110765276A (en) 2020-02-07

Family

ID=69331486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911001804.2A Pending CN110765276A (en) 2019-10-21 2019-10-21 Entity alignment method and device in knowledge graph

Country Status (1)

Country Link
CN (1) CN110765276A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541589A (en) * 2020-12-21 2021-03-23 福州大学 Text knowledge embedding method based on AHE alignment hyperplane
CN113722509A (en) * 2021-09-07 2021-11-30 中国人民解放军32801部队 Knowledge graph data fusion method based on entity attribute similarity

Citations (6)

Publication number Priority date Publication date Assignee Title
CN103853710A (en) * 2013-11-21 2014-06-11 北京理工大学 Coordinated training-based dual-language named entity identification method
CN106897403A (en) * 2017-02-14 2017-06-27 中国科学院电子学研究所 Towards the fine granularity Chinese attribute alignment schemes that knowledge mapping builds
CN107957991A (en) * 2017-12-05 2018-04-24 湖南星汉数智科技有限公司 A kind of entity attribute information extraction method and device relied on based on syntax
CN109710923A (en) * 2018-12-06 2019-05-03 浙江大学 Based on across the entity language matching process across media information
CN110046260A (en) * 2019-04-16 2019-07-23 广州大学 A kind of darknet topic discovery method and system of knowledge based map
CN110245131A (en) * 2019-06-05 2019-09-17 江苏瑞中数据股份有限公司 Entity alignment schemes, system and its storage medium in a kind of knowledge mapping




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200207