CN109960810B - Entity alignment method and device

Info

Publication number: CN109960810B
Application number: CN201910243854.5A
Authority: CN
Inventors: 何芙珍; 李直旭; 陈志刚
Original assignee: Iflytek Suzhou Technology Co Ltd
Current assignee: Iflytek Suzhou Technology Co Ltd
Priority date: 2019-03-28
Filing date: 2019-03-28
Publication date: 2020-05-19
Anticipated expiration: 2039-03-28
Also published as: CN109960810A

Abstract

The application discloses an entity alignment method and device. The method obtains a first entity pair set by iteratively executing two screening steps: screening target entity pairs from two knowledge graphs according to target attribute pairs, and screening new target attribute pairs from the two knowledge graphs according to the target entity pairs. Because each target entity pair in the first entity pair set is obtained from the attribute information of different entities, and the attribute information of an entity represents the entity more truthfully and comprehensively, performing entity alignment with the attribute information of entities can improve the accuracy of the entity alignment result.

Description

Entity alignment method and device

Technical Field

The present application relates to the field of knowledge graph technology, and in particular, to a method and an apparatus for entity alignment.

Background

With the continuous development of and breakthroughs in artificial intelligence, knowledge graphs (KGs) have attracted extensive attention as a technical foundation for realizing strong artificial intelligence in the future. At present, a knowledge graph is generally constructed from triples that are either collected from the semi-structured text of various encyclopedia websites or extracted from various unstructured texts by using information extraction technology. There are two types of triples: relationship triples and attribute triples.

Because the expressions of entities, relationships and attributes differ semantically across encyclopedia websites and unstructured texts from various sources, researchers have proposed the research topic of semantic integration, which aims to integrate different knowledge graphs into a unified, consistent and concise form and to provide semantic interoperability for applications that use different knowledge graphs. Entity alignment is an important prerequisite and technical means for semantic integration, and aims to find the entities in two heterogeneous knowledge graphs that point to the same object in the real world.

However, different knowledge graphs differ greatly in how they express the various information of entities and the relationship structures between entities, which makes entity alignment very challenging. Existing entity alignment techniques mainly perform entity alignment based on the relationships between entities, but the entity alignment results obtained in this way cannot achieve high accuracy.

Disclosure of Invention

The embodiments of the present application mainly aim to provide an entity alignment method and an entity alignment device, which can improve the accuracy of the entity alignment result when entity alignment is performed on two heterogeneous knowledge graphs.

The embodiment of the application provides an entity alignment method, which comprises the following steps:

determining each known target attribute pair in the two knowledge graphs as a reference attribute pair;

screening target entity pairs from the two knowledge graphs according to the reference attribute pairs;

screening new target attribute pairs from the two knowledge graphs according to the screened target entity pairs, taking them as the reference attribute pairs, and continuing to perform the step of screening target entity pairs from the two knowledge graphs according to the reference attribute pairs until no target entity pair can be screened out, so as to form a first entity pair set;

wherein the two attributes included in a target attribute pair are the same and belong to the two knowledge graphs respectively, and the two entities included in a target entity pair are the same and belong to the two knowledge graphs respectively.

Optionally, the screening out each target entity pair from the two knowledge graphs according to each reference attribute pair includes:

screening each target entity pair from the two knowledge graphs according to the attribute values of the two attributes included in each reference attribute pair.

Optionally, the screening out each target entity pair from the two knowledge graphs according to the attribute values of the two attributes included in each reference attribute pair includes:

determining each initial entity pair in the two knowledge graphs, wherein the initial entity pair has at least one reference attribute pair, each reference attribute pair corresponds to an attribute value similarity, and the attribute value similarity is the similarity between the attribute values of the two attributes included in the corresponding reference attribute pair;

and judging whether the initial entity pair belongs to the target entity pair or not according to the similarity of at least one attribute value of the initial entity pair.

Optionally, the determining, according to the similarity of at least one attribute value of the initial entity pair, whether the initial entity pair belongs to the target entity pair includes:

calculating an average value of the similarity of at least one attribute value of the initial entity pair;

and if the calculated average value is larger than a first preset threshold value, judging that the initial entity pair is the target entity pair.
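The averaging-and-threshold judgment above can be illustrated with a minimal sketch. It is not the claimed implementation: the similarity measure (`difflib.SequenceMatcher`), the function names and the threshold value 0.6 are all assumptions chosen for illustration, since the application does not fix them here.

```python
from difflib import SequenceMatcher
from statistics import mean


def value_similarity(a: str, b: str) -> float:
    """Assumed similarity measure: character-level matching ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def is_target_entity_pair(value_pairs, first_threshold=0.6):
    """Judge an initial entity pair from its attribute value similarities.

    value_pairs holds one (value_in_kg1, value_in_kg2) tuple for every
    reference attribute pair that the initial entity pair has.
    first_threshold stands in for the first preset threshold (hypothetical value).
    """
    if not value_pairs:
        # An initial entity pair must have at least one reference attribute pair.
        return False
    similarities = [value_similarity(a, b) for a, b in value_pairs]
    return mean(similarities) > first_threshold
```

For instance, `is_target_entity_pair([("Steve Jobs", "Steve Jobs")])` evaluates to `True`, because the single attribute value similarity is 1.0.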

Optionally, the screening out new target attribute pairs from the two knowledge graphs according to the screened out target entity pairs includes:

for each screened target entity pair, combining the candidate attributes under the target entity pair pairwise to obtain a candidate attribute pair under each combination, wherein the two attributes included in a candidate attribute pair do not belong to any determined target attribute pair and respectively belong to the two entities in the target entity pair;

calculating the similarity between the attribute values of the two attributes included in the candidate attribute pair;

and if the calculated similarity is greater than a second preset threshold, judging that the candidate attribute pair is a new target attribute pair.
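The pairwise combination described above can be sketched as follows. This is an illustration under stated assumptions: each entity is modeled as a dictionary from attribute name to attribute value, the similarity measure is `difflib.SequenceMatcher`, and the function name and the threshold value 0.8 are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product


def value_similarity(a: str, b: str) -> float:
    """Assumed similarity measure: character-level matching ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def new_target_attribute_pairs(attrs1, attrs2, known_pairs, second_threshold=0.8):
    """Screen new target attribute pairs from one target entity pair.

    attrs1 / attrs2 map attribute name -> attribute value for the two
    entities of the target entity pair; known_pairs holds the already
    determined target attribute pairs as (name_in_kg1, name_in_kg2).
    """
    used1 = {n1 for n1, _ in known_pairs}
    used2 = {n2 for _, n2 in known_pairs}
    found = []
    # Combine the remaining attributes of the two entities pairwise.
    for (n1, v1), (n2, v2) in product(attrs1.items(), attrs2.items()):
        if n1 in used1 or n2 in used2:
            continue  # attribute already belongs to a determined pair
        if value_similarity(v1, v2) > second_threshold:
            found.append((n1, n2))
    return found
```

With this modeling, two attributes whose names differ but whose values agree, such as a "height" and a "stature" attribute that both hold "188 cm", would be returned as a new target attribute pair.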

Optionally, the method further includes:

screening target entity pairs from the two knowledge graphs by using a pre-trained entity alignment model to form a second entity pair set, wherein the entity alignment model is used for screening entity pairs based on entity relationships.

Optionally, the entity alignment model is obtained by training using model training data, where the model training data includes a target entity pair with high correctness screened from the first entity pair set.

Optionally, after the forming the second entity pair set, the method further includes:

merging the first entity pair set and the second entity pair set to form a third entity pair set;

and removing the target entity pairs with low accuracy from the third entity pair set to obtain a fourth entity pair set.

Optionally, the removing the target entity pair with low accuracy from the third entity pair set includes:

for a target entity pair belonging to the first entity pair set and the second entity pair set at the same time, determining the final similarity of the target entity pair according to the first similarity and the second similarity of the target entity pair;

wherein the first similarity is a similarity between two entities included in the target entity pair obtained when the first entity pair set is formed, and the second similarity is a similarity between two entities included in the target entity pair obtained when the second entity pair set is formed;

and if the final similarity of the target entity pair is smaller than a third preset threshold value, removing the target entity pair from the third entity pair set.

Optionally, the determining the final similarity of the target entity pair includes:

and determining the final similarity of the target entity pair based on the respective confidence degrees of the first similarity and the second similarity.

Optionally, the confidence is learned from model learning data by using a pre-constructed regression model, where the model learning data includes target entity pairs with high correctness screened from the first entity pair set.
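The merging and removal steps above can be sketched as follows. Several points are assumptions made only for illustration: the final similarity is taken as a weighted sum of the first and second similarities, with the weights `w1` and `w2` standing in for the confidences that a regression model would learn; entity pairs that appear in only one set are kept unchanged; and the function name and threshold value are hypothetical.

```python
def merge_and_filter(first_set, second_set, w1=0.5, w2=0.5, third_threshold=0.7):
    """Merge two entity pair sets and drop low-similarity pairs.

    first_set / second_set map (entity_in_kg1, entity_in_kg2) -> similarity
    obtained when the respective set was formed.
    """
    # Third entity pair set: union of the first and second sets.
    third_set = set(first_set) | set(second_set)
    fourth_set = set()
    for pair in third_set:
        if pair in first_set and pair in second_set:
            # Assumed combination rule: confidence-weighted sum.
            final_similarity = w1 * first_set[pair] + w2 * second_set[pair]
            if final_similarity < third_threshold:
                continue  # removed from the third entity pair set
        fourth_set.add(pair)
    return fourth_set
```

A pair scored 0.4 by the first set and 0.5 by the second set would receive a final similarity of 0.45 under equal weights and would therefore be removed at the assumed threshold of 0.7.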

An embodiment of the present application further provides an entity alignment apparatus, including:

a reference attribute pair acquisition unit, configured to determine each known target attribute pair in the two knowledge graphs as a reference attribute pair;

a target entity pair screening unit, configured to screen target entity pairs from the two knowledge graphs according to the reference attribute pairs;

a reference attribute pair screening unit, configured to screen new target attribute pairs from the two knowledge graphs according to the screened target entity pairs and take them as the reference attribute pairs;

and a target entity pair cyclic screening unit, configured to call the target entity pair screening unit with the screened reference attribute pairs, so as to screen target entity pairs from the two knowledge graphs according to the reference attribute pairs until no target entity pair can be screened out, thereby forming a first entity pair set.

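Optionally, the target entity pair screening unit is specifically configured to: screen each target entity pair from the two knowledge graphs according to the attribute values of the two attributes included in each reference attribute pair.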

Optionally, the target entity pair screening unit includes:

an initial entity pair determining subunit, configured to determine initial entity pairs in the two knowledge graphs, where each initial entity pair has at least one reference attribute pair, each reference attribute pair corresponds to an attribute value similarity, and the attribute value similarity is the similarity between the attribute values of the two attributes included in the corresponding reference attribute pair;

and the target entity pair determining subunit is used for judging whether the initial entity pair belongs to the target entity pair according to the similarity of at least one attribute value of the initial entity pair.

Optionally, the target entity pair determining subunit includes:

the similarity average value calculating module is used for calculating the average value of the similarity of at least one attribute value of the initial entity pair;

and the target entity pair determining module is used for judging that the initial entity pair is the target entity pair if the calculated average value is greater than a first preset threshold value.

Optionally, the reference attribute pair screening unit includes:

a candidate attribute pair obtaining subunit, configured to combine, for each screened target entity pair, the candidate attributes under the target entity pair pairwise to obtain a candidate attribute pair under each combination, where the two attributes included in a candidate attribute pair do not belong to any determined target attribute pair and respectively belong to the two entities in the target entity pair;

an attribute similarity calculating subunit, configured to calculate the similarity between the attribute values of the two attributes included in the candidate attribute pair;

and a target attribute pair determining subunit, configured to judge that the candidate attribute pair is a new target attribute pair if the calculated similarity is greater than a second preset threshold.

Based on the technical scheme, the method has the following beneficial effects:

The entity alignment method provided by the present application obtains a first entity pair set by iteratively performing two screening steps: screening target entity pairs from the two knowledge graphs according to target attribute pairs, and screening new target attribute pairs from the two knowledge graphs according to the target entity pairs. Because each target entity pair in the first entity pair set is obtained from the attribute information of different entities, and the attribute information of an entity represents the entity more truthfully and comprehensively, performing entity alignment with the attribute information of entities can improve the accuracy of the entity alignment result. In addition, because target attribute pairs can be screened from the knowledge graphs according to target entity pairs, two attributes that have the same semantics but are expressed differently can form a target attribute pair, which overcomes the problem that attributes cannot be aligned because of their diverse expressions and further improves the accuracy of the entity alignment result. Furthermore, because the target attribute pairs and the target entity pairs are generated during the iterative process, training data comprising a large number of pre-aligned entity pairs is not needed, which avoids the low accuracy caused by low-quality training data and improves the accuracy of the entity alignment result.

Drawings

To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present application, and those skilled in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a schematic structural diagram of different knowledge graphs provided by an embodiment of the present application;

FIG. 2 is a flowchart of an entity alignment method according to a first method embodiment of the present application;

FIG. 3 is a diagram illustrating an example of a process for iteratively screening target entity pairs according to an embodiment of the present application;

FIG. 4 is a flowchart of an entity alignment method according to a second method embodiment of the present application;

FIG. 5 is a flowchart of an entity alignment method according to a third method embodiment of the present application;

FIG. 6 is a diagram illustrating a first set merge result provided in an embodiment of the present application;

FIG. 7 is a diagram illustrating a second set merge result provided in an embodiment of the present application;

FIG. 8 is a schematic flowchart of a specific implementation of the entity alignment method according to the third method embodiment of the present application;

FIG. 9 is a flowchart of a specific implementation of the entity alignment method according to the third method embodiment of the present application;

FIG. 10 is a schematic structural diagram of an entity alignment apparatus according to an embodiment of the present application.

Detailed Description

A knowledge graph can be used to describe the relationships between different entities and the attributes that each entity has. An entity is something that exists in the objective world and can be distinguished from other things; it may be a person, an object, or an abstract concept. For example, "Jobs" and "San Francisco" are both entities.

A relationship between different entities refers to an association that exists between two entities. For example, from the fact that "Jobs comes from San Francisco", the association between the entity "Jobs" and the entity "San Francisco" is specifically: the place of origin of "Jobs" is "San Francisco".

The attributes of an entity refer to characteristics of the entity itself, and each attribute consists of an attribute name and an attribute value. For example, from the fact that "the date of birth of Jobs is February 24, 1955", the entity "Jobs" has the attribute of being born on February 24, 1955, where "date of birth" is the attribute name and "February 24, 1955" is the attribute value.

In addition, when a knowledge graph is constructed, triples are typically collected from the semi-structured text of major encyclopedia websites, or extracted from various unstructured texts by using information extraction techniques. Triples may adopt a uniform representation: (subject, predicate, object). In some cases, a triple may also adopt a more specific representation. A relationship triple, which describes an association between different entities, may be represented as (entity, relationship, entity); for example, "Jobs was born in San Francisco" may be represented by the relationship triple (Jobs, place of birth, San Francisco). An attribute triple, which describes an attribute of an entity, may be represented as (entity, attribute, attribute value); for example, "the height of Jobs is 188 cm" may be represented by the attribute triple (Jobs, height, 188 cm).
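As an illustrative sketch only, the two kinds of triples can be modeled with a single data structure. The class name `Triple`, the helper `split_triples`, and the rule that a triple is a relationship triple when its object is itself a known entity are assumptions introduced for this example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Uniform representation: (subject, predicate, object)."""
    subject: str
    predicate: str
    obj: str


def split_triples(triples, entities):
    """Separate relationship triples from attribute triples.

    Assumed rule: the object of a relationship triple is an entity,
    while the object of an attribute triple is a plain attribute value.
    """
    relationship_triples = [t for t in triples if t.obj in entities]
    attribute_triples = [t for t in triples if t.obj not in entities]
    return relationship_triples, attribute_triples


triples = [
    Triple("Steve Jobs", "place of birth", "San Francisco"),
    Triple("Steve Jobs", "height", "188 cm"),
]
entities = {"Steve Jobs", "San Francisco"}
relationship_triples, attribute_triples = split_triples(triples, entities)
```

Here the first triple is classified as a relationship triple and the second as an attribute triple.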

However, because the expressions of entities, relationships and attributes differ semantically across encyclopedia websites and unstructured texts from various sources, researchers have proposed the research topic of semantic integration, which aims to integrate different knowledge graphs into a unified, consistent and concise form and to provide semantic interoperability for applications that use different knowledge graphs. Entity alignment is an important prerequisite and technical means for semantic integration, and aims to find the entities in two heterogeneous knowledge graphs that point to the same object in the real world. However, entity alignment is very challenging because different knowledge graphs express the various information of entities and the relationship structures between entities differently.

For convenience of explanation and understanding, the entity expression differences in different knowledge-graphs will be described with reference to fig. 1, where fig. 1 is a schematic structural diagram of different knowledge-graphs provided in the embodiments of the present application.

As an example, FIG. 1 includes a first knowledge graph KG₁ and a second knowledge graph KG₂.

The first knowledge graph KG₁ includes a first entity, a second entity, and a third entity. A first relationship exists between the second entity and the third entity, and a second relationship exists between the first entity and the second entity. For example, the attribute information and relationship information of the second entity specifically include: "birthday-time" is "1955-2-24", "name" is "Steve Jobs", "birthday-place" is "San Francisco, California, USA", and "height" is "188 cm".

The second knowledge graph KG₂ includes a fourth entity, a fifth entity, and a sixth entity. A third relationship exists between the fifth entity and the sixth entity, and a fourth relationship exists between the fourth entity and the fifth entity. For example, the attribute information and relationship information of the fifth entity specifically include: "birthdate" is "1955.02.24", "name" is "Steve Jobs", "birthday-place" is "San Francisco", "nickname" is "Apple godfather", and "height" is "188 centi-meter".

Referring to FIG. 1, the second entity and the fifth entity both point to the person "Jobs". However, it is very difficult for a computer to judge that "birthday-time" and "birthdate" are the same attribute, that "San Francisco, California, USA" and "San Francisco" are the same entity, and that "188 cm" and "188 centi-meter" are equal attribute values, which makes entity alignment very challenging.

In the prior art, entity alignment is usually performed based on the idea of word vector embedding. Two commonly used entity alignment methods are described below as examples.

The first entity alignment method is to map the entities in the knowledge graph and the relationships between different entities into a vector space, so that the similarity between different entities can be obtained by calculating the distance between vectors, i.e., the deep structure information of the entities on the whole knowledge graph is obtained without depending on any text information, and the entity alignment is performed based on the deep structure information.
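The distance-based comparison used by this kind of prior-art method can be sketched as follows. The vectors are made-up toy values, Euclidean distance is only one possible choice of distance, and the function names are hypothetical; how the embeddings are trained is outside the scope of this sketch.

```python
import math


def euclidean_distance(u, v):
    """Distance between two entity vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_entity(query_vector, candidate_vectors):
    """Return the name of the candidate whose vector is closest to the query.

    candidate_vectors maps entity name -> embedding vector.
    """
    return min(
        candidate_vectors,
        key=lambda name: euclidean_distance(query_vector, candidate_vectors[name]),
    )


# Toy embeddings (illustrative values only).
kg2_vectors = {"fifth entity": [0.0, 0.9], "sixth entity": [1.0, 0.0]}
closest = nearest_entity([0.0, 1.0], kg2_vectors)
```

A smaller distance is interpreted as a higher similarity between the two entities.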

The second entity alignment method is improved based on the first entity alignment method, and the improvement of the method is that: according to the relationship among different entities, the semantic description and the attribute of each entity, the vector representation of each entity is obtained, so that the defect of low accuracy of an entity alignment result caused by only considering the relationship among different entities in the first entity alignment method is overcome to a certain extent.

However, it has been found that the two methods of aligning the entities have the following disadvantages:

the first entity alignment method has the following defects: since the method requires a large number of entity pairs which are pre-aligned as training data, but the acquisition of high-quality training data is very difficult, the quality of the training data used by the method is low, which results in low accuracy of the entity alignment result of the method. In addition, since the relationships between the entities in the knowledge-graph are sparse, that is, each entity in the knowledge-graph has little or no relationship with other entities (e.g., an isolated entity in the knowledge-graph), the accuracy of the entity alignment result is low because only the entity relationships are used for entity alignment.

The second entity alignment method has the following defects. Because this method is an improvement on the first entity alignment method, it still has the defects of the first entity alignment method. In addition, when entity vectors are constructed, attribute values are reduced to attribute value types (such as date types and numeric types) in order to bypass the diversity of attribute names and attribute value expressions. As a result, the attribute values carry considerable noise, the attribute information is not used effectively, and the accuracy of the entity alignment result remains low.

In contrast, the embodiments of the present application make effective use of attribute information when performing entity alignment, thereby improving the accuracy of the entity alignment result.

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Method embodiment one

Referring to FIG. 2, a flowchart of the entity alignment method according to the first method embodiment of the present application is shown.

The entity alignment method provided by the embodiment of the application comprises the following steps:

S201: Determine each known target attribute pair in the two knowledge graphs as a reference attribute pair.

S202: Screen target entity pairs from the two knowledge graphs according to the reference attribute pairs.

S203: Judge whether the number of currently screened target entity pairs is 0; if so, execute S206; if not, execute S204.

S204: Screen new target attribute pairs from the two knowledge graphs according to the screened target entity pairs.

S205: Take the screened new target attribute pairs as the reference attribute pairs, and return to S202.

S206: Form a first entity pair set from all the screened target entity pairs.
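The loop formed by S201 to S206 can be sketched as follows. This is a simplified illustration, not the implementation of the embodiment: each knowledge graph is modeled as a dictionary from entity to its attributes, the similarity measure is `difflib.SequenceMatcher`, entity pairs that are already aligned are skipped in later rounds, and all function names and threshold values are assumptions.

```python
from difflib import SequenceMatcher
from itertools import product
from statistics import mean


def sim(a, b):
    """Assumed attribute value similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def screen_entity_pairs(kg1, kg2, ref_pairs, aligned, t1):
    """S202: screen entity pairs using the current reference attribute pairs."""
    found = set()
    for (e1, a1), (e2, a2) in product(kg1.items(), kg2.items()):
        if (e1, e2) in aligned:
            continue
        sims = [sim(a1[p], a2[q]) for p, q in ref_pairs if p in a1 and q in a2]
        if sims and mean(sims) > t1:
            found.add((e1, e2))
    return found


def screen_attribute_pairs(kg1, kg2, entity_pairs, known, t2):
    """S204: screen new target attribute pairs from the screened entity pairs."""
    used1 = {p for p, _ in known}
    used2 = {q for _, q in known}
    found = set()
    for e1, e2 in entity_pairs:
        for (p, v1), (q, v2) in product(kg1[e1].items(), kg2[e2].items()):
            if p not in used1 and q not in used2 and sim(v1, v2) > t2:
                found.add((p, q))
    return found


def align(kg1, kg2, seed_pairs, t1=0.8, t2=0.8):
    known = set(seed_pairs)       # every target attribute pair found so far
    ref_pairs = set(seed_pairs)   # S201
    aligned = set()
    while True:
        new_entities = screen_entity_pairs(kg1, kg2, ref_pairs, aligned, t1)  # S202
        if not new_entities:      # S203
            return aligned        # S206: the first entity pair set
        aligned |= new_entities
        ref_pairs = screen_attribute_pairs(kg1, kg2, new_entities, known, t2)  # S204
        known |= ref_pairs        # S205: new pairs become the reference pairs
```

The loop terminates because each round either adds at least one new entity pair, of which there are finitely many, or returns.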

In order to facilitate understanding and explanation of the entity alignment method provided in the first embodiment of the method of the present application, the following sequentially describes specific implementation manners of S201 to S206.

First, a specific embodiment of S201 will be described.

In S201, each target attribute pair includes two attributes, where the two attributes are the same and belong to two knowledge-graphs, respectively. By way of example, the first target attribute pair includes a first attribute and a second attribute, wherein the first attribute and the second attribute are the same, and wherein the first attribute belongs to a first knowledge-graph and the second attribute belongs to a second knowledge-graph.

That the first attribute and the second attribute are the same may mean that the attribute name and attribute value of the first attribute are completely identical to the attribute name and attribute value of the second attribute, respectively, or that the attribute name and attribute value of the first attribute are semantically identical to the attribute name and attribute value of the second attribute, respectively.

For ease of explanation and understanding of the target attribute pairs, reference will now be made to FIG. 1 and two examples.

The first example is specifically as follows. As shown in FIG. 1, suppose the first attribute is an attribute of the second entity, with the attribute name "name" and the attribute value "Steve Jobs", and the second attribute is an attribute of the fifth entity, with the attribute name "name" and the attribute value "Steve Jobs". The attribute name and attribute value of the first attribute are then completely identical to those of the second attribute, so the first attribute is the same as the second attribute. Moreover, because the second entity belongs to the first knowledge graph KG₁ and the fifth entity belongs to the second knowledge graph KG₂, the attribute of the second entity is an element of KG₁ and the attribute of the fifth entity is an element of KG₂, so the first attribute and the second attribute belong to two different knowledge graphs. It can be seen that, because the attribute name and attribute value of the first attribute are completely identical to those of the second attribute, and the two attributes belong to two different knowledge graphs, the first attribute and the second attribute form a target attribute pair.

The above is related content of the first example, and in this example, the target attribute pair is described by taking an example in which the attribute name and the attribute value of the first attribute are completely the same as the attribute name and the attribute value of the second attribute, respectively.

The second example is specifically as follows. As shown in FIG. 1, suppose the first attribute is an attribute of the second entity, with the attribute name "birthday-time" and the attribute value "1955-2-24", and the second attribute is an attribute of the fifth entity, with the attribute name "birthdate" and the attribute value "1955.02.24". Because the attribute names "birthday-time" and "birthdate" both denote the date of birth, and the attribute values "1955-2-24" and "1955.02.24" both denote February 24, 1955, the attribute name and attribute value of the first attribute are semantically identical to those of the second attribute, so the first attribute is the same as the second attribute. Moreover, because the second entity belongs to the first knowledge graph KG₁ and the fifth entity belongs to the second knowledge graph KG₂, the attribute of the second entity is an element of KG₁ and the attribute of the fifth entity is an element of KG₂, so the first attribute and the second attribute belong to two different knowledge graphs. It can be seen that, because the attribute name and attribute value of the first attribute are semantically identical to those of the second attribute, and the two attributes belong to two different knowledge graphs, the first attribute and the second attribute form a target attribute pair.

The above is related content of the second example, in which the target attribute pair is described by taking an example that the attribute name and the attribute value of the first attribute have the same semantics as those of the attribute name and the attribute value of the second attribute, respectively.

In addition, each known target attribute pair may be a preset attribute pair, or an attribute pair obtained by using a preset algorithm in advance. The preset algorithm may be any algorithm capable of determining the target attribute pair, and this is not specifically limited in the embodiment of the present application.

Since the known target attribute pairs may adopt different obtaining manners, S201 may adopt various embodiments accordingly, and the following description will take one embodiment as an example.

As an embodiment, S201 may specifically be: determining the known target attribute pairs in the two knowledge graphs by using a preset algorithm, and taking these target attribute pairs as the reference attribute pairs.

The above is a specific implementation of S201, in which the target attribute pairs are determined from the two knowledge graphs by using a preset algorithm, and each determined target attribute pair is used as a reference attribute pair.

A specific embodiment of S202 will be described below.

In S202, each target entity pair includes two entities, wherein the two entities are the same (i.e., the two entities refer to the same content), and the two entities belong to two knowledge-graphs respectively.

As an example, as shown in FIG. 1, the second entity and the fifth entity both refer to the person "Jobs", the second entity belongs to the first knowledge graph KG₁, and the fifth entity belongs to the second knowledge graph KG₂. Thus, the second entity and the fifth entity can form a target entity pair.

Moreover, in the embodiments of the present application, whether two entities are the same may be determined according to the attribute information of the entities.

Since each attribute information may include an attribute name and an attribute value, a target entity pair may be determined from the attribute name and the attribute value of each attribute of an entity. At this time, in order to further improve the accuracy of the target entity pair, the present application provides an embodiment of S202, in which S202 may specifically be: and screening each target entity pair from the two knowledge graphs according to the attribute values of the two attributes included in each reference attribute pair.

For ease of explanation and understanding of the above-provided embodiments of S202, the following description will be made in conjunction with fig. 1.

As an example, assume that the reference attribute pairs obtained in S201 include a first reference attribute pair and a second reference attribute pair. Specifically, the first reference attribute pair includes a first attribute and a second attribute, where the first attribute is an attribute of the second entity with the attribute name "birthday-time" and the attribute value "1955-2-24", and the second attribute is an attribute of the fifth entity with the attribute name "birthdate" and the attribute value "1955.02.24". The second reference attribute pair includes a third attribute and a fourth attribute, where the third attribute is an attribute of the second entity with the attribute name "height" and the attribute value "188 cm", and the fourth attribute is an attribute of the fifth entity with the attribute name "height" and the attribute value "188 centi-meter".

If the above assumption holds, S202 may specifically be: according to the attribute value "1955-2-24" of the first attribute and the attribute value "1955.02.24" of the second attribute in the first reference attribute pair, and the attribute value "188 cm" of the third attribute and the attribute value "188 centi-meter" of the fourth attribute in the second reference attribute pair, screening out, from the first knowledge graph KG₁ and the second knowledge graph KG₂, the target entity pair that includes the second entity and the fifth entity.

It should be noted that the above description takes the screening of a target entity pair according to two reference attribute pairs as an example. In the present application, however, there may be one or more reference attribute pairs and one or more screened target entity pairs. The process of screening at least one target entity pair according to at least one reference attribute pair is the same as in the above embodiment and, for brevity, is not described again here.
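The comparison of differently formatted attribute values in the example above can be sketched as follows. This is only a minimal illustration: the application does not specify how the similarity between attribute values is computed, so the normalization rule and the use of Python's `difflib.SequenceMatcher` are assumptions, and the names `normalize` and `value_similarity` are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Hypothetical normalization: lower-case, drop leading zeros, unify separators."""
    value = value.lower()
    # Rewrite every digit run as an integer so that "02" and "2" compare equal.
    value = re.sub(r"\d+", lambda m: str(int(m.group())), value)
    # Collapse every run of punctuation or whitespace into a single space.
    return re.sub(r"[\W_]+", " ", value).strip()


def value_similarity(a: str, b: str) -> float:
    """Assumed measure: character-level similarity ratio in the range [0, 1]."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


# Attribute values of the first and second reference attribute pairs in the example.
birth = value_similarity("1955-2-24", "1955.02.24")
height = value_similarity("188 cm", "188 centi-meter")
print(birth, height)
```

Under these assumptions the two birth dates normalize to the same string, while the two height strings receive only a partial score, which is one reason a threshold on an aggregated similarity can be useful.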

Based on the above embodiment of S202, in order to further improve the accuracy of the target entity pair and thus the accuracy of the entity alignment method, the present application further provides another embodiment of S202, in which S202 may specifically include steps S2021-S2022:

S2021: Determine each initial entity pair in the two knowledge graphs.

Each initial entity pair has at least one reference attribute pair, and each reference attribute pair has a corresponding attribute value similarity, namely the similarity between the attribute values of the two attributes included in that reference attribute pair.

For ease of understanding and explaining the initial entity pair, reference will be made to fig. 3, where fig. 3 is an exemplary diagram of a process for iteratively screening the target entity pair according to an embodiment of the present application.

As shown in FIG. 3, assume that the first knowledge graph KG₁ includes a first entity, a second entity and a third entity, and that the second knowledge graph KG₂ includes a fourth entity, a fifth entity and a seventh entity. The attributes of the first knowledge graph KG₁ are indexed by i and the attributes of the second knowledge graph KG₂ are indexed by j, where i and j are positive integers. Each reference attribute pair includes one attribute of KG₁ and one attribute of KG₂. For the first reference attribute pair, the attribute value similarity between the corresponding attribute of the first entity and the corresponding attribute of the fourth entity is 0.78; for the second reference attribute pair, the attribute value similarity between the corresponding attribute of the first entity and the corresponding attribute of the fourth entity is 0.90; for the third reference attribute pair, the attribute value similarity between the corresponding attribute of the first entity and the corresponding attribute of the fourth entity is 0.85, and the attribute value similarity between the corresponding attribute of the second entity and the corresponding attribute of the fifth entity is 0.80; for the fourth reference attribute pair, the attribute value similarity between the corresponding attribute of the second entity and the corresponding attribute of the fifth entity is 0.95; and for the fifth reference attribute pair, the attribute value similarity between the corresponding attribute of the second entity and the corresponding attribute of the seventh entity is 0.3.

When the above assumption holds, it can be seen that, in the first knowledge graph KG₁ and the second knowledge graph KG₂, the first entity and the fourth entity have three reference attribute pairs (the first, second and third reference attribute pairs); the second entity and the fifth entity have two reference attribute pairs (the third and fourth reference attribute pairs); and the second entity and the seventh entity have one reference attribute pair (the fifth reference attribute pair). In this case, S2021 may specifically be: determining, in the first knowledge graph KG₁ and the second knowledge graph KG₂, the first entity and the fourth entity, the second entity and the fifth entity, and the second entity and the seventh entity as initial entity pairs, respectively.

The above is a specific embodiment of S2021, and in this embodiment, each initial entity pair may be determined from two knowledge-graphs according to a reference attribute pair.
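The grouping in the FIG. 3 example can be sketched as below. This is an illustrative sketch, not the application's implementation: which entities share which reference attribute pairs is taken as given input, and the entity labels `e1` to `e7` and the function name `initial_entity_pairs` are hypothetical.

```python
from collections import defaultdict

# Each record: (entity in KG1, entity in KG2, index of a shared reference attribute pair).
# The records mirror the FIG. 3 example; the labels are placeholders.
shared = [
    ("e1", "e4", 1), ("e1", "e4", 2), ("e1", "e4", 3),
    ("e2", "e5", 3), ("e2", "e5", 4),
    ("e2", "e7", 5),
]


def initial_entity_pairs(records):
    """Group the shared reference attribute pairs by entity pair (S2021)."""
    pairs = defaultdict(list)
    for kg1_entity, kg2_entity, ref_index in records:
        pairs[(kg1_entity, kg2_entity)].append(ref_index)
    return dict(pairs)


initial = initial_entity_pairs(shared)
for pair, ref_indices in initial.items():
    print(pair, ref_indices)
```

Every key of `initial` is an initial entity pair, and the list stored under it names the reference attribute pairs whose attribute value similarities are later used in S2022.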

S2022: Judge whether the initial entity pair is a target entity pair according to the at least one attribute value similarity of the initial entity pair.

The attribute value similarity refers to the similarity between the attribute values of the two attributes included in the corresponding reference attribute pair. For example, in FIG. 3, the attribute value similarity 0.78 represents the similarity between the attribute value of the attribute of the first entity and the attribute value of the attribute of the fourth entity in the first reference attribute pair.

As an example, when the initial entity pair has N attribute value similarities, S2022 may specifically be: judging whether the initial entity pair is a target entity pair according to M of the attribute value similarities of the initial entity pair, wherein M is a positive integer and M is less than or equal to N.

As an embodiment, in order to further improve the accuracy of the target entity pair and thus the accuracy of the entity alignment method, S2022 may specifically include S2022a-S2022b:

S2022a: Calculate the average value of the at least one attribute value similarity of the initial entity pair as the attribute similarity of the initial entity pair.

The attribute similarity refers to the similarity between the attributes of the two entities included in the corresponding entity pair. For example, as shown in FIG. 3, when the first initial entity pair includes the first entity and the fourth entity, the attribute similarity of the first initial entity pair represents the similarity between the attributes of the first entity and the attributes of the fourth entity.

As an example, when the initial entity pair has N attribute value similarities, S2022a may specifically be: calculating the average value of M of the attribute value similarities of the initial entity pair, and taking the average value as the attribute similarity of the initial entity pair, wherein M is a positive integer and M is less than or equal to N.

For convenience of explanation and understanding, the following description is made with reference to FIG. 3, taking M = N as an example.

By way of example, when the assumption made for FIG. 3 in S2021 holds, the first initial entity pair includes the first entity and the fourth entity and has three attribute value similarities, 0.78, 0.90 and 0.85; the second initial entity pair includes the second entity and the fifth entity and has two attribute value similarities, 0.80 and 0.95; and the third initial entity pair includes the second entity and the seventh entity and has one attribute value similarity, 0.3. In this case, S2022a may specifically be: calculating the average value 0.843 of 0.78, 0.90 and 0.85, and taking 0.843 as the attribute similarity of the first initial entity pair; calculating the average value 0.875 of 0.80 and 0.95, and taking 0.875 as the attribute similarity of the second initial entity pair; and taking 0.3, the only attribute value similarity of the third initial entity pair, as the attribute similarity of the third initial entity pair.

It should be noted that, although S2022a is described above by taking M = N as an example, in the present application M may be equal to N or may be any positive integer smaller than N. When M takes a different value, the execution procedure of S2022a is the same as in the above example and, for brevity, is not described again here.

The above is a specific embodiment of S2022a, in which the average value of the at least one attribute value similarity of the initial entity pair may be calculated and used as the attribute similarity of the initial entity pair.
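The averaging in S2022a can be checked with a few lines of code. The sketch below covers only the M = N case and uses the attribute value similarities from the FIG. 3 example; the function name `attribute_similarity` is hypothetical.

```python
def attribute_similarity(value_similarities):
    """Average all attribute value similarities of one initial entity pair (M = N)."""
    return sum(value_similarities) / len(value_similarities)


first = attribute_similarity([0.78, 0.90, 0.85])   # first initial entity pair
second = attribute_similarity([0.80, 0.95])        # second initial entity pair
third = attribute_similarity([0.3])                # third initial entity pair
print(round(first, 3), round(second, 3), round(third, 3))
```

Rounded to three decimal places, the results agree with the values 0.843, 0.875 and 0.3 given in the example.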

S2022b: If the calculated attribute similarity of the initial entity pair is greater than a first preset threshold, determine that the initial entity pair is a target entity pair.

The first preset threshold may be preset, for example, the first preset threshold may be preset according to an application scenario.

As an example, if the first preset threshold is preset to be 0.7, S2022b may specifically be: judging whether the attribute similarity of the initial entity pair is greater than a first preset threshold value, if so, determining that the initial entity pair is a target entity pair; if not, determining that the initial entity pair is not the target entity pair.

For ease of explanation and understanding of S2022b, the following description is made in conjunction with fig. 3.

As an example, when the assumption made for FIG. 3 in S2021 holds, the attribute similarity of the first initial entity pair is 0.843, the attribute similarity of the second initial entity pair is 0.875, the attribute similarity of the third initial entity pair is 0.3, and the first preset threshold is 0.7, S2022b may specifically be: since both 0.843 and 0.875 are greater than 0.7, the attribute similarities of the first initial entity pair and the second initial entity pair are both greater than the first preset threshold, and it may therefore be determined that the first initial entity pair and the second initial entity pair are both target entity pairs; since 0.3 is not greater than 0.7, the third initial entity pair is not a target entity pair.

The above is a specific implementation of S2022b, in which whether the initial entity pair is a target entity pair may be determined by judging whether the calculated attribute similarity of the initial entity pair is greater than the first preset threshold.
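The threshold test of S2022b can be sketched as follows, using the attribute similarities and the threshold 0.7 from the example. The entity labels and the function name `screen_target_entity_pairs` are hypothetical.

```python
FIRST_THRESHOLD = 0.7  # example value; in practice preset per application scenario


def screen_target_entity_pairs(attribute_similarities, threshold=FIRST_THRESHOLD):
    """Keep the initial entity pairs whose attribute similarity exceeds the threshold."""
    return [pair for pair, sim in attribute_similarities.items() if sim > threshold]


similarities = {
    ("e1", "e4"): 0.843,  # first initial entity pair
    ("e2", "e5"): 0.875,  # second initial entity pair
    ("e2", "e7"): 0.3,    # third initial entity pair
}
targets = screen_target_entity_pairs(similarities)
print(targets)
```

The comparison is strict ("greater than"), so a pair whose attribute similarity equals the threshold is not kept.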

The above is a specific implementation of S202. In this implementation, the attribute similarity of each entity pair may be determined according to the reference attribute pairs, and the target entity pairs may be screened from the two knowledge graphs according to the attribute similarity of each entity pair. Because the attributes of an entity can represent the entity more truly and comprehensively, the target entity pairs screened according to the attribute similarity of the entity pairs have higher accuracy.

A specific embodiment of S203 is described below.

In S203, the currently screened target entity pair refers to the target entity pair screened by performing step S202 in the current screening cycle.

Since the present application may implement the screening of target entity pairs by iteratively performing steps S202 to S205, the screening process may include at least one screening cycle. In each screening cycle, whether to continue with the next screening cycle may be determined by judging whether any target entity pair is screened out in the current screening cycle. Specifically: if at least one target entity pair is screened out in the current screening cycle, the next screening cycle is executed; if no target entity pair is screened out in the current screening cycle, the screening process ends, and the target entity pairs obtained in all screening cycles before the current screening cycle form the first entity pair set.

The above is a specific implementation manner of S203, in this implementation manner, it may be determined whether to continue to execute the next screening cycle by determining whether the number of the currently screened target entity pairs is 0.
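The control flow of the screening cycles can be sketched as a loop. This is a skeleton under stated assumptions rather than the application's implementation: `screen_entity_pairs` and `screen_attribute_pairs` are hypothetical callbacks standing in for S202 and S204, and discarding entity pairs and attribute pairs that were already found is an assumption added here so that the loop is guaranteed to terminate.

```python
def iterate_alignment(initial_reference_pairs, screen_entity_pairs, screen_attribute_pairs):
    """Repeat S202-S205 until a screening cycle yields no new target entity pair."""
    aligned = set()                                   # target entity pairs found so far
    known_attribute_pairs = set(initial_reference_pairs)
    reference_pairs = set(initial_reference_pairs)
    while True:
        # S202: screen target entity pairs with the current reference attribute pairs.
        current = set(screen_entity_pairs(reference_pairs)) - aligned
        # S203: stop when the current cycle screens out no target entity pair.
        if not current:
            break
        aligned |= current
        # S204: screen new target attribute pairs from the new target entity pairs.
        new_pairs = set(screen_attribute_pairs(current)) - known_attribute_pairs
        known_attribute_pairs |= new_pairs
        # S205: the new target attribute pairs become the reference attribute pairs.
        reference_pairs = new_pairs
    # S206: all screened target entity pairs form the first entity pair set.
    return aligned


# Toy stand-ins for S202 and S204, loosely following the FIG. 3 example.
def toy_screen_entities(reference_pairs):
    found = set()
    if ("a1", "b1") in reference_pairs:
        found |= {("e1", "e4"), ("e2", "e5")}
    if ("a6", "b6") in reference_pairs:
        found |= {("e3", "e7")}
    return found


def toy_screen_attributes(entity_pairs):
    return {("a6", "b6")} if ("e1", "e4") in entity_pairs else set()


first_set = iterate_alignment({("a1", "b1")}, toy_screen_entities, toy_screen_attributes)
print(sorted(first_set))
```

In the toy run, the first cycle yields two target entity pairs, the second cycle yields one more by using the newly found attribute pair, and the third cycle yields none, which ends the process.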

A specific embodiment of S204 is described below.

In S204, since the reference contents of the two entities belonging to the same target entity pair are the same and each entity includes a plurality of attributes, it can be determined that the two attributes having the same semantic meaning respectively belonging to the two entities are also the same. Therefore, each new target attribute pair can be screened from the two knowledge maps according to the screened target entity pair.

As an embodiment, S204 may specifically include steps S2041 to S2043:

S2041: For each screened target entity pair, combine the candidate attributes of the target entity pair pairwise to obtain the candidate attribute pairs.

The two attributes included in a candidate attribute pair do not belong to any determined target attribute pair, and belong respectively to the two entities in the target entity pair.

For ease of explanation and understanding of the candidate attribute pairs, reference will now be made to FIG. 3.

When the assumption made for FIG. 3 in S2021 holds and the first entity and the fourth entity form the first target entity pair, two attributes of the first entity and two attributes of the fourth entity may be taken as the candidate attributes of the first target entity pair, and the two attributes of the first entity may be combined pairwise with the two attributes of the fourth entity to obtain four candidate attribute pairs: a first candidate attribute pair, a second candidate attribute pair, a third candidate attribute pair and a fourth candidate attribute pair, each including one attribute of the first entity and one attribute of the fourth entity.

In addition, when the second entity and the fifth entity form the second target entity pair, two attributes of the second entity and two attributes of the fifth entity may be taken as the candidate attributes of the second target entity pair, and the two attributes of the second entity may be combined pairwise with the two attributes of the fifth entity to obtain four candidate attribute pairs: a fifth candidate attribute pair, a sixth candidate attribute pair, a seventh candidate attribute pair and an eighth candidate attribute pair, each including one attribute of the second entity and one attribute of the fifth entity.

It should be noted that the specific implementation of S2041 is explained above by taking the process of obtaining candidate attribute pairs from the first target entity pair and the second target entity pair as an example. In the present application, however, S2041 is not limited to the above implementation and may adopt other implementations, which, for brevity, are not described here.
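The pairwise combination of S2041 can be sketched with a Cartesian product. The sketch below is illustrative: the attribute names `p1` to `p5` and `q1` to `q5` and the function name `candidate_attribute_pairs` are hypothetical, and an attribute is treated as a candidate exactly when it does not appear in any already determined target attribute pair.

```python
from itertools import product


def candidate_attribute_pairs(attrs_kg1, attrs_kg2, target_attribute_pairs):
    """Pair every unused attribute of one entity with every unused attribute of the other."""
    used_kg1 = {a for a, _ in target_attribute_pairs}
    used_kg2 = {b for _, b in target_attribute_pairs}
    free_kg1 = [a for a in attrs_kg1 if a not in used_kg1]
    free_kg2 = [b for b in attrs_kg2 if b not in used_kg2]
    return list(product(free_kg1, free_kg2))


# A target entity pair whose first three attributes are already in target attribute pairs.
known = {("p1", "q1"), ("p2", "q2"), ("p3", "q3")}
candidates = candidate_attribute_pairs(
    ["p1", "p2", "p3", "p4", "p5"],
    ["q1", "q2", "q3", "q4", "q5"],
    known,
)
print(candidates)
```

With two unused attributes on each side, the product yields four candidate attribute pairs, matching the count in the FIG. 3 example.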

S2042: Calculate the similarity between the attribute values of the two attributes included in the candidate attribute pair as the attribute value similarity of the candidate attribute pair.

As an example, when the first candidate attribute pair includes an attribute of the first entity and an attribute of the fourth entity, S2042 may specifically be: calculating the similarity between the attribute values of the two attributes in the first candidate attribute pair as the attribute value similarity of the first candidate attribute pair.

S2043: If the calculated attribute value similarity of the candidate attribute pair is greater than a second preset threshold, determine that the candidate attribute pair is a new target attribute pair.

The second preset threshold may be preset, for example, the second preset threshold may be preset according to an actual application scenario.

For ease of explanation and understanding of S2043, the following description will be made in conjunction with fig. 3.

As an example, assume that the assumption made for FIG. 3 in S2021 holds, and that the attribute value similarities of the first to eighth candidate attribute pairs are 0.8, 0.2, 0.1, 0.15, 0.2, 0.3, 0.85 and 0.13, respectively. When the second preset threshold is 0.6, since only the attribute value similarity 0.8 of the first candidate attribute pair and the attribute value similarity 0.85 of the seventh candidate attribute pair are greater than the second preset threshold 0.6, it can be determined that the first candidate attribute pair and the seventh candidate attribute pair are new target attribute pairs.

In this embodiment, at least one candidate attribute pair may be obtained from the screened target entity pairs, and whether each candidate attribute pair is a new target attribute pair may be determined according to its attribute value similarity. In this way, new target attribute pairs can be extracted from the existing target entity pairs and used as reference attribute pairs in the next screening cycle.
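The selection in S2043 can be reproduced with the numbers from the example. In this sketch the candidate attribute pairs are identified only by their ordinal (1 to 8), and the function name `new_target_attribute_pairs` is hypothetical.

```python
SECOND_THRESHOLD = 0.6  # example value; in practice preset per application scenario

# Attribute value similarity of the first to eighth candidate attribute pairs.
candidate_similarities = {
    1: 0.8, 2: 0.2, 3: 0.1, 4: 0.15,
    5: 0.2, 6: 0.3, 7: 0.85, 8: 0.13,
}


def new_target_attribute_pairs(similarities, threshold=SECOND_THRESHOLD):
    """Return the candidate attribute pairs whose similarity exceeds the threshold."""
    return [index for index, sim in similarities.items() if sim > threshold]


selected = new_target_attribute_pairs(candidate_similarities)
print(selected)
```

Only the first and seventh candidate attribute pairs pass the threshold, as stated in the example.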

A specific embodiment of S205 is described below.

As an embodiment, when the total number of new target attribute pairs is N, S205 may specifically be: taking the N new target attribute pairs as N reference attribute pairs, so that new target entity pairs can be screened from the two knowledge graphs according to the N reference attribute pairs.

For ease of explanation and understanding of S205, it will be explained below in conjunction with fig. 3.

As an example, when the assumption made for FIG. 3 in S2021 holds and the first candidate attribute pair and the seventh candidate attribute pair are new target attribute pairs, S205 may specifically be: taking the first candidate attribute pair as a sixth reference attribute pair and the seventh candidate attribute pair as a seventh reference attribute pair, so that, in S202, a third target entity pair including the third entity and the seventh entity can be screened out from the first knowledge graph KG₁ and the second knowledge graph KG₂ according to the sixth reference attribute pair and the seventh reference attribute pair.

The above is a specific implementation of S205. In this implementation, each new target attribute pair that is screened out may be used as a reference attribute pair, and the process returns to S202 to screen target entity pairs in the next screening cycle based on these reference attribute pairs.

A specific embodiment of S206 is described below.

In S206, all the screened target entity pairs refer to the target entity pairs screened in all the screening cycles.

For ease of explanation and understanding of S206, it will be explained below in conjunction with fig. 3.

As an example, assume that the assumption made for FIG. 3 in S2021 holds, that the first target entity pair including the first entity and the fourth entity and the second target entity pair including the second entity and the fifth entity are screened out in the first screening cycle, and that the third target entity pair including the second entity and the seventh entity is screened out in the second screening cycle. Then S206 may specifically be: collecting the first target entity pair, the second target entity pair and the third target entity pair to obtain the first entity pair set.

The above is a specific embodiment of S206, in which all the screened target entity pairs may be collected to obtain the first entity pair set.

The above is a specific implementation of the first method embodiment, in which the first entity pair set is obtained by iteratively performing two screening steps: screening target entity pairs from the two knowledge graphs according to the target attribute pairs, and screening new target attribute pairs from the two knowledge graphs according to the target entity pairs. Because each target entity pair in the first entity pair set is obtained from the attribute information of different entities, and the attribute information of an entity can represent the entity more truly and comprehensively, the accuracy of the entity alignment result can be improved when entity alignment is performed by using the attribute information of the entities. In addition, because the target attribute pairs can be screened from the knowledge graphs according to the target entity pairs, two attributes with the same semantics but different expressions can form a target attribute pair. This solves the problem that attributes cannot be aligned because of their varied expressions, and further improves the accuracy of the entity alignment result. Furthermore, because the target attribute pairs and the target entity pairs are generated during the iterative process, no training data containing a large number of pre-aligned entity pairs is needed. This avoids the low accuracy of the entity alignment result that is caused by low-quality training data.

In the entity alignment method provided in the first embodiment of the method, at least one pair of target entity pairs is obtained by using the entity attributes, and the entity attributes can represent the entity more truly and comprehensively, so that the accuracy of the target entity pairs is improved, and the accuracy of the entity alignment result is improved.

In addition, in order to further improve the accuracy of the target entity pairs and thus the accuracy of the entity alignment result, the target entity pairs may also be obtained by using both the entity attributes and the entity relationships. The present application therefore further provides another entity alignment method, which is explained below with reference to the accompanying drawings.

Method embodiment two

For the sake of brevity, the same contents as those in the first method embodiment are not described again.

Referring to fig. 4, it is a flowchart of an entity alignment method provided in the second embodiment of the present application.

The entity alignment method provided by the embodiment of the application comprises S401-S407:

It should be noted that S401 to S406 are the same as S201 to S206 in the first method embodiment and, for brevity, are not repeated here.

S407: Screen target entity pairs from the two knowledge graphs by using a pre-trained entity alignment model to form a second entity pair set.

It should be noted that the present application does not limit the execution order of S407, and may execute before, after, or synchronously with S401 to S406.

In S407, the entity alignment model is used to screen the target entity pair based on the entity relationship, and the entity alignment model may be any model that utilizes the entity relationship to perform entity pair screening.

By way of example, the entity alignment model may be any model based on word vector embedding. In such a model, the relation triple (h, r, t) corresponding to each pair of entities may be mapped into a vector space to obtain the corresponding vectors, so that the similarity between different entities, i.e., the relationship similarity of the entity pair, can be measured by the distance between the vectors of the different entities. Here, h represents the head entity of the relation triple, r represents the relation between the head entity and the tail entity of the relation triple, and t represents the tail entity of the relation triple; the three vectors obtained by the mapping correspond to the head entity, the relation and the tail entity of the relation triple, respectively.
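Measuring relationship similarity by vector distance can be illustrated as follows. This is a sketch under explicit assumptions: the application does not fix an embedding model, a distance, or a mapping from distance to similarity, so the Euclidean distance, the conversion 1 / (1 + distance), the toy vectors and the function name `relation_similarity` are all hypothetical choices made for illustration.

```python
import math


def relation_similarity(vec_a, vec_b):
    """Assumed mapping: Euclidean distance d between two entity vectors -> 1 / (1 + d)."""
    distance = math.dist(vec_a, vec_b)
    return 1.0 / (1.0 + distance)


# Toy entity vectors; a trained embedding model would supply real ones.
embeddings = {
    "e2": [0.10, 0.70, 0.20],
    "e5": [0.12, 0.68, 0.21],
    "e7": [0.90, 0.05, 0.40],
}
close = relation_similarity(embeddings["e2"], embeddings["e5"])
far = relation_similarity(embeddings["e2"], embeddings["e7"])
print(close > far)
```

With this mapping, identical vectors receive a similarity of 1, and the similarity decreases towards 0 as the vectors move apart.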

The entity alignment model may be obtained by training in advance with model training data. The model training data may include at least one target entity pair and may come from a wide range of sources: it may be a training data set composed of at least one manually labeled target entity pair, or training data composed of at least one target entity pair obtained by a preset labeling algorithm, where the preset labeling algorithm may be set in advance.

For ease of explanation and understanding of the entity alignment model, an embodiment of training the entity alignment model will be described as an example.

As an implementation manner, in order to improve the efficiency and accuracy of the entity alignment method, the preset labeling algorithm may be any entity alignment method provided in the first embodiment of the method (i.e., any implementation manner of steps S401 to S406), and thus, the model training data of the entity alignment model may include a target entity pair with high correctness screened from the first entity pair set.

Wherein the first set of entity pairs is generated by step S406; moreover, the process of screening out the target entity pair with high correctness from the first entity pair set may specifically be: firstly, for each target entity pair in a first entity pair set, determining an average value of attribute value similarity of at least one same attribute of the target entity pair, and taking the average value as the alignment correctness of the target entity pair; and secondly, comparing the alignment correctness of each target entity pair with a preset correctness threshold, and acquiring the target entity pair with the alignment correctness higher than the preset correctness threshold as model training data.
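The two-step selection of training data described above can be sketched as below. The attribute value similarities reuse the FIG. 3 example, while the correctness threshold 0.85, the entity labels and the function name `select_training_pairs` are hypothetical values chosen only for illustration.

```python
def select_training_pairs(first_entity_pair_set, correctness_threshold):
    """Keep the target entity pairs whose alignment correctness exceeds the threshold."""
    selected = []
    for pair, value_similarities in first_entity_pair_set.items():
        # Step 1: alignment correctness = mean attribute value similarity of the pair.
        alignment_correctness = sum(value_similarities) / len(value_similarities)
        # Step 2: compare with the preset correctness threshold.
        if alignment_correctness > correctness_threshold:
            selected.append(pair)
    return selected


first_entity_pair_set = {
    ("e1", "e4"): [0.78, 0.90, 0.85],  # mean is about 0.843
    ("e2", "e5"): [0.80, 0.95],        # mean is 0.875
}
training_pairs = select_training_pairs(first_entity_pair_set, correctness_threshold=0.85)
print(training_pairs)
```

A stricter threshold yields fewer but more reliable training pairs, so the threshold trades training set size against label quality.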

In this embodiment, the entity alignment model may be trained with model training data formed by the high-correctness target entity pairs screened from the first entity pair set, and the trained entity alignment model may then be used to screen target entity pairs from the two knowledge graphs to form the second entity pair set.

In this embodiment, after the first entity pair set is obtained according to the entity attributes, each target entity pair may be screened out from the two knowledge graphs by using an entity alignment model based on the entity relationship to form a second entity pair set. Therefore, the finally obtained target entity pairs are obtained based on the entity attributes and the entity relationships, the accuracy of the finally obtained target entity pairs is improved, and the accuracy and the comprehensiveness of the entity alignment results are improved.

The entity alignment method provided in the above method embodiment one and method embodiment two can obtain the target entity pair according to the entity attribute and/or the entity relationship.

In addition, in order to further improve the accuracy of the entity alignment result, the attribute similarity and the relationship similarity of the target entity pair may be comprehensively evaluated to obtain a final target entity pair set.

Method embodiment three

For the sake of brevity, details of the same parts in the third method embodiment as those in the second method embodiment are not repeated herein.

Referring to fig. 5, it is a flowchart of an entity alignment method provided in the third embodiment of the present application.

The entity alignment method provided by the embodiment of the application comprises the following steps of S501-S509:

It should be noted that S501 to S507 are the same as S401 to S407 in the second method embodiment and, for brevity, are not repeated here.

S508: Merge the first entity pair set and the second entity pair set to form a third entity pair set.

S509: Remove the target entity pairs with low accuracy from the third entity pair set, and take the result as a fourth entity pair set.

In order to facilitate understanding and explanation of the entity alignment method provided in the third embodiment of the method of the present application, the following sequentially describes specific implementation manners of S508 and S509.

First, a specific embodiment of S508 will be described.

S508 can adopt three embodiments, which will be described in turn with reference to the accompanying drawings.

As a first embodiment, as shown in FIG. 6, S508 may specifically be: collecting the target entity pairs included in the first entity pair set and the target entity pairs included in the second entity pair set to obtain the third entity pair set.

The above is the first embodiment of S508, in which all target entity pairs in the first entity pair set and the second entity pair set may be collected to form the third entity pair set.

In addition, in order to further improve the efficiency of the entity alignment method, the third entity pair set may be kept free of duplicate target entity pairs. The present application therefore also provides a second embodiment and a third embodiment of S508, which are described in turn below.

As a second embodiment, as shown in FIG. 7, S508 may specifically be: acquiring the union of the first entity pair set and the second entity pair set, and taking the union as the third entity pair set.

As a third embodiment, S508 may specifically be: firstly, target entity pairs included in a first entity pair set and target entity pairs included in a second entity pair set are aggregated to obtain an initial third entity pair set (as shown in fig. 6); then, the repeated target entity pairs are deleted from the initial third entity pair set, resulting in a third entity pair set (as shown in fig. 7).

The above are the three embodiments of S508, in each of which the first entity pair set and the second entity pair set may be merged to form the third entity pair set.
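The difference between the three embodiments of S508 can be seen in a short sketch. The entity labels and the function names are hypothetical; the first function keeps duplicates, the second takes a set union, and the third collects first and removes duplicates afterwards.

```python
def merge_keep_duplicates(first_set, second_set):
    """First embodiment: collect all target entity pairs, duplicates included."""
    return list(first_set) + list(second_set)


def merge_union(first_set, second_set):
    """Second embodiment: take the union of the two entity pair sets."""
    return set(first_set) | set(second_set)


def merge_then_deduplicate(first_set, second_set):
    """Third embodiment: collect first, then delete the duplicate target entity pairs."""
    merged = merge_keep_duplicates(first_set, second_set)
    return list(dict.fromkeys(merged))  # keeps the first occurrence of each pair


first_pairs = [("e1", "e4"), ("e2", "e5")]
second_pairs = [("e2", "e5"), ("e3", "e7")]
print(merge_keep_duplicates(first_pairs, second_pairs))
print(merge_union(first_pairs, second_pairs))
print(merge_then_deduplicate(first_pairs, second_pairs))
```

The second and third embodiments contain the same target entity pairs; the pair that appears in both input sets is the intersection that is later evaluated with both similarities.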

A specific embodiment of S509 is described below.

In S509, the accuracy of a target entity pair may be determined according to one of several indexes. For example, it may be determined according to the attribute similarity alone, according to the relationship similarity alone, or according to an integrated value of the attribute similarity and the relationship similarity.

For convenience of explanation and understanding, determining the accuracy of the target entity pair according to the integrated value of the attribute similarity and the relationship similarity is described below as an example.

As an embodiment, in order to improve the accuracy of the entity alignment method, S509 may specifically include S5091-S5092:

S5091: For a target entity pair that belongs to both the first entity pair set and the second entity pair set, determine the final similarity of the target entity pair according to the first similarity and the second similarity of the target entity pair.

Wherein, a target entity pair belonging to both the first entity pair set and the second entity pair set is an entity pair in an intersection (such as the intersection shown in fig. 7) of the first entity pair set and the second entity pair set.

The first similarity is a similarity between two entities included in the target entity pair (i.e., an attribute similarity of the entity pair) obtained when the first entity pair set is formed, and the second similarity is a similarity between two entities included in the target entity pair (i.e., a relationship similarity of the entity pair) obtained when the second entity pair set is formed.

As an embodiment, S5091 may specifically be: for a target entity pair that belongs to both the first entity pair set and the second entity pair set, determining the final similarity of the target entity pair according to the first similarity and the second similarity of the target entity pair, weighted by the respective confidences of the first similarity and the second similarity.

For example, the confidence of the first similarity and the confidence of the second similarity may be set in advance according to the application scenario.

In addition, to make the confidences more accurate, and thereby improve the accuracy of the final similarity and of the entity alignment method, the confidences may be learned from model learning data by using a pre-constructed regression model, where the model learning data includes target entity pairs with high correctness screened from the first entity pair set. For the specific screening process, reference may be made to "the process of screening the target entity pair with high correctness from the first entity pair set" provided in the specific implementation of step S407; details are not repeated here for brevity.

Based on the above-mentioned related content of confidence, as an embodiment, S5091 may specifically include S50911-S50913:

S50911: Learn the confidence parameter in formula (1) from the model learning data by using a pre-constructed regression model.

Wherein the model learning data comprises the target entity pairs with high correctness screened from the first entity pair set. The symbols used to represent the model learning data and formula (1) denote the following:

the i-th entity in the first knowledge graph and the j-th entity in the second knowledge graph, which together can form a target entity pair;

the first similarity of the target entity pair formed by these two entities;

the second similarity of that target entity pair;

the final similarity of that target entity pair; and

λ, the confidence parameter.

S50912: Obtain the confidence of the first similarity and the confidence of the second similarity according to the confidence parameter λ.

As an embodiment, S50912 may specifically be: 1- λ is taken as the confidence of the first similarity, and λ is taken as the confidence of the second similarity.
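Steps S50911 and S50912 can be sketched as follows. The application does not specify the regression model or the target values it is fitted to, so this sketch makes two loud assumptions: each learning sample carries a target final similarity, and λ is estimated by a one-parameter least-squares fit of the weighted-sum relation between the first and second similarities. The function name and the sample values are hypothetical.

```python
from typing import Sequence, Tuple


def fit_confidence_parameter(samples: Sequence[Tuple[float, float, float]]) -> float:
    """Least-squares estimate of lambda in: final = (1 - lambda) * first + lambda * second.

    Each sample is (first_similarity, second_similarity, target_final_similarity).
    The target values and the closed-form fit are assumptions of this sketch.
    """
    # Rearranged model: (target - first) = lambda * (second - first)
    numerator = sum((second - first) * (target - first) for first, second, target in samples)
    denominator = sum((second - first) ** 2 for first, second, _ in samples)
    if denominator == 0.0:
        return 0.5  # both similarities always agree; any lambda fits, so pick the midpoint
    # Clamp so that lambda and 1 - lambda can both be read as confidences in [0, 1]
    return min(1.0, max(0.0, numerator / denominator))


# Hypothetical learning data: (first similarity, second similarity, target final similarity)
learning_data = [(0.8, 0.4, 0.7), (0.6, 1.0, 0.7), (0.9, 0.5, 0.8)]
lam = fit_confidence_parameter(learning_data)

# S50912: 1 - lambda is the confidence of the first similarity, lambda that of the second.
confidence_first, confidence_second = 1.0 - lam, lam
print(lam, confidence_first, confidence_second)
```

In this toy data every sample satisfies the weighted-sum relation exactly for λ = 0.25, so the fit recovers that value up to floating-point rounding.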

S50913: For a target entity pair that belongs to both the first entity pair set and the second entity pair set, determine the final similarity of the target entity pair according to the first similarity and the second similarity of the target entity pair, weighted by the respective confidences of the first similarity and the second similarity.

As an embodiment, when the confidence of the first similarity is 1 − λ and the confidence of the second similarity is λ, S50913 may specifically be: for a target entity pair that belongs to both the first entity pair set and the second entity pair set, obtaining the final similarity of the target entity pair by using formula (2), according to the first similarity, the second similarity, the confidence of the first similarity, and the confidence of the second similarity of the target entity pair.

Final similarity = first similarity × (1 − λ) + second similarity × λ    (2)

Wherein 1- λ represents a confidence of the first similarity; λ represents the confidence of the second similarity.
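Formula (2) can also be written symbolically. The symbols below are assumed notation introduced here for readability only and are not the application's own: $e_i^{1}$ and $e_j^{2}$ stand for the two entities of a target entity pair, $\mathrm{Sim}_A$ for the first (attribute) similarity, $\mathrm{Sim}_R$ for the second (relationship) similarity, and $\mathrm{Sim}$ for the final similarity.

```latex
\mathrm{Sim}(e_i^{1}, e_j^{2})
  = (1-\lambda)\,\mathrm{Sim}_A(e_i^{1}, e_j^{2})
  + \lambda\,\mathrm{Sim}_R(e_i^{1}, e_j^{2}),
  \qquad 0 \le \lambda \le 1
```

As a worked example with made-up values, a first similarity of 0.8, a second similarity of 0.4, and λ = 0.25 give a final similarity of 0.8 × 0.75 + 0.4 × 0.25 = 0.7. The restriction of λ to the interval from 0 to 1 is an assumption that lets both weights be read as confidences.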

S5092: If the final similarity of the target entity pair is smaller than a third preset threshold, remove the target entity pair from the third entity pair set, and take the entity pair set obtained after the removal operation as a fourth entity pair set.

The third preset threshold may be preset, for example, the third preset threshold may be determined in advance according to an application scenario.

In the above embodiment of S509, the target entity pairs with low accuracy are removed from the third entity pair set, and the resulting set is taken as the fourth entity pair set.
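S5091 and S5092 can be sketched together in Python. The sketch assumes that each entity pair set is stored as a dictionary from entity pair to similarity, and that pairs found in only one of the two sets are left untouched, because the text above defines a final similarity only for pairs in the intersection. All names and values are hypothetical.

```python
from typing import Dict, Hashable, Set, Tuple

EntityPair = Tuple[Hashable, Hashable]  # assumed representation of a target entity pair


def cull_low_similarity_pairs(
    third_set: Set[EntityPair],
    first_similarity: Dict[EntityPair, float],
    second_similarity: Dict[EntityPair, float],
    lam: float,
    threshold: float,
) -> Set[EntityPair]:
    """Return the fourth entity pair set following S5091 and S5092."""
    fourth_set = set(third_set)
    # Only pairs in the intersection have both a first and a second similarity.
    for pair in first_similarity.keys() & second_similarity.keys():
        # Formula (2): weighted sum of the two similarities.
        final = first_similarity[pair] * (1.0 - lam) + second_similarity[pair] * lam
        if final < threshold:
            fourth_set.discard(pair)
    return fourth_set


# Hypothetical similarities recorded while forming the first and second sets.
first_similarity = {("a1", "b1"): 0.9, ("a2", "b2"): 0.5}
second_similarity = {("a2", "b2"): 0.3, ("a3", "b3"): 0.8}
third_set = set(first_similarity) | set(second_similarity)

fourth_set = cull_low_similarity_pairs(third_set, first_similarity, second_similarity,
                                       lam=0.25, threshold=0.6)
print(sorted(fourth_set))
```

Here the pair ("a2", "b2") has a final similarity of 0.5 × 0.75 + 0.3 × 0.25 = 0.45, which is below the assumed threshold of 0.6, so it is removed.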

In addition, to facilitate understanding of the entity alignment method provided in the embodiments of the present application, a specific implementation of the entity alignment method is described below with reference to fig. 8.

First, the specific meaning of each symbol in fig. 8 is described. KG₁ represents the first knowledge graph, and KG₂ represents the second knowledge graph. The remaining symbols represent, in order:

the i-th entity in the first knowledge graph;

the attribute name of the i-th attribute in the first knowledge graph;

the attribute value corresponding to that attribute name in the first knowledge graph;

the t-th relationship in the first knowledge graph;

the j-th entity in the second knowledge graph;

the attribute name of the j-th attribute in the second knowledge graph;

the attribute value corresponding to that attribute name in the second knowledge graph;

the s-th relationship in the second knowledge graph;

the attribute similarity (i.e., the first similarity) of the m-th target entity pair in the first entity pair set;

the relationship similarity (i.e., the second similarity) of the n-th target entity pair in the second entity pair set.

Next, a specific implementation of the entity alignment method provided in the third method embodiment is described with reference to fig. 8 and fig. 9. In this implementation, the entity alignment method may specifically be:

S901: Acquire attribute triples and relationship triples from the first knowledge graph KG₁ and the second knowledge graph KG₂.

The attribute triple is used for representing attribute information of each entity; relationship triples are used to represent relationship information between different entities.
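The distinction between the two kinds of triples can be illustrated with a short Python sketch. The application does not say how the triples are told apart, so the rule used here is an assumption: a triple counts as a relationship triple when its object is itself an entity of the graph, and as an attribute triple otherwise. The triple layout, the function name, and the sample data are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object); assumed representation


def split_triples(triples: List[Triple], entities: Set[str]) -> Tuple[List[Triple], List[Triple]]:
    """Separate attribute triples from relationship triples.

    Assumption of this sketch: a triple is a relationship triple when its
    object is itself an entity of the graph; otherwise it is an attribute triple.
    """
    attribute_triples: List[Triple] = []
    relation_triples: List[Triple] = []
    for triple in triples:
        if triple[2] in entities:
            relation_triples.append(triple)
        else:
            attribute_triples.append(triple)
    return attribute_triples, relation_triples


# Hypothetical fragment of a knowledge graph.
kg1_entities = {"kg1:person_1", "kg1:company_1"}
kg1_triples = [
    ("kg1:person_1", "birth date", "1955-02-24"),       # attribute of one entity
    ("kg1:person_1", "founder of", "kg1:company_1"),    # relationship between two entities
]

attribute_triples, relation_triples = split_triples(kg1_triples, kg1_entities)
print(attribute_triples)
print(relation_triples)
```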

S902: Normalize the attribute values in the attribute triples by using a preset normalization algorithm.

The preset normalization algorithm converts attribute values written in different forms into attribute values written in the same form. Any normalization algorithm may be used as the preset normalization algorithm; for example, it may be a regular-expression matching algorithm based on manually defined rules.

For example, the attribute value corresponding to "birth date" may be written as "1955-02-24", "02/24/1955", or "24th Feb. 1955". In this case, S902 may specifically be: normalizing "1955-02-24", "02/24/1955", and "24th Feb. 1955" with the preset normalization algorithm, so that each of them is converted to "1955/02/24".
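A rule-based normalizer of this kind might look like the following Python sketch, which uses the standard `re` module. The three patterns cover only the date forms in the example above, the slash form is assumed to be month/day/year as in that example, and no calendar validation is performed. The pattern and function names are hypothetical.

```python
import re
from typing import Optional

MONTH_NAMES = ["jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec"]
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}

# Hand-written rules, one per supported form (an assumption of this sketch).
YEAR_FIRST = re.compile(r"^(\d{4})-(\d{1,2})-(\d{1,2})$")       # 1955-02-24
MONTH_FIRST = re.compile(r"^(\d{1,2})/(\d{1,2})/(\d{4})$")      # 02/24/1955
DAY_MONTH_NAME = re.compile(                                    # 24th Feb. 1955
    r"^(\d{1,2})\s*(?:st|nd|rd|th)?\s*([A-Za-z]{3})[A-Za-z]*\.?\s*(\d{4})$"
)


def normalize_date(value: str) -> Optional[str]:
    """Convert a supported date string to YYYY/MM/DD, or return None if unsupported."""
    text = value.strip()
    match = YEAR_FIRST.match(text)
    if match:
        year, month, day = (int(group) for group in match.groups())
        return f"{year:04d}/{month:02d}/{day:02d}"
    match = MONTH_FIRST.match(text)
    if match:
        month, day, year = (int(group) for group in match.groups())
        return f"{year:04d}/{month:02d}/{day:02d}"
    match = DAY_MONTH_NAME.match(text)
    if match:
        month_number = MONTHS.get(match.group(2).lower())
        if month_number is not None:
            return f"{int(match.group(3)):04d}/{month_number:02d}/{int(match.group(1)):02d}"
    return None


for raw in ["1955-02-24", "02/24/1955", "24th Feb. 1955"]:
    print(raw, "->", normalize_date(raw))
```

Values that match none of the rules are returned as `None` rather than guessed, so they can be handled separately instead of adding noise to the attribute comparison.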

S903: According to the attribute triples, execute the specific implementation corresponding to steps S501 to S506 to obtain a first entity pair set.

The specific implementation corresponding to steps S501 to S506 realizes the "interaction mode" process in fig. 8.

S904: Take the target entity pairs with high correctness screened from the first entity pair set as model training data, and train the entity alignment model.

S905: According to the relationship triples and the trained entity alignment model, execute the specific implementation corresponding to step S507 to obtain a second entity pair set.

S906: according to the first entity pair set and the second entity pair set, the specific implementation manners corresponding to the steps S508 to S509 are executed to obtain a fourth entity pair set.

The above is an implementation manner of the entity alignment method, in which the attribute values in the attribute triples can be normalized, the problem of attribute value noise caused by various expression manners of the attribute values can be avoided, and the accuracy and consistency of entity attribute description are improved, so that the accuracy of a target entity pair is improved, and the accuracy of an entity alignment result is improved.

In this embodiment, the first entity pair set and the second entity pair set are merged into a third entity pair set, and the target entity pairs with low accuracy are removed from the third entity pair set to obtain a fourth entity pair set. Because the accuracy of a target entity pair is evaluated from both its first similarity (the attribute similarity) and its second similarity (the relationship similarity), the accuracy and comprehensiveness of the entity alignment result are improved.

Based on any one of the entity alignment methods provided in the first to third method embodiments, the present application further provides an entity alignment apparatus, which is explained below with reference to the accompanying drawings.

Apparatus embodiment one

Referring to fig. 10, a schematic structural diagram of an entity alignment apparatus according to an embodiment of the present application is shown.

The entity alignment apparatus 1000 provided in the embodiment of the present application includes:

a reference attribute pair acquisition unit 1001, configured to determine each known target attribute pair in the two knowledge graphs as a reference attribute pair;

a target entity pair screening unit 1002, configured to screen target entity pairs from the two knowledge graphs according to the reference attribute pairs;

a reference attribute pair screening unit 1003, configured to screen new target attribute pairs from the two knowledge graphs according to the screened target entity pairs, and take them as the reference attribute pairs;

a target entity pair cyclic screening unit 1004, configured to take the newly screened reference attribute pairs and continue screening target entity pairs from the two knowledge graphs according to them until no target entity pair can be screened out, so as to form a first entity pair set.
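The cooperation of units 1001 to 1004 can be sketched as a loop in Python. The two screening steps are passed in as callables, because their internals are described elsewhere in the application. The function names, the lookup tables, and the sample pairs below are hypothetical stand-ins and do not reproduce the application's actual screening logic.

```python
from typing import Callable, Dict, Hashable, Set, Tuple

Pair = Tuple[Hashable, Hashable]  # an attribute pair or an entity pair


def build_first_entity_pair_set(
    known_attribute_pairs: Set[Pair],
    screen_entity_pairs: Callable[[Set[Pair]], Set[Pair]],
    screen_attribute_pairs: Callable[[Set[Pair]], Set[Pair]],
) -> Set[Pair]:
    """Alternate the two screening steps until no new entity pair is found."""
    reference_attribute_pairs = set(known_attribute_pairs)      # unit 1001
    seen_attribute_pairs = set(known_attribute_pairs)
    first_entity_pair_set: Set[Pair] = set()
    while True:
        # Unit 1002: screen entity pairs with the current reference attribute pairs.
        new_entity_pairs = screen_entity_pairs(reference_attribute_pairs) - first_entity_pair_set
        if not new_entity_pairs:                                # unit 1004: stop condition
            return first_entity_pair_set
        first_entity_pair_set |= new_entity_pairs
        # Unit 1003: screen new attribute pairs and use them as the next references.
        reference_attribute_pairs = screen_attribute_pairs(new_entity_pairs) - seen_attribute_pairs
        seen_attribute_pairs |= reference_attribute_pairs


# Toy stand-ins for the two screening steps (hypothetical lookup tables).
ENTITIES_BY_ATTRIBUTE: Dict[Pair, Set[Pair]] = {
    ("name", "name"): {("a1", "b1")},
    ("born", "birth date"): {("a2", "b2")},
}
ATTRIBUTES_BY_ENTITY: Dict[Pair, Set[Pair]] = {
    ("a1", "b1"): {("born", "birth date")},
    ("a2", "b2"): set(),
}


def toy_screen_entities(attribute_pairs: Set[Pair]) -> Set[Pair]:
    found: Set[Pair] = set()
    for attribute_pair in attribute_pairs:
        found |= ENTITIES_BY_ATTRIBUTE.get(attribute_pair, set())
    return found


def toy_screen_attributes(entity_pairs: Set[Pair]) -> Set[Pair]:
    found: Set[Pair] = set()
    for entity_pair in entity_pairs:
        found |= ATTRIBUTES_BY_ENTITY.get(entity_pair, set())
    return found


result = build_first_entity_pair_set({("name", "name")}, toy_screen_entities, toy_screen_attributes)
print(sorted(result))
```

Each pass either adds at least one new entity pair or ends the loop, so the loop terminates whenever the number of candidate entity pairs is finite.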

As an embodiment, in order to further improve the accuracy of the entity alignment result, the target entity pair screening unit 1002 is specifically configured to:

As an embodiment, in order to further improve the accuracy of the entity alignment result, the target entity pair screening unit 1002 includes:

As an embodiment, to further improve the accuracy of the entity alignment result, the target entity pair determining subunit includes:

As an embodiment, in order to further improve the accuracy of the entity alignment result, the reference attribute pair screening unit 1003 includes:

As an embodiment, in order to further improve the accuracy of the entity alignment result, the entity alignment apparatus 1000 further includes:

a second entity pair set generating unit, configured to screen target entity pairs from the two knowledge graphs by using a pre-trained entity alignment model to form a second entity pair set, where the entity alignment model screens entity pairs based on entity relationships.

As an embodiment, in order to further improve the accuracy of the entity alignment result, the entity alignment model is obtained by training using model training data, where the model training data includes a target entity pair with high correctness screened from the first entity pair set.

a third entity pair set generating unit, configured to, after forming a second entity pair set, combine the first entity pair set and the second entity pair set to form a third entity pair set;

a fourth entity pair set generating unit, configured to remove the target entity pairs with low accuracy from the third entity pair set and take the resulting set as a fourth entity pair set.

As an embodiment, in order to further improve the accuracy of the entity alignment result, the fourth entity pair set generating unit includes:

a final similarity determining subunit, configured to determine, for a target entity pair that belongs to both the first entity pair set and the second entity pair set, a final similarity of the target entity pair according to the first similarity and the second similarity of the target entity pair;

a target entity pair removing subunit, configured to remove the target entity pair from the third entity pair set if the final similarity of the target entity pair is smaller than a third preset threshold.

As an embodiment, in order to further improve the accuracy of the entity alignment result, the final similarity determining subunit is specifically configured to:

As an embodiment, in order to further improve the accuracy of the entity alignment result, the confidence is learned from model learning data by using a pre-constructed regression model, and the model learning data includes target entity pairs with high correctness screened from the first entity pair set.

Further, an embodiment of the present application further provides an entity alignment apparatus, including: a processor, a memory, a system bus;

the processor and the memory are connected through the system bus;

the memory is configured to store one or more programs, the one or more programs including instructions, which when executed by the processor, cause the processor to perform any one implementation of the entity alignment method described above.

Further, an embodiment of the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, the terminal device is caused to execute any implementation manner of the entity alignment method.

Further, an embodiment of the present application further provides a computer program product, which when running on a terminal device, causes the terminal device to execute any implementation manner of the entity alignment method.

As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that all or part of the steps in the above embodiment methods can be implemented by software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network communication device such as a media gateway, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.

It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to each other. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and for relevant details reference may be made to the description of the method.

It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method of entity alignment, comprising:

the attribute names and the attribute values of the two attributes included in the target attribute pair are respectively the same, and the two attributes respectively belong to two knowledge maps; the two entities included in the target entity pair are the same and belong to two knowledge graphs respectively.

2. The method of claim 1, wherein the screening out each pair of target entities in two knowledge-graphs according to each pair of reference attributes comprises:

3. The method of claim 2, wherein the screening out each pair of target entities in the two knowledge-graphs according to the attribute values of the two attributes included in each respective pair of reference attributes comprises:

4. The method of claim 3, wherein the determining whether the initial entity pair belongs to the target entity pair according to at least one attribute value similarity of the initial entity pair comprises:

5. The method of claim 1, wherein the screening of each new pair of target attributes in the two knowledge-graphs according to the screened pair of target entities comprises:

6. The method according to any one of claims 1 to 5, further comprising:

7. The method of claim 6, wherein the entity alignment model is trained using model training data comprising a high correctness target entity pair selected from the first set of entity pairs.

8. The method of claim 6, wherein after forming the second set of entity pairs, further comprising:

9. The method of claim 8, wherein the culling of low accuracy target entity pairs from the third set of entity pairs comprises:

10. The method of claim 9, wherein determining the final similarity of the target entity pair comprises:

11. The method of claim 10, wherein the confidence level is learned using a pre-constructed regression model in model learning data that includes a highly accurate target entity pair selected from the first set of entity pairs.

12. An entity alignment device, comprising:

13. The apparatus of claim 12, wherein the target entity pair screening unit is specifically configured to:

14. The apparatus of claim 13, wherein the target entity pair screening unit comprises:

15. The apparatus of claim 14, wherein the target entity pair determination subunit comprises:

16. The apparatus of claim 12, wherein the reference attribute pair filter unit comprises: