CN109960810B - Entity alignment method and device - Google Patents

Publication number
CN109960810B
Authority
CN
China
Prior art keywords
pair
attribute
entity
target
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910243854.5A
Other languages
Chinese (zh)
Other versions
CN109960810A (en)
Inventor
何芙珍
李直旭
陈志刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Iflytek Suzhou Technology Co Ltd
Original Assignee
Iflytek Suzhou Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Iflytek Suzhou Technology Co Ltd filed Critical Iflytek Suzhou Technology Co Ltd
Priority to CN201910243854.5A priority Critical patent/CN109960810B/en
Publication of CN109960810A publication Critical patent/CN109960810A/en
Application granted granted Critical
Publication of CN109960810B publication Critical patent/CN109960810B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an entity alignment method and device. The method obtains a first entity pair set by iteratively executing two screening steps: screening target entity pairs from two knowledge graphs according to target attribute pairs, and screening new target attribute pairs from the two knowledge graphs according to the target entity pairs. Because each target entity pair in the first entity pair set is obtained from the attribute information of the entities, and an entity's attribute information characterizes it more faithfully and comprehensively, performing entity alignment with attribute information can improve the accuracy of the entity alignment result.

Description

Entity alignment method and device
Technical Field
The present application relates to the field of knowledge graph technology, and in particular, to a method and an apparatus for entity alignment.
Background
With the continuous development of and breakthroughs in artificial intelligence, knowledge graphs (KGs) have attracted extensive attention as a technical foundation for realizing strong artificial intelligence in the future. At present, a knowledge graph is generally constructed from triples that are either collected from the semi-structured text of encyclopedia websites or extracted from unstructured text using information extraction techniques. There are two types of triples: relation triples and attribute triples.
Because entities, relationships and attributes are expressed differently across encyclopedia websites and unstructured texts from various sources, a semantic gap exists between knowledge graphs. To integrate different knowledge graphs into a uniform, consistent and concise form, and to provide semantic interoperability between applications that use different knowledge graphs, researchers have taken up the research topic of semantic integration. Entity alignment is an important prerequisite and technical means for semantic integration; it aims to find the entities in two heterogeneous knowledge graphs that point to the same real-world object.
However, different knowledge graphs differ greatly in how they express the information of entities and the relationship structures between entities, which makes entity alignment very challenging. Existing entity alignment techniques mainly align entities based on the relationships between them, but the results obtained in this way do not achieve high accuracy.
Disclosure of Invention
The embodiments of the present application mainly aim to provide an entity alignment method and device that can improve the accuracy of the entity alignment result when entity alignment is performed on two heterogeneous knowledge graphs.
An embodiment of the present application provides an entity alignment method, comprising:
determining each known target attribute pair in the two knowledge graphs as a reference attribute pair;
screening target entity pairs from the two knowledge graphs according to the reference attribute pairs;
screening new target attribute pairs from the two knowledge graphs according to the screened target entity pairs, taking them as the reference attribute pairs, and continuing to perform the step of screening target entity pairs from the two knowledge graphs according to the reference attribute pairs until no further target entity pair can be screened out, so as to form a first entity pair set;
wherein the two attributes included in a target attribute pair are the same and belong to the two knowledge graphs respectively, and the two entities included in a target entity pair are the same and belong to the two knowledge graphs respectively.
Optionally, the screening target entity pairs from the two knowledge graphs according to the reference attribute pairs includes:
screening target entity pairs from the two knowledge graphs according to the attribute values of the two attributes included in each reference attribute pair.
Optionally, the screening target entity pairs from the two knowledge graphs according to the attribute values of the two attributes included in each reference attribute pair includes:
determining initial entity pairs in the two knowledge graphs, wherein each initial entity pair has at least one reference attribute pair, each reference attribute pair corresponds to one attribute value similarity, and the attribute value similarity is the similarity between the attribute values of the two attributes included in the corresponding reference attribute pair;
judging, according to the at least one attribute value similarity of the initial entity pair, whether the initial entity pair is a target entity pair.
Optionally, the judging whether the initial entity pair is a target entity pair includes:
calculating the average of the at least one attribute value similarity of the initial entity pair;
if the calculated average is greater than a first preset threshold, determining that the initial entity pair is a target entity pair.
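The averaging rule above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `is_target_entity_pair` and the default threshold of 0.8 are assumptions, since the text does not give a value for the first preset threshold.

```python
from statistics import mean


def is_target_entity_pair(attribute_value_similarities, first_threshold=0.8):
    """Return True when the average similarity exceeds the first preset threshold.

    `attribute_value_similarities` holds one similarity per reference attribute
    pair shared by the initial entity pair. The default threshold of 0.8 is an
    illustrative assumption, not a value taken from the source.
    """
    if not attribute_value_similarities:
        # An initial entity pair must have at least one reference attribute pair.
        return False
    return mean(attribute_value_similarities) > first_threshold
```

For instance, similarities of 1.0 and 0.9 average to 0.95 and pass the assumed threshold, while 0.5 and 0.9 average to 0.7 and do not.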
Optionally, the screening new target attribute pairs from the two knowledge graphs according to the screened target entity pairs includes:
for each screened target entity pair, combining the candidate attributes under the target entity pair pairwise to obtain a candidate attribute pair for each combination, wherein the two attributes included in a candidate attribute pair do not belong to any already determined target attribute pair and belong respectively to the two entities in the target entity pair;
calculating the similarity between the attribute values of the two attributes included in the candidate attribute pair;
if the calculated similarity is greater than a second preset threshold, determining that the candidate attribute pair is a new target attribute pair.
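The candidate attribute pair screening can be sketched as follows in Python. The function names, the threshold of 0.9, and the toy `numbers_agree` similarity are all assumptions made for illustration; the text leaves the similarity measure and the second preset threshold open. The sketch also assumes that an attribute is excluded as soon as it appears in any already determined target attribute pair.

```python
import re
from itertools import product


def numbers_agree(a, b):
    """Toy similarity (assumption): 1.0 if both values contain the same integers."""
    na = [int(d) for d in re.findall(r"\d+", a)]
    nb = [int(d) for d in re.findall(r"\d+", b)]
    return 1.0 if na and na == nb else 0.0


def screen_new_attribute_pairs(entity1_attrs, entity2_attrs, known_pairs,
                               similarity, second_threshold=0.9):
    """Return candidate attribute pairs whose value similarity exceeds the threshold."""
    known1 = {a for a, _ in known_pairs}
    known2 = {b for _, b in known_pairs}
    new_pairs = set()
    # Pairwise combination: one attribute from each entity of the target entity pair.
    for (name1, value1), (name2, value2) in product(entity1_attrs.items(),
                                                    entity2_attrs.items()):
        if name1 in known1 or name2 in known2:
            continue  # already part of a determined target attribute pair
        if similarity(value1, value2) > second_threshold:
            new_pairs.add((name1, name2))
    return new_pairs


e1 = {"name": "Steve Jobs", "birthday-time": "1955-2-24", "height": "188 cm"}
e2 = {"name": "Steve Jobs", "birthdate": "1955.02.24", "height": "188 centi-meter"}
new_pairs = screen_new_attribute_pairs(e1, e2, {("name", "name")}, numbers_agree)
```

With the attribute values from FIG. 1 and ("name", "name") as the only known pair, this toy similarity yields the pairs ("birthday-time", "birthdate") and ("height", "height").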
Optionally, the method further includes:
screening target entity pairs in the two knowledge graphs by using a pre-trained entity alignment model to form a second entity pair set, wherein the entity alignment model screens entity pairs based on entity relationships.
Optionally, the entity alignment model is trained using model training data, where the model training data includes target entity pairs of high correctness screened from the first entity pair set.
Optionally, after the forming the second entity pair set, the method further includes:
merging the first entity pair set and the second entity pair set to form a third entity pair set;
removing the target entity pairs of low accuracy from the third entity pair set to obtain a fourth entity pair set.
Optionally, the removing the target entity pairs of low accuracy from the third entity pair set includes:
for a target entity pair that belongs to both the first entity pair set and the second entity pair set, determining the final similarity of the target entity pair according to its first similarity and its second similarity;
wherein the first similarity is the similarity between the two entities of the target entity pair obtained when the first entity pair set was formed, and the second similarity is the similarity between the two entities of the target entity pair obtained when the second entity pair set was formed;
if the final similarity of the target entity pair is smaller than a third preset threshold, removing the target entity pair from the third entity pair set.
Optionally, the determining the final similarity of the target entity pair includes:
determining the final similarity of the target entity pair based on the respective confidences of the first similarity and the second similarity.
Optionally, the confidences are learned from model learning data by using a pre-constructed regression model, where the model learning data includes target entity pairs of high correctness screened from the first entity pair set.
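The merge-and-filter steps above can be sketched as below. This is only one possible reading: the text here does not state how the two confidences are combined, so the sketch assumes a confidence-weighted average, fixed example confidences in place of values learned by a regression model, and a third preset threshold of 0.6. Pairs found in only one set are kept unchanged, which is also an assumption.

```python
def merge_and_filter(first_set, second_set, confidence1=0.5, confidence2=0.5,
                     third_threshold=0.6):
    """Merge two entity pair sets and drop doubly-found pairs with low final similarity.

    `first_set` and `second_set` map an entity pair to the similarity obtained
    when that set was formed. The weighted average and all numeric defaults are
    illustrative assumptions.
    """
    third_set = set(first_set) | set(second_set)  # merged (third) entity pair set
    fourth_set = set()
    for pair in third_set:
        if pair in first_set and pair in second_set:
            final = (confidence1 * first_set[pair] + confidence2 * second_set[pair]) / (
                confidence1 + confidence2
            )
            if final < third_threshold:
                continue  # removed from the third entity pair set
        fourth_set.add(pair)
    return fourth_set


first = {("a1", "b1"): 0.9, ("a2", "b2"): 0.5}
second = {("a2", "b2"): 0.3, ("a3", "b3"): 0.8}
fourth = merge_and_filter(first, second)
```

Here the pair ("a2", "b2") appears in both sets with a combined similarity of 0.4, so it is removed, while the two pairs found by only one method are retained.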
An embodiment of the present application further provides an entity alignment apparatus, including:
a reference attribute pair acquisition unit, configured to determine each known target attribute pair in the two knowledge graphs as a reference attribute pair;
a target entity pair screening unit, configured to screen target entity pairs from the two knowledge graphs according to the reference attribute pairs;
a reference attribute pair screening unit, configured to screen new target attribute pairs from the two knowledge graphs according to the screened target entity pairs and take them as the reference attribute pairs;
a target entity pair cyclic screening unit, configured to take the screened reference attribute pairs and screen target entity pairs from the two knowledge graphs according to them until no further target entity pair can be screened out, so as to form a first entity pair set;
wherein the two attributes included in a target attribute pair are the same and belong to the two knowledge graphs respectively, and the two entities included in a target entity pair are the same and belong to the two knowledge graphs respectively.
Optionally, the target entity pair screening unit is specifically configured to:
screen target entity pairs from the two knowledge graphs according to the attribute values of the two attributes included in each reference attribute pair.
Optionally, the target entity pair screening unit includes:
an initial entity pair determining subunit, configured to determine initial entity pairs in the two knowledge graphs, where each initial entity pair has at least one reference attribute pair, each reference attribute pair corresponds to one attribute value similarity, and the attribute value similarity is the similarity between the attribute values of the two attributes included in the corresponding reference attribute pair;
a target entity pair determining subunit, configured to judge, according to the at least one attribute value similarity of the initial entity pair, whether the initial entity pair is a target entity pair.
Optionally, the target entity pair determining subunit includes:
a similarity average calculating module, configured to calculate the average of the at least one attribute value similarity of the initial entity pair;
a target entity pair determining module, configured to determine that the initial entity pair is a target entity pair if the calculated average is greater than a first preset threshold.
Optionally, the reference attribute pair screening unit includes:
a candidate attribute pair obtaining subunit, configured to combine, for each screened target entity pair, the candidate attributes under the target entity pair pairwise to obtain a candidate attribute pair for each combination, where the two attributes included in a candidate attribute pair do not belong to any already determined target attribute pair and belong respectively to the two entities in the target entity pair;
an attribute similarity calculating subunit, configured to calculate the similarity between the attribute values of the two attributes included in the candidate attribute pair;
a target attribute pair determining subunit, configured to determine that the candidate attribute pair is a new target attribute pair if the calculated similarity is greater than a second preset threshold.
Based on the above technical solution, the present application has the following beneficial effects:
The entity alignment method provided by the present application obtains a first entity pair set by iteratively performing two screening steps: screening target entity pairs from the two knowledge graphs according to target attribute pairs, and screening new target attribute pairs from the two knowledge graphs according to the target entity pairs. Because each target entity pair in the first entity pair set is obtained from the attribute information of the entities, and an entity's attribute information characterizes it more faithfully and comprehensively, performing entity alignment with attribute information can improve the accuracy of the entity alignment result. In addition, because target attribute pairs can be screened from the knowledge graphs according to the target entity pairs, two attributes with the same semantics but different expressions can form a target attribute pair; this overcomes the problem that attributes cannot be aligned because they are expressed in diverse ways, and further improves the accuracy of the entity alignment result. Furthermore, because the target attribute pairs and target entity pairs are generated during the iteration, no training data containing a large number of pre-aligned entity pairs is needed, which avoids the low accuracy caused by low-quality training data.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of different knowledge graphs provided by an embodiment of the present application;
FIG. 2 is a flowchart of an entity alignment method according to an embodiment of the present application;
FIG. 3 is a diagram illustrating an example of a process for iteratively screening target entity pairs according to an embodiment of the present application;
FIG. 4 is a flowchart of an entity alignment method according to a second embodiment of the present application;
FIG. 5 is a flowchart of an entity alignment method according to a third embodiment of the present application;
FIG. 6 is a diagram illustrating a first set merging result provided in an embodiment of the present application;
FIG. 7 is a diagram illustrating a second set merging result provided in an embodiment of the present application;
FIG. 8 is a schematic flowchart of a specific implementation of an entity alignment method according to a third embodiment of the present application;
FIG. 9 is a flowchart of a specific implementation of an entity alignment method provided in a third method embodiment of the present application;
FIG. 10 is a schematic structural diagram of an entity alignment apparatus according to an embodiment of the present application.
Detailed Description
A knowledge graph can be used to describe the relationships between different entities and the attributes of each entity. An entity is something that exists in the objective world and can be distinguished from other things; it can be a person, an object, or an abstract concept. For example, "Jobs" and "San Francisco" are both entities.
A relationship between entities is an association that exists between two entities. For example, from the fact that "Jobs comes from San Francisco", the association between the entity "Jobs" and the entity "San Francisco" is that the place of origin of "Jobs" is "San Francisco".
An attribute of an entity is a characteristic of the entity itself, and each attribute consists of an attribute name and an attribute value. For example, from the fact that "Jobs was born on February 24, 1955", the entity "Jobs" has the attribute "date of birth: February 24, 1955", where "date of birth" is the attribute name and "February 24, 1955" is the attribute value.
In addition, when a knowledge graph is constructed, triples are typically collected from the semi-structured text of large encyclopedia websites or extracted from unstructured text using information extraction techniques. Triples can adopt a uniform representation, (subject, predicate, object), and in some cases a more specific one. A relation triple, which describes an association between different entities, can be represented as (entity, relationship, entity); for example, "Jobs was born in San Francisco" can be represented by the relation triple (Jobs, place of birth, San Francisco). An attribute triple, which describes an attribute of an entity, can be represented as (entity, attribute, value); for example, "the height of Jobs is 188 cm" can be represented by the attribute triple (Jobs, height, 188 cm).
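As a minimal sketch of the two triple types, the uniform (subject, predicate, object) form can be modeled in Python with a named tuple. The class name `Triple`, the field names, and the helper `attributes_of` are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Uniform (subject, predicate, object) representation of a triple."""
    subject: str
    predicate: str
    obj: str


# Relation triple: the object is another entity.
relation_triple = Triple("Steve Jobs", "place of birth", "San Francisco")
# Attribute triple: the object is an attribute value.
attribute_triple = Triple("Steve Jobs", "height", "188 cm")


def attributes_of(entity, attribute_triples):
    """Collect the attribute name -> attribute value mapping of one entity."""
    return {t.predicate: t.obj for t in attribute_triples if t.subject == entity}


jobs_attributes = attributes_of("Steve Jobs", [attribute_triple])
```

Grouping attribute triples by entity in this way gives the per-entity attribute dictionary that attribute-based alignment operates on.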
However, because entities, relationships and attributes are expressed differently across encyclopedia websites and unstructured texts from various sources, a semantic gap exists between knowledge graphs. To integrate different knowledge graphs into a uniform, consistent and concise form, and to provide semantic interoperability between applications that use different knowledge graphs, researchers have taken up the research topic of semantic integration. Entity alignment is an important prerequisite and technical means for semantic integration; it aims to find the entities in two heterogeneous knowledge graphs that point to the same real-world object. Entity alignment is nevertheless very challenging, because different knowledge graphs express the information of entities and the relationship structures between entities differently.
For convenience of explanation and understanding, the entity expression differences in different knowledge-graphs will be described with reference to fig. 1, where fig. 1 is a schematic structural diagram of different knowledge-graphs provided in the embodiments of the present application.
As an example, FIG. 1 includes a first knowledge graph KG1 and a second knowledge graph KG2. The first knowledge graph KG1 includes a first entity, a second entity and a third entity; a first relationship exists between the second entity and the third entity, and a second relationship exists between the first entity and the second entity. For example, the attribute information and relationship information of the second entity are specifically: "birthday-time" is "1955-2-24", "name" is "Steve Jobs", "birthday-place" is "San Francisco, California, USA", and "height" is "188 cm". The second knowledge graph KG2 includes a fourth entity, a fifth entity and a sixth entity; a third relationship exists between the fifth entity and the sixth entity, and a fourth relationship exists between the fourth entity and the fifth entity. For example, the attribute information and relationship information of the fifth entity are specifically: "birthdate" is "1955.02.24", "name" is "Steve Jobs", "birthday-place" is "San Francisco", "nickname" is "Apple godfather", and "height" is "188 centi-meter".
Referring to FIG. 1, the second entity and the fifth entity both refer to the person "Jobs". For a computer, however, it is very difficult to determine that "birthday-time" and "birthdate" are the same attribute, that "San Francisco, California, USA" and "San Francisco" are the same entity, and that "188 cm" and "188 centi-meter" are equal attribute values, which makes entity alignment very challenging.
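The difficulty can be made concrete with the attribute values listed for FIG. 1. The Python sketch below compares them first by exact string equality and then with a simple token-level Jaccard score. The helpers `tokens` and `token_jaccard` are hypothetical and are not the similarity measure of the present application; they only show how far surface forms can diverge.

```python
import re


def tokens(value):
    """Lowercase alphanumeric tokens, with leading zeros stripped ("02" -> "2")."""
    return {t.lstrip("0") or "0" for t in re.findall(r"[a-z0-9]+", value.lower())}


def token_jaccard(a, b):
    """Jaccard overlap of the two token sets (an illustrative measure only)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


kg1_entity = {"birthday-time": "1955-2-24", "name": "Steve Jobs",
              "birthday-place": "San Francisco, California, USA", "height": "188 cm"}
kg2_entity = {"birthdate": "1955.02.24", "name": "Steve Jobs",
              "birthday-place": "San Francisco", "nickname": "Apple godfather",
              "height": "188 centi-meter"}

# Exact matching on both attribute name and attribute value finds only "name".
exact_matches = {k for k, v in kg1_entity.items() if kg2_entity.get(k) == v}

date_score = token_jaccard(kg1_entity["birthday-time"], kg2_entity["birthdate"])
height_score = token_jaccard(kg1_entity["height"], kg2_entity["height"])
```

Exact comparison aligns only "name". The token-level score recognizes the two dates as equal (1.0) but gives the two heights only 0.25, so even a tolerant string measure does not resolve every case.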
In the prior art, the entity alignment method is usually performed based on the idea of word vector embedding (embedding), and two commonly used entity alignment methods will be described as an example below.
The first entity alignment method is to map the entities in the knowledge graph and the relationships between different entities into a vector space, so that the similarity between different entities can be obtained by calculating the distance between vectors, i.e., the deep structure information of the entities on the whole knowledge graph is obtained without depending on any text information, and the entity alignment is performed based on the deep structure information.
The second entity alignment method is improved based on the first entity alignment method, and the improvement of the method is that: according to the relationship among different entities, the semantic description and the attribute of each entity, the vector representation of each entity is obtained, so that the defect of low accuracy of an entity alignment result caused by only considering the relationship among different entities in the first entity alignment method is overcome to a certain extent.
However, both entity alignment methods have been found to have the following disadvantages:
The first entity alignment method requires a large number of pre-aligned entity pairs as training data, but high-quality training data is very difficult to obtain. The training data actually used is therefore of low quality, which lowers the accuracy of the method's entity alignment result. In addition, the relationships between entities in a knowledge graph are sparse; that is, an entity may have few or no relationships with other entities (for example, an isolated entity in the knowledge graph). Aligning entities using entity relationships alone therefore yields low accuracy.
The second entity alignment method, being an improvement of the first, inherits the disadvantages of the first. In addition, when constructing the entity vectors, it sidesteps the diversity of attribute names and attribute value expressions by reducing each attribute value to its value type (such as a date type or a numeric type). This introduces substantial noise into the attribute values, so the attribute information is not used effectively and the accuracy of the entity alignment result remains low.
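To see why reducing attribute values to value types discards information, consider the following Python sketch. The function `value_type`, its regular expressions, and the type labels are assumptions made for illustration and do not reproduce any particular prior-art implementation.

```python
import re


def value_type(value):
    """Map an attribute value to a coarse type label (illustrative rules only)."""
    if re.fullmatch(r"\d{4}[-./]\d{1,2}[-./]\d{1,2}", value):
        return "date"
    if re.fullmatch(r"\d+(\.\d+)?", value):
        return "number"
    return "string"


same_person_dates = (value_type("1955-2-24"), value_type("1955.02.24"))
different_person_dates = (value_type("1955-2-24"), value_type("1960.11.01"))
```

Both tuples evaluate to ("date", "date"): after type reduction, two different birth dates are indistinguishable from two spellings of the same date, which is the noise described above.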
When the embodiment of the application is used for entity alignment, the attribute information is effectively utilized, and the accuracy of the entity alignment result is improved.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Method embodiment one
Referring to FIG. 2, a flowchart of the entity alignment method provided by this embodiment of the present application is shown.
The entity alignment method provided by the embodiment of the application comprises the following steps:
S201: In the two knowledge graphs, determine each known target attribute pair as a reference attribute pair.
S202: Screen target entity pairs from the two knowledge graphs according to the reference attribute pairs.
S203: Judge whether the number of target entity pairs screened in the current round is 0; if so, execute S206; if not, execute S204.
S204: Screen new target attribute pairs from the two knowledge graphs according to the screened target entity pairs.
S205: Take the newly screened target attribute pairs as the reference attribute pairs, and return to S202.
S206: Form a first entity pair set from all the screened target entity pairs.
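Steps S201 to S206 can be sketched as a loop in Python. This is a simplified reading under several assumptions: each knowledge graph is a dictionary from entity to its attributes, `value_similarity` is a toy token-overlap measure, both thresholds are 0.8, and each round uses only the newly screened attribute pairs as reference attribute pairs. All names are hypothetical.

```python
import re
from statistics import mean


def _tokens(value):
    return {t.lstrip("0") or "0" for t in re.findall(r"[a-z0-9]+", value.lower())}


def value_similarity(a, b):
    """Toy token-level Jaccard similarity (an assumption, not the source's measure)."""
    ta, tb = _tokens(a), _tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


def screen_entity_pairs(kg1, kg2, reference_pairs, aligned, threshold):
    """S202: screen not-yet-aligned entity pairs by average attribute value similarity."""
    found = set()
    for e1, attrs1 in kg1.items():
        for e2, attrs2 in kg2.items():
            if (e1, e2) in aligned:
                continue
            sims = [value_similarity(attrs1[a1], attrs2[a2])
                    for a1, a2 in reference_pairs if a1 in attrs1 and a2 in attrs2]
            if sims and mean(sims) > threshold:
                found.add((e1, e2))
    return found


def screen_attribute_pairs(kg1, kg2, entity_pairs, known_pairs, threshold):
    """S204: screen new target attribute pairs from the given target entity pairs."""
    used1 = {a for a, _ in known_pairs}
    used2 = {b for _, b in known_pairs}
    found = set()
    for e1, e2 in entity_pairs:
        for a1, v1 in kg1[e1].items():
            for a2, v2 in kg2[e2].items():
                if a1 in used1 or a2 in used2:
                    continue
                if value_similarity(v1, v2) > threshold:
                    found.add((a1, a2))
    return found


def align(kg1, kg2, known_attribute_pairs, first_threshold=0.8, second_threshold=0.8):
    reference_pairs = set(known_attribute_pairs)          # S201
    all_attribute_pairs = set(reference_pairs)
    first_entity_pair_set = set()
    # With no reference pairs, S202 cannot screen anything, so the loop also ends.
    while reference_pairs:
        new_entities = screen_entity_pairs(kg1, kg2, reference_pairs,
                                           first_entity_pair_set, first_threshold)  # S202
        if not new_entities:                              # S203
            break
        first_entity_pair_set |= new_entities
        reference_pairs = screen_attribute_pairs(kg1, kg2, new_entities,
                                                 all_attribute_pairs, second_threshold)  # S204, S205
        all_attribute_pairs |= reference_pairs
    return first_entity_pair_set, all_attribute_pairs     # S206


kg1 = {"p1": {"name": "Steve Jobs", "birthday-time": "1955-2-24"},
       "p2": {"birthday-time": "1960-11-01"}}
kg2 = {"q1": {"name": "Steve Jobs", "birthdate": "1955.02.24"},
       "q2": {"birthdate": "1960.11.1"}}
entity_pairs, attribute_pairs = align(kg1, kg2, {("name", "name")})
```

In this toy run, the known pair ("name", "name") aligns p1 with q1 in the first round, which reveals ("birthday-time", "birthdate") as a new target attribute pair. That pair in turn aligns p2 with q2 in the second round, even though neither entity has a "name" attribute.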
In order to facilitate understanding and explanation of the entity alignment method provided in the first embodiment of the method of the present application, the following sequentially describes specific implementation manners of S201 to S206.
First, a specific embodiment of S201 will be described.
In S201, each target attribute pair includes two attributes, where the two attributes are the same and belong to two knowledge-graphs, respectively. By way of example, the first target attribute pair includes a first attribute and a second attribute, wherein the first attribute and the second attribute are the same, and wherein the first attribute belongs to a first knowledge-graph and the second attribute belongs to a second knowledge-graph.
The first attribute and the second attribute being the same may mean that the attribute name and attribute value of the first attribute are completely identical to the attribute name and attribute value of the second attribute, respectively, or that they are semantically identical to them, respectively.
For ease of explanation and understanding of the target attribute pairs, reference will now be made to FIG. 1 and two examples.
The first example is as follows. As shown in FIG. 1, suppose the first attribute is an attribute of the second entity, with attribute name "name" and attribute value "Steve Jobs", and the second attribute is an attribute of the fifth entity, with attribute name "name" and attribute value "Steve Jobs". The attribute name and attribute value of the first attribute are then completely identical to those of the second attribute, so the first attribute is the same as the second attribute. Moreover, because the second entity belongs to the first knowledge graph KG1 and the fifth entity belongs to the second knowledge graph KG2, the first attribute, which belongs to the second entity, belongs to KG1, and the second attribute, which belongs to the fifth entity, belongs to KG2; the two attributes thus belong to two different knowledge graphs. Since the attribute name and attribute value of the first attribute are identical to those of the second attribute, and the two attributes belong to two different knowledge graphs, the first attribute and the second attribute form a target attribute pair.
The above is related content of the first example, and in this example, the target attribute pair is described by taking an example in which the attribute name and the attribute value of the first attribute are completely the same as the attribute name and the attribute value of the second attribute, respectively.
The second example is as follows. As shown in FIG. 1, suppose the first attribute is an attribute of the second entity, with attribute name "birthday-time" and attribute value "1955-2-24", and the second attribute is an attribute of the fifth entity, with attribute name "birthdate" and attribute value "1955.02.24". Because the attribute names "birthday-time" and "birthdate" both denote the date of birth, and the attribute values "1955-2-24" and "1955.02.24" both denote February 24, 1955, the attribute name and attribute value of the first attribute are semantically identical to those of the second attribute, so the first attribute is the same as the second attribute. Moreover, because the second entity belongs to the first knowledge graph KG1 and the fifth entity belongs to the second knowledge graph KG2, the first attribute, which belongs to the second entity, belongs to KG1, and the second attribute, which belongs to the fifth entity, belongs to KG2; the two attributes thus belong to two different knowledge graphs. Since the attribute name and attribute value of the first attribute are semantically identical to those of the second attribute, and the two attributes belong to two different knowledge graphs, the first attribute and the second attribute form a target attribute pair.
The above is related content of the second example, in which the target attribute pair is described by taking an example that the attribute name and the attribute value of the first attribute have the same semantics as those of the attribute name and the attribute value of the second attribute, respectively.
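How semantic equality of attribute values is decided is not specified in this example. As a rough illustration only, the sketch below assumes date-valued attributes and uses a hypothetical helper, normalize_date, to reduce differently formatted date strings to a common value before comparison:

```python
import re
from datetime import date


def normalize_date(text):
    """Parse 'YYYY-M-D' or 'YYYY.MM.DD' style strings into a date.

    Hypothetical helper for illustration; not part of the described method.
    """
    parts = re.split(r"[-./]", text.strip())
    if len(parts) != 3:
        raise ValueError(f"unrecognized date format: {text!r}")
    year, month, day = (int(p) for p in parts)
    return date(year, month, day)


# The two attribute values from the second example.
same_value = normalize_date("1955-2-24") == normalize_date("1955.02.24")
print(same_value)
```

Other value types (for example, lengths such as "188 cm") would need their own normalization rules; the sketch covers dates only.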
In addition, each known target attribute pair may be a preset attribute pair, or an attribute pair obtained by using a preset algorithm in advance. The preset algorithm may be any algorithm capable of determining the target attribute pair, and this is not specifically limited in the embodiment of the present application.
Since the known target attribute pairs may adopt different obtaining manners, S201 may adopt various embodiments accordingly, and the following description will take one embodiment as an example.
As an embodiment, S201 may specifically be: determining the known target attribute pairs in the two knowledge graphs by using a preset algorithm, and taking these target attribute pairs as the reference attribute pairs.
The above is a specific implementation manner of S201, in this implementation manner, each target attribute pair may be determined from two knowledge maps by using a preset algorithm, and each determined target attribute pair may be used as each reference attribute pair.
A specific embodiment of S202 will be described below.
In S202, each target entity pair includes two entities, wherein the two entities are the same (i.e., the two entities refer to the same content), and the two entities belong to two knowledge-graphs respectively.
As an example, as shown in FIG. 1, the second entity and the fifth entity both refer to the person "arbor"; the second entity belongs to the first knowledge graph KG1 and the fifth entity belongs to the second knowledge graph KG2. Thus, the second entity and the fifth entity may form a target entity pair.
Moreover, in the embodiment of the present application, whether the first entity and the second entity are the same may be determined according to the attribute information of the entities.
Since each piece of attribute information may include an attribute name and an attribute value, a target entity pair may be determined from the attribute name and the attribute value of each attribute of an entity. To further improve the accuracy of the target entity pairs, the present application provides an embodiment of S202, in which S202 may specifically be: screening each target entity pair from the two knowledge graphs according to the attribute values of the two attributes included in each reference attribute pair.
For ease of explanation and understanding of the above-provided embodiments of S202, the following description will be made in conjunction with fig. 1.
As an example, assume that the reference attribute pairs obtained in S201 include a first reference attribute pair and a second reference attribute pair. Specifically, the first reference attribute pair includes a first attribute and a second attribute, where the first attribute is an attribute of the second entity with the attribute name "birthday-time" and the attribute value "1955-2-24", and the second attribute is an attribute of the fifth entity with the attribute name "birthdate" and the attribute value "1955.02.24". The second reference attribute pair includes a third attribute and a fourth attribute, where the third attribute is an attribute of the second entity with the attribute name "height" and the attribute value "188 cm", and the fourth attribute is an attribute of the fifth entity with the attribute name "height" and the attribute value "188 centi-meter".
If the above assumption holds, S202 may specifically be: screening, from the first knowledge graph KG1 and the second knowledge graph KG2, the target entity pair that includes the second entity and the fifth entity, according to the attribute value "1955-2-24" of the first attribute and the attribute value "1955.02.24" of the second attribute in the first reference attribute pair, and the attribute value "188 cm" of the third attribute and the attribute value "188 centi-meter" of the fourth attribute in the second reference attribute pair.
It should be noted that, the above is described by taking screening of the target entity pair according to two reference attribute pairs as an example, however, in the present application, the reference attribute pair may be at least one pair, and the screened target entity pair may also be at least one pair, and a process of screening at least one target entity pair according to at least one reference attribute pair is the same as that in the above embodiment, and for the sake of brevity, is not described again here.
Based on the above-described one embodiment of S202, in order to further improve the accuracy of the target entity pair and further improve the accuracy of the entity alignment method, the present application further provides another embodiment of S202, in which S202 may specifically include steps S2021-S2022:
S2021: Determine each initial entity pair in the two knowledge graphs.
The initial entity pair has at least one reference attribute pair, and each reference attribute pair has an attribute value similarity corresponding to the similarity between the attribute values of the two attributes included in the corresponding reference attribute pair.
For ease of understanding and explaining the initial entity pair, reference will be made to fig. 3, where fig. 3 is an exemplary diagram of a process for iteratively screening the target entity pair according to an embodiment of the present application.
As shown in FIG. 3, assume that the first knowledge graph KG1 includes a first entity, a second entity and a third entity, and that the second knowledge graph KG2 includes a fourth entity, a fifth entity and a seventh entity.
In addition, the attributes of the first knowledge graph KG1 are indexed by i and the attributes of the second knowledge graph KG2 are indexed by j, where i and j are positive integers. Furthermore, each reference attribute pair includes one attribute of KG1 and one attribute of KG2, and the following attribute value similarities are assumed: for the first reference attribute pair, the attribute value similarity between the corresponding attributes of the first entity and the fourth entity is 0.78; for the second reference attribute pair, the attribute value similarity between the corresponding attributes of the first entity and the fourth entity is 0.90; for the third reference attribute pair, the attribute value similarity between the corresponding attributes of the first entity and the fourth entity is 0.85, and that between the corresponding attributes of the second entity and the fifth entity is 0.80; for the fourth reference attribute pair, the attribute value similarity between the corresponding attributes of the second entity and the fifth entity is 0.95; and for the fifth reference attribute pair, between the corresponding attributes of the second entity and the seventh entity,
the attribute value similarity is 0.3.
When the above assumption holds, it can be seen that, in the first knowledge graph KG1 and the second knowledge graph KG2, the first entity and the fourth entity share three reference attribute pairs, namely the first, second and third reference attribute pairs; the second entity and the fifth entity share two reference attribute pairs, namely the third and fourth reference attribute pairs; and the second entity and the seventh entity share the fifth reference attribute pair. In this case, S2021 may specifically be: determining, in KG1 and KG2, that the first entity and the fourth entity, the second entity and the fifth entity, and the second entity and the seventh entity each form an initial entity pair.
The above is a specific embodiment of S2021, and in this embodiment, each initial entity pair may be determined from two knowledge-graphs according to a reference attribute pair.
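The determination of initial entity pairs in S2021 can be sketched as follows. This is only an illustration under stated assumptions: each knowledge graph is modeled as a dictionary from a hypothetical entity identifier to its attributes, a reference attribute pair is a pair of attribute names, and difflib.SequenceMatcher stands in for the attribute value similarity, which the text does not fix.

```python
from difflib import SequenceMatcher


def value_similarity(a, b):
    # Stand-in similarity measure (assumption): the text does not specify one.
    return SequenceMatcher(None, a, b).ratio()


def initial_entity_pairs(kg1, kg2, reference_pairs):
    """Return {(e1, e2): [similarities]} for cross-graph entity pairs that
    share at least one reference attribute pair.

    kg1, kg2: dict mapping entity id -> {attribute name: attribute value}
    reference_pairs: list of (attribute name in kg1, attribute name in kg2)
    """
    pairs = {}
    for e1, attrs1 in kg1.items():
        for e2, attrs2 in kg2.items():
            sims = [
                value_similarity(attrs1[n1], attrs2[n2])
                for n1, n2 in reference_pairs
                if n1 in attrs1 and n2 in attrs2
            ]
            if sims:  # at least one reference attribute pair is shared
                pairs[(e1, e2)] = sims
    return pairs


# Entity ids "e2" and "e5" are illustrative labels for the second and fifth entity.
kg1 = {"e2": {"birthday-time": "1955-2-24", "height": "188 cm"}}
kg2 = {"e5": {"birthdate": "1955.02.24", "height": "188 centi-meter"}}
refs = [("birthday-time", "birthdate"), ("height", "height")]
result = initial_entity_pairs(kg1, kg2, refs)
```

Each initial entity pair carries one attribute value similarity per shared reference attribute pair, which is the input that S2022 works on.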
S2022: Determine whether the initial entity pair is a target entity pair according to the at least one attribute value similarity of the initial entity pair.
The attribute value similarity refers to the similarity between the attribute values of the two attributes included in the corresponding reference attribute pair. For example, in FIG. 3, the attribute value similarity 0.78 represents the similarity between the attribute value of an attribute of the first entity and the attribute value of the corresponding attribute of the fourth entity.
As an example, when the initial entity pair includes N attribute value similarities, S2022 may specifically be: judging whether the initial entity pair belongs to a target entity pair or not according to the similarity of the M attribute values in the initial entity pair; wherein M is a positive integer and M is less than or equal to N.
As an embodiment, in order to further improve the accuracy of the target entity pairs and thereby the accuracy of the entity alignment method, S2022 may specifically include S2022a-S2022b:
S2022a: Calculate the average value of the at least one attribute value similarity of the initial entity pair as the attribute similarity of the initial entity pair.
The attribute similarity refers to the similarity between the attributes of the two entities included in the corresponding entity pair. For example, as shown in FIG. 3, when the first initial entity pair includes the first entity and the fourth entity, the attribute similarity of the first initial entity pair represents the similarity between the attributes of the first entity and the attributes of the fourth entity.
As an example, when the initial entity pair includes N attribute value similarities, S2022a may specifically be: calculating the average value of the similarity of the M attribute values of the initial entity pair, and taking the average value as the attribute similarity of the initial entity pair; wherein M is a positive integer and M is less than or equal to N.
For convenience of explanation and understanding, the following description will be made with reference to FIG. 3, taking M = N as an example.
By way of example, when the assumption made for FIG. 3 in S2021 holds, the first initial entity pair includes the first entity and the fourth entity and has three attribute value similarities, 0.78, 0.90 and 0.85; the second initial entity pair includes the second entity and the fifth entity and has two attribute value similarities, 0.80 and 0.95; and the third initial entity pair includes the second entity and the seventh entity and has one attribute value similarity in total, 0.3. In this case, S2022a may specifically be: calculating the average value 0.843 of 0.78, 0.90 and 0.85, and taking 0.843 as the attribute similarity of the first initial entity pair; calculating the average value 0.875 of 0.80 and 0.95, and taking 0.875 as the attribute similarity of the second initial entity pair; and taking 0.3, the average of the single value 0.3, as the attribute similarity of the third initial entity pair.
It should be noted that, although S2022a is described above by taking M = N as an example, in the present application M in S2022a may be equal to N or may be any positive integer smaller than N. When M takes a different value, S2022a is executed in the same way as in the above example, which for the sake of brevity is not described again here.
In the above embodiment of S2022a, an average value of the similarity of at least one attribute value of the initial entity pair may be calculated, and the average value may be used as the attribute similarity of the initial entity pair.
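The averaging in S2022a can be checked with a few lines of code. The sketch below reuses the attribute value similarities of the FIG. 3 example with M = N; the dictionary keys are illustrative labels only.

```python
from statistics import mean

# Attribute value similarities of the three initial entity pairs (FIG. 3 example).
similarities = {
    "first initial entity pair": [0.78, 0.90, 0.85],
    "second initial entity pair": [0.80, 0.95],
    "third initial entity pair": [0.3],
}

# S2022a: the attribute similarity is the mean of the attribute value similarities.
attribute_similarity = {
    pair: mean(values) for pair, values in similarities.items()
}

for pair, score in attribute_similarity.items():
    print(f"{pair}: {score:.3f}")
```

To use only M of the N similarities, the list passed to mean would simply be a chosen subset; how that subset is chosen is not specified in the text.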
S2022b: If the calculated attribute similarity of the initial entity pair is greater than a first preset threshold, determine that the initial entity pair is a target entity pair.
The first preset threshold may be preset, for example, the first preset threshold may be preset according to an application scenario.
As an example, if the first preset threshold is preset to be 0.7, S2022b may specifically be: judging whether the attribute similarity of the initial entity pair is greater than a first preset threshold value, if so, determining that the initial entity pair is a target entity pair; if not, determining that the initial entity pair is not the target entity pair.
For ease of explanation and understanding of S2022b, the following description is made in conjunction with fig. 3.
As an example, when the assumption made for FIG. 3 in S2021 holds, the attribute similarity of the first initial entity pair is 0.843, the attribute similarity of the second initial entity pair is 0.875, the attribute similarity of the third initial entity pair is 0.3, and the first preset threshold is 0.7, S2022b may specifically be: since both 0.843 and 0.875 are greater than 0.7, the attribute similarities of the first and second initial entity pairs are both greater than the first preset threshold, so it may be determined that both the first initial entity pair and the second initial entity pair are target entity pairs; since 0.3 is not greater than 0.7, the third initial entity pair is determined not to be a target entity pair.
The above is a specific implementation of S2022b. In this implementation, whether the initial entity pair is a target entity pair may be determined by judging whether the calculated attribute similarity of the initial entity pair is greater than the first preset threshold.
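The threshold test of S2022b amounts to a strict comparison. A minimal sketch, using the example threshold 0.7 and the three attribute similarities from the FIG. 3 example (the names below are illustrative):

```python
FIRST_THRESHOLD = 0.7  # value used in the worked example; preset per application


def is_target_entity_pair(attribute_similarity, threshold=FIRST_THRESHOLD):
    # Strictly greater than the threshold, as stated in S2022b.
    return attribute_similarity > threshold


scores = {"first": 0.843, "second": 0.875, "third": 0.3}
targets = [name for name, s in scores.items() if is_target_entity_pair(s)]
print(targets)
```

With these values only the first and second initial entity pairs pass, so the third initial entity pair is left for a later screening cycle.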
The above is a specific implementation manner of S202, in this implementation manner, the attribute similarity of each entity pair may be determined according to each reference attribute pair, and each target entity pair may be screened from the two knowledge graphs according to the attribute similarity of each entity pair. In this way, the attribute of each entity can represent the entity more truly and comprehensively, so that the accuracy of the target entity pair screened according to the attribute similarity of the entity pair is higher.
A specific embodiment of S203 is described below.
In S203, the currently screened target entity pair refers to the target entity pair screened by performing step S202 in the current screening cycle.
Since the present application may implement the screening process for the target entity pair by iteratively performing steps S202 to S205, the screening process for the target entity pair provided by the present application may include at least one screening cycle. In addition, in each screening cycle, whether to continue the next screening cycle may be determined by judging whether the target entity pair can be screened out in the current screening cycle, and the determining process may specifically be: if at least one pair of target entity pairs can be screened in the current screening period, continuing to execute the next screening period; and if the target entity pair cannot be screened in the current screening period, ending the screening process of the target entity pair, and forming a first entity pair set by using the target entity pairs obtained in all screening periods before the current screening period.
The above is a specific implementation manner of S203, in this implementation manner, it may be determined whether to continue to execute the next screening cycle by determining whether the number of the currently screened target entity pairs is 0.
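The control flow of the screening cycles can be sketched as a loop. This is an illustration only: screen_entities and screen_attributes are hypothetical callables standing in for S202 and S204, and subtracting already found pairs from each cycle's result is an assumption made here so that the loop is guaranteed to end.

```python
def iterate_screening(reference_pairs, screen_entities, screen_attributes):
    """Alternate entity screening and attribute screening until a cycle
    yields no target entity pair."""
    first_entity_pair_set = set()
    while True:
        # S202: screen target entity pairs with the current reference attribute pairs.
        new_pairs = set(screen_entities(reference_pairs)) - first_entity_pair_set
        # S203: stop when the current cycle screens out nothing.
        if not new_pairs:
            break
        first_entity_pair_set |= new_pairs
        # S204 and S205: new target attribute pairs become the next reference pairs.
        reference_pairs = screen_attributes(new_pairs)
    # S206: the pairs found in all cycles form the first entity pair set.
    return first_entity_pair_set


# Toy lookup tables: cycle 1 finds two pairs, cycle 2 finds one, cycle 3 none.
entity_results = {
    frozenset({"r1"}): {("e1", "e4"), ("e2", "e5")},
    frozenset({"r6", "r7"}): {("e2", "e7")},
}
attribute_results = {
    frozenset({("e1", "e4"), ("e2", "e5")}): {"r6", "r7"},
}
result = iterate_screening(
    {"r1"},
    lambda refs: entity_results.get(frozenset(refs), set()),
    lambda pairs: attribute_results.get(frozenset(pairs), set()),
)
print(sorted(result))
```

The toy tables mirror the FIG. 3 walk-through: two target entity pairs in the first screening cycle and one more in the second.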
A specific embodiment of S204 is described below.
In S204, since the reference contents of the two entities belonging to the same target entity pair are the same and each entity includes a plurality of attributes, it can be determined that the two attributes having the same semantic meaning respectively belonging to the two entities are also the same. Therefore, each new target attribute pair can be screened from the two knowledge maps according to the screened target entity pair.
As an embodiment, S204 may specifically include steps S2041 to S2043:
S2041: For each screened target entity pair, combine the candidate attributes under the target entity pair pairwise to obtain the candidate attribute pairs.
The two attributes included in the attribute pair to be selected do not belong to the determined target attribute pairs and respectively belong to the two entities in the target entity pair.
For ease of explanation and understanding of the candidate attribute pairs, reference will now be made to FIG. 3.
When the assumption made for FIG. 3 in S2021 holds and the first entity and the fourth entity form the first target entity pair, two attributes of the first entity and two attributes of the fourth entity may be taken as the candidate attributes of the first target entity pair, and each of the two attributes of the first entity may be combined with each of the two attributes of the fourth entity to obtain four candidate attribute pairs: a first candidate attribute pair, a second candidate attribute pair, a third candidate attribute pair and a fourth candidate attribute pair.
In addition, when the second entity and the fifth entity form the second target entity pair, two attributes of the second entity and two attributes of the fifth entity may be taken as the candidate attributes of the second target entity pair, and each of the two attributes of the second entity may be combined with each of the two attributes of the fifth entity to obtain four candidate attribute pairs: a fifth candidate attribute pair, a sixth candidate attribute pair, a seventh candidate attribute pair and an eighth candidate attribute pair.
It should be noted that the specific implementation of S2041 is explained above by taking the process of obtaining candidate attribute pairs from the first target entity pair and the second target entity pair as an example. However, in the present application, S2041 is not limited to the above implementation and may adopt other implementations, which for the sake of brevity are not described here.
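The pairwise combination in S2041 is a Cartesian product over the attributes that are not yet part of a target attribute pair. A minimal sketch, with hypothetical attribute names (the source identifies attributes only by symbols):

```python
from itertools import product


def candidate_attribute_pairs(attrs1, attrs2, known_target_pairs):
    """Pair every not-yet-aligned attribute of one entity with every
    not-yet-aligned attribute of the other entity."""
    used1 = {a for a, _ in known_target_pairs}
    used2 = {b for _, b in known_target_pairs}
    free1 = [a for a in attrs1 if a not in used1]
    free2 = [b for b in attrs2 if b not in used2]
    return list(product(free1, free2))


# Hypothetical names: five attributes per entity, three already aligned.
pairs = candidate_attribute_pairs(
    ["a1", "a2", "a3", "a4", "a5"],
    ["b1", "b2", "b3", "b4", "b5"],
    known_target_pairs=[("a1", "b1"), ("a2", "b2"), ("a3", "b3")],
)
print(pairs)
```

Two free attributes on each side give four candidate attribute pairs, matching the count in the example for each target entity pair.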
S2042: Calculate the similarity between the attribute values of the two attributes included in the candidate attribute pair as the attribute value similarity of the candidate attribute pair.
As an example, when the first candidate attribute pair includes an attribute of the first entity and an attribute of the fourth entity, S2042 may specifically be: calculating the similarity between the attribute values of the two attributes in the first candidate attribute pair as the attribute value similarity of the first candidate attribute pair.
S2043: If the calculated attribute value similarity of the candidate attribute pair is greater than a second preset threshold, determine that the candidate attribute pair is a new target attribute pair.
The second preset threshold may be preset, for example, the second preset threshold may be preset according to an actual application scenario.
For ease of explanation and understanding of S2043, the following description will be made in conjunction with fig. 3.
As an example, assume that the assumption made for FIG. 3 in S2021 holds and that the attribute value similarities of the first to eighth candidate attribute pairs are 0.8, 0.2, 0.1, 0.15, 0.2, 0.3, 0.85 and 0.13, respectively. When the second preset threshold is 0.6, only the attribute value similarity 0.8 of the first candidate attribute pair and the attribute value similarity 0.85 of the seventh candidate attribute pair are greater than the second preset threshold, so it can be determined that the first candidate attribute pair and the seventh candidate attribute pair are new target attribute pairs.
In this embodiment, at least one pair of attribute pairs to be selected may be obtained according to the screened target entity pair, and whether the attribute pair to be selected is a new target attribute pair may be determined according to the similarity of the attribute values of each attribute pair to be selected, so that a new target attribute pair may be extracted from an existing target entity pair, so as to use the new target attribute pair as a reference attribute pair and continue to perform the next screening cycle.
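The selection in S2043 can be reproduced with the values of the FIG. 3 example. In the sketch below the dictionary keys are illustrative labels for the eight candidate attribute pairs, and 0.6 is the example value of the second preset threshold.

```python
SECOND_THRESHOLD = 0.6  # value used in the worked example

# Attribute value similarities of the eight candidate attribute pairs.
candidate_similarity = {
    "first": 0.8, "second": 0.2, "third": 0.1, "fourth": 0.15,
    "fifth": 0.2, "sixth": 0.3, "seventh": 0.85, "eighth": 0.13,
}

# S2043: keep candidates whose similarity is strictly above the threshold.
new_target_attribute_pairs = [
    name for name, sim in candidate_similarity.items() if sim > SECOND_THRESHOLD
]
print(new_target_attribute_pairs)
```

The two surviving pairs are the ones that serve as reference attribute pairs in the next screening cycle.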
A specific embodiment of S205 is described below.
As an embodiment, when the total number of new target attribute pairs is N, S205 may specifically be: taking the N new target attribute pairs as N reference attribute pairs, so that each new target entity pair is screened from the two knowledge graphs according to the N reference attribute pairs.
For ease of explanation and understanding of S205, it will be explained below in conjunction with fig. 3.
As an example, when the assumption made for FIG. 3 in S2021 holds and the first candidate attribute pair and the seventh candidate attribute pair are new target attribute pairs, S205 may specifically be: taking the first candidate attribute pair as a sixth reference attribute pair and the seventh candidate attribute pair as a seventh reference attribute pair, so that, in S202, a third target entity pair including the second entity and the seventh entity can be screened from the first knowledge graph KG1 and the second knowledge graph KG2 according to the sixth reference attribute pair and the seventh reference attribute pair.
The above is a specific implementation of S205, in this implementation, each new target attribute pair that is screened out may be used as each reference attribute pair, and S202 is executed back to perform the screening of the target entity pair in the next screening cycle based on the reference attribute pair.
A specific embodiment of S206 is described below.
In S206, all the screened target entity pairs refer to the target entity pairs screened in all the screening cycles.
For ease of explanation and understanding of S206, it will be explained below in conjunction with fig. 3.
As an example, when the assumption made for FIG. 3 in S2021 holds, the first target entity pair including the first entity and the fourth entity and the second target entity pair including the second entity and the fifth entity are screened out in the first screening cycle, and the third target entity pair including the second entity and the seventh entity is screened out in the second screening cycle, S206 may specifically be: collecting the first target entity pair, the second target entity pair and the third target entity pair to obtain the first entity pair set.
The above is a specific embodiment of S206. In this embodiment, all the screened target entity pairs may be aggregated to obtain the first entity pair set.
The above is a specific implementation of the first method embodiment, in which the first entity pair set is obtained by iteratively performing two screening steps: screening target entity pairs from the two knowledge graphs according to the target attribute pairs, and screening new target attribute pairs from the two knowledge graphs according to the target entity pairs. Because each target entity pair in the first entity pair set is obtained from the attribute information of different entities, and the attribute information of an entity characterizes the entity more truly and comprehensively, performing entity alignment with the attribute information of entities can improve the accuracy of the entity alignment result. In addition, because target attribute pairs can be screened from the knowledge graphs according to the target entity pairs, two attributes that have the same semantics but are expressed differently can form a target attribute pair; this overcomes the problem that attributes cannot be aligned because they are expressed in various ways, and further improves the accuracy of the entity alignment result. Furthermore, because the target attribute pairs and the target entity pairs are generated during the iteration, no training data containing a large number of pre-aligned entity pairs is needed; this avoids the low alignment accuracy caused by low-quality training data and improves the accuracy of the entity alignment result.
In the entity alignment method provided in the first embodiment of the method, at least one pair of target entity pairs is obtained by using the entity attributes, and the entity attributes can represent the entity more truly and comprehensively, so that the accuracy of the target entity pairs is improved, and the accuracy of the entity alignment result is improved.
In addition, in order to further improve the accuracy of the target entity pair and thus further improve the accuracy of the entity alignment result, the target entity pair may also be obtained by simultaneously using the entity attribute and the entity relationship, so that the present application further provides another entity alignment method, which will be explained and explained below with reference to the accompanying drawings.
Method embodiment two
For the sake of brevity, the same contents as those in the first method embodiment are not described again.
Referring to fig. 4, it is a flowchart of an entity alignment method provided in the second embodiment of the present application.
The entity alignment method provided by the embodiment of the application comprises S401-S407:
it should be noted that S401 to S406 are the same as S201 to S206 in the first embodiment of the method, and are not repeated herein for brevity.
S407: Screen each target entity pair from the two knowledge graphs by using a pre-trained entity alignment model, to form a second entity pair set.
It should be noted that the present application does not limit the execution order of S407; S407 may be executed before, after, or in parallel with S401 to S406.
In S407, the entity alignment model is used to screen the target entity pair based on the entity relationship, and the entity alignment model may be any model that utilizes the entity relationship to perform entity pair screening.
By way of example, the entity alignment model may be any model based on word vector embedding. In such a model, the relation triple (h, r, t) corresponding to each entity pair may be mapped into a vector space to obtain $(\mathbf{h}, \mathbf{r}, \mathbf{t})$, so that the similarity between different entities, i.e., the relationship similarity of an entity pair, is measured by the distance between the vectors of those entities. Here, h represents the head entity of the relation triple; r represents the relationship between the head entity and the tail entity of the relation triple; t represents the tail entity of the relation triple; $\mathbf{h}$ represents the vector corresponding to the head entity; $\mathbf{r}$ represents the vector corresponding to the relationship; and $\mathbf{t}$ represents the vector corresponding to the tail entity.
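As an illustration of measuring relationship similarity by vector distance, the following sketch assumes that entity vectors have already been learned by some embedding model. The Euclidean distance and the mapping 1 / (1 + distance) are assumptions chosen for the example; the text above does not fix a particular distance or a particular conversion from distance to similarity.

```python
import math


def euclidean_distance(u, v):
    """Distance between two entity vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def relation_similarity(u, v):
    """Map a distance to a similarity in (0, 1]; identical vectors give 1.0.

    The 1 / (1 + d) mapping is an illustrative assumption.
    """
    return 1.0 / (1.0 + euclidean_distance(u, v))
```

A smaller distance between the vectors of two entities from the two knowledge graphs yields a larger relationship similarity for that entity pair.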
The entity alignment model may be trained in advance using model training data. The model training data may include at least one target entity pair and may come from a wide range of sources: it may be a training data set composed of at least one manually labeled target entity pair, or it may be composed of at least one target entity pair obtained by a preset labeling algorithm.
For ease of explanation and understanding of the entity alignment model, an embodiment of training the entity alignment model will be described as an example.
As an implementation manner, in order to improve the efficiency and accuracy of the entity alignment method, the preset labeling algorithm may be any entity alignment method provided in the first embodiment of the method (i.e., any implementation manner of steps S401 to S406), and thus, the model training data of the entity alignment model may include a target entity pair with high correctness screened from the first entity pair set.
Wherein the first set of entity pairs is generated by step S406; moreover, the process of screening out the target entity pair with high correctness from the first entity pair set may specifically be: firstly, for each target entity pair in a first entity pair set, determining an average value of attribute value similarity of at least one same attribute of the target entity pair, and taking the average value as the alignment correctness of the target entity pair; and secondly, comparing the alignment correctness of each target entity pair with a preset correctness threshold, and acquiring the target entity pair with the alignment correctness higher than the preset correctness threshold as model training data.
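The screening of high-correctness target entity pairs described above can be sketched as follows. The input layout (a dictionary from an entity pair to its list of attribute value similarities) and the function names are hypothetical and used only for illustration.

```python
def alignment_correctness(attr_value_sims):
    """Average attribute value similarity of one target entity pair."""
    return sum(attr_value_sims) / len(attr_value_sims)


def select_training_pairs(pair_sims, correctness_threshold):
    """Keep the pairs whose alignment correctness exceeds the preset threshold.

    pair_sims: dict mapping (entity_1, entity_2) -> list of attribute value
    similarities for the attributes the two entities share (assumed layout).
    """
    return [
        pair
        for pair, sims in pair_sims.items()
        if sims and alignment_correctness(sims) > correctness_threshold
    ]
```

The returned pairs would then serve as the model training data for the entity alignment model.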
In this embodiment, the entity alignment model may be trained with model training data formed by the target entity pairs with high correctness screened from the first entity pair set, and the trained entity alignment model may then be used to screen each target entity pair from the two knowledge graphs to form the second entity pair set.
In this embodiment, after the first entity pair set is obtained according to the entity attributes, each target entity pair may be screened out from the two knowledge graphs by using an entity alignment model based on the entity relationship to form a second entity pair set. Therefore, the finally obtained target entity pairs are obtained based on the entity attributes and the entity relationships, the accuracy of the finally obtained target entity pairs is improved, and the accuracy and the comprehensiveness of the entity alignment results are improved.
The entity alignment method provided in the above method embodiment one and method embodiment two can obtain the target entity pair according to the entity attribute and/or the entity relationship.
In addition, in order to further improve the accuracy of the entity alignment result, the attribute similarity and the relationship similarity of the target entity pair may be comprehensively evaluated to obtain a final target entity pair set.
Method embodiment three
For the sake of brevity, details of the same parts in the third method embodiment as those in the second method embodiment are not repeated herein.
Referring to fig. 5, it is a flowchart of an entity alignment method provided in the third embodiment of the present application.
The entity alignment method provided by the embodiment of the application comprises the following steps of S501-S509:
it should be noted that S501 to S507 are the same as S401 to S407 in the second embodiment of the method, and for brevity, are not repeated herein.
S508: Combine the first entity pair set and the second entity pair set to form a third entity pair set.
S509: Remove the target entity pairs with low accuracy from the third entity pair set, and take the remaining entity pairs as a fourth entity pair set.
In order to facilitate understanding and explanation of the entity alignment method provided in the third embodiment of the method of the present application, the following sequentially describes specific implementation manners of S508 and S509.
First, a specific embodiment of S508 will be described.
S508 can adopt three embodiments, which will be described in turn with reference to the accompanying drawings.
As a first embodiment, as shown in fig. 6, S508 may specifically be: aggregating the target entity pairs included in the first entity pair set and the target entity pairs included in the second entity pair set to obtain the third entity pair set.
In this first embodiment of S508, all target entity pairs in the first entity pair set and the second entity pair set are aggregated to form the third entity pair set, so a target entity pair that belongs to both sets may appear more than once.
In addition, to further improve the efficiency of the entity alignment method, the third entity pair set may be required to contain no duplicate target entity pairs. The present application therefore also provides a second embodiment and a third embodiment of S508, which are described in turn below.
As a second embodiment, as shown in fig. 7, S508 may specifically be: and acquiring a union of the first entity pair set and the second entity pair set, and taking the union as a third entity pair set.
As a third embodiment, S508 may specifically be: firstly, target entity pairs included in a first entity pair set and target entity pairs included in a second entity pair set are aggregated to obtain an initial third entity pair set (as shown in fig. 6); then, the repeated target entity pairs are deleted from the initial third entity pair set, resulting in a third entity pair set (as shown in fig. 7).
In each of the above three embodiments of S508, the first entity pair set and the second entity pair set are merged to form the third entity pair set.
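The three ways of forming the third entity pair set can be compared in a short sketch. The function names are hypothetical, and entity pairs are represented as tuples purely for illustration.

```python
def merge_keep_duplicates(first_set, second_set):
    """First embodiment: aggregate all pairs; pairs in both sets appear twice."""
    return list(first_set) + list(second_set)


def merge_union(first_set, second_set):
    """Second embodiment: take the union, so no pair is repeated."""
    return set(first_set) | set(second_set)


def merge_then_deduplicate(first_set, second_set):
    """Third embodiment: aggregate first, then delete repeated pairs.

    dict.fromkeys keeps the first occurrence of each pair and preserves order.
    """
    return list(dict.fromkeys(merge_keep_duplicates(first_set, second_set)))
```

The second and third variants produce the same pairs; the third simply reaches that result in two explicit steps.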
A specific embodiment of S509 is described below.
In S509, the accuracy of the target entity pair may be determined according to a plurality of indexes, for example, the accuracy of the target entity pair may be determined only according to the attribute similarity, may be determined only according to the relationship similarity, and may be determined according to the integrated value of the attribute similarity and the relationship similarity.
For convenience of explanation and understanding, the accuracy of determining the target entity pair according to the integrated value of the attribute similarity and the relationship similarity will be described below as an example.
As an embodiment, in order to improve the accuracy of the entity alignment method, S509 may specifically include S5091-S5092:
S5091: For a target entity pair that belongs to both the first entity pair set and the second entity pair set, determine the final similarity of the target entity pair according to its first similarity and second similarity.
Wherein, a target entity pair belonging to both the first entity pair set and the second entity pair set is an entity pair in an intersection (such as the intersection shown in fig. 7) of the first entity pair set and the second entity pair set.
The first similarity is a similarity between two entities included in the target entity pair (i.e., an attribute similarity of the entity pair) obtained when the first entity pair set is formed, and the second similarity is a similarity between two entities included in the target entity pair (i.e., a relationship similarity of the entity pair) obtained when the second entity pair set is formed.
As an embodiment, S5091 may specifically be: and for the target entity pair which belongs to the first entity pair set and the second entity pair set at the same time, determining the final similarity of the target entity pair according to the first similarity and the second similarity of the target entity pair and based on the respective confidence degrees of the first similarity and the second similarity.
For example, the confidence of the first similarity and the confidence of the second similarity may be set in advance according to the application scenario.
In addition, in order to further improve the accuracy of the confidence level, so as to further improve the accuracy of the final similarity and further improve the accuracy of the entity alignment method, the confidence level may be obtained by learning in model learning data by using a pre-constructed regression model, and the model learning data includes a target entity pair with high accuracy screened from the first entity pair set. For a specific screening process of the target entity pair with high correctness screened from the first entity pair set, reference may be made to "the process of screening the target entity pair with high correctness from the first entity pair set" provided in the specific implementation manner of step S407, and details are not repeated here for brevity.
Based on the above-mentioned related content of confidence, as an embodiment, S5091 may specifically include S50911-S50913:
S50911: Learn the confidence parameter λ in formula (1) from the model learning data by using a pre-constructed regression model.
The model learning data includes the target entity pairs with high correctness screened from the first entity pair set, and may be expressed in the form shown in Figure BDA0002010495540000231.

$$\mathrm{Sim}(e_i^1, e_j^2) = (1-\lambda)\,\mathrm{Sim}_a(e_i^1, e_j^2) + \lambda\,\mathrm{Sim}_r(e_i^1, e_j^2) \qquad (1)$$

In formula (1), $e_i^1$ represents the i-th entity in the first knowledge graph and $e_j^2$ represents the j-th entity in the second knowledge graph, where $e_i^1$ and $e_j^2$ can form a target entity pair; $\mathrm{Sim}_a(e_i^1, e_j^2)$ represents the first similarity of the target entity pair consisting of $e_i^1$ and $e_j^2$; $\mathrm{Sim}_r(e_i^1, e_j^2)$ represents the second similarity of that target entity pair; $\mathrm{Sim}(e_i^1, e_j^2)$ represents the final similarity of that target entity pair; and λ represents the confidence parameter.
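One way to learn the confidence parameter is an ordinary least-squares fit of the relation final similarity = (1 − λ) × first similarity + λ × second similarity. The sketch below is an assumption in several respects: the text does not specify which regression model is used or what target value each learning sample carries, so the `target` column, the closed-form fit, and the clipping of λ to [0, 1] are all hypothetical choices made for illustration.

```python
def fit_confidence_parameter(samples):
    """Least-squares estimate of lam in: final = (1 - lam) * first + lam * second.

    samples: iterable of (first_sim, second_sim, target_final_sim) tuples.
    The target column is a hypothetical supervision signal.
    """
    numerator = 0.0
    denominator = 0.0
    for first_sim, second_sim, target in samples:
        diff = second_sim - first_sim      # coefficient that multiplies lam
        residual = target - first_sim      # the part lam * diff must explain
        numerator += diff * residual
        denominator += diff * diff
    if denominator == 0.0:
        raise ValueError("first and second similarities coincide; lam is undetermined")
    lam = numerator / denominator
    # Clip so that lam and 1 - lam can both be read as confidences (assumption).
    return min(1.0, max(0.0, lam))
```

Rewriting the relation as final − first = λ × (second − first) turns the fit into a one-parameter linear regression through the origin, which is why a closed form suffices here.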
S50912: Obtain the confidence of the first similarity and the confidence of the second similarity according to the confidence parameter λ.
As an embodiment, S50912 may specifically be: taking 1 − λ as the confidence of the first similarity, and taking λ as the confidence of the second similarity.
S50913: For a target entity pair that belongs to both the first entity pair set and the second entity pair set, determine the final similarity of the target entity pair according to its first similarity and second similarity, weighted by their respective confidences.
As an embodiment, when the confidence of the first similarity is 1 − λ and the confidence of the second similarity is λ, S50913 may specifically be: for a target entity pair that belongs to both the first entity pair set and the second entity pair set, obtaining the final similarity of the target entity pair by formula (2) according to the first similarity, the second similarity, the confidence of the first similarity, and the confidence of the second similarity.
final similarity = first similarity × (1 − λ) + second similarity × λ    (2)
Wherein 1- λ represents a confidence of the first similarity; λ represents the confidence of the second similarity.
S5092: If the final similarity of a target entity pair is smaller than a third preset threshold, remove the target entity pair from the third entity pair set, and take the entity pair set remaining after the removal as the fourth entity pair set.
The third preset threshold may be preset, for example, the third preset threshold may be determined in advance according to an application scenario.
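Formula (2) and the removal in S5092 can be sketched together. The dictionaries of similarities and the function names are hypothetical. The sketch also assumes that a target entity pair belonging to only one of the two sets is kept unchanged, since the text defines the final similarity only for pairs in the intersection.

```python
def final_similarity(first_sim, second_sim, lam):
    """Formula (2): weight the two similarities by their confidences."""
    return first_sim * (1 - lam) + second_sim * lam


def remove_low_accuracy_pairs(third_set, first_sims, second_sims, lam, third_threshold):
    """Build the fourth entity pair set from the third entity pair set.

    first_sims / second_sims: dicts mapping a pair to its attribute / relationship
    similarity (assumed layout). Pairs present in only one dict are kept as they
    are, which is an assumption of this sketch.
    """
    fourth_set = []
    for pair in third_set:
        if pair in first_sims and pair in second_sims:
            score = final_similarity(first_sims[pair], second_sims[pair], lam)
            if score < third_threshold:
                continue  # low accuracy: drop this pair
        fourth_set.append(pair)
    return fourth_set
```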
In the above embodiment of S509, the target entity pairs with low accuracy are removed from the third entity pair set, and the remaining pairs form the fourth entity pair set.
In addition, in order to understand and explain the entity alignment method provided in the embodiments of the present application, a specific implementation of the entity alignment method will be described below with reference to fig. 8.
First, the meaning of each symbol in fig. 8 is described: $KG_1$ represents the first knowledge graph; $KG_2$ represents the second knowledge graph; $e_i^1$ represents the i-th entity in the first knowledge graph; $a_i^1$ represents the attribute name of the i-th attribute in the first knowledge graph; $v_i^1$ represents the attribute value corresponding to $a_i^1$ in the first knowledge graph; $r_t^1$ represents the t-th relationship in the first knowledge graph; $e_j^2$ represents the j-th entity in the second knowledge graph; $a_j^2$ represents the attribute name of the j-th attribute in the second knowledge graph; $v_j^2$ represents the attribute value corresponding to $a_j^2$ in the second knowledge graph; $r_s^2$ represents the s-th relationship in the second knowledge graph; $\mathrm{Sim}_a^m$ represents the attribute similarity (i.e., the first similarity) of the m-th target entity pair in the first entity pair set; and $\mathrm{Sim}_r^n$ represents the relationship similarity (i.e., the second similarity) of the n-th target entity pair in the second entity pair set.
Next, a specific implementation of the entity alignment method provided in the third method embodiment is described with reference to fig. 8 and fig. 9, in this implementation, the entity alignment method may specifically be:
S901: Acquire attribute triples and relation triples from the first knowledge graph KG1 and the second knowledge graph KG2.
The attribute triple is used for representing attribute information of each entity; relationship triples are used to represent relationship information between different entities.
S902: Normalize the attribute values in the attribute triples by using a preset normalization algorithm.
The preset normalization algorithm converts attribute values written in different forms into attribute values written in the same form. Any normalization algorithm may be used; for example, the preset normalization algorithm may be a regular-expression matching algorithm based on manually defined rules.
For example, the attribute value corresponding to "birth date" may be written as "1955-02-24", "02/24/1955", or "24 th Feb.1955". In this case, S902 may specifically be: normalizing "1955-02-24", "02/24/1955", and "24 th Feb.1955" with the preset normalization algorithm to obtain "1955/02/24".
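A regular-expression normalizer covering the three date forms in the example above might look like the following sketch. The patterns, the `MONTHS` table, and the function name `normalize_date` are illustrative assumptions, and the slash form is read as month/day/year because that is how the example resolves.

```python
import re

# Three-letter English month abbreviations (illustrative table).
MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}


def normalize_date(value):
    """Rewrite a recognized date string as YYYY/MM/DD; return other input unchanged."""
    text = value.strip()
    match = re.fullmatch(r"(\d{4})-(\d{1,2})-(\d{1,2})", text)
    if match:  # e.g. 1955-02-24
        year, month, day = int(match[1]), int(match[2]), int(match[3])
        return f"{year:04d}/{month:02d}/{day:02d}"
    match = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", text)
    if match:  # e.g. 02/24/1955, assumed to be month/day/year
        month, day, year = int(match[1]), int(match[2]), int(match[3])
        return f"{year:04d}/{month:02d}/{day:02d}"
    match = re.fullmatch(
        r"(\d{1,2})\s*(?:st|nd|rd|th)?\s*([A-Za-z]{3})[a-z]*\.?\s*(\d{4})", text)
    if match:  # e.g. 24 th Feb.1955
        month = MONTHS.get(match[2].lower())
        if month is not None:
            return f"{int(match[3]):04d}/{month:02d}/{int(match[1]):02d}"
    return value  # unrecognized form: leave the attribute value as it is
```

Returning unrecognized values unchanged keeps the normalizer safe to apply to every attribute value, not only to dates.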
S903: According to the attribute triples, execute the specific implementation corresponding to S501 to S506 to obtain the first entity pair set.
The specific implementation corresponding to S501 to S506 realizes the "interaction mode" process in fig. 8.
S904: Take the target entity pairs with high correctness screened from the first entity pair set as model training data, and train the entity alignment model.
S905: According to the relation triples and the trained entity alignment model, execute the specific implementation corresponding to S507 to obtain the second entity pair set.
S906: According to the first entity pair set and the second entity pair set, execute the specific implementation corresponding to S508 to S509 to obtain the fourth entity pair set.
The above is an implementation manner of the entity alignment method, in which the attribute values in the attribute triples can be normalized, the problem of attribute value noise caused by various expression manners of the attribute values can be avoided, and the accuracy and consistency of entity attribute description are improved, so that the accuracy of a target entity pair is improved, and the accuracy of an entity alignment result is improved.
In this embodiment, the obtained first entity pair set and the second entity pair set may be merged to obtain a third entity pair set, and a target entity pair with low accuracy is removed from the third entity pair set to serve as a fourth entity pair set. The accuracy of the target entity pair can be obtained by comprehensively evaluating the first similarity and the second similarity of the target entity pair, the first similarity is the attribute similarity of the target entity pair, and the second similarity is the relationship similarity of the target entity pair, so that the accuracy of the target entity pair can be obtained by comprehensively evaluating the attribute similarity and the relationship similarity of the target entity pair, and the accuracy and the comprehensiveness of the entity alignment result are improved.
Based on any one of the entity alignment methods provided in the first to third method embodiments, the present application further provides an entity alignment apparatus, which is described below with reference to the accompanying drawings.
Apparatus embodiment one
Referring to fig. 10, a schematic structural diagram of an entity alignment apparatus according to an embodiment of the present application is shown.
The entity alignment apparatus 1000 provided in the embodiment of the present application includes:
a reference attribute pair acquisition unit 1001, configured to determine each known target attribute pair in the two knowledge graphs as a reference attribute pair;
a target entity pair screening unit 1002, configured to screen each target entity pair from the two knowledge graphs according to each reference attribute pair;
a reference attribute pair screening unit 1003, configured to screen each new target attribute pair from the two knowledge graphs according to the screened target entity pairs, and to take each new target attribute pair as a reference attribute pair;
a target entity pair cyclic screening unit 1004, configured to use the newly screened reference attribute pairs to continue screening target entity pairs from the two knowledge graphs according to each reference attribute pair, until no further target entity pair can be screened out, so as to form a first entity pair set;
wherein the two attributes included in a target attribute pair are the same and belong to the two knowledge graphs respectively; and the two entities included in a target entity pair are the same and belong to the two knowledge graphs respectively.
As an embodiment, in order to further improve the accuracy of the entity alignment result, the target entity pair screening unit 1002 is specifically configured to:
screen each target entity pair from the two knowledge graphs according to the attribute values of the two attributes included in each reference attribute pair.
As an embodiment, in order to further improve the accuracy of the entity alignment result, the target entity pair screening unit 1002 includes:
an initial entity determining subunit, configured to determine initial entity pairs in the two knowledge graphs, where each initial entity pair has at least one reference attribute pair, each reference attribute pair corresponds to one attribute value similarity, and the attribute value similarity is the similarity between the attribute values of the two attributes included in the corresponding reference attribute pair;
and the target entity pair determining subunit is used for judging whether the initial entity pair belongs to the target entity pair according to the similarity of at least one attribute value of the initial entity pair.
As an embodiment, to further improve the accuracy of the entity alignment result, the target entity pair determining subunit includes:
the similarity average value calculating module is used for calculating the average value of the similarity of at least one attribute value of the initial entity pair;
and the target entity pair determining module is used for judging that the initial entity pair is the target entity pair if the calculated average value is greater than a first preset threshold value.
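The behavior of these subunits and modules can be sketched as one function. The knowledge graph layout (a dictionary from entity to a dictionary of attribute name and value), the use of `difflib.SequenceMatcher` as the attribute value similarity, and the function names are assumptions made for the example; the text does not prescribe a particular string similarity.

```python
from difflib import SequenceMatcher


def value_similarity(a, b):
    """String similarity in [0, 1]; SequenceMatcher is a stand-in measure."""
    return SequenceMatcher(None, a, b).ratio()


def screen_target_entity_pairs(kg1, kg2, reference_attr_pairs, first_threshold):
    """Return entity pairs whose mean attribute value similarity exceeds the threshold.

    kg1, kg2: dict mapping entity -> {attribute name: attribute value} (assumed layout).
    reference_attr_pairs: iterable of (attribute name in kg1, attribute name in kg2).
    """
    target_pairs = []
    for entity_1, attrs_1 in kg1.items():
        for entity_2, attrs_2 in kg2.items():
            # An initial entity pair needs at least one reference attribute pair.
            sims = [
                value_similarity(attrs_1[a1], attrs_2[a2])
                for a1, a2 in reference_attr_pairs
                if a1 in attrs_1 and a2 in attrs_2
            ]
            if sims and sum(sims) / len(sims) > first_threshold:
                target_pairs.append((entity_1, entity_2))
    return target_pairs
```

Comparing every entity of one graph with every entity of the other is quadratic; a practical system would likely restrict the initial entity pairs first, which this sketch does not attempt.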
As an embodiment, in order to further improve the accuracy of the entity alignment result, the reference attribute pair screening unit 1003 includes:
a candidate attribute pair obtaining subunit, configured to combine, for each screened target entity pair, every two of the candidate attributes under the target entity pair to obtain a candidate attribute pair under each combination, where two attributes included in the candidate attribute pair do not belong to each determined target attribute pair and respectively belong to two entities in the target entity pair;
an attribute similarity calculating subunit, configured to calculate the similarity between the attribute values of the two attributes included in the candidate attribute pair;
and a target attribute pair determining subunit, configured to determine that the candidate attribute pair is a new target attribute pair if the calculated similarity is greater than a second preset threshold.
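The screening of new target attribute pairs for one target entity pair can be sketched as follows. As before, `difflib.SequenceMatcher` stands in for the unspecified attribute value similarity, and the function name and input layout are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product


def screen_new_attribute_pairs(attrs_1, attrs_2, known_pairs, second_threshold):
    """Find new target attribute pairs under one target entity pair.

    attrs_1, attrs_2: {attribute name: attribute value} of the two aligned entities.
    known_pairs: already determined target attribute pairs, as (name_1, name_2).
    """
    known_1 = {name_1 for name_1, _ in known_pairs}
    known_2 = {name_2 for _, name_2 in known_pairs}
    new_pairs = []
    # Combine the candidate attributes of the two entities pairwise.
    for (name_1, value_1), (name_2, value_2) in product(attrs_1.items(), attrs_2.items()):
        if name_1 in known_1 or name_2 in known_2:
            continue  # attribute already belongs to a determined target attribute pair
        if SequenceMatcher(None, value_1, value_2).ratio() > second_threshold:
            new_pairs.append((name_1, name_2))
    return new_pairs
```

In this way two attributes with different names, such as "born" and "birth date", can be paired because their values agree on an entity pair that is already aligned.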
As an embodiment, in order to further improve the accuracy of the entity alignment result, the entity alignment apparatus 1000 further includes:
a second entity pair set generating unit, configured to screen each target entity pair from the two knowledge graphs by using a pre-trained entity alignment model to form a second entity pair set, where the entity alignment model is used to screen entity pairs based on entity relationships.
As an embodiment, in order to further improve the accuracy of the entity alignment result, the entity alignment model is obtained by training using model training data, where the model training data includes a target entity pair with high correctness screened from the first entity pair set.
As an embodiment, in order to further improve the accuracy of the entity alignment result, the entity alignment apparatus 1000 further includes:
a third entity pair set generating unit, configured to, after forming a second entity pair set, combine the first entity pair set and the second entity pair set to form a third entity pair set;
and the fourth entity pair set generating unit is used for removing the target entity pairs with low accuracy from the third entity pair set to serve as a fourth entity pair set.
As an embodiment, in order to further improve the accuracy of the entity alignment result, the fourth entity pair set generating unit includes:
a final similarity determining subunit, configured to determine, for a target entity pair that belongs to both the first entity pair set and the second entity pair set, a final similarity of the target entity pair according to the first similarity and the second similarity of the target entity pair;
wherein the first similarity is a similarity between two entities included in the target entity pair obtained when the first entity pair set is formed, and the second similarity is a similarity between two entities included in the target entity pair obtained when the second entity pair set is formed;
and the target entity pair removing subunit is used for removing the target entity pair from the third entity pair set if the final similarity of the target entity pair is smaller than a third preset threshold.
As an embodiment, in order to further improve the accuracy of the entity alignment result, the final similarity determining subunit is specifically configured to:
determine the final similarity of the target entity pair according to the first similarity and the second similarity, based on their respective confidences.
As an embodiment, in order to further improve the accuracy of the entity alignment result, the confidence is obtained by learning in model learning data by using a pre-constructed regression model, and the model learning data includes a target entity pair with high correctness screened from the first entity pair set.
Further, an embodiment of the present application further provides an entity alignment apparatus, including: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory is configured to store one or more programs, the one or more programs including instructions, which when executed by the processor, cause the processor to perform any one implementation of the entity alignment method described above.
Further, an embodiment of the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, the terminal device is caused to execute any implementation manner of the entity alignment method.
Further, an embodiment of the present application further provides a computer program product, which when running on a terminal device, causes the terminal device to execute any implementation manner of the entity alignment method.
As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that all or part of the steps in the above embodiment methods can be implemented by software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network communication device such as a media gateway, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. The apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, so its description is relatively brief; for relevant details, refer to the description of the method.
It is further noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (16)

1. A method of entity alignment, comprising:
determining each known target attribute pair in the two knowledge graphs as each reference attribute pair;
screening each target entity pair from the two knowledge graphs according to each reference attribute pair;
screening out each new target attribute pair from the two knowledge graphs as each reference attribute pair according to the screened target entity pair, and continuously performing the step of screening out each target entity pair from the two knowledge graphs according to each reference attribute pair until the target entity pair cannot be screened out, so as to form a first entity pair set;
the attribute names and the attribute values of the two attributes included in the target attribute pair are respectively the same, and the two attributes respectively belong to two knowledge graphs; the two entities included in the target entity pair are the same and belong to two knowledge graphs respectively.
2. The method of claim 1, wherein the screening out each pair of target entities in two knowledge-graphs according to each pair of reference attributes comprises:
and screening each target entity pair from the two knowledge graphs according to the attribute values of the two attributes included in each reference attribute pair.
3. The method of claim 2, wherein the screening out of target entity pairs from the two knowledge graphs according to the attribute values of the two attributes included in each reference attribute pair comprises:
determining initial entity pairs in the two knowledge graphs, wherein each initial entity pair has at least one reference attribute pair, each reference attribute pair corresponds to an attribute value similarity, and the attribute value similarity is the similarity between the attribute values of the two attributes included in the corresponding reference attribute pair;
and judging whether the initial entity pair is a target entity pair according to the at least one attribute value similarity of the initial entity pair.
4. The method of claim 3, wherein the judging of whether the initial entity pair is a target entity pair according to the at least one attribute value similarity of the initial entity pair comprises:
calculating the average value of the at least one attribute value similarity of the initial entity pair;
and if the calculated average value is greater than a first preset threshold, judging that the initial entity pair is a target entity pair.
5. The method of claim 1, wherein the screening out of each new target attribute pair from the two knowledge graphs according to the screened target entity pairs comprises:
for each screened target entity pair, combining the candidate attributes under the target entity pair two by two to obtain a candidate attribute pair for each combination, wherein the two attributes included in a candidate attribute pair do not belong to any already determined target attribute pair and belong respectively to the two entities in the target entity pair;
calculating the similarity between the attribute values of the two attributes included in the candidate attribute pair;
and if the calculated similarity is greater than a second preset threshold, judging that the candidate attribute pair is a new target attribute pair.
6. The method according to any one of claims 1 to 5, further comprising:
screening out target entity pairs from the two knowledge graphs by using a pre-trained entity alignment model to form a second entity pair set, wherein the entity alignment model is used for screening entity pairs based on entity relationships.
7. The method of claim 6, wherein the entity alignment model is trained using model training data, the model training data comprising target entity pairs of high accuracy selected from the first entity pair set.
8. The method of claim 6, further comprising, after forming the second entity pair set:
merging the first entity pair set and the second entity pair set to form a third entity pair set;
and removing target entity pairs of low accuracy from the third entity pair set, the remaining target entity pairs serving as a fourth entity pair set.
9. The method of claim 8, wherein the removing of target entity pairs of low accuracy from the third entity pair set comprises:
for a target entity pair belonging to both the first entity pair set and the second entity pair set, determining the final similarity of the target entity pair according to the first similarity and the second similarity of the target entity pair;
wherein the first similarity is the similarity between the two entities included in the target entity pair obtained when the first entity pair set is formed, and the second similarity is the similarity between the two entities included in the target entity pair obtained when the second entity pair set is formed;
and if the final similarity of the target entity pair is less than a third preset threshold, removing the target entity pair from the third entity pair set.
10. The method of claim 9, wherein the determining of the final similarity of the target entity pair comprises:
determining the final similarity of the target entity pair based on the respective confidence degrees of the first similarity and the second similarity.
11. The method of claim 10, wherein the confidence degrees are learned from model learning data by using a pre-constructed regression model, the model learning data comprising target entity pairs of high accuracy selected from the first entity pair set.
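As an illustration of the merging and removal steps recited in claims 8 to 10, the following is a minimal Python sketch, not the patented implementation. The function name `fuse_entity_pair_sets`, the dictionary representation of the entity pair sets, and the use of a confidence-weighted mean as the way of combining the two similarities are all assumptions; the claims only require that the final similarity be based on the respective confidence degrees. The treatment of pairs found in only one set (kept with their single similarity) is likewise an assumption, since the claims do not address it.

```python
from typing import Dict, Tuple

# An entity pair is (entity id in graph 1, entity id in graph 2).
EntityPair = Tuple[str, str]


def fuse_entity_pair_sets(
    first: Dict[EntityPair, float],
    second: Dict[EntityPair, float],
    conf_first: float,
    conf_second: float,
    third_threshold: float,
) -> Dict[EntityPair, float]:
    """Merge two scored entity pair sets and drop low-similarity shared pairs.

    `first` and `second` map each target entity pair to the similarity
    obtained when the first and second entity pair sets were formed.
    """
    total = conf_first + conf_second
    if total <= 0:
        raise ValueError("confidence degrees must sum to a positive value")

    # Third entity pair set: the union of the first and second sets.
    third = {**second, **first}

    fourth: Dict[EntityPair, float] = {}
    for pair, similarity in third.items():
        if pair in first and pair in second:
            # Assumed combination rule: confidence-weighted mean of the
            # first and second similarity.
            final = (conf_first * first[pair] + conf_second * second[pair]) / total
            if final < third_threshold:
                continue  # removed from the third entity pair set
            fourth[pair] = final
        else:
            # Assumption: a pair found by only one method keeps its similarity.
            fourth[pair] = similarity
    return fourth
```

In this sketch the confidence degrees are passed in as plain numbers; under claim 11 they would instead be learned by a regression model from high-accuracy pairs of the first entity pair set.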
12. An entity alignment device, comprising:
a reference attribute pair acquisition unit, configured to determine each known target attribute pair in two knowledge graphs as a reference attribute pair;
a target entity pair screening unit, configured to screen out target entity pairs from the two knowledge graphs according to the reference attribute pairs;
a reference attribute pair screening unit, configured to screen out, according to the screened target entity pairs, each new target attribute pair from the two knowledge graphs as a reference attribute pair;
a target entity pair cyclic screening unit, configured to invoke the target entity pair screening unit with the screened reference attribute pairs, so as to screen out target entity pairs from the two knowledge graphs according to the reference attribute pairs until no further target entity pair can be screened out, thereby forming a first entity pair set;
wherein the attribute names and the attribute values of the two attributes included in a target attribute pair are respectively the same, and the two attributes belong respectively to the two knowledge graphs; and the two entities included in a target entity pair are the same entity and belong respectively to the two knowledge graphs.
13. The device of claim 12, wherein the target entity pair screening unit is specifically configured to:
screen out target entity pairs from the two knowledge graphs according to the attribute values of the two attributes included in each reference attribute pair.
14. The device of claim 13, wherein the target entity pair screening unit comprises:
an initial entity pair determining subunit, configured to determine initial entity pairs in the two knowledge graphs, wherein each initial entity pair has at least one reference attribute pair, each reference attribute pair corresponds to an attribute value similarity, and the attribute value similarity is the similarity between the attribute values of the two attributes included in the corresponding reference attribute pair;
and a target entity pair determining subunit, configured to judge whether the initial entity pair is a target entity pair according to the at least one attribute value similarity of the initial entity pair.
15. The device of claim 14, wherein the target entity pair determining subunit comprises:
a similarity average calculating module, configured to calculate the average value of the at least one attribute value similarity of the initial entity pair;
and a target entity pair determining module, configured to judge that the initial entity pair is a target entity pair if the calculated average value is greater than a first preset threshold.
16. The device of claim 12, wherein the reference attribute pair screening unit comprises:
a candidate attribute pair obtaining subunit, configured to combine, for each screened target entity pair, the candidate attributes under the target entity pair two by two to obtain a candidate attribute pair for each combination, wherein the two attributes included in a candidate attribute pair do not belong to any already determined target attribute pair and belong respectively to the two entities in the target entity pair;
an attribute similarity calculating subunit, configured to calculate the similarity between the attribute values of the two attributes included in the candidate attribute pair;
and a target attribute pair determining subunit, configured to judge that the candidate attribute pair is a new target attribute pair if the calculated similarity is greater than a second preset threshold.
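The units recited in the device claims cooperate in an iterative loop: reference attribute pairs yield target entity pairs, which in turn yield new reference attribute pairs, until no further entity pair is found. The following Python sketch illustrates that loop under several assumptions that are not taken from the claims: each knowledge graph is modelled as a dictionary from entity id to an attribute-name-to-value mapping, a reference attribute pair is modelled as a pair of attribute names, `difflib.SequenceMatcher` stands in for the unspecified similarity measure, and the threshold values and all function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product
from typing import Dict, Set, Tuple

# Assumed data model: entity id -> {attribute name -> attribute value}.
KnowledgeGraph = Dict[str, Dict[str, str]]
AttrPair = Tuple[str, str]    # (attribute name in graph 1, attribute name in graph 2)
EntityPair = Tuple[str, str]  # (entity id in graph 1, entity id in graph 2)


def value_similarity(a: str, b: str) -> float:
    """Stand-in string similarity in [0, 1]; the claims do not fix a measure."""
    return SequenceMatcher(None, a, b).ratio()


def screen_entity_pairs(kg1: KnowledgeGraph, kg2: KnowledgeGraph,
                        ref_pairs: Set[AttrPair], t1: float) -> Set[EntityPair]:
    """Keep entity pairs whose mean attribute value similarity exceeds t1."""
    found: Set[EntityPair] = set()
    for (e1, attrs1), (e2, attrs2) in product(kg1.items(), kg2.items()):
        sims = [value_similarity(attrs1[a1], attrs2[a2])
                for a1, a2 in ref_pairs if a1 in attrs1 and a2 in attrs2]
        # An initial entity pair needs at least one reference attribute pair.
        if sims and sum(sims) / len(sims) > t1:
            found.add((e1, e2))
    return found


def screen_attribute_pairs(kg1: KnowledgeGraph, kg2: KnowledgeGraph,
                           entity_pairs: Set[EntityPair],
                           known: Set[AttrPair], t2: float) -> Set[AttrPair]:
    """Find new attribute pairs among attributes not already paired."""
    used1 = {a1 for a1, _ in known}
    used2 = {a2 for _, a2 in known}
    new_pairs: Set[AttrPair] = set()
    for e1, e2 in entity_pairs:
        for a1, a2 in product(kg1[e1], kg2[e2]):
            if a1 in used1 or a2 in used2:
                continue  # belongs to an already determined attribute pair
            if value_similarity(kg1[e1][a1], kg2[e2][a2]) > t2:
                new_pairs.add((a1, a2))
    return new_pairs


def align(kg1: KnowledgeGraph, kg2: KnowledgeGraph, seed: Set[AttrPair],
          t1: float = 0.8, t2: float = 0.9) -> Set[EntityPair]:
    """Alternate the two screening steps until no new entity pair appears."""
    ref_pairs = set(seed)
    aligned: Set[EntityPair] = set()
    while True:
        found = screen_entity_pairs(kg1, kg2, ref_pairs, t1) - aligned
        if not found:
            break
        aligned |= found
        ref_pairs |= screen_attribute_pairs(kg1, kg2, found, ref_pairs, t2)
    return aligned
```

Because the set of aligned pairs only grows and is bounded by the number of possible entity pairs, the loop terminates. The exhaustive pairwise comparison is chosen for readability only; a practical system would likely restrict candidate pairs with blocking or indexing.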

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910243854.5A CN109960810B (en) 2019-03-28 2019-03-28 Entity alignment method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910243854.5A CN109960810B (en) 2019-03-28 2019-03-28 Entity alignment method and device

Publications (2)

Publication Number Publication Date
CN109960810A CN109960810A (en) 2019-07-02
CN109960810B true CN109960810B (en) 2020-05-19

Family

ID=67025173

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910243854.5A Active CN109960810B (en) 2019-03-28 2019-03-28 Entity alignment method and device

Country Status (1)

Country Link
CN (1) CN109960810B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110377906A (en) * 2019-07-15 2019-10-25 出门问问信息科技有限公司 Entity alignment schemes, storage medium and electronic equipment
CN112445916A (en) * 2019-08-28 2021-03-05 阿里巴巴集团控股有限公司 Business object issuing method, entity issuing method and device
CN110580294B (en) * 2019-09-11 2022-11-29 腾讯科技(深圳)有限公司 Entity fusion method, device, equipment and storage medium
CN110727802B (en) * 2019-09-16 2022-10-28 金色熊猫有限公司 Knowledge graph construction method and device, storage medium and electronic terminal
CN110704620B (en) * 2019-09-25 2022-06-10 海信集团有限公司 Method and device for identifying same entity based on knowledge graph
CN110825822B (en) * 2019-09-30 2022-11-22 深圳云天励飞技术有限公司 Personnel relationship query method and device, electronic equipment and storage medium
CN111046186A (en) * 2019-10-30 2020-04-21 平安科技(深圳)有限公司 Entity alignment method, device and equipment of knowledge graph and storage medium
CN110928894B (en) * 2019-11-18 2023-05-02 北京秒针人工智能科技有限公司 Entity alignment method and device
CN111291196B (en) * 2020-01-22 2024-03-22 腾讯科技(深圳)有限公司 Knowledge graph perfecting method and device, and data processing method and device
CN111475657B (en) * 2020-03-30 2023-10-03 海信集团有限公司 Display equipment, display system and entity alignment method
CN111738005A (en) * 2020-06-19 2020-10-02 平安科技(深圳)有限公司 Named entity alignment method and device, electronic equipment and readable storage medium
CN112559765B (en) * 2020-12-11 2023-06-16 中电科大数据研究院有限公司 Semantic integration method for multi-source heterogeneous database
CN112966124B (en) * 2021-05-18 2021-07-30 腾讯科技(深圳)有限公司 Training method, alignment method, device and equipment of knowledge graph alignment model

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202041A (en) * 2016-07-01 2016-12-07 北京奇虎科技有限公司 A kind of method and apparatus of the entity alignment problem solved in knowledge mapping
CN107480191A (en) * 2017-07-12 2017-12-15 清华大学 A kind of entity alignment model of iteration
CN107748799A (en) * 2017-11-08 2018-03-02 四川长虹电器股份有限公司 A kind of method of multi-data source movie data entity alignment
CN108268581A (en) * 2017-07-14 2018-07-10 广东神马搜索科技有限公司 The construction method and device of knowledge mapping
CN108389614A (en) * 2018-03-02 2018-08-10 西安交通大学 The method for building medical image collection of illustrative plates based on image segmentation and convolutional neural networks
CN108763221A (en) * 2018-06-20 2018-11-06 科大讯飞股份有限公司 A kind of attribute-name characterizing method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106776711B (en) * 2016-11-14 2020-04-07 浙江大学 Chinese medical knowledge map construction method based on deep learning
CN107862075A (en) * 2017-11-29 2018-03-30 浪潮软件股份有限公司 A kind of knowledge mapping construction method and device based on health care big data
CN109271530A (en) * 2018-10-17 2019-01-25 长沙瀚云信息科技有限公司 A kind of disease knowledge map construction method and plateform system, equipment, storage medium
CN109446341A (en) * 2018-10-23 2019-03-08 国家电网公司 The construction method and device of knowledge mapping

Also Published As

Publication number Publication date
CN109960810A (en) 2019-07-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant