CN111651972A - Entity alignment method, device, computer readable medium and electronic equipment - Google Patents

Publication number
CN111651972A
Authority
CN
China
Prior art keywords
entity
aligned
target
sub
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010374543.5A
Other languages
Chinese (zh)
Other versions
CN111651972B (en)
Inventor
李雪莲
程序
张涵宇
谢思发
江小琴
刘文强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010374543.5A
Publication of CN111651972A
Application granted
Publication of CN111651972B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/189 Automatic justification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application provide an entity alignment method, an entity alignment device, a computer-readable medium and electronic equipment. The entity alignment method comprises the following steps: acquiring a target entity and an entity to be aligned corresponding to the target entity; matching the entity to be aligned with the sub-entity contained in the target entity to determine a matched entity to be aligned from the entity to be aligned, and updating the target entity by taking the matched entity to be aligned as a new sub-entity of the target entity to obtain a new target entity; continuing to perform matching of the entity to be aligned and updating of the target entity based on the sub-entity contained in the new target entity until a matching end condition is met; and aligning the sub-entities contained in the target entity obtained after the matching end condition is met. According to the technical solution of the embodiments of the present application, the entity to be aligned that matches the target entity can be determined accurately, which improves the accuracy of entity alignment.

Description

Entity alignment method, device, computer readable medium and electronic equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for entity alignment, a computer-readable medium, and an electronic device.
Background
Constructing a knowledge graph often requires fusing data from a plurality of different sources and aligning the data from the different sources into one entity. For example, the entity "Cheng Long" may be called "Big Brother Cheng Long" or "Jackie Chan" in other data sources. The essence of knowledge fusion is therefore entity alignment.
As the basic building block of a knowledge graph, an entity has its own attribute features; for example, the entity "Cheng Long" has attributes such as "occupation", "height" and "age". Traditional entity alignment methods suffer from drawbacks such as low accuracy.
Disclosure of Invention
Embodiments of the present application provide an entity alignment method, an entity alignment device, a computer-readable medium, and an electronic device, so that an entity to be aligned that is matched with a target entity can be accurately determined at least to a certain extent, thereby improving the accuracy of entity alignment.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
According to an aspect of an embodiment of the present application, there is provided an entity alignment method, including: acquiring a target entity and an entity to be aligned corresponding to the target entity; matching the entity to be aligned with the sub-entity contained in the target entity to determine a matched entity to be aligned from the entity to be aligned, and updating the target entity by taking the matched entity to be aligned as a new sub-entity of the target entity to obtain a new target entity; continuing to perform matching of the entity to be aligned and updating of the target entity based on the sub-entity contained in the new target entity until a matching end condition is met; and aligning the sub-entities contained in the target entity obtained after the matching end condition is met.
According to an aspect of an embodiment of the present application, there is provided an entity alignment apparatus including: an obtaining unit configured to acquire a target entity and an entity to be aligned corresponding to the target entity; a matching unit configured to match the entity to be aligned with a sub-entity included in the target entity, so as to determine a matched entity to be aligned from the entity to be aligned, and to update the target entity by using the matched entity to be aligned as a new sub-entity of the target entity, so as to obtain a new target entity; an updating unit configured to continue to perform matching of the entity to be aligned and updating of the target entity based on the sub-entity contained in the new target entity until a matching end condition is met; and a processing unit configured to perform alignment processing on the sub-entities included in the target entity obtained after the matching end condition is met.
In some embodiments of the present application, based on the foregoing solution, the obtaining unit includes: an acquisition subunit configured to acquire a sub-entity included in the target entity; and a searching subunit configured to search for the entity to be aligned according to the attribute features of the sub-entity.
In some embodiments of the present application, based on the foregoing solution, the searching subunit is further configured to: if the sub-entity includes an entity name, search an entity data source, according to the entity name of the sub-entity, for an entity whose entity name has a similarity to the entity name of the sub-entity greater than or equal to a first similarity threshold, and take the entity found as the entity to be aligned; and if the sub-entity includes an entity picture, search the entity data source, according to the entity picture of the sub-entity, for an entity whose entity picture has a similarity to the entity picture of the sub-entity greater than or equal to a second similarity threshold, and take the entity found as the entity to be aligned.
In some embodiments of the present application, based on the foregoing solution, the searching subunit is further configured to: before searching the entity data source according to the entity name of the sub-entity, denoise the entity names contained in the entity data source, so as to remove the noise words in those entity names according to a preset noise lexicon.
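The name-based search with denoising described above can be illustrated by the following minimal sketch. It is not the implementation of the application: the noise lexicon, the game names, the threshold value and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration, and the sketch also denoises the query name, which the text does not require.

```python
from difflib import SequenceMatcher

# Hypothetical noise lexicon; the text only refers to "a preset noise lexicon".
NOISE_WORDS = ["official edition", "mobile game", "(hd)"]


def denoise(name, noise_words=NOISE_WORDS):
    """Remove every configured noise word from a name and normalize whitespace."""
    cleaned = name.lower()
    for word in noise_words:
        cleaned = cleaned.replace(word, " ")
    return " ".join(cleaned.split())


def name_similarity(a, b):
    """Similarity in [0, 1]; SequenceMatcher stands in for the unspecified measure."""
    return SequenceMatcher(None, a, b).ratio()


def find_candidates_by_name(sub_entity_name, data_source_names, threshold=0.8):
    """Return the data-source names whose denoised form reaches the first threshold."""
    query = denoise(sub_entity_name)
    return [
        name
        for name in data_source_names
        if name_similarity(query, denoise(name)) >= threshold
    ]


# Hypothetical data source of entity names.
source_names = ["Star Quest Mobile Game", "Star Quest (HD)", "Farm Story"]
candidates = find_candidates_by_name("Star Quest", source_names)
```

In this sketch the two "Star Quest" variants reduce to the same denoised name and are returned as entities to be aligned, while the unrelated name is filtered out.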
In some embodiments of the present application, based on the foregoing solution, the searching subunit is further configured to: before searching the entity data source according to the entity picture of the sub-entity, extract a first feature vector of an entity picture contained in the entity data source and a second feature vector of the entity picture of the sub-entity, so as to calculate the similarity between the entity picture in the entity data source and the entity picture of the sub-entity based on the first feature vector and the second feature vector.
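As a hedged illustration of comparing two picture feature vectors, the sketch below uses cosine similarity. The text does not name the similarity measure or the feature extractor, so cosine similarity, the precomputed vectors, the entity identifiers and the threshold are assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector carries no direction
    return dot / (norm_u * norm_v)


def find_candidates_by_picture(sub_entity_vector, source_vectors, threshold=0.9):
    """Return ids of data-source entities whose picture vector reaches the second threshold."""
    return [
        entity_id
        for entity_id, vector in source_vectors.items()
        if cosine_similarity(vector, sub_entity_vector) >= threshold
    ]


# Hypothetical first feature vectors, assumed to be extracted beforehand.
source_vectors = {"e1": [1.0, 0.0], "e2": [0.0, 1.0], "e3": [0.9, 0.1]}
picture_candidates = find_candidates_by_picture([1.0, 0.0], source_vectors)
```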
In some embodiments of the present application, based on the foregoing solution, the matching unit is further configured to: perform similarity matching between the entity picture of the entity to be aligned and the entity pictures of the sub-entities contained in the target entity; and take, as the matched entity to be aligned, an entity to be aligned whose entity picture has a similarity greater than a preset similarity with the entity pictures of a preset proportion of the sub-entities in the target entity.
In some embodiments of the present application, based on the foregoing solution, the matching unit is further configured to: perform similarity matching between the description attribute of the entity to be aligned and the description attributes of the sub-entities contained in the target entity; and take, as the matched entity to be aligned, an entity to be aligned whose description attribute has a similarity greater than a preset similarity with the description attributes of a preset proportion of the sub-entities in the target entity.
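The "preset proportion" rule in the two embodiments above can be sketched as follows, under stated assumptions: the token-level Jaccard measure, the example descriptions, both threshold values and the choice of "greater than or equal to" for the proportion test are illustrative and are not taken from the application.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity; a stand-in for the unspecified similarity measure."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def matches_target(candidate, sub_entities, similarity,
                   min_similarity=0.7, min_proportion=0.5):
    """True if the candidate is similar enough to a preset proportion of sub-entities."""
    if not sub_entities:
        return False
    hits = sum(1 for sub in sub_entities if similarity(candidate, sub) > min_similarity)
    return hits / len(sub_entities) >= min_proportion


# Hypothetical description attributes of the sub-entities of one target entity.
descriptions = ["space strategy game", "space strategy game online", "farm puzzle"]
is_matched = matches_target("space strategy game", descriptions, jaccard)
is_rejected = matches_target("farm puzzle", descriptions, jaccard)
```

Because `similarity` is passed in as a parameter, the same rule applies to entity pictures by supplying a picture-based measure instead.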
In some embodiments of the present application, based on the foregoing solution, the obtaining unit is further configured to: acquire the rights information of the matched entity to be aligned and the rights information of the sub-entities contained in the target entity; and align the rights information of the matched entity to be aligned with the rights information of the sub-entities contained in the target entity, and store the aligned rights information in a rights information list.
In some embodiments of the present application, based on the foregoing solution, the matching unit is further configured to: perform similarity matching between the rights information of the entity to be aligned and the rights information of the sub-entities contained in the target entity, and calculate the difference between the issuing time of the entity to be aligned and the issuing times of the sub-entities contained in the target entity; and take, as the matched entity to be aligned, an entity to be aligned whose rights information has a similarity greater than a preset similarity with the rights information of a preset proportion of the sub-entities in the target entity, and whose issuing time differs from the issuing times of a preset proportion of the sub-entities in the target entity by less than a time threshold.
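A minimal sketch of the combined check on rights information and issuing time is given below. The record layout, the exact-match similarity, the thresholds, and the decision to require both conditions on the same sub-entity are assumptions; the text leaves these points open.

```python
from datetime import date


def exact_match(a, b):
    """Toy similarity for rights information: 1.0 if identical, else 0.0 (assumption)."""
    return 1.0 if a == b else 0.0


def matches_by_rights_and_time(candidate, sub_entities, rights_similarity,
                               min_similarity=0.8, max_days=30, min_proportion=0.5):
    """True if enough sub-entities have similar rights info and a close issuing time."""
    if not sub_entities:
        return False
    hits = 0
    for sub in sub_entities:
        similar = rights_similarity(candidate["rights"], sub["rights"]) > min_similarity
        close = abs((candidate["issued"] - sub["issued"]).days) < max_days
        if similar and close:  # assumption: both conditions on the same sub-entity
            hits += 1
    return hits / len(sub_entities) >= min_proportion


# Hypothetical sub-entities and candidates.
subs = [
    {"rights": "Studio A", "issued": date(2020, 1, 10)},
    {"rights": "Studio A", "issued": date(2020, 1, 20)},
]
same_owner_close = {"rights": "Studio A", "issued": date(2020, 1, 15)}
same_owner_late = {"rights": "Studio A", "issued": date(2021, 1, 15)}
other_owner = {"rights": "Studio B", "issued": date(2020, 1, 15)}
```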
In some embodiments of the present application, based on the foregoing solution, the matching end condition includes at least one of: the number of matching rounds reaches a first preset threshold; and the difference between the numbers of entities to be aligned that are matched with the sub-entities in two adjacent matching processes reaches a second preset threshold.
In some embodiments of the present application, based on the foregoing solution, the updating unit is further configured to: search the entity data source again for the entity to be aligned based on the sub-entities contained in the new target entity; and perform matching of the entity to be aligned and updating of the target entity according to the entity to be aligned obtained by the new search and the sub-entities contained in the new target entity.
In some embodiments of the present application, based on the foregoing solution, the updating unit is further configured to: acquire the entities to be aligned that were determined in the previous matching process not to match the sub-entities, as a new set of entities to be aligned; and perform matching of the entity to be aligned and updating of the target entity according to the new set of entities to be aligned and the sub-entities contained in the new target entity.
According to an aspect of embodiments of the present application, there is provided a computer-readable medium on which a computer program is stored, which computer program, when executed by a processor, implements the entity alignment method as described in the above embodiments.
According to an aspect of an embodiment of the present application, there is provided an electronic device including: one or more processors; a storage device to store one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the entity alignment method as described in the embodiments above.
In the technical solutions provided in some embodiments of the present application, an entity to be aligned that matches a sub-entity included in a target entity is determined from the entities to be aligned, and the matched entity to be aligned is used as a newly added sub-entity of the target entity to update the target entity, so as to obtain a new target entity. Matching of the entity to be aligned and updating of the target entity are continued based on the sub-entities included in the new target entity until a matching end condition is met, and the sub-entities included in the target entity obtained after the matching end condition is met are aligned. The entity to be aligned that matches the sub-entities included in the target entity can thus be determined through multiple matching processes and therefore more accurately, which improves the accuracy of entity alignment and helps achieve a better entity alignment effect.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 illustrates an application scenario of an entity alignment method according to an embodiment of the present application;
FIG. 2 shows a flow diagram of an entity alignment method according to an embodiment of the present application;
FIG. 3 shows a detailed flowchart of step S210 according to an embodiment of the present application;
FIG. 4 shows a detailed flowchart of step S2102 according to an embodiment of the present application;
FIG. 5 shows a schematic diagram of a process of finding an entity to be aligned according to an entity name of a sub-entity according to an embodiment of the application;
FIG. 6 shows a schematic diagram of a process of finding entities to be aligned according to an entity picture of a sub-entity according to an embodiment of the present application;
FIG. 7 shows a flow diagram of an entity alignment method according to an embodiment of the present application;
FIG. 8 shows a flow diagram of an entity alignment method according to an embodiment of the present application;
FIG. 9 shows a flow diagram of an entity alignment method according to an embodiment of the present application;
FIG. 10 shows a flow diagram of an entity alignment method according to an embodiment of the present application;
FIG. 11 shows a flow diagram of an entity alignment method according to an embodiment of the present application;
FIG. 12 shows a flow diagram of an entity alignment method according to an embodiment of the present application;
FIG. 13 shows an interaction flow diagram of an entity alignment method according to an embodiment of the application;
FIG. 14A illustrates an example of a target entity obtained when the entity alignment method according to one embodiment of the present application is applied in the field of gaming;
FIG. 14B illustrates an example of an entity to be aligned, corresponding to the target entity, found when the entity alignment method according to one embodiment of the present application is applied in the field of gaming;
FIG. 14C illustrates an example of an entity to be aligned, corresponding to the target entity, found when the entity alignment method according to one embodiment of the present application is applied in the field of gaming;
FIG. 14D illustrates an example of an entity to be aligned, corresponding to the target entity, found when the entity alignment method according to one embodiment of the present application is applied in the field of gaming;
FIGS. 15A-15B illustrate a before-and-after comparison of the effect of the entity alignment method applied in the field of gaming, according to one embodiment of the present application;
FIG. 16 shows a block diagram of an entity alignment apparatus according to an embodiment of the present application;
FIG. 17 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It should be understood that in the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate: only A is present, only B is present, or both A and B are present, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" or similar expressions refer to any combination of the listed items, including any combination of a single item or plural items. For example, at least one of a, b, or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c may each be single or plural.
Before the embodiments of the present application are described in further detail, the terms and expressions used in them are explained as follows.
Knowledge graph: used to describe the various entities and concepts existing in the real world. It characterizes the intrinsic properties of an entity by attribute-value pairs and connects two or more entities by relationships, which characterize the associations between those entities. A knowledge graph is essentially a semantic network, that is, a graph-based data structure consisting of nodes ("entities") and edges ("relationships"): each node represents an entity or concept existing in the real world, and each edge represents an attribute or a semantic relationship between entities or concepts.
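To make the structure described above concrete, the following minimal sketch stores entities with attribute-value pairs and relationships as (head, relation, tail) triples. The class names, method names and example data are hypothetical and are not part of the application.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    attributes: dict = field(default_factory=dict)  # attribute-value pairs


@dataclass
class KnowledgeGraph:
    entities: dict = field(default_factory=dict)   # name -> Entity (the nodes)
    relations: list = field(default_factory=list)  # (head, relation, tail) triples (the edges)

    def add_entity(self, name, **attributes):
        self.entities[name] = Entity(name, dict(attributes))

    def relate(self, head, relation, tail):
        """Add an edge between two entities that already exist in the graph."""
        if head not in self.entities or tail not in self.entities:
            raise KeyError("both entities must be added before relating them")
        self.relations.append((head, relation, tail))

    def neighbors(self, name):
        """Outgoing (relation, tail) pairs of an entity."""
        return [(rel, tail) for head, rel, tail in self.relations if head == name]


# Illustrative data only.
graph = KnowledgeGraph()
graph.add_entity("Jackie Chan", occupation="actor", height_cm=173)
graph.add_entity("Rush Hour", kind="film")
graph.relate("Jackie Chan", "acted_in", "Rush Hour")
```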
Entity alignment: also called entity similarity matching, ontology alignment, etc.; it refers to fusing the same entities that appear in knowledge graphs from different sources. For example, data source 1 contains the entity "Cheng Long" and data source 2 contains the entity "Jackie Chan". The two data sources refer to the same person, who should be a single entity in the knowledge base, so the contents of the two data sources need to be merged.
In the related art, most knowledge graph fusion techniques directly use attribute similarity to perform a single round of entity alignment between two knowledge graphs; if an entity has a plurality of attributes, a corresponding weight is assigned to the similarity of each attribute.
Take entity 1 "Cheng Long" and entity 2 "Big Brother Cheng Long" as an example. Entity 1 includes the following attributes: name "Cheng Long", age "65 years", occupation "actor", height "1.73 m". Entity 2 includes the following attributes: name "Chenglong", age "65", occupation "actor", height "173 cm". The specific steps of the traditional knowledge fusion process are as follows:
(1) data preprocessing: the noise word "years" is removed from the age attribute "65 years" of entity 1, and the height attribute "1.73 m" is converted to centimeters, giving "173 cm";
(2) comparing attribute similarity: the similarity of each attribute of entity 1 and entity 2 is calculated based on edit distance; the similarity of the names is 0.83, and the similarity of the ages, occupations and heights is 1;
(3) based on the importance of each attribute to entity fusion, a weight is assigned to the similarity of each attribute: the name weight is 0.5, the occupation weight is 0.2, the age weight is 0.2, and the height weight is 0.1, so the total similarity of entity 1 and entity 2 is:
0.5 × 0.83 + 0.2 × 1 + 0.2 × 1 + 0.1 × 1 = 0.915;
(4) entity alignment: the fusion threshold is set to 0.9; the total similarity of entity 1 and entity 2 exceeds this threshold, so the two entities can be fused.
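The weighted-similarity calculation in steps (2) to (4) can be reproduced with the short sketch below. The normalization of edit distance into a similarity (one minus the distance divided by the longer length) is an assumption, and the name similarity 0.83 is taken directly from the text because the original name strings cannot be recovered reliably from the translation.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        curr = [i]
        for j, char_b in enumerate(b, 1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


def weighted_similarity(similarities, weights):
    """Sum of per-attribute similarities scaled by their weights."""
    return sum(weights[key] * similarities[key] for key in weights)


WEIGHTS = {"name": 0.5, "occupation": 0.2, "age": 0.2, "height": 0.1}
similarities = {
    "name": 0.83,  # value stated in the text
    "occupation": edit_similarity("actor", "actor"),
    "age": edit_similarity("65", "65"),           # after removing the noise word
    "height": edit_similarity("173 cm", "173 cm"),  # after unit conversion
}
total = weighted_similarity(similarities, WEIGHTS)
can_fuse = total > 0.9  # fusion threshold from step (4)
```

The sketch also makes the first listed disadvantage visible: the outcome depends directly on the hand-chosen values in `WEIGHTS`.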
However, the conventional knowledge graph fusion technology has the following three disadvantages:
(1) the weights of the relevant attributes are difficult to determine;
(2) the fusion process is fixed and lacks the ability for attribute alignment and entity alignment to learn from each other;
(3) the requirement on the completeness of the entity attributes is high; if an attribute is missing, the fusion effect is greatly affected.
In view of the above technical problems, embodiments of the present application provide an entity alignment method, an entity alignment apparatus, a computer-readable medium and an electronic device. A target entity and an entity to be aligned corresponding to the target entity are obtained; the entity to be aligned is matched with the sub-entities contained in the target entity to determine a matched entity to be aligned from the entities to be aligned; and the matched entity to be aligned is used as a newly added sub-entity of the target entity to update the target entity, so as to obtain a new target entity. Matching of the entity to be aligned and updating of the target entity are continued based on the sub-entities contained in the new target entity until a matching end condition is met, and the sub-entities contained in the target entity obtained after the matching end condition is met are aligned. The entity to be aligned that matches the target entity can thus be determined accurately, which improves the accuracy of entity alignment.
Referring to FIG. 1, which shows an application scenario of the entity alignment method provided in an exemplary embodiment of the present application, the entity alignment apparatus may run in a terminal 102 or a server 103 that has a storage unit, a processor and computing capability. The terminal 102 may be a hardware device running any of various operating systems, such as a smart phone, a desktop computer, a tablet computer or a notebook computer, and the server 103 may be a single server, a server cluster formed by a plurality of servers, a cloud server, and the like.
In an embodiment of the present application, a user may input an entity name through a search bar provided by a knowledge graph interface of the terminal 102. Based on the input instruction of the user, the terminal 102 may obtain a target entity and an entity to be aligned corresponding to the target entity from the server 103, match the entity to be aligned with the sub-entities included in the target entity to determine a matched entity to be aligned from the entities to be aligned, and update the target entity by using the matched entity to be aligned as a new sub-entity of the target entity to obtain a new target entity. The terminal 102 may then continue to perform matching of the entity to be aligned and updating of the target entity based on the sub-entities included in the new target entity until a matching end condition is met, and finally perform alignment processing on the sub-entities included in the target entity obtained after the matching end condition is met and display the result of the alignment processing on the graphical user interface of the terminal.
In another embodiment of the present application, a user may input an entity name through a search bar provided by a knowledge graph interface of the terminal 102, and the terminal 102 may send the input instruction of the user to the server 103. According to the received input instruction, the server 103 may obtain a target entity and an entity to be aligned corresponding to the target entity, match the entity to be aligned with the sub-entities included in the target entity to determine a matched entity to be aligned from the entities to be aligned, and update the target entity by using the matched entity to be aligned as a new sub-entity of the target entity to obtain a new target entity. The server 103 may continue matching the entity to be aligned and updating the target entity based on the sub-entities included in the new target entity until a matching end condition is met, finally align the sub-entities included in the target entity obtained after the matching end condition is met, and transmit the result of the alignment processing to the terminal 102 so that the terminal 102 displays the alignment result on the graphical user interface.
The implementation details of the technical solution of the embodiment of the present application are set forth in detail below:
FIG. 2 shows a flowchart of an entity alignment method according to an embodiment of the present application. Referring to FIG. 2, the entity alignment method includes:
step S210, acquiring a target entity and an entity to be aligned corresponding to the target entity;
step S220, matching the entity to be aligned with the sub-entity contained in the target entity to determine the matched entity to be aligned from the entity to be aligned, and updating the target entity by taking the matched entity to be aligned as a new sub-entity of the target entity to obtain a new target entity;
step S230, continuing to match the entity to be aligned and update the target entity based on the sub-entity contained in the new target entity until the matching end condition is met;
step S240, performing alignment processing on the sub-entities included in the target entity obtained after the matching end condition is met.
These steps are described in detail below.
In step S210, a target entity and an entity to be aligned corresponding to the target entity are obtained.
As noted above, each node in the knowledge graph represents an entity existing in the real world. An entity data source is therefore first obtained from at least one knowledge graph, and the target entity and the entity to be aligned corresponding to the target entity can then be obtained from the entity data source. The entity to be aligned is an entity that is to undergo entity alignment with the target entity; before entity alignment, the target entity and the entity to be aligned are independent entities.
In step S220, the entity to be aligned is matched with the sub-entity included in the target entity to determine the matched entity to be aligned from the entities to be aligned, and the matched entity to be aligned is used as a new sub-entity of the target entity to update the target entity, so as to obtain a new target entity.
Specifically, the target entity includes sub-entities; for example, {target entity: sub-entity a, sub-entity b} indicates that the target entity has two sub-entities. After the target entity and the entity to be aligned corresponding to the target entity are obtained in step S210, the entity to be aligned may be matched with the sub-entities included in the target entity, so as to determine the matched entity to be aligned from the entities to be aligned. The matching may be performed between the attribute features of the entity to be aligned and those of the sub-entities; if the entities to be aligned include an entity that matches the sub-entities, that entity is obtained as the matched entity to be aligned.
Further, after the matched entity to be aligned is determined, the matched entity to be aligned can be used as a new sub-entity of the target entity to obtain a new target entity.
In step S230, matching of the entity to be aligned and updating of the target entity are continued based on the sub-entities included in the new target entity until the matching end condition is met.
It can be understood that obtaining a new target entity ends one round of matching and updating. To achieve a better entity alignment effect, the next round of matching and updating may then be performed based on the new target entity; in other words, matching of the entity to be aligned and updating of the target entity are continued based on the sub-entities included in the new target entity until the matching end condition is met.
In one embodiment of the present application, the matching end condition includes at least one of:
the number of matching rounds reaches a first preset threshold; and the difference between the numbers of entities to be aligned that are matched with the sub-entities in two adjacent matching rounds reaches a second preset threshold.
In this embodiment, the matching may be ended if the number of matching rounds reaches the first preset threshold, or if the difference between the numbers of entities to be aligned that are matched with the sub-entities in two adjacent matching rounds reaches the second preset threshold; in either case, the matching end condition is satisfied.
In step S240, the sub-entities included in the target entity obtained after the matching end condition is satisfied are aligned.
Specifically, after each matching round determines the matched entities to be aligned, the matched entities determined in that round are used as newly added sub-entities of the target entity. After the matching is finished, the sub-entities contained in the target entity obtained after the matching end condition is satisfied can therefore be aligned, so that entity alignment is achieved.
Based on the technical scheme provided by this embodiment, the entities to be aligned that match the sub-entities contained in the target entity are determined from the entities to be aligned; the matched entities to be aligned are used as newly added sub-entities of the target entity to update the target entity and obtain a new target entity; the matching of the entities to be aligned and the updating of the target entity are continued based on the sub-entities contained in the new target entity until the matching end condition is satisfied; and the sub-entities contained in the target entity obtained after the matching end condition is satisfied are aligned.
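The loop formed by steps S210 to S240 can be sketched as follows. This is only an illustrative sketch: the function and parameter names (`align_entity`, `find_candidates`, `is_match`, `max_rounds`, `min_count_change`) are hypothetical, and the second end condition is implemented under one possible reading, namely that matching stops when the number of matched entities changes by no more than a threshold between two adjacent rounds.

```python
from typing import Callable, Iterable, Optional, Set


def align_entity(
    target_sub_entities: Iterable[str],
    find_candidates: Callable[[Set[str]], Set[str]],
    is_match: Callable[[str, Set[str]], bool],
    max_rounds: int = 10,       # assumed name for the first preset threshold
    min_count_change: int = 0,  # assumed name for the second preset threshold
) -> Set[str]:
    """Iteratively grow the set of sub-entities of one target entity."""
    sub_entities = set(target_sub_entities)
    previous_count: Optional[int] = None
    for _ in range(max_rounds):  # end condition 1: round limit reached
        # Step S210: coarse search for entities to be aligned.
        candidates = find_candidates(sub_entities) - sub_entities
        # Step S220: keep only candidates that match the current sub-entities.
        matched = {c for c in candidates if is_match(c, sub_entities)}
        sub_entities |= matched  # the updated set is the new target entity
        # End condition 2 (assumed reading): matched count has stabilised.
        if (previous_count is not None
                and abs(len(matched) - previous_count) <= min_count_change):
            break
        previous_count = len(matched)
    # Step S240: the returned sub-entities are treated as aligned.
    return sub_entities
```

The search and matching strategies are passed in as callables so that the name-based, picture-based, or ownership-based variants described in the following embodiments can be plugged in without changing the loop.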
Fig. 3 shows a flowchart of an entity alignment method according to an embodiment of the present application, in which an entity to be aligned may be searched according to an attribute feature of a sub-entity of a target entity, as shown in fig. 3, step S210 specifically includes steps S2101 to S2102, which are described in detail as follows:
step S2101, acquiring a sub-entity included in the target entity.
Specifically, the sub-entity acquired here is an original sub-entity of the target entity rather than a newly added sub-entity, and the original sub-entity may be the target entity itself. For example, if the target entity is the game "King of Chaos" obtained from the entity data source of a game knowledge graph, the sub-entity included in the target entity is the target entity itself, namely "King of Chaos".
And step S2102, searching for and obtaining the entities to be aligned according to the attribute features of the sub-entity.
In this step, since each entity includes an attribute feature, the entity to be aligned can be found according to the attribute feature of the sub-entity included in the target entity, where the attribute feature of the sub-entity may include attributes such as an entity name, an entity picture, a description, a release time, and ownership information.
In an embodiment of the present application, in order to ensure the comprehensiveness of the entity to be aligned obtained by searching, a single-attribute multi-type searching manner may be adopted, and this searching process is mainly coarse searching, as shown in fig. 4, the step S2102 of searching the entity to be aligned may specifically include steps S21021 to S21022 according to the attribute characteristics of the sub-entity, and the following is now described in detail:
step S21021, if the sub-entity includes an entity name, searching the entity data source, according to the entity name of the sub-entity, for entities whose entity names have a similarity to the entity name of the sub-entity greater than or equal to a first similarity threshold, and using the found entities as entities to be aligned.
Each sub-entity may have a plurality of attribute features, and if the sub-entity included in the target entity includes an attribute of an entity name, an entity satisfying a condition may be searched from the entity data source according to the entity name of the sub-entity, and the entity satisfying the condition may be an entity whose similarity between the entity name and the entity name of the sub-entity is greater than or equal to a first similarity threshold.
In an embodiment, since the entity name in the entity data source may include noise, before searching for the entity to be aligned according to the entity name of the sub-entity, the de-noising process may be performed on the entity name in the entity data source, which in this embodiment further includes:
before searching the entity data source according to the entity name of the sub-entity, denoising the entity names contained in the entity data source, so as to remove, according to a set noise word library, the noise words in the entity names contained in the entity data source.
Specifically, before searching for an entity to be aligned from an entity data source according to an entity name of a sub-entity, denoising processing may be performed on the entity name included in the entity data source, where the denoising processing aims to remove a noise word in the entity name, where the noise word may be a noise word included in a set noise word library or a noise word determined in another manner.
Of course, in an embodiment, after the de-noising processing is performed on the entity names in the entity data source, operations such as case unification, bracket removal and the like may be further performed on the entity names, and then the similarity between the entity names and the entity names of the sub-entities is calculated by using the further processed entity names.
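The name preprocessing described above can be sketched as follows. The noise word list is purely hypothetical, and "bracket removal" is assumed here to mean dropping the brackets together with their content; the source does not specify either detail.

```python
import re

# Hypothetical noise word library; real contents are not given in the source.
NOISE_WORDS = ("official", "hd")

# Matches one bracketed segment, covering ASCII and full-width brackets.
_BRACKETED = re.compile(r"[(\[（【][^)\]）】]*[)\]）】]")


def normalize_name(name: str, noise_words=NOISE_WORDS) -> str:
    """Remove noise words, unify case, and drop bracketed segments."""
    text = name
    for word in noise_words:
        # Case-insensitive removal of each noise word.
        text = re.sub(re.escape(word), " ", text, flags=re.IGNORECASE)
    text = text.lower()               # case unification
    text = _BRACKETED.sub(" ", text)  # bracket removal (assumed semantics)
    return " ".join(text.split())     # collapse leftover whitespace
```

With this sketch, two entity names that differ only in a bracketed suffix or a noise word are reduced to the same processed name before their similarity is calculated.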
In one embodiment, the similarity between entity names may be calculated with an edit distance algorithm, where the edit distance between two strings is the minimum number of single-character edit operations required to convert one string into the other. The allowed edit operations are replacing one character with another, inserting one character, and deleting one character. The smaller the edit distance, the greater the similarity of the two strings. It is to be understood that the similarity algorithm listed above is only exemplary, and the embodiments of the present application are not particularly limited thereto.
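A minimal sketch of an edit-distance-based name similarity is given below. The edit distance itself follows the standard dynamic-programming definition; converting it into a similarity in [0, 1] by dividing by the longer string length is an assumption made for illustration, since the source does not state how the distance is mapped to a similarity.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character substitutions, insertions, deletions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]                  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1.0 for identical names, 0.0 for fully different."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest
```

An entity in the entity data source would then be kept as an entity to be aligned when `name_similarity` of its processed name and the processed name of the sub-entity is greater than or equal to the first similarity threshold.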
Fig. 5 is a schematic diagram illustrating a process of searching for an entity to be aligned according to entity names of sub-entities according to an embodiment of the present application, as shown in fig. 5, entity names of a plurality of entities are included in an entity data source, which are respectively the entity name of entity 1, the entity name of entity 2, and the entity name of entity 3. In fig. 5, if the calculated similarity between the processed entity name 1 and the entity name of the sub-entity is greater than the first similarity threshold, the entity 1 and the entity 2 corresponding to the processed entity name 1 may be regarded as entities to be aligned.
Continuing with fig. 4, in step S21022, if the sub-entity includes an entity picture, the entity data source is searched, according to the entity picture of the sub-entity, for entities whose entity pictures have a similarity to the entity picture of the sub-entity greater than or equal to a second similarity threshold, and the found entities are used as entities to be aligned.
Besides searching the entity to be aligned according to the entity name of the sub-entity, if the sub-entity comprises the attribute feature of the entity picture, the entity with the similarity between the entity picture and the entity picture of the sub-entity being greater than or equal to the second similarity threshold value can be searched from the entity data source according to the entity picture of the sub-entity, and the searched entity is used as the entity to be aligned. The second similarity threshold may be set according to an actual situation, and is not specifically limited herein.
In an embodiment, before searching from an entity data source according to an entity picture of a sub-entity, a feature vector of the picture may be extracted to calculate a similarity between the pictures based on the feature vector of the picture, and in this embodiment, the method specifically further includes:
before searching the entity data source according to the entity picture of the sub-entity, extracting a first feature vector of each entity picture contained in the entity data source and a second feature vector of the entity picture of the sub-entity, and calculating the similarity between the entity picture in the entity data source and the entity picture of the sub-entity based on the first feature vector and the second feature vector.
Specifically, a DenseNet network model may be used to extract the first feature vector of an entity picture contained in the entity data source and the second feature vector of the entity picture of the sub-entity, and Faiss may then be used to calculate the cosine similarity between the first feature vector and the second feature vector.
Fig. 6 shows a schematic diagram of a process of finding an entity to be aligned according to an entity picture of a sub-entity according to an embodiment of the present application.
As shown in fig. 6, the entity data source includes a plurality of entity pictures, namely the entity picture of entity 1, the entity picture of entity 2, the entity picture of entity 3, and so on. To search for the entities to be aligned in the entity data source, the entity pictures in the entity data source are first converted into vectors to obtain a plurality of first feature vectors, and the entity picture of the sub-entity is converted into a vector to obtain a second feature vector. The similarity between each first feature vector and the second feature vector of the entity picture of the sub-entity is then calculated, and the entities to be aligned are determined according to the calculation results. In a specific embodiment, an entity whose similarity is greater than or equal to the second similarity threshold may be used as an entity to be aligned. For example, in fig. 6, if the calculated similarity between first feature vector 1 and the second feature vector of the entity picture of the sub-entity is greater than the second similarity threshold, entity 1 may be taken as an entity to be aligned.
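The picture-based search can be sketched as follows, assuming the feature vectors have already been extracted. The sketch deliberately uses plain Python in place of DenseNet and Faiss so that it stays self-contained; the function names and the dictionary layout of `source_vectors` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_by_picture(
    source_vectors: Dict[str, Sequence[float]],  # entity id -> first feature vector
    sub_entity_vector: Sequence[float],          # second feature vector
    threshold: float,                            # second similarity threshold
) -> List[str]:
    """Return ids of entities whose picture is similar enough to the sub-entity's."""
    return [
        entity_id
        for entity_id, vector in source_vectors.items()
        if cosine_similarity(vector, sub_entity_vector) >= threshold
    ]
```

For a large entity data source, a vector index would replace the linear scan; the thresholding logic stays the same.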
Through the above process, the entities to be aligned corresponding to the target entity can be found. Because this search process emphasizes comprehensiveness, it may introduce some entities that cannot be aligned with the target entity. It is therefore necessary to further determine, from the entities to be aligned, those that match the sub-entities included in the target entity.
In a specific embodiment, in order to further determine an entity to be aligned matching with a sub-entity included in the target entity from the entities to be aligned, as shown in fig. 7, the method may include:
step S710, carrying out similarity matching between the entity picture of the entity to be aligned and the entity pictures of the sub-entities contained in the target entity;
step S720, using, as a matched entity to be aligned, an entity to be aligned whose entity picture has a similarity greater than a preset similarity with the entity pictures of a preset proportion of the sub-entities in the target entity.
In step S710, in order to determine the matched entities to be aligned from the entities to be aligned, similarity matching of entity pictures may be used, that is, the similarity between the entity picture of the entity to be aligned and the entity picture of each sub-entity included in the target entity is calculated.
In step S720, the entity to be aligned, in which the similarity between the entity picture and the entity pictures of the sub-entities with the preset proportion in the target entity is greater than the preset similarity, is taken as the matched entity to be aligned.
It should be noted that, when performing similarity matching between the entity to be aligned and the sub-entities included in the target entity in step S710, similarity matching between the entity to be aligned and the sub-entities included in the target entity needs to be performed one by one.
After similarity matching is performed between the entities to be aligned and the sub-entities included in the target entity one by one, an entity to be aligned whose entity picture has a similarity greater than the preset similarity with the entity pictures of the preset proportion of the sub-entities in the target entity can be used as a matched entity to be aligned. For example, if the target entity includes 6 sub-entities and the preset proportion is 50%, an entity to be aligned can be used as a matched entity to be aligned only when the similarities between its entity picture and the entity pictures of 3 sub-entities in the target entity are all greater than the preset similarity.
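The preset-proportion rule can be sketched as a small helper. The function name is hypothetical, and treating the proportion as a minimum ("at least this share of sub-entities") is an assumption; the source example only states the case of exactly 3 out of 6.

```python
from typing import Sequence


def matches_by_proportion(
    similarities: Sequence[float],  # one similarity per sub-entity of the target
    min_similarity: float,          # preset similarity
    proportion: float,              # preset proportion, e.g. 0.5 for 50%
) -> bool:
    """True if enough sub-entities are more similar than the preset similarity."""
    if not similarities:
        return False
    hits = sum(1 for s in similarities if s > min_similarity)
    return hits >= proportion * len(similarities)
```

For the example above, a candidate with similarities `[0.9, 0.95, 0.91, 0.2, 0.1, 0.3]` against 6 sub-entities passes with a preset similarity of 0.85 and a preset proportion of 50%, because 3 of the 6 similarities exceed 0.85.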
It should be noted that, because the process of searching for the entities to be aligned emphasizes comprehensiveness, some entities that cannot be aligned with the target entity may be introduced, and the entities to be aligned that match the sub-entities included in the target entity therefore need to be further determined from the entities to be aligned. To improve the accuracy of the matched entities to be aligned, when they are determined by entity picture similarity, the preset similarity may be set higher than the second similarity threshold used when searching for the entities to be aligned by entity picture similarity.
In another embodiment, in order to further determine an entity to be aligned matching with a sub-entity included in the target entity from the entities to be aligned, as shown in fig. 8, steps S810 to S820 may be specifically included, which is described in detail as follows:
step S810, carrying out similarity matching on the description attribute of the entity to be aligned and the description attribute of the sub-entity contained in the target entity.
Since the entities may include attributes such as description, in order to determine a matching entity to be aligned from the entities to be aligned, a similarity matching manner of the description attributes may be used, that is, a similarity between the description attributes of the entities to be aligned and the description attributes of the sub-entities included in the target entity is calculated.
Step S820, using, as a matched entity to be aligned, an entity to be aligned whose description attribute has a similarity greater than the preset similarity with the description attributes of the preset proportion of the sub-entities in the target entity.
After similarity matching of description attributes is performed between the entities to be aligned and the sub-entities included in the target entity one by one, an entity to be aligned whose description attribute has a similarity greater than the preset similarity with the description attributes of the preset proportion of the sub-entities in the target entity can be used as a matched entity to be aligned. For example, if the target entity includes 6 sub-entities and the preset proportion is 50%, an entity to be aligned can be used as a matched entity to be aligned only when the similarities between its description attribute and the description attributes of 3 sub-entities in the target entity are all greater than the preset similarity.
In an embodiment of the present application, since the attribute features of an entity include an entity name, an entity picture, a description, ownership information, a release time, and the like, after the entity to be aligned that matches the sub-entities included in the target entity is determined through the entity name or the entity picture, the ownership information of the sub-entities included in the target entity and the ownership information of the matched entity to be aligned may be aligned and accumulated as additional knowledge. As shown in fig. 9, this embodiment specifically includes steps S910 to S920, which are described in detail as follows:
step S910, obtaining the ownership information of the matched entity to be aligned and the ownership information of the sub-entities included in the target entity.
After similarity matching is performed between the entity picture of the entity to be aligned and the entity pictures of the sub-entities contained in the target entity to determine the matched entity to be aligned, or after similarity matching is performed between the description attribute of the entity to be aligned and the description attributes of the sub-entities to determine the matched entity to be aligned, the ownership information of the matched entity to be aligned and the ownership information of the sub-entities contained in the target entity can be further obtained.
And step S920, aligning the ownership information of the matched entity to be aligned with the ownership information of the sub-entities contained in the target entity, and storing the aligned ownership information in an ownership information list.
Since the matched entity to be aligned is an entity that can be aligned with the sub-entities included in the target entity, the ownership information of the matched entity to be aligned and the ownership information of the sub-entities included in the target entity can be aligned and stored in the ownership information list.
In another embodiment, in order to further determine an entity to be aligned matching with a sub-entity included in the target entity from the entities to be aligned, as shown in fig. 10, steps S1010 to S1020 may be specifically included, which is described in detail as follows:
step S1010, performing similarity matching between the ownership information of the entity to be aligned and the ownership information of the sub-entities included in the target entity, and calculating the difference between the release time of the entity to be aligned and the release time of the sub-entities included in the target entity.
Since an entity may include attributes such as ownership information and release time, where the ownership information may be developer information or publisher information, similarity matching of ownership information may be used to determine the matched entities to be aligned from the entities to be aligned. That is, the similarity between the ownership information of the entity to be aligned and the ownership information of each sub-entity included in the target entity is calculated, and at the same time the difference between the release time of the entity to be aligned and the release time of each sub-entity included in the target entity is calculated.
Step S1020, the entities to be aligned, for which the similarity between the ownership information and the ownership information of the sub-entities with the preset proportion in the target entity is greater than the preset similarity, and the difference between the release time and the release time of the sub-entities with the preset proportion in the target entity is less than the time threshold, are taken as the matched entities to be aligned.
The similarity between the ownership information of the entity to be aligned and the ownership information of each sub-entity included in the target entity is calculated, and at the same time the difference between the release time of the entity to be aligned and the release time of each sub-entity included in the target entity is calculated.
Further, after the similarity matching of the ownership information and the calculation of the release time differences, an entity to be aligned may be used as a matched entity to be aligned if the similarities between its ownership information and the ownership information of the preset proportion of the sub-entities in the target entity are all greater than the preset similarity, and the differences between its release time and the release times of the preset proportion of the sub-entities in the target entity are all less than the time threshold. For example, if the target entity includes 6 sub-entities and the preset proportion is 50%, an entity to be aligned can be used as a matched entity to be aligned only when the similarities between its ownership information and the ownership information of 3 sub-entities in the target entity are all greater than the preset similarity, and the differences between its release time and the release times of those 3 sub-entities are all less than the time threshold.
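The combined ownership and release time check can be sketched as follows. The record layout (dictionaries with `"owner"` and `"released"` keys), the function name, and the parameter names are hypothetical. The sketch also assumes that both conditions must hold for the same sub-entity and that the time threshold is expressed in days; the source fixes neither point.

```python
from datetime import date
from typing import Callable, Dict, Sequence


def matches_by_ownership_and_release(
    candidate: Dict,                               # entity to be aligned
    sub_entities: Sequence[Dict],                  # sub-entities of the target entity
    owner_similarity: Callable[[str, str], float], # any string similarity function
    min_similarity: float,                         # preset similarity
    max_days: int,                                 # time threshold (assumed unit: days)
    proportion: float,                             # preset proportion
) -> bool:
    """True if enough sub-entities have a similar owner and a close release time."""
    if not sub_entities:
        return False
    hits = 0
    for sub in sub_entities:
        similar_owner = (
            owner_similarity(candidate["owner"], sub["owner"]) > min_similarity
        )
        day_gap = abs((candidate["released"] - sub["released"]).days)
        if similar_owner and day_gap < max_days:
            hits += 1
    return hits >= proportion * len(sub_entities)
```

Any string similarity, such as one derived from the edit distance, can be supplied as `owner_similarity`.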
In one embodiment of the present application, as shown in fig. 11, step S230 may include:
step S2301, searching the entity data source again for entities to be aligned based on the sub-entities contained in the new target entity;
step S2302, performing matching of the entity to be aligned and updating of the target entity according to the entity to be aligned obtained by re-searching and a sub-entity included in the new target entity.
These steps are described in detail below:
In step S2301, the entity data source is searched again for entities to be aligned based on the sub-entities contained in the new target entity.
Specifically, each time the target entity is updated, the entity data source may be searched again for entities to be aligned according to the sub-entities included in the new target entity.
The search may be performed again according to the attribute features of the sub-entities included in the new target entity, for example, through the entity names or the entity pictures of those sub-entities.
Because the new target entity includes the newly added sub-entities, the entities to be aligned obtained by searching again may include entities found according to the original sub-entities as well as entities found according to the newly added sub-entities.
Step S2302, performing matching of the entity to be aligned and updating of the target entity according to the entity to be aligned obtained by re-searching and a sub-entity included in the new target entity.
Since the entity to be aligned is obtained by re-searching, the matching of the entity to be aligned and the updating of the target entity can be performed based on the entity to be aligned obtained by re-searching and the sub-entity included in the new target entity.
Specifically, the entities to be aligned obtained by searching again are matched with the sub-entities included in the new target entity, so as to determine the matched entities to be aligned from the entities to be aligned obtained by searching again, and the matched entities to be aligned are used as newly added sub-entities of the new target entity to update the new target entity, thereby obtaining the target entity for the next round of matching.
In an embodiment of the present application, as shown in fig. 12, step S230 may further include:
step S2301', acquiring entities to be aligned which are determined in the previous matching process and are not matched with the sub-entities to serve as a new entity set to be aligned;
step S2302', performing the matching of the entities to be aligned and the updating of the target entity according to the new set of entities to be aligned and the sub-entities included in the new target entity.
These steps are described in detail below:
In step S2301', the entities to be aligned that were determined in the previous matching round not to match the sub-entities are obtained as a new set of entities to be aligned.
In this embodiment, each time a new target entity is obtained by updating the target entity, the entities to be aligned that were determined in the previous matching round not to match the sub-entities are obtained, and these unmatched entities form the new set of entities to be aligned.
Step S2302', performing the matching of the entities to be aligned and the updating of the target entity according to the new set of entities to be aligned and the sub-entities included in the new target entity.
After the new set of entities to be aligned is obtained, the matching of the entities to be aligned and the updating of the target entity may be performed based on the new set of entities to be aligned and the sub-entities included in the new target entity.
Specifically, the entities in the new set of entities to be aligned are matched with the sub-entities included in the new target entity, so that the matched entities to be aligned are determined from the new set, and the matched entities to be aligned are used as newly added sub-entities of the new target entity to update the new target entity, thereby obtaining the target entity for the next round of matching.
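One round of this variant, in which the unmatched entities are carried forward instead of searching the entity data source again, can be sketched as follows. The function name `run_round` and the `is_match` callable are hypothetical placeholders for whichever matching rule is used.

```python
from typing import Callable, Set, Tuple


def run_round(
    sub_entities: Set[str],
    candidate_pool: Set[str],
    is_match: Callable[[str, Set[str]], bool],
) -> Tuple[Set[str], Set[str]]:
    """Match the pool against the current sub-entities.

    Returns the updated sub-entity set (the new target entity) and the
    unmatched remainder, which becomes the new set of entities to be aligned.
    """
    matched = {c for c in candidate_pool if is_match(c, sub_entities)}
    return sub_entities | matched, candidate_pool - matched
```

Because the new target entity contains more sub-entities, a candidate that failed to match in one round may match in the next one, which is why the unmatched remainder is kept rather than discarded.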
Fig. 13 shows an interaction flowchart of an entity alignment method according to an embodiment of the present application, and as shown in fig. 13, the entity alignment method mainly includes steps S1310-S1320, which will now be described in detail as follows:
step S1310, find an entity to be aligned.
Specifically, the entity to be aligned may be searched from the entity data source according to the target entity. In the entity data source state 1, the entity data source includes the target entity and other entities except the target entity, and the sub-entities of each entity are themselves. In order to realize entity alignment, an entity to be aligned can be searched from an entity data source according to the attribute characteristics of a target entity.
Step S1320, determining the matched entity to be aligned.
After the entity to be aligned is obtained in step S1310, the entity data source state 2 is entered, and unlike the entity data source state 1, in the entity data source state 2, the target entity includes the target entity itself as the original sub-entity and also includes the entity to be aligned as the new sub-entity. In the entity data source state 1, the entity to be aligned and the target entity exist independently.
However, since the entities to be aligned obtained by searching may include entities that cannot be aligned, it is necessary to further determine the matched entities to be aligned, use the matched entities to be aligned as newly added sub-entities of the target entity, and remove the unmatched entities to be aligned so that they are not used as newly added sub-entities of the target entity. The state of the entity data source is thereby updated.
In one embodiment, the process of determining the matched entities to be aligned from the entities to be aligned may be assisted by an ownership information list, crowdsourced data, stem similarity, and a name library. The crowdsourced data is data formed by manual scoring, and the name library may be an alias library provided directly by Appanie.
The application of the entity alignment method is further illustrated below with a specific application scenario in the game field. Table 1 shows the process of aligning the target entity "King of Chaos" with the entities "King of Chaos (endorsed by Yang Mi)", "King of Chaos Mobile Game", and "King of Brawls".
TABLE 1 (reproduced in the original publication as images: Figure BDA0002479471090000191 and Figure BDA0002479471090000201)
First round:
a. Entity data source state 1: the entity data source comprises four game entities, namely "King of Chaos", "King of Chaos (endorsed by Yang Mi)", "King of Chaos Mobile Game", and "King of Brawls", and the sub-entity of each game entity is the game entity itself. "King of Chaos" is the target entity. Fig. 14A shows the obtained target entity "King of Chaos", which includes attributes such as entity name, entity picture, developer, publisher, release time, and description.
b. Acquiring the entities to be aligned: the entities to be aligned are obtained from the entity data source according to the entity name or the entity picture of the sub-entity "King of Chaos" contained in the target entity. The entities to be aligned "King of Chaos (endorsed by Yang Mi)", "King of Chaos Mobile Game", and "King of Brawls" can be obtained through the entity name of the target entity "King of Chaos", and the entity to be aligned "King of Chaos (endorsed by Yang Mi)" can also be obtained through the entity picture of the target entity "King of Chaos".
Fig. 14B shows the obtained entity to be aligned "King of Chaos (endorsed by Yang Mi)", which includes attributes such as entity name, entity picture, developer, publisher, release time, and description. Fig. 14C shows the obtained entity to be aligned "King of Chaos Mobile Game", which includes attributes such as entity name, developer, publisher, and release time. Fig. 14D shows the obtained entity to be aligned "King of Brawls", which includes attributes such as entity name, entity picture, developer, publisher, release time, and description.
c. Entity data source state 2: after the entities to be aligned are obtained, the entities to be aligned and the target entity are aligned. At this time, the entity data source contains only one target entity, "King of Chaos", with four sub-entities below it: "King of Chaos" is the original sub-entity, and "King of Chaos (endorsed by Yang Mi)", "King of Chaos Mobile Game", and "King of Brawls" are the obtained entities to be aligned.
d. Determining the matched entities to be aligned: in entity data source state 2, the target entity "King of Chaos" has four sub-entities below it, of which "King of Chaos" is the original sub-entity and "King of Chaos (endorsed by Yang Mi)", "King of Chaos Mobile Game", and "King of Brawls" are newly added sub-entities. It is still necessary to determine whether each newly added sub-entity is a matched entity to be aligned, to keep the matched entities to be aligned as newly added sub-entities of the target entity, and to remove the unmatched entities to be aligned so that they are not used as newly added sub-entities of the target entity. Therefore, the entities to be aligned "King of Chaos (endorsed by Yang Mi)", "King of Chaos Mobile Game", and "King of Brawls" are matched with the original sub-entity "King of Chaos" of the target entity, and the matched entity to be aligned "King of Chaos (endorsed by Yang Mi)" is obtained. "King of Chaos (endorsed by Yang Mi)" is thus finally used as a newly added sub-entity of the target entity, while "King of Chaos Mobile Game" and "King of Brawls" are removed and are not used as newly added sub-entities of the target entity.
Second round:
a. Entity data source state 1: since "King of Chaos (endorsed by Yang Mi)" was aligned with the sub-entity "King of Chaos" of the target entity in the previous round, the entity data source now contains only three game entities, namely the target entity "King of Chaos", the entity "King of Chaos Mobile Game", and the entity "King of Brawls", where the target entity "King of Chaos" includes two sub-entities: "King of Chaos" and "King of Chaos (endorsed by Yang Mi)".
b. Acquiring the entities to be aligned: the entities to be aligned "King of Chaos Mobile Game" and "King of Brawls" are obtained from the entity data source according to the entity names or entity pictures of the sub-entities "King of Chaos" and "King of Chaos (endorsed by Yang Mi)" contained in the target entity.
c. Entity data source state 2: after the entities to be aligned are obtained, the entities to be aligned and the target entity are aligned. At this time, the entity data source contains only one game entity, "King of Chaos", with four sub-entities below it: "King of Chaos" and "King of Chaos (endorsed by Yang Mi)" are the original sub-entities, and "King of Chaos Mobile Game" and "King of Brawls" are the obtained entities to be aligned.
d. Determining the matched entities to be aligned: in entity data source state 2, the target entity "King of Chaos" has four sub-entities below it, of which "King of Chaos" and "King of Chaos (endorsed by Yang Mi)" are the original sub-entities and "King of Chaos Mobile Game" and "King of Brawls" are newly added sub-entities. It is still necessary to determine whether each newly added sub-entity is a matched entity to be aligned, to keep the matched entities to be aligned as newly added sub-entities of the target entity, and to remove the unmatched entities to be aligned so that they are not used as newly added sub-entities of the target entity. Therefore, the entities to be aligned "King of Chaos Mobile Game" and "King of Brawls" are matched with the original sub-entities "King of Chaos" and "King of Chaos (endorsed by Yang Mi)" of the target entity, and the matched entity to be aligned "King of Chaos Mobile Game" is obtained. "King of Chaos Mobile Game" is thus used as a newly added sub-entity of the target entity, while "King of Brawls" is removed and is not used as a newly added sub-entity of the target entity.
And a third cycle:
a. entity data source state 1: since the "hand trip of the princess of died-down" is aligned with the sporocarp "princess of died-down", "princess of died-down (yand-down speakers)" of the target entity in the previous cycle, the entity data source only contains two game entities, the target entity "princess" and the entity "princess of died-down", wherein the target entity "princess" includes three sporocarps: "disorder king", "disorder king (Yang power generation)" and "disorder king" hand trip.
b. Acquiring an entity to be aligned: the cycle ends.
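The round-by-round procedure above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the function `iterative_align`, its parameters, the two predicates and the toy entities are all hypothetical, and the candidate search and match test are deliberately simplistic stand-ins for the name, picture and attribute comparisons described in the text.

```python
def iterative_align(target_subs, data_source, find_candidates, is_match, max_rounds=10):
    """Grow the target entity round by round until no new sub-entity matches.

    All names here are hypothetical; the two predicates are supplied by the caller.
    """
    target = list(target_subs)      # sub-entities of the target entity
    remaining = list(data_source)   # entities not yet merged into the target
    for _ in range(max_rounds):     # end condition: round limit reached
        # Step b: acquire the entities to be aligned from the data source.
        candidates = find_candidates(target, remaining)
        # Step d: keep only candidates that match the current sub-entities.
        matched = [c for c in candidates if is_match(c, target)]
        if not matched:             # end condition: nothing new matched
            break
        target.extend(matched)      # matched candidates become new sub-entities
        remaining = [e for e in remaining if e not in matched]
    return target, remaining


# Toy entities as (name, icon id, description id); all values are made up.
base = ("game", "icon1", "desc1")
endorsed = ("game (endorsed)", "icon1", "desc2")
mobile = ("game mobile", "icon2", "desc2")
other = ("gamer", "icon3", "desc3")


def by_name_prefix(target, remaining):
    # Candidate search: the name starts with the name of some sub-entity.
    return [e for e in remaining if any(e[0].startswith(s[0]) for s in target)]


def shares_icon_or_desc(candidate, target):
    # Match test: same icon or same description as at least one sub-entity.
    return any(candidate[1] == s[1] or candidate[2] == s[2] for s in target)


aligned, left_over = iterative_align(
    [base], [endorsed, mobile, other], by_name_prefix, shares_icon_or_desc
)
```

In this toy run the first round adds `endorsed` (shared icon), the second round adds `mobile` (it shares a description with the newly added `endorsed`), and the third round finds no match, so `other` is never merged. This mirrors how a sub-entity added in one round can enable a match in the next.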
Figs. 15A-15B illustrate a before-and-after comparison of the entity alignment method applied in the gaming field, according to one embodiment of the present application.
The entity alignment method of the embodiment of the present application can be applied to a game knowledge graph in the gaming field. Before entity alignment is performed, a search for "bingo disappearing" returns two entities, "bingo disappearing and opening a shop" and "bingo disappearing - 'love apartment' recommendation", as shown in Fig. 15A. These two entities actually point to the same game, so they can be aligned by the entity alignment method of the embodiment of the present application. The effect after entity alignment is shown in Fig. 15B: a search for "bingo disappearing" returns only one entity, "bingo disappearing-disappearing and disappearing store", which makes the search result more concise and orderly.
Embodiments of the apparatus of the present application are described below, which may be used to perform the entity alignment methods of the above-described embodiments of the present application. For details that are not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the entity alignment method described above in the present application.
Fig. 16 shows a block diagram of an entity alignment apparatus according to an embodiment of the present application. Referring to Fig. 16, an entity alignment apparatus 1600 according to an embodiment of the present application includes: an acquiring unit 1602, a matching unit 1604, an updating unit 1606, and a processing unit 1608.
The acquiring unit 1602 is configured to acquire a target entity and an entity to be aligned corresponding to the target entity; the matching unit 1604 is configured to match the entity to be aligned with the sub-entities included in the target entity, so as to determine a matched entity to be aligned from the entities to be aligned, and to update the target entity by using the matched entity to be aligned as a new sub-entity of the target entity, so as to obtain a new target entity; the updating unit 1606 is configured to continue to perform matching of the entity to be aligned and updating of the target entity based on the sub-entities included in the new target entity until a matching end condition is satisfied; and the processing unit 1608 is configured to perform alignment processing on the sub-entities included in the target entity obtained after the matching end condition is satisfied.
In some embodiments of the present application, the acquiring unit 1602 includes: an acquisition subunit configured to acquire the sub-entities included in the target entity; and a lookup subunit configured to search for and obtain the entity to be aligned according to the attribute features of the sub-entities.
In some embodiments of the present application, the lookup subunit is further configured to: if the sub-entity includes an entity name, search the entity data source, according to the entity name of the sub-entity, for an entity whose entity name has a similarity with the entity name of the sub-entity greater than or equal to a first similarity threshold, and take the found entity as the entity to be aligned; and if the sub-entity includes an entity picture, search the entity data source, according to the entity picture of the sub-entity, for an entity whose entity picture has a similarity with the entity picture of the sub-entity greater than or equal to a second similarity threshold, and take the found entity as the entity to be aligned.
In some embodiments of the present application, the lookup subunit is further configured to: before searching the entity data source according to the entity name of the sub-entity, denoise the entity names contained in the entity data source, so as to remove the noise words in those entity names according to a preset noise lexicon.
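As a rough illustration of the name-based lookup with denoising, the sketch below removes noise words and then compares names. The noise lexicon, the threshold value and the function names are assumptions made for this example, and `difflib.SequenceMatcher` from the Python standard library is used only as a stand-in similarity measure, since the text does not specify one.

```python
from difflib import SequenceMatcher

# Hypothetical noise lexicon; a real one would be curated per domain.
NOISE_WORDS = ("mobile game", "official version")


def denoise(name, noise_words=NOISE_WORDS):
    """Lower-case the name, drop noise words, and collapse whitespace."""
    cleaned = name.lower()
    for word in noise_words:
        cleaned = cleaned.replace(word, " ")
    return " ".join(cleaned.split())


def name_similarity(a, b):
    """Similarity in [0, 1] between two denoised names (stand-in measure)."""
    return SequenceMatcher(None, denoise(a), denoise(b)).ratio()


def find_by_name(sub_entity_name, source_names, threshold=0.8):
    """Return source names whose similarity reaches the first similarity threshold."""
    return [n for n in source_names if name_similarity(sub_entity_name, n) >= threshold]


found = find_by_name("Star Quest", ["Star Quest Mobile Game", "Moon Farm"])
```

Denoising first means that a suffix such as "Mobile Game" does not lower the similarity score, so variants of the same title are more likely to pass the threshold.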
In some embodiments of the present application, the lookup subunit is further configured to: before searching from the entity data source according to the entity picture of the sub-entity, extracting a first feature vector of the entity picture contained in the entity data source and a second feature vector of the entity picture of the sub-entity, so as to calculate the similarity between the entity picture of the entity data source and the entity picture of the sub-entity based on the first feature vector and the second feature vector.
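A minimal sketch of the picture-based lookup is given below, assuming the feature vectors have already been extracted by some image model. Cosine similarity is used here purely as an assumed measure, since the text only says that the similarity is calculated from the two feature vectors; the function names, the threshold and the example vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def find_by_picture(sub_entity_vec, source_vecs, threshold=0.9):
    """Return ids of source entities whose picture similarity reaches the threshold.

    source_vecs maps a (hypothetical) entity id to its picture feature vector.
    """
    return [
        entity_id
        for entity_id, vec in source_vecs.items()
        if cosine_similarity(sub_entity_vec, vec) >= threshold
    ]


hits = find_by_picture([1.0, 0.0], {"a": [2.0, 0.0], "b": [0.0, 1.0]})
```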
In some embodiments of the present application, the matching unit 1604 is further configured to: perform similarity matching between the entity picture of the entity to be aligned and the entity pictures of the sub-entities contained in the target entity; and take, as the matched entity to be aligned, an entity to be aligned whose entity picture has a similarity greater than a preset similarity with the entity pictures of a preset proportion of the sub-entities in the target entity.
In some embodiments of the present application, the matching unit 1604 is further configured to: perform similarity matching between the description attribute of the entity to be aligned and the description attributes of the sub-entities contained in the target entity; and take, as the matched entity to be aligned, an entity to be aligned whose description attribute has a similarity greater than a preset similarity with the description attributes of a preset proportion of the sub-entities in the target entity.
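The "preset proportion" rule can be sketched as follows: a candidate matches when its similarity exceeds the preset similarity for at least the preset proportion of the target's sub-entities. The function names, the default thresholds and the token-overlap (Jaccard) similarity used for descriptions are all assumptions for illustration; any picture or text similarity could be plugged in instead.

```python
def matches_target(candidate, sub_entities, similarity,
                   min_similarity=0.8, min_proportion=0.5):
    """True if the candidate is similar enough to enough of the sub-entities."""
    if not sub_entities:
        return False
    hits = sum(1 for s in sub_entities if similarity(candidate, s) > min_similarity)
    return hits / len(sub_entities) >= min_proportion


def jaccard(a, b):
    """Token-overlap similarity between two description strings (stand-in measure)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Made-up description attributes of three sub-entities.
descriptions = ["strategy war game", "strategy war game online", "puzzle"]

loose = matches_target("strategy war game", descriptions, jaccard,
                       min_similarity=0.7, min_proportion=0.5)
strict = matches_target("strategy war game", descriptions, jaccard,
                        min_similarity=0.7, min_proportion=0.9)
```

Here the candidate is similar to two of the three sub-entities, so it matches when the required proportion is 0.5 but not when it is 0.9. Requiring a proportion rather than a single hit makes the decision less sensitive to one noisy sub-entity.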
In some embodiments of the present application, the acquiring unit 1602 is further configured to: acquire the rights information of the matched entity to be aligned and the rights information of the sub-entities contained in the target entity; and align the rights information of the matched entity to be aligned with the rights information of the sub-entities contained in the target entity, and store the aligned rights information in a rights information list.
In some embodiments of the present application, the matching unit 1604 is further configured to: perform similarity matching between the rights information of the entity to be aligned and the rights information of the sub-entities contained in the target entity, and calculate the difference between the release time of the entity to be aligned and the release times of the sub-entities contained in the target entity; and take, as the matched entity to be aligned, an entity to be aligned whose rights information has a similarity greater than a preset similarity with the rights information of a preset proportion of the sub-entities in the target entity, and whose release time differs from the release times of a preset proportion of the sub-entities in the target entity by less than a time threshold.
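The combined rights-information and release-time test might look like the sketch below. The dictionary layout, the field names `rights` and `released`, the 30-day threshold and the exact-equality rights comparison are all hypothetical choices made for the example; the text does not fix any of them.

```python
from datetime import date


def rights_and_release_match(candidate, sub_entities, rights_similarity,
                             min_similarity=0.8, max_days=30, min_proportion=0.5):
    """Both conditions must hold for at least min_proportion of the sub-entities."""
    if not sub_entities:
        return False
    total = len(sub_entities)
    # Condition 1: rights information is similar enough.
    rights_hits = sum(
        1 for s in sub_entities
        if rights_similarity(candidate["rights"], s["rights"]) > min_similarity
    )
    # Condition 2: release dates are close enough.
    time_hits = sum(
        1 for s in sub_entities
        if abs((candidate["released"] - s["released"]).days) < max_days
    )
    return (rights_hits / total >= min_proportion
            and time_hits / total >= min_proportion)


def exact_rights(a, b):
    # Simplest possible stand-in: identical rights holder or not.
    return 1.0 if a == b else 0.0


# Made-up sub-entities and candidates.
subs = [
    {"rights": "Studio A", "released": date(2020, 1, 10)},
    {"rights": "Studio A", "released": date(2020, 1, 20)},
]
same_game = {"rights": "Studio A", "released": date(2020, 1, 15)}
late_release = {"rights": "Studio A", "released": date(2021, 6, 1)}
other_owner = {"rights": "Studio B", "released": date(2020, 1, 15)}
```

With these values only `same_game` passes both conditions; `late_release` fails the time test and `other_owner` fails the rights test.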
In some embodiments of the present application, the matching end condition includes at least one of the following: the number of matching rounds reaches a first preset threshold; or the difference between the numbers of entities to be aligned that matched the sub-entities in two adjacent matching processes reaches a second preset threshold.
In some embodiments of the present application, the updating unit 1606 is further configured to: re-searching the entity to be aligned from the entity data source based on the sub-entity contained in the new target entity; and according to the entity to be aligned obtained by searching again and the sub-entity contained in the new target entity, performing matching of the entity to be aligned and updating of the target entity.
In some embodiments of the present application, the updating unit 1606 is further configured to: acquire the entities to be aligned that were determined in the previous matching process not to match the sub-entities, and use them as a new set of entities to be aligned; and perform matching of the entity to be aligned and updating of the target entity according to the new set of entities to be aligned and the sub-entities contained in the new target entity.
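The two ways of choosing candidates for the next round, searching the data source again or reusing the unmatched candidates of the previous round, can be contrasted in a short sketch. The function names, the word-overlap relation and the example titles are hypothetical and only serve to show how the two strategies can return different candidate sets.

```python
def candidates_by_research(new_target_subs, data_source, is_related):
    """Strategy A: search the data source again with every sub-entity of the new target."""
    return [
        e for e in data_source
        if e not in new_target_subs
        and any(is_related(e, s) for s in new_target_subs)
    ]


def candidates_from_leftovers(previous_candidates, matched):
    """Strategy B: reuse only the candidates that failed to match in the previous round."""
    return [e for e in previous_candidates if e not in matched]


def share_any_word(a, b):
    # Toy relation: the two names have at least one word in common.
    return bool(set(a.split()) & set(b.split()))


# Made-up data: in round 1 the target was ["quest"], the candidates were
# "quest deluxe" and "quest saga", and only "quest deluxe" matched.
source = ["quest", "quest deluxe", "quest saga", "deluxe racer"]
new_target = ["quest", "quest deluxe"]

researched = candidates_by_research(new_target, source, share_any_word)
leftovers = candidates_from_leftovers(["quest deluxe", "quest saga"], ["quest deluxe"])
```

In this example, searching again surfaces "deluxe racer" because it is related to the newly added sub-entity, whereas the leftover strategy only revisits "quest saga". The first option can discover more candidates at the cost of another search; the second is cheaper but never widens the candidate set.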
FIG. 17 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
It should be noted that the computer system 1700 of the electronic device shown in Fig. 17 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in Fig. 17, the computer system 1700 includes a Central Processing Unit (CPU) 1701, which can perform various appropriate actions and processes, such as executing the methods described in the above embodiments, according to a program stored in a Read-Only Memory (ROM) 1702 or a program loaded from a storage section 1708 into a Random Access Memory (RAM) 1703. The RAM 1703 also stores various programs and data necessary for system operation. The CPU 1701, the ROM 1702, and the RAM 1703 are connected to each other through a bus 1704. An Input/Output (I/O) interface 1705 is also connected to the bus 1704.
The following components are connected to the I/O interface 1705: an input section 1706 including a keyboard, a mouse, and the like; an output section 1707 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 1708 including a hard disk and the like; and a communication section 1709 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 1709 performs communication processing via a network such as the Internet. A drive 1710 is also connected to the I/O interface 1705 as necessary. A removable medium 1711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1710 as necessary, so that a computer program read from it can be installed into the storage section 1708 as necessary.
In particular, according to embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated by the flowchart. In such embodiments, the computer program may be downloaded and installed from a network via the communication section 1709, and/or installed from the removable medium 1711. When the computer program is executed by the Central Processing Unit (CPU) 1701, the various functions defined in the system of the present application are executed.
It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with a computer program embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. The computer program embodied on the computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware, and the described units may also be disposed in a processor. In some cases, the names of the units do not constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method described in the above embodiments.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present application, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and which includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, a network device, etc.) to execute the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (15)

1. A method of entity alignment, the method comprising:
acquiring a target entity and an entity to be aligned corresponding to the target entity;
matching the entity to be aligned with the sub-entity contained in the target entity to determine a matched entity to be aligned from the entity to be aligned, and updating the target entity by taking the matched entity to be aligned as a new sub-entity of the target entity to obtain a new target entity;
continuing to perform matching of the entity to be aligned and updating of the target entity based on the sub-entity contained in the new target entity until a matching end condition is met;
and aligning the sub-entities contained in the target entity obtained after the matching end condition is satisfied.
2. The method of claim 1, wherein the acquiring a target entity and an entity to be aligned corresponding to the target entity comprises:
acquiring a sub-entity contained in the target entity;
and searching for and obtaining the entity to be aligned according to the attribute characteristics of the sub-entity.
3. The method according to claim 2, wherein the searching for and obtaining the entity to be aligned according to the attribute characteristics of the sub-entity comprises:
if the sub-entity includes an entity name, searching an entity data source, according to the entity name of the sub-entity, for an entity whose entity name has a similarity with the entity name of the sub-entity greater than or equal to a first similarity threshold, and taking the found entity as the entity to be aligned;
if the sub-entity includes an entity picture, searching an entity data source, according to the entity picture of the sub-entity, for an entity whose entity picture has a similarity with the entity picture of the sub-entity greater than or equal to a second similarity threshold, and taking the found entity as the entity to be aligned.
4. The method of claim 3, further comprising:
before searching the entity data source according to the entity name of the sub-entity, denoising the entity names contained in the entity data source, so as to remove the noise words in the entity names contained in the entity data source according to a preset noise lexicon.
5. The method of claim 3, further comprising:
before searching from the entity data source according to the entity picture of the sub-entity, extracting a first feature vector of the entity picture contained in the entity data source and a second feature vector of the entity picture of the sub-entity, so as to calculate the similarity between the entity picture of the entity data source and the entity picture of the sub-entity based on the first feature vector and the second feature vector.
6. The method according to claim 1, wherein matching the entity to be aligned with a sub-entity included in the target entity to determine a matching entity to be aligned from the entities to be aligned comprises:
performing similarity matching between the entity picture of the entity to be aligned and the entity pictures of the sub-entities contained in the target entity;
and taking, as the matched entity to be aligned, an entity to be aligned whose entity picture has a similarity greater than a preset similarity with the entity pictures of a preset proportion of the sub-entities in the target entity.
7. The method according to claim 1, wherein matching the entity to be aligned with a sub-entity included in the target entity to determine a matching entity to be aligned from the entities to be aligned comprises:
carrying out similarity matching on the description attribute of the entity to be aligned and the description attribute of a sub-entity contained in the target entity;
and taking the entity to be aligned, of which the similarity between the description attribute and the description attributes of the sub-entities with the preset proportion in the target entity is greater than the preset similarity, as the matched entity to be aligned.
8. The method according to claim 6 or 7, characterized in that the method further comprises:
acquiring the rights information of the matched entity to be aligned and the rights information of the sub-entities contained in the target entity;
and aligning the rights information of the matched entity to be aligned with the rights information of the sub-entities contained in the target entity, and storing the aligned rights information in a rights information list.
9. The method according to claim 1, wherein matching the entity to be aligned with a sub-entity included in the target entity to determine a matching entity to be aligned from the entities to be aligned comprises:
performing similarity matching between the rights information of the entity to be aligned and the rights information of the sub-entities contained in the target entity, and calculating the difference between the release time of the entity to be aligned and the release times of the sub-entities contained in the target entity;
and taking, as the matched entity to be aligned, an entity to be aligned whose rights information has a similarity greater than a preset similarity with the rights information of a preset proportion of the sub-entities in the target entity, and whose release time differs from the release times of a preset proportion of the sub-entities in the target entity by less than a time threshold.
10. The method of claim 1, wherein the matching end condition comprises at least one of:
the number of matching rounds reaches a first preset threshold; the difference between the numbers of entities to be aligned that matched the sub-entities in two adjacent matching processes reaches a second preset threshold.
11. The method according to claim 1, wherein continuing to perform matching of the entity to be aligned and updating of the target entity based on the sub-entity contained in the new target entity comprises:
re-searching the entity to be aligned from the entity data source based on the sub-entity contained in the new target entity;
and according to the entity to be aligned obtained by searching again and the sub-entity contained in the new target entity, performing matching of the entity to be aligned and updating of the target entity.
12. The method according to claim 1, wherein continuing to perform matching of the entity to be aligned and updating of the target entity based on the sub-entity contained in the new target entity comprises:
acquiring the entities to be aligned that were determined in the previous matching process not to match the sub-entities, as a new set of entities to be aligned;
and according to the new entity set to be aligned and the sub-entity contained in the new target entity, performing matching of the entity to be aligned and updating of the target entity.
13. An entity alignment apparatus, comprising:
an acquiring unit configured to acquire a target entity and an entity to be aligned corresponding to the target entity;
a matching unit configured to match the entity to be aligned with a sub-entity included in the target entity, so as to determine a matched entity to be aligned from the entity to be aligned, and to update the target entity by using the matched entity to be aligned as a new sub-entity of the target entity, so as to obtain a new target entity;
an updating unit configured to continue to perform matching of the entity to be aligned and updating of the target entity based on the sub-entity contained in the new target entity until a matching end condition is met;
and a processing unit configured to perform alignment processing on a sub-entity included in the target entity obtained after the matching end condition is satisfied.
14. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the entity alignment method according to any one of claims 1 to 12.
15. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the entity alignment method of any one of claims 1 to 12.
CN202010374543.5A 2020-05-06 2020-05-06 Entity alignment method, device, computer readable medium and electronic equipment Active CN111651972B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010374543.5A CN111651972B (en) 2020-05-06 2020-05-06 Entity alignment method, device, computer readable medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111651972A true CN111651972A (en) 2020-09-11
CN111651972B CN111651972B (en) 2022-06-17

Family

ID=72346630

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010374543.5A Active CN111651972B (en) 2020-05-06 2020-05-06 Entity alignment method, device, computer readable medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111651972B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113901264A (en) * 2021-11-12 2022-01-07 央视频融媒体发展有限公司 Method and system for matching periodic entities among movie and television attribute data sources

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150154284A1 (en) * 2013-11-29 2015-06-04 Katja Pfeifer Aggregating results from named entity recognition services
CN107480191A (en) * 2017-07-12 2017-12-15 清华大学 A kind of entity alignment model of iteration
CN108647318A (en) * 2018-05-10 2018-10-12 北京航空航天大学 A kind of knowledge fusion method based on multi-source data
CN109726294A (en) * 2018-12-04 2019-05-07 北京奇艺世纪科技有限公司 A kind of App entity alignment schemes, device and electronic equipment
CN110377747A (en) * 2019-06-10 2019-10-25 河海大学 A kind of knowledge base fusion method towards encyclopaedia website
US20200026705A1 (en) * 2018-07-20 2020-01-23 Dan Benanav Automatic object inference in a database system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
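Wu Yunbing et al.: "Research on a knowledge graph construction method based on multiple data sources", Journal of Fuzhou University (Natural Science Edition) *
Zhang Weili et al.: "Entity alignment of encyclopedia knowledge bases based on semi-supervised co-training", Computer and Modernization *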

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant