CN110188207B

CN110188207B - Knowledge graph construction method and device, readable storage medium and electronic equipment

Info

Publication number: CN110188207B
Application number: CN201910408077.5A
Authority: CN
Inventors: 徐丰硕; 林凤绿; 王倪东
Original assignee: Mobvoi Innovation Technology Co Ltd
Current assignee: Mobvoi Innovation Technology Co Ltd
Priority date: 2019-05-15
Filing date: 2019-05-15
Publication date: 2021-06-04
Anticipated expiration: 2039-05-15
Also published as: CN110188207A

Abstract

The embodiments of the application use partially labeled data sets, namely the structured and semi-structured data of an encyclopedia, and perform linking by exploiting the characteristics of those data sets, so that the accuracy and efficiency of the linking process are improved.

Description

Knowledge graph construction method and device, readable storage medium and electronic equipment

Technical Field

The invention relates to the technical field of computers, in particular to a knowledge graph construction method and device, a readable storage medium and electronic equipment.

Background

The construction of a knowledge graph generally comprises information extraction, knowledge fusion and knowledge processing. Information extraction comprises entity extraction, relationship extraction and attribute extraction; knowledge fusion comprises entity linking and knowledge merging; knowledge processing comprises ontology construction, knowledge reasoning, quality evaluation and knowledge updating. The major current tools for knowledge graph fusion include Falcon-AO, Dedupe, Limes and Silk.

In traditional linking methods, the data used for information extraction mainly takes three forms: structured data, semi-structured data and unstructured data. Structured data has a uniform structure and can be converted directly into triples, but it is small in scale and is generally used only in specific fields. Unstructured data can be used only after it has been converted into triples through entity extraction, relationship extraction and attribute extraction using methods such as statistical learning or machine learning, but the accuracy currently achieved is low and cannot meet commercial requirements. Semi-structured data balances scale and accuracy, and can be converted into structured data, and then into triples, through preprocessing and standardization. Meanwhile, in the linking process of knowledge fusion, traditional linking methods involve complicated procedures and are error-prone, so their accuracy and efficiency are low.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method and an apparatus for constructing a knowledge graph, a readable storage medium, and an electronic device, which aim to improve accuracy and efficiency of a linking process.

In a first aspect, an embodiment of the present invention discloses a method for constructing a knowledge graph, where the knowledge graph is used to represent entity information of entities and relationships between different entities, and the method includes:

creating an information schema according to crawled data information, wherein the information schema comprises concepts and attributes corresponding to the concepts;

performing entity classification on the data information according to the information schema, and determining entity information of each entity and an entity identifier for identifying the entity, wherein the entity information comprises entity attributes and attribute values, the entity attributes comprise data attributes and object attributes, and the attribute value of an object attribute points to another entity;

unifying the entity attribute names of different entities according to the names of the attributes in the information schema;

determining an identification link or a non-identification link for each entity;

comparing entities with the same link prefix, merging the entity attributes of the entity with the non-identification link into the entity attributes of the entity with the identification link, and merging identical entity attributes within each entity, wherein the link prefix is the part of a link that does not include the identifier;

comparing entities with the same name, merging the entity information of different entities into the same entity in response to the number of matched entity attributes being greater than a threshold, and merging identical entity attributes within each entity;

and outputting the entity information of the entities and the relation among different entities.

Further, the determining an identification link or a non-identification link for each entity includes:

splitting the object attributes of the entity so that each object attribute corresponds to one attribute value;

determining an identification link or a non-identification link for each entity and for each object attribute value of the entity.

Further, the determining an identification link or a non-identification link for each entity further includes:

determining a non-identification link in response to no identification link corresponding to the entity existing in the data information;

selecting an identification link in response to an identification link corresponding to the entity existing in the data information.

Further, the method further comprises:

in response to the data type of a data attribute value being a string type, adding a language tag to the data attribute value;

and in response to the data type of the data attribute value being a numeric type, unifying units of the data attribute value.

Further, the incorporating the entity attribute of the non-identification link into the entity attribute of the identification link includes:

determining a mapping relationship between a target entity identifier and an entity identifier to be merged, and broadcasting the mapping relationship, through the Spark framework, to all hosts that store the data information;

and, on each host, filtering the entity identifiers stored on that host and performing the merge.

Further, the merging of entity information of different entities into the same entity includes:

In a second aspect, an embodiment of the present invention discloses a knowledge graph constructing apparatus, where the knowledge graph is used to represent entity information of entities and relationships between different entities, and the apparatus includes:

a schema construction module, configured to create an information schema according to crawled data information, wherein the information schema comprises concepts and attributes corresponding to the concepts;

an entity classification module, configured to perform entity classification on the data information according to the information schema, and to determine entity information of each entity and an entity identifier for identifying the entity, wherein the entity information comprises entity attributes and attribute values, the entity attributes comprise data attributes and object attributes, and the attribute value of an object attribute points to another entity;

a standardization module, configured to unify the entity attribute names of different entities according to the names of the attributes in the information schema;

a link determination module, configured to determine an identification link or a non-identification link for each entity;

a first entity merging module, configured to compare entities with the same name, merge the entity attributes of the entity with the non-identification link into the entity attributes of the entity with the identification link, and merge identical entity attributes within each entity;

a second entity merging module, configured to compare entities with the same name, merge the entity information of different entities into the same entity in response to the number of matched entity attributes being greater than a threshold, and merge identical entity attributes within each entity;

and an information transmission module, configured to output the entity information of the entities and the relationships between different entities.

In a third aspect, an embodiment of the invention discloses a computer-readable storage medium storing computer program instructions which, when executed by a processor, implement the method described in any one of the above.

In a fourth aspect, an embodiment of the present invention discloses an electronic device, including a memory and a processor, the memory being configured to store one or more computer program instructions, wherein the one or more computer program instructions are executed by the processor to implement the method according to any one of the above.

The embodiments of the invention use partially labeled data sets, namely the structured and semi-structured data of an encyclopedia, and perform linking by exploiting the characteristics of those data sets, thereby improving the accuracy of the linking process.

Drawings

The above and other objects, features and advantages of the present invention will become more apparent from the following description of the embodiments of the present invention with reference to the accompanying drawings, in which:

FIG. 1 is a flow chart of a method of knowledge-graph construction according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the information of two entities in an alternative implementation of the embodiment of the present invention;

FIG. 3 is a flowchart of an alternative implementation of an embodiment of the invention for determining an identified link or a non-identified link for an entity;

FIG. 4 is a diagram illustrating an alternative implementation of determining an identified link or a non-identified link for an entity according to an embodiment of the invention;

FIG. 5 is a schematic diagram of a knowledge graph building apparatus according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of an electronic device according to an embodiment of the invention.

Detailed Description

The present invention is described below based on examples, but it is not limited to these examples. In the following detailed description, certain specific details are set forth; it will be apparent to one skilled in the art that the present invention may be practiced without these specific details. Well-known methods, processes, and procedures have not been described in detail so as not to obscure the present invention.

Further, those of ordinary skill in the art will appreciate that the drawings provided herein are for illustrative purposes and are not necessarily drawn to scale.

Unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise", "comprising", and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, what is meant is "including, but not limited to".

In the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified.

Fig. 1 is a flowchart of a method for constructing a knowledge graph according to an embodiment of the present invention. As shown in Fig. 1, the method includes:

Step S100: creating an information schema according to the crawled data information.

Specifically, the information schema includes concepts and attributes corresponding to the concepts. The data information may come from, for example, encyclopedia web pages, and its content may include, for example, a URL, a name, synonyms, a summary, a body, an Infobox, entry tags, web page links, and the like. The concepts may further include hierarchical (superordinate and subordinate) relationships between concepts; for example, when the superordinate concept is "person", the subordinate concepts may include singer, actor, scholar, professor, entrepreneur, and so on. An attribute may further include an attribute value range and an attribute unit. The attribute value range is used to judge whether an attribute value is reasonable; for example, when the attribute is a person's height, an attribute value of 30 m can be judged unreasonable. The attribute unit is set according to the attribute; for example, when the attribute is height, the attribute unit is cm or m, and when the attribute is body weight, the attribute unit is kg or g.

In creating the information schema, the data information is first analyzed, the frequency with which each attribute occurs in the data information is counted, and the attributes are sorted by frequency into an attribute list. The attributes in the attribute list are then screened according to a set proportion. For example, when the set proportion is 70% and the attribute list is sorted in descending order of frequency, the first 70% of the attributes in the list are kept. This selects the important attributes and prevents the inefficiency that too many attributes in the information schema would cause.
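The frequency-based screening described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `select_schema_attributes`, the sample records, and the choice to truncate (rather than round) the cut-off index are all assumptions.

```python
from collections import Counter

def select_schema_attributes(records, keep_ratio=0.7):
    """Count how often each attribute name occurs and keep the most frequent share."""
    counts = Counter(name for record in records for name in record)
    # most_common() sorts by descending frequency; ties keep first-seen order.
    ranked = [name for name, _ in counts.most_common()]
    keep = int(len(ranked) * keep_ratio)  # assumption: truncate the cut-off index
    return ranked[:keep]

# Hypothetical crawled records (attribute name -> raw value).
records = [
    {"name": "A", "height": "173 cm", "blood type": "O"},
    {"name": "B", "height": "180 cm"},
    {"name": "C", "nickname": "Cee"},
]
selected = select_schema_attributes(records, keep_ratio=0.7)
```

With four distinct attribute names and a proportion of 70%, this sketch keeps the two most frequent names, "name" and "height".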

Step S200: performing entity classification on the data information, and determining the entity information of each entity and an identifier for identifying the entity.

Specifically, entity classification is performed on the data information according to the concepts in the information schema and the attributes corresponding to the concepts, and the entity information of each entity and the identifier for identifying the entity are determined. For example, according to the concepts "product" and "mountain" in the information schema, "Himalaya" is separated into a "product" entity and a "mountain" entity. The entity information includes entity attributes; the entity attributes include data attributes and object attributes, and the attribute value of an object attribute points to another entity.

During entity classification, a classification model is first obtained through preprocessing. Specifically, the data information is analyzed and feature information is extracted from it, such as entity attribute names, entry tags, summaries and the like. The classification model is trained on the feature information; the training may use a neural network such as TextCNN or TextRNN. Data containing the feature information is then input into the classification model to obtain the entities and the entity information corresponding to each entity. The entity attributes of each entity are the attributes set in the information schema.

Step S300: unifying the entity attribute names of different entities according to the names of the attributes in the information schema.

Specifically, the same attribute may appear under different names in the data information, which can prevent records of the same entity from being merged. For example, for one and the same entity, "place of birth" may also be written as "originated from", "height" as "high", and "occupation" as "work". Although each pair has the same meaning, a direct comparison of the attribute names treats them as different, so entity attributes with the same meaning are never compared and two entities that should be merged are not merged. The entity attribute names of different entities therefore need to be unified according to the attribute names set in the information schema.
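One simple way to picture the name unification is an alias table that maps each variant name to the name set in the information schema. The sketch below is illustrative only; the alias table `ALIASES` and the function `unify_attribute_names` are hypothetical, and the sample entities reuse the attribute names from the example of Fig. 2.

```python
# Hypothetical alias table: variant attribute name -> name set in the schema.
ALIASES = {
    "originated from": "place of birth",
    "high": "height",
    "occupation": "work",
}

def unify_attribute_names(entity, aliases=ALIASES):
    """Return a copy of the entity with variant attribute names replaced."""
    return {aliases.get(name, name): value for name, value in entity.items()}

entity_123 = {"place of birth": "Taiwan", "height": "173 cm",
              "blood type": "O", "occupation": "singer"}
entity_124 = {"originated from": "Taiwan", "high": "173 cm",
              "blood type": "O", "work": "singer"}

unified_123 = unify_attribute_names(entity_123)
unified_124 = unify_attribute_names(entity_124)
```

After unification both dictionaries carry the same attribute names, so a direct name-by-name comparison becomes meaningful.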

Step S400: an identifying link or non-identifying link is determined for each entity.

Specifically, the attribute values of the object attributes in the entity information of the entity point to another entity, and thus the process of determining an identification link or non-identification link for each entity includes determining an identification link or non-identification link for each object attribute value of an entity. Wherein the identification link is a link including an identification for identifying the entity, and is selected from the crawled data information. The non-identity link is a link that does not include an identity for identifying the entity.

Further, while an identification link or a non-identification link is being determined for each entity, the data attribute values are also normalized. When the data type of a data attribute value is a string type, a language tag is added to the value; for example, if the attribute value is "celebrity A", the language tag @zh is added. When the data type of a data attribute value is a numeric type, the unit of the value is unified; for example, if the attribute value is 173 cm and the unit set for the attribute is m, the output is 1.73 m.
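The two normalization rules can be sketched as below. This is only an illustration under stated assumptions: the text does not say how a value is recognized as numeric, so the regular expression, the conversion table `TO_METRES`, and the function name `normalize_data_value` are hypothetical, and only length values are handled.

```python
import re

# Hypothetical conversion factors into the unit set for the attribute (metres).
TO_METRES = {"cm": 0.01, "m": 1.0}

def normalize_data_value(value, language="zh"):
    """Convert length values to metres; tag any other string with a language."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(cm|m)\s*", value)
    if match:  # numeric value with a known unit
        number, unit = float(match.group(1)), match.group(2)
        return f"{round(number * TO_METRES[unit], 4):g} m"
    return f"{value}@{language}"  # string value: append a language tag

height = normalize_data_value("173 cm")
name = normalize_data_value("celebrity A")
```

Here `height` becomes "1.73 m" and `name` becomes "celebrity A@zh".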

Step S500: comparing the entities with the same link prefix, merging the entity attributes of the non-identification link into the entity attributes of the identification link, and combining the entity attributes of each entity which are the same.

Specifically, the links of the entities include identification links and non-identification links, that is, entities with the same name include entities with identification links and entities with non-identification links. For example: the entity named Zhang III includes the links: an entity of "/item/third/1234", an entity of "/item/third/6789", and an entity linked as "/item/third/". The link prefix is a portion of the link that does not include an identification, such as the "/item/sheet three/" portion of the identification link "/item/sheet three/1234" and the entire contents of the non-identification link "/item/sheet three/". In the process of merging the entity attributes, the entity attributes of the entities which are linked as non-identification links are split, the split entity attributes are distinguished and are respectively merged into the entities which are linked as identification links with different links. Finally, the same entity attributes within each entity are merged, for example, when the entity linked to "/item/three/1234" includes the entity attribute "age: 23 years old ", the entity attributes incorporated by the entity whose link is a non-identifying link also include" age: age 23 ", the entity attribute" age "is merged.
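A rough sketch of prefix-based merging follows. It makes several assumptions that are not in the text: identifiers are taken to be a trailing numeric path segment, entities are modelled as sets of (attribute, value) pairs, and only the unambiguous case of a single identification-link entity per prefix is handled, because the text does not specify how split attributes are assigned when several identification links share a prefix. The names `link_prefix` and `merge_by_prefix` and the sample links are hypothetical.

```python
from collections import defaultdict

def link_prefix(link):
    """Strip a trailing numeric identifier: '/item/X/1234' -> '/item/X/'."""
    head, _, tail = link.rstrip("/").rpartition("/")
    return f"{head}/" if tail.isdigit() else link

def merge_by_prefix(entities):
    """entities: dict mapping link -> set of (attribute, value) pairs."""
    groups = defaultdict(list)
    for link in entities:
        groups[link_prefix(link)].append(link)
    merged = {}
    for prefix, links in groups.items():
        identified = [link for link in links if link != prefix]
        if prefix in entities and len(identified) == 1:
            # Unambiguous case only: fold the non-identification entity into
            # the single identification entity; set union drops duplicates.
            merged[identified[0]] = entities[identified[0]] | entities[prefix]
        else:
            for link in links:
                merged[link] = set(entities[link])
    return merged

entities = {
    "/item/celebrity A/1234": {("age", "23")},
    "/item/celebrity A/": {("age", "23"), ("occupation", "teacher")},
}
merged = merge_by_prefix(entities)
```

Because the attributes are held in a set, the duplicate ("age", "23") pair collapses automatically, which mirrors the final step of merging identical attributes within an entity.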

Further, when the merging of the entity attributes of a non-identification link into the entity attributes of an identification link is carried out across different hosts, the mapping relationship between the target entity identifier and the entity identifier to be merged is determined and broadcast, through the Spark framework, to all hosts that store the data information. Here an entity identifier is the entity's link: the target entity identifier is the entity identifier of the entity with the non-identification link, and the entity identifier to be merged is the entity identifier of the entity with the identification link. Each host then filters the entity identifiers it stores and performs the merge. In more detail, the mapping table of the entity identifiers of the entities to be merged is first collected at the driver, and is then broadcast using the broadcast mechanism. After receiving the mapping table, each host checks its own entity identifiers and performs a merge only for those that appear in the mapping table. Because the mapping table stores only entity identifiers and no attribute or other information, its size is greatly reduced and broadcast efficiency is greatly improved.
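The broadcast-and-filter idea can be simulated without a cluster. The sketch below uses plain Python lists to stand in for hosts; it is not Spark code, and the names `merge_partition`, `id_mapping` and the sample identifiers are hypothetical. In an actual Spark job the small mapping would typically be shipped with a broadcast variable and the per-host results combined with a by-key aggregation.

```python
def merge_partition(partition, id_mapping):
    """Run on one host: rewrite only locally stored IDs that occur in the mapping."""
    merged = {}
    for entity_id, attributes in partition.items():
        target = id_mapping.get(entity_id, entity_id)
        merged.setdefault(target, set()).update(attributes)
    return merged

# The mapping carries identifiers only (no attributes), so it stays small
# enough to send to every host.
id_mapping = {"e2": "e1"}

# Two simulated hosts, each storing part of the entity data.
hosts = [
    {"e1": {("name", "A")}, "e3": {("name", "C")}},
    {"e2": {("height", "1.73 m")}},
]
results = [merge_partition(partition, id_mapping) for partition in hosts]

# Combine the per-host results by entity identifier.
combined = {}
for result in results:
    for entity_id, attributes in result.items():
        combined.setdefault(entity_id, set()).update(attributes)
```

Only the second host holds an identifier that appears in the mapping, so only that host rewrites anything; the first host passes its data through unchanged.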

Step S600: comparing entities with the same name, merging the entity information of different entities into the same entity in response to the number of matched entity attributes being greater than a threshold, and merging identical entity attributes within each entity.

Specifically, all entities are grouped by name. The name here is a generalized name and may be, for example, a nickname, a title, or any of a variety of name-related attributes. The entities in a group are then compared in sequence: starting from the second entity, each entity is compared with the previous entity, and for each entity attribute a score of +1 is recorded if the attribute matches and 0 if it differs. When the number of matched entity attributes is greater than the threshold, the two entities are considered to be the same entity and their entity information is merged into one entity. For example, the rule may be that two entities match when the matched entity attributes number more than half of the intersection of their entity attributes. As a concrete case, given two singer entities with the same name, one recorded as having sung song A and the other recorded with 100 songs, one of which is song A, the entity attribute "composition" is still judged to match.

Further, when judging whether entity attributes match, a data attribute is judged to match if the attribute values are equal. An object attribute is judged by the name of the entity that the attribute value points to. For example, if the spouses of two same-named person entities have the same name, the two person entities are judged to be the same person on that attribute, regardless of whether the two spouse entities are themselves the same person.
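The matching rule can be sketched as follows, assuming the "more than half of the shared attributes" threshold mentioned as an example and assuming that a multi-valued attribute matches when the two value sets overlap (as in the song example). The function names and sample entities are hypothetical; object attribute values are represented by the name of the entity they point to.

```python
def attributes_match(a_values, b_values):
    """Multi-valued attributes match if the two value lists share any value."""
    return bool(set(a_values) & set(b_values))

def same_entity(a, b):
    """a, b: dict mapping attribute name -> list of values."""
    shared = set(a) & set(b)
    matched = sum(attributes_match(a[key], b[key]) for key in shared)
    # Assumed threshold: more than half of the shared attributes must match.
    return bool(shared) and matched > len(shared) / 2

singer_1 = {"name": ["singer X"], "composition": ["song A"], "spouse": ["Y"]}
singer_2 = {"name": ["singer X"], "composition": ["song A", "song B"],
            "spouse": ["Y"], "height": ["1.73 m"]}
singer_3 = {"name": ["singer X"], "composition": ["song Z"], "spouse": ["W"]}

first_pair = same_entity(singer_1, singer_2)
second_pair = same_entity(singer_1, singer_3)
```

The first pair shares three attributes and matches on all of them, so it is treated as one entity; the second pair matches only on the name and is kept apart.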

Further, when the merging of the entity information of different entities into the same entity is carried out across different hosts, the mapping relationship between the target entity identifier and the entity identifier to be merged is determined and broadcast, through the Spark framework, to all hosts that store the data information. Each host then filters the entity identifiers it stores and performs the merge. In more detail, the mapping table between the entity identifiers of the entities to be merged and the entity identifiers of the target entities is first collected at the driver, and is then broadcast using the broadcast mechanism. After receiving the mapping table, each host checks the entity identifiers of its locally stored entities and performs a merge only for those that appear in the mapping table. Because the mapping table stores only entity identifiers and no attribute or other information, its size is greatly reduced and broadcast efficiency is greatly improved.

Step S700: outputting the entity information of the entities and the relationships between different entities.

Specifically, the entity and the entity information after the merging processing, that is, the triple composed of the entity identifier, the entity attribute, and the attribute value, are output. The attribute values of the object attributes of an entity point to another entity, linking different entities. Therefore, the relation between the entity and different entities is output while the triples consisting of the entity, the entity attribute and the attribute value are output.
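As an illustration of the output form, the sketch below flattens a merged entity into (entity identifier, entity attribute, attribute value) triples and then picks out the triples whose value is another entity's identifier. The function names, the sample links and the representation of identifiers as link strings are assumptions made for the example.

```python
def to_triples(entity_id, attributes):
    """Flatten one merged entity into (subject, attribute, value) triples."""
    return [(entity_id, name, value)
            for name, values in attributes.items()
            for value in values]

def relations(triples, known_ids):
    """Triples whose value is another entity's identifier link two entities."""
    return [triple for triple in triples if triple[2] in known_ids]

known_ids = {"/item/celebrity A/1", "/item/celebrity B/1234"}
triples = to_triples(
    "/item/celebrity A/1",
    {"name": ["celebrity A@zh"], "wife": ["/item/celebrity B/1234"]},
)
links = relations(triples, known_ids)
```

In this example the "wife" triple is the only one whose value is itself an entity identifier, so it is the only relationship reported between entities.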

The method performs linking by directly using the regularities of the data in the encyclopedia data source, which improves the accuracy of the linking process. Meanwhile, the knowledge fusion process is based on the Spark framework and is executed in a distributed manner, so it can run concurrently on multiple general-purpose hosts, which reduces the cost of purchasing high-performance hosts and improves execution efficiency.

Fig. 2 is a schematic diagram of the information of two entities in an alternative implementation of the embodiment of the present invention. As shown in Fig. 2, the "name" of both entities is "celebrity A". The entity identified as "123" further contains the attributes {"place of birth": "Taiwan"; "height": "173 cm"; "blood type": "O"; "occupation": "singer"}, and the entity identified as "124" further contains the attributes {"originated from": "Taiwan"; "high": "173 cm"; "blood type": "O"; "work": "singer"}. The entity attribute "blood type": "O" is identical in both, whereas for the pairs "place of birth" and "originated from", "height" and "high", and "occupation" and "work" the attribute values are the same but the entity attribute names differ. These names differ only literally; although they have the same meaning, a direct comparison of entity attribute names cannot pair up attributes with the same meaning, so two entities that should be merged cannot be merged. The entity attribute names of different entities therefore need to be unified according to the attribute names set in the information schema. For example, when the set attribute names are "place of birth", "height", "blood type" and "work", then after unification the entity identified as "123" contains the attributes {"place of birth": "Taiwan"; "height": "173 cm"; "blood type": "O"; "work": "singer"}, and the entity identified as "124" likewise contains the attributes {"place of birth": "Taiwan"; "height": "173 cm"; "blood type": "O"; "work": "singer"}. By comparing the entity attribute names and the corresponding attribute values, the entity identified as "123" and the entity identified as "124" can be judged to be the same entity and can be merged directly.

Fig. 3 is a flowchart of determining an identification link or a non-identification link for an entity according to an alternative implementation of the embodiment of the present invention. As shown in Fig. 3, determining an identification link or a non-identification link for an entity includes:

Step S410: splitting the object attributes of the entity so that each object attribute corresponds to one attribute value.

Specifically, the attribute values of the object attributes of the entity are split. For example, if the works of celebrity A include "work A" and "work B", the result after splitting is {"work": "work A"; "work": "work B"}.
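The splitting step can be illustrated with a short sketch. It assumes that multiple values are stored in one comma-separated string, which the text does not state; the function name `split_object_attributes` and the sample entity are hypothetical. Pairs are returned as a list because the same attribute name may occur more than once after splitting.

```python
def split_object_attributes(entity, object_attributes, separator=","):
    """Emit one (attribute, value) pair per value of each object attribute."""
    pairs = []
    for name, value in entity.items():
        if name in object_attributes:
            parts = (part.strip() for part in value.split(separator))
            pairs.extend((name, part) for part in parts if part)
        else:
            pairs.append((name, value))  # data attributes pass through unchanged
    return pairs

entity = {"name": "celebrity A", "height": "173 cm",
          "wife": "celebrity B", "work": "work A, work B, work C"}
pairs = split_object_attributes(entity, object_attributes={"wife", "work"})
```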

Step S420: a non-identifying link is determined.

Specifically, the step S420 determines a non-identification link in response to that no identification link corresponding to the entity exists in the data information. Further, in response to the link corresponding to the entity being contained in the Infobox or in another location in the data information and not containing the identifier, determining the link as a non-identifier link corresponding to the entity; and in response to the data information not having a link corresponding to the entity, creating a virtual link corresponding to the entity, the virtual link not including an identifier.

Step S430: an identification link is selected.

Specifically, the step S430 selects an identification link in response to the existence of an identification link corresponding to the entity in the data information. Further, in response to the Infobox or other location in the data information containing a link corresponding to the entity, selecting a link containing an identifier from the links, and determining the link containing the identifier as the identifier link corresponding to the entity.
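Steps S420 and S430 together amount to a three-way choice, which the following sketch tries to capture. It assumes that an identification link ends in a numeric identifier segment and that virtual links are built from a base path plus the entity name; both conventions, the base URL, and the function names are hypothetical.

```python
import re

def has_identifier(link):
    """Assumed convention: an identification link ends in a numeric ID segment."""
    return re.search(r"/\d+/?$", link) is not None

def choose_link(name, candidate_links, base="http://baike/item/"):
    """Return (link, is_identification_link) for one entity."""
    identified = [link for link in candidate_links if has_identifier(link)]
    if identified:
        return identified[0], True          # S430: select an identification link
    if candidate_links:
        return candidate_links[0], False    # S420: crawled link without identifier
    return f"{base}{name}/", False          # S420: no link crawled, create a virtual link

with_id = choose_link("work A", ["http://baike/item/work A/567"])
without_id = choose_link("work C", ["http://baike/item/work C/"])
virtual = choose_link("work D", [])
```

The sketch simply takes the first identification link it finds; how to choose among several candidate identification links is not specified in the text and is left out here.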

The method can determine the corresponding link for each entity, facilitates the entities to directly utilize the data rule in the encyclopedia data source for linking, and is high in accuracy.

It should be understood that other existing algorithms may be used by those skilled in the art to implement the preprocessing steps described above.

Fig. 4 is a schematic diagram of determining an identification link or a non-identification link for an entity according to an alternative implementation of the embodiment of the present invention. As shown in Fig. 4, the initial triples include entity attributes and entity attribute values: {"name": "celebrity A"; "height": "173 cm"; "wife": "celebrity B"; "work": "work A, work B, work C"}. Here "name" and "height" are data attributes, while "wife" and "work" are object attributes, each of whose attribute values points to an entity.

The attribute values of the object attributes in the initial triples are split, and the entity attributes in the resulting split triples are {"name": "celebrity A"; "height": "173 cm"; "wife": "celebrity B"; "work": "work A"; "work": "work B"; "work": "work C"}, so that each object attribute corresponds to only one object attribute value.

The split triples are then processed, and an identification link or a non-identification link is determined for each entity, that is, for the attribute value of each object attribute. At the same time, a language tag is added to each data attribute value whose data type is a string type, and the unit of each data attribute value whose data type is a numeric type is unified. As shown in the processed triples in Fig. 4, the attribute value of the data attribute "name" is "celebrity A", whose data type is a string type, so the language tag @zh is added to it. The attribute value of the data attribute "height" is "173 cm", whose data type is a numeric type, so its unit is unified according to the set attribute unit; for example, when the set unit is m, the data attribute value "173 cm" is converted into "1.73 m". "Wife" and "work" are object attributes, so a link is determined for the attribute value of each of them. For example, the identification link "http://baike/item/celebrity B/1234" is selected for "celebrity B", the identification link "http://baike/item/work A/567" for "work A", and the identification link "http://baike/item/work B/557" for "work B", while the non-identification link "http://baike/item/work C/" is determined for "work C".

Fig. 5 is a schematic diagram of a knowledge graph construction apparatus according to an embodiment of the present invention. As shown in Fig. 5, the knowledge graph construction apparatus includes a schema construction module 51, an entity classification module 52, a standardization module 53, a link determination module 54, a first entity merging module 55, a second entity merging module 56, and an information transmission module 57.

Specifically, the schema construction module 51 is configured to create an information schema according to the crawled data information, where the information schema includes concepts and attributes corresponding to the concepts. The entity classification module 52 is configured to perform entity classification on the data information according to the information schema, and to determine the entity information of each entity and an entity identifier for identifying the entity, where the entity information includes entity attributes and attribute values, the entity attributes include data attributes and object attributes, and the attribute value of an object attribute points to another entity. The standardization module 53 is configured to unify the entity attribute names of different entities according to the names of the attributes in the information schema. The link determination module 54 is configured to determine an identification link or a non-identification link for each entity. The first entity merging module 55 is configured to compare entities with the same name, merge the entity attributes of the entity with the non-identification link into the entity attributes of the entity with the identification link, and merge identical entity attributes within each entity. The second entity merging module 56 is configured to compare entities with the same name, merge the entity information of different entities into the same entity in response to the number of matched entity attributes being greater than the threshold, and merge identical entity attributes within each entity. The information transmission module 57 is configured to output the entity information of the entities and the relationships between different entities.

The apparatus performs linking by directly exploiting the data regularities of the encyclopedia data source, which improves the accuracy and efficiency of the linking process.
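The name-based merge attributed to the second entity merging module 56 can be sketched in a similar way. The Python example below treats "matched entity attributes exceeding a threshold" as a count of attributes whose name and value both agree; that interpretation, the function names, the tuple representation of an entity, and the default threshold are assumptions made for illustration only.

```python
def count_matches(attrs_a, attrs_b):
    """Count attributes that have the same name and the same value in both dicts."""
    return sum(1 for key, value in attrs_a.items()
               if key in attrs_b and attrs_b[key] == value)


def merge_by_name(entities, threshold=2):
    """Merge same-named entities whose matched-attribute count exceeds the threshold.

    `entities` is a list of (name, attributes) pairs; the result has the same shape.
    """
    merged = []
    for name, attrs in entities:
        for existing_name, existing_attrs in merged:
            if existing_name == name and count_matches(attrs, existing_attrs) > threshold:
                # Fold new attributes in; attributes present on both are kept once.
                for key, value in attrs.items():
                    existing_attrs.setdefault(key, value)
                break
        else:
            # No sufficiently similar entity found: keep it as a separate entity.
            merged.append((name, dict(attrs)))
    return merged
```

Two records that share a name but agree on too few attributes remain separate entities, so namesakes are not collapsed; the threshold controls how much agreement is required before a merge.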

Fig. 6 is a schematic diagram of an electronic device according to an embodiment of the present invention. In this embodiment, the electronic device may be a server, a terminal, or the like. As shown in Fig. 6, the electronic device includes: at least one processor 62; a memory 61 communicatively coupled to the at least one processor 62; and a communication component 63 communicatively coupled to the storage medium, the communication component 63 receiving and transmitting data under the control of the processor 62. The memory 61 stores instructions executable by the at least one processor 62, and the instructions are executed by the at least one processor 62 to implement the knowledge graph construction method of the above embodiments.

In particular, the memory 61, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The processor 62 performs the various functional applications and data processing of the device by running the non-volatile software programs, instructions, and modules stored in the memory 61, thereby implementing the above knowledge graph construction method.

The memory 61 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store a list of options and the like. Further, the memory 61 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 61 may optionally include memory located remotely from the processor, which may be connected to an external device via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

One or more modules are stored in the memory 61 and, when executed by the one or more processors 62, perform the knowledge graph construction method of any of the method embodiments described above.

The above product can execute the method provided by the embodiments of the present application, and has the functional modules and beneficial effects corresponding to that method. For technical details not described in this embodiment, refer to the method provided by the embodiments of the present application.

The present invention also relates to a computer-readable storage medium for storing a computer-readable program for causing a computer to perform some or all of the above-described method embodiments.

That is, as those skilled in the art will understand, all or part of the steps of the methods in the above embodiments may be implemented by a program instructing the related hardware. The program is stored in a storage medium and includes several instructions that enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made to the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for constructing a knowledge graph, wherein the knowledge graph is used for representing entity information of entities and relations among different entities, and the method comprises the following steps:

performing entity classification on data information according to the information schema, and determining entity information of each entity and an entity identifier for identifying the entity, wherein the entity information comprises entity attributes and attribute values, the entity attributes comprise data attributes and object attributes, and the attribute value of an object attribute points to another entity;

determining an identified link or a non-identified link for each entity;

comparing entities with the same link prefix, merging the entity attributes of the non-identified link into the entity attributes of the identified link, and merging the entity attributes that are the same for each entity, wherein the link prefix is the part of the link that does not include the identifier;

comparing entities with the same name, merging the entity information of different entities into the same entity in response to the matched entity attributes exceeding a threshold, and merging the entity attributes that are the same for each entity;

outputting entity information of the entities and the relation between different entities;

the determining an identified link or a non-identified link for each entity comprises:

2. The method of claim 1, wherein said determining an identified link or a non-identified link for each entity further comprises:

3. The method of claim 1, wherein said determining an identified link or a non-identified link for each entity further comprises:

4. The method of claim 1, wherein the method further comprises:

5. The method of claim 1, wherein said merging the entity attributes of the non-identified link into the entity attributes of the identified link comprises:

6. The method of claim 1, wherein said merging the entity information of different entities into the same entity comprises:

7. An apparatus for constructing a knowledge graph representing entity information of entities and relationships between different entities, the apparatus comprising:

a schema construction module, configured to construct an information schema according to crawled data information, wherein the information schema comprises concepts and attributes corresponding to the concepts;

an entity classification module, configured to perform entity classification on the data information according to the information schema, and to determine entity information of each entity and an entity identifier for identifying the entity, wherein the entity information comprises entity attributes and attribute values, the entity attributes comprise data attributes and object attributes, and the attribute value of an object attribute points to another entity;

a standardization module, configured to unify the entity attribute names of different entities according to the names of the attributes in the information schema;

a link determination module, configured to determine an identified link or a non-identified link for each entity;

a first entity merging module, configured to compare entities with the same link prefix, merge the entity attributes of the non-identified link into the entity attributes of the identified link, and merge the entity attributes that are the same for each entity, wherein the link prefix is the part of the link that does not include the identifier;

a second entity merging module, configured to compare entities with the same name, merge the entity information of different entities into the same entity in response to the matched entity attributes exceeding a threshold, and merge the entity attributes that are the same for each entity;

an information transmission module, configured to output the entity information of the entities and the relationships between different entities;

8. A computer-readable storage medium storing computer program instructions which, when executed by a processor, implement the method of any one of claims 1-6.

9. An electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer program instructions, wherein the one or more computer program instructions are executed by the processor to implement the method of any of claims 1-6.