CN112434811A

CN112434811A - Knowledge graph construction method and device, computing equipment and storage medium

Info

Publication number: CN112434811A
Application number: CN201911147385.3A
Authority: CN
Inventors: 郑毅; 袁晶; 卢栋才; 王喆锋; 怀宝兴; 彭朱炜; 王禹; 章涛; 王鹏
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Cloud Computing Technologies Co Ltd
Priority date: 2019-08-26
Filing date: 2019-11-21
Publication date: 2021-03-02

Abstract

The application discloses a knowledge graph construction method, which comprises the following steps: receiving an information extraction instruction, wherein the information extraction instruction indicates the information extraction policy to be used for extracting information from source data for constructing a knowledge graph; extracting information from the source data using the information extraction policy indicated by the information extraction instruction to obtain a plurality of multi-tuple data, wherein each multi-tuple data comprises information indicating the entity type, entity attributes and association relationships of an entity; and constructing a knowledge graph according to the plurality of multi-tuple data, wherein the knowledge graph records the entities included in the source data and the relationships among different entities. The application broadens the application range of the knowledge graph construction method and improves the flexibility of knowledge graph construction.

Description

Knowledge graph construction method and device, computing equipment and storage medium

The present application claims priority to Chinese patent application No. 201910792526.0, entitled "a method and apparatus for constructing a knowledge graph" and filed on August 26, 2019, which is incorporated herein by reference in its entirety.

Technical Field

The application relates to the technical field of cloud computing, in particular to a knowledge graph construction method and device, computing equipment and a storage medium.

Background

More and more enterprises have realized the importance of knowledge to their business and urgently need to organize their business knowledge systems in order to improve work efficiency and results. A knowledge graph (KG) is a form of knowledge organization and knowledge representation, and using knowledge graphs to represent knowledge systems has become a development trend.

In the related art, when a knowledge graph is constructed, a knowledge graph ontology model needs to be designed by combining domain knowledge in the domain to which a service belongs, then information extraction is performed on data related to the service to obtain information used for indicating an entity in the data, and then the extracted information is filled into the knowledge graph ontology to obtain the knowledge graph.

The construction process of a knowledge graph is usually implemented by a customized module, which is tailored to the requirements of the field to which the business belongs. However, because different fields have different requirements, a customized module is difficult to reuse for constructing knowledge graphs in other fields, so its applicability is poor.

Disclosure of Invention

The application provides a knowledge graph construction method and device, computing equipment and a storage medium, which can solve the problem of poor applicability of a knowledge graph construction method in the related art.

In a first aspect, the present application provides a method for constructing a knowledge graph, the method comprising: receiving an information extraction instruction, wherein the information extraction instruction indicates the information extraction policy to be used for extracting information from source data for constructing a knowledge graph; extracting information from the source data using the information extraction policy indicated by the information extraction instruction to obtain a plurality of multi-tuple data, wherein each multi-tuple data comprises information indicating the entity type, entity attributes and association relationships of an entity; and constructing a knowledge graph according to the plurality of multi-tuple data, wherein the knowledge graph records the entities included in the source data and the relationships among different entities.

According to the knowledge graph construction method, the information extraction policy to be used for extracting information from the source data for constructing the knowledge graph is determined by receiving the information extraction instruction, information is extracted from the source data using that policy to obtain a plurality of multi-tuple data, and the knowledge graph is then constructed according to the multi-tuple data.
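The flow described above can be illustrated with a minimal sketch. It assumes that the information extraction instruction is a small dictionary naming a policy, that the policies are simple delimiter-based parsers, and that the multi-tuple data are triples; the names POLICIES, make_delimited_extractor and build_graph are hypothetical and do not come from the application.

```python
from typing import Callable, Dict, List, Tuple

# A triple (subject, predicate, object) is the simplest form of multi-tuple data.
Triple = Tuple[str, str, str]
Extractor = Callable[[str], List[Triple]]


def make_delimited_extractor(sep: str) -> Extractor:
    """Hypothetical policy: each line reads 'subject<sep>predicate<sep>object'."""
    def extract(text: str) -> List[Triple]:
        triples = []
        for line in text.splitlines():
            parts = [p.strip() for p in line.split(sep)]
            if len(parts) == 3 and all(parts):
                triples.append((parts[0], parts[1], parts[2]))
        return triples
    return extract


# Registry of selectable policies; the names are illustrative assumptions.
POLICIES: Dict[str, Extractor] = {
    "pipe": make_delimited_extractor("|"),
    "comma": make_delimited_extractor(","),
}


def build_graph(source: str, instruction: Dict[str, str]) -> Dict[str, List[Tuple[str, str]]]:
    """Extract with the policy named in the instruction, then index triples by subject."""
    extractor = POLICIES[instruction["policy"]]
    graph: Dict[str, List[Tuple[str, str]]] = {}
    for subject, predicate, obj in extractor(source):
        graph.setdefault(subject, []).append((predicate, obj))
    return graph


graph = build_graph("Alice|works_at|Acme\nAcme|located_in|Paris", {"policy": "pipe"})
```

Because the policy is looked up from the instruction at run time, the same construction code can serve source data from different fields by switching the instruction alone.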

Optionally, before constructing the knowledge graph according to the plurality of multi-tuple data, the method may further comprise: acquiring a knowledge graph ontology model to be used when constructing the knowledge graph, wherein the knowledge graph ontology model defines a standardized description of the multi-tuple data in the knowledge graph; receiving a mapping policy instruction, wherein the mapping policy instruction indicates a mapping policy for performing association mapping on the plurality of multi-tuple data according to the standardized description; and performing association mapping on the plurality of multi-tuple data according to the standardized description and the mapping policy indicated by the mapping policy instruction, to obtain multi-tuple data described using the standardized description. Correspondingly, constructing the knowledge graph according to the plurality of multi-tuple data comprises: constructing the knowledge graph according to the multi-tuple data after standardized description.

The association mapping is also known as knowledge mapping. Knowledge mapping establishes a mapping relationship between extracted elements and ontology elements and, according to the mapping relationship, uses the ontology elements to describe the corresponding extracted elements in a standardized way. Knowledge mapping enables a unified representation of the multi-tuple data and improves the readability of the knowledge graph.

In one implementation of the mapping policy, a matching degree of each extracted element and each ontology element may be obtained. When the matching degree of a certain extraction element and an ontology element is greater than a matching degree threshold value, a mapping relation between the extraction element and the ontology element can be established, and the ontology element is indicated to be used for carrying out standardized description on the extraction element.
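A minimal sketch of such threshold-based mapping follows. The application does not specify how the matching degree is computed; the sketch assumes a string similarity from Python's difflib, a threshold of 0.8, and that only the best-matching ontology element is considered for each extracted element. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def matching_degree(extracted: str, ontology: str) -> float:
    """Assumed matching degree: case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, extracted.lower(), ontology.lower()).ratio()


def map_to_ontology(extracted: List[str], ontology: List[str],
                    threshold: float = 0.8) -> Dict[str, str]:
    """Map each extracted element to its best ontology element if above the threshold."""
    mapping: Dict[str, str] = {}
    for element in extracted:
        best = max(ontology, key=lambda o: matching_degree(element, o))
        if matching_degree(element, best) > threshold:
            mapping[element] = best  # 'best' becomes the standardized description
    return mapping


mapping = map_to_ontology(
    ["Birth_Place", "height", "xyz"],
    ["birthplace", "height", "nationality"],
)
```

In this sketch, extracted elements that match no ontology element closely enough (here "xyz") are simply left unmapped.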

In another implementation of the mapping policy, the user may configure the mapping policy through the terminal. Specifically, through the terminal the user can indicate the mapping relationship between the extracted elements in the multi-tuple data and the ontology elements of the standardized description defined by the knowledge graph ontology model, and indicate that each extracted element having a mapping relationship is to be described in a standardized way using the corresponding ontology element.

Because the mapping policy is configured by the user and the configured mapping policy is used to perform association mapping on the multi-tuple data, the knowledge graph constructing apparatus can use different mapping policies for different types of data, which improves the accuracy of the association mapping and thus the accuracy of knowledge graph construction.
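A user-configured mapping policy can be pictured as an explicit table from extracted elements to ontology elements. The sketch below assumes that the multi-tuple data are triples and that only the predicate is standardized; the table contents and the function name standardize are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mapping supplied by the user through the terminal:
# extracted element -> ontology element.
user_mapping: Dict[str, str] = {"born_in": "birthplace", "DOB": "birth_date"}


def standardize(triples: List[Triple], mapping: Dict[str, str]) -> List[Triple]:
    """Rewrite each predicate with its ontology element; unmapped predicates are kept."""
    return [(s, mapping.get(p, p), o) for s, p, o in triples]


standardized = standardize(
    [("Alice", "born_in", "Paris"), ("Alice", "DOB", "1990-01-01"), ("Alice", "height", "170")],
    user_mapping,
)
```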

Optionally, before constructing the knowledge graph according to the plurality of multi-tuple data, the method may further comprise: determining, among the plurality of multi-tuple data and according to a specified multi-tuple data matching policy, different multi-tuple data that include information indicating the same entity; and merging the different multi-tuple data that include information indicating the same entity. Correspondingly, constructing the knowledge graph according to the plurality of multi-tuple data comprises: constructing the knowledge graph according to the merged multi-tuple data.

When the knowledge graph is constructed according to a plurality of source data, the representation modes of information used for indicating the same entity may be different, and if the knowledge graph is constructed directly according to the extracted multi-element data, the same entity adopting different representation modes may be used as different entities, so that the constructed knowledge graph cannot accurately reflect the content embodied by the source data. By combining different multi-element group data including elements for indicating the same entity and constructing the knowledge graph according to the combined multi-element group data, the accuracy of the constructed knowledge graph can be improved.

In one implementation, before determining, among the plurality of multi-tuple data and according to the specified multi-tuple data matching policy, the different multi-tuple data that include information indicating the same entity, the method further includes: receiving a matching policy instruction, wherein the matching policy instruction indicates the matching algorithm and the matching degree threshold used to determine whether different multi-tuple data include information indicating the same entity. Correspondingly, determining the different multi-tuple data that include information indicating the same entity comprises: when it is determined, according to the matching algorithm indicated by the matching policy instruction, that the matching degree of the entity-indicating information in two multi-tuple data is not less than the matching degree threshold, determining that the two multi-tuple data include information indicating the same entity.
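The matching and merging steps can be sketched as follows. The sketch assumes that each multi-tuple data item is a record with a name and an attribute dictionary, that the matching algorithm is a difflib string similarity, and that the threshold is 0.85; all of these, along with the names similarity and merge_records, are illustrative assumptions rather than details from the application.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List


def similarity(a: str, b: str) -> float:
    """Assumed matching algorithm: case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def merge_records(records: List[Dict], algorithm: Callable[[str, str], float] = similarity,
                  threshold: float = 0.85) -> List[Dict]:
    """Merge records whose names match with a degree not less than the threshold."""
    merged: List[Dict] = []
    for rec in records:
        for existing in merged:
            if algorithm(rec["name"], existing["name"]) >= threshold:
                existing["attributes"].update(rec["attributes"])  # same entity: combine
                break
        else:
            merged.append({"name": rec["name"], "attributes": dict(rec["attributes"])})
    return merged


records = [
    {"name": "Huawei Technologies", "attributes": {"type": "company"}},
    {"name": "huawei technologies.", "attributes": {"country": "China"}},
    {"name": "Acme Corp", "attributes": {"type": "company"}},
]
merged = merge_records(records)
```

Passing the algorithm and the threshold as parameters mirrors the role of the matching policy instruction: a different field can supply a different comparison function without changing the merging code.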

The matching algorithm is selected through the matching policy instruction, and the selected matching algorithm is used to judge whether different multi-tuple data include elements indicating the same entity. Different matching algorithms can therefore be used for elements obtained from data in different fields, which improves the flexibility of knowledge mapping and the accuracy of the obtained matching degree, and thus the accuracy and comprehensiveness of knowledge graph construction.

Optionally, the source data includes multiple channels of data from different sources. That is, the knowledge graph construction method provided by the embodiments of the application can construct a knowledge graph from multiple channels of data. Correspondingly, extracting information from the source data using the information extraction policy indicated by the information extraction instruction to obtain a plurality of multi-tuple data may include: extracting information from each channel of data using the information extraction policy that the information extraction instruction indicates for that channel, to obtain a plurality of multi-tuple data corresponding to each channel of data. In this case, constructing the knowledge graph according to the plurality of multi-tuple data comprises: constructing the knowledge graph according to the multi-tuple data corresponding to the multiple channels of data. In this way, the efficiency of constructing a knowledge graph from multiple channels of data can be improved.
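As an illustration of per-channel extraction, the sketch below assumes two channels: a structured table of rows and a list of three-word sentences. Each channel is paired with its own extraction function. The channel names and both extractors are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def extract_from_rows(rows: List[Dict[str, str]]) -> List[Triple]:
    """Structured channel: every non-name column becomes an attribute triple."""
    return [(row["name"], key, value)
            for row in rows for key, value in row.items() if key != "name"]


def extract_from_sentences(sentences: List[str]) -> List[Triple]:
    """Toy text channel: keeps sentences of the form '<subject> <predicate> <object>'."""
    triples = []
    for sentence in sentences:
        parts = sentence.split()
        if len(parts) == 3:
            triples.append((parts[0], parts[1], parts[2]))
    return triples


# Each channel is paired with the policy chosen for it (assumed instruction content).
channels = {
    "crm_table": (extract_from_rows, [{"name": "Alice", "city": "Paris"}]),
    "notes": (extract_from_sentences, ["Alice manages Bob"]),
}

per_channel = {name: extractor(data) for name, (extractor, data) in channels.items()}
all_triples = [t for triples in per_channel.values() for t in triples]
```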

After constructing the knowledge graph according to the plurality of multi-tuple data, the method may further comprise: after determining that the source data has been updated, extracting information from the incremental data in the updated source data according to the policy indicated by the information extraction instruction, to obtain a plurality of multi-tuple data corresponding to the incremental data; and updating the knowledge graph according to the multi-tuple data corresponding to the incremental data.

By incrementally updating the knowledge graph, the amount of calculation in the process of constructing the knowledge graph according to the updated source data can be reduced, and the construction efficiency of the knowledge graph can be improved.
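A minimal sketch of incremental updating follows. It assumes that the source data is a list of comma-separated records, that the increment is detected by remembering which records have already been processed, and that the graph is a set of triples; the names extract and update_graph are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def extract(record: str) -> Triple:
    """Assumed policy: a record reads 'subject, predicate, object'."""
    subject, predicate, obj = (part.strip() for part in record.split(","))
    return (subject, predicate, obj)


def update_graph(graph: Set[Triple], processed: Set[str], source: List[str]) -> int:
    """Extract only from records not seen before; return the size of the increment."""
    increment = [record for record in source if record not in processed]
    for record in increment:
        graph.add(extract(record))
        processed.add(record)
    return len(increment)


graph: Set[Triple] = set()
processed: Set[str] = set()
source = ["Alice, knows, Bob"]
first = update_graph(graph, processed, source)    # initial construction
source.append("Bob, knows, Carol")                # the source data is updated
second = update_graph(graph, processed, source)   # only the new record is extracted
```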

In one implementation, extracting information from the source data using the information extraction policy indicated by the information extraction instruction may include: extracting information from the source data using the AI model indicated by the information extraction instruction. The AI model is a trained model whose training samples are labeled using the standardized description of multi-tuple data in the knowledge graph ontology model, and the knowledge graph ontology model defines the standardized description of the multi-tuple data in the knowledge graph.

Because the training samples of the AI model are labeled using the standardized description of multi-tuple data in the knowledge graph ontology model, the multi-tuple data extracted by the AI model trained on these labeled samples is already expressed using the ontology elements defined in the knowledge graph ontology model. The subsequent work of describing the extracted multi-tuple data in a standardized way according to the ontology elements can therefore be reduced, which simplifies the process of constructing the knowledge graph and improves its construction efficiency.

In a second aspect, the present application provides a knowledge graph constructing apparatus, comprising: a receiving module, configured to receive an information extraction instruction, wherein the information extraction instruction indicates the information extraction policy to be used for extracting information from source data for constructing a knowledge graph; an extraction module, configured to extract information from the source data using the information extraction policy indicated by the information extraction instruction to obtain a plurality of multi-tuple data, wherein each multi-tuple data comprises information indicating the entity type, entity attributes and association relationships of an entity; and a construction module, configured to construct a knowledge graph according to the plurality of multi-tuple data, wherein the knowledge graph records the entities included in the source data and the relationships among different entities.

Optionally, the apparatus further comprises: an acquisition module, configured to acquire a knowledge graph ontology model to be used when constructing the knowledge graph, wherein the knowledge graph ontology model defines a standardized description of the multi-tuple data in the knowledge graph. The receiving module is further configured to receive a mapping policy instruction, wherein the mapping policy instruction indicates a mapping policy for performing association mapping on the plurality of multi-tuple data according to the standardized description. The apparatus further comprises a mapping module, configured to perform association mapping on the plurality of multi-tuple data according to the standardized description and the mapping policy indicated by the mapping policy instruction, to obtain multi-tuple data described using the standardized description.

Correspondingly, the construction module is specifically configured to: construct a knowledge graph according to the multi-tuple data after standardized description.

Optionally, the apparatus further comprises: a determining module, configured to determine, among the plurality of multi-tuple data and according to a specified multi-tuple data matching policy, different multi-tuple data that include information indicating the same entity; and a merging module, configured to merge the different multi-tuple data that include information indicating the same entity.

Correspondingly, the construction module is specifically configured to: construct a knowledge graph according to the merged multi-tuple data.

Optionally, the receiving module is further configured to receive a matching policy instruction, wherein the matching policy instruction indicates the matching algorithm and the matching degree threshold used to determine whether different multi-tuple data include information indicating the same entity.

Correspondingly, the determining module is specifically configured to: and when the matching degree of the information indicating the entity in the two multi-element group data is determined to be not less than the threshold value of the matching degree according to the matching algorithm indicated by the matching strategy instruction, determining that the two multi-element group data comprise the information indicating the same entity.

The source data includes multiple channels of data from different sources. In this case, the extraction module is specifically configured to: extract information from each channel of data using the information extraction policy that the information extraction instruction indicates for that channel, to obtain a plurality of multi-tuple data corresponding to each channel of data.

Correspondingly, the construction module is specifically configured to: construct a knowledge graph according to the multi-tuple data corresponding to the multiple channels of data.

Optionally, the extraction module is further configured to: after it is determined that the source data has been updated, extract information from the incremental data in the updated source data according to the policy indicated by the information extraction instruction, to obtain a plurality of multi-tuple data corresponding to the incremental data.

Correspondingly, the construction module is further configured to update the knowledge graph according to the multi-tuple data corresponding to the incremental data.

Optionally, the extraction module is specifically configured to: extract information from the source data using the AI model indicated by the information extraction instruction. The AI model is a trained model whose training samples are labeled using the standardized description of multi-tuple data in the knowledge graph ontology model, and the knowledge graph ontology model defines the standardized description of the multi-tuple data in the knowledge graph.

In a third aspect, the present application provides a computing device comprising a processor and a memory; the memory has a computer program stored therein; when the processor executes the computer program, the computing device implements the method for constructing a knowledge graph provided by the first aspect.

In a fourth aspect, the present application provides a non-volatile storage medium, and when instructions in the storage medium are executed by a processor, the method for constructing a knowledge graph provided in the first aspect is implemented.

Drawings

FIG. 1 is a schematic deployment diagram of a knowledge graph building apparatus provided in an embodiment of the present application;

FIG. 2 is a schematic deployment diagram of another knowledge-graph building apparatus provided by an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a computing device according to an embodiment of the present application;

FIG. 4 is a flow chart of a method for knowledge graph construction provided by an embodiment of the present application;

FIG. 5 is a logic diagram for constructing a knowledge graph from two paths of data according to an embodiment of the present application;

FIG. 6 is a schematic diagram of an interface for selecting a knowledge-graph ontology model according to an embodiment of the present application;

FIG. 7 is a schematic diagram of a knowledge graph ontology model provided by an embodiment of the present application;

FIG. 8 is a schematic diagram of an interface for selecting source data according to an embodiment of the present disclosure;

FIG. 9 is a schematic diagram of an interface for selecting an information extraction policy according to an embodiment of the present application;

FIG. 10 is a schematic diagram of an interface for selecting a mapping policy according to an embodiment of the present application;

FIG. 11 is a schematic diagram of an interface for selecting a matching policy according to an embodiment of the present application;

FIG. 12 is a schematic illustration of a knowledge-graph provided by an embodiment of the present application;

FIG. 13 is a schematic structural diagram of a knowledge graph constructing apparatus provided in an embodiment of the present application;

FIG. 14 is a schematic structural diagram of a knowledge graph constructing apparatus according to an embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

In order to facilitate understanding of the method for constructing the knowledge graph provided by the embodiment of the application, the related knowledge of the knowledge graph is introduced first.

A knowledge graph is a semantic network that describes objective things in graph form. A knowledge graph consists of many nodes and the connections between different nodes. Nodes represent the entity types or entity attributes of entities such as people or organizations. Connections (also called edges) between nodes indicate that the entities represented by the nodes have some association relationship. An entity can thus be represented by its entity type, entity attributes and association relationships. The association relationship between a node representing the entity type of an entity and a node representing an entity attribute of that entity may include an attribution relationship between the entity type and the entity attribute. The association relationship between a node representing the entity type of one entity and a node representing the entity type of another entity may include an external connection between the two entities.

In the embodiments of the application, the knowledge graph can be applied to various application scenarios. For example, in an information recommendation system, information recommendations may be made based on a knowledge graph. Alternatively, in a text classification process, the classification may be based on a knowledge graph. Alternatively, in a semantic search process, the search may be based on a knowledge graph. Alternatively, in a fault analysis system, the cause of a fault can be determined according to the attributes of each entity and the association relationships between entities presented by the knowledge graph, so as to analyze the fault.

An entity is something that is distinguishable and exists independently. Such as a person, a city, a plant, or a commodity. An entity is the most basic element in a knowledge graph, and relationships that exist between different entities may differ, and entity attributes that different entities have may differ.

For example, in a knowledge graph representing basic information about an actor, nodes may represent entity types such as the actor's family members, friends, partners, representative works, brokerage firm, and graduation institution; nodes may also represent entity attributes, such as the name, height, and nationality, of the entities indicated by the respective entity types. An edge between a node representing an entity type and a node representing an entity attribute can represent the attribution relationship between the entity attribute and the entity type. An edge between the node representing the actor and a node representing a family member may represent a spousal relationship, a parent-daughter relationship, a parent-child relationship, or the like between the actor and the family member. An edge between the node representing the actor and a node representing a friend may represent a friendship between the actor and the friend. An edge between the node representing the actor and a node representing a partner may represent a partnership between the actor and the partner. An edge between the node representing the actor and a node representing a representative work may represent an affiliation between the actor and that work. An edge between the node representing the actor and the node representing the brokerage firm may represent a contractual relationship between the actor and the brokerage firm. An edge between the node representing the actor and the node representing the graduation institution may represent the relationship between the actor and the graduation institution.

In a knowledge graph, data may be organized as multi-tuple data. The multi-tuple data can include triple data, quadruple data, quintuple data, and the like. Triple data has two representation forms: "node-edge-node" and "node-attribute name-attribute value". The first term in a triple can be regarded as the subject, the second term as the predicate, and the third term as the object; the subject-predicate-object relationship of the triple is the relationship between its first term and its third term. For example, in the triple data "caocao-rename-apoca", which uses the representation form "node-attribute name-attribute value", the subject is "caocao", the predicate is "rename", and the object is "apoca"; the subject-predicate-object relationship is that the "rename" attribute of "caocao" is "apoca", that is, the relationship between the node representing "caocao" and the attribute value "apoca".
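The two representation forms of triple data can be sketched with a small data structure. The sketch assumes a flag that distinguishes "node-attribute name-attribute value" triples from "node-edge-node" triples and, as a simplification, stores attribute values as properties of the subject node rather than as separate nodes. The Triple class and the second example triple are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    is_attribute: bool = False  # True for "node-attribute name-attribute value"


triples = [
    Triple("caocao", "rename", "apoca", is_attribute=True),  # example from the text
    Triple("actor_A", "signed_with", "agency_B"),            # hypothetical "node-edge-node"
]

nodes: Dict[str, Dict[str, str]] = {}   # node name -> attribute name -> attribute value
edges: List[Tuple[str, str, str]] = []  # (source node, edge label, target node)

for t in triples:
    nodes.setdefault(t.subject, {})
    if t.is_attribute:
        nodes[t.subject][t.predicate] = t.obj  # attach the attribute to the node
    else:
        nodes.setdefault(t.obj, {})            # the object is itself a node
        edges.append((t.subject, t.predicate, t.obj))
```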

The embodiment of the application provides a knowledge graph construction method, which comprises the steps of determining an information extraction strategy adopted for extracting information of source data for constructing a knowledge graph by receiving an information extraction instruction, extracting the information of the source data by adopting the information extraction strategy to obtain a plurality of multi-element data, and constructing the knowledge graph according to the multi-element data.

The knowledge graph construction method provided by the embodiment of the application can be executed by a knowledge graph construction device. The knowledge graph constructing device can establish communication connection with the terminal through a wired network or a wireless network, so that the terminal can send instructions to the knowledge graph constructing device through the communication connection to control the knowledge graph constructing device to execute the knowledge graph constructing method according to the content indicated by the instructions. For example, the terminal may send an instruction to the knowledge graph constructing apparatus to instruct acquisition of source data for constructing the knowledge graph, and after receiving the instruction, the knowledge graph constructing apparatus may acquire the source data according to the instruction and execute the knowledge graph constructing method provided in the embodiment of the present application according to the source data. Or, the terminal may send an information extraction instruction to the knowledge graph constructing apparatus, and after receiving the information extraction instruction, the knowledge graph constructing apparatus may extract information from the source data by using an information extraction policy indicated by the information extraction instruction, and construct a knowledge graph according to a plurality of extracted tuple data.

The terminal can be a smart phone, a notebook computer, a tablet computer, a personal desktop computer, an intelligent camera and the like. And the terminal can be provided with a client, and a user can interact with the knowledge graph constructing device through the client. Alternatively, the user may interact with the knowledge-graph building apparatus through a web page in the terminal.

Fig. 1 is a deployment schematic diagram of a knowledge graph building apparatus provided in an embodiment of the present application, and as shown in fig. 1, the knowledge graph building apparatus 01 may be deployed in a cloud environment. The cloud environment is an entity which provides cloud services to users by using basic resources in a cloud computing mode. The cloud environment comprises a cloud data center and a cloud service platform, wherein the cloud data center comprises a large amount of basic resources owned by a cloud service provider. For example, a cloud data center includes computing, storage, and network resources, etc., and the computing resources may be a large number of computing devices (e.g., servers). Optionally, the knowledge graph constructing apparatus 01 may be deployed independently on a server or a virtual machine in the cloud data center, or the knowledge graph constructing apparatus 01 may be deployed in a distributed manner on a plurality of servers in the cloud data center, or the knowledge graph constructing apparatus 01 may be deployed in a distributed manner on a plurality of virtual machines in the cloud data center, or the knowledge graph constructing apparatus 01 may be deployed in a distributed manner on a server and a virtual machine in the cloud data center.

As shown in FIG. 1, the cloud service provider may abstract the knowledge graph constructing apparatus 01 on the cloud service platform into a cloud service for constructing knowledge graphs. After a user purchases the cloud service on the cloud service platform, the cloud environment can use the knowledge graph constructing apparatus 01 to provide the user with the cloud service of constructing a knowledge graph. Moreover, the user can upload source data for constructing the knowledge graph to the cloud environment from the terminal through an application program interface (API) or a web interface provided by the cloud service platform, so that the knowledge graph constructing apparatus 01 can construct the knowledge graph according to the source data. After the knowledge graph has been constructed, the knowledge graph constructing apparatus 01 may send the constructed knowledge graph to the terminal used by the user, or store the knowledge graph in the cloud environment, for example by presenting it on a web interface of the cloud service platform for the user to view.

In addition, the knowledge graph constructing apparatus 01 may be deployed in various ways. In another deployment, the knowledge graph constructing apparatus 01 may be logically divided into a plurality of parts, each with a different function; the parts may be deployed in a distributed manner in different environments and cooperate to implement the function of constructing a knowledge graph for the user. For example, as shown in FIG. 2, the parts may be deployed in any two or all three of the terminal computing device, the edge environment, and the cloud environment. The terminal computing device includes a terminal server, a smartphone, a notebook computer, a tablet computer, a personal desktop computer, a smart camera, and the like. An edge environment is an environment that includes a collection of edge computing devices located closer to the terminal computing device. The edge computing devices include edge servers, edge stations with computing capability, and the like.

It should be understood that the present application does not restrict which parts of the knowledge graph constructing apparatus 01 are deployed in which environment. In practice, the deployment may be adapted to the computing capability of the terminal computing device, the resource occupation of the edge environment and the cloud environment, or the specific application requirements.

In yet another deployment of the knowledge-graph constructing apparatus 01, when the knowledge-graph constructing apparatus 01 is a software apparatus, the knowledge-graph constructing apparatus 01 may be distributed by a service provider in the form of an application, and a user may download the application to a terminal used by the user and use the functions of the knowledge-graph constructing apparatus 01 in the terminal.

In yet another deployment, the knowledge graph constructing apparatus 01 can be deployed on a computing device in any environment. As shown in fig. 3, the computing device 100 may include a bus 101, a processor 102, a communication interface 103, and a memory 104. The processor 102, the memory 104, and the communication interface 103 communicate via the bus 101.

The processor 102 may be a hardware chip, which may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof. The processor 102 may also be a general-purpose processor, such as a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP.

The memory 104 may include a volatile memory, such as a random access memory (RAM). The memory 104 may also include a non-volatile memory (NVM), such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD). The memory 104 stores executable code for constructing a knowledge graph, and the processor 102 reads the executable code in the memory 104 to execute the knowledge graph construction method provided by the embodiments of the present application. The memory 104 may also include other software modules and data needed to run processes, such as an operating system. The operating system may be LINUX™, UNIX™, WINDOWS™, or the like.

Fig. 4 is a flowchart of a knowledge graph construction method according to an embodiment of the present application. The method can construct a knowledge graph from one path of data or from multiple paths of data. The following description takes as an example the case in which the knowledge graph is constructed from multiple paths of data and the method is executed by the knowledge graph constructing apparatus. For ease of understanding, the embodiment of the present application also provides a logical block diagram (fig. 5) for constructing a knowledge graph from two paths of data (source data 1 and source data 2). As shown in fig. 4 and fig. 5, the knowledge graph construction method includes the following steps:

Step 401, receiving a knowledge graph construction request.

When a user needs to adopt the knowledge graph construction device to construct a knowledge graph, a knowledge graph construction request can be sent to the knowledge graph construction device through the terminal so as to request the construction of the knowledge graph.

Step 402, receiving knowledge graph ontology model instructions.

The knowledge graph ontology model instruction is used to indicate the knowledge graph ontology model used to construct the knowledge graph. The ontology model (also called the ontology) is the skeleton and basis of the knowledge graph. The knowledge graph ontology model is a standardized description of the multi-element group data in a specific domain. That is, the knowledge graph ontology model specifies the standardized descriptions of the elements of the multi-element group data that the knowledge graph should include, such as standardized descriptions of entity types, of entity attributes, and of association relationships. Because the ontology model prescribes these standardized descriptions, constructing the knowledge graph according to the ontology model keeps useless information out of the knowledge graph and allows elements such as entity types, entity attributes, and association relationships to be described in a uniform way. For convenience of description, the elements of the multi-element group data obtained by information extraction are referred to as extraction elements, and the standardized descriptions of those elements are referred to as ontology elements.

The user can send a knowledge graph ontology model instruction to the knowledge graph constructing device through the terminal so as to indicate the knowledge graph ontology model required to be used when the knowledge graph is constructed. Moreover, the knowledge graph ontology model instruction may carry the knowledge graph ontology model. Or, the knowledge graph ontology model instruction may carry an identification number or a storage address of the knowledge graph ontology model, so that the knowledge graph constructing apparatus can obtain the corresponding knowledge graph ontology model according to the knowledge graph ontology model instruction.

The knowledge graph ontology model can be stored in the deployment environment of the knowledge graph constructing apparatus. The stored model can be a model constructed in the knowledge graph constructing apparatus, or a model constructed on the terminal and then stored in the deployment environment. In addition, to improve the flexibility of knowledge graph construction, the knowledge graph constructing apparatus can not only create a knowledge graph ontology model, but also modify and delete a created ontology model, and add, delete, and modify ontology elements in an ontology model.

In an implementation manner, a plurality of candidate knowledge graph ontology models can be prestored in a deployment environment of the knowledge graph building device, at this time, a user can select the knowledge graph ontology model in a setting interface of the knowledge graph building device through a terminal, and after the selection is completed, a specified operation can be executed in the setting interface to trigger sending of a knowledge graph ontology model instruction. For example, fig. 6 is a schematic diagram of a setting interface of a knowledge graph building apparatus provided in an embodiment of the present application, and as shown in fig. 6, a user may select a knowledge graph ontology model to be used when building a knowledge graph in the setting interface and click a "next" button to trigger sending of a knowledge graph ontology model instruction.

Step 403, acquiring, according to the knowledge graph ontology model instruction, the knowledge graph ontology model to be used for constructing the knowledge graph.

After receiving the knowledge graph ontology model instruction, the knowledge graph constructing apparatus can obtain the knowledge graph ontology model as indicated by the instruction. For example, when the instruction carries an identification number of the knowledge graph ontology model, the knowledge graph constructing apparatus may look up, in its deployment environment, the knowledge graph ontology model indicated by the identification number, and thereby obtain that model.

Illustratively, fig. 7 is a schematic diagram of the knowledge graph ontology model obtained according to the knowledge graph ontology model instruction of step 402. As shown in fig. 7, the knowledge graph ontology model defines the standardized descriptions of the entity types, entity attributes, and association relationships that the knowledge graph should include. The entity types to be included in the knowledge graph (shown as solid dots in fig. 7) are: person, song, movie, and the like. The entity attributes of a person (shown as open dots in fig. 7) include: name, birthday, nationality, height, and gender. The entity attributes of a song include: release date and name. The entity attributes of a movie include: release time and release country. The association relationships between persons include: spouse relationship, clan membership, and parental relationship. The association relationship between a person and a song includes: a singing relationship. The association relationship between a person and a movie includes: a director relationship. The association relationship between a movie and a song includes: a use relationship.
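As an illustration only, an ontology model like the one in fig. 7 could be held in memory as a small dictionary. The sketch below is not the format used by the application; the dictionary layout, the relation names, and the helper functions `is_valid_attribute` and `is_valid_relation` are assumptions introduced for the example, and only a subset of the relationships of fig. 7 is listed.

```python
# Hypothetical in-memory form of an ontology model similar to fig. 7.
# The layout and all identifiers are assumptions made for illustration.
ONTOLOGY = {
    # entity type -> standardized entity attributes
    "entity_types": {
        "person": ["name", "birthday", "nationality", "height", "gender"],
        "song": ["release date", "name"],
        "movie": ["release time", "release country"],
    },
    # (subject entity type, association relationship, object entity type)
    "relations": [
        ("person", "spouse", "person"),
        ("person", "sings", "song"),
        ("person", "directs", "movie"),
        ("movie", "uses", "song"),
    ],
}


def is_valid_attribute(entity_type, attribute, ontology=ONTOLOGY):
    """Return True if the ontology defines this attribute for the entity type."""
    return attribute in ontology["entity_types"].get(entity_type, [])


def is_valid_relation(subject_type, relation, object_type, ontology=ONTOLOGY):
    """Return True if the ontology allows this relationship between the two types."""
    return (subject_type, relation, object_type) in ontology["relations"]
```

Checks of this kind are one way extracted elements could be filtered so that only information described by the ontology reaches the knowledge graph.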

It should be noted that, in the process of constructing the knowledge graph, whether to perform step 402 may be determined according to business requirements. The knowledge graph constructing apparatus may be configured with a default knowledge graph ontology model. When step 402 is not executed, the knowledge graph constructing apparatus acquires the default ontology model in step 403 and constructs the knowledge graph using it. When step 402 is executed and the ontology model is selected according to the application requirements, different ontology models can be used for different domains, which improves how well the constructed knowledge graph fits the domain and thus improves the accuracy of knowledge graph construction.
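The three ways of obtaining an ontology model described for steps 402 and 403 (model carried in the instruction, model looked up by identification number or storage address, default model when no instruction is received) can be sketched as follows. This is a minimal sketch under stated assumptions: the instruction is modeled as a dictionary, and the key names `model`, `model_id`, and `storage_address`, the `DEFAULT_MODEL_ID` constant, and the `store` mapping are hypothetical.

```python
DEFAULT_MODEL_ID = "default"  # hypothetical key of the default ontology model


def resolve_ontology(instruction, store):
    """Return the ontology model to use for construction.

    instruction: dict describing the ontology model instruction, or None
                 when step 402 was skipped (assumed representation).
    store: dict mapping identification numbers or storage addresses to models,
           standing in for the deployment environment.
    """
    if instruction is None:
        # Step 402 not executed: fall back to the default configuration.
        return store[DEFAULT_MODEL_ID]
    if "model" in instruction:
        # The instruction carries the ontology model itself.
        return instruction["model"]
    # Otherwise the instruction carries an identification number or address.
    key = instruction.get("model_id") or instruction.get("storage_address")
    if key not in store:
        raise KeyError(f"no ontology model stored under {key!r}")
    return store[key]
```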

Step 404, receiving a source data indication instruction.

The terminal may send a source data indication instruction to the knowledge graph constructing apparatus. The source data indication instruction is used to indicate the source data for constructing the knowledge graph. In one implementation, the source data indication instruction carries the source data itself. In another implementation, the source data indication instruction carries a storage address of the source data, so as to notify the knowledge graph constructing apparatus to acquire the source data from the storage location indicated by the storage address.

For example, when the knowledge graph building device is deployed in a cloud environment, a user may store source data in a cloud data center in advance through a terminal, and send a source data indication instruction to the knowledge graph building device through the terminal, where the source data indication instruction carries a storage address of the source data in the cloud data center, so as to notify the knowledge graph building device to acquire the source data in the cloud data center according to the storage address.

In addition, the source data indicated by the source data indication instruction may be preprocessed data. The preprocessing may include converting the data into a data format that the knowledge graph constructing apparatus can use directly. For example, after the terminal stores the source data in the cloud data center, the cloud data center may convert the source data into the JSON data format, or into the comma-separated values (CSV) file format, or the like. The knowledge graph constructing apparatus can then use the preprocessed data directly after acquiring it, without converting it, which reduces the amount of data processing performed by the knowledge graph constructing apparatus when constructing the knowledge graph.

Optionally, the source data indication instruction may further carry information such as the data type of the source data, its encoding method, and the separator it uses, so as to notify the knowledge graph constructing apparatus of this information. Alternatively, the knowledge graph constructing apparatus may automatically recognize the data type, the encoding method, and the separator of the source data. This is not specifically limited in the embodiments of the present application.
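To make the role of these hints concrete, the following sketch parses raw source data using a data type, an encoding method, a separator, and a header-row flag. It is an assumption-laden illustration using only the Python standard library; the function name `parse_source` and its parameter names are hypothetical and do not come from the application.

```python
import csv
import io
import json


def parse_source(raw_bytes, data_type="csv", encoding="utf-8",
                 delimiter=",", has_header=True):
    """Decode and parse source data using hints such as those an
    indication instruction might carry (all parameter names assumed)."""
    text = raw_bytes.decode(encoding)
    if data_type == "json":
        return json.loads(text)
    if data_type == "csv":
        rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
        if not has_header or not rows:
            return rows  # list of lists, one per record
        header, body = rows[0], rows[1:]
        # With a header row, return one dict per record.
        return [dict(zip(header, row)) for row in body]
    raise ValueError(f"unsupported data type: {data_type!r}")
```

For instance, `parse_source(data, encoding="gbk", delimiter=";")` would read semicolon-separated records stored in GBK encoding.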

Further, whether this information is carried in the source data indication instruction can be selected in a setting interface of the knowledge graph constructing apparatus. After the selection is complete, a designated operation can be performed in the setting interface to trigger sending of the source data indication instruction carrying the corresponding information. For example, fig. 8 is a schematic diagram of a setting interface of a knowledge graph constructing apparatus provided in an embodiment of the present application. As shown in fig. 8, a user may select one or more paths of data required for constructing the knowledge graph in the setting interface, set a name for the source data, add a storage address for each path of data, fill in information such as the data type, the encoding method, and the separator of the source data, and select whether the source data has a header row. After completing the configuration, the user may click the "next" button in the setting interface to trigger sending of the source data indication instruction.

It should be noted that the embodiments of the present application do not limit the type or the origin of the source data used for constructing the knowledge graph. For example, the source data may be structured data such as tables, or unstructured data such as text. The source data may come from an encyclopedia, from Douban Movie, from entertainment news text, or from a database or document library within an enterprise. In addition, the embodiments of the present application do not limit the manner of obtaining the source data. For example, data from web pages may be obtained with a distributed crawler.

Step 405, acquiring multiple paths of data according to the source data indication instruction.

After receiving the source data indication instruction, the knowledge graph constructing apparatus may acquire the source data as indicated by the instruction. For example, when the source data indication instruction carries a storage address of the source data, the knowledge graph constructing apparatus acquires the source data from the storage location indicated by the storage address. Alternatively, when the source data indication instruction carries the source data, the knowledge graph constructing apparatus reads the source data directly from the instruction. For example, assume that two paths of data are obtained according to the source data indication instruction, both containing introductory information about a person referred to as Zhang Mou 1. Table 1 is one path of data, obtained by the knowledge graph constructing apparatus from a website according to the source data indication instruction, and table 2 is the other path of data, obtained from a database according to the source data indication instruction.

TABLE 1

TABLE 2

Name: Zhang Mou 1	Star relationship: Zhang Mou 2 (brother)
Alias: 1 certain chapter	Nationality: China
Sex: female	Occupation: actor, producer, singer
Height: 164 cm	Representative works: Hero, My Father and Mother, Ambush from Ten Sides
Date of birth: February 9, 1979	Song: Ambush from Ten Sides

Step 406, receiving an information extraction instruction.

The information extraction instruction is used to indicate the information extraction strategy adopted for extracting information from the source data. Information extraction refers to extracting multi-element group data from the source data. The multi-element group data may include: information indicating the entity type of an entity, information on entity attributes, information on association relationships, and the like. The information extraction instruction may indicate the information extraction strategy in several ways. In one of them, the information extraction instruction carries an algorithm identifier of an information extraction algorithm. Program instructions of a plurality of candidate information extraction algorithms are pre-stored in the knowledge graph constructing apparatus. After receiving the algorithm identifier carried in the information extraction instruction, the knowledge graph constructing apparatus determines, among the candidate algorithms, the information extraction algorithm indicated by the identifier and uses it to extract information from the source data. When the knowledge graph is constructed from multiple paths of data, the information extraction strategies used for the different paths of data may be the same or different, which is not specifically limited in the embodiments of the present application.
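The lookup of a pre-stored algorithm by its identifier, applied per path of data, can be sketched as a simple registry. This is only an illustration: the two extraction functions are toy stand-ins rather than real extraction algorithms, and the identifiers `by_line` and `by_comma`, the `ALGORITHMS` registry, and the `DEFAULT_ALGORITHM` fallback are assumptions.

```python
def by_line(text):
    """Toy stand-in algorithm: each non-empty line becomes a 1-tuple."""
    return [(line.strip(),) for line in text.splitlines() if line.strip()]


def by_comma(text):
    """Toy stand-in algorithm: each non-empty line is split on commas."""
    return [tuple(part.strip() for part in line.split(","))
            for line in text.splitlines() if line.strip()]


# Hypothetical registry of pre-stored candidate algorithms, keyed by identifier.
ALGORITHMS = {"by_line": by_line, "by_comma": by_comma}
DEFAULT_ALGORITHM = "by_line"  # assumed default configuration


def run_extraction(sources, instruction):
    """Apply to each path of data the algorithm named in the instruction.

    sources: dict mapping a source name to its text.
    instruction: dict mapping a source name to an algorithm identifier;
                 sources not listed use the default algorithm.
    """
    results = {}
    for name, text in sources.items():
        algorithm_id = instruction.get(name, DEFAULT_ALGORITHM)
        if algorithm_id not in ALGORITHMS:
            raise ValueError(f"unknown algorithm identifier: {algorithm_id!r}")
        results[name] = ALGORITHMS[algorithm_id](text)
    return results
```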

In an implementation manner, the information extraction instruction may be triggered by executing a specified operation after an information extraction algorithm is selected in a setting interface of the knowledge graph building device. For example, fig. 9 is a schematic view of a setting interface of a knowledge graph constructing apparatus provided in an embodiment of the present application, and as shown in fig. 9, a user may select corresponding information extraction policies for different source data on the setting interface and click a "next" button to trigger sending of an information extraction instruction.

Step 407, extracting information from each path of data using the information extraction strategy that the information extraction instruction indicates for that path of data, to obtain a plurality of multi-element group data corresponding to each path of data.

Different information extraction strategies can be adopted for different types of data. For example, for structured data and semi-structured data, information may be extracted using a fixed rule, or using an artificial intelligence (AI) model. A fixed rule may be expressed as a general algorithm model, a preset plug-in script, a configured function plug-in, or the like. Optionally, the fixed rule may be a regular expression, a rule function, or a semantics-based analysis method.
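As one possible instance of a fixed rule expressed as a regular expression, the sketch below turns semi-structured "key: value" lines into (entity, attribute, value) triples. The pattern, the function name `extract_record`, and the convention that the "Name" field identifies the entity are assumptions chosen for the example, not rules given in the application.

```python
import re

# One "key: value" field per line; surrounding whitespace is ignored.
FIELD = re.compile(r"^\s*(?P<key>[^:]+?)\s*:\s*(?P<value>.+?)\s*$")


def extract_record(text, subject_key="Name"):
    """Extract (entity, attribute, value) triples from key-value lines.

    The value of `subject_key` is taken as the entity; every other field
    becomes one triple. Lines that do not match the pattern are skipped.
    """
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            fields[match.group("key")] = match.group("value")
    subject = fields.pop(subject_key, None)
    if subject is None:
        return []  # no entity name found, nothing to attach attributes to
    return [(subject, key, value) for key, value in fields.items()]
```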

For unstructured data, information can be extracted using a rule that adapts to the data. For example, an AI model may be used for information extraction. Before the AI model is used, it can be trained with labeled samples so that it extracts information well. Further, the labeled samples may be labeled using the ontology elements of the knowledge graph ontology model. When information is extracted with an AI model trained on such samples, the multi-element group data extracted by the AI model is already expressed in the ontology elements defined in the knowledge graph ontology model. This reduces the subsequent work of standardizing the extracted multi-element group data according to the ontology elements, simplifies the process of constructing the knowledge graph, and improves construction efficiency.

In addition, the knowledge graph constructing apparatus can be configured with a function plug-in customization function. With this function, an input interface and an output interface for connecting a function plug-in are reserved when the knowledge graph constructing apparatus is deployed, and the conditions that the input interface and the output interface must satisfy are specified. A user can then define a function plug-in according to application requirements. When the input of the customized function plug-in satisfies the conditions of the input interface and its output satisfies the conditions of the output interface, the customized function plug-in is used to extract information from the source data. Because users can configure function plug-ins according to their application requirements, the flexibility of knowledge graph construction is further improved, and the knowledge graph construction method provided by the embodiments of the present application can be applied to more knowledge graph construction scenarios.
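A reserved plug-in interface of this kind might look like the following sketch. The concrete conditions are assumptions made for illustration (input must be a string, output must be a list of 3-tuples of strings); the application does not state what the actual interface conditions are. The names `run_plugin` and `author_plugin` are likewise hypothetical.

```python
def run_plugin(plugin, data):
    """Run a user-defined function plug-in behind assumed interface checks."""
    # Assumed input condition: the source data is passed as text.
    if not isinstance(data, str):
        raise TypeError("plug-in input must be a string")
    result = plugin(data)
    # Assumed output condition: a list of (subject, predicate, object) strings.
    well_formed = isinstance(result, list) and all(
        isinstance(item, tuple) and len(item) == 3
        and all(isinstance(part, str) for part in item)
        for item in result
    )
    if not well_formed:
        raise ValueError("plug-in output must be a list of 3-tuples of strings")
    return result


def author_plugin(text):
    """Example user plug-in: 'X wrote Y.' becomes (Y, 'author', X)."""
    triples = []
    for sentence in text.split("."):
        if " wrote " in sentence:
            writer, work = sentence.split(" wrote ", 1)
            triples.append((work.strip(), "author", writer.strip()))
    return triples
```

Calling `run_plugin(author_plugin, "Person B wrote Book A.")` passes both checks, while a plug-in that returns anything other than a list of triples is rejected before its output is used.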

In the following, taking information extraction with an AI model as an example, the implementation of information extraction is described for three information extraction scenarios: an information extraction scenario under mode constraints, an open information extraction scenario, and an event extraction scenario.

In the information extraction scenario under mode constraints, each information extraction pass extracts multi-element group data of one specified type. In each pass, a predicate model, a subject model, and an object model are applied in sequence to the data to be extracted. The data to be extracted may be part of the source data, for example one sentence of the source data. The predicate model determines whether multi-element group data of the specified type exists in the data to be extracted. Its input is the data to be extracted, and its output is the result of that determination. When multi-element group data of the specified type exists, the subject model extracts its subject from the data to be extracted. The input of the subject model is the data to be extracted and the type information of the specified type, and its output is the subject of the multi-element group data of the specified type. The object model then extracts the object of the multi-element group data of the specified type. The input of the object model is the data to be extracted, the type information of the specified type, and the extracted subject, and its output is the object of the multi-element group data of the specified type.

The predicate model, the subject model, and the object model each have an input layer, a feature extraction layer, and an output layer. The input layer divides the data to be extracted by character or by word, represents each divided part as a vector, and indicates the position of each divided part in the data to be extracted (that is, a position embedding function). The feature extraction layer extracts features from the vectors produced by the input layer. The output layer determines the type of each divided part according to the features extracted by the feature extraction layer.

Optionally, the input layers of the predicate model, the subject model, and the object model may all be implemented using a BERT model (a language representation model). The feature extraction layers of the three models may all be implemented using a dilated gated convolutional neural network (DGCNN) model. The output layers of the three models may all be implemented using a Sigmoid function (an S-shaped function).
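The application does not give the internals of these layers. Purely to illustrate the general idea behind a dilated gated convolution followed by a Sigmoid, the sketch below applies the generic gating form (a convolution output multiplied by the Sigmoid of a second convolution output) to a sequence of scalars. The kernels, the zero padding, and the function names are assumptions, and a real DGCNN operates on vector sequences with learned weights.

```python
import math


def sigmoid(x):
    """S-shaped function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dilated_conv1d(seq, kernel, dilation):
    """1-D convolution over a list of numbers with zero padding.

    The odd-length kernel is centered on each position, and neighboring
    taps are `dilation` positions apart, which widens the receptive field.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(seq)):
        total = 0.0
        for k, weight in enumerate(kernel):
            j = i + (k - half) * dilation
            if 0 <= j < len(seq):  # positions outside the sequence count as 0
                total += weight * seq[j]
        out.append(total)
    return out


def gated_block(seq, value_kernel, gate_kernel, dilation=1):
    """Generic gated convolution: value branch times Sigmoid of gate branch."""
    values = dilated_conv1d(seq, value_kernel, dilation)
    gates = dilated_conv1d(seq, gate_kernel, dilation)
    return [v * sigmoid(g) for v, g in zip(values, gates)]
```

In an output layer, the same `sigmoid` function would map each position's score to a value between 0 and 1 that can be thresholded to decide the type of that position.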

For example, the sentence "Forest Newspaper: Autumn is a book published by Twenty-First Century Publishing House in 2007, and the author is (Soviet Union) V. Bianki" contains triple data such as (Forest Newspaper: Autumn, author, V. Bianki), (Forest Newspaper: Autumn, publication time, 2007), (Forest Newspaper: Autumn, publisher, Twenty-First Century Publishing House), (Forest Newspaper: Autumn, type, book), (V. Bianki, nationality, Soviet Union), and (V. Bianki, type, person). In the information extraction scenario under mode constraints, if the specified types of triple data are (book, author, person), (book, publisher, publishing house), and (person, nationality, country), the extraction results for the above sentence are (Forest Newspaper: Autumn, author, V. Bianki), (Forest Newspaper: Autumn, publisher, Twenty-First Century Publishing House), and (V. Bianki, nationality, Soviet Union), respectively.
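The order in which the three models are called under mode constraints, and what each receives as input, can be sketched with simple rule-based stand-ins in place of trained models. Everything concrete here is hypothetical: the `ENTITIES` dictionary, the `CUES` phrases, the toy sentence, and the function bodies only mimic the inputs and outputs described above.

```python
# Hypothetical entity dictionary and cue phrases standing in for trained models.
ENTITIES = {"Book A": "book", "Person B": "person", "Country C": "country"}
CUES = {"author": "written by", "nationality": "citizen of"}


def find_entity(sentence, entity_type, exclude=None):
    """Return the first known entity of the given type found in the sentence."""
    for name, kind in ENTITIES.items():
        if kind == entity_type and name != exclude and name in sentence:
            return name
    return None


def predicate_model(sentence, triple_type):
    """Stand-in: does data of the specified type exist in the sentence?"""
    return CUES[triple_type[1]] in sentence


def subject_model(sentence, triple_type):
    """Stand-in: extract the subject, given the sentence and the type."""
    return find_entity(sentence, triple_type[0])


def object_model(sentence, triple_type, subject):
    """Stand-in: extract the object, given the sentence, type, and subject."""
    return find_entity(sentence, triple_type[2], exclude=subject)


def extract(sentence, triple_types):
    """One extraction pass per specified type: predicate, subject, object."""
    triples = []
    for triple_type in triple_types:
        if not predicate_model(sentence, triple_type):
            continue
        subject = subject_model(sentence, triple_type)
        if subject is None:
            continue
        obj = object_model(sentence, triple_type, subject)
        if obj is not None:
            triples.append((subject, triple_type[1], obj))
    return triples
```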

In the open information extraction scenario, the extraction is not limited to multi-element group data of a specified type. Multi-element group data is extracted directly from the data to be extracted, and the subject, the predicate, and the object of the extracted multi-element group data are words that appear directly in the data to be extracted. In each information extraction pass, the predicate model, the subject model, and the object model are applied in sequence to the data to be extracted. The predicate model extracts the predicate of the multi-element group data. Its input is the data to be extracted, and its output is the predicate. The subject model extracts the subject of the multi-element group data. Its input is the data to be extracted and the extracted predicate, and its output is the subject. The object model extracts the object of the multi-element group data. Its input is the data to be extracted, the extracted subject, and the extracted predicate, and its output is the object. For the implementation of the predicate model, the subject model, and the object model, refer to the implementation of the corresponding models in the information extraction scenario under mode constraints.

For example, the sentence "Forest Newspaper: Autumn is a book published by Twenty-First Century Publishing House in 2007, and the author is (Soviet Union) V. Bianki" contains triple data such as (Forest Newspaper: Autumn, author, V. Bianki), (Forest Newspaper: Autumn, publication time, 2007), (Forest Newspaper: Autumn, publisher, Twenty-First Century Publishing House), (Forest Newspaper: Autumn, type, book), (V. Bianki, nationality, Soviet Union), and (V. Bianki, type, person). In the open information extraction scenario, the subject, the predicate, and the object of the extracted multi-element group data must be words that appear directly in the data to be extracted, so the extraction result for this sentence is (Forest Newspaper: Autumn, author, V. Bianki).

In the event extraction scenario, the data extracted in each pass is an event consisting of a plurality of multi-element group data of specified types. Before information extraction is performed, the event types and the event attributes need to be defined in advance. The extraction logic is as follows: first the trigger word and the event type of an event are identified, then the event elements are extracted and the role of each event element is determined. In each information extraction pass, the subject model, the predicate model, and the object model are applied in sequence to the data to be extracted. The subject model determines whether a predefined event type and its trigger word exist in the data to be extracted. Its input is the data to be extracted, and its output is the result of whether the predefined event type exists in the data to be extracted. The predicate model determines whether the predefined event attributes exist in the data to be extracted. Its input is the data to be extracted and the type information of the predefined event type, and its output is the event attributes that exist in the data to be extracted. The object model extracts the attribute value of each event attribute from the data to be extracted. Its input is the data to be extracted, the type information of the predefined event type, and the attribute information of the event attributes that exist in the data to be extracted, and its output is the attribute value of each event attribute. The outputs of the subject model, the predicate model, and the object model together constitute the event. For the implementation of the predicate model, the subject model, and the object model, refer to the implementation of the corresponding models in the information extraction scenario under mode constraints.

For example, the data to be extracted is: "Banana company will hold a new product release meeting at 10 a.m. on September 12, US Western Time (1 a.m. on September 13, Beijing time), and the venue is the newly built Steve-Jobs theater. According to current reports, banana company will release ichne8, ichne7s, ichne7s Plus, ichne ch 3, and a brand-new ichne TV at this release meeting." The event type is defined as "release meeting", and the event attributes include "time", "place", "company", and "product".

In the extraction process, the subject model determines whether the event type "release meeting" occurs in the data to be extracted. Its input is the data to be extracted, and its output is the result of whether the event type "release meeting" exists in the data to be extracted. The subject model can also mark the trigger word "new product release meeting" in the data to be extracted, in order to distinguish multiple events of the same type that may appear in the data to be extracted.

The predicate model determines, according to the event type that occurs in the data to be extracted, whether the event attributes "time", "place", "company", and "product" appear in the data to be extracted. Its input is the data to be extracted and the type information of the event type, and its output is the event attributes that exist in the data to be extracted.

The object model is used for extracting the attribute values of the event attributes from the data to be extracted. Its inputs are the data to be extracted, the event type "release meeting" and the event attributes "time", "place", "company" and "product". Its output is the attribute value of each event attribute in the data to be extracted. For example, for the event attribute "time", the output is: 10 am on September 12, Western time; for the event attribute "place", the output is: Steve-Jobs theater; for the event attribute "company", the output is: banana company; and for the event attribute "product", the output is: ichne8, ichne7s, ichne7s Plus, ichne ch 3 and the brand new ichne TV.

According to the outputs of the subject model, the predicate model and the object model, a plurality of triple data can be obtained: (release meeting, company, banana company), (release meeting, time, 10 am on September 12, Western time), (release meeting, place, Steve-Jobs theater), (release meeting, product, ichne8), (release meeting, product, ichne7s), and so forth. These triple data constitute the result of event extraction:

Event type: release meeting;

Company: banana company;

Time: 10 am on September 12, Western time;

Place: Steve-Jobs theater;

Product: ichne8, ichne7s, ichne7s Plus, ichne ch 3, ichne TV.
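The sequential use of the three models can be illustrated with the following minimal sketch. It is not the implementation of the embodiment: the trained subject, predicate and object models are replaced by hypothetical regular-expression rules, and the names `PATTERNS`, `subject_model`, `predicate_model`, `object_model` and `extract_event` are assumptions introduced only to show how the output of each model feeds the next one.

```python
import re

# Hypothetical rule-based stand-ins for the trained subject, predicate and
# object models; a real system would use models learned from labeled samples.
EVENT_TYPE = "release meeting"
EVENT_ATTRIBUTES = ["time", "place", "company", "product"]
PATTERNS = {
    "time": r"at (\d+ [ap]m)",
    "place": r"at the ([\w-]+ theater)",
    "company": r"(\w+ company)",
    "product": r"release (ichne\w*)",
}


def subject_model(text: str) -> bool:
    """Judge whether the predefined event type occurs in the text."""
    return EVENT_TYPE in text


def predicate_model(text: str, event_type: str) -> list[str]:
    """Return the predefined event attributes that appear in the text."""
    return [a for a in EVENT_ATTRIBUTES if re.search(PATTERNS[a], text)]


def object_model(text: str, event_type: str, attributes: list[str]) -> dict[str, list[str]]:
    """Extract the attribute values for each event attribute found."""
    return {a: re.findall(PATTERNS[a], text) for a in attributes}


def extract_event(text: str) -> list[tuple[str, str, str]]:
    """Run the three models in sequence and assemble (event, attribute, value) triples."""
    if not subject_model(text):
        return []
    attributes = predicate_model(text, EVENT_TYPE)
    values = object_model(text, EVENT_TYPE, attributes)
    return [(EVENT_TYPE, a, v) for a in attributes for v in values[a]]


TEXT = "banana company will hold a release meeting at the Steve-Jobs theater"
triples = extract_event(TEXT)
```

In this sketch, the predicate model only reports attributes for which a rule fires, so the object model is never asked for a value that is absent from the text.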

It should be noted that, in the process of constructing the knowledge graph, whether to perform step 406 may be determined according to business requirements. In addition, an information extraction policy may be configured by default in the knowledge graph constructing apparatus; when step 406 is not performed, in step 407 the knowledge graph constructing apparatus may extract information from the source data by using the default information extraction policy. However, by selecting the information extraction policy used for extracting information from the source data, the knowledge graph constructing apparatus can apply different information extraction policies to source data in different fields. This improves the accuracy of the information extracted from the source data and therefore the accuracy of the knowledge graph constructed from source data in different fields, ensures the application range of the knowledge graph construction method, and improves the flexibility of constructing the knowledge graph.
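The fallback to a default policy when no information extraction instruction is received can be sketched as follows. The strategy names "pipe" and "colon", the record formats and the function `extract_all` are hypothetical and serve only to illustrate selecting a policy per path of data with a default.

```python
from typing import Callable, Optional

Triple = tuple[str, str, str]
Strategy = Callable[[str], list[Triple]]


def pipe_strategy(record: str) -> list[Triple]:
    """Hypothetical policy for records of the form 'subject|predicate|object'."""
    subject, predicate, obj = record.split("|")
    return [(subject, predicate, obj)]


def colon_strategy(record: str) -> list[Triple]:
    """Hypothetical policy for records of the form 'subject: predicate = object'."""
    subject, rest = record.split(":", 1)
    predicate, obj = rest.split("=", 1)
    return [(subject.strip(), predicate.strip(), obj.strip())]


STRATEGIES: dict[str, Strategy] = {"pipe": pipe_strategy, "colon": colon_strategy}
DEFAULT_STRATEGY = "pipe"  # used when no instruction names a policy


def extract_all(
    channels: dict[str, list[str]],
    instruction: Optional[dict[str, str]] = None,
) -> list[Triple]:
    """Extract triples from each path of data with its indicated or default policy."""
    instruction = instruction or {}
    triples: list[Triple] = []
    for channel, records in channels.items():
        strategy = STRATEGIES[instruction.get(channel, DEFAULT_STRATEGY)]
        for record in records:
            triples.extend(strategy(record))
    return triples


channels = {"a": ["x|height|164 cm"], "b": ["y: gender = woman"]}
result = extract_all(channels, {"b": "colon"})
```

Here path "a" is not named in the instruction and is therefore processed with the default policy, while path "b" uses the policy that the instruction indicates.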

Step 408, receiving a mapping policy instruction.

The mapping policy instruction is used for indicating a mapping policy for performing associative mapping (also called knowledge mapping) on the plurality of multi-element group data according to the ontology elements. Knowledge mapping refers to establishing a mapping relationship between extracted elements and ontology elements, and using the ontology elements to describe the corresponding extracted elements in a standardized manner according to the mapping relationship. For example, when the subject in the multi-element group data defined by the knowledge graph ontology model is formally expressed as "name", if the subject in the extracted multi-element group data is "name", a mapping relationship between the extracted "name" and the ontology "name" can be established according to the mapping policy, and the extracted "name" is described in a standardized manner as the ontology "name" according to the mapping relationship. When the knowledge graph is constructed according to multiple paths of data, the mapping policies corresponding to the multiple paths of data may be the same or different, which is not specifically limited in the embodiments of the application.

In one implementation of the mapping policy, the knowledge graph constructing apparatus may obtain a matching degree between each extracted element and each ontology element. When the matching degree between an extracted element and an ontology element is greater than a matching degree threshold, the knowledge graph constructing apparatus can establish a mapping relationship between the extracted element and the ontology element and indicate that the ontology element is to be used for describing the extracted element in a standardized manner. For example, when the matching degree between the extracted element "name" and the ontology element "name" is greater than the matching degree threshold, a mapping relationship between the two may be established, and the extracted element "name" may be described in a standardized manner as the ontology element "name" according to the mapping relationship.

In this case, the mapping policy instruction indicates that the mapping relationship between the ontology elements and the extracted elements is to be established according to the matching degree, and indicates the matching degree algorithm used to obtain the matching degree. For example, the mapping policy instruction may indicate that the mapping relationship is to be established according to the matching degree, and that the matching degree algorithm is an edit distance similarity algorithm.
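A threshold-based mapping using an edit distance similarity can be sketched as follows. The normalization of the edit distance by the longer string length, the element names and the threshold 0.7 are assumptions made for illustration; the embodiment does not specify them.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 minus the distance divided by the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def build_mapping(extracted: list[str], ontology: list[str], threshold: float) -> dict[str, str]:
    """Map each extracted element to its best ontology element if above the threshold."""
    mapping: dict[str, str] = {}
    if not ontology:
        return mapping
    for element in extracted:
        best = max(ontology, key=lambda o: similarity(element, o))
        if similarity(element, best) > threshold:
            mapping[element] = best
    return mapping


# Hypothetical extracted and ontology elements.
mapping = build_mapping(["names", "heights", "salary"], ["name", "height"], 0.7)
```

The extracted element "salary" stays unmapped in this example because its matching degree with every ontology element is below the threshold.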

In another implementation of the mapping policy, a user can configure the mapping policy in a setting interface of the knowledge graph constructing apparatus through a terminal. The implementation process is as follows: the user indicates, through the terminal, the mapping relationship between the extracted elements and the ontology elements, and indicates that each ontology element is to be used for describing, in a standardized manner, the extracted element with which it has a mapping relationship. After completing the configuration, the user can trigger sending of the mapping policy instruction by performing a specified operation in the setting interface. Moreover, since the ontology elements defined by the knowledge graph ontology model are determined once the knowledge graph ontology model is determined in step 403, configuring the mapping policy is essentially a process of indicating, for each determined ontology element, the extracted elements that have a mapping relationship with it.

For example, fig. 10 is a schematic diagram of a setting interface of a knowledge graph constructing apparatus provided in an embodiment of the present application. As shown in fig. 10, a user may add, in the setting interface, extracted elements having mapping relationships with ontology elements. For example, for an entity type (i.e., ontology entity type) "name" in the known ontology elements, the entity type (i.e., extraction entity type) "name" in the extracted elements that has a mapping relationship with it may be added, so as to map the entity type. For an association relationship in the ontology elements (i.e., ontology association relationship), the association relationship in the extracted elements that has a mapping relationship with it (i.e., extraction association relationship) can be added, so as to map the association relationship. For an entity attribute in the known ontology elements (i.e., ontology entity attribute), the entity attribute in the extracted elements that has a mapping relationship with it (i.e., extraction entity attribute) can be added, so as to perform knowledge mapping on the entity attribute. Furthermore, the type of the knowledge graph can be mapped according to the type of the knowledge graph ontology model (i.e., the ontology type). After the configuration is complete, the "next" button may be clicked to trigger sending of the mapping policy instruction.

Step 409, performing, according to the mapping policy indicated by the mapping policy instruction and the standardized description of the multi-element group data, associative mapping on the plurality of multi-element group data extracted from each path of data, to obtain a plurality of multi-element group data described in a standardized manner by using the standardized description of the multi-element group data.

After obtaining the mapping policy instruction, the knowledge graph constructing apparatus can perform knowledge mapping on the multi-element group data according to the mapping policy indicated by the mapping policy instruction and the ontology elements, to obtain multi-element group data described in a standardized manner by the ontology elements. Through knowledge mapping, the extracted elements can be described in a standardized manner according to the ontology elements defined by the knowledge graph ontology model, so that the extracted elements are represented uniformly and the readability of the knowledge graph is improved.
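Applying a configured mapping policy to extracted multi-element group data can be sketched as follows. The record layout `ExtractedTuple`, the class `MappingPolicy` and the element names are hypothetical; the sketch assumes that only the entity type and the predicate (attribute or association relationship) are rewritten, while entity names and attribute values are left unchanged.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class ExtractedTuple:
    entity_type: str
    subject: str
    predicate: str   # entity attribute or association relationship
    obj: str


@dataclass
class MappingPolicy:
    # extracted element -> ontology element, per kind of element
    entity_types: dict[str, str] = field(default_factory=dict)
    predicates: dict[str, str] = field(default_factory=dict)


def apply_mapping(tuples: list[ExtractedTuple], policy: MappingPolicy) -> list[ExtractedTuple]:
    """Rewrite extracted elements with the ontology elements they map to."""
    return [
        replace(
            t,
            entity_type=policy.entity_types.get(t.entity_type, t.entity_type),
            predicate=policy.predicates.get(t.predicate, t.predicate),
        )
        for t in tuples
    ]


policy = MappingPolicy(entity_types={"person": "Person"}, predicates={"stature": "height"})
normalized = apply_mapping([ExtractedTuple("person", "A", "stature", "164 cm")], policy)
```

Elements without an entry in the policy pass through unchanged, which corresponds to extracted elements for which no mapping relationship has been established.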

It should be noted that, in the process of constructing the knowledge graph, whether to perform step 408 may be determined according to business requirements. In addition, a mapping policy may be configured by default in the knowledge graph constructing apparatus; when step 408 is not performed, in step 409 the knowledge graph constructing apparatus may perform associative mapping on the multi-element group data by using the default mapping policy. However, by selecting the mapping policy and using the selected mapping policy to perform associative mapping on the multi-element group data, the knowledge graph constructing apparatus can use different mapping policies for different types of data, which improves the accuracy of the associative mapping and therefore the accuracy of the constructed knowledge graph.

Step 410, receiving a matching policy instruction.

When the knowledge graph is constructed according to a plurality of source data, information indicating the same entity may be represented in different ways. If the knowledge graph is constructed directly from the extracted multi-element group data, the same entity represented in different ways may be treated as different entities, so that the constructed knowledge graph cannot accurately reflect the content of the source data. Therefore, before the knowledge graph is constructed according to the multi-element group data, it can be judged whether different multi-element group data include elements indicating the same entity, and merging processing (also called knowledge fusion) can be performed on the different multi-element group data that include elements indicating the same entity. The knowledge graph is then constructed according to the merged multi-element group data, which improves the accuracy of the constructed knowledge graph. For example, the entity type information obtained by extracting information from the source data shown in table 1 is "name: chapter 1", and the entity type information obtained by extracting information from the source data shown in table 2 is "name: 1 chapter". Although the two are represented in different ways, both indicate the same entity, and in this case knowledge fusion can be performed on them.

The matching policy instruction is used for indicating the matching algorithm and the matching degree threshold that are used for judging whether different multi-element group data include elements indicating the same entity. The knowledge graph constructing apparatus can obtain the matching degree of elements in different multi-element group data according to the matching algorithm. When the matching degree of the elements in different multi-element group data is not smaller than the matching degree threshold, it is determined that these elements indicate the same entity, and the elements in the different multi-element group data that indicate the same entity can then be merged.

In one implementation, programs of various matching algorithms may be stored in advance in the deployment environment of the knowledge graph constructing apparatus. In this case, the matching algorithm to be used may be selected in a setting interface of the knowledge graph constructing apparatus, and after the selection is completed, sending of the matching policy instruction is triggered by performing a specified operation in the setting interface. For example, fig. 11 is a schematic diagram of a setting interface of a knowledge graph constructing apparatus provided in an embodiment of the present application. As shown in fig. 11, a user may select, for different elements, the matching algorithm and the matching degree threshold to be used when performing knowledge fusion on those elements. Moreover, a matching algorithm and a matching degree threshold may be set for each of the different entity attributes of an entity, and when determining whether the entity is the same as another entity, the determination result may be an "integration" of the results of the matching algorithms corresponding to the different entity attributes of the entity. For example, the intersection of the results of the matching algorithms corresponding to the different entity attributes of the entity may be used. Similarly, each attribute may be configured with multiple matching algorithms. After the setup is complete, the "next" button may be clicked to trigger sending of the matching policy instruction.
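Per-attribute matching with an intersection of the results can be sketched as follows. The two matching algorithms (`token_jaccard` and `exact`), the thresholds and the sample entities are assumptions chosen for illustration; skipping attributes that are missing from either entity is also an assumption of this sketch.

```python
from typing import Callable

Matcher = Callable[[str, str], float]


def token_jaccard(a: str, b: str) -> float:
    """Overlap of whitespace-separated tokens, ignoring order and case."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def exact(a: str, b: str) -> float:
    """Matching degree 1.0 for identical strings, otherwise 0.0."""
    return 1.0 if a == b else 0.0


# Hypothetical matching policy: attribute -> (matching algorithm, threshold).
MATCH_POLICY: dict[str, tuple[Matcher, float]] = {
    "name": (token_jaccard, 0.8),
    "height": (exact, 1.0),
}


def same_entity(e1: dict[str, str], e2: dict[str, str],
                policy: dict[str, tuple[Matcher, float]]) -> bool:
    """Intersection of per-attribute results: every compared attribute must match."""
    verdicts = [
        algorithm(e1[attr], e2[attr]) >= threshold
        for attr, (algorithm, threshold) in policy.items()
        if attr in e1 and attr in e2   # assumption: skip attributes missing on either side
    ]
    return bool(verdicts) and all(verdicts)


a = {"name": "Zhang San", "height": "164 cm"}
b = {"name": "San Zhang", "height": "164 cm"}
c = {"name": "San Zhang", "height": "170 cm"}
```

With this policy, `a` and `b` are judged to be the same entity, whereas `a` and `c` are not, because the result for the attribute "height" falls below its threshold.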

Step 411, according to the multi-element group data matching policy indicated by the matching policy instruction, determining different multi-element group data including an element indicating the same entity from the multi-element group data after standardized description, and merging the different multi-element group data including the element indicating the same entity to obtain multiple multi-element group data after merging.

The merging processing of different multi-element group data including elements indicating the same entity means that the same entity adopting different representation modes is represented by adopting the same representation mode, so that the representation modes of the elements indicating the same entity are the same.

Exemplarily, the triplet data obtained by extracting information from the source data shown in table 1 are: (chapter some 1, height, 164 cm), (chapter some 1, gender, woman), (chapter some 1, nationality, China), (chapter some 1, birthday, February 9, 1979), (chapter some 1, sister, chapter some 2), (chapter some 1, lead, my father mother), (chapter some 1, lead, tiger dragon). The triplet data obtained by extracting information from the source data shown in table 2 are: (1 certain chapter, height, 164 cm), (1 certain chapter, gender, woman), (1 certain chapter, brother and sister, certain chapter 2), (1 certain chapter, lead, my father mother), (1 certain chapter, lead, hero), (1 certain chapter, lead, ten-face burial), (1 certain chapter, singer, ten-face burial). After knowledge fusion is performed according to the multi-element group data matching policy indicated by the matching policy instruction, the following triple data are obtained: (certain chapter 1, height, 164 cm), (certain chapter 1, sex, woman), (certain chapter 1, nationality, China), (certain chapter 1, birthday, February 9, 1979), (certain chapter 1, brother, certain chapter 2), (certain chapter 1, lead, my father mother), (certain chapter 1, lead, ten-sided buried), (certain chapter 1, lead, hero), (certain chapter 1, singer, ten-sided buried).
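The merging processing can be sketched as follows, under the assumption that only the subject of each triple is unified and that the first representation encountered is kept as the common representation. The function names, the token-overlap matching degree, the threshold and the sample triples are hypothetical and are not taken from tables 1 and 2.

```python
Triple = tuple[str, str, str]


def token_overlap(a: str, b: str) -> float:
    """Assumed matching degree: Jaccard overlap of whitespace-separated tokens."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def fuse(triples: list[Triple], threshold: float = 0.8) -> list[Triple]:
    """Give matching subjects one representation and drop duplicate triples."""
    representatives: list[str] = []

    def canonical(name: str) -> str:
        # Reuse the first representation whose matching degree reaches the threshold.
        for rep in representatives:
            if token_overlap(rep, name) >= threshold:
                return rep
        representatives.append(name)
        return name

    fused: list[Triple] = []
    seen: set[Triple] = set()
    for subject, predicate, obj in triples:
        triple = (canonical(subject), predicate, obj)
        if triple not in seen:
            seen.add(triple)
            fused.append(triple)
    return fused


source_1 = [("Zhang San", "height", "164 cm"), ("Zhang San", "nationality", "China")]
source_2 = [("San Zhang", "height", "164 cm"), ("San Zhang", "lead", "Hero")]
merged = fuse(source_1 + source_2)
```

After fusion, the triples from both sources share one subject representation, and the height triple that appeared in both sources is kept only once.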

It should be noted that, in the process of constructing the knowledge graph, whether to perform step 410 may be determined according to business requirements. In addition, a matching algorithm and a corresponding matching degree threshold may be configured by default in the knowledge graph constructing apparatus. When step 410 is not performed, in step 411 the knowledge graph constructing apparatus may judge whether different multi-element group data include elements indicating the same entity by using the default matching algorithm and the corresponding matching degree threshold. However, by selecting the matching algorithm and using the selected matching algorithm to judge whether different multi-element group data include elements indicating the same entity, the knowledge graph constructing apparatus can adopt different matching algorithms for elements obtained from data in different fields. This improves the flexibility of knowledge mapping and the accuracy of the obtained matching degree, and therefore improves the accuracy and comprehensiveness of knowledge graph construction.

Step 412, constructing a knowledge graph according to the merged multi-element group data.

The knowledge graph records the entities included in the source data and the relationships among different entities. Steps 401 to 411 are all preparation work for constructing the knowledge graph; after the preparation work is completed, the knowledge graph can be constructed according to the merged multi-element group data. The process of constructing the knowledge graph according to the multi-element group data can be understood as follows: the plurality of merged multi-element group data are connected into a semantic network according to the relationships among the elements in them. Each node in the semantic network corresponds to an entity type or an entity attribute in the multi-element group data, the relationship between nodes corresponds to the information of the association relationship in the multi-element group data, the starting point of an arrow between nodes corresponds to the element serving as the subject in the multi-element group data, and the end point of the arrow corresponds to the element serving as the object in the multi-element group data.
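Connecting triples into a directed semantic network can be sketched as follows. The adjacency-list representation and the function name `build_graph` are assumptions; a graph database or another storage form could equally be used.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


def build_graph(triples: list[Triple]) -> tuple[set[str], dict[str, list[tuple[str, str]]]]:
    """Return the node set and, per subject, its outgoing (relation, object) edges."""
    nodes: set[str] = set()
    edges: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for subject, relation, obj in triples:
        nodes.update((subject, obj))            # subject and object become nodes
        edges[subject].append((relation, obj))  # arrow from subject to object
    return nodes, dict(edges)


# Hypothetical merged triples.
nodes, edges = build_graph([("A", "height", "164 cm"), ("A", "sister", "B")])
```

Each edge starts at the element serving as the subject and ends at the element serving as the object, and the relation is kept as the edge label.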

Illustratively, fig. 12 is a schematic diagram of a knowledge graph constructed from the multi-component data after the merging process in step 411. As shown in fig. 12, the knowledge graph records entity types, entity attributes, and association relationships in multi-group data for indicating entities, and the knowledge graph shows the source data in tables 1 and 2 in a graph form, so that the visualization degree of the source data is improved, and the convenience degree of analysis according to the source data is improved.

Step 413, after it is determined that the source data are updated, extracting information from the incremental data in the updated source data according to the policy indicated by the information extraction instruction to obtain a plurality of multi-element group data corresponding to the incremental data, and updating the knowledge graph according to the plurality of multi-element group data corresponding to the incremental data.

When the source data of the constructed knowledge graph are updated, the incremental data of the updated source data relative to the original source data can be obtained, and the constructed knowledge graph is updated according to the incremental data to obtain the knowledge graph corresponding to the updated source data. For example, information extraction may be performed on the incremental data to obtain a plurality of multi-element group data corresponding to the incremental data, knowledge mapping may be performed on these multi-element group data, knowledge fusion may be performed on the multi-element group data after the associative mapping, and the knowledge graph may then be updated according to the multi-element group data after knowledge fusion. By incrementally updating the knowledge graph, the amount of calculation needed to construct the knowledge graph from the updated source data can be reduced, and the construction efficiency of the knowledge graph can be improved.
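The incremental update can be sketched as follows. The sketch assumes that the source data is a list of records of the hypothetical form 'subject|predicate|object', that the increment consists only of newly added records, and that the knowledge mapping and knowledge fusion stages are omitted for brevity.

```python
Triple = tuple[str, str, str]


def parse_record(record: str) -> Triple:
    """Hypothetical extraction policy for 'subject|predicate|object' records."""
    subject, predicate, obj = record.split("|")
    return (subject, predicate, obj)


def incremental_update(graph: set[Triple], old_source: list[str],
                       new_source: list[str]) -> list[str]:
    """Extract only the records that are new and add their triples to the graph."""
    known = set(old_source)
    delta = [record for record in new_source if record not in known]
    graph.update(parse_record(record) for record in delta)
    return delta


old_source = ["a|height|164 cm"]
graph = {parse_record(record) for record in old_source}
new_source = old_source + ["a|nationality|China"]
delta = incremental_update(graph, old_source, new_source)
```

Only the record in `delta` is passed through extraction, so records that were already processed are not extracted again.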

To sum up, the method for constructing a knowledge graph provided in the embodiment of the present application determines an information extraction strategy for extracting information from source data for constructing the knowledge graph by receiving an information extraction instruction, extracts the information from the source data by using the information extraction strategy to obtain a plurality of multi-element data, and constructs the knowledge graph according to the plurality of multi-element data.

The order of the steps of the method for constructing the knowledge graph provided by the embodiment of the application may be appropriately adjusted, and the steps may also be increased or decreased according to the situation, for example, whether to execute the step 402, the step 406, the step 408, and the step 410 may be selected according to the application requirements. Any method that can be easily conceived by a person skilled in the art within the technical scope disclosed in the present application is covered by the protection scope of the present application, and thus the detailed description thereof is omitted.

The embodiment of the application also provides a knowledge graph constructing apparatus. As shown in fig. 13, the knowledge graph constructing apparatus 80 may include:

the receiving module 801 is configured to receive an information extraction instruction, where the information extraction instruction is used to indicate an information extraction policy to be used for extracting information from source data for constructing a knowledge graph.

An extracting module 802, configured to extract information from the source data by using an information extraction policy indicated by the information extraction instruction, to obtain multiple pieces of tuple data, where each piece of tuple data includes: information indicating entity type, entity attribute and association of the entity.

The constructing module 803 is configured to construct a knowledge graph according to the plurality of multi-element group data, where the knowledge graph records the entities included in the source data and the relationships between different entities.

Optionally, as shown in fig. 14, the knowledge-graph constructing apparatus 80 further includes:

an obtaining module 804, configured to obtain a knowledge graph ontology model that needs to be used when constructing a knowledge graph, where the knowledge graph ontology model defines a standardized description of the multi-element group data in the knowledge graph.

The receiving module 801 is further configured to receive a mapping policy instruction, where the mapping policy instruction is used to indicate a mapping policy for performing association mapping on a plurality of multi-component data according to a standardized description of the multi-component data.

The mapping module 805 is configured to perform associated mapping on the multiple tuple data according to the standardized description of the multiple tuple data and the mapping policy indicated by the mapping policy instruction, so as to obtain multiple tuple data that are standardized and described by using the standardized description of the multiple tuple data.

Accordingly, the constructing module 803 is specifically configured to: construct the knowledge graph according to the plurality of multi-element group data after standardized description.

A determining module 806, configured to determine, according to a specified multi-element group data matching policy, different multi-element group data including information indicating the same entity among the plurality of multi-element group data.

A merging module 807, configured to merge the different multi-element group data including information indicating the same entity.

Accordingly, the constructing module 803 is specifically configured to: construct the knowledge graph according to the merged multi-element group data.

Optionally, the receiving module 801 is further configured to receive a matching policy instruction, where the matching policy instruction is used to indicate the matching algorithm and the matching degree threshold that are used for judging whether different multi-element group data include information indicating the same entity.

Correspondingly, the determining module 806 is specifically configured to: when it is determined, according to the matching algorithm indicated by the matching policy instruction, that the matching degree of the information indicating an entity in two multi-element group data is not less than the matching degree threshold, determine that the two multi-element group data include information indicating the same entity.

Optionally, the source data includes multiple paths of data, and the extracting module 802 is specifically configured to: extract information from each path of data by using the information extraction policy that the information extraction instruction indicates for that path of data, to obtain a plurality of multi-element group data respectively corresponding to the multiple paths of data.

Accordingly, the constructing module 803 is specifically configured to: construct the knowledge graph according to the plurality of multi-element group data corresponding to the multiple paths of data.

Optionally, the extracting module 802 is further configured to, after it is determined that the source data is updated, perform information extraction on incremental data in the updated source data according to a policy indicated by the information extraction instruction, so as to obtain multiple pieces of tuple data corresponding to the incremental data.

Correspondingly, the building module 803 is further configured to update the knowledge graph according to the plurality of multi-component data corresponding to the incremental data.

Optionally, the extracting module 802 is specifically configured to: extract information from the source data by using the AI model indicated by the information extraction instruction.

The AI model is a trained model, the training samples of the AI model are labeled by using the standardized description of the multi-element group data in the knowledge graph ontology model, and the knowledge graph ontology model defines the standardized description of the multi-element group data in the knowledge graph.

To sum up, the apparatus for constructing a knowledge graph provided in the embodiment of the present application receives an information extraction instruction through a receiving module, determines an information extraction strategy for extracting information from source data for constructing the knowledge graph, and an extraction module extracts information from the source data by using the information extraction strategy to obtain a plurality of multi-component data.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and modules may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

The embodiment of the application also provides a computing device, which comprises a processor and a memory; the memory has stored therein a computer program; when the processor executes the computer program, the computing device implements the knowledge graph construction method provided by the embodiment of the application. The computing device may be a server or a terminal, and the structure of the computing device refers to the structure of the computing device in fig. 3, which is not described herein again.

Optionally, the computing device may operate on an AI platform and a big data platform, so as to construct, train and deploy an AI model used in the knowledge graph construction method provided by the embodiment of the present application by using the AI platform, acquire source data from the big data platform, and perform data processing by using the big data platform.

The embodiment of the present application further provides a storage medium, which is a nonvolatile computer-readable storage medium, and when instructions in the storage medium are executed by a processor, the method for constructing a knowledge graph provided by the embodiment of the present application is implemented.

The embodiment of the present application further provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the method for constructing a knowledge graph provided by the embodiment of the present application.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

In the embodiments of the present application, the terms "first", "second", and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The term "at least one" means one or more, and the term "plurality" means two or more, unless expressly defined otherwise.

The term "and/or" in this application is only one kind of association relationship describing the associated object, and means that there may be three kinds of relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.

The above description is only exemplary of the present application and is not intended to limit the present application, and any modifications, equivalents, improvements, etc. made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims

1. A method of knowledge graph construction, the method comprising:

receiving an information extraction instruction, wherein the information extraction instruction is used for indicating an information extraction strategy adopted for extracting information from source data for constructing a knowledge graph;

and extracting the information of the source data by adopting an information extraction strategy indicated by the information extraction instruction to obtain a plurality of multi-element group data, wherein each multi-element group data comprises: information indicating entity types, entity attributes and association relationships of the entities;

and constructing the knowledge graph according to the multi-element data, wherein the knowledge graph records the entities included in the source data and the relations among different entities.

2. The method of claim 1, wherein prior to said constructing the knowledge-graph from the plurality of multivariate data, the method further comprises:

acquiring a knowledge graph ontology model required to be used when the knowledge graph is constructed, wherein the knowledge graph ontology model defines standardized description of multi-component data in the knowledge graph;

receiving a mapping policy instruction for indicating a mapping policy for associative mapping of the plurality of multi-component data according to the standardized description of the multi-component data;

performing associated mapping on the multi-component data according to the standardized description of the multi-component data and the mapping strategy indicated by the mapping strategy instruction to obtain a plurality of multi-component data which are subjected to standardized description by adopting the standardized description of the multi-component data;

the constructing the knowledge-graph according to the plurality of multivariate data comprises:

and constructing the knowledge graph according to a plurality of multivariate data after standardized description.

3. The method of claim 1 or 2, wherein before the constructing the knowledge graph according to the plurality of multi-element group data, the method further comprises:

determining, according to a specified multi-element group data matching policy, different multi-element group data that comprise information indicating a same entity among the plurality of multi-element group data;

merging the different multi-element group data that comprise information indicating the same entity;

the constructing the knowledge graph according to the plurality of multi-element group data comprises:

constructing the knowledge graph according to the merged multi-element group data.

4. The method of claim 3, wherein before the determining, according to a specified multi-element group data matching policy, different multi-element group data that comprise information indicating a same entity among the plurality of multi-element group data, the method further comprises:

receiving a matching policy instruction, wherein the matching policy instruction indicates a matching algorithm and a matching degree threshold that are used to judge whether different multi-element group data comprise information indicating the same entity;

the determining, according to a specified multi-element group data matching policy, different multi-element group data that comprise information indicating a same entity among the plurality of multi-element group data comprises:

when it is determined, according to the matching algorithm indicated by the matching policy instruction, that the matching degree of the information indicating entities in two multi-element group data is not less than the matching degree threshold, determining that the two multi-element group data comprise information indicating the same entity.

5. The method of any of claims 1 to 4, wherein the source data comprises multi-path data, and the extracting information from the source data by using the information extraction strategy indicated by the information extraction instruction, to obtain a plurality of multi-element group data, comprises:

for each path of data, extracting information from that path of data by using the information extraction strategy that is indicated by the information extraction instruction for that path of data, to obtain a plurality of multi-element group data respectively corresponding to the multi-path data;

the constructing the knowledge graph according to the plurality of multi-element group data comprises:

constructing the knowledge graph according to the plurality of multi-element group data corresponding to the multi-path data.

6. The method of any of claims 1 to 5, wherein after the constructing the knowledge graph according to the plurality of multi-element group data, the method further comprises:

after determining that the source data has been updated, extracting information from incremental data in the updated source data according to the strategy indicated by the information extraction instruction, to obtain a plurality of multi-element group data corresponding to the incremental data;

and updating the knowledge graph according to the plurality of multi-element group data corresponding to the incremental data.

7. The method according to claim 1, wherein the extracting information from the source data by using the information extraction strategy indicated by the information extraction instruction comprises:

extracting information from the source data by using an AI model indicated by the information extraction instruction;

wherein the AI model is a trained model, training samples of the AI model are labeled by using a standardized description of multi-element group data in a knowledge graph body model, and the knowledge graph body model defines the standardized description of the multi-element group data in the knowledge graph.

8. An apparatus for knowledge-graph construction, the apparatus comprising:

a receiving module, configured to receive an information extraction instruction, wherein the information extraction instruction indicates an information extraction strategy used to extract information from source data for constructing a knowledge graph;

an extraction module, configured to extract information from the source data by using the information extraction strategy indicated by the information extraction instruction, to obtain a plurality of multi-element group data, wherein each multi-element group data comprises: information indicating entity types, entity attributes and association relationships of the entities;

and a construction module, configured to construct the knowledge graph according to the plurality of multi-element group data, wherein the knowledge graph records the entities included in the source data and the relations among different entities.

9. The apparatus of claim 8, further comprising:

an acquisition module, configured to acquire a knowledge graph body model to be used when the knowledge graph is constructed, wherein the knowledge graph body model defines a standardized description of multi-element group data in the knowledge graph;

the receiving module is further configured to receive a mapping policy instruction, wherein the mapping policy instruction indicates a mapping policy for performing association mapping on the plurality of multi-element group data according to the standardized description of the multi-element group data;

a mapping module, configured to perform association mapping on the plurality of multi-element group data according to the standardized description of the multi-element group data and the mapping policy indicated by the mapping policy instruction, to obtain a plurality of multi-element group data described in accordance with the standardized description;

the building module is specifically configured to:

10. The apparatus of claim 8 or 9, further comprising:

a determining module, configured to determine, according to a specified multi-element group data matching policy, different multi-element group data that comprise information indicating a same entity among the plurality of multi-element group data;

a merging module, configured to merge the different multi-element group data that comprise information indicating the same entity;

the building module is specifically configured to:

11. The apparatus of claim 10,

the receiving module is further configured to receive a matching policy instruction, wherein the matching policy instruction indicates a matching algorithm and a matching degree threshold that are used to judge whether different multi-element group data comprise information indicating the same entity;

the determining module is specifically configured to:

12. The apparatus according to any one of claims 8 to 11, wherein the source data comprises multi-path data, and the extraction module is specifically configured to:

the building module is specifically configured to:

13. The apparatus according to any one of claims 8 to 12,

the extraction module is further configured to: after it is determined that the source data has been updated, extract information from incremental data in the updated source data according to the strategy indicated by the information extraction instruction, to obtain a plurality of multi-element group data corresponding to the incremental data;

the construction module is further configured to update the knowledge graph according to the plurality of multi-element group data corresponding to the incremental data.

14. The apparatus of claim 8, wherein the extraction module is specifically configured to:

15. A computing device, wherein the computing device comprises a processor and a memory;

the memory has stored therein a computer program;

the computer program, when executed by the processor, causes the computing device to implement the method of knowledge-graph construction of any of claims 1 to 7.

16. A non-transitory storage medium, wherein instructions in the storage medium, when executed by a processor, implement the method of constructing a knowledge graph of any one of claims 1 to 7.
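
For illustration only, the matching and merging steps recited in claims 3 and 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the `TupleData` class, its field names, and the choice of string similarity (`difflib.SequenceMatcher`) as the matching algorithm are all hypothetical. The claims leave the matching algorithm and the matching degree threshold to the matching policy instruction.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class TupleData:
    """Hypothetical shape for one piece of multi-element group data."""
    entity: str                      # information indicating the entity
    entity_type: str
    attributes: dict = field(default_factory=dict)
    relations: set = field(default_factory=set)  # (relation, target) pairs


def matching_degree(a: TupleData, b: TupleData) -> float:
    """Stand-in matching algorithm: name similarity in [0, 1].

    Entities of different types are treated as never matching
    (an assumption of this sketch, not something the claims require).
    """
    if a.entity_type != b.entity_type:
        return 0.0
    return SequenceMatcher(None, a.entity.lower(), b.entity.lower()).ratio()


def merge_matching(items, threshold):
    """Merge items whose matching degree is not less than the threshold."""
    merged = []
    for item in items:
        for kept in merged:
            if matching_degree(kept, item) >= threshold:
                # setdefault keeps the first-seen value when attributes clash.
                for key, value in item.attributes.items():
                    kept.attributes.setdefault(key, value)
                kept.relations |= item.relations
                break
        else:
            # No existing entry matched: keep a copy as a new entity.
            merged.append(TupleData(item.entity, item.entity_type,
                                    dict(item.attributes), set(item.relations)))
    return merged


# Example input with made-up entities; the threshold would come from
# the matching policy instruction.
items = [
    TupleData("Acme Corp", "company", {"city": "X"}, {("makes", "Widget")}),
    TupleData("acme corp.", "company", {"founded": "1990"}),
    TupleData("Widget", "product"),
]
result = merge_matching(items, threshold=0.9)
```

In this sketch the first two records are merged into one entity that carries the attributes and relations of both, while the third stays separate because its entity type differs. A real matching policy could use any other algorithm, provided it yields a matching degree that can be compared against the threshold.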