WO2021037045A1 - Knowledge graph construction method and apparatus, computing device, and storage medium - Google Patents
- Publication number
- WO2021037045A1 (PCT/CN2020/111308)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- knowledge graph
- information extraction
- multiple sets
- instruction
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
Definitions
- This application relates to the field of cloud computing technology, in particular to a method and device for constructing a knowledge graph, computing equipment, and storage media.
- A knowledge graph is a form of knowledge organization and representation, and using knowledge graphs to represent knowledge systems has become a development trend.
- The process of constructing a knowledge graph is usually implemented by a customized module, which is tailored to the domain requirements of a particular business field.
- Such a customized module is difficult to reuse for constructing knowledge graphs in other fields, resulting in poor applicability.
- This application provides a method and device for constructing a knowledge graph, a computing device, and a storage medium, which can solve the problem of poor applicability of the method for constructing a knowledge graph in related technologies.
- In a first aspect, this application provides a method for constructing a knowledge graph.
- The method includes: receiving an information extraction instruction, which indicates the information extraction strategy to be used to extract information from the source data for constructing the knowledge graph; extracting information from the source data using the strategy indicated by the information extraction instruction, to obtain multiple tuples, each tuple including information indicating the entity type of an entity, entity attribute information, and association relationship information; and constructing a knowledge graph based on the multiple tuples, where the knowledge graph records the entities included in the source data and the relationships between different entities.
- By receiving an information extraction instruction, the knowledge graph construction method determines the information extraction strategy to use on the source data for constructing the knowledge graph, uses that strategy to extract information from the source data to obtain multiple tuples, and then constructs a knowledge graph based on the tuples.
- Because information extraction strategies can be configured according to business needs, different strategies can be adopted for source data in different fields, so that a knowledge graph can be constructed from source data in any of those fields. This broadens the applicable scope of the knowledge graph construction method and improves the flexibility of constructing the knowledge graph.
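The flow described above (an instruction selects an extraction strategy, the strategy yields tuples, and the tuples are assembled into a graph) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Triple` type, the two strategy functions, the `STRATEGIES` registry, and the record formats are all hypothetical names and formats chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Triple:
    """One extracted tuple; field names are illustrative assumptions."""
    subject: str
    predicate: str
    obj: str


def csv_strategy(records: Iterable[str]) -> List[Triple]:
    """Hypothetical strategy: each record is 'subject, predicate, object'."""
    triples = []
    for record in records:
        parts = [part.strip() for part in record.split(",")]
        if len(parts) == 3:
            triples.append(Triple(*parts))
    return triples


def arrow_strategy(records: Iterable[str]) -> List[Triple]:
    """Hypothetical strategy: each record is 'subject -predicate-> object'."""
    triples = []
    for record in records:
        subject, sep, rest = record.partition(" -")
        predicate, sep2, obj = rest.partition("-> ")
        if sep and sep2:
            triples.append(Triple(subject.strip(), predicate.strip(), obj.strip()))
    return triples


# The "information extraction instruction" is modeled as a key into this registry.
STRATEGIES: Dict[str, Callable[[Iterable[str]], List[Triple]]] = {
    "csv": csv_strategy,
    "arrow": arrow_strategy,
}


def build_graph(instruction: str, source: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Extract triples with the indicated strategy and group them by subject."""
    strategy = STRATEGIES[instruction]
    graph: Dict[str, List[Tuple[str, str]]] = {}
    for triple in strategy(source):
        graph.setdefault(triple.subject, []).append((triple.predicate, triple.obj))
    return graph
```

Because the strategy is looked up at call time, source data from a new field only needs a new entry in the registry; the graph-building step is unchanged.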
- The method may further include: obtaining the knowledge graph ontology model to be used when constructing the knowledge graph, where the ontology model defines the standardized description of the tuple data in the knowledge graph; receiving a mapping strategy instruction, which indicates the mapping strategy for performing association mapping on the multiple tuples according to the standardized description of the tuple data; and performing association mapping on the multiple tuples according to the standardized description of the tuple data and the mapping strategy indicated by the mapping strategy instruction, to obtain multiple tuples expressed in the standardized description.
- Correspondingly, constructing a knowledge graph based on the multiple tuples includes: constructing the knowledge graph based on the multiple tuples after standardized description.
- Association mapping is also called knowledge mapping.
- Knowledge mapping refers to establishing mapping relationships between extracted elements and ontology elements, and using the ontology elements to give a standardized description of the corresponding extracted elements according to those mapping relationships. Knowledge mapping gives the tuple data a unified representation and improves the readability of the knowledge graph.
- The matching degree between each extracted element and each ontology element can be obtained.
- When the matching degree between an extracted element and an ontology element is greater than the matching degree threshold, a mapping relationship between the two can be established, indicating that the ontology element is to be used for the standardized description of the extracted element.
- The user can also configure the mapping strategy through the terminal.
- In this case, the user indicates, through the terminal, the mapping relationship between the extracted elements in the tuple data and the ontology elements defined by the knowledge graph ontology model, and instructs that each ontology element be used to give a standardized description of the extracted elements mapped to it.
- Because the user configures the mapping strategy and the configured strategy is used to perform association mapping on the tuple data, the knowledge graph construction device can apply different mapping strategies to different types of data. This improves the accuracy of the association mapping and therefore the accuracy of the constructed knowledge graph.
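The two mapping routes described above (automatic mapping when the matching degree exceeds a threshold, and mappings configured by the user) can be sketched as follows. The source does not specify how the matching degree is computed; this sketch assumes a string-similarity ratio from Python's standard `difflib`, and the function names, the `manual` parameter, and the default threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable, List, Optional


def matching_degree(extracted: str, ontology_element: str) -> float:
    """Assumed matching degree: case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, extracted.lower(), ontology_element.lower()).ratio()


def map_to_ontology(
    extracted_elements: Iterable[str],
    ontology_elements: List[str],
    threshold: float = 0.8,
    manual: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return {extracted element: ontology element} for elements that can be mapped."""
    manual = manual or {}
    mapping: Dict[str, str] = {}
    for element in extracted_elements:
        # A user-configured mapping takes precedence over automatic matching.
        if element in manual:
            mapping[element] = manual[element]
            continue
        best = max(ontology_elements, key=lambda o: matching_degree(element, o))
        # Map only when the matching degree is greater than the threshold.
        if matching_degree(element, best) > threshold:
            mapping[element] = best
    return mapping
```

Elements that neither exceed the threshold nor appear in the user-configured mapping are left unmapped, so they can be reviewed rather than silently forced onto an ontology element.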
- The method may further include: determining, among the multiple tuples and according to a specified tuple matching strategy, the different tuples that include information indicating the same entity; and merging those different tuples.
- Correspondingly, constructing a knowledge graph based on the multiple tuples includes: constructing the knowledge graph based on the merged tuples.
- Information indicating the same entity may be represented in different ways. If the knowledge graph is constructed directly from the extracted tuples, the same entity expressed in different representations may be treated as different entities, so that the constructed knowledge graph cannot accurately reflect the content of the source data.
- Merging the different tuples that include elements indicating the same entity, and constructing the knowledge graph from the merged tuples, improves the accuracy of the constructed knowledge graph.
- Before determining, according to the specified tuple matching strategy, the different tuples that include information indicating the same entity, the method further includes: receiving a matching strategy instruction, which indicates the matching algorithm and the matching degree threshold used to judge whether different tuples include information indicating the same entity.
- Correspondingly, determining the different tuples that include information indicating the same entity includes: when the matching degree between the entity-indicating information in two tuples, computed with the matching algorithm indicated by the matching strategy instruction, is not less than the matching degree threshold, determining that the two tuples include information indicating the same entity.
- Because the matching algorithm is selected by the matching strategy instruction and then used to determine whether different tuples include elements indicating the same entity, different matching algorithms can be used for elements obtained from data in different fields. This improves the flexibility of knowledge mapping and the accuracy of the computed matching degrees, and in turn the accuracy and comprehensiveness of the constructed knowledge graph.
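The matching-and-merging step above can be sketched as follows: a matching strategy names an algorithm and a threshold, and two records are treated as the same entity when their matching degree is not less than the threshold. The `MATCHERS` registry, the two algorithms in it, and the record layout (`(name, attributes)` pairs) are assumptions for illustration; the source does not name specific matching algorithms.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical matching algorithms selectable by the matching strategy instruction.
MATCHERS: Dict[str, Callable[[str, str], float]] = {
    "exact": lambda a, b: 1.0 if a == b else 0.0,
    "similarity": lambda a, b: SequenceMatcher(None, a.lower(), b.lower()).ratio(),
}


def merge_entities(
    records: Iterable[Tuple[str, Dict[str, str]]],
    algorithm: str = "similarity",
    threshold: float = 0.9,
) -> List[dict]:
    """Merge records whose entity names match with degree >= threshold."""
    matcher = MATCHERS[algorithm]
    merged: List[dict] = []
    for name, attrs in records:
        for entry in merged:
            # "Not less than the threshold" means the same entity.
            if matcher(name, entry["name"]) >= threshold:
                entry["aliases"].add(name)
                entry["attrs"].update(attrs)
                break
        else:
            merged.append({"name": name, "aliases": {name}, "attrs": dict(attrs)})
    return merged
```

With the `"similarity"` algorithm, records that differ only in letter case are merged into one entity and their attributes combined; with `"exact"`, they remain separate.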
- The source data may include multiple channels of data from different sources. That is, the method for constructing a knowledge graph provided by the embodiments of this application can construct a knowledge graph from multiple channels of data.
- In this case, extracting information from the source data using the strategy indicated by the information extraction instruction may include: for each channel of data, extracting information using the information extraction strategy that the instruction indicates for that channel, to obtain the tuples corresponding to the multiple channels of data.
- Correspondingly, constructing a knowledge graph based on the multiple tuples includes: constructing the knowledge graph based on the tuples corresponding to the multiple channels of data. This improves the efficiency of constructing a knowledge graph from multiple channels of data.
- The method may further include: after determining that the source data has been updated, extracting information from the incremental data in the updated source data according to the strategy indicated by the information extraction instruction, to obtain the tuples corresponding to the incremental data; and updating the knowledge graph according to those tuples.
- This reduces the amount of computation required to construct the knowledge graph from the updated source data and improves construction efficiency.
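The incremental update described above can be sketched as follows. This is an illustration under stated assumptions: the source does not say how incremental data is detected, so the sketch simply remembers which source records have already been processed; `IncrementalGraph` and `split_extract` are hypothetical names.

```python
from typing import Callable, Iterable, List, Set, Tuple


def split_extract(records: Iterable[str]) -> List[Tuple[str, ...]]:
    """Hypothetical extraction strategy: 'subject|predicate|object' records."""
    return [tuple(record.split("|")) for record in records]


class IncrementalGraph:
    """Keeps extracted tuples and re-runs extraction only on new records."""

    def __init__(self, extract: Callable[[Iterable[str]], List[Tuple[str, ...]]]):
        self.extract = extract
        self.seen: Set[str] = set()            # source records already processed
        self.triples: Set[Tuple[str, ...]] = set()

    def update(self, source_records: Iterable[str]) -> int:
        """Process only records not seen before; return how many were new."""
        # dict.fromkeys removes duplicates while preserving order.
        increment = [r for r in dict.fromkeys(source_records) if r not in self.seen]
        self.seen.update(increment)
        self.triples.update(self.extract(increment))
        return len(increment)
```

On a second call with the full, updated source, only the records added since the first call are passed to the extraction strategy.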
- Extracting information from the source data using the strategy indicated by the information extraction instruction may include: using the AI model indicated by the information extraction instruction to extract information from the source data.
- The AI model is a trained model whose training samples are annotated with the standardized description of the tuple data in the knowledge graph ontology model, where the ontology model defines the standardized description of the tuple data in the knowledge graph.
- Because the training samples are annotated with the standardized description of the tuple data in the ontology model, the tuple data extracted by the trained AI model is already expressed in terms of the ontology elements defined in the ontology model. This reduces the need for subsequent standardized description of the extracted tuple data based on the ontology elements, simplifies the process of building the knowledge graph, and improves construction efficiency.
- This application provides a knowledge graph construction device. The device includes: a receiving module, configured to receive an information extraction instruction, which indicates the information extraction strategy to be used to extract information from the source data for constructing the knowledge graph; an extraction module, configured to extract information from the source data using the strategy indicated by the information extraction instruction, to obtain multiple tuples, each tuple including information indicating the entity type of an entity, entity attribute information, and association relationship information; and a building module, configured to construct a knowledge graph based on the multiple tuples.
- The knowledge graph records the entities included in the source data and the relationships between different entities.
- The device further includes: an acquisition module, configured to obtain the knowledge graph ontology model to be used when constructing the knowledge graph, where the ontology model defines the standardized description of the tuple data in the knowledge graph. The receiving module is further configured to receive a mapping strategy instruction, which indicates the mapping strategy for performing association mapping on the multiple tuples according to the standardized description of the tuple data. The device also includes a mapping module, configured to perform association mapping on the multiple tuples according to the standardized description of the tuple data and the mapping strategy indicated by the mapping strategy instruction, to obtain multiple tuples expressed in the standardized description.
- The building module is specifically configured to construct the knowledge graph based on the multiple tuples after standardized description.
- The device further includes: a determining module, configured to determine, among the multiple tuples and according to a specified tuple matching strategy, the different tuples that include information indicating the same entity; and a merging module, configured to merge the different tuples that include information indicating the same entity.
- The building module is specifically configured to construct the knowledge graph based on the merged tuples.
- The receiving module is further configured to receive a matching strategy instruction, which indicates the matching algorithm and the matching degree threshold used to judge whether different tuples include information indicating the same entity.
- The determining module is specifically configured to: when the matching degree between the entity-indicating information in two tuples, computed with the matching algorithm indicated by the matching strategy instruction, is not less than the matching degree threshold, determine that the two tuples include information indicating the same entity.
- The source data may include multiple channels of data from different sources.
- The extraction module is specifically configured to: for each channel of data, extract information using the information extraction strategy that the information extraction instruction indicates for that channel, to obtain the tuples corresponding to the multiple channels of data.
- The building module is specifically configured to construct the knowledge graph based on the tuples corresponding to the multiple channels of data.
- The extraction module is further configured to, after determining that the source data has been updated, extract information from the incremental data in the updated source data according to the strategy indicated by the information extraction instruction, to obtain the tuples corresponding to the incremental data.
- The building module is further configured to update the knowledge graph according to the tuples corresponding to the incremental data.
- The extraction module is specifically configured to use the AI model indicated by the information extraction instruction to extract information from the source data. The AI model is a trained model whose training samples are annotated with the standardized description of the tuple data in the knowledge graph ontology model, where the ontology model defines the standardized description of the tuple data in the knowledge graph.
- The present application provides a computing device that includes a processor and a memory. A computer program is stored in the memory, and when the processor executes the computer program, the computing device implements the knowledge graph construction method provided in the first aspect.
- The present application provides a non-volatile storage medium. When the instructions in the storage medium are executed by a processor, the knowledge graph construction method provided in the first aspect is implemented.
- FIG. 1 is a schematic diagram of deployment of a knowledge graph building apparatus provided by an embodiment of the present application
- FIG. 2 is a schematic diagram of deployment of another apparatus for constructing a knowledge graph provided by an embodiment of the present application
- FIG. 3 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
- FIG. 4 is a flowchart of a method for constructing a knowledge graph provided by an embodiment of the present application
- FIG. 5 is a logical block diagram of constructing a knowledge graph based on two channels of data provided by an embodiment of the present application
- FIG. 6 is a schematic diagram of an interface for selecting a knowledge graph ontology model provided by an embodiment of the present application
- FIG. 7 is a schematic diagram of a knowledge graph ontology model provided by an embodiment of the present application.
- FIG. 8 is a schematic diagram of an interface for selecting source data provided by an embodiment of the present application.
- FIG. 9 is a schematic diagram of an interface for selecting an information extraction strategy provided by an embodiment of the present application.
- FIG. 10 is a schematic diagram of an interface for selecting a mapping strategy according to an embodiment of the present application.
- FIG. 11 is a schematic diagram of an interface for selecting a matching strategy provided by an embodiment of the present application.
- FIG. 12 is a schematic diagram of a knowledge graph provided by an embodiment of the present application.
- FIG. 13 is a schematic structural diagram of a knowledge graph construction device provided by an embodiment of the present application.
- FIG. 14 is a schematic structural diagram of a knowledge graph construction device provided by an embodiment of the present application.
- A knowledge graph is a kind of semantic network that describes objective things in the form of a graph.
- A knowledge graph consists of many nodes and the connections between different nodes.
- Nodes represent the entity types or entity attributes of entities such as persons or organizations.
- The connections between nodes (also called edges) indicate that the entities represented by the nodes have a certain association relationship.
- Entities can be represented by entity types, entity attributes, and association relationships.
- The association relationship between the node representing the entity type of an entity and a node representing an attribute of that entity may include the attribution relationship between the entity type and the entity attribute.
- The association relationship between the node representing the entity type of an entity and nodes representing the entity types of other entities may include external connections between the entity and those other entities.
- Knowledge graphs can be applied in a variety of scenarios.
- In information recommendation, information can be recommended based on the knowledge graph.
- In classification, items can be classified based on the knowledge graph.
- In semantic search, the search can be performed based on the knowledge graph.
- In failure analysis, the cause of a failure can be determined from the attributes of each entity and the relationships between entities presented in the knowledge graph.
- Entities are the most basic elements in the knowledge graph. Different entities may have different relationships, and different entities may have different entity attributes.
- For example, for an actor entity, nodes can represent entity types such as the actor's family members, friends, partners, representative works, brokerage company, and graduate college; alternatively, nodes can represent entity attributes of the entity indicated by an entity type, such as the actor's name, height, and nationality.
- The edge between a node representing an entity type and a node representing an entity attribute can represent the attribution relationship between the entity attribute and the entity type.
- The edges between the node representing the actor and the nodes representing family members can represent the husband-and-wife, father-daughter, or parent-child relationship between the actor and a family member.
- The edge between the node representing the actor and a node representing a friend can represent the friendship between the actor and that friend.
- The edge between the node representing the actor and a node representing a partner can represent the cooperative relationship between the actor and that partner.
- The edge between the node representing the actor and a node representing one of the actor's representative works can represent the attribution relationship between the actor and that work.
- Tuple data may include triples, quadruples, quintuples, and so on.
- A triple takes one of two forms: "node-edge-node" or "node-attribute name-attribute value".
- The first, second, and third words in a triple can be regarded as the subject, the predicate, and the object respectively. The subject-predicate-object relationship is the relationship between the first word and the third word in the triple.
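As an illustration of the two triple forms above, the following sketch stores "node-edge-node" triples as edges and "node-attribute name-attribute value" triples as node attributes. The entity names, the predicate names, and the `ATTRIBUTE_NAMES` set used to tell the two forms apart are hypothetical; the source does not prescribe how the forms are distinguished.

```python
from collections import defaultdict

# Each triple is (subject, predicate, object).
TRIPLES = [
    ("Actor A", "spouse", "Person B"),            # node-edge-node
    ("Actor A", "representative_work", "Film C"),  # node-edge-node
    ("Actor A", "height", "180 cm"),              # node-attribute name-attribute value
    ("Actor A", "nationality", "Country D"),      # node-attribute name-attribute value
]

# Assumption: predicates in this set are attribute names; all others are edges.
ATTRIBUTE_NAMES = {"name", "height", "nationality"}


def build_graph(triples, attribute_names):
    """Split triples into relationship edges and per-node attributes."""
    edges = defaultdict(list)
    attributes = defaultdict(dict)
    for subject, predicate, obj in triples:
        if predicate in attribute_names:
            attributes[subject][predicate] = obj
        else:
            edges[subject].append((predicate, obj))
    return dict(edges), dict(attributes)


edges, attributes = build_graph(TRIPLES, ATTRIBUTE_NAMES)
```

In both forms the predicate describes how the subject relates to the object; only the kind of object (another node versus a literal value) differs.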
- The embodiments of this application provide a method for constructing a knowledge graph.
- By receiving an information extraction instruction, the method determines the information extraction strategy to use on the source data for constructing the knowledge graph, and uses that strategy to extract information from the source data to obtain multiple tuples.
- A knowledge graph is then constructed based on the tuples.
- Because information extraction strategies can be configured according to business needs, different strategies can be used for source data in different fields, so that a knowledge graph can be constructed from source data in any of those fields. This broadens the applicable scope of the knowledge graph construction method and improves the flexibility of constructing the knowledge graph.
- The method for constructing a knowledge graph provided by the embodiments of this application may be executed by a knowledge graph construction device.
- The knowledge graph construction device can establish a communication connection with a terminal through a wired or wireless network, so that the terminal can send instructions over the connection to control the device to execute the method provided by the embodiments of this application according to the content of the instructions.
- For example, the terminal may send the knowledge graph construction device an instruction to obtain the source data for constructing the knowledge graph. After receiving the instruction, the device obtains the source data accordingly and executes the knowledge graph construction method provided by the embodiments of this application on that data.
- As another example, the terminal may send an information extraction instruction to the knowledge graph construction device.
- After receiving the information extraction instruction, the knowledge graph construction device can extract information from the source data using the strategy indicated by the instruction and construct a knowledge graph based on the extracted tuples.
- The terminal can be a smartphone, a notebook computer, a tablet computer, a personal desktop computer, a smart camera, and so on.
- A client can be installed on the terminal, and the user can interact with the knowledge graph construction device through the client.
- The user can also interact with the knowledge graph construction device through a web page on the terminal.
- FIG. 1 is a schematic diagram of the deployment of a knowledge graph construction apparatus provided by an embodiment of the present application.
- The knowledge graph construction device 01 can be deployed in a cloud environment.
- The cloud environment is an entity that uses basic resources to provide cloud services to users in the cloud computing mode.
- The cloud environment includes a cloud data center and a cloud service platform, and the cloud data center includes a large number of basic resources owned by the cloud service provider.
- The basic resources of a cloud data center include computing resources, storage resources, network resources, and so on; the computing resources may be a large number of computing devices (for example, servers).
- The knowledge graph construction device 01 can be deployed independently on a server or virtual machine in the cloud data center, or it can be deployed in a distributed manner on multiple servers, on multiple virtual machines, or on a combination of servers and virtual machines in the cloud data center.
- The cloud service provider can abstract the knowledge graph construction device 01 into a cloud service for constructing knowledge graphs on the cloud service platform. After a user purchases the cloud service on the platform, the cloud environment uses the knowledge graph construction device 01 to provide the user with the knowledge graph construction service. The user can upload the source data for constructing the knowledge graph to the cloud environment from the terminal, through the application program interface (API) or the web interface provided by the cloud service platform, so that the knowledge graph construction device 01 can construct a knowledge graph from that source data. After the construction is complete, the knowledge graph construction device 01 can send the constructed knowledge graph to the user's terminal, or store it in the cloud environment, for example by presenting it on the web interface of the cloud service platform for the user to view.
- The knowledge graph construction device 01 can also be logically divided into multiple parts, each with a different function, and the parts can be deployed in different environments in a distributed manner.
- The parts deployed in different environments cooperate to provide users with the function of constructing a knowledge graph.
- For example, the parts can be deployed across any two or all three of the terminal computing device, the edge environment, and the cloud environment.
- Terminal computing devices include terminal servers, smartphones, notebook computers, tablet computers, personal desktop computers, smart cameras, and so on.
- The edge environment is an environment that includes a collection of edge computing devices located closer to the terminal computing device.
- Edge computing devices include edge servers, small edge stations with computing power, and so on.
- This application does not restrict which parts of the knowledge graph construction device 01 are deployed in which environment. In practice, the deployment can be adapted to the computing capabilities of the terminal computing device, the resource occupancy of the edge environment and the cloud environment, or the needs of the specific application.
- When the knowledge graph construction device 01 is a software device, the service provider can release it as an application, which the user can download to their terminal in order to use the functions of the knowledge graph construction device 01 there.
- The knowledge graph construction device 01 can also be deployed on its own on a single computing device in any environment.
- The computing device 100 may include a bus 101, a processor 102, a communication interface 103, and a memory 104.
- The processor 102, the memory 104, and the communication interface 103 communicate through the bus 101.
- The processor 102 may be a hardware chip, which may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
- The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
- The processor 102 may also be a general-purpose processor, for example, a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP.
- The memory 104 may include volatile memory, such as random access memory (RAM).
- The memory 104 may also include non-volatile memory (NVM), such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD).
- The memory 104 stores executable code for constructing a knowledge graph, and the processor 102 reads the executable code in the memory 104 to execute the method for constructing a knowledge graph provided by the embodiments of this application.
- The memory 104 may also include an operating system and other software modules and data required for running processes. The operating system can be LINUX™, UNIX™, WINDOWS™, and so on.
- FIG. 4 is a flowchart of a method for constructing a knowledge graph provided by an embodiment of the application.
- the knowledge graph construction method can construct a knowledge graph based on one channel of data or on multiple channels of data. The following explains the knowledge graph construction process, taking as an example the case where the knowledge graph is constructed from multiple channels of data and the construction process is executed by the knowledge graph construction device.
- the embodiment of the present application also provides a logical block diagram (FIG. 5) for constructing a knowledge graph based on two channels of data (source data 1 and source data 2).
- the knowledge graph construction method includes the following steps:
- Step 401 Receive a knowledge graph construction request.
- a knowledge graph construction request can be sent to the knowledge graph construction device through the terminal to request the knowledge graph construction.
- Step 402 Receive a knowledge graph ontology model instruction.
- the knowledge graph ontology model instruction is used to indicate the knowledge graph ontology model used to construct the knowledge graph.
- the knowledge graph ontology model (also called an ontology) is the skeleton and foundation of the knowledge graph.
- the knowledge graph ontology model is a standardized description of the multiple sets of data (tuple data) in a specific field. That is, the knowledge graph ontology stipulates the standardized description of the elements in the tuple data that should be included in the knowledge graph, such as the standardized description of the entity type that indicates an entity, the standardized description of entity attributes, and the standardized description of association relationships.
- because the knowledge graph ontology stipulates the standardized description of the tuple data that should be included in the knowledge graph, constructing the knowledge graph based on the knowledge graph ontology model avoids useless information in the knowledge graph and ensures that elements such as entity types, entity attributes, and association relationships in the knowledge graph are described in a unified way.
- the elements in the multi-group data obtained through information extraction are called extracted elements, and the standardized description of the elements in the multi-group data is called ontology elements.
- the user can send the knowledge graph ontology model instruction to the knowledge graph construction device through the terminal to indicate the knowledge graph ontology model that needs to be used when constructing the knowledge graph.
- the knowledge graph ontology model instruction may carry the knowledge graph ontology model.
- the knowledge graph ontology model instruction may carry the identification number or storage address of the knowledge graph ontology model, so that the knowledge graph construction device can obtain the corresponding knowledge graph ontology model according to the knowledge graph ontology model instruction.
- the deployment environment of the knowledge graph construction device may store a knowledge graph ontology model. The stored model may have been constructed in the knowledge graph construction device, or constructed in the terminal and then stored in the deployment environment.
- the knowledge graph construction device has the function of creating a knowledge graph ontology model. It can also modify and delete a created knowledge graph ontology model, and add, delete, and modify the ontology elements in a knowledge graph ontology model.
- FIG. 6 is a schematic diagram of the setting interface of a knowledge graph construction device provided by an embodiment of the present application. As shown in FIG. 6, the user can select, in the setting interface, the knowledge graph ontology model to be used when constructing the knowledge graph, and click the "Next" button to trigger the sending of the knowledge graph ontology model instruction.
- Step 403 Acquire the knowledge graph ontology model needed to construct the knowledge graph according to the knowledge graph ontology model instruction.
- after receiving the knowledge graph ontology model instruction, the knowledge graph construction device can obtain the knowledge graph ontology model as indicated by the instruction. For example, when the knowledge graph ontology model instruction carries the identification number of the knowledge graph ontology model, the knowledge graph construction device can search its deployment environment for the model indicated by that identification number, and thereby obtain the knowledge graph ontology model.
- FIG. 7 is a schematic diagram of the knowledge graph ontology model obtained according to the knowledge graph ontology model instruction in step 402.
- the knowledge graph ontology model defines the standardized description of entity types, the standardized description of entity attributes, and the standardized description of association relationships of entities that should be included in the knowledge graph.
- the entity types that should be included in the knowledge graph are: characters, songs, movies, and other entity types.
- the entity attributes of the character include: name, birthday, nationality, height, and gender.
- the entity attributes of the song include: release date and name.
- the entity attributes of the movie include: the time of release and the country of release.
- the relationship between characters includes: spouse relationship, clan member relationship, parent relationship and parent-child relationship.
- the relationship between characters and songs includes: singing relationship.
- the relationship between the characters and the movie includes: the protagonist relationship or the director relationship.
- the relationship between movies and songs includes: use relationship.
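- as a rough illustration, the ontology elements of FIG. 7 listed above could be held in a small data structure such as the following. The class name `OntologyModel`, its fields, and the `allows` helper are assumptions made for this sketch and are not part of the application.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyModel:
    """Hypothetical container for the ontology elements of FIG. 7."""

    # entity type -> standardized entity attribute names
    entity_types: dict[str, list[str]] = field(default_factory=dict)
    # (subject entity type, object entity type) -> standardized association relationships
    relations: dict[tuple[str, str], list[str]] = field(default_factory=dict)

    def allows(self, subject_type: str, relation: str, object_type: str) -> bool:
        """Return True if the ontology defines this relation between the two types."""
        return relation in self.relations.get((subject_type, object_type), [])


ONTOLOGY = OntologyModel(
    entity_types={
        "character": ["name", "birthday", "nationality", "height", "gender"],
        "song": ["release date", "name"],
        "movie": ["time of release", "country of release"],
    },
    relations={
        ("character", "character"): ["spouse", "clan member", "parent", "parent-child"],
        ("character", "song"): ["singing"],
        ("character", "movie"): ["protagonist", "director"],
        ("movie", "song"): ["use"],
    },
)
```

- a tuple whose relation is not defined for its entity types, for example a "singing" relation from a song to a character, would be rejected by `ONTOLOGY.allows`.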
- the knowledge graph construction device may also be configured by default with a knowledge graph ontology model for constructing the knowledge graph.
- in that case, the knowledge graph construction device can obtain the default knowledge graph ontology model and use it to construct the knowledge graph.
- however, if the knowledge graph ontology model is selected in step 402 according to the application requirements, different knowledge graph ontology models can be used for different domains, which improves the fit between the constructed knowledge graph and the domain and thereby improves the accuracy of knowledge graph construction.
- Step 404 Receive a source data indication instruction.
- the terminal may send a source data indication instruction to the knowledge graph construction device, where the source data indication instruction is used to indicate the source data for constructing the knowledge graph.
- the source data indication instruction may carry the source data used to construct the knowledge graph.
- the source data indication instruction may carry the storage address of the source data used to construct the knowledge graph, so as to notify the knowledge graph construction device to obtain the source data from the storage location indicated by the storage address.
- when the knowledge graph construction device is deployed in a cloud environment, the user can store the source data in the cloud data center in advance through the terminal and then send the source data indication instruction to the knowledge graph construction device through the terminal. The instruction carries the storage address of the source data in the cloud data center, to notify the knowledge graph construction device to obtain the source data from the cloud data center according to the storage address.
- the source data indicated by the source data indication instruction may be preprocessed data.
- the preprocessing may include converting the data type of the data into a data type that can be used directly by the knowledge graph construction device. For example, after the terminal stores the source data in the cloud data center, the cloud data center can convert the source data into the JSON data format or into a comma-separated values (CSV) file. After obtaining the source data, the knowledge graph construction device then does not need to perform data conversion and can directly use the preprocessed data, which reduces the amount of data processing when the device constructs the knowledge graph.
- the source data indication instruction may also carry the data category, encoding method, and separator used by the source data, to inform the knowledge graph construction device of this information. It should be noted that the knowledge graph construction device can also automatically identify information such as the data category, encoding method, and separator used by the source data, which is not specifically limited in the embodiment of the present application.
- FIG. 8 is a schematic diagram of a setting interface of a knowledge graph construction device provided by an embodiment of the present application. As shown in FIG. 8, the user can select, in the setting interface, one or more channels of data required to construct the knowledge graph, set the name of the source data, add the storage address of each channel of data, and fill in the data category, the encoding method, and the separator used by the source data. The user can also choose whether the source data has a header row.
- the user can click the "Next" button in the setting interface to trigger the sending of the source data indication instruction.
- the embodiments of the present application do not limit the type and source of the source data used to construct the knowledge graph.
- the source data can be structured data such as tables, or unstructured data such as text.
- the source data can be data from Baidu Encyclopedia, data from Douban Movies, text data from entertainment news, or data from an enterprise's internal database or document library.
- the embodiment of the present application does not limit the method of obtaining source data.
- the data from the webpage can be obtained through a distributed crawler.
- Step 405 Acquire multiple channels of data according to the source data indication instruction.
- the knowledge graph construction device can obtain the source data as indicated by the source data indication instruction. For example, when the source data indication instruction carries the storage address of the source data, the knowledge graph construction device may obtain the source data from the storage location indicated by the storage address. Or, when the source data indication instruction carries the source data, the knowledge graph construction device can directly read the source data carried in the instruction. As an example, suppose that two channels of data are obtained according to the source data indication instruction, and both channels of data are introductory information related to Zhang XX 1. Table 1 is one channel of data obtained by the knowledge graph construction device from a website according to the source data indication instruction, and Table 2 is another channel of data obtained from a database according to the source data indication instruction.
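- the two ways of obtaining source data described above can be sketched as follows. This is only a minimal sketch under stated assumptions: the instruction is modeled as a plain dictionary with hypothetical keys (`data`, `address`, `category`, `encoding`, `separator`), the storage address is treated as a local file path, and the sample rows are placeholders rather than the contents of Table 1 or Table 2.

```python
import csv
import io
import json


def load_source_data(instruction: dict) -> list[dict]:
    """Return the rows of one channel of source data.

    `instruction` is a hypothetical dict form of the source data indication
    instruction. It carries either the data itself ("data") or a storage
    address ("address", treated here as a local file path).
    """
    encoding = instruction.get("encoding", "utf-8")
    if "data" in instruction:
        # The instruction carries the source data directly.
        text = instruction["data"]
    else:
        # The instruction carries a storage address: read from that location.
        with open(instruction["address"], encoding=encoding) as handle:
            text = handle.read()

    if instruction.get("category", "csv") == "json":
        return json.loads(text)
    # CSV with a header row; the separator comes from the instruction.
    reader = csv.DictReader(io.StringIO(text), delimiter=instruction.get("separator", ","))
    return list(reader)


# Two channels of placeholder data, one CSV and one JSON.
channels = [
    load_source_data({"data": "name;gender\nZhang XX 1;value A\n", "separator": ";"}),
    load_source_data({"data": '[{"name": "Zhang XX 1"}]', "category": "json"}),
]
```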
- Step 406 Receive an information extraction instruction.
- the information extraction instruction is used to indicate the information extraction strategy adopted for information extraction of the source data.
- Information extraction refers to extracting multiple sets of data from source data.
- the multi-group data may include: information indicating the entity type of the entity, information of entity attributes, information of association relationship, and the like.
- one way for the information extraction instruction to indicate the information extraction strategy is for the instruction to carry the algorithm identifier of an information extraction algorithm.
- the knowledge graph construction device pre-stores the program instructions of multiple candidate information extraction algorithms. After receiving the algorithm identifier carried in the information extraction instruction, the knowledge graph construction device can determine, among the multiple candidate information extraction algorithms, the algorithm indicated by the algorithm identifier, and use that algorithm to extract information from the source data.
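- selecting a pre-stored algorithm by its identifier can be sketched as a simple registry lookup. The registry, the `register` decorator, the identifier `colon_rule`, and the toy rule itself are all hypothetical and serve only to illustrate the lookup; the sample text is a placeholder.

```python
from typing import Callable

Triple = tuple[str, str, str]
Extractor = Callable[[str], list[Triple]]

# Hypothetical registry of pre-stored candidate information extraction algorithms.
CANDIDATE_ALGORITHMS: dict[str, Extractor] = {}


def register(algorithm_id: str) -> Callable[[Extractor], Extractor]:
    """Store an extraction function under the given algorithm identifier."""
    def decorator(func: Extractor) -> Extractor:
        CANDIDATE_ALGORITHMS[algorithm_id] = func
        return func
    return decorator


@register("colon_rule")
def colon_rule(text: str) -> list[Triple]:
    """Toy fixed rule: every 'attribute: value' line describes the same entity."""
    pairs = [line.split(":", 1) for line in text.splitlines() if ":" in line]
    fields = {key.strip(): value.strip() for key, value in pairs}
    entity = fields.get("name", "")
    return [(entity, key, value) for key, value in fields.items() if key != "name"]


def extract(instruction: dict, source_text: str) -> list[Triple]:
    """Look up the algorithm named in the instruction and run it on the source data."""
    algorithm = CANDIDATE_ALGORITHMS[instruction["algorithm_id"]]
    return algorithm(source_text)


triples = extract({"algorithm_id": "colon_rule"}, "name: Zhang XX 1\ngender: value A")
```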
- the information extraction strategies adopted for information extraction on the multiple channels of data may be the same or different, which is not specifically limited in the embodiment of the present application.
- the information extraction instruction may be triggered by performing a specified operation after selecting the information extraction algorithm in the setting interface of the knowledge graph construction device.
- FIG. 9 is a schematic diagram of the setting interface of a knowledge graph construction device provided by an embodiment of the present application. As shown in FIG. 9, the user can select corresponding information extraction strategies for different source data in the setting interface, and click the "Next" button to trigger the sending of the information extraction instruction.
- Step 407 Use the information extraction strategy indicated by the information extraction instruction for each channel of data to perform information extraction on that channel of data, to obtain multiple sets of tuple data corresponding to each channel of data.
- the information extraction strategy used when extracting information for different types of data can be different.
- fixed rules can be used for information extraction
- an artificial intelligence (AI) model can be used for information extraction.
- the expression of the fixed rules may include: expression through a general algorithm model, preset plug-in scripts, and configured function plug-ins.
- the fixed rule may be a regular expression, a rule function, or a semantic-based analysis method.
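- a fixed rule in the form of regular expressions can be sketched as follows. The patterns, the predicate names, and the English sample sentence are made up for illustration; only the entity names are borrowed from the example used elsewhere in this description.

```python
import re

# Hypothetical fixed rules: one regular expression per predicate.
RULES = {
    "author": re.compile(r"(?P<subject>.+?) was written by (?P<object>.+?)[.,]"),
    "publisher": re.compile(r"(?P<subject>.+?) was published by (?P<object>.+?)[.,]"),
}


def extract_with_rules(sentence: str) -> list[tuple[str, str, str]]:
    """Apply every rule to the sentence and collect (subject, predicate, object) triples."""
    triples = []
    for predicate, pattern in RULES.items():
        for match in pattern.finditer(sentence):
            triples.append((match["subject"].strip(), predicate, match["object"].strip()))
    return triples


result = extract_with_rules("Forest News-Autumn was written by Vi Bianchi.")
```

- rules of this kind are cheap to run and easy to audit, but each new phrasing needs a new pattern, which is one reason an AI model may be preferred for free text.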
- information can be extracted according to rules that adapt to changes in the data.
- an AI model can be used for information extraction.
- annotated samples can be used to train the AI model to ensure that the AI model has better information extraction performance.
- annotated samples can be annotated using ontology elements in the ontology model of the knowledge graph.
- in this way, the tuple data extracted by the AI model is already expressed with the ontology elements defined in the knowledge graph ontology model, which reduces the subsequent process of standardizing the extracted tuple data based on ontology elements, simplifies the process of constructing the knowledge graph, and improves the efficiency of knowledge graph construction.
- the knowledge graph construction device may also be configured with a function plug-in customization feature.
- the function plug-in customization feature means that, when the knowledge graph construction device is deployed, an input interface and an output interface are reserved for accessing function plug-ins, and the conditions that the input interface and output interface need to meet are stipulated, so that users can meet the application requirements
- the following uses the AI model for information extraction as an example to illustrate the implementation process of information extraction for three information extraction scenarios.
- the three information extraction scenarios are: information extraction scenarios under mode constraints, open information extraction scenarios, and event extraction scenarios.
- each information extraction process extracts a specified type of multiple sets of data.
- a predicate model, a subject model, and an object model are used in order to extract information from the data to be extracted.
- the data to be extracted may be part of the data in the source data, for example, it may be a sentence in the source data.
- the predicate model is used to determine whether tuple data of the specified type exists in the data to be extracted.
- the input of the predicate model is the data to be extracted, and the output of the predicate model is the result of whether tuple data of the specified type exists in the data to be extracted.
- the subject model is used to extract the subject of the tuple data of the specified type from the data to be extracted, when tuple data of the specified type exists in the data to be extracted.
- the input of the subject model is the data to be extracted and the type information of the specified type of tuple data.
- the output of the subject model is the subject of the specified type of tuple data.
- the object model is used to extract the object of the tuple data of the specified type from the data to be extracted, when tuple data of the specified type exists in the data to be extracted.
- the input of the object model is the data to be extracted, the type description of the specified type of tuple data, and the subject of the specified type of tuple data.
- the output of the object model is the object of the specified type of tuple data.
- the predicate model, subject model and object model all have an input layer, a feature extraction layer and an output layer.
- the input layer is used to divide the data to be extracted into characters or words, use a vector to represent each part of the divided data, and indicate the position of each part of the divided data in the data to be extracted (that is, a position embedding function).
- the feature extraction layer is used to extract the features of the vector input from the input layer.
- the output layer is used to determine the type of each part of the divided data according to the features extracted by the feature extraction layer.
- the input layers of the predicate model, subject model, and object model can all be implemented using a Bert model (a language representation model).
- the feature extraction layers of the predicate model, the subject model, and the object model can all be implemented using the dilated gated convolutional neural network (DGCNN) model.
- the output layers of the predicate model, the subject model, and the object model can all be implemented using the Sigmoid function (an S-shaped function).
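- the role of a Sigmoid output layer can be illustrated with a small sketch that turns one score per divided part into a keep/discard decision. The scores below are made up, the 0.5 threshold is an assumption, and the sketch stands in for the output of a trained feature extraction layer rather than reproducing it.

```python
import math


def sigmoid(x: float) -> float:
    """S-shaped function mapping any real score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tag_tokens(tokens: list[str], scores: list[float], threshold: float = 0.5) -> list[str]:
    """Keep the tokens whose sigmoid-activated score reaches the threshold."""
    return [token for token, score in zip(tokens, scores) if sigmoid(score) >= threshold]


# Made-up scores, as if produced by the feature extraction layer for a subject model.
selected = tag_tokens(["Vi", "Bianchi", "wrote"], [2.0, 1.5, -3.0])
```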
- for example, when the specified types of triple data are (book, author, person), (book, publisher, publisher), and (person, country, nationality), the results extracted from the above sentence are (Forest News-Autumn, Author, Vi Bianchi), (Forest News-Autumn, Publishing House, 21st Century Press), and (Vi Bianchi, Nationality, Soviet Union).
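- the order in which the three models are applied in this scenario can be sketched as follows. Only the control flow follows the description; the three `toy_*` functions are hypothetical stand-ins for trained models, and the sample sentence is made up.

```python
from typing import Callable, Optional

Triple = tuple[str, str, str]


def extract_triple(
    text: str,
    relation_type: str,
    predicate_model: Callable[[str], bool],
    subject_model: Callable[[str, str], Optional[str]],
    object_model: Callable[[str, str, str], Optional[str]],
) -> Optional[Triple]:
    """Run the three models in order for one specified type of tuple data."""
    # 1. Predicate model: does the text contain a tuple of the specified type?
    if not predicate_model(text):
        return None
    # 2. Subject model: text + type information -> subject.
    subject = subject_model(text, relation_type)
    if subject is None:
        return None
    # 3. Object model: text + type information + subject -> object.
    obj = object_model(text, relation_type, subject)
    return None if obj is None else (subject, relation_type, obj)


SEPARATOR = ", written by "  # toy cue used by the stand-in models below


def toy_predicate(text: str) -> bool:
    return SEPARATOR in text


def toy_subject(text: str, relation_type: str) -> Optional[str]:
    return text.split(SEPARATOR)[0] or None


def toy_object(text: str, relation_type: str, subject: str) -> Optional[str]:
    return text.split(SEPARATOR, 1)[1].rstrip(".") or None


result = extract_triple(
    "Forest News-Autumn, written by Vi Bianchi.", "author",
    toy_predicate, toy_subject, toy_object,
)
```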
- the predicate model, subject model and object model are used in order to extract information from the data to be extracted.
- the predicate model is used to extract predicates of multiple sets of data from the data to be extracted.
- the input of the predicate model is the data to be extracted, and the output of the predicate model is the predicate of the tuple data.
- the subject model is used to extract the subject of the tuple data from the data to be extracted.
- the input of the subject model is the data to be extracted and the predicate of the tuple data.
- the output of the subject model is the subject of the tuple data.
- the object model is used to extract the object of the tuple data from the data to be extracted.
- the input of the object model is the data to be extracted, and the subject and the predicate of the tuple data.
- the output of the object model is the object of the tuple data.
- the implementation of the predicate model, subject model, and object model can refer to the implementation of the predicate model, subject model, and object model in the information extraction scenario under the aforementioned mode constraints.
- the data extracted each time is an event composed of multiple sets of data of a specified type.
- event types and event attributes need to be defined in advance.
- the information extraction logic is: first identify the trigger word and event type of the event, then extract the event elements, and determine the role of each event element.
- the subject model, the predicate model and the object model are used in turn to extract information from the data to be extracted.
- the subject model is used to determine whether there are predefined event types and trigger words in the data to be extracted.
- the input of the subject model is the data to be extracted.
- the output of the subject model is the result of whether there is a predefined event type in the data to be extracted.
- the predicate model is used to determine whether there are predefined event attributes in the data to be extracted.
- the input of the predicate model is the data to be extracted and the type information of the predefined event type, and the output of the predicate model is the event attributes existing in the data to be extracted.
- the object model is used to extract the attribute value of the event attribute from the data to be extracted.
- the input of the object model is the data to be extracted, the type information of the predefined event type, and the attribute information of the event attributes existing in the data to be extracted.
- the output of the object model is the attribute value of each event attribute.
- the output of the subject model, predicate model, and object model constitute an event.
- the implementation of the predicate model, subject model, and object model can refer to the implementation of the predicate model, subject model, and object model in the information extraction scenario under the aforementioned mode constraints.
- the data to be extracted is "Banana Company will hold a new product launch conference at 10 a.m. Western time on September 12 (1 a.m. Beijing time on September 13).
- the venue for the launch will be the newly built Steve Jobs Theater.
- Banana Company will release ichne8, ichne7s, ichne7s Plus, ichnech 3 and the new ichne TV at this press conference.”
- suppose the event type is defined as "press conference", and the event attributes include "time", "location", "company", and "product".
- the subject model is used to determine whether the event type "press conference" appears in the data to be extracted. Its input is the data to be extracted, and its output is the result of whether the event type "press conference" exists in the data to be extracted. The subject model can also mark the trigger word "new product release" in the data to be extracted, to distinguish multiple events of the same type that may occur in the data to be extracted.
- the predicate model is used to determine, according to the event type that appears in the data to be extracted, whether the event attributes "time", "location", "company", and "product" appear in the data to be extracted. Its input is the data to be extracted and the type information of the event type, and its output is the event attributes existing in the data to be extracted.
- the object model is used to extract the attribute value of the event attribute from the data to be extracted.
- its input is the data to be extracted, the event type "press conference", and the event attributes "time", "location", "company", and "product".
- its output is the attribute value of each event attribute in the data to be extracted. For example, for the event attribute "time", the output is: 10 a.m. Western time on September 12; for the event attribute "location", the output is: Steve Jobs Theater; for the event attribute "company", the output is: Banana Company; and for the event attribute "product", the output is: ichne8, ichne7s, ichne7s Plus, ichnech 3 and the new ichne TV.
- from these outputs, triple data can be obtained: (press conference, company, Banana Company), (press conference, time, 10 a.m. Western time on September 12), (press conference, location, Steve Jobs Theater), (press conference, product, ichne8), (press conference, product, ichne7s), etc.
- These triples of data constitute the result of event extraction:
- Event type: press conference
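- assembling the outputs of the three models into triple data can be sketched as follows. The function name and the dictionary layout are assumptions made for this sketch; the values are taken from the example above and shortened to two products.

```python
def event_to_triples(event_type: str, attributes: dict[str, list[str]]) -> list[tuple[str, str, str]]:
    """Flatten an extracted event into (event type, event attribute, value) triples."""
    return [
        (event_type, attribute, value)
        for attribute, values in attributes.items()
        for value in values
    ]


# Attribute values as they would be returned by the object model.
event_attributes = {
    "company": ["Banana Company"],
    "time": ["10 a.m. Western time on September 12"],
    "location": ["Steve Jobs Theater"],
    "product": ["ichne8", "ichne7s"],
}
event_triples = event_to_triples("press conference", event_attributes)
```

- an attribute with several values, such as "product", yields one triple per value, which matches the triples listed above.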
- the knowledge graph construction device may be configured with an information extraction strategy by default. When step 406 is not performed, in this step 407, the knowledge graph construction device may use the default configuration information extraction strategy to perform information extraction on the source data.
- the knowledge graph construction device can adopt different information extraction strategies for source data in different fields, which improves the accuracy of the information extracted from the source data, ensures the accuracy of the knowledge graph constructed from source data in different fields, ensures the applicable scope of the knowledge graph construction method, and improves the flexibility of knowledge graph construction.
- Step 408 Receive a mapping strategy instruction.
- the mapping strategy instruction is used to indicate a mapping strategy for associative mapping (also called knowledge mapping) of multiple sets of data according to the ontology element.
- Knowledge mapping refers to establishing a mapping relationship between extracted elements and ontology elements, and using the ontology elements to standardize the description of the corresponding extracted elements according to the mapping relationship. For example, when the standardized expression of the subject in the tuple data defined by the knowledge graph ontology model is "name", and the subject in the extracted tuple data is "name", a mapping relationship between the extracted "name" and the ontology "name" can be established according to the mapping strategy, and the extracted "name" is then standardized as the ontology "name" according to the mapping relationship.
- the mapping strategies corresponding to the multiple channels may be the same or different, which is not specifically limited in the embodiment of the present application.
- the knowledge graph construction device can obtain the matching degree between each extracted element and the ontology element.
- the knowledge graph construction device can establish a mapping relationship between the extracted element and the ontology element, and indicate that the ontology element is to be used for the standardized description of the extracted element. For example, when the matching degree between the extracted element "name" and the ontology element "name" is greater than the matching degree threshold, the mapping relationship between the extracted "name" and the ontology "name" can be established, and the extracted "name" can be standardized as the ontology "name" based on the mapping relationship.
- the mapping strategy instruction is used to instruct the establishment of the mapping relationship between the ontology element and the extracted element according to the matching degree, and the matching degree algorithm used to obtain the matching degree.
- the mapping strategy instruction may instruct to establish a mapping relationship between the ontology element and the extracted element according to the matching degree, and the matching degree algorithm used to obtain the matching degree may be an edit distance similarity algorithm.
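- a mapping strategy based on edit distance similarity can be sketched as follows. Normalizing the distance by the longer string length, the 0.8 threshold, and the sample element names are assumptions of this sketch; the application only names the edit distance similarity algorithm.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalize the distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def build_mapping(extracted: list[str], ontology: list[str], threshold: float) -> dict[str, str]:
    """Map each extracted element to its best ontology element above the threshold."""
    mapping = {}
    for element in extracted:
        best = max(ontology, key=lambda candidate: similarity(element, candidate))
        if similarity(element, best) > threshold:
            mapping[element] = best
    return mapping


mapping = build_mapping(["birth day", "full name"], ["birthday", "name", "gender"], 0.8)
```

- here "birth day" is one edit away from "birthday" and is mapped, while "full name" stays unmapped because no ontology element is similar enough at this threshold.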
- the user can configure the mapping strategy in the setting interface of the knowledge graph construction device through the terminal.
- the implementation process includes: the user indicates, through the terminal, the mapping relationship between extracted elements and ontology elements, and indicates that the ontology elements are to be used for the standardized description of the extracted elements that have the mapping relationship. After completing the configuration, the user can trigger the sending of the mapping strategy instruction by performing a specified operation in the setting interface.
- the process of configuring the mapping strategy is essentially a process of indicating, for each determined ontology element, the extracted elements that have a mapping relationship with it.
- FIG. 10 is a schematic diagram of a setting interface of a knowledge graph building apparatus provided by an embodiment of the present application.
- the user can add extraction elements that have a mapping relationship with ontology elements in the setting interface.
- for example, for an entity type in the known ontology elements (namely the ontology entity type), the entity type in the extracted elements that has a mapping relationship with it (namely the extracted entity type) can be added to map the entity type. For an association relationship in the known ontology elements, the association relationship in the extracted elements that has a mapping relationship with it (namely the extracted association relationship) can be added to map the association relationship. For the entity attributes in the known ontology elements (namely the ontology entity attributes), the entity attributes in the extracted elements that have a mapping relationship with them can be added to perform knowledge mapping on the entity attributes.
- Step 409 According to the mapping strategy indicated by the mapping strategy instruction and the standardized description of tuple data, perform association mapping on the tuple data extracted from each channel of data, to obtain tuple data described using the standardized description.
- the knowledge graph construction device can perform knowledge mapping on the multiple sets of data according to the ontology element according to the mapping strategy indicated by the mapping strategy instruction, and obtain multiple sets of data for standardized description using the ontology element.
- the extracted elements can be standardized and described according to the ontology elements defined by the knowledge graph ontology model, which realizes the unified representation of the extracted elements and improves the readability of the knowledge graph.
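- once a mapping relationship is available, applying it to the extracted tuple data is a simple rewrite. The sketch below assumes triples of the form (subject, predicate, object) and rewrites only the predicate position; the mapping and the sample triple are placeholders.

```python
def standardize(
    triples: list[tuple[str, str, str]],
    mapping: dict[str, str],
) -> list[tuple[str, str, str]]:
    """Replace each extracted predicate with its ontology element, if one is mapped."""
    return [(subject, mapping.get(predicate, predicate), obj) for subject, predicate, obj in triples]


standardized = standardize(
    [("Zhang XX 1", "birth day", "value A"), ("Zhang XX 1", "gender", "value B")],
    {"birth day": "birthday"},
)
```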
- the knowledge graph construction device may be configured with a mapping strategy by default. When step 408 is not performed, in step 409, the knowledge graph construction device may use the default mapping strategy to perform association mapping on the tuple data. However, by selecting the mapping strategy and using the selected mapping strategy for association mapping, the knowledge graph construction device can use different mapping strategies for different types of data, which improves the accuracy of the association mapping of the tuple data and thereby the accuracy of knowledge graph construction.
- Step 410 Receive a matching strategy instruction.
- the representation of the information used to indicate the same entity may be different. If the knowledge graph is constructed directly based on the extracted multiple sets of data, the same entity using different representations may be regarded as different Entities, resulting in the constructed knowledge graph cannot accurately reflect the content embodied in the source data. Therefore, before constructing the knowledge graph based on the multiple sets of data, it can also be judged whether different multiple sets of data include elements for indicating the same entity, and different multiple sets of data including elements for indicating the same entity can be merged (also It is called knowledge conflation, so as to construct a knowledge graph based on the multi-group data after merging processing, thereby improving the accuracy of the constructed knowledge graph.
- For example, the entity type information obtained by information extraction from the source data shown in Table 1 is "Name: Chapter 1", and the entity type information obtained from the source data shown in Table 2 is "Name: 1 XX Chapter". Although the two are expressed differently, both indicate the same entity, so knowledge fusion can be performed on them.
- The matching strategy instruction indicates the matching algorithm and the matching degree threshold used to judge whether different tuples include elements indicating the same entity.
- The knowledge graph construction device can obtain the matching degree of the elements in different tuples according to the matching algorithm. When that matching degree is not less than the matching degree threshold, the elements in the different tuples are determined to indicate the same entity, and those elements can then be merged.
- FIG. 11 is a schematic diagram of the setting interface of a knowledge graph construction device provided by an embodiment of the present application. As shown in FIG. 11, the user can select, in the setting interface, the matching algorithm and matching degree threshold to be used for different elements when knowledge fusion is performed.
- The matching algorithm and matching degree threshold can be set separately for different entity attributes of an entity. For an entity with multiple entity attributes, the judgment of whether it and another entity are the same entity can be the "integration" of the results of the matching algorithms corresponding to its different entity attributes, for example the intersection of those results.
- each attribute can also be configured with multiple matching algorithms. After the setting is completed, you can click the "Next" button to trigger the matching strategy instruction.
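The per-attribute matching described above can be sketched as follows. This is a minimal illustration, not the device's actual implementation: the attribute names, the two matching algorithms (`fuzzy`, based on Python's `difflib`, and `exact`), the thresholds, and the rule that a missing attribute counts as a miss are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def fuzzy(a: str, b: str) -> float:
    """Matching degree in [0, 1] from difflib's similarity ratio."""
    return SequenceMatcher(None, a, b).ratio()


def exact(a: str, b: str) -> float:
    """Matching degree of 1.0 only when the two values are identical."""
    return 1.0 if a == b else 0.0


# Hypothetical matching strategy: attribute -> (matching algorithm, threshold).
MATCHING_STRATEGY = {
    "name": (fuzzy, 0.6),
    "height": (exact, 1.0),
}


def same_entity(e1: dict, e2: dict, strategy=MATCHING_STRATEGY) -> bool:
    """Intersection of the per-attribute results: every configured attribute
    must reach its threshold (assumption: a missing attribute is a miss)."""
    for attr, (algorithm, threshold) in strategy.items():
        if attr not in e1 or attr not in e2:
            return False
        # "Not less than the threshold" means the attribute matches.
        if algorithm(e1[attr], e2[attr]) < threshold:
            return False
    return True
```

Using the intersection makes the decision conservative: two records are treated as the same entity only when every configured attribute agrees.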
- Step 411: According to the tuple data matching strategy indicated by the matching strategy instruction, determine, among the multiple tuples after standardized description, the different tuples that include elements indicating the same entity, and merge those tuples to obtain multiple merged tuples.
- Merging different tuples that include elements indicating the same entity means representing an entity that appears under different representations with one and the same representation, so that all elements indicating that entity are represented identically.
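Merging can be pictured as rewriting every alias of an entity to one canonical representation and then dropping the duplicates that result. The sketch below assumes the matching step has already produced an alias table; the function name `merge_triples` and the alias table format are hypothetical.

```python
def merge_triples(triples, alias_to_canonical):
    """Rewrite aliases to a canonical name and drop duplicate triples.

    `alias_to_canonical` is assumed to come from the matching step,
    e.g. {"1 XX Chapter": "Zhang XX1"}.
    """
    merged = []
    seen = set()
    for subject, predicate, obj in triples:
        # Subjects and objects may both be aliases of a known entity.
        subject = alias_to_canonical.get(subject, subject)
        obj = alias_to_canonical.get(obj, obj)
        triple = (subject, predicate, obj)
        if triple not in seen:
            seen.add(triple)
            merged.append(triple)
    return merged
```

The order of first appearance is kept, so the merged list stays easy to compare with the extracted input.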
- The triple data obtained by extracting information from the source data shown in Table 1 are (Zhang XX1, height, 164 cm), (Zhang XX1, gender, female), (Zhang XX1, nationality, Chinese), (Zhang XX1, birthday, February 9, 1979), (Zhang XX1, sibling, Zhang XX2), (Zhang XX1, starring, My Father and Mother), and (Zhang XX1, starring, Crouching Tiger, Hidden Dragon).
- The triple data obtained by extracting information from the source data shown in Table 2 are (1 XX Chapter, height, 164 cm), (1 XX Chapter, gender, female), (1 XX Chapter, sibling, Zhang XX2), (1 XX Chapter, starring, My Father and Mother), (1 XX Chapter, starring, Hero), (Zhang XX1, starring, Ambush on All Sides), and (1 XX Chapter, singer, Ambush on All Sides).
- the following triple data are obtained: (Zhang XX 1, height, 164 cm), (Zhang XX 1, gender, female), (Zhang XX 1.
- A matching algorithm and a corresponding matching degree threshold may be configured in the knowledge graph construction device by default.
- The knowledge graph construction device may use the default matching algorithm and the corresponding matching degree threshold to determine whether different tuples include elements indicating the same entity.
- The knowledge graph construction device can use different matching algorithms for elements obtained from data in different fields, which improves the flexibility of knowledge mapping and the accuracy of the obtained matching degrees, and thus the accuracy and comprehensiveness of knowledge graph construction.
- Step 412: Construct a knowledge graph based on the multiple merged tuples.
- the knowledge graph records the entities included in the source data and the relationships between different entities.
- the foregoing steps 401 to 411 are all preparations for constructing a knowledge graph.
- The knowledge graph can be constructed based on the multiple tuples that have undergone merging.
- Constructing a knowledge graph from the tuple data can be understood as connecting the merged tuples into a semantic network according to the relationships between their elements.
- Each node in the semantic network corresponds to an entity type or entity attribute in the tuple data, the relationship between nodes corresponds to the association relationship information in the tuple data, the starting point of the arrow between nodes corresponds to the element used as the subject in the tuple data, and the end of the arrow corresponds to the element used as the object.
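As a rough illustration of connecting triples into a semantic network, the sketch below stores the graph as an adjacency dictionary in which each directed edge runs from the subject to the object and carries the predicate as its label. The data structure and the function name `build_graph` are assumptions chosen for brevity; a real device might use a graph database instead.

```python
def build_graph(triples):
    """Build a directed, labeled graph from (subject, predicate, object) triples.

    Returns a dict mapping each node to a list of (predicate, object) edges.
    The arrow starts at the subject and ends at the object.
    """
    graph = {}
    for subject, predicate, obj in triples:
        graph.setdefault(subject, []).append((predicate, obj))
        # Make sure the object also exists as a node, even without out-edges.
        graph.setdefault(obj, [])
    return graph
```

For example, `build_graph([("Zhang XX1", "sibling", "Zhang XX2")])` yields one edge labeled `sibling` from `Zhang XX1` to `Zhang XX2`.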
- FIG. 12 is a schematic diagram of a knowledge graph constructed from the tuple data merged in step 411.
- The knowledge graph records the entity types, entity attributes, and association relationships that the tuple data uses to indicate entities.
- The knowledge graph shows the source data of Table 1 and Table 2 in the form of a graph, which improves the visualization of the source data and makes analysis based on the source data more convenient.
- Step 413: After determining that the source data has been updated, perform information extraction on the incremental data in the updated source data according to the strategy indicated by the information extraction instruction to obtain multiple tuples corresponding to the incremental data, and update the knowledge graph according to those tuples.
- The incremental data of the updated source data relative to the original source data can be obtained, and the constructed knowledge graph can be updated according to the incremental data to obtain the knowledge graph corresponding to the updated source data. For example, information can first be extracted from the incremental data to obtain the corresponding tuples, knowledge mapping can then be performed on those tuples, knowledge fusion can be performed on the mapped tuples, and finally the knowledge graph can be updated based on the fused tuples.
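The incremental update can be sketched as two small steps: find the records that are new in the updated source data, then add only the triples extracted from them to the existing graph. The sketch below is a simplification under stated assumptions: records are hashable values, the graph is an adjacency dictionary of `(predicate, object)` edges, and the knowledge mapping and knowledge fusion steps are omitted. Both function names are hypothetical.

```python
def incremental_records(old_records, new_records):
    """Return the records present in the updated source but not in the old one."""
    old = set(old_records)
    return [record for record in new_records if record not in old]


def update_graph(graph, new_triples):
    """Add triples to an adjacency-dict graph in place; return how many edges were added."""
    added = 0
    for subject, predicate, obj in new_triples:
        edges = graph.setdefault(subject, [])
        graph.setdefault(obj, [])
        if (predicate, obj) not in edges:
            edges.append((predicate, obj))
            added += 1
    return added
```

Because only the delta is extracted and inserted, the cost of an update depends on the amount of changed data rather than on the size of the whole source.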
- The knowledge graph construction method receives an information extraction instruction to determine the information extraction strategy used for extracting information from the source data for constructing the knowledge graph, uses that strategy to extract information from the source data to obtain multiple tuples, and then constructs a knowledge graph based on those tuples.
- Information extraction strategies can be configured according to business needs, and different strategies can be adopted for source data in different fields, so that knowledge graphs can be constructed from source data in different fields. This ensures the applicable scope of the method and improves the flexibility of constructing knowledge graphs.
- The sequence of steps in the knowledge graph construction method provided in the embodiments of the present application can be adjusted appropriately, and steps can be added or removed as the situation requires. For example, whether to perform steps 402, 406, 408, and 410 can be chosen according to application requirements. Any variation that a person familiar with the technical field can readily conceive within the technical scope disclosed in this application falls within the protection scope of this application and is therefore not described further.
- the embodiment of the present application also provides a knowledge graph construction device.
- the knowledge graph construction device 80 may include:
- The receiving module 801 is configured to receive an information extraction instruction, which indicates the information extraction strategy used for extracting information from the source data for constructing the knowledge graph.
- The extraction module 802 is configured to use the information extraction strategy indicated by the information extraction instruction to extract information from the source data to obtain multiple tuples.
- Each tuple includes information indicating the entity type of an entity, entity attribute information, and association relationship information.
- The construction module 803 is configured to construct a knowledge graph based on the multiple tuples, and the knowledge graph records the entities included in the source data and the relationships between different entities.
- the knowledge graph construction device 80 further includes:
- The acquiring module 804 is configured to acquire the knowledge graph ontology model to be used when constructing the knowledge graph, and the knowledge graph ontology model defines the standardized description of tuple data in the knowledge graph.
- The receiving module 801 is further configured to receive a mapping strategy instruction, which indicates the mapping strategy for performing association mapping on the multiple tuples according to the standardized description of tuple data.
- The mapping module 805 is configured to perform association mapping on the multiple tuples according to the standardized description of tuple data and the mapping strategy indicated by the mapping strategy instruction, to obtain multiple tuples described in standardized form.
- The construction module 803 is specifically configured to construct a knowledge graph based on the multiple tuples after standardized description.
- the knowledge graph construction device 80 further includes:
- The determining module 806 is configured to determine, among the multiple tuples, the different tuples that include information indicating the same entity according to the specified tuple data matching strategy.
- The merging module 807 is configured to merge the different tuples that include information indicating the same entity.
- The construction module 803 is specifically configured to construct a knowledge graph based on the multiple tuples after merging.
- The receiving module 801 is further configured to receive a matching strategy instruction, which indicates the matching algorithm and the matching degree threshold used to judge whether different tuples include information indicating the same entity.
- The determining module 806 is specifically configured to: when it is determined, according to the matching algorithm indicated by the matching strategy instruction, that the matching degree of the information indicating entities in two tuples is not less than the matching degree threshold, determine that the two tuples include information indicating the same entity.
- The source data includes multiple channels of data from different sources.
- The extraction module 802 is specifically configured to extract information from each channel of data using the information extraction strategy that the information extraction instruction indicates for that channel, to obtain multiple tuples corresponding to the multiple channels of data.
- The construction module 803 is specifically configured to construct a knowledge graph based on the multiple tuples corresponding to the multiple channels of data.
- The extraction module 802 is further configured to, after determining that the source data has been updated, perform information extraction on the incremental data in the updated source data according to the strategy indicated by the information extraction instruction, to obtain multiple tuples corresponding to the incremental data.
- The construction module 803 is further configured to update the knowledge graph according to the multiple tuples corresponding to the incremental data.
- the extraction module 802 is specifically configured to: use the AI model indicated by the information extraction instruction to extract information from the source data.
- The AI model is a trained model whose training samples are labeled with the standardized description of tuple data in the knowledge graph ontology model, and the knowledge graph ontology model defines the standardized description of tuple data in the knowledge graph.
- The knowledge graph construction device receives an information extraction instruction through the receiving module and determines the information extraction strategy used for extracting information from the source data for constructing the knowledge graph; the extraction module uses that strategy to extract information from the source data to obtain multiple tuples, and the construction module then constructs a knowledge graph based on those tuples.
- The information extraction strategy makes it possible to construct a knowledge graph based on source data in different fields, ensures the applicable scope of the knowledge graph construction method, and improves the flexibility of constructing a knowledge graph.
- An embodiment of the present application also provides a computing device that includes a processor and a memory. The memory stores a computer program, and when the processor executes the computer program, the computing device implements the knowledge graph construction method provided by the embodiments of this application.
- the computing device may be a server or a terminal.
- For the structure of the computing device, refer to the structure of the computing device in FIG. 3, which is not repeated here.
- The computing device can work with an AI platform and a big data platform, using the AI platform to construct, train, and deploy the AI model used in the knowledge graph construction method provided in the embodiments of this application, and obtaining data from the big data platform.
- the embodiment of the present application also provides a storage medium, which is a non-volatile computer-readable storage medium, and when the instructions in the storage medium are executed by the processor, the method for constructing the knowledge graph provided by the embodiment of the present application is implemented.
- the embodiments of the present application also provide a computer program product containing instructions.
- When the computer program product runs on a computer, the computer executes the knowledge graph construction method provided in the embodiments of the present application.
- the program can be stored in a computer-readable storage medium.
- the storage medium mentioned can be a read-only memory, a magnetic disk or an optical disk, etc.
- the terms “first”, “second” and “third” are only used for descriptive purposes, and cannot be understood as indicating or implying relative importance.
- the term “at least one” refers to one or more, and the term “plurality” refers to two or more, unless expressly defined otherwise.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
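This application discloses a knowledge graph construction method, including: receiving an information extraction instruction, which indicates the information extraction strategy to be used for extracting information from the source data for constructing a knowledge graph; then extracting information from the source data using the information extraction strategy indicated by the instruction to obtain multiple tuples, each of which includes information indicating the entity type of an entity, entity attribute information, and association relationship information; and then constructing a knowledge graph from the multiple tuples, where the knowledge graph records the entities included in the source data and the relationships between different entities. This application ensures the applicable scope of the knowledge graph construction method and improves the flexibility of knowledge graph construction.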
Description
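This application relates to the field of cloud computing technology, and in particular to a knowledge graph construction method and device, a computing device, and a storage medium.
More and more enterprises have realized the importance of knowledge to their business and urgently need to organize the knowledge systems in their business to improve work efficiency and results. A knowledge graph (KG) is a form of knowledge organization and knowledge representation, and using knowledge graphs to represent knowledge systems has become a development trend.
In the related art, constructing a knowledge graph requires first designing a knowledge graph ontology model based on the domain knowledge of the field to which the business belongs, then extracting information from the data involved in the business to obtain the information that indicates entities, and then filling the extracted information into the knowledge graph ontology to obtain the knowledge graph.
This construction process is usually implemented by a customized module, which is customized according to the domain requirements of the field to which the business belongs. However, because requirements differ between fields, the customized module is difficult to use for constructing knowledge graphs in other fields, resulting in poor applicability.
Summary of the Invention
This application provides a knowledge graph construction method and device, a computing device, and a storage medium, which can solve the problem of poor applicability of knowledge graph construction methods in the related art.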
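In a first aspect, this application provides a knowledge graph construction method. The method includes: receiving an information extraction instruction, which indicates the information extraction strategy used for extracting information from the source data for constructing a knowledge graph; extracting information from the source data using the information extraction strategy indicated by the information extraction instruction to obtain multiple tuples, each of which includes information indicating the entity type of an entity, entity attribute information, and association relationship information; and constructing a knowledge graph based on the multiple tuples, where the knowledge graph records the entities included in the source data and the relationships between different entities.
The knowledge graph construction method provided in the embodiments of this application receives an information extraction instruction to determine the information extraction strategy used for the source data, extracts information from the source data with that strategy to obtain multiple tuples, and then constructs a knowledge graph from those tuples. Compared with the related art, the information extraction strategy can be configured according to business needs, and different information extraction strategies can be used for source data in different fields, so that knowledge graphs can be constructed from source data in different fields. This ensures the applicable scope of the method and improves the flexibility of constructing knowledge graphs.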
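Optionally, before constructing the knowledge graph based on the multiple tuples, the method may further include: obtaining the knowledge graph ontology model to be used when constructing the knowledge graph, where the knowledge graph ontology model defines the standardized description of tuple data in the knowledge graph; receiving a mapping strategy instruction, which indicates the mapping strategy for performing association mapping on the multiple tuples according to the standardized description of tuple data; and performing association mapping on the multiple tuples according to the standardized description of tuple data and the mapping strategy indicated by the mapping strategy instruction, to obtain multiple tuples described in standardized form. Correspondingly, constructing the knowledge graph based on the multiple tuples includes: constructing the knowledge graph based on the multiple tuples after standardized description.
Association mapping is also called knowledge mapping. Knowledge mapping means establishing a mapping relationship between extracted elements and ontology elements and, according to that mapping relationship, using the ontology element to give a standardized description of the corresponding extracted element. Knowledge mapping enables a unified representation of tuple data and improves the readability of the knowledge graph.
In one implementation of the mapping strategy, the matching degree between each extracted element and the ontology elements can be obtained. When the matching degree between an extracted element and an ontology element is greater than the matching degree threshold, a mapping relationship between the extracted element and that ontology element can be established, and the ontology element is designated for the standardized description of the extracted element.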
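A threshold-based mapping strategy of this kind can be sketched as follows. The ontology element list, the threshold value of 0.7, and the use of Python's `difflib` ratio as the matching degree are assumptions for illustration only; the text does not prescribe a particular matching algorithm.

```python
from difflib import SequenceMatcher

# Hypothetical ontology elements (standardized attribute names).
ONTOLOGY_ELEMENTS = ["name", "birthday", "nationality", "height", "gender"]


def map_element(extracted, ontology_elements=ONTOLOGY_ELEMENTS, threshold=0.7):
    """Return the best-matching ontology element for an extracted element,
    or None when no matching degree is greater than the threshold."""
    best, best_score = None, 0.0
    for element in ontology_elements:
        score = SequenceMatcher(None, extracted.lower(), element).ratio()
        if score > best_score:
            best, best_score = element, score
    return best if best_score > threshold else None
```

Extracted elements that map to `None` can be left for manual mapping by the user, which corresponds to the other implementation of the mapping strategy.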
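In another implementation of the mapping strategy, the user can configure the mapping strategy through a terminal. In this implementation, the user can indicate, through the terminal, the mapping relationships between the extracted elements in the tuple data and the ontology elements defined by the knowledge graph ontology model, and indicate that each ontology element is to be used for the standardized description of the extracted elements mapped to it.
By having the user configure the mapping strategy and using the configured strategy for association mapping of the tuple data, the knowledge graph construction device can use different mapping strategies for different types of data, which improves the accuracy of the association mapping and of knowledge graph construction.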
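Optionally, before constructing the knowledge graph based on the multiple tuples, the method may further include: determining, among the multiple tuples and according to a specified tuple data matching strategy, the different tuples that include information indicating the same entity; and merging the different tuples that include information indicating the same entity. Correspondingly, constructing the knowledge graph based on the multiple tuples includes: constructing the knowledge graph based on the multiple tuples after merging.
When a knowledge graph is constructed from multiple pieces of source data, the information indicating the same entity may be represented in different ways. If the knowledge graph is constructed directly from the extracted tuple data, the same entity under different representations may be treated as different entities, so that the constructed knowledge graph cannot accurately reflect the content of the source data. Merging the different tuples that include elements indicating the same entity, and constructing the knowledge graph from the merged tuples, improves the accuracy of the constructed knowledge graph.
In one implementation, before determining the different tuples that include information indicating the same entity, the method further includes: receiving a matching strategy instruction, which indicates the matching algorithm and the matching degree threshold used to judge whether different tuples include information indicating the same entity. Correspondingly, determining the different tuples that include information indicating the same entity includes: when it is determined, according to the matching algorithm indicated by the matching strategy instruction, that the matching degree of the information indicating entities in two tuples is not less than the matching degree threshold, determining that the two tuples include information indicating the same entity.
Selecting the matching algorithm through the matching strategy instruction, and using the selected algorithm to judge whether different tuples include elements indicating the same entity, allows different matching algorithms to be used for elements obtained from data in different fields. This improves the flexibility of knowledge mapping and the accuracy of the obtained matching degrees, and thus the accuracy and comprehensiveness of knowledge graph construction.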
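Optionally, the source data includes multiple channels of data from different sources. That is, the knowledge graph construction method provided in the embodiments of this application can construct a knowledge graph from multiple channels of data. Correspondingly, extracting information from the source data using the information extraction strategy indicated by the information extraction instruction may include: extracting information from each channel of data using the information extraction strategy that the instruction indicates for that channel, to obtain multiple tuples corresponding to the multiple channels of data. In this case, constructing the knowledge graph based on the multiple tuples includes: constructing the knowledge graph based on the multiple tuples corresponding to the multiple channels of data. This improves the efficiency of constructing a knowledge graph from multiple channels of data.
After constructing the knowledge graph based on the multiple tuples, the method may further include: after determining that the source data has been updated, performing information extraction on the incremental data in the updated source data according to the strategy indicated by the information extraction instruction, to obtain multiple tuples corresponding to the incremental data; and updating the knowledge graph according to those tuples.
Updating the knowledge graph incrementally reduces the amount of computation needed to construct the knowledge graph from the updated source data and improves construction efficiency.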
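In one implementation, extracting information from the source data using the information extraction strategy indicated by the information extraction instruction may include: extracting information from the source data using the AI model indicated by the information extraction instruction. The AI model is a trained model whose training samples are labeled with the standardized description of tuple data in the knowledge graph ontology model, and the knowledge graph ontology model defines the standardized description of tuple data in the knowledge graph.
Because the training samples of the AI model are labeled with the standardized description of tuple data in the knowledge graph ontology model, the tuple data extracted by an AI model trained on those samples is already expressed with the ontology elements defined in the ontology model. This reduces the later work of giving the extracted tuple data a standardized description according to the ontology elements, simplifies the construction process, and improves construction efficiency.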
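In a second aspect, this application provides a knowledge graph construction device. The device includes: a receiving module, configured to receive an information extraction instruction, which indicates the information extraction strategy used for extracting information from the source data for constructing a knowledge graph; an extraction module, configured to extract information from the source data using the information extraction strategy indicated by the information extraction instruction to obtain multiple tuples, each of which includes information indicating the entity type of an entity, entity attribute information, and association relationship information; and a construction module, configured to construct a knowledge graph based on the multiple tuples, where the knowledge graph records the entities included in the source data and the relationships between different entities.
Optionally, the device further includes: an acquiring module, configured to acquire the knowledge graph ontology model to be used when constructing the knowledge graph, where the knowledge graph ontology model defines the standardized description of tuple data in the knowledge graph; the receiving module, further configured to receive a mapping strategy instruction, which indicates the mapping strategy for performing association mapping on the multiple tuples according to the standardized description of tuple data; and a mapping module, configured to perform association mapping on the multiple tuples according to the standardized description of tuple data and the mapping strategy indicated by the mapping strategy instruction, to obtain multiple tuples described in standardized form.
Correspondingly, the construction module is specifically configured to construct the knowledge graph based on the multiple tuples after standardized description.
Optionally, the device further includes: a determining module, configured to determine, among the multiple tuples and according to a specified tuple data matching strategy, the different tuples that include information indicating the same entity; and a merging module, configured to merge the different tuples that include information indicating the same entity.
Correspondingly, the construction module is specifically configured to construct the knowledge graph based on the multiple tuples after merging.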
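Optionally, the receiving module is further configured to receive a matching strategy instruction, which indicates the matching algorithm and the matching degree threshold used to judge whether different tuples include information indicating the same entity.
Correspondingly, the determining module is specifically configured to: when it is determined, according to the matching algorithm indicated by the matching strategy instruction, that the matching degree of the information indicating entities in two tuples is not less than the matching degree threshold, determine that the two tuples include information indicating the same entity.
The source data includes multiple channels of data from different sources. In this case, the extraction module is specifically configured to extract information from each channel of data using the information extraction strategy that the information extraction instruction indicates for that channel, to obtain multiple tuples corresponding to the multiple channels of data.
Correspondingly, the construction module is specifically configured to construct the knowledge graph based on the multiple tuples corresponding to the multiple channels of data.
Optionally, the extraction module is further configured to, after determining that the source data has been updated, perform information extraction on the incremental data in the updated source data according to the strategy indicated by the information extraction instruction, to obtain multiple tuples corresponding to the incremental data.
Correspondingly, the construction module is further configured to update the knowledge graph according to the multiple tuples corresponding to the incremental data.
Optionally, the extraction module is specifically configured to extract information from the source data using the AI model indicated by the information extraction instruction. The AI model is a trained model whose training samples are labeled with the standardized description of tuple data in the knowledge graph ontology model, and the knowledge graph ontology model defines the standardized description of tuple data in the knowledge graph.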
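In a third aspect, this application provides a computing device that includes a processor and a memory. The memory stores a computer program, and when the processor executes the computer program, the computing device implements the knowledge graph construction method provided in the first aspect.
In a fourth aspect, this application provides a non-volatile storage medium. When the instructions in the storage medium are executed by a processor, the knowledge graph construction method provided in the first aspect is implemented.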
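FIG. 1 is a schematic deployment diagram of a knowledge graph construction device provided by an embodiment of this application;
FIG. 2 is a schematic deployment diagram of another knowledge graph construction device provided by an embodiment of this application;
FIG. 3 is a schematic structural diagram of a computing device provided by an embodiment of this application;
FIG. 4 is a flowchart of a knowledge graph construction method provided by an embodiment of this application;
FIG. 5 is a logical block diagram of constructing a knowledge graph from two channels of data provided by an embodiment of this application;
FIG. 6 is a schematic diagram of an interface for selecting a knowledge graph ontology model provided by an embodiment of this application;
FIG. 7 is a schematic diagram of a knowledge graph ontology model provided by an embodiment of this application;
FIG. 8 is a schematic diagram of an interface for selecting source data provided by an embodiment of this application;
FIG. 9 is a schematic diagram of an interface for selecting an information extraction strategy provided by an embodiment of this application;
FIG. 10 is a schematic diagram of an interface for selecting a mapping strategy provided by an embodiment of this application;
FIG. 11 is a schematic diagram of an interface for selecting a matching strategy provided by an embodiment of this application;
FIG. 12 is a schematic diagram of a knowledge graph provided by an embodiment of this application;
FIG. 13 is a schematic structural diagram of a knowledge graph construction device provided by an embodiment of this application;
FIG. 14 is a schematic structural diagram of a knowledge graph construction device provided by an embodiment of this application.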
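To make the objectives, technical solutions, and advantages of this application clearer, the implementations of this application are described in further detail below with reference to the accompanying drawings.
To facilitate understanding of the knowledge graph construction method provided in the embodiments of this application, background on knowledge graphs is introduced first.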
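A knowledge graph is a semantic network that describes objective things in the form of a graph. A knowledge graph consists of many nodes and the connections between different nodes. Nodes represent the entity types or entity attributes of entities such as persons or organizations. A connection between nodes (also called an edge) indicates that the entities represented by the nodes have some association relationship. An entity can be represented jointly by an entity type, entity attributes, and association relationships. The association relationship between a node representing an entity's entity type and a node representing that entity's entity attribute may include the ownership relationship between the entity type and the entity attribute. The association relationship between a node representing one entity's entity type and a node representing another entity's entity type may include the external connection between the two entities.
In the embodiments of this application, knowledge graphs can be applied in many scenarios. For example, an information recommendation system can make recommendations based on a knowledge graph; text classification can be performed based on a knowledge graph; semantic search can be performed based on a knowledge graph; and in a fault analysis system, the cause of a fault can be determined from the attributes of the entities and the association relationships between entities presented by the knowledge graph, thereby realizing fault analysis.
An entity is something distinguishable that exists independently, such as a particular person, city, plant, or commodity. Entities are the most basic elements of a knowledge graph. The relationships between different entities may differ, and different entities may have different entity attributes.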
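For example, in a knowledge graph representing the basic information of an actor, nodes can represent entity types such as the actor's family members, friends, partners, representative works, agency, and school of graduation; nodes can also represent entity attributes, such as the name, height, and nationality, of the entity indicated by each entity type. An edge between a node representing an entity type and a node representing an entity attribute can represent the ownership relationship between the attribute and the type. An edge between the node representing the actor and a node representing a family member can represent a husband-wife, father-daughter, or father-son relationship, among others. An edge between the node representing the actor and a node representing a friend can represent friendship. An edge between the node representing the actor and a node representing a partner can represent a cooperation relationship. An edge between the node representing the actor and a node representing a representative work can represent the ownership relationship between the actor and the work. An edge between the node representing the actor and the node representing the agency can represent the contractual relationship between them. An edge between the node representing the actor and the node representing the school of graduation can represent the relationship between the actor and the school.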
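In a knowledge graph, data can be organized as tuple data. Tuple data may include triples, quadruples, quintuples, and so on. Triples take the forms "node-edge-node" and "node-attribute name-attribute value". The first term of a triple can be regarded as the subject, the second as the predicate, and the third as the object, and the subject-predicate-object relationship is the relationship between the first and third terms. For example, in the triple "Cao Cao-childhood name-A Man", expressed in the form "node-attribute name-attribute value", the subject is Cao Cao, the predicate is childhood name, and the object is A Man. The subject-predicate-object relationship is that Cao Cao's childhood name is A Man, which is the relationship between the node representing "Cao Cao" and the attribute value representing "A Man".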
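The embodiments of this application provide a knowledge graph construction method that receives an information extraction instruction to determine the information extraction strategy used for extracting information from the source data for constructing the knowledge graph, extracts information from the source data with that strategy to obtain multiple tuples, and then constructs a knowledge graph from those tuples. Compared with the related art, the information extraction strategy can be configured according to business needs, and different information extraction strategies can be used for source data in different fields, so that knowledge graphs can be constructed from source data in different fields. This ensures the applicable scope of the method and improves the flexibility of constructing knowledge graphs.
The knowledge graph construction method provided in the embodiments of this application can be executed by a knowledge graph construction device. The device can establish a communication connection with a terminal over a wired or wireless network, so that the terminal can send instructions to the device over that connection to control the device to execute the method according to the content of the instructions. For example, the terminal can send the device an instruction to obtain the source data for constructing a knowledge graph; after receiving the instruction, the device can obtain the source data accordingly and execute the method on that source data. Alternatively, the terminal can send the device an information extraction instruction; after receiving it, the device can extract information from the source data using the information extraction strategy indicated by the instruction and construct a knowledge graph from the extracted tuples.
The terminal may be a smartphone, a notebook computer, a tablet computer, a personal desktop computer, a smart camera, or the like. A client may be installed on the terminal, and the user can interact with the knowledge graph construction device through the client. Alternatively, the user can interact with the device through a web page on the terminal.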
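FIG. 1 is a schematic deployment diagram of a knowledge graph construction device provided by an embodiment of this application. As shown in FIG. 1, the knowledge graph construction device 01 can be deployed in a cloud environment. A cloud environment is an entity that uses basic resources to provide cloud services to users under the cloud computing model. It includes a cloud data center and a cloud service platform. The cloud data center includes a large number of basic resources owned by the cloud service provider, such as computing resources, storage resources, and network resources, and the computing resources may be a large number of computing devices (for example, servers). Optionally, the knowledge graph construction device 01 can be deployed independently on a server or virtual machine in the cloud data center, deployed in a distributed manner on multiple servers in the cloud data center, deployed in a distributed manner on multiple virtual machines in the cloud data center, or deployed in a distributed manner across servers and virtual machines in the cloud data center.
As shown in FIG. 1, the cloud service provider can abstract the knowledge graph construction device 01 on the cloud service platform into a cloud service for constructing knowledge graphs. After a user purchases the cloud service on the cloud service platform, the cloud environment can use the knowledge graph construction device 01 to provide the user with the knowledge graph construction service. The user can upload the source data for constructing the knowledge graph to the cloud environment from a terminal, through an application program interface (API) or a web interface provided by the cloud service platform, so that the device 01 can construct the knowledge graph from that source data. After construction is complete, the device 01 can send the knowledge graph to the user's terminal or store it in the cloud environment, for example by presenting it on the web interface of the cloud service platform for the user to view.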
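The knowledge graph construction device 01 can also be deployed in other ways. In another deployment, the device 01 can be logically divided into multiple parts with different functions. These parts can be deployed in a distributed manner in different environments and cooperate to construct knowledge graphs for users. For example, as shown in FIG. 2, the parts can be deployed in any two or all three of a terminal computing device, an edge environment, and a cloud environment. Terminal computing devices include terminal servers, smartphones, notebook computers, tablet computers, personal desktop computers, smart cameras, and the like. An edge environment is an environment that includes a set of edge computing devices located close to the terminal computing devices. Edge computing devices include edge servers, edge stations with computing power, and the like.
It should be understood that this application does not restrict which parts of the knowledge graph construction device 01 are deployed in which environment. In practice, the deployment can be adapted to the computing power of the terminal computing devices, the resource occupancy of the edge and cloud environments, or specific application requirements.
In yet another deployment, when the knowledge graph construction device 01 is a software device, the service provider can release it as an application. Users can download the application to their terminals and use the functions of the device 01 there.
In still another deployment, the knowledge graph construction device 01 can be deployed alone on a single computing device in any environment. As shown in FIG. 3, the computing device 100 may include a bus 101, a processor 102, a communication interface 103, and a memory 104. The processor 102, the memory 104, and the communication interface 103 communicate through the bus 101.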
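The processor 102 may be a hardware chip, which may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The processor 102 may also be a general-purpose processor, for example a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP.
The memory 104 may include volatile memory, such as random access memory (RAM). The memory 104 may also include non-volatile memory (NVM), such as read-only memory (ROM), flash memory, an HDD, or an SSD. The memory 104 stores executable code for constructing a knowledge graph, and the processor 102 reads that executable code from the memory 104 to execute the knowledge graph construction method provided in the embodiments of this application. The memory 104 may also include other software modules and data required by running processes, such as an operating system. The operating system may be LINUX™, UNIX™, WINDOWS™, or the like.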
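FIG. 4 is a flowchart of a knowledge graph construction method provided by an embodiment of this application. The method can construct a knowledge graph from one channel or multiple channels of data. The following describes the construction process using the example of multiple channels of data, with the process executed by the knowledge graph construction device. For ease of understanding, the embodiments of this application also provide a logical block diagram (FIG. 5) of constructing a knowledge graph from two channels of data (source data 1 and source data 2). As shown in FIG. 4 and FIG. 5, the knowledge graph construction method includes the following steps:
Step 401: Receive a knowledge graph construction request.
When a user needs the knowledge graph construction device to construct a knowledge graph, the user can send a knowledge graph construction request to the device through a terminal.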
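Step 402: Receive a knowledge graph ontology model instruction.
The knowledge graph ontology model instruction indicates the knowledge graph ontology model to be used for constructing the knowledge graph. The knowledge graph ontology model (also called the ontology) is the skeleton and foundation of the knowledge graph. It is a standardized description of the tuple data in a specific field. That is, the ontology specifies the standardized descriptions of the elements of the tuple data that the knowledge graph should include, such as the standardized description of the entity types indicating entities, of the entity attributes, and of the association relationships. Because the ontology specifies these standardized descriptions, constructing the knowledge graph according to the ontology model prevents the knowledge graph from including useless information and ensures that elements such as entity types, entity attributes, and association relationships are described in a unified way. For ease of description, the elements of the tuple data obtained through information extraction are called extracted elements, and the standardized descriptions of elements in tuple data are called ontology elements.
The user can send a knowledge graph ontology model instruction to the knowledge graph construction device through a terminal to indicate the ontology model to be used. The instruction may carry the ontology model itself, or it may carry the identification number or storage address of the ontology model so that the device can obtain the corresponding ontology model according to the instruction.
Knowledge graph ontology models can be stored in the deployment environment of the knowledge graph construction device. A stored ontology model may have been built in the device, or built on a terminal and then stored in the deployment environment. To make knowledge graph construction more flexible, the device can, in addition to creating ontology models, modify and delete existing ontology models and add, delete, and modify the ontology elements in an ontology model.
In one implementation, multiple candidate knowledge graph ontology models can be stored in advance in the deployment environment of the device. The user can then select an ontology model in the setting interface of the device through a terminal and, after selecting, perform a specified operation in the setting interface to trigger sending of the knowledge graph ontology model instruction. For example, FIG. 6 is a schematic diagram of a setting interface of a knowledge graph construction device provided by an embodiment of this application. As shown in FIG. 6, the user can select in the setting interface the ontology model to be used when constructing the knowledge graph and click the "Next" button to trigger sending of the knowledge graph ontology model instruction.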
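Step 403: According to the knowledge graph ontology model instruction, obtain the knowledge graph ontology model to be used for constructing the knowledge graph.
After receiving the knowledge graph ontology model instruction, the knowledge graph construction device can obtain the ontology model as the instruction indicates. For example, when the instruction carries the identification number of an ontology model, the device can look up, in its deployment environment, the ontology model indicated by that identification number.
For example, FIG. 7 is a schematic diagram of the ontology model obtained according to the knowledge graph ontology model instruction in step 402. As shown in FIG. 7, the ontology model defines the standardized descriptions of the entity types, entity attributes, and association relationships that the knowledge graph should include. The entity types (solid dots in FIG. 7) are person, song, and movie. The entity attributes of a person (hollow dots in FIG. 7) include name, birthday, nationality, height, and gender. The entity attributes of a song include release date and title. The entity attributes of a movie include release time and release country. The association relationships between persons include the spouse relationship, clan member relationship, parent relationship, and parent-child relationship. The association relationships between a person and a song include the singing relationship. The association relationships between a person and a movie include the starring relationship and the directing relationship. The association relationships between a movie and a song include the usage relationship.
It should be noted that whether to perform step 402 can be decided according to business needs. The knowledge graph construction device may be configured with a default ontology model for constructing knowledge graphs. When step 402 is not performed, the device can obtain the default ontology model in step 403 and use it to construct the knowledge graph. However, when step 402 is performed and the ontology model is selected according to application requirements, different ontology models can be used for different fields, which improves how well the constructed knowledge graph fits the field and therefore the accuracy of knowledge graph construction.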
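Step 404: Receive a source data indication instruction.
The terminal can send a source data indication instruction to the knowledge graph construction device. The instruction indicates the source data for constructing the knowledge graph. In one implementation, the instruction carries the source data itself. In another implementation, the instruction carries the storage address of the source data, to notify the device to obtain the source data from the storage location indicated by that address.
For example, when the knowledge graph construction device is deployed in a cloud environment, the user can store the source data in the cloud data center in advance through a terminal and then send the device a source data indication instruction carrying the storage address of the source data in the cloud data center, so that the device obtains the source data from the cloud data center according to that address.
The source data indicated by the source data indication instruction may be preprocessed data. Preprocessing may include converting the data type into one that the knowledge graph construction device can use directly. For example, after the terminal stores the source data in the cloud data center, the cloud data center can convert it into the JSON data format or into comma-separated values (CSV) file format, so that the device can use the preprocessed data directly without converting it, which reduces the amount of data processing the device performs when constructing the knowledge graph.
Optionally, the source data indication instruction may also carry the data category, encoding, and delimiter of the source data, to notify the knowledge graph construction device of this information. It should be noted that the device can also identify the data category, encoding, and delimiter of the source data automatically, which is not specifically limited in the embodiments of this application.
Further, whether the source data indication instruction should carry the above information can be selected in the setting interface of the knowledge graph construction device. After the selection is complete, a specified operation can be performed in the setting interface to trigger sending of a source data indication instruction carrying the corresponding information. For example, FIG. 8 is a schematic diagram of a setting interface of a knowledge graph construction device provided by an embodiment of this application. As shown in FIG. 8, the user can select in the setting interface one or more channels of data needed for constructing the knowledge graph, set the name of the source data, add the storage address of each channel of data, fill in the data category, encoding, and delimiter of the source data, and choose whether to set a header row for the source data. After completing the configuration, the user can click the "Next" button in the setting interface to trigger sending of the source data indication instruction.
It should be noted that the embodiments of this application do not limit the type or origin of the source data used to construct the knowledge graph. For example, the source data may be structured tabular data or unstructured text data. The source data may come from Baidu Baike, from Douban Movie, from entertainment news text, or from an enterprise's internal databases or document libraries. The embodiments of this application also do not limit how the source data is obtained; for example, data from web pages can be obtained with a distributed crawler.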
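Step 405: According to the source data indication instruction, obtain multiple channels of data.
After receiving the source data indication instruction, the knowledge graph construction device can obtain the source data as the instruction indicates. For example, when the instruction carries the storage address of the source data, the device can obtain the source data from the storage location indicated by that address. Alternatively, when the instruction carries the source data itself, the device can read the source data directly from the instruction. As an example, assume that two channels of data are obtained according to the source data indication instruction, both containing introductory information about Zhang XX1. Table 1 is the channel of data that the device obtains from a website according to the instruction, and Table 2 is the other channel of data that the device obtains from a database according to the instruction.
Table 1
Table 2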
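Step 406: Receive an information extraction instruction.
The information extraction instruction indicates the information extraction strategy used for extracting information from the source data. Information extraction means extracting tuple data from the source data. The tuple data may include information indicating the entity type of an entity, entity attribute information, association relationship information, and so on. One way for the instruction to indicate the strategy is to carry the algorithm identifier of an information extraction algorithm. The program instructions of multiple candidate information extraction algorithms are stored in the knowledge graph construction device in advance. After receiving the algorithm identifier carried in the instruction, the device can determine, among the candidate algorithms, the information extraction algorithm indicated by the identifier and use it to extract information from the source data. When the knowledge graph is constructed from multiple channels of data, the information extraction strategies used for the channels may be the same or different, which is not specifically limited in the embodiments of this application.
In one implementation, the information extraction instruction is triggered by performing a specified operation after an information extraction algorithm has been selected in the setting interface of the knowledge graph construction device. For example, FIG. 9 is a schematic diagram of a setting interface of a knowledge graph construction device provided by an embodiment of this application. As shown in FIG. 9, the user can select the corresponding information extraction strategy for each piece of source data in the setting interface and click the "Next" button to trigger sending of the information extraction instruction.
Step 407: Extract information from each channel of data using the information extraction strategy that the information extraction instruction indicates for that channel, to obtain multiple tuples corresponding to each channel of data.
Different information extraction strategies can be used for different types of data. For example, for structured and semi-structured data, fixed rules can be used for information extraction, or an artificial intelligence (AI) model can be used. Fixed rules can be expressed through general algorithm models, preset plug-in scripts, configurable function plug-ins, and the like. Optionally, a fixed rule may be a regular expression, a rule function, or a semantics-based analysis method.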
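As one hedged illustration of a fixed rule, the sketch below uses a regular expression to turn semi-structured "attribute: value" lines about a known subject into triples. The line format, the pattern, and the function name `extract_triples` are assumptions for this example; real source data would need rules written for its own layout.

```python
import re

# Hypothetical rule: one "attribute: value" pair per line.
FIELD_PATTERN = re.compile(r"^\s*(?P<attr>[^:]+?)\s*:\s*(?P<value>.+?)\s*$")


def extract_triples(subject, lines):
    """Apply the fixed rule to each line and emit (subject, attribute, value)."""
    triples = []
    for line in lines:
        match = FIELD_PATTERN.match(line)
        if match:
            # Attribute names are lower-cased so they compare consistently.
            triples.append((subject, match.group("attr").lower(), match.group("value")))
    return triples
```

Lines that do not fit the rule are skipped, which is the typical trade-off of rule-based extraction: high precision on the expected layout, no coverage outside it.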
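For unstructured data, rules that adapt to the data can be used for information extraction. For example, an AI model can be used. Before the AI model is used for information extraction, it can be trained with labeled samples to ensure good extraction performance. Further, the labeled samples can be labeled with the ontology elements of the knowledge graph ontology model. The tuple data extracted by an AI model trained on such samples is then already expressed with the ontology elements defined in the ontology model, which reduces the later work of giving the extracted tuple data a standardized description according to the ontology elements, simplifies the construction process, and improves construction efficiency.
The knowledge graph construction device can also be configured with a custom function plug-in feature. When the device is deployed, an input interface and an output interface for connecting function plug-ins are reserved, and the conditions that the input and output must satisfy are specified. Users can then define function plug-ins according to their application requirements, and a custom plug-in can be used to extract information from the source data when its input satisfies the constraints of the input interface and its output satisfies the constraints of the output interface. This feature lets users configure function plug-ins themselves according to application requirements, further improves the flexibility of constructing knowledge graphs, allows the method provided in the embodiments of this application to be applied to more construction scenarios, and ensures the application scope of the method.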
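One way to picture the reserved input and output interfaces is a small wrapper that checks a user-defined plug-in against the interface constraints before accepting its results. The constraints used here (a string as input, an iterable of three-string tuples as output) and the name `run_plugin` are assumptions for illustration; the text does not specify the actual interface conditions.

```python
from typing import Callable, Iterable, List, Tuple

Triple = Tuple[str, str, str]
# Hypothetical plug-in contract: text in, triples out.
Plugin = Callable[[str], Iterable[Triple]]


def run_plugin(plugin: Plugin, source_text: str) -> List[Triple]:
    """Run a custom extraction plug-in and validate its input and output."""
    if not isinstance(source_text, str):
        raise TypeError("plug-in input must be a string")
    results = list(plugin(source_text))
    for item in results:
        is_triple = (
            isinstance(item, tuple)
            and len(item) == 3
            and all(isinstance(part, str) for part in item)
        )
        if not is_triple:
            raise ValueError(f"plug-in output {item!r} is not a (str, str, str) triple")
    return results
```

Validating at the boundary keeps a faulty plug-in from feeding malformed tuples into the later mapping and merging steps.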
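The following uses AI-model-based information extraction as an example and describes the extraction process for three scenarios: schema-constrained information extraction, open information extraction, and event extraction.
In the schema-constrained scenario, each extraction pass extracts tuple data of one specified type. In each pass, a predicate model, a subject model, and an object model are applied in turn to the data to be extracted, which may be part of the source data, for example a single sentence. The predicate model judges whether the data to be extracted contains tuple data of the specified type. Its input is the data to be extracted, and its output is the result of whether tuple data of the specified type is present. The subject model extracts the subject of the tuple data of the specified type when such tuple data is present. Its input is the data to be extracted and the type information of the specified tuple type, and its output is the subject. The object model extracts the object of the tuple data of the specified type when such tuple data is present. Its input is the data to be extracted, the type description of the specified tuple type, and the subject, and its output is the object.
The predicate model, subject model, and object model each have an input layer, a feature extraction layer, and an output layer. The input layer splits the data to be extracted into characters or words, represents each part as a vector, and indicates the position of each part in the data to be extracted (the position embedding function). The feature extraction layer extracts features from the vectors supplied by the input layer. The output layer determines the type of each part according to the features extracted by the feature extraction layer.
Optionally, the input layers of the three models can be implemented with the Bert model (a language representation model), the feature extraction layers can be implemented with the dilate gated convolutional neural network (DGCNN) model (a language representation model), and the output layers can be implemented with the Sigmoid function (an S-shaped function).
For example, the sentence "'Forest Newspaper - Autumn' is a book published by 21st Century Publishing House in 2007, and the author is (Soviet Union) V. Bianki" contains the triples (Forest Newspaper - Autumn, author, V. Bianki), (Forest Newspaper - Autumn, publication time, 2007), (Forest Newspaper - Autumn, publisher, 21st Century Publishing House), (Forest Newspaper - Autumn, type, book), (V. Bianki, nationality, Soviet Union), (V. Bianki, type, person), and so on. In the schema-constrained scenario, if the specified triple types are (book, author, person), (book, publisher, publisher), and (person, nationality, country), then the results that can be extracted from the sentence are (Forest Newspaper - Autumn, author, V. Bianki), (Forest Newspaper - Autumn, publisher, 21st Century Publishing House), and (V. Bianki, nationality, Soviet Union).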
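The effect of the schema constraint on this example can be reproduced with a simple filter. The sketch does not implement the predicate, subject, and object models; it assumes candidate triples and an entity-type table are already available and only shows which candidates survive the specified (subject type, predicate, object type) schemas. The English entity names, `SCHEMAS`, `ENTITY_TYPES`, and `filter_by_schema` are all illustrative assumptions.

```python
# Specified triple types: (subject type, predicate, object type).
SCHEMAS = {
    ("book", "author", "person"),
    ("book", "publisher", "publisher"),
    ("person", "nationality", "country"),
}

# Hypothetical entity-type table for the example sentence.
ENTITY_TYPES = {
    "Forest Newspaper - Autumn": "book",
    "V. Bianki": "person",
    "21st Century Publishing House": "publisher",
    "Soviet Union": "country",
}


def filter_by_schema(triples, entity_types=ENTITY_TYPES, schemas=SCHEMAS):
    """Keep only the triples whose typed signature is one of the schemas."""
    kept = []
    for subject, predicate, obj in triples:
        signature = (entity_types.get(subject), predicate, entity_types.get(obj))
        if signature in schemas:
            kept.append((subject, predicate, obj))
    return kept
```

Applied to the six candidate triples of the example sentence, the filter keeps the author, publisher, and nationality triples and drops the publication-time and type triples.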
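In the open information extraction scenario, extraction is not restricted to tuple data of a specified type. Tuple data can be extracted directly from the data to be extracted, and the subject, predicate, and object of the extracted tuple data are words that appear literally in the data to be extracted. In each extraction pass, a predicate model, a subject model, and an object model are applied in turn to the data to be extracted. The predicate model extracts the predicate of the tuple data. Its input is the data to be extracted, and its output is the predicate of the tuple data. The subject model extracts the subject of the tuple data. Its input is the data to be extracted and the predicate of the tuple data, and its output is the subject of the tuple data. The object model extracts the object of the tuple data. Its input is the data to be extracted together with the subject and predicate of the tuple data, and its output is the object of the tuple data. For the implementation of these models, refer to the implementation of the predicate model, subject model, and object model in the schema-constrained information extraction scenario described above.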
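For example, the sentence "'Forest Newspaper - Autumn' is a book published by 21st Century Publishing House in 2007; the author is V. Bianki (Soviet Union)" contains the triples (Forest Newspaper - Autumn, author, V. Bianki), (Forest Newspaper - Autumn, publication time, 2007), (Forest Newspaper - Autumn, publisher, 21st Century Publishing House), (Forest Newspaper - Autumn, type, book), (V. Bianki, nationality, Soviet Union), (V. Bianki, type, person), and so on. In the open information extraction scenario, because the subject, predicate, and object of the extracted tuple data must be words that appear literally in the data to be extracted, the result that can be extracted from the above sentence is (Forest Newspaper - Autumn, author, V. Bianki).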
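The literal-appearance constraint of open information extraction can be checked with a simple filter. This sketch does not reproduce the model pipeline; it only assumes a list of candidate triples is already available and keeps those whose three elements all occur verbatim in the sentence. The function name `appears_literally` is hypothetical.

```python
SENTENCE = (
    "'Forest Newspaper - Autumn' is a book published by 21st Century Publishing House "
    "in 2007; the author is V. Bianki (Soviet Union)"
)

def appears_literally(triple, text):
    """True only if subject, predicate, and object each occur verbatim in the text."""
    return all(element in text for element in triple)

candidates = [
    ("Forest Newspaper - Autumn", "author", "V. Bianki"),
    ("Forest Newspaper - Autumn", "type", "book"),
    ("V. Bianki", "nationality", "Soviet Union"),
]
open_triples = [t for t in candidates if appears_literally(t, SENTENCE)]
```

The "type" and "nationality" triples are dropped because those predicate words are implied by the sentence but never written in it.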
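In the event extraction scenario, the data extracted in each pass is an event composed of multiple pieces of tuple data of specified types. Event types and event attributes must be defined before the extraction operation is performed. The extraction logic is: first identify the trigger word and the event type of the event, then extract the event elements and determine the role of each event element. In each extraction pass, a subject model, a predicate model, and an object model are applied in turn to the data to be extracted. The subject model determines whether a predefined event type and trigger word exist in the data to be extracted. Its input is the data to be extracted, and its output is the result of whether a predefined event type exists in that data. The predicate model determines whether predefined event attributes exist in the data to be extracted. Its input is the data to be extracted and the type information of the predefined event type, and its output is the event attributes present in the data to be extracted. The object model extracts the attribute values of the event attributes from the data to be extracted. Its input is the data to be extracted, the type information of the predefined event type, and the attribute information of the event attributes present in the data to be extracted, and its output is the attribute value of each event attribute. The outputs of the subject model, predicate model, and object model together constitute the event. For the implementation of these models, refer to the implementation of the predicate model, subject model, and object model in the schema-constrained information extraction scenario described above.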
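For example, the data to be extracted is "Banana Company will hold a new product launch event at 10 a.m. on September 12, Western Time (1 a.m. on September 13, Beijing Time). The venue is the newly built Steve Jobs Theater. According to current reports, at this launch event Banana Company will release ichne8, ichne7s, ichne7s Plus, ichne ch 3, and the all-new ichne TV". The event type is defined as "launch event", and the event attributes include "time", "venue", "company", and "product".
During extraction, the subject model determines whether the event type "launch event" appears in the data to be extracted. Its input is the data to be extracted, and its output is the result of whether the event type "launch event" is present. The subject model may also mark the trigger word "new product launch event" in the data to be extracted, in order to distinguish multiple events of the same type that may appear in that data.
The predicate model determines, according to the event type found in the data to be extracted, whether the event attributes "time", "venue", "company", and "product" appear in that data. Its input is the data to be extracted and the type information of the event type, and its output is the event attributes present in the data to be extracted.
The object model extracts the attribute values of the event attributes from the data to be extracted. Its input is the data to be extracted, the event type "launch event", and the event attributes "time", "venue", "company", and "product". Its output is the attribute value of each event attribute in the data to be extracted. For example, for the event attribute "time" the output is "10 a.m. on September 12, Western Time"; for "venue" the output is "Steve Jobs Theater"; for "company" the output is "Banana Company"; and for "product" the output is "ichne8, ichne7s, ichne7s Plus, ichne ch 3, and the all-new ichne TV".
From the outputs of the subject model, predicate model, and object model, multiple triples can be obtained: (launch event, company, Banana Company), (launch event, time, 10 a.m. on September 12, Western Time), (launch event, venue, Steve Jobs Theater), (launch event, product, ichne8), (launch event, product, ichne7s), and so on. These triples constitute the event extraction result: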
Event type: launch event;
Company: Banana Company;
Time: 10 a.m. on September 12, Western Time;
Venue: Steve Jobs Theater;
Products: ichne8, ichne7s, ichne7s Plus, ichne ch 3, ichne TV.
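To show how the triples relate to the event record listed above, the following sketch groups triples that share the event type as their subject into a single record. The function `assemble_event` and the dictionary layout are assumptions for illustration only.

```python
from collections import defaultdict

def assemble_event(event_type, triples):
    """Group (event_type, attribute, value) triples into one event record."""
    grouped = defaultdict(list)
    for subject, attribute, value in triples:
        if subject == event_type:
            grouped[attribute].append(value)
    event = {"event_type": event_type}
    for attribute, values in grouped.items():
        # Keep a single value as a scalar and repeated attributes as a list.
        event[attribute] = values[0] if len(values) == 1 else values
    return event

event_triples = [
    ("launch event", "company", "Banana Company"),
    ("launch event", "venue", "Steve Jobs Theater"),
    ("launch event", "product", "ichne8"),
    ("launch event", "product", "ichne7s"),
]
event = assemble_event("launch event", event_triples)
```

Attributes that occur several times, such as "product", naturally become multi-valued fields of the event.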
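It should be noted that, when constructing a knowledge graph, whether to perform step 406 may be decided according to service requirements. The knowledge graph construction apparatus may have a default information extraction strategy, and when step 406 is not performed, the apparatus may use the default strategy in step 407 to extract information from the source data. However, selecting the information extraction strategy allows the apparatus to apply different strategies to source data from different domains. This improves the accuracy of the information extracted from the source data, ensures the accuracy of knowledge graphs built from source data in different domains, ensures the range of application of the construction method, and improves the flexibility of knowledge graph construction.
Step 408: Receive a mapping strategy instruction.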
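The mapping strategy instruction indicates the mapping strategy used to perform association mapping (also called knowledge mapping) on the multiple pieces of tuple data according to the ontology elements. Knowledge mapping means establishing a mapping relationship between extracted elements and ontology elements and, based on that relationship, using the ontology element to give the corresponding extracted element a standardized description. For example, when the knowledge graph ontology model defines the formal expression of the subject in tuple data as "名称" (name), and the subject in the extracted tuple data is "名字" (a synonym that also means name), a mapping relationship between "名称" and "名字" can be established according to the mapping strategy, and "名字" is then given the standardized description "名称". When the knowledge graph is built from multiple channels of data, the mapping strategies for the channels may be the same or different, which is not specifically limited in the embodiments of this application.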
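In one implementation of the mapping strategy, the knowledge graph construction apparatus may obtain the matching degree between each extracted element and the ontology elements. When the matching degree between an extracted element and an ontology element is greater than a matching-degree threshold, the apparatus may establish a mapping relationship between the two and indicate that the ontology element is to be used as the standardized description of the extracted element. For example, when the matching degree between the extracted element "名字" and the ontology element "名称" is greater than the threshold, a mapping relationship between "名称" and "名字" can be established, and "名字" is given the standardized description "名称".
In this case, the mapping strategy instruction indicates that mapping relationships between ontology elements and extracted elements are to be established according to the matching degree, and indicates the matching-degree algorithm used to obtain it. For example, the instruction may indicate that mapping relationships are established according to the matching degree and that the matching-degree algorithm is an edit-distance similarity algorithm.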
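A minimal sketch of matching-degree mapping with edit-distance similarity follows. The normalization (one minus the edit distance divided by the longer length), the threshold value 0.4, and the function names are assumptions chosen for this example; the application names the algorithm family but not these details.

```python
def edit_distance(a, b):
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest

def map_element(extracted, ontology_elements, threshold):
    """Return the best-matching ontology element if its matching degree exceeds the threshold."""
    best = max(ontology_elements, key=lambda e: similarity(extracted, e))
    return best if similarity(extracted, best) > threshold else None

ONTOLOGY = ["名称", "身高", "国籍"]  # name, height, nationality
mapped = map_element("名字", ONTOLOGY, threshold=0.4)
unmapped = map_element("体重", ONTOLOGY, threshold=0.4)  # "weight" matches nothing
```

Here "名字" differs from "名称" by one character out of two, giving a matching degree of 0.5, so the choice of threshold decides whether the pair is mapped.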
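In another possible implementation of the mapping strategy, the user may configure the mapping strategy through a terminal in the settings interface of the knowledge graph construction apparatus. Specifically, the user indicates, through the terminal, the mapping relationships between extracted elements and ontology elements, and indicates that each ontology element is to be used as the standardized description of the extracted elements mapped to it. After completing the configuration, the user performs a specified operation in the settings interface to trigger sending of the mapping strategy instruction. Because the ontology elements defined by the knowledge graph ontology model are fixed once the model is determined in step 403, configuring the mapping strategy essentially means indicating, for each already-determined ontology element, the extracted elements that have a mapping relationship with it.
For example, FIG. 10 is a schematic diagram of a settings interface of a knowledge graph construction apparatus according to an embodiment of this application. As shown in FIG. 10, the user can add, in this settings interface, the extracted elements that have a mapping relationship with each ontology element. For example, for the known entity type "名称" among the ontology elements (the ontology entity type), the entity type "名字" among the extracted elements (the extracted entity type) can be added as its mapped counterpart, so that entity types are mapped. For an association relationship among the ontology elements (the ontology association relationship), the association relationship among the extracted elements that maps to it (the extracted association relationship) can be added, so that association relationships are mapped. For a known entity attribute among the ontology elements (the ontology entity attribute), the entity attribute among the extracted elements that maps to it (the extracted entity attribute) can be added, so that entity attributes are mapped. In addition, the category of the knowledge graph can be type-mapped according to the category of the knowledge graph ontology model (the ontology category). After the configuration is complete, the user can click the "Next" button to trigger sending of the mapping strategy instruction.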
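Step 409: According to the mapping strategy indicated by the mapping strategy instruction and the standardized description of tuple data, perform association mapping separately on the multiple pieces of tuple data extracted from each channel of data, to obtain multiple pieces of tuple data expressed in the standardized description.
After obtaining the mapping strategy instruction, the knowledge graph construction apparatus may perform knowledge mapping on the multiple pieces of tuple data according to the ontology elements, following the mapping strategy that the instruction indicates, to obtain multiple pieces of tuple data expressed in the ontology elements. Knowledge mapping gives the extracted elements a standardized description in terms of the ontology elements defined by the knowledge graph ontology model, which provides a uniform representation of the extracted elements and improves the readability of the knowledge graph.
It should be noted that, when constructing a knowledge graph, whether to perform step 408 may be decided according to service requirements. The knowledge graph construction apparatus may have a default mapping strategy, and when step 408 is not performed, the apparatus may use the default mapping strategy in step 409 to perform association mapping on the tuple data. However, selecting a mapping strategy and using it for association mapping allows the apparatus to apply different mapping strategies to different types of data, which improves the accuracy of association mapping and therefore the accuracy of knowledge graph construction.
Step 410: Receive a matching strategy instruction.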
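When a knowledge graph is built from multiple sets of source data, information indicating the same entity may be represented in different ways. If the knowledge graph is built directly from the extracted tuple data, the same entity in different representations may be treated as different entities, so that the knowledge graph cannot accurately reflect the content of the source data. Therefore, before the knowledge graph is built from the tuple data, the apparatus may also determine whether different tuple data include elements indicating the same entity, and merge the tuple data that do (also called knowledge fusion, or knowledge conflation), so that the knowledge graph is built from the merged tuple data and its accuracy is improved. For example, the entity-type information extracted from the source data shown in Table 1 is "名称: 章某某1", and the entity-type information extracted from the source data shown in Table 2 is "名称: 1某某章". Although the two are represented differently, both indicate the same entity, so knowledge fusion can be performed on them.
The matching strategy instruction indicates the matching algorithm and the matching-degree threshold used to determine whether different tuple data include elements indicating the same entity. The knowledge graph construction apparatus may obtain the matching degree of elements in different tuple data according to the matching algorithm. When the matching degree is not less than the matching-degree threshold, the apparatus determines that the elements in the different tuple data indicate the same entity, and the elements indicating the same entity may then be merged.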
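In one possible implementation, programs for multiple matching algorithms may be stored in advance in the deployment environment of the knowledge graph construction apparatus. The matching algorithm to be used can then be selected in the settings interface of the apparatus, and after the selection is complete, a specified operation in the settings interface triggers sending of the matching strategy instruction. For example, FIG. 11 is a schematic diagram of a settings interface of a knowledge graph construction apparatus according to an embodiment of this application. As shown in FIG. 11, the user can select, for different elements, the matching algorithm and matching-degree threshold to be used for knowledge fusion. A matching algorithm and matching-degree threshold can also be set separately for each entity attribute of an entity. For an entity with multiple entity attributes, the decision on whether it is the same entity as another entity may be an "ensemble" of the results of the matching algorithms for its different entity attributes, for example the intersection of those results. Similarly, each attribute may be configured with multiple matching algorithms. After the settings are complete, the user can click the "Next" button to trigger the matching strategy instruction.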
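The per-attribute configuration with an intersection-style decision can be sketched as below. The two matching algorithms (`char_jaccard`, `exact`), the thresholds, and the record layout are illustrative assumptions; the application does not prescribe particular algorithms here.

```python
def char_jaccard(a, b):
    """Jaccard similarity over the sets of characters in two strings."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0

def exact(a, b):
    """1.0 when the two values are identical, otherwise 0.0."""
    return 1.0 if a == b else 0.0

# Hypothetical strategy: attribute -> (matching algorithm, matching-degree threshold).
STRATEGY = {"name": (char_jaccard, 0.8), "height": (exact, 1.0)}

def same_entity(a, b, strategy):
    """Intersection of per-attribute results: every attribute must reach its threshold."""
    return all(
        algorithm(a[attr], b[attr]) >= threshold
        for attr, (algorithm, threshold) in strategy.items()
    )

e1 = {"name": "章某某1", "height": "164 cm"}
e2 = {"name": "1某某章", "height": "164 cm"}
e3 = {"name": "章某某2", "height": "164 cm"}
match_12 = same_entity(e1, e2, STRATEGY)
match_13 = same_entity(e1, e3, STRATEGY)
```

The comparison uses "not less than" (`>=`), following the threshold rule stated for the matching strategy instruction; a character-set measure is used for names only because it is insensitive to the reordering seen in "章某某1" and "1某某章".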
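Step 411: According to the tuple data matching strategy indicated by the matching strategy instruction, determine, among the multiple pieces of tuple data after standardized description, the different tuple data that include elements indicating the same entity, and merge those tuple data to obtain multiple pieces of merged tuple data.
Merging different tuple data that include elements indicating the same entity means representing the same entity, which was expressed in different ways, in one common way, so that all elements indicating that entity have the same representation.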
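For example, the triples extracted from the source data shown in Table 1 are (章某某1, height, 164 cm), (章某某1, gender, female), (章某某1, nationality, China), (章某某1, birthday, February 9, 1979), (章某某1, sibling, 章某某2), (章某某1, starring, 我的父亲母亲), and (章某某1, starring, 卧虎藏龙). The triples extracted from the source data shown in Table 2 are (1某某章, height, 164 cm), (1某某章, gender, female), (1某某章, sibling, 章某某2), (1某某章, starring, 我的父亲母亲), (1某某章, starring, 英雄), (1某某章, starring, 十面埋伏), and (1某某章, singer, 十面埋伏). After knowledge fusion according to the tuple data matching strategy indicated by the matching strategy instruction, the following triples are obtained: (章某某1, height, 164 cm), (章某某1, gender, female), (章某某1, nationality, China), (章某某1, birthday, February 9, 1979), (章某某1, sibling, 章某某2), (章某某1, starring, 我的父亲母亲), (章某某1, starring, 卧虎藏龙), (章某某1, starring, 十面埋伏), (章某某1, starring, 英雄), and (章某某1, singer, 十面埋伏).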
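Once matching has decided which names denote the same entity, the merge itself amounts to rewriting every alias to one canonical name and dropping the duplicates that result. The sketch below assumes the alias table is already known; `fuse` and `canonical` are hypothetical names, and only a few of the triples are shown.

```python
def fuse(triples, canonical):
    """Rewrite aliased names to their canonical form and drop duplicate triples."""
    seen = set()
    merged = []
    for subject, relation, obj in triples:
        triple = (canonical.get(subject, subject), relation, canonical.get(obj, obj))
        if triple not in seen:
            seen.add(triple)
            merged.append(triple)
    return merged

table1 = [("章某某1", "height", "164 cm"), ("章某某1", "starring", "卧虎藏龙")]
table2 = [("1某某章", "height", "164 cm"), ("1某某章", "starring", "英雄")]
fused = fuse(table1 + table2, canonical={"1某某章": "章某某1"})
```

The duplicated height triple collapses into one entry, while the distinct "starring" triples from both tables are kept under the single canonical name.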
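It should be noted that, when constructing a knowledge graph, whether to perform step 410 may be decided according to service requirements. The knowledge graph construction apparatus may have a default matching algorithm and a corresponding matching-degree threshold. When step 410 is not performed, the apparatus may use the defaults in step 411 to determine whether different tuple data include elements indicating the same entity. However, selecting the matching algorithm and using it for this determination allows the apparatus to apply different matching algorithms to elements obtained from data in different domains, which improves the flexibility of knowledge fusion and the accuracy of the matching degree obtained, and therefore improves the accuracy and completeness of knowledge graph construction.
Step 412: Construct the knowledge graph according to the multiple pieces of merged tuple data.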
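The knowledge graph records the entities included in the source data and the relationships between different entities. Steps 401 to 411 are all preparation for constructing the knowledge graph; once the preparation is complete, the knowledge graph can be constructed from the multiple pieces of merged tuple data. Constructing a knowledge graph from tuple data can be understood as connecting the pieces of tuple data into a semantic network according to the relationships among their elements. Each node in the semantic network corresponds to an entity type or entity attribute in a piece of tuple data, and each relationship between nodes corresponds to the association-relationship information in the tuple data. The start of an arrow between nodes corresponds to the element used as the subject in the tuple data, and the end of the arrow corresponds to the element used as the object.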
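The construction of the semantic network can be illustrated with a plain adjacency structure in which each triple becomes one directed, labeled edge from subject to object. The representation (a dictionary of edge lists) and the name `build_graph` are assumptions for this sketch rather than the storage format of the apparatus.

```python
from collections import defaultdict

def build_graph(triples):
    """Map each node to its outgoing (relation, target) edges; subjects point to objects."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
        graph.setdefault(obj, [])  # make sure object-only nodes also appear
    return dict(graph)

graph = build_graph([
    ("章某某1", "sibling", "章某某2"),
    ("章某某1", "starring", "英雄"),
])
```

Each edge runs from the element used as the subject to the element used as the object, which matches the arrow direction described above.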
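For example, FIG. 12 is a schematic diagram of the knowledge graph constructed from the merged tuple data of step 411. As shown in FIG. 12, the knowledge graph records the entity types, entity attributes, and association relationships in the tuple data that indicate entities. The knowledge graph presents the source data of Table 1 and Table 2 in graph form, which makes the source data easier to visualize and more convenient to analyze.
Step 413: After determining that the source data has been updated, perform information extraction on the incremental data in the updated source data according to the strategy indicated by the information extraction instruction, to obtain multiple pieces of tuple data corresponding to the incremental data, and update the knowledge graph according to those tuple data.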
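When the source data of an already constructed knowledge graph is updated, the incremental data of the updated source data relative to the original source data can be obtained, and the constructed knowledge graph can be updated according to the incremental data to obtain the knowledge graph corresponding to the updated source data. For example, information extraction may first be performed on the incremental data to obtain the corresponding tuple data; knowledge mapping is then performed on those tuple data; knowledge fusion is then performed on the mapped tuple data; and the knowledge graph is finally updated according to the fused tuple data. Updating the knowledge graph incrementally reduces the amount of computation needed to build the knowledge graph from the updated source data and improves construction efficiency.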
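A minimal sketch of the incremental update follows. It assumes, for illustration only, that source data is a list of records, that the increment consists of records absent from the previous version, and that the graph is held as a set of triples; the mapping and fusion stages are omitted, and deletions are not handled. The names `incremental_update` and `parse_record` are hypothetical.

```python
def parse_record(record):
    """Toy extractor: a 'subject|relation|object' record yields one triple."""
    subject, relation, obj = record.split("|")
    return [(subject, relation, obj)]

def incremental_update(graph_triples, old_records, new_records, extract):
    """Extract only from records absent in the old source data and add them to the graph."""
    old = set(old_records)
    delta = [record for record in new_records if record not in old]
    for record in delta:
        graph_triples.update(extract(record))
    return delta

graph_triples = {("章某某1", "starring", "英雄")}
old_source = ["章某某1|starring|英雄"]
new_source = ["章某某1|starring|英雄", "章某某1|starring|十面埋伏"]
delta = incremental_update(graph_triples, old_source, new_source, parse_record)
```

Only the single new record is passed to the extractor, which is the source of the reduction in computation that the description attributes to incremental updating.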
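In summary, the knowledge graph construction method provided in the embodiments of this application receives an information extraction instruction, determines the information extraction strategy to be used on the source data for constructing the knowledge graph, uses that strategy to extract multiple pieces of tuple data from the source data, and then constructs the knowledge graph from those tuple data. Compared with the related art, the information extraction strategy can be configured according to service requirements, and different strategies can be used for source data in different domains, so that knowledge graphs can be constructed from source data in different domains. This ensures the range of application of the construction method and improves the flexibility of knowledge graph construction.
The order of the steps of the knowledge graph construction method provided in the embodiments of this application may be adjusted appropriately, and steps may be added or removed as circumstances require. For example, whether to perform steps 402, 406, 408, and 410 may be chosen according to application requirements. Any variation readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application, and is therefore not described further.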
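An embodiment of this application further provides a knowledge graph construction apparatus. As shown in FIG. 13, the knowledge graph construction apparatus 80 may include:
A receiving module 801, configured to receive an information extraction instruction, where the information extraction instruction indicates the information extraction strategy used to extract information from the source data for constructing the knowledge graph.
An extraction module 802, configured to extract information from the source data using the information extraction strategy indicated by the information extraction instruction, to obtain multiple pieces of tuple data, where each piece of tuple data includes information indicating the entity type of an entity, information on entity attributes, and information on association relationships.
A construction module 803, configured to construct the knowledge graph according to the multiple pieces of tuple data, where the knowledge graph records the entities included in the source data and the relationships between different entities.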
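Optionally, as shown in FIG. 14, the knowledge graph construction apparatus 80 further includes:
An obtaining module 804, configured to obtain the knowledge graph ontology model needed for constructing the knowledge graph, where the knowledge graph ontology model defines the standardized description of tuple data in the knowledge graph.
The receiving module 801 is further configured to receive a mapping strategy instruction, where the mapping strategy instruction indicates the mapping strategy used to perform association mapping on the multiple pieces of tuple data according to the standardized description of tuple data.
A mapping module 805, configured to perform association mapping on the multiple pieces of tuple data according to the standardized description of tuple data and the mapping strategy indicated by the mapping strategy instruction, to obtain multiple pieces of tuple data expressed in the standardized description.
Correspondingly, the construction module 803 is specifically configured to construct the knowledge graph according to the multiple pieces of tuple data after standardized description.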
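Optionally, as shown in FIG. 14, the knowledge graph construction apparatus 80 further includes:
A determining module 806, configured to determine, among the multiple pieces of tuple data and according to a specified tuple data matching strategy, the different tuple data that include information indicating the same entity.
A merging module 807, configured to merge the different tuple data that include information indicating the same entity.
Correspondingly, the construction module 803 is specifically configured to construct the knowledge graph according to the multiple pieces of merged tuple data.
Optionally, the receiving module 801 is further configured to receive a matching strategy instruction, where the matching strategy instruction indicates the matching algorithm and matching-degree threshold used to determine whether different tuple data include information indicating the same entity.
Correspondingly, the determining module 806 is specifically configured to: when it is determined, according to the matching algorithm indicated by the matching strategy instruction, that the matching degree of the information indicating entities in two pieces of tuple data is not less than the matching-degree threshold, determine that the two pieces of tuple data include information indicating the same entity.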
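Optionally, the source data includes multiple channels of data from different sources, and the extraction module 802 is specifically configured to: extract information from each channel of data using the information extraction strategy that the information extraction instruction indicates for that channel, to obtain multiple pieces of tuple data corresponding to the respective channels.
Correspondingly, the construction module 803 is specifically configured to construct the knowledge graph according to the multiple pieces of tuple data corresponding to the multiple channels of data.
Optionally, the extraction module 802 is further configured to: after determining that the source data has been updated, extract information from the incremental data in the updated source data according to the strategy indicated by the information extraction instruction, to obtain multiple pieces of tuple data corresponding to the incremental data.
Correspondingly, the construction module 803 is further configured to update the knowledge graph according to the multiple pieces of tuple data corresponding to the incremental data.
Optionally, the extraction module 802 is specifically configured to extract information from the source data using the AI model indicated by the information extraction instruction.
The AI model is a trained model, the training samples of the AI model are labeled using the standardized description of tuple data in the knowledge graph ontology model, and the knowledge graph ontology model defines the standardized description of tuple data in the knowledge graph.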
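In summary, in the knowledge graph construction apparatus provided in the embodiments of this application, the receiving module receives an information extraction instruction and thereby determines the information extraction strategy to be used on the source data for constructing the knowledge graph, the extraction module uses that strategy to extract multiple pieces of tuple data from the source data, and the construction module constructs the knowledge graph from those tuple data. Compared with the related art, the information extraction strategy can be configured according to service requirements, and different strategies can be used for source data in different domains, so that knowledge graphs can be constructed from source data in different domains. This ensures the range of application of the construction method and improves the flexibility of knowledge graph construction.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the apparatus and modules described above may refer to the corresponding processes in the foregoing method embodiments, and are not described again here.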
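An embodiment of this application further provides a computing device. The computing device includes a processor and a memory, and the memory stores a computer program. When the processor executes the computer program, the computing device implements the knowledge graph construction method provided in the embodiments of this application. The computing device may be a server or a terminal. For its structure, refer to the structure of the computing device in FIG. 3, which is not described again here.
Optionally, the computing device may run on an AI platform and a big data platform, so as to use the AI platform to build, train, and deploy the AI models used in the knowledge graph construction method provided in the embodiments of this application, and to obtain source data from the big data platform and use the big data platform for data processing.
An embodiment of this application further provides a storage medium. The storage medium is a non-volatile computer-readable storage medium, and when instructions in the storage medium are executed by a processor, the knowledge graph construction method provided in the embodiments of this application is implemented.
An embodiment of this application further provides a computer program product containing instructions. When the computer program product runs on a computer, the computer is caused to execute the knowledge graph construction method provided in the embodiments of this application.
A person of ordinary skill in the art will understand that all or some of the steps of the foregoing embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.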
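In the embodiments of this application, the terms "first", "second", and "third" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance. The term "at least one" means one or more, and the term "multiple" means two or more, unless expressly limited otherwise.
In this application, the term "and/or" merely describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may represent three cases: A alone, both A and B, and B alone. In addition, the character "/" in this document generally indicates an "or" relationship between the associated objects before and after it.
The foregoing are merely optional embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, or improvement made within the concept and principles of this application shall fall within the protection scope of this application.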
Claims (16)
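- A knowledge graph construction method, characterized in that the method comprises: receiving an information extraction instruction, wherein the information extraction instruction indicates an information extraction strategy used to extract information from source data for constructing a knowledge graph; extracting information from the source data using the information extraction strategy indicated by the information extraction instruction, to obtain multiple pieces of tuple data, wherein each piece of tuple data comprises information indicating an entity type of an entity, information on entity attributes, and information on association relationships; and constructing the knowledge graph according to the multiple pieces of tuple data, wherein the knowledge graph records the entities included in the source data and the relationships between different entities.
- The method according to claim 1, characterized in that, before the constructing the knowledge graph according to the multiple pieces of tuple data, the method further comprises: obtaining a knowledge graph ontology model needed for constructing the knowledge graph, wherein the knowledge graph ontology model defines a standardized description of tuple data in the knowledge graph; receiving a mapping strategy instruction, wherein the mapping strategy instruction indicates a mapping strategy used to perform association mapping on the multiple pieces of tuple data according to the standardized description of tuple data; and performing association mapping on the multiple pieces of tuple data according to the standardized description of tuple data and the mapping strategy indicated by the mapping strategy instruction, to obtain multiple pieces of tuple data expressed in the standardized description; and the constructing the knowledge graph according to the multiple pieces of tuple data comprises: constructing the knowledge graph according to the multiple pieces of tuple data after standardized description.
- The method according to claim 1 or 2, characterized in that, before the constructing the knowledge graph according to the multiple pieces of tuple data, the method further comprises: determining, among the multiple pieces of tuple data and according to a specified tuple data matching strategy, different tuple data that include information indicating a same entity; and merging the different tuple data that include information indicating the same entity; and the constructing the knowledge graph according to the multiple pieces of tuple data comprises: constructing the knowledge graph according to the multiple pieces of merged tuple data.
- The method according to claim 3, characterized in that, before the determining, among the multiple pieces of tuple data and according to a specified tuple data matching strategy, different tuple data that include information indicating a same entity, the method further comprises: receiving a matching strategy instruction, wherein the matching strategy instruction indicates a matching algorithm and a matching-degree threshold used to determine whether different tuple data include information indicating a same entity; and the determining, among the multiple pieces of tuple data and according to a specified tuple data matching strategy, different tuple data that include information indicating a same entity comprises: when it is determined, according to the matching algorithm indicated by the matching strategy instruction, that the matching degree of the information indicating entities in two pieces of tuple data is not less than the matching-degree threshold, determining that the two pieces of tuple data include information indicating the same entity.
- The method according to any one of claims 1 to 4, characterized in that the source data comprises multiple channels of data from different sources, and the extracting information from the source data using the information extraction strategy indicated by the information extraction instruction, to obtain multiple pieces of tuple data, comprises: extracting information from each channel of data using the information extraction strategy that the information extraction instruction indicates for that channel, to obtain multiple pieces of tuple data corresponding to the respective channels of data; and the constructing the knowledge graph according to the multiple pieces of tuple data comprises: constructing the knowledge graph according to the multiple pieces of tuple data corresponding to the multiple channels of data.
- The method according to any one of claims 1 to 5, characterized in that, after the constructing the knowledge graph according to the multiple pieces of tuple data, the method further comprises: after determining that the source data has been updated, extracting information from incremental data in the updated source data according to the strategy indicated by the information extraction instruction, to obtain multiple pieces of tuple data corresponding to the incremental data; and updating the knowledge graph according to the multiple pieces of tuple data corresponding to the incremental data.
- The method according to claim 1, characterized in that the extracting information from the source data using the information extraction strategy indicated by the information extraction instruction comprises: extracting information from the source data using an AI model indicated by the information extraction instruction; wherein the AI model is a trained model, training samples of the AI model are labeled using the standardized description of tuple data in a knowledge graph ontology model, and the knowledge graph ontology model defines the standardized description of tuple data in the knowledge graph.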
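- A knowledge graph construction apparatus, characterized in that the apparatus comprises: a receiving module, configured to receive an information extraction instruction, wherein the information extraction instruction indicates an information extraction strategy used to extract information from source data for constructing a knowledge graph; an extraction module, configured to extract information from the source data using the information extraction strategy indicated by the information extraction instruction, to obtain multiple pieces of tuple data, wherein each piece of tuple data comprises information indicating an entity type of an entity, information on entity attributes, and information on association relationships; and a construction module, configured to construct the knowledge graph according to the multiple pieces of tuple data, wherein the knowledge graph records the entities included in the source data and the relationships between different entities.
- The apparatus according to claim 8, characterized in that the apparatus further comprises: an obtaining module, configured to obtain a knowledge graph ontology model needed for constructing the knowledge graph, wherein the knowledge graph ontology model defines a standardized description of tuple data in the knowledge graph; the receiving module is further configured to receive a mapping strategy instruction, wherein the mapping strategy instruction indicates a mapping strategy used to perform association mapping on the multiple pieces of tuple data according to the standardized description of tuple data; a mapping module, configured to perform association mapping on the multiple pieces of tuple data according to the standardized description of tuple data and the mapping strategy indicated by the mapping strategy instruction, to obtain multiple pieces of tuple data expressed in the standardized description; and the construction module is specifically configured to construct the knowledge graph according to the multiple pieces of tuple data after standardized description.
- The apparatus according to claim 8 or 9, characterized in that the apparatus further comprises: a determining module, configured to determine, among the multiple pieces of tuple data and according to a specified tuple data matching strategy, different tuple data that include information indicating a same entity; a merging module, configured to merge the different tuple data that include information indicating the same entity; and the construction module is specifically configured to construct the knowledge graph according to the multiple pieces of merged tuple data.
- The apparatus according to claim 10, characterized in that the receiving module is further configured to receive a matching strategy instruction, wherein the matching strategy instruction indicates a matching algorithm and a matching-degree threshold used to determine whether different tuple data include information indicating a same entity; and the determining module is specifically configured to: when it is determined, according to the matching algorithm indicated by the matching strategy instruction, that the matching degree of the information indicating entities in two pieces of tuple data is not less than the matching-degree threshold, determine that the two pieces of tuple data include information indicating the same entity.
- The apparatus according to any one of claims 8 to 11, characterized in that the source data comprises multiple channels of data from different sources, and the extraction module is specifically configured to: extract information from each channel of data using the information extraction strategy that the information extraction instruction indicates for that channel, to obtain multiple pieces of tuple data corresponding to the respective channels of data; and the construction module is specifically configured to construct the knowledge graph according to the multiple pieces of tuple data corresponding to the multiple channels of data.
- The apparatus according to any one of claims 8 to 12, characterized in that the extraction module is further configured to: after determining that the source data has been updated, extract information from incremental data in the updated source data according to the strategy indicated by the information extraction instruction, to obtain multiple pieces of tuple data corresponding to the incremental data; and the construction module is further configured to update the knowledge graph according to the multiple pieces of tuple data corresponding to the incremental data.
- The apparatus according to claim 8, characterized in that the extraction module is specifically configured to extract information from the source data using an AI model indicated by the information extraction instruction; wherein the AI model is a trained model, training samples of the AI model are labeled using the standardized description of tuple data in a knowledge graph ontology model, and the knowledge graph ontology model defines the standardized description of tuple data in the knowledge graph.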
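- A computing device, characterized in that the computing device comprises a processor and a memory; the memory stores a computer program; and when the processor executes the computer program, the computing device implements the knowledge graph construction method according to any one of claims 1 to 7.
- A non-volatile storage medium, characterized in that, when instructions in the storage medium are executed by a processor, the knowledge graph construction method according to any one of claims 1 to 7 is implemented.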
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910792526 | 2019-08-26 | ||
CN201910792526.0 | 2019-08-26 | ||
CN201911147385.3 | 2019-11-21 | ||
CN201911147385.3A CN112434811A (zh) | 2019-08-26 | 2019-11-21 | 知识图谱构建方法及装置、计算设备、存储介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021037045A1 true WO2021037045A1 (zh) | 2021-03-04 |
Family
ID=74685500
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/111308 WO2021037045A1 (zh) | 2019-08-26 | 2020-08-26 | 知识图谱构建方法及装置、计算设备、存储介质 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2021037045A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113312494A (zh) * | 2021-05-28 | 2021-08-27 | 中国电力科学研究院有限公司 | 垂直领域知识图谱构建方法、系统、设备及存储介质 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6768982B1 (en) * | 2000-09-06 | 2004-07-27 | Cellomics, Inc. | Method and system for creating and using knowledge patterns |
CN107633060A (zh) * | 2017-09-20 | 2018-01-26 | 联想(北京)有限公司 | 一种信息处理方法及电子设备 |
CN108460136A (zh) * | 2018-03-08 | 2018-08-28 | 国网福建省电力有限公司 | 电力运维信息知识图谱构建方法 |
CN109508383A (zh) * | 2018-10-30 | 2019-03-22 | 北京国双科技有限公司 | 知识图谱的构建方法及装置 |
CN109657065A (zh) * | 2018-10-31 | 2019-04-19 | 百度在线网络技术(北京)有限公司 | 知识图谱处理方法、装置及电子设备 |
-
2020
- 2020-08-26 WO PCT/CN2020/111308 patent/WO2021037045A1/zh active Application Filing
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20858608 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20858608 Country of ref document: EP Kind code of ref document: A1 |