CN111753022A - Method, device and equipment for constructing knowledge graph and readable storage medium - Google Patents



Publication number
CN111753022A
Authority
CN
China
Prior art keywords
knowledge
graph
interface
user
acquisition model
Legal status
Pending
Application number
CN202010556526.3A
Other languages
Chinese (zh)
Inventor
陶冶
陈伟
陈雨强
谢佳雨
Current Assignee
4Paradigm Beijing Technology Co Ltd
Original Assignee
4Paradigm Beijing Technology Co Ltd
Application filed by 4Paradigm Beijing Technology Co Ltd
Priority to CN202010556526.3A
Publication of CN111753022A
Priority to EP21825747.5A (EP4170520A4)
Priority to PCT/CN2021/100709 (WO2021254457A1)
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Abstract

The present disclosure relates to a method, an apparatus, a device and a readable storage medium for constructing a knowledge graph. The method comprises: acquiring an original information data set uploaded through a data uploading interface; outputting part of the information data in the original information data set to a user through a manual labeling interface, and acquiring first labeling result data completed by the user on the manual labeling interface; training a knowledge acquisition model based on the first labeling result data to obtain a knowledge acquisition model meeting preset conditions; extracting first graph knowledge from the original information data set using the knowledge acquisition model; and constructing a knowledge graph based on the first graph knowledge. The disclosed technical solution can improve the efficiency of constructing a knowledge graph, lower the threshold for constructing it, and improve the user experience.

Description

Method, device and equipment for constructing knowledge graph and readable storage medium
Technical Field
The present disclosure relates to the field of natural language processing technologies, and in particular, to a method, an apparatus, a device, and a readable storage medium for constructing a knowledge graph.
Background
With the continuous development of artificial intelligence technology, knowledge graph technology, one of the important branches of artificial intelligence, is increasingly applied across industries.
In the prior art, constructing a knowledge graph generally involves first defining the specific structure of the graph, then extracting knowledge in the form of triples from large amounts of unstructured and structured data, and finally assembling the triples into the knowledge graph.
As a result, prior-art methods for constructing a knowledge graph require the participation of professional NLP experts or knowledge graph experts, and they are difficult and inefficient.
Disclosure of Invention
To solve the above technical problem or to at least partially solve the above technical problem, the present disclosure provides a method, an apparatus, a device, and a readable storage medium for constructing a knowledge graph.
A first aspect of the embodiments of the present disclosure provides a method for constructing a knowledge graph, where the method includes:
acquiring an original information data set uploaded through a data uploading interface;
outputting part of the information data in the original information data set to a user through a manual labeling interface, and acquiring first labeling result data completed by the user on the manual labeling interface;
training a knowledge acquisition model based on the first labeling result data to obtain a knowledge acquisition model meeting preset conditions;
extracting first graph knowledge from the original information data set using the knowledge acquisition model;
and constructing a knowledge graph based on the first graph knowledge.
In some embodiments, before the part of the information data in the original information data set is output to the user through the manual labeling interface, the method further includes:
providing a graph configuration interface, and acquiring knowledge graph definition information input through the graph configuration interface;
wherein a labeling prompt control matching the knowledge graph definition information is provided on the manual labeling interface.
In some embodiments, extracting the first graph knowledge from the original information data set using the knowledge acquisition model comprises:
extracting, based on the knowledge graph definition information, first graph knowledge matching the knowledge graph definition information from the original information data set using the knowledge acquisition model.
In some embodiments, constructing the knowledge graph based on the first graph knowledge comprises:
converting the extracted first graph knowledge into a question to be confirmed, and pushing the question to at least one user for answering through a preset channel;
and constructing the knowledge graph based on the user's answer result.
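The conversion of extracted graph knowledge into questions to be confirmed, followed by construction from the answers, can be pictured as follows. This is a minimal sketch, assuming each piece of first graph knowledge is a (head, relation, tail) triple and each user answer is a plain yes/no. The question template and the names `Triple`, `to_question`, and `build_graph` are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One piece of extracted first graph knowledge (hypothetical representation)."""
    head: str
    relation: str
    tail: str


def to_question(t: Triple) -> str:
    # Hypothetical yes/no template; the disclosure does not specify the wording.
    return f'Is "{t.tail}" the {t.relation} of "{t.head}"?'


def build_graph(answers: dict[Triple, bool]) -> dict[str, list[tuple[str, str]]]:
    """Keep only user-confirmed triples, stored as an adjacency list keyed by head node."""
    graph: dict[str, list[tuple[str, str]]] = {}
    for triple, confirmed in answers.items():
        if confirmed:
            graph.setdefault(triple.head, []).append((triple.relation, triple.tail))
    return graph
```

A triple that the user rejects is simply left out of the graph. A real system might also feed rejections back as negative training samples.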
In some embodiments, the method further comprises:
obtaining second labeling result data based on the user's answer result;
performing optimization training on the knowledge acquisition model based on the second labeling result data to obtain an optimized knowledge acquisition model;
and converting the first graph knowledge extracted using the optimized knowledge acquisition model into a question to be confirmed, and pushing the question to at least one user for answering.
In some embodiments, when it is determined, based on the users' answer results, that a performance parameter value of the first graph knowledge extracted from the original information data set by the knowledge acquisition model is greater than or equal to a first preset threshold, the knowledge graph is constructed directly based on the first graph knowledge.
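Read one way, this check measures how often users confirm the model's extractions and switches to direct construction once that rate reaches the first preset threshold. The sketch below assumes that the performance parameter is the fraction of confirmed answers, which is a precision estimate, and uses a hypothetical default threshold of 0.9. The disclosure specifies neither the metric nor the value, and the function names are illustrative.

```python
def confirmation_rate(answers: list[bool]) -> float:
    """Fraction of pushed questions that users confirmed as correct."""
    if not answers:
        return 0.0
    return sum(answers) / len(answers)


def should_build_directly(answers: list[bool], first_threshold: float = 0.9) -> bool:
    # Hypothetical default; the first preset threshold is left open in the disclosure.
    return confirmation_rate(answers) >= first_threshold
```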
In some embodiments, converting the extracted first graph knowledge into a question to be confirmed and pushing the question to at least one user for answering through a preset channel includes:
providing a user answering interface, and displaying the question to be confirmed and an answer submission control on the user answering interface.
In some embodiments, the method further comprises:
displaying, on the user answering interface, the source information data of the first graph knowledge corresponding to the question to be confirmed, and/or displaying, on the user answering interface, a knowledge graph created based on the first graph knowledge corresponding to the question to be confirmed.
In some embodiments, at least two questions to be confirmed are presented on the user answering interface, and the answer submission control includes a batch selection control.
In some embodiments, the batch selection control includes at least one of a select-all control, a select-none control, an invert-selection control, and a partial multi-select control.
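The four batch selection controls map naturally onto set operations over the questions shown on the user answering interface. The following is a minimal sketch, assuming that each question is identified by an integer ID. The function names are hypothetical. They all take the same `(shown, selected)` arguments so that a UI layer could dispatch to them uniformly.

```python
def select_all(shown: set[int], selected: set[int]) -> set[int]:
    """Select every question currently shown (prior selection is ignored)."""
    return set(shown)


def select_none(shown: set[int], selected: set[int]) -> set[int]:
    """Clear the selection."""
    return set()


def invert_selection(shown: set[int], selected: set[int]) -> set[int]:
    """Select exactly the shown questions that were not selected before."""
    return shown - selected


def toggle_some(shown: set[int], selected: set[int], picks: set[int]) -> set[int]:
    """Partial multi-select: toggle each picked question that is on screen."""
    return selected ^ (picks & shown)
```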
In some embodiments, the first graph knowledge comprises at least one of an entity type, a relationship type, and an event type;
the method further comprises the following steps:
acquiring a priority, input through a graph construction overview interface, for each type of first graph knowledge;
wherein pushing the question to at least one user for answering through the preset channel comprises:
pushing, based on the priorities, the questions to be confirmed corresponding to each type of first graph knowledge.
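Pushing questions in order of type priority can be sketched with a priority queue. This sketch assumes that a smaller number means a higher priority and that questions of equal priority keep their arrival order. Both conventions are assumptions, and the name `push_order` is hypothetical. The ordering uses Python's standard `heapq` module.

```python
import heapq
from itertools import count


def push_order(questions: list[tuple[str, str]], priority: dict[str, int]) -> list[str]:
    """Return question texts in the order they would be pushed.

    questions: (knowledge_type, question_text) pairs, e.g. ("entity", "...").
    priority: knowledge type -> priority value; smaller values are pushed first.
    """
    seq = count()  # tie-breaker so equal priorities keep their arrival order
    heap = [(priority[kind], next(seq), text) for kind, text in questions]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```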
In some embodiments, the method further comprises:
acquiring performance parameter information of the knowledge acquisition model, wherein the performance parameter information comprises at least one of accuracy, recall, and F1 score, and displaying it on the graph construction overview interface; and/or
acquiring construction information for each type of first graph knowledge in the knowledge graph, and displaying the construction information on the graph construction overview interface, wherein the construction information comprises the constructing users, the number of constructions, and the construction accuracy.
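The performance parameters shown on the graph construction overview interface can be computed by comparing the model's extracted triples against manually labeled ones. The sketch below treats "accuracy" as precision, which is an interpretation, since precision is the usual companion of recall and F1. It also assumes exact triple matching, a rule the disclosure does not specify.

```python
def prf1(predicted: set[tuple[str, str, str]],
         gold: set[tuple[str, str, str]]) -> tuple[float, float, float]:
    """Precision, recall and F1 of predicted triples against labeled triples."""
    tp = len(predicted & gold)  # exact-match true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1
```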
In some embodiments, training the knowledge acquisition model based on the first labeling result data to obtain the knowledge acquisition model meeting the preset condition includes:
training the knowledge acquisition model based on the first labeling result data;
for the same part of the information data in the original information data set, acquiring second graph knowledge using the knowledge acquisition model and acquiring third labeling result data through the manual labeling interface;
and if it is determined, based on the second graph knowledge and the third labeling result data, that the performance parameter value of the knowledge acquisition model is greater than or equal to a second preset threshold, determining that the knowledge acquisition model meets the preset condition.
In some embodiments, the method further comprises:
if it is determined, based on the second graph knowledge and the third labeling result data, that the performance parameter value of the knowledge acquisition model is smaller than the second preset threshold, continuing to perform optimization training on the knowledge acquisition model based on the third labeling result data.
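The embodiment above describes a train-evaluate loop. The sketch below captures only its control flow:

- `train`, `extract`, and `label_manually` are hypothetical callables standing in for model training, model extraction on the shared data slice, and the manual labeling interface.
- The performance parameter is taken to be F1, which is an assumption, since the disclosure also allows accuracy or recall.
- The threshold value is hypothetical.
- `max_rounds` is a safeguard that the disclosure does not mention.

```python
from typing import Callable

Triple = tuple[str, str, str]


def f1(pred: set[Triple], gold: set[Triple]) -> float:
    """Exact-match F1 between model output and manual labels."""
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def train_until_ready(
    first_labels: list[Triple],
    train: Callable[[list[Triple]], None],      # hypothetical: (re)trains the model
    extract: Callable[[], set[Triple]],         # hypothetical: second graph knowledge
    label_manually: Callable[[], set[Triple]],  # hypothetical: third labeling results
    second_threshold: float = 0.8,              # hypothetical value
    max_rounds: int = 5,                        # safeguard, not in the disclosure
) -> bool:
    """Return True once the model meets the preset condition, else False."""
    training_data = list(first_labels)
    train(training_data)
    for _ in range(max_rounds):
        second_knowledge = extract()        # model output on the shared slice
        third_labels = label_manually()     # manual labels on the same slice
        if f1(second_knowledge, third_labels) >= second_threshold:
            return True
        training_data.extend(third_labels)  # continue optimizing with the new labels
        train(training_data)
    return False
```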
In some embodiments, acquiring the original information data set uploaded through the data uploading interface includes:
providing a data uploading interface;
acquiring the original information data set uploaded through the data uploading interface, wherein the original information data set comprises at least one of an original corpus data set, an original picture data set, and an original video data set, and the original corpus data set comprises unstructured text data.
In some embodiments, the method further comprises:
acquiring structured data uploaded through a data uploading interface;
and constructing a knowledge graph based on graph knowledge in the structured data.
In some embodiments, acquiring the structured data uploaded through the data uploading interface includes:
providing a data uploading interface, acquiring the structured data uploaded through the data uploading interface, and acquiring the fields in the structured data that correspond to the types of graph knowledge in the knowledge graph.
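For structured data, acquiring the fields that correspond to graph knowledge types amounts to a column mapping. The following is a minimal sketch, assuming that rows arrive as dictionaries, for example from Python's `csv.DictReader`. It further assumes that the user's mapping names one subject column plus a set of columns that each become a relationship type. The mapping format and the name `rows_to_triples` are hypothetical.

```python
import csv
import io
from collections.abc import Iterable


def rows_to_triples(rows: Iterable[dict[str, str]], subject_field: str,
                    relation_fields: dict[str, str]) -> list[tuple[str, str, str]]:
    """Turn structured rows into (head, relation, tail) triples.

    subject_field: column holding the entity node name.
    relation_fields: column name -> relationship type in the knowledge graph.
    """
    triples = []
    for row in rows:
        head = row[subject_field]
        for column, relation in relation_fields.items():
            value = row.get(column)
            if value:  # skip missing or empty cells
                triples.append((head, relation, value))
    return triples


if __name__ == "__main__":
    # Toy upload; the column names and values are illustrative only.
    upload = io.StringIO("company,founded,ceo\nXXX corporation,2014,\n")
    print(rows_to_triples(csv.DictReader(upload), "company",
                          {"founded": "founding date", "ceo": "chief executive"}))
```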
In a second aspect of the embodiments of the present disclosure, an apparatus for constructing a knowledge graph is provided, the apparatus including:
a first acquisition module configured to acquire an original information data set uploaded through a data uploading interface;
a second acquisition module configured to output part of the information data in the original information data set to a user through a manual labeling interface, and to acquire first labeling result data completed by the user on the manual labeling interface;
a model training module configured to train a knowledge acquisition model based on the first labeling result data to obtain a knowledge acquisition model meeting preset conditions;
a knowledge extraction module configured to extract first graph knowledge from the original information data set using the knowledge acquisition model;
and a graph construction module configured to construct a knowledge graph based on the first graph knowledge.
In some embodiments, the apparatus further comprises:
a third acquisition module configured to provide a graph configuration interface and to acquire knowledge graph definition information input through the graph configuration interface;
wherein a labeling prompt control matching the knowledge graph definition information is provided on the manual labeling interface.
In some embodiments, the knowledge extraction module is specifically configured to extract, based on the knowledge graph definition information, first graph knowledge matching the knowledge graph definition information from the original information data set using the knowledge acquisition model.
In some embodiments, the graph construction module is specifically configured to convert the extracted first graph knowledge into a question to be confirmed, push the question to at least one user for answering through a preset channel, and construct the knowledge graph based on the user's answer result.
In some embodiments, the apparatus further comprises:
a fourth acquisition module configured to acquire second labeling result data based on the user's answer result;
an optimization training module configured to perform optimization training on the knowledge acquisition model based on the second labeling result data to obtain an optimized knowledge acquisition model;
wherein the graph construction module is specifically configured to convert the first graph knowledge extracted by the optimized knowledge acquisition model into a question to be confirmed and to push the question to at least one user for answering.
In some embodiments, the graph construction module is specifically configured to construct the knowledge graph directly based on the first graph knowledge when it is determined, based on the users' answer results, that a performance parameter value of the first graph knowledge extracted from the original information data set by the knowledge acquisition model is greater than or equal to a first preset threshold.
In some embodiments, the graph construction module is specifically configured to provide a user answering interface and to display the question to be confirmed and an answer submission control on the user answering interface.
In some embodiments, the graph construction module is further configured to display, on the user answering interface, the source information data of the first graph knowledge corresponding to the question to be confirmed, and/or to display, on the user answering interface, a knowledge graph created based on the first graph knowledge corresponding to the question to be confirmed.
In some embodiments, the graph construction module is further configured to present at least two questions to be confirmed on the user answering interface, and the answer submission control includes a batch selection control.
In some embodiments, the batch selection control includes at least one of a select-all control, a select-none control, an invert-selection control, and a partial multi-select control.
In some embodiments, the first graph knowledge comprises at least one of an entity type, a relationship type, and an event type; the apparatus further comprises:
a fifth acquisition module configured to acquire a priority, input through the graph construction overview interface, for each type of first graph knowledge;
wherein the graph construction module is specifically configured to push, based on the priorities, the questions to be confirmed corresponding to each type of first graph knowledge.
In some embodiments, the fifth acquisition module is further configured to acquire performance parameter information of the knowledge acquisition model, wherein the performance parameter information comprises at least one of accuracy, recall, and F1 score and is displayed on the graph construction overview interface; and/or to acquire construction information for each type of first graph knowledge in the knowledge graph and display the construction information on the graph construction overview interface, wherein the construction information comprises the constructing users, the number of constructions, and the construction accuracy.
In some embodiments, the model training module is specifically configured to: train the knowledge acquisition model based on the first labeling result data; for the same part of the information data in the original information data set, acquire second graph knowledge using the knowledge acquisition model and acquire third labeling result data through the manual labeling interface; and if it is determined, based on the second graph knowledge and the third labeling result data, that the performance parameter value of the knowledge acquisition model is greater than or equal to a second preset threshold, determine that the knowledge acquisition model meets the preset condition.
In some embodiments, the model training module is further configured to continue performing optimization training on the knowledge acquisition model based on the third labeling result data if it is determined, based on the second graph knowledge and the third labeling result data, that the performance parameter value of the knowledge acquisition model is smaller than the second preset threshold.
In some embodiments, the first acquisition module is specifically configured to provide a data uploading interface and to acquire an original information data set uploaded through the data uploading interface, wherein the original information data set includes at least one of an original corpus data set, an original picture data set, and an original video data set, and the original corpus data set includes unstructured text data.
In some embodiments, the apparatus further comprises:
a sixth acquisition module configured to acquire structured data uploaded through the data uploading interface;
wherein the graph construction module is specifically configured to construct a knowledge graph based on graph knowledge in the structured data.
In some embodiments, the sixth acquisition module is specifically configured to provide a data uploading interface, acquire structured data uploaded through the data uploading interface, and acquire the fields in the structured data that correspond to the types of graph knowledge in the knowledge graph.
In a third aspect of the disclosed embodiments, a computer apparatus is disclosed, the computer apparatus comprising:
a processor for implementing the steps of any of the above methods when executing a computer program stored in a memory.
In a fourth aspect of the disclosed embodiments, a computer-readable storage medium is disclosed, having stored thereon computer instructions, which, when executed by a processor, implement the steps of the method as in any of the above.
According to the technical solutions provided by the embodiments of the present disclosure, first labeling result data is obtained through manual labeling. A knowledge acquisition model is then trained on the first labeling result data until it meets preset conditions. Finally, first graph knowledge is extracted from the original information data set using the knowledge acquisition model, and a knowledge graph is constructed based on the first graph knowledge. This provides a graph knowledge extraction scheme that combines manual labeling with model extraction. The manual labeling can be completed by experts, and the knowledge acquisition model meeting the preset conditions is trained on the high-accuracy manual labeling results. As a result, the difficulty and threshold of constructing a knowledge graph are reduced, construction efficiency is improved, and the experience of the construction process is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments or technical solutions in the prior art of the present disclosure, the drawings used in the description of the embodiments or prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
FIG. 1 is a schematic flowchart of a method for constructing a knowledge graph according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart diagram of another method for constructing a knowledge graph according to an embodiment of the disclosure;
FIG. 3 is a schematic interface diagram of an entity type definition information input interface in an embodiment of the present disclosure;
FIG. 4 is an interface diagram of a relationship type definition information input interface in an embodiment of the disclosure;
FIG. 5 is an interface diagram of an event type definition information input interface in an embodiment of the disclosure;
FIG. 6 is an interface diagram of a manual labeling interface in an embodiment of the present disclosure;
FIG. 7 is a schematic flow chart diagram illustrating yet another method for constructing a knowledge graph according to an embodiment of the present disclosure;
FIG. 8 is a schematic interface diagram of a user answering interface in an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of an efficiency mode display interface provided in an embodiment of the present disclosure;
FIG. 10 is a schematic flow chart illustrating the process of determining whether the knowledge acquisition model satisfies the predetermined condition according to the embodiment of the present disclosure;
FIG. 11 is a schematic flow chart illustrating construction of a knowledge graph using structured data according to an embodiment of the present disclosure;
FIG. 12 is an interface schematic diagram of a structured data upload interface provided by an embodiment of the present disclosure;
FIG. 13 is an interface diagram of a graph construction overview interface provided by an embodiment of the present disclosure;
FIG. 14 is a schematic structural diagram of an apparatus for constructing a knowledge graph according to an embodiment of the present disclosure;
FIG. 15 is a schematic structural diagram of another apparatus for constructing a knowledge graph according to an embodiment of the present disclosure;
FIG. 16 is a schematic structural diagram of yet another apparatus for constructing a knowledge graph according to an embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.
In the embodiments of the present disclosure, a knowledge graph can be defined as a graph-structured form of organizing knowledge and data. Entity nodes, event nodes, and relationship edges are its three main constituent elements and can form knowledge triples. In the knowledge graph, entity nodes and event nodes can be associated by different types of relationship edges.
Entity nodes in the knowledge graph are semantic words with specific meanings, such as people, places, organizations, numbers, and other information.
An event node in the knowledge graph is a group of associated information describing a dynamic behavior. It is formed by connecting a predefined event name and one or more entity nodes with specific meanings through specific relation words. Financing events and movie showing events are described below as examples.
A financing event may include the following content:
Financing event - financing time - 2020.5.20
Financing event - financing party - XXX corporation
Financing event - financing amount - 10 billion dollars
Financing event - investor - XXX corporation
A movie showing event may include the following content:
Movie showing event - work title - title
Movie showing event - showing time - time
Movie showing event - showing place - place
Movie showing event - director - name
Movie showing event - lead actor - name
For a specific movie showing event 01, the relationship edges may include the following:
Movie showing event 01 - work title - Iron Man
Movie showing event 01 - showing time - April 30, 2008
Movie showing event 01 - showing place - China
Movie showing event 01 - director - Jon Favreau
Movie showing event 01 - lead actor - Robert Downey Jr.
Movie showing event 01 - lead actor - Gwyneth Paltrow
Movie showing event 01 - lead actor - Terrence Howard
Movie showing event 01 - lead actor - Jeff Bridges
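The movie showing event 01 example can be held in code as an event node whose relationship edges point to entity values. This is an illustrative sketch built around a hypothetical `EventNode` class. It stores the (event, relation, value) edges listed above and allows a single relation, such as lead actor, to carry several values.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EventNode:
    """An event node plus its outgoing relationship edges (hypothetical model)."""
    name: str
    event_type: str
    edges: dict[str, list[str]] = field(default_factory=lambda: defaultdict(list))

    def add(self, relation: str, value: str) -> None:
        self.edges[relation].append(value)

    def triples(self) -> list[tuple[str, str, str]]:
        # Flatten back into knowledge triples: (event, relation, value).
        return [(self.name, rel, val) for rel, vals in self.edges.items() for val in vals]
```

For example, calling `add("lead actor", ...)` four times records all four lead actors under one relation, and `triples()` yields one knowledge triple per actor.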
In the prior art, constructing a knowledge graph, especially from unstructured data, requires the participation of professional NLP experts or knowledge graph experts, which makes it difficult and inefficient. To address this, the technical solution for constructing a knowledge graph provided by the embodiments of the present disclosure proceeds in three stages. First, first labeling result data is obtained through manual labeling. Next, a knowledge acquisition model is trained on the first labeling result data to obtain a knowledge acquisition model meeting preset conditions. Finally, first graph knowledge is extracted from the original information data set using the knowledge acquisition model, and the knowledge graph is constructed based on the first graph knowledge. This provides a graph knowledge extraction scheme that combines manual labeling with model extraction. The manual labeling can be completed by experts, and the knowledge acquisition model meeting the preset conditions is trained on the high-accuracy manual labeling results. As a result, the difficulty and threshold of constructing a knowledge graph are reduced, construction efficiency is improved, and the experience of the construction process is improved.
FIG. 1 is a schematic flowchart of a method for constructing a knowledge graph according to an embodiment of the present disclosure. As shown in FIG. 1, the method includes the following steps:
Step 101: acquire an original information data set uploaded through a data uploading interface.
The original information data set in this step is the raw material for constructing the knowledge graph. The corresponding graph knowledge can be obtained by analyzing the original information data set and extracting knowledge from it. The original information data set may be of various types. For example, it may include at least one of an original corpus data set, an original picture data set, and an original video data set, selected according to the specific business scenario.
Step 102: output part of the information data in the original information data set to a user through a manual labeling interface, and acquire first labeling result data completed by the user on the manual labeling interface.
The technical solution provided by the embodiments of the present disclosure combines manual labeling with model extraction. Part of the information data is output through the manual labeling interface to a user, who may be, for example, a business expert for the knowledge graph. Because the first labeling result data is completed by the user on the manual labeling interface, it is highly reliable and can serve as sample data for subsequently training the knowledge acquisition model.
Step 103: train a knowledge acquisition model based on the first labeling result data to obtain a knowledge acquisition model meeting preset conditions.
Based on the first labeling result data obtained in the preceding step, the knowledge acquisition model can be trained to obtain a knowledge acquisition model meeting preset conditions. The preset conditions may be determined according to performance parameters of the knowledge acquisition model, including but not limited to its accuracy, recall, and F1 score.
Step 104: extract first graph knowledge from the original information data set using the knowledge acquisition model.
Once a knowledge acquisition model satisfying the preset conditions has been obtained in step 103, that is, once the model meets the business requirements, the original information data set can be input into the knowledge acquisition model in this step so that the model extracts first graph knowledge from it. The model may extract first graph knowledge from all or part of the original information data set uploaded in step 101; this is not limited herein.
Step 105: construct a knowledge graph based on the first graph knowledge.
In this step, a knowledge graph is constructed based on the first graph knowledge. The knowledge graph may include the entity nodes, relationship edges, events, and other elements described above.
The technical solution for constructing a knowledge graph provided by the embodiments of the present disclosure thus combines manual labeling with model extraction. The manual labeling can be completed by experts, a knowledge acquisition model meeting preset conditions is trained on the high-accuracy manual labeling results, and the model is then used to extract graph knowledge. This reduces the difficulty and threshold of constructing a knowledge graph, improves construction efficiency, and improves the experience of the construction process.
In some embodiments, the user may also configure the knowledge graph, that is, define the types of graph knowledge it includes and the specific items included in each type. The structure, types, and items of the knowledge graph may differ between service scenarios. Fig. 2 is a flowchart illustrating another method for constructing a knowledge graph according to an embodiment of the present disclosure. Steps 201 to 203 in the embodiment shown in Fig. 2 correspond to steps 101 to 103 described above. As shown in Fig. 2, the method further includes:
Step 200, providing a graph configuration interface, and acquiring knowledge graph definition information input through the graph configuration interface.
In this step, the definition information of the knowledge graph is obtained by providing a graph configuration interface through which the user enters it. Specifically, when the knowledge graph includes entity nodes, event nodes, and relationship edges, the acquired definition information may include at least one of entity type definition information, event type definition information, and relationship type definition information.
Specifically, for graph knowledge of the entity type, the input definition information may be summary information about the specific entity nodes to be included, for example time, business, or actor. The input interface for entity type definition information may be as shown in Fig. 3. On the page shown in Fig. 3, the input definition information consists of entity node names such as employee or person, department, job title, company, and date. In the embodiments of the present disclosure, graph knowledge of the entity type corresponds to a node in the graph, so it may also be referred to as a node type in some cases.
The input interface for relationship type definition information may be as shown in Fig. 4. The input definition information consists of relationship edge names such as "department-employee-job".
For graph knowledge of the event type, the input definition information may be summary information about specific events, including the event name and the relationships within the event. For example, a financing event may include the relationships "investor-company", "investment time-time", and "number of financing rounds-number of rounds". The input interface for event type definition information may be as shown in Fig. 5.
In the embodiments of the present disclosure, the definition information determines the types and content of the graph knowledge included in the subsequent knowledge graph. Manual labeling therefore targets graph knowledge that matches the definition information, and the manual labeling interface provides labeling prompt controls that match the knowledge graph definition information. Specifically, Fig. 6 is a schematic diagram of a manual labeling interface in an embodiment of the present disclosure. The interface shown in Fig. 6 is used to manually extract graph knowledge from a passage introducing "company X". Labeling prompt controls matching the definition information are displayed on the left side. Examples include entity type controls for person, department, company, date, and job title, and relationship type controls for "company-creator-person" and "company-released product-product".
After the definition information of the knowledge graph is obtained, it is used in two stages. In the manual labeling stage, graph knowledge matching the definition information is labeled to generate the first labeling result data. In the model extraction stage, the definition information also guides graph knowledge extraction. As shown in Fig. 2, the step of extracting the first graph knowledge by using the knowledge acquisition model may specifically include:
Step 204, extracting, based on the knowledge graph definition information and by using the knowledge acquisition model, first graph knowledge matching the knowledge graph definition information from the original information data set.
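The definition information can be pictured as a small schema against which the model's candidate graph knowledge is checked. The sketch below shows one possible representation. It assumes each relationship type is written as head type, relation, tail type, following the "company-creator-person" convention. `GraphDefinition`, `Candidate`, and `filter_by_definition` are hypothetical names, not part of the disclosure. Event definitions are stored for completeness, but this filter does not use them.

```python
from dataclasses import dataclass, field
from typing import NamedTuple


class Candidate(NamedTuple):
    """A typed candidate triple produced by the extractor (hypothetical shape)."""
    head: str
    head_type: str
    relation: str
    tail: str
    tail_type: str


@dataclass
class GraphDefinition:
    """Hypothetical container for knowledge graph definition information."""
    entity_types: set[str]
    # relation name -> (head entity type, tail entity type)
    relation_types: dict[str, tuple[str, str]]
    # event name -> role names; kept for completeness, not used by the filter below
    event_types: dict[str, list[str]] = field(default_factory=dict)


def filter_by_definition(definition: GraphDefinition, candidates: list[Candidate]) -> list[Candidate]:
    """Keep only candidates whose relation and entity types match the definition."""
    kept = []
    for cand in candidates:
        signature = definition.relation_types.get(cand.relation)
        if (
            signature == (cand.head_type, cand.tail_type)
            and cand.head_type in definition.entity_types
            and cand.tail_type in definition.entity_types
        ):
            kept.append(cand)
    return kept
```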
In some embodiments, while the knowledge graph is being constructed from the first graph knowledge, the first graph knowledge may be converted into questions to be confirmed and pushed to users for answering. Users thereby annotate the graph knowledge extracted by the knowledge acquisition model a second time, and this type of annotation generally places lower demands on them. Specifically, referring to Fig. 2, the step of constructing the knowledge graph includes:
Step 205, converting the extracted first graph knowledge into questions to be confirmed, and pushing the questions to at least one user for answering through a preset channel;
After the knowledge acquisition model extracts the first graph knowledge, that knowledge is converted into questions to be confirmed. These questions are pushed to users for answering through at least one preset channel. The questions are of the question-and-answer type, and the user only needs to answer yes, no, or not sure. The demands on the user are therefore low, and ordinary users without much professional knowledge can answer. The questions to be confirmed can also be opened to external parties and integrated into different application scenarios or platforms. Once collected, the ordinary users' answers serve as labels for the first graph knowledge.
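The following sketch shows one way to turn an extracted triple into a yes/no question. It uses per-relation text templates and a generic fallback. The templates, the `is_a` relation name, and the `PendingQuestion` type are hypothetical illustrations. The three answer choices mirror the "Yes", "No", and "Not sure" controls described for the user response interface.

```python
from dataclasses import dataclass


@dataclass
class PendingQuestion:
    """A question to be confirmed, linked back to the triple it checks."""
    triple: tuple[str, str, str]
    text: str
    choices: tuple[str, ...] = ("Yes", "No", "Not sure")


# Hypothetical question templates keyed by relation name.
TEMPLATES = {
    "is_a": "Is {head} a {tail}?",
    "creator": "Was {head} created by {tail}?",
}
FALLBACK = "Does {head} have relation '{relation}' with {tail}?"


def to_question(triple: tuple[str, str, str]) -> PendingQuestion:
    """Render a triple as a yes/no question; unknown relations use the fallback."""
    head, relation, tail = triple
    template = TEMPLATES.get(relation, FALLBACK)
    # str.format ignores keyword arguments that a template does not use.
    return PendingQuestion(triple, template.format(head=head, relation=relation, tail=tail))


print(to_question(("X-th paradigm", "is_a", "company")).text)
```

Because each question keeps a reference to its source triple, an answer can later be mapped back to the graph knowledge it confirms or rejects.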
Step 206, constructing the knowledge graph based on the users' answers.
After the users' answers to the questions to be confirmed are collected through the various application scenarios or platforms, the knowledge graph can be constructed from these answers.
In the embodiments of the present disclosure, the first graph knowledge extracted by the knowledge acquisition model is converted into questions to be confirmed. These questions are made available in different application scenarios or platforms so that ordinary users can answer them, which achieves manual annotation of the first graph knowledge. This technical solution reduces the difficulty of constructing the knowledge graph and improves construction efficiency.
In some embodiments, first graph knowledge that users confirm in their answers has higher confidence. It can therefore be used as sample data to further train the knowledge acquisition model and improve its performance. Fig. 7 is a schematic flowchart of another method for constructing a knowledge graph according to an embodiment of the present disclosure. As shown in Fig. 7, the method may further include:
Step 701, obtaining second annotation result data based on the users' answers;
Specifically, this step treats the users' answers as a re-correction of the graph knowledge extracted by the knowledge acquisition model. The answers are used to obtain the second annotation result data, which serves as optimization training data for re-optimizing the model.
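The disclosure does not say how several users' answers to the same question are combined into second annotation result data. The sketch below assumes a simple majority vote in which "Not sure" answers are ignored and ties produce no label. Both rules are illustrative assumptions, and `answers_to_labels` is a hypothetical name.

```python
from collections import Counter


def answers_to_labels(answers):
    """Aggregate (triple, answer) pairs into {triple: True/False} labels.

    Assumed rules: "Not sure" answers are skipped; a triple is labeled True
    when "Yes" outvotes "No", False when "No" outvotes "Yes", and is left
    out entirely on a tie.
    """
    votes = {}
    for triple, answer in answers:
        if answer == "Not sure":
            continue
        votes.setdefault(triple, Counter())[answer] += 1
    labels = {}
    for triple, count in votes.items():
        if count["Yes"] > count["No"]:
            labels[triple] = True    # confirmed: positive training sample
        elif count["No"] > count["Yes"]:
            labels[triple] = False   # rejected: negative training sample
    return labels
```

Keeping rejected triples as negative samples, rather than discarding them, gives the retraining step in step 702 both kinds of evidence.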
Step 702, performing optimization training on the knowledge acquisition model based on the second annotation result data to obtain an optimized knowledge acquisition model;
Optimization training with the second annotation result data can improve the performance of the knowledge acquisition model, for example its accuracy, recall, and F1 value.
Step 703, converting the graph knowledge extracted by the optimized knowledge acquisition model into questions to be confirmed, and pushing the questions to at least one user for answering.
For the optimized knowledge acquisition model, the extracted first graph knowledge may be converted into questions to be confirmed in the manner shown in the embodiment of Fig. 2, and the questions are then pushed to users for answering.
As a loop optimization scheme, the embodiments of the present disclosure may repeatedly generate optimization training data from the users' answers and use it to optimize the knowledge acquisition model. In other words, steps 701 to 703 above are performed repeatedly.
In the embodiments of the present disclosure, the answers that users continuously confirm serve as optimization training data for the knowledge acquisition model, which in turn generates more accurate questions to push to users. In some embodiments, the performance parameter value of the graph knowledge extracted from the original information data set may reach a certain level, that is, become greater than or equal to a first preset threshold. In that case, the first graph knowledge can be stored in the database and used directly to construct the knowledge graph. Accordingly, in the embodiment shown in Fig. 7, the method further includes:
Step 704, when it is determined, based on the users' answers, that the performance parameter value of the first graph knowledge extracted from the original information data set by the knowledge acquisition model is greater than or equal to the first preset threshold, constructing the knowledge graph directly based on the first graph knowledge.
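The loop formed by steps 701 to 704 can be sketched as follows. The sketch assumes the performance parameter in step 704 is estimated as the share of answered items that users confirmed, an assumption since the disclosure leaves the exact metric open. It also adds a `max_rounds` cap, which the disclosure does not mention. The `collect_answers` and `retrain` callables are hypothetical hooks for pushing questions and for retraining.

```python
from collections.abc import Callable, Hashable

Labels = dict[Hashable, bool]


def estimated_precision(labels: Labels) -> float:
    """Share of answered items that users confirmed (0.0 if nothing was answered)."""
    return sum(labels.values()) / len(labels) if labels else 0.0


def optimize_until_good(
    model,
    collect_answers: Callable[[object], Labels],
    retrain: Callable[[object, Labels], object],
    threshold: float,
    max_rounds: int = 10,
):
    """Repeat push -> collect -> retrain until the threshold of step 704 is met.

    Returns (model, rounds_used, reached_threshold).
    """
    for round_index in range(max_rounds):
        labels = collect_answers(model)             # steps 703 and 701
        if estimated_precision(labels) >= threshold:
            return model, round_index, True         # step 704: build the graph directly
        model = retrain(model, labels)              # step 702
    return model, max_rounds, False                 # cap reached without meeting threshold
```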
In the embodiments of the present disclosure, the performance parameter value may include metrics of knowledge acquisition model performance such as accuracy, recall, or F1 value, or other values calculated from these metrics. The F1 value is the harmonic mean of precision and recall.
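For reference, these are the standard definitions. They are counted here, as an assumption, over extracted graph knowledge items. TP is the number of extracted items that are correct, FP the number of extracted items that are wrong, and FN the number of correct items the model missed.

$$
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2PR}{P + R}
$$

For example, with $P = 0.8$ and $R = 0.6$, $F_1 = \frac{2 \times 0.8 \times 0.6}{0.8 + 0.6} = \frac{0.96}{1.4} \approx 0.686$. This is slightly below the arithmetic mean of 0.7, because the harmonic mean penalizes an imbalance between precision and recall.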
In some embodiments, regarding step 701 in the embodiment shown in Fig. 7, the questions to be confirmed may be pushed to the user through a user response interface. This interface presents each question to be confirmed together with answer submission controls.
Specifically, the user response interface may be as shown in Fig. 8. It presents the question to be confirmed, "Is X-th paradigm a company?", along with three answer submission controls: "Not sure", "Yes", and "No". The user submits an answer by clicking one of these controls.
Optionally, either or both of the following may also be displayed on the user response interface:
First, the source information data of the first graph knowledge corresponding to the question to be confirmed, which may be part of the original information data set. In the embodiment illustrated in Fig. 8, the source information data is a reference article, namely the source of the question "Is X-th paradigm a company?";
Second, a knowledge graph created from the first graph knowledge corresponding to the question to be confirmed. Fig. 8 shows the knowledge graph created from the first graph knowledge behind the question "Is X-th paradigm a company?".
The presentation described above displays one question to be confirmed at a time together with its corresponding knowledge graph, so it can be regarded as a graph-mode presentation. The embodiments of the present disclosure also provide an efficiency-mode presentation, which differs from the graph mode.
In the efficiency mode, at least two questions to be confirmed can be presented on the user response interface, and the answer submission controls include batch selection controls. Fig. 9 is a schematic diagram of an efficiency-mode interface provided in an embodiment of the present disclosure. As shown in Fig. 9, the user response interface displays multiple questions to be confirmed. The user can select each question individually or answer in batches through select-all, deselect-all, invert-selection, and multi-select controls, which can improve graph construction efficiency.
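The batch controls of the efficiency mode can be modeled as operations on a set of selected row indices. This sketch assumes that submitting marks selected questions "Yes" and unselected ones "No". The disclosure does not specify this mapping, and `BatchAnswerSheet` is a hypothetical name.

```python
class BatchAnswerSheet:
    """Hypothetical model of the efficiency-mode answer panel.

    Assumption: on submit, selected questions are answered "Yes" and the
    remaining ones "No".
    """

    def __init__(self, questions):
        self.questions = list(questions)
        self.selected: set[int] = set()

    def toggle(self, index: int) -> None:
        """Select or deselect a single question."""
        self.selected ^= {index}

    def select_all(self) -> None:
        self.selected = set(range(len(self.questions)))

    def deselect_all(self) -> None:
        self.selected = set()

    def invert(self) -> None:
        """Invert the selection: selected rows become unselected and vice versa."""
        self.selected = set(range(len(self.questions))) - self.selected

    def submit(self) -> list[tuple[str, str]]:
        return [
            (question, "Yes" if i in self.selected else "No")
            for i, question in enumerate(self.questions)
        ]
```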
In addition, in the efficiency mode, whether to display the source information data can be decided according to the content shown on the user response interface. For example, when there are many questions to be confirmed, the source information data is no longer displayed. The reference article shown in Fig. 8 does not appear in Fig. 9, so the user must judge whether each question is correct using his or her own general knowledge.
In some embodiments, the following comparison determines whether the performance parameter value of the knowledge acquisition model from step 103 meets the preset condition. The same portion of information data in the original information data set is processed twice: by the knowledge acquisition model, yielding second graph knowledge, and by manual labeling, yielding third annotation result data. The two results are then compared. Fig. 10 is a schematic flowchart of determining whether the knowledge acquisition model satisfies the preset condition in an embodiment of the present disclosure. As shown in Fig. 10, the process includes the following steps:
Step 1001, training a knowledge acquisition model based on the first labeling result data;
Step 1002, for the same portion of information data in the original information data set, acquiring second graph knowledge by using the knowledge acquisition model and acquiring third annotation result data through the manual labeling interface;
Step 1003, if it is determined, based on the second graph knowledge and the third annotation result data, that the performance parameter value of the knowledge acquisition model is greater than or equal to a second preset threshold, determining that the knowledge acquisition model meets the preset condition.
This method keeps the performance of the knowledge acquisition model under control. The knowledge graph is constructed from the first graph knowledge acquired by the model only when the model's performance meets the requirements.
Further, the method can also comprise the following steps:
Step 1004, if it is determined, based on the second graph knowledge and the third annotation result data, that the performance parameter value of the knowledge acquisition model is smaller than the second preset threshold, continuing to perform optimization training on the knowledge acquisition model based on the third annotation result data.
A performance parameter value that falls short of the requirement shows that the knowledge acquisition model is not yet reliable enough and needs to be optimized. The third annotation result data comes from manual labeling and is highly reliable, so this step can use it to continue the optimization training of the model.
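Steps 1002 to 1004 amount to scoring the model's output against manual labels on the same data and branching on a threshold. In the sketch below, both the second graph knowledge and the third annotation result data are assumed to be sets of comparable items such as triples. F1 is used as the default metric, which is one of the options the text lists. The function names are hypothetical.

```python
def evaluate_against_manual(predicted: set, gold: set) -> dict[str, float]:
    """Score model output (second graph knowledge) against manual labels (third annotation data)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def next_action(predicted: set, gold: set, second_threshold: float, metric: str = "f1") -> str:
    """Step 1003 accepts the model; step 1004 continues optimization training."""
    score = evaluate_against_manual(predicted, gold)[metric]
    return "accept" if score >= second_threshold else "continue_training"
```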
In the embodiments of the present disclosure, the performance parameter value may include metrics of knowledge acquisition model performance such as accuracy, recall, or F1 value, or other values calculated from these metrics. F1 is the harmonic mean of precision and recall.
In the above embodiments of the present disclosure, acquiring the original information data set through the data interface may involve providing a data uploading interface and acquiring the original information data set uploaded through it. The original information data set includes at least one of an original corpus data set, an original picture data set, and an original video data set. Optionally, the original corpus data set includes unstructured text data. Unstructured text data covers most forms of data, so the embodiments of the present disclosure have a wide range of application.
In some embodiments, structured data may also be used to construct the knowledge graph, in addition to the unstructured data described above. Fig. 11 is a schematic flowchart of constructing a knowledge graph from structured data in an embodiment of the present disclosure. As shown in Fig. 11, the method may further include the following steps:
Step 1101, acquiring structured data uploaded through a data uploading interface;
The data uploading interface can take various forms. For example, a data uploading interface can be provided and the structured data uploaded through it can be acquired. The salient feature of structured data is its relatively fixed fields, which allows it to form a knowledge graph directly. Therefore, when the structured data is acquired, the fields that correspond to each type of graph knowledge in the knowledge graph to be constructed can also be acquired. For example, when the knowledge graph includes graph knowledge of the entity, relationship, and event types, the corresponding content in the structured data needs to be specified. For the relationship "company-creator-person", the fields that correspond to "company" and "person" must be specified. Once these correspondences are created, graph knowledge can be imported automatically from the structured data.
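The field correspondence described above can be sketched as a mapping from each relation name to a (head column, tail column) pair, applied row by row to an uploaded CSV file. The column names, relation names, and the rule that skips empty cells are assumptions for illustration. The only library calls used are Python's standard `csv.DictReader` and `io.StringIO`.

```python
import csv
import io


def triples_from_csv(csv_text: str, mapping: dict[str, tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Turn CSV rows into (head, relation, tail) triples using a field mapping.

    mapping: relation name -> (head column, tail column). Rows where either
    mapped cell is missing or empty are skipped for that relation.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for relation, (head_col, tail_col) in mapping.items():
            # DictReader fills short rows with None, so guard before strip().
            head = (row.get(head_col) or "").strip()
            tail = (row.get(tail_col) or "").strip()
            if head and tail:
                triples.append((head, relation, tail))
    return triples


data = "company,founder,founded\nAcme,Alice,2010\nGlobex,,1999\n"
print(triples_from_csv(data, {"creator": ("company", "founder"), "founded_on": ("company", "founded")}))
```

Once the mapping is defined, every row is imported without manual labeling, which matches the text's point that structured data can form the graph directly.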
Step 1102, constructing a knowledge graph based on graph knowledge in the structured data.
Fig. 12 is a schematic diagram of a structured data uploading interface provided by an embodiment of the present disclosure. As shown in Fig. 12, for an uploaded company information form in CSV format, a field access section is used to enter the fields of the structured data that correspond to each type of graph knowledge. One example is the correspondence for the two fields "name" and "age" in the "company information form". The knowledge graph generated from the structured data can also be displayed on this interface.
In some embodiments, a graph construction overview interface may also be provided to implement at least one of the following functions. In a first aspect, the priority of each type of first graph knowledge can be obtained through the overview interface, and questions to be confirmed are then pushed according to these priorities. In a second aspect, information about the graph construction process can be displayed through the interface.
For the first aspect, if the first graph knowledge includes at least one of an entity type, a relationship type, and an event type, the graph building method may further include:
acquiring the priority of each type of first graph knowledge input through the graph construction overview interface;
In each of the above embodiments, pushing the questions to be confirmed to at least one user for answering through a preset channel may specifically include:
pushing the questions to be confirmed corresponding to each type of first graph knowledge based on the priorities.
For the second aspect, the method of constructing a knowledge-graph may further comprise at least one of:
acquiring performance parameter information of the knowledge acquisition model, including at least one of accuracy, recall, and F1 value, and displaying it on the graph construction overview interface; and/or
acquiring construction information for each type of first graph knowledge in the knowledge graph, and displaying it on the graph construction overview interface, where the construction information includes the constructors, the number of constructions, and the construction accuracy.
Fig. 13 is a schematic diagram of a graph construction overview interface provided in an embodiment of the present disclosure. As shown in Fig. 13, the overview interface is divided by type of first graph knowledge, such as entity type, relationship type, and event type. For each item of first graph knowledge within each type, it provides construction information such as the number of text constructions, the number of graph-mode constructions, the number of efficiency-mode constructions, and the accuracy. Text construction is the process in which a service expert labels data to obtain the first labeling result data in the above embodiments, using the manual labeling interface of Fig. 6. Graph-mode construction uses the user response interface of Fig. 8, and efficiency-mode construction uses the user response interface of Fig. 9. A priority can also be specified for each type of graph knowledge through a corresponding selection control. Choosing high, medium, or low sets the probability that questions corresponding to that type of first graph knowledge are pushed.
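The priority-to-probability behavior can be sketched with weighted random sampling over knowledge types. The 3:2:1 weights for high, medium, and low are a hypothetical choice, since the disclosure only says that the priority sets the push probability. Types without a priority default to medium here, which is also an assumption. The sketch relies on the standard library's `random.choices` and `random.choice`.

```python
import random

# Hypothetical weights; the disclosure does not give concrete values.
PRIORITY_WEIGHTS = {"high": 3, "medium": 2, "low": 1}


def pick_questions(questions_by_type, priority_by_type, k, rng=None):
    """Draw k (type, question) pairs, choosing the type with priority-weighted probability."""
    if rng is None:
        rng = random.Random()
    types = [t for t, qs in questions_by_type.items() if qs]
    if not types:
        return []
    weights = [PRIORITY_WEIGHTS[priority_by_type.get(t, "medium")] for t in types]
    picked = []
    for _ in range(k):
        chosen_type = rng.choices(types, weights=weights, k=1)[0]
        picked.append((chosen_type, rng.choice(questions_by_type[chosen_type])))
    return picked
```

Sampling, rather than strict ordering by priority, means low-priority types still receive some answers.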
Furthermore, for each input original information data set, the performance parameter information of the corresponding knowledge acquisition model can be viewed. This information includes at least one of accuracy, recall, and F1 value and is displayed on the graph construction overview interface. It may correspond to individual data sets within the original information data set or to the data set as a whole; the embodiments of the present disclosure do not limit this.
Fig. 14 is a schematic structural diagram of an apparatus for constructing a knowledge graph according to an embodiment of the present disclosure, and as shown in fig. 14, the apparatus includes a first obtaining module 11, a second obtaining module 12, a model training module 13, a knowledge extraction module 14, and a graph construction module 15.
The first obtaining module 11 is configured to obtain an original information data set uploaded through a data uploading interface. Specifically, the original information data set is the raw material for constructing the knowledge graph, and analyzing and extracting from it yields the corresponding graph knowledge. The original information data set may be of various types, for example at least one of an original corpus data set, an original picture data set, and an original video data set, chosen according to the specific service scenario.
The second obtaining module 12 is configured to output part of the corpus information data in the original information data set to a user through a manual labeling interface, and to obtain the first labeling result data that the user completes on the manual labeling interface;
The technical solution provided by the embodiments of the present disclosure combines manual labeling with model extraction. The manual labeling interface presents data to a user, who may be, for example, a service expert for the knowledge graph. The first labeling result data that this user completes on the interface is highly reliable and can serve as sample data for subsequently training the knowledge acquisition model.
The model training module 13 is configured to train a knowledge acquisition model based on the first labeling result data to obtain a knowledge acquisition model meeting a preset condition;
Based on the first labeling result data obtained by the second obtaining module 12, the knowledge acquisition model may be further trained until it meets the preset conditions. The preset conditions may be determined from performance parameters of the knowledge acquisition model, including but not limited to its accuracy, recall, and F1 value.
The knowledge extraction module 14 is configured to extract first graph knowledge from the original information data set by using the knowledge acquisition model;
After the model training module 13 obtains a knowledge acquisition model that satisfies the preset condition, the model already meets the service requirements. The original information data set may then be fed into the model as input data so that the model extracts the first graph knowledge from it. In this embodiment, the first graph knowledge may be extracted from all of the original information data set acquired by the first obtaining module 11 or from only part of it.
The graph construction module 15 is configured to construct a knowledge graph based on the first graph knowledge. Specifically, as described above, the knowledge graph constructed by this module may include entity nodes, relationship edges, events, and the like.
The technical solution for constructing a knowledge graph provided by the embodiments of the present disclosure extracts graph knowledge by combining manual labeling with model extraction. Experts can complete the manual labeling. A knowledge acquisition model meeting the preset conditions is trained on these high-accuracy manual labels and is then used to extract graph knowledge, which reduces the difficulty of constructing the knowledge graph and improves construction efficiency.
In some embodiments, the user may also configure the knowledge graph, that is, define the types of graph knowledge it includes and the specific items included in each type. The structure, types, and items of the knowledge graph may differ between service scenarios. Fig. 15 is a schematic structural diagram of another apparatus for constructing a knowledge graph according to an embodiment of the present disclosure. The first obtaining module 21, the second obtaining module 22, and the model training module 23 in Fig. 15 correspond to the modules of the apparatus shown in Fig. 14. As shown in Fig. 15, the apparatus further includes:
a third obtaining module 20, configured to provide a graph configuration interface and to obtain the knowledge graph definition information input through the graph configuration interface.
This module obtains the definition information of the knowledge graph by providing a graph configuration interface through which the user enters it. Specifically, when the knowledge graph includes entity nodes, event nodes, and relationship edges, the acquired definition information may include at least one of entity type definition information, event type definition information, and relationship type definition information.
Specifically, for graph knowledge of the entity type, the input definition information may be summary information about the specific entity nodes to be included, for example time, business, or actor. The input interface for entity type definition information may be as shown in Fig. 3. On the page shown in Fig. 3, the input definition information consists of entity node names such as employee or person, department, job title, company, and date.
The input interface for relationship type definition information may be as shown in Fig. 4. The input definition information consists of relationship edge names such as "department-employee-job".
For graph knowledge of the event type, the input definition information may be summary information about specific events, including the event name and the relationships within the event. For example, a financing event may include the relationships "investor-company", "investment time-time", and "number of financing rounds-number of rounds". The input interface for event type definition information may be as shown in Fig. 5.
In the embodiments of the present disclosure, the definition information determines the types and content of the graph knowledge included in the subsequent knowledge graph. Manual labeling therefore targets graph knowledge that matches the definition information, and the manual labeling interface provides labeling prompt controls that match the knowledge graph definition information. Specifically, Fig. 6 is a schematic diagram of a manual labeling interface in an embodiment of the present disclosure. The interface shown in Fig. 6 is used to manually extract graph knowledge from a passage introducing "company X". Labeling prompt controls matching the definition information are displayed on the left side. Examples include entity type controls for person, department, company, date, and job title, and relationship type controls for "company-creator-person" and "company-released product-product".
After the definition information of the knowledge graph is obtained, it is used in two stages. In the manual labeling stage, graph knowledge matching the definition information is labeled to generate the first labeling result data. In the model extraction stage, the definition information also guides graph knowledge extraction. As shown in Fig. 15, the knowledge extraction module 24 is specifically configured to extract, based on the knowledge graph definition information and by using the knowledge acquisition model, first graph knowledge matching the definition information from the original information data set.
In some embodiments, while the knowledge graph is being constructed from the first graph knowledge, the first graph knowledge may be converted into questions to be confirmed and pushed to users for answering. Users thereby annotate the graph knowledge extracted by the knowledge acquisition model a second time, and this type of annotation generally places lower demands on them. Specifically, referring to Fig. 15, the graph construction module is specifically configured to convert the extracted first graph knowledge into questions to be confirmed, and to push the questions to at least one user for answering through a preset channel;
the module is used for further converting the first map knowledge into a question to be confirmed on the basis of extracting the first map knowledge by using a knowledge extraction model, and pushing the question to be confirmed to a user for answering through at least one preset channel, wherein the question is a question-and-answer type question, and the user only needs to provide a confirmed or uncertain answering result, so that the requirement on the user is low, and the question-and-answer type question can be realized by using a common user without much professional knowledge; and the problem to be confirmed can be opened to the outside and accessed to different application scenes or platforms for realization. And the answer results of the ordinary users can be used as the labeled first map knowledge after being collected.
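The conversion step can be sketched in a few lines of Python. The question templates, the function names `entity_question`, `relation_question`, and `push`, and the idea of a channel as a plain callable are all assumptions for illustration; the disclosure shows only the example question "Is X-th Paradigm a company?".

```python
from typing import Callable, Iterable

VOWELS = ("a", "e", "i", "o", "u")


def entity_question(mention: str, entity_type: str) -> str:
    """Yes/no question confirming an extracted entity and its type."""
    article = "an" if entity_type[:1].lower() in VOWELS else "a"
    return f"Is {mention} {article} {entity_type}?"


def relation_question(head: str, relation: str, tail: str) -> str:
    """Yes/no question confirming an extracted (head, relation, tail) triple."""
    return f"Is {tail} the {relation} of {head}?"


def push(questions: Iterable[str], channels: list[Callable[[str], None]]) -> int:
    """Send every question through every channel; return the number of sends."""
    sent = 0
    for question in questions:
        for send in channels:
            send(question)
            sent += 1
    return sent


outbox: list[str] = []  # stands in for a real channel such as a web page or app
questions = [
    entity_question("X-th Paradigm", "company"),
    relation_question("Company X", "creator", "Alice"),
]
print(push(questions, [outbox.append]))  # 2
print(outbox)
```

Treating a channel as any callable keeps the question generation independent of where the questions are answered.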
Further, the graph construction module is also configured to construct the knowledge graph based on the users' answers.
After the users' answers to the questions to be confirmed are collected through the various application scenarios or platforms, the knowledge graph can be constructed from those answers.
In the embodiments of the present disclosure, the first graph knowledge extracted by the knowledge acquisition model is converted into questions to be confirmed and distributed through different application scenarios or platforms, so that ordinary users can answer them and thereby manually label the first graph knowledge. This lowers the difficulty of constructing the knowledge graph and improves construction efficiency.
In some embodiments, first graph knowledge that users have confirmed in their answers has high confidence, so it can be used as sample data to train the knowledge acquisition model and improve its performance. FIG. 16 is a schematic structural diagram of another apparatus for constructing a knowledge graph according to an embodiment of the present disclosure. As shown in FIG. 16, the apparatus may further include a fourth obtaining module 26 and an optimization training module 27. The fourth obtaining module 26 is configured to obtain second labeling result data based on the users' answers. The users' answers are effectively corrections of the graph knowledge extracted by the knowledge acquisition model, so this module turns them into second labeling result data, which serves as optimization training data for further optimizing the knowledge acquisition model.
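How answers become second labeling result data is not spelled out; a minimal sketch, assuming several users may answer the same question and that a simple majority vote over "yes"/"no" answers decides the label, could look like this. The `Answer` enum and `to_labels` function are hypothetical, and majority voting is one choice among several.

```python
from collections import Counter, defaultdict
from collections.abc import Hashable, Iterable
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    NOT_SURE = "not sure"


def to_labels(answers: Iterable[tuple[Hashable, Answer]]) -> dict[Hashable, bool]:
    """Turn (knowledge item, answer) pairs into labels by majority vote.

    "Not sure" answers are ignored, and items whose yes/no votes tie get no
    label, so only clearly confirmed or rejected items become training data.
    """
    votes: dict[Hashable, Counter] = defaultdict(Counter)
    for item, answer in answers:
        if answer is not Answer.NOT_SURE:
            votes[item][answer] += 1
    return {
        item: count[Answer.YES] > count[Answer.NO]
        for item, count in votes.items()
        if count[Answer.YES] != count[Answer.NO]
    }


creator = ("Company X", "creator", "Alice")
city = ("Company X", "located in", "Beijing")
unclear = ("Company X", "published product", "Y")
answers = [
    (creator, Answer.YES), (creator, Answer.YES), (creator, Answer.NO),
    (city, Answer.NO),
    (unclear, Answer.NOT_SURE), (unclear, Answer.YES), (unclear, Answer.NO),
]
print(to_labels(answers))  # creator -> True, city -> False, unclear omitted (tie)
```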
The optimization training module 27 is configured to perform optimization training on the knowledge acquisition model based on the second labeling result data to obtain an optimized knowledge acquisition model.
After optimization training with the second labeling result data, the performance of the knowledge acquisition model can be significantly improved, for example in its accuracy, recall, and F1.
The graph construction module 25 is specifically configured to convert the graph knowledge extracted by the optimized knowledge acquisition model into questions to be confirmed, and push them to at least one user for answering.
For the optimized knowledge acquisition model, the extracted first graph knowledge may be converted into questions to be confirmed in the manner described in the embodiment of fig. 15, and these questions are then pushed to users for answering.
As a cyclic optimization scheme, in the embodiments of the present disclosure, optimization training data may be generated repeatedly from users' answers and used to train the knowledge acquisition model further; that is, each module may execute its function repeatedly.
In the embodiments of the present disclosure, answers continuously confirmed by users can be used as optimization training data to optimize the knowledge acquisition model, which in turn generates more accurate questions to push to users. In some embodiments, if the performance parameter value of the graph knowledge extracted from the original information data set by the knowledge acquisition model reaches a certain level, that is, is greater than or equal to a first preset threshold, the first graph knowledge may be regarded as ready to be stored in the database, and the knowledge graph can be constructed directly from it. That is, in the embodiment shown in fig. 16, the graph construction module 25 may be further configured to construct the knowledge graph directly based on the first graph knowledge when it is determined, based on the users' answers, that the performance parameter value of the first graph knowledge extracted from the original information data set by the knowledge acquisition model is greater than or equal to the first preset threshold.
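The cycle described above (extract, ask, retrain, until the first preset threshold is reached) can be sketched as a plain loop. The function `optimize_until_reliable`, its callback parameters, and the `max_rounds` cap are hypothetical; `evaluate` stands in for however the performance value is derived from users' answers, which the disclosure leaves open.

```python
from typing import Callable, TypeVar

K = TypeVar("K")


def optimize_until_reliable(
    extract: Callable[[], list[K]],
    ask_users: Callable[[list[K]], dict[K, bool]],
    evaluate: Callable[[dict[K, bool]], float],
    retrain: Callable[[dict[K, bool]], None],
    first_threshold: float,
    max_rounds: int = 10,
) -> tuple[list[K], int, bool]:
    """Extract, ask users, and retrain until the extracted knowledge is reliable.

    Returns the last extracted knowledge, the number of rounds run, and whether
    the performance value reached first_threshold (in which case the graph can
    be built directly from that knowledge).
    """
    knowledge: list[K] = []
    for round_no in range(1, max_rounds + 1):
        knowledge = extract()             # first graph knowledge
        labels = ask_users(knowledge)     # users' answers as labels
        if evaluate(labels) >= first_threshold:
            return knowledge, round_no, True
        retrain(labels)                   # second labeling result data
    return knowledge, max_rounds, False


# A fake model whose quality rises by 0.25 after each retraining round.
state = {"quality": 0.5}
result = optimize_until_reliable(
    extract=lambda: ["Company X -creator-> Alice"],
    ask_users=lambda ks: {k: True for k in ks},
    evaluate=lambda labels: state["quality"],
    retrain=lambda labels: state.update(quality=state["quality"] + 0.25),
    first_threshold=0.9,
)
print(result)  # (['Company X -creator-> Alice'], 3, True)
```

Below the threshold the graph is still built from user-confirmed answers; the loop only decides when the model output can be trusted without confirmation.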
In the embodiments of the present disclosure, the performance parameter value may include parameters that measure the performance of the knowledge acquisition model, such as accuracy, recall, or the F1 value, or another value calculated from these parameters, where the F1 value is the harmonic mean of precision and recall.
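For reference, the standard definitions behind these metrics are given below, reading the "accuracy" of extracted graph knowledge as precision. Here TP counts extracted items that are correct, FP extracted items that are wrong, and FN correct items the model missed; the worked numbers are illustrative only.

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R} = \left[\tfrac{1}{2}\left(\tfrac{1}{P} + \tfrac{1}{R}\right)\right]^{-1}

% Example: P = 0.8, R = 0.6
F_1 = \frac{2 \times 0.8 \times 0.6}{0.8 + 0.6} = \frac{0.96}{1.4} \approx 0.686
```

Because it is a harmonic mean, F1 is pulled toward the smaller of P and R, so a model cannot score well by maximizing only one of them.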
In some embodiments, when the graph construction module 25 pushes a question to be confirmed to a user, it may provide a user answering interface that displays the question together with answer submission controls.
Specifically, the user answering interface may be as shown in fig. 8, which displays the question to be confirmed "Is X-th Paradigm a company?" and three answer submission controls, "Not sure", "Yes", and "No"; the user submits an answer by clicking one of the controls.
Optionally, either or both of the following may be displayed on the user answering interface:
first, the source information data of the first graph knowledge corresponding to the question to be confirmed, which may be part of the original information data set; in the embodiment shown in fig. 8, the source information data is a reference article from which the question "Is X-th Paradigm a company?" was derived;
second, a knowledge graph created from the first graph knowledge corresponding to the question to be confirmed; fig. 8 shows the knowledge graph created from the first graph knowledge behind the question "Is X-th Paradigm a company?".
The display method above presents only one question to be confirmed at a time, together with the corresponding knowledge graph, and can be regarded as a graph-mode display. In addition to the graph mode, the embodiments of the present disclosure also provide an efficiency-mode display.
In the efficiency mode, the graph construction module 25 may present at least two questions to be confirmed on the user answering interface, and the answer submission controls may include batch selection controls. Fig. 9 is a schematic diagram of an efficiency-mode interface. As shown in fig. 9, multiple questions to be confirmed are displayed on the user answering interface; the user can select each question individually, or answer through select-all, select-none, invert-selection, and partial multi-select controls, which can significantly improve the efficiency of graph construction.
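The batch selection controls reduce to a few set operations. The sketch below assumes that a checked question is submitted as "yes" and an unchecked one as "no"; the `BatchSelection` class and its method names are hypothetical and model only the interface state, not any real UI toolkit.

```python
class BatchSelection:
    """Hypothetical model of the efficiency-mode batch selection controls.

    A checked question is submitted as "yes" and an unchecked one as "no".
    """

    def __init__(self, questions: list[str]) -> None:
        self.questions = list(questions)
        self.selected: set[int] = set()

    def toggle(self, index: int) -> None:
        """Check or uncheck a single question (individual selection)."""
        self.selected ^= {index}

    def select_all(self) -> None:
        self.selected = set(range(len(self.questions)))

    def select_none(self) -> None:
        self.selected = set()

    def invert(self) -> None:
        self.selected = set(range(len(self.questions))) - self.selected

    def submit(self) -> list[tuple[str, bool]]:
        """Return (question, confirmed) pairs for every displayed question."""
        return [(q, i in self.selected) for i, q in enumerate(self.questions)]


batch = BatchSelection(["Is A a company?", "Is B a person?", "Is C a date?"])
batch.select_all()
batch.toggle(1)  # the user unchecks the one wrong item
print(batch.submit())
```

With mostly-correct model output, "select all, then uncheck the exceptions" takes far fewer clicks than answering each question separately, which is the efficiency gain the mode targets.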
In addition, in the efficiency mode, whether the source information data is displayed can be decided according to the content displayed on the user answering interface. For example, when there are many questions to be confirmed, the source information data is no longer displayed: the reference article shown in fig. 8 does not appear in fig. 9, and the user must judge whether each question is correct using his or her own general knowledge.
In some embodiments, whether the knowledge acquisition model meets the preset condition may be determined by comparing, for the same portion of the information data in the original information data set, the second graph knowledge obtained by the knowledge acquisition model with the third labeling result data obtained through manual labeling, and checking whether the model's performance parameter value meets the preset condition. The model training module 23 is configured to: train the knowledge acquisition model based on the first labeling result data; for the same portion of the information data in the original information data set, obtain second graph knowledge using the knowledge acquisition model and obtain third labeling result data through the manual labeling interface; and, if it is determined based on the second graph knowledge and the third labeling result data that the performance parameter value of the knowledge acquisition model is greater than or equal to a second preset threshold, determine that the knowledge acquisition model meets the preset condition.
In this way, the performance of the knowledge acquisition model is controlled: the first graph knowledge obtained by the knowledge acquisition model is used to construct the knowledge graph only when the model's performance meets the requirements.
Further, the model training module 23 may be further configured to continue optimization training of the knowledge acquisition model based on the third labeling result data if it is determined, based on the second graph knowledge and the third labeling result data, that the performance parameter value of the knowledge acquisition model is smaller than the second preset threshold.
A knowledge acquisition model whose performance parameter value does not meet the requirement is not yet reliable enough and needs further optimization. Because the third labeling result data comes from manual labeling and is highly reliable, it can be used in this step to continue optimization training of the knowledge acquisition model.
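The comparison against manual labels on the same data can be sketched as a set comparison of triples. The functions `evaluate` and `next_step`, the choice of F1 as the default metric, and the returned strings are hypothetical; the disclosure allows accuracy, recall, F1, or a value derived from them.

```python
from collections.abc import Hashable


def evaluate(predicted: set[Hashable], gold: set[Hashable]) -> dict[str, float]:
    """Precision, recall, and F1 of model output (second graph knowledge)
    against manual labels (third labeling result data) on the same data."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def next_step(predicted, gold, second_threshold: float, metric: str = "f1") -> str:
    """Decide whether the model meets the preset condition."""
    if evaluate(predicted, gold)[metric] >= second_threshold:
        return "use model"          # preset condition met: extract first graph knowledge
    return "continue training"      # retrain on the third labeling result data


model_output = {
    ("Company X", "creator", "Alice"), ("Company X", "creator", "Bob"),
    ("Company X", "located in", "Beijing"), ("Company Y", "creator", "Carol"),
}
manual = {
    ("Company X", "creator", "Alice"), ("Company X", "located in", "Beijing"),
    ("Company Y", "creator", "Dan"), ("Company Z", "creator", "Eve"),
}
print(evaluate(model_output, manual))  # {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
print(next_step(model_output, manual, second_threshold=0.8))  # continue training
```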
In the embodiments of the present disclosure, the performance parameter value may include parameters that measure the performance of the knowledge acquisition model, such as accuracy, recall, or the F1 value, or another value calculated from these parameters, where F1 is the harmonic mean of precision and recall.
In the above embodiments of the present disclosure, obtaining the original information data set uploaded through the data interface may include providing a data uploading interface and obtaining the original information data set uploaded through it. The original information data set includes at least one of an original corpus data set, an original picture data set, and an original video data set; optionally, the original corpus data set includes unstructured text data. Because unstructured text covers most forms of data, the embodiments of the present disclosure have a wide range of applications.
In some embodiments, besides constructing the knowledge graph from unstructured data, the knowledge graph can also be constructed from structured data. The graph construction apparatus may further include a sixth obtaining module configured to obtain structured data uploaded through the data uploading interface.
The data uploading interface can take various forms; for example, a data uploading interface can be provided and the structured data uploaded through it can be acquired. The distinguishing feature of structured data is that it has relatively fixed fields, so it can be used to form the knowledge graph directly. Therefore, when the structured data is obtained, the fields in the structured data that correspond to each type of graph knowledge in the knowledge graph to be constructed can also be obtained. For example, when the knowledge graph includes graph knowledge of entity types, relation types, and event types, the corresponding content in the structured data needs to be specified: for the relation "company-creator-person", the fields in the structured data that correspond to "company" and "person" must be specified. Once this correspondence has been created, graph knowledge can be imported automatically from the structured data.
Further, the graph construction module is also configured to construct the knowledge graph based on the graph knowledge in the structured data.
Fig. 12 is a schematic diagram of a structured data uploading interface provided by an embodiment of the present disclosure. As shown in fig. 12, a company information table in CSV format is uploaded, and the field access section is used to specify which fields of the structured data correspond to each type of graph knowledge, for example, the correspondence between the two fields "name" and "age" in the "company information table". In addition, the knowledge graph generated from the structured data can be displayed on the same interface.
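The field correspondence step can be sketched with Python's standard `csv` module. The function `import_structured`, the column names, and the mapping format (relation name to a pair of head and tail columns) are hypothetical stand-ins for what a user might enter on the upload interface.

```python
import csv
import io


def import_structured(
    csv_text: str, mapping: dict[str, tuple[str, str]]
) -> list[tuple[str, str, str]]:
    """Turn CSV rows into (head, relation, tail) triples.

    mapping: relation name -> (column holding the head, column holding the tail),
    i.e. the field correspondence the user enters on the upload interface.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for relation, (head_col, tail_col) in mapping.items():
            # Missing columns or short rows yield None; treat them as empty.
            head = (row.get(head_col) or "").strip()
            tail = (row.get(tail_col) or "").strip()
            if head and tail:  # skip rows where either side is empty
                triples.append((head, relation, tail))
    return triples


company_table = "name,creator\nCompany X,Alice\nCompany Y,\n"
print(import_structured(company_table, {"creator": ("name", "creator")}))
# [('Company X', 'creator', 'Alice')]
```

Because the mapping is data rather than code, the same import routine serves any table once the user has declared which columns feed which relation.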
In some embodiments, a graph construction overview interface may also be provided, which may implement at least one of the following functions. In the first aspect, the priority of each type of first graph knowledge can be obtained through the overview interface, and the questions to be confirmed are then pushed with reference to these priorities. In the second aspect, information about the graph construction process can be displayed on the interface.
For the first aspect, if the first graph knowledge includes at least one of an entity type, a relation type, and an event type, the graph construction apparatus may further include a fifth obtaining module configured to obtain the priority of each type of first graph knowledge input through the graph construction overview interface.
In this embodiment, the graph construction module may be specifically configured to push the questions to be confirmed corresponding to each type of first graph knowledge according to its priority.
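One way to turn high, medium, and low priorities into push probabilities is weighted random choice over the knowledge types that still have pending questions. The weights 3/2/1, the default of "medium", and the function `pick_question` are assumptions; the disclosure states only that priority sets the probability of a type's questions being pushed.

```python
import random
from typing import Optional

# Hypothetical weights: a "high" type is three times as likely as a "low" one.
PRIORITY_WEIGHTS = {"high": 3, "medium": 2, "low": 1}


def pick_question(
    pending: dict[str, list[str]],
    priorities: dict[str, str],
    rng: random.Random,
) -> Optional[str]:
    """Pop the next question, choosing its knowledge type by priority weight.

    pending: knowledge type -> queued questions to be confirmed.
    priorities: knowledge type -> "high" / "medium" / "low" (default "medium").
    """
    types = [t for t, queue in pending.items() if queue]
    if not types:
        return None
    weights = [PRIORITY_WEIGHTS[priorities.get(t, "medium")] for t in types]
    chosen = rng.choices(types, weights=weights, k=1)[0]
    return pending[chosen].pop(0)


rng = random.Random(0)
pending = {"company": ["Is X-th Paradigm a company?"], "company-creator-person": []}
print(pick_question(pending, {"company": "high"}, rng))  # Is X-th Paradigm a company?
print(pick_question(pending, {"company": "high"}, rng))  # None
```

Sampling rather than strict ordering keeps low-priority types from starving while still favoring the types the user marked as important.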
For the second aspect, the method of constructing a knowledge graph may further include at least one of the following:
obtaining performance parameter information of the knowledge acquisition model, including at least one of accuracy, recall, and F1, and displaying it on the graph construction overview interface; and/or
obtaining construction information for each type of first graph knowledge in the knowledge graph, including the constructor, the number of constructions, and the construction accuracy, and displaying it on the graph construction overview interface.
Fig. 13 is a schematic diagram of the graph construction overview interface in an embodiment of the present disclosure. As shown in fig. 13, the first graph knowledge is grouped by type, such as entity type, relation type, and event type, and construction information is provided for each item of first graph knowledge within each type, such as the number of text constructions, the number of graph-mode constructions, the number of efficiency-mode constructions, and the accuracy. Text construction is the process in which a business expert labels data to obtain the first labeling result data in the embodiments above (see the manual labeling interface of fig. 6); graph-mode construction refers to the user answering interface of fig. 8; and efficiency-mode construction refers to the user answering interface of fig. 9. In addition, a priority can be specified for each type of graph knowledge through a selection control: choosing high, medium, or low sets the probability that questions corresponding to that type of first graph knowledge are pushed.
Furthermore, for the input original information data set, the performance parameter information of the corresponding knowledge acquisition model can be viewed. This information includes at least one of accuracy, recall, and F1 and is displayed on the graph construction overview interface; it may correspond to individual data sets within the original information data set or to the data set as a whole, which is not limited in the embodiments of the present disclosure.
In a third aspect, an embodiment of the present invention provides a computer apparatus, including:
a processor for implementing the steps of the method of constructing a knowledge-graph as described above when executing a computer program stored in the memory.
The processor may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the computer to perform desired functions.
The memory may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on a computer-readable storage medium and executed by a processor to implement the above method steps of the various embodiments of the present application and/or other desired functions.
In a fourth aspect, embodiments of the present invention provide a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method of constructing a knowledge-graph as described above.
In addition to the methods and apparatus described above, embodiments of the present application may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the method steps of the various embodiments of the present application.
The computer program product may include program code for carrying out operations for embodiments of the present invention in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the method steps of the various embodiments of the present application.
A computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of constructing a knowledge graph, wherein the method comprises:
acquiring an original information data set uploaded through a data uploading interface;
outputting partial information data in the original information data set to a user through a manual labeling interface, and acquiring first labeling result data completed by the user on the manual labeling interface;
training a knowledge acquisition model based on the first labeling result data to obtain a knowledge acquisition model meeting a preset condition;
extracting first graph knowledge from the original information data set using the knowledge acquisition model;
and constructing a knowledge graph based on the first graph knowledge.
2. The method of claim 1, further comprising, before outputting the partial information data in the original information data set to the user through the manual labeling interface:
providing a graph configuration interface, and acquiring knowledge graph definition information input through the graph configuration interface;
wherein a labeling prompt control matching the knowledge graph definition information is provided on the manual labeling interface.
3. The method of claim 2, wherein extracting the first graph knowledge from the original information data set using the knowledge acquisition model comprises:
extracting, by using the knowledge acquisition model and based on the knowledge graph definition information, first graph knowledge matching the knowledge graph definition information from the original information data set.
4. The method of claim 1, wherein the constructing a knowledge-graph based on the first graph knowledge comprises:
converting the extracted first graph knowledge into a question to be confirmed, and pushing the question to at least one user through a preset channel for answering;
and constructing the knowledge graph based on the user's answer.
5. The method of claim 4, wherein the method further comprises:
obtaining second labeling result data based on the user's answer;
performing optimization training on the knowledge acquisition model based on the second labeling result data to obtain an optimized knowledge acquisition model;
and converting the first graph knowledge extracted by the optimized knowledge acquisition model into a question to be confirmed, and pushing the question to at least one user for answering.
6. The method of claim 5, wherein, when it is determined based on the user's answer that a performance parameter value of the first graph knowledge extracted from the original information data set using the knowledge acquisition model is greater than or equal to a first preset threshold, the knowledge graph is constructed directly based on the first graph knowledge.
7. The method of claim 4, wherein converting the extracted first graph knowledge into a question to be confirmed and pushing the question to at least one user through a preset channel for answering comprises:
providing a user answering interface, and displaying the question to be confirmed and an answer submission control on the user answering interface.
8. An apparatus for constructing a knowledge graph, wherein the apparatus comprises:
a first acquisition module, configured to acquire an original information data set uploaded through a data uploading interface;
a second acquisition module, configured to output partial information data in the original information data set to a user through a manual labeling interface and acquire first labeling result data completed by the user on the manual labeling interface;
a model training module, configured to train a knowledge acquisition model based on the first labeling result data to obtain a knowledge acquisition model meeting a preset condition;
a knowledge extraction module, configured to extract first graph knowledge from the original information data set using the knowledge acquisition model;
and a graph construction module, configured to construct a knowledge graph based on the first graph knowledge.
9. A computer device, the computer device comprising:
a processor for implementing the steps of the method according to any one of claims 1 to 7 when executing a computer program stored in a memory.
10. A computer-readable storage medium having stored thereon computer instructions, which, when executed by a processor, carry out the steps of the method according to any one of claims 1 to 7.
CN202010556526.3A 2020-06-17 2020-06-17 Method, device and equipment for constructing knowledge graph and readable storage medium Pending CN111753022A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202010556526.3A CN111753022A (en) 2020-06-17 2020-06-17 Method, device and equipment for constructing knowledge graph and readable storage medium
EP21825747.5A EP4170520A4 (en) 2020-06-17 2021-06-17 Method and device for constructing knowledge graph, computer device, and storage medium
PCT/CN2021/100709 WO2021254457A1 (en) 2020-06-17 2021-06-17 Method and device for constructing knowledge graph, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010556526.3A CN111753022A (en) 2020-06-17 2020-06-17 Method, device and equipment for constructing knowledge graph and readable storage medium

Publications (1)

Publication Number Publication Date
CN111753022A true CN111753022A (en) 2020-10-09

Family

ID=72676210

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010556526.3A Pending CN111753022A (en) 2020-06-17 2020-06-17 Method, device and equipment for constructing knowledge graph and readable storage medium

Country Status (1)

Country Link
CN (1) CN111753022A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112989062A (en) * 2021-02-24 2021-06-18 华东师范大学 Knowledge crowdsourcing platform construction method for rule judgment
WO2021254457A1 (en) * 2020-06-17 2021-12-23 第四范式(北京)技术有限公司 Method and device for constructing knowledge graph, computer device, and storage medium
CN114417018A (en) * 2022-03-28 2022-04-29 金现代信息产业股份有限公司 Full-process visual configuration system and method of knowledge graph

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106815293A (en) * 2016-12-08 2017-06-09 中国电子科技集团公司第三十二研究所 System and method for constructing knowledge graph for information analysis
US20180285346A1 (en) * 2017-03-31 2018-10-04 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for updating mining model
CN110008353A (en) * 2019-04-09 2019-07-12 福建奇点时空数字科技有限公司 A kind of construction method of dynamic knowledge map
CN110598000A (en) * 2019-08-01 2019-12-20 达而观信息科技(上海)有限公司 Relationship extraction and knowledge graph construction method based on deep learning model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination