CN111753021A

CN111753021A - Method, device and equipment for constructing knowledge graph and readable storage medium

Info

Publication number: CN111753021A
Application number: CN202010555945.5A
Authority: CN
Inventors: 陈伟; 陈雨强; 陶冶
Original assignee: 4Paradigm Beijing Technology Co Ltd
Current assignee: 4Paradigm Beijing Technology Co Ltd
Priority date: 2020-06-17
Filing date: 2020-06-17
Publication date: 2020-10-09

Abstract

The disclosure relates to a method, a device, equipment and a readable storage medium for constructing a knowledge graph. The method comprises: obtaining definition information of the knowledge graph; acquiring an original information data set; extracting graph knowledge from the original information data set by using a knowledge acquisition model based on the definition information of the knowledge graph; converting the graph knowledge extracted by the knowledge acquisition model into questions to be confirmed, and pushing the questions to at least one user for answering; and constructing the knowledge graph based on the answer results of the user. The technical solution provided by the embodiments of the disclosure can reduce the difficulty of knowledge graph construction and improve its efficiency.

Description

Method, device and equipment for constructing knowledge graph and readable storage medium

Technical Field

The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a readable storage medium for constructing a knowledge graph.

Background

With the continuous development of artificial intelligence technology, knowledge graph technology, one of the important branches of artificial intelligence, is increasingly applied in various industries.

In the prior art, a knowledge graph is constructed as follows. The specific structure of the knowledge graph is first defined. A labeling tool is then used to manually label large amounts of knowledge of all the defined types in unstructured data, and a model is trained on these labels; if the model's evaluation metrics do not meet the standard, all types continue to be sorted and labeled. Once the metrics reach expectations, the model is used to automatically extract subsequent unstructured data into triples, and the triples are assembled into the knowledge graph. Throughout this process every type must be labeled in every article, so the labeling workload is large, labeling is difficult, the omission rate is high, and efficiency is low. Because different knowledge types occur with different frequencies in the text, the metrics of the models trained for the various knowledge types are uneven, and the party building the graph cannot control or adjust the granularity of the knowledge types, so business requirements are poorly met. Furthermore, multiple natural language processing (NLP) experts or knowledge graph experts are required to complete the process.

Therefore, prior-art methods for constructing a knowledge graph require the participation of professional NLP experts or knowledge graph experts, and suffer from high difficulty and low efficiency.

Disclosure of Invention

To solve the above technical problem or to at least partially solve the above technical problem, the present disclosure provides a method, an apparatus, a device, and a readable storage medium for constructing a knowledge graph.

To achieve the above object, in a first aspect, the present disclosure provides a method of constructing a knowledge-graph, comprising:

acquiring definition information of a knowledge graph;

acquiring an original information data set;

extracting graph knowledge from the original information data set by using a knowledge acquisition model based on the definition information of the knowledge graph;

converting the graph knowledge extracted by the knowledge acquisition model into questions to be confirmed, and pushing the questions to at least one user for answering;

and constructing a knowledge graph based on the response result of the user.

In some embodiments, the method further comprises:

generating optimization training data based on the answer results of the user;

performing optimization training on the knowledge acquisition model based on the optimization training data to obtain an optimized knowledge acquisition model;

and converting the graph knowledge extracted by the optimized knowledge acquisition model into questions to be confirmed, and pushing the questions to at least one user for answering.

In some embodiments, the method further comprises:

and when it is determined, based on the answer results of the user, that the performance parameter value of the graph knowledge extracted from the original information data set by the knowledge acquisition model is greater than or equal to a preset threshold, directly constructing the knowledge graph based on the graph knowledge extracted by the knowledge acquisition model.

In some embodiments, obtaining knowledge-graph definition information includes:

acquiring at least one item of entity type definition information, event type definition information and relation type definition information of a knowledge graph;

extracting the graph knowledge from the original information data set using the knowledge acquisition model comprises:

extracting at least one of graph knowledge of the entity type, graph knowledge of the event type, and graph knowledge of the relationship type from the original information data set using the knowledge acquisition model.

In some embodiments, the knowledge acquisition model comprises at least one of a rules model, a dictionary model, a statistical learning model, a machine learning model, and a language model.

In some embodiments, extracting at least one of graph knowledge of the entity type, graph knowledge of the event type, and graph knowledge of the relationship type from the original information data set using the knowledge acquisition model includes at least one of:

extracting the graph knowledge of the entity type from the original information data set using at least one entity type knowledge acquisition model;

extracting the graph knowledge of the event type from the original information data set using at least one event type knowledge acquisition model;

and extracting the graph knowledge of the relationship type from the original information data set using at least one relationship type knowledge acquisition model.

In some embodiments, the graph knowledge output by any one of the entity type, event type, and relationship type knowledge acquisition models is used as input data for one or both of the other two.
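As an illustration of one model's output feeding another, the following minimal sketch chains a toy entity type model into a toy relationship type model. It assumes a dictionary-based entity model and a literal "<head> <keyword> <tail>" rule for relations; the functions `extract_entities` and `extract_relations` and the sample data are hypothetical and are not the models of the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    entity_type: str


@dataclass(frozen=True)
class Relation:
    head: Entity
    relation: str
    tail: Entity


def extract_entities(sentence, dictionary):
    """Hypothetical dictionary-based entity type model: every known term found in the sentence."""
    return [Entity(term, etype) for term, etype in dictionary.items() if term in sentence]


def extract_relations(sentence, entities, patterns):
    """Hypothetical rule-based relationship type model.

    It consumes the entity model's output: a relation is emitted only when the
    pattern "<head> <keyword> <tail>" occurs literally in the sentence.
    """
    relations = []
    for head in entities:
        for tail in entities:
            if head == tail:
                continue
            for keyword, relation in patterns.items():
                if f"{head.text} {keyword} {tail.text}" in sentence:
                    relations.append(Relation(head, relation, tail))
    return relations
```

For example, with the dictionary `{"Jon Favreau": "person", "Iron Man": "work"}` and the pattern `{"directed": "director_of"}`, the sentence "Jon Favreau directed Iron Man." yields one `director_of` relation whose ends are the two entities found by the first model. Real systems would replace both toy models with trained ones; only the data flow is the point here.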

In some embodiments, extracting the graph knowledge of the entity type from the original information data set using at least one entity type knowledge acquisition model comprises:

extracting graph knowledge of the person entity type from the original information data set using at least one of a rule model, a BERT language model, an LSTM + CRF model, and a few-shot language model.

In some embodiments, the method further comprises:

extracting graph knowledge of the entity type from the graph knowledge of the relationship type, and/or extracting graph knowledge of the entity type and graph knowledge of the relationship type from the graph knowledge of the event type;

and classifying and collecting the graph knowledge of the entity type, the graph knowledge of the relationship type, and the graph knowledge of the event type by type.

In some embodiments, the method further comprises:

merging and fusing each type of graph knowledge obtained by the classification and collection, and/or screening and filtering the graph knowledge obtained by the classification and collection.

In some embodiments, converting the graph knowledge extracted by the knowledge acquisition model into questions to be confirmed and pushing the questions to at least one user for answering includes:

generating a question pool based on the questions to be confirmed;

and extracting the questions to be confirmed from the question pool according to a preset sequence, and pushing the questions to at least one user for answering.

In some embodiments, extracting the questions to be confirmed from the question pool in a preset order and pushing them to at least one user for answering includes at least one of:

extracting questions to be confirmed from the question pool based on a learning model, and pushing the questions to at least one user for answering;

extracting questions to be confirmed from the question pool based on preset priority and/or performance parameter value indexes, and pushing the questions to at least one user for answering;

extracting questions to be confirmed from the question pool based on the relevance of the graph knowledge, and pushing the questions to at least one user for answering;

and extracting questions to be confirmed from the question pool based on the data granularity in the original information data set, and pushing the questions to at least one user for answering.
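One way to realize the "preset priority and/or performance parameter value" ordering is a heap-backed question pool. The sketch below assumes each question carries a hypothetical integer `priority` and the model's extraction `confidence` in [0, 1]; higher priority is served first and, within a priority, the least confident extraction is asked first. The class name and the ordering rule are illustrative assumptions, not the disclosure's scheme.

```python
import heapq
import itertools


class QuestionPool:
    """Question pool ordered by (preset priority, model confidence).

    Higher priority is served first; within the same priority, the question whose
    extraction the model was least confident about is served first.
    """

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: insertion order

    def add(self, question, priority, confidence):
        # heapq is a min-heap, so priority is negated to pop the highest first.
        heapq.heappush(self._heap, (-priority, confidence, next(self._seq), question))

    def pop_batch(self, n):
        """Remove and return up to n questions to push to users."""
        batch = []
        while self._heap and len(batch) < n:
            batch.append(heapq.heappop(self._heap)[-1])
        return batch

    def __len__(self):
        return len(self._heap)
```

The insertion counter keeps ordering stable for equal keys and prevents Python from ever comparing the question payloads themselves.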

In some embodiments, pushing the questions to at least one user includes:

pushing the questions to be confirmed to at least one user in text form, as images, or by voice broadcast;

the method further comprises:

and acquiring the answer results of the user through any one of keyboard input, mouse input, touch input, voice input, or somatosensory (motion-sensing) input.

In some embodiments, pushing the questions to at least one user includes at least one of:

pushing the questions to be confirmed to a shared learning platform;

pushing the questions to be confirmed to a login verification platform;

and pushing the questions to be confirmed to an operation platform, wherein the operation platform comprises at least one of a customer service system platform, an evaluation system platform, and a crowdsourcing platform.

In some embodiments, constructing the knowledge graph based on the answer results of the user comprises:

confirming, based on the answer results of the user, whether a preset confirmation condition is met, and when it is met, constructing the knowledge graph according to the graph knowledge corresponding to the answer results.

In some embodiments, the user's answer results include confirmation type results and denial type results;

confirming whether the preset confirmation condition is met based on the answer results of the user includes:

determining whether the preset confirmation condition is satisfied based on the number of confirmation type results and/or the number of denial type results.

In some embodiments, the preset confirmation conditions include a first accuracy confirmation condition and a second accuracy confirmation condition, and the number requirements of the first accuracy confirmation condition and the second accuracy confirmation condition are different.
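A count-based confirmation check with two accuracy levels might look like the sketch below. The concrete thresholds (at least 2 confirmations with at most 1 denial, versus at least 5 confirmations with no denials) are hypothetical values chosen only to show two conditions that differ in their number requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfirmationCondition:
    min_confirmations: int  # at least this many confirmation type ("yes") answers
    max_denials: int        # at most this many denial type ("no") answers


# Hypothetical thresholds; the two conditions differ only in their count requirements.
FIRST_ACCURACY = ConfirmationCondition(min_confirmations=2, max_denials=1)
SECOND_ACCURACY = ConfirmationCondition(min_confirmations=5, max_denials=0)


def is_confirmed(answers, condition):
    """answers: list of booleans, True for a confirmation, False for a denial."""
    confirmations = sum(answers)
    denials = len(answers) - confirmations
    return confirmations >= condition.min_confirmations and denials <= condition.max_denials
```

A stricter condition could, for instance, be reserved for knowledge types whose errors are costly, while the looser one admits knowledge after fewer answers.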

In some embodiments, constructing the knowledge graph according to the graph knowledge corresponding to the answer results comprises:

constructing entity nodes in the knowledge graph according to the graph knowledge of the entity type;

according to the graph knowledge of the relationship type, creating entity nodes that do not yet exist in the knowledge graph and establishing relationship edges;

and according to the graph knowledge of the event type, creating entity nodes that do not yet exist in the knowledge graph, and constructing event nodes and the events centered on those event nodes.
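The three construction rules above can be sketched with a small in-memory graph. This is an illustrative assumption, not the disclosure's storage model: entity nodes are a name-to-type map (`None` when the type is unknown), event nodes map an event id to its event type, and edges are `(source, relation, target)` triples.

```python
class KnowledgeGraph:
    """Minimal in-memory graph with entity nodes, event nodes and relationship edges."""

    def __init__(self):
        self.entity_nodes = {}  # entity name -> entity type (None if unknown)
        self.event_nodes = {}   # event id -> event type
        self.edges = set()      # (source, relation, target)

    def add_entity(self, name, entity_type=None):
        if entity_type is not None:
            self.entity_nodes[name] = entity_type  # typed entity knowledge wins
        else:
            self.entity_nodes.setdefault(name, None)  # create only if it does not exist yet

    def add_relation(self, head, relation, tail):
        # Relationship knowledge: create missing entity nodes, then the edge.
        self.add_entity(head)
        self.add_entity(tail)
        self.edges.add((head, relation, tail))

    def add_event(self, event_id, event_type, arguments):
        # Event knowledge: the event node is the centre, each argument is a role edge.
        self.event_nodes[event_id] = event_type
        for role, entity in arguments.items():
            self.add_entity(entity)
            self.edges.add((event_id, role, entity))
```

Using a set for edges makes re-adding the same confirmed triple idempotent.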

In some embodiments, the original information data set comprises at least one of an original corpus data set, an original picture data set, and an original video data set;

wherein obtaining the original corpus data set comprises:

acquiring unstructured text data;

and preprocessing the unstructured text data, wherein the preprocessing comprises at least one of encoding, word segmentation, rule model processing, and dictionary matching.
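To make the preprocessing step concrete, here is a minimal sketch that combines an encoding-level normalization (Unicode NFKC, which for example folds full-width letters and digits to ASCII) with dictionary-based word segmentation by forward maximum matching. Both technique choices are assumptions for illustration; the disclosure does not specify which encoding treatment or segmentation algorithm is used.

```python
import unicodedata


def preprocess(text, dictionary):
    """Normalize encoding-level variants, then segment by forward maximum matching."""
    text = unicodedata.normalize("NFKC", text)  # e.g. full-width letters/digits -> ASCII
    max_len = max([1] + [len(word) for word in dictionary])
    tokens, i = [], 0
    while i < len(text):
        # Try the longest dictionary word starting at i; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens
```

With the dictionary `{"钢铁侠", "中国", "上映"}` ("Iron Man", "China", "release"), the sentence "钢铁侠在中国上映" is segmented into `["钢铁侠", "在", "中国", "上映"]`. Forward maximum matching is greedy, so ambiguous strings may need a better segmenter in practice.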

In a second aspect, an embodiment of the present disclosure further discloses an apparatus for constructing a knowledge graph, including:

the first acquisition module is used for acquiring definition information of the knowledge graph;

the second acquisition module is used for acquiring an original information data set;

the extraction module is used for extracting graph knowledge from the original information data set by using the knowledge acquisition model based on the definition information of the knowledge graph;

the pushing module is used for converting the graph knowledge extracted by the knowledge acquisition model into questions to be confirmed and pushing the questions to at least one user for answering;

and the construction module is used for constructing the knowledge graph based on the answer results of the user.

In some embodiments, the apparatus further comprises:

the training data generation module is used for generating optimization training data based on the answer results of the user;

the training module is used for performing optimization training on the knowledge acquisition model based on the optimization training data to obtain the optimized knowledge acquisition model;

the pushing module is also used for converting the graph knowledge extracted by the optimized knowledge acquisition model into questions to be confirmed and pushing the questions to at least one user for answering.

In some embodiments, the construction module is further configured to directly construct the knowledge graph based on the graph knowledge extracted by the knowledge acquisition model when it is determined, based on the answer results of the user, that the performance parameter value of the graph knowledge extracted from the original information data set by the knowledge acquisition model is greater than or equal to a preset threshold.

In some embodiments, the first obtaining module is configured to obtain at least one of entity type definition information, event type definition information, and relationship type definition information of the knowledge-graph;

the extraction module is used for extracting at least one of the graph knowledge of the entity type, the graph knowledge of the event type, and the graph knowledge of the relationship type from the original information data set by using the knowledge acquisition model.

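In some embodiments, the extraction module is configured to perform at least one of: extracting the graph knowledge of the entity type from the original information data set using at least one entity type knowledge acquisition model; extracting the graph knowledge of the event type from the original information data set using at least one event type knowledge acquisition model; and extracting the graph knowledge of the relationship type from the original information data set using at least one relationship type knowledge acquisition model.

In some embodiments, when extracting the graph knowledge of the entity type from the original information data set using at least one entity type knowledge acquisition model, the extraction module extracts graph knowledge of the person entity type using at least one of a rule model, a BERT language model, an LSTM + CRF model, and a few-shot language model.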

In some embodiments, the apparatus further comprises:

the collecting module is used for extracting graph knowledge of the entity type from the graph knowledge of the relationship type, and/or extracting graph knowledge of the entity type and graph knowledge of the relationship type from the graph knowledge of the event type; and for classifying and collecting the graph knowledge of the entity type, the relationship type, and the event type by type.

In some embodiments, the collecting module is further configured to merge and fuse each type of graph knowledge obtained by the classification and collection, and/or to filter the graph knowledge obtained by the classification and collection.

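In some embodiments, the push module is configured to generate a question pool based on the questions to be confirmed, extract the questions to be confirmed from the question pool in a preset order, and push them to at least one user for answering.

In some embodiments, the push module is configured to perform at least one of: extracting questions to be confirmed from the question pool based on a learning model; extracting them based on preset priority and/or performance parameter value indexes; extracting them based on the relevance of the graph knowledge; and extracting them based on the data granularity in the original information data set; the extracted questions are pushed to at least one user for answering.

In some embodiments, the push module is configured to push the questions to be confirmed to at least one user in text form, as images, or by voice broadcast;

the apparatus further includes:

an input module configured to acquire the answer results of the user through any one of keyboard input, mouse input, touch input, voice input, or somatosensory (motion-sensing) input.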

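In some embodiments, the push module is configured to perform at least one of:

pushing the questions to be confirmed to a shared learning platform;

pushing the questions to be confirmed to a login verification platform;

and pushing the questions to be confirmed to an operation platform, wherein the operation platform comprises at least one of a customer service system platform, an evaluation system platform, and a crowdsourcing platform.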

In some embodiments, the construction module is configured to confirm, based on the answer results of the user, whether a preset confirmation condition is satisfied, and when it is satisfied, to construct the knowledge graph according to the graph knowledge corresponding to the answer results.

In some embodiments, the construction module is specifically configured to construct entity nodes in the knowledge graph according to the graph knowledge of the entity type; to create entity nodes that do not yet exist in the knowledge graph and establish relationship edges according to the graph knowledge of the relationship type; and to create entity nodes that do not yet exist in the knowledge graph according to the graph knowledge of the event type, and construct event nodes and the events centered on those event nodes.

In some embodiments, the original information data set includes at least one of an original corpus data set, an original picture data set, and an original video data set; for the original corpus data set, the second acquisition module is configured to acquire unstructured text data and preprocess it, wherein the preprocessing comprises at least one of encoding, word segmentation, rule model processing, and dictionary matching.

In a third aspect, an embodiment of the present disclosure further discloses a computer device, where the computer device includes:

a memory; and

a processor for implementing the steps of any of the above methods when executing the computer program stored in the memory.

In a fourth aspect, an embodiment of the present disclosure further discloses a computer-readable storage medium having computer instructions stored thereon, where the computer instructions, when executed by a processor, implement the steps of any of the above methods.

In the technical solution provided by the embodiments of the disclosure, the graph knowledge extracted by the knowledge acquisition model is converted into questions to be confirmed, the questions are pushed to users for answering, and the knowledge graph is constructed based on the answer results; this provides a knowledge graph construction approach in which knowledge is annotated through question answering. On one hand, because graph knowledge is put to users in the form of questions to be confirmed, the difficulty of annotating knowledge is reduced, heavy involvement of professional NLP (natural language processing) experts or knowledge graph experts is not needed, and the difficulty of constructing the knowledge graph is lowered. On the other hand, because extraction by the knowledge acquisition model is combined with answers from ordinary users, the demands on users are low, the required user resources can be sourced through many channels, and the construction efficiency of the knowledge graph can be improved.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

In order to more clearly illustrate the embodiments or technical solutions in the prior art of the present disclosure, the drawings used in the description of the embodiments or prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.

FIG. 1 is a schematic flow chart diagram of a method for constructing a knowledge graph according to an embodiment of the present disclosure;

FIG. 2 is a schematic flow chart diagram of another method for constructing a knowledge graph according to an embodiment of the disclosure;

FIG. 3 is a schematic flow chart diagram of yet another method for constructing a knowledge graph according to an embodiment of the present disclosure;

FIG. 4 is a schematic flow chart diagram illustrating yet another method for constructing a knowledge graph according to an embodiment of the present disclosure;

FIG. 5 is a schematic structural diagram of an apparatus for constructing a knowledge graph according to an embodiment of the present disclosure;

FIG. 6 is a flow chart of another method for constructing a knowledge graph according to an embodiment of the disclosure.

Detailed Description

In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.

In the embodiments of the present disclosure, the knowledge graph may be defined as a graph-structured organization of knowledge and data whose three main components are entity nodes, event nodes, and relationship edges; entity nodes and event nodes can be connected by relationship edges of different types.

Entity nodes in the knowledge graph are semantic words with specific meanings, such as people, places, organizations, and numbers.

An event node in the knowledge graph has a predefined event name and is connected, through specific relation words, to one or more entity nodes with specific meanings, forming a group of associated information that describes a dynamic behavior. Financing events and movie release events are explained below as examples.

A financing event may include the following content:

Financing event - financing time - 2020.5.20

Financing event - financing party - XXX corporation

Financing event - financing amount - 10 billion dollars

Financing event - investor - XXX corporation

A film or television work release event may include the following content:

Film/TV work release event - work name - work name

Film/TV work release event - release time - time

Film/TV work release event - release place - place

Film/TV work release event - director - name

Film/TV work release event - lead actor - name

For a specific film release event 01, the relationship edges may include the following:

Film/TV work release event 01 - work name - Iron Man

Film/TV work release event 01 - release time - April 30, 2008

Film/TV work release event 01 - release place - China

Film/TV work release event 01 - director - Jon Favreau

Film/TV work release event 01 - lead actor - Robert Downey Jr.

Film/TV work release event 01 - lead actor - Gwyneth Paltrow

Film/TV work release event 01 - lead actor - Terrence Howard

Film/TV work release event 01 - lead actor - Jeff Bridges
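The event-centered triples above map naturally onto one record per event node. The following minimal sketch (function name and role keys are hypothetical) groups `(event, role, value)` triples and keeps values in lists, since a role such as "lead actor" takes several values for the same event.

```python
from collections import defaultdict


def group_event_triples(triples):
    """Group (event, role, value) triples into one record per event.

    Values are kept in lists because a role such as "lead actor" can have
    several values for the same event.
    """
    events = defaultdict(lambda: defaultdict(list))
    for event, role, value in triples:
        events[event][role].append(value)
    return {event: dict(roles) for event, roles in events.items()}
```

Applied to the release event 01 triples, the `lead_actor` role would collect all four actors in their original order, while `director` holds the single value `["Jon Favreau"]`.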

The prior art has the drawback that constructing a knowledge graph, especially from unstructured data, requires the participation of professional NLP experts or knowledge graph experts and is difficult and inefficient. To address this, the technical solution of the present disclosure converts the graph knowledge extracted by the knowledge acquisition model into questions to be confirmed, pushes them to users for answering, and constructs the knowledge graph based on the answer results. This provides a knowledge graph construction method in which knowledge is annotated through question answering, which can reduce the difficulty and improve the efficiency of knowledge graph construction.

Fig. 1 is a schematic flow chart of a method for constructing a knowledge graph according to an embodiment of the present disclosure, as shown in fig. 1, the method includes the following steps:

Step 101, acquiring definition information of the knowledge graph;

In this step, definition information of the knowledge graph to be constructed is obtained. The definition information may differ depending on the type or kind of knowledge graph being constructed, and its specific content corresponds to the constituent elements of that knowledge graph.

In addition, in the embodiments of the present disclosure, three cases of knowledge graph definition are supported: fully custom definition, a fixed knowledge graph, and a preset knowledge graph. For a fully custom definition, an input port for knowledge graph definition information is provided and newly entered definition information is received. A fixed knowledge graph is one whose definition has already been established; its definition information can be stored in a database, and in this step the definition information is read from the database. A preset knowledge graph is one partly defined by the user: part of the graph definition information is built into the system in advance, and during definition the user directly or indirectly reuses the preset graph knowledge model types in the system. For the user-defined part, an input port for knowledge graph definition information is provided and newly entered definition information is received.
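The three definition cases can be illustrated with a small resolver. This is a sketch under stated assumptions: definitions are modelled as a mapping from knowledge kind (`"entity"`, `"relation"`, `"event"`) to a set of type names, `PRESET_DEFINITION` is a hypothetical built-in schema, and `stored` stands in for the definition read from a database; none of these names come from the disclosure.

```python
# Hypothetical built-in (preset) definitions shipped with the system.
PRESET_DEFINITION = {
    "entity": {"person", "organization", "place"},
    "relation": {"investor_of"},
    "event": {"financing_event"},
}


def build_definition(mode, user_defined=None, stored=None):
    """Return the knowledge graph definition for one of the three supported cases.

    mode: "custom" (fully user-defined), "fixed" (read from storage) or "preset"
    (built-in types extended with user-defined ones).
    """
    user_defined = user_defined or {}
    if mode == "custom":
        return {kind: set(types) for kind, types in user_defined.items()}
    if mode == "fixed":
        if stored is None:
            raise ValueError("a fixed knowledge graph needs stored definition information")
        return {kind: set(types) for kind, types in stored.items()}
    if mode == "preset":
        kinds = PRESET_DEFINITION.keys() | user_defined.keys()
        # New sets are built, so the shared preset schema is never mutated.
        return {k: PRESET_DEFINITION.get(k, set()) | set(user_defined.get(k, ())) for k in kinds}
    raise ValueError(f"unknown definition mode: {mode}")
```

In the preset case, a user adding only `{"event": {"film_release_event"}}` keeps every built-in entity and relation type and gains one event type.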

Step 102, acquiring an original information data set;

The original information data set obtained in this step is the raw data used to construct the knowledge graph and may take various forms, for example, at least one of an original corpus data set, an original picture data set, and an original video data set. By data type, the original corpus data set can be divided into unstructured text data and structured text data; in the embodiments of the present disclosure, graph knowledge can be extracted from the unstructured text data. In some embodiments, the unstructured text data may be accessed from a database such as MySQL, MongoDB, Elasticsearch (ES), or Hive, or may be provided by file upload.

Step 103, extracting graph knowledge from the original information data set by using a knowledge acquisition model based on the definition information of the knowledge graph;

In this step, based on the knowledge graph definition information obtained in step 101, a pre-trained knowledge acquisition model is used to extract graph knowledge from the original information data set. Specifically, when the definition information obtained in step 101 includes at least one of entity type definition information, event type definition information, and relationship type definition information, the graph knowledge acquired in this step correspondingly includes at least one of graph knowledge of the entity type, graph knowledge of the event type, and graph knowledge of the relationship type.

Step 104, converting the graph knowledge extracted by the knowledge acquisition model into questions to be confirmed, and pushing the questions to at least one user for answering;

On the basis of the graph knowledge extracted by the knowledge acquisition model, this step converts the graph knowledge into questions to be confirmed and pushes them to users through at least one preset channel. Because the questions are question-and-answer style and the user only needs to confirm or deny, the demands on the user are low, and ordinary users without much professional knowledge can answer them. The questions to be confirmed can also be opened up externally and integrated into different application scenarios or platforms. Once collected, the answer results of ordinary users can serve as labeled graph knowledge.
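Converting a piece of extracted graph knowledge into a yes/no question can be as simple as filling a template per knowledge kind. The sketch below assumes hypothetical dict-shaped knowledge records with a `"kind"` key and hypothetical English templates; it also shows how a user's yes/no answer turns the extraction into a labeled sample.

```python
# Hypothetical question templates, one per kind of graph knowledge.
QUESTION_TEMPLATES = {
    "entity": 'Is "{text}" a {entity_type}?',
    "relation": 'Is "{tail}" the {relation} of "{head}"?',
    "event_argument": 'In the event "{event}", is the {role} "{value}"?',
}


def to_question(knowledge):
    """Turn one piece of extracted graph knowledge into a yes/no question."""
    template = QUESTION_TEMPLATES[knowledge["kind"]]
    return template.format(**knowledge)  # unused keys such as "kind" are ignored


def to_labeled_sample(knowledge, user_confirmed):
    """A user's yes/no answer turns the extraction into a labeled sample."""
    return {**knowledge, "label": bool(user_confirmed)}
```

For instance, `{"kind": "relation", "head": "Iron Man", "relation": "director", "tail": "Jon Favreau"}` becomes the question `Is "Jon Favreau" the director of "Iron Man"?`, which any user can answer without NLP expertise.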

Step 105, constructing the knowledge graph based on the answer results of the user.

After the users' answer results for the questions to be confirmed are acquired through the various application scenarios or platforms, the knowledge graph can be constructed based on them.

In the embodiments of the disclosure, the graph knowledge extracted by the knowledge acquisition model is converted into questions to be confirmed and integrated into different application scenarios or platforms, so that ordinary users can answer the questions and thereby manually label the graph knowledge. This reduces the difficulty and improves the efficiency of knowledge graph construction.

In some embodiments, graph knowledge confirmed by users' answers is highly reliable and can be used as sample data to train the knowledge acquisition model and improve its performance. Fig. 2 is a schematic flow chart of another method for constructing a knowledge graph according to an embodiment of the present disclosure; as shown in Fig. 2, the method may further include:

Step 201, generating optimization training data based on the answer results of the user;

Specifically, since the users' answers amount to a re-correction of the graph knowledge extracted by the knowledge acquisition model, this step uses them to generate new sample data for re-optimizing the knowledge acquisition model; this new sample data serves as the optimization training data.

Step 202, performing optimization training on the knowledge acquisition model based on the optimization training data to obtain an optimized knowledge acquisition model;

After optimization training with this data, the performance of the knowledge acquisition model can be noticeably improved, for example in terms of precision and recall.

Step 203, converting the graph knowledge extracted by the optimized knowledge acquisition model into questions to be confirmed, and pushing the questions to at least one user for answering.

For the optimized knowledge acquisition model, the extracted graph knowledge can continue to be converted into questions to be confirmed in the manner of the embodiment shown in Fig. 1, and the questions are then pushed to users for answering.

As an iterative optimization scheme, in the embodiments of the present disclosure, optimization training data may be generated repeatedly from the users' answers and used to retrain the knowledge acquisition model; that is, steps 201 to 203 are performed repeatedly.
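One round of steps 201 to 203 can be sketched with a toy model. The `DictionaryModel` below is a hypothetical stand-in for a knowledge acquisition model: "extraction" is substring matching against known terms, and "optimization training" simply keeps confirmed terms and suppresses denied ones. `ask_user` stands in for pushing a question and collecting the answer.

```python
class DictionaryModel:
    """Toy stand-in for a knowledge acquisition model (an assumption for illustration)."""

    def __init__(self, terms):
        self.terms = set(terms)
        self.rejected = set()

    def extract(self, text):
        return sorted(t for t in self.terms - self.rejected if t in text)

    def train(self, samples):
        """'Optimization training': keep confirmed terms, suppress denied ones."""
        for term, confirmed in samples:
            if confirmed:
                self.terms.add(term)
                self.rejected.discard(term)
            else:
                self.rejected.add(term)


def optimization_round(model, texts, ask_user):
    """One pass of steps 201 to 203: extract, ask, turn answers into training data, retrain."""
    samples = [(term, ask_user(term)) for text in texts for term in model.extract(text)]
    model.train(samples)
    return samples
```

Calling `optimization_round` repeatedly corresponds to the iterative scheme: each round's answers shrink the set of wrong extractions that reach users in the next round.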

In the embodiment of the present disclosure, the answer results continuously confirmed by users can serve as optimized training data for optimizing the knowledge acquisition model, which then generates more accurate questions to push to users. In some embodiments, if the performance parameter value of the graph knowledge extracted from the original information data set by the knowledge acquisition model is determined to reach a preset threshold, that is, to be greater than or equal to the preset threshold, the graph knowledge may be stored directly in the database and used directly to construct the knowledge graph. That is, in the embodiment shown in Fig. 2, the method further includes:

Step 204: when it is determined, based on the users' answer results, that the performance parameter value of the graph knowledge extracted from the original information data set by the knowledge acquisition model is greater than or equal to the preset threshold, constructing the knowledge graph directly from the graph knowledge extracted by the knowledge acquisition model.

In the embodiment of the present disclosure, the performance parameter value may include a parameter that measures the performance of the knowledge acquisition model, such as precision, recall, or F1 value, or another value calculated from these parameters, where the F1 value is the harmonic mean of precision and recall.
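The threshold check of step 204 can be sketched as follows, assuming that a set of user-confirmed triples serves as the reference ("gold") set and that F1 is the performance parameter compared with the threshold. The function names and the default threshold of 0.9 are illustrative assumptions, not values from the disclosure.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def precision_recall_f1(extracted: Set[Triple], gold: Set[Triple]) -> Tuple[float, float, float]:
    """Score model output against user-confirmed triples.

    precision = correct / extracted, recall = correct / gold,
    F1 = harmonic mean of precision and recall.
    """
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def can_build_directly(extracted: Set[Triple], gold: Set[Triple], threshold: float = 0.9) -> bool:
    """Step 204: build from model output directly once F1 >= the preset threshold (assumed 0.9)."""
    return precision_recall_f1(extracted, gold)[2] >= threshold
```

For example, with 4 extracted triples of which 2 appear in a gold set of 3, precision is 1/2, recall is 2/3, and F1 is 4/7 (about 0.571).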

In some embodiments, a knowledge acquisition model is used to extract at least one of entity-type graph knowledge, event-type graph knowledge, and relationship-type graph knowledge from the original information data set. The knowledge acquisition model may be of various kinds and may include, for example, at least one of a rule model, a dictionary model, a statistical learning model, a machine learning model, and a language model.

In addition, the three types of graph knowledge, namely entity-type, event-type, and relationship-type graph knowledge, can be obtained by entity-type, event-type, and relationship-type knowledge acquisition models respectively, and the number of models of each kind is not limited. That is, the step of extracting at least one of entity-type, event-type, and relationship-type graph knowledge may include at least one of the following:

extracting entity-type graph knowledge from the original information data set by using at least one entity-type knowledge acquisition model;

extracting event-type graph knowledge from the original information data set by using at least one event-type knowledge acquisition model;

and extracting relationship-type graph knowledge from the original information data set by using at least one relationship-type knowledge acquisition model.

Specifically, extracting entity-type graph knowledge from the original information data set by using at least one entity-type knowledge acquisition model may include, for the person entity type, extracting person-type graph knowledge from the original information data set by using at least one of a regular-expression model, a BERT language model, an LSTM+CRF language model, and a few-shot language model.

Fig. 3 and Fig. 4 are schematic flowcharts of two further methods for constructing a knowledge graph according to embodiments of the present disclosure. The difference between the two methods is as follows:

In the embodiment shown in Fig. 3, the entity-type, event-type, and relationship-type knowledge acquisition models each have independent input and output interfaces, and the graph knowledge each of them acquires is post-processed directly. In the embodiment shown in Fig. 4, the graph knowledge output by any one of the three models may serve as input data for one or both of the other two models. In Fig. 4, the output of the entity-type model is used as input to the event-type and relationship-type models, and the output of the relationship-type model is used as input to the event-type model; this is only an example, and other combinations may also be used.
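The chained arrangement of Fig. 4 can be illustrated with a small Python sketch in which the entity-type model's output feeds both the relationship-type and event-type models, and the relationship-type output also feeds the event-type model. The three `toy_*` models are hypothetical stand-ins (a tiny lexicon and keyword rule), not the models used in the disclosure; only the wiring between them is the point.

```python
from typing import Callable, Dict, List, Tuple

Entity = Tuple[str, str]        # (name, entity type)
Relation = Tuple[str, str, str]  # (head, relation, tail)


def toy_entity_model(text: str) -> List[Entity]:
    # Hypothetical dictionary model: report lexicon entries that occur in the text.
    lexicon = {"X-th Paradigm": "company", "AA": "product"}
    return [(name, etype) for name, etype in lexicon.items() if name in text]


def toy_relation_model(text: str, entities: List[Entity]) -> List[Relation]:
    # Uses the entity model's output as input (Fig. 4 wiring).
    companies = [n for n, t in entities if t == "company"]
    products = [n for n, t in entities if t == "product"]
    if "releases" in text:
        return [(c, "owns_product", p) for c in companies for p in products]
    return []


def toy_event_model(text: str, entities: List[Entity], relations: List[Relation]) -> List[dict]:
    # Uses both entity and relation outputs: one release event per ownership relation.
    return [{"type": "product_release", "company": h, "product": t} for h, _, t in relations]


def run_fig4_pipeline(text: str,
                      entity_model: Callable = toy_entity_model,
                      relation_model: Callable = toy_relation_model,
                      event_model: Callable = toy_event_model) -> Dict[str, list]:
    """Entity output -> relation and event models; relation output -> event model."""
    entities = entity_model(text)
    relations = relation_model(text, entities)
    events = event_model(text, entities, relations)
    return {"entity": entities, "relation": relations, "event": events}
```

In the Fig. 3 arrangement, each model would instead be called only with `text`, and the three outputs would be post-processed independently.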

In addition, as shown in Figs. 3 and 4, entity-type graph knowledge may be extracted from relationship-type graph knowledge, and/or entity-type and relationship-type graph knowledge may be extracted from event-type graph knowledge, which expands the sources of entity-type and relationship-type graph knowledge. Once the extraction of entity-type, relationship-type, and event-type graph knowledge is complete, the graph knowledge can be further collected by type, that is, a collection is generated for each of the three types: an entity-type collection, a relationship-type collection, and an event-type collection.

In some embodiments, graph knowledge obtained through the knowledge acquisition models may be duplicated or may already have been confirmed by users. Therefore, the graph knowledge of each type obtained by collection may further be merged and fused, and/or filtered.

During fusion, when several different knowledge acquisition models are used for graph knowledge extraction, several identical results may be obtained. For example, when extracting the person entity type, if 4 knowledge acquisition models extract graph knowledge from the same article containing "Zhang San", 4 results of "Zhang San - person" may be extracted. The same situation arises for the relationship type and the event type, so the results need to be merged and fused during processing. On the one hand, merging and fusion can directly remove duplicates, for example merging the 4 "Zhang San - person" results into one; on the other hand, character matching and similarity-based merging can be performed. For example, for the enterprise entity type there may be 3 candidate results, "X-th Paradigm", "X Paradigm", and "X-th Paradigm", which can be fused into the optimized result "X-th Paradigm".
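A minimal sketch of this merging and fusion, assuming candidates are (name, type) pairs: exact duplicates collapse through a counter, and names of the same type are then clustered when their `difflib.SequenceMatcher` ratio reaches an assumed cut-off of 0.8. Each cluster keeps its most frequent surface form (ties go to the longer name). The similarity measure and cut-off are illustrative choices, not the disclosure's method.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import List, Tuple

Candidate = Tuple[str, str]  # (entity name, entity type)


def fuse_entities(candidates: List[Candidate], min_ratio: float = 0.8) -> List[Candidate]:
    """Merge (name, type) results produced by several extraction models."""
    counts = Counter(candidates)           # step 1: exact duplicates collapse here
    clusters: List[List[Candidate]] = []
    for cand in counts:                    # distinct candidates, first-seen order
        name, etype = cand
        for cluster in clusters:
            rep_name, rep_type = cluster[0]
            # step 2: only same-type names with high character overlap are merged
            if rep_type == etype and SequenceMatcher(None, name, rep_name).ratio() >= min_ratio:
                cluster.append(cand)
                break
        else:
            clusters.append([cand])
    # step 3: keep the most frequent name per cluster; ties go to the longer name
    return [max(cluster, key=lambda c: (counts[c], len(c[0]))) for cluster in clusters]
```

With the example above, four copies of ("Zhang San", "person") become one, and "X Paradigm" joins the "X-th Paradigm" cluster, which keeps "X-th Paradigm" because it occurs twice.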

In addition, graph knowledge that has already been converted into a question to be confirmed and has been confirmed or denied, or that has been marked as uncertain multiple times, may be filtered out.
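The filtering rule can be sketched as a single pass over candidate triples. The sketch assumes that already confirmed or denied knowledge is kept in one set and that "not sure" marks are counted per triple; the cut-off of 3 marks standing in for "multiple times" is an assumption, and `filter_candidates` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def filter_candidates(candidates: Iterable[Triple],
                      answered: Set[Triple],
                      unsure_counts: Dict[Triple, int],
                      max_unsure: int = 3) -> List[Triple]:
    """Keep only candidates that still need a question.

    answered: knowledge already confirmed or denied by users.
    unsure_counts: times each triple was marked "not sure";
    max_unsure = 3 is an assumed cut-off for "multiple times".
    """
    return [t for t in candidates
            if t not in answered and unsure_counts.get(t, 0) < max_unsure]
```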

In some embodiments, the graph knowledge extracted by the knowledge acquisition model may yield a large number of questions to be confirmed. A question pool may therefore be generated from these questions, and questions may be drawn from the pool in a preset order and pushed to at least one user for answering. The order in which questions are drawn from the pool may be governed by a learning model or by a prescribed policy, covering various cases such as the following:

First, questions to be confirmed may be drawn from the question pool based on a learning model and pushed to at least one user for answering;

The learning model mainly addresses the problem that the question pool contains too many questions to be confirmed, by selecting the one or more most suitable questions to push to the user. It can learn from users' answer results and decide the order in which questions are pushed. Its input may include one or more items of information relevant to user answering, such as the results of answered questions, information about the questions to be confirmed, associations between questions, information about the sources of the questions, and information about the answerers; its output is one question or a batch of questions to be confirmed for that user.

Second, questions to be confirmed may be drawn from the question pool based on a preset priority and/or performance parameter indexes and pushed to at least one user for answering;

The preset priority is a classification of the knowledge acquisition models used to construct the knowledge graph, allowing managers to allocate graph construction resources at the granularity of individual knowledge acquisition models. For example, if 10 knowledge acquisition models are defined and 5 of them need to be built first, their priority is set to "priority", and questions corresponding to the graph knowledge extracted by these 5 models are presented on the question-and-answer page in a larger proportion. As another example, if 3 of the 10 models already meet the service requirement, their priority is set to "pause", and questions from these 3 models are no longer pushed.

Performance parameter indexes matter because, during construction of the knowledge graph, the knowledge acquisition models perform unevenly: some models have better indexes and some worse, and a model's indexes depend on how many of its questions have been answered. Because of the question ratio, low-frequency graph knowledge types rarely appear in the question pool; without intervention, models for high-frequency questions keep improving quickly while models for low-frequency questions improve slowly. Therefore, to develop the system in a balanced manner based on performance parameter indexes, questions related to models with lagging indexes or to low-frequency question models are preferentially drawn from the question pool. By adjusting how often related questions appear, the system raises the model indexes, and the results that support graph construction are generally of better quality.

Third, questions to be confirmed may be drawn from the question pool based on the relevance of the graph knowledge and pushed to at least one user for answering. With this relevance-based expansion, an entity-type question may be asked first, for example "Is X-th Paradigm a company?". After the user answers and confirms, a relationship-type question is asked, for example whether "X-th Paradigm - owns product - AA" belongs to the relationship "enterprise - owns product - product". After the user confirms again, a question is asked about the release event of X-th Paradigm's product AA, so that questions unfold in sequence according to their relevance.

Fourth, questions to be confirmed may be drawn from the question pool based on the data granularity of the original information data set and pushed to at least one user for answering. For example, if the original information data set contains multiple articles, the entity-type, relationship-type, and event-type questions extracted from each article may be asked in sequence, one article at a time.
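The second case above (preset priority plus performance indexes) can be sketched as weighted sampling from the question pool. The disclosure gives no formula, so the following are assumptions: the priority levels "priority", "normal", and "pause"; a per-model weight of (1 - F1 + floor), so lagging models are drawn more often; and a multiplier `boost` for "priority" models. `model_weights` and `draw_questions` are hypothetical names.

```python
import random
from typing import Dict, List, Tuple


def model_weights(priority: Dict[str, str], f1: Dict[str, float],
                  boost: float = 3.0, floor: float = 0.05) -> Dict[str, float]:
    """Per-model sampling weight for questions in the pool.

    priority: model -> "priority" | "normal" | "pause" (assumed levels).
    f1: model -> current F1; a lower F1 gives a larger weight.
    """
    weights = {}
    for model, level in priority.items():
        if level == "pause":
            weights[model] = 0.0                    # paused models push no more questions
            continue
        w = 1.0 - f1.get(model, 0.0) + floor        # floor keeps strong models above zero
        if level == "priority":
            w *= boost                              # larger share on the Q&A page
        weights[model] = w
    return weights


def draw_questions(pool: List[Tuple[str, str]], weights: Dict[str, float],
                   k: int, seed: int = 0) -> List[Tuple[str, str]]:
    """Weighted sampling without replacement from (question, model) pairs."""
    rng = random.Random(seed)
    remaining = [q for q in pool if weights.get(q[1], 0.0) > 0]
    picked = []
    while remaining and len(picked) < k:
        w = [weights[model] for _, model in remaining]
        i = rng.choices(range(len(remaining)), weights=w, k=1)[0]
        picked.append(remaining.pop(i))
    return picked
```

Sampling rather than strict sorting keeps some questions from well-performing models in circulation. This is one possible design; the disclosure only requires that lagging or low-frequency models be favored.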

In some embodiments, a question to be confirmed may be pushed to at least one user in multiple ways, for example in at least one of a text mode, an image mode, or a voice broadcast mode. In the text mode, an interactive interface is displayed on the screen of an electronic device and the question is shown on it as text; in the image mode, the question is rendered as a picture or video for display; in the voice broadcast mode, the question is converted into speech and delivered to the user.

Further, after a question to be confirmed is pushed to the user in any of the above ways, the user may submit the answer result in various ways, for example through keyboard input, mouse input, touch input, voice input, or motion-sensing input.

In some embodiments, questions to be confirmed may be pushed to users through a variety of application scenarios or platforms. Specifically, pushing a question to be confirmed to a user may take one of the following forms:

In the first form, questions to be confirmed are pushed to a shared learning platform, such as a wiki or another learning platform, where they are displayed and answered by users;

In the second form, questions to be confirmed are pushed to a login verification platform and shown to users as verification codes, so that each user answers a question and submits the answer result while entering the verification code;

In the third form, questions to be confirmed are pushed to an operations platform, including but not limited to at least one of a customer service system platform, an assessment system platform, and a crowdsourcing platform.

Pushing questions through these various channels makes full use of available user resources, so that a large number of questions can be answered and the efficiency of knowledge graph construction is further improved.

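In some embodiments, after a question to be confirmed is pushed to users and their answer results are received, the knowledge graph may be constructed from those answer results. The specific construction may include: determining, based on the users' answer results, whether a preset confirmation condition is satisfied; and, if it is satisfied, constructing the knowledge graph according to the graph knowledge corresponding to the answer results.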

Specifically, a user's answer result generally includes a confirmation result or a denial result, and an "uncertain" option may also be provided. When the answer results include confirmation-type and denial-type results, determining whether the preset confirmation condition is satisfied may include counting the confirmation-type and denial-type results and checking the counts against the preset confirmation condition.

For example, for a question pushed to users, the selectable answers include "yes", "no", and "not sure". When a user answers "yes", the graph knowledge corresponding to the question gains the support of one more person, that is, the count of confirmation-type results increases by 1; when a user answers "no", the count of denial-type results increases by 1. If the preset confirmation condition requires the number of confirmations or denials to reach a certain value, whether the condition is satisfied can be determined from these counts.

Further, confirmation conditions of different accuracy may be set for different situations; that is, the preset confirmation conditions include a first accuracy confirmation condition and a second accuracy confirmation condition with different count requirements. For example, the first accuracy confirmation condition requires 1 "yes" confirmation-type result, while the second accuracy confirmation condition requires 5 "yes" confirmation-type results.
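A minimal sketch of the counting rule, assuming answers are the strings "yes", "no", and "not sure", and that `required` is the count demanded by the chosen accuracy condition (1 for the first condition and 5 for the second in the example above). The extra rule that the winning side must outnumber the other side, so that conflicting answers stay undecided, is an assumption added for safety; `decide` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, Optional


def decide(answers: Iterable[str], required: int) -> Optional[str]:
    """Return "confirmed", "denied", or None (still undecided) for one question.

    required: number of matching answers demanded by the accuracy condition.
    "not sure" answers are counted but never decide the question.
    """
    tally = Counter(answers)
    yes, no = tally["yes"], tally["no"]
    if yes >= required and yes > no:
        return "confirmed"
    if no >= required and no > yes:
        return "denied"
    return None
```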

In addition, constructing the knowledge graph according to the graph knowledge corresponding to the answer results may specifically proceed as follows:

As described in the foregoing embodiment, questions to be confirmed may be pushed based on relevance; for example, the pushing may include the following steps:

First, the first question pushed is "Is X-th Paradigm a company?". If the user answers "yes", an entity node can be created on the knowledge graph; if the user answers "no", the related follow-up questions can be skipped, and the graph knowledge that "X-th Paradigm" is a company is added to the negative cleaning list and filtered out in subsequent graph knowledge extraction;

in an embodiment of the present disclosure, the graph knowledge of the entity type corresponds to an entity node on the graph. In some cases, an entity type may also be referred to as a node type.

Second, the second question pushed is: does "X-th Paradigm - owns product - AA" belong to the relationship "enterprise - owns product - product"? If the user answers "yes", the entity node "X-th Paradigm" is created if it does not already exist, the relationship edge "X-th Paradigm - owns product - AA" is created on the knowledge graph, and further questions continue to be asked; if the user answers "no", the graph knowledge corresponding to the question is added to the negative cleaning list and filtered out in subsequent graph knowledge extraction.
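The two steps above can be sketched together: questions form a tree, a follow-up is asked only after its parent is confirmed, confirmed answers create missing nodes (and an edge for relationship questions), and denied answers go to a negative set that later extraction can filter against. `Question`, `KnowledgeGraph`, `ask_and_build`, and the convention that entity questions use the relation label "is_a" are all hypothetical, not structures named in the disclosure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Question:
    """A question generated from one piece of graph knowledge (hypothetical structure)."""
    text: str
    triple: Triple                       # e.g. ("X-th Paradigm", "is_a", "company")
    node_types: Dict[str, str]           # node name -> type, for nodes this answer may create
    follow_ups: List["Question"] = field(default_factory=list)


@dataclass
class KnowledgeGraph:
    nodes: Dict[str, str] = field(default_factory=dict)   # node name -> node type
    edges: Set[Triple] = field(default_factory=set)       # relationship edges
    negative: Set[Triple] = field(default_factory=set)    # denied knowledge, filtered later


def ask_and_build(root: Question, answer: Callable[[str], str], kg: KnowledgeGraph) -> None:
    """Ask questions depth-first in relevance order and update the graph."""
    stack = [root]
    while stack:
        q = stack.pop()
        reply = answer(q.text)
        if reply == "yes":
            for name, ntype in q.node_types.items():
                kg.nodes.setdefault(name, ntype)   # create the node only if it is missing
            if q.triple[1] != "is_a":
                kg.edges.add(q.triple)             # relationship questions add an edge
            stack.extend(reversed(q.follow_ups))   # related questions are asked next
        elif reply == "no":
            kg.negative.add(q.triple)              # follow-ups are skipped
```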

In addition, for a question generated from event-type graph knowledge, for example, the question is: "Does film release event 01 exist, with the following relations in the event:

Film release event 01 - film title - Iron Man

Film release event 01 - release location - China

Film release event 01 - director - Jon Favreau

Film release event 01 - lead actor - Robert Downey Jr.

Film release event 01 - lead actor - Gwyneth Paltrow

Film release event 01 - lead actor - Terrence Howard

Film release event 01 - lead actor - Jeff Bridges?".

If the user answers "yes", then any entity node associated with the event that does not yet exist on the knowledge graph is created, such as the entity nodes "Iron Man", "April 30, 2008", and "China". At the same time, an event ID, such as "film release event 01", is generated from the entity nodes associated with the event, an event node (which is itself an entity node) is created on the knowledge graph, and the event is built around the event node as its center:

Film release event 01 - film title - Iron Man

Film release event 01 - release location - China

Film release event 01 - director - Jon Favreau

Film release event 01 - lead actor - Robert Downey Jr.

Film release event 01 - lead actor - Gwyneth Paltrow

Film release event 01 - lead actor - Terrence Howard

If the user answers "no", the event is discarded and added to the negative cleaning list; when the knowledge acquisition model is subsequently used to extract graph knowledge, the graph knowledge corresponding to the question is filtered out, which achieves an optimization effect.
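Building a confirmed event around an event node can be sketched as below. The disclosure says only that the event ID is calculated from the associated entity nodes. Here, as an assumption, the ID is the event type plus the first 8 hex digits of a SHA-1 hash over the sorted (role, entity) pairs, so that the same event always maps to the same ID. The plain dict/set graph and the `add_event` name are hypothetical.

```python
import hashlib
from typing import Dict, List, Set, Tuple


def add_event(nodes: Dict[str, str], edges: Set[Tuple[str, str, str]],
              event_type: str, roles: List[Tuple[str, str]]) -> str:
    """Add a confirmed event centred on an event node.

    roles: (role, entity) pairs such as ("film title", "Iron Man").
    Hash-based IDs are an assumption: identical role sets give identical IDs.
    """
    key = event_type + "|" + "|".join(f"{r}={e}" for r, e in sorted(roles))
    event_id = f"{event_type}_{hashlib.sha1(key.encode('utf-8')).hexdigest()[:8]}"
    for _, entity in roles:
        nodes.setdefault(entity, "entity")    # create missing entity nodes only
    nodes[event_id] = "event"                 # the event node is itself a graph node
    for role, entity in roles:
        edges.add((event_id, role, entity))   # one edge per relation in the event
    return event_id
```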

In the embodiment of the present disclosure, the entity nodes, relationship edges, and events in the constructed knowledge graph may be stored as a data table structure, a JSON structure, or an RDF structure, all of which have good extensibility.

In some embodiments, the original information data set may include at least one of an original corpus data set, an original picture data set, and an original video data set. For each of these kinds of original information, graph knowledge can be extracted with the knowledge acquisition model, manual annotation is carried out through the question-and-answer approach provided by the embodiments of the present disclosure, and the knowledge graph is constructed from the annotated results.

Specifically, for the original corpus data set, the original corpus data may be obtained from unstructured text data and may be further preprocessed. For the case where the original corpus data is unstructured text data, acquiring the original corpus data set may specifically include the following steps:

acquiring unstructured text data;

preprocessing the unstructured text data, where the preprocessing may include at least one of encoding, word segmentation, processing with a rule model, and dictionary matching.

Specifically, as in the embodiments shown in Figs. 3 and 4, the unstructured text data is preprocessed first, and the preprocessed original corpus data is then input into the knowledge acquisition model to extract graph knowledge.
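A minimal preprocessing sketch for Chinese unstructured text, covering three of the listed operations: decoding (encoding), one illustrative rule (stripping whitespace, which suits Chinese but would break space-delimited languages), and dictionary-based word segmentation by forward maximum matching, a standard dictionary-matching technique substituted here for illustration. The lexicon and function names are hypothetical.

```python
import re
from typing import List, Set


def fmm_segment(text: str, lexicon: Set[str]) -> List[str]:
    """Forward maximum matching: at each position take the longest lexicon word,
    falling back to a single character when nothing matches."""
    max_len = max((len(w) for w in lexicon), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def preprocess(raw: bytes, lexicon: Set[str], encoding: str = "utf-8") -> List[str]:
    text = raw.decode(encoding, errors="replace")   # encoding step
    text = re.sub(r"\s+", "", text)                 # assumed rule: drop whitespace in Chinese text
    return fmm_segment(text, lexicon)               # dictionary-based segmentation
```

For example, with the lexicon {"知识图谱", "构建"}, the text "知识图谱 的构建" is segmented into ["知识图谱", "的", "构建"].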

Corresponding to the method embodiment, the embodiment of the disclosure also provides a device for constructing the knowledge graph. Fig. 5 is a schematic structural diagram of an apparatus for constructing a knowledge graph according to an embodiment of the present disclosure, as shown in fig. 5, the apparatus includes: a first obtaining module 51, a second obtaining module 52, an extracting module 53, a pushing module 54 and a constructing module 55.

The first obtaining module 51 is configured to obtain definition information of the knowledge graph. Specifically, during construction of the knowledge graph, the definition information is obtained according to the type of knowledge graph to be constructed, and its content corresponds to the constituent elements of the knowledge graph. For example, in the above embodiments, when the knowledge graph includes entity nodes, event nodes, and relationship edges, the definition information may include at least one of entity type definition information, event type definition information, and relationship type definition information of the knowledge graph.

In addition, in the embodiment of the present disclosure, three situations are supported for the knowledge graph: a fully custom definition, a fixed knowledge graph, and a preset knowledge graph. For a fully custom definition, an input port for knowledge graph definition information is provided to receive newly input definition information. A fixed knowledge graph is one whose definition has already been constructed; its definition information can be stored in a database, from which it is obtained in this step. A preset knowledge graph is one that the user defines in part: the system embeds part of the graph definition information in advance, and while defining the graph the user directly or indirectly reuses the graph knowledge model types preset in the system. For the user-defined part, an input port for knowledge graph definition information is provided to receive newly input definition information.

The second obtaining module 52 is configured to obtain an original information data set. The original information data set is the raw data used to construct the knowledge graph and may take multiple forms, for example at least one of an original corpus data set, an original picture data set, and an original video data set. By data type, the original corpus data set can be divided into unstructured text data and structured text data; in the embodiment of the present disclosure, graph knowledge extraction can be performed on unstructured text data. In some embodiments, unstructured text data may be accessed from a database such as MySQL, MongoDB, Elasticsearch (ES), or Hive, or may be accessed through file upload.

The extraction module 53 is configured to extract graph knowledge from the original corpus data set by using a knowledge acquisition model based on the definition information of the knowledge graph; for example, graph knowledge may be extracted from the original corpus data set by a pre-trained knowledge acquisition model according to the definition information obtained by the first obtaining module 51. Specifically, when that definition information includes at least one of entity type, event type, and relationship type definition information, the graph knowledge obtained correspondingly includes at least one of entity-type, event-type, and relationship-type graph knowledge.

The pushing module 54 is configured to convert the graph knowledge extracted by the knowledge acquisition model into questions to be confirmed and push them to at least one user for answering. Specifically, after the knowledge acquisition model extracts graph knowledge, this module converts it into questions to be confirmed and pushes them to users through at least one preset channel. Because the questions are in question-and-answer form and the user only needs to give a confirming or non-confirming answer, the demands on users are low, and ordinary users without much professional knowledge can answer them. The questions can also be opened externally and integrated into different application scenarios or platforms. Once collected, ordinary users' answer results can serve as annotated graph knowledge.

The construction module 55 is configured to construct the knowledge graph based on users' answer results. After users' answer results for the questions to be confirmed are obtained through various application scenarios or platforms, this module constructs the knowledge graph from them.

In some embodiments, graph knowledge confirmed through user answers is highly reliable and can be used as sample data to train the knowledge acquisition model and improve its performance. Fig. 6 is a schematic structural diagram of another apparatus for constructing a knowledge graph according to an embodiment of the present disclosure. As shown in Fig. 6, the apparatus may further include a training data generating module 56 and a training module 57.

The training data generating module 56 is configured to generate optimized training data based on users' answer results. Specifically, since the users' answer results amount to a re-correction of the graph knowledge extracted by the knowledge acquisition model, this module turns them into new sample data that serves as optimized training data for re-optimizing the knowledge acquisition model.

The training module 57 is configured to perform optimization training on the knowledge acquisition model based on the optimized training data to obtain an optimized knowledge acquisition model. After such training, the performance of the knowledge acquisition model, for example its accuracy and recall, can be significantly improved.

The pushing module 54 is further configured to convert the graph knowledge extracted by the optimized knowledge acquisition model into questions to be confirmed, and push them to at least one user for answering.

For the optimized knowledge acquisition model, the extracted graph knowledge may be converted into questions to be confirmed in the manner described for the embodiment of Fig. 5, and the questions are then pushed to users for answering.

As an implementation of cyclic optimization, in the embodiment of the present disclosure, optimized training data may be generated from multiple rounds of user answers and used to optimize the knowledge acquisition model repeatedly; that is, the training data generating module 56, the training module 57, and the pushing module 54 repeatedly perform the above processing in sequence.

In the embodiment of the present disclosure, the answer results continuously confirmed by users can serve as optimized training data for optimizing the knowledge acquisition model, which then generates more accurate questions to push to users. In some embodiments, if the performance parameter value of the graph knowledge extracted from the original corpus information data set by the knowledge acquisition model reaches a preset threshold, that is, is greater than or equal to it, the graph knowledge may be stored directly in the database and used directly to construct the knowledge graph. That is, in the embodiment shown in Fig. 6, the construction module 55 may be further configured to construct the knowledge graph directly from the graph knowledge extracted by the knowledge acquisition model when that performance parameter value is determined to be greater than or equal to the preset threshold. In the embodiment of the present disclosure, the performance parameter value may include a parameter measuring the performance of the knowledge acquisition model, such as precision, recall, or F1 value, or another value calculated from these parameters, where F1 is the harmonic mean of precision and recall.

In some embodiments, the first obtaining module 51 is configured to obtain at least one of entity type definition information, event type definition information, and relationship type definition information of the knowledge graph. The extraction module 53 uses a knowledge acquisition model to extract at least one of entity-type, event-type, and relationship-type graph knowledge from the original corpus information data set. The knowledge acquisition model may be of various kinds and may include, for example, at least one of a rule model, a dictionary model, a statistical learning model, a machine learning model, and a language model.

In addition, the three types of graph knowledge, namely entity-type, event-type, and relationship-type graph knowledge, may be obtained by the corresponding entity-type, event-type, and relationship-type knowledge acquisition models respectively, and the number of models is not limited. That is, when the extraction module 53 in the embodiment shown in Fig. 5 extracts at least one of entity-type, event-type, and relationship-type graph knowledge from the original corpus information data set, at least one of the following situations may apply:

extracting the atlas knowledge of the entity type from the original corpus information data set by using at least one entity type knowledge acquisition model;

extracting atlas knowledge of the event type from the original corpus information data set by using at least one event type knowledge acquisition model;

and extracting the atlas knowledge of the relation type from the original corpus information data set by using at least one relation type knowledge acquisition model.

Specifically, extracting entity-type graph knowledge from the original corpus information dataset may include, for the person entity type, extracting person-type graph knowledge from the dataset by using at least one of a regular-expression (rule) model, a BERT language model, an LSTM+CRF model, and a few-shot language model.
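The per-type extraction described above, in which several models may serve the same knowledge type, can be sketched as a registry of model functions. The regular-expression rule and the tiny dictionary below are hypothetical stand-ins for the rule and dictionary models; trained models such as BERT or LSTM+CRF would plug in as further callables with the same signature. Taking the union of the model outputs is an assumption, since the disclosure does not say how overlapping outputs are combined.

```python
import re
from typing import Callable, Dict, List, Set, Tuple

Knowledge = Tuple[str, str]              # (type label, surface text)
Model = Callable[[str], Set[Knowledge]]

# Hypothetical rule: a capitalized one- or two-word name right after a title.
_TITLE_NAME = re.compile(r"\b(?:Mr|Ms|Mrs|Dr)\.\s+([A-Z][a-z]+(?:\s[A-Z][a-z]+)?)")


def regex_person_model(text: str) -> Set[Knowledge]:
    """Regular-expression (rule) model for the person entity type."""
    return {("person", m.group(1)) for m in _TITLE_NAME.finditer(text)}


def make_dictionary_model(label: str, lexicon: Set[str]) -> Model:
    """Dictionary model: report every lexicon entry that occurs verbatim in the text."""
    def model(text: str) -> Set[Knowledge]:
        return {(label, term) for term in lexicon if term in text}
    return model


def extract(text: str, registry: Dict[str, List[Model]]) -> Dict[str, Set[Knowledge]]:
    """Run every model registered for each knowledge type and union their outputs."""
    results: Dict[str, Set[Knowledge]] = {}
    for knowledge_type, models in registry.items():
        found: Set[Knowledge] = set()
        for model in models:
            found |= model(text)
        results[knowledge_type] = found
    return results
```

With `registry = {"entity": [regex_person_model, make_dictionary_model("person", {"Jon Favreau"})]}`, the sentence `"Dr. Alice Smith met Jon Favreau."` yields both `("person", "Alice Smith")` (from the rule) and `("person", "Jon Favreau")` (from the dictionary).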

Still referring to fig. 3 and 4 above, the difference between them is that:

In addition, as shown in fig. 3 and 4, when extracting graph knowledge, the extracting module 53 may also extract entity-type graph knowledge from relationship-type graph knowledge, and/or extract entity-type and relationship-type graph knowledge from event-type graph knowledge. This expands the sources of entity-type and relationship-type graph knowledge. Once the extraction of entity-type, relationship-type, and event-type graph knowledge is complete, a collection module 58 may further be provided, as shown in fig. 6, to collect the graph knowledge by type, that is, to generate one collection of graph knowledge for each of the three types.
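The derivation and collection steps above can be sketched as follows, assuming relations are stored as `(head type, head, relation, tail type, tail)` tuples and events as an ID, an event type and `(role, entity type, name)` arguments. Turning each event argument into a relation from the event to the argument is an illustrative choice, not something the disclosure specifies; `derive_and_collect` is a hypothetical name.

```python
from typing import Dict, Iterable, Set, Tuple

Entity = Tuple[str, str]                        # (entity type, name)
Relation = Tuple[str, str, str, str, str]       # (head type, head, relation, tail type, tail)
Argument = Tuple[str, str, str]                 # (role, entity type, name)
Event = Tuple[str, str, Tuple[Argument, ...]]   # (event id, event type, arguments)


def derive_and_collect(
    entities: Iterable[Entity],
    relations: Iterable[Relation],
    events: Iterable[Event],
) -> Dict[str, Set[tuple]]:
    """Expand entity/relation sources from higher-level knowledge, then group by type."""
    entity_set: Set[Entity] = set(entities)
    relation_set: Set[Relation] = set(relations)
    event_set: Set[Event] = set(events)

    # Entities appearing as the head or tail of an extracted relation.
    # This runs before event-derived relations are added, so event IDs
    # are never mistaken for entities.
    for head_type, head, _, tail_type, tail in relation_set:
        entity_set.add((head_type, head))
        entity_set.add((tail_type, tail))

    # Entities and (event -> argument) relations taken from each event.
    for event_id, event_type, arguments in event_set:
        for role, arg_type, arg_name in arguments:
            entity_set.add((arg_type, arg_name))
            relation_set.add((event_type, event_id, role, arg_type, arg_name))

    return {"entity": entity_set, "relation": relation_set, "event": event_set}
```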

In some embodiments, the graph knowledge obtained through the knowledge acquisition model may be duplicated or may already have been confirmed by users, so the collection module 58 may further merge and fuse the graph knowledge collected by type, and/or filter it.
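A minimal sketch of the merging and filtering step, assuming duplicates are detected by comparing whitespace- and case-normalized tuples and that already confirmed knowledge is available as a set. Real fusion (for example, resolving entity aliases) would need more than string normalization, and `merge_and_filter` is a hypothetical name.

```python
from typing import Iterable, List, Set, Tuple

Knowledge = Tuple[str, ...]


def _key(item: Knowledge) -> Knowledge:
    # Collapse internal whitespace and ignore case when comparing items.
    return tuple(" ".join(part.split()).casefold() for part in item)


def merge_and_filter(candidates: Iterable[Knowledge],
                     confirmed: Set[Knowledge]) -> List[Knowledge]:
    """Drop duplicates and items already confirmed by users, keeping first occurrences."""
    seen = {_key(item) for item in confirmed}
    kept: List[Knowledge] = []
    for item in candidates:
        key = _key(item)
        if key in seen:
            continue
        seen.add(key)
        kept.append(item)
    return kept
```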

In some embodiments, the knowledge acquisition model may extract a large amount of graph knowledge, so the pushing module 54 may generate a question pool from the questions to be confirmed, draw questions from the pool in a preset order, and push them to at least one user for answering. The order in which questions are drawn from the pool may be constrained by a learning model or a prescribed policy, for example in the following ways:

First, a learning model is used to draw questions from the question pool. This addresses the case where the pool holds too many questions to be confirmed and the one or few most suitable questions must be selected and pushed to the user. The learning model can learn from users' answer results and decide the order in which questions are pushed. Its input may include one or more items of information relevant to answering, such as the results of answered questions, information about the questions to be confirmed, associations between questions, the sources of the questions, and information about the person answering; its output is one question, or a batch of questions, to be confirmed by that user.

Second, questions to be confirmed are drawn from the question pool based on preset priorities and/or performance parameter indicators, and pushed to at least one user for answering.

Regarding the performance parameter indicators: during construction of the knowledge graph, the indicators of the different knowledge acquisition models differ; some models perform better and others worse, and each model's indicators depend on how many of its questions have been answered. Because of the proportions of questions, low-frequency types of graph knowledge rarely appear in the question pool; without intervention by the system, models for frequent questions would keep improving quickly while models for low-frequency questions would improve slowly. To develop the system in a balanced way, questions associated with models whose indicators lag behind, or with low-frequency question models, are therefore drawn from the pool preferentially. By adjusting how often such questions appear, the system improves the model indicators, and the results produced to support graph construction are generally of better quality.

Third, questions to be confirmed are drawn from the question pool based on the relevance of the graph knowledge, and pushed to at least one user for answering. In this relationship-expansion approach, an entity-type question may be asked first, for example "Is X-th Paradigm a company?"; after the user confirms, a relationship-type question follows, such as whether "X-th Paradigm - owns product - AA" belongs to the relationship "enterprise - owns product - product"; after the user confirms that, the system goes on to ask about the release event of product AA by the company X-th Paradigm, so that questions unfold in order of relevance.

Fourth, questions to be confirmed are drawn from the question pool based on the data granularity of the original corpus information dataset, and pushed to at least one user for answering. For example, if the dataset includes a plurality of articles, the entity-type, relationship-type, and event-type questions extracted from one article may be asked in sequence, article by article.
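The second and fourth ordering strategies can be sketched as sort keys over a list of pending questions. The `Question` fields, the per-model F1 table, and the entity-relationship-event order within an article (which also follows the relevance progression of the third strategy) are illustrative assumptions. The learning-model strategy is not sketched because the disclosure does not specify that model.

```python
from dataclasses import dataclass
from typing import Dict, List

# Asking order within one article: entity first, then relationship, then event.
TYPE_ORDER = {"entity": 0, "relation": 1, "event": 2}


@dataclass
class Question:
    text: str
    knowledge_type: str  # "entity", "relation" or "event"
    model_name: str      # model that produced the underlying graph knowledge
    article_id: str      # article in the corpus the knowledge came from


def order_by_performance(questions: List[Question], model_f1: Dict[str, float]) -> List[Question]:
    """Second strategy (sketch): questions from models with lower F1 come first.

    Models missing from `model_f1` are treated as F1 = 0, so questions from
    new or rarely seen models are prioritized.
    """
    return sorted(
        questions,
        key=lambda q: (model_f1.get(q.model_name, 0.0), TYPE_ORDER[q.knowledge_type]),
    )


def order_by_article(questions: List[Question]) -> List[Question]:
    """Fourth strategy (sketch): finish one article before moving to the next."""
    article_rank: Dict[str, int] = {}
    for q in questions:
        article_rank.setdefault(q.article_id, len(article_rank))
    return sorted(
        questions,
        key=lambda q: (article_rank[q.article_id], TYPE_ORDER[q.knowledge_type]),
    )


def draw(ordered: List[Question], batch_size: int) -> List[Question]:
    """Take the next batch of questions to push to a user."""
    return ordered[:batch_size]
```

Python's `sorted` is stable, so questions that tie on both key parts keep their original pool order.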

In some embodiments, the pushing module 54 may push the question to be confirmed to at least one user for answering in several ways, for example through at least one of a text mode, an image mode, or a voice broadcast mode. In the text mode, an interactive interface is displayed on the screen of an electronic device and the question to be confirmed is shown on it as text; in the image mode, the question to be confirmed is rendered as a picture or video for display; in the voice broadcast mode, the question to be confirmed is converted into speech and delivered to the user.

Further, as shown in fig. 6, the apparatus may further include an input module 59. After the question to be confirmed is pushed to the user in any of the above ways, the user may submit the answer result through the input module 59 in various ways; for example, the user's answer result may be obtained through keyboard input, mouse input, touch input, voice input, or somatosensory (motion-sensing) input.

In some embodiments, the push module 54 may push the questions to be confirmed to the user through a variety of application scenarios or platforms. Specifically, pushing the question to be confirmed to the user may include one of the following ways:

the first way is to push the question to be confirmed to a shared learning platform, such as a wiki, which displays the question for the user to answer;

In some embodiments, after the question to be confirmed is pushed to the user and the user's answer result is received, the construction module 55 constructs the knowledge graph according to the user's answer result; the specific construction manner may include:

In addition, the specific way in which the construction module 55 constructs the knowledge graph from the graph knowledge corresponding to the answer results may include: constructing entity nodes in the knowledge graph according to entity-type graph knowledge; constructing entity nodes that do not yet exist in the knowledge graph and establishing relationship edges according to relationship-type graph knowledge; and constructing entity nodes that do not yet exist in the knowledge graph according to event-type graph knowledge, and constructing event nodes together with the events centered on those event nodes.
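The three construction rules above can be sketched with a small in-memory graph, assuming node IDs of the form `type:name` and event nodes linked to their arguments by role-labeled edges. `KnowledgeGraph` and its methods are hypothetical; a production system would more likely write to a graph database.

```python
from typing import Dict, Iterable, Set, Tuple


class KnowledgeGraph:
    """Hypothetical in-memory graph: typed nodes plus labeled directed edges."""

    def __init__(self) -> None:
        self.nodes: Dict[str, str] = {}                # node id -> entity type or event type
        self.edges: Set[Tuple[str, str, str]] = set()  # (source id, label, target id)

    def _ensure_entity(self, entity_type: str, name: str) -> str:
        # Create the entity node only if it does not already exist in the graph.
        node_id = f"{entity_type}:{name}"
        self.nodes.setdefault(node_id, entity_type)
        return node_id

    def add_entity(self, entity_type: str, name: str) -> None:
        """Rule 1: entity-type knowledge becomes an entity node."""
        self._ensure_entity(entity_type, name)

    def add_relation(self, head_type: str, head: str, relation: str,
                     tail_type: str, tail: str) -> None:
        """Rule 2: add missing endpoint nodes, then a relationship edge."""
        source = self._ensure_entity(head_type, head)
        target = self._ensure_entity(tail_type, tail)
        self.edges.add((source, relation, target))

    def add_event(self, event_id: str, event_type: str,
                  arguments: Iterable[Tuple[str, str, str]]) -> None:
        """Rule 3: an event node with role edges to its (possibly new) argument entities."""
        event_node = f"event:{event_id}"
        self.nodes[event_node] = event_type
        for role, arg_type, arg_name in arguments:
            self.edges.add((event_node, role, self._ensure_entity(arg_type, arg_name)))
```

For example, adding the film release event below with arguments `("film title", "film", "Iron Man")` and `("director", "person", "Jon Favreau")` creates one event node, two entity nodes (if absent), and two role edges centered on the event.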

Film release event 01 - film title - Iron Man

Film release event 01 - release location - China

Film release event 01 - director - Jon Favreau

Film release event 01 - lead actor - Robert Downey Jr.

Film release event 01 - lead actor - Gwyneth Paltrow

Film release event 01 - lead actor - Terrence Howard

Film release event 01 - lead actor - Jeff Bridges?"

Film release event 01 - film title - Iron Man

Film release event 01 - release location - China

Film release event 01 - director - Jon Favreau

Film release event 01 - lead actor - Robert Downey Jr.

Film release event 01 - lead actor - Gwyneth Paltrow

Film release event 01 - lead actor - Terrence Howard

In some embodiments, the original information dataset acquired by the second acquiring module 52 may include at least one of an original corpus dataset, an original picture dataset, and an original video dataset. For each of these types of original information, graph knowledge may be extracted by a knowledge acquisition model, manual annotation is carried out through the question-and-answer mode provided by the embodiments of the present disclosure, and the knowledge graph is constructed from the annotated results.

Specifically, for the original corpus dataset, the original corpus data may be obtained from unstructured text data, and the second acquiring module 52 may further preprocess the original corpus data. Where the original corpus data is unstructured text data, acquiring the original corpus dataset may specifically include the following steps:

acquiring unstructured text data;

preprocessing the unstructured text data, where the preprocessing may include at least one of encoding processing, word segmentation, rule model processing, and dictionary matching processing.
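Two of the preprocessing steps listed above, encoding handling and dictionary-based word segmentation, can be sketched as follows. Forward maximum matching is used here as one common dictionary-matching segmentation technique; the disclosure does not name a specific algorithm, and the function names, the candidate encodings and the NFKC normalization step are illustrative assumptions.

```python
import unicodedata
from typing import List, Sequence, Set


def decode_text(raw: bytes, encodings: Sequence[str] = ("utf-8", "gb18030")) -> str:
    """Encoding step (sketch): decode with the first encoding that works, then apply
    NFKC normalization (e.g. full-width letters and digits become ASCII)."""
    for encoding in encodings:
        try:
            return unicodedata.normalize("NFKC", raw.decode(encoding))
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the input")


def forward_max_match(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Dictionary-matching segmentation by forward maximum matching.

    At each position, take the longest lexicon word of at most max_len characters;
    if none matches, emit a single character and move on.
    """
    max_len = max(1, max_len)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens
```

For example, `forward_max_match("钢铁侠在中国上映", {"钢铁侠", "中国", "上映"}, 3)` returns `["钢铁侠", "在", "中国", "上映"]` ("Iron Man / in / China / released"); forward maximum matching is greedy, so it can mis-segment ambiguous strings that statistical segmenters handle better.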

In a third aspect, an embodiment of the present invention provides a computer apparatus, including:

a processor for implementing the steps of the method of constructing a knowledge-graph as described above when executing a computer program stored in the memory.

The processor may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the computer to perform desired functions.

The memory may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on a computer-readable storage medium and executed by a processor to implement the above method steps of the various embodiments of the present application and/or other desired functions.

In a fourth aspect, embodiments of the present invention provide a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method of constructing a knowledge-graph as described above.

In addition to the methods and apparatus described above, embodiments of the present application may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the method steps of the various embodiments of the present application.

The computer program product may include program code for carrying out operations for embodiments of the present invention in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.

Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the method steps of the various embodiments of the present application.

A computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method of constructing a knowledge graph, comprising:

acquiring definition information of a knowledge graph;

acquiring an original information data set;

extracting graph knowledge from the original information dataset by using a knowledge acquisition model based on the definition information of the knowledge graph;

converting the graph knowledge extracted by the knowledge acquisition model into a question to be confirmed, and pushing the question to at least one user for answering;

and constructing the knowledge graph based on the response result of the user.

2. The method of claim 1, further comprising:

generating optimized training data based on the response result of the user;

3. The method of claim 1, further comprising:

and when it is determined, based on the response result of the user, that the performance parameter value of the graph knowledge extracted from the original information dataset by the knowledge acquisition model is greater than or equal to a preset threshold, directly constructing the knowledge graph based on the graph knowledge extracted by the knowledge acquisition model.

4. The method of claim 1, wherein obtaining knowledge-graph definition information comprises:

the extracting of graph knowledge from the original information dataset with the knowledge acquisition model comprises:

and extracting at least one of entity-type graph knowledge, event-type graph knowledge and relationship-type graph knowledge from the original information dataset by using a knowledge acquisition model.

5. The method of claim 4, wherein the knowledge acquisition model comprises at least one of a rule model, a dictionary model, a statistical learning model, a machine learning model, and a language model.

6. The method of claim 4, wherein the extracting at least one of graph knowledge of entity types, graph knowledge of event types, and graph knowledge of relationship types from the raw information dataset using a knowledge acquisition model comprises at least one of:

7. The method according to claim 6, wherein the graph knowledge output by any one of the entity-type knowledge acquisition model, the event-type knowledge acquisition model and the relationship-type knowledge acquisition model is used as input data for any one or both of the others.

8. An apparatus for constructing a knowledge graph, comprising:

the extraction module is configured to extract graph knowledge from the original information dataset by using a knowledge acquisition model based on the definition information of the knowledge graph;

9. A computer device, the computer device comprising:

a processor for implementing the steps of the method according to any one of claims 1 to 7 when executing a computer program stored in a memory.

10. A computer-readable storage medium having stored thereon computer instructions, which, when executed by a processor, carry out the steps of the method according to any one of claims 1 to 7.