
CN117763161A - Knowledge base automatic construction method based on knowledge graph

Info

Publication number: CN117763161A
Application number: CN202311741242.1A
Authority: CN
Inventors: 魏爽; 夏天舒; 谢满德
Original assignee: Sinyada Technology Co ltd
Current assignee: Sinyada Technology Co ltd
Priority date: 2023-12-18
Filing date: 2023-12-18
Publication date: 2024-03-26

Abstract

The invention discloses a method for automatically constructing a knowledge base based on a knowledge graph, comprising the following steps: S01, inputting an entry, and summarizing the targets and relations in the entry through an expert annotation method to form a target-and-relation indexing rule base; S02, training a machine learning model according to the expert labeling rules and machine learning rules; S03, using the trained machine learning model to perform predictive labeling and analysis on the entry so as to obtain a knowledge graph of the relations associated with the target. The method retrieves entries that are input consecutively, analyzes the plurality of entries to obtain the semantics closest to the initial entry, then creates a knowledge graph according to those semantics and feeds it back to the user. Building on user-defined entries, this intelligent pruning reduces the number of entries the user has to input and finds the target entry the user needs more quickly.

Description

Knowledge base automatic construction method based on knowledge graph

Technical Field

The invention relates to the technical field of cognitive intelligence, in particular to a knowledge base automatic construction method based on a knowledge graph.

Background

The concept of the knowledge graph was formally proposed by Google in 2012 to improve the quality of search. In addition to displaying a list of links to other websites, the knowledge graph provides structured and detailed information about the topic. The goal is for users to be able to use this information to answer their query without having to navigate to other websites and aggregate the information themselves.

Publication No. CN114911893A, published on 2022-08-16, discloses a method and system for automatically constructing a knowledge base based on a knowledge graph. The method comprises the following steps: obtaining and processing unstructured data to form a training set file and a prediction set file; graphically constructing a Schema of the knowledge graph to describe the relationships between entities in the field; labeling the entities in the training set file according to the constructed Schema; training a service model for predicting the relationships between entities using the labeled files and pre-established rule set files; inputting the prediction set file into the trained service model and executing a prediction task to obtain a prediction result, namely entity-relation-entity triple data; and converting the prediction result of the service model into a knowledge graph and automatically adding it to the knowledge base. The method can automate the construction of the knowledge graph and makes it much easier to gain insight into the key information hidden in the data.

In the prior art, including the above patent, the emergence of massive data and the fusion and cross-use of multiple data sources have limited the traditional data management model to a certain extent; compared with a traditional database, automatic knowledge base construction manages knowledge efficiently. However, because the input is mostly fuzzy, the recommended entries cannot be filtered effectively and all entries are listed, so the user has to keep changing the input entry to narrow the list and find the required target entry.

Disclosure of Invention

The invention aims to provide a method for automatically constructing a knowledge base based on a knowledge graph that solves the above problems.

In order to achieve the above object, the present invention provides the following technical solutions:

A method for automatically constructing a knowledge base based on a knowledge graph comprises the following steps:

S01, inputting an entry, and summarizing the targets and relations in the entry through an expert annotation method to form a target-and-relation indexing rule base;

S02, training a machine learning model according to the expert labeling rules and machine learning rules;

S03, using the trained machine learning model to perform predictive labeling and analysis on the entry so as to obtain a knowledge graph of the relations associated with the target.
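
As a minimal sketch of what S03 could produce, the Python snippet below assumes that the trained model emits (head, relation, tail) triples, as in the prior art cited in the Background, and groups them into an adjacency mapping so that the relations around a target can be looked up. The names build_graph and neighbours and the sample triples are hypothetical and are not taken from the patent.

```python
from collections import defaultdict


def build_graph(triples):
    """Group (head, relation, tail) triples into an adjacency mapping."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return dict(graph)


def neighbours(graph, target):
    """Return the (relation, entity) pairs directly linked to `target`."""
    return graph.get(target, [])


# Hypothetical predictions standing in for the output of the trained model.
predicted = [
    ("knowledge graph", "proposed_by", "Google"),
    ("knowledge graph", "used_for", "search"),
    ("Google", "is_a", "company"),
]
graph = build_graph(predicted)
print(neighbours(graph, "knowledge graph"))
```

A plain dictionary is used here only to keep the sketch self-contained; the patent does not say how the resulting graph is stored.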

Preferably, summarizing the targets and relations in the entry in step S01 includes the following steps:

S11, acquiring the character strings of a plurality of completed words in the entry as a plurality of initial entries;

S12, obtaining a plurality of similar entries input in a continuous time period, wherein the character string of each similar entry corresponds one-to-one to an initial entry, and the initial entries are the results of a plurality of consecutive inputs;

S13, analyzing the plurality of initial entries one by one based on semantic analysis, and giving a plurality of semantically similar search entries.
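
Steps S12 and S13 can be illustrated with a short Python sketch. It assumes that each input arrives as a (timestamp, text) pair, that a "continuous time period" means no gap longer than a hypothetical max_gap, and it swaps in plain string similarity from the standard difflib module for the semantic analysis, which the patent does not specify. All function names and sample data are illustrative.

```python
from difflib import SequenceMatcher


def group_continuous(inputs, max_gap):
    """Split (timestamp, text) inputs into runs whose gaps are <= max_gap."""
    sessions, current = [], []
    last_time = None
    for timestamp, text in sorted(inputs):
        if last_time is not None and timestamp - last_time > max_gap:
            sessions.append(current)
            current = []
        current.append(text)
        last_time = timestamp
    if current:
        sessions.append(current)
    return sessions


def similarity(a, b):
    """String similarity in [0, 1]; a stand-in for semantic similarity."""
    return SequenceMatcher(None, a, b).ratio()


def suggest(initial_entries, candidates, top_n=2):
    """Rank candidates by their average similarity to all initial entries."""
    def score(candidate):
        total = sum(similarity(candidate, entry) for entry in initial_entries)
        return total / len(initial_entries)
    return sorted(candidates, key=score, reverse=True)[:top_n]


# Hypothetical inputs: three consecutive attempts, then an unrelated query.
inputs = [(0, "graph data"), (4, "graph database"), (9, "graph db"), (600, "weather")]
sessions = group_continuous(inputs, max_gap=30)
candidates = ["graph database engine", "weather forecast", "relational table"]
print(suggest(sessions[0], candidates, top_n=1))
```

Averaging over all entries in the run, rather than using only the latest one, is one way to reflect the idea that several consecutive inputs together indicate what the user is looking for.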

Preferably, the plurality of similar entries input in a continuous time period are obtained in S12 as follows: where an entry is recorded in the thesaurus, if the time for which the entry has not been used exceeds a first time threshold, the entry is deleted from the thesaurus and recorded in the cache;

where an entry is recorded in the cache, if the time for which the entry has not been used exceeds a second time threshold, the entry is identified as a new retrieved entry.
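
The two-threshold rule can be sketched as a small Python class. This is only one possible reading: it assumes that idle time is always measured from the last use, that an entry found in the cache is restored to the thesaurus when it is used again, and that an entry expired from the cache is simply forgotten so that it counts as new next time. The class name EntryStore and its methods are hypothetical.

```python
class EntryStore:
    """Two-tier store: thesaurus first, cache second, then forgotten."""

    def __init__(self, first_threshold, second_threshold):
        self.first_threshold = first_threshold
        self.second_threshold = second_threshold
        self.thesaurus = {}  # entry -> time of last use
        self.cache = {}      # entry -> time of last use

    def sweep(self, now):
        """Demote idle thesaurus entries and drop idle cache entries."""
        for entry, last_used in list(self.thesaurus.items()):
            if now - last_used > self.first_threshold:
                del self.thesaurus[entry]
                self.cache[entry] = last_used
        for entry, last_used in list(self.cache.items()):
            if now - last_used > self.second_threshold:
                del self.cache[entry]

    def use(self, entry, now):
        """Record a use and report where the entry was found."""
        self.sweep(now)
        if entry in self.thesaurus:
            status = "thesaurus"
        elif entry in self.cache:
            del self.cache[entry]  # assumption: a cache hit is promoted again
            status = "cache"
        else:
            status = "new"
        self.thesaurus[entry] = now
        return status


store = EntryStore(first_threshold=10, second_threshold=100)
history = [store.use("knowledge graph", t) for t in (0, 5, 50, 500)]
print(history)
```

With these hypothetical thresholds, the entry is new at time 0, still in the thesaurus at time 5, found in the cache at time 50, and treated as new again at time 500.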

Preferably, summarizing the targets and relations in the entry in step S01 further adopts the following steps:

S14, cleaning the plurality of semantic search entries, checking and verifying them, deleting duplicate data, and normalizing the semantic search entries;

S15, converting the normalized semantic search entries into TXT format, and generating the training set file and the prediction set file required for machine learning rule training.
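
Steps S14 and S15 could look roughly like the following Python sketch. The patent does not define the normalization, the split ratio, or the file names, so Unicode NFKC folding, lowercasing, whitespace collapsing, an 80/20 split, and the names train.txt and predict.txt are all assumptions made for illustration.

```python
import random
import unicodedata
from pathlib import Path


def normalize(entry):
    """NFKC-normalize, lowercase, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", entry).lower()
    return " ".join(text.split())


def clean(entries):
    """Normalize entries, drop empty ones, and remove duplicates in order."""
    seen = {}
    for entry in entries:
        text = normalize(entry)
        if text:
            seen.setdefault(text, None)
    return list(seen)


def write_sets(entries, out_dir, train_ratio=0.8, seed=0):
    """Shuffle the entries and write them to two plain-text files."""
    shuffled = entries[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    train, predict = shuffled[:cut], shuffled[cut:]
    (out / "train.txt").write_text("\n".join(train) + "\n", encoding="utf-8")
    (out / "predict.txt").write_text("\n".join(predict) + "\n", encoding="utf-8")
    return len(train), len(predict)


raw = ["Knowledge  Graph", "knowledge graph", "   ", "ＫＧ"]
print(clean(raw))
```

Deduplicating after normalization, rather than before, means that entries differing only in case, spacing, or character width are treated as the same entry.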

Preferably, the machine learning model is based on a graphically constructed Schema of the knowledge graph, and the Schema can be dragged.

Preferably, when the machine learning model is trained, the optimal parameters of the model are solved for by maximizing a log-likelihood function.
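
The patent does not name the model whose log-likelihood is maximized. Purely as an illustration of the technique, the Python sketch below fits a logistic regression by gradient ascent on the log-likelihood; the model choice, the learning rate, the step count, and the toy data are all assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(weights, features, labels):
    """Sum of y*log(p) + (1-y)*log(1-p) over all examples."""
    total = 0.0
    for x, y in zip(features, labels):
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def fit(features, labels, learning_rate=0.1, steps=500):
    """Gradient ascent: move the weights along the log-likelihood gradient."""
    weights = [0.0] * len(features[0])
    for _ in range(steps):
        gradient = [0.0] * len(weights)
        for x, y in zip(features, labels):
            error = y - sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            for j, xi in enumerate(x):
                gradient[j] += error * xi
        weights = [w + learning_rate * g for w, g in zip(weights, gradient)]
    return weights


# Toy data: the first feature is a constant bias term, labels are 0 or 1.
features = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
labels = [0, 1, 0, 1]
weights = fit(features, labels)
print(log_likelihood([0.0, 0.0], features, labels), log_likelihood(weights, features, labels))
```

The toy labels are deliberately not separable, so the maximum of the log-likelihood is attained at finite weights and the ascent settles instead of growing without bound.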

Preferably, the expert labeling rule is as follows: using the optimal parameters solved for the model, the targets and relations in the entry are classified and identified by a density-based clustering algorithm to obtain a classification result and the position coordinates of each class; clustering is then performed according to the classification result and the position coordinates of each class, and the probability mean, standard deviation and variance of each class are calculated.
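
The patent does not say which density-based algorithm is used or what the probabilities refer to. The Python sketch below assumes DBSCAN, two-dimensional position coordinates, and one model confidence score per item, and then computes the mean, standard deviation and variance of those scores for each cluster. The parameters eps and min_pts and the sample data are hypothetical.

```python
import math
import statistics


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    labels = [None] * len(points)

    def region(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            expansion = region(j)
            if len(expansion) >= min_pts:  # j is a core point, keep growing
                queue.extend(expansion)
    return labels


def cluster_stats(labels, probabilities):
    """Population mean, standard deviation and variance per cluster."""
    grouped = {}
    for label, p in zip(labels, probabilities):
        if label != -1:
            grouped.setdefault(label, []).append(p)
    return {
        label: {
            "mean": statistics.fmean(values),
            "std": statistics.pstdev(values),
            "variance": statistics.pvariance(values),
        }
        for label, values in grouped.items()
    }


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
probabilities = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.1]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
print(cluster_stats(labels, probabilities))
```

Noise points are left out of the statistics in this sketch; whether the patent intends that is not stated.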

The above technical solution of the method for automatically constructing a knowledge base based on a knowledge graph has the following beneficial effects: entries that are input consecutively are retrieved, the plurality of entries are analyzed by an analyzer to obtain the semantics closest to the initial entry, and a knowledge graph is then created according to those semantics and fed back to the user. Building on user-defined entries, this intelligent pruning reduces the number of entries the user has to input, so the required target entry is found more quickly.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings may be obtained according to these drawings for a person having ordinary skill in the art.

Fig. 1 is a schematic flow diagram provided in an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

As shown in Fig. 1, a method for automatically constructing a knowledge base based on a knowledge graph includes steps S01 to S03 set out above.

In the above technical solution, entries that are input consecutively are retrieved, the plurality of entries are analyzed by an analyzer to obtain the semantics closest to the initial entry, and a knowledge graph is then created according to those semantics and fed back to the user. Building on user-defined entries, this intelligent pruning reduces the number of entries the user has to input, so the required target entry is found more quickly.

As a further embodiment of the present invention, summarizing the targets and relations in the entry in S01 includes the following steps:

S12, obtaining a plurality of similar entries input in a continuous time period, wherein the character string of each similar entry corresponds one-to-one to an initial entry, and the initial entries are the results of a plurality of consecutive inputs;

S13, analyzing the plurality of initial entries one by one based on semantic analysis, and giving a plurality of semantically similar search entries.

As still another embodiment of the present invention, the plurality of similar entries input in a continuous time period are obtained in S12 as follows: where an entry is recorded in the thesaurus, if the time for which the entry has not been used exceeds a first time threshold, the entry is deleted from the thesaurus and recorded in the cache;

where an entry is recorded in the cache, if the time for which the entry has not been used exceeds a second time threshold, the entry is identified as a new retrieved entry.

As still another embodiment of the present invention, summarizing the targets and relations in the entry in S01 adopts steps S14 and S15 set out above.

It should be noted that the machine learning model is based on a graphically constructed Schema of the knowledge graph, and the Schema can be dragged.

In addition, when the machine learning model is trained, the optimal parameters of the model are solved for by maximizing a log-likelihood function.

The expert labeling rule is as follows: using the optimal parameters solved for the model, the targets and relations in the entry are classified and identified by a density-based clustering algorithm to obtain a classification result and the position coordinates of each class; clustering is then performed according to the classification result and the position coordinates of each class, and the probability mean, standard deviation and variance of each class are calculated.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The principles and embodiments of the present invention have been described in detail with reference to specific examples, which are provided only to facilitate understanding of the method and core ideas of the present invention. Meanwhile, those skilled in the art may, in accordance with the ideas of the present invention, vary the specific embodiments and the scope of application. In view of the above, this description should not be construed as limiting the present invention.

The embodiment of the application also provides a specific implementation manner of the electronic device capable of implementing all the steps in the method in the embodiment, and the electronic device specifically comprises the following contents:

a processor, a memory, a communication interface, and a bus;

the processor, the memory and the communication interface complete communication with each other through the bus;

the processor is configured to invoke the computer program in the memory, and when the processor executes the computer program, the processor implements all the steps in the method in the above embodiment.

The embodiments of the present application also provide a computer-readable storage medium capable of implementing all the steps of the methods in the above embodiments, the computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements all the steps of the methods in the above embodiments.

In this specification, the embodiments are described in a progressive manner; identical or similar parts of the embodiments may be understood by reference to one another, and each embodiment focuses on its differences from the others. In particular, the description of a hardware-plus-program embodiment is relatively brief because it is substantially similar to the method embodiment; for relevant details, see the corresponding part of the description of the method embodiment. Although this specification provides the method's operational steps as described in the embodiments or flowcharts, more or fewer operational steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is only one of many possible orders of execution and is not the only one. When implemented in an actual device or end product, the steps may be executed sequentially or in parallel (e.g., in a parallel-processor or multi-threaded processing environment, or even in a distributed data processing environment) according to the methods shown in the embodiments or figures. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises a described element is not excluded. For convenience of description, the above devices are described as being functionally divided into various modules.
Of course, when implementing the embodiments of the present disclosure, the functions of each module may be implemented in the same or multiple pieces of software and/or hardware, or a module that implements the same function may be implemented by multiple sub-modules or a combination of sub-units, or the like. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.

It will be appreciated by those skilled in the art that embodiments of the present specification may be provided as a method, system, or computer program product. Accordingly, the embodiments of the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the embodiments of the present specification may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein. Because system embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant details, see the corresponding part of the description of the method embodiments. In the description of the present specification, a reference to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present specification.

In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, provided they do not contradict one another, the different embodiments or examples described in this specification and their features may be combined by those skilled in the art. The foregoing is merely an example of an embodiment of the present specification and is not intended to limit the embodiments of the present specification. Various modifications and variations of the embodiments will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, or the like that falls within the spirit and principles of the embodiments of the present specification should be included in the scope of the claims of the embodiments of the present specification.

Claims

1. The method for automatically constructing the knowledge base based on the knowledge graph is characterized by comprising the following steps of:

2. The method for automatically constructing a knowledge base based on knowledge graph according to claim 1, wherein the step S01 of generalizing the objects and relationships in the vocabulary entry comprises the following steps:

3. The method according to claim 1, wherein the obtaining of the plurality of similar terms inputted in the continuous time period in S12 is to delete the terms from the term library and record the terms in the cache if the time that the terms are not used exceeds a first time threshold;

4. The method for automatically constructing a knowledge base based on knowledge graph according to claim 1, wherein the step S01 of generalizing the objects and relationships in the vocabulary entry comprises the following steps:

5. The automated knowledge base construction method based on knowledge graph of claim 1, wherein the machine learning model is based on a Schema of graphically structured knowledge graph, the Schema being draggable.

6. The method for automatically constructing a knowledge base based on a knowledge graph according to claim 1, wherein the machine learning model is trained by maximizing a log likelihood function to solve for optimal parameters of the model.

7. The method for automatically constructing a knowledge base based on a knowledge graph according to claim 1, wherein the expert labeling rule is that a classification result and position coordinates of each classification are obtained by classifying and identifying objects and relations in terms by using optimal parameters of a solving model through a density-based clustering algorithm, clustering is carried out according to the classification result and the position coordinates of each classification, and probability mean, standard deviation and variance of each classification are calculated.

8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the method for automatically constructing a knowledge base based on a knowledge graph according to any one of claims 1 to 7.

9. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method for automatically constructing a knowledge base based on a knowledge graph according to any one of claims 1 to 7.