CN117763161A - Knowledge base automatic construction method based on knowledge graph - Google Patents
- Publication number
- CN117763161A
- Authority
- CN
- China
- Prior art keywords
- entry
- knowledge
- knowledge graph
- machine learning
- entries
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for automatically constructing a knowledge base based on a knowledge graph, comprising the following steps: S01, inputting an entry, and summarizing the targets and relations in the entry by an expert annotation method to form a target-and-relation indexing rule base; S02, training a machine learning model according to the expert labeling rules and machine learning rules; S03, using the trained machine learning model to predictively label and analyze the entry, so as to obtain a knowledge graph of the related targets. The method retrieves over consecutively input entries, analyzes the multiple entries to obtain the semantics closest to the initial entry, then creates a knowledge graph from those semantics and feeds it back to the user. On the basis of user-defined entries, this intelligent trimming reduces the number of entries the user must input and finds the target entry the user needs more quickly.
Description
Technical Field
The invention relates to the technical field of cognitive intelligence, in particular to a knowledge base automatic construction method based on a knowledge graph.
Background
The concept of the knowledge graph was formally proposed by Google in 2012 to improve search quality. In addition to displaying a list of links to other websites, the knowledge graph provides structured, detailed information about a topic. The goal is for users to resolve their query with this information, without having to navigate to other websites and aggregate the information themselves.
Publication No. CN114911893A, published 2022-08-16, discloses a method and system for automatically constructing a knowledge base based on a knowledge graph. The method comprises: obtaining and processing unstructured data to form a training set file and a prediction set file; graphically constructing a Schema of the knowledge graph that describes the relationships between entities in the domain; labeling the entities in the training set file according to the constructed Schema; training a service model for predicting relationships between entities using the labeled files and pre-established rule set files; inputting the prediction set file into the trained service model and executing the prediction task to obtain prediction results, namely entity-relation-entity triple data; and converting the prediction results of the service model into a knowledge graph and automatically adding it to the knowledge base. The method automates knowledge graph construction and makes it considerably easier to surface key information hidden in the data.
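The final step of the prior-art method, turning entity-relation-entity triples into a graph, can be illustrated with a minimal Python sketch. The triples and the `triples_to_graph` helper are hypothetical and are not taken from CN114911893A; the sketch only shows one plausible in-memory representation.

```python
from collections import defaultdict

# Hypothetical predicted triples (head entity, relation, tail entity).
triples = [
    ("Google", "proposed", "knowledge graph"),
    ("knowledge graph", "improves", "search quality"),
    ("Google", "proposed", "knowledge graph"),  # duplicate prediction
]

def triples_to_graph(triples):
    """Build an adjacency map: head entity -> set of (relation, tail) edges."""
    graph = defaultdict(set)
    for head, relation, tail in triples:
        graph[head].add((relation, tail))  # the set drops duplicate edges
    return dict(graph)

graph = triples_to_graph(triples)
```

Storing edges in a set is a design choice made here so that repeated predictions of the same triple do not create duplicate edges.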
In the prior art, including the above patent, the emergence of massive data and the fusion and cross-use of multiple data sources have exposed the limits of traditional data management; compared with a traditional database, automatic knowledge base construction enables efficient knowledge management. However, because user input is mostly fuzzy, the recommended entries cannot be filtered effectively and all candidate entries are listed. The user must then keep changing the input entry to narrow the list before the required target entry can be found.
Disclosure of Invention
The invention aims to provide a method for automatically constructing a knowledge base based on a knowledge graph, so as to solve the above problems.
In order to achieve the above object, the present invention provides the following technical solutions:
A method for automatically constructing a knowledge base based on a knowledge graph comprises the following steps:
S01, inputting an entry, and summarizing the targets and relations in the entry by an expert annotation method to form a target-and-relation indexing rule base;
S02, training a machine learning model according to the expert labeling rules and machine learning rules;
S03, using the trained machine learning model to predictively label and analyze the entry, so as to obtain a knowledge graph of the related targets.
Preferably, summarizing the targets and relations in the entry in S01 comprises the following steps:
S11, acquiring the character strings of a plurality of completed words in the entry as a plurality of initial entries;
S12, acquiring a plurality of similar entries input within a continuous time period, wherein the character string of each similar entry corresponds one-to-one to an initial entry, and the initial entries are the results of a plurality of consecutive inputs;
S13, analyzing the plurality of initial entries one by one by semantic analysis, and outputting a plurality of semantically similar search entries.
Preferably, in acquiring the plurality of similar entries input within a continuous time period in S12: if an entry is recorded in the entry library and the time for which it has not been used exceeds a first time threshold, the entry is deleted from the entry library and recorded in the cache;
if an entry is recorded in the cache and the time for which it has not been used exceeds a second time threshold, the entry is identified as a new search entry.
Preferably, summarizing the targets and relations in the entry in S01 further comprises the following steps:
S14, cleaning the plurality of semantic search entries: re-checking and verifying them, deleting duplicate data, and normalizing them;
S15, converting the normalized semantic search entries into TXT format, and generating the training set file and the prediction set file required for machine learning rule training.
Preferably, the machine learning model is based on a graphically constructed Schema of the knowledge graph, the Schema being draggable.
Preferably, when the machine learning model is trained, the optimal parameters of the model are solved by maximizing a log-likelihood function.
Preferably, the expert labeling rule is as follows: using the solved optimal parameters of the model, the targets and relations in the entry are classified and identified by a density-based clustering algorithm to obtain a classification result and the position coordinates of each class; clustering is performed according to the classification result and the position coordinates of each class; and the probability mean, standard deviation, and variance of each class are calculated.
In the above technical solution, the method for automatically constructing a knowledge base based on a knowledge graph has the following beneficial effects: by retrieving over consecutively input entries and analyzing the multiple entries, the semantics closest to the initial entry are obtained; a knowledge graph is then created from those semantics and fed back to the user. On the basis of user-defined entries, this intelligent trimming reduces the number of entries the user must input, so the required target entry is found more quickly.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them.
Fig. 1 is a schematic flow diagram provided in an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the invention.
As shown in Fig. 1, a method for automatically constructing a knowledge base based on a knowledge graph includes the following steps:
S01, inputting an entry, and summarizing the targets and relations in the entry by an expert annotation method to form a target-and-relation indexing rule base;
S02, training a machine learning model according to the expert labeling rules and machine learning rules;
S03, using the trained machine learning model to predictively label and analyze the entry, so as to obtain a knowledge graph of the related targets.
In the above technique, consecutively input entries are retrieved and analyzed to obtain the semantics closest to the initial entry; a knowledge graph is then created from those semantics and fed back to the user. On the basis of user-defined entries, this intelligent trimming reduces the number of entries the user must input, so the required target entry is found more quickly.
As a further embodiment of the present invention, summarizing the targets and relations in the entry in S01 includes the following steps:
S11, acquiring the character strings of a plurality of completed words in the entry as a plurality of initial entries;
S12, acquiring a plurality of similar entries input within a continuous time period, wherein the character string of each similar entry corresponds one-to-one to an initial entry, and the initial entries are the results of a plurality of consecutive inputs;
S13, analyzing the plurality of initial entries one by one by semantic analysis, and outputting a plurality of semantically similar search entries.
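Steps S11 to S13 can be sketched as follows. This is only an illustration under stated assumptions: the source does not specify how a "continuous time period" is delimited or which semantic analysis is used, so the sketch assumes a maximum gap between inputs (`max_gap`, hypothetical) and substitutes word-overlap (Jaccard) similarity for real semantic analysis. All function names and sample data are hypothetical.

```python
def tokens(text):
    """Split an entry into a set of lowercase word tokens."""
    return set(text.lower().split())

def jaccard(a, b):
    """Word-overlap similarity; a simple stand-in for semantic analysis."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def continuous_inputs(inputs, max_gap):
    """S11/S12 stand-in: keep the latest run of inputs whose gaps are <= max_gap."""
    run = []
    for ts, text in inputs:
        if run and ts - run[-1][0] > max_gap:
            run = []  # gap too large: a new continuous period starts here
        run.append((ts, text))
    return [text for _, text in run]

def rank_candidates(initial_entries, candidates, top_k=2):
    """S13 stand-in: score each candidate by its best overlap with any initial entry."""
    scored = [(max(jaccard(c, e) for e in initial_entries), c) for c in candidates]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [c for score, c in scored[:top_k] if score > 0]

# Hypothetical (timestamp in seconds, typed entry) pairs.
inputs = [
    (0, "weather today"),
    (100, "knowledge graph"),
    (105, "knowledge graph construction"),
    (108, "graph construction method"),
]
initial = continuous_inputs(inputs, max_gap=10)
suggestions = rank_candidates(
    initial,
    ["knowledge graph construction method", "graph database", "weather forecast"],
)
```

In this sample the first input is separated from the rest by a long gap, so only the last three inputs are treated as one continuous session, and the candidate sharing no words with that session is dropped.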
As still another embodiment of the present invention, in acquiring the plurality of similar entries input within a continuous time period in S12: if an entry is recorded in the entry library and the time for which it has not been used exceeds a first time threshold, the entry is deleted from the entry library and recorded in the cache;
if an entry is recorded in the cache and the time for which it has not been used exceeds a second time threshold, the entry is identified as a new search entry.
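The two-threshold rule above can be sketched as a small two-tier store. The class and method names are hypothetical. The sketch assumes that both thresholds are measured from the entry's last use, that the second threshold is larger than the first, and that using a cached entry moves it back into the entry library; the source states none of these details.

```python
class EntryStore:
    """Two-tier store: an entry library plus a cache for stale entries."""

    def __init__(self, first_threshold, second_threshold):
        self.first_threshold = first_threshold
        self.second_threshold = second_threshold
        self.library = {}  # entry -> time of last use
        self.cache = {}    # entry -> time of last use before demotion

    def expire(self, now):
        """Apply both thresholds at time `now`."""
        for entry, last_used in list(self.library.items()):
            if now - last_used > self.first_threshold:
                del self.library[entry]
                self.cache[entry] = last_used  # demote to the cache
        for entry, last_used in list(self.cache.items()):
            if now - last_used > self.second_threshold:
                del self.cache[entry]  # from now on it counts as a new entry

    def use(self, entry, now):
        """Record a use and report where the entry was found beforehand."""
        self.expire(now)
        if entry in self.library:
            origin = "library"
        elif entry in self.cache:
            origin = "cache"
            del self.cache[entry]  # assumed: a reused entry is promoted again
        else:
            origin = "new"
        self.library[entry] = now
        return origin

store = EntryStore(first_threshold=10, second_threshold=30)
history = [store.use("knowledge graph", t) for t in (0, 5, 20, 60)]
```

With these thresholds the entry is new at time 0, found in the library at time 5, found in the cache at time 20 (unused for 15), and treated as new again at time 60 (unused for 40).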
As still another embodiment of the present invention, summarizing the targets and relations in the entry in S01 further includes the following steps:
S14, cleaning the plurality of semantic search entries: re-checking and verifying them, deleting duplicate data, and normalizing them;
S15, converting the normalized semantic search entries into TXT format, and generating the training set file and the prediction set file required for machine learning rule training.
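Steps S14 and S15 can be sketched as below. The normalization rules (Unicode NFKC, lowercasing, whitespace collapsing), the file names `train.txt` and `predict.txt`, and the 80/20 split are all assumptions made for illustration; the source specifies only cleaning, deduplication, normalization, and TXT output.

```python
import unicodedata
from pathlib import Path

def normalize(entry):
    """Assumed normalization: Unicode NFKC, lowercase, collapsed whitespace."""
    return " ".join(unicodedata.normalize("NFKC", entry).lower().split())

def clean(entries):
    """S14 stand-in: drop empty entries and duplicates, keeping first-seen order."""
    seen, cleaned = set(), []
    for entry in entries:
        norm = normalize(entry)
        if norm and norm not in seen:
            seen.add(norm)
            cleaned.append(norm)
    return cleaned

def write_sets(entries, out_dir, train_ratio=0.8):
    """S15 stand-in: write train.txt and predict.txt, one entry per line."""
    cut = int(len(entries) * train_ratio)
    train, predict = entries[:cut], entries[cut:]
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "train.txt").write_text("\n".join(train), encoding="utf-8")
    (out / "predict.txt").write_text("\n".join(predict), encoding="utf-8")
    return train, predict

# Hypothetical raw entries: spacing, case, and full-width variants.
raw = ["Knowledge  Graph", "knowledge graph", "   ", "Ｋnowledge Base "]
cleaned = clean(raw)
```

The sketch normalizes before deduplicating so that entries differing only in case, spacing, or character width collapse into one; the source lists deduplication before normalization, so this ordering is a deliberate deviation.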
It should be noted that the machine learning model is based on a graphically constructed Schema of the knowledge graph, and the Schema is draggable.
Secondly, when the machine learning model is trained, the optimal parameters of the model are solved by maximizing a log-likelihood function.
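As an illustration of solving for parameters by maximizing a log-likelihood, the sketch below fits a one-feature logistic model by gradient ascent. The source does not name the model family, so logistic regression, the learning rate, the step count, and the toy data are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(w, b, xs, ys):
    """Bernoulli log-likelihood of labels ys under p = sigmoid(w * x + b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def fit(xs, ys, lr=0.1, steps=500):
    """Gradient ascent on the log-likelihood, starting from w = b = 0."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        residuals = [y - sigmoid(w * x + b) for x, y in zip(xs, ys)]
        w += lr * sum(r * x for r, x in zip(residuals, xs))  # dL/dw
        b += lr * sum(residuals)                             # dL/db
    return w, b

# Hypothetical, deliberately non-separable data so the maximum is finite.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]
w, b = fit(xs, ys)
improved = log_likelihood(w, b, xs, ys) > log_likelihood(0.0, 0.0, xs, ys)
```

Because the logistic log-likelihood is concave, gradient ascent with a sufficiently small step moves toward its single maximum; a production system would more likely rely on an established optimizer.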
Finally, the expert labeling rule is as follows: using the solved optimal parameters of the model, the targets and relations in the entry are classified and identified by a density-based clustering algorithm to obtain a classification result and the position coordinates of each class; clustering is performed according to the classification result and the position coordinates of each class; and the probability mean, standard deviation, and variance of each class are calculated.
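The clustering-and-statistics part of this rule can be sketched with a minimal DBSCAN, one common density-based clustering algorithm; the source does not say which algorithm is used. The `eps` and `min_pts` parameters, the sample coordinates and probabilities, and the use of population (rather than sample) standard deviation and variance are assumptions.

```python
import math
import statistics

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one label per point, with -1 meaning noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, keep expanding
    return labels

def cluster_stats(labels, probs):
    """Mean, population std, and population variance of probabilities per cluster."""
    groups = {}
    for label, p in zip(labels, probs):
        if label != -1:
            groups.setdefault(label, []).append(p)
    return {
        label: {
            "mean": statistics.mean(ps),
            "std": statistics.pstdev(ps),
            "variance": statistics.pvariance(ps),
        }
        for label, ps in groups.items()
    }

# Hypothetical position coordinates and classification probabilities.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
probs = [0.9, 0.8, 0.7, 0.6, 0.6, 0.6, 0.1]
labels = dbscan(points, eps=1.5, min_pts=3)
stats = cluster_stats(labels, probs)
```

In this sample the two tight groups of three points become clusters 0 and 1, and the isolated point is labeled noise and excluded from the statistics.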
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principles and embodiments of the present invention have been described in detail with reference to specific examples, which are provided only to aid understanding of the method and its core ideas. Those skilled in the art may vary the specific embodiments and the scope of application in accordance with the ideas of the present invention; accordingly, this description should not be construed as limiting the present invention.
An embodiment of the present application also provides an electronic device capable of implementing all the steps of the method in the above embodiments. The electronic device specifically comprises:
a processor, a memory, a communications interface, and a bus;
the processor, the memory, and the communications interface communicate with one another through the bus;
the processor is configured to invoke the computer program in the memory, and when executing the computer program, the processor implements all the steps of the method in the above embodiments.
The embodiments of the present application also provide a computer-readable storage medium capable of implementing all the steps of the methods in the above embodiments, the computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements all the steps of the methods in the above embodiments.
In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, the hardware-plus-program embodiments are described relatively briefly because they are substantially similar to the method embodiment; for relevant details, see the corresponding parts of the method embodiment description. Although this specification provides the method steps as described in the embodiments or flowcharts, more or fewer steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is only one of many possible execution orders and is not the only one. When implemented in an actual device or end product, the steps may be executed sequentially or in parallel according to the embodiments or the methods shown in the figures (for example, in a parallel-processor or multi-threaded environment, or even in a distributed data processing environment). The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises a described element is not excluded. For convenience of description, the above devices are described as being functionally divided into various modules.
Of course, when implementing the embodiments of the present disclosure, the functions of the modules may be implemented in one or more pieces of software and/or hardware, or a module that implements a given function may be implemented by a combination of multiple sub-modules or sub-units. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices, or units, and may be electrical, mechanical, or of another form.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the embodiments may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein. In particular, the system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, see the corresponding parts of the method embodiment description. In this specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this specification.
In this specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example. Furthermore, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of those embodiments or examples, provided they do not contradict one another. The foregoing is merely an example of an embodiment of the present disclosure and is not intended to limit the embodiments of the present disclosure. Various modifications and variations of the illustrative embodiments will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the embodiments of this specification should be included in the scope of the claims.
Claims (9)
1. A method for automatically constructing a knowledge base based on a knowledge graph, characterized by comprising the following steps:
S01, inputting an entry, and summarizing the targets and relations in the entry by an expert annotation method to form a target-and-relation indexing rule base;
S02, training a machine learning model according to the expert labeling rules and machine learning rules;
S03, using the trained machine learning model to predictively label and analyze the entry, so as to obtain a knowledge graph of the related targets.
2. The method for automatically constructing a knowledge base based on a knowledge graph according to claim 1, wherein summarizing the targets and relations in the entry in S01 comprises the following steps:
S11, acquiring the character strings of a plurality of completed words in the entry as a plurality of initial entries;
S12, acquiring a plurality of similar entries input within a continuous time period, wherein the character string of each similar entry corresponds one-to-one to an initial entry, and the initial entries are the results of a plurality of consecutive inputs;
S13, analyzing the plurality of initial entries one by one by semantic analysis, and outputting a plurality of semantically similar search entries.
3. The method according to claim 1, wherein, in acquiring the plurality of similar entries input within a continuous time period in S12: if an entry is recorded in the entry library and the time for which it has not been used exceeds a first time threshold, the entry is deleted from the entry library and recorded in the cache;
if an entry is recorded in the cache and the time for which it has not been used exceeds a second time threshold, the entry is identified as a new search entry.
4. The method for automatically constructing a knowledge base based on a knowledge graph according to claim 1, wherein summarizing the targets and relations in the entry in S01 further comprises the following steps:
S14, cleaning the plurality of semantic search entries: re-checking and verifying them, deleting duplicate data, and normalizing them;
S15, converting the normalized semantic search entries into TXT format, and generating the training set file and the prediction set file required for machine learning rule training.
5. The method for automatically constructing a knowledge base based on a knowledge graph according to claim 1, wherein the machine learning model is based on a graphically constructed Schema of the knowledge graph, the Schema being draggable.
6. The method for automatically constructing a knowledge base based on a knowledge graph according to claim 1, wherein, when the machine learning model is trained, the optimal parameters of the model are solved by maximizing a log-likelihood function.
7. The method for automatically constructing a knowledge base based on a knowledge graph according to claim 1, wherein the expert labeling rule is as follows: using the solved optimal parameters of the model, the targets and relations in the entry are classified and identified by a density-based clustering algorithm to obtain a classification result and the position coordinates of each class; clustering is performed according to the classification result and the position coordinates of each class; and the probability mean, standard deviation, and variance of each class are calculated.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the visual localization method based on local variance and posterior probability classifier of any one of claims 1 to 7 when the program is executed by the processor.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the visual localization method based on a local variance and posterior probability classifier as claimed in any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311741242.1A CN117763161A (en) | 2023-12-18 | 2023-12-18 | Knowledge base automatic construction method based on knowledge graph |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311741242.1A CN117763161A (en) | 2023-12-18 | 2023-12-18 | Knowledge base automatic construction method based on knowledge graph |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117763161A true CN117763161A (en) | 2024-03-26 |
Family
ID=90313774
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311741242.1A Pending CN117763161A (en) | 2023-12-18 | 2023-12-18 | Knowledge base automatic construction method based on knowledge graph |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117763161A (en) |
- 2023-12-18 CN CN202311741242.1A patent/CN117763161A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||