CN114328946A

CN114328946A - Hidden danger processing method based on knowledge graph

Info

Publication number: CN114328946A
Application number: CN202111375866.7A
Authority: CN
Inventors: 吴建锋; 吴尚明; 廖敏杰; 秦宏帅; 秦会斌
Original assignee: Hangzhou Pioneer Electronic Technology Co ltd
Current assignee: Hangzhou Pioneer Electronic Technology Co ltd
Priority date: 2021-11-19
Filing date: 2021-11-19
Publication date: 2022-04-12

Abstract

The invention discloses a hidden danger processing method based on a knowledge graph. The building end comprises the following steps: S11, acquiring industrial hidden danger data and common questions; S12, preprocessing the data; S13, constructing a knowledge graph; S14, training a named entity recognition model; S15, training a relation extraction model; S16, training a semantic similarity model. After the models of the building end are trained, the implementation end comprises the following steps: S21, inputting a question; S22, extracting entities and relations through the named entity recognition model and the relation extraction model, respectively; and S23, calculating, through the semantic similarity model, the similarity between the input question and the relation attributes of the obtained triples, and taking the answer of the most similar triple as the answer. The invention applies knowledge graph technology to industrial hidden danger data and thereby implements an industrial hidden danger question-answering system.

Description

Hidden danger processing method based on knowledge graph

Technical Field

The invention belongs to the technical field of hidden danger processing, and relates to a hidden danger processing method based on a knowledge graph.

Background

Each operating company fills in and reports the potential safety hazards found during production. The potential safety hazard data collected by the system has accumulated to 12 thousand records, and the hazard management system grows by 3 thousand records every year. The volume of hidden danger records is large and grows quickly. Many people do not know the details of a hidden danger and need to look up the answers in the knowledge graph. A natural language processing technique is therefore needed that finds the answer by running a Cypher query against the hidden danger knowledge graph and then displays it. Intelligent question answering over a hidden danger knowledge graph has thus become a topic of active interest in the safety field.

Entities and their relations are automatically extracted from industrial hidden danger records, and search and query are then carried out through the corresponding Cypher statements. Neo4j queries graph data with Cypher, a declarative graph query language with simple syntax and powerful functionality. Because Neo4j holds a leading position among graph databases, Cypher has become a de facto standard for graph query languages.

Disclosure of Invention

In order to solve the above problems, the invention provides a hidden danger processing method based on a knowledge graph, in which natural language is converted into Cypher statements that can be queried directly against a graph database. The building end comprises the following steps:

S11, acquiring industrial hidden danger data and common questions;

S12, preprocessing the data;

S13, constructing a knowledge graph;

S14, training a named entity recognition model;

S15, training a relation extraction model;

S16, training a semantic similarity model.

After the models of the building end are trained, the implementation end comprises the following steps:

S21, inputting a question;

S22, extracting entities and relations through the named entity recognition model and the relation extraction model, respectively;

S23, calculating, through the semantic similarity model, the similarity between the input question and the relation attributes of the obtained triples, and taking the answer of the most similar triple as the answer.

Preferably, the S12 data preprocessing includes performing sequence labeling and classification labeling on the common questions to generate an .ann format file, which includes:

Entity: each entity is identified by a T tag and has the following attributes:

Entity ID: a unique number identifying the entity in a document, starting at 0 and incrementing each time a new entity is identified in the same document;

Entity Type: one of the entity labels;

Begin Index: the start index of the entity, counted in characters from 0;

End Index: the end index of the entity, counted in characters from 0;

Value: the word set as the recognized object;

Relationship: each relationship is identified by an R tag and has the following attributes:

Relationship ID: a unique number identifying the relationship in a document, starting at 0 and incrementing each time a new relationship is identified in the same document;

Arg1 and Arg2: the two entities associated with the relationship;

Relationship Type: one of the relationship labels.
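
The annotation fields listed above can be read back with a short parser. The following is a minimal sketch, assuming the .ann file uses the tab-separated, brat-style standoff layout (`ID<TAB>Type Begin End<TAB>Value` for entities, `ID<TAB>Type Arg1:Tx Arg2:Ty` for relationships); the exact layout, the class names, and the sample labels `Equipment`, `Hazard` and `has_hazard` are illustrative assumptions, not taken from the source.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    entity_id: str    # e.g. "T0"
    entity_type: str  # one of the entity labels
    begin: int        # start index, counted in characters from 0
    end: int          # end index, counted in characters from 0
    value: str        # the annotated word


@dataclass
class Relation:
    relation_id: str    # e.g. "R0"
    relation_type: str  # one of the relationship labels
    arg1: str           # ID of the first entity
    arg2: str           # ID of the second entity


def parse_ann(text):
    """Parse .ann lines (assumed brat-style layout) into two dicts keyed by ID."""
    entities, relations = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("T"):
            # Assumed layout: ID <TAB> Type Begin End <TAB> Value
            etype, begin, end = fields[1].split(" ")
            entities[fields[0]] = Entity(fields[0], etype, int(begin), int(end), fields[2])
        elif line.startswith("R"):
            # Assumed layout: ID <TAB> Type Arg1:Tx Arg2:Ty
            rtype, a1, a2 = fields[1].split(" ")
            relations[fields[0]] = Relation(
                fields[0], rtype, a1.split(":", 1)[1], a2.split(":", 1)[1]
            )
    return entities, relations


# Hypothetical annotation of the text "boiler pressure ..."
SAMPLE = (
    "T0\tEquipment 0 6\tboiler\n"
    "T1\tHazard 7 15\tpressure\n"
    "R0\thas_hazard Arg1:T0 Arg2:T1\n"
)
entities, relations = parse_ann(SAMPLE)
```

Resolving `relations["R0"].arg1` through `entities` gives the two ends of a triple, which is the form the later steps work with.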

Preferably, the S13 knowledge graph construction includes importing the triple information of the industrial hidden danger data into Neo4j to construct the knowledge graph.
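
One way to picture the import step is to turn each triple into a Cypher `MERGE` statement. The sketch below only builds the statement text and does not connect to a database; the node label `Entity`, the property `name`, the generic relationship type `REL`, and the sample triple are hypothetical choices made for illustration, since the source does not specify a graph schema.

```python
def cypher_literal(value):
    """Quote a Python string as a single-quoted Cypher string literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def triple_to_cypher(head, relation, tail):
    """Build one MERGE statement for a (head, relation, tail) triple.

    MERGE creates the nodes and the edge only if they do not exist yet,
    so importing the same triple twice does not duplicate data.
    The schema (Entity, name, REL) is an assumption of this sketch.
    """
    return (
        f"MERGE (h:Entity {{name: {cypher_literal(head)}}}) "
        f"MERGE (t:Entity {{name: {cypher_literal(tail)}}}) "
        f"MERGE (h)-[:REL {{name: {cypher_literal(relation)}}}]->(t)"
    )


# Hypothetical triple from a hidden danger record
statement = triple_to_cypher("boiler", "has_hazard", "overpressure")
```

In a real import, passing the values as query parameters instead of inlining literals is usually the safer choice; the literal form is used here only to keep the generated statement readable.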

Preferably, in the S14 named entity recognition model training, the embedding layer is a BERT pre-trained model, with the open-source chinese_L-12_H-768_A-12 checkpoint as the base model; the decoding layer uses a BiLSTM to predict a label for each word; and a CRF layer constrains the predicted label sequence so that it does not diverge.
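
The constraining role of the CRF layer can be illustrated with Viterbi decoding over a BIO tag set. This is a simplified sketch, not the trained model: the learned CRF transition scores are reduced to a hard allow/forbid rule, the emission scores are made-up numbers standing in for the BiLSTM output, and the tag names `B-EQP` and `I-EQP` are hypothetical.

```python
NEG_INF = float("-inf")


def allowed(prev_tag, curr_tag):
    """BIO rule: I-X may only follow B-X or I-X of the same type."""
    if curr_tag.startswith("I-"):
        kind = curr_tag[2:]
        return prev_tag in ("B-" + kind, "I-" + kind)
    return True


def viterbi(emissions, tags):
    """Return the best-scoring tag path that respects the BIO rule.

    emissions[t][j] is the score of tags[j] for the token at position t.
    """
    n = len(tags)
    # A sequence may not start with an I- tag.
    score = [
        NEG_INF if tags[j].startswith("I-") else emissions[0][j] for j in range(n)
    ]
    back = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n):
            best_i, best = 0, NEG_INF
            for i in range(n):
                if allowed(tags[i], tags[j]) and score[i] > best:
                    best_i, best = i, score[i]
            new_score.append(best + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    j = max(range(n), key=lambda k: score[k])
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    return [tags[k] for k in reversed(path)]


TAGS = ["O", "B-EQP", "I-EQP"]
# Made-up per-token scores; a per-token argmax would pick I-EQP first,
# which is not a valid start of a BIO sequence.
EMISSIONS = [
    [0.1, 0.2, 0.9],
    [0.2, 0.1, 0.8],
    [0.9, 0.1, 0.3],
]
best_path = viterbi(EMISSIONS, TAGS)  # ['B-EQP', 'I-EQP', 'O']
```

Without the transition rule, taking the highest-scoring tag at each position would give `I-EQP, I-EQP, O`, an entity with no beginning; the constrained decoder returns a well-formed sequence instead.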

Preferably, in the S15 relation extraction model training, the extracted entity pairs are classified by relation with the PCNN algorithm, which determines from a sentence which relation holds between the two entities.
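
Assuming that PCNN here denotes the piecewise convolutional neural network, its distinguishing step is piecewise max pooling: the output of each convolution filter is split into three segments at the positions of the two entities, and each segment is pooled separately. The sketch below shows only that pooling step for a single filter; the feature values and entity positions are made up, and the convolution and the classifier are omitted.

```python
def piecewise_max_pool(features, e1_pos, e2_pos):
    """Max-pool one filter's output over three segments split by two entities.

    features: one float per token (output of a single convolution filter).
    e1_pos, e2_pos: token indices of the two entities, with e1_pos < e2_pos.
    """
    if not 0 <= e1_pos < e2_pos < len(features):
        raise ValueError("expected 0 <= e1_pos < e2_pos < len(features)")
    segments = [
        features[: e1_pos + 1],             # sentence start .. first entity
        features[e1_pos + 1 : e2_pos + 1],  # after first entity .. second entity
        features[e2_pos + 1 :],             # after second entity .. sentence end
    ]
    # An empty last segment (second entity at the sentence end) pools to 0.0
    # by convention in this sketch.
    return [max(segment) if segment else 0.0 for segment in segments]


# Made-up filter output for a six-token sentence with entities at tokens 1 and 3
pooled = piecewise_max_pool([0.1, 0.7, 0.3, 0.9, 0.2, 0.5], e1_pos=1, e2_pos=3)
# pooled == [0.7, 0.9, 0.5]
```

Keeping one value per segment, rather than one value per sentence, preserves where a feature fired relative to the two entities, which is the information a relation classifier needs.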

Preferably, in the S16 semantic similarity model training, the data set consists of pairs of an extracted triple and a corresponding triple, labeled 1 when they match and 0 when they do not; the cls token is used as the separator, and the model is a BERT pre-trained model.
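
The labeled data set described above might be assembled as follows. This is a sketch under assumptions: each training sample is taken to be a question paired with a candidate relation attribute (following the comparison made in S23), the sample questions and relation names are invented, and `to_bert_input` shows the standard BERT sentence-pair layout `[CLS] A [SEP] B [SEP]`, which may differ in detail from the separator handling the source has in mind.

```python
def build_similarity_dataset(samples, all_relations):
    """Build (question, relation_attribute, label) rows.

    samples: list of (question, gold_relation_attribute) pairs.
    all_relations: every relation attribute that can be a candidate.
    label is 1 for the matching attribute and 0 for every other one.
    """
    rows = []
    for question, gold in samples:
        for relation in all_relations:
            rows.append((question, relation, 1 if relation == gold else 0))
    return rows


def to_bert_input(text_a, text_b):
    """Render a sentence pair in the standard BERT layout (as plain text)."""
    return f"[CLS] {text_a} [SEP] {text_b} [SEP]"


# Hypothetical questions and relation attributes
RELATIONS = ["cause", "rectification measure"]
SAMPLES = [
    ("Why did the boiler overpressure occur?", "cause"),
    ("How should the boiler overpressure be fixed?", "rectification measure"),
]
rows = build_similarity_dataset(SAMPLES, RELATIONS)
first_input = to_bert_input(rows[0][0], rows[0][1])
```

Every question contributes one positive row and one negative row per remaining relation attribute, so the classes become unbalanced as the number of relation attributes grows; sampling a subset of negatives is a common adjustment.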

Preferably, S23 comprises non-semantic matching: if the relation attribute of an obtained triple is a substring of the input question, that is, a character string match, the answer of that triple is returned;

and semantic matching: the similarity between the input question and the relation attribute of each obtained triple is calculated with the BERT pre-trained model, and the answer of the most similar triple is taken as the answer.
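
The two matching modes can be combined into one selection routine. The sketch below assumes that string matching is tried first and semantic matching is the fallback, an ordering the source does not state explicitly. The default similarity function is a character-level stand-in from Python's `difflib`, not the BERT model; the triples and the question are invented.

```python
from difflib import SequenceMatcher


def stand_in_similarity(text_a, text_b):
    """Stand-in for the BERT similarity model: character-level ratio in [0, 1]."""
    return SequenceMatcher(None, text_a, text_b).ratio()


def select_answer(question, triples, similarity=stand_in_similarity):
    """Pick an answer from (entity, relation_attribute, answer) triples.

    Non-semantic matching: a relation attribute that occurs verbatim in the
    question wins immediately. Semantic matching: otherwise the triple whose
    relation attribute is most similar to the question is chosen.
    """
    if not triples:
        return None
    for _, relation, answer in triples:
        if relation in question:
            return answer
    best = max(triples, key=lambda triple: similarity(question, triple[1]))
    return best[2]


# Hypothetical triples returned for the entity "boiler"
TRIPLES = [
    ("boiler", "cause", "valve failure"),
    ("boiler", "rectification measure", "replace the safety valve"),
]
answer = select_answer("What is the rectification measure for the boiler?", TRIPLES)
# answer == "replace the safety valve" (string match on "rectification measure")
```

Trying the string match first avoids running the similarity model at all when the question already names the relation attribute, which is consistent with the speed benefit the description attributes to handling both cases.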

After the hidden danger information is obtained, the manually recorded text is structured, the corresponding entities and relations are extracted, and a knowledge graph is built. In this process, missing hidden danger entities can be predicted and supplemented by the representation learning method of the invention.

Compared with the prior art, the invention has at least the following beneficial effects:

1) both semantic matching and non-semantic matching are taken into account, which greatly speeds up matching in the software;

2) knowledge graph technology is applied to industrial hidden danger data, and an industrial hidden danger question-answering system is implemented.

Drawings

Fig. 1 is a flowchart illustrating steps of a hidden danger handling method based on a knowledge graph according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

On the contrary, the invention is intended to cover alternatives, modifications and equivalents which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of the present invention, certain specific details are set forth in order to provide a better understanding of the present invention. It will be apparent to one skilled in the art that the present invention may be practiced without these specific details.

Referring to Fig. 1, which is a flowchart of the hidden danger processing method based on a knowledge graph according to the technical scheme of the invention, the building end includes the following steps:

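S11, acquiring industrial hidden danger data and common questions;

S12, preprocessing the data;

S13, constructing a knowledge graph;

S14, training a named entity recognition model;

S15, training a relation extraction model;

S16, training a semantic similarity model.

After the models of the building end are trained, the implementation end comprises the following steps:

S21, inputting a question;

S22, extracting entities and relations through the named entity recognition model and the relation extraction model, respectively;

S23, calculating, through the semantic similarity model, the similarity between the input question and the relation attributes of the obtained triples, and taking the answer of the most similar triple as the answer.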

In S12, data preprocessing includes performing sequence labeling and classification labeling on the common questions to generate an .ann format file, which includes:

Entity: each entity is identified by a T tag and has a plurality of attributes;

Entity Type: one of the entity labels;

Value: the word set as the recognized object;

Arg1 and Arg2: the two entities associated with a relationship;

Relationship Type: one of the relationship labels.

In S13, knowledge graph construction includes importing the triple information of the industrial hidden danger data into Neo4j.

In the S14 named entity recognition model training, the embedding layer is a BERT pre-trained model, with the open-source chinese_L-12_H-768_A-12 checkpoint as the base model; the decoding layer uses a BiLSTM to predict a label for each word; and a CRF layer constrains the predicted label sequence so that it does not diverge.

In the S15 relation extraction model training, the extracted entity pairs are classified by relation with the PCNN algorithm, which determines from a sentence which relation holds between the two entities.

In the S16 semantic similarity model training, the data set consists of pairs of an extracted triple and a corresponding triple, labeled 1 when they match and 0 when they do not; the cls token is used as the separator, and the model is a BERT pre-trained model.

In S23, non-semantic matching is performed as follows: if the relation attribute of an obtained triple is a substring of the input question, that is, a character string match, the answer of that triple is returned.

Entities and their relations are automatically extracted from industrial hidden danger records, and search and query are then carried out through the corresponding Cypher statements. Neo4j queries graph data with Cypher, a declarative graph query language with simple syntax and powerful functionality. Because Neo4j holds a leading position among graph databases, Cypher has become a de facto standard for graph query languages. The invention therefore solves the problems of the prior art by converting natural language into Cypher statements that can be queried directly against the graph database.
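
The conversion from an extracted entity and relation to a Cypher statement can be sketched as a small query builder. The graph schema used here (node label `Entity`, property `name`, relationship type `REL` carrying the relation name) is a hypothetical one chosen for illustration, as are the function name and the sample values; the function only returns the query text and its parameters and does not contact a database.

```python
def build_answer_query(entity, relation):
    """Return (cypher_text, parameters) for the lookup (entity, relation, ?).

    The values are passed as Cypher parameters ($entity, $relation) rather
    than being pasted into the query text, so user input cannot alter the
    structure of the query. The schema is an assumption of this sketch.
    """
    query = (
        "MATCH (h:Entity {name: $entity})-[r:REL {name: $relation}]->(t:Entity) "
        "RETURN t.name AS answer"
    )
    return query, {"entity": entity, "relation": relation}


# Hypothetical output of the entity recognition and relation extraction steps
query, params = build_answer_query("boiler", "rectification measure")
```

The returned pair is in the shape that Neo4j client libraries generally accept, a query string plus a parameter map, so the same builder can serve every question once the entity and relation have been extracted.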

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

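1. A hidden danger processing method based on a knowledge graph, characterized in that a building end comprises the following steps:

S11, acquiring industrial hidden danger data and common questions;

S12, preprocessing the data;

S13, constructing a knowledge graph;

S14, training a named entity recognition model;

S15, training a relation extraction model;

S16, training a semantic similarity model;

and, after the models of the building end are trained, an implementation end comprises the following steps:

S21, inputting a question;

S22, extracting entities and relations through the named entity recognition model and the relation extraction model, respectively;

S23, calculating, through the semantic similarity model, the similarity between the input question and the relation attributes of the obtained triples, and taking the answer of the most similar triple as the answer.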

2. The knowledge-graph-based hidden danger processing method according to claim 1, wherein the S12 data preprocessing comprises performing sequence labeling and classification labeling on the common questions to generate an .ann format file, which comprises:

Entity: each entity is identified by a T tag and has a plurality of attributes;

Entity Type: one of the entity labels;

Begin Index: the start index of the entity, counted in characters from 0;

Value: the word set as the recognized object;

Arg1 and Arg2: the two entities associated with a relationship;

Relationship Type: one of the relationship labels.

3. The knowledge-graph-based hidden danger processing method according to claim 2, wherein the S13 knowledge graph construction comprises importing the triple information of the industrial hidden danger data into Neo4j to construct the knowledge graph.

4. The knowledge-graph-based hidden danger processing method according to claim 3, wherein in the S14 named entity recognition model training, the embedding layer is a BERT pre-trained model, with the open-source chinese_L-12_H-768_A-12 checkpoint as the base model; the decoding layer uses a BiLSTM to predict a label for each word; and a CRF layer constrains the predicted label sequence so that it does not diverge.

5. The knowledge-graph-based hidden danger processing method according to claim 4, wherein in the S15 relation extraction model training, the extracted entity pairs are classified by relation with the PCNN algorithm, which determines from a sentence which relation holds between the two entities.

6. The knowledge-graph-based hidden danger processing method according to claim 5, wherein in the S16 semantic similarity model training, the data set consists of pairs of an extracted triple and a corresponding triple, labeled 1 when they match and 0 when they do not; the cls token is used as the separator, and the model is a BERT pre-trained model.

7. The knowledge-graph-based hidden danger processing method according to claim 6, wherein S23 comprises non-semantic matching: if the relation attribute of an obtained triple is a substring of the input question, that is, a character string match, the answer of that triple is returned;