CN114328946A - Hidden danger processing method based on knowledge graph - Google Patents

Hidden danger processing method based on knowledge graph


Publication number
CN114328946A
Authority
CN
China
Prior art keywords
model
hidden danger
training
entity
relationship
Legal status
Withdrawn
Application number
CN202111375866.7A
Other languages
Chinese (zh)
Inventor
吴建锋
吴尚明
廖敏杰
秦宏帅
秦会斌
Current Assignee
Hangzhou Pioneer Electronic Technology Co ltd
Original Assignee
Hangzhou Pioneer Electronic Technology Co ltd
Priority date
2021-11-19
Filing date
2021-11-19
Publication date
2022-04-12
Application filed by Hangzhou Pioneer Electronic Technology Co ltd
Priority to CN202111375866.7A
Publication of CN114328946A
Withdrawn (current legal status)

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a knowledge-graph-based hidden danger processing method. The construction side comprises the following steps: S11, acquiring industrial hidden danger data and common questions; S12, preprocessing the data; S13, constructing a knowledge graph; S14, training a named entity recognition model; S15, training a relation extraction model; and S16, training a semantic similarity model. After the construction-side models are trained, the application side comprises the following steps: S21, inputting a question; S22, extracting entities and relations with the named entity recognition model and the relation extraction model, respectively; and S23, using the similarity model to compute the similarity between the input question and the relation attributes of the retrieved triples, and taking the answer of the most similar triple as the answer. The invention introduces industrial hidden danger data into knowledge graph technology and thereby implements a question-answering system for industrial hidden dangers.

Description

Hidden danger processing method based on knowledge graph
Technical Field
The invention belongs to the technical field of hidden danger processing, and relates to a hidden danger processing method based on a knowledge graph.
Background
Each operating company records and reports the potential safety hazards that arise during production. The hazard records collected by the system have accumulated to 12,000 entries, and the hazard management system gains 3,000 new entries every year, so the volume of hidden danger records is large and growing quickly. Many users do not know the details of these hidden dangers and need to look up answers in a knowledge graph. Natural language processing is therefore needed: answers are found by running Cypher queries against a knowledge graph of the hidden danger domain and are then displayed. As a result, intelligent question answering over hidden danger knowledge graphs has become an active topic in the safety field.
Entities and their relations are automatically extracted from the industrial hidden danger records, and the graph is then searched with the corresponding Cypher statements. Neo4j queries graph data with Cypher, a declarative graph query language with a simple syntax and powerful features. Because Neo4j holds a leading position among graph databases, Cypher has become a de facto standard graph query language.
Disclosure of Invention
To solve these problems, the invention provides a knowledge-graph-based hidden danger processing method that converts natural-language questions into Cypher statements that can be queried directly against a graph database. The construction side comprises the following steps:
S11, acquiring industrial hidden danger data and common questions;
S12, preprocessing the data;
S13, constructing a knowledge graph;
S14, training a named entity recognition model;
S15, training a relation extraction model;
S16, training a semantic similarity model;
After the construction-side models are trained, the application side comprises the following steps:
S21, inputting a question;
S22, extracting entities and relations with the named entity recognition model and the relation extraction model, respectively;
S23, using the semantic similarity model to compute the similarity between the input question and the relation attributes of the retrieved triples, and taking the answer of the most similar triple as the answer.
Preferably, the S12 data preprocessing comprises applying sequence labeling and classification labeling to the common questions to generate files in the .ann format, which contain:
entities: each entity is identified by a T tag and has the following attributes:
entity ID: a number that uniquely identifies the entity within the document, starting at 0 and incremented each time a new entity is identified in the same document;
entity type: one of the entity labels;
begin index: the character offset at which the entity starts, counted from 0;
end index: the character offset at which the entity ends, counted from 0;
value: the text marked as a recognizable entity;
relations: each relation is identified by an R tag and has the following attributes:
relation ID: a number that uniquely identifies the relation within the document, starting at 0 and incremented each time a new relation is identified in the same document;
Arg1 and Arg2: the two entities connected by the relation;
relation type: one of the relation labels.
Preferably, the S13 knowledge graph construction comprises importing the triples of the industrial hidden danger data into Neo4j to build the knowledge graph.
Preferably, in the S14 named entity recognition model training, the embedding layer is trained from a pre-trained BERT model, using the open-source chinese_L-12_H-768_A-12 checkpoint as the base model; the decoding layer uses a BiLSTM to predict a label for each token, and a CRF layer constrains the predicted label sequence so that it does not diverge.
Preferably, in the S15 relation extraction model training, the extracted entity pairs are classified with the PCNN (piecewise convolutional neural network) algorithm, which determines from a single sentence which relation holds between the two entities.
Preferably, in the S16 semantic similarity model training, the data set consists of extracted triples paired with corresponding triples, labeled 1 when they match and 0 when they do not; the [CLS] special token is used in the input format, and the model is a pre-trained BERT model.
Preferably, in S23, non-semantic matching: if the relation attribute of a retrieved triple is a substring of the input question (plain string matching), the answer of that triple is taken as the answer;
semantic matching: a pre-trained BERT model computes the similarity between the input question and the relation attributes of the retrieved triples, and the answer of the most similar triple is taken as the answer.
After the hidden danger information is obtained, the manually recorded text is structured: the corresponding entities and relations are extracted and a knowledge graph is built. During this process, missing parts of the hidden danger entities can be predicted and filled in by the representation learning method of the invention.
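The description does not say which representation learning method predicts the missing entities. Purely as an illustration, the sketch below uses TransE, a common knowledge-graph embedding model that treats a relation as a translation (h + r ≈ t) and ranks candidate tails by distance. TransE is an assumed choice, and the entity names, relation name and two-dimensional vectors are hypothetical toy values, not trained embeddings.

```python
import math

# Toy vectors only: in practice these embeddings would be learned from the
# hidden danger graph. TransE is an assumed choice of representation learning.
Vector = list[float]


def transe_distance(head: Vector, relation: Vector, tail: Vector) -> float:
    """L2 distance ||h + r - t||; a smaller value means a more plausible triple."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def predict_missing_tail(head: Vector, relation: Vector,
                         candidates: dict[str, Vector], top_k: int = 1) -> list[str]:
    """Rank candidate tail entities for an incomplete triple (head, relation, ?)."""
    ranked = sorted(candidates,
                    key=lambda name: transe_distance(head, relation, candidates[name]))
    return ranked[:top_k]


if __name__ == "__main__":
    valve_leak = [1.0, 0.0]        # hypothetical head entity
    has_risk_level = [0.0, 1.0]    # hypothetical relation
    levels = {"major": [1.0, 1.0], "general": [-1.0, 0.0]}
    print(predict_missing_tail(valve_leak, has_risk_level, levels))  # ['major']
```

The top-ranked candidate would be proposed as the missing tail entity; a real system would typically also apply a score threshold before writing it back into the graph.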
Compared with the prior art, the invention has at least the following beneficial effects:
1) both semantic and non-semantic matching are fully considered, which greatly speeds up matching in the software;
2) industrial hidden danger data is introduced into knowledge graph technology, yielding a complete question-answering system for industrial hidden dangers.
Drawings
Fig. 1 is a flowchart of the steps of the knowledge-graph-based hidden danger processing method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
On the contrary, the invention is intended to cover alternatives, modifications and equivalents that may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, certain specific details are set forth in the following detailed description to provide a better understanding of the present invention; it will be apparent to one skilled in the art that the present invention may be practiced without these specific details.
Referring to Fig. 1, which is a flowchart of the knowledge-graph-based hidden danger processing method according to the technical solution of the present invention, the construction side comprises the following steps:
S11, acquiring industrial hidden danger data and common questions;
S12, preprocessing the data;
S13, constructing a knowledge graph;
S14, training a named entity recognition model;
S15, training a relation extraction model;
S16, training a semantic similarity model;
After the construction-side models are trained, the application side comprises the following steps:
S21, inputting a question;
S22, extracting entities and relations with the named entity recognition model and the relation extraction model, respectively;
S23, using the semantic similarity model to compute the similarity between the input question and the relation attributes of the retrieved triples, and taking the answer of the most similar triple as the answer.
S12, data preprocessing, comprises applying sequence labeling and classification labeling to the common questions to generate files in the .ann format, which contain:
entities: each entity is identified by a T tag and has the following attributes:
entity ID: a number that uniquely identifies the entity within the document, starting at 0 and incremented each time a new entity is identified in the same document;
entity type: one of the entity labels;
begin index: the character offset at which the entity starts, counted from 0;
end index: the character offset at which the entity ends, counted from 0;
value: the text marked as a recognizable entity;
relations: each relation is identified by an R tag and has the following attributes:
relation ID: a number that uniquely identifies the relation within the document, starting at 0 and incremented each time a new relation is identified in the same document;
Arg1 and Arg2: the two entities connected by the relation;
relation type: one of the relation labels.
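The field list for S12 appears to follow the widely used brat standoff layout for .ann files. A minimal parser sketch, assuming that layout (tab-separated fields, `T…` entity lines and `R…` relation lines with `Arg1:`/`Arg2:` references) and an exclusive end offset as in brat; the class names and the labels `Hazard`, `Attribute` and `has_attribute` used in the example are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Entity:
    entity_id: str    # e.g. "T0"
    entity_type: str  # one of the entity labels
    begin: int        # 0-based character offset
    end: int          # character offset, exclusive (brat convention, assumed here)
    value: str        # the annotated text


@dataclass
class Relation:
    relation_id: str    # e.g. "R0"
    relation_type: str  # one of the relation labels
    arg1: str           # ID of the first entity
    arg2: str           # ID of the second entity


def parse_ann(text: str) -> tuple[dict[str, Entity], dict[str, Relation]]:
    """Parse T (entity) and R (relation) lines of a brat-style .ann file."""
    entities: dict[str, Entity] = {}
    relations: dict[str, Relation] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T"):
            # "T0<TAB>Type begin end<TAB>surface text"
            entity_type, begin, end = fields[1].split()
            entities[ann_id] = Entity(ann_id, entity_type, int(begin), int(end), fields[2])
        elif ann_id.startswith("R"):
            # "R0<TAB>Type Arg1:T0 Arg2:T1"
            relation_type, arg1, arg2 = fields[1].split()
            relations[ann_id] = Relation(ann_id, relation_type,
                                         arg1.split(":", 1)[1], arg2.split(":", 1)[1])
    return entities, relations
```

For the sentence `valve leak risk level`, the lines `T0\tHazard 0 10\tvalve leak`, `T1\tAttribute 11 21\trisk level` and `R0\thas_attribute Arg1:T0 Arg2:T1` yield two entities whose offsets slice back to their values and one relation linking T0 to T1.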
S13, knowledge graph construction: the knowledge graph is built by importing the triples of the industrial hidden danger data into Neo4j.
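A sketch of how the S13 triples might be written to Neo4j with Cypher `MERGE`, so that re-importing a triple does not duplicate nodes or edges. The node label `Entity` and the property `name` are assumptions, not the patent's schema. Because relationship types cannot be passed as ordinary query parameters, the type is backtick-quoted into the statement while entity names stay as parameters:

```python
from typing import Iterable, Iterator

Triple = tuple[str, str, str]  # (head entity, relation type, tail entity)


def quote_identifier(name: str) -> str:
    """Backtick-quote a Cypher identifier; embedded backticks are escaped by doubling."""
    return "`" + name.replace("`", "``") + "`"


def merge_statement(relation_type: str) -> str:
    """MERGE the head node, the tail node and the relationship between them."""
    rel = quote_identifier(relation_type)
    return ("MERGE (h:Entity {name: $head}) "
            "MERGE (t:Entity {name: $tail}) "
            f"MERGE (h)-[:{rel}]->(t)")


def import_statements(triples: Iterable[Triple]) -> Iterator[tuple[str, dict]]:
    """Yield (cypher, parameters) pairs, one per triple."""
    for head, relation, tail in triples:
        yield merge_statement(relation), {"head": head, "tail": tail}
```

With the official Neo4j Python driver (5.x), each pair could then be executed as `driver.execute_query(query, parameters_=params)`.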
In the S14 named entity recognition model training, the embedding layer is trained from a pre-trained BERT model, using the open-source chinese_L-12_H-768_A-12 checkpoint as the base model; the decoding layer uses a BiLSTM to predict a label for each token, and a CRF layer constrains the predicted label sequence so that it does not diverge.
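The CRF layer's job of keeping the S14 output from diverging is usually realized at decoding time: Viterbi search picks the best-scoring label sequence as a whole instead of taking each token's argmax independently. The sketch below omits the BERT and BiLSTM layers and takes their per-token emission scores as given. The BIO tag names, the scores and the hard rule that an `I-` tag must follow a `B-` or `I-` tag of the same type are illustrative assumptions; a trained CRF learns soft transition scores, which can be passed in as `transitions`.

```python
from typing import Optional

NEG_INF = float("-inf")


def allowed(prev_tag: str, tag: str) -> bool:
    """BIO constraint: I-X may only follow B-X or I-X."""
    if tag.startswith("I-"):
        return prev_tag in (f"B-{tag[2:]}", tag)
    return True


def viterbi_decode(emissions: list[dict[str, float]], tags: list[str],
                   transitions: Optional[dict[tuple[str, str], float]] = None) -> list[str]:
    """Return the highest-scoring tag sequence that satisfies the BIO constraint.

    emissions: per-token {tag: score}, e.g. the BiLSTM outputs.
    transitions: optional learned CRF scores {(prev, cur): score}; missing pairs score 0.
    """
    transitions = transitions or {}
    # The sentence start behaves like "O", so no sequence may begin with an I- tag.
    scores = {t: emissions[0][t] if allowed("O", t) else NEG_INF for t in tags}
    backpointers: list[dict[str, Optional[str]]] = []
    for emission in emissions[1:]:
        new_scores: dict[str, float] = {}
        pointers: dict[str, Optional[str]] = {}
        for cur in tags:
            best_prev, best = None, NEG_INF
            for prev in tags:
                if scores[prev] == NEG_INF or not allowed(prev, cur):
                    continue
                candidate = scores[prev] + transitions.get((prev, cur), 0.0)
                if candidate > best:
                    best_prev, best = prev, candidate
            new_scores[cur] = best + emission[cur] if best_prev is not None else NEG_INF
            pointers[cur] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final tag to recover the full path.
    path = [max(tags, key=lambda t: scores[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]
```

For two tokens whose emissions both favor `I-HAZ`, independent argmax would produce the invalid sequence `I-HAZ I-HAZ`, whereas the decoder returns `B-HAZ I-HAZ`.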
In the S15 relation extraction model training, the extracted entity pairs are classified with the PCNN (piecewise convolutional neural network) algorithm, which determines from a single sentence which relation holds between the two entities.
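PCNN differs from a plain CNN mainly in its pooling step: each convolution feature map is split into three segments at the two entity positions, and each segment is max-pooled separately, so the pooled vector keeps coarse information about where features occur relative to the entities. A dependency-free sketch of that forward pass, assuming single-token entity positions, hand-set toy weights and hypothetical relation labels (training is not shown):

```python
import math


def conv1d(embeddings: list[list[float]], kernels: list[list[float]],
           window: int = 3) -> list[list[float]]:
    """Zero-padded 1-D convolution; each kernel is a flattened window*dim weight list."""
    dim = len(embeddings[0])
    pad = window // 2
    padded = [[0.0] * dim] * pad + embeddings + [[0.0] * dim] * pad
    feature_maps = []
    for kernel in kernels:
        fmap = []
        for i in range(len(embeddings)):
            flat = [x for vec in padded[i:i + window] for x in vec]
            fmap.append(sum(w * x for w, x in zip(kernel, flat)))
        feature_maps.append(fmap)
    return feature_maps


def piecewise_max_pool(feature_maps: list[list[float]], e1: int, e2: int) -> list[float]:
    """Split each map at the two entity positions and max-pool the three segments."""
    first, second = sorted((e1, e2))
    pooled = []
    for fmap in feature_maps:
        segments = (fmap[:first + 1], fmap[first + 1:second + 1], fmap[second + 1:])
        pooled.extend(max(seg) if seg else 0.0 for seg in segments)
    return pooled


def classify(pooled: list[float], weights: list[list[float]], labels: list[str]) -> str:
    """tanh non-linearity, one linear score per relation label, then argmax."""
    hidden = [math.tanh(x) for x in pooled]
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in weights]
    return labels[scores.index(max(scores))]
```

With several kernels the pooled vector has three values per kernel, which is what the final classification layer consumes.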
In the S16 semantic similarity model training, the data set consists of extracted triples paired with corresponding triples, labeled 1 when they match and 0 when they do not; the [CLS] special token is used in the input format, and the model is a pre-trained BERT model.
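A sketch of how the 1/0-labeled data set for S16 might be assembled, assuming one positive pair per question (its gold relation attribute) plus randomly sampled non-matching relation attributes as negatives. The literal `[CLS]`/`[SEP]` layout is shown for readability only; a BERT tokenizer called on a text pair (for example a Hugging Face tokenizer called as `tokenizer(question, candidate)`) inserts these special tokens itself. The function names and sample strings are hypothetical.

```python
import random


def build_pair(question: str, candidate: str) -> str:
    """Readable BERT sentence-pair layout: [CLS] question [SEP] candidate [SEP]."""
    return f"[CLS] {question} [SEP] {candidate} [SEP]"


def build_similarity_dataset(samples: list[tuple[str, str]], relation_vocab: list[str],
                             negatives_per_sample: int = 1,
                             seed: int = 0) -> list[tuple[str, int]]:
    """Return (text, label) rows: 1 for the gold relation attribute, 0 for sampled others.

    samples: (question, gold relation attribute) pairs.
    relation_vocab: every relation attribute that occurs in the knowledge graph.
    """
    rng = random.Random(seed)  # fixed seed keeps the negative sampling reproducible
    rows: list[tuple[str, int]] = []
    for question, gold in samples:
        rows.append((build_pair(question, gold), 1))
        others = [r for r in relation_vocab if r != gold]
        for negative in rng.sample(others, min(negatives_per_sample, len(others))):
            rows.append((build_pair(question, negative), 0))
    return rows
```

Raising `negatives_per_sample` makes the classifier see more hard "no match" cases per question, at the cost of a more imbalanced data set.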
In S23, non-semantic matching: if the relation attribute of a retrieved triple is a substring of the input question (plain string matching), the answer of that triple is taken as the answer;
semantic matching: a pre-trained BERT model computes the similarity between the input question and the relation attributes of the retrieved triples, and the answer of the most similar triple is taken as the answer.
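The two matching modes of S23 can be combined as sketched below. The fallback order (string matching first, similarity ranking only when no relation attribute appears in the question) is an assumption, and `char_overlap` is a hypothetical stand-in for the BERT similarity model so that the example runs without model weights; any `similarity(question, relation)` callable returning a score can be passed instead.

```python
from typing import Callable, Optional

Triple = tuple[str, str, str]  # (head entity, relation attribute, answer)


def char_overlap(a: str, b: str) -> float:
    """Stand-in similarity: Jaccard overlap of the character sets of a and b."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def answer_question(question: str, triples: list[Triple],
                    similarity: Callable[[str, str], float] = char_overlap) -> Optional[str]:
    """Return the answer of the best-matching triple, or None if there are no candidates."""
    if not triples:
        return None
    # Non-semantic matching: the relation attribute occurs verbatim in the question.
    for _head, relation, answer in triples:
        if relation in question:
            return answer
    # Semantic matching: rank triples by similarity between question and relation attribute.
    _head, _relation, answer = max(triples, key=lambda t: similarity(question, t[1]))
    return answer
```

For example, with the hypothetical triples (阀门泄漏, 隐患等级, 一般隐患) and (阀门泄漏, 所在位置, 泵房), the question 阀门泄漏的隐患等级是什么 ("what is the hazard level of the valve leak") is answered by string matching, while 阀门泄漏在哪个位置 ("where is the valve leak") contains neither relation string, so the similarity fallback selects 所在位置 ("location") and returns 泵房 ("pump room").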
Entities and their relations are automatically extracted from the industrial hidden danger records, and the graph is then searched with the corresponding Cypher statements. Neo4j queries graph data with Cypher, a declarative graph query language with a simple syntax and powerful features. Because Neo4j holds a leading position among graph databases, Cypher has become a de facto standard graph query language. The invention therefore solves the problems of the prior art by converting natural language into Cypher statements that can be queried directly against a graph database.
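One way to turn the S22 output into a query is a parameterized Cypher template, sketched below under an assumed schema of an `Entity` label with a `name` property (hypothetical, not taken from the patent). When S22 yields a relation, the pattern is restricted to that relationship type; otherwise all outgoing relations of the head entity are returned as candidate triples for the S23 matching step.

```python
from typing import Optional

RETURN_CLAUSE = "RETURN h.name AS head, type(r) AS relation, t.name AS answer"


def build_answer_query(relation: Optional[str] = None) -> str:
    """Cypher that fetches (head, relation, answer) triples for the entity bound to $head."""
    if relation is None:
        pattern = "-[r]->"
    else:
        # Relationship types are backtick-quoted; embedded backticks are doubled.
        pattern = "-[r:`" + relation.replace("`", "``") + "`]->"
    return f"MATCH (h:Entity {{name: $head}}){pattern}(t) {RETURN_CLAUSE}"
```

With the Neo4j Python driver 5.x, `records, _, _ = driver.execute_query(query, parameters_={"head": head_entity})` would return records whose `head`, `relation` and `answer` fields form the candidate triples. Keeping the entity name as a parameter rather than splicing it into the string avoids Cypher injection from user questions.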
The above describes only preferred embodiments of the present invention and is not intended to limit the invention; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (7)

1. A knowledge-graph-based hidden danger processing method, characterized in that the construction side comprises the following steps:
S11, acquiring industrial hidden danger data and common questions;
S12, preprocessing the data;
S13, constructing a knowledge graph;
S14, training a named entity recognition model;
S15, training a relation extraction model;
S16, training a semantic similarity model;
after the construction-side models are trained, the application side comprises the following steps:
S21, inputting a question;
S22, extracting entities and relations with the named entity recognition model and the relation extraction model, respectively;
and S23, using the semantic similarity model to compute the similarity between the input question and the relation attributes of the retrieved triples, and taking the answer of the most similar triple as the answer.
2. The knowledge-graph-based hidden danger processing method according to claim 1, wherein the S12 data preprocessing comprises applying sequence labeling and classification labeling to the common questions to generate files in the .ann format, which contain:
entities: each entity is identified by a T tag and has the following attributes:
entity ID: a number that uniquely identifies the entity within the document, starting at 0 and incremented each time a new entity is identified in the same document;
entity type: one of the entity labels;
begin index: the character offset at which the entity starts, counted from 0;
end index: the character offset at which the entity ends, counted from 0;
value: the text marked as a recognizable entity;
relations: each relation is identified by an R tag and has the following attributes:
relation ID: a number that uniquely identifies the relation within the document, starting at 0 and incremented each time a new relation is identified in the same document;
Arg1 and Arg2: the two entities connected by the relation;
relation type: one of the relation labels.
3. The knowledge-graph-based hidden danger processing method according to claim 2, wherein the S13 knowledge graph construction comprises importing the triples of the industrial hidden danger data into Neo4j to build the knowledge graph.
4. The knowledge-graph-based hidden danger processing method according to claim 3, wherein in the S14 named entity recognition model training, the embedding layer is trained from a pre-trained BERT model using the open-source chinese_L-12_H-768_A-12 checkpoint as the base model, the decoding layer uses a BiLSTM to predict a label for each token, and a CRF layer constrains the predicted label sequence so that it does not diverge.
5. The knowledge-graph-based hidden danger processing method according to claim 4, wherein in the S15 relation extraction model training, the extracted entity pairs are classified with the PCNN algorithm, which determines from a single sentence which relation holds between the two entities.
6. The knowledge-graph-based hidden danger processing method according to claim 5, wherein in the S16 semantic similarity model training, extracted triples paired with corresponding triples are labeled 1 when they match and 0 when they do not, and these labeled pairs form the data set; the [CLS] special token is used in the input format, and the model is a pre-trained BERT model.
7. The knowledge-graph-based hidden danger processing method according to claim 6, wherein in step S23, non-semantic matching: if the relation attribute of a retrieved triple is a substring of the input question (plain string matching), the answer of that triple is taken as the answer;
semantic matching: a pre-trained BERT model computes the similarity between the input question and the relation attributes of the retrieved triples, and the answer of the most similar triple is taken as the answer.
CN202111375866.7A 2021-11-19 2021-11-19 Hidden danger processing method based on knowledge graph Withdrawn CN114328946A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111375866.7A CN114328946A (en) 2021-11-19 2021-11-19 Hidden danger processing method based on knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111375866.7A CN114328946A (en) 2021-11-19 2021-11-19 Hidden danger processing method based on knowledge graph

Publications (1)

Publication Number Publication Date
CN114328946A 2022-04-12

Family

ID=81046166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111375866.7A Withdrawn CN114328946A (en) 2021-11-19 2021-11-19 Hidden danger processing method based on knowledge graph

Country Status (1)

Country Link
CN (1) CN114328946A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114860917A (en) * 2022-07-06 2022-08-05 中化现代农业有限公司 Agricultural knowledge question-answering method, device, electronic equipment and storage medium
CN114860917B (en) * 2022-07-06 2022-10-18 中化现代农业有限公司 Agricultural knowledge question-answering method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN107766483A (en) The interactive answering method and system of a kind of knowledge based collection of illustrative plates
CN110968699A (en) Logic map construction and early warning method and device based on event recommendation
CN113312501A (en) Construction method and device of safety knowledge self-service query system based on knowledge graph
CN110059160A (en) A kind of knowledge base answering method and device based on context end to end
CN111737484A (en) Warning situation knowledge graph construction method based on joint learning
AU2018411565B2 (en) System and methods for generating an enhanced output of relevant content to facilitate content analysis
CN111967761A (en) Monitoring and early warning method and device based on knowledge graph and electronic equipment
CN111026880B (en) Joint learning-based judicial knowledge graph construction method
CN113282729B (en) Knowledge graph-based question and answer method and device
CN110413998B (en) Self-adaptive Chinese word segmentation method oriented to power industry, system and medium thereof
CN113190689B (en) Construction method, device, equipment and medium of electric power safety knowledge graph
CN113822026A (en) Multi-label entity labeling method
CN115292461B (en) Man-machine interaction learning method and system based on voice recognition
CN114153978A (en) Model training method, information extraction method, device, equipment and storage medium
CN112182248A (en) Statistical method for key policy of electricity price
CN110795932A (en) Geological report text information extraction method based on geological ontology
CN115599899A (en) Intelligent question-answering method, system, equipment and medium based on aircraft knowledge graph
CN114328946A (en) Hidden danger processing method based on knowledge graph
CN116842142B (en) Intelligent retrieval system for medical instrument
CN111459973B (en) Case type retrieval method and system based on case situation triple information
CN117216221A (en) Intelligent question-answering system based on knowledge graph and construction method
CN112988704A (en) AI consultation database cluster building method and system
CN112579666A (en) Intelligent question-answering system and method and related equipment
CN111737498A (en) Domain knowledge base establishing method applied to discrete manufacturing production process
CN111191455A (en) Legal provision prediction method in traffic accident damage compensation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20220412)