CN113312501A

CN113312501A - Construction method and device of safety knowledge self-service query system based on knowledge graph

Info

Publication number: CN113312501A
Application number: CN202110725884.7A
Authority: CN
Inventors: 苏成; 宋建炜; 邓逸川
Original assignee: Sino Singapore International Joint Research Institute
Current assignee: Sino Singapore International Joint Research Institute
Priority date: 2021-06-29
Filing date: 2021-06-29
Publication date: 2021-08-27

Abstract

The invention discloses a construction method of a knowledge-graph-based safety knowledge self-service query system, comprising the following steps: collecting and preprocessing safety knowledge data, and building a corpus for entity extraction and relation extraction; extracting entities with a BERT-BiLSTM-CRF algorithm and relations with a BERT-CNN algorithm, generating triple data, storing the triples in a graph database and constructing a knowledge graph; building a training sample set from sample query sentences and training a naive Bayes classifier; generating Cypher query statements based on natural language processing and the naive Bayes classification algorithm, and constructing a self-service query end; and deploying the knowledge graph and the self-service query end on a mobile intelligent terminal to form the self-service query system. The query system supports self-service queries anytime and anywhere, is not limited by geographical or network conditions, and returns precise answers to questions, avoiding the inefficiency and tedium of consulting construction specifications and searching web pages.

Description

Construction method and device of safety knowledge self-service query system based on knowledge graph

Technical Field

The invention relates to the field of safety management of construction sites, in particular to a construction method and a device of a safety knowledge self-service query system based on a knowledge graph.

Background

Engineering construction is a comprehensive production activity involving many trades, characterized by long construction periods and many uncertain factors during construction. Although China's construction safety situation has improved in recent years, safety accidents still occur from time to time; construction safety cannot be ignored, and the level of construction safety management needs further improvement. Construction safety specifications and construction safety accident reports continue to accumulate, and most of the massive body of construction site safety knowledge is stored in construction safety specification documents and in accident reports on the websites of housing and construction departments. Most of these are paper documents or fragmented electronic information. Traditional construction safety management cannot make full use of such unstructured information, and searching paper construction specifications or traditional web pages is inefficient and tedious, so self-service querying is not possible.

Google formally proposed the concept of the knowledge graph in 2012, originally to improve the capability of its search engine and the search quality and experience of its users. The earliest knowledge graph was the knowledge base Google used to enhance its search engine; the term is now used broadly for a variety of large-scale knowledge bases. In essence, a knowledge graph is a semantic-network knowledge base; put simply, it is a multi-relational graph in which "entities" are the "nodes" and "relations" are the "edges". An entity is something in the real world, and a relation expresses a connection between different entities. The inventors believe that applying knowledge graphs to a construction safety knowledge self-service query system can make efficient and accurate self-service querying possible.

Disclosure of Invention

The invention aims to overcome the defects and shortcomings of the prior art by providing a construction method of a knowledge-graph-based safety knowledge self-service query system, thereby improving practicability. In the civil engineering industry, information is highly fragmented and subject to many variable factors: construction safety knowledge is mainly stored in construction safety specification documents and in accident reports on the websites of housing and construction departments, most of which are paper documents or partial electronic information, and these unstructured documents make querying inefficient. A further aim of the invention is to overcome the same defects and shortcomings by providing a construction device of a knowledge-graph-based safety knowledge self-service query system.

The first purpose of the invention can be achieved by adopting the following technical scheme:

A construction method of a knowledge-graph-based safety knowledge self-service query system, applied to configuring an intelligent mobile terminal, comprises the following steps:

S1, collecting construction safety knowledge data, preprocessing it, and building a corpus for entity extraction and relation extraction by applying data labeling software to the preprocessed safety knowledge data. The labeling tool YEDDA is used to attach entity category labels to the entities in the text, and for the relation extraction corpus, the entities in the text and the relations between them are labeled manually with the corresponding labels. In general, the collected safety data cannot be used directly by subsequent operations, so preprocessing is required. The safety knowledge mainly comes from construction safety specification documents, construction safety accident reports and accident reports on the websites of housing and construction departments; unstructured safety knowledge in paper form is first converted into electronic data and then into TXT text.

S2, calling the BERT-BiLSTM-CRF algorithm to extract construction safety knowledge entities;

Entity extraction, also known as named entity recognition, refers to automatically identifying entities in a corpus. The BERT-BiLSTM-CRF algorithm comprises a BERT model, a BiLSTM model and a CRF model, through which the text to be processed passes in sequence: the BERT model produces the corresponding dynamic word vectors; the BiLSTM module processes these vectors further; the CRF module decodes the BiLSTM output to obtain a predicted label sequence; and finally each entity in the sequence is extracted and classified, completing entity extraction. The text to be processed comes from the corpus, and the specific steps of entity extraction are as follows:

S2.1, the text to be processed passes through the BERT model, which outputs the corresponding dynamic word vectors:

The BERT model is a pre-trained language model based on the Transformer encoder. The Transformer encoder consists of a self-attention part (Self-Attention), an add-and-normalize part (Add & Norm) and a feed-forward neural network part (Feed-Forward Network). Its core self-attention mechanism can be expressed as:

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

where the matrices Q, K and V are the input word vectors; d_k is the dimension of the input vectors; QK^T computes the semantic relation between the input word vectors; and softmax(·) is the normalization function.
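The self-attention formula in this step can be illustrated with a minimal, dependency-free Python sketch of single-head scaled dot-product attention, softmax(QK^T/√d_k)V. It is an illustration under simplifying assumptions, not BERT itself: in BERT, Q, K and V are produced by learned linear projections of the token representations and several heads run in parallel, both of which are omitted here. The helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply an r x k matrix by a k x c matrix (both nested lists)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Swap rows and columns of a nested-list matrix."""
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Normalize scores into probabilities (max subtracted for stability)."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """Return (softmax(Q K^T / sqrt(d_k)) V, attention weights) for one head."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # pairwise relation between words
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]
    return matmul(weights, v), weights


# Three toy 2-dimensional "word vectors", used as Q, K and V at once.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = scaled_dot_product_attention(x, x, x)
```

Each row of the weight matrix sums to 1, so every output row is a weighted average of the rows of V; this is how each word representation comes to carry information from the other words in the sentence.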

S2.2, the dynamic word vectors pass through the BiLSTM model, which outputs the word label score vectors:

The BiLSTM model consists of two long short-term memory (LSTM) models; an LSTM is a recurrent neural network (RNN) with gating units. The word vectors generated by the BERT pre-trained language model are fed into a forward LSTM in left-to-right order and into a backward LSTM in right-to-left order. The three gating units in each LSTM selectively memorize and pass on the information received from the previous node, so the BiLSTM model can output word label score vectors that take information from both directions into account. The LSTM is computed as follows:

$$i_t=\sigma(\omega_i\cdot[h_{t-1},x_t]+b_i)$$
$$f_t=\sigma(\omega_f\cdot[h_{t-1},x_t]+b_f)$$
$$o_t=\sigma(\omega_o\cdot[h_{t-1},x_t]+b_o)$$
$$\tilde{c}_t=\tanh(\omega_c\cdot[h_{t-1},x_t]+b_c)$$
$$c_t=f_t\odot c_{t-1}+i_t\odot\tilde{c}_t$$
$$h_t=o_t\odot\tanh(c_t)$$

where σ is the sigmoid activation function; x is the word embedding vector (x_t at time t); i, f and o denote the input gate, the forget gate and the output gate, respectively; ω denotes the weight matrices of the gates, with ω_i, ω_f, ω_o and ω_c the weight matrices of the input gate, forget gate, output gate and candidate value layer; b denotes the bias vectors, with b_i, b_f, b_o and b_c the bias vectors of the input gate, forget gate, output gate and candidate value layer; h is the output, with h_t the output at time t and h_{t-1} the output at time t-1; \tilde{c}_t is the candidate state that carries the old state over to the new state; c_t is the state of the memory cell at time t; ⊙ denotes element-wise multiplication; and tanh(·) is the hyperbolic tangent function, whose output lies in (-1, 1).
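The LSTM equations of this step, and the forward/backward arrangement of the BiLSTM, can be sketched in plain Python. This is a minimal sketch assuming the common formulation in which each gate's weight matrix acts on the concatenation [h_{t-1}, x_t]; the function names, the dictionary layout of the parameters and the constant toy parameters are hypothetical, and the linear layer that would map the concatenated hidden states to per-label scores is omitted.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(w, vec, b):
    """Compute w . vec + b for a nested-list weight matrix w."""
    return [sum(wij * vj for wij, vj in zip(row, vec)) + bi
            for row, bi in zip(w, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step. params maps "i", "f", "o", "c" to (weights, bias);
    every weight matrix acts on the concatenation [h_{t-1}, x_t]."""
    z = h_prev + x_t  # list concatenation
    (w_i, b_i), (w_f, b_f) = params["i"], params["f"]
    (w_o, b_o), (w_c, b_c) = params["o"], params["c"]
    i = [sigmoid(v) for v in affine(w_i, z, b_i)]        # input gate
    f = [sigmoid(v) for v in affine(w_f, z, b_f)]        # forget gate
    o = [sigmoid(v) for v in affine(w_o, z, b_o)]        # output gate
    c_hat = [math.tanh(v) for v in affine(w_c, z, b_c)]  # candidate value layer
    c_t = [ft * cp + it * ch for ft, cp, it, ch in zip(f, c_prev, i, c_hat)]
    h_t = [ot * math.tanh(ct) for ot, ct in zip(o, c_t)]
    return h_t, c_t


def run_lstm(xs, params, hidden):
    """Run an LSTM over a sequence from zero initial states; return every h_t."""
    h, c = [0.0] * hidden, [0.0] * hidden
    outputs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, params)
        outputs.append(h)
    return outputs


def bilstm(xs, fwd_params, bwd_params, hidden):
    """Concatenate forward (left-to-right) and backward (right-to-left) states."""
    forward = run_lstm(xs, fwd_params, hidden)
    backward = run_lstm(xs[::-1], bwd_params, hidden)[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]


def constant_params(hidden, input_dim, weight=0.1, bias=0.0):
    """Hypothetical toy parameters: every weight and bias set to a constant."""
    matrix = [[weight] * (hidden + input_dim) for _ in range(hidden)]
    return {gate: (matrix, [bias] * hidden) for gate in "ifoc"}


states = bilstm([[1.0, 0.5], [0.2, -0.3], [0.0, 1.0]],
                constant_params(3, 2), constant_params(3, 2), 3)
```

Reversing the backward outputs before concatenation aligns both directions on the same word, so the state at each position reflects context on both sides.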

S2.3, the word label score vectors pass through the CRF model, which outputs the optimal label sequence:

A CRF (conditional random field) is a conditional probability model in which, given a random variable X, the random variable Y forms a Markov random field; that is, given an observed input sequence, it outputs the predicted sequence with the highest probability. In entity extraction, the CRF model takes the constraints between word labels into account: it uses the score of each word label and the transition scores between word labels to compute the probability of each candidate label sequence, and selects the most probable sequence as the label sequence of the text. The model can be expressed as:

$$s(z,y)=\sum_{j=1}^{n}\left(A_{y_{j-1},y_j}+P_{j,y_j}\right)$$

where z and y denote the input sentence and the output label sequence, respectively; P_{j,y_j} is the score of the j-th word being labeled y_j; A_{y_{j-1},y_j} is the transition score from label y_{j-1} to label y_j, with y_0 a start label; and n is the sequence length.

The scores of all possible label sequences are normalized to obtain the required conditional probability:

$$P(y_{true}\mid z)=\frac{e^{s(z,y_{true})}}{\sum_{\tilde{y}\in Y_z}e^{s(z,\tilde{y})}}$$

where y_true is the true label sequence, Y_z is the set of all possible label sequences for z, and P(y | z) is the probability that the predicted label sequence of z is y.

Taking the logarithm of P(y | z) gives the objective function to be optimized; for the correct label sequence the log-likelihood is:

$$\log P(y_{true}\mid z)=s(z,y_{true})-\log\sum_{\tilde{y}\in Y_z}e^{s(z,\tilde{y})}$$

The sequence with the highest score is output as the optimal label sequence:

$$y^{*}=\arg\max_{\tilde{y}\in Y_z}s(z,\tilde{y})$$
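The CRF score, log-likelihood and decoding in this step can be made concrete with a short plain-Python sketch. It assumes the emission scores P come from the BiLSTM layer, that A is a learned label-transition matrix, and that an extra start-transition vector plays the role of the start label y_0; all names are hypothetical. The normalizing sum over all label sequences is computed with the forward algorithm, and the arg max with the Viterbi algorithm, rather than by enumerating sequences.

```python
import math


def sequence_score(emissions, transitions, start, tags):
    """s(z, y) = sum_j (A[y_{j-1}][y_j] + P[j][y_j]), with y_0 the start label.

    emissions[j][t]: score P for word j taking label t (e.g. from a BiLSTM)
    transitions[a][b]: score A for moving from label a to label b
    start[t]: transition score from the start label y_0 to label t
    """
    score = start[tags[0]] + emissions[0][tags[0]]
    for j in range(1, len(tags)):
        score += transitions[tags[j - 1]][tags[j]] + emissions[j][tags[j]]
    return score


def log_sum_exp(values):
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def log_likelihood(emissions, transitions, start, gold):
    """log P(y_true | z) = s(z, y_true) - log sum_y exp(s(z, y)) (forward algorithm)."""
    n_tags = len(start)
    alpha = [start[t] + emissions[0][t] for t in range(n_tags)]
    for j in range(1, len(emissions)):
        alpha = [log_sum_exp([alpha[p] + transitions[p][t] for p in range(n_tags)])
                 + emissions[j][t] for t in range(n_tags)]
    return sequence_score(emissions, transitions, start, gold) - log_sum_exp(alpha)


def viterbi(emissions, transitions, start):
    """y* = argmax_y s(z, y), found by dynamic programming."""
    n_tags = len(start)
    best = [start[t] + emissions[0][t] for t in range(n_tags)]
    back = []
    for j in range(1, len(emissions)):
        # For each label t, the best previous label given the scores so far.
        prev = [max(range(n_tags), key=lambda p: best[p] + transitions[p][t])
                for t in range(n_tags)]
        best = [best[prev[t]] + transitions[prev[t]][t] + emissions[j][t]
                for t in range(n_tags)]
        back.append(prev)
    path = [max(range(n_tags), key=lambda t: best[t])]
    for prev in reversed(back):
        path.append(prev[path[-1]])
    return path[::-1]
```

Both routines run in O(n·T²) time for n words and T labels, whereas enumerating all T^n label sequences directly is only feasible for toy inputs.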

Each entity in the optimal label sequence is then extracted and classified, which completes entity extraction.
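The source does not name its tagging scheme. Assuming the widely used BIO scheme (B- begins an entity, I- continues it, O is outside), extracting and classifying entities from the optimal label sequence can be sketched as follows. The entity types POS (dangerous position) and TOOL (safety tool), the function name, and the assumed Chinese rendering of the example sentence "A protective door should be installed at the elevator shaft opening" are all illustrative.

```python
def extract_entities(chars, tags):
    """Group characters into (entity_text, entity_type) spans from BIO tags.

    An I- tag that does not continue an open entity of the same type is
    treated like O.
    """
    entities, current, current_type = [], [], None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [ch], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(ch)
        else:
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append(("".join(current), current_type))
    return entities


tags = ["B-POS", "I-POS", "I-POS", "I-POS", "O", "O", "O",
        "B-TOOL", "I-TOOL", "I-TOOL"]
found = extract_entities(list("电梯井口应设置防护门"), tags)
# found == [("电梯井口", "POS"), ("防护门", "TOOL")]
# i.e. "elevator shaft opening" and "protective door"
```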

S3, calling a BERT-CNN algorithm to perform construction safety knowledge relation extraction and construction safety knowledge triple extraction:

The BERT-CNN algorithm comprises a BERT model and a CNN model, and the text to be processed passes through them in sequence to extract relations. First, the text is passed through the BERT model to obtain the corresponding dynamic word vectors, together with entity position vectors, where an entity position vector represents the position of an entity in the sentence. The dynamic word vectors and entity position vectors are then input into the CNN model: its convolution layer extracts sentence-level features, its pooling layer compresses them into a feature vector, and finally its fully connected layer classifies the relation expressed by the sentence, thereby realizing entity relation extraction. A CNN is a neural network designed for data with a grid-like structure, such as time-series data, which can be regarded as a one-dimensional grid sampled at regular intervals on the time axis, and image data, which can be regarded as a two-dimensional grid of pixels. The text to be processed comes from the corpus, and the specific steps of relation extraction are as follows:

S3.1, the text to be processed passes through the BERT model, which outputs the corresponding dynamic word vectors;

S3.2, inputting the word vectors and the position vectors representing the entity positions into the CNN model to extract the relation. The index of the first character of an entity in the text is encoded as a vector, which is the position vector of that entity; this is done to improve the accuracy of relation extraction. The convolution operation of the CNN is expressed as:

$$S(k,l)=(I*K)(k,l)=\sum_{p=0}^{m-1}\sum_{q=0}^{n-1}I(k-p,l-q)\,K(p,q)$$

where S(k, l) is the value of the output feature map S at (k, l), i.e., the result of the convolution; K(p, q) is the value of the convolution kernel K at (p, q); I(k-p, l-q) is the value of the input feature map I at (k-p, l-q); and m and n are the width and height of the convolution kernel, respectively.
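The convolution formula of this step, followed by the pooling step mentioned for the BERT-CNN algorithm, can be sketched in plain Python. The sketch assumes the input is a sentence matrix whose rows are token positions and whose columns are features (for example, word-vector components concatenated with position-vector components); max-over-time pooling is an assumption, since the source does not name the pooling type, and the function names are hypothetical. The formula flips the kernel (true convolution), whereas most deep learning libraries compute cross-correlation; for a learned kernel the difference only mirrors the weights.

```python
def conv2d_valid(image, kernel):
    """S(k, l) = sum_{p<m} sum_{q<n} I(k - p, l - q) * K(p, q).

    p runs over the kernel's first axis (size m), q over its second
    (size n); outputs are computed only where every I index is inside
    the input (the "valid" region).
    """
    rows, cols = len(image), len(image[0])
    m, n = len(kernel), len(kernel[0])
    out = []
    for k in range(m - 1, rows):
        row = []
        for l in range(n - 1, cols):
            row.append(sum(image[k - p][l - q] * kernel[p][q]
                           for p in range(m) for q in range(n)))
        out.append(row)
    return out


def max_pool_over_time(feature_map):
    """Take the maximum over the sequence axis for each output column."""
    return [max(col) for col in zip(*feature_map)]


# Toy sentence: 4 tokens x 3 features; the kernel spans 2 tokens and all
# features, so each output row summarizes a 2-token window.
sentence = [[1, 0, 2], [0, 1, 0], [3, 1, 1], [0, 0, 1]]
feature_map = conv2d_valid(sentence, [[1, 1, 1], [1, 1, 1]])  # [[4], [6], [6]]
pooled = max_pool_over_time(feature_map)                      # [6]
```

With one kernel per output feature, the pooled values form the compressed sentence-level feature vector that the fully connected layer would classify.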

S4, storing the triples in a database:

The entity and relation data are processed into triples of the form "entity → relation → entity", and the triples are stored in a database. A knowledge graph is a typical graph-structured knowledge base, and triples are its general form of representation. When facts are expressed as triples such as (entity 1, relation, entity 2) or (entity, attribute, attribute value), a graph database such as Neo4j, FlockDB or GraphDB is a particularly suitable storage medium. Neo4j is currently one of the most widely used graph databases, so the subsequent construction safety knowledge query work is based on Neo4j.
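Storing a single triple in Neo4j can be sketched as building a parameterized Cypher MERGE statement, so that repeated triples do not create duplicate nodes or edges. The node label Entity, the relationship type RELATION and the property key name are hypothetical schema choices; the relation text is stored as a property because Cypher does not accept relationship types as query parameters.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


# Hypothetical schema: (:Entity {name})-[:RELATION {name}]->(:Entity {name}).
MERGE_TRIPLE = (
    "MERGE (h:Entity {name: $head}) "
    "MERGE (t:Entity {name: $tail}) "
    "MERGE (h)-[:RELATION {name: $relation}]->(t)"
)


def to_merge_statement(triple):
    """Return (cypher, parameters) that store one triple idempotently."""
    return MERGE_TRIPLE, dict(triple._asdict())


def dedupe(triples):
    """Drop repeated triples while keeping first-seen order."""
    return list(dict.fromkeys(triples))


example = Triple("elevator shaft opening", "install", "protective door")
query, params = to_merge_statement(example)
```

The query string and parameter dictionary could then be passed to a driver call such as `Session.run` in the official Neo4j Python driver; no database call is made in this sketch.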

S5, repeating the steps S2 to S4 to complete the construction of the knowledge graph:

Processing the texts in the corpus yields a large number of knowledge triples, which are imported into the Neo4j database with the LOAD CSV command, completing the construction of the construction safety knowledge graph.
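A bulk import with LOAD CSV, as used in this step, can be sketched by writing the triples to a CSV file with a header row and loading that file with a Cypher statement. The file name triples.csv, the column names head, relation and tail, and the graph schema are assumptions; under the default configuration Neo4j resolves file:/// URLs relative to its import directory.

```python
import csv
import io

# Hypothetical file name, column names and schema.
LOAD_TRIPLES = """
LOAD CSV WITH HEADERS FROM 'file:///triples.csv' AS row
MERGE (h:Entity {name: row.head})
MERGE (t:Entity {name: row.tail})
MERGE (h)-[:RELATION {name: row.relation}]->(t)
""".strip()


def triples_to_csv(triples):
    """Serialize (head, relation, tail) tuples as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["head", "relation", "tail"])
    writer.writerows(triples)
    return buffer.getvalue()


text = triples_to_csv([("elevator shaft opening", "install", "protective door")])
# Saving `text` as triples.csv in Neo4j's import directory and running
# LOAD_TRIPLES would create both nodes and the edge between them.
```

MERGE rather than CREATE keeps the import idempotent when the same triple is extracted from several documents.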

S6, constructing a self-service safety knowledge query end based on natural language processing and a naive Bayes classification algorithm, as follows:

S6.1, constructing a training sample set. The training sample set comprises question sample sentences of different categories and their classification labels; the question sample sentences are questions about construction safety knowledge or construction safety accidents that a user may ask, such as "What safety protection tools should be installed at an elevator shaft opening?", "How many construction safety accidents occurred in Guangdong Province in 2018?" and "How many casualties were caused by the # safety accident?".

The training sample set covers many categories, and each category should include as many phrasings of its questions as possible; for example, a question about casualties may also be phrased as "How many people died in the accident?" or "How many workers died in the accident?". First, the number of possible question categories is determined and each category is labeled: the first category is labeled 0, the second 1, and so on. Then all possible phrasings of each question category are determined and collected into the training sample set for that category.

S6.2, training a naive Bayes classifier, comprising the following steps:

Vectorizing the sample sentences in the training sample set to obtain sample sentence vectors;

Converting the sample sentence vectors into a resilient distributed dataset (RDD) that can be operated on in parallel;

Training a classifier with the naive Bayes classification algorithm on the RDD. Naive Bayes originates from classical probability theory, has a solid mathematical foundation and offers stable classification efficiency. The classifier is trained on the keywords and labels of the question sentences in the RDD; the trained classifier then classifies a question sentence according to its keywords and outputs the label of the corresponding question category.
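The source trains its classifier on an RDD, which points to a distributed framework such as Apache Spark; the arithmetic of the classifier itself can be shown in plain Python. The sketch below is a Bernoulli naive Bayes with Laplace smoothing over binary keyword vectors, one reasonable variant among several (the source does not say which naive Bayes variant it uses). The function names, the smoothing parameter alpha and the toy data are assumptions.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(vectors, labels, alpha=1.0):
    """Estimate log P(c) and P(x_i = 1 | c) for each class c.

    Laplace smoothing (alpha) keeps every probability strictly inside (0, 1).
    """
    dim = len(vectors[0])
    counts = defaultdict(int)
    ones = {}
    for vec, c in zip(vectors, labels):
        counts[c] += 1
        acc = ones.setdefault(c, [0] * dim)
        for i, v in enumerate(vec):
            acc[i] += v
    total = len(labels)
    model = {}
    for c, n_c in counts.items():
        probs = [(ones[c][i] + alpha) / (n_c + 2 * alpha) for i in range(dim)]
        model[c] = (math.log(n_c / total), probs)
    return model


def predict(model, vec):
    """Return the label maximizing log P(c) + sum_i log P(x_i | c)."""
    def log_posterior(c):
        log_prior, probs = model[c]
        return log_prior + sum(math.log(p if x else 1.0 - p)
                               for x, p in zip(vec, probs))
    return max(model, key=log_posterior)


# Toy binary keyword vectors: label 0 uses keywords 0-1, label 10 uses 2-3.
vectors = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0]]
labels = [0, 0, 10, 10]
model = train_bernoulli_nb(vectors, labels)
```

Working in log space avoids numerical underflow when the vocabulary, and therefore the number of multiplied probabilities, is large.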

S6.3, constructing a query template:

Segmenting the query sentence input by the user into words with the natural language processing toolkit HanLP;

Converting the segmented query sentence into a query sentence vector;

Inputting the query sentence vector into the trained naive Bayes classifier and outputting the label corresponding to the query sentence;

Constructing a Cypher query statement template based on the label.

S6.4, generating a Cypher query statement:

The entities in the query sentence are extracted as query conditions and filled into the query template in the format it provides, generating the Cypher query statement.
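Filling a query template can be sketched as a lookup from the classifier's label to a Cypher template, with the extracted entity bound as a query parameter. The label-to-template mapping, the graph schema inside the templates, the assignment of the example question to label 10 and the naive dictionary-based entity matcher are all hypothetical. Passing the entity as a parameter rather than splicing it into the string is a design choice that avoids quoting problems and Cypher injection.

```python
# Hypothetical label -> Cypher template mapping; labels follow the
# question-category numbering, and the graph schema is an assumption.
CYPHER_TEMPLATES = {
    2: "MATCH (a:Entity {name: $entity})-[:RELATION {name: 'casualties'}]->(x) "
       "RETURN x.name",
    10: "MATCH (p:Entity {name: $entity})-[:RELATION {name: 'install'}]->(x) "
        "RETURN x.name",
}


def find_entities(question, known_entities):
    """Naive dictionary matching: known entity names found in the question,
    longest first so that 'elevator shaft opening' wins over 'shaft'."""
    found = [e for e in known_entities if e in question]
    return sorted(found, key=len, reverse=True)


def build_query(label, entities):
    """Fill the template for `label` with the first (longest) entity.

    Returns (cypher, parameters), or None if no template or entity exists.
    """
    template = CYPHER_TEMPLATES.get(label)
    if template is None or not entities:
        return None
    return template, {"entity": entities[0]}


question = "What should be installed at the elevator shaft opening?"
entities = find_entities(question, ["elevator shaft opening", "protective door", "shaft"])
query = build_query(10, entities)
```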

S7, constructing the knowledge-graph-based safety knowledge self-service query system:

The safety knowledge graph and the self-service query end are deployed on a mobile intelligent terminal to form the self-service query system. The system comprises the construction safety knowledge graph, which stores construction safety knowledge as triples, and the self-service query end, which queries the knowledge graph according to the query sentence input by the user and returns an accurate query result.

Further, the safety knowledge in step S1 comes from construction safety specification documents and construction safety accident reports, and paper data are digitized before use.

Further, the entity labeling in step S1 may also use other tools, such as Stanza from the Stanford NLP Group, PaddleLAC from Baidu, or LTP from Harbin Institute of Technology.

Further, in the step S4, a graph database Neo4j is used to store the triple knowledge.

The other purpose of the invention can be achieved by adopting the following technical scheme:

A construction device of a knowledge-graph-based safety knowledge self-service query system, configured for a mobile terminal, comprises:

a corpus construction module, configured to collect construction safety knowledge data and to build a corpus for entity extraction and relation extraction by applying data labeling software to the safety knowledge data;

an entity extraction module, configured to call the BERT-BiLSTM-CRF algorithm to extract construction safety knowledge entities;

a relation extraction module, configured to call the BERT-CNN algorithm to extract construction safety knowledge relations;

a triple data generation module, configured to process the entity and relation data into triples of the form entity → relation → entity and to store the triples in the database;

a knowledge graph construction module, configured to complete the construction of the safety knowledge graph by repeatedly calling the entity extraction module, the relation extraction module and the triple data generation module;

a self-service query end construction module, configured to generate Cypher query statements based on natural language processing and a naive Bayes classification algorithm and to build the safety knowledge self-service query end;

and a self-service query system construction module, configured to deploy the safety knowledge graph and the self-service query end on a mobile intelligent terminal to form the self-service query system.

Compared with the prior art, the invention has the following advantages and effects:

The invention provides a construction method of a knowledge-graph-based safety knowledge self-service query system. With the self-service query system, including the safety knowledge graph constructed by the invention, deployed on an intelligent mobile terminal, construction site managers and workers can query construction safety knowledge anytime and anywhere, without being limited by the site's geographical location or network conditions and without consulting large volumes of paper documents in an archive room; the required safety knowledge can be queried efficiently even in remote mountainous areas without network access. This can raise the safety management level of site managers and workers, reduce the safety accident rate, and contribute to improving construction safety management across the building industry. In addition, unlike traditional methods that return web pages, the self-service query system returns the precise answer to the corresponding question, avoiding the inefficiency and tedium of consulting construction specifications and searching web pages.

Drawings

FIG. 1 is a flow chart of the construction method of the knowledge-graph-based safety knowledge self-service query system disclosed by the invention;

FIG. 2 is a schematic diagram of the query flow when the safety knowledge self-service query system is applied according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a query example when the safety knowledge self-service query system is applied according to an embodiment of the present invention;

FIG. 4 is a flow chart of the prior-art BERT-BiLSTM-CRF algorithm.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Example one

The knowledge-graph-based safety knowledge self-service query system comprises a safety knowledge graph and a safety knowledge self-service query end. The safety knowledge graph stores construction safety knowledge as triples, and the self-service query end queries the safety knowledge graph according to the query sentence input by the user and returns an accurate query result.

As shown in FIG. 1, a construction method of a knowledge-graph-based construction safety knowledge self-service query system includes the following steps:

S1, constructing a safety knowledge corpus:

Construction safety knowledge data, such as the Technical Code for Safety of Working at Height of Building Construction (JGJ80-2016), are collected and converted to text. The data are then labeled with data labeling software to build a construction safety knowledge corpus for entity extraction and relation extraction, in which the entity categories and the relations between entities are labeled. In this embodiment, the labeling tool YEDDA is used to label the entities in the text, and for the relation extraction corpus, the entities in the text and the relations between them are labeled manually with the corresponding labels.

S2, extracting entities based on the BERT-BiLSTM-CRF algorithm:

As shown in FIG. 4, the BERT-BiLSTM-CRF algorithm comprises a BERT model, a BiLSTM model and a CRF model; the texts to be processed from the construction safety knowledge corpus pass through the three models in sequence to realize entity extraction.

Taking the text to be processed "A protective door should be installed at the elevator shaft opening" as an example, the following operations are performed:

S2.1, the text to be processed passes through the BERT model, which outputs the corresponding dynamic word vectors:

The BERT model is a pre-trained language model based on the Transformer encoder, whose core self-attention mechanism can be expressed as:

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

where the matrices Q, K and V are the input word vectors and d_k is the dimension of the input vectors. Dividing by √d_k gives more stable gradients. QK^T computes the semantic relations between the input word vectors; after normalization by softmax(·), the result is multiplied by V, which suppresses irrelevant words while preserving the important ones. Each word representation then contains information from the other words in the sentence. In this embodiment the dynamic word vector is 256-dimensional and is more global than conventional word vector representations.

S2.2, the dynamic word vectors pass through the BiLSTM model, which outputs the word label score vectors:

The BiLSTM model consists of two LSTM models; an LSTM is a recurrent neural network (RNN) with gating units. The LSTM network is computed as follows:

$$i_t=\sigma(\omega_i\cdot[h_{t-1},x_t]+b_i)$$
$$f_t=\sigma(\omega_f\cdot[h_{t-1},x_t]+b_f)$$
$$o_t=\sigma(\omega_o\cdot[h_{t-1},x_t]+b_o)$$
$$\tilde{c}_t=\tanh(\omega_c\cdot[h_{t-1},x_t]+b_c)$$
$$c_t=f_t\odot c_{t-1}+i_t\odot\tilde{c}_t$$
$$h_t=o_t\odot\tanh(c_t)$$

where σ is the sigmoid activation function; x is the word embedding vector; i, f and o denote the input gate, the forget gate and the output gate, respectively; ω denotes the weight matrices of the gates, with ω_i, ω_f, ω_o and ω_c those of the input gate, forget gate, output gate and candidate value layer; b_i, b_f, b_o and b_c are the corresponding bias vectors; \tilde{c}_t is the candidate state that carries the old state over to the new state; c_t is the state of the memory cell at time t; ⊙ denotes element-wise multiplication; and tanh(·) is the hyperbolic tangent function, which keeps the output between -1 and 1.

The word vectors generated by the BERT model are fed into a forward LSTM in left-to-right order and into a backward LSTM in right-to-left order. The three gating units in each LSTM selectively memorize and pass on the information received from the previous node, so the BiLSTM model can output word label score vectors that take information from both directions into account.

S2.3, the word label score vectors pass through the CRF model, which outputs the optimal label sequence of the text:

The CRF model can be expressed as:

$$s(z,y)=\sum_{j=1}^{n}\left(A_{y_{j-1},y_j}+P_{j,y_j}\right)$$

where z and y denote the input sentence and the output label sequence, respectively; P_{j,y_j} is the score of the j-th word being labeled y_j; A_{y_{j-1},y_j} is the transition score from label y_{j-1} to label y_j; and n is the sequence length.

$$P(y_{true}\mid z)=\frac{e^{s(z,y_{true})}}{\sum_{\tilde{y}\in Y_z}e^{s(z,\tilde{y})}}$$

where y_true is the true label sequence and P(y | z) is the probability that the predicted label sequence of z is y.

Taking the logarithm of P(y | z), the log-likelihood is computed as follows:

$$\log P(y_{true}\mid z)=s(z,y_{true})-\log\sum_{\tilde{y}\in Y_z}e^{s(z,\tilde{y})}$$

and the optimal label sequence is

$$y^{*}=\arg\max_{\tilde{y}\in Y_z}s(z,\tilde{y})$$

In entity extraction, the CRF model takes the constraints between word labels into account: it uses the score of each word label and the transition scores between word labels to compute the probability of each candidate label sequence, and selects the most probable sequence as the label sequence of the text. From the text to be processed "A protective door should be installed at the elevator shaft opening", the entities "elevator shaft opening" and "protective door" are extracted.

S3, extracting entity relations based on the BERT-CNN algorithm:

The text to be processed first passes through the BERT model, which outputs the corresponding dynamic word vectors; the word vectors and the position vectors representing the entity positions are then input into the convolutional neural network (CNN) to extract the entity relation. The steps are as follows:

S3.1, the text to be processed passes through the BERT model, which outputs the corresponding dynamic word vectors.

S3.2, the word vectors and the position vectors representing the entity positions are input into the CNN, whose convolution operation is expressed as:

$$S(k,l)=(I*K)(k,l)=\sum_{p=0}^{m-1}\sum_{q=0}^{n-1}I(k-p,l-q)\,K(p,q)$$

Through these operations, the relation "install" is extracted from the text to be processed "A protective door should be installed at the elevator shaft opening".

S4, forming a triple from the extracted entities and relation. For the text to be processed, the triple is elevator shaft opening → install → protective door, and it is stored in the graph database Neo4j.

S5, repeating the steps S2 to S4 to complete the construction of the knowledge graph;

S6.1, constructing the training sample set:

According to the characteristics of the knowledge graph data in the construction safety field, a training sample set is constructed, comprising sample sentences and their classification labels; the set of user query questions should be as comprehensive as possible, and each category of question in the data set is labeled. A sample sentence is a question about construction safety knowledge or construction safety accidents that a user may ask, such as "What safety protection tools should be installed at the elevator shaft opening?", "How many construction safety accidents occurred in Guangdong Province in 2018?" and "How many casualties were caused by the # safety accident?".

The training sample set covers many categories, and each category should include as many phrasings of its questions as possible; for example, a question about casualties may also be phrased as "How many people died in the accident?" or "How many workers died in the accident?". First, the number of possible question categories is determined and each category is labeled: the first category is labeled 0, the second 1, and so on. According to the content of the corpus, the invention defines 12 question categories, labeled 0 to 11, whose keywords are: 0: accident time; 1: accident location; 2: casualties; 3: economic loss of the accident; 4: accident cause; 5: accident handling; 6: development (owner) unit of the project where the accident occurred; 7: construction (contractor) unit of the project; 8: supervision unit of the project; 9: accident type; 10: construction safety tools; 11: dangerous positions where safety accidents are likely to occur. Then all possible phrasings of each question category are determined and collected into that category's training sample set, giving 12 category training sample sets in total, and the naive Bayes classifier is trained on this combined training sample set.

S6.2, segmenting the sample sentences in the sample set into words, converting them into vectors, and training a naive Bayes classifier:

The sample sentences in the training sample set are natural-language sentences that a computer cannot read directly, so they must be converted into sample sentence vectors. The conversion works as follows. First, a vocabulary data set is constructed that contains the feature words of all question sentences. Then, for each sample sentence in the training set, its feature words are extracted after word segmentation and compared with the vocabulary data set: each vocabulary position is set to 1 if that word appears in the sentence and to 0 otherwise. All sentences in the training set are converted into vectors in this way and used to train the naive Bayes classifier.
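The vectorization just described is a set-of-words encoding against a fixed vocabulary. The following is a minimal sketch that assumes sentences have already been segmented into word lists. Here they are written by hand in English as a stand-in for HanLP output, and the function names are hypothetical.

```python
def build_vocabulary(segmented_sentences):
    """Collect every distinct feature word into a fixed, sorted vocabulary."""
    return sorted({word for words in segmented_sentences for word in words})


def to_binary_vector(words, vocabulary):
    """Set-of-words encoding: 1 where a vocabulary word occurs in the sentence, else 0."""
    present = set(words)
    return [1 if word in present else 0 for word in vocabulary]


# Pre-segmented token lists stand in for HanLP output here.
segmented = [
    ["what", "tool", "set", "elevator shaft mouth"],
    ["how many", "died", "accident"],
]
vocabulary = build_vocabulary(segmented)
print(vocabulary)
# ['accident', 'died', 'elevator shaft mouth', 'how many', 'set', 'tool', 'what']
print(to_binary_vector(segmented[0], vocabulary))  # [0, 0, 1, 0, 1, 1, 1]
print(to_binary_vector(["accident", "cause"], vocabulary))  # [1, 0, 0, 0, 0, 0, 0]
```

Words outside the vocabulary (such as "cause" above) are silently dropped. For this reason, the vocabulary should be built from the full training set before any sentence is vectorized.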

The training steps are as follows:

A classifier is trained with the naive Bayes classification algorithm on the training data set, which is held as an RDD (resilient distributed dataset).
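The text trains the classifier on a Spark RDD, and Spark MLlib ships its own naive Bayes implementation. The following is a plain-Python sketch of the same idea, written as a Bernoulli naive Bayes with Laplace smoothing. The Bernoulli variant is an assumption, chosen because the sentence vectors above are binary. The function names and the toy vectors are hypothetical.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(vectors, labels, alpha=1.0):
    """Estimate log priors and smoothed P(word present | class) from binary vectors."""
    n_features = len(vectors[0])
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: [0] * n_features)
    for vec, label in zip(vectors, labels):
        class_counts[label] += 1
        counts = feature_counts[label]
        for j, x in enumerate(vec):
            counts[j] += x
    total = len(labels)
    model = {}
    for label, n_c in class_counts.items():
        # Laplace smoothing keeps every probability strictly between 0 and 1.
        probs = [(feature_counts[label][j] + alpha) / (n_c + 2 * alpha)
                 for j in range(n_features)]
        model[label] = (math.log(n_c / total), probs)
    return model


def predict(model, vec):
    """Return the label with the highest log posterior for one binary vector."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, probs) in model.items():
        score = log_prior
        for x, p in zip(vec, probs):
            score += math.log(p) if x else math.log(1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy vectors over a 4-word vocabulary; label 2 = casualties, 10 = safety tool.
vectors = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0]]
labels = [2, 2, 10, 10]
model = train_bernoulli_nb(vectors, labels)
print(predict(model, [0, 0, 1, 1]))  # 10
```

Working in log space avoids underflow when the vocabulary grows to thousands of words.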

S6.3, constructing a query template:

In this embodiment, the query sentence entered by the user is first segmented into words with the natural language processing toolkit HanLP. The segmented sentence is then converted into a query sentence vector, and this vector is input into the trained naive Bayes classifier, which outputs the label of the query sentence. Finally, a Cypher query statement template is obtained by automatically matching the label.

Suppose the user enters the question "What should be set at the elevator shaft mouth?". Through the above operations, the query template constructed is "What should be set at a certain position?".

S6.4, generating Cypher query sentences based on the query template:

Once the query template is available, a Cypher statement that can be executed in Neo4j is generated, so that the required construction safety knowledge can be queried in the Neo4j database. The entity in the query sentence is extracted as the query condition and filled into the query template in the format the template specifies, which produces a complete Cypher query statement. The flow of generating the query statement is shown in FIG. 3.
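The description does not say how the entity is extracted from the query sentence. One simple possibility, shown here purely as an assumption, is dictionary matching against entity names already stored in the knowledge graph. The sketch below uses forward maximum matching, so that "elevator shaft mouth" wins over the shorter "elevator shaft". The function name and the name set are hypothetical.

```python
def extract_entities(query, entity_names):
    """Forward maximum matching against a set of known entity names.

    At each position the longest known name starting there is taken; if
    none matches, the scan advances by one character.
    """
    if not entity_names:
        return []
    max_len = max(len(name) for name in entity_names)
    found, i = [], 0
    while i < len(query):
        for length in range(min(max_len, len(query) - i), 0, -1):
            candidate = query[i:i + length]
            if candidate in entity_names:
                found.append(candidate)
                i += length
                break
        else:
            i += 1  # no known name starts at this position
    return found


names = {"elevator shaft", "elevator shaft mouth", "safety door"}
print(extract_entities("What should be set at the elevator shaft mouth?", names))
# ['elevator shaft mouth']
```

Character-level matching suits Chinese text, which has no spaces between words. For English queries, a real implementation would also need to check word boundaries.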

Suppose the user enters the question "What should be set at the elevator shaft mouth?". In this embodiment, the Cypher query template obtained by matching the input question is "MATCH (p:Position)-[r:set]->(t:Tool) WHERE p.title = '' RETURN t.name", with the position value left blank. The related entity "elevator shaft mouth" is then extracted from the question as the query condition and filled into the template, giving the complete Cypher query statement "MATCH (p:Position)-[r:set]->(t:Tool) WHERE p.title = 'elevator shaft mouth' RETURN t.name". Finally, the Cypher query statement is executed, and the self-service query system returns the answer "safety door".
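The label-to-template lookup and the filling step can be kept as a small registry keyed by classifier label. The following is a minimal sketch under the assumption that each template carries one named parameter. The registry contents, the `$entity` parameter name and `build_query` are hypothetical. The entity is passed as a Cypher parameter rather than spliced into the string as in the worked example.

```python
# Hypothetical registry: classifier label -> Cypher template.
# Label 10 is the "construction safety tool" category; the entity is passed
# as the query parameter $entity instead of being spliced into the string.
CYPHER_TEMPLATES = {
    10: "MATCH (p:Position)-[r:set]->(t:Tool) WHERE p.title = $entity RETURN t.name",
}


def build_query(label, entity):
    """Return (cypher, parameters) for a predicted label and extracted entity."""
    try:
        template = CYPHER_TEMPLATES[label]
    except KeyError:
        raise KeyError(f"no Cypher template registered for label {label}") from None
    return template, {"entity": entity}


cypher, params = build_query(10, "elevator shaft mouth")
print(cypher)
print(params)  # {'entity': 'elevator shaft mouth'}
```

With the official Neo4j Python driver, the pair can be passed as `session.run(cypher, params)`. Using parameters avoids quoting problems when an entity name contains an apostrophe, and it lets Neo4j reuse the cached query plan across queries.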

The safety knowledge self-service query system comprises a knowledge graph and a self-service query terminal. Construction safety knowledge is stored in the knowledge graph as triples, and the self-service query terminal queries the knowledge graph according to the query statement entered by the user and returns an accurate result.

The self-service query system constructed by the invention is configured on intelligent mobile terminals such as smartphones and tablets. With it, construction safety knowledge can be queried in a self-service manner at any time and place. The system is not limited by the geographic location or network conditions of the construction site, and it removes the need to consult large volumes of paper documents in a records room. The required safety knowledge can be queried efficiently even in remote mountainous areas without network access. Unlike traditional query methods that return web pages, the system returns the answer directly.

To aid understanding of the knowledge-graph-based safety knowledge self-service query system provided by the invention, the query process of the system is described below with reference to FIG. 2. This query process is described only to give an intuitive understanding of the construction method provided by the invention and does not limit the invention.

As shown in FIG. 2, to perform a safety knowledge query, the user enters a question on the mobile terminal; for convenience of description, the entered question is referred to as the query statement. After receiving the query statement, the self-service query system processes it in a series of steps and finally outputs the query result. First, the natural-language input is segmented into words and vectorized with the HanLP natural language processing toolkit, and the trained naive Bayes classifier classifies the query statement. Next, the query target of the statement is determined from the classification result and matched with a query template. The relevant entities in the query statement are then extracted as query conditions and filled into the template to form a complete Cypher query statement, which is used to query the required construction safety knowledge in the Neo4j database. Finally, the self-service query system returns the answer to the query statement according to the Cypher query statement.

In particular, a knowledge graph is a typical graph-structured knowledge base, so a graph database is especially suitable as its storage medium. Neo4j is currently the most widely used graph database software, so this embodiment is developed on the Neo4j database. Other graph databases, such as FlockDB and GraphDB, may be used as the storage medium in follow-up research. In addition, user queries in this embodiment are entered as text; voice input will be adopted in follow-up research to improve the user experience.

Example two

A construction device for a knowledge-graph-based safety knowledge self-service query system is configured on a mobile terminal. It comprises a corpus construction module, an entity extraction module, a relation extraction module, a triple data generation module, a knowledge graph construction module, a self-service query terminal construction module and a self-service query system construction module. The functions of the modules are as follows:

The safety knowledge self-service query system constructed by the invention comprises a knowledge graph and a self-service query terminal. The knowledge graph stores construction safety knowledge as triples, and the self-service query terminal queries the knowledge graph according to the query statement entered by the user and returns an accurate result. To construct the knowledge graph, the corpus construction module first processes the safety knowledge data and builds a corpus for entity extraction and relation extraction. The entity extraction module is then called to extract entities, and the relation extraction module is called to extract relations. Next, the triple data generation module processes the entity and relation data into triples of the form entity → relation → entity and stores them in a database. Finally, the knowledge graph construction module is called to complete construction of the whole knowledge graph. The self-service query system, comprising the safety knowledge graph and the self-service query terminal, is configured on intelligent terminals such as smartphones and tablets. It can then be used to query construction safety knowledge in a self-service manner at any time and place.
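One way for the triple data generation module to store entity → relation → entity triples in Neo4j is to emit one idempotent `MERGE` statement per triple. The following is a minimal sketch under stated assumptions: the node labels, the relationship type and the `name` property key are hypothetical schema choices. The worked query earlier in the description filters positions on `title`, so the property key used here would have to match whatever schema is actually adopted. Cypher parameters generally cannot stand in for labels or relationship types, so those are validated and inlined, while entity names travel as parameters.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def triple_to_cypher(head, head_label, relation, tail, tail_label):
    """Build an idempotent MERGE statement for one entity -> relation -> entity triple.

    Labels and the relationship type are inlined after validation;
    entity names are returned as query parameters.
    """
    for name in (head_label, relation, tail_label):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe label or relationship type: {name!r}")
    query = (
        f"MERGE (h:{head_label} {{name: $head}}) "
        f"MERGE (t:{tail_label} {{name: $tail}}) "
        f"MERGE (h)-[:{relation}]->(t)"
    )
    return query, {"head": head, "tail": tail}


query, params = triple_to_cypher("elevator shaft mouth", "Position", "set", "safety door", "Tool")
print(query)
# MERGE (h:Position {name: $head}) MERGE (t:Tool {name: $tail}) MERGE (h)-[:set]->(t)
```

Because `MERGE` matches before it creates, re-running the import over the same triples does not duplicate nodes or relationships.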

It should be noted that, in the above device embodiment, the units are divided only according to functional logic, and the division is not limited to the above as long as the corresponding functions can be realized. In addition, the specific names of the functional units serve only to distinguish them from each other and are not intended to limit the protection scope of the present invention.

In addition, it can be understood by those skilled in the art that all or part of the steps in the above embodiments may be implemented by a program to instruct related hardware, and the corresponding program may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk.

The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims

1. A construction method for a knowledge-graph-based safety knowledge self-service query system, applied to configuring an intelligent mobile terminal, characterized in that the construction method comprises the following steps:

S1, collecting construction safety knowledge data, and constructing a corpus for entity extraction and relation extraction by applying data labeling software to the safety knowledge data;

S2, calling a BERT-BiLSTM-CRF algorithm to extract construction safety knowledge entities;

S3, calling a BERT-CNN algorithm to extract construction safety knowledge relations;

S4, processing the entity and relation data into triples of the form entity → relation → entity, and storing the triples in a database;

S6, generating Cypher query statements based on natural language processing and a naive Bayes classification algorithm, and constructing a safety knowledge self-service query terminal;

and S7, configuring the safety knowledge graph and the self-service query terminal on the mobile intelligent terminal to construct the self-service query system.

2. The construction method of the knowledge-graph-based safety knowledge self-service query system according to claim 1, wherein the BERT-BiLSTM-CRF algorithm in step S2 comprises a BERT model, a BiLSTM model and a CRF model, through which the text to be processed passes in sequence to realize entity extraction; the text to be processed comes from the corpus, and the entity extraction process is as follows:

S2.1, passing the text to be processed through the BERT model to output the corresponding dynamic word vectors:

the BERT model is a pre-trained language model based on the Transformer encoder. The Transformer encoder consists of a self-attention part, an add-and-normalize part and a feed-forward neural network part, and its core self-attention mechanism can be expressed as:

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

where the matrices $Q$, $K$ and $V$ are obtained from the input word vectors; $d_k$ is the dimension of the input vectors; $QK^{T}$ represents the semantic relationship between the input word vectors; and $\mathrm{softmax}(\cdot)$ is a normalization function;
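The self-attention formula can be checked numerically on tiny matrices. The following is a plain-list Python sketch of scaled dot-product attention only. It assumes $Q$, $K$ and $V$ are already given, whereas in BERT they come from learned linear projections of the input vectors, and it leaves out multi-head splitting. The helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Plain-list matrix product: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, applied row by row."""
    d_k = len(K[0])
    scores = matmul(Q, transpose(K))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V)


# Two keys that score equally against the query get equal weight,
# so the output is the average of the two value rows.
print(attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[2.0, 0.0], [4.0, 2.0]]))
# [[3.0, 1.0]]
```

Dividing by $\sqrt{d_k}$ keeps the dot products from growing with the vector dimension, which would otherwise push the softmax into a nearly one-hot regime.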

the BiLSTM model consists of two LSTM models, one reading the sequence forward and one backward, and the calculation process of the LSTM model is as follows:

$$i_t=\sigma\left(W_i\cdot[h_{t-1},x_t]+b_i\right)$$

$$f_t=\sigma\left(W_f\cdot[h_{t-1},x_t]+b_f\right)$$

$$o_t=\sigma\left(W_o\cdot[h_{t-1},x_t]+b_o\right)$$

$$\tilde{c}_t=\tanh\left(W_c\cdot[h_{t-1},x_t]+b_c\right)$$

$$c_t=f_t\odot c_{t-1}+i_t\odot\tilde{c}_t$$

$$h_t=o_t\odot\tanh(c_t)$$

where $x_t$ is the input at time $t$ and $h_t$ the hidden output; $W_i$ is the weight matrix of the input gate, $W_f$ the weight matrix of the forget gate, $W_o$ the weight matrix of the output gate, and $W_c$ the weight matrix of the candidate value layer; $b$ denotes the bias vector of each control gate, with $b_i$, $b_f$ and $b_o$ the bias vectors of the input, forget and output gates and $b_c$ that of the candidate value layer; $\tilde{c}_t$ is the candidate state that carries the memory from the previous time step to the current one; $c_t$ is the state of the memory cell at time $t$; $\sigma(\cdot)$ is the sigmoid function; and $\tanh(\cdot)$ is a value-adjusting function whose output lies in the interval $(-1,1)$;
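The gate equations can be traced step by step in code. The following is a plain-Python sketch of one LSTM step and of a BiLSTM that concatenates forward and backward outputs per position. It assumes $[h_{t-1},x_t]$ is a simple concatenation and that each weight matrix has shape hidden × (hidden + input). The parameter dictionary layout and the `constant_params` helper are hypothetical and exist only to make the demo runnable.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, b, v):
    """W @ v + b for plain-list W (rows) and vectors b, v."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step following the gate equations above."""
    z = h_prev + x_t  # list concatenation = [h_{t-1}, x_t]
    i = [sigmoid(a) for a in affine(p["W_i"], p["b_i"], z)]
    f = [sigmoid(a) for a in affine(p["W_f"], p["b_f"], z)]
    o = [sigmoid(a) for a in affine(p["W_o"], p["b_o"], z)]
    c_hat = [math.tanh(a) for a in affine(p["W_c"], p["b_c"], z)]
    c = [ft * cp + it * ch for ft, cp, it, ch in zip(f, c_prev, i, c_hat)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


def run_lstm(xs, p, hidden):
    h, c, outputs = [0.0] * hidden, [0.0] * hidden, []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs


def bilstm(xs, p_fwd, p_bwd, hidden):
    """Concatenate a left-to-right and a right-to-left LSTM at each position."""
    forward = run_lstm(xs, p_fwd, hidden)
    backward = run_lstm(xs[::-1], p_bwd, hidden)[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]


def constant_params(hidden, input_dim, value):
    """Hypothetical helper: every weight and bias set to `value` (demo only)."""
    W = [[value] * (hidden + input_dim) for _ in range(hidden)]
    b = [value] * hidden
    return {f"{k}_{g}": (W if k == "W" else b) for k in ("W", "b") for g in "ifoc"}


xs = [[0.5, -1.0], [1.0, 0.0], [0.0, 2.0]]  # three tokens, 2-dim embeddings
out = bilstm(xs, constant_params(3, 2, 0.1), constant_params(3, 2, -0.1), hidden=3)
print(len(out), len(out[0]))  # 3 6
```

Each position's output has twice the hidden size because it joins the left-context and right-context states. These per-position vectors are what the CRF layer scores.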

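wherein the CRF model is a Markov random field conditioned on the input sequence; for an input sequence $X$ and a label sequence $y=(y_1,\dots,y_N)$ it is expressed as:

$$s(X,y)=\sum_{i=1}^{N-1}A_{y_i,y_{i+1}}+\sum_{i=1}^{N}P_{i,y_i}$$

$$p(y\mid X)=\frac{\exp\left(s(X,y)\right)}{\sum_{\tilde{y}\in Y_X}\exp\left(s(X,\tilde{y})\right)}$$

where $A_{y_i,y_j}$ represents the score of transitioning from label $y_i$ to label $y_j$; $P_{i,y_i}$ is the score that the BiLSTM output assigns to label $y_i$ at position $i$; $Y_X$ is the set of all possible label sequences; and $N$ is the sequence length;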
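At prediction time, a linear-chain CRF outputs the label sequence that maximizes the score $s(X,y)$, which is usually found with Viterbi decoding. The following is a plain-Python sketch of that decoder. It assumes emission scores $P$ and a transition matrix $A$ are given. The O/B/I label set in the demo is illustrative and is not taken from the invention.

```python
def viterbi(emissions, transitions):
    """Highest-scoring label path under s(X, y) = sum A[y_i][y_{i+1}] + sum P[i][y_i].

    emissions[i][y]  : P_{i,y}, score of label y at position i (e.g. BiLSTM output)
    transitions[a][b]: A_{a,b}, score of moving from label a to label b
    """
    n_labels = len(emissions[0])
    scores = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_scores, pointers = [], []
        for y in range(n_labels):
            best_prev = max(range(n_labels), key=lambda p: scores[p] + transitions[p][y])
            new_scores.append(scores[best_prev] + transitions[best_prev][y] + emit[y])
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)
    last = max(range(n_labels), key=lambda y: scores[y])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1], scores[last]


# Labels: 0 = O, 1 = B, 2 = I. The transition O -> I is heavily penalised,
# so the decoder prefers B, I over the locally best O, I.
transitions = [[0.0, 0.0, -100.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
emissions = [[2.0, 1.0, 0.0], [0.0, 0.0, 3.0]]
print(viterbi(emissions, transitions))  # ([1, 2], 4.0)
```

This example shows why the CRF layer is placed after the BiLSTM. Choosing the best label at each position independently would produce the invalid O, I sequence, whereas the transition scores enforce label consistency across the whole sequence.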

3. The construction method of the knowledge-graph-based safety knowledge self-service query system according to claim 1, wherein the relation extraction in step S3 is performed as follows:

S3.1, passing the text to be processed through the BERT model to output the corresponding dynamic word vectors and entity position vectors, wherein the dynamic word vectors consist of the vector of each character, and the position vectors represent the positions of the entities in the sentence;

S3.2, inputting the dynamic word vectors and the position vectors into a CNN model to realize relation extraction, wherein the convolution operation of the CNN is expressed as:

$$S(k,l)=(I*K)(k,l)=\sum_{p=0}^{m-1}\sum_{q=0}^{n-1}I(k-p,\,l-q)\,K(p,q)$$

wherein $S(k,l)$ represents the value of the output feature map function $S$ at $(k,l)$, i.e., the result of the convolution operation; $K(p,q)$ represents the value of the convolution kernel function $K$ at $(p,q)$; $I(k-p,l-q)$ represents the value of the input feature map function $I$ at $(k-p,l-q)$; and $m$ and $n$ represent the width and height of the convolution kernel, respectively.
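The convolution formula can be evaluated directly on a small grid. In relation extraction, the input surface $I$ would presumably be the sentence matrix formed from each character's dynamic word vector together with its position vector. The following plain-Python sketch computes only the "valid" part of $S(k,l)$, meaning the positions where every index into $I$ is in range. The function name is hypothetical.

```python
def convolve2d(I, K):
    """'Valid' 2-D convolution S(k, l) = sum_p sum_q I(k-p, l-q) * K(p, q).

    Only positions where every I index is inside the input are kept,
    giving an output of (rows - m + 1) x (cols - n + 1).
    """
    m, n = len(K), len(K[0])
    rows, cols = len(I), len(I[0])
    return [
        [sum(I[k - p][l - q] * K[p][q] for p in range(m) for q in range(n))
         for l in range(n - 1, cols)]
        for k in range(m - 1, rows)
    ]


I = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
K = [[1, 0], [0, -1]]
print(convolve2d(I, K))  # [[4, 4], [4, 4]]
```

Because the kernel index is subtracted, the kernel is effectively flipped. Here each output equals $I(k,l)-I(k-1,l-1)$. Most deep learning frameworks actually compute cross-correlation, which does not flip the kernel, under the name "convolution". Since CNN kernels are learned, this distinction does not change what the model can represent.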

4. The construction method of the knowledge-graph-based safety knowledge self-service query system according to claim 1, wherein the self-service query terminal in step S6 is constructed as follows:

S6.1, constructing a training sample set comprising question sample sentences of different categories and their classification labels, wherein the question sample sentences are questions about construction safety knowledge of each category that a user may ask;

S6.2, training a naive Bayes classifier as follows:

training a classifier with the naive Bayes classification algorithm on the data set held as an RDD;

S6.3, constructing a query template:

constructing a Cypher query statement template based on the classification labels;

S6.4, generating a Cypher query statement:

extracting the entities in the query statement as query conditions, and filling them into the query template to generate a complete Cypher query statement.

5. The construction method of the knowledge-graph-based safety knowledge self-service query system according to any one of claims 1 to 4, wherein the safety knowledge in step S1 comes from construction safety specification documents and construction safety accident reports, and safety data in paper form is digitized in advance.

6. The construction method of the knowledge-graph-based safety knowledge self-service query system according to any one of claims 1 to 4, wherein in step S1 the annotation tool YEDDA is called to label the entities in the corpus with their entity categories.

7. The construction method of the knowledge-graph-based safety knowledge self-service query system according to any one of claims 1 to 4, wherein in step S4 the graph database Neo4j is used to store the triple knowledge.

8. A construction device for a knowledge-graph-based safety knowledge self-service query system according to the method of any one of claims 1 to 4, configured on a mobile terminal, wherein the construction device comprises: