CN110928988B

CN110928988B - Method for rapidly estimating risk level of potential safety hazard in factory building

Info

Publication number: CN110928988B
Application number: CN201911034620.6A
Authority: CN
Inventors: 刘庭煜; 韦凯翔
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2019-10-29
Filing date: 2019-10-29
Publication date: 2022-10-14
Anticipated expiration: 2039-10-29
Also published as: CN110928988A

Abstract

The invention discloses a method for quickly estimating the risk level of potential safety hazards in a factory building, which comprises the following steps: step 1, establishing a factory building potential safety hazard database; step 2, pre-training a Chinese word vector model on a Chinese corpus; step 3, dividing the factory building potential safety hazard texts into a training set, a test set and a validation set, and generating word vectors for the potential safety hazard corpus; step 4, feeding the divided and standardized potential safety hazard texts into a BERT neural network model for fine-tuning to obtain a danger level classification model; step 5, when a new potential safety hazard appears in the factory building, collecting its relevant element information, feeding it into the danger level classification model, and estimating its danger level; and step 6, comparing the text similarity between the new potential safety hazard and the potential safety hazards of the same risk level in the database, and evaluating the confidence of the estimated result. With this method, when the production elements of the factory building change, the danger level of a new potential safety hazard can be evaluated rapidly.

Description

Method for rapidly estimating risk level of potential safety hazard in factory building

Technical Field

The invention relates to the technical field of manufacturing industry danger control, in particular to a method for quickly estimating the danger level of potential safety hazards in a factory building.

Background

At present, with the rapid development of informatization of the manufacturing industry and digitalization of manufacturing workshops in China, each single element among the five workshop elements (man, machine, material, method and environment), such as equipment (machine) or products (material), has a mature control method in the actual production of traditional manufacturing. However, for the actual potential safety hazards in a production plant, it is difficult to grade and evaluate the risk systematically and comprehensively, because a hazard involves all five elements of the plant.

In the actual production process of the current manufacturing industry, the evaluation of potential safety hazards in a factory building faces two main problems. First, because a potential safety hazard in a factory building involves many elements, most of which are textual descriptions, quantitative grading is difficult. Second, the evaluation of a potential safety hazard must be carried out by safety experts in the relevant field. Therefore, the traditional manufacturing industry currently mostly adopts the LEC evaluation method (proposed by the American safety experts K.J. Graham and G.F. Kinney) to evaluate the danger and harm to operators working in an environment with potential safety hazards. The method evaluates the casualty risk of the operator using the product of the index values of three factors related to system risk: L (likelihood, the likelihood of an accident), E (exposure, the frequency with which personnel are exposed to the hazardous environment), and C (consequence, the possible consequences if an accident occurs). The product of the three values, D (risk), evaluates the degree of danger of the working condition. However, the method still has a major limitation: it requires manual evaluation by safety experts. When any one of the five elements of the production workshop changes (for example, new equipment is purchased), the related potential safety hazards in the factory building change and the danger level may change as well; a safety expert must then re-analyze and re-evaluate the hazard and assign a new danger level, so the evaluation process is cumbersome and slow.
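The LEC calculation described above, D = L × E × C, can be illustrated with a minimal Python sketch. The function names, the mapping of D onto five integer levels, and the band boundaries (20, 70, 160, 320) are assumptions taken from commonly cited LEC tables; the patent text itself does not state them.

```python
def lec_risk_score(likelihood: float, exposure: float, consequence: float) -> float:
    """Return the LEC risk value D = L * E * C."""
    return likelihood * exposure * consequence


# Hypothetical band boundaries (lower bound of D, level); these values are an
# assumption borrowed from commonly cited LEC tables, not from the patent.
LEC_BANDS = [(320, 5), (160, 4), (70, 3), (20, 2)]


def lec_risk_level(d: float, bands=LEC_BANDS) -> int:
    """Map a risk value D onto an integer danger level (1 = lowest)."""
    for lower_bound, level in bands:
        if d > lower_bound:
            return level
    return 1
```

For instance, index values L = 3, E = 6 and C = 15 give D = 270, which falls into level 4 under the assumed bands.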

Disclosure of Invention

The invention aims to provide a method for quickly estimating the risk level of potential safety hazards in a factory building, which can quickly estimate the danger level of new potential safety hazards caused by changes in the five workshop elements during production.

In order to achieve the purpose, the technical scheme adopted by the invention is as follows:

a method for quickly estimating the risk level of potential safety hazards in a factory building comprises the following steps:

step 1, establishing a factory building potential safety hazard database, which specifically comprises the following steps:

step 1.1, listing all potential safety hazards in the factory building according to its actual production conditions and the five workshop production elements (man, machine, material, method and environment);

step 1.2, analyzing each potential safety hazard of the factory building according to the following six points:

(1) The cause of the potential safety hazard, including human factors and equipment and material factors;

(2) The hazard manifestation of the potential safety hazard, namely the actual form the hazard takes in the workshop when it occurs, specifically including environmental factors, equipment and material factors, and personnel factors;

(3) The dangerous consequences of the potential safety hazard, including consequences for personnel, materials and equipment;

(4) The areas involved, namely the areas of the factory building where the potential safety hazard may occur;

(5) The objects involved, namely which of the five elements (man, machine, material, method and environment) the potential safety hazard involves;

(6) The risk level of the potential safety hazard, where the hazards initially existing in the factory building are rated using the LEC method;

step 1.3, inputting the six-point information of each potential safety hazard analyzed in the step 1.2 into a database to form a plant potential safety hazard database;

step 2, pre-training a Chinese word vector model by utilizing a Chinese language database;

step 3, dividing the factory building potential safety hazard texts into a training set, a test set and a validation set, and generating word vectors for the potential safety hazard corpus, which specifically comprises the following steps:

step 3.1, extracting text data stored in the plant potential safety hazard database formed in the step 1 by using pymysql;

step 3.2, standardizing the extracted potential safety hazard text data, wherein each potential safety hazard uses the format danger level + text content, separated by a tab character ('\t');

step 3.3, performing word segmentation on the potential safety hazard text content using Jieba, and, after obtaining the segmentation result, establishing a dedicated stop word list for the potential safety hazard text content of the specific factory building;

step 3.4, feeding the potential safety hazard text content, after word segmentation and stop word removal, into the word vector model trained in step 2, and outputting the feature vector of each corresponding word;

step 4, feeding the divided and standardized potential safety hazard texts into a BERT neural network model for fine-tuning to obtain a danger level classification model;

step 5, when a new potential safety hazard appears in the factory building, collecting the relevant element information according to the six points in step 1.2, forming text information, feeding it into the danger level classification model, and estimating the danger level, which specifically includes: preprocessing the text information of the new potential safety hazard, feeding the preprocessed text into the fine-tuned danger level classification model, and ranking the candidate danger levels of the new potential safety hazard from high to low according to the danger level probabilities output by the model;

step 6, after the estimated risk level of the new potential safety hazard is obtained, comparing the text similarity between the new potential safety hazard and the potential safety hazards of the same risk level stored in the database, and thereby evaluating the confidence of the estimated risk level of the new potential safety hazard.

Further, the Chinese corpus in step 2 is the Wikipedia Chinese corpus.

Further, the step 2 specifically includes:

step 2.1, downloading the Chinese corpus from the Wikipedia website as the original training data and converting it into simplified characters using the opencc tool;

step 2.2, extracting the content of the corpus with regular expressions after it has been fully converted into simplified characters, and performing word segmentation on it;

step 2.3, training a Word2Vec model on the corpus after word segmentation and stop word removal.

Further, the feature vector output in step 3.4 is a 64-dimensional feature vector.

Further, the step 4 specifically includes:

step 4.1, establishing the model by building a pre-trained BERT model in Python;

step 4.2, reading the divided potential safety hazard text training set, test set and validation set, and starting to train the potential safety hazard danger level classification model.

Further, the step 6 specifically includes:

step 6.1, after the risk level of the newly appeared potential safety hazard is estimated, extracting from the potential safety hazard database the text data whose risk level equals the highest-probability risk level estimated in step 5;

step 6.2, performing word segmentation and stop word removal on the text of the newly appeared potential safety hazard and on the same-level potential safety hazard texts extracted from the database, and then feeding them into the Word2Vec word vector model generated in step 2 to generate the corresponding word vectors;

step 6.3, representing the text vector of each potential safety hazard by the average of the word vectors in its text;

step 6.4, using the cosine similarity of the text vectors to represent the similarity between the new potential safety hazard and each text of the same-level potential safety hazards, with the confidence threshold set to 0.5: if the similarity between the new potential safety hazard text and more than 50% of the existing same-level texts exceeds 50%, the risk level estimated by the classification model is considered credible; otherwise, the risk level with the next highest prediction probability from step 5 is selected and the operations of steps 6.2 to 6.3 are repeated.

Further, in step 6.4, if none of the candidate risk levels passes the confidence threshold, the risk level with the highest prediction probability output by the classification model is selected.

Compared with the prior art, the invention has the remarkable advantages that:

(1) The invention does not require safety experts to perform manual rating, which saves substantial labor cost compared with the traditional evaluation method;

(2) The word vector model is pre-trained on the large Wikipedia corpus, which improves the accuracy of the word vector model and makes it more practical when applied to the factory building potential safety hazard database;

(3) With this method, when the production elements of the factory building change, the danger level of a new potential safety hazard can be evaluated rapidly, which guides the safe production work of the factory building.

Drawings

FIG. 1 is a flow chart of a rapid estimation method for the risk level of the potential safety hazard in a factory building.

FIG. 2 is a structural diagram of the Word2Vec model used to pre-train the word vectors.

FIG. 3 is a diagram of a standardized data format of a plant safety hazard text.

Detailed Description

The following describes the implementation of the present invention in detail with reference to specific embodiments.

The invention discloses a method for quickly estimating the risk level of potential safety hazards in a factory building, which comprises the following steps of:

(1) The cause of the potential safety hazard, including human factors and equipment and material factors;

(4) The areas involved, namely the areas of the factory building where the potential safety hazard may occur;

(5) The objects involved, namely which of the five elements (man, machine, material, method and environment) the potential safety hazard involves;

step 1.3, inputting the six-point information of each potential safety hazard analyzed in step 1.2 into a database to form the factory building potential safety hazard database, wherein the database table is designed as shown in the following table (a MySQL database is adopted by default):

Table 1. Factory building potential safety hazard database table:

Field name	Length	Data type	Nullable	Key
Dangers_id (potential safety hazard number)	5	Int	Not null	PK
Reason (cause of the hazard)	255	Varchar	Not null
Performance (hazard manifestation)	255	Varchar	Not null
Result (dangerous consequence)	255	Varchar	Not null
Related_area (areas involved)	50	Varchar	Not null
Related_object (objects involved)	20	Varchar	Not null
Level (danger level)	5	Int	Not null
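The table design above can be sketched in Python as follows. The patent uses MySQL; this sketch substitutes the standard-library `sqlite3` module so that it runs without a database server. The table name `dangers` and the helper names are hypothetical.

```python
import sqlite3

# Column names and sizes follow Table 1; the table name "dangers" is an assumption.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS dangers (
    Dangers_id     INTEGER PRIMARY KEY,
    Reason         VARCHAR(255) NOT NULL,
    Performance    VARCHAR(255) NOT NULL,
    Result         VARCHAR(255) NOT NULL,
    Related_area   VARCHAR(50)  NOT NULL,
    Related_object VARCHAR(20)  NOT NULL,
    Level          INTEGER      NOT NULL
)
"""


def create_hazard_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hazard database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(CREATE_TABLE_SQL)
    return conn


def add_hazard(conn, reason, performance, result, area, obj, level) -> int:
    """Insert one analyzed hazard (the six points of step 1.2) and return its id."""
    cur = conn.execute(
        "INSERT INTO dangers "
        "(Reason, Performance, Result, Related_area, Related_object, Level) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (reason, performance, result, area, obj, level),
    )
    conn.commit()
    return cur.lastrowid
```

With a MySQL server, the same statements could be issued through a MySQL client library instead; only the connection setup and the parameter placeholder style would differ.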

Step 2, pre-training the Chinese word vector model by utilizing a Chinese corpus of Wikipedia, which specifically comprises the following steps:

Step 2.1, downloading the Chinese corpus from the Wikipedia website as the original training data. Since the data contains many traditional Chinese characters, all of them are converted into simplified characters using the opencc tool.

Step 2.2, extracting the article content with regular expressions and performing word segmentation on the corpus after it has been fully converted into simplified characters. The corpus extracted in step 2.1 contains many <doc> </doc> tags, so this irrelevant content is removed with regular expressions. The articles are then segmented with the Jieba tool in Python. Because segmentation produces some words with no practical meaning, a stop word removal step is added after segmentation.

Step 2.3, training a Word2Vec model on the corpus after word segmentation and stop word removal; the structure of the Word2Vec model is shown in FIG. 2. So that the model can be applied in the subsequent steps, some of its parameters are slightly modified as follows: first, the word vector dimension is set to 64; second, the training window is set to 5, that is, the five preceding and five following adjacent words are considered; then, the minimum word frequency is set to 5, that is, a word that appears fewer than five times in the whole corpus is discarded; meanwhile, the learning rate is set to 0.025; finally, the number of iterations is set to 10.
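Steps 2.2 and 2.3 can be sketched as follows. The patent does not name the Word2Vec implementation; this sketch assumes the gensim library (4.x parameter names) and imports it lazily, so the preprocessing helpers run without it. The regular expression, the helper names and the `tokenizer` argument are illustrative assumptions; a Jieba function such as `jieba.lcut` could be passed as the tokenizer.

```python
import re

# Matches opening and closing <doc ...> tags left over from corpus extraction.
DOC_TAG_RE = re.compile(r"</?doc\b[^>]*>")


def strip_doc_tags(text: str) -> str:
    """Remove <doc ...> and </doc> markers, keeping the article text."""
    return DOC_TAG_RE.sub("", text)


def segment_and_filter(text, tokenizer, stop_words):
    """Segment the cleaned text and drop empty tokens and stop words."""
    tokens = tokenizer(strip_doc_tags(text))
    return [t for t in tokens if t.strip() and t not in stop_words]


# Hyperparameters stated in step 2.3, using gensim 4.x keyword names (assumption).
WORD2VEC_PARAMS = {
    "vector_size": 64,  # word vector dimension
    "window": 5,        # five words before and five after
    "min_count": 5,     # discard words seen fewer than five times
    "alpha": 0.025,     # learning rate
    "epochs": 10,       # number of iterations
}


def train_word2vec(sentences, params=WORD2VEC_PARAMS):
    """Train a Word2Vec model on tokenized sentences (requires gensim)."""
    from gensim.models import Word2Vec  # imported lazily; optional dependency
    return Word2Vec(sentences=sentences, **params)
```

Here `sentences` is expected to be an iterable of token lists, one per article or sentence, as produced by `segment_and_filter`.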

Step 3, dividing the factory building potential safety hazard texts into a training set, a test set and a validation set, and generating word vectors for the potential safety hazard corpus. The method specifically comprises the following steps:

Step 3.1, extracting the text data stored in the factory building potential safety hazard database formed in step 1 using pymysql.

Step 3.2, standardizing the extracted potential safety hazard text data according to the data format of FIG. 3, wherein each potential safety hazard uses the format danger level + text content, separated by a tab character (\t).

Step 3.3, performing word segmentation on the specific content of each potential safety hazard using Jieba, and, after the segmentation result is obtained, establishing a dedicated stop word list for the text of the specific workshop. For example, the word "cause" may appear frequently in the potential safety hazard text of a certain workshop but contribute nothing to risk level evaluation, so it is added to the stop word list.

Step 3.4, feeding the text data, after word segmentation and stop word removal, into the word vector model trained in step 2, and outputting the 64-dimensional feature vector of each corresponding word.
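The standardization of step 3.2 and the division into training, test and validation sets can be sketched as follows. The patent does not state the split ratio; the 8:1:1 ratio, the fixed random seed and the function names below are assumptions made for illustration.

```python
import random


def to_labeled_line(level: int, text: str) -> str:
    """Format one hazard as 'danger level<TAB>text content'."""
    # Collapse internal whitespace so the only tab is the separator.
    clean = " ".join(text.split())
    return f"{level}\t{clean}"


def split_dataset(lines, train=0.8, test=0.1, seed=0):
    """Shuffle and split into (training, test, validation) lists.

    The ratios are hypothetical; whatever remains after the training and
    test portions becomes the validation set.
    """
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_test = int(len(shuffled) * test)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_test],
        shuffled[n_train + n_test:],
    )
```

Shuffling before splitting keeps the danger levels roughly mixed across the three sets; for a small database, a stratified split per level may be a better choice.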

Step 4, feeding the divided and preprocessed potential safety hazard texts into the neural network model for fine-tuning to obtain the danger level classification model. The method specifically comprises the following steps:

Step 4.1, establishing the model by building a pre-trained BERT model in Python; for Chinese corpora, the BERT Chinese model is generally used.

Step 4.2, reading the divided potential safety hazard text training set, test set and validation set, and starting to train the danger level classification model after modifying some of the parameters as follows: first, the maximum sentence length is set to 50; then the learning rate is set to 2e-5 and the number of training iterations to 3; finally, the batch size is set to 16. Training with these settings yields the fine-tuned danger level classification model.
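The fine-tuning settings of step 4.2 can be collected in a small configuration object, sketched below. The field names, the interpretation of "number of iterations" as training epochs, the padding id of 0 and the helper functions are assumptions; a real BERT pipeline would also add special tokens such as [CLS] and [SEP], which this sketch omits.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters stated in step 4.2 (field names are hypothetical)."""
    max_seq_length: int = 50      # longest sentence length
    learning_rate: float = 2e-5
    num_train_epochs: int = 3     # "number of iterations", read as epochs
    train_batch_size: int = 16

    def total_steps(self, num_examples: int) -> int:
        """Number of optimizer steps for a training set of the given size."""
        steps_per_epoch = math.ceil(num_examples / self.train_batch_size)
        return steps_per_epoch * self.num_train_epochs


def truncate_or_pad(token_ids, max_len: int, pad_id: int = 0):
    """Clip a token id sequence to max_len, or right-pad it with pad_id."""
    clipped = list(token_ids)[:max_len]
    return clipped + [pad_id] * (max_len - len(clipped))
```

A maximum length of 50 fits the short hazard descriptions in the database, and it keeps every batch of 16 sequences at a fixed, small shape.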

Step 5, when a new potential safety hazard appears in the factory building, collecting its relevant element information, feeding it into the danger level classification model, and estimating the danger level. The specific steps are as follows:

preprocessing the text information of the new potential safety hazard, feeding the preprocessed text into the fine-tuned BERT danger level classification model, and ranking the candidate danger levels of the new potential safety hazard from high to low according to the danger level probabilities output by the model.
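Ranking the candidate danger levels by model probability can be sketched as follows. The sketch assumes the classifier returns one raw score (logit) per danger level and converts the scores with a softmax; the function names and the list-based representation are illustrative assumptions.

```python
import math


def softmax(logits):
    """Convert raw classifier scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_levels(probabilities, levels):
    """Return (level, probability) pairs sorted from most to least likely."""
    pairs = zip(levels, probabilities)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

The first pair in the ranked list is the primary estimate; the remaining pairs are the fallback candidates examined when the confidence check of step 6 fails.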

Step 6, after the estimated risk level is obtained, comparing the document similarity between the new potential safety hazard and the potential safety hazards of the same risk level already in the database, and thereby evaluating the estimated risk level. The method specifically comprises the following steps:

Step 6.1, after the risk level of the newly appeared potential safety hazard is estimated, extracting from the potential safety hazard database the text data whose risk level equals the highest-probability risk level estimated in step 5.

Step 6.2, performing word segmentation and stop word removal on the text of the newly appeared potential safety hazard and on the same-level potential safety hazard texts extracted from the database, and then feeding them into the Word2Vec word vector model generated in step 2 to generate the corresponding word vectors.

Step 6.3, because the potential safety hazard texts are mostly short, usually within 50 words after preprocessing, representing the document vector of each potential safety hazard by the average of the word vectors in its document.

Step 6.4, using the cosine similarity of the text vectors to represent the similarity between the new potential safety hazard and each text of the same-level potential safety hazards, with the confidence threshold set to 0.5: if the similarity between the new potential safety hazard text and more than 50% of the existing same-level texts exceeds 50%, the risk level estimated by the classification model is considered credible; otherwise, the risk level with the next highest prediction probability from step 5 is selected and the operations of steps 6.2 to 6.3 are repeated. If none of the candidate risk levels passes the confidence threshold, the risk level with the highest prediction probability output by the classification model is selected.
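The confidence check of steps 6.3 and 6.4 can be sketched in plain Python as follows. Vectors are represented as lists of floats, and the function names and the `vectors_by_level` dictionary are hypothetical; in practice the word vectors would come from the Word2Vec model of step 2.

```python
import math


def mean_vector(word_vectors):
    """Step 6.3: average the word vectors of one text into a text vector."""
    count = len(word_vectors)
    dim = len(word_vectors[0])
    return [sum(vec[i] for vec in word_vectors) / count for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_credible(new_vec, same_level_vecs, threshold=0.5):
    """Step 6.4: more than 50% of same-level texts must exceed 0.5 similarity."""
    if not same_level_vecs:
        return False
    hits = sum(1 for vec in same_level_vecs
               if cosine_similarity(new_vec, vec) > threshold)
    return hits / len(same_level_vecs) > threshold


def choose_level(new_vec, ranked_levels, vectors_by_level, threshold=0.5):
    """Walk the levels from most to least probable; fall back to the top one."""
    for level in ranked_levels:
        if is_credible(new_vec, vectors_by_level.get(level, []), threshold):
            return level
    return ranked_levels[0]
```

`ranked_levels` is the list of danger levels ordered by the classifier's probability, highest first, so the fallback in the last line corresponds to the highest prediction probability.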

The foregoing shows and describes the general principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims

1. A method for quickly estimating the risk level of potential safety hazards in a factory building is characterized by comprising the following steps:

(4) The areas involved, namely the areas of the factory building where the potential safety hazard may occur;

(5) The objects involved, namely which of the five elements (man, machine, material, method and environment) the potential safety hazard involves;

the step 2 specifically comprises:

step 2.1, downloading the Chinese corpus from the Wikipedia website as the original training data and converting it into simplified characters using the opencc tool;

step 2.3, training a Word2Vec model on the corpus after word segmentation and stop word removal;

step 3, dividing the factory building potential safety hazard texts into a training set, a test set and a validation set, and generating word vectors for the potential safety hazard corpus, wherein the method specifically comprises the following steps:

step 3.4, feeding the potential safety hazard text content, after word segmentation and stop word removal, into the word vector model trained in step 2, and outputting the feature vector of each corresponding word;

step 4, feeding the divided and standardized potential safety hazard texts into a BERT neural network model for fine-tuning to obtain a danger level classification model;

step 5, when a new potential safety hazard appears in the factory building, collecting the relevant element information according to the six points in step 1.2, forming text information, feeding it into the danger level classification model, and estimating the danger level, wherein the method specifically comprises the following steps: preprocessing the text information of the new potential safety hazard, feeding the preprocessed text into the fine-tuned danger level classification model, and ranking the candidate danger levels of the new potential safety hazard from high to low according to the danger level probabilities output by the model;

step 6, after the estimated risk level of the new potential safety hazard is obtained, comparing the text similarity between the new potential safety hazard and the potential safety hazards of the same risk level stored in the database, and thereby evaluating the confidence of the estimated risk level of the new potential safety hazard.

2. The method according to claim 1, wherein the Chinese corpus in step 2 is the Wikipedia Chinese corpus.

3. The method of claim 1, wherein the feature vector output in step 3.4 is a 64-dimensional feature vector.

4. The method according to claim 3, wherein the step 4 specifically comprises:

5. The method according to claim 4, wherein the step 6 specifically comprises:

step 6.1, after the danger level of the newly appeared potential safety hazard is estimated, extracting from the potential safety hazard database the text data whose danger level equals the highest-probability danger level estimated in step 5;

step 6.4, using the cosine similarity of the text vectors to represent the similarity between the new potential safety hazard and each text of the same-level potential safety hazards, with the confidence threshold set to 0.5: if the similarity between the new potential safety hazard text and more than 50% of the existing same-level texts exceeds 50%, the risk level estimated by the classification model is considered credible; otherwise, the risk level with the next highest prediction probability from step 5 is selected and the operations of steps 6.2 to 6.3 are repeated.

6. The method according to claim 5, wherein in step 6.4, if none of the candidate risk levels passes the confidence threshold, the risk level with the highest prediction probability output by the classification model is selected.