CN111091311A - Safety production accident factor analysis method and system - Google Patents


Info

Publication number
CN111091311A
Authority
CN
China
Prior art keywords
accident
factors
safety production
distance
factor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010210408.7A
Other languages
Chinese (zh)
Inventor
王斌
吴勤峰
陶建强
江丽琴
雷希燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Topinfo Technology Co ltd
Original Assignee
Zhejiang Topinfo Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Topinfo Technology Co ltd
Priority to CN202010210408.7A
Publication of CN111091311A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a safety production accident factor analysis method, which comprises the following steps: extracting a plurality of accident factors based on enterprise safety production accident information; determining the total TF-IDF value of each accident factor in each accident category based on a Bayesian classifier; selecting, in each accident category, the top N accident factors ranked by total TF-IDF value as clustering samples; and clustering the clustering samples with the K-Means algorithm to obtain multiple classes of safety production accident factors, thereby providing a decision-making basis for enterprise safety production and the prevention of hidden risks.

Description

Safety production accident factor analysis method and system
Technical Field
The invention relates to the technical field of accident factor analysis, in particular to a safety production accident factor analysis method and system.
Background
Safety is a basic human need, an essential condition for the sustained and healthy development of enterprises, and a basic guarantee for economic and social development. Safety accidents often appear accidental, sudden and complex, and seem impossible to guard against; in reality they arise from everyday hidden risks that were neglected and from details for which no one took responsibility.
As science, technology and modes of production develop, the underlying patterns of accident occurrence keep changing. Mining the available data in depth reveals more of these patterns; continuously analyzing the factors behind accidents, analyzing accident causes qualitatively and quantitatively, and predicting and preventing accidents all improve safety management work. This provides a theoretical reference that lets enterprises make correct, targeted decisions in safety production and effectively prevent safety production accidents.
Accident factor analysis aims to mine the key factors behind possible accidents and trace the sources of various accidents, drawing the attention of enterprise leaders and all employees so that the enterprise can take timely, targeted preventive measures, control unsafe factors at their source, and eliminate or minimize both the probability of accidents and the severity of their consequences.
The prior art provides no effective way to analyze safety production accident factors, so how to analyze them has become a problem that the field urgently needs to solve.
Disclosure of Invention
Based on this, the invention aims to provide a safety production accident factor analysis method and system that analyze accident information to obtain safety production accident factors.
In order to achieve the above object, the present invention provides a factor analysis method for safety production accidents, the method comprising:
extracting a plurality of accident factors based on the enterprise safety production accident information;
determining a total TF-IDF value of each accident factor in each accident category based on a Bayesian classifier;
selecting, in each accident category, the top N accident factors ranked by total TF-IDF value as clustering samples, where N is a positive integer greater than or equal to 1;
and clustering the clustering samples through a K-Means algorithm to obtain multiple types of safety production accident factors.
Optionally, the extracting a plurality of accident factors based on the enterprise safety production accident information includes:
extracting enterprise safety production accident information from a database;
performing word segmentation processing on the enterprise safety production accident information to obtain a word segmentation set; the word segmentation set comprises a plurality of keywords;
and filtering the keywords in the word segmentation set to obtain a plurality of keywords related to accident information, and taking the keywords related to the accident information as a plurality of accident factors.
Optionally, the clustering samples by using the K-Means algorithm to obtain multiple types of safety production accident factors includes:
randomly extracting K accident factors from the clustering samples to serve as initial class center points;
calculating the distance from each accident factor to each class center point by using a distance formula;
distributing each accident factor to the nearest class center point according to the minimum distance principle to form K classes, and recalculating the class center points of the classes;
incrementing the number of iterations by one;
judging whether the number of iterations is greater than or equal to the maximum number of iterations; if so, outputting K classes of candidate safety production accident factors; if the number of iterations is less than the maximum, returning to the step of calculating the distance from each accident factor to each class center point by using the distance formula;
or, alternatively, judging whether the class center points have changed; if they have changed, returning to the step of calculating the distance from each accident factor to each class center point by using the distance formula; if they have not changed, outputting K classes of candidate safety production accident factors;
determining the inter-class distance between different classes from the accident factors contained in the different classes of candidate safety production accident factors;
judging whether the inter-class distance is smaller than or equal to a set distance; if so, merging the two classes of candidate safety production accident factors into one class and outputting multiple classes of safety production accident factors; if the inter-class distance is larger than the set distance, the two classes are left unchanged.
Optionally, the distance formula is any one of a Euclidean distance formula, a Manhattan distance formula, a Chebyshev distance formula, and a cosine distance formula.
Optionally, the determining a total TF-IDF value of each accident factor in each accident category based on the Bayesian classifier includes:
constructing a Bayes classifier based on a naive Bayes algorithm;
calculating TF-IDF values of the accident factors by using the Bayesian classifier;
the TF-IDF values of the same accident factor in the same accident category are added to determine the total TF-IDF value of each accident factor in each accident category.
Optionally, the method further includes:
judging whether the total TF-IDF value of each accident factor within a given class of safety production accident factors is greater than zero; if it is greater than zero, an association exists between the accident factor and that class of safety production accident factors; if it is equal to zero, no association exists between them;
constructing an accident factor analysis tree based on the association between each accident factor and each class of safety production accident factors;
and drawing the accident factor analysis tree by using the Graphviz tool.
The invention also provides a safety production accident factor analysis system, which comprises:
the extraction module is used for extracting a plurality of accident factors based on the enterprise safety production accident information;
the total TF-IDF value determining module is used for determining the total TF-IDF value of each accident factor in each accident category based on a Bayesian classifier;
the selecting module is used for selecting, in each accident category, the top N accident factors ranked by total TF-IDF value as clustering samples, where N is a positive integer greater than or equal to 1;
and the clustering module is used for clustering the clustering samples through a K-Means algorithm to obtain multiple types of safety production accident factors.
Optionally, the extracting module includes:
the extraction unit is used for extracting enterprise safety production accident information from the database;
the word segmentation processing unit is used for carrying out word segmentation processing on the enterprise safety production accident information to obtain a word segmentation set; the word segmentation set comprises a plurality of keywords;
and the filtering unit is used for filtering the keywords in the word segmentation set to obtain a plurality of keywords related to accident information, and taking the keywords related to the accident information as a plurality of accident factors.
Optionally, the clustering module includes:
an initial class center point determining unit, configured to randomly extract K accident factors from the cluster samples as initial class center points;
the distance determining unit is used for calculating the distance from each accident factor to each class center point by using a distance formula;
the distribution unit is used for distributing each accident factor to the nearest class center point according to the minimum distance principle to form K classes and recalculating the class center points of the classes;
the increment unit is used for incrementing the number of iterations by one;
a first judging unit, configured to judge whether the number of iterations is greater than or equal to the maximum number of iterations; if so, to output K classes of candidate safety production accident factors; if the number of iterations is less than the maximum, to return to the step of calculating the distance from each accident factor to each class center point by using the distance formula; or, alternatively, to judge whether the class center points have changed; if they have changed, to return to the step of calculating the distance from each accident factor to each class center point by using the distance formula; if they have not changed, to output K classes of candidate safety production accident factors;
the inter-class distance determining unit is used for determining the inter-class distance between different classes from the accident factors contained in the different classes of candidate safety production accident factors;
a second judging unit, configured to judge whether the inter-class distance is smaller than or equal to a set distance; if so, to merge the two classes of candidate safety production accident factors into one class and output multiple classes of safety production accident factors; if the inter-class distance is larger than the set distance, the two classes are left unchanged.
Optionally, the total TF-IDF value determining module includes:
the construction unit is used for constructing a Bayesian classifier based on the naive Bayes algorithm;
a TF-IDF value determination unit for calculating TF-IDF values of the accident factors by using the Bayesian classifier;
and the total TF-IDF value determining unit is used for adding TF-IDF values of the same accident factor in the same accident category to determine the total TF-IDF value of each accident factor in each accident category.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention discloses a safety production accident factor analysis method and system, wherein the method comprises: extracting a plurality of accident factors based on enterprise safety production accident information; determining the total TF-IDF value of each accident factor in each accident category based on a Bayesian classifier; selecting, in each accident category, the top N accident factors ranked by total TF-IDF value as clustering samples; and clustering the clustering samples with the K-Means algorithm to obtain multiple classes of safety production accident factors, thereby providing a decision-making basis for enterprise safety production and the prevention of hidden risks.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a safety production accident factor analysis method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an accident factor analysis tree according to an embodiment of the present invention;
FIG. 3 is a structural diagram of a safety production accident factor analysis system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a safety production accident factor analysis method and system that analyze accident information to obtain safety production accident factors.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Fig. 1 is a flowchart of a factor analysis method for a safety production accident according to an embodiment of the present invention, and as shown in fig. 1, the present invention provides a factor analysis method for a safety production accident, where the method includes:
step S1: extracting a plurality of accident factors based on the enterprise safety production accident information;
step S2: determining a total TF-IDF value of each accident factor in each accident category based on a Bayesian classifier;
step S3: selecting, in each accident category, the top N accident factors ranked by total TF-IDF value as clustering samples;
step S4: clustering the clustering samples through the K-Means algorithm to obtain multiple classes of safety production accident factors.
The individual steps are discussed in detail below:
Step S1, extracting a plurality of accident factors based on the enterprise safety production accident information, includes:
Step S11: extracting the enterprise safety production accident information from the database.
Step S12: performing word segmentation on the enterprise safety production accident information with jieba, a Python Chinese word segmentation library, to obtain a word segmentation set containing a plurality of keywords.
Step S13: filtering the keywords in the word segmentation set against a stop-word list (stopwords) to remove meaningless phrases, obtaining a plurality of keywords related to the accident information, which are taken as the accident factors.
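The stop-word filtering of step S13 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes segmentation has already produced a token list (for example via `jieba.lcut(text)`), and the stop-word set, the `extract_factors` helper and the sample tokens are all hypothetical. English tokens are used only for readability; real input would be Chinese tokens.

```python
from typing import Iterable, List, Set

# Hypothetical stop-word list; a real system would load one from a stop-word file.
STOPWORDS: Set[str] = {"the", "a", "was", "and", "of", "not"}

def extract_factors(tokens: Iterable[str], stopwords: Set[str] = STOPWORDS) -> List[str]:
    """Step S13 sketch: keep non-blank tokens that are not stop words."""
    factors = []
    for token in tokens:
        token = token.strip().lower()
        if token and token not in stopwords:
            factors.append(token)
    return factors

# Tokens as a segmenter such as jieba.lcut(text) might return them (illustrative only).
record = ["The", "worker", "was", "careless", "and", "violated", "operation", " "]
print(extract_factors(record))  # ['worker', 'careless', 'violated', 'operation']
```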
Step S2, determining the total TF-IDF value of each accident factor in each accident category based on the Bayesian classifier, includes:
Step S21: constructing a Bayesian classifier based on the naive Bayes algorithm. The extracted accident factors are processed with the Bayesian classifier packaged in the Python library sklearn (scikit-learn).
Step S22: calculating the TF-IDF values of the accident factors by using the Bayesian classifier.
The TF-IDF index reflects how prominent an accident factor is: the higher its value, the more likely the factor represents a main source of the accident.
Step S221: calculating the term frequency (TF) by using the Bayesian classifier, with the formula:
$$TF = \frac{F}{N}$$
where TF is the term frequency, F is the number of times the accident factor under calculation appears in the accident information, and N is the total number of accident factors in that accident information.
Step S222: calculating the inverse document frequency (IDF) by using the Bayesian classifier, with the formula:
$$IDF = \log\frac{X}{T}$$
where IDF is the inverse document frequency, X is the total number of enterprise safety production accident records, and T is the number of accident records containing the accident factor under calculation.
Step S223: determining the TF-IDF value of each accident factor from the term frequency and the inverse document frequency, with the formula:
$$TF\text{-}IDF = TF \times IDF$$
The inverse document frequency IDF measures the general importance of a factor, that is, how well an accident factor discriminates between accident categories. When an accident factor appears frequently in one type of accident report and rarely or never in other accident reports, it readily distinguishes that accident report from the others; the larger the IDF, the stronger the factor's discriminating power.
A high TF-IDF value means that the factor appears frequently in one accident report but rarely in other accident reports. Such a factor is suitable for characterizing the description in the accident report, as opposed to common function words (such as "of" or "not"), and in the present invention it represents an accident factor.
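The formulas of steps S221 to S223 can be sketched in plain Python. The sketch assumes the forms TF = F/N and IDF = log(X/T) with the natural logarithm; the function names and sample records are hypothetical. Note that scikit-learn's `TfidfVectorizer`, which the embodiment may have relied on, by default uses a smoothed IDF and L2-normalizes each row, so its numbers would differ from this unsmoothed version.

```python
import math
from collections import Counter
from typing import Dict, List

def term_frequency(record: List[str]) -> Dict[str, float]:
    """TF = F / N, where N is the number of factor tokens in this record."""
    counts = Counter(record)
    n = len(record)
    return {factor: count / n for factor, count in counts.items()}

def inverse_document_frequency(records: List[List[str]]) -> Dict[str, float]:
    """IDF = log(X / T): X records in total, T records containing the factor."""
    x = len(records)
    doc_freq = Counter(factor for record in records for factor in set(record))
    return {factor: math.log(x / t) for factor, t in doc_freq.items()}

def tf_idf(records: List[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF = TF x IDF, computed per record."""
    idf = inverse_document_frequency(records)
    return [
        {factor: tf * idf[factor] for factor, tf in term_frequency(record).items()}
        for record in records
    ]

# Hypothetical, already-filtered accident records.
records = [["welding", "fire", "fire"], ["roof", "drilling"], ["welding", "roof"]]
scores = tf_idf(records)
print(round(scores[0]["fire"], 4))  # 0.7324
```

A factor that occurs in every record gets IDF = log(1) = 0, which is how ubiquitous words are suppressed.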
Step S23: the TF-IDF values of the same accident factor in the same accident category are added to determine the total TF-IDF value of each accident factor in each accident category.
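Step S23 is a grouped sum. A minimal sketch, assuming each record's TF-IDF scores are already available together with the record's accident category; the `total_tfidf` name and the sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def total_tfidf(scored: List[Tuple[str, Dict[str, float]]]) -> Dict[str, Dict[str, float]]:
    """Add up the TF-IDF values of the same factor within the same accident category."""
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for category, scores in scored:
        for factor, value in scores.items():
            totals[category][factor] += value
    # Convert the nested defaultdicts to plain dicts for a clean result.
    return {category: dict(factors) for category, factors in totals.items()}

scored = [
    ("fire", {"welding": 0.5, "battery jar": 0.25}),
    ("fire", {"welding": 0.25}),
    ("roof fall", {"roof": 0.9}),
]
print(total_tfidf(scored))
# {'fire': {'welding': 0.75, 'battery jar': 0.25}, 'roof fall': {'roof': 0.9}}
```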
TABLE 1 Total TF-IDF values of accident factors
No. | Accident factor | Object striking | Fire | Roof fall and rib spalling | Poisoning and asphyxiation | Collapse | Mechanical injury | Electric shock | Hoisting injury
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
1 | Lack of concentration | 0.762578253 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | Safety awareness | 0.762578253 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | Carelessness | 0.707106781 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | Rule-violating operation | 0.707106781 | 0 | 0 | 0 | 0 | 0 | 0 |
5 | Job skills | 0.707106781 | 0 | 0 | 0 | 0 | 0 | 0 |
6 | Tools | 0.666935056 | 0 | 0 | 0 | 0 | 0 | 0 |
7 | Rolling and falling ore | 0.646895979 | 0 | 0 | 0 | 0 | 0 | 0 |
8 | On-site management | 0.633207457 | 0 | 0 | 0 | 0 | 0 | 0 |
9 | Safety training | 0.613842682 | 0 | 0 | 0 | 0 | 0 | 0 |
10 | Catching fire | 0 | 0.742308875 | 0 | 0 | 0 | 0 | 0 | 0
11 | Welding | 0 | 0.731223451 | 0 | 0 | 0 | 0 | 0 | 0
12 | Construction personnel | 0 | 0.68213801 | 0 | 0 | 0 | 0 | 0 | 0
13 | Battery jar | 0 | 0.635117941 | 0 | 0 | 0 | 0 | 0 | 0
14 | Centrifuge | 0 | 0.590594288 | 0 | 0 | 0 | 0 | 0 | 0
15 | Dichloromethane | 0 | 0.590594288 | 0 | 0 | 0 | 0 | 0 | 0
16 | Kitchen cabinet | 0 | 0.574521013 | 0 | 0 | 0 | 0 | 0 | 0
17 | Etching | 0 | 0.537016764 | 0 | 0 | 0 | 0 | 0 | 0
18 | Roof | 0 | 0 | 1.940885521 | 0 | 0 | 0 | 0 | 0
19 | Drilling machine | 0 | 0 | 1.708436828 | 0 | 0 | 0 | 0 | 0
20 | Rule violations | 0 | 0 | 1.038018633 | 0 | 0 | 0 | 0 | 0
To extract the most important accident factors, identical accident factors are merged and the sum of each factor's TF-IDF values within each accident type is computed; the ten factors with the highest summed TF-IDF in each accident type (that is, N = 10 in this embodiment) are then taken as the main factors of that accident type and serve as the clustering samples for the subsequent factor clustering. The total TF-IDF values of the accident factors in the clustering samples for each accident type are shown in Table 1.
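Step S3 with N = 10, as in this embodiment, can be sketched as below. It assumes the per-category totals from step S23 and represents each selected factor as a vector with one dimension per accident category, matching the row layout of Table 1; the `top_n_samples` helper and the sample numbers are hypothetical.

```python
from typing import Dict, List, Tuple

def top_n_samples(
    totals: Dict[str, Dict[str, float]], n: int = 10
) -> Tuple[List[str], Dict[str, List[float]]]:
    """Pick the top-n factors per category and build their cross-category vectors."""
    categories = sorted(totals)
    selected: List[str] = []
    for category in categories:
        ranked = sorted(totals[category].items(), key=lambda kv: kv[1], reverse=True)
        for factor, _ in ranked[:n]:
            if factor not in selected:  # a factor may rank high in several categories
                selected.append(factor)
    # One feature dimension per accident category; 0.0 where the factor never occurs.
    vectors = {f: [totals[c].get(f, 0.0) for c in categories] for f in selected}
    return categories, vectors

totals = {
    "fire": {"welding": 0.73, "kitchen cabinet": 0.57, "etching": 0.54},
    "roof fall": {"roof": 1.94, "drilling machine": 1.71},
}
categories, vectors = top_n_samples(totals, n=2)
print(categories)          # ['fire', 'roof fall']
print(vectors["welding"])  # [0.73, 0.0]
```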
step S4: clustering the clustering samples through a K-Means algorithm to obtain multiple types of safety production accident factors, wherein the method comprises the following steps:
Step S41: randomly selecting K accident factors from the clustering samples as the initial class center points.
In this embodiment, each accident type is assumed to have 4 classes of factors, so K = 4 × the number of accident types.
Step S42: calculating the distance from each accident factor to each class center point by using the distance formula.
Step S43: assigning each accident factor to its nearest class center point according to the minimum distance principle to form K classes, and recalculating the center point of each class.
Step S44: incrementing the number of iterations by one.
Step S45: judging whether the number of iterations is greater than or equal to the maximum number of iterations; if so, outputting K classes of candidate safety production accident factors; if the number of iterations is less than the maximum, returning to the step of calculating the distance from each accident factor to each class center point by using the distance formula.
Step S46: alternatively, judging whether the class center points have changed; if they have changed, returning to the step of calculating the distance from each accident factor to each class center point by using the distance formula; if they have not changed, outputting K classes of candidate safety production accident factors.
Step S47: determining the inter-class distance between different classes from the accident factors contained in the different classes of candidate safety production accident factors.
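Steps S41 to S46 describe the standard K-Means loop with two alternative stopping rules. The sketch below combines them (stop at the iteration cap or as soon as the centers stop moving) and uses the Euclidean distance; the seed, the empty-class handling and the function names are assumptions, not taken from the patent. In this embodiment one would call it with k = 4 × the number of accident types.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(
    samples: List[Vector], k: int, max_iter: int = 100, seed: int = 0
) -> Tuple[List[int], List[Vector]]:
    rng = random.Random(seed)
    # S41: pick k distinct samples as the initial class centers.
    centers = [list(s) for s in rng.sample(samples, k)]
    iteration = 0
    while True:
        # S42-S43: assign every sample to its nearest center.
        labels = [min(range(k), key=lambda j: euclidean(s, centers[j])) for s in samples]
        new_centers: List[Vector] = []
        for j in range(k):
            members = [s for s, label in zip(samples, labels) if label == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # assumption: an empty class keeps its old center
        iteration += 1  # S44
        changed = new_centers != centers
        centers = new_centers
        # S45 / S46: stop at the iteration cap or once the centers no longer move.
        if iteration >= max_iter or not changed:
            return labels, centers

samples = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centers = kmeans(samples, k=2)
print(labels)
```

The two points near the origin end up in one class and the two far points in the other, whichever samples the seed picks as initial centers.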
During processing, similar factor classes also need to be merged. For example, "lack of concentration" and "safety awareness" may be assigned to two different classes, yet they are very similar and can both be classified as "unsafe human behavior", so the two classes should be merged. Classes are merged using single linkage: the shortest distance between accident factors in two different classes of safety production accident factors is taken as the inter-class distance, calculated as:
$$D(C_1, C_2) = \min_{x \in C_1,\; y \in C_2} d(x, y)$$
where $D(C_1, C_2)$ is the inter-class distance; $C_1$ and $C_2$ are two different classes, with $x \in C_1$ and $y \in C_2$; $d(x, y)$ is the distance between the two accident factors $x$ and $y$, computed with the selected distance formula (for the Euclidean distance, $d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$); $x_i$ and $y_i$ are the total TF-IDF values of accident factors $x$ and $y$ in the $i$-th accident type; and $n$ is the total number of accident types, i.e. the number of feature dimensions of each factor. For example, if the accident reports cover 10 different types of accidents, each factor has 10 feature dimensions.
Step S48: judging whether the inter-class distance is smaller than or equal to the set distance; if so, merging the two classes of candidate safety production accident factors into one class and outputting multiple classes of safety production accident factors; if the inter-class distance is larger than the set distance, the two classes are left unchanged.
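The single-linkage distance and the merge check of steps S47 and S48 can be sketched as follows. The patent only states that two classes within the set distance are merged; repeatedly merging the closest pair until no pair is within the threshold is one reasonable reading and is an assumption here, as are the helper names, the Euclidean metric and the threshold value.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(c1: List[Vector], c2: List[Vector]) -> float:
    """Inter-class distance: the shortest distance between any x in C1 and y in C2."""
    return min(euclidean(x, y) for x in c1 for y in c2)

def merge_close_classes(classes: List[List[Vector]], threshold: float) -> List[List[Vector]]:
    """Merge the closest pair of classes while their single-linkage distance <= threshold."""
    classes = [list(c) for c in classes]
    while len(classes) > 1:
        best = None
        for i in range(len(classes)):
            for j in range(i + 1, len(classes)):
                d = single_linkage(classes[i], classes[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # S48: classes farther apart than the set distance stay separate
        classes[i].extend(classes.pop(j))  # j > i, so index i is unaffected
    return classes

classes = [[[0.0, 0.0]], [[0.1, 0.0]], [[5.0, 5.0]]]
merged = merge_close_classes(classes, threshold=0.5)
print(len(merged))  # 2
```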
The system cannot name the class center points; it only assigns a serial number to each cluster. Therefore, after clustering is finished, each cluster is named manually. The factors fall roughly into 4 categories: unsafe human behavior, environmental factors, unsafe states of objects, and management defects. The specific clustering results are shown in Table 2:
TABLE 2 Classification of safety production accident factors
No. | Factor category | Accident factor | Object striking | Fire | Roof fall and rib spalling | Poisoning and asphyxiation | Burns | Collapse | Vessel explosion
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
1 | Unsafe human behavior | Lack of concentration | 0.76257825 | 0 | 0 | 0 | 0 | 0 | 0
2 | Unsafe human behavior | Safety awareness | 0.76257825 | 0 | 0 | 0 | 0 | 0 | 0
3 | Unsafe human behavior | Carelessness | 0.70710678 | 0 | 0 | 0 | 0 | 0 | 0
4 | Environmental factors | Tools | 0.66693506 | 0 | 0 | 0 | 0 | 0 | 0
5 | Environmental factors | Rolling and falling ore | 0.64689598 | 0 | 0 | 0 | 0 | 0 | 0
6 | Management defects | On-site management | 0.63320746 | 0 | 0 | 0 | 0 | 0 | 0
7 | Management defects | Safety training | 0.61384268 | 0 | 0 | 0 | 0 | 0 | 0
8 | Unsafe state of objects | Roof | 0 | 0 | 1.9408855 | 0 | 0 | 0 | 0
9 | Unsafe state of objects | Drilling machine | 0 | 0 | 1.7084368 | 0 | 0 | 0 | 0
10 | Unsafe state of objects | Bench work | 0 | 0 | 0 | 0 | 0.783691 | 0 | 0
11 | Unsafe state of objects | Measuring tool | 0 | 0 | 0 | 0 | 0.773982 | 0 | 0
12 | Unsafe state of objects | Labor protection articles | 0 | 0 | 0 | 0 | 0.731223 | 0 | 0
In one embodiment, the distance formula is any one of the Euclidean distance, the Manhattan distance, the Chebyshev distance, and the cosine distance.
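The four candidate distance formulas can be written directly. This is a minimal sketch over factor vectors such as the rows of Table 1; the function names are illustrative, and cosine distance is taken as 1 minus cosine similarity, which is the usual convention.

```python
import math
from typing import Sequence

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a: Sequence[float], b: Sequence[float]) -> float:
    return max(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cos(a, b); undefined for an all-zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)

a, b = [3.0, 0.0], [0.0, 4.0]
print(euclidean(a, b), manhattan(a, b), chebyshev(a, b), cosine_distance(a, b))
# 5.0 7.0 4.0 1.0
```

Cosine distance ignores vector length, so two factors concentrated in the same accident types count as close even when their TF-IDF magnitudes differ; the other three metrics are sensitive to magnitude.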
After the accident factor clustering in the previous step, each factor has been abstracted into a higher-level accident factor category. This step mines the association between each accident type and each accident factor, further revealing the main factors behind accident occurrence. Associations between items are generally mined with the Apriori algorithm, which works iteratively, level by level, using the frequent k-itemsets to explore the (k+1)-itemsets. First, the database is scanned to count each item, and the items that meet the minimum support are collected as the set of frequent 1-itemsets, denoted L1; next, L1 is used to find the set of frequent 2-itemsets, L2; L2 is then used to find L3, and so on until no further frequent k-itemsets can be found.
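For reference, the generic level-wise Apriori procedure (L1, then L2 from L1, and so on) can be sketched as follows. The embodiment does not run full Apriori but the simplified TF-IDF > 0 test; this block only illustrates the general algorithm, and the transactions and support threshold are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

def apriori(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset contained in at least min_support transactions."""
    def support(itemset: FrozenSet[str]) -> int:
        return sum(1 for t in transactions if itemset <= t)

    # L1: frequent single items found by one scan of the data.
    items = {frozenset([i]) for t in transactions for i in t}
    current = {s: support(s) for s in items}
    current = {s: c for s, c in current.items() if c >= min_support}
    frequent = dict(current)
    k = 1
    while current:
        # Join step: build (k+1)-item candidates from pairs of frequent k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            # Prune step: every k-subset of a candidate must itself be frequent.
            if len(union) == k + 1 and all(
                frozenset(sub) in current for sub in combinations(union, k)
            ):
                candidates.add(union)
        current = {c: support(c) for c in candidates}
        current = {c: n for c, n in current.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent

transactions = [
    {"roof", "drilling machine"},
    {"roof", "drilling machine", "rule violations"},
    {"roof"},
    {"welding", "catching fire"},
]
frequent = apriori(transactions, min_support=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
# ['drilling machine'] 2
# ['roof'] 3
# ['drilling machine', 'roof'] 2
```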
In the accident cause analysis of the present invention, if the TF-IDF value of an accident factor in a certain accident type is greater than 0, the accident factor and the accident type can be regarded as a frequent itemset, and an association between them is obtained; if the TF-IDF value equals 0, the accident factor does not contribute to the cause of that accident, i.e. no association exists. The TF-IDF matrix is therefore a special case of the Apriori algorithm, namely a simplified frequent itemset: as long as some dimension combination and data aggregation are performed during processing, the associations between accident types and accident factors can be obtained. Accordingly, the method of the invention further comprises the following steps:
Step S5: judging whether the total TF-IDF value of each accident factor in a given class of safety production accident factors is greater than zero; if it is greater than zero, an association exists between the accident factor and the corresponding type of safety production accident; if it equals zero, no association exists between them.
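Step S5 reduces to a threshold test on the total TF-IDF matrix. A minimal sketch, assuming the totals are held in a nested dictionary keyed by accident type and then by accident factor (a hypothetical layout; the category names used in the example are placeholders because the table's column headers are not given here):

```python
from typing import Dict, List, Tuple


def factor_associations(totals: Dict[str, Dict[str, float]]) -> List[Tuple[str, str]]:
    """Return (accident_type, factor) pairs whose total TF-IDF value is > 0."""
    pairs: List[Tuple[str, str]] = []
    for accident_type, factor_scores in totals.items():
        for factor, score in factor_scores.items():
            if score > 0:
                # Greater than zero: the factor is associated with this accident type
                pairs.append((accident_type, factor))
            # Equal to zero: the factor does not form a cause of this accident type
    return pairs
```

For example, factor_associations({"type_1": {"carelessness": 0.70710678, "roof": 0.0}, "type_3": {"carelessness": 0.0, "roof": 1.9408855}}) returns [("type_1", "carelessness"), ("type_3", "roof")].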
Step S6: constructing an accident factor analysis tree based on the association between each accident factor and each type of safety production accident.
Step S7: drawing the accident factor analysis tree with the Graphviz tool to visually display the root causes of accidents and their combinations, thereby providing a basis for enterprise safety production and the prevention of potential risks, as shown in Fig. 2.
Fig. 3 is a structural diagram of a safety production accident factor analysis system according to an embodiment of the present invention. As shown in Fig. 3, the present invention further provides a safety production accident factor analysis system, the system comprising:
the extraction module 1 is used for extracting a plurality of accident factors based on the enterprise safety production accident information;
a total TF-IDF value determining module 2, which is used for determining the total TF-IDF value of each accident factor in each accident category based on a Bayesian classifier;
the selecting module 3 is used for selecting, in each accident category, the top N accident factors ranked by total TF-IDF value as clustering samples, wherein N is a positive integer greater than or equal to 1;
and the clustering module 4 is used for clustering the clustering samples through a K-Means algorithm to obtain multiple types of safety production accident factors.
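The selecting module's top-N rule can be sketched as below. It assumes the same hypothetical {category: {factor: total TF-IDF}} layout and takes the union of each category's top N factors as the clustering sample; how the invention handles ties or factors that rank highly in several categories is not specified, so the de-duplication here is an assumption.

```python
import heapq
from typing import Dict, List


def top_n_factors(totals: Dict[str, Dict[str, float]], n: int) -> List[str]:
    """Union of the top-n factors (by total TF-IDF) from every accident category."""
    if n < 1:
        raise ValueError("N must be a positive integer >= 1")
    selected: List[str] = []
    for factor_scores in totals.values():
        # heapq.nlargest over the dict's keys, ranked by each factor's score
        for factor in heapq.nlargest(n, factor_scores, key=factor_scores.get):
            if factor not in selected:  # keep each factor only once
                selected.append(factor)
    return selected
```
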
The various modules are discussed in detail below:
as an alternative embodiment, the extraction module 1 of the present invention includes:
the extraction unit is used for extracting enterprise safety production accident information from the database;
the word segmentation processing unit is used for carrying out word segmentation processing on the enterprise safety production accident information to obtain a word segmentation set; the word segmentation set comprises a plurality of keywords;
and the filtering unit is used for filtering the keywords in the word segmentation set to obtain a plurality of keywords related to accident information, and taking the keywords related to the accident information as a plurality of accident factors.
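The filtering unit's criteria are not spelled out, so the following is only a plausible sketch. It assumes word segmentation has already been done upstream (for Chinese text, a segmenter such as jieba's lcut function is a common choice) and that a stop-word list is available; the length threshold and the letters-present check are hypothetical filters.

```python
from typing import Iterable, List, Set


def filter_keywords(tokens: Iterable[str], stopwords: Set[str], min_len: int = 2) -> List[str]:
    """Keep tokens that plausibly name accident factors, in first-seen order."""
    kept: List[str] = []
    seen: Set[str] = set()
    for tok in tokens:
        tok = tok.strip()
        # Drop very short tokens, stop words, and tokens with no letters
        # (punctuation, numbers); str.isalpha() is also True for CJK characters
        if len(tok) < min_len or tok in stopwords or not any(ch.isalpha() for ch in tok):
            continue
        if tok not in seen:  # de-duplicate repeated keywords
            seen.add(tok)
            kept.append(tok)
    return kept
```
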
As an optional implementation manner, the clustering module 4 of the present invention includes:
an initial class center point determining unit, configured to randomly extract K accident factors from the cluster samples as initial class center points;
the distance determining unit is used for calculating the distance from each accident factor to each class center point by using a distance formula;
the distribution unit is used for distributing each accident factor to the nearest class center point according to the minimum distance principle to form K classes and recalculating the class center points of the classes;
the adding processing unit is used for incrementing the iteration count by one;
a first judgment unit for judging whether the iteration count is greater than or equal to the maximum number of iterations; if so, outputting K candidate classes of safety production accident factors; if the iteration count is less than the maximum number of iterations, returning to the step of calculating the distance from each accident factor to each class center point by using the distance formula; or, for judging whether the class center points of the classes have changed; if they have changed, returning to the step of calculating the distance from each accident factor to each class center point by using the distance formula; if they have not changed, outputting K candidate classes of safety production accident factors;
the inter-class distance determining unit is used for determining the inter-class distance between different candidate classes of safety production accident factors according to the accident factors they contain;
a second judging unit, configured to judge whether the inter-class distance is less than or equal to a set distance; if so, merging the two candidate classes into one class of safety production accident factors and outputting multiple classes of safety production accident factors; if the inter-class distance is greater than the set distance, no processing is needed.
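The clustering module's loop, including both termination options of the first judgment unit and the class merging of the second judging unit, can be sketched as follows. Assumptions: Euclidean distance, the inter-class distance taken as the distance between class centroids (the text does not fix this), an optional init argument added only so results can be reproduced, and all names hypothetical. The two stopping rules are combined here, so the loop ends at whichever is met first.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = List[float]


def _euclid(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _centroid(points: List[Vector]) -> Vector:
    # Coordinate-wise mean of the points in one class
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans_merge(samples: List[Vector], k: int, max_iter: int = 100,
                 merge_dist: float = 0.0, seed: int = 0,
                 init: Optional[List[int]] = None) -> List[List[int]]:
    """Cluster samples into at most k classes, then merge close classes.
    Returns each class as a list of sample indices."""
    # Initial class centre points: k randomly drawn samples (or given indices)
    idx = init if init is not None else random.Random(seed).sample(range(len(samples)), k)
    if len(idx) != k:
        raise ValueError("init must contain exactly k indices")
    centres = [list(samples[i]) for i in idx]
    classes: List[List[int]] = []
    for _ in range(max_iter):  # rule 1: stop at the maximum iteration count
        # Assign every factor to its nearest centre (minimum-distance principle)
        classes = [[] for _ in centres]
        for i, p in enumerate(samples):
            nearest = min(range(k), key=lambda c: _euclid(p, centres[c]))
            classes[nearest].append(i)
        # Recompute centres; an empty class keeps its old centre
        new_centres = [_centroid([samples[i] for i in cl]) if cl else centres[c]
                       for c, cl in enumerate(classes)]
        if new_centres == centres:  # rule 2: stop once the centres no longer change
            break
        centres = new_centres
    # Merge step: join any two classes whose centroids are within merge_dist
    groups = [cl for cl in classes if cl]
    merged = True
    while merged:
        merged = False
        cents = [_centroid([samples[i] for i in g]) for g in groups]
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                if _euclid(cents[a], cents[b]) <= merge_dist:
                    groups[a] = groups[a] + groups[b]
                    del groups[b]
                    merged = True
                    break
            if merged:
                break
    return groups
```

On the ten rows numbered 3 to 12 in the table above (indices 0 to 9) with init=[0, 5, 7], the sketch returns [[0, 1, 2, 3, 4], [5, 6], [7, 8, 9]]. With merge_dist=1.1 the first and third classes, whose centroids are about 1.005 apart, are merged, giving [[0, 1, 2, 3, 4, 7, 8, 9], [5, 6]].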
As an alternative embodiment, the total TF-IDF value determining module 2 of the present invention includes:
the construction unit is used for constructing a Bayesian classifier based on the naive Bayes algorithm;
a TF-IDF value determination unit for calculating TF-IDF values of the accident factors by using the Bayesian classifier;
and the total TF-IDF value determining unit is used for adding TF-IDF values of the same accident factor in the same accident category to determine the total TF-IDF value of each accident factor in each accident category.
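The summation performed by the total TF-IDF value determining unit can be sketched as below. How the Bayesian classifier participates in computing TF-IDF is not detailed in the text, so this sketch assumes each accident report already carries an accident category (for example, one assigned by a naive Bayes classifier, which is not shown) and uses the plain tf * log(N / df) weighting; the function name and data layout are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def total_tfidf(records: List[Tuple[str, List[str]]]) -> Dict[str, Dict[str, float]]:
    """records: (accident_category, factor tokens) pairs, one per accident report.
    Returns {category: {factor: summed TF-IDF}}."""
    n_docs = len(records)
    # Document frequency: in how many reports each factor appears
    df: Counter = Counter()
    for _, tokens in records:
        df.update(set(tokens))
    totals: Dict[str, Dict[str, float]] = defaultdict(dict)
    for category, tokens in records:
        if not tokens:
            continue
        tf = Counter(tokens)
        for factor, count in tf.items():
            # TF-IDF of this factor in this report: (count / report length) * log(N / df)
            score = (count / len(tokens)) * math.log(n_docs / df[factor])
            # Add values of the same factor within the same accident category
            totals[category][factor] = totals[category].get(factor, 0.0) + score
    return {c: dict(f) for c, f in totals.items()}
```

Under this weighting a factor that appears in every report gets an IDF of log(1) = 0 and hence a total of 0, which under step S5 would mean it is not associated with any accident type.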
As an optional implementation, the system of the present invention further includes:
the judging module is used for judging whether the total TF-IDF value of each accident factor in a given class of safety production accident factors is greater than zero; if it is greater than zero, an association exists between the accident factor and the corresponding type of safety production accident; if it equals zero, no association exists between them;
the construction module is used for constructing an accident factor analysis tree based on the association between each accident factor and each type of safety production accident;
and the drawing module is used for drawing the accident factor analysis tree by using the Graphviz tool.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (10)

1. A safety production accident factor analysis method, characterized in that the method comprises:
extracting a plurality of accident factors based on the enterprise safety production accident information;
determining a total TF-IDF value of each accident factor in each accident category based on a Bayesian classifier;
selecting, in each accident category, the top N accident factors ranked by total TF-IDF value as clustering samples, wherein N is a positive integer greater than or equal to 1;
and clustering the clustering samples through a K-Means algorithm to obtain multiple types of safety production accident factors.
2. The safety production accident factor analysis method according to claim 1, wherein extracting a plurality of accident factors based on the enterprise safety production accident information comprises:
extracting enterprise safety production accident information from a database;
performing word segmentation processing on the enterprise safety production accident information to obtain a word segmentation set; the word segmentation set comprises a plurality of keywords;
and filtering the keywords in the word segmentation set to obtain a plurality of keywords related to accident information, and taking the keywords related to the accident information as a plurality of accident factors.
3. The safety production accident factor analysis method according to claim 1, wherein clustering the clustering samples through the K-Means algorithm to obtain multiple classes of safety production accident factors comprises:
randomly extracting K accident factors from the clustering samples to serve as initial class center points;
calculating the distance from each accident factor to each class center point by using a distance formula;
distributing each accident factor to the nearest class center point according to the minimum distance principle to form K classes, and recalculating the class center points of the classes;
incrementing the iteration count by one;
judging whether the iteration count is greater than or equal to the maximum number of iterations; if so, outputting K candidate classes of safety production accident factors; if the iteration count is less than the maximum number of iterations, returning to the step of calculating the distance from each accident factor to each class center point by using the distance formula;
or, judging whether the class center points of the classes have changed; if they have changed, returning to the step of calculating the distance from each accident factor to each class center point by using the distance formula; if they have not changed, outputting K candidate classes of safety production accident factors;
determining the inter-class distance between different candidate classes of safety production accident factors according to the accident factors they contain;
judging whether the inter-class distance is less than or equal to a set distance; if so, merging the two candidate classes into one class of safety production accident factors and outputting multiple classes of safety production accident factors; if the inter-class distance is greater than the set distance, no processing is needed.
4. The safety production accident factor analysis method according to claim 3, wherein the distance formula is any one of the Euclidean distance formula, the Manhattan distance formula, the Chebyshev distance formula and the cosine distance formula.
5. The safety production accident factor analysis method of claim 1, wherein the determining the total TF-IDF value of each accident factor in each accident category based on a Bayesian classifier comprises:
constructing a Bayes classifier based on a naive Bayes algorithm;
calculating TF-IDF values of the accident factors by using the Bayesian classifier;
the TF-IDF values of the same accident factor in the same accident category are added to determine the total TF-IDF value of each accident factor in each accident category.
6. The safety production accident factor analysis method of claim 1, wherein the method further comprises:
judging whether the total TF-IDF value of each accident factor in a given class of safety production accident factors is greater than zero; if it is greater than zero, an association exists between the accident factor and the corresponding type of safety production accident; if it equals zero, no association exists between them;
constructing an accident factor analysis tree based on the association between each accident factor and each type of safety production accident;
and drawing the accident factor analysis tree by using the Graphviz tool.
7. A safety production accident factor analysis system, the system comprising:
the extraction module is used for extracting a plurality of accident factors based on the enterprise safety production accident information;
the total TF-IDF value determining module is used for determining the total TF-IDF value of each accident factor in each accident category based on a Bayesian classifier;
the selecting module is used for selecting, in each accident category, the top N accident factors ranked by total TF-IDF value as clustering samples, wherein N is a positive integer greater than or equal to 1;
and the clustering module is used for clustering the clustering samples through a K-Means algorithm to obtain multiple types of safety production accident factors.
8. The safety production accident factor analysis system of claim 7, wherein the extraction module comprises:
the extraction unit is used for extracting enterprise safety production accident information from the database;
the word segmentation processing unit is used for carrying out word segmentation processing on the enterprise safety production accident information to obtain a word segmentation set; the word segmentation set comprises a plurality of keywords;
and the filtering unit is used for filtering the keywords in the word segmentation set to obtain a plurality of keywords related to accident information, and taking the keywords related to the accident information as a plurality of accident factors.
9. The safety production accident factor analysis system of claim 7, wherein the clustering module comprises:
an initial class center point determining unit, configured to randomly extract K accident factors from the cluster samples as initial class center points;
the distance determining unit is used for calculating the distance from each accident factor to each class center point by using a distance formula;
the distribution unit is used for distributing each accident factor to the nearest class center point according to the minimum distance principle to form K classes and recalculating the class center points of the classes;
the adding processing unit is used for incrementing the iteration count by one;
a first judgment unit for judging whether the iteration count is greater than or equal to the maximum number of iterations; if so, outputting K candidate classes of safety production accident factors; if the iteration count is less than the maximum number of iterations, returning to the step of calculating the distance from each accident factor to each class center point by using the distance formula; or, for judging whether the class center points of the classes have changed; if they have changed, returning to the step of calculating the distance from each accident factor to each class center point by using the distance formula; if they have not changed, outputting K candidate classes of safety production accident factors;
the inter-class distance determining unit is used for determining the inter-class distance between different candidate classes of safety production accident factors according to the accident factors they contain;
a second judging unit, configured to judge whether the inter-class distance is less than or equal to a set distance; if so, merging the two candidate classes into one class of safety production accident factors and outputting multiple classes of safety production accident factors; if the inter-class distance is greater than the set distance, no processing is needed.
10. The safety production accident factor analysis system of claim 7, wherein the total TF-IDF value determination module comprises:
the construction unit is used for constructing a Bayesian classifier based on the naive Bayes algorithm;
a TF-IDF value determination unit for calculating TF-IDF values of the accident factors by using the Bayesian classifier;
and the total TF-IDF value determining unit is used for adding TF-IDF values of the same accident factor in the same accident category to determine the total TF-IDF value of each accident factor in each accident category.
CN202010210408.7A 2020-03-24 2020-03-24 Safety production accident factor analysis method and system Pending CN111091311A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010210408.7A CN111091311A (en) 2020-03-24 2020-03-24 Safety production accident factor analysis method and system

Publications (1)

Publication Number Publication Date
CN111091311A true CN111091311A (en) 2020-05-01

Family

ID=70400648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010210408.7A Pending CN111091311A (en) 2020-03-24 2020-03-24 Safety production accident factor analysis method and system

Country Status (1)

Country Link
CN (1) CN111091311A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107358344A (en) * 2017-06-29 2017-11-17 Zhejiang Topinfo Technology Co., Ltd. Enterprise hidden danger management method and management system thereof, electronic device and storage medium
CN110532298A (en) * 2019-08-07 2019-12-03 Beijing Jiaotong University Multi-attribute railway accident cause weight analysis method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
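Wang Jingchun et al.: "Risk analysis of highway tunnel construction based on an improved K-Means clustering model and its application", Journal of Highway and Transportation Research and Development *
Tan Zhanglu et al.: "Research on mining coal mine safety hazard types based on text clustering", China Safety Science Journal *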

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112364627A (en) * 2020-10-23 2021-02-12 Beijing University of Civil Engineering and Architecture Safety production accident analysis method and device based on text mining, electronic equipment and storage medium
CN112364627B (en) * 2020-10-23 2023-07-25 Beijing University of Civil Engineering and Architecture Text mining-based safety production accident analysis method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN107977798B (en) Risk assessment method for quality of electronic commerce product
Chokor et al. Analyzing Arizona OSHA injury reports using unsupervised machine learning
Fan et al. Retrieving similar cases for alternative dispute resolution in construction accidents using text mining techniques
US7155668B2 (en) Method and system for identifying relationships between text documents and structured variables pertaining to the text documents
US20110137906A1 (en) Systems and methods for detecting sentiment-based topics
EP3447663A1 (en) System and method for event profiling
CN103577404B (en) A novel microblog-oriented incident discovery method
CN111950932A (en) Multi-source information fusion-based comprehensive quality portrait method for small and medium-sized micro enterprises
CN113722417B (en) Power system violation management method and device and power equipment
Wang et al. Identifying high-frequency–low-severity construction safety risks: an empirical study based on official supervision reports in Shanghai
Edwards et al. Decision making for risk management: a comparison of graphical methods for presenting quantitative uncertainty
SV et al. An analysis of attitude of general public toward COVID-19 crises–sentimental analysis and a topic modeling study
CN115544272A (en) Attention mechanism-based chemical accident cause knowledge graph construction method
CN114817681B (en) Financial wind control system based on big data analysis and management equipment thereof
Recal et al. Comparison of machine learning methods in predicting binary and multi-class occupational accident severity
CN111091311A (en) Safety production accident factor analysis method and system
KR102077923B1 (en) Method for classifying safety document on construction site and Server for performing the same
Berkin et al. Feasibility analysis of machine learning for performance-related attributional statements
Macedo et al. Identifying low-quality patterns in accident reports from textual data
CN102915315A (en) Method and system for classifying webpages
Rupasinghe et al. Understanding construction site safety hazards through open data: text mining approach
Al-Obeidat et al. Twitter sentiment analysis to understand students' perceptions about online learning during the Covid'19
Bügel et al. Multilingual analysis of twitter news in support of mass emergency events
Ge et al. Research on enterprise hidden danger association rules based on text analysis
CN113221556A (en) Method, device and equipment for identifying potential safety hazard

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200501