CN115099586A - Method and device for identifying operation risk - Google Patents
- Publication number: CN115099586A
- Application number: CN202210657311.XA
- Authority: CN (China)
- Legal status: Pending (status as listed by Google Patents; not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
Abstract
The embodiments of the disclosure disclose a method and a device for identifying job (operation) risks. After historical job ticket data are obtained, the data are cleaned according to preset data cleaning rules; the cleaned job ticket data are preprocessed and feature content is extracted from the preprocessed data; and the extracted feature content is clustered into groups, and risk prediction entries are determined from the data in each group, yielding a model for job risk identification. With this model, risk identification entries specific to different job tickets can be generated, which makes the risk entries more targeted and improves the accuracy of risk assessment. This addresses the high error rate, low efficiency, and lack of specificity of the related art, which relies on manual identification or on fixed templates.
Description
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a method and an apparatus for job risk identification.
Background
At present, in the job management systems of various industries, such as those for hazardous chemicals, job risks are mainly identified by experienced job management personnel. For routine jobs, these personnel identify risks through on-site observation and their own knowledge and experience, in combination with the relevant management specifications. The identified risks are made into templates and entered one by one into the job management system, and in subsequent jobs these manually determined, fixed templates are imported for risk analysis and identification.
This approach has two drawbacks. First, because risks are identified by job managers relying on field observation, personal knowledge and experience, and management standards, the result depends heavily on individual experience, and the risks finally identified by different people may differ considerably even for the same type of job. Second, fixed risk identification templates cannot adapt to jobs in different environments; such templates usually cover a broad scope with little specificity, so risks are easily missed or misidentified.
Disclosure of Invention
The main purpose of the present disclosure is to provide a method and apparatus for job risk identification.
To achieve the above object, according to a first aspect of the present disclosure, a method for job risk identification is provided, including: after historical job ticket data are obtained, cleaning the job ticket data according to preset data cleaning rules; preprocessing the cleaned job ticket data and extracting feature content from the preprocessed data; and clustering the extracted feature content into groups and determining risk prediction entries from the data of each resulting group, thereby obtaining a model for job risk identification.
Optionally, the method further includes: after new job ticket data are received, inputting the new job ticket data into the model for job risk identification and outputting the risk prediction entries corresponding to the new job ticket data.
Optionally, determining the risk prediction entries from the data of each clustered group includes: taking the risk identification entries shared by all job tickets in a group as fixed risk prediction entries; and taking the risk identification entries that are not shared by all job tickets in the group as optional risk prediction entries, where each job ticket record may contain risk identification entries.
Optionally, clustering the extracted feature content into groups includes: treating the feature content of each job ticket as a code group and clustering the code groups based on the Hamming distance; and performing secondary clustering on the resulting data points with a hierarchical clustering algorithm, based on extracted preset feature content.
Optionally, preprocessing the cleaned job ticket data includes: for the main ticket data in the job ticket data, extracting preset keywords and calculating their word frequencies; for the preset (special) job data in the job ticket data, building, from the job types and counts associated with each main ticket, a first report in the form of a sparse matrix labeled by job ticket ID; and for the entry record data in the job ticket data, building, from the risk identification entries associated with each main ticket, a second report in the form of a sparse matrix labeled by job ticket ID.
Optionally, extracting feature content from the preprocessed data includes: for the main ticket data, extracting the data of preset dimensions contained in the main ticket and the keywords whose word frequency is greater than K, and taking the extracted data and keywords as first feature content; for the preset job data, merging similar jobs in the first report to obtain second feature content; and associating the first and second feature content by job ticket ID to obtain a clustering dataset of the job ticket data.
Optionally, determining the risk prediction entries from the data of each clustered group includes: associating each job ticket with its preprocessed risk identification entries via the job ticket IDs in each group; computing the intersection and the union of the risk identification entries associated with all job tickets in each group; taking the risk identification entries in the intersection as fixed risk prediction entries; and taking the difference between the union and the intersection, and treating the risk identification entries in that difference as optional risk prediction entries.
According to a second aspect of the present disclosure, an apparatus for job risk identification is provided, including: a data cleaning unit configured to, after historical job ticket data are acquired, clean the job ticket data according to preset data cleaning rules; a feature extraction unit configured to preprocess the cleaned job ticket data and extract feature content from the preprocessed data; and a clustering unit configured to cluster the extracted feature content into groups, determine risk prediction entries from the data of each clustered group, and thereby obtain a model for job risk identification.
According to a third aspect of the present disclosure, a computer-readable storage medium is provided, which stores computer instructions for causing a computer to execute the method for job risk identification according to any one of the implementations of the first aspect.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to cause the at least one processor to perform the method for job risk identification according to any one of the implementations of the first aspect.
In the method and apparatus for job risk identification of the embodiments of the present disclosure, after historical job ticket data are acquired, they are cleaned according to preset data cleaning rules; the cleaned data are preprocessed and feature content is extracted from them; and the extracted feature content is clustered into groups, from whose data risk prediction entries are determined, yielding a model for job risk identification. With this model, risk identification entries specific to different job tickets can be generated, which makes the risk entries more targeted and improves the accuracy of risk assessment, thereby addressing the high error rate, low efficiency, and lack of specificity of manual identification or fixed templates in the related art.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present disclosure, and other drawings can be obtained according to these drawings by those skilled in the art without creative efforts.
FIG. 1 illustrates a method for job risk identification according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a hierarchical clustering algorithm according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those skilled in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only some embodiments of the present disclosure, not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances, so that the embodiments of the present disclosure described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
The chemical industry is an important part of China's modern industrial development. However, as chemical processes continue to develop, chemical technologies grow more complex, and chemical production carries serious potential safety hazards. To significantly improve the reliability and stability of chemical process production and operation, and to guard against the main hazardous and harmful factors in a chemical process system, an enterprise's safety management personnel must carefully analyze and evaluate the accidents these factors could cause and accurately identify the enterprise's risks. Process safety accidents such as fire, explosion, and chemical leakage can then be prevented through engineering and safety management measures, the risk identification capability of all personnel is improved, and stable chemical production is ensured. Risk identification is therefore the foundation of safe and stable enterprise operation.
Field conditions in the actual production operations of chemical enterprises are complex, and operators differ in professional knowledge, experience, character, sense of responsibility, mental state, and so on. As a result, risk identification for similar jobs in the same environment varies greatly, risks in a job may be insufficiently identified, and accidents can occur. How effectively a job management system identifies the risks in a job is therefore key to improving its safety and reliability.
According to an embodiment of the present disclosure, a method and an apparatus for job risk identification are provided. As shown in FIG. 1, the method includes the following steps 101 to 103:
Step 101: after the historical job ticket data are acquired, clean the job ticket data according to preset data cleaning rules.
In this embodiment, job ticket data may be exported as historical job ticket data from the enterprise's internal electronic job ticket data management system. They include, but are not limited to, job content, job status, ticket number, job area, associated equipment, owning department, job unit, job-related personnel, planned job time, actual job time, associated special job types, risk identification entries, and other related information. To ensure a good final clustering result, the number of valid job ticket samples should be confirmed to exceed N (for example, 1000) before the historical job tickets are imported; that is, at least N historical records of completed job tickets are required. For example, if an enterprise generates M tickets per day (for example, 20), its electronic job ticket management system needs to run stably for about N/M days (50 days with these example values) before a system supporting the method of this embodiment can be used.
The content of the job ticket data is shown in Table 1 below; the special jobs in it are the preset jobs referred to later in this embodiment:
TABLE 1
Further, after the historical job ticket data is acquired, the job ticket data may be scrubbed to remove erroneous job tickets, invalid job tickets, and unfinished job tickets.
Cleaning the job tickets may include cleaning the main ticket data separately, with the following steps: removing deleted (erroneous) job ticket information according to the deletion flag (0: normal; 1: deleted); removing job tickets that have not been accepted (that is, not completed) according to the job ticket status; and removing test tickets and invalid tickets by filtering on the string length of the job content and on keywords. For example, job tickets whose content is shorter than m characters (for example, 4) or contains keywords such as "test" or "trial" are treated as test or invalid tickets and removed.
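The main-ticket cleaning rules above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the `MainTicket` record, its field names, the status value `"accepted"`, and the keyword list are hypothetical stand-ins for whatever the job ticket system actually stores.

```python
from dataclasses import dataclass


@dataclass
class MainTicket:
    # Hypothetical main-ticket record; field names are illustrative only.
    ticket_id: str
    content: str   # job content text
    deleted: int   # 0 = normal, 1 = deleted
    status: str    # job ticket status


MIN_CONTENT_LEN = 4                   # m in the text
INVALID_KEYWORDS = ("test", "trial")  # assumed keyword list
COMPLETED_STATUS = "accepted"         # assumed status value for completed tickets


def clean_main_tickets(tickets):
    """Keep only normal, accepted, non-test main tickets."""
    kept = []
    for t in tickets:
        if t.deleted == 1:
            continue  # deleted (erroneous) ticket
        if t.status != COMPLETED_STATUS:
            continue  # not accepted, i.e. not completed
        text = t.content.strip()
        if len(text) < MIN_CONTENT_LEN:
            continue  # content too short to describe a real job
        if any(k in text.lower() for k in INVALID_KEYWORDS):
            continue  # test or invalid ticket
        kept.append(t)
    return kept
```

Plain substring matching can over-match (for example, "latest" contains "test"); a real rule set would likely need a more careful keyword list.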
Cleaning the special job data may include: associating each special job record with its main ticket by job ticket ID and taking the intersection of the special job data and the main ticket data, which removes ordinary job ticket data and deleted special ticket data.
Cleaning the entry record data may include: associating the entry records with the main tickets by job ticket ID and taking the intersection, and, based on that intersection, removing the entry information associated with deleted job tickets.
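A minimal sketch of the intersection-based cleaning of special job and entry records, assuming each record is a dict carrying a `ticket_id` and, for special tickets, an optional `deleted` flag of its own (both hypothetical field names). Keeping only the records whose ticket ID survives main-ticket cleaning is equivalent to taking the intersection on ticket ID.

```python
def clean_child_records(records, valid_ticket_ids):
    """Keep child records (special jobs or risk entries) whose ticket ID is among
    the cleaned main tickets and which are not themselves flagged as deleted."""
    valid = set(valid_ticket_ids)
    return [
        r for r in records
        if r["ticket_id"] in valid and r.get("deleted", 0) == 0
    ]
```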
Step 102: preprocess the cleaned job ticket data and extract feature content from the preprocessed data.
In this embodiment, the cleaned main ticket data, special job data, and entry record data may be preprocessed, and features may be extracted from the preprocessed content.
As an optional implementation of this embodiment, preprocessing the cleaned job ticket data includes: for the main ticket data in the job ticket data, extracting preset keywords and calculating their word frequencies; for the preset (special) job data in the job ticket data, building, from the job types and counts associated with each main ticket, a first report in the form of a sparse matrix labeled by job ticket ID; and for the entry record data in the job ticket data, building, from the risk identification entries associated with each main ticket, a second report in the form of a sparse matrix labeled by job ticket ID.
In this optional implementation, preprocessing the main ticket data may include extracting job characteristic keywords from the job content and calculating their word frequencies. The job characteristic keywords form a set of action words commonly used in job content, such as installation, maintenance, removal, heat insulation, lubrication, replacement, building, pouring, and the like.
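The keyword word-frequency step might look like the sketch below. It assumes substring matching on the job content and counts each keyword at most once per ticket (a document-frequency reading of "word frequency"; counting total occurrences would be an equally plausible reading). The English keyword list is a hypothetical stand-in for the action words listed above.

```python
from collections import Counter

# Hypothetical job-characteristic keyword list (English stand-ins).
JOB_KEYWORDS = ["install", "maintain", "remove", "insulate",
                "lubricate", "replace", "build", "pour"]


def keyword_frequencies(contents, keywords=JOB_KEYWORDS):
    """Count, for each keyword, how many job-content strings contain it."""
    freq = Counter()
    for text in contents:
        lowered = text.lower()
        for kw in keywords:
            if kw in lowered:
                freq[kw] += 1  # at most once per ticket
    return freq
```

`Counter` returns 0 for keywords that never appear, so later thresholding by K needs no special case.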
Preprocessing the special job data (preset job data) may include: pivoting the special job data by special job type, aggregating counts by job ticket ID, and counting how many special jobs of each type are associated with the same job ticket. This forms a first report: a sparse-matrix-like table with the job ticket ID as the unique label, special job types as features, and counts as values. For example, if a job ticket is associated with two records, a first-level hot work job and a second-level work-at-height job, then after pivoting it becomes a single record keyed by the job ticket ID, with a feature value of 1 for first-level hot work and 1 for second-level work at height.
Illustratively, the special job types may include hot work (fire operation), confined space work, work at height, lifting (hoisting) work, blind plate plugging work, temporary electricity use, excavation (earth moving) work, road-breaking work, and the like.
For the entry record data in the job ticket data, the entry records are pivoted by risk identification entry and counts are aggregated by job ticket ID, forming a second report: a sparse-matrix-like table with the job ticket ID as the unique label, risk identification entries as features, and counts as values. For example, if three risk identification entry records (poisoning, electric shock, and struck by object) are identified for a job ticket, then after pivoting they become a single record keyed by the job ticket ID, with a feature value of 1 each for poisoning, electric shock, and struck by object.
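Both the first report (special jobs) and the second report (risk identification entries) are the same pivot-and-count operation. A minimal sketch, assuming long-format records as dicts with a `ticket_id` field and a hypothetical value field whose name is passed as `key`; the nested dict acts as a sparse matrix in which missing cells are implicitly 0.

```python
from collections import defaultdict


def pivot_counts(records, key):
    """Pivot long-format records into a sparse-matrix-like report.

    Returns {ticket_id: {value: count}}: one row per job ticket ID,
    one column per distinct value of record[key], counts as cell values.
    """
    report = defaultdict(lambda: defaultdict(int))
    for rec in records:
        report[rec["ticket_id"]][rec[key]] += 1
    # Convert the nested defaultdicts to plain dicts.
    return {tid: dict(cols) for tid, cols in report.items()}


# First report, using the special-job example from the text.
special_jobs = [
    {"ticket_id": "T1", "job_type": "first-level hot work"},
    {"ticket_id": "T1", "job_type": "second-level work at height"},
]
first_report = pivot_counts(special_jobs, "job_type")
# {'T1': {'first-level hot work': 1, 'second-level work at height': 1}}
```

The same call with `key="entry"` on the entry records yields the second report.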
As an optional implementation of this embodiment, extracting feature content from the preprocessed data includes: for the main ticket data, extracting the data of preset dimensions contained in the main ticket and the keywords whose word frequency is greater than K, and taking the extracted data and keywords as first feature content; for the preset job data, merging similar jobs in the first report to obtain second feature content; and associating the first and second feature content by job ticket ID to obtain a clustering dataset of the job ticket data.
In this optional implementation, the feature content of the main ticket data is extracted in two ways. First, the job content, job area, associated equipment, owning department, and actual job time taken directly from the main ticket can be used as feature content. Second, job characteristic keywords whose word frequency exceeds K (for example, 40) are extracted as features, each represented by a state value, and together these form the first feature content. The state value is 0 or 1: 0 indicates that the job ticket content does not contain the keyword, and 1 indicates that it does.
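A sketch of how the first feature content could be assembled for one main ticket, assuming the ticket is a dict and using hypothetical field names for the preset dimensions. Keywords are selected by the word-frequency threshold K and then encoded as 0/1 state values, as described above; K = 40 is the example value from the text.

```python
DIRECT_FIELDS = ("job_content", "job_area", "device", "department", "actual_time")  # hypothetical names


def first_feature_content(ticket, keyword_freq, k=40, direct_fields=DIRECT_FIELDS):
    """Direct main-ticket fields plus a 0/1 flag for every keyword with frequency > k."""
    selected = sorted(kw for kw, f in keyword_freq.items() if f > k)
    features = {name: ticket.get(name) for name in direct_fields}
    text = (ticket.get("job_content") or "").lower()
    for kw in selected:
        # State value: 1 if the job content contains the keyword, else 0.
        features["kw_" + kw] = 1 if kw in text else 0
    return features
```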
Extracting the feature content of the special job data includes merging jobs of the same kind in the first report into a new feature, which is extracted as the second feature content. Illustratively, the feature values of the different levels of hot work, work at height, and lifting work are each combined into one new feature per job type. For example, if a job ticket contains a first-level work-at-height job with a feature value of 1 and a third-level work-at-height job with a feature value of 1, then after merging the ticket contains work at height with a feature value of 2.
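Merging graded special jobs into one feature per job type can be sketched as below. The `BASE_TYPE` mapping from graded job names to base job types is hypothetical; a real system would derive it from its own special job type codes.

```python
# Hypothetical mapping from graded special-job names to their base type.
BASE_TYPE = {
    "first-level hot work": "hot work",
    "second-level hot work": "hot work",
    "first-level work at height": "work at height",
    "third-level work at height": "work at height",
    "first-level lifting": "lifting",
}


def merge_job_levels(row, base_type=BASE_TYPE):
    """Sum the counts of different levels of the same special job type."""
    merged = {}
    for job, count in row.items():
        base = base_type.get(job, job)  # unmapped job types are kept as-is
        merged[base] = merged.get(base, 0) + count
    return merged
```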
Finally, the first feature content and the second feature content can be associated by job ticket ID to obtain the feature dataset for clustering.
Step 103: cluster the extracted feature content into groups and determine risk prediction entries from the data of each group, thereby obtaining a model for job risk identification.
In this embodiment, the feature dataset extracted from the job content is clustered using the Hamming distance (referred to here as the Hamming algorithm) combined with a hierarchical clustering algorithm.
The Hamming distance between two strings of equal length is the number of positions at which the corresponding characters differ; the Hamming distance between strings x and y is denoted d(x, y). It can be computed by taking the exclusive OR of the two strings and counting the 1s in the result. In a set of code groups, the Hamming distance between two codewords is the number of positions at which their symbols differ.
That is,

$$d(x, y) = \sum_{i=0}^{n-1} x[i] \oplus y[i],$$

where x and y are n-bit codes and ⊕ denotes exclusive OR. For example, the distance between 00 and 01 is 1, and the distance between 110 and 101 is 2. In a set of code groups, the minimum Hamming distance between any two codes is called the minimum Hamming distance of the set. The smaller the Hamming distance between two codes, the more similar they are.
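The definition can be checked with a few lines of Python. `hamming` works on equal-length code strings, `hamming_bits` shows the equivalent XOR-and-count form on integers, and `min_hamming` computes the minimum Hamming distance of a code group; the function names are illustrative.

```python
def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length codes differ."""
    if len(x) != len(y):
        raise ValueError("codes must have equal length")
    return sum(a != b for a, b in zip(x, y))


def hamming_bits(x: int, y: int) -> int:
    """Same distance for codes stored as integers: XOR, then count the 1 bits."""
    return bin(x ^ y).count("1")


def min_hamming(codes):
    """Minimum Hamming distance over all pairs in a code group (needs >= 2 codes)."""
    return min(
        hamming(codes[i], codes[j])
        for i in range(len(codes))
        for j in range(i + 1, len(codes))
    )
```

For the examples in the text, `hamming("00", "01")` returns 1 and `hamming("110", "101")` returns 2.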
Hierarchical clustering can either merge small clusters into larger ones from the bottom up or split large clusters into smaller ones from the top down. Taking the bottom-up approach as an example: at each step, the two closest clusters are found and merged into one larger cluster, until all clusters have been merged into a single cluster. The whole process builds a tree structure, similar to FIG. 2. Initially, each data point is its own cluster, and the distance between two such clusters is the distance between the two points. For clusters containing more than one data point, the inter-cluster distance can be defined in several ways. For example, average linkage takes the average of the pairwise distances between the data points of the two clusters, while single linkage and complete linkage take the distance of the closest and the farthest pair of points between the two clusters, respectively.
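The bottom-up procedure and the three linkage rules can be sketched as a naive agglomerative clustering routine. This is an illustrative implementation meant for small inputs, not a production algorithm; in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would normally be used. The function name and its `n_clusters` stopping rule are assumptions.

```python
from itertools import combinations


def agglomerative(points, dist, n_clusters=1, linkage="average"):
    """Naive bottom-up (agglomerative) hierarchical clustering.

    Every point starts as its own cluster; the two closest clusters are merged
    repeatedly until n_clusters remain. Returns the clusters (lists of point
    indices) and the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    if linkage not in ("single", "complete", "average"):
        raise ValueError("linkage must be 'single', 'complete' or 'average'")
    clusters = [[i] for i in range(len(points))]
    history = []

    def cluster_dist(a, b):
        ds = [dist(points[i], points[j]) for i in a for j in b]
        if linkage == "single":
            return min(ds)          # closest pair of points
        if linkage == "complete":
            return max(ds)          # farthest pair of points
        return sum(ds) / len(ds)    # average of all pairwise distances

    while len(clusters) > max(n_clusters, 1):
        # Closest pair of clusters; combinations() yields index pairs with a < b.
        (ia, ib), d = min(
            (((a, b), cluster_dist(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        history.append((clusters[ia], clusters[ib], d))
        merged = clusters[ia] + clusters[ib]
        del clusters[ib]            # delete the larger index first
        del clusters[ia]
        clusters.append(merged)
    return clusters, history
```

Recording the merge distances in `history` is what makes it possible to draw the tree of FIG. 2 or to cut it at a chosen height.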
As an optional implementation of this embodiment, clustering the extracted feature content into groups includes: treating the feature content of each job ticket as a code group and clustering the code groups based on the Hamming distance; and performing secondary clustering on the resulting data points with a hierarchical clustering algorithm, based on extracted preset feature content.
In this optional implementation, for each ticket (distinguished by job ticket ID), the feature content of its special jobs (that is, the preset jobs in this embodiment) and the keywords from the main ticket job content whose word frequency exceeds K are taken together as a code group. The minimum Hamming distances are calculated and compared, and similar jobs are clustered into groups accordingly. Then the feature content of the preset dimensions in the main ticket data (job area, associated equipment, owning department, actual job time, and so on) is introduced, and secondary clustering is performed with the hierarchical algorithm, so that similar jobs are grouped and clustered precisely.
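The text does not specify how the Hamming distances are turned into groups or which linkage the secondary hierarchical step uses. The sketch below makes one concrete, hypothetical choice for both stages: single-linkage clustering cut at a distance threshold, which is equivalent to taking the connected components of the graph whose edges join items at distance ≤ threshold. Stage one uses the Hamming distance between binary codes (special job and keyword flags); stage two uses a simple mismatch count over the preset dimensions. The thresholds and field names are assumptions.

```python
def single_linkage_groups(items, dist, threshold):
    """Single-linkage clustering cut at `threshold`: connected components of the
    graph linking every pair of items whose distance is <= threshold."""
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if dist(items[i], items[j]) <= threshold:
                parent[find(i)] = find(j)  # union the two components

    groups = {}
    for i in range(len(items)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def hamming(x, y):
    """Number of positions at which two equal-length codes differ."""
    return sum(a != b for a, b in zip(x, y))


def mismatch(a, b):
    """Number of preset dimensions (job area, device, ...) whose values differ."""
    return sum(a[k] != b.get(k) for k in a)


def two_stage_cluster(tickets, code_threshold=1, dim_threshold=1):
    """Stage 1: group by Hamming distance of codes. Stage 2: split each group
    by the preset dimensions. Returns a list of groups of ticket IDs."""
    result = []
    codes = [t["code"] for t in tickets]
    for group in single_linkage_groups(codes, hamming, code_threshold):
        members = [tickets[i] for i in group]
        dims = [m["dims"] for m in members]
        for sub in single_linkage_groups(dims, mismatch, dim_threshold):
            result.append([members[i]["id"] for i in sub])
    return result
```

Single linkage is chosen here only because its threshold cut has this simple union-find form; average or complete linkage would need the full merge loop.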
As an optional implementation of this embodiment, determining the risk prediction entries from the data of each clustered group includes: taking the risk identification entries shared by all job tickets in a group as fixed risk prediction entries; and taking the risk identification entries that are not shared by all job tickets in the group as optional risk prediction entries, where each job ticket record may contain risk identification entries.
In this optional implementation, fixed risk identification items cannot be deleted, while optional risk identification items can be selected according to actual job conditions. If an entry the user needs is not output for a cluster group, it can be added manually, and the added entry is automatically updated into the model as a custom item.
As an optional implementation of this embodiment, determining the risk prediction items based on the clustered data of each group includes: associating each piece of job ticket data with the preprocessed risk identification entries based on the job ticket IDs in each group; performing intersection and union operations on the risk identification entries associated with all job ticket data in each group; taking the risk identification entries in the resulting intersection as fixed risk prediction items; and taking the difference between the union and the intersection, and using the risk identification entries in that difference set as optional risk prediction items.
In this optional implementation, after similar jobs are grouped and clustered, each job ID is associated with the preprocessed entry record data. The fixed risk identification prediction items are obtained as the intersection of the risk identification items across the similar jobs; subtracting that intersection from the union of the risk identification items across the similar jobs gives the remaining prediction items, i.e., the recommended risk identification prediction items.
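The intersection/union/difference rule above maps directly onto set operations. A minimal Python sketch, assuming each job ticket ID has already been associated with a set of risk identification item names (the names `predict_items` and `ticket_items` are hypothetical):

```python
from typing import Dict, Iterable, Set, Tuple


def predict_items(group_ticket_ids: Iterable[str],
                  ticket_items: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """For one cluster group, return (fixed, optional) risk identification items.

    fixed    = intersection of the item sets of every ticket in the group
    optional = union minus intersection (the recommended items)
    A ticket ID with no associated items contributes an empty set."""
    item_sets = [ticket_items.get(tid, set()) for tid in group_ticket_ids]
    if not item_sets:
        return set(), set()
    union = set().union(*item_sets)
    fixed = set.intersection(*item_sets)  # always a new set, never an alias
    return fixed, union - fixed
```

For two tickets with items `{fire, gas_detection, ppe}` and `{fire, ppe, isolation}`, the fixed items are `{fire, ppe}` and the optional (recommended) items are `{gas_detection, isolation}`.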
As an optional implementation of this embodiment, after new job ticket data is received, it is input into the model for job risk identification, and the risk prediction items corresponding to the new job ticket data are output.
In this optional implementation, after the user finishes filling in the basic information of a new job ticket and inputs it into the model, the model extracts features from the filled-in ticket content, assigns the ticket to the closest cluster group, and finally outputs the risk identification entries corresponding to that group.
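Prediction for a new ticket can be sketched as a nearest-group lookup. This sketch assumes each trained group is summarized by a representative feature set together with its fixed, optional and custom items; `RiskGroup` and `recommend` are hypothetical names, and the use of a single representative set is an assumption, not something the disclosure specifies. Because binary code groups over a shared vocabulary differ exactly where the feature sets differ, the Hamming distance is computed here as the size of the symmetric difference of the two sets.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class RiskGroup:
    """Hypothetical trained-model entry: one cluster group and its risk items."""
    group_id: int
    representative: Set[str]   # feature set standing in for the group
    fixed_items: Set[str]
    optional_items: Set[str]
    custom_items: Set[str] = field(default_factory=set)  # items added by users


def hamming_sets(a: Set[str], b: Set[str]) -> int:
    # Over a shared vocabulary, the Hamming distance between two binary code
    # groups equals the size of the symmetric difference of the feature sets.
    return len(a ^ b)


def recommend(new_features: Set[str], groups: List[RiskGroup]) -> dict:
    """Assign a new ticket to the closest group and return that group's items."""
    if not groups:
        raise ValueError("model has no groups")
    best = min(groups, key=lambda g: hamming_sets(new_features, g.representative))
    return {
        "group_id": best.group_id,
        "fixed": sorted(best.fixed_items),
        "recommended": sorted(best.optional_items),
        "custom": sorted(best.custom_items),
    }
```

Ties are broken in favor of the group listed first, since `min` returns the first minimal element.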
Furthermore, the model can be corrected over time. As the number of jobs grows, more job feature keywords satisfy the word frequency condition, so the number of features increases; clustering becomes more accurate and yields more cluster types, and prediction accuracy improves accordingly. The model can be upgraded periodically to obtain more accurate prediction results.
In this embodiment, risk identification prediction for similar jobs is achieved by clustering after features of the job content are extracted, and new features of the job content are obtained by extracting feature keywords from the job content and calculating their word frequencies. This remedies the shortcoming that risk identification in the industry relies entirely on the experience of the person in charge of the job, and reduces incomplete risk identification and high identification error rates caused by human factors. It also removes the need to configure a large number of risk identification templates after an enterprise brings a job management system online, which greatly reduces deployment workload and shortens the time to go live. Finally, the model is self-correcting: as the volume of jobs accumulates, the job clusters become finer-grained and the model's predictions become more accurate.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that presented herein.
According to an embodiment of the present disclosure, an apparatus for implementing the above method for job risk identification is also provided, including: a data cleaning unit configured to clean historical job ticket data according to a preset data cleaning rule after the data is acquired; a feature extraction unit configured to preprocess the cleaned job ticket data and extract feature content based on the preprocessed data; and a clustering unit configured to perform group clustering on the extracted feature content, determine risk prediction items based on the clustered data of each group, and thereby obtain a model for job risk identification.
The apparatus further includes a risk item prediction unit configured to, after new job ticket data is received, input the new job ticket data into the model for job risk identification and output the risk prediction items corresponding to the new job ticket data.
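The division into units can be mirrored by a small skeleton in which each unit is an injected callable. This is a hypothetical structure for illustration only: the ticket record shape (`id`, `features`, `items`), the class names and the matching rule in `predict` (smallest symmetric difference between feature sets) are assumptions, and the concrete cleaning rule and clustering algorithm are supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

Ticket = dict  # hypothetical record: {"id": str, "features": set of str, "items": set of str}


@dataclass
class Group:
    ticket_ids: List[str]
    features: Set[str]   # union of member features, used to match new tickets
    fixed: Set[str]      # items shared by every ticket in the group
    optional: Set[str]   # remaining items (recommended)


@dataclass
class JobRiskApparatus:
    clean: Callable[[List[Ticket]], List[Ticket]]              # data cleaning unit
    extract: Callable[[List[Ticket]], Dict[str, Set[str]]]     # feature extraction unit
    cluster: Callable[[Dict[str, Set[str]]], List[List[str]]]  # clustering unit

    def build_model(self, history: List[Ticket]) -> List[Group]:
        cleaned = self.clean(history)
        features = self.extract(cleaned)
        items = {t["id"]: set(t["items"]) for t in cleaned}
        model = []
        for ids in self.cluster(features):  # assumes every group is non-empty
            sets = [items[i] for i in ids]
            fixed = set.intersection(*sets)
            model.append(Group(ids, set().union(*(features[i] for i in ids)),
                               fixed, set().union(*sets) - fixed))
        return model

    @staticmethod
    def predict(model: List[Group], new_features: Set[str]) -> Group:
        # Risk item prediction unit: nearest group by symmetric-difference size.
        return min(model, key=lambda g: len(new_features ^ g.features))
```

For instance, a cleaning unit could be as simple as `lambda ts: [t for t in ts if t["id"]]` (drop tickets without an ID, a hypothetical rule), with the clustering unit plugged in from whichever grouping method is used.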
The two embodiments above enable intelligent, content-based risk identification for hazardous chemical enterprises within a job management system. Jobs of the same type are clustered from the historical job ticket records to form a risk identification prediction model. The prediction model is then used to test the risk identification items of newly applied job tickets, dividing risk identification into fixed items, recommended items and custom items, and correcting the prediction model in real time. The prediction model can also be upgraded and iterated automatically on a regular schedule (or after a set amount of new data) using subsequently accumulated historical data, achieving accurate clustering and improving the accuracy of intelligent risk identification.
An embodiment of the present disclosure provides an electronic device. As shown in fig. 3, the electronic device includes one or more processors 31 and a memory 32; one processor 31 is taken as an example in fig. 3.
The electronic device may further include an input device 33 and an output device 34.
The processor 31, the memory 32, the input device 33 and the output device 34 may be connected by a bus or other means, and fig. 3 illustrates the connection by a bus as an example.
The processor 31 may be a Central Processing Unit (CPU). The processor 31 may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or combinations thereof. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 32, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs and modules, such as the program instructions/modules corresponding to the job risk identification method in the embodiments of the present disclosure. By running the non-transitory software programs, instructions and modules stored in the memory 32, the processor 31 executes the various functional applications and data processing of the server, i.e., implements the methods of the above method embodiments.
The memory 32 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of a processing device operated by the server, and the like. Further, the memory 32 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 32 may optionally include memory located remotely from the processor 31, which may be connected to a network connection device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 33 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the processing device of the server. The output device 34 may include a display device such as a display screen.
One or more modules are stored in the memory 32, which when executed by the one or more processors 31 perform the method as shown in fig. 1.
It will be understood by those skilled in the art that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid state drive (SSD) or the like; the storage medium may also comprise a combination of the above kinds of memories.
Although the embodiments of the present disclosure have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the present disclosure, and such modifications and variations fall within the scope defined by the appended claims.
Claims (10)
1. A method for job risk identification, comprising:
after historical job ticket data is obtained, cleaning the job ticket data according to a preset data cleaning rule;
preprocessing the cleaned job ticket data and extracting feature content based on the preprocessed data;
and performing group clustering on the extracted feature content to determine risk prediction items based on the clustered data of each group, thereby obtaining a model for job risk identification.
2. The method for job risk identification according to claim 1, further comprising:
after receiving new job ticket data, inputting the new job ticket data into the model for job risk identification, and outputting the risk prediction items corresponding to the new job ticket data.
3. The method for job risk identification according to claim 1, wherein determining items for risk prediction based on the clustered data of each group comprises:
taking the risk identification items shared by all job ticket data in each clustered group as fixed risk prediction items;
and taking the risk identification items not shared by all data in each clustered group as optional risk prediction items, wherein each piece of job ticket data may contain risk item content.
4. The method for job risk identification according to claim 1, wherein clustering the extracted feature content in groups comprises:
taking the feature content of each piece of job ticket data as a code group, and clustering the code groups based on a Hamming algorithm;
and performing secondary clustering on the clustered data points with a hierarchical clustering algorithm based on the extracted preset feature content.
5. The method for job risk identification according to claim 1, wherein preprocessing the cleaned job ticket data comprises:
extracting preset keywords from the main ticket data in the job ticket data, and calculating their word frequencies;
for preset job data in the job ticket data, determining, based on the job types and quantities associated with the main ticket and using the ID of the job ticket data as a label, a first report in the form of a sparse-like matrix corresponding to the job ticket data;
and for the entry record data in the job ticket data, determining, based on the risk identification items associated with the main ticket and using the ID of the job ticket data as a label, a second report in the form of a corresponding sparse matrix.
6. The method for job risk identification according to claim 5, wherein feature content extraction based on the preprocessed data comprises:
for the main ticket data, extracting the data of preset dimensions contained in the main ticket data and the keywords whose word frequency is greater than K, and taking the extracted data and keywords as first feature content;
for preset job data, merging and extracting similar jobs in the first report to obtain second feature content;
and associating the first feature content and the second feature content through the ID of the job ticket data to obtain a clustering data set of the job ticket data.
7. The method for job risk identification according to claim 5, wherein said determining items for risk prediction based on clustered data of each group comprises:
associating each piece of job ticket data with the preprocessed risk identification entries based on the job ticket data IDs of each group;
performing intersection and union operations on the risk identification entries associated with all job ticket data in each group;
taking the risk identification entries in the resulting intersection as fixed risk prediction items;
and taking the difference between the resulting union and intersection, and using the risk identification entries in the difference set as optional risk prediction items.
8. An apparatus for job risk identification, comprising:
a data cleaning unit configured to clean job ticket data according to a preset data cleaning rule after historical job ticket data is obtained;
a feature extraction unit configured to preprocess the cleaned job ticket data and extract feature content based on the preprocessed data;
and a clustering unit configured to perform group clustering on the extracted feature content, so as to determine risk prediction items based on the clustered data of each group and obtain a model for job risk identification.
9. A computer-readable storage medium storing computer instructions for causing a computer to perform the method for job risk identification of any one of claims 1-7.
10. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to cause the at least one processor to perform the method for job risk identification of any one of claims 1-7.
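Claims 5 and 6 describe the preprocessing and feature extraction steps. The sketch below is one possible reading of them, not the claimed implementation: keywords are counted by plain substring matching (text in languages written without spaces may need word segmentation first), the "report" is represented as a dense list of lists whose rows are labeled by job ticket ID, and preset-dimension values are encoded as `dimension=value` strings. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def keyword_frequencies(content: str, preset_keywords: Iterable[str]) -> Counter:
    """Word frequency of each preset keyword in the main-ticket job content
    (non-overlapping substring counts; keywords that never occur are omitted)."""
    freqs: Counter = Counter()
    for kw in preset_keywords:
        n = content.count(kw)
        if n:
            freqs[kw] = n
    return freqs


def frequent_keywords(freqs: Counter, k: int) -> Set[str]:
    """Keywords whose word frequency is strictly greater than K."""
    return {kw for kw, n in freqs.items() if n > k}


def first_feature_content(main_ticket: Dict[str, str], dimensions: Sequence[str],
                          keywords: Set[str]) -> Set[str]:
    """Preset-dimension values of the main ticket plus its frequent keywords."""
    return {f"{d}={main_ticket[d]}" for d in dimensions if d in main_ticket} | keywords


def sparse_report(rows: Dict[str, Dict[str, int]]) -> Tuple[List[str], List[str], List[List[int]]]:
    """Ticket-ID-labelled matrix: one row per job ticket ID, one column per job type
    (or risk item); absent combinations are 0, so most cells are typically 0."""
    ticket_ids = sorted(rows)
    columns = sorted({c for r in rows.values() for c in r})
    matrix = [[rows[t].get(c, 0) for c in columns] for t in ticket_ids]
    return ticket_ids, columns, matrix
```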
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210657311.XA CN115099586A (en) | 2022-06-10 | 2022-06-10 | Method and device for identifying operation risk |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115099586A true CN115099586A (en) | 2022-09-23 |
Family
ID=83290857
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210657311.XA Pending CN115099586A (en) | 2022-06-10 | 2022-06-10 | Method and device for identifying operation risk |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115099586A (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104299044A (en) * | 2014-07-01 | 2015-01-21 | 沈阳工程学院 | Clustering-analysis-based wind power short-term prediction system and prediction method |
WO2018008835A1 (en) * | 2016-07-08 | 2018-01-11 | 주식회사 인텔리퀀트 | Risk management method for securities portfolio and risk management device therefor |
CN108153738A (en) * | 2018-02-10 | 2018-06-12 | 灯塔财经信息有限公司 | Chat record analysis method and device based on hierarchical clustering |
CN108491430A (en) * | 2018-02-09 | 2018-09-04 | 北京邮电大学 | Unsupervised hash retrieval method based on clustering of feature directions |
KR20190040847A (en) * | 2017-10-11 | 2019-04-19 | 양경옥 | Method and Apparatus for Safety Information Generation by Reading Daily Job-site Work Reports in Construction and Industrial Project |
CN111291900A (en) * | 2020-03-05 | 2020-06-16 | 支付宝(杭州)信息技术有限公司 | Method and device for training risk recognition model |
CN112257974A (en) * | 2020-09-09 | 2021-01-22 | 北京无线电计量测试研究所 | Gas lock well risk prediction model data set, model training method and application |
CN112581000A (en) * | 2020-12-24 | 2021-03-30 | 广东省电信规划设计院有限公司 | Enterprise risk index calculation method and device |
CN113688169A (en) * | 2021-08-11 | 2021-11-23 | 北京科技大学 | Mine potential safety hazard identification and early warning system based on big data analysis |
CN114066242A (en) * | 2021-11-11 | 2022-02-18 | 北京道口金科科技有限公司 | Enterprise risk early warning method and device |
CN114579791A (en) * | 2022-03-22 | 2022-06-03 | 国网山东省电力公司经济技术研究院 | Construction safety violation identification method and system based on operation ticket |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110351150B (en) | Fault source determination method and device, electronic equipment and readable storage medium | |
CN109522192B (en) | Prediction method based on knowledge graph and complex network combination | |
CN110708204A (en) | Abnormity processing method, system, terminal and medium based on operation and maintenance knowledge base | |
CN110928718A (en) | Exception handling method, system, terminal and medium based on correlation analysis | |
CN112148772A (en) | Alarm root cause identification method, device, equipment and storage medium | |
JP2018045403A (en) | Abnormality detection system and abnormality detection method | |
CN111539493B (en) | Alarm prediction method and device, electronic equipment and storage medium | |
CN109992484B (en) | Network alarm correlation analysis method, device and medium | |
CN114519524A (en) | Enterprise risk early warning method and device based on knowledge graph and storage medium | |
CN112351004A (en) | Computer network based information security event processing system and method | |
CN115686910A (en) | Fault analysis method and device, electronic equipment and medium | |
CN107562558A (en) | The feedback method and system of a kind of error message | |
CN113535458B (en) | Abnormal false alarm processing method and device, storage medium and terminal | |
CN115099586A (en) | Method and device for identifying operation risk | |
CN115687031A (en) | Method, device, equipment and medium for generating alarm description text | |
CN115470034A (en) | Log analysis method, device and storage medium | |
CN114385398A (en) | Request response state determination method, device, equipment and storage medium | |
CN111209158B (en) | Mining monitoring method and cluster monitoring system for server cluster | |
CN111352818B (en) | Application program performance analysis method and device, storage medium and electronic equipment | |
CN114140241A (en) | Abnormity identification method and device for transaction monitoring index | |
CN114266472A (en) | Subway station evacuation risk analysis method based on Spark | |
CN114312930A (en) | Train operation abnormity diagnosis method and device based on log data | |
CN113407495A (en) | SIMHASH-based file similarity determination method and system | |
CN113590825A (en) | Text quality inspection method and device and related equipment | |
CN113778792A (en) | Alarm classification method and system for IT equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||