CN115099586A

CN115099586A - Method and device for identifying operation risk

Info

Publication number: CN115099586A
Application number: CN202210657311.XA
Authority: CN
Inventors: 高健; 李超; 陈诗苑; 杜兴团; 杨林林; 周志杰; 陈浩
Original assignee: Hanghai Yigongtongzhi Information Technology Co ltd
Current assignee: Hanghai Yigongtongzhi Information Technology Co ltd
Priority date: 2022-06-10
Filing date: 2022-06-10
Publication date: 2022-09-23

Abstract

Embodiments of the present disclosure disclose a method and a device for identifying operation risks. The method comprises: after historical job ticket data is obtained, cleaning the job ticket data according to a preset data cleaning rule; preprocessing the cleaned job ticket data so as to extract feature content based on the preprocessed data; and clustering the extracted feature content into groups, so as to determine the items for risk prediction based on the data of each clustered group and obtain a model for job risk identification. With this model, risk identification entries tailored to different job tickets can be generated, which makes the risk entries more targeted and improves the accuracy of risk assessment. This addresses the technical problems of high error rate, low efficiency, and lack of specificity that arise in the related art when risks are identified manually or from a fixed template.

Description

Method and device for identifying operation risk

Technical Field

The present disclosure relates to the field of data processing technologies, and in particular, to a method and an apparatus for job risk identification.

Background

At present, in the operation management systems of various industries, such as those related to hazardous chemicals, operation risks are identified mainly by experienced operation managers. These managers identify the risks of routine operations through on-site observation and their own knowledge and experience, in combination with the relevant management specifications. The identified risks are made into templates and entered one by one into the operation management system; in subsequent operations, these manually determined fixed templates are imported and used for risk analysis and identification.

This approach has two drawbacks. On the one hand, because risk identification relies on an experienced operation manager combining on-site observation and personal knowledge and experience with the relevant management specifications, it is strongly affected by differences in experience between managers, and the risks finally identified may differ considerably even for the same type of operation. On the other hand, a fixed risk identification template cannot cope with operations in different environments: such templates generally have broad coverage but weak specificity, and are prone to missed or incorrect identification.

Disclosure of Invention

The main purpose of the present disclosure is to provide a method and apparatus for job risk identification.

In order to achieve the above object, according to a first aspect of the present disclosure, there is provided a method for job risk identification, including: after historical job ticket data is obtained, cleaning the job ticket data according to a preset data cleaning rule; preprocessing the cleaned job ticket data so as to extract feature content based on the preprocessed data; and clustering the extracted feature content into groups, so as to determine the items for risk prediction based on the data of each clustered group and obtain a model for job risk identification.

Optionally, the method further comprises: after receiving new job ticket data, inputting the new job ticket data into the model for job risk identification, and outputting the items for risk prediction corresponding to the new job ticket data.

Optionally, determining the items for risk prediction based on the clustered data of each group comprises: taking the risk identification entries shared by all the job ticket data in each clustered group as fixed risk prediction items; and taking the risk identification entries that are not shared by all the data in each clustered group as optional risk prediction items, wherein each piece of job ticket data may contain risk entry content.

Optionally, clustering the extracted feature content into groups comprises: taking the feature content of each piece of job ticket data as a code group, and clustering the code groups based on the Hamming distance; and performing secondary clustering on the clustered data points with a hierarchical clustering algorithm, based on the extracted preset feature content.

Optionally, preprocessing the cleaned job ticket data comprises: for the main ticket data in the job ticket data, extracting preset keywords and calculating their word frequency; for the preset job data in the job ticket data, determining, based on the types and number of jobs associated with the main ticket, a first report in the form of a sparse-like matrix labeled by job ticket ID; and for the entry record data in the job ticket data, determining, based on the risk identification entries associated with the main ticket, a second report in the form of a sparse-like matrix labeled by job ticket ID.

Optionally, extracting the feature content based on the preprocessed data comprises: for the main ticket data, extracting the data of preset dimensions contained in the main ticket data and the keywords whose word frequency is greater than K, and taking the extracted data and keywords as first feature content; for the preset job data, merging jobs of the same kind in the first report and extracting them to obtain second feature content; and associating the first feature content and the second feature content through the job ticket ID to obtain a clustering data set of the job ticket data.

Optionally, determining the items for risk prediction based on the clustered data of each group comprises: associating each piece of job ticket data with the preprocessed risk identification entries based on the job ticket IDs in each group; performing an intersection operation and a union operation on the risk identification entries associated with all the job ticket data in each group; taking the entries in the resulting intersection as fixed risk prediction items; and taking the difference between the resulting union and the intersection, and using the entries in the resulting difference set as optional risk prediction items.

According to a second aspect of the present disclosure, there is provided an apparatus for job risk identification, comprising: a data cleaning unit configured to clean the job ticket data according to a preset data cleaning rule after historical job ticket data is acquired; a feature extraction unit configured to preprocess the cleaned job ticket data so as to extract feature content based on the preprocessed data; and a clustering unit configured to cluster the extracted feature content into groups, so as to determine the items for risk prediction based on the clustered data of each group and obtain a model for job risk identification.

According to a third aspect of the present disclosure, a computer-readable storage medium is provided, which stores computer instructions for causing a computer to execute the method for job risk identification according to any one of the implementations of the first aspect.

According to a fourth aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to cause the at least one processor to perform the method for job risk identification according to any one of the implementations of the first aspect.

The method and the apparatus for job risk identification in the embodiments of the present disclosure comprise: after historical job ticket data is acquired, cleaning the job ticket data according to a preset data cleaning rule; preprocessing the cleaned job ticket data so as to extract feature content based on the preprocessed data; and clustering the extracted feature content into groups, so as to determine the items for risk prediction based on the data of each clustered group and obtain a model for job risk identification. With this model, risk identification entries tailored to different job tickets can be generated, which makes the risk entries more targeted and improves the accuracy of risk assessment. This addresses the technical problems of high error rate, low efficiency, and lack of specificity that arise in the related art when risks are identified manually or from a fixed template.

Drawings

In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present disclosure, and other drawings can be obtained according to these drawings by those skilled in the art without creative efforts.

FIG. 1 is a flowchart of a method for job risk identification according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of a hierarchical clustering algorithm according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Description

In order to make the technical solutions of the present disclosure better understood by those skilled in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only some embodiments of the present disclosure, not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the present disclosure may be described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

The chemical industry is an important part of China's modern industrial development. As chemical processes continue to develop, chemical technologies grow more complex, and chemical production carries significant safety hazards. To improve the reliability and stability of chemical process operation and to guard against the main dangerous and harmful factors in a chemical process system, an enterprise's safety managers need to carefully analyze and evaluate the accident scenarios these factors may cause and accurately identify the enterprise's risks. Process safety accidents such as fire, explosion, and chemical leakage are then prevented through engineering measures and safety management measures, the risk identification capability of all personnel is improved, and stable chemical production is ensured. Risk identification is therefore the basis for safe and stable operation of the enterprise.

Field conditions at chemical enterprises during actual production are complex, and operators differ in professional knowledge, experience, character, sense of responsibility, mental state, and the like. As a result, the risks identified in the operating environment can differ greatly even for similar operations, risks are insufficiently identified, and accidents occur. How the operation management system can effectively identify the risks in an operation from the job content is therefore important for improving the safety and reliability of that system.

According to an embodiment of the present disclosure, a method and an apparatus for job risk identification are provided, as shown in fig. 1, the method includes the following steps 101 to 103:

Step 101: after historical job ticket data is acquired, cleaning the job ticket data according to a preset data cleaning rule.

In this embodiment, job ticket data may be exported from the enterprise's internal electronic job ticket management system as historical job ticket data. The data includes, but is not limited to, job content, job status, ticket number, job area, associated equipment, owning department, job unit, job-related personnel, planned job time, actual job time, associated special job types, risk identification entries, and other related information. To ensure the final clustering effect, the number of valid job ticket samples should be confirmed to exceed N (for example, 1000) before the historical job tickets are imported; that is, N historical records of completed job tickets are required. For example, if an enterprise generates about M job tickets per day (for example, 20), its electronic job ticket management system needs to have run stably for roughly N/M days before a system supporting this method can be used.
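As a worked illustration of the N/M estimate, the following minimal sketch computes the number of days of stable operation needed to accumulate the required sample size. The function name `days_until_ready` is hypothetical, and rounding up to whole days is an assumption not stated in the source.

```python
import math


def days_until_ready(required_tickets: int, tickets_per_day: int) -> int:
    """Days of stable operation needed to accumulate `required_tickets` (N)
    when roughly `tickets_per_day` (M) job tickets are generated per day."""
    if tickets_per_day <= 0:
        raise ValueError("tickets_per_day must be positive")
    # Round up: a partial day still has to elapse before N tickets exist.
    return math.ceil(required_tickets / tickets_per_day)


# With the example values from the text, N = 1000 and M = 20:
print(days_until_ready(1000, 20))  # 50
```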

Further, the content of the job ticket data is shown in Table 1 below. The special job referred to there is the preset job mentioned later in this embodiment:

TABLE 1

Further, after the historical job ticket data is acquired, the job ticket data may be scrubbed to remove erroneous job tickets, invalid job tickets, and unfinished job tickets.

Cleaning the job tickets may include cleaning the main ticket data separately, through the following steps: removing deleted (erroneous) job tickets according to the deletion state (0: normal state; 1: deleted state); removing job tickets that have not been accepted (not completed) according to the job ticket status; and removing test tickets or invalid tickets by screening the string length and keywords of the job content. Illustratively, a job ticket whose job content is shorter than m characters (e.g., 4) or contains a keyword such as "test" or "trial" is treated as a test ticket or invalid ticket and removed.
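By way of non-limiting illustration, the three main-ticket cleaning rules may be sketched as follows. The record layout (`MainTicket` and its field names), the status value `"accepted"`, and the plain substring keyword match are all assumptions made for this sketch; the source does not specify them.

```python
from dataclasses import dataclass


@dataclass
class MainTicket:
    # Hypothetical record layout; field names are not from the source.
    ticket_id: str
    deleted: int   # 0: normal state, 1: deleted state
    status: str    # assumed to be "accepted" once the job is completed
    content: str   # free-text job content


MIN_CONTENT_LENGTH = 4             # "m" in the text
TEST_KEYWORDS = ("test", "trial")  # screening keywords from the text


def is_valid(ticket: MainTicket) -> bool:
    """Apply the deletion-state, status, length, and keyword rules in turn."""
    if ticket.deleted == 1:
        return False
    if ticket.status != "accepted":
        return False
    if len(ticket.content) < MIN_CONTENT_LENGTH:
        return False
    lowered = ticket.content.lower()
    # Naive substring match; a real rule set may need something stricter.
    return not any(keyword in lowered for keyword in TEST_KEYWORDS)


def clean_main_tickets(tickets):
    """Return only the tickets that pass every cleaning rule."""
    return [t for t in tickets if is_valid(t)]
```

Note that a plain substring match would also discard a legitimate job whose content happens to contain a screening keyword, so the keyword list has to be chosen with the enterprise's own vocabulary in mind.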

Cleaning the special job data may include the following step: associating the special job data with the main ticket through the job ticket ID and taking the intersection of the special job data and the main ticket data, thereby removing ordinary job ticket data and deleted special ticket data.

The data cleaning step for the entry record data may include: associating the entry records with the main ticket through the job ticket ID, taking the intersection, and removing, based on that intersection, the entry information associated with deleted job tickets.
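The intersection-based cleaning of the special job data and the entry record data may be sketched as a filter on job ticket ID, as below. This is a minimal sketch assuming each row is a dictionary with a hypothetical `ticket_id` key; the source does not fix a data layout.

```python
def keep_rows_of_valid_tickets(rows, valid_ticket_ids):
    """Keep only rows whose job ticket ID survives main-ticket cleaning.

    This is the intersection, on job ticket ID, of the child records
    (special job rows or entry record rows) with the cleaned main tickets.
    """
    valid = set(valid_ticket_ids)
    return [row for row in rows if row["ticket_id"] in valid]


# Example: ticket T2 was removed during main-ticket cleaning.
special_jobs = [
    {"ticket_id": "T1", "job_type": "first-level fire operation"},
    {"ticket_id": "T2", "job_type": "hoisting operation"},
]
cleaned = keep_rows_of_valid_tickets(special_jobs, ["T1", "T3"])
```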

Step 102: preprocessing the cleaned job ticket data so as to extract feature content based on the preprocessed data.

In this embodiment, the main ticket data, the special job data, and the entry record data after being cleaned may be preprocessed, and feature extraction may be performed based on the preprocessed content.

As an optional implementation of this embodiment, preprocessing the cleaned job ticket data includes: for the main ticket data in the job ticket data, extracting preset keywords and calculating their word frequency; for the preset job data in the job ticket data, determining, based on the types and number of jobs associated with the main ticket, a first report in the form of a sparse-like matrix labeled by job ticket ID; and for the entry record data in the job ticket data, determining, based on the risk identification entries associated with the main ticket, a second report in the form of a sparse-like matrix labeled by job ticket ID.

In this optional implementation, preprocessing the main ticket data may include the following step: extracting the job characteristic keywords from the job content and calculating their word frequency. The job characteristic keywords are a set of action words common in job content, such as installation, maintenance, removal, heat preservation, lubrication, replacement, building, and pouring.
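The keyword word-frequency step may be sketched as follows. The source does not say whether "word frequency" counts occurrences or tickets; this sketch assumes it counts the number of job tickets whose content contains each keyword, and uses a plain case-insensitive substring match. The function name and sample data are hypothetical.

```python
from collections import Counter


def keyword_frequencies(contents, keywords):
    """Count, for each keyword, how many job contents contain it."""
    counts = Counter()
    for text in contents:
        lowered = text.lower()
        for keyword in keywords:
            if keyword in lowered:
                counts[keyword] += 1
    return counts


contents = [
    "Pump seal replacement",
    "Valve installation and lubrication",
    "Pipe replacement",
]
keywords = ["installation", "removal", "lubrication", "replacement"]
frequencies = keyword_frequencies(contents, keywords)
```

A `Counter` returns 0 for keywords that never occur, so every preset keyword can be looked up without a separate existence check.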

Preprocessing the special job data (the preset job data) may include the following steps: pivoting the special job data by special job type, aggregating and counting by job ticket ID, and counting how many special jobs of each type are associated with the same job ticket. The result is a first report in the form of a sparse-like matrix, with the job ticket ID as the unique label, the special job types as features, and the counts as values. For example, if a job ticket is associated with 2 records, one first-level fire operation and one second-level high-altitude operation, then after pivoting it becomes a single record keyed by the job ticket ID, in which the feature value of first-level fire operation is 1 and the feature value of second-level high-altitude operation is 1.

Illustratively, the special job types may include fire operation, limited space operation, high-altitude operation, hoisting operation, blind plate plugging operation, temporary power utilization operation, soil moving operation, circuit breaking operation, and the like.

For the entry record data in the job ticket data, the entry records are pivoted by risk identification entry and aggregated and counted by job ticket ID, forming a second report in the form of a sparse-like matrix, with the job ticket ID as the unique label, the risk identification entries as features, and the counts as values. For example, if a job ticket has 3 risk identification entry records, namely poisoning, electric shock, and object strike, then after pivoting it becomes a single record keyed by the job ticket ID, in which the feature values of poisoning, electric shock, and object strike are each 1.
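Both the first report and the second report are the result of the same pivot-and-count step, which may be sketched as follows. The row layout, the field names, and the helper names `pivot_counts` and `to_dense_rows` are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def pivot_counts(records, key_field, feature_field):
    """Pivot long-format rows into {ticket_id: Counter({feature: count})}.

    Only non-zero counts are stored, which is what makes the report
    sparse-like.
    """
    table = defaultdict(Counter)
    for record in records:
        table[record[key_field]][record[feature_field]] += 1
    return dict(table)


def to_dense_rows(table, features):
    """Expand the sparse table into fixed-order count vectors."""
    return {
        ticket_id: [counts.get(feature, 0) for feature in features]
        for ticket_id, counts in table.items()
    }


special_jobs = [
    {"ticket_id": "T1", "job_type": "first-level fire"},
    {"ticket_id": "T1", "job_type": "second-level high-altitude"},
    {"ticket_id": "T2", "job_type": "hoisting"},
]
first_report = pivot_counts(special_jobs, "ticket_id", "job_type")
features = ["first-level fire", "second-level high-altitude", "hoisting"]
dense = to_dense_rows(first_report, features)
```

Calling `pivot_counts` on the entry records with the risk identification entry as the feature field gives the second report in the same shape.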

As an optional implementation of this embodiment, extracting the feature content based on the preprocessed data includes: for the main ticket data, extracting the data of preset dimensions contained in the main ticket data and the keywords whose word frequency is greater than K, and taking the extracted data and keywords as first feature content; for the preset job data, merging jobs of the same kind in the first report and extracting them to obtain second feature content; and associating the first feature content and the second feature content through the job ticket ID to obtain a clustering data set of the job ticket data.

In this optional implementation, extracting the features contained in the main ticket data proceeds as follows. On the one hand, the job content data, job area data, associated equipment data, owning department data, and actual job time data taken directly from the main ticket information can be used as feature content. On the other hand, the job characteristic keywords whose word frequency is greater than K (e.g., 40) are extracted as feature content, each with a state value; together these form the first feature content. The state value is 0 or 1: 0 indicates that the job ticket content does not contain the keyword, and 1 indicates that it does.
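The keyword part of the first feature content may be sketched as follows: keywords whose word frequency exceeds K are selected, and each job ticket receives a 0/1 state value per selected keyword. The frequency figures, function names, and substring matching are assumptions made for this sketch.

```python
def select_keywords(frequencies, k):
    """Keep the keywords whose word frequency is strictly greater than k."""
    return sorted(kw for kw, count in frequencies.items() if count > k)


def keyword_state_values(content, selected_keywords):
    """Return 1 for each selected keyword the content contains, else 0."""
    lowered = content.lower()
    return [1 if kw in lowered else 0 for kw in selected_keywords]


# Hypothetical word frequencies over the historical job tickets.
frequencies = {"replacement": 55, "installation": 41, "removal": 12}
selected = select_keywords(frequencies, k=40)
states = keyword_state_values("Pump seal replacement", selected)
```

Sorting the selected keywords fixes the position of each bit, so state vectors from different job tickets can be compared position by position.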

Extracting the feature content contained in the special job data includes: merging jobs of the same kind in the first report into new features and extracting them to obtain the second feature content. Illustratively, the feature values of the different levels of fire, high-altitude, and hoisting operations are merged into new features. For example, if a job ticket includes a first-level high-altitude operation with feature value 1 and a third-level high-altitude operation with feature value 1, then after merging the job ticket has a high-altitude operation feature with value 2.
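Merging the levels of one kind of special job may be sketched as a sum over a level-to-kind mapping. The mapping `LEVEL_TO_KIND` and the feature names are hypothetical; the source only states that levels of the same kind are combined.

```python
from collections import Counter

# Hypothetical mapping from level-specific features to the merged feature.
LEVEL_TO_KIND = {
    "first-level high-altitude": "high-altitude",
    "second-level high-altitude": "high-altitude",
    "third-level high-altitude": "high-altitude",
    "first-level fire": "fire",
    "second-level fire": "fire",
}


def merge_levels(counts, mapping):
    """Sum the counts of all features that map to the same kind of job.

    Features without a mapping are kept under their own name.
    """
    merged = Counter()
    for feature, count in counts.items():
        merged[mapping.get(feature, feature)] += count
    return dict(merged)


ticket_counts = {"first-level high-altitude": 1, "third-level high-altitude": 1}
merged = merge_levels(ticket_counts, LEVEL_TO_KIND)
```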

Finally, the first feature content and the second feature content can be associated through the job ticket ID to obtain the feature data set used for clustering, keyed by ID.
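The association by job ticket ID may be sketched as a join of two feature maps. This sketch assumes an inner join, so that only job tickets present in both feature sets enter the clustering data set; the source does not state how tickets missing from one side are handled.

```python
def join_features(first, second):
    """Inner-join two {ticket_id: [values]} maps on job ticket ID,
    concatenating the two feature vectors of each shared ticket."""
    shared_ids = first.keys() & second.keys()
    return {tid: first[tid] + second[tid] for tid in sorted(shared_ids)}


first_features = {"T1": [0, 1], "T2": [1, 1]}   # keyword state values
second_features = {"T1": [2, 0], "T3": [1, 1]}  # merged special job counts
dataset = join_features(first_features, second_features)
```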

Step 103: clustering the extracted feature content into groups, so as to determine the items for risk prediction based on the data of each clustered group and obtain a model for job risk identification.

In this embodiment, the feature data set extracted from the job content is clustered using the Hamming distance in combination with a hierarchical clustering algorithm.

It will be understood that the Hamming distance is the number of positions at which two character strings of the same length differ; the Hamming distance between two strings x and y is denoted d(x, y). For binary strings, it can be obtained by XORing the two strings and counting the number of 1s in the result. In a set of code groups, the number of bit positions at which two code words have different symbol values is defined as the Hamming distance between those two code words.

That is, $d(x, y) = \sum_{i=0}^{n-1} x[i] \oplus y[i]$, where $i = 0, 1, \ldots, n-1$, x and y are both n-bit codes, and $\oplus$ denotes exclusive OR. For example, the distance between (00) and (01) is 1, and the distance between (110) and (101) is 2. In a set of code groups, the minimum of the Hamming distances between any two codes is called the minimum Hamming distance of the code group. The smaller the minimum Hamming distance, the higher the similarity of the code groups.
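The Hamming distance and the minimum Hamming distance of a code group may be computed as in the following minimal sketch. The function names are hypothetical; the definitions follow the text above.

```python
from itertools import combinations


def hamming_distance(x, y):
    """Number of positions at which two equal-length codes differ."""
    if len(x) != len(y):
        raise ValueError("codes must have the same length")
    return sum(a != b for a, b in zip(x, y))


def min_hamming_distance(codes):
    """Smallest pairwise Hamming distance within a set of codes."""
    return min(hamming_distance(a, b) for a, b in combinations(codes, 2))


print(hamming_distance("00", "01"))    # 1
print(hamming_distance("110", "101"))  # 2
```

The same function works on strings, tuples, or lists of feature values, since it only compares elements position by position.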

Hierarchical clustering algorithm: clusters can be built bottom-up by merging small clusters, or top-down by splitting large clusters. Taking bottom-up aggregation as an example: at each step, the two clusters with the shortest distance between them are found and merged into a larger cluster, until all clusters have been merged into one. The whole process builds a tree structure, as illustrated in FIG. 2. Initially, each data point is treated as its own cluster, and the distance between two such clusters is the distance between the two points. For clusters containing more than one data point, several distance definitions may be chosen. One is average linkage, which takes the average of the pairwise distances between the data points of the two clusters. Others are single linkage and complete linkage, which take the distance of the closest pair and the farthest pair of data points between the two clusters, respectively, as the distance between the clusters.
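Bottom-up aggregation with average linkage may be sketched in plain Python as follows. Stopping at a requested number of clusters, rather than merging down to a single cluster, and using the Euclidean distance between points are both assumptions of this sketch; stopping early corresponds to cutting the tree at a chosen level.

```python
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def average_linkage(c1, c2, points, dist):
    """Average of all pairwise distances between two clusters."""
    total = sum(dist(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative(points, n_clusters, dist=euclidean):
    """Merge the two closest clusters until n_clusters remain.

    Clusters are lists of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(
                clusters[pair[0]], clusters[pair[1]], points, dist
            ),
        )
        # a < b always holds, so deleting b does not shift index a.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
result = agglomerative(points, n_clusters=2)
```

Replacing `average_linkage` with the minimum or the maximum of the pairwise distances gives single linkage or complete linkage, respectively.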

As an optional implementation of this embodiment, clustering the extracted feature content into groups includes: taking the feature content of each piece of job ticket data as a code group, and clustering the code groups based on the Hamming distance; and performing secondary clustering on the clustered data points with a hierarchical clustering algorithm, based on the extracted preset feature content.

In this optional implementation, for each ticket (distinguished by job ticket ID), the feature content of the special jobs (i.e., the preset jobs in this embodiment) together with the keywords in the main ticket job content whose word frequency is greater than K are taken as a code group. The minimum Hamming distance is calculated, and clustering is performed by comparing the minimum Hamming distances, so that similar jobs are grouped together. Then the feature content of the preset dimensions in the main ticket data (including job area, associated equipment, owning department, actual job time, and the like) is introduced, and secondary clustering is performed with the hierarchical clustering algorithm, so that similar jobs are grouped and clustered accurately.
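The source does not specify the exact rule by which Hamming distances are compared to form the first-stage groups. The following sketch substitutes a simple greedy rule, stated here as an assumption: each job ticket joins the first existing group whose representative code lies within a hypothetical threshold `max_distance`, and otherwise starts a new group. The secondary hierarchical clustering would then be run inside each resulting group.

```python
def hamming(x, y):
    """Number of positions at which two equal-length codes differ."""
    if len(x) != len(y):
        raise ValueError("codes must have the same length")
    return sum(a != b for a, b in zip(x, y))


def group_by_hamming(codes, max_distance):
    """Greedy first-stage grouping of {ticket_id: code} by Hamming distance.

    The first member of each group serves as its representative code.
    """
    groups = []  # list of (representative_code, [ticket_ids])
    for ticket_id, code in codes.items():
        for representative, members in groups:
            if hamming(representative, code) <= max_distance:
                members.append(ticket_id)
                break
        else:
            # No existing group is close enough: start a new one.
            groups.append((code, [ticket_id]))
    return [members for _, members in groups]


codes = {
    "T1": (1, 0, 1, 0),
    "T2": (1, 0, 1, 1),
    "T3": (0, 1, 0, 0),
}
groups = group_by_hamming(codes, max_distance=1)
```

A greedy rule depends on the order in which tickets are visited, which is one reason a second, hierarchical pass over each group can be useful.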

As an optional implementation manner of this embodiment, determining the item for risk prediction based on the data of each group after clustering includes: taking the items for risk identification shared by all the job ticket data in each group after clustering as fixed risk prediction items; and taking the items for risk identification which are not shared by the data in each group after clustering as optional risk prediction items, wherein each piece of job ticket data can contain the content of the risk items.

In this optional implementation, the fixed items among the risk identification entries cannot be deleted, while the optional items can be selected according to the actual job conditions. If an entry is needed that the cluster group does not output, it can be added as a custom entry, and the added entry is automatically updated into the model as a custom item.

As an optional implementation of this embodiment, determining the items for risk prediction based on the data of each clustered group includes: associating each piece of job ticket data with the preprocessed risk identification entries based on the job ticket IDs in each group; performing an intersection operation and a union operation on the risk identification entries associated with all the job ticket data in each group; taking the entries in the resulting intersection as fixed risk prediction items; and taking the difference between the resulting union and the intersection, and using the entries in the resulting difference set as optional risk prediction items.

In this optional implementation, after similar jobs have been grouped by clustering, the job ticket IDs are used to associate each group with the preprocessed entry record data. The fixed items of the risk identification prediction are obtained by computing the intersection of the risk identification entries across the similar jobs. The optional items, i.e., the recommended items of the risk identification prediction, are obtained by computing the union of the risk identification entries across the similar jobs and subtracting the intersection from it.
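The intersection, union, and difference operations on one cluster group may be sketched with Python sets as follows. The function name and the sample entries are hypothetical.

```python
def risk_items_for_group(entries_by_ticket):
    """Split a group's risk identification entries into fixed and optional.

    `entries_by_ticket` maps each job ticket ID in the group to the set of
    risk identification entries recorded on that ticket.
    """
    entry_sets = [set(entries) for entries in entries_by_ticket.values()]
    if not entry_sets:
        return set(), set()
    fixed = set.intersection(*entry_sets)       # shared by every ticket
    optional = set.union(*entry_sets) - fixed   # union minus intersection
    return fixed, optional


group = {
    "T1": {"poisoning", "electric shock"},
    "T2": {"poisoning", "object strike"},
}
fixed, optional = risk_items_for_group(group)
```

Here `fixed` holds the entry common to both tickets and `optional` holds the entries that appear on only some of them.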

As an optional implementation manner of this embodiment, after receiving new job ticket data, the new job ticket data is input into the model for job risk identification, and an item for risk prediction corresponding to the new job ticket data is output.

In this optional implementation, after the user fills in the basic information of a new job ticket and inputs it into the model, the model extracts features from the content filled in, assigns the new job ticket to the closest group, and finally outputs the risk identification entries corresponding to that group.
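Assigning a new job ticket to the closest group may be sketched as follows. The source does not define "closest"; this sketch assumes it means the group containing the member code with the smallest Hamming distance to the new code. The group layout (a dictionary with `codes`, `fixed`, and `optional` keys) is likewise hypothetical.

```python
def hamming(x, y):
    """Number of positions at which two equal-length codes differ."""
    if len(x) != len(y):
        raise ValueError("codes must have the same length")
    return sum(a != b for a, b in zip(x, y))


def closest_group(new_code, groups):
    """Pick the group whose nearest member code is closest to new_code."""
    return min(
        groups,
        key=lambda g: min(hamming(new_code, code) for code in g["codes"]),
    )


def predict_risk_items(new_code, groups):
    """Return the fixed and optional risk items of the closest group."""
    group = closest_group(new_code, groups)
    return group["fixed"], group["optional"]


groups = [
    {"codes": [(1, 0, 1, 0), (1, 0, 1, 1)],
     "fixed": {"fire"}, "optional": {"falling"}},
    {"codes": [(0, 1, 0, 0)],
     "fixed": {"poisoning"}, "optional": set()},
]
prediction = predict_risk_items((1, 0, 0, 0), groups)
```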

Furthermore, the model can be corrected. As the number of jobs increases, the number of job characteristic keywords that meet the word-frequency condition increases accordingly, as do the accuracy and number of cluster types and the accuracy of prediction. The model can be upgraded periodically to obtain more accurate prediction results.

In this embodiment, risk identification prediction for similar jobs is realized by clustering after the features of the job content have been extracted. New features of the job content are extracted by extracting its characteristic keywords and calculating their word frequency. The embodiment overcomes the drawback that risk identification in the industry depends entirely on the experience of the person responsible for the job, and reduces incomplete risk identification and high identification error rates caused by human factors. It removes the need to configure a large number of risk identification templates when an enterprise brings an operation management system online, which greatly reduces deployment workload and shortens the time to go live. It also has a self-correcting capability: as jobs accumulate, the granularity of the job clusters becomes finer and the model's predictions become more accurate.

It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.

According to an embodiment of the present disclosure, there is also provided an apparatus for implementing the above method for job risk identification, including: the data cleaning unit is configured to clean the operation ticket data according to a preset data cleaning rule after the historical operation ticket data is acquired; a feature extraction unit configured to preprocess the cleaned job ticket data to perform feature content extraction based on the preprocessed data; and the clustering unit is configured to perform grouping clustering on the extracted characteristic contents so as to determine an item for risk prediction based on the clustered data of each group, and obtain a model for job risk identification.

The apparatus further includes a risk item prediction unit configured to, after receiving new job ticket data, input the new job ticket data into the model for job risk identification and output the items for risk prediction corresponding to the new job ticket data.

The two embodiments above enable intelligent, content-based risk identification in the job management systems of hazardous chemical enterprises. Jobs of the same type are clustered from the historical job ticket records to form a risk identification prediction model. The prediction model is applied to newly requested job tickets to determine their risk identification items, the risk identification items are divided into fixed items, recommended items, and user-defined items, and the prediction model is corrected in real time. The prediction model can also be upgraded automatically at regular intervals (or after a set quantity of data has accumulated) using subsequently accumulated historical data, which makes the clustering more precise and improves the accuracy of intelligent risk identification.
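The prediction step described above can be illustrated with a minimal sketch. It assumes that each cluster of the model is summarised by a binary feature code together with its fixed and recommended items, and that a new job ticket is assigned to the cluster whose code is nearest in Hamming distance. The names `predict_items` and `model`, the example codes, and the example risk items are hypothetical and are not taken from the disclosure.

```python
def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def predict_items(model, code):
    """Return the risk items of the cluster nearest to the new ticket's code.

    model: list of dicts with keys 'code', 'fixed' and 'recommended'
           (an assumed representation of the clustering result).
    """
    nearest = min(model, key=lambda cluster: hamming(cluster["code"], code))
    return {
        "fixed": sorted(nearest["fixed"]),
        "recommended": sorted(nearest["recommended"]),
        "custom": [],  # user-defined items are added by the applicant afterwards
    }


# Hypothetical model with two clusters.
model = [
    {"code": (1, 1, 0, 0), "fixed": {"fire", "toxic gas"},
     "recommended": {"electric shock"}},
    {"code": (0, 0, 1, 1), "fixed": {"fall from height"},
     "recommended": {"falling objects"}},
]

result = predict_items(model, (1, 0, 0, 0))
print(result)
```

In this sketch the custom list starts empty; items entered there by the user could later be fed back into the historical data when the model is rebuilt.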

An embodiment of the present disclosure provides an electronic device. As shown in fig. 3, the electronic device includes one or more processors 31 and a memory 32; one processor 31 is taken as an example in fig. 3.

The electronic device may further include: an input device 33 and an output device 34.

The processor 31, the memory 32, the input device 33 and the output device 34 may be connected by a bus or other means, and fig. 3 illustrates the connection by a bus as an example.

The processor 31 may be a Central Processing Unit (CPU). The processor 31 may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or combinations thereof. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The memory 32, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for job risk identification in the embodiments of the present disclosure. The processor 31 executes the various functional applications and data processing of the server, that is, implements the method of the above method embodiments, by running the non-transitory software programs, instructions, and modules stored in the memory 32.

The memory 32 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of a processing device operated by the server, and the like. Further, the memory 32 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 32 may optionally include memory located remotely from the processor 31, which may be connected to a network connection device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 33 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the processing device of the server. The output device 34 may include a display device such as a display screen.

One or more modules are stored in the memory 32, which when executed by the one or more processors 31 perform the method as shown in fig. 1.

It will be understood by those skilled in the art that all or part of the processes in the methods according to the embodiments described above may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also comprise a combination of the above kinds of memory.

Although the embodiments of the present disclosure have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the present disclosure, and such modifications and variations fall within the scope defined by the appended claims.

Claims

1. A method for job risk identification, comprising:

after historical job ticket data is obtained, cleaning the job ticket data according to a preset data cleaning rule;

preprocessing the cleaned job ticket data to extract feature contents based on the preprocessed data;

and performing grouping clustering on the extracted characteristic contents to determine an item for risk prediction based on the data of each group after clustering, so as to obtain a model for job risk identification.

2. The method for job risk identification according to claim 1, further comprising:

after receiving new job ticket data, inputting the new job ticket data into the model for job risk identification, and outputting an item for risk prediction corresponding to the new job ticket data.

3. The method for job risk identification according to claim 1, wherein determining items for risk prediction based on the clustered data of each group comprises:

taking the items for risk identification shared by all the job ticket data in each group after clustering as fixed risk prediction items;

and taking the items for risk identification that are not shared by all the job ticket data in each group after clustering as optional risk prediction items, wherein any piece of job ticket data may include such optional items.

4. The method for job risk identification according to claim 1, wherein clustering the extracted feature content in groups comprises:

taking the characteristic content of each job ticket data as a code group, and clustering the code group based on a Hamming algorithm;

and performing secondary clustering on the clustered data points by utilizing a hierarchical clustering algorithm based on the extracted preset characteristic content.
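As a non-limiting illustration of the two-stage clustering recited in claim 4, the following sketch first groups binary feature codes by Hamming distance and then refines each group with a simple single-linkage hierarchical clustering on one numeric feature. The leader-style grouping, the single-linkage criterion, the thresholds `max_dist` and `cutoff`, and the sample data are all assumptions made for the example; the claim does not specify them.

```python
def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def hamming_groups(codes, max_dist):
    """First pass: a ticket joins the first group whose leader code is
    within max_dist, otherwise it starts a new group (assumed scheme)."""
    groups = []  # list of (leader_code, [ticket_ids])
    for ticket_id, code in codes.items():
        for leader, members in groups:
            if hamming(leader, code) <= max_dist:
                members.append(ticket_id)
                break
        else:
            groups.append((code, [ticket_id]))
    return [members for _, members in groups]


def agglomerate(ids, feature, cutoff):
    """Second pass: single-linkage agglomerative clustering on one numeric
    feature; merging stops once the closest pair is farther than cutoff."""
    clusters = [[i] for i in ids]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(abs(feature[x] - feature[y])
                        for x in clusters[a] for y in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > cutoff:
            break
        _, a, b = best
        clusters[a].extend(clusters.pop(b))
    return clusters


# Hypothetical code groups (one binary code per job ticket).
codes = {"T1": (1, 1, 0, 0), "T2": (1, 1, 0, 1), "T3": (0, 0, 1, 1),
         "T4": (0, 0, 1, 1), "T5": (1, 1, 0, 0)}
# Hypothetical preset feature, e.g. working height in metres.
height = {"T1": 2.0, "T2": 2.5, "T3": 5.0, "T4": 5.5, "T5": 30.0}

groups = hamming_groups(codes, max_dist=1)
final = [c for g in groups for c in agglomerate(g, height, cutoff=1.0)]
print(final)
```

Here ticket T5 shares its code with T1 but is split off in the second pass because its preset feature value is far from the others, which is the kind of refinement a secondary clustering could provide.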

5. The method for job risk identification according to claim 1, wherein preprocessing the cleaned job ticket data comprises:

for main ticket data in the job ticket data, extracting preset keywords and calculating their word frequency;

for preset operation data in the job ticket data, determining, based on the operation types and numbers associated with the main ticket and using the ID of the job ticket data as a label, a first report in sparse-matrix-like form corresponding to the job ticket data;

and for item record data in the job ticket data, determining, based on the items for risk identification associated with the main ticket and using the ID of the job ticket data as a label, a second report in the form of a corresponding sparse matrix.
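The preprocessing of claim 5 can be sketched as follows, under the assumptions that keywords are matched as whole lower-case words and that each report is held as a dictionary of dictionaries keyed by job ticket ID, storing only non-zero cells. The keyword list, the helper names `keyword_frequency` and `sparse_report`, and the sample records are hypothetical; real job tickets in another language would need a suitable tokenizer.

```python
import re
from collections import Counter

# Hypothetical preset keywords.
PRESET_KEYWORDS = ("welding", "tank", "scaffold")


def keyword_frequency(text, keywords=PRESET_KEYWORDS):
    """Count how often each preset keyword occurs in the main ticket text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return {k: counts[k] for k in keywords if counts[k] > 0}


def sparse_report(rows):
    """Build a sparse-matrix-like report labelled by job ticket ID.

    rows: iterable of (ticket_id, column, value); zero values are skipped.
    """
    report = {}
    for ticket_id, column, value in rows:
        if value:
            row = report.setdefault(ticket_id, {})
            row[column] = row.get(column, 0) + value
    return report


freq = keyword_frequency(
    "Welding repair on storage tank inlet; welding near tank wall")

# First report: operation types and numbers associated with each main ticket.
first_report = sparse_report([
    ("T001", "hot work", 1),
    ("T001", "confined space", 1),
    ("T002", "work at height", 2),
])

# Second report: risk identification items associated with each main ticket.
second_report = sparse_report([
    ("T001", "fire", 1),
    ("T001", "toxic gas", 1),
    ("T002", "fall from height", 1),
])

print(freq, first_report, second_report)
```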

6. The method for job risk identification according to claim 5, wherein feature content extraction based on the preprocessed data comprises:

for the main ticket data, extracting data of preset dimensions contained in the main ticket data and keywords whose word frequency is greater than K, and taking the extracted data and keywords as first characteristic content;

for the preset operation data, merging and extracting similar operations in the first report to obtain second characteristic content;

and associating the first characteristic content and the second characteristic content through the ID of the job ticket data to obtain a cluster data set of the job ticket data.
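A possible reading of the feature extraction in claim 6 is sketched below. The threshold value of K, the mapping used to merge similar operations, the chosen dimensions, and all function names are assumptions introduced for illustration only.

```python
K = 1  # hypothetical word-frequency threshold

# Hypothetical mapping that merges similar operations into one type.
SIMILAR_OPERATIONS = {"gas welding": "hot work", "arc welding": "hot work"}


def first_features(main_ticket, freqs, dims, k=K):
    """Preset dimensions of the main ticket plus keywords with frequency > k."""
    features = {d: main_ticket[d] for d in dims}
    features["keywords"] = sorted(w for w, n in freqs.items() if n > k)
    return features


def second_features(operation_row, similar=SIMILAR_OPERATIONS):
    """Merge similar operations of one row of the first report."""
    merged = {}
    for operation, count in operation_row.items():
        key = similar.get(operation, operation)
        merged[key] = merged.get(key, 0) + count
    return merged


def build_cluster_dataset(main_tickets, keyword_freqs, first_report, dims):
    """Associate both kinds of characteristic content through the ticket ID."""
    dataset = {}
    for ticket_id, ticket in main_tickets.items():
        row = first_features(ticket, keyword_freqs.get(ticket_id, {}), dims)
        row["operations"] = second_features(first_report.get(ticket_id, {}))
        dataset[ticket_id] = row
    return dataset


# Hypothetical inputs for a single job ticket.
main_tickets = {"T001": {"area": "tank farm", "level": "special"}}
keyword_freqs = {"T001": {"welding": 2, "tank": 1}}
first_report = {"T001": {"gas welding": 1, "arc welding": 2,
                         "confined space": 1}}

dataset = build_cluster_dataset(main_tickets, keyword_freqs, first_report,
                                dims=["area", "level"])
print(dataset)
```

Only "welding" survives the keyword filter here because its frequency of 2 exceeds K, while "tank" with a frequency of 1 does not.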

7. The method for job risk identification according to claim 5, wherein said determining items for risk prediction based on clustered data of each group comprises:

associating, based on the IDs of the job ticket data in each group, each piece of job ticket data with the preprocessed items for risk identification;

performing intersection operation and union operation on the items for risk identification associated with all the operation ticket data in each group;

taking the item used for risk identification in the obtained intersection as a fixed risk prediction item;

and taking the difference between the obtained union and the obtained intersection, and taking the items for risk identification in the resulting difference set as optional risk prediction items.
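The set operations recited in claim 7 can be illustrated with a short sketch. It assumes that the risk identification items associated with each job ticket of one clustered group are available as sets keyed by ticket ID; the function name `split_risk_items` and the sample items are hypothetical.

```python
def split_risk_items(group_items):
    """Split the items of one clustered group into fixed and optional items.

    group_items: dict mapping ticket ID -> set of risk identification items.
    """
    item_sets = list(group_items.values())
    if not item_sets:
        return set(), set()
    fixed = set.intersection(*item_sets)  # items shared by every ticket
    union = set.union(*item_sets)         # items appearing on any ticket
    optional = union - fixed              # union minus intersection
    return fixed, optional


# Hypothetical group of three job tickets.
group = {
    "T001": {"fire", "toxic gas", "fall from height"},
    "T002": {"fire", "toxic gas"},
    "T003": {"fire", "toxic gas", "electric shock"},
}

fixed, optional = split_risk_items(group)
print(sorted(fixed), sorted(optional))
```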

8. An apparatus for job risk identification, comprising:

a data cleaning unit configured to clean the job ticket data according to a preset data cleaning rule after historical job ticket data is obtained;

a feature extraction unit configured to preprocess the cleaned job ticket data to perform feature content extraction based on the preprocessed data;

and a clustering unit configured to perform grouping clustering on the extracted characteristic contents so as to determine an item for risk prediction based on the clustered data of each group, and obtain a model for job risk identification.

9. A computer-readable storage medium storing computer instructions for causing a computer to perform the method for job risk identification of any one of claims 1-7.

10. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to cause the at least one processor to perform the method for job risk identification of any one of claims 1-7.