CN113723689A - Method, system, terminal and medium for constructing enterprise employee departure prediction model

Info

Publication number: CN113723689A
Application number: CN202111023281.9A
Authority: CN
Inventors: 刘纯熙; 张补卫; 王栋
Original assignee: CHANJET INFORMATION TECHNOLOGY CO LTD
Current assignee: CHANJET INFORMATION TECHNOLOGY CO LTD
Priority date: 2021-09-01
Filing date: 2021-09-01
Publication date: 2021-11-30

Abstract

The invention discloses a method for constructing an enterprise employee departure prediction model, comprising the following steps: acquiring historical information data of enterprise employees, wherein the historical information data comprises historical information data of current employees and historical information data of employees who have left; preprocessing the historical information data of the enterprise employees, and extracting and quantifying multiple items of employee characteristic information; and constructing positive and negative samples of employee departure from the historical information data of the employees who have left, and constructing an employee departure prediction model using a machine learning algorithm, the model being used to predict the departure probability of current employees. Because the positive and negative samples are constructed from the historical information data of employees who have left, and the labeled samples serve as the training set and the test set, the reliability of the sample data is improved and the prediction accuracy of the prediction model is improved.

Description

Method, system, terminal and medium for constructing enterprise employee departure prediction model

Technical Field

The invention relates to the technical field of computers, in particular to a method, a system, a terminal and a medium for constructing an enterprise employee departure prediction model.

Background

Employees are an enterprise's most valuable human resource and the fundamental driving force and source of its development; no enterprise can do without the hard work of its employees. However, with rapid economic development, the flow of personnel between enterprises is also increasing rapidly. New hires bring fresh blood to an enterprise, but the departure of employees inevitably causes it certain losses; in particular, the departure of core employees results in the loss of human capital investment and requires human resources to be reallocated. Establishing an employee departure early-warning system that detects departure tendencies as early as possible therefore helps an enterprise prepare countermeasures in advance, such as communicating with employees early and making retention efforts, so that the impact of employee departure on both the enterprise and the individual is minimized.

Given the importance of employee departure to enterprises, a variety of models and approaches for predicting employee departure already exist. The traditional method of constructing an employee departure prediction model generally extracts part of the data of employees who have left and part of the data of current employees as training data, labels the employees who have left as positive samples and the current employees as negative samples, and uses the remaining data as test data to verify the prediction performance of the model. However, this way of building the model has an insurmountable drawback: the purpose of the model is to predict the likelihood that a current employee will leave, yet the traditional method uses part of the current employees' information to build the model, which amounts to assuming that these employees will not leave. That assumption is itself problematic. During model optimization, the optimization target for a sample labeled as a current employee is a prediction that the employee will not leave, but that employee may in fact leave in the near future. This contradicts the original purpose of the model (predicting whether an employee tends to leave), so models built by the traditional method do not achieve high prediction accuracy.

Disclosure of Invention

In view of the defects in the prior art, embodiments of the invention provide a method, a system, a terminal and a medium for constructing an enterprise employee departure prediction model, in which positive and negative samples of employee departure are constructed from historical information data of employees who have left, so that the reliability of the sample data is improved and the prediction accuracy of the prediction model is improved.

In a first aspect, the method for constructing an enterprise employee departure prediction model provided by the embodiment of the present invention includes the following steps:

acquiring historical information data of enterprise employees, wherein the historical information data comprises historical information data of current employees and historical information data of employees who have left the enterprise;

preprocessing the historical information data of the enterprise employees, and extracting and quantifying multiple items of employee characteristic information;

constructing positive and negative samples of employee departure from the historical information data of the employees who have left, and constructing an employee departure prediction model using a machine learning algorithm, the employee departure prediction model being used to predict the departure probability of current employees.

Optionally, constructing the positive and negative samples of employee departure from the historical information data of employees who have left specifically includes:

taking the year as the time sampling unit;

dynamically splitting each employee who has left into multiple pieces of sample data according to the years of employment, taking the data of the year in which the employee left as a positive sample, and taking the data of the years in which the employee remained employed as negative samples.
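The year-by-year splitting described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `YearlySample` type, the `split_departed_employee` function and the choice to count the hiring year as a sample are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class YearlySample:
    employee_id: str
    year: int
    label: int  # 1 = positive (left in this year), 0 = negative (stayed)


def split_departed_employee(
    employee_id: str, hire_year: int, departure_year: int
) -> List[YearlySample]:
    """Emit one sample per year of employment for an employee who has left.

    Only the departure year is labeled positive; every earlier year in which
    the employee was still employed is labeled negative.
    """
    if departure_year < hire_year:
        raise ValueError("departure_year must not precede hire_year")
    return [
        YearlySample(employee_id, year, 1 if year == departure_year else 0)
        for year in range(hire_year, departure_year + 1)
    ]


# Hypothetical employee hired in 2016 who left in 2019.
samples = split_departed_employee("E001", 2016, 2019)
```

Under these assumptions one departed employee contributes three negative samples (2016 to 2018) and one positive sample (2019), so every label reflects an outcome that has actually been observed.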

Optionally, the employee characteristic information includes total years of work, job level, age, work history, average tenure per employer, tenure at the previous employer, tenure at the present enterprise, whether the employee has local household registration, gender, department, blood type, and educational background.
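The source does not specify how each feature is quantified. As one possible sketch, numeric features can be passed through, yes/no features mapped to 0 or 1, ordered categories mapped to their rank, and unordered categories one-hot encoded. The field names, the category vocabularies and the `quantify` function below are hypothetical.

```python
from typing import Dict, List, Union

# Hypothetical vocabularies; a real system would derive them from its own data.
EDUCATION_LEVELS = ["high_school", "bachelor", "master", "doctorate"]  # ordered
BLOOD_TYPES = ["A", "B", "AB", "O"]  # unordered


def one_hot(value: str, vocabulary: List[str]) -> List[float]:
    """Encode an unordered category as a 0/1 indicator vector."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def quantify(record: Dict[str, Union[str, int, float, bool]]) -> List[float]:
    """Turn one employee-year record into a numeric feature vector."""
    features = [
        float(record["age"]),
        float(record["years_at_enterprise"]),
        1.0 if record["local_household"] else 0.0,           # yes/no feature
        float(EDUCATION_LEVELS.index(record["education"])),  # ordinal rank
    ]
    features.extend(one_hot(record["blood_type"], BLOOD_TYPES))
    return features


vector = quantify(
    {
        "age": 29,
        "years_at_enterprise": 3,
        "local_household": True,
        "education": "bachelor",
        "blood_type": "O",
    }
)
```

One-hot encoding is used for blood type because its categories have no natural order, whereas educational background is treated as ordinal here; either choice could reasonably be made differently.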

Optionally, the employee departure prediction model specifically includes a logistic regression departure prediction model, a decision tree departure prediction model, a random forest departure prediction model, a gradient boosting tree departure prediction model, or a deep learning departure prediction model.

In a second aspect, an embodiment of the present invention provides a system for constructing an enterprise employee departure prediction model, including: an information acquisition module, a preprocessing module and a model construction module,

the information acquisition module is used for acquiring historical information data of enterprise employees, wherein the historical information data comprises historical information data of current employees and historical information data of employees who have left;

the preprocessing module is used for preprocessing the historical information data of the enterprise employees, and extracting and quantifying multiple items of employee characteristic information;

the model construction module is used for constructing positive and negative samples of employee departure from the historical information data of employees who have left, and constructing an employee departure prediction model using a machine learning algorithm, wherein the employee departure prediction model is used for predicting the departure probability of current employees.

Optionally, the model construction module includes a sample processing unit, which takes the year as the time sampling unit, dynamically splits each employee who has left into multiple pieces of sample data according to the years of employment, takes the data of the year in which the employee left as a positive sample, and takes the data of the years in which the employee remained employed as negative samples.

In a third aspect, an intelligent terminal provided in an embodiment of the present invention includes a processor, an input device, an output device, and a memory, where the processor, the input device, the output device, and the memory are connected to each other, the memory is used to store a computer program, the computer program includes program instructions, and the processor is configured to call the program instructions to execute the method described in the foregoing embodiment.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, in which a computer program is stored, the computer program including program instructions, which, when executed by a processor, cause the processor to execute the method described in the above embodiment.

The invention has the beneficial effects that:

According to the method for constructing the enterprise employee departure prediction model provided by the embodiment of the invention, positive and negative samples of employee departure are constructed from the historical information data of employees who have left, and the labeled samples are used as the training set and the test set, so that the reliability of the sample data is improved and the prediction accuracy of the prediction model is improved. The construction method is simple, effective and feasible.

The system, the terminal and the medium for constructing the enterprise employee departure prediction model provided by the embodiment of the invention have the same beneficial effects as the method for constructing the enterprise employee departure prediction model.

Drawings

In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.

FIG. 1 is a flow chart illustrating a method for constructing an enterprise employee departure prediction model according to a first embodiment of the present invention;

FIG. 2 illustrates an ROC curve of a logistic regression departure prediction model in the first embodiment of the present invention;

FIG. 3 illustrates an ROC curve of a decision tree departure prediction model in the first embodiment of the present invention;

FIG. 4 illustrates an ROC curve of a random forest departure prediction model in the first embodiment of the present invention;

FIG. 5 illustrates an ROC curve of a gradient boosting tree departure prediction model in the first embodiment of the present invention;

FIG. 6 is a block diagram illustrating a system for constructing an enterprise employee departure prediction model according to a second embodiment of the present invention;

FIG. 7 is a block diagram illustrating an intelligent terminal according to a third embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".

It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which the invention pertains.

FIG. 1 shows a flowchart of a method for constructing an enterprise employee departure prediction model according to a first embodiment of the present invention. The method includes the following steps:

S1, acquiring historical information data of enterprise employees, wherein the historical information data comprises historical information data of current employees and historical information data of employees who have left;

S2, preprocessing the historical information data of the enterprise employees, and extracting and quantifying multiple items of employee characteristic information;

S3, constructing positive and negative samples of employee departure from the historical information data of the employees who have left, and constructing an employee departure prediction model using a machine learning algorithm, wherein the employee departure prediction model is used for predicting the departure probability of current employees.

When acquiring the historical information data of enterprise employees, data that is as comprehensive as possible should be collected. In this embodiment, real historical information data of enterprise employees is collected from the enterprise's past years, which ensures the authenticity of the data and avoids the uncertainty that may be present in current enterprise data, for example an employee who is on the job today but may leave tomorrow. The historical information data comprises historical information data of current employees and historical information data of employees who have left. Part of this data is extracted as training data and test data: the data of employees who have left is used for modeling and verification, and the data of current employees is used for prediction.

Constructing the positive and negative samples of employee departure from the historical information data of employees who have left specifically comprises: taking the year as the time sampling unit, dynamically splitting each employee who has left into multiple pieces of sample data according to the years of employment, taking the data of the year in which the employee left as a positive sample (departure), and taking the data of the years in which the employee remained employed as negative samples (non-departure). The data for each year of employment of the same employee corresponds to the dynamic historical data of that year; for example, the same person's age differs from year to year, and the length of service and job level may also differ. The positive and negative samples are then balanced, and as much characteristic information as possible is extracted from the balanced data, including total years of work, job level, age, number of employers worked for, average tenure per employer, tenure at the previous employer, tenure at the present enterprise, whether the employee has local household registration, gender, department, blood type, educational background and the like; each feature is then quantified. 100 positive samples and 200 negative samples are randomly drawn, and the sample data is randomly divided into two parts: 80% of the data is used for model construction and 20% for model verification.
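The random draw and the 80/20 split can be sketched as below. The source gives only the sample counts and the split ratio, so the function name `draw_and_split`, the fixed seed and the placeholder sample pools are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def draw_and_split(
    positives: Sequence[T],
    negatives: Sequence[T],
    n_pos: int,
    n_neg: int,
    train_fraction: float = 0.8,
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Randomly draw a fixed number of samples per class, then split them."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    drawn = rng.sample(list(positives), n_pos) + rng.sample(list(negatives), n_neg)
    rng.shuffle(drawn)  # mix the classes before cutting
    cut = round(len(drawn) * train_fraction)
    return drawn[:cut], drawn[cut:]


# Placeholder pools standing in for the yearly samples of departed employees.
positives = [("pos", i) for i in range(150)]
negatives = [("neg", i) for i in range(900)]
train, test = draw_and_split(positives, negatives, n_pos=100, n_neg=200)
```

Drawing a fixed number of samples per class is one simple way to keep the class ratio under control when yearly splitting produces far more negative samples than positive ones.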

A machine learning algorithm is then used to build the model from the labeled positive and negative samples, yielding the employee departure prediction model. Modeling with a logistic regression algorithm yields a logistic regression departure prediction model; modeling with a decision tree yields a decision tree departure prediction model; modeling with a random forest algorithm yields a random forest departure prediction model; modeling with a gradient boosting tree yields a gradient boosting tree departure prediction model; and modeling with a deep learning algorithm yields a deep learning departure prediction model. The constructed model is used to predict the departure tendency of current employees, so that employees with a higher predicted departure probability can be followed up according to the prediction results.
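How the predictions might drive follow-up can be sketched as a simple ranking step. The source does not define a threshold or a ranking rule, so the function `rank_by_departure_risk`, the 0.5 threshold and the stand-in `toy_model` are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def rank_by_departure_risk(
    current_employees: Dict[str, Sequence[float]],
    predict_probability: Callable[[Sequence[float]], float],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Score current employees and list those at or above the threshold,
    highest predicted departure probability first."""
    scored = [
        (employee_id, predict_probability(features))
        for employee_id, features in current_employees.items()
    ]
    flagged = [(employee_id, p) for employee_id, p in scored if p >= threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


def toy_model(features: Sequence[float]) -> float:
    # Stand-in for a trained model; NOT a real departure predictor.
    return min(1.0, features[0] / 10.0)


employees = {"E101": [2.0], "E102": [9.0], "E103": [6.0]}
follow_up = rank_by_departure_risk(employees, toy_model, threshold=0.5)
```

Any of the trained models can be plugged in as `predict_probability`, provided it maps a feature vector to a probability between 0 and 1.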

The effectiveness of the departure prediction models built with the different machine learning algorithms is evaluated as follows.

1. Logistic regression departure prediction model

Although it is called regression, logistic regression is actually a classification model. It is a classic algorithm that is widely used for prediction because it is simple, parallelizable and highly interpretable.

The effectiveness of the logistic regression departure prediction model is evaluated using two metrics: accuracy and the area under the ROC curve (AUC). The final prediction accuracy of the model on the training set and the test set is 79.83% and 78.30%, respectively. The ROC curve of the model is shown in FIG. 2. As FIG. 2 shows, the prediction performance of the model on the training set and the test set is relatively stable and differs little: the AUC is 0.884 on the training set and 0.877 on the test set.
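The two metrics can be computed as in the following sketch. It uses the pairwise-ranking definition of AUC (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one) and a 0.5 decision threshold for accuracy; the threshold and the toy labels and scores are assumptions, not data from the source.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded score matches the label."""
    correct = sum(
        1 for y, s in zip(labels, scores) if (s >= threshold) == (y == 1)
    )
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data for illustration only.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.6, 0.8, 0.1]
acc = accuracy(labels, scores)
auc = roc_auc(labels, scores)
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two are reported together.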

2. Decision tree departure prediction model

Decision trees are widely used for classification and regression tasks; they are powerful and can fit complex data sets. They require no prior assumptions about the data, are fast to compute, produce results that are easy to interpret, and are robust.

The prediction accuracy of the decision tree departure prediction model on the training set and the test set is 100% and 82.55%, respectively. The ROC curve of the model in this embodiment is shown in FIG. 3. As FIG. 3 shows, the AUC of the model is 1 on the training set and 0.81 on the test set. The performance of the model therefore differs considerably between the two sets; its near-perfect performance on the training set suggests that the decision tree overfitted during training.

3. Random forest departure prediction model

Random forest refers to a classifier that trains and predicts samples using multiple decision trees. Compared with a decision tree model, the random forest model is more stable and is not easy to over-fit. The number of decision trees in the present model is chosen to be 100.

The final prediction accuracy of the random forest departure prediction model on the training set and the test set is 100% and 87.74%, respectively. The ROC curve of the model in this embodiment is shown in FIG. 4. As FIG. 4 shows, the AUC of the model is 1 on the training set and 0.94 on the test set. Compared with the decision tree departure prediction model, the random forest model performs considerably better on the test set, so it partially overcomes the overfitting problem of the decision tree departure prediction model.

4. Gradient boosting tree departure prediction model

Random forests and gradient boosting trees are both ensemble learning methods. A random forest uses bagging: the final result is reached by voting over the decisions of multiple decision trees. A gradient boosting tree, by contrast, uses boosting: rather than voting over the outputs of multiple decision trees, it chains the trees and adds their outputs together. The number of decision trees in this model is set to 100.
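The difference between the two combination rules can be sketched as follows. The per-tree outputs are supplied directly instead of coming from trained trees, and the boosting half assumes a binary log-loss setup in which the summed stage outputs are log-odds; the function names and the learning rate are illustrative assumptions.

```python
import math
from typing import List


def bagging_vote(tree_votes: List[int]) -> int:
    """Random-forest style: majority vote over the 0/1 labels predicted
    by independently trained trees."""
    return 1 if sum(tree_votes) * 2 > len(tree_votes) else 0


def boosting_probability(
    initial_score: float, tree_outputs: List[float], learning_rate: float = 0.1
) -> float:
    """Boosting style: add the scaled outputs of sequentially trained trees
    to an initial score, then map the total (log-odds) to a probability."""
    raw = initial_score + learning_rate * sum(tree_outputs)
    return 1.0 / (1.0 + math.exp(-raw))


vote = bagging_vote([1, 0, 1, 1, 0])                       # three of five trees say "leave"
probability = boosting_probability(0.0, [2.0, 3.0, 5.0])   # stages are summed, not voted on
```

In bagging each tree has an equal say in the final vote, while in boosting each later tree contributes a correction on top of what the earlier trees have already produced.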

The final prediction accuracy of the gradient boosting tree departure prediction model on the training set and the test set is 94.10% and 87.26%, respectively. The ROC curve of the model in this embodiment is shown in FIG. 5. As FIG. 5 shows, the AUC of the model is 0.9885 on the training set and 0.9333 on the test set. The gradient boosting tree departure prediction model therefore performs well, and its performance on the training set and the test set is comparatively consistent.
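The consistency of the four models can be compared by computing the gap between training and test accuracy from the figures reported in this embodiment. The dictionary layout is just a convenient way to hold those figures.

```python
# (training accuracy %, test accuracy %) as reported in this embodiment.
reported = {
    "logistic regression": (79.83, 78.30),
    "decision tree": (100.00, 82.55),
    "random forest": (100.00, 87.74),
    "gradient boosting tree": (94.10, 87.26),
}

# Gap in percentage points; a larger gap indicates more overfitting.
gaps = {name: round(train - test, 2) for name, (train, test) in reported.items()}

for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name}: {gap:.2f} percentage points")
```

The gaps are 1.53, 6.84, 12.26 and 17.45 percentage points for logistic regression, gradient boosting tree, random forest and decision tree respectively, which matches the observation that the decision tree overfits most and that the gradient boosting tree is more consistent than either tree-based alternative.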

As the above construction of departure prediction models with several machine learning algorithms shows, different machine learning algorithms give the resulting departure prediction models different prediction performance. In this application, the positive and negative samples of employee departure are constructed from the historical information data of employees who have left, and the labeled samples are used as the training set and the test set, which improves the reliability of the sample data and the prediction accuracy of the prediction model. This avoids placing current employees in the training set, which would leave those employees impossible to predict, and avoids the inaccurate predictions that result from an incorrect assumption.

According to the method for constructing the enterprise employee departure prediction model provided by the first embodiment of the invention, positive and negative samples of employee departure are constructed from the historical information data of employees who have left, and the labeled samples are used as the training set and the test set, so that the reliability of the sample data is improved and the prediction accuracy of the prediction model is improved. The construction method is simple, effective and feasible.

In the first embodiment, a method for constructing an enterprise employee departure prediction model is provided; correspondingly, the present application also provides a system for constructing an enterprise employee departure prediction model. Please refer to FIG. 6, which is a block diagram of a system for constructing an enterprise employee departure prediction model according to a second embodiment of the present invention. Since the system embodiment is substantially similar to the method embodiment, it is described relatively briefly, and the relevant points may be found in the corresponding description of the method embodiment. The system embodiment described below is merely illustrative.

FIG. 6 shows a block diagram of a system for constructing an enterprise employee departure prediction model according to a second embodiment of the present invention. The system includes an information acquisition module, a preprocessing module and a model construction module. The information acquisition module is used for acquiring historical information data of enterprise employees, the historical information data comprising historical information data of current employees and historical information data of employees who have left. The preprocessing module is used for preprocessing the historical information data of the enterprise employees, and extracting and quantifying multiple items of employee characteristic information. The model construction module is used for constructing positive and negative samples of employee departure from the historical information data of employees who have left, and constructing an employee departure prediction model using a machine learning algorithm, the employee departure prediction model being used for predicting the departure probability of current employees.

In this embodiment, the model construction module includes a sample processing unit, which takes the year as the time sampling unit, dynamically splits each employee who has left into multiple pieces of sample data according to the years of employment, takes the data of the year in which the employee left as a positive sample, and takes the data of the years in which the employee remained employed as negative samples.

In this embodiment, the employee characteristic information includes total years of work, job level, age, work history, average tenure per employer, tenure at the previous employer, tenure at the present enterprise, whether the employee has local household registration, gender, department, blood type, educational background, and the like.

In this embodiment, the employee departure prediction model specifically includes a logistic regression departure prediction model, a decision tree departure prediction model, a random forest departure prediction model, a gradient boosting tree departure prediction model, or a deep learning departure prediction model.
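The three-module structure can be sketched as a set of small classes. This is only one way to organize the modules: the class and method names, the record layout (a missing `departure_year` marking a current employee) and the placeholder learner `base_rate_fit` are all assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Model = Callable[[List[float]], float]


class InformationAcquisitionModule:
    """Returns historical records split into current and departed employees."""

    def __init__(self, records: List[Record]) -> None:
        self._records = list(records)

    def acquire(self) -> Tuple[List[Record], List[Record]]:
        # Assumption: a missing or None "departure_year" marks a current employee.
        current = [r for r in self._records if r.get("departure_year") is None]
        departed = [r for r in self._records if r.get("departure_year") is not None]
        return current, departed


class PreprocessingModule:
    """Turns raw records into numeric feature vectors."""

    def __init__(self, quantify: Callable[[Record], List[float]]) -> None:
        self._quantify = quantify

    def transform(self, records: List[Record]) -> List[List[float]]:
        return [self._quantify(r) for r in records]


class ModelConstructionModule:
    """Fits a model on labeled samples derived from departed employees only."""

    def __init__(self, fit: Callable[[List[List[float]], List[int]], Model]) -> None:
        self._fit = fit

    def build(self, features: List[List[float]], labels: List[int]) -> Model:
        return self._fit(features, labels)


def base_rate_fit(features: List[List[float]], labels: List[int]) -> Model:
    # Placeholder learner: always predicts the share of positive labels.
    rate = sum(labels) / len(labels)
    return lambda _features: rate


records = [
    {"id": "E1", "age": 30, "departure_year": None},
    {"id": "E2", "age": 41, "departure_year": 2019},
    {"id": "E3", "age": 25},
]
current, departed = InformationAcquisitionModule(records).acquire()
vectors = PreprocessingModule(lambda r: [float(r["age"])]).transform(departed)
model = ModelConstructionModule(base_rate_fit).build(vectors, [1])
```

Passing the quantification and fitting functions in as parameters keeps the modules independent of any particular feature set or learning algorithm, so any of the listed model types could be substituted for the placeholder.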

The above is a description of the system for constructing an enterprise employee departure prediction model according to the second embodiment of the present invention.

The system for constructing the enterprise employee departure prediction model and the method for constructing the enterprise employee departure prediction model provided by the invention share the same inventive concept and have the same beneficial effects, which are not repeated herein.

As shown in fig. 7, a block diagram of an intelligent terminal according to a third embodiment of the present invention is shown, where the terminal includes a processor, an input device, an output device, and a memory, where the processor, the input device, the output device, and the memory are connected to each other, the memory is used to store a computer program, and the computer program includes program instructions, and the processor is configured to call the program instructions to execute the method described in the foregoing embodiment.

It should be understood that in the embodiments of the present invention, the Processor may be a Central Processing Unit (CPU), and the Processor may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The input device may include a touch pad, a fingerprint sensor (for collecting fingerprint information of a user and direction information of the fingerprint), a microphone, etc., and the output device may include a display (LCD, etc.), a speaker, etc.

The memory may include both read-only memory and random access memory, and provides instructions and data to the processor. A portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.

In a specific implementation, the processor, the input device, and the output device described in the embodiments of the present invention may execute the implementation described in the method embodiments provided in the embodiments of the present invention, and may also execute the implementation described in the system embodiments in the embodiments of the present invention, which is not described herein again.

The invention also provides an embodiment of a computer-readable storage medium, in which a computer program is stored, which computer program comprises program instructions that, when executed by a processor, cause the processor to carry out the method described in the above embodiment.

The computer readable storage medium may be an internal storage unit of the terminal described in the foregoing embodiment, for example, a hard disk or a memory of the terminal. The computer readable storage medium may also be an external storage device of the terminal, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the terminal. Further, the computer-readable storage medium may also include both an internal storage unit and an external storage device of the terminal. The computer-readable storage medium is used for storing the computer program and other programs and data required by the terminal. The computer readable storage medium may also be used to temporarily store data that has been output or is to be output.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both, and that the components and steps of the examples have been described above generally in terms of their functions in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the terminal and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed terminal and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division of the units is only one logical division, and other divisions are possible in practice: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electrical, mechanical or other form of connection.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced. Such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being covered by the claims and description of the present invention.

Claims

1. A method for constructing an enterprise employee departure prediction model is characterized by comprising the following steps:

2. The method of claim 1, wherein the constructing positive and negative samples of employee departures from historical information data of employees who have departed specifically comprises:

taking the year as a time sampling unit;

3. The method of claim 1, wherein the employee characteristic information comprises working age, job level, age, work experience, average working age at each employer, working age at the previous employer, working age in the enterprise, whether the employee has a local household registration, gender, department, blood type, and education level.

4. The method of claim 1, wherein the employee departure prediction model specifically comprises a logistic regression departure prediction model, a decision tree departure prediction model, a random forest departure prediction model, a gradient boosting tree departure prediction model, or a deep learning departure prediction model.

5. A system for constructing an enterprise employee departure prediction model, characterized by comprising: an information acquisition module, a preprocessing module and a model construction module,

6. The system of claim 5, wherein the model construction module comprises a sample processing unit, and the sample processing unit takes the year as a time sampling unit, dynamically divides the data of each employee who has left the job into a plurality of sample data according to the years of employment, takes the data of the year in which the employee left the job as a positive sample, and takes the data of the years in which the employee was still on the job as negative samples.

7. The system of claim 5, wherein the employee characteristic information comprises working age, job level, age, work experience, average working age at each employer, working age at the previous employer, working age in the enterprise, whether the employee has a local household registration, gender, department, blood type, and education level.

8. The system of claim 5, wherein the employee departure prediction model specifically comprises a logistic regression departure prediction model, a decision tree departure prediction model, a random forest departure prediction model, a gradient boosting tree departure prediction model, or a deep learning departure prediction model.

9. An intelligent terminal comprising a processor, an input device, an output device and a memory, the processor, the input device, the output device and the memory being interconnected, the memory being adapted to store a computer program, the computer program comprising program instructions, characterized in that the processor is configured to invoke the program instructions to perform the method according to any of claims 1-4.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to perform the method according to any of claims 1-4.
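
As a non-limiting illustration, the year-based sample construction recited in claim 6 may be sketched as follows. This is a minimal sketch under stated assumptions and not the implementation of the invention: the names `DepartedEmployee`, `YearlySample` and `build_yearly_samples` are hypothetical, the record is assumed to carry only a hire year and a departure year, and the only feature computed is the working age in the enterprise at each sampled year.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DepartedEmployee:
    """Hypothetical record for an employee who has already left the job."""
    employee_id: str
    hire_year: int
    departure_year: int


@dataclass
class YearlySample:
    """One sample per employee per year of employment (hypothetical layout)."""
    employee_id: str
    year: int
    working_age_in_enterprise: int  # whole years since hire (assumed feature)
    label: int  # 1 = left the job in this year, 0 = still on the job


def build_yearly_samples(emp: DepartedEmployee) -> List[YearlySample]:
    """Split one departed employee's history into yearly samples.

    The year is the time sampling unit: the departure year yields the
    positive sample, and every earlier year of employment yields a
    negative sample.
    """
    samples = []
    for year in range(emp.hire_year, emp.departure_year + 1):
        label = 1 if year == emp.departure_year else 0
        samples.append(
            YearlySample(emp.employee_id, year, year - emp.hire_year, label)
        )
    return samples


if __name__ == "__main__":
    # Illustrative record only; not data from the invention.
    for sample in build_yearly_samples(DepartedEmployee("E001", 2017, 2020)):
        print(sample)
```

In such a sketch, each departed employee contributes exactly one positive sample and one negative sample for every earlier year of employment, and the resulting labeled samples could then be divided into a training set and a test set for any of the model types listed in claims 4 and 8.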