CN105069526A - Method of calculating employee retention degree probability - Google Patents

Method of calculating employee retention degree probability

Info

Publication number
CN105069526A
CN105069526A (application CN201510464301.4A)
Authority
CN
China
Prior art keywords
employee
retention
beta
computing method
sample set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510464301.4A
Other languages
Chinese (zh)
Inventor
奚国坚
何倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Pacific Insurance Group Co Ltd CPIC
Original Assignee
China Pacific Insurance Group Co Ltd CPIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Pacific Insurance Group Co Ltd CPIC filed Critical China Pacific Insurance Group Co Ltd CPIC
Priority to CN201510464301.4A priority Critical patent/CN105069526A/en
Publication of CN105069526A publication Critical patent/CN105069526A/en
Pending legal-status Critical Current

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides a method of calculating employee retention degree probability. The method comprises the following steps: a, obtaining the retention states Y of the employees in a first sample set within a time period T after induction, wherein T is the length of the retention observation period; b, obtaining the input parameters X of the employees, wherein X = (X1, X2, ..., Xm); c, determining the values of the coefficients beta based on the input parameters X and the retention states Y, wherein beta = (beta0, beta1, beta2, ..., betam); d, obtaining the input parameters X of newly recruited employees, and calculating the retention probabilities P of the newly recruited employees based on the coefficients beta. Through statistical analysis of the behavioural data of historical employees, the present invention identifies the key behavioural characteristics that influence employee retention and predicts the retention of employees over a future period of time. At the same time, the present invention provides a calculation method for predicting the retention probabilities of newly recruited employees, which supplies enterprises with valuable quantitative assessment data for recruitment.

Description

Method of calculating employee retention degree probability
Technical field
The present invention relates to the field of recruiting high-retention employees, and in particular to scoring newly recruited employees by means of a questionnaire; more particularly, it relates to an algorithm for accurately calculating an employee's retention probability.
Background art
Staff turnover in the current labour market is very frequent. Whether for a large enterprise or for a small or medium-sized one, the recruitment process itself consumes a great deal of manpower and material resources, and frequent staff turnover also directly disrupts business operations. Employee retention can be pursued through a number of channels, for example by improving employee benefits, strengthening corporate culture, offering employees better development prospects, or motivating employees. For an enterprise, however, rigid measures such as employee benefits cannot be increased without limit, and whether an employee stays is also closely related to corporate culture, job design and similar factors. Research shows that the factors affecting employee retention differ considerably across regions and industries. One effective way to reduce recruitment cost is therefore to consider, at recruitment time, the various indicators of a candidate and to select employees with a high retention rate. Existing recruitment practice, however, still relies on intuition and coarse indicators, and lacks a method for accurately predicting an employee's retention probability.
Through statistical analysis of the behavioural data of historical employees, the present invention identifies the key behavioural characteristics that affect employee retention and predicts the retention of employees over a future period of time. Furthermore, based on the personality, motivation for joining and basic information of historical employees together with their predicted retention, the present invention provides a calculation method for predicting the retention probability of newly recruited employees, supplying enterprises with valuable quantitative assessment data for recruitment.
Summary of the invention
In view of the above analysis, the present invention aims to provide a method of calculating employee retention degree probability, for calculating the retention probability P of one or more newly recruited employees.
The invention discloses a method of calculating employee retention degree probability, achieved through the following technical solution, comprising:
a. obtaining the retention state Y of the employees in a first sample set within a time period T after induction, wherein T is the length of the retention observation period and Y takes the value 0 or 1;
b. obtaining the input parameters X of said employees, wherein X = (X_1, X_2, ..., X_m);
c. determining the values of the coefficients β based on said input parameters X and said retention states Y, wherein β = (β_0, β_1, ..., β_m);
d. obtaining the input parameters X of a newly recruited employee and calculating the retention probability P of said newly recruited employee based on said coefficients β.
Preferably, said step a comprises the following sub-steps:
a1. obtaining the indicator information R of the employees in a second sample set within the time period T after induction, wherein R = (R_1, R_2, ..., R_m);
a2. obtaining the retention state Y of said employees within the time period T after induction, wherein Y = 1 denotes attrition and Y = 0 denotes retention;
a3. classifying said indicator information R by a decision tree algorithm to obtain a computation rule S under which an employee's state within the time period T is attrition;
a4. obtaining the indicator information R of the employees in the first sample set and calculating their retention states Y based on said computation rule S.
Preferably, the decision tree algorithm of said step a3 is realised as follows:
a31. presetting the numbers of parent nodes and child nodes of the decision tree, and specifying the number of levels of the tree structure, wherein said number of levels serves as the convergence criterion of the decision tree algorithm;
a32. if the classifications of F variables are all significant, comparing the significance indicator values of the F variable classifications and taking the indicator information R corresponding to the smallest significance indicator value as the current best grouping variable, wherein m > F > 2;
a33. forming the child nodes under this branch from all the subordinate groups of said best grouping variable, and repeating steps a31, a32 and a33 for each child node until the preset number of levels is reached;
a34. choosing, according to the attrition proportion of each finally formed leaf node, the rules contained in the leaf nodes with the higher proportions as said rule S.
Preferably, in said step a32, the step of judging whether the classifications of the F variables are all significant comprises:
first, calculating the chi-square values of all variable classifications in the indicator information R;
second, if the chi-square value of a variable classification is greater than a first threshold, determining that the classification of that variable is significant.
Preferably, in said step a3, the decision tree algorithm may adopt any one of the CHAID algorithm, the CART algorithm or the C4.5 algorithm.
Preferably, in said step c, a logistic regression algorithm is adopted to build the functional relationship between the parameters X and the retention state Y.
Preferably, said logistic regression algorithm comprises the following steps:
c1. letting $Q_i = Q\{Y_i = 1 \mid (X_{1i}, X_{2i}, \ldots, X_{ki})\}$ be the probability that $Y_i = 1$ given $X_{1i}, X_{2i}, \ldots, X_{ki}$; under the same conditions the probability that $Y_i = 0$ can be expressed as $1 - Q_i$;
c2. obtaining from said step c1 the retention probability of each employee in said first sample set: $Q(Y_i) = Q_i^{Y_i}(1 - Q_i)^{1 - Y_i}$;
c3. determining that the joint distribution of the first sample set employees is the product of the marginal distributions of said retention probabilities, expressed as the likelihood function $L(\beta) = \prod_{i=1}^{n} Q_i^{Y_i}(1 - Q_i)^{1 - Y_i}$;
c4. taking the natural logarithm of the formula obtained in said step c3 to obtain the log-likelihood function:

$$\ln L(\beta) = \ln\Bigl(\prod_{i=1}^{n} Q_i^{Y_i}(1-Q_i)^{1-Y_i}\Bigr) = \sum_{i=1}^{n}\bigl[Y_i \ln Q_i + (1-Y_i)\ln(1-Q_i)\bigr] = \sum_{i=1}^{n}\Bigl[Y_i \ln\frac{Q_i}{1-Q_i} + \ln(1-Q_i)\Bigr] = \sum_{i=1}^{n}\Bigl[Y_i\Bigl(\beta_0 + \sum_{j=1}^{k}\beta_j X_{ji}\Bigr) - \ln\Bigl(1 + e^{\beta_0 + \sum_{j=1}^{k}\beta_j X_{ji}}\Bigr)\Bigr]$$

c5. taking the partial derivatives of said log-likelihood function with respect to β, thereby determining said parameter values β.
Preferably, in said steps c4 and c5, the partial derivatives are set equal to 0 and solved, so as to obtain the parameter values that maximise the likelihood function.
Preferably, said parameters β may be determined by either the Newton-Raphson iterative algorithm or the Fisher Scoring iterative algorithm.
Preferably, the formula for computing the probability P in said step d is:

$$P = \frac{\exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m)}{1 + \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m)}$$
Preferably, the method further comprises the steps of:
e. adjusting the β values according to the correspondence between the predicted retention rate and the actual retention rate and according to business rule constraints;
f. calculating the retention score Z of said newly recruited employee based on said input parameters X and the adjusted coefficients β'.
Preferably, in said steps e to f, the retention score Z is calculated by the following formula:

$$Z = 100 \times \Bigl(1 - \sum_{i=1}^{m} X_i \beta'_i\Bigr)$$
Preferably, in said step b, the parameters X of the first sample set employees are obtained as follows:
first, confirming the key factors of the business dimension through interviews with recruiters, to guide the design of a questionnaire on possible retention factors;
second, choosing key branch companies for questionnaire data collection, as confirmed by the distribution characteristics of existing employees and in coordination with the business departments, and obtaining the questionnaire information X of the first sample set employees;
finally, selecting the questions whose correlation with the retention state Y is higher as the input parameters X entering the model, and using them as the recruitment questionnaire questions thereafter.
The present invention provides a method of calculating employee retention degree probability: a decision tree algorithm is used to analyse the retention of historical employees, a retention definition model is established from the historical data, and the retention of employees is predicted. The decision tree model only needs to be built once and can then be applied repeatedly, and the maximum number of computations for each prediction does not exceed the depth of the tree, which meets the efficiency requirements of enterprise operation; moreover, the decision tree model is highly descriptive and facilitates manual analysis. A logistic regression algorithm is used to estimate the retention probability of newly recruited employees; the prediction is of high reference value to an enterprise recruiting new employees and helps to reduce recruitment cost.
Brief description of the drawings
Other features, objects and advantages of the present invention will become more apparent by reading the detailed description of the non-limiting embodiments made with reference to the following drawings:
Fig. 1 shows, according to the first embodiment of the present invention, a flow chart of calculating the employee retention probability in an algorithm for predicting employee retention degree probability;
Fig. 2 shows, according to the second embodiment of the present invention, a flow chart of the algorithm for predicting the employee retention state in a method of calculating employee retention degree probability;
Fig. 3 shows, according to the second embodiment of the present invention, a flow chart of building the decision tree in a method of calculating employee retention degree probability;
Fig. 4 shows, according to the third embodiment of the present invention, a flow chart of building a recruitment scorecard model by the logistic regression algorithm;
Fig. 5 shows, according to the fourth embodiment of the present invention, a flow chart of calculating the questionnaire score of a newly recruited employee;
Fig. 6 shows the decision tree model built according to the second embodiment of the present invention; and
Fig. 7 shows, according to the first embodiment of the present invention, a statistical chart verifying the accuracy of the employee retention rate predicted by the present invention with sample data.
Detailed description of the embodiments
In order to present the technical solution of the present invention more clearly, the invention is further described below in conjunction with the accompanying drawings.
Fig. 1 shows, according to the first embodiment of the present invention, a flow chart of calculating the employee retention probability in an algorithm for predicting employee retention degree probability; specifically, it comprises the following steps.
First, step S301 is entered: the retention state Y of the first sample set employees within a time period T after induction is obtained, where T is the length of the retention observation period and Y takes the value 0 or 1. Said retention state Y is a binary classification variable: Y = 1 indicates that the retention state of the employee is "attrition", and Y = 0 indicates that the retention state of the employee is "retention". Specifically, the retention status data of a number of employees within the time period T after induction are acquired and uploaded to a data storage system, and are defined as the first sample set.
It will be appreciated by those skilled in the art that, to ensure the modelling data are closer to the data of newly recruited staff, as a variant, employees with a more recent hiring date may be chosen for the first sample set; for example, employees who joined at times t_1, t_2, ..., t_k may be selected, where, assuming the current time is t_0, usually t_0 - t_1 < T, t_0 - t_2 < T, ..., t_0 - t_k < T. This does not affect the technical content of the present invention and is not repeated here.
Next, step S302 is performed: the input parameters X of said first sample set employees are obtained, where X = (X_1, X_2, ..., X_m). Specifically, the parameters X are the answer options of a questionnaire, covering personality, basic information, motivation for joining, recruitment source and so on. Further, the parameters X are designed by confirming the key factors of the business dimension through interviews with recruiters and turning those key factors into questionnaire questions. It will be appreciated by those skilled in the art that in this embodiment the parameters X are obtained by having the employees fill in a paper questionnaire. As a variant, the input parameters X may also be collected via the web, specifically by providing the employees with a browser entry for filling in the questionnaire and uploading the collected data to the data storage system. As another variant, the parameters X may be collected via a mobile terminal, specifically by providing the employees with a mobile client for filling in the questionnaire offline and uploading the collected data to the data storage system through a data interface service. Other collection methods may also be adopted to obtain the input parameters X; this does not affect the technical content of the present invention and is not repeated here.
Then, step S303 is performed: the values of the coefficients β are determined based on the input parameters X and the retention states Y of said first sample set employees, where β = (β_0, β_1, ..., β_m).
Specifically, a logistic regression algorithm may be adopted to build the functional relationship between the parameters X and the retention state Y and to compute the values of the coefficients β; more specifically, the technical method shown in Fig. 4 may be used.
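As a purely illustrative sketch (not part of the patent text), step S303 could be realised with an off-the-shelf maximum-likelihood fit; the library choice, feature layout and synthetic data below are assumptions, since the embodiment itself estimates β inside the Enterprise Miner tool.

```python
# Hypothetical realisation of step S303: fit the logistic relationship between
# questionnaire parameters X and retention state Y by maximum likelihood and
# read off beta = (beta0, beta1, ..., betam). Data are synthetic placeholders.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))                    # questionnaire parameters X1..X4
y = (rng.uniform(size=300) < 0.5).astype(int)    # retention state Y (toy labels)

result = sm.Logit(y, sm.add_constant(X)).fit(disp=False)   # Newton-type MLE
print(result.params)                             # [beta0, beta1, ..., beta4]
```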
Finally, step S304 is performed: the input parameters X of a newly recruited employee are obtained and, based on said coefficients β, the attrition probability P of said newly recruited employee (i.e. the probability that Y = 1) is calculated by the following formula:

$$P = \frac{\exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m)}{1 + \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m)}$$
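For illustration only, the formula of step S304 can be evaluated directly once β is known; the coefficient and answer values in the sketch below are hypothetical placeholders, not values from the patent.

```python
import numpy as np

def loss_probability(beta0, beta, x):
    """Logistic form of the formula above: P = exp(b0 + b.x) / (1 + exp(b0 + b.x))."""
    z = beta0 + np.dot(beta, x)
    return 1.0 / (1.0 + np.exp(-z))   # algebraically identical, numerically safer

# Hypothetical fitted coefficients and questionnaire answers of one new hire
beta0 = -1.2
beta = np.array([0.8, -0.5, 0.3])
x_new = np.array([1.0, 0.0, 2.0])

print(loss_probability(beta0, beta, x_new))   # predicted attrition probability P
```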
Fig. 2 shows, according to the second embodiment of the present invention, a flow chart of the algorithm for predicting the employee retention state in a method of calculating employee retention degree probability; specifically, it comprises the following steps.
First, step S101 is performed: the indicator information R of the second sample set employees within the time period T after induction is obtained, where R = (R_1, R_2, ..., R_m). Preferably, employees who joined in the same period in the past are chosen as the second sample set, and their behavioural performance during the first 13 months after joining is obtained, including features such as the monthly performance-achievement rate, attendance and first-year commission; this is defined as the indicator information R. It will be appreciated by those skilled in the art that the indicator information R consists of the basic behavioural features used to evaluate employee performance in the enterprise management system, and is obtained through the relevant personnel department.
Next, step S102 is performed: the retention states Y of said employees within the time period T after induction are obtained, where Y = 1 denotes attrition and Y = 0 denotes retention. Specifically, the retention of the second sample set employees of said step S101 is observed 13 months after joining: if an employee is retained, the retention state of that employee is Y = 0; if the employee has left, the retention state is Y = 1.
Then, step S103 is performed: said indicator information R is classified by a decision tree algorithm to obtain a computation rule S under which an employee's state within the time period T is attrition. Those skilled in the art understand that when an employee's state is attrition, i.e. Y = 1, the behavioural characteristics are more pronounced than in the classification rules obtained for the retained state; therefore the computation rule S for attrition within the time period T predicts the employee retention state better. In this preferred embodiment the CHAID algorithm is adopted to obtain said computation rule S; the detailed procedure is described in conjunction with Fig. 3.
As a variant, said computation rule S may also be obtained with the C4.5 algorithm to build the required decision tree. It will be appreciated by those skilled in the art that the C4.5 algorithm specifically comprises the following steps (a toy sketch of the gain-ratio computation follows this list):
(1) sorting the training data set (the second sample set) according to each item of indicator information R;
(2) using each item of indicator information R in turn to dynamically partition the training data set;
(3) determining a threshold among the different result sets obtained after partitioning, the threshold dividing the training data set into two parts;
(4) calculating the gain or gain ratio of each part on both sides of the threshold, so that the chosen split maximises the gain; attributes with high information gain are placed closer to the root node and are given priority when growing the tree, i.e. the attribute with the highest information gain is selected at each node from the root downwards.
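A toy sketch of the gain and gain-ratio computation referred to in step (4), under the assumption of binary labels and a single boolean split; the data are invented for illustration only.

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a vector of 0/1 labels."""
    p = np.bincount(y) / len(y)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def gain_ratio(y, mask):
    """Information gain of splitting labels y by the boolean mask,
    divided by the split information (C4.5's normalisation)."""
    left, right = y[mask], y[~mask]
    w = len(left) / len(y)
    gain = entropy(y) - w * entropy(left) - (1 - w) * entropy(right)
    split_info = entropy(np.where(mask, 1, 0))
    return gain / split_info if split_info > 0 else 0.0

y = np.array([1, 1, 1, 0, 0, 0, 1, 0])            # 1 = attrition, 0 = retained
mask = np.array([True, True, True, True, False, False, False, False])
print(gain_ratio(y, mask))
```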
As another variant, said computation rule S may also be obtained with the CART algorithm to build the decision tree. Specifically, the difference between the CART algorithm and the C4.5 algorithm is that the attribute selection measure it uses is the Gini index, which measures the impurity of a split of the training data set (the second sample set); the attribute with the smallest Gini value is selected as the test attribute, and the smaller the Gini value, the higher the "purity" of the sample.
The CART algorithm treats a node as a leaf node and performs no further branching when any one of the following conditions is met:
(1) every leaf node contains a single sample, the number of samples is smaller than a given minimum, or all samples belong to the same class;
(2) the height of the decision tree reaches the threshold set by the user, or the samples in a leaf node after branching all belong to the same class;
(3) the training data set no longer contains attribute vectors available for branch selection.
Finally, step S104 is performed: the indicator information R of said first sample set employees is obtained and, based on said computation rule S, the retention states Y of the first sample set employees are calculated. Preferably, employees who joined in the same period in the past are chosen as the first sample set, their indicator information R during the first 13 months is obtained, and their retention states Y are predicted based on said computation rule S. More specifically, reference may be made to the decision tree model shown in Fig. 6.
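As a rough, non-authoritative stand-in for steps S101 to S104, a CART-style tree (Gini criterion) can be fitted on the indicator information R and its high-attrition leaves read off as rule S. The column names echo the embodiment, but the data, library choice and parameters are assumptions; the patent itself builds a CHAID tree in Enterprise Miner.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
R = pd.DataFrame({
    "perf_mon_ratio": rng.uniform(0, 1, 500),    # performance-achievement rate
    "att_pct":        rng.uniform(0, 1, 500),    # attendance rate
    "FYC_avg":        rng.uniform(0, 2000, 500)  # first-year commission
})
y = (R["perf_mon_ratio"] < 0.4).astype(int)      # 1 = attrition (toy label)

tree = DecisionTreeClassifier(criterion="gini", max_depth=3, min_samples_leaf=30)
tree.fit(R, y)
# Leaves with a high attrition share correspond to the computation rule S
print(export_text(tree, feature_names=list(R.columns)))
```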
Fig. 3 shows, according to the second embodiment of the present invention, a flow chart of the decision tree algorithm in a method of calculating employee retention degree probability; specifically, it comprises the following steps.
First, step S201 is performed: the numbers of parent nodes and child nodes of the decision tree are preset, and the number of levels of the tree structure is specified, said number of levels serving as the convergence criterion of the decision tree algorithm. Those skilled in the art understand that the decision tree grows and stops according to these set parameters, and that the total sample size usually needs to be considered; preferably the number of samples required at a parent node is twice that at a child node. If the setting is too small, the generated tree becomes very bushy and yields many very small segments, which weakens its practical reference value; the same reasoning applies to the setting of the number of levels of the decision tree.
Next, step S202 is performed: the chi-square values of all classes of behavioural features in the indicator information R of said second sample set employees are calculated. Specifically, the indicator information R = (R_1, R_2, ..., R_m) of employees who joined in the same period in the past is chosen as the sample data, including behavioural features such as the performance-achievement rate, attendance and commission; the m kinds of behavioural performance during the employees' first 13 months are regarded as different candidate nodes, and the chi-square values are calculated according to the actual retention state Y of the employees after 13 months.
Then, step S203 is performed: the significance of each behavioural feature is compared, and if the chi-square value of a variable classification is greater than a first threshold, the classification of that variable is determined to be significant. It will be appreciated by those skilled in the art that determining the significance of each behavioural feature specifically comprises the following steps (a short sketch of this test is given after the list):
(1) taking as the null hypothesis that employee retention is unrelated to the indicator information R, i.e. that the variable R has no effect on the output variable Y; the larger the chi-square value, the more likely the two are related;
(2) determining the degrees of freedom, (number of rows - 1) × (number of columns - 1) = 1, choosing the significance level α = 0.05, and obtaining the critical chi-square value K_0.05;
(3) calculating the chi-square value K of the indicator information R and comparing K with K_0.05: if K > K_0.05, the null hypothesis is rejected, which shows that employee retention is related to the indicator information R, i.e. that the classification of said indicator information R is significant.
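A short sketch of this test under the stated assumptions (a 2x2 contingency table, α = 0.05, one degree of freedom); the counts are invented for illustration.

```python
from scipy.stats import chi2, chi2_contingency

table = [[120, 80],   # e.g. low achievement rate: [attrition, retained]
         [ 60, 140]]  # high achievement rate:     [attrition, retained]
K, p_value, dof, _ = chi2_contingency(table, correction=False)
K_crit = chi2.ppf(1 - 0.05, df=1)   # K_0.05 for (2-1)*(2-1) = 1 degree of freedom

print(K, K_crit)
if K > K_crit:   # reject the null hypothesis: R is related to retention
    print("classification of this variable is significant")
```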
Further, if the classifications of F variables are all significant, the significance indicator values of the F variable classifications are compared, and the indicator information R corresponding to the smallest significance indicator value is taken as the current best grouping variable, where m > F > 2.
Next, step S204 is performed: all the subordinate groups of said best grouping variable form the child nodes under this branch, and steps S202, S203 and S204 are repeated for each child node until the preset number of levels is reached, at which point the procedure stops according to the preset number of tree levels.
Finally, step S205 is performed: according to the attrition proportion of each finally formed leaf node, the rules contained in the leaf nodes with the higher proportions are chosen as said rule S.
It will be appreciated by those skilled in the art that the decision tree algorithm described in Fig. 3 is the CHAID algorithm, i.e. Chi-squared Automatic Interaction Detection. In a variant, the C4.5 or CART decision tree algorithm may be adopted to select the best grouping variable; the trees constructed by different decision tree computation methods differ, and the most suitable method for obtaining the optimal nodes and building the decision tree model may be chosen according to the actual situation.
Fig. 4 shows, according to the third embodiment of the present invention, a flow chart of building a recruitment scorecard model by the logistic regression algorithm. Specifically, it comprises the following steps.
First, step S401 is performed: the questionnaire information of the first sample set employees is obtained, including features such as personality, motivation for joining and recruitment source, and is defined as the parameters X; at the same time, the predicted retention states Y of said first sample set employees are obtained. Specifically, the parameters X are the answer options of the questionnaire and are determined by the following steps:
(1) confirming the key factors of the business dimension through interviews with recruiters, to guide the design of a questionnaire on possible retention factors;
(2) choosing key branch companies for questionnaire data collection, as confirmed by the distribution characteristics of existing employees and in coordination with the business departments, and obtaining the questionnaire information X of the first sample set employees;
(3) selecting the questions whose correlation with the retention state Y is higher as the input parameters X entering the model, and using them as the recruitment questionnaire questions thereafter.
Next, step S402 is performed: let $Q_i = Q\{Y_i = 1 \mid (X_{1i}, X_{2i}, \ldots, X_{ki})\}$ be the probability that $Y_i = 1$ given $X_{1i}, X_{2i}, \ldots, X_{ki}$; under the same conditions the probability that $Y_i = 0$ is expressed as $1 - Q_i$, and the retention probability of said first sample set employees is expressed as $Q(Y_i) = Q_i^{Y_i}(1 - Q_i)^{1 - Y_i}$.
Then, step S403 is performed to obtain the likelihood function:

$$L(\theta) = \prod_{i=1}^{n} Q_i^{Y_i}(1 - Q_i)^{1 - Y_i}$$

It will be appreciated by those skilled in the art that, taking the information of said first sample set employees as the observed data, each observation is mutually independent, so their joint distribution can be expressed as the product of the marginal distributions and represented by the likelihood function.
Finally, step S404 is performed: the unknown parameters β are estimated by maximising said likelihood function. Because maximising the likelihood function directly is rather difficult, the natural logarithm is taken first, giving the log-likelihood function:

$$\ln L(\theta) = \ln\Bigl(\prod_{i=1}^{n} Q_i^{Y_i}(1-Q_i)^{1-Y_i}\Bigr) = \sum_{i=1}^{n}\bigl[Y_i \ln Q_i + (1-Y_i)\ln(1-Q_i)\bigr] = \sum_{i=1}^{n}\Bigl[Y_i \ln\frac{Q_i}{1-Q_i} + \ln(1-Q_i)\Bigr] = \sum_{i=1}^{n}\Bigl[Y_i\Bigl(\alpha + \sum_{j=1}^{k}\beta_j X_{ji}\Bigr) - \ln\Bigl(1 + e^{\alpha + \sum_{j=1}^{k}\beta_j X_{ji}}\Bigr)\Bigr]$$

Further, the partial derivatives of said log-likelihood function with respect to α and β are taken and set equal to 0, thereby determining said parameter values β.
It will be appreciated by those skilled in the art that the parameter values β are obtained by the Newton-Raphson iterative algorithm or the Fisher Scoring iterative algorithm, wherein said parameters X_{ji} are the acquired feature information of said first sample set employees and said parameters Y_i are the predicted retention states of said employees.
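A minimal from-scratch Newton-Raphson sketch for maximising the log-likelihood above, on synthetic data; it is given only to illustrate the iteration, since the embodiment estimates β inside Enterprise Miner rather than with hand-written code.

```python
# Newton-Raphson for logistic MLE: iterate beta <- beta + (X'WX)^-1 X'(y - q).
import numpy as np

rng = np.random.default_rng(1)
n, k = 400, 3
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, k))])  # first column = intercept
true_beta = np.array([-0.5, 1.0, -0.8, 0.4])
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-X @ true_beta))).astype(float)

beta = np.zeros(k + 1)
for _ in range(25):
    q = 1 / (1 + np.exp(-X @ beta))      # Q_i
    W = q * (1 - q)                      # diagonal weights of the Hessian
    grad = X.T @ (y - q)                 # gradient of ln L
    hess = X.T @ (X * W[:, None])        # negative Hessian of ln L
    step = np.linalg.solve(hess, grad)
    beta += step
    if np.max(np.abs(step)) < 1e-8:
        break
print(beta)   # maximum-likelihood estimate (intercept, beta1, ..., betak)
```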
Fig. 5 shows, according to the fourth embodiment of the present invention, a flow chart of calculating the questionnaire score of a newly recruited employee. Specifically, it comprises the following steps.
First, step S501 is performed: the weights β corresponding to the value of each question option in the questionnaire are obtained; the specific method of calculating the weights β is shown in conjunction with Fig. 4.
Next, step S502 is performed: the attrition score obtained is converted in reverse into a retention score. Those skilled in the art understand that in the above steps (obtaining the computation rule S for attrition, calculating the coefficients β, and predicting the attrition probability of the newly recruited employee) everything is carried out under the attrition condition, so the score finally calculated is an attrition score; it therefore needs to be converted into a retention score in order to analyse the retention of the newly recruited employee.
Next, step S503 is performed: the weights are adjusted according to business needs, and the adjusted weights are defined as β'. It will be appreciated by those skilled in the art that, because different industries have different requirements on employee qualities, enterprise leaders may appropriately adjust the weight of each element according to practical business experience, so as to achieve the most accurate prediction.
Finally, step S504 is performed: according to the relationship between score bands and the predicted retention rate, the weights are normalised to a 100-point scale. It will be appreciated by those skilled in the art that, specifically, the questionnaire score (retention score Z) of the newly recruited employee is obtained by the following formula:

$$Z = 100 \times \Bigl(1 - \sum_{i=1}^{m} X_i \beta'_i\Bigr)$$
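An illustrative evaluation of this formula; the adjusted weights β' and the questionnaire answers below are placeholders, not values from the patent.

```python
import numpy as np

beta_adj = np.array([0.12, 0.05, 0.20, 0.08])   # business-adjusted weights beta'
x_new    = np.array([1.0, 0.0, 1.0, 2.0])       # questionnaire answers of a new hire

Z = 100.0 * (1.0 - np.dot(x_new, beta_adj))     # Z = 100 * (1 - sum_i X_i * beta'_i)
print(round(Z, 1))                              # retention score; higher = more likely to stay
```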
Fig. 6 shows the decision tree model built according to the second embodiment of the present invention. Specifically, said decision tree model is built by the CHAID method, a chi-square based algorithm that determines the best split nodes and builds the decision tree. As can be seen from Fig. 6, the best split nodes affecting employee retention are the employee's performance-achievement rate (perf_mon_ratio) and the employee's first-year commission (FYC_avg). More specifically, the indicator information R of the second sample set employees within the time period T is taken as the training sample data set for modelling; those skilled in the art understand that, using the EM modelling tool and selecting the Chi-squared Automatic Interaction Detection method, nodes are generated automatically, and the higher the chi-square significance of a variable, the earlier it becomes a split node of the decision tree. In the decision tree shown in Fig. 6, the indicator information R of 3222 employees in the second sample set is taken as the sample data, of which 66.91% left and 33.09% were retained. When perf_mon_ratio < 0.4083, the attrition rate of the employees is 89.85%; when perf_mon_ratio >= 0.4083, the attrition rate is 12.55%. Further, child nodes continue to be generated: when perf_mon_ratio < 0.1548 the attrition rate is 99.86%, and when perf_mon_ratio >= 0.1548 it is 73.43%; since the splitting effect of this node is not significant, generation of further child nodes stops there. It can be seen that taking the performance-achievement rate (perf_mon_ratio) as the branch node of the decision tree distinguishes best between employees who stay and those who leave.
Next, child nodes of the decision tree continue to be generated. When perf_mon_ratio >= 0.4083 the attrition rate is 12.55%; taking the first-year commission (FYC_avg) as the branch node, when FYC_avg < 562.3692 the attrition rate is 23.38%, and when FYC_avg >= 562.3692 the attrition rate is 6.16%. It can be seen that taking the first-year commission (FYC_avg) as a child node also distinguishes clearly between employees who stay and those who leave.
Next, further child nodes are generated until the preset number of levels is reached. When FYC_avg >= 562.3692 the attrition rate is 6.16%; taking the attendance rate (att_pct) as the branch node, when att_pct < 0.3815 the attrition rate is 16.42%, and when att_pct >= 0.3815 the attrition rate is 3.21%. It can be seen that taking the attendance rate (att_pct) as a split node does not distinguish clearly between employees who stay and those who leave.
It will be appreciated by those skilled in the art that, from the above decision tree model, the computation rule S is obtained and the retention of the first sample set employees can be predicted, i.e. the value of Y determined. Specifically, said computation rule S comprises the following steps: first, the performance-achievement rate (perf_mon_ratio) of the first sample set employee to be predicted is examined; if perf_mon_ratio < 0.4083, the retention state of that employee is Y = 1. If perf_mon_ratio >= 0.4083, the first-year commission (FYC_avg) of the employee is further compared: if FYC_avg < 562.3692, the retention state is Y = 1; if FYC_avg >= 562.3692, the retention state is Y = 0.
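Written out as a small function, the rule S described above (thresholds taken from the Fig. 6 embodiment) reads as follows; this is only a transcription for clarity, not additional subject matter.

```python
def rule_S(perf_mon_ratio: float, fyc_avg: float) -> int:
    """Computation rule S from the Fig. 6 decision tree; returns Y (1 = attrition)."""
    if perf_mon_ratio < 0.4083:
        return 1          # Y = 1, predicted to leave
    if fyc_avg < 562.3692:
        return 1          # Y = 1, predicted to leave
    return 0              # Y = 0, predicted to stay

print(rule_S(0.30, 800.0))   # -> 1
print(rule_S(0.55, 900.0))   # -> 0
```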
Those skilled in the art understand that, as can be seen from Fig. 6, another set of test sample data containing the indicator information R of 5739 employees yields a decision tree model consistent with the above conclusion, and said test sample data verify the accuracy of the classification of the above decision tree model.
Fig. 7 shows, according to the first embodiment of the present invention, a statistical chart verifying with test data the accuracy of the employee retention rate predicted by the present invention. Specifically, the test data are 665 questionnaires from employees who joined between January and March 2014. The final result shows that retention status as of the end of October 2014 is indeed positively correlated with the score: in the 0-59 score band the retention rate is only 0.0%, while for scores above 90 the retention rate is as high as 89.5%. It can be seen that the algorithm has good separating power between employees who leave and employees who stay.
It will be understood by those skilled in the art that, preferably, the modelling tool is Enterprise Miner (EM for short), so the selected algorithms are the specific algorithms of the embedded nodes in EM. All or part of the flow of the above method embodiments may be implemented by a computer program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. Said computer-readable storage medium may be a magnetic disk, an optical disc, a read-only memory, a random access memory, or the like.
Specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to the above particular implementations; those skilled in the art may make various variations or modifications within the scope of the claims, and this does not affect the essential content of the present invention.

Claims (13)

1. A method of calculating employee retention degree probability, for calculating the retention probability P of one or more newly recruited employees, characterised by comprising the steps of:
a. obtaining the retention state Y of the employees in a first sample set within a time period T after induction, wherein T is the length of the retention observation period and Y takes the value 0 or 1;
b. obtaining the input parameters X of said employees, wherein X = (X_1, X_2, ..., X_m);
c. determining the values of the coefficients β based on said input parameters X and said retention states Y, wherein β = (β_0, β_1, ..., β_m);
d. obtaining the input parameters X of a newly recruited employee and calculating the retention probability P of said newly recruited employee based on said coefficients β.
2. The method according to claim 1, characterised in that said step a comprises the following sub-steps:
a1. obtaining the indicator information R of the employees in a second sample set within the time period T after induction, wherein R = (R_1, R_2, ..., R_m);
a2. obtaining the retention state Y of said employees within the time period T after induction, wherein Y = 1 denotes attrition and Y = 0 denotes retention;
a3. classifying said indicator information R by a decision tree algorithm to obtain a computation rule S under which an employee's state within the time period T is attrition;
a4. obtaining the indicator information R of the employees in the first sample set and calculating their retention states Y based on said computation rule S.
3. The method according to claim 2, characterised in that the decision tree algorithm of said step a3 is realised as follows:
a31. presetting the numbers of parent nodes and child nodes of the decision tree, and specifying the number of levels of the tree structure, wherein said number of levels serves as the convergence criterion of the decision tree algorithm;
a32. if the classifications of F variables are all significant, comparing the significance indicator values of the F variable classifications and taking the indicator information R corresponding to the smallest significance indicator value as the current best grouping variable, wherein m > F > 2;
a33. forming the child nodes under this branch from all the subordinate groups of said best grouping variable, and repeating steps a31, a32 and a33 for each child node until the preset number of levels is reached;
a34. choosing, according to the attrition proportion of each finally formed leaf node, the rules contained in the leaf nodes with the higher proportions as said rule S.
4. The method according to claim 3, characterised in that in said step a32 the step of judging whether the classifications of the F variables are all significant comprises:
- calculating the chi-square values of all variable classifications in the indicator information R;
- if the chi-square value of a variable classification is greater than a first threshold, determining that the classification of that variable is significant.
5. The method according to claim 2, characterised in that in said step a3 the decision tree algorithm may adopt any one of the following algorithms:
- the CHAID algorithm;
- the CART algorithm; or
- the C4.5 algorithm.
6. The method according to any one of claims 1 to 5, characterised in that in said step c a logistic regression algorithm is adopted to build the functional relationship between the parameters X and the retention state Y.
7. The method according to claim 6, characterised in that said logistic regression algorithm comprises the following steps:
c1. letting $Q_i = Q\{Y_i = 1 \mid (X_{1i}, X_{2i}, \ldots, X_{ki})\}$ be the probability that $Y_i = 1$ given $X_{1i}, X_{2i}, \ldots, X_{ki}$; under the same conditions the probability that $Y_i = 0$ can be expressed as $1 - Q_i$;
c2. obtaining from said step c1 the retention probability of each employee in said first sample set: $Q(Y_i) = Q_i^{Y_i}(1 - Q_i)^{1 - Y_i}$;
c3. determining that the joint distribution of the first sample set employees is the product of the marginal distributions of said retention probabilities, expressed as the likelihood function $L(\theta) = \prod_{i=1}^{n} Q_i^{Y_i}(1 - Q_i)^{1 - Y_i}$;
c4. taking the natural logarithm of the formula obtained in said step c3 to obtain the log-likelihood function:

$$\ln L(\theta) = \ln\Bigl(\prod_{i=1}^{n} Q_i^{Y_i}(1-Q_i)^{1-Y_i}\Bigr) = \sum_{i=1}^{n}\bigl[Y_i \ln Q_i + (1-Y_i)\ln(1-Q_i)\bigr] = \sum_{i=1}^{n}\Bigl[Y_i \ln\frac{Q_i}{1-Q_i} + \ln(1-Q_i)\Bigr] = \sum_{i=1}^{n}\Bigl[Y_i\Bigl(\beta_0 + \sum_{j=1}^{k}\beta_j X_{ji}\Bigr) - \ln\Bigl(1 + e^{\beta_0 + \sum_{j=1}^{k}\beta_j X_{ji}}\Bigr)\Bigr]$$

c5. taking the partial derivatives of said log-likelihood function with respect to β, thereby determining said parameter values β.
8. The method according to claim 7, characterised in that in said steps c4 and c5 the partial derivatives are set equal to 0 and solved, so as to obtain the parameter values that maximise the likelihood function.
9. The method according to claim 7 or 8, characterised in that said parameters β may be determined by any one of the following methods:
- the Newton-Raphson iterative algorithm; or
- the Fisher Scoring iterative algorithm.
10. The method according to any one of claims 1 to 9, characterised in that the formula for computing the probability P in said step d is:

$$P = \frac{\exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m)}{1 + \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m)}$$
11. The method according to any one of claims 1 to 10, characterised by further comprising the steps of:
e. adjusting the β values according to the correspondence between the predicted retention rate and the actual retention rate and according to business rule constraints;
f. calculating the retention score Z of said newly recruited employee based on said input parameters X and the adjusted coefficients β'.
12. The method according to claim 11, characterised in that in said steps e to f the retention score Z is calculated by the following formula:

$$Z = 100 \times \Bigl(1 - \sum_{i=1}^{m} X_i \beta'_i\Bigr)$$

13. The method according to claim 1, characterised in that in said step b the parameters X of the first sample set employees are obtained as follows:
- confirming the key factors of the business dimension through interviews with recruiters, to guide the design of a questionnaire on possible retention factors;
- choosing key branch companies for questionnaire data collection, as confirmed by the distribution characteristics of existing employees and in coordination with the business departments, and obtaining the questionnaire information X of the first sample set employees;
- selecting the questions whose correlation with the retention state Y is higher as the input parameters X entering the model, and using them as the recruitment questionnaire questions thereafter.
CN201510464301.4A 2015-07-31 2015-07-31 Method of calculating employee retention degree probability Pending CN105069526A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510464301.4A CN105069526A (en) 2015-07-31 2015-07-31 Method of calculating employee retention degree probability

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510464301.4A CN105069526A (en) 2015-07-31 2015-07-31 Method of calculating employee retention degree probability

Publications (1)

Publication Number Publication Date
CN105069526A true CN105069526A (en) 2015-11-18

Family

ID=54498886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510464301.4A Pending CN105069526A (en) 2015-07-31 2015-07-31 Method of calculating employee retention degree probability

Country Status (1)

Country Link
CN (1) CN105069526A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169571A (en) * 2016-03-07 2017-09-15 阿里巴巴集团控股有限公司 A kind of Feature Selection method and device
CN107203551A (en) * 2016-03-17 2017-09-26 腾讯科技(深圳)有限公司 A kind of data processing method and device
CN107203551B (en) * 2016-03-17 2020-10-23 腾讯科技(深圳)有限公司 Data processing method and device
CN107092612A (en) * 2016-08-15 2017-08-25 北京小度信息科技有限公司 A kind of method and device for pushing resource
CN110166498A (en) * 2018-02-11 2019-08-23 腾讯科技(深圳)有限公司 Class of subscriber determines method and device, computer equipment and storage medium
CN110166498B (en) * 2018-02-11 2021-09-28 腾讯科技(深圳)有限公司 User category determination method and device, computer equipment and storage medium
CN112036641A (en) * 2020-08-31 2020-12-04 中国平安人寿保险股份有限公司 Retention prediction method, device, computer equipment and medium based on artificial intelligence
CN112036641B (en) * 2020-08-31 2024-05-14 中国平安人寿保险股份有限公司 Artificial intelligence-based retention prediction method, apparatus, computer device and medium
CN113010579A (en) * 2021-03-24 2021-06-22 腾讯科技(深圳)有限公司 Data processing method, device, equipment and storage medium
CN113010579B (en) * 2021-03-24 2024-05-14 腾讯科技(深圳)有限公司 Data processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20151118