CN105069526A - Method of calculating employee retention degree probability - Google Patents

Method of calculating employee retention degree probability

Info

Publication number
CN105069526A
CN105069526A (application CN201510464301.4A)
Authority
CN
China
Prior art keywords
employee
retention
beta
computing method
sample set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510464301.4A
Other languages
Chinese (zh)
Inventor
奚国坚
何倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Pacific Insurance Group Co Ltd CPIC
Original Assignee
China Pacific Insurance Group Co Ltd CPIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Pacific Insurance Group Co Ltd CPIC filed Critical China Pacific Insurance Group Co Ltd CPIC
Priority to CN201510464301.4A priority Critical patent/CN105069526A/en
Publication of CN105069526A publication Critical patent/CN105069526A/en
Pending legal-status Critical Current

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides a method of calculating employee retention degree probability. The method comprises the following steps: a, obtaining the retention states Y of the employees in a first sample set within a time period T after induction, wherein T is the length of the retention observation period; b, obtaining the input parameters X of the employees, wherein X = (X1, X2, ..., Xm); c, determining the values of the coefficients beta based on the input parameters X and the retention states Y, wherein beta = (beta0, beta1, beta2, ..., betam); d, obtaining the input parameters X of newly recruited employees, and calculating the retention probabilities P of the newly recruited employees based on the coefficients beta. Through statistical analysis of the behavioural data of historical employees, the present invention identifies the key behavioural characteristics that influence employee retention and predicts the retention of employees over a future period of time. At the same time, the present invention provides a calculation method for predicting the retention probabilities of newly recruited employees, which supplies enterprises with valuable quantitative assessment data for recruitment.

Description

Method of calculating employee retention degree probability
Technical field
The present invention relates to the field of recruiting high-retention employees, and in particular to scoring newly recruited employees by means of a questionnaire; more particularly, it relates to an algorithm for accurately calculating an employee's retention probability.
Background art
Staff turnover in the current labour market is very frequent. Whether for a large enterprise or for a small or medium-sized one, the recruitment process itself consumes a great deal of manpower and material resources, and frequent staff turnover also directly disrupts business operations. Employee retention can be pursued through a number of channels, for example by improving employee benefits, strengthening corporate culture, offering employees better development prospects, or motivating employees. For an enterprise, however, rigid measures such as employee benefits cannot be increased without limit, and whether an employee stays is also closely related to corporate culture, job design and similar factors. Research shows that the factors affecting employee retention differ considerably across regions and industries. One effective way to reduce recruitment cost is therefore to consider, at recruitment time, the various indicators of a candidate and to select employees with a high retention rate. Existing recruitment practice, however, still relies on intuition and coarse indicators, and lacks a method for accurately predicting an employee's retention probability.
Through statistical analysis of the behavioural data of historical employees, the present invention identifies the key behavioural characteristics that affect employee retention and predicts the retention of employees over a future period of time. Furthermore, based on the personality, motivation for joining and basic information of historical employees together with their predicted retention, the present invention provides a calculation method for predicting the retention probability of newly recruited employees, supplying enterprises with valuable quantitative assessment data for recruitment.
Summary of the invention
In view of the above analysis, the present invention aims to provide a method of calculating employee retention degree probability, for calculating the retention probability P of one or more newly recruited employees.
The invention discloses a method of calculating employee retention degree probability, achieved through the following technical solution, comprising:
a. obtaining the retention state Y of the employees in a first sample set within a time period T after induction, wherein T is the length of the retention observation period and Y takes the value 0 or 1;
b. obtaining the input parameters X of said employees, wherein X = (X_1, X_2, ..., X_m);
c. determining the values of the coefficients β based on said input parameters X and said retention states Y, wherein β = (β_0, β_1, ..., β_m);
d. obtaining the input parameters X of a newly recruited employee and calculating the retention probability P of said newly recruited employee based on said coefficients β.
Preferably, said step a comprises the following sub-steps:
a1. obtaining the indicator information R of the employees in a second sample set within the time period T after induction, wherein R = (R_1, R_2, ..., R_m);
a2. obtaining the retention state Y of said employees within the time period T after induction, wherein Y = 1 denotes attrition and Y = 0 denotes retention;
a3. classifying said indicator information R by a decision tree algorithm to obtain a computation rule S under which an employee's state within the time period T is attrition;
a4. obtaining the indicator information R of the employees in the first sample set and calculating their retention states Y based on said computation rule S.
Preferably, the decision tree algorithm of said step a3 is realised as follows:
a31. presetting the numbers of parent nodes and child nodes of the decision tree, and specifying the number of levels of the tree structure, wherein said number of levels serves as the convergence criterion of the decision tree algorithm;
a32. if the classifications of F variables are all significant, comparing the significance indicator values of the F variable classifications and taking the indicator information R corresponding to the smallest significance indicator value as the current best grouping variable, wherein m > F > 2;
a33. forming the child nodes under this branch from all the subordinate groups of said best grouping variable, and repeating steps a31, a32 and a33 for each child node until the preset number of levels is reached;
a34. choosing, according to the attrition proportion of each finally formed leaf node, the rules contained in the leaf nodes with the higher proportions as said rule S.
Preferably, in said step a32, the step of judging whether the classifications of the F variables are all significant comprises:
first, calculating the chi-square values of all variable classifications in the indicator information R;
second, if the chi-square value of a variable classification is greater than a first threshold, determining that the classification of that variable is significant.
Preferably, in said step a3, the decision tree algorithm may adopt any one of the CHAID algorithm, the CART algorithm or the C4.5 algorithm.
Preferably, in said step c, a logistic regression algorithm is adopted to build the functional relationship between the parameters X and the retention state Y.
Preferably, said logistic regression algorithm comprises the following steps:
c1. letting $Q_i = Q\{Y_i = 1 \mid (X_{1i}, X_{2i}, \ldots, X_{ki})\}$ be the probability that $Y_i = 1$ given $X_{1i}, X_{2i}, \ldots, X_{ki}$; under the same conditions the probability that $Y_i = 0$ can be expressed as $1 - Q_i$;
c2. obtaining from said step c1 the retention probability of each employee in said first sample set: $Q(Y_i) = Q_i^{Y_i}(1 - Q_i)^{1 - Y_i}$;
c3. determining that the joint distribution of the first sample set employees is the product of the marginal distributions of said retention probabilities, expressed as the likelihood function $L(\beta) = \prod_{i=1}^{n} Q_i^{Y_i}(1 - Q_i)^{1 - Y_i}$;
c4. taking the natural logarithm of the formula obtained in said step c3 to obtain the log-likelihood function:

$$\ln L(\beta) = \ln\Bigl(\prod_{i=1}^{n} Q_i^{Y_i}(1-Q_i)^{1-Y_i}\Bigr) = \sum_{i=1}^{n}\bigl[Y_i \ln Q_i + (1-Y_i)\ln(1-Q_i)\bigr] = \sum_{i=1}^{n}\Bigl[Y_i \ln\frac{Q_i}{1-Q_i} + \ln(1-Q_i)\Bigr] = \sum_{i=1}^{n}\Bigl[Y_i\Bigl(\beta_0 + \sum_{j=1}^{k}\beta_j X_{ji}\Bigr) - \ln\Bigl(1 + e^{\beta_0 + \sum_{j=1}^{k}\beta_j X_{ji}}\Bigr)\Bigr]$$

c5. taking the partial derivatives of said log-likelihood function with respect to β, thereby determining said parameter values β.
Preferably, in said steps c4 and c5, the partial derivatives are set equal to 0 and solved, so as to obtain the parameter values that maximise the likelihood function.
Preferably, said parameters β may be determined by either the Newton-Raphson iterative algorithm or the Fisher Scoring iterative algorithm.
Preferably, the formula for computing the probability P in said step d is:

$$P = \frac{\exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m)}{1 + \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m)}$$
Preferably, the method further comprises the steps of:
e. adjusting the β values according to the correspondence between the predicted retention rate and the actual retention rate and according to business rule constraints;
f. calculating the retention score Z of said newly recruited employee based on said input parameters X and the adjusted coefficients β'.
Preferably, in said steps e to f, the retention score Z is calculated by the following formula:

$$Z = 100 \times \Bigl(1 - \sum_{i=1}^{m} X_i \beta'_i\Bigr)$$
Preferably, in said step b, the parameters X of the first sample set employees are obtained as follows:
first, confirming the key factors of the business dimension through interviews with recruiters, to guide the design of a questionnaire on possible retention factors;
second, choosing key branch companies for questionnaire data collection, as confirmed by the distribution characteristics of existing employees and in coordination with the business departments, and obtaining the questionnaire information X of the first sample set employees;
finally, selecting the questions whose correlation with the retention state Y is higher as the input parameters X entering the model, and using them as the recruitment questionnaire questions thereafter.
The present invention provides a method of calculating employee retention degree probability: a decision tree algorithm is used to analyse the retention of historical employees, a retention definition model is established from the historical data, and the retention of employees is predicted. The decision tree model only needs to be built once and can then be applied repeatedly, and the maximum number of computations for each prediction does not exceed the depth of the tree, which meets the efficiency requirements of enterprise operation; moreover, the decision tree model is highly descriptive and facilitates manual analysis. A logistic regression algorithm is used to estimate the retention probability of newly recruited employees; the prediction is of high reference value to an enterprise recruiting new employees and helps to reduce recruitment cost.
Brief description of the drawings
Other features, objects and advantages of the present invention will become more apparent by reading the detailed description of the non-limiting embodiments made with reference to the following drawings:
Fig. 1 shows, according to the first embodiment of the present invention, a flow chart of calculating the employee retention probability in an algorithm for predicting employee retention degree probability;
Fig. 2 shows, according to the second embodiment of the present invention, a flow chart of the algorithm for predicting the employee retention state in a method of calculating employee retention degree probability;
Fig. 3 shows, according to the second embodiment of the present invention, a flow chart of building the decision tree in a method of calculating employee retention degree probability;
Fig. 4 shows, according to the third embodiment of the present invention, a flow chart of building a recruitment scorecard model by the logistic regression algorithm;
Fig. 5 shows, according to the fourth embodiment of the present invention, a flow chart of calculating the questionnaire score of a newly recruited employee;
Fig. 6 shows the decision tree model built according to the second embodiment of the present invention; and
Fig. 7 shows, according to the first embodiment of the present invention, a statistical chart verifying the accuracy of the employee retention rate predicted by the present invention with sample data.
Detailed description of the embodiments
In order to present the technical solution of the present invention more clearly, the invention is further described below in conjunction with the accompanying drawings.
Fig. 1 shows, according to the first embodiment of the present invention, a flow chart of calculating the employee retention probability in an algorithm for predicting employee retention degree probability; specifically, it comprises the following steps.
First, step S301 is entered: the retention state Y of the first sample set employees within a time period T after induction is obtained, where T is the length of the retention observation period and Y takes the value 0 or 1. Said retention state Y is a binary classification variable: Y = 1 indicates that the retention state of the employee is "attrition", and Y = 0 indicates that the retention state of the employee is "retention". Specifically, the retention status data of a number of employees within the time period T after induction are acquired and uploaded to a data storage system, and are defined as the first sample set.
It will be appreciated by those skilled in the art that, to ensure the modelling data are closer to the data of newly recruited staff, as a variant, employees with a more recent hiring date may be chosen for the first sample set; for example, employees who joined at times t_1, t_2, ..., t_k may be selected, where, assuming the current time is t_0, usually t_0 - t_1 < T, t_0 - t_2 < T, ..., t_0 - t_k < T. This does not affect the technical content of the present invention and is not repeated here.
Next, step S302 is performed: the input parameters X of said first sample set employees are obtained, where X = (X_1, X_2, ..., X_m). Specifically, the parameters X are the answer options of a questionnaire, covering personality, basic information, motivation for joining, recruitment source and so on. Further, the parameters X are designed by confirming the key factors of the business dimension through interviews with recruiters and turning those key factors into questionnaire questions. It will be appreciated by those skilled in the art that in this embodiment the parameters X are obtained by having the employees fill in a paper questionnaire. As a variant, the input parameters X may also be collected via the web, specifically by providing the employees with a browser entry for filling in the questionnaire and uploading the collected data to the data storage system. As another variant, the parameters X may be collected via a mobile terminal, specifically by providing the employees with a mobile client for filling in the questionnaire offline and uploading the collected data to the data storage system through a data interface service. Other collection methods may also be adopted to obtain the input parameters X; this does not affect the technical content of the present invention and is not repeated here.
Then, step S303 is performed: the values of the coefficients β are determined based on the input parameters X and the retention states Y of said first sample set employees, where β = (β_0, β_1, ..., β_m).
Specifically, a logistic regression algorithm may be adopted to build the functional relationship between the parameters X and the retention state Y and to compute the values of the coefficients β; more specifically, the technical method shown in Fig. 4 may be used.
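As a purely illustrative sketch (not part of the patent text), step S303 could be realised with an off-the-shelf maximum-likelihood fit; the library choice, feature layout and synthetic data below are assumptions, since the embodiment itself estimates β inside the Enterprise Miner tool.

```python
# Hypothetical realisation of step S303: fit the logistic relationship between
# questionnaire parameters X and retention state Y by maximum likelihood and
# read off beta = (beta0, beta1, ..., betam). Data are synthetic placeholders.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))                    # questionnaire parameters X1..X4
y = (rng.uniform(size=300) < 0.5).astype(int)    # retention state Y (toy labels)

result = sm.Logit(y, sm.add_constant(X)).fit(disp=False)   # Newton-type MLE
print(result.params)                             # [beta0, beta1, ..., beta4]
```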
Finally, step S304 is performed: the input parameters X of a newly recruited employee are obtained and, based on said coefficients β, the attrition probability P of said newly recruited employee (i.e. the probability that Y = 1) is calculated by the following formula:

$$P = \frac{\exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m)}{1 + \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m)}$$
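For illustration only, the formula of step S304 can be evaluated directly once β is known; the coefficient and answer values in the sketch below are hypothetical placeholders, not values from the patent.

```python
import numpy as np

def loss_probability(beta0, beta, x):
    """Logistic form of the formula above: P = exp(b0 + b.x) / (1 + exp(b0 + b.x))."""
    z = beta0 + np.dot(beta, x)
    return 1.0 / (1.0 + np.exp(-z))   # algebraically identical, numerically safer

# Hypothetical fitted coefficients and questionnaire answers of one new hire
beta0 = -1.2
beta = np.array([0.8, -0.5, 0.3])
x_new = np.array([1.0, 0.0, 2.0])

print(loss_probability(beta0, beta, x_new))   # predicted attrition probability P
```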
Fig. 2 shows, according to the second embodiment of the present invention, a flow chart of the algorithm for predicting the employee retention state in a method of calculating employee retention degree probability; specifically, it comprises the following steps.
First, step S101 is performed: the indicator information R of the second sample set employees within the time period T after induction is obtained, where R = (R_1, R_2, ..., R_m). Preferably, employees who joined in the same period in the past are chosen as the second sample set, and their behavioural performance during the first 13 months after joining is obtained, including features such as the monthly performance-achievement rate, attendance and first-year commission; this is defined as the indicator information R. It will be appreciated by those skilled in the art that the indicator information R consists of the basic behavioural features used to evaluate employee performance in the enterprise management system, and is obtained through the relevant personnel department.
Next, step S102 is performed: the retention states Y of said employees within the time period T after induction are obtained, where Y = 1 denotes attrition and Y = 0 denotes retention. Specifically, the retention of the second sample set employees of said step S101 is observed 13 months after joining: if an employee is retained, the retention state of that employee is Y = 0; if the employee has left, the retention state is Y = 1.
Then, step S103 is performed: said indicator information R is classified by a decision tree algorithm to obtain a computation rule S under which an employee's state within the time period T is attrition. Those skilled in the art understand that when an employee's state is attrition, i.e. Y = 1, the behavioural characteristics are more pronounced than in the classification rules obtained for the retained state; therefore the computation rule S for attrition within the time period T predicts the employee retention state better. In this preferred embodiment the CHAID algorithm is adopted to obtain said computation rule S; the detailed procedure is described in conjunction with Fig. 3.
As a variant, said computation rule S may also be obtained with the C4.5 algorithm to build the required decision tree. It will be appreciated by those skilled in the art that the C4.5 algorithm specifically comprises the following steps (a toy sketch of the gain-ratio computation follows this list):
(1) sorting the training data set (the second sample set) according to each item of indicator information R;
(2) using each item of indicator information R in turn to dynamically partition the training data set;
(3) determining a threshold among the different result sets obtained after partitioning, the threshold dividing the training data set into two parts;
(4) calculating the gain or gain ratio of each part on both sides of the threshold, so that the chosen split maximises the gain; attributes with high information gain are placed closer to the root node and are given priority when growing the tree, i.e. the attribute with the highest information gain is selected at each node from the root downwards.
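A toy sketch of the gain and gain-ratio computation referred to in step (4), under the assumption of binary labels and a single boolean split; the data are invented for illustration only.

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a vector of 0/1 labels."""
    p = np.bincount(y) / len(y)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def gain_ratio(y, mask):
    """Information gain of splitting labels y by the boolean mask,
    divided by the split information (C4.5's normalisation)."""
    left, right = y[mask], y[~mask]
    w = len(left) / len(y)
    gain = entropy(y) - w * entropy(left) - (1 - w) * entropy(right)
    split_info = entropy(np.where(mask, 1, 0))
    return gain / split_info if split_info > 0 else 0.0

y = np.array([1, 1, 1, 0, 0, 0, 1, 0])            # 1 = attrition, 0 = retained
mask = np.array([True, True, True, True, False, False, False, False])
print(gain_ratio(y, mask))
```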
As another variant, said computation rule S may also be obtained with the CART algorithm to build the decision tree. Specifically, the difference between the CART algorithm and the C4.5 algorithm is that the attribute selection measure it uses is the Gini index, which measures the impurity of a split of the training data set (the second sample set); the attribute with the smallest Gini value is selected as the test attribute, and the smaller the Gini value, the higher the "purity" of the sample.
The CART algorithm treats a node as a leaf node and performs no further branching when any one of the following conditions is met:
(1) every leaf node contains a single sample, the number of samples is smaller than a given minimum, or all samples belong to the same class;
(2) the height of the decision tree reaches the threshold set by the user, or the samples in a leaf node after branching all belong to the same class;
(3) the training data set no longer contains attribute vectors available for branch selection.
Finally, step S104 is performed: the indicator information R of said first sample set employees is obtained and, based on said computation rule S, the retention states Y of the first sample set employees are calculated. Preferably, employees who joined in the same period in the past are chosen as the first sample set, their indicator information R during the first 13 months is obtained, and their retention states Y are predicted based on said computation rule S. More specifically, reference may be made to the decision tree model shown in Fig. 6.
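As a rough, non-authoritative stand-in for steps S101 to S104, a CART-style tree (Gini criterion) can be fitted on the indicator information R and its high-attrition leaves read off as rule S. The column names echo the embodiment, but the data, library choice and parameters are assumptions; the patent itself builds a CHAID tree in Enterprise Miner.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
R = pd.DataFrame({
    "perf_mon_ratio": rng.uniform(0, 1, 500),    # performance-achievement rate
    "att_pct":        rng.uniform(0, 1, 500),    # attendance rate
    "FYC_avg":        rng.uniform(0, 2000, 500)  # first-year commission
})
y = (R["perf_mon_ratio"] < 0.4).astype(int)      # 1 = attrition (toy label)

tree = DecisionTreeClassifier(criterion="gini", max_depth=3, min_samples_leaf=30)
tree.fit(R, y)
# Leaves with a high attrition share correspond to the computation rule S
print(export_text(tree, feature_names=list(R.columns)))
```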
Fig. 3 shows, according to the second embodiment of the present invention, a flow chart of the decision tree algorithm in a method of calculating employee retention degree probability; specifically, it comprises the following steps.
First, step S201 is performed: the numbers of parent nodes and child nodes of the decision tree are preset, and the number of levels of the tree structure is specified, said number of levels serving as the convergence criterion of the decision tree algorithm. Those skilled in the art understand that the decision tree grows and stops according to these set parameters, and that the total sample size usually needs to be considered; preferably the number of samples required at a parent node is twice that at a child node. If the setting is too small, the generated tree becomes very bushy and yields many very small segments, which weakens its practical reference value; the same reasoning applies to the setting of the number of levels of the decision tree.
Next, step S202 is performed: the chi-square values of all classes of behavioural features in the indicator information R of said second sample set employees are calculated. Specifically, the indicator information R = (R_1, R_2, ..., R_m) of employees who joined in the same period in the past is chosen as the sample data, including behavioural features such as the performance-achievement rate, attendance and commission; the m kinds of behavioural performance during the employees' first 13 months are regarded as different candidate nodes, and the chi-square values are calculated according to the actual retention state Y of the employees after 13 months.
Then, step S203 is performed: the significance of each behavioural feature is compared, and if the chi-square value of a variable classification is greater than a first threshold, the classification of that variable is determined to be significant. It will be appreciated by those skilled in the art that determining the significance of each behavioural feature specifically comprises the following steps (a short sketch of this test is given after the list):
(1) taking as the null hypothesis that employee retention is unrelated to the indicator information R, i.e. that the variable R has no effect on the output variable Y; the larger the chi-square value, the more likely the two are related;
(2) determining the degrees of freedom, (number of rows - 1) × (number of columns - 1) = 1, choosing the significance level α = 0.05, and obtaining the critical chi-square value K_0.05;
(3) calculating the chi-square value K of the indicator information R and comparing K with K_0.05: if K > K_0.05, the null hypothesis is rejected, which shows that employee retention is related to the indicator information R, i.e. that the classification of said indicator information R is significant.
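A short sketch of this test under the stated assumptions (a 2x2 contingency table, α = 0.05, one degree of freedom); the counts are invented for illustration.

```python
from scipy.stats import chi2, chi2_contingency

table = [[120, 80],   # e.g. low achievement rate: [attrition, retained]
         [ 60, 140]]  # high achievement rate:     [attrition, retained]
K, p_value, dof, _ = chi2_contingency(table, correction=False)
K_crit = chi2.ppf(1 - 0.05, df=1)   # K_0.05 for (2-1)*(2-1) = 1 degree of freedom

print(K, K_crit)
if K > K_crit:   # reject the null hypothesis: R is related to retention
    print("classification of this variable is significant")
```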
Further, if the classifications of F variables are all significant, the significance indicator values of the F variable classifications are compared, and the indicator information R corresponding to the smallest significance indicator value is taken as the current best grouping variable, where m > F > 2.
Next, step S204 is performed: all the subordinate groups of said best grouping variable form the child nodes under this branch, and steps S202, S203 and S204 are repeated for each child node until the preset number of levels is reached, at which point the procedure stops according to the preset number of tree levels.
Finally, step S205 is performed: according to the attrition proportion of each finally formed leaf node, the rules contained in the leaf nodes with the higher proportions are chosen as said rule S.
It will be appreciated by those skilled in the art that the decision tree algorithm described in Fig. 3 is the CHAID algorithm, i.e. Chi-squared Automatic Interaction Detection. In a variant, the C4.5 or CART decision tree algorithm may be adopted to select the best grouping variable; the trees constructed by different decision tree computation methods differ, and the most suitable method for obtaining the optimal nodes and building the decision tree model may be chosen according to the actual situation.
Fig. 4 shows, according to the third embodiment of the present invention, a flow chart of building a recruitment scorecard model by the logistic regression algorithm. Specifically, it comprises the following steps.
First, step S401 is performed: the questionnaire information of the first sample set employees is obtained, including features such as personality, motivation for joining and recruitment source, and is defined as the parameters X; at the same time, the predicted retention states Y of said first sample set employees are obtained. Specifically, the parameters X are the answer options of the questionnaire and are determined by the following steps:
(1) confirming the key factors of the business dimension through interviews with recruiters, to guide the design of a questionnaire on possible retention factors;
(2) choosing key branch companies for questionnaire data collection, as confirmed by the distribution characteristics of existing employees and in coordination with the business departments, and obtaining the questionnaire information X of the first sample set employees;
(3) selecting the questions whose correlation with the retention state Y is higher as the input parameters X entering the model, and using them as the recruitment questionnaire questions thereafter.
Next, step S402 is performed: let $Q_i = Q\{Y_i = 1 \mid (X_{1i}, X_{2i}, \ldots, X_{ki})\}$ be the probability that $Y_i = 1$ given $X_{1i}, X_{2i}, \ldots, X_{ki}$; under the same conditions the probability that $Y_i = 0$ is expressed as $1 - Q_i$, and the retention probability of said first sample set employees is expressed as $Q(Y_i) = Q_i^{Y_i}(1 - Q_i)^{1 - Y_i}$.
Then, step S403 is performed to obtain the likelihood function:

$$L(\theta) = \prod_{i=1}^{n} Q_i^{Y_i}(1 - Q_i)^{1 - Y_i}$$

It will be appreciated by those skilled in the art that, taking the information of said first sample set employees as the observed data, each observation is mutually independent, so their joint distribution can be expressed as the product of the marginal distributions and represented by the likelihood function.
Finally, step S404 is performed: the unknown parameters β are estimated by maximising said likelihood function. Because maximising the likelihood function directly is rather difficult, the natural logarithm is taken first, giving the log-likelihood function:

$$\ln L(\theta) = \ln\Bigl(\prod_{i=1}^{n} Q_i^{Y_i}(1-Q_i)^{1-Y_i}\Bigr) = \sum_{i=1}^{n}\bigl[Y_i \ln Q_i + (1-Y_i)\ln(1-Q_i)\bigr] = \sum_{i=1}^{n}\Bigl[Y_i \ln\frac{Q_i}{1-Q_i} + \ln(1-Q_i)\Bigr] = \sum_{i=1}^{n}\Bigl[Y_i\Bigl(\alpha + \sum_{j=1}^{k}\beta_j X_{ji}\Bigr) - \ln\Bigl(1 + e^{\alpha + \sum_{j=1}^{k}\beta_j X_{ji}}\Bigr)\Bigr]$$

Further, the partial derivatives of said log-likelihood function with respect to α and β are taken and set equal to 0, thereby determining said parameter values β.
It will be appreciated by those skilled in the art that the parameter values β are obtained by the Newton-Raphson iterative algorithm or the Fisher Scoring iterative algorithm, wherein said parameters X_{ji} are the acquired feature information of said first sample set employees and said parameters Y_i are the predicted retention states of said employees.
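A minimal from-scratch Newton-Raphson sketch for maximising the log-likelihood above, on synthetic data; it is given only to illustrate the iteration, since the embodiment estimates β inside Enterprise Miner rather than with hand-written code.

```python
# Newton-Raphson for logistic MLE: iterate beta <- beta + (X'WX)^-1 X'(y - q).
import numpy as np

rng = np.random.default_rng(1)
n, k = 400, 3
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, k))])  # first column = intercept
true_beta = np.array([-0.5, 1.0, -0.8, 0.4])
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-X @ true_beta))).astype(float)

beta = np.zeros(k + 1)
for _ in range(25):
    q = 1 / (1 + np.exp(-X @ beta))      # Q_i
    W = q * (1 - q)                      # diagonal weights of the Hessian
    grad = X.T @ (y - q)                 # gradient of ln L
    hess = X.T @ (X * W[:, None])        # negative Hessian of ln L
    step = np.linalg.solve(hess, grad)
    beta += step
    if np.max(np.abs(step)) < 1e-8:
        break
print(beta)   # maximum-likelihood estimate (intercept, beta1, ..., betak)
```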
Fig. 5 shows, according to the fourth embodiment of the present invention, a flow chart of calculating the questionnaire score of a newly recruited employee. Specifically, it comprises the following steps.
First, step S501 is performed: the weights β corresponding to the value of each question option in the questionnaire are obtained; the specific method of calculating the weights β is shown in conjunction with Fig. 4.
Next, step S502 is performed: the attrition score obtained is converted in reverse into a retention score. Those skilled in the art understand that in the above steps (obtaining the computation rule S for attrition, calculating the coefficients β, and predicting the attrition probability of the newly recruited employee) everything is carried out under the attrition condition, so the score finally calculated is an attrition score; it therefore needs to be converted into a retention score in order to analyse the retention of the newly recruited employee.
Next, step S503 is performed: the weights are adjusted according to business needs, and the adjusted weights are defined as β'. It will be appreciated by those skilled in the art that, because different industries have different requirements on employee qualities, enterprise leaders may appropriately adjust the weight of each element according to practical business experience, so as to achieve the most accurate prediction.
Finally, step S504 is performed: according to the relationship between score bands and the predicted retention rate, the weights are normalised to a 100-point scale. It will be appreciated by those skilled in the art that, specifically, the questionnaire score (retention score Z) of the newly recruited employee is obtained by the following formula:

$$Z = 100 \times \Bigl(1 - \sum_{i=1}^{m} X_i \beta'_i\Bigr)$$
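An illustrative evaluation of this formula; the adjusted weights β' and the questionnaire answers below are placeholders, not values from the patent.

```python
import numpy as np

beta_adj = np.array([0.12, 0.05, 0.20, 0.08])   # business-adjusted weights beta'
x_new    = np.array([1.0, 0.0, 1.0, 2.0])       # questionnaire answers of a new hire

Z = 100.0 * (1.0 - np.dot(x_new, beta_adj))     # Z = 100 * (1 - sum_i X_i * beta'_i)
print(round(Z, 1))                              # retention score; higher = more likely to stay
```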
Fig. 6 shows the decision tree model built according to the second embodiment of the present invention. Specifically, said decision tree model is built by the CHAID method, a chi-square based algorithm that determines the best split nodes and builds the decision tree. As can be seen from Fig. 6, the best split nodes affecting employee retention are the employee's performance-achievement rate (perf_mon_ratio) and the employee's first-year commission (FYC_avg). More specifically, the indicator information R of the second sample set employees within the time period T is taken as the training sample data set for modelling; those skilled in the art understand that, using the EM modelling tool and selecting the Chi-squared Automatic Interaction Detection method, nodes are generated automatically, and the higher the chi-square significance of a variable, the earlier it becomes a split node of the decision tree. In the decision tree shown in Fig. 6, the indicator information R of 3222 employees in the second sample set is taken as the sample data, of which 66.91% left and 33.09% were retained. When perf_mon_ratio < 0.4083, the attrition rate of the employees is 89.85%; when perf_mon_ratio >= 0.4083, the attrition rate is 12.55%. Further, child nodes continue to be generated: when perf_mon_ratio < 0.1548 the attrition rate is 99.86%, and when perf_mon_ratio >= 0.1548 it is 73.43%; since the splitting effect of this node is not significant, generation of further child nodes stops there. It can be seen that taking the performance-achievement rate (perf_mon_ratio) as the branch node of the decision tree distinguishes best between employees who stay and those who leave.
Next, child nodes of the decision tree continue to be generated. When perf_mon_ratio >= 0.4083 the attrition rate is 12.55%; taking the first-year commission (FYC_avg) as the branch node, when FYC_avg < 562.3692 the attrition rate is 23.38%, and when FYC_avg >= 562.3692 the attrition rate is 6.16%. It can be seen that taking the first-year commission (FYC_avg) as a child node also distinguishes clearly between employees who stay and those who leave.
Next, further child nodes are generated until the preset number of levels is reached. When FYC_avg >= 562.3692 the attrition rate is 6.16%; taking the attendance rate (att_pct) as the branch node, when att_pct < 0.3815 the attrition rate is 16.42%, and when att_pct >= 0.3815 the attrition rate is 3.21%. It can be seen that taking the attendance rate (att_pct) as a split node does not distinguish clearly between employees who stay and those who leave.
It will be appreciated by those skilled in the art that, from the above decision tree model, the computation rule S is obtained and the retention of the first sample set employees can be predicted, i.e. the value of Y determined. Specifically, said computation rule S comprises the following steps: first, the performance-achievement rate (perf_mon_ratio) of the first sample set employee to be predicted is examined; if perf_mon_ratio < 0.4083, the retention state of that employee is Y = 1. If perf_mon_ratio >= 0.4083, the first-year commission (FYC_avg) of the employee is further compared: if FYC_avg < 562.3692, the retention state is Y = 1; if FYC_avg >= 562.3692, the retention state is Y = 0.
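Written out as a small function, the rule S described above (thresholds taken from the Fig. 6 embodiment) reads as follows; this is only a transcription for clarity, not additional subject matter.

```python
def rule_S(perf_mon_ratio: float, fyc_avg: float) -> int:
    """Computation rule S from the Fig. 6 decision tree; returns Y (1 = attrition)."""
    if perf_mon_ratio < 0.4083:
        return 1          # Y = 1, predicted to leave
    if fyc_avg < 562.3692:
        return 1          # Y = 1, predicted to leave
    return 0              # Y = 0, predicted to stay

print(rule_S(0.30, 800.0))   # -> 1
print(rule_S(0.55, 900.0))   # -> 0
```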
Those skilled in the art understand that, as can be seen from Fig. 6, another set of test sample data containing the indicator information R of 5739 employees yields a decision tree model consistent with the above conclusion, and said test sample data verify the accuracy of the classification of the above decision tree model.
Fig. 7 shows, according to the first embodiment of the present invention, a statistical chart verifying with test data the accuracy of the employee retention rate predicted by the present invention. Specifically, the test data are 665 questionnaires from employees who joined between January and March 2014. The final result shows that retention status as of the end of October 2014 is indeed positively correlated with the score: in the 0-59 score band the retention rate is only 0.0%, while for scores above 90 the retention rate is as high as 89.5%. It can be seen that the algorithm has good separating power between employees who leave and employees who stay.
It will be understood by those skilled in the art that, preferably, the modelling tool is Enterprise Miner (EM for short), so the selected algorithms are the specific algorithms of the embedded nodes in EM. All or part of the flow of the above method embodiments may be implemented by a computer program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. Said computer-readable storage medium may be a magnetic disk, an optical disc, a read-only memory, a random access memory, or the like.
Specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to the above particular implementations; those skilled in the art may make various variations or modifications within the scope of the claims, and this does not affect the essential content of the present invention.

Claims (13)

1. A method of calculating employee retention degree probability, for calculating the retention probability P of one or more newly recruited employees, characterised by comprising the steps of:
a. obtaining the retention state Y of the employees in a first sample set within a time period T after induction, wherein T is the length of the retention observation period and Y takes the value 0 or 1;
b. obtaining the input parameters X of said employees, wherein X = (X_1, X_2, ..., X_m);
c. determining the values of the coefficients β based on said input parameters X and said retention states Y, wherein β = (β_0, β_1, ..., β_m);
d. obtaining the input parameters X of a newly recruited employee and calculating the retention probability P of said newly recruited employee based on said coefficients β.
2. The method according to claim 1, characterised in that said step a comprises the following sub-steps:
a1. obtaining the indicator information R of the employees in a second sample set within the time period T after induction, wherein R = (R_1, R_2, ..., R_m);
a2. obtaining the retention state Y of said employees within the time period T after induction, wherein Y = 1 denotes attrition and Y = 0 denotes retention;
a3. classifying said indicator information R by a decision tree algorithm to obtain a computation rule S under which an employee's state within the time period T is attrition;
a4. obtaining the indicator information R of the employees in the first sample set and calculating their retention states Y based on said computation rule S.
3. The method according to claim 2, characterised in that the decision tree algorithm of said step a3 is realised as follows:
a31. presetting the numbers of parent nodes and child nodes of the decision tree, and specifying the number of levels of the tree structure, wherein said number of levels serves as the convergence criterion of the decision tree algorithm;
a32. if the classifications of F variables are all significant, comparing the significance indicator values of the F variable classifications and taking the indicator information R corresponding to the smallest significance indicator value as the current best grouping variable, wherein m > F > 2;
a33. forming the child nodes under this branch from all the subordinate groups of said best grouping variable, and repeating steps a31, a32 and a33 for each child node until the preset number of levels is reached;
a34. choosing, according to the attrition proportion of each finally formed leaf node, the rules contained in the leaf nodes with the higher proportions as said rule S.
4. The method according to claim 3, characterised in that in said step a32 the step of judging whether the classifications of the F variables are all significant comprises:
- calculating the chi-square values of all variable classifications in the indicator information R;
- if the chi-square value of a variable classification is greater than a first threshold, determining that the classification of that variable is significant.
5. The method according to claim 2, characterised in that in said step a3 the decision tree algorithm may adopt any one of the following algorithms:
- the CHAID algorithm;
- the CART algorithm; or
- the C4.5 algorithm.
6. The method according to any one of claims 1 to 5, characterised in that in said step c a logistic regression algorithm is adopted to build the functional relationship between the parameters X and the retention state Y.
7. The method according to claim 6, characterised in that said logistic regression algorithm comprises the following steps:
c1. letting $Q_i = Q\{Y_i = 1 \mid (X_{1i}, X_{2i}, \ldots, X_{ki})\}$ be the probability that $Y_i = 1$ given $X_{1i}, X_{2i}, \ldots, X_{ki}$; under the same conditions the probability that $Y_i = 0$ can be expressed as $1 - Q_i$;
c2. obtaining from said step c1 the retention probability of each employee in said first sample set: $Q(Y_i) = Q_i^{Y_i}(1 - Q_i)^{1 - Y_i}$;
c3. determining that the joint distribution of the first sample set employees is the product of the marginal distributions of said retention probabilities, expressed as the likelihood function $L(\theta) = \prod_{i=1}^{n} Q_i^{Y_i}(1 - Q_i)^{1 - Y_i}$;
c4. taking the natural logarithm of the formula obtained in said step c3 to obtain the log-likelihood function:

$$\ln L(\theta) = \ln\Bigl(\prod_{i=1}^{n} Q_i^{Y_i}(1-Q_i)^{1-Y_i}\Bigr) = \sum_{i=1}^{n}\bigl[Y_i \ln Q_i + (1-Y_i)\ln(1-Q_i)\bigr] = \sum_{i=1}^{n}\Bigl[Y_i \ln\frac{Q_i}{1-Q_i} + \ln(1-Q_i)\Bigr] = \sum_{i=1}^{n}\Bigl[Y_i\Bigl(\beta_0 + \sum_{j=1}^{k}\beta_j X_{ji}\Bigr) - \ln\Bigl(1 + e^{\beta_0 + \sum_{j=1}^{k}\beta_j X_{ji}}\Bigr)\Bigr]$$

c5. taking the partial derivatives of said log-likelihood function with respect to β, thereby determining said parameter values β.
8. The method according to claim 7, characterised in that in said steps c4 and c5 the partial derivatives are set equal to 0 and solved, so as to obtain the parameter values that maximise the likelihood function.
9. The method according to claim 7 or 8, characterised in that said parameters β may be determined by any one of the following methods:
- the Newton-Raphson iterative algorithm; or
- the Fisher Scoring iterative algorithm.
10. The method according to any one of claims 1 to 9, characterised in that the formula for computing the probability P in said step d is:

$$P = \frac{\exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m)}{1 + \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m)}$$
11. The method according to any one of claims 1 to 10, characterised by further comprising the steps of:
e. adjusting the β values according to the correspondence between the predicted retention rate and the actual retention rate and according to business rule constraints;
f. calculating the retention score Z of said newly recruited employee based on said input parameters X and the adjusted coefficients β'.
12. The method according to claim 11, characterised in that in said steps e to f the retention score Z is calculated by the following formula:

$$Z = 100 \times \Bigl(1 - \sum_{i=1}^{m} X_i \beta'_i\Bigr)$$

13. The method according to claim 1, characterised in that in said step b the parameters X of the first sample set employees are obtained as follows:
- confirming the key factors of the business dimension through interviews with recruiters, to guide the design of a questionnaire on possible retention factors;
- choosing key branch companies for questionnaire data collection, as confirmed by the distribution characteristics of existing employees and in coordination with the business departments, and obtaining the questionnaire information X of the first sample set employees;
- selecting the questions whose correlation with the retention state Y is higher as the input parameters X entering the model, and using them as the recruitment questionnaire questions thereafter.
CN201510464301.4A 2015-07-31 2015-07-31 Method of calculating employee retention degree probability Pending CN105069526A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510464301.4A CN105069526A (en) 2015-07-31 2015-07-31 Method of calculating employee retention degree probability

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510464301.4A CN105069526A (en) 2015-07-31 2015-07-31 Method of calculating employee retention degree probability

Publications (1)

Publication Number Publication Date
CN105069526A true CN105069526A (en) 2015-11-18

Family

ID=54498886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510464301.4A Pending CN105069526A (en) 2015-07-31 2015-07-31 Method of calculating employee retention degree probability

Country Status (1)

Country Link
CN (1) CN105069526A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169571A (en) * 2016-03-07 2017-09-15 阿里巴巴集团控股有限公司 A kind of Feature Selection method and device
CN107203551A (en) * 2016-03-17 2017-09-26 腾讯科技(深圳)有限公司 A kind of data processing method and device
CN107203551B (en) * 2016-03-17 2020-10-23 腾讯科技(深圳)有限公司 Data processing method and device
CN107092612A (en) * 2016-08-15 2017-08-25 北京小度信息科技有限公司 A kind of method and device for pushing resource
CN110166498A (en) * 2018-02-11 2019-08-23 腾讯科技(深圳)有限公司 Class of subscriber determines method and device, computer equipment and storage medium
CN110166498B (en) * 2018-02-11 2021-09-28 腾讯科技(深圳)有限公司 User category determination method and device, computer equipment and storage medium
CN112036641A (en) * 2020-08-31 2020-12-04 中国平安人寿保险股份有限公司 Retention prediction method, device, computer equipment and medium based on artificial intelligence
CN112036641B (en) * 2020-08-31 2024-05-14 中国平安人寿保险股份有限公司 Artificial intelligence-based retention prediction method, apparatus, computer device and medium
CN113010579A (en) * 2021-03-24 2021-06-22 腾讯科技(深圳)有限公司 Data processing method, device, equipment and storage medium
CN113010579B (en) * 2021-03-24 2024-05-14 腾讯科技(深圳)有限公司 Data processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20151118