CN109558887A - A method and apparatus for predicting behavior - Google Patents

Info

Publication number
CN109558887A
CN109558887A CN201710892426.6A CN201710892426A CN109558887A CN 109558887 A CN109558887 A CN 109558887A CN 201710892426 A CN201710892426 A CN 201710892426A CN 109558887 A CN109558887 A CN 109558887A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710892426.6A
Other languages
Chinese (zh)
Inventor
刘国亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201710892426.6A
Publication of CN109558887A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/105 Human resources

Abstract

The invention discloses a method and apparatus for predicting behavior, relating to the field of communication technology. One specific embodiment of the method includes: obtaining sample information, where the sample information includes behavior categories and attribute categories; calculating the information gain of each attribute category with respect to the behavior category, so as to extract a preset number of attribute categories and train a decision tree to generate decision rules; and, according to the decision rules, performing behavior prediction on information to be predicted, to obtain the behavior category corresponding to that information. The embodiment reduces interference from human and other factors in behavior prediction, improves prediction accuracy, and allows the reasons for a behavior to be read from the decision rules.

Description

A method and apparatus for predicting behavior
Technical field
The present invention relates to the field of communication technology, and in particular to a method and apparatus for predicting behavior.
Background
Although employee resignation is commonplace, it is often unpredictable, and an employee's sudden departure frequently disrupts an enterprise's work schedule. Many human resources departments struggle to find out, before an employee resigns, whether that employee is inclined to leave.
In the prior art, whether an employee is inclined to resign is predicted mainly through questionnaire surveys that investigate how social, enterprise-internal, and personal factors affect the employee's comprehensive stability value. The larger the comprehensive stability value, the less likely the employee is to resign.
As an alternative approach, whether an employee is inclined to resign can be judged from the employee's history logs and working-hours behavior data (such as websites browsed and software used).
In the course of realizing the present invention, the inventor found at least the following problems in the prior art:
(1) When resignation is predicted by questionnaire survey, there are many uncertain factors, such as employees deliberately exaggerating, which makes the prediction results inaccurate.
(2) Predicting resignation by capturing employee behavior (such as history logs and working-hours behavior data) can yield more accurate predictions, but it does not tell HR why an employee resigns, so it offers no reference for the enterprise in adopting strategies to retain employees.
Summary of the invention
In view of this, embodiments of the present invention provide a method and apparatus for predicting behavior that can at least solve the prior-art problems of low prediction accuracy and the inability to see the reasons for a behavior.
To achieve the above object, according to one aspect of the embodiments of the present invention, a method for predicting behavior is provided, including: obtaining sample information, where the sample information includes behavior categories and attribute categories; calculating the information gain of each attribute category with respect to the behavior category, so as to extract a preset number of attribute categories and train a decision tree to generate decision rules; and, according to the decision rules, performing behavior prediction on information to be predicted, to obtain the behavior category corresponding to the information to be predicted.
Optionally, after obtaining the sample information, the method further includes: determining the attribute value identifier corresponding to each attribute value in each attribute category according to a preset correspondence between attribute value range thresholds and attribute value identifiers.
Optionally, the method of the embodiment of the present invention further includes:
determining the correspondence between attribute value range thresholds and attribute value identifiers according to a formula (not reproduced in this text), where F() denotes the round-up operation, V denotes an attribute value, MinV and MaxV denote the minimum and maximum attribute values under an attribute category, and G denotes the attribute value identifier.
Optionally, after determining the attribute value identifier corresponding to each attribute value in each attribute category according to the preset correspondence, the method further includes: calculating, by chi-square test, the difference value between each attribute category and the behavior category, and removing the attribute categories whose difference value is less than a predetermined difference threshold.
Optionally, calculating the information gain of each attribute category with respect to the behavior category, so as to extract a preset number of attribute categories and train a decision tree to generate decision rules, includes:
dividing the sample information into first sample information and second sample information;
according to the first sample information, calculating the information gain of each attribute category with respect to the behavior category, extracting the predetermined number of attribute categories in descending order of information gain, and training the decision tree to generate first decision rules;
according to the first decision rules, predicting the behavior category of the second sample information, and extracting the first decision rules whose predicted behavior category is consistent with the behavior category of the second sample information, to generate second decision rules;
according to the second decision rules, performing behavior prediction on the information to be predicted, to obtain the behavior category corresponding to the information to be predicted.
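The filtering of first decision rules into second decision rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the rule representation (a tuple of attribute/value conditions plus a behavior), the function names, and the sample records are all hypothetical, and the sketch assumes a rule is kept only when every record it covers in the second sample information carries the behavior the rule predicts.

```python
def rule_matches(conditions, record):
    """True when the record satisfies every (attribute, value) condition of a rule."""
    return all(record.get(attr) == value for attr, value in conditions)


def second_decision_rules(first_rules, second_sample, target="EMPL_CLASS"):
    """Keep the first decision rules that agree with the second sample information.

    Assumption: a rule is kept only if it covers at least one record and every
    record it covers carries the behavior category the rule predicts.
    """
    kept = []
    for conditions, behavior in first_rules:
        covered = [r for r in second_sample if rule_matches(conditions, r)]
        if covered and all(r[target] == behavior for r in covered):
            kept.append((conditions, behavior))
    return kept


# Hypothetical first decision rules and second sample information.
first_rules = [
    ((("SYS_NAME", "research"),), "on-job"),
    ((("SYS_NAME", "delivery"),), "resigned"),
]
second_sample = [
    {"SYS_NAME": "research", "EMPL_CLASS": "on-job"},
    {"SYS_NAME": "delivery", "EMPL_CLASS": "on-job"},
]
second_rules = second_decision_rules(first_rules, second_sample)
```

Here the rule for the delivery department is dropped because the second sample contradicts it, and only the rule for the research department survives. A looser criterion, such as keeping a rule when most covered records agree, would be an equally plausible reading.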
To achieve the above object, according to another aspect of the embodiments of the present invention, an apparatus for predicting behavior is provided, including: an obtaining module, for obtaining sample information, where the sample information includes behavior categories and attribute categories; a training module, for calculating the information gain of each attribute category with respect to the behavior category, so as to extract a preset number of attribute categories and train a decision tree to generate decision rules; and a prediction module, for performing behavior prediction on information to be predicted according to the decision rules, to obtain the behavior category corresponding to the information to be predicted.
Optionally, the apparatus of the embodiment of the present invention further includes: a determining module, for determining the attribute value identifier corresponding to each attribute value in each attribute category according to a preset correspondence between attribute value range thresholds and attribute value identifiers.
Optionally, the determining module is further used for
determining the correspondence between attribute value range thresholds and attribute value identifiers according to a formula (not reproduced in this text), where F() denotes the round-up operation, V denotes an attribute value, MinV and MaxV denote the minimum and maximum attribute values under an attribute category, and G denotes the attribute value identifier.
Optionally, the apparatus of the embodiment of the present invention further includes a test module, for calculating, by chi-square test, the difference value between each attribute category and the behavior category, and removing the attribute categories whose difference value is less than a predetermined difference threshold.
Optionally, the training module is further used for: dividing the sample information into first sample information and second sample information; according to the first sample information, calculating the information gain of each attribute category with respect to the behavior category, extracting the predetermined number of attribute categories in descending order of information gain, and training the decision tree to generate first decision rules; according to the first decision rules, predicting the behavior category of the second sample information, and extracting the first decision rules whose predicted behavior category is consistent with the behavior category of the second sample information, to generate second decision rules; and, according to the second decision rules, performing behavior prediction on the information to be predicted, to obtain the behavior category corresponding to the information to be predicted.
To achieve the above object, according to yet another aspect of the embodiments of the present invention, an electronic device for predicting behavior is provided.
The electronic device of the embodiment of the present invention includes: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement any of the above methods for predicting behavior.
To achieve the above object, according to yet another aspect of the embodiments of the present invention, a computer-readable medium is provided on which a computer program is stored, characterized in that the program, when executed by a processor, implements any of the above methods for predicting behavior.
According to the scheme provided by the present invention, an embodiment of the above invention has the following advantages or beneficial effects: behavior patterns present in the sample information can be found effectively; interference from human and other factors in behavior prediction is reduced; prediction accuracy is improved; and the reasons for a behavior can be learned from the predicted output, which gives the enterprise a reference for adopting corresponding strategies.
Further effects of the above non-conventional optional modes are described below in connection with specific embodiments.
Brief description of the drawings
The drawings are provided for a better understanding of the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the main flow of a method for predicting behavior according to an embodiment of the present invention;
Fig. 2 is a schematic flow diagram of an optional method for predicting behavior according to an embodiment of the present invention;
Fig. 3 is a schematic flow diagram of another optional method for predicting behavior according to an embodiment of the present invention;
Fig. 4 is a schematic flow diagram of yet another optional method for predicting behavior according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the main modules of an apparatus for predicting behavior according to an embodiment of the present invention;
Fig. 6 is a diagram of an exemplary system architecture to which embodiments of the present invention can be applied;
Fig. 7 is a schematic structural diagram of a computer system suitable for implementing a mobile device or server of an embodiment of the present invention.
Detailed description
Exemplary embodiments of the present invention are described below with reference to the drawings. They include various details of the embodiments to aid understanding, and these should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art should recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted below.
Referring to Fig. 1, which shows the main flow of a method for predicting behavior provided by an embodiment of the present invention, the method includes the following steps:
S101: Obtain sample information, where the sample information includes behavior categories and attribute categories.
S102: Calculate the information gain of each attribute category with respect to the behavior category, so as to extract a preset number of attribute categories and train a decision tree to generate decision rules.
S103: According to the decision rules, perform behavior prediction on the information to be predicted, to obtain the behavior category corresponding to the information to be predicted.
It should be noted that the behavior categories in the embodiments of the present invention can be of many kinds, such as fault diagnosis or resignation diagnosis; the description here takes employee resignation and on-job status as an example.
For step S101 in the above embodiment, the sample information can be obtained from a data management system (such as a human resource management system), which can run on x86 servers with a Linux operating system and be built with Hadoop. For example, multiple employee-related information tables belonging to the same company are extracted from the data management system and merged, using information that uniquely identifies an employee (such as an ID card number or employee number) as the join key, to generate an employee sample information table in which all information for one employee appears in one row.
Specifically, referring to Table 1, the merged information table is (X1, X2, ..., Xn, D), where X1, X2, ..., Xn denote the employee attribute categories and correspond respectively to SEX (gender), SYS_NAME (affiliated system), MARI_STA (marital status), SALARY (salary), JOB_LEVEL_DESCR (job title and level), NATI-PLA (native place), LOCATION (work location), HIGHEST_EDUC_LVL_DESCR (highest education level), AGE (age), and SL (years of service); D denotes the behavior category and corresponds to EMPL_CLASS (resigned/on-job status), indicating whether the employee is on-job (stays) or has resigned (leaves):
Table 1: Employee information table
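The merge of several employee tables into one row per employee, as described for step S101, might look like the following Python sketch. The join column `EMPL_ID`, the function name, and all row values are hypothetical and are used only for illustration; the column names SEX, AGE, and EMPL_CLASS follow the description above.

```python
from collections import defaultdict


def merge_employee_tables(tables, key="EMPL_ID"):
    """Merge several lists of row dicts into one row per employee, joined on `key`."""
    merged = defaultdict(dict)
    for table in tables:
        for row in table:
            merged[row[key]].update(row)
    # Return rows in a stable order so the result is reproducible.
    return [merged[k] for k in sorted(merged)]


# Hypothetical source tables; EMPL_ID and every value are illustrative.
basic_info = [
    {"EMPL_ID": "E001", "SEX": "M", "AGE": 32},
    {"EMPL_ID": "E002", "SEX": "F", "AGE": 27},
]
status_info = [
    {"EMPL_ID": "E001", "EMPL_CLASS": "on-job"},
    {"EMPL_ID": "E002", "EMPL_CLASS": "resigned"},
]

sample_table = merge_employee_tables([basic_info, status_info])
```

Each resulting dict plays the role of one row (X1, X2, ..., Xn, D) of the merged table.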
For step S102, the decision tree can be trained to generate decision rules with a classifier such as the ID3 decision tree algorithm, rough sets, or random forest. The embodiment of the present invention takes the ID3 decision tree algorithm as an example, which proceeds as follows.
The information entropy E of the class distribution of the tuples in D is expressed as:
$$E(D) = -\sum_{i=1}^{c} p_i \log_2 p_i$$
where c denotes the total number of behavior categories and p_i denotes the probability that the i-th behavior category occurs in the whole sample information. In practical terms, the entropy is the expected amount of information needed to identify the class label of a tuple in D.
If the training tuples D are partitioned by attribute Xn, the expected information after partitioning D by Xn is:
$$E(D, X_n) = \sum_{j=1}^{\beta} \frac{|D_j|}{|D|} E(D_j)$$
where β is the total number of partitions produced by attribute Xn and D_j is the j-th partition.
The information gain is the difference between the two:
$$\mathrm{Gain}(D, X_n) = E(D) - E(D, X_n)$$
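The three quantities above can be computed directly. The following Python sketch is a minimal illustration; the function names and the five rows are hypothetical (three on-job and two resigned employees, of whom three are married and two unmarried), and the column names MARI_STA and EMPL_CLASS follow the table description.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """E(D) = -sum(p_i * log2(p_i)) over the behavior categories in `labels`."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def conditional_entropy(rows, attr, target):
    """E(D, X): entropy of `target` after partitioning `rows` by `attr`,
    weighted by the relative size of each partition."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())


def information_gain(rows, attr, target):
    """Gain(D, X) = E(D) - E(D, X)."""
    return entropy([r[target] for r in rows]) - conditional_entropy(rows, attr, target)


# Hypothetical sample information: 3 on-job, 2 resigned.
rows = [
    {"MARI_STA": "married", "EMPL_CLASS": "on-job"},
    {"MARI_STA": "married", "EMPL_CLASS": "on-job"},
    {"MARI_STA": "married", "EMPL_CLASS": "resigned"},
    {"MARI_STA": "unmarried", "EMPL_CLASS": "on-job"},
    {"MARI_STA": "unmarried", "EMPL_CLASS": "resigned"},
]

e_d = entropy([r["EMPL_CLASS"] for r in rows])               # about 0.971
e_dx = conditional_entropy(rows, "MARI_STA", "EMPL_CLASS")   # about 0.951
gain = information_gain(rows, "MARI_STA", "EMPL_CLASS")      # about 0.02
```

With these counts, marital status barely reduces the entropy, so it would be a weak choice of decision node.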
In short, the ID3 decision tree algorithm computes the information gain of each attribute category every time a split is needed, selects the attribute category with the best (for example, largest) information gain to partition the sample information, and makes that attribute category a decision node. For each attribute value of the decision node, a branch is created and the sample information is partitioned accordingly. The same process is applied recursively from the top down, and splitting stops when any one of the following holds: all sample information in a branch belongs to the same behavior category, no attribute category remains to split on, or a branch contains no sample information. The number of decision rules generated depends on the number of non-empty branches produced by splitting: the more splits, the more decision rules, which together form the decision rule set.
A detailed illustration follows:
(1) First, the behavior category has two classes, on-job and resigned. Let the first class be on-job and the second be resigned. Counting the behaviors in Table 1 gives three people on-job and two resigned, so the overall expected value is:
E(behavior) = -(3/5)log2(3/5) - (2/5)log2(2/5) = 0.971
(2) Calculate the expected value of each attribute category. Taking marital status as an example, there are two values, married and unmarried. Of the three married people, two are on-job and one has resigned, so the expected value for married is:
E(married) = -(2/3)log2(2/3) - (1/3)log2(1/3) = 0.918
Of the two unmarried people, one is on-job and one has resigned, so the expected value for unmarried is:
E(unmarried) = I(1,1) = -(1/2)log2(1/2) - (1/2)log2(1/2) = 1
Therefore, the expected value for the attribute "marital status" is:
E(behavior, marital status) = (3/5)E(married) + (2/5)E(unmarried) = 0.951
(3) Take the difference between the behavior category expected value and the attribute category expected value to determine the information gain of each attribute category. Again taking marital status as an example, the information gain of the attribute "marital status" is:
Gain(marital status) = E(behavior) - E(behavior, marital status) = 0.02
(4) Sort the attribute categories by information gain from largest to smallest, extract the predetermined number of attribute categories in that order, train the decision tree, and determine the decision nodes, to generate decision rules.
Specifically, the predetermined number is set to 1, that is, the attribute category with the largest information gain is extracted. For example, when the information gain of the attribute "affiliated system" is determined to be the largest, "affiliated system" is extracted as a decision node and branches are created for the categories within it, such as the delivery department, customer service department, after-sales department, and research and development department, which so far yields four decision rules. If further splitting is still possible afterwards, the number of decision rules generated can exceed 4.
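The recursive splitting and the extraction of decision rules described above can be sketched in Python. This is a simplified ID3 under stated assumptions, not the patent's implementation: the tree representation (a leaf is a behavior label, an inner node is an `(attribute, branches)` tuple), the function names, and the five rows are hypothetical, and the sketch always picks the single attribute with the largest gain at each split, which corresponds to a predetermined number of 1.

```python
from collections import Counter
from math import log2


def entropy(labels):
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def gain(rows, attr, target):
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[target] for r in rows]) - remainder


def id3(rows, attrs, target):
    """Return a leaf label, or a decision node (attribute, {value: subtree})."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not attrs:
        # Pure branch, or no attribute category left: emit the majority label.
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        branches[value] = id3(subset, rest, target)
    return (best, branches)


def extract_rules(tree, path=()):
    """Flatten a tree into (conditions, behavior) decision rules, one per leaf."""
    if not isinstance(tree, tuple):
        return [(path, tree)]
    attr, branches = tree
    out = []
    for value, subtree in branches.items():
        out.extend(extract_rules(subtree, path + ((attr, value),)))
    return out


# Hypothetical sample information; all values are illustrative.
rows = [
    {"SYS_NAME": "research", "MARI_STA": "married", "EMPL_CLASS": "on-job"},
    {"SYS_NAME": "research", "MARI_STA": "unmarried", "EMPL_CLASS": "on-job"},
    {"SYS_NAME": "customer service", "MARI_STA": "married", "EMPL_CLASS": "on-job"},
    {"SYS_NAME": "delivery", "MARI_STA": "married", "EMPL_CLASS": "resigned"},
    {"SYS_NAME": "delivery", "MARI_STA": "unmarried", "EMPL_CLASS": "resigned"},
]

tree = id3(rows, ["SYS_NAME", "MARI_STA"], "EMPL_CLASS")
decision_rules = extract_rules(tree)
```

In this toy data the affiliated system separates the two behaviors completely, so it becomes the root decision node and each of its three values yields one decision rule.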
For step S103, the information to be predicted is imported and each of its attribute categories is matched against the decision nodes of the generated decision tree from top to bottom, that is, by attribute category. Because of human error and similar causes, the information to be predicted may lack an attribute category that appears in the decision tree. Therefore, during decision node matching, if the attribute category exists, its information is added to the decision rule for rule matching and the decision behavior is then output, for example: affiliated system -> research and development department -> on-job. If no matching attribute category is found, the attribute category that follows it is extracted and decision node matching continues until a decision behavior is obtained.
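One possible reading of this matching step is sketched below in Python. The text does not spell out how a missing attribute category is skipped, so the sketch makes an explicit assumption: conditions on attribute categories that the record lacks are ignored, a rule is discarded if any present attribute contradicts it, and the rule with the most satisfied conditions wins. The rule set, the function name, and the records are hypothetical.

```python
def predict(rules, record, default=None):
    """Return the behavior of the best-matching decision rule for `record`.

    Assumed interpretation: conditions on attributes missing from `record`
    are skipped; a rule is rejected when a present attribute contradicts it;
    among the remaining rules the one with the most satisfied conditions wins.
    """
    best_score, best_behavior = 0, default
    for conditions, behavior in rules:
        present = [(a, v) for a, v in conditions if a in record]
        if any(record[a] != v for a, v in present):
            continue  # a present attribute contradicts this rule
        if len(present) > best_score:
            best_score, best_behavior = len(present), behavior
    return best_behavior


# Hypothetical decision rules; all values are illustrative.
rules = [
    ((("SYS_NAME", "research"),), "on-job"),
    ((("SYS_NAME", "delivery"), ("MARI_STA", "married")), "on-job"),
    ((("SYS_NAME", "delivery"), ("MARI_STA", "unmarried")), "resigned"),
]

full = predict(rules, {"SYS_NAME": "delivery", "MARI_STA": "married"})
partial = predict(rules, {"MARI_STA": "unmarried"})   # SYS_NAME is missing
unknown = predict(rules, {"AGE": 30})                 # nothing to match on
```

The `default` return value covers records that share no attribute category with any rule, a case the description leaves open.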
The method provided by the above embodiment finds the patterns present in the sample information more effectively, reduces interference from human and other factors in behavior prediction, improves prediction accuracy, and, through the output decision behavior, reveals the reasons for the behavior. An enterprise can learn the reasons for employee resignation from the decision rules and take appropriate measures to improve or refine its talent management strategy, which safeguards staff stability and reduces losses.
Referring to Fig. 2, which shows a schematic flow of an optional method for predicting behavior according to an embodiment of the present invention, the method includes the following steps:
S201: Obtain sample information, where the sample information includes behavior categories and attribute categories.
S202: According to the preset correspondence between attribute value range thresholds and attribute value identifiers, determine the attribute value identifier corresponding to each attribute value in each attribute category.
S203: Calculate the information gain of each attribute category with respect to the behavior category, so as to extract a preset number of attribute categories and train a decision tree to generate decision rules.
S204: According to the decision rules, perform behavior prediction on the information to be predicted, to obtain the behavior category corresponding to the information to be predicted.
In the above embodiment, steps S201, S203, and S204 correspond respectively to steps S101, S102, and S103 shown in Fig. 1 and are not described again here.
In the above embodiment, for step S202, the sample information needs to be discretized before the information gain of each attribute category with respect to the behavior category is calculated, so that the processed attribute values are logically consistent: attribute values in the same range are replaced by the same attribute value identifier, which allows a suitable decision tree to be built. An attribute value identifier can be a number, text, and so on, and the identifier to substitute is determined from the pre-established correspondence between attribute value range thresholds and attribute value identifiers.
For attribute categories whose attribute values are text, identical text can be replaced with the same number (chosen from 1 to N). For example, if the research and development department under affiliated system corresponds to identifier 4 in the correspondence, every occurrence of research and development department is replaced with identifier 4. Text can be replaced in many other ways, and the present invention places no restriction here. Numeric replacement can also be skipped and identical text simply unified, for example as research and development department.
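A minimal Python sketch of the text-to-identifier replacement follows. The function name and the department list are hypothetical, and numbering values in first-seen order is only one of the many replacement schemes the text allows.

```python
def encode_text_values(values):
    """Map each distinct text value to an identifier 1..N in first-seen order."""
    ids = {}
    for value in values:
        ids.setdefault(value, len(ids) + 1)
    return ids, [ids[value] for value in values]


# Hypothetical values of the "affiliated system" attribute category.
departments = ["delivery", "customer service", "after-sales", "research", "research"]
mapping, encoded = encode_text_values(departments)
```

With this ordering the research department happens to receive identifier 4, so both of its occurrences are replaced by 4.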
When an attribute value is a variable with a very large range and many distinct values, as with age, it can be replaced to reduce the amount of calculation and to avoid overfitting during decision tree training (that is, classifying the sample information precisely but non-sample information poorly). The present invention can preset the correspondence between attribute value ranges and attribute value identifiers. For example, if ages 20 to 30 correspond to identifier 1, an employee age of 27 under the age category is replaced with identifier 1 according to the correspondence. Variables can be replaced in many other ways, for example by replacing every age up to 30 with ≤30; the present invention places no restriction on the specific implementation.
Further, the present invention shows one way of replacing variable values. Specifically, the correspondence between attribute value range thresholds and attribute value identifiers is determined according to a formula (not reproduced in this text), which in turn determines the attribute value identifier of each attribute value under each attribute category. The identifiers can be 1, 2, 3, 4, and 5, corresponding respectively to very small, small, medium, large, and very large. Compared with other replacement methods, this one is easy to understand. In the formula, F() denotes the round-up operation, V denotes an attribute value, and MinV and MaxV denote the minimum and maximum attribute values under an attribute category. For example, referring to Table 1, applying the formula to age gives the attribute value range thresholds 26 to 29, 30 to 32, 33 to 35, 36 to 38, and 39 to 42 years, corresponding respectively to attribute value identifiers 1, 2, 3, 4, and 5, so the age 32 corresponds to attribute value identifier 2.
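Because the formula itself is not available here, the following Python sketch uses a candidate form that is only an assumption: G = F(5 · (V − MinV) / (MaxV − MinV)), with F taken as the ceiling function and the minimum value clamped to identifier 1. This candidate reproduces the age ranges quoted above for MinV = 26 and MaxV = 42, but it should not be read as the patent's actual formula. The function name and the `levels` parameter are hypothetical.

```python
from math import ceil


def attribute_value_id(v, min_v, max_v, levels=5):
    """Candidate binning: ceil(levels * (v - min_v) / (max_v - min_v)).

    Assumption: the minimum value, which would otherwise map to 0,
    is clamped to identifier 1.
    """
    if max_v == min_v:
        return 1  # a constant attribute has a single identifier
    return max(1, ceil(levels * (v - min_v) / (max_v - min_v)))


# Ages at the edges of the ranges quoted in the text (MinV = 26, MaxV = 42).
ages = (26, 29, 30, 32, 33, 35, 36, 38, 39, 42)
ids = [attribute_value_id(a, 26, 42) for a in ages]
```

Under this assumption the age 32 maps to identifier 2, in agreement with the worked example.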
In the above embodiment, before each attribute value under each attribute category is replaced, the sample information is also cleaned to remove information made unsuitable by manual entry, weak system constraints, and the like, such as incomplete, inconsistent, and noisy information. Incomplete information is a record that is incomplete or has missing fields. Inconsistent information results from a lack of effective constraints during entry, which leads to values outside the normal range (such as a negative age). Noisy information is information that is erroneous or abnormal (for example, employee ages generally cluster between 18 and 55, so an age outside that range is noise). The present invention handles each kind of unsuitable information differently. Incomplete information can be deleted or filled in manually. Inconsistent information can be deleted or corrected manually or by program, for example by changing scientific research department to research and development department. Noisy information can be deleted; specifically, the K-means algorithm is used for cluster analysis to detect isolated points, which are then deleted. The present invention mainly uses deletion, to reduce interference from human factors.
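The deletion-based cleaning can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it replaces the K-means outlier detection mentioned above with a fixed age range check (18 to 55, taken from the example in the text), and the function name, the list of required columns, and the rows are hypothetical.

```python
REQUIRED = ("SEX", "AGE", "EMPL_CLASS")  # hypothetical set of mandatory columns


def clean(rows, required=REQUIRED, age_range=(18, 55)):
    """Delete incomplete records and records whose age is outside `age_range`.

    Assumption: a simple range check stands in for the clustering-based
    detection of isolated points described in the text.
    """
    kept = []
    for row in rows:
        if any(row.get(col) in (None, "") for col in required):
            continue  # incomplete information
        if not age_range[0] <= row["AGE"] <= age_range[1]:
            continue  # inconsistent (e.g. negative) or noisy age
        kept.append(row)
    return kept


# Hypothetical raw sample information.
raw = [
    {"SEX": "M", "AGE": 32, "EMPL_CLASS": "on-job"},
    {"SEX": "", "AGE": 41, "EMPL_CLASS": "on-job"},     # missing gender
    {"SEX": "F", "AGE": -3, "EMPL_CLASS": "resigned"},  # negative age
    {"SEX": "F", "AGE": 70, "EMPL_CLASS": "on-job"},    # outside 18 to 55
]
cleaned = clean(raw)
```

Only the first record survives; the other three are deleted rather than repaired, in line with the deletion-first approach described above.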
The method provided by the above embodiment cleans the sample information, which greatly reduces interference from human factors and improves prediction accuracy. Replacing each attribute value under each attribute category with a numeric identifier greatly reduces the workload of later calculations, such as the information gain of each attribute category, and improves processing efficiency.
Referring to Fig. 3, which shows a schematic flow of another optional method for predicting behavior according to an embodiment of the present invention, the method includes the following steps:
S301: Obtain sample information, where the sample information includes behavior categories and attribute categories.
S302: According to the preset correspondence between attribute value range thresholds and attribute value identifiers, determine the attribute value identifier corresponding to each attribute value in each attribute category.
S303: By chi-square test, calculate the difference value between each attribute category and the behavior category, and remove the attribute categories whose difference value is less than a predetermined difference threshold.
S304: For the sample information remaining after attribute category screening, calculate the information gain of each attribute category with respect to the behavior category, so as to extract a preset number of attribute categories and train a decision tree to generate decision rules.
S305: According to the decision rules, perform behavior prediction on the information to be predicted, to obtain the behavior category corresponding to the information to be predicted.
In the above embodiment, steps S301, S304, and S305 correspond respectively to steps S101, S102, and S103 shown in Fig. 1, and step S302 corresponds to step S202 shown in Fig. 2; they are not described again here.
In the above embodiment, for step S303, sample information often contains a great many attribute categories, some of which may be unrelated to the behavior. Attribute selection is therefore performed on the sample information to reject attribute categories that have little influence on the behavior category, which improves prediction accuracy. This operation is performed after the sample information has been cleaned, so that unsuitable information in the sample does not distort attribute selection.
Specifically, a chi-square test of difference is used to test whether each attribute category in the cleaned sample information influences, or differs with respect to, the behavior category (on-job/resigned). Let $\chi^2$ denote the chi-square value and P the probability that the sample difference is caused by sampling error. When $\chi^2 \ge \chi^2_{0.01}$, P ≤ 0.01 and the difference is highly significant; when $\chi^2_{0.05} \le \chi^2 < \chi^2_{0.01}$, 0.01 < P ≤ 0.05 and the difference is significant; when $\chi^2 < \chi^2_{0.05}$, P > 0.05 and the difference is not significant. Attribute categories without significance are then rejected. Here 0.01 and 0.05 correspond to prediction accuracies of 99% and 95% respectively, and $\chi^2_{0.01}$ and $\chi^2_{0.05}$ can be looked up in a chi-square critical value table using the degrees of freedom calculated for the attribute category, which depend on its numbers of rows and columns.
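The chi-square screening can be sketched in Python with a contingency table of attribute values (rows) against behavior categories (columns). This is a minimal illustration, assuming the standard Pearson chi-square statistic and degrees of freedom (rows − 1) × (columns − 1); the function names and the example counts are hypothetical, and only the critical values for 1 and 2 degrees of freedom are included.

```python
# Standard critical values: degrees of freedom -> (chi2 at 0.05, chi2 at 0.01).
CRITICAL = {1: (3.841, 6.635), 2: (5.991, 9.210)}


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def significance(stat, df):
    """Classify the difference using the 0.05 and 0.01 critical values."""
    chi_05, chi_01 = CRITICAL[df]
    if stat >= chi_01:
        return "highly significant"
    if stat >= chi_05:
        return "significant"
    return "not significant"


# Hypothetical counts: rows are two attribute values, columns are on-job / resigned.
related = [[30, 10], [10, 30]]
unrelated = [[20, 20], [20, 20]]

stat_related, df_related = chi_square(related)
stat_unrelated, df_unrelated = chi_square(unrelated)
```

An attribute category like `unrelated`, whose statistic falls below the 0.05 critical value, would be rejected before the decision tree is trained.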
The method provided by the above embodiment can reject attribute categories that differ little with respect to the behavior category, which is more effective than subjective manual selection. It improves the accuracy of behavior prediction and reduces the workload of training the decision tree. In addition, performing attribute selection on the cleaned sample information reduces the cases in which incomplete information in the sample information affects the attribute selection result.
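The chi-square screening described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the dictionary layout of the samples, and the truncated critical-value table (degrees of freedom 1 to 3 only, values taken from a standard chi-square table) are assumptions made for the example.

```python
from collections import Counter

# Assumption: (critical value at 0.05, critical value at 0.01) per degrees of
# freedom, copied from a standard chi-square table and truncated to 1..3.
CRITICAL = {1: (3.841, 6.635), 2: (5.991, 9.210), 3: (7.815, 11.345)}


def chi_square(attr_values, labels):
    """Return the Pearson chi-square statistic and the degrees of freedom."""
    n = len(labels)
    row_totals = Counter(attr_values)            # one row per attribute value
    col_totals = Counter(labels)                 # one column per behavior category
    observed = Counter(zip(attr_values, labels))
    stat = 0.0
    for a in row_totals:
        for b in col_totals:
            expected = row_totals[a] * col_totals[b] / n
            stat += (observed[(a, b)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def significance(stat, dof):
    """Classify the difference using the 0.05 and 0.01 critical values."""
    if dof < 1:
        return "not significant"                 # a constant attribute carries no signal
    crit05, crit01 = CRITICAL[dof]               # raises KeyError beyond the truncated table
    if stat >= crit01:
        return "highly significant"
    if stat >= crit05:
        return "significant"
    return "not significant"


def select_attributes(samples, labels):
    """samples maps an attribute name to its list of values; keep significant ones."""
    kept = []
    for name, values in samples.items():
        stat, dof = chi_square(values, labels)
        if significance(stat, dof) != "not significant":
            kept.append(name)
    return kept
```

A full implementation would look up critical values for arbitrary degrees of freedom rather than rely on a truncated table.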
Referring to Fig. 4, a schematic flowchart of an optional behavior prediction method according to an embodiment of the present invention is shown, including the following steps:
S401: Obtain sample information, where the sample information includes behavior categories and attribute categories.
S402: According to the preset correspondence between attribute value range thresholds and attribute value identifiers, determine the attribute value identifier corresponding to each attribute value in each attribute category.
S403: According to a chi-square test, calculate the difference value between each attribute category and the behavior category, and remove the attribute categories whose difference value is less than a predetermined difference threshold.
S404: Divide the sample information remaining after attribute category screening into first sample information and second sample information.
S405: According to the first sample information, calculate the information gain of each attribute category with respect to the behavior category, extract a predetermined number of attribute categories in descending order of information gain, and train a decision tree to generate first decision rules.
S406: According to the first decision rules, perform behavior category prediction on the second sample information, and extract the first decision rules whose predicted behavior category is consistent with the behavior category of the second sample information, so as to generate second decision rules.
S407: According to the second decision rules, perform behavior prediction on the information to be predicted, so as to obtain the behavior category corresponding to the information to be predicted.
In the above embodiment, for step S401, refer to the description of step S101 shown in Fig. 1; for step S402, refer to the description of step S202 shown in Fig. 2; for step S403, refer to the description of step S303 shown in Fig. 3. Details are not repeated here.
In the above embodiment, for step S404, the sample information that has undergone information cleaning and attribute selection can be divided into two parts. This prevents overtraining from causing overfitting, in which behavior prediction accuracy is low for information outside the samples. For example, the in-service employee information can be divided into two equal parts and the departed employee information into two equal parts; one part is then taken from each as the first sample information, and the remainder serves as the second sample information.
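The split described for step S404 can be sketched as below. The record layout (a list of dictionaries with a "status" field), the function name, and the use of a seeded shuffle before halving are assumptions for illustration; the source only states that each behavior category is divided equally.

```python
import random


def split_samples(records, label_key="status", seed=0):
    """Halve each behavior category and build the first and second sample sets."""
    rng = random.Random(seed)                    # seeded so the split is repeatable
    by_label = {}
    for rec in records:
        by_label.setdefault(rec[label_key], []).append(rec)

    first, second = [], []
    for group in by_label.values():
        shuffled = group[:]                      # leave the caller's list untouched
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        first.extend(shuffled[:half])            # one part per category for training
        second.extend(shuffled[half:])           # the remainder for rule screening
    return first, second
```

Splitting per category keeps the proportion of in-service and departed records the same in both parts, which matters when one category is much rarer than the other.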
For step S405, the first sample information is used to train the decision tree, and every layer and every decision node of the decision tree is traversed to extract the first decision rules, which form the first decision rule set. For this process, refer to the description of step S102 shown in Fig. 1; details are not repeated here.
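The information-gain ranking used in step S405 can be illustrated with the standard entropy-based definition. The sketch below assumes discrete attribute values and a dictionary mapping attribute names to value lists; the function names are hypothetical and the decision-tree training itself is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of behavior categories."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the entropy remaining after splitting on values."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_attributes(samples, labels, k):
    """Return the k attribute names with the largest information gain, largest first."""
    gains = {name: information_gain(vals, labels) for name, vals in samples.items()}
    return sorted(gains, key=gains.get, reverse=True)[:k]
```

The attributes returned by `top_attributes` would then be the ones handed to decision-tree training.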
For step S406, some rules in the generated first decision rule set contribute little to future behavior prediction and can be ignored. The first decision rule set therefore needs to be screened further to extract representative second decision rules with stronger predictive ability, which form the second decision rule set.
To ensure that the decision tree has strong generalization ability and high inference efficiency, the embodiment of the present invention extracts decision rules mainly by combining accuracy with coverage. Accuracy is mainly used to evaluate the consistency between the first decision rules and the sample information, while coverage is mainly used to evaluate the randomness between the first decision rules and the sample information, so that behavior prediction can still be performed even when attribute categories are inconsistent. The rule extraction process is as follows:
(1) Traverse the decision tree from top to bottom and from left to right, and use the second sample information to calculate the accuracy of each first decision rule in turn. Only first decision rules whose accuracy is greater than a predetermined threshold are determined to be second decision rules and added to the second decision rule set; the coverage of each such second decision rule is calculated at the same time.
(2) If rules are inconsistent, that is, the condition attribute categories are identical but the decision attribute categories differ, the first decision rule with the higher accuracy is determined to be a second decision rule and added to the second decision rule set.
(3) If the accuracy of every branch of a decision node is less than the predetermined accuracy threshold, the decision rule with the highest accuracy is selected and added to the second decision rule set, to avoid the problem that no match is possible when some rules are empty. If the accuracy of all branches is the same, the decision rule with the highest coverage is added to the second decision rule set.
The extraction process described above, which combines accuracy with coverage, yields a second decision rule set of uniform length and reduced size, while effectively filtering noise and improving the accuracy of behavior prediction.
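The screening in steps (1) to (3) can be sketched as follows. The source does not give formulas for accuracy and coverage, so the definitions here are assumptions: accuracy is the share of matching second-sample records whose behavior category equals the rule's decision, and coverage is the share of second-sample records the rule matches. The rule layout is hypothetical, and step (3) is simplified to a single global fallback rather than a per-node one.

```python
def matches(rule, record):
    """A rule matches when every condition attribute has the required value."""
    return all(record.get(a) == v for a, v in rule["conditions"].items())


def score(rule, records, label_key="status"):
    """Return (accuracy, coverage) of a rule on the second sample information."""
    hits = [r for r in records if matches(rule, r)]
    if not hits:
        return 0.0, 0.0
    correct = sum(1 for r in hits if r[label_key] == rule["decision"])
    return correct / len(hits), len(hits) / len(records)


def extract_rules(first_rules, records, threshold):
    """Screen first decision rules into a second decision rule set."""
    scored = []
    for rule in first_rules:
        accuracy, coverage = score(rule, records)
        scored.append({**rule, "accuracy": accuracy, "coverage": coverage})

    # Step (2): for identical conditions keep the more accurate rule
    # (assumption: ties are broken by coverage).
    best = {}
    for rule in scored:
        key = tuple(sorted(rule["conditions"].items()))
        current = best.get(key)
        rank = (rule["accuracy"], rule["coverage"])
        if current is None or rank > (current["accuracy"], current["coverage"]):
            best[key] = rule
    candidates = list(best.values())

    # Step (1): keep rules whose accuracy exceeds the threshold.
    selected = [r for r in candidates if r["accuracy"] > threshold]

    # Simplified step (3): if nothing passes, keep the single best rule.
    if not selected and candidates:
        selected = [max(candidates, key=lambda r: (r["accuracy"], r["coverage"]))]
    return selected
```

A closer rendering of step (3) would group rules by the decision node they branch from and apply the fallback within each group.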
For step S407, before behavior prediction is performed on the information to be predicted, information cleaning and attribute selection also need to be performed on it. The attribute categories in the processed information to be predicted are then matched against the second decision rule set: attribute categories at a higher level are matched first; within the same level, attribute categories with higher accuracy are matched first; when accuracy is the same, those with higher coverage are matched first. Matching ends when a behavior is output.
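The matching priority described for step S407 can be sketched as a sort followed by a first-match lookup. The rule layout, including the "level" field, is hypothetical, and the sketch assumes that a smaller level number means a position closer to the root of the decision tree and therefore a higher priority.

```python
def predict(rules, record):
    """Return the decision of the first matching rule in priority order, else None."""
    # Priority: level first (assumed: smaller number = nearer the root),
    # then higher accuracy, then higher coverage.
    ordered = sorted(
        rules, key=lambda r: (r["level"], -r["accuracy"], -r["coverage"])
    )
    for rule in ordered:
        if all(record.get(a) == v for a, v in rule["conditions"].items()):
            return rule["decision"]              # matching ends once a behavior is output
    return None                                  # no rule in the set applies
```

Returning `None` when nothing matches is a choice made for this sketch; the source does not say what is output in that case.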
The method provided by the above embodiment screens the first decision rule set by combining accuracy with coverage and adds the effective decision rules finally extracted to the second decision rule set, which effectively filters noise and improves the predictive ability of the decision tree.
The method provided by the embodiment of the present invention can effectively discover the patterns present in the sample information, reduce the interference of human and other factors with behavior prediction, and improve the accuracy of behavior prediction; the reasons for a behavior can also be learned from the behavior prediction result. An enterprise can thus efficiently take appropriate measures to improve its talent strategy and organizational strategy, ensuring the stability of its personnel and reducing its losses.
Referring to Fig. 5, a schematic diagram of the main modules of a behavior prediction apparatus 500 provided by an embodiment of the present invention is shown, including:
An obtaining module 501, configured to obtain sample information, where the sample information includes behavior categories and attribute categories;
A training module 502, configured to calculate the information gain of each attribute category with respect to the behavior category, so as to extract a preset number of attribute categories and train a decision tree to generate decision rules;
A prediction module 503, configured to perform behavior prediction on the information to be predicted according to the decision rules, so as to obtain the behavior category corresponding to the information to be predicted.
The apparatus of the embodiment of the present invention further includes a determining module 504, configured to determine, according to the preset correspondence between attribute value range thresholds and attribute value identifiers, the attribute value identifier corresponding to each attribute value in each attribute category.
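Mapping attribute values to identifiers through preset range thresholds can be sketched with a sorted-threshold lookup. The thresholds, the identifiers and the function name below are hypothetical, and the sketch does not reproduce the formula the embodiment uses to derive the correspondence.

```python
import bisect


def value_to_identifier(value, thresholds, identifiers):
    """Map a numeric attribute value to the identifier of the range it falls in.

    thresholds must be sorted ascending; identifiers needs one more entry than
    thresholds, one per range. A value equal to a threshold falls in the upper range.
    """
    if len(identifiers) != len(thresholds) + 1:
        raise ValueError("need exactly one identifier per range")
    return identifiers[bisect.bisect_right(thresholds, value)]


# Hypothetical example: ages below 30 -> 1, from 30 up to below 40 -> 2, 40 and above -> 3.
AGE_THRESHOLDS = [30, 40]
AGE_IDENTIFIERS = [1, 2, 3]
```

For instance, `value_to_identifier(35, AGE_THRESHOLDS, AGE_IDENTIFIERS)` falls in the middle range.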
The determining module 504 in the apparatus of the embodiment of the present invention is further configured to:
According to the formula
determine the correspondence between the attribute value range thresholds and the attribute value identifiers, where F() denotes the rounding-up operation, V denotes an attribute value, MinV and MaxV denote the minimum and maximum attribute values under an attribute category, and G denotes the attribute value identifier.
The apparatus of the embodiment of the present invention further includes a test module 505, configured to calculate, according to a chi-square test, the difference value between each attribute category and the behavior category, and to remove the attribute categories whose difference value is less than a predetermined difference threshold.
The training module 502 in the apparatus of the embodiment of the present invention is further configured to: divide the sample information into first sample information and second sample information; according to the first sample information, calculate the information gain of each attribute category with respect to the behavior category, so as to extract a predetermined number of attribute categories and train a decision tree to generate first decision rules; according to the first decision rules, perform behavior category prediction on the second sample information, and extract the first decision rules whose predicted behavior category is consistent with the behavior category of the second sample information, so as to generate second decision rules; and according to the second decision rules, perform behavior prediction on the information to be predicted, so as to obtain the behavior category corresponding to the information to be predicted.
The apparatus provided by the embodiment of the present invention can more effectively discover the patterns present in the sample information, reduce the interference of human and other factors with behavior prediction, and improve the accuracy of behavior prediction; the reasons for a behavior can also be learned from the behavior prediction result. An enterprise can thus more efficiently take appropriate measures to improve its talent strategy and organizational strategy, ensuring the stability of its personnel and reducing its losses.
In addition, the specific implementation of the behavior prediction apparatus described in the embodiments of the present invention has been described in detail in the behavior prediction method above, so the repeated content is not described again here.
Referring to Fig. 6, an exemplary system architecture 600 to which the behavior prediction method or behavior prediction apparatus of the embodiment of the present invention can be applied is shown.
As shown in Fig. 6, the system architecture 600 may include terminal devices 601, 602 and 603, a network 604 and a server 605. The network 604 serves as the medium providing communication links between the terminal devices 601, 602, 603 and the server 605. The network 604 may include various connection types, such as wired links, wireless communication links or fiber optic cables.
A user can use the terminal devices 601, 602, 603 to interact with the server 605 through the network 604, to receive or send messages and the like. Various communication client applications can be installed on the terminal devices 601, 602, 603, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients and social platform software (examples only).
The terminal devices 601, 602, 603 can be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers and desktop computers.
The server 605 can be a server providing various services, for example a back-end management server (example only) that supports a shopping website browsed by users with the terminal devices 601, 602, 603. The back-end management server can analyze and otherwise process received data such as information query requests, and feed the processing result (for example target push information or product information, examples only) back to the terminal device.
It should be noted that the behavior prediction method provided by the embodiment of the present invention is generally executed by the server 605; accordingly, the behavior prediction apparatus is generally arranged in the server 605.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 6 are merely illustrative. There can be any number of terminal devices, networks and servers, as required by the implementation.
Referring to Fig. 7, a schematic structural diagram of a computer system 700 suitable for implementing the terminal device of the embodiment of the present invention is shown. The terminal device shown in Fig. 7 is only an example and should not impose any limitation on the function or scope of use of the embodiment of the present invention.
As shown in Fig. 7, the computer system 700 includes a central processing unit (CPU) 701, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section 708 into a random access memory (RAM) 703. The RAM 703 also stores the various programs and data required for the operation of the system 700. The CPU 701, the ROM 702 and the RAM 703 are connected to one another by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse and the like; an output section 707 including a cathode ray tube (CRT) or liquid crystal display (LCD), a loudspeaker and the like; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disc, a magneto-optical disc or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage section 708 as needed.
In particular, according to the embodiments disclosed by the present invention, the process described above with reference to the flowchart may be implemented as a computer software program. For example, the embodiments disclosed by the present invention include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. When the computer program is executed by the central processing unit (CPU) 701, the above functions defined in the system of the present invention are performed.
It should be noted that the computer-readable medium shown in the present invention can be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium can be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media include but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present invention, a computer-readable storage medium can be any tangible medium that contains or stores a program, where the program can be used by or in connection with an instruction execution system, apparatus or device. Also in the present invention, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal can take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. The program code contained on a computer-readable medium can be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, RF and the like, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions and operations of possible implementations of the systems, methods and computer program products according to various embodiments of the present invention. In this regard, each box in a flowchart or block diagram can represent a module, a program segment or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the boxes can occur in an order different from that noted in the drawings. For example, two boxes shown in succession can in fact be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in a block diagram or flowchart, and any combination of boxes in a block diagram or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules described in the embodiments of the present invention can be implemented in software or in hardware. The described modules can also be provided in a processor; for example, it can be described as: a processor including an obtaining module, a training module and a prediction module. The names of these modules do not, under certain circumstances, limit the modules themselves; for example, the obtaining module can also be described as a "sample information obtaining module".
In another aspect, the present invention also provides a computer-readable medium, which can be included in the device described in the above embodiments, or can exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to:
Obtain sample information, where the sample information includes behavior categories and attribute categories;
Calculate the information gain of each attribute category with respect to the behavior category, so as to extract a preset number of attribute categories and train a decision tree to generate decision rules;
According to the decision rules, perform behavior prediction on the information to be predicted, so as to obtain the behavior category corresponding to the information to be predicted.
The technical solution according to the embodiment of the present invention can effectively discover the patterns present in the sample information, reduce the interference of human and other factors with behavior prediction, and improve the accuracy of behavior prediction; the reasons for a behavior can also be learned from the behavior prediction result. An enterprise can thus more efficiently take appropriate measures to improve its talent strategy and organizational strategy, ensuring the stability of its personnel and reducing its losses.
The above specific embodiments do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations and substitutions can occur depending on design requirements and other factors. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (12)

1. A method for predicting behavior, characterized by comprising:
Obtaining sample information, wherein the sample information includes behavior categories and attribute categories;
Calculating the information gain of each attribute category with respect to the behavior category, so as to extract a preset number of attribute categories and train a decision tree to generate decision rules;
According to the decision rules, performing behavior prediction on information to be predicted, so as to obtain the behavior category corresponding to the information to be predicted.
2. The method according to claim 1, characterized in that, after the obtaining of sample information, the method further comprises:
According to the preset correspondence between attribute value range thresholds and attribute value identifiers, determining the attribute value identifier corresponding to each attribute value in each attribute category.
3. The method according to claim 2, characterized by further comprising:
According to the formula
determining the correspondence between the attribute value range thresholds and the attribute value identifiers, wherein F() denotes the rounding-up operation, V denotes an attribute value, MinV and MaxV denote the minimum and maximum attribute values under an attribute category, and G denotes the attribute value identifier.
4. The method according to claim 2 or 3, characterized in that, after the determining, according to the preset correspondence between attribute value range thresholds and attribute value identifiers, of the attribute value identifier corresponding to each attribute value in each attribute category, the method further comprises:
According to a chi-square test, calculating the difference value between each attribute category and the behavior category, and removing the attribute categories whose difference value is less than a predetermined difference threshold.
5. The method according to any one of claims 1-4, characterized in that the calculating of the information gain of each attribute category with respect to the behavior category, so as to extract a preset number of attribute categories and train a decision tree to generate decision rules, comprises:
Dividing the sample information into first sample information and second sample information;
According to the first sample information, calculating the information gain of each attribute category with respect to the behavior category, extracting a predetermined number of attribute categories in descending order of the information gain, and training a decision tree to generate first decision rules;
According to the first decision rules, performing behavior category prediction on the second sample information, and extracting the first decision rules whose predicted behavior category is consistent with the behavior category of the second sample information, so as to generate second decision rules;
According to the second decision rules, performing behavior prediction on the information to be predicted, so as to obtain the behavior category corresponding to the information to be predicted.
6. An apparatus for predicting behavior, characterized by comprising:
An obtaining module, configured to obtain sample information, wherein the sample information includes behavior categories and attribute categories;
A training module, configured to calculate the information gain of each attribute category with respect to the behavior category, so as to extract a preset number of attribute categories and train a decision tree to generate decision rules;
A prediction module, configured to perform behavior prediction on information to be predicted according to the decision rules, so as to obtain the behavior category corresponding to the information to be predicted.
7. The apparatus according to claim 6, characterized by further comprising:
A determining module, configured to determine, according to the preset correspondence between attribute value range thresholds and attribute value identifiers, the attribute value identifier corresponding to each attribute value in each attribute category.
8. The apparatus according to claim 7, characterized by further comprising:
According to the formula
determining the correspondence between the attribute value range thresholds and the attribute value identifiers, wherein F() denotes the rounding-up operation, V denotes an attribute value, MinV and MaxV denote the minimum and maximum attribute values under an attribute category, and G denotes the attribute value identifier.
9. The apparatus according to claim 7 or 8, characterized by further comprising:
A test module, configured to calculate, according to a chi-square test, the difference value between each attribute category and the behavior category, and to remove the attribute categories whose difference value is less than a predetermined difference threshold.
10. The apparatus according to any one of claims 6-9, characterized in that the training module is further configured to:
Divide the sample information into first sample information and second sample information;
According to the first sample information, calculate the information gain of each attribute category with respect to the behavior category, extract a predetermined number of attribute categories in descending order of the information gain, and train a decision tree to generate first decision rules;
According to the first decision rules, perform behavior category prediction on the second sample information, and extract the first decision rules whose predicted behavior category is consistent with the behavior category of the second sample information, so as to generate second decision rules;
According to the second decision rules, perform behavior prediction on the information to be predicted, so as to obtain the behavior category corresponding to the information to be predicted.
11. An electronic device for predicting behavior, characterized by comprising:
One or more processors;
A storage device, configured to store one or more programs,
Wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-5.
12. A computer-readable medium on which a computer program is stored, characterized in that the method according to any one of claims 1-5 is implemented when the program is executed by a processor.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710892426.6A CN109558887A (en) 2017-09-27 2017-09-27 A kind of method and apparatus of predictive behavior

Publications (1)

Publication Number Publication Date
CN109558887A true CN109558887A (en) 2019-04-02

Family

ID=65863930

Country Status (1)

Country Link
CN (1) CN109558887A (en)


Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190402