CN109558887A - Method and apparatus for predicting behavior
- Publication number
- CN109558887A (application CN201710892426.6A)
- Authority
- CN
- China
- Prior art keywords
- attribute
- classification
- behavior
- information
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/105—Human resources
Abstract
The invention discloses a method and apparatus for predicting behavior, relating to the field of communication technology. One specific embodiment of the method includes: obtaining sample information, where the sample information includes behavior categories and attribute categories; calculating the information gain of each attribute category with respect to the behavior category, extracting a preset number of attribute categories, and training a decision tree to generate decision rules; and performing behavior prediction on information to be predicted according to the decision rules, to obtain the behavior category corresponding to that information. The embodiment reduces the interference of human and other factors in behavior prediction and improves prediction accuracy, while allowing the reasons for a behavior to be read from the decision rules.
Description
Technical field
The present invention relates to the field of communication technology, and in particular to a method and apparatus for predicting behavior.
Background art
Although employee resignations are common, they are often unpredictable, and an employee's sudden departure tends to disrupt an enterprise's work schedule. Many human resources departments therefore struggle with how to learn, before an employee resigns, whether that employee is inclined to leave.
In the prior art, whether an employee is inclined to resign is predicted mainly through questionnaires that investigate how social, enterprise and personal factors affect the employee's comprehensive stability value. The larger the comprehensive stability value, the less likely the employee is to resign.
As an alternative, whether an employee is inclined to resign can be judged from the employee's history logs and working-hours behavior data (such as websites browsed and software used).
In the course of realizing the present invention, the inventors found at least the following problems in the prior art:
(1) Surveying by questionnaire whether employees will resign involves many uncertain factors, such as employees deliberately exaggerating, which makes the prediction results inaccurate.
(2) Predicting whether an employee will resign by capturing the employee's behavior (for example history logs and working-hours behavior data) can yield more accurate predictions, but HR cannot learn the reason for the resignation in this way, so the prediction gives the enterprise no reference for adopting strategies to retain employees.
Summary of the invention
In view of this, embodiments of the present invention provide a method and apparatus for predicting behavior that can at least solve the prior-art problems of low prediction accuracy and the inability to see the reasons for a behavior.
To achieve the above object, according to one aspect of the embodiments of the present invention, a method for predicting behavior is provided, comprising: obtaining sample information, where the sample information includes behavior categories and attribute categories; calculating the information gain of each attribute category with respect to the behavior category, extracting a preset number of attribute categories, and training a decision tree to generate decision rules; and performing behavior prediction on information to be predicted according to the decision rules, to obtain the behavior category corresponding to the information to be predicted.
Optionally, after obtaining the sample information, the method further comprises: determining the attribute value identifier corresponding to each attribute value in each attribute category, according to a preset correspondence between attribute value range thresholds and attribute value identifiers.
Optionally, the method of the embodiment of the present invention further comprises:
determining the correspondence between attribute value range thresholds and attribute value identifiers according to the formula [formula not reproduced in this text], where F() denotes the rounding-up operation, V denotes an attribute value, MinV and MaxV denote the minimum and maximum attribute values under an attribute category, and G denotes the attribute value identifier.
Optionally, after determining the attribute value identifier corresponding to each attribute value in each attribute category according to the preset correspondence between attribute value range thresholds and attribute value identifiers, the method further comprises: calculating, by chi-square test, the difference value between each attribute category and the behavior category, and removing the attribute categories whose difference value is less than a predetermined difference threshold.
Optionally, calculating the information gain of each attribute category with respect to the behavior category, extracting a preset number of attribute categories, and training the decision tree to generate decision rules comprises:
dividing the sample information into first sample information and second sample information;
according to the first sample information, calculating the information gain of each attribute category with respect to the behavior category, extracting a predetermined number of attribute categories in descending order of information gain, and training the decision tree to generate first decision rules;
according to the first decision rules, predicting the behavior category of the second sample information, and extracting those first decision rules whose predicted behavior category is consistent with the behavior category of the second sample information, to generate second decision rules;
according to the second decision rules, performing behavior prediction on the information to be predicted, to obtain the behavior category corresponding to the information to be predicted.
To achieve the above object, according to another aspect of the embodiments of the present invention, an apparatus for predicting behavior is provided, comprising: an obtaining module for obtaining sample information, where the sample information includes behavior categories and attribute categories; a training module for calculating the information gain of each attribute category with respect to the behavior category, extracting a preset number of attribute categories, and training a decision tree to generate decision rules; and a prediction module for performing behavior prediction on information to be predicted according to the decision rules, to obtain the behavior category corresponding to the information to be predicted.
Optionally, the apparatus of the embodiment of the present invention further comprises: a determining module for determining the attribute value identifier corresponding to each attribute value in each attribute category, according to a preset correspondence between attribute value range thresholds and attribute value identifiers.
Optionally, the determining module is further configured to determine the correspondence between attribute value range thresholds and attribute value identifiers according to the formula [formula not reproduced in this text], where F() denotes the rounding-up operation, V denotes an attribute value, MinV and MaxV denote the minimum and maximum attribute values under an attribute category, and G denotes the attribute value identifier.
Optionally, the apparatus of the embodiment of the present invention further comprises a test module for calculating, by chi-square test, the difference value between each attribute category and the behavior category, and removing the attribute categories whose difference value is less than a predetermined difference threshold.
Optionally, the training module is further configured to: divide the sample information into first sample information and second sample information; according to the first sample information, calculate the information gain of each attribute category with respect to the behavior category, extract a predetermined number of attribute categories in descending order of information gain, and train the decision tree to generate first decision rules; according to the first decision rules, predict the behavior category of the second sample information, and extract those first decision rules whose predicted behavior category is consistent with the behavior category of the second sample information, to generate second decision rules; and, according to the second decision rules, perform behavior prediction on the information to be predicted, to obtain the behavior category corresponding to the information to be predicted.
To achieve the above object, according to yet another aspect of the embodiments of the present invention, an electronic device for predicting behavior is provided.
The electronic device of the embodiment of the present invention comprises: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement any of the above methods for predicting behavior.
To achieve the above object, according to yet another aspect of the embodiments of the present invention, a computer-readable medium is provided, on which a computer program is stored, characterized in that the program, when executed by a processor, implements any of the above methods for predicting behavior.
According to the solution provided by the present invention, an embodiment of the above invention has the following advantages or beneficial effects: behavior rules present in the sample information can be found effectively, the interference of human and other factors in behavior prediction is reduced, and the accuracy of behavior prediction is improved; at the same time, the reasons for a behavior can be learned from the output prediction result, which gives the enterprise a reference for adopting corresponding strategies.
Further effects of the above optional, non-conventional implementations are explained below in connection with specific embodiments.
Brief description of the drawings
The drawings are provided for a better understanding of the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the main flow of a method for predicting behavior according to an embodiment of the present invention;
Fig. 2 is a schematic flow diagram of an optional method for predicting behavior according to an embodiment of the present invention;
Fig. 3 is a schematic flow diagram of another optional method for predicting behavior according to an embodiment of the present invention;
Fig. 4 is a schematic flow diagram of yet another optional method for predicting behavior according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the main modules of an apparatus for predicting behavior according to an embodiment of the present invention;
Fig. 6 is an exemplary system architecture diagram to which an embodiment of the present invention can be applied;
Fig. 7 is a schematic structural diagram of a computer system suitable for implementing a mobile device or server of an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present invention are described below with reference to the drawings, including various details of the embodiments to aid understanding; these should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art should recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.
Referring to Fig. 1, which shows the main flow of a method for predicting behavior provided by an embodiment of the present invention, the method includes the following steps:
S101: Obtain sample information, where the sample information includes behavior categories and attribute categories.
S102: Calculate the information gain of each attribute category with respect to the behavior category, extract a preset number of attribute categories, and train a decision tree to generate decision rules.
S103: According to the decision rules, perform behavior prediction on the information to be predicted, to obtain the behavior category corresponding to the information to be predicted.
It should be noted that the behavior categories covered by the embodiments of the present invention can be of many kinds, such as fault diagnosis or resignation diagnosis. The description below uses employees resigning and remaining in service as the example.
In the above embodiment, for step S101, the sample information can be obtained from a data management system (such as a human resource management system). The data management system may run on x86-architecture servers with a Linux operating system and be built with Hadoop. For example, multiple employee-related information tables belonging to the same company are extracted from the data management system and merged, using information that uniquely represents an employee (such as ID card number or employee number) as the join key, to generate an employee sample information table in which all information for the same employee appears in the same row.
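The merge described for step S101 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the key name `EMPL_ID`, the two source tables and their rows are all hypothetical, and a production system built on Hadoop would perform the join differently.

```python
from collections import defaultdict


def merge_employee_tables(tables, key="EMPL_ID"):
    """Merge several tables (lists of row dicts) into one row per employee.

    `key` is the column that uniquely identifies an employee; the name
    EMPL_ID is an assumption made for this sketch.
    """
    merged = defaultdict(dict)
    for table in tables:
        for row in table:
            # Rows sharing the same key are folded into a single record.
            merged[row[key]].update(row)
    return [merged[k] for k in sorted(merged)]


# Hypothetical source tables extracted from a data management system.
basic = [
    {"EMPL_ID": "E001", "SEX": "F", "AGE": 32},
    {"EMPL_ID": "E002", "SEX": "M", "AGE": 27},
]
status = [
    {"EMPL_ID": "E002", "EMPL_CLASS": "resigned"},
    {"EMPL_ID": "E001", "EMPL_CLASS": "in service"},
]

sample = merge_employee_tables([basic, status])
```

Each element of `sample` then holds every attribute category of one employee together with the behavior category, which is the one-row-per-employee layout the text describes.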
Specifically, referring to Table 1, the merged information table is (X1, X2, ..., Xn, D), where X1, X2, ..., Xn denote the employee attribute categories, corresponding respectively to SEX (gender), SYS_NAME (affiliated system), MARI_STA (marital status), SALARY (wages), JOB_LEVEL_DESCR (job title and rank), NATI-PLA (native place), LOCATION (work location), HIGHEST_EDUC_LVL_DESCR (highest education level), AGE (age) and SL (length of service in the company); D denotes the behavior category, corresponding to EMPL_CLASS (resigned/in-service status), which indicates whether a person is in service (staying) or has resigned (leaving):
Table 1: Employee information table
For step S102, the decision tree may be trained to generate decision rules using classifiers such as the ID3 decision tree algorithm, rough sets or random forests. The embodiment of the present invention is described using the ID3 decision tree algorithm as an example; the procedure is as follows:
The information entropy E of the tuple categories in D is expressed as:
$$E(D) = -\sum_{i=1}^{c} p_i \log_2 p_i$$
where c is the total number of behavior categories and $p_i$ is the probability that the i-th behavior category occurs in the whole sample information. In practical terms, the entropy is the expected amount of information needed to identify the class label of a tuple in D.
If the training tuples D are partitioned by attribute Xn, the expected information of D partitioned by Xn is:
$$E(D, X_n) = \sum_{j=1}^{\beta} \frac{|D_j|}{|D|}\, E(D_j)$$
where β is the total number of partitions produced by attribute Xn and $D_j$ is the j-th partition.
The information gain is the difference between the two:
$$\mathrm{Gain}(D, X_n) = E(D) - E(D, X_n)$$
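The three quantities above (entropy, expected information after partitioning, and information gain) can be sketched in a few lines. The function names and the toy rows are assumptions for illustration; only the definitions come from the text.

```python
import math
from collections import Counter


def entropy(labels):
    """E(D): expected information, in bits, needed to label a tuple."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def expected_entropy(rows, attr, target):
    """E(D, Xn): entropy of the partitions of D by attr, weighted by size."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    total = len(rows)
    return sum(len(g) / total * entropy(g) for g in groups.values())


def information_gain(rows, attr, target):
    """Gain(D, Xn) = E(D) - E(D, Xn)."""
    return entropy([r[target] for r in rows]) - expected_entropy(rows, attr, target)


# Toy rows (hypothetical): SYS_NAME separates the classes, MARI_STA does not.
rows = [
    {"SYS_NAME": "R&D", "MARI_STA": "married", "EMPL_CLASS": "in service"},
    {"SYS_NAME": "R&D", "MARI_STA": "unmarried", "EMPL_CLASS": "in service"},
    {"SYS_NAME": "Service", "MARI_STA": "married", "EMPL_CLASS": "resigned"},
    {"SYS_NAME": "Service", "MARI_STA": "unmarried", "EMPL_CLASS": "resigned"},
]
gain_sys = information_gain(rows, "SYS_NAME", "EMPL_CLASS")
gain_mari = information_gain(rows, "MARI_STA", "EMPL_CLASS")
```

In this toy data the attribute that fully separates the two behavior categories receives the whole entropy of D as its gain, while the uninformative attribute receives none.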
In short, at each split the ID3 decision tree algorithm calculates the information gain of every attribute category, selects the attribute category with the best (for example, the largest) information gain to classify the sample information, and makes that attribute category a decision node. For each attribute value of the decision node, a corresponding branch is created and the sample information is partitioned accordingly. The same process is applied recursively from the top down, and splitting stops when any one of the following holds: all sample information in a branch belongs to the same behavior category, no attribute categories remain to split on, or the branch contains no sample information. The number of decision rules generated depends on the number of non-empty branches produced by splitting: the more splits, the more decision rules are obtained, and together they form the decision rule set.
(1) firstly for behavior classification, have it is on-job with two classes of leaving office, if the first kind be it is on-job, the second class is to leave office, statistics
1 behavior of table learns that on-job three people, leave office two people, obtains its total expected value are as follows:
E (on-job)=- (35) log2(3/5)-(2/5)log2(2/5)=0.971
(2) calculating each other desired value of Attribute class has married and two kinds unmarried, wherein by taking marital status as an example for
Wedding, a total of three people, two people are on-job, a people leaves office, therefore married desired value are as follows:
E (married)=- (2/3) log2(2/3)-(1/3)log2(1/3)=0.918
For unmarried, a total of two people, a people is on-job, a people leaves office, therefore unmarried desired value are as follows:
E (unmarried)=I (1,1)=- (1/2) log2(1/2)-(1/2)log2(1/2)=1
Therefore, for the desired value of attribute " marital status " are as follows:
E (behavior, marital status)=(3/5) E (married)+(2/5) E (unmarried)=0.951
(3) difference calculating is done to behavior classification desired value and the other desired value of Attribute class, it is other with each Attribute class of determination
Information gain, equally by taking marital status as an example, the information gain of attribute " marital status " at this time are as follows:
Gain (marital status)=E (behavior)-E (behavior, marital status)=0.02
(4) Sort the attribute categories in descending order of information gain, extract a predetermined number of attribute categories in turn, train the decision tree and determine the decision nodes, to generate decision rules.
Specifically, suppose the predetermined number is set to 1, that is, the attribute category with the largest information gain is extracted. For example, if the attribute "affiliated system" is found to have the largest information gain, it is extracted as a decision node and branches are created for the categories under "affiliated system" (distribution department, customer service department, after-sales department, R&D department and so on), which yields four decision rules at this point. If further splitting is possible afterwards, the number of decision rules generated can be greater than 4.
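The arithmetic in steps (1) to (3) can be checked with a short script. The helper name `info` mirrors the I(1,1) notation used above and is otherwise an assumption; the counts are the ones stated in the example (three in service, two resigned; married 2/1, unmarried 1/1).

```python
import math


def info(*counts):
    """I(n1, n2, ...): entropy in bits of a distribution given by class counts."""
    total = sum(counts)
    return -sum(n / total * math.log2(n / total) for n in counts if n)


e_behavior = info(3, 2)      # three in service, two resigned
e_married = info(2, 1)       # married: two in service, one resigned
e_unmarried = info(1, 1)     # unmarried: one in service, one resigned
e_marital = 3 / 5 * e_married + 2 / 5 * e_unmarried
gain_marital = e_behavior - e_marital

print(round(e_behavior, 3), round(e_married, 3),
      round(e_marital, 3), round(gain_marital, 3))
# prints: 0.971 0.918 0.951 0.02
```

A gain of about 0.02 bits shows that, in this small sample, marital status barely reduces the uncertainty about whether an employee resigns.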
For step S103, the information to be predicted is imported, and each attribute category in it is matched against the decision nodes of the generated decision tree from top to bottom; that is, attribute categories are matched. Because of human and other causes, the information to be predicted may lack an attribute category that appears in the decision tree. Therefore, during decision node matching, if the attribute category is present, its information is added to the decision rule for rule matching and the decision behavior is then output, for example: affiliated system -> R&D department -> in service. If no matching attribute category is found, the attribute category that follows it is taken and decision node matching continues until a decision behavior is obtained.
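One possible reading of this matching step is sketched below. It is an interpretation, not the patent's stated procedure: rules are assumed to be `(conditions, behavior)` pairs listed from the root downward, a condition on an attribute category absent from the record is skipped, and the first rule whose remaining conditions all hold supplies the decision behavior. The rule set and records are hypothetical.

```python
def predict(rules, record):
    """Return the behavior of the first rule consistent with the record.

    Conditions on attribute categories missing from the record are skipped,
    so matching continues with the next attribute category in the rule.
    Returns None when no rule is consistent with the record.
    """
    for conditions, behavior in rules:
        if all(record[attr] == value
               for attr, value in conditions if attr in record):
            return behavior
    return None


# Hypothetical decision rules, ordered from the root of the tree downward.
rules = [
    ((("SYS_NAME", "R&D"), ("MARI_STA", "married")), "in service"),
    ((("SYS_NAME", "R&D"), ("MARI_STA", "unmarried")), "resigned"),
    ((("SYS_NAME", "Service"),), "resigned"),
]

full = predict(rules, {"SYS_NAME": "R&D", "MARI_STA": "married"})
partial = predict(rules, {"MARI_STA": "unmarried"})  # SYS_NAME is missing
```

The matched rule doubles as the explanation: its conditions are the attribute values that led to the predicted behavior.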
The method provided by the above embodiment can find the rules present in the sample information more effectively, reduces the interference of human and other factors in behavior prediction, and improves the accuracy of behavior prediction; at the same time, the reasons for a behavior can be learned from the output decision behavior. An enterprise can use the decision rules to understand why employees resign and take appropriate measures to improve or refine its talent management strategy, keeping its workforce stable and reducing losses.
Referring to Fig. 2, which shows a schematic flow diagram of an optional method for predicting behavior according to an embodiment of the present invention, the method includes the following steps:
S201: Obtain sample information, where the sample information includes behavior categories and attribute categories.
S202: According to the preset correspondence between attribute value range thresholds and attribute value identifiers, determine the attribute value identifier corresponding to each attribute value in each attribute category.
S203: Calculate the information gain of each attribute category with respect to the behavior category, extract a preset number of attribute categories, and train a decision tree to generate decision rules.
S204: According to the decision rules, perform behavior prediction on the information to be predicted, to obtain the behavior category corresponding to the information to be predicted.
In the above embodiment, for steps S201, S203 and S204, refer to the descriptions of steps S101, S102 and S103 shown in Fig. 1 respectively; details are not repeated here.
In the above embodiment, for step S202, before the information gain of each attribute category with respect to the behavior category is calculated, the sample information needs to be discretized so that the processed attribute values are logically consistent; that is, attribute values within the same range are replaced by the same attribute value identifier, so that a suitable decision tree can be built. An attribute value identifier can be a number, text and so on, and the identifier used for replacement can be determined from the pre-established correspondence between attribute value range thresholds and attribute value identifiers.
For attribute categories whose values are text, identical text can be replaced with the same number (chosen from 1 to N). For example, if the R&D department under the affiliated system corresponds to identifier 4 in the correspondence, every occurrence of the R&D department is replaced with identifier 4. Text can be replaced in many other ways, and the present invention places no restriction on this. It is also possible not to replace text with numbers at all and only to unify identical text information, for example as "R&D department".
When an attribute value is a numeric variable with a wide range and many distinct values, as with age, it can be replaced in order to reduce the amount of computation and to avoid over-fitting during decision tree training (that is, classifying the sample information precisely while being inaccurate on non-sample information). The present invention can preset the correspondence between attribute value ranges and attribute value identifiers; for example, ages 20 to 30 correspond to identifier 1, so an employee age of 27 under the age category is replaced with identifier 1 according to the correspondence. Variables can be replaced in many other ways, for example by replacing every age under 30 with "≤30"; the present invention places no restriction on the specific implementation.
Further, the present invention shows one way of replacing variable values. Specifically, the correspondence between attribute value range thresholds and attribute value identifiers is determined according to the formula [formula not reproduced in this text], thereby determining the attribute value identifier of each attribute value under each attribute category. The identifiers may be 1, 2, 3, 4 and 5, corresponding respectively to very small, small, medium, large and very large. Compared with other replacement methods, this one is easy to understand. Here F() denotes the rounding-up operation, V denotes an attribute value, and MinV and MaxV denote the minimum and maximum attribute values under an attribute category. For example, referring to Table 1, for age the formula gives attribute value range thresholds of 26 to 29, 30 to 32, 33 to 35, 36 to 38 and 39 to 42 years, corresponding respectively to attribute value identifiers 1, 2, 3, 4 and 5, so an age of 32 corresponds to attribute value identifier 2.
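Because the formula itself is not reproduced in this text, the following sketch rests on an assumption: a ceiling-based mapping, clamped to a minimum of 1, chosen only because it reproduces the age ranges quoted above with MinV = 26 and MaxV = 42. The function name and the `levels` parameter are hypothetical, and the original formula may differ.

```python
def attribute_value_id(v, min_v, max_v, levels=5):
    """Map an integer attribute value to an identifier in 1..levels.

    Assumed form: G = max(1, ceil(levels * (V - MinV) / (MaxV - MinV))).
    Integer arithmetic is used for the ceiling to avoid float rounding.
    """
    if max_v == min_v:
        return 1
    scaled = -(-(levels * (v - min_v)) // (max_v - min_v))  # ceiling division
    return max(1, scaled)


# Ages at the edges of the five ranges quoted in the text.
ages = [26, 29, 30, 32, 33, 35, 36, 38, 39, 42]
ids = [attribute_value_id(a, 26, 42) for a in ages]
```

With these inputs `ids` evaluates to `[1, 1, 2, 2, 3, 3, 4, 4, 5, 5]`, so an age of 32 receives identifier 2, in line with the example.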
In the above embodiment, before the attribute values under each attribute category are replaced with numbers, the sample information is also cleaned, to remove information that does not meet requirements because of manual entry, weak system constraints and similar causes, such as incomplete, inconsistent or noisy information. Incomplete information is information whose record is incomplete or has missing parts; inconsistent information is information outside the normal range because the entry process lacked effective constraints (for example a negative age); noisy information is information that is wrong or abnormal (for example, employee ages generally cluster between 18 and 55, and ages outside that range are noise). The present invention handles each kind of non-conforming information differently. For example, incomplete information can be deleted or supplemented manually; inconsistent information can be deleted or corrected manually or by program, for example by changing "scientific research department" to "R&D department"; noisy information can be deleted, specifically by using the K-means algorithm to perform cluster analysis, detect isolated points and delete them. The present invention mainly uses deletion, to reduce the interference of human factors.
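The cleaning rules can be sketched as a simple filter. This is a simplification under stated assumptions: the required field list and the department alias map are hypothetical, and a fixed plausible age range (18 to 55, taken from the example in the text) stands in for the K-means isolated-point detection, which is not implemented here.

```python
REQUIRED = ("SEX", "AGE", "SYS_NAME", "EMPL_CLASS")      # assumed field list
DEPARTMENT_ALIASES = {"Scientific Research Dept": "R&D Dept"}  # hypothetical


def clean(rows, age_range=(18, 55)):
    """Drop incomplete, inconsistent and noisy rows; unify department names."""
    low, high = age_range
    cleaned = []
    for row in rows:
        # Incomplete information: a required field is missing or empty.
        if any(row.get(field) in (None, "") for field in REQUIRED):
            continue
        age = row["AGE"]
        # Inconsistent information: value outside the normal range of the field.
        if age < 0:
            continue
        # Noisy information: fixed age range used instead of K-means here.
        if not low <= age <= high:
            continue
        fixed = dict(row)
        fixed["SYS_NAME"] = DEPARTMENT_ALIASES.get(row["SYS_NAME"], row["SYS_NAME"])
        cleaned.append(fixed)
    return cleaned


raw = [
    {"SEX": "F", "AGE": 32, "SYS_NAME": "Scientific Research Dept", "EMPL_CLASS": "in service"},
    {"SEX": "M", "AGE": None, "SYS_NAME": "R&D Dept", "EMPL_CLASS": "resigned"},
    {"SEX": "M", "AGE": -3, "SYS_NAME": "R&D Dept", "EMPL_CLASS": "resigned"},
    {"SEX": "F", "AGE": 70, "SYS_NAME": "Service Dept", "EMPL_CLASS": "in service"},
]
cleaned = clean(raw)
```

Only the first record survives, with its department name unified; the other three are deleted as incomplete, inconsistent and noisy respectively.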
The method provided by the above embodiment cleans the sample information, which greatly reduces the interference of human factors and improves the accuracy of behavior prediction. Replacing the attribute values under each attribute category with numbers greatly reduces the processing workload of later calculations, such as the information gain of each attribute category, and improves processing efficiency.
Referring to Fig. 3, which shows a schematic flow diagram of another optional method for predicting behavior according to an embodiment of the present invention, the method includes the following steps:
S301: Obtain sample information, where the sample information includes behavior categories and attribute categories.
S302: According to the preset correspondence between attribute value range thresholds and attribute value identifiers, determine the attribute value identifier corresponding to each attribute value in each attribute category.
S303: By chi-square test, calculate the difference value between each attribute category and the behavior category, and remove the attribute categories whose difference value is less than a predetermined difference threshold.
S304: For the sample information remaining after attribute category screening, calculate the information gain of each attribute category with respect to the behavior category, extract a preset number of attribute categories, and train a decision tree to generate decision rules.
S305: According to the decision rules, perform behavior prediction on the information to be predicted, to obtain the behavior category corresponding to the information to be predicted.
In the above embodiment, for steps S301, S304 and S305, refer to the descriptions of steps S101, S102 and S103 shown in Fig. 1 respectively; for step S302, refer to the description of step S202 shown in Fig. 2. Details are not repeated here.
In the above embodiment, for step S303, sample information often contains a great many attribute categories, some of which may be unrelated to the behavior. Attribute selection therefore needs to be performed on the sample information to reject the attribute categories that have little influence on the behavior category and so improve the accuracy of behavior prediction. This is done after the sample information has been cleaned, to reduce cases where incomplete information in the sample information distorts attribute selection.
Specifically, a chi-square test of difference is used to test whether each attribute category in the cleaned sample information affects, or differs with respect to, the behavior category (in service/resigned). Let $\chi^2$ denote the chi-square value and P the probability that the sample difference is caused by sampling error. When $\chi^2 \ge \chi^2_{0.01}$, P ≤ 0.01 and the difference is highly significant; when $\chi^2_{0.05} \le \chi^2 < \chi^2_{0.01}$, 0.01 < P ≤ 0.05 and the difference is significant; when $\chi^2 < \chi^2_{0.05}$, P > 0.05 and the difference is not significant. Attribute categories without significance are then rejected. Here 0.01 and 0.05 correspond to confidence levels of 99% and 95% respectively, and $\chi^2_{0.01}$ and $\chi^2_{0.05}$ can be looked up in a chi-square critical value table using the degrees of freedom calculated for the attribute category, which depend on the numbers of rows and columns of that attribute category.
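The screening test can be sketched for a single attribute category as follows. The contingency tables are hypothetical (rows are attribute values, columns are in service/resigned), the function names are assumptions, and only the standard critical values for 1 and 2 degrees of freedom are included; a fuller implementation would look up other degrees of freedom in a complete critical value table.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Critical values (chi2_0.05, chi2_0.01) by degrees of freedom.
CRITICAL = {1: (3.841, 6.635), 2: (5.991, 9.210)}


def significance(stat, dof):
    c05, c01 = CRITICAL[dof]
    if stat >= c01:
        return "highly significant"
    if stat >= c05:
        return "significant"
    return "not significant"


# Hypothetical counts: rows = married/unmarried, columns = in service/resigned.
related = [[30, 10], [10, 30]]
unrelated = [[20, 20], [20, 20]]

stat_r, dof_r = chi_square(related)
stat_u, dof_u = chi_square(unrelated)
```

An attribute category like `unrelated`, whose statistic falls below the 0.05 critical value, would be rejected before the decision tree is trained.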
The method provided by the above embodiment can reject attribute categories that differ little with respect to the behavior category, which is more effective than subjective manual selection. It improves the accuracy of behavior prediction and reduces the workload of training the decision tree. In addition, performing attribute selection on the cleaned sample information reduces cases where incomplete information in the sample information distorts the attribute selection result.
Referring to Fig. 4, a schematic flowchart of an optional method of predicting behavior according to an embodiment of the present invention is shown, which includes the following steps:
S401: Obtain sample information, where the sample information includes a behavior classification and attribute classifications.
S402: According to the correspondence between preset attribute-value range thresholds and attribute value marks, determine the attribute value mark corresponding to each attribute value in each attribute classification.
S403: According to a chi-square test, calculate the difference value between each attribute classification and the behavior classification, and remove the attribute classifications whose difference value is less than a predetermined difference threshold.
S404: Divide the sample information remaining after attribute classification screening into first sample information and second sample information.
S405: According to the first sample information, calculate the information gain of each attribute classification under the behavior classification, extract a predetermined number of attribute classifications in descending order of information gain, and train a decision tree to generate first decision rules.
S406: According to the first decision rules, predict the behavior classification of the second sample information, and extract the first decision rules whose predicted behavior classification is consistent with the behavior classification of the second sample information, to generate second decision rules.
S407: According to the second decision rules, perform behavior prediction on information to be predicted, to obtain the behavior classification corresponding to the information to be predicted.
In the above embodiment, for step S401, refer to the description of step S101 shown in Fig. 1; for step S402, refer to the description of step S202 shown in Fig. 2; for step S403, refer to the description of step S303 shown in Fig. 3. Details are not repeated here.
In the above embodiment, for step S404, the sample information that has undergone information cleaning and attribute selection can be divided into two parts, to prevent overtraining from causing over-fitting, in which the accuracy of behavior prediction is low for information outside the samples. For example, the in-service employee information can be divided into two equal parts and the resigned employee information into two equal parts; one part is then taken from each of the halved in-service employee information and resigned employee information as the first sample information, and the remainder serves as the second sample information.
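The halving described above can be sketched as a split performed separately within each behavior classification, so that both parts keep the same proportion of in-service and resigned employees. The function name, the dictionary-based sample layout, and the seeded shuffle are assumptions for illustration; the text does not say whether the halves are chosen at random.

```python
import random


def split_by_behavior(samples, behavior_key, seed=0):
    """Split samples into two parts, halving each behavior classification."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    groups = {}
    for sample in samples:
        groups.setdefault(sample[behavior_key], []).append(sample)

    first, second = [], []
    for members in groups.values():
        members = list(members)   # copy so the caller's data is not reordered
        rng.shuffle(members)      # assumed: random assignment to the halves
        half = len(members) // 2
        first.extend(members[:half])    # used to train the decision tree
        second.extend(members[half:])   # used to screen the decision rules
    return first, second
```

When a behavior classification has an odd number of samples, this sketch places the extra sample in the second part.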
For step S405, the first sample information is selected to train the decision tree, and each layer and each decision node of the decision tree is traversed to extract the first decision rules, which form the first decision rule set. For this process, refer to the description of step S102 shown in Fig. 1; details are not repeated here.
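The information gain ranking used in step S405 can be sketched as below. Information gain is computed here in the standard way, as the entropy of the behavior classification minus the weighted entropy after splitting on an attribute. The function names and the sample layout are assumptions for illustration, and the decision tree training itself is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of behavior labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(samples, attribute, behavior_key):
    """Entropy reduction obtained by splitting the samples on one attribute."""
    base = entropy([s[behavior_key] for s in samples])
    groups = {}
    for s in samples:
        groups.setdefault(s[attribute], []).append(s[behavior_key])
    remainder = sum(len(g) / len(samples) * entropy(g) for g in groups.values())
    return base - remainder


def top_attributes(samples, attributes, behavior_key, k):
    """Return the k attributes with the largest information gain."""
    ranked = sorted(
        attributes,
        key=lambda a: information_gain(samples, a, behavior_key),
        reverse=True,
    )
    return ranked[:k]
```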
For step S406, some rules in the generated first decision rule set have little effect on future behavior prediction and can be ignored. The first decision rule set therefore needs to be screened further, to extract representative second decision rules with stronger predictive ability, which form the second decision rule set.
To ensure that the decision tree has strong generalization ability and high reasoning efficiency, the embodiment of the present invention mainly extracts decision rules by combining accuracy with coverage, where accuracy is mainly used to evaluate the consistency between the first decision rules and the sample information, and coverage is mainly used to evaluate the randomness between the first decision rules and the sample information. Behavior prediction can thus still be performed even when attribute classifications are inconsistent. The rule extraction process is as follows:
(1) Traverse the decision tree from top to bottom and from left to right, and use the second sample information to calculate the accuracy of each first decision rule in turn. Only a first decision rule whose accuracy is greater than a predetermined threshold is determined to be a second decision rule and added to the second decision rule set; the coverage of that second decision rule is calculated at the same time.
(2) If rules are inconsistent, that is, the condition attribute classifications are identical but the decision attribute classifications differ, the first decision rule with the higher accuracy is determined to be the second decision rule and added to the second decision rule set.
(3) If the accuracy of every branch of a decision node is less than the predetermined accuracy threshold, the decision rule with the greatest accuracy is selected and added to the second decision rule set, to avoid the problem that matching fails because part of the rule set is empty. If the accuracies of all branches are identical, the decision rule with the greatest coverage is added to the second decision rule set.
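Steps (1) and (2) above can be sketched as follows; the per-node fallback of step (3) is omitted. The text does not give formulas for accuracy and coverage, so this sketch assumes that coverage is the share of second-sample records matching a rule's conditions and that accuracy is the share of those matched records whose behavior classification equals the rule's prediction. The rule layout (a conditions dictionary plus a predicted behavior) and the function names are also assumptions.

```python
def rule_stats(rule, samples, behavior_key):
    """Return (accuracy, coverage) of one rule on the second sample information."""
    conditions, predicted = rule
    matched = [
        s for s in samples
        if all(s.get(attr) == value for attr, value in conditions.items())
    ]
    if not matched:
        return 0.0, 0.0
    correct = sum(1 for s in matched if s[behavior_key] == predicted)
    return correct / len(matched), len(matched) / len(samples)


def screen_rules(first_rules, samples, behavior_key, min_accuracy):
    """Keep rules above the accuracy threshold, resolving conflicting rules."""
    best = {}  # identical conditions -> (rule, accuracy, coverage)
    for rule in first_rules:
        accuracy, coverage = rule_stats(rule, samples, behavior_key)
        if accuracy <= min_accuracy:
            continue  # step (1): only rules above the threshold are kept
        key = tuple(sorted(rule[0].items()))
        # Step (2): same conditions, different decision -> keep higher accuracy.
        if key not in best or accuracy > best[key][1]:
            best[key] = (rule, accuracy, coverage)
    return list(best.values())
```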
Through the above extraction process, which combines accuracy with coverage, a second decision rule set with uniform rule length and a reduced number of rules can be obtained, while noise is effectively filtered and the accuracy of behavior prediction is improved.
For step S407, before behavior prediction is performed on the information to be predicted, information cleaning and attribute selection also need to be performed on it. Afterwards, the attribute classifications in the processed information to be predicted are matched against the second decision rule set: attribute classifications at a higher level are matched first; within the same level, attribute classifications with higher accuracy are matched first; when accuracies are identical, those with higher coverage are matched first. Matching ends when a behavior is output.
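The matching priority described above can be sketched as a sort followed by a first-match lookup. The `Rule` structure and the `predict` function are hypothetical names introduced for this example, and the sketch assumes that a "higher" level means a smaller depth in the tree (level 0 being the root).

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    conditions: dict   # attribute classification -> required attribute value
    behavior: str      # behavior classification output by the rule
    level: int         # assumed: 0 is the root, i.e. the highest level
    accuracy: float
    coverage: float


def predict(rules, info) -> Optional[str]:
    """Return the behavior of the first matching rule in priority order."""
    # Priority: higher level first, then higher accuracy, then higher coverage.
    ordered = sorted(rules, key=lambda r: (r.level, -r.accuracy, -r.coverage))
    for rule in ordered:
        if all(info.get(attr) == value for attr, value in rule.conditions.items()):
            return rule.behavior
    return None  # no rule in the second decision rule set matched
```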
The method provided by the above embodiment screens the first decision rule set by combining accuracy with coverage, and finally extracts the effective decision rules and adds them to the second decision rule set, which effectively filters noise and improves the predictive ability of the decision tree.
The method provided by the embodiment of the present invention can effectively discover the regularities present in the sample information, reduce the interference of human and other factors on behavior prediction, and improve the accuracy of behavior prediction; at the same time, the cause of a behavior can be learned from the behavior prediction result. An enterprise can thus efficiently take appropriate measures to improve or refine its talent strategy, ensuring the stability of its personnel and reducing its losses.
Referring to Fig. 5, a schematic diagram of the main modules of a device 500 for predicting behavior provided by an embodiment of the present invention is shown, including:
an obtaining module 501, configured to obtain sample information, where the sample information includes a behavior classification and attribute classifications;
a training module 502, configured to calculate the information gain of each attribute classification under the behavior classification, so as to extract a preset number of attribute classifications and train a decision tree to generate decision rules;
a prediction module 503, configured to perform behavior prediction on information to be predicted according to the decision rules, to obtain the behavior classification corresponding to the information to be predicted.
The device of the embodiment of the present invention further includes a determining module 504, configured to determine, according to the correspondence between preset attribute-value range thresholds and attribute value marks, the attribute value mark corresponding to each attribute value in each attribute classification.
The determining module 504 in the device of the embodiment of the present invention is further configured to:
according to the formula
determine the correspondence between the attribute-value range thresholds and the attribute value marks, where F() denotes the rounding-up operation, V denotes an attribute value, MinV and MaxV denote the minimum attribute value and the maximum attribute value under an attribute classification, and G denotes the attribute value mark.
The device of the embodiment of the present invention further includes an inspection module 505, configured to calculate, according to a chi-square test, the difference value between each attribute classification and the behavior classification, and to remove the attribute classifications whose difference value is less than a predetermined difference threshold.
The training module 502 in the device of the embodiment of the present invention is further configured to: divide the sample information into first sample information and second sample information; according to the first sample information, calculate the information gain of each attribute classification under the behavior classification, so as to extract a predetermined number of attribute classifications and train a decision tree to generate first decision rules; according to the first decision rules, predict the behavior classification of the second sample information, and extract the first decision rules whose predicted behavior classification is consistent with the behavior classification of the second sample information, to generate second decision rules; and according to the second decision rules, perform behavior prediction on information to be predicted, to obtain the behavior classification corresponding to the information to be predicted.
The device provided by the embodiment of the present invention can more effectively discover the regularities present in the sample information, reduce the interference of human and other factors on behavior prediction, and improve the accuracy of behavior prediction; at the same time, the cause of a behavior can be learned from the behavior prediction result. An enterprise can thus more efficiently take appropriate measures to improve or refine its talent strategy, ensuring the stability of its personnel and reducing its losses.
In addition, the specific implementation of the behavior prediction device described in the embodiments of the present invention has already been described in detail in the behavior prediction method above, so the repeated content is not described again here.
Fig. 6 shows an exemplary system architecture 600 to which the behavior prediction method or behavior prediction device of the embodiment of the present invention can be applied.
As shown in Fig. 6, the system architecture 600 may include terminal devices 601, 602, 603, a network 604, and a server 605. The network 604 is the medium that provides communication links between the terminal devices 601, 602, 603 and the server 605. The network 604 may include various connection types, such as wired or wireless communication links or fiber optic cables.
A user may use the terminal devices 601, 602, 603 to interact with the server 605 through the network 604, to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 601, 602, 603, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, and social platform software (examples only).
The terminal devices 601, 602, 603 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
The server 605 may be a server that provides various services, for example a back-end management server that supports the shopping websites browsed by users with the terminal devices 601, 602, 603 (example only). The back-end management server may analyze and otherwise process received data such as information query requests, and feed the processing result (for example, target push information or product information; examples only) back to the terminal device.
It should be noted that the behavior prediction method provided by the embodiment of the present invention is generally executed by the server 605; accordingly, the behavior prediction device is generally disposed in the server 605.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 6 are merely illustrative. There may be any number of terminal devices, networks, and servers, as required by the implementation.
Referring to Fig. 7, a schematic structural diagram of a computer system 700 suitable for implementing a terminal device of the embodiment of the present invention is shown. The terminal device shown in Fig. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiment of the present invention.
As shown in Fig. 7, the computer system 700 includes a central processing unit (CPU) 701, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section 708 into a random access memory (RAM) 703. The RAM 703 also stores the various programs and data required for the operation of the system 700. The CPU 701, the ROM 702, and the RAM 703 are connected to one another through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage section 708 as needed.
In particular, according to the embodiments disclosed in the present invention, the process described above with reference to the flowchart may be implemented as a computer software program. For example, the embodiments disclosed in the present invention include a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. When the computer program is executed by the central processing unit (CPU) 701, the above functions defined in the system of the present invention are performed.
It should be noted that the computer-readable medium described in the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in connection with an instruction execution system, apparatus, or device. In the present invention, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and that computer-readable medium can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, a processor may be described as including an obtaining module, a training module, and a prediction module. The names of these modules do not, under certain circumstances, limit the modules themselves; for example, the obtaining module may also be described as a "sample information obtaining module".
In another aspect, the present invention also provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to:
obtain sample information, where the sample information includes a behavior classification and attribute classifications;
calculate the information gain of each attribute classification under the behavior classification, so as to extract a preset number of attribute classifications and train a decision tree to generate decision rules;
perform behavior prediction on information to be predicted according to the decision rules, to obtain the behavior classification corresponding to the information to be predicted.
The technical solution according to the embodiment of the present invention can effectively discover the regularities present in the sample information, reduce the interference of human and other factors on behavior prediction, and improve the accuracy of behavior prediction; at the same time, the cause of a behavior can be learned from the behavior prediction result. An enterprise can thus more efficiently take appropriate measures to improve or refine its talent strategy, ensuring the stability of its personnel and reducing its losses.
The above specific embodiments do not limit the scope of protection of the present invention. Those skilled in the art should understand that, depending on design requirements and other factors, various modifications, combinations, sub-combinations, and substitutions may occur. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.
Claims (12)
1. A method of predicting behavior, characterized by comprising:
obtaining sample information, wherein the sample information includes a behavior classification and attribute classifications;
calculating the information gain of each attribute classification under the behavior classification, so as to extract a preset number of attribute classifications and train a decision tree to generate decision rules;
performing behavior prediction on information to be predicted according to the decision rules, to obtain the behavior classification corresponding to the information to be predicted.
2. The method according to claim 1, characterized in that, after the obtaining of sample information, the method further comprises:
determining, according to the correspondence between preset attribute-value range thresholds and attribute value marks, the attribute value mark corresponding to each attribute value in each attribute classification.
3. The method according to claim 2, characterized by further comprising:
according to the formula
determining the correspondence between the attribute-value range thresholds and the attribute value marks, wherein F() denotes the rounding-up operation, V denotes an attribute value, MinV and MaxV denote the minimum attribute value and the maximum attribute value under an attribute classification, and G denotes the attribute value mark.
4. The method according to claim 2 or 3, characterized in that, after the determining, according to the correspondence between preset attribute-value range thresholds and attribute value marks, of the attribute value mark corresponding to each attribute value in each attribute classification, the method further comprises:
calculating, according to a chi-square test, the difference value between each attribute classification and the behavior classification, and removing the attribute classifications whose difference value is less than a predetermined difference threshold.
5. The method according to any one of claims 1-4, characterized in that the calculating of the information gain of each attribute classification under the behavior classification, so as to extract a preset number of attribute classifications and train a decision tree to generate decision rules, comprises:
dividing the sample information into first sample information and second sample information;
calculating, according to the first sample information, the information gain of each attribute classification under the behavior classification, extracting a predetermined number of attribute classifications in descending order of the information gain, and training a decision tree to generate first decision rules;
predicting, according to the first decision rules, the behavior classification of the second sample information, and extracting the first decision rules whose predicted behavior classification is consistent with the behavior classification of the second sample information, to generate second decision rules;
performing behavior prediction on information to be predicted according to the second decision rules, to obtain the behavior classification corresponding to the information to be predicted.
6. A device for predicting behavior, characterized by comprising:
an obtaining module, configured to obtain sample information, wherein the sample information includes a behavior classification and attribute classifications;
a training module, configured to calculate the information gain of each attribute classification under the behavior classification, so as to extract a preset number of attribute classifications and train a decision tree to generate decision rules;
a prediction module, configured to perform behavior prediction on information to be predicted according to the decision rules, to obtain the behavior classification corresponding to the information to be predicted.
7. The device according to claim 6, characterized by further comprising:
a determining module, configured to determine, according to the correspondence between preset attribute-value range thresholds and attribute value marks, the attribute value mark corresponding to each attribute value in each attribute classification.
8. The device according to claim 7, characterized by further comprising:
according to the formula
determining the correspondence between the attribute-value range thresholds and the attribute value marks, wherein F() denotes the rounding-up operation, V denotes an attribute value, MinV and MaxV denote the minimum attribute value and the maximum attribute value under an attribute classification, and G denotes the attribute value mark.
9. The device according to claim 7 or 8, characterized by further comprising:
an inspection module, configured to calculate, according to a chi-square test, the difference value between each attribute classification and the behavior classification, and to remove the attribute classifications whose difference value is less than a predetermined difference threshold.
10. The device according to any one of claims 6-9, characterized in that the training module is further configured to:
divide the sample information into first sample information and second sample information;
calculate, according to the first sample information, the information gain of each attribute classification under the behavior classification, extract a predetermined number of attribute classifications in descending order of the information gain, and train a decision tree to generate first decision rules;
predict, according to the first decision rules, the behavior classification of the second sample information, and extract the first decision rules whose predicted behavior classification is consistent with the behavior classification of the second sample information, to generate second decision rules;
perform behavior prediction on information to be predicted according to the second decision rules, to obtain the behavior classification corresponding to the information to be predicted.
11. An electronic device for predicting behavior, characterized by comprising:
one or more processors;
a storage device, configured to store one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1-5.
12. A computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710892426.6A CN109558887A (en) | 2017-09-27 | 2017-09-27 | A kind of method and apparatus of predictive behavior |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109558887A true CN109558887A (en) | 2019-04-02 |
Family
ID=65863930
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710892426.6A Pending CN109558887A (en) | 2017-09-27 | 2017-09-27 | A kind of method and apparatus of predictive behavior |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109558887A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102567807A (en) * | 2010-12-23 | 2012-07-11 | 上海亚太计算机信息系统有限公司 | Method for predicating gas card customer churn |
CN103473231A (en) * | 2012-06-06 | 2013-12-25 | 深圳先进技术研究院 | Classifier building method and system |
CN103093051A (en) * | 2013-01-17 | 2013-05-08 | 浙江工业大学 | Product performance conductive knowledge mining method |
CN103559504A (en) * | 2013-11-04 | 2014-02-05 | 北京京东尚科信息技术有限公司 | Image target category identification method and device |
CN104951791A (en) * | 2014-03-26 | 2015-09-30 | 华为技术有限公司 | Data classification method and apparatus |
CN105488025A (en) * | 2015-11-24 | 2016-04-13 | 小米科技有限责任公司 | Template construction method and apparatus and information identification method and apparatus |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110163418A (en) * | 2019-04-26 | 2019-08-23 | 重庆大学 | A kind of labor turnover behavior prediction method based on survival analysis |
CN111626898A (en) * | 2020-03-20 | 2020-09-04 | 贝壳技术有限公司 | Method, device, medium and electronic equipment for realizing attribution of events |
CN111626898B (en) * | 2020-03-20 | 2022-03-15 | 贝壳找房(北京)科技有限公司 | Method, device, medium and electronic equipment for realizing attribution of events |
CN111783295A (en) * | 2020-06-28 | 2020-10-16 | 中国人民公安大学 | Dynamic identification and prediction evaluation method and system for urban community specific human behavior chain |
CN111783295B (en) * | 2020-06-28 | 2020-12-22 | 中国人民公安大学 | Dynamic identification and prediction evaluation method and system for urban community specific human behavior chain |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106649890A (en) | Data storage method and device | |
CN111402061A (en) | Asset management method and system | |
CN111489201A (en) | Method, device and storage medium for analyzing customer value | |
CN110310114A (en) | Object classification method, device, server and storage medium | |
CN109558887A (en) | A kind of method and apparatus of predictive behavior | |
CN112015562A (en) | Resource allocation method and device based on transfer learning and electronic equipment | |
CN113435859A (en) | Letter processing method and device, electronic equipment and computer readable medium | |
CN116109373A (en) | Recommendation method and device for financial products, electronic equipment and medium | |
CN112860744A (en) | Business process processing method and device | |
CN111143394B (en) | Knowledge data processing method, device, medium and electronic equipment | |
CN110309293A (en) | Text recommendation method and device | |
CN111177372A (en) | Scientific and technological achievement classification method, device, equipment and medium | |
CN113570222A (en) | User equipment identification method and device and computer equipment | |
CN111882113B (en) | Enterprise mobile banking user prediction method and device | |
CN110147482A (en) | Method and apparatus for obtaining burst hot spot theme | |
CN115408236A (en) | Log data auditing system, method, equipment and medium | |
CN113781246B (en) | Strategy generation method and device based on preset label and storage medium | |
CN113362102B (en) | Client cable distribution method, system and storage medium | |
CN110766431A (en) | Method and device for judging whether user is sensitive to coupon | |
CN114780695A (en) | Big data mining method and big data mining system for online topics | |
Al‐Maitah | Text analytics for big data using rough–fuzzy soft computing techniques | |
KR20140080592A (en) | Method for online evaluating patents | |
CN112231299A (en) | Method and device for dynamically adjusting feature library | |
CN113342969A (en) | Data processing method and device | |
CN110443305A (en) | Self-adaptive features processing method and processing device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190402 |