Detailed Description of the Embodiments
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.
In the present invention, a strategy refers specifically to an automated processing flow, such as recommendation or credit granting, that is, a series of decision processes that make decisions according to different features and different values. Unlike conventional machine learning, a strategy has the following main characteristics:
1. A strategy is a decision process over composite values. The goal of machine learning is usually to minimize structural risk, in other words to minimize a particular loss function, whereas a strategy combines multiple value goals, for example considering stability, coverage, and accuracy at the same time. A strategy's requirements for the different values are usually more specific and more complex. Functionally, a strategy can be understood simply as a system that combines different models;
2. Formulating a strategy generally requires prior knowledge. This prior knowledge is mainly reflected in the judgment of features: many features carry valuable hidden information, for example that they are stable or little affected by external policies, and such properties are difficult to learn in the short term.
Fig. 1 is a schematic diagram of the main steps of a data processing method according to an embodiment of the present invention. As shown in Fig. 1, the data processing method of the embodiment mainly includes the following steps S101 and S102.
Step S101: determine a training data set for training a data processing model, where the data processing model is used to process data according to multiple strategies having a priority order;
Step S102: according to the order of strategy priority from high to low and a preset strategy combination mode, generate the decision tree corresponding to each strategy in turn, and use the decision tree corresponding to the lowest-priority strategy as the data processing model.
According to an embodiment of the invention, if a strategy is the highest-priority strategy, its decision tree is obtained by training on the training data set; if a strategy is not the highest-priority strategy, its decision tree is obtained by training on a first training data set. The first training data set is determined from the target data output by the decision tree of the higher-priority strategy, and the higher-priority strategy is the strategy whose priority is higher than and adjacent to the priority of the strategy in question.
According to an embodiment of the invention, the multiple strategies are first ranked by priority according to business requirements; the training data set is then trained to obtain the decision tree corresponding to the highest-priority strategy, which is used for data classification and data screening; finally, a composite decision tree corresponding to all strategies is generated according to the preset strategy combination mode, and the composite decision tree is used as the data processing model to process data.
According to one embodiment of the present invention, the decision tree corresponding to a strategy is generated as follows: the training data set or the first training data set is trained, a pruning rule and a loss function are determined according to preset evaluation indexes, and the splitting mode of the decision tree is then determined according to the pruning rule and the loss function to generate the decision tree corresponding to the strategy. If the strategy is the highest-priority strategy, the training data set is used; otherwise the first training data set is used. The first training data set is updated each time the decision tree corresponding to a strategy has been trained.
According to one embodiment of the present invention, determining the training data set for training the data processing model may specifically be performed in the following steps:
Transform the input training data X according to a preset binning rule to obtain a duplicate-free feature set Xnew;
Aggregate the duplicate-free feature set Xnew to obtain a data group {Xnew, s, interested_var, cut_rule, map}, where s is a statistic whose concrete form is determined by the value evaluation function, interested_var is a set of variable names, cut_rule is the binning rule, and map is the aggregation method;
Use the data groups corresponding to all the training data as the training data set.
The binning rule is used to determine the split nodes of the decision tree. A decision tree is a decision analysis method that, given the known probabilities of various outcomes, builds a tree to obtain the probability that the expected net present value is greater than or equal to zero, so as to evaluate project risk and judge feasibility; it is an intuitive graphical method that applies probability analysis. A decision tree is drawn from top to bottom, and the topmost node is called the root node. Branches (edges) split off from these nodes, and a node that is no longer split is a decision node, or leaf node. The most time-consuming part of building a decision tree is choosing the specific value at which a node is split; the usual offline algorithm sorts the feature values and traverses all possible values. Optionally, to improve efficiency, an implementation of the invention divides a variable into three intervals:
1. Precise interval. Within this interval every small change in the value corresponds to a large difference in user behavior, so values in this interval are left unprocessed;
2. Fuzzy interval. User behavior in this interval shows a certain similarity and users can be divided into different groups, so values in this interval are binned in a histogram-like manner. For example, [0-9] is divided into three bins, [0-1], [2-4], and [5-9], and each bin is represented by its left boundary value, so that 4 is transformed to 2 and 7 is transformed to 5;
3. Segmented interval. Compared with the fuzzy interval, user behavior in this interval is more consistent or even identical, or the business side is not interested in changes within this interval. For example, if values of 10 or more are all considered meaningless, every value greater than 10 is transformed to 10. This kind of processing handles long-tailed data effectively.
In the application scenario of a specific embodiment of the invention, it is assumed that, with the values of a variable sorted from small to large, the first 10% form the precise interval; the values from 10% to 50% form the fuzzy interval, which is divided into 10 bins by value; and the values after 50% form the segmented interval. In this way, the specific values of the split nodes of the decision tree can be determined accurately, which ensures the efficiency of model training iterations.
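The three-interval transformation can be illustrated with a minimal Python sketch. The function name `three_interval_transform` and its parameters are hypothetical and are not taken from the embodiment; the sketch only assumes that the fuzzy interval is described by the sorted left boundaries of its bins and that the segmented interval is described by a single cap value.

```python
from bisect import bisect_right


def three_interval_transform(value, fuzzy_left_edges, cap, precise_below=None):
    """Map a raw value according to the three-interval scheme (illustrative only).

    fuzzy_left_edges: sorted left boundaries of the bins in the fuzzy interval.
    cap: every value at or above cap is mapped to cap (segmented interval).
    precise_below: values strictly below this bound are kept unchanged
        (precise interval); None means there is no precise interval.
    """
    if value >= cap:
        return cap
    if precise_below is not None and value < precise_below:
        return value
    # Fuzzy interval: represent the value by the left boundary of its bin.
    idx = bisect_right(fuzzy_left_edges, value) - 1
    return fuzzy_left_edges[max(idx, 0)]


# The example from the text: bins [0-1], [2-4], [5-9]; values above 10 become 10.
edges = [0, 2, 5]
transformed = [three_interval_transform(v, edges, cap=10) for v in (4, 7, 25)]
```

With the bins from the example above, 4 is mapped to 2, 7 is mapped to 5, and 25 is mapped to 10, so the number of distinct candidate split values shrinks before the tree is trained.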
According to another embodiment of the invention, before the training data set is trained to generate the decision tree of the first strategy, the following may also be performed:
Classify the variables contained in the training data set to obtain grouping variables for generating the top-level nodes of the decision tree of the first strategy and downstream variables for generating downstream nodes.
When training a decision tree, the embodiment of the invention processes the strategy variables after they have been determined, so as to optimize the variable structure and improve the running efficiency of subsequent models, and to make full use of expert prior information to improve the stability and interpretability of the output strategy.
Optionally, when selecting variables for training, variable classification and variable transformation may also be applied to the variables. Variable transformation is, for example, the above-mentioned operation of dividing a variable into intervals according to the binning rule.
Optionally, variable classification may be carried out, for example, according to the following two principles: 1. whether the variable corresponding to the strategy is interpretable or understandable; 2. whether the requirement for robustness (stability) of the variable is stronger than the requirement for accuracy. A traditional decision tree does not consider whether the variables in its first few layers are stable; in a practical application scenario, once the top-level features of the decision tree drift, the lower-level strategies fail. The top layers should therefore be based mainly on stable variables, with more features added in the deeper layers. In addition, the upper-level variables of a decision tree cover all leaf nodes, so if the leaf nodes are required to carry some interpretable label, classifying the top-level variables should also be considered. For example, the application scenario of the embodiment of the invention requires variables with strong interpretability, that is, the selected population must exhibit an explainable anomaly, such as intermediaries or users who place fake orders. Therefore, when a strategy is first created, all variables are divided into anomaly variables and other variables: anomaly variables are used for labeling and must be present in every strategy group, while the other variables are used to improve the strategy effect.
In an embodiment of the invention, the strategy combination modes mainly include two kinds, the serial strategy and the parallel strategy. When the combination mode is the serial strategy, generating the decision tree corresponding to each strategy in turn, according to the order of strategy priority from high to low and the preset strategy combination mode, may include:
In order of strategy priority from high to low, successively use the target data output by the decision tree of a first strategy as the input data of the decision tree corresponding to a second strategy to generate the decision tree of the second strategy, where the second strategy is the strategy whose priority is lower than and adjacent to the priority of the first strategy.
When the combination mode is the parallel strategy, generating the decision tree corresponding to each strategy in turn, according to the order of strategy priority from high to low and the preset strategy combination mode, may specifically include:
In order of strategy priority from high to low, successively use the data in the training data set other than the target data output by the decision tree of the first strategy as the input data of the decision tree corresponding to the second strategy to generate the decision tree of the second strategy, where the second strategy is the strategy whose priority is lower than and adjacent to the priority of the first strategy.
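The difference between the two combination modes can be sketched in Python as follows. This is an illustrative sketch only: `next_training_set`, `build_trees`, and the `train_tree` callback are hypothetical names, a trained tree is modeled simply as a predicate that returns True for records it outputs as target data, and for more than two strategies the sketch assumes that the filtering is applied to the data seen by the preceding tree.

```python
def next_training_set(records, is_target, mode):
    """Select the input data for the adjacent lower-priority strategy."""
    if mode == "serial":
        # Serial: the target data output by the higher-priority tree.
        return [r for r in records if is_target(r)]
    if mode == "parallel":
        # Parallel: everything except that target data.
        return [r for r in records if not is_target(r)]
    raise ValueError(f"unknown combination mode: {mode}")


def build_trees(records, strategies, mode, train_tree):
    """Generate one tree per strategy, from highest to lowest priority.

    train_tree(data, strategy) is a caller-supplied (hypothetical) trainer
    that returns a predicate marking the records output as target data.
    """
    trees = []
    data = list(records)
    for strategy in strategies:  # assumed sorted by priority, highest first
        tree = train_tree(data, strategy)
        trees.append(tree)
        data = next_training_set(data, tree, mode)
    return trees
```

In the serial mode each lower-priority tree refines the population selected by the tree above it, whereas in the parallel mode it looks for additional target population in the data the higher-priority tree did not select.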
According to an embodiment of the invention, the data processing model is structurally similar to one large composite decision tree with a serial structure. In implementation, it is therefore first necessary to determine the highest-priority strategy, that is, the main strategy decision tree in this embodiment. An optimal leaf node is obtained from the main strategy decision tree as the target data output by the main strategy decision tree; secondary strategy decision trees are then defined outside the main strategy decision tree according to different needs, and so on, until the composite decision tree corresponding to all strategies is obtained.
According to yet another embodiment of the invention, the evaluation indexes of the decision tree include at least one of a minimum guarantee index, a limited optimization index, and a maximum optimization index. The minimum guarantee index is implemented by the pruning rule; the limited optimization index and the maximum optimization index are implemented by the loss function.
Fig. 2 is a schematic diagram of the implementation flow of one embodiment of the invention. As shown in Fig. 2, in one embodiment of the invention, feature initialization refers to the following: the initial features corresponding to the determined decision variables undergo variable transformation by the binning rule and are then aggregated into regular data groups; the set of data groups obtained after all decision variables have been processed constitutes the training data set. The training data in the training data set are then classified to obtain the grouping variables for building the top-level nodes ("decision tree top-level node" in Fig. 2) of the decision tree of the highest-priority first strategy and the downstream variables for building the downstream nodes ("decision tree downstream node" in Fig. 2) of the decision tree. Building the top-level nodes and the downstream nodes produces the decision tree corresponding to the current strategy.
When building a decision tree, the evaluation indexes corresponding to strategies of different priorities are needed. Optionally, there are mainly three kinds of evaluation index: the minimum guarantee index, the limited optimization index, and the maximum optimization index. Different application scenarios correspond to different evaluation indexes, which may be one or more of the above three. The minimum guarantee index is implemented by the pruning rule, and the limited optimization index and the maximum optimization index are implemented by the loss function.
In a specific implementation, the embodiment of the invention is mainly divided into three modules: 1. the strategy variable processing module; 2. the single-strategy decision tree generation module; 3. the composite strategy data processing model generation module. The composite strategy data processing model generation module reuses the single-strategy decision tree generation module according to the configured combination rules. The application scenario of the embodiment is anomaly detection in a risk-control scenario (detecting whether a user may exhibit a certain anomaly), but in principle the embodiment can be applied in any scenario that requires data recommendation, such as product recommendation or customer segmentation.
1. Strategy variable processing module
Strategy variable processing has two main purposes: (1) optimizing the variable structure to improve the running efficiency of subsequent models; (2) making full use of expert prior information to improve the stability and interpretability of the output strategy.
In an embodiment of the invention, a data structure called summary is first defined. The principles behind this data structure are: (1) to optimize the running speed of the model; (2) to provide implementation interfaces for different values (strategies).
After variable binning preprocessing, the number of repeated feature values increases, so distributed aggregation can be performed quickly to reduce space usage. Suppose the input variable features are X. X is first transformed according to the binning rule cut_rule; let Xnew be the duplicate-free feature set after transformation and map be the method that maps records to the aggregate. The new data group after aggregation is {Xnew, s, cut_rule, map}, where s is a statistic whose concrete form is determined by the subsequent value evaluation function. For example, the application scenario of the embodiment considers the share of a certain population, so s is the total number of people whose features equal Xnew. Because there are requirements on target population share, stability, and feature layering, corresponding attributes are set so that subsequent steps can call them directly, and the overall data structure is {Xnew, s, interested_var, cut_rule, map}, where interested_var is the set of variable names the user is interested in. In the application scenario of the embodiment, the summary data structure is implemented as {Xnew, sum, [target_flag, month], cut_rule, map}, where target_flag is the variable name for target population division and month is a date variable whose value represents the time at which the record was generated. The features used in the application scenario of the embodiment are shown in Table 1.
Table 1
Variable | Variable type | Iteration round | Statistical period
Whether at permanent residence | Grouping variable | First round | One day
Account aggregation | Grouping variable | First round | One week
Permanent residence | Grouping variable | First round | One month
Special list | Grouping variable | Second round | Three months
Maximum overdue duration in the last 3 months | Downstream variable | First round | Half a year
Whether first order | Downstream variable | First round | One year
First order type | Downstream variable | First round |
Spending amount | Downstream variable | First round |
Number of installments | Downstream variable | First round |
Current period | Downstream variable | First round |
Number of abnormal behaviors | Downstream variable | First round |
Credit limit | Downstream variable | First round |
Balance | Downstream variable | First round |
Application frequency | Downstream variable | First round |
Order frequency | Downstream variable | First round |
Order cancellation frequency | Downstream variable | First round |
Activity page view count | Downstream variable | Second round |
Activity page browsing duration | Downstream variable | Second round |
3C order count | Downstream variable | Second round |
Game product order count | Downstream variable | Second round |
Liquor product order count | Downstream variable | Second round |
Outdoor product order count | Downstream variable | Second round |
Here, the statistical period denotes the statistical time window of a derived variable. For example, the 3C order count (3C refers to three categories of electronic products: computers, communication devices, and consumer electronics) has a one-day 3C order count, a one-week 3C order count, a one-month 3C order count, and so on. The variable type determines whether the variable is a grouping variable used to build top-level nodes, and the iteration round indicates in which round of strategy generation the variable is used.
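The summary data structure {Xnew, s, interested_var, cut_rule, map} described before Table 1 can be illustrated with a small Python sketch. It is only one possible reading of the description: the function name `summarize`, the dictionary layout of the result, and the use of a simple head count as the statistic s are assumptions made for illustration.

```python
from collections import Counter


def summarize(rows, cut_rule, interested_var):
    """Aggregate raw rows into a summary-like structure (illustrative only).

    rows: list of dicts, one per record.
    cut_rule: dict mapping a feature name to a function that bins its raw value.
    interested_var: names of variables kept alongside the binned features,
        for example ["target_flag", "month"].
    """
    feature_names = sorted(cut_rule)
    s = Counter()
    for row in rows:
        # Binned, duplicate-free feature combination (one element of Xnew).
        x_new = tuple(cut_rule[name](row[name]) for name in feature_names)
        tags = tuple(row[name] for name in interested_var)
        # Statistic s: number of people sharing this combination.
        s[(x_new, tags)] += 1
    return {
        "features": feature_names,
        "s": dict(s),
        "interested_var": list(interested_var),
        "cut_rule": cut_rule,
        "map": "count",
    }
```

Because many raw records collapse onto the same binned combination, later steps can work on the much smaller set of aggregated groups instead of on individual records.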
2. Single-strategy decision tree generation module
First, all data in the training data set are by default assumed to be in the previously defined summary data structure. The classification of layering variables and the composite layer count are then defined. The composite layer count expresses whether the top-level grouping is expected to be a cross of multiple features. For example, with {province, age} as the classification, if the initial grouping should be as simple as possible, the composite layer count can be set to 1, so that only one of province and age is selected as the grouping variable; if the composite layer count is set to 2, the tree depth of the initial layering becomes 2, and a cross feature such as "Beijing and age greater than 18" may be selected for grouping. Note that a layering variable can still take part in subsequent node splitting after the layering is complete, which improves the effect of the layering.
After the layered structure is defined, the evaluation indexes also need to be determined. A strategy usually considers a combination of multiple values, and the requirements for different values differ, so it is better to output a strategy that excels on a few values than one that is mediocre on all of them. Following this idea, one can first determine what kind of strategy is not wanted, then ensure on that basis that some values meet basic requirements, and then maximize certain values as far as possible. The evaluation indexes are therefore broken down into three parts:
(1) Minimum guarantee index. This part represents the situations the user does not want to occur, or in other words the minimum the user will tolerate for a given matter;
(2) Limited optimization index. This guarantees a lower limit for some index. Unlike the minimum guarantee, it means that once the index reaches a certain threshold the user's needs are met, and from then on changes in other indexes matter more than this index;
(3) Maximum optimization index. This is the most important evaluation index, which is required to perform as well as possible at all times.
For example, when recommending goods or merchants, the conversion rate may be required to reach a certain level; on that basis the number of people covered should be as large as possible, while the operating cost of the target population must stay below a certain value.
The minimum guarantee index can be implemented by pruning leaf nodes. Suppose two minimum guarantee indexes are considered in the application scenario of the embodiment of the invention:
1) Stability. A strategy is expected to remain effective over the long term, and unstable strategies should be removed. In the embodiment of the invention, the mean month-over-month PSI (Population Stability Index) across different months is used to check the stability of the current leaf node. The specific calculation formula is as follows:
[Formula image not reproduced in the source text]
where i indicates whether the population is the target population, the two proportions in the formula are the share of class-i population in the current month and the share of class-i population in the previous month, and the final psi_score is the mean of the monthly distribution changes of the leaf node. A PSI above 0.01 usually indicates poor stability, but because there are only two classes here, the computation is more stable than in traditional PSI scenarios, so the psi_score threshold can be raised to 0.025; that is, when psi_score is higher than 0.025 the leaf node is not split further;
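2) Coverage. Splitting stops when the total number of people in the leaf node is less than 0.5% of the overall population, or when the target population in the leaf node is less than 5% of the overall target population. The specific calculation formulas are:
$$cover_{target} = \frac{n_{leaf\_target}}{n_{total\_target}}, \qquad cover_{total} = \frac{n_{leaf}}{n_{total}}$$
where n_leaf_target is the target population count of the current node, n_total_target is the overall target population count, n_leaf is the total number of people in the current node, and n_total is the overall total number of people.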
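The two minimum guarantee checks can be sketched in Python as follows. The PSI expression used here is the commonly used definition, the sum over classes of (p - q) * ln(p / q); it is an assumption, since the exact formula of the embodiment is not reproduced in the text. The function names and the `eps` smoothing constant are likewise hypothetical, while the thresholds 0.025, 0.5%, and 5% are the ones stated above.

```python
import math


def psi(current, previous, eps=1e-6):
    """Commonly used PSI between two class distributions (assumed definition)."""
    total = 0.0
    for p, q in zip(current, previous):
        p, q = max(p, eps), max(q, eps)  # avoid log(0) and division by zero
        total += (p - q) * math.log(p / q)
    return total


def psi_score(monthly_shares):
    """Mean month-over-month PSI.

    monthly_shares: one [target share, non-target share] pair per month,
    in chronological order; at least two months are required.
    """
    pairs = zip(monthly_shares[1:], monthly_shares[:-1])
    values = [psi(cur, prev) for cur, prev in pairs]
    return sum(values) / len(values)


def should_stop_splitting(monthly_shares, n_leaf_target, n_total_target,
                          n_leaf, n_total, psi_max=0.025,
                          min_cover_total=0.005, min_cover_target=0.05):
    """Return True if the leaf violates a minimum guarantee index."""
    cover_target = n_leaf_target / n_total_target
    cover_total = n_leaf / n_total
    unstable = psi_score(monthly_shares) > psi_max
    too_small = cover_total < min_cover_total or cover_target < min_cover_target
    return unstable or too_small
```

A leaf whose class shares are identical from month to month has a psi_score of zero and is pruned only if it is too small, whereas a leaf whose target share jumps from 10% to 30% between two months exceeds the 0.025 threshold and is not split further.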
The limited optimization and maximum optimization indexes are implemented by designing the loss function. The design draws on the F_β score, whose formula is:
$$F_\beta = \frac{(1 + b^2) \cdot pre \cdot cover}{b^2 \cdot pre + cover}$$
where cover is the recall, pre is the precision, and b is the importance of recall. Modifying the above F_β formula yields:
[Formula image not reproduced in the source text]
where M_score is the overall evaluation, target_i is the value of the i-th evaluation (required to be converted to the range 0-1), and b_i is the weakness degree of the i-th evaluation, which defaults to 1; the higher the value, the less important the evaluation. The function of limited optimization is implemented by borrowing the activation functions of deep learning. In general, users also want some free fluctuation after an index reaches its expected value, so tanh is finally adopted as a correction function applied to target_i.
Fig. 3 is a schematic diagram of the values of the correction function of one embodiment of the invention. Since tanh(x) is essentially saturated at around x = 1.5, and the gradient increase to the left of that point is pronounced, 1.5 is used as the mapping point corresponding to the target evaluation.
Assuming the expected value evaluation is a, the final correction formula is:
$$target\_s = \tanh\left(\frac{1.5 \cdot target}{a}\right)$$
An index whose corresponding a is less than 1 is a limited optimization index, and an index whose corresponding a is greater than or equal to 1 is a maximum optimization index. The final loss function for node splitting is:
[Formula image not reproduced in the source text]
In general, the next split is performed only when the score after splitting is greater than the score without splitting. In the application scenario of the embodiment of the invention, the limited optimization index is the target population coverage, with a = 0.05, and the maximum optimization index is the target population share. This completes the generation of a single-strategy decision tree.
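The saturating behavior of the tanh correction, and the standard F_β score on which the loss design draws, can be sketched in Python. The function names are hypothetical, and the sketch deliberately does not attempt to reproduce M_score or the final split loss of the embodiment, whose formulas are not available in the text.

```python
import math


def f_beta(pre, cover, b=1.0):
    """Standard F-beta score from precision `pre` and recall `cover`."""
    if pre == 0 and cover == 0:
        return 0.0
    return (1 + b * b) * pre * cover / (b * b * pre + cover)


def corrected_target(target, a):
    """tanh correction: the expected value `a` is mapped to the point 1.5."""
    return math.tanh(1.5 * target / a)


# With a = 0.05 (the coverage expectation used in the application scenario),
# raising coverage from 0 to 0.05 changes the corrected value far more than
# raising it further from 0.05 to 0.10.
gain_below = corrected_target(0.05, 0.05) - corrected_target(0.0, 0.05)
gain_above = corrected_target(0.10, 0.05) - corrected_target(0.05, 0.05)
```

Because the corrected value flattens once the index passes its expected value, further improvements in a limited optimization index contribute little, which lets the other indexes dominate the comparison between candidate splits.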
3. Composite strategy data processing model generation module
There are two common ways of combining strategies: the serial strategy and the parallel strategy. In the serial strategy, the target data output by the decision tree of the higher-priority strategy is used as the input of the decision tree of the adjacent lower-priority strategy; the value evaluation targets and grouping features are then redefined and a second round of iteration is carried out. In the parallel strategy, the data other than the target data output by the decision tree of the higher-priority strategy is used as the input of the decision tree of the adjacent lower-priority strategy; the value evaluation targets and grouping features are then redefined and a second round of iteration is carried out.
According to an embodiment of the invention, the strategy combination mode selected for its application scenario is, for example, the parallel strategy. Taking data recommendation according to two strategies (a first-layer strategy and a sub-strategy) as an example, assume the data processing model parameters are set as follows:
1. Overall: serial strategy;
2. Variable initialization: see Table 1; the composite layer count is 1 for the first round and 1 for the second round;
3. First-round iteration:
Minimum guarantee index: stability (psi_score >= 0.05), coverage (cover_target > 0.01 and cover_total > 0.005);
Limited optimization index: coverage (a = 0.05);
Maximum optimization index: target population share;
4. Second-round iteration:
Minimum guarantee index: stability (psi_score >= 0.05), coverage (cover_target > 0.005 and cover_total > 0.0005);
Limited optimization index: coverage (a = 0.02);
Maximum optimization index: target population share;
5. Running result of the data processing model:
For the first-layer strategy, the target population share is 0.067 and the target population coverage is 0.052, a lift of 2.6 times over the share in the overall population; for the sub-strategy, the target population share is 0.11 and the target population coverage is 0.01, a lift of 4.3 times.
It can thus be seen that data recommendation through multiple strategies achieves a clearly better and more scientific result.
Fig. 4 is a schematic diagram of the main modules of a data processing apparatus according to an embodiment of the present invention. As shown in Fig. 4, the data processing apparatus 400 of the embodiment mainly includes a data determining module 401 and a model construction module 402.
The data determining module 401 is configured to determine a training data set for training a data processing model, where the data processing model is used to process data according to multiple strategies having a priority order;
The model construction module 402 is configured to generate the decision tree corresponding to each strategy in turn, according to the order of strategy priority from high to low and a preset strategy combination mode, and to use the decision tree corresponding to the lowest-priority strategy as the data processing model.
According to an embodiment of the invention, if the strategy is the highest-priority strategy, the decision tree of the strategy is obtained by training on the training data set; if the strategy is not the highest-priority strategy, the decision tree of the strategy is obtained by training on a first training data set, where the first training data set is determined from the target data output by the decision tree of the higher-priority strategy, and the higher-priority strategy is the strategy whose priority is higher than and adjacent to the priority of the strategy.
According to one embodiment of the present invention, the decision tree corresponding to a strategy is generated as follows: the training data set or the first training data set is trained, a pruning rule and a loss function are determined according to preset evaluation indexes, and the splitting mode of the decision tree is then determined according to the pruning rule and the loss function to generate the decision tree corresponding to the strategy.
According to one embodiment of the present invention, the data determining module 401 may be further configured to:
Transform the input training data X according to a preset binning rule to obtain a duplicate-free feature set Xnew;
Aggregate the duplicate-free feature set Xnew to obtain a data group;
Use the data groups corresponding to all the training data as the training data set.
According to another embodiment of the invention, the data processing apparatus 400 may further include a variable classification module (not shown in the figure), configured to:
Before the training data set is trained, classify the variables contained in the training data set to obtain grouping variables for generating the top-level nodes of the decision tree of the first strategy and downstream variables for generating downstream nodes.
According to yet another embodiment of the invention, when the strategy combination mode is the serial strategy, the model construction module 402 may be further configured to:
In order of strategy priority from high to low, successively use the target data output by the decision tree of the first strategy as the input data of the decision tree corresponding to the second strategy to generate the decision tree of the second strategy, where the second strategy is the strategy whose priority is lower than and adjacent to the priority of the first strategy.
According to yet another embodiment of the invention, when the strategy combination mode is the parallel strategy, the model construction module 402 may be further configured to:
In order of strategy priority from high to low, successively use the data in the training data set other than the target data output by the decision tree of the first strategy as the input data of the decision tree corresponding to the second strategy to generate the decision tree of the second strategy, where the second strategy is the strategy whose priority is lower than and adjacent to the priority of the first strategy.
According to yet another embodiment of the present invention, the evaluation indices of the decision tree include at least one of a minimum-guarantee index, a limited-optimization index, and a maximum-optimization index; and
the minimum-guarantee index is implemented by a pruning rule;
the limited-optimization index and the maximum-optimization index are implemented by a loss function.
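One way to read the three kinds of index is sketched below. The interpretation is an assumption: a minimum-guarantee index is treated as a hard floor checked by the pruning rule, a limited-optimization index is rewarded only up to a cap, and a maximum-optimization index is rewarded without a cap. The metric names, thresholds, and the additive form of the loss are all hypothetical.

```python
def passes_pruning(node_metrics, minimums):
    """Pruning rule: a node is kept only if every minimum-guarantee index holds."""
    return all(node_metrics[name] >= floor for name, floor in minimums.items())


def composite_loss(node_metrics, limited, maximized):
    """Loss combining limited- and maximum-optimization indices (lower is better)."""
    loss = 0.0
    for name, cap in limited.items():
        # Improvement beyond the cap earns no further reward.
        loss -= min(node_metrics[name], cap)
    for name in maximized:
        # Rewarded without an upper bound.
        loss -= node_metrics[name]
    return loss


metrics = {"coverage": 0.30, "stability": 0.90, "accuracy": 0.80}
keep = passes_pruning(metrics, minimums={"coverage": 0.05})
loss = composite_loss(metrics, limited={"stability": 0.85}, maximized=["accuracy"])
```

Separating the hard floor from the loss keeps a candidate node from trading away a guaranteed property for a gain in an optimized one.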
According to the technical solution of the embodiments of the present invention, a training dataset for training a data processing model is determined; then, according to the descending order of strategy priority and a preset combination mode of the strategies, the decision tree corresponding to each strategy is generated in turn, and the decision tree corresponding to the lowest-priority strategy is used as the data processing model. Data can thus be processed according to multiple strategies that have a priority order. An analysis method based on decomposing and combining target values, together with the corresponding loss functions, addresses the problem that a traditional decision tree pursues only a single value target, so that the data is processed in a more reasonable and scientific way. In addition, the present invention designs a new data structure, with stability and operating efficiency in mind, to speed up strategy iteration, and uses the layering of the decision tree to make full use of expert prior knowledge and to ensure the stability and interpretability of the decision tree.
Fig. 5 shows an exemplary system architecture 500 to which the data processing method or the data processing device of the embodiments of the present invention can be applied.
As shown in Fig. 5, the system architecture 500 may include terminal devices 501, 502, 503, a network 504, and a server 505. The network 504 is the medium that provides communication links between the terminal devices 501, 502, 503 and the server 505. The network 504 may include various connection types, such as wired or wireless communication links or fiber-optic cables.
A user may use the terminal devices 501, 502, 503 to interact with the server 505 over the network 504 in order to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 501, 502, 503, such as shopping applications, web browsers, search applications, instant messaging tools, email clients, and social platform software (examples only).
The terminal devices 501, 502, 503 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
The server 505 may be a server that provides various services, for example a back-end management server (example only) that supports the shopping websites browsed by users on the terminal devices 501, 502, 503. The back-end management server may analyze and otherwise process received data such as information query requests, and feed the processing results (for example, targeted push information or product information; examples only) back to the terminal devices.
It should be noted that the data processing method provided by the embodiments of the present invention is generally executed by the server 505; accordingly, the data processing device is generally provided in the server 505.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 5 are merely illustrative. There may be any number of terminal devices, networks, and servers, as the implementation requires.
Referring now to Fig. 6, it shows a structural schematic diagram of a computer system 600 suitable for implementing a terminal device or server of the embodiments of the present invention. The terminal device or server shown in Fig. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores the various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another by a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a cathode-ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to the embodiments disclosed by the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, the embodiments disclosed by the present invention include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above functions defined in the system of the present invention are performed.
It should be noted that the computer-readable medium described in the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program can be used by or in connection with an instruction execution system, apparatus, or device. Also in the present invention, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should further be noted that each block in the block diagrams or flowcharts, and combinations of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units or modules described in the embodiments of the present invention may be implemented in software or in hardware. The described units or modules may also be provided in a processor; for example, this may be described as: a processor including a data determining module and a model construction module. The names of these units or modules do not, in some cases, limit the units or modules themselves; for example, the data determining module may also be described as "a module for determining a training dataset for training a data processing model".
In another aspect, the present invention also provides a computer-readable medium. The computer-readable medium may be included in the device described in the above embodiments, or it may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: determine a training dataset for training a data processing model, the data processing model being used to process data according to multiple strategies that have a priority order; and, according to the descending order of strategy priority and a preset combination mode of the strategies, generate the decision tree corresponding to each strategy in turn and use the decision tree corresponding to the lowest-priority strategy as the data processing model.
The above specific embodiments do not limit the scope of protection of the present invention. Those skilled in the art should understand that, depending on design requirements and other factors, various modifications, combinations, sub-combinations, and substitutions may be made. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.