CN108182597A

CN108182597A - A click-through rate prediction method based on decision trees and logistic regression

Info

Publication number: CN108182597A
Application number: CN201711439302.9A
Authority: CN
Inventors: 彭文元; 周小强; 申晓宏
Original assignee: Yc (shanghai) Information Technology Co Ltd
Current assignee: Yc (shanghai) Information Technology Co Ltd
Priority date: 2017-12-27
Filing date: 2017-12-27
Publication date: 2018-06-19

Abstract

The invention discloses a click-through rate (CTR) prediction method based on decision trees and logistic regression, comprising the following steps: obtaining relevant feature data of impression information; establishing a CTR prediction model based on a cascade of decision trees and a probabilistic sparse linear classifier; generating real-time training data with an online joiner; and training the CTR prediction model on the real-time training data to obtain an up-to-date model with which to predict the click-through rate. The invention proposes a model architecture that cascades decision trees with a probabilistic sparse linear classifier and that also includes an online learning layer. It further discloses the online joiner, a critical component of the online learning layer that converts training data into a real-time data stream. Compared with existing click-through rate prediction methods, the method of the invention improves performance by at least 10%.

Description

A click-through rate prediction method based on decision trees and logistic regression

Technical field

The present invention relates to the field of computer technology, and in particular to a click-through rate prediction method based on decision trees and logistic regression.

Background

Digital advertising is a multi-million-dollar industry that continues to grow every year. Most online advertising platforms allocate advertisements dynamically, adjusting to user feedback so as to show each user the advertisements he or she is interested in. Machine learning plays an important role in deciding which advertisement to show to a user, and this recommendation-like approach also improves the efficiency of ad delivery.

In 2007, work by Varian and by Edelman et al. described a pay-per-click bidding model whose effectiveness depends on how accurately clicks are predicted. The volume of data generated during bidding is usually very large, and new features or elements are added frequently, so a prediction system needs good adaptability and the capacity to process large amounts of data.

In a search advertising system, the user's query serves as the basis for selecting candidate advertisements. In an ad delivery system, however, the user does not actively enter anything, so when an advertisement is to be shown, a large number of advertisements may match the user's targeting conditions, such as geographic location, interest attributes and identity information. To choose the most suitable advertisement from these candidates, machine learning is used to predict the click-through rate (CTR) of each advertisement, and the advertisement with the highest predicted CTR is shown to the user.

Summary of the invention

In view of the above deficiencies, the present invention provides a click-through rate prediction method based on decision trees and logistic regression. It proposes a prediction model that combines decision trees with logistic regression and thereby improves prediction performance.

To achieve the above object, the embodiments of the present invention adopt the following technical solution:

A click-through rate prediction method based on decision trees and logistic regression, comprising the following steps:

obtaining relevant feature data of impression information;

establishing a click-through rate prediction model based on a cascade of decision trees and a probabilistic sparse linear classifier;

generating real-time training data with an online joiner;

training the click-through rate prediction model on the real-time training data to obtain an up-to-date model, and using that model to predict the click-through rate.

According to one aspect of the present invention, the online joiner works as follows: it attaches labels to the data so that the incoming data can be trained on in an online manner. Impressions and clicks of the impression information are joined by request ID: each use by a user generates a unique request ID, and impressions are matched to clicks through this ID.

According to one aspect of the present invention, generating real-time training data with the online joiner comprises the following steps:

a user visits a website or app, and the user's relevant information is sent to the system;

the system ranks the candidates and returns the relevant impression information to the user's device;

the data generated by the above process is recorded in an impression data stream;

when the user clicks the impression information he or she sees, the click is recorded in a click data stream;

after the time window has elapsed, the online joiner sends the joined impression data to the training data set.

According to one aspect of the present invention, an anomaly detection mechanism needs to be established for the process of generating real-time training data with the online joiner.

According to one aspect of the present invention, the linear classifier is trained by an online learning method.

According to one aspect of the present invention, features are transformed using boosted decision trees.

According to one aspect of the present invention, in the boosted decision trees each individual tree is treated as a categorical feature whose value is the index of the leaf an example falls into.

According to one aspect of the present invention, the boosted decision trees are trained on the data in batch mode.

According to one aspect of the present invention, an importance weight is assigned to each feature: when each tree node is built, the best feature is selected and split on, and because a feature may be used in several trees, the importance of each feature is obtained by summing its loss reduction over all trees.

According to one aspect of the present invention, the click-through rate prediction method based on decision trees and logistic regression includes using sampling methods to handle large amounts of training data.

Advantages of implementing the present invention: the click-through rate prediction method based on decision trees and logistic regression comprises the following steps: obtaining relevant feature data of impression information; establishing a click-through rate prediction model based on a cascade of decision trees and a probabilistic sparse linear classifier; generating real-time training data with an online joiner; and training the prediction model on the real-time training data to obtain an up-to-date model with which to predict the click-through rate. The invention proposes a model architecture that cascades decision trees with a probabilistic sparse linear classifier and that also includes an online learning layer. It further discloses the online joiner, a critical component of the online learning layer that converts training data into a real-time data stream. Compared with existing click-through rate prediction methods, the method of the invention improves performance by at least 10%.

Description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic diagram of the click-through rate prediction method based on decision trees and logistic regression according to the present invention;

Fig. 2 is a schematic diagram of the training data freshness test results according to the present invention;

Fig. 3 is a schematic diagram of the test results of training the model with different learning rate schemes according to the present invention;

Fig. 4 is a schematic diagram of the influence of different numbers of features on the result according to the present invention;

Fig. 5 is a schematic diagram of the uniform sampling training results according to the present invention;

Fig. 6 is a schematic diagram of the negative down-sampling training results according to the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

In this embodiment, we use normalized entropy (NE) and calibration as our main evaluation metrics. The numerator of NE is the cross entropy, which is also the cost function of logistic regression (LR); here y is the sample label, taking the value +1 or −1, and p_i is the predicted click probability of sample i. The denominator is the entropy of the original samples, where p is the probability (that is, the frequency) of positive samples; it measures the uncertainty of the original samples. Suppose a given training data set contains N examples, each with a label y_i ∈ {−1, +1} and a predicted click-through rate p_i, where i = 1, 2, …, N, and the average empirical CTR is p. Then NE can be expressed as

$$NE = \frac{-\frac{1}{N}\sum_{i=1}^{N}\left(\frac{1+y_i}{2}\log p_i + \frac{1-y_i}{2}\log(1-p_i)\right)}{-\left(p\log p + (1-p)\log(1-p)\right)}$$
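To make the metric concrete, the following minimal Python sketch computes NE from a list of labels and predicted probabilities. The function name `normalized_entropy` and the toy data are illustrative assumptions, not part of the source; the sketch assumes the data contains both clicked and non-clicked examples, so that the denominator is non-zero.

```python
import math


def normalized_entropy(labels, preds):
    """Normalized entropy: average log loss divided by the entropy of the base CTR.

    labels: iterable of +1 (click) / -1 (no click)
    preds:  predicted click probabilities, each strictly between 0 and 1
    """
    n = len(labels)
    # Numerator: average cross entropy (log loss) of the predictions.
    log_loss = -sum(
        (1 + y) / 2 * math.log(p) + (1 - y) / 2 * math.log(1 - p)
        for y, p in zip(labels, preds)
    ) / n
    # Denominator: entropy of the empirical CTR (assumed to be in (0, 1)).
    ctr = sum(1 for y in labels if y == 1) / n
    base_entropy = -(ctr * math.log(ctr) + (1 - ctr) * math.log(1 - ctr))
    return log_loss / base_entropy


labels = [1, -1, -1, -1]
# A "model" that always predicts the average CTR carries no information: NE is 1.
baseline = normalized_entropy(labels, [0.25] * 4)
# Predictions that separate clicks from non-clicks give NE below 1.
better = normalized_entropy(labels, [0.9, 0.1, 0.1, 0.1])
```

A predictor that always outputs the average CTR scores NE = 1, so values below 1 indicate that the model removes some of the uncertainty.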

NE is a basic component in calculating the relative information gain (RIG), with RIG = 1 − NE; RIG measures how much of the uncertainty of the samples the model eliminates. Without a model, the uncertainty about whether a sample is positive or negative is large, and it is hard to decide which it is. With a model, we obtain a predicted click-through rate that makes this judgment easier, so the uncertainty decreases.

As shown in Fig. 1, a click-through rate prediction method based on decision trees and logistic regression comprises the following steps:

Step S1: Obtain relevant feature data of impression information.

A specific implementation of step S1 may be as follows: a decision tree model first screens and combines the features formed from the advertisement data and the user data, generating highly discriminative and representative classification features, that is, cross features. On the one hand, this greatly reduces the dimension of the feature vector, speeds up convergence of the machine learning process and improves prediction efficiency. On the other hand, because more discriminative features are used to predict the ad click-through rate, a more accurate prediction is obtained.

Obtain the relevant feature information of specific historically delivered advertisements within a predetermined historical period. A historically delivered advertisement is one that has already been delivered within the predetermined historical period and shown in a user interface in some form, such as a search engine result list, a message-bar notification of an application, or a dialog of an application. The predetermined historical period is a period, within a preset time, during which the specific historical advertisement was not updated. The specific historically delivered advertisement refers to the currently delivered advertisement whose click-through rate is to be predicted. The relevant feature information of a historically delivered advertisement includes, but is not limited to, any one or more of the following: the industry of the advertisement, advertisement size, advertisement text, advertisement image, historical impression count, historical click count, and the position-normalized click-through rate.

The industry feature is obtained from the information registered when the advertisement is delivered, or from keywords extracted from information such as its description. The advertisement size is the size at which it is displayed. The advertisement text is taken directly from the published information. The advertisement image feature is a descriptor of its image characteristics, such as a feature vector extracted by a suitable image feature extraction algorithm. The historical impression count is the number of times the advertisement was shown to users in the specific historical period. The historical click count is the number of user clicks after the advertisement was shown. The position-normalized click-through rate refers to the user clicks received after the display position of the advertisement has been evaluated by a given algorithm and the optimal position has been selected for display.

Obtain the personalized feature information of the target user, that is, feature information related to the target user that characterizes his or her attributes. In a specific embodiment, the personalized feature information of the target user includes, but is not limited to, any one or more of the following:

gender, province, occupation, income, school, age, education, blood type, zodiac sign, network access mode, time spent online, preferences, marital status.

Step S2: Establish a click-through rate prediction model based on a cascade of decision trees and a probabilistic sparse linear classifier.

A specific implementation of step S2 may be as follows: a model structure is proposed in which boosted decision trees are cascaded with a probabilistic sparse linear classifier.

In practical applications, the online learning model used in this embodiment is based on the stochastic gradient descent (SGD) algorithm. After feature transformation, an advertising creative is represented by a structured vector $x = (e_{i_1}, \ldots, e_{i_n})$, where $e_i$ is the i-th unit vector and $i_1, \ldots, i_n$ are the values of the n categorical input features. In the training stage, we use a binary label y ∈ {+1, −1} to indicate a click or no click. Given a labeled advertising creative (x, y), the linear combination of the active weights can be expressed as:

$$s(y, x, w) = y \cdot w^{T} x = y \sum_{j=1}^{n} w_{j, i_j}$$

where w is the weight vector of the linear click score.

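In Bayesian online learning, the two key ingredients, the likelihood and the prior, are expressed respectively as:

$$p(y \mid x, w) = \Phi\!\left(\frac{s(y, x, w)}{\beta}\right)$$

and

$$p(w) = \prod_{k=1}^{N} N(w_k; \mu_k, \sigma_k^{2})$$

where Φ(t) is the cumulative distribution function of the standard normal distribution and N(t) is its density function. Online training is achieved through expectation propagation with moment matching. The resulting model consists of the mean and the variance of the approximate posterior distribution of the weight vector w, so the formulas above can be turned into the following update:

$$\mu_{i_j} \leftarrow \mu_{i_j} + y \cdot \frac{\sigma_{i_j}^{2}}{\Sigma} \cdot v\!\left(\frac{s(y, x, \mu)}{\Sigma}\right)$$

$$\sigma_{i_j}^{2} \leftarrow \sigma_{i_j}^{2} \cdot \left[1 - \frac{\sigma_{i_j}^{2}}{\Sigma^{2}} \cdot w\!\left(\frac{s(y, x, \mu)}{\Sigma}\right)\right]$$

$$\Sigma^{2} = \beta^{2} + \sum_{j=1}^{n} \sigma_{i_j}^{2}$$

where v(t) := N(t)/Φ(t) and w(t) := v(t)·[v(t) + t].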
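As a hedged sketch, the moment-matching update built from v(t) and w(t) can be written in a few lines of Python. The update rule used here is the standard one for Bayesian probit regression with a factorized Gaussian posterior, which is assumed to be what the text refers to; the names `bopr_update`, `mu`, `var`, `active` and the default `beta=1.0` are illustrative and not taken from the source.

```python
import math


def normal_pdf(t):
    """Density N(t) of the standard normal distribution."""
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)


def normal_cdf(t):
    """Cumulative distribution function Phi(t) of the standard normal."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def v(t):
    return normal_pdf(t) / normal_cdf(t)


def w(t):
    return v(t) * (v(t) + t)


def bopr_update(mu, var, active, y, beta=1.0):
    """Update posterior means/variances of the weights of the active features.

    mu, var: dicts mapping weight id -> posterior mean / variance
    active:  ids of the non-zero (one-hot) features of this example
    y:       +1 for a click, -1 for no click
    """
    total_var = beta ** 2 + sum(var[k] for k in active)
    sigma = math.sqrt(total_var)
    t = y * sum(mu[k] for k in active) / sigma
    for k in active:
        new_mu = mu[k] + y * var[k] / sigma * v(t)
        var[k] = var[k] * (1.0 - var[k] / total_var * w(t))
        mu[k] = new_mu


mu = {"age_bin_2": 0.0, "industry_7": 0.0}
var = {"age_bin_2": 1.0, "industry_7": 1.0}
bopr_update(mu, var, ["age_bin_2", "industry_7"], y=+1)
```

After one clicked example the means of the active weights move up and their variances shrink, which is the behaviour the mean and variance representation is meant to capture.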

In the SGD algorithm, by contrast, the likelihood function is:

$$p(y \mid x, w) = \mathrm{sigmoid}(s(y, x, w))$$

where sigmoid(t) = exp(t)/(1 + exp(t)). This model is what we usually call logistic regression (LR). Inference computes the derivative of the log-likelihood and takes a step, of a size chosen per coordinate, in the direction of this gradient:

$$w_{i_j} \leftarrow w_{i_j} + y \cdot \eta_{i_j} \cdot g(s(y, x, w))$$

where g is the log-likelihood gradient for all non-zero features. Differentiating log sigmoid(s) with respect to s gives

$$g(s) := 1 - \mathrm{sigmoid}(s)$$
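The following Python sketch illustrates one SGD step for logistic regression on a sparse one-hot input. It is derived directly from the likelihood sigmoid(y·wᵀx) and is only an illustration; the names `sgd_step`, `weights`, `active` and `lr` are assumptions, and a single scalar learning rate stands in for the per-coordinate rate.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def sgd_step(weights, active, y, lr):
    """One ascent step on log sigmoid(y * w.x) for a binary sparse input x.

    weights: dict mapping feature id -> weight (missing ids count as 0.0)
    active:  ids of the features that are 1 in this example
    y:       +1 for a click, -1 for no click
    lr:      learning rate (a single value here; it may differ per feature)
    """
    score = y * sum(weights.get(k, 0.0) for k in active)
    # d/d(w.x) of log sigmoid(y * w.x) equals y * (1 - sigmoid(y * w.x)).
    step = lr * y * (1.0 - sigmoid(score))
    for k in active:
        weights[k] = weights.get(k, 0.0) + step
    return weights


clicked = sgd_step({}, ["tree0_leaf1", "tree1_leaf0"], y=+1, lr=0.1)
skipped = sgd_step({}, ["tree0_leaf1", "tree1_leaf0"], y=-1, lr=0.1)
```

Starting from zero weights, a click pushes the active weights up by lr/2 and a non-click pushes them down by the same amount, because sigmoid(0) = 0.5.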

Specifically, the process of generating the decision tree model is summarized as follows. Let the set of data samples be S. First, an attribute is selected according to some policy, for example the user's age, and the samples are divided on that attribute: with age 30 as the boundary, samples older than 30 go into one set and samples younger than 30 into another. Each personalized feature of the user serves as an attribute (gender, province, occupation, income, school, age, education, blood type, zodiac sign, network access mode, time spent online, preferences, marital status and so on) and is split on a quantized value. Likewise, each relevant feature of the specific historically delivered advertisement serves as an attribute (industry, advertisement size, advertisement text, advertisement image, historical impression count, historical click count, position-normalized click-through rate and so on) and is split further on its corresponding quantized value, until no further split is possible. This produces the leaf nodes of the decision tree, each of which represents one cross feature.

In practical applications, there are two simple ways to transform the input features of a linear classifier in order to improve its accuracy. For continuous features, a simple way to learn a non-linear transformation is to bin the feature and treat the bin index as a categorical feature. The linear classifier then effectively learns a piecewise-constant non-linear mapping of the feature. Learning useful bin boundaries is important, and there are many methods for doing so. The second simple and effective transformation is to build tuple input features. For categorical features, the brute-force approach is to take the Cartesian product, but its drawback is that useless combinations cannot be adjusted. If the input features are all continuous, they can be binned jointly, for example with a k-d tree.
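As a small illustration of the first transformation, the sketch below bins a continuous feature and one-hot encodes the bin index so that a linear classifier can learn one weight per bin. The boundaries in `AGE_BOUNDARIES` and the function names are hypothetical; the source does not specify how boundaries are chosen.

```python
import bisect

# Hypothetical bin boundaries for an "age" feature, sorted in ascending order.
AGE_BOUNDARIES = [18, 30, 45, 60]


def bin_index(value, boundaries):
    """Return the index of the bin that contains value.

    Bin 0 is (-inf, b0), bin k is [b_{k-1}, b_k), the last bin is [b_last, +inf).
    """
    return bisect.bisect_right(boundaries, value)


def one_hot_bin(value, boundaries):
    """Encode the bin index as a one-hot vector (one slot per bin)."""
    vec = [0] * (len(boundaries) + 1)
    vec[bin_index(value, boundaries)] = 1
    return vec


encoded = one_hot_bin(25, AGE_BOUNDARIES)
```

With one weight per slot, the linear score becomes a step function of age, which is the piecewise-constant mapping described above.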

Boosted decision trees are a powerful and very convenient way to implement the non-linear and tuple transformations just described. We treat each individual tree as a categorical feature whose value is the index of the leaf an example ends up in. For example, assume a boosted tree model has 2 subtrees, one with 3 leaf nodes and the other with 2 leaf nodes. If an example ends up in the 2nd leaf node of subtree 1 and the 1st leaf node of subtree 2, we can use the binary vector [0, 1, 0, 1, 0] as the input to the linear classifier, where the first 3 entries correspond to the leaf nodes of subtree 1 and the last 2 entries to those of subtree 2.

The boosted decision trees we use follow the gradient boosting machine (GBM), with the classical L2-TreeBoost algorithm. In each learning iteration, a new tree is created to model the residual of the previous trees. The tree-based transformation is a supervised feature encoding that converts a real-valued vector into a compact binary-valued vector. A traversal from the root node to a leaf node is in effect a rule on certain features, so fitting a linear classifier on the binary vector essentially learns weights for this set of rules. Unlike the linear classifier, the boosted decision trees are trained in batch mode, which significantly saves training time.
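The leaf-index encoding in the example above can be sketched as follows. The function name `encode_leaves` and the zero-based leaf indices are illustrative choices; the trees themselves are assumed to have been trained already and to report which leaf an example reaches.

```python
def encode_leaves(leaf_indices, leaves_per_tree):
    """Concatenate one one-hot block per tree.

    leaf_indices:    zero-based index of the leaf reached in each tree
    leaves_per_tree: number of leaves of each tree, in the same order
    """
    vec = []
    for leaf, n_leaves in zip(leaf_indices, leaves_per_tree):
        block = [0] * n_leaves
        block[leaf] = 1
        vec.extend(block)
    return vec


# 2nd leaf of subtree 1 (index 1) and 1st leaf of subtree 2 (index 0).
features = encode_leaves([1, 0], [3, 2])
```

The resulting vector has exactly one non-zero entry per tree, so its length equals the total number of leaves and its sparsity makes it a natural input for a sparse linear classifier.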

In practical applications, we conducted experiments to show the effect of using tree features as the input to the linear model. We compared two logistic regression models, one with the tree feature transformation and one using the original features directly, and we also compared them with boosted decision trees used alone. The results are shown in the following table:

Model	NE value
Logistic regression + boosted decision trees	96.58%
Logistic regression only	99.43%
Boosted decision trees only	100%

As the table shows, the NE value drops by nearly 3 percentage points once the feature transformation is used, which is a clear improvement. The table also shows that combining logistic regression with decision trees yields a larger gain than either model alone.

In practical applications, to keep the data as fresh as possible, we train the linear classifier by online learning.

We evaluate the effect of different learning rate schemes on SGD-based logistic regression. To this end, we consider the following schemes:

1. Per-coordinate learning rate. The learning rate of feature i at iteration t can be expressed as

$$\eta_{t,i} = \frac{\alpha}{\beta + \sqrt{\sum_{j=1}^{t} \nabla_{j,i}^{2}}}$$

where α and β are tunable parameters and $\nabla_{j,i}$ denotes the gradient for feature i at iteration j.

2. Per-weight square root learning rate:

$$\eta_{t,i} = \frac{\alpha}{\sqrt{n_{t,i}}}$$

where $n_{t,i}$ is the total number of training examples containing feature i up to iteration t.

3. Per-weight learning rate:

$$\eta_{t,i} = \frac{\alpha}{n_{t,i}}$$

4. Global learning rate:

$$\eta_{t,i} = \frac{\alpha}{\sqrt{t}}$$

5. Constant learning rate:

$$\eta_{t,i} = \alpha$$

The first three schemes set a separate learning rate for each feature, whereas the last two use the same learning rate for all features. The tunable parameters are optimized by grid search; the optimal values are listed in the following table:

We trained the model with each of the learning rate schemes above; the experimental results are shown in Fig. 3. The 1st scheme achieves the best NE value and the 3rd performs worst, while the 2nd and 5th schemes give similar results. The poor performance of the 4th scheme is probably due mainly to the imbalance in the number of examples per feature: each training example contains different features, so some features appear in far more training examples than others. Under the 4th scheme, the learning rate of features with few examples drops too sharply, which prevents their weights from converging to the optimum. The 3rd scheme does not have this problem, but it still performs poorly because it decreases the learning rates of all features too quickly, so training may effectively stop while the model is still at a non-optimal point. This explains why this scheme performs worst.
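The five learning rate schemes can be sketched as small Python functions. The exact functional forms are assumptions (the common forms matching the scheme names: per-coordinate, per-weight square root, per-weight, global and constant); `alpha`, `beta`, `grad_sq_sum`, `n_ti` and `t` are illustrative parameter names.

```python
import math


def per_coordinate(alpha, beta, grad_sq_sum):
    """Scheme 1: alpha / (beta + sqrt(sum of squared gradients of feature i))."""
    return alpha / (beta + math.sqrt(grad_sq_sum))


def per_weight_sqrt(alpha, n_ti):
    """Scheme 2: alpha / sqrt(number of examples containing feature i so far)."""
    return alpha / math.sqrt(n_ti)


def per_weight(alpha, n_ti):
    """Scheme 3: alpha / number of examples containing feature i so far."""
    return alpha / n_ti


def global_rate(alpha, t):
    """Scheme 4: alpha / sqrt(iteration count), identical for every feature."""
    return alpha / math.sqrt(t)


def constant_rate(alpha):
    """Scheme 5: the same fixed rate for every feature and iteration."""
    return alpha


# A rare feature seen in only 100 examples after 1,000,000 iterations:
rare_global = global_rate(0.1, 1_000_000)   # already tiny under scheme 4
rare_per_weight_sqrt = per_weight_sqrt(0.1, 100)
```

The comparison at the end illustrates the imbalance argument: under the global scheme a rarely seen feature gets a rate 100 times smaller than under the per-weight square root scheme, even though its weight has barely been trained.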

Step S3: Generate real-time training data with the online joiner.

A click prediction system is usually deployed in a dynamic bidding environment, so the data distribution changes over time. We found that the freshness of the training data affects prediction performance to a large extent. To verify this, we trained a model on the data of one particular day and then applied it to the bidding of the following consecutive days. The test results are shown in Fig. 2, where the abscissa is the number of days between the test data and the training data and the ordinate is the NE value. The figure clearly shows that NE increases as the gap in days grows, so after a certain period (no more than 7 days) the model needs to be retrained on the latest data to stay optimal. We use a scheduled task for this purpose.

The time needed to train boosted decision trees depends on several factors, including the number of trees, the number of leaves per tree, the CPU and the memory. On a single CPU, training a boosted decision tree model can take more than 24 hours. In a production environment, such a model therefore needs to be trained in parallel on multi-core machines with sufficient memory.

Fresher training data improves prediction accuracy. It also suggests a simple model architecture in which the linear classifier layer is trained online.

In practical applications, this embodiment proposes an experimental system that generates real-time training data and uses it to train the linear classifier by online learning. We call this system the "online joiner", because its key operation is to join labels (click or no click) to the input data (advertising creatives) in an online manner. Click labels can be obtained in real time during delivery, but because of data and network latency we cannot know in real time that a user has not clicked an advertisement. To determine whether an advertising creative has been clicked, it must therefore be labeled within a certain time window; the question is how long this window should be.

If the window is too long, more memory is needed to cache the creative information while waiting for clicks to arrive; if it is too short, some legitimate clicks are lost. This raises the issue of "click coverage", the fraction of all clicks that are successfully joined to impressions. The online joiner must therefore strike a balance between timeliness and click coverage.

Incomplete click coverage means that the real-time training set is biased: the empirical CTR is lower than the true one, because a small portion of the examples labeled "not clicked" would have been labeled "clicked" had the waiting time been long enough. In practice, however, we found that by tuning the waiting window this bias is easy to reduce while keeping memory requirements manageable. In addition, this small bias can be measured and corrected.

The main job of the online joiner is to join ad impressions and ad clicks by request ID. Each bid a user triggers in the Yincheng ("Silver Orange") bidding system generates a unique request ID, so impressions and clicks can be matched through this ID. The basic flow of the online joiner is as follows: a user visits a website or app, and the user's relevant information is sent to the Yincheng bidding system; the bidding system ranks the candidates and returns the relevant advertisement to the user's device, and the data generated in this process is recorded in the impression data stream; when the user clicks the advertisement he or she sees, the click is recorded in the click data stream; after the time window has elapsed, the online joiner sends the joined impression data (labeled as clicked or not clicked) to the training data set.

In this way the trainer can continuously produce the latest model. The final machine learning system forms a tight closed loop in which changes in the feature distribution or in model performance can be captured, learned and corrected within a short time.
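The join of the impression stream and the click stream by request ID can be sketched as below. This is a minimal in-memory illustration, not the system described in the source: the class name `OnlineJoiner`, its method names and the use of plain numeric timestamps are assumptions, and impressions are assumed to arrive in time order with unique request IDs.

```python
from collections import OrderedDict


class OnlineJoiner:
    """Joins impressions with clicks by request ID within a waiting window."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        # request_id -> [impression time, features, clicked flag], oldest first
        self.pending = OrderedDict()

    def on_impression(self, request_id, features, now):
        self.pending[request_id] = [now, features, False]

    def on_click(self, request_id):
        entry = self.pending.get(request_id)
        if entry is not None:
            entry[2] = True
        # A click that arrives after its impression was flushed is lost,
        # which is what reduces click coverage when the window is too short.

    def flush(self, now):
        """Emit (features, label) for impressions whose window has elapsed."""
        out = []
        while self.pending:
            request_id, (ts, features, clicked) = next(iter(self.pending.items()))
            if now - ts < self.window:
                break  # remaining impressions are newer
            del self.pending[request_id]
            out.append((features, 1 if clicked else -1))
        return out


joiner = OnlineJoiner(window_seconds=10)
joiner.on_impression("r1", {"ad": 1}, now=0)
joiner.on_impression("r2", {"ad": 2}, now=5)
joiner.on_click("r1")
too_early = joiner.flush(now=9)
first_batch = joiner.flush(now=10)
second_batch = joiner.flush(now=20)
joiner.on_click("r2")  # arrives after the window: ignored
```

The `pending` buffer is what ties memory use to the window length: the longer the window, the more impressions are held while waiting for possible clicks.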

When using a system that generates real-time training data, an important consideration is to establish protection mechanisms against anomalies that could corrupt the online learning system. For example, if for some reason the click data stream contains only stale data, the online joiner will produce very few clicked training examples. The model produced by the real-time trainer will then predict very low click-through rates, which in turn reduces the number of advertisements the bidding system shows. An anomaly detection mechanism helps avoid this kind of problem: for example, when the distribution of the real-time training data changes suddenly, online training can be disconnected from the online joiner automatically.
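One simple form such a safeguard could take is a check on the empirical CTR of each recent batch of joined data. The sketch below is only an illustration under stated assumptions: the function name `should_pause_training`, the baseline CTR and the relative tolerance are hypothetical, and the source does not say which statistic its anomaly detection monitors.

```python
def should_pause_training(recent_labels, baseline_ctr, tolerance=0.5):
    """Return True when the batch looks anomalous and online training should stop.

    recent_labels: labels (+1 click, -1 no click) of the latest joined batch
    baseline_ctr:  CTR expected under normal conditions (assumed to be known)
    tolerance:     allowed relative deviation from the baseline
    """
    if not recent_labels:
        return True  # no data at all is itself an anomaly
    ctr = sum(1 for y in recent_labels if y == 1) / len(recent_labels)
    return abs(ctr - baseline_ctr) > tolerance * baseline_ctr


normal_batch = [1] * 2 + [-1] * 98   # empirical CTR 2%
stale_clicks = [-1] * 100            # click stream stalled: no clicks joined
pause_normal = should_pause_training(normal_batch, baseline_ctr=0.02)
pause_stale = should_pause_training(stale_clicks, baseline_ctr=0.02)
```

When the check fires, the trainer would stop consuming the joiner's output until the streams recover, so a stalled click stream cannot drag the predicted click-through rates toward zero.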

Step S4: Train the click-through rate prediction model on the real-time training data to obtain an up-to-date model, and use it to predict the click-through rate.

In practical applications, the more trees the model contains, the longer prediction takes. In this section we study the effect of the number of trees on prediction accuracy. We increased the number of trees from 1 to 2000, training on one full day of data and testing on the data of the following day. The test shows that NE drops markedly as the number of trees grows to 500 and then stays essentially unchanged. More trees are therefore not always better; the fit usually saturates at some point during training.

The number of features is another factor that affects prediction accuracy and computational performance. To better understand its influence, we assign an importance weight to each feature. When each tree node is built, the best feature is selected and split on so as to reduce the squared error as much as possible. Because a feature may be used in several trees, the importance of each feature is computed by summing its loss reduction over all trees.
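The importance computation described above amounts to summing, per feature, the loss reduction of every split that uses it. A minimal sketch follows; the representation of a tree as a list of `(feature, loss_reduction)` split records and all feature names and numbers are hypothetical.

```python
from collections import defaultdict


def feature_importance(trees):
    """Sum the squared-error reduction of each feature over all splits of all trees.

    trees: list of trees, each given as a list of (feature_name, loss_reduction)
           records, one record per internal node.
    Returns (feature, total_reduction) pairs, most important first.
    """
    totals = defaultdict(float)
    for tree in trees:
        for feature, loss_reduction in tree:
            totals[feature] += loss_reduction
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Two toy trees; "age" is used in both, so its reductions are added up.
trees = [
    [("age", 4.0), ("industry", 1.0)],
    [("age", 2.0), ("ad_size", 0.5)],
]
ranking = feature_importance(trees)
```

Keeping only the top entries of such a ranking is one way to run the experiment with 10, 20, 50, 100 or 200 features.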

Experience shows that usually only a small number of features have a large effect on the model, while the effect of most other features is negligible. We tested this observation by keeping only the top 10, 20, 50, 100 and 200 features and evaluating how the number of features affects the result. The results are shown in Fig. 4. NE declines markedly between 10 and 50 features and only slightly between 50 and 200, which confirms that the features with a large influence on the model account for only a small proportion of all features.

In practical applications, to handle large amounts of training data, we present two data sampling methods and evaluate their quality: uniform sampling and negative down-sampling. We use boosted decision trees with 600 trees for the comparison.

Uniform sampling of the training data is a very common method, because it is simple to implement and the resulting model can be used without modifying the sample data. In this section we evaluate different sampling rates. For each sampled data set we train a boosted tree model; the experimental results are shown in Fig. 5. The figure shows that more data gives a better model. With 10% of the training data, however, the NE value differs from that obtained with the full training data by only 0.02, so it is not necessary to train on all of the data when running experiments.

To date, many researchers have studied the class imbalance problem extensively, and the results show that such imbalance can have a large impact on the performance of the learned model. Below, we use negative-sample down-sampling to address the class imbalance problem. As before, we compare the effect on the data at several sampling rates. The comparison results are shown in Figure 6, from which it can be seen that the model performs best at a sampling rate of 0.025.
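For illustration, negative-sample down-sampling can be sketched as follows. The function name, the `clicked` label key and the synthetic data are assumptions of the example. Every positive (clicked) row is kept, and each negative row is kept with the given probability.

```python
import random


def negative_downsample(rows, rate, label_key="clicked", seed=0):
    """Keep every positive row; keep each negative row with probability `rate`."""
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    rng = random.Random(seed)
    kept = []
    for row in rows:
        if row[label_key] == 1 or rng.random() < rate:
            kept.append(row)
    return kept


# Synthetic, imbalanced data for illustration only: 1% of rows are clicks.
rows = [{"request_id": i, "clicked": int(i % 100 == 0)} for i in range(10_000)]
sampled = negative_downsample(rows, rate=0.025)

positives = sum(row["clicked"] for row in sampled)
negatives = len(sampled) - positives
```

Unlike uniform sampling, this changes the ratio of clicks to non-clicks in the training data, which is the purpose of the method.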

The above is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.

Claims

1. A click-through rate prediction method based on decision trees and logistic regression, characterized in that the click-through rate prediction method based on decision trees and logistic regression comprises the following steps:

Obtaining relevant feature data of impression information;

Generating real-time training data by means of an online joiner;

Training a click-through rate prediction model with the real-time training data to obtain the latest click-through rate prediction model, which is used to predict the click-through rate.

2. The click-through rate prediction method based on decision trees and logistic regression according to claim 1, characterized in that the online joiner works as follows: labels are added to the data and the input data is used for training in an online manner; impressions and clicks are joined by request ID; a unique request ID is generated each time a user performs an action, and impressions and clicks are matched by this ID.

3. The click-through rate prediction method based on decision trees and logistic regression according to claim 2, characterized in that generating real-time training data by means of the online joiner comprises the following steps:

4. The click-through rate prediction method based on decision trees and logistic regression according to claim 3, characterized in that an anomaly detection mechanism needs to be established in the process of generating real-time training data by means of the online joiner.

5. The click-through rate prediction method based on decision trees and logistic regression according to claim 1, characterized in that the linear classifier is trained using an online learning method.

6. The click-through rate prediction method based on decision trees and logistic regression according to claim 1, characterized in that boosted decision trees are used to transform the features.

7. The click-through rate prediction method based on decision trees and logistic regression according to claim 6, characterized in that, in the boosted decision trees, each individual tree is treated as a categorical feature whose value is the index of the leaf.

8. The click-through rate prediction method based on decision trees and logistic regression according to claim 6, characterized in that the boosted decision trees are trained on the data in batch mode.

9. The click-through rate prediction method based on decision trees and logistic regression according to claim 6, characterized in that a feature weight is assigned to each feature; when each tree node is constructed, a best feature is selected and split on; and, because a feature may be used in multiple trees, the importance of each feature is obtained by summing the total loss reduction attributable to that feature over all of the trees.

10. The click-through rate prediction method based on decision trees and logistic regression according to any one of claims 1 to 9, characterized in that the click-through rate prediction method based on decision trees and logistic regression comprises: handling a large amount of training data by means of a sampling method.
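By way of non-limiting illustration, the cascade recited in the claims above can be sketched as follows: impressions and clicks are joined by request ID, each tree turns an impression into a categorical feature equal to its leaf index, and a logistic regression over those features is updated one example at a time. All function names, the nested-dictionary tree layout, the two hand-written trees, the learning rate and the data are assumptions of this sketch. In particular, the trees stand in for a boosted ensemble trained in batch mode, and plain stochastic gradient descent stands in for whichever online learning method is actually used.

```python
import math


def join_by_request_id(impressions, clicked_ids):
    """Label each impression 1 if its request ID also appears among the clicks."""
    return [(features, 1 if request_id in clicked_ids else 0)
            for request_id, features in impressions]


def leaf_index(tree, features):
    """Walk a tree stored as nested dicts until a leaf index (an int) is reached."""
    node = tree
    while not isinstance(node, int):
        go_left = features[node["feature"]] < node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node


def transform(trees, features):
    """One categorical feature per tree: the (tree number, leaf index) pair."""
    return [(t, leaf_index(tree, features)) for t, tree in enumerate(trees)]


def predict(weights, active):
    """Logistic regression over the one-hot encoded leaf indicators."""
    z = sum(weights.get(key, 0.0) for key in active)
    return 1.0 / (1.0 + math.exp(-z))


def sgd_update(weights, active, label, learning_rate=0.1):
    """One online gradient step on the log loss for a single example."""
    error = predict(weights, active) - label
    for key in active:
        weights[key] = weights.get(key, 0.0) - learning_rate * error


# Two hand-written trees; in practice these would come from batch training.
trees = [
    {"feature": "ad_ctr", "threshold": 0.05, "left": 0, "right": 1},
    {"feature": "hour", "threshold": 12, "left": 0,
     "right": {"feature": "ad_ctr", "threshold": 0.1, "left": 1, "right": 2}},
]
impressions = [
    ("r1", {"ad_ctr": 0.20, "hour": 20}),
    ("r2", {"ad_ctr": 0.01, "hour": 9}),
    ("r3", {"ad_ctr": 0.15, "hour": 21}),
    ("r4", {"ad_ctr": 0.02, "hour": 8}),
]
clicked_ids = {"r1", "r3"}

weights = {}
for _ in range(200):  # replay the tiny stream so the effect is visible
    for features, label in join_by_request_id(impressions, clicked_ids):
        sgd_update(weights, transform(trees, features), label)

p_clicked = predict(weights, transform(trees, {"ad_ctr": 0.18, "hour": 22}))
p_ignored = predict(weights, transform(trees, {"ad_ctr": 0.01, "hour": 7}))
```

In this toy run, an impression that falls into the same leaves as the clicked examples receives a higher predicted click-through rate than one that falls into the leaves of the unclicked examples.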