CN112395551A

CN112395551A - Optimization method of logistic regression

Info

Publication number: CN112395551A
Application number: CN201910751819.4A
Authority: CN
Inventors: 林淼哲; 方桢; 王雨晨; 詹杰凡
Original assignee: Shanghai Youkun Information Technology Co ltd
Current assignee: Shanghai Youkun Information Technology Co ltd
Priority date: 2019-08-15
Filing date: 2019-08-15
Publication date: 2021-02-23

Abstract

The invention discloses a method and a device for optimizing logistic regression, in the field of logistic regression. The method comprises: calculating a first parameter using the area under the receiver operating characteristic (ROC) curve (AUC) as the loss function; updating the first parameter by gradient descent to obtain a second parameter; and substituting the second parameter, as the value of the parameter θ, into the probability formula of the logistic regression model to obtain a probability value.

Description

Optimization method of logistic regression

Technical Field

The invention relates to the field of logistic regression, in particular to a logistic regression optimization method.

Background

With the development of science and technology, Internet applications are growing day by day and people depend on the network more and more. Correspondingly, competition among large Internet companies has intensified, and increasing users' visit rate and extending their online time have become important problems for these companies. The corresponding technical approach is click-through rate (CTR) prediction: for example, with CTR prediction a search engine website can select advertisements with stronger display effect, a shopping website can recommend goods that users want to buy, and a news and entertainment website can show users the content they are more interested in. CTR prediction is typically implemented with a logistic regression model, one of the most popular models in the Internet industry, which is commonly used to express the probability distribution behind random events.

In the prior art, the parameters of a logistic regression model are solved by using the log loss derived from maximum likelihood estimation as the loss function, and the model parameters are obtained during training by optimizing this log loss. However, when the model is evaluated to judge whether the obtained parameters are good, the area under the receiver operating characteristic (ROC) curve (area under curve, AUC) is used as the metric. Training and evaluation of the same model parameters thus use two different criteria that are not directly linked, so parameters obtained by optimizing the log loss are not necessarily optimal under the subsequent AUC evaluation, which reduces the prediction accuracy of the logistic regression model.

Therefore, the fact that model parameters obtained with the log loss as the loss function are not necessarily optimal under the subsequent AUC evaluation is a problem that urgently needs to be solved.

Disclosure of Invention

An embodiment of the present application provides an optimization method of logistic regression, which solves the prior-art problem that model parameters obtained with the log loss as the loss function are not necessarily optimal under the subsequent AUC evaluation.

The optimization method of logistic regression provided by the embodiment of the application specifically includes:

calculating a first parameter using the area under the receiver operating characteristic (ROC) curve (AUC) as the loss function;

updating the first parameter by gradient descent to obtain a second parameter;

and substituting the second parameter, as the value of the parameter θ, into the probability formula of the logistic regression model to obtain a probability value.

In one possible implementation, using the AUC as the loss function includes:

based on the equivalence between the AUC and the Wilcoxon-Mann-Whitney test statistic, rewriting the AUC calculation into the following form:

wherein the formula is expressed in terms of the first parameter vector, the vector of a positive sample and the vector of a negative sample, and represents the probability that the score of a positive sample is greater than the score of a negative sample;

the formula using the AUC as the loss function is:

wherein, among all samples, the set of positive samples is P, the set of negative samples is Q, and the counting function g(x) is:

In one possible implementation, the method further includes:

converting the counting function g(x) into a logistic function;

obtaining an AUC loss function suitable for gradient descent, with the formula:

In one possible implementation, updating the first parameter by gradient descent includes:

updating the first parameter according to the gradient descent formula and the AUC loss function, as follows:

In one possible implementation, the method further includes:

training the first parameter by supervised machine learning to obtain the second parameter, which comprises the following steps:

acquiring N existing samples and the N corresponding sample results; setting the number of samples used for one update of the first parameter to M, so that the first parameter is updated N/M times in total, where N and M are positive integers and N is greater than M;

and, for each group of M samples and the M corresponding sample results, solving for the first parameter with the AUC loss function formula, performing N/M gradient descent updates in total to obtain the second parameter.

An embodiment of the present application provides an optimization device for logistic regression, which specifically includes:

a first processing unit, configured to calculate a first parameter using the area under the receiver operating characteristic (ROC) curve (AUC) as the loss function, and to update the first parameter by gradient descent to obtain a second parameter;

and a second processing unit, configured to substitute the second parameter, as the value of the parameter θ, into the probability formula of the logistic regression model to obtain a probability value.

In one possible implementation, using the AUC as the loss function includes:

wherein the formula is expressed in terms of the first parameter vector, the vector of a positive sample and the vector of a negative sample;

the formula using the AUC as the loss function is:

In one possible implementation, the following is further included:

the counting function g(x) is converted into a logistic function;

Embodiments of the present application provide a computer device comprising a program or instructions that, when executed, cause a computer to perform the method of any of the above possible designs.

Embodiments of the present application provide a storage medium containing a program or instructions that, when executed, cause a computer to perform the method of any of the above possible designs.

The optimization method of logistic regression provided by the invention has the following beneficial effect: by taking the AUC as the loss function, the model is trained directly with optimizing the AUC as its objective, which yields better model performance, improves the prediction accuracy of the logistic regression model, and thereby improves its effectiveness in various business scenarios.

Drawings

FIG. 1 is a flow chart of a prior art logistic regression method;

FIG. 2 is a flow chart of a method for logistic regression optimization in an embodiment of the present application;

FIG. 3 is a flowchart of a method for optimizing logistic regression in which the parameters are trained by supervised machine learning, according to an embodiment of the present disclosure;

FIG. 4 is a schematic structural diagram of an apparatus for the logistic regression optimization method in an embodiment of the present application.

Detailed Description

To better understand the technical solutions, they are described in detail below with reference to the drawings and specific embodiments. It should be understood that the specific features in the embodiments and examples of the present application are detailed descriptions of the technical solutions of the present application, not limitations on them, and that the technical features in the embodiments and examples may be combined with each other where they do not conflict.

FIG. 1 is a flowchart of a prior-art logistic regression method. The logistic regression model is one of the most popular models in the Internet industry and is generally used as a binary classification model. As shown in the figure, the method includes the following steps.

Step 101: obtain the probability formula of the logistic regression model; formula (1) has the following form:

$$P(Y = 1 \mid X; \theta) = h_\theta(X) = \frac{1}{1 + e^{-\theta^{T} X}} \tag{1}$$

Specifically, in click-through rate prediction, for example to predict whether a user will click on an advertisement, the above formula expresses the probability distribution behind a random event such as "whether a person will click on the advertisement", where X represents the observed features, θ is the parameter the algorithm needs to solve for, and Y = 1 denotes a click, so that the formula gives the predicted probability that the user clicks on the advertisement.
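As a minimal sketch, the standard logistic regression probability P(Y = 1 | X; θ) = 1/(1 + e^(−θᵀX)) can be computed for a batch of samples as below. The function name `predict_proba`, the parameter values and the two-feature samples are hypothetical and only illustrate the formula:

```python
import numpy as np

def predict_proba(theta: np.ndarray, X: np.ndarray) -> np.ndarray:
    """P(Y=1 | x; theta) = 1 / (1 + exp(-theta^T x)) for each row x of X."""
    return 1.0 / (1.0 + np.exp(-(X @ theta)))

# Hypothetical parameters and two users with two numeric features each.
theta = np.array([1.0, -1.0])
X = np.array([[2.0, 1.0],    # theta^T x = 1 -> probability about 0.731
              [0.0, 0.0]])   # theta^T x = 0 -> probability exactly 0.5
p = predict_proba(theta, X)
```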

Step 102: when solving for the parameters of the probability formula, use the log loss as the loss function;

Specifically, the model parameter θ is solved by taking the log loss derived from maximum likelihood estimation as the loss function and minimizing it, where the likelihood from which the loss function is derived satisfies formulas (2) to (4):

$$L(\theta \mid x) = P_r(Y \mid X; \theta) \tag{2}$$

$$= \prod_i P_r(y_i \mid x_i; \theta) \tag{3}$$

$$= \prod_i h_\theta(x_i)^{y_i} \bigl(1 - h_\theta(x_i)\bigr)^{1 - y_i} \tag{4}$$
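Taking the negative logarithm of formula (4) turns the product into a sum; this is the log loss minimized in step 102. Assuming the standard logistic form h_θ(x) = 1/(1 + e^(−θᵀx)), its gradient also takes a compact form. This is the textbook derivation, added here as a worked illustration:

```latex
-\log L(\theta \mid x) = -\sum_i \Bigl[ y_i \log h_\theta(x_i) + (1 - y_i) \log\bigl(1 - h_\theta(x_i)\bigr) \Bigr]

\frac{\partial}{\partial \theta}\bigl(-\log L(\theta \mid x)\bigr) = \sum_i \bigl( h_\theta(x_i) - y_i \bigr)\, x_i
```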

Step 103: update the parameters by gradient descent;

Specifically, the gradient of the loss function from step 102 is computed and the parameter θ is updated repeatedly by gradient descent. The model parameter θ is trained by machine learning: when the predictions on the training samples are very close to the actual results, i.e., when the loss function reaches its minimum, the resulting θ is taken as the final model parameter.
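A minimal sketch of the prior-art training of steps 102 and 103, i.e. batch gradient descent on the mean log loss, is shown below. The learning rate, number of epochs, zero initialization and synthetic data are illustrative assumptions, not values from the original:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_log_loss(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean negative log-likelihood of formula (4)."""
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        h = sigmoid(X @ theta)                 # predicted probabilities, formula (1)
        grad = X.T @ (h - y) / len(y)          # mean of (h(x_i) - y_i) * x_i
        theta = theta - lr * grad              # gradient descent step
    return theta

def log_loss(theta, X, y, eps=1e-12):
    """Mean log loss; probabilities are clipped to avoid log(0)."""
    h = np.clip(sigmoid(X @ theta), eps, 1 - eps)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Hypothetical training data: 200 samples, 3 numeric features, labels from a linear rule.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)
theta = train_log_loss(X, y)
```

At θ = 0 every probability is 0.5 and the mean log loss equals log 2, so a successful run should end below that value.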

Step 104: substitute the parameters into the probability formula and evaluate the logistic regression model using the AUC.

Specifically, the parameter θ obtained by the machine learning training in step 103 is substituted, together with validation samples, into the probability formula of the logistic regression model, and the AUC is then computed from the resulting probabilities. The larger the AUC, the better the logistic regression model performs; that is, the AUC is used to evaluate the logistic regression model and to judge whether the obtained model parameter θ is the optimal solution.

As steps 101 to 104 show, when solving for the parameters of the logistic regression model, the log loss derived from maximum likelihood estimation is used as the loss function, and the model parameter θ is solved by optimizing (i.e., minimizing) the log loss during training; the model is then evaluated with the AUC as the metric to judge whether θ is the optimal solution. The same model parameters are thus trained and evaluated by two different criteria that are not directly linked, so parameters obtained by optimizing the log loss are not necessarily optimal under the subsequent AUC evaluation, which reduces the prediction accuracy of the logistic regression model.

Therefore, in the prior art, model parameters obtained with the log loss as the loss function are not necessarily optimal under the subsequent AUC evaluation, which is a problem that urgently needs to be solved. The optimization method of logistic regression of the present application takes the AUC directly as the loss function, so that training aims directly at optimizing the AUC; this yields better model performance, improves the prediction accuracy of the logistic regression model, and thereby improves its effectiveness in various business scenarios.

FIG. 2 is a flowchart of a method for optimizing logistic regression according to an embodiment of the present application; the specific steps are described in detail below.

Step 201: obtain the probability formula of the logistic regression model; formula (1) still has the following form:

$$P(Y = 1 \mid X; \theta) = h_\theta(X) = \frac{1}{1 + e^{-\theta^{T} X}} \tag{1}$$

Step 202: calculate the first parameter using the AUC as the loss function;

Specifically, when solving for the parameters of the probability formula, the area under the receiver operating characteristic (ROC) curve, AUC, is used as the loss function. There are two common and intuitive ways to calculate the AUC. The first is to draw the ROC curve; the area under it is the AUC. The second is as follows: given (m + n) samples, of which m are positive and n are negative, there are m × n positive-negative sample pairs; each pair in which the positive sample's predicted probability of being positive is greater than the negative sample's predicted probability of being positive is counted as 1, and the total count divided by (m × n) is the AUC. The first method is typically used when evaluating a model by AUC; when the AUC replaces the log loss as the loss function, the second method is chosen.
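The two AUC calculations described above can be sketched as follows; on data without tied scores they give the same value. The function names and the six example scores are hypothetical:

```python
import numpy as np

def auc_pairwise(scores, labels):
    """Second method: fraction of the m*n positive-negative pairs in which the positive scores higher."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    return (pos[:, None] > neg[None, :]).sum() / (len(pos) * len(neg))

def auc_roc(scores, labels):
    """First method: trace the ROC curve by lowering the threshold, then integrate with the trapezoid rule."""
    ranked = labels[np.argsort(-scores)]              # labels ordered by descending score
    tpr = np.concatenate([[0.0], np.cumsum(ranked == 1) / np.sum(labels == 1)])
    fpr = np.concatenate([[0.0], np.cumsum(ranked == 0) / np.sum(labels == 0)])
    return np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0)

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4])    # hypothetical predicted probabilities
labels = np.array([1, 1, 0, 1, 0, 0])                 # actual results (1 = positive)
# Both functions return 8/9: of the 3 * 3 = 9 pairs, only (0.6, 0.7) is ranked wrongly.
```

The pairwise form is the one that can be written as a sum over sample pairs, which is what makes it usable as a training objective in the steps that follow.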

To apply the second method, the AUC calculation must be transformed mathematically. Because the AUC is equivalent to the Wilcoxon-Mann-Whitney test statistic, it can be expressed as the probability that the score of a positive sample randomly selected from the positive sample set is greater than the score of a negative sample randomly selected from the negative sample set, with the score given by a linear model. Combining this with the form of the logistic regression model and formalizing it mathematically gives formula (5):
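Formula (5) can be sketched as follows in assumed notation: w is the first parameter vector, x⁺ and x⁻ are the vectors of a positive and a negative sample drawn at random from their respective sets, and the score is the linear term wᵀx. Because the logistic function of formula (1) is strictly increasing, comparing linear scores is equivalent to comparing predicted probabilities. This is a reconstruction consistent with the description above, not necessarily the exact original form:

```latex
\mathrm{AUC}(w) = \Pr\bigl( w^{T} x^{+} > w^{T} x^{-} \bigr) \tag{5}
```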

wherein the symbols in formula (5) denote, respectively, the first parameter vector, which corresponds to the model parameter θ of the prior art, the vector of a positive sample, and the vector of a negative sample;

Further, let P be the set of positive samples and Q the set of negative samples;

the counting function g(x) satisfies formula (6):
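Following the pair-counting rule of step 202, in which a pair whose positive sample scores higher is counted as 1, the counting function of formula (6) is presumably the unit step below. This is an assumption; some AUC definitions count a tie as 1/2 instead:

```latex
g(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0 \end{cases} \tag{6}
```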

Incorporating the counting function g(x) into the mathematical formulation of the AUC gives formula (7):
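In assumed notation (w the first parameter vector, x⁺ ∈ P and x⁻ ∈ Q positive and negative samples, g the unit-step counting function), formula (7) presumably averages g over all |P|·|Q| pairs, which is the sample form of the Wilcoxon-Mann-Whitney statistic:

```latex
\mathrm{AUC}(w) = \frac{1}{|P|\,|Q|} \sum_{x^{+} \in P} \sum_{x^{-} \in Q} g\bigl( w^{T} x^{+} - w^{T} x^{-} \bigr) \tag{7}
```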

Formula (7) takes the AUC as the loss function. Because the first parameter is to be updated by gradient descent, which requires a differentiable loss, the counting function g(x), a step function, is replaced by a logistic function; this yields formula (8), an AUC loss function suitable for gradient descent:
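Replacing the step function with the logistic function σ(z) = 1/(1 + e^(−z)) gives a smooth surrogate of the pairwise AUC. In assumed notation (w the first parameter vector, P and Q the positive and negative sample sets), formula (8) can be sketched as below; writing the quantity to be minimized as J(w) = 1 − AUC is also an assumption, since maximizing the smoothed AUC and minimizing J are equivalent:

```latex
\widetilde{\mathrm{AUC}}(w) = \frac{1}{|P|\,|Q|} \sum_{x^{+} \in P} \sum_{x^{-} \in Q} \sigma\bigl( w^{T} (x^{+} - x^{-}) \bigr), \qquad \sigma(z) = \frac{1}{1 + e^{-z}} \tag{8}

J(w) = 1 - \widetilde{\mathrm{AUC}}(w)
```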

Step 203: update the first parameter by gradient descent to obtain the second parameter;

Specifically, the gradient of the AUC loss function from step 202 is derived and used to update the first parameter by gradient descent; the resulting update, formula (9), has the following form:
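Assuming the smoothed loss J(w) = 1 − (1/(|P||Q|)) Σ σ(wᵀ(x⁺ − x⁻)) over all positive-negative pairs, the identity σ′(z) = σ(z)(1 − σ(z)) gives its gradient directly, and a gradient descent step with an assumed learning rate η yields an update of the following form (w, x⁺ ∈ P, x⁻ ∈ Q and η are assumed notation):

```latex
\nabla_{w} J(w) = -\frac{1}{|P|\,|Q|} \sum_{x^{+} \in P} \sum_{x^{-} \in Q} \sigma(z_{+-})\bigl(1 - \sigma(z_{+-})\bigr)\,(x^{+} - x^{-}), \qquad z_{+-} = w^{T}(x^{+} - x^{-})

w \leftarrow w - \eta\, \nabla_{w} J(w) \tag{9}
```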

N samples are taken as training data, the first parameter is trained by supervised machine learning, and it is updated repeatedly by gradient descent until the second parameter, i.e., the optimal solution, is obtained.
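Steps 202 and 203 can be sketched end to end as below. This is a minimal full-batch illustration under stated assumptions, not the patented implementation: the smoothed pairwise AUC (step function replaced by the logistic function), the zero starting point, the learning rate 0.5, 200 epochs and the synthetic training data are all hypothetical choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pair_differences(X_pos, X_neg):
    """All pairwise differences x+ - x-, shape (|P|, |Q|, d)."""
    return X_pos[:, None, :] - X_neg[None, :, :]

def smooth_auc(w, X_pos, X_neg):
    """Smoothed AUC: mean of sigmoid(w^T (x+ - x-)) over all positive-negative pairs."""
    return sigmoid(pair_differences(X_pos, X_neg) @ w).mean()

def smooth_auc_grad(w, X_pos, X_neg):
    """Gradient of smooth_auc w.r.t. w, using sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))."""
    diffs = pair_differences(X_pos, X_neg)
    s = sigmoid(diffs @ w)                          # shape (|P|, |Q|)
    return ((s * (1.0 - s))[..., None] * diffs).mean(axis=(0, 1))

def train_auc(X, y, lr=0.5, epochs=200):
    """Gradient descent on 1 - smooth_auc, i.e. gradient ascent on smooth_auc."""
    X_pos, X_neg = X[y == 1], X[y == 0]
    w = np.zeros(X.shape[1])                        # first parameter (assumed zero start)
    for _ in range(epochs):
        w = w + lr * smooth_auc_grad(w, X_pos, X_neg)
    return w                                        # second parameter

# Hypothetical training data: 200 samples, 3 numeric features, labels from a linear rule.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(int)
w = train_auc(X, y)
```

The returned `w` plays the role of the second parameter that step 204 substitutes into formula (1). Forming all |P|·|Q| pairs costs memory proportional to their product, so for large N a mini-batch scheme such as the one of FIG. 3 is more practical.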

Step 204: substitute the second parameter, as the value of the parameter θ, into the probability formula of the logistic regression model to obtain a probability value.

Specifically, L samples are taken as validation data and divided into positive and negative samples according to their actual results. The second parameter obtained in step 203 is substituted, together with the L samples, into the probability formula of the logistic regression model to compute the predicted probability that each positive sample is positive and the predicted probability that each negative sample is positive. The AUC is then calculated by the conventional method, i.e., by drawing the ROC curve and taking the area under it. The larger the AUC, the better the logistic regression model performs; that is, the AUC is used to evaluate the logistic regression model and to judge whether the calculated second parameter is the optimal solution.

The model parameters obtained in the training steps above are optimized for the same metric that is used in the subsequent AUC evaluation of step 204, because the loss is computed with the AUC calculation method itself. This ensures that the model parameters obtained by optimizing the AUC loss also achieve better results in the subsequent AUC evaluation. Moreover, because the supervised machine learning used to train the model parameters takes optimizing the AUC directly as its objective, the resulting model parameters are better optimized than the model parameter θ obtained with the log loss function in the prior art.

To better illustrate how the model parameters are trained by supervised machine learning to obtain the optimal solution, and how that solution is then verified and evaluated, the supervised machine learning procedure used in steps 201-204 is further described below by way of example.

FIG. 3 is a flowchart of an optimization method of logistic regression in which the parameters are trained by supervised machine learning, in an embodiment of the present application. Supervised machine learning learns a function from a given training data set, so that when new data arrive, the result can be predicted with that function. The specific process is described below in the context of a practical application of the logistic regression model.

Step 301: prepare training data;

Specifically, training data are prepared first. For example, when a logistic regression model is applied to predict the click-through rate of travel advertisements, sample data from 200 users are first obtained as training data. The sample data include the observed user features and the users' click results, where the observed user features include name, gender, age, region and online time, and the click result is either click or no click.

Step 302: randomly initialize the parameters;

Specifically, the model parameters are randomly initialized within the range (0, 1); the number of samples used for one update of the model parameters is set to 10, so the total number of updates is 200/10 = 20.
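The schedule of steps 301 to 306 (200 training samples in units of 10, hence 200/10 = 20 updates per pass) can be sketched as one mini-batch pass below. The synthetic features and labels, the learning rate 0.5 and the rule of skipping a unit that contains only one class (no positive-negative pair can be formed) are assumptions for illustration:

```python
import numpy as np

N, M = 200, 10                   # training samples (step 301) and samples per update (step 302)
num_updates = N // M             # 200 / 10 = 20 parameter updates

def units(X, y, size, rng):
    """Shuffle once, then yield consecutive units of `size` samples."""
    order = rng.permutation(len(y))
    for start in range(0, len(y), size):
        idx = order[start:start + size]
        yield X[idx], y[idx]

def smooth_auc_grad(w, X, y):
    """Gradient of the mean sigmoid(w^T (x+ - x-)) over the positive-negative pairs in one unit."""
    diffs = X[y == 1][:, None, :] - X[y == 0][None, :, :]
    s = 1.0 / (1.0 + np.exp(-(diffs @ w)))
    return ((s * (1.0 - s))[..., None] * diffs).mean(axis=(0, 1))

rng = np.random.default_rng(0)
X = rng.normal(size=(N, 3))                             # hypothetical numeric user features
y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(int)    # hypothetical click labels
w = rng.uniform(0.0, 1.0, size=3)                       # random start in [0, 1), approximating step 302
units_seen = 0
for X_u, y_u in units(X, y, M, rng):
    if 0 < y_u.sum() < len(y_u):     # assumption: a unit needs both classes to form pairs
        w = w + 0.5 * smooth_auc_grad(w, X_u, y_u)
    units_seen += 1
```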

Step 303: calculate predicted values;

Specifically, with this rule set, samples are taken in units of 10. The first unit of 10 samples is substituted, with the initialized model parameters (e.g., 0.1), into the probability formula of the logistic regression model to obtain predictions for these 10 samples; the predictions are compared with the users' actual click results in the 10 samples, and the quality of the model parameters is judged from the comparison.

Step 304: calculate the AUC as the loss function;

Specifically, the 10 samples are used as input, and the above formula with the AUC as the loss function is applied to solve for the model parameters.

Step 305: update the parameters by gradient descent;

Specifically, taking the model parameters solved in step 304 as the starting point, the second unit of 10 samples is obtained and the model parameters are updated. The second unit of 10 samples and the updated model parameters are then substituted into the probability formula of the logistic regression model to obtain predictions for the second unit, and these are compared with the users' click results in that unit to judge whether the model parameters are good enough.

Step 306: judge whether the parameters are good enough;

Specifically, the same operation is repeated on the remaining 18 units of 10 samples to calculate and update the model parameters, adjusting them through repeated updates and the resulting comparisons. When the comparison shows that the predictions are close to the users' click results, e.g., an approach rate of 99%, the parameters are considered good enough and the final parameters are obtained; when the comparison shows a large gap between the predictions and the click results, e.g., an approach rate of 90%, further samples are taken for iterative computation and the model parameters are updated until the comparison is good, e.g., an approach rate of 99%.

Step 307: obtain the final parameters.

The final parameters are obtained through the processing of step 306. Because the independent variable of the logistic regression model, i.e., the observed user features, contains more than one item, it is expressed mathematically as a vector; the corresponding model parameter is therefore also a vector, with one parameter value per item of the independent variable (the observed user features), so the final parameter is a vector as well.

After the final parameters are trained by supervised machine learning in steps 301 to 307, sample data from 100 users are taken as validation data and divided into positive and negative samples according to their actual click results. The final parameters are substituted, together with these samples, into the probability formula of the logistic regression model to obtain the predicted probability that each positive sample is positive and that each negative sample is positive, and the AUC is then calculated by the conventional method. The larger the AUC, the better the model parameters; that is, the obtained model parameters are the optimal solution.

FIG. 4 is a schematic structural diagram of an apparatus for the logistic regression optimization method in an embodiment of the present application; it includes a first processing unit 401 and a second processing unit 402, described in detail below.

The first processing unit 401 is configured to calculate a first parameter using the area under the receiver operating characteristic (ROC) curve (AUC) as the loss function, and to update the first parameter by gradient descent to obtain a second parameter;

and the second processing unit 402 is configured to substitute the second parameter, as the value of the parameter θ, into the probability formula of the logistic regression model to obtain a probability value.

In one possible implementation, using the AUC as the loss function includes:

wherein the formula is expressed in terms of the first parameter vector, the vector of a positive sample and the vector of a negative sample;

the formula using the AUC as the loss function is:

In one possible implementation, the following is further included:

the counting function g(x) is converted into a logistic function;

Finally, it should be noted that, as will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.