CN117035873B

CN117035873B - Multi-task combined prediction method for few-sample advertisement

Info

Publication number: CN117035873B
Application number: CN202311295560.XA
Authority: CN
Inventors: 黄国峰; 沈鑫杰
Original assignee: Guangzhou Tidong Technology Co ltd
Current assignee: Guangzhou Tidong Technology Co ltd
Priority date: 2023-10-09
Filing date: 2023-10-09
Publication date: 2024-03-29
Anticipated expiration: 2043-10-09
Also published as: CN117035873A

Abstract

The invention provides a method for jointly predicting multiple advertisement tasks from few samples. The possible advertisement events are encoded in binary form, and the resulting joint events are introduced into a decision tree, which classifies samples according to a plurality of advertisement domain features, so that more accurate prediction can be achieved with fewer samples. Further, to bring the prediction result of the decision tree as close as possible to the real result, residual iteration is performed on the decision tree, so that a plurality of iterated decision trees gradually reduce the residual. Finally, the accumulated result is converted into a probability value through a logistic function, thereby performing joint prediction on the multiple advertisement tasks and improving prediction accuracy.

Description

Multi-task combined prediction method for few-sample advertisement

Technical Field

The invention relates to the field of artificial intelligence and advertising, in particular to a multi-task joint prediction method for few-sample advertisements.

Background

In the advertising field, businesses commonly need to build a demand-side platform. In the initial stage of building the platform, the amount of data is insufficient, so the requirements on the data features are relatively high. Common advertisement data features include bundleId (the app in which the advertisement is placed), deviceMake (device manufacturer), deviceModel (device model), bidFloor, adFormat, creatiid, and the like. A large portion of these features take many distinct, non-repeating values, so feature selection and construction are important. In the advertising field, the ability to explain why a feature is effective is also important, whereas deep learning networks are black boxes that generally offer poor interpretability of feature effectiveness.

Therefore, the method of feature selection and construction is critical in advertising prediction. Current technology provides solutions in the advertising arts to address these challenges. For example, embedding (Embedding) techniques are one method of mapping high-dimensional discrete features into a low-dimensional dense vector space. For ID type features in advertisement data, embedding techniques may be used to convert them into continuous features for processing in a deep learning model. Embedding techniques can capture semantic relationships between features and provide an efficient way to represent the features.

However, because embedding converts ID-type features into continuous features, a large amount of data is required to learn the differences between the embeddings of different ID values, so the technique is not feasible when the amount of data is insufficient.

Another solution is a deep learning model, such as a neural network, which is widely used in the advertising field for tasks such as click-through rate prediction, user behavior prediction, etc. Although the deep learning model is excellent in solving a complex nonlinear problem, it is poor in interpretability.

In deep learning, a model that performs well and robustly generally requires a large amount of training data; if the data set is too small, the model tends to fit poorly. Deep learning models are therefore not suitable when the amount of data is insufficient.

Disclosure of Invention

The invention provides a multi-task combined prediction method for few-sample advertisements, which can effectively overcome the defects in the prior art.

Specifically, the invention provides a few-sample advertisement multi-task joint prediction method, which comprises the following steps: setting the action of a user clicking an advertisement webpage or an advertisement webpage button as a "click event", wherein the n possible events that the click event triggers with conditional probability are called the 1st event, the 2nd event, …, the nth event; counting the "click event" and each of the n possible events as 1 or 0 in binary notation according to whether it occurs or does not occur, so that, for each user operation on the advertisement webpage, the "click event" and the n possible events jointly form 2ⁿ⁺¹ classes of joint task events; taking a set of a plurality of advertisement user samples as a training set and a plurality of advertisement domain features as a feature set, constructing a decision tree for predicting the 2ⁿ⁺¹ joint task events, and obtaining a prediction result for each joint task event; calculating the residual of the decision tree according to the deviation between the real result and the prediction result, constructing a next-stage decision tree for the residual, and constructing a loss function according to the predicted probability and the real probability, wherein the training target of the next-stage decision tree is to minimize the value of the loss function; calculating the residual of the next-stage decision tree according to the deviation between the real result and the prediction result of the next-stage decision tree, and iteratively constructing a further decision tree, and so on, until the value of the loss function is smaller than a set loss value, whereupon the iterative construction of decision trees is terminated, the resulting iterative decision tree classification model comprising M sequentially iterated decision trees.

Optionally, n=2, event 1 is "order purchase advertisement product", and event 2 is "forward recommendation advertisement to other users".

Preferably, in the construction of the decision tree, the information gain of each advertisement domain feature is ordered, the advertisement domain feature with the largest information gain is used as a root node, so that the root node is used for classification, then the advertisement domain feature with the largest information gain is selected from other advertisement domain features as the next node for classification, and so on, so as to form a complete decision tree model.

Preferably, the residual of the decision tree is expressed as: r=y-p, where y represents the true probability of the sample and p represents the predicted probability of the current decision tree.

Preferably, the loss function is expressed as:

$$ L(y, p) = -\left[ y \log p + (1 - y) \log (1 - p) \right] $$

where y represents the true value of the sample (0 or 1) and p represents the overall prediction probability.

In summary, the invention provides a few-sample multi-task joint prediction method. The possible advertisement events are predicted using binary coding, and the resulting joint events are introduced into a decision tree, which classifies samples according to a plurality of advertisement domain features, so that more accurate prediction can be achieved with fewer samples. Further, to bring the prediction result of the decision tree as close as possible to the real result, residual iteration is performed on the decision tree, forming a plurality of iterated decision trees that gradually reduce the residual. Finally, the accumulated result is converted into a probability value through a logistic function, thereby performing joint prediction on the multiple tasks and improving prediction accuracy.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The technical solutions described in connection with the drawings are only some embodiments of the present invention, and a person skilled in the art can obtain other embodiments and drawings from them without inventive effort.

FIG. 1 shows an exemplary decision tree structure constructed according to the few-sample advertisement multi-task joint prediction method of the invention;

FIG. 2 shows a general flow chart of the few-sample advertisement multi-task joint prediction method according to the invention.

Detailed Description

The embodiments of the present invention are described in detail below with reference to the accompanying drawings. The described embodiments are only some, but not all, embodiments of the present invention. All other embodiments that a person of ordinary skill in the art can obtain from the described embodiments without inventive effort fall within the scope of the invention.

The invention provides a few-sample advertisement multi-task joint prediction method. Its practical task is, for example, to examine the conditional probability that a user, having clicked an advertisement webpage or a related advertisement webpage button, ultimately accepts the advertisement and purchases the advertised product, or forwards and recommends the advertisement to other users. In reality, user preferences differ and many outcomes are possible. For example, if users rush to purchase the related product after clicking the advertisement, the advertisement is quite successful and the probability of a purchase event after a click is high; if users still do not purchase the advertised product after clicking the advertisement, the probability of a purchase event is low and the advertisement is quite unsuccessful. Of course, some users may place an order even without clicking the advertisement webpage, which calls for a conditional probability analysis under the condition "advertisement webpage not clicked".

It is therefore necessary to analyze the conditional probability distribution of advertisement events, which provides a valuable reference for the future delivery of similar advertisements.
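As a rough illustration of the conditional probabilities discussed above, the following minimal sketch estimates the purchase rate separately for users who clicked and users who did not. The sample data and the helper name `conditional_rate` are hypothetical and are not taken from the patent.

```python
# Hypothetical toy samples: one (clicked, purchased) pair per user, 1 = occurred.
samples = [(1, 1), (1, 0), (1, 1), (0, 0), (0, 1), (1, 0), (0, 0), (1, 1)]

def conditional_rate(samples, clicked):
    """Empirical estimate of P(purchase = 1 | click = clicked) from raw counts."""
    outcomes = [purchased for c, purchased in samples if c == clicked]
    if not outcomes:
        return None  # no samples satisfy the condition
    return sum(outcomes) / len(outcomes)

p_buy_given_click = conditional_rate(samples, clicked=1)
p_buy_given_no_click = conditional_rate(samples, clicked=0)
```

With very few samples such raw frequencies are noisy, which is the situation the decision-tree approach described in this document is meant to address.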

Here, the prediction of the click operation "clicking the advertisement webpage or a related advertisement webpage button" is referred to as "click prediction", and the n events that the click operation triggers with certain conditional probabilities are referred to as the 1st event, the 2nd event, …, the nth event. For example, in the scenario above, "ordering and purchasing the advertised product" may be called the 1st event, and "forwarding and recommending the advertisement to other users" may be called the 2nd event.

The click prediction and the prediction of each event are represented in binary: in the click prediction, 0/1 means not clicked/clicked; in the 1st event prediction, 0/1 means not occurred/occurred, and the same holds for the 2nd to nth events. In the above example, for a particular user, "ordering and purchasing the advertised product" is counted as 1 if it occurs and 0 if it does not. Likewise, "forwarding and recommending the advertisement to other users" is counted as 1 if it occurs and 0 if it does not.

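Thus, setting n = 2 and examining the click prediction together with the 1st and 2nd event predictions under 0/1 classification, the "codes" shown in Table 1 below are formed:

TABLE 1

Click prediction | 1st event prediction | 2nd event prediction | Joint task code
0 | 0 | 0 | 000
0 | 0 | 1 | 001
0 | 1 | 0 | 010
0 | 1 | 1 | 011
1 | 0 | 0 | 100
1 | 0 | 1 | 101
1 | 1 | 0 | 110
1 | 1 | 1 | 111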

Thus, the click prediction and the two event predictions together form 8 classes of joint tasks. Similarly, with n event predictions, 2ⁿ⁺¹ classes of joint task events can be formed.

Thus, each advertisement user sample is assigned the binary code of the corresponding joint task according to the 0/1 classification of the "click prediction" and each "event prediction".

For example, in the above example, suppose a user clicks the advertisement webpage, so the "click prediction" is counted as 1; the 1st event, "ordering and purchasing the advertised product", then occurs and is counted as 1; but the 2nd event, "forwarding and recommending the advertisement to other users", does not occur and is counted as 0. The advertisement user sample of this user is then assigned the binary code 110 of the corresponding joint task.
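The binary coding described above can be sketched as follows. The function names `encode_joint_event` and `all_joint_events` are illustrative assumptions, not names from the patent; the bit order (click flag first, then the 1st to nth events) follows the 110 example.

```python
from itertools import product

def encode_joint_event(click, events):
    """Concatenate the click flag and the n event flags into a binary code string."""
    flags = [click, *events]
    if any(f not in (0, 1) for f in flags):
        raise ValueError("every flag must be 0 or 1")
    return "".join(str(f) for f in flags)

def all_joint_events(n):
    """Enumerate all 2**(n+1) joint task event codes for n follow-up events."""
    return ["".join(map(str, bits)) for bits in product((0, 1), repeat=n + 1)]

# The user clicked, purchased, and did not forward the advertisement.
code = encode_joint_event(click=1, events=[1, 0])
classes = all_joint_events(2)
class_index = classes.index(code)  # integer class label usable by a classifier
```

Treating each code as one class turns the n + 1 binary tasks into a single classification problem with 2ⁿ⁺¹ classes.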

Next, taking a set of a plurality of advertisement user samples as the training set and the advertisement domain features as the feature set, a decision tree is constructed to predict the above 2ⁿ⁺¹ joint task events.

The advertising domain features can be seen in table 2:

TABLE 2

In constructing the decision tree, the advertisement domain features are ranked by information gain. The feature with the largest information gain is used as the root node and the samples are split on it; then, from the remaining features, the one with the largest information gain is selected as the next node for splitting, and so on, until a complete decision tree model is formed. FIG. 1 shows an example of a decision tree in which the root node is bundleId (the app in which the advertisement is placed), the feature with the largest information gain; the tree then expands downward in turn through the nodes count, adFormat, bundleIdScore, bidFloor, and bundleIdReview, the remaining advertisement domain features with the next-largest information gains.
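The ranking of features by information gain can be sketched as below. This is a minimal illustration assuming categorical features and entropy-based information gain; the rows, labels, and feature values are hypothetical, and only the feature names bundleId and adFormat come from the text.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy of the labels minus the weighted entropy after splitting on `feature`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical samples; labels are joint task event codes.
rows = [
    {"bundleId": "app_a", "adFormat": "banner"},
    {"bundleId": "app_a", "adFormat": "video"},
    {"bundleId": "app_b", "adFormat": "banner"},
    {"bundleId": "app_b", "adFormat": "video"},
]
labels = ["110", "110", "000", "000"]

# Features sorted from largest to smallest information gain; the first is the root.
ranked = sorted(rows[0], key=lambda f: information_gain(rows, labels, f), reverse=True)
```

In this toy data bundleId separates the two codes perfectly, so it would be chosen as the root node, while adFormat carries no information about the label.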

The plurality of advertisement user samples are thereby assigned to different leaf nodes of the decision tree, and each leaf node is assigned a prediction value, each prediction value comprising values for the above 2ⁿ⁺¹ joint task events.
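One possible reading of the leaf prediction values is the relative frequency of each joint task event among the training samples that reach the leaf. The sketch below assumes that interpretation, which the text does not spell out; `leaf_prediction` is a hypothetical helper.

```python
from collections import Counter
from itertools import product

def leaf_prediction(codes_in_leaf, n):
    """Relative frequency of each of the 2**(n+1) joint event codes in one leaf."""
    if not codes_in_leaf:
        raise ValueError("a leaf must contain at least one training sample")
    classes = ["".join(map(str, bits)) for bits in product((0, 1), repeat=n + 1)]
    counts = Counter(codes_in_leaf)
    total = len(codes_in_leaf)
    return {c: counts[c] / total for c in classes}

# Hypothetical leaf holding four training samples, with n = 2 follow-up events.
pred = leaf_prediction(["110", "110", "100", "000"], n=2)
```

Codes that never appear in the leaf receive a value of 0, so every leaf still reports a value for all 2ⁿ⁺¹ joint task events.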

The decision tree model can make reasonable predictions for other advertisement user samples, that is, it can predict the conditional probabilities of the other events given the click prediction. However, a deviation between the real result and the predicted result is unavoidable. This deviation is called the decision tree residual; based on it, a suitable loss function can be defined, and decision tree iterations can be performed to reduce the residual step by step.

The decision tree residual may be expressed as:

$$ r = y - p $$

where y represents the true probability of the sample and p represents the prediction probability of the current decision tree model.

Subsequent decision trees are then fitted to the residuals, and the training goal of each decision tree in the iterative process is to minimize the loss function. The loss function may be expressed as the following logarithmic loss function:

$$ L(y, p) = -\left[ y \log p + (1 - y) \log (1 - p) \right] $$

where y represents the true value of the sample (0 or 1) and p represents the overall prediction probability.
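To make the residual and the logarithmic loss concrete, the sketch below evaluates both for a single sample. It assumes the standard binary logarithmic loss, L(y, p) = -[y log p + (1 - y) log(1 - p)]; the clipping constant `eps` is an added numerical safeguard and is not part of the description.

```python
import math

def residual(y, p):
    """Decision tree residual r = y - p."""
    return y - p

def log_loss(y, p, eps=1e-12):
    """Binary logarithmic loss; p is clipped away from 0 and 1 to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

r = residual(1, 0.8)          # small residual: prediction is close to the truth
loss_good = log_loss(1, 0.8)  # low loss for a confident, correct prediction
loss_bad = log_loss(1, 0.2)   # higher loss when the prediction is far off
```

When p is obtained from an accumulated score F through the logistic function, the derivative of this loss with respect to F is p - y, so the residual y - p is exactly the negative gradient. This is why fitting each new tree to the residual reduces the loss.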

After each iteration, the prediction of the current decision tree is accumulated with the predictions of the previous decision trees to obtain the overall prediction result. Once the value of the loss function L(y, p) is smaller than the set loss value, the iteration of decision trees is terminated, and the resulting iterative decision tree classification model comprises M iterated decision trees.

The final iterative decision tree classification model consists of the accumulated results of the M decision trees. During prediction, an input sample is passed through each decision tree, the prediction results of the decision trees are accumulated, and the accumulated result is finally converted into a probability value through a logistic function.
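The residual iteration and the final logistic conversion can be sketched end to end as follows. This is a deliberately simplified illustration, not the patented implementation: it handles a single binary target (for example, one joint task code versus all others), each "tree" is a one-level split on one categorical feature, and the names, data, learning rate, and loss threshold are all assumptions.

```python
import math

def sigmoid(z):
    """Logistic function mapping an accumulated score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(y, p, eps=1e-12):
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def fit_stump(features, residuals):
    """One-level tree: one leaf per feature value, leaf value = mean residual."""
    sums, counts = {}, {}
    for f, r in zip(features, residuals):
        sums[f] = sums.get(f, 0.0) + r
        counts[f] = counts.get(f, 0) + 1
    return {f: sums[f] / counts[f] for f in sums}

def boost(features, ys, loss_threshold, learning_rate=1.0, max_trees=200):
    """Add trees fitted to residuals until the mean loss drops below the threshold."""
    scores = [0.0] * len(ys)  # accumulated raw scores of all trees so far
    trees = []
    while len(trees) < max_trees:
        probs = [sigmoid(s) for s in scores]
        mean_loss = sum(log_loss(y, p) for y, p in zip(ys, probs)) / len(ys)
        if mean_loss < loss_threshold:
            break  # termination criterion: loss below the set loss value
        residuals = [y - p for y, p in zip(ys, probs)]
        tree = fit_stump(features, residuals)
        trees.append(tree)
        scores = [s + learning_rate * tree[f] for s, f in zip(scores, features)]
    return trees

def predict(trees, feature, learning_rate=1.0):
    """Accumulate the outputs of all M trees, then apply the logistic function."""
    return sigmoid(sum(learning_rate * t.get(feature, 0.0) for t in trees))

# Hypothetical data: bundleId values and whether the target joint event occurred.
features = ["app_a", "app_a", "app_a", "app_b", "app_b", "app_b"]
ys = [1, 1, 0, 0, 0, 1]
trees = boost(features, ys, loss_threshold=0.65)
p_a = predict(trees, "app_a")
p_b = predict(trees, "app_b")
```

The `max_trees` cap is an added safeguard: if the set loss value is lower than what the data allow, the loop would otherwise never terminate. To cover all 2ⁿ⁺¹ joint task events, one option would be to run such a model once per code, although the text does not specify how the multi-class case is handled.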

The general flow of the invention is shown in FIG. 2.

The few-sample multi-task joint prediction method provided by the invention has been described above. In the invention, the possible advertisement events are jointly predicted using binary coding, and the resulting joint events are introduced into a decision tree, which classifies samples according to a plurality of advertisement domain features, so that more accurate prediction can be achieved with fewer samples. Further, to bring the prediction result of the decision tree as close as possible to the real result, residual iteration is performed on the decision tree, so that a plurality of iterated decision trees gradually reduce the residual. Finally, the accumulated result is converted into a probability value through a logistic function, thereby performing joint prediction on the multiple tasks and improving prediction accuracy.

The foregoing description of the exemplary embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any modifications, equivalents, and variations which fall within the spirit and scope of the invention are intended to be included in the scope of the invention.

Claims

1. A few-sample advertisement multi-task joint prediction method, the method comprising:

setting the action of a user clicking an advertisement webpage or an advertisement webpage button as a "click event", wherein the n possible events that the click event triggers with conditional probability are called the 1st event, the 2nd event, …, and the nth event;

counting the "click event" and each of the n possible events as "1" in binary notation if it occurs and as "0" if it does not occur, whereby, for each user operation on the advertisement webpage, the "click event" and the n possible events jointly form 2ⁿ⁺¹ classes of joint task events;

taking a set of a plurality of advertisement user samples as a training set and a plurality of advertisement domain features as a feature set, constructing a decision tree for predicting the above 2ⁿ⁺¹ joint task events, and obtaining a prediction result for each joint task event;

calculating the residual of the decision tree according to the deviation between the real result and the prediction result, constructing a next-stage decision tree for the residual, and constructing a loss function according to the predicted probability and the real probability, wherein the training target of the next-stage decision tree is to minimize the value of the loss function;

calculating the residual of the next-stage decision tree according to the deviation between the real result and the prediction result of the next-stage decision tree, and iteratively constructing a further decision tree, and so on, until the value of the loss function is smaller than a set loss value, whereupon the iterative construction of decision trees is terminated, wherein the resulting iterative decision tree classification model comprises M sequentially iterated decision trees.

2. The method of claim 1, wherein n = 2, event 1 is "order purchase advertisement product", and event 2 is "forward advertisement recommended to other users".

3. The method according to claim 1, wherein in the construction of the decision tree, the advertisement domain features are ranked by information gain, the advertisement domain feature with the largest information gain is used as the root node and the samples are split on it, then the advertisement domain feature with the largest information gain among the remaining advertisement domain features is selected as the next node for splitting, and so on, so as to form a complete decision tree model.

4. The few sample advertising multitasking joint prediction method of claim 1, characterized in that the residual of the decision tree is expressed as: r=y-p, where y represents the true probability of the sample and p represents the predicted probability of the current decision tree.

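5. The few sample advertising multitasking joint prediction method of claim 1, characterized in that said loss function is expressed as:

$$ L(y, p) = -\left[ y \log p + (1 - y) \log (1 - p) \right] $$

where y represents the true value of the sample (0 or 1) and p represents the overall prediction probability.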