CN116226757A - Data processing method, device, computer equipment and storage medium

Info

Publication number
CN116226757A
Authority
CN
China
Prior art keywords
sample
advertisement
tree
node
probability
Prior art date
Legal status
Pending
Application number
CN202111462351.0A
Other languages
Chinese (zh)
Inventor
吕培立
黄东波
谭斌
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202111462351.0A priority Critical patent/CN116226757A/en
Publication of CN116226757A publication Critical patent/CN116226757A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 — Administration; Management
    • G06Q10/04 — Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q30/00 — Commerce
    • G06Q30/02 — Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 — Advertisements

Abstract

The embodiment of the application discloses a data processing method, a device, computer equipment and a storage medium, comprising the following steps: acquiring N sample advertisements and N sample data pairs corresponding to the N sample advertisements; determining a root node based on N sample advertisements, taking sample advertisement attribute characteristics of each sample advertisement in the root node as initial attribute characteristics, determining a first splitting point set conforming to the characteristic types of the initial attribute characteristics, determining a first optimal splitting point from the first splitting point set, and dividing the N sample advertisements according to splitting conditions indicated by the first optimal splitting point to obtain an initial probability lifting tree; and performing iterative training on the initial probability lifting tree based on the tree convergence condition and sample advertisement conversion data of each sample advertisement in the N sample advertisements to obtain a target probability lifting tree for predicting conversion data probability distribution of the target advertisement. By adopting the embodiment of the application, the prediction speed of the probability distribution can be improved.

Description

Data processing method, device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method, a data processing device, a computer device, and a storage medium.
Background
Conventional probability estimation methods (for example, Gaussian process regression) are mainly used for estimating time series. Because the covariance matrix between all data points in the entire sequence needs to be calculated when estimating the distribution of the data points in the time series, a large amount of calculation is required when estimating the probability distribution, so such methods are not suitable for many application scenarios. Therefore, conventional probability prediction methods have drawbacks such as a large calculation amount and poor flexibility; as a result, a computer device needs to spend a great deal of time when predicting a probability distribution, which reduces the prediction speed of the probability distribution.
Disclosure of Invention
The embodiment of the application provides a data processing method, a data processing device, computer equipment and a storage medium, which can improve the prediction speed of probability distribution.
An aspect of an embodiment of the present application provides a data processing method, including:
acquiring N sample advertisements and N sample data pairs corresponding to the N sample advertisements; n is a positive integer; one sample advertisement corresponds to one sample data pair, and one sample data pair is determined by the sample advertisement attribute feature and the sample advertisement conversion data;
Determining a root node for constructing an initial probability lifting tree based on N sample advertisements, taking sample advertisement attribute characteristics of each sample advertisement in the root node as initial attribute characteristics, determining a first splitting point set conforming to the characteristic types of the initial attribute characteristics, determining a first optimal splitting point from the first splitting point set, and dividing the N sample advertisements according to splitting conditions indicated by the first optimal splitting point to obtain the initial probability lifting tree;
and performing iterative training on the initial probability lifting tree based on the tree convergence condition associated with the initial probability lifting tree and sample advertisement conversion data of each sample advertisement in the N sample advertisements to obtain a target probability lifting tree for predicting conversion data probability distribution of the target advertisement.
An aspect of an embodiment of the present application provides a data processing method, including:
acquiring a delivery feature request for a target advertisement; the delivery feature request includes target advertisement attribute features of the target advertisement;
acquiring a target probability lifting tree associated with the target advertisement, inputting the attribute characteristics of the target advertisement into the target probability lifting tree, and outputting the conversion data probability distribution of the target advertisement by the target probability lifting tree; the target probability lifting tree is obtained after iterative training of the initial probability lifting tree based on the tree convergence condition and sample advertisement conversion data of each sample advertisement in the N sample advertisements; the N sample advertisements are used for constructing a root node of the initial probability lifting tree; the initial probability lifting tree is obtained by dividing N sample advertisements based on the splitting condition indicated by the first optimal splitting point; the first best split point is determined based on sample advertisement attribute characteristics of each sample advertisement in the root node; n is a positive integer;
Target advertisement conversion data of the target advertisement is determined based on the conversion data probability distribution of the target advertisement.
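For ease of understanding, the following is a minimal Python sketch of this prediction flow. It assumes the trained target probability lifting tree is available as a list of trees whose leaves each store a Gamma shape parameter k and slope (scale) parameter theta, and it combines the per-tree leaf densities by multiplication followed by normalization; the helper names (trees, node.split.matches, etc.) are illustrative assumptions and not interfaces defined by the patent.

```python
import numpy as np
from scipy.stats import gamma

def predict_conversion_distribution(trees, ad_features, y_grid):
    # Route the target advertisement's attribute features through every tree,
    # multiply the leaf Gamma densities, and normalize on a grid of conversion values.
    density = np.ones_like(y_grid, dtype=float)
    for tree in trees:
        node = tree.root
        while not node.is_leaf:  # follow the splitting conditions to a leaf
            node = node.left if node.split.matches(ad_features) else node.right
        density *= gamma.pdf(y_grid, a=node.k, scale=node.theta)
    density /= np.trapz(density, y_grid)  # normalization parameter C
    return density

# Example: derive target advertisement conversion data from the distribution,
# e.g. as its expectation (a quantile could be used instead).
# y_grid = np.linspace(1e-3, 5000.0, 2000)
# dist = predict_conversion_distribution(target_tree, request_features, y_grid)
# expected_conversion_cost = np.trapz(y_grid * dist, y_grid)
```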
An aspect of an embodiment of the present application provides a data processing apparatus, including:
the sample advertisement acquisition module is used for acquiring N sample advertisements and N sample data pairs corresponding to the N sample advertisements; n is a positive integer; one sample advertisement corresponds to one sample data pair, and one sample data pair is determined by the sample advertisement attribute feature and the sample advertisement conversion data;
the initial tree determining module is used for determining root nodes for constructing an initial probability lifting tree based on the N sample advertisements, taking sample advertisement attribute characteristics of each sample advertisement in the root nodes as initial attribute characteristics, determining a first splitting point set conforming to the characteristic types of the initial attribute characteristics, determining a first optimal splitting point from the first splitting point set, and dividing the N sample advertisements according to splitting conditions indicated by the first optimal splitting point to obtain the initial probability lifting tree;
and the iterative training module is used for carrying out iterative training on the initial probability lifting tree based on the tree convergence condition associated with the initial probability lifting tree and sample advertisement conversion data of each sample advertisement in the N sample advertisements to obtain a target probability lifting tree for predicting conversion data probability distribution of the target advertisement.
Wherein the initial tree determination module comprises:
a first split point determining unit, configured to determine, based on N sample advertisements, a root node for constructing an initial probability promotion tree, determine a first set of split points that matches a feature type of an initial attribute feature by using a sample advertisement attribute feature of each sample advertisement in the root node as the initial attribute feature, and determine a first optimal split point from the first set of split points;
the first dividing unit is used for dividing the sample advertisements in the root node according to the splitting conditions indicated by the first optimal splitting point to obtain a first type node and a second type node;
the cut-off condition acquisition unit is used for respectively taking the first type node and the second type node as nodes to be split to acquire a split cut-off condition associated with the initial probability lifting tree;
the second splitting point determining unit is used for determining a second splitting point set which accords with the feature type of the target attribute feature by taking the sample advertisement attribute feature of each sample advertisement in the node to be split as the target attribute feature when the node to be split does not meet the splitting cut-off condition, and determining a second optimal splitting point from the second splitting point set;
The second dividing unit is used for dividing the sample advertisement in the node to be split according to the splitting condition indicated by the second optimal splitting point until the divided node meets the splitting cut-off condition, and constructing an initial probability lifting tree based on the root node and the divided node.
Wherein the first split point determination unit includes:
an initial feature determining subunit, configured to determine, based on the N sample advertisements, a root node for constructing an initial probability promotion tree, and take, as an initial attribute feature, a sample advertisement attribute feature of each sample advertisement in the root node;
the de-duplication processing subunit is used for taking the feature type of the initial attribute feature as a splitting point, de-duplication processing the splitting point and adding the splitting point after de-duplication processing to the first splitting point set;
and the splitting point screening subunit is used for screening the optimal splitting point meeting the splitting point determining condition from the first splitting point set based on the splitting point determining condition aiming at the root node, and taking the screened optimal splitting point as the first optimal splitting point.
Wherein the first set of split points comprises M split points; M is a positive integer; the M split points comprise a split point F_a; a is less than or equal to M.
The split point screening subunit comprises:
a loss reduction parameter determining subunit, configured to obtain a splitting point determining condition for a root node, and obtain a node loss reduction parameter corresponding to the root node;
a first split point acquisition subunit, configured to acquire a split point F_a from the first split point set, and divide the sample advertisements in the root node based on the splitting condition indicated by the split point F_a to obtain initial child nodes of the root node; the initial child nodes comprise a first initial node and a second initial node;
a first parameter acquisition subunit, configured to acquire a first feature loss reduction parameter associated with the split point F_a;
the first comparison subunit is used for comparing the first characteristic loss reduction parameter and the node loss reduction parameter;
a first determination subunit, configured to determine, if the node loss reduction parameter is less than or equal to the first feature loss reduction parameter, that the split point F_a meets the split point determination condition, and take the split point F_a as the first best split point.
Wherein the total probability distribution indicated by the root node comprises a total shape parameter and a total slope parameter; the first sub-probability distribution indicated by the first initial node includes a first shape parameter and a first slope parameter; the second sub-probability distribution indicated by the second initial node includes a second shape parameter and a second slope parameter;
The first parameter acquisition subunit is further specifically configured to:
respectively counting the total number of samples of the sample advertisements in the root node, the first sample number of the sample advertisements in the first initial node and the second sample number of the sample advertisements in the second initial node;
determining a total node loss of the root node based on the total shape parameter, the total slope parameter, and the node loss determination rule;
determining a first node loss for the first initial node based on the first shape parameter, the first slope parameter, and the node loss determination rule;
determining a second node loss for the second initial node based on the second shape parameter, the second slope parameter, and the node loss determination rule;
acquiring a feature loss reduction rule, and determining, based on the total number of samples, the first sample number, the second sample number, the total node loss, the first node loss, the second node loss and the feature loss reduction rule, the first feature loss reduction parameter associated with the split point F_a.
Wherein the split point screening subunit further comprises:
a second split point obtaining subunit, configured to determine, if the node loss reduction parameter is greater than the first feature loss reduction parameter, that the split point F_a does not meet the split point determination condition, and obtain a split point F_{a+1} from the first split point set;
a second determination subunit, configured to take the split point F_{a+1} as the first best split point if a+1 is equal to M.
Wherein the split point screening subunit further comprises:
a second parameter acquisition subunit, configured to, if a+1 is smaller than M, divide the sample advertisements in the root node based on the splitting condition indicated by the split point F_{a+1} to obtain a second feature loss reduction parameter associated with the split point F_{a+1};
the second comparison subunit is used for comparing the second characteristic loss reduction parameter and the node loss reduction parameter;
a third determination subunit, configured to determine, if the node loss reduction parameter is less than or equal to the second feature loss reduction parameter, that the split point F_{a+1} meets the split point determination condition, and take the split point F_{a+1} as the first best split point.
Wherein the first dividing unit includes:
a sample advertisement obtaining subunit, configured to obtain a sample advertisement i from the root node; i is a positive integer less than or equal to N;
a first dividing subunit, configured to divide the sample advertisement i to a first type node if the sample advertisement i meets a splitting condition indicated by a first optimal splitting point;
the second dividing subunit is configured to divide the sample advertisement i to a second type node if the sample advertisement i does not meet the splitting condition indicated by the first optimal splitting point; the first type node and the second type node are both child nodes of the root node.
Wherein, this iterative training module includes:
a tree convergence condition acquisition unit configured to acquire a tree convergence condition associated with the initial probability boosting tree; the tree convergence condition includes a distribution error threshold;
a sample probability distribution determining unit configured to determine a sample conversion data probability distribution based on sample advertisement conversion data of each of the N sample advertisements;
the prediction probability distribution determining unit is used for determining the distribution error of the initial probability lifting tree based on the sample conversion data probability distribution and the prediction conversion data probability distribution when the prediction conversion data probability distribution output by the initial probability lifting tree is obtained;
and the first target tree determining unit is used for determining that the initial probability lifting tree meets the tree convergence condition when the distribution error is smaller than or equal to the distribution error threshold value, and taking the initial probability lifting tree meeting the tree convergence condition as a target probability lifting tree for predicting the conversion data probability distribution of the target advertisement.
Wherein, this iterative training module still includes:
the condition unsatisfied unit is used for determining that the initial probability lifting tree does not meet the tree convergence condition when the distribution error is larger than the distribution error threshold value;
The second target tree determining unit is used for adjusting the tree parameters of the initial probability lifting tree, taking the adjusted initial probability lifting tree as a transition probability lifting tree, and taking the transition probability lifting tree meeting the tree convergence condition as a target probability lifting tree for predicting the conversion data probability distribution of the target advertisement when the transition probability lifting tree meets the tree convergence condition.
An aspect of an embodiment of the present application provides a data processing apparatus, including:
the delivery request acquisition module is used for acquiring a delivery feature request for the target advertisement; the delivery feature request includes target advertisement attribute features of the target advertisement;
the probability distribution determining module is used for acquiring a target probability lifting tree associated with the target advertisement, inputting the attribute characteristics of the target advertisement into the target probability lifting tree, and outputting the conversion data probability distribution of the target advertisement by the target probability lifting tree; the target probability lifting tree is obtained after iterative training of the initial probability lifting tree based on the tree convergence condition and sample advertisement conversion data of each sample advertisement in the N sample advertisements; the N sample advertisements are used for constructing a root node of the initial probability lifting tree; the initial probability lifting tree is obtained by dividing N sample advertisements based on the splitting condition indicated by the first optimal splitting point; the first best split point is determined based on sample advertisement attribute characteristics of each sample advertisement in the root node; n is a positive integer;
And the target conversion data determining module is used for determining target advertisement conversion data of the target advertisement based on the conversion data probability distribution of the target advertisement.
In one aspect, a computer device is provided, including: a processor and a memory;
the processor is connected to the memory, wherein the memory is configured to store a computer program, and when the computer program is executed by the processor, the computer device is caused to execute the method provided in the embodiment of the application.
In one aspect, the present application provides a computer readable storage medium storing a computer program adapted to be loaded and executed by a processor, so that a computer device having the processor performs the method provided in the embodiments of the present application.
In one aspect, the present application provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the method provided in the embodiments of the present application.
In this embodiment of the present application, the computer device may determine, according to N sample data pairs corresponding to the obtained N sample advertisements, a root node for constructing an initial probability promotion tree, and further may determine, by using sample advertisement attribute features of each sample advertisement in the root node, a first best splitting point, and divide sample advertisements in the root node, so as to obtain the initial probability promotion tree. Further, the computer device may iteratively train the initial probability-enhancing tree to obtain a target probability-enhancing tree for predicting a conversion data probability distribution (e.g., a conversion cost probability distribution) of the target advertisement. The whole process does not need to consume a large amount of calculation resources like the traditional probability prediction method to calculate the related parameters (such as covariance matrix) among all sample data, but adopts a target probability lifting tree to rapidly predict the probability distribution of the converted data of the target advertisement, so that the calculation time is reduced, and the prediction speed of the probability distribution is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic structural diagram of a network architecture according to an embodiment of the present application;
FIG. 2 is a scene graph for training a probability boosting tree provided by an embodiment of the present application;
FIG. 3 is a schematic flow chart of a data processing method according to an embodiment of the present application;
FIG. 4 is a schematic view of a scenario for constructing an initial probability-promotion tree according to an embodiment of the present application;
FIG. 5 is a schematic view of a scenario for iterative training of an initial probability-enhancing tree according to an embodiment of the present application;
FIG. 6 is a schematic flow chart of a data processing method according to an embodiment of the present application;
FIG. 7a is a schematic diagram of a comparison of a predicted transition data probability distribution and an actual transition data probability distribution determined using a target probability lifting tree according to an embodiment of the present application;
FIG. 7b is a schematic diagram of a comparison of a predicted transition data probability distribution and an actual transition data probability distribution determined using a target probability lifting tree according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a scenario for determining target conversion data according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a computer device provided in an embodiment of the present application;
FIG. 12 is a schematic diagram of a data processing system according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a network architecture according to an embodiment of the present application. As shown in fig. 1, the network architecture may include a server 10F and a cluster of user terminals. The cluster of user terminals may comprise one or more user terminals. As shown in fig. 1, the user terminal cluster may specifically include a user terminal 100a, a user terminal 100b, user terminals 100c, …, and a user terminal 100n. As shown in fig. 1, the user terminals 100a, 100b, 100c, …, 100n may respectively perform network connection with the server 10F, so that each user terminal may perform data interaction with the server 10F through the network connection. The network connection is not limited to a connection manner, and may be directly or indirectly connected through a wired communication manner, may be directly or indirectly connected through a wireless communication manner, or may be other manners, which is not limited herein.
Wherein each user terminal in the user terminal cluster may include: smart terminals with a tree training function such as smart phones, tablet computers, notebook computers, desktop computers, smart speakers, smart watches, vehicle-mounted terminals, smart televisions and the like. It should be appreciated that each user terminal in the user terminal cluster shown in fig. 1 may be provided with a target application (i.e. application client), which may interact with the server 10F shown in fig. 1, respectively, when the application client is running in each user terminal. The application clients may include, among other things, social clients, multimedia clients (e.g., video clients), entertainment clients (e.g., game clients), educational clients, live clients, and the like. The application client may be an independent client, or may be an embedded sub-client integrated in a client (for example, a social client, an educational client, and a multimedia client), which is not limited herein.
As shown in fig. 1, the server 10F in the embodiment of the present application may be a server corresponding to the application client. The server 10F may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. The number of the user terminals and the servers is not limited in the embodiment of the application.
For ease of understanding, the embodiment of the present application may select one user terminal from the plurality of user terminals shown in fig. 1 as the target user terminal. For example, the embodiment of the present application may use the user terminal 100a shown in fig. 1 as a target user terminal, where a target application (i.e., an application client) may be integrated. At this time, the target user terminal may implement data interaction between the service data platform corresponding to the application client and the server 10F. Where the target application may run a probability-promotion tree that has been trained (i.e., a target probability-promotion tree) that may be used to predict the transformed data probability distribution of the target advertisement (i.e., the advertisement to be predicted). It should be appreciated that the conversion data probability distribution may be used to provide a reference for the advertiser to control the conversion data to facilitate subsequent determination of the final conversion data for the targeted advertisement based on the conversion data probability distribution, which is of great significance to advertising marketing, financial management. The conversion data probability distribution refers to a probability law for expressing the value of a random variable (namely conversion data). The probability of an event indicates how likely a result will occur in a test.
It should be appreciated that the embodiments of the present application propose a distribution estimation method based on probability lifting trees, which may relate to the machine learning direction in the field of artificial intelligence. It is understood that artificial intelligence (AI) is a theory, method, technique and application system that uses digital computers or digital computer-controlled machines to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
The artificial intelligence technology is a comprehensive subject, and relates to the technology with wide fields, namely the technology with a hardware level and the technology with a software level. Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning, automatic driving, intelligent traffic and other directions.
Among them, machine Learning (ML) is a multi-domain interdisciplinary, and involves multiple disciplines such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, etc. It is specially studied how a computer simulates or implements learning behavior of a human to acquire new knowledge or skills, and reorganizes existing knowledge structures to continuously improve own performance. Machine learning is the core of artificial intelligence, a fundamental approach to letting computers have intelligence, which is applied throughout various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, confidence networks, reinforcement learning, transfer learning, induction learning, teaching learning, and the like.
It can be appreciated that the lifting (boosting) method that uses decision trees as basis functions may be referred to as a lifting tree in the embodiments of the present application. The lifting tree is considered one of the best-performing methods in statistical learning. A decision tree is a decision analysis method that, given the known probabilities of occurrence of various scenarios, constructs a decision tree to obtain the probability that the expected net present value is greater than or equal to zero, thereby evaluating project risk and judging project feasibility; it is a graphical method for intuitively applying probability analysis.
It should be appreciated that the application scenario involved in the target probability promotion tree may include: application scenario 1 (e.g., a conversion cost distribution forecast scenario) and application scenario 2 (e.g., a default asset distribution forecast scenario). When training the target probability lifting tree, the computer equipment needs to acquire N sample advertisements and sample data pairs corresponding to the N sample advertisements respectively. Where N is a positive integer, one sample advertisement corresponds to one sample data pair, and one sample data pair is determined by the sample advertisement attribute feature and the sample advertisement conversion data. The embodiments of the present application may refer to advertisement attribute features (e.g., optimization objectives, advertisement types, and advertiser industries, etc.) of the sample advertisements as sample advertisement attribute features, and conversion data (e.g., advertisement conversion costs or advertisement default assets, etc.) of the sample advertisements as sample conversion data.
Further, the computer device may determine a root node for constructing the initial probability promotion tree based on the N sample advertisements, and may further determine a first best split point through sample advertisement attribute features of each sample advertisement in the root node, and divide the sample advertisements in the root node according to the first best split point, thereby obtaining the initial probability promotion tree. At this time, the computer device may perform iterative training on the initial probability boosting tree, and may further obtain a target probability boosting tree when training is completed. This means that the computer device can directly use the target probability promotion tree without consuming a large amount of computing resources, and rapidly predict the probability distribution of the conversion data of the target advertisement, so that the prediction speed of the probability distribution is increased.
For ease of understanding, further, please refer to fig. 2, fig. 2 is a scene diagram for training a probability boosting tree according to an embodiment of the present application. As shown in fig. 2, the computer device in the embodiment of the present application may be a computer device with a tree promotion training function, where the computer device may be any one of the user terminals in the user terminal cluster shown in fig. 1, for example, the user terminal 100a, and the computer device may also be the server 10F shown in fig. 1, and the computer device will not be limited herein.
It is understood that the computer device may obtain N sample advertisements and N sample data pairs corresponding to the N sample advertisements; N is a positive integer; one sample advertisement corresponds to one sample data pair, and one sample data pair is determined by the sample advertisement attribute feature and the sample advertisement conversion data. As shown in fig. 2, the number of sample advertisements in the embodiment of the present application is exemplified by 5, and the 5 sample advertisements may specifically include sample advertisement 1, sample advertisement 2, sample advertisement 3, sample advertisement 4, and sample advertisement 5. The sample data pair corresponding to sample advertisement 1 is sample data pair S_1; the sample data pair corresponding to sample advertisement 2 is sample data pair S_2; the sample data pair corresponding to sample advertisement 3 is sample data pair S_3; the sample data pair corresponding to sample advertisement 4 is sample data pair S_4; and the sample data pair corresponding to sample advertisement 5 is sample data pair S_5.
Wherein the computer device may determine a root node (e.g., the initial value shown in fig. 2) for constructing an initial probability promotion tree based on the 5 sample advertisements. Further, the computer device may determine a first set of split points that matches the feature type of the initial attribute features by using the sample advertisement attribute feature of each sample advertisement in the root node as the initial attribute feature, determine a first best split point from the first set of split points, and divide the 5 sample advertisements according to the splitting condition indicated by the first best split point to obtain an initial probability promotion tree (e.g., tree 1 shown in fig. 2). The sample advertisements divided into the first leaf node of tree 1 may include sample advertisement 1, sample advertisement 2 and sample advertisement 3, and the sample advertisements divided into the second leaf node of tree 1 may include sample advertisement 4 and sample advertisement 5.
For ease of understanding, further, please refer to table 1, table 1 is a sample probability distribution parameter calculation process table before simplification provided in the embodiment of the present application. As shown in table 1:
TABLE 1
[Table 1 is reproduced as an image in the original publication.]
In Table 1, p_{j,l} may represent the probability distribution of the l-th leaf node in the j-th tree; for example, p_{1,1} refers to the probability distribution of the 1st leaf node in the 1st tree. C may refer to a normalization parameter, where the first subscript of C represents the current tree (i.e., the iteration number) in which sample advertisement i is located, the second subscript represents the position of the leaf node of the previous tree for sample advertisement i, and the third subscript represents the position of the leaf node of the current tree for sample advertisement i.
Further, the computer device may obtain a tree convergence condition associated with the initial probability-promoting tree. Wherein, the tree convergence condition can be used for indicating the number of iterative training, namely the number of trees contained in the target probability lifting tree; alternatively, the tree convergence condition may be used to indicate that the distribution error of the target probability-promotion tree is less than a preset distribution error threshold (e.g., 0.3), and the tree convergence condition will not be limited herein.
Further, the computer device may iteratively train an initial probability-boosting tree (e.g., probability-boosting tree 20T including tree 1) based on the tree convergence condition and sample advertisement conversion data for each of the 5 sample advertisements, resulting in a target probability-boosting tree for predicting a conversion data probability distribution for the target advertisement.
For example, the computer device needs to determine, based on the tree convergence condition, whether the probability-promoting tree 20T including the tree 1 is a target probability-promoting tree for predicting the conversion data probability distribution of the target advertisement, and if the probability-promoting tree 20T including the tree 1 does not satisfy the tree convergence condition, that is, it determines that the probability-promoting tree 20T including the tree 1 is not the target probability-promoting tree, at which time the computer device needs to perform a second iteration on the probability-promoting tree 20T, that is, construct the tree 2 on the basis of the tree 1 to obtain the probability-promoting tree including the tree 1 and the tree 2 (for example, a new probability-promoting tree 20T). At this time, the computer device needs to determine whether the new probability-promotion tree 20T is a target probability-promotion tree for predicting the conversion data probability distribution of the target advertisement further based on the tree convergence condition, and so on until the probability-promotion tree 20T after the iterative training satisfies the tree convergence condition, taking the probability-promotion tree 20T after the iterative training as the target probability-promotion tree.
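As an aid to understanding, the round-by-round iteration just described can be sketched in Python as follows; build_tree, distribution_error and the convergence object are illustrative placeholders rather than interfaces defined by the patent.

```python
def train_probability_lifting_tree(samples, convergence, max_trees=50):
    # Keep appending trees (tree 1, tree 2, ...) until the tree convergence
    # condition holds, then return the target probability lifting tree.
    trees = []
    while True:
        trees.append(build_tree(samples, prior_trees=trees))
        error = distribution_error(trees, samples)  # predicted vs. sample distribution
        if error <= convergence.error_threshold or len(trees) >= max_trees:
            return trees
```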
Therefore, the computer equipment in the embodiment of the application does not need to consume a large amount of computing resources when training the initial probability lifting tree model, so that the computing amount of probability prediction can be greatly reduced. In addition, the target probability lifting tree obtained by training the computer equipment can be used for rapidly predicting the conversion data probability distribution of the target advertisement, so that the prediction speed of the probability distribution is improved.
The computer device with the lifting tree training function performs iterative training on the initial probability lifting tree through N sample data pairs corresponding to N sample advertisements, so as to obtain a specific implementation manner of the target probability lifting tree, which can be seen in the embodiments corresponding to the following fig. 3-8.
Further, referring to fig. 3, fig. 3 is a flow chart of a data processing method according to an embodiment of the present application. As shown in fig. 3, the method may be performed by a computer device having a tree promotion training function, and the computer device may be a user terminal (e.g., any one of the user terminals in the user terminal cluster shown in fig. 1, e.g., the user terminal 100 a) or a server (e.g., the server 10F shown in fig. 1), which is not limited herein. For easy understanding, the embodiment of the application is described by taking the method performed by a server with a function of training a promotion tree as an example, and the method at least includes the following steps S101 to S103:
step S101, N sample data pairs corresponding to N sample advertisements are obtained.
Specifically, in order to quickly and flexibly estimate the complete probability distribution, the computer device with the function of promoting tree training may obtain N sample advertisements from the advertisements that are put in historically, and N sample data pairs corresponding to the N sample advertisements. Where N may be a positive integer, one sample advertisement corresponds to one sample data pair, and the one sample data pair is determined by the sample advertisement attribute feature and the sample advertisement conversion data.
For ease of understanding, further, please refer to table 2, table 2 is a sample data table provided in the embodiments of the present application. Wherein, the table 2 may include N sample advertisements, and one sample data pair corresponding to one sample advertisement. For example, the N sample advertisements may include sample advertisement i, and the sample data pair corresponding to sample advertisement i may be (x i ,y i ). Wherein i is a positive integer less than or equal to N; x is x i May refer to the sample advertisement attribute feature, y, of the sample advertisement i i May refer to sample ad conversion data (e.g., conversion costs) for the sample ad i. As shown in table 2:
TABLE 2
[Table 2 is reproduced as an image in the original publication.]
The advertisement attribute features of the advertisement may specifically include the following attribute features: a first attribute feature (e.g., optimization objective), a second attribute feature (e.g., advertisement type), a third attribute feature (e.g., advertiser industry). For example, optimization objectives may include feature 1 (e.g., targeting downloads), feature 2 (e.g., targeting activations), feature 3 (e.g., targeting secondary reservations), feature 4 (e.g., targeting payment assets), and feature 5 (e.g., targeting trust), among others. Advertisement types may include feature 6 (e.g., operating system a download), feature 7 (e.g., operating system B download), feature 8 (e.g., merchandise promotion), and feature 9 (mini-game promotion), among others. Wherein, the operating system A and the operating system B belong to two different operating systems. The advertiser industry may include feature 10 (e.g., the electronics industry), feature 11 (e.g., the education industry), feature 12 (e.g., the financial industry), and feature 13 (e.g., the travel industry), among others.
For example, the sample data pair for sample advertisement 1 shown in Table 2 may be (x_1, y_1), i.e., the sample advertisement attribute feature x_1 of sample advertisement 1 may include feature 1 (e.g., targeting downloads), feature 6 (e.g., operating system A download) and feature 13 (e.g., travel industry), and the sample advertisement conversion data y_1 of sample advertisement 1 may be 1000.
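For illustration only, one sample data pair can be represented as a simple Python structure; the field names below are assumptions that mirror the example of sample advertisement 1 in Table 2.

```python
# (x_1, y_1): attribute features plus conversion data for sample advertisement 1.
sample_advertisement_1 = {
    "features": {
        "optimization_objective": "targeting downloads",      # feature 1
        "advertisement_type": "operating system A download",  # feature 6
        "advertiser_industry": "travel",                       # feature 13
    },
    "conversion_data": 1000.0,  # y_1, e.g. the conversion cost
}
```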
Step S102, based on N sample advertisements, determining a root node for constructing an initial probability lifting tree, taking sample advertisement attribute characteristics of each sample advertisement in the root node as initial attribute characteristics, determining a first splitting point set conforming to the characteristic types of the initial attribute characteristics, determining a first optimal splitting point from the first splitting point set, and dividing the N sample advertisements according to splitting conditions indicated by the first optimal splitting point to obtain the initial probability lifting tree.
Specifically, the computer device may determine a root node for constructing an initial probability promotion tree based on the N sample advertisements, and may further use a sample advertisement attribute feature of each sample advertisement in the root node as an initial attribute feature, and may determine a first set of split points that matches a feature type of the initial attribute feature, and determine a first best split point from the first set of split points. Further, the computer device may divide the sample advertisements in the root node according to the splitting condition indicated by the first best splitting point, resulting in a first type node (e.g., left child node) and a second type node (e.g., right child node). Further, the computer device may use the first type node and the second type node as nodes to be split, respectively, and at the same time, the computer device may also obtain a split-cut condition associated with the initial probability boosting tree. Where the split cut-off condition is used to indicate the depth (e.g., 5) of the initial probability-promotion tree. Alternatively, the split-off condition here may also be used to indicate that the number of sample advertisements in the node to be split (i.e., the current node) is 1, i.e., the node to be split has failed to continue dividing, where the split-off condition will not be defined. When the node to be split meets the splitting cut-off condition, the computer device can construct the initial probability lifting tree based on the root node and the divided nodes. Optionally, when the node to be split does not meet the splitting cut-off condition, the computer device may use the sample advertisement attribute feature of each sample advertisement in the node to be split as the target attribute feature, and further may determine a second splitting point set conforming to the feature type of the target attribute feature, and determine a second optimal splitting point from the second splitting point set. Further, the computer device may divide the sample advertisement in the node to be split according to the splitting condition indicated by the second optimal splitting point until the divided node meets the splitting cut-off condition, and may further construct the initial probability promotion tree based on the root node and the divided node.
It should be appreciated that the embodiment of the present application may use the idea of a probability promotion tree to propose a systematic scheme for modeling probability distributions, where a single probability promotion tree is substantially identical in structure to a common decision tree: it is a binary tree, and non-leaf nodes can be partitioned based on the split conditions indicated by the split points. For example, a sample advertisement that satisfies a split condition may be partitioned to a first type node (e.g., a left child node), and a sample advertisement that does not satisfy the split condition may be partitioned to a second type node (e.g., a right child node). The probability promotion trees can adopt an ensemble learning idea, take the output of the leaf nodes in each round of training as a likelihood probability, and then apply a Bayesian formula for updating.
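A minimal sketch of such a binary tree node, and of how a sample advertisement is routed to the first type node (left child) or second type node (right child), is given below; the class layout and the equality-based split condition are illustrative assumptions, not the patented data structure.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TreeNode:
    split_feature: Optional[str] = None   # feature type used as the split point
    split_value: Optional[str] = None     # splitting condition indicated by the split point
    left: Optional["TreeNode"] = None     # first type node (samples meeting the condition)
    right: Optional["TreeNode"] = None    # second type node (samples not meeting it)
    k: float = 1.0                        # leaf shape parameter of the Gamma output
    theta: float = 1.0                    # leaf slope (scale) parameter of the Gamma output

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None

def route(node: TreeNode, features: dict) -> TreeNode:
    # Divide a sample advertisement downward until a leaf node is reached.
    while not node.is_leaf:
        node = node.left if features.get(node.split_feature) == node.split_value else node.right
    return node
```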
Specifically, if the set of samples acquired by the computer device is {(x_i, y_i) | i = 1, 2, …, N}, then the probability density function p(y_i | x_i) of sample advertisement i in a distribution prediction model (e.g., the model indicated by the target probability boosting tree containing T trees) is given by equation (1) below, and the related equation for ensemble learning is given by equation (2):

[Equations (1) and (2) are reproduced as images in the original publication.]

where p_{j,l} refers to the probability distribution of the l-th leaf node in the j-th tree; the T trees represent T rounds of iterative training, where T is a positive integer; C refers to the normalization parameter; l_j(x_i) denotes the leaf node to which sample advertisement i is partitioned in the j-th tree; and S_{j,l} denotes the sample set on the l-th leaf node in the j-th tree.
Wherein it is assumed that the output of the leaf nodes of the probability boosting tree obeys a certain parametric distribution, such as a Gaussian distribution or a Gamma distribution. Since, in the advertisement conversion scenario, the advertisement conversion data (for example, advertisement conversion cost or advertisement default assets) is generally a value greater than 0, and the domain of the Gamma distribution is 0 to positive infinity, the parametric distribution obeyed by the output of the leaf nodes of the probability boosting tree in the embodiments of the present application may take the Gamma distribution as an example.
It will be appreciated that the two-parameter Gamma distribution may be uniquely determined by the shape parameter (k) and the slope parameter (θ). The shape parameter here may be used to describe the shape of the probability distribution (i.e., to describe the shape of the curve), and the slope parameter may be used to describe the slope of the probability distribution (i.e., to describe the steepness of the curve). Thus, embodiments of the present application may write the probability density function of a node as p(y | k, θ), where y may be the advertisement conversion data.
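To make p(y | k, θ) concrete, the standard Gamma density with shape k and scale θ (the "slope parameter" above) can be evaluated as follows; this is ordinary textbook Gamma math shown for illustration, not code from the patent.

```python
import math

def gamma_pdf(y: float, k: float, theta: float) -> float:
    # p(y | k, theta) = y^(k-1) * exp(-y / theta) / (Gamma(k) * theta^k), for y > 0.
    if y <= 0:
        return 0.0
    return (y ** (k - 1.0)) * math.exp(-y / theta) / (math.gamma(k) * theta ** k)

# Example: density of a conversion cost of 1000 under a leaf with k = 2, theta = 600.
print(gamma_pdf(1000.0, 2.0, 600.0))
```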
Specifically, the node loss of node l determined by the computer device when performing the j-th round (i.e., the j-th tree) of iterative training on the initial probability boosting tree is given by formulas (3) to (11):

[Formulas (3) to (11), which apply the Gamma distribution density and are then obtained by simple rearrangement, are reproduced as images in the original publication.]

where Likelihood_{j-1} refers to the overall likelihood function during the (j-1)-th round of iterative training; p(y_i | k_{j-1,i}, θ_{j-1,i}) refers to the probability density function of sample advertisement i during the (j-1)-th round of iterative training, and N refers to the total number of sample advertisements; Loss_{j-1}(L_{j-1}) refers to the overall loss function during the (j-1)-th round of iterative training; Loss_{j-1}(S_l) refers to the loss function of node l in the (j-1)-th round of iterative training; S refers to the total number of sample advertisements in node l; ψ(·), the digamma function, is monotonically increasing for k > 0; η_1 refers to the learning rate for the shape parameter k; η_2 refers to the learning rate for the slope parameter θ; and Loss_j(S_l) refers to the node loss of node l in the j-th round of iterative training.
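Since formulas (3) to (11) are only available as images, the following is an illustrative stand-in for a node loss that is consistent with the likelihood-based description above: the negative log-likelihood of the node's sample conversion data under its Gamma(k, θ) distribution. The digamma term and the learning rates η_1, η_2 used in the patent's exact update are not reproduced here, so this sketch is an assumption rather than the patented formula.

```python
import numpy as np
from scipy.stats import gamma

def node_loss(conversion_values, k, theta):
    # Negative log-likelihood of the sample advertisements' conversion data in a node
    # under the node's Gamma(k, theta) output distribution (illustrative stand-in).
    y = np.asarray(conversion_values, dtype=float)
    return float(-np.sum(gamma.logpdf(y, a=k, scale=theta)))
```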
Specifically, the formula for determining the node loss reduction parameter corresponding to any one node (e.g., node l) by the computer device may be referred to as the following formula (12):
loss_reduce = Loss_j(S_l) - Loss_{j-1}(S_l)    (12)
here, the loss_reduce refers to a node loss reduction parameter of the node l, and an initial value of the node loss reduction parameter is 0.
Specifically, the computer device obtains a feature loss reduction rule (i.e., the formula used to determine the current feature loss reduction parameter tmp_loss_reduce associated with the split point F_a), which is given by formula (13):

[Formula (13) is reproduced as an image in the original publication.]

where |S_l| may represent the total number of sample advertisements in node l; |S_l^L| may represent the number of sample advertisements partitioned into the first type node (e.g., the left child node of node l divided based on the splitting condition indicated by the split point F_a); |S_l^R| may represent the number of sample advertisements partitioned into the second type node (e.g., the right child node of node l divided based on the splitting condition indicated by the split point F_a); Loss_j(S_l^L) may represent the node loss of the left child node of node l in the j-th round of iterative training; and Loss_j(S_l^R) may represent the node loss of the right child node of node l in the j-th round of iterative training.
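Formula (13) is likewise only available as an image; the sketch below shows one plausible, assumed form of the feature loss reduction rule that uses exactly the quantities listed above (the parent and child node losses weighted by the sample counts |S_l|, |S_l^L| and |S_l^R|). It is an illustration under that assumption, not the patented formula.

```python
def feature_loss_reduction(parent_loss, left_loss, right_loss,
                           n_parent, n_left, n_right):
    # tmp_loss_reduce for a candidate split point F_a: how much the sample-weighted
    # child losses fall below the parent node's loss (assumed form of formula (13)).
    return parent_loss - (n_left / n_parent) * left_loss - (n_right / n_parent) * right_loss
```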
It should be appreciated that, when determining the first best split point (i.e., the best split point for partitioning the sample advertisements in the root node), the computer device may determine the root node for constructing the initial probability boosting tree based on the N sample advertisements, and may then take the sample advertisement attribute feature of each sample advertisement in the root node as the initial attribute feature. At this time, the computer device may take the feature types of the initial attribute features as split points, perform deduplication processing on the split points, and add the deduplicated split points to the first split point set. For example, the set of split points associated with the root node (i.e., the first split point set) may be C = {F_1, F_2, F_3, …, F_M}, where M may be a positive integer.
Further, the computer device may screen out, from the first split point set, the best split point satisfying the split point determination condition for the root node, and may then take the screened-out best split point as the first best split point. The split point determination condition herein may be used to indicate that the current split point is the best split point when the node loss reduction parameter is less than or equal to the current feature loss reduction parameter (i.e., loss_reduce ≤ tmp_loss_reduce). At initialization, the node loss reduction parameter of the root node is an initial value (e.g., 0), and the best split point sequence number of the current root node is an initial value (e.g., -1).
It will be appreciated that, when obtaining the split point determination condition for the root node, the computer device may further obtain the node loss reduction parameter corresponding to the root node (i.e., the initial value 0). At this time, the computer device may obtain the first split point (e.g., split point F_a, where a is less than or equal to M) from the first split point set, and may divide the sample advertisements in the root node based on the split condition indicated by split point F_a, so as to obtain the initial child nodes of the root node. The initial child nodes herein may include a first initial node (e.g., the left child node of the root node divided based on the split condition indicated by split point F_a) and a second initial node (e.g., the right child node of the root node divided based on the split condition indicated by split point F_a).
Further, the computer device obtains the current feature loss reduction parameter associated with split point F_a (i.e., the first feature loss reduction parameter). In this embodiment of the present application, the shape parameter in the probability distribution indicated by the root node (i.e., the total probability distribution) may be referred to as the total shape parameter, and the slope parameter in the probability distribution indicated by the root node may be referred to as the total slope parameter; the shape parameter in the probability distribution indicated by the first initial node of the root node (i.e., the first sub-probability distribution) is referred to as the first shape parameter, and the slope parameter in the first sub-probability distribution is referred to as the first slope parameter; the shape parameter in the probability distribution indicated by the second initial node of the root node (i.e., the second sub-probability distribution) is referred to as the second shape parameter, and the slope parameter in the second sub-probability distribution is referred to as the second slope parameter.
At this time, the computer device may respectively count the total number of samples of the sample advertisements in the root node, the first number of samples of the sample advertisements in the first initial node, and the second number of samples of the sample advertisements in the second initial node. Further, the computer device may determine the total node loss of the root node based on the total shape parameter, the total slope parameter, and the node loss determination rule shown in the above equation (11). Meanwhile, the computer device may determine the first node loss of the first initial node based on the first shape parameter, the first slope parameter, and the node loss determination rule shown in the above equation (11), and may determine the second node loss of the second initial node based on the second shape parameter, the second slope parameter, and the node loss determination rule shown in the above equation (11). Further, the computer device may obtain the feature loss reduction rule shown in the above formula (13), and determine the first feature loss reduction parameter associated with split point F_a based on the total number of samples, the first number of samples, the second number of samples, the total node loss, the first node loss, the second node loss, and the feature loss reduction rule.
It should be appreciated that the computer device may compare the first feature loss reduction parameter with the node loss reduction parameter. If the node loss reduction parameter is less than or equal to the first feature loss reduction parameter, the computer device may determine that split point F_a satisfies the split point determination condition, and may take split point F_a as the first best split point (i.e., the current best split point), whose sequence number is a. Alternatively, if the node loss reduction parameter is greater than the first feature loss reduction parameter, the computer device may determine that split point F_a does not satisfy the split point determination condition, at which time the computer device may obtain the next split point after split point F_a (e.g., split point F_{a+1}) from the first split point set. Since the first split point set contains split points capable of effectively reducing the node loss of the root node, in order to improve the efficiency of determining the best split point, the computer device may determine whether split point F_{a+1} belongs to the last split point in the first split point set, i.e., whether a+1 is equal to M, so as to quickly determine the first best split point for partitioning the sample advertisements in the root node.
If a+1 is equal to M, the computer device may determine that split point F_{a+1} belongs to the last split point in the first split point set. In this case the computer device does not have to determine whether the current feature loss reduction parameter corresponding to split point F_{a+1} satisfies the split point determination condition, but may directly take split point F_{a+1} as the first best split point (i.e., the current best split point), whose sequence number is a+1. Alternatively, if a+1 is less than M, the computer device may re-partition the sample advertisements in the root node based on the split condition indicated by split point F_{a+1}, so as to obtain new initial child nodes, where the new initial child nodes herein may include a new first initial node (e.g., the left child node of the root node divided based on the split condition indicated by split point F_{a+1}) and a new second initial node (e.g., the right child node of the root node divided based on the split condition indicated by split point F_{a+1}). Further, the computer device may obtain the current feature loss reduction parameter associated with split point F_{a+1} (i.e., the second feature loss reduction parameter) according to the above formula (13), and may then compare the second feature loss reduction parameter with the node loss reduction parameter. If the node loss reduction parameter is less than or equal to the second feature loss reduction parameter, the computer device may determine that split point F_{a+1} satisfies the split point determination condition, and may then take split point F_{a+1} as the first best split point (i.e., the current best split point), whose sequence number is a+1. For the specific embodiment of determining the second feature loss reduction parameter, reference may be made to the specific embodiment of determining the first feature loss reduction parameter, which will not be described in detail herein.
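Read together, the above paragraphs describe a linear scan over the candidate split points. A minimal sketch of that scan follows (Python); the evaluate_split callable, assumed to return the current feature loss reduction parameter of a candidate split according to formula (13), is illustrative only.

```python
def find_best_split(split_points, evaluate_split, loss_reduce=0.0):
    """Scan the candidate split points of a node and return the sequence number
    of the first split point whose feature loss reduction parameter is at least
    the node loss reduction parameter; the last remaining candidate is accepted
    directly without being evaluated."""
    best_index = -1                           # initial best split point sequence number
    M = len(split_points)
    for a, split_point in enumerate(split_points):
        if a > 0 and a == M - 1:
            return a                          # last candidate: accept directly
        tmp_loss_reduce = evaluate_split(split_point)
        if loss_reduce <= tmp_loss_reduce:
            return a                          # split point determination condition satisfied
    return best_index
```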
When the first best split point is determined, the computer device may divide the sample advertisements in the root node according to the split condition indicated by the first best split point, so as to obtain the first type node and the second type node. The first type node refers to the left child node of the root node divided based on the split condition indicated by the first best split point; the second type node refers to the right child node of the root node divided based on the split condition indicated by the first best split point.
Specifically, the computer device obtains a sample advertisement i from the root node, where i may be a positive integer less than or equal to N. If the sample advertisement i satisfies the split condition indicated by the first best split point, the computer device may divide the sample advertisement i into the first type node. Alternatively, if the sample advertisement i does not satisfy the split condition indicated by the first best split point, the computer device may divide the sample advertisement i into the second type node. Both the first type node and the second type node are child nodes of the root node.
Further, the computer device takes the first type node and the second type node as nodes to be split respectively, and further can acquire a split cut-off condition associated with the initial probability lifting tree. When the node to be split meets the splitting cut-off condition, the computer equipment can construct an initial probability lifting tree directly based on the root node and the divided nodes. Optionally, when the node to be split does not meet the splitting cut-off condition, the computer device may use the sample advertisement attribute feature of each sample advertisement in the node to be split as the target attribute feature, and further may determine a second splitting point set conforming to the feature type of the target attribute feature, and determine a second optimal splitting point from the second splitting point set. Further, the computer device may divide the sample advertisement in the node to be split according to the splitting condition indicated by the second optimal splitting point until the divided node meets the splitting cut-off condition, at which time the computer device may construct an initial probability promotion tree based on the root node and the divided node.
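The recursive construction just described could look roughly as follows; this is a Python sketch under the assumptions that a choose_split helper (e.g., built on the scan sketched earlier) and a satisfies(ad, split_point) predicate are supplied by the caller, and that the split cut-off condition is the single-advertisement rule mentioned in this embodiment (the tree-depth variant is not shown).

```python
class TreeNode:
    def __init__(self, sample_ads):
        self.sample_ads = sample_ads      # sample advertisements gathered in this node
        self.split_point = None
        self.left = None                  # first type node
        self.right = None                 # second type node

def build_tree(sample_ads, choose_split, satisfies, min_samples=1):
    """Recursively partition the sample advertisements until the split
    cut-off condition (here: at most `min_samples` advertisements) is met."""
    node = TreeNode(sample_ads)
    if len(sample_ads) <= min_samples:
        return node                       # node to be split meets the cut-off condition
    node.split_point = choose_split(sample_ads)   # best split point of this node
    left = [ad for ad in sample_ads if satisfies(ad, node.split_point)]
    right = [ad for ad in sample_ads if not satisfies(ad, node.split_point)]
    if not left or not right:
        return node                       # split produced an empty child: stop here
    node.left = build_tree(left, choose_split, satisfies, min_samples)
    node.right = build_tree(right, choose_split, satisfies, min_samples)
    return node
```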
For ease of understanding, further, please refer to fig. 4, fig. 4 is a schematic diagram of a scenario for constructing an initial probability-enhancing tree according to an embodiment of the present application. As shown in fig. 4, the computer device in the embodiment of the present application may be a computer device with a tree promotion training function, where the computer device may be any one of the user terminals in the user terminal cluster shown in fig. 1, for example, the user terminal 100a, and the computer device may also be the server 10F shown in fig. 1, and the computer device will not be limited herein.
As shown in fig. 4, for convenience of explanation, the N sample advertisements obtained by the computer device when constructing the initial probability lifting tree may be taken as 5 in the embodiment of the present application, and the 5 sample advertisements may specifically include sample advertisement 1, sample advertisement 2, sample advertisement 3, sample advertisement 4, and sample advertisement 5. One sample advertisement corresponds to one sample data pair, and the sample data pair is determined by the sample advertisement attribute feature and the sample advertisement conversion data.
It will be appreciated that the computer device may determine a root node (e.g., node J_1) for constructing the initial probability lifting tree based on the 5 sample advertisements. For example, the computer device may obtain the sample advertisement conversion data of each of the 5 sample advertisements, and may determine, based on the obtained 5 pieces of sample advertisement conversion data, the total probability distribution indicated by the 5 sample advertisements at the root node. At this time, the computer device in fig. 4 may take node J_1 as a parent node, and may then determine a first best split point according to the advertisement attribute features of the sample advertisements in node J_1, so as to obtain the first type node and the second type node of node J_1. Both the first type node and the second type node are child nodes of node J_1.
It should be appreciated that the computer device may take the sample advertisement attribute feature of each sample advertisement in node J_1 as an initial attribute feature, take the feature types of the initial attribute features as split points, and perform deduplication processing on the split points so as to add the deduplicated split points to the first split point set. For example, the split point set associated with node J_1 (i.e., the first split point set) may be C_1 = {F_1, F_2, F_3, …, F_M}, where M may be a positive integer. In this embodiment, M may be taken as 4 by way of example, and the first split point set may specifically include split point F_1 (e.g., targeted download), split point F_2 (e.g., mini-game promotion), split point F_3 (e.g., e-commerce industry), and split point F_4 (e.g., operating system A download).
Further, the computer device may screen out, from the first split point set, the split point satisfying the split point determination condition for node J_1, and may take the screened-out best split point as the first best split point. For example, the computer device may first obtain split point F_1 from the first split point set, and may then divide each sample advertisement in node J_1 in turn based on the split condition indicated by split point F_1 shown in fig. 4, so as to obtain the initial child nodes of node J_1. For example, if sample advertisement 1 in the root node satisfies the split condition indicated by split point F_1, sample advertisement 1 is divided into the first initial node (e.g., the left child node of node J_1 divided based on the split condition indicated by split point F_1). Alternatively, if sample advertisement 2 in the root node does not satisfy the split condition indicated by split point F_1, sample advertisement 2 is divided into the second initial node (e.g., the right child node of node J_1 divided based on the split condition indicated by split point F_1), and so on. After the division is completed, the computer device may obtain the current feature loss reduction parameter associated with split point F_1 (i.e., the first feature loss reduction parameter) based on the feature loss reduction rule shown in the above formula (13).
If the node loss reduction parameter (e.g., the initial value 0) is greater than the first feature loss reduction parameter, the computer device may determine that split point F_1 does not satisfy the split point determination condition. At this time, the computer device may continue to obtain the next split point after split point F_1 (e.g., split point F_2) from the first split point set, and may divide each sample advertisement in node J_1 again in turn based on the split condition indicated by split point F_2, so as to obtain new initial child nodes of node J_1. The new initial child nodes of node J_1 may include a new first initial node (e.g., the left child node of node J_1 divided based on the split condition indicated by split point F_2) and a new second initial node (e.g., the right child node of node J_1 divided based on the split condition indicated by split point F_2). In the same way, the computer device may obtain the current feature loss reduction parameter associated with split point F_2 (i.e., the second feature loss reduction parameter) based on the feature loss reduction rule shown in the above formula (13).
If the node loss reduction parameter is greater than the second feature loss reduction parameter, the computer device may determine that split point F_2 does not satisfy the split point determination condition either. At this point the computer device still needs to obtain the next split point after split point F_2 (e.g., split point F_3) from the first split point set, and divide each sample advertisement in node J_1 again in turn based on the split condition indicated by split point F_3, so as to obtain target child nodes. The target child nodes of node J_1 may include a first target node (e.g., the left child node of node J_1 divided based on the split condition indicated by split point F_3) and a second target node (e.g., the right child node of node J_1 divided based on the split condition indicated by split point F_3). Similarly, the computer device may obtain the current feature loss reduction parameter associated with split point F_3 (i.e., the third feature loss reduction parameter) based on the feature loss reduction rule shown in the above formula (13).
When the node loss reduction parameter is less than or equal to the third feature loss reduction parameter, the computer device may determine that split point F_3 satisfies the split point determination condition, and may then take split point F_3 as the first best split point of node J_1. At this time, the computer device may refer to the first target node of node J_1 divided based on the split condition indicated by split point F_3 (e.g., node J_2 shown in fig. 4) as the first type node of node J_1, and refer to the second target node of node J_1 divided based on the split condition indicated by split point F_3 (e.g., node J_3 shown in fig. 4) as the second type node of node J_1.
Further, the computer device may take node J_2 and node J_3 respectively as nodes to be split, and may then obtain the split cut-off condition associated with the probability lifting tree 40T to determine whether to continue dividing the nodes to be split. For example, the split cut-off condition may be used to indicate that the number of sample advertisements in the node to be split (i.e., the current node) is 1, i.e., the node to be split can no longer be divided.
As shown in fig. 4, the sample advertisements in node J_2 may include sample advertisement 4 and sample advertisement 5; the sample advertisements in node J_3 may include sample advertisement 1, sample advertisement 2, and sample advertisement 3. For both node J_2 and node J_3, the number of sample advertisements is not 1; therefore, the computer device may determine that neither node J_2 nor node J_3 satisfies the split cut-off condition, i.e., node J_2 and node J_3 are respectively taken as new parent nodes so as to continue dividing them.
For any one node to be split (e.g., node J_2), the computer device may take the sample advertisement attribute feature of each sample advertisement in node J_2 as a target attribute feature, and may then determine a second split point set conforming to the feature type of the target attribute feature. For example, the split point set associated with node J_2 (i.e., the second split point set) may be C_2 = {F_1, F_2, F_4}; the second split point set may specifically include split point F_1 (e.g., targeted download), split point F_2 (e.g., mini-game promotion), and split point F_4 (e.g., operating system A download). At this time, the computer device may determine a second best split point (e.g., split point F_2) from the second split point set with reference to the above embodiment for determining the first best split point, and may then divide the sample advertisements in node J_2 according to the split condition indicated by the second best split point of node J_2, so as to obtain a new first type node (e.g., node J_4 shown in fig. 4) and a new second type node (e.g., node J_5 shown in fig. 4). Both node J_4 and node J_5 are child nodes of node J_2. Since the sample advertisements in node J_4 include sample advertisement 4 and the sample advertisements in node J_5 include sample advertisement 5, it can be determined that node J_4 and node J_5 both satisfy the split cut-off condition, and there is no need to divide node J_4 and node J_5 further.
Similarly, when the computer device determines that the second best split point of node J_3 is split point F_2, it may divide the sample advertisements in node J_3 according to the split condition indicated by the second best split point of node J_3, so as to obtain a new first type node (e.g., node J_6 shown in fig. 4) and a new second type node (e.g., node J_7 shown in fig. 4). Both node J_6 and node J_7 are child nodes of node J_3. If the split cut-off condition at this time is used to indicate the depth of the constructed initial probability lifting tree (e.g., 3), the computer device may determine that node J_6 and node J_7 both satisfy the split cut-off condition, and there is no need to divide node J_6 and node J_7 further. As shown in fig. 4, the sample advertisements in node J_6 may include sample advertisement 2 and sample advertisement 3, and the sample advertisements in node J_7 may include sample advertisement 1.
When it is determined that all of the divided nodes satisfy the split cut-off condition, the computer device may construct the probability lifting tree 40T shown in fig. 4 (i.e., the initial probability lifting tree) based on node J_1 and the divided nodes (e.g., node J_2, node J_3, node J_4, node J_5, node J_6, and node J_7). It can be understood that the sum of the probabilities of the nodes of each level in the initial probability lifting tree is 1.
Step S103, based on the tree convergence condition associated with the initial probability lifting tree and sample advertisement conversion data of each sample advertisement in the N sample advertisements, performing iterative training on the initial probability lifting tree to obtain a target probability lifting tree for predicting conversion data probability distribution of the target advertisement.
In particular, the computer device may obtain a tree convergence condition associated with the initial probability-promoting tree. Wherein, the tree convergence condition can be used for indicating the number of iterative training, namely the number of trees contained in the target probability lifting tree; alternatively, the tree convergence condition may be used to indicate that the distribution error of the target probability-promotion tree is less than a preset distribution error threshold (e.g., 0.3), and the tree convergence condition will not be limited herein. In this embodiment, taking a tree convergence condition including a distribution error threshold as an example, the computer device needs to determine a sample conversion data probability distribution (i.e., an actual conversion data probability distribution) based on sample advertisement conversion data of each of the N sample advertisements. Further, upon obtaining the predicted transformed data probability distribution of the initial probability-hoisting tree output, the computer device may determine a distribution error (e.g., root mean square error) of the initial probability-hoisting tree based on the sample transformed data probability distribution and the predicted transformed data probability distribution. Further, the computer device may determine a target probability boost tree for predicting a conversion data probability distribution of the target advertisement based on the distribution error, the distribution error threshold, and the tree convergence condition.
It will be appreciated that when the distribution error is less than or equal to the distribution error threshold, the computer device may determine that the initial probability-enhancing tree satisfies the tree convergence condition, at which point the computer device may treat the initial probability-enhancing tree satisfying the tree convergence condition as the target probability-enhancing tree for predicting the transformed data probability distribution of the targeted advertisement. Optionally, when the distribution error is greater than the distribution error threshold, the computer device may determine that the initial probability-enhancing tree does not meet the tree convergence condition, at this time, the computer device may adjust a tree parameter of the initial probability-enhancing tree, and use the adjusted initial probability-enhancing tree as the transition probability-enhancing tree, until the transition probability-enhancing tree meets the tree convergence condition, and the computer device may use the transition probability-enhancing tree meeting the tree convergence condition as the target probability-enhancing tree for predicting the conversion data probability distribution of the target advertisement.
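The convergence check described above can be sketched as follows (Python); the root-mean-square form of the distribution error, the add_tree and predict_distribution helpers, and the round limit are assumptions used only to illustrate the loop structure, while 0.3 is the example threshold mentioned in this embodiment.

```python
import math

def distribution_error(sample_dist, predicted_dist):
    """Root mean square error between the sample (actual) conversion data
    probability distribution and the predicted one, both given as lists of
    probabilities over the same buckets."""
    return math.sqrt(
        sum((p - q) ** 2 for p, q in zip(sample_dist, predicted_dist)) / len(sample_dist)
    )

def train_probability_lifting_tree(model, sample_dist, error_threshold=0.3, max_rounds=100):
    """Keep adding trees (rounds of iterative training) until the distribution
    error of the current probability lifting tree falls below the threshold."""
    for _ in range(max_rounds):
        predicted_dist = model.predict_distribution()
        if distribution_error(sample_dist, predicted_dist) <= error_threshold:
            return model                 # tree convergence condition satisfied
        model.add_tree()                 # adjust tree parameters: build the next tree
    return model
```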
For ease of understanding, further, please refer to fig. 5, fig. 5 is a schematic diagram of a scenario for performing iterative training on an initial probability-enhancing tree according to an embodiment of the present application. As shown in fig. 5, the computer device in the embodiment of the present application may be a computer device with a tree promotion training function, where the computer device may be any one of the user terminals in the user terminal cluster shown in fig. 1, for example, the user terminal 100a, and the computer device may also be the server 10F shown in fig. 1, and the computer device will not be limited herein.
As shown in fig. 5, the sample set acquired by the computer device may include the sample data pairs corresponding to N (e.g., 5) sample advertisements. The 5 sample advertisements may include sample advertisement 1, sample advertisement 2, sample advertisement 3, sample advertisement 4, and sample advertisement 5. The initial probability lifting tree constructed by the computer device based on these 5 sample advertisements may be the probability lifting tree 51T shown in fig. 5. The probability lifting tree 51T may include tree 1 shown in fig. 5; the sample advertisements divided into the first leaf node of tree 1 may include sample advertisement 1, sample advertisement 2, and sample advertisement 3, and the sample advertisements divided into the second leaf node of tree 1 may include sample advertisement 4 and sample advertisement 5.
For ease of understanding, further, please refer to table 3, table 3 is a simplified sample probability distribution parameter calculation process table provided in the embodiment of the present application. As shown in table 3:
TABLE 3
Δk_{j,l} in Table 3 may represent the parameter update value corresponding to the shape parameter when sample advertisement i is divided into the l-th leaf node of the j-th tree; Δθ_{j,l} may represent the parameter update value corresponding to the slope parameter when sample advertisement i is divided into the l-th leaf node of the j-th tree.
It should be appreciated that the computer device may obtain the tree convergence condition associated with the initial probability lifting tree (e.g., the probability lifting tree 51T). For example, the tree convergence condition herein may be used to indicate that the distribution error of the target probability lifting tree is less than a preset distribution error threshold (e.g., 0.3). At this time, the computer device needs to determine the sample conversion data probability distribution (i.e., the actual conversion data probability distribution) based on the sample advertisement conversion data of each of the 5 sample advertisements. Further, when the predicted conversion data probability distribution output by the probability lifting tree 51T is obtained based on the sample probability distribution parameter calculation process shown in Table 3, the computer device may determine the distribution error (e.g., root mean square error) of the probability lifting tree 51T based on the sample conversion data probability distribution and the predicted conversion data probability distribution.
It will be appreciated that when the distribution error (e.g., 0.2) is less than or equal to the distribution error threshold (e.g., 0.3), the computer device may determine that the probability lifting tree 51T satisfies the tree convergence condition, at which point the computer device may directly take the probability lifting tree 51T satisfying the tree convergence condition as the target probability lifting tree for predicting the conversion data probability distribution of the target advertisement. At this time, the target probability lifting tree may include 1 tree, i.e., tree 1.
Optionally, when the distribution error (e.g., 0.5) is greater than the distribution error threshold (e.g., 0.3), the computer device may determine that the probability lifting tree 51T does not satisfy the tree convergence condition. At this time, the computer device may adjust the tree parameters of the probability lifting tree 51T and perform a second round of iterative training, that is, construct tree 2 on the basis of tree 1. The adjusted probability lifting tree 51T may be taken as a transition probability lifting tree (e.g., the probability lifting tree 52T shown in fig. 5), where the probability lifting tree 52T may include tree 1 and tree 2. Further, the computer device needs to determine whether the current probability lifting tree 52T satisfies the tree convergence condition: if the current probability lifting tree 52T satisfies the tree convergence condition, the computer device may take the probability lifting tree 52T including tree 1 and tree 2 as the target probability lifting tree. If the current probability lifting tree 52T does not satisfy the tree convergence condition, the computer device may continue to construct tree 3 on the basis of the current probability lifting tree 52T to obtain a new probability lifting tree including tree 1, tree 2, and tree 3, and so on, until the new probability lifting tree satisfies the tree convergence condition, at which point the computer device may take the new probability lifting tree satisfying the tree convergence condition as the target probability lifting tree for predicting the conversion data probability distribution of the target advertisement.
In this embodiment of the present application, the computer device may determine, according to N sample data pairs corresponding to the obtained N sample advertisements, a root node for constructing an initial probability promotion tree, and further may determine, by using sample advertisement attribute features of each sample advertisement in the root node, a first best splitting point, and divide sample advertisements in the root node, so as to obtain the initial probability promotion tree. Further, the computer device may iteratively train the initial probability-enhancing tree to obtain a target probability-enhancing tree for predicting a conversion data probability distribution (e.g., a conversion cost probability distribution) of the target advertisement. The whole process does not need to consume a large amount of calculation resources like the traditional probability prediction method to calculate the related parameters (such as covariance matrix) among all sample data, but adopts a target probability lifting tree to rapidly predict the probability distribution of the converted data of the target advertisement, so that the calculation time is reduced, and the prediction speed of the probability distribution is improved.
Further, referring to fig. 6, fig. 6 is a flow chart of a data processing method according to an embodiment of the present application. As shown in fig. 6, the method may be performed by a user terminal having a tree promotion training function (for example, any one of the user terminals in the user terminal cluster shown in fig. 1, for example, the user terminal 100 a), may be performed by a server having a tree promotion training function (for example, the server 10F shown in fig. 1), or may be performed interactively by a user terminal having a tree promotion application function and a server having a tree promotion training function. And are not limited herein. The method may include at least the following steps S201-S211:
Step S201, N sample data pairs corresponding to the N sample advertisements are obtained.
Step S202, determining a root node for constructing an initial probability lifting tree based on N sample advertisements, taking sample advertisement attribute characteristics of each sample advertisement in the root node as initial attribute characteristics, determining a first splitting point set conforming to the characteristic types of the initial attribute characteristics, determining a first optimal splitting point from the first splitting point set, and dividing the N sample advertisements according to splitting conditions indicated by the first optimal splitting point to obtain the initial probability lifting tree.
Step S203, performing iterative training on the initial probability promotion tree based on the tree convergence condition associated with the initial probability promotion tree and sample advertisement conversion data of each sample advertisement in the N sample advertisements to obtain a target probability promotion tree for predicting conversion data probability distribution of the target advertisement.
The data processing method in the embodiment of the present application may include a lifting tree training process and a lifting tree application process. It can be understood that steps S201 to S203 illustrate the lifting tree training process; for the specific implementation of the training process, reference may be made to the description of steps S101 to S103 in the embodiment corresponding to fig. 3, which will not be repeated here.
For ease of understanding, please further refer to fig. 7a, which is a schematic diagram comparing the predicted conversion data probability distribution determined using the target probability lifting tree with the actual conversion data probability distribution, according to an embodiment of the present application. As shown in fig. 7a, line L1 refers to the conversion data probability distribution (i.e., the estimated distribution) predicted for a certain advertisement in scene 1 when the computer device adopts the target probability lifting tree, and line L2 refers to the estimated average distribution fitted after predicting the conversion data probability distributions of all advertisements in scene 1 when the computer device adopts the target probability lifting tree. As shown in fig. 7a, the estimated distribution in scene 1 fits the real distribution well.
Similarly, line L3 refers to the conversion data probability distribution (i.e., the estimated distribution) predicted for a certain advertisement in scene 2 when the computer device adopts the target probability lifting tree, and line L4 refers to the estimated average distribution fitted after predicting the conversion data probability distributions of all advertisements in scene 2 when the computer device adopts the target probability lifting tree. As shown in fig. 7a, the estimated distribution in scene 2 also fits the real distribution well.
The embodiment of the present application can measure the distribution difference between the estimated distribution and the actual distribution using the JS divergence (Jensen-Shannon Divergence, JSD for short). The value range of the JSD is between 0 and 1; the closer it is to 0, the smaller the distribution difference, and the closer it is to 1, the larger the distribution difference. The JSD shown in fig. 7a is less than 0.2.
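A minimal Python sketch of the JS divergence used as the evaluation index is shown below; base-2 logarithms are assumed so that the value falls between 0 and 1, and the two inputs are the estimated and actual distributions over the same conversion data buckets.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) with base-2 logarithms."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence between two probability distributions;
    the result lies in [0, 1], and the closer to 0, the more similar they are."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```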
For ease of understanding, please further refer to fig. 7b, which is another schematic diagram comparing the predicted conversion data probability distribution determined using the target probability lifting tree with the actual conversion data probability distribution, according to an embodiment of the present application. As shown in fig. 7b, line L1 refers to the conversion data probability distribution (i.e., the estimated distribution) predicted for a certain advertisement in scene 3 (e.g., a secondary scene) when the computer device adopts the target probability lifting tree, and line L2 refers to the estimated average distribution fitted after predicting the conversion data probability distributions of all advertisements in scene 3 when the computer device adopts the target probability lifting tree. As shown in fig. 7b, the estimated distribution in scene 3 fits the real distribution well.
Similarly, line L3 refers to the conversion data probability distribution (i.e., the estimated distribution) predicted for a certain advertisement in scene 4 (e.g., an activation scene) when the computer device adopts the target probability lifting tree, and line L4 refers to the estimated average distribution fitted after predicting the conversion data probability distributions of all advertisements in scene 4 when the computer device adopts the target probability lifting tree. As shown in fig. 7b, the estimated distribution in scene 4 also fits the real distribution well. The JSD shown in fig. 7b is less than 0.2.
The lifting tree application process is specifically described in the following steps S204 to S206.
Step S204, a placement feature request for a target advertisement is acquired.
The placement feature request herein may include the target advertisement attribute feature of the target advertisement.
Step S205, a target probability lifting tree associated with the target advertisement is obtained, the target advertisement attribute features are input into the target probability lifting tree, and the target probability lifting tree outputs the conversion data probability distribution of the target advertisement.
In particular, the computer device may obtain a target probability promotion tree associated with a target advertisement. The target probability lifting tree can be obtained by performing iterative training on the initial probability lifting tree based on a tree convergence condition and sample advertisement conversion data of each sample advertisement in the N sample advertisements; the N sample advertisements may be used to construct a root node of the initial probability promotion tree; the initial probability lifting tree is obtained by dividing N sample advertisements based on splitting conditions indicated by a first optimal splitting point, and the first optimal splitting point is determined based on sample advertisement attribute characteristics of each sample advertisement in the root node; n is a positive integer. Further, the computer device may input the targeted advertisement attribute feature to a targeted probability promotion tree such that a transformed data probability distribution of the targeted advertisement is output by the targeted probability promotion tree.
Step S206, determining the target advertisement conversion data of the target advertisement based on the conversion data probability distribution of the target advertisement.
Specifically, the computer device may select a maximum conversion data probability from the conversion data probability distribution of the target advertisement, and may further use advertisement conversion data corresponding to the selected maximum conversion data probability as target advertisement conversion data of the target advertisement.
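For illustration, the selection described in step S206 amounts to taking the conversion data value with the highest predicted probability (the mode of the predicted distribution); a minimal sketch follows, assuming the distribution is given as value-probability pairs.

```python
def select_target_conversion_data(distribution):
    """Pick the advertisement conversion data (e.g., a conversion cost) whose
    predicted probability is the largest in the conversion data probability
    distribution, given as (conversion_value, probability) pairs."""
    best_value, _ = max(distribution, key=lambda pair: pair[1])
    return best_value

# Example with a toy conversion cost distribution:
# costs = [(10.0, 0.1), (20.0, 0.5), (30.0, 0.4)]
# select_target_conversion_data(costs) -> 20.0
```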
For ease of understanding, further, please refer to fig. 8, fig. 8 is a schematic diagram of a scenario for determining target conversion data according to an embodiment of the present application. As shown in fig. 8, the server 8F in the embodiment of the present application may be a computer device with a function of training a tree, and the server 8F may be the server 10F shown in fig. 1.
It should be appreciated that the conversion data in the embodiments of the present application may take advertisement conversion cost as an example to illustrate a specific implementation in which the computer device predicts the conversion data of a target advertisement using a model based on the target probability lifting tree. The conversion link of an advertisement in an advertisement conversion scenario is long. Taking the advertisement 8G shown in fig. 8 (e.g., a game advertisement whose optimization target is first payment) as an example, the entire download link goes through at least exposure, intermediate conversion links (e.g., clicking, downloading, installing, activating), and final conversion (e.g., first payment), and the advertiser's final goal is often the user's first payment; the advertiser therefore needs to check the cost of first payment before delivering the advertisement. On the one hand, for any one advertisement, the conversion data (e.g., cost) distribution needs to be estimated before advertisement placement. On the other hand, in the lengthy advertisement conversion process, the influencing factors are numerous, and all of them influence the final distribution of conversion costs. Based on this, the embodiment of the present application can collect the conversion costs of historical advertisements and the advertisement attribute features of the historical advertisements to construct sample data pairs for training the initial probability lifting tree, so as to obtain a model capable of estimating the complete conversion data probability distribution (e.g., a distribution estimation model based on the target probability lifting tree), and provide a reference for advertisers to control conversion data.
The placement feature request for the advertisement 8G in the embodiment of the present application may be generated by the server 8F based on the advertisement attribute features of the advertisement 8G when the advertiser creates the advertisement 8G, or the placement feature request may be sent to the server 8F by a user terminal (e.g., a terminal device used by the advertiser) having a network connection relationship with the server 8F; the placement feature request will not be limited here. When the server 8F receives the placement feature request, it may obtain the advertisement attribute features of the advertisement 8G carried in the placement feature request, and may then input the advertisement attribute features of the advertisement 8G into the pre-trained target probability lifting tree, so as to output the conversion data probability distribution (e.g., the conversion cost probability distribution) of the advertisement 8G.
Further, the server 8F may obtain the confidence intervals of the estimation errors at different positions of the advertisement conversion link shown in fig. 8, so as to adaptively determine the final actual conversion cost of the advertisement 8G according to the conversion data probability distribution of the advertisement 8G. For example, as shown in fig. 8, the confidence interval of the estimation error of the click-through rate may be [0, 2], the confidence interval of the estimation error of the conversion rate may be [0, 1.3], and the confidence interval of the estimation error of the deep conversion rate may be [0, 0.7]. When the clicks of the advertisement 8G need to be obtained urgently, the computer device may take the maximum cost in the conversion data probability distribution of the advertisement 8G as the target advertisement conversion data of the advertisement 8G, and return the maximum cost to the terminal device used by the advertiser, so that the advertiser, using the target advertisement conversion data as a reference, sets the actual conversion cost of the advertisement 8G. Optionally, the server 8F may also directly return the conversion data probability distribution of the advertisement 8G to the terminal device used by the advertiser, so that the advertiser sets the actual conversion cost of the advertisement 8G with the conversion data probability distribution of the advertisement 8G as a reference.
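As an illustration of the serving flow only, the application process of steps S204 to S206 might be sketched as follows; the handler name, the request field, and the predict_distribution method on the trained tree object are hypothetical and are not part of the patent.

```python
def handle_placement_feature_request(request, target_probability_lifting_tree):
    """Sketch of steps S204-S206: read the target advertisement attribute
    features from the placement feature request, predict the conversion data
    probability distribution, and return both the distribution and the
    conversion data selected from it as a reference for the advertiser."""
    attribute_features = request["target_advertisement_attribute_features"]
    distribution = target_probability_lifting_tree.predict_distribution(attribute_features)
    # Pick the conversion data value with the highest predicted probability.
    target_conversion_data = max(distribution, key=lambda pair: pair[1])[0]
    return {
        "conversion_data_probability_distribution": distribution,
        "target_advertisement_conversion_data": target_conversion_data,
    }
```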
In this embodiment of the present application, the computer device may determine, according to N sample data pairs corresponding to the obtained N sample advertisements, a root node for constructing an initial probability promotion tree, and further may determine, by using sample advertisement attribute features of each sample advertisement in the root node, a first best splitting point, and divide sample advertisements in the root node, so as to obtain the initial probability promotion tree. Further, the computer device may iteratively train the initial probability-enhancing tree to obtain a target probability-enhancing tree for predicting a conversion data probability distribution (e.g., a conversion cost probability distribution) of the target advertisement. The whole process does not need to consume a large amount of calculation resources like the traditional probability prediction method to calculate the related parameters (such as covariance matrix) among all sample data, but adopts a target probability lifting tree to rapidly and accurately predict the probability distribution of the converted data of the target advertisement, so that the calculation time is reduced, and the prediction speed of the probability distribution is improved. In addition, the computer equipment consumes a large amount of computing resources, so that the target probability lifting tree can be suitable for a large number of application scenes, and the probability distribution prediction applicability is improved.
Further, referring to fig. 9, fig. 9 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. As shown in fig. 9, the data processing apparatus 1 may be a computer program (including program code) running in a computer device, for example, the data processing apparatus 1 is an application software; the data processing device 1 may be adapted to perform the respective steps of the method provided by the embodiments of the present application. As shown in fig. 9, the data processing apparatus 1 may be operated on a computer device with a function of training a lifting tree, where the computer device may be the server 10F in the embodiment corresponding to fig. 1, or may be any one of the user terminal clusters in the embodiment corresponding to fig. 1, where the user terminal 100a has a target probability lifting tree. The data processing apparatus 1 may include: sample advertisement acquisition module 10, initial tree determination module 20, and iterative training module 30.
The sample advertisement obtaining module 10 is configured to obtain N sample advertisements and N sample data pairs corresponding to the N sample advertisements; n is a positive integer; one sample advertisement corresponds to one sample data pair, and one sample data pair is determined by the sample advertisement attribute feature and the sample advertisement conversion data;
The initial tree determining module 20 is configured to determine a root node for constructing an initial probability promoting tree based on N sample advertisements, determine a first splitting point set conforming to a feature type of the initial attribute feature by using a sample advertisement attribute feature of each sample advertisement in the root node as an initial attribute feature, determine a first optimal splitting point from the first splitting point set, and divide the N sample advertisements according to splitting conditions indicated by the first optimal splitting point to obtain the initial probability promoting tree;
wherein the initial tree determination module 20 comprises: a first division point determination unit 201, a first division unit 202, a cut-off condition acquisition unit 203, a second division point determination unit 204, and a second division unit 205.
The first split point determining unit 201 is configured to determine, based on the N sample advertisements, a root node for constructing an initial probability promotion tree, determine a first set of split points that matches a feature type of an initial attribute feature with a sample advertisement attribute feature of each sample advertisement in the root node as the initial attribute feature, and determine a first optimal split point from the first set of split points.
Wherein the first split point determination unit 201 includes: an initial feature determination subunit 2011, a deduplication processing subunit 2012, and a split point screening subunit 2013.
The initial feature determining subunit 2011 is configured to determine a root node for constructing an initial probability lifting tree based on N sample advertisements, and take a sample advertisement attribute feature of each sample advertisement in the root node as an initial attribute feature;
the deduplication processing subunit 2012 is configured to perform deduplication processing on the split points by using the feature type of the initial attribute feature as the split point, and add the split point after deduplication processing to the first split point set;
the splitting point screening subunit 2013 is configured to screen, based on the splitting point determining condition for the root node, an optimal splitting point that meets the splitting point determining condition from the first splitting point set, and take the screened optimal splitting point as the first optimal splitting point.
Wherein the first split point set comprises M split points; M is a positive integer; the M split points comprise split point F_a; a is less than or equal to M;
the split point screening subunit 2013 includes: a loss reduction parameter determination subunit 20130, a first split point acquisition subunit 20131, a first parameter acquisition subunit 20132, a first comparison subunit 20133, a first determination subunit 20134, a second split point acquisition subunit 20135, a second determination subunit 20136, a second parameter acquisition subunit 20137, a second comparison subunit 20138, and a third determination subunit 20139.
The loss reduction parameter determining subunit 20130 is configured to obtain a splitting point determining condition for a root node, and obtain a node loss reduction parameter corresponding to the root node;
the first split point obtaining subunit 20131 is configured to obtain split point F_a from the first split point set, and divide the sample advertisements in the root node based on the split condition indicated by split point F_a, so as to obtain the initial child nodes of the root node; the initial child nodes comprise a first initial node and a second initial node;
the first parameter acquisition subunit 20132 is configured to acquire the first feature loss reduction parameter associated with split point F_a.
Wherein the total probability distribution indicated by the root node comprises a total shape parameter and a total slope parameter; the first sub-probability distribution indicated by the first initial node includes a first shape parameter and a first slope parameter; the second sub-probability distribution indicated by the second initial node includes a second shape parameter and a second slope parameter;
the first parameter obtaining subunit 20132 is further specifically configured to:
respectively counting the total number of samples of the sample advertisements in the root node, the first sample number of the sample advertisements in the first initial node and the second sample number of the sample advertisements in the second initial node;
Determining a total node loss of the root node based on the total shape parameter, the total slope parameter, and the node loss determination rule;
determining a first node loss for the first initial node based on the first shape parameter, the first slope parameter, and the node loss determination rule;
determining a second node loss for the second initial node based on the second shape parameter, the second slope parameter, and the node loss determination rule;
acquiring a feature loss reduction rule, and determining the first feature loss reduction parameter associated with split point F_a based on the total number of samples, the first sample number, the second sample number, the total node loss, the first node loss, the second node loss, and the feature loss reduction rule.
The first comparing subunit 20133 is configured to compare the first feature loss reduction parameter and the node loss reduction parameter;
the first determining subunit 20134 is configured to, if the node loss reduction parameter is less than or equal to the first feature loss reduction parameter, determine that split point F_a satisfies the split point determination condition, and take split point F_a as the first best split point.
The second split point obtaining subunit 20135 is configured to, if the node loss reduction parameter is greater than the first feature loss reduction parameter, determine that split point F_a does not satisfy the split point determination condition, and obtain split point F_{a+1} from the first split point set;
The second determining subunit 20136 is configured to take split point F_{a+1} as the first best split point if a+1 is equal to M.
The second parameter obtaining subunit 20137 is configured to, if a+1 is less than M, divide the sample advertisements in the root node based on the split condition indicated by split point F_{a+1}, so as to obtain the second feature loss reduction parameter associated with split point F_{a+1};
the second comparing subunit 20138 is configured to compare the second feature loss reduction parameter with the node loss reduction parameter;
the third determining subunit 20139 is configured to, if the node loss reduction parameter is less than or equal to the second feature loss reduction parameter, determine that split point F_{a+1} satisfies the split point determination condition, and take split point F_{a+1} as the first best split point.
The specific implementation manner of the loss reduction parameter determining subunit 20130, the first split point obtaining subunit 20131, the first parameter obtaining subunit 20132, the first comparing subunit 20133, the first determining subunit 20134, the second split point obtaining subunit 20135, the second determining subunit 20136, the second parameter obtaining subunit 20137, the second comparing subunit 20138 and the third determining subunit 20139 may be referred to the description of the best split point in the embodiment corresponding to fig. 3, and will not be further described herein.
The specific implementation manner of the initial feature determining subunit 2011, the deduplication processing subunit 2012, and the split point screening subunit 2013 may be referred to the description of the screening of the best split point in the embodiment corresponding to fig. 3, and will not be further described herein.
The first dividing unit 202 is configured to divide the sample advertisement in the root node according to the splitting condition indicated by the first optimal splitting point, so as to obtain a first type node and a second type node.
Wherein the first dividing unit 202 includes: a sample ad acquisition subunit 2021, a first partitioning subunit 2022, and a second partitioning subunit 2023.
The sample advertisement obtaining subunit 2021 is configured to obtain a sample advertisement i from the root node; i is a positive integer less than or equal to N;
the first dividing subunit 2022 is configured to divide the sample advertisement i to the first type node if the sample advertisement i meets the splitting condition indicated by the first optimal splitting point;
the second dividing subunit 2023 is configured to divide the sample advertisement i into the second type node if the sample advertisement i does not meet the splitting condition indicated by the first optimal splitting point; the first type node and the second type node are both child nodes of the root node.
The specific implementation manner of the sample advertisement obtaining subunit 2021, the first dividing subunit 2022, and the second dividing subunit 2023 may refer to the description of dividing the sample advertisement in the root node in the embodiment corresponding to fig. 3, which will not be further described herein.
The cut-off condition obtaining unit 203 is configured to take the first type node and the second type node as nodes to be split, respectively, and obtain a splitting cut-off condition associated with the initial probability lifting tree;
the second splitting point determining unit 204 is configured to, when a node to be split does not meet the splitting cut-off condition, take the sample advertisement attribute feature of each sample advertisement in the node to be split as a target attribute feature, determine a second splitting point set conforming to the feature type of the target attribute feature, and determine a second optimal splitting point from the second splitting point set;
the second dividing unit 205 is configured to divide the sample advertisements in the node to be split according to the splitting condition indicated by the second optimal splitting point until the divided nodes meet the splitting cut-off condition, and construct the initial probability lifting tree based on the root node and the divided nodes.
For the specific implementation of the first splitting point determining unit 201, the first dividing unit 202, the cut-off condition obtaining unit 203, the second splitting point determining unit 204, and the second dividing unit 205, reference may be made to the description of step S102 in the embodiment corresponding to fig. 3; details are not repeated here.
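Taken together, units 201–205 amount to a recursive node-splitting procedure bounded by the splitting cut-off condition. The outline below is an assumed sketch of that procedure; cutoff_condition, find_best_split, and meets_condition are hypothetical helpers, not names used by the application.

```python
# Assumed outline of the construction of the initial probability lifting tree;
# all helper names are hypothetical.

def build_initial_tree(sample_ads, cutoff_condition, find_best_split, meets_condition):
    """Recursively split nodes until the splitting cut-off condition holds."""
    node = {"samples": sample_ads, "split": None, "children": None}
    if not sample_ads or cutoff_condition(sample_ads):
        return node                               # node is kept as a leaf
    best_split = find_best_split(sample_ads)      # first/second optimal split point
    left = [ad for ad in sample_ads if meets_condition(ad, best_split)]
    right = [ad for ad in sample_ads if not meets_condition(ad, best_split)]
    node["split"] = best_split
    node["children"] = (
        build_initial_tree(left, cutoff_condition, find_best_split, meets_condition),
        build_initial_tree(right, cutoff_condition, find_best_split, meets_condition),
    )
    return node
```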
The iterative training module 30 is configured to iteratively train the initial probability lifting tree based on a tree convergence condition associated with the initial probability lifting tree and sample advertisement conversion data of each of the N sample advertisements, to obtain a target probability lifting tree for predicting a conversion data probability distribution of the target advertisement.
Wherein the iterative training module 30 comprises: a tree convergence condition acquisition unit 301, a sample probability distribution determination unit 302, a predictive probability distribution determination unit 303, a first target tree determination unit 304, a condition unsatisfied unit 305, and a second target tree determination unit 306.
The tree convergence condition obtaining unit 301 is configured to obtain a tree convergence condition associated with the initial probability lifting tree; the tree convergence condition includes a distribution error threshold;
the sample probability distribution determining unit 302 is configured to determine a sample conversion data probability distribution based on sample advertisement conversion data of each of the N sample advertisements;
the prediction probability distribution determining unit 303 is configured to determine, when the prediction conversion data probability distribution output by the initial probability lifting tree is obtained, a distribution error of the initial probability lifting tree based on the sample conversion data probability distribution and the prediction conversion data probability distribution;
The first target tree determining unit 304 is configured to determine that the initial probability lifting tree satisfies the tree convergence condition when the distribution error is less than or equal to the distribution error threshold, and take the initial probability lifting tree satisfying the tree convergence condition as a target probability lifting tree for predicting a conversion data probability distribution of the target advertisement.
The condition unsatisfied unit 305 is configured to determine that the initial probability lifting tree does not satisfy the tree convergence condition when the distribution error is greater than the distribution error threshold;
the second target tree determining unit 306 is configured to adjust the tree parameters of the initial probability lifting tree, take the adjusted initial probability lifting tree as a transition probability lifting tree, and iterate until the transition probability lifting tree meets the tree convergence condition, taking the transition probability lifting tree that meets the tree convergence condition as the target probability lifting tree for predicting the conversion data probability distribution of the target advertisement.
For the specific implementation of the tree convergence condition obtaining unit 301, the sample probability distribution determining unit 302, the prediction probability distribution determining unit 303, the first target tree determining unit 304, the condition unsatisfied unit 305, and the second target tree determining unit 306, reference may be made to the description of step S103 in the embodiment corresponding to fig. 3; details are not repeated here.
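The convergence loop carried out by units 301–306 can be summarized by the sketch below; predict_distribution, distribution_error, and adjust_tree_parameters are assumed helper names introduced only for illustration.

```python
# Illustrative convergence loop; all helper names are assumptions.

def train_until_converged(initial_tree, sample_distribution, predict_distribution,
                          distribution_error, adjust_tree_parameters,
                          error_threshold, max_rounds=1000):
    """Adjust tree parameters until the error between the predicted and the
    sample conversion-data probability distributions is within the threshold."""
    tree = initial_tree
    for _ in range(max_rounds):
        predicted = predict_distribution(tree)
        if distribution_error(sample_distribution, predicted) <= error_threshold:
            return tree                      # tree convergence condition satisfied
        tree = adjust_tree_parameters(tree)  # adjusted tree becomes the transition tree
    return tree                              # give up after max_rounds adjustments
```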
For the specific implementation of the sample advertisement obtaining module 10, the initial tree determining module 20, and the iterative training module 30, reference may be made to the description of step S101 to step S103 in the embodiment corresponding to fig. 3; details are not repeated here. In addition, the beneficial effects of the same method are not described again.
Further, referring to fig. 10, fig. 10 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. The data processing apparatus 2 may be a computer program (comprising program code) running in a computer device; for example, the data processing apparatus 2 is application software. The data processing apparatus 2 may be used to perform the corresponding steps of the method provided by the embodiments of the present application. As shown in fig. 10, the data processing apparatus 2 may run in a computer device having a lifting tree application function, where the computer device may be the server 10F in the embodiment corresponding to fig. 1, or may be any user terminal in the user terminal cluster in the embodiment corresponding to fig. 1 that runs the target probability lifting tree, for example, the user terminal 100a. The data processing apparatus 2 may include: a delivery request acquisition module 100, a probability distribution determination module 200, and a target conversion data determination module 300.
The delivery request acquisition module 100 is configured to acquire a delivery feature request for a target advertisement; the delivery feature request includes target advertisement attribute features of the target advertisement;
the probability distribution determining module 200 is configured to obtain a target probability lifting tree associated with the target advertisement, input the target advertisement attribute features to the target probability lifting tree, and output the conversion data probability distribution of the target advertisement from the target probability lifting tree; the target probability lifting tree is obtained after iterative training of the initial probability lifting tree based on the tree convergence condition and sample advertisement conversion data of each sample advertisement in the N sample advertisements; the N sample advertisements are used for constructing a root node of the initial probability lifting tree; the initial probability lifting tree is obtained by dividing the N sample advertisements based on the splitting condition indicated by the first optimal splitting point; the first best split point is determined based on sample advertisement attribute characteristics of each sample advertisement in the root node; N is a positive integer;
the target conversion data determining module 300 is configured to determine target advertisement conversion data of the target advertisement based on the conversion data probability distribution of the target advertisement.
For the specific implementation of the delivery request acquisition module 100, the probability distribution determining module 200, and the target conversion data determining module 300, reference may be made to the description of step S201 to step S206 in the embodiment corresponding to fig. 6; details are not repeated here. In addition, the beneficial effects of the same method are not described again.
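For orientation, the prediction path of modules 100–300 might look like the sketch below; representing the conversion data probability distribution as a value-to-probability mapping and deriving the target conversion data as its expectation are illustrative assumptions, not choices stated by the application.

```python
# Illustrative prediction path; the distribution representation and the use of
# the expectation as the target conversion data are assumptions.

def predict_target_conversion_data(target_tree, target_ad_features, predict_distribution):
    """Obtain the conversion data probability distribution of the target
    advertisement and derive a single conversion value from it."""
    distribution = predict_distribution(target_tree, target_ad_features)  # {value: probability}
    return sum(value * prob for value, prob in distribution.items())
```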
Further, referring to fig. 11, fig. 11 is a schematic diagram of a computer device according to an embodiment of the present application. As shown in fig. 11, the computer device 1000 may be a computer device having a lifting tree training function, and the computer device 1000 may include: at least one processor 1001 (e.g., a CPU), at least one network interface 1004, a memory 1005, and at least one communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory, or a non-volatile memory, such as at least one disk memory. The memory 1005 may optionally also be at least one storage device located remotely from the aforementioned processor 1001. As shown in fig. 11, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a device control application. In some embodiments, the computer device may further include the user interface 1003 shown in fig. 11; for example, if the computer device is a user terminal with a lifting tree training function (e.g., the user terminal 100a shown in fig. 1), the computer device may further include the user interface 1003, where the user interface 1003 may include a display (Display), a keyboard (Keyboard), and the like.
In the computer device 1000 shown in fig. 11, the network interface 1004 is mainly used for network communication; the user interface 1003 is mainly used to provide an input interface for the user; and the processor 1001 may be used to invoke the device control application stored in the memory 1005.
It should be understood that the computer device 1000 described in the embodiments of the present application may perform the data processing method described in the embodiments corresponding to fig. 3 and fig. 6, and may also implement the data processing apparatus 1 described in the embodiment corresponding to fig. 9 or the data processing apparatus 2 described in the embodiment corresponding to fig. 10; details are not repeated here. In addition, the beneficial effects of the same method are not described again.
Furthermore, it should be noted here that the embodiments of the present application further provide a computer-readable storage medium, in which the computer program executed by the aforementioned data processing apparatus 1 or data processing apparatus 2 is stored, and the computer program includes program instructions; when the program instructions are executed by a processor, the data processing method described in the embodiment corresponding to fig. 3 or fig. 6 can be performed, so it is not described again here. In addition, the beneficial effects of the same method are not described again. For technical details not disclosed in the embodiments of the computer-readable storage medium of the present application, refer to the description of the method embodiments of the present application. As an example, the program instructions may be deployed to be executed on one computing device, or on multiple computing devices located at one site, or on multiple computing devices distributed across multiple sites and interconnected by a communication network, where the multiple computing devices distributed across multiple sites and interconnected by a communication network may constitute a blockchain system.
In one aspect, the present application provides a computer program product or computer program, the computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device can perform the data processing method described in the embodiment corresponding to fig. 3 or fig. 6, which is not described again here. In addition, the beneficial effects of the same method are not described again.
Further, referring to fig. 12, fig. 12 is a schematic structural diagram of a data processing system according to an embodiment of the present application. The data processing system 3 may comprise a data processing apparatus 1a and a data processing apparatus 2a. The data processing apparatus 1a may be the data processing apparatus 1 in the embodiment corresponding to fig. 9; it is to be understood that the data processing apparatus 1a may be integrated in a computer device having a lifting tree training function, where the computer device may be the server 10F in the embodiment corresponding to fig. 1, or may be any user terminal in the user terminal cluster in the embodiment corresponding to fig. 1 that runs the target probability lifting tree, for example, the user terminal 100a; therefore, details are not repeated here. The data processing apparatus 2a may be the data processing apparatus 2 in the embodiment corresponding to fig. 10; it is to be understood that the data processing apparatus 2a may be integrated in a computer device having a lifting tree application function, where the computer device may be any user terminal in the user terminal cluster in the embodiment corresponding to fig. 1 that runs the target probability lifting tree, for example, the user terminal 100a; therefore, details are not repeated here. In addition, the beneficial effects of the same method are not described again. For technical details not disclosed in the embodiments of the data processing system of the present application, refer to the description of the method embodiments of the present application.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program; the program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The foregoing disclosure is merely illustrative of preferred embodiments of the present application and is not intended to limit the scope of the claims of the present application; equivalent variations made according to the claims of the present application still fall within the scope of the present application.

Claims (16)

1. A method of data processing, comprising:
acquiring N sample advertisements and N sample data pairs corresponding to the N sample advertisements; the N is a positive integer; one sample advertisement corresponds to one sample data pair, and one sample data pair is determined by the sample advertisement attribute feature and the sample advertisement conversion data;
determining a root node for constructing an initial probability lifting tree based on the N sample advertisements, taking sample advertisement attribute characteristics of each sample advertisement in the root node as initial attribute characteristics, determining a first splitting point set conforming to the characteristic types of the initial attribute characteristics, determining a first optimal splitting point from the first splitting point set, and dividing the N sample advertisements according to splitting conditions indicated by the first optimal splitting point to obtain the initial probability lifting tree;
And performing iterative training on the initial probability lifting tree based on a tree convergence condition associated with the initial probability lifting tree and sample advertisement conversion data of each sample advertisement in the N sample advertisements to obtain a target probability lifting tree for predicting conversion data probability distribution of target advertisements.
2. The method of claim 1, wherein the determining, based on the N sample advertisements, a root node for constructing an initial probability-promoting tree, taking sample advertisement attribute features of each sample advertisement in the root node as initial attribute features, determining a first set of split points conforming to feature types of the initial attribute features, determining a first best split point from the first set of split points, dividing the N sample advertisements according to split conditions indicated by the first best split point, and obtaining the initial probability-promoting tree, comprises:
determining a root node for constructing an initial probability lifting tree based on the N sample advertisements, taking sample advertisement attribute characteristics of each sample advertisement in the root node as initial attribute characteristics, determining a first split point set conforming to the characteristic types of the initial attribute characteristics, and determining a first optimal split point from the first split point set;
Dividing the sample advertisements in the root node according to the splitting conditions indicated by the first optimal splitting points to obtain a first type node and a second type node;
taking the first type node and the second type node as nodes to be split respectively, and acquiring a splitting cut-off condition associated with the initial probability lifting tree;
when the node to be split does not meet the splitting cut-off condition, taking the sample advertisement attribute feature of each sample advertisement in the node to be split as a target attribute feature, determining a second splitting point set conforming to the feature type of the target attribute feature, and determining a second optimal splitting point from the second splitting point set;
and dividing the sample advertisement in the node to be split according to the splitting condition indicated by the second optimal splitting point until the divided node meets the splitting cut-off condition, and constructing the initial probability lifting tree based on the root node and the divided node.
3. The method of claim 2, wherein the determining, based on the N sample advertisements, a root node for constructing an initial probability promotion tree, taking sample advertisement attribute features of each sample advertisement in the root node as initial attribute features, determining a first set of split points that match feature types of the initial attribute features, and determining a first best split point from the first set of split points, comprises:
Determining a root node for constructing an initial probability lifting tree based on the N sample advertisements, and taking sample advertisement attribute characteristics of each sample advertisement in the root node as initial attribute characteristics;
taking the feature type of the initial attribute feature as a splitting point, performing de-duplication processing on the splitting points, and adding the splitting points subjected to the de-duplication processing to a first splitting point set;
and screening the optimal splitting point meeting the splitting point determining condition from the first splitting point set based on the splitting point determining condition aiming at the root node, and taking the screened optimal splitting point as a first optimal splitting point.
4. A method according to claim 3, wherein the first splitting point set comprises M splitting points; M is a positive integer; the M splitting points comprise a splitting point F_a; a is less than or equal to M;
the step of screening the best splitting point meeting the splitting point determining condition from the first splitting point set based on the splitting point determining condition aiming at the root node, and taking the screened best splitting point as a first best splitting point comprises the following steps:
acquiring a splitting point determining condition aiming at the root node, and acquiring a node loss reduction parameter corresponding to the root node;
obtaining the splitting point F_a from the first splitting point set, and dividing the sample advertisements in the root node according to the splitting condition indicated by the splitting point F_a, to obtain initial child nodes of the root node; the initial child nodes comprise a first initial node and a second initial node;
acquiring a first feature loss reduction parameter associated with the splitting point F_a;
comparing the first feature loss reduction parameter with the node loss reduction parameter;
if the node loss reduction parameter is less than or equal to the first feature loss reduction parameter, determining that the splitting point F_a meets the splitting point determining condition, and taking the splitting point F_a as the first best split point.
5. The method of claim 4, wherein the total probability distribution indicated by the root node comprises a total shape parameter and a total slope parameter; the first sub-probability distribution indicated by the first initial node comprises a first shape parameter and a first slope parameter; the second sub-probability distribution indicated by the second initial node comprises a second shape parameter and a second slope parameter;
the acquiring a first feature loss reduction parameter associated with the splitting point F_a comprises:
respectively counting the total number of samples of the sample advertisements in the root node, the first sample number of the sample advertisements in the first initial node and the second sample number of the sample advertisements in the second initial node;
determining a total node loss of the root node based on the total shape parameter, the total slope parameter, and a node loss determination rule;
determining a first node loss for the first initial node based on the first shape parameter, the first slope parameter, and the node loss determination rule;
determining a second node loss for the second initial node based on the second shape parameter, the second slope parameter, and the node loss determination rule;
acquiring a feature loss reduction rule, and determining the first feature loss reduction parameter associated with the splitting point F_a based on the total number of samples, the first sample number, the second sample number, the total node loss, the first node loss, the second node loss, and the feature loss reduction rule.
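The claim leaves the concrete feature loss reduction rule open; one plausible, sample-count-weighted form, given here only as an illustrative assumption and not as the claimed rule, is

$$\Delta F_a \;=\; L_{\text{root}} \;-\; \Big(\tfrac{n_1}{n}\,L_1 + \tfrac{n_2}{n}\,L_2\Big),$$

where $n$, $n_1$, $n_2$ are the total, first, and second sample numbers, $L_{\text{root}}$, $L_1$, $L_2$ are the total, first, and second node losses, and $\Delta F_a$ is the first feature loss reduction parameter associated with the splitting point $F_a$.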
6. The method according to claim 4, wherein the method further comprises:
if the node loss reduction parameter is greater than the first feature loss reduction parameter, determining that the splitting point F_a does not satisfy the splitting point determination condition, and acquiring a splitting point F_(a+1) from the first splitting point set;
if a+1 is equal to M, taking the splitting point F_(a+1) as the first best split point.
7. The method of claim 6, wherein the method further comprises:
if a+1 is smaller than M, dividing the sample advertisements in the root node according to the splitting condition indicated by the splitting point F_(a+1), to obtain a second feature loss reduction parameter associated with the splitting point F_(a+1);
comparing the second feature loss reduction parameter with the node loss reduction parameter;
if the node loss reduction parameter is less than or equal to the second feature loss reduction parameter, determining that the splitting point F_(a+1) meets the splitting point determining condition, and taking the splitting point F_(a+1) as the first best split point.
8. The method according to claim 2, wherein the dividing the sample advertisement in the root node according to the splitting condition indicated by the first optimal splitting point to obtain a first type node and a second type node includes:
Acquiring a sample advertisement i from the root node; the i is a positive integer less than or equal to the N;
if the sample advertisement i meets the splitting condition indicated by the first optimal splitting point, dividing the sample advertisement i into the first type node;
if the sample advertisement i does not meet the splitting condition indicated by the first optimal splitting point, dividing the sample advertisement i into the second type node; the first type node and the second type node are both child nodes of the root node.
9. The method of claim 1, wherein the iteratively training the initial probability lifting tree based on a tree convergence condition associated with the initial probability lifting tree and sample advertisement conversion data of each of the N sample advertisements to obtain a target probability lifting tree for predicting a conversion data probability distribution of a target advertisement comprises:
acquiring the tree convergence condition associated with the initial probability lifting tree; the tree convergence condition comprises a distribution error threshold;
determining a sample conversion data probability distribution based on sample advertisement conversion data for each of the N sample advertisements;
When the predicted conversion data probability distribution output by the initial probability lifting tree is obtained, determining the distribution error of the initial probability lifting tree based on the sample conversion data probability distribution and the predicted conversion data probability distribution;
and when the distribution error is smaller than or equal to the distribution error threshold, determining that the initial probability lifting tree meets the tree convergence condition, and taking the initial probability lifting tree meeting the tree convergence condition as a target probability lifting tree for predicting conversion data probability distribution of the target advertisement.
10. The method according to claim 9, wherein the method further comprises:
when the distribution error is greater than the distribution error threshold, determining that the initial probability lifting tree does not meet the tree convergence condition;
and adjusting the tree parameters of the initial probability lifting tree, taking the adjusted initial probability lifting tree as a transition probability lifting tree until the transition probability lifting tree meets the tree convergence condition, and taking the transition probability lifting tree meeting the tree convergence condition as a target probability lifting tree for predicting the conversion data probability distribution of the target advertisement.
11. A method of data processing, comprising:
acquiring a delivery feature request for a target advertisement; the delivery feature request comprises target advertisement attribute features of the target advertisement;
acquiring a target probability lifting tree associated with the target advertisement, inputting the target advertisement attribute characteristics into the target probability lifting tree, and outputting conversion data probability distribution of the target advertisement by the target probability lifting tree; the target probability lifting tree is obtained after iterative training of the initial probability lifting tree based on a tree convergence condition and sample advertisement conversion data of each sample advertisement in the N sample advertisements; the N sample advertisements are used for constructing a root node of the initial probability lifting tree; the initial probability lifting tree is obtained by dividing the N sample advertisements based on the splitting condition indicated by the first optimal splitting point; the first best split point is determined based on sample advertisement attribute characteristics of each sample advertisement in the root node; the N is a positive integer;
and determining target advertisement conversion data of the target advertisement based on the conversion data probability distribution of the target advertisement.
12. A data processing apparatus, comprising:
the sample advertisement acquisition module is used for acquiring N sample advertisements and N sample data pairs corresponding to the N sample advertisements; the N is a positive integer; one sample advertisement corresponds to one sample data pair, and one sample data pair is determined by the sample advertisement attribute feature and the sample advertisement conversion data;
an initial tree determining module, configured to determine, based on the N sample advertisements, a root node for constructing an initial probability lifting tree, take the sample advertisement attribute feature of each sample advertisement in the root node as an initial attribute feature, determine a first splitting point set conforming to the feature type of the initial attribute feature, determine a first optimal splitting point from the first splitting point set, and divide the N sample advertisements according to the splitting condition indicated by the first optimal splitting point, to obtain the initial probability lifting tree;
and the iterative training module is used for carrying out iterative training on the initial probability lifting tree based on the tree convergence condition associated with the initial probability lifting tree and sample advertisement conversion data of each sample advertisement in the N sample advertisements to obtain a target probability lifting tree for predicting conversion data probability distribution of target advertisements.
13. A data processing apparatus, comprising:
the delivery request acquisition module is used for acquiring a delivery feature request for the target advertisement; the delivery feature request comprises target advertisement attribute features of the target advertisement;
the probability distribution determining module is used for acquiring a target probability lifting tree associated with the target advertisement, inputting the target advertisement attribute characteristics into the target probability lifting tree, and outputting conversion data probability distribution of the target advertisement by the target probability lifting tree; the target probability lifting tree is obtained after iterative training of the initial probability lifting tree based on a tree convergence condition and sample advertisement conversion data of each sample advertisement in the N sample advertisements; the N sample advertisements are used for constructing a root node of the initial probability lifting tree; the initial probability lifting tree is obtained by dividing the N sample advertisements based on the splitting condition indicated by the first optimal splitting point; the first best split point is determined based on sample advertisement attribute characteristics of each sample advertisement in the root node; the N is a positive integer;
and the target conversion data determining module is used for determining target advertisement conversion data of the target advertisement based on the conversion data probability distribution of the target advertisement.
14. A computer device, comprising: a processor and a memory;
the processor is connected to a memory, wherein the memory is configured to store a computer program, and the processor is configured to invoke the computer program to cause the computer device to perform the method of any of claims 1-11.
15. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program adapted to be loaded and executed by a processor to cause a computer device having the processor to perform the method of any of claims 1-11.
16. A computer program product or computer program, characterized in that it comprises computer instructions stored in a computer-readable storage medium, which are adapted to be read and executed by a processor to cause a computer device with the processor to perform the method of any of claims 1-11.
CN202111462351.0A 2021-12-02 2021-12-02 Data processing method, device, computer equipment and storage medium Pending CN116226757A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111462351.0A CN116226757A (en) 2021-12-02 2021-12-02 Data processing method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111462351.0A CN116226757A (en) 2021-12-02 2021-12-02 Data processing method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116226757A true CN116226757A (en) 2023-06-06

Family

ID=86575441

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111462351.0A Pending CN116226757A (en) 2021-12-02 2021-12-02 Data processing method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116226757A (en)

Similar Documents

Publication Publication Date Title
CN109408731B (en) Multi-target recommendation method, multi-target recommendation model generation method and device
CN110366734B (en) Optimizing neural network architecture
CN106548210B (en) Credit user classification method and device based on machine learning model training
CN111507768B (en) Potential user determination method and related device
CN110728317A (en) Training method and system of decision tree model, storage medium and prediction method
US9984336B2 (en) Classification rule sets creation and application to decision making
CN111523640B (en) Training method and device for neural network model
CN113361680B (en) Neural network architecture searching method, device, equipment and medium
CN110147882B (en) Neural network model training method, crowd diffusion method, device and equipment
CN111406264A (en) Neural architecture search
EP4350572A1 (en) Method, apparatus and system for generating neural network model, devices, medium and program product
CN111967971B (en) Bank customer data processing method and device
US20220245424A1 (en) Microgenre-based hyper-personalization with multi-modal machine learning
CN111369344B (en) Method and device for dynamically generating early warning rules
CN112085615A (en) Method and device for training graph neural network
US20190228297A1 (en) Artificial Intelligence Modelling Engine
Wu et al. Estimating fund-raising performance for start-up projects from a market graph perspective
CN116992151A (en) Online course recommendation method based on double-tower graph convolution neural network
CN112231299B (en) Method and device for dynamically adjusting feature library
CN110489435B (en) Data processing method and device based on artificial intelligence and electronic equipment
CN114092162B (en) Recommendation quality determination method, and training method and device of recommendation quality determination model
CN110717537A (en) Method and device for training user classification model and executing user classification prediction
CN116226757A (en) Data processing method, device, computer equipment and storage medium
CN111027709B (en) Information recommendation method and device, server and storage medium
CN114329231A (en) Object feature processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40086913

Country of ref document: HK

SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination