CN112288489A - Advertisement reach rate estimation method and device, electronic equipment and medium - Google Patents

Advertisement reach rate estimation method and device, electronic equipment and medium

Info

Publication number
CN112288489A
Authority
CN
China
Prior art keywords
advertisement
reach
reach rate
estimation model
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011205753.8A
Other languages
Chinese (zh)
Inventor
王同乐
段少毅
张荣荣
贾颖姣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Minglue Artificial Intelligence Group Co Ltd
Original Assignee
Shanghai Minglue Artificial Intelligence Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Minglue Artificial Intelligence Group Co Ltd
Priority to CN202011205753.8A
Publication of CN112288489A
Legal status: Withdrawn

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0242Determining effectiveness of advertisements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0276Advertisement creation

Abstract

The application provides a method, a device, electronic equipment and a medium for estimating advertisement reach rate. The method comprises the following steps: constructing an advertisement data set based on acquired advertisement data and dividing it into a training set and a test set, wherein each piece of advertisement data comprises advertisement features and an advertisement reach rate; performing discrete feature coding on the training set and the test set to obtain the advertisement feature vectors corresponding to the training set and to the test set, respectively; constructing an advertisement reach rate estimation model based on a gradient boosting decision tree; inputting the advertisement feature vectors corresponding to the training set into the advertisement reach rate estimation model to be trained, and training the mapping between advertisement features and advertisement reach rate in the model; and inputting the advertisement features of a target advertisement into the trained model to estimate the advertisement reach rate after the target advertisement is delivered. In this way, the heavy feature engineering and feature-combination explosion caused by using logistic regression can be completely avoided.

Description

Advertisement reach rate estimation method and device, electronic equipment and medium
Technical Field
The present application relates to the field of advertisement reach rate estimation technologies, and in particular, to a method, an apparatus, an electronic device, and a medium for estimating advertisement reach rate.
Background
The development of internet technology has brought tremendous change to the form of advertising. Advertising used to be dominated by offline paper advertising and online television advertising. Today, internet advertising has largely replaced traditional advertising and has become the mainstream form, filling people's screens. The boom in internet advertising has made precise advertisement delivery an urgent problem: advertisers hope that precise delivery will save them large advertising expenses.
In the prior art, precise advertisement delivery mostly relies on click-through rate (CTR) estimation, which extracts features from advertisement data to predict whether an advertisement will be clicked by a user. However, precise delivery faces a further challenge: advertisers want to know how high the advertisement reach rate will be when a certain amount of advertising is delivered. Research on advertisement reach rate estimation helps to plan the delivery volume reasonably and thereby save expenses.
Very few advertisement reach rate estimation methods exist. A known one is the curve-fitting-based method proposed by Google in 2013. Based on historical advertising data, it fits the relationship between advertisement features and reach rate with a nonlinear curve. Experiments show that the method works well for interpolation but poorly for extrapolation; that is, its predictions do not generalize well.
Disclosure of Invention
In view of the above, an object of the present application is to provide a method, an apparatus, an electronic device and a medium for estimating advertisement reach rate, which can completely avoid the heavy feature engineering and feature-combination explosion caused by using logistic regression.
In a first aspect, an embodiment of the present application provides an advertisement reach estimation method, including:
constructing an advertisement data set based on a plurality of pieces of acquired advertisement data, and dividing the advertisement data set into a training set and a testing set, wherein each piece of advertisement data comprises advertisement characteristics and an advertisement reach rate;
performing discrete feature coding on the training set and the test set to respectively obtain advertisement feature vectors corresponding to the training set and advertisement feature vectors corresponding to the test set;
constructing an advertisement reach rate estimation model based on a gradient boosting decision tree;
inputting the advertisement characteristic vector corresponding to the training set into an advertisement reach rate pre-estimation model to be trained, and training the mapping relation between the advertisement characteristic and the advertisement reach rate in the advertisement reach rate pre-estimation model;
inputting the advertisement characteristics of the target advertisement into a trained advertisement reach rate estimation model to estimate the advertisement reach rate after the target advertisement is put.
In a possible implementation manner, after constructing an advertisement data set based on the obtained several pieces of advertisement data, before dividing the advertisement data set into a training set and a test set, the method further includes:
and performing data cleaning processing including null value removal and feature screening on the advertisement data set.
In one possible implementation, the advertising features include: at least one of commodity information, advertising media information, advertising forms, target audience information, user equipment information, advertising market information, and exposure.
In a possible implementation manner, before inputting the advertisement features of the target advertisement into the trained advertisement reach estimation model to estimate the advertisement reach after the target advertisement is delivered, the method further includes:
inputting the advertisement characteristic vectors corresponding to the test set into a trained advertisement reach rate pre-estimation model for testing;
and evaluating the performance index of the advertisement reach rate estimation model based on the test result.
In one possible embodiment, the performance indicators of the advertisement reach prediction model include: at least one of goodness-of-fit, root mean square error, mean absolute error, and mean absolute percentage error.
In a possible implementation manner, performing discrete feature coding on the training set and the test set to obtain an advertisement feature vector corresponding to the training set and an advertisement feature vector corresponding to the test set, respectively, includes:
performing discrete feature coding on discrete features in the training set and the test set by adopting label coding;
and carrying out discrete feature coding on the continuous features in the training set and the test set by adopting histogram coding.
In one possible implementation, constructing a gradient boosting decision tree-based advertisement reach prediction model includes:
initializing a weak learner, wherein, at initialization, the gradient boosting decision tree is a tree with only a root node;
performing the following steps for each strong learner:
for each sample, calculating the negative gradient of its square loss function;
taking the approximate value of the negative gradient as a residual, and constructing the training data of the next tree based on the residual to obtain a regression tree;
calculating a target fitting value for each leaf node region of the regression tree;
updating the strong learner based on the target fitting value;
and obtaining a target learner based on the accumulated sum of the weak learner and each strong learner, thereby constructing the advertisement reach rate estimation model based on a gradient boosting decision tree.
In a second aspect, an embodiment of the present application further provides an advertisement reach estimation apparatus, including:
the dividing module is used for constructing an advertisement data set based on a plurality of pieces of acquired advertisement data and dividing the advertisement data set into a training set and a test set, wherein each piece of advertisement data comprises advertisement features and an advertisement reach rate;
the coding module is used for carrying out discrete feature coding on the training set and the test set to respectively obtain advertisement feature vectors corresponding to the training set and advertisement feature vectors corresponding to the test set;
the construction module is used for constructing an advertisement reach rate estimation model based on a gradient boosting decision tree;
the training module is used for inputting the advertisement characteristic vectors corresponding to the training set into an advertisement reach rate pre-estimation model to be trained and training the mapping relation between the advertisement characteristics and the advertisement reach rate in the advertisement reach rate pre-estimation model;
and the estimation module is used for inputting the advertisement characteristics of the target advertisement into the trained advertisement reach estimation model to estimate the advertisement reach after the target advertisement is delivered.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the first aspect described above, or any possible implementation of the first aspect.
In a fourth aspect, this application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps in the first aspect or any one of the possible implementation manners of the first aspect.
According to the advertisement reach rate estimation method provided by the embodiment of the application, an advertisement data set is first constructed based on a plurality of pieces of acquired advertisement data and divided into a training set and a test set, each piece of advertisement data comprising advertisement features and an advertisement reach rate. Because the input features of the gradient boosting decision tree (GBDT) must be discrete features, discrete feature coding is performed on the training set and the test set to obtain the advertisement feature vectors corresponding to the training set and to the test set, respectively. An advertisement reach rate estimation model based on a gradient boosting decision tree is then constructed; that is, the gradient boosting decision tree is used to build a regression model of the relationship between advertisement features and reach rate. The advertisement feature vectors corresponding to the training set are input into the advertisement reach rate estimation model to be trained, and the mapping between advertisement features and advertisement reach rate is trained in the model. Finally, the advertisement features of the target advertisement are input into the trained model to estimate the advertisement reach rate after the target advertisement is delivered.
The embodiment of the application uses a gradient boosting decision tree to build a regression model of the relationship between advertisement features and reach rate, namely an advertisement reach rate estimation model based on a gradient boosting decision tree. The model has excellent regression capability, is composed of a plurality of decision trees, can perform feature engineering automatically, is well suited to advertisement data with a large number of discrete features, and completely avoids the heavy feature engineering and feature-combination explosion caused by using logistic regression (LR).
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
FIG. 1 is a flowchart illustrating a method for estimating advertisement reach provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram illustrating an advertisement reach estimation apparatus according to an embodiment of the present disclosure;
FIG. 3 shows a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
Very few advertisement reach rate estimation methods exist. A known one is the curve-fitting-based method proposed by Google in 2013. Based on historical advertising data, it fits the relationship between advertisement features and reach rate with a nonlinear curve. Experiments show that the method works well for interpolation but poorly for extrapolation; that is, its predictions do not generalize well. Based on this, embodiments of the present application provide a method, an apparatus, an electronic device, and a medium for estimating an advertisement reach rate, which are described below with reference to embodiments.
To facilitate understanding of the embodiment, a detailed description will be given to an advertisement reach estimation method disclosed in the embodiment of the present application.
Referring to fig. 1, fig. 1 is a flowchart illustrating a method for estimating an advertisement reach rate according to an embodiment of the present disclosure. As shown in fig. 1, the method may include the steps of:
S101, constructing an advertisement data set based on a plurality of pieces of acquired advertisement data, and dividing the advertisement data set into a training set and a test set, wherein each piece of advertisement data comprises advertisement features and an advertisement reach rate;
S102, performing discrete feature coding on the training set and the test set to obtain the advertisement feature vectors corresponding to the training set and to the test set, respectively;
S103, constructing an advertisement reach rate estimation model based on a gradient boosting decision tree;
S104, inputting the advertisement feature vectors corresponding to the training set into the advertisement reach rate estimation model to be trained, and training the mapping between advertisement features and advertisement reach rate in the model;
S105, inputting the advertisement features of the target advertisement into the trained advertisement reach rate estimation model to estimate the advertisement reach rate after the target advertisement is delivered.
In this embodiment, the advertisement reach rate, generally denoted $\mathrm{Reach}_j$, is the proportion of the number of target netizens $U_j$ hit by advertisement j in a specific market to the number of target netizens $T_j$ of the network:
$$\mathrm{Reach}_j = \frac{U_j}{T_j}$$
The i+ reach rate is the proportion of the number of target netizens hit by the advertisement at least i times to the number of target netizens of the network:
$$\mathrm{Reach}_j^{i+} = \frac{U_j^{i+}}{T_j}$$
where $U_j^{i+}$ denotes the number of target netizens hit by advertisement j at least i times.
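As a small illustration of the two definitions above, the following sketch computes the i+ reach rate from per-user hit counts. The function name `reach_at_least`, the `exposures` mapping, and the sample numbers are hypothetical and are not taken from the application.

```python
from collections import Counter

def reach_at_least(exposures, target_population, i=1):
    """Share of the target population hit by the advertisement at least i times.

    exposures: mapping user_id -> number of times the advertisement hit that user
    target_population: number of target netizens of the network (T_j)
    """
    if target_population <= 0:
        raise ValueError("target_population must be positive")
    hit_users = sum(1 for count in exposures.values() if count >= i)
    return hit_users / target_population

# Hypothetical hit log: one entry per time the advertisement hit a user.
hits = ["u1", "u2", "u1", "u3", "u1", "u2"]
exposures = Counter(hits)  # u1 was hit 3 times, u2 twice, u3 once

reach_1 = reach_at_least(exposures, target_population=10, i=1)  # 3 of 10 users
reach_3 = reach_at_least(exposures, target_population=10, i=3)  # 1 of 10 users
```

With i = 1 the function gives the plain reach rate; larger i gives the stricter i+ variants.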
In step S101, first, a plurality of pieces of advertisement data are collected through offline research and front-end web page event tracking, each piece of advertisement data including advertisement features and an advertisement reach rate; second, an advertisement data set is constructed from these pieces of advertisement data; finally, the advertisement data set is divided into a training set and a test set at a ratio of 80% to 20%. Other ratios may be used to divide the training and test sets in a specific implementation.
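The 80%/20% division can be sketched as follows. Shuffling with a fixed seed before cutting is an assumption made here for reproducibility; the application does not state how records are assigned to the two sets. The helper `split_dataset` and the toy records are hypothetical.

```python
import random

def split_dataset(records, train_ratio=0.8, seed=0):
    """Shuffle a copy of the records and split it into training and test sets."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be between 0 and 1")
    shuffled = list(records)               # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle (assumption)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical advertisement records with a reach-rate label.
records = [{"ad_id": k, "reach": k / 100} for k in range(10)]
train_set, test_set = split_dataset(records)
```

Passing a different `train_ratio` covers the other ratios mentioned above.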
In a possible implementation manner, after the advertisement data set is constructed from the acquired pieces of advertisement data and before it is divided into a training set and a test set, the method further includes performing data cleaning on the advertisement data set, including null value removal and feature screening. Null value removal eliminates advertisement data with missing fields. Feature screening selects the advertisement features that are highly related to the advertisement reach rate, specifically at least one of commodity information, advertising media information, advertising form, target audience information, user equipment information, advertising market information, and exposure. The resulting usable advertisement data set has size N × D, where N represents the number of advertisements (each record represents an exposure) and D represents the feature dimension.
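A minimal sketch of the cleaning step follows, assuming each record is a dictionary. The field names in `FEATURES`, the label name `reach`, and the sample records are hypothetical stand-ins for the seven screened features listed above.

```python
FEATURES = ["commodity", "media", "ad_form", "audience", "device", "market", "exposure"]
LABEL = "reach"

def clean(records, features=FEATURES, label=LABEL):
    """Drop records with missing fields and keep only the screened features."""
    cleaned = []
    for record in records:
        wanted = {key: record.get(key) for key in features + [label]}
        if any(value is None or value == "" for value in wanted.values()):
            continue  # null value removal: a required field is missing
        cleaned.append(wanted)  # feature screening: all other fields are dropped
    return cleaned

raw = [
    {"commodity": "phone", "media": "video_site", "ad_form": "pre_roll",
     "audience": "18-30", "device": "mobile", "market": "tier1",
     "exposure": 120000, "reach": 0.31, "note": "unrelated field"},
    {"commodity": "phone", "media": None, "ad_form": "banner",
     "audience": "18-30", "device": "pc", "market": "tier2",
     "exposure": 50000, "reach": 0.12},
]
dataset = clean(raw)  # only the first record survives
```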
In step S102, assume that the advertisement features are 7-dimensional, namely commodity information, advertising media information, advertising form, target audience information, user equipment information, advertising market information, and exposure; the first 6 dimensions are discrete features and the 7th dimension is a continuous feature. Since the input features of the gradient boosting decision tree (GBDT) must be discrete features, the discrete features in the training set and the test set are encoded with a label encoder (LE), and the continuous feature in the training set and the test set is encoded with a histogram encoder (HE). The processed data set is $S = \{(x_1, y_1), \ldots, (x_i, y_i), \ldots, (x_N, y_N)\}$, where $x_i$ is a 7-dimensional advertisement feature vector and $y_i$ is a 1+ Reach scalar.
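The two encoders can be sketched as follows. The application does not specify how the histogram encoder chooses its bins, so equal-width binning is an assumption here, and all function names and sample values are hypothetical.

```python
def fit_label_encoder(values):
    """Map each distinct category to an integer code (sorted for determinism)."""
    return {category: code for code, category in enumerate(sorted(set(values)))}

def fit_histogram_encoder(values, n_bins=4):
    """Return the inner edges of equal-width bins covering the observed range."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins or 1.0  # avoid zero width for a constant column
    return [low + width * k for k in range(1, n_bins)]

def histogram_encode(value, edges):
    """Index of the bin containing value; out-of-range values are clipped."""
    return sum(1 for edge in edges if value >= edge)

# Hypothetical training columns: one discrete feature, one continuous feature.
media = ["tv", "video_site", "social", "video_site"]
exposure = [10_000, 40_000, 70_000, 100_000]

media_codes = fit_label_encoder(media)             # social=0, tv=1, video_site=2
edges = fit_histogram_encoder(exposure, n_bins=3)  # inner edges at 40000 and 70000
encoded = [(media_codes[m], histogram_encode(e, edges))
           for m, e in zip(media, exposure)]
```

In this sketch the encoders are fitted on training data only and then reused for the test set, so that the same category or exposure value always maps to the same code.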
In step S103, a Gradient Boosting Decision Tree (GBDT) is a Tree model, and regression of target variables is realized by combining multiple Decision trees.
Specifically, step S103 may include the following sub-steps:
S1031, initializing the weak learner, wherein, at initialization, the gradient boosting decision tree is a tree with only a root node;
S1032, performing the following steps for each strong learner:
for each sample, calculating the negative gradient of its square loss function;
taking the approximate value of the negative gradient as a residual, and constructing the training data of the next tree based on the residual to obtain a regression tree;
calculating a target fitting value for each leaf node region of the regression tree;
updating the strong learner based on the target fitting value;
S1033, obtaining a target learner based on the accumulated sum of the weak learner and each strong learner, thereby constructing the advertisement reach rate estimation model based on a gradient boosting decision tree.
In step S1031, the weak learner $f_0(x)$ is initialized:
$$f_0(x) = \arg\min_{c} \sum_{i=1}^{N} L(y_i, c)$$
where the GBDT is a tree with only a root node at initialization, $f_0(x)$ denotes the weak learner, $L(y_i, c)$ denotes the square loss function, $y_i$ denotes the 1+ Reach scalar of the i-th sample, $c$ denotes the sample label mean, and $N$ denotes the number of samples.
In step S1032, the following steps are performed for each strong learner (m = 1, 2, …, M):
a) For each sample i = 1, 2, …, N, the negative gradient is calculated:
$$r_{im} = -\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f(x) = f_{m-1}(x)}$$
where $x_i$ denotes the 7-dimensional advertisement feature vector of the i-th sample, $f(x_i)$ denotes the advertisement reach rate estimated from the 7-dimensional advertisement feature vector of the i-th sample, $f_{m-1}(x)$ denotes the (m-1)-th strong learner, and $r_{im}$ denotes the residual.
b) The residual obtained in the previous step is taken as the new actual value of the sample, and the data $(x_i, r_{im})$, i = 1, 2, …, N, are used as the training data of the next tree to obtain a new regression tree $f_m(x)$, whose leaf node regions are $R_{jm}$, j = 1, 2, …, J, where J is the number of leaf nodes of regression tree m.
c) For each leaf region j = 1, 2, …, J, the best fit value $\Upsilon_{jm}$ is calculated:
$$\Upsilon_{jm} = \arg\min_{\Upsilon} \sum_{x_i \in R_{jm}} L\big(y_i, f_{m-1}(x_i) + \Upsilon\big)$$
d) The strong learner is updated:
$$f_m(x) = f_{m-1}(x) + \sum_{j=1}^{J} \Upsilon_{jm}\, I(x \in R_{jm})$$
where $I(\cdot)$ is the indicator function.
In step S1033, the final learner is obtained:
$$f(x) = f_M(x) = f_0(x) + \sum_{m=1}^{M} \sum_{j=1}^{J} \Upsilon_{jm}\, I(x \in R_{jm})$$
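Under a square loss, the initial learner, the negative gradient, and the leaf fit value described above reduce to simple closed forms. The derivation below assumes the common scaling $L(y, f) = \tfrac{1}{2}(y - f)^2$; the application only states that the loss is a square loss, so the factor $\tfrac{1}{2}$ is an assumption, as is the symbol $\Upsilon_{jm}$ for the leaf fit value.

```latex
\begin{aligned}
L(y, f) &= \tfrac{1}{2}(y - f)^2 \\[4pt]
f_0(x) &= \arg\min_{c} \sum_{i=1}^{N} \tfrac{1}{2}(y_i - c)^2
  \;\Longrightarrow\; \sum_{i=1}^{N} (c - y_i) = 0
  \;\Longrightarrow\; c = \frac{1}{N} \sum_{i=1}^{N} y_i \\[4pt]
r_{im} &= -\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f = f_{m-1}}
  = y_i - f_{m-1}(x_i) \\[4pt]
\Upsilon_{jm} &= \arg\min_{\Upsilon} \sum_{x_i \in R_{jm}} \tfrac{1}{2}\big(r_{im} - \Upsilon\big)^2
  = \frac{1}{|R_{jm}|} \sum_{x_i \in R_{jm}} r_{im}
\end{aligned}
```

So the root-only initial tree predicts the label mean, the negative gradient is the ordinary residual, and each leaf's fit value is the mean residual of the samples that fall into it.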
In a specific implementation, parameters such as the number of learners M, the maximum tree depth, and the learning rate are preset.
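The boosting procedure of steps S1031 to S1033 can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the applicant's implementation: it assumes the loss $\tfrac{1}{2}(y - f)^2$, grows each regression tree by greedy squared-error splits, and applies the learning rate as a shrinkage factor on every tree. All function names and the toy data are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def build_tree(X, r, depth, max_depth):
    """Fit a regression tree to residuals r using greedy squared-error splits."""
    leaf = {"value": mean(r)}  # leaf fit value under square loss: mean residual
    if depth >= max_depth or len(r) < 2:
        return leaf
    best = None
    for d in range(len(X[0])):
        for threshold in sorted({row[d] for row in X})[1:]:
            left = [i for i, row in enumerate(X) if row[d] < threshold]
            right = [i for i, row in enumerate(X) if row[d] >= threshold]
            error = 0.0
            for idx in (left, right):
                m = mean([r[i] for i in idx])
                error += sum((r[i] - m) ** 2 for i in idx)
            if best is None or error < best[0]:
                best = (error, d, threshold, left, right)
    if best is None:  # all rows identical: no split is possible
        return leaf
    _, d, threshold, left, right = best
    return {
        "feature": d,
        "threshold": threshold,
        "left": build_tree([X[i] for i in left], [r[i] for i in left], depth + 1, max_depth),
        "right": build_tree([X[i] for i in right], [r[i] for i in right], depth + 1, max_depth),
    }

def predict_tree(tree, x):
    while "value" not in tree:
        tree = tree["left"] if x[tree["feature"]] < tree["threshold"] else tree["right"]
    return tree["value"]

def fit_gbdt(X, y, n_trees=50, max_depth=2, learning_rate=0.1):
    f0 = mean(y)  # S1031: root-only initial learner predicts the label mean
    preds = [f0] * len(y)
    trees = []
    for _ in range(n_trees):  # S1032: one regression tree per boosting round
        residuals = [yi - pi for yi, pi in zip(y, preds)]  # negative gradient
        tree = build_tree(X, residuals, 0, max_depth)
        trees.append(tree)
        preds = [p + learning_rate * predict_tree(tree, x) for p, x in zip(preds, X)]
    return f0, trees, learning_rate

def predict_gbdt(model, x):
    f0, trees, learning_rate = model  # S1033: accumulated sum of all learners
    return f0 + learning_rate * sum(predict_tree(t, x) for t in trees)

# Hypothetical encoded samples: (media code, exposure bin) -> 1+ reach rate.
X = [[0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2]]
y = [0.10, 0.20, 0.30, 0.15, 0.30, 0.45]
model = fit_gbdt(X, y, n_trees=200, max_depth=2, learning_rate=0.1)
train_mse = mean([(predict_gbdt(model, x) - yi) ** 2 for x, yi in zip(X, y)])
```

The three arguments `n_trees`, `max_depth`, and `learning_rate` correspond to the preset parameters named above. A production system would more likely rely on an existing gradient boosting library than on a hand-written tree builder.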
In a possible implementation manner, before the advertisement features of the target advertisement are input into the trained advertisement reach rate estimation model to estimate the advertisement reach rate after the target advertisement is delivered, the method further includes: inputting the advertisement feature vectors corresponding to the test set into the trained advertisement reach rate estimation model for testing; and evaluating the performance indicators of the advertisement reach rate estimation model based on the test result. The performance indicators of the advertisement reach rate estimation model include at least one of goodness of fit $R^2$, root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE).
The goodness of fit $R^2$ characterizes how well the regression equation accounts for the variation of the dependent variable, or how well the equation fits the observed values. For the goodness of fit to be valid, it is generally required that number of independent variables : number of samples > 1:10. The calculation formula is as follows:
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
where $R^2$ represents the goodness of fit, SSR represents the regression sum of squares, SST represents the total sum of squared deviations, and SSE represents the residual sum of squares.
The root mean square error (RMSE) is the square root of the mean of the squared errors, calculated as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \big(\mathrm{actual}(t) - \mathrm{forecast}(t)\big)^2}$$
where actual(t) represents the actual value, forecast(t) represents the predicted value, and n is the number of samples evaluated.
The smaller the mean absolute error (MAE), the more accurate the prediction model. Its calculation formula is as follows:
$$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \big|\mathrm{actual}(t) - \mathrm{forecast}(t)\big|$$
The mean absolute percentage error (MAPE) is a variation of the MAE that is expressed as a percentage and is therefore easier to interpret than other statistics. For example, a MAPE of 5 means that the predictions deviate from the true values by 5% on average. The calculation formula for MAPE is as follows:
$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left|\frac{\mathrm{actual}(t) - \mathrm{forecast}(t)}{\mathrm{actual}(t)}\right|$$
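The four performance indicators can be computed together, as in the following sketch. The function name `regression_metrics` and the sample values are hypothetical; $R^2$ is computed in its 1 - SSE/SST form.

```python
import math

def regression_metrics(actual, forecast):
    """Return R^2, RMSE, MAE and MAPE (in percent) for paired values."""
    n = len(actual)
    if n == 0 or n != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equally long")
    errors = [a - f for a, f in zip(actual, forecast)]
    mean_actual = sum(actual) / n
    sse = sum(e * e for e in errors)                   # residual sum of squares
    sst = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    return {
        # R^2 is undefined when all actual values are equal (sst == 0).
        "r2": 1.0 - sse / sst,
        "rmse": math.sqrt(sse / n),
        "mae": sum(abs(e) for e in errors) / n,
        # MAPE is undefined when an actual value is 0.
        "mape": 100.0 / n * sum(abs(e / a) for e, a in zip(errors, actual)),
    }

# Hypothetical test-set reach rates and model predictions.
actual = [0.10, 0.20, 0.40]
forecast = [0.12, 0.18, 0.40]
metrics = regression_metrics(actual, forecast)
```

Because reach rates can be close to zero, MAPE may become unstable on such samples, which is one reason to report it alongside RMSE and MAE rather than alone.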
According to the advertisement reach rate estimation method provided by the embodiment of the application, an advertisement data set is first constructed based on a plurality of pieces of acquired advertisement data and divided into a training set and a test set, each piece of advertisement data comprising advertisement features and an advertisement reach rate. Because the input features of the gradient boosting decision tree (GBDT) must be discrete features, discrete feature coding is performed on the training set and the test set to obtain the advertisement feature vectors corresponding to the training set and to the test set, respectively. An advertisement reach rate estimation model based on a gradient boosting decision tree is then constructed; that is, the gradient boosting decision tree is used to build a regression model of the relationship between advertisement features and reach rate. The advertisement feature vectors corresponding to the training set are input into the advertisement reach rate estimation model to be trained, and the mapping between advertisement features and advertisement reach rate is trained in the model. Finally, the advertisement features of the target advertisement are input into the trained model to estimate the advertisement reach rate after the target advertisement is delivered.
The embodiment of the application uses a gradient boosting decision tree to build a regression model of the relationship between advertisement features and reach rate, namely an advertisement reach rate estimation model based on a gradient boosting decision tree. The model has excellent regression capability, is composed of a plurality of decision trees, can perform feature engineering automatically, is well suited to advertisement data with a large number of discrete features, and completely avoids the heavy feature engineering and feature-combination explosion caused by using logistic regression (LR).
Based on the same technical concept, embodiments of the present application further provide an advertisement reach rate pre-estimation apparatus, an electronic device, a computer storage medium, and the like, which can be seen in the following embodiments.
Referring to fig. 2, fig. 2 is a schematic structural diagram of an advertisement reach estimation apparatus according to an embodiment of the present disclosure. As shown in fig. 2, the apparatus may include:
the dividing module 10 is configured to construct an advertisement data set based on a plurality of pieces of acquired advertisement data and to divide the advertisement data set into a training set and a test set, wherein each piece of advertisement data comprises advertisement features and an advertisement reach rate;
the encoding module 20 is configured to perform discrete feature encoding on the training set and the test set to obtain an advertisement feature vector corresponding to the training set and an advertisement feature vector corresponding to the test set, respectively;
the building module 30 is configured to build an advertisement reach rate estimation model based on a gradient boosting decision tree;
the training module 40 is configured to input the advertisement feature vector corresponding to the training set into an advertisement reach rate pre-estimation model to be trained, and train a mapping relationship between the advertisement feature and the advertisement reach rate in the advertisement reach rate pre-estimation model;
and the estimation module 50 is used for inputting the advertisement characteristics of the target advertisement into the trained advertisement reach estimation model to estimate the advertisement reach after the target advertisement is delivered.
An embodiment of the present application discloses an electronic device, as shown in fig. 3, including: a processor 301, a memory 302, and a bus 303, the memory 302 storing machine readable instructions executable by the processor 301, the processor 301 and the memory 302 communicating via the bus 303 when the electronic device is operating. The machine readable instructions, when executed by the processor 301, perform the steps of:
constructing an advertisement data set based on a plurality of pieces of acquired advertisement data, and dividing the advertisement data set into a training set and a test set, wherein each piece of advertisement data comprises advertisement features and an advertisement reach rate;
performing discrete feature encoding on the training set and the test set to obtain, respectively, advertisement feature vectors corresponding to the training set and advertisement feature vectors corresponding to the test set;
constructing an advertisement reach rate estimation model based on a gradient boosting decision tree;
inputting the advertisement feature vectors corresponding to the training set into the advertisement reach rate estimation model to be trained, and training the mapping relationship between the advertisement features and the advertisement reach rate in the advertisement reach rate estimation model;
inputting the advertisement features of a target advertisement into the trained advertisement reach rate estimation model to estimate the advertisement reach rate after the target advertisement is delivered.
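The first of these steps, dividing the data set into a training set and a test set, can be sketched as follows. This is only an illustration under stated assumptions: the record fields (`media`, `form`, `exposure`, `reach_rate`), the 80/20 ratio, the random shuffle and the helper name `split_dataset` are hypothetical and are not specified by the application.

```python
import random

def split_dataset(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and split it into (training set, test set)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical records: advertisement features plus the observed reach rate.
ads = [
    {"media": "video_app", "form": "pre_roll",
     "exposure": 1000 * (i + 1), "reach_rate": 0.05 * (i + 1)}
    for i in range(10)
]
train_set, test_set = split_dataset(ads, test_fraction=0.2, seed=42)
```

Holding out the test set before any encoding or training keeps the later performance evaluation independent of the data the model was fitted on.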
In one possible implementation, after constructing the advertisement data set based on the acquired pieces of advertisement data and before dividing the advertisement data set into a training set and a test set, the processor 301 further performs:
performing data cleaning on the advertisement data set, the data cleaning including null value removal and feature screening.
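A minimal sketch of such a cleaning step is given below. The application does not state which screening criterion is used, so dropping features that take a single value across the whole data set is an assumption made here for illustration; the function name `clean_dataset` and the field names are likewise hypothetical.

```python
def clean_dataset(records, feature_names, target_name="reach_rate"):
    """Drop records with null values, then drop features that never vary."""
    required = list(feature_names) + [target_name]
    # Null value removal: keep a record only if every required field is filled in.
    complete = [
        r for r in records
        if all(r.get(k) not in (None, "") for k in required)
    ]
    # Feature screening (assumed criterion): keep features with more than one distinct value.
    kept = [f for f in feature_names if len({r[f] for r in complete}) > 1]
    cleaned = [{k: r[k] for k in kept + [target_name]} for r in complete]
    return cleaned, kept

raw = [
    {"media": "tv", "form": "banner", "exposure": 1000, "reach_rate": 0.10},
    {"media": "web", "form": "banner", "exposure": None, "reach_rate": 0.20},
    {"media": "web", "form": "banner", "exposure": 3000, "reach_rate": 0.25},
]
cleaned, kept = clean_dataset(raw, ["media", "form", "exposure"])
# The second record is dropped (null exposure); "form" is dropped (constant).
```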
In one possible implementation, the advertising features include: at least one of commodity information, advertising media information, advertising forms, target audience information, user equipment information, advertising market information, and exposure.
In a possible implementation manner, before the processor 301 inputs the advertisement features of the target advertisement into the trained advertisement reach estimation model to estimate the advertisement reach after the target advertisement is delivered, the method further includes:
inputting the advertisement characteristic vectors corresponding to the test set into a trained advertisement reach rate pre-estimation model for testing;
and evaluating the performance index of the advertisement reach rate estimation model based on the test result.
In one possible embodiment, the performance indicators of the advertisement reach prediction model include: at least one of goodness-of-fit, root mean square error, mean absolute error, and mean absolute percentage error.
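For reference, the four indicators named above can be computed from the test-set reach rates and the model's predictions as in the sketch below. It assumes that goodness-of-fit means the coefficient of determination (R²), that the true reach rates are non-zero (needed for the percentage error) and that they are not all identical (needed for R²); the function name is hypothetical.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, MAE and MAPE for paired true and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }

# Hypothetical test-set reach rates and model predictions.
metrics = regression_metrics([0.10, 0.20, 0.40], [0.12, 0.18, 0.40])
```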
In a possible implementation, the processor 301 performing discrete feature encoding on the training set and the test set to obtain, respectively, an advertisement feature vector corresponding to the training set and an advertisement feature vector corresponding to the test set includes:
performing discrete feature encoding on the discrete features in the training set and the test set by label encoding;
and performing discrete feature encoding on the continuous features in the training set and the test set by histogram encoding.
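The two encodings can be sketched as follows. The application does not define histogram encoding in detail; the sketch assumes it means replacing each continuous value with the index of a bin, and it uses equal-width bins fitted on the training set only. The bin count, the `-1` code for unseen categories and all function names are assumptions made for illustration.

```python
import bisect

def fit_label_encoder(values):
    """Map each distinct category to an integer code, in sorted order."""
    return {category: code for code, category in enumerate(sorted(set(values)))}

def label_encode(values, mapping, unknown=-1):
    """Encode categories; categories unseen during fitting get the `unknown` code."""
    return [mapping.get(v, unknown) for v in values]

def fit_histogram_bins(values, n_bins=4):
    """Return the interior edges of n_bins equal-width bins over the training range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]

def histogram_encode(values, edges):
    """Replace each value with the index of the bin it falls into."""
    return [bisect.bisect_right(edges, v) for v in values]

# Fit on the training set, then apply the same mapping and edges to the test set.
media_codes = fit_label_encoder(["tv", "web", "app", "web"])
exposure_edges = fit_histogram_bins([0, 10, 20, 30, 40], n_bins=4)
test_media = label_encode(["web", "radio"], media_codes)
test_exposure = histogram_encode([5, 10, 40, 100], exposure_edges)
```

Fitting the mapping and the bin edges on the training set and reusing them for the test set keeps the two sets of feature vectors comparable.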
In one possible implementation, the processor 301 constructing an advertisement reach rate estimation model based on a gradient boosting decision tree includes:
initializing a weak learner, wherein at initialization the decision tree in the gradient boosting process is a tree with only a root node;
performing the following steps for each strong learner:
for each sample, calculating the negative gradient of its squared loss function;
taking the negative gradient as an approximation of the residual, and constructing the training data of the next tree based on the residual to obtain a regression tree;
calculating a target fitting value for each leaf node region of the regression tree;
updating the strong learner based on the target fitting value;
and obtaining a target learner based on the accumulated sum of the weak learner and each strong learner, thereby constructing the advertisement reach rate estimation model based on a gradient boosting decision tree.
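The steps above can be sketched in plain Python as follows. This is a simplified illustration rather than the implementation of the application: it uses a single numeric feature, regression trees of depth one (stumps), and a learning rate (shrinkage) that the text does not mention. All names and sample values are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree on one feature; leaf values are residual means."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all feature values identical: no split is possible
        mean = sum(residuals) / len(residuals)
        return max(xs), mean, mean
    return best[1], best[2], best[3]

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value

def fit_gbdt(xs, ys, n_rounds=50, learning_rate=0.1):
    # Initialization: a root-only tree is a constant; for squared loss the
    # best constant is the mean of the targets.
    f0 = sum(ys) / len(ys)
    preds = [f0] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # The negative gradient of 0.5 * (y - F)^2 with respect to F is y - F,
        # i.e. the residual of the current model.
        residuals = [y - p for y, p in zip(ys, preds)]
        # Fit the next tree to the residuals; each leaf value is the mean
        # residual in its region, which minimizes squared loss there.
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Update the accumulated model (learning_rate is an added assumption).
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return f0, stumps, learning_rate

def predict_gbdt(model, x):
    f0, stumps, learning_rate = model
    return f0 + learning_rate * sum(predict_stump(s, x) for s in stumps)

# Hypothetical data: encoded exposure bin -> observed reach rate.
xs = [1, 2, 3, 4, 5, 6]
ys = [0.05, 0.07, 0.10, 0.22, 0.25, 0.27]
model = fit_gbdt(xs, ys)
estimated_reach = predict_gbdt(model, 4)
```

With squared loss the negative gradient equals the residual exactly, which is why each new tree is simply fitted to what the current model still gets wrong.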
The computer program product of the advertisement reach estimation method provided in the embodiment of the present application includes a computer-readable storage medium storing a non-volatile program code executable by a processor, where instructions included in the program code may be used to execute the method described in the foregoing method embodiment, and specific implementation may refer to the method embodiment, which is not described herein again.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a processor-executable, non-volatile computer-readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above embodiments are only specific embodiments of the present application, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed in the present application, anyone skilled in the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present application, and shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An advertisement reach estimation method is characterized by comprising the following steps:
constructing an advertisement data set based on a plurality of pieces of acquired advertisement data, and dividing the advertisement data set into a training set and a test set, wherein each piece of advertisement data comprises advertisement features and an advertisement reach rate;
performing discrete feature encoding on the training set and the test set to obtain, respectively, advertisement feature vectors corresponding to the training set and advertisement feature vectors corresponding to the test set;
constructing an advertisement reach rate estimation model based on a gradient boosting decision tree;
inputting the advertisement feature vectors corresponding to the training set into the advertisement reach rate estimation model to be trained, and training the mapping relationship between the advertisement features and the advertisement reach rate in the advertisement reach rate estimation model;
inputting the advertisement features of a target advertisement into the trained advertisement reach rate estimation model to estimate the advertisement reach rate after the target advertisement is delivered.
2. The method of claim 1, wherein after constructing an advertisement data set based on the obtained pieces of advertisement data, before dividing the advertisement data set into a training set and a test set, further comprising:
and performing data cleaning processing including null value removal and feature screening on the advertisement data set.
3. The method of claim 1, wherein the advertising feature comprises: at least one of commodity information, advertising media information, advertising forms, target audience information, user equipment information, advertising market information, and exposure.
4. The method of claim 1, wherein before inputting the advertisement features of the target advertisement into the trained advertisement reach rate estimation model to estimate the advertisement reach rate after the target advertisement is delivered, the method further comprises:
inputting the advertisement characteristic vectors corresponding to the test set into a trained advertisement reach rate pre-estimation model for testing;
and evaluating the performance index of the advertisement reach rate estimation model based on the test result.
5. The method of claim 4, wherein the performance indicators of the ad reach prediction model comprise: at least one of goodness-of-fit, root mean square error, mean absolute error, and mean absolute percentage error.
6. The method of claim 1, wherein performing discrete feature coding on the training set and the test set to obtain an advertisement feature vector corresponding to the training set and an advertisement feature vector corresponding to the test set, respectively, comprises:
performing discrete feature coding on discrete features in the training set and the test set by adopting label coding;
and carrying out discrete feature coding on the continuous features in the training set and the test set by adopting histogram coding.
7. The method of claim 1, wherein constructing an advertisement reach rate estimation model based on a gradient boosting decision tree comprises:
initializing a weak learner, wherein at initialization the decision tree in the gradient boosting process is a tree with only a root node;
performing the following steps for each strong learner:
for each sample, calculating the negative gradient of its squared loss function;
taking the negative gradient as an approximation of the residual, and constructing the training data of the next tree based on the residual to obtain a regression tree;
calculating a target fitting value for each leaf node region of the regression tree;
updating the strong learner based on the target fitting value;
and obtaining a target learner based on the accumulated sum of the weak learner and each strong learner, thereby constructing the advertisement reach rate estimation model based on a gradient boosting decision tree.
8. An advertisement reach rate pre-estimation device, comprising:
a dividing module, configured to construct an advertisement data set based on a plurality of pieces of acquired advertisement data and to divide the advertisement data set into a training set and a test set, wherein each piece of advertisement data comprises advertisement features and an advertisement reach rate;
an encoding module, configured to perform discrete feature encoding on the training set and the test set to obtain, respectively, advertisement feature vectors corresponding to the training set and advertisement feature vectors corresponding to the test set;
a construction module, configured to construct an advertisement reach rate estimation model based on a gradient boosting decision tree;
a training module, configured to input the advertisement feature vectors corresponding to the training set into the advertisement reach rate estimation model to be trained, and to train the mapping relationship between the advertisement features and the advertisement reach rate in the advertisement reach rate estimation model;
and an estimation module, configured to input the advertisement features of a target advertisement into the trained advertisement reach rate estimation model to estimate the advertisement reach rate after the target advertisement is delivered.
9. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, is adapted to carry out the steps of the method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011205753.8A CN112288489A (en) 2020-11-02 2020-11-02 Advertisement reach rate estimation method and device, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011205753.8A CN112288489A (en) 2020-11-02 2020-11-02 Advertisement reach rate estimation method and device, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN112288489A true CN112288489A (en) 2021-01-29

Family

ID=74352905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011205753.8A Withdrawn CN112288489A (en) 2020-11-02 2020-11-02 Advertisement reach rate estimation method and device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN112288489A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113191830A (en) * 2021-07-02 2021-07-30 北京明略软件系统有限公司 Resource allocation method, device, equipment and computer readable medium
CN113347181A (en) * 2021-06-01 2021-09-03 上海明略人工智能(集团)有限公司 Abnormal advertisement flow detection method, system, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN106651542B (en) Article recommendation method and device
CN110033314B (en) Advertisement data processing method and device
CN107220845B (en) User re-purchase probability prediction/user quality determination method and device and electronic equipment
CN105678587B (en) Recommendation feature determination method, information recommendation method and device
CN109961142B (en) Neural network optimization method and device based on meta learning
CN108363821A (en) A kind of information-pushing method, device, terminal device and storage medium
CN107526810B (en) Method and device for establishing click rate estimation model and display method and device
CN109949089B (en) Method, device and terminal for determining display rate
CN105654198B (en) Brand advertisement effect optimization method capable of realizing optimal threshold value selection
CN112288489A (en) Advertisement reach rate estimation method and device, electronic equipment and medium
CN111798280B (en) Multimedia information recommendation method, device and equipment and storage medium
CN112183818A (en) Recommendation probability prediction method and device, electronic equipment and storage medium
CN110689402A (en) Method and device for recommending merchants, electronic equipment and readable storage medium
CN110647696A (en) Business object sorting method and device
CN111861605A (en) Business object recommendation method
JP5661689B2 (en) Content distribution device
CN107885754B (en) Method and device for extracting credit variable from transaction data based on LDA model
CN113297486B (en) Click rate prediction method and related device
CN106776757A (en) User completes the indicating means and device of Net silver operation
CN107644042B (en) Software program click rate pre-estimation sorting method and server
CN111859946B (en) Method and apparatus for ordering comments and machine-readable storage medium
CN115204943A (en) Advertisement recall method, device, equipment and storage medium
CN111984698B (en) Information prediction method, device and storage medium
CN112200389A (en) Data prediction method, device, equipment and storage medium
CN113450127A (en) Information display method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (Application publication date: 20210129)