CN112149352B - Prediction method for marketing activity clicking by combining GBDT automatic characteristic engineering - Google Patents

Publication number
CN112149352B
Authority
CN
China
Prior art keywords
user
dpi
prediction model
gbdt
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011007410.0A
Other languages
Chinese (zh)
Other versions
CN112149352A (en
Inventor
项亮
方同星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Shuming Artificial Intelligence Technology Co ltd
Original Assignee
Shanghai Shuming Artificial Intelligence Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Shuming Artificial Intelligence Technology Co ltd
Priority to CN202011007410.0A
Publication of CN112149352A
Application granted
Publication of CN112149352B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities

Abstract

A method for predicting clicks on marketing activities by combining GBDT automatic feature engineering comprises a data preprocessing step, a GBDT prediction model establishing step, a step of establishing an LR prediction model with a regularization term, and a marketing activity click prediction step. The data preprocessing step comprises extracting original feature information from the original information of users, sequentially processing the original feature information of all batches that carry task batch numbers, performing One-hot encoding on the home location feature of the user's mobile phone number, and sorting all task batch numbers in ascending order to obtain the order of the task batches. The user prediction model is a combination of an LR model with a regularization term and a GBDT prediction model, and it is used to predict the degree of click willingness of the user group targeted for Internet product marketing. The invention thus provides a way to predict users' advertisement click willingness directly and can process data with large-scale sparse features.

Description

Prediction method for marketing activity clicking by combining GBDT automatic characteristic engineering
Technical Field
The invention relates to the technical field of artificial intelligence marketing in the Internet, in particular to a method for predicting marketing activity clicking by combining GBDT automatic characteristic engineering.
Background
With increasingly intense market competition in the Internet industry, the application of big data has become a new mode of Internet marketing, namely precise customer acquisition systems built on Internet operators' big data. Such an intelligent big-data customer acquisition system is centered on the operator's big database, directly captures the contact information of users who meet user-defined conditions, and communicates with customers directly, which reduces enterprises' customer acquisition costs and improves their profits.
Currently, advertisement marketing behavior is often predicted from user portraits and user behavior features. The commonly used machine learning algorithms fall into two classes: linear models, represented by Logistic Regression (LR) and the Factorization Machine (FM), and the Gradient Boosting Decision Tree (GBDT).
However, both classes of algorithm have inherent disadvantages:
First, the expressive power of a linear model is limited, so it cannot effectively learn interaction information between features. For example, logistic regression can only learn first-order features, and although the factorization machine takes feature interaction into account, it can only learn second-order feature interactions. Linear models therefore rely heavily on feature engineering by algorithm engineers, who strengthen the model's learning ability by manually selecting features and constructing high-order interaction features.
Second, the gradient boosting decision tree model can readily capture interactions among features by traversing the features and partitioning the samples in feature space, so it has strong learning capacity. However, in the field of marketing advertisement recommendation, user features often include a large number of sparse one-hot features, such as the home location and the URLs of 4G pages accessed, and only a few of these features have corresponding values for a given user.
Algorithms based on gradient boosting decision trees are therefore ill-suited to data containing a large number of sparse features: they overfit easily, and feature information is wasted because many features may never be used as split nodes of the decision trees.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a method for predicting clicks on marketing activities that combines GBDT automatic feature engineering. The method uses the high-order feature interaction of GBDT to model the continuous features among the user features, and combines the leaf nodes of the model, as new high-order interaction sparse features, with the original sparse features of the user. This both makes full use of the user feature information and solves the problem that GBDT is insensitive to sparse features.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a method for predicting clicks on marketing activities by combining GBDT automatic feature engineering, characterized by comprising a data preprocessing step S1, a GBDT prediction model establishing step S2, and a step S3 of establishing an LR prediction model with a regularization term;
the data preprocessing step S1 includes the steps of:
step S11: acquiring original information of users, and extracting original feature information from the original information of the users; the original feature information comprises a user ID, the home location of the user's mobile phone number, a task batch number, the DPIs accessed by the user, and the user's DPI access frequency; the task batch number identifies the original information of a user within a date period, and the DPIs accessed by the user and the user's DPI access frequency are both measured per task batch number;
step S12: sequentially processing the original feature information of all batches that carry task batch numbers, and performing One-hot encoding on the home location feature of the user's mobile phone number; wherein the One-hot encoding process comprises:
sequentially expanding, in task batch number order, all the different DPIs accessed by users into independent features, and expanding the DPI access frequency within each task batch number, over all the different DPIs accessed by users, into relationship features between the DPIs accessed by the user and the DPI access frequency;
step S13: sorting all the task batch numbers in ascending order to obtain the order of all the task batches; the task batch numbers increase with date and time, so that the more recent the date, the larger the task batch number;
the GBDT prediction model establishing step S2 includes the steps of:
step S21: after preprocessing, taking a user ID as a sample unit, treating the home location feature and/or the feature of whether the user accessing the DPI clicked as sparse features of the sample, and defining the user's DPI access frequency as a continuous feature;
step S22: selecting the data of the task batch with the largest task batch number as the validation set, and using the data of the remaining task batches as the training set;
step S23: providing a GBDT prediction model to be established, taking the continuous features of each sample in the training set as the input of the GBDT prediction model and the relationship features between the DPIs accessed by the user and the DPI access frequency of each sample in the training set as the output of the GBDT prediction model, training the GBDT prediction model, and validating it with each sample in the validation set to obtain the parameter-optimized GBDT prediction model;
the step S3 of establishing the LR prediction model with the regularization term specifically includes:
step S31: taking a user ID as a sample unit, treating the home location feature and/or the feature of whether the user accessing the DPI clicked as the sparse features of the sample, and passing the user's DPI access frequency through the GBDT prediction model to obtain the leaf node position sparse features of the sample, wherein the number of leaf node position sparse features of the sample is the number of leaf nodes of the sample in the training set multiplied by the number of subtrees;
step S32: selecting the data of the task batch with the largest task batch number as the validation set, and using the data of the remaining task batches as the training set;
step S33: providing an LR model with a regularization term, splicing the leaf node position sparse features of each sample in the training set with the sparse features of that sample as the input of the LR model with the regularization term, taking the relationship features between the DPIs accessed by the user and the DPI access frequency of each sample in the training set as the output of the LR model with the regularization term, training the LR model with the regularization term, and validating it with each sample in the validation set to obtain the parameter-optimized LR model with the regularization term, which together with the optimized GBDT prediction model forms the user prediction model; wherein the output values of the LR model and of the GBDT prediction model are weighted, and the weighted result is used as the output value of the user prediction model.
Further, the method for predicting clicks on marketing activities by combining GBDT automatic feature engineering further includes a marketing activity click prediction step S4, where step S4 specifically includes:
step S41: acquiring the user group for which marketing activity clicks are to be predicted and the original information of the users in that group, and extracting original feature information from the original information of the users; the original feature information comprises a user ID, the home location of the user's mobile phone number, the current task batch number, the DPIs accessed by the user, and the user's DPI access frequency; the DPIs accessed by the user and the user's DPI access frequency are measured per task batch number;
step S42: performing One-hot encoding on the home location feature of the user's mobile phone number in the original feature information of the current task batch number; wherein the One-hot encoding process comprises:
expanding all the different DPIs accessed by users in the current task batch number into independent features, and expanding the DPI access frequency within the current task batch number, over all the different DPIs accessed by users, into relationship features between the DPIs accessed by the user and the DPI access frequency;
step S43: taking a user ID as a sample unit, defining the user's DPI access frequency as a continuous feature, treating the home location feature and/or the feature of whether the user accessing the DPI clicked as sparse features of the sample, and passing the user's DPI access frequency through the GBDT prediction model to obtain the leaf node position sparse features of the sample, wherein the number of leaf node position sparse features of the sample is the number of leaf nodes of the sample in the training set multiplied by the number of subtrees;
step S44: providing the established user prediction model, taking the continuous features of each sample in the sample set as the input of the GBDT prediction model to obtain a first prediction probability value from the GBDT prediction model, and splicing the leaf node position sparse features of each sample in the sample set with the sparse features of that sample as the input of the LR model with the regularization term to obtain a second prediction probability value from the LR model with the regularization term; wherein the user prediction model is the LR model with the regularization term + the GBDT prediction model;
step S45: weighting the first prediction probability value and the second prediction probability value, and taking the weighted result as the output value of the LR model with the regularization term + the GBDT prediction model.
Further, the output value of the LR model with the regularization term is weighted 0.8, and the output value of the GBDT prediction model is weighted 0.2.
Further, the method for predicting clicks on marketing activities by combining GBDT automatic feature engineering further includes:
step S46: according to actual delivery requirements, selecting all or some of the users whose output value from the LR model with the regularization term + the GBDT prediction model exceeds a certain threshold, and carrying out the precise marketing task on them.
According to the above technical scheme, the method for predicting clicks on marketing activities by combining GBDT automatic feature engineering effectively uses GBDT to perform high-order interaction on the user's continuous features and outputs the result as sparse features. These are then combined with the original sparse features and modeled with a logistic regression model, which is good at handling sparse features. Finally, the output of the logistic regression and the output of the GBDT are combined by weighted averaging to obtain the final result. The method can significantly improve the accuracy of user click behavior prediction.
Drawings
FIG. 1 is a flow chart illustrating a method for predicting a click on a marketing campaign by combining GBDT automatic feature engineering according to an embodiment of the present invention
FIG. 2 is a schematic diagram illustrating the implementation of the process from step S2 to step S4 in the embodiment of the present invention
Detailed Description
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
In the following detailed description of the embodiments of the present invention, in order to illustrate the structure of the present invention clearly and to facilitate explanation, the structures shown in the drawings are not drawn to a general scale and are partially enlarged, deformed, and simplified; they should therefore not be understood as limiting the present invention.
It should be noted that, in the following specific embodiments of the present invention, the method for predicting clicks on marketing activities by combining GBDT automatic feature engineering may include a data preprocessing step, a model establishing step, and a model using step. Compared with conventional algorithms based on gradient boosting decision trees, the present invention provides a way to predict users' ad click willingness directly, and is also suitable for processing data with large-scale sparse features.
Referring to fig. 1, fig. 1 is a flow chart illustrating a method for predicting a click on a marketing campaign by combining GBDT automatic feature engineering according to an embodiment of the present invention. As shown in fig. 1, the method for predicting a click on a marketing campaign by combining GBDT automatic feature engineering includes a data preprocessing step S1, a GBDT prediction model building step S2, an LR prediction model building step S3 with a regularization term, and a step S4 of predicting a click on a marketing campaign.
In an embodiment of the present invention, the data preprocessing step S1 includes the following steps:
step S11: acquiring original information of users, and extracting original feature information from the original information of the users; the original feature information comprises a user ID (id), the home location of the user's mobile phone number (location), a task batch number (batch number), the DPIs accessed by the user (DPI), and the user's DPI access frequency (DPI frequency); the task batch number identifies the original information of a user within a date period, and the DPIs accessed by the user and the user's DPI access frequency are both measured per task batch number.
Step S12: sequentially processing the original feature information of all batches that carry task batch numbers, and performing One-hot encoding on the home location feature of the user's mobile phone number (One-hot encoding is a common data preprocessing method that maps a categorical feature to new 0/1 features, one per distinct value); wherein the One-hot encoding process comprises:
sequentially expanding, in task batch number order, all the different DPIs accessed by users into independent features, and expanding the DPI access frequency within each task batch number, over all the different DPIs accessed by users, into relationship features between the DPIs accessed by the user and the DPI access frequency.
Specifically, one task batch number (batch number) can be taken to correspond to one day of user data. Within the same task batch number, a user may appear more than once in the original information, because the same user may access multiple DPIs. Therefore, each distinct DPI accessed by users is expanded into a separate feature: if a user has accessed that DPI, the user's value under the feature is 1, and otherwise 0.
Similarly, the user's DPI access frequency is expanded, over all the different DPIs accessed by users, into features relating each DPI to its access frequency: if the user accessed the DPI m times, the user's value under the feature is m, and otherwise 0.
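The expansion described in step S12 can be sketched as follows. This is only an illustration under stated assumptions: the column names (`id`, `location`, `batch_number`, `dpi`, `dpi_frequency`), the sample rows, and the helper `preprocess_batch` are hypothetical and do not come from the patent, and pandas is a swapped-in tool rather than the inventors' implementation.

```python
import pandas as pd

# Hypothetical raw rows for one task batch: one row per (user, DPI) pair.
raw = pd.DataFrame({
    "id": ["u1", "u1", "u2", "u3"],
    "location": ["Shanghai", "Shanghai", "Beijing", "Shanghai"],
    "batch_number": [1, 1, 1, 1],
    "dpi": ["dpi_a", "dpi_b", "dpi_a", "dpi_c"],
    "dpi_frequency": [3, 1, 5, 2],
})

def preprocess_batch(batch: pd.DataFrame) -> pd.DataFrame:
    """Collapse one batch to one row per user with expanded DPI features."""
    # Frequency features: m if the user accessed the DPI m times, else 0.
    freq = batch.pivot_table(index="id", columns="dpi", values="dpi_frequency",
                             aggfunc="sum", fill_value=0).astype(int)
    # Access indicators: 1 if the user accessed the DPI at all, else 0.
    visited = (freq > 0).astype(int).add_prefix("visited_")
    freq = freq.add_prefix("freq_")
    # One-hot encode the home location of the mobile phone number.
    location = batch.drop_duplicates("id").set_index("id")["location"]
    loc = pd.get_dummies(location, prefix="loc").astype(int)
    out = pd.concat([loc, visited, freq], axis=1)
    out.insert(0, "batch_number", batch["batch_number"].iloc[0])
    return out.rename_axis("id").reset_index()

processed = preprocess_batch(raw)
```

After this step each user ID appears once per batch, which matches the uniqueness property the description relies on when batches are merged.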
Table 1 below describes the raw data before preprocessing, taking the data of one batch as an example:
Table 1: raw data before preprocessing (provided as images in the original publication: Figures GDA0003075188780000061 and GDA0003075188780000071)
Table 2 below describes the data after preprocessing, again taking the data of one batch as an example:
Table 2: data after preprocessing (provided as an image in the original publication: Figure GDA0003075188780000072)
Step S13: sorting all the task batch numbers in ascending order to obtain the order of all the task batches; the task batch numbers increase with date and time, so that the more recent the date, the larger the task batch number.
After this processing, the user ID within each task batch is unique. The user data of all batches are then processed and merged batch by batch, and sorted in ascending order of task batch number (batch number), where a later task batch date corresponds to a larger task batch number; this yields the processed samples.
After the data preprocessing step is completed, the data of the last batch can be selected as the validation sample set, and all samples other than the validation sample set form the training sample set. The training sample set is used for model training, and the validation sample set is used for selecting model parameters.
The inventive idea of this embodiment is a method for predicting clicks on marketing activities by combining GBDT automatic feature engineering, in which the user prediction model is the LR model with the regularization term plus the GBDT prediction model. That is, the high-order feature interaction of GBDT is used to model the continuous features among the user features, and the leaf nodes of the model are combined, as new high-order interaction sparse features, with the original sparse features of the user. This makes full use of the user feature information and solves the problem that GBDT is insensitive to sparse features.
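A minimal sketch of this batch-ordered split is given below. The frame `samples` and its column names are hypothetical placeholders for the preprocessed data, and pandas is used only for illustration.

```python
import pandas as pd

# Hypothetical preprocessed samples drawn from three task batches.
samples = pd.DataFrame({
    "id": ["u1", "u2", "u1", "u3", "u2", "u3"],
    "batch_number": [3, 1, 1, 2, 3, 3],
    "freq_dpi_a": [2, 0, 1, 4, 0, 3],
})

# Sort in ascending batch order (a later date means a larger batch number).
samples = samples.sort_values("batch_number", kind="stable").reset_index(drop=True)

# The most recent batch is held out for validation; the rest is for training.
last_batch = samples["batch_number"].max()
valid = samples[samples["batch_number"] == last_batch]
train = samples[samples["batch_number"] < last_batch]
```

Holding out the latest batch rather than a random subset keeps validation data strictly later in time than the training data, which mirrors how the model would be used on a new batch.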
Therefore, in the embodiment of the present invention, the user prediction model actually includes two models, one is a GBDT prediction model, and the other is an LR model with a regularization term, that is, the user prediction model is a combination of the LR model with the regularization term + the GBDT prediction model.
The GBDT prediction model building step S2 includes the steps of:
step S21: after preprocessing, taking a user ID as a sample unit, treating the home location feature and/or the feature of whether the user accessing the DPI clicked as sparse features of the sample, and defining the user's DPI access frequency as a continuous feature;
step S22: selecting the data of the task batch with the largest task batch number as the validation set, and using the data of the remaining task batches as the training set;
step S23: providing a GBDT prediction model to be established, taking the continuous features of each sample in the training set as the input of the GBDT prediction model and the relationship features between the DPIs accessed by the user and the DPI access frequency of each sample in the training set as the output of the GBDT prediction model, and training and validating the GBDT prediction model to obtain the parameter-optimized GBDT prediction model;
the step S3 of establishing the LR prediction model with the regularization term specifically includes:
step S31: taking a user ID as a sample unit, treating the home location feature and/or the feature of whether the user accessing the DPI clicked as the sparse features of the sample, and passing the user's DPI access frequency through the GBDT prediction model to obtain the leaf node position sparse features of the sample, wherein the number of leaf node position sparse features of the sample is the number of leaf nodes of the sample in the training set multiplied by the number of subtrees;
step S32: selecting the data of the task batch with the largest task batch number as the validation set, and using the data of the remaining task batches as the training set;
step S33: providing an LR model with a regularization term, splicing the leaf node position sparse features of each sample in the training set with the sparse features of that sample as the input of the LR model with the regularization term, taking the relationship features between the DPIs accessed by the user and the DPI access frequency of each sample in the training set as the output of the LR model with the regularization term, and training and validating the LR model with the regularization term to obtain the parameter-optimized LR model with the regularization term.
That is, for the processed data, the last batch of data is selected as the verification sample set to perform the selection of the model parameters, and all the samples except the verification sample set constitute the training sample set for establishing the model. The user prediction model is selected as a combination of the LR model with regularization term + the GBDT prediction model.
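Steps S2 and S3 can be sketched with scikit-learn as follows. Everything concrete here is an assumption made for illustration: the data are synthetic, the target is taken to be a binary click label, the hyperparameters are arbitrary, and scikit-learn's default L2 penalty stands in for the regularization term, whose form the description does not specify.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 400
# Synthetic stand-ins: continuous DPI-frequency features and sparse 0/1 features.
X_cont = rng.poisson(2.0, size=(n, 5)).astype(float)
X_sparse = csr_matrix(rng.integers(0, 2, size=(n, 8)))
# Synthetic binary label (hypothetical click indicator), for illustration only.
y = (X_cont[:, 0] * X_cont[:, 1] + rng.normal(0, 1, n) > 4).astype(int)

# Rows are assumed time-ordered; the last 100 play the role of the latest batch.
tr, va = slice(0, 300), slice(300, n)

# Step S2: fit the GBDT on the continuous features only.
gbdt = GradientBoostingClassifier(n_estimators=20, max_depth=3, random_state=0)
gbdt.fit(X_cont[tr], y[tr])

# Step S31: each sample lands in one leaf per subtree; one-hot encode the leaf indices.
leaves_tr = gbdt.apply(X_cont[tr])[:, :, 0]
leaves_va = gbdt.apply(X_cont[va])[:, :, 0]
enc = OneHotEncoder(handle_unknown="ignore")
leaf_tr = enc.fit_transform(leaves_tr)
leaf_va = enc.transform(leaves_va)

# Step S33: splice leaf features with the original sparse features and fit the LR
# (scikit-learn's LogisticRegression applies an L2 penalty by default).
lr = LogisticRegression(C=1.0, max_iter=1000)
lr.fit(hstack([leaf_tr, X_sparse[tr]]).tocsr(), y[tr])

# Predicted click probabilities of both models on the held-out batch.
p_gbdt = gbdt.predict_proba(X_cont[va])[:, 1]
p_lr = lr.predict_proba(hstack([leaf_va, X_sparse[va]]).tocsr())[:, 1]
```

With 20 subtrees of depth 3, the encoded leaf block has at most 20 × 8 columns, one per (subtree, leaf) pair that occurs in the training set.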
Specifically, referring to fig. 2, fig. 2 is a schematic diagram illustrating the process from step S2 to step S4 according to an embodiment of the present invention. The model training and validation processes of step S2 and step S3 are not repeated here.
In an embodiment of the present invention, the method for predicting a click on a marketing campaign by combining GBDT automatic feature engineering includes a step S4 of predicting a click on a marketing campaign, which may specifically include:
step S41: acquiring the user group for which marketing activity clicks are to be predicted and the original information of the users in that group, and extracting original feature information from the original information of the users; the original feature information comprises a user ID, the home location of the user's mobile phone number, the current task batch number, the DPIs accessed by the user, and the user's DPI access frequency; the DPIs accessed by the user and the user's DPI access frequency are measured per task batch number.
The above step mainly performs feature extraction for the user group targeted for Internet product marketing; the original feature information of the current task batch number is then preprocessed in step S42 as follows:
step S42: performing One-hot encoding on the home location feature of the user's mobile phone number in the original feature information of the current task batch number; the One-hot encoding process comprises expanding all the different DPIs accessed by users in the task batch number into independent features, and expanding the DPI access frequency within the task batch number, over all the different DPIs accessed by users, into relationship features between the DPIs accessed by the user and the DPI access frequency.
Step S43: taking a user ID as a sample unit, defining the user's DPI access frequency as a continuous feature, treating the home location feature and/or the feature of whether the user accessing the DPI clicked as sparse features of the sample, and passing the user's DPI access frequency through the GBDT prediction model to obtain the leaf node position sparse features of the sample, wherein the number of leaf node position sparse features of the sample is the number of leaf nodes of the sample in the training set multiplied by the number of subtrees;
after the above processing steps are completed, the features are fed into the user prediction model, so that the users with high willingness can be screened out before the advertisement is delivered and the marketing advertisement can be delivered precisely to them.
Step S44: providing the established user prediction model, taking the continuous features of each sample in the sample set as the input of the GBDT prediction model to obtain a first prediction probability value from the GBDT prediction model, and splicing the leaf node position sparse features of each sample in the sample set with the sparse features of that sample as the input of the LR model with the regularization term to obtain a second prediction probability value from the LR model with the regularization term; wherein the user prediction model is the LR model with the regularization term + the GBDT prediction model;
step S45: weighting the first prediction probability value and the second prediction probability value, and taking the weighted result as the output value of the LR model with the regularization term + the GBDT prediction model.
In an embodiment of the present invention, the LR model output value with the regularization term may be weighted to 0.8, and the GBDT prediction model output value may be weighted to 0.2.
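The weighting in step S45 amounts to a weighted average of the two probability values. A minimal sketch, in which the function name `combine` and the example probabilities are hypothetical and only the weights 0.8 and 0.2 come from the embodiment:

```python
import numpy as np

def combine(p_lr, p_gbdt, w_lr=0.8, w_gbdt=0.2):
    """Weighted average of the two models' predicted click probabilities."""
    p_lr = np.asarray(p_lr, dtype=float)
    p_gbdt = np.asarray(p_gbdt, dtype=float)
    return w_lr * p_lr + w_gbdt * p_gbdt

# Hypothetical probabilities for three users.
scores = combine([0.9, 0.2, 0.5], [0.7, 0.4, 0.5])
```

Because the weights sum to 1, the combined score stays within [0, 1] and can still be read as a probability-like willingness score.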
Of course, the present invention may further include step S46: according to actual delivery requirements, selecting all or some of the users whose output value from the LR model with the regularization term + the GBDT prediction model exceeds a certain threshold, and carrying out the precise marketing task on them.
The results show that the user prediction model allows a large number of users with low willingness to be screened out of the delivery targets directly, which saves a large amount of marketing cost and increases the profit margin.
The above description covers only the preferred embodiment of the present invention, and the embodiment is not intended to limit the scope of the present invention; all equivalent structural changes made using the contents of the description and the drawings of the present invention should therefore be included in the scope of the present invention.
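The selection in step S46 can be sketched as a threshold filter with an optional cap on the number of users. The helper `select_users`, the threshold value 0.5, and the `top_k` parameter are assumptions for illustration; the description leaves the threshold and the choice of "all or part" to the delivery requirement.

```python
import numpy as np

def select_users(user_ids, scores, threshold=0.5, top_k=None):
    """Return users whose combined score exceeds the threshold, best first."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")  # indices by descending score
    chosen = [user_ids[i] for i in order if scores[i] > threshold]
    # top_k=None keeps all qualifying users; otherwise keep only the top part.
    return chosen if top_k is None else chosen[:top_k]

# Hypothetical combined scores for four users.
selected = select_users(["u1", "u2", "u3", "u4"], [0.86, 0.24, 0.5, 0.61])
```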

Claims (4)

1. A prediction method for marketing activity clicking by combining GBDT automatic characteristic engineering is characterized by comprising a data preprocessing step S1, a GBDT prediction model establishing step S2 and an LR prediction model establishing step S3 with regularization terms;
the data preprocessing step S1 includes the steps of:
step S11: acquiring original information of users, and extracting original feature information from the original information of the users; the original feature information comprises a user ID, the attribution of the user's mobile phone number, a task batch number, the DPIs accessed by the user, and the user's DPI access frequency; the task batch number represents the original information of the users within one date period, and the DPIs accessed by the user and the user's DPI access frequency are measured per task batch number;
step S12: processing the original feature information of all batches in order of task batch number, and performing One-hot encoding on the attribution feature of the user's mobile phone number; wherein the One-hot encoding process comprises:
sequentially expanding all the distinct DPIs accessed by users into independent features according to the task batch numbers, and expanding the DPI access frequency within each task batch number, across all the distinct user-accessed DPIs, into relationship features between the user-accessed DPI and the DPI access frequency;
step S13: sorting all the task batch numbers in ascending order; the ascending order of the task batch numbers follows the date and time, so that the more recent the date and time, the larger the task batch number;
the GBDT prediction model building step S2 includes the steps of:
step S21: after preprocessing, taking a user ID as a sample unit, treating the attribution feature and/or the feature indicating whether the user clicked when accessing the DPI as sparse features of the sample, and defining the user's DPI access frequency as a continuous feature;
step S22: selecting the data in the task batch with the largest task batch number as the verification set, and using the data of the remaining task batch numbers as the training set;
step S23: providing a GBDT prediction model to be established, taking the continuous features of each sample in the training set as the input of the GBDT prediction model, taking the relationship feature between the user-accessed DPI and the DPI access frequency of each sample in the training set as the output of the GBDT prediction model, training the GBDT prediction model, and verifying it with each sample in the verification set to obtain the parameter-optimized GBDT prediction model;
the step S3 of establishing the LR prediction model with the regularization term specifically comprises:
step S31: taking a user ID as a sample unit, treating the attribution feature and/or the feature indicating whether the user clicked when accessing the DPI as sparse features of the sample, and passing the user's DPI access frequency through the GBDT prediction model to obtain the leaf node position sparse features of the sample, wherein the number of leaf node position sparse features of the sample is the number of leaf nodes of the samples in the training set multiplied by the number of subtrees;
step S32: selecting the data in the task batch with the largest task batch number as the verification set, and using the data of the remaining task batch numbers as the training set;
step S33: providing an LR model with a regularization term; splicing the leaf node position sparse features of each sample in the training set with the sparse features of that sample to serve as the input of the LR model with the regularization term; taking the relationship feature between the user-accessed DPI and the DPI access frequency of each sample in the training set as the output of the LR model with the regularization term; training the LR model with the regularization term, and verifying it with each sample in the verification set to obtain the parameter-optimized LR model with the regularization term, which together with the optimized GBDT prediction model forms a user prediction model; wherein the output values of the LR model and of the GBDT prediction model are weighted, and the weighted result is used as the output value of the user prediction model.
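The batch-based split of steps S22 and S32 and the feature splicing of steps S31 and S33 can be sketched in Python as follows. This is only an illustration under stated assumptions: the list `STUMPS` is a hand-written stand-in for a trained GBDT (each "tree" is a depth-1 stump with two leaves), and the field names `batch`, `dpi_freq` and `sparse`, as well as all sample values, are hypothetical. In practice the leaf positions would be read from the trained GBDT prediction model.

```python
def split_by_latest_batch(rows):
    """Hold out the largest (most recent) task batch number for verification."""
    latest = max(r["batch"] for r in rows)
    train = [r for r in rows if r["batch"] != latest]
    valid = [r for r in rows if r["batch"] == latest]
    return train, valid


# Hypothetical stand-in for a trained GBDT: (feature index, threshold) stumps.
# Leaf 0 is reached when the feature value is <= threshold, leaf 1 otherwise.
STUMPS = [(0, 2.5), (1, 0.5), (0, 6.0)]
LEAVES_PER_TREE = 2


def leaf_positions(continuous):
    """Index of the leaf that the sample falls into, one entry per tree."""
    return [0 if continuous[f] <= thr else 1 for f, thr in STUMPS]


def leaf_one_hot(continuous):
    """One-hot encode each leaf position: LEAVES_PER_TREE * len(STUMPS) values."""
    encoded = []
    for leaf in leaf_positions(continuous):
        block = [0] * LEAVES_PER_TREE
        block[leaf] = 1
        encoded.extend(block)
    return encoded


def lr_input(row):
    """Splice leaf node position sparse features with the sample's sparse features."""
    return leaf_one_hot(row["dpi_freq"]) + row["sparse"]


rows = [
    {"user": "u1", "batch": 1, "dpi_freq": [3.0, 0.0], "sparse": [1, 0, 0]},
    {"user": "u2", "batch": 2, "dpi_freq": [1.0, 4.0], "sparse": [0, 1, 0]},
    {"user": "u3", "batch": 3, "dpi_freq": [7.0, 1.0], "sparse": [0, 0, 1]},
]
train, valid = split_by_latest_batch(rows)
features = [lr_input(r) for r in train]
```

With three stumps of two leaves each, every sample receives six leaf node position features, to which its own sparse features are appended before being passed to the LR model.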
2. The method for predicting marketing activity clicks in combination with GBDT automatic feature engineering according to claim 1, further comprising a marketing activity click prediction step S4, wherein step S4 specifically comprises:
step S41: acquiring the user group for which marketing activity clicks are to be predicted and the original user information of that user group, and extracting original feature information from the original user information; the original feature information comprises a user ID, the attribution of the user's mobile phone number, the current task batch number, the DPIs accessed by the user, and the user's DPI access frequency; the DPIs accessed by the user and the user's DPI access frequency are measured per task batch number;
step S42: performing One-hot encoding on the attribution feature of the user's mobile phone number in the original feature information of the current task batch number; wherein the One-hot encoding process comprises:
expanding all the distinct DPIs accessed by users into independent features according to the current task batch number, and expanding the DPI access frequency within the current task batch number, across all the distinct user-accessed DPIs, into relationship features between the user-accessed DPI and the DPI access frequency;
step S43: taking a user ID as a sample unit, defining the user's DPI access frequency as a continuous feature, treating the attribution feature and/or the feature indicating whether the user clicked when accessing the DPI as sparse features of the sample, and passing the user's DPI access frequency through the GBDT prediction model to obtain the leaf node position sparse features of the sample, wherein the number of leaf node position sparse features of the sample is the number of leaf nodes of the samples in the training set multiplied by the number of subtrees;
step S44: providing the established user prediction model; taking the continuous features of each sample in the sample set as the input of the GBDT prediction model to obtain a first prediction probability value from the GBDT prediction model; and splicing the leaf node position sparse features of each sample in the sample set with the sparse features of that sample to serve as the input of the LR model with the regularization term, thereby obtaining a second prediction probability value from the LR model with the regularization term; wherein the user prediction model is the combination of the LR model with the regularization term and the GBDT prediction model;
step S45: weighting the first prediction probability value and the second prediction probability value, and taking the weighted result as the output value of the user prediction model (the LR model with the regularization term + the GBDT prediction model).
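The preprocessing of steps S41 and S42 can be sketched in Python as below. The record layout `(user_id, attribution, dpi, frequency)`, the function name `build_samples`, and the sample values are assumptions for illustration. The sketch also assumes that the attribution and DPI vocabularies were saved at training time so that the columns line up with what the models were trained on, and it simply ignores values not in those vocabularies, which is a design choice of the sketch rather than something stated in the claims.

```python
def build_samples(records, attributions, dpis):
    """Turn raw records of one task batch into per-user feature vectors.

    records:      iterable of (user_id, attribution, dpi, frequency)
    attributions: ordered attribution vocabulary (assumed saved from training)
    dpis:         ordered DPI vocabulary (assumed saved from training)
    """
    attr_index = {a: i for i, a in enumerate(attributions)}
    dpi_index = {d: i for i, d in enumerate(dpis)}
    samples = {}
    for user_id, attribution, dpi, freq in records:
        if user_id not in samples:
            # One-hot encode the attribution of the user's mobile phone number.
            sparse = [0] * len(attributions)
            if attribution in attr_index:
                sparse[attr_index[attribution]] = 1
            samples[user_id] = {"sparse": sparse, "dpi_freq": [0] * len(dpis)}
        # Expand each distinct DPI into its own column holding the access frequency.
        if dpi in dpi_index:
            samples[user_id]["dpi_freq"][dpi_index[dpi]] += freq
    return samples


# Hypothetical records for the current task batch.
records = [
    ("u1", "Shanghai", "dpi_a", 3),
    ("u1", "Shanghai", "dpi_b", 1),
    ("u2", "Beijing", "dpi_b", 5),
    ("u2", "Beijing", "dpi_z", 2),  # DPI unseen in training, dropped in this sketch
]
samples = build_samples(records, ["Beijing", "Shanghai"], ["dpi_a", "dpi_b"])
```

Each user ID then carries one sparse vector (the one-hot attribution) and one continuous vector (per-DPI access frequency), matching the sample definition in step S43.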
3. The method for predicting marketing activity clicks in combination with GBDT automatic feature engineering according to claim 2, wherein the output value of the LR model with the regularization term is given a weight of 0.8 and the output value of the GBDT prediction model is given a weight of 0.2.
4. The method for predicting marketing activity clicks in combination with GBDT automatic feature engineering according to claim 2 or 3, further comprising:
step S46: according to the actual delivery requirements, selecting all or some of the users whose output value from the LR model with the regularization term + the GBDT prediction model exceeds a certain threshold, and carrying out the precision marketing task for those users.
CN202011007410.0A 2020-09-23 2020-09-23 Prediction method for marketing activity clicking by combining GBDT automatic characteristic engineering Active CN112149352B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011007410.0A CN112149352B (en) 2020-09-23 2020-09-23 Prediction method for marketing activity clicking by combining GBDT automatic characteristic engineering

Publications (2)

Publication Number Publication Date
CN112149352A (en) 2020-12-29
CN112149352B (en) 2021-08-31

Family

ID=73897702

Country Status (1)

Country Link
CN (1) CN112149352B (en)

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: 200436 room 406, 1256 and 1258 Wanrong Road, Jing'an District, Shanghai

Patentee after: Shanghai Shuming Artificial Intelligence Technology Co.,Ltd.

Address before: Room 1601-026, 238 JIANGCHANG Third Road, Jing'an District, Shanghai, 200436

Patentee before: Shanghai Shuming Artificial Intelligence Technology Co.,Ltd.