CN112967088A - Marketing activity prediction model structure and prediction method based on knowledge distillation - Google Patents

Marketing activity prediction model structure and prediction method based on knowledge distillation

Info

Publication number
CN112967088A
Authority
CN
China
Prior art keywords
training
model
user
net
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110235391.5A
Other languages
Chinese (zh)
Inventor
项亮 (Xiang Liang)
潘信法 (Pan Xinfa)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Shuming Artificial Intelligence Technology Co ltd
Original Assignee
Shanghai Shuming Artificial Intelligence Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Shuming Artificial Intelligence Technology Co., Ltd.
Priority to CN202110235391.5A
Publication of CN112967088A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0269 Targeted advertisements based on user profile or attribute
    • G06Q30/0271 Personalized advertisement
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0277 Online advertisement

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Marketing (AREA)
  • Computational Linguistics (AREA)
  • General Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A marketing campaign prediction method based on knowledge distillation comprises the steps of data preprocessing, data set partitioning and network-training-framework formation for a teacher model, data set partitioning and network-training-framework formation for a student model, prediction model building, and marketing campaign prediction. First, a relatively complex teacher model Net-T centered on a residual neural network is constructed; then a student model Net-S consisting of a simple neural network is constructed. A soft loss, formed at a high temperature from the soft labels produced by the teacher model Net-T, and a hard loss, formed from the true hard labels used in training the student model Net-S, are weighted to obtain the total knowledge-distillation loss function. With this total loss function as the objective function of the student model Net-S for actual deployment, the final neural network model is trained and used to make predictions. The results show that the hybrid model effectively extends the application of deep learning to computational advertising and recommender-system algorithms, and the accuracy of user click prediction is significantly improved.

Description

Marketing activity prediction model structure and prediction method based on knowledge distillation
Technical Field
The invention relates to the technical field of Internet artificial-intelligence marketing, and in particular to a marketing campaign prediction model structure and prediction method based on knowledge distillation.
Background
Deep learning algorithms have developed rapidly and been applied successfully in many fields. For example, residual neural networks (ResNet) in the field of computer vision (CV) largely solve the problem of vanishing gradients during training, while the Transformer and BERT models in natural language processing (NLP) provide extremely strong processing capability for text data. These revolutionary techniques have rapidly improved the effectiveness of deep learning in different fields and accelerated its deployment. However, as training data grow, network models become more complex, and parameter counts tend to grow rapidly, even to the order of hundreds of millions.
Taking computational advertising and recommender-system algorithms as an example, the following problems exist in practical application:
(1) The traffic actually faced by computational advertising and recommender systems is often very large, and such systems usually must respond in near real time. Although a deep learning model can rely on hardware (such as GPU acceleration) for offline testing, once deployed online an overly complex model with too many parameters responds too slowly; in particular, under the heavy traffic of a given business scenario, the demand for timely pushing cannot be met.
(2) Ordinarily, no deliberate distinction is made between the offline training model and the online deployment model: the model that trains best offline is moved online as-is. However, as is clear to those skilled in the art, there are inconsistencies between the training model and the deployment model. The models that perform well in training are either large and complex, or can only be realized by integrating several relatively simple models through ensemble learning. In computational advertising and recommendation algorithms, these simple models include logistic regression (LR), factorization machines (FM) and simple deep neural networks (DNN). In addition, large models place high demands on deployment resources (memory, GPU memory, etc.), while deployment imposes strict limits on latency and computational resources; large models are therefore generally inconvenient to deploy directly in a service.
(3) When a recommender system operates online, it may face adjustments and changes to features and other data structures; a complex large model is generally less flexible to adjust than a small model, which adds extra computational overhead.
(4) The relationship between a network model's parameter count and the amount of 'knowledge' it can capture or learn from data is not stably linear, but is closer to a growth curve with gradually diminishing marginal returns. Moreover, for exactly the same model architecture and parameter count, the amount of 'knowledge' that can be captured or learned from exactly the same training data is not necessarily the same. That is, a proper training method can keep the total parameter count small while capturing as much 'knowledge' as possible.
Given the above problems, the industry needs to reduce the number of model parameters while preserving performance, i.e., model compression, in order to find an effective path for deploying computational advertising and recommendation algorithms online.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a marketing campaign prediction method based on knowledge distillation. To this end, the technical scheme of the invention is as follows:
a marketing campaign prediction method based on knowledge distillation comprises a data preprocessing step S1, a teacher model data set division and network training frame forming step S2, a student model data set division and network training frame forming step S3 and a prediction model establishing step S4; the data preprocessing step S1 includes the steps of:
step S11: acquiring original information of N users, and extracting original characteristic information from the original information of each user; the original characteristic information comprises a user ID, a user mobile phone number attribution, a task batch number, a user access DPI frequency, an access time, an access duration characteristic and/or a digital label which is clicked or not by the user; the task batch number represents original information of a user in a date time period, and the DPI access frequency of the user are each task batch number as a measurement unit;
step S12: sequentially processing the original characteristic information in all batches with the task batch numbers, and performing One-hot coding processing on the attribution characteristics of the user mobile phone number; wherein the One-hot encoding process comprises:
sequentially expanding all different user access DPIs as independent features according to the task batch numbers, and expanding the DPI access frequency in the task batch numbers into the relationship features of the DPI and the DPI access frequency of the users according to all different user access DPIs;
the teacher model data set partitioning and network training framework forming step S2 uses a hybrid method of stratified sampling and k-fold cross-validation, specifically comprising:
S21: after preprocessing, equally dividing all data within the task batch number into k+1 sets, of which 1 set serves as a test set and the data of the remaining sets serve as a training set;
S22: computing from the training set the overall proportion of the two sample classes, click and no-click, and then dividing the training set into k sets such that the proportion of click and no-click samples drawn from each set is consistent with the overall proportion;
S23: selecting in turn one of the k sets as a validation set and the remaining k-1 sets as training sets, forming k validation/training set pairs; using the k pairs in turn to train the initialized teacher model, validating each training result with the corresponding validation set, and testing with the test set to obtain k groups of test results; wherein the teacher model is a residual neural network;
S24: averaging the k groups of test results to obtain their mean;
the student model data set partitioning and network training framework forming step S3 comprises:
S31: equally dividing all data within the task batch number into k+1 sets, of which 1 set serves as a validation set and the remaining data serve as a training set;
S32: adopting a neural network as the student model Net-S, wherein the student model Net-S comprises an input layer, M fully connected hidden layers and an output layer;
the prediction model building step S4 comprises:
step S41: providing an initialized knowledge distillation-based training model, wherein the training model comprises a teacher model Net-T training channel, a student model Net-S training channel and an output module;
step S42: training the teacher model Net-T on the data set according to the teacher model's data set partition and network training framework, taking the average of the k-fold cross-validation results as the final classifier, and obtaining soft labels;
step S43: training the student model Net-S on the data set according to the student model Net-S's data set partition and network training framework, obtaining the hard prediction by training at a lower temperature t, the hard prediction and the true hard label forming the hard loss function L_hard;
step S44: distilling the knowledge of the teacher model Net-T into the student model Net-S at a higher temperature T, i.e., training at the high temperature T to obtain the soft prediction, which together with the soft label forms the soft loss function L_soft;
step S45: weighting the soft and hard loss functions to obtain the total loss function Loss, i.e.
Loss = α·L_soft + β·L_hard, where α is the weight of the soft loss function L_soft and β is the weight of the hard loss function L_hard;
step S46: using the total loss function as the objective function of the student model Net-S for actual deployment, training to obtain the student model Net-S with optimized parameters, and using the finally optimized student model Net-S as the knowledge distillation-based marketing campaign prediction model.
Further, the overall architecture of the teacher model Net-T network in the teacher model training channel comprises:
an input layer, for inputting the data obtained by partitioning the teacher model Net-T's data set;
an embedding layer, for extracting information from and reducing the dimensionality of the data features input from the input layer;
a product layer, for performing outer-product and inner-product feature interactions on the features processed by the embedding layer;
a factorization layer, for factorizing the weight matrix after feature interaction;
a fully connected block comprising N hidden layers, wherein the hidden layers may take four network shapes: increasing, constant, diamond, or decreasing; N is greater than M;
an output layer, which outputs the predicted probability via a sigmoid function and, by defining a threshold, forms the binary click/no-click classification, i.e., an output classified as a positive or negative label.
Further, the student model Net-S comprises two or three fully connected hidden layers.
Further, in knowledge distillation the activation function of the output layer of the teacher model Net-T uses a generalized normalized exponential (softmax) function,
q_i = exp(z_i / T) / Σ_j exp(z_j / T),
and the higher the temperature T, the softer the soft labels obtained by training with the teacher model Net-T.
Further, in the knowledge distillation process the choice of the high temperature T is related to the parameter count of the student model Net-S: when the student model Net-S has fewer parameters, a lower temperature T is chosen; conversely, when it has more parameters, a higher temperature T is chosen.
Further, the knowledge distillation-based marketing campaign prediction method further comprises a marketing campaign prediction step S5, which specifically comprises:
step S51: acquiring the user group targeted by the planned Internet product marketing and the group's original user information, and extracting original characteristic information from the original user information; the task batch number denotes a user's original information within a date-time period, and the user's accessed DPIs and DPI access frequency are measured per task batch number;
step S52: applying One-hot encoding to the original characteristic information of the task batch number according to the user mobile phone number attribution characteristic; wherein the One-hot encoding comprises:
expanding all distinct user-accessed DPIs into independent features according to the task batch number, and expanding the DPI access frequency within the task batch number into user DPI/DPI-access-frequency relationship features according to all the distinct accessed DPIs;
step S53: providing the established knowledge distillation-based marketing campaign prediction model; wherein the sigmoid function limits the predicted value's probability range to between 0 and 1, and a defined threshold turns the prediction into the binary click/no-click classification, i.e., the predicted value of the knowledge distillation-based marketing campaign prediction model is the user's degree of click willingness.
Further, in the knowledge distillation-based marketing campaign prediction method, the model prediction step S5 further comprises:
step S54: according to actual delivery requirements, selecting all or part of the users from the set of users whose model prediction is 1 (willing to click) to carry out precision marketing tasks.
Further, after step S11, the method further comprises an anomaly detection and processing step, a continuous feature processing step and/or a dimensionality reduction step for the user's original information; the continuous feature processing step adjusts the data distribution of continuous features using the RankGauss method, and the dimensionality reduction step reduces high-dimensional features using principal component analysis.
Further, the knowledge distillation-based marketing campaign prediction method further comprises a step S47 of performing model evaluation index processing and tuning on the knowledge distillation-based marketing campaign prediction model; the model evaluation indexes comprise the AUC value, the Log loss value and the relative information gain (RIG) value.
Further, the model tuning process comprises one or more of the following steps:
adding batch normalization to address the internal covariate shift problem in the data;
adding to the network a mechanism that puts some neurons into a dormant state during training;
adjusting the learning rate, generally via strategies such as exponential decay during training;
training with multiple seeds and averaging, to better address insufficient generalization caused by high data variance;
adding L1 or L2 regularization, applying a penalty to the loss function to reduce the risk of overfitting;
and tuning the hyperparameters.
According to the technical scheme, a relatively complex teacher model Net-T centered on a residual neural network is first constructed, followed by a student model Net-S consisting of a simple neural network. A soft loss, formed at a high temperature from the soft labels produced by the teacher model Net-T, and a hard loss, formed from the true hard labels used in training the student model Net-S, are weighted to obtain the total knowledge-distillation loss function; with this total loss function as the objective function of the student model Net-S for actual deployment, the final neural network model is trained and used to make predictions.
By this method, Bayesian inference can be effectively exploited and prediction uncertainty introduced into the neural network, making the model more robust. Through the combined inner/outer-product method, features are crossed to extract high-dimensional latent features. The hybrid model effectively extends the application of deep learning to computational advertising and recommender-system algorithm problems, significantly improves the accuracy of user click-behavior prediction, and can screen a large number of low-intent users out of the delivery target, thereby saving substantial marketing cost and increasing profit margins.
Drawings
FIG. 1 is a schematic flow chart illustrating a method for predicting a marketing campaign based on knowledge distillation according to an embodiment of the present invention
FIG. 2 is a schematic diagram of original data and data obtained after RankGauss processing in the embodiment of the present invention
FIG. 3 is a diagram of a teacher model data set partitioning and network training framework according to an embodiment of the present invention
FIG. 4 is a diagram of a teacher model in an embodiment of the invention
FIG. 5 is a schematic diagram of a student model according to an embodiment of the invention
FIG. 6 is a schematic diagram of a knowledge distillation based marketing campaign prediction model in an embodiment of the present invention
Detailed Description
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
In the following detailed description of the embodiments of the present invention, in order to illustrate the structure of the invention clearly and conveniently, the structures shown in the drawings are not drawn to a common scale and have been partially enlarged, deformed and simplified; this should not be understood as limiting the invention.
Referring to fig. 1, fig. 1 is a flow chart illustrating a marketing campaign prediction method based on knowledge distillation according to an embodiment of the present invention. As shown in fig. 1, the knowledge distillation-based marketing campaign prediction method includes a data preprocessing step S1, a teacher model data set partitioning and network training framework forming step S2, a student model data set partitioning and network training framework forming step S3, and a prediction model building step S4.
In an embodiment of the present invention, the data preprocessing step S1 includes the following steps:
step S11: acquiring original information of N users and extracting original characteristic information from the users' original information; the original characteristic information comprises a user ID (id), the user's mobile phone number attribution (location), a task batch number (batch number), the DPIs accessed by the user (DPI), the user's DPI access frequency (DPI frequency), access time, access duration characteristics and/or a digital label indicating whether the user clicked; the task batch number denotes a user's original information within a date-time period; the user's DPI access frequency, DPI access time and/or access time are measured per task batch number; and the DPIs accessed by the user on the day and the user's mobile phone number attribution are categorical features.
Referring to Table 1 below, which describes the raw data before preprocessing, taking data of the same batch as an example:
[Table 1: example of raw data before preprocessing; in the original publication this table is available only as an image.]
In the embodiment of the present invention, the raw data further undergo anomaly detection and processing, categorical feature processing, continuous feature processing, dimensionality reduction and similar steps.
Anomaly detection and processing: in combination with business requirements, missing values, oversized values and the like in the raw data must be deleted, filled or otherwise handled. Since the number of users is generally on the order of millions, values may go missing during data acquisition; if the amount of missing data is small, the affected records can generally be removed directly; if it cannot be determined whether the missing data will affect the final training effect, missing values can be filled with the mean, mode, median, etc.
In addition, excessively large values may be encountered during data acquisition, for example a user accessing a DPI ten thousand times within a day. In actual modeling such values generally do not help improve the model's generalization ability, so they can be handled by removal or by a filling method.
Further, in the embodiment of the present invention, continuous features can also be processed, i.e., access time and access duration data of different scales can be mapped to a uniform interval. Specifically, for characteristics such as access time and access duration, the data distribution can be adjusted using, for example, the RankGauss method. RankGauss is similar to conventional normalization or standardization methods: its basic function is to map data of different scales to a uniform interval, such as 0 to 1 or -1 to 1, which is very important for gradient-based algorithms such as deep learning. On this basis, RankGauss further applies the inverse of the error function so that the normalized data approximate a Gaussian distribution. Referring to FIG. 2, which shows the original data and the data obtained after RankGauss processing in the embodiment of the present invention: panel (a) is the original data and panel (b) is the data after RankGauss.
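As a concrete illustration (not part of the patent text), a minimal RankGauss transform along these lines can be sketched in Python; the feature name and toy data are invented for the example:

```python
import numpy as np
from scipy.special import erfinv

def rank_gauss(x: np.ndarray, epsilon: float = 1e-6) -> np.ndarray:
    """Map a 1-D continuous feature to an approximately Gaussian distribution.

    Ranks the values, rescales the ranks to the open interval (-1, 1),
    then applies the inverse error function, as described above.
    """
    ranks = x.argsort().argsort().astype(np.float64)         # ranks 0 .. n-1
    scaled = ranks / (len(x) - 1) * 2.0 - 1.0                # rescale to [-1, 1]
    scaled = np.clip(scaled, -1.0 + epsilon, 1.0 - epsilon)  # keep erfinv finite
    return erfinv(scaled)

# Example: normalize a hypothetical "access_duration" feature
durations = np.random.exponential(scale=30.0, size=10_000)
gaussian_durations = rank_gauss(durations)
```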
In the embodiment of the invention, principal component analysis (PCA) can further be adopted to reduce the dimensionality of high-dimensional features. As seen from the categorical feature processing above, one-hot encoding generally produces a high-dimensional sparse matrix. For neural network training this means that many positions yield no useful gradient during backpropagation of the error, which is clearly unfavorable for network training; at the same time, high-dimensional features also increase computational overhead. It is therefore necessary to reduce the dimensionality of high-dimensional features. PCA achieves dimensionality reduction by finding the projection directions in which the original data have maximum variance; it reduces the feature dimensionality while losing as little as possible of the information contained in the original features, so that the collected data can be analyzed comprehensively.
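A sketch of such PCA-based reduction, here using scikit-learn as one possible implementation (the patent does not prescribe a library, and the 95% variance threshold is an illustrative choice):

```python
import numpy as np
from sklearn.decomposition import PCA

# X stands in for the high-dimensional one-hot feature matrix (random toy data here)
X = np.random.rand(1000, 500)

# Keep enough components to retain roughly 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape, pca.explained_variance_ratio_.sum())
```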
Step S12: processing the categorical features; applying One-hot encoding to the user's mobile phone number attribution and to the DPIs accessed by the user. The One-hot encoding comprises expanding all distinct user-accessed DPIs into independent features in task-batch-number order, and expanding the DPI access frequency within the task batch numbers into user DPI/DPI-access-frequency relationship features according to all the distinct accessed DPIs.
Specifically, One-hot encoding can first be applied to the DPIs accessed by the user on the day and to the user's mobile phone number attribution, expanding them into columns. Taking accessed DPIs as an example: if a user accessed a given DPI, that DPI is recorded as 1 and the remaining DPIs as 0. Thus, with 10 distinct DPIs, 10 feature columns are formed, and in each column only the corresponding users are 1 while the rest are 0.
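The expansion just described might be sketched as follows; the column names (user_id, dpi, frequency) and the toy access log are assumptions for illustration:

```python
import pandas as pd

# Toy per-batch access log: one row per (user, DPI) pair with its visit count
log = pd.DataFrame({
    "user_id":   [1, 1, 2, 3],
    "dpi":       ["dpi_a", "dpi_b", "dpi_a", "dpi_c"],
    "frequency": [3, 1, 5, 2],
})

# 0/1 indicator features: whether the user accessed each DPI at all
visited = pd.crosstab(log["user_id"], log["dpi"]).clip(upper=1).add_prefix("visited_")

# Relationship features: DPI x access frequency, one column per DPI
freq = log.pivot_table(index="user_id", columns="dpi",
                       values="frequency", fill_value=0).add_prefix("freq_")

features = visited.join(freq)
print(features)
```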
After preprocessing, the data take the form of Table 2 below:
[Table 2: data after preprocessing; in the original publication this table is available only as an image.]
After the data processing step, the teacher model data set partitioning and network training framework forming step S2 can be performed; step S2 uses a hybrid method combining stratified sampling and k-fold cross-validation, specifically comprising:
s21: after preprocessing, selecting all data in the task batch number to be equally divided into k +1 sets; wherein, 1 set is used as a test set, and the data of the rest sets are used as a training set;
s22: calculating the total proportion of each sample in the two types of samples of clicking and non-clicking from the training set, and then, assuming that the training set is divided into k sets and the requirement of meeting the requirement that the proportion of the two types of samples of clicking and non-clicking in the samples obtained from each set is consistent with the total proportion;
s23: sequentially selecting one of the K sets as a verification set, and the rest K-1 sets as training sets to form K groups of verification set pairs and training set pairs; sequentially using K groups of verification sets and training set pairs to train the initialized teacher model, verifying the training result by using the corresponding verification sets, and testing by using the test sets to obtain K groups of test results; wherein the teacher model is a residual error neural network;
s24: averaging the K groups of test results to obtain an average value of the K groups of test results as a final classifier; the error generated by the final average of the network training framework using the K-fold interactive verification of the teacher model may also be referred to as out-of-bag (oob) error.
Referring to FIG. 3, which is a schematic diagram of the teacher model's data set partitioning and network training framework according to an embodiment of the present invention. As shown in FIG. 3, the hybrid of stratified sampling and k-fold cross-validation used for data set partitioning and the network training framework makes the teacher model more accurate.
Stratified sampling is a sampling method that preserves class proportions: the overall sample is divided into several strata according to a certain characteristic, and a purely random sample is then drawn from each stratum. Specifically, the proportion of the two classes, click and no-click, can first be computed from the training set; the training set is then divided into k strata, and sampling within each stratum keeps the class proportions of the drawn samples essentially consistent with the overall proportions.
In the embodiment of the present invention, 5-fold cross-validation is taken as an example: the 80% training split is subdivided into 5 parts, and each time 1 part is selected as the validation set with the other 4 parts as the training set. The model is thus trained 5 times, and after each training the 20% test set is used to obtain the model's evaluation index; the average of the 5-fold cross-validation results serves as the final classifier. Although the training process is thus relatively heavy, the error produced by the final averaging in the teacher model's k-fold cross-validation network training framework is very small.
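A sketch of this stratified hold-out plus stratified 5-fold loop, assuming scikit-learn and a hypothetical train_and_score helper that stands in for teacher training and test-set scoring:

```python
import numpy as np
from sklearn.model_selection import train_test_split, StratifiedKFold

# X: preprocessed features, y: 0/1 click labels (toy data here)
X = np.random.rand(6000, 20)
y = np.random.binomial(1, 0.1, size=6000)

# Hold out 1 of the k+1 = 6 equal parts as the test set, preserving class ratios
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/6, stratify=y, random_state=42)

# Stratified 5-fold cross-validation over the remaining data
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_scores = []
for train_idx, val_idx in skf.split(X_train, y_train):
    X_tr, X_val = X_train[train_idx], X_train[val_idx]
    y_tr, y_val = y_train[train_idx], y_train[val_idx]
    # hypothetical helper: trains the teacher on (X_tr, y_tr), validates on
    # (X_val, y_val), and returns its score on the held-out test set
    fold_scores.append(train_and_score(X_tr, y_tr, X_val, y_val, X_test, y_test))

print("mean of the k test results:", np.mean(fold_scores))
```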
In the embodiment of the invention, the teacher model is generally complex and must be able to express the characteristics in the data strongly. Referring to FIG. 4, which is a schematic diagram of a teacher model according to an embodiment of the present invention: as shown in FIG. 4, the overall architecture of the teacher model Net-T network in the teacher model training channel comprises the following layers (a simplified code sketch follows the list):
an Input layer, for inputting the data obtained by partitioning the teacher model Net-T's data set; the features can be divided into individual fields according to feature type (e.g., DPI duration, gender, age distribution), with One-hot encoding applied to the categorical features;
an Embedding layer, for extracting information from and reducing the dimensionality of the data features input from the input layer;
a Product layer, for performing outer-product and inner-product feature interactions on the features processed by the embedding layer;
a Factorization layer, for factorizing the weight matrix after feature interaction; that is, to reduce the amount of computation, the weight matrix is factorized as shown in FIG. 4;
a Fully-connected block comprising N hidden layers, wherein the hidden layers may take four network shapes: increasing, constant, diamond, or decreasing; N is greater than M;
an output layer, which outputs the predicted probability via a sigmoid function and, by defining a threshold, forms the binary click/no-click classification, i.e., an output classified as a positive or negative label.
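The figures themselves are not reproduced here, but a heavily simplified PyTorch sketch of this embedding + product + fully connected pipeline may help; the field counts, embedding size and layer widths are illustrative assumptions, and the residual connections and weight factorization of the actual Net-T are omitted for brevity:

```python
import torch
import torch.nn as nn

class SimplifiedTeacher(nn.Module):
    """Toy product-based network: embeddings, pairwise inner products, MLP head."""

    def __init__(self, field_dims, embed_dim=8, hidden=(256, 128, 64)):
        super().__init__()
        # One embedding table per categorical field
        self.embeddings = nn.ModuleList(
            nn.Embedding(card, embed_dim) for card in field_dims)
        n_fields = len(field_dims)
        n_pairs = n_fields * (n_fields - 1) // 2
        layers, in_dim = [], n_fields * embed_dim + n_pairs
        for h in hidden:                      # a "decreasing" hidden-layer shape
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        self.mlp = nn.Sequential(*layers)
        self.out = nn.Linear(in_dim, 1)       # single output neuron; sigmoid applied later

    def forward(self, x):                     # x: LongTensor of shape [batch, n_fields]
        embs = [emb(x[:, i]) for i, emb in enumerate(self.embeddings)]
        e = torch.stack(embs, dim=1)                     # [batch, fields, dim]
        # Pairwise inner products: the "product layer"
        inner = torch.matmul(e, e.transpose(1, 2))       # [batch, fields, fields]
        iu = torch.triu_indices(e.size(1), e.size(1), offset=1)
        products = inner[:, iu[0], iu[1]]                # [batch, n_pairs]
        z = torch.cat([e.flatten(1), products], dim=1)
        return self.out(self.mlp(z)).squeeze(-1)         # logits
```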
Next, the student model data set partitioning and network training framework forming step S3 is performed, comprising:
S31: equally dividing all data within the task batch number into k+1 sets, of which 1 set serves as a validation set and the remaining data serve as a training set;
S32: adopting a neural network as the student model Net-S, which comprises an input layer, M fully connected hidden layers and an output layer.
In the embodiment of the invention, considering flexibility of model deployment, low consumption of computing resources and similar concerns, the student model can be a simple deep neural network; for example, the student model Net-S can comprise two or three fully connected hidden layers. Referring to FIG. 5, which is a schematic diagram of a student model according to an embodiment of the invention: as shown in FIG. 5, the student model contains only three fully connected hidden layers between the input layer and the output layer, and the number of network parameters is greatly reduced compared with the teacher model.
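A matching minimal sketch of such a student network (layer sizes again illustrative):

```python
import torch.nn as nn

class StudentNet(nn.Module):
    """Net-S sketch: input layer, three fully connected hidden layers, one output."""

    def __init__(self, in_dim, hidden=(64, 32, 16)):
        super().__init__()
        layers, d = [], in_dim
        for h in hidden:
            layers += [nn.Linear(d, h), nn.ReLU()]
            d = h
        layers.append(nn.Linear(d, 1))   # single click/no-click logit
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x).squeeze(-1)   # logits; sigmoid applied in the loss/inference
```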
After the above models are built, the prediction model building step S4 can be executed. In the embodiment of the invention, the prediction model is based on the overall knowledge-distillation framework. Referring to FIG. 6, which is a schematic diagram of the knowledge distillation-based marketing campaign prediction model according to an embodiment of the present invention; the steps of building this prediction model are described below with reference to FIG. 6.
The prediction model building step S4 comprises:
step S41: providing an initialized knowledge distillation-based training model, wherein the training model comprises a teacher model Net-T training channel, a student model Net-S training channel and an output module;
step S42: training the teacher model Net-T on the data set according to the teacher model's data set partition and network training framework, taking the average of the k-fold cross-validation results as the final classifier, and obtaining soft labels;
step S43: training the student model Net-S on the data set according to the student model Net-S's data set partition and network training framework, obtaining the hard prediction by training at a lower temperature t, the hard prediction and the true hard label forming the hard loss function L_hard;
step S44: distilling the knowledge of the teacher model Net-T into the student model Net-S at a higher temperature T, i.e., training at the high temperature T to obtain the soft prediction, which together with the soft label forms the soft loss function L_soft;
step S45: weighting the soft and hard loss functions to obtain the total loss function Loss, i.e.
Loss = α·L_soft + β·L_hard, where α is the weight of the soft loss function L_soft and β is the weight of the hard loss function L_hard; high-temperature distillation is the most important step in building the prediction model;
step S46: using the total loss function as the objective function of the student model Net-S for actual deployment, training to obtain the student model Net-S with optimized parameters, and using the finally optimized student model Net-S as the knowledge distillation-based marketing campaign prediction model.
It should be noted here that for multi-class problems the activation function of the output layer generally uses the normalized exponential (softmax) function; to introduce the concept of temperature in knowledge distillation, it is modified to a generalized softmax. That is, the activation function of the output layer of the teacher model Net-T uses the generalized normalized exponential function
q_i = exp(z_i / T) / Σ_j exp(z_j / T),
and the higher the temperature T, the softer the soft labels obtained by training with the teacher model Net-T.
In addition, in the knowledge distillation process the choice of the high temperature T is related to the parameter count of the student model Net-S: when the student model Net-S has fewer parameters, a lower temperature T is chosen; conversely, when it has more parameters, a higher temperature T is chosen.
That is, during knowledge distillation an appropriate temperature is selected for extracting the feature knowledge. Generally, the magnitude of the high temperature T determines how much attention the student model Net-S pays to negative labels during training (for click-through rate, the negative label is no-click, i.e., label 0). When the temperature is lower, less attention is paid to negative labels, especially those significantly below average; when the temperature is higher, the values associated with negative labels are relatively amplified and the student model Net-S pays relatively more attention to them. In general, the choice of the high temperature T is related to the size of the student model Net-S: when the student model Net-S has fewer parameters, a relatively lower temperature T can be chosen.
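Combining the generalized softmax with the weighted loss of step S45, a minimal distillation loss for the single-logit binary click task might look as follows; treating both terms as cross-entropies on temperature-scaled outputs is an assumption, since the patent does not spell out the exact loss forms, and the values of T, α and β are tunable:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, hard_labels,
                      T=4.0, alpha=0.7, beta=0.3):
    """Loss = alpha * L_soft + beta * L_hard for a single-logit binary classifier.

    With one output neuron, the generalized softmax q_i = exp(z_i/T) / sum_j exp(z_j/T)
    reduces to a temperature-scaled sigmoid.
    """
    # Soft loss: student at high temperature T vs. the teacher's soft labels at T
    soft_targets = torch.sigmoid(teacher_logits.detach() / T)
    l_soft = F.binary_cross_entropy_with_logits(student_logits / T, soft_targets)
    # Hard loss: student at low temperature (t = 1 here) vs. the true labels
    l_hard = F.binary_cross_entropy_with_logits(student_logits, hard_labels.float())
    return alpha * l_soft + beta * l_hard

# Usage inside a training loop (teacher, student, x, y assumed to exist):
#   loss = distillation_loss(student(x), teacher(x), y)
#   loss.backward(); optimizer.step()
```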
Specifically, after the above training, the finally optimized student model Net-S serves as the knowledge distillation-based marketing campaign prediction model, and its output classifies predicted users into two categories, 'click' and 'no click'; one output-layer neuron is therefore added at the end of the network structure.
After model training is finished, the method further comprises a step S47 of performing model evaluation index processing and tuning on the knowledge distillation-based marketing campaign prediction model; the model evaluation indexes comprise the AUC (Area Under Curve) value, the Log loss value and the relative information gain (RIG) value. In general, the closer the AUC value is to 1, the better the model's classification; the smaller the Log loss value, the more accurate the click-rate estimate; and the larger the relative information gain value, the better the model.
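These indexes can be computed as sketched below; the RIG formula used here (improvement of the model's log loss over a constant base-rate predictor) is a common definition and an assumption, as the patent does not define one:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, log_loss

def evaluate(y_true, y_pred):
    """y_true: 0/1 labels; y_pred: predicted click probabilities in (0, 1)."""
    auc = roc_auc_score(y_true, y_pred)
    ll = log_loss(y_true, y_pred)
    # Log loss of always predicting the empirical click rate (entropy of the base rate)
    p = np.mean(y_true)
    baseline = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    rig = 1.0 - ll / baseline            # relative information gain (assumed form)
    return {"AUC": auc, "LogLoss": ll, "RIG": rig}
```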
For example, after the data are processed according to the above steps and the model is trained, the training effect can be judged from the locally validated AUC value; if the effect is poor, the model generally needs tuning. For a deep learning algorithm, tuning can generally proceed along the following lines:
(1) Add batch normalization to address the internal covariate shift problem in the data.
(2) Add dropout to the network, i.e., put some of the neurons into a dormant state during training.
(3) Adjust the learning rate, generally via strategies such as exponential decay during training.
(4) Train with multiple seeds and average, reducing the risk of overfitting during training.
(5) Add L1 or L2 regularization, applying a penalty to the loss function to reduce the risk of overfitting.
(6) Tune the hyperparameters.
For hyperparameter optimization, grid search or random search can generally be adopted; however, both consume considerable computing resources and are not very efficient. The embodiment of the present invention therefore employs a Bayesian optimization strategy. Bayesian optimization uses Gaussian process regression over the first n sampled points to compute a posterior probability distribution, giving the mean and variance of the objective at each candidate hyperparameter value; by balancing mean against variance and following the joint probability distribution among the hyperparameters, it finally selects a better set of hyperparameters.
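As one possible realization (the patent does not name a library), such a Gaussian-process search can be run with scikit-optimize; train_and_validate is a hypothetical helper returning a validation loss, and the search space is illustrative:

```python
from skopt import gp_minimize
from skopt.space import Real, Integer

space = [
    Real(1e-4, 1e-1, prior="log-uniform", name="learning_rate"),
    Real(1.0, 10.0, name="temperature_T"),
    Real(0.1, 0.9, name="alpha"),        # soft-loss weight; beta = 1 - alpha
    Integer(16, 128, name="hidden_units"),
]

def objective(params):
    lr, T, alpha, hidden = params
    # hypothetical helper: trains the student with these hyperparameters
    # and returns the validation log loss to be minimized
    return train_and_validate(lr=lr, T=T, alpha=alpha, hidden=hidden)

result = gp_minimize(objective, space, n_calls=30, random_state=0)
print("best hyperparameters:", result.x, "best loss:", result.fun)
```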
After all the processing steps are completed, the features are fed into the user prediction model, so that high-intent users can be screened out in advance of advertisement delivery and targeted precisely with marketing advertisements.
That is, the present invention may further comprise a marketing campaign prediction step S5, which specifically comprises:
step S51: acquiring the user group targeted by the planned Internet product marketing and the group's original user information, and extracting original characteristic information from the original user information; the task batch number denotes a user's original information within a date-time period, and the user's accessed DPIs and DPI access frequency are measured per task batch number;
step S52: applying One-hot encoding to the original characteristic information of the task batch number according to the user mobile phone number attribution characteristic; wherein the One-hot encoding comprises:
expanding all distinct user-accessed DPIs into independent features according to the task batch number, and expanding the DPI access frequency within the task batch number into user DPI/DPI-access-frequency relationship features according to all the distinct accessed DPIs;
step S53: providing the established knowledge distillation-based marketing campaign prediction model; the sigmoid function limits the predicted value's probability range to between 0 and 1, and a defined threshold turns the prediction into the binary click/no-click classification, i.e., the predicted value of the knowledge distillation-based marketing campaign prediction model is the user's degree of click willingness;
step S54: according to actual delivery requirements, selecting all or part of the users from the set of users whose model prediction is 1 (willing to click) to carry out precision marketing tasks.
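A final scoring-and-selection pass might be sketched as follows; student, user_features and user_ids are assumed to come from the preceding steps, and the 0.5 threshold is an illustrative choice:

```python
import torch

@torch.no_grad()
def select_target_users(student, user_features, user_ids, threshold=0.5):
    """Score users with the distilled Net-S and keep those predicted to click."""
    probs = torch.sigmoid(student(user_features))   # click willingness in (0, 1)
    return [(uid, p.item())                         # high-intent users and scores
            for uid, p in zip(user_ids, probs) if p >= threshold]

# All or part of the returned high-intent users can then be chosen
# according to the actual delivery requirements.
```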
The results show that the method of the invention can effectively exploit Bayesian inference and introduce prediction uncertainty into the neural network, making the model more robust. Through the combined inner/outer-product method, features are crossed to extract high-dimensional latent features. The hybrid model effectively extends the application of deep learning to computational advertising and recommender-system algorithm problems and significantly improves the accuracy of user click-behavior prediction, thereby saving substantial marketing cost and increasing profit margins.
The above description covers only preferred embodiments of the present invention; these embodiments are not intended to limit the scope of the invention, and all equivalent structural changes made using the contents of the description and drawings of the present invention shall fall within the scope of the invention.

Claims (10)

1. A marketing campaign prediction method based on knowledge distillation, characterized by comprising a data preprocessing step S1, a teacher model data set partitioning and network training framework forming step S2, a student model data set partitioning and network training framework forming step S3, and a prediction model building step S4;
the data preprocessing step S1 comprises:
step S11: acquiring original information of N users and extracting original characteristic information from each user's original information; the original characteristic information comprises a user ID, the user's mobile phone number attribution, a task batch number, the DPIs accessed by the user, the user's DPI access frequency, access time, access duration characteristics and/or a digital label indicating whether the user clicked; the task batch number denotes a user's original information within a date-time period, and the user's accessed DPIs and DPI access frequency are measured per task batch number;
step S12: processing the original characteristic information of all batches in task-batch-number order, and applying One-hot encoding to the user mobile phone number attribution characteristic; wherein the One-hot encoding comprises:
expanding all distinct user-accessed DPIs into independent features in task-batch-number order, and expanding the DPI access frequency within the task batch numbers into user DPI/DPI-access-frequency relationship features according to all the distinct accessed DPIs;
the teacher model data set partitioning and network training framework forming step S2 uses a hybrid method of stratified sampling and k-fold cross-validation, specifically comprising:
S21: after preprocessing, equally dividing all data within the task batch number into k+1 sets, of which 1 set serves as a test set and the data of the remaining sets serve as a training set;
S22: computing from the training set the overall proportion of the two sample classes, click and no-click, and then dividing the training set into k sets such that the proportion of click and no-click samples drawn from each set is consistent with the overall proportion;
S23: selecting in turn one of the k sets as a validation set and the remaining k-1 sets as training sets, forming k validation/training set pairs; using the k pairs in turn to train the initialized teacher model, validating each training result with the corresponding validation set, and testing with the test set to obtain k groups of test results; wherein the teacher model is a residual neural network;
S24: averaging the k groups of test results to obtain their mean;
the student model data set partitioning and network training framework forming step S3 comprises:
S31: equally dividing all data within the task batch number into k+1 sets, of which 1 set serves as a validation set and the remaining data serve as a training set;
S32: adopting a neural network as the student model Net-S, wherein the student model Net-S comprises an input layer, M fully connected hidden layers and an output layer;
the prediction model building step S4 comprises:
step S41: providing an initialized knowledge distillation-based training model, wherein the training model comprises a teacher model Net-T training channel, a student model Net-S training channel and an output module;
step S42: training the teacher model Net-T on the data set according to the teacher model's data set partition and network training framework, taking the average of the k-fold cross-validation results as the final classifier, and obtaining soft labels;
step S43: training the student model Net-S on the data set according to the student model Net-S's data set partition and network training framework, obtaining the hard prediction by training at a lower temperature t, the hard prediction and the true hard label forming the hard loss function L_hard;
step S44: distilling the knowledge of the teacher model Net-T into the student model Net-S at a higher temperature T, i.e., training at the high temperature T to obtain the soft prediction, which together with the soft label forms the soft loss function L_soft;
step S45: weighting the soft and hard loss functions to obtain the total loss function Loss, i.e.
Loss = α·L_soft + β·L_hard, where α is the weight of the soft loss function L_soft and β is the weight of the hard loss function L_hard;
step S46: using the total loss function as the objective function of the student model Net-S for actual deployment, training to obtain the student model Net-S with optimized parameters, and using the finally optimized student model Net-S as the knowledge distillation-based marketing campaign prediction model.
2. The knowledge distillation-based marketing campaign prediction method of claim 1, wherein the overall architecture of the teacher model Net-T network in the teacher model training channel comprises:
an input layer, for inputting the data obtained by partitioning the teacher model Net-T's data set;
an embedding layer, for extracting information from and reducing the dimensionality of the data features input from the input layer;
a product layer, for performing outer-product and inner-product feature interactions on the features processed by the embedding layer;
a factorization layer, for factorizing the weight matrix after feature interaction;
a fully connected block comprising N hidden layers, wherein the hidden layers may take four network shapes: increasing, constant, diamond, or decreasing; and N is greater than M;
an output layer, which outputs the predicted probability via a sigmoid function and, by defining a threshold, forms the binary click/no-click classification, i.e., an output classified as a positive or negative label.
3. The knowledge distillation-based marketing campaign prediction method of claim 1, wherein the student model Net-S comprises two or three fully connected hidden layers.
4. The knowledge distillation-based marketing campaign prediction method of claim 1, wherein, in knowledge distillation, the activation function of the output layer of the teacher model Net-T uses a generalized normalized exponential (softmax) function,
q_i = exp(z_i / T) / Σ_j exp(z_j / T),
and the higher the temperature T, the softer the soft labels obtained by training with the teacher model Net-T.
5. The knowledge distillation-based marketing campaign prediction method of claim 1, wherein, in the knowledge distillation process, the choice of the high temperature T is related to the parameter count of the student model Net-S: when the student model Net-S has fewer parameters, a lower temperature T is chosen; conversely, when it has more parameters, a higher temperature T is chosen.
6. The knowledge-distillation-based marketing campaign prediction method of claim 1, further comprising a marketing campaign prediction step S5, wherein step S5 specifically comprises:
step S51: acquiring a user group targeted for Internet product marketing and the raw user information of that group, and extracting the original feature information from the raw user information; wherein a task batch number denotes a user's raw information within a given date period, and the user's accessed DPIs and DPI access frequencies are measured per task batch number;
step S52: performing One-hot encoding on the original feature information of the task batch number according to the attribution features of the user's mobile phone number; wherein the One-hot encoding comprises:
expanding all distinct DPIs accessed by users into independent features per task batch number, and expanding the DPI access frequency within the task batch number into user-DPI access-frequency relation features over all distinct accessed DPIs;
step S53: applying the established knowledge-distillation-based marketing campaign prediction model; wherein a sigmoid function limits the predicted probability to between 0 and 1, and a defined threshold casts the prediction as the binary classification of clicking or not clicking, i.e. the predicted value of the knowledge-distillation-based marketing campaign prediction model is the user's click willingness.
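As a sketch of the feature expansion in step S52, assuming pandas; the column names and the toy log below are hypothetical:

```python
import pandas as pd

# Hypothetical raw log: one row per (user, task batch, DPI) with a visit count.
raw = pd.DataFrame({
    "user_id":  ["u1", "u1", "u2", "u2"],
    "batch_no": ["b202103", "b202103", "b202103", "b202103"],
    "dpi":      ["dpi_news", "dpi_video", "dpi_news", "dpi_shop"],
    "visits":   [3, 1, 7, 2],
})

# One-hot style expansion: every distinct DPI becomes its own column, and the
# cell value is the user's access frequency for that DPI within the batch
# (0 when the DPI was never visited).
wide = (raw.pivot_table(index=["user_id", "batch_no"], columns="dpi",
                        values="visits", fill_value=0)
           .reset_index())
print(wide)
```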
7. The knowledge-distillation-based marketing campaign prediction method of claim 6, wherein the marketing campaign prediction step S5 further comprises:
step S54: selecting, according to the actual delivery requirements, all or some of the users from the set of users whose predicted click willingness is 1, and carrying out the precision marketing task on them.
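As a toy illustration of this selection step (the scores and the 0.5 cut-off are made up; in practice the threshold comes from the defined click / no-click boundary and the delivery budget):

```python
# Hypothetical predicted click-willingness scores for a user group.
scores = {"u1": 0.83, "u2": 0.41, "u3": 0.95, "u4": 0.66}

# Keep the users classified as willing to click (predicted label 1), ranked
# so a campaign with a limited budget can take only the top of the list.
targets = sorted((u for u, s in scores.items() if s >= 0.5),
                 key=lambda u: -scores[u])
print(targets)   # ['u3', 'u1', 'u4'] -> candidates for the precision campaign
```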
8. The knowledge-distillation-based marketing campaign prediction method of claim 1, further comprising, after step S11, performing an anomaly detection and processing step, a continuous-feature processing step and/or a dimensionality reduction step on the user's raw information; wherein the continuous-feature processing step adjusts the data distribution of the continuous features with the RankGauss method, and the dimensionality reduction step reduces the high-dimensional features with principal component analysis.
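A sketch of these two preprocessing steps, assuming scikit-learn, where `QuantileTransformer` with a normal output distribution is a common stand-in for the RankGauss transform; the synthetic data and the 95% variance cut-off are illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=(1000, 50))   # skewed continuous features

# RankGauss-style transform: rank the values, then map the ranks onto a
# standard normal distribution.
rank_gauss = QuantileTransformer(output_distribution="normal",
                                 n_quantiles=min(1000, len(X)))
X_gauss = rank_gauss.fit_transform(X)

# Principal component analysis to reduce the high-dimensional features,
# keeping enough components to explain 95% of the variance.
pca = PCA(n_components=0.95)
X_low = pca.fit_transform(X_gauss)
print(X_low.shape)
```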
9. The knowledge-distillation-based marketing campaign prediction method of claim 1, further comprising a step S47 of performing model evaluation index processing and tuning on the knowledge-distillation-based marketing campaign prediction model; wherein the model evaluation indexes comprise the AUC value, the Log loss value, and the relative information gain (RIG) value.
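A sketch of computing the three indexes, assuming scikit-learn for AUC and Log loss; the patent does not spell out the RIG formula, so the snippet uses the common definition of one minus the ratio of the model's log loss to the entropy of always predicting the base click rate, on made-up labels and predictions:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, log_loss

y_true = np.array([0, 1, 0, 1, 1, 0, 0, 1])
y_pred = np.array([0.1, 0.8, 0.3, 0.7, 0.9, 0.2, 0.4, 0.6])

auc = roc_auc_score(y_true, y_pred)
ll = log_loss(y_true, y_pred)

# Relative information gain: improvement of the model's log loss over the
# entropy of the base click rate.
p = y_true.mean()
baseline_entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p))
rig = 1.0 - ll / baseline_entropy
print(f"AUC={auc:.3f}  LogLoss={ll:.3f}  RIG={rig:.3f}")
```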
10. The knowledge-distillation-based marketing campaign prediction method of claim 9, wherein the model tuning process comprises one or more of the following (a combined sketch follows this list):
adding batch normalization to alleviate the internal covariate shift of the data;
adding dropout, i.e. a mechanism that puts a portion of the neurons into a dormant state during training;
adjusting the learning rate, typically through strategies such as exponential decay during training;
averaging over multiple training runs to better address the insufficient generalization caused by large data variance;
adding L1 or L2 regularization, applying a penalty to the loss function to reduce the risk of overfitting;
and optimizing the hyper-parameters.
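A minimal PyTorch sketch combining several of these tricks (all constants are illustrative, and the training pass itself is elided):

```python
import torch
import torch.nn as nn

# Student-style network with the first two tuning tricks baked in.
model = nn.Sequential(
    nn.Linear(100, 64),
    nn.BatchNorm1d(64),   # batch normalization against internal covariate shift
    nn.ReLU(),
    nn.Dropout(p=0.3),    # randomly "sleeps" 30% of neurons during training
    nn.Linear(64, 1),
)

# L2 regularization via weight decay, plus an exponentially decaying
# learning rate.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

for epoch in range(10):
    # ... the forward/backward pass over the training batches goes here ...
    scheduler.step()      # decay the learning rate once per epoch
```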
CN202110235391.5A 2021-03-03 2021-03-03 Marketing activity prediction model structure and prediction method based on knowledge distillation Withdrawn CN112967088A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110235391.5A CN112967088A (en) 2021-03-03 2021-03-03 Marketing activity prediction model structure and prediction method based on knowledge distillation


Publications (1)

Publication Number Publication Date
CN112967088A (en) 2021-06-15

Family

ID=76276304

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110235391.5A Withdrawn CN112967088A (en) 2021-03-03 2021-03-03 Marketing activity prediction model structure and prediction method based on knowledge distillation

Country Status (1)

Country Link
CN (1) CN112967088A (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674880A (en) * 2019-09-27 2020-01-10 北京迈格威科技有限公司 Network training method, device, medium and electronic equipment for knowledge distillation
CN111950806A (en) * 2020-08-26 2020-11-17 上海数鸣人工智能科技有限公司 Marketing activity prediction model structure and prediction method based on factorization machine
CN112116030A (en) * 2020-10-13 2020-12-22 浙江大学 Image classification method based on vector standardization and knowledge distillation
CN112258223A (en) * 2020-10-13 2021-01-22 上海数鸣人工智能科技有限公司 Marketing advertisement click prediction method based on decision tree
CN112418343A (en) * 2020-12-08 2021-02-26 中山大学 Multi-teacher self-adaptive joint knowledge distillation

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CODINGLING: "How to tune deep-learning parameters: these 12 tricks - Zhihu", pages 1 - 2, Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/52189739> *
JIEMING ZHU et al.: "Ensembled CTR Prediction via Knowledge Distillation", CIKM '20, pages 2941 - 2948 *
TOTOTO: "What is RankGauss? - Tototo's blog", pages 1 - 7, Retrieved from the Internet <URL:https://tsumit.hatenablog.com/entry/2020/06/20/044835> *
LIU ZHENPENG: "HRS-DC: a hybrid recommendation model based on deep learning", Computer Engineering and Applications, pages 169 - 174 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378866A (en) * 2021-08-16 2021-09-10 深圳市爱深盈通信息技术有限公司 Image classification method, system, storage medium and electronic device
CN114037653B (en) * 2021-09-23 2024-08-06 上海仪电人工智能创新院有限公司 Industrial machine vision defect detection method and system based on two-stage knowledge distillation
CN114037653A (en) * 2021-09-23 2022-02-11 上海仪电人工智能创新院有限公司 Industrial machine vision defect detection method and system based on two-stage knowledge distillation
CN114022811A (en) * 2021-10-29 2022-02-08 长视科技股份有限公司 Water surface floater monitoring method and system based on continuous learning
CN114139724A (en) * 2021-11-30 2022-03-04 支付宝(杭州)信息技术有限公司 Method and device for training gain model
CN114139724B (en) * 2021-11-30 2024-08-09 支付宝(杭州)信息技术有限公司 Training method and device for gain model
CN114331531A (en) * 2021-12-28 2022-04-12 上海数鸣人工智能科技有限公司 Prediction method of WaveNet technology for individual behavior insight based on simulated annealing thought
CN114677673A (en) * 2022-03-30 2022-06-28 中国农业科学院农业信息研究所 Potato disease identification method based on improved YOLO V5 network model
CN115170919A (en) * 2022-06-29 2022-10-11 北京百度网讯科技有限公司 Image processing model training method, image processing device, image processing equipment and storage medium
CN115170919B (en) * 2022-06-29 2023-09-12 北京百度网讯科技有限公司 Image processing model training and image processing method, device, equipment and storage medium
CN115147376A (en) * 2022-07-06 2022-10-04 南京邮电大学 Skin lesion intelligent identification method based on deep Bayesian distillation network
CN115271272A (en) * 2022-09-29 2022-11-01 华东交通大学 Click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation
CN115271272B (en) * 2022-09-29 2022-12-27 华东交通大学 Click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation
CN116911956A (en) * 2023-09-12 2023-10-20 深圳须弥云图空间科技有限公司 Recommendation model training method and device based on knowledge distillation and storage medium
CN117057852A (en) * 2023-10-09 2023-11-14 北京光尘环保科技股份有限公司 Internet marketing system and method based on artificial intelligence technology
CN117057852B (en) * 2023-10-09 2024-01-26 头流(杭州)网络科技有限公司 Internet marketing system and method based on artificial intelligence technology

Similar Documents

Publication Publication Date Title
CN112967088A (en) Marketing activity prediction model structure and prediction method based on knowledge distillation
CN114117220B (en) Deep reinforcement learning interactive recommendation system and method based on knowledge enhancement
WO2022161202A1 (en) Multimedia resource classification model training method and multimedia resource recommendation method
CN111104595B (en) Deep reinforcement learning interactive recommendation method and system based on text information
CN108829763B (en) Deep neural network-based attribute prediction method for film evaluation website users
CN113344615B (en) Marketing campaign prediction method based on GBDT and DL fusion model
Alshmrany Adaptive learning style prediction in e-learning environment using levy flight distribution based CNN model
CN113297936B (en) Volleyball group behavior identification method based on local graph convolution network
CN113255844B (en) Recommendation method and system based on graph convolution neural network interaction
CN112819523B (en) Marketing prediction method combining inner/outer product feature interaction and Bayesian neural network
Wang et al. Learning performance prediction via convolutional GRU and explainable neural networks in e-learning environments
CN112288554B (en) Commodity recommendation method and device, storage medium and electronic device
CN113590965B (en) Video recommendation method integrating knowledge graph and emotion analysis
CN112257841A (en) Data processing method, device and equipment in graph neural network and storage medium
CN110110372B (en) Automatic segmentation prediction method for user time sequence behavior
CN113591971B (en) User individual behavior prediction method based on DPI time sequence word embedded vector
CN113689234B (en) Platform-related advertisement click rate prediction method based on deep learning
Chen et al. Poverty/investment slow distribution effect analysis based on Hopfield neural network
CN115310004A (en) Graph nerve collaborative filtering recommendation method fusing project time sequence relation
CN115689639A (en) Commercial advertisement click rate prediction method based on deep learning
Ma Artificial intelligence-driven education evaluation and scoring: Comparative exploration of machine learning algorithms
CN113360772A (en) Interpretable recommendation model training method and device
CN112581177A (en) Marketing prediction method combining automatic feature engineering and residual error neural network
Zhang et al. An interpretable neural model with interactive stepwise influence
CN115098787B (en) Article recommendation method based on cosine ranking loss and virtual edge map neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 200436 room 406, 1256 and 1258 Wanrong Road, Jing'an District, Shanghai

Applicant after: Shanghai Shuming Artificial Intelligence Technology Co.,Ltd.

Address before: Room 1601-026, 238 JIANGCHANG Third Road, Jing'an District, Shanghai, 200436

Applicant before: Shanghai Shuming Artificial Intelligence Technology Co.,Ltd.

WW01 Invention patent application withdrawn after publication

Application publication date: 20210615
