CN113887694A - A CTR Prediction Model Based on Feature Representation with Attention Mechanism - Google Patents
A CTR Prediction Model Based on Feature Representation with Attention Mechanism
- Publication number
- CN113887694A (application number CN202010629307.3A)
- Authority
- CN
- China
- Prior art keywords
- feature
- attention
- characteristic
- layer
- click
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Mathematical Optimization (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Analysis (AREA)
- Computational Mathematics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a click-through rate (CTR) prediction model based on feature representation under an attention mechanism. It predicts the click-through rate from the features of the candidate item and can serve as the fine-ranking stage of enterprise-level recommendation systems, search systems, online advertising systems, and the like. The model comprises: a feature embedding layer, which vectorizes the continuous and discrete features and stacks them into a stacking feature; an explicit feature cross network, which feeds the stacking feature into an attention cross network for explicit feature combination; an implicit feature cross network, which feeds the stacking feature into a multilayer perceptron for implicit feature combination; and a prediction probability output layer, which estimates the click-through rate from the received combined features. The attention cross network removes the prediction model's dependence on manual feature engineering, while the attention mechanism distinguishes the importance of each combined feature to the prediction and eliminates the influence of useless and redundant features on the model.
Description
Technical Field
The invention belongs to the technical field of data mining, and particularly relates to an end-to-end click-through rate prediction technology.
Background
Click-through rate (CTR) prediction is one of the core research topics in industry, as a key technology that directly influences user experience on a platform and advertising revenue. Current research, both in China and abroad, focuses mainly on feature representation, and existing methods fall into two groups: machine-learning CTR models and deep-learning CTR models.
Early on, constrained by computing power, online learning and model deployment, the industry mainly built lightweight machine-learning models, the most classic being Logistic Regression (LR). Thanks to its sound mathematical grounding, strong interpretability and ease of engineering deployment, LR quickly became the mainstream CTR model in industry. In 2010, Brendan McMahan et al. proposed the online learning algorithm FTRL (Follow The Regularized Leader) for LR, which further promoted its industrial adoption. However, LR is linear in nature, its learning capacity is limited, and its prediction quality usually depends on the feature engineering skill of data scientists. The industry therefore began to explore second-order feature combinations with degree-2 polynomial models (polynomial regression), performing explicit feature combination through pairwise feature crosses. This brute-force crossing alleviates the feature combination problem to some extent, but it can only learn combinations that co-occur in the training set and generalizes poorly to unseen combinations in large-scale sparse scenarios such as recommendation and advertising. To overcome this shortcoming, Steffen Rendle of the University of Konstanz, Germany, proposed the Factorization Machine (FM) in 2010: a latent weight vector is learned for each feature and the inner product of latent vectors serves as the feature-cross weight, which handles feature combination well under sparse features; in addition, by rewriting the objective function, the training complexity of FM is further reduced. FM thus gradually became an important choice for industrial CTR models around 2012 to 2014. In 2015, the FFM (Field-aware Factorization Machine), built on FM, stood out in several CTR prediction competitions and was subsequently applied to recommendation and advertising scenarios by companies such as Criteo and Meituan. Compared with FM, FFM introduces the concept of a "field": when crossing features, each feature selects the latent vector corresponding to the field of the other feature in the combination for the inner product, giving the model stronger expressive power; however, limited by its higher space complexity and the restriction to second-order crosses, FFM has not been widely used in industry. In addition, in 2014 Xinran He et al. proposed a GBDT (Gradient Boosting Decision Tree) + LR combination to handle high-dimensional feature combination and selection: GBDT automatically selects and combines features, its leaf nodes are one-hot encoded, and the encoded features are fed into an LR model to complete CTR prediction. This opened the precedent of using models for high-order feature construction and selection, handled the previously troublesome feature combination and selection more efficiently, and strongly promoted the trend of modeling feature engineering.
In this period, researchers found that high-order combined features make it easier to uncover personalized needs and achieve fully personalized recommendation. But as sparse features grow rapidly, the business logic behind high-order combinations becomes hard to interpret, and traditional manual feature engineering can no longer keep up with mining high-order combinations. People therefore began to rely on models to extract features and complete personalized recommendation in big-data scenarios. With the great success of deep learning in computer vision and natural language processing, researchers started to use neural networks to perform feature representation automatically, replacing manual feature engineering to complete CTR prediction.
In 2016, deep learning began to be applied to CTR prediction at scale. Ying Shan et al. at Microsoft proposed the serial network Deep Crossing, which covers the classic elements of neural CTR models: an embedding layer converts sparse features into low-dimensional dense features, a stacking layer concatenates the per-field feature vectors, several neural network layers complete feature combination and transformation, and a Sigmoid activation finally produces the CTR; a residual network structure was formally introduced into CTR prediction models to strengthen high-order feature extraction. In the same year, Weinan Zhang et al. of Shanghai Jiao Tong University proposed FNN, which, on top of the earlier Deep Crossing style architecture, uses the latent vectors of FM as the embeddings of users and items. This avoids training the embedding matrix entirely from a random initialization and greatly reduces training time and the instability of the embedding layer; pre-training the embedding layer in this way is undoubtedly effective engineering practice for reducing the complexity and training instability of deep CTR models. However, a plain DNN combines features directly with several fully connected layers and lacks feature combinations targeted at the CTR scenario, so Yanru Qu et al. proposed PNN (Product-based Neural Network), adding a product layer between the embedding layer and the fully connected layers to perform feature combination across different feature fields and strengthen the model's ability to represent different data patterns.
In 2016, Heng-Tze Cheng et al. at Google proposed the Wide & Deep parallel architecture, in which a Wide part consisting of a single input layer and a Deep part consisting of a multilayer perceptron are concatenated and passed to the output layer. The Wide part provides memorization and the Deep part provides generalization: the DNN mines implicit high-order feature combinations, and an LR unit joins the Wide and Deep parts into a unified CTR model. Wide & Deep established the parallel deep-learning framework for CTR prediction, but it did not escape the need for manual feature engineering. To address the limited capacity of the Wide part, Huifeng Guo et al. proposed DeepFM in 2017: keeping the Wide and Deep parallel structure, FM replaces the original Wide part, strengthening the feature-combination ability of the shallow network while removing the dependence of deep CTR models on manual feature engineering. In the same year, Ruoxi Wang et al. proposed the Deep & Cross Network (DCN), replacing the original Wide part with a cross network that performs bit-level explicit feature interaction and refines the crossing granularity of the Wide part. Xiangnan He et al. proposed NFM (Neural Factorization Machines) to improve the Deep part, introducing a Bi-Interaction Pooling layer in place of FM for feature crossing and further strengthening the deep feature-combination ability. In 2018, Guorui Zhou et al. at Alibaba proposed DIN (Deep Interest Network), an attention-based deep network that extracts real-time interest features from the user behavior sequence and further improves user-side feature representation. In the same year, Jianxun Lian et al. proposed the xDeepFM parallel architecture, which models vector-level explicit feature interaction and adopts a CIN (Compressed Interaction Network) in the Wide part to strengthen explicit feature combination, with some success. Subsequently, in 2019 Guorui Zhou et al. proposed DIEN (Deep Interest Evolution Network), introducing the sequence model AUGRU on top of DIN: user interests at different times are linked into an interest-evolution chain, and the interest vector at the current moment is fed, together with other features, into the upper multilayer perceptron to complete CTR prediction, achieving better results.
In summary, current deep learning models still cannot fully replace manual feature engineering with a neural network for feature extraction, feature combination and feature selection. Moreover, even when a neural network is used in place of manual feature engineering, the importance trend of features cannot be estimated accurately, so accurate click-through rate prediction cannot be obtained automatically.
Disclosure of Invention
In order to solve these problems, a click-through rate prediction model based on feature representation under an attention mechanism is provided. The invention adopts the following technical scheme:
the click rate estimation model based on the characteristic representation under the attention mechanism comprises the following steps: the system comprises a characteristic embedding layer, an explicit characteristic cross network, an implicit characteristic cross network and a pre-estimation probability output layer. The characteristic embedding layer is used for carrying out vectorization processing on the continuous characteristic and the discrete characteristic and then carrying out stacking embedding processing to form a stacking characteristic; an explicit feature crossover network that forms an explicit output vector by inputting the stacking features into the attention crossover network for explicit feature combining; the implicit characteristic cross network is used for inputting the stacking characteristics into a multilayer perceptron to carry out implicit characteristic combination to form an implicit output vector; a probability output layer is pre-estimated, the explicit output vector and the implicit output vector are combined to form a high-order nonlinear combined characteristic, and the combined characteristic is transmitted to a Sigmoid activation function to predict the click rate, so that the click rate is obtained; wherein the attention crossing network comprises: the cross layer processes the stacking features through a cross algorithm and generates a multi-dimensional vector; and an attention layer for processing the multidimensional vector through a fully connected neural network to generate an attention score, performing normalization processing on the attention score to generate a characteristic coefficient, and further generating an explicit output vector through an output calculation formula based on the characteristic coefficient.
The click-through rate prediction model based on feature representation under the attention mechanism provided by the invention has the technical feature that the vectorization processing comprises: one-hot encoding the discrete features and taking the encoded discrete features as embedding vectors; standardizing the continuous features according to their data distribution to form dense features; and stacking the embedding vectors and the dense features into the stacking feature, wherein the matrix formula of the one-hot embedding is:
$x_{embed,i} = W_{embed,i}\, x_i \quad (1)$
where $x_{embed,i}$ is the embedding vector, $x_i$ is the binary (one-hot) input of the $i$-th category, $W_{embed,i}$ is an embedding matrix optimized together with the other parameters in the network, and $n_e$ and $n_v$ are the input dimension and the embedding vector dimension, respectively.
The click-through rate prediction model based on feature representation under the attention mechanism provided by the invention also has the technical feature that the crossing algorithm is computed as:

$x_{l+1} = x_0\, x_l^{\top} w_l + b_l + x_l = f(x_l, w_l, b_l) + x_l \quad (2)$

where $x_l, x_{l+1} \in \mathbb{R}^d$ are column vectors denoting the outputs of the $l$-th and $(l+1)$-th cross layers, respectively; $w_l, b_l \in \mathbb{R}^d$ are the weight and bias of the $l$-th layer, and the function $f$ represents the feature-vector crossing of each layer.
The click-through rate prediction model based on feature representation under the attention mechanism provided by the invention also has the technical feature that the attention score in the attention layer is computed as:
$a'_i = h^{\top}\, \mathrm{ReLU}(W x_i + b) \quad (3)$
where $W$, $b$ and $h$ are model parameters, and the attention score is normalized by Softmax.
The click-through rate prediction model based on feature representation under the attention mechanism provided by the invention has the technical feature that the output of the attention cross network is the attention-weighted sum of the cross-layer outputs:

$x_{ACN} = \sum_{i} a_i\, x_i$

where $a_i$ is the attention weight.
The click-through rate prediction model based on feature representation under the attention mechanism provided by the invention also has the technical feature that each layer of the multilayer perceptron is computed as:
$H_{l+1} = f(W_l H_l + b_l) \quad (7)$

where $H_{l+1}$ denotes the hidden layer and $f(\cdot)$ is the ReLU function.
The click-through rate prediction model based on feature representation under the attention mechanism provided by the invention has the technical feature that the Sigmoid activation function of the prediction probability output layer is:

$\hat{y} = \mathrm{Sigmoid}\big(w_{out}^{\top}\,[\,x_{ACN};\, H_{L}\,] + b_{out}\big)$

where $x_{ACN}$ and $H_{L}$ are the outputs of the explicit feature cross network and of the multilayer perceptron, respectively, and the final click-through rate prediction is obtained through the Sigmoid function.
The click-through rate prediction model based on feature representation under the attention mechanism provided by the invention also has the technical feature that the error of the predicted click-through rate is back-propagated through a Logloss loss function until the click-through rate output by the output layer converges, completing the parameter update of the click-through rate prediction model based on feature representation under the attention mechanism.
The click-through rate prediction model based on feature representation under the attention mechanism provided by the invention also has the technical feature that the Logloss loss function is:

$L = -\frac{1}{N}\sum_{i=1}^{N}\big[\, y_i \log p_i + (1-y_i)\log(1-p_i) \,\big] + \lambda \lVert \Theta \rVert_2^2$

where $p_i$ is the output of the click-through rate prediction model, $y_i$ is the label of the corresponding sample, $N$ is the number of training samples, and $\lambda$ is the L2 regularization coefficient. Error back-propagation is carried out with this Logloss loss function and the parameters are updated until convergence, completing the training of the final click-through rate model.
Action and Effect of the invention
According to the click-through rate prediction model based on feature representation under the attention mechanism, the feature embedding layer vectorizes the continuous and discrete features, which solves the problem that the vector dimension becomes too large after one-hot encoding. The model also has an explicit feature cross network, which dynamically weights the combination terms through the attention cross network, uses the combined features more efficiently, and eliminates the influence of redundant features on the click-through rate prediction model. It further comprises an implicit feature cross network, which applies a multilayer perceptron to capture highly nonlinear interaction features, relieving the restriction that the model's expressive power is limited by the parameter scale. Finally, the invention has a prediction probability output layer, which estimates the click-through rate from the outputs of the explicit and implicit feature cross networks through a Sigmoid activation function, making the prediction more accurate. The predictions can further serve as the fine-ranking stage of enterprise-level recommendation systems, search systems, online advertising systems, and the like.
Drawings
FIG. 1 is a block diagram of a click through rate prediction model based on feature characterization under an attention mechanism in an embodiment of the present invention;
FIG. 2 is a flow chart of the operation of a feature embedding layer in an embodiment of the present invention;
FIG. 3 is a network architecture diagram of an attention crossing network in an embodiment of the present invention;
FIG. 4 is a network architecture diagram of a multi-tier perceptron in an embodiment of the present invention;
FIG. 5 is a flowchart illustrating a training process of a feature characterization-based click rate estimation model under an attention mechanism according to an embodiment of the present invention; and
FIG. 6 is a flowchart of the deployment of the click-through rate prediction model in the embodiment of the present invention.
Detailed Description
In order to make the technical means, the creation characteristics, the achievement purposes and the effects of the present invention easy to understand, the click rate estimation model based on the characteristic characterization under the attention mechanism of the present invention is specifically described below with reference to the embodiments and the drawings.
< example >
FIG. 1 is a block diagram of a click through rate prediction model based on feature characterization under the attention mechanism in the embodiment of the present invention.
As shown in FIG. 1, the feature characterization-based click through rate prediction model 100 under the attention mechanism includes: a feature embedding layer 101, an explicit feature crossing network 102, an implicit feature crossing network 103, and a predictive probability output layer 104.
In this embodiment, user-side features, advertisement-side features and context features are collected and divided into continuous features and discrete features; these serve as the input features and are organized into a training data set, with which the model training of the click-through rate prediction model 100 based on feature representation under the attention mechanism is completed. The training data set consists of n samples (x, y), where the input features x are the continuous and discrete features. The goal is to build a click-through rate prediction model y = model(x), x ∈ R^n, which predicts the probability y ∈ [0, 1] that a user u clicks a candidate item v in a specified context c_t.
The feature embedding layer 101 performs vectorization processing on the received input features, and stacks the obtained embedded vectors and dense features to form stacked features and output the stacked features.
FIG. 2 is a flow chart of the operation of the feature embedding layer in an embodiment of the present invention.
As shown in fig. 2, the steps of forming the stacked features by the feature embedding layer 101 are as follows:
step S1, carrying out one-hot code conversion on the discrete type features, taking the encoded discrete type features as embedded vectors, and then entering step S2;
step S2, carrying out data standardization based on data distribution characteristics on the continuous features to form dense features, and then entering step S3;
in step S3, the embedding vector formed in step S1 and the dense feature formed in step S2 are subjected to a stack embedding process, i.e., the embedding vector and the dense feature are stacked into one vector, and the vector is referred to as a stacked feature, and then an end state is entered.
In this embodiment, the matrix formula of the embedding (one-hot conversion) process is:
$x_{embed,i} = W_{embed,i}\, x_i \quad (1)$
where $x_{embed,i}$ is the embedding vector, $x_i$ is the binary (one-hot) input of the $i$-th category, $W_{embed,i}$ is an embedding matrix optimized together with the other parameters in the network, and $n_e$ and $n_v$ are the input dimension and the embedding vector dimension, respectively.
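For illustration only (not part of the claimed embodiment), the embedding and stacking described by formula (1) can be sketched in a few lines of Python; the field sizes, embedding dimension and standardization statistics below are assumed values.

```python
# Minimal NumPy sketch of the feature embedding layer: discrete fields are
# one-hot encoded and mapped through per-field embedding matrices (eq. (1)),
# continuous fields are standardized, and everything is stacked into x_0.
import numpy as np

rng = np.random.default_rng(0)

def embed_discrete(field_index, vocab_size, emb_dim):
    """x_embed,i = W_embed,i * x_i with x_i a one-hot vector (W is random here)."""
    W = rng.normal(scale=0.01, size=(emb_dim, vocab_size))
    x_onehot = np.zeros(vocab_size)
    x_onehot[field_index] = 1.0
    return W @ x_onehot              # equivalent to selecting column `field_index`

def standardize(x, mean, std):
    """Continuous (dense) features are normalized according to their distribution."""
    return (x - mean) / (std + 1e-8)

# Example: two discrete fields and two continuous features stacked into x_0.
e1 = embed_discrete(field_index=3, vocab_size=100, emb_dim=8)
e2 = embed_discrete(field_index=7, vocab_size=50, emb_dim=8)
dense = standardize(np.array([12.0, 0.3]), mean=np.array([10.0, 0.5]), std=np.array([4.0, 0.2]))
x0 = np.concatenate([e1, e2, dense])  # the "stacking feature" fed to both sub-networks
print(x0.shape)                       # (18,)
```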
The explicit feature crossing network 102 receives the stacked features formed by the feature embedding layer 101, and performs a crossing algorithm on the stacked features through the attention crossing network to generate a multi-dimensional vector;
fig. 3 is a network configuration diagram of an attention crossover network in an embodiment of the present invention.
As shown in diagram a of fig. 3, the attention-crossing network includes: a cross-layer 21 and an attention layer 22.
As shown in diagram B of FIG. 3, the number of neurons in each cross layer 21 is the same and equal to the dimension of the input vector x_0.
In this embodiment, the crossing algorithm is computed as:

$x_{l+1} = x_0\, x_l^{\top} w_l + b_l + x_l = f(x_l, w_l, b_l) + x_l \quad (2)$

where $x_l, x_{l+1} \in \mathbb{R}^d$ are column vectors denoting the outputs of the $l$-th and $(l+1)$-th cross layers, respectively; $w_l, b_l \in \mathbb{R}^d$ are the weight and bias of the $l$-th layer, and the function $f$ represents the feature-vector crossing of each layer.
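For illustration, a minimal NumPy sketch of a stack of cross layers following formula (2); the dimension d, the number of layers and the random parameters are assumed values.

```python
# One cross layer: x_{l+1} = x_0 * (x_l^T w_l) + b_l + x_l.
# Note that x_0 x_l^T w_l equals the scalar (x_l . w_l) times x_0.
import numpy as np

def cross_layer(x0, xl, wl, bl):
    """x0, xl, wl, bl are vectors of the same dimension d."""
    return x0 * (xl @ wl) + bl + xl

d = 18
rng = np.random.default_rng(0)
x0 = rng.normal(size=d)
w = [rng.normal(scale=0.1, size=d) for _ in range(3)]
b = [np.zeros(d) for _ in range(3)]

xs = [x0]
for l in range(3):                       # three stacked cross layers
    xs.append(cross_layer(x0, xs[-1], w[l], b[l]))
cross_outputs = xs[1:]                   # passed on to the attention layer
```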
The attention layer 22 processes the multidimensional vector through the fully-connected neural network and generates an attention score, and normalizes the attention score to generate a feature coefficient, and generates an explicit output vector by an output calculation formula based on the feature coefficient.
In this embodiment, an Attention network is used as the fully connected neural network, ReLU is used as the activation function, and the size of the network is expressed by an attention factor. The attention score in the attention layer 22 is computed as:
$a'_i = h^{\top}\, \mathrm{ReLU}(W x_i + b) \quad (3)$
where $W$, $b$ and $h$ are model parameters, and the attention scores are normalized by Softmax to obtain the attention weights. Taking the result as the feature coefficient, the output of the attention cross network is the attention-weighted sum of the cross-layer outputs:

$x_{ACN} = \sum_{i} a_i\, x_i$

where $a_i$ is the attention weight of the $i$-th cross-layer output $x_i$.
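For illustration, a minimal NumPy sketch of the attention layer: scores from formula (3), Softmax normalization, and the attention-weighted sum; the attention factor t and the random parameters are assumed values.

```python
# Attention pooling over the cross-layer outputs x_i:
# a'_i = h^T ReLU(W x_i + b), a_i = softmax(a'_i), output = sum_i a_i * x_i.
import numpy as np

def attention_pool(xs, W, b, h):
    scores = np.array([h @ np.maximum(W @ x + b, 0.0) for x in xs])  # a'_i
    a = np.exp(scores - scores.max())
    a = a / a.sum()                                                  # Softmax -> a_i
    out = sum(ai * xi for ai, xi in zip(a, xs))                      # weighted sum
    return out, a

d, t = 18, 4                                  # feature dim and attention factor (assumed)
rng = np.random.default_rng(0)
xs = [rng.normal(size=d) for _ in range(3)]   # cross-layer outputs
W, b, h = rng.normal(size=(t, d)), np.zeros(t), rng.normal(size=t)
x_acn, weights = attention_pool(xs, W, b, h)
```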
The implicit feature crossing network 103 forms an implicit output vector by inputting the stacked features into a multi-layer perceptron for implicit feature combination.
Fig. 4 is a network structure diagram of a multi-layer perceptron in an embodiment of the invention.
As shown in fig. 4, in the present embodiment, the multi-layer perceptron is a fully connected feedforward neural network, and the calculation logic of each layer is as follows:
$H_{l+1} = f(W_l H_l + b_l) \quad (7)$

where $H_{l+1}$ denotes the hidden layer and $f(\cdot)$ is the ReLU function.
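For illustration, a minimal NumPy sketch of the feed-forward computation of formula (7); the layer widths are assumed values.

```python
# Implicit cross network: a fully connected stack applying
# H_{l+1} = ReLU(W_l H_l + b_l) layer by layer.
import numpy as np

def mlp_forward(x0, weights, biases):
    h = x0
    for W, b in zip(weights, biases):
        h = np.maximum(W @ h + b, 0.0)   # ReLU hidden layers
    return h

rng = np.random.default_rng(0)
dims = [18, 64, 64]                      # input dim followed by two hidden layers
Ws = [rng.normal(scale=0.1, size=(dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
bs = [np.zeros(dims[i + 1]) for i in range(len(dims) - 1)]
h_dnn = mlp_forward(rng.normal(size=18), Ws, bs)   # the implicit output vector
```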
The pre-estimated probability output layer 104 combines the explicit output vector and the implicit output vector to form a high-order nonlinear combined feature, and simultaneously transmits the combined feature to a Sigmoid activation function to predict the click rate, so as to obtain the pre-estimated click rate.
In this embodiment, the prediction probability output layer computes:

$\hat{y} = \mathrm{Sigmoid}\big(w_{out}^{\top}\,[\,x_{ACN};\, H_{L}\,] + b_{out}\big)$

where $x_{ACN}$ and $H_{L}$ are the outputs of the explicit feature cross network and of the multilayer perceptron, respectively, and the final click-through rate prediction is obtained through the Sigmoid function.
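For illustration, a minimal sketch of the prediction layer: the explicit output and the implicit output are concatenated and passed through a single logistic unit; w_out and b_out are assumed output parameters, not symbols defined in the patent.

```python
# Prediction probability output layer: p = Sigmoid(w_out^T [x_ACN; H_L] + b_out).
import numpy as np

def predict_ctr(x_acn, h_dnn, w_out, b_out):
    z = np.concatenate([x_acn, h_dnn])               # high-order combined feature
    return 1.0 / (1.0 + np.exp(-(w_out @ z + b_out)))  # Sigmoid

rng = np.random.default_rng(0)
x_acn, h_dnn = rng.normal(size=18), rng.normal(size=64)
w_out, b_out = rng.normal(scale=0.1, size=18 + 64), 0.0
p = predict_ctr(x_acn, h_dnn, w_out, b_out)          # p in (0, 1)
```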
After the predicted click-through rate is obtained, the click-through rate prediction model 100 based on feature representation under the attention mechanism back-propagates the error of the prediction through a Logloss loss function until the predicted click-through rate output by the output layer converges and the parameter update is completed, thereby finishing the training of the model 100 on the training data set of this embodiment.
In this embodiment, the Logloss loss function is:

$L = -\frac{1}{N}\sum_{i=1}^{N}\big[\, y_i \log p_i + (1-y_i)\log(1-p_i) \,\big] + \lambda \lVert \Theta \rVert_2^2$

where $p_i$ is the output of the click-through rate prediction model, $y_i$ is the label of the corresponding sample, $N$ is the number of training samples, and $\lambda$ is the L2 regularization coefficient.
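For illustration, a minimal sketch of the training objective: binary log loss with an L2 penalty on the parameters; the value of lambda and the parameter list are assumed.

```python
# Logloss with L2 regularization, as described above.
import numpy as np

def logloss_l2(p, y, params, lam=1e-4, eps=1e-12):
    p = np.clip(p, eps, 1.0 - eps)                    # numerical safety
    ce = -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    l2 = lam * sum(np.sum(w ** 2) for w in params)    # L2 penalty over model weights
    return ce + l2

p = np.array([0.9, 0.2, 0.6])     # model outputs p_i
y = np.array([1.0, 0.0, 1.0])     # labels y_i
loss = logloss_l2(p, y, params=[np.ones(5)])
```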
FIG. 5 is a flowchart illustrating a training process of a feature-characterization-based click rate estimation model under an attention mechanism according to an embodiment of the present invention.
As shown in fig. 5, the training process of the click rate estimation model based on the feature characterization under the attention mechanism in this embodiment is as follows:
step U1, constructing a data set, dividing data characteristics in the data set into continuous data characteristics and discrete data characteristics, and then entering step U2;
step U2, constructing a feature embedding layer, carrying out vectorization processing on the continuous features and the discrete features, then carrying out stacking embedding processing to form stacking features, and then entering step U3;
step U3, constructing an explicit feature crossing network, performing explicit feature combination by inputting stacking features into the attention crossing network to form an explicit output vector, and then entering step U4;
step U4, constructing an implicit characteristic cross network, inputting the stacking characteristics into a multilayer perceptron to perform implicit characteristic combination to form an implicit output vector, and then entering step U5;
step U5, constructing an estimated probability output layer, combining the explicit output vector and the implicit output vector to form a high-order nonlinear combined feature, simultaneously transmitting the combined feature to a Sigmoid activation function to predict the click rate to obtain an estimated click rate, and then entering step U6;
and step U6, performing error back transmission on the estimated click rate through a Logloss loss function until the estimated click rate output by the output layer is converged, completing parameter updating of the click rate estimation model based on characteristic representation under the attention mechanism, further completing model training, and then entering an ending state.
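For illustration, steps U2 to U6 can be wired together roughly as follows in TensorFlow/Keras (the embodiment states the model is implemented on TensorFlow). The vocabulary sizes, embedding dimension, number of cross layers, attention factor and DNN widths are assumed hyperparameters, and the per-dimension cross-layer bias b_l is folded into a scalar bias for brevity; this is a sketch of the described architecture, not the authors' implementation.

```python
import tensorflow as tf

def build_dacn(vocab_sizes, n_dense, emb_dim=8, n_cross=3, dnn_units=(400, 400), att_factor=16):
    cat_in = tf.keras.Input(shape=(len(vocab_sizes),), dtype="int32")
    den_in = tf.keras.Input(shape=(n_dense,), dtype="float32")

    # Feature embedding layer: one embedding table per discrete field, stacked with dense features.
    embs = [tf.keras.layers.Embedding(v, emb_dim)(cat_in[:, i]) for i, v in enumerate(vocab_sizes)]
    x0 = tf.keras.layers.Concatenate()(embs + [den_in])

    # Explicit part (ACN): cross layers followed by attention weighting.
    xs, xl = [], x0
    for _ in range(n_cross):
        w = tf.keras.layers.Dense(1, use_bias=True)      # x_l^T w_l (scalar); vector bias b_l simplified
        xl = x0 * w(xl) + xl
        xs.append(xl)
    att_hidden = tf.keras.layers.Dense(att_factor, activation="relu")  # W, b of eq. (3)
    att_score = tf.keras.layers.Dense(1, use_bias=False)               # h^T of eq. (3)
    scores = tf.keras.layers.Concatenate()([att_score(att_hidden(x)) for x in xs])
    a = tf.keras.layers.Softmax()(scores)                              # attention weights a_i
    x_acn = tf.keras.layers.Add()([a[:, i:i + 1] * xs[i] for i in range(n_cross)])

    # Implicit part: multilayer perceptron on the same stacking feature.
    h = x0
    for units in dnn_units:
        h = tf.keras.layers.Dense(units, activation="relu")(h)

    # Prediction probability output layer.
    out = tf.keras.layers.Dense(1, activation="sigmoid")(tf.keras.layers.Concatenate()([x_acn, h]))

    model = tf.keras.Model(inputs=[cat_in, den_in], outputs=out)
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",            # Logloss
                  metrics=[tf.keras.metrics.AUC()])
    return model

# Example: three discrete fields and two continuous features.
model = build_dacn(vocab_sizes=[100, 50, 20], n_dense=2)
```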
The programming environment for implementation of the system in this embodiment is Pycharm, and the version of Python is 3.6.
FIG. 6 is a flowchart of the deployment of the click-through rate prediction model in the embodiment of the present invention.
As shown in fig. 6, the specific working steps after completing the training of the click rate estimation model based on the feature characterization under the attention mechanism of this embodiment are as follows:
step T1, acquiring raw data of the real business scenario, i.e., collecting the raw data through event tracking (data instrumentation), back-end log extraction and online information collection, then splicing and aggregating them, and then entering step T2;
step T2, preprocessing the original data collected in the step T1 to form sorted data, and then entering the step T3;
the pretreatment in this embodiment includes: and carrying out abnormal value processing, missing value processing and noise data processing on the collected original data.
Step T3, constructing the sorted data into a training data set, a testing data set and a verification data set of the click rate estimation model, determining the proportion distribution of the training data set, the testing data set and the verification data set according to the data volume and the service, and then entering step T4;
step T4, inputting each data set into a click rate estimation model to obtain an estimated click rate, and then entering step T5;
step T5, mining feature engineering that matches the user's interest preferences with the help of the predicted click-through rate, without any manual feature engineering, and then entering step T6;
step T6, selecting the optimal hyperparameter combination according to the performance on the offline data sets, and then entering step T7;
step T7, evaluating the prediction performance of the click-through rate model with manually chosen evaluation metrics (Logloss and AUC are commonly used in CTR prediction scenarios), and then entering step T8;
and step T8, the algorithm engineer conducts a small-traffic online test of the click-through rate prediction model, verifies its online effect, deploys the model online after the test, and then the process enters the end state.
In this embodiment, the hyperparameter selection methods in step T6 are grid search, random search and Bayesian search.
After the model deployment is completed, a series of experiments are performed with the click-through rate prediction model 100 (Deep & Attention Cross Network, DACN) based on feature representation under the attention mechanism of the invention. The programming environment used to implement the Attention Cross Network (ACN) in the model 100 is PyCharm, with Python 3.6. The experiments run on a Core i7 CPU, 32 GB of memory and a Linux operating system. The data sets are the real click data from Criteo Labs and the movie rating data from MovieLens. Two metrics, AUC and Logloss, evaluate the performance of the model from different perspectives.
The experiments compare the proposed DACN (Deep & Attention Cross Network), a novel feature cross network that combines explicit and implicit feature interactions, with LR (Logistic Regression), DNN, FM (Factorization Machines), Wide & Deep, DCN (Deep & Cross Network) and DeepFM. As noted above, these models are the current mainstream, industry-validated click-through rate prediction models. Since DACN aims to extract feature combinations with the model itself, no manual feature engineering is performed on the raw features, as a control.
DACN is implemented on TensorFlow. Dense features are normalized with a logarithmic transformation; categorical features are embedded into dense vectors of length 6 × (cardinality)^{1/4}; the Adam optimizer with mini-batch stochastic gradient descent is used, the batch size is set to 512, and batch normalization is applied to the DNN part. For the comparison models, the parameter settings of FNN and PNN follow the PNN paper. The DNN module uses a dropout of 0.5, a 400-400 network structure, Adam-based mini-batch gradient descent, and ReLU activations throughout; the embedding dimension of FM is set to 10, and the remaining settings are kept consistent with DACN.
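For readability, the reported settings can be collected into a configuration sketch; entries not stated above (e.g., learning rate, number of epochs) are deliberately omitted rather than assumed.

```python
# Experimental settings as reported in the description above.
dacn_config = {
    "framework": "TensorFlow",
    "dense_normalization": "log transform",
    "embedding_dim_rule": "6 * cardinality ** 0.25",   # per categorical field
    "optimizer": "Adam (mini-batch SGD)",
    "batch_size": 512,
    "batch_norm_on_dnn": True,
}
baseline_config = {
    "fnn_pnn_settings": "as in the PNN paper",
    "dnn_dropout": 0.5,
    "dnn_structure": [400, 400],
    "optimizer": "Adam (mini-batch SGD)",
    "activation": "ReLU",
    "fm_embedding_dim": 10,
    "other_settings": "consistent with DACN",
}
```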
FIG. 7 is a graph showing the comparison result of a single model in the embodiment of the present invention.
The effect of explicitly combined features under the attention mechanism on overall prediction performance was validated first. Among the compared models, FM explicitly models second-order feature interactions, DNN models implicit high-order interactions, the Cross Network models explicit high-order interactions, and ACN models explicit interactions with built-in feature screening. FIG. 7 shows the performance of each single model on the two public data sets.
The experiments show that the proposed ACN is consistently superior to the other compared models. On the one hand, high-order interactions over sparse features are necessary for real data sets, as evidenced by the clear superiority of DNN, Cross Network and ACN over FM on both data sets; on the other hand, ACN is the best single model, verifying its effectiveness in modeling explicit high-order feature interactions.
FIG. 8 is a diagram illustrating the comparison results of the integrated models in the embodiment of the present invention.
DACN integrates ACN and DNN into a parallel network architecture: ACN performs explicit combined-feature extraction and screening, DNN performs implicit combined-feature extraction, and the two branches in parallel characterize the features as fully as possible. DACN is compared with the current mainstream click-through rate prediction models on the two public data sets; the experimental results are shown in FIG. 8.
It is readily apparent from FIG. 8 that LR performs worse than all the other models, indicating that factorization-based models are critical for modeling sparse categorical interaction features; Wide & Deep, DCN and DeepFM are clearly superior to DNN, showing that the implicit feature-extraction ability of DNN alone is relatively limited and that its insufficient feature-combination ability is usually compensated by manual feature engineering. Second, DACN improves significantly over DCN. The advantage of DACN has been argued from a theoretical perspective: the added Attention network screens the combined features of each specified order, raises the weight of important combinations, and eliminates the influence of redundant features. The experimental results confirm that this structure effectively realizes feature screening and markedly improves the performance of the overall model.
The proposed DACN achieves the best performance on both public data sets, which shows that combining explicit and implicit high-order features characterizes the original features more fully. The results also verify that using ACN for explicit feature combination of a specified order greatly improves the final model performance, which in turn supports the soundness of the proposed DACN.
FIG. 9 is a diagram illustrating the comparison results of the numbers of network parameters in the embodiment of the present invention.
Considering the additional parameters introduced by the Attention network, ACN, CrossNet and DNN were compared on the Criteo data set in terms of the minimum number of parameters each model needs to reach the optimal log-loss threshold. Because the embedding matrices of all models have the same number of parameters, the embedding-layer parameters are omitted from the count. The experimental results are shown in FIG. 9.
The results show that the parameter (storage) efficiency of the proposed ACN and of the Cross Network is nearly an order of magnitude better than that of DNN, mainly because their shared feature-crossing structure completes feature interactions of a specified order with linear space complexity. Moreover, the parameter counts of ACN and the Cross Network are of the same order of magnitude: the Attention network introduced by ACN contains only one hidden layer, so its extra parameters are almost negligible while the click-through rate prediction accuracy improves markedly. This further demonstrates the advantage of the proposed ACN structure in space complexity.
Action and Effect of the embodiment
According to the click-through rate prediction model based on feature representation under the attention mechanism provided by this embodiment, the feature embedding layer vectorizes the continuous and discrete features, which solves the problem that the vector dimension becomes too large after one-hot encoding. The model also has an explicit feature cross network, which dynamically weights the combination terms through the attention cross network, uses the combined features more efficiently, and eliminates the influence of redundant features on the click-through rate prediction model. It further comprises an implicit feature cross network, which applies a multilayer perceptron to capture highly nonlinear interaction features, relieving the restriction that the model's expressive power is limited by the parameter scale. Finally, the prediction probability output layer estimates the click-through rate from the outputs of the explicit and implicit feature cross networks through a Sigmoid activation function, making the prediction more accurate. The predictions can further serve as the fine-ranking stage of enterprise-level recommendation systems, search systems, online advertising systems, and the like.
In the embodiment, the discrete features are one-hot encoded in the feature embedding layer and mapped to embedding vectors, while the continuous features are standardized according to their data distribution to form dense features; these two kinds of low-dimensional dense vectors retain the original semantic information more effectively.
In the embodiment, the crossing algorithm used in the cross layers makes explicit feature combination more efficient.
In the embodiment, the attention layer lets different parts contribute differently when they are compressed together, so the model learns the weights of the combined features and automatic feature extraction is realized.
In the embodiment, the explicit feature cross network realizes dynamic weighting of the combination terms through the attention cross network mechanism, uses the combined features more efficiently, and eliminates the influence of redundant features on the click-through rate prediction model.
In the embodiment, the explicit feature cross network and the implicit feature cross network are connected in parallel, which further strengthens the feature representation capability of the model and improves the precision of click-through rate prediction.
In the embodiment, experiments compare the proposed model with the current mainstream, industry-validated click-through rate prediction models in terms of single-model performance, integrated-model performance and network parameter count, confirming the soundness of the invention and the clear space-complexity advantage of the proposed ACN structure.
The above-described embodiments are merely illustrative of specific embodiments of the present invention, and the present invention is not limited to the description of the above-described embodiments.
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010629307.3A CN113887694B (en) | 2020-07-01 | 2020-07-01 | A click-through rate prediction model based on feature representation under attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010629307.3A CN113887694B (en) | 2020-07-01 | 2020-07-01 | A click-through rate prediction model based on feature representation under attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113887694A true CN113887694A (en) | 2022-01-04 |
CN113887694B CN113887694B (en) | 2025-03-04 |
Family
ID=79012984
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010629307.3A Active CN113887694B (en) | 2020-07-01 | 2020-07-01 | A click-through rate prediction model based on feature representation under attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113887694B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114529309A (en) * | 2022-02-09 | 2022-05-24 | 北京沃东天骏信息技术有限公司 | Information auditing method and device, electronic equipment and computer readable medium |
CN115114529A (en) * | 2022-07-01 | 2022-09-27 | 安徽理工大学 | Click rate depth cross estimation model, method, equipment and storage medium |
CN115271272A (en) * | 2022-09-29 | 2022-11-01 | 华东交通大学 | Click-through rate prediction method and system for multi-order feature optimization and hybrid knowledge distillation |
CN115295153A (en) * | 2022-09-30 | 2022-11-04 | 北京智精灵科技有限公司 | Cognitive assessment method and cognitive task pushing method based on deep learning |
CN116611497A (en) * | 2023-07-20 | 2023-08-18 | 深圳须弥云图空间科技有限公司 | Click rate estimation model training method and device |
CN118070927A (en) * | 2023-12-15 | 2024-05-24 | 湖南大学 | A method and system for building a high-performance predictive model with interpretability |
WO2025077537A1 (en) * | 2023-10-09 | 2025-04-17 | 马上消费金融股份有限公司 | Click-through rate estimation method and apparatus, electronic device, storage medium, and program product |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018212711A1 (en) * | 2017-05-19 | 2018-11-22 | National University Of Singapore | Predictive analysis methods and systems |
CN109960759A (en) * | 2019-03-22 | 2019-07-02 | 中山大学 | Prediction method of click-through rate of recommendation system based on deep neural network |
CN111062775A (en) * | 2019-12-03 | 2020-04-24 | 中山大学 | A Recall Method for Recommendation System Based on Attention Mechanism |
CN111325579A (en) * | 2020-02-25 | 2020-06-23 | 华南师范大学 | Advertisement click rate prediction method |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018212711A1 (en) * | 2017-05-19 | 2018-11-22 | National University Of Singapore | Predictive analysis methods and systems |
CN109960759A (en) * | 2019-03-22 | 2019-07-02 | 中山大学 | Prediction method of click-through rate of recommendation system based on deep neural network |
CN111062775A (en) * | 2019-12-03 | 2020-04-24 | 中山大学 | A Recall Method for Recommendation System Based on Attention Mechanism |
CN111325579A (en) * | 2020-02-25 | 2020-06-23 | 华南师范大学 | Advertisement click rate prediction method |
Non-Patent Citations (4)
Title |
---|
JUN XIAO et al.: "Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks", Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 19 August 2017 (2017-08-19), pages 3119 - 3125 * |
QIANQIAN WANG et al.: "A Hierarchical Attention Model for CTR Prediction Based on User Interest", IEEE Systems Journal, vol. 14, no. 3, 24 October 2019 (2019-10-24), pages 4015 - 4024 * |
WANG RUOXI et al.: "Deep & Cross Network for Ad Click Predictions", arXiv, 17 August 2017 (2017-08-17), pages 1 - 7 * |
WEN YAOYAO: "Research on Click-Through Rate Prediction Methods Based on Deep Learning under the Attention Mechanism", China Master's Theses Full-text Database, Information Science and Technology, no. 1, 31 January 2020 (2020-01-31) * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114529309A (en) * | 2022-02-09 | 2022-05-24 | 北京沃东天骏信息技术有限公司 | Information auditing method and device, electronic equipment and computer readable medium |
CN115114529A (en) * | 2022-07-01 | 2022-09-27 | 安徽理工大学 | Click rate depth cross estimation model, method, equipment and storage medium |
CN115271272A (en) * | 2022-09-29 | 2022-11-01 | 华东交通大学 | Click-through rate prediction method and system for multi-order feature optimization and hybrid knowledge distillation |
CN115271272B (en) * | 2022-09-29 | 2022-12-27 | 华东交通大学 | Click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation |
CN115295153A (en) * | 2022-09-30 | 2022-11-04 | 北京智精灵科技有限公司 | Cognitive assessment method and cognitive task pushing method based on deep learning |
CN116611497A (en) * | 2023-07-20 | 2023-08-18 | 深圳须弥云图空间科技有限公司 | Click rate estimation model training method and device |
CN116611497B (en) * | 2023-07-20 | 2023-10-03 | 深圳须弥云图空间科技有限公司 | Click rate estimation model training method and device |
WO2025077537A1 (en) * | 2023-10-09 | 2025-04-17 | 马上消费金融股份有限公司 | Click-through rate estimation method and apparatus, electronic device, storage medium, and program product |
CN118070927A (en) * | 2023-12-15 | 2024-05-24 | 湖南大学 | A method and system for building a high-performance predictive model with interpretability |
Also Published As
Publication number | Publication date |
---|---|
CN113887694B (en) | 2025-03-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113887694B (en) | A click-through rate prediction model based on feature representation under attention mechanism | |
Min et al. | Multiproblem surrogates: Transfer evolutionary multiobjective optimization of computationally expensive problems | |
CN111583031A (en) | Application scoring card model building method based on ensemble learning | |
Nguyen et al. | A new workload prediction model using extreme learning machine and enhanced tug of war optimization | |
CN118734254B (en) | Safety education training monitoring and evaluating method and system based on operation mechanism optimization | |
CN114049527B (en) | Self-knowledge distillation method and system based on online collaboration and fusion | |
CN118899844B (en) | Power distribution network load transfer control method and system based on neural network decision distillation | |
Wang et al. | Neural-architecture-search-based multiobjective cognitive automation system | |
CN115661546A (en) | A multi-objective optimization classification method based on joint design of feature selection and classifier | |
Jiang et al. | An intelligent recommendation approach for online advertising based on hybrid deep neural network and parallel computing | |
CN104732067A (en) | Industrial process modeling forecasting method oriented at flow object | |
CN112200208B (en) | Cloud workflow task execution time prediction method based on multi-dimensional feature fusion | |
CN114896519B (en) | An early rumor detection method and device based on stance features | |
Jing | Neural network-based pattern recognition in the framework of edge computing | |
CN118761691B (en) | Smart energy industry Internet data management method, system, medium and server | |
CN112464541B (en) | Multi-scale uncertainty considered mixed composite material layering method | |
CN118861308B (en) | A method and device for fault diagnosis of pump equipment based on knowledge graph | |
CN119003769A (en) | Netizen view analysis method based on double large models | |
CN119127986A (en) | A method for designing biochemical experiments based on artificial intelligence and a human-computer interaction system | |
CN117236374A (en) | Layering interpretation method based on fully developed material graph neural network | |
CN117934112A (en) | Sequence recommendation method and system based on recurrent self-attention network | |
CN113746813B (en) | Network attack detection system and method based on two-stage learning model | |
Zhao et al. | A novel mixed sampling algorithm for imbalanced data based on XGBoost | |
Chen | Analysis of Human Resource Intelligent Recommendation Method Based on Improved Decision Tree Algorithm | |
CN110942149B (en) | A Feature Variable Selection Method Based on Information Change Rate and Conditional Mutual Information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |