CN113887694A - Click rate estimation model based on characteristic representation under attention mechanism - Google Patents
Click rate estimation model based on characteristic representation under attention mechanism
- Publication number
- CN113887694A CN113887694A CN202010629307.3A CN202010629307A CN113887694A CN 113887694 A CN113887694 A CN 113887694A CN 202010629307 A CN202010629307 A CN 202010629307A CN 113887694 A CN113887694 A CN 113887694A
- Authority
- CN
- China
- Prior art keywords
- characteristic
- feature
- attention
- features
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
To complete click-through rate estimation from the object features of an object to be estimated, the method can serve as the data fine-ranking stage of enterprise-level recommendation systems, search systems, online advertising systems and the like. The invention provides a click-through rate estimation model based on feature characterization under an attention mechanism, comprising: a feature embedding layer that vectorizes the continuous and discrete features to form stacked features; an explicit feature cross network that performs explicit feature combination on the stacked features through an attention cross network; an implicit feature cross network that performs implicit feature combination on the stacked features through a multilayer perceptron; and an estimated-probability output layer that estimates the click-through rate from the received combined features. The attention cross network removes the dependence of the estimation model on manual feature engineering, while the introduction of the attention mechanism distinguishes the importance of each combined feature to the model's estimate and eliminates the influence of useless and redundant features on the model.
Description
Technical Field
The invention belongs to the technical field of data mining, and particularly relates to an end-to-end click-through rate prediction technology.
Background
Click-through rate (CTR) estimation, as a key technology that directly affects user experience on a platform and advertising revenue, is one of the core research topics in industry. At present, research at home and abroad focuses mainly on feature representation, and existing methods fall into two main categories: machine-learning CTR models and deep-learning CTR models.
In the early days, limited by computing power, online learning and model deployment, industry mainly built lightweight machine-learning models, the most classic being the Logistic Regression (LR) model. LR quickly became the mainstream CTR estimation model in industry thanks to its sound mathematical meaning, strong interpretability and ease of engineering deployment. In 2010, Brendan McMahan et al. proposed the online learning algorithm FTRL (Follow The Regularized Leader) for LR, which further promoted its industrial application; however, LR is linear in nature, has limited learning capability, and its prediction performance usually depends on the feature-engineering skill of data scientists. Industry therefore began to explore building second-order combined features with degree-2 polynomial regression models, performing explicit feature combination through pairwise feature crossing. This brute-force crossing alleviates the feature-combination problem to some extent, but it can only learn co-occurring features that appear in the training data and generalizes poorly to non-co-occurring feature combinations in large-scale sparse scenarios such as recommendation and advertising. To overcome this shortcoming, Steffen Rendle of the University of Konstanz in Germany proposed FM (Factorization Machines) in 2010, which learns a latent weight vector (latent vector) for each feature and uses the inner product of latent vectors as the feature-crossing weight, solving the feature-combination problem well in sparse-feature scenarios; in addition, FM further reduces training complexity by transforming the form of the objective function, so FM gradually became an important choice for industrial CTR models around 2012 to 2014. In 2015, FFM (Field-aware Factorization Machines), proposed on the basis of FM, shone in several CTR prediction competitions and was subsequently applied in recommendation and advertising scenarios by companies such as Criteo and Meituan. Compared with FM, FFM mainly introduces the concept of a "field": when crossing features, each feature selects the latent vector corresponding to the field of the combined feature for the inner-product operation to obtain the cross-feature weight, giving the model stronger expressive power; however, FFM is limited by its high space complexity and the restriction to second-order feature crossing, so it has not been widely used in industry. In addition, in 2014 Xinran He et al. proposed a solution based on the combined GBDT (Gradient Boosting Decision Tree) + LR model for high-dimensional feature combination and screening: GBDT automatically performs feature screening and combination, the leaf nodes are one-hot encoded, and the encoded features are fed into an LR model to complete CTR prediction. This opened the precedent of using models for high-order feature construction and screening, solved the previously laborious problem of feature combination and screening more efficiently, and greatly advanced the important trend of modeling feature engineering.
During this period, researchers found that high-order combined features more easily uncover personalized demands and achieve the "a thousand faces for a thousand users" recommendation effect. However, as sparse features increased sharply, the business logic behind high-order combined features became hard to understand, and traditional manual feature engineering could hardly keep up with mining high-order feature combinations; people therefore began to rely on the strength of models in extracting features to complete personalized recommendation for users in big-data scenarios. With the great success of deep learning in computer vision and natural language processing, attempts were made to use neural networks to perform feature characterization automatically, replacing manual feature engineering to complete click-through rate estimation.
In 2016, deep learning began to be applied to click-through rate prediction on a large scale. Microsoft's Ying Shan et al. proposed the Deep Crossing serial network structure, which covers the classic elements of a CTR-prediction neural network: an embedding layer converts sparse features into low-dimensional dense features, a stacking layer concatenates the segmented feature vectors, multiple neural network layers complete the combination and transformation of features, and a Sigmoid activation function finally produces the CTR prediction; a residual network structure was also formally introduced into the click-through rate model to enhance its high-order feature extraction capability. In the same year, Weinan Zhang et al. of Shanghai Jiao Tong University proposed FNN, which, building on the earlier Deep Crossing structure, uses the latent vectors of FM as the embeddings of users and items, avoiding training an embedding matrix entirely from a random state and greatly reducing the training time and the instability of the embedding layer. Using pre-training to complete the training of the embedding layer is undoubtedly valuable engineering experience for reducing the complexity and training instability of deep-learning models. However, conventional DNNs directly use multiple fully connected layers to complete feature cross combination and lack "pertinence" of feature combination for the click-through rate scenario, so Yanru Qu et al. proposed PNN (Product-based Neural Network), adding a product layer between the embedding layer and the fully connected layers to perform feature combination between different feature fields and enhance the model's ability to represent different data patterns.
In 2016, Google's Heng-Tze Cheng et al. proposed the Wide & Deep parallel network structure, in which a Wide part consisting of a single input layer and a Deep part passing through a multilayer perceptron are concatenated and passed to the output layer. The Wide part provides memorization and the Deep part provides generalization: the DNN mines implicit high-order feature combinations, and LR connects the Wide and Deep parts into a unified CTR model. Wide & Deep established the parallel deep-learning framework for click-through rate estimation, but it did not escape the reliance on manual feature engineering. Addressing the insufficient capacity of the Wide part, Huifeng Guo et al. proposed DeepFM in 2017, which keeps the Wide-and-Deep parallel structure but replaces the original Wide part with FM, strengthening the feature-combination capability of the shallow network while removing the dependence of deep CTR models on manual feature engineering. In the same year, Ruoxi Wang et al. proposed the Deep & Cross Network (DCN), replacing the original Wide part with a Cross Network to realize bit-level explicit feature interaction and further refine the crossing granularity of the Wide part. Xiangnan He et al. proposed NFM (Neural Factorization Machines) to improve the Deep part, introducing a Bi-Interaction Pooling layer in place of FM for feature crossing and further strengthening the deep feature-combination capability. In 2018, Alibaba's Guorui Zhou et al. proposed DIN (Deep Interest Network), a deep-learning network based on the attention mechanism that extracts real-time interest features from the user behavior sequence, further improving feature characterization on the user side. In the same year, Jianxun Lian et al. proposed the xDeepFM parallel network structure, modeling vector-level explicit feature interaction and adopting a CIN (Compressed Interaction Network) in the Wide part to enhance the explicit feature-combination capability of the model, achieving a certain effect. Subsequently, in 2019 Guorui Zhou et al. proposed DIEN (Deep Interest Evolution Network), introducing the sequence model AUGRU on the basis of DIN, linking user interests at different times into an interest-evolution chain, and finally feeding the "interest vector" of the current moment together with other features into the upper multilayer perceptron to complete click-through rate estimation, obtaining better results.
In summary, current deep-learning models still cannot fully replace manual feature engineering with neural networks for feature extraction, feature combination and feature screening. Moreover, even when a neural network is used in place of manual feature engineering, the variation trend of features cannot be accurately estimated, and accurate click-through rate estimation cannot be obtained automatically.
Disclosure of Invention
In order to solve the problems, a click rate estimation model based on feature characterization under an attention mechanism is provided. The invention adopts the following technical scheme:
the click rate estimation model based on the characteristic representation under the attention mechanism comprises the following steps: the system comprises a characteristic embedding layer, an explicit characteristic cross network, an implicit characteristic cross network and a pre-estimation probability output layer. The characteristic embedding layer is used for carrying out vectorization processing on the continuous characteristic and the discrete characteristic and then carrying out stacking embedding processing to form a stacking characteristic; an explicit feature crossover network that forms an explicit output vector by inputting the stacking features into the attention crossover network for explicit feature combining; the implicit characteristic cross network is used for inputting the stacking characteristics into a multilayer perceptron to carry out implicit characteristic combination to form an implicit output vector; a probability output layer is pre-estimated, the explicit output vector and the implicit output vector are combined to form a high-order nonlinear combined characteristic, and the combined characteristic is transmitted to a Sigmoid activation function to predict the click rate, so that the click rate is obtained; wherein the attention crossing network comprises: the cross layer processes the stacking features through a cross algorithm and generates a multi-dimensional vector; and an attention layer for processing the multidimensional vector through a fully connected neural network to generate an attention score, performing normalization processing on the attention score to generate a characteristic coefficient, and further generating an explicit output vector through an output calculation formula based on the characteristic coefficient.
The click-through rate estimation model based on feature characterization under the attention mechanism provided by the invention has the technical feature that the vectorization processing is as follows: one-hot encoding is performed on the discrete features, and the encoded discrete features are taken as embedding vectors; data standardization according to the data distribution characteristics is performed on the continuous features to form dense features; the embedding vectors and the dense features are stacked and embedded to serve as the stacked features, wherein the matrix calculation formula of the one-hot encoding conversion is:

x_embed,i = W_embed,i · x_i  #(1)

where x_embed,i is the embedding vector, x_i is the binary (one-hot) input of the i-th category field, W_embed,i ∈ R^(n_e × n_v) is an embedding matrix optimized together with the other parameters of the network, and n_e and n_v are the embedding-vector dimension and the input dimension, respectively.
The click-through rate estimation model based on feature characterization under the attention mechanism provided by the invention also has the technical feature that the calculation formula of the cross algorithm is:

x_{l+1} = x_0 · x_l^T · w_l + b_l + x_l = f(x_l, w_l, b_l) + x_l  #(2)

where x_l and x_{l+1} are column vectors representing the outputs of the l-th and (l+1)-th cross layers, respectively; w_l and b_l are the weight and bias of the l-th layer, and the function f denotes the feature-vector crossing mapping of each layer.
The click-through rate estimation model based on feature characterization under the attention mechanism provided by the invention also has the technical feature that the calculation logic of the normalization processing in the attention layer is:

a'_i = h^T ReLU(W x_i + b)  #(3)

where W, b and h are model parameters, and the attention scores a'_i are normalized by Softmax, i.e. a_i = exp(a'_i) / Σ_j exp(a'_j).
The click-through rate estimation model based on feature characterization under the attention mechanism provided by the invention also has the technical feature that the output calculation formula of the attention cross network is:

x_ACN = Σ_i a_i x_i

where a_i is the attention weight and x_i is the output of the i-th cross layer.
The click-through rate estimation model based on feature characterization under the attention mechanism provided by the invention also has the technical feature that the calculation logic of each layer of the multilayer perceptron is:

H_{l+1} = f(W_l H_l + b_l)  #(7)

where H_{l+1} denotes the (l+1)-th hidden layer, W_l and b_l are the weight and bias of the l-th layer, and f(·) is the ReLU function.
The click-through rate estimation model based on feature characterization under the attention mechanism provided by the invention also has the technical feature that the formula of the Sigmoid activation function is:

ŷ = σ(w_out^T [x_ACN; H_MLP] + b_out)

where x_ACN and H_MLP are the outputs of the explicit feature cross network and of the multilayer perceptron, respectively, and the final click-through rate prediction ŷ is obtained through the Sigmoid function σ.
The click-through rate estimation model based on feature characterization under the attention mechanism provided by the invention also has the technical feature that the model performs error back-propagation on the estimated click-through rate through a Logloss loss function until the click-through rate output by the output layer converges, thereby completing the parameter update of the click-through rate estimation model based on feature characterization under the attention mechanism.
The click-through rate estimation model based on feature characterization under the attention mechanism provided by the invention also has the technical feature that the formula of the Logloss loss function is:

L = -(1/N) Σ_{i=1}^{N} [ y_i log(p_i) + (1 - y_i) log(1 - p_i) ] + λ Σ_l ‖W_l‖²

where p_i is the output of the click-through rate estimation model, y_i is the label of the corresponding sample, N is the number of training samples, and λ is the L2 regularization coefficient; error back-propagation is performed through the Logloss loss function and the parameters are updated accordingly until convergence, completing the training of the final click-through rate model.
Action and Effect of the invention
According to the click-through rate estimation model based on feature characterization under the attention mechanism, the feature embedding layer vectorizes the continuous and discrete features and thereby solves the problem of excessively large vector dimensions after one-hot encoding. Meanwhile, the model has an explicit feature cross network, which realizes dynamic weighting of the combination terms through the attention cross network, uses combined features more efficiently, and eliminates the influence of redundant features on the click-through rate estimation model. It further comprises an implicit feature cross network, which captures highly nonlinear interaction features with a multilayer perceptron, so that the feature-expression capability of the model is no longer limited by the parameter scale. Finally, the invention provides an estimated-probability output layer, which outputs the click-through rate estimate from the outputs of the explicit and implicit feature cross networks through a Sigmoid activation function, so that the obtained estimate is more accurate. Further, the estimated data can serve as the data fine-ranking stage of enterprise-level recommendation systems, search systems, online advertising systems and the like.
Drawings
FIG. 1 is a block diagram of a click through rate prediction model based on feature characterization under an attention mechanism in an embodiment of the present invention;
FIG. 2 is a flow chart of the operation of a feature embedding layer in an embodiment of the present invention;
FIG. 3 is a network architecture diagram of an attention crossing network in an embodiment of the present invention;
FIG. 4 is a network architecture diagram of a multi-tier perceptron in an embodiment of the present invention;
FIG. 5 is a flowchart illustrating a training process of a feature characterization-based click rate estimation model under an attention mechanism according to an embodiment of the present invention; and
FIG. 6 is a flowchart of the deployment of the click-through rate prediction model in the embodiment of the present invention.
Detailed Description
In order to make the technical means, the creation characteristics, the achievement purposes and the effects of the present invention easy to understand, the click rate estimation model based on the characteristic characterization under the attention mechanism of the present invention is specifically described below with reference to the embodiments and the drawings.
< example >
FIG. 1 is a block diagram of a click through rate prediction model based on feature characterization under the attention mechanism in the embodiment of the present invention.
As shown in FIG. 1, the feature characterization-based click through rate prediction model 100 under the attention mechanism includes: a feature embedding layer 101, an explicit feature crossing network 102, an implicit feature crossing network 103, and a predictive probability output layer 104.
In this embodiment, the user-side features, advertisement-side features and context features are collected and divided into continuous features and discrete features; these serve as the input features and are organized into a training data set, through which the model training of the click-through rate estimation model 100 based on feature characterization under the attention mechanism is completed. The training data set consists of n samples (x, y), where the input features x are the continuous and discrete features. The goal is to construct a click-through rate estimation model y = model(x), x ∈ R^n, which predicts the probability y ∈ [0, 1] that a user u clicks a candidate item v in a specified context.
The feature embedding layer 101 performs vectorization processing on the received input features, and stacks the obtained embedded vectors and dense features to form stacked features and output the stacked features.
FIG. 2 is a flow chart of the operation of the feature embedding layer in an embodiment of the present invention.
As shown in fig. 2, the steps of forming the stacked features by the feature embedding layer 101 are as follows:
step S1, carrying out one-hot code conversion on the discrete type features, taking the encoded discrete type features as embedded vectors, and then entering step S2;
step S2, carrying out data standardization based on data distribution characteristics on the continuous features to form dense features, and then entering step S3;
in step S3, the embedding vector formed in step S1 and the dense feature formed in step S2 are subjected to a stack embedding process, i.e., the embedding vector and the dense feature are stacked into one vector, and the vector is referred to as a stacked feature, and then an end state is entered.
In this embodiment, the matrix calculation formula of the embedding conversion in the stack embedding process is:

x_embed,i = W_embed,i · x_i  #(1)

where x_embed,i is the embedding vector, x_i is the binary (one-hot) input of the i-th category field, W_embed,i ∈ R^(n_e × n_v) is an embedding matrix optimized together with the other parameters of the network, and n_e and n_v are the embedding-vector dimension and the input dimension, respectively.
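For illustration only, a minimal NumPy sketch of the embedding step described by formula (1) follows; the vocabulary size, embedding dimension and random initialization are assumptions made for the example and are not part of the claimed model:

```python
import numpy as np

def embed_discrete(index, vocab_size, W_embed):
    """One-hot encode a discrete feature and map it to a dense embedding, as in formula (1)."""
    x = np.zeros(vocab_size)
    x[index] = 1.0                       # one-hot vector x_i
    return W_embed @ x                   # x_embed,i = W_embed,i · x_i

def standardize_continuous(values, mean, std):
    """Z-score standardization of continuous features to form dense features."""
    return (values - mean) / (std + 1e-8)

rng = np.random.default_rng(0)
vocab_size, embed_dim = 1000, 8
W_embed = rng.normal(scale=0.01, size=(embed_dim, vocab_size))   # embedding matrix, learned jointly

e = embed_discrete(index=42, vocab_size=vocab_size, W_embed=W_embed)
d = standardize_continuous(np.array([3.5, 120.0]), mean=np.array([2.0, 100.0]), std=np.array([1.0, 25.0]))
x0 = np.concatenate([e, d])              # stacked feature fed to both cross networks
```

In practice the one-hot multiplication reduces to a column lookup in the embedding matrix, which is how embedding layers are usually implemented.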
The explicit feature crossing network 102 receives the stacked features formed by the feature embedding layer 101, and performs a crossing algorithm on the stacked features through the attention crossing network to generate a multi-dimensional vector;
fig. 3 is a network configuration diagram of an attention crossover network in an embodiment of the present invention.
As shown in diagram a of fig. 3, the attention-crossing network includes: a cross-layer 21 and an attention layer 22.
As shown in the B diagram of FIG. 3, the number of neurons in each cross layer 21 is the same and equal to the dimension of the input vector x_0.
In this embodiment, the calculation formula of the cross algorithm is:

x_{l+1} = x_0 · x_l^T · w_l + b_l + x_l = f(x_l, w_l, b_l) + x_l  #(2)

where x_l and x_{l+1} are column vectors representing the outputs of the l-th and (l+1)-th cross layers, respectively; w_l and b_l are the weight and bias of the l-th layer, and the function f denotes the feature-vector crossing mapping of each layer.
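A minimal NumPy sketch of one such cross layer is given below, assuming the bit-level crossing form of the Deep & Cross Network cited in the background; dimensions and initialization are illustrative only:

```python
import numpy as np

def cross_layer(x0, xl, w, b):
    """x_{l+1} = x_0 · (x_l^T w_l) + b_l + x_l  -- one explicit bit-level crossing step."""
    return x0 * (xl @ w) + b + xl        # xl @ w is a scalar, broadcast over x0

rng = np.random.default_rng(1)
d = 10
x0 = rng.normal(size=d)                  # stacked feature from the embedding layer
x = x0
for _ in range(3):                       # three stacked cross layers, raising the crossing order
    w, b = rng.normal(size=d), np.zeros(d)
    x = cross_layer(x0, x, w, b)
```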
The attention layer 22 processes the multidimensional vector through the fully-connected neural network and generates an attention score, and normalizes the attention score to generate a feature coefficient, and generates an explicit output vector by an output calculation formula based on the feature coefficient.
In this embodiment, an Attention network is used as the fully-connected neural network, ReLU is used as the activation function, and the size of the network is expressed by the attention factor. The calculation logic of the normalization processing in the attention layer 22 is:

a'_i = h^T ReLU(W x_i + b)  #(3)

where W, b and h are model parameters, and the attention scores a'_i are normalized by Softmax, i.e. a_i = exp(a'_i) / Σ_j exp(a'_j). Taking this result as the feature coefficient, the output calculation formula of the attention cross network is:

x_ACN = Σ_i a_i x_i

where a_i is the attention weight and x_i is the output of the i-th cross layer.
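The attention weighting of the cross-layer outputs can be sketched as follows; the attention factor of 16 and the stacking of three cross outputs are assumptions made for the example:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_cross_output(cross_outputs, W, b, h):
    """cross_outputs: (num_vectors, d) stack of cross-layer outputs x_i; returns the explicit output vector."""
    scores = np.array([h @ relu(W @ x + b) for x in cross_outputs])   # a'_i = h^T ReLU(W x_i + b)
    a = softmax(scores)                                               # attention weights a_i
    return (a[:, None] * cross_outputs).sum(axis=0)                   # x_ACN = sum_i a_i x_i

rng = np.random.default_rng(2)
d, attn_factor, num_vec = 10, 16, 3
X = rng.normal(size=(num_vec, d))                                     # outputs of the stacked cross layers
W, b, h = rng.normal(size=(attn_factor, d)), np.zeros(attn_factor), rng.normal(size=attn_factor)
x_acn = attention_cross_output(X, W, b, h)
```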
The implicit feature crossing network 103 forms an implicit output vector by inputting the stacked features into a multi-layer perceptron for implicit feature combination.
Fig. 4 is a network structure diagram of a multi-layer perceptron in an embodiment of the invention.
As shown in FIG. 4, in the present embodiment the multilayer perceptron is a fully connected feed-forward neural network, and the calculation logic of each layer is:

H_{l+1} = f(W_l H_l + b_l)  #(7)

where H_{l+1} denotes the (l+1)-th hidden layer, W_l and b_l are the weight and bias of the l-th layer, and f(·) is the ReLU function.
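A matching sketch of the implicit cross network follows; the 400-400 layer sizes mirror the experiment settings reported later, and the input width is an assumption:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def mlp_forward(x0, weights, biases):
    """Fully connected feed-forward pass: H_{l+1} = f(W_l H_l + b_l) with f = ReLU."""
    h = x0
    for W, b in zip(weights, biases):
        h = relu(W @ h + b)
    return h

rng = np.random.default_rng(3)
dims = [10, 400, 400]                    # input width and two hidden layers
weights = [rng.normal(scale=0.05, size=(dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
biases = [np.zeros(dims[i + 1]) for i in range(len(dims) - 1)]
h_dnn = mlp_forward(rng.normal(size=dims[0]), weights, biases)        # implicit output vector
```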
The pre-estimated probability output layer 104 combines the explicit output vector and the implicit output vector to form a high-order nonlinear combined feature, and simultaneously transmits the combined feature to a Sigmoid activation function to predict the click rate, so as to obtain the pre-estimated click rate.
In this embodiment, the formula of the Sigmoid activation function is:

ŷ = σ(w_out^T [x_ACN; H_MLP] + b_out)

where x_ACN and H_MLP are the outputs of the explicit feature cross network and of the multilayer perceptron, respectively, and the final click-through rate prediction ŷ is obtained through the Sigmoid function σ;
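The output layer can then be sketched as a logistic unit over the concatenated explicit and implicit vectors; w_out and b_out are assumed trainable parameters introduced for the example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_ctr(x_acn, h_dnn, w_out, b_out):
    combined = np.concatenate([x_acn, h_dnn])     # high-order nonlinear combined feature
    return sigmoid(w_out @ combined + b_out)      # estimated click-through rate in (0, 1)

rng = np.random.default_rng(4)
x_acn, h_dnn = rng.normal(size=10), rng.normal(size=400)
w_out, b_out = rng.normal(scale=0.01, size=410), 0.0
p = predict_ctr(x_acn, h_dnn, w_out, b_out)
```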
after the click rate predicted value is obtained, the click rate estimation model 100 based on the characteristic representation under the attention mechanism performs error back transmission on the estimated click rate through a Logloss loss function until the estimated click rate output by the output layer is converged, and completes parameter updating of the click rate estimation model based on the characteristic representation under the attention mechanism, thereby completing model training of the click rate estimation model 100 based on the training data set in the embodiment under the attention mechanism.
In this embodiment, the formula of the Logloss loss function is:

L = -(1/N) Σ_{i=1}^{N} [ y_i log(p_i) + (1 - y_i) log(1 - p_i) ] + λ Σ_l ‖W_l‖²

where p_i is the output of the click-through rate estimation model, y_i is the label of the corresponding sample, N is the number of training samples, and λ is the L2 regularization coefficient.
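A small sketch of this objective, assuming the L2 term sums the squared layer weights, is:

```python
import numpy as np

def logloss_l2(p, y, layer_weights, lam=1e-5, eps=1e-12):
    """Binary cross-entropy over N samples plus an L2 penalty on the layer weights."""
    p = np.clip(p, eps, 1.0 - eps)
    ce = -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    l2 = lam * sum(np.sum(W ** 2) for W in layer_weights)
    return ce + l2

p = np.array([0.9, 0.2, 0.65])                    # model outputs p_i
y = np.array([1, 0, 1])                           # labels y_i
loss = logloss_l2(p, y, layer_weights=[np.array([0.1, -0.2])])
```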
FIG. 5 is a flowchart illustrating a training process of a feature-characterization-based click rate estimation model under an attention mechanism according to an embodiment of the present invention.
As shown in fig. 5, the training process of the click rate estimation model based on the feature characterization under the attention mechanism in this embodiment is as follows:
step U1, constructing a data set, dividing data characteristics in the data set into continuous data characteristics and discrete data characteristics, and then entering step U2;
step U2, constructing a feature embedding layer, carrying out vectorization processing on the continuous features and the discrete features, then carrying out stacking embedding processing to form stacking features, and then entering step U3;
step U3, constructing an explicit feature crossing network, performing explicit feature combination by inputting stacking features into the attention crossing network to form an explicit output vector, and then entering step U4;
step U4, constructing an implicit characteristic cross network, inputting the stacking characteristics into a multilayer perceptron to perform implicit characteristic combination to form an implicit output vector, and then entering step U5;
step U5, constructing an estimated probability output layer, combining the explicit output vector and the implicit output vector to form a high-order nonlinear combined feature, simultaneously transmitting the combined feature to a Sigmoid activation function to predict the click rate to obtain an estimated click rate, and then entering step U6;
and step U6, performing error back-propagation on the estimated click-through rate through the Logloss loss function until the estimated click-through rate output by the output layer converges, completing the parameter update of the click-through rate estimation model based on feature characterization under the attention mechanism, thereby completing the model training (a code sketch of this training flow is given below), and then entering the end state.
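For readers who prefer code, the whole training flow of steps U1-U6 can be sketched in Keras roughly as follows; the field counts, vocabulary sizes, number of cross layers and attention size are assumptions for illustration and not the exact configuration of the embodiment:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

num_dense, vocab_sizes, embed_dim, n_cross = 13, [1000, 500, 200], 8, 3

# step U1/U2: continuous inputs plus embedded discrete inputs, stacked into x0
dense_in = layers.Input(shape=(num_dense,), name="dense")
cat_ins = [layers.Input(shape=(1,), dtype="int32", name=f"cat_{i}") for i in range(len(vocab_sizes))]
embeds = [layers.Flatten()(layers.Embedding(v, embed_dim)(c)) for v, c in zip(vocab_sizes, cat_ins)]
x0 = layers.Concatenate()([dense_in] + embeds)

# step U3: attention cross network (explicit feature combination)
score_hidden, score_out = layers.Dense(16, activation="relu"), layers.Dense(1)  # shared W, b, h
cross_outputs, xl = [], x0
for _ in range(n_cross):
    s = layers.Dense(1)(xl)                               # x_l^T w_l (bias folded into the scalar term)
    xl = layers.Add()([layers.Lambda(lambda t: t[0] * t[1])([x0, s]), xl])
    cross_outputs.append(xl)
scores = layers.Concatenate()([score_out(score_hidden(t)) for t in cross_outputs])
a = layers.Softmax()(scores)                              # attention weights a_i
stack = layers.Lambda(lambda ts: tf.stack(ts, axis=1))(cross_outputs)
x_acn = layers.Lambda(lambda z: tf.reduce_sum(z[0] * tf.expand_dims(z[1], -1), axis=1))([stack, a])

# step U4: multilayer perceptron (implicit feature combination)
h = x0
for units in (400, 400):
    h = layers.Dense(units, activation="relu")(h)

# step U5: combine explicit and implicit vectors and predict with Sigmoid
y = layers.Dense(1, activation="sigmoid")(layers.Concatenate()([x_acn, h]))

# step U6: Logloss (binary cross-entropy) with Adam and mini-batch updates
model = Model([dense_in] + cat_ins, y)
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
```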
The programming environment for implementation of the system in this embodiment is Pycharm, and the version of Python is 3.6.
FIG. 6 is a flowchart of the deployment of the click-through rate prediction model in the embodiment of the present invention.
As shown in fig. 6, the specific working steps after completing the training of the click rate estimation model based on the feature characterization under the attention mechanism of this embodiment are as follows:
step T1, collecting raw data of the real business scenario, i.e., obtaining the raw data from front-end event tracking, back-end log extraction and online information collection, splicing and aggregating them, and then proceeding to step T2;
step T2, preprocessing the original data collected in the step T1 to form sorted data, and then entering the step T3;
the pretreatment in this embodiment includes: and carrying out abnormal value processing, missing value processing and noise data processing on the collected original data.
Step T3, constructing the sorted data into a training data set, a testing data set and a verification data set of the click rate estimation model, determining the proportion distribution of the training data set, the testing data set and the verification data set according to the data volume and the service, and then entering step T4;
step T4, inputting each data set into a click rate estimation model to obtain an estimated click rate, and then entering step T5;
step T5, mining feature combinations that match user interest preferences based on the estimated click-through rate, without performing any manual feature engineering, and then proceeding to step T6;
step T6, extracting the corresponding optimal super parameter combination according to the performance of the off-line data set, and then entering step T7;
step T7, evaluating the prediction performance of the click-through rate estimation model with preset model metrics (Logloss and AUC are commonly used as evaluation metrics in click-through rate estimation scenarios), and then proceeding to step T8;
and step T8, carrying out online small flow test on the click estimation model by an algorithm engineer, verifying the online effect of the model, deploying the model online after the test, and then entering an ending state.
In this embodiment, the hyperparameter selection methods in step T6 are grid search, random search and Bayesian search.
After the model deployment is completed, a series of experiments are performed with the click-through rate estimation model 100 (Deep & Attention Cross Network, DACN) based on feature characterization under the attention mechanism of the invention. The programming environment used to implement the Attention Cross Network (ACN) in the click-through rate estimation model 100 is PyCharm, and the Python version is 3.6. The experiments run on a Core i7 CPU, 32 GB of memory and a Linux operating system. The data sets for the experiments come from the real click data of Criteo Lab and the movie rating data of MovieLens. Model evaluation uses two metrics, AUC and Logloss, which assess the performance of the model from different perspectives.
The experiments compare the proposed feature cross network DACN (Deep & Attention Cross Network), which combines explicit and implicit feature crossing, with LR (Logistic Regression), DNN, FM (Factorization Machines), Wide & Deep, DCN (Deep & Cross Network) and DeepFM. As mentioned above, these models are currently the mainstream, industry-validated click-through rate estimation models. Since DACN aims at extracting feature combinations through the model itself, as a control variable no manual feature engineering is performed on the original features.
DACN is implemented here on TensorFlow. Data normalization of the dense features uses a logarithmic transformation; category-type features are embedded into dense vectors of length 6 × (feature cardinality)^(1/4). The Adam optimizer with mini-batch stochastic gradient descent is used, the batch size is set to 512, and Batch Normalization is applied in the DNN network. For the comparison models, the parameter settings of FNN and PNN follow those in the PNN paper. The DNN module uses a Dropout of 0.5, the network structure is set to 400-400, the optimization algorithm is Adam-based mini-batch gradient descent, the activation function is uniformly ReLU, the embedding dimension of FM is set to 10, and the remaining settings are kept consistent with DACN.
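As a small worked example of the embedding-size rule above (the rounding is an assumption; the rule only fixes the vector length as six times the fourth root of the feature cardinality):

```python
def embedding_dim(cardinality: int) -> int:
    """Dense embedding length = 6 * cardinality^(1/4), rounded to the nearest integer."""
    return int(round(6 * cardinality ** 0.25))

print(embedding_dim(10_000))   # a field with 10,000 distinct values -> 60
print(embedding_dim(50))       # a small field with 50 distinct values -> 16
```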
FIG. 7 is a graph showing the comparison result of a single model in the embodiment of the present invention.
The effect of explicitly combined features under the attention mechanism on overall model prediction performance is validated first. Among the comparison models, FM explicitly models second-order feature interactions, DNN models implicit high-order feature interactions, the Cross Network models explicit high-order feature interactions, and ACN models explicit feature interactions with built-in feature screening. The performance of each single model on the two public data sets is shown in FIG. 7.
Experiments show that the ACN provided by the invention is always superior to other comparative models. As a conclusion, on the one hand, for practical datasets, high order interactions on sparse features are necessary, as evidenced by the clear superiority of DNN, Cross Network and ACN over FM on both datasets; on the other hand, the ACN is an optimal individual model, and the effectiveness of the ACN in modeling the interaction of the explicit high-order features is verified.
FIG. 8 is a diagram illustrating the comparison result of the integrated model in the embodiment of the present invention
DACN integrates ACN and DNN into a peer-to-peer network architecture. The ACN is used for explicit combined feature extraction and screening, the DNN is used for implicit combined feature extraction, and feature characterization is performed to the greatest extent through parallel connection of the ACN and the DNN. The performances of the DACN and the current mainstream click rate prediction model on the two public data sets are compared, and the experimental result is as shown in fig. 8.
It is readily apparent from FIG. 8 that LR is worse than all other models, indicating that factorization-based models are critical for modeling sparse categorical interaction features; Wide & Deep, DCN and DeepFM are clearly superior to DNN, which shows that the implicit feature extraction capability of DNN alone is relatively limited and its insufficient feature-combination capability is usually compensated by manual feature engineering. Secondly, the DACN metrics are significantly improved compared to DCN. The advantage of DACN has been demonstrated from a theoretical perspective: the added Attention network structure screens the combined features of each specified order, raises the weight of important combined features, and eliminates the influence of redundant features. The experimental results prove that this structure can effectively realize feature screening and greatly improve the performance of the overall model.
The DACN provided by the invention achieves the best performance on both public data sets, which means that explicit and implicit high-order features are combined, and the original feature characterization is more sufficient. Meanwhile, the experiment result also verifies that the ACN is used for carrying out the specified order explicit characteristic combination to greatly improve the final model performance, and the reasonability of the DACN provided by the invention is laterally verified.
FIG. 9 is a diagram illustrating the comparison result of the number of network parameters in the embodiment of the present invention
Considering the additional parameters introduced by the Attention network, ACN, CrossNet and DNN are compared on the Criteo data set in terms of the minimum number of parameters each model needs to reach the optimal Logloss threshold. Because the number of parameters in the embedding matrix is the same for every model, the embedding-layer parameters are omitted from the parameter counts. The experimental results are shown in FIG. 9.
The experimental results show that the storage efficiency of the ACN and the Cross Network provided by the invention is nearly one order of magnitude higher than that of DNN, and the main reason is that the common feature Cross structure realizes the completion of feature interaction of a specified order by linear space complexity. In addition, the parameters of the ACN and the Cross Network belong to the same order of magnitude, an Attention Network introduced by the ACN only comprises a hidden layer, the number of the required parameters can be approximately ignored, and the model click rate prediction accuracy is greatly improved. The ACN structure provided by the invention is proved to have great advantages in space complexity from the side.
Action and Effect of the Embodiment
According to the click rate estimation model based on the feature representation provided by the embodiment, due to the feature embedding layer, the feature embedding layer carries out vectorization processing on the continuous feature and the discrete feature, and the problem that the vector dimension is too large after the one-hot coding processing is solved. Meanwhile, the system also has an explicit characteristic cross network, the explicit characteristic cross network realizes the dynamic weighting of the combination items through the attention cross network, more efficiently utilizes the combination characteristics, and eliminates the influence of redundant characteristics on a click rate prediction model. The method further comprises an implicit characteristic cross network, and the implicit characteristic cross network completes the capture of highly nonlinear interaction characteristics by applying a multilayer perceptron, so that the problem that the characteristic expression capability of the model is limited by the parameter scale is solved. And finally, the prediction probability output layer is provided, and the prediction probability output layer performs click rate prediction based on the output of the explicit characteristic cross network and the implicit characteristic cross network through a Sigmoid activation function, so that the obtained prediction data is more accurate. Further, the estimated data can be used as a data fine-ranking link and applied to the fields of enterprise-level recommendation systems, search systems, online advertisement systems and the like.
In the embodiment, the discrete features are subjected to one-hot code conversion in the feature embedding layer, and the encoded discrete features are used as embedding vectors and are subjected to data standardization according to data distribution characteristics to form dense features; the two low-dimensional dense vectors can more effectively retain original semantic information.
In the embodiment, the crossing algorithm used in the cross layer makes explicit feature combination more efficient.
In the embodiment, the attention layer further enables the model to learn and combine the feature weights by enabling the contribution degrees of different parts to be different when the different parts are compressed together, so that automatic feature extraction is realized.
In the embodiment, the explicit feature cross network realizes dynamic weighting of the combination terms through the attention cross mechanism, uses combined features more efficiently, and eliminates the influence of redundant features on the click-through rate estimation model.
In the embodiment, the explicit characteristic cross network and the implicit characteristic cross network are connected in parallel, so that the characteristic characterization capability of the model is further enhanced, and the click rate estimation precision is improved.
In the embodiment, experiments compare the proposed model with the current mainstream, industry-validated click-through rate estimation models in terms of single-model performance, integrated-model performance and network parameter count, which verifies the rationality of the invention and shows that the proposed ACN structure has a great advantage in space complexity.
The above-described embodiments are merely illustrative of specific embodiments of the present invention, and the present invention is not limited to the description of the above-described embodiments.
Claims (9)
1. A click-through rate estimation model based on feature characterization under an attention mechanism, used for completing click-through rate estimation according to object features of an object to be estimated, wherein the object features are divided into continuous features and discrete features, the click-through rate estimation model being characterized by comprising:
the feature embedding layer is used for carrying out vectorization processing on the continuous features and the discrete features and then stacking and embedding the continuous features and the discrete features to form stacking features;
an explicit feature crossover network that forms an explicit output vector by inputting the stacked features into an attention crossover network for explicit feature combining;
the implicit characteristic cross network is used for inputting the stacking characteristics into a multilayer perceptron to carry out implicit characteristic combination to form an implicit output vector;
a pre-estimated probability output layer, combining the explicit output vector and the implicit output vector to form a high-order nonlinear combined feature, and simultaneously transmitting the combined feature to a Sigmoid activation function to predict the click rate to obtain the click rate;
wherein the attention crossing network comprises:
the cross layer processes the stacking features through a cross algorithm and generates a multi-dimensional vector; and
and the attention layer processes the multidimensional vector through a fully-connected neural network to generate an attention score, performs normalization processing on the attention score to generate a characteristic coefficient, and further generates the explicit output vector through an output calculation formula based on the characteristic coefficient.
2. The feature characterization-based click through rate prediction model according to claim 1, wherein:
wherein the vectorization processing is:
carrying out one-hot code conversion on the discrete features, and taking the encoded discrete features as embedded vectors;
carrying out data standardization according to data distribution characteristics on the continuous features to form dense features;
subjecting the embedding vector and dense features to the stack embedding process as stacked features,
the matrix calculation formula of the one-hot coding conversion is as follows:
x_embed,i = W_embed,i · x_i  #(1)
3. The feature characterization-based click through rate prediction model according to claim 1, wherein:
wherein the calculation formula of the cross algorithm is:

x_{l+1} = x_0 · x_l^T · w_l + b_l + x_l = f(x_l, w_l, b_l) + x_l  #(2)
4. The feature characterization-based click through rate prediction model according to claim 1, wherein:
wherein the computing logic of the normalization process in the attention layer is:
a'_i = h^T ReLU(W x_i + b)  #(3)
5. The feature characterization-based click through rate prediction model according to claim 1, wherein:
wherein the output calculation formula of the attention cross network is:

x_ACN = Σ_i a_i x_i

where a_i is the attention weight.
6. The feature characterization-based click through rate prediction model according to claim 1, wherein:
wherein, each layer of the multilayer perceptron comprises the following calculation logics:
H_{l+1} = f(W_l H_l + b_l)  #(7)
7. The feature characterization-based click through rate prediction model according to claim 1, wherein:
wherein the formula of the Sigmoid activation function is:

ŷ = σ(w_out^T [x_ACN; H_MLP] + b_out)

where x_ACN and H_MLP are the outputs of the explicit feature cross network and of the multilayer perceptron, respectively.
8. The feature characterization-based click through rate prediction model according to claim 1, wherein:
and the click rate estimation model based on the characteristic representation under the attention mechanism carries out error back transmission on the click rate through a Logloss loss function until the click rate output by the output layer is converged, and completes the parameter updating of the click rate estimation model based on the characteristic representation under the attention mechanism.
9. The feature characterization-based click through rate prediction model of claim 5, wherein:
wherein the formula of the Logloss loss function is:

L = -(1/N) Σ_{i=1}^{N} [ y_i log(p_i) + (1 - y_i) log(1 - p_i) ] + λ Σ_l ‖W_l‖²

where W_l is the weight of the l-th layer, p_i is the output of the click-through rate estimation model, y_i is the label of the corresponding sample, N is the number of training samples, and λ is the L2 regularization coefficient; error back-propagation is performed through the Logloss loss function and the parameters are updated accordingly until convergence, completing the training of the final click-through rate model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010629307.3A CN113887694A (en) | 2020-07-01 | 2020-07-01 | Click rate estimation model based on characteristic representation under attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010629307.3A CN113887694A (en) | 2020-07-01 | 2020-07-01 | Click rate estimation model based on characteristic representation under attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113887694A true CN113887694A (en) | 2022-01-04 |
Family
ID=79012984
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010629307.3A Pending CN113887694A (en) | 2020-07-01 | 2020-07-01 | Click rate estimation model based on characteristic representation under attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113887694A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114529309A (en) * | 2022-02-09 | 2022-05-24 | 北京沃东天骏信息技术有限公司 | Information auditing method and device, electronic equipment and computer readable medium |
CN115271272A (en) * | 2022-09-29 | 2022-11-01 | 华东交通大学 | Click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation |
CN115295153A (en) * | 2022-09-30 | 2022-11-04 | 北京智精灵科技有限公司 | Cognitive assessment method and cognitive task pushing method based on deep learning |
CN116611497A (en) * | 2023-07-20 | 2023-08-18 | 深圳须弥云图空间科技有限公司 | Click rate estimation model training method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018212711A1 (en) * | 2017-05-19 | 2018-11-22 | National University Of Singapore | Predictive analysis methods and systems |
CN109960759A (en) * | 2019-03-22 | 2019-07-02 | 中山大学 | Recommender system clicking rate prediction technique based on deep neural network |
CN111062775A (en) * | 2019-12-03 | 2020-04-24 | 中山大学 | Recommendation system recall method based on attention mechanism |
CN111325579A (en) * | 2020-02-25 | 2020-06-23 | 华南师范大学 | Advertisement click rate prediction method |
-
2020
- 2020-07-01 CN CN202010629307.3A patent/CN113887694A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018212711A1 (en) * | 2017-05-19 | 2018-11-22 | National University Of Singapore | Predictive analysis methods and systems |
CN109960759A (en) * | 2019-03-22 | 2019-07-02 | 中山大学 | Recommender system clicking rate prediction technique based on deep neural network |
CN111062775A (en) * | 2019-12-03 | 2020-04-24 | 中山大学 | Recommendation system recall method based on attention mechanism |
CN111325579A (en) * | 2020-02-25 | 2020-06-23 | 华南师范大学 | Advertisement click rate prediction method |
Non-Patent Citations (4)
Title |
---|
JUN XIAO等: "Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks", 《PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE》, 19 August 2017 (2017-08-19), pages 3119 - 3125 * |
QIANQIAN WANG 等: "A Hierarchical Attention Model for CTR Prediction Based on User Interest", 《IEEE SYSTEMS JOURNAL》, vol. 14, no. 3, 24 October 2019 (2019-10-24), pages 4015 - 4024 * |
WANG RUOXI 等: "Deep&CrossNetwork for Ad Click Predictions", 《ARXIV》, 17 August 2017 (2017-08-17), pages 1 - 7 * |
温瑶瑶: "注意力机制下基于深度学习的点击率预测方法研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》, no. 1, 31 January 2020 (2020-01-31) * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114529309A (en) * | 2022-02-09 | 2022-05-24 | 北京沃东天骏信息技术有限公司 | Information auditing method and device, electronic equipment and computer readable medium |
CN115271272A (en) * | 2022-09-29 | 2022-11-01 | 华东交通大学 | Click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation |
CN115271272B (en) * | 2022-09-29 | 2022-12-27 | 华东交通大学 | Click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation |
CN115295153A (en) * | 2022-09-30 | 2022-11-04 | 北京智精灵科技有限公司 | Cognitive assessment method and cognitive task pushing method based on deep learning |
CN116611497A (en) * | 2023-07-20 | 2023-08-18 | 深圳须弥云图空间科技有限公司 | Click rate estimation model training method and device |
CN116611497B (en) * | 2023-07-20 | 2023-10-03 | 深圳须弥云图空间科技有限公司 | Click rate estimation model training method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||