CN111538761A - Click rate prediction method based on attention mechanism - Google Patents
- Publication number
- CN111538761A, CN202010317646.8A
- Authority
- CN
- China
- Prior art keywords
- vector
- layer
- attention mechanism
- click rate
- result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention provides a click rate prediction method based on an attention mechanism, comprising the following steps: step 1, preprocessing the user features and applying one-hot encoding to user features of the same type to obtain high-dimensional sparse feature vectors; step 2, reducing the dimension of the high-dimensional sparse feature vectors with embedding vectors, and feeding the reduced feature vectors, as the input vectors of the click rate model, into a compressed interaction network and a deep neural network respectively; step 3, taking the Hadamard product of the initial input feature vector and the input vector of each hidden layer and using the result as the input of the next hidden layer, so that each additional hidden layer raises the order of the feature combinations by one. The method jointly considers the low-dimensional features, the explicit high-dimensional features and the implicit high-dimensional features of the user, selects useful feature combinations through a self-attention mechanism, improves prediction efficiency, requires no manual feature extraction, and can extract high-dimensional feature combinations.
Description
Technical Field
The invention relates to the technical field of internet application, in particular to a click rate prediction method based on an attention mechanism.
Background
With the explosive growth of internet information, computer science, and artificial intelligence technology in particular, has made great progress. As a branch of computer science and applied science, artificial intelligence mainly studies how to use machines to simulate, extend and expand the mental processes of the human brain (such as memory, learning, reasoning and decision making). At present, artificial intelligence technology has been successfully applied to fields such as automatic driving, medical diagnosis, language identification, image recognition and financial big data.
Although the industry has studied click rate estimation in depth, existing models still face problems such as large data volumes and sparse data. The industry tends to use shallow models to address these problems, but shallow models are difficult to train, difficult to deploy in a production environment and weak in interpretability. Shallow models concentrate on building explicit combined features, through manual feature construction and simple operations between features, to improve the performance of the click rate estimation model; they do not mine implicit information such as the implicit combined features in the data and the highly nonlinear relations inherent in the features. The advertisement click rate estimation problem therefore remains of great research significance. The algorithms most widely applied at present are the GBDT + LR model and the Wide & Deep model. However, these models require manual feature extraction and cannot extract high-dimensional feature combinations. Some models that extract features automatically, such as DeepFM, learn feature interactions implicitly, which easily leads to excessively high dimensionality. The Deep & Cross model can address this problem, but its interactions are at the element level and cannot represent vector-level feature interactions well.
Disclosure of Invention
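The invention provides a click rate prediction method based on an attention mechanism, and aims to solve the problems that traditional models require manual feature extraction, cannot extract high-dimensional feature combinations, and easily produce excessively high dimensionality.
In order to achieve the above object, an embodiment of the present invention provides a click rate prediction method based on an attention mechanism, including:
step 1, preprocessing the user features, and applying one-hot encoding to user features of the same type to obtain high-dimensional sparse feature vectors;
step 2, reducing the dimension of the high-dimensional sparse feature vectors with embedding vectors, and feeding the reduced feature vectors, as the input vectors of the click rate model, into a compressed interaction network and a deep neural network respectively;
step 3, taking the Hadamard product of the initial input feature vector and the input vector of each hidden layer, and using the result as the input of the next hidden layer, so that each additional hidden layer raises the order of the feature combinations by one;
step 4, passing the result vectors obtained by each layer through an attention mechanism to obtain useful combined features, and sum-pooling the combined features;
step 5, compressing and splicing the pooled result and the result obtained by the deep neural network into a new feature vector, and feeding the new feature vector into an output layer to obtain the predicted value.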
Wherein, the step 1 specifically comprises:
collecting a data set of user features $X = \{x_1, x_2, \ldots, x_N\}$, where $N$ is the total number of training samples, $x_i \in \{x_1, x_2, \ldots, x_N\}$, and $x_i$ denotes the i-th user feature record to be processed.
Wherein, the step 1 further comprises:
the user features are converted into a high-dimensional sparse feature vector using one-hot encoding.
Wherein, the step 2 specifically comprises:
the features are converted into low-dimensional combined features by an embedding layer, which maps the sparse vectors to relatively dense vectors with non-zero elements.
Wherein, the step 2 further comprises:
processing the raw data into data with a mean of 0 and a variance of 1 by a normalization method, wherein the normalized data is denoted $x_{norm}$ and is calculated as follows:
$$x_{norm} = \frac{x - \mu}{\sigma}$$
where $x$ denotes the continuous-valued data, $\mu$ denotes the mean of the original data, and $\sigma$ denotes the standard deviation of the original data.
Wherein, the step 3 specifically comprises:
the feature vectors obtained by the embedding layer are spliced into an $m \times d$ matrix $x^0$, wherein $m$ is the number of feature vectors and $d$ is the dimension of each feature vector; $x^k \in \mathbb{R}^{H_k \times d}$ is a matrix representing the state of the k-th hidden layer in the compressed interaction network, in which $H_k$ is the number of compressed feature vectors of the k-th hidden layer; the feature embedding layer is called the 0-th hidden layer, so $H_0 = m$; the state of each hidden layer $k$ in the compressed interaction network is calculated as:
$$x^k_{h,*} = \sum_{i=1}^{H_{k-1}} \sum_{j=1}^{m} W^{k,h}_{ij} \left( x^{k-1}_{i,*} \circ x^0_{j,*} \right)$$
wherein $1 \le h \le H_k$, $W^{k,h} \in \mathbb{R}^{H_{k-1} \times m}$ is the parameter matrix of the h-th feature vector, and $\circ$ denotes the Hadamard product, i.e. the product of the corresponding elements of two vectors; $x^k$ is obtained by explicit interaction between $x^{k-1}$ and $x^0$, so the interaction order of $x^k$ is one higher than that of $x^{k-1}$, and each hidden layer added to the compressed interaction network raises the maximum order of the resulting feature interactions by 1.
Wherein, the step 4 specifically comprises:
after the vector interaction of each layer, a self-attention mechanism assigns different weights to the different interaction vectors in that layer's result, and the weighted result is sum-pooled to obtain the high-dimensional interaction result.
Wherein, the step 5 specifically comprises:
the vectors of the embedding layer are fed into a deep neural network to obtain a result after multi-layer interaction; the result obtained by the deep neural network and the result obtained by the compressed interaction network are compressed and spliced into a new matrix, which is fed into a single-layer perceptron to obtain the final result; the output is calculated as:
$$\hat{y} = \sigma\left( w_{linear}^{T} x_f + w_{dnn}^{T} y_{dnn} + w_{cin}^{T} y_{cin} + b \right)$$
where $\sigma$ is the sigmoid function, $x_f$ is the original feature, $y_{dnn}$ is the output of the DNN output layer, $y_{cin}$ is the output of the CIN output layer, $w_{linear}$, $w_{dnn}$ and $w_{cin}$ are the weights of the linear regression, the DNN output layer and the CIN output layer respectively, and $b$ is a learnable parameter.
Wherein, the step 5 further comprises:
the weight parameters of the model are continuously updated through the loss function and gradient descent, the loss function being:
$$L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right]$$
wherein $\hat{y}_i$ is the value predicted by the model, $y_i$ is the true value of the actual data, and $N$ is the total number of training instances; the optimization process minimizes the following objective function:
$$J = L + \lambda \lVert \theta \rVert_2^2$$
where $\lambda$ is the regularization coefficient and $\theta$ is the parameter set, including the parameters of the linear part, the CIN part and the DNN part.
The scheme of the invention has the following beneficial effects:
According to the click rate prediction method based on the attention mechanism of the above scheme, the dense vectors produced by the Embedding layer interact in a manner similar to a residual network; the results of the multiple interactions are sum-pooled through the attention mechanism; and the result of the deep neural network and the result of the compressed interaction network are spliced into a new vector, from which the output is obtained. The prediction result is more accurate and reliable. The low-dimensional features, the explicit high-dimensional features and the implicit high-dimensional features of the user are considered jointly, and useful feature combinations are selected through the attention mechanism, which improves prediction efficiency. High-dimensional feature combinations can be extracted without manual feature extraction, and excessively high dimensionality is not easily produced.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a model architecture diagram of the present invention;
FIG. 3 is a schematic diagram of each layer of the interactive network of the present invention;
FIG. 4 is a schematic diagram of the self attention mechanism summing pooling of the present invention;
FIG. 5 is a graph showing the results of the experiment according to the present invention;
FIG. 6 is a schematic diagram illustrating the influence of the number of network layers on the experimental results.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
The invention provides a click rate prediction method based on an attention mechanism, aiming at the problems that the existing model needs manual feature extraction, cannot extract feature combinations with high dimensionality and easily causes overhigh dimensionality.
As shown in FIG. 1 to FIG. 6, an embodiment of the present invention provides a click rate prediction method based on an attention mechanism, including: step 1, preprocessing the user features, and applying one-hot encoding to user features of the same type to obtain high-dimensional sparse feature vectors; step 2, reducing the dimension of the high-dimensional sparse feature vectors with embedding vectors, and feeding the reduced feature vectors, as the input vectors of the click rate model, into a compressed interaction network and a deep neural network respectively; step 3, taking the Hadamard product of the initial input feature vector and the input vector of each hidden layer, and using the result as the input of the next hidden layer, so that each additional hidden layer raises the order of the feature combinations by one; step 4, passing the result vectors obtained by each layer through an attention mechanism to obtain useful combined features, and sum-pooling the combined features; step 5, compressing and splicing the pooled result and the result obtained by the deep neural network into a new feature vector, and feeding the new feature vector into an output layer to obtain the predicted value.
Wherein, the step 1 specifically comprises: collecting a data set of user features $X = \{x_1, x_2, \ldots, x_N\}$, where $N$ is the total number of training samples, $x_i \in \{x_1, x_2, \ldots, x_N\}$, and $x_i$ denotes the i-th user feature record to be processed.
Wherein, the step 1 further comprises: the user features are converted into a high-dimensional sparse feature vector using one-hot encoding.
In the click rate prediction method based on the attention mechanism according to the above embodiment of the present invention, one-hot encoding is relatively simple: N states are encoded with an N-bit state register. For example, if the basic information of a user is user = {user ID = 02, gender = male, interest = rock and roll}, then by the definition of one-hot encoding it is converted into a vector composed of 0s and 1s, such as user = [0, 1, 0, …, 0] [1, 0] [0, 1, 0, …, 0].
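The one-hot example above can be sketched in a few lines of Python. This is only an illustration: the field vocabularies (`user_ids`, `genders`, `interests`) and their sizes are hypothetical and are not taken from the patent.

```python
import numpy as np

def one_hot(value, vocabulary):
    """Return a 0/1 vector with a single 1 at the position of `value`."""
    vec = np.zeros(len(vocabulary), dtype=np.int8)
    vec[vocabulary.index(value)] = 1
    return vec

# Hypothetical vocabularies, one per field (assumed for illustration only).
user_ids = ["01", "02", "03", "04"]
genders = ["male", "female"]
interests = ["pop", "rock and roll", "jazz"]

user = {"user_id": "02", "gender": "male", "interest": "rock and roll"}

# Each field is encoded separately, then the pieces are concatenated
# into one sparse vector.
encoded = np.concatenate([
    one_hot(user["user_id"], user_ids),
    one_hot(user["gender"], genders),
    one_hot(user["interest"], interests),
])
```

With real vocabularies of thousands or millions of values, the concatenated vector is almost entirely zeros, which is why the method reduces its dimension in step 2.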
Wherein, the step 2 specifically comprises: the features are converted into low-dimensional combined features by an embedding layer, which maps the sparse vectors to relatively dense vectors with non-zero elements.
Wherein, the step 2 further comprises: processing the raw data into data with a mean of 0 and a variance of 1 by a normalization method, wherein the normalized data is denoted $x_{norm}$ and is calculated as follows:
$$x_{norm} = \frac{x - \mu}{\sigma}$$
where $x$ denotes the continuous-valued data, $\mu$ denotes the mean of the original data, and $\sigma$ denotes the standard deviation of the original data.
According to the click rate prediction method based on the attention mechanism of the above embodiment, because the feature dimension produced by one-hot encoding is too high, an embedding layer converts the features into low-dimensional combined features, mapping each sparse vector to a relatively dense vector with non-zero elements. The embedding vectors are initialized with random numbers and updated iteratively by gradient descent until accurate embedding values are obtained. Continuous values must be normalized: the original data is processed by a normalization method into data with a mean of 0 and a variance of 1. This normalization changes the distribution of the original data, is insensitive to abnormal values, and is suitable for big-data scenarios.
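A minimal sketch of these two preprocessing operations follows, assuming the normalization is the usual z-score (subtract the mean, divide by the standard deviation) and that an embedding is a per-field lookup table with randomly initialized rows. The table sizes, the embedding dimension `d` and the sample values are hypothetical; training the tables by gradient descent is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def standardize(x):
    """Z-score normalization: result has mean 0 and variance 1."""
    mu = x.mean()
    sigma = x.std()
    return (x - mu) / sigma

def embed(field_indices, tables):
    """Look up one dense row per field.

    Equivalent to multiplying each field's one-hot vector by its table,
    but without materializing the sparse vector.
    """
    return np.stack([tables[f][i] for f, i in enumerate(field_indices)])

d = 4                      # embedding dimension (assumed)
vocab_sizes = [4, 2, 3]    # one vocabulary size per field (assumed)

# Randomly initialized embedding tables; in training these would be
# updated by gradient descent.
tables = [rng.normal(scale=0.01, size=(v, d)) for v in vocab_sizes]

# Active index of each field for one sample, e.g. user ID, gender, interest.
x0 = embed([1, 0, 1], tables)          # shape (m, d) = (3, 4)

continuous = np.array([1.0, 2.0, 3.0, 10.0])
continuous_norm = standardize(continuous)
```

The matrix `x0` plays the role of the m × d input that is fed to both the compressed interaction network and the deep neural network.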
The feature vectors obtained by the embedding layer are spliced into an $m \times d$ matrix $x^0$, wherein $m$ is the number of feature vectors and $d$ is the dimension of each feature vector; $x^k \in \mathbb{R}^{H_k \times d}$ is a matrix representing the state of the k-th hidden layer in the compressed interaction network, in which $H_k$ is the number of compressed feature vectors of the k-th hidden layer; the feature embedding layer is called the 0-th hidden layer, so $H_0 = m$; the state of each hidden layer $k$ in the compressed interaction network is calculated as:
$$x^k_{h,*} = \sum_{i=1}^{H_{k-1}} \sum_{j=1}^{m} W^{k,h}_{ij} \left( x^{k-1}_{i,*} \circ x^0_{j,*} \right)$$
wherein $1 \le h \le H_k$, $W^{k,h} \in \mathbb{R}^{H_{k-1} \times m}$ is the parameter matrix of the h-th feature vector, and $\circ$ denotes the Hadamard product, i.e. the product of the corresponding elements of two vectors; $x^k$ is obtained by explicit interaction between $x^{k-1}$ and $x^0$, so the interaction order of $x^k$ is one higher than that of $x^{k-1}$, and each hidden layer added to the compressed interaction network raises the maximum order of the resulting feature interactions by 1.
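In the click rate prediction method based on the attention mechanism according to the above embodiment of the present invention, the Hadamard product is the product of the corresponding elements of two vectors, for example, $\langle a_1, a_2, a_3 \rangle \circ \langle b_1, b_2, b_3 \rangle = \langle a_1 b_1, a_2 b_2, a_3 b_3 \rangle$.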
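One hidden layer of the compressed interaction network can be sketched with NumPy as follows. The sketch assumes the standard CIN form, in which each output row is a weighted sum of Hadamard products between every row of the previous layer and every row of the embedding matrix; the function name `cin_layer`, the layer widths and the random weights are illustrative and not taken from the patent.

```python
import numpy as np

def cin_layer(x_prev, x0, w):
    """One compressed-interaction layer (illustrative sketch).

    x_prev: (H_prev, d) state of the previous hidden layer
    x0:     (m, d)      embedding matrix (the 0-th hidden layer)
    w:      (H_next, H_prev, m) one parameter matrix per output row
    returns (H_next, d)
    """
    # z[i, j, :] is the Hadamard product of row i of x_prev and row j of x0.
    z = x_prev[:, None, :] * x0[None, :, :]        # (H_prev, m, d)
    # Each output row h is the sum over (i, j) of w[h, i, j] * z[i, j, :].
    return np.einsum("hij,ijd->hd", w, z)

rng = np.random.default_rng(0)
m, d, h1, h2 = 3, 4, 5, 2      # sizes chosen only for the example

x0 = rng.normal(size=(m, d))
w1 = rng.normal(size=(h1, m, m))
w2 = rng.normal(size=(h2, h1, m))

x1 = cin_layer(x0, x0, w1)     # second-order interactions, shape (5, 4)
x2 = cin_layer(x1, x0, w2)     # third-order interactions, shape (2, 4)
```

Because every layer multiplies by `x0` once more, the interaction order grows by one per layer while the embedding dimension `d` stays fixed, which is what keeps the interaction at the vector level.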
Wherein, the step 4 specifically comprises: after the vector interaction of each layer, a self-attention mechanism assigns different weights to the different interaction vectors in that layer's result, and the weighted result is sum-pooled to obtain the high-dimensional interaction result.
According to the click rate prediction method based on the attention mechanism of the above embodiment, because vector interaction has the drawback of high time complexity, a self-attention mechanism assigns different weights to the different interaction vectors in the result of each layer's vector interaction, which saves a large amount of time.
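The text does not give the scoring function of the attention step, so the following is only one possible sketch: each interaction vector is scored against a hypothetical learned query vector `q`, the scores are turned into weights with a softmax, and the vectors are sum-pooled with those weights. The names `attention_sum_pool` and `q` and the scaled dot-product scoring are assumptions made for illustration.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

def attention_sum_pool(x, q):
    """Weight the rows of x by attention and sum them (illustrative).

    x: (H, d) interaction vectors produced by one hidden layer
    q: (d,)   hypothetical learned query vector
    returns the pooled (d,) vector and the (H,) attention weights
    """
    scores = x @ q / np.sqrt(x.shape[1])   # one score per interaction vector
    weights = softmax(scores)              # weights are positive and sum to 1
    pooled = weights @ x                   # weighted sum pooling
    return pooled, weights

rng = np.random.default_rng(0)
layer_output = rng.normal(size=(5, 4))     # e.g. one CIN layer's result
q = rng.normal(size=4)

pooled, weights = attention_sum_pool(layer_output, q)
```

Interaction vectors with small weights contribute little to the pooled result, which is how such a mechanism can screen out less useful feature combinations.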
Wherein, the step 5 specifically comprises: the vectors of the embedding layer are fed into a deep neural network to obtain a result after multi-layer interaction; the result obtained by the deep neural network and the result obtained by the compressed interaction network are compressed and spliced into a new matrix, which is fed into a single-layer perceptron to obtain the final result; the output is calculated as:
$$\hat{y} = \sigma\left( w_{linear}^{T} x_f + w_{dnn}^{T} y_{dnn} + w_{cin}^{T} y_{cin} + b \right)$$
where $\sigma$ is the sigmoid function, $x_f$ is the original feature, $y_{dnn}$ is the output of the DNN output layer, $y_{cin}$ is the output of the CIN output layer, $w_{linear}$, $w_{dnn}$ and $w_{cin}$ are the weights of the linear regression, the DNN output layer and the CIN output layer respectively, and $b$ is a learnable parameter.
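The output layer described above combines a linear part, the DNN output and the CIN output under a sigmoid. A minimal sketch, assuming each part contributes a dot product with its own weight vector, is shown below; the argument names (`y_dnn`, `w_linear`, and so on) and the vector sizes are illustrative.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(x_f, y_dnn, y_cin, w_linear, w_dnn, w_cin, b):
    """Combine the three parts into one click probability (illustrative)."""
    logit = w_linear @ x_f + w_dnn @ y_dnn + w_cin @ y_cin + b
    return sigmoid(logit)

rng = np.random.default_rng(0)
x_f = rng.normal(size=9)       # original features (size assumed)
y_dnn = rng.normal(size=6)     # DNN output (size assumed)
y_cin = rng.normal(size=7)     # pooled CIN output (size assumed)

p_click = predict(
    x_f, y_dnn, y_cin,
    w_linear=rng.normal(size=9),
    w_dnn=rng.normal(size=6),
    w_cin=rng.normal(size=7),
    b=0.1,
)
```

The result is a single number between 0 and 1 that is read as the predicted click probability.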
Wherein, the step 5 further comprises: the weight parameters of the model are continuously updated through the loss function and gradient descent, the loss function being:
$$L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right]$$
wherein $\hat{y}_i$ is the value predicted by the model, $y_i$ is the true value of the actual data, and $N$ is the total number of training instances; the optimization process minimizes the following objective function:
$$J = L + \lambda \lVert \theta \rVert_2^2$$
where $\lambda$ is the regularization coefficient and $\theta$ is the parameter set, including the parameters of the linear part, the CIN part and the DNN part.
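As a sketch of the training objective, the following assumes that the loss is the binary cross-entropy (log loss) usual for click prediction and that the regularization term is a squared L2 penalty over all parameters. Both are assumptions; the function names `log_loss` and `objective` are illustrative.

```python
import numpy as np

def log_loss(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def objective(y_true, y_pred, params, lam):
    """Log loss plus lam times the squared L2 norm of all parameters."""
    penalty = sum(np.sum(p ** 2) for p in params)
    return log_loss(y_true, y_pred) + lam * penalty

y_true = np.array([1.0, 0.0])
y_pred = np.array([0.5, 0.5])
params = [np.array([1.0, 2.0])]     # stand-in for linear, CIN and DNN weights

loss = log_loss(y_true, y_pred)                        # equals log(2)
total = objective(y_true, y_pred, params, lam=0.1)     # adds 0.1 * (1 + 4)
```

Gradient descent would then update every parameter in the linear, CIN and DNN parts to reduce `total`.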
In the click rate prediction method based on the attention mechanism according to the embodiment of the present invention, the model training and prediction experiments use two public industry data sets: the large-scale advertisement click-through rate prediction data set Criteo and the context-based APP recommendation data set Frappe. The Criteo data set contains 11 GB of user behavior logs covering 7 consecutive days, about 41 million records. Each training sample has 39 features from different fields, of which the 13 features I1 to I13 are continuous-valued anonymous features and C1 to C26 are discrete-valued anonymous features. The desensitized anonymous features mainly cover user features, item features and environment features; the specific meaning of each field is not disclosed. In the Frappe data set, each log contains, in addition to the user ID and the item ID, 8 contextual categorical features such as weather, city and time. Its 10 fields, C1 to C10, are all categorical features, with no numerical features. The Frappe data set is relatively small, with 288609 training samples in total.
For both the Criteo data set and the Frappe data set, 1/10 of the data is randomly selected as the validation set and the remaining data is used as the training set. The click rate prediction method based on the attention mechanism is implemented with Tensorflow3 + Python 3.6, and the best set of hyper-parameters for each model is found by grid search on the validation set. The optimizer is Adam, the learning rate is 0.001, the batch size is 4096, and L2 regularization with a coefficient of 0.0001 is used. The default number of hidden nodes is 400 in the DNN output layer; in the CIN output layer it is 200 on the Criteo data set and 100 on the Frappe data set. For the CrossNet in Deep & Cross and for the CIN model, because the data sets differ, experiments are run with different hidden-layer depths and the best result of each model is compared.
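The random 1/10 validation split and the stated training settings can be written down as follows. This is a sketch only: the dictionary keys, the function name `split_indices` and the random seed are illustrative, while the numeric values are the ones given in the text.

```python
import numpy as np

# Training settings as stated in the text; the key names are illustrative.
HYPERPARAMS = {
    "optimizer": "adam",
    "learning_rate": 0.001,
    "batch_size": 4096,
    "l2_coefficient": 0.0001,
}

def split_indices(n_samples, validation_fraction=0.1, seed=0):
    """Randomly split sample indices into training and validation sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_val = int(n_samples * validation_fraction)
    return order[n_val:], order[:n_val]

# Frappe-sized example: 288609 samples in total.
train_idx, val_idx = split_indices(288609)
```

The grid search would then train one model per hyper-parameter combination on `train_idx` and keep the combination that scores best on `val_idx`.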
As shown in FIG. 5, the click rate prediction method based on the attention mechanism (Ours) is compared with the experimental results of other models. The LR model performs worst of all models, because it can only handle simple, low-dimensional feature combinations; this indicates that extracting implicit features from sparse data with deep learning is very necessary. The models trained through deep learning, such as PNN, Wide & Deep, DeepFM and Deep & Cross, perform better than the FM model. Real data features are generally very complex, and the FM model can only handle second-order feature combinations and cannot handle combinations of third order and above well, so it is not very effective for high-dimensional feature interactions. The hybrid models DeepFM and Deep & Cross perform better than the PNN model, which considers only high-dimensional features; this indicates that low-dimensional and high-dimensional interaction features need to be considered together. The Wide & Deep model performs worse than the PNN model because its feature combinations are still built manually. The click rate prediction method based on the attention mechanism predicts better than the three hybrid models Wide & Deep, DeepFM and Deep & Cross, which indicates that the explicit features need to be subdivided further: they are split into high-dimensional and low-dimensional features and trained together with the implicit high-dimensional features (the features learned by the DNN), which yields a certain gain.
Compared with the dozens of layers common in computer vision networks, the network of this model is not particularly deep, and about 3 layers are enough to achieve a good result. As can be seen from FIG. 6, the result of the model improves as the number of network layers increases up to 3 and declines when the number of layers exceeds 3, which indicates that a deeper network does not necessarily train better and is prone to overfitting.
The click rate prediction method based on the attention mechanism according to the above embodiment of the present invention maps user features of the same class into high-dimensional sparse vectors by one-hot encoding, and converts them into low-dimensional dense vectors through the Embedding layer. The feature vectors are fed into the compressed interaction network and the deep neural network respectively. In the compressed interaction network, the product of corresponding elements is computed between the initial input and the input of each hidden layer to obtain the input of the next layer, so that the input vector of every hidden layer is obtained through the successive hidden layers. The attention mechanism computes weights for the input vector of each hidden layer, and sum pooling yields the high-dimensional explicit interaction vector. The result learned by the deep neural network and the result of the compressed interaction network are spliced and passed through the activation function to produce the output. On the public Criteo and Frappe data sets, the method jointly considers the low-dimensional features, the explicit high-dimensional features and the implicit high-dimensional features of users and selects useful feature combinations through the attention mechanism, which improves prediction efficiency. It requires no manual feature extraction, can extract high-dimensional feature combinations, does not easily produce excessively high dimensionality, and improves the ability of a wide-and-deep model to extract complex combined features. By interacting at the vector level rather than the element level and by incorporating the attention mechanism, the method achieves a good prediction effect.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (9)
1. A click rate prediction method based on an attention mechanism is characterized by comprising the following steps:
step 1, preprocessing the user features, and applying one-hot encoding to user features of the same type to obtain high-dimensional sparse feature vectors;
step 2, reducing the dimension of the high-dimensional sparse feature vectors with embedding vectors, and feeding the reduced feature vectors, as the input vectors of the click rate model, into a compressed interaction network and a deep neural network respectively;
step 3, taking the Hadamard product of the initial input feature vector and the input vector of each hidden layer, and using the result as the input of the next hidden layer, so that each additional hidden layer raises the order of the feature combinations by one;
step 4, passing the result vectors obtained by each layer through an attention mechanism to obtain useful combined features, and sum-pooling the combined features;
step 5, compressing and splicing the pooled result and the result obtained by the deep neural network into a new feature vector, and feeding the new feature vector into an output layer to obtain the predicted value.
2. The attention mechanism-based click rate prediction method according to claim 1, wherein the step 1 specifically comprises:
collecting a data set $X = \{x_1, x_2, \ldots, x_N\}$ of user features, where $N$ is the total number of training samples, $x_i \in \{x_1, x_2, \ldots, x_N\}$, and $x_i$ represents the $i$-th user feature data to be processed.
3. The attention mechanism-based click rate prediction method according to claim 2, wherein the step 1 further comprises:
the user features are converted into a high-dimensional sparse feature vector using one-hot encoding.
4. The attention mechanism-based click rate prediction method according to claim 3, wherein the step 2 specifically comprises:
the features are converted into low-dimensional vectors by the embedding layer, which maps the sparse vectors to relatively dense vectors whose elements are non-zero.
5. The attention mechanism-based click rate prediction method of claim 4, wherein the step 2 further comprises:
processing the raw data into data with a mean of 0 and a variance of 1 by a normalization method, wherein the normalized data is denoted $x_{norm}$ and is calculated as follows:
$$x_{norm} = \frac{x - \mu}{\sigma}$$
where $x$ denotes the continuous-valued data, $\mu$ denotes the mean of the original data, and $\sigma$ denotes the standard deviation of the original data.
6. The attention mechanism-based click rate prediction method according to claim 5, wherein the step 3 specifically comprises:
the feature vectors obtained from the embedding layer are concatenated into an $m \times d$ matrix $x^0$, where $m$ is the number of feature vectors and $d$ is the dimension of each feature vector; $x^k \in \mathbb{R}^{H_k \times d}$ denotes the state of the $k$-th hidden layer in the compressed interaction network, where $H_k$ is the number of compressed feature vectors in the $k$-th hidden layer; the feature embedding layer is regarded as the 0-th hidden layer, with $H_0 = m$; the state of each hidden layer $k$ in the compressed interaction network is calculated as:
$$x^k_{h,*} = \sum_{i=1}^{H_{k-1}} \sum_{j=1}^{m} W^{k,h}_{ij} \left( x^{k-1}_{i,*} \circ x^0_{j,*} \right)$$
where $1 \le h \le H_k$, $W^{k,h} \in \mathbb{R}^{H_{k-1} \times m}$ is the parameter matrix of the $h$-th feature vector, and $\circ$ denotes the Hadamard product, i.e. the element-wise product of two vectors; $x^k$ is obtained by explicit interaction of $x^{k-1}$ with $x^0$, so the interaction order of $x^k$ is one higher than that of $x^{k-1}$, and each hidden layer added to the compressed interaction network increases the maximum order of the obtained feature interactions by 1.
7. The attention mechanism-based click rate prediction method according to claim 6, wherein the step 4 specifically comprises:
through a self-attention mechanism, the interaction vectors output by each layer are assigned different weights, and the weighted result is sum-pooled to obtain a high-order interaction result.
8. The attention mechanism-based click rate prediction method according to claim 7, wherein the step 5 specifically comprises:
the embedding-layer vectors are fed into the deep neural network to obtain the result of multi-layer interaction; the result of the deep neural network and the result of the compressed interaction network are compressed and concatenated into a new matrix, which is fed into a single-layer perceptron to obtain the final result; the output is given by:
$$\hat{y} = \sigma\left( w_{linear}^{T} x_f + w_{dnn}^{T} y_{dnn} + w_{cin}^{T} y_{cin} + b \right)$$
where $\sigma$ is the sigmoid function, $x_f$ is the raw feature, $y_{dnn}$ is the output of the DNN output layer, $y_{cin}$ is the output of the CIN output layer, $w_{linear}$, $w_{dnn}$ and $w_{cin}$ are the weights of the linear regression, the DNN output layer and the CIN output layer, respectively, and $b$ is a learnable parameter.
9. The attention mechanism-based click rate prediction method of claim 8, wherein the step 5 further comprises:
the weight parameters of the model are continuously updated through the loss function and gradient descent, the loss function being:
$$L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right]$$
where $\hat{y}_i$ is the value predicted by the model, $y_i$ is the true value of the actual data, and $N$ is the total number of training instances; the optimization process minimizes the following objective function:
$$J = L + \lambda \lVert \theta \rVert$$
where $\lambda$ is the regularization coefficient and $\theta$ is the parameter set, including the parameters of the linear part, the CIN part and the DNN part.
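The training objective of claim 9 can be sketched as follows. The sketch assumes that the loss is the binary cross-entropy (log loss) customary for click rate prediction with a sigmoid output, and that the regularization penalty is the L2 norm of all parameters; neither choice is spelled out in the text, and the function names are hypothetical.

```python
import numpy as np

def log_loss(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy over N training instances (assumed loss form)."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions so that log() never receives exactly 0 or 1
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

def objective(y_true, y_pred, params, lam):
    """Loss plus lam times the L2 norm of all parameters (assumed penalty form).

    params: iterable of arrays holding the linear, CIN and DNN parameters
    lam:    regularization coefficient
    """
    l2_norm = np.sqrt(sum(np.sum(np.square(p)) for p in params))
    return log_loss(y_true, y_pred) + lam * l2_norm
```

Gradient descent would then update every array in `params` along the negative gradient of `objective`; in practice this step is usually delegated to an automatic differentiation framework.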
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010317646.8A CN111538761A (en) | 2020-04-21 | 2020-04-21 | Click rate prediction method based on attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111538761A (en) | 2020-08-14 |
Family
ID=71979143
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010317646.8A Pending CN111538761A (en) | 2020-04-21 | 2020-04-21 | Click rate prediction method based on attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111538761A (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111737586A (en) * | 2020-08-19 | 2020-10-02 | 腾讯科技(深圳)有限公司 | Information recommendation method, device, equipment and computer readable storage medium |
CN112270568A (en) * | 2020-11-02 | 2021-01-26 | 重庆邮电大学 | Social e-commerce platform marketing activity order rate prediction method facing hidden information |
CN112492396A (en) * | 2020-12-08 | 2021-03-12 | 中国计量大学 | Short video click rate prediction method based on fine-grained multi-aspect analysis |
CN112559877A (en) * | 2020-12-24 | 2021-03-26 | 齐鲁工业大学 | CTR (click-through rate) estimation method and system based on cross-platform heterogeneous data and behavior context |
CN112633937A (en) * | 2020-12-30 | 2021-04-09 | 上海数鸣人工智能科技有限公司 | Marketing prediction method based on dimension reduction of deep autoencoder and gradient boosting decision tree |
CN112633931A (en) * | 2020-12-28 | 2021-04-09 | 广州博冠信息科技有限公司 | Click rate prediction method, device, electronic equipment and medium |
CN112733918A (en) * | 2020-12-31 | 2021-04-30 | 中南大学 | Graph classification method based on attention mechanism and compound toxicity prediction method |
CN113010774A (en) * | 2021-02-24 | 2021-06-22 | 四川省人工智能研究院(宜宾) | Click rate prediction method based on dynamic deep attention model |
CN113220974A (en) * | 2021-05-31 | 2021-08-06 | 北京爱奇艺科技有限公司 | Click rate prediction model training and search recall method, device, equipment and medium |
CN113298084A (en) * | 2021-04-01 | 2021-08-24 | 山东师范大学 | Feature map extraction method and system for semantic segmentation |
CN113407663A (en) * | 2020-11-05 | 2021-09-17 | 腾讯科技(深圳)有限公司 | Image-text content quality identification method and device based on artificial intelligence |
CN113535800A (en) * | 2021-06-03 | 2021-10-22 | 同盾科技有限公司 | Feature representation method in credit scenario, electronic device, and storage medium |
CN113656272A (en) * | 2021-08-16 | 2021-11-16 | Oppo广东移动通信有限公司 | Data processing method and device, storage medium, user equipment and server |
CN114358364A (en) * | 2021-11-20 | 2022-04-15 | 重庆邮电大学 | Attention mechanism-based short video click rate big data estimation method |
CN115809372A (en) * | 2023-02-03 | 2023-03-17 | 中国科学技术大学 | Click rate prediction model training method and device based on decoupling invariant learning |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109062893A (en) * | 2018-07-13 | 2018-12-21 | 华南理工大学 | A kind of product name recognition methods based on full text attention mechanism |
CN109960759A (en) * | 2019-03-22 | 2019-07-02 | 中山大学 | Recommender system clicking rate prediction technique based on deep neural network |
WO2019240900A1 (en) * | 2018-06-12 | 2019-12-19 | Siemens Aktiengesellschaft | Attention loss based deep neural network training |
US20200073937A1 (en) * | 2018-08-30 | 2020-03-05 | International Business Machines Corporation | Multi-aspect sentiment analysis by collaborative attention allocation |
CN110991464A (en) * | 2019-11-08 | 2020-04-10 | 华南理工大学 | Commodity click rate prediction method based on deep multi-mode data fusion |
Non-Patent Citations (1)
Title |
---|
JIANXUN LIAN等: "xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems" * |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111737586A (en) * | 2020-08-19 | 2020-10-02 | 腾讯科技(深圳)有限公司 | Information recommendation method, device, equipment and computer readable storage medium |
CN112270568A (en) * | 2020-11-02 | 2021-01-26 | 重庆邮电大学 | Social e-commerce platform marketing activity order rate prediction method facing hidden information |
CN112270568B (en) * | 2020-11-02 | 2022-07-12 | 重庆邮电大学 | Order rate prediction method for social e-commerce platform marketing campaign facing hidden information |
CN113407663A (en) * | 2020-11-05 | 2021-09-17 | 腾讯科技(深圳)有限公司 | Image-text content quality identification method and device based on artificial intelligence |
CN113407663B (en) * | 2020-11-05 | 2024-03-15 | 腾讯科技(深圳)有限公司 | Image-text content quality identification method and device based on artificial intelligence |
CN112492396A (en) * | 2020-12-08 | 2021-03-12 | 中国计量大学 | Short video click rate prediction method based on fine-grained multi-aspect analysis |
CN112559877A (en) * | 2020-12-24 | 2021-03-26 | 齐鲁工业大学 | CTR (click-through rate) estimation method and system based on cross-platform heterogeneous data and behavior context |
CN112633931A (en) * | 2020-12-28 | 2021-04-09 | 广州博冠信息科技有限公司 | Click rate prediction method, device, electronic equipment and medium |
CN112633937A (en) * | 2020-12-30 | 2021-04-09 | 上海数鸣人工智能科技有限公司 | Marketing prediction method based on dimension reduction of deep autoencoder and gradient boosting decision tree |
CN112633937B (en) * | 2020-12-30 | 2023-10-20 | 上海数鸣人工智能科技有限公司 | Marketing prediction method based on dimension reduction of deep autoencoder and GBDT (gradient boosting decision tree) |
CN112733918B (en) * | 2020-12-31 | 2023-08-29 | 中南大学 | Attention mechanism-based graph classification method and compound toxicity prediction method |
CN112733918A (en) * | 2020-12-31 | 2021-04-30 | 中南大学 | Graph classification method based on attention mechanism and compound toxicity prediction method |
CN113010774A (en) * | 2021-02-24 | 2021-06-22 | 四川省人工智能研究院(宜宾) | Click rate prediction method based on dynamic deep attention model |
CN113010774B (en) * | 2021-02-24 | 2023-04-07 | 四川省人工智能研究院(宜宾) | Click rate prediction method based on dynamic deep attention model |
CN113298084A (en) * | 2021-04-01 | 2021-08-24 | 山东师范大学 | Feature map extraction method and system for semantic segmentation |
CN113220974A (en) * | 2021-05-31 | 2021-08-06 | 北京爱奇艺科技有限公司 | Click rate prediction model training and search recall method, device, equipment and medium |
CN113220974B (en) * | 2021-05-31 | 2024-06-07 | 北京爱奇艺科技有限公司 | Click rate prediction model training and search recall method, device, equipment and medium |
CN113535800A (en) * | 2021-06-03 | 2021-10-22 | 同盾科技有限公司 | Feature representation method in credit scenario, electronic device, and storage medium |
CN113656272A (en) * | 2021-08-16 | 2021-11-16 | Oppo广东移动通信有限公司 | Data processing method and device, storage medium, user equipment and server |
CN114358364A (en) * | 2021-11-20 | 2022-04-15 | 重庆邮电大学 | Attention mechanism-based short video click rate big data estimation method |
CN114358364B (en) * | 2021-11-20 | 2024-06-07 | 上海愚见观池科技有限公司 | Short video click rate big data prediction method based on attention mechanism |
CN115809372B (en) * | 2023-02-03 | 2023-06-16 | 中国科学技术大学 | Click rate prediction model training method and device based on decoupling invariant learning |
CN115809372A (en) * | 2023-02-03 | 2023-03-17 | 中国科学技术大学 | Click rate prediction model training method and device based on decoupling invariant learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111538761A (en) | Click rate prediction method based on attention mechanism | |
CN109657156B (en) | Individualized recommendation method based on loop generation countermeasure network | |
Shiri et al. | A comprehensive overview and comparative analysis on deep learning models: CNN, RNN, LSTM, GRU | |
CN111209386B (en) | Personalized text recommendation method based on deep learning | |
CN111127146B (en) | Information recommendation method and system based on convolutional neural network and noise reduction self-encoder | |
Taylor et al. | Learning invariance through imitation | |
CN111259140B (en) | False comment detection method based on LSTM multi-entity feature fusion | |
Dhurandhar et al. | Tip: Typifying the interpretability of procedures | |
CN111177579B (en) | Application method of integrated diversity enhanced ultra-deep factorization machine model | |
Raza et al. | Understanding and using rough set based feature selection: concepts, techniques and applications | |
CN116469561A (en) | Breast cancer survival prediction method based on deep learning | |
CN113505307A (en) | Social network user region identification method based on weak supervision enhancement | |
CN116228368A (en) | Advertisement click rate prediction method based on deep multi-behavior network | |
Bhadoria et al. | Bunch graph based dimensionality reduction using auto-encoder for character recognition | |
Rijal et al. | Integrating Information Gain methods for Feature Selection in Distance Education Sentiment Analysis during Covid-19. | |
Lu et al. | Deep unsupervised learning using spike-timing-dependent plasticity | |
CN109934281B (en) | Unsupervised training method of two-class network | |
Julian et al. | Construction of Deep Representations | |
Sun et al. | Evolutionary Deep Neural Architecture Search: Fundamentals, Methods, and Recent Advances | |
CN112927248B (en) | Point cloud segmentation method based on local feature enhancement and conditional random field | |
CN115239967A (en) | Image generation method and device for generating countermeasure network based on Trans-CSN | |
CN112561599A (en) | Click rate prediction method based on attention network learning and fusing domain feature interaction | |
Chen et al. | Face recognition using DCT and hierarchical RBF model | |
Shen et al. | A deep embedding model for co-occurrence learning | |
CN113158577A (en) | Discrete data characterization learning method and system based on hierarchical coupling relation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200814 |