CN111538761A - Click rate prediction method based on attention mechanism - Google Patents

Info

Publication number
CN111538761A
Authority
CN
China
Prior art keywords
vector
layer
attention mechanism
click rate
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010317646.8A
Other languages
Chinese (zh)
Inventor
邓晓衡
刘良知
李海霞
刘梦杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University
Priority to CN202010317646.8A
Publication of CN111538761A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Algebra (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a click rate prediction method based on an attention mechanism, which comprises the following steps: step 1, preprocessing the user features, and performing one-hot encoding on user features of the same type to obtain a high-dimensional sparse feature vector; step 2, reducing the dimension of the high-dimensional sparse feature vector through an embedding vector, and feeding the dimension-reduced feature vector, as the input vector of the click rate model, into a compressed interactive network and a deep neural network respectively; step 3, performing a Hadamard product on the input initial feature vector and the input vector of each hidden layer, and taking the result as the input value of the next hidden layer, so that each additional hidden layer increases the feature combination by one order. The method comprehensively considers the low-dimensional features, the explicit high-dimensional features and the implicit high-dimensional features of the user, screens useful feature combinations through a self-attention mechanism, improves prediction efficiency, requires no manual feature extraction, and can extract high-dimensional feature combinations.

Description

Click rate prediction method based on attention mechanism
Technical Field
The invention relates to the technical field of internet application, in particular to a click rate prediction method based on an attention mechanism.
Background
With the explosive growth of internet information, the field of computer science, and artificial intelligence technology in particular, has made great progress. Artificial intelligence, as a branch of computer science and applied science, mainly studies how to use machines to simulate, extend and expand the mental processes of the human brain (such as memory, learning, reasoning and decision making). At present, artificial intelligence technology has been successfully applied to fields such as automatic driving, medical diagnosis, language recognition, image recognition and financial big data.
Although the industry has studied click rate estimation in depth, current models still face problems such as large data volume and sparse data, and the industry tends to use shallow models to address them. The shallow models are difficult to train, difficult to deploy in a production environment and weak in interpretability. They focus on constructing explicit combined features, by manually constructing features and applying simple operations among features, to improve the performance of the click rate estimation model, and they do not deeply mine implicit information such as the implicit combined features in the data and the highly nonlinear relations inherent in the features. The advertisement click rate estimation problem therefore has great research significance. The algorithms widely applied at present are generally the GBDT + LR model and the Wide & Deep model. However, these models require manual feature extraction and cannot extract high-dimensional feature combinations. Some models capable of automatic extraction, such as the DeepFM model, learn features implicitly, which easily leads to excessively high dimensionality. Although the Deep & Cross model can currently solve this problem, its interactions are at the element level and cannot represent interactions between feature vectors well.
Disclosure of Invention
The invention provides a click rate prediction method based on an attention mechanism, and aims to solve the problems that a traditional model needs manual feature extraction, high-dimensionality feature combination cannot be extracted, and dimensionality is easily overhigh.
In order to achieve the above object, an embodiment of the present invention provides a click rate prediction method based on an attention mechanism, including:
step 1, preprocessing the user features, and performing one-hot encoding on user features of the same type to obtain a high-dimensional sparse feature vector;
step 2, reducing the dimension of the high-dimensional sparse feature vector through an embedding vector, and feeding the dimension-reduced feature vector, as the input vector of the click rate model, into a compressed interactive network and a deep neural network respectively;
step 3, performing a Hadamard product on the input initial feature vector and the input vector of each hidden layer, and taking the result as the input value of the next hidden layer, so that each additional hidden layer increases the feature combination by one order;
step 4, passing the result vector obtained by each layer through an attention mechanism to obtain useful combined features, and sum-pooling the combined features;
step 5, compressing and concatenating the pooled result and the result obtained by the deep neural network into a new feature vector, and feeding the new feature vector into an output layer to obtain a predicted value.
Wherein, the step 1 specifically comprises:
collecting a data set $X = \{x_1, x_2, \ldots, x_N\}$ of user features, wherein $N$ is the total number of training samples, $x_i \in \{x_1, x_2, \ldots, x_N\}$, and $x_i$ represents the $i$-th user feature data to be processed.
Wherein, the step 1 further comprises:
the user features are converted into a high-dimensional sparse feature vector using one-hot encoding.
Wherein, the step 2 specifically comprises:
the high-dimensional sparse vectors are converted into low-dimensional combined features by an embedding layer, which maps the sparse vectors to relatively dense vectors whose elements are non-zero.
Wherein, the step 2 further comprises:
processing the raw data into data with a mean of 0 and a variance of 1 by a normalization method, wherein the normalized data is denoted $x_{norm}$ and is calculated as follows:
$$x_{norm} = \frac{x - \mu}{\sigma}$$
where $x$ denotes the continuous-valued data, $\mu$ denotes the mean of the original data, and $\sigma$ denotes the standard deviation of the original data.
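As a minimal sketch of this normalization, the following Python snippet standardizes one continuous feature, assuming that the data is centered on its mean and scaled by its population standard deviation. The function name `standardize` and the sample values are illustrative and are not taken from the patent.

```python
import statistics

def standardize(values):
    """Z-score: subtract the mean, then divide by the standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]

# Illustrative raw values of a single continuous feature.
raw = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
x_norm = standardize(raw)
```

After this transformation the feature has mean 0 and variance 1, which is the property the step requires.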
Wherein, the step 3 specifically comprises:
according to the feature vectors obtained by the embedding layer, the feature vectors are concatenated into an $m \times d$ matrix $x^0$, wherein $m$ is the number of feature vectors and $d$ is the dimension of each feature vector; $x^k \in \mathbb{R}^{H_k \times d}$ represents the state of the $k$-th hidden layer in the compressed interactive network, wherein $H_k$ represents the number of compressed feature vectors of the $k$-th hidden layer; the feature embedding layer is called the 0-th hidden layer, so $H_0 = m$. The state of each hidden layer $k$ in the compressed interactive network is calculated as:
$$x^k_{h,*} = \sum_{i=1}^{H_{k-1}} \sum_{j=1}^{m} W^{k,h}_{ij} \left( x^{k-1}_{i,*} \circ x^0_{j,*} \right)$$
wherein $1 \le h \le H_k$, $W^{k,h} \in \mathbb{R}^{H_{k-1} \times m}$ represents the parameter matrix of the $h$-th feature vector, and $\circ$ represents the Hadamard product, i.e. the element-wise product of two vectors. $x^k$ is obtained by explicit interaction between $x^{k-1}$ and $x^0$, so the order of $x^k$ is one higher than that of $x^{k-1}$, and the maximum order of the feature interactions increases by 1 each time a hidden layer is added to the compressed interactive network.
Wherein, the step 4 specifically comprises:
through a self-attention mechanism, the interaction vectors produced by each layer after vector interaction are given different weights, and the weighted results are sum-pooled to obtain a high-dimensional interaction result.
Wherein, the step 5 specifically comprises:
the vectors of the embedding layer are fed into a deep neural network to obtain a result after multilayer interaction; the result obtained by the deep neural network and the result obtained by the compressed interactive network are compressed and concatenated into a new matrix, which is fed into a single-layer perceptron to obtain the final result. The output formula is as follows:
$$\hat{y} = \sigma\left( w_{linear}^{T} x_f + w_{dnn}^{T} x_{dnn}^{k} + w_{cin}^{T} y_{cin} + b \right)$$
where $\sigma$ is the sigmoid function, $x_f$ is the original feature, $x_{dnn}^{k}$ is the output of the DNN output layer, $y_{cin}$ is the output of the CIN output layer, $w_{linear}$, $w_{dnn}$ and $w_{cin}$ denote the weights of the linear regression, the DNN output layer and the CIN output layer respectively, and $b$ is a learnable parameter.
Wherein, the step 5 further comprises:
the weight parameters of the model are continuously updated through the loss function and gradient descent, and the loss function is as follows:
$$L = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right)$$
wherein $\hat{y}_i$ represents the value predicted by the model, $y_i$ represents the true value of the actual data, and $N$ is the total number of training instances. The optimization process minimizes the following objective function:
$$J = L + \lambda \lVert \theta \rVert$$
where $\lambda$ represents the regularization term and $\theta$ represents the parameter set, including the parameters in the linear part, the CIN part and the DNN part.
The scheme of the invention has the following beneficial effects:
according to the click rate prediction method based on the attention mechanism, the dense vectors output by the Embedding layer interact in a manner similar to a residual network, and the results of the multiple interactions are sum-pooled through the attention mechanism. The result of the deep neural network and the result of the compressed interactive network are concatenated into a new vector, which is passed to the output layer to obtain the result, making the prediction more accurate and reliable. The method comprehensively considers the low-dimensional features, the explicit high-dimensional features and the implicit high-dimensional features of the user, and screens useful feature combinations through the attention mechanism, which improves prediction efficiency. High-dimensional feature combinations can be extracted without manual feature extraction, and excessively high dimensionality is not easily caused.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a model architecture diagram of the present invention;
FIG. 3 is a schematic diagram of each layer of the interactive network of the present invention;
FIG. 4 is a schematic diagram of the self attention mechanism summing pooling of the present invention;
FIG. 5 is a graph showing the results of the experiment according to the present invention;
FIG. 6 is a schematic diagram illustrating the influence of different numbers of network layers on the experimental results.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
The invention provides a click rate prediction method based on an attention mechanism, aiming at the problems that the existing model needs manual feature extraction, cannot extract feature combinations with high dimensionality and easily causes overhigh dimensionality.
As shown in FIG. 1 to FIG. 6, an embodiment of the present invention provides a click rate prediction method based on an attention mechanism, including: step 1, preprocessing the user features, and performing one-hot encoding on user features of the same type to obtain a high-dimensional sparse feature vector; step 2, reducing the dimension of the high-dimensional sparse feature vector through an embedding vector, and feeding the dimension-reduced feature vector, as the input vector of the click rate model, into a compressed interactive network and a deep neural network respectively; step 3, performing a Hadamard product on the input initial feature vector and the input vector of each hidden layer, and taking the result as the input value of the next hidden layer, so that each additional hidden layer increases the feature combination by one order; step 4, passing the result vector obtained by each layer through an attention mechanism to obtain useful combined features, and sum-pooling the combined features; step 5, compressing and concatenating the pooled result and the result obtained by the deep neural network into a new feature vector, and feeding the new feature vector into an output layer to obtain a predicted value.
Wherein, the step 1 specifically comprises: collecting a data set $X = \{x_1, x_2, \ldots, x_N\}$ of user features, wherein $N$ is the total number of training samples, $x_i \in \{x_1, x_2, \ldots, x_N\}$, and $x_i$ represents the $i$-th user feature data to be processed.
Wherein, the step 1 further comprises: the user features are converted into a high-dimensional sparse feature vector using one-hot encoding.
In the click rate prediction method based on the attention mechanism according to the above embodiment of the present invention, one-hot encoding is relatively simple: N states are encoded with an N-bit state register. For example, if the basic information of a user is user = [user ID = 02, gender = male, interest = rock and roll], the result converted according to the definition of one-hot encoding is a vector composed of 0s and 1s, such as user = [0, 1, 0, …, 0] [1, 0] [0, 1, 0, …, 0].
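The one-hot example above can be sketched in a few lines of Python. The vocabularies below are hypothetical (the patent does not list the possible values of each field); they are chosen only so that the user from the example encodes to one active bit per field.

```python
def one_hot(value, vocabulary):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    vec = [0] * len(vocabulary)
    vec[vocabulary.index(value)] = 1
    return vec

# Hypothetical vocabularies, one per feature field.
vocab = {
    "user_id": ["01", "02", "03", "04"],
    "gender": ["male", "female"],
    "interest": ["jazz", "rock and roll", "pop"],
}
user = {"user_id": "02", "gender": "male", "interest": "rock and roll"}

# Encode each field separately, then concatenate into one sparse vector.
encoded = {field: one_hot(user[field], vocab[field]) for field in vocab}
sparse_vector = [bit for field in vocab for bit in encoded[field]]
```

With realistic vocabularies (millions of user IDs, for instance) the concatenated vector becomes very long and almost entirely zero, which is why the next step reduces its dimension.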
Wherein, the step 2 specifically comprises: the high-dimensional sparse vectors are converted into low-dimensional combined features by an embedding layer, which maps the sparse vectors to relatively dense vectors whose elements are non-zero.
Wherein, the step 2 further comprises: processing the raw data into data with a mean of 0 and a variance of 1 by a normalization method, wherein the normalized data is denoted $x_{norm}$ and is calculated as follows:
$$x_{norm} = \frac{x - \mu}{\sigma}$$
where $x$ denotes the continuous-valued data, $\mu$ denotes the mean of the original data, and $\sigma$ denotes the standard deviation of the original data.
According to the click rate prediction method based on the attention mechanism, because the feature dimension produced by one-hot encoding is too high, an embedding layer is used to convert the features into low-dimensional combined features, mapping each sparse vector to a relatively dense vector whose elements are non-zero. The initial embedding vectors are generated from random numbers and are iterated continuously through gradient descent, so that accurate embedding vector values are finally obtained. For continuous values, the feature values need to be normalized: specifically, the original data is processed into data with a mean of 0 and a variance of 1 by a normalization method. This normalization method can change the distribution of the original data, is insensitive to abnormal values, and is suitable for big data scenarios.
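A minimal sketch of the embedding lookup, using NumPy, may help here. It assumes one embedding table per feature field; the field sizes, the embedding dimension `d`, and the initialization scale are all hypothetical. Training by gradient descent is not shown, only the random initialization and the lookup.

```python
import numpy as np

rng = np.random.default_rng(0)

field_sizes = [4, 2, 3]  # hypothetical vocabulary size of each field
d = 5                    # hypothetical embedding dimension

# One randomly initialised table per field; training would update
# these values by gradient descent (not shown here).
tables = [rng.normal(scale=0.01, size=(n, d)) for n in field_sizes]

def embed(indices):
    """Look up one row per field and stack them into an m x d matrix."""
    return np.stack([tables[f][i] for f, i in enumerate(indices)])

# Position of the active bit in each one-hot field of a sample.
x0 = embed([1, 0, 1])
```

Looking up row `i` of a table is equivalent to multiplying the one-hot vector by that table, so the lookup never materializes the sparse vector. The resulting `x0` has shape `(m, d)` with `m = 3` fields.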
According to the feature vectors obtained by the embedding layer, the feature vectors are concatenated into an $m \times d$ matrix $x^0$, wherein $m$ is the number of feature vectors and $d$ is the dimension of each feature vector; $x^k \in \mathbb{R}^{H_k \times d}$ represents the state of the $k$-th hidden layer in the compressed interactive network, wherein $H_k$ represents the number of compressed feature vectors of the $k$-th hidden layer; the feature embedding layer is called the 0-th hidden layer, so $H_0 = m$. The state of each hidden layer $k$ in the compressed interactive network is calculated as:
$$x^k_{h,*} = \sum_{i=1}^{H_{k-1}} \sum_{j=1}^{m} W^{k,h}_{ij} \left( x^{k-1}_{i,*} \circ x^0_{j,*} \right)$$
wherein $1 \le h \le H_k$, $W^{k,h} \in \mathbb{R}^{H_{k-1} \times m}$ represents the parameter matrix of the $h$-th feature vector, and $\circ$ represents the Hadamard product, i.e. the element-wise product of two vectors. $x^k$ is obtained by explicit interaction between $x^{k-1}$ and $x^0$, so the order of $x^k$ is one higher than that of $x^{k-1}$, and the maximum order of the feature interactions increases by 1 each time a hidden layer is added to the compressed interactive network.
In the click rate prediction method based on the attention mechanism according to the above embodiment of the present invention, the Hadamard product is the product of the corresponding elements of two vectors, for example,
$$\langle a_1, a_2, a_3 \rangle \circ \langle b_1, b_2, b_3 \rangle = \langle a_1 b_1, a_2 b_2, a_3 b_3 \rangle$$
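To make the layer-by-layer interaction concrete, here is a NumPy sketch of one compressed interactive network layer. It assumes the usual CIN form, in which each output row is a weighted sum of Hadamard products between every row of the previous layer and every row of the embedding matrix. The sizes `m`, `d` and `layer_sizes` and the random weights are hypothetical.

```python
import numpy as np

def cin_layer(x_prev, x0, w):
    """One CIN layer.

    x_prev: (H_prev, d) state of layer k-1
    x0:     (m, d) embedding matrix (layer 0)
    w:      (H_k, H_prev, m) one parameter matrix per output row
    returns (H_k, d) state of layer k
    """
    # Hadamard product of every row of x_prev with every row of x0.
    pairwise = x_prev[:, None, :] * x0[None, :, :]  # (H_prev, m, d)
    # Weighted sum over all (i, j) pairs for each output row h.
    return np.einsum("hij,ijd->hd", w, pairwise)

rng = np.random.default_rng(0)
m, d = 3, 4
layer_sizes = [5, 5]  # hypothetical H_1, H_2

x0 = rng.normal(size=(m, d))
states = [x0]
for h_k in layer_sizes:
    h_prev = states[-1].shape[0]
    w = rng.normal(size=(h_k, h_prev, m))
    states.append(cin_layer(states[-1], x0, w))
```

Because every layer multiplies its input by `x0` once more, `states[1]` holds second-order interactions, `states[2]` third-order ones, and so on. The embedding dimension `d` is preserved at every layer, so interactions stay at the vector level.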
Wherein, the step 4 specifically comprises: through a self-attention mechanism, the interaction vectors produced by each layer after vector interaction are given different weights, and the weighted results are sum-pooled to obtain a high-dimensional interaction result.
According to the click rate prediction method based on the attention mechanism of the above embodiment of the present invention, because vector interaction has the drawback of high time complexity, the self-attention mechanism assigns different weights to the different interaction vectors produced by each layer, which can save a large amount of time.
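The text does not give the attention scoring formula, so the following NumPy sketch only illustrates one common choice: a small one-layer scoring network followed by a softmax, with the weighted interaction vectors then summed. The scoring parameters `w`, `b`, `q` and all sizes are assumptions made for illustration, not the patented formulation.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attention_sum_pool(x, w, b, q):
    """Weight each interaction vector, then sum-pool.

    x: (H, d) interaction vectors of one layer
    w: (d, t), b: (t,), q: (t,) hypothetical scoring parameters
    returns the pooled (d,) vector and the (H,) attention weights
    """
    hidden = np.maximum(x @ w + b, 0.0)  # (H, t) ReLU scoring layer
    weights = softmax(hidden @ q)        # (H,), sums to 1
    pooled = (weights[:, None] * x).sum(axis=0)
    return pooled, weights

rng = np.random.default_rng(0)
H, d, t = 5, 4, 8
x_k = rng.normal(size=(H, d))
w, b, q = rng.normal(size=(d, t)), np.zeros(t), rng.normal(size=t)
pooled, weights = attention_sum_pool(x_k, w, b, q)
```

Interaction vectors with larger weights contribute more to the pooled result, which is how less useful feature combinations are suppressed.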
Wherein, the step 5 specifically comprises: the vectors of the embedding layer are fed into a deep neural network to obtain a result after multilayer interaction; the result obtained by the deep neural network and the result obtained by the compressed interactive network are compressed and concatenated into a new matrix, which is fed into a single-layer perceptron to obtain the final result. The output formula is as follows:
$$\hat{y} = \sigma\left( w_{linear}^{T} x_f + w_{dnn}^{T} x_{dnn}^{k} + w_{cin}^{T} y_{cin} + b \right)$$
where $\sigma$ is the sigmoid function, $x_f$ is the original feature, $x_{dnn}^{k}$ is the output of the DNN output layer, $y_{cin}$ is the output of the CIN output layer, $w_{linear}$, $w_{dnn}$ and $w_{cin}$ denote the weights of the linear regression, the DNN output layer and the CIN output layer respectively, and $b$ is a learnable parameter.
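The output layer can be sketched as a sigmoid over the sum of three linear terms (raw features, DNN output, CIN output) plus a bias. The following NumPy snippet is an illustrative sketch; the vector sizes and random values are placeholders, and the names `x_dnn`, `w_linear`, `w_dnn` and `w_cin` are assumed labels for the DNN output and the three weight vectors.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x_f, x_dnn, y_cin, w_linear, w_dnn, w_cin, b):
    """Combine the linear, DNN and CIN parts into a click probability."""
    z = w_linear @ x_f + w_dnn @ x_dnn + w_cin @ y_cin + b
    return sigmoid(z)

rng = np.random.default_rng(0)
x_f = np.array([0, 1, 0, 0, 1, 0, 0, 1, 0], dtype=float)  # raw one-hot features
x_dnn = rng.normal(size=400)   # placeholder DNN output
y_cin = rng.normal(size=200)   # placeholder CIN output
w_linear = rng.normal(scale=0.01, size=x_f.size)
w_dnn = rng.normal(scale=0.01, size=x_dnn.size)
w_cin = rng.normal(scale=0.01, size=y_cin.size)

y_hat = predict(x_f, x_dnn, y_cin, w_linear, w_dnn, w_cin, b=0.0)
```

Summing the three weighted terms is equivalent to concatenating the three vectors and applying a single-layer perceptron to the result.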
Wherein, the step 5 further comprises: the weight parameters of the model are continuously updated through the loss function and gradient descent, and the loss function is as follows:
$$L = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right)$$
wherein $\hat{y}_i$ represents the value predicted by the model, $y_i$ represents the true value of the actual data, and $N$ is the total number of training instances. The optimization process minimizes the following objective function:
$$J = L + \lambda \lVert \theta \rVert$$
where $\lambda$ represents the regularization term and $\theta$ represents the parameter set, including the parameters in the linear part, the CIN part and the DNN part.
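The training objective can be illustrated with a short Python sketch. It assumes the loss is the binary cross-entropy (log loss) commonly used for click rate prediction and that the penalty is a squared L2 norm of the parameters; both are assumptions, as are the function names and the clipping constant `eps`.

```python
import math

def log_loss(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

def objective(y_true, y_pred, params, lam):
    """Loss plus a squared-L2 penalty on a flat list of parameters (assumed form)."""
    penalty = sum(w * w for w in params)
    return log_loss(y_true, y_pred) + lam * penalty

# Two illustrative samples: one click, one non-click, both predicted at 0.5.
loss = log_loss([1, 0], [0.5, 0.5])
total = objective([1, 0], [0.5, 0.5], params=[1.0, 2.0], lam=0.1)
```

A gradient descent optimizer would repeatedly adjust the parameters in the linear, CIN and DNN parts to reduce this objective.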
In the click rate prediction method based on the attention mechanism according to the embodiment of the present invention, the experimental part of model training and prediction adopts two public industry data sets: the Criteo data set for large-scale advertisement click rate prediction and the Frappe data set for context-based APP recommendation. The Criteo data set contains 7 consecutive days of user behavior logs, about 11 GB in total with about 41 million historical records. Each training sample comprises 39 data features from different fields, wherein I1 to I13 are 13 continuous-valued anonymous features and C1 to C26 are discrete-valued anonymous features. These desensitized anonymous features mainly comprise user features, item features and environment features, and the specific meaning of each field is not disclosed. The other data set is the Frappe data set for APP recommendation. In addition to the user ID and the item ID, each log contains 8 contextual categorical features such as weather, city and time. The 10 fields C1 to C10 in the Frappe data set are all categorical features, and there are no numerical features. The Frappe data set is relatively small, with a total of 288609 training samples.
For both the Criteo data set and the Frappe data set, 1/10 of the data is randomly selected as the validation set, and the remaining data is used as the training set. The click rate prediction method based on the attention mechanism is implemented based on Tensorflow3+ python3.6, and the optimal set of hyper-parameters for each model is found by grid search on the validation set. The optimization method is Adam with a learning rate of 0.001 and a batch size of 4096, and L2 regularization with a coefficient of 0.0001 is used. The default number of hidden nodes is 400 in the DNN output layer; in the CIN output layer it is 200 on the Criteo data set and 100 on the Frappe data set. For the CrossNet in Deep & Cross and for the CIN model, because the data sets differ, experiments are performed with different hidden-layer depths, and the best experimental result of each model is compared.
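The data split and training settings described above can be written down as a short Python sketch. The function `split_train_validation`, the fixed seed and the dictionary keys are illustrative names, not part of the patent; the numeric values are the ones stated in the text.

```python
import random

def split_train_validation(samples, validation_fraction=0.1, seed=0):
    """Randomly hold out a fraction of the samples as the validation set."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_val = round(len(samples) * validation_fraction)
    val_idx = set(indices[:n_val])
    train = [s for i, s in enumerate(samples) if i not in val_idx]
    validation = [s for i, s in enumerate(samples) if i in val_idx]
    return train, validation

# Settings as stated in the experiment description.
hyperparameters = {
    "optimizer": "Adam",
    "learning_rate": 0.001,
    "batch_size": 4096,
    "l2_coefficient": 0.0001,
    "dnn_hidden_nodes": 400,
    "cin_hidden_nodes": {"Criteo": 200, "Frappe": 100},
}

samples = list(range(100))  # stand-in for real log records
train, validation = split_train_validation(samples)
```

A grid search would then train one model per candidate setting on `train` and keep the setting that scores best on `validation`.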
As shown in FIG. 5, the click rate prediction method based on the attention mechanism (Ours) is compared with the experimental results of other models. As can be seen from the experimental results, the LR model performs worst among all models, because it can only process simple, low-dimensional feature combinations; this indicates that it is very necessary to extract implicit features from sparse data by deep learning methods. The other models trained through deep learning, such as PNN, Wide & Deep, DeepFM and Deep & Cross, perform better than the FM model. This shows that real data features are generally very complex: the FM model can only process second-order feature interactions and cannot handle interactions of third order and above well, so it is not very effective at high-dimensional feature interaction. The hybrid models DeepFM and Deep & Cross perform better than the PNN model, which considers only high-dimensional features, indicating that low-dimensional and high-dimensional interactive features need to be considered simultaneously. The Wide & Deep model performs worse than the PNN model because its feature combinations are still constructed manually. The prediction result of the click rate prediction method based on the attention mechanism is better than the three hybrid models Wide & Deep, DeepFM and Deep & Cross, which indicates the need to further subdivide the explicit features into high-dimensional and low-dimensional features; combining them with the training of the implicit high-dimensional features (the features trained by the DNN) achieves a certain improvement.
Compared with computer vision networks, which are dozens of layers deep, the model of the click rate prediction method based on the attention mechanism is not particularly deep, and a good effect can be achieved with only about 3 layers. As can be seen from FIG. 6, when the number of network layers is less than 3, the training result of the model improves as layers are added, and when the number of network layers is greater than 3, the training result declines. This indicates that a deeper, more complex network does not train better and is prone to overfitting.
The click rate prediction method based on the attention mechanism according to the above embodiment of the present invention maps user features of the same type into high-dimensional sparse vectors by one-hot encoding, converts them into low-dimensional dense vectors through the Embedding layer, and feeds the feature vectors into a compressed interactive network and a deep neural network respectively. In the compressed interactive network, the input value of the next layer is obtained by the element-wise product of the initial input value and the input value of the hidden layer, and the input vector of each hidden layer is obtained through the calculation of a plurality of hidden layers. The attention mechanism computes weights for the input vector of each hidden layer, and the high-dimensional explicit interaction result is obtained by sum pooling. The result learned by the deep neural network and the result of the compressed interactive network are concatenated and then output through an activation function. On the Criteo and Frappe public data sets, the method comprehensively considers the low-dimensional features, the explicit high-dimensional features and the implicit high-dimensional features of users, and screens useful feature combinations through the attention mechanism, which improves prediction efficiency. No manual feature extraction is needed, high-dimensional feature combinations can be extracted, and excessively high dimensionality is not easily caused, which improves the capability of the wide-and-deep model to extract complex combined features. By multiplying at the vector level instead of the element level and by fusing the attention mechanism, the click rate prediction method based on the attention mechanism achieves a good prediction effect.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (9)

1. A click rate prediction method based on an attention mechanism is characterized by comprising the following steps:
step 1, preprocessing the user features, and performing one-hot encoding on user features of the same type to obtain a high-dimensional sparse feature vector;
step 2, reducing the dimension of the high-dimensional sparse feature vector through an embedding vector, and feeding the dimension-reduced feature vector, as the input vector of the click rate model, into a compressed interactive network and a deep neural network respectively;
step 3, performing a Hadamard product on the input initial feature vector and the input vector of each hidden layer, and taking the result as the input value of the next hidden layer, so that each additional hidden layer increases the feature combination by one order;
step 4, passing the result vector obtained by each layer through an attention mechanism to obtain useful combined features, and sum-pooling the combined features;
step 5, compressing and concatenating the pooled result and the result obtained by the deep neural network into a new feature vector, and feeding the new feature vector into an output layer to obtain a predicted value.
2. The attention mechanism-based click rate prediction method according to claim 1, wherein the step 1 specifically comprises:
collecting a data set $X = \{x_1, x_2, \ldots, x_N\}$ of user features, wherein $N$ is the total number of training samples, $x_i \in \{x_1, x_2, \ldots, x_N\}$, and $x_i$ represents the $i$-th user feature data to be processed.
3. The attention mechanism-based click rate prediction method according to claim 2, wherein the step 1 further comprises:
the user features are converted into a high-dimensional sparse feature vector using one-hot encoding.
4. The attention mechanism-based click rate prediction method according to claim 3, wherein the step 2 specifically comprises:
the sparse vectors are converted by an embedding layer into low-dimensional features, that is, mapped into relatively dense vectors whose elements are non-zero.
5. The attention mechanism-based click rate prediction method of claim 4, wherein the step 2 further comprises:
processing the raw data into data with a mean of 0 and a variance of 1 by a standardization method, wherein the normalized data is denoted $x_{norm}$ and is calculated as follows:
$$x_{norm} = \frac{x - \mu}{\sigma}$$
where $x$ denotes continuous-valued data, $\mu$ denotes the mean of the original data, and $\sigma$ denotes the standard deviation of the original data.
6. The attention mechanism-based click rate prediction method according to claim 5, wherein the step 3 specifically comprises:
splicing the feature vectors obtained by the embedding layer into an $m \times d$ matrix $X^0$, wherein $m$ is the number of feature vectors and $d$ is the dimension of each feature vector; $X^k$ represents the state of the $k$-th hidden layer in the compressed interaction network,
$$X^k \in \mathbb{R}^{H_k \times d}$$
is a matrix in which $H_k$ represents the number of compressed feature vectors of the $k$-th hidden layer; the feature embedding layer is regarded as the 0-th hidden layer, with $H_0 = m$; the state of each hidden layer $k$ in the compressed interaction network is calculated as:
$$X^k_{h,*} = \sum_{i=1}^{H_{k-1}} \sum_{j=1}^{m} W^{k,h}_{ij} \left( X^{k-1}_{i,*} \circ X^0_{j,*} \right)$$
wherein $1 \le h \le H_k$,
$$W^{k,h} \in \mathbb{R}^{H_{k-1} \times m}$$
represents the parameter matrix of the $h$-th feature vector, and $\circ$ represents the Hadamard product, i.e. the element-wise product of two vectors; $X^k$ is obtained by explicit interaction of $X^{k-1}$ with $X^0$, so the order of $X^k$ is one higher than that of $X^{k-1}$, and the maximum order of the obtained feature interactions increases by 1 each time a hidden layer is added to the compressed interaction network.
7. The attention mechanism-based click rate prediction method according to claim 6, wherein the step 4 specifically comprises:
the results of each layer after vector interaction are passed through a self-attention mechanism, which assigns different weights to different interaction vectors, and the weighted results are sum-pooled to obtain a high-dimensional interaction result.
8. The attention mechanism-based click rate prediction method according to claim 7, wherein the step 5 specifically comprises:
feeding the embedding-layer vector into a deep neural network to obtain a result after multi-layer interaction, compressing and concatenating the result of the deep neural network and the result of the compressed interaction network into a new matrix, and feeding the new matrix into a single-layer perceptron to obtain the final result, the output being given by:
$$\hat{y} = \sigma\left( w_{linear}^{T} x_f + w_{dnn}^{T} y_{dnn} + w_{cin}^{T} y_{cin} + b \right)$$
where $\sigma$ is the sigmoid function, $x_f$ is the original feature, $y_{dnn}$ is the output of the DNN output layer, $y_{cin}$ is the output of the CIN output layer, $w_{linear}$, $w_{dnn}$ and $w_{cin}$ are the weight matrices of the linear regression, the DNN output layer and the CIN output layer respectively, and $b$ is a learnable parameter.
9. The attention mechanism-based click rate prediction method of claim 8, wherein the step 5 further comprises:
continuously updating the weight parameters of the model through the loss function and gradient descent, the loss function being:
$$L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right]$$
wherein $\hat{y}_i$ represents the value predicted by the model, $y_i$ represents the true value of the actual data, and $N$ is the total number of training instances; the optimization process minimizes the following objective function:
$$J = L + \lambda \lVert \theta \rVert$$
where $\lambda$ represents the regularization coefficient and $\theta$ represents the parameter set, including the parameters of the linear part, the CIN part and the DNN part.
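The data flow recited in claims 6 to 9 can be illustrated with a minimal pure-Python sketch. It is an illustration under stated assumptions, not the applicant's implementation: the function names (`cin_layer`, `attention_sum_pool`, `predict`, `log_loss`) are hypothetical, the attention scoring uses a simple dot product against a hypothetical learnable `query` vector as a stand-in for the self-attention mechanism of claim 7, the deep neural network is represented only by its precomputed output `y_dnn`, and the linear term and regularization are omitted for brevity.

```python
import math


def cin_layer(x_prev, x0, weights):
    """One compressed-interaction hidden layer.

    x_prev  -- H_{k-1} vectors of length d (state of the previous layer)
    x0      -- m embedding vectors of length d (layer 0)
    weights -- H_k parameter matrices, each of shape H_{k-1} x m
    Returns H_k vectors of length d.
    """
    d = len(x0[0])
    out = []
    for w in weights:  # one parameter matrix per output vector h
        row = [0.0] * d
        for i, xi in enumerate(x_prev):
            for j, xj in enumerate(x0):
                for t in range(d):
                    # weighted element-wise (Hadamard) product
                    row[t] += w[i][j] * xi[t] * xj[t]
        out.append(row)
    return out


def softmax(scores):
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_sum_pool(rows, query):
    """Assumed attention: score each vector against `query`, then sum-pool."""
    scores = [sum(q * r for q, r in zip(query, row)) for row in rows]
    alphas = softmax(scores)
    # each interaction vector contributes its attention weight times its sum
    return [a * sum(row) for a, row in zip(alphas, rows)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x0, cin_weights, query, w_cin, y_dnn, w_dnn, b):
    """Stack CIN layers, pool each layer, and combine with the DNN output."""
    pooled, x_k = [], x0
    for weights in cin_weights:
        x_k = cin_layer(x_k, x0, weights)
        pooled.extend(attention_sum_pool(x_k, query))
    z = sum(w * p for w, p in zip(w_cin, pooled))
    z += sum(w * y for w, y in zip(w_dnn, y_dnn))
    return sigmoid(z + b)


def log_loss(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy over N training instances."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)
```

In this sketch `w_cin` must have one entry per pooled value, i.e. its length equals the sum of the $H_k$ over all CIN layers. In practice the nested loops would be replaced by tensor operations in a deep learning framework, and the weights would be learned by gradient descent on the loss rather than supplied by hand.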
CN202010317646.8A 2020-04-21 2020-04-21 Click rate prediction method based on attention mechanism Pending CN111538761A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010317646.8A CN111538761A (en) 2020-04-21 2020-04-21 Click rate prediction method based on attention mechanism

Publications (1)

Publication Number Publication Date
CN111538761A true CN111538761A (en) 2020-08-14

Family

ID=71979143

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010317646.8A Pending CN111538761A (en) 2020-04-21 2020-04-21 Click rate prediction method based on attention mechanism

Country Status (1)

Country Link
CN (1) CN111538761A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737586A (en) * 2020-08-19 2020-10-02 腾讯科技(深圳)有限公司 Information recommendation method, device, equipment and computer readable storage medium
CN112270568A (en) * 2020-11-02 2021-01-26 重庆邮电大学 Social e-commerce platform marketing activity order rate prediction method facing hidden information
CN112492396A (en) * 2020-12-08 2021-03-12 中国计量大学 Short video click rate prediction method based on fine-grained multi-aspect analysis
CN112559877A (en) * 2020-12-24 2021-03-26 齐鲁工业大学 CTR (click-through rate) estimation method and system based on cross-platform heterogeneous data and behavior context
CN112633937A (en) * 2020-12-30 2021-04-09 上海数鸣人工智能科技有限公司 Marketing prediction method based on dimension reduction of deep autoencoder and gradient boosting decision tree
CN112633931A (en) * 2020-12-28 2021-04-09 广州博冠信息科技有限公司 Click rate prediction method, device, electronic equipment and medium
CN112733918A (en) * 2020-12-31 2021-04-30 中南大学 Graph classification method based on attention mechanism and compound toxicity prediction method
CN113010774A (en) * 2021-02-24 2021-06-22 四川省人工智能研究院(宜宾) Click rate prediction method based on dynamic deep attention model
CN113220974A (en) * 2021-05-31 2021-08-06 北京爱奇艺科技有限公司 Click rate prediction model training and search recall method, device, equipment and medium
CN113298084A (en) * 2021-04-01 2021-08-24 山东师范大学 Feature map extraction method and system for semantic segmentation
CN113407663A (en) * 2020-11-05 2021-09-17 腾讯科技(深圳)有限公司 Image-text content quality identification method and device based on artificial intelligence
CN113535800A (en) * 2021-06-03 2021-10-22 同盾科技有限公司 Feature representation method in credit scenario, electronic device, and storage medium
CN113656272A (en) * 2021-08-16 2021-11-16 Oppo广东移动通信有限公司 Data processing method and device, storage medium, user equipment and server
CN114358364A (en) * 2021-11-20 2022-04-15 重庆邮电大学 Attention mechanism-based short video click rate big data estimation method
CN115809372A (en) * 2023-02-03 2023-03-17 中国科学技术大学 Click rate prediction model training method and device based on decoupling invariant learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109062893A (en) * 2018-07-13 2018-12-21 华南理工大学 Product name recognition method based on full-text attention mechanism
CN109960759A (en) * 2019-03-22 2019-07-02 中山大学 Recommender system click rate prediction method based on deep neural network
WO2019240900A1 (en) * 2018-06-12 2019-12-19 Siemens Aktiengesellschaft Attention loss based deep neural network training
US20200073937A1 (en) * 2018-08-30 2020-03-05 International Business Machines Corporation Multi-aspect sentiment analysis by collaborative attention allocation
CN110991464A (en) * 2019-11-08 2020-04-10 华南理工大学 Commodity click rate prediction method based on deep multi-mode data fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIANXUN LIAN等: "xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems" *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737586A (en) * 2020-08-19 2020-10-02 腾讯科技(深圳)有限公司 Information recommendation method, device, equipment and computer readable storage medium
CN112270568A (en) * 2020-11-02 2021-01-26 重庆邮电大学 Social e-commerce platform marketing activity order rate prediction method facing hidden information
CN112270568B (en) * 2020-11-02 2022-07-12 重庆邮电大学 Order rate prediction method for social e-commerce platform marketing campaign facing hidden information
CN113407663A (en) * 2020-11-05 2021-09-17 腾讯科技(深圳)有限公司 Image-text content quality identification method and device based on artificial intelligence
CN113407663B (en) * 2020-11-05 2024-03-15 腾讯科技(深圳)有限公司 Image-text content quality identification method and device based on artificial intelligence
CN112492396A (en) * 2020-12-08 2021-03-12 中国计量大学 Short video click rate prediction method based on fine-grained multi-aspect analysis
CN112559877A (en) * 2020-12-24 2021-03-26 齐鲁工业大学 CTR (click-through rate) estimation method and system based on cross-platform heterogeneous data and behavior context
CN112633931A (en) * 2020-12-28 2021-04-09 广州博冠信息科技有限公司 Click rate prediction method, device, electronic equipment and medium
CN112633937A (en) * 2020-12-30 2021-04-09 上海数鸣人工智能科技有限公司 Marketing prediction method based on dimension reduction of deep autoencoder and gradient boosting decision tree
CN112633937B (en) * 2020-12-30 2023-10-20 上海数鸣人工智能科技有限公司 Marketing prediction method based on deep autoencoder dimension reduction and GBDT (gradient boosting decision tree)
CN112733918B (en) * 2020-12-31 2023-08-29 中南大学 Attention mechanism-based graph classification method and compound toxicity prediction method
CN112733918A (en) * 2020-12-31 2021-04-30 中南大学 Graph classification method based on attention mechanism and compound toxicity prediction method
CN113010774A (en) * 2021-02-24 2021-06-22 四川省人工智能研究院(宜宾) Click rate prediction method based on dynamic deep attention model
CN113010774B (en) * 2021-02-24 2023-04-07 四川省人工智能研究院(宜宾) Click rate prediction method based on dynamic deep attention model
CN113298084A (en) * 2021-04-01 2021-08-24 山东师范大学 Feature map extraction method and system for semantic segmentation
CN113220974A (en) * 2021-05-31 2021-08-06 北京爱奇艺科技有限公司 Click rate prediction model training and search recall method, device, equipment and medium
CN113220974B (en) * 2021-05-31 2024-06-07 北京爱奇艺科技有限公司 Click rate prediction model training and search recall method, device, equipment and medium
CN113535800A (en) * 2021-06-03 2021-10-22 同盾科技有限公司 Feature representation method in credit scenario, electronic device, and storage medium
CN113656272A (en) * 2021-08-16 2021-11-16 Oppo广东移动通信有限公司 Data processing method and device, storage medium, user equipment and server
CN114358364A (en) * 2021-11-20 2022-04-15 重庆邮电大学 Attention mechanism-based short video click rate big data estimation method
CN114358364B (en) * 2021-11-20 2024-06-07 上海愚见观池科技有限公司 Short video click rate big data prediction method based on attention mechanism
CN115809372B (en) * 2023-02-03 2023-06-16 中国科学技术大学 Click rate prediction model training method and device based on decoupling invariant learning
CN115809372A (en) * 2023-02-03 2023-03-17 中国科学技术大学 Click rate prediction model training method and device based on decoupling invariant learning

Similar Documents

Publication Publication Date Title
CN111538761A (en) Click rate prediction method based on attention mechanism
CN109657156B (en) Individualized recommendation method based on loop generation countermeasure network
Shiri et al. A comprehensive overview and comparative analysis on deep learning models: CNN, RNN, LSTM, GRU
CN111209386B (en) Personalized text recommendation method based on deep learning
CN111127146B (en) Information recommendation method and system based on convolutional neural network and noise reduction self-encoder
Taylor et al. Learning invariance through imitation
CN111259140B (en) False comment detection method based on LSTM multi-entity feature fusion
Dhurandhar et al. Tip: Typifying the interpretability of procedures
CN111177579B (en) Application method of integrated diversity enhanced ultra-deep factorization machine model
Raza et al. Understanding and using rough set based feature selection: concepts, techniques and applications
CN116469561A (en) Breast cancer survival prediction method based on deep learning
CN113505307A (en) Social network user region identification method based on weak supervision enhancement
CN116228368A (en) Advertisement click rate prediction method based on deep multi-behavior network
Bhadoria et al. Bunch graph based dimensionality reduction using auto-encoder for character recognition
Rijal et al. Integrating Information Gain methods for Feature Selection in Distance Education Sentiment Analysis during Covid-19.
Lu et al. Deep unsupervised learning using spike-timing-dependent plasticity
CN109934281B (en) Unsupervised training method of two-class network
Julian et al. Construction of Deep Representations
Sun et al. Evolutionary Deep Neural Architecture Search: Fundamentals, Methods, and Recent Advances
CN112927248B (en) Point cloud segmentation method based on local feature enhancement and conditional random field
CN115239967A (en) Image generation method and device for generating countermeasure network based on Trans-CSN
CN112561599A (en) Click rate prediction method based on attention network learning and fusing domain feature interaction
Chen et al. Face recognition using DCT and hierarchical RBF model
Shen et al. A deep embedding model for co-occurrence learning
CN113158577A (en) Discrete data characterization learning method and system based on hierarchical coupling relation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200814)