CN111563770A - Click rate estimation method based on feature differentiation learning - Google Patents

Click rate estimation method based on feature differentiation learning

Info

Publication number
CN111563770A
CN111563770A (application CN202010342981.3A)
Authority
CN
China
Prior art keywords
feature
vector
features
neural network
vectors
Prior art date
Legal status
Pending
Application number
CN202010342981.3A
Other languages
Chinese (zh)
Inventor
郑小林
杨煜溟
Current Assignee
Hangzhou Jztdata Technology Co ltd
Original Assignee
Hangzhou Jztdata Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Jztdata Technology Co ltd filed Critical Hangzhou Jztdata Technology Co ltd
Priority to CN202010342981.3A priority Critical patent/CN111563770A/en
Publication of CN111563770A publication Critical patent/CN111563770A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0251Targeted advertisements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0277Online advertisement

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to recommendation click-rate estimation technology and aims to provide a click rate estimation method based on feature differentiation learning. The method comprises the following steps: first, constructing input vectors for the original features to obtain a low-dimensional feature-vector representation of each original feature; constructing a neural network with feature combination capability, obtaining combined feature vectors, and constructing the network output; then, applying the proposed differentiated activation constraint to control the similarity between feature vectors and improve the integrity of feature-vector expression; using the existing compression-excitation network to differentiate feature importance, improving the neural network's ability to discriminate among features; and finally, jointly training the neural network with feature combination capability and a deep neural network to obtain the final predicted value. The method can improve the click-rate estimation model's ability to judge the effectiveness of combined features; it can deeply analyze the original features, accurately characterize the combination relationships of feature vectors, and effectively predict the probability that recommended content is clicked by the user.

Description

Click rate estimation method based on feature differentiation learning
Technical Field
The invention relates to the field of recommended click rate estimation, in particular to a click rate estimation method based on feature differentiation learning.
Background
With the development of information technology and the internet, people have gradually moved from an era of information scarcity to an era of information overload. The complexity and heterogeneity of massive information make information acquisition difficult and time-consuming, posing great challenges to both information consumers and information producers. More and more internet applications have successfully introduced recommendation systems, which are widely used in e-commerce, movies and videos, music, social networks, location-based services, personalized advertising, and other fields.
The core task of a recommendation system is to present content matching a user's interests in a specific context. The click-through rate (CTR) describes the probability that displayed content is clicked by the user, and CTR estimation refers to predicting, through data-mining techniques applied to contextual data about the user and the content, the probability that content recommended to a user in a specific context will be clicked by that user. Whether recommended content is clicked reflects whether the currently displayed content matches the user's interests; click-rate estimation algorithms are therefore widely used in the content ranking stage of recommendation systems to generate recommendation lists matching users' interests and habits, thereby improving user satisfaction with the recommended content, increasing users' time spent in the application, or increasing the revenue from advertising within the application.
Research on recommendation systems is linked to many related fields such as user modeling, machine learning, and information retrieval; owing to its growing importance, it evolved into an independent research field in the 1990s. The recommendation problem is defined as estimating how a user would score unseen items, so that the item with the highest score estimate can be recommended to the user. In the recommendation process, recommendation accuracy, diversity, algorithmic efficiency, and other issues are the key focuses of recommendation-algorithm research.
The recommendation system can be regarded as a search ranking system, that is, given a query, a recommendation task finds relevant items from a database in a recall stage, then in a ranking stage, a recalled content subset is further ranked based on a target score estimated by a user click rate, and then content distribution is performed in combination with a strategy. In the recommendation sorting stage, the accurate estimation of the click rate of the user has important guiding effects on improving the value of traffic and increasing the advertising revenue, so that the prediction of the recommended click rate is a research direction with engineering and academic significance at the same time.
With the rapid development of the mobile internet, the feature dimensions and forms of content recommendation are increasingly large and diversified, and meanwhile, the structure of a recommendation algorithm model is also developed from a shallow layer to a deep layer, and the recommendation algorithm model is mainly divided into two types of click rate estimation methods, namely a traditional machine learning model and a deep learning model.
For user click-through-rate estimation, a method widely adopted in industry combines manual feature engineering with a linear logistic regression model; the linear model has the advantages of a simple structure, easy maintenance, the ability to handle discretized features, and support for distributed computation. However, linear models lack the ability to capture implicit features and require extensive manual feature engineering to achieve good prediction results. For example, an important task in feature engineering is cross-extraction on categorical features: the original categorical features are independent, and combining potentially associated features is more beneficial to the model's predictions. However, conventional cross-feature engineering creates several problems. First, obtaining high-quality manual features is costly: data scientists spend a great deal of time exploring potential patterns in product data to design meaningful cross features for a particular task. In addition, in internet-scale recommendation systems, the original dimensionality of the data is typically in the thousands, making it impractical to extract all combined features by hand. Moreover, manually extracted combinations cannot generate combined features that do not appear in the training set. Therefore, studying how to use the model itself for automated feature crossing and combination is a very meaningful task.
For the problem of feature combination on large-scale sparse discrete data, traditional models cannot do without manual feature engineering. Exploiting the feature-expression capability of deep neural networks to explore complex feature combinations improves click-rate prediction and has two main advantages: first, deep models have strong expressive power and can learn high-order nonlinear features; second, other feature types, such as images and speech, can be incorporated more easily, enabling end-to-end model prediction.
As described above, there are currently many research results on recommendation click-rate estimation, and the methods used are diverse; for example, the following documents disclose such technical solutions:
The Chinese invention patent "A click rate estimation method and system based on the Xgboost algorithm" (CN 201811312769.1). The technical scheme comprises: selecting a preset number of original features from the log data of an advertisement delivery platform; training an Xgboost model with these original features to obtain a model file; acquiring the current features of a predetermined number of advertisements in the platform's advertisement library; and computing, with the model file, an estimated click-through-rate value for each set of current features. The method thus obtains a model file based on the Xgboost algorithm, and the model file can rapidly process advertisement features to produce estimated click-rate values. In addition, the method is highly portable, i.e., it can be implemented on each platform, and has high fault tolerance compared with related techniques.
The Chinese invention patent "A click rate estimation method based on an FFM deep neural network" (CN 201910123419.9). The technical scheme comprises the following steps: 1) discretizing the data in the training set; 2) re-encoding the discretized training data; 3) training an FFM deep neural network on the re-encoded training data; 4) preprocessing the data to be predicted; 5) estimating the click-through rate of the preprocessed data with the trained neural network. The method exploits the strong expressive power and automatic feature combination of the FFM deep neural network model, so that the model can learn both low-order and high-order feature information while solving the problem of automatic feature crossing, making it better applicable in industrial and everyday settings.
The Chinese invention patent "A click rate estimation method based on decision trees and logistic regression" (CN 201711439302.9). The method comprises the following steps: acquiring feature data related to the delivered information; establishing a click-rate estimation model based on a cascade of a decision tree and a probabilistic sparse linear classifier; generating real-time training data through an online connector; and training the click-rate estimation model on the real-time data to keep the latest model for estimating the click rate. A model architecture based on the decision-tree/probabilistic-sparse-linear-classifier cascade is provided; it also comprises an online learning layer and discloses the online connector, a critical component of that layer which converts training data into real-time streaming data.
Although the technical solutions of the above three documents can estimate the recommended click-through rate, when applied to specific internet application scenarios they still have the following disadvantages:
Most click-rate estimation methods focus on combining the original categorical features but do not simultaneously consider the integrity of combined-feature expression and the importance of the combined features, whereas better prediction accuracy can be achieved through complete feature expression and effective feature utilization.
Feature crossing is a key problem in the field of click-rate estimation, and many related works have designed models with crossing network structures. These models usually compute feature-vector crosses with vector inner products or Hadamard products, but no explicit structure distinguishes the meanings of the many intermediate feature vectors in the network, which can limit the models' feature-expression capacity and cause overfitting.
In addition, in the click-through-rate estimation task, different features have different degrees of importance: for example, when predicting a person's income, occupational features obviously influence income more than hobby features do, and different feature combinations also differ in importance. Therefore, in a neural network with a feature-crossing structure, if every feature is crossed with the other features using the same weight, increasingly obvious information loss results as the number of network layers grows.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome the defects in the prior art and provide a click rate estimation method based on feature differentiation learning.
In order to solve the technical problem, the solution of the invention is as follows:
the click rate estimation method based on feature differentiation learning comprises the following steps: firstly, constructing an input vector of original features to obtain low-dimensional feature vector representation of each original feature; constructing a neural network with feature combination capability, obtaining combined feature vectors and constructing and outputting the combined feature vectors; then, differential activation constraint is proposed to control the similarity between the feature vectors and improve the integrity of feature vector expression; the existing compression-excitation network is used for distinguishing the feature importance, so that the distinguishing capability of the neural network on the features is improved; and finally, performing combined training on the neural network with the feature combination capability and the deep neural network to obtain a final predicted value.
The method disclosed by the invention specifically comprises the following steps of:
(1) constructing input vectors of original features
Embedding and coding sparse features in a feature input layer of a click rate estimation model based on deep learning, and converting each input original data feature into a low-dimensional dense real numerical vector, namely an embedded vector of the features; splicing the embedded vectors of all the features to be used as a result of a feature input layer, and using the feature embedded vectors as basic units of the features;
(2) constructing neural networks with feature combination capability
Combining the features in vector form, wherein each basic unit is an embedding vector of a feature; combining, pairwise, the feature vectors output by the previous layer of the neural network with the original feature vectors, and taking a weighted average of the resulting combination vectors to obtain the output of each layer of the neural network;
combining the neural network with the original embedded vector once more every time one layer is added, wherein the number of the layers determines the times of feature combination, and the output of each hidden layer in the neural network structure is determined by the input of the previous hidden layer and the original feature; the structure of the feature vector is kept in each layer of the network, and all feature combinations are carried out according to the vector;
(3) using differential activation constraints to control similarity between feature vectors
Each layer of the neural network is used as a unit for differentiating the feature vectors, so that the feature vectors in each layer have difference as much as possible, and the cosine similarity is used for representing the difference between the feature vectors;
and (3) iteratively solving each orthogonal vector representation in a regularization constraint mode, calculating cosine similarity between every two vectors in each iteration process, and adding the cosine similarity as a regularization loss into a model for co-training: explicitly controlling the similarity degree between the feature vectors through the regular terms of the differentiated activation constraints, so that the similarity between the feature vectors is continuously reduced in the training of the neural network model;
(4) constructing outputs of neural networks
Splicing the feature vectors of all hidden layers of the neural network with feature combination capability constructed in step (2) to obtain a combined feature matrix as output; the combined feature matrix contains the combined features of all layers, and each of its elements is a feature vector;
(5) distinguishing the feature importance by utilizing a compression-excitation network;
for all combined features and original features, an attention mechanism based on a compression-excitation network is introduced, the weight of important features is increased, and the weight of unimportant features is reduced;
the output of the neural network with feature combination capability is all the combined features and the original feature vectors, which are used as the input of the compression-excitation network, and the weight vector corresponding to each feature is generated by the latter; directly connecting the weight-adjusted feature vector to an output unit, wherein the obtained neural network model is called a feature importance degree-based differential activation network;
(6) performing combined training on the differential activation network and the deep neural network based on the feature importance degree to construct a combined model
Connecting the output of the differentiated activation network based on the feature importance obtained in the step (5) to the existing deep neural network to construct a deep learning model; connecting the combined features with weights output by the differentiated activation network based on feature importance to a linear logistic regression model and a deep neural network model for combined training; connecting the outputs of the linear logistic regression model and the deep neural network model to an output unit to obtain a joint click rate pre-estimated value, namely a probability value estimated by the finally recommended click rate; the larger the value, the higher the probability that the recommended content is clicked by the user.
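As a rough illustration of the joint output described in step (6), the following sketch sums the logit of a linear (logistic-regression) part and the logit of a deep part and passes the result through a sigmoid to obtain the click probability. The shapes, the single ReLU hidden layer, and all parameter names are illustrative assumptions, not taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def joint_predict(c, w_lin, b_lin, W1, b1, w2, b2):
    """c: flattened weighted feature vector -> click probability in (0, 1)."""
    linear_logit = c @ w_lin + b_lin        # shallow part: low-order combinations
    hidden = np.maximum(0.0, c @ W1 + b1)   # deep part: one ReLU hidden layer
    deep_logit = hidden @ w2 + b2           # (a real model would stack more layers)
    return sigmoid(linear_logit + deep_logit)

n_feat = 16
c = rng.normal(size=n_feat)                 # stand-in for the weighted features
p = joint_predict(c,
                  rng.normal(size=n_feat), 0.0,
                  rng.normal(size=(n_feat, 8)), np.zeros(8),
                  rng.normal(size=8), 0.0)
assert 0.0 < p < 1.0                        # a valid probability: the larger, the
                                            # more likely the content is clicked
```

In a trained model both parts would share the same weighted combined features and be optimized jointly, as the step describes.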
Compared with the prior art, the invention has the beneficial effects that:
1. in order to improve the differential expression capability of the feature vectors, the invention provides a differential activation constraint method aiming at the feature vectors, which can increase the difference among different feature vectors in a targeted manner, thereby activating more implicit modes in data and achieving the purpose of efficient feature coding.
2. The invention utilizes the existing compression-excitation network to automatically learn the weight of the combined feature, provides a differentiated activation network based on feature importance, and improves the judgment capability of a click rate estimation model on the effectiveness of the combined feature.
3. According to the method, the output of a differential activation network based on feature importance is simultaneously connected to a deep neural network and a linear logistic regression model, the deep part of the model enables the model to have the capability of simultaneously learning explicit and implicit high-order feature combinations, and meanwhile, the generalization of the whole model is improved; the shallow part of the model can learn the feature low-order combination for improving the generalization of the model, and the model does not need to be artificially combined with features.
4. The invention discloses an innovative calculation method for estimating recommended click rate, which can deeply analyze original characteristics, accurately depict the combination relation of characteristic vectors and effectively predict the probability of clicking recommended contents by users.
Drawings
Fig. 1 is an overall architecture of a feature importance-based differentiated activation network in the present invention.
Fig. 2 is a schematic diagram of the overall structure of the differentiated activation network in the present invention.
Fig. 3 is a schematic diagram of a compression-excitation network unit structure in the present invention.
Detailed Description
The click rate estimation method based on feature differentiation learning provided by the invention is based on the differentiation activation network of feature importance, provides a differentiation activation constraint method aiming at feature vectors, and can increase the differences among different feature vectors in a targeted manner, thereby activating more implicit modes in data and achieving the purpose of efficient feature coding. And the weight of the combined features is automatically learned by utilizing the existing compression-excitation network, a differentiated activation network based on feature importance is constructed, the judgment capability of a click rate estimation model on the effectiveness of the combined features is improved, and then a recommended click rate pre-estimated value is obtained through calculation.
The invention is described in further detail below with reference to the following detailed description and accompanying drawings:
the click rate estimation method based on feature differentiation learning specifically comprises the following steps:
step (1): constructing a characteristic input vector;
the feature input layer of the click rate estimation model based on deep learning comprises a process of embedding and coding sparse features, and each input original data feature is converted into a low-dimensional dense real numerical vector, namely an embedded vector of the features. Stitching the embedded vectors of all features as the result E ═ E of the feature input layer1,e2,...,ef]Where f represents the number of features,
Figure BDA0002468600800000061
an embedding vector representing the ith feature, and d is the dimension of the embedding vector,
Figure BDA0002468600800000062
is a matrix symbol. The neural network model embeds features into vectors as basic units of features,only the calculation of the feature vector is needed in the subsequent part.
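As a minimal sketch of this input layer (with hypothetical vocabulary sizes and an assumed embedding dimension, not values from the patent), one sparse id per feature field is looked up in an embedding table and the $f$ resulting vectors are stacked into $E$:

```python
import numpy as np

rng = np.random.default_rng(0)

def build_embedding_tables(vocab_sizes, d):
    """One embedding table per feature field (sizes are illustrative)."""
    return [rng.normal(scale=0.01, size=(v, d)) for v in vocab_sizes]

def embed(tables, feature_ids):
    """Look up one id per field; returns an (f, d) matrix E = [e_1, ..., e_f]."""
    return np.stack([t[i] for t, i in zip(tables, feature_ids)])

tables = build_embedding_tables([100, 50, 10], d=8)  # f = 3 feature fields
E = embed(tables, [42, 7, 3])                        # one sparse id per field
assert E.shape == (3, 8)                             # f rows, d columns
```

In practice the tables would be trained parameters (e.g. an embedding layer) rather than fixed random matrices.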
Step (2): constructing a neural network with feature combination capability;
Combining features in vector form, each basic unit being an embedding vector of a feature, the output of the $k$-th layer of the neural network with feature combination capability is represented as a matrix $X^k \in \mathbb{R}^{H_k \times d}$, where $H_k$ denotes the number of feature embedding vectors in the $k$-th layer, $d$ is the dimension of the embedding vectors, and $x_h^k \in \mathbb{R}^d$ denotes the $h$-th feature vector of the $k$-th layer. Set $H_0 = m$, where $m$ denotes the number of embedding vectors of the original features. The $h$-th feature vector of the $k$-th layer is computed as

$$x_h^k = \frac{1}{H_{k-1}\, m} \sum_{i=1}^{H_{k-1}} \sum_{j=1}^{m} W_{ij}^{k,h} \left( x_i^{k-1} \odot x_j^0 \right), \qquad 1 \le h \le H_k,$$

where $W^{k,h} \in \mathbb{R}^{H_{k-1} \times m}$ is the parameter matrix corresponding to the $h$-th feature vector of the $k$-th layer and $\odot$ denotes the element-wise (Hadamard) product; the $k$-th layer of the neural network with feature combination capability therefore has $H_{k-1} \cdot m \cdot H_k$ parameters. In other words, the feature vectors output by the previous layer are combined pairwise with the original feature vectors, and the resulting $H_{k-1} \times m$ combination vectors are weighted-averaged to obtain the $h$-th feature vector of the current layer.

Each additional layer of the neural network with feature combination capability combines the features with the original embedded vectors $X^0$ once more, so the number of layers controls the number of explicit feature combinations, and the output of each hidden layer is determined by the previous hidden layer and the original input. The structure of the feature vectors is maintained at every layer of the network, so all feature combinations are performed vector-wise.
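One layer of this combination network can be sketched as follows. The pairwise-Hadamard-product-then-weighted-average reading is inferred from the description above; the normalization, shapes, and names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def combine_layer(X_prev, X0, W):
    """X_prev: (H_prev, d), X0: (m, d), W: (H_k, H_prev, m) -> (H_k, d)."""
    # pairwise Hadamard products of previous-layer and original vectors: (H_prev, m, d)
    P = X_prev[:, None, :] * X0[None, :, :]
    # weighted average over all H_prev * m combination vectors per output vector
    return np.einsum('him,imd->hd', W, P) / (X_prev.shape[0] * X0.shape[0])

m, d, H1 = 4, 8, 6
X0 = rng.normal(size=(m, d))          # original feature embedding vectors
W1 = rng.normal(size=(H1, m, m))      # one (H_0 x m) weight matrix per output vector
X1 = combine_layer(X0, X0, W1)        # first hidden layer combines X0 with itself
assert X1.shape == (H1, d)            # vector structure is preserved layer to layer
```

Stacking further calls (always pairing the previous output with `X0`) raises the combination order by one per layer, as the text describes.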
And (3): utilizing differentiated activation constraints to control similarity between feature vectors;
each layer of the neural network with the feature combination capability is used as a unit for feature vector differentiation, so that the feature vectors in each layer have differences as much as possible, and cosine similarity is used for representing the differences among the feature vectors; the coding expression of the neural network with the feature combination capability should remove the redundancy of the representation among the feature vectors as much as possible, reduce the unpredictability of the system, and improve the representation capability of the combination features and the generalization capability of the model.
The feature differentiation has an important role in the feature expression capability, and in order to remove information redundancy possibly existing among feature vectors and improve the differential expression capability of the feature vectors, a Differentiated Activation Constraint (DAC) method for the feature vectors is adopted to explicitly control the similarity degree among the feature vectors.
Each orthogonal vector representation is solved iteratively by means of regularization constraints. In each iteration, the cosine similarity between every two feature vectors is computed and added as a regularization loss into the overall click-rate estimation model for joint training. The regularization term of the differentiated activation constraint explicitly controls the degree of similarity between feature vectors, so that their similarity decreases continuously during model training; this neural network structure is defined as a Differentiated Activation Network (DAN).
Define the depth of the differentiated activation network as $T$ and the number of feature vectors in the $k$-th layer as $H_k$, with $x_i^k$ denoting the $i$-th feature embedding vector of the $k$-th layer and $H_0 = m$ the number of original feature embedding vectors of the input. Taking each layer of the neural network with feature combination capability as a unit of feature-vector differentiation, the goal of the differentiated activation constraint is to make the feature vectors within each layer as different as possible, i.e., to minimize the similarity between the feature vectors in each layer:

$$\mathcal{L}_{dac}(\alpha) = \sum_{k=0}^{T} \sum_{i=1}^{H_k} \sum_{j=i+1}^{H_k} \cos\left( x_i^k, x_j^k \right),$$

where $\mathcal{L}_{dac}$ denotes the loss function of the differentiated activation constraint, $\alpha$ denotes the parameters of the neural network, and $\cos(x_i^k, x_j^k) = \dfrac{x_i^k \cdot x_j^k}{\lVert x_i^k \rVert \, \lVert x_j^k \rVert}$ is the cosine value between feature vectors $x_i^k$ and $x_j^k$. The cosine value expresses the angle between vectors, i.e., the difference in their directions, so cosine similarity is used to express the difference between feature vectors.
And (4): constructing the output of a neural network with the characteristic combination capability;
since the k-th layer has HkA different parameter matrix, so that the k-th layer output of the differentiated activation network is HkA number of different feature vectors. FIG. 2 shows the overall structure of the differentiated activation network, defining the depth of the differentiated activation network as T, and the number of all the combined features and the original features as
Figure BDA0002468600800000084
Feature vectors of all hidden layers
Figure BDA0002468600800000085
k∈[0,T]Splicing to obtain a combined feature matrix C with the dimension of n × d ═ x1,x2,…,xn]As the output of the differentiated activation network, each element dimension is a feature vector of d, and therefore, all the combined features from 0 th order to T th order are included in the combined feature matrix C.
Step (5): distinguishing the feature importance by using a compression-excitation network;
After the differentiated activation network, an attention mechanism based on a compression-excitation network (SENET) is introduced. The compression-excitation network is an existing technique used mainly to distinguish the weight of each feature in the neural network model: over all combined features and original features, it automatically increases the weight of important features and decreases the weight of unimportant ones.
The output of the differentiated activation network is the set of all combined features and original feature vectors, C = [x_1, x_2, …, x_n]. These feature vectors are taken as the input of the compression-excitation network, which generates a weight vector corresponding to the importance of each feature, A = [a_1, a_2, …, a_n], where a_i is the weight of the i-th feature. These weights are then applied to all features to obtain the weighted feature vectors C_se = [v_1, v_2, …, v_n] ∈ R^(n×d), where each v_i = x_i a_i, i ∈ [1, 2, …, n], is the i-th adjusted feature vector.
As shown in fig. 3, the compression-excitation network performs weight adjustment in parallel with the feature vector, and is composed of three parts, i.e., compression, excitation, and weight adjustment, which are described separately.
The compression (Squeeze) process converts each vector into a scalar by computing a statistic of each feature vector. Specifically, using max pooling or mean pooling, the input feature vectors C = [x_1, x_2, …, x_n] are compressed into a vector of statistics Z = [z_1, z_2, …, z_n], where the scalar z_i represents the global information of the i-th feature vector. Taking mean pooling as the example, z_i is computed as:

z_i = (1/d) Σ_{t=1..d} x_i[t]
After the compression process, all elements of each feature vector are averaged into a single value. Because the final weight is applied to the feature vector as a whole, the weight is thereby computed from the global information of that vector. Moreover, what is exploited is the correlation among feature vectors rather than among the elements inside a feature vector; global pooling masks the internal distribution of each vector, which makes the weight calculation more accurate.
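As a small illustration of the compression step, the mean-pooling statistic z_i can be computed per feature vector as below; this is a sketch of the formula above, not the patent's implementation:

```python
import numpy as np

def squeeze(C):
    """Mean pooling: compress each d-dimensional feature vector x_i
    into the scalar z_i = (1/d) * sum_t x_i[t]."""
    return C.mean(axis=1)

C = np.array([[1.0, 3.0],
              [2.0, 2.0]])
print(squeeze(C))   # [2. 2.]
```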
The Excitation process learns the importance of each feature from the statistics vector Z, using two fully connected neural network layers to learn the weights. The first fully connected layer performs dimension reduction: with the reduction ratio set to r, it compresses the n input statistics into n/r values to reduce computation, using σ_1 as the nonlinear activation function. The second fully connected layer restores the dimension to n and uses σ_2 as the nonlinear activation function. The weight of each feature vector is thus computed as:

A = F_ex(Z) = σ_2(W_2 σ_1(W_1 Z))
where W_1 ∈ R^((n/r)×n) and W_2 ∈ R^(n×(n/r)) are the parameters of the first and second fully connected layers respectively, A ∈ R^n is the weight vector of the feature vectors, and r is the reduction ratio. The performance under various values of the reduction ratio r is tried, finally arriving at a balance between overall performance and computation. The excitation process uses fully connected layers so that the true weights are trained from the correlations between the features: the compressed output of each batch of samples does not itself represent the weights by which the true features should be adjusted; the true weights are trained on all the data, and therefore a fully connected network is required for the training.
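A minimal sketch of the excitation step under common assumptions (σ_1 = ReLU, σ_2 = sigmoid, as in the original SENET literature; the weight matrices here are randomly initialised placeholders rather than trained parameters):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def excitation(z, W1, W2):
    """A = sigma_2(W2 sigma_1(W1 z)): reduce n -> n/r, then restore to n."""
    return sigmoid(W2 @ np.maximum(W1 @ z, 0.0))  # sigma_1 = ReLU, sigma_2 = sigmoid

n, r = 4, 2
rng = np.random.default_rng(0)
W1 = rng.normal(size=(n // r, n))   # first FC layer: dimension reduction by r
W2 = rng.normal(size=(n, n // r))   # second FC layer: restore dimension n
z = rng.normal(size=n)              # statistics vector from the squeeze step

A = excitation(z, W1, W2)
assert A.shape == (n,) and np.all((A > 0) & (A < 1))   # one weight per feature
```

With a sigmoid σ_2, each a_i lands in (0, 1), so the subsequent scaling v_i = a_i x_i can only attenuate features, never amplify them.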
Finally, the compression-excitation network performs weight adjustment on all feature vectors: the feature matrix C and the weight vector A are multiplied element-wise to obtain the adjusted feature vectors C_se = [v_1, v_2, …, v_n]. The calculation is:

C_se = F_scale(C, A) = [a_1 x_1, a_2 x_2, …, a_n x_n]
Through this mechanism, the importance of the feature vectors is learned dynamically after the compression-excitation network: for a particular task, the weights of important features are increased and the weights of task-irrelevant features are decreased. Finally, the weight-adjusted feature vectors C_se are connected directly to an output unit, yielding the Feature-importance-based Differentiated Activation Network (FiDAN), which is a shallow model without a deep neural network structure.
Step (6): jointly training the feature-importance-based differentiated activation network and the deep neural network to construct a joint model;
The output of the feature-importance-based differentiated activation network is connected to a conventional deep neural network to construct a deep model. The deep neural network is an existing technique composed of multiple fully connected layers and nonlinear activation functions, and can express implicit nonlinear combinations of high-order features. Let a = [y_1, y_2, …, y_2n] be the output of the feature-importance-based differentiated activation network, where each y_i ∈ R^d is a feature vector. Then a is fed into the deep neural network to learn high-order feature crossings; the forward propagation of the network is:

x^1 = σ(W^1 a + b^1)
x^k = σ(W^k x^(k-1) + b^k)

where k is the layer index of the neural network, σ is a nonlinear activation function, and x^k is the output of the k-th layer. The process by which the deep neural network learns high-order feature crossings is implicit.
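The forward propagation above can be sketched as follows, assuming σ = ReLU and random placeholder parameters; the layer sizes are arbitrary illustration values:

```python
import numpy as np

def dnn_forward(a, weights, biases):
    """x^1 = sigma(W^1 a + b^1); x^k = sigma(W^k x^(k-1) + b^k)."""
    x = a
    for W, b in zip(weights, biases):
        x = np.maximum(W @ x + b, 0.0)   # sigma = ReLU (an assumed choice)
    return x

rng = np.random.default_rng(1)
a = rng.normal(size=8)                   # flattened FiDAN output (hypothetical size)
weights = [rng.normal(size=(16, 8)), rng.normal(size=(4, 16))]
biases = [np.zeros(16), np.zeros(4)]
out = dnn_forward(a, weights, biases)
assert out.shape == (4,)
```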
In order to give the model both generalization and memorization, the weighted combined features output by the feature-importance-based differentiated activation network FiDAN are connected simultaneously to a linear logistic regression model and to the deep neural network model for joint training. On the one hand the model can learn both low-order and high-order feature combinations; on the other hand it learns both implicit and explicit feature combinations. The outputs of the linear logistic regression model and of the deep neural network model are therefore connected to the output unit, and the click-rate estimate of the joint model is:

ŷ = σ(w^T [a, x^k] + b)

where ŷ is the click-rate estimate, σ denotes the sigmoid function, a is the output of the differentiated activation network FiDAN, x^k is the output of the deep neural network module, and w and b are the parameters of the output unit. The computed click-rate estimate ŷ is the probability value of the final recommendation click-rate estimation; the larger the value, the higher the probability that the recommended content will be clicked by the user.
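The joint output unit can be sketched as below, assuming the FiDAN output and the deep-network output are concatenated before the sigmoid; the inputs and parameters are illustrative placeholders, not trained values:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def joint_ctr(a, x_deep, w, b):
    """y_hat = sigmoid(w^T [a, x^k] + b), the joint click-rate estimate."""
    z = np.concatenate([a, x_deep])
    return sigmoid(w @ z + b)

a = np.array([0.5, -0.2])        # FiDAN (wide) part, hypothetical values
x_deep = np.array([1.0, 0.3])    # deep neural network part, hypothetical values
w = np.zeros(4)                  # with zero weights the estimate is sigmoid(b)
print(joint_ctr(a, x_deep, w, b=0.0))   # 0.5
```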
Finally, it should be noted that the above-mentioned list is only a specific embodiment of the present invention. It is obvious that the present invention is not limited to the above embodiments, but many variations are possible. All modifications which can be derived or suggested by a person skilled in the art from the disclosure of the present invention are to be considered within the scope of the invention.

Claims (8)

1. A click rate estimation method based on feature differentiation learning, characterized by comprising the following steps: first, constructing input vectors of the original features to obtain a low-dimensional feature vector representation of each original feature; constructing a neural network with feature combination capability, obtaining combined feature vectors and constructing its output; then, proposing a differentiated activation constraint to control the similarity between the feature vectors and improve the completeness of the feature vector expression; using an existing compression-excitation network to distinguish the feature importance, thereby improving the neural network's ability to discriminate between features; and finally, jointly training the neural network with feature combination capability and a deep neural network to obtain the final predicted value.
2. The method according to claim 1, characterized in that it comprises in particular the steps of:
(1) constructing input vectors of original features
Embedding and coding sparse features in a feature input layer of a click rate estimation model based on deep learning, and converting each input original data feature into a low-dimensional dense real numerical vector, namely an embedded vector of the features; splicing the embedded vectors of all the features to be used as a result of a feature input layer, and using the feature embedded vectors as basic units of the features;
(2) constructing neural networks with feature combination capability
Combining the features in a vector mode, wherein each basic unit is an embedded vector of the features; combining every two eigenvectors output by the upper layer of the neural network and the original eigenvectors, and performing weighted average on the obtained multiple combined vectors to obtain the output of each layer of the neural network;
combining the neural network with the original embedded vector once more every time one layer is added, wherein the number of the layers determines the times of feature combination, and the output of each hidden layer in the neural network structure is determined by the input of the previous hidden layer and the original feature; the structure of the feature vector is kept in each layer of the network, and all feature combinations are carried out according to the vector;
(3) using differential activation constraints to control similarity between feature vectors
Each layer of the neural network is used as a unit for differentiating the feature vectors, so that the feature vectors in each layer have difference as much as possible, and the cosine similarity is used for representing the difference between the feature vectors;
iterative solution is carried out on each orthogonal vector representation in a regularization constraint mode, the cosine similarity between every two vectors is calculated in each iterative process and is used as a regularization loss to be added into a model for common training; explicitly controlling the similarity degree between the feature vectors through the regular terms of the differentiated activation constraints, so that the similarity between the feature vectors is continuously reduced in the training of the neural network model;
(4) constructing outputs of neural networks
Splicing the feature vectors of all hidden layers of the neural network with the feature combination capability constructed in the step (2) to obtain a combined feature matrix as output; the combined feature matrix comprises combined features of any number of layers, and each element dimension is a feature vector;
(5) distinguishing the feature importance by utilizing a compression-excitation network;
for all combined features and original features, an attention mechanism based on a compression-excitation network is introduced, the weight of important features is increased, and the weight of unimportant features is reduced;
the output of the neural network with feature combination capability is all the combined features and the original feature vectors, which are used as the input of the compression-excitation network, and the weight vector corresponding to each feature is generated by the latter; directly connecting the weight-adjusted feature vector to an output unit, wherein the obtained neural network model is called a feature importance degree-based differential activation network;
(6) performing combined training on the differential activation network and the deep neural network based on the feature importance degree to construct a combined model
Connecting the output of the differentiated activation network based on the feature importance obtained in the step (5) to the existing deep neural network to construct a deep learning model; connecting the combined features with weights output by the differentiated activation network based on feature importance to a linear logistic regression model and a deep neural network model for combined training; connecting the outputs of the linear logistic regression model and the deep neural network model to an output unit to obtain a joint click rate pre-estimated value, namely a probability value estimated by the finally recommended click rate; the larger the value, the higher the probability that the recommended content is clicked by the user.
3. The method according to claim 2, wherein in step (1), the embedding vectors of all the features are spliced as the result of the feature input layer, E = [e_1, e_2, …, e_f], where f denotes the number of features, e_i ∈ R^d denotes the embedding vector of the i-th feature, d is the dimension of the embedding vectors, and R^d denotes the d-dimensional real vector space.
4. The method according to claim 2, wherein in step (2), the output of the k-th layer of the neural network with feature combination capability is represented as a matrix H^k = [h_1^k, h_2^k, …, h_(H_k)^k] ∈ R^(H_k × d), where H_k denotes the number of feature embedding vectors of the k-th layer, d is the dimension of the embedding vectors, and h_i^k denotes the i-th feature vector of the k-th layer; H_0 = m is set, the number of embedding vectors of the original features; the h-th feature vector of the k-th layer of the neural network is computed as:

h_h^k = Σ_{i=1..H_(k-1)} Σ_{j=1..m} W_(h,i,j)^k (h_i^(k-1) ∘ e_j)

where 1 ≤ h ≤ H_k, ∘ denotes the element-wise product, and W_h^k ∈ R^(H_(k-1) × m) denotes the parameter matrix corresponding to the h-th feature vector of the k-th layer of the neural network with feature combination capability; the number of parameters of the k-th layer of the neural network is H_(k-1) * m * H_k.
5. The method according to claim 2, wherein in step (3),

each layer of the neural network with feature combination capability is taken as the unit of feature vector differentiation, so that the feature vectors within each layer differ as much as possible, cosine similarity being used to represent the difference between the feature vectors;

each orthogonal vector representation is solved iteratively in the form of a regularization constraint: in each iteration the pairwise cosine similarity between the feature vectors is computed and added to the overall click rate estimation model as a regularization loss for joint training; the degree of similarity between the feature vectors is controlled explicitly through the regularization term of the differentiated activation constraint, so that the similarity between the feature vectors decreases continuously during model training, this neural network structure being defined as the differentiated activation network;

the depth of the differentiated activation network is defined as T, the number of feature vectors of the k-th layer is denoted H_k, the vector h_i^k denotes the i-th feature embedding vector of the k-th layer, and H_0 = m is set, with m the number of original input feature embedding vectors; the goal of the differentiated activation constraint is to make the feature vectors within each layer of the neural network as different as possible, i.e., to minimize the similarity between the feature vectors of each layer:

L_da(α) = Σ_{k=1..T} Σ_{1≤i<j≤H_k} cos(h_i^k, h_j^k)

where L_da(α) denotes the loss function of the differentiated activation constraint, α denotes the parameters of the neural network, and cos(h_i^k, h_j^k) denotes the cosine value between the feature vectors h_i^k and h_j^k; the cosine value expresses the angle between two vectors, i.e., the difference in their directions, and cosine similarity is used to represent the difference between the feature vectors.
6. The method of claim 2, wherein in step (4), since the k-th layer has H_k different parameter matrices, the output of the k-th layer of the neural network is H_k different feature vectors; the depth of the differentiated activation network is defined as T, and the number of all combined features and original features is n = Σ_{k=0..T} H_k; the feature vectors h_i^k, k ∈ [0, T], of all hidden layers are spliced to obtain a combined feature matrix C = [x_1, x_2, …, x_n] of dimension n × d as the output of the differentiated activation network, where each element x_i is a d-dimensional feature vector, and the combined feature matrix C contains all combined features from order 0 to order T.
7. The method according to claim 2, wherein in step (5),

for all combined features and original features, an attention mechanism based on a compression-excitation network is introduced; the compression-excitation network of the prior art is used to distinguish the weight of each feature in the neural network model, increasing the weight of important features and decreasing the weight of unimportant features;

the output of the differentiated activation network is the set of all combined features and original feature vectors, C = [x_1, x_2, …, x_n], which is taken as the input of the compression-excitation network; the latter generates a weight vector corresponding to the importance of each feature, A = [a_1, a_2, …, a_n], where a_i is the weight of the i-th feature; these weights are then applied to all features to obtain the weighted feature vectors C_se = [v_1, v_2, …, v_n] ∈ R^(n×d), where each v_i = x_i a_i, i ∈ [1, 2, …, n], is the i-th adjusted feature vector; the feature vectors after weight adjustment are connected directly to an output unit, yielding the feature-importance-based differentiated activation network;

the compression-excitation network performs weight adjustment in parallel with the feature vectors and consists of three parts: compression, excitation, and weight adjustment, wherein:

the compression process converts each vector into a scalar by computing a statistic of each feature vector; specifically, using max pooling or mean pooling, the input feature vectors C = [x_1, x_2, …, x_n] are compressed into a vector of statistics Z = [z_1, z_2, …, z_n], where the scalar z_i represents the global information of the i-th feature vector; taking mean pooling as the example, z_i is computed as:

z_i = (1/d) Σ_{t=1..d} x_i[t]

after the compression process, all elements of each feature vector are averaged into a single value;
the excitation process learns the importance of each feature from the statistics vector Z, using two fully connected neural network layers to learn the weights; the first fully connected layer performs dimension reduction: with the reduction ratio set to r, it compresses the n input statistics into n/r values to reduce computation, using σ_1 as the nonlinear activation function; the second fully connected layer restores the dimension to n and uses σ_2 as the nonlinear activation function; the weight of each feature vector is thus computed as:

A = F_ex(Z) = σ_2(W_2 σ_1(W_1 Z))

where W_1 ∈ R^((n/r)×n) and W_2 ∈ R^(n×(n/r)) are the parameters of the first and second fully connected layers respectively, A ∈ R^n is the weight vector of the feature vectors, and r is the reduction ratio; the performance under various values of the reduction ratio r is tried, finally arriving at a balance between overall performance and computation; the fully connected layers are used in the excitation process so that the true weights are trained from the correlations between the features: the compressed output of each batch of samples does not itself represent the weights by which the true features should be adjusted, and the true weights are trained on all the data, so a fully connected network is required for the training;
finally, the compression-excitation network performs weight adjustment on all the feature vectors: the feature matrix C and the weight vector A are multiplied element-wise to obtain the adjusted feature vectors C_se = [v_1, v_2, …, v_n], computed as:

C_se = F_scale(C, A) = [a_1 x_1, a_2 x_2, …, a_n x_n]

through the compression-excitation network, the importance of the feature vectors is learned dynamically, so that the weights of important features are increased and the weights of task-irrelevant features are decreased; finally, the weight-adjusted feature vectors C_se are connected directly to the output unit, yielding the feature-importance-based differentiated activation network, which is a shallow model without a deep neural network structure.
8. The method of claim 2, wherein in step (6), the click-rate estimate of the joint model is:

ŷ = σ(w^T [a, x^k] + b)

where ŷ is the click-rate estimate, σ denotes the sigmoid function, a is the output of the differentiated activation network FiDAN, x^k is the output of the deep neural network module, and w and b are the parameters of the output unit.
CN202010342981.3A 2020-04-27 2020-04-27 Click rate estimation method based on feature differentiation learning Pending CN111563770A (en)


Publications (1)

Publication Number Publication Date
CN111563770A true CN111563770A (en) 2020-08-21




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination