CN110738314B - Click rate prediction method and device based on deep migration network - Google Patents
- Publication number
- CN110738314B (granted publication of application CN201910991888.2A)
- Authority
- CN
- China
- Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06N3/045 — Combinations of networks
- G06N3/0463 — Neocognitrons
- G06N3/084 — Backpropagation, e.g. using gradient descent
- G06Q10/04 — Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G06Q30/0263 — Targeted advertisements based upon Internet or website rating
Abstract
The invention discloses a click rate prediction method and device based on a deep migration network, the device being used to implement the method. The method comprises: discretizing continuous fields; preprocessing the discrete features, converting them into feature id codes, and obtaining a mapping dictionary; converting the feature id codes into characterization vectors with a GloVe model and using these as the initialization parameters of the embedding layer of the deep migration network; inputting the feature id codes into the deep migration network for training; discretizing the test samples and mapping their discrete features into feature id codes with the mapping dictionary M; and inputting the feature id codes into the deep migration network to predict the click rate. The invention optimizes the click rate prediction method, improving prediction accuracy while keeping prediction latency low.
Description
Technical Field
The invention relates to the field of big data processing, in particular to a click rate prediction method and device based on a deep migration network.
Background
With the popularization of the internet, nearly every part of daily life is closely tied to it: shopping on Taobao and JD.com, ordering food on Meituan and Ele.me, watching videos on Tencent Video and iQiyi. Users' massive volume of click behavior accumulates a large amount of valuable data for platforms such as Taobao, JD.com and Meituan; investing resources into this data produces tangible value, for example in the estimation scenarios of computational advertising or recommender systems.
The main goal of computational advertising is to integrate the three-party information of the advertiser, the ad-slot provider and the user so as to deliver advertisements more accurately, thereby improving the advertiser's advertising effect, the revenue of the ad slot and the user experience, achieving a multi-win situation. In this chain, the most important link is accurate ad delivery, and the technology used is click rate estimation (the field that predicts the probability that a user clicks an advertisement). Because data in computational advertising is highly sparse and enormous in scale, the simplest linear models, such as logistic regression, were widely used at first. However, click rate estimation must consider several objects, such as users and advertisements, at the same time, and combinations among features matter far more than each feature considered in isolation; the factorization machine (FM) model developed later represents each feature with a characterization vector and expresses the combination information between features by the inner product of characterization vectors. Recently, deep learning models have achieved great success in many fields, and deep learning models based on neural networks have gradually been applied to click rate estimation to make up for the defect that the FM model contains no feature combinations beyond the second order. Although deep learning models greatly improve on the effect of the FM model, their application is limited in computational advertising scenarios that must produce prediction results in real time over massive data.
In general, click rate estimation methods are continuously optimized for significant gains in effect, but the low-latency requirement that matters in actual use has gradually been ignored.
Disclosure of Invention
The main purpose of the invention is to provide a click rate prediction method based on a deep migration network that aims to overcome the above problems.
In order to achieve the above purpose, the click rate prediction method based on the deep migration network provided by the invention comprises the following steps:
S10, discretizing continuous fields of the training samples to obtain discrete features of the training samples;
S20, creating a unique feature id index code for each training sample discrete feature, and creating a mapping dictionary M of the discrete features according to the mapping relation between the training sample discrete features and the feature id index codes;
S30, counting the co-occurrence frequencies of the feature id index codes in different samples to create a feature co-occurrence frequency matrix, converting the feature id index codes into the characterization vector matrix of the feature co-occurrence frequency matrix through a GloVe model, and taking the characterization vectors as the initialization parameters of the Embedding layer of the deep migration network;
S40, inputting the feature id index codes into the deep migration network to obtain the cross entropy loss of the predicted click rate, and updating all parameters of the deep migration network by a back propagation algorithm, wherein the deep migration network comprises an embedding layer Embedding for converting the feature id index codes into corresponding characterization vectors, a factorization FM network for taking inner products of the corresponding characterization vectors to obtain an FM predicted click rate, a shared perceptron network for nonlinearly transforming the corresponding characterization vectors into abstract characterization vectors, a lightweight perceptron network that takes the abstract characterization vectors as input to obtain a lightweight predicted click rate, and a deep perceptron network that takes the abstract characterization vectors as input to obtain a deep predicted click rate;
S50, discretizing the test samples to obtain discrete features of the test samples, and matching and mapping the discrete features of the test samples based on the mapping dictionary M to obtain the feature id index codes of the test samples;
S60, inputting the feature id index codes of the test samples into the deep migration network for prediction to obtain the click rate prediction of the test samples.
Preferably, the deep migration network includes an embedding layer Embedding, a factorization FM network, a shared perceptron network, a lightweight perceptron network, and a deep perceptron network, and S40 includes:
S401, inputting the feature id index codes into the embedding layer Embedding of the deep migration network to obtain corresponding characterization vectors;
S402, inputting the corresponding characterization vectors into the factorization FM network, which takes inner products of the corresponding characterization vectors to obtain the FM predicted click rate $p_{fm}(x)$, where the inner product formula is as follows:

$$p_{fm}(x) = \mathrm{sigmoid}\Big(W_{fm}x + b_{fm} + \sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle v_i, v_j\rangle x_i x_j\Big),$$

where $x$ is the input feature id index code, $v$ is the characterization vector representing a feature id index code, $i, j \in n$, $i$ and $j$ are subscripts of different feature id index codes, $n$ is the total number of feature id index codes, $W_{fm}$ is the weight parameter of the FM network's linear regression term, $b_{fm}$ is the bias parameter of the FM network's linear regression term, and $\langle v_i, v_j\rangle$ is the inner product of the characterization vectors $v_i$ and $v_j$;
the corresponding characterization vectors are input into the shared perceptron network for nonlinear transformation, yielding the abstract characterization vector: $h_s = \mathrm{sigmoid}(W_s v + b_s)$, where $v$ is the characterization vector input to the shared perceptron network, $W_s$ is the weight parameter of the shared perceptron network, $b_s$ is the bias parameter of the shared perceptron network, and $h_s$ is the abstract characterization vector output by the shared perceptron;
the abstract characterization vector $h_s$ is input into the deep perceptron network, and the deep perceptron network's predicted click rate $p_{deep}(x)$ is obtained through the following feedforward calculation:

$$h^{(l)} = \mathrm{ReLU}\big(W^{(l)}_{deep} h^{(l-1)} + b^{(l)}_{deep}\big),\quad h^{(0)} = h_s,\qquad p_{deep}(x) = \mathrm{sigmoid}\big(z_{deep}(x)\big),$$

where ReLU is the activation function, $W^{(l)}_{deep}$ is the $l$-th layer weight parameter of the deep perceptron network, $b^{(l)}_{deep}$ is the $l$-th layer bias parameter of the deep perceptron network, $h^{(l)}$ is the output vector of the $l$-th layer, $h_s$ is the output vector of the shared perceptron network, $z_{deep}(x)$ is the output of the final layer, and the specific number of layers needs to be set manually;
the abstract characterization vector $h_s$ is input into the lightweight perceptron network, and the lightweight perceptron network's predicted click rate $p_{light}(x)$ is obtained through the following feedforward calculation:

$$h^{(l)} = \mathrm{ReLU}\big(W^{(l)}_{light} h^{(l-1)} + b^{(l)}_{light}\big),\quad h^{(0)} = h_s,\qquad p_{light}(x) = \mathrm{sigmoid}\big(z_{light}(x)\big),$$

where ReLU is the activation function, $W^{(l)}_{light}$ is the $l$-th layer weight parameter of the lightweight perceptron network, $b^{(l)}_{light}$ is the $l$-th layer bias parameter of the lightweight perceptron network, $h^{(l)}$ is the output vector of the $l$-th layer, $h_s$ is the output vector of the shared perceptron network, the number of layers takes an empirical value, and the lightweight perceptron network has fewer layers than the deep perceptron network;
S403, integrating the predicted click rates of the FM network, the lightweight perceptron network and the deep perceptron network to calculate the click rate loss $L(x;W,b)$, with the calculation formula:

$$L(x;W,b) = H\big(y, p_{fm}(x)\big) + H\big(y, p_{light}(x)\big) + H\big(y, p_{deep}(x)\big) + \lambda\,\big\|z_{light}(x) - z_{deep}(x)\big\|^2,$$

where $H(y,p)$ is the cross entropy loss function commonly used for classification tasks, $x$ is the input feature id index code, $y$ is the classification label value of the training data, $p_{fm}(x)$ is the click rate predicted value of the FM network, $p_{light}(x)$ is the click rate predicted value of the lightweight perceptron network, $p_{deep}(x)$ is the click rate predicted value of the deep network, $\lambda$ is the weight given to the prediction error between the lightweight perceptron network and the deep network and takes a validated value, $z_{light}(x)$ is the lightweight perceptron network's output before the sigmoid conversion, with $p_{light}(x) = \mathrm{sigmoid}(z_{light}(x))$, and $z_{deep}(x)$ is the deep network's output before the sigmoid conversion, with $p_{deep}(x) = \mathrm{sigmoid}(z_{deep}(x))$;
S404, updating all parameters of the deep migration network by a back propagation algorithm according to the click rate loss $L(x;W,b)$.
Preferably, the discretization in S10 maps each continuous value to an integer by a floor operation, where V is the value of the continuous feature, D is the discretized integer value, N is a constant, $\lfloor\cdot\rfloor$ is the round-down symbol, and N needs to be determined according to the specific feature's value range.
Preferably, S30 specifically includes:
S301, taking the co-occurrence frequencies of the training sample feature id index codes in different samples as matrix elements to create the feature co-occurrence frequency matrix $C_{n\times n}$;
S302, performing matrix decomposition on the feature co-occurrence frequency matrix $C_{n\times n}$ based on the matrix multiplication recovery error to obtain the parameters of a matrix $V_{n\times k}$, and updating the $V_{n\times k}$ parameters by gradient descent, the matrix decomposition formula being

$$\hat{C}_{i,j} = v_i^{\top} v_j + b_i + b_j,$$

where $\hat{C}_{i,j}$ is the estimate of the matrix element $C_{i,j}$ of $C_{n\times n}$, $i, j \in n$, $i$ and $j$ are subscripts of different feature id index codes, $n$ is the total number of feature id index codes, and $b_i$ and $b_j$ are bias terms. The calculation is based on the error $J$ between the matrix elements $C_{i,j}$ and their estimates $\hat{C}_{i,j}$:

$$J = \sum_{i,j \in n}\big(\hat{C}_{i,j} - C_{i,j}\big)^2,$$

where $C_{i,j}$ is an element of the matrix $C_{n\times n}$, $v_i$ and $v_j$ are row vectors of the matrix $V_{n\times k}$, and $b_i$ and $b_j$ are the bias terms; with the goal of minimizing the error $J$, the $V_{n\times k}$ parameters are updated by gradient descent;
S303, taking the rows of the updated matrix $V_{n\times k}$ as the characterization vectors of the corresponding features.
The invention also discloses a click rate prediction device based on the deep migration network, which is used to implement the above method and comprises:
the discrete module is used for carrying out discretization processing on continuous fields of the training samples so as to obtain discrete characteristics of the training samples;
The mapping module is used for creating a unique feature id index code for each training sample discrete feature and creating a mapping dictionary M of the discrete feature according to the mapping relation between the training sample discrete feature and the feature id index code;
the feature characterization module is used for counting the co-occurrence frequencies of the feature id index codes in different samples to create the feature co-occurrence frequency matrix, converting the feature id index codes into the characterization vector matrix of the feature co-occurrence frequency matrix through a GloVe model, and taking the characterization vectors as the initialization parameters of the Embedding layer of the deep migration network;
the training module is used for inputting the feature id index codes into the deep migration network for training to obtain the click rate loss, and updating all parameters of the deep migration network by a back propagation algorithm;
the test module is used for discretizing the test samples to obtain discrete features of the test samples, and matching and mapping the discrete features of the test samples based on the mapping dictionary M to obtain the feature id index codes of the test samples;
and the prediction module is used for inputting the feature id index codes of the test samples into the deep migration network for prediction to obtain the click rate prediction of the test samples.
Preferably, the training module comprises:
the Embedding sub-module is used for inputting the feature id index codes into the embedding layer Embedding of the deep migration network to obtain corresponding characterization vectors;
the subnet prediction sub-module is used for inputting the corresponding characterization vectors into the factorization FM network, which takes inner products of the corresponding characterization vectors to obtain the FM predicted click rate $p_{fm}(x)$, where the inner product formula is as follows:

$$p_{fm}(x) = \mathrm{sigmoid}\Big(W_{fm}x + b_{fm} + \sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle v_i, v_j\rangle x_i x_j\Big),$$

where $x$ is the input feature id index code, $v$ is the characterization vector representing a feature id index code, $i, j \in n$, $i$ and $j$ are subscripts of different feature id index codes, $n$ is the total number of feature id index codes, $W_{fm}$ is the weight parameter of the FM network's linear regression term, $b_{fm}$ is the bias parameter of the FM network's linear regression term, and $\langle v_i, v_j\rangle$ is the inner product of the characterization vectors $v_i$ and $v_j$;
the corresponding characterization vectors are input into the shared perceptron network for nonlinear transformation, yielding the abstract characterization vector: $h_s = \mathrm{sigmoid}(W_s v + b_s)$, where $v$ is the characterization vector input to the shared perceptron network, $W_s$ is the weight parameter of the shared perceptron network, $b_s$ is the bias parameter of the shared perceptron network, and $h_s$ is the abstract characterization vector output by the shared perceptron;
the abstract characterization vector $h_s$ is input into the deep perceptron network, and the deep perceptron network's predicted click rate $p_{deep}(x)$ is obtained through the following feedforward calculation:

$$h^{(l)} = \mathrm{ReLU}\big(W^{(l)}_{deep} h^{(l-1)} + b^{(l)}_{deep}\big),\quad h^{(0)} = h_s,\qquad p_{deep}(x) = \mathrm{sigmoid}\big(z_{deep}(x)\big),$$

where ReLU is the activation function, $W^{(l)}_{deep}$ is the $l$-th layer weight parameter of the deep perceptron network, $b^{(l)}_{deep}$ is the $l$-th layer bias parameter of the deep perceptron network, $h^{(l)}$ is the output vector of the $l$-th layer, $h_s$ is the output vector of the shared perceptron network, $z_{deep}(x)$ is the output of the final layer, and the specific number of layers needs to be set manually;
the abstract characterization vector $h_s$ is input into the lightweight perceptron network, and the lightweight perceptron network's predicted click rate $p_{light}(x)$ is obtained through the following feedforward calculation:

$$h^{(l)} = \mathrm{ReLU}\big(W^{(l)}_{light} h^{(l-1)} + b^{(l)}_{light}\big),\quad h^{(0)} = h_s,\qquad p_{light}(x) = \mathrm{sigmoid}\big(z_{light}(x)\big),$$

where ReLU is the activation function, $W^{(l)}_{light}$ is the $l$-th layer weight parameter of the lightweight perceptron network, $b^{(l)}_{light}$ is the $l$-th layer bias parameter of the lightweight perceptron network, $h^{(l)}$ is the output vector of the $l$-th layer, $h_s$ is the output vector of the shared perceptron network, the number of layers takes an empirical value, and the lightweight perceptron network has fewer layers than the deep perceptron network;
the integrated prediction sub-module is used for integrating the predicted click rates of the FM network, the lightweight perceptron network and the deep perceptron network to calculate the click rate loss $L(x;W,b)$, with the calculation formula:

$$L(x;W,b) = H\big(y, p_{fm}(x)\big) + H\big(y, p_{light}(x)\big) + H\big(y, p_{deep}(x)\big) + \lambda\,\big\|z_{light}(x) - z_{deep}(x)\big\|^2,$$

where $H(y,p)$ is the cross entropy loss function commonly used for classification tasks, $x$ is the input feature id index code, $y$ is the classification label value of the training data, $p_{fm}(x)$ is the click rate predicted value of the FM network, $p_{light}(x)$ is the click rate predicted value of the lightweight perceptron network, $p_{deep}(x)$ is the click rate predicted value of the deep network, $\lambda$ is the weight given to the prediction error between the lightweight perceptron network and the deep network and takes a validated value, $z_{light}(x)$ is the lightweight perceptron network's output before the sigmoid conversion, with $p_{light}(x) = \mathrm{sigmoid}(z_{light}(x))$, and $z_{deep}(x)$ is the deep network's output before the sigmoid conversion, with $p_{deep}(x) = \mathrm{sigmoid}(z_{deep}(x))$;
and the parameter updating sub-module is used for updating all parameters of the deep migration network by a back propagation algorithm according to the click rate loss $L(x;W,b)$.
Preferably, the discretization processing in the discretization module maps each continuous value to an integer by a floor operation, where V is the value of the continuous feature, D is the discretized integer value, N is a constant, $\lfloor\cdot\rfloor$ is the round-down symbol, and N needs to be determined according to the specific feature's value range.
Preferably, the feature characterization module includes:
the feature co-occurrence sub-module is used for creating the feature co-occurrence frequency matrix $C_{n\times n}$ by taking the co-occurrence frequencies of the training sample feature id index codes in different samples as matrix elements;
the decomposition sub-module is used for performing matrix decomposition on the co-occurrence frequency matrix $C_{n\times n}$ based on the matrix multiplication recovery error to obtain the parameters of a matrix $V_{n\times k}$ and updating the $V_{n\times k}$ parameters by gradient descent, the matrix decomposition formula being

$$\hat{C}_{i,j} = v_i^{\top} v_j + b_i + b_j,$$

where $\hat{C}_{i,j}$ is the estimate of the matrix element $C_{i,j}$ of $C_{n\times n}$, $i, j \in n$, $i$ and $j$ are subscripts of different feature id index codes, $n$ is the total number of feature id index codes, and $b_i$ and $b_j$ are bias terms; with the goal of minimizing the error, the calculation is based on the error $J$ between the matrix elements $C_{i,j}$ and their estimates $\hat{C}_{i,j}$:

$$J = \sum_{i,j \in n}\big(\hat{C}_{i,j} - C_{i,j}\big)^2,$$

where $C_{i,j}$ is an element of the matrix $C_{n\times n}$, $v_i$ and $v_j$ are row vectors of the matrix $V_{n\times k}$, and $b_i$ and $b_j$ are the bias terms, $V_{n\times k}$ being updated by gradient descent;
and the feature characterization sub-module is used for taking the rows of the decomposed matrix $V_{n\times k}$ as the characterization vectors of the corresponding features.
Compared with the prior art, the invention has the following beneficial effects: it overcomes the defect that the FM model in the prior art contains no feature combinations beyond the second order; it uses the strong transfer learning capability of the deep migration network to let the deep perceptron network guide the learning of the lightweight perceptron network, thereby obtaining a lightweight perceptron network with better effect and better performance; and it finally integrates the FM network, the lightweight perceptron network and the deep perceptron network for training to obtain the click rate loss, thus optimizing the click rate prediction method and improving the prediction accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to the structures shown in these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a diagram of a model architecture of a deep migration network of the present invention;
the achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, if directional indications (such as up, down, left, right, front and rear) are included in the embodiments of the present invention, the directional indications are merely used to explain the relative positional relationship, movement conditions, etc. between the components in a specific posture (as shown in the drawings), and if the specific posture changes, the directional indications change correspondingly.
In addition, if there is a description of "first", "second", etc. in the embodiments of the present invention, the description of "first", "second", etc. is for descriptive purposes only and is not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, but it is necessary to base that the technical solutions can be realized by those skilled in the art, and when the technical solutions are contradictory or cannot be realized, the combination of the technical solutions should be considered to be absent and not within the scope of protection claimed in the present invention.
The click rate prediction method based on the deep migration network provided by the invention comprises the following steps:
S10, discretizing continuous fields of the training samples to obtain discrete features of the training samples;
S20, creating a unique feature id index code for each training sample discrete feature, and creating a mapping dictionary M of the discrete features according to the mapping relation between the training sample discrete features and the feature id index codes;
S30, counting the co-occurrence frequencies of the feature id index codes in different samples to create a feature co-occurrence frequency matrix, converting the feature id index codes into the characterization vector matrix of the feature co-occurrence frequency matrix through a GloVe model, taking the characterization vectors as the initialization parameters of the Embedding layer of the deep migration network, and updating these initialization parameters during training by the back propagation algorithm;
S40, inputting the feature id index codes into the deep migration network to obtain the cross entropy loss of the predicted click rate, and updating all parameters of the deep migration network by a back propagation algorithm, wherein the deep migration network comprises an embedding layer Embedding for converting the feature id index codes into corresponding characterization vectors, a factorization FM network for taking inner products of the corresponding characterization vectors to obtain an FM predicted click rate, a shared perceptron network for nonlinearly transforming the corresponding characterization vectors into abstract characterization vectors, a lightweight perceptron network that takes the abstract characterization vectors as input to obtain a lightweight predicted click rate, and a deep perceptron network that takes the abstract characterization vectors as input to obtain a deep predicted click rate;
S50, discretizing the test samples to obtain discrete features of the test samples, and matching and mapping the discrete features of the test samples based on the mapping dictionary M to obtain the feature id index codes of the test samples;
S60, inputting the feature id index codes of the test samples into the deep migration network for prediction to obtain the click rate prediction of the test samples.
In the embodiment of the invention, transfer learning is combined into the click rate estimation method, so as to improve the prediction effect of the lightweight perceptron network while keeping its low-latency advantage. Meanwhile, the GloVe technique is combined to initialize the characterization vectors, which makes the training process of the deep migration network more stable.
Besides the preprocessing the data undergo before being passed into the input component, the initial parameters of the Embedding layer in the input component are initialized, before the training stage, with the characterization vectors obtained by the GloVe technique.
The invention overcomes the defect that the FM model in the prior art contains no feature combinations beyond the second order, and uses the strong transfer learning capability of the deep migration network to let the deep perceptron network guide the learning of the lightweight perceptron network, thereby obtaining a lightweight perceptron network with better effect and better performance; it finally integrates the FM network, the lightweight perceptron network and the deep perceptron network for training to obtain the click rate loss, thus optimizing the click rate prediction method and improving the prediction accuracy.
Preferably, the deep migration network comprises an embedding layer Embedding for converting the feature id index codes into corresponding characterization vectors, a factorization FM network for taking inner products of the corresponding characterization vectors to obtain an FM predicted click rate, a shared perceptron network for nonlinearly transforming the corresponding characterization vectors into abstract characterization vectors, a lightweight perceptron network that takes the abstract characterization vectors as input to obtain a lightweight predicted click rate, and a deep perceptron network that takes the abstract characterization vectors as input to obtain a deep predicted click rate.
Preferably, S40 includes:
S401, inputting the feature id index codes into the embedding layer Embedding of the deep migration network to obtain corresponding characterization vectors;
S402, inputting the corresponding characterization vectors into the factorization FM network, which takes inner products of the corresponding characterization vectors to obtain the FM predicted click rate $p_{fm}(x)$, where the inner product formula is as follows:

$$p_{fm}(x) = \mathrm{sigmoid}\Big(W_{fm}x + b_{fm} + \sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle v_i, v_j\rangle x_i x_j\Big),$$

where $x$ is the input feature id index code, $v$ is the characterization vector representing a feature id index code, $i, j \in n$, $i$ and $j$ are subscripts of different feature id index codes, $n$ is the total number of feature id index codes, $W_{fm}$ is the weight parameter of the FM network's linear regression term, $b_{fm}$ is the bias parameter of the FM network's linear regression term, and $\langle v_i, v_j\rangle$ is the inner product of the characterization vectors $v_i$ and $v_j$;
the corresponding characterization vectors are input into the shared perceptron network for nonlinear transformation, yielding the abstract characterization vector: $h_s = \mathrm{sigmoid}(W_s v + b_s)$, where $v$ is the characterization vector input to the shared perceptron network, $W_s$ is the weight parameter of the shared perceptron network, $b_s$ is the bias parameter of the shared perceptron network, and $h_s$ is the abstract characterization vector output by the shared perceptron;
the abstract characterization vector $h_s$ is input into the deep perceptron network, and the deep perceptron network's predicted click rate $p_{deep}(x)$ is obtained through the following feedforward calculation:

$$h^{(l)} = \mathrm{ReLU}\big(W^{(l)}_{deep} h^{(l-1)} + b^{(l)}_{deep}\big),\quad h^{(0)} = h_s,\qquad p_{deep}(x) = \mathrm{sigmoid}\big(z_{deep}(x)\big),$$

where ReLU is the activation function, $W^{(l)}_{deep}$ is the $l$-th layer weight parameter of the deep perceptron network, $b^{(l)}_{deep}$ is the $l$-th layer bias parameter of the deep perceptron network, $h^{(l)}$ is the output vector of the $l$-th layer, $h_s$ is the output vector of the shared perceptron network, $z_{deep}(x)$ is the output of the final layer, and the specific number of layers needs to be set manually;
the abstract characterization vector $h_s$ is input into the lightweight perceptron network, and the lightweight perceptron network's predicted click rate $p_{light}(x)$ is obtained through the following feedforward calculation:

$$h^{(l)} = \mathrm{ReLU}\big(W^{(l)}_{light} h^{(l-1)} + b^{(l)}_{light}\big),\quad h^{(0)} = h_s,\qquad p_{light}(x) = \mathrm{sigmoid}\big(z_{light}(x)\big),$$

where ReLU is the activation function, $W^{(l)}_{light}$ is the $l$-th layer weight parameter of the lightweight perceptron network, $b^{(l)}_{light}$ is the $l$-th layer bias parameter of the lightweight perceptron network, $h^{(l)}$ is the output vector of the $l$-th layer, $h_s$ is the output vector of the shared perceptron network, the number of layers takes an empirical value, and the lightweight perceptron network has fewer layers than the deep perceptron network;
S403, integrating the predicted click rates of the FM network, the lightweight perceptron network and the deep perceptron network to calculate the click rate loss $L(x;W,b)$, with the calculation formula:

$$L(x;W,b) = H\big(y, p_{fm}(x)\big) + H\big(y, p_{light}(x)\big) + H\big(y, p_{deep}(x)\big) + \lambda\,\big\|z_{light}(x) - z_{deep}(x)\big\|^2,$$

where $H(y,p)$ is the cross entropy loss function commonly used for classification tasks, $x$ is the input feature id index code, $y$ is the classification label value of the training data, $p_{fm}(x)$ is the click rate predicted value of the FM network, $p_{light}(x)$ is the click rate predicted value of the lightweight perceptron network, $p_{deep}(x)$ is the click rate predicted value of the deep network, $\lambda$ is the weight given to the prediction error between the lightweight perceptron network and the deep network and takes a validated value, $z_{light}(x)$ is the lightweight perceptron network's output before the sigmoid conversion, with $p_{light}(x) = \mathrm{sigmoid}(z_{light}(x))$, and $z_{deep}(x)$ is the deep network's output before the sigmoid conversion, with $p_{deep}(x) = \mathrm{sigmoid}(z_{deep}(x))$;
S404, updating all parameters of the deep migration network by a back propagation algorithm according to the click rate loss $L(x;W,b)$.
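To make the data flow of S401-S403 concrete, here is a minimal NumPy sketch of the forward pass and the combined loss. All layer sizes, the field count, and the initialization scale are illustrative assumptions; in practice S404 would update the parameters by back propagation through an automatic differentiation framework rather than hand-written gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

# Illustrative sizes: vocabulary of feature id codes, embedding dim k,
# fields per sample, shared-layer width.
n_feat, k, n_fields, d_h = 1000, 8, 5, 32

V = rng.normal(0.0, 0.01, (n_feat, k))       # Embedding table (GloVe-initialized per S30)
w_fm = rng.normal(0.0, 0.01, n_feat)         # W_fm: FM linear-term weights
b_fm = 0.0                                   # b_fm: FM bias

W_s = rng.normal(0.0, 0.01, (d_h, n_fields * k))   # shared perceptron
b_s = np.zeros(d_h)

def make_mlp(sizes):
    return [(rng.normal(0.0, 0.01, (m, n)), np.zeros(m)) for n, m in zip(sizes, sizes[1:])]

deep_layers = make_mlp([d_h, 32, 32, 32])    # deep perceptron: more layers
light_layers = make_mlp([d_h, 32])           # lightweight perceptron: fewer layers
w_deep, c_deep = rng.normal(0.0, 0.01, 32), 0.0    # heads producing z_deep, z_light
w_light, c_light = rng.normal(0.0, 0.01, 32), 0.0

def cross_entropy(y, p):
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def loss(x_ids, y, lam=0.1):
    v = V[x_ids]                             # (n_fields, k) characterization vectors

    # FM network: linear regression term + explicit pairwise inner products
    pair = sum(float(v[i] @ v[j]) for i in range(n_fields) for j in range(i + 1, n_fields))
    p_fm = sigmoid(b_fm + float(w_fm[x_ids].sum()) + pair)

    # Shared perceptron: abstract characterization vector h_s
    h_s = sigmoid(W_s @ v.reshape(-1) + b_s)

    # Deep perceptron
    h = h_s
    for W, b in deep_layers:
        h = relu(W @ h + b)
    z_deep = float(w_deep @ h + c_deep)

    # Lightweight perceptron
    h = h_s
    for W, b in light_layers:
        h = relu(W @ h + b)
    z_light = float(w_light @ h + c_light)

    p_deep, p_light = sigmoid(z_deep), sigmoid(z_light)

    # S403: three cross entropies plus the migration (teacher-student) penalty
    return (cross_entropy(y, p_fm) + cross_entropy(y, p_light)
            + cross_entropy(y, p_deep) + lam * (z_light - z_deep) ** 2)

print(loss(np.array([3, 41, 7, 999, 12]), y=1.0))
```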
Preferably, the discretization in S10 maps each continuous value to an integer by a floor operation, where V is the value of the continuous feature, D is the discretized integer value, N is a constant, $\lfloor\cdot\rfloor$ is the round-down symbol, and N needs to be determined according to the specific feature's value range.
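As an illustration, the following is a minimal sketch of such a discretization. The patent's exact formula is not reproduced in this text, so a logarithmic bucketing of the form $D = \lfloor \log_N V \rfloor$ is assumed here; it is consistent with the stated variables (a continuous value V, a per-feature constant N, and a round-down operation) and with the later remark that the transform targets long-tail distributed values with a very wide range.

```python
import math

# Assumed discretization D = floor(log_N(V)): values below 1 are clamped so the
# bucket id stays non-negative; N controls how coarse the buckets are.
def discretize(V: float, N: float) -> int:
    return math.floor(math.log(max(V, 1.0), N))

# e.g. with N = 2, monetary amounts spanning three orders of magnitude
# collapse into a handful of integer buckets:
print([discretize(v, 2) for v in (1.0, 5.0, 100.0, 1000.0)])  # [0, 2, 6, 9]
```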
Preferably, S30 specifically includes:
S301, taking the co-occurrence frequencies of the training sample feature id index codes in different samples as matrix elements to create the feature co-occurrence frequency matrix $C_{n\times n}$;
S302, performing matrix decomposition on the feature co-occurrence frequency matrix $C_{n\times n}$ based on the matrix multiplication recovery error to obtain the parameters of a matrix $V_{n\times k}$, and updating the $V_{n\times k}$ parameters by gradient descent, the matrix decomposition formula being

$$\hat{C}_{i,j} = v_i^{\top} v_j + b_i + b_j,$$

where $\hat{C}_{i,j}$ is the estimate of the matrix element $C_{i,j}$ of $C_{n\times n}$, $i, j \in n$, $i$ and $j$ are subscripts of different feature id index codes, $n$ is the total number of feature id index codes, and $b_i$ and $b_j$ are bias terms. The calculation is based on the error $J$ between the matrix elements $C_{i,j}$ and their estimates $\hat{C}_{i,j}$:

$$J = \sum_{i,j \in n}\big(\hat{C}_{i,j} - C_{i,j}\big)^2,$$

where $C_{i,j}$ is an element of the matrix $C_{n\times n}$, $v_i$ and $v_j$ are row vectors of the matrix $V_{n\times k}$, and $b_i$ and $b_j$ are the bias terms; with the goal of minimizing the error $J$, the $V_{n\times k}$ parameters are updated by gradient descent;
S303, taking the rows of the updated matrix $V_{n\times k}$ as the characterization vectors of the corresponding features.
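The following NumPy sketch makes S301-S303 concrete: it builds the co-occurrence frequency matrix from id-coded samples and minimizes the squared recovery error J by gradient descent. The dimension k, learning rate and iteration count are illustrative assumptions.

```python
import numpy as np

def cooccurrence_matrix(samples, n):
    """samples: list of feature-id-code lists; C[i, j] counts co-occurrences of i and j."""
    C = np.zeros((n, n))
    for ids in samples:
        for i in ids:
            for j in ids:
                if i != j:
                    C[i, j] += 1.0
    return C

def factorize(C, k=8, lr=0.005, epochs=500, seed=0):
    """Minimize J = sum_{i,j} (v_i.v_j + b_i + b_j - C_ij)^2 by gradient descent."""
    n = C.shape[0]
    rng = np.random.default_rng(seed)
    V = rng.normal(0.0, 0.1, (n, k))                 # row i is the characterization vector v_i
    b = np.zeros(n)                                  # bias terms b_i
    for _ in range(epochs):
        E = V @ V.T + b[:, None] + b[None, :] - C    # residual C_hat - C
        V -= lr * 2.0 * (E @ V + E.T @ V)            # dJ/dV
        b -= lr * 2.0 * (E.sum(axis=1) + E.sum(axis=0))  # dJ/db
    return V, b

samples = [[0, 2, 3], [1, 2, 3], [0, 1, 3]]
V, b = factorize(cooccurrence_matrix(samples, n=4))
print(V.shape)   # (4, 8): one characterization vector per feature id code
```

In the patent, the resulting rows of $V_{n\times k}$ seed the Embedding layer, after which they continue to be updated by back propagation together with the rest of the network.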
Preferably, the training samples are data marked with click category labels; the click categories include not-clicked and clicked, the not-clicked category label being 0 and the clicked category label being 1; the test samples are data without click category labels.
The invention also discloses a click rate prediction device based on the deep migration network, which is used to implement the above method; for the method, refer to the above embodiments. Since the device adopts all the technical solutions of all the above embodiments, it has at least all the beneficial effects brought by those technical solutions, which are not repeated here. The device comprises:
The discrete module is used for carrying out discretization processing on continuous fields of the training samples so as to obtain discrete characteristics of the training samples;
the mapping module is used for creating a unique feature id index code for each training sample discrete feature and creating a mapping dictionary M of the discrete feature according to the mapping relation between the training sample discrete feature and the feature id index code;
the feature characterization module is used for counting the co-occurrence frequencies of the feature id index codes in different samples to create the feature co-occurrence frequency matrix, converting the feature id index codes into the characterization vector matrix of the feature co-occurrence frequency matrix through a GloVe model, and taking the characterization vectors as the initialization parameters of the Embedding layer of the deep migration network;
the training module is used for inputting the feature id index codes into the deep migration network for training to obtain the click rate loss, and updating all parameters of the deep migration network by a back propagation algorithm;
the test module is used for discretizing the test samples to obtain discrete features of the test samples, and matching and mapping the discrete features of the test samples based on the mapping dictionary M to obtain the feature id index codes of the test samples;
and the prediction module is used for inputting the feature id index codes of the test samples into the deep migration network for prediction to obtain the click rate prediction of the test samples.
Preferably, the training module comprises:
the Embedding sub-module is used for inputting the feature id index codes into the embedding layer Embedding of the deep migration network to obtain corresponding characterization vectors;
the subnet prediction sub-module is used for inputting the corresponding characterization vectors into the factorization FM network, which takes inner products of the corresponding characterization vectors to obtain the FM predicted click rate $p_{fm}(x)$, where the inner product formula is as follows:

$$p_{fm}(x) = \mathrm{sigmoid}\Big(W_{fm}x + b_{fm} + \sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle v_i, v_j\rangle x_i x_j\Big),$$

where $x$ is the input feature id index code, $v$ is the characterization vector representing a feature id index code, $i, j \in n$, $i$ and $j$ are subscripts of different feature id index codes, $n$ is the total number of feature id index codes, $W_{fm}$ is the weight parameter of the FM network's linear regression term, $b_{fm}$ is the bias parameter of the FM network's linear regression term, and $\langle v_i, v_j\rangle$ is the inner product of the characterization vectors $v_i$ and $v_j$;
the corresponding characterization vectors are input into the shared perceptron network for nonlinear transformation, yielding the abstract characterization vector: $h_s = \mathrm{sigmoid}(W_s v + b_s)$, where $v$ is the characterization vector input to the shared perceptron network, $W_s$ is the weight parameter of the shared perceptron network, $b_s$ is the bias parameter of the shared perceptron network, and $h_s$ is the abstract characterization vector output by the shared perceptron;
the abstract characterization vector $h_s$ is input into the deep perceptron network, and the deep perceptron network's predicted click rate $p_{deep}(x)$ is obtained through the following feedforward calculation:

$$h^{(l)} = \mathrm{ReLU}\big(W^{(l)}_{deep} h^{(l-1)} + b^{(l)}_{deep}\big),\quad h^{(0)} = h_s,\qquad p_{deep}(x) = \mathrm{sigmoid}\big(z_{deep}(x)\big),$$

where ReLU is the activation function, $W^{(l)}_{deep}$ is the $l$-th layer weight parameter of the deep perceptron network, $b^{(l)}_{deep}$ is the $l$-th layer bias parameter of the deep perceptron network, $h^{(l)}$ is the output vector of the $l$-th layer, $h_s$ is the output vector of the shared perceptron network, $z_{deep}(x)$ is the output of the final layer, and the specific number of layers needs to be set manually;
the abstract characterization vector $h_s$ is input into the lightweight perceptron network, and the lightweight perceptron network's predicted click rate $p_{light}(x)$ is obtained through the following feedforward calculation:

$$h^{(l)} = \mathrm{ReLU}\big(W^{(l)}_{light} h^{(l-1)} + b^{(l)}_{light}\big),\quad h^{(0)} = h_s,\qquad p_{light}(x) = \mathrm{sigmoid}\big(z_{light}(x)\big),$$

where ReLU is the activation function, $W^{(l)}_{light}$ is the $l$-th layer weight parameter of the lightweight perceptron network, $b^{(l)}_{light}$ is the $l$-th layer bias parameter of the lightweight perceptron network, $h^{(l)}$ is the output vector of the $l$-th layer, $h_s$ is the output vector of the shared perceptron network, the number of layers takes an empirical value, and the lightweight perceptron network has fewer layers than the deep perceptron network;
the integrated prediction sub-module is used for integrating the predicted click rates of the FM network, the lightweight perceptron network and the deep perceptron network to calculate the click rate loss $L(x;W,b)$, with the calculation formula:

$$L(x;W,b) = H\big(y, p_{fm}(x)\big) + H\big(y, p_{light}(x)\big) + H\big(y, p_{deep}(x)\big) + \lambda\,\big\|z_{light}(x) - z_{deep}(x)\big\|^2,$$

where $H(y,p)$ is the cross entropy loss function commonly used for classification tasks, $x$ is the input feature id index code, $y$ is the classification label value of the training data, $p_{fm}(x)$ is the click rate predicted value of the FM network, $p_{light}(x)$ is the click rate predicted value of the lightweight perceptron network, $p_{deep}(x)$ is the click rate predicted value of the deep network, $\lambda$ is the weight given to the prediction error between the lightweight perceptron network and the deep network and takes a validated value, $z_{light}(x)$ is the lightweight perceptron network's output before the sigmoid conversion, with $p_{light}(x) = \mathrm{sigmoid}(z_{light}(x))$, and $z_{deep}(x)$ is the deep network's output before the sigmoid conversion, with $p_{deep}(x) = \mathrm{sigmoid}(z_{deep}(x))$;
and the parameter updating sub-module is used for updating all parameters of the deep migration network by a back propagation algorithm according to the click rate loss $L(x;W,b)$.
Preferably, the discretization processing in the discretization module maps each continuous value to an integer by a floor operation, where V is the value of the continuous feature, D is the discretized integer value, N is a constant, $\lfloor\cdot\rfloor$ is the round-down symbol, and N needs to be determined according to the specific feature's value range.
Preferably, the feature characterization module includes:
the feature co-occurrence sub-module is used for creating the feature co-occurrence frequency matrix $C_{n\times n}$ by taking the co-occurrence frequencies of the training sample feature id index codes in different samples as matrix elements;
the decomposition sub-module is used for performing matrix decomposition on the co-occurrence frequency matrix $C_{n\times n}$ based on the matrix multiplication recovery error to obtain the parameters of a matrix $V_{n\times k}$ and updating the $V_{n\times k}$ parameters by gradient descent, the matrix decomposition formula being

$$\hat{C}_{i,j} = v_i^{\top} v_j + b_i + b_j,$$

where $\hat{C}_{i,j}$ is the estimate of the matrix element $C_{i,j}$ of $C_{n\times n}$, $i, j \in n$, $i$ and $j$ are subscripts of different feature id index codes, $n$ is the total number of feature id index codes, and $b_i$ and $b_j$ are bias terms; with the goal of minimizing the error, the calculation is based on the error $J$ between the matrix elements $C_{i,j}$ and their estimates $\hat{C}_{i,j}$:

$$J = \sum_{i,j \in n}\big(\hat{C}_{i,j} - C_{i,j}\big)^2,$$

where $C_{i,j}$ is an element of the matrix $C_{n\times n}$, $v_i$ and $v_j$ are row vectors of the matrix $V_{n\times k}$, and $b_i$ and $b_j$ are the bias terms, $V_{n\times k}$ being updated by gradient descent;
and the feature characterization sub-module is used for taking the rows of the decomposed matrix $V_{n\times k}$ as the characterization vectors of the corresponding features.
It is additionally understood that the model architecture of the deep migration network is shown in FIG. 1; the model has five components:
1. Input component
The input component requires data to be input in the form of discrete feature id codes; the feature id codes are then passed into the encoding layer to generate characterization vectors. Data are processed into feature id codes so that the encoding layer can conveniently and quickly retrieve the corresponding characterization vectors. Therefore, before data are passed into the input component, continuous floating point values such as monetary amounts are discretized; specifically, long-tail distributed floating point values with a very wide value range can be discretized by a floor operation, where V is the value of the continuous feature, D is the discretized integer value, N is a constant, and $\lfloor\cdot\rfloor$ is the round-down symbol. Furthermore, features that are discrete in nature are also processed: for example, gender = [male, female] is converted into gender = [0, 1], and age = [12, 13, ..., 80] is converted into age = [0, 1, ..., 68].
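As a small illustration of this id coding and the mapping dictionary M of S20, the following sketch assigns each distinct (field, value) pair a unique feature id code and applies the dictionary to a new sample. The field names and the silent dropping of features unseen in training are illustrative assumptions.

```python
def build_mapping(train_rows):
    """Assign each distinct (field, value) discrete feature a unique id code (S20)."""
    M = {}
    for row in train_rows:
        for field, value in row.items():
            M.setdefault((field, value), len(M))
    return M

def encode(row, M):
    """Map a sample's discrete features to feature id codes via M (S50 for test data)."""
    return [M[(f, v)] for f, v in row.items() if (f, v) in M]

train = [{"gender": "male", "age": 1}, {"gender": "female", "age": 56}]
M = build_mapping(train)
print(encode({"gender": "female", "age": 1}, M))   # [2, 1]
```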
Besides the processing the data undergo before being passed into the input component, the initial parameters of the Embedding layer in the input component are initialized, before the training stage, with the characterization vectors obtained by the GloVe technique. The steps for converting the feature id codes into characterization vectors with GloVe are as follows:
s3.1: counting the number of times of the co-occurrence of the different samples in the data according to the co-occurrence characteristics of the different samples to finally obtain a characteristic co-occurrence frequency matrix C n×n ;
S3.2: the characteristic co-occurrence frequency matrix is subjected to the following matrix decomposition:
wherein V is n×k For the decomposed matrix, bias is the bias term. Recovering an error based on the following matrix multiplication:
wherein C is i,j Is a matrix C n×n V of elements (v) i And v j Is a matrix V n×k Row vector, b i And b j As a deviation term, a matrix V is obtained through gradient descent update n×k Parameters of (2)
S3.3: matrix V after characteristic co-occurrence frequency matrix decomposition n×k The row vector serves as a characterization vector for the corresponding feature.
2. FM network
The FM network takes the characterization vectors as input and uses inner products of characterization vectors as feature combinations. This explicit second-order feature combination gives higher efficiency and better generalization in click rate estimation scenarios where data sparsity is high. Compared with the perceptron networks, which contain no explicit feature combinations, integrating the FM network into the deep migration network can guide the Embedding layer of the deep migration network to learn better characterization vectors.
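One practical note on the explicit second-order term: although presented as a double sum over feature pairs, it is commonly evaluated with the standard FM identity $\sum_{i<j}\langle v_i, v_j\rangle = \tfrac{1}{2}\big(\|\sum_i v_i\|^2 - \sum_i \|v_i\|^2\big)$, which reduces the cost from $O(n^2 k)$ to $O(nk)$. This optimization is not stated in the patent; the sketch below merely verifies the identity.

```python
import numpy as np

def fm_pairwise(v):
    """Sum of all pairwise inner products of the rows of v, in O(n*k)."""
    s = v.sum(axis=0)                     # sum of characterization vectors
    return 0.5 * float(s @ s - (v * v).sum())

v = np.random.default_rng(1).normal(size=(5, 8))
brute = sum(float(v[i] @ v[j]) for i in range(5) for j in range(i + 1, 5))
assert np.isclose(fm_pairwise(v), brute)   # identity holds
```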
3. Shared perceptron network
The shared perceptron network takes the characterization vectors as input and uses the complex nonlinear mapping capability of a perceptron network to convert the original characterization vectors into more abstract characterization vectors, which the subsequent lightweight perceptron network and deep perceptron network share as their input.
4. Deep layer perceptron network
The deep perceptron network takes the abstract characterization vector converted by the shared perceptron network as input and performs further nonlinear combination through more layers, giving it the capability of representing higher-order feature combinations and therefore better performance than the lightweight perceptron network. In order to transfer the information learned by the deep perceptron network to the lightweight one, the click rate output by the deep perceptron network is used to enrich the original 0-1 clicked/not-clicked labels of the data. The training samples are data marked with click category labels; the click categories include not-clicked and clicked, the not-clicked category labeled 0 and the clicked category labeled 1; the test samples are data without click category labels. Whereas the original labels only provide the category membership, 1 or 0, the click rate output by the deep perceptron network provides more information: not merely that one category is more probable than the other, but the exact strength of that probability.
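A small numeric illustration of why this matters (values are made up for the example): two samples can share the hard label 1 while the deep network grades them very differently, and it is this graded signal that the $\lambda\|z_{light}(x) - z_{deep}(x)\|^2$ term passes to the lightweight network.

```python
# Illustrative values only (not from the patent): two clicked samples share the
# hard label 1, but the deep network's logits z_deep grade them differently.
# The lambda * (z_light - z_deep)**2 term pushes the lightweight network toward
# these graded targets, which carry more information than 0/1 labels alone.
z_deep = [2.6, 0.05]                      # confident click vs. marginal click
z_light = [1.0, 1.0]                      # an untrained light network treats them alike
penalty = [(zl - zd) ** 2 for zl, zd in zip(z_light, z_deep)]
print(penalty)                            # [2.56, 0.9025] -> different corrections
```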
5. Light-weight perceptron network
The lightweight perceptron network takes the abstract characterization vector converted by the shared perceptron network as input. On the one hand, it exploits the information in the abstract characterization vector through a small number of shallow nonlinear layers to predict the click rate more accurately; on the other hand, by fitting the predicted click rate of the deep perceptron network, it learns the information the deep perceptron network has mined from the data, improving its prediction effect while keeping its low latency.
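The low-latency property comes from the serving path: assuming, as the stated goal suggests, that only the shared perceptron and the lightweight perceptron are executed at prediction time (the FM and deep subnets having served their purpose during training), inference reduces to an embedding lookup plus a few small matrix-vector products. A sketch under that assumption, with illustrative sizes and randomly initialized parameters standing in for trained ones:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def predict_light(x_ids, V, W_s, b_s, light_layers, w_out, b_out):
    """Serving path: embedding lookup -> shared perceptron -> light perceptron."""
    h = sigmoid(W_s @ V[x_ids].reshape(-1) + b_s)   # abstract characterization vector h_s
    for W, b in light_layers:                       # only a few shallow layers
        h = relu(W @ h + b)
    return float(sigmoid(w_out @ h + b_out))        # p_light(x)

rng = np.random.default_rng(0)
V = rng.normal(0.0, 0.01, (1000, 8))                # trained parameters in practice
W_s, b_s = rng.normal(0.0, 0.01, (32, 40)), np.zeros(32)
light_layers = [(rng.normal(0.0, 0.01, (32, 32)), np.zeros(32))]
w_out, b_out = rng.normal(0.0, 0.01, 32), 0.0
print(predict_light(np.array([3, 41, 7, 999, 12]), V, W_s, b_s, light_layers, w_out, b_out))
```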
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the invention, and all equivalent structural changes made by the description of the present invention and the accompanying drawings or direct/indirect application in other related technical fields are included in the scope of the invention.
Claims (8)
1. A click rate prediction method based on a deep migration network, characterized by comprising the following steps:
S10, discretizing continuous fields of training samples to obtain discrete features of the training samples;
S20, creating a unique feature id index code for each training sample discrete feature, and creating a mapping dictionary M of the discrete features according to the mapping relation between the training sample discrete features and the feature id index codes;
S30, counting the co-occurrence frequencies of the feature id index codes in different samples to create a feature co-occurrence frequency matrix, converting the feature id index codes into the characterization vector matrix of the feature co-occurrence frequency matrix through a GloVe model, and taking the characterization vectors as the initialization parameters of the Embedding layer of the deep migration network;
S40, inputting the feature id index codes into the deep migration network to obtain the cross entropy loss of the predicted click rate, and updating all parameters of the deep migration network by a back propagation algorithm, comprising:
S401, inputting the feature id index codes into the embedding layer Embedding of the deep migration network to obtain corresponding characterization vectors;
S402, inputting the corresponding characterization vectors into the factorization FM network, which takes inner products of the corresponding characterization vectors to obtain the FM predicted click rate $p_{fm}(x)$, where the inner product formula is as follows:

$$p_{fm}(x) = \mathrm{sigmoid}\Big(W_{fm}x + b_{fm} + \sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle v_i, v_j\rangle x_i x_j\Big),$$

where $x$ is the input feature id index code, $v$ is the characterization vector representing a feature id index code, $i, j \in n$, $i$ and $j$ are subscripts of different feature id index codes, $n$ is the total number of feature id index codes, $W_{fm}$ is the weight parameter of the FM network's linear regression term, $b_{fm}$ is the bias parameter of the FM network's linear regression term, and $\langle v_i, v_j\rangle$ is the inner product of the characterization vectors $v_i$ and $v_j$;
the corresponding characterization vectors are input into the shared perceptron network for nonlinear transformation, yielding the abstract characterization vector: $h_s = \mathrm{sigmoid}(W_s v + b_s)$, where $v$ is the characterization vector input to the shared perceptron network, $W_s$ is the weight parameter of the shared perceptron network, $b_s$ is the bias parameter of the shared perceptron network, and $h_s$ is the abstract characterization vector output by the shared perceptron;
the abstract characterization vector $h_s$ is input into the deep perceptron network, and the deep perceptron network's predicted click rate $p_{deep}(x)$ is obtained through the following feedforward calculation:

$$h^{(l)} = \mathrm{ReLU}\big(W^{(l)}_{deep} h^{(l-1)} + b^{(l)}_{deep}\big),\quad h^{(0)} = h_s,\qquad p_{deep}(x) = \mathrm{sigmoid}\big(z_{deep}(x)\big),$$

where ReLU is the activation function, $W^{(l)}_{deep}$ is the $l$-th layer weight parameter of the deep perceptron network, $b^{(l)}_{deep}$ is the $l$-th layer bias parameter of the deep perceptron network, $h^{(l)}$ is the output vector of the $l$-th layer, $h_s$ is the output vector of the shared perceptron network, $z_{deep}(x)$ is the output of the final layer, and the specific number of layers needs to be set manually;
the abstract characterization vector $h_s$ is input into the lightweight perceptron network, and the lightweight perceptron network's predicted click rate $p_{light}(x)$ is obtained through the following feedforward calculation:

$$h^{(l)} = \mathrm{ReLU}\big(W^{(l)}_{light} h^{(l-1)} + b^{(l)}_{light}\big),\quad h^{(0)} = h_s,\qquad p_{light}(x) = \mathrm{sigmoid}\big(z_{light}(x)\big),$$

where ReLU is the activation function, $W^{(l)}_{light}$ is the $l$-th layer weight parameter of the lightweight perceptron network, $b^{(l)}_{light}$ is the $l$-th layer bias parameter of the lightweight perceptron network, $h^{(l)}$ is the output vector of the $l$-th layer, $h_s$ is the output vector of the shared perceptron network, the number of layers takes an empirical value, and the lightweight perceptron network has fewer layers than the deep perceptron network;
S403, integrating the predicted click rates of the FM network, the lightweight perceptron network and the deep perceptron network to calculate the click rate loss $L(x;W,b)$, with the calculation formula:

$$L(x;W,b) = H\big(y, p_{fm}(x)\big) + H\big(y, p_{light}(x)\big) + H\big(y, p_{deep}(x)\big) + \lambda\,\big\|z_{light}(x) - z_{deep}(x)\big\|^2,$$

where $H(y,p)$ is the cross entropy loss function commonly used for classification tasks, $x$ is the input feature id index code, $y$ is the classification label value of the training data, $p_{fm}(x)$ is the click rate predicted value of the FM network, $p_{light}(x)$ is the click rate predicted value of the lightweight perceptron network, $p_{deep}(x)$ is the click rate predicted value of the deep network, $\lambda$ is the weight given to the prediction error between the lightweight perceptron network and the deep network and takes a validated value, $z_{light}(x)$ is the lightweight perceptron network's output before the sigmoid conversion, with $p_{light}(x) = \mathrm{sigmoid}(z_{light}(x))$, and $z_{deep}(x)$ is the deep network's output before the sigmoid conversion, with $p_{deep}(x) = \mathrm{sigmoid}(z_{deep}(x))$;
S404, updating all parameters of the deep migration network by adopting a back propagation algorithm according to the click rate loss L (x; W, b);
S50, discretizing the test sample to obtain the test sample's discrete features, and matching and mapping the test sample's discrete features based on the mapping dictionary M to obtain the test sample's feature id index codes;
S60, inputting the test sample's feature id index codes into the deep migration network for prediction to obtain the click rate prediction of the test sample.
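A sketch of steps S50-S60: discretized test features are looked up in the mapping dictionary M built from the training data. The dictionary contents are made up, and the fallback to a reserved "unknown" id for features unseen in training is an assumption, since the patent does not specify this case:

```python
# Mapping dictionary M built during training: discrete feature -> id index code
M = {"age_bucket=3": 0, "city=shenzhen": 1, "device=ios": 2}
UNKNOWN_ID = len(M)  # assumed fallback id for features unseen in training

def encode_test_sample(discrete_features):
    """Match-map a test sample's discrete features to feature id index codes."""
    return [M.get(f, UNKNOWN_ID) for f in discrete_features]

ids = encode_test_sample(["city=shenzhen", "device=android"])
# -> [1, 3]; the ids are then fed to the deep migration network for prediction
```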
2. The click rate prediction method based on a deep migration network of claim 1, wherein the deep migration network comprises an embedding layer (Embedding) for converting feature id index codes into corresponding characterization vectors, a factorization machine (FM) network for taking inner products of the corresponding characterization vectors to obtain the FM predicted click rate, a shared perceptron network for nonlinearly transforming the corresponding characterization vectors to obtain abstract characterization vectors, a lightweight perceptron network that takes the abstract characterization vectors as input to obtain the lightweight predicted click rate, and a deep perceptron network that takes the abstract characterization vectors as input to obtain the deep predicted click rate.
3. The click rate prediction method based on the deep migration network according to claim 1, wherein the discretization processing formula in S10 is specifically as follows:
4. The click rate prediction method based on the deep migration network of claim 1, wherein S30 specifically comprises:
S301, counting the co-occurrence frequencies of the training samples' feature id index codes as matrix elements to create a feature co-occurrence frequency matrix C_{n×n};
S302, performing matrix decomposition on the feature co-occurrence frequency matrix C_{n×n} based on the matrix-multiplication recovery error to obtain the parameters of a matrix V_{n×k}, which are updated by gradient descent; the matrix decomposition formula is

Ĉ_{i,j} = v_i·v_j + b_i + b_j,

and the error J between the matrix elements C_{i,j} and their estimates Ĉ_{i,j} is calculated as

J = Σ_{i,j ∈ n} (v_i·v_j + b_i + b_j − C_{i,j})²,

wherein Ĉ_{i,j} is the estimate of the element C_{i,j} of the matrix C_{n×n}; i, j ∈ n, i and j are different feature id index code subscripts; n is the total number of feature id index codes; v_i and v_j are row vectors of the matrix V_{n×k}; b_i and b_j are bias terms; and, with the aim of minimizing the error J, the V_{n×k} parameters are updated by gradient descent (a minimal sketch follows claim 4 below);
S303, taking the updated matrix V_{n×k} as the characterization vectors of the corresponding features.
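A minimal gradient-descent sketch of the decomposition in S302, reconstructing C from V·Vᵀ plus biases; the learning rate, epoch count, and fully dense implementation are illustrative assumptions, not the patent's settings:

```python
import numpy as np

def factorize_cooccurrence(C, k=8, lr=1e-3, epochs=500, seed=0):
    """Gradient descent on J = sum_{i,j} (v_i . v_j + b_i + b_j - C[i,j])^2.

    Returns V (n, k), whose rows become the characterization vectors, and b (n,).
    """
    n = C.shape[0]
    rng = np.random.default_rng(seed)
    V = rng.normal(scale=0.1, size=(n, k))         # row i is the vector v_i
    b = np.zeros(n)                                # bias terms b_i
    for _ in range(epochs):
        E = V @ V.T + b[:, None] + b[None, :] - C  # residuals C_hat - C
        V -= lr * 2.0 * (E + E.T) @ V              # dJ/dV
        b -= lr * 2.0 * (E.sum(axis=1) + E.sum(axis=0))  # dJ/db
    return V, b

# Toy co-occurrence matrix for two features that often appear together
C = np.array([[4.0, 3.0], [3.0, 5.0]])
V, b = factorize_cooccurrence(C, k=2)
# rows of V initialize the Embedding layer of the deep migration network
```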
5. The click rate prediction method based on the deep migration network according to claim 1, wherein the training samples are data marked with click category labels, the click categories comprise not-clicked and clicked, the not-clicked category label is 0 and the clicked category label is 1; the test samples are data without click category labels.
6. A click rate prediction apparatus based on a deep migration network, comprising:
the discrete module is used for discretizing the continuous fields of the training samples to obtain the training samples' discrete features;
the mapping module is used for creating a unique feature id index code for each training-sample discrete feature and creating a mapping dictionary M of the discrete features according to the mapping relation between the training samples' discrete features and the feature id index codes;
the feature characterization module is used for counting the co-occurrence frequencies of the feature id index codes across different samples to create a feature co-occurrence frequency matrix, decomposing the feature co-occurrence frequency matrix into a characterization-vector matrix through a GloVe model, and taking the characterization vectors as initialization parameters of the Embedding layer of the deep migration network;
the training module is used for inputting the feature id index codes into the deep migration network for training to obtain the click rate loss, and updating all parameters of the deep migration network by a back propagation algorithm; it comprises:
the Embedding submodule is used for inputting the feature id index codes into the Embedding layer of the deep migration network to obtain the corresponding characterization vectors;
the subnet prediction submodule is used for inputting the corresponding characterization vectors into the factorization machine (FM) network, which takes pairwise inner products of the corresponding characterization vectors to obtain the FM predicted click rate p_fm(x), where the inner-product formula is: p_fm(x) = sigmoid(W_fm·x + b_fm + Σ_{i=1..n} Σ_{j=i+1..n} ⟨v_i, v_j⟩·x_i·x_j), wherein x is the input feature id index code, v is the characterization vector representing a feature id index code, i, j ∈ n, i and j are different feature id index code subscripts, n is the total number of feature id index codes, W_fm is the weight parameter of the FM network's linear regression term, b_fm is the bias parameter of the FM network's linear regression term, and ⟨v_i, v_j⟩ is the inner product of the characterization vectors v_i and v_j;
the corresponding characterization vector is also input into the shared perceptron network for a nonlinear transformation, yielding the abstract characterization vector h_s = sigmoid(W_s·v + b_s), wherein v is the characterization vector input to the shared perceptron network, W_s is the weight parameter of the shared perceptron network, b_s is the bias parameter of the shared perceptron network, and h_s is the abstract characterization vector output by the shared perceptron;
the abstract characterization vector h_s is input into the deep perceptron network, and the deep perceptron network predicted click rate p_deep(x) is obtained through the following feedforward calculation: h_deep^(l) = ReLU(W_deep^(l)·h_deep^(l-1) + b_deep^(l)), with h_deep^(0) = h_s and p_deep(x) = sigmoid(z_deep(x)), wherein ReLU is the activation function, W_deep^(l) is the l-th layer weight parameter of the deep perceptron network, b_deep^(l) is the l-th layer bias parameter of the deep perceptron network, h_deep^(l) is the output vector of the l-th layer, h_s is the output vector of the shared perceptron network, z_deep(x) is the final-layer output before the sigmoid conversion, and the specific number of layers needs to be set manually;
the abstract characterization vector h_s is likewise input into the lightweight perceptron network, and the lightweight perceptron network predicted click rate p_light(x) is obtained through the following feedforward calculation: h_light^(l) = ReLU(W_light^(l)·h_light^(l-1) + b_light^(l)), with h_light^(0) = h_s and p_light(x) = sigmoid(z_light(x)), wherein ReLU is the activation function, W_light^(l) is the l-th layer weight parameter of the lightweight perceptron network, b_light^(l) is the l-th layer bias parameter of the lightweight perceptron network, h_light^(l) is the output vector of the l-th layer, h_s is the output vector of the shared perceptron network, the number of layers takes an empirical value, and the lightweight perceptron network has fewer layers than the deep perceptron network;
the integrated prediction submodule is used for integrating the predicted click rates of the FM network, the lightweight perceptron network and the deep perceptron network to calculate the click rate loss L(x; W, b), with the calculation formula as follows:
L(x; W, b) = H(y, p_fm(x)) + H(y, p_light(x)) + H(y, p_deep(x)) + λ·||z_light(x) − z_deep(x)||²,
wherein H(y, p) is the cross-entropy loss function commonly used for classification tasks, x is the input feature id index code, y is the classification label value of the training data, p = p_fm(x) denotes the FM network's predicted click rate, p = p_light(x) the lightweight perceptron network's predicted click rate, and p = p_deep(x) the deep network's predicted click rate; λ is a weight that balances the prediction discrepancy between the lightweight perceptron network and the deep network and takes an empirically tuned value; z_light(x) is the lightweight perceptron network's model output before the sigmoid conversion, p_light(x) = sigmoid(z_light(x)), and z_deep(x) is the deep network's model output before the sigmoid conversion, p_deep(x) = sigmoid(z_deep(x));
the parameter updating submodule is used for updating all parameters of the deep migration network by a back propagation algorithm according to the click rate loss L(x; W, b);
the test module is used for discretizing the test sample to obtain the test sample's discrete features, and matching and mapping the test sample's discrete features based on the mapping dictionary M to obtain the test sample's feature id index codes;
and the prediction module is used for inputting the test sample's feature id index codes into the deep migration network for prediction to obtain the click rate prediction of the test sample.
7. The click rate prediction apparatus based on the deep migration network of claim 6, wherein the discretization processing formula in the discrete module is specifically as follows:
8. The click rate prediction apparatus based on the deep migration network of claim 6, wherein the feature characterization module comprises:
the feature co-occurrence submodule is used for creating a feature co-occurrence frequency matrix C_{n×n} by taking the co-occurrence frequencies of the training samples' feature id index codes in different samples as matrix elements;
the decomposition submodule is used for performing matrix decomposition on the co-occurrence frequency matrix C_{n×n} based on the matrix-multiplication recovery error to obtain the parameters of a matrix V_{n×k}, which are updated by gradient descent; the matrix decomposition formula is Ĉ_{i,j} = v_i·v_j + b_i + b_j, and, targeting error minimization, the error J between the matrix elements C_{i,j} and their estimates Ĉ_{i,j} is calculated as J = Σ_{i,j ∈ n} (v_i·v_j + b_i + b_j − C_{i,j})², wherein Ĉ_{i,j} is the estimate of the element C_{i,j} of the matrix C_{n×n}, i, j ∈ n, i and j are different feature id index code subscripts, n is the total number of feature id index codes, v_i and v_j are row vectors of the matrix V_{n×k}, b_i and b_j are bias terms, and V_{n×k} is updated by gradient descent;
the feature characterization submodule is used for taking the updated matrix V_{n×k} as the characterization vectors of the corresponding features.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910991888.2A | 2019-10-17 | 2019-10-17 | Click rate prediction method and device based on deep migration network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110738314A (en) | 2020-01-31 |
CN110738314B (en) | 2023-05-02 |
Family
ID=69269257
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910991888.2A (CN110738314B) | Click rate prediction method and device based on deep migration network | 2019-10-17 | 2019-10-17 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110738314B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112632319B (en) * | 2020-12-22 | 2023-04-11 | Tianjin University | Method for improving overall classification accuracy of long-tail distributed speech based on transfer learning |
CN112949752B (en) * | 2021-03-25 | 2022-09-06 | Alipay (Hangzhou) Information Technology Co., Ltd. | Training method and device of business prediction system |
US20240046314A1 (en) * | 2022-08-03 | 2024-02-08 | Hong Kong Applied Science and Technology Research Institute Company Limited | Systems and methods for multidimensional knowledge transfer for click through rate prediction |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103365924A (en) * | 2012-04-09 | 2013-10-23 | Peking University | Method, device and terminal for searching information |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7933762B2 (en) * | 2004-04-16 | 2011-04-26 | Fortelligent, Inc. | Predictive model generation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110851713B (en) | Information processing method, recommending method and related equipment | |
CN110232480B (en) | Project recommendation method realized by using variational regularized stream and model training method | |
CN111339415B (en) | Click rate prediction method and device based on multi-interactive attention network | |
CN109508584B (en) | Video classification method, information processing method and server | |
CN110738314B (en) | Click rate prediction method and device based on deep migration network | |
CN109062962B (en) | Weather information fused gated cyclic neural network interest point recommendation method | |
CN112035743B (en) | Data recommendation method and device, computer equipment and storage medium | |
CN107562787B (en) | POI (point of interest) encoding method and device, POI recommendation method and electronic equipment | |
CN110347940A (en) | Method and apparatus for optimizing point of interest label | |
CN112288042B (en) | Updating method and device of behavior prediction system, storage medium and computing equipment | |
CN115658864A (en) | Conversation recommendation method based on graph neural network and interest attention network | |
CN113010656A (en) | Visual question-answering method based on multi-mode fusion and structural control | |
CN115631008B (en) | Commodity recommendation method, device, equipment and medium | |
CN112053188A (en) | Internet advertisement recommendation method based on hybrid deep neural network model | |
CN113377914A (en) | Recommended text generation method and device, electronic equipment and computer readable medium | |
CN109189922B (en) | Comment evaluation model training method and device | |
CN111832637B (en) | Distributed deep learning classification method based on alternating direction multiplier method ADMM | |
CN113420212A (en) | Deep feature learning-based recommendation method, device, equipment and storage medium | |
CN114117229A (en) | Project recommendation method of graph neural network based on directed and undirected structural information | |
CN114528490A (en) | Self-supervision sequence recommendation method based on long-term and short-term interests of user | |
CN113868451B (en) | Cross-modal conversation method and device for social network based on up-down Wen Jilian perception | |
CN117711001A (en) | Image processing method, device, equipment and medium | |
CN111460302A (en) | Data processing method and device, electronic equipment and computer readable storage medium | |
CN116401372A (en) | Knowledge graph representation learning method and device, electronic equipment and readable storage medium | |
CN114529007A (en) | Resource operation data prediction method, prediction model training method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |