CN110738314B

CN110738314B - Click rate prediction method and device based on deep migration network

Info

Publication number: CN110738314B
Application number: CN201910991888.2A
Authority: CN
Inventors: 郑子彬; 许海城
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2019-10-17
Filing date: 2019-10-17
Publication date: 2023-05-02
Anticipated expiration: 2039-10-17
Also published as: CN110738314A

Abstract

The invention discloses a click rate prediction method and a click rate prediction device based on a depth migration network, wherein the click rate prediction device is used for realizing the method, and the method comprises the steps of discretizing continuous fields; preprocessing discrete features, converting the discrete features into feature id codes, and obtaining a mapping dictionary; converting the characteristic id code into a characterization vector by using a Glove model as an initialization parameter of the depth migration network embedded layer; inputting the feature id code into a depth migration network for training; discretizing the test sample, and mapping the discrete features into feature id codes by using a mapping dictionary M; and inputting the feature id code into a depth migration network to predict the click rate. The invention optimizes the click rate prediction method, improves the prediction accuracy and simultaneously keeps the low predicted time delay.

Description

A method and device for predicting click rate based on deep migration network

技术领域Technical Field

本发明涉及大数据处理领域，特别涉及一种基于深度迁移网络的点击率预测方法及装置。The present invention relates to the field of big data processing, and in particular to a click rate prediction method and device based on a deep migration network.

背景技术Background Art

随着互联网的普及，生活中处处都和互联网息息相关：上淘宝和京东购物、上美团和饿了么订外卖、上腾讯视频和爱奇艺看电影。用户们在互联网的大量点击行为为淘宝、京东、美团等平台积累了大量宝贵数据，这些宝贵数据迫使这些平台投入资源，让这些数据产生看得到的价值，比如利用这些数据计算广告或推荐系统的预估场景等。With the popularization of the Internet, everything in life is closely related to the Internet: shopping on Taobao and JD.com, ordering takeout on Meituan and Ele.me, watching movies on Tencent Video and iQiyi. The large number of clicks of users on the Internet has accumulated a large amount of valuable data for Taobao, JD.com, Meituan and other platforms. These valuable data force these platforms to invest resources to make these data generate visible value, such as using these data to calculate the estimated scenarios of advertising or recommendation systems.

计算广告的主要目标是整合广告商、广告位提供方以及用户三方信息，从而进行更精准的广告投放，进而提高广告商的广告效果、广告位的收益以及用户体验，达到多方共赢的局面，而在这个环节中，最重要的一环便是如何进行精准的广告投放，其中用到的技术便是点击率预估(预测用户点击广告概率的一个领域)的各种方法。并且由于计算广告领域数据具有高度稀疏和量级巨大的特点，因此起初像逻辑回归这种最简单的线性模型得到广泛的使用，但是由于点击率的预估场景需要同时考虑用户、广告等多个对象，所以特征之间的组合远比分别独立考虑各个特征要重要的多，后来发展起来的因子分解模型(FM)将特征用表征向量表示，并通过表征向量内积来表达特征之间的组合信息取得了巨大的进步，而近期深度学习模型在各个领域取得巨大的成功，使得基于神经网络的深度学习模型也渐渐被应用到点击率预估领域来弥补FM模型不包含二阶以上特征组合的缺点，尽管深度学习模型效果相比FM模型提升不少，但是计算复杂度更高，在计算广告这种需要实时产生预测结果的海量数据场景的应用受到了很大的限制。The main goal of computational advertising is to integrate information from advertisers, ad space providers, and users to deliver more accurate advertising, thereby improving advertisers’ advertising effectiveness, ad space revenue, and user experience, achieving a win-win situation for all parties. In this link, the most important part is how to deliver accurate advertising, and the technology used is various methods of click-through rate estimation (a field for predicting the probability of users clicking on ads). And because the data in the computational advertising field is highly sparse and huge in magnitude, the simplest linear model such as logistic regression was widely used at first. However, since the click-through rate prediction scenario needs to consider multiple objects such as users and advertisements at the same time, the combination of features is much more important than considering each feature independently. The factor decomposition model (FM) that was developed later represents features with representation vectors and expresses the combination information between features through the inner product of the representation vectors. Great progress has been made. Recently, deep learning models have achieved great success in various fields, so that deep learning models based on neural networks have gradually been applied to the field of click-through rate prediction to make up for the shortcomings of FM models that do not include feature combinations of the second order or above. Although the effect of deep learning models is much better than that of FM models, the computational complexity is higher, and their application in massive data scenarios such as computational advertising that require real-time prediction results is greatly limited.

总的来说，点击率预估方法不断优化达到了比较显著的效果提升，但是渐渐的忽略了实际投入使用需要考虑的低时延要求。In general, the CTR prediction method has been continuously optimized and has achieved significant improvement in results, but the low latency requirement that needs to be considered in actual use has gradually been ignored.

发明内容Summary of the invention

本发明的主要目的是提出一种基于深度迁移网络的点击率预测方法，旨在克服以上问题。The main purpose of the present invention is to propose a click rate prediction method based on a deep migration network, aiming to overcome the above problems.

为实现上述目的，本发明提出的一种基于深度迁移网络的点击率预测方法，包括如下步骤：To achieve the above object, the present invention proposes a click rate prediction method based on a deep migration network, comprising the following steps:

S10对训练样本的连续型字段进行离散化处理，以获取训练样本离散特征；S10 discretizes the continuous fields of the training samples to obtain discrete features of the training samples;

S20为每个训练样本离散特征创建唯一的特征id索引码，并根据训练样本离散特征与其特征id索引码之间映射关系建立离散特征的映射字典M；S20 creates a unique feature id index code for each discrete feature of the training sample, and establishes a mapping dictionary M of the discrete features according to the mapping relationship between the discrete features of the training sample and their feature id index codes;

S30将特征id索引码在不同样本中共现频数进行统计以创建特征共现频数矩阵，通过Glove模型将特征id索引码转化为特征共现频数矩阵的表征向量矩阵，且将表征向量作为深度迁移网络Embedding层的初始化参数；S30 counts the co-occurrence frequencies of feature id index codes in different samples to create a feature co-occurrence frequency matrix, converts the feature id index codes into a representation vector matrix of the feature co-occurrence frequency matrix through a Glove model, and uses the representation vector as an initialization parameter of an Embedding layer of a deep migration network;

S40将特征id索引码输入到深度迁移网络以获取预测点击率的交叉熵损失，并采用反向传播算法更新深度迁移网络的所有参数，其中所述深度迁移网络包括将特征id索引码转换为相应表征向量的嵌入层Embedding、用于将相应表征向量进行内积获取FM预测点击率的因子分解FM网络、用于将相应表征向量进行非线性变换获取抽象表征向量的共享感知机网络、用于输入抽象表征向量获取轻量预测点击率的轻量感知机网络和用于输入抽象表征向量获取深层预测点击率的深层感知机网络；S40 inputs the feature id index code into the deep migration network to obtain the cross entropy loss of the predicted click rate, and uses the back propagation algorithm to update all parameters of the deep migration network, wherein the deep migration network includes an embedding layer Embedding for converting the feature id index code into a corresponding representation vector, a factor decomposition FM network for performing an inner product on the corresponding representation vector to obtain an FM predicted click rate, a shared perceptron network for performing a nonlinear transformation on the corresponding representation vector to obtain an abstract representation vector, a lightweight perceptron network for inputting the abstract representation vector to obtain a lightweight predicted click rate, and a deep perceptron network for inputting the abstract representation vector to obtain a deep layer predicted click rate;

S50将测试样本进行离散化处理以获取测试样本离散特征，并将测试样本离散特征基于映射字典M匹配映射得到测试样本的特征id索引码；S50 discretizes the test sample to obtain discrete features of the test sample, and matches and maps the discrete features of the test sample based on the mapping dictionary M to obtain a feature id index code of the test sample;

S60将测试样本的特征id索引码输入深度迁移网络进行预测获得测试样本的点击率预测。S60 inputs the feature ID index code of the test sample into the deep migration network for prediction to obtain the click rate prediction of the test sample.

优选地，所述深度迁移网络包括嵌入层Embedding、因子分解FM网络、共享感知机网络、轻量感知机网络和深层感知机网络，所述S40包括：Preferably, the deep migration network includes an embedding layer, a factor decomposition FM network, a shared perceptron network, a lightweight perceptron network and a deep perceptron network, and S40 includes:

S401将特征id索引码输入到深度迁移网络的嵌入层Embedding得到相应表征向量；S401 inputs the feature ID index code into the embedding layer of the deep migration network to obtain a corresponding representation vector;

S402S401将特征id索引码输入到深度迁移网络的嵌入层Embedding得到相应表征向量；S402S401 inputs the feature ID index code into the embedding layer of the deep migration network to obtain the corresponding representation vector;

S402将相应表征向量输入因子分解FM网络，因子分解FM网络对相应表征向量进行内积得到FM预测点击率p_fm(x)，其中内积公式如下：

其中x为输入的特征id索引码，v为表示特征id索引码的表征向量，i,j∈n，i和j为不同的特征id索引码下标，n为特征id索引码总数，W_fm为FM网络线性回归项的权重参数，b_fm为FM网络线性回归项的偏置参数，<v_i,v_j>为表征向量v_i和v_j的内积运算；S402 inputs the corresponding representation vector into the factor decomposition FM network, and the factor decomposition FM network performs inner product on the corresponding representation vector to obtain the FM predicted click rate p _fm (x), wherein the inner product formula is as follows:

Where x is the input feature id index code, v is the representation vector representing the feature id index code, i,j∈n, i and j are different feature id index code subscripts, n is the total number of feature id index codes, _Wfm is the weight parameter of the FM network linear regression term, _bfm is the bias parameter of the FM network linear regression term, < _vi , _vj > is the inner product operation of the representation vectors _vi and _vj ;

将相应表征向量输入共享感知机网络进行非线性变换，得到抽象表征向量：h_s＝sigmoid(W_sv+b_s)，其中，v为输入到共享感知机网络的表征向量，W_s和为共享感知机网络的权重参数，b_s为共享感知机网络的偏置参数，h_s为共享感知机输出的抽象表征向量；The corresponding representation vector is input into the shared perceptron network for nonlinear transformation to obtain the abstract representation vector: _hs = sigmoid( _Wsv + _bs ), where v is the representation vector input into the shared perceptron network, _Ws and bs are the weight parameters of the shared perceptron network, _bs is the bias parameter of the shared perceptron network, and _hs is the abstract representation vector output by the shared perceptron.

将抽象表征向量h_s输入深层感知机网络，经如下前馈计算公式得到深层感知机网络预测点击率p_deep(x)：The abstract representation vector h _s is input into the deep perceptron network, and the predicted click rate p _deep (x) of the deep perceptron network is obtained through the following feedforward calculation formula:

其中ReLU为激活函数，

为深层感知机网络的第l层权重参数，

为深层感知机网络的第l层偏置参数，

为第l层的输出向量，h_s为共享感知机网络的输出向量，且具体层数需人工设置；Where ReLU is the activation function,

is the weight parameter of the lth layer of the deep perceptron network,

is the bias parameter of the lth layer of the deep perceptron network,

is the output vector of the lth layer, h _s is the output vector of the shared perceptron network, and the specific number of layers needs to be set manually;

将抽象表征向量h_s输入轻量感知机网络，经如下前馈计算公式得到轻量感知机网络预测点击率p_light(x)：The abstract representation vector _hs is input into the lightweight perceptron network, and the predicted click rate _plight (x) of the lightweight perceptron network is obtained through the following feedforward calculation formula:

其中ReLU为激活函数，

为轻量感知机网络的第l层权重参数，

为轻量感知机网络的第l层偏置参数，

为第l层的输出向量，h_s为共享感知机网络的输出向量，且层数取经验值,并且轻量感知机网络层数比深层感知机网络少；Where ReLU is the activation function,

is the weight parameter of the first layer of the lightweight perceptron network,

is the bias parameter of the lth layer of the lightweight perceptron network,

is the output vector of the lth layer, h _s is the output vector of the shared perceptron network, and the number of layers is an empirical value, and the number of layers of the lightweight perceptron network is less than that of the deep perceptron network;

S403整合FM网络、轻量感知机网络和深层感知机网络的预测点击率计算点击率损失L(x；W,b)，计算公式如下：S403 integrates the predicted click rates of the FM network, the lightweight perceptron network, and the deep perceptron network to calculate the click rate loss L(x; W, b). The calculation formula is as follows:

L(x；W,b)＝H(y,p_fm(x))+H(y,p_light(x))+H(y,p_deep(x))+λ||z_light(x)-z_deep(x)||²，L(x;W,b)＝H(y,p _fm (x))+H(y,p _light (x))+H(y,p _deep (x))+λ||z _light (x) -z _deep (x)|| ² ,

式中H(y,p)为二分类任务常用的交叉熵损失函数，x为输入的特征id索引码，y为训练数据的二分类标签取值，p为p_fm(x)时表示FM网络的点击率预测值，p为p_light(x)时表示轻量感知机网络的点击率预测值，p为p_deep(x)时表示深度网络的点击率预测值，λ为确定轻量感知机网络和深度网络预测误差的权重值，λ取经验值，z_light(x)为轻量感知机网络进行sigmoid变换前的模型输出值p_light(x)＝sigmoid(z_light(x))，z_deep(x)为深度网络进行sigmoid变换前的模型输出值p_deep(x)＝sigmoid(z_deep(x))；Where H(y,p) is the cross entropy loss function commonly used in binary classification tasks, x is the input feature id index code, y is the binary classification label value of the training data, p is _pfm (x), which represents the click rate prediction value of the FM network, p is _plight (x), which represents the click rate prediction value of the lightweight perceptron network, and p is _pdeep (x), which represents the click rate prediction value of the deep network. λ is the weight value for determining the prediction error between the lightweight perceptron network and the deep network, and λ takes an empirical value. _zlight (x) is the model output value of the lightweight perceptron network before sigmoid transformation, _plight (x)=sigmoid( _zlight (x)), and _zdeep (x) is the model output value of the deep network before sigmoid transformation _{, pdeep} (x)=sigmoid( _zdeep (x)).

S404根据点击率损失L(x；W,b)，采用反向传播算法更新深度迁移网络的所有参数。S404 uses a back propagation algorithm to update all parameters of the deep migration network according to the click rate loss L(x; W, b).

优选地，所述S10中离散化处理的公式具体如下：Preferably, the formula for the discretization process in S10 is as follows:

其中V为连续型特征的取值，D为离散化后的整数值，N为常数,

为向下取整符号，N需根据具体的特征取值范围确定。

Where V is the value of the continuous feature, D is the integer value after discretization, and N is a constant.

To round down the sign, N needs to be determined according to the specific feature value range.

优选地，所述S30具体包括：Preferably, the S30 specifically includes:

S301以训练样本特征id索引码在不同样本中共现频数为矩阵元素，创建特征共现频数矩阵C_n×n；S301 creates a feature co-occurrence frequency matrix C _n×n by using the co-occurrence frequencies of the training sample feature id index codes in different samples as matrix elements;

S302将特征共现频数矩阵C_n×n基于矩阵相乘恢复误差进行矩阵分解得到矩阵V_n×k参数，并梯度下降更新矩阵V_n×k参数，矩阵分解公式如；

其中

为矩阵C_n×n的元素C_i,j的估计值，i,j∈n，i和j为不同的特征id索引码下标，n为特征id索引码总数，b_i和b_j为偏差项。计算基于矩阵元素C_i,j和其估计值

的误差J：S302 performs matrix decomposition on the feature co-occurrence frequency matrix C _n×n based on the matrix multiplication recovery error to obtain the matrix V _n×k parameters, and updates the matrix V _n×k parameters by gradient descent. The matrix decomposition formula is as follows:

in

is the estimated value of the element Ci _,j of the matrix C _n×n , i,j∈n, i and j are different feature id index code subscripts, n is the total number of feature id index codes, _bi and _bj are bias terms. The calculation is based on the matrix element Ci _,j and its estimated value

The error J:

C_i,j为矩阵C_n×n的元素，v_i和v_j为矩阵V_n×k的行向量，b_i和b_j为偏差项，并以最小化误差J为目标，通过梯度下降更新V_n×k参数；Ci _,j is the element of the matrix Cn _×n , _vi and _vj are the row vectors of the matrix Vn _×k , _bi and _bj are the bias terms, and the parameters of Vn _×k are updated by gradient descent with the goal of minimizing the error J;

S303以更新后的矩阵V_n×k的行向量作为对应特征的表征向量。S303 uses the row vector of the updated matrix V _n×k as the characterization vector of the corresponding feature.

本发明还公开了一种基于深度迁移网络的点击率预测装置，本装置用于实现上述方法，其包括：The present invention also discloses a click rate prediction device based on a deep migration network, which is used to implement the above method and includes:

离散模块，用于对训练样本的连续型字段进行离散化处理，以获取训练样本离散特征；Discretization module, used to discretize the continuous fields of training samples to obtain discrete features of training samples;

映射模块，用于为每个训练样本离散特征创建唯一的特征id索引码，并根据训练样本离散特征与其特征id索引码之间映射关系建立离散特征的映射字典M；A mapping module is used to create a unique feature ID index code for each discrete feature of a training sample, and to establish a mapping dictionary M of the discrete features according to the mapping relationship between the discrete features of the training sample and their feature ID index codes;

特征表征模块，用于将特征id索引码在不同样本中共现频数进行统计以创建特征共现频数矩阵，通过Glove模型将特征id索引码转化为特征共现频数矩阵的表征向量矩阵，且将表征向量作为深度迁移网络Embedding层的初始化参数；The feature representation module is used to count the co-occurrence frequencies of feature id index codes in different samples to create a feature co-occurrence frequency matrix, convert the feature id index codes into a representation vector matrix of the feature co-occurrence frequency matrix through the Glove model, and use the representation vector as the initialization parameter of the Embedding layer of the deep migration network;

训练模块，用于将特征id索引码输入到深度迁移网络进行训练以获取点击率损失，并采用反向传播算法更新深度迁移网络的所有参数；The training module is used to input the feature ID index code into the deep migration network for training to obtain the click-through rate loss, and use the back-propagation algorithm to update all parameters of the deep migration network;

测试模块，用于将测试样本进行离散化处理以获取测试样本离散特征，并将测试样本离散特征基于映射字典M匹配映射得到测试样本的特征id索引码；A testing module is used to discretize the test sample to obtain discrete features of the test sample, and match and map the discrete features of the test sample based on a mapping dictionary M to obtain a feature id index code of the test sample;

预测模块，用于将测试样本的特征id索引码输入深度迁移网络进行训练获得测试样本的点击率预测。The prediction module is used to input the feature ID index code of the test sample into the deep migration network for training to obtain the click rate prediction of the test sample.

优选地，所述训练模块包括：Preferably, the training module includes:

嵌入子模块，用于将特征id索引码输入到深度迁移网络的嵌入层Embedding得到相应表征向量；The embedding submodule is used to input the feature ID index code into the embedding layer of the deep migration network to obtain the corresponding representation vector;

子网预测子模块，用于将相应表征向量输入因子分解FM网络，因子分解FM网络对相应表征向量进行内积得到FM预测点击率p_fm(x)，其中内积公式如下：

其中x为输入的特征id索引码，v为表示特征id索引码的表征向量，i,j∈n，i和j为不同的特征id索引码下标，n为特征id索引码总数，W_fm为FM网络线性回归项的权重参数，b_fm为FM网络线性回归项的偏置参数，<v_i,v_j>为表征向量v_i和v_j的内积运算；The subnetwork prediction submodule is used to input the corresponding representation vector into the factor decomposition FM network. The factor decomposition FM network performs the inner product on the corresponding representation vector to obtain the FM predicted click rate p _fm (x), where the inner product formula is as follows:

其中ReLU为激活函数，

为深层感知机网络的第l层权重参数，

为深层感知机网络的第l层偏置参数，

is the weight parameter of the lth layer of the deep perceptron network,

is the bias parameter of the lth layer of the deep perceptron network,

其中ReLU为激活函数，

为轻量感知机网络的第l层权重参数，

为轻量感知机网络的第l层偏置参数，

is the bias parameter of the lth layer of the lightweight perceptron network,

整合预测子模块，用于整合FM网络、轻量感知机网络和深层感知机网络的预测点击率计算点击率损失L(x；W,b)，计算公式如下：The integrated prediction submodule is used to integrate the predicted click rates of the FM network, the lightweight perceptron network, and the deep perceptron network to calculate the click rate loss L(x; W, b). The calculation formula is as follows:

式中H(y,p)为二分类任务常用的交叉熵损失函数，x为输入的特征id索引码，y为训练数据的二分类标签取值，p＝p_fm(x)时为FM网络的点击率预测值，p＝p_light(x)时为轻量感知机网络的点击率预测值，p＝p_deep(x)为深度网络的点击率预测值，λ为确定轻量感知机网络和深度网络预测误差的权重值，λ取经验值，z_light(x)为轻量感知机网络进行sigmoid变换前的模型输出值p_light(x)＝sigmoid(z_light(x))，z_deep(x)为深度网络进行sigmoid变换前的模型输出值p_deep(x)＝sigmoid(z_deep(x))；Where H(y,p) is the cross entropy loss function commonly used in binary classification tasks, x is the input feature id index code, y is the binary classification label value of the training data, p＝p _fm (x) is the click rate prediction value of the FM network, p＝p _light (x) is the click rate prediction value of the lightweight perceptron network, p＝p _deep (x) is the click rate prediction value of the deep network, λ is the weight value for determining the prediction error between the lightweight perceptron network and the deep network, λ takes an empirical value, z _light (x) is the model output value of the lightweight perceptron network before sigmoid transformation p _light (x)＝sigmoid(z _light (x)), z _deep (x) is the model output value of the deep network before sigmoid transformation p _deep (x)＝sigmoid(z _deep (x));

参数更新子模块，用于根据点击率损失L(x；W,b)，采用反向传播算法更新深度迁移网络的所有参数。The parameter updating submodule is used to update all parameters of the deep migration network using the back-propagation algorithm according to the click-through rate loss L(x; W, b).

优选地，所述离散模块中离散化处理的公式具体如下：Preferably, the formula for discretization processing in the discrete module is as follows:

其中V为连续型特征的取值，D为离散化后的整数值，N为常数,

为向下取整符号，N需根据具体的特征取值范围确定。

优选地，所述特征表征模块包括：Preferably, the feature characterization module comprises:

特征共现子模块，用于以训练样本特征id索引码在不同样本中共现频数为矩阵元素，创建特征共现频数矩阵C_n×n；The feature co-occurrence submodule is used to create a feature co-occurrence frequency matrix C _n×n with the co-occurrence frequency of the training sample feature id index code in different samples as the matrix element;

分解子模块，用于将特征共现频数矩阵C_n×n基于矩阵相乘恢复误差进行矩阵分解得到矩阵V_n×k参数，并梯度下降更新矩阵V_n×k参数，矩阵分解公式如；

其中

为矩阵C_n×n的元素C_i,j的估计值，i,j∈n，i和j为不同的特征id索引码下标，n为特征id索引码总数，b_i和b_j为偏差项，以误差最小化为目标，计算基于矩阵元素C_i,j和其估计值

的误差J：The decomposition submodule is used to perform matrix decomposition on the feature co-occurrence frequency matrix C _n×n based on the matrix multiplication recovery error to obtain the matrix V _n×k parameters, and then update the matrix V _n×k parameters by gradient descent. The matrix decomposition formula is as follows:

in

is the estimated value of the element Ci _,j of the matrix Cn _×n , i,j∈n, i and j are different feature id index code subscripts, n is the total number of feature id index codes, _bi and _bj are deviation terms, with the goal of minimizing the error, the calculation is based on the matrix element Ci _,j and its estimated value

The error J:

C_i,j为矩阵C_n×n的元素，v_i和v_j为矩阵V_n×k的行向量，b_i和b_j为偏差项，通过梯度下降更新V_n×k；Ci _,j is the element of the matrix Cn _×n , _vi and _vj are the row vectors of the matrix Vn _×k , _bi and _bj are the bias terms, and Vn _×k is updated by gradient descent;

特征表征子模块，用于以分解后的矩阵V_n×k的行向量作为对应特征的表征向量。The feature characterization submodule is used to use the row vector of the decomposed matrix V _n×k as the characterization vector of the corresponding feature.

本发明与现有技术相比较，其有益效果是：弥补现有技术中FM模型不包含二阶以上特征组合的缺点，利用深度迁移网络中强大的迁移学习能力让深层感知机网络来指导轻量感知机网络的学习，从而获得效果更优并且性能更好的轻量感知机网络，最后整合FM网络、轻量感知机网络和深层感知机网络训练获取点击率损失，这样优化了点击率预测方法，提升了预测准确性。Compared with the prior art, the present invention has the following beneficial effects: it makes up for the shortcoming that the FM model in the prior art does not include feature combinations of the second order or higher, utilizes the powerful transfer learning ability in the deep transfer network to allow the deep perceptron network to guide the learning of the lightweight perceptron network, thereby obtaining a lightweight perceptron network with better effect and performance, and finally integrates the FM network, the lightweight perceptron network and the deep perceptron network training to obtain the click-through rate loss, thereby optimizing the click-through rate prediction method and improving the prediction accuracy.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

为了更清楚地说明本发明实施例或现有技术中的技术方案，下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图示出的结构获得其他的附图。In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for use in the embodiments or the description of the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For ordinary technicians in this field, other drawings can be obtained based on the structures shown in these drawings without paying creative work.

图1为本发明的深度迁移网络的模型架构图；FIG1 is a model architecture diagram of a deep migration network of the present invention;

本发明目的的实现、功能特点及优点将结合实施例，参照附图做进一步说明。The realization of the purpose, functional features and advantages of the present invention will be further explained in conjunction with embodiments and with reference to the accompanying drawings.

具体实施方式DETAILED DESCRIPTION

下面将结合本发明实施例中的附图，对本发明实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本发明的一部分实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例，都属于本发明保护的范围。The following will be combined with the drawings in the embodiments of the present invention to clearly and completely describe the technical solutions in the embodiments of the present invention. Obviously, the described embodiments are only part of the embodiments of the present invention, not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by ordinary technicians in this field without creative work are within the scope of protection of the present invention.

需要说明，若本发明实施例中有涉及方向性指示(诸如上、下、左、右、前、后……)，则该方向性指示仅用于解释在某一特定姿态(如附图所示)下各部件之间的相对位置关系、运动情况等，如果该特定姿态发生改变时，则该方向性指示也相应地随之改变。It should be noted that if the embodiments of the present invention involve directional indications (such as up, down, left, right, front, back, etc.), the directional indications are only used to explain the relative position relationship, movement status, etc. between the components under a certain specific posture (as shown in the accompanying drawings). If the specific posture changes, the directional indication will also change accordingly.

另外，若本发明实施例中有涉及“第一”、“第二”等的描述，则该“第一”、“第二”等的描述仅用于描述目的，而不能理解为指示或暗示其相对重要性或者隐含指明所指示的技术特征的数量。由此，限定有“第一”、“第二”的特征可以明示或者隐含地包括至少一个该特征。另外，各个实施例之间的技术方案可以相互结合，但是必须是以本领域普通技术人员能够实现为基础，当技术方案的结合出现相互矛盾或无法实现时应当认为这种技术方案的结合不存在，也不在本发明要求的保护范围之内。In addition, if there are descriptions involving "first", "second", etc. in the embodiments of the present invention, the descriptions of "first", "second", etc. are only used for descriptive purposes and cannot be understood as indicating or suggesting their relative importance or implicitly indicating the number of the indicated technical features. Therefore, the features defined as "first" and "second" may explicitly or implicitly include at least one of the features. In addition, the technical solutions between the various embodiments can be combined with each other, but they must be based on the ability of ordinary technicians in the field to implement them. When the combination of technical solutions is contradictory or cannot be implemented, it should be deemed that such a combination of technical solutions does not exist and is not within the scope of protection required by the present invention.

本发明提出的一种基于深度迁移网络的点击率预测方法，包括如下步骤：The present invention proposes a method for predicting click rate based on a deep migration network, comprising the following steps:

S30将特征id索引码在不同样本中共现频数进行统计以创建特征共现频数矩阵，通过Glove模型将特征id索引码转化为特征共现频数矩阵的表征向量矩阵，且将表征向量作为深度迁移度迁移网络进行训练以获取点击率损失，并采用反向传播算法更新深度迁移网络Embedding层的初始化参数；S30 counts the co-occurrence frequencies of feature id index codes in different samples to create a feature co-occurrence frequency matrix, converts the feature id index codes into a representation vector matrix of the feature co-occurrence frequency matrix through the Glove model, and trains the representation vector as a deep migration network to obtain click-through rate loss, and uses a back-propagation algorithm to update the initialization parameters of the embedding layer of the deep migration network;

S40将特征id索引码输入到深度迁移网络以获取预测点击率的交叉熵损失，并采用反向传播算法更新深度迁移网络的所有参数，其中所述深度迁移网络包括将特征id索引码转换为相应表征向量的嵌入层Embedding、用于将相应表征向量进行内积获取FM预测点击率的因子分解FM网络、用于将相应表征向量进行非线性变换获取抽象表征向量的共享感知机网络、用于输入抽象表征向量获取轻量预测点击率的轻量感知机网络和用于输入抽象表征向量获取深层预测点击率的深层感知机网络；S40 inputs the feature id index code into the deep migration network to obtain the cross entropy loss of the predicted click rate, and uses the back propagation algorithm to update all parameters of the deep migration network, wherein the deep migration network includes an embedding layer Embedding for converting the feature id index code into a corresponding representation vector, a factor decomposition FM network for performing an inner product on the corresponding representation vector to obtain the FM predicted click rate, a shared perceptron network for performing a nonlinear transformation on the corresponding representation vector to obtain an abstract representation vector, a lightweight perceptron network for inputting the abstract representation vector to obtain a lightweight predicted click rate, and a deep perceptron network for inputting the abstract representation vector to obtain a deep layer predicted click rate;

在本发明实施例中，本发明考虑将迁移学习结合到点击率预估方法中，以达到保持轻量感知机网络低时延优点的同时，提升轻量感知机网络预测效果的目的。同时结合Glove技术进行表征向量初始化，从而使得深度迁移网络的训练过程更稳定。In the embodiment of the present invention, the present invention considers combining transfer learning with the click rate prediction method to achieve the purpose of maintaining the low latency advantage of the lightweight perceptron network while improving the prediction effect of the lightweight perceptron network. At the same time, the representation vector is initialized in combination with the Glove technology, so that the training process of the deep transfer network is more stable.

除了数据传入输入组件前要进行处理外，输入组件内部Embedding层在训练阶段前的初始参数由Glove技术获取到的表征向量进行初始化。In addition to processing data before it is passed into the input component, the initial parameters of the Embedding layer inside the input component before the training phase are initialized by the representation vector obtained by the Glove technology.

本发明弥补现有技术中FM模型不包含二阶以上特征组合的缺点，利用深度迁移网络中强大的迁移学习能力让深层感知机网络来指导轻量感知机网络的学习，从而获得效果更优并且性能更好的轻量感知机网络，最后整合FM网络、轻量感知机网络和深层感知机网络训练获取点击率损失，这样优化了点击率预测方法，提升了预测准确性。The present invention makes up for the shortcoming that the FM model in the prior art does not include feature combinations above the second order, and utilizes the powerful transfer learning ability of the deep transfer network to allow the deep perceptron network to guide the learning of the lightweight perceptron network, thereby obtaining a lightweight perceptron network with better effect and performance, and finally integrating the FM network, the lightweight perceptron network and the deep perceptron network training to obtain the click-through rate loss, thereby optimizing the click-through rate prediction method and improving the prediction accuracy.

优选地，所述深度迁移网络包括将特征id索引码转换为相应表征向量的嵌入层Embedding、用于将相应表征向量进行内积获取FM预测点击率的因子分解FM网络、用于将相应表征向量进行非线性变换获取抽象表征向量的共享感知机网络、用于输入抽象表征向量获取轻量预测点击率的轻量感知机网络和用于输入抽象表征向量获取深层预测点击率的深层感知机网络。Preferably, the deep migration network includes an embedding layer Embedding that converts a feature id index code into a corresponding representation vector, a factorization FM network for performing an inner product on the corresponding representation vector to obtain an FM predicted click-through rate, a shared perceptron network for performing a nonlinear transformation on the corresponding representation vector to obtain an abstract representation vector, a lightweight perceptron network for inputting an abstract representation vector to obtain a lightweight predicted click-through rate, and a deep perceptron network for inputting an abstract representation vector to obtain a deep predicted click-through rate.

优选地，所述S40包括：Preferably, the S40 includes:

其中ReLU为激活函数，

为深层感知机网络的第l层权重参数，

为深层感知机网络的第l层偏置参数，

is the weight parameter of the lth layer of the deep perceptron network,

is the bias parameter of the lth layer of the deep perceptron network,

其中ReLU为激活函数，

为轻量感知机网络的第l层权重参数，

为轻量感知机网络的第l层偏置参数，

is the bias parameter of the lth layer of the lightweight perceptron network,

其中V为连续型特征的取值，D为离散化后的整数值，N为常数,

为向下取整符号，N需根据具体的特征取值范围确定。

优选地，所述S30具体包括：Preferably, the S30 specifically includes:

其中

in

The error J:

优选地，所述训练样本为标记点击类别标签的数据，所述点击类别包括未点击和点击，未点击的类别标签为0，点击的类别标签为1；所述测试样本为无点击类别标签的数据。Preferably, the training samples are data marked with click category labels, the click categories include no-click and click, the no-click category label is 0, and the click category label is 1; the test samples are data without click category labels.

本发明还公开了一种基于深度迁移网络的点击率预测装置，本装置用于实现上述方法，本方法参照上述实施例，本装置采用了上述所有实施例的全部技术方案，因此至少具有上述实施例的技术方案所带来的所有有益效果，在此不再一一赘述。其包括：The present invention also discloses a click rate prediction device based on a deep migration network, which is used to implement the above method. The method refers to the above embodiment, and the present device adopts all the technical solutions of all the above embodiments, so it at least has all the beneficial effects brought by the technical solutions of the above embodiments, which will not be described one by one here. It includes:

其中ReLU为激活函数，

为深层感知机网络的第l层权重参数，

为深层感知机网络的第l层偏置参数，

is the weight parameter of the lth layer of the deep perceptron network,

is the bias parameter of the lth layer of the deep perceptron network,

其中ReLU为激活函数，

为轻量感知机网络的第l层权重参数，

为轻量感知机网络的第l层偏置参数，

is the bias parameter of the lth layer of the lightweight perceptron network,

其中V为连续型特征的取值，D为离散化后的整数值，N为常数,

为向下取整符号，N需根据具体的特征取值范围确定。

其中

in

The error J:

补充理解地是，深度迁移网络的模型架构如图1所示，该模型五个组件：For additional understanding, the model architecture of the deep migration network is shown in Figure 1. The model has five components:

1.输入组件1. Input components

输入组件需要数据按离散的特征id编码的形式进行输入，后续将特征id编码传入Embedding层产生表征向量，其中特征id编码要处理成特征id编码，以方便Embedding层快速取出对应的表征向量。因此在将数据传入输入组件前，要对金额等连续型浮点值进行离散化处理，具体来说对于取值范围很广的长尾分布浮点值可以按如下公式进行离散化处理：The input component requires data to be input in the form of discrete feature id codes, and then the feature id codes are passed to the Embedding layer to generate representation vectors, where the feature id codes must be processed into feature id codes to facilitate the Embedding layer to quickly retrieve the corresponding representation vectors. Therefore, before passing the data to the input component, continuous floating-point values such as amounts must be discretized. Specifically, long-tailed floating-point values with a wide range of values can be discretized according to the following formula:

式中V为连续型特征的取值，D为离散化后的整数值，N为常数，

为向下取整符号。此外对于本身就是离散型的特征也要进行处理，例如对于性别＝[男，女]，要转化成性别＝[0，1]；对于年龄＝[12，13，...，80]，要转化成年龄＝[0，1，...，68]。Where V is the value of the continuous feature, D is the integer value after discretization, and N is a constant.

In addition, the discrete features should also be processed, for example, gender = [male, female] should be converted to gender = [0, 1]; age = [12, 13, ..., 80] should be converted to age = [0, 1, ..., 68].

除了数据传入输入组件前要进行处理外，输入组件内部Embedding层在训练阶段前的初始参数由Glove技术获取到的表征向量进行初始化，利用Glove将特征id编码转化为表征向量步骤如下：In addition to processing the data before it is passed into the input component, the initial parameters of the embedding layer inside the input component before the training phase are initialized by the representation vector obtained by the Glove technology. The steps for using Glove to convert the feature ID encoding into the representation vector are as follows:

S3.1：根据数据中不同样本共同出现的特征，对其共同出现在同一样本的次数进行统计，最终得到一个特征共现频数矩阵C_n×n；S3.1: According to the common features of different samples in the data, the number of times they appear in the same sample is counted, and finally a feature co-occurrence frequency matrix C _n×n is obtained;

S3.2：对特征共现频数矩阵进行如下矩阵分解：S3.2: Perform the following matrix decomposition on the feature co-occurrence frequency matrix:

其中V_n×k为分解后的矩阵，bias为偏差项。基于如下矩阵相乘恢复误差：Where V _n×k is the decomposed matrix and bias is the bias term. The error is recovered based on the following matrix multiplication:

其中C_i,j为矩阵C_n×n的元素，v_i和v_j为矩阵V_n×k的行向量，b_i和b_j为偏差项，通过梯度下降更新得到矩阵V_n×k的参数Where Ci _,j is the element of the matrix Cn _×n , _vi and _vj are the row vectors of the matrix Vn _×k , _bi and _bj are the bias terms, and the parameters of the matrix _Vn×k are obtained by gradient descent update

S3.3：以特征共现频数矩阵分解后的矩阵V_n×k行向量作为对应特征的表征向量。S3.3: The row vectors of the matrix V _n×k after decomposition of the feature co-occurrence frequency matrix are used as the representation vectors of the corresponding features.

2.FM网络2.FM Network

FM网络以表征向量作为输入，以表征向量的内积作为特征组合的方式，以这种显式进行二阶特征组合的方式在点击率预估这种数据高度稀疏的场景会更高效并且具有更好的泛化能力。相比于感知机网络不包含显式的特征组合，将FM网络整合到深度迁移网络中能指导深度迁移网络中Embedding层学习到更好的表征向量。The FM network uses the representation vector as input and the inner product of the representation vector as a feature combination method. This explicit second-order feature combination method is more efficient and has better generalization ability in scenarios such as click-through rate estimation where data is highly sparse. Compared with the perceptron network that does not include explicit feature combination, integrating the FM network into the deep transfer network can guide the embedding layer in the deep transfer network to learn better representation vectors.

3.共享感知机网络3. Shared Perceptron Network

共享感知机网络以表征向量作为输入，以感知机网络复杂的非线性映射能力将原始的表征向量转换为更抽象的表征向量，并让后续的轻量感知机网络和深层感知机网络共享抽象的表征向量作为网络的输入，以这种最直接的信息迁移方式，使得深层感知机网络的信息能很方便的供轻量网络使用，以达到提高性能的目的。The shared perceptron network takes the representation vector as input, and uses the complex nonlinear mapping capability of the perceptron network to convert the original representation vector into a more abstract representation vector, and allows the subsequent lightweight perceptron network and deep perceptron network to share the abstract representation vector as the network input. In this most direct way of information migration, the information of the deep perceptron network can be conveniently used by the lightweight network to achieve the purpose of improving performance.

4.深层感知机网络4. Deep Perceptron Network

深层感知机网络以共享感知机网络转化后的抽象表征向量作为输入，进一步通过更多层的非线性组合，从而具有表征更高阶特征组合的能力，因此也别轻量感知机网络具有更好的表现，为了将深层感知机网络学习到的信息迁移给轻量感知机网络，我们利用深层感知机网络的输出点击率来丰富原始数据的是否点击这样的0－1标签，训练样本采用标记有点击类别标签的数据，所述点击类别包括未点击和点击，未点击的类别标签为0，点击的类别标签为1；所述测试样本为无点击类别标签的数据。相比于原标签只提供类别的归属信息1或者0来说，深层感知机网络的输出点击率可以提供更多的信息，不仅知道一类的概率是否大于另一类的概率，还能知道确切的概率值强弱大小信息。The deep perceptron network takes the abstract representation vector transformed by the shared perceptron network as input, and further passes through more layers of nonlinear combination, so it has the ability to represent higher-order feature combinations, so it has better performance than the lightweight perceptron network. In order to transfer the information learned by the deep perceptron network to the lightweight perceptron network, we use the output click rate of the deep perceptron network to enrich the 0-1 label of whether the original data is clicked. The training sample uses data marked with click category labels, and the click category includes no click and click. The category label of no click is 0, and the category label of click is 1; the test sample is data without click category label. Compared with the original label that only provides category attribution information 1 or 0, the output click rate of the deep perceptron network can provide more information, not only to know whether the probability of one category is greater than the probability of another category, but also to know the exact probability value strength information.

5.轻量感知机网络5. Lightweight Perceptron Network

轻量感知机网络同样以共享感知机网络转化后的抽象表征向量作为输入，一方面通过几层浅层的非线性组合对抽象表征向量的信息加以利用，以达到更精确预测点击率的目标，另一方面通过拟合深层感知机网络的预测点击率来学习深层感知机网络从数据中挖掘到的信息，以达到提升预测效果同时保持轻量感知机网络低时延的目标。The lightweight perceptron network also uses the abstract representation vector converted by the shared perceptron network as input. On the one hand, it utilizes the information of the abstract representation vector through several layers of shallow nonlinear combinations to achieve the goal of more accurate prediction of click-through rate. On the other hand, it learns the information mined from the data by the deep perceptron network by fitting the predicted click-through rate of the deep perceptron network, so as to achieve the goal of improving the prediction effect while maintaining low latency for the lightweight perceptron network.

以上所述仅为本发明的优选实施例，并非因此限制本发明的专利范围，凡是在本发明的发明构思下，利用本发明说明书及附图内容所作的等效结构变换，或直接/间接运用在其他相关的技术领域均包括在本发明的专利保护范围内。The above description is only a preferred embodiment of the present invention, and does not limit the patent scope of the present invention. All equivalent structural changes made by using the contents of the present invention specification and drawings under the inventive concept of the present invention, or directly/indirectly applied in other related technical fields are included in the patent protection scope of the present invention.

Claims

1. A click rate prediction method based on a deep migration network, characterized by comprising the following steps:

S10 discretizes the continuous fields of the training samples to obtain discrete features of the training samples;

S20 creates a unique feature id index code for each discrete feature of the training sample, and establishes a mapping dictionary M of the discrete features according to the mapping relationship between the discrete features of the training sample and their feature id index codes;

S30 counts the co-occurrence frequencies of feature id index codes in different samples to create a feature co-occurrence frequency matrix, converts the feature id index codes into a representation vector matrix of the feature co-occurrence frequency matrix through a Glove model, and uses the representation vector as an initialization parameter of an Embedding layer of a deep migration network;

S40 inputs the feature ID index code into the deep migration network to obtain the cross entropy loss of the predicted click rate, and uses the back propagation algorithm to update all parameters of the deep migration network; including:

S401 inputs the feature ID index code into the embedding layer of the deep migration network to obtain a corresponding representation vector;

S402 inputs the corresponding representation vector into the factor decomposition FM network, and the factor decomposition FM network performs inner product on the corresponding representation vector to obtain the FM predicted click rate p _fm (x), wherein the inner product formula is as follows:

Where x is the input feature id index code, v is the representation vector representing the feature id index code, i,j∈n, i and j are different feature id index code subscripts, n is the total number of feature id index codes, _Wfm is the weight parameter of the FM network linear regression term, _bfm is the bias parameter of the FM network linear regression term, ＜ _vi , _vj ＞ is the inner product operation of the representation vectors _vi and _vj ;

The corresponding representation vector is input into the shared perceptron network for nonlinear transformation to obtain the abstract representation vector: _hs = sigmoid( _Wsv + _bs ), where v is the representation vector input into the shared perceptron network, _Ws and bs are the weight parameters of the shared perceptron network, _bs is the bias parameter of the shared perceptron network, and _hs is the abstract representation vector output by the shared perceptron.

The abstract representation vector h _s is input into the deep perceptron network, and the predicted click rate p _deep (x) of the deep perceptron network is obtained through the following feedforward calculation formula:

Where ReLU is the activation function,

is the weight parameter of the lth layer of the deep perceptron network,

is the bias parameter of the lth layer of the deep perceptron network,

The abstract representation vector _hs is input into the lightweight perceptron network, and the predicted click rate _plight (x) of the lightweight perceptron network is obtained through the following feedforward calculation formula:

Where ReLU is the activation function,

is the bias parameter of the lth layer of the lightweight perceptron network,

S403 integrates the predicted click rates of the FM network, the lightweight perceptron network, and the deep perceptron network to calculate the click rate loss L(x; W, b). The calculation formula is as follows:

L(x;W,b)＝H(y,p _fm (x))+H(y,p _light (x))+H(y,p _deep (x))+λ||z _light (x) -z _deep (x)|| ² ,

Where H(y,p) is the cross entropy loss function commonly used in binary classification tasks, x is the input feature id index code, y is the binary classification label value of the training data, p is _pfm (x), which represents the click rate prediction value of the FM network, p is _plight (x), which represents the click rate prediction value of the lightweight perceptron network, and p is _pdeep (x), which represents the click rate prediction value of the deep network. λ is the weight value for determining the prediction error between the lightweight perceptron network and the deep network, and λ takes an empirical value. _zlight (x) is the model output value of the lightweight perceptron network before sigmoid transformation, _plight (x)=sigmoid( _zlight (x)), and _zdeep (x) is the model output value of the deep network before sigmoid transformation _{, pdeep} (x)=sigmoid( _zdeep (x)).

S404 uses a back propagation algorithm to update all parameters of the deep migration network according to the click rate loss L(x; W, b);

S50 discretizes the test sample to obtain discrete features of the test sample, and matches and maps the discrete features of the test sample based on the mapping dictionary M to obtain a feature id index code of the test sample;

S60 inputs the feature ID index code of the test sample into the deep migration network for prediction, and obtains the click rate prediction of the test sample.

2. The click-through rate prediction method based on the deep migration network as described in claim 1 is characterized in that the deep migration network includes an embedding layer Embedding that converts the feature id index code into a corresponding representation vector, a factor decomposition FM network for performing an inner product on the corresponding representation vector to obtain an FM predicted click-through rate, a shared perceptron network for performing a nonlinear transformation on the corresponding representation vector to obtain an abstract representation vector, a lightweight perceptron network for inputting an abstract representation vector to obtain a lightweight predicted click-through rate, and a deep perceptron network for inputting an abstract representation vector to obtain a deep predicted click-through rate.

3. The click rate prediction method based on deep migration network according to claim 1, characterized in that the formula for discretization processing in S10 is specifically as follows:

4. The method for predicting click rate based on a deep migration network according to claim 1, wherein S30 specifically comprises:

S301 creates a feature co-occurrence frequency matrix C _n×n using the co-occurrence frequencies of the feature ID index codes of the training samples in different samples as matrix elements;

S302 performs matrix decomposition on the feature co-occurrence frequency matrix C _n×n based on the matrix multiplication recovery error to obtain the matrix V _n×k parameters, and updates the matrix V _n×k parameters by gradient descent. The matrix decomposition formula is as follows:

in

is the estimated value of the element Ci _,j of the matrix Cn _×n , i,j∈n, i and j are different feature id index code subscripts, n is the total number of feature id index codes, _bi and _bj are bias terms, and the calculation is based on the matrix element Ci _,j and its estimated value

The error J:

Ci _,j is the element of the matrix Cn _×n , _vi and _vj are the row vectors of the matrix Vn _×k , _bi and _bj are the bias terms, and the parameters of Vn _×k are updated by gradient descent with the goal of minimizing the error J;

S303 uses the row vector of the updated matrix V _n×k as the characterization vector of the corresponding feature.

5. The click rate prediction method based on deep migration network as described in claim 1 is characterized in that the training sample is data marked with click category labels, the click categories include no click and click, the category label of no click is 0, and the category label of click is 1; the test sample is data without click category labels.

6. A click rate prediction device based on a deep migration network, characterized by comprising:

Discretization module, used to discretize the continuous fields of training samples to obtain discrete features of training samples;

A mapping module is used to create a unique feature ID index code for each discrete feature of a training sample, and to establish a mapping dictionary M of the discrete features according to the mapping relationship between the discrete features of the training sample and their feature ID index codes;

The feature representation module is used to count the co-occurrence frequencies of feature id index codes in different samples to create a feature co-occurrence frequency matrix, convert the feature id index codes into a representation vector matrix of the feature co-occurrence frequency matrix through the Glove model, and use the representation vector as the initialization parameter of the Embedding layer of the deep migration network;

The training module is used to input the feature ID index code into the deep migration network for training to obtain the click-through rate loss, and use the back-propagation algorithm to update all parameters of the deep migration network; including:

The embedding submodule is used to input the feature ID index code into the embedding layer of the deep migration network to obtain the corresponding representation vector;

The subnetwork prediction submodule is used to input the corresponding representation vector into the factor decomposition FM network. The factor decomposition FM network performs the inner product on the corresponding representation vector to obtain the FM predicted click rate p _fm (x), where the inner product formula is as follows:

Where ReLU is the activation function,

is the weight parameter of the lth layer of the deep perceptron network,

is the bias parameter of the lth layer of the deep perceptron network,

Where ReLU is the activation function,

is the bias parameter of the lth layer of the lightweight perceptron network,

The integrated prediction submodule is used to integrate the predicted click rates of the FM network, the lightweight perceptron network, and the deep perceptron network to calculate the click rate loss L(x; W, b). The calculation formula is as follows:

Where H(y,p) is the cross entropy loss function commonly used in binary classification tasks, x is the input feature id index code, y is the binary classification label value of the training data, p＝p _fm (x) is the click rate prediction value of the FM network, p＝p _light (x) is the click rate prediction value of the lightweight perceptron network, p＝p _deep (x) is the click rate prediction value of the deep network, λ is the weight value for determining the prediction error between the lightweight perceptron network and the deep network, λ takes an empirical value, z _light (x) is the model output value of the lightweight perceptron network before sigmoid transformation p _light (x)＝sigmoid(z _light (x)), z _deep (x) is the model output value of the deep network before sigmoid transformation p _deep (x)＝sigmoid(z _deep (x));

The parameter updating submodule is used to update all parameters of the deep migration network using the back propagation algorithm according to the click-through rate loss L(x; W, b);

A testing module is used to discretize the test sample to obtain discrete features of the test sample, and match and map the discrete features of the test sample based on a mapping dictionary M to obtain a feature id index code of the test sample;

The prediction module is used to input the feature ID index code of the test sample into the deep migration network for training to obtain the click rate prediction of the test sample.

7. The click rate prediction device based on deep migration network according to claim 6, characterized in that the formula for discretization processing in the discretization module is as follows:

8. The click rate prediction device based on deep migration network according to claim 6, characterized in that the feature representation module comprises:

The feature co-occurrence submodule is used to create a feature co-occurrence frequency matrix C _n×n with the co-occurrence frequency of the training sample feature id index code in different samples as the matrix element;

The decomposition submodule is used to perform matrix decomposition on the feature co-occurrence frequency matrix C _n×n based on the matrix multiplication recovery error to obtain the matrix V _n×k parameters, and then update the matrix V _n×k parameters by gradient descent. The matrix decomposition formula is as follows:

in

The error J:

Ci _,j is the element of the matrix Cn _×n , _vi and _vj are the row vectors of the matrix Vn _×k , _bi and _bj are the bias terms, and Vn _×k is updated by gradient descent;

The feature characterization submodule is used to use the row vector of the updated matrix V _n×k as the characterization vector of the corresponding feature.