CN115809372B - Click rate prediction model training method and device based on decoupling invariant learning - Google Patents

Info

Publication number
CN115809372B
Authority
CN
China
Prior art keywords
environment
rate prediction
prediction model
click rate
invariant
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310053850.7A
Other languages
Chinese (zh)
Other versions
CN115809372A (en)
Inventor
何向南
张洋
史天昊
冯福利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC
Priority to CN202310053850.7A
Publication of CN115809372A
Application granted
Publication of CN115809372B
Legal status: Active

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a training method and device for a click rate prediction model based on decoupling invariant learning. The method comprises the following steps: step one, constructing a click rate prediction model and a model optimization target based on a decoupling invariant learning method; step two, randomly sampling the environment data set to obtain a training sample data set; step three, fixing the environment-specific part parameters of the click rate prediction model, mining the environment-invariant features of the training sample data set by using the click rate prediction model, and updating the environment-invariant part parameters of the click rate prediction model; step four, fixing the environment-invariant part parameters of the click rate prediction model, mining the environment-specific features of the training sample data set by using the updated click rate prediction model, and updating the environment-specific part parameters of the click rate prediction model; and iterating step two to step four until the click rate prediction model meets a preset convergence condition, thereby obtaining the trained click rate prediction model.

Description

Click rate prediction model training method and device based on decoupling invariant learning
Technical Field
The invention relates to the fields of recommendation systems, data mining and machine learning, and in particular to a training method and device for a click rate prediction model based on decoupling invariant learning, a click rate prediction method, an electronic device and a storage medium.
Background
Click rate prediction is a vital component of a recommendation system. In recent years, feature interaction modeling has been recognized as the core of the click rate prediction problem, and most research has focused on efficient modeling of feature interactions. However, prior-art feature interaction models learn feature interactions by fitting historical data under empirical risk minimization, i.e., they aim to explain the historical data. In a real recommendation scenario, the service must be provided in a future scenario. Because user interests change continuously, the new data drifts away from the historical data, and feature interactions obtained by fitting the historical data generalize poorly to the new data, which harms the performance of the recommendation system.
To address the distribution drift and poor generalization of models learned by empirical risk minimization, those skilled in the art have proposed the invariant learning paradigm. Invariant learning assumes that the training data is collected from heterogeneous environments, and it identifies invariant correlations by exploiting the distribution shifts between the environments. While this approach enables stable feature interaction learning, it assumes that the target can be sufficiently predicted from the environment-invariant correlations alone. In a recommendation system, the training data is affected by the coupling of environment-invariant correlations and environment-specific correlations, so this assumption cannot be satisfied, and the ability of invariant learning to recognize stable feature interactions is difficult to guarantee.
Disclosure of Invention
In view of the above problems, the present invention provides a training method and device for a click rate prediction model based on decoupling invariant learning, a click rate prediction method, an electronic device, and a storage medium, which are capable of solving at least one of the above problems.
According to a first aspect of the present invention, there is provided a training method of a click rate prediction model based on decoupling invariant learning, comprising:
step one, constructing a click rate prediction model and a model optimization target based on a decoupling invariant learning method, wherein the parameters of the click rate prediction model comprise environment-invariant part parameters and environment-specific part parameters, and the model optimization target comprises an optimization target of the environment-invariant part parameters and an optimization target of the environment-specific part parameters;
step two, randomly sampling an environment data set to obtain a training sample data set, wherein the environment data set represents historical click data of users in different time periods, and the environment data set comprises label values;
step three, fixing the environment-specific part parameters of the click rate prediction model, mining the environment-invariant features of the training sample data set by using the click rate prediction model to obtain a first prediction result, processing the first prediction result and the label values of the training sample data set with an environment-invariant loss function by a gradient descent method, based on the optimization target of the environment-invariant part parameters, to obtain a first loss value, and updating the environment-invariant part parameters of the click rate prediction model according to the first loss value;
step four, fixing the environment-invariant part parameters of the click rate prediction model, mining the environment-specific features of the training sample data set by using the updated click rate prediction model to obtain a second prediction result, processing the second prediction result and the label values of the training sample data set with an environment-specific loss function by a gradient descent method, based on the optimization target of the environment-specific part parameters, to obtain a second loss value, and updating the environment-specific part parameters of the click rate prediction model according to the second loss value;
and iterating step two to step four until the click rate prediction model meets a preset convergence condition, thereby obtaining the trained click rate prediction model.
According to an embodiment of the present invention, the optimization objective of the above model is represented by formula (1) and formula (2):
[Formula (1): equation image not reproduced]
[Formula (2): equation image not reproduced]
wherein formula (1) represents the optimization target of the environment-invariant part parameters and formula (2) represents the optimization target of the environment-specific part parameters. The symbols of the two formulas denote, in order of definition: the environment-invariant part parameters; the environment-specific part parameters in a given environment; the prediction loss calculated in that environment; a hyperparameter controlling the strength of the regularization constraint; the regularization constraint, which prevents the environment-specific part parameters from capturing environment-invariant correlations; the variance of the empirical risks of the different training environments; the weight of the prediction loss of a given environment; and a coefficient of one of the terms of the objective.
According to an embodiment of the invention, the variance of the empirical risks of the different training environments is represented by formula (3):
[Formula (3): equation image not reproduced]
wherein the symbols denote, in order of definition: the number of elements in the set of training environments; two indices that range over the different environments; the environment-specific part parameters in a given environment; and the prediction loss calculated in a given training environment. The variance term is used to capture the patterns shared by the different environments.
The weight of the prediction loss of a given environment is represented by formula (4):
[Formula (4): equation image not reproduced]
wherein the remaining symbol denotes traversal over all environments.
According to an embodiment of the invention, the click rate prediction model comprises a click rate prediction model based on decoupling invariant learning at the click data feature embedding layer and/or a click rate prediction model based on decoupling invariant learning at the click data feature domain weight layer.
According to an embodiment of the present invention, the click rate prediction model based on decoupling invariant learning at the click data feature embedding layer is determined by formula (5):
[Formula (5): equation image not reproduced]
wherein the symbols denote, in order of definition: the environment-invariant part parameters; the environment-specific part parameters in a given environment; the features of the click data; two indexed features of the click data; the number of click data features; the environment-invariant feature embeddings corresponding to those two features; the environment-specific feature embeddings, for a given environment, corresponding to those two features; and the click rate prediction model.
The click rate prediction model based on decoupling invariant learning at the click data feature domain weight layer is determined by formula (6):
[Formula (6): equation image not reproduced]
wherein the symbols denote, in order of definition: the representations of two feature domains; the environment-invariant weight between those two domains; the weight between those two domains that is specific to a given environment; and the number of feature domains.
According to an embodiment of the invention, the representation of a domain is calculated from the feature embeddings within that domain and is determined by formula (7):
[Formula (7): equation image not reproduced]
wherein the symbols denote, in order of definition: an indexed data feature; the set of indices of all data features belonging to the domain; and the feature embedding corresponding to that data feature.
According to a second aspect of the present invention, there is provided a click rate prediction method including:
acquiring a historical data set of a user to be predicted, wherein the historical data set of the user to be predicted comprises user feature data and user click data;
and obtaining, by using a click rate prediction model, a prediction result from the environment-invariant feature interactions of the historical data set of the user to be predicted, wherein the click rate prediction model is trained by the above training method of the click rate prediction model based on decoupling invariant learning.
According to a third aspect of the present invention, there is provided a training apparatus for a click rate prediction model based on decoupling invariant learning, comprising:
a model construction module, used for executing step one: constructing a click rate prediction model and a model optimization target based on a decoupling invariant learning method, wherein the parameters of the click rate prediction model comprise environment-invariant part parameters and environment-specific part parameters, and the model optimization target comprises an optimization target of the environment-invariant part parameters and an optimization target of the environment-specific part parameters;
a data sampling module, used for executing step two: randomly sampling an environment data set to obtain a training sample data set, wherein the environment data set represents historical click data of users in different time periods, and the environment data set comprises label values;
an invariant parameter updating module, used for executing step three: fixing the environment-specific part parameters of the click rate prediction model, mining the environment-invariant features of the training sample data set by using the click rate prediction model to obtain a first prediction result, processing the first prediction result and the label values of the training sample data set with an environment-invariant loss function by a gradient descent method, based on the optimization target of the environment-invariant part parameters, to obtain a first loss value, and updating the environment-invariant part parameters of the click rate prediction model according to the first loss value;
a specific parameter updating module, used for executing step four: fixing the environment-invariant part parameters of the click rate prediction model, mining the environment-specific features of the training sample data set by using the updated click rate prediction model to obtain a second prediction result, processing the second prediction result and the label values of the training sample data set with an environment-specific loss function by a gradient descent method, based on the optimization target of the environment-specific part parameters, to obtain a second loss value, and updating the environment-specific part parameters of the click rate prediction model according to the second loss value;
and an iteration module, used for iterating step two to step four until the click rate prediction model meets the preset convergence condition, thereby obtaining the trained click rate prediction model.
According to a fourth aspect of the present invention, there is provided an electronic device comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the training method of the click rate prediction model based on decoupling invariant learning and the click rate prediction method.
According to a fifth aspect of the present invention, there is provided a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the training method of the click rate prediction model based on decoupling invariant learning and the click rate prediction method.
The training method of the click rate prediction model based on decoupling invariant learning provided by the invention yields a click rate prediction model that generalizes well, so that the model can identify feature interactions that remain stable across different historical environments. It also addresses the low prediction accuracy that prior-art click rate prediction models suffer when the data processed in the model application stage drifts away from the historical training data, thereby improving the prediction accuracy of the model.
Drawings
FIG. 1 is a flow chart of a training method of a click rate prediction model based on decoupling invariant learning according to an embodiment of the present invention;
FIG. 2 (a) is a schematic diagram of a decoupled invariant learning model according to an embodiment of the present invention;
FIG. 2 (b) is a schematic diagram of a light decoupling invariant learning model according to an embodiment of the present invention;
FIG. 3 is a flowchart of a click rate prediction method according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a training device for a click rate prediction model based on decoupling invariant learning according to an embodiment of the present invention;
FIG. 5 is a block diagram of an electronic device adapted to implement the training method of the click rate prediction model based on decoupling invariant learning and the click rate prediction method according to an embodiment of the invention.
Detailed Description
The present invention will be further described in detail below with reference to specific embodiments and with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent.
Click rate prediction is a key component of a recommendation system. An early approach is the factorization machine, which models feature interactions through factorization and inner products. In recent years, with the rapid development of machine learning and deep learning, those skilled in the art have proposed more efficient and more complex feature interaction modeling based on various neural networks, such as multi-layer perceptrons, inner-product or outer-product neural networks, attention mechanisms, convolutional neural networks, and graph neural networks. Methods based on neural architecture search have also been proposed, some of which automatically search for the optimal network architecture for modeling feature interactions, while others automatically select or generate the optimal feature interactions. These works enable better feature interaction modeling while greatly reducing human effort. However, these feature interaction models suffer from data drift and poor generalization. To address these problems, those skilled in the art have proposed recommendation models based on the invariant learning paradigm. Such models assume that the target can be sufficiently predicted from the environment-invariant correlations. This assumption cannot be satisfied during the actual training and application of the model, so the ability of such models to identify stable feature interactions is difficult to guarantee.
In order to learn stable feature interactions for click rate prediction in recommendation systems and to improve the generalization of the model on new data, the invention provides a stable feature interaction capturing method based on decoupling invariant learning. The method divides the historical data into different environments in chronological order. By decoupling the environment-invariant correlations from the environment-specific correlations and removing the influence of the environment-specific correlations, it makes the sufficient-prediction assumption of invariant learning hold, so that invariant learning can be applied to capture stable feature interactions. Because the method captures stable feature interactions from the heterogeneity among the different environments of the historical data, the learned feature interactions can generalize well in the serving stage of a real click rate prediction scenario, which improves the prediction accuracy of the recommendation system.
Fig. 1 is a flowchart of a training method of a click rate prediction model based on decoupling invariant learning according to an embodiment of the present invention.
As shown in FIG. 1, the training method of the click rate prediction model based on decoupling invariant learning includes operations S110 to S150.
In operation S110, a click rate prediction model and a model optimization target are constructed based on a decoupling invariant learning method, wherein the parameters of the click rate prediction model include environment-invariant part parameters and environment-specific part parameters, and the model optimization target includes an optimization target of the environment-invariant part parameters and an optimization target of the environment-specific part parameters.
In operation S120, the environment data set is randomly sampled to obtain a training sample data set, wherein the environment data set represents historical click data of users in different time periods, and the environment data set comprises label values.
In operation S130, the environment-specific part parameters of the click rate prediction model are fixed, the environment-invariant features of the training sample data set are mined by using the click rate prediction model to obtain a first prediction result, the first prediction result and the label values of the training sample data set are processed with the environment-invariant loss function by a gradient descent method, based on the optimization target of the environment-invariant part parameters, to obtain a first loss value, and the environment-invariant part parameters of the click rate prediction model are updated according to the first loss value.
In operation S140, the environment-invariant part parameters of the click rate prediction model are fixed, the environment-specific features of the training sample data set are mined by using the updated click rate prediction model to obtain a second prediction result, the second prediction result and the label values of the training sample data set are processed with the environment-specific loss function by a gradient descent method, based on the optimization target of the environment-specific part parameters, to obtain a second loss value, and the environment-specific part parameters of the click rate prediction model are updated according to the second loss value.
In operation S150, operations S120 to S140 are iterated until the click rate prediction model meets a preset convergence condition, so as to obtain the trained click rate prediction model.
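Operations S120 to S150 can be illustrated with a small alternating-update sketch. It is only a toy under stated assumptions, not the patented model: the logistic model whose logit adds an invariant weight vector `phi` and a per-environment weight vector `psi[e]`, the uniform environment weights, the squared-norm regularizer with strength `lam`, the variance coefficient `alpha`, and the fixed step count standing in for the preset convergence condition are all hypothetical choices, because formulas (1) to (4) are not reproduced in this text.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def risk_and_grad(phi, psi_e, rows):
    """Mean log loss of one environment and its gradient w.r.t. the summed weights."""
    n, d = len(rows), len(phi)
    risk, grad = 0.0, [0.0] * d
    for x, y in rows:
        p = sigmoid(sum((phi[j] + psi_e[j]) * x[j] for j in range(d)))
        p = min(max(p, 1e-12), 1 - 1e-12)  # keep log() finite
        risk -= (y * math.log(p) + (1 - y) * math.log(1 - p)) / n
        for j in range(d):
            grad[j] += (p - y) * x[j] / n
    return risk, grad

def train(envs, dim, steps=200, lr=0.5, lam=0.1, alpha=1.0, batch=32, seed=0):
    rng = random.Random(seed)
    phi = [0.0] * dim                      # environment-invariant part
    psi = {e: [0.0] * dim for e in envs}   # environment-specific parts
    for _ in range(steps):                 # assumed stand-in for the convergence test
        # S120: randomly sample training data from every environment.
        sample = {e: rng.sample(rows, min(batch, len(rows))) for e, rows in envs.items()}
        # S130: psi fixed; descend on mean risk + alpha * variance of per-environment risks.
        stats = [risk_and_grad(phi, psi[e], sample[e]) for e in envs]
        mean_risk = sum(r for r, _ in stats) / len(stats)
        for j in range(dim):
            g = sum((1 + 2 * alpha * (r - mean_risk)) * gr[j] for r, gr in stats) / len(stats)
            phi[j] -= lr * g
        # S140: phi fixed; descend on each environment's risk + lam * ||psi_e||^2.
        for e in envs:
            _, gr = risk_and_grad(phi, psi[e], sample[e])
            for j in range(dim):
                psi[e][j] -= lr * (gr[j] + 2 * lam * psi[e][j])
    return phi, psi

# Toy data: feature 0 agrees with the label in both environments,
# feature 1 agrees in environment "A" and flips in environment "B".
envs = {
    "A": [([1.0, 1.0], 1), ([-1.0, -1.0], 0)],
    "B": [([1.0, -1.0], 1), ([-1.0, 1.0], 0)],
}
phi, psi = train(envs, dim=2)
print("invariant weights:", phi)
print("specific weights:", psi)
```

In this toy setting the invariant weights pick up the feature whose relation to the label is the same in both environments, while the sign-flipping feature is absorbed by the per-environment weights.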
The training method of the click rate prediction model provided by the invention can fully mine the invariant features and the specific features of the historical data in different environments. The invariant features of the historical data are the features that the data shares across different time periods: for example, a user may have a long-standing, relatively fixed preference for certain items or topics and keep clicking on content related to them in every time period. The specific features of the historical data are preferences for certain items or topics that arise only at a certain point in time or within a certain period: for example, a user may pay more attention to a breaking news event or a suddenly viral item on social media, which raises the click rate of content related to that event.
According to an embodiment of the present invention, the optimization objective of the above model is represented by formula (1) and formula (2):
[Formula (1): equation image not reproduced]
[Formula (2): equation image not reproduced]
wherein formula (1) represents the optimization target of the environment-invariant part parameters and formula (2) represents the optimization target of the environment-specific part parameters. The symbols of the two formulas denote, in order of definition: the environment-invariant part parameters; the environment-specific part parameters in a given environment; the prediction loss calculated in that environment; a hyperparameter controlling the strength of the regularization constraint; the regularization constraint, which prevents the environment-specific part parameters from capturing environment-invariant correlations; the variance of the empirical risks of the different training environments; the weight of the prediction loss of a given environment; and a coefficient of one of the terms of the objective.
According to an embodiment of the invention, the variance of the empirical risks of the different training environments is represented by formula (3):
[Formula (3): equation image not reproduced]
wherein the symbols denote, in order of definition: the number of elements in the set of training environments; two indices that range over the different environments; the environment-specific part parameters in a given environment; and the prediction loss calculated in a given training environment. The variance term is used to capture the patterns shared by the different environments.
The weight of the prediction loss of a given environment is represented by formula (4):
[Formula (4): equation image not reproduced]
wherein the remaining symbol denotes traversal over all environments.
According to an embodiment of the invention, the click rate prediction model comprises a click rate prediction model based on decoupling invariant learning at the click data feature embedding layer and/or a click rate prediction model based on decoupling invariant learning at the click data feature domain weight layer.
According to an embodiment of the present invention, the click rate prediction model based on decoupling invariant learning at the click data feature embedding layer is determined by formula (5):
[Formula (5): equation image not reproduced]
wherein the symbols denote, in order of definition: the environment-invariant part parameters; the environment-specific part parameters in a given environment; the features of the click data; two indexed features of the click data; the number of click data features; the environment-invariant feature embeddings corresponding to those two features; the environment-specific feature embeddings, for a given environment, corresponding to those two features; and the click rate prediction model.
The click rate prediction model based on decoupling invariant learning at the click data feature domain weight layer is determined by formula (6):
[Formula (6): equation image not reproduced]
wherein the symbols denote, in order of definition: the representations of two feature domains; the environment-invariant weight between those two domains; the weight between those two domains that is specific to a given environment; and the number of feature domains.
According to an embodiment of the invention, the representation of a domain is calculated from the feature embeddings within that domain and is determined by formula (7):
[Formula (7): equation image not reproduced]
wherein the symbols denote, in order of definition: an indexed data feature; the set of indices of all data features belonging to the domain; and the feature embedding corresponding to that data feature.
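The domain-level variant can be sketched as follows, with the caveat that formulas (6) and (7) are not reproduced in this text. The sketch assumes that a domain representation is the value-weighted sum of the embeddings of the features in that domain, and that each pair of domains contributes the inner product of their representations scaled by the sum of an environment-invariant weight and an environment-specific weight. Both combination rules, and the names `domain_representation`, `interaction_score`, `w_inv` and `w_env`, are hypothetical.

```python
def domain_representation(values, embeddings, members):
    """Assumed pooling: value-weighted sum of the embeddings of one domain's features."""
    dim = len(next(iter(embeddings.values())))
    rep = [0.0] * dim
    for i in members:
        for k in range(dim):
            rep[k] += values[i] * embeddings[i][k]
    return rep

def interaction_score(reps, w_inv, w_env):
    """Assumed scoring: inner product per domain pair, weighted by
    (invariant weight + environment-specific weight)."""
    score = 0.0
    for (p, q), w in w_inv.items():
        dot = sum(a * b for a, b in zip(reps[p], reps[q]))
        score += (w + w_env.get((p, q), 0.0)) * dot
    return score

values = {0: 1.0, 1: 1.0, 2: 1.0}                            # feature values by index
embeddings = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}   # feature embeddings by index
reps = {
    "user": domain_representation(values, embeddings, [0, 1]),
    "item": domain_representation(values, embeddings, [2]),
}
w_inv = {("user", "item"): 0.5}
w_env = {("user", "item"): 0.25}
print(interaction_score(reps, w_inv, w_env))  # score inside one training environment
print(interaction_score(reps, w_inv, {}))     # score from the invariant weights only
```

Passing an empty dictionary for `w_env` drops the environment-specific weights, which is one way to score with only the environment-invariant part.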
The invention provides a stable feature interaction capturing framework for the click rate prediction problem, which mainly comprises three parts: a decoupling invariant learning objective for capturing stable feature interactions; a meta-learning optimization framework for implementing the decoupling invariant learning objective; and a model architecture for implementing decoupling invariant learning.
For the decoupling invariant learning objective for capturing stable feature interactions, the invention divides the historical data, in chronological order and in equal time lengths, into a number of different environments. The invention divides the model parameters for modeling feature interactions into an environment-invariant part and an environment-specific part, used to capture the environment-invariant correlations and the environment-specific correlations respectively. The invention designs both parts at the feature embedding level and at the feature domain weight level. In order to decouple the environment-invariant correlations from the environment-specific correlations, so as to satisfy the sufficient-prediction assumption of invariant learning and capture stable feature interactions, the invention designs a decoupling invariant learning objective consisting of two parts: an environment-specific learning objective that satisfies the sufficient-prediction assumption, and an environment-invariant learning objective that removes the influence of the environment-specific correlations.
By dividing the historical data, with equal time lengths, into Figure SMS_164 different environments, the invariant features and the environment-related specific features in the historical data can be mined more effectively. For example, when a user's historical click data is divided into several segments, the features shared across the segments are the user's invariant click features, which are independent of the environment, whereas the features that differ between segments may be the user's specific click features, which are related to the environment.
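The equal-time-length split described above can be sketched as follows. This is a minimal illustration only: the `(timestamp, payload)` record layout and the function name `split_into_environments` are assumptions made for the example and do not come from the patent.

```python
from typing import Dict, List, Tuple

Record = Tuple[float, dict]  # (timestamp, payload); hypothetical record layout


def split_into_environments(records: List[Record], num_envs: int) -> Dict[int, List[Record]]:
    """Assign each record to one of `num_envs` equal-length time windows."""
    if num_envs < 1 or not records:
        raise ValueError("need at least one environment and one record")
    t_min = min(ts for ts, _ in records)
    t_max = max(ts for ts, _ in records)
    width = (t_max - t_min) / num_envs or 1.0  # guard against a zero-length span
    envs: Dict[int, List[Record]] = {e: [] for e in range(num_envs)}
    for ts, payload in records:
        # The latest timestamp would fall just past the last window; clamp it in.
        idx = min(int((ts - t_min) / width), num_envs - 1)
        envs[idx].append((ts, payload))
    return envs
```

Each returned bucket plays the role of one environment; the windows are indexed in chronological order, so the earliest records land in environment 0.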
Regarding the environment-specific learning objective that satisfies the sufficient-prediction assumption: to satisfy the sufficient-prediction assumption of invariant learning, in environment Figure SMS_165 the combination of Figure SMS_166 and Figure SMS_167 should be able to predict the target sufficiently, while the environment-specific part should focus on capturing the environment-specific correlations. The following optimization objective is therefore designed, as shown in formula (8):
Figure SMS_168
(8),
wherein Figure SMS_170 denotes the prediction loss calculated in environment Figure SMS_172; Figure SMS_174 is a hyperparameter controlling the strength of Figure SMS_169; and Figure SMS_175 is a regularization constraint that prevents Figure SMS_176 from capturing environment-invariant correlations. The constraint is realized by making the environment-specific parameters Figure SMS_177 contribute nothing to the prediction in any environment Figure SMS_173 other than environment Figure SMS_171, as shown in formula (9):
Figure SMS_178
(9)。
By optimizing this learning objective, the combination of the environment-invariant part parameters Figure SMS_179 and the environment-specific part parameters Figure SMS_180 satisfies the sufficient-prediction condition in environment Figure SMS_181, while Figure SMS_182 focuses on capturing the environment-specific correlations.
Regarding the environment-invariant learning objective that removes the influence of the environment-specific correlations: once Figure SMS_183 has captured the environment-specific correlations, fixing Figure SMS_184 is equivalent to removing the influence of Figure SMS_185 on the prediction target. At that point, the environment-invariant correlations alone can sufficiently predict the target (the target with the influence of Figure SMS_186 removed). Therefore, with Figure SMS_187 fixed, the invention designs the following learning objective to optimize the environment-invariant model parameters Figure SMS_188 so as to capture stable feature interactions, as shown in formula (1):
Figure SMS_189
(1),
wherein Figure SMS_190 denotes the variance of the empirical risks of the different training environments, and Figure SMS_191 denotes the weight of the prediction loss of environment Figure SMS_192. They are calculated as shown in formulas (3) and (4):
Figure SMS_193
(3),
Figure SMS_194
(4)。
This objective combines minimizing the loss across environments with minimizing the difference in loss between environments. The former improves performance in all environments; the latter, through Figure SMS_195, limits the performance gap between environments, so that the patterns shared by different environments are captured and model parameters that are stable across environments are learned. In addition, assigning a larger weight Figure SMS_196 to environments with high empirical risk focuses training on the difficult environments, which can further improve the cross-environment generalization of the model parameters.
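The combination of a weighted cross-environment loss and a risk-variance penalty can be illustrated with a small sketch. The exact forms of formulas (3) and (4) are not recoverable from this text, so two assumptions are made here and should be read as such: the variance is the ordinary population variance of the per-environment risks, and each environment's weight is its risk divided by the sum of all risks. The function names are hypothetical.

```python
from typing import List


def environment_weights(risks: List[float]) -> List[float]:
    """Weights proportional to each environment's risk (assumed normalisation)."""
    total = sum(risks)
    if total <= 0:
        return [1.0 / len(risks)] * len(risks)
    return [r / total for r in risks]


def risk_variance(risks: List[float]) -> float:
    """Population variance of the per-environment empirical risks (assumed form)."""
    mean = sum(risks) / len(risks)
    return sum((r - mean) ** 2 for r in risks) / len(risks)


def invariant_objective(risks: List[float], variance_coef: float) -> float:
    """Weighted sum of environment risks plus a variance penalty."""
    weights = environment_weights(risks)
    weighted_risk = sum(w * r for w, r in zip(weights, risks))
    return weighted_risk + variance_coef * risk_variance(risks)
```

Under these assumptions, environments with equal risks incur no variance penalty, and an environment whose risk is three times another's receives three times its weight.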
In summary, the overall learning objective of the decoupling invariant learning is shown in formulas (1) and (2):
Figure SMS_197
(1),
Figure SMS_198
(2)。
Optimizing this learning objective decouples the environment-invariant correlations from the environment-specific correlations in different environments and, through the risk variance and the environment weighting, captures feature interactions that are stable across heterogeneous environments, so that they generalize well in the model serving stage.
Regarding the meta-learning optimization framework: the two sub-objectives of decoupled invariant learning depend on each other, because the environment-invariant objective requires Figure SMS_199 to have captured the environment-specific correlations and requires Figure SMS_200 to be fixed in order to remove their influence. The invention therefore updates Figure SMS_201 and Figure SMS_202 alternately and iteratively. First, the environment-invariant model parameters Figure SMS_203 are updated: Figure SMS_204 is fixed and Figure SMS_205 is optimized. Because the learning objective of decoupled invariant learning is a complex bi-level optimization problem, the invention optimizes it by meta-learning. In the meta-training stage, an environment Figure SMS_206 is randomly sampled and intermediate model parameters Figure SMS_207 are generated with the environment-specific learning objective, as shown in formula (10):
Figure SMS_208
(10),
Then, in the meta-test stage, the invariant learning loss calculated with the intermediate model parameters is used to optimize Figure SMS_209, as shown in formula (11):
Figure SMS_210
(11),
wherein Figure SMS_211 denotes the weight of the prediction loss of environment Figure SMS_212.
Second, the environment-specific model parameters Figure SMS_213 are updated. After Figure SMS_214 has been updated, Figure SMS_215 is fixed and the environment-specific learning objective is optimized directly to update Figure SMS_216, as shown in formula (12):
Figure SMS_217
(12),
wherein Figure SMS_218 denotes taking the gradient with respect to Figure SMS_219. The two updates are performed alternately and iteratively until the model converges.
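The alternating update can be made concrete on a toy problem. The sketch below is not the patented model: it assumes a one-parameter regression in which the prediction is `(theta + phi[e]) * x`, uses a first-order approximation of the meta-gradient, treats the environment weights as constants, and substitutes a simple stand-in regularizer (the mean squared contribution of `phi[e]` on the other environments) for formula (9). All names and hyperparameter values are illustrative.

```python
import random
from typing import Dict, List, Tuple

Data = List[Tuple[float, float]]  # (x, y) pairs for one environment


def risk_and_grad(slope: float, data: Data) -> Tuple[float, float]:
    """Mean squared error of y_hat = slope * x and its derivative w.r.t. slope."""
    n = len(data)
    risk = sum((slope * x - y) ** 2 for x, y in data) / n
    grad = sum(2 * (slope * x - y) * x for x, y in data) / n
    return risk, grad


def train(envs: Dict[int, Data], steps: int = 500, lr: float = 0.05,
          reg: float = 0.1, var_coef: float = 0.1, seed: int = 0):
    rng = random.Random(seed)
    theta = 0.0                    # environment-invariant parameter
    phi = {e: 0.0 for e in envs}   # environment-specific parameters
    for _ in range(steps):
        # Step 1a (meta-train): one inner step on a randomly sampled environment.
        e = rng.choice(list(envs))
        _, inner_grad = risk_and_grad(theta + phi[e], envs[e])
        theta_mid = theta - lr * inner_grad
        # Step 1b (meta-test): invariant loss at the intermediate parameter,
        # phi held fixed; first-order approximation of the meta-gradient.
        stats = [risk_and_grad(theta_mid + phi[k], envs[k]) for k in envs]
        risks = [s[0] for s in stats]
        grads = [s[1] for s in stats]
        total = sum(risks) or 1.0
        weights = [r / total for r in risks]  # treated as constants
        mean_r = sum(risks) / len(risks)
        mean_g = sum(grads) / len(grads)
        var_grad = 2 * sum((r - mean_r) * (g - mean_g)
                           for r, g in zip(risks, grads)) / len(risks)
        theta -= lr * (sum(w * g for w, g in zip(weights, grads)) + var_coef * var_grad)
        # Step 2: theta fixed, update phi[e] on the environment-specific objective.
        _, spec_grad = risk_and_grad(theta + phi[e], envs[e])
        others = [x for k in envs if k != e for x, _ in envs[k]]
        # Stand-in regulariser: mean (phi[e] * x)^2 over the other environments.
        reg_grad = 2 * phi[e] * sum(x * x for x in others) / len(others) if others else 0.0
        phi[e] -= lr * (spec_grad + reg * reg_grad)
    return theta, phi
```

On data whose slope is a shared value plus a per-environment offset, `theta` would be expected to drift toward the shared value while each `phi[e]` absorbs part of the offset; how cleanly the two separate depends on the regularizer, which here is only a placeholder.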
Fig. 2 (a) and 2 (b) respectively show schematic diagrams of two types of model of decoupling invariant learning according to an embodiment of the present invention, wherein fig. 2 (a) shows a model of decoupling invariant learning and fig. 2 (b) shows a model of light decoupling invariant learning (LightDIL).
Regarding the model architecture, the invention designs the environment-invariant model parameters Figure SMS_220 and the environment-specific parameters Figure SMS_221 at the feature embedding level and at the feature field weight level, respectively, in order to decouple the two types of correlations. Taking the factorization machine model as an example (designs based on other models are also possible), the invention provides the following two model architectures.
The first model architecture, shown in fig. 2 (a), decouples at the feature embedding level. Because feature embeddings are the core of a feature interaction model, the invention decouples at the feature embedding level. For a feature Figure SMS_222, let its corresponding environment-invariant embedding vector be Figure SMS_223 and its set of environment-specific embedding vectors be Figure SMS_224. For the factorization machine model, the model prediction formula is then as shown in formula (5):
Figure SMS_225
(5),
wherein Figure SMS_226. This is the default factorization machine form of Decoupled Invariant Learning (DIL).
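As a rough illustration of embedding-level decoupling, the sketch below scores a sample with the second-order term of a factorization machine. It assumes that the embedding used in a given environment is the sum of the environment-invariant vector and that environment's specific vector; formula (5) itself is not recoverable from this text, so this combination rule, the omission of bias and first-order terms, and the name `dil_fm_score` are all assumptions.

```python
from typing import List, Optional

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(p * q for p, q in zip(a, b))


def dil_fm_score(x: List[float], invariant: List[Vector],
                 specific: Optional[List[Vector]] = None) -> float:
    """Second-order FM score; `specific` holds one environment's extra embeddings."""
    emb = []
    for i, v in enumerate(invariant):
        if specific is None:
            emb.append(list(v))  # invariant embeddings only
        else:
            emb.append([a + b for a, b in zip(v, specific[i])])
    score = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            score += dot(emb[i], emb[j]) * x[i] * x[j]
    return score
```

Passing `specific=None` scores with the environment-invariant embeddings alone. The doubled embedding table in this design is also what makes it parameter-heavy.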
The second model architecture, shown in fig. 2 (b), decouples at the feature field weight level. Decoupling at the feature embedding level greatly increases the number of model parameters, which makes the model harder to learn and raises its storage and training costs. To improve model efficiency, the invention instead decouples at the feature field level. Specifically, environment-invariant weights and environment-specific weights are assigned to the feature interactions at the feature field level to capture the environment-invariant and environment-specific correlations, respectively. Taking the factorization machine as an example, the model prediction formula is shown as formula (6):
Figure SMS_227
(6),
wherein Figure SMS_228 is the representation of field Figure SMS_229, calculated from the feature embeddings Figure SMS_237 in field Figure SMS_231; Figure SMS_233 is the environment-invariant weight between field Figure SMS_235 and field Figure SMS_236; and Figure SMS_230 is the weight between field Figure SMS_232 and field Figure SMS_234 that is specific to environment t, with Figure SMS_238. This model architecture is named light decoupling invariant learning (LightDIL).
In summary, the invention designs decoupled model architectures at the feature embedding level and at the feature field weight level, as shown in fig. 2 (a) and fig. 2 (b), respectively. In the serving stage, only the environment-invariant model parameters, that is, the stable feature interactions, are used for prediction, to ensure good generalization. Taking light decoupling invariant learning as an example, the prediction formula is shown as formula (13):
Figure SMS_239
(13),
wherein Figure SMS_240 denotes the empty set, indicating that the set of environment-specific model parameters is replaced with the empty set in the prediction stage of light decoupling invariant learning.
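The field-level variant and its serving-time behaviour can be sketched together. The code below assumes that a field representation is the value-weighted sum of the embeddings of the features in that field, and that the interaction weight in training is the sum of the invariant and environment-specific weights; formulas (6), (7) and (13) are not recoverable from this text, so both choices and all names are assumptions. Passing `w_env=None` corresponds to replacing the environment-specific parameters with the empty set at serving time.

```python
from typing import Dict, List, Optional, Tuple

Vector = List[float]
Pair = Tuple[int, int]


def field_representation(embeddings: List[Vector], values: List[float],
                         members: List[int]) -> Vector:
    """Sum x_k * v_k over the features k that belong to one field (assumed form)."""
    dim = len(embeddings[0])
    rep = [0.0] * dim
    for k in members:
        for d in range(dim):
            rep[d] += values[k] * embeddings[k][d]
    return rep


def lightdil_score(reps: List[Vector], w_inv: Dict[Pair, float],
                   w_env: Optional[Dict[Pair, float]] = None) -> float:
    """Weighted pairwise field interactions; w_env=None means serving mode."""
    score = 0.0
    for i in range(len(reps)):
        for j in range(i + 1, len(reps)):
            weight = w_inv[(i, j)]
            if w_env is not None:  # training inside a known environment
                weight += w_env.get((i, j), 0.0)
            score += weight * sum(a * b for a, b in zip(reps[i], reps[j]))
    return score
```

Compared with embedding-level decoupling, the environment-specific state here is one scalar per field pair and environment rather than one vector per feature and environment, which is the source of the efficiency gain described above.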
Fig. 3 is a flowchart of a click rate prediction method according to an embodiment of the present invention.
As shown in FIG. 3, the click rate prediction method includes operations S310 to S320.
In operation S310, a history data set of a user to be predicted is acquired, wherein the history data set of the user to be predicted includes user characteristic data and user click data.
In operation S320, the prediction result of the environment-invariant feature interactions of the historical dataset of the user to be predicted is mined by using a click rate prediction model, where the click rate prediction model is trained by the training method of the click rate prediction model based on decoupling invariant learning.
Fig. 4 is a schematic structural diagram of a training device based on a click rate prediction model of decoupling invariant learning according to an embodiment of the present invention.
As shown in fig. 4, the training apparatus 400 based on the click rate prediction model of decoupling invariant learning includes a model construction module 410, a data sampling module 420, an invariant parameter updating module 430, a specific parameter updating module 440, and an iteration module 450.
The model building module 410 is configured to perform operation S110, and build a click rate prediction model and a model optimization target based on a decoupling invariant learning method, where parameters of the click rate prediction model include an environment invariant portion parameter and an environment specific portion parameter, and the model optimization target includes an optimization target of the environment invariant portion parameter and an optimization target of the environment specific portion parameter.
The data sampling module 420 is configured to perform operation S120 to randomly sample an environmental data set to obtain a training sample data set, where the environmental data set represents historical click data of a user in different time periods, and the environmental data set includes a tag value.
The invariant parameter updating module 430 is configured to perform operation S130: fix the environment-specific part parameters of the click rate prediction model; mine the environment-invariant features of the training sample data set by using the click rate prediction model to obtain a first prediction result; based on the optimization target of the environment-invariant part parameters, process the first prediction result and the label values of the training sample data set with an environment-invariant loss function by a gradient descent method to obtain a first loss value; and update the environment-invariant part parameters of the click rate prediction model according to the first loss value.
The specific parameter updating module 440 is configured to perform operation S140: fix the environment-invariant part parameters of the click rate prediction model; mine the environment-specific features of the training sample data set by using the updated click rate prediction model to obtain a second prediction result; based on the optimization target of the environment-specific part parameters, process the second prediction result and the label values of the training sample data set with an environment-specific loss function by a gradient descent method to obtain a second loss value; and update the environment-specific part parameters of the click rate prediction model according to the second loss value.
And an iteration module 450, configured to iterate operations S120 to S140 until the click rate prediction model meets a preset convergence condition, thereby obtaining a trained click rate prediction model.
In order to better illustrate the advantages of the click rate prediction model obtained by the training method provided by the invention, the click rate prediction model obtained by the training method provided by the invention is verified by combining a specific experiment.
The invention takes the classical click rate prediction model FM as the base recommendation model and conducts experiments on two datasets of different types, Douban and MovieLens10M (ML-10M). FwFMs, AutoFIS, PROFIT, Group-DRO and V-Rex are used as comparison models. Taking 6 months as one time period, Douban and ML-10M are divided into 15 and 13 periods, respectively. For Douban, the first five periods are used as the training set, the middle five as the validation set, and the last five as the test set. For ML-10M, the first five periods are used as the training set, the middle four as the validation set, and the last four as the test set. All methods train the model on the training set, select the optimal parameters on the validation set, and are tested on the test set. The average performance over the test periods of Douban and ML-10M is reported, with AUC and LogLoss as metrics.
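For reference, the two metrics named above can be computed as in the following sketch. These are the standard definitions of binary log loss and of AUC as a pairwise ranking probability; the function names are illustrative, and the patent does not state which implementation was used in the experiments.

```python
import math
from typing import List


def log_loss(labels: List[int], probs: List[float], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def auc(labels: List[int], probs: List[float]) -> float:
    """Probability that a random positive is scored above a random negative."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC above is quadratic in the number of samples, which is fine for illustration but would normally be replaced by a sort-based computation on datasets of this size. Averaging over test periods then amounts to computing each metric per period and taking the mean.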
The experimental results are shown in table 1:
table 1 comparison of the performance of different methods on two data sets
Figure SMS_241
From Table 1 it can be seen that, on two datasets of different types, the method of the invention outperforms the general invariant learning methods V-Rex and Group-DRO on all metrics. This shows that, through decoupling, the method can satisfy the sufficient-prediction assumption of invariant learning and thereby apply invariant learning to capturing stable feature interactions for click rate prediction. The method also achieves superior results compared with the recommendation models FwFMs, AutoFIS and PROFIT, which shows that it can capture stable feature interactions in different recommendation scenarios, generalize better when predicting in the serving stage, and improve prediction accuracy.
Fig. 5 schematically shows a block diagram of an electronic device adapted to implement a training method of a click rate prediction model based on decoupling invariant learning and a click rate prediction method according to an embodiment of the invention.
As shown in fig. 5, an electronic device 500 according to an embodiment of the present invention includes a processor 501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. The processor 501 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 501 may also include on-board memory for caching purposes. The processor 501 may comprise a single processing unit or a plurality of processing units for performing different actions of the method flow according to an embodiment of the invention.
In the RAM 503, various programs and data required for the operation of the electronic apparatus 500 are stored. The processor 501, ROM 502, and RAM 503 are connected to each other by a bus 504. The processor 501 performs various operations of the method flow according to an embodiment of the present invention by executing programs in the ROM 502 and/or the RAM 503. Note that the program may be stored in one or more memories other than the ROM 502 and the RAM 503. The processor 501 may also perform various operations of the method flow according to embodiments of the present invention by executing programs stored in the one or more memories.
According to an embodiment of the invention, the electronic device 500 may further comprise an input/output (I/O) interface 505, the input/output (I/O) interface 505 also being connected to the bus 504. The electronic device 500 may also include one or more of the following components connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. The drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as needed so that a computer program read therefrom is installed into the storage section 508 as needed.
The present invention also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present invention.
According to embodiments of the present invention, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the invention, the computer-readable storage medium may include ROM 502 and/or RAM 503 and/or one or more memories other than ROM 502 and RAM 503 described above.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing description of the embodiments is provided to illustrate the general principles of the invention and is not intended to limit the invention; any modifications, equivalents and improvements may be made without departing from the spirit and principles of the invention.

Claims (9)

1. The training method of the click rate prediction model based on decoupling invariant learning is characterized by comprising the following steps of:
step one, constructing a click rate prediction model and a model optimization target based on a decoupling invariant learning method, wherein the parameters of the click rate prediction model comprise environment invariant part parameters and environment specific part parameters, and the model optimization target comprises an optimization target of the environment invariant part parameters and an optimization target of the environment specific part parameters;
randomly sampling an environment data set to obtain a training sample data set, wherein the environment data set represents historical click data of a user in different time periods, and the environment data set comprises a label value;
thirdly, fixing environment specific part parameters of the click rate prediction model, mining environment invariant features of the training sample data set by using the click rate prediction model to obtain a first prediction result, processing the first prediction result and the tag value of the training sample data set by using an environment invariant loss function through a gradient descent method based on an optimization target of the environment invariant part parameters to obtain a first loss value, and updating the environment invariant part parameters of the click rate prediction model according to the first loss value, wherein the environment invariant features represent features shared by data in different time periods;
Fourthly, fixing the environmental invariant part parameters of the click rate prediction model, mining the environmental specific characteristics of the training sample data set by using the updated click rate prediction model to obtain a second prediction result, processing the second prediction result and the label value of the training sample data set by using an environmental specific loss function through a gradient descent method based on the optimization target of the environmental specific part parameters to obtain a second loss value, and updating the environmental specific part parameters of the click rate prediction model according to the second loss value, wherein the environmental specific characteristics represent the sudden appearance preference of a user to certain articles or topics at a certain time point or within a certain time period;
iterating the second step to the fourth step until the click rate prediction model meets a preset convergence condition, and obtaining a trained click rate prediction model;
wherein the optimization objective of the model is represented by formula (1) and formula (2):
Figure QLYQS_1
(1),
Figure QLYQS_2
(2),
wherein formula (1) represents the optimization target of the environment-invariant part parameters and formula (2) represents the optimization target of the environment-specific part parameters; Figure QLYQS_4 denotes the environment-invariant part parameters; Figure QLYQS_6 denotes the environment-specific part parameters in environment Figure QLYQS_8; Figure QLYQS_10 denotes the prediction loss calculated in environment Figure QLYQS_12; Figure QLYQS_14 is a hyperparameter controlling the strength of Figure QLYQS_16; Figure QLYQS_3 is a regularization constraint that prevents Figure QLYQS_5 from capturing environment-invariant correlations; Figure QLYQS_7 denotes the variance of the empirical risks of the different training environments; Figure QLYQS_9 denotes the weight of the prediction loss of environment Figure QLYQS_11; and Figure QLYQS_13 denotes the coefficient of Figure QLYQS_15.
2. The method of claim 1, wherein the variance of the risk of experience of the different training environments
Figure QLYQS_17
Represented by formula (3):
Figure QLYQS_18
(3),
wherein ,
Figure QLYQS_20
representing the number of elements of the training set of environments,
Figure QLYQS_21
and
Figure QLYQS_22
the values representing the different environments are taken into account,
Figure QLYQS_23
is shown in the environment
Figure QLYQS_24
Is provided with a parameter of the specific part of the environment,
Figure QLYQS_25
is shown in the environment
Figure QLYQS_26
The variance of experience risks of different training environments
Figure QLYQS_19
A mode for capturing different environmental shares;
wherein the environment is
Figure QLYQS_27
Weight of predictive loss->
Figure QLYQS_28
Represented by formula (4):
Figure QLYQS_29
(4),
wherein ,
Figure QLYQS_30
representing traversing all environments.
3. The method of claim 1, wherein the click-through rate prediction model comprises a click-through rate prediction model based on decoupling invariant learning of a click-through data feature embedding level and/or a click-through rate prediction model based on decoupling invariant learning of a click-through data feature domain weighting level.
4. A method according to claim 3, wherein the click rate prediction model based on decoupling invariant learning of click data feature embedding planes is determined by equation (5):
Figure QLYQS_31
(5),
wherein ,
Figure QLYQS_46
representing the parameters of the constant part of the environment,
Figure QLYQS_47
is shown in the environment
Figure QLYQS_49
Is provided with a parameter of the specific part of the environment,
Figure QLYQS_50
Figure QLYQS_51
Figure QLYQS_52
the characteristics of the click data are represented,
Figure QLYQS_53
represent the first
Figure QLYQS_32
The characteristics of the individual click data are such that,
Figure QLYQS_34
represent the first
Figure QLYQS_36
The characteristics of the individual click data are such that,
Figure QLYQS_39
representing the number of the click data features,
Figure QLYQS_42
represent the first
Figure QLYQS_43
The environment-invariant features corresponding to the individual features are embedded,
Figure QLYQS_44
represent the first
Figure QLYQS_45
The environment-invariant features corresponding to the individual features are embedded,
Figure QLYQS_33
represent the first
Figure QLYQS_35
The corresponding first feature
Figure QLYQS_37
The embedding of the individual environment-specific features,
Figure QLYQS_38
represent the first
Figure QLYQS_40
The corresponding first feature
Figure QLYQS_41
The embedding of the individual environment-specific features,
Figure QLYQS_48
the click rate prediction model;
wherein the click rate prediction model based on decoupling invariant learning of the click data feature domain weight layer is determined by formula (6):
Figure QLYQS_54
(6),
wherein ,
Figure QLYQS_55
representation domain
Figure QLYQS_56
Is characterized in that,
Figure QLYQS_57
representation domain
Figure QLYQS_59
Is characterized in that,
Figure QLYQS_60
representation domain
Figure QLYQS_61
AND Domain
Figure QLYQS_63
The environment between them is not weighted in a changing way,
Figure QLYQS_58
representation domain
Figure QLYQS_62
AND Domain
Figure QLYQS_64
Environment between
Figure QLYQS_65
Has a specific weight of
Figure QLYQS_66
Figure QLYQS_67
Representing the number of feature fields.
5. The method of claim 4, wherein the domain
Figure QLYQS_68
Characterization of (2)
Figure QLYQS_69
Is based on domain
Figure QLYQS_70
Middle feature embedding
Figure QLYQS_71
Performing calculation, and determining by a formula (7):
Figure QLYQS_72
(7),
wherein ,
Figure QLYQS_73
representing the first of the data
Figure QLYQS_75
The characteristics of the data are such that,
Figure QLYQS_76
representing all of the belonging domains
Figure QLYQS_77
Data characteristics of (2)
Figure QLYQS_78
Corresponding to
Figure QLYQS_79
Is a set of (a) and (b),
Figure QLYQS_80
representing the first of the data
Figure QLYQS_74
And embedding the features corresponding to the data features.
6. A click rate prediction method, comprising:
acquiring a historical data set of a user to be predicted, wherein the historical data set of the user to be predicted comprises user characteristic data and user click data;
and mining prediction results of the environment-invariant feature interactions of the historical dataset of the user to be predicted by using a click rate prediction model, wherein the click rate prediction model is trained by the method of any one of claims 1-5.
7. A training device based on a click rate prediction model of decoupling invariant learning, comprising:
the model construction module is used for executing the first step, and constructing a click rate prediction model and a model optimization target based on a decoupling invariant learning method, wherein the parameters of the click rate prediction model comprise environment invariant part parameters and environment specific part parameters, and the model optimization target comprises an optimization target of the environment invariant part parameters and an optimization target of the environment specific part parameters;
a data sampling module, configured to execute step two, randomly sampling an environment data set to obtain a training sample data set, wherein the environment data set represents historical click data of users in different time periods, and the environment data set comprises label values;
an invariant parameter updating module, configured to execute step three, fixing the environment-specific part parameters of the click rate prediction model, using the click rate prediction model to mine environment-invariant features of the training sample data set to obtain a first prediction result, processing, based on the optimization objective for the environment-invariant part parameters, the first prediction result and the label values of the training sample data set with an environment-invariant loss function by gradient descent to obtain a first loss value, and updating the environment-invariant part parameters of the click rate prediction model according to the first loss value, wherein the environment-invariant features represent features shared by data in different time periods;
a specific parameter updating module, configured to execute step four, fixing the environment-invariant part parameters of the click rate prediction model, using the updated click rate prediction model to mine environment-specific features of the training sample data set to obtain a second prediction result, processing, based on the optimization objective for the environment-specific part parameters, the second prediction result and the label values of the training sample data set with an environment-specific loss function by gradient descent to obtain a second loss value, and updating the environment-specific part parameters of the click rate prediction model according to the second loss value, wherein the environment-specific features represent a user's sudden preference for certain items or topics at a certain time point or within a certain time period;
an iteration module, configured to iteratively perform steps two to four until the click rate prediction model satisfies a preset convergence condition, to obtain a trained click rate prediction model;
wherein the optimization objective of the model is given by formula (1) and formula (2):
[Figure QLYQS_81] (1),
[Figure QLYQS_82] (2),
wherein formula (1) represents the optimization objective for the environment-invariant part parameters, and formula (2) represents the optimization objective for the environment-specific part parameters; [Figure QLYQS_83] denotes the environment-invariant part parameters; [Figure QLYQS_89] denotes the environment-specific part parameters in environment [Figure QLYQS_91]; [Figure QLYQS_93] denotes the prediction loss computed in environment [Figure QLYQS_94]; [Figure QLYQS_95] is a hyperparameter controlling the strength of [Figure QLYQS_96]; [Figure QLYQS_84] is a regularization constraint that prevents [Figure QLYQS_85] from capturing environment-invariant correlations; [Figure QLYQS_86] denotes the variance of the empirical risk across the different training environments; [Figure QLYQS_87] is the weight of the prediction loss of environment [Figure QLYQS_88]; and [Figure QLYQS_90] denotes the coefficient of [Figure QLYQS_92].
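By way of illustration only, the alternation of steps two to four may be sketched as follows. The sketch rests on several assumptions that are not taken from formulas (1) and (2), whose exact form is given only in the figures: the model is a linear logistic model whose logit is the sum of an environment-invariant weight vector and an environment-specific weight vector; the invariant objective is the sum of the per-environment risks plus a variance penalty weighted by a hypothetical hyperparameter `lam`; an L2 penalty weighted by a hypothetical `mu` stands in for the regularization constraint on the environment-specific parameters; and a fixed number of iterations stands in for the preset convergence condition. All function and parameter names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def env_risk_and_grad(w_inv, w_env, batch):
    """Mean log loss of one environment and its gradient.

    The logit is (w_inv + w_env) . x, so the gradient with respect to
    w_inv and with respect to w_env is the same vector.
    """
    dim = len(w_inv)
    risk, grad = 0.0, [0.0] * dim
    for x, y in batch:
        p = sigmoid(sum((a + b) * xi for a, b, xi in zip(w_inv, w_env, x)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
        risk -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
        for j in range(dim):
            grad[j] += (p - y) * x[j]
    n = len(batch)
    return risk / n, [g / n for g in grad]


def train(envs, dim, steps=200, lr=0.1, lam=1.0, mu=0.1, batch_size=32, seed=0):
    """Alternate invariant and specific updates (assumed objectives, see lead-in)."""
    rng = random.Random(seed)
    w_inv = [0.0] * dim                      # environment-invariant part
    w_env = {e: [0.0] * dim for e in envs}   # environment-specific parts
    for _ in range(steps):
        # Step two: randomly sample a training batch from every environment.
        batches = {e: rng.sample(d, min(batch_size, len(d))) for e, d in envs.items()}

        # Step three: specific parts fixed, update the invariant part.
        # Assumed objective: sum_e R_e + lam * Var_e(R_e).
        stats = [env_risk_and_grad(w_inv, w_env[e], batches[e]) for e in envs]
        mean_risk = sum(r for r, _ in stats) / len(stats)
        for j in range(dim):
            g = sum((1.0 + 2.0 * lam * (r - mean_risk) / len(stats)) * gr[j]
                    for r, gr in stats)
            w_inv[j] -= lr * g

        # Step four: invariant part fixed, update each specific part with
        # the already-updated model. Assumed objective: R_e + mu * ||w_env||^2.
        for e in envs:
            _, gr = env_risk_and_grad(w_inv, w_env[e], batches[e])
            for j in range(dim):
                w_env[e][j] -= lr * (gr[j] + 2.0 * mu * w_env[e][j])
    return w_inv, w_env


def toy_envs(n=20):
    """Hypothetical data: feature 0 is stable, feature 1 flips sign across periods."""
    signs = [1.0, -1.0] * n
    return {
        "period_1": [([s, s], 1 if s > 0 else 0) for s in signs],
        "period_2": [([s, -s], 1 if s > 0 else 0) for s in signs],
    }
```

Calling `train(toy_envs(), dim=2, batch_size=40)` on this toy data is expected to place the weight of the stable feature in the invariant part, while the weight of the sign-flipping feature ends up in the per-period specific parts with opposite signs. Keeping the two parameter groups in separate containers is what allows one group to be held fixed while the other is updated.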
8. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-6.
9. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1-6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310053850.7A CN115809372B (en) 2023-02-03 2023-02-03 Click rate prediction model training method and device based on decoupling invariant learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310053850.7A CN115809372B (en) 2023-02-03 2023-02-03 Click rate prediction model training method and device based on decoupling invariant learning

Publications (2)

Publication Number Publication Date
CN115809372A CN115809372A (en) 2023-03-17
CN115809372B CN115809372B (en) 2023-06-16

Family

ID=85487763

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310053850.7A Active CN115809372B (en) 2023-02-03 2023-02-03 Click rate prediction model training method and device based on decoupling invariant learning

Country Status (1)

Country Link
CN (1) CN115809372B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490389A (en) * 2019-08-27 2019-11-22 腾讯科技(深圳)有限公司 Clicking rate prediction technique, device, equipment and medium
CN111538761A (en) * 2020-04-21 2020-08-14 中南大学 Click rate prediction method based on attention mechanism
CN114240555A (en) * 2021-12-17 2022-03-25 北京沃东天骏信息技术有限公司 Click rate prediction model training method and device and click rate prediction method and device
CN114445121A (en) * 2021-12-27 2022-05-06 天翼云科技有限公司 Advertisement click rate prediction model construction and advertisement click rate prediction method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220083913A1 (en) * 2020-09-11 2022-03-17 Actapio, Inc. Learning apparatus, learning method, and a non-transitory computer-readable storage medium
CN113205184B (en) * 2021-04-28 2023-01-31 清华大学 Invariant learning method and device based on heterogeneous hybrid data
CN115018552A (en) * 2022-06-28 2022-09-06 中国科学技术大学 Method for determining click rate of product

Also Published As

Publication number Publication date
CN115809372A (en) 2023-03-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant