CN115809372B - Click rate prediction model training method and device based on decoupling invariant learning

Info

Publication number: CN115809372B
Application number: CN202310053850.7A
Authority: CN
Inventors: 何向南; 张洋; 史天昊; 冯福利
Original assignee: University of Science and Technology of China USTC
Current assignee: University of Science and Technology of China USTC
Priority date: 2023-02-03
Filing date: 2023-02-03
Publication date: 2023-06-16
Anticipated expiration: 2043-02-03
Also published as: CN115809372A

Abstract

The invention discloses a method and device for training a click rate prediction model based on decoupling invariant learning. The method comprises: step one, constructing a click rate prediction model and a model optimization target based on a decoupling invariant learning method; step two, randomly sampling an environment data set to obtain a training sample data set; step three, fixing the environment-specific part parameters of the click rate prediction model, mining the environment-invariant features of the training sample data set with the click rate prediction model, and updating the environment-invariant part parameters of the click rate prediction model; step four, fixing the environment-invariant part parameters of the click rate prediction model, mining the environment-specific features of the training sample data set with the updated click rate prediction model, and updating the environment-specific part parameters of the click rate prediction model; and iterating steps two to four until the click rate prediction model meets a preset convergence condition, thereby obtaining the trained click rate prediction model.

Description

Click rate prediction model training method and device based on decoupling invariant learning

Technical Field

The invention relates to the field of recommendation systems, data mining and machine learning, in particular to a training method and device of click rate prediction model based on decoupling invariant learning, a click rate prediction method, electronic equipment and a storage medium.

Background

Click rate prediction is a vital part of a recommendation system. In recent years, feature interaction modeling has been recognized as the core of the click rate prediction problem, and most research has focused on modeling feature interactions effectively. However, prior-art feature interaction models learn feature interactions by fitting historical data under empirical risk minimization, that is, they learn whatever interactions best explain the historical data. In a real recommendation scenario the model must serve future data. Because user interests change continuously, new data drifts away from the historical data, and feature interactions obtained by fitting the historical data generalize poorly to the new data, which degrades the performance of the recommendation system.

To address distribution drift and the poor generalization of models learned by empirical risk minimization, those skilled in the art have proposed the invariant learning paradigm. Invariant learning assumes that the training data are collected from heterogeneous environments and identifies invariant correlations from the distribution shifts between those environments. Although this approach enables stable feature interaction learning, it assumes that the target can be sufficiently predicted from the environment-invariant correlations alone. In a recommendation system, the target in the training data is affected by environment-invariant correlations and environment-specific correlations that are coupled together, so this assumption does not hold and the ability of invariant learning to identify stable feature interactions is difficult to guarantee.

Disclosure of Invention

In view of the above problems, the present invention provides a training method and apparatus for click rate prediction model based on decoupling invariant learning, a click rate prediction method, an electronic device, and a storage medium, which are capable of solving at least one of the above problems.

According to a first aspect of the present invention, there is provided a training method of a click rate prediction model based on decoupling invariant learning, comprising:

step one, constructing a click rate prediction model and a model optimization target based on a decoupling invariant learning method, wherein the parameters of the click rate prediction model comprise environment invariant part parameters and environment specific part parameters, and the model optimization target comprises an optimization target of the environment invariant part parameters and an optimization target of the environment specific part parameters;

step two, randomly sampling an environment data set to obtain a training sample data set, wherein the environment data set represents historical click data of users in different time periods, and the environment data set comprises label values;

step three, fixing the environment-specific part parameters of the click rate prediction model, mining the environment-invariant features of the training sample data set with the click rate prediction model to obtain a first prediction result, processing the first prediction result and the label values of the training sample data set with an environment-invariant loss function by gradient descent, based on the optimization target of the environment-invariant part parameters, to obtain a first loss value, and updating the environment-invariant part parameters of the click rate prediction model according to the first loss value;

step four, fixing the environment-invariant part parameters of the click rate prediction model, mining the environment-specific features of the training sample data set with the updated click rate prediction model to obtain a second prediction result, processing the second prediction result and the label values of the training sample data set with an environment-specific loss function by gradient descent, based on the optimization target of the environment-specific part parameters, to obtain a second loss value, and updating the environment-specific part parameters of the click rate prediction model according to the second loss value;

and iterating steps two to four until the click rate prediction model meets a preset convergence condition, thereby obtaining the trained click rate prediction model.

According to an embodiment of the present invention, the model optimization target is represented by formula (1) and formula (2):

$$\min_{\phi} \sum_{t \in \mathcal{E}_{tr}} \omega_t R^t(\phi, \psi_t) + \alpha V_R \qquad (1)$$

$$\min_{\psi_t} R^t(\phi, \psi_t) + \lambda C(\psi_t) \qquad (2)$$

wherein formula (1) represents the optimization target of the environment-invariant part parameters and formula (2) represents the optimization target of the environment-specific part parameters; $\mathcal{E}_{tr}$ is the set of training environments; $\phi$ represents the environment-invariant part parameters; $\psi_t$ represents the environment-specific part parameters in the $t$-th environment; $R^t$ represents the prediction loss calculated in the $t$-th environment; $\lambda$ is a hyperparameter controlling the strength of $C(\psi_t)$; $C(\psi_t)$ is a regularization constraint that prevents $\psi_t$ from capturing environment-invariant correlations; $V_R$ represents the variance of the empirical risks of the different training environments; $\omega_t$ is the weight of the prediction loss of environment $t$; and $\alpha$ is the coefficient of $V_R$.

According to an embodiment of the invention, the variance $V_R$ of the empirical risks of the different training environments is represented by formula (3):

$$V_R = \frac{1}{|\mathcal{E}_{tr}|} \sum_{t \in \mathcal{E}_{tr}} \Big( R^t(\phi, \psi_t) - \frac{1}{|\mathcal{E}_{tr}|} \sum_{t' \in \mathcal{E}_{tr}} R^{t'}(\phi, \psi_{t'}) \Big)^2 \qquad (3)$$

wherein $|\mathcal{E}_{tr}|$ represents the number of elements of the training environment set; $t$ and $t'$ index different environments; $\psi_t$ is the environment-specific part parameter in environment $t$; $R^t$ represents the prediction loss calculated in the $t$-th training environment; and $V_R$ serves to capture the patterns shared by different environments;

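wherein the weight $\omega_t$ of the prediction loss of environment $t$ is expressed by formula (4):

$$\omega_t = \frac{R^t(\phi, \psi_t)}{\sum_{t' \in \mathcal{E}_{tr}} R^{t'}(\phi, \psi_{t'})} \qquad (4)$$

wherein $\sum_{t' \in \mathcal{E}_{tr}}$ represents traversing all environments.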

According to an embodiment of the invention, the click rate prediction model comprises a click rate prediction model based on decoupling invariant learning at the click data feature embedding layer and/or a click rate prediction model based on decoupling invariant learning at the click data feature domain weight layer.

According to an embodiment of the present invention, the click rate prediction model based on decoupling invariant learning at the click data feature embedding layer is determined by formula (5):

$$f(x; \phi, \psi_t) = \sum_{i=1}^{N} \sum_{j=i+1}^{N} \langle v_i + u_i^t,\; v_j + u_j^t \rangle\, x_i x_j \qquad (5)$$

wherein $\phi = \{v_i\}_{i=1}^{N}$ represents the environment-invariant part parameters; $\psi_t = \{u_i^t\}_{i=1}^{N}$ represents the environment-specific part parameters in the $t$-th environment; $x$ represents the features of the click data; $x_i$ and $x_j$ represent the $i$-th and $j$-th features of the click data; $N$ represents the number of click data features; $v_i$ and $v_j$ represent the environment-invariant feature embeddings corresponding to the $i$-th and $j$-th features; $u_i^t$ and $u_j^t$ represent the feature embeddings of the $i$-th and $j$-th features that are specific to the $t$-th environment; and $f$ represents the click rate prediction model;

wherein the click rate prediction model based on decoupling invariant learning at the click data feature domain weight layer is determined by formula (6):

$$f(x; \phi, \psi_t) = \sum_{i=1}^{M} \sum_{j=i+1}^{M} \big( w_{ij} + w_{ij}^t \big)\, \langle h_i, h_j \rangle \qquad (6)$$

wherein $h_i$ represents the representation of domain $i$; $h_j$ represents the representation of domain $j$; $w_{ij}$ represents the environment-invariant weight between domain $i$ and domain $j$; $w_{ij}^t$ represents the weight between domain $i$ and domain $j$ that is specific to environment $t$, with $w_{ij} \in \phi$ and $w_{ij}^t \in \psi_t$; and $M$ represents the number of feature domains.

According to an embodiment of the invention, the representation $h_i$ of domain $i$ is calculated from the feature embeddings in domain $i$ and is determined by formula (7):

$$h_i = \sum_{k \in \mathcal{F}_i} x_k v_k \qquad (7)$$

wherein $x_k$ represents the $k$-th data feature; $\mathcal{F}_i$ represents the set of indices $k$ of all data features $x_k$ belonging to domain $i$; and $v_k$ represents the feature embedding corresponding to the $k$-th data feature.

According to a second aspect of the present invention, there is provided a click rate prediction method including:

acquiring a historical data set of a user to be predicted, wherein the historical data set of the user to be predicted comprises user characteristic data and user click data;

and using a click rate prediction model to mine the environment-invariant feature interactions of the historical data set of the user to be predicted, thereby obtaining a prediction result, wherein the click rate prediction model is trained by the above training method of a click rate prediction model based on decoupling invariant learning.

According to a third aspect of the present invention, there is provided a training apparatus for a click rate prediction model based on decoupling invariant learning, comprising:

the model construction module is used for executing the first step, constructing a click rate prediction model and a model optimization target based on a decoupling invariant learning method, wherein the parameters of the click rate prediction model comprise environment invariant part parameters and environment specific part parameters, and the model optimization target comprises an optimization target of the environment invariant part parameters and an optimization target of the environment specific part parameters;

the data sampling module is used for executing the second step, and randomly sampling an environment data set to obtain a training sample data set, wherein the environment data set represents historical click data of a user in different time periods, and the environment data set comprises a label value;

the invariant parameter updating module is used for executing step three: fixing the environment-specific part parameters of the click rate prediction model, mining the environment-invariant features of the training sample data set with the click rate prediction model to obtain a first prediction result, processing the first prediction result and the label values of the training sample data set with an environment-invariant loss function by gradient descent, based on the optimization target of the environment-invariant part parameters, to obtain a first loss value, and updating the environment-invariant part parameters of the click rate prediction model according to the first loss value;

the specific parameter updating module is used for executing the step four, fixing the environmental invariant part parameters of the click rate prediction model, utilizing the updated click rate prediction model to mine the environmental specific characteristics of the training sample data set, obtaining a second prediction result, processing the second prediction result and the label value of the training sample data set by a gradient descent method by utilizing an environmental specific loss function based on the optimization target of the environmental specific part parameters, obtaining a second loss value, and updating the environmental specific part parameters of the click rate prediction model according to the second loss value;

and the iteration module is used for carrying out the second to fourth steps in an iteration mode until the click rate prediction model meets the preset convergence condition, and obtaining the trained click rate prediction model.

According to a fourth aspect of the present invention, there is provided an electronic device comprising:

one or more processors;

storage means for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform a training method of a click rate prediction model based on decoupling invariant learning and a click rate prediction method.

According to a fifth aspect of the present invention, there is provided a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform a training method of a click rate prediction model based on decoupling invariant learning, and a click rate prediction method.

The training method of a click rate prediction model based on decoupling invariant learning provided by the invention yields a click rate prediction model that generalizes well, so that the model can identify feature interactions that remain stable across different historical environments. It thereby addresses the low accuracy of prior-art click rate prediction models caused by data drift between the data processed in the model application stage and the historical training data, and improves the prediction accuracy of the model.

Drawings

FIG. 1 is a flow chart of a training method of a click rate prediction model based on decoupling invariant learning according to an embodiment of the present invention;

FIG. 2 (a) is a schematic diagram of a decoupled invariant learning model according to an embodiment of the present invention;

FIG. 2 (b) is a schematic diagram of a light decoupling invariant learning model according to an embodiment of the present invention;

FIG. 3 is a flowchart of a click rate prediction method according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a training device based on a click rate prediction model of decoupling invariant learning according to an embodiment of the present invention;

FIG. 5 schematically shows a block diagram of an electronic device adapted to implement a training method of a click rate prediction model based on decoupling invariant learning and a click rate prediction method according to an embodiment of the invention.

Detailed Description

The present invention will be further described in detail below with reference to specific embodiments and with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent.

Click rate prediction is a key element of a recommendation system. An early method is the factorization machine model, which models feature interactions by factorization and inner products. In recent years, with the rapid development of machine learning and deep learning, those skilled in the art have proposed more effective and more complex feature interaction modeling based on various neural networks, such as multi-layer perceptrons, inner product or outer product neural networks, attention mechanisms, convolutional neural networks, and graph neural networks. Methods based on neural architecture search have also been proposed: some automatically search for the optimal network architecture for modeling feature interactions, and others automatically select or generate the optimal feature interactions. These works achieve better feature interaction modeling while greatly reducing human effort. However, these feature interaction models suffer from data drift and poor generalization. To address data drift and poor generalization, those skilled in the art have proposed recommendation models based on the invariant learning paradigm. Such models assume that the target can be sufficiently predicted from the environment-invariant correlations. This assumption cannot be satisfied in the actual training and application of the model, so the ability of recommendation models based on the invariant learning paradigm to identify stable feature interactions is difficult to guarantee.

To learn stable feature interactions in click rate prediction for recommendation systems and to improve the generalization of the model to new data, the invention provides a stable feature interaction capturing method based on decoupling invariant learning. The method divides the historical data into different environments in chronological order, decouples the environment-invariant correlations from the environment-specific correlations, and removes the environment-specific correlations so that the assumption of invariant learning holds; invariant learning is then applied to capture stable feature interactions. Because the method captures stable feature interactions from the heterogeneity among the different environments of the historical data, the learned feature interactions generalize well in the serving stage of a real click rate prediction scenario, which improves the prediction accuracy of the recommendation system.

Fig. 1 is a flowchart of a training method of a click rate prediction model based on decoupling invariant learning according to an embodiment of the present invention.

As shown in FIG. 1, the training method of the click rate prediction model based on decoupling invariant learning includes operations S110 to S150.

In operation S110, a click rate prediction model and a model optimization target are constructed based on a decoupling invariant learning method, wherein the parameters of the click rate prediction model include an environment invariant portion parameter and an environment specific portion parameter, and the model optimization target includes an optimization target of the environment invariant portion parameter and an optimization target of the environment specific portion parameter.

In operation S120, the environmental data set is randomly sampled to obtain a training sample data set, wherein the environmental data set represents historical click data of the user in different time periods, and the environmental data set comprises tag values.

In operation S130, the environmental specific part parameter of the click rate prediction model is fixed, the environmental invariant feature of the training sample data set is mined by using the click rate prediction model to obtain a first prediction result, the first prediction result and the tag value of the training sample data set are processed by using the environmental invariant loss function through a gradient descent method based on the optimization target of the environmental invariant part parameter to obtain a first loss value, and the environmental invariant part parameter of the click rate prediction model is updated according to the first loss value.

In operation S140, the environmental invariant part parameters of the click rate prediction model are fixed, the environmental specific features of the training sample data set are mined by using the updated click rate prediction model to obtain a second prediction result, the second prediction result and the tag values of the training sample data set are processed by using the environmental specific loss function through a gradient descent method based on the optimization target of the environmental specific part parameters to obtain a second loss value, and the environmental specific part parameters of the click rate prediction model are updated according to the second loss value.

In operation S150, operations S120 to S140 are iterated until the click rate prediction model meets a preset convergence condition, so as to obtain the trained click rate prediction model.

The training method of the click rate prediction model provided by the invention can fully mine the invariant features and the specific features of the historical data in different environments. The invariant features of the historical data are features that the data share across different time periods; for example, a user has a long-term, relatively fixed preference for certain items or topics across different time periods and clicks on content related to those items or topics. The specific features of the historical data are preferences for certain items or topics that emerge at a certain point in time or during a certain period; for example, a user may pay more attention to a breaking news event or to an item that suddenly goes viral on social media, which raises the click rate of the related content.


For the click rate prediction problem, the invention provides a stable feature interaction capturing framework that mainly comprises three parts: a decoupling invariant learning objective for capturing stable feature interactions; a meta-learning optimization framework for optimizing the decoupling invariant learning objective; and a model architecture for implementing decoupling invariant learning.

For the decoupling invariant learning objective for capturing stable feature interactions, the invention divides the historical data, in chronological order and with equal time length, into a number of different environments that form the training environment set $\mathcal{E}_{tr}$. The invention divides the model parameters for modeling feature interactions into an environment-invariant part $\phi$ and an environment-specific part $\psi_t$, which capture the environment-invariant correlations and the environment-specific correlations respectively. The invention designs $\phi$ and $\psi_t$ at the feature embedding level and at the feature domain weight level respectively. To decouple the environment-invariant correlations from the environment-specific correlations, so that the sufficient-prediction assumption of invariant learning is satisfied and stable feature interactions are captured, the invention designs a decoupling invariant learning objective consisting of two parts: an environment-specific learning objective that satisfies the sufficient-prediction assumption, and an environment-invariant learning objective that removes the influence of the environment-specific correlations.

By dividing the historical data into different environments of equal time length, the invention can better mine the environment-invariant features and the environment-specific features in the historical data. For example, when the historical click data of a user are divided into several segments, the features shared among the segments are the user's invariant click features, which are independent of the environment, while the features that differ among the segments may be the user's specific click features, which depend on the environment.
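The division into environments described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout `(timestamp, feature_ids, label)`, the function name `split_into_environments`, and the parameter `num_envs` are all assumptions introduced here.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (timestamp, feature ids, click label).
Record = Tuple[float, Tuple[int, ...], int]


def split_into_environments(records: List[Record], num_envs: int) -> Dict[int, List[Record]]:
    """Partition click records into num_envs consecutive time windows of equal length."""
    if num_envs < 1:
        raise ValueError("num_envs must be positive")
    envs: Dict[int, List[Record]] = {t: [] for t in range(num_envs)}
    if not records:
        return envs
    t_min = min(r[0] for r in records)
    t_max = max(r[0] for r in records)
    span = (t_max - t_min) or 1.0  # avoid division by zero if all timestamps coincide
    for rec in sorted(records, key=lambda r: r[0]):
        # Map the timestamp to a window index; the latest record goes to the last window.
        idx = min(int((rec[0] - t_min) / span * num_envs), num_envs - 1)
        envs[idx].append(rec)
    return envs


logs = [(0.0, (1, 7), 1), (2.5, (1, 8), 0), (5.0, (2, 7), 1), (9.9, (2, 8), 0), (10.0, (3, 7), 1)]
envs = split_into_environments(logs, num_envs=2)
```

Each returned list plays the role of one environment; windows are equal in duration, not in record count, so some environments may hold more data than others.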

For the environment-specific learning objective that satisfies the sufficient-prediction assumption: to satisfy the sufficient-prediction assumption of invariant learning, in environment $t$ the combination of the environment-invariant part $\phi$ and the environment-specific part $\psi_t$ should sufficiently predict the target, while the environment-specific part focuses on capturing the environment-specific correlations. The following optimization objective is therefore designed, as shown in formula (8):

$$\min_{\psi_t} R^t(\phi, \psi_t) + \lambda C(\psi_t) \qquad (8)$$

wherein $R^t$ represents the prediction loss calculated in the $t$-th environment; $\lambda$ is a hyperparameter controlling the strength of $C(\psi_t)$; and $C(\psi_t)$ is a regularization constraint that prevents $\psi_t$ from capturing environment-invariant correlations, which is achieved by making the environment-specific parameters $\psi_t$ contribute nothing to the prediction in any environment $t'$ other than environment $t$, as shown in formula (9):

（9）。

By optimizing this learning objective, the combination of the environment-invariant part parameters $\phi$ and the environment-specific part parameters $\psi_t$ satisfies the sufficient-prediction condition in environment $t$, while $\psi_t$ focuses on capturing the environment-specific correlations.

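For the environment-invariant learning objective that removes the influence of the environment-specific correlations: once the environment-specific parameters $\psi_t$ have captured the environment-specific correlations, fixing $\psi_t$ is equivalent to removing the influence of $\psi_t$ on the prediction target. At this point, capturing the environment-invariant correlations is enough to sufficiently predict the target (the target with the influence of $\psi_t$ removed). The invention therefore fixes $\psi_t$ and designs the following environment-invariant learning objective to optimize the environment-invariant model parameters $\phi$ so as to capture stable feature interactions, as shown in formula (1):

$$\min_{\phi} \sum_{t \in \mathcal{E}_{tr}} \omega_t R^t(\phi, \psi_t) + \alpha V_R \qquad (1)$$

wherein $V_R$ represents the variance of the empirical risks of the different training environments, $\alpha$ is its coefficient, and $\omega_t$ is the weight of the prediction loss of environment $t$; they are calculated as shown in formulas (3) and (4):

$$V_R = \frac{1}{|\mathcal{E}_{tr}|} \sum_{t \in \mathcal{E}_{tr}} \Big( R^t(\phi, \psi_t) - \frac{1}{|\mathcal{E}_{tr}|} \sum_{t' \in \mathcal{E}_{tr}} R^{t'}(\phi, \psi_{t'}) \Big)^2 \qquad (3)$$

$$\omega_t = \frac{R^t(\phi, \psi_t)}{\sum_{t' \in \mathcal{E}_{tr}} R^{t'}(\phi, \psi_{t'})} \qquad (4)$$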

Minimizing the weighted cross-environment loss improves performance in all environments, while minimizing the variance of the losses between environments limits the performance gap between different environments. Together, the two terms capture the patterns shared by different environments and learn model parameters that are stable across environments. In addition, assigning a larger weight to environments with a high empirical risk makes the model focus more on difficult environments, which further improves the cross-environment generalization of the model parameters.
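The interplay of the weighted loss and the variance penalty can be illustrated numerically. The sketch below assumes that the per-environment risks are already computed and that the weight of each environment is its risk divided by the sum of all risks; that normalisation is one reading of formula (4) and should be treated as an assumption, as should the function names.

```python
from typing import List


def risk_variance(risks: List[float]) -> float:
    """Variance of the per-environment empirical risks (population form)."""
    mean = sum(risks) / len(risks)
    return sum((r - mean) ** 2 for r in risks) / len(risks)


def environment_weights(risks: List[float]) -> List[float]:
    """Assumed weighting: each risk divided by the total, so harder environments weigh more."""
    total = sum(risks)
    return [r / total for r in risks]


def invariant_objective(risks: List[float], alpha: float) -> float:
    """Weighted sum of risks plus alpha times the risk variance."""
    weights = environment_weights(risks)
    return sum(w * r for w, r in zip(weights, risks)) + alpha * risk_variance(risks)


risks = [0.2, 0.4, 0.6]  # made-up risks of three environments
weights = environment_weights(risks)
objective = invariant_objective(risks, alpha=1.0)
```

With these made-up risks the third environment receives half of the total weight, and the variance term grows whenever the risks move further apart, which is the pressure that pushes the shared parameters toward equally good performance in every environment.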

In summary, the overall learning objective of decoupling invariant learning is shown in formulas (1) and (2):

$$\min_{\phi} \sum_{t \in \mathcal{E}_{tr}} \omega_t R^t(\phi, \psi_t) + \alpha V_R \qquad (1)$$

$$\min_{\psi_t} R^t(\phi, \psi_t) + \lambda C(\psi_t) \qquad (2)$$

Optimizing this learning objective decouples the environment-invariant correlations from the environment-specific correlations in the different environments, and captures feature interactions that are stable across heterogeneous environments through the risk variance and the environment weighting, so that the feature interactions generalize well in the model serving stage.

In the meta-learning optimization framework, the two sub-objectives of decoupling invariant learning depend on each other: the environment-invariant objective requires that the environment-specific parameters $\psi_t$ have captured the environment-specific correlations and that $\psi_t$ be fixed to remove their influence. The invention therefore updates the environment-invariant parameters $\phi$ and the environment-specific parameters $\psi_t$ in alternating iterations.

First, update the environment-invariant model parameters $\phi$: fix the environment-specific parameters $\psi_t$ and optimize $\phi$. Because the learning objective of decoupling invariant learning is a complex bi-level optimization problem, the invention optimizes the objective based on meta-learning. In the meta-training stage, an environment is randomly sampled and intermediate model parameters are generated with the environment-specific learning objective, as shown in formula (10):

（10），

then, in the meta-test stage, $\phi$ is optimized with the invariant learning loss calculated using the intermediate model parameters, as shown in formula (11):

（11），

wherein $\omega_t$ represents the weight of the prediction loss of environment $t$.

Second, update the environment-specific model parameters $\psi_t$. After $\phi$ has been updated, $\phi$ is fixed and the environment-specific learning objective is optimized directly to update $\psi_t$, as shown in formula (12):

（12），

wherein $\nabla_{\psi_t}$ denotes taking the gradient with respect to $\psi_t$. The two updates are performed in alternating iterations until the model converges.
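The alternating scheme can be sketched on a toy problem. The sketch below is deliberately simplified and rests on several assumptions that are not in the text: the model is a scalar regressor `(phi + psi_t) * x`, the meta-learning step of formulas (10) and (11) is replaced by a plain gradient step on the invariant objective, the regularizer of formula (9) is replaced by an L2 penalty on `psi_t` as a stand-in, gradients are taken by finite differences, and a fixed number of steps stands in for the convergence check.

```python
import random
from typing import Callable, List, Tuple

Sample = Tuple[float, float]  # (x, y) pairs of a toy scalar problem (illustrative only)


def risk(phi: float, psi_t: float, data: List[Sample]) -> float:
    """Mean squared error of the toy model y_hat = (phi + psi_t) * x."""
    return sum(((phi + psi_t) * x - y) ** 2 for x, y in data) / len(data)


def num_grad(f: Callable[[float], float], v: float, eps: float = 1e-5) -> float:
    """Central finite difference, standing in for automatic differentiation."""
    return (f(v + eps) - f(v - eps)) / (2 * eps)


def invariant_loss(phi: float, psis: List[float], envs: List[List[Sample]], alpha: float) -> float:
    """Risk-weighted sum of per-environment risks plus alpha times their variance."""
    risks = [risk(phi, p, d) for p, d in zip(psis, envs)]
    total = sum(risks) or 1.0  # guard against an all-zero risk vector
    mean = sum(risks) / len(risks)
    variance = sum((r - mean) ** 2 for r in risks) / len(risks)
    return sum((r / total) * r for r in risks) + alpha * variance


def train(envs, steps=300, lr=0.05, alpha=0.1, lam=0.1, batch_size=2, seed=0):
    rng = random.Random(seed)
    phi, psis = 0.0, [0.0] * len(envs)
    for _ in range(steps):
        # Step two: randomly sample a training batch from every environment.
        batch = [rng.sample(d, k=min(batch_size, len(d))) for d in envs]
        # Step three: psi fixed, one gradient step on the invariant objective.
        phi -= lr * num_grad(lambda p: invariant_loss(p, psis, batch, alpha), phi)
        # Step four: phi fixed, one gradient step per environment on risk + lam * penalty.
        for t, d in enumerate(batch):
            psis[t] -= lr * num_grad(lambda q: risk(phi, q, d) + lam * q * q, psis[t])
    return phi, psis


xs = [0.5, 1.0, 1.5]
toy_envs = [[(x, 2.5 * x) for x in xs], [(x, 1.5 * x) for x in xs]]  # slopes differ by environment
before = sum(risk(0.0, 0.0, d) for d in toy_envs)
phi, psis = train(toy_envs)
after = sum(risk(phi, p, d) for p, d in zip(psis, toy_envs))
```

In this toy setting the shared slope is expected to end up in `phi` while each `psis[t]` absorbs the offset of its own environment; the point of the sketch is only the order of the two updates, not the particular losses.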

FIG. 2(a) and FIG. 2(b) show schematic diagrams of two decoupling invariant learning models according to an embodiment of the present invention: FIG. 2(a) shows the decoupling invariant learning (DIL) model and FIG. 2(b) shows the light decoupling invariant learning (LightDIL) model.

For the model architecture, the invention designs the environment-invariant model parameters and the environment-specific model parameters at the feature embedding level and at the feature domain weight level, respectively, in order to decouple the two types of correlations. Taking the factorization machine (FM) model as an example (the design can also be based on other models), the invention provides the following two model architectures.

The first model architecture, shown in Fig. 2(a), decouples at the feature embedding level. Because feature embeddings are the core of a feature interaction model, the invention performs the decoupling at this level: each feature is given a corresponding environment-invariant embedding vector and a set of environment-specific embedding vectors. For the factorization machine model, the model prediction is then given by formula (5):

(5)

This is the default Decoupled Invariant Learning (DIL) factorization machine form.
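
To make the embedding-level decoupling concrete, the sketch below scores a sample with a second-order factorization machine in which every feature has an environment-invariant embedding and an environment-specific embedding. Summing the two embeddings before taking inner products is an assumption made for illustration, as are all names and values; formula (5) itself is not reproduced here.

```python
import math
from itertools import combinations

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def fm_decoupled_logit(x, inv_emb, spec_emb_t):
    """Second-order FM logit using (invariant + environment-specific) embeddings.

    x          : feature values, one per feature
    inv_emb    : environment-invariant embedding per feature
    spec_emb_t : embedding per feature specific to environment t
    """
    emb = [[a + b for a, b in zip(v, u)] for v, u in zip(inv_emb, spec_emb_t)]
    return sum(dot(emb[i], emb[j]) * x[i] * x[j]
               for i, j in combinations(range(len(x)), 2))

def predict_ctr(x, inv_emb, spec_emb_t):
    """Click probability as the sigmoid of the FM logit."""
    return 1.0 / (1.0 + math.exp(-fm_decoupled_logit(x, inv_emb, spec_emb_t)))

# Hypothetical toy parameters: 3 features, embedding size 2
inv_emb = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
spec_emb = {0: [[0.1, 0.0], [0.0, 0.0], [0.0, 0.1]]}  # one entry per environment
x = [1, 1, 0]
logit_env0 = fm_decoupled_logit(x, inv_emb, spec_emb[0])
prob_env0 = predict_ctr(x, inv_emb, spec_emb[0])
```

Because every feature carries one extra embedding per environment, the parameter count grows with the number of environments.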

The second model architecture, shown in Fig. 2(b), decouples at the feature domain weight level. Decoupling at the feature embedding level greatly increases the number of model parameters, which makes the model harder to learn and raises its storage burden and training cost. To improve model efficiency, the invention instead decouples at the feature domain level. Specifically, environment-invariant weights and environment-specific weights are assigned to the feature interactions at the feature domain level to capture the environment-invariant and environment-specific correlations, respectively. Taking the factorization machine as an example, the model prediction is given by formula (6):

(6)

wherein the representation of each domain is computed from the embeddings of the features in that domain, and each pair of domains has an environment-invariant weight and, for each environment, an environment-specific weight. This model architecture is named light decoupling invariant learning (LightDIL).
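
A minimal sketch of domain-level decoupling follows. It assumes that a domain representation is the sum of the value-weighted embeddings of the features in that domain and that the invariant and environment-specific weights of a domain pair are simply added; both choices, and every name and number, are illustrative assumptions rather than the exact form of formula (6).

```python
def domain_repr(x, emb, members):
    """Representation of one domain: sum of x_k * v_k over its features k."""
    dim = len(emb[0])
    rep = [0.0] * dim
    for k in members:
        for d in range(dim):
            rep[d] += x[k] * emb[k][d]
    return rep

def lightdil_logit(x, emb, domains, w_inv, w_spec_t):
    """Sum over domain pairs of (invariant + specific weight) * <e_i, e_j>."""
    reps = [domain_repr(x, emb, members) for members in domains]
    total = 0.0
    for i in range(len(domains)):
        for j in range(i + 1, len(domains)):
            w = w_inv[(i, j)] + w_spec_t.get((i, j), 0.0)
            total += w * sum(a * b for a, b in zip(reps[i], reps[j]))
    return total

# Hypothetical toy setup: 4 one-hot features grouped into 2 domains
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
domains = [[0, 1], [2, 3]]
x = [1, 0, 0, 1]
w_inv = {(0, 1): 0.5}             # shared by all environments
w_spec = {0: {(0, 1): 0.25}}      # one small weight table per environment
logit_env0 = lightdil_logit(x, emb, domains, w_inv, w_spec[0])
```

Only one scalar per domain pair is added for each environment, while the feature embeddings are shared, which is what keeps this variant light.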

In summary, the invention designs decoupled model architectures at the feature embedding level and at the feature domain weight level, as shown in Fig. 2(a) and Fig. 2(b), respectively. In the serving stage, only the environment-invariant model parameters, that is, the stable feature interactions, are used for prediction, which ensures good generalization. Taking light decoupling invariant learning as an example, the prediction is given by formula (13):

(13)

wherein ∅ denotes the empty set, indicating that the set of environment-specific model parameters is replaced with the empty set in the prediction stage of light decoupling invariant learning.
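
The serving-stage rule can be illustrated with a short sketch in which the environment-specific weight table is replaced by an empty one. The additive combination of weights, the precomputed domain representations, and all names are assumptions for illustration only.

```python
def pair_score(reps, w_inv, w_spec_t):
    """Weighted sum of domain-pair inner products; w_spec_t may be empty."""
    total = 0.0
    for (i, j), w in w_inv.items():
        w_total = w + w_spec_t.get((i, j), 0.0)
        total += w_total * sum(a * b for a, b in zip(reps[i], reps[j]))
    return total

def serve_score(reps, w_inv):
    # Serving: the environment-specific parameter set is the empty set,
    # so only the environment-invariant weights contribute
    return pair_score(reps, w_inv, {})

# Hypothetical domain representations and weights
reps = [[1.0, 0.0], [2.0, 0.0]]
w_inv = {(0, 1): 0.5}
train_time = pair_score(reps, w_inv, {(0, 1): 0.25})  # during training
serve_time = serve_score(reps, w_inv)                 # at serving
```

Dropping the environment-specific table means that a sample from a future period, for which no such table was ever learned, can still be scored.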

Fig. 3 is a flowchart of a click rate prediction method according to an embodiment of the present invention.

As shown in FIG. 3, the click rate prediction method includes operations S310 to S320.

In operation S310, a history data set of a user to be predicted is acquired, wherein the history data set of the user to be predicted includes user characteristic data and user click data.

In operation S320, a click rate prediction model is used to mine the environment-invariant feature interactions of the historical data set of the user to be predicted and to obtain a prediction result, wherein the click rate prediction model is trained by the above training method of the click rate prediction model based on decoupling invariant learning.

Fig. 4 is a schematic structural diagram of a training device based on a click rate prediction model of decoupling invariant learning according to an embodiment of the present invention.

As shown in fig. 4, the training apparatus 400 based on the click rate prediction model of decoupling invariant learning includes a model construction module 410, a data sampling module 420, an invariant parameter updating module 430, a specific parameter updating module 440, and an iteration module 450.

The model building module 410 is configured to perform operation S110, and build a click rate prediction model and a model optimization target based on a decoupling invariant learning method, where parameters of the click rate prediction model include an environment invariant portion parameter and an environment specific portion parameter, and the model optimization target includes an optimization target of the environment invariant portion parameter and an optimization target of the environment specific portion parameter.

The data sampling module 420 is configured to perform operation S120, that is, to randomly sample an environment data set to obtain a training sample data set, where the environment data set represents historical click data of users in different time periods and includes label values.

The invariant parameter updating module 430 is configured to perform operation S130, that is, to fix the environment-specific part parameters of the click rate prediction model, mine the environment-invariant features of the training sample data set by using the click rate prediction model to obtain a first prediction result, process the first prediction result and the label values of the training sample data set with the environment-invariant loss function through a gradient descent method based on the optimization target of the environment-invariant part parameters to obtain a first loss value, and update the environment-invariant part parameters of the click rate prediction model according to the first loss value.

The specific parameter updating module 440 is configured to perform operation S140, that is, to fix the environment-invariant part parameters of the click rate prediction model, mine the environment-specific features of the training sample data set by using the updated click rate prediction model to obtain a second prediction result, process the second prediction result and the label values of the training sample data set with the environment-specific loss function through a gradient descent method based on the optimization target of the environment-specific part parameters to obtain a second loss value, and update the environment-specific part parameters of the click rate prediction model according to the second loss value.

The iteration module 450 is configured to iterate operations S120 to S140 until the click rate prediction model meets a preset convergence condition, thereby obtaining a trained click rate prediction model.

To better illustrate the advantages of the click rate prediction model obtained by the training method of the invention, the model is verified below through specific experiments.

The invention takes the classical click rate prediction model FM as the base recommendation model and selects two datasets of different types, Douban and MovieLens10M (ML-10M), for the experiments. FwFMs, AutoFIS, PROFIT, Group-DRO and V-REx are used as comparison models. Taking 6 months as one time period, Douban and ML-10M are divided into 15 and 13 periods, respectively. For Douban, the first five periods are used as the training set, the middle five as the validation set, and the last five as the test set. For ML-10M, the first five periods are used as the training set, the middle four as the validation set, and the last four as the test set. All methods train the model on the training set, select the optimal parameters on the validation set, and are tested on the test set. The average performance over the test periods of Douban and ML-10M is reported, with AUC and logloss as metrics.
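
The period-based split described above can be sketched as follows. The helper names, the calendar-month granularity, and the example start date are assumptions; only the 6-month period length and the 5+5+5 and 5+4+4 train/validation/test layouts come from the text.

```python
from datetime import datetime

def period_index(ts, start, months_per_period=6):
    """Index of the fixed-length period (counted in calendar months) holding ts."""
    months = (ts.year - start.year) * 12 + (ts.month - start.month)
    return months // months_per_period

def split_periods(n_periods, n_train, n_valid):
    """Chronological split: earliest periods train, next validate, rest test."""
    periods = list(range(n_periods))
    return (periods[:n_train],
            periods[n_train:n_train + n_valid],
            periods[n_train + n_valid:])

# Layouts from the experiment description
douban_train, douban_valid, douban_test = split_periods(15, 5, 5)
ml10m_train, ml10m_valid, ml10m_test = split_periods(13, 5, 4)

# Hypothetical timestamps: a click in July falls into the second period
example_period = period_index(datetime(2006, 7, 15), datetime(2006, 1, 1))
```

Each period then serves as one environment, and the test periods always lie after the training periods, so the evaluation measures generalization to future data.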

The experimental results are shown in Table 1:

Table 1. Comparison of the performance of different methods on the two datasets

Table 1 shows that, on the two datasets of different types, the method of the invention outperforms the general invariant learning methods V-REx and Group-DRO on all metrics, which indicates that decoupling allows the method to satisfy the sufficient-prediction assumption of invariant learning, so that invariant learning can be applied to capturing stable feature interactions in click rate prediction. Compared with the recommendation models FwFMs, AutoFIS and PROFIT, the method also obtains excellent results, which indicates that it captures stable feature interactions in different recommendation scenarios, generalizes better in serving-stage prediction, and improves prediction accuracy.

As shown in fig. 5, an electronic device 500 according to an embodiment of the present invention includes a processor 501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. The processor 501 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 501 may also include on-board memory for caching purposes. The processor 501 may comprise a single processing unit or a plurality of processing units for performing different actions of the method flow according to an embodiment of the invention.

In the RAM 503, various programs and data required for the operation of the electronic apparatus 500 are stored. The processor 501, ROM 502, and RAM 503 are connected to each other by a bus 504. The processor 501 performs various operations of the method flow according to an embodiment of the present invention by executing programs in the ROM 502 and/or the RAM 503. Note that the program may be stored in one or more memories other than the ROM 502 and the RAM 503. The processor 501 may also perform various operations of the method flow according to embodiments of the present invention by executing programs stored in the one or more memories.

According to an embodiment of the invention, the electronic device 500 may further include an input/output (I/O) interface 505, which is also connected to the bus 504. The electronic device 500 may also include one or more of the following components connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read therefrom is installed into the storage section 508 as needed.

The present invention also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present invention.

According to embodiments of the present invention, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the invention, the computer-readable storage medium may include ROM 502 and/or RAM 503 and/or one or more memories other than ROM 502 and RAM 503 described above.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The foregoing description of the embodiments is provided to illustrate the general principles of the invention and is not intended to limit the invention; any modifications, equivalents and improvements may be made without departing from the spirit and principles of the invention.

Claims

1. The training method of the click rate prediction model based on decoupling invariant learning is characterized by comprising the following steps of:

step one, constructing a click rate prediction model and a model optimization target based on a decoupling invariant learning method, wherein the parameters of the click rate prediction model comprise environment-invariant part parameters and environment-specific part parameters, and the model optimization target comprises an optimization target of the environment-invariant part parameters and an optimization target of the environment-specific part parameters;

step two, randomly sampling an environment data set to obtain a training sample data set, wherein the environment data set represents historical click data of a user in different time periods and includes label values;

step three, fixing the environment-specific part parameters of the click rate prediction model, mining the environment-invariant features of the training sample data set by using the click rate prediction model to obtain a first prediction result, processing the first prediction result and the label values of the training sample data set by using an environment-invariant loss function through a gradient descent method based on the optimization target of the environment-invariant part parameters to obtain a first loss value, and updating the environment-invariant part parameters of the click rate prediction model according to the first loss value, wherein the environment-invariant features represent features shared by data in different time periods;

step four, fixing the environment-invariant part parameters of the click rate prediction model, mining the environment-specific features of the training sample data set by using the updated click rate prediction model to obtain a second prediction result, processing the second prediction result and the label values of the training sample data set by using an environment-specific loss function through a gradient descent method based on the optimization target of the environment-specific part parameters to obtain a second loss value, and updating the environment-specific part parameters of the click rate prediction model according to the second loss value, wherein the environment-specific features represent a preference of a user for certain articles or topics that appears suddenly at a certain time point or within a certain time period;

iterating step two to step four until the click rate prediction model meets a preset convergence condition, and obtaining a trained click rate prediction model;

wherein the optimization target of the model is represented by formula (1) and formula (2):

(1)

(2)

wherein formula (1) represents the optimization target of the environment-invariant part parameters and formula (2) represents the optimization target of the environment-specific part parameters; the quantities appearing in the formulas are: the environment-invariant part parameters; the environment-specific part parameters in a given environment; the prediction loss computed in that environment; a hyperparameter that controls the strength of one term of the objective; a regularization constraint that prevents the environment-specific part parameters from capturing the environment-invariant correlations; the variance of the empirical risks of the different training environments; the weight of the prediction loss of an environment; and the coefficient of one term of the objective.

2. The method of claim 1, wherein the variance of the empirical risks of the different training environments is represented by formula (3):

(3)

wherein the quantities appearing in formula (3) are: the number of elements of the training environment set; indices denoting different environments; the environment-specific part parameters in a given environment; and the prediction loss computed in that environment; the variance of the empirical risks of the different training environments is used to capture the patterns shared by different environments;

wherein the weight of the prediction loss of an environment is represented by formula (4):

(4)

wherein the index in formula (4) traverses all environments.

3. The method of claim 1, wherein the click-through rate prediction model comprises a click-through rate prediction model based on decoupling invariant learning of a click-through data feature embedding level and/or a click-through rate prediction model based on decoupling invariant learning of a click-through data feature domain weighting level.

4. The method of claim 3, wherein the click rate prediction model based on decoupling invariant learning at the click data feature embedding level is determined by formula (5):

(5)

wherein the quantities appearing in formula (5) are: the environment-invariant part parameters; the environment-specific part parameters in a given environment; the click data features, including the individual click data features selected by two feature indices; the number of click data features; the environment-invariant feature embedding corresponding to each indexed feature; the environment-specific feature embeddings corresponding to each indexed feature; and the click rate prediction model;

wherein the click rate prediction model based on decoupling invariant learning at the click data feature domain weight level is determined by formula (6):

(6)

wherein the quantities appearing in formula (6) are: the representations of two domains; the environment-invariant weight between the two domains; the environment-specific weight between the two domains for a given environment; and the number of feature domains.

5. The method of claim 4, wherein the representation of a domain is computed from the embeddings of the features in that domain and is determined by formula (7):

(7)

wherein the quantities appearing in formula (7) are: the individual data features, selected by a feature index; the set of indices of all data features belonging to the domain; and the feature embedding corresponding to each indexed data feature.

6. A click rate prediction method, comprising:

acquiring a historical data set of a user to be predicted, wherein the historical data set of the user to be predicted comprises user feature data and user click data;

and mining prediction results of the environment-invariant feature interactions of the historical dataset of the user to be predicted by using a click rate prediction model, wherein the click rate prediction model is trained by the method of any one of claims 1-5.

7. A training device based on a click rate prediction model of decoupling invariant learning, comprising:

the model construction module is used for executing the first step, and constructing a click rate prediction model and a model optimization target based on a decoupling invariant learning method, wherein the parameters of the click rate prediction model comprise environment invariant part parameters and environment specific part parameters, and the model optimization target comprises an optimization target of the environment invariant part parameters and an optimization target of the environment specific part parameters;

the data sampling module is used for executing the second step, and randomly sampling an environment data set to obtain a training sample data set, wherein the environment data set represents historical click data of a user in different time periods and includes label values;

the invariant parameter updating module is used for executing step three, fixing the environment-specific part parameters of the click rate prediction model, mining the environment-invariant features of the training sample data set by using the click rate prediction model to obtain a first prediction result, processing the first prediction result and the label values of the training sample data set by using an environment-invariant loss function through a gradient descent method based on the optimization target of the environment-invariant part parameters to obtain a first loss value, and updating the environment-invariant part parameters of the click rate prediction model according to the first loss value, wherein the environment-invariant features represent features shared by data in different time periods;

the specific parameter updating module is used for executing step four, fixing the environment-invariant part parameters of the click rate prediction model, mining the environment-specific features of the training sample data set by using the updated click rate prediction model to obtain a second prediction result, processing the second prediction result and the label values of the training sample data set by using an environment-specific loss function through a gradient descent method based on the optimization target of the environment-specific part parameters to obtain a second loss value, and updating the environment-specific part parameters of the click rate prediction model according to the second loss value, wherein the environment-specific features represent a preference of a user for certain articles or topics that appears suddenly at a certain time point or within a certain time period;

the iteration module is used for iterating step two to step four until the click rate prediction model meets a preset convergence condition, and obtaining a trained click rate prediction model;

wherein the optimization target of the model is represented by formula (1) and formula (2):

(1)

(2)

wherein the quantities appearing in the formulas are: the environment-invariant part parameters; the environment-specific part parameters in a given environment; the prediction loss computed in that environment; a hyperparameter that controls the strength of one term of the objective; a regularization constraint that prevents the environment-specific part parameters from capturing the environment-invariant correlations; the weight of the prediction loss of an environment; and the coefficient of one term of the objective.

8. An electronic device, comprising:

one or more processors;

storage means for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-6.

9. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1-6.