CN115809372A

CN115809372A - Click rate prediction model training method and device based on decoupling invariant learning

Info

Publication number: CN115809372A
Application number: CN202310053850.7A
Authority: CN
Inventors: 何向南; 张洋; 史天昊; 冯福利
Original assignee: University of Science and Technology of China USTC
Current assignee: University of Science and Technology of China USTC
Priority date: 2023-02-03
Filing date: 2023-02-03
Publication date: 2023-03-17
Anticipated expiration: 2043-02-03
Also published as: CN115809372B

Abstract

The invention discloses a method and a device for training a click-through rate prediction model based on decoupled invariant learning. The method comprises: step one, constructing a click-through rate prediction model and a model optimization objective based on a decoupled invariant learning method; step two, randomly sampling an environment data set to obtain a training sample data set; step three, fixing the environment-specific parameters of the model, mining the environment-invariant features of the training sample data set with the model, and updating the environment-invariant parameters of the model; step four, fixing the environment-invariant parameters of the model, mining the environment-specific features of the training sample data set with the updated model, and updating the environment-specific parameters of the model; and step five, iterating steps two to four until the model meets a preset convergence condition, thereby obtaining a trained click-through rate prediction model.

Description

Training method and device of click rate prediction model based on decoupling invariant learning

Technical Field

The invention relates to the field of recommendation systems, data mining and machine learning, in particular to a click rate prediction model training method and device based on decoupling invariant learning, a click rate prediction method, electronic equipment and a storage medium.

Background

Click-through rate prediction is a crucial component of a recommendation system. In recent years, feature interaction modeling has been recognized as the core of the click-through rate prediction problem, and most research focuses on modeling feature interactions efficiently. However, the feature interaction models in the prior art all learn feature interactions by fitting historical data under empirical risk minimization, that is, they learn the interactions that explain the historical data. A real recommendation system must serve future scenarios, and because user interests change continuously, new data drifts away from the historical data. Feature interactions obtained by fitting historical data therefore generalize poorly to new data, which degrades the performance of the recommendation system.

To address the poor generalization of models learned by empirical risk minimization under distribution drift, those skilled in the art have proposed the paradigm of invariant learning. Invariant learning assumes that the training data are collected from heterogeneous environments and identifies invariant correlations from the distribution shifts between those environments. While this approach makes it possible to learn stable feature interactions, it assumes that the target can be sufficiently predicted from the environment-invariant correlations alone. In a recommendation system, the training data are affected by environment-invariant and environment-specific correlations that are coupled together, so this assumption is not satisfied, and the ability to identify stable feature interactions is difficult to guarantee.

Disclosure of Invention

In view of the foregoing problems, the present invention provides a method and an apparatus for training a click rate prediction model based on decoupled invariant learning, a click rate prediction method, an electronic device, and a storage medium, so as to solve at least one of the above problems.

According to a first aspect of the present invention, there is provided a training method for a click rate prediction model based on decoupling invariant learning, comprising:

step one: constructing a click-through rate prediction model and a model optimization objective based on a decoupled invariant learning method, wherein the parameters of the click-through rate prediction model comprise environment-invariant parameters and environment-specific parameters, and the model optimization objective comprises an optimization objective of the environment-invariant parameters and an optimization objective of the environment-specific parameters;

step two: randomly sampling an environment data set to obtain a training sample data set, wherein the environment data set represents historical click data of users in different time periods and comprises label values;

step three: fixing the environment-specific parameters of the click-through rate prediction model, mining the environment-invariant features of the training sample data set with the model to obtain a first prediction result, processing the first prediction result and the label values of the training sample data set with an environment-invariant loss function by gradient descent, based on the optimization objective of the environment-invariant parameters, to obtain a first loss value, and updating the environment-invariant parameters of the model according to the first loss value;

step four: fixing the environment-invariant parameters of the click-through rate prediction model, mining the environment-specific features of the training sample data set with the updated model to obtain a second prediction result, processing the second prediction result and the label values of the training sample data set with an environment-specific loss function by gradient descent, based on the optimization objective of the environment-specific parameters, to obtain a second loss value, and updating the environment-specific parameters of the model according to the second loss value;

step five: iterating steps two to four until the click-through rate prediction model meets a preset convergence condition, thereby obtaining a trained click-through rate prediction model.

According to an embodiment of the present invention, the model optimization objective is represented by formula (1) and formula (2):

[Formula (1): not reproduced in this text]

[Formula (2): not reproduced in this text]

wherein formula (1) is the optimization objective of the environment-invariant parameters and formula (2) is the optimization objective of the environment-specific parameters. The symbols in these formulas denote, in the order listed in the original: the environment-invariant parameters; the environment-specific parameters in a given environment; the prediction loss computed in that environment; a hyperparameter that controls the strength of the regularization constraint; the regularization constraint that prevents the environment-specific parameters from capturing environment-invariant correlations; the variance of the empirical risk across the different training environments; the weight of the prediction loss of each environment; and the coefficient of the variance term.
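Because the formulas themselves are not legible in this text, the following LaTeX is only one plausible reading that is consistent with the symbol descriptions for formulas (1) and (2). Every symbol name here ($\phi_s$, $\phi_e$, $L^{e}$, $w_e$, $\alpha$, $V$, $\lambda$, $\Omega$, $\mathcal{E}$) is an assumption chosen for illustration and is not taken from the patent, and the exact arrangement of the terms may differ in the original.

```latex
% Hypothetical notation, for illustration only.
% Environment-invariant objective (cf. formula (1)): weighted risks plus a variance penalty.
\min_{\phi_s} \; \sum_{e \in \mathcal{E}} w_e \, L^{e}(\phi_s, \phi_e) \;+\; \alpha \, V

% Environment-specific objective (cf. formula (2)): per-environment risk plus a regularizer.
\min_{\phi_e} \; L^{e}(\phi_s, \phi_e) \;+\; \lambda \, \Omega(\phi_e)
```

Under this reading, the first objective is minimized over the shared parameters with the per-environment parameters held fixed, and the second over the per-environment parameters with the shared parameters held fixed, which matches the alternating updates of steps three and four.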

According to an embodiment of the present invention, the variance of the empirical risk across the different training environments is expressed by formula (3):

[Formula (3): not reproduced in this text]

wherein the symbols denote: the number of elements in the set of training environments; two indices that range over the different environments; the environment-specific parameters in a given environment; and the prediction loss computed in that environment. The variance of the empirical risk across the training environments is used to capture the patterns shared by the different environments;

wherein the weight of the prediction loss of each environment is expressed by formula (4):

[Formula (4): not reproduced in this text]

wherein the remaining symbol denotes traversal of all environments.
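The two quantities described for formulas (3) and (4) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's formulas: the variance is taken to be the ordinary population variance of the per-environment risks, and the weight of an environment is assumed to be its risk divided by the sum of the risks over all environments (the text states only that all environments are traversed and, later, that high-risk environments receive larger weights). The function names are hypothetical.

```python
import numpy as np

def risk_variance(env_risks):
    """Population variance of the per-environment empirical risks."""
    risks = np.asarray(env_risks, dtype=float)
    return float(np.mean((risks - risks.mean()) ** 2))

def environment_weights(env_risks):
    """ASSUMPTION: weight = risk normalized by the sum of risks over all environments."""
    risks = np.asarray(env_risks, dtype=float)
    return risks / risks.sum()

# Three environments with increasing empirical risk.
env_risks = [0.2, 0.4, 0.6]
variance = risk_variance(env_risks)
weights = environment_weights(env_risks)
```

With this normalization the weights sum to one and the environment with the highest risk receives the largest weight; a softmax over the risks would be another choice with the same qualitative behavior.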

According to an embodiment of the present invention, the click-through rate prediction model comprises a decoupled invariant learning model at the feature embedding level of the click data and/or a decoupled invariant learning model at the feature field weight level of the click data.

According to an embodiment of the present invention, the decoupled invariant learning model at the feature embedding level of the click data is determined by formula (5):

[Formula (5): not reproduced in this text]

wherein the symbols denote: the environment-invariant parameters; the environment-specific parameters in a given environment; the features of the click data, each indexed entry being an individual click data feature; the number of click data features; the environment-invariant feature embedding corresponding to each feature; the environment-specific feature embedding of each feature in a given environment; and the click-through rate prediction model;

the decoupled invariant learning model at the feature field weight level of the click data is determined by formula (6):

[Formula (6): not reproduced in this text]

wherein the symbols denote: the representations of two feature fields; the environment-invariant weight between the two fields; the environment-specific weight between the two fields in a given environment; and the number of feature fields.
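To make the two decoupled architectures concrete, the sketch below builds both on a plain factorization-machine interaction term. It is an illustration under assumptions rather than the patent's formulas (5) and (6): the effective embedding is assumed to be the sum of the invariant and the environment-specific embedding, the effective field-pair weight is assumed to be the sum of the invariant and the environment-specific weight, and first-order and bias terms are omitted. All function and variable names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fm_pairwise(vectors, pair_weights=None):
    """Sum of (optionally weighted) inner products over all pairs i < j."""
    n = vectors.shape[0]
    gram = vectors @ vectors.T
    if pair_weights is None:
        pair_weights = np.ones((n, n))
    upper = np.triu_indices(n, k=1)  # index pairs with i < j
    return float(np.sum(pair_weights[upper] * gram[upper]))

def predict_embedding_level(x, emb_inv, emb_spec_e):
    """ASSUMPTION: embedding used in environment e = invariant + environment-specific."""
    vectors = x[:, None] * (emb_inv + emb_spec_e)
    return sigmoid(fm_pairwise(vectors))

def predict_field_weight_level(field_repr, w_inv, w_spec_e):
    """ASSUMPTION: field-pair weight in environment e = invariant + environment-specific."""
    return sigmoid(fm_pairwise(field_repr, w_inv + w_spec_e))

# Toy example: three features / fields with 2-dimensional embeddings.
emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
x = np.ones(3)
p_emb = predict_embedding_level(x, emb, np.zeros_like(emb))
p_field = predict_field_weight_level(emb, np.ones((3, 3)), 0.5 * np.ones((3, 3)))
```

Passing zeros for the environment-specific part yields a prediction that relies on the invariant parameters only, which is one natural way to serve data from a future environment that has no specific parameters of its own.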

According to an embodiment of the present invention, the representation of a feature field is calculated from the embeddings of the features in that field, as determined by formula (7):

[Formula (7): not reproduced in this text]

wherein the symbols denote: an individual data feature; the set of indices of all data features that belong to the field; and the feature embedding corresponding to each data feature.
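A field representation of the kind described for formula (7) can be sketched as follows. The aggregation is assumed here to be a sum of each feature's embedding scaled by its feature value, taken over the features that belong to the field; the original formula may use a different aggregation. The names `field_representations` and `field_of_feature` are hypothetical.

```python
import numpy as np

def field_representations(x, embeddings, field_of_feature, num_fields):
    """Aggregate feature embeddings into one vector per field.

    x                : (num_features,) feature values (for example multi-hot 0/1)
    embeddings       : (num_features, dim) one embedding per feature
    field_of_feature : (num_features,) index of the field each feature belongs to
    """
    reps = np.zeros((num_fields, embeddings.shape[1]))
    # Unbuffered accumulation: adds x_j * e_j to the row of feature j's field.
    np.add.at(reps, field_of_feature, x[:, None] * embeddings)
    return reps

x = np.array([1.0, 0.0, 1.0, 1.0])
embeddings = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
field_of_feature = np.array([0, 0, 1, 1])
reps = field_representations(x, embeddings, field_of_feature, num_fields=2)
```

Only active features contribute: field 0 receives the first embedding alone, while field 1 receives the sum of the last two.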

According to a second aspect of the present invention, there is provided a click rate prediction method, including:

acquiring a historical data set of a user to be predicted, wherein the historical data set of the user to be predicted comprises user characteristic data and user click data;

and using a click-through rate prediction model to mine the environment-invariant feature interactions in the historical data set of the user to be predicted and obtain a prediction result, wherein the click-through rate prediction model is trained by the above training method for a click-through rate prediction model based on decoupled invariant learning.

According to a third aspect of the present invention, there is provided a training apparatus for a click rate prediction model based on decoupling invariant learning, comprising:

the model building module is used for executing the first step and building a click rate prediction model and a model optimization target based on a decoupling invariant learning method, wherein parameters of the click rate prediction model comprise parameters of an environment invariant part and parameters of an environment specific part, and the model optimization target comprises an optimization target of the parameters of the environment invariant part and an optimization target of the parameters of the environment specific part;

the data sampling module is used for executing the second step, randomly sampling an environment data set to obtain a training sample data set, wherein the environment data set represents historical click data of a user in different time periods, and comprises a label value;

the invariant parameter updating module is used for executing the third step, fixing the environment specific part parameters of the click rate prediction model, excavating the environment invariant characteristics of the training sample data set by using the click rate prediction model to obtain a first prediction result, processing the first prediction result and the label value of the training sample data set by using an environment invariant loss function through a gradient descent method based on the optimization target of the environment invariant part parameters to obtain a first loss value, and updating the environment invariant part parameters of the click rate prediction model according to the first loss value;

the specific parameter updating module is used for executing the fourth step, fixing the environment invariant part parameters of the click rate prediction model, mining the environment specific characteristics of the training sample data set by using the updated click rate prediction model to obtain a second prediction result, processing the second prediction result and the label value of the training sample data set by using an environment specific loss function through a gradient descent method based on the optimization target of the environment specific part parameters to obtain a second loss value, and updating the environment specific part parameters of the click rate prediction model according to the second loss value;

and the iteration module is used for iterating the second step to the fourth step until the click rate prediction model meets a preset convergence condition, so as to obtain a trained click rate prediction model.

According to a fourth aspect of the present invention, there is provided an electronic apparatus comprising:

one or more processors;

a storage device to store one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform a training method based on a click-through rate prediction model of decoupled invariant learning and a click-through rate prediction method.

According to a fifth aspect of the present invention, there is provided a computer readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform a method of training a click-through rate prediction model based on decoupled invariant learning and a method of click-through rate prediction.

The training method for a click-through rate prediction model based on decoupled invariant learning provided by the present invention yields a model that generalizes well: the model can identify feature interactions that remain stable across different historical environments. This addresses the low prediction accuracy of prior-art click-through rate prediction models, which is caused by data drift between the data processed at the model application stage and the historical training data, and thereby improves the prediction accuracy of the model.

Drawings

FIG. 1 is a flow chart of a method of training a click-through rate prediction model based on decoupled invariant learning, according to an embodiment of the present invention;

FIG. 2 (a) is a schematic diagram of a decoupled invariant learning model according to an embodiment of the present invention;

FIG. 2 (b) is a schematic diagram of a light decoupling invariant learning model according to an embodiment of the present invention;

FIG. 3 is a flow chart of a click-through rate prediction method according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a training apparatus for a click rate prediction model based on decoupling invariant learning according to an embodiment of the present invention;

FIG. 5 schematically shows a block diagram of an electronic device suitable for implementing a click-through rate prediction model training method and a click-through rate prediction method based on decoupled invariant learning, according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the accompanying drawings in combination with the embodiments.

Click-through rate prediction is a key component of recommendation systems. Early methods were factorization models, which model feature interactions through factorization and inner products. In recent years, with the rapid development of machine learning and deep learning, those skilled in the art have proposed more efficient and more complex feature interaction modeling based on various neural networks, such as multi-layer perceptrons, inner-product or outer-product neural networks, attention-based neural networks, convolutional neural networks, and graph neural networks. Methods based on neural architecture search have also been proposed: some automatically search for the network architecture that best models feature interactions, and others automatically select or generate the best feature interactions. These efforts improve feature interaction modeling while greatly reducing manual effort. However, these feature interaction models suffer from data drift and poor generalization. To address these problems, those skilled in the art have proposed recommendation models based on the invariant learning paradigm. Such models assume that the target can be sufficiently predicted from the environment-invariant correlations; because this assumption is not satisfied when the model is actually trained and applied, their ability to identify stable feature interactions is difficult to guarantee.

To learn stable feature interactions for click-through rate prediction in recommendation systems and to improve the model's generalization to new data, the invention provides a method for capturing stable feature interactions based on decoupled invariant learning. The method divides the historical data into different environments in chronological order, and establishes the sufficient-prediction assumption of invariant learning by decoupling the environment-invariant correlations from the environment-specific correlations and removing the influence of the environment-specific correlations, so that invariant learning can be applied to capture stable feature interactions. Because the method captures stable feature interactions from the heterogeneity of the historical data across environments, the learned feature interactions generalize well at the serving stage of a real click-through rate prediction scenario, which improves the prediction accuracy of the recommendation system.

FIG. 1 is a flowchart of a training method of a click-through rate prediction model based on decoupled invariant learning according to an embodiment of the present invention.

As shown in FIG. 1, the training method of the click-through rate prediction model based on decoupled invariant learning includes operations S110 to S150.

In operation S110, a click-through rate prediction model and a model optimization objective are constructed based on a decoupling invariant learning method, where parameters of the click-through rate prediction model include parameters of an environment-invariant portion and parameters of an environment-specific portion, and the model optimization objective includes an optimization objective of the parameters of the environment-invariant portion and an optimization objective of the parameters of the environment-specific portion.

In operation S120, an environment data set is randomly sampled to obtain a training sample data set, where the environment data set represents historical click data of users in different time periods and includes label values.

In operation S130, the environment-specific parameters of the click-through rate prediction model are fixed, the model is used to mine the environment-invariant features of the training sample data set to obtain a first prediction result, the first prediction result and the label values of the training sample data set are processed with the environment-invariant loss function by gradient descent, based on the optimization objective of the environment-invariant parameters, to obtain a first loss value, and the environment-invariant parameters of the model are updated according to the first loss value.

In operation S140, the environment invariant portion parameter of the click rate prediction model is fixed, the updated click rate prediction model is used to mine the environment specific feature of the training sample data set to obtain a second prediction result, the second prediction result and the label value of the training sample data set are processed by using the environment specific loss function through a gradient descent method based on the optimization target of the environment specific portion parameter to obtain a second loss value, and the environment specific portion parameter of the click rate prediction model is updated according to the second loss value.

In operation S150, the operations S120 to S140 are performed iteratively until the click rate prediction model meets a preset convergence condition, so as to obtain a trained click rate prediction model.
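The alternating schedule of operations S120 to S150 can be sketched with a small logistic model. This is a simplified illustration under stated assumptions, not the patented procedure: the model is a logistic regression whose weights in an environment are the sum of shared (invariant) weights and per-environment (specific) weights, both updates use the plain log loss as a stand-in for the environment-invariant and environment-specific loss functions, and the convergence test is a crude threshold on the change in batch loss. All names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss_and_grad(w, x, y):
    """Mean log loss of a logistic model with weights w, and its gradient w.r.t. w."""
    p = sigmoid(x @ w)
    p_safe = np.clip(p, 1e-7, 1 - 1e-7)
    loss = -np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe))
    return loss, x.T @ (p - y) / len(y)

def train(x, y, env, num_envs, lr=0.5, batch=64, max_iters=500, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    w_inv = np.zeros(x.shape[1])               # environment-invariant parameters
    w_spec = np.zeros((num_envs, x.shape[1]))  # environment-specific parameters
    prev = np.inf
    for _ in range(max_iters):
        # S120: randomly sample training data from one environment.
        e = rng.integers(num_envs)
        idx = rng.choice(np.flatnonzero(env == e), size=batch)
        xb, yb = x[idx], y[idx]
        # S130: specific part fixed, update the invariant part.
        _, grad = log_loss_and_grad(w_inv + w_spec[e], xb, yb)
        w_inv -= lr * grad
        # S140: invariant part fixed, update the specific part with the updated model.
        loss, grad = log_loss_and_grad(w_inv + w_spec[e], xb, yb)
        w_spec[e] -= lr * grad
        # S150: stop once the batch loss changes by less than tol (assumed criterion).
        if abs(prev - loss) < tol:
            break
        prev = loss
    return w_inv, w_spec

# Synthetic click data spread over three environments.
rng = np.random.default_rng(1)
x = rng.normal(size=(600, 3))
env = np.repeat(np.arange(3), 200)
y = (rng.random(600) < sigmoid(x @ np.array([2.0, -1.0, 0.5]))).astype(float)
w_inv, w_spec = train(x, y, env, num_envs=3)
```

Because the total weight vector is a sum, the gradient with respect to either part equals the gradient with respect to the sum; what distinguishes the two updates in the full method is the objective each one minimizes, which this sketch deliberately leaves out.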

The training method provided by the invention can fully mine both the invariant features and the specific features of historical data in different environments. The invariant features of the historical data are features common to the data of different time periods: for example, a user has a stable, relatively fixed preference for certain items or topics across time periods, follows those items or topics over the long term, and clicks on related content. The specific features of the historical data are preferences for certain items or topics that appear suddenly at a particular time point or period: for example, a user may pay more attention to a breaking news event or to an item that suddenly becomes popular on social media, which raises the click-through rate of content related to that event or item.

The training method for a click-through rate prediction model based on decoupled invariant learning provided by the present invention yields a model that generalizes well: the model can identify feature interactions that remain stable across different historical environments. This addresses the low prediction accuracy of prior-art click-through rate prediction models, which is caused by data drift between the data processed at the model application stage and the historical training data, and thereby improves the prediction accuracy of the model.

According to an embodiment of the present invention, the model optimization objective is represented by formula (1) and formula (2):

[Formula (1): not reproduced in this text]

[Formula (2): not reproduced in this text]

wherein formula (1) is the optimization objective of the environment-invariant parameters and formula (2) is the optimization objective of the environment-specific parameters. The symbols in these formulas denote: the environment-invariant parameters; the environment-specific parameters in a given environment; the prediction loss computed in that environment; a hyperparameter that controls the strength of the regularization constraint; the regularization constraint that prevents the environment-specific parameters from capturing environment-invariant correlations; the weight of the prediction loss of each environment; and the coefficient of the variance term.

The variance of the empirical risk across the different training environments is expressed by formula (3):

[Formula (3): not reproduced in this text]

wherein the symbols denote: the number of elements in the set of training environments; two indices that range over the different environments; the environment-specific parameters in a given environment; and the prediction loss computed in that environment. The variance of the empirical risk across the training environments is used to capture the patterns shared by the different environments;

wherein the weight of the prediction loss of each environment is expressed by formula (4):

[Formula (4): not reproduced in this text]

wherein the remaining symbol denotes traversal of all environments.

The decoupled invariant learning model at the feature embedding level of the click data is determined by formula (5):

[Formula (5): not reproduced in this text]

wherein the symbols denote: the environment-invariant parameters; the environment-specific parameters in a given environment; the features of the click data, each indexed entry being an individual click data feature; the number of click data features; the environment-invariant feature embedding corresponding to each feature; the environment-specific feature embedding of each feature in a given environment; and the click-through rate prediction model.

The decoupled invariant learning model at the feature field weight level of the click data is determined by formula (6):

[Formula (6): not reproduced in this text]

wherein the symbols denote: the representations of two feature fields; the environment-invariant weight between the two fields; the environment-specific weight between the two fields in a given environment; and the number of feature fields.

According to an embodiment of the present invention, the representation of a feature field mentioned above is calculated from the embeddings of the features in that field, as determined by formula (7):

[Formula (7): not reproduced in this text]

wherein the symbols denote: an individual data feature; the set of indices of all data features that belong to the field; and the feature embedding corresponding to each data feature.

The invention provides a framework for capturing stable feature interactions in the click-through rate prediction problem, which mainly comprises three parts: a decoupled invariant learning objective for capturing stable feature interactions; a meta-learning optimization framework for optimizing the decoupled invariant learning objective; and a model architecture for implementing decoupled invariant learning.

For the decoupled invariant learning objective used to capture stable feature interactions, the invention divides the historical data, in chronological order, into a number of different environments of equal duration. The invention divides the model parameters that model feature interactions into an environment-invariant part and an environment-specific part, which capture the environment-invariant correlations and the environment-specific correlations respectively. The two parts are designed at the feature embedding level and at the feature field weight level respectively. To decouple the environment-invariant correlations from the environment-specific correlations, so that the sufficient-prediction assumption of invariant learning is satisfied and stable feature interactions are captured, the invention designs a decoupled invariant learning objective, which consists of an environment-specific learning objective that satisfies the sufficient-prediction assumption and an environment-invariant learning objective that removes the influence of the environment-specific correlations.

By dividing the historical data, in chronological order, into different environments of equal duration, the invention can better mine both the invariant features in the historical data and the specific features tied to each environment. For example, when a user's historical click data is divided into multiple segments, the features common to all segments are the user's invariant click features, which are independent of the environment; the features that differ between segments may be the user's specific click features, which depend on the environment.
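The chronological split into equal-duration environments can be sketched as below. The function name, the use of raw numeric timestamps, and the choice to place a record that falls exactly on a boundary into the later environment are assumptions made for illustration.

```python
import numpy as np

def split_into_environments(timestamps, num_envs):
    """Assign each click record to one of num_envs equal-duration, time-ordered environments."""
    timestamps = np.asarray(timestamps, dtype=float)
    t_min, t_max = timestamps.min(), timestamps.max()
    if t_max == t_min:
        # All records share one timestamp: a single environment is all that exists.
        return np.zeros(len(timestamps), dtype=int)
    # Interior boundaries of num_envs windows of equal width.
    edges = np.linspace(t_min, t_max, num_envs + 1)[1:-1]
    # A timestamp equal to a boundary goes to the later window; t_max stays in the last one.
    return np.searchsorted(edges, timestamps, side="right")

timestamps = [0, 5, 10, 15, 20, 25, 30]
env_ids = split_into_environments(timestamps, num_envs=3)
# env_ids -> [0, 0, 1, 1, 2, 2, 2]
```

Splitting by duration rather than by record count means that busy periods yield larger environments, which is why a per-environment average loss, rather than a summed loss, is the natural risk to compare across environments.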

Environment-specific learning objective satisfying the sufficient-prediction assumption. To satisfy the sufficient-prediction assumption of invariant learning, within a given environment the environment-invariant parameters and the environment-specific parameters of that environment should jointly predict the target sufficiently, while the environment-specific part focuses on capturing the environment-specific correlations. The following optimization objective is therefore designed, as shown in formula (8):

[Formula (8): not reproduced in this text]

wherein the symbols denote: the prediction loss computed in the given environment; a hyperparameter that controls the strength of the regularization constraint; and the regularization constraint that prevents the environment-specific parameters from capturing environment-invariant correlations. The constraint requires that the environment-specific parameters of one environment make no contribution to the prediction in any other environment, as shown in formula (9):

[Formula (9): not reproduced in this text]

By optimizing this learning objective, the environment-invariant parameters and the environment-specific parameters jointly satisfy the sufficient-prediction condition in the given environment, while the environment-specific parameters focus on capturing the environment-specific correlations.

Environment-invariant learning objective removing the influence of the environment-specific correlations. Once the environment-specific parameters have captured the environment-specific correlations, fixing them is equivalent to removing their influence on the prediction target; capturing the environment-invariant correlations then suffices to predict the target (that is, the target with the influence of the environment-specific parameters removed). The invention therefore fixes the environment-specific parameters and designs the following invariant learning objective to optimize the environment-invariant model parameters so as to capture stable feature interactions, as shown in formula (1):

[Formula (1): not reproduced in this text]

wherein the objective involves the variance of the empirical risk across the different training environments and the weight of the prediction loss of each environment, which are calculated as shown in formulas (3) and (4):

[Formula (3): not reproduced in this text]

[Formula (4): not reproduced in this text]

Minimizing the loss across environments improves performance in all environments, while minimizing the variance of the loss between environments limits the performance gap between environments, captures the patterns that the environments share, and learns model parameters that are stable across environments. In addition, applying a larger weight to environments with a high empirical risk makes the method pay more attention to difficult environments, which further improves the cross-environment generalization of the model parameters.

In summary, the overall learning objective of decoupled invariant learning is shown in formulas (1) and (2):

[Formula (1): not reproduced in this text]

[Formula (2): not reproduced in this text]

By optimizing this learning objective, the environment-invariant correlations and the environment-specific correlations in the different environments are decoupled, and feature interactions that are stable across environments are captured from the heterogeneous environments through the risk variance and the environment weighting, so that the feature interactions generalize well at the model serving stage.

In the meta-learning optimization framework, the two sub-objectives of decoupled invariant learning are interdependent: the environment-invariant objective requires the environment-specific correlations to be captured and the environment-specific model parameters to be fixed in order to remove their effect. Thus, the present invention updates the environment-invariant model parameters and the environment-specific model parameters alternately and iteratively.

First, the environment-invariant model parameters are updated, with the environment-specific model parameters fixed. Because the learning objective of decoupled invariant learning is a complex bi-level optimization problem, the present invention optimizes it based on meta-learning. In the meta-training phase, an environment is randomly sampled, and intermediate model parameters are generated using the environment-specific learning objective, as shown in equation (10):

（10），

Then, in the meta-test phase, the invariant learning loss is calculated using the intermediate model parameters and is used to optimize the environment-invariant model parameters, as shown in equation (11):

（11），

wherein the weight term represents the weight of the prediction loss of each environment.

Second, the environment-specific model parameters are updated. After the environment-invariant model parameters have been updated, they are fixed, and the environment-specific learning objective is optimized directly to update the environment-specific model parameters, as shown in equation (12):

（12），

wherein the gradient operator denotes taking the gradient with respect to the environment-specific model parameters. The two types of update are iterated alternately until the model converges.
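The alternating update scheme can be sketched on a toy problem. The following Python sketch is not the patented implementation: equations (10) to (12) are not reproduced in this text, so it assumes a logistic model whose weights in an environment are the sum of an invariant vector `theta` and a specific vector `phi[e]`, an L2 penalty `gamma` on the specific part, risk-proportional environment weights, and a first-order approximation in which the intermediate parameters and the weights are treated as constants when differentiating. All names and hyperparameter values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def risk_and_grad(X, y, w):
    """Mean log loss of a logistic model with weights w, and its gradient in w."""
    p = sigmoid(X @ w)
    p_safe = np.clip(p, 1e-7, 1 - 1e-7)  # avoid log(0)
    risk = -np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe))
    grad = X.T @ (p - y) / len(y)
    return risk, grad

def train_alternating(envs, dim, steps=200, lr=0.5, lam=1.0, gamma=0.01, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(dim)                 # environment-invariant part
    phi = [np.zeros(dim) for _ in envs]   # environment-specific parts
    n_env = len(envs)
    for _ in range(steps):
        # Update 1: invariant parameters, specific parameters fixed.
        # Meta-train: one specific-objective step in a sampled environment
        # gives intermediate specific parameters.
        e = int(rng.integers(n_env))
        Xe, ye = envs[e]
        _, g_e = risk_and_grad(Xe, ye, theta + phi[e])
        phi_mid = list(phi)
        phi_mid[e] = phi[e] - lr * (g_e + 2 * gamma * phi[e])
        # Meta-test: weighted risk plus risk variance, differentiated in theta
        # (first-order: phi_mid and the weights are treated as constants).
        stats = [risk_and_grad(Xk, yk, theta + phi_mid[k])
                 for k, (Xk, yk) in enumerate(envs)]
        risks = np.array([r for r, _ in stats])
        weights = risks / risks.sum()
        coef = weights + lam * 2.0 * (risks - risks.mean()) / n_env
        theta = theta - lr * sum(c * g for c, (_, g) in zip(coef, stats))
        # Update 2: specific parameters, invariant parameters fixed.
        for k, (Xk, yk) in enumerate(envs):
            _, g_k = risk_and_grad(Xk, yk, theta + phi[k])
            phi[k] = phi[k] - lr * (g_k + 2 * gamma * phi[k])
    return theta, phi

def make_env(rng, w_true, n=500):
    """Synthetic environment: labels drawn from a logistic model with weights w_true."""
    X = rng.normal(size=(n, len(w_true)))
    y = (rng.random(n) < sigmoid(X @ w_true)).astype(float)
    return X, y

data_rng = np.random.default_rng(0)
theta_true = np.array([1.5, -2.0, 1.0])
shifts = [np.array([0.5, 0.0, 0.0]), np.array([0.0, 0.5, 0.0]), np.array([0.0, 0.0, -0.5])]
envs = [make_env(data_rng, theta_true + s) for s in shifts]
theta, phi = train_alternating(envs, dim=3)
```

A faithful implementation would differentiate through the meta-training step rather than use the first-order shortcut; the sketch only shows the order of the two updates and which parameters are held fixed in each.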

Fig. 2 (a) and fig. 2 (b) respectively show schematic diagrams of two types of decoupling invariant learning models according to an embodiment of the present invention, where fig. 2 (a) shows the decoupling invariant learning model and fig. 2 (b) shows the light decoupling invariant learning model (LightDIL).

For the model architecture, environment-invariant model parameters and environment-specific model parameters are designed at the feature embedding level and at the feature domain weight level, respectively, so as to decouple the two types of correlations. Taking the factorization machine as an example (the method can also be built on other models), the following two model architectures are designed.

The first model architecture, shown in fig. 2 (a), decouples at the feature embedding level. Since feature embeddings are the core of a feature interaction model, the present invention performs the decoupling at the feature embedding level: each feature is given a corresponding environment-invariant embedding vector and a set of environment-specific embedding vectors. For the factorization machine, the model prediction formula is then as shown in formula (5):

（5），

wherein

This is the default factorization machine form of Decoupled Invariant Learning (DIL).
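Embedding-level decoupling in a factorization machine can be sketched as follows. Formula (5) is not reproduced in this text, so the way the two embeddings are combined is an assumption: here the effective embedding of a feature in an environment is the sum of its invariant embedding and its environment-specific embedding, and only the second-order interaction term is computed. The names `fm_pairwise` and `dil_fm_score` are hypothetical.

```python
import numpy as np

def fm_pairwise(emb, x):
    """Second-order FM term: sum over i < j of <emb_i, emb_j> * x_i * x_j."""
    xv = emb * x[:, None]
    return 0.5 * float(np.sum(xv.sum(axis=0) ** 2 - (xv ** 2).sum(axis=0)))

def dil_fm_score(x, inv_emb, spec_emb=None):
    """Interaction score with decoupled embeddings.

    inv_emb: (n_features, dim) environment-invariant embeddings.
    spec_emb: (n_features, dim) embeddings of one environment, or None to
    use the invariant embeddings alone (as in the serving stage).
    """
    emb = inv_emb if spec_emb is None else inv_emb + spec_emb  # assumed combination
    return fm_pairwise(emb, x)

x = np.array([1.0, 1.0, 0.0])
inv_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
spec_emb = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]])
train_score = dil_fm_score(x, inv_emb, spec_emb)
serve_score = dil_fm_score(x, inv_emb)
```

Every feature carries one invariant vector plus one vector per environment, which is the parameter growth that motivates the lighter field-weight variant.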

The second model architecture, shown in fig. 2 (b), decouples at the feature domain weight level. Decoupling at the feature embedding level greatly increases the number of model parameters, which makes the model harder to learn and increases the storage burden and training overhead. To improve model efficiency, the present invention instead decouples at the feature domain level. Specifically, environment-invariant weights and environment-specific weights are assigned to feature interactions at the feature domain level to capture environment-invariant and environment-specific correlations, respectively. Taking the factorization machine as an example, the model prediction formula is shown in formula (6):

（6），

wherein the formula uses the characterization of each domain, computed from the embeddings of the features in that domain; the environment-invariant weight of each pair of domains; and the environment-specific weight of each pair of domains in a given environment. This model architecture is named light decoupled invariant learning (LightDIL).

In summary, the present invention designs decoupled model architectures at the feature embedding level and at the feature domain weight level, as shown in fig. 2 (a) and fig. 2 (b), respectively. In the serving stage, only the environment-invariant model parameters, that is, the stable feature interactions, are used for prediction, so as to ensure good generalization. Taking light decoupled invariant learning as an example, the prediction formula is shown in formula (13):

（13），

wherein ∅ represents the empty set: in the prediction stage of light decoupled invariant learning, the set of environment-specific model parameters is replaced with the empty set.
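Field-weight decoupling and serving with the invariant weights alone can be sketched in Python. Formulas (6), (7) and (13) are not reproduced in this text, so the sketch rests on assumptions: a domain characterization is the sum of the embeddings of the features in that domain, and each pair of domains contributes the inner product of their characterizations scaled by the invariant weight plus, during training, the environment-specific weight. The names `field_representations` and `lightdil_score` are hypothetical.

```python
import numpy as np

def field_representations(feature_emb, field_of_feature, n_fields):
    """Sum the embeddings of the features in each field (assumed pooling)."""
    reps = np.zeros((n_fields, feature_emb.shape[1]))
    np.add.at(reps, field_of_feature, feature_emb)
    return reps

def lightdil_score(reps, w_inv, w_spec=None):
    """Sum over field pairs i < j of weight_ij * <rep_i, rep_j>.

    w_inv: (n_fields, n_fields) environment-invariant weights.
    w_spec: weights of one environment, or None at serving time, which
    corresponds to replacing the environment-specific set with the empty set.
    """
    w = w_inv if w_spec is None else w_inv + w_spec
    inner = reps @ reps.T
    upper = np.triu_indices(reps.shape[0], k=1)
    return float(np.sum(w[upper] * inner[upper]))

feature_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
reps = field_representations(feature_emb, [0, 0, 1], n_fields=2)
w_inv = np.array([[0.0, 0.5], [0.5, 0.0]])
w_spec = np.array([[0.0, 0.25], [0.25, 0.0]])
train_score = lightdil_score(reps, w_inv, w_spec)
serve_score = lightdil_score(reps, w_inv)
```

Under these assumptions the environment-specific part adds only one weight per pair of fields per environment, instead of one embedding vector per feature per environment.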

FIG. 3 is a flowchart of a click-through rate prediction method according to an embodiment of the invention.

As shown in fig. 3, the click-through rate prediction method includes operations S310 to S320.

In operation S310, a historical data set of a user to be predicted is obtained, where the historical data set of the user to be predicted includes user feature data and user click data.

In operation S320, a click rate prediction model is used to mine the environment-invariant feature interactions of the historical data set of the user to be predicted to obtain a prediction result, where the click rate prediction model is obtained by the above-mentioned training method for the click rate prediction model based on decoupling invariant learning.

FIG. 4 is a schematic structural diagram of a training device for a click rate prediction model based on decoupling invariant learning according to an embodiment of the present invention.

As shown in fig. 4, the training apparatus 400 for the click rate prediction model based on the decoupled invariant learning includes a model building module 410, a data sampling module 420, an invariant parameter updating module 430, a specific parameter updating module 440, and an iteration module 450.

The model building module 410 is configured to execute operation S110, and build a click-through rate prediction model and a model optimization target based on a decoupling invariant learning method, where parameters of the click-through rate prediction model include parameters of an environment-invariant portion and parameters of an environment-specific portion, and the model optimization target includes an optimization target of the parameters of the environment-invariant portion and an optimization target of the parameters of the environment-specific portion.

The data sampling module 420 is configured to execute operation S120, and perform random sampling on an environment data set to obtain a training sample data set, where the environment data set represents historical click data of a user in different time periods, and the environment data set includes label values.

The invariant parameter updating module 430 is configured to execute operation S130, fix the environment specific part parameter of the click rate prediction model, mine the environment invariant feature of the training sample data set by using the click rate prediction model to obtain a first prediction result, process the first prediction result and the label value of the training sample data set by using the environment invariant loss function through a gradient descent method based on the optimization target of the environment invariant part parameter to obtain a first loss value, and update the environment invariant part parameter of the click rate prediction model according to the first loss value.

The specific parameter updating module 440 is configured to execute operation S140, fix the environment-invariant parameters of the click rate prediction model, mine the environment-specific features of the training sample data set by using the updated click rate prediction model to obtain a second prediction result, process the second prediction result and the label value of the training sample data set by using the environment-specific loss function through a gradient descent method based on the optimization target of the environment-specific parameters to obtain a second loss value, and update the environment-specific parameters of the click rate prediction model according to the second loss value.

And the iteration module 450 is configured to iterate operations S120 to S140 until the click rate prediction model meets a preset convergence condition, so as to obtain a trained click rate prediction model.

In order to better illustrate the advantages of the click rate prediction model obtained by the training method provided by the invention, the click rate prediction model obtained by the training method of the invention is verified by combining a specific experiment.

The experiment takes the classical click rate prediction model FM as the base recommendation model and uses two public data sets of different types, Douban and MovieLens-10M (ML-10M). FwFMs, AutoFIS, PROFIT, Group-DRO and V-REx are used as comparison models. Douban and ML-10M are divided into 15 and 13 time periods respectively, each period covering 6 months. For Douban, the first five time periods are used as the training set, the middle five as the validation set, and the last five as the test set. For ML-10M, the first five time periods are used as the training set, the middle four as the validation set, and the last four as the test set. All methods train the model on the training set, select the optimal parameters on the validation set, and are tested on the test set. The average performance over the test periods of Douban and ML-10M is reported, measured by AUC and logloss.
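The chronological split and the two metrics can be sketched in Python. This is an illustrative sketch rather than the evaluation code used in the experiment: the helper names are hypothetical, periods are represented only by their indices, and AUC is computed by direct pairwise comparison, which is adequate for small samples only.

```python
import numpy as np

def split_periods(n_periods, n_train, n_valid):
    """Split period indices chronologically into train, validation and test."""
    periods = list(range(n_periods))
    return (periods[:n_train],
            periods[n_train:n_train + n_valid],
            periods[n_train + n_valid:])

def log_loss(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def auc(y_true, y_score):
    """Probability that a random positive is scored above a random negative."""
    y = np.asarray(y_true)
    s = np.asarray(y_score, dtype=float)
    pos, neg = s[y == 1], s[y == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

douban_split = split_periods(15, n_train=5, n_valid=5)
ml10m_split = split_periods(13, n_train=5, n_valid=4)
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
example_logloss = log_loss([1, 0], [0.5, 0.5])
```

Averaging `auc` and `log_loss` over the test periods of a data set gives one summary value per metric.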

The results are shown in Table 1:

TABLE 1 comparison of Performance of different methods on two datasets

From table 1, it can be found that, on both types of data set, the method outperforms the general invariant learning methods V-REx and Group-DRO on all metrics. This shows that decoupling allows the sufficient-prediction assumption of invariant learning to be satisfied, so that invariant learning can be applied to capturing stable feature interactions for click rate prediction. The method also achieves better results than the recommendation models FwFMs, AutoFIS and PROFIT, which shows that it can capture stable feature interactions in different recommendation scenarios, generalize better in serving-stage prediction, and improve prediction accuracy.

FIG. 5 schematically shows a block diagram of an electronic device suitable for implementing a click-through rate prediction model training method based on decoupled invariant learning and a click-through rate prediction method according to an embodiment of the present invention.

As shown in fig. 5, an electronic device 500 according to an embodiment of the present invention includes a processor 501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. The processor 501 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 501 may also include onboard memory for caching purposes. Processor 501 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the present invention.

In the RAM 503, various programs and data necessary for the operation of the electronic apparatus 500 are stored. The processor 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. The processor 501 performs various operations of the method flow according to the embodiments of the present invention by executing programs in the ROM 502 and/or the RAM 503. Note that the program may also be stored in one or more memories other than the ROM 502 and the RAM 503. The processor 501 may also perform various operations of method flows according to embodiments of the present invention by executing programs stored in the one or more memories.

According to an embodiment of the present invention, the electronic device 500 may also include an input/output (I/O) interface 505, which is also connected to the bus 504. The electronic device 500 may also include one or more of the following components connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage portion 508 including a hard disk and the like; and a communication portion 509 including a network interface card such as a LAN card, a modem, or the like. The communication portion 509 performs communication processing via a network such as the internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as necessary, so that a computer program read out therefrom is installed into the storage portion 508 as necessary.

The present invention also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the present invention.

According to embodiments of the present invention, the computer readable storage medium may be a non-volatile computer readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the invention, a computer-readable storage medium may include ROM 502 and/or RAM 503 and/or one or more memories other than ROM 502 and RAM 503 as described above.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The above embodiments describe the objects, technical solutions and advantages of the present invention in further detail. It should be understood that the above embodiments are only examples of the present invention and should not be construed as limiting it; any modification, equivalent substitution or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

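1. A click rate prediction model training method based on decoupling invariant learning is characterized by comprising the following steps:

step one, constructing a click rate prediction model and a model optimization target based on a decoupling invariant learning method, wherein the parameters of the click rate prediction model comprise environment invariant part parameters and environment specific part parameters, and the model optimization target comprises an optimization target of the environment invariant part parameters and an optimization target of the environment specific part parameters;

step two, randomly sampling an environment data set to obtain a training sample data set, wherein the environment data set represents historical click data of a user in different time periods, and comprises label values;

step three, fixing the environment specific part parameter of the click rate prediction model, mining the environment invariant feature of the training sample data set by using the click rate prediction model to obtain a first prediction result, processing the first prediction result and the label value of the training sample data set by using an environment invariant loss function through a gradient descent method based on the optimization target of the environment invariant part parameter to obtain a first loss value, and updating the environment invariant part parameter of the click rate prediction model according to the first loss value;

step four, fixing the environment invariant part parameter of the click rate prediction model, mining the environment specific feature of the training sample data set by using the updated click rate prediction model to obtain a second prediction result, processing the second prediction result and the label value of the training sample data set by using an environment specific loss function through a gradient descent method based on the optimization target of the environment specific part parameter to obtain a second loss value, and updating the environment specific part parameter of the click rate prediction model according to the second loss value;

and iterating the second step to the fourth step until the click rate prediction model meets a preset convergence condition, so as to obtain a trained click rate prediction model.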

2. The method of claim 1, wherein the optimization objective of the model is represented by formula (1) and formula (2):

（1），

（2），

wherein formula (1) represents an optimization objective of the environment-invariant partial parameter, formula (2) represents an optimization objective of the environment-specific partial parameter,

a parameter representing a constant part of said environment,

is represented in the environment

The environment-specific part parameter of (a),

is represented in the environment

The predicted loss obtained by the calculation of (a) above,

is used for controlling

The over-parameters of the intensity are,

is used for preventing

A regularization constraint that is context invariant dependent is captured,

finger environment

The weight of the loss is predicted and,

to represent

The coefficient of (a).

3. The method of claim 2, wherein the variance of the different training environment experience risks

Expressed by equation (3):

（3），

wherein ,

representing the number of elements of the set of training environments,

and

the values of the different environments are represented,

is represented in the environment

The environment-specific part-parameter of (a),

is represented in the environment

The variance of the empirical risk of the different training environments

A mode for capturing different environment shares;

wherein the environment is

Predicting lost weights

Expressed by equation (4):

（4），

wherein ,

representing the traversal of all environments.

4. The method of claim 1, wherein the click rate prediction models comprise click rate prediction models based on decoupled invariant learning of a click data feature embedding level and/or click rate prediction models based on decoupled invariant learning of a click data feature domain weight level.

5. The method of claim 4, wherein the click-through rate prediction model based on decoupled invariant learning of click data feature embedding levels is determined by equation (5):

（5），

wherein ,

a parameter representing a constant part of said environment,

is represented in the environment

The environment-specific part parameter of (a),

，

，

a feature representing the click data is shown,

is shown as

The characteristics of the individual click data are such that,

is shown as

The characteristics of the individual click data are such that,

represents the number of the click data features,

denotes the first

is shown as

denotes the first

The characteristic corresponds to

The specific features of the individual environments are embedded in,

is shown as

The characteristic corresponds to

The specific features of the individual environments are embedded in,

the click rate prediction model;

wherein the click rate prediction model based on the decoupling invariant learning of the click data feature domain weight layer is determined by formula (6):

（6），

wherein ,

representing a domain

The characterization of (a) is performed,

representing a domain

The characterization of (a) is performed,

representation domain

And domain

The environment of the room is not weighted by the change,

representing a domain

And domain

In the environment of

Has a specific weight of

，

Representing the number of feature fields.

6. The method of claim 5, wherein the characterization of a domain is calculated from the embeddings of the features in that domain, as determined by equation (7):

（7），

wherein ,

represent the first of the data

The characteristics of the data are such that,

indicates all the domains

Data characteristics of

Corresponding to

The set of (a) and (b),

represent the first of the data

And embedding the characteristics corresponding to the data characteristics.

7. A click-through rate prediction method, comprising:

obtaining a historical data set of a user to be predicted, wherein the historical data set of the user to be predicted comprises user feature data and user click data;

and mining the environment-invariant feature interactions of the historical data set of the user to be predicted by using a click-through rate prediction model to obtain a prediction result, wherein the click-through rate prediction model is obtained by training according to the method of any one of claims 1 to 6.

8. A training device of a click rate prediction model based on decoupling invariant learning is characterized by comprising the following components:

the model construction module is used for executing the first step, constructing a click rate prediction model and a model optimization target based on a decoupling invariant learning method, wherein the parameters of the click rate prediction model comprise environment invariant part parameters and environment specific part parameters, and the model optimization target comprises an optimization target of the environment invariant part parameters and an optimization target of the environment specific part parameters;

the data sampling module is used for executing the second step, randomly sampling an environment data set to obtain a training sample data set, wherein the environment data set represents historical click data of a user in different time periods, and comprises tag values;

the invariant parameter updating module is used for executing the third step, fixing the environment specific part parameter of the click rate prediction model, mining the environment invariant feature of the training sample data set by using the click rate prediction model to obtain a first prediction result, processing the first prediction result and the label value of the training sample data set by using an environment invariant loss function through a gradient descent method based on the optimization target of the environment invariant part parameter to obtain a first loss value, and updating the environment invariant part parameter of the click rate prediction model according to the first loss value;

a specific parameter updating module, configured to perform the fourth step, fix the environment-invariant parameter of the click rate prediction model, mine the environment specific characteristics of the training sample data set by using the updated click rate prediction model to obtain a second prediction result, process the second prediction result and the label value of the training sample data set by using an environment specific loss function through a gradient descent method based on the optimization target of the environment specific parameter to obtain a second loss value, and update the environment specific parameter of the click rate prediction model according to the second loss value;

and an iteration module, configured to iterate the second step to the fourth step until the click rate prediction model meets a preset convergence condition, so as to obtain a trained click rate prediction model.

9. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-7.

10. A computer-readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to carry out the method according to any one of claims 1 to 7.