CN115809372A - Click rate prediction model training method and device based on decoupling invariant learning - Google Patents
- Publication number
- CN115809372A CN202310053850.7A
- Authority
- CN
- China
- Prior art keywords
- environment
- invariant
- rate prediction
- prediction model
- click
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a method and a device for training a click rate prediction model based on decoupling invariant learning. The method comprises the following steps: step one, constructing a click rate prediction model and a model optimization target based on a decoupling invariant learning method; step two, randomly sampling an environment data set to obtain a training sample data set; step three, fixing the environment-specific part parameters of the click rate prediction model, mining the environment-invariant features of the training sample data set by using the click rate prediction model, and updating the environment-invariant part parameters of the click rate prediction model; step four, fixing the environment-invariant part parameters of the click rate prediction model, mining the environment-specific features of the training sample data set by using the updated click rate prediction model, and updating the environment-specific part parameters of the click rate prediction model; and step five, iterating step two to step four until the click rate prediction model meets a preset convergence condition, to obtain a trained click rate prediction model.
Description
Technical Field
The invention relates to the field of recommendation systems, data mining and machine learning, in particular to a click rate prediction model training method and device based on decoupling invariant learning, a click rate prediction method, electronic equipment and a storage medium.
Background
Click rate prediction is a crucial component of a recommendation system. In recent years, feature interaction modeling has been recognized as the core of the click rate prediction problem, and most research focuses on modeling feature interactions efficiently. However, the feature interaction models in the prior art all learn feature interactions by fitting historical data under empirical risk minimization, that is, they learn whichever feature interactions best explain the historical data. A real recommendation system must serve future scenarios, and because user interests change continuously, new data drifts away from historical data. Feature interactions obtained by fitting historical data therefore generalize poorly to new data, which degrades the performance of the recommendation system.
To address the poor generalization of models learned by empirical risk minimization under distribution drift, those skilled in the art have proposed the paradigm of invariant learning. Invariant learning assumes that the training data is collected from heterogeneous environments and identifies invariant correlations from the distribution shifts between those environments. While this approach makes stable feature interaction learning possible, it assumes that the target can be sufficiently predicted from the environment-invariant correlations alone. In a recommendation system, the target is affected by environment-invariant correlations and environment-specific correlations that are coupled together, so this assumption is not satisfied and the ability to identify stable feature interactions is difficult to guarantee.
Disclosure of Invention
In view of the foregoing problems, the present invention provides a method and an apparatus for training a click rate prediction model based on decoupled invariant learning, a click rate prediction method, an electronic device, and a storage medium, so as to solve at least one of the above problems.
According to a first aspect of the present invention, there is provided a training method for a click rate prediction model based on decoupling invariant learning, comprising:
step one, constructing a click rate prediction model and a model optimization target based on a decoupling invariant learning method, wherein the parameters of the click rate prediction model comprise environment-invariant part parameters and environment-specific part parameters, and the model optimization target comprises an optimization target of the environment-invariant part parameters and an optimization target of the environment-specific part parameters;
step two, randomly sampling an environment data set to obtain a training sample data set, wherein the environment data set represents historical click data of a user in different time periods and comprises label values;
step three, fixing the environment-specific part parameters of the click rate prediction model, mining the environment-invariant features of the training sample data set by using the click rate prediction model to obtain a first prediction result, processing the first prediction result and the label values of the training sample data set with an environment-invariant loss function by a gradient descent method, based on the optimization target of the environment-invariant part parameters, to obtain a first loss value, and updating the environment-invariant part parameters of the click rate prediction model according to the first loss value;
step four, fixing the environment-invariant part parameters of the click rate prediction model, mining the environment-specific features of the training sample data set by using the updated click rate prediction model to obtain a second prediction result, processing the second prediction result and the label values of the training sample data set with an environment-specific loss function by a gradient descent method, based on the optimization target of the environment-specific part parameters, to obtain a second loss value, and updating the environment-specific part parameters of the click rate prediction model according to the second loss value;
and step five, iterating step two to step four until the click rate prediction model meets a preset convergence condition, to obtain a trained click rate prediction model.
According to an embodiment of the present invention, the optimization objective of the above model is represented by formula (1) and formula (2):
wherein formula (1) represents the optimization objective of the environment-invariant part parameters and formula (2) represents the optimization objective of the environment-specific part parameters. The quantities appearing in the two formulas are: the environment-invariant part parameters; the environment-specific part parameters in a given environment; the prediction loss computed in that environment; a hyperparameter controlling the strength of the regularization constraint; the regularization constraint that prevents the environment-specific part parameters from capturing environment-invariant correlations; the variance of the empirical risks of the different training environments; the weight of the prediction loss of each environment; and a weighting coefficient.
According to the embodiment of the invention, the variance of the empirical risks of the different training environments is expressed by formula (3):
wherein the quantities are: the number of elements of the set of training environments; the indices denoting different environments; the environment-specific part parameters in each environment; and the prediction loss computed in each environment. The variance of the empirical risks of the different training environments is used to capture the patterns shared by the different environments.
According to the embodiment of the invention, the click rate prediction model comprises a click rate prediction model with decoupling invariant learning at the click data feature embedding level and/or a click rate prediction model with decoupling invariant learning at the click data feature domain weight level.
According to the embodiment of the invention, the click rate prediction model based on the decoupling invariant learning of the click data feature embedding level is determined by formula (5):
wherein the quantities are: the environment-invariant part parameters; the environment-specific part parameters in a given environment; the click data features, indexed by two feature indices; the number of click data features; the environment-invariant feature embeddings corresponding to each of the two indexed features; the environment-specific feature embeddings of each of the two indexed features in the given environment; and the click rate prediction model itself;
the click rate prediction model based on the decoupling invariant learning of the click data feature domain weight layer is determined by a formula (6):
wherein the quantities are: the representations of two feature domains; the environment-invariant weight between the two domains; the environment-specific weight between the two domains in a given environment; and the number of feature domains.
According to an embodiment of the present invention, the representation of a feature domain is calculated from the embeddings of the features in that domain, as determined by formula (7):
wherein the quantities are: the data feature with a given index; the set of indices of all data features that belong to the domain; and the feature embedding corresponding to the indexed data feature.
According to a second aspect of the present invention, there is provided a click rate prediction method, including:
acquiring a historical data set of a user to be predicted, wherein the historical data set of the user to be predicted comprises user characteristic data and user click data;
and using a click rate prediction model to mine the environment-invariant feature interactions of the historical data set of the user to be predicted and obtain a prediction result, wherein the click rate prediction model is trained by the above training method for a click rate prediction model based on decoupling invariant learning.
According to a third aspect of the present invention, there is provided a training apparatus for a click rate prediction model based on decoupling invariant learning, comprising:
the model building module is used for executing the first step and building a click rate prediction model and a model optimization target based on a decoupling invariant learning method, wherein parameters of the click rate prediction model comprise parameters of an environment invariant part and parameters of an environment specific part, and the model optimization target comprises an optimization target of the parameters of the environment invariant part and an optimization target of the parameters of the environment specific part;
the data sampling module is used for executing the second step, randomly sampling an environment data set to obtain a training sample data set, wherein the environment data set represents historical click data of a user in different time periods, and comprises a label value;
the invariant parameter updating module is used for executing the third step, fixing the environment specific part parameters of the click rate prediction model, mining the environment invariant characteristics of the training sample data set by using the click rate prediction model to obtain a first prediction result, processing the first prediction result and the label value of the training sample data set by using an environment invariant loss function through a gradient descent method based on the optimization target of the environment invariant part parameters to obtain a first loss value, and updating the environment invariant part parameters of the click rate prediction model according to the first loss value;
the specific parameter updating module is used for executing the fourth step, fixing the environment invariant part parameters of the click rate prediction model, mining the environment specific characteristics of the training sample data set by using the updated click rate prediction model to obtain a second prediction result, processing the second prediction result and the label value of the training sample data set by using an environment specific loss function through a gradient descent method based on the optimization target of the environment specific part parameters to obtain a second loss value, and updating the environment specific part parameters of the click rate prediction model according to the second loss value;
and the iteration module is used for iterating the second step to the fourth step until the click rate prediction model meets a preset convergence condition, so as to obtain a trained click rate prediction model.
According to a fourth aspect of the present invention, there is provided an electronic apparatus comprising:
one or more processors;
a storage device to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the training method of the click rate prediction model based on decoupling invariant learning and the click rate prediction method.
According to a fifth aspect of the present invention, there is provided a computer readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform a method of training a click-through rate prediction model based on decoupled invariant learning and a method of click-through rate prediction.
The training method of the click rate prediction model based on decoupling invariant learning provided by the invention yields a click rate prediction model that generalizes well and can identify stable feature interactions across different historical environments. It also addresses the problem that click rate prediction models in the prior art lose accuracy because the data processed in the model application stage drifts from the historical training data, thereby improving the prediction accuracy of the model.
Drawings
FIG. 1 is a flow chart of a method of training a click-through rate prediction model based on decoupled invariant learning, according to an embodiment of the present invention;
FIG. 2 (a) is a schematic diagram of a decoupled invariant learning model according to an embodiment of the present invention;
FIG. 2 (b) is a schematic diagram of a light decoupling invariant learning model according to an embodiment of the present invention;
FIG. 3 is a flow chart of a click-through rate prediction method according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a training apparatus for a click rate prediction model based on decoupling invariant learning according to an embodiment of the present invention;
FIG. 5 schematically shows a block diagram of an electronic device suitable for implementing a click-through rate prediction model training method and a click-through rate prediction method based on decoupled invariant learning, according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the accompanying drawings in combination with the embodiments.
Click rate prediction is a key component of recommendation systems. Early methods were factorization models, which model feature interactions through factorization and inner products. In recent years, with the rapid development of machine learning and deep learning, those skilled in the art have proposed more efficient and more complex feature interaction modeling based on various neural networks, such as multi-layer perceptrons, inner-product or outer-product neural networks, attention-based neural networks, convolutional neural networks, and graph neural networks. More recently, methods based on neural architecture search have also been proposed: some work focuses on automatically searching for the optimal network architecture for modeling feature interactions, and other work focuses on automatically selecting or generating the optimal feature interactions. These efforts achieve better feature interaction modeling while greatly reducing manual effort. However, these feature interaction models suffer from data drift and poor generalization. To address data drift and poor generalization, those skilled in the art have proposed recommendation models based on the invariant learning paradigm. Such models assume that the target can be sufficiently predicted from environment-invariant correlations; this assumption is not satisfied when the model is actually trained and applied, so their ability to identify stable feature interactions is difficult to guarantee.
In order to learn stable feature interactions in the click rate prediction problem of recommendation systems and improve the generalization ability of the model on new data, the invention provides a stable feature interaction capturing method based on decoupling invariant learning. The method divides historical data into different environments in chronological order, and makes the assumption of invariant learning hold by decoupling the environment-invariant correlations from the environment-specific correlations and removing the environment-specific correlations, so that invariant learning can be applied to capture stable feature interactions. Because the method captures stable feature interactions from the heterogeneity of historical data across environments, the learned feature interactions generalize well in the serving stage of a real click rate prediction scenario, which improves the prediction accuracy of the recommendation system.
FIG. 1 is a flowchart of a training method of a click-through rate prediction model based on decoupled invariant learning according to an embodiment of the present invention.
As shown in FIG. 1, the training method of the click rate prediction model based on the decoupling invariant learning includes operations S110 to S150.
In operation S110, a click-through rate prediction model and a model optimization objective are constructed based on a decoupling invariant learning method, where parameters of the click-through rate prediction model include parameters of an environment-invariant portion and parameters of an environment-specific portion, and the model optimization objective includes an optimization objective of the parameters of the environment-invariant portion and an optimization objective of the parameters of the environment-specific portion.
In operation S120, an environment data set is randomly sampled to obtain a training sample data set, where the environment data set represents historical click data of a user in different time periods and includes label values.
In operation S130, the environment-specific part parameters of the click rate prediction model are fixed, the click rate prediction model is used to mine the environment-invariant features of the training sample data set to obtain a first prediction result, the first prediction result and the label values of the training sample data set are processed with the environment-invariant loss function by a gradient descent method, based on the optimization target of the environment-invariant part parameters, to obtain a first loss value, and the environment-invariant part parameters of the click rate prediction model are updated according to the first loss value.
In operation S140, the environment invariant portion parameter of the click rate prediction model is fixed, the updated click rate prediction model is used to mine the environment specific feature of the training sample data set to obtain a second prediction result, the second prediction result and the label value of the training sample data set are processed by using the environment specific loss function through a gradient descent method based on the optimization target of the environment specific portion parameter to obtain a second loss value, and the environment specific portion parameter of the click rate prediction model is updated according to the second loss value.
In operation S150, the operations S120 to S140 are performed iteratively until the click rate prediction model meets a preset convergence condition, so as to obtain a trained click rate prediction model.
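The alternation in operations S120 to S150 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the model is a toy logistic regression whose weights are the sum of an invariant vector `phi` and a per-environment vector `psi[e]`, the loss is plain log loss (the variance term and the regularization constraint of the optimization targets are omitted), and the names `train_alternating`, `phi`, `psi`, the batch size of 64 and the tolerance are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(p, y):
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def train_alternating(envs, dim, lr=0.1, max_iters=200, tol=1e-6, seed=0):
    """envs: list of (X, y) pairs, one per time-period environment."""
    rng = np.random.default_rng(seed)
    phi = np.zeros(dim)                  # environment-invariant parameters (assumed form)
    psi = [np.zeros(dim) for _ in envs]  # environment-specific parameters (assumed form)
    prev = np.inf
    for _ in range(max_iters):
        # S120: randomly sample a mini-batch from every environment
        batches = []
        for X, y in envs:
            idx = rng.choice(len(y), size=min(64, len(y)), replace=False)
            batches.append((X[idx], y[idx]))
        # S130: psi is held fixed, one gradient step on phi
        grad_phi = np.zeros(dim)
        for e, (X, y) in enumerate(batches):
            p = sigmoid(X @ (phi + psi[e]))
            grad_phi += X.T @ (p - y) / len(y)
        phi -= lr * grad_phi / len(batches)
        # S140: the updated phi is held fixed, one gradient step on each psi[e]
        for e, (X, y) in enumerate(batches):
            p = sigmoid(X @ (phi + psi[e]))
            psi[e] -= lr * (X.T @ (p - y) / len(y))
        # S150: stop when the mean loss over all environments stops changing
        total = float(np.mean([log_loss(sigmoid(X @ (phi + psi[e])), y)
                               for e, (X, y) in enumerate(envs)]))
        if abs(prev - total) < tol:
            break
        prev = total
    return phi, psi, total
```

Only one parameter group receives a gradient step in each phase, which mirrors the "fix one part, update the other" structure of operations S130 and S140.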
The training method of the click rate prediction model provided by the invention can fully mine the invariant features and the specific features of historical data in different environments. The invariant features of the historical data are features that the data share across different time periods: for example, a user may have a stable, relatively fixed preference for certain items or topics, follow them over a long time, and click on content related to them. The specific features of the historical data are preferences for certain items or topics that appear suddenly at a particular time point or during a particular period: for example, a user may pay more attention to a breaking news event or an item that suddenly becomes popular on social media, which raises the click rate of content related to that event.
The training method of the click rate prediction model based on decoupling invariant learning yields a click rate prediction model that generalizes well and can identify stable feature interactions across different historical environments. It also addresses the problem that click rate prediction models in the prior art lose accuracy because the data processed in the model application stage drifts from the historical training data, thereby improving the prediction accuracy of the model.
According to an embodiment of the present invention, the optimization objective of the above model is represented by formula (1) and formula (2):
wherein formula (1) represents the optimization objective of the environment-invariant part parameters and formula (2) represents the optimization objective of the environment-specific part parameters. The quantities appearing in the two formulas are: the environment-invariant part parameters; the environment-specific part parameters in a given environment; the prediction loss computed in that environment; a hyperparameter controlling the strength of the regularization constraint; the regularization constraint that prevents the environment-specific part parameters from capturing environment-invariant correlations; the variance of the empirical risks of the different training environments; the weight of the prediction loss of each environment; and a weighting coefficient.
According to the embodiment of the invention, the variance of the empirical risks of the different training environments is expressed by formula (3):
wherein the quantities are: the number of elements of the set of training environments; the indices denoting different environments; the environment-specific part parameters in each environment; and the prediction loss computed in each environment. The variance of the empirical risks of the different training environments is used to capture the patterns shared by the different environments.
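One possible reading of formulas (1) to (3), written out from the verbal definitions only, is sketched below. Every symbol name is an assumption introduced for illustration: $\phi$ for the environment-invariant part parameters, $\psi_e$ for the environment-specific part parameters of environment $e$, $R^e$ for the prediction loss in environment $e$, $\lambda$ for the hyperparameter, $\Omega_e$ for the regularization constraint, $V$ for the variance of the empirical risks, $w_e$ for the loss weight of environment $e$, $\beta$ for the weighting coefficient (assumed here to multiply $V$), and $\mathcal{E}$ for the set of training environments. The exact normalization and the placement of the weights are likewise assumed.

```latex
\begin{align}
  \min_{\phi}\; & \sum_{e \in \mathcal{E}} w_e\, R^{e}(\phi, \psi_e) + \beta\, V
      && \text{(1), assumed form} \\
  \min_{\psi_e}\; & R^{e}(\phi, \psi_e) + \lambda\, \Omega_e
      && \text{(2), assumed form} \\
  V \;=\; & \frac{1}{|\mathcal{E}|} \sum_{e \in \mathcal{E}}
      \Bigl( R^{e}(\phi, \psi_e)
        - \frac{1}{|\mathcal{E}|} \sum_{e' \in \mathcal{E}} R^{e'}(\phi, \psi_{e'}) \Bigr)^{2}
      && \text{(3), assumed form}
\end{align}
```

Under this reading, a small $V$ means the losses of the environments are close to one another, which is how the variance term would favor patterns shared by all environments.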
According to the embodiment of the invention, the click rate prediction model comprises a click rate prediction model with decoupling invariant learning at the click data feature embedding level and/or a click rate prediction model with decoupling invariant learning at the click data feature domain weight level.
According to the embodiment of the invention, the click rate prediction model based on the decoupling invariant learning of the click data feature embedding level is determined by formula (5):
wherein the quantities are: the environment-invariant part parameters; the environment-specific part parameters in a given environment; the click data features, indexed by two feature indices; the number of click data features; the environment-invariant feature embeddings corresponding to each of the two indexed features; the environment-specific feature embeddings of each of the two indexed features in the given environment; and the click rate prediction model itself.
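A minimal sketch of decoupling at the feature embedding level follows. It assumes, purely for illustration, that each feature's effective embedding is the sum of its environment-invariant embedding and its environment-specific embedding, and that pairwise interactions are inner products in the style of a factorization machine. The function name, the array layout, and the use of the invariant embeddings alone when `env` is `None` are hypothetical choices rather than the form of formula (5).

```python
import numpy as np

def predict_embedding_level(x, inv_emb, spec_emb, env=None):
    """x: (n,) feature values; inv_emb: (n, d); spec_emb: (n_envs, n, d)."""
    # Assumed decoupling: effective embedding = invariant + environment-specific
    emb = inv_emb if env is None else inv_emb + spec_emb[env]
    v = emb * x[:, None]  # scale each embedding by its feature value
    # Sum over pairs i < j of <v_i, v_j>, using
    # sum_{i<j} <v_i, v_j> = 0.5 * (||sum_i v_i||^2 - sum_i ||v_i||^2)
    pair_sum = 0.5 * (np.sum(v.sum(axis=0) ** 2) - np.sum(v ** 2))
    return 1.0 / (1.0 + np.exp(-pair_sum))
```

Calling the function with `env=None` uses only the environment-invariant embeddings, which matches the idea of predicting from environment-invariant feature interactions when serving new data.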
The click rate prediction model based on the decoupling invariant learning of the click data feature domain weight layer is determined by a formula (6):
wherein the quantities are: the representations of two feature domains; the environment-invariant weight between the two domains; the environment-specific weight between the two domains in a given environment; and the number of feature domains.
According to an embodiment of the present invention, the representation of a feature domain is calculated from the embeddings of the features in that domain, as determined by formula (7):
wherein the quantities are: the data feature with a given index; the set of indices of all data features that belong to the domain; and the feature embedding corresponding to the indexed data feature.
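The feature domain weight level can be sketched in the same spirit. The code assumes that a domain representation is the sum of the value-scaled embeddings of the features in that domain, and that each pair of domains contributes its inner product multiplied by the sum of an environment-invariant weight and an environment-specific weight. The function names, the `field_of` index array and the array layouts are hypothetical, and the exact forms of formulas (6) and (7) may differ.

```python
import numpy as np

def field_representations(x, emb, field_of):
    """x: (n,) feature values; emb: (n, d); field_of: (n,) domain index of each feature."""
    n_fields = int(field_of.max()) + 1
    u = np.zeros((n_fields, emb.shape[1]))
    # Assumed form: sum the value-scaled embeddings of all features in each domain
    np.add.at(u, field_of, emb * x[:, None])
    return u

def predict_field_weight_level(x, emb, field_of, w_inv, w_spec, env=None):
    """w_inv: (F, F) invariant weights; w_spec: (n_envs, F, F) environment-specific weights."""
    u = field_representations(x, emb, field_of)
    w = w_inv if env is None else w_inv + w_spec[env]
    gram = u @ u.T                     # inner products between domain representations
    iu = np.triu_indices(len(u), k=1)  # each unordered pair of domains once
    score = np.sum(w[iu] * gram[iu])
    return 1.0 / (1.0 + np.exp(-score))
```

Because only one weight per pair of domains is environment-specific, this variant adds far fewer per-environment parameters than duplicating every feature embedding.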
The invention provides a framework for capturing stable feature interactions in click rate prediction. It comprises three parts: a decoupled invariant learning objective for capturing stable feature interactions; a meta-learning optimization framework for optimizing that objective; and model architectures that implement decoupled invariant learning.
For the decoupled invariant learning objective that captures stable feature interactions, the historical data is first divided chronologically into several environments of equal duration. The invention divides the model parameters that model feature interactions into an environment-invariant part and an environment-specific part, which capture environment-invariant correlations and environment-specific correlations, respectively. Both parts are designed at the feature-embedding level and at the feature-domain weight level. To decouple the environment-invariant correlations from the environment-specific correlations, so that the sufficient-prediction assumption of invariant learning is satisfied and stable feature interactions are captured, the invention designs a decoupled invariant learning objective. It consists of an environment-specific learning objective that satisfies the sufficient-prediction assumption and an environment-invariant learning objective from which the influence of environment-specific correlations has been removed.
Dividing the historical data chronologically into environments of equal duration makes it easier to mine both the invariant features in the historical data and the specific features tied to each environment. For example, if a user's historical click data is divided into several segments, the features common to all segments are the user's invariant click features, which do not depend on the environment, whereas the features that differ between segments may be click features specific to the environment.
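The chronological split into equal-duration environments can be sketched as follows. This is a minimal illustration, assuming each record carries a numeric timestamp; the function name `split_into_environments` and the record layout are hypothetical.

```python
from typing import Any, List, Sequence, Tuple

Record = Tuple[float, Any]  # (timestamp, sample); the layout is an assumption


def split_into_environments(records: Sequence[Record], num_envs: int) -> List[List[Record]]:
    """Assign records to num_envs consecutive time windows of equal duration."""
    if num_envs < 1:
        raise ValueError("num_envs must be positive")
    envs: List[List[Record]] = [[] for _ in range(num_envs)]
    if not records:
        return envs
    ordered = sorted(records, key=lambda r: r[0])
    start, end = ordered[0][0], ordered[-1][0]
    # Fall back to a width of 1.0 when every record has the same timestamp.
    width = (end - start) / num_envs or 1.0
    for ts, sample in ordered:
        # The latest record falls exactly on the upper boundary, so clamp the index.
        idx = min(int((ts - start) / width), num_envs - 1)
        envs[idx].append((ts, sample))
    return envs


# Hypothetical toy log: twelve clicks at timestamps 0..11, split into three environments.
toy_log = [(float(t), {"clicked": t % 2}) for t in range(12)]
environments = split_into_environments(toy_log, num_envs=3)
env_sizes = [len(env) for env in environments]
```

Windows are defined by duration rather than by record count, so environments may contain different numbers of samples when traffic is uneven.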
Environment-specific learning objective satisfying the sufficient-prediction assumption. To satisfy the sufficient-prediction assumption of invariant learning, within each environment the environment-invariant parameters and that environment's specific parameters should jointly be able to predict the target sufficiently, while the environment-specific part concentrates on capturing environment-specific correlations. The following optimization objective is designed accordingly, as shown in equation (8):
where the first term is the prediction loss obtained in the environment, the coefficient is a hyper-parameter controlling the strength of the regularization, and the regularization term is a constraint that prevents the environment-specific parameters from capturing environment-invariant correlations. It does so by requiring that the specific parameters of one environment make no contribution to predictions in any other environment, as shown in equation (9):
By optimizing this learning objective, the environment-invariant parameters and the environment-specific parameters jointly satisfy the sufficient-prediction condition in each environment, while the environment-specific parameters focus on capturing environment-specific correlations.
Environment-invariant learning objective with the environment-specific influence removed. Once the environment-specific parameters have captured the environment-specific correlations, fixing them is equivalent to removing their influence on the prediction target; the environment-invariant correlations should then sufficiently predict the target from which that influence has been removed. The invention therefore fixes the environment-specific parameters and designs the following invariant learning objective to optimize the environment-invariant model parameters so that they capture stable feature interactions, as shown in equation (1):
where the variance term is the variance of the empirical risks of the different training environments and the weight term is the weight of an environment's prediction loss; both are calculated as shown in equations (3) and (4):
Minimizing the weighted cross-environment loss improves performance in all environments, while minimizing the variance of the losses between environments limits the performance gap between environments, captures the patterns that the environments share, and yields model parameters that are stable across environments. In addition, assigning a larger weight to environments with high empirical risk makes the method pay more attention to difficult environments, which further improves the cross-environment generalization of the model parameters.
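The combination of weighted loss and risk variance can be sketched numerically. Two points are assumptions, not taken from the text: the environment weights are computed with a softmax over the empirical risks (one simple way to give high-risk environments a larger weight), and the variance is the population variance, which equals the pairwise form 1/(2n²) · Σ (R_e − R_e')² over all pairs of the n environments. All function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence


def risk_variance(risks: Sequence[float]) -> float:
    """Population variance of the per-environment empirical risks."""
    mean = sum(risks) / len(risks)
    return sum((r - mean) ** 2 for r in risks) / len(risks)


def environment_weights(risks: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Softmax weights (an assumed form): higher-risk environments get larger weights."""
    scaled = [r / temperature for r in risks]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def invariant_objective(risks: Sequence[float], variance_coeff: float) -> float:
    """Weighted cross-environment risk plus variance_coeff times the risk variance."""
    weights = environment_weights(risks)
    weighted_risk = sum(w * r for w, r in zip(weights, risks))
    return weighted_risk + variance_coeff * risk_variance(risks)


# Hypothetical empirical risks of three training environments.
toy_risks = [0.50, 0.70, 0.60]
toy_weights = environment_weights(toy_risks)
toy_variance = risk_variance(toy_risks)
toy_objective = invariant_objective(toy_risks, variance_coeff=2.0)
```

With these toy risks the second environment has the largest risk and therefore the largest weight, which is the behavior the paragraph above describes.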
In summary, the overall learning objective of the decoupling invariant learning is shown in equations (1) and (2):
by optimizing the learning objective, the environment invariant correlation and the environment specific correlation in different environments can be decoupled, and cross-environment stable feature interaction is captured between heterogeneous environments through risk variance and environment weighting, so that the feature interaction can be well generalized in a model service stage.
In the meta-learning optimization framework, the two sub-objectives of decoupled invariant learning are interdependent: the environment-invariant objective requires that the environment-specific parameters first capture the environment-specific correlations and then be fixed so that their effect can be removed. The invention therefore updates the environment-invariant parameters and the environment-specific parameters in alternating iterations.
First, the environment-invariant model parameters are updated: the environment-specific parameters are fixed and the environment-invariant parameters are optimized. Because the learning objective of decoupled invariant learning is a complex two-level optimization problem, the invention optimizes it by meta-learning. In the meta-training phase, an environment is randomly sampled and intermediate model parameters are generated with the environment-specific learning objective, as shown in equation (10):
Then, in the meta-test phase, the invariant learning loss is calculated with the intermediate model parameters and used to optimize the environment-invariant parameters, as shown in equation (11):
Second, the environment-specific model parameters are updated. After the environment-invariant parameters have been updated, they are fixed and the environment-specific learning objective is optimized directly to update the environment-specific parameters, as shown in equation (12):
where the gradient operator denotes the gradient with respect to the environment-specific parameters. The two types of update are alternated until the model converges.
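The alternating update can be illustrated on a toy problem. This is a sketch under several stated assumptions rather than the patented procedure: the model is a scalar linear predictor with one shared parameter `phi` and one parameter `psi[e]` per environment, squared error replaces the click-prediction loss, the regularization constraint of the environment-specific objective is omitted, environments are weighted uniformly, and the meta-gradient with respect to `phi` is taken by central finite differences. All names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Env = List[Tuple[float, float]]  # (x, y) pairs observed in one environment


def risk(phi: float, psi_e: float, env: Env) -> float:
    """Mean squared error of the toy model y_hat = (phi + psi_e) * x."""
    return sum(((phi + psi_e) * x - y) ** 2 for x, y in env) / len(env)


def grad_psi(phi: float, psi_e: float, env: Env) -> float:
    """Analytic gradient of risk() with respect to psi_e."""
    return sum(2.0 * ((phi + psi_e) * x - y) * x for x, y in env) / len(env)


def invariant_loss(phi: float, psi: Dict[str, float], envs: Dict[str, Env], lam: float) -> float:
    """Mean risk over environments plus lam times the variance of the risks."""
    risks = [risk(phi, psi[e], envs[e]) for e in envs]
    mean = sum(risks) / len(risks)
    var = sum((r - mean) ** 2 for r in risks) / len(risks)
    return mean + lam * var


def meta_objective(phi, psi, envs, e, inner_lr, lam):
    # Meta-training: one environment-specific step on the sampled environment
    # produces intermediate parameters.
    psi_tmp = dict(psi)
    psi_tmp[e] = psi[e] - inner_lr * grad_psi(phi, psi[e], envs[e])
    # Meta-test: the invariant loss is evaluated with the intermediate parameters.
    return invariant_loss(phi, psi_tmp, envs, lam)


def train(envs, steps=200, lr=0.05, inner_lr=0.05, lam=1.0, seed=0):
    rng = random.Random(seed)
    names = sorted(envs)
    phi, psi = 0.0, {e: 0.0 for e in names}
    h = 1e-5  # finite-difference step
    for _ in range(steps):
        e = rng.choice(names)
        # Update the environment-invariant parameter with psi held fixed.
        up = meta_objective(phi + h, psi, envs, e, inner_lr, lam)
        down = meta_objective(phi - h, psi, envs, e, inner_lr, lam)
        phi -= lr * (up - down) / (2.0 * h)
        # Update the environment-specific parameter with phi held fixed.
        psi[e] -= lr * grad_psi(phi, psi[e], envs[e])
    return phi, psi


# Hypothetical toy environments whose true slopes differ (1.2 and 0.8).
toy_envs = {
    "period_1": [(1.0, 1.2), (2.0, 2.4)],
    "period_2": [(1.0, 0.8), (2.0, 1.6)],
}
phi, psi = train(toy_envs)
initial_risk = invariant_loss(0.0, {e: 0.0 for e in toy_envs}, toy_envs, lam=0.0)
final_risk = invariant_loss(phi, psi, toy_envs, lam=0.0)
```

In a real model the finite-difference step would be replaced by automatic differentiation through the inner update; the structure of the loop (sample an environment, inner step, invariant update, specific update) stays the same.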
Fig. 2 (a) and fig. 2 (b) respectively show schematic diagrams of two types of decoupling invariant learning models according to an embodiment of the present invention, where fig. 2 (a) shows the decoupling invariant learning model and fig. 2 (b) shows the light decoupling invariant learning model (LightDIL).
For the model architecture, the environment-invariant model parameters and the environment-specific model parameters are designed at the feature-embedding level and at the feature-domain weight level, respectively, so as to decouple the two types of correlations. Taking the factorization machine as an example (the method can also be built on other models), the following two model architectures are designed.
The first model architecture, shown in fig. 2 (a), decouples at the feature-embedding level. Because feature embeddings are the core of a feature interaction model, the invention decouples at this level: each feature is given an environment-invariant embedding vector and a set of environment-specific embedding vectors, one for each environment. For the factorization machine, the model prediction formula is then as shown in formula (5):
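Decoupling at the feature-embedding level can be sketched for the second-order term of a factorization machine. How the two embeddings are combined is an assumption here: the sketch adds the environment-specific embedding to the environment-invariant one, and it omits the bias and linear terms. The function name and toy values are hypothetical.

```python
from typing import List, Optional, Sequence


def fm_interaction_score(
    x: Sequence[float],
    invariant_emb: Sequence[Sequence[float]],
    specific_emb: Optional[Sequence[Sequence[float]]] = None,
) -> float:
    """Second-order FM score with decoupled embeddings (additive combination is assumed)."""
    n = len(x)
    combined: List[List[float]] = []
    for i in range(n):
        if specific_emb is None:
            # Only the environment-invariant embedding is used.
            combined.append(list(invariant_emb[i]))
        else:
            combined.append([a + b for a, b in zip(invariant_emb[i], specific_emb[i])])
    score = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(combined[i], combined[j]))
            score += dot * x[i] * x[j]
    return score


# Hypothetical toy sample with three features, the third one inactive.
toy_x = [1.0, 1.0, 0.0]
toy_invariant = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_specific = [[0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]  # embeddings of one environment
score_invariant_only = fm_interaction_score(toy_x, toy_invariant)
score_in_environment = fm_interaction_score(toy_x, toy_invariant, toy_specific)
```

Every feature carries one extra embedding per environment in this design, which is why the parameter count grows with the number of environments.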
The second model architecture, shown in fig. 2 (b), decouples at the feature-domain weight level. Decoupling at the feature-embedding level greatly increases the number of model parameters, which makes the model harder to learn and increases the storage burden and the training overhead. To improve efficiency, the invention instead decouples at the feature-domain level. Specifically, environment-invariant weights and environment-specific weights are assigned to feature interactions at the feature-domain level to capture environment-invariant and environment-specific correlations, respectively. Taking the factorization machine as an example, the model prediction formula is shown in formula (6):
where each domain's characterization is computed from the embeddings of the features in that domain, and each pair of domains has an environment-invariant weight and a weight specific to environment t. This model architecture is named light decoupled invariant learning (LightDIL).
In summary, the invention designs decoupled model architectures at the feature-embedding level and at the feature-domain weight level, as shown in fig. 2 (a) and fig. 2 (b), respectively. In the service stage, only the environment-invariant model parameters, that is, the stable feature interactions, are used for prediction to ensure good generalization. Taking light decoupled invariant learning as an example, the prediction formula is shown in formula (13):
where ∅ denotes the empty set: in the prediction stage of light decoupled invariant learning, the set of environment-specific model parameters is replaced with the empty set.
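The difference between training-time and service-stage prediction in the lightweight architecture can be sketched as follows. The way the two weights are combined is an assumption: the sketch adds the environment-specific weight to the environment-invariant weight of each domain pair and multiplies the result by the inner product of the two domain characterizations. Passing `None` for the specific weights plays the role of the empty set. All names and values are hypothetical.

```python
from typing import Optional, Sequence


def lightdil_score(
    domain_reps: Sequence[Sequence[float]],
    invariant_w: Sequence[Sequence[float]],
    specific_w: Optional[Sequence[Sequence[float]]] = None,
) -> float:
    """Weighted sum of pairwise domain interactions (additive weights are assumed)."""
    m = len(domain_reps)
    score = 0.0
    for i in range(m):
        for j in range(i + 1, m):
            weight = invariant_w[i][j]
            if specific_w is not None:
                # Training within a known environment: add its specific weight.
                weight += specific_w[i][j]
            dot = sum(a * b for a, b in zip(domain_reps[i], domain_reps[j]))
            score += weight * dot
    return score


# Hypothetical toy setup with three feature domains.
toy_reps = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
toy_invariant_w = [[0.0, 0.5, 1.0], [0.0, 0.0, 0.25], [0.0, 0.0, 0.0]]
toy_specific_w = [[0.0, 0.5, 0.0], [0.0, 0.0, 0.25], [0.0, 0.0, 0.0]]
train_score = lightdil_score(toy_reps, toy_invariant_w, toy_specific_w)
serve_score = lightdil_score(toy_reps, toy_invariant_w)  # environment-specific set is empty
```

The extra parameters per environment are one weight per domain pair, which is far fewer than one embedding per feature.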
FIG. 3 is a flowchart of a click-through rate prediction method according to an embodiment of the invention.
As shown in fig. 3, the click-through rate prediction method includes operations S310 to S320.
In operation S310, a historical data set of a user to be predicted is obtained, where the historical data set of the user to be predicted includes user feature data and user click data.
In operation S320, a click rate prediction model is used to mine a prediction result of environment invariant feature interaction of the historical data set of the user to be predicted, where the click rate prediction model is obtained by training the click rate prediction model based on the decoupling invariant learning through the above-mentioned training method.
FIG. 4 is a schematic structural diagram of a training device for a click rate prediction model based on decoupling invariant learning according to an embodiment of the present invention.
As shown in fig. 4, the training apparatus 400 for the click rate prediction model based on the decoupled invariant learning includes a model building module 410, a data sampling module 420, an invariant parameter updating module 430, a specific parameter updating module 440, and an iteration module 450.
The model building module 410 is configured to execute operation S110, and build a click-through rate prediction model and a model optimization target based on a decoupling invariant learning method, where parameters of the click-through rate prediction model include parameters of an environment-invariant portion and parameters of an environment-specific portion, and the model optimization target includes an optimization target of the parameters of the environment-invariant portion and an optimization target of the parameters of the environment-specific portion.
The data sampling module 420 is configured to execute operation S120, and perform random sampling on an environment data set to obtain a training sample data set, where the environment data set represents historical click data of a user in different time periods, and the environment data set includes a tag value.
The invariant parameter updating module 430 is configured to execute operation S130, fix the environment specific part parameter of the click rate prediction model, mine the environment invariant feature of the training sample data set by using the click rate prediction model to obtain a first prediction result, process the first prediction result and the label value of the training sample data set by using the environment invariant loss function through a gradient descent method based on the optimization target of the environment invariant part parameter to obtain a first loss value, and update the environment invariant part parameter of the click rate prediction model according to the first loss value.
The specific parameter updating module 440 is configured to execute operation S140, fix the environment-invariant parameters of the click rate prediction model, mine the environment-specific features of the training sample data set by using the updated click rate prediction model to obtain a second prediction result, process the second prediction result and the label value of the training sample data set by using the environment-specific loss function through a gradient descent method based on the optimization target of the environment-specific parameters to obtain a second loss value, and update the environment-specific parameters of the click rate prediction model according to the second loss value.
And the iteration module 450 is configured to iterate operations S120 to S140 until the click rate prediction model meets a preset convergence condition, so as to obtain a trained click rate prediction model.
In order to better illustrate the advantages of the click rate prediction model obtained by the training method provided by the invention, the click rate prediction model obtained by the training method of the invention is verified by combining a specific experiment.
The experiments take the classical click rate prediction model FM as the base recommendation model and use two public datasets of different types, Douban and MovieLens-10M (ML-10M). FwFMs, AutoFIS, PROFIT, Group-DRO and V-REx are used as comparison models. With 6 months as one period, Douban and ML-10M are divided into 15 and 13 parts, respectively. For Douban, the first five periods are used as the training set, the middle five as the validation set, and the last five as the test set. For ML-10M, the first five periods are used as the training set, the middle four as the validation set, and the last four as the test set. All methods train the model on the training set, select the best parameters on the validation set, and are evaluated on the test set. The average performance over the test periods of Douban and ML-10M is reported, measured by AUC and logloss.
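The evaluation protocol can be sketched in a few lines: a chronological split of period indices into training, validation and test sets, plus plain implementations of the two metrics. The helper names and the toy labels and predictions are hypothetical and are not experimental data.

```python
import math
from typing import List, Sequence, Tuple


def split_periods(num_periods: int, num_train: int, num_valid: int) -> Tuple[List[int], List[int], List[int]]:
    """Chronological split of period indices into train, validation and test."""
    periods = list(range(num_periods))
    cut = num_train + num_valid
    return periods[:num_train], periods[num_train:cut], periods[cut:]


def logloss(labels: Sequence[int], probs: Sequence[float], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def auc(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Probability that a random positive is ranked above a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Splits matching the period counts given in the text (15 and 13 periods).
split_15 = split_periods(15, num_train=5, num_valid=5)
split_13 = split_periods(13, num_train=5, num_valid=4)

# Hypothetical toy predictions.
toy_labels = [1, 0, 1, 0]
toy_probs = [0.9, 0.2, 0.4, 0.6]
toy_auc = auc(toy_labels, toy_probs)
toy_logloss = logloss(toy_labels, toy_probs)
```

The pairwise AUC is quadratic in the number of samples and is meant only to make the definition explicit; a rank-based computation would be preferable on full test sets.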
The results are shown in Table 1:
TABLE 1 comparison of Performance of different methods on two datasets
Table 1 shows the following. On both datasets, every metric of the method exceeds those of the common invariant learning methods V-REx and Group-DRO, which indicates that decoupling to satisfy the sufficient-prediction assumption of invariant learning allows invariant learning to be applied to capturing stable feature interactions for click rate prediction. The method also outperforms the recommendation models FwFMs, AutoFIS and PROFIT, which shows that it can capture stable feature interactions in different recommendation scenarios, generalizes better in service-stage prediction, and improves prediction accuracy.
FIG. 5 schematically shows a block diagram of an electronic device suitable for implementing a click-through rate prediction model training method based on decoupled invariant learning and a click-through rate prediction method according to an embodiment of the present invention.
As shown in fig. 5, an electronic device 500 according to an embodiment of the present invention includes a processor 501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. The processor 501 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 501 may also include onboard memory for caching purposes. Processor 501 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the present invention.
In the RAM 503, various programs and data necessary for the operation of the electronic apparatus 500 are stored. The processor 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. The processor 501 performs various operations of the method flow according to the embodiments of the present invention by executing programs in the ROM 502 and/or the RAM 503. Note that the program may also be stored in one or more memories other than the ROM 502 and the RAM 503. The processor 501 may also perform various operations of method flows according to embodiments of the present invention by executing programs stored in the one or more memories.
According to an embodiment of the present invention, the electronic device 500 may also include an input/output (I/O) interface 505, which is also connected to the bus 504. The electronic device 500 may also include one or more of the following components connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage portion 508 including a hard disk and the like; and a communication portion 509 including a network interface card such as a LAN card, a modem, or the like. The communication portion 509 performs communication processing via a network such as the internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as necessary, so that a computer program read from it can be installed into the storage portion 508 as necessary.
The present invention also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the present invention.
According to embodiments of the present invention, the computer readable storage medium may be a non-volatile computer readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the invention, a computer-readable storage medium may include ROM 502 and/or RAM 503 and/or one or more memories other than ROM 502 and RAM 503 as described above.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only examples of the present invention, and should not be construed as limiting the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A click rate prediction model training method based on decoupling invariant learning is characterized by comprising the following steps:
the method comprises the steps that firstly, a click rate prediction model and a model optimization target are built on the basis of a decoupling invariant learning method, wherein parameters of the click rate prediction model comprise environment invariant part parameters and environment specific part parameters, and the model optimization target comprises an optimization target of the environment invariant part parameters and an optimization target of the environment specific part parameters;
randomly sampling an environment data set to obtain a training sample data set, wherein the environment data set represents historical click data of a user in different time periods, and comprises tag values;
fixing the environment specific part parameter of the click rate prediction model, mining the environment invariant feature of the training sample data set by using the click rate prediction model to obtain a first prediction result, processing the first prediction result and the label value of the training sample data set by using an environment invariant loss function through a gradient descent method based on the optimization target of the environment invariant part parameter to obtain a first loss value, and updating the environment invariant part parameter of the click rate prediction model according to the first loss value;
fixing the environment invariant part parameters of the click rate prediction model, mining the environment specific characteristics of the training sample data set by using the updated click rate prediction model to obtain a second prediction result, processing the second prediction result and the label value of the training sample data set by using an environment specific loss function through a gradient descent method based on the optimization target of the environment specific part parameters to obtain a second loss value, and updating the environment specific part parameters of the click rate prediction model according to the second loss value;
and iterating the second step to the fourth step until the click rate prediction model meets a preset convergence condition, so as to obtain a trained click rate prediction model.
2. The method of claim 1, wherein the optimization objective of the model is represented by formula (1) and formula (2):
wherein formula (1) represents the optimization objective of the environment-invariant part parameters and formula (2) represents the optimization objective of the environment-specific part parameters, and the symbols in the formulas represent, respectively: the environment-invariant part parameters; the environment-specific part parameters in an environment; the prediction loss calculated in that environment; a hyper-parameter controlling the strength of the regularization constraint; the regularization constraint that prevents the environment-specific part parameters from capturing environment-invariant correlations; the variance of the empirical risks of the different training environments; the weight of the prediction loss of an environment; and the coefficient of the variance term.
3. The method of claim 2, wherein the variance of the empirical risks of the different training environments is expressed by equation (3):
wherein the symbols represent, respectively: the number of elements of the set of training environments; two different environments; and the environment-specific part parameters in each of those two environments; and the variance of the empirical risks of the different training environments is used to capture the patterns shared by the different environments;
4. The method of claim 1, wherein the click rate prediction models comprise click rate prediction models based on decoupled invariant learning of a click data feature embedding level and/or click rate prediction models based on decoupled invariant learning of a click data feature domain weight level.
5. The method of claim 4, wherein the click-through rate prediction model based on decoupled invariant learning of click data feature embedding levels is determined by equation (5):
wherein the symbols represent, respectively: the environment-invariant part parameters; the environment-specific part parameters in an environment; the click data features, including the two click data features of each interacting pair; the number of click data features; the environment-invariant feature embeddings corresponding to the two features of a pair; the feature embeddings, specific to a given environment, corresponding to the two features of a pair; and the click rate prediction model;
wherein the click rate prediction model based on the decoupling invariant learning of the click data feature domain weight layer is determined by formula (6):
wherein the two characterization terms represent the two feature domains of a pair, the two weight terms are the environment-invariant weight and the environment-specific weight (for the given environment) of the interaction between those two domains, and the remaining term is the number of feature domains.
6. The method of claim 5, wherein the characterization of a domain is calculated from the embeddings of the features in that domain, as determined by equation (7):
7. A click-through rate prediction method, comprising:
acquiring a historical data set of a user to be predicted, wherein the historical data set of the user to be predicted comprises user characteristic data and user click data;
and mining a prediction result of the environment-invariant feature interaction of the historical data set of the user to be predicted by using a click-through rate prediction model, wherein the click-through rate prediction model is obtained by training according to the method of any one of claims 1 to 6.
8. A training device of a click rate prediction model based on decoupling invariant learning is characterized by comprising the following components:
the model construction module is used for executing the first step, constructing a click rate prediction model and a model optimization target based on a decoupling invariant learning method, wherein the parameters of the click rate prediction model comprise environment invariant part parameters and environment specific part parameters, and the model optimization target comprises an optimization target of the environment invariant part parameters and an optimization target of the environment specific part parameters;
the data sampling module is used for executing the second step, randomly sampling an environment data set to obtain a training sample data set, wherein the environment data set represents historical click data of a user in different time periods, and comprises tag values;
the invariant parameter updating module is used for executing the third step, fixing the environment specific part parameter of the click rate prediction model, mining the environment invariant feature of the training sample data set by using the click rate prediction model to obtain a first prediction result, processing the first prediction result and the label value of the training sample data set by using an environment invariant loss function through a gradient descent method based on the optimization target of the environment invariant part parameter to obtain a first loss value, and updating the environment invariant part parameter of the click rate prediction model according to the first loss value;
a specific parameter updating module, configured to perform the fourth step, fix the environment-invariant parameter of the click rate prediction model, mine the environment specific characteristics of the training sample data set by using the updated click rate prediction model to obtain a second prediction result, process the second prediction result and the label value of the training sample data set by using an environment specific loss function through a gradient descent method based on the optimization target of the environment specific parameter to obtain a second loss value, and update the environment specific parameter of the click rate prediction model according to the second loss value;
and the iteration module is used for iterating the second step to the fourth step until the click rate prediction model meets a preset convergence condition, so as to obtain a trained click rate prediction model.
9. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-7.
10. A computer-readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to carry out the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310053850.7A CN115809372B (en) | 2023-02-03 | 2023-02-03 | Click rate prediction model training method and device based on decoupling invariant learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310053850.7A CN115809372B (en) | 2023-02-03 | 2023-02-03 | Click rate prediction model training method and device based on decoupling invariant learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115809372A true CN115809372A (en) | 2023-03-17 |
CN115809372B CN115809372B (en) | 2023-06-16 |
Family
ID=85487763
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310053850.7A Active CN115809372B (en) | 2023-02-03 | 2023-02-03 | Click rate prediction model training method and device based on decoupling invariant learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115809372B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110490389A (en) * | 2019-08-27 | 2019-11-22 | 腾讯科技(深圳)有限公司 | Clicking rate prediction technique, device, equipment and medium |
CN111538761A (en) * | 2020-04-21 | 2020-08-14 | 中南大学 | Click rate prediction method based on attention mechanism |
CN113205184A (en) * | 2021-04-28 | 2021-08-03 | 清华大学 | Invariant learning method and device based on heterogeneous hybrid data |
US20220083913A1 (en) * | 2020-09-11 | 2022-03-17 | Actapio, Inc. | Learning apparatus, learning method, and a non-transitory computer-readable storage medium |
CN114240555A (en) * | 2021-12-17 | 2022-03-25 | 北京沃东天骏信息技术有限公司 | Click rate prediction model training method and device and click rate prediction method and device |
CN114445121A (en) * | 2021-12-27 | 2022-05-06 | 天翼云科技有限公司 | Advertisement click rate prediction model construction and advertisement click rate prediction method |
CN115018552A (en) * | 2022-06-28 | 2022-09-06 | 中国科学技术大学 | Method for determining click rate of product |
Non-Patent Citations (2)
Title |
---|
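孟露, 王莉: "Click-through rate prediction models for recommender systems" *
郑嘉伟, 王粉花: "Click-through rate prediction model based on multi-level feature interaction" *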
Also Published As
Publication number | Publication date |
---|---|
CN115809372B (en) | 2023-06-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103502899B (en) | Dynamic prediction Modeling Platform | |
CN111369299B (en) | Identification method, device, equipment and computer readable storage medium | |
CN108229667A (en) | Trimming based on artificial neural network classification | |
CN111145076B (en) | Data parallelization processing method, system, equipment and storage medium | |
KR101828215B1 (en) | A method and apparatus for learning cyclic state transition model on long short term memory network | |
CN109389424B (en) | Flow distribution method and device, electronic equipment and storage medium | |
CN111950810A (en) | Multivariable time sequence prediction method and device based on self-evolution pre-training | |
CN112785342A (en) | Real estate dynamic estimation method and device | |
CN113435430A (en) | Video behavior identification method, system and equipment based on self-adaptive space-time entanglement | |
US20140236869A1 (en) | Interactive variable selection device, interactive variable selection method, and interactive variable selection program | |
CN116684330A (en) | Traffic prediction method, device, equipment and storage medium based on artificial intelligence | |
CN110263136B (en) | Method and device for pushing object to user based on reinforcement learning model | |
CN115359321A (en) | Model training method and device, electronic equipment and storage medium | |
US11475295B2 (en) | Predicting and visualizing outcomes using a time-aware recurrent neural network | |
Larsen et al. | Fast continuous and integer L-shaped heuristics through supervised learning | |
CN114862010A (en) | Flow determination method, device, equipment and medium based on space-time data | |
CN113505583B (en) | Emotion reason clause pair extraction method based on semantic decision graph neural network | |
CN112486784A (en) | Method, apparatus and medium for diagnosing and optimizing data analysis system | |
CN116861262B (en) | Perception model training method and device, electronic equipment and storage medium | |
US11989656B2 (en) | Search space exploration for deep learning | |
CN110717537B (en) | Method and device for training user classification model and executing user classification prediction | |
CN115809372A (en) | Click rate prediction model training method and device based on decoupling invariant learning | |
JP2005222445A (en) | Information processing method and analysis device in data mining | |
WO2020059136A1 (en) | Decision list learning device, decision list learning method, and decision list learning program | |
CN113094602B (en) | Hotel recommendation method, system, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||